Optimize your workloads with Amazon Redshift Serverless AI-driven scaling and optimization

The current scaling approach of Amazon Redshift Serverless increases your compute capacity based on query queue time and scales down when queuing on the data warehouse decreases. However, you might need to automatically scale compute resources based on factors like query complexity and data volume to meet price-performance targets, regardless of query queuing. To address this requirement, Redshift Serverless launched the artificial intelligence (AI)-driven scaling and optimization feature, which scales compute not only based on queuing, but also factoring in data volume and query complexity.

In this post, we describe how Redshift Serverless uses the new AI-driven scaling and optimization capabilities to address common use cases. This post also includes example SQL, which you can run on your own Redshift Serverless data warehouse to experience the benefits of this feature.

Solution overview

The AI-driven scaling and optimization feature in Redshift Serverless provides a user-friendly visual slider to set your desired balance between price and performance. By moving the slider, you can choose between optimized for cost, balanced performance and cost, or optimized for performance. Based on where you position the slider, Amazon Redshift automatically adds or removes resources and performs other AI-driven optimizations like automatic materialized views and automatic table design optimization to meet your selected price-performance target.

Price Performance Slider

The slider offers the following options:

  • Optimized for cost – Prioritizes cost savings. Redshift attempts to automatically scale up compute capacity when doing so doesn't incur additional costs, and it will also attempt to scale down compute for lower cost, despite a longer runtime.
  • Balanced – Offers a balance between performance and cost. Redshift scales for performance with a moderate cost increase.
  • Optimized for performance – Prioritizes performance. Redshift scales aggressively for maximum performance, potentially incurring higher costs.

In the following sections, we illustrate how the AI-driven scaling and optimization feature can intelligently predict your workload's compute needs and scale proactively for three scenarios:

  • Use case 1 – A long-running complex query. Compute scales based on query complexity.
  • Use case 2 – A sudden spike in ingestion volume (a three-fold increase, from 720 million to 2.1 billion records). Compute scales based on data volume.
  • Use case 3 – A data lake query scanning large datasets (TBs). Compute scales based on the expected data to be scanned from the data lake. The expected data scan is predicted by machine learning (ML) models based on prior historical run statistics.

With the existing auto scaling mechanism, these use cases don't increase compute capacity automatically unless queuing is detected on the instance.

Prerequisites

To follow along, complete the following prerequisites:

  1. Create a Redshift Serverless workgroup in preview mode. For instructions, see Creating a preview workgroup.
  2. While creating the preview workgroup, choose Performance and cost controls and Price-performance target, and adjust the slider to Optimized for performance. For more information, refer to Amazon Redshift adds new AI capabilities, including Amazon Q, to boost efficiency and productivity.
  3. Set up an AWS Identity and Access Management (IAM) role as the default IAM role. Refer to Managing IAM roles created for a cluster using the console for instructions.
  4. We use the TPC-DS 1TB Cloud Data Warehouse Benchmark data to demonstrate this feature. Run the SQL statements to create tables and load the TPC-DS 1TB data. A sketch of one such load follows this list.
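For illustration, the following is a minimal sketch of loading one TPC-DS table after its DDL has been created. The S3 prefix shown here is an assumption modeled on the 3TB path used later in this post; use the paths referenced in the benchmark's load scripts.

-- Illustrative only: load one TPC-DS 1TB table using the default IAM role.
-- The S3 prefix is an assumed layout; substitute the path from the benchmark scripts.
copy customer
from 's3://redshift-downloads/TPC-DS/2.13/1TB/customer/'
iam_role default
gzip delimiter '|' emptyasnull
region 'us-east-1';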

Use case 1: Scale compute based on query complexity

The following query analyzes product sales across multiple channels such as websites, wholesale, and retail stores. This complex query typically takes about 25 minutes to run with the default 128 RPUs. Let's run this workload on the preview workgroup created as part of the prerequisites.

When a query is run for the first time, the AI scaling system may make a suboptimal decision regarding resource allocation or scaling because the system is still learning the query and data characteristics. However, the system learns from this experience, and when the same query is run again, it can make a more optimal scaling decision. Therefore, if the query didn't scale during the first run, it is recommended to rerun the query. You can monitor the RPU capacity used on the Redshift Serverless console or by querying the SYS_SERVERLESS_USAGE system view.

The results cache is turned off in the following queries to avoid fetching results from the cache.

SET enable_result_cache_for_session TO off;
with /* TPC-DS demo query */
    ws as
    (select d_year AS ws_sold_year, ws_item_sk, ws_bill_customer_sk ws_customer_sk,
        sum(ws_quantity) ws_qty, sum(ws_wholesale_cost) ws_wc, sum(ws_sales_price) ws_sp
     from web_sales
     left join web_returns on wr_order_number = ws_order_number and ws_item_sk = wr_item_sk
     join date_dim on ws_sold_date_sk = d_date_sk
     where wr_order_number is null
     group by d_year, ws_item_sk, ws_bill_customer_sk),
    cs as
    (select d_year AS cs_sold_year, cs_item_sk, cs_bill_customer_sk cs_customer_sk,
        sum(cs_quantity) cs_qty, sum(cs_wholesale_cost) cs_wc, sum(cs_sales_price) cs_sp
     from catalog_sales
     left join catalog_returns on cr_order_number = cs_order_number and cs_item_sk = cr_item_sk
     join date_dim on cs_sold_date_sk = d_date_sk
     where cr_order_number is null
     group by d_year, cs_item_sk, cs_bill_customer_sk),
    ss as
    (select d_year AS ss_sold_year, ss_item_sk, ss_customer_sk,
        sum(ss_quantity) ss_qty, sum(ss_wholesale_cost) ss_wc, sum(ss_sales_price) ss_sp
     from store_sales
     left join store_returns on sr_ticket_number = ss_ticket_number and ss_item_sk = sr_item_sk
     join date_dim on ss_sold_date_sk = d_date_sk
     where sr_ticket_number is null
     group by d_year, ss_item_sk, ss_customer_sk)
select
    ss_customer_sk,
    round(ss_qty/(coalesce(ws_qty+cs_qty,1)),2) ratio,
    ss_qty store_qty, ss_wc store_wholesale_cost, ss_sp store_sales_price,
    coalesce(ws_qty,0)+coalesce(cs_qty,0) other_chan_qty,
    coalesce(ws_wc,0)+coalesce(cs_wc,0) other_chan_wholesale_cost,
    coalesce(ws_sp,0)+coalesce(cs_sp,0) other_chan_sales_price
from ss
left join ws on (ws_sold_year = ss_sold_year and ws_item_sk = ss_item_sk and ws_customer_sk = ss_customer_sk)
left join cs on (cs_sold_year = ss_sold_year and cs_item_sk = cs_item_sk and cs_customer_sk = ss_customer_sk)
where coalesce(ws_qty,0) > 0
and coalesce(cs_qty,0) > 0
order by ss_customer_sk, ss_qty desc, ss_wc desc, ss_sp desc,
    other_chan_qty, other_chan_wholesale_cost, other_chan_sales_price,
    round(ss_qty/(coalesce(ws_qty+cs_qty,1)),2);

When the query is complete, run the following SQL to capture the start and end times of the query, which will be used in the next query:

select query_id, query_text, start_time, end_time, elapsed_time/1000000.0 duration_in_seconds
from sys_query_history
where query_text like '%TPC-DS demo query%'
and query_text not like '%sys_query_history%'
order by start_time desc

Let's assess how compute scaled during the preceding start_time and end_time interval. Substitute start_time and end_time in the following query with the output of the preceding query:

select * from sys_serverless_usage
where end_time >= 'start_time'
and end_time <= DATEADD(minute,1,'end_time')
order by end_time asc

-- Example
--select * from sys_serverless_usage
--where end_time >= '2024-06-03 00:17:12.322353'
--and end_time <= DATEADD(minute,1,'2024-06-03 00:19:11.553218')
--order by end_time asc

The following screenshot shows an example output.

Use Case 1 output

You can notice the increase in compute over the duration of this query. This demonstrates how Redshift Serverless scales based on query complexity.

Use case 2: Scale compute based on data volume

Let's consider the web_sales ingestion job. For this example, your daily ingestion job processes 720 million records and completes in an average of 2 minutes. This is what you ingested in the prerequisite steps.

Due to some event (such as month-end processing), your volumes increased by three times and now your ingestion job needs to process 2.1 billion records. With the existing scaling approach, this would increase your ingestion job runtime unless the queue time is enough to invoke additional compute resources. But with AI-driven scaling in performance optimized mode, Amazon Redshift automatically scales compute to complete your ingestion job within typical runtimes. This helps protect your ingestion SLAs.

Run the following job to ingest 2.1 billion records into the web_sales table:

copy web_sales from 's3://redshift-downloads/TPC-DS/2.13/3TB/web_sales/' iam_role default gzip delimiter '|' EMPTYASNULL region 'us-east-1';

Run the following query to compare the duration of ingesting 2.1 billion records and 720 million records. Both ingestion jobs completed in approximately similar times, despite the three-fold increase in volume.

select query_id, table_name, data_source, loaded_rows, duration/1000000.0 duration_in_seconds, start_time, end_time
from sys_load_history
where table_name = 'web_sales'
order by start_time desc

Run the following query with the start and end times from the previous output:

select * from sys_serverless_usage
where end_time >= 'start_time'
and end_time <= DATEADD(minute,1,'end_time')
order by end_time asc

The following is an example output. You can notice the increase in compute capacity for the ingestion job that processes 2.1 billion records. This illustrates how Redshift Serverless scaled based on data volume.

Use Case 2 Output

Use case 3: Scale data lake queries

In this use case, you create external tables pointing to TPC-DS 3TB data in an Amazon Simple Storage Service (Amazon S3) location. You then run a query that scans a large volume of data to demonstrate how Redshift Serverless can automatically scale compute capacity as needed.

In the following SQL, provide the ARN of the default IAM role you attached in the prerequisites:

-- Create external schema
create external schema ext_tpcds_3t
from data catalog
database ext_tpcds_db
iam_role '<ARN of the default IAM role attached>'
create external database if not exists;

Create external tables by running the DDL statements in the following SQL file. You should see seven external tables in the query editor under the ext_tpcds_3t schema, as shown in the following screenshot.

External Tables
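Optionally, you can confirm the DDL ran as expected by listing the external tables registered under the new schema. The following is a minimal sketch that queries the SVV_EXTERNAL_TABLES system view; it assumes only the schema name created earlier.

-- Optional check: list the external tables created under ext_tpcds_3t
select schemaname, tablename, location
from svv_external_tables
where schemaname = 'ext_tpcds_3t'
order by tablename;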

Run the following query using the external tables. As mentioned in the first use case, if the query didn't scale during the first run, it is recommended to rerun the query, because the system may have learned from the previous run and can potentially provide better scaling and performance for the subsequent run.

The results cache is turned off in the following queries to avoid fetching results from the cache.

SET enable_result_cache_for_session TO off;

with /* TPC-DS demo data lake query */
ws as
(select d_year AS ws_sold_year, ws_item_sk, ws_bill_customer_sk ws_customer_sk,
    sum(ws_quantity) ws_qty, sum(ws_wholesale_cost) ws_wc, sum(ws_sales_price) ws_sp
 from ext_tpcds_3t.web_sales
 left join ext_tpcds_3t.web_returns on wr_order_number = ws_order_number and ws_item_sk = wr_item_sk
 join ext_tpcds_3t.date_dim on ws_sold_date_sk = d_date_sk
 where wr_order_number is null
 group by d_year, ws_item_sk, ws_bill_customer_sk),

cs as
(select d_year AS cs_sold_year, cs_item_sk, cs_bill_customer_sk cs_customer_sk,
    sum(cs_quantity) cs_qty, sum(cs_wholesale_cost) cs_wc, sum(cs_sales_price) cs_sp
 from ext_tpcds_3t.catalog_sales
 left join ext_tpcds_3t.catalog_returns on cr_order_number = cs_order_number and cs_item_sk = cr_item_sk
 join ext_tpcds_3t.date_dim on cs_sold_date_sk = d_date_sk
 where cr_order_number is null
 group by d_year, cs_item_sk, cs_bill_customer_sk),

ss as
(select d_year AS ss_sold_year, ss_item_sk, ss_customer_sk,
    sum(ss_quantity) ss_qty, sum(ss_wholesale_cost) ss_wc, sum(ss_sales_price) ss_sp
 from ext_tpcds_3t.store_sales
 left join ext_tpcds_3t.store_returns on sr_ticket_number = ss_ticket_number and ss_item_sk = sr_item_sk
 join ext_tpcds_3t.date_dim on ss_sold_date_sk = d_date_sk
 where sr_ticket_number is null
 group by d_year, ss_item_sk, ss_customer_sk)

select
    ss_customer_sk,
    round(ss_qty/(coalesce(ws_qty+cs_qty,1)),2) ratio,
    ss_qty store_qty, ss_wc store_wholesale_cost, ss_sp store_sales_price,
    coalesce(ws_qty,0)+coalesce(cs_qty,0) other_chan_qty,
    coalesce(ws_wc,0)+coalesce(cs_wc,0) other_chan_wholesale_cost,
    coalesce(ws_sp,0)+coalesce(cs_sp,0) other_chan_sales_price
from ss
left join ws on (ws_sold_year = ss_sold_year and ws_item_sk = ss_item_sk and ws_customer_sk = ss_customer_sk)
left join cs on (cs_sold_year = ss_sold_year and cs_item_sk = cs_item_sk and cs_customer_sk = ss_customer_sk)
where coalesce(ws_qty,0) > 0
and coalesce(cs_qty,0) > 0
order by ss_customer_sk, ss_qty desc, ss_wc desc, ss_sp desc,
    other_chan_qty, other_chan_wholesale_cost, other_chan_sales_price,
    round(ss_qty/(coalesce(ws_qty+cs_qty,1)),2);

Review the total elapsed time of the query. You need the start_time and end_time from the results to feed into the next query.

select query_id, query_text, start_time, end_time, elapsed_time/1000000.0 duration_in_seconds
from sys_query_history
where query_text like '%TPC-DS demo data lake query%'
and query_text not like '%sys_query_history%'
order by start_time desc

Run the following query to see how compute scaled during the preceding start_time and end_time interval. Substitute start_time and end_time in the following query with the output of the preceding query:

select * from sys_serverless_usage
where end_time >= 'start_time'
and end_time <= DATEADD(minute,1,'end_time')
order by end_time asc

The following screenshot shows an example output.

Use Case 3 Output

The increased compute capacity for this data lake query shows that Redshift Serverless can scale to match the data being scanned. This demonstrates how Redshift Serverless can dynamically allocate resources based on query needs.

Considerations when choosing your price-performance target

You can use the price-performance slider to choose your desired price-performance target for your workload. The AI-driven scaling and optimizations provide holistic optimizations using the following models:

  • Query prediction models – These determine the actual resource needs (memory, CPU consumption, and so on) for each individual query
  • Scaling prediction models – These predict how the query would behave on different capacity sizes

Let's consider a query that takes 7 minutes and costs $7. The following figure shows the query runtimes and cost with no scaling.

Scaling Type Example

A given query could scale in a few different ways, as shown below. Based on the price-performance target you chose on the slider, AI-driven scaling predicts how the query trades off performance and cost, and scales it accordingly.

Scaling Types

The slider options yield the following results:

  • Optimized for cost – When you choose Optimized for cost, the warehouse scales up only if there is no additional cost, or a lower cost, to the user. In the preceding example, the superlinear scaling approach demonstrates this behavior. Scaling will only occur if it can be done in a cost-effective manner according to the scaling model predictions. If the scaling models predict that cost-optimized scaling isn't possible for the given workload, then the warehouse won't scale.
  • Balanced – With the Balanced option, the system will scale in favor of performance and there will be a cost increase, but it will be a limited increase in cost. In the preceding example, the linear scaling approach demonstrates this behavior.
  • Optimized for performance – With the Optimized for performance option, the system will scale in favor of performance even though the costs are higher and non-linear. In the preceding example, the sublinear scaling approach demonstrates this behavior. The closer the slider position is to the Optimized for performance position, the more sublinear scaling is permitted.
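To make the difference between these scaling shapes concrete, here is a rough back-of-the-envelope illustration. It assumes cost is roughly proportional to the RPU-seconds consumed (how Redshift Serverless is billed) and uses made-up runtimes, not measurements from the preceding figures, for a query that runs in 7 minutes on 128 RPUs and is scaled to 256 RPUs:

cost ≈ RPUs × runtime
128 RPUs × 7.0 min ≈ 896 RPU-minutes  (baseline, no scaling)
256 RPUs × 3.0 min ≈ 768 RPU-minutes  (superlinear: faster and cheaper)
256 RPUs × 3.5 min ≈ 896 RPU-minutes  (linear: faster at little additional cost)
256 RPUs × 5.0 min ≈ 1280 RPU-minutes (sublinear: faster, but at higher cost)

Which of these outcomes the workload actually exhibits depends on how well it parallelizes; the slider setting determines which outcomes Redshift is allowed to pursue.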

The following are additional points to note:

  • The price-performance slider options are dynamic and can be changed anytime. However, the impact of these changes will not be realized immediately; it takes effect as the system learns how to scale the current workload and any additional workloads better.
  • The price-performance slider options, Max capacity, and Max RPU-hours are designed to work together. Max capacity and Max RPU-hours are the controls that limit the maximum RPUs the data warehouse is allowed to scale to and the maximum RPU hours it is allowed to consume, respectively. These controls are always honored and enforced regardless of the settings on the price-performance target slider.
  • The AI-driven scaling and optimization feature dynamically adjusts compute resources to optimize query runtime speed while adhering to your price-performance requirements. It considers factors such as query queueing, concurrency, volume, and complexity. The system can either run queries on a compute resource with fewer concurrent queries or spin up additional compute resources to avoid queueing. The goal is to provide the best price-performance balance based on your choices.

Monitoring

You can monitor the RPU scaling in the following ways:

  • Review the RPU capacity used graph on the Amazon Redshift console.
  • Monitor the ComputeCapacity metric under AWS/Redshift-Serverless and Workgroup in Amazon CloudWatch.
  • Query the SYS_QUERY_HISTORY view, providing the specific query ID or query text to identify the time period. Use this time period to query the SYS_SERVERLESS_USAGE system view to find the compute_capacity field, which shows the RPUs scaled during the query runtime. A combined sketch of this pattern follows this list.
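The following is a minimal sketch of that last approach, assuming the query text contains the 'TPC-DS demo query' comment used earlier in this post. It pulls the query's time window from SYS_QUERY_HISTORY and then reads compute_capacity from SYS_SERVERLESS_USAGE for that window.

-- Sketch: find the most recent matching query's window, then read capacity for it.
with qry as (
    select start_time, end_time
    from sys_query_history
    where query_text like '%TPC-DS demo query%'
      and query_text not like '%sys_query_history%'
    order by start_time desc
    limit 1
)
select u.end_time, u.compute_capacity
from sys_serverless_usage u, qry
where u.end_time >= qry.start_time
  and u.end_time <= DATEADD(minute, 1, qry.end_time)
order by u.end_time asc;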

Refer to Configure monitoring, limits, and alarms in Amazon Redshift Serverless to keep costs predictable for step-by-step instructions on using these approaches.

Clean up

Complete the following steps to delete the resources you created to avoid unexpected costs:

  1. Delete the Redshift Serverless workgroup.
  2. Delete the Redshift Serverless associated namespace.

Conclusion

In this post, we discussed how to optimize your workloads to scale based on changes in data volume and query complexity. We demonstrated an approach to implement more responsive, proactive scaling with the AI-driven scaling feature in Redshift Serverless. Try this feature in your environment, conduct a proof of concept on your specific workloads, and share your feedback with us.


About the Authors

Satesh Sonti is a Sr. Analytics Specialist Solutions Architect based out of Atlanta, specialized in building enterprise data platforms, data warehousing, and analytics solutions. He has over 19 years of experience in building data assets and leading complex data platform programs for banking and insurance clients across the globe.

Ashish Agrawal is a Principal Product Manager with Amazon Redshift, building cloud-based data warehouses and analytics cloud services. Ashish has over 25 years of experience in IT. He has expertise in data warehouses, data lakes, and platform as a service, and has been a speaker at worldwide technical conferences.

Davide Pagano is a Software Development Manager with Amazon Redshift based out of Palo Alto, specialized in building cloud-based data warehouse and analytics cloud service solutions. He has over 10 years of experience with databases, of which 6 years are tailored to Amazon Redshift.
