The Distribution tab of the Table Editor for the Amazon Redshift platform allows you to create the DISTSTYLE and DISTKEY clauses for a table.

The Distribution Style allows you to choose one of the following values:

KEY: Distribution Keys (DISTKEY) means the data is distributed by the values in the DISTKEY column. When the join columns of joining tables are set as distribution keys, the joining rows from both tables are collocated on the compute nodes. This allows the optimizer to perform joins more efficiently. When a DISTSTYLE of KEY is specified, a DISTKEY column must be specified for the table.

EVEN: EVEN means that data in the table spreads evenly across the nodes in a cluster in round-robin distribution determined by Row IDs. The result is the distribution of approximately the same number of rows to each node. EVEN is the default distribution style and assumed unless a different DISTSTYLE is specified.

ALL: ALL means that a copy of the entire table is distributed to every node. This distribution style ensures that all the rows required for any join to this table are available on every node. The downside is that it multiplies storage requirements, increases load time, and increases maintenance times for the table. ALL distribution can improve execution time when used with certain dimension tables where KEY distribution is not appropriate. It is generally suited to small tables used frequently in joins.

The selected Distribution Style translates to the DISTSTYLE clause of the DDL.

When the distribution style is chosen as KEY, the Distribution Keys selection is enabled and allows you to choose the distribution key column. This translates to the DISTKEY clause in the DDL.

Amazon Redshift is a fast, petabyte-scale cloud data warehouse delivering the best price–performance. Amazon Redshift automates common maintenance tasks and is self-learning, self-optimizing, and constantly adapting to your actual workload.

Amazon Redshift has several features that automate performance tuning: automatic vacuum delete, automatic table sort, automatic analyze, and Amazon Redshift Advisor for actionable insights into optimizing cost and performance. In addition, automatic workload management (WLM) makes sure that you use cluster resources efficiently, even with dynamic and unpredictable workloads. Amazon Redshift can even automatically refresh and rewrite materialized views, speeding up query performance by orders of magnitude with pre-computed results. These capabilities use machine learning (ML) to adapt as your workloads shift, enabling you to get insights faster without spending valuable time managing your data warehouse.

Although Amazon Redshift provides industry-leading performance out of the box for most workloads, some queries benefit even more from pre-sorting and rearranging how data is physically laid out on disk. In Amazon Redshift, you can set the proper sort and distribution keys for tables and allow for significant performance improvements for the most demanding workloads.

Automatic table optimization is a new self-tuning capability that helps you achieve the performance benefits of sort and distribution keys without manual effort. Automatic table optimization continuously observes how queries interact with tables and uses ML to select the best sort and distribution keys to optimize performance for the cluster’s workload. If Amazon Redshift determines that applying a key will improve cluster performance, tables are automatically altered within hours without requiring administrator intervention. Optimizations made by the Automatic table optimization feature have been shown to increase cluster performance by 24% and 34% using the 3 TB and 30 TB TPC-DS benchmark, respectively, versus a cluster without Automatic table optimization.

In this post, we illustrate how you can take advantage of the Automatic table optimization feature for your workloads and easily manage several thousands of tables with zero administration.

The following diagram is an architectural illustration of how Automatic table optimization works:

![Architectural illustration of Automatic table optimization]()

As the diagram shows, as users query the data in Amazon Redshift, automatic table optimization collects query statistics, which are analyzed by a machine learning service to predict recommendations about the sort and distribution keys.
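To make the DISTSTYLE and DISTKEY clauses described above more concrete, here is a minimal Python sketch of how a tool might render them into a `CREATE TABLE` statement. The helper `render_create_table`, the table names, and the column names are hypothetical and exist only for illustration. The rule that a distribution key is accepted only together with KEY mirrors the editor behavior described above and is an assumption of this sketch. Only the three styles discussed here (KEY, EVEN, ALL) are handled.

```python
from typing import Optional, Sequence, Tuple

# The three distribution styles discussed above (other styles are out of scope here).
VALID_STYLES = ("KEY", "EVEN", "ALL")


def render_create_table(
    table: str,
    columns: Sequence[Tuple[str, str]],
    diststyle: str = "EVEN",
    distkey: Optional[str] = None,
) -> str:
    """Render a CREATE TABLE statement with DISTSTYLE and optional DISTKEY.

    Hypothetical helper: `columns` is a sequence of (name, sql_type) pairs.
    """
    style = diststyle.upper()
    if style not in VALID_STYLES:
        raise ValueError(f"unsupported DISTSTYLE: {diststyle!r}")

    column_names = [name for name, _ in columns]
    if style == "KEY":
        # KEY distribution needs a distribution key column from the table.
        if distkey is None:
            raise ValueError("DISTSTYLE KEY requires a DISTKEY column")
        if distkey not in column_names:
            raise ValueError(f"DISTKEY column {distkey!r} is not in the table")
    elif distkey is not None:
        # Assumption of this sketch: the key selection is only enabled for KEY.
        raise ValueError("DISTKEY only applies when DISTSTYLE is KEY")

    column_sql = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    ddl = f"CREATE TABLE {table} (\n{column_sql}\n)\nDISTSTYLE {style}"
    if style == "KEY":
        ddl += f"\nDISTKEY ({distkey})"
    return ddl + ";"


if __name__ == "__main__":
    # A fact table distributed on its join column, and a small dimension copied to every node.
    print(render_create_table(
        "sales",
        [("sale_id", "BIGINT"), ("customer_id", "BIGINT")],
        diststyle="KEY",
        distkey="customer_id",
    ))
    print(render_create_table("dim_date", [("date_id", "INT")], diststyle="ALL"))
```

Validating the style and key before building the string keeps an invalid combination from reaching the warehouse, which is roughly what enabling the Distribution Keys selection only for KEY achieves in the editor.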