Analyzing How Your Redshift Cluster is Performing Queries and Storing Data

Useful Queries

Space Used – careful this shows total raw space – not nominal

Per Amazon – STV_Partitions / show the raw disk space which includes space that is reserved by Amazon Redshift for internal use, so it is larger than the nominal disk capacity, which is the amount of disk space available to the user. The Percentage of Disk Space Used metric on the Performance tab of the Amazon Redshift Management Console reports the percentage of nominal disk capacity used by your cluster. Definition of the word nominal in some dictionaries = Not In Reality.

In other words, the actual space is greater than the nominal space shown by Redshift – but DON’T go above the nominal – that is shown in the Redshift Console under performance – because it is bad JuJu 😉 and nominal space is disk space available to the  user.

This query shows raw – NOT nominal – I am still searching for a query that shows NOMINAL only.

select sum(capacity)/1024 as capacity_gbytes,
sum(used)/1024 as used_gbytes,
(sum(capacity) – sum(used))/1024 as free_gbytes
from stv_partitions
where part_begin=0;

Analyzing Table Design – this query is resource hog query – can return valuable info though

SELECT SCHEMA schemaname,
“table” tablename,
table_id tableid,
size size_in_mb,
CASE
WHEN diststyle NOT IN (‘EVEN’,’ALL’) THEN 1
ELSE 0
END has_dist_key,
CASE
WHEN sortkey1 IS NOT NULL THEN 1
ELSE 0
END has_sort_key,
CASE
WHEN encoded = ‘Y’ THEN 1
ELSE 0
END has_col_encoding,
CAST(max_blocks_per_slice – min_blocks_per_slice AS FLOAT) / GREATEST(NVL (min_blocks_per_slice,0)::int,1) ratio_skew_across_slices,
CAST(100*dist_slice AS FLOAT) /(SELECT COUNT(DISTINCT slice) FROM stv_slices) pct_slices_populated
FROM svv_table_info ti
JOIN (SELECT tbl,
MIN(c) min_blocks_per_slice,
MAX(c) max_blocks_per_slice,
COUNT(DISTINCT slice) dist_slice
FROM (SELECT b.tbl,
b.slice,
COUNT(*) AS c
FROM STV_BLOCKLIST b
GROUP BY b.tbl,
b.slice)
WHERE tbl IN (SELECT table_id FROM svv_table_info)
GROUP BY tbl) iq ON iq.tbl = ti.table_id;

Are There Tables That Need Vacuum For Unsorted Data?

# show tables where unsorted is signicant 10% in this case and the size is at least 500MB

SELECT “schema” + ‘.’ + “table”
FROM svv_table_info
where unsorted > 10
and size > 500
order by size;

Are There Tables That Need Vacuum For Deleted Data?

# trust stats
select
relname table_name,
total_rows,
visible_rows,
total_rows – visible_rows deleted_rows
from
(
select id, sum(rows) as total_rows from pg_catalog.stv_tbl_perm group by id
) a inner join
(
select relfilenode, relname, reltuples visible_rows from pg_catalog.pg_class
) b on a.id = b.relfilenode
order by
table_name

# don’t trust stats – careful – this guy is a resource hog – execute more selectively at your own risk
select
total_rows,
visible_rows,
total_rows – visible_rows as deleted_rows
from
(
select count(*) visible_rows from <your table name>
) a,
(
select sum(rows) total_rows from pg_catalog.stv_tbl_perm where name = ‘<your table name>’
) b

Monitor Redshift Storage via CloudWatch

Setup a CloudWatch alert at a certain average space used threshold.

Redshift’s GitHub With Utility Queries and Views

https://github.com/awslabs/amazon-redshift-utils

https://github.com/awslabs/amazon-redshift-utils/tree/master/src/AdminScripts

https://github.com/awslabs/amazon-redshift-utils/tree/master/src/AdminViews

System Tables And Views

Complete of Catalog / System Tables

https://docs.aws.amazon.com/redshift/latest/dg/cm_chap_system-tables.html

List of log tables without description is here

https://docs.aws.amazon.com/redshift/latest/dg/c_intro_STL_tables.html

Important system tables and views with a brief description

SVV_Table_Info – A broad range of information on all tables, with performance data – this view is a join of many other system views / tables – a VERY valuable view

STV_Partitions – Important view that captures performance data and raw space available per node and slice – my queries show one record per 2 slices on dc2.8xlarge

STV_InFlight – shows running queries – select * from stv_inflight; with a subsequent – cancel <PID> will kill a running query – careful…

STL_QUERY – Returns execution information about a database query.

STL_QUERYTEXT – Returns the text of a recently run query – circular queue containing a limited number of rows – likely about 7 to 14 days

From AWS – regarding the two STL_QUERY views above:

The STL_QUERY and STL_QUERYTEXT tables only contain information about queries, not other utility and DDL commands. For a listing and information on all statements executed by Amazon Redshift, you can also query the STL_DDLTEXT and STL_UTILITYTEXT tables. For a complete listing of all statements executed by Amazon Redshift, you can query the SVL_STATEMENTTEXT view.

STV_Slices – small table – one row per slice – outer joins

STV_BlockList – Very Large Table – careful with this – just do aggregations – and query it infrequently / carefully as this beast is monsterous if you have a large database – one record for each 1MB block

STL_VACUUM – Displays row and block statistics for tables that have been vacuumed.

SVL_QLog – Log view of everything run on the cluster – qualify by start date etc

STL_QUERY – Returns execution information about a database query.

stl_explain

stl_alert_event_log – Records an alert when the query optimizer identifies conditions that might indicate performance issues. Use the STL_ALERT_EVENT_LOG table to identify opportunities to improve query performance.

Amazon’s Github For Redshift Utilities

https://github.com/awslabs/amazon-redshift-utils/tree/master/src/AnalyzeVacuumUtility

Per Amazon:

This Utility Analyzes and Vacuums table(s) in a Redshift Database schema, based on certain parameters like unsorted, stats off and size of the table and system alerts from stl_explain & stl_alert_event_log. By turning on/off ‘–analyze-flag' and ‘–vacuum-flag’ parameters, you can run it as ‘vacuum-only’ or ‘analyze-only’ utility. This script can be scheduled to run VACUUM and ANALYZE as part of regular maintenance/housekeeping activities, when there are fewer database activities.

 

Github – addressing stv_partitions and slice information

https://github.com/awsdocs/amazon-redshift-developer-guide/blob/master/doc_source/r_STV_PARTITIONS.md

Redshift Top 14 Performance Tuning Techiques Reference To Another Blog

Top 14 Performance Tuning Techniques for Amazon Redshift

Keywords:

Disk Full Error – you can get this on a per node basis

stv_partitions (system view or table)

More from LonzoDB on AWS

Leave a Reply

Your email address will not be published. Required fields are marked *