Ejemplo de plan de consulta - Amazon Redshift

Ejemplo de plan de consulta

En este ejemplo, se muestra cómo evaluar un plan de consulta para encontrar oportunidades y optimización de las distribuciones.

Ejecute la siguiente consulta con un comando EXPLAIN para producir un plan de consulta.

explain select lastname, catname, venuename, venuecity, venuestate, eventname, month, sum(pricepaid) as buyercost, max(totalprice) as maxtotalprice from category join event on category.catid = event.catid join venue on venue.venueid = event.venueid join sales on sales.eventid = event.eventid join listing on sales.listid = listing.listid join date on sales.dateid = date.dateid join users on users.userid = sales.buyerid group by lastname, catname, venuename, venuecity, venuestate, eventname, month having sum(pricepaid)>9999 order by catname, buyercost desc;

En la base de datos TICKIT, SALES es una tabla de hechos y LISTING la tabla de mayor dimensión. Para poder ubicar juntas las tablas, SALES se distribuye en LISTID, que es la clave externa para LISTING, y LISTING se distribuye en su clave principal, LISTID. En el siguiente ejemplo, se muestran los comandos CREATE TABLE para SALES y LISTING.

create table sales( salesid integer not null, listid integer not null distkey, sellerid integer not null, buyerid integer not null, eventid integer not null encode mostly16, dateid smallint not null, qtysold smallint not null encode mostly8, pricepaid decimal(8,2) encode delta32k, commission decimal(8,2) encode delta32k, saletime timestamp, primary key(salesid), foreign key(listid) references listing(listid), foreign key(sellerid) references users(userid), foreign key(buyerid) references users(userid), foreign key(dateid) references date(dateid)) sortkey(listid,sellerid); create table listing( listid integer not null distkey sortkey, sellerid integer not null, eventid integer not null encode mostly16, dateid smallint not null, numtickets smallint not null encode mostly8, priceperticket decimal(8,2) encode bytedict, totalprice decimal(8,2) encode mostly32, listtime timestamp, primary key(listid), foreign key(sellerid) references users(userid), foreign key(eventid) references event(eventid), foreign key(dateid) references date(dateid));

En el siguiente plan de consulta, el paso de combinación de fusión para la combinación de SALES y LISTING muestra la etiqueta DS_DIST_NONE, que indica que no es necesario redistribuir en este paso. No obstante, si avanzamos en el plan de consulta, las otras combinaciones internas muestran la etiqueta DS_BCAST_INNER, que indica que la tabla interna se difunde como parte de la ejecución de la consulta. Como con la distribución de clave solo se pueden ubicar juntas un par de tablas, se deben volver a difundir cinco tablas.

QUERY PLAN XN Merge (cost=1015345167117.54..1015345167544.46 rows=1000 width=103) Merge Key: category.catname, sum(sales.pricepaid) -> XN Network (cost=1015345167117.54..1015345167544.46 rows=170771 width=103) Send to leader -> XN Sort (cost=1015345167117.54..1015345167544.46 rows=170771 width=103) Sort Key: category.catname, sum(sales.pricepaid) -> XN HashAggregate (cost=15345150568.37..15345152276.08 rows=170771 width=103) Filter: (sum(pricepaid) > 9999.00) -> XN Hash Join DS_BCAST_INNER (cost=742.08..15345146299.10 rows=170771 width=103) Hash Cond: ("outer".catid = "inner".catid) -> XN Hash Join DS_BCAST_INNER (cost=741.94..15342942456.61 rows=170771 width=97) Hash Cond: ("outer".dateid = "inner".dateid) -> XN Hash Join DS_BCAST_INNER (cost=737.38..15269938609.81 rows=170766 width=90) Hash Cond: ("outer".buyerid = "inner".userid) -> XN Hash Join DS_BCAST_INNER (cost=112.50..3272334142.59 rows=170771 width=84) Hash Cond: ("outer".venueid = "inner".venueid) -> XN Hash Join DS_BCAST_INNER (cost=109.98..3167290276.71 rows=172456 width=47) Hash Cond: ("outer".eventid = "inner".eventid) -> XN Merge Join DS_DIST_NONE (cost=0.00..6286.47 rows=172456 width=30) Merge Cond: ("outer".listid = "inner".listid) -> XN Seq Scan on listing (cost=0.00..1924.97 rows=192497 width=14) -> XN Seq Scan on sales (cost=0.00..1724.56 rows=172456 width=24) -> XN Hash (cost=87.98..87.98 rows=8798 width=25) -> XN Seq Scan on event (cost=0.00..87.98 rows=8798 width=25) -> XN Hash (cost=2.02..2.02 rows=202 width=41) -> XN Seq Scan on venue (cost=0.00..2.02 rows=202 width=41) -> XN Hash (cost=499.90..499.90 rows=49990 width=14) -> XN Seq Scan on users (cost=0.00..499.90 rows=49990 width=14) -> XN Hash (cost=3.65..3.65 rows=365 width=11) -> XN Seq Scan on date (cost=0.00..3.65 rows=365 width=11) -> XN Hash (cost=0.11..0.11 rows=11 width=10) -> XN Seq Scan on category (cost=0.00..0.11 rows=11 width=10)

Una solución es modificar las tablas para que tengan DISTSTYLE ALL.

ALTER TABLE users ALTER DISTSTYLE ALL; ALTER TABLE venue ALTER DISTSTYLE ALL; ALTER TABLE category ALTER DISTSTYLE ALL; ALTER TABLE date ALTER DISTSTYLE ALL; ALTER TABLE event ALTER DISTSTYLE ALL;

Ejecute nuevamente la misma consulta con el comando EXPLAIN y examine el plan de consulta nuevo. Las combinaciones ahora muestran la etiqueta DS_DIST_ALL_NONE, lo que indica que no es necesario redistribuir porque ya se distribuyeron los datos a cada nodo con el atributo DISTSTYLE ALL.

QUERY PLAN XN Merge (cost=1000000047117.54..1000000047544.46 rows=1000 width=103) Merge Key: category.catname, sum(sales.pricepaid) -> XN Network (cost=1000000047117.54..1000000047544.46 rows=170771 width=103) Send to leader -> XN Sort (cost=1000000047117.54..1000000047544.46 rows=170771 width=103) Sort Key: category.catname, sum(sales.pricepaid) -> XN HashAggregate (cost=30568.37..32276.08 rows=170771 width=103) Filter: (sum(pricepaid) > 9999.00) -> XN Hash Join DS_DIST_ALL_NONE (cost=742.08..26299.10 rows=170771 width=103) Hash Cond: ("outer".buyerid = "inner".userid) -> XN Hash Join DS_DIST_ALL_NONE (cost=117.20..21831.99 rows=170766 width=97) Hash Cond: ("outer".dateid = "inner".dateid) -> XN Hash Join DS_DIST_ALL_NONE (cost=112.64..17985.08 rows=170771 width=90) Hash Cond: ("outer".catid = "inner".catid) -> XN Hash Join DS_DIST_ALL_NONE (cost=112.50..14142.59 rows=170771 width=84) Hash Cond: ("outer".venueid = "inner".venueid) -> XN Hash Join DS_DIST_ALL_NONE (cost=109.98..10276.71 rows=172456 width=47) Hash Cond: ("outer".eventid = "inner".eventid) -> XN Merge Join DS_DIST_NONE (cost=0.00..6286.47 rows=172456 width=30) Merge Cond: ("outer".listid = "inner".listid) -> XN Seq Scan on listing (cost=0.00..1924.97 rows=192497 width=14) -> XN Seq Scan on sales (cost=0.00..1724.56 rows=172456 width=24) -> XN Hash (cost=87.98..87.98 rows=8798 width=25) -> XN Seq Scan on event (cost=0.00..87.98 rows=8798 width=25) -> XN Hash (cost=2.02..2.02 rows=202 width=41) -> XN Seq Scan on venue (cost=0.00..2.02 rows=202 width=41) -> XN Hash (cost=0.11..0.11 rows=11 width=10) -> XN Seq Scan on category (cost=0.00..0.11 rows=11 width=10) -> XN Hash (cost=3.65..3.65 rows=365 width=11) -> XN Seq Scan on date (cost=0.00..3.65 rows=365 width=11) -> XN Hash (cost=499.90..499.90 rows=49990 width=14) -> XN Seq Scan on users (cost=0.00..499.90 rows=49990 width=14)