Use the Spark EXPLAIN statement to troubleshoot Spark SQL

You can use the Spark EXPLAIN statement to troubleshoot your Spark SQL code. Passing True to explain() prints the parsed, analyzed, and optimized logical plans in addition to the physical plan. The following code samples and output demonstrate this usage.

Example - Spark SELECT statement
spark.sql("select * from select_taxi_table").explain(True)

Output

Calculation started (calculation_id=20c1ebd0-1ccf-ef14-db35-7c1844876a7e) in (session=24c1ebcb-57a8-861e-1023-736f5ae55386). Checking calculation status...
Calculation completed.
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation [select_taxi_table], [], false

== Analyzed Logical Plan ==
VendorID: bigint, passenger_count: bigint, count: bigint
Project [VendorID#202L, passenger_count#203L, count#204L]
+- SubqueryAlias spark_catalog.spark_demo_database.select_taxi_table
   +- Relation spark_demo_database.select_taxi_table[VendorID#202L,passenger_count#203L,count#204L] csv

== Optimized Logical Plan ==
Relation spark_demo_database.select_taxi_table[VendorID#202L,passenger_count#203L,count#204L] csv

== Physical Plan ==
FileScan csv spark_demo_database.select_taxi_table[VendorID#202L,passenger_count#203L,count#204L] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths)[s3://amzn-s3-demo-bucket/select_taxi], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<VendorID:bigint,passenger_count:bigint,count:bigint>
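
EXPLAIN is also available as a Spark SQL statement, so you can inspect a query plan without calling the DataFrame method. A minimal sketch, assuming the same select_taxi_table from the preceding example is available in your session:

# EXPLAIN EXTENDED as a SQL statement returns the same plan stages
# as .explain(True); show(truncate=False) keeps the plan text readable.
spark.sql("EXPLAIN EXTENDED SELECT * FROM select_taxi_table").show(truncate=False)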
Example - Spark DataFrame

The following example code shows how to use EXPLAIN with a Spark DataFrame.

taxi1_df = taxi_df.groupBy("VendorID", "passenger_count").count()
taxi1_df.explain("extended")

Output

Calculation started (calculation_id=d2c1ebd1-f9f0-db25-8477-3effc001b309) in (session=24c1ebcb-57a8-861e-1023-736f5ae55386). Checking calculation status...
Calculation completed.
== Parsed Logical Plan ==
'Aggregate ['VendorID, 'passenger_count], ['VendorID, 'passenger_count, count(1) AS count#321L]
+- Relation [VendorID#49L,tpep_pickup_datetime#50,tpep_dropoff_datetime#51,passenger_count#52L,trip_distance#53,RatecodeID#54L,store_and_fwd_flag#55,PULocationID#56L,DOLocationID#57L,payment_type#58L,fare_amount#59,extra#60,mta_tax#61,tip_amount#62,tolls_amount#63,improvement_surcharge#64,total_amount#65,congestion_surcharge#66,airport_fee#67] parquet

== Analyzed Logical Plan ==
VendorID: bigint, passenger_count: bigint, count: bigint
Aggregate [VendorID#49L, passenger_count#52L], [VendorID#49L, passenger_count#52L, count(1) AS count#321L]
+- Relation [VendorID#49L,tpep_pickup_datetime#50,tpep_dropoff_datetime#51,passenger_count#52L,trip_distance#53,RatecodeID#54L,store_and_fwd_flag#55,PULocationID#56L,DOLocationID#57L,payment_type#58L,fare_amount#59,extra#60,mta_tax#61,tip_amount#62,tolls_amount#63,improvement_surcharge#64,total_amount#65,congestion_surcharge#66,airport_fee#67] parquet

== Optimized Logical Plan ==
Aggregate [VendorID#49L, passenger_count#52L], [VendorID#49L, passenger_count#52L, count(1) AS count#321L]
+- Project [VendorID#49L, passenger_count#52L]
   +- Relation [VendorID#49L,tpep_pickup_datetime#50,tpep_dropoff_datetime#51,passenger_count#52L,trip_distance#53,RatecodeID#54L,store_and_fwd_flag#55,PULocationID#56L,DOLocationID#57L,payment_type#58L,fare_amount#59,extra#60,mta_tax#61,tip_amount#62,tolls_amount#63,improvement_surcharge#64,total_amount#65,congestion_surcharge#66,airport_fee#67] parquet

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[VendorID#49L, passenger_count#52L], functions=[count(1)], output=[VendorID#49L, passenger_count#52L, count#321L])
   +- Exchange hashpartitioning(VendorID#49L, passenger_count#52L, 1000), ENSURE_REQUIREMENTS, [id=#531]
      +- HashAggregate(keys=[VendorID#49L, passenger_count#52L], functions=[partial_count(1)], output=[VendorID#49L, passenger_count#52L, count#326L])
         +- FileScan parquet [VendorID#49L,passenger_count#52L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[s3://amzn-s3-demo-bucket/notebooks/yellow_tripdata_2016-01.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<VendorID:bigint,passenger_count:bigint>
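
In Spark 3.0 and later, explain() also accepts a mode argument that controls how much of the plan is printed. A minimal sketch of the common modes, assuming the taxi1_df DataFrame from the preceding example:

# "simple" prints only the physical plan (the default behavior).
taxi1_df.explain(mode="simple")

# "formatted" splits the physical plan into an outline plus per-node details,
# which is often easier to read for large plans.
taxi1_df.explain(mode="formatted")

# "cost" includes plan statistics, such as size estimates, where available.
taxi1_df.explain(mode="cost")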