Using partitioning hints in Spark 3.0.0
Spark provides the partitioning hints COALESCE, REPARTITION, and REPARTITION_BY_RANGE.
These hints are similar to the Dataset APIs coalesce, repartition, and repartitionByRange. The following hints let you control the number of output files in Spark SQL, which helps you tune performance:
- Coalesce - Reduce the number of partitions to the specified number of partitions. A partition number is the only parameter of the COALESCE hint.
- Repartition - Repartition to the specified number of partitions by using the specified partitioning expressions. The REPARTITION hint parameters are a partition number, column names, or both.
- Repartition by range - Repartition to the specified number of partitions by using the specified partitioning expressions. Column names are a required parameter for the REPARTITION_BY_RANGE hint, and a partition number is optional.
- Rebalance - Rebalance the query result output partitions so that every partition is a reasonable size. The REBALANCE hint parameters are an initial partition number, column names, both, or neither.
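
As a minimal sketch, the hints above can be written in Spark SQL as follows. The table name t and the column name c are hypothetical placeholders, and the partition count 3 is arbitrary. The REBALANCE lines assume a Spark release that supports that hint, which is later than 3.0.0.

```sql
-- Assumed for illustration: a table named t with a column named c.

-- COALESCE takes only a partition number.
SELECT /*+ COALESCE(3) */ * FROM t;

-- REPARTITION takes a partition number, column names, or both.
SELECT /*+ REPARTITION(3) */ * FROM t;
SELECT /*+ REPARTITION(c) */ * FROM t;
SELECT /*+ REPARTITION(3, c) */ * FROM t;

-- REPARTITION_BY_RANGE requires column names; the partition number is optional.
SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t;
SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t;

-- REBALANCE accepts an initial partition number, column names, both, or neither.
SELECT /*+ REBALANCE */ * FROM t;
SELECT /*+ REBALANCE(3) */ * FROM t;
SELECT /*+ REBALANCE(c) */ * FROM t;
SELECT /*+ REBALANCE(3, c) */ * FROM t;
```

In each statement the hint sits in a /*+ ... */ comment directly after SELECT. When a partition number and column names are both given, the number comes first.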