BUCKETIZATION - AWS Glue DataBrew

BUCKETIZATION

Bucketization (called Binning in the console) takes the items in a column of numeric values, groups them into bins defined by numeric ranges, and outputs a new column that displays the bin for each row. Bucketization can be done using splits or percentage. The first example below uses splits and the second example uses a percentage.

Parameters
  • sourceColumn – The name of an existing column.

    targetColumn – The name of the new column to be created.

    bucketNames – List of bucket names.

    splits – List of bucket levels. Buckets are consecutive, and an upper bound for a bucket will be a lower bound for the next bucket.

    percentage – Each bucket will be described as a percentage.

Example using splits

{ "Action": { "Operation": "BUCKETIZATION", "Parameters": { "sourceColumn": "level", "targetColumn": "bin", "bucketNames": "[\"Bin1\",\"Bin2\",\"Bin3\"]", "splits": "[\"-Infinity\",\"2\",\"20\",\"Infinity\"]" } } }
Example using a percentage
{ "Action": { "Operation": "BUCKETIZATION", "Parameters": { "sourceColumn": "level", "targetColumn": "bin", "bucketNames": "[\"Bin1\",\"Bin2\"]", "percentage": "50" } } }