Data science recipe steps - AWS Glue DataBrew

Data science recipe steps

Use these recipe steps to tabulate and summarize data from different perspectives, or to perform advanced transformations.

SCALE

Scales or normalizes the range of data in a numeric column.

Parameters
  • sourceColumn – The name of an existing column.

  • strategy – The operation to be applied to the column values:

    • MIN_MAX – Rescales the values into a range of [0,1]

    • SCALE_BETWEEN – Rescales the values into a range of 2 specified values.

    • MEAN_NORMALIZATION – Rescales the data to have a mean (μ) of 0 and standard deviation (σ) of 1 within a range of [-1, 1]

    • Z_SCORE – Linearly scale data values to have a mean (μ) of 0 and standard deviation (σ) of 1. Best for handling outliers.

  • targetColumn – The name of a column to contain the results.

Example

{ "Action": { "Operation": "NORMALIZATION", "Parameters": { "sourceColumn": "all_votes", "strategy": "MIN_MAX", "targetColumn": "all_votes_normalized" } } }