StandardDeviation - AWS Glue

StandardDeviation

Checks the standard deviation of all of the values in a column against a given expression.

Syntax

StandardDeviation <COL_NAME> <EXPRESSION>
  • COL_NAME – The name of the column that you want to evaluate the data quality rule against.

    Supported column types: Byte, Decimal, Double, Float, Integer, Long, Short

  • EXPRESSION – An expression to run against the rule type response in order to produce a Boolean value. For more information, see Expressions.

Example: Standard deviation

The following example rules check whether the standard deviation of the values in a column is less than a specified value. The second rule evaluates only the rows that match its where clause (Customer_ID < 10).

StandardDeviation "Star_Rating" < 1.5
StandardDeviation "Salary" < 3500 where "Customer_ID < 10"

Sample dynamic rules

  • StandardDeviation "colA" > avg(last(10)) + 0.1

  • StandardDeviation "colA" between min(last(10)) - 1 and max(last(10)) + 1
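The two dynamic rules above compare the current standard deviation with thresholds derived from earlier results. The following Python sketch shows one way to read that arithmetic. It is not AWS Glue code: the function names and the history list are hypothetical, last(10) is modeled as the ten most recent entries of a non-empty list of earlier results, and the between comparison is treated as exclusive, which is an assumption.

```python
from statistics import mean


def passes_greater_than_avg(current, history, k=10, offset=0.1):
    """Model of: StandardDeviation "colA" > avg(last(k)) + offset."""
    recent = history[-k:]  # the k most recent earlier results (assumed non-empty)
    return current > mean(recent) + offset


def passes_between_min_max(current, history, k=10, margin=1):
    """Model of: ... between min(last(k)) - margin and max(last(k)) + margin."""
    recent = history[-k:]  # the k most recent earlier results (assumed non-empty)
    # Bounds are treated as exclusive here; that is an assumption of this sketch.
    return min(recent) - margin < current < max(recent) + margin


# Hypothetical standard deviations recorded by earlier runs, oldest first.
history = [10.0, 12.0, 14.0]

print(passes_greater_than_avg(13.0, history))  # 13.0 > 12.0 + 0.1
print(passes_between_min_max(13.0, history))   # 9.0 < 13.0 < 15.0
```

With fewer than ten earlier results, the slice simply uses whatever history is available; how the service itself handles a short history is not covered by this sketch.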

Null behavior

The StandardDeviation rule ignores rows with NULL values when it calculates the standard deviation. For example:

+---+-----------+-----------+
|id |units1     |units2     |
+---+-----------+-----------+
|100|0          |0          |
|101|null       |0          |
|102|20         |20         |
|103|null       |0          |
|104|40         |40         |
+---+-----------+-----------+

The standard deviation of column units1 does not take rows 101 and 103 into account and evaluates to 16.33. The standard deviation of column units2 evaluates to 16.
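The figures above can be reproduced with a short Python sketch that drops NULL values before computing the standard deviation. The helper name std_ignoring_nulls is hypothetical, and the use of the population standard deviation (statistics.pstdev) is an inference from the stated results rather than something the rule description spells out.

```python
from statistics import pstdev


def std_ignoring_nulls(values):
    """Population standard deviation of the non-null entries."""
    non_null = [v for v in values if v is not None]  # skip NULL rows
    return pstdev(non_null)


# Column values from the table above; None stands in for NULL.
units1 = [0, None, 20, None, 40]
units2 = [0, 0, 20, 0, 40]

print(round(std_ignoring_nulls(units1), 2))  # 16.33
print(round(std_ignoring_nulls(units2), 2))  # 16.0
```

Only three values contribute to the result for units1, while all five contribute for units2, which is why the two columns differ even though their non-zero values match.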