RESCALE_OUTLIERS_WITH_SKEW
Returns a new column with a rescaled outlier value in each row, based on the settings in the parameters. This action works to reduce distribution skewness by applying the specified log or root transform. We recommend this action for handling skewed data.
Parameters
- sourceColumn – Specifies the name of an existing numeric column that might contain outliers.
- targetColumn – Specifies the name of a new column where the rescaled values are to be inserted.
- outlierStrategy – Specifies the approach to use in detecting outliers. Valid values include the following:
  - Z_SCORE – Identifies a value as an outlier when it deviates from the mean by more than the standard deviation threshold.
  - MODIFIED_Z_SCORE – Identifies a value as an outlier when it deviates from the median by more than the median absolute deviation threshold.
  - IQR – Identifies a value as an outlier when it falls below the first quartile or above the last quartile of the column data. The interquartile range (IQR) measures where the middle 50% of the data points are.
- threshold – Specifies the threshold value to use when detecting outliers. The sourceColumn value is identified as an outlier if the score that's calculated with the outlierStrategy exceeds this number. The default is 3.
- skewFunction – Specifies the method to use when replacing outliers. Valid values include the following:
  - LOG – Applies a strong transformation to reduce positive and negative skew. This is the natural logarithm (base e, approximately 2.718281828).
  - ROOT (with value = 3) – Applies a fairly strong transformation to reduce positive and negative skew. (Cube root)
  - ROOT (with value = 2) – Applies a moderate transformation to reduce positive skew only. (Square root)
  - SQUARE – Applies a moderate transformation to reduce negative skew. (Square)
  - Custom transform – Applies the specified LOG or ROOT transform using the custom number provided in the value parameter.
- value – Specifies the value to use for the custom transform. If skewFunction is LOG, this value represents the base of the log. If skewFunction is ROOT, this value represents the power of the root.
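To make the interaction between the parameters concrete, the following is a minimal Python sketch of the behavior described above. It is not the service's implementation. The function names (outlier_mask, skew_transform, rescale_outliers_with_skew) are hypothetical, and several details are assumptions that this page does not confirm: the Z_SCORE strategy uses the population standard deviation, the MODIFIED_Z_SCORE strategy uses the conventional 0.6745 scaling constant, the IQR strategy treats threshold as a multiplier of the interquartile range, and only values flagged as outliers are transformed while all other values are copied unchanged.

```python
import math
import statistics


def outlier_mask(values, strategy="Z_SCORE", threshold=3.0):
    """Return one boolean per value; True marks a detected outlier."""
    if strategy == "Z_SCORE":
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)  # assumption: population std dev
        if std == 0:
            return [False] * len(values)
        return [abs(v - mean) / std > threshold for v in values]
    if strategy == "MODIFIED_Z_SCORE":
        med = statistics.median(values)
        mad = statistics.median([abs(v - med) for v in values])
        if mad == 0:
            return [False] * len(values)
        # assumption: conventional 0.6745 scaling of the MAD-based score
        return [0.6745 * abs(v - med) / mad > threshold for v in values]
    if strategy == "IQR":
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        # assumption: threshold is a multiplier of the IQR
        low, high = q1 - threshold * iqr, q3 + threshold * iqr
        return [v < low or v > high for v in values]
    raise ValueError(f"unknown outlierStrategy: {strategy}")


def skew_transform(v, skew_function="LOG", value=math.e):
    """Apply the chosen transform to a single number."""
    if skew_function == "LOG":
        return math.log(v, value)  # raises ValueError when v <= 0
    if skew_function == "ROOT":
        if v < 0:
            if value % 2 != 1:
                raise ValueError("this root is undefined for negative input")
            return -((-v) ** (1.0 / value))  # odd root of a negative number
        return v ** (1.0 / value)
    if skew_function == "SQUARE":
        return v * v
    raise ValueError(f"unknown skewFunction: {skew_function}")


def rescale_outliers_with_skew(values, strategy="Z_SCORE", threshold=3.0,
                               skew_function="LOG", value=math.e):
    """Build the target column: transform flagged values, copy the rest."""
    mask = outlier_mask(values, strategy, threshold)
    return [skew_transform(v, skew_function, value) if flagged else v
            for v, flagged in zip(values, mask)]


source_column = [10.0] * 20 + [1000.0]
target_column = rescale_outliers_with_skew(
    source_column, strategy="Z_SCORE", threshold=3.0,
    skew_function="LOG", value=10)
print(target_column)
```

In this sketch, only the final value of source_column deviates from the mean by more than three standard deviations, so it is the only value replaced by its base-10 logarithm.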
The following examples display syntax for a single RecipeAction operation. A recipe contains at least one RecipeStep operation, and a recipe step contains at least one recipe action. A recipe action runs the data transform that you specify. A group of recipe actions runs in sequential order to create the final dataset.