TOKENIZATION

Splits text into smaller units, or tokens, such as individual words or terms.
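As a rough illustration of the idea only, and not of how DataBrew implements the transform, a minimal Python tokenizer could treat each run of letters, digits, and apostrophes as one token. The function name `tokenize` and the regular expression are assumptions made for this sketch.

```python
import re


def tokenize(text: str) -> list:
    """Split text into word tokens: runs of letters, digits, and apostrophes.

    Hypothetical sketch; DataBrew's own token boundaries may differ.
    """
    return re.findall(r"[A-Za-z0-9']+", text)


print(tokenize("The quick brown fox doesn't jump."))
```

Punctuation such as the final period is dropped, and the apostrophe keeps "doesn't" together as a single token.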

Parameters

sourceColumn – The name of an existing column.
delimiter – A custom delimiter to place between tokens in the output. By default, tokens are separated by a space.
expandContractions – If ENABLED, expands contracted words. For example, "don't" becomes "do not".
stemmingMode – Reduces each token to its lowercase stem, or root form. Two stemming algorithms are available: PORTER | LANCASTER.
stopWordRemovalMode – Removes common words such as "a", "an", and "the".
customStopWords – Used with stopWordRemovalMode to specify a custom list of stop words.
targetColumn – The name of a column to contain the results.
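The following Python sketch approximates how delimiter, expandContractions, stopWordRemovalMode, and customStopWords could combine for a single cell value. It rests on several assumptions that are not taken from DataBrew: the contraction map and default stop-word list are tiny hypothetical stand-ins, text is lowercased before tokenizing, a custom list replaces the defaults rather than extending them, and the order of the steps is a guess. Stemming is omitted because the PORTER and LANCASTER algorithms are too long for a short sketch.

```python
import re
from typing import List, Optional

# Hypothetical, deliberately tiny contraction map (not DataBrew's).
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "it's": "it is"}
# Hypothetical default stop-word list (not DataBrew's DEFAULT list).
DEFAULT_STOP_WORDS = {"a", "an", "the", "and", "of"}


def tokenize_column_value(
    text: str,
    delimiter: str = " ",
    expand_contractions: bool = False,
    remove_stop_words: bool = False,
    custom_stop_words: Optional[List[str]] = None,
) -> str:
    """Approximate the TOKENIZATION parameters for one value (no stemming)."""
    # Assumption: tokens are lowercased so stop-word matching is case-insensitive.
    tokens = re.findall(r"[a-z0-9']+", text.lower())

    if expand_contractions:
        expanded: List[str] = []
        for tok in tokens:
            # Replace a known contraction with its words; keep other tokens as-is.
            expanded.extend(CONTRACTIONS.get(tok, tok).split())
        tokens = expanded

    if remove_stop_words:
        # Assumption: a custom list replaces the default list entirely.
        if custom_stop_words:
            stop = {w.lower() for w in custom_stop_words}
        else:
            stop = DEFAULT_STOP_WORDS
        tokens = [tok for tok in tokens if tok not in stop]

    # Join the surviving tokens with the requested delimiter (space by default).
    return delimiter.join(tokens)


print(tokenize_column_value(
    "Don't remove the box of a cat",
    delimiter="- ",
    expand_contractions=True,
    remove_stop_words=True,
))
```

With the "- " delimiter from the example below, the call above yields "do- not- remove- box- cat".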

Example


{
    "Action": {
        "Operation": "TOKENIZATION",
        "Parameters": {
            "customStopWords": "[]",
            "delimiter": "- ",
            "expandContractions": "ENABLED",
            "sourceColumn": "dimensions",
            "stemmingMode": "PORTER",
            "stopWordRemovalMode": "DEFAULT",
            "targetColumn": "dimensions_tokenized"
        }
    }
}
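In the example, every parameter value is a string, including customStopWords, which holds a JSON array encoded as a string ("[]"). The Python sketch below builds such a step programmatically; the helper name `tokenization_step` and its argument names are hypothetical. A dictionary of this shape could, for example, be placed in the `Steps` list passed to the DataBrew `CreateRecipe` API, though that call is not shown here.

```python
import json
from typing import List, Optional

VALID_STEMMING_MODES = {"PORTER", "LANCASTER"}


def tokenization_step(
    source_column: str,
    target_column: str,
    delimiter: str = " ",
    expand_contractions: str = "ENABLED",
    stemming_mode: str = "PORTER",
    stop_word_removal_mode: str = "DEFAULT",
    custom_stop_words: Optional[List[str]] = None,
) -> dict:
    """Build one TOKENIZATION recipe step (hypothetical helper).

    All parameter values are strings, matching the example above.
    """
    if stemming_mode not in VALID_STEMMING_MODES:
        raise ValueError(f"stemmingMode must be one of {sorted(VALID_STEMMING_MODES)}")
    return {
        "Action": {
            "Operation": "TOKENIZATION",
            "Parameters": {
                # Encode the Python list as a JSON string, e.g. [] -> "[]".
                "customStopWords": json.dumps(custom_stop_words or []),
                "delimiter": delimiter,
                "expandContractions": expand_contractions,
                "sourceColumn": source_column,
                "stemmingMode": stemming_mode,
                "stopWordRemovalMode": stop_word_removal_mode,
                "targetColumn": target_column,
            },
        }
    }


step = tokenization_step("dimensions", "dimensions_tokenized", delimiter="- ")
print(json.dumps(step, indent=4))
```

Called with the column names and delimiter from the example, the helper produces the same structure shown above. Validating stemmingMode early surfaces a typo before the recipe is submitted.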
