How Step Functions parses input CSV files

Step Functions parses delimited text files according to the following rules:

The delimiter that separates fields is specified by CSVDelimiter in ReaderConfig. The delimiter defaults to COMMA.
A newline separates one record from the next.
Fields are treated as strings. For data type conversions, use the States.StringToJson intrinsic function in ItemSelector (Map).
Double quotation marks (") are not required to enclose strings. However, a string enclosed in double quotation marks can contain commas and newlines; inside the quotation marks, these characters do not act as field or record delimiters.
To preserve a literal double quotation mark, repeat it ("").
Backslashes (\) are another way to escape special characters. A backslash escapes only another backslash, a double quotation mark, or the configured field separator, such as a comma or pipe. A backslash followed by any other character is silently removed.
You can preserve backslashes by repeating them. For example:
```
path,size
C:\\Program Files\\MyApp.exe,6534512
```
Backslashes that escape double quotation marks (\") work only when included in pairs, so we recommend escaping double quotation marks by repeating them ("") instead.
If a row has fewer fields than the header, Step Functions provides empty strings for the missing values.
If a row has more fields than the header, Step Functions skips the additional fields.
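
The last two rules, which pad short rows and drop extra fields, can be sketched in Python as follows. This is a minimal illustration of the described behavior, not Step Functions code; the function name `align_row` and the sample headers are hypothetical.

```python
def align_row(headers: list[str], fields: list[str]) -> dict[str, str]:
    """Map one parsed row onto the header names.

    Short rows are padded with empty strings; extra fields are dropped.
    """
    # Multiplying a list by a negative number yields [], so long rows are not padded.
    padded = fields + [""] * (len(headers) - len(fields))
    # zip() stops at the shorter input, which discards fields beyond the header.
    return dict(zip(headers, padded))


short_row = align_row(["path", "size", "owner"], ["a.txt"])
long_row = align_row(["path", "size"], ["a.txt", "12", "extra"])
print(short_row)
print(long_row)
```

In this sketch, the short row gains empty strings for `size` and `owner`, and the long row loses its third field.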

Example of parsing an input CSV file

Suppose you have a CSV file named myCSVInput.csv that contains one row, and you've stored this file in an Amazon S3 bucket named amzn-s3-demo-bucket. The CSV file is as follows.


```
abc,123,"This string contains commas, a double quote (""), and a newline (
)",{""MyKey"":""MyValue""},"[1,2,3]"
```

The following state machine reads this CSV file and uses ItemSelector (Map) to convert the data types of some of the fields.


```
{
  "StartAt": "Map",
  "States": {
    "Map": {
      "Type": "Map",
      "ItemProcessor": {
        "ProcessorConfig": {
          "Mode": "DISTRIBUTED",
          "ExecutionType": "STANDARD"
        },
        "StartAt": "Pass",
        "States": {
          "Pass": {
            "Type": "Pass",
            "End": true
          }
        }
      },
      "End": true,
      "Label": "Map",
      "MaxConcurrency": 1000,
      "ItemReader": {
        "Resource": "arn:aws:states:::s3:getObject",
        "ReaderConfig": {
          "InputType": "CSV",
          "CSVHeaderLocation": "GIVEN",
          "CSVHeaders": [
            "MyLetters",
            "MyNumbers",
            "MyString",
            "MyObject",
            "MyArray"
          ]
        },
        "Parameters": {
          "Bucket": "amzn-s3-demo-bucket",
          "Key": "myCSVInput.csv"
        }
      },
      "ItemSelector": {
        "MyLetters.$": "$$.Map.Item.Value.MyLetters",
        "MyNumbers.$": "States.StringToJson($$.Map.Item.Value.MyNumbers)",
        "MyString.$": "$$.Map.Item.Value.MyString",
        "MyObject.$": "States.StringToJson($$.Map.Item.Value.MyObject)",
        "MyArray.$": "States.StringToJson($$.Map.Item.Value.MyArray)"
      }
    }
  }
}
```

When you run this state machine, it produces the following output.


```
[
  {
    "MyNumbers": 123,
    "MyObject": {
      "MyKey": "MyValue"
    },
    "MyString": "This string contains commas, a double quote (\"), and a newline (\n)",
    "MyLetters": "abc",
    "MyArray": [
      1,
      2,
      3
    ]
  }
]
```
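
To see why only some fields change type, the conversion can be approximated locally in Python. This is a rough sketch under stated assumptions: `json.loads` stands in for `States.StringToJson`, the `item` dictionary is a hand-written guess at the all-string values the parser would produce for the row, and the names `JSON_FIELDS` and `select_item` are hypothetical.

```python
import json

# Assumed result of parsing the CSV row: every field arrives as a string.
item = {
    "MyLetters": "abc",
    "MyNumbers": "123",
    "MyString": 'This string contains commas, a double quote ("), and a newline (\n)',
    "MyObject": '{"MyKey":"MyValue"}',
    "MyArray": "[1,2,3]",
}

# Fields that the ItemSelector passes through States.StringToJson.
JSON_FIELDS = {"MyNumbers", "MyObject", "MyArray"}


def select_item(row: dict[str, str]) -> dict[str, object]:
    """Decode the chosen fields as JSON and pass the rest through unchanged."""
    return {
        key: json.loads(value) if key in JSON_FIELDS else value
        for key, value in row.items()
    }


converted = select_item(item)
print(json.dumps(converted, indent=2))
```

Fields left out of `JSON_FIELDS`, such as `MyLetters` and `MyString`, stay strings, which mirrors the fields the ItemSelector references directly.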
