Menu
Amazon Kinesis Data Analytics
Developer Guide

Using the Schema Discovery Feature on Static Data

The schema discovery feature can generate a schema from either the data in a stream or data in a static file that is stored in an Amazon S3 bucket. Suppose that you want to generate a schema for a Kinesis Data Analytics application for reference purposes or when live streaming data isn't available. You can use the schema discovery feature on a static file that contains a sample of the data in the expected format of your streaming or reference data. Kinesis Data Analytics can run schema discovery on sample data from a JSON or CSV file that's stored in an Amazon S3 bucket. Using schema discovery on a data file uses the DiscoverInputSchema API with the S3Configuration parameter specified.

To run discovery on a static file, you provide the API with an S3Configuration structure with the following information:

Note

Generating a schema from a data file is currently not available in the AWS Management Console.

To generate a schema from an Amazon S3 object using the DiscoverInputSchema API

  1. Make sure that you have the AWS CLI set up. For more information, see Step 2: Set Up the AWS Command Line Interface (AWS CLI) in the Getting Started section.

  2. Create a file named data.csv with the following contents:

    year,month,state,producer_type,energy_source,units,consumption 2001,1,AK,TotalElectricPowerIndustry,Coal,ShortTons,47615 2001,1,AK,ElectricGeneratorsElectricUtilities,Coal,ShortTons,16535 2001,1,AK,CombinedHeatandPowerElectricPower,Coal,ShortTons,22890 2001,1,AL,TotalElectricPowerIndustry,Coal,ShortTons,3020601 2001,1,AL,ElectricGeneratorsElectricUtilities,Coal,ShortTons,2987681
  3. Sign in to the Amazon S3 console at https://console.aws.amazon.com/s3/.

  4. Create an Amazon S3 bucket and upload the data.csv file you created. Note the ARN of the created bucket. For information about creating an Amazon S3 bucket and uploading a file, see Getting Started with Amazon Simple Storage Service.

  5. Open the IAM console at https://console.aws.amazon.com/iam/. Create a role with the AmazonS3ReadOnlyAccess policy. Note the ARN of the new role. For information about creating a role, see Creating a Role to Delegate Permissions to an AWS Service. For information about how to add a policy to a role, see Modifying a Role.

  6. Run the following DiscoverInputSchema command in the AWS CLI, substituting the ARNs for your Amazon S3 bucket and IAM role:

    $aws kinesisanalytics discover-input-schema --s3-configuration '{ "RoleARN": "arn:aws:iam::123456789012:role/service-role/your-IAM-role", "BucketARN": "arn:aws:s3:::your-bucket-name", "FileKey": "data.csv" }'
  7. The response will look similar to the following:

    { "InputSchema": { "RecordEncoding": "UTF-8", "RecordColumns": [ { "SqlType": "INTEGER", "Name": "COL_year" }, { "SqlType": "INTEGER", "Name": "COL_month" }, { "SqlType": "VARCHAR(4)", "Name": "state" }, { "SqlType": "VARCHAR(64)", "Name": "producer_type" }, { "SqlType": "VARCHAR(4)", "Name": "energy_source" }, { "SqlType": "VARCHAR(16)", "Name": "units" }, { "SqlType": "INTEGER", "Name": "consumption" } ], "RecordFormat": { "RecordFormatType": "CSV", "MappingParameters": { "CSVMappingParameters": { "RecordRowDelimiter": "\r\n", "RecordColumnDelimiter": "," } } } }, "RawInputRecords": [ "year,month,state,producer_type,energy_source,units,consumption\r\n2001,1,AK,TotalElectricPowerIndustry,Coal,ShortTons,47615\r\n2001,1,AK,ElectricGeneratorsElectricUtilities,Coal,ShortTons,16535\r\n2001,1,AK,CombinedHeatandPowerElectricPower,Coal,ShortTons,22890\r\n2001,1,AL,TotalElectricPowerIndustry,Coal,ShortTons,3020601\r\n2001,1,AL,ElectricGeneratorsElectricUtilities,Coal,ShortTons,2987681" ], "ParsedInputRecords": [ [ null, null, "state", "producer_type", "energy_source", "units", null ], [ "2001", "1", "AK", "TotalElectricPowerIndustry", "Coal", "ShortTons", "47615" ], [ "2001", "1", "AK", "ElectricGeneratorsElectricUtilities", "Coal", "ShortTons", "16535" ], [ "2001", "1", "AK", "CombinedHeatandPowerElectricPower", "Coal", "ShortTons", "22890" ], [ "2001", "1", "AL", "TotalElectricPowerIndustry", "Coal", "ShortTons", "3020601" ], [ "2001", "1", "AL", "ElectricGeneratorsElectricUtilities", "Coal", "ShortTons", "2987681" ] ] }