SparkSQL - AWS Glue

SparkSQL

Specifies a transform where you enter a SQL query using Spark SQL syntax to transform the data. The output is a single DynamicFrame.

Contents

Inputs

The data inputs, identified by their node names. You can associate a table name with each input node (see SqlAliases) and use that name in the SQL query. The name you choose must meet the Spark SQL naming restrictions.

Type: Array of strings

Array Members: Minimum number of 1 item.

Pattern: [A-Za-z0-9_-]*

Required: Yes
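As a rough illustration of these constraints, the following Python sketch checks an Inputs array on the client side before it is sent. The function name validate_inputs is hypothetical and not part of any AWS SDK. The sketch also assumes that the pattern must match each entry in full.

```python
import re

# Pattern documented for each Inputs entry; fullmatch applies it to the whole string.
INPUT_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]*")


def validate_inputs(inputs: list[str]) -> list[str]:
    """Return a list of problems found in an Inputs array (empty if none)."""
    problems = []
    # Array Members: minimum number of 1 item.
    if len(inputs) < 1:
        problems.append("Inputs must contain at least 1 item")
    for name in inputs:
        if not INPUT_NAME_PATTERN.fullmatch(name):
            problems.append(f"invalid input node name: {name!r}")
    return problems


print(validate_inputs(["S3bucket_node1", "my node"]))
```

A name containing a space, such as "my node", is reported because a space is outside the character class [A-Za-z0-9_-].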

Name

The name of the transform node.

Type: String

Pattern: ([\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF]|[^\r\n])*

Required: Yes
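One reading of the Name pattern: its second alternative, [^\r\n], matches any character except a carriage return or a line feed. In effect, any single-line string satisfies the pattern. Rather than port the regular expression, the Python sketch below checks that equivalent condition directly. The helper name is_valid_node_name is hypothetical.

```python
def is_valid_node_name(name: str) -> bool:
    """True if the name contains no carriage return or line feed."""
    # Every character other than CR or LF is matched by the [^\r\n] alternative.
    return "\r" not in name and "\n" not in name


print(is_valid_node_name("SQL Query"))     # True
print(is_valid_node_name("line1\nline2"))  # False
```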

SqlAliases

A list of aliases. An alias specifies the name to use in the SQL query for a given input. For example, suppose you have a data source named "MyDataSource". If you specify From as MyDataSource and Alias as SqlName, then in your SQL you can write:

select * from SqlName

and that query reads data from MyDataSource.

Type: Array of SqlAlias objects

Required: Yes
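The From/Alias pairing described above can be sketched in Python as a small helper. This helper turns a mapping of input node names to SQL table names into a list of SqlAlias objects. It also rejects aliases whose From does not refer to one of the node's Inputs, which is an assumed consistency check rather than documented service behavior. The function name build_sql_aliases is hypothetical.

```python
def build_sql_aliases(inputs: list[str], aliases: dict[str, str]) -> list[dict[str, str]]:
    """Turn {input node name: SQL table name} into a list of SqlAlias objects.

    Raises ValueError if an alias refers to a node that is not in inputs.
    """
    result = []
    for node_name, sql_name in aliases.items():
        if node_name not in inputs:
            raise ValueError(f"{node_name!r} is not one of the node's Inputs")
        # Each SqlAlias pairs the input node (From) with the name used in SQL (Alias).
        result.append({"From": node_name, "Alias": sql_name})
    return result


aliases = build_sql_aliases(["MyDataSource"], {"MyDataSource": "SqlName"})
print(aliases)  # [{'From': 'MyDataSource', 'Alias': 'SqlName'}]
```

With this alias in place, the query select * from SqlName reads from the MyDataSource input.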

SqlQuery

A SQL query that must use Spark SQL syntax and return a single data set.

Type: String

Pattern: ([\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\s])*

Required: Yes
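The Python sketch below puts the required fields together into one SparkSQL structure and serializes it as JSON. The node name, the order_year column, and the query text are hypothetical. The character check is only an approximation of the SqlQuery pattern. It rejects control characters other than whitespace, while allowing line breaks because the pattern includes \s. When you create a job through the API, the structure would typically sit under a SparkSQL key in one entry of the job's CodeGenConfigurationNodes map. Treat that placement as an assumption and confirm it against the CreateJob reference.

```python
import json

# Whitespace characters that regex \s typically covers (space, tab, LF, VT, FF, CR).
SQL_WHITESPACE = set(" \t\n\x0b\f\r")


def query_chars_ok(query: str) -> bool:
    """Approximate the SqlQuery pattern: no control characters except whitespace."""
    return all(ch in SQL_WHITESPACE or ord(ch) >= 0x20 for ch in query)


spark_sql_node = {
    "Name": "Filter recent orders",  # hypothetical node name
    "Inputs": ["MyDataSource"],
    "SqlAliases": [{"From": "MyDataSource", "Alias": "SqlName"}],
    # Multi-line queries pass the check because newlines count as whitespace.
    "SqlQuery": "SELECT *\nFROM SqlName\nWHERE order_year >= 2023",
}

assert query_chars_ok(spark_sql_node["SqlQuery"])
print(json.dumps(spark_sql_node, indent=2))
```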

OutputSchemas

Specifies the data schema for the SparkSQL transform.

Type: Array of GlueSchema objects

Required: No
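If you choose to declare the output schema, a minimal Python sketch might build one GlueSchema object per output. The field names Columns, Name, and Type are assumed from the GlueSchema and GlueStudioSchemaColumn structures, so verify them against those reference pages. The helper glue_schema and the column names and type strings shown are hypothetical.

```python
def glue_schema(columns: list[tuple[str, str]]) -> dict:
    """Build one GlueSchema-style dict from (column name, type) pairs.

    The keys Columns/Name/Type are an assumption about the GlueSchema and
    GlueStudioSchemaColumn structures, not a confirmed definition.
    """
    return {"Columns": [{"Name": name, "Type": col_type} for name, col_type in columns]}


# OutputSchemas is an array, so wrap the single schema in a list.
output_schemas = [glue_schema([("order_id", "string"), ("order_year", "int")])]
print(output_schemas)
```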
