Menu
Amazon Simple Storage Service
API Reference (API Version 2006-03-01)

SELECT Object Content

Description

This operation filters the contents of an Amazon S3 object based on a simple structured query language (SQL) statement. In the request, along with the SQL expression, you must also specify a data serialization format (JSON or CSV) of the object. Amazon S3 uses this format to parse object data into records, and returns only records that match the specified SQL expression. You must also specify the data serialization format for the response.

For more information about Amazon S3 Select, see Selecting Content from Objects in the Amazon Simple Storage Service Developer Guide.

For more information about using SQL with Amazon S3 Select, see SQL Reference for Amazon S3 Select and Amazon Glacier Select in the Amazon Simple Storage Service Developer Guide.

Permissions

You must have s3:GetObject permission for this operation. Amazon S3 Select does not support anonymous access. For more information about permissions, see Specifying Permissions in a Policy in the Amazon Simple Storage Service Developer Guide.

Object Data Formats

You can use Amazon S3 Select to query objects that have the following format properties:

  • CSV and JSON – Objects must be in CSV or JSON format.

  • UTF-8 – UTF-8 is the only encoding type Amazon S3 Select supports.

  • GZIP or BZIP2 – CSV and JSON files can be compressed using GZIP or BZIP2. GZIP and BZIP2 are the only compression formats that Amazon S3 Select supports.

  • Server-side encryption – Amazon S3 Select supports querying objects that are protected with server-side encryption.

    For objects that are encrypted with customer-provided encryption keys (SSE-C), you must use HTTPS, and you must use the headers that are documented in the Specific Request Headers for Server-Side Encryption with Customer-Provided Encryption Keys section in the Amazon S3 GET Object REST API. For more information about SSE-C, see Server-Side Encryption (Using Customer-Provided Encryption Keys) in the Amazon Simple Storage Service Developer Guide.

    For objects that are encrypted with Amazon S3 managed encryption keys (SSE-S3) and AWS KMS managed encryption keys (SSE-KMS), server-side encryption is handled transparently, so you don't need to specify anything. For more information about server-side encryption, including SSE-S3 and SSE-KMS, see Protecting Data Using Server-Side Encryption in the Amazon Simple Storage Service Developer Guide.

Requests

Syntax

POST /ObjectName?select&select-type=2 HTTP/1.1 Host: BucketName.s3.amazonaws.com Date: date Authorization: authorization string (See Authenticating Requests (AWS Signature Version 4)) Request body goes here

Note

The syntax shows some of the request headers. For a complete list, see the "Request Headers" section of this topic.

Query parameters select and select-type-2 are both required for all requests. select-type=2 is present in order to enable extensions for future capabilities.

Request Parameters

This implementation of the operation does not use request parameters.

Request Headers

This implementation of the operation uses only request headers that are common to all operations. For more information, see Common Request Headers.

Request Body

The following XML shows the request body for an object in CSV format with results in CSV format:

<?xml version="1.0" encoding="UTF-8"?> <SelectRequest> <Expression>Select * from S3Object</Expression> <ExpressionType>SQL</ExpressionType> <InputSerialization> <CompressionType>GZIP</CompressionType> <CSV> <FileHeaderInfo>IGNORE</FileHeaderInfo> <RecordDelimiter>\n</RecordDelimiter> <FieldDelimiter>,</FieldDelimiter> <QuoteCharacter>"</QuoteCharacter> <QuoteEscapeCharacter>"</QuoteEscapeCharacter> <Comments>#</Comments> <AllowQuotedRecordDelimiter>FALSE</AllowQuotedRecordDelimiter> </CSV> </InputSerialization> <OutputSerialization> <CSV> <QuoteFields>ASNEEDED</QuoteFields> <RecordDelimiter>\n</RecordDelimiter> <FieldDelimiter>,</FieldDelimiter> <QuoteCharacter>"</QuoteCharacter> <QuoteEscapeCharacter>"</QuoteEscapeCharacter> </CSV> </OutputSerialization> <RequestProgress> <Enabled>FALSE</Enabled> </RequestProgress> </SelectRequest>

The following XML shows the request body for an object in JSON format with results in JSON format:

<?xml version="1.0" encoding="UTF-8"?> <SelectRequest> <Expression>Select * from S3Object</Expression> <ExpressionType>SQL</ExpressionType> <InputSerialization> <CompressionType>GZIP</CompressionType> <JSON> <Type>DOCUMENT</Type> </JSON> </InputSerialization> <OutputSerialization> <JSON> <RecordDelimiter>\n</RecordDelimiter> </JSON> </OutputSerialization> <RequestProgress> <Enabled>FALSE</Enabled> </RequestProgress> </SelectRequest>

Note

In the XML:

  • The InputSerialization element describes the format of the data in the object that is being queried. It must specify CSV or JSON.

  • The OutputSerialization element describes the format of the data that you want Amazon S3 to return in response to the query. It must specify either CSV or JSON.

  • The format of the InputSerialization doesn't need to match the format of the OutputSerialization. So, for example, you can specify JSON in the InputSerialization and CSV in the OutputSerialization.

The following tables explain each of the XML elements in the request body.

Name Description Required
Expression

The SQL expression. For example:

  • The following SQL expression retrieves the first column of the data from the object stored in CSV format.

    SELECT s._1 FROM S3Object s

  • The following SQL expression returns everything from the object.

    SELECT * FROM S3Object

Type: String

Ancestor: SelectRequest

Yes
ExpressionType

Identifies the expression type.

Type: String

Valid values: SQL

Ancestor: SelectRequest

Yes
InputSerialization

Describes the format of the data in the object that is being queried.

Type: Container

Ancestor: SelectRequest

Yes
OutputSerialization

Describes the format of the data that you want Amazon S3 to return in response.

Type: Container

Ancestor: SelectRequest

Yes
RequestProgress

Describes optional, periodic QueryProgress messages that can be sent.

Type: Container

Ancestor: SelectRequest

No

InputSerialization container element

Name Description Required
CompressionType

Identifies whether the Amazon S3 object that is being queried is compressed. GZIP and BZIP2 are the only supported compression types.

Type: String

Valid values: NONE | GZIP | BZIP2

Default: NONE

Ancestor: InputSerialization

No
CSV | JSON

Specifies the format and certain properties of the Amazon S3 object that is being queried.

Type: Container

Ancestor: InputSerialization

Exactly one of CSV or JSON is required.

CSV container element (inside InputSerialization)

Name Description Required
RecordDelimiter

The value used to separate individual records in the input. Instead of the default value, you can specify an arbitrary delimiter, including an octal character. For example, \\036 is parsed as the "record separator" (non-printing) character.

You can specify up to two characters for a record delimiter. You can specify two characters, one character and one octal, or two octals. For example, \r\n is a valid record delimiter.

Type: String

Default: \n

Ancestor: CSV

No
FieldDelimiter

The value used to separate individual fields in a record. Instead of the default value, you can specify an arbitrary delimiter, including an octal character. For example, \\036 is parsed as the "record separator" (non-printing) character.

Type: String

Default: ,

Ancestor: CSV

No
QuoteCharacter

The value to use for escaping when the field delimiter is part of the value.

Consider this example in a CSV file:

"a, b"

The use of quotation marks makes this value a single field because you are wrapping the value in quotation marks. If you don't specify the quotation marks, the comma is a field delimiter (which makes it two separate field values, a and b).

Type: String

Default: "

Ancestor: CSV

No
QuoteEscapeCharacter

The value to use for escaping the quotation mark character inside an already escaped value. For example, the value """ a , b """ is parsed as " a , b ".

Type: String

Default: "

Ancestor: CSV

No
FileHeaderInfo

Describes the first line in the input data. It is one of the ENUM values.

  • NONE: The first line is not a column header.

  • USE: The first line is a column header, and you can use the header value to identify a column in an expression (for example, SELECT "name" FROM S3Object).

  • IGNORE: The first line is a column header, but you can't use the header values to identify the column in an expression. You can use column position (such as _1, _2, …) to identify the column (for example, SELECT s._1 FROM S3Object s).

Type: Enum

Valid values: NONE | USE | IGNORE

Ancestor: CSV

No
Comments

If the first character of a line of text matches the comment character, the row is considered a comment and is discarded from the input. You can specify any character to indicate a comment line.

Type: String

Default: #

Ancestor: CSV

No
AllowQuotedRecordDelimiter

Specifies that CSV input records might contain record delimiters within quote characters. Setting this option to TRUE could result in slower performance.

Type: Boolean

Default: FALSE

Ancestor: CSV

No

JSON container element (inside InputSerialization)

Name Description Required
Type

The type of JSON

Type: Enum

Valid values: DOCUMENT | LINES

Ancestor: JSON

Yes

OutputSerialization container element

Name Description Required
CSV | JSON

Specifies the format and certain properties of the data that is returned in response.

Type: Container

Ancestor: OutputSerialization

Exactly one of CSV or JSON is required.

CSV container element (inside OutputSerialization)

Name Description Required
QuoteFields

Indicates whether to use quotation marks around output fields.

  • ALWAYS: Always use quotation marks for output fields.

  • ASNEEDED: Use quotation marks for output fields when needed.

Type: String

Valid values: ALWAYS | ASNEEDED

Default: ASNEEDED

Ancestor: CSV

No
RecordDelimiter

The value used to separate individual records in the output. Instead of the default value, you can specify an arbitrary delimiter, including an octal character. For example, \\036 is parsed as the "record separator" (non-printing) character.

You can specify up to two characters for a record delimiter. You can specify two characters, one character and one octal, or two octals. For example, \r\n is a valid record delimiter.

Type: String

Default: \n

Ancestor: CSV

No
FieldDelimiter

The value you want Amazon S3 to use to separate individual fields in a record. Instead of the default value, you can specify an arbitrary delimiter, including an octal character. For example, \\036 is parsed as the "record separator" (non-printing) character.

Type: String

Default: ,

Ancestor: CSV

No
QuoteCharacter

The value to use for escaping when the field delimiter is part of the value. For example, if the value is a, b, then Amazon S3 wraps this field value in quotation marks as follows: " a , b ".

Type: String

Default: "

Ancestor: CSV

No
QuoteEscapeCharacter

The value to use for escaping the quotation mark character inside an already escaped value. For example, if the value is " a , b ", then Amazon S3 wraps the value in quotation marks as follows: """ a , b """.

Type: String

Default: "

Ancestor: CSV

No

JSON container element (inside OutputSerialization)

Name Description Required
RecordDelimiter

The value used to separate individual records in the output. Instead of the default value, you can specify an arbitrary delimiter, including an octal character. For example, \\036 is parsed as the "record separator" (non-printing) character.

You can specify up to two characters for a record delimiter. You can specify two characters, one character and one octal, or two octals. For example, \r\n is a valid record delimiter.

Type: String

Default: \n

Ancestor: JSON

No

RequestProgress container element

Name Description Required
Enabled

Specifies whether periodic QueryProgress messages should be sent.

Type: Boolean

Default: FALSE

Ancestor: RequestProgress

No

Responses

A successful operation returns 200 OK status code.

Response Headers

This implementation of the operation uses only response headers that are common to most responses. For more information, see Common Response Headers.

Response Body

Because the response size is unknown, Amazon S3 streams the response as a series of messages and includes a Transfer-Encoding header with chunked as its value in the response. The following example shows the response format at the top level:

<Message 1> <Message 2> <Message 3> ...... <Message n>

Each message consists of two sections: the prelude and the data. The prelude section consists of 1) the total byte-length of the message, and 2) the combined byte-length of all the headers. The data section consists of 1) the headers, and 2) a payload.

Each section ends with a 4-byte big-endian integer checksum (CRC). Amazon S3 Select uses CRC32 (often referred to as GZIP CRC32) to calculate both CRCs. For more information about CRC32, see GZIP file format specification version 4.3.

Total message overhead including the prelude and both checksums is 16 bytes.

Note

All integer values within messages are in network byte order, or big-endian order.

The following diagram shows the components that make up a message and a header. Note that there are multiple headers per message.


					Screenshot of an example message structure showing total byte-length, header byte-length, prelude crc,
						header, payload, and message crc.

Note

For Amazon S3 Select, the header value type is always 7 (type=String). For this type, the header value consists of two components, a 2-byte big-endian integer length, and a UTF-8 string that is of that byte-length. The following diagram shows the components that make up Amazon S3 Select headers.


					Screenshot of the headers structure showing header name byte-length, header name string, 
						header value type, value byte-length and value string.

Payload byte-length calculations (these two calculations are equivalent):

  • payload_length = total_length - header_length - sizeOf(total_length) - sizeOf(header_length) - sizeOf(prelude_crc) - sizeOf(message_crc)

  • payload_length = total_length - header_length - 16

Each message contains the following components:

  • Prelude: Always fixed size of 8 bytes (two fields of 4 bytes each):

    • First four bytes: Total byte-length: Big-endian integer byte-length of the entire message (including the 4-byte total length field itself).

    • Second four bytes: Headers byte-length: Big-endian integer byte-length of the headers portion of the message (excluding the headers length field itself).

  • Prelude CRC: 4-byte big-endian integer checksum (CRC) for the prelude portion of the message (excluding the CRC itself). The prelude has a separate CRC from the message CRC (see below), to ensure that corrupted byte-length information can be detected immediately, without causing pathological buffering behavior.

  • Headers: A set of metadata annotating the message, such as the message type, payload format, and so on. Messages can have multiple headers, so this portion of the message can have different byte-lengths depending on the message type. Headers are key-value pairs, where both the key and value are UTF-8 strings. Headers can appear in any order within the headers portion of the message, and any given header type can only appear once.

    For Amazon S3 Select, following is a list of header names and the set of valid values depending on the message type.

    • MessageType Header:

      • HeaderName => ":message-type"

      • Valid HeaderValues => "error", "event"

    • EventType Header:

      • HeaderName => ":event-type"

      • Valid HeaderValues => "Records", "Cont", "Progress", "Stats", "End"

    • ErrorCode Header:

      • HeaderName => ":error-code"

      • Valid HeaderValues => Error Code from the table in the Special Errors section.

    • ErrorMessage Header:

      • HeaderName => ":error-message"

      • Valid HeaderValues => Error message returned by the service, to help diagnose request-level errors.

  • Payload: Can be anything.

  • Message CRC: 4-byte big-endian integer checksum (CRC) from the start of the message to the start of the checksum (that is, everything in the message excluding the message CRC itself).

Each header contains the following components. There can be multiple headers per message.

  • Header Name Byte-Length: Byte-length of the header name.

  • Header Name: Name of the header, indicating the header type. Valid values: ":message-type" ":event-type" ":error-code" ":error-message"

  • Header Value Type: Enum indicating the header value type. For Amazon S3 Select, this is always 7.

  • Value String Byte-Length: (For Amazon S3 Select) Byte-length of the header value string.

  • Header Value String: (For Amazon S3 Select) Value of the header string. Valid values for this field vary based on the type of the header. See the sections below for valid values for each header type and message type.

For Amazon S3 Select, responses can be messages of the following types:

  • Records message: Can contain a single record, partial records, or multiple records. Depending on the size of the result, a response can contain one or more of these messages.

  • Continuation message: Amazon S3 periodically sends this message to keep the TCP connection open. These messages appear in responses at random. The client must detect the message type and process accordingly.

  • Progress message: Amazon S3 periodically sends this message, if requested. It contains information about the progress of a query that has started but has not yet completed.

  • Stats message: Amazon S3 sends this message at the end of the request. It contains statistics about the query.

  • End message: Indicates that the request is complete, and no more messages will be sent. You should not assume that the request is complete until the client receives an End message.

  • RequestLevelError message: Amazon S3 sends this message if the request failed for any reason. It contains the error code and error message for the failure. If Amazon S3 sends a RequestLevelError message, it doesn't send an End message.

The following sections explain the structure of each message type in more detail.

For sample code and unit tests that use this protocol, see AWS C Event Stream on the GitHub website.

Records Message

Header specification

Records messages contain three headers, as follows:


							Screenshot of an example message structure including the headers for this record type.
Payload specification

Records message payloads can contain a single record, partial records, or multiple records.

Continuation Message

Header specification

Continuation messages contain two headers, as follows:


							Screenshot of an example message structure including the headers for this record type.
Payload specification

Continuation messages have no payload.

Progress Message

Header specification

Progress messages contain three headers, as follows:


							Screenshot of an example message structure including the headers for this record type.
Payload specification

Progress message payload is an XML document containing information about the progress of a request.

  • BytesScanned => Number of bytes that have been processed before being uncompressed (if the file is compressed).

  • BytesProcessed => Number of bytes that have been processed after being uncompressed (if the file is compressed).

  • BytesReturned => Current number of bytes of records payload data returned by Amazon S3.

For uncompressed files, BytesScanned and BytesProcessed are equal.

Example:

<?xml version="1.0" encoding="UTF-8"?> <Progress> <BytesScanned>512</BytesScanned> <BytesProcessed>1024</BytesProcessed> <BytesReturned>1024</BytesReturned> </Progress>

Stats Message

Header specification

Stats messages contain three headers, as follows:


							Screenshot of an example message structure including the headers for this record type.
Payload specification

Stats message payload is an XML document containing information about a request's stats when processing is complete.

  • BytesScanned => Number of bytes that have been processed before being uncompressed (if the file is compressed).

  • BytesProcessed => Number of bytes that have been processed after being uncompressed (if the file is compressed).

  • BytesReturned => Total number of bytes of records payload data returned by Amazon S3.

For uncompressed files, BytesScanned and BytesProcessed are equal.

Example:

<?xml version="1.0" encoding="UTF-8"?> <Stats> <BytesScanned>512</BytesScanned> <BytesProcessed>1024</BytesProcessed> <BytesReturned>1024</BytesReturned> </Stats>

End Message

Header specification

End messages contain two headers, as follows:


							Screenshot of an example message structure including the headers for this record type.
Payload specification

End messages have no payload.

Request Level Error Message

Header specification

Request-level error messages contain three headers, as follows:


							Screenshot of an example message structure including the headers for this record type.

For a list of possible error codes and error messages, see the table in the Special Errors section.

Payload specification

Request-level error messages have no payload.

Special Errors

The following table contains special errors that SELECT Object Content might return.

For general information about Amazon S3 errors and a list of error codes, see Error Responses.

Error Code Description HTTP Status Code SOAP Fault Code Prefix
Busy The service is unavailable. Please retry. 503 Client
UnauthorizedAccess You are not authorized to perform this operation 401 Client
EmptyRequestBody Request body cannot be empty. 400 Client
ExpressionTooLong The SQL expression is too long: The maximum byte-length for the SQL expression is 256 KB. 400 Client
IllegalSqlFunctionArgument Illegal argument was used in the SQL function. 400 Client
InternalError Encountered an internal error. 500 Client
InvalidColumnIndex Column index in the SQL expression is invalid. 400 Client
InvalidKeyPath Key path in the SQL expression is invalid. 400 Client
ColumnTooLong The length of a column in the result is greater than maxCharsPerColumn of 1 MB. 400 Client
OverMaxColumn The number of columns in the result is greater than the maximum allowable number of columns. 400 Client
OverMaxRecordSize The length of a record in the result is greater than maxCharsPerRecord of 1 MB. 400 Client
MissingHeaders Some headers in the query are missing from the file. Check the file and try again. 400 Client
InvalidCompressionFormat The file is not in a supported compression format. Only GZIP and BZIP2 are supported. 400 Client
TruncatedInput Object decompression failed. Check that the object is properly compressed using the format specified in the request. 400 Client
InvalidExpressionType The ExpressionType is invalid. Only SQL expressions are supported. 400 Client
InvalidFileHeaderInfo The FileHeaderInfo is invalid. Only NONE, USE, and IGNORE are supported. 400 Client
InvalidJsonType The JsonType is invalid. Only DOCUMENT and LINES are supported. 400 Client
InvalidQuoteFields The QuoteFields is invalid. Only ALWAYS and ASNEEDED are supported. 400 Client
InvalidRequestParameter The value of a parameter in SelectRequest element is invalid. Check the service API documentation and try again. 400 Client
CSVParsingError Encountered an error parsing the CSV file. Check the file and try again. 400 Client
JSONParsingError Encountered an error parsing the JSON file. Check the file and try again. 400 Client
ExternalEvalException The query cannot be evaluated. Check the file and try again. 400 Client
InvalidDataType The SQL expression contains an invalid data type. 400 Client
UnrecognizedFormatException Encountered an invalid record type. 400 Client
InvalidTextEncoding Invalid encoding type. Only UTF-8 encoding is supported. 400 Client
InvalidDataSource Invalid data source type. Only CSV and JSON are supported. 400 Client
InvalidTableAlias The SQL expression contains an invalid table alias. 400 Client
MalformedXML The XML provided was not well-formed or did not validate against our published schema. Check the service documentation and try again. 400 Client
MultipleDataSourcesUnsupported Multiple data sources are not supported. 400 Client
MissingRequiredParameter The SelectRequest entity is missing a required parameter. Check the service documentation and try again. 400 Client
ObjectSerializationConflict InputSerialization specifies more than one format, or OutputSerialization specifies more than one format. InputSerialization and OutputSerialization can only specify one format each. 400 Client
UnsupportedFunction Encountered an unsupported SQL function. 400 Client
UnsupportedSqlOperation Encountered an unsupported SQL operation. 400 Client
UnsupportedSqlStructure Encountered an unsupported SQL structure. Check the SQL Reference. 400 Client
UnsupportedStorageClass Encountered an invalid storage class. Only STANDARD, STANDARD_IA, and ONEZONE_IA storage classes are supported. 400 Client
UnsupportedSyntax Encountered invalid syntax. 400 Client
UnsupportedRangeHeader Range header is not supported for this operation. 400 Client
LexerInvalidChar The SQL expression contains an invalid character. 400 Client
LexerInvalidOperator The SQL expression contains an invalid literal. 400 Client
LexerInvalidLiteral The SQL expression contains an invalid operator. 400 Client
LexerInvalidIONLiteral The SQL expression contains an invalid operator. 400 Client
ParseExpectedDatePart Did not find the expected date part in the SQL expression. 400 Client
ParseExpectedKeyword Did not find the expected keyword in the SQL expression. 400 Client
ParseExpectedTokenType Did not find the expected token in the SQL expression. 400 Client
ParseExpected2TokenTypes Did not find the expected token in the SQL expression. 400 Client
ParseExpectedNumber Did not find the expected number in the SQL expression. 400 Client
ParseExpectedRightParenBuiltinFunctionCall Did not find the expected right parenthesis character in the SQL expression. 400 Client
ParseExpectedTypeName Did not find the expected type name in the SQL expression. 400 Client
ParseExpectedWhenClause Did not find the expected WHEN clause in the SQL expression. CASE is not supported. 400 Client
ParseUnsupportedToken The SQL expression contains an unsupported token. 400 Client
ParseUnsupportedLiteralsGroupBy The SQL expression contains an unsupported use of GROUP BY. 400 Client
ParseExpectedMember The SQL expression contains an unsupported use of MEMBER. 400 Client
ParseUnsupportedSelect The SQL expression contains an unsupported use of SELECT. 400 Client
ParseUnsupportedCase The SQL expression contains an unsupported use of CASE. 400 Client
ParseUnsupportedCaseClause The SQL expression contains an unsupported use of CASE. 400 Client
ParseUnsupportedAlias The SQL expression contains an unsupported use of ALIAS. 400 Client
ParseUnsupportedSyntax The SQL expression contains unsupported syntax. 400 Client
ParseUnknownOperator The SQL expression contains an invalid operator. 400 Client
ParseInvalidPathComponent The SQL expression contains an invalid path component. 400 Client
ParseMissingIdentAfterAt Did not find the expected identifier after the @ symbol in the SQL expression. 400 Client
ParseUnexpectedOperator The SQL expression contains an unexpected operator. 400 Client
ParseUnexpectedTerm The SQL expression contains an unexpected term. 400 Client
ParseUnexpectedToken The SQL expression contains an unexpected token. 400 Client
ParseUnExpectedKeyword The SQL expression contains an unexpected keyword. 400 Client
ParseExpectedExpression Did not find the expected SQL expression. 400 Client
ParseExpectedLeftParenAfterCast Did not find the expected left parenthesis after CAST in the SQL expression. 400 Client
ParseExpectedLeftParenValueConstructor Did not find expected the left parenthesis in the SQL expression. 400 Client
ParseExpectedLeftParenBuiltinFunctionCall Did not find the expected left parenthesis in the SQL expression. 400 Client
ParseExpectedArgumentDelimiter Did not find the expected argument delimiter in the SQL expression. 400 Client
ParseCastArity The SQL expression CAST has incorrect arity. 400 Client
ParseInvalidTypeParam The SQL expression contains an invalid parameter value. 400 Client
ParseEmptySelect The SQL expression contains an empty SELECT. 400 Client
ParseSelectMissingFrom The SQL expression contains a missing FROM after SELECT list. 400 Client
ParseExpectedIdentForGroupName GROUP is not supported in the SQL expression. 400 Client
ParseExpectedIdentForAlias Did not find the expected identifier for the alias in the SQL expression. 400 Client
ParseUnsupportedCallWithStar Only COUNT with (*) as a parameter is supported in the SQL expression. 400 Client
ParseNonUnaryAgregateFunctionCall Only one argument is supported for aggregate functions in the SQL expression. 400 Client
ParseMalformedJoin JOIN is not supported in the SQL expression. 400 Client
ParseExpectedIdentForAt Did not find the expected identifier for AT name in the SQL expression. 400 Client
ParseAsteriskIsNotAloneInSelectList Other expressions are not allowed in the SELECT list when '*' is used without dot notation in the SQL expression. 400 Client
ParseCannotMixSqbAndWildcardInSelectList Cannot mix [] and * in the same expression in a SELECT list in SQL expression. 400 Client
ParseInvalidContextForWildcardInSelectList Invalid use of * in SELECT list in the SQL expression. 400 Client
EvaluatorBindingDoesNotExist A column name or a path provided does not exist in the SQL expression. 400 Client
ValueParseFailure Time stamp parse failure in the SQL expression. 400 Client
IncorrectSqlFunctionArgumentType Incorrect type of arguments in function call in the SQL expression. 400 Client
AmbiguousFieldName Field name matches to multiple fields in the file. Check the SQL expression and the file, and try again. 400 Client
EvaluatorInvalidArguments Incorrect number of arguments in the function call in the SQL expression. 400 Client
EvaluatorInvalidTimestampFormatPattern Invalid time stamp format string in the SQL expression. 400 Client
ValueParseFailure Time stamp parse failure in the SQL expression. 400 Client
IntegerOverflow Integer overflow or underflow in the SQL expression. 400 Client
LikeInvalidInputs Invalid argument given to the LIKE clause in the SQL expression. 400 Client
CastFailed Attempt to convert from one data type to another using CAST failed in the SQL expression. 400 Client
InvalidCast Attempt to convert from one data type to another using CAST failed in the SQL expression. 400 Client
EvaluatorInvalidTimestampFormatPattern Time stamp format pattern requires additional fields in the SQL expression. 400 Client
EvaluatorInvalidTimestampFormatPatternSymbolForParsing Time stamp format pattern contains a valid format symbol that cannot be applied to time stamp parsing in the SQL expression. 400 Client
EvaluatorTimestampFormatPatternDuplicateFields Time stamp format pattern contains multiple format specifiers representing the time stamp field in the SQL expression. 400 Client
EvaluatorTimestampFormatPatternHourClockAmPmMismatch Time stamp format pattern contains a 12-hour hour of day format symbol but doesn't also contain an AM/PM field, or it contains a 24-hour hour of day format specifier and contains an AM/PM field in the SQL expression. 400 Client
EvaluatorUnterminatedTimestampFormatPatternToken Time stamp format pattern contains unterminated token in the SQL expression. 400 Client
EvaluatorInvalidTimestampFormatPatternToken Time stamp format pattern contains an invalid token in the SQL expression. 400 Client
EvaluatorInvalidTimestampFormatPatternSymbol Time stamp format pattern contains an invalid symbol in the SQL expression. 400 Client

Examples

Example 1: CSV Object

The following select request retrieves all records from an object with data stored in CSV format. The OutputSerialization element directs Amazon S3 to return results in CSV.

POST /exampleobject.csv?select&select-type=2 HTTP/1.1 Host: examplebucket.s3.amazonaws.com Date: Tue, 17 Oct 2017 01:49:52 GMT Authorization: authorization string Content-Length: content length <?xml version="1.0" encoding="UTF-8"?> <SelectRequest> <Expression>Select * from S3Object</Expression> <ExpressionType>SQL</ExpressionType> <InputSerialization> <CompressionType>GZIP</CompressionType> <CSV> <FileHeaderInfo>IGNORE</FileHeaderInfo> <RecordDelimiter>\n</RecordDelimiter> <FieldDelimiter>,</FieldDelimiter> <QuoteCharacter>"</QuoteCharacter> <QuoteEscapeCharacter>"</QuoteEscapeCharacter> <Comments>#</Comments> </CSV> </InputSerialization> <OutputSerialization> <CSV> <QuoteFields>ASNEEDED</QuoteFields> <RecordDelimiter>\n</RecordDelimiter> <FieldDelimiter>,</FieldDelimiter> <QuoteCharacter>"</QuoteCharacter> <QuoteEscapeCharacter>"</QuoteEscapeCharacter> </CSV> </OutputSerialization> </SelectRequest>

You can try different queries in the Expression element:

  • Assuming that you are not using column headers, you can identify columns using positional headers:

    SELECT s._1, s._2 FROM S3Object s WHERE s._3 > 100
  • If you have column headers and you set the FileHeaderInfo to Use, you can identify columns by name in the expression:

    SELECT s.Id, s.FirstName, s.SSN FROM S3Object s
  • You can specify functions in the SQL expression:

    SELECT count(*) FROM S3Object s WHERE s._1 < 1

The following is a sample response.

HTTP/1.1 200 OK x-amz-id-2: GFihv3y6+kE7KG11GEkQhU7/2/cHR3Yb2fCb2S04nxI423Dqwg2XiQ0B/UZlzYQvPiBlZNRcovw= x-amz-request-id: 9F341CD3C4BA79E0 Date: Tue, 17 Oct 2017 23:54:05 GMT A series of messages

Example 2: JSON Object

The following select request retrieves all records from an object with data stored in JSON format. The OutputSerialization directs Amazon S3 to return results in CSV.

POST /exampleobject.json?select&select-type=2 HTTP/1.1 Host: examplebucket.s3.amazonaws.com Date: Tue, 17 Oct 2017 01:49:52 GMT Authorization: authorization string Content-Length: content length <?xml version="1.0" encoding="UTF-8"?> <SelectRequest> <Expression>Select * from S3Object</Expression> <ExpressionType>SQL</ExpressionType> <InputSerialization> <CompressionType>GZIP</CompressionType> <JSON> <Type>DOCUMENT</Type> </JSON> </InputSerialization> <OutputSerialization> <CSV> <QuoteFields>ASNEEDED</QuoteFields> <RecordDelimiter>\n</RecordDelimiter> <FieldDelimiter>,</FieldDelimiter> <QuoteCharacter>"</QuoteCharacter> <QuoteEscapeCharacter>"</QuoteEscapeCharacter> </CSV> </OutputSerialization> </SelectRequest>

You can try different queries in the Expression element:

  • You can filter by string comparison using record keys:

    SELECT s.country, s.city from S3Object s where s.city = 'Seattle'
  • You can specify functions in the SQL expression:

    SELECT count(*) FROM S3Object s

Notes

The SELECT Object Content operation does not support the following GET Object functionality. For more information, see GET Object.

  • Range: You cannot specify the range of bytes of an object to return.

  • GLACIER and REDUCED_REDUNDANCY storage classes: You cannot specify either the GLACIER or REDUCED_REDUNDANCY storage classes. For more information, about storage classes see Storage Classes in the Amazon Simple Storage Service Developer Guide.

Related Resources