Menu
Amazon Simple Storage Service
API Reference (API Version 2006-03-01)

SELECT Object Content (Preview)

Preview Program

This feature is only available as part of a Preview program and is subject to change at any time. If you are interested in participating in the Preview program, see Preview Request Form.

During the Preview program, this feature will only be available in the following regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Singapore) and EU (Ireland).

During the Preview program, the following features are either not supported or are only partially supported:

  • Encryption: Objects that are encrypted at rest are not supported.

  • Compression: Only GZIP compression is supported.

  • Format: Only CSV and JSON formats are supported.

  • Encoding: Only UTF-8 encoding is supported.

Description

Retrieves object content based on an SQL expression. In the request, along with the SQL expression, you must also specify a data serialization format (JSON, CSV) of the object. Amazon S3 uses this to parse object data into records and returns only records that match the specified SQL expression. In addition, you can specify the data serialization format for the response.

The API requires permission for the s3:GetObject API (you must have read access). Also, this operation does not support anonymous access.

Requests

Syntax

POST /ObjectName?select HTTP/1.1
Host: BucketName.s3.amazonaws.com
Date: date
Authorization: authorization string (See Authenticating Requests (AWS Signature Version
		4))

Request body goes here

Note

The syntax shows some of the request headers. For a complete list, see the "Request Headers" section of this topic.

Request Parameters

This implementation of the operation does not use request parameters.

Request Headers

You can use the following request headers in addition to the Common Request Headers.

Request Body

The following XML shows the request body:

Copy
<SelectRequest> <Expression>Select * from S3Object</Expression> <ExpressionType>SQL</ExpressionType> <InputSerialization> <CompressionType>GZIP</CompressionType> <CSV> <FileHeaderInfo>IGNORE</FileHeaderInfo> <RecordDelimiter>\n</RecordDelimiter> <FieldDelimiter>,</FieldDelimiter> <QuoteCharacter>"</QuoteCharacter> <QuoteEscapeCharacter>"</QuoteEscapeCharacter> <Comments>#</Comments> </CSV> <JSON> <Type>Document</Type> </JSON> </InputSerialization> <OutputSerialization> <CSV> <QuoteFields>ASNEEDED</QuoteFields> <RecordDelimiter>\n</RecordDelimiter> <FieldDelimiter>,</FieldDelimiter> <QuoteCharacter>"</QuoteCharacter> <QuoteEscapeCharacter>"</QuoteEscapeCharacter> </CSV> <JSON> <RecordDelimiter>\n</RecordDelimiter> </JSON> </OutputSerialization> </SelectRequest>

In the XML:

  • The InputSerialization element must specify either CSV or JSON. It describes the format of the data in the object that is being queried.

  • The OutputSerialization element must specify either CSV or JSON. It describes the format of the data that you want Amazon S3 to return in response.

The following tables explain each of the XML elements in the request body.

Name Description Required
Expression

The SQL expression. For example:

  • The following SQL expression retrieves the first column of the data from the object stored in CSV format.

    SELECT s._1 FROM S3Object s

  • The following SQL expression returns everything from the object.

    SELECT * FROM S3Object

Type: String

Ancestors: SelectRequest

Yes
ExpressionType

Identifies the expression type.

Type: String

Valid values: SQL

Ancestors: SelectRequest

Yes
InputSerialization

Describes the format of the data in the object that is being queried.

Type: Container

Ancestors: SelectRequest

Yes
OutputSerialization

Describes the format of the data that you want Amazon S3 to return in response.

Type: Container

Ancestors: SelectRequest

Yes

InputSerialization container element

Name Description Required
CompressionType

Identifies whether the Amazon S3 object that is being queried contains gzipped content.

Type: String

Valid values: NONE or GZIP

Default: NONE

Ancestors: InputSerialization

No
CSV or JSON

Specifies certain properties of the Amazon S3 object that is being queried.

Type: Container

Ancestors: InputSerialization

Exactly one of CSV or JSON is required

CSV container element (inside InputSerialization)

Name Description Required
RecordDelimiter

The value used to separate individual records in the input. Instead of the default value, you can specify an arbitrary delimiter.

Type: String

Default: \n

Ancestors: CSV

No
FieldDelimiter

The value used to separate individual fields in a record. You can specify an arbitrary delimiter. For example, in the case of a PSV file, a pipe (|) can be the field delimiter.

Type: String

Default: ,

Ancestors: CSV

No
QuoteCharacter

The value to use for escaping when the field delimiter is part of the value.

Consider this example in a CSV file:

"a, b"

The use of quotation marks makes this value a single field because you are wrapping the value in quotation marks. If you don't specify the quotation marks, the comma is a field delimiter (which makes it two separate field values, a and b).

Type: String

Default: "

Ancestors: CSV

No
quoteEscapeCharacter

The value to use for escaping the quotation mark character inside an already escaped value. For example, the value """ a , b """ is parsed as " a , b ".

Type: String

Default: "

Ancestors: CSV

No
FileHeaderInfo

Describes the first line in the input data. It is one of the ENUM values.

  • None: First line is not a header.

  • Ignore: First line is a header, but you can't use the header values to indicate the column in an expression. You can use column position (such as _1, _2, …) to indicate the column (SELECT s._1 FROM S3OBJECT s).

  • Use: First line is a header, and you can use the header value to identify a column in an expression (SELECT "name" FROM S3OBJECT).

Type: Enum

Valid values: None | Use | Ignore

Ancestors: CSV

No
Comments

If the first character of a line of text matches the comment character, the row is considered a comment and is discarded from the input. You can specify any character to indicate a comment line.

Type: String

Default: #

Ancestors: CSV

No

JSON container element (inside InputSerialization)

Name Description Required
Type

The type of JSON

Type: Enum

Valid values: Document | Stream | Lines

Default: Document

Ancestors: JSON

No

OutputSerialization container element

Name Description Required
CSV or JSON

Specifies certain properties of the data that is returned in response.

Type: Container

Ancestors: OutputSerialization

Exactly one of CSV or JSON is required

CSV container element (inside OutputSerialization)

Name Description Required
QuoteFields

Indicates whether to use quotation marks around output fields.

  • Always: Always use quotation marks for output fields.

  • AsNeeded: Use quotation marks for output fields when needed.

Type: String

Valid values: Always | AsNeeded

Default: AsNeeded

Ancestors: CSV

No
RecordDelimiter

The value used to separate individual records in the output. Instead of the default value, you can specify an arbitrary delimiter.

Type: String

Default: \n

Ancestors: CSV

No
FieldDelimiter

The value you want Amazon S3 to use to separate individual fields in a record. You can specify an arbitrary delimiter. For example in the case of a PSV file, a pipe (|) can be the field delimiter.

Type: String

Default: ,

Ancestors: CSV

No
QuoteCharacter

The value to use for escaping when the field delimiter is part of the value. For example, if the value is a, b, then Amazon S3 wraps this field value in quotation marks as follows: " a , b ".

Type: String

Default: "

Ancestors: CSV

No
quoteEscapeCharacter

The value to use for escaping the quotation mark character inside an already escaped value. For example, if the value is " a , b ", then Amazon S3 wraps the value in quotation marks as follows: """ a , b """.

Type: String

Default: "

Ancestors: CSV

No

JSON container element (inside OutputSerialization)

Name Description Required
RecordDelimiter

The value used to separate individual records in the output. Instead of the default value, you can specify an arbitrary delimiter.

Type: String

Default: \n

Ancestors: JSON

No

Responses

A successful operation returns either 200 OK status code.

Response Headers

This implementation of the operation uses only response headers that are common to most responses. For more information, see Common Response Headers.

Response Body

Because the response size is unknown, Amazon S3 streams the response as a series of frames and includes a Transfer-Encoding header with chunked as its value in the response. The following example shows the response format at the top level:

<FRAME 1>
<FRAME 1>
<FRAME 2>
......
<FRAME n>

The following diagram shows a frame structure:


					Screenshot of an example frame structure showing type, payload length,
						and header checksum.

The response includes frames of the following types:

  • Records frame: Includes one or more records. A record never spans multiple frames. Depending on the size of the result, the response can have one or more of these frames.

  • Continuation frame: Amazon S3 periodically sends this frame to keep the TCP connection open. These frames appear in response at random. The client must detect the frame type and process accordingly.

  • End frame: Indicates the end of the SQL expression processing, so the client can close the connection.

Each frame contains the following components:

  • Header: All frames have the same header length.

    • Type: Indicates the frame type. The first byte indicates the version (1, 2, 3, and so on in binary format). The last three bytes identify the frame type, as shown in the following examples. A frame in the response can be one of the following types (Amazon S3 converts the type value to binary when constructing the frame):

      • Record(s) frame: 100000000000000000000001 (8388609 in decimal)

      • Continuation frame: 100000000000000000000100 (8388612 in decimal)

      • End frame: 100000000000000000000101 (8388613 in decimal)

    • Payload length: Indicates the length of the payload.

    • Header checksum: Indicates the 32 bit checksum of the header fields.

  • Payload: Can be anything.

  • Payload checksum: Indicates the 32 bit checksum of the payload.

Each of the frame types is explained in the following sections.

Record(s) Frame

A frame of this type contains actual data records. Depending on the frame type, the first part of the payload contains an offset field.


						Screenshot of an example frame structure showing the payload
							containing the offset and records fields.

Amazon S3 serializes the records frame based on the OutputSerialization element that you specify in the request (for example, CSV or JSON). The API ensures that a single record is not spread across multiple frames.

Continuation Frame

A frame of this type indicates progress and to keep the TCP connection open.


						Screenshot of an example continuation frame structure showing the
							payload in the offset field.

The Offset indicates the byte offset in your object data where Amazon S3 is processing.

End Frame

A frame of this type indicates the end of streaming. That is, there are no more frames following this frame, and the client can close the connection.


						Screenshot of an example end frame structure showing the payload in
							the offset field..

Special Errors

The following table contains special errors that SELECT Object Content might return.

For general information about Amazon S3 errors and a list of error codes, see Error Responses.

Error Code Description HTTP Status Code SOAP Fault Code Prefix
MalformedXML The XML you provided was not well formed or did not validate against our published schema. 4xx Client
UnsupportedObjectType Expression type is not supported. 4xx Client
InvalidExpressionType ExpressionType is invalid. 4xx Client
InvalidFileHeaderInfo FileHeaderInfo is invalid. 4xx Client
InvalidJsonType JSON type is invalid. 4xx Client
InvalidQuoteFields QuoteCharacter is invalid. 4xx Client
InvalidDataFormat The data format is invalid. 4xx Client
InvalidCompressionFormat The compression format is not applicable to the queried object. 4xx Client
InvalidTextEncoding UTF-8 encoding is required. A text encoding error was found. 4xx Client
UnsupportedEncryptionMode Server-side encryption is not supported. 4xx Client
UnsupportedStorageClass Storage class is not supported. 4xx Client
EmptyRequestBody Request body cannot be empty. 4xx Client
ExpressionTooLong SQL expression in the request is too long. 4xx Client
UnsupportedIsolatedRequest Unsupported request. 4xx Client
MissingRequiredParameter A required parameter is missing. 4xx Client
ObjectSerializationConflict Serialization parameters are in conflict. 4xx Client
LexerInvalidChar Invalid character. 4xx Client
LexerInvalidOperator Invalid operator. 4xx Client
LexerInvalidLiteral Invalid literal. 4xx Client
ParseExpectedKeyword Did not find expected keyword. 4xx Client
ParseExpectedRightParenBuiltinFunctionCall Expected right parenthesis. 4xx Client
ParseExpectedTokenType Unexpected token found. 4xx Client
ParseExpected2TokenTypes Unexpected token found. 4xx Client
ParseExpectedNumber Expected number. 4xx Client
ParseExpectedTypeName Expected type name. 4xx Client
ParseExpectedWhenClause CASE is not supported. 4xx Client
ParseUnsupportedToken Unexpected token found. 4xx Client
ParseUnsupportedLiteralsGroupBy GROUP BY is not supported. 4xx Client
ParseExpectedMember MEMBER is not supported. 4xx Client
ParseUnsupportedSelect Unsupported use of SELECT. 4xx Client
ParseUnsupportedCase CASE is not supported. 4xx Client
ParseUnsupportedCaseClause CASE is not supported. 4xx Client
ParseUnsupportedAlias Unsupported syntax for alias. 4xx Client
ParseUnsupportedSyntax Unsupported syntax. 4xx Client
ParseUnknownOperator Unsupported operator. 4xx Client
ParseInvalidPathComponent Invalid Path component. 4xx Client
ParseMissingIdentAfterAt Identifier expected after `@` symbol. 4xx Client
ParseUnexpectedOperator Unexpected operator. 4xx Client
ParseUnexpectedTerm Unexpected term. 4xx Client
ParseUnexpectedToken Unexpected token. 4xx Client
ParseUnExpectedKeyword Unexpected keyword. 4xx Client
ParseExpectedExpression Did not find expected expression. 4xx Client
ParseExpectedLeftParenAfterCast Expected left parenthesis after CAST. 4xx Client
ParseExpectedLeftParenValueConstructor Expected left parenthesis. 4xx Client
ParseExpectedLeftParenBuiltinFunctionCall Expected left parenthesis. 4xx Client
ParseExpectedArgumentDelimiter Expected argument delimiter. 4xx Client
ParseCastArity Cast to type has incorrect arity. 4xx Client
ParseInvalidTypeParam Invalid value used for type parameter. 4xx Client
ParseEmptySelect Found empty SELECT list. 4xx Client
ParseSelectMissingFrom Missing FROM after SELECT list. 4xx Client
ParseExpectedIdentForGroupName GROUP is not supported. 4xx Client
ParseExpectedIdentForAlias Expected identifier for alias. 4xx Client
ParseUnsupportedCallWithStar Function call, other than COUNT, with (*) as parameter is not supported. 4xx Client
ParseNonUnaryAgregateFunctionCall Aggregate function calls take one argument only. 4xx Client
ParseMalformedJoin JOIN is not supported. 4xx Client
ParseExpectedIdentForAt Expected identifier for AT name. 4xx Client
EvaluatorInvalidTimestampFormatPattern Invalid time stamp format string. 4xx Client
InvalidDataSource Invalid data source type. 4xx Client
IllegalSqlFunctionArgument Illegal argument provided for SQL function. 4xx Client
InvalidColumnIndex The column index is invalid. 4xx Client
OverMaxNumberOfColumn The character number in one column or the total column number in one record is more than the maximum threshold. 4xx Client
OverMaxRecordSize The character number in one record is more than the maximum threshold. 4xx Client
InvalidKeyPath The key path is invalid. 4xx Client
InvalidTableAlias Invalid table alias. 4xx Client
MultipleDataSourcesUnsupported Multiple data sources are not supported. 4xx Client
UnsupportedFunction Unsupported function. 4xx Client
UnsupportedSqlStructure Unsupported SQL structure. 4xx Client
UnsupportedSqlOperation Unsupported SQL operation. 4xx Client
UnsupportedSyntax Unsupported syntax. 4xx Client

Examples

Example 1: CSV Object

The following select request retrieves all records from an object with data stored in CSV format. The InputSerialization element specifies CSV format because the source object is assumed to have data in CSV format. The OutputSerialization element directs Amazon S3 to return results in CSV.

Copy
POST /exampleobject.csv?select HTTP/1.1 Host: examplebucket.s3.amazonaws.com Date: Tue, 17 Oct 2017 01:49:52 GMT Authorization: authorization string Content-Length: content length <SelectRequest> <Expression>Select * from S3Object</Expression> <ExpressionType>SQL</ExpressionType> <InputSerialization> <CompressionType>GZIP</CompressionType> <CSV> <FileHeaderInfo>IGNORE</FileHeaderInfo> <RecordDelimiter>\n</RecordDelimiter> <FieldDelimiter>,</FieldDelimiter> <QuoteCharacter>"</QuoteCharacter> <QuoteEscapeCharacter>"</QuoteEscapeCharacter> <Comments>#</Comments> </CSV> </InputSerialization> <OutputSerialization> <CSV> <QuoteFields>ASNEEDED</QuoteFields> <RecordDelimiter>\n</RecordDelimiter> <FieldDelimiter>,</FieldDelimiter> <QuoteCharacter>"</QuoteCharacter> <QuoteEscapeCharacter>"</QuoteEscapeCharacter> </CSV> </OutputSerialization> </SelectRequest>

You can try different queries in the Expression element:

  • Assuming you are not using any headers, you can specify columns using positional headers:

    SELECT s._1, s._2 FROM S3Object s WHERE s._3 > 100
  • If you have headers and you set the fileHeaderInfo to Use, you can specify headers in the expression:

    SELECT s.Id, s.FirstName, s.SSN FROM S3Object s
  • You can specify functions in the SQL expression:

    SELECT count(*) FROM S3Object s WHERE s._1 < 1

The following is a sample response.

HTTP/1.1 200 OK
x-amz-id-2: GFihv3y6+kE7KG11GEkQhU7/2/cHR3Yb2fCb2S04nxI423Dqwg2XiQ0B/UZlzYQvPiBlZNRcovw=
x-amz-request-id: 9F341CD3C4BA79E0
Date: Tue, 17 Oct 2017 23:54:05 GMT

A series of frames

If a copy of the object is already restored, Amazon S3 returns a 200 OK response, updating only the restored copy's expiry time.

Example 2: JSON Object

The following select request retrieves all records from an object with data stored in JSON format. The InputSerialization element specifies JSON format because the source object is assumed to have data in JSON format. The OutputSerialization directs Amazon S3 to return results in CSV.

Copy
POST /exampleobject.json?select HTTP/1.1 Host: examplebucket.s3.amazonaws.com Date: Tue, 17 Oct 2017 01:49:52 GMT Authorization: authorization string Content-Length: content length <SelectRequest> <Expression>Select * from S3Object</Expression> <ExpressionType>SQL</ExpressionType> <InputSerialization> <CompressionType>GZIP</CompressionType> <JSON> <Type>Document</Type> </JSON> </InputSerialization> <OutputSerialization> <CSV> <QuoteFields>ASNEEDED</QuoteFields> <RecordDelimiter>\n</RecordDelimiter> <FieldDelimiter>,</FieldDelimiter> <QuoteCharacter>"</QuoteCharacter> <QuoteEscapeCharacter>"</QuoteEscapeCharacter> </CSV> </OutputSerialization> </SelectRequest>

You can try different queries in the Expression element:

  • You can filter by string comparison using record keys:

    SELECT s.country, s.city from S3Object s where s.city = 'Seattle'
  • You can specify functions in the SQL expression:

    SELECT count(*) FROM S3Object s

Notes

The SELECT Object Content operation does not support the following GET Object functionality. For more information, see GET Object.

  • Range: You cannot specify the range of bytes of an object to return.

  • GLACIER storage class: You cannot specify the GLACIER storage class object. For more information, see Storage Classes in the Amazon Simple Storage Service Developer Guide.

Related Resources