Amazon Simple Storage Service
API Reference (API Version 2006-03-01)

SelectObjectContent

This operation filters the contents of an Amazon S3 object based on a simple structured query language (SQL) statement. In the request, along with the SQL expression, you must also specify a data serialization format (JSON, CSV, or Apache Parquet) of the object. Amazon S3 uses this format to parse object data into records, and returns only records that match the specified SQL expression. You must also specify the data serialization format for the response.

For more information about Amazon S3 Select, see Selecting Content from Objects in the Amazon Simple Storage Service Developer Guide.

For more information about using SQL with Amazon S3 Select, see SQL Reference for Amazon S3 Select and Glacier Select in the Amazon Simple Storage Service Developer Guide.

You must have s3:GetObject permission for this operation. Amazon S3 Select does not support anonymous access. For more information about permissions, see Specifying Permissions in a Policy in the Amazon Simple Storage Service Developer Guide.

You can use Amazon S3 Select to query objects that have the following format properties:

  • CSV, JSON, and Parquet - Objects must be in CSV, JSON, or Parquet format.

  • UTF-8 - UTF-8 is the only encoding type Amazon S3 Select supports.

  • GZIP or BZIP2 - CSV and JSON files can be compressed using GZIP or BZIP2. GZIP and BZIP2 are the only compression formats that Amazon S3 Select supports for CSV and JSON files. Amazon S3 Select supports columnar compression for Parquet using GZIP or Snappy. Amazon S3 Select does not support whole-object compression for Parquet objects.

  • Server-side encryption - Amazon S3 Select supports querying objects that are protected with server-side encryption.

    For objects that are encrypted with customer-provided encryption keys (SSE-C), you must use HTTPS, and you must use the headers that are documented for GetObject. For more information about SSE-C, see Server-Side Encryption (Using Customer-Provided Encryption Keys) in the Amazon Simple Storage Service Developer Guide.

    For objects that are encrypted with Amazon S3 managed encryption keys (SSE-S3) or with customer master keys (CMKs) stored in AWS Key Management Service (SSE-KMS), server-side encryption is handled transparently, so you don't need to specify anything. For more information about server-side encryption, including SSE-S3 and SSE-KMS, see Protecting Data Using Server-Side Encryption in the Amazon Simple Storage Service Developer Guide.
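The compression rules in the list above can be restated as a small lookup. The following Python sketch is illustrative only: the names SUPPORTED_COMPRESSION and is_supported are hypothetical, and the check covers whole-object compression only (columnar GZIP or Snappy compression inside a Parquet file is a property of the file itself, not of the request).

```python
# Whole-object compression accepted per input format, as listed above.
# Illustrative helper; these names are not part of any AWS SDK.
SUPPORTED_COMPRESSION = {
    "CSV": {"NONE", "GZIP", "BZIP2"},
    "JSON": {"NONE", "GZIP", "BZIP2"},
    # Whole-object compression is not supported for Parquet objects.
    "PARQUET": {"NONE"},
}


def is_supported(object_format: str, compression: str = "NONE") -> bool:
    """Return True if the format/compression pair matches the rules above."""
    allowed = SUPPORTED_COMPRESSION.get(object_format.upper())
    return allowed is not None and compression.upper() in allowed
```

For example, is_supported("CSV", "GZIP") is true, while is_supported("Parquet", "GZIP") is false.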

The SelectObjectContent operation does not support the following GetObject functionality. For more information, see GetObject.

  • Range: You cannot specify the range of bytes of an object to return.

  • GLACIER, DEEP_ARCHIVE and REDUCED_REDUNDANCY storage classes: You cannot specify the GLACIER, DEEP_ARCHIVE, or REDUCED_REDUNDANCY storage classes. For more information about storage classes, see Storage Classes in the Amazon Simple Storage Service Developer Guide.

Special Errors

For a list of special errors for this operation, general information about Amazon S3 errors, and a list of error codes, see Error Responses.

Request Syntax

POST /{Key+}?select&select-type=2 HTTP/1.1
Host: Bucket.s3.amazonaws.com
x-amz-server-side-encryption-customer-algorithm: SSECustomerAlgorithm
x-amz-server-side-encryption-customer-key: SSECustomerKey
x-amz-server-side-encryption-customer-key-MD5: SSECustomerKeyMD5

<?xml version="1.0" encoding="UTF-8"?>
<SelectObjectContentRequest xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
   <Expression>string</Expression>
   <ExpressionType>string</ExpressionType>
   <RequestProgress>
      <Enabled>boolean</Enabled>
   </RequestProgress>
   <InputSerialization>
      <CompressionType>string</CompressionType>
      <CSV>
         <AllowQuotedRecordDelimiter>boolean</AllowQuotedRecordDelimiter>
         <Comments>string</Comments>
         <FieldDelimiter>string</FieldDelimiter>
         <FileHeaderInfo>string</FileHeaderInfo>
         <QuoteCharacter>string</QuoteCharacter>
         <QuoteEscapeCharacter>string</QuoteEscapeCharacter>
         <RecordDelimiter>string</RecordDelimiter>
      </CSV>
      <JSON>
         <Type>string</Type>
      </JSON>
      <Parquet>
      </Parquet>
   </InputSerialization>
   <OutputSerialization>
      <CSV>
         <FieldDelimiter>string</FieldDelimiter>
         <QuoteCharacter>string</QuoteCharacter>
         <QuoteEscapeCharacter>string</QuoteEscapeCharacter>
         <QuoteFields>string</QuoteFields>
         <RecordDelimiter>string</RecordDelimiter>
      </CSV>
      <JSON>
         <RecordDelimiter>string</RecordDelimiter>
      </JSON>
   </OutputSerialization>
</SelectObjectContentRequest>

URI Request Parameters

The request requires the following URI parameters.

Bucket

The S3 bucket.

Key

The object key.

Length Constraints: Minimum length of 1.

x-amz-server-side-encryption-customer-algorithm

The SSE algorithm used to encrypt the object. For more information, see Server-Side Encryption (Using Customer-Provided Encryption Keys).

x-amz-server-side-encryption-customer-key

The SSE customer key. For more information, see Server-Side Encryption (Using Customer-Provided Encryption Keys).

x-amz-server-side-encryption-customer-key-MD5

The SSE customer key MD5. For more information, see Server-Side Encryption (Using Customer-Provided Encryption Keys).
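As a rough illustration of how the three SSE-C header values relate to one another, the following Python sketch derives them from a raw key. It assumes the encoding described in the SSE-C guide referenced above (the AES256 algorithm, a base64-encoded 256-bit key, and a base64-encoded MD5 digest of that key); the function name sse_c_headers is hypothetical.

```python
import base64
import hashlib


def sse_c_headers(key: bytes) -> dict:
    """Build the three SSE-C request headers from a raw 32-byte key.

    Assumption: values are encoded as the SSE-C documentation describes
    (base64 key, base64 of the key's MD5 digest, algorithm AES256).
    """
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    key_b64 = base64.b64encode(key).decode("ascii")
    md5_b64 = base64.b64encode(hashlib.md5(key).digest()).decode("ascii")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": key_b64,
        "x-amz-server-side-encryption-customer-key-MD5": md5_b64,
    }
```

AWS SDKs typically compute the encoded key and the MD5 value for you, so a helper like this is mainly useful when constructing raw HTTP requests.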

Request Body

The request accepts the following data in XML format.

SelectObjectContentRequest

Root level tag for the SelectObjectContentRequest parameters.

Required: Yes

Expression

The expression that is used to query the object.

Type: String

Required: Yes

ExpressionType

The type of the provided expression (for example, SQL).

Type: String

Valid Values: SQL

Required: Yes

InputSerialization

Describes the format of the data in the object that is being queried.

Type: InputSerialization data type

Required: Yes

OutputSerialization

Describes the format of the data that you want Amazon S3 to return in response.

Type: OutputSerialization data type

Required: Yes

RequestProgress

Specifies if periodic request progress information should be enabled.

Type: RequestProgress data type

Required: No
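The request body described above can be assembled with any XML library. The following Python sketch uses the standard library's xml.etree.ElementTree and takes the serialization settings as nested dictionaries; the function names build_select_request and _append are hypothetical, and the sketch does not sign or send the request.

```python
import xml.etree.ElementTree as ET

NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def _append(parent, tag, value):
    """Add <tag> under parent; nested dicts become nested elements."""
    node = ET.SubElement(parent, tag)
    if isinstance(value, dict):
        for child_tag, child_value in value.items():
            _append(node, child_tag, child_value)
    elif value is not None:
        node.text = str(value)
    return node


def build_select_request(expression, input_serialization,
                         output_serialization, progress=False) -> bytes:
    """Return a SelectObjectContentRequest body as UTF-8 encoded XML."""
    root = ET.Element("SelectObjectContentRequest", xmlns=NS)
    ET.SubElement(root, "Expression").text = expression
    ET.SubElement(root, "ExpressionType").text = "SQL"  # only valid value
    if progress:  # RequestProgress is optional
        request_progress = ET.SubElement(root, "RequestProgress")
        ET.SubElement(request_progress, "Enabled").text = "true"
    _append(root, "InputSerialization", input_serialization)
    _append(root, "OutputSerialization", output_serialization)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

A call such as build_select_request("SELECT * FROM S3Object", {"CompressionType": "GZIP", "CSV": {"FileHeaderInfo": "IGNORE"}}, {"CSV": {}}) yields a body with the four required elements and no RequestProgress element.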

Response Syntax

HTTP/1.1 200

<?xml version="1.0" encoding="UTF-8"?>
<Payload>
   <Records>
      <Payload>blob</Payload>
   </Records>
   <Stats>
      <Details>
         <BytesProcessed>long</BytesProcessed>
         <BytesReturned>long</BytesReturned>
         <BytesScanned>long</BytesScanned>
      </Details>
   </Stats>
   <Progress>
      <Details>
         <BytesProcessed>long</BytesProcessed>
         <BytesReturned>long</BytesReturned>
         <BytesScanned>long</BytesScanned>
      </Details>
   </Progress>
   <Cont>
   </Cont>
   <End>
   </End>
</Payload>

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in XML format by the service.

Payload

Root level tag for the Payload parameters.

Required: Yes

Cont

The Continuation Event.

Type: ContinuationEvent data type

End

The End Event.

Type: EndEvent data type

Progress

The Progress Event.

Type: ProgressEvent data type

Records

The Records Event.

Type: RecordsEvent data type

Stats

The Stats Event.

Type: StatsEvent data type
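A client usually consumes the response as a sequence of these events. The following Python sketch assumes each event arrives as a dictionary keyed by the event name (Records, Stats, Progress, Cont, or End), which is how the AWS SDK for Python exposes the stream. The function name collect_select_events is hypothetical, and treating a missing End event as an incomplete result is a choice made by this sketch.

```python
def collect_select_events(events):
    """Concatenate Records payloads and return (data, stats_details).

    `events` is any iterable of dicts such as {"Records": {"Payload": b"..."}}.
    """
    chunks = []
    stats = None
    ended = False
    for event in events:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
        elif "Stats" in event:
            stats = event["Stats"]["Details"]
        elif "End" in event:
            ended = True
        # Progress and Cont events carry no record data and are skipped here.
    if not ended:
        # Assumption of this sketch: no End event means the result is incomplete.
        raise RuntimeError("stream finished without an End event")
    return b"".join(chunks), stats
```

A record can be split across two Records events, so the payloads are joined before any parsing of the returned CSV or JSON.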

Examples

Example 1: CSV Object

The following select request retrieves all records from an object with data stored in CSV format. The OutputSerialization element directs Amazon S3 to return results in CSV.

You can try different queries in the Expression element:

  • Assuming that you are not using column headers, you can identify columns using positional headers:

    SELECT s._1, s._2 FROM S3Object s WHERE s._3 > 100

  • If you have column headers and you set the FileHeaderInfo to Use, you can identify columns by name in the expression:

    SELECT s.Id, s.FirstName, s.SSN FROM S3Object s

  • You can specify functions in the SQL expression:

    SELECT count(*) FROM S3Object s WHERE s._1 < 1

POST /exampleobject.csv?select&select-type=2 HTTP/1.1
Host: examplebucket.s3.amazonaws.com
Date: Tue, 17 Oct 2017 01:49:52 GMT
Authorization: authorization string
Content-Length: content length

<?xml version="1.0" encoding="UTF-8"?>
<SelectRequest>
    <Expression>Select * from S3Object</Expression>
    <ExpressionType>SQL</ExpressionType>
    <InputSerialization>
        <CompressionType>GZIP</CompressionType>
        <CSV>
            <FileHeaderInfo>IGNORE</FileHeaderInfo>
            <RecordDelimiter>\n</RecordDelimiter>
            <FieldDelimiter>,</FieldDelimiter>
            <QuoteCharacter>"</QuoteCharacter>
            <QuoteEscapeCharacter>"</QuoteEscapeCharacter>
            <Comments>#</Comments>
        </CSV>
    </InputSerialization>
    <OutputSerialization>
        <CSV>
            <QuoteFields>ASNEEDED</QuoteFields>
            <RecordDelimiter>\n</RecordDelimiter>
            <FieldDelimiter>,</FieldDelimiter>
            <QuoteCharacter>"</QuoteCharacter>
            <QuoteEscapeCharacter>"</QuoteEscapeCharacter>
        </CSV>
    </OutputSerialization>
</SelectRequest>

The following is a sample response.

HTTP/1.1 200 OK
x-amz-id-2: GFihv3y6+kE7KG11GEkQhU7/2/cHR3Yb2fCb2S04nxI423Dqwg2XiQ0B/UZlzYQvPiBlZNRcovw=
x-amz-request-id: 9F341CD3C4BA79E0
Date: Tue, 17 Oct 2017 23:54:05 GMT

A series of messages
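For readers who use the AWS SDK for Python (boto3) instead of raw HTTP, the Example 1 request maps onto the keyword arguments of the S3 client's select_object_content method. The following sketch only builds that argument dictionary so that it runs without boto3 or credentials; the function name example1_params is hypothetical.

```python
def example1_params(bucket="examplebucket", key="exampleobject.csv"):
    """Keyword arguments mirroring the Example 1 request body."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Expression": "Select * from S3Object",
        "ExpressionType": "SQL",
        "InputSerialization": {
            "CompressionType": "GZIP",
            "CSV": {
                "FileHeaderInfo": "IGNORE",
                "RecordDelimiter": "\n",
                "FieldDelimiter": ",",
                "QuoteCharacter": '"',
                "QuoteEscapeCharacter": '"',
                "Comments": "#",
            },
        },
        "OutputSerialization": {
            "CSV": {
                "QuoteFields": "ASNEEDED",
                "RecordDelimiter": "\n",
                "FieldDelimiter": ",",
                "QuoteCharacter": '"',
                "QuoteEscapeCharacter": '"',
            },
        },
    }
```

With boto3 installed and credentials configured, the call would look like boto3.client("s3").select_object_content(**example1_params()), and the returned Payload would be iterated to read the event messages.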

Example 2: JSON Object

The following select request retrieves all records from an object with data stored in JSON format. The OutputSerialization element directs Amazon S3 to return results in CSV.

You can try different queries in the Expression element:

  • You can filter by string comparison using record keys:

    SELECT s.country, s.city from S3Object s where s.city = 'Seattle'

  • You can specify functions in the SQL expression:

    SELECT count(*) FROM S3Object s

POST /exampleobject.json?select&select-type=2 HTTP/1.1
Host: examplebucket.s3.amazonaws.com
Date: Tue, 17 Oct 2017 01:49:52 GMT
Authorization: authorization string
Content-Length: content length

<?xml version="1.0" encoding="UTF-8"?>
<SelectRequest>
    <Expression>Select * from S3Object</Expression>
    <ExpressionType>SQL</ExpressionType>
    <InputSerialization>
        <CompressionType>GZIP</CompressionType>
        <JSON>
            <Type>DOCUMENT</Type>
        </JSON>
    </InputSerialization>
    <OutputSerialization>
        <CSV>
            <QuoteFields>ASNEEDED</QuoteFields>
            <RecordDelimiter>\n</RecordDelimiter>
            <FieldDelimiter>,</FieldDelimiter>
            <QuoteCharacter>"</QuoteCharacter>
            <QuoteEscapeCharacter>"</QuoteEscapeCharacter>
        </CSV>
    </OutputSerialization>
</SelectRequest>

The following is a sample response.

HTTP/1.1 200 OK
x-amz-id-2: GFihv3y6+kE7KG11GEkQhU7/2/cHR3Yb2fCb2S04nxI423Dqwg2XiQ0B/UZlzYQvPiBlZNRcovw=
x-amz-request-id: 9F341CD3C4BA79E0
Date: Tue, 17 Oct 2017 23:54:05 GMT

A series of messages

Example 3: Parquet Object

  • The InputSerialization element describes the format of the data in the object that is being queried. It must specify CSV, JSON, or Parquet.

  • The OutputSerialization element describes the format of the data that you want Amazon S3 to return in response to the query. It must specify CSV or JSON. Amazon S3 doesn't support outputting data in the Parquet format.

  • The format of the InputSerialization doesn't need to match the format of the OutputSerialization. So, for example, you can specify JSON in the InputSerialization and CSV in the OutputSerialization.

POST /exampleobject.parquet?select&select-type=2 HTTP/1.1
Host: examplebucket.s3.amazonaws.com
Date: Tue, 17 Oct 2017 01:49:52 GMT
Authorization: authorization string
Content-Length: content length

<?xml version="1.0" encoding="UTF-8"?>
<SelectRequest>
    <Expression>Select * from S3Object</Expression>
    <ExpressionType>SQL</ExpressionType>
    <InputSerialization>
        <CompressionType>NONE</CompressionType>
        <Parquet>
        </Parquet>
    </InputSerialization>
    <OutputSerialization>
        <CSV>
            <QuoteFields>ASNEEDED</QuoteFields>
            <RecordDelimiter>\n</RecordDelimiter>
            <FieldDelimiter>,</FieldDelimiter>
            <QuoteCharacter>"</QuoteCharacter>
            <QuoteEscapeCharacter>"</QuoteEscapeCharacter>
        </CSV>
    </OutputSerialization>
</SelectRequest>
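The input and output format rules listed for this example can be checked on the client before a request is sent. The following Python sketch is illustrative only: it takes the serialization settings as dictionaries keyed by element name, the function name check_serialization is hypothetical, and requiring exactly one format element on each side is an assumption of this sketch.

```python
INPUT_FORMATS = {"CSV", "JSON", "Parquet"}
OUTPUT_FORMATS = {"CSV", "JSON"}  # Parquet output is not supported


def check_serialization(input_serialization, output_serialization):
    """Return (input_format, output_format) or raise ValueError."""
    if "Parquet" in output_serialization:
        raise ValueError("Parquet is not supported as an output format")
    in_formats = INPUT_FORMATS & set(input_serialization)
    out_formats = OUTPUT_FORMATS & set(output_serialization)
    # Assumption of this sketch: exactly one format element per side.
    if len(in_formats) != 1:
        raise ValueError("InputSerialization must name one of CSV, JSON, Parquet")
    if len(out_formats) != 1:
        raise ValueError("OutputSerialization must name one of CSV, JSON")
    return in_formats.pop(), out_formats.pop()
```

For the request above, check_serialization({"CompressionType": "NONE", "Parquet": {}}, {"CSV": {}}) returns ("Parquet", "CSV"), which reflects that the input and output formats don't need to match.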

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following: