Examples of using Amazon S3 Select on objects - Amazon Simple Storage Service

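You can use S3 Select with the Amazon S3 REST API and the AWS SDKs to select content from objects.

In most cases you select content through an AWS SDK. If your application requires it, you can send REST requests directly instead. For more information about the request and response formats, see SELECT Object Content.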
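
As a rough illustration of the REST route, the request is a POST to the object key with the `select` and `select-type=2` query parameters and an XML body that carries the expression and the serialization settings. The following sketch only builds that body with the Java standard library. It does not sign or send the request, and the class and method names (`SelectRequestBodySketch`, `buildBody`, `escapeXml`) are hypothetical. Check the element names against the SELECT Object Content reference before relying on them.

```java
final class SelectRequestBodySketch {

    // Escape the characters that cannot appear verbatim in XML text content.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Build a request body for a SQL query with CSV input and CSV output,
    // leaving every CSV option at its default (assumption: empty CSV elements
    // are enough to select the defaults).
    static String buildBody(String sqlExpression) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<SelectObjectContentRequest xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\">\n"
             + "  <Expression>" + escapeXml(sqlExpression) + "</Expression>\n"
             + "  <ExpressionType>SQL</ExpressionType>\n"
             + "  <InputSerialization>\n"
             + "    <CompressionType>NONE</CompressionType>\n"
             + "    <CSV/>\n"
             + "  </InputSerialization>\n"
             + "  <OutputSerialization>\n"
             + "    <CSV/>\n"
             + "  </OutputSerialization>\n"
             + "</SelectObjectContentRequest>\n";
    }

    public static void main(String[] args) {
        // Prints the body that would accompany POST /<object-key>?select&select-type=2
        System.out.print(buildBody("select s._1 from S3Object s"));
    }
}
```

A real request also needs Signature Version 4 authentication headers, and the response arrives as an event stream rather than plain text. Both are handled for you when you use an AWS SDK.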

With an AWS SDK, you select the contents of an object by calling the selectObjectContent method. On success, it returns the results of the SQL expression.

Java

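The following Java code returns the value of the first column of every record stored in an object that contains CSV-formatted data. It also prints the Stats message returned with the response and checks that the End event was received. You must provide a valid bucket name and an object that contains data in CSV format.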

For instructions on creating and testing a working sample, see Testing the Amazon S3 Java Code Examples.

package com.amazonaws;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CSVInput;
import com.amazonaws.services.s3.model.CSVOutput;
import com.amazonaws.services.s3.model.CompressionType;
import com.amazonaws.services.s3.model.ExpressionType;
import com.amazonaws.services.s3.model.InputSerialization;
import com.amazonaws.services.s3.model.OutputSerialization;
import com.amazonaws.services.s3.model.SelectObjectContentEvent;
import com.amazonaws.services.s3.model.SelectObjectContentEventVisitor;
import com.amazonaws.services.s3.model.SelectObjectContentRequest;
import com.amazonaws.services.s3.model.SelectObjectContentResult;

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;

import static com.amazonaws.util.IOUtils.copy;

/**
 * This example shows how to query data from S3Select and consume the response in the form of an
 * InputStream of records and write it to a file.
 */
public class RecordInputStreamExample {

    private static final String BUCKET_NAME = "${my-s3-bucket}";
    private static final String CSV_OBJECT_KEY = "${my-csv-object-key}";
    private static final String S3_SELECT_RESULTS_PATH = "${my-s3-select-results-path}";
    private static final String QUERY = "select s._1 from S3Object s";

    public static void main(String[] args) throws Exception {
        final AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();

        SelectObjectContentRequest request = generateBaseCSVRequest(BUCKET_NAME, CSV_OBJECT_KEY, QUERY);
        final AtomicBoolean isResultComplete = new AtomicBoolean(false);

        try (OutputStream fileOutputStream = new FileOutputStream(new File(S3_SELECT_RESULTS_PATH));
             SelectObjectContentResult result = s3Client.selectObjectContent(request)) {
            InputStream resultInputStream = result.getPayload().getRecordsInputStream(
                    new SelectObjectContentEventVisitor() {
                        @Override
                        public void visit(SelectObjectContentEvent.StatsEvent event) {
                            System.out.println(
                                    "Received Stats, Bytes Scanned: " + event.getDetails().getBytesScanned()
                                            + " Bytes Processed: " + event.getDetails().getBytesProcessed());
                        }

                        /*
                         * An End Event informs that the request has finished successfully.
                         */
                        @Override
                        public void visit(SelectObjectContentEvent.EndEvent event) {
                            isResultComplete.set(true);
                            System.out.println("Received End Event. Result is complete.");
                        }
                    }
            );

            copy(resultInputStream, fileOutputStream);
        }

        /*
         * The End Event indicates all matching records have been transmitted.
         * If the End Event is not received, the results may be incomplete.
         */
        if (!isResultComplete.get()) {
            throw new Exception("S3 Select request was incomplete as End Event was not received.");
        }
    }

    private static SelectObjectContentRequest generateBaseCSVRequest(String bucket, String key, String query) {
        SelectObjectContentRequest request = new SelectObjectContentRequest();
        request.setBucketName(bucket);
        request.setKey(key);
        request.setExpression(query);
        request.setExpressionType(ExpressionType.SQL);

        InputSerialization inputSerialization = new InputSerialization();
        inputSerialization.setCsv(new CSVInput());
        inputSerialization.setCompressionType(CompressionType.NONE);
        request.setInputSerialization(inputSerialization);

        OutputSerialization outputSerialization = new OutputSerialization();
        outputSerialization.setCsv(new CSVOutput());
        request.setOutputSerialization(outputSerialization);

        return request;
    }
}
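
To make the effect of the query `select s._1 from S3Object s` concrete, the following sketch reproduces the same projection locally on an in-memory CSV string. It is only an illustration of what the query selects, not how S3 Select is implemented. The class name `FirstColumnSketch`, the method `firstColumn`, and the sample data are hypothetical. The parsing is deliberately naive: it assumes comma-delimited, newline-terminated records with no quoted fields, and it skips blank lines as a simplification of its own.

```java
import java.util.ArrayList;
import java.util.List;

final class FirstColumnSketch {

    // Return the first field of every non-empty line, which is what the
    // positional reference s._1 projects when columns are addressed by position.
    static List<String> firstColumn(String csv) {
        List<String> values = new ArrayList<>();
        for (String line : csv.split("\n")) {
            if (line.isEmpty()) {
                continue; // simplification: ignore blank lines
            }
            int comma = line.indexOf(',');
            // A line without a comma is a single-column record.
            values.add(comma < 0 ? line : line.substring(0, comma));
        }
        return values;
    }

    public static void main(String[] args) {
        String csv = "alice,30,Seattle\nbob,25,Austin\n"; // made-up sample data
        firstColumn(csv).forEach(System.out::println);
    }
}
```

Running the projection in S3 rather than in your application means only the selected column travels over the network, which is the main reason to use S3 Select on large objects.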
JavaScript

For a JavaScript example that uses the AWS SDK for JavaScript with the S3 SelectObjectContent API to select records from JSON and CSV files stored in Amazon S3, see the blog post Introducing support for Amazon S3 Select in the AWS SDK for JavaScript.

Python

For a Python example that uses S3 Select to run structured query language (SQL) queries against data loaded into Amazon S3 as a comma-separated values (CSV) file, see the blog post Querying data without servers or databases using Amazon S3 Select.