本文為英文版的機器翻譯版本,如內容有任何歧義或不一致之處,概以英文版為準。
中繼資料表查詢範例
下列範例示範如何使用標準 SQL 查詢,從 S3 中繼資料表取得不同類型的資訊。
使用下列範例時,請記住:
-
這些範例是為了與 Amazon Athena 搭配使用所撰寫。您可能需要修改範例,以搭配不同的查詢引擎使用。
-
確定您了解如何最佳化查詢。
-
b_
將 取代為您命名空間的名稱。general-purpose-bucket-name
-
如需支援資料欄的完整清單,請參閱 S3 中繼資料日誌資料表結構描述和 S3 中繼資料即時庫存資料表結構描述。
內容
日誌資料表範例查詢
您可以使用下列範例查詢來查詢日誌資料表。
依副檔名尋找物件
下列查詢會傳回具有特定副檔名 (.jpg
在此情況下為 ) 的物件:
SELECT key FROM "s3tablescatalog/aws-s3"."
b_
"."journal" WHERE key LIKE '%.jpg' AND record_type = 'CREATE'general-purpose-bucket-name
列出物件刪除
下列查詢會傳回物件刪除事件,包括發出請求的 AWS 帳戶 ID 或服務 AWS 主體:
SELECT DISTINCT bucket, key, sequence_number, record_type, record_timestamp, requester, source_ip_address, version_id FROM "s3tablescatalog/aws-s3"."
b_
"."journal" WHERE record_type = 'DELETE';general-purpose-bucket-name
列出物件使用的 AWS KMS 加密金鑰
下列查詢會傳回加密物件之 AWS Key Management Service (AWS KMS) 金鑰ARNs:
SELECT DISTINCT kms_key_arn FROM "s3tablescatalog/aws-s3"."
b_
"."journal";general-purpose-bucket-name
列出未使用 KMS 金鑰的物件
下列查詢會傳回未使用 AWS KMS 金鑰加密的物件:
SELECT DISTINCT kms_key_arn FROM "s3tablescatalog/aws-s3"."
b_
"."journal" WHERE encryption_status NOT IN ('SSE-KMS', 'DSSE-KMS') AND record_type = 'CREATE';general-purpose-bucket-name
列出過去 7 天內用於 PUT
操作的 AWS KMS 加密金鑰
下列查詢會傳回加密物件之 AWS Key Management Service (AWS KMS) 金鑰ARNs:
SELECT DISTINCT kms_key_arn FROM "s3tablescatalog/aws-s3"."
b_
"."journal" WHERE record_timestamp > (current_date - interval '7' day) AND kms_key_arn is NOT NULL;general-purpose-bucket-name
列出 S3 生命週期在過去 24 小時內刪除的物件
下列查詢傳回列出 S3 生命週期在最後一天過期的物件:
SELECT bucket, key, version_id, last_modified_date, record_timestamp, requester FROM "s3tablescatalog/aws-s3"."
b_
"."journal" WHERE requester = 's3.amazonaws.com' AND record_type = 'DELETE' AND record_timestamp > (current_date - interval '1' day)general-purpose-bucket-name
檢視 Amazon Bedrock 提供的中繼資料
有些 AWS 服務 (例如 Amazon Bedrock) 會將物件上傳至 Amazon S3。您可以查詢這些服務提供的物件中繼資料。例如,下列查詢包含 user_metadata
欄,以判斷是否有物件由 Amazon Bedrock 上傳到一般用途儲存貯體:
SELECT DISTINCT bucket, key, sequence_number, record_type, record_timestamp, user_metadata FROM "s3tablescatalog/aws-s3"."
b_
"."journal" WHERE record_type = 'CREATE' AND user_metadata['content-source'] = 'AmazonBedrock';general-purpose-bucket-name
如果 Amazon Bedrock 已將物件上傳至您的儲存貯體,user_metadata
欄會在查詢結果中顯示以下與物件相關聯的中繼資料:
user_metadata {content-additional-params -> requestid="CVK8FWYRW0M9JW65", signedContentSHA384="38b060a751ac96384cd9327eb1b1e36a21fdb71114be07434c0cc7bf63f6e1da274edebfe76f65fbd51ad2f14898b95b", content-model-id -> bedrock-model-arn, content-source -> AmazonBedrock}
了解物件的目前狀態
下列查詢可協助您判斷物件的目前狀態。查詢會識別每個物件的最新版本、篩選掉已刪除的物件,並根據序號標記每個物件的最新版本。結果會依 bucket
、key
和 sequence_number
欄排序。
WITH records_of_interest as ( -- Start with a query that can narrow down the records of interest. SELECT * from "s3tablescatalog/aws-s3"."
b_
"."journal" ), version_stacks as ( SELECT *, -- Introduce a column called 'next_sequence_number', which is the next larger -- sequence_number for the same key version_id in sorted order. LEAD(sequence_number, 1) over (partition by (bucket, key, coalesce(version_id, '')) order by sequence_number ASC) as next_sequence_number from records_of_interest ), -- Pick the 'tip' of each version stack triple: (bucket, key, version_id). -- The tip of the version stack is the row of that triple with the largest sequencer. -- Selecting only the tip filters out any row duplicates. -- This isn't typical, but some events can be delivered more than once to the table -- and include rows that might no longer exist in the bucket (since the -- table contains rows for both extant and extinct objects). -- In the next subquery, eliminate the rows that contain deleted objects. current_versions as ( SELECT * from version_stacks where next_sequence_number is NULL ), -- Eliminate the rows that are extinct from the bucket by filtering with -- record_type. An object version has been deleted from the bucket if its tip is -- record_type==DELETE. existing_current_versions as ( SELECT * from current_versions where not (record_type = 'DELETE' and is_delete_marker = FALSE) ), -- Optionally, to determine which of several object versions is the 'latest', -- you can compare their sequence numbers. A version_id is the latest if its -- tip's sequencer is the largest among all other tips in the same key. with_is_latest as ( SELECT *, -- Determine if the sequence_number of this row is the same as the largest sequencer for the key that still exists. sequence_number = (MAX(sequence_number) over (partition by (bucket, key))) as is_latest_version FROM existing_current_versions ) SELECT * from with_is_latest ORDER BY bucket, key, sequence_number;general-purpose-bucket-name
庫存資料表範例查詢
您可以使用下列範例查詢來查詢庫存資料表。
探索使用特定標籤的資料集
下列查詢會傳回使用指定標籤的資料集:
SELECT * FROM "s3tablescatalog/aws-s3"."
b_
"."inventory" WHERE object_tags['key1'] = 'value1' AND object_tags['key2'] = 'value2';general-purpose-bucket-name
列出未使用 SSE-KMS 加密的物件
下列查詢會傳回未使用 SSE-KMS 加密的物件:
SELECT key, encryption_status FROM "s3tablescatalog/aws-s3"."
b_
"."inventory" WHERE encryption_status != 'SSE-KMS';general-purpose-bucket-name
列出未加密的物件
下列查詢會傳回未加密的物件:
SELECT bucket, key, version_id FROM "s3tablescatalog/aws-s3"."
b_
"."inventory" WHERE encryption_status IS NULL;general-purpose-bucket-name
列出 Amazon Bedrock 產生的物件
下列查詢列出 Amazon Bedrock 產生的物件:
SELECT DISTINCT bucket, key, sequence_number, user_metadata FROM "s3tablescatalog/aws-s3"."
b_
"."inventory" WHERE user_metadata['content-source'] = 'AmazonBedrock';general-purpose-bucket-name
使用日誌資料表協調清查資料表
下列查詢會產生與儲存貯體目前內容最新的inventory-table-like清單。更精確地說,產生的清單結合了清查資料表的最新快照和日誌資料表中的最新事件。
若要讓此查詢產生最準確的結果,日誌和庫存資料表都必須處於作用中狀態。
我們建議您將此查詢用於包含少於十億 (10^9) 個物件的一般用途儲存貯體。
此範例查詢會將下列簡化套用至清單結果 (相較於庫存資料表):
-
資料欄遺漏 – 資料欄
bucket
、is_multipart
、kms_key_arn
、、encryption_status
is_bucket_key_enabled
和checksum_algorithm
不屬於最終結果的一部分。將一組選用資料欄保持在最低限度可改善效能。 -
包含所有記錄 – 查詢會傳回所有物件金鑰和版本,包括 null 版本 (未版本控制或暫停版本控制儲存貯體) 和刪除標記。如需如何篩選結果以僅顯示您感興趣的金鑰的範例,請參閱查詢結尾的
WHERE
子句。 -
加速對帳 – 在極少數情況下,查詢可能會暫時報告不再位於儲存貯體中的物件。一旦庫存資料表的下一個快照可用,就會消除這些差異。此行為是效能和準確性之間的權衡。
WITH inventory_time_cte AS ( SELECT COALESCE(inventory_time_from_property, inventory_time_default) AS inventory_time FROM ( SELECT * FROM (VALUES (TIMESTAMP '2024-12-01 00:00')) AS T (inventory_time_default) LEFT OUTER JOIN ( SELECT from_unixtime(CAST(value AS BIGINT) / 1000.0) AS inventory_time_from_property FROM "journal$properties" WHERE key = 'aws.s3metadata.oldest-uncoalesced-record-timestamp' LIMIT 1 ) ON TRUE ) ), working_set AS ( SELECT key, sequence_number, version_id, is_delete_marker, size, COALESCE(last_modified_date, record_timestamp) AS last_modified_date, e_tag, storage_class, object_tags, user_metadata, (record_type = 'DELETE' AND NOT COALESCE(is_delete_marker, FALSE)) AS _is_perm_delete FROM journal j CROSS JOIN inventory_time_cte t WHERE j.record_timestamp > (t.inventory_time - interval '15' minute) UNION ALL SELECT key, sequence_number, version_id, is_delete_marker, size, last_modified_date, e_tag, storage_class, object_tags, user_metadata, FALSE AS _is_perm_delete FROM inventory i ), updated_inventory AS ( SELECT * FROM ( SELECT *, MAX(sequence_number) OVER (PARTITION BY key, version_id) AS _supremum_sn FROM working_set ) WHERE sequence_number = _supremum_sn ) SELECT key, sequence_number, version_id, is_delete_marker, size, last_modified_date, e_tag, storage_class, object_tags, user_metadata FROM updated_inventory -- This filter omits only permanent deletes from the results. Delete markers will still be shown. WHERE NOT _is_perm_delete -- You can add additional filters here. Examples: -- AND object_tags['department'] = 'billing' -- AND starts_with(key, 'reports/') ORDER BY key ASC, sequence_number DESC