Data formats for AWS Clean Rooms

To query data, the datasets must be in a format that AWS Clean Rooms supports. The Amazon S3 bucket with the datasets and the AWS Clean Rooms cluster must be in the same AWS Region.

Supported data formats

AWS Clean Rooms supports the following structured formats:

Note

A timestamp value in a text file must be in the format yyyy-MM-dd HH:mm:ss.SSSSSS. For example: 2017-05-01 11:30:59.000000.
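As an illustration of the timestamp format in the note above, the following Python sketch writes and checks values in that layout. The helper names (format_timestamp, is_valid_timestamp) are hypothetical and are not part of any AWS SDK. The sketch assumes that Python's %f directive, which always renders six fractional digits, corresponds to the SSSSSS part of the pattern.

```python
from datetime import datetime

# Python equivalent of the pattern yyyy-MM-dd HH:mm:ss.SSSSSS.
# %f always renders six fractional digits (microseconds) on output.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def format_timestamp(value: datetime) -> str:
    """Render a datetime in the text-file timestamp layout."""
    return value.strftime(TIMESTAMP_FORMAT)


def is_valid_timestamp(text: str) -> bool:
    """Return True only if text matches the layout exactly."""
    try:
        parsed = datetime.strptime(text, TIMESTAMP_FORMAT)
    except ValueError:
        return False
    # strptime tolerates unpadded fields and fewer than six fractional
    # digits, so round-trip the value to require the exact layout.
    return parsed.strftime(TIMESTAMP_FORMAT) == text


if __name__ == "__main__":
    print(format_timestamp(datetime(2017, 5, 1, 11, 30, 59)))
    print(is_valid_timestamp("2017-05-01 11:30:59"))
```

Running a check like this over text files before you upload them can catch values that are missing the fractional seconds.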

We recommend using a columnar storage file format, such as Apache Parquet. With a columnar storage file format, you can minimize data transfer out of Amazon S3 by selecting only the columns that you need. For optimal performance, split large objects into objects of 100 MB to 1 GB each.
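The object-size guidance above can be turned into a quick planning calculation. The following Python sketch is one possible approach, not an AWS tool. The function name plan_parts and the 512 MiB default target are assumptions chosen for illustration, and the sketch treats MB and GB as binary units (MiB and GiB).

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3

# Assumed target part size, chosen to sit inside the recommended range.
DEFAULT_TARGET_BYTES = 512 * MIB


def plan_parts(total_bytes: int, target_bytes: int = DEFAULT_TARGET_BYTES):
    """Return (part_count, bytes_per_part) for splitting one large object.

    Parts are sized evenly, so no part exceeds target_bytes.
    """
    if total_bytes <= 0 or target_bytes <= 0:
        raise ValueError("sizes must be positive")
    part_count = max(1, math.ceil(total_bytes / target_bytes))
    bytes_per_part = math.ceil(total_bytes / part_count)
    return part_count, bytes_per_part


if __name__ == "__main__":
    count, size = plan_parts(10 * GIB)
    print(f"{count} parts of about {size / MIB:.0f} MiB each")
```

An object that is already smaller than the target stays as a single part, because splitting it further would only produce smaller objects.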

Supported data types

For an optimal experience with AWS Clean Rooms, all of your data must be cataloged in AWS Glue. For more information, see the section titled Getting started with the AWS Glue Data Catalog in the AWS Glue Developer Guide.

AWS Clean Rooms supports the following AWS Glue Data Catalog data types:

  • bigint

  • boolean

  • char

  • date

  • decimal

  • double

  • float

  • int

  • Nested data types such as:

    • array

    • map

    • struct

  • smallint

  • string

  • timestamp

  • varchar

AWS Clean Rooms does not support:

  • binary

  • interval
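To check a table definition against the lists above before you associate it, a small helper such as the following can flag columns whose type is not supported. This is a minimal sketch: the function names and the sample column mapping are hypothetical, and the check inspects only the outermost type name, so it does not look inside nested array, map, or struct definitions.

```python
import re

# Type names taken from the supported list above.
SUPPORTED_TYPES = {
    "bigint", "boolean", "char", "date", "decimal", "double", "float",
    "int", "array", "map", "struct", "smallint", "string", "timestamp",
    "varchar",
}


def base_type(glue_type: str) -> str:
    """Strip parameters: 'decimal(10,2)' -> 'decimal', 'array<string>' -> 'array'."""
    return re.split(r"[<(]", glue_type.strip().lower(), maxsplit=1)[0].strip()


def unsupported_columns(columns: dict) -> dict:
    """Return the subset of {column_name: type} whose outermost type is not supported."""
    return {
        name: glue_type
        for name, glue_type in columns.items()
        if base_type(glue_type) not in SUPPORTED_TYPES
    }


if __name__ == "__main__":
    # Hypothetical schema used only for illustration.
    schema = {
        "id": "bigint",
        "price": "decimal(10,2)",
        "tags": "array<string>",
        "payload": "binary",
    }
    print(unsupported_columns(schema))
```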

File compression types for AWS Clean Rooms

To reduce storage space, improve performance, and minimize costs, we strongly recommend that you compress your datasets.

AWS Clean Rooms recognizes file compression types based on the file extension and supports the compression types and extensions shown in the following table.

Compression algorithm    File extension
GZIP                     .gz
Bzip2                    .bz2
Snappy                   .snappy

You can apply compression at different levels. Most commonly, you compress a whole file or compress individual blocks within a file. Compressing columnar formats at the file level doesn't yield performance benefits.
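Because the compression type is recognized from the file extension, it can help to verify object names before you upload them. The following Python sketch mirrors the table above as a lookup. It illustrates the naming convention only and is not how the service itself is implemented. The function name compression_for_key and the sample object keys are hypothetical.

```python
from pathlib import PurePosixPath

# Extension-to-algorithm mapping copied from the table above.
COMPRESSION_BY_EXTENSION = {
    ".gz": "GZIP",
    ".bz2": "Bzip2",
    ".snappy": "Snappy",
}


def compression_for_key(key: str):
    """Return the compression algorithm implied by an object key, or None."""
    suffix = PurePosixPath(key).suffix.lower()
    return COMPRESSION_BY_EXTENSION.get(suffix)


if __name__ == "__main__":
    for key in ("sales/part-0001.csv.gz", "sales/part-0002.parquet"):
        print(key, "->", compression_for_key(key))
```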

Server-side encryption for AWS Clean Rooms

Note

Server-side encryption does not replace cryptographic computing for those use cases that require it.

AWS Clean Rooms transparently decrypts datasets that are encrypted using the following encryption options:

  • SSE-S3 – Server-side encryption using an AES-256 encryption key managed by Amazon S3

  • SSE-KMS – Server-side encryption with keys managed by AWS Key Management Service

To use SSE-S3, the AWS Clean Rooms service role used to associate the configured table to the collaboration must have KMS-decrypt permissions. To use SSE-KMS, the KMS key policy must also allow the AWS Clean Rooms service role to decrypt.
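For SSE-KMS, a key policy statement that lets the service role decrypt might look like the one built by the following Python sketch. This is an illustration under stated assumptions, not a complete or recommended policy: the statement ID, the account ID, and the role name are hypothetical placeholders, and your key might need additional statements or conditions.

```python
import json


def decrypt_statement(role_arn: str) -> dict:
    """Build a key policy statement that allows one role to call kms:Decrypt."""
    return {
        "Sid": "AllowCleanRoomsServiceRoleDecrypt",  # hypothetical statement ID
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": "kms:Decrypt",
        # In a key policy, "*" refers to the key that the policy is attached to.
        "Resource": "*",
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration; replace it with your service role.
    role = "arn:aws:iam::111122223333:role/ExampleCleanRoomsServiceRole"
    print(json.dumps(decrypt_statement(role), indent=2))
```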

AWS Clean Rooms doesn't support Amazon S3 client-side encryption. For more information about server-side encryption, see Protecting data using server-side encryption in the Amazon Simple Storage Service User Guide.