Supported file and storage formats in Amazon Macie

Amazon Macie can analyze data in many different formats, including commonly used compression and archive formats. This support applies to both managed data identifiers and custom data identifiers.

When Macie analyzes data, it performs a deep inspection that takes the file or storage format of the data into account. For data in a compressed or archive file, Macie inspects both the full file and the contents of the file. To inspect the file’s contents, Macie decompresses the file, and then inspects each extracted file that uses a supported format. Macie can do this for as many as 1,000,000 files and up to a nested depth of 10 levels.
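
To make the two limits concrete, the following is a minimal sketch of a recursive walk over a nested archive that stops at a file count and a nesting depth. It is not Macie's implementation. The function name `walk_archive`, the ZIP-only handling, and the way depth and files are counted (top-level archive members are depth 1, and a nested archive counts as one file) are assumptions made for illustration.

```python
import io
import zipfile

MAX_FILES = 1_000_000  # file-count limit stated in the text above
MAX_DEPTH = 10         # nesting-depth limit stated in the text above


def walk_archive(data, depth=1, counter=None,
                 max_files=MAX_FILES, max_depth=MAX_DEPTH):
    """Yield (path, depth) for each file in a possibly nested ZIP archive.

    Hypothetical helper: how Macie actually counts files and depth
    is not documented here; this counting scheme is an assumption.
    """
    if counter is None:
        counter = [0]  # shared, mutable count across all nesting levels
    if depth > max_depth:
        return  # too deeply nested: skip the contents of this archive
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            if counter[0] >= max_files:
                return  # file-count limit reached
            counter[0] += 1
            yield info.filename, depth
            # Only nested ZIP files are expanded in this sketch.
            if info.filename.lower().endswith(".zip"):
                nested = walk_archive(zf.read(info), depth + 1, counter,
                                      max_files, max_depth)
                for name, nested_depth in nested:
                    yield f"{info.filename}/{name}", nested_depth
```

The shared counter means the file limit applies to the whole archive tree rather than to each nested archive separately, which is one plausible reading of the limit.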

The following table lists and describes the types of file and storage formats that Macie can analyze to detect sensitive data. For each supported type, the table also lists the applicable file name extensions.

| File or storage type | Description | File name extensions |
|---|---|---|
| Big data | Apache Avro object containers and Apache Parquet files | .avro, .parquet |
| Compression or archive | GNU Zip compressed archives, TAR archives, and ZIP compressed archives | .gz, .gzip, .tar, .zip |
| Document | Adobe Portable Document Format files, Microsoft Excel workbooks, and Microsoft Word documents | .doc, .docx, .pdf, .xls, .xlsx |
| Text | Non-binary text files such as comma-separated values (CSV) files, Hypertext Markup Language (HTML) files, JavaScript Object Notation (JSON) files, JSON Lines files, plaintext documents, tab-separated values (TSV) files, and Extensible Markup Language (XML) files | .csv, .htm, .html, .json, .jsonl, .tsv, .txt, .xml, and others (depending on the type of non-binary text file) |

Macie doesn’t analyze data in images, audio, video, or other types of multimedia content.
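
As a rough pre-check before running a discovery job, the extensions listed in the table can be captured in a lookup. The sketch below is illustrative only: `SUPPORTED_FORMATS` and `classify_extension` are hypothetical names, and matching on the extension alone is an assumption. The text category also accepts other non-binary text extensions, so a result of `None` means "not in the listed extensions", not "unsupported".

```python
from pathlib import PurePosixPath

# Extensions copied from the table above, grouped by file or storage type.
SUPPORTED_FORMATS = {
    "Big data": {".avro", ".parquet"},
    "Compression or archive": {".gz", ".gzip", ".tar", ".zip"},
    "Document": {".doc", ".docx", ".pdf", ".xls", ".xlsx"},
    "Text": {".csv", ".htm", ".html", ".json", ".jsonl",
             ".tsv", ".txt", ".xml"},
}


def classify_extension(file_name):
    """Return the listed file or storage type for a file name, or None.

    Hypothetical helper: only the final extension is checked,
    case-insensitively.
    """
    suffix = PurePosixPath(file_name).suffix.lower()
    for format_type, extensions in SUPPORTED_FORMATS.items():
        if suffix in extensions:
            return format_type
    return None
```

For example, `classify_extension("exports/backup.tar.gz")` looks only at the final `.gz` suffix and returns `"Compression or archive"`, while a file name such as `photo.png` returns `None`.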

For information about the quotas that apply to sensitive data discovery, see Amazon Macie quotas.