Supported file and storage formats in Amazon Macie

When Amazon Macie analyzes data, it performs a deep inspection that takes into account the file or storage format of the data. Macie can analyze data in many different formats, including commonly used compression and archive formats. This support applies to both managed data identifiers and custom data identifiers.

When Macie analyzes a compressed or archive file, it inspects both the full file and the contents of the file. To inspect the file’s contents, it decompresses the file, and then inspects each extracted file that uses a supported format. Macie can do this for as many as 1,000,000 files and up to a nested depth of 10 levels.
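To make the two limits concrete, here is a minimal sketch of a recursive archive walk that stops at a file count and a nesting depth. It is an illustration only, not Macie's implementation: the names `walk_zip`, `make_zip`, `MAX_FILES`, and `MAX_DEPTH` are hypothetical, the sketch handles only ZIP archives, and it assumes that the contents of the top-level archive count as depth 1 and that the file limit applies to the whole walk. The source text does not specify how Macie counts either value.

```python
import io
import zipfile

MAX_FILES = 1_000_000  # file limit quoted in the text
MAX_DEPTH = 10         # nested-depth limit quoted in the text


def walk_zip(data: bytes, depth: int = 1, counter=None):
    """Yield (depth, name) for each file in a ZIP, recursing into nested ZIPs.

    Assumption: depth 1 is the content of the top-level archive.
    """
    if counter is None:
        counter = [0]  # shared mutable count of files seen across all levels
    if depth > MAX_DEPTH:
        return  # do not open archives nested deeper than the limit
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if counter[0] >= MAX_FILES:
                return  # stop once the file limit is reached
            counter[0] += 1
            yield depth, info.filename
            if info.filename.lower().endswith(".zip"):
                # Extract the nested archive in memory and walk it one level deeper.
                yield from walk_zip(archive.read(info), depth + 1, counter)


def make_zip(members: dict) -> bytes:
    """Build an in-memory ZIP from {name: bytes}; helper for trying the sketch."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


if __name__ == "__main__":
    inner = make_zip({"notes.txt": b"hello"})
    outer = make_zip({"data.csv": b"a,b\n1,2\n", "inner.zip": inner})
    for level, name in walk_zip(outer):
        print(level, name)
```

The sketch yields every extracted file. A closer model of the behavior described above would also skip extracted files that do not use a supported format.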

The following table lists and describes the file and storage formats that Macie can analyze to detect sensitive data, organized by type. For each supported type, it also lists the applicable file name extensions.

| File or storage type | Description | File name extensions |
|---|---|---|
| Big data | Apache Avro object containers and Apache Parquet files | .avro, .parquet |
| Compression or archive | GNU Zip compressed archives, TAR archives, and ZIP compressed archives | .gz, .gzip, .tar, .zip |
| Document | Adobe Portable Document Format files, Microsoft Excel workbooks, and Microsoft Word documents | .doc, .docx, .pdf, .xls, .xlsx |
| Text | Non-binary text files such as comma-separated values (CSV) files, Hypertext Markup Language (HTML) files, JavaScript Object Notation (JSON) files, JSON Lines files, plain-text documents, tab-separated values (TSV) files, and Extensible Markup Language (XML) files | .csv, .htm, .html, .json, .jsonl, .tsv, .txt, .xml, and others (depending on the type of non-binary text file) |
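As a rough local pre-check, the extensions listed above can be turned into a lookup that tells you which row an object key falls into. This is a sketch under stated assumptions: `SUPPORTED_EXTENSIONS` and `classify_by_extension` are hypothetical names, the dictionary keys are descriptive labels for the four rows, and the check looks only at the file name. Macie itself performs a deep inspection of the data, and the text row also covers other extensions that are not enumerated here, so a `None` result does not prove that Macie would skip the file.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions copied from the table above; the keys are descriptive row labels.
SUPPORTED_EXTENSIONS = {
    "Big data": {".avro", ".parquet"},
    "Compression or archive": {".gz", ".gzip", ".tar", ".zip"},
    "Document": {".doc", ".docx", ".pdf", ".xls", ".xlsx"},
    "Text": {".csv", ".htm", ".html", ".json", ".jsonl", ".tsv", ".txt", ".xml"},
}


def classify_by_extension(key: str) -> Optional[str]:
    """Return the row label for an object key, or None if its extension is not listed."""
    # Only the final suffix is considered, so "app.tar.gz" is matched on ".gz".
    suffix = PurePosixPath(key).suffix.lower()
    for file_type, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return file_type
    return None
```

For example, `classify_by_extension("reports/q1.XLSX")` returns `"Document"`, while a key such as `"img/photo.png"` returns `None`.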

Macie doesn’t analyze data in images, audio, video, or other types of multimedia content.

For information about the quotas that apply to sensitive data discovery, see Amazon Macie quotas.