Amazon EMR
Developer Guide

Impala-supported File and Compression Formats

Choosing the right file type and compression is key to optimizing the performance of your Impala cluster. With Impala, you can query data stored in the following file types:

  • Parquet

  • Avro

  • RCFile

  • SequenceFile

  • Unstructured text

In addition, Impala supports the following compression types:

  • Snappy

  • GZIP

  • LZO (text files only)

  • Deflate (all file types except Parquet and text)

  • BZIP2 (all file types except Parquet and text)
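The compatibility rules in the two lists can be captured as a small lookup table, which makes the restrictions on LZO, Deflate, and BZIP2 easy to check at a glance. The following is a minimal sketch, not part of Impala or Amazon EMR: the names FILE_FORMATS, CODEC_FORMATS, and is_supported are hypothetical, the lowercase labels (with "text" standing for unstructured text) were chosen for illustration, and it assumes that codecs listed without a qualifier (Snappy, GZIP) apply to every file type.

```python
from typing import Optional

# Hypothetical helper that transcribes the lists above; not an Impala or EMR API.
# "text" stands for unstructured text files.
FILE_FORMATS = frozenset({"parquet", "avro", "rcfile", "sequencefile", "text"})

# Codec -> file types it can be used with, per the list above.
# Assumption: codecs listed without a qualifier apply to every file type.
CODEC_FORMATS = {
    "snappy": FILE_FORMATS,
    "gzip": FILE_FORMATS,
    "lzo": frozenset({"text"}),
    "deflate": FILE_FORMATS - {"parquet", "text"},
    "bzip2": FILE_FORMATS - {"parquet", "text"},
}


def is_supported(file_format: str, codec: Optional[str] = None) -> bool:
    """Return True if the file type (and, if given, the codec) matches the lists above."""
    fmt = file_format.lower()
    if fmt not in FILE_FORMATS:
        return False
    if codec is None:
        return True  # no codec given: check the file type only
    allowed = CODEC_FORMATS.get(codec.lower())
    return allowed is not None and fmt in allowed


print(is_supported("text", "lzo"))      # True
print(is_supported("parquet", "lzo"))   # False
print(is_supported("avro", "deflate"))  # True
print(is_supported("text", "bzip2"))    # False
```

Expressing each codec as the set of file types it allows keeps the "except Parquet and text" rules as simple set differences, so the table stays a direct transcription of the lists.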

Depending on the file type and compression, you may need to use Hive instead of Impala to create a table or load data.