Types of documents - Amazon Kendra

Types of documents

An index can contain multiple documents and multiple types of documents. An index can also include both structured and unstructured text.

Structured text

  • Frequently asked questions and answers

Documents in frequently asked question and answer format are used to answer questions such as "How tall is the Space Needle?" You can specify multiple questions that return the same answer. You specify the questions and answers in a comma-separated values (CSV) file stored in an Amazon S3 bucket.

If you use a JSON file for your set of frequently asked questions and answers, note that JSON files are treated as plain text, which is a type of unstructured text.

For an example, see Adding questions and answers directly to an index.
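As a rough illustration of the CSV layout described above, the following Python sketch writes one row per question, so that several questions can share one answer. The two-column layout (question, then answer), the function name build_faq_csv, and the sample answer text are assumptions made for illustration; check the required column layout before uploading the file to your Amazon S3 bucket.

```python
import csv
import io


def build_faq_csv(entries):
    """Return CSV text with one (question, answer) row per question.

    `entries` is an iterable of (questions, answer) pairs, where
    `questions` is a list of question strings that share `answer`.
    The two-column layout is an assumption for this sketch.
    """
    buffer = io.StringIO()
    # csv.writer quotes fields that contain commas or quotation marks.
    writer = csv.writer(buffer, lineterminator="\n")
    for questions, answer in entries:
        for question in questions:
            writer.writerow([question, answer])
    return buffer.getvalue()


# Two phrasings of the same question map to one answer (sample data).
faq_csv = build_faq_csv([
    (
        ["How tall is the Space Needle?",
         "What is the height of the Space Needle?"],
        "605 feet",
    ),
])
print(faq_csv)
```

Using the csv module instead of joining strings by hand keeps answers that contain commas in a single column.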

Unstructured text

Amazon Kendra supports the following types of unstructured text:

  • HTML files

  • Microsoft PowerPoint (PPT) presentations

  • Microsoft Word (MS WORD) documents

  • Plain text documents

  • PDFs

  • Comma-separated values (CSV) files

  • Microsoft Excel (MS EXCEL) files

  • XML files

  • JSON files

  • Markdown (MD) files

  • Rich Text Format (RTF) files

  • Extensible Stylesheet Language Transformation (XSLT) files
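When preparing files for an index, it can help to label each file with one of the types listed above. The Python sketch below guesses a label from the file extension. The extension-to-type table and the name guess_content_type are assumptions made for illustration, and the labels simply reuse the abbreviations from the list (they are assumed here to correspond to the content type values that the service expects, so verify them against the API reference before relying on them).

```python
from pathlib import PurePath

# Hypothetical mapping from file extension to a document type label.
# The labels reuse the abbreviations from the list of supported types.
CONTENT_TYPE_BY_EXTENSION = {
    ".html": "HTML",
    ".htm": "HTML",
    ".ppt": "PPT",
    ".pptx": "PPT",
    ".doc": "MS_WORD",
    ".docx": "MS_WORD",
    ".txt": "PLAIN_TEXT",
    ".pdf": "PDF",
    ".csv": "CSV",
    ".xls": "MS_EXCEL",
    ".xlsx": "MS_EXCEL",
    ".xml": "XML",
    ".json": "JSON",
    ".md": "MD",
    ".rtf": "RTF",
    ".xslt": "XSLT",
    ".xsl": "XSLT",
}


def guess_content_type(filename):
    """Return the type label for `filename`, or None if the extension is unknown."""
    suffix = PurePath(filename).suffix.lower()
    return CONTENT_TYPE_BY_EXTENSION.get(suffix)


for name in ["report.PDF", "faq.json", "slides.pptx", "archive.zip"]:
    print(name, "->", guess_content_type(name))
```

Returning None for an unknown extension lets the caller decide whether to skip the file or fall back to plain text.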

You can add unstructured documents to your index in three ways: