Types of documents - Amazon Kendra

Types of documents

An index can include both structured and unstructured text:

  • Structured text

    • Frequently asked questions and answers

  • Unstructured text

    • HTML files

    • Microsoft PowerPoint presentations

    • Microsoft Word documents

    • Plain text documents

    • PDFs

You can add documents directly to an index by calling the BatchPutDocument operation. You can also add documents from a data source. For information about adding files to a data source, see Adding documents from a data source. For an example that shows how to add Microsoft Word documents directly to an index from an Amazon S3 bucket, see Adding documents from an Amazon S3 bucket.

An index can contain multiple documents and multiple types of documents.

HTML

You add HTML files to an index the same way that you add plain text files: by using the BatchPutDocument operation or from a data source.

Plain text

You can add plain text files to an index using the BatchPutDocument operation or from a data source. For an example of adding a plain text document directly to an index, see Adding documents with the API.
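As a minimal sketch of the direct route, the following Python builds one entry for a BatchPutDocument request from a plain text string. The helper names (build_plain_text_document, put_documents) and the sample ID and title are hypothetical. The field names follow the boto3 Kendra client as best understood here, so check them against the current API reference before relying on them.

```python
def build_plain_text_document(doc_id, title, text):
    """Return one entry for the Documents list of a BatchPutDocument request."""
    return {
        "Id": doc_id,
        "Title": title,
        # The text is sent inline as bytes rather than read from a data source.
        "Blob": text.encode("utf-8"),
        "ContentType": "PLAIN_TEXT",
    }


def put_documents(index_id, documents):
    """Send the documents to an index (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without the SDK

    kendra = boto3.client("kendra")
    return kendra.batch_put_document(IndexId=index_id, Documents=documents)


# Hypothetical sample values, for illustration only.
document = build_plain_text_document(
    "doc-001", "Release notes", "Version 2 adds search filters."
)
```

Keeping the request construction separate from the network call makes the payload easy to inspect before anything is sent. An HTML file could presumably be added the same way with a ContentType of "HTML".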

Microsoft Word document

Microsoft Word format files can be added to an index as binary data, from an Amazon S3 bucket, or from an Amazon Kendra data source.

Microsoft PowerPoint document

Microsoft PowerPoint format files can be added to an index as binary data, from an Amazon S3 bucket, or from an Amazon Kendra data source.

Portable document format (PDF)

PDF format files can be added to an index as binary data, from an Amazon S3 bucket, or from an Amazon Kendra data source.
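The binary-data and Amazon S3 routes described above for Word, PowerPoint, and PDF files differ only in how the content is referenced. The sketch below shows both forms of a document entry and picks the content type from the file extension. The extension-to-type mapping and the helper names are assumptions made for illustration, not an official list of supported extensions.

```python
from pathlib import PurePosixPath

# Assumed mapping from file extension to content type (illustrative only).
CONTENT_TYPES = {
    ".html": "HTML",
    ".txt": "PLAIN_TEXT",
    ".docx": "MS_WORD",
    ".pptx": "PPT",
    ".pdf": "PDF",
}


def content_type_for(name):
    """Look up the content type for a file name or S3 key."""
    suffix = PurePosixPath(name).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return CONTENT_TYPES[suffix]


def document_from_bytes(doc_id, filename, data):
    """Document entry that carries the file content inline as binary data."""
    return {"Id": doc_id, "Blob": data, "ContentType": content_type_for(filename)}


def document_from_s3(bucket, key):
    """Document entry that points at an object in an Amazon S3 bucket."""
    return {
        "Id": f"s3://{bucket}/{key}",
        "S3Path": {"Bucket": bucket, "Key": key},
        "ContentType": content_type_for(key),
    }
```

Because an index can hold several document types at once, entries built by both helpers can go into the same Documents list of a single request.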

Frequently asked questions and answers

Documents in frequently asked questions and answers format are used to answer questions such as "How tall is the Space Needle?" You can specify multiple questions that return the same answer. You specify the questions and answers in a comma-separated values (CSV) file stored in an Amazon S3 bucket.

For an example, see Adding questions and answers directly to an index.
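To illustrate how several questions can share one answer, the sketch below writes FAQ data as CSV text using Python's standard csv module. It assumes a layout of one question and one answer per row, with the answer repeated for each question that maps to it. The function name build_faq_csv and the sample rows are hypothetical, and uploading the file to Amazon S3 and registering it with the index are not shown.

```python
import csv
import io


def build_faq_csv(answers_to_questions):
    """Return CSV text with one (question, answer) row per question."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for answer, questions in answers_to_questions.items():
        for question in questions:
            # Repeat the answer on each row so every question returns it.
            writer.writerow([question, answer])
    return buffer.getvalue()


# Sample data for illustration only.
faq_csv = build_faq_csv(
    {
        "The Space Needle is 605 feet tall.": [
            "How tall is the Space Needle?",
            "What is the height of the Space Needle?",
        ],
    }
)
```

Using csv.writer rather than joining strings by hand means that questions or answers containing commas or quotation marks are quoted correctly.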