Types of documents
An index can include both structured and unstructured text:
- Structured text
  - Frequently asked questions and answers
- Unstructured text
  - HTML files
  - Microsoft PowerPoint presentations
  - Microsoft Word documents
  - Plain text documents
  - PDFs
You can add documents directly to an index by calling the BatchPutDocument operation. You can also add documents from a data source. For information about adding files to a data source, see Adding documents from a data source. For an example that shows how to add Microsoft Word documents directly to an index from an Amazon S3 bucket, see Adding documents from an Amazon S3 bucket.
An index can contain multiple documents and multiple types of documents.
HTML
You add an HTML format file to an index the same way that you add a plain text file.
Plain text
You can add plain text files to an index using the BatchPutDocument operation or from a data source. For an example of adding a plain text document directly to an index, see Adding documents with the API.
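As a minimal sketch of the direct route, the following Python builds one plain text entry and passes it to BatchPutDocument through the AWS SDK for Python (boto3). The helper names, the default Region, and the sample values are assumptions made for illustration; only the `batch_put_document` call and its `IndexId` and `Documents` parameters come from the API.

```python
def build_plain_text_document(doc_id, title, text):
    """Build one entry for the Documents list of BatchPutDocument.

    Hypothetical helper: the text is sent inline as UTF-8 bytes in Blob.
    """
    return {
        "Id": doc_id,
        "Title": title,
        "Blob": text.encode("utf-8"),
        "ContentType": "PLAIN_TEXT",
    }


def add_documents(index_id, documents, region_name="us-west-2"):
    """Send the documents to an index and return any that failed.

    The Region default is an assumption; use the Region of your index.
    """
    import boto3  # imported here so the builder above works without the SDK

    kendra = boto3.client("kendra", region_name=region_name)
    response = kendra.batch_put_document(IndexId=index_id, Documents=documents)
    # FailedDocuments lists entries that could not be accepted for indexing.
    return response.get("FailedDocuments", [])
```

A call might look like `add_documents("your-index-id", [build_plain_text_document("doc-1", "Notes", "Some text")])`, where the index ID is a placeholder for your own.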
Microsoft Word document
Microsoft Word format files can be added to an index as binary data, from an Amazon S3 bucket, or from an Amazon Kendra data source.
Microsoft PowerPoint document
Microsoft PowerPoint format files can be added to an index as binary data, from an Amazon S3 bucket, or from an Amazon Kendra data source.
Portable document format (PDF)
PDF format files can be added to an index as binary data, from an Amazon S3 bucket, or from an Amazon Kendra data source.
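The binary data and Amazon S3 options for these formats differ only in how the document entry is built. The sketch below shows one possible way to construct either kind of entry for BatchPutDocument. The extension-to-type mapping and the helper names are assumptions for illustration, and the `ContentType` values should be checked against the current API reference.

```python
from pathlib import PurePosixPath

# Assumed mapping from file extension to the ContentType value
# that BatchPutDocument expects for each document format.
CONTENT_TYPES = {
    ".pdf": "PDF",
    ".html": "HTML",
    ".htm": "HTML",
    ".docx": "MS_WORD",
    ".ppt": "PPT",
    ".pptx": "PPT",
    ".txt": "PLAIN_TEXT",
}


def content_type_for(name):
    """Return the ContentType for a file name or S3 key."""
    suffix = PurePosixPath(name).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"Unsupported document type: {name!r}")
    return CONTENT_TYPES[suffix]


def document_from_bytes(doc_id, filename, data):
    """Entry that carries the file contents inline as binary data."""
    return {
        "Id": doc_id,
        "Blob": data,
        "ContentType": content_type_for(filename),
    }


def document_from_s3(bucket, key):
    """Entry that points at a file stored in an Amazon S3 bucket."""
    return {
        "Id": f"s3://{bucket}/{key}",  # ID scheme is an arbitrary choice
        "S3Path": {"Bucket": bucket, "Key": key},
        "ContentType": content_type_for(key),
    }
```

Either kind of entry goes into the `Documents` list of a BatchPutDocument request. When the entry uses `S3Path`, the request also needs an IAM role that can read the bucket.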
Frequently asked questions and answers
Frequently asked question and answer (FAQ) documents answer questions such as "How tall is the Space Needle?" You can specify multiple questions that return the same answer. You specify the questions and answers in a comma-separated values (CSV) file stored in an Amazon S3 bucket.
For an example, see Adding questions and answers directly to an index.
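To illustrate how several questions can share one answer, the sketch below writes a FAQ CSV in which each row holds a question followed by its answer, repeating the answer for every phrasing. The two-column, headerless layout and the helper names are assumptions; confirm the expected columns in the FAQ documentation before uploading the file to Amazon S3.

```python
import csv
import io


def build_faq_csv(entries):
    """Return CSV text with one "question,answer" row per question.

    entries is an iterable of (questions, answer) pairs, where questions
    is a list of phrasings that should all return the same answer.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for questions, answer in entries:
        for question in questions:
            writer.writerow([question, answer])
    return buffer.getvalue()


def register_faq(index_id, name, bucket, key, role_arn):
    """Point an index at a FAQ CSV file already uploaded to Amazon S3."""
    import boto3  # imported here so build_faq_csv works without the SDK

    kendra = boto3.client("kendra")
    return kendra.create_faq(
        IndexId=index_id,
        Name=name,
        S3Path={"Bucket": bucket, "Key": key},
        RoleArn=role_arn,
        FileFormat="CSV",
    )
```

The `csv` module quotes any question or answer that contains a comma, so free-form text stays in the right column.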