Index
An index holds the contents of your documents and is structured so that the documents are searchable. The way you add documents to the index depends on how you store your documents.
-
If you store your documents in some kind of repository, such as an Amazon S3 bucket or a Microsoft SharePoint site, you use a data source connector to index your documents from your repository.
-
If you don't store your documents in a repository, you use the BatchPutDocument API to directly index your documents.
-
For FAQ questions and answers, which must be stored in an Amazon S3 bucket, you upload them from the bucket.
You can create indexes with the Amazon Kendra console, the AWS CLI, or an AWS SDK. For information about the types of documents that can be indexed, see Types of documents.
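The direct-indexing path can be sketched as follows. This is a minimal sketch that assumes the AWS SDK for Python (boto3); the helper name `build_batch_put_request`, the index ID, and the document values are hypothetical placeholders. The helper only builds the request, so it runs without AWS credentials.

```python
def build_batch_put_request(index_id, doc_id, title, text):
    """Build keyword arguments for the Amazon Kendra BatchPutDocument API.

    index_id and doc_id are placeholder values supplied by the caller.
    """
    return {
        "IndexId": index_id,
        "Documents": [
            {
                "Id": doc_id,
                "Title": title,
                # Inline document content is passed as bytes in Blob.
                "Blob": text.encode("utf-8"),
                "ContentType": "PLAIN_TEXT",
            }
        ],
    }


request = build_batch_put_request(
    index_id="my-index-id",  # hypothetical index ID
    doc_id="doc-001",
    title="Getting started",
    text="An index holds the contents of your documents.",
)

# Assumed usage with boto3 (not executed here):
#   import boto3
#   boto3.client("kendra").batch_put_document(**request)
```

Keeping request construction separate from the API call makes the payload easy to inspect before anything is sent.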
Using Amazon Kendra built-in document fields
With the UpdateIndex API, you can create reserved or built-in fields by using DocumentMetadataConfigurationUpdates and specifying the reserved field name to map to your equivalent document attribute or document field. You can also create custom fields this way. Most data source connectors include field mappings that map your data source document fields to Amazon Kendra index fields. When you create a field, you can configure the Search object to set the field as displayable, facetable, searchable, and sortable. You can configure the Relevance object to set the field's rank order, boost duration, freshness, importance, and value importance map. You cannot change the field type once you have created the field.
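As an illustration of the Search and Relevance settings, the following is a minimal sketch of an UpdateIndex request built as a plain dictionary. The helper `field_config`, the index ID, and the chosen importance and duration values are assumptions made for this example, not recommended settings.

```python
def field_config(name, field_type, search=None, relevance=None):
    """Build one DocumentMetadataConfigurationUpdates entry."""
    config = {"Name": name, "Type": field_type}
    if search is not None:
        config["Search"] = search
    if relevance is not None:
        config["Relevance"] = relevance
    return config


update_request = {
    "Id": "my-index-id",  # hypothetical index ID
    "DocumentMetadataConfigurationUpdates": [
        field_config(
            "_category",
            "STRING_VALUE",
            search={
                "Displayable": True,
                "Facetable": True,
                "Searchable": True,
                "Sortable": True,
            },
            relevance={"Importance": 5},
        ),
        field_config(
            "_created_at",
            "DATE_VALUE",
            search={"Displayable": True, "Sortable": True},
            # Boost recently created documents; the duration is in seconds.
            relevance={"Freshness": True, "Importance": 3, "Duration": "2628000s"},
        ),
    ],
}

# Assumed usage with boto3 (not executed here):
#   import boto3
#   boto3.client("kendra").update_index(**update_request)
```

Because the field type cannot be changed after creation, it is worth reviewing the `Type` of each entry before sending the request.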
Amazon Kendra has the following reserved or built-in document fields that you can use:
-
_authors: A list of one or more authors responsible for the content of the document.
-
_category: A category that places a document in a specific group.
-
_created_at: The date and time in ISO 8601 format that the document was created. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
-
_data_source_id: The identifier of the data source that contains the document.
-
_document_body: The content of the document.
-
_document_id: A unique identifier for the document.
-
_document_title: The title of the document.
-
_excerpt_page_number: The page number in a PDF file where the document excerpt appears. If your index was created before September 8, 2020, you must re-index your documents before you can use this attribute.
-
_faq_id: If this is an FAQ question and answer, a unique identifier for them.
-
_file_type: The file type of the document, such as pdf or doc.
-
_last_updated_at: The date and time in ISO 8601 format that the document was last updated. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
-
_source_uri: The URI where the document is available. For example, the URI of the document on a company website.
-
_version: An identifier for the specific version of a document.
-
_view_count: The number of times that the document has been viewed.
-
_language_code (String): The code for a language that applies to the document. This defaults to English if you do not specify a language. For more information on supported languages, including their codes, see Adding documents in languages other than English.
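The ISO 8601 value used in the _created_at and _last_updated_at examples can be produced with the Python standard library. This is a small sketch; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# 12:30:10 on March 25, 2012, at a UTC offset of +01:00.
created_at = datetime(2012, 3, 25, 12, 30, 10, tzinfo=timezone(timedelta(hours=1)))

# isoformat() emits the date, the time, and the UTC offset.
iso_value = created_at.isoformat()
print(iso_value)  # prints 2012-03-25T12:30:10+01:00

# The string parses back to the same instant.
parsed = datetime.fromisoformat(iso_value)
```

Attaching an explicit offset to the datetime avoids ambiguity about which time zone the timestamp refers to.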
For custom fields, you create these fields using DocumentMetadataConfigurationUpdates with the UpdateIndex API, just as you do when creating a reserved or built-in field. You must set the appropriate data type for your custom field. You cannot change the field type once you have created the field.
The following are the types you can set for custom fields:
-
Date
-
Number
-
String
-
String list
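In the API, these four types correspond to the Type values DATE_VALUE, LONG_VALUE, STRING_VALUE, and STRING_LIST_VALUE. The following is a minimal sketch of building custom field definitions from the labels in the list above; the helper `custom_field` and the field names are hypothetical.

```python
# Custom field types from the list above, paired with the Type values
# that the Amazon Kendra API uses for them.
CUSTOM_FIELD_TYPES = {
    "Date": "DATE_VALUE",
    "Number": "LONG_VALUE",
    "String": "STRING_VALUE",
    "String list": "STRING_LIST_VALUE",
}


def custom_field(name, type_label):
    """Build a DocumentMetadataConfigurationUpdates entry for a custom field."""
    if type_label not in CUSTOM_FIELD_TYPES:
        raise ValueError(f"unsupported custom field type: {type_label!r}")
    return {"Name": name, "Type": CUSTOM_FIELD_TYPES[type_label]}


# Hypothetical custom fields, one per supported type.
custom_fields = [
    custom_field("review_date", "Date"),
    custom_field("page_count", "Number"),
    custom_field("department", "String"),
    custom_field("tags", "String list"),
]
```

Validating the type label up front matters because the field type cannot be changed once the field exists.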
You create a custom field using the console or by using the UpdateIndex API. After you create a custom field, you map it to a document attribute, just as you do with a reserved field. If you added a document to the index with the BatchPutDocument API, you map the attributes with the API. For documents indexed from an Amazon S3 data source, you map the attributes using a metadata file that contains a JSON structure that describes the document attributes. For documents indexed with a database or a data source that allows field mapping, you map attributes with the console or the data source configuration.
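The first two mapping paths can be sketched as follows. This is a hedged example: the helper `to_attribute`, the custom field names, and all values are hypothetical, and the metadata layout shown (DocumentId, Title, Attributes) is an assumption about the S3 metadata file that should be checked against the Amazon S3 data source documentation.

```python
import json
from datetime import datetime, timezone


def to_attribute(key, value):
    """Convert a Python value into a BatchPutDocument attribute entry."""
    # bool is a subclass of int, so reject it before the int check.
    if isinstance(value, bool):
        raise TypeError("boolean values are not a supported attribute type")
    if isinstance(value, datetime):
        return {"Key": key, "Value": {"DateValue": value}}
    if isinstance(value, int):
        return {"Key": key, "Value": {"LongValue": value}}
    if isinstance(value, str):
        return {"Key": key, "Value": {"StringValue": value}}
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return {"Key": key, "Value": {"StringListValue": value}}
    raise TypeError(f"unsupported attribute value for {key!r}")


# Attributes for a document sent with BatchPutDocument.
attributes = [
    to_attribute("_category", "guides"),
    to_attribute("page_count", 12),
    to_attribute("tags", ["search", "index"]),
    to_attribute("_created_at", datetime(2012, 3, 25, 11, 30, 10, tzinfo=timezone.utc)),
]

# Metadata for a document indexed from an Amazon S3 data source,
# serialized as the JSON structure stored next to the document.
metadata = {
    "DocumentId": "doc-001",
    "Title": "Getting started",
    "Attributes": {
        "_category": "guides",
        "_created_at": "2012-03-25T12:30:10+01:00",
        "department": "engineering",
    },
}
metadata_json = json.dumps(metadata, indent=2)
```

In the BatchPutDocument case the list would be passed as the `Attributes` value of a document; in the S3 case the JSON text would be saved as the document's metadata file.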
Searching indexes
After you create an index, you can start searching your documents. For more information, see Searching indexes.