Document
A document in an index.
Contents
- Id
-
A identifier of the document in the index.
Note, each document ID must be unique per index. You cannot create a data source to index your documents with their unique IDs and then use the
BatchPutDocument
API to index the same documents, or vice versa. You can delete a data source and then use theBatchPutDocument
API to index the same documents, or vice versa.Type: String
Length Constraints: Minimum length of 1. Maximum length of 2048.
Required: Yes
- AccessControlConfigurationId
-
The identifier of the access control configuration that you want to apply to the document.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 36.
Pattern:
[a-zA-Z0-9-]+
Required: No
- AccessControlList
-
Information on principals (users and/or groups) and which documents they should have access to. This is useful for user context filtering, where search results are filtered based on the user or their group access to documents.
Type: Array of Principal objects
Required: No
- Attributes
-
Custom attributes to apply to the document. Use the custom attributes to provide additional information for searching, to provide facets for refining searches, and to provide additional information in the query response.
For example, 'DataSourceId' and 'DataSourceSyncJobId' are custom attributes that provide information on the synchronization of documents running on a data source. Note, 'DataSourceSyncJobId' could be an optional custom attribute as Amazon Kendra will use the ID of a running sync job.
Type: Array of DocumentAttribute objects
Required: No
- Blob
-
The contents of the document.
Documents passed to the
Blob
parameter must be base64 encoded. Your code might not need to encode the document file bytes if you're using an AWS SDK to call Amazon Kendra APIs. If you are calling the Amazon Kendra endpoint directly using REST, you must base64 encode the contents before sending.Type: Base64-encoded binary data object
Required: No
- ContentType
-
The file type of the document in the
Blob
field.If you want to index snippets or subsets of HTML documents instead of the entirety of the HTML documents, you must add the
HTML
start and closing tags (<HTML>content</HTML>
) around the content.Type: String
Valid Values:
PDF | HTML | MS_WORD | PLAIN_TEXT | PPT | RTF | XML | XSLT | MS_EXCEL | CSV | JSON | MD
Required: No
- HierarchicalAccessControlList
-
The list of principal lists that define the hierarchy for which documents users should have access to.
Type: Array of HierarchicalPrincipal objects
Array Members: Minimum number of 1 item. Maximum number of 30 items.
Required: No
- S3Path
-
Information required to find a specific file in an Amazon S3 bucket.
Type: S3Path object
Required: No
- Title
-
The title of the document.
Type: String
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: