BatchPutDocument
Adds one or more documents to an index.
The BatchPutDocument API enables you to ingest inline documents or a set of documents stored in an Amazon S3 bucket. Use this API to ingest your text and unstructured text into an index, add custom attributes to the documents, and to attach an access control list to the documents added to the index.
The documents are indexed asynchronously. You can see the progress of the batch using AWS CloudWatch. Any error messages related to processing the batch are sent to your AWS CloudWatch log. You can also use the BatchGetDocumentStatus API to monitor the progress of indexing your documents.
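Because indexing is asynchronous, a caller typically polls for completion. The following is a minimal sketch of such a loop, not an official sample. It assumes `client` is a boto3 Kendra client (for example `boto3.client("kendra")`), that the response carries a `DocumentStatusList` with `DocumentId` and `DocumentStatus` fields, and that `PROCESSING` is the only non-final status; verify these against the BatchGetDocumentStatus reference. The function name and its parameters are hypothetical.

```python
import time


def wait_for_indexing(client, index_id, document_ids,
                      poll_seconds=5.0, max_polls=20, sleep=time.sleep):
    """Poll BatchGetDocumentStatus until no document is still PROCESSING.

    `client` is assumed to be a boto3 Kendra client. Returns a dict that
    maps each reported document ID to its last observed status.
    """
    statuses = {}
    for _ in range(max_polls):
        response = client.batch_get_document_status(
            IndexId=index_id,
            DocumentInfoList=[{"DocumentId": d} for d in document_ids],
        )
        statuses = {
            item["DocumentId"]: item["DocumentStatus"]
            for item in response.get("DocumentStatusList", [])
        }
        # Treat a missing entry as still in progress (an assumption).
        if all(statuses.get(d, "PROCESSING") != "PROCESSING" for d in document_ids):
            break
        sleep(poll_seconds)
    return statuses
```

The `sleep` parameter is injectable only so the loop can be exercised without real delays.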
For an example of ingesting inline documents using Python and Java SDKs, see Adding files directly to an index.
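As a rough illustration of the inline case, the sketch below builds a plain-text document and submits it. It assumes `client` is a boto3 Kendra client and that `PLAIN_TEXT` is an accepted `ContentType` value; the helper names and the sample IDs are hypothetical.

```python
def build_inline_document(doc_id, title, text):
    """Return a Document dict that carries its content inline in Blob."""
    return {
        "Id": doc_id,
        "Title": title,
        "Blob": text.encode("utf-8"),   # raw bytes of the document body
        "ContentType": "PLAIN_TEXT",    # assumed enum value
    }


def put_inline_documents(client, index_id, documents):
    """Submit documents and return the FailedDocuments list (may be empty).

    `client` is assumed to be created with boto3.client("kendra").
    No RoleArn is passed because nothing is read from Amazon S3 here.
    """
    response = client.batch_put_document(IndexId=index_id, Documents=documents)
    return response.get("FailedDocuments", [])


# Hypothetical usage:
#   import boto3
#   client = boto3.client("kendra")
#   doc = build_inline_document("doc-1", "Greeting", "Hello, Kendra.")
#   failed = put_inline_documents(client, "<your-index-id>", [doc])
```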
Request Syntax
{
   "CustomDocumentEnrichmentConfiguration": {
      "InlineConfigurations": [
         {
            "Condition": {
               "ConditionDocumentAttributeKey": "string",
               "ConditionOnValue": {
                  "DateValue": number,
                  "LongValue": number,
                  "StringListValue": [ "string" ],
                  "StringValue": "string"
               },
               "Operator": "string"
            },
            "DocumentContentDeletion": boolean,
            "Target": {
               "TargetDocumentAttributeKey": "string",
               "TargetDocumentAttributeValue": {
                  "DateValue": number,
                  "LongValue": number,
                  "StringListValue": [ "string" ],
                  "StringValue": "string"
               },
               "TargetDocumentAttributeValueDeletion": boolean
            }
         }
      ],
      "PostExtractionHookConfiguration": {
         "InvocationCondition": {
            "ConditionDocumentAttributeKey": "string",
            "ConditionOnValue": {
               "DateValue": number,
               "LongValue": number,
               "StringListValue": [ "string" ],
               "StringValue": "string"
            },
            "Operator": "string"
         },
         "LambdaArn": "string",
         "S3Bucket": "string"
      },
      "PreExtractionHookConfiguration": {
         "InvocationCondition": {
            "ConditionDocumentAttributeKey": "string",
            "ConditionOnValue": {
               "DateValue": number,
               "LongValue": number,
               "StringListValue": [ "string" ],
               "StringValue": "string"
            },
            "Operator": "string"
         },
         "LambdaArn": "string",
         "S3Bucket": "string"
      },
      "RoleArn": "string"
   },
   "Documents": [
      {
         "AccessControlConfigurationId": "string",
         "AccessControlList": [
            {
               "Access": "string",
               "DataSourceId": "string",
               "Name": "string",
               "Type": "string"
            }
         ],
         "Attributes": [
            {
               "Key": "string",
               "Value": {
                  "DateValue": number,
                  "LongValue": number,
                  "StringListValue": [ "string" ],
                  "StringValue": "string"
               }
            }
         ],
         "Blob": blob,
         "ContentType": "string",
         "HierarchicalAccessControlList": [
            {
               "PrincipalList": [
                  {
                     "Access": "string",
                     "DataSourceId": "string",
                     "Name": "string",
                     "Type": "string"
                  }
               ]
            }
         ],
         "Id": "string",
         "S3Path": {
            "Bucket": "string",
            "Key": "string"
         },
         "Title": "string"
      }
   ],
   "IndexId": "string",
   "RoleArn": "string"
}
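To make the syntax above more concrete, here is a sketch of one populated request for a document stored in Amazon S3, with a custom attribute, an access control entry, and a single inline enrichment rule. The attribute keys (`Department`, `Sensitivity`), the group name, and the function name are hypothetical. The enum values `Equals`, `GROUP`, and `ALLOW` are assumed from the related data types and should be checked against their reference pages.

```python
import json


def build_s3_request(index_id, role_arn, bucket, key, doc_id):
    """Return keyword arguments shaped like the BatchPutDocument request."""
    document = {
        "Id": doc_id,
        "S3Path": {"Bucket": bucket, "Key": key},
        # Custom attribute attached to the document (hypothetical key).
        "Attributes": [
            {"Key": "Department", "Value": {"StringValue": "HR"}},
        ],
        # Access control entry; enum values are assumptions.
        "AccessControlList": [
            {"Name": "hr-staff", "Type": "GROUP", "Access": "ALLOW"},
        ],
    }
    enrichment = {
        "InlineConfigurations": [
            {
                # When Department equals "HR" ...
                "Condition": {
                    "ConditionDocumentAttributeKey": "Department",
                    "Operator": "Equals",
                    "ConditionOnValue": {"StringValue": "HR"},
                },
                # ... set Sensitivity to "Confidential".
                "Target": {
                    "TargetDocumentAttributeKey": "Sensitivity",
                    "TargetDocumentAttributeValue": {"StringValue": "Confidential"},
                },
            }
        ]
    }
    return {
        "IndexId": index_id,
        "RoleArn": role_arn,  # role that can read the S3 bucket
        "Documents": [document],
        "CustomDocumentEnrichmentConfiguration": enrichment,
    }


request = build_s3_request(
    "12345678-1234-1234-1234-123456789012",
    "arn:aws:iam::123456789012:role/KendraRole",
    "example-bucket", "docs/handbook.pdf", "handbook",
)
print(json.dumps(request, indent=2))
```

With a boto3 Kendra client, such a dict could presumably be passed as `client.batch_put_document(**request)`.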
Request Parameters
For information about the parameters that are common to all actions, see Common Parameters.
The request accepts the following data in JSON format.
- CustomDocumentEnrichmentConfiguration
-
Configuration information for altering your document metadata and content during the document ingestion process when you use the BatchPutDocument API.
For more information on how to create, modify and delete document metadata, or make other content alterations when you ingest documents into Amazon Kendra, see Customizing document metadata during the ingestion process.
Type: CustomDocumentEnrichmentConfiguration object
Required: No
- Documents
-
One or more documents to add to the index.
Documents have the following file size limits.
- 50 MB total size for any file
- 5 MB extracted text for any file
For more information, see Quotas.
Type: Array of Document objects
Array Members: Minimum number of 1 item. Maximum number of 10 items.
Required: Yes
- IndexId
-
The identifier of the index to add the documents to. You need to create the index first using the CreateIndex API.
Type: String
Length Constraints: Fixed length of 36.
Pattern:
[a-zA-Z0-9][a-zA-Z0-9-]*
Required: Yes
- RoleArn
-
The Amazon Resource Name (ARN) of an IAM role with permission to access your S3 bucket. For more information, see IAM access roles for Amazon Kendra.
Type: String
Length Constraints: Minimum length of 0. Maximum length of 1284.
Pattern:
arn:[a-z0-9-\.]{1,63}:[a-z0-9-\.]{0,63}:[a-z0-9-\.]{0,63}:[a-z0-9-\.]{0,63}:[^/].{0,1023}
Required: No
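The constraints listed above can be checked on the client before a request is sent. The sketch below uses the documented patterns and limits (IndexId fixed at 36 characters, RoleArn up to 1284 characters, at most 10 documents per call). The function names are hypothetical, and passing these checks does not guarantee that the service accepts the request.

```python
import re

# Patterns copied from the parameter descriptions above.
INDEX_ID_PATTERN = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9-]*")
ROLE_ARN_PATTERN = re.compile(
    r"arn:[a-z0-9-\.]{1,63}:[a-z0-9-\.]{0,63}:[a-z0-9-\.]{0,63}:"
    r"[a-z0-9-\.]{0,63}:[^/].{0,1023}"
)
MAX_BATCH_SIZE = 10  # Documents: maximum number of 10 items


def is_valid_index_id(index_id):
    """Fixed length of 36 and a full match of the IndexId pattern."""
    return len(index_id) == 36 and INDEX_ID_PATTERN.fullmatch(index_id) is not None


def is_valid_role_arn(role_arn):
    """At most 1284 characters and a full match of the RoleArn pattern."""
    return len(role_arn) <= 1284 and ROLE_ARN_PATTERN.fullmatch(role_arn) is not None


def chunk_documents(documents, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of at most `size` documents, one per request."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]


batches = list(chunk_documents([{"Id": str(n)} for n in range(25)]))
print([len(b) for b in batches])  # [10, 10, 5]
```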
Response Syntax
{
   "FailedDocuments": [
      {
         "ErrorCode": "string",
         "ErrorMessage": "string",
         "Id": "string"
      }
   ]
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
- FailedDocuments
-
A list of documents that were not added to the index because the document failed a validation check. Each document contains an error message that indicates why the document couldn't be added to the index.
If there was an error adding a document to an index, the error is reported in your AWS CloudWatch log. For more information, see Monitoring Amazon Kendra with Amazon CloudWatch logs.
Type: Array of BatchPutDocumentResponseFailedDocument objects
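One way to act on this response is to separate the documents that failed validation from those that were accepted for indexing. The sketch below assumes `response` is the parsed JSON shown under Response Syntax; the function name is hypothetical. Note that an accepted document can still fail later during asynchronous indexing, which is reported in the CloudWatch log rather than here.

```python
def split_by_validation(response, submitted_ids):
    """Return (accepted_ids, failures) for one BatchPutDocument response.

    `failures` maps a document ID to its (ErrorCode, ErrorMessage) pair.
    """
    failures = {
        item["Id"]: (item.get("ErrorCode"), item.get("ErrorMessage"))
        for item in response.get("FailedDocuments", [])
    }
    accepted = [doc_id for doc_id in submitted_ids if doc_id not in failures]
    return accepted, failures


# Hypothetical response with one rejected document.
sample = {
    "FailedDocuments": [
        {"Id": "doc-2", "ErrorCode": "InvalidRequest", "ErrorMessage": "example message"}
    ]
}
accepted, failures = split_by_validation(sample, ["doc-1", "doc-2", "doc-3"])
print(accepted)  # ['doc-1', 'doc-3']
print(failures)  # {'doc-2': ('InvalidRequest', 'example message')}
```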
Errors
For information about the errors that are common to all actions, see Common Errors.
- AccessDeniedException
-
You don't have sufficient access to perform this action. Please ensure you have the required permission policies and user accounts and try again.
HTTP Status Code: 400
- ConflictException
-
A conflict occurred with the request. Please fix any inconsistencies with your resources and try again.
HTTP Status Code: 400
- InternalServerException
-
An issue occurred with the internal server used for your Amazon Kendra service. Please wait a few minutes and try again, or contact Support for help.
HTTP Status Code: 500
- ResourceNotFoundException
-
The resource you want to use doesn’t exist. Please check you have provided the correct resource and try again.
HTTP Status Code: 400
- ServiceQuotaExceededException
-
You have exceeded the set limits for your Amazon Kendra service. Please see Quotas for more information, or contact Support to inquire about an increase of limits.
HTTP Status Code: 400
- ThrottlingException
-
The request was denied due to request throttling. Please reduce the number of requests and try again.
HTTP Status Code: 400
- ValidationException
-
The input fails to satisfy the constraints set by the Amazon Kendra service. Please provide the correct input and try again.
HTTP Status Code: 400
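Since ThrottlingException asks the caller to reduce the request rate and try again, a retry with exponential backoff is one reasonable response. The sketch below is an assumption-laden example rather than prescribed behavior: it expects the raised exception to expose a `response` dict with an `Error.Code` entry, as botocore's `ClientError` does, and the helper names, attempt count, and delays are hypothetical.

```python
import time

RETRYABLE_CODES = {"ThrottlingException"}


def error_code(exc):
    """Extract the service error code from a botocore-style exception."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `operation()` and retry throttled calls with doubling delays.

    Non-retryable errors, and the final failed attempt, are re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            if error_code(exc) not in RETRYABLE_CODES or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


# Hypothetical usage with a boto3 Kendra client:
#   call_with_backoff(lambda: client.batch_put_document(IndexId=index_id,
#                                                       Documents=batch))
```

Errors such as ValidationException or AccessDeniedException are deliberately not retried, because repeating the same request would fail in the same way.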