Processing through CLI
Process your first document through the CLI
Upload input files to an S3 bucket
Before processing documents with BDA, you must first upload your documents to an S3 bucket:
Syntax
aws s3 cp <source> <target> [--options]
Example:
aws s3 cp /local/path/document.pdf s3://my-bda-bucket/input/document.pdf
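Before invoking BDA, you can confirm that the upload succeeded by listing the prefix (the bucket and prefix below are the illustrative values from the example above):
aws s3 ls s3://my-bda-bucket/input/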
Basic processing command structure
Use the invoke-data-automation-async command to process files:
aws bedrock-data-automation-runtime invoke-data-automation-async \
    --input-configuration '{
        "s3Uri": "s3://amzn-s3-demo-bucket/sample-images/sample-image.jpg"
    }' \
    --output-configuration '{
        "s3Uri": "s3://amzn-s3-demo-bucket/output/"
    }' \
    --data-automation-configuration '{
        "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
        "stage": "LIVE"
    }' \
    --data-automation-profile-arn "Amazon Resource Name (ARN)"
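The call returns immediately with the ARN of the asynchronous invocation; save it, because it is the handle you pass to the status command later. A representative response, with the ARN shown as a placeholder in the same style as the examples above:
{
    "invocationArn": "Amazon Resource Name (ARN)"
}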
Advanced processing command structure
Video processing with time segments
For video files, you can specify a time segment to process; the example below processes only the first five minutes (0 to 300,000 milliseconds):
aws bedrock-data-automation-runtime invoke-data-automation-async \
    --input-configuration '{
        "s3Uri": "s3://my-bucket/video.mp4",
        "assetProcessingConfiguration": {
            "video": {
                "segmentConfiguration": {
                    "timestampSegment": {
                        "startTimeMillis": 0,
                        "endTimeMillis": 300000
                    }
                }
            }
        }
    }' \
    --output-configuration '{
        "s3Uri": "s3://my-bucket/output/"
    }' \
    --data-automation-configuration '{
        "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
        "stage": "LIVE"
    }' \
    --data-automation-profile-arn "Amazon Resource Name (ARN)"
Using custom blueprints
You can specify custom blueprints directly in the command:
aws bedrock-data-automation-runtime invoke-data-automation-async \
    --input-configuration '{
        "s3Uri": "s3://my-bucket/document.pdf"
    }' \
    --output-configuration '{
        "s3Uri": "s3://my-bucket/output/"
    }' \
    --blueprints '[
        {
            "blueprintArn": "Amazon Resource Name (ARN)",
            "version": "1",
            "stage": "LIVE"
        }
    ]' \
    --data-automation-profile-arn "Amazon Resource Name (ARN)"
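The runtime command above takes the blueprint ARN as input. If you need to look the ARN up, the bedrock-data-automation control-plane CLI has a list-blueprints command; treat the exact subcommand as an assumption for your CLI version and confirm it with aws bedrock-data-automation help:
aws bedrock-data-automation list-blueprints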
Adding encryption configuration
For enhanced security, you can add encryption configuration:
aws bedrock-data-automation-runtime invoke-data-automation-async \
    --input-configuration '{
        "s3Uri": "s3://my-bucket/document.pdf"
    }' \
    --output-configuration '{
        "s3Uri": "s3://my-bucket/output/"
    }' \
    --data-automation-configuration '{
        "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
        "stage": "LIVE"
    }' \
    --encryption-configuration '{
        "kmsKeyId": "Amazon Resource Name (ARN)",
        "kmsEncryptionContext": {
            "Department": "Finance",
            "Project": "DocumentProcessing"
        }
    }' \
    --data-automation-profile-arn "Amazon Resource Name (ARN)"
Event notifications
Enable EventBridge notifications for processing completion:
aws bedrock-data-automation-runtime invoke-data-automation-async \
    --input-configuration '{
        "s3Uri": "s3://my-bucket/document.pdf"
    }' \
    --output-configuration '{
        "s3Uri": "s3://my-bucket/output/"
    }' \
    --data-automation-configuration '{
        "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
        "stage": "LIVE"
    }' \
    --notification-configuration '{
        "eventBridgeConfiguration": {
            "eventBridgeEnabled": true
        }
    }' \
    --data-automation-profile-arn "Amazon Resource Name (ARN)"
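To react to the completion event, you can route it to a target (for example an SQS queue or Lambda function) with an EventBridge rule. The put-rule call below is standard EventBridge CLI usage, but the event pattern is only a sketch: the "aws.bedrock" source value is an assumption, so confirm the actual source and detail-type that BDA emits before wiring up targets.
aws events put-rule \
    --name bda-job-complete \
    --event-pattern '{"source": ["aws.bedrock"]}'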
Checking processing status
Use the get-data-automation-status command to check the status of your processing job:
aws bedrock-data-automation-runtime get-data-automation-status \
    --invocation-arn "Amazon Resource Name (ARN)"
The response will include the current status:
{ "status": "COMPLETED", "creationTime": "2025-07-24T12:34:56.789Z", "lastModifiedTime": "2025-07-24T12:45:12.345Z", "outputLocation": "s3://my-bucket/output/abcd1234/" }
Retrieve processing results
Locating output files in S3
List the output files in your S3 bucket:
aws s3 ls s3://amzn-s3-demo-bucket/output/
Download the results to your local machine:
aws s3 cp s3://amzn-s3-demo-bucket/output/ ~/Downloads/bda-results/ --recursive
Understanding output structure
The output typically includes:
- standard-output.json: Contains standard extraction results
- custom-output.json: Contains results from custom blueprints
- metadata.json: Contains processing metadata and confidence scores
Common response fields
Standard output typically includes:
- extractedData: The main extracted information
- confidence: Confidence scores for each extracted field
- metadata: Processing information including timestamps and model details
- boundingBoxes: Location information for detected elements (if enabled)
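Once the files are downloaded, a JSON tool such as jq is a quick way to pull individual fields out of standard-output.json from the command line. The field names are the ones listed above; the exact file path and JSON structure depend on your project, blueprint, and the job-specific prefix under the output location, so treat this as an illustrative sketch:
# Print the main extracted information and the per-field confidence scores
jq '.extractedData' ~/Downloads/bda-results/standard-output.json
jq '.confidence' ~/Downloads/bda-results/standard-output.json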
Error handling and troubleshooting
Common error scenarios and solutions:
- Invalid S3 URI: Ensure your S3 bucket exists and you have proper permissions
- Missing data-automation-profile-arn: This parameter is required for all processing requests
- Project not found: Verify your project ARN is correct and the project exists
- Unsupported file format: Check that your file format is supported by BDA
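Two quick checks can rule out the most common causes before you re-run a job. Both are standard AWS CLI commands; the bucket name is illustrative:
# Confirm the bucket exists and your credentials can reach it
aws s3api head-bucket --bucket my-bda-bucket

# Confirm which identity, and therefore which permissions, the CLI is using
aws sts get-caller-identity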
Adding tags to processing jobs
You can add tags to help organize and track your processing jobs:
aws bedrock-data-automation-runtime invoke-data-automation-async \
    --input-configuration '{
        "s3Uri": "s3://my-bucket/document.pdf"
    }' \
    --output-configuration '{
        "s3Uri": "s3://my-bucket/output/"
    }' \
    --data-automation-configuration '{
        "dataAutomationProjectArn": "Amazon Resource Name (ARN)",
        "stage": "LIVE"
    }' \
    --tags '[
        {
            "key": "Department",
            "value": "Finance"
        },
        {
            "key": "Project",
            "value": "InvoiceProcessing"
        }
    ]' \
    --data-automation-profile-arn "Amazon Resource Name (ARN)"
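If you later need to confirm which tags were applied to an invocation, the runtime service also exposes tagging commands. The list-tags-for-resource subcommand and its --resource-arn parameter are stated here as an assumption; verify them with aws bedrock-data-automation-runtime help before relying on this:
aws bedrock-data-automation-runtime list-tags-for-resource \
    --resource-arn "Amazon Resource Name (ARN)"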