Retrieve the content of documents from knowledge base
The GetDocumentContent API allows you to retrieve the content of documents
that have been ingested into an Amazon Bedrock Knowledge Base. This API returns a pre-signed URL that
provides temporary, secure access to download or view the original or extracted content of a
document.
This is useful when you want to:
Access the source document referenced in a
RetrieveAPI responseDownload the original file (PDF, Word, HTML, etc.) from a knowledge base
Retrieve the extracted/parsed text content of a document in JSON format
Build applications that let users view or download source documents behind
RetrieveAPI responses
How it works
You call
GetDocumentContentwith the knowledge base ID, data source ID, and document ID.The service validates your access permissions (including any ACL-based access controls configured on the knowledge base).
The API returns a pre-signed URL and the document's MIME type.
You use the pre-signed URL to download the document content. The URL expires after 5 minutes.
IAM permissions
Calling GetDocumentContent requires both bedrock:Retrieve
and bedrock:GetDocumentContent IAM actions on the knowledge base resource.
This is because the API internally validates retrieval-level access before returning
document content. Ensure your IAM policy includes both actions:
{ "Effect": "Allow", "Action": [ "bedrock:Retrieve", "bedrock:GetDocumentContent" ], "Resource": "arn:aws:bedrock:region:account-id:knowledge-base/kb-id" }
Usage examples
Same account with ACL enabled
When your knowledge base has ACL-based access control enabled, pass
userContext with the user's identity to ensure document-level
permission checks:
import boto3 import requests client = boto3.client('bedrock-agent-runtime') # Step 1: Retrieve relevant documents retrieve_response = client.retrieve( knowledgeBaseId='KBID1234567', retrievalQuery={'text': 'What is the refund policy?'} ) # Step 2: Get the full document content for the top result result = retrieve_response['retrievalResults'][0] doc_response = client.get_document_content( knowledgeBaseId='KBID1234567', dataSourceId=result['metadata']['_data_source_id'], documentId=result['documentId'], outputFormat='RAW', userContext={ 'userId': 'user-email', 'groups': [ {'id': 'group-engineering'}, {'id': 'group-project-alpha'} ] } ) # Step 3: Download the document download = requests.get(doc_response['presignedUrl']) with open('document.pdf', 'wb') as f: f.write(download.content)
Same account without ACL enabled
When ACLs are not configured, omit userContext:
import boto3 import requests client = boto3.client('bedrock-agent-runtime') # Step 1: Retrieve relevant documents retrieve_response = client.retrieve( knowledgeBaseId='KBID1234567', retrievalQuery={'text': 'What is the refund policy?'} ) # Step 2: Get the full document content result = retrieve_response['retrievalResults'][0] doc_response = client.get_document_content( knowledgeBaseId='KBID1234567', dataSourceId=result['metadata']['_data_source_id'], documentId=result['documentId'], outputFormat='RAW' ) # Step 3: Download the document download = requests.get(doc_response['presignedUrl']) with open('document.pdf', 'wb') as f: f.write(download.content)
Cross-account without ACL enabled
For cross-account access, the knowledge base owner must attach a resource policy to their knowledge base that grants the caller's account permission. Then the caller uses the full knowledge base ARN.
Step 1: KB owner attaches a resource policy to the knowledge base
The account that owns the knowledge base (for example, 999999999999)
must attach a resource policy granting the caller account (for example,
111111111111) access:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "111111111111" }, "Action": [ "bedrock:Retrieve", "bedrock:GetDocumentContent" ], "Resource": "arn:aws:bedrock:us-east-1:999999999999:knowledge-base/KBID1234567" } ] }
This is done via the PutKnowledgeBaseResourcePolicy API or through the Amazon Bedrock console.
Step 2: Caller account has IAM permissions to invoke the API
The caller's IAM role/user (in account 111111111111) needs an IAM
policy allowing the actions on the cross-account KB ARN:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "bedrock:Retrieve", "bedrock:GetDocumentContent" ], "Resource": "arn:aws:bedrock:us-east-1:999999999999:knowledge-base/KBID1234567" } ] }
Step 3: Call the API using the full KB ARN
import boto3 import requests client = boto3.client('bedrock-agent-runtime') CROSS_ACCOUNT_KB_ARN = 'arn:aws:bedrock:us-east-1:999999999999:knowledge-base/KBID1234567' # Step 1: Retrieve relevant documents using the KB ARN retrieve_response = client.retrieve( knowledgeBaseId=CROSS_ACCOUNT_KB_ARN, retrievalQuery={'text': 'What is the refund policy?'} ) # Step 2: Get the full document content using the same ARN result = retrieve_response['retrievalResults'][0] doc_response = client.get_document_content( knowledgeBaseId=CROSS_ACCOUNT_KB_ARN, dataSourceId=result['metadata']['_data_source_id'], documentId=result['documentId'], outputFormat='RAW' ) # Step 3: Download the document download = requests.get(doc_response['presignedUrl']) with open('document.pdf', 'wb') as f: f.write(download.content)
Both the resource policy (on the KB owner side) and the IAM policy (on the caller side) must be in place. Access is denied if either is missing.