Parsing options for your data source
Parsing refers to the interpretation of documents and their meaningful components. Amazon Bedrock Knowledge Bases offers the following options for parsing your data source during ingestion:
- Amazon Bedrock default parser – Parses only the text in your documents. This parser doesn't incur any usage charges.
- Amazon Bedrock Data Automation (Preview) – A fully managed service that processes multimodal data, including both text and images, without requiring any additional prompting. For more information about this service, see Amazon Bedrock Data Automation.
- Foundation models – Processes multimodal data, including both text and images, using a foundation model or inference profile. This parser lets you customize the prompt used for data extraction. The cost of this parser depends on the number of tokens processed by the foundation model. For a list of models that support parsing of Amazon Bedrock Knowledge Bases data, see Supported models and Regions for parsing.
Note
If you choose a foundation model or Amazon Bedrock Data Automation for parsing and it fails to parse a file, the Amazon Bedrock default parser is used instead.
The following table summarizes file type support for each type of parser:
File types | Extension | Default parser | Amazon Bedrock Data Automation | Foundation model |
---|---|---|---|---|
Plain text (ASCII only) | .txt | Yes | Yes | Yes |
Markdown | .md | Yes | Yes | Yes |
HyperText Markup Language | .html | Yes | Yes | Yes |
Microsoft Word documents | .doc/.docx | Yes | Yes | Yes |
Comma-separated values | .csv | Yes | Yes | Yes |
Microsoft Excel spreadsheet | .xls/.xlsx | Yes | Yes | Yes |
Portable Document Format (PDF) | .pdf | Yes | Yes | Yes |
Images – JPEG/PNG format | .jpeg, .png | No | Yes | Yes |
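
The support table above can be encoded as a small lookup, which is handy for checking files before ingestion. This is a minimal sketch: the parser labels (`default`, `data_automation`, `foundation_model`) and the `parsers_for` helper are names invented for this illustration, not values from any AWS API.

```python
from pathlib import PurePosixPath

# Labels used only in this sketch; they are not AWS API values.
DEFAULT = "default"
DATA_AUTOMATION = "data_automation"
FOUNDATION_MODEL = "foundation_model"

ALL_PARSERS = frozenset({DEFAULT, DATA_AUTOMATION, FOUNDATION_MODEL})
MULTIMODAL_ONLY = frozenset({DATA_AUTOMATION, FOUNDATION_MODEL})

# One entry per extension listed in the file type support table.
SUPPORTED_PARSERS = {
    ".txt": ALL_PARSERS,
    ".md": ALL_PARSERS,
    ".html": ALL_PARSERS,
    ".doc": ALL_PARSERS,
    ".docx": ALL_PARSERS,
    ".csv": ALL_PARSERS,
    ".xls": ALL_PARSERS,
    ".xlsx": ALL_PARSERS,
    ".pdf": ALL_PARSERS,
    ".jpeg": MULTIMODAL_ONLY,  # images are not handled by the default parser
    ".png": MULTIMODAL_ONLY,
}


def parsers_for(filename: str) -> frozenset:
    """Return the parser labels that the table lists for this file's extension."""
    suffix = PurePosixPath(filename).suffix.lower()
    # Extensions not in the table map to an empty set.
    return SUPPORTED_PARSERS.get(suffix, frozenset())
```

For example, `parsers_for("chart.png")` excludes the default parser, while an extension that is not in the table yields an empty set.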
When selecting how to parse your data, consider the following:
- Whether your data is purely textual or contains multimodal data, such as images, graphs, and charts, that you want the knowledge base to be able to query.
- Whether you want the option to customize the prompt that instructs the model on how to parse your data.
- The cost of the parser. For more information, see Amazon Bedrock Pricing.
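
One possible way to turn the considerations above into a rule of thumb is sketched below. The `choose_parser` function and its return labels are hypothetical, and the ordering of the checks is an assumption of this sketch rather than AWS guidance. Cost is modeled only by falling back to the default parser, which has no usage charge, when neither other need applies.

```python
def choose_parser(has_multimodal_content: bool, needs_custom_prompt: bool) -> str:
    """Suggest a parser label from the two yes/no considerations.

    Illustrative only: the labels are made up for this sketch, and real
    decisions should also weigh current pricing and Regional model support.
    """
    if needs_custom_prompt:
        # Only the foundation model parser offers prompt customization.
        return "foundation_model"
    if has_multimodal_content:
        # Handles text and images without additional prompting.
        return "data_automation"
    # Text-only data with no prompt needs: the no-charge default parser.
    return "default"
```

For text-only documents with no prompt requirements, `choose_parser(False, False)` returns `"default"`.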
There are limits for the types of files and total data that can be parsed using advanced parsing. For information about the file types for advanced parsing, see Supported document formats and limits for knowledge base data. For information about the total data that can be parsed using advanced parsing, see Amazon Bedrock endpoints and quotas in the AWS General Reference.
To learn how to configure parsing for your knowledge base, see the connection configuration for a supported data source in Connect a data source to your knowledge base.
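
When a data source is created through the API rather than the console, the parser is typically selected in the request body. The sketch below only builds the parsing portion of such a request as a plain dictionary. The field names (`parsingStrategy`, `bedrockFoundationModelConfiguration`, `bedrockDataAutomationConfiguration`, and so on) reflect an understanding of the `CreateDataSource` request shape and should be verified against the current API reference. The `build_parsing_configuration` helper and the placeholder ARN are hypothetical.

```python
from typing import Optional


def build_parsing_configuration(
    strategy: str,
    model_arn: Optional[str] = None,
    prompt_text: Optional[str] = None,
) -> Optional[dict]:
    """Build a parsingConfiguration dict, or None for the default parser.

    Assumption: omitting parsingConfiguration selects the default parser.
    Field names are assumed from the CreateDataSource request shape.
    """
    if strategy == "default":
        return None
    if strategy == "data_automation":
        return {
            "parsingStrategy": "BEDROCK_DATA_AUTOMATION",
            "bedrockDataAutomationConfiguration": {"parsingModality": "MULTIMODAL"},
        }
    if strategy == "foundation_model":
        if not model_arn:
            raise ValueError("model_arn is required for the foundation model parser")
        fm_config = {"modelArn": model_arn}
        if prompt_text:
            # Optional custom prompt used for data extraction.
            fm_config["parsingPrompt"] = {"parsingPromptText": prompt_text}
        return {
            "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
            "bedrockFoundationModelConfiguration": fm_config,
        }
    raise ValueError(f"unknown strategy: {strategy}")


# Placeholder ARN for illustration; substitute a model that supports parsing.
example = build_parsing_configuration(
    "foundation_model",
    model_arn="arn:aws:bedrock:REGION::foundation-model/MODEL_ID",
    prompt_text="Extract all text and describe each figure.",
)
```

The resulting dictionary would be passed under `vectorIngestionConfiguration` in the data source request. That placement is likewise an assumption to confirm in the API reference.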