Parsing options for your data source

Parsing refers to interpreting your documents and extracting their meaningful components. Amazon Bedrock Knowledge Bases offers the following options for parsing your data source during ingestion:

  • Amazon Bedrock default parser – Parses only the text in your documents. This parser doesn't incur any usage charges.

  • Amazon Bedrock Data Automation (Preview) – A fully managed service that processes multimodal data, including both text and images, without requiring you to provide any additional prompting. For more information about this service, see Amazon Bedrock Data Automation.

  • Foundation models – Processes multimodal data, including both text and images, using a foundation model or inference profile. This parser lets you customize the prompt used for data extraction. Its cost depends on the number of tokens that the foundation model processes. For a list of models that support parsing of Amazon Bedrock Knowledge Bases data, see Supported models and Regions for parsing.

Note

If you choose a foundation model or Amazon Bedrock Data Automation for parsing and it fails to parse a file, the Amazon Bedrock default parser is used instead.
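To make the three options concrete, the following is a minimal sketch that builds the parsing configuration for each choice as a plain Python dict. It assumes the field names of the bedrock-agent `ParsingConfiguration` shape as we understand it (`parsingStrategy`, `bedrockDataAutomationConfiguration`, `bedrockFoundationModelConfiguration`, `parsingPrompt`); verify them against the API reference before relying on them. The function name `build_parsing_configuration`, its parser labels, and the example ARN are hypothetical.

```python
from typing import Optional


def build_parsing_configuration(
    parser: str,
    model_arn: Optional[str] = None,
    parsing_prompt: Optional[str] = None,
) -> Optional[dict]:
    """Return a parsing configuration dict for a hypothetical parser label.

    Field names are assumed from the bedrock-agent API; check the API reference.
    """
    if parser == "default":
        # No parsing configuration: the text-only default parser is used.
        return None
    if parser == "data_automation":
        return {
            "parsingStrategy": "BEDROCK_DATA_AUTOMATION",
            "bedrockDataAutomationConfiguration": {"parsingModality": "MULTIMODAL"},
        }
    if parser == "foundation_model":
        if not model_arn:
            raise ValueError("a foundation model or inference profile ARN is required")
        fm_config: dict = {"modelArn": model_arn}
        if parsing_prompt:
            # Optional custom prompt that tells the model how to extract content.
            fm_config["parsingPrompt"] = {"parsingPromptText": parsing_prompt}
        return {
            "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
            "bedrockFoundationModelConfiguration": fm_config,
        }
    raise ValueError(f"unknown parser: {parser!r}")


if __name__ == "__main__":
    # Hypothetical placeholder ARN; substitute a model that supports parsing.
    example_arn = "arn:aws:bedrock:us-east-1::foundation-model/EXAMPLE-MODEL-ID"
    print(build_parsing_configuration(
        "foundation_model",
        model_arn=example_arn,
        parsing_prompt="Transcribe any tables as Markdown.",
    ))
```

As we understand the boto3 `bedrock-agent` client, a non-`None` result would be passed as `vectorIngestionConfiguration={"parsingConfiguration": ...}` when calling `create_data_source`; omitting it keeps the default parser.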

The following table summarizes file type support for each type of parser:

| File type | Extension | Default parser | Amazon Bedrock Data Automation | Foundation model |
|---|---|---|---|---|
| Plain text (ASCII only) | .txt | Yes | Yes | Yes |
| Markdown | .md | Yes | Yes | Yes |
| HyperText Markup Language | .html | Yes | Yes | Yes |
| Microsoft Word documents | .doc/.docx | Yes | Yes | Yes |
| Comma-separated values | .csv | Yes | Yes | Yes |
| Microsoft Excel spreadsheet | .xls/.xlsx | Yes | Yes | Yes |
| Portable Document Format (PDF) | .pdf | Yes | Yes | Yes |
| Images (JPEG/PNG format) | .jpeg, .png | No | Yes | Yes |
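If you need to check a batch of files against the file type support table, a small lookup like the following sketch can help. The mapping is transcribed directly from the table and covers only the extensions it lists; the names `SUPPORT`, `PARSERS`, and `supported_parsers` are hypothetical helpers, not part of any AWS SDK.

```python
from pathlib import Path

# Parser labels, in the same column order as the table.
PARSERS = ("default parser", "Amazon Bedrock Data Automation", "foundation model")

# extension -> (default parser, Data Automation, foundation model), from the table.
SUPPORT = {
    ".txt": (True, True, True),
    ".md": (True, True, True),
    ".html": (True, True, True),
    ".doc": (True, True, True),
    ".docx": (True, True, True),
    ".csv": (True, True, True),
    ".xls": (True, True, True),
    ".xlsx": (True, True, True),
    ".pdf": (True, True, True),
    ".jpeg": (False, True, True),
    ".png": (False, True, True),
}


def supported_parsers(filename: str) -> list[str]:
    """Return the parsers that support this file's extension (case-insensitive)."""
    flags = SUPPORT.get(Path(filename).suffix.lower())
    if flags is None:
        # Extension not listed in the table.
        return []
    return [name for name, ok in zip(PARSERS, flags) if ok]


if __name__ == "__main__":
    print(supported_parsers("diagram.png"))
```

For `diagram.png` this returns only `Amazon Bedrock Data Automation` and `foundation model`, which matches the table: the default parser doesn't handle images.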

When selecting how to parse your data, consider the following:

  • Whether your data is purely textual or contains multimodal data, such as images, graphs, and charts, that you want the knowledge base to be able to query.

  • Whether you want to customize the prompt that instructs the model how to parse your data.

  • The cost of the parser. For more information, see Amazon Bedrock Pricing.
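The considerations above can be read as a simple decision procedure. The following sketch encodes one such reading; it is a hypothetical heuristic, not official guidance. It considers cost only to the extent that the default parser has no usage charge; compare actual costs on the Amazon Bedrock Pricing page.

```python
def suggest_parser(has_multimodal_content: bool, needs_custom_prompt: bool) -> str:
    """Hypothetical heuristic mapping the parsing considerations to a parser."""
    if not has_multimodal_content:
        # Text-only data: the default parser handles it with no usage charge.
        return "default parser"
    if needs_custom_prompt:
        # Only the foundation model parser lets you customize the extraction prompt.
        return "foundation model"
    # Multimodal data without a custom prompt: Data Automation needs no prompting.
    return "Amazon Bedrock Data Automation"


if __name__ == "__main__":
    print(suggest_parser(has_multimodal_content=True, needs_custom_prompt=False))
```

A different weighting is equally reasonable; for example, you might prefer a foundation model even without a custom prompt if it works better on your documents.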

There are limits on the file types and the total amount of data that can be parsed using advanced parsing (parsing with Amazon Bedrock Data Automation or a foundation model). For information about the file types for advanced parsing, see Supported document formats and limits for knowledge base data. For information about the total amount of data that can be parsed using advanced parsing, see Amazon Bedrock endpoints and quotas in the AWS General Reference.

To learn how to configure parsing for your knowledge base, see the connection configuration for your supported data source in Connect a data source to your knowledge base.