How Amazon Bedrock knowledge bases work

Amazon Bedrock Knowledge Bases help you take advantage of Retrieval Augmented Generation (RAG), a popular technique that involves drawing information from a data store to augment the responses generated by large language models (LLMs). When you set up a knowledge base with your data source and vector store, your application can query the knowledge base and answer the query either with direct quotations from your sources or with natural responses generated from the query results.
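
As a minimal sketch, the following example queries a knowledge base with the AWS SDK for Python (Boto3), showing both styles of response. The knowledge base ID and the model ARN are placeholder values that you would replace with your own.

import boto3

# Runtime client for querying knowledge bases (distinct from the
# "bedrock-agent" client used to create and manage them).
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Retrieve relevant chunks and generate a natural response in one call.
# "KB123EXAMPLE" and the model ARN below are placeholders.
response = client.retrieve_and_generate(
    input={"text": "What is Amazon Bedrock?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(response["output"]["text"])

# Or retrieve only the matching chunks (direct quotations from sources):
results = client.retrieve(
    knowledgeBaseId="KB123EXAMPLE",
    retrievalQuery={"text": "What is Amazon Bedrock?"},
)
for result in results["retrievalResults"]:
    print(result["content"]["text"])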

With knowledge bases, you can build applications that are enriched by the context received from querying a knowledge base. Knowledge bases enable a faster time to market by abstracting the heavy lifting of building data pipelines and providing an out-of-the-box RAG solution that reduces the build time for your application. Adding a knowledge base also increases cost-effectiveness by removing the need to continually retrain your model to leverage your private data.

The following diagrams illustrate schematically how RAG is carried out. Amazon Bedrock Knowledge Bases simplify the setup and implementation of RAG by automating several steps in this process.

Pre-processing data

To enable effective retrieval from private data, a common practice is to convert the data into text and split it into manageable pieces. The pieces, or chunks, are then converted to embeddings and written to a vector index, while a mapping to the original document is maintained. These embeddings are used to determine the semantic similarity between queries and text from the data sources. The following image illustrates the pre-processing of data for the vector database.

[Figure: Pre-processing data for retrieval-augmented generation]
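
As a concrete sketch of these steps, the following Python example chunks a document, converts each chunk to an embedding, and stores the result with a mapping back to the source document. The Titan model ID, the naive fixed-size chunking, and the in-memory list standing in for a vector index are illustrative assumptions; Amazon Bedrock Knowledge Bases automate these steps and support richer chunking strategies and managed vector stores.

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list[float]:
    # Convert a text chunk to a floating-point embedding vector.
    # The Titan embeddings model ID below is an illustrative choice.
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def chunk(document: str, size: int = 300) -> list[str]:
    # Naive fixed-size chunking, for illustration only.
    return [document[i : i + size] for i in range(0, len(document), size)]

# Toy "vector index": each entry keeps the embedding plus a mapping
# back to the original document and chunk position.
document = "Amazon Bedrock uses high-performing foundation models..."
index = [
    {"doc_id": "doc-1", "chunk_no": n, "text": c, "vector": embed(c)}
    for n, c in enumerate(chunk(document))
]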

Vector embeddings are a series of numbers that represent each chunk of text. A model converts each text chunk into a series of numbers, known as a vector, so that the texts can be mathematically compared. These vectors can consist of either floating-point numbers (float32) or binary numbers. Most embedding models supported by Amazon Bedrock use floating-point vectors by default, but you can choose a model that supports binary vectors and specify binary as the data type for the embeddings. If you want to use binary vectors, you must choose both a model and a vector store that support them. Binary vectors use only 1 bit per dimension, so they require far less storage than floating-point (float32) vectors, which use 32 bits per dimension. However, binary vectors are not as precise as floating-point vectors in their representation of the text.

The following is an example of a piece of text with the traditional floating-point vector representation of the text chunk and the simpler binary vector equivalent.

Text: "Amazon Bedrock uses high-performing foundation models from leading AI companies and Amazon."
Floating-point vector: [0.041..., 0.056..., -0.018..., -0.012..., -0.020..., ...]
Binary vector: [1,1,0,0,0, ...]
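
One common way to derive a binary vector from a floating-point one is to threshold each dimension at zero; whether a given embeddings model uses this exact scheme is an assumption here, but the sketch below illustrates the idea, along with the storage arithmetic behind the cost difference.

import numpy as np

# Illustrative binarization: keep only the sign of each dimension.
float_vector = np.array([0.041, 0.056, -0.018, -0.012, -0.020], dtype=np.float32)
binary_vector = (float_vector > 0).astype(np.uint8)
print(binary_vector)  # [1 1 0 0 0]

# Storage for a 1,024-dimension embedding:
dims = 1024
print(dims * 32 // 8)  # float32: 4,096 bytes per vector
print(dims // 8)       # binary:    128 bytes per vector (32x smaller)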

Runtime execution

At runtime, an embedding model is used to convert the user's query to a vector. The vector index is then queried to find chunks that are semantically similar to the user's query by comparing document vectors to the user query vector. In the final step, the user prompt is augmented with the additional context from the chunks retrieved from the vector index, and the augmented prompt is sent to the model to generate a response for the user. The following image illustrates how RAG operates at runtime to augment responses to user queries.

[Figure: Retrieval-augmented generation at runtime]
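
Continuing the earlier pre-processing sketch (and reusing its hypothetical embed() function and in-memory index), the following example shows the runtime steps: embed the query, rank chunks by cosine similarity, and build an augmented prompt. Sending the prompt to a text-generation model is omitted for brevity.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_augmented_prompt(query: str, index: list[dict], top_k: int = 3) -> str:
    # 1. Convert the user's query to a vector with the same embeddings
    #    model used during ingestion (embed() from the earlier sketch).
    query_vector = np.array(embed(query))

    # 2. Find the chunks most semantically similar to the query.
    ranked = sorted(
        index,
        key=lambda entry: cosine_similarity(query_vector, np.array(entry["vector"])),
        reverse=True,
    )
    context = "\n\n".join(entry["text"] for entry in ranked[:top_k])

    # 3. Augment the user prompt with the retrieved context; the result
    #    is then sent to a text-generation model.
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )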

To learn more about how to turn your data into a knowledge base, how to query your knowledge base after you've set it up, and customizations that you can apply to the data source during ingestion, see the following topics: