Retrieval Augmented Generation
Foundation models are usually trained offline, making the model agnostic to any
data that is created after the model was trained. Additionally, foundation models
are trained on general-domain corpora, making them less effective for
domain-specific tasks. You can use Retrieval Augmented Generation (RAG) to retrieve
data from outside a foundation model and augment your prompts by adding the relevant
retrieved data in context. For more information about RAG model architectures, see
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.
With RAG, the external data used to augment your prompts can come from multiple data sources, such as document repositories, databases, or APIs. The first step is to convert your documents and any user queries into a compatible format to perform a relevancy search. To make the formats compatible, a document collection, or knowledge library, and user-submitted queries are converted to numerical representations using embedding language models. Embedding is the process by which text is given a numerical representation in a vector space. RAG model architectures compare the embeddings of user queries against the document embeddings in the knowledge library's vector space. The original user prompt is then appended with relevant context from similar documents within the knowledge library. This augmented prompt is then sent to the foundation model. You can update knowledge libraries and their relevant embeddings asynchronously.
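The retrieval step above can be sketched in a few lines: embed the documents and the query, rank documents by cosine similarity, and return the closest matches. This is a minimal, self-contained illustration, not the RAG implementation from the notebooks; the `embed` function below is a toy stand-in, and in practice you would call a real embedding model (for example, a deployed JumpStart embedding endpoint).

```python
import numpy as np

def embed(texts):
    # Toy stand-in for an embedding model: a bag-of-characters vector.
    # In a real RAG system, replace this with calls to an embedding
    # language model (e.g. a deployed text-embedding endpoint).
    dim = 128
    out = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for ch in text.lower():
            out[i, ord(ch) % dim] += 1.0
    return out

def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query by cosine similarity."""
    doc_vecs = embed(documents)
    q_vec = embed([query])[0]
    # Cosine similarity between the query embedding and each document embedding.
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9
    )
    ranked = np.argsort(sims)[::-1][:top_k]
    return [documents[i] for i in ranked]

documents = [
    "SageMaker JumpStart provides pretrained foundation models.",
    "Cats sleep for most of the day.",
    "RAG augments prompts with retrieved context.",
]
print(retrieve("How does RAG augment a prompt?", documents, top_k=1))
```

Because the knowledge library is embedded ahead of time, only the query needs to be embedded at request time; this is what makes asynchronous updates to the library and its embeddings practical.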
The retrieved document should be large enough to contain useful context to help augment the prompt, but small enough to fit into the maximum sequence length of the prompt. You can use task-specific JumpStart models, such as the General Text Embeddings (GTE) model from Hugging Face, to provide the embeddings for your prompts and knowledge library documents. After comparing the prompt and document embeddings to find the most relevant documents, construct a new prompt with the supplemental context. Then, pass the augmented prompt to a text generation model of your choosing.
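The trade-off described above, enough context to be useful but small enough to fit the model's maximum sequence length, can be handled when constructing the augmented prompt. The following sketch uses a character-count budget as a rough stand-in for a token limit; the template and budget values are illustrative assumptions, not part of any SageMaker API.

```python
def build_augmented_prompt(query, retrieved_docs, max_chars=1000):
    """Prepend retrieved context to the user query, dropping documents that
    would push the prompt past a rough length budget (a character-count
    stand-in for the model's maximum sequence length in tokens)."""
    template = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    # Reserve room for the template text and the query itself.
    overhead = len(template.format(context="", question=query))
    budget = max_chars - overhead
    context_parts, used = [], 0
    for doc in retrieved_docs:
        if used + len(doc) + 1 > budget:
            break  # this document would overflow the budget; stop adding context
        context_parts.append(doc)
        used += len(doc) + 1  # +1 for the joining newline
    return template.format(context="\n".join(context_parts), question=query)
```

The resulting string is what you would pass to the text generation model of your choosing; a production system would count tokens with the model's own tokenizer rather than characters.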
Example notebooks
For more information on RAG foundation model solutions, see the following example notebooks:
- Retrieval-Augmented Generation: Question Answering using LLama-2, Pinecone and Custom Dataset
- Retrieval-Augmented Generation: Question Answering based on Custom Dataset
- Retrieval-Augmented Generation: Question Answering using Llama-2 and Text Embedding Models
- Amazon SageMaker JumpStart - Text Embedding and Sentence Similarity
You can clone the Amazon SageMaker examples repository to access these example notebooks.