Writing best practices to optimize RAG applications - AWS Prescriptive Guidance

Ivan Cui and Samantha Stuart, Amazon Web Services

July 2025 (document history)

Large language models (LLMs) have revolutionized the field of artificial intelligence with their remarkable ability to understand and generate human-like text. However, they face a significant limitation: they can only work with knowledge contained in their training data. This is where Retrieval Augmented Generation (RAG) helps. It offers a solution that combines LLMs with external knowledge sources, such as your organization's data and documents. Through a two-stage process of information retrieval and response generation, RAG enables AI systems to access and incorporate up-to-date information from various sources. The result is more accurate, better-informed responses that bridge the gap between static model knowledge and dynamic, real-world information needs.
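The two-stage process can be illustrated with a minimal sketch. The hashed-out retrieval step below uses a toy bag-of-words similarity purely as a stand-in; a production system would use an embedding model and a vector database, and the function names (`retrieve`, `build_prompt`) are illustrative, not part of any AWS API.

```python
import math
from collections import Counter

STOPWORDS = {"the", "is", "a", "an", "are", "what", "in", "on", "to", "our"}

def tokenize(text: str) -> list[str]:
    """Lowercase, strip punctuation, and drop common stopwords."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Stage 1 (retrieval): rank documents by similarity to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents,
                    key=lambda d: cosine(q, Counter(tokenize(d))),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Stage 2 (generation): augment the LLM prompt with retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days.",
    "The data center is located in us-east-1.",
    "Support hours are 9am to 5pm on weekdays.",
]
prompt = build_prompt("What is the refund policy?", docs)
```

The point of the sketch is the separation of concerns: retrieval selects the context, and generation only sees what retrieval returns, which is why the quality of the source content directly bounds the quality of the response.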

How can you optimize content for retrieval in a RAG-based application? This guide provides best practices to help you optimize the formatting and writing style of text-based content in the knowledge base. Optimized content provides clearer context, which helps RAG applications understand task-specific information more accurately. When the system retrieves highly relevant and accurate content, the quality of the LLM's response improves. Optimizing the context delivery process at the system level is called context engineering, and it is an essential part of agentic RAG architectures. In agentic RAG, one or more additional LLMs reason and act on incoming requests before RAG execution, enabling a multi-step information delivery process. As RAG architectures grow increasingly complex, optimizing the source content remains the most direct way to deliver clear context to LLMs. These best practices are designed to help you maximize your organization's investment in a RAG application.

Intended audience

This guide is intended for AI engineers, data scientists, data engineers, or software developers who are building LLM applications with one or more RAG components. To understand the concepts and recommendations in this guide, you should be familiar with vector databases and prompts for LLMs.

Objectives

The recommendations in this guide can help you achieve the following:

  • Improve the accuracy and relevance of responses generated by RAG applications by providing well-structured, semantically rich source documents that are optimized for token usage and minimal redundancy.

  • Help RAG applications better understand domain-specific knowledge and context by providing clear definitions and explanations within source documents.

  • Facilitate easier maintenance and knowledge base updates for RAG applications by adhering to consistent formatting and structuring guidelines across source documents.

  • Improve the scalability of RAG solutions by breaking down large, monolithic documents into smaller, self-contained units that can be efficiently indexed and retrieved.
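The last objective, splitting monolithic documents into self-contained units, can be sketched as a simple chunking pass. This is a minimal illustration under assumed conventions (paragraph boundaries, a word budget rather than a true token count, no overlap between chunks); managed services and libraries offer richer chunking strategies.

```python
def chunk_document(text: str, max_words: int = 100) -> list[str]:
    """Split a document into chunks of whole paragraphs, each kept under
    a word budget. A single paragraph longer than the budget becomes its
    own (oversized) chunk rather than being split mid-paragraph."""
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for para in text.split("\n\n"):
        words = len(para.split())
        # Flush the current chunk before this paragraph would overflow it.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Example: five 40-word paragraphs become three chunks under a 100-word budget.
paras = [" ".join(["word"] * 40) for _ in range(5)]
chunks = chunk_document("\n\n".join(paras))
```

Breaking on paragraph boundaries keeps each chunk self-contained, which is what lets a vector index retrieve a unit of text that still makes sense on its own.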