GENPERF04-BP02 Optimize vector sizes for your use case
Embedding models may support multiple vector sizes when embedding data. Choosing an appropriate vector size for your embeddings can yield long-term performance gains.
Desired outcome: When implemented, this best practice helps verify that vector sizes are optimized for a specific use case, which can lead to improved performance over time.
Benefits of establishing this best practice: Consider mechanical sympathy: optimizing vector sizes for the embedding models you use can improve the performance of your application. Familiarize yourself with how your selected embedding model performs embeddings and retrievals before optimizing.
Level of risk exposed if this best practice is not established: Low
Implementation guidance
When embedding unstructured data into a vector database, it's important to test multiple embedding models with various vector sizes to optimize data retrieval and identify performance trade-offs. While there's a general relationship between vector size and accuracy within a model family, this correlation isn't universal across all embedding models. The performance of your embeddings depends on several factors: the specific data you're encoding, the chosen embedding model, and the vector size used within that model. Consider checking popular leaderboards, such as HuggingFace's Massive Text Embedding Benchmark (MTEB) Leaderboard, to compare candidate models.
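As one illustration of how vector size can vary within a single model, some embedding models (for example, Matryoshka-style models) allow a full-size vector to be truncated to a smaller dimension and re-normalized. The sketch below assumes a hypothetical 1024-dimensional embedding represented as a NumPy array; for models without this property, re-embed the data at the target size instead of slicing:

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length.

    Only valid for models trained to support truncation (Matryoshka-style);
    slicing an arbitrary embedding this way can destroy its semantics.
    """
    truncated = vec[:dim]
    norm = np.linalg.norm(truncated)
    return truncated / norm if norm > 0 else truncated

# Hypothetical full-size embedding; real vectors come from your model.
full = np.random.default_rng(0).standard_normal(1024)
small = truncate_embedding(full, 256)
print(small.shape)  # (256,)
```

A 256-dimensional index built this way uses a quarter of the storage of the full-size index, which is often the first trade-off worth measuring.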
Start with a more compact encoding, and only increase the vector size if warranted by your use cases to improve accuracy or minimize loss. Consider the nature of your dataset and how focused the topics or language are. The more narrow and deep the content, the more likely fine-tuning is to improve accuracy while potentially reducing vector size.
For use cases where higher latency is acceptable, larger vector sizes within a given model may offer more accuracy and nuance. Conversely, for low-latency requirements, smaller vector sizes typically result in faster retrieval. However, it's crucial to note that a well-tuned model with smaller dimensions (like 256) can sometimes outperform a more generic model with larger dimensions (like 1024 or greater) in both accuracy and speed.
Keep in mind that some models offer a limited range of permissible vector dimensions. Always test and evaluate the performance of different models and vector sizes with your specific dataset to find the optimal balance between accuracy and latency for your use case.
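To see why smaller vectors typically retrieve faster, you can time an exact cosine-similarity search at two dimensions. This is a minimal sketch over synthetic unit-normalized vectors; the corpus size and dimensions are illustrative, not a benchmark of any particular model:

```python
import time
import numpy as np

def top_k(corpus: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact cosine-similarity search over unit-normalized row vectors."""
    scores = corpus @ query          # cosine similarity for unit vectors
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(42)
for dim in (256, 1024):
    corpus = rng.standard_normal((10_000, dim))
    corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
    query = corpus[7]                # use a known document as the query
    start = time.perf_counter()
    hits = top_k(corpus, query)
    elapsed = time.perf_counter() - start
    print(f"dim={dim}: top hit {hits[0]}, {elapsed * 1e3:.2f} ms")
```

The brute-force search scales linearly with dimension; production vector databases use approximate indexes, but the same dimension-to-latency relationship generally holds.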
Implementation steps
- Identify the most important performance KPI for this workload (such as accuracy, speed, memory usage, or scalability).
- Determine the vector size options supported by your selected embedding model and design experiments to test each option.
- Experiment on a variety of representative data to get a clear determination of which embedding size is best for this workload.
- Run the experiments and select the most performant embedding model and vector size for this scenario.
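The experiments described above often reduce to measuring how well a smaller vector size reproduces the retrieval results of a larger one. A minimal recall@k harness, assuming unit-normalized vectors and using synthetic truncated embeddings as stand-ins for a real model's outputs, might look like:

```python
import numpy as np

def recall_at_k(full_index, small_index, queries_full, queries_small, k=10):
    """Fraction of full-dimension top-k neighbors recovered at the smaller size."""
    hits = 0
    for qf, qs in zip(queries_full, queries_small):
        truth = set(np.argsort(full_index @ qf)[::-1][:k])
        found = set(np.argsort(small_index @ qs)[::-1][:k])
        hits += len(truth & found)
    return hits / (k * len(queries_full))

# Synthetic stand-in data; replace with embeddings from your candidate models.
rng = np.random.default_rng(7)
corpus_full = rng.standard_normal((2_000, 1024))
corpus_full /= np.linalg.norm(corpus_full, axis=1, keepdims=True)
corpus_small = corpus_full[:, :256]
corpus_small /= np.linalg.norm(corpus_small, axis=1, keepdims=True)

queries_full, queries_small = corpus_full[:20], corpus_small[:20]
r = recall_at_k(corpus_full, corpus_small, queries_full, queries_small)
print(f"recall@10 at dim=256 vs dim=1024: {r:.2f}")
```

Running this per candidate model and vector size, alongside latency and storage measurements, gives you the data needed to pick the configuration that best matches your KPI.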