Definitions
- Agent: An AI system that can perform tasks autonomously and interact with its environment to achieve specific goals.
- Bias and fairness testing: Evaluating and mitigating potential biases or unfair outcomes from AI models, particularly in areas like gender, race, or age.
- Continuous pre-training: The process of continuously updating a pre-trained model with new data to improve its performance and adapt to evolving domains or tasks.
- Chunking: Breaking up large data files into small, discrete chunks so that the foundation model can fit the data into its context window.
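As an illustration, fixed-size chunking with overlap might be sketched as follows. This is a minimal character-based example; production pipelines often chunk by tokens, sentences, or document structure instead:

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character chunks with overlap so each
    chunk fits within a model's context window. Overlap preserves context
    that would otherwise be cut at chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```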
- Data management: The process of identifying, collecting, storing, aggregating, searching, tracking, governing, and using data.
- Embedding: Transforms chunks of data into vectors that represent semantic meaning.
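For illustration only, the text-to-vector idea can be sketched with a toy hashing-based embedding. Real systems use a learned embedding model (the hashing below captures word counts, not semantic meaning):

```python
import hashlib

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Toy embedding: hash each word into a slot of a fixed-size vector,
    then L2-normalize. Illustrates the text -> vector mapping only; a real
    embedding model produces semantically meaningful dimensions."""
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]
```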
- Fine-tuning: The process of adapting a pre-trained model to a specific task or domain by training it on a smaller, task-specific dataset.
- Foundation models: Large language models pre-trained on vast amounts of data, serving as a foundation for downstream tasks and fine-tuning.
- Foundation model providers: Companies or organizations that develop and release foundation models for use by others.
- Generative AI: AI systems capable of generating new content, such as text, images, or code, based on input data or prompts.
- Hallucination: A phenomenon where a generative AI model produces outputs that are inconsistent, factually incorrect, or unrelated to the input prompt.
- Human oversight: Mechanisms for human experts to review, validate, and control critical decisions or outputs from AI models.
- Indexing: Process of inserting embedded chunks into a vector data store.
- Knowledge graph: A structured representation of real-world entities and their relationships, used to enhance the contextual understanding and reasoning capabilities of AI systems.
- LLMOps or GenAIOps: Operational practices and principles for managing the lifecycle of large language models (LLMs), including model selection, data preparation, deployment, monitoring, and governance.
- Model card: A document that provides key information about a machine learning model, including its intended use, training data, performance characteristics, and potential limitations or biases.
- Model customization: The process of modifying a foundation model using various techniques to control its behavior.
- Model distillation: A technique for creating a smaller, more efficient model that mimics the behavior of a larger, more advanced model.
- Model evaluation: The process of assessing the performance, robustness, and other characteristics of language models using various metrics and techniques.
- Model gateway: An interaction layer offering secure access to the model hub through standardized APIs.
- Model hub: A central repository providing access to enterprise foundation models from first-party, third-party, and open-source providers.
- Model interpretability: The ability to understand and explain the reasoning behind a model's outputs, increasing transparency and trust.
- Model orchestration: The coordination of the multistep workflows that are characteristic of generative AI applications.
- Pre-training: The process of building a foundation model from scratch, which typically requires GPU clusters to run continuously for weeks.
- Prompt catalog: A centralized repository for storing, managing, and versioning prompts used to interact with generative AI models.
- Prompt engineering: The practice of carefully crafting prompts to guide language models to produce desired outputs.
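As an illustration, a few-shot prompt can be assembled from a role, a task instruction, and worked examples. The structure below is one common pattern, not a prescribed format:

```python
def build_prompt(role: str, task: str,
                 examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt: a role statement, a task instruction,
    worked input/output examples, then the new query for the model."""
    lines = [f"You are {role}.", task, ""]
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)
```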
- Provisioned throughput: Feature of Amazon Bedrock that allows you to provision a higher level of throughput at a fixed cost for predictable, high-throughput workloads.
- Quantization: Techniques for reducing the precision of model parameters, thereby decreasing the memory footprint and computational requirements.
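For intuition, symmetric int8 quantization of a list of floats might look like the sketch below. Real frameworks quantize whole tensors with per-channel scales and calibration; this only shows the precision-for-memory trade-off:

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric int8 quantization: map floats to [-127, 127] with a
    single scale factor, shrinking storage from 32 bits to 8 per value."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate floats; small rounding error is the cost."""
    return [q * scale for q in quantized]
```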
- Responsible AI: The practice of developing and deploying AI systems in a manner that prioritizes fairness, transparency, accountability, and adherence to ethical principles.
- Retrieval-Augmented Generation (RAG): A technique and architectural style in which a language model's output is augmented with relevant information retrieved from a corpus of documents. It is used to ground responses in those documents and to reduce hallucination.
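The retrieve-then-augment flow can be sketched as follows. Word-overlap scoring stands in for a real vector-similarity search, and the returned string would be sent to the language model:

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (a stand-in for
    vector-similarity search) and return the top k."""
    q = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_rag_prompt(query: str, corpus: list[str]) -> str:
    """Augment the prompt with retrieved passages so the model's answer
    is grounded in the documents."""
    context = "\n".join(f"- {d}" for d in retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```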
- Self-hosted models: AI models that are deployed and managed by the organization using them, rather than relying on a third-party provider.
- Serverless architecture: An architecture pattern where the cloud provider automatically manages the allocation and provisioning of computational resources, allowing for scalability and cost optimization.
- Tokenization: The process of breaking down input text into smaller units called tokens, which can be words, subwords, or characters, as a preprocessing step for natural language processing tasks.
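As a simple illustration, a word-level tokenizer can be written with a regular expression. Production models use learned subword schemes such as byte-pair encoding, so this is an intuition aid only:

```python
import re

def tokenize(text: str) -> list[str]:
    """Word-level tokenizer: lowercase, then emit runs of letters/digits
    and individual punctuation marks as separate tokens."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())
```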
- Vector store: A specialized data store for efficient storage and retrieval of high-dimensional vector embeddings, often used in semantic search and retrieval tasks. Vector stores such as Amazon OpenSearch Serverless support different search algorithms.
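The core operations of a vector store, indexing embedded chunks and retrieving the nearest ones by cosine similarity, can be sketched in memory. A managed service such as Amazon OpenSearch Serverless provides the same interface at scale with approximate-nearest-neighbor algorithms:

```python
class InMemoryVectorStore:
    """Minimal vector store: index (chunk, embedding) pairs and search
    by cosine similarity. Illustrative only; not for production use."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def index(self, chunk: str, embedding: list[float]) -> None:
        """Insert an embedded chunk into the store."""
        self._items.append((chunk, embedding))

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    def search(self, query_embedding: list[float], k: int = 1) -> list[str]:
        """Return the k chunks most similar to the query embedding."""
        ranked = sorted(self._items,
                        key=lambda item: self._cosine(query_embedding, item[1]),
                        reverse=True)
        return [chunk for chunk, _ in ranked[:k]]
```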
- Zero-shot learning: The ability of a model to perform a task or make predictions on examples it has never seen before, without requiring task-specific training data.
For the latest AWS terminology, see the AWS glossary in the AWS Glossary Reference.