Sync to ingest your data sources into the knowledge base
After you create your knowledge base, you ingest the data sources into it so that they're indexed and can be queried. Ingestion converts the raw data in your data source into vector embeddings. It also associates each embedding with the raw text and any metadata that you set up for filtering, which augments the querying process. Before you begin ingestion, check that your data source fulfills the following conditions:
- The Amazon S3 bucket for the data source is in the same region as the knowledge base.
- The files are in supported formats. For more information, see Set up a vector index for your knowledge base in a supported vector store.
- The files don't exceed the maximum file size of 50 MB. For more information, see Knowledge base quotas.
- If your data source contains metadata files, check the following conditions to ensure that the metadata files aren't ignored:
  - Each .metadata.json file shares the same name as the source file that it's associated with.
  - If the vector index for your knowledge base is in an Amazon OpenSearch Serverless vector store, check that the vector index is configured with the faiss engine. If the vector index is configured with the nmslib engine, you'll have to do one of the following:
    - Create a new knowledge base in the console and let Amazon Bedrock automatically create a vector index in Amazon OpenSearch Serverless for you.
    - Create another vector index in the vector store and select faiss as the Engine. Then create a new knowledge base and specify the new vector index.
  - If the vector index for your knowledge base is in an Amazon Aurora database cluster, check that the table for your index contains a column for each metadata property in your metadata files before starting ingestion.
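The file-level checks in the list above (size limit and metadata file naming) can be approximated with a small pre-flight script before you sync. The following is a minimal sketch, not an official tool. It assumes that a metadata file is named by appending .metadata.json to the full source file name (for example, report.pdf.metadata.json), and that 50 MB means 50 * 1024 * 1024 bytes. The `S3Object` type and `preflight_issues` function are hypothetical names used only for illustration. You would populate the list from your own bucket listing.

```python
from dataclasses import dataclass

# Assumption: 50 MB is interpreted as 50 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 50 * 1024 * 1024
# Assumption: metadata files are named "<source file name>.metadata.json".
METADATA_SUFFIX = ".metadata.json"


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for one entry of a bucket listing."""
    key: str
    size: int  # size in bytes


def preflight_issues(objects):
    """Return a list of human-readable problems found in the listing."""
    keys = {obj.key for obj in objects}
    issues = []
    for obj in objects:
        if obj.key.endswith(METADATA_SUFFIX):
            # A metadata file needs a source file with the matching name.
            source_key = obj.key[: -len(METADATA_SUFFIX)]
            if source_key not in keys:
                issues.append(f"{obj.key}: no matching source file {source_key}")
        elif obj.size > MAX_FILE_BYTES:
            # Source files larger than the limit would fail the size check.
            issues.append(f"{obj.key}: {obj.size} bytes exceeds the 50 MB limit")
    return issues
```

An empty result means that the listing passed these two checks only. The sketch doesn't verify the region, the file formats, or the vector index configuration.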
Note
Each time you add, modify, or remove files from the S3 bucket for a data source, you must sync the data source so that it is re-indexed to the knowledge base. Syncing is incremental, so Amazon Bedrock only processes the objects in your S3 bucket that have been added, modified, or deleted since the last sync.
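If you sync programmatically, the flow is typically to start an ingestion job and then poll it until it finishes. The following is a minimal sketch, assuming a boto3 `bedrock-agent` client and its `start_ingestion_job` and `get_ingestion_job` operations. The function name `sync_data_source`, the polling interval, and the set of terminal statuses are illustrative choices, not part of the original text. The client is passed in as a parameter so that the function can be exercised without AWS credentials.

```python
import time

# Assumption: these statuses mean the job will not change state again.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_data_source(client, knowledge_base_id, data_source_id, poll_seconds=10.0):
    """Start an ingestion job and poll until it reaches a terminal status.

    `client` is expected to behave like boto3.client("bedrock-agent").
    Returns the final ingestion job description as a dict.
    """
    started = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    job = started["ingestionJob"]
    while job["status"] not in TERMINAL_STATUSES:
        time.sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job
```

To run it against a real knowledge base, you could create the client with `boto3.client("bedrock-agent")` and pass your own knowledge base ID and data source ID. Both IDs are placeholders that you supply.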
To learn how to ingest your data sources into your knowledge base, select the tab corresponding to your method of choice and follow the steps.