
Amazon Titan Multimodal Embeddings G1 model

Amazon Titan Multimodal Embeddings Generation 1 (G1) is a multimodal embeddings model for use cases such as searching images by text, by image, or by a combination of text and image. Designed for high accuracy and fast responses, the model is a good choice for search and recommendation use cases.

  • Model ID – amazon.titan-embed-image-v1

  • Max input text tokens – 128

  • Languages – English

  • Max input image size – 5 MB

  • Output vector size – 1,024 (default), 384, 256

  • Inference types – On-Demand, Provisioned Throughput

  • Supported use cases – image search, recommendations, and personalization

Embedding length

Setting a custom embedding length is optional. The default length is 1,024 dimensions, which works for most use cases; it can also be set to 256 or 384. Larger embeddings capture more detail but increase computation time, while shorter embeddings are less detailed but reduce response time.

```python
# EmbeddingConfig shape:
# {
#     "outputEmbeddingLength": int  # optional; one of [256, 384, 1024], default: 1024
# }

import json

# Updated API payload example
body = json.dumps({
    "inputText": "hi",
    "inputImage": image_string,
    "embeddingConfig": {
        "outputEmbeddingLength": 256
    }
})
```
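The payload above can be generalized into a small helper that validates the optional embedding length before the body is passed to the Bedrock runtime's `invoke_model` call. This is a sketch: the helper name is ours, while the field names follow the example above.

```python
import json

def build_titan_mm_request(text=None, image_base64=None, embedding_length=None):
    """Build an invoke_model body for amazon.titan-embed-image-v1.

    At least one of text / image_base64 must be supplied (the model accepts
    text, image, or both).
    """
    if text is None and image_base64 is None:
        raise ValueError("provide inputText, inputImage, or both")
    payload = {}
    if text is not None:
        payload["inputText"] = text
    if image_base64 is not None:
        payload["inputImage"] = image_base64
    if embedding_length is not None:
        if embedding_length not in (256, 384, 1024):
            raise ValueError("outputEmbeddingLength must be 256, 384, or 1024")
        payload["embeddingConfig"] = {"outputEmbeddingLength": embedding_length}
    return json.dumps(payload)

# The body would then be sent with the Bedrock runtime client, e.g.:
# boto3.client("bedrock-runtime").invoke_model(
#     modelId="amazon.titan-embed-image-v1",
#     body=build_titan_mm_request("hi", embedding_length=256),
#     accept="application/json", contentType="application/json")
```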


  • Input to Amazon Titan Multimodal Embeddings G1 fine-tuning consists of image-text pairs.

  • Image formats: PNG, JPEG

  • Input image size limit: 5 MB

  • Image dimensions: min: 128 px, max: 4,096 px

  • Max number of tokens in caption: 128

  • Training dataset size range: 1,000 - 500,000

  • Validation dataset size range: 8 - 50,000

  • Caption length in characters: 0 - 2,560

  • Maximum total pixels per image: 2048*2048*3

  • Aspect ratio (w/h): min: 0.25, max: 4
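The constraints above can be checked before submitting a fine-tuning job. This sketch takes image dimensions and byte size as plain arguments to stay dependency-free (in practice they would come from an image library such as Pillow); the function name is illustrative.

```python
MAX_IMAGE_BYTES = 5 * 1024 * 1024    # 5 MB input image size limit
MAX_TOTAL_PIXELS = 2048 * 2048 * 3   # maximum total pixels per image

def check_training_pair(width, height, image_bytes, caption):
    """Return a list of constraint violations for one image-text pair."""
    errors = []
    if not (128 <= width <= 4096 and 128 <= height <= 4096):
        errors.append("dimensions must be 128 - 4,096 px per side")
    if not (0.25 <= width / height <= 4):
        errors.append("aspect ratio (w/h) must be between 0.25 and 4")
    if image_bytes > MAX_IMAGE_BYTES:
        errors.append("image exceeds 5 MB")
    if width * height * 3 > MAX_TOTAL_PIXELS:
        errors.append("image exceeds the total-pixel limit")
    if len(caption) > 2560:
        errors.append("caption exceeds 2,560 characters")
    return errors

print(check_training_pair(1024, 768, 200_000, "a red bicycle"))  # []
```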

Preparing datasets

For the training dataset, create a .jsonl file with multiple JSON lines. Each JSON line contains both image-ref and caption attributes, similar to the SageMaker Augmented Manifest format. A validation dataset is required. Auto-captioning is not currently supported.

```json
{"image-ref": "s3://bucket-1/folder1/0001.png", "caption": "some text"}
{"image-ref": "s3://bucket-1/folder2/0002.png", "caption": "some text"}
{"image-ref": "s3://bucket-1/folder1/0003.png", "caption": "some text"}
```

For both the training and validation datasets, create .jsonl files with multiple JSON lines.

The Amazon S3 paths need to be in the same folders where you have granted Amazon Bedrock permission to access the data by attaching an IAM policy to your Amazon Bedrock service role. For more information on granting IAM policies for training data, see Grant custom jobs access to your training data.
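Under those constraints, a training manifest can be assembled like this. The S3 paths and captions are placeholders, and the paths must live under a prefix that your Amazon Bedrock service role can read.

```python
import json

# Hypothetical image-ref/caption pairs; replace with your own S3 objects.
pairs = [
    ("s3://bucket-1/folder1/0001.png", "a red bicycle leaning on a wall"),
    ("s3://bucket-1/folder2/0002.png", "two dogs playing in the snow"),
]

def write_manifest(path, pairs):
    """Write one JSON object per line, SageMaker Augmented Manifest style."""
    with open(path, "w") as f:
        for image_ref, caption in pairs:
            f.write(json.dumps({"image-ref": image_ref, "caption": caption}) + "\n")

write_manifest("train.jsonl", pairs)

# Each line round-trips as standalone JSON:
with open("train.jsonl") as f:
    lines = [json.loads(line) for line in f]
print(lines[0]["image-ref"])  # s3://bucket-1/folder1/0001.png
```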


Hyperparameters

These values can be adjusted for the Multimodal Embeddings model hyperparameters. The default values work well for most use cases.

  • Learning rate (min/max learning rate) – default: 5.00E-05, min: 5.00E-08, max: 1

  • Batch size (effective batch size) – default: 576, min: 256, max: 9,216

  • Max epochs – default: "auto", min: 1, max: 100
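As a sketch, the ranges above can be enforced before submitting a customization job. The dictionary keys here are illustrative names of our own, not the exact API field names.

```python
# Ranges taken from the hyperparameter list above; keys are illustrative.
HYPERPARAMETER_RANGES = {
    "learning_rate": (5.00e-08, 1.0),
    "batch_size": (256, 9216),
    "max_epochs": (1, 100),
}

DEFAULTS = {"learning_rate": 5.00e-05, "batch_size": 576, "max_epochs": "auto"}

def validate_hyperparameters(overrides):
    """Merge overrides onto the defaults, rejecting out-of-range values."""
    merged = dict(DEFAULTS)
    for name, value in overrides.items():
        lo, hi = HYPERPARAMETER_RANGES[name]
        if not (lo <= value <= hi):
            raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
        merged[name] = value
    return merged

print(validate_hyperparameters({"batch_size": 512}))
```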