Cohere Embed models
You make inference requests to an Embed model with InvokeModel. You need the model ID
for the model that you want to use. To get the model ID, see Amazon Bedrock model IDs.
Amazon Bedrock doesn't support streaming responses from Cohere Embed models.
Request and Response
Request
The Cohere Embed models have the following inference parameters.
{
    "texts": [string],
    "input_type": "search_document|search_query|classification|clustering",
    "truncate": "NONE|START|END",
    "embedding_types": ["float|int8|uint8|binary|ubinary"]
}
The following are required parameters.
- texts – An array of strings for the model to embed. For optimal performance, we recommend reducing the length of each text to less than 512 tokens. One token is about 4 characters.
The following are the limits on texts per call and on characters:
| Limit | Minimum | Maximum |
|---|---|---|
| Texts per call | 0 texts | 96 texts |
| Characters | 0 characters | 2048 characters |
- input_type – Prepends special tokens to differentiate each type from one another. Don't mix different types together, except when mixing types for search and retrieval. In this case, embed your corpus with the search_document type and your queries with the search_query type.
  - search_document – In search use cases, use search_document when you encode documents for embeddings that you store in a vector database.
  - search_query – Use search_query when querying your vector database to find relevant documents.
  - classification – Use classification when using embeddings as an input to a text classifier.
  - clustering – Use clustering when you cluster the embeddings.
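The texts limits and input_type values above can be combined when you prepare requests. The following is a minimal sketch, not part of any AWS SDK: build_embed_bodies is a hypothetical helper that splits a list of strings into JSON request bodies of at most 96 texts each and rejects unsupported input types. It assumes that the 2048-character limit applies to each individual text.

```python
import json

# Limits from the table above. Assumption: the 2048-character limit is per text.
MAX_TEXTS_PER_CALL = 96
MAX_CHARS_PER_TEXT = 2048
INPUT_TYPES = {"search_document", "search_query", "classification", "clustering"}


def build_embed_bodies(texts, input_type):
    """Split texts into InvokeModel request bodies that respect the per-call limits.

    Hypothetical helper for illustration; it is not an AWS or Cohere API.
    """
    if input_type not in INPUT_TYPES:
        raise ValueError(f"Unsupported input_type: {input_type}")
    for text in texts:
        if len(text) > MAX_CHARS_PER_TEXT:
            raise ValueError(f"Text longer than {MAX_CHARS_PER_TEXT} characters")
    bodies = []
    # Slice the list into consecutive batches of at most MAX_TEXTS_PER_CALL texts.
    for start in range(0, len(texts), MAX_TEXTS_PER_CALL):
        batch = texts[start:start + MAX_TEXTS_PER_CALL]
        bodies.append(json.dumps({"texts": batch, "input_type": input_type}))
    return bodies


# Embed the corpus as documents and the user's question as a query.
corpus_bodies = build_embed_bodies([f"document {i}" for i in range(200)], "search_document")
query_bodies = build_embed_bodies(["how do I reset my password?"], "search_query")
print(len(corpus_bodies), len(query_bodies))
```

With 200 documents, the corpus is split into three request bodies (96, 96, and 8 texts), and each body can be passed as the body argument of InvokeModel.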
The following are optional parameters:
- truncate – Specifies how the API handles inputs longer than the maximum token length. Use one of the following:
  - NONE – (Default) Returns an error when the input exceeds the maximum input token length.
  - START – Discards the start of the input.
  - END – Discards the end of the input.
  If you specify START or END, the model discards the input until the remaining input is exactly the maximum input token length for the model.
- embedding_types – Optional. Specifies the types of embeddings that you want returned. The default is None, which returns the Embed Floats response type. Can be one or more of the following types:
  - float – Use this value to return the default float embeddings.
  - int8 – Use this value to return signed int8 embeddings.
  - uint8 – Use this value to return unsigned int8 embeddings.
  - binary – Use this value to return signed binary embeddings.
  - ubinary – Use this value to return unsigned binary embeddings.
For more information, see https://docs.cohere.com/reference/embed in the
Cohere documentation.
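The optional parameters can be sketched the same way. In the following, simulate_truncate is a hypothetical illustration that models the documented NONE, START, and END behavior on a plain Python list of tokens; the model truncates with its own tokenizer and maximum token length, which this sketch does not reproduce. build_body is likewise a hypothetical helper that adds truncate and embedding_types to a request body only when you set them, and rejects values not listed above.

```python
import json

TRUNCATE_MODES = {"NONE", "START", "END"}
EMBEDDING_TYPES = {"float", "int8", "uint8", "binary", "ubinary"}


def simulate_truncate(tokens, max_tokens, mode="NONE"):
    """Model how each truncate mode treats an over-long list of tokens (illustration only)."""
    if mode not in TRUNCATE_MODES:
        raise ValueError(f"Unsupported truncate mode: {mode}")
    if len(tokens) <= max_tokens:
        return tokens
    if mode == "NONE":
        # NONE returns an error instead of shortening the input.
        raise ValueError("Input exceeds the maximum token length")
    if mode == "START":
        # Discard tokens from the start, keeping the last max_tokens tokens.
        return tokens[len(tokens) - max_tokens:]
    # END: discard tokens from the end, keeping the first max_tokens tokens.
    return tokens[:max_tokens]


def build_body(texts, input_type, truncate=None, embedding_types=None):
    """Build a request body, including optional parameters only when they are set."""
    body = {"texts": texts, "input_type": input_type}
    if truncate is not None:
        if truncate not in TRUNCATE_MODES:
            raise ValueError(f"Unsupported truncate value: {truncate}")
        body["truncate"] = truncate
    if embedding_types is not None:
        unknown = set(embedding_types) - EMBEDDING_TYPES
        if unknown:
            raise ValueError(f"Unsupported embedding types: {sorted(unknown)}")
        body["embedding_types"] = list(embedding_types)
    return json.dumps(body)


tokens = ["t1", "t2", "t3", "t4", "t5"]
print(simulate_truncate(tokens, 3, "START"))  # ['t3', 't4', 't5']
print(simulate_truncate(tokens, 3, "END"))    # ['t1', 't2', 't3']
print(build_body(["hello world"], "search_document",
                 truncate="END", embedding_types=["int8", "float"]))
```

Leaving truncate and embedding_types out of the body keeps the documented defaults (NONE and float embeddings).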
Response
The body of the response from a call to InvokeModel is the following:
{
    "embeddings": [
        [ <array of 1024 floats> ]
    ],
    "id": string,
    "response_type": "embeddings_floats",
    "texts": [string]
}
The body of the response has the following fields:
- id – An identifier for the response.
- response_type – The response type. This value is always embeddings_floats.
- embeddings – An array of embeddings, where each embedding is an array of floats with 1024 elements. The length of the embeddings array is the same as the length of the original texts array.
- texts – An array containing the text entries for which embeddings were returned.
For more information, see https://docs.cohere.com/reference/embed.
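To read these fields, you can decode the body and pair each embedding with its text. parse_embed_response is a hypothetical helper, and the sample body below is invented for illustration (3-element vectors instead of 1024) rather than real model output. The helper assumes that the embeddings field is a list, as in the response shape above.

```python
import json


def parse_embed_response(raw_body, expected_dim=1024):
    """Return (text, embedding) pairs after checking the documented response shape.

    raw_body can be a str or the bytes read from response['body'].
    """
    body = json.loads(raw_body)
    embeddings = body["embeddings"]
    texts = body["texts"]
    # The embeddings array has the same length as the texts array.
    if len(embeddings) != len(texts):
        raise ValueError("Expected one embedding per input text")
    for vector in embeddings:
        if len(vector) != expected_dim:
            raise ValueError(f"Expected {expected_dim} elements, got {len(vector)}")
    # A list of pairs (not a dict) keeps duplicate texts distinct.
    return list(zip(texts, embeddings))


# Hypothetical, shortened sample body for illustration only.
sample = json.dumps({
    "embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "id": "example-id",
    "response_type": "embeddings_floats",
    "texts": ["hello world", "this is a test"],
})
for text, vector in parse_embed_response(sample, expected_dim=3):
    print(text, vector)
```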
Code example
This example shows how to call the Cohere Embed English model.
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate text embeddings using the Cohere Embed English model.
"""
import json
import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def generate_text_embeddings(model_id, body):
    """
    Generate text embeddings by using the Cohere Embed model.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        dict: The response from the model.
    """

    logger.info(
        "Generating text embeddings with the Cohere Embed model %s", model_id)

    accept = '*/*'
    content_type = 'application/json'

    bedrock = boto3.client(service_name='bedrock-runtime')

    response = bedrock.invoke_model(
        body=body,
        modelId=model_id,
        accept=accept,
        contentType=content_type
    )

    logger.info(
        "Successfully generated text embeddings with Cohere model %s", model_id)

    return response


def main():
    """
    Entrypoint for Cohere Embed example.
    """

    # Configure logging once here; a second module-level basicConfig call
    # would make this one a no-op.
    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = 'cohere.embed-english-v3'
    text1 = "hello world"
    text2 = "this is a test"
    input_type = "search_document"
    embedding_types = ["int8", "float"]

    try:
        body = json.dumps({
            "texts": [
                text1,
                text2],
            "input_type": input_type,
            "embedding_types": embedding_types}
        )

        response = generate_text_embeddings(model_id=model_id,
                                            body=body)

        response_body = json.loads(response.get('body').read())

        print(f"ID: {response_body.get('id')}")
        print(f"Response type: {response_body.get('response_type')}")

        print("Embeddings")
        embeddings = response_body.get('embeddings')
        # Usually a list with one vector per input text. When embedding_types
        # is set, embeddings may instead be a dictionary keyed by embedding
        # type, so handle both shapes.
        if isinstance(embeddings, dict):
            for embedding_type, vectors in embeddings.items():
                print(f"\tType: {embedding_type}")
                for i, embedding in enumerate(vectors):
                    print(f"\tEmbedding {i}")
                    print(*embedding)
        else:
            for i, embedding in enumerate(embeddings):
                print(f"\tEmbedding {i}")
                print(*embedding)

        print("Texts")
        for i, text in enumerate(response_body.get('texts')):
            print(f"\tText {i}: {text}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")
    else:
        print(
            f"Finished generating text embeddings with Cohere model {model_id}.")


if __name__ == "__main__":
    main()
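For search and retrieval, a corpus embedded with search_document and a query embedded with search_query are then compared vector by vector. The following is a minimal sketch, assuming cosine similarity (a common choice; this page does not prescribe a metric) and in-memory Python lists instead of a vector database. The helper names and the 2-element toy vectors are hypothetical stand-ins for real 1024-element float embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Treat zero vectors as unrelated rather than dividing by zero.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, doc_vectors, top_k=3):
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query_vector, v)) for i, v in enumerate(doc_vectors)]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


# Toy vectors standing in for search_document and search_query embeddings.
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.1]
ranked = rank_documents(query, docs, top_k=2)
print(ranked)
```

This sketch applies to float embeddings; the int8, uint8, binary, and ubinary types would need their own handling.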