Request and response structure for image generation - Amazon Nova

The following examples present different image generation use cases. Each example provides an explanation of the fields that are used for the image generation.

Text-to-image request
{
  "taskType": "TEXT_IMAGE",
  "textToImageParams": {
    "text": string,
    "negativeText": string
  },
  "imageGenerationConfig": {
    "width": int,
    "height": int,
    "quality": "standard" | "premium",
    "cfgScale": float,
    "seed": int,
    "numberOfImages": int
  }
}

The following textToImageParams fields are used in this request:

  • text (Required) – A text prompt to generate the image. The prompt must be 1-1024 characters in length.

  • negativeText (Optional) – A text prompt to define what not to include in the image. This value must be 1-1024 characters in length.

Note

Avoid using negating words (“no”, “not”, “without”, etc.) in your text and negativeText values. For example, if you do not want mirrors in an image, instead of including "no mirrors" or "without mirrors" in the text field, use the word "mirrors" in the negativeText field.
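As an illustration of the fields above, the following minimal sketch assembles a TEXT_IMAGE request body and enforces the 1-1024 character limits. The helper name build_text_to_image_request and the example prompts are hypothetical and not part of the API; imageGenerationConfig is left out, so the defaults would apply.

```python
import json

MAX_PROMPT_CHARS = 1024  # limit stated above for text and negativeText


def build_text_to_image_request(text, negative_text=None):
    """Return a TEXT_IMAGE request body as a dict (hypothetical helper)."""
    if not 1 <= len(text) <= MAX_PROMPT_CHARS:
        raise ValueError("text must be 1-1024 characters long")
    params = {"text": text}
    if negative_text is not None:
        if not 1 <= len(negative_text) <= MAX_PROMPT_CHARS:
            raise ValueError("negativeText must be 1-1024 characters long")
        params["negativeText"] = negative_text
    return {"taskType": "TEXT_IMAGE", "textToImageParams": params}


# Unwanted content goes in negativeText instead of writing "no mirrors" in text.
body = build_text_to_image_request(
    "a sunlit bathroom with marble tiles", negative_text="mirrors"
)
print(json.dumps(body, indent=2))
```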

Text-to-image request with image conditioning
{
  "taskType": "TEXT_IMAGE",
  "textToImageParams": {
    "conditionImage": string (Base64 encoded image),
    "controlMode": "CANNY_EDGE" | "SEGMENTATION",
    "controlStrength": float,
    "text": string,
    "negativeText": string
  },
  "imageGenerationConfig": {
    "width": int,
    "height": int,
    "quality": "standard" | "premium",
    "cfgScale": float,
    "seed": int,
    "numberOfImages": int
  }
}

The following textToImageParams fields are used in this request:

  • conditionImage (Required) – A JPEG or PNG image that guides the layout and composition of the generated image. The image must be formatted as a Base64 string. See Input images for image generation for additional requirements.

  • controlMode (Optional) – Specifies which conditioning mode is used. The default value is "CANNY_EDGE".

    • CANNY_EDGE – Elements of the generated image will follow the prominent contours, or "edges", of the condition image closely.

    • SEGMENTATION – The condition image will be automatically analyzed to identify prominent content shapes. This analysis results in a segmentation mask which guides the generation, resulting in a generated image that closely follows the layout of the condition image but allows the model more freedom within the bounds of each content area.

  • controlStrength (Optional) – Specifies how similar the layout and composition of the generated image should be to the conditionImage. The range is 0 to 1.0, and lower values introduce more randomness. The default value is 0.7.

  • text (Required) – A text prompt to generate the image. The prompt must be 1-1024 characters in length.

  • negativeText (Optional) – A text prompt to define what not to include in the image. This value must be 1-1024 characters in length.

Note

Avoid using negating words (“no”, “not”, “without”, etc.) in your text and negativeText values. For example, if you do not want mirrors in an image, instead of including "no mirrors" or "without mirrors" in the text field, use the word "mirrors" in the negativeText field.
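The conditioning fields can be sketched as follows. This is an illustrative helper, not an official one: the function name and the placeholder image bytes are assumptions, and in practice you would read the bytes of a real JPEG or PNG file.

```python
import base64

CONTROL_MODES = {"CANNY_EDGE", "SEGMENTATION"}


def build_conditioned_request(image_bytes, text,
                              control_mode="CANNY_EDGE", control_strength=0.7):
    """Return a TEXT_IMAGE request body that uses image conditioning."""
    if control_mode not in CONTROL_MODES:
        raise ValueError(f"controlMode must be one of {sorted(CONTROL_MODES)}")
    if not 0.0 <= control_strength <= 1.0:
        raise ValueError("controlStrength must be between 0 and 1.0")
    return {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {
            # The condition image is sent as a Base64 string.
            "conditionImage": base64.b64encode(image_bytes).decode("ascii"),
            "controlMode": control_mode,
            "controlStrength": control_strength,
            "text": text,
        },
    }


# Placeholder bytes only; a real call needs actual JPEG or PNG data.
placeholder_image = b"not a real image"
request = build_conditioned_request(
    placeholder_image, "a watercolor city skyline", control_mode="SEGMENTATION"
)
```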

Color guided image generation request
{
  "taskType": "COLOR_GUIDED_GENERATION",
  "colorGuidedGenerationParams": {
    "colors": string[] (list of hexadecimal color values),
    "referenceImage": string (Base64 encoded image),
    "text": string,
    "negativeText": string
  },
  "imageGenerationConfig": {
    "width": int,
    "height": int,
    "quality": "standard" | "premium",
    "cfgScale": float,
    "seed": int,
    "numberOfImages": int
  }
}

The following colorGuidedGenerationParams fields are used in this request:

  • colors (Required) – A list of up to 10 color codes that define the desired color palette for your image. Colors are expressed as hexadecimal values in the form “#RRGGBB”. For example, "#00FF00" is pure green and "#FCF2AB" is a warm yellow. The colors list has the strongest effect when a referenceImage is not provided. Otherwise, the colors in the list and the colors from the reference image will both be used in the final output.

  • referenceImage (Optional) – A JPEG or PNG image to use as a subject and style reference. The colors of the image will also be incorporated into your final output, along with the colors from the colors list. See Input images for image generation for additional requirements.

  • text (Required) – A text prompt to generate the image. The prompt must be 1-1024 characters in length.

  • negativeText (Optional) – A text prompt to define what not to include in the image. This value must be 1-1024 characters in length.

Note

Avoid using negating words (“no”, “not”, “without”, etc.) in your text and negativeText values. For example, if you do not want mirrors in an image, instead of including "no mirrors" or "without mirrors" in the text field, use the word "mirrors" in the negativeText field.
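A client-side check of the colors list might look like the sketch below. The helper names are hypothetical. Two details are assumptions rather than documented behavior: that at least one color is needed because the field is required, and that lowercase hexadecimal digits are acceptable.

```python
import re

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")  # "#RRGGBB"; lowercase accepted (assumption)
MAX_COLORS = 10


def validate_colors(colors):
    """Check the palette against the limits described above."""
    if not 1 <= len(colors) <= MAX_COLORS:  # lower bound of 1 is an assumption
        raise ValueError("colors must contain between 1 and 10 entries")
    for color in colors:
        if not HEX_COLOR.fullmatch(color):
            raise ValueError(f"not a #RRGGBB color: {color!r}")
    return list(colors)


def build_color_guided_request(colors, text):
    """Return a COLOR_GUIDED_GENERATION request body without a reference image."""
    return {
        "taskType": "COLOR_GUIDED_GENERATION",
        "colorGuidedGenerationParams": {
            "colors": validate_colors(colors),
            "text": text,
        },
    }


palette_request = build_color_guided_request(
    ["#00FF00", "#FCF2AB"], "a cozy reading nook"
)
```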

Image variation request
{
  "taskType": "IMAGE_VARIATION",
  "imageVariationParams": {
    "images": string[] (list of Base64 encoded images),
    "similarityStrength": float,
    "text": string,
    "negativeText": string
  },
  "imageGenerationConfig": {
    "numberOfImages": int,
    "height": int,
    "width": int,
    "cfgScale": float,
    "seed": int
  }
}

The following imageVariationParams fields are used in this request:

  • images (Required) – A list of 1–5 images to use as references. Each image must be in JPEG or PNG format and encoded as a Base64 string. See Input images for image generation for additional requirements.

  • similarityStrength (Optional) – Specifies how similar the generated image should be to the input images. Valid values are between 0.2 and 1.0, with lower values introducing more randomness.

  • text (Optional) – A text prompt to generate the image. The prompt must be 1-1024 characters in length.

  • negativeText (Optional) – A text prompt to define what not to include in the image. This value must be 1-1024 characters in length.

Note

Avoid using negating words (“no”, “not”, “without”, etc.) in your text and negativeText values. For example, if you do not want mirrors in an image, instead of including "no mirrors" or "without mirrors" in the text field, use the word "mirrors" in the negativeText field.
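The list and range limits for image variation can be checked before sending a request. The following is a rough sketch with a hypothetical helper name and placeholder image bytes; similarityStrength is only included when the caller sets it.

```python
import base64


def build_image_variation_request(images, similarity_strength=None, text=None):
    """Return an IMAGE_VARIATION request body from raw image bytes."""
    if not 1 <= len(images) <= 5:
        raise ValueError("images must contain 1 to 5 entries")
    params = {
        "images": [base64.b64encode(data).decode("ascii") for data in images]
    }
    if similarity_strength is not None:
        if not 0.2 <= similarity_strength <= 1.0:
            raise ValueError("similarityStrength must be between 0.2 and 1.0")
        params["similarityStrength"] = similarity_strength
    if text is not None:
        params["text"] = text
    return {"taskType": "IMAGE_VARIATION", "imageVariationParams": params}


# Placeholder bytes stand in for real JPEG or PNG data.
variation_request = build_image_variation_request(
    [b"first image", b"second image"], similarity_strength=0.5
)
```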

Inpainting request
{
  "taskType": "INPAINTING",
  "inPaintingParams": {
    "image": string (Base64 encoded image),
    "maskPrompt": string,
    "maskImage": string (Base64 encoded image),
    "text": string,
    "negativeText": string
  },
  "imageGenerationConfig": {
    "numberOfImages": int,
    "quality": "standard" | "premium",
    "cfgScale": float,
    "seed": int
  }
}

The following inPaintingParams fields are used in this request:

  • image (Required) – The JPEG or PNG that you want to modify, formatted as a Base64 string. See Input images for image generation for additional requirements.

  • maskPrompt or maskImage (Required) – You must specify either the maskPrompt or the maskImage parameter, but not both.

    The maskPrompt is a natural language text prompt that describes the regions of the image to edit.

    The maskImage is an image that defines the areas of the image to edit. The mask image must be the same size as the input image. Areas to be edited are shaded pure black and areas to ignore are shaded pure white. No other colors are allowed in the mask image.

  • text (Optional) – A text prompt that describes what to generate within the masked region. The prompt must be 1-1024 characters in length. If you omit this field, the model will remove elements inside the masked area. They will be replaced with a seamless extension of the image background.

  • negativeText (Optional) – A text prompt to define what not to include in the image. This value must be 1-1024 characters in length.

Note

Avoid using negating words (“no”, “not”, “without”, etc.) in your text and negativeText values. For example, if you do not want mirrors in an image, instead of including "no mirrors" or "without mirrors" in the text field, use the word "mirrors" in the negativeText field.
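The rule that exactly one of maskPrompt and maskImage is supplied is easy to get wrong, so a sketch of a client-side check may help. The helper name and example values are hypothetical, and the image arguments are assumed to be Base64 strings already.

```python
def build_inpainting_request(image_b64, mask_prompt=None, mask_image_b64=None,
                             text=None):
    """Return an INPAINTING request body with exactly one kind of mask."""
    # True when both masks are given or both are missing.
    if (mask_prompt is None) == (mask_image_b64 is None):
        raise ValueError("specify either maskPrompt or maskImage, but not both")
    params = {"image": image_b64}
    if mask_prompt is not None:
        params["maskPrompt"] = mask_prompt
    else:
        params["maskImage"] = mask_image_b64
    if text is not None:
        # Omitting text asks the model to remove what is inside the mask.
        params["text"] = text
    return {"taskType": "INPAINTING", "inPaintingParams": params}


# "aW1hZ2U=" is a placeholder Base64 string, not a real image.
removal_request = build_inpainting_request("aW1hZ2U=", mask_prompt="the red car")
```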

Outpainting request
{
  "taskType": "OUTPAINTING",
  "outPaintingParams": {
    "image": string (Base64 encoded image),
    "maskPrompt": string,
    "maskImage": string (Base64 encoded image),
    "outPaintingMode": "DEFAULT" | "PRECISE",
    "text": string,
    "negativeText": string
  },
  "imageGenerationConfig": {
    "numberOfImages": int,
    "quality": "standard" | "premium",
    "cfgScale": float,
    "seed": int
  }
}

The following outPaintingParams fields are used in this request:

  • image (Required) – The JPEG or PNG that you want to modify, formatted as a Base64 string. See Input images for image generation for additional requirements.

  • maskPrompt or maskImage (Required) – You must specify either the maskPrompt or the maskImage parameter, but not both.

    The maskPrompt is a natural language text prompt that describes the regions of the image to edit.

    The maskImage is an image that defines the areas of the image to edit. The mask image must be the same size as the input image. Areas to be edited are shaded pure black and areas to ignore are shaded pure white. No other colors are allowed in the mask image.

  • outPaintingMode – Determines how the mask that you provide is interpreted.

    Use DEFAULT to transition smoothly between the masked area and the non-masked area. Some of the original pixels are used as the starting point for the new background. This mode is generally better when you want the new background to use similar colors as the original background. However, you can get a halo effect if your prompt calls for a new background that is significantly different than the original background.

    Use PRECISE to strictly adhere to the mask boundaries. This mode is generally better when you are making significant changes to the background.

  • text (Optional) – A text prompt that describes what to generate within the masked region. The prompt must be 1-1024 characters in length. If you omit this field, the model will remove elements inside the masked area. They will be replaced with a seamless extension of the image background.

  • negativeText (Optional) – A text prompt to define what not to include in the image. This value must be 1-1024 characters in length.

Note

Avoid using negating words (“no”, “not”, “without”, etc.) in your text and negativeText values. For example, if you do not want mirrors in an image, instead of including "no mirrors" or "without mirrors" in the text field, use the word "mirrors" in the negativeText field.
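To show where outPaintingMode sits in the request, here is a small sketch that always sends the mode explicitly. The helper name and the example values are placeholders, and only the maskPrompt variant is shown for brevity.

```python
OUTPAINTING_MODES = {"DEFAULT", "PRECISE"}


def build_outpainting_request(image_b64, mask_prompt, text, mode="DEFAULT"):
    """Return an OUTPAINTING request body that uses a maskPrompt."""
    if mode not in OUTPAINTING_MODES:
        raise ValueError(f"outPaintingMode must be one of {sorted(OUTPAINTING_MODES)}")
    return {
        "taskType": "OUTPAINTING",
        "outPaintingParams": {
            "image": image_b64,
            "maskPrompt": mask_prompt,
            "outPaintingMode": mode,
            "text": text,
        },
    }


# PRECISE is described above as the better fit for significant background changes.
outpaint_request = build_outpainting_request(
    "aW1hZ2U=",              # placeholder Base64 string, not a real image
    "the sofa",              # placeholder mask prompt
    "a beach at sunset",
    mode="PRECISE",
)
```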

Background removal request
{
  "taskType": "BACKGROUND_REMOVAL",
  "backgroundRemovalParams": {
    "image": string (Base64 encoded image)
  }
}

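The following backgroundRemovalParams field is used in this request:

  • image (Required) – The JPEG or PNG that you want to modify, formatted as a Base64 string. See Input images for image generation for additional requirements.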

The BACKGROUND_REMOVAL task will return a PNG image with full 8-bit transparency. This format gives you smooth, clean isolation of the foreground objects and makes it easy to composite the image with other elements in an image editing app, presentation, or website. The background can easily be changed to a solid color using simple custom code.
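Changing the background to a solid color comes down to alpha blending each pixel over that color. The sketch below shows only the per-pixel arithmetic for 8-bit RGBA values; the function name is hypothetical, and a real program would normally let an image library decode the PNG and apply this operation to every pixel.

```python
def blend_over(rgba, background_rgb):
    """Blend one 8-bit RGBA pixel over an opaque RGB background color."""
    red, green, blue, alpha = rgba
    return tuple(
        # Weighted average of foreground and background, rounded to nearest.
        (fg * alpha + bg * (255 - alpha) + 127) // 255
        for fg, bg in zip((red, green, blue), background_rgb)
    )


blue_background = (0, 0, 255)
print(blend_over((255, 0, 0, 255), blue_background))  # opaque pixel keeps its color
print(blend_over((255, 0, 0, 0), blue_background))    # transparent pixel shows background
print(blend_over((255, 0, 0, 128), blue_background))  # soft edge mixes the two
```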

Response body
{
  "images": string[] (list of Base64 encoded images),
  "error": string
}

The response body will contain one or more of the following fields:

  • images – When successful, a list of Base64-encoded strings that represent each image that was generated is returned. This list does not always contain the same number of images that you requested. Individual images might be blocked after generation if they do not align with the AWS Responsible AI (RAI) content moderation policy. Only images that align with the RAI policy are returned.

  • error – If any image does not align with the RAI policy, this field is returned. Otherwise, this field is omitted from the response.
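Because the images list can be shorter than requested and error is only sometimes present, response handling should treat both fields as optional. A minimal sketch, assuming the response body has already been read into a JSON string (the function name and sample payload are made up for illustration):

```python
import base64
import json


def decode_response(raw_body):
    """Return (list of image bytes, error message or None) from a response body."""
    payload = json.loads(raw_body)
    images = [base64.b64decode(item) for item in payload.get("images", [])]
    return images, payload.get("error")


# Made-up payload: one image came back and an error message was included.
sample = json.dumps({
    "images": [base64.b64encode(b"fake image bytes").decode("ascii")],
    "error": "example error message",
})
images, error = decode_response(sample)
if error is not None:
    print(f"Received {len(images)} image(s); service reported: {error}")
```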

The imageGenerationConfig field is common to all task types except BACKGROUND_REMOVAL. It is optional and contains the following fields. If you omit this object, the default configurations are used.

  • width and height (Optional) – Define the size and aspect ratio of the generated image. Both default to 1024. For the full list of supported resolutions, see Supported image resolutions.

  • quality (Optional) – Specifies the quality to use when generating the image: "standard" (default) or "premium".

  • cfgScale (Optional) – Specifies how strongly the generated image should adhere to the prompt. Use a lower value to introduce more randomness in the generation.

    Minimum   Maximum   Default
    1.1       10        6.5

  • numberOfImages (Optional) – The number of images to generate.

    Minimum   Maximum   Default
    1         5         1

  • seed (Optional) – Determines the initial noise setting for the generation process. Changing the seed value while leaving all other parameters the same will produce a totally new image that still adheres to your prompt, dimensions, and other settings. It is common to experiment with a variety of seed values to find the perfect image.

    Minimum   Maximum       Default
    0         858,993,459   12

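The ranges listed above can be checked on the client before a request is sent. The following sketch uses a hypothetical helper name and applies the documented defaults; it does not check width and height against the supported resolutions, and it leaves them out unless they are set, since the inpainting and outpainting schemas above do not list them.

```python
MAX_SEED = 858_993_459
QUALITIES = {"standard", "premium"}


def build_image_generation_config(width=None, height=None, quality="standard",
                                  cfg_scale=6.5, seed=12, number_of_images=1):
    """Return an imageGenerationConfig dict after range-checking its values."""
    if quality not in QUALITIES:
        raise ValueError('quality must be "standard" or "premium"')
    if not 1.1 <= cfg_scale <= 10:
        raise ValueError("cfgScale must be between 1.1 and 10")
    if not 1 <= number_of_images <= 5:
        raise ValueError("numberOfImages must be between 1 and 5")
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED}")
    config = {
        "quality": quality,
        "cfgScale": cfg_scale,
        "seed": seed,
        "numberOfImages": number_of_images,
    }
    # Resolution is not validated here; see Supported image resolutions.
    if width is not None:
        config["width"] = width
    if height is not None:
        config["height"] = height
    return config


# Trying several seeds with otherwise identical settings yields different images.
configs = [build_image_generation_config(width=1024, height=1024, seed=s)
           for s in (12, 13, 14)]
```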
Important

Resolution (width and height), numberOfImages, and quality all have an impact on the time it takes for generation to complete. The AWS SDK has a default read_timeout of 60 seconds which can easily be exceeded when using higher values for these parameters. Therefore, it is recommended that you increase the read_timeout of your invocation calls to at least 5 minutes (300 seconds). The code examples demonstrate how to do this.
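One way to raise the timeout with the AWS SDK for Python (Boto3) is sketched below. Several values are assumptions that do not come from this section: the model ID, the default Region, and the helper names. The boto3 imports are deferred so that the request-building logic can be exercised without the SDK installed.

```python
import json

READ_TIMEOUT_SECONDS = 300  # the 5-minute minimum recommended above
MODEL_ID = "amazon.nova-canvas-v1:0"  # assumption: not stated in this section


def make_client(region_name="us-east-1"):  # Region is an assumption
    """Create a Bedrock Runtime client with a longer read timeout."""
    import boto3
    from botocore.config import Config

    return boto3.client(
        "bedrock-runtime",
        region_name=region_name,
        config=Config(read_timeout=READ_TIMEOUT_SECONDS),
    )


def generate(client, request_body):
    """Send a request body (a dict) and return the parsed response body."""
    response = client.invoke_model(modelId=MODEL_ID, body=json.dumps(request_body))
    return json.loads(response["body"].read())
```

A call would then look like generate(make_client(), request_body), where request_body is any of the request structures described on this page.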