

# Cross-modal input
<a name="sonic-cross-modal"></a>

Amazon Nova 2 Sonic now supports cross-modal input, allowing you to send text messages in addition to voice input during a conversation session. While speech remains the primary mode of interaction, text input provides flexibility for scenarios where typing is more convenient or appropriate.

**Continuous streaming required:** Cross-modal input works only within an active streaming session. Keep the session streaming continuously, as you would in a regular voice session; otherwise, the standard session timeouts apply and the connection is terminated.

Cross-modal text input is useful in scenarios such as:
+ Client-side app integration (web and mobile): Allows users to interact with the application using both text and voice, supporting seamless multimodal experiences.
+ "Model-start-first" use case: A text message can be sent immediately after the session starts to prompt the model to begin speaking.
+ Guiding the model during asynchronous tool calling: When a `toolUse` event is triggered and your system begins processing the tool call, the client can send a text message to Nova 2 Sonic so that the model gives a natural response while the user waits, such as: “Hold on a second while I process your information. In the meantime, is there anything else I can assist with?”
+ Telephony DTMF integration: The caller uses the phone keypad to enter sensitive information (such as credit card numbers). Note: Amazon Nova Sonic does not process DTMF tones natively. To support DTMF input, your system must detect the tones, convert them to text (such as "1234"), and send that text to Nova 2 Sonic.
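The DTMF bullet leaves tone detection and conversion to your system. As a minimal sketch, assuming your telephony platform already reports each detected key press as a single character, the hypothetical `DtmfBuffer` class below collects keys until the caller presses a terminator key (`#` here, which is an assumption) and returns the text you would then send as the `content` of a text input event.

```python
from typing import List, Optional


class DtmfBuffer:
    """Hypothetical helper: collects detected keypad presses until a terminator key."""

    VALID_KEYS = frozenset("0123456789*#")

    def __init__(self, terminator: str = "#") -> None:
        # Using "#" as the end-of-entry key is an assumption; match your IVR prompt.
        self.terminator = terminator
        self._keys: List[str] = []

    def press(self, key: str) -> Optional[str]:
        """Record one key press; return the buffered text when the terminator arrives."""
        if key not in self.VALID_KEYS:
            raise ValueError(f"not a DTMF key: {key!r}")
        if key == self.terminator:
            text = "".join(self._keys)
            self._keys.clear()  # start a fresh entry for the next prompt
            return text
        self._keys.append(key)
        return None


# Usage: feed keys in the order your telephony layer reports them.
buffer = DtmfBuffer()
for key in "1234#":
    text = buffer.press(key)
    if text is not None:
        print(text)  # prints 1234
```

Buffering until a terminator means the model receives one complete entry (for example, a full card number) as a single text message rather than one message per key press.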

## How it works
<a name="sonic-cross-modal-works"></a>

Cross-modal input uses a three-event sequence similar to audio input:

1. **Content Start Event:** Signals the beginning of text input

1. **Text Input Event:** Contains the actual text message

1. **Content End Event:** Signals the completion of text input

All three events must use the same promptName and contentName to maintain the sequence. A new UUID should be generated for contentName each time you send text input to ensure proper multi-turn conversation tracking.
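The sequence and naming rules above can be sketched in Python. The `build_text_input_events` function and the `"my-prompt-name"` value are hypothetical; the function only builds the three event dictionaries in the shapes shown in the Event structure section, reusing one `promptName` and generating a fresh `uuid.uuid4()` value for `contentName` on every message. Serializing each event and writing it to the bidirectional stream depends on the SDK you use and is left out.

```python
import uuid
from typing import Any, Dict, List


def build_text_input_events(prompt_name: str, text: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: return contentStart, textInput, contentEnd for one message."""
    # New UUID per text message; shared by all three events of this message.
    content_name = str(uuid.uuid4())
    return [
        {"event": {"contentStart": {
            "promptName": prompt_name,
            "contentName": content_name,
            "role": "USER",
            "type": "TEXT",
            "interactive": True,
            "textInputConfiguration": {"mediaType": "text/plain"},
        }}},
        {"event": {"textInput": {
            "promptName": prompt_name,
            "contentName": content_name,
            "content": text,
        }}},
        {"event": {"contentEnd": {
            "promptName": prompt_name,
            "contentName": content_name,
        }}},
    ]


# Usage: two text messages in the same session share promptName,
# but each gets its own contentName.
first = build_text_input_events("my-prompt-name", "Hello")
second = build_text_input_events("my-prompt-name", "What can you help me with?")
```

Generating the UUID inside the function keeps the three events of one message tied together while making it hard to accidentally reuse a `contentName` across turns.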

## Event structure
<a name="sonic-cross-modal-events"></a>

### 1. Content Start Event
<a name="cross-modal-events-start-event"></a>

Initiates the text input sequence with configuration details:

```
{
  "event": {
    "contentStart": {
      "promptName": "<prompt_name>",
      "contentName": "<new_content_name>",
      "role": "USER",
      "type": "TEXT",
      "interactive": true,
      "textInputConfiguration": {
        "mediaType": "text/plain"
      }
    }
  }
}
```

Key Parameters:
+ `promptName`: The name of your conversation prompt (consistent across the session)
+ `contentName`: A unique identifier for this text input (generate a new UUID for each message)
+ `role`: Set to `"USER"` to indicate user input
+ `type`: Set to `"TEXT"` for text input
+ `interactive`: Set to `true` to enable interactive mode
+ `mediaType`: Set to `"text/plain"` for plain text content

### 2. Text Input Event
<a name="cross-modal-events-text-input-event"></a>

Contains the actual text message content:

```
{
  "event": {
    "textInput": {
      "promptName": "<prompt_name>",
      "contentName": "<new_content_name>",
      "content": "<your_text_message>"
    }
  }
}
```

Key Parameters:
+ `promptName`: Must match the value from the Content Start Event
+ `contentName`: Must match the value from the Content Start Event
+ `content`: Your text message string
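Because `content` carries arbitrary user text (quotes, line breaks, transcribed digits), building the event with a JSON library rather than string templating keeps the payload valid. The following is a minimal sketch using Python's standard `json` module; the `text_input_event_json` helper and the placeholder names are hypothetical.

```python
import json


def text_input_event_json(prompt_name: str, content_name: str, text: str) -> str:
    """Hypothetical helper: serialize a textInput event to a JSON string."""
    event = {
        "event": {
            "textInput": {
                "promptName": prompt_name,
                "contentName": content_name,
                "content": text,  # json.dumps escapes quotes, backslashes, and newlines
            }
        }
    }
    return json.dumps(event)


# Usage: user text containing quotes and a line break still yields valid JSON.
message = 'Card ending "1234"\nIs that correct?'
payload = text_input_event_json("my-prompt-name", "my-content-name", message)
```

Pasting `message` directly into a hand-written JSON template would break the payload at the first unescaped quote or newline; letting the serializer handle escaping avoids that class of error.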

### 3. Content End Event
<a name="cross-modal-events-content-end-event"></a>

Signals the completion of the text input:

```
{
  "event": {
    "contentEnd": {
      "promptName": "<prompt_name>",
      "contentName": "<new_content_name>"
    }
  }
}
```

Key Parameters:
+ `promptName`: Must match the value from previous events
+ `contentName`: Must match the value from previous events
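Since all three events must share `promptName` and `contentName` and arrive in order, a client-side check before sending can catch mistakes early. The hypothetical `check_text_sequence` function below is a sketch that validates the event order, the shared identifiers, and the fixed Content Start fields listed in this section; it is not part of any AWS SDK.

```python
from typing import Any, Dict, List

EXPECTED_ORDER = ["contentStart", "textInput", "contentEnd"]


def check_text_sequence(events: List[Dict[str, Any]]) -> None:
    """Hypothetical validator: raise ValueError unless events form one text input sequence."""
    # Each event is {"event": {<kind>: {...}}}; take the single inner key as its kind.
    kinds = [next(iter(e["event"])) for e in events]
    if kinds != EXPECTED_ORDER:
        raise ValueError(f"expected {EXPECTED_ORDER}, got {kinds}")
    bodies = [e["event"][k] for e, k in zip(events, kinds)]
    # All three events must carry the same promptName/contentName pair.
    pairs = {(b.get("promptName"), b.get("contentName")) for b in bodies}
    if len(pairs) != 1:
        raise ValueError("promptName/contentName must match across all three events")
    start = bodies[0]
    if start.get("role") != "USER" or start.get("type") != "TEXT":
        raise ValueError("contentStart must use role USER and type TEXT")
    if start.get("textInputConfiguration", {}).get("mediaType") != "text/plain":
        raise ValueError("textInputConfiguration.mediaType must be text/plain")
```

Running a check like this in development or tests is cheap; in production you might log and drop a malformed sequence instead of raising.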