
Troubleshooting FAQs

This document describes best practices and troubleshooting tips for Amazon Interactive Video Service (IVS). Unexpected or unintended behaviors can occur at various points in the streaming process, from broadcasting to playback of content.

For information on support and other Amazon IVS resources, see Resources and Support.

Broadcasting and Encoding

Questions in this section are about broadcasting, encoding, and first-mile conditions of streaming to IVS. These behaviors occur before the content reaches IVS servers.

Topics:

What is stream starvation?
Why did the stream suddenly stop?
What happens when I switch networks while streaming?
How can I have multi-region redundancy with IVS?
How do I troubleshoot an IVS Web Broadcast SDK session?
How do I use Google Chrome’s WebRTC-internals metrics to evaluate an IVS Web Broadcast SDK session?

What is stream starvation?

"Stream starvation" is a delay or halt in content packet delivery when you are sending content to IVS; that is, when content is being ingested by IVS. If IVS does not receive, over a certain timeframe, the number of bits on ingest that the encoding device advertised it would send, this is considered a starvation event. Often, starvation events are caused by the broadcaster’s encoder, local network conditions, and/or problems in transit over the public internet between the encoding device and IVS.

From a viewer's perspective, starvation events may appear as video that lags, buffers, or freezes. Stream-starvation events can be brief (less than 5 seconds) or long (several minutes), depending on the nature of the starvation event.

To allow monitoring for starvation events, IVS sends starvation events as Amazon EventBridge events; see Examples: Stream Health Change in Using Amazon EventBridge with Amazon IVS. These are sent when a stream enters or exits a state of starvation. Depending on the use case, you can take an appropriate action, like notifying the broadcaster and viewers of intermittent stream conditions.
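As a minimal sketch of acting on these events, the Python function below classifies a single EventBridge event. It assumes the event shape shown in the Stream Health Change examples: a source of aws.ivs, a detail-type of "IVS Stream Health Change", and a detail.event_name of "Starvation Start" or "Starvation End". Verify these field names against the events your account actually receives. The function names and return values are illustrative only.

```python
from typing import Optional

# Assumed values, based on the documented Stream Health Change examples.
HEALTH_DETAIL_TYPE = "IVS Stream Health Change"
STARVATION_START = "Starvation Start"
STARVATION_END = "Starvation End"


def starvation_transition(event: dict) -> Optional[str]:
    """Return "start", "end", or None for a single EventBridge event."""
    if event.get("source") != "aws.ivs":
        return None
    if event.get("detail-type") != HEALTH_DETAIL_TYPE:
        return None
    name = event.get("detail", {}).get("event_name")
    if name == STARVATION_START:
        return "start"
    if name == STARVATION_END:
        return "end"
    return None


def lambda_handler(event, context):
    """Hypothetical AWS Lambda target for an EventBridge rule."""
    transition = starvation_transition(event)
    if transition == "start":
        # Replace with a real notification to the broadcaster or viewers.
        print("Stream is starving:", event.get("resources"))
    elif transition == "end":
        print("Stream recovered:", event.get("resources"))
    return {"transition": transition}
```

Keeping the classification in a plain function makes it easy to test without deploying anything; the handler only decides what to do with the result.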

For additional starvation monitoring tools, see Monitoring Amazon IVS Low-Latency Streaming, the IVS ListStreams API endpoint (filtering by health), and the IVS GetStream endpoint (to analyze an individual stream). Also see How do I monitor stream-starvation events?

Why did the stream suddenly stop?

The following are the most common reasons why a stream can abruptly stop (i.e., the stream session ends):

Missing ingest data — When the ingest of a stream session completely stops (no data ingested into IVS) for 30 seconds, the IVS ingest server terminates the IVS stream session. The 30-second period allows the broadcaster to reconnect to the ingest server. However, in some cases (such as switching networks), reconnection to the existing stream session may not be possible, as the TLS handshake of RTMPS has been broken. Common root causes for this include network issues (like congestion between the broadcast device and IVS), complete loss of internet on the broadcast device, or the broadcast device not producing content segments (FLV tags).

Often, stream disconnection aligns with a stream-starvation event; the starvation event is triggered when there is a halt in incoming data. If a starvation-start event is sent and then a stream-end event is sent (without a starvation-end event), this often indicates that the stream was ended due to no data being sent to IVS.
IVS StopStream endpoint — During an IVS stream session, if the StopStream API call is made, the IVS stream session will end. The StopStream endpoint disconnects the incoming RTMPS stream from the IVS ingest server. Depending on the encoding software/hardware being used, a new stream session may be attempted.
Encoder error — Some software/hardware encoders will disconnect the stream session when an error occurs during the encoding process. From the IVS perspective, these disconnections appear as intentional disconnects by the broadcaster. However, in the encoding logs, it may be determined that the stream was disconnected due to an unintentional error.
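The starvation-then-end pattern described under "Missing ingest data" can be checked mechanically. The sketch below takes the time-ordered event names for one stream session and reports whether the session ended while still starving. The literal event names are assumptions based on the IVS EventBridge examples, and the function name is hypothetical.

```python
from typing import Iterable


def ended_while_starving(event_names: Iterable[str]) -> bool:
    """Return True if the session ended while still in a starvation state.

    event_names is the time-ordered sequence of event names for a single
    stream session. The names compared below are assumed, not guaranteed.
    """
    starving = False
    for name in event_names:
        if name == "Starvation Start":
            starving = True
        elif name == "Starvation End":
            starving = False
        elif name == "Stream End":
            # Ended with no Starvation End since the last Starvation Start.
            return starving
    return False  # no Stream End seen, so the session has not ended
```

A True result suggests, but does not prove, that the stream ended because no data reached IVS; the encoder logs remain the place to confirm the root cause.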

What happens when I switch networks while streaming?

When a broadcaster switches networks (for example, from WiFi to cellular), an ongoing RTMPS connection is disconnected. While the broadcaster’s internet connection probably is re-established after 3-4 seconds, the new connection has a new IP address due to the network switch, which generates a new RTMPS connection. During this switch, the previous RTMPS connection is not disconnected cleanly: the encoder does not send IVS a disconnect message. As a result, IVS waits 30 seconds for the previous RTMPS connection to reconnect, which blocks the new RTMPS stream on the new network from connecting to IVS.

To enable faster switching between networks, we recommend that you use the IVS StopStream endpoint to close the previous stream session when the device switches networks. In this scenario, when the broadcast device connects to the new network, the broadcast device could call the StopStream endpoint to end the now-dormant stream. Following a successful StopStream call, the broadcast device could begin a new stream session on the new network without waiting for 30 seconds.
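A minimal sketch of that StopStream call follows. It assumes an IVS client object such as the one returned by boto3.client("ivs"), passed in by the caller, and it assumes that a channel with no live stream is reported with the error code ChannelNotBroadcasting. The function name is hypothetical, and where the call runs (on the device or in a backend acting for it) is left to the application.

```python
def end_dormant_session(ivs_client, channel_arn: str) -> bool:
    """Ask IVS to end the stale session so a new one can start right away.

    Returns True if a live session was stopped, and False if the channel
    was not broadcasting (nothing to stop). Other errors are re-raised.
    """
    try:
        ivs_client.stop_stream(channelArn=channel_arn)
        return True
    except Exception as exc:
        # botocore's ClientError carries the service error code in .response.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "ChannelNotBroadcasting":
            return False
        raise


# Example use after the device reconnects on the new network:
#   import boto3
#   end_dormant_session(boto3.client("ivs"), channel_arn)
#   ...then start the new RTMPS session.
```

Treating "not broadcasting" as a normal outcome keeps the reconnect path simple: in either case the device can go on to start its new session.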

How can I have multi-region redundancy with IVS?

Redundancy within IVS can be achieved in several ways; see Resilience in IVS Security.

IVS is separated into two networking planes: control and data.

The control plane is regional (based on AWS regions) and stores information about IVS resources (channels, stream keys, playback key pairs, and recording configurations).
The data plane is not restricted to an AWS region and is the network that carries data from ingest to egress. Even if a channel is created in the us-west-2 region (for example), the video that is streamed to that channel may not go through us-west-2.

Also see Global Solution, Regional Control. Consider these two scenarios:

If only one control-plane region (e.g., us-east-1) is being used — If a particular AWS control region experiences a degradation or outage, the IVS control plane may experience latency or errors when creating, reading, updating, or deleting any of the following: channels, stream keys, playback key pairs, or recording configurations. Trying to start a new stream during an outage may result in more latency or errors when initiating a stream session. Depending on severity of the degradation, it may be possible to continue broadcasting to a channel with an already ongoing stream.

If playback authorization is enabled, current viewers probably can continue their playback of ongoing streams, but new viewers may not be able to start viewing if there are issues with playback key-pair authorization. If playback authorization is not enabled, both current and new viewers should be able to view the ongoing stream.

The IVS Auto-Record to S3 feature also may be interrupted in the event of an outage.

The IVS control plane does not automatically fail over to another AWS region in the event of a regional outage.
If two control-plane regions (e.g., us-east-1 and us-west-2) are being used, and the second region is a failover if the primary region is unavailable — IVS does not natively support regional control-plane failover; thus, if a control-plane region experiences issues, new streams starting or calls to the control plane may experience issues. However, the data plane probably would not be impacted, so ongoing streams for the control plane region would continue without issue. Moving the control plane to a secondary (failover) region would need to be accomplished on the application side. You can write custom implementation logic to handle control-plane failover. We do not have official guidance on how to manage a regional channel failover.
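Since there is no official guidance, the following is only one possible shape for application-side failover logic, not a recommended design. It tries a control-plane operation in each region in order and returns the first success. The helper name is hypothetical; the commented usage assumes boto3 and its IVS create_channel call.

```python
from typing import Callable, Sequence, Tuple


def first_successful_region(
    regions: Sequence[str],
    operation: Callable[[str], dict],
) -> Tuple[str, dict]:
    """Run operation(region) for each region in order; return the first success."""
    errors = []
    for region in regions:
        try:
            return region, operation(region)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(f"{region}: {exc}")
    raise RuntimeError("all regions failed: " + "; ".join(errors))


# Example use (assumes boto3 is installed and credentials are configured):
#   import boto3
#   region, response = first_successful_region(
#       ["us-east-1", "us-west-2"],
#       lambda r: boto3.client("ivs", region_name=r).create_channel(name="my-channel"),
#   )
```

Because IVS resources are regional, a channel created in the failover region is a different channel with its own ARN and stream key, so the application must pass the new details to the broadcaster and to viewers.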

By separating the video data plane and the regional control plane, the IVS architecture adds resilience: ongoing live streams should have little to no interruption in the event of a regional control-plane failure. IVS maintains an SLA of 99.9% uptime and is committed to ensuring the stability of its infrastructure for its customers (see our SLA).

How do I troubleshoot an IVS Web Broadcast SDK session?

The IVS Web Broadcast SDK works slightly differently than a normal IVS RTMPS ingest session. The Web Broadcast SDK leverages the WebRTC protocol to stream to an IVS endpoint. Once the content enters the IVS endpoint, it is processed and remuxed/transcoded into the HLS output for viewing.

Due to the nature of the Web Broadcast SDK, note these tips for troubleshooting encoding behaviors:

Close any tabs/programs on the broadcasting device that are not required to be open during the broadcasting session. Extraneous tabs/programs can use computing resources (such as CPU, RAM, and networking), which can cause poor performance for the broadcasting application. For tabs/programs that cannot be closed, ensure they are not using unnecessary amounts of computing resources.
Ensure that the device’s upload speed exceeds 200 Kbps. (This is noted in one of the Known Issues for the Web Broadcast SDK.) To evaluate the upload speed, open the Task Manager of the broadcasting device and check the network bandwidth available while streaming. If the upload speed/bitrate is lower than expected or desired, evaluate other tabs/processes that may be consuming bandwidth. Also, look at other machines on the local network that may be consuming high amounts of bandwidth.
If there are random spikes in CPU usage, look at the Task Manager of the machine to understand which processes may be consuming CPU. A common cause of intermittent CPU usage is antivirus software, which runs periodic scans on the machine.
Try to stream via https://stream.ivs.rocks/ to help isolate environments and ensure that the application logic is not causing the undesirable behavior. This site is operated by IVS and is a solid testing environment to evaluate if any part of the integration with the Web Broadcast SDK is the root cause of the undesirable behavior.
Try using Google Chrome’s WebRTC-internals (see below).

How do I use Google Chrome’s WebRTC-internals metrics to evaluate an IVS Web Broadcast SDK session?

When streaming via the IVS Web Broadcast SDK, various behaviors can occur during encoding and sending of the broadcast. Follow these steps to troubleshoot or gather information about the session on the broadcasting device:

In Google Chrome, open the broadcasting webpage.
Open a new Chrome tab and go to chrome://webrtc-internals/ (copy this exactly).
In the original broadcasting-webpage tab, start the Web Broadcast SDK session and let the session run until the behavior is observed.
Once the behavior is observed, switch to the chrome://webrtc-internals/ tab (do not end the broadcast session), and ensure that the correct webpage is displayed:
Open the Create Dump expandable section at the very top of the screen.
Select Download the PeerConnection updates and stats data at the top of the screen (right below Create Dump), to download the .txt file from the relevant session.
Once downloaded, the file shows a historical view of the WebRTC connection. You can view this in various tools or send it to the AWS Support team for further analysis.
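For a first look at the downloaded file, a short script can list which statistics it contains. The sketch below assumes the dump is JSON with a top-level "PeerConnections" object, where each connection has a "stats" object and each entry's "values" field holds either a list or a JSON-encoded list of samples. This layout is not a documented, stable format and can change between Chrome versions, so treat the key names as assumptions.

```python
import json


def list_stat_series(dump_text: str) -> dict:
    """Map each peer connection ID to {series name: number of samples}."""
    dump = json.loads(dump_text)
    summary = {}
    for pc_id, pc in dump.get("PeerConnections", {}).items():
        series = {}
        for name, entry in pc.get("stats", {}).items():
            values = entry.get("values", [])
            if isinstance(values, str):  # assumed: samples stored as a JSON string
                values = json.loads(values)
            series[name] = len(values)
        summary[pc_id] = series
    return summary


# Example use (the file name is whatever Chrome saved):
#   with open("webrtc_internals_dump.txt", encoding="utf-8") as f:
#       for pc_id, series in list_stat_series(f.read()).items():
#           print(pc_id, sorted(series))
```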

Monitoring and Events

Questions in this section are about IVS monitoring, metrics, and events.

Topics:

How do I monitor stream-starvation events?
How do I use Amazon CloudWatch to monitor IVS service quotas?
How do I diagnose stream instability using IVS Stream Health?

How do I monitor stream-starvation events?

We recommend the following methods of monitoring for stream-starvation events:

Amazon EventBridge with Amazon IVS — When a stream-starvation event starts or ends, IVS produces an EventBridge stream health change event. Using Amazon EventBridge targets and rules, you can use these stream-starvation events to get alerts when stream starvation is occurring. For details on targets and rules, see the Amazon EventBridge User Guide.
Monitoring Amazon IVS Low-Latency Streaming — During a live-stream session, data is recorded and then available via IVS stream-health analytics. This includes information about encoder configuration, ingest metrics, and stream-session events. This is beneficial when monitoring an ongoing stream or retroactively evaluating a stream. You can use the IVS console or API to identify streams that have experienced starvation. Stream-session data is available for 60 days, even after a channel is deleted, so this can be useful for identifying past streams with starvation events.
Filtering Streams by Health — With the IVS console or the IVS ListStreams API endpoint, you can use the health filter to find stream sessions that are in a STARVING state. Also, the IVS CloudWatch metric for ConcurrentStreams includes a Health dimension that you can use to gather a total count of streams that are in a stream-starvation state. See Monitoring Amazon IVS Low-Latency Streaming.
You can use the IVS GetStream endpoint to analyze an individual stream.
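The health filter on ListStreams can be sketched as follows. The function assumes an IVS client such as boto3.client("ivs") passed in by the caller, and follows nextToken to collect every page. The function name is hypothetical.

```python
def list_starving_channels(ivs_client) -> list:
    """Return the channel ARNs of all live streams whose health is STARVING."""
    arns = []
    kwargs = {"filterBy": {"health": "STARVING"}}
    while True:
        page = ivs_client.list_streams(**kwargs)
        arns.extend(stream["channelArn"] for stream in page.get("streams", []))
        token = page.get("nextToken")
        if not token:
            return arns
        kwargs["nextToken"] = token  # request the next page of results


# Example use (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ivs = boto3.client("ivs")
#   for arn in list_starving_channels(ivs):
#       print(ivs.get_stream(channelArn=arn)["stream"])
```

Each returned ARN can then be passed to GetStream, as in the commented usage, to analyze that individual stream.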

Also see What is stream starvation?

How do I use Amazon CloudWatch to monitor IVS service quotas?

You can use Amazon CloudWatch to proactively monitor/manage IVS service quotas. See IVS Service Quotas. This documentation includes information on creating CloudWatch alarms for usage metrics.

We recommend that you set up a proper SNS topic to notify the correct individuals/groups when an alarm is triggered. If the alarm is triggered and the quota is adjustable, you should request a service-quota increase with a new value. See IVS Service Quotas for information on requesting an increase.
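As a rough sketch, the helper below builds the arguments for a CloudWatch PutMetricAlarm call that fires when usage reaches a chosen fraction of a quota and notifies an SNS topic. The namespace, metric name, and dimension values are assumptions about how IVS usage metrics are published; confirm them in IVS Service Quotas before relying on the alarm. The helper name and parameters are hypothetical.

```python
def quota_alarm_params(alarm_name, resource, quota, sns_topic_arn, fraction=0.8):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/Usage",        # assumed namespace for usage metrics
        "MetricName": "ResourceCount",   # assumed metric name
        "Dimensions": [                  # assumed dimension names and values
            {"Name": "Service", "Value": "IVS"},
            {"Name": "Resource", "Value": resource},
            {"Name": "Type", "Value": "Resource"},
            {"Name": "Class", "Value": "None"},
        ],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": quota * fraction,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Example use (assumes boto3; the resource name and quota value are illustrative):
#   import boto3
#   params = quota_alarm_params("ivs-streams-80pct", "ConcurrentStreams", 100, topic_arn)
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Alarming below the quota itself (here at 80 percent) leaves time to request an increase before the limit is reached.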

How do I diagnose stream instability using IVS Stream Health?

We recommend that you evaluate stream instability using the IVS Stream Health dashboard. Instructions are in Monitoring Amazon IVS Low-Latency Streaming.

The dashboard has time-series graphs for video bitrate, frame rate, and audio bitrate; examples are below. Also, you can click View in CloudWatch to view the data in Amazon CloudWatch.

Several scenarios are discussed below.

Low Internet Bandwidth or Internet Congestion

In this case, the stream is relatively unstable, even when bitrates are lowered. Either there is not enough bandwidth between the broadcaster and the ISP or between the ISP and IVS, or something is wrong in the network path to IVS. To resolve this, check that no other network process is using bandwidth, or contact the ISP for network diagnostics.