Appendix C: Analysis State Machine
The Media2Cloud solution includes an analysis state machine that is composed of three different AWS Step Functions sub-state machines and a set of AWS Lambda functions that start, monitor, and collect results from the sub-state machines. The analysis state machine consists of the following sub-state machines:
- A video analysis sub-state machine that manages the video-based analysis process
- An audio analysis sub-state machine that manages the audio-based analysis process
- An image analysis sub-state machine that manages the image-based analysis process
When the ingest process is completed, the web interface displays a Create metadata button to start an analysis workflow. When initiated, the web interface sends an analysis request to Amazon API Gateway, where a Lambda function validates the request and starts the analysis state machine. Similar to the ingest state machine, the analysis state machine publishes progress and status to an AWS IoT topic; the web interface subscribes to this topic and displays the progress updates.

Figure 10: Analysis workflow
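Publishing status to the IoT topic from a workflow Lambda function might look like the following sketch. The topic layout and message shape are illustrative assumptions, not the solution's actual conventions.

```python
# Hypothetical sketch of publishing workflow progress to an AWS IoT topic
# so a subscribed web interface can display status updates.
import json

def build_status_message(uuid, state, progress):
    """Build the JSON payload the web interface would receive."""
    return {"uuid": uuid, "state": state, "progress": progress}

def publish_status(topic, uuid, state, progress):
    # Imported here so build_status_message works without the AWS SDK.
    import boto3
    iot = boto3.client("iot-data")
    iot.publish(
        topic=topic,                 # e.g. "media2cloud/status/<uuid>" (assumed)
        qos=0,
        payload=json.dumps(build_status_message(uuid, state, progress)),
    )
```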
- Start analysis - Parses the input request and starts the video and audio analysis state machines for a video file and/or starts the image analysis state machine for an image file by calling the Step Functions StartExecution API.
- Check analysis status - Periodically checks each sub-state machine's status by calling the Step Functions DescribeExecution API.
- Collect analysis results - Collects the output from each sub-state machine by calling the Step Functions DescribeExecution API, then parses and joins the results.
- Index analysis results - Indexes the analysis metadata collected from each sub-state machine into an Amazon Elasticsearch Service index.
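The start/check pattern above can be sketched with the AWS SDK for Python (Boto3). The state machine ARN and input payload are hypothetical placeholders, not the solution's actual resource names.

```python
# Illustrative sketch of the start/check pattern used by the analysis
# state machine's Lambda functions.
import json

def all_branches_done(statuses):
    """True once every sub-state machine has reached a terminal state."""
    terminal = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}
    return all(s in terminal for s in statuses)

def start_analysis(state_machine_arn, uuid, bucket, key):
    # Imported here so the pure helper above works without the AWS SDK.
    import boto3
    sfn = boto3.client("stepfunctions")
    resp = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps({"uuid": uuid, "bucket": bucket, "key": key}),
    )
    return resp["executionArn"]

def check_status(execution_arns):
    import boto3
    sfn = boto3.client("stepfunctions")
    return [sfn.describe_execution(executionArn=arn)["status"]
            for arn in execution_arns]
```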
Video Analysis Sub-state Machine
The video analysis sub-state machine is managed by the analysis state machine. It
runs a series of Amazon Rekognition async processes to extract faces, celebrities,
labels, moderation, and face match data from the video file. This sub-state machine
consists of a number of parallel branches where each branch runs and monitors a specific
Amazon Rekognition async process. For example, StartCelebrityRecognition
detects celebrities from the video.

Figure 11: Video analysis workflow
The following video analysis workflow shows the celebrity detection process.

Figure 12: Video analysis workflow for celebrity detection
- Start celeb detection - Calls the Amazon Rekognition StartCelebrityRecognition API to start the async process.
- Check celeb detection status - Periodically calls the Amazon Rekognition GetCelebrityRecognition API to check the status.
- Collect celeb results - Downloads celebrity results using the GetCelebrityRecognition API and stores the raw results to an Amazon Simple Storage Service (Amazon S3) bucket in the following filepath: <uuid>/<filename>/analysis/raw/<datetime>/rekog/celeb/. This Lambda function also temporarily stores a list of celebrity names in an Amazon DynamoDB table, analysis-queue-table, for further processing.
- Create celeb tracks - Fetches the list of celebrity names from the analysis-queue-table, downloads the metadata from the Amazon S3 bucket using the Amazon S3 SelectObjectContent API, converts timecode-based celebrity metadata into a WebVTT track, and updates the analysis-queue-table to remove the processed celebrity name. The processed WebVTT track and corresponding metadata are uploaded to the Amazon S3 bucket in <uuid>/<filename>/analysis/vtt/rekog/celeb/ and <uuid>/<filename>/analysis/metadata/rekog/celeb/.
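Filtering one celebrity's records out of the raw results with the S3 SelectObjectContent API could be sketched as follows. The JSON path assumes the GetCelebrityRecognition response shape; function names are illustrative.

```python
# Hypothetical sketch of querying raw celebrity metadata with S3 Select.
def build_select_expression(name):
    """Build the S3 Select SQL for a single celebrity name."""
    escaped = name.replace("'", "''")  # S3 Select has no bind parameters
    return ("SELECT s.Timestamp FROM S3Object[*].Celebrities[*] s "
            f"WHERE s.Celebrity.Name = '{escaped}'")

def select_celebrity_timestamps(bucket, key, name):
    import boto3  # imported here so the helper works without the AWS SDK
    s3 = boto3.client("s3")
    resp = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=build_select_expression(name),
        InputSerialization={"JSON": {"Type": "DOCUMENT"}},
        OutputSerialization={"JSON": {}},
    )
    # The response payload is an event stream; concatenate the Records events.
    records = b"".join(event["Records"]["Payload"]
                       for event in resp["Payload"] if "Records" in event)
    return records.decode("utf-8")
```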
Audio Analysis Sub-state Machine
The audio analysis sub-state machine is managed by the analysis state machine. This sub-state machine runs Amazon Transcribe and Amazon Comprehend to extract transcription, entities, key phrases, sentiments, topic, and classification metadata. This sub-state machine first runs Amazon Transcribe to convert speech to text and starts a number of branches in parallel where each branch runs and monitors a specific Amazon Comprehend process.

Figure 13: Audio analysis workflow
The following audio analysis workflow shows the Lambda functions used to run the audio processes.

Figure 14: Audio analysis workflow
- Start transcribe process - Calls the Amazon Transcribe StartTranscriptionJob async API to start the speech-to-text process and transitions the transcribe status state to Wait.
- Check transcribe process status - Calls the Amazon Transcribe GetTranscriptionJob API to check the transcription job status.
- Download transcription - Downloads the transcription results from the Amazon Transcribe service and stores the raw results to the Amazon S3 bucket in <uuid>/<filename>/analysis/raw/<datetime>/transcribe/. The state machine then starts the parallel branches.
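The start/wait/check loop for transcription could be sketched as follows. The job name, media URI, and format are placeholders for illustration.

```python
# Illustrative sketch of the transcription start/check pattern.
def is_terminal(status):
    """Transcription job statuses that end the Wait/Check loop."""
    return status in {"COMPLETED", "FAILED"}

def start_transcription(job_name, media_uri, language_code="en-US"):
    import boto3  # imported here so is_terminal works without the AWS SDK
    transcribe = boto3.client("transcribe")
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        MediaFormat="mp4",                 # assumed input container
        LanguageCode=language_code,
    )

def check_transcription(job_name):
    import boto3
    transcribe = boto3.client("transcribe")
    job = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    return job["TranscriptionJob"]["TranscriptionJobStatus"]
```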

Figure 15: Audio analysis workflow to create subtitle
- Create subtitle state - Converts the timestamp-based transcription into a WebVTT subtitle track. The WebVTT track is uploaded to the Amazon S3 bucket in <uuid>/<filename>/analysis/vtt/transcribe/.

Figure 16: Audio analysis workflow for Amazon Comprehend entity detection
- Start entity detection - Downloads the transcription from the Amazon S3 bucket and checks that the transcription contains enough data to run natural language processing (NLP). If there is not enough data, the $.status flag is set to NO_DATA. If there is enough data, the Lambda function calls the Amazon Comprehend BatchDetectEntities API and stores the raw metadata to the Amazon S3 bucket in <uuid>/<filename>/analysis/raw/<datetime>/comprehend/entity/.
- Create entity tracks - Converts word-offset entity metadata results to timestamp-based metadata results. The result is stored to the Amazon S3 bucket in <uuid>/<filename>/analysis/vtt/comprehend/entity/.
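A sketch of the NO_DATA guard followed by the batch entity call is shown below. The minimum-length threshold and the chunking size are assumptions; BatchDetectEntities itself accepts up to 25 documents per call.

```python
# Illustrative sketch of the entity-detection Lambda's guard and API call.
MIN_CHARS = 50  # assumed minimum transcript length worth analyzing

def has_enough_text(transcript, min_chars=MIN_CHARS):
    """Mirror the $.status = NO_DATA guard before calling Comprehend."""
    return len(transcript.strip()) >= min_chars

def detect_entities(transcript, language_code="en"):
    if not has_enough_text(transcript):
        return {"status": "NO_DATA"}
    import boto3  # imported here so the guard works without the AWS SDK
    comprehend = boto3.client("comprehend")
    # Split the transcript into chunks that fit Comprehend's document limits.
    chunks = [transcript[i:i + 4500] for i in range(0, len(transcript), 4500)]
    resp = comprehend.batch_detect_entities(
        TextList=chunks[:25], LanguageCode=language_code)
    return {"status": "OK", "entities": resp["ResultList"]}
```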
Image Analysis Sub-state Machine
The image analysis sub-state machine is managed by the analysis state machine. It runs a series of Amazon Rekognition image (synchronous) processes to extract face, celebrity, label, moderation, face match, and text data from the image file.

Figure 17: Image analysis workflow
- Start image analysis - Runs the Amazon Rekognition RecognizeCelebrities, DetectFaces, SearchFacesByImage, DetectLabels, DetectModerationLabels, and DetectText APIs in parallel, and collects and stores the metadata to the Amazon S3 bucket in <uuid>/<filename>/analysis/raw/<datetime>/rekog-image/<type>/.
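The per-type output prefix and the synchronous Rekognition calls could be sketched as follows. The helper names are illustrative; SearchFacesByImage is omitted because it additionally requires a face collection ID.

```python
# Illustrative sketch of the image analysis Lambda's work.
def raw_prefix(uuid, filename, datetime_str, rekog_type):
    """Build the S3 prefix where raw image-analysis results are stored."""
    return f"{uuid}/{filename}/analysis/raw/{datetime_str}/rekog-image/{rekog_type}/"

def analyze_image(bucket, key):
    import boto3  # imported here so raw_prefix works without the AWS SDK
    rekognition = boto3.client("rekognition")
    image = {"S3Object": {"Bucket": bucket, "Name": key}}
    return {
        "celeb": rekognition.recognize_celebrities(Image=image),
        "face": rekognition.detect_faces(Image=image, Attributes=["ALL"]),
        "label": rekognition.detect_labels(Image=image),
        "moderation": rekognition.detect_moderation_labels(Image=image),
        "text": rekognition.detect_text(Image=image),
    }
```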