Appendix C: Analysis State Machine

The Media2Cloud solution includes an analysis state machine that is composed of three different AWS Step Functions sub-state machines and a set of AWS Lambda functions that start, monitor, and collect results from the sub-state machines. The analysis state machine consists of the following sub-state machines:

  • A video analysis sub-state machine that manages the video-based analysis process

  • An audio analysis sub-state machine that manages the audio-based analysis process

  • An image analysis sub-state machine that manages the image-based analysis process

When the ingest process is complete, the web interface displays a Create metadata button to start an analysis workflow. When initiated, the web interface sends an analysis request to Amazon API Gateway, where a Lambda function validates the request and starts the analysis state machine. Similar to the ingest state machine, the analysis state machine publishes progress and status to an AWS IoT topic. The web interface subscribes to this topic and updates its display as progress messages arrive.
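The validate-and-start step can be sketched as a minimal Lambda-style handler. The handler name, payload fields, and placeholder state machine ARN are illustrative assumptions, not the solution's actual code; the Step Functions client is injected so the logic can be exercised without AWS credentials (in production it would be `boto3.client("stepfunctions")`).

```python
import json

# Placeholder ARN; the deployed solution supplies the real one via configuration.
ANALYSIS_STATE_MACHINE_ARN = "arn:aws:states:us-east-1:111122223333:stateMachine:analysis"

def lambda_handler(event, context=None, sfn_client=None):
    """Validate an analysis request from API Gateway and start the
    analysis state machine (illustrative sketch)."""
    body = json.loads(event.get("body", "{}"))
    uuid = body.get("uuid")
    if not uuid:
        # Reject requests that do not identify an ingested asset.
        return {"statusCode": 400, "body": json.dumps({"error": "missing uuid"})}
    resp = sfn_client.start_execution(
        stateMachineArn=ANALYSIS_STATE_MACHINE_ARN,
        input=json.dumps({"uuid": uuid}),
    )
    return {"statusCode": 200, "body": json.dumps({"executionArn": resp["executionArn"]})}
```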



Figure 10: Analysis workflow

  • Start analysis - Parses the input request and starts the video and audio analysis sub-state machines for a video file, or the image analysis sub-state machine for an image file, by calling the AWS Step Functions StartExecution API.

  • Check analysis status - Periodically checks each sub-state machine's status by calling the AWS Step Functions DescribeExecution API.

  • Collect analysis results - Collects the output from each sub-state machine by calling the AWS Step Functions DescribeExecution API, then parses and joins the results.

  • Index analysis results - Indexes the analysis metadata collected from each sub-state machine into an Amazon Elasticsearch Service (Amazon ES) cluster.
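The check and collect steps above can be sketched around the DescribeExecution call. The function names, the overall-status rule, and the merged-output shape are assumptions for illustration; the Step Functions client is injected for testability (`boto3.client("stepfunctions")` in practice).

```python
import json

def check_analysis_status(execution_arns, sfn_client):
    """Poll each sub-state machine execution and derive an overall status."""
    statuses = {}
    for arn in execution_arns:
        resp = sfn_client.describe_execution(executionArn=arn)
        statuses[arn] = resp["status"]  # RUNNING | SUCCEEDED | FAILED | ...
    if any(s == "FAILED" for s in statuses.values()):
        overall = "FAILED"
    elif all(s == "SUCCEEDED" for s in statuses.values()):
        overall = "SUCCEEDED"
    else:
        overall = "RUNNING"
    return overall, statuses

def collect_analysis_results(execution_arns, sfn_client):
    """Join the JSON `output` payloads of the succeeded executions."""
    merged = {}
    for arn in execution_arns:
        resp = sfn_client.describe_execution(executionArn=arn)
        if resp["status"] == "SUCCEEDED":
            merged.update(json.loads(resp.get("output", "{}")))
    return merged
```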

Video Analysis Sub-state Machine

The video analysis sub-state machine is managed by the analysis state machine. It runs a series of Amazon Rekognition async processes to extract face, celebrity, label, moderation, and face match data from the video file. This sub-state machine consists of a number of parallel branches, where each branch runs and monitors a specific Amazon Rekognition async process. For example, one branch calls StartCelebrityRecognition to detect celebrities in the video.



Figure 11: Video analysis workflow

The following video analysis workflow shows the celebrity detection process.



Figure 12: Video analysis workflow for celebrity detection

  • Start celeb detection - Calls the Amazon Rekognition StartCelebrityRecognition API to start the async process.

  • Check celeb detection status - Periodically calls the Amazon Rekognition GetCelebrityRecognition API to check the status.

  • Collect celeb results - Downloads celebrity results using the GetCelebrityRecognition API and stores raw results to an Amazon Simple Storage Service (Amazon S3) bucket in the following filepath: <uuid>/<filename>/analysis/raw/<datetime>/rekog/celeb/. This Lambda function also temporarily stores a list of celebrity names in an Amazon DynamoDB table, analysis-queue-table, for further processing.

  • Create celeb tracks - Fetches the list of celebrity names from the analysis-queue-table, downloads the metadata from the Amazon S3 bucket using the Amazon S3 SelectObjectContent API, converts the timecode-based celebrity metadata into a WebVTT track, and updates the analysis-queue-table to remove the processed celebrity name. The processed WebVTT track and corresponding metadata are uploaded to the Amazon S3 bucket in <uuid>/<filename>/analysis/vtt/rekog/celeb/ and <uuid>/<filename>/analysis/metadata/rekog/celeb/, respectively.
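The timecode-to-WebVTT conversion in the Create celeb tracks step can be illustrated with a small helper. The function names and the one-cue-per-appearance layout are simplified assumptions, not the solution's actual code.

```python
def to_vtt_timestamp(ms):
    """Format a millisecond offset as a WebVTT timestamp (HH:MM:SS.mmm)."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def celebrity_cues_to_webvtt(name, appearances):
    """Convert one celebrity's [(begin_ms, end_ms), ...] appearances into a
    WebVTT track: header, then a numbered cue per appearance."""
    lines = ["WEBVTT", ""]
    for i, (begin, end) in enumerate(appearances, 1):
        lines.append(str(i))
        lines.append(f"{to_vtt_timestamp(begin)} --> {to_vtt_timestamp(end)}")
        lines.append(name)
        lines.append("")
    return "\n".join(lines)
```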

Audio Analysis Sub-state Machine

The audio analysis sub-state machine is managed by the analysis state machine. This sub-state machine runs Amazon Transcribe and Amazon Comprehend to extract transcription, entity, key phrase, sentiment, topic, and classification metadata. It first runs Amazon Transcribe to convert speech to text, then starts a number of parallel branches where each branch runs and monitors a specific Amazon Comprehend process.



Figure 13: Audio analysis workflow

The following audio analysis workflow shows the Lambda functions used to run the audio processes.



Figure 14: Audio analysis workflow

  • Start transcribe process - Calls the Amazon Transcribe StartTranscriptionJob async API to start the speech-to-text process and transitions the transcribe status state to Wait.

  • Check transcribe process status - Periodically calls the Amazon Transcribe GetTranscriptionJob API to check the transcription job status.

  • Download transcription - Downloads transcription results from the Amazon Transcribe service and stores the raw results to the Amazon S3 bucket in <uuid>/<filename>/analysis/raw/<datetime>/transcribe/. The state machine then starts the parallel branches.
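The start-and-poll pattern of the first two states can be sketched as follows. In the real workflow the wait happens inside Step Functions rather than in a loop; the job parameters are illustrative, and the Transcribe client is injected for testability (`boto3.client("transcribe")` in practice).

```python
import time

def run_transcription_job(job_name, media_uri, transcribe_client, poll_seconds=0):
    """Start a transcription job, then poll GetTranscriptionJob until it
    reaches a terminal status (sketch of the start/check states)."""
    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        MediaFormat="mp4",       # illustrative; the solution derives this
        LanguageCode="en-US",    # illustrative
    )
    while True:
        resp = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
        job = resp["TranscriptionJob"]
        if job["TranscriptionJobStatus"] in ("COMPLETED", "FAILED"):
            return job
        time.sleep(poll_seconds)
```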



Figure 15: Audio analysis workflow to create subtitle

  • Create subtitle state - Converts the timestamp-based transcription into a WebVTT subtitle track. The WebVTT track is uploaded to the Amazon S3 bucket in <uuid>/<filename>/analysis/vtt/transcribe/.
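The core of the Create subtitle state is grouping Amazon Transcribe's word-level items into subtitle cues. The grouping thresholds and the output tuple shape below are assumptions for illustration; the real implementation differs.

```python
def transcript_items_to_cues(items, max_gap=1.0, max_words=10):
    """Group word items ({'start_time', 'end_time', 'content'}, times in
    seconds as strings) into cues, breaking on long pauses or long lines.
    Returns [(start_sec, end_sec, text), ...]."""
    cues, current = [], []
    for item in items:
        if current and (
            float(item["start_time"]) - float(current[-1]["end_time"]) > max_gap
            or len(current) >= max_words
        ):
            cues.append(current)  # pause or line-length limit: start a new cue
            current = []
        current.append(item)
    if current:
        cues.append(current)
    return [
        (float(c[0]["start_time"]), float(c[-1]["end_time"]),
         " ".join(i["content"] for i in c))
        for c in cues
    ]
```

Each returned cue can then be serialized as a `start --> end` WebVTT block before upload.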



Figure 16: Audio analysis workflow for Amazon Comprehend entity detection

  • Start entity detection - Downloads the transcription from the Amazon S3 bucket and checks that it contains enough data to run natural language processing (NLP). If there is not enough data, the $.status flag is set to NO_DATA. Otherwise, the Lambda function calls the Amazon Comprehend BatchDetectEntities API and stores the raw metadata to the Amazon S3 bucket in <uuid>/<filename>/analysis/raw/<datetime>/comprehend/entity/.

  • Create entity tracks - Converts word-offset entity metadata results to timestamp-based metadata results. The result is stored to the Amazon S3 bucket in <uuid>/<filename>/analysis/vtt/comprehend/entity/.
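The offset-to-timestamp conversion can be illustrated by intersecting Amazon Comprehend's character offsets (BeginOffset/EndOffset, as returned by BatchDetectEntities) with word-level timestamps from the transcript. The `word_offsets` structure and output fields are assumptions for this sketch.

```python
def map_entities_to_timestamps(entities, word_offsets):
    """Map entity character offsets onto word timestamps.
    entities: [{'Text', 'Type', 'BeginOffset', 'EndOffset'}, ...]
    word_offsets: [(char_begin, char_end, start_sec, end_sec), ...],
    in transcript order."""
    timed = []
    for ent in entities:
        begin, end = ent["BeginOffset"], ent["EndOffset"]
        # Words whose character span overlaps the entity span.
        covering = [w for w in word_offsets if w[0] < end and w[1] > begin]
        if covering:
            timed.append({
                "text": ent["Text"],
                "type": ent["Type"],
                "begin": covering[0][2],
                "end": covering[-1][3],
            })
    return timed
```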

Image Analysis Sub-state Machine

The image analysis sub-state machine is managed by the analysis state machine. It runs a series of synchronous Amazon Rekognition image operations to extract faces, celebrities, labels, moderation labels, face match data, and text from the image file.



Figure 17: Image analysis workflow

  • Start image analysis - Runs the Amazon Rekognition RecognizeCelebrities, DetectFaces, SearchFacesByImage, DetectLabels, DetectModerationLabels, and DetectText APIs in parallel, then collects and stores the metadata to the Amazon S3 bucket in <uuid>/<filename>/analysis/raw/<datetime>/rekog-image/<type>/.
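The parallel fan-out can be sketched with a thread pool over the synchronous Rekognition image APIs. SearchFacesByImage is omitted here because it additionally requires a face collection ID; the result-type keys are illustrative, and the Rekognition client is injected for testability (`boto3.client("rekognition")` in practice).

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_image(bucket, key, rekognition_client):
    """Run the synchronous Amazon Rekognition image APIs in parallel and
    collect the raw responses keyed by analysis type."""
    image = {"S3Object": {"Bucket": bucket, "Name": key}}
    calls = {
        "celeb": lambda: rekognition_client.recognize_celebrities(Image=image),
        "face": lambda: rekognition_client.detect_faces(Image=image),
        "label": lambda: rekognition_client.detect_labels(Image=image),
        "moderation": lambda: rekognition_client.detect_moderation_labels(Image=image),
        "text": lambda: rekognition_client.detect_text(Image=image),
    }
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = {name: pool.submit(fn) for name, fn in calls.items()}
        return {name: f.result() for name, f in futures.items()}
```

Each entry of the returned dict would then be written under its `<type>` prefix in the S3 path above.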