Appendix B: Ingest State Machine - Media2Cloud

The Media2Cloud solution ingests videos and images to extract media information and generate proxies using AWS Step Functions state machines and an AWS Lambda function. When a new video or image file is uploaded through the web interface, the solution sends an HTTP request to the Amazon API Gateway RESTful API endpoint to start the ingest process. A Lambda function invokes the ingest state machine. The state machine progress and status are sent to an AWS IoT topic that enables the web interface to refresh the results.
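
As a rough illustration of the Lambda step that invokes the ingest state machine, the sketch below starts an execution through the Step Functions StartExecution API. The function name `start_ingest`, the payload fields (`uuid`, `bucket`, `key`), and the state machine ARN are hypothetical and are not taken from the solution's code. The client is passed in as an argument so that a `boto3.client("stepfunctions")` instance can be supplied at runtime.

```python
import json
import uuid


def start_ingest(sfn_client, state_machine_arn, bucket, key):
    """Start one ingest execution for an uploaded object.

    sfn_client is expected to behave like boto3.client("stepfunctions").
    The payload layout below is an assumption made for illustration.
    """
    asset_id = str(uuid.uuid4())
    payload = {"uuid": asset_id, "bucket": bucket, "key": key}
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=asset_id,  # a fresh UUID keeps the execution name unique
        input=json.dumps(payload),
    )
    return asset_id, response["executionArn"]
```

The returned identifier could then be used by the caller to correlate status messages with the uploaded file.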

The ingest state machine is composed of the following processes.


Figure 7: Ingest workflow

  • Create record - Creates a record of the uploaded file in the ingest Amazon DynamoDB table.

  • Restore object - Checks the storage class of the uploaded file using the S3.HeadObject API. If the file is in either the GLACIER or DEEP_ARCHIVE storage class, the Lambda function starts the restore process using the S3.RestoreObject API.

  • Compute checksum - Incrementally computes the MD5 checksum of the file, one 20 GB chunk at a time, reading each chunk with an S3.GetObject byte-range request.
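
The Restore object step can be sketched as follows. This is a minimal illustration, not the solution's implementation: the function name `ensure_restored` and the default `days` and `tier` values are assumptions, and `s3_client` is expected to behave like `boto3.client("s3")`.

```python
ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def ensure_restored(s3_client, bucket, key, days=1, tier="Bulk"):
    """Return True if the object is readable now, False while a restore is pending."""
    head = s3_client.head_object(Bucket=bucket, Key=key)
    # HeadObject omits StorageClass for objects in the STANDARD class.
    storage_class = head.get("StorageClass", "STANDARD")
    if storage_class not in ARCHIVE_CLASSES:
        return True

    # For archived objects, the Restore field reports restore progress,
    # for example: ongoing-request="false", expiry-date="..."
    restore = head.get("Restore", "")
    if 'ongoing-request="false"' in restore:
        return True  # a restored copy is already available
    if 'ongoing-request="true"' in restore:
        return False  # a restore was requested earlier and is still running

    # No restore requested yet: start one (days and tier are illustrative).
    s3_client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={"Days": days, "GlacierJobParameters": {"Tier": tier}},
    )
    return False
```

A caller would repeat the check after a wait until the function returns True, which is the kind of loop a state machine can express with a wait state.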

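The Compute checksum step reads the object in byte ranges rather than all at once. The sketch below shows that idea in a single process; the function name `md5_by_ranges` and its parameters are hypothetical, `s3_client` is expected to behave like `boto3.client("s3")`, and how the solution carries the intermediate hash state between chunks is not covered here.

```python
import hashlib


def md5_by_ranges(s3_client, bucket, key, size, chunk_size):
    """Compute the MD5 of an S3 object by fetching consecutive byte ranges."""
    md5 = hashlib.md5()
    start = 0
    while start < size:
        end = min(start + chunk_size, size) - 1  # the Range end is inclusive
        response = s3_client.get_object(
            Bucket=bucket, Key=key, Range=f"bytes={start}-{end}"
        )
        # Stream the range in 1 MiB blocks to keep memory use small.
        for block in response["Body"].iter_chunks(1024 * 1024):
            md5.update(block)
        start = end + 1
    return md5.hexdigest()
```

The object size would typically come from the `ContentLength` field of an earlier HeadObject response.
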

Figure 8: Ingest workflow

  • Run mediainfo - Runs the MediaInfo tool to extract technical metadata from the video or audio file. The raw MediaInfo XML result is stored in the proxy bucket.

  • Start transcode - Creates a job template based on the media information extracted by MediaInfo. If the video file contains multiple audio tracks (an MXF file can contain 8 to 16 audio tracks), the Lambda function selects the best combination of audio tracks and runs AWS Elemental MediaConvert to create the proxy files and thumbnails. The proxy files and thumbnail images are stored in the proxy S3 bucket.

  • Check transcode status - Checks the transcode status by using the MediaConvert.GetJob API.

  • Run imageinfo - Runs exiftool to extract EXIF information from the image file, generates an image proxy file, and stores the proxy in the proxy bucket.
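
The Check transcode status step can be illustrated with a small wrapper around the MediaConvert GetJob API. This is a sketch under stated assumptions: the function name and the `COMPLETED` and `IN_PROGRESS` return values are hypothetical, and `mediaconvert_client` is expected to behave like `boto3.client("mediaconvert")`.

```python
def check_transcode_status(mediaconvert_client, job_id):
    """Map a MediaConvert job status onto a simple workflow outcome."""
    job = mediaconvert_client.get_job(Id=job_id)["Job"]
    status = job["Status"]
    if status == "COMPLETE":
        return "COMPLETED"
    if status in ("ERROR", "CANCELED"):
        message = job.get("ErrorMessage", "")
        raise RuntimeError(
            f"transcode job {job_id} ended with status {status}: {message}"
        )
    # SUBMITTED or PROGRESSING: the caller should wait and check again.
    return "IN_PROGRESS"
```

Raising on a failed or canceled job lets the surrounding workflow route the failure to its own error handling instead of polling indefinitely.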


Figure 9: Ingest workflow

  • Update record - Collects the results from the preceding states, such as the locations of the proxies and thumbnail images, and either the MediaInfo output (for videos) or the technical metadata embedded within videos or images, and writes them to the ingest DynamoDB table.

  • Index ingest results - Indexes the technical metadata in an Amazon Elasticsearch Service cluster.
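
The Update record step could be written along the following lines, using the DynamoDB UpdateItem API. The table name, the `uuid` partition key, and the flat string-valued `results` dictionary are assumptions made for illustration; the actual table schema is not described in this appendix. `dynamodb_client` is expected to behave like `boto3.client("dynamodb")`.

```python
def update_ingest_record(dynamodb_client, table_name, asset_id, results):
    """Set each entry of results as a string attribute on the asset's item."""
    if not results:
        return  # nothing to write; an empty SET expression would be invalid

    names, values, assignments = {}, {}, []
    for index, (field, value) in enumerate(sorted(results.items())):
        # Placeholders avoid clashes with DynamoDB reserved words.
        names[f"#f{index}"] = field
        values[f":v{index}"] = {"S": str(value)}
        assignments.append(f"#f{index} = :v{index}")

    dynamodb_client.update_item(
        TableName=table_name,
        Key={"uuid": {"S": asset_id}},  # hypothetical partition key
        UpdateExpression="SET " + ", ".join(assignments),
        ExpressionAttributeNames=names,
        ExpressionAttributeValues=values,
    )
```

For example, `results` might hold the S3 keys of the proxy file and the thumbnail produced by the earlier states.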