Using S3DataConverter to Manage Large Workflow Data

By default, the AWS Flow Framework for Ruby uses YAMLDataConverter to serialize the Ruby objects that you pass to your workflows and activities as input data, and the objects that they return.

The Amazon SWF service-defined limit for input or output data from activities and workflows is 32,768 (32K) characters. Any data structure that you want to pass as data directly must fit within this limit.

If you want to pass more than 32K characters of data to a workflow or activity, use S3DataConverter. Data larger than 32K characters is stored in an Amazon S3 bucket, and S3DataConverter passes a hash containing the Amazon S3 path to that data to the workflow or activity instead of passing the data itself.

S3DataConverter locally caches data that it serializes or deserializes and uses the cached data if it exists; it only downloads data from S3 when necessary.
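The behavior described above can be pictured with a minimal sketch. This is not the framework's implementation: the class name SketchConverter, the pointer key sketch_s3_key, and the in-memory Hash standing in for the Amazon S3 bucket are all hypothetical, chosen only to show the size threshold, the pointer hash, and the local cache.

```ruby
require 'yaml'
require 'securerandom'

# Illustration only; every name here is hypothetical and not part of
# the AWS Flow Framework for Ruby.
class SketchConverter
  MAX_DATA_LENGTH = 32_768        # SWF limit on input/output characters
  POINTER_KEY = 'sketch_s3_key'   # assumed key name for the pointer hash

  def initialize(store = {})
    @store = store  # a plain Hash standing in for the S3 bucket
    @cache = {}     # local cache of payloads already seen
  end

  # Serialize an object. Small payloads are returned as-is; large ones
  # are written to the store, and a small pointer hash is returned.
  def dump(object)
    payload = YAML.dump(object)
    return payload if payload.length <= MAX_DATA_LENGTH

    key = SecureRandom.uuid
    @store[key] = payload
    @cache[key] = payload
    YAML.dump({ POINTER_KEY => key })
  end

  # Deserialize. If the data is a pointer hash, resolve it from the
  # cache first and read from the store only on a cache miss.
  # (A real converter would need an unambiguous pointer format.)
  def load(serialized)
    value = YAML.load(serialized)
    return value unless value.is_a?(Hash) && value.keys == [POINTER_KEY]

    key = value[POINTER_KEY]
    payload = (@cache[key] ||= @store.fetch(key))
    YAML.load(payload)
  end
end

converter = SketchConverter.new
pointer = converter.dump('x' * 40_000)   # larger than the 32K limit
puts pointer.length < SketchConverter::MAX_DATA_LENGTH
```

In this sketch, only the short pointer hash would travel through Amazon SWF; the 40,000-character payload stays in the store until load resolves it.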

To use S3DataConverter

  1. Activate S3DataConverter by setting the AWS_SWF_BUCKET_NAME environment variable to an Amazon S3 bucket name.
  2. (Optional) Set a bucket lifecycle on the Amazon S3 bucket used to store your SWF data.

Activate S3DataConverter

If you set the AWS_SWF_BUCKET_NAME environment variable, S3DataConverter is used automatically instead of YAMLDataConverter. For example, on Linux, macOS, or Unix, use:

export AWS_SWF_BUCKET_NAME="bucketname"

On Windows, use set instead of export.
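Because activation depends entirely on the environment, it can help to check the variable from Ruby before starting your workers. The following is a minimal sketch; the helper name swf_bucket_name is hypothetical and not part of the framework.

```ruby
# Hypothetical helper, not part of the AWS Flow Framework for Ruby.
# Returns the configured bucket name, or nil if the variable is unset
# or blank.
def swf_bucket_name(env = ENV)
  name = env['AWS_SWF_BUCKET_NAME'].to_s.strip
  name.empty? ? nil : name
end

if swf_bucket_name.nil?
  # Without the variable, the default YAMLDataConverter applies and
  # data is limited to 32K characters.
  warn 'AWS_SWF_BUCKET_NAME is not set; S3DataConverter will not be used.'
else
  puts "S3DataConverter bucket: #{swf_bucket_name}"
end
```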

Note

If you deploy your Ruby applications using AWS Elastic Beanstalk, see Customizing and Configuring a Ruby Environment in the Elastic Beanstalk Developer Guide for information about how to set environment variables on your instances.

Set a bucket lifecycle

To prevent data loss, the AWS Flow Framework for Ruby doesn't delete files from Amazon S3. We recommend that you use Object Lifecycle Management in Amazon S3 to delete objects automatically after a set period of time.

For example, here is an Amazon S3 bucket lifecycle policy that deletes objects automatically after three days:

{
    "Rules": [
        {
            "Status": "Enabled",
            "Prefix": "",
            "Expiration": {
                "Days": 3
            },
            "ID": "swf-bucket-rule"
        }
    ]
}

Set the expiration period in your bucket lifecycle so that it is longer than the run time of your workflows; otherwise, data might be deleted while a workflow still needs it. For more information about setting bucket lifecycle configurations, see Specifying a Lifecycle Configuration in the Amazon S3 Developer Guide.
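One way to choose the expiration period is to derive it from your longest expected workflow run time. The sketch below generates a lifecycle policy in the same shape as the example above. The helper names and the one-day safety margin are assumptions for illustration, not recommendations from the framework.

```ruby
require 'json'

SECONDS_PER_DAY = 86_400

# Hypothetical helper: round the longest workflow run time up to whole
# days, then add a safety margin (one day here, an assumed value).
def expiration_days(max_workflow_seconds, margin_days: 1)
  (max_workflow_seconds.to_f / SECONDS_PER_DAY).ceil + margin_days
end

# Build a lifecycle policy in the same shape as the JSON example above.
def lifecycle_policy(max_workflow_seconds, rule_id: 'swf-bucket-rule')
  {
    'Rules' => [
      {
        'Status' => 'Enabled',
        'Prefix' => '',
        'Expiration' => { 'Days' => expiration_days(max_workflow_seconds) },
        'ID' => rule_id
      }
    ]
  }
end

# A workflow that can run for up to two days.
puts JSON.pretty_generate(lifecycle_policy(2 * SECONDS_PER_DAY))
```

With these assumptions, a workflow that can run for up to two days yields a three-day expiration, which matches the example policy.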