AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal.
Creating a pipeline using parametrized templates
You can use a parametrized template to customize a pipeline definition. This enables you to create a common pipeline definition but provide different parameters when you add the pipeline definition to a new pipeline.
Add myVariables to the pipeline definition
When you create the pipeline definition file, specify variables using the following syntax: #{myVariable}. The variable name must be prefixed by my. For example, the following pipeline definition file, pipeline-definition.json, includes the following variables: myShellCmd, myS3InputLoc, and myS3OutputLoc.
Note
A pipeline definition has an upper limit of 50 parameters.
{ "objects": [ { "id": "ShellCommandActivityObj", "input": { "ref": "S3InputLocation" }, "name": "ShellCommandActivityObj", "runsOn": { "ref": "EC2ResourceObj" }, "command": "#{myShellCmd}", "output": { "ref": "S3OutputLocation" }, "type": "ShellCommandActivity", "stage": "true" }, { "id": "Default", "scheduleType": "CRON", "failureAndRerunMode": "CASCADE", "schedule": { "ref": "Schedule_15mins" }, "name": "Default", "role": "DataPipelineDefaultRole", "resourceRole": "DataPipelineDefaultResourceRole" }, { "id": "S3InputLocation", "name": "S3InputLocation", "directoryPath": "#{myS3InputLoc}", "type": "S3DataNode" }, { "id": "S3OutputLocation", "name": "S3OutputLocation", "directoryPath": "#{myS3OutputLoc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}", "type": "S3DataNode" }, { "id": "Schedule_15mins", "occurrences": "4", "name": "Every 15 minutes", "startAt": "FIRST_ACTIVATION_DATE_TIME", "type": "Schedule", "period": "15 Minutes" }, { "terminateAfter": "20 Minutes", "id": "EC2ResourceObj", "name": "EC2ResourceObj", "instanceType":"t1.micro", "type": "Ec2Resource" } ] }
Define parameter objects
You can create a separate file with parameter objects that define the variables in your pipeline definition. For example, the following JSON file, parameters.json, contains parameter objects for the myShellCmd, myS3InputLoc, and myS3OutputLoc variables from the example pipeline definition above.
{ "parameters": [ { "id": "myShellCmd", "description": "Shell command to run", "type": "String", "default": "grep -rc \"GET\" ${INPUT1_STAGING_DIR}/* > ${OUTPUT1_STAGING_DIR}/output.txt" }, { "id": "myS3InputLoc", "description": "S3 input location", "type": "AWS::S3::ObjectKey", "default": "s3://us-east-1.elasticmapreduce.samples/pig-apache-logs/data" }, { "id": "myS3OutputLoc", "description": "S3 output location", "type": "AWS::S3::ObjectKey" } ] }
Note
You could add these objects directly to the pipeline definition file instead of using a separate file.
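If you take the single-file approach, a top-level parameters section sits alongside the objects section in the same definition file. The following is a minimal sketch of that combined layout (trimmed to one activity and one parameter from the example above, so it is not a complete, runnable pipeline):

```json
{
  "objects": [
    {
      "id": "ShellCommandActivityObj",
      "name": "ShellCommandActivityObj",
      "command": "#{myShellCmd}",
      "type": "ShellCommandActivity"
    }
  ],
  "parameters": [
    {
      "id": "myShellCmd",
      "description": "Shell command to run",
      "type": "String"
    }
  ]
}
```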
The following table describes the attributes for parameter objects.
| Attribute | Type | Description |
|---|---|---|
| id | String | The unique identifier of the parameter. To mask the value while it is typed or displayed, add an asterisk ('*') as a prefix. For example, *myVariable. Note that this also encrypts the value before it is stored by AWS Data Pipeline. |
| description | String | A description of the parameter. |
| type | String, Integer, Double, or AWS::S3::ObjectKey | The parameter type that defines the allowed range of input values and validation rules. The default is String. |
| optional | Boolean | Indicates whether the parameter is optional or required. The default is false. |
| allowedValues | List of Strings | Enumerates all permitted values for the parameter. |
| default | String | The default value for the parameter. If you specify a value for this parameter using parameter values, it overrides the default value. |
| isArray | Boolean | Indicates whether the parameter is an array. |
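To make these attributes concrete, the following sketch shows a masked parameter and an optional enumerated parameter. The parameter names and values here are hypothetical (not part of the example pipeline), and the string-valued "true" follows the quoting style used elsewhere in this example; check your definition against the service's validation rules before relying on this exact shape:

```json
{
  "parameters": [
    {
      "id": "*myDbPassword",
      "description": "Database password (masked while typed, encrypted when stored)",
      "type": "String"
    },
    {
      "id": "myLogLevel",
      "description": "Verbosity of the shell command",
      "type": "String",
      "optional": "true",
      "allowedValues": ["INFO", "DEBUG"],
      "default": "INFO"
    }
  ]
}
```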
Define parameter values
You can create a separate file to define your variables using parameter values. For example, the following JSON file, values.json, contains the value for the myS3OutputLoc variable from the example pipeline definition above.
{ "values": { "myS3OutputLoc": "myOutputLocation" } }
Submitting the pipeline definition
When you submit your pipeline definition, you can specify parameters, parameter objects, and parameter values. For example, you can use the put-pipeline-definition AWS CLI command as follows:
```sh
$ aws datapipeline put-pipeline-definition --pipeline-id id --pipeline-definition file://pipeline-definition.json \
    --parameter-objects file://parameters.json --parameter-values-uri file://values.json
```
Note
A pipeline definition has an upper limit of 50 parameters. The size of the file for parameter-values-uri has an upper limit of 15 KB.
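Putting the steps together, a typical CLI flow creates the pipeline, submits the parametrized definition, and then activates it. This is a sketch: the pipeline name, unique ID token, and pipeline ID below are placeholders, and it assumes the AWS CLI is configured with appropriate credentials.

```sh
# Create an empty pipeline; the command returns a pipeline ID (df-... below is a placeholder)
aws datapipeline create-pipeline --name my-parametrized-pipeline --unique-id my-parametrized-pipeline-token

# Submit the definition along with the parameter objects and parameter values
aws datapipeline put-pipeline-definition --pipeline-id df-0123456789ABCDEFGHIJ \
    --pipeline-definition file://pipeline-definition.json \
    --parameter-objects file://parameters.json --parameter-values-uri file://values.json

# Activate the pipeline so the schedule starts running
aws datapipeline activate-pipeline --pipeline-id df-0123456789ABCDEFGHIJ
```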