AWS CloudFormation
User Guide (Version )

AWS::DataPipeline::Pipeline

The AWS::DataPipeline::Pipeline resource specifies a data pipeline that you can use to automate the movement and transformation of data. In each pipeline, you define pipeline objects, such as activities, schedules, data nodes, and resources. For information about pipeline objects and components that you can use, see Pipeline Object Reference in the AWS Data Pipeline Developer Guide.

The AWS::DataPipeline::Pipeline resource adds tasks, schedules, and preconditions to the specified pipeline. You can use PutPipelineDefinition to populate a new pipeline.

PutPipelineDefinition also validates the configuration as it adds it to the pipeline. Changes to the pipeline are saved unless one of the following validation errors exist in the pipeline.

  • An object is missing a name or identifier field.

  • A string or reference field is empty.

  • The number of objects in the pipeline exceeds the allowed maximum number of objects.

  • The pipeline is in a FINISHED state.

Pipeline object definitions are passed to the PutPipelineDefinition action and returned by the GetPipelineDefinition action.

Syntax

To declare this entity in your AWS CloudFormation template, use the following syntax:

JSON

{ "Type" : "AWS::DataPipeline::Pipeline", "Properties" : { "Activate" : Boolean, "Description" : String, "Name" : String, "ParameterObjects" : [ ParameterObject, ... ], "ParameterValues" : [ ParameterValue, ... ], "PipelineObjects" : [ PipelineObject, ... ], "PipelineTags" : [ PipelineTag, ... ] } }

YAML

Type: AWS::DataPipeline::Pipeline Properties: Activate: Boolean Description: String Name: String ParameterObjects: - ParameterObject ParameterValues: - ParameterValue PipelineObjects: - PipelineObject PipelineTags: - PipelineTag

Properties

Activate

Indicates whether to validate and start the pipeline or stop an active pipeline. By default, the value is set to true.

Required: No

Type: Boolean

Update requires: No interruption

Description

A description of the pipeline.

Required: No

Type: String

Minimum: 0

Maximum: 1024

Pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\n\t]*

Update requires: Replacement

Name

The name of the pipeline.

Required: Yes

Type: String

Minimum: 1

Maximum: 1024

Pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\n\t]*

Update requires: Replacement

ParameterObjects

The parameter objects used with the pipeline.

Required: Yes

Type: List of ParameterObject

Update requires: No interruption

ParameterValues

The parameter values used with the pipeline.

Required: No

Type: List of ParameterValue

Update requires: No interruption

PipelineObjects

The objects that define the pipeline. These objects overwrite the existing pipeline definition. Not all objects, fields, and values can be updated. For information about restrictions, see Editing Your Pipeline in the AWS Data Pipeline Developer Guide.

Required: No

Type: List of PipelineObject

Update requires: No interruption

PipelineTags

A list of arbitrary tags (key-value pairs) to associate with the pipeline, which you can use to control permissions. For more information, see Controlling Access to Pipelines and Resources in the AWS Data Pipeline Developer Guide.

Required: No

Type: List of PipelineTag

Update requires: No interruption

Return Values

Ref

When you pass the logical ID of this resource to the intrinsic Ref function, Ref returns the pipeline ID.

For more information about using the Ref function, see Ref.

Examples

The following data pipeline backs up data from an Amazon DynamoDB table to an Amazon S3 bucket. The pipeline uses the HiveCopyActivity activity to copy the data, and runs it once a day. The roles for the pipeline and the pipeline resource are declared elsewhere in the same template.

JSON

"DynamoDBInputS3OutputHive": { "Type": "AWS::DataPipeline::Pipeline", "Properties": { "Name": "DynamoDBInputS3OutputHive", "Description": "Pipeline to backup DynamoDB data to S3", "Activate": "true", "ParameterObjects": [ { "Id": "myDDBReadThroughputRatio", "Attributes": [ { "Key": "description", "StringValue": "DynamoDB read throughput ratio" }, { "Key": "type", "StringValue": "Double" }, { "Key": "default", "StringValue": "0.2" } ] }, { "Id": "myOutputS3Loc", "Attributes": [ { "Key": "description", "StringValue": "S3 output bucket" }, { "Key": "type", "StringValue": "AWS::S3::ObjectKey" }, { "Key": "default", "StringValue": { "Fn::Join" : [ "", [ "s3://", { "Ref": "S3OutputLoc" } ] ] } } ] }, { "Id": "myDDBTableName", "Attributes": [ { "Key": "description", "StringValue": "DynamoDB Table Name " }, { "Key": "type", "StringValue": "String" } ] } ], "ParameterValues": [ { "Id": "myDDBTableName", "StringValue": { "Ref": "TableName" } } ], "PipelineObjects": [ { "Id": "S3BackupLocation", "Name": "Copy data to this S3 location", "Fields": [ { "Key": "type", "StringValue": "S3DataNode" }, { "Key": "dataFormat", "RefValue": "DDBExportFormat" }, { "Key": "directoryPath", "StringValue": "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}" } ] }, { "Id": "DDBSourceTable", "Name": "DDBSourceTable", "Fields": [ { "Key": "tableName", "StringValue": "#{myDDBTableName}" }, { "Key": "type", "StringValue": "DynamoDBDataNode" }, { "Key": "dataFormat", "RefValue": "DDBExportFormat" }, { "Key": "readThroughputPercent", "StringValue": "#{myDDBReadThroughputRatio}" } ] }, { "Id": "DDBExportFormat", "Name": "DDBExportFormat", "Fields": [ { "Key": "type", "StringValue": "DynamoDBExportDataFormat" } ] }, { "Id": "TableBackupActivity", "Name": "TableBackupActivity", "Fields": [ { "Key": "resizeClusterBeforeRunning", "StringValue": "true" }, { "Key": "type", "StringValue": "HiveCopyActivity" }, { "Key": "input", "RefValue": "DDBSourceTable" }, { "Key": "runsOn", "RefValue": "EmrClusterForBackup" }, { "Key": "output", "RefValue": "S3BackupLocation" } ] }, { "Id": "DefaultSchedule", "Name": "RunOnce", "Fields": [ { "Key": "occurrences", "StringValue": "1" }, { "Key": "startAt", "StringValue": "FIRST_ACTIVATION_DATE_TIME" }, { "Key": "type", "StringValue": "Schedule" }, { "Key": "period", "StringValue": "1 Day" } ] }, { "Id": "Default", "Name": "Default", "Fields": [ { "Key": "type", "StringValue": "Default" }, { "Key": "scheduleType", "StringValue": "cron" }, { "Key": "failureAndRerunMode", "StringValue": "CASCADE" }, { "Key": "role", "StringValue": "DataPipelineDefaultRole" }, { "Key": "resourceRole", "StringValue": "DataPipelineDefaultResourceRole" }, { "Key": "schedule", "RefValue": "DefaultSchedule" } ] }, { "Id": "EmrClusterForBackup", "Name": "EmrClusterForBackup", "Fields": [ { "Key": "terminateAfter", "StringValue": "2 Hours" }, { "Key": "amiVersion", "StringValue": "3.3.2" }, { "Key": "masterInstanceType", "StringValue": "m1.medium" }, { "Key": "coreInstanceType", "StringValue": "m1.medium" }, { "Key": "coreInstanceCount", "StringValue": "1" }, { "Key": "type", "StringValue": "EmrCluster" } ] } ] } }

YAML

DynamoDBInputS3OutputHive: Type: AWS::DataPipeline::Pipeline Properties: Name: DynamoDBInputS3OutputHive Description: "Pipeline to backup DynamoDB data to S3" Activate: true ParameterObjects: - Id: "myDDBReadThroughputRatio" Attributes: - Key: "description" StringValue: "DynamoDB read throughput ratio" - Key: "type" StringValue: "Double" - Key: "default" StringValue: "0.2" - Id: "myOutputS3Loc" Attributes: - Key: "description" StringValue: "S3 output bucket" - Key: "type" StringValue: "AWS::S3::ObjectKey" - Key: "default" StringValue: Fn::Join: - "" - - "s3://" - Ref: "S3OutputLoc" - Id: "myDDBTableName" Attributes: - Key: "description" StringValue: "DynamoDB Table Name " - Key: "type" StringValue: "String" ParameterValues: - Id: "myDDBTableName" StringValue: Ref: "TableName" PipelineObjects: - Id: "S3BackupLocation" Name: "Copy data to this S3 location" Fields: - Key: "type" StringValue: "S3DataNode" - Key: "dataFormat" RefValue: "DDBExportFormat" - Key: "directoryPath" StringValue: "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}" - Id: "DDBSourceTable" Name: "DDBSourceTable" Fields: - Key: "tableName" StringValue: "#{myDDBTableName}" - Key: "type" StringValue: "DynamoDBDataNode" - Key: "dataFormat" RefValue: "DDBExportFormat" - Key: "readThroughputPercent" StringValue: "#{myDDBReadThroughputRatio}" - Id: "DDBExportFormat" Name: "DDBExportFormat" Fields: - Key: "type" StringValue: "DynamoDBExportDataFormat" - Id: "TableBackupActivity" Name: "TableBackupActivity" Fields: - Key: "resizeClusterBeforeRunning" StringValue: "true" - Key: "type" StringValue: "HiveCopyActivity" - Key: "input" RefValue: "DDBSourceTable" - Key: "runsOn" RefValue: "EmrClusterForBackup" - Key: "output" RefValue: "S3BackupLocation" - Id: "DefaultSchedule" Name: "RunOnce" Fields: - Key: "occurrences" StringValue: "1" - Key: "startAt" StringValue: "FIRST_ACTIVATION_DATE_TIME" - Key: "type" StringValue: "Schedule" - Key: "period" StringValue: "1 Day" - Id: "Default" Name: "Default" Fields: - Key: "type" StringValue: "Default" - Key: "scheduleType" StringValue: "cron" - Key: "failureAndRerunMode" StringValue: "CASCADE" - Key: "role" StringValue: "DataPipelineDefaultRole" - Key: "resourceRole" StringValue: "DataPipelineDefaultResourceRole" - Key: "schedule" RefValue: "DefaultSchedule" - Id: "EmrClusterForBackup" Name: "EmrClusterForBackup" Fields: - Key: "terminateAfter" StringValue: "2 Hours" - Key: "amiVersion" StringValue: "3.3.2" - Key: "masterInstanceType" StringValue: "m1.medium" - Key: "coreInstanceType" StringValue: "m1.medium" - Key: "coreInstanceCount" StringValue: "1" - Key: "type" StringValue: "EmrCluster"

See Also