AWS Data Pipeline
Developer Guide (API Version 2012-10-29)

Document History

This documentation is associated with the 2012-10-29 version of AWS Data Pipeline.

Latest documentation update: 9 November 2018.

Change Description Release Date

Updated the lists of supported Amazon EC2 and Amazon EMR instances.

Updated the list of IDs of the HVM (Hardware Virtual Machine) AMIs used for the instances.

Updated the lists of supported Amazon EC2 and Amazon EMR instances. For more information, see Supported Instance Types for Pipeline Work Activities.

Updated the list of IDs of the HVM (Hardware Virtual Machine) AMIs used for the instances. For more information, see Syntax and search for imageId.

9 November 2018
Added configuration for attaching Amazon EBS volumes to cluster nodes, and for launching an Amazon EMR cluster into a private subnet.

Added configuration options to the EmrCluster object. You can use these options in pipelines that use Amazon EMR clusters.

Use the coreEbsConfiguration, masterEbsConfiguration, and taskEbsConfiguration fields to configure the attachment of Amazon EBS volumes to core, master, and task nodes in the Amazon EMR cluster. For more information, see Attach EBS volumes to cluster nodes.

Use the emrManagedMasterSecurityGroupId, emrManagedSlaveSecurityGroupId, and serviceAccessSecurityGroupId fields to configure an Amazon EMR cluster in a private subnet. For more information, see Configure an Amazon EMR cluster in a private subnet.

For more information about EmrCluster syntax, see EmrCluster.

19 April 2018
Added the list of supported Amazon EC2 and Amazon EMR instances.

Added the list of instances that AWS Data Pipeline creates by default if you do not specify an instance type in the pipeline definition. Added a list of supported Amazon EC2 and Amazon EMR instances. For more information, see Supported Instance Types for Pipeline Work Activities.

22 March 2018
Added support for on-demand pipelines.
  • Added support for on-demand pipelines, which let you rerun a pipeline by activating it again. For more information, see On-Demand.

22 February 2016
Additional support for RDS databases
  • Added rdsInstanceId, region, and jdbcDriverJarUri to RdsDatabase.

  • Updated database in SqlActivity to also support RdsDatabase.

17 August 2015
Additional JDBC support
7 July 2015
HadoopActivity, Availability Zone, and Spot Support
  • Added support for submitting parallel work to Hadoop clusters. For more information, see HadoopActivity.

  • Added the ability to request Spot Instances with Ec2Resource and EmrCluster.

  • Added the ability to launch EmrCluster resources in a specified Availability Zone.

1 June 2015
Deactivating pipelines

Added support for deactivating active pipelines. For more information, see Deactivating Your Pipeline.

7 April 2015
Updated templates and console

Added new templates as reflected in the console. Updated the Getting Started chapter to use the Getting Started with ShellCommandActivity template. For more information, see Creating Pipelines Using Console Templates.

25 November 2014
VPC support

Added support for launching resources into a virtual private cloud (VPC). For more information, see Launching Resources for Your Pipeline into a VPC.

12 March 2014
Region support

Added support for multiple service regions. In addition to us-east-1, AWS Data Pipeline is supported in eu-west-1, ap-northeast-1, ap-southeast-2, and us-west-2.

20 February 2014
Amazon Redshift support

Added support for Amazon Redshift in AWS Data Pipeline, including a new console template (Copy to Redshift) and a tutorial to demonstrate the template. For more information, see Copy Data to Amazon Redshift Using AWS Data Pipeline, RedshiftDataNode, RedshiftDatabase, and RedshiftCopyActivity.

6 November 2013
PigActivity

Added PigActivity, which provides native support for Pig. For more information, see PigActivity.

15 October 2013
New console template, activity, and data format

Added the new CrossRegion DynamoDB Copy console template, including the new HiveCopyActivity and DynamoDBExportDataFormat.

21 August 2013
Cascading failures and reruns

Added information about AWS Data Pipeline cascading failure and rerun behavior. For more information, see Cascading Failures and Reruns.

8 August 2013
Troubleshooting video

Added the AWS Data Pipeline Basic Troubleshooting video. For more information, see Troubleshooting.

17 July 2013
Editing active pipelines

Added more information about editing active pipelines and rerunning pipeline components. For more information, see Editing Your Pipeline.

17 July 2013
Use resources in different regions

Added more information about using resources in different regions. For more information, see Using a Pipeline with Resources in Multiple Regions.

17 June 2013
WAITING_ON_DEPENDENCIES status

Changed the CHECKING_PRECONDITIONS status to WAITING_ON_DEPENDENCIES and added the @waitingOn runtime field for pipeline objects.

20 May 2013
DynamoDBDataFormat

Added the DynamoDBDataFormat template.

23 April 2013
Process Web Logs video and Spot Instances support

Introduced the video "Process Web Logs with AWS Data Pipeline, Amazon EMR, and Hive" and added support for Amazon EC2 Spot Instances.

21 February 2013

The initial release of the AWS Data Pipeline Developer Guide.

20 December 2012