Activities - AWS Data Pipeline

Activities

In AWS Data Pipeline, an activity is a pipeline component that defines the work to perform. AWS Data Pipeline provides several pre-packaged activities that accommodate common scenarios, such as moving data from one location to another, running Hive queries, and so on. Activities are extensible, so you can run your own custom scripts to support endless combinations.

AWS Data Pipeline supports the following types of activities:

CopyActivity

Copies data from one location to another.

EmrActivity

Runs an Amazon EMR cluster.

HiveActivity

Runs a Hive query on an Amazon EMR cluster.

HiveCopyActivity

Runs a Hive query on an Amazon EMR cluster with support for advanced data filtering and support for S3DataNode and DynamoDBDataNode.

PigActivity

Runs a Pig script on an Amazon EMR cluster.

RedshiftCopyActivity

Copies data to and from Amazon Redshift tables.

ShellCommandActivity

Runs a custom UNIX/Linux shell command as an activity.

SqlActivity

Runs a SQL query on a database.

Some activities have special support for staging data and database tables. For more information, see Staging Data and Tables with Pipeline Activities.