AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal.
Activities
In AWS Data Pipeline, an activity is a pipeline component that defines the work to perform. AWS Data Pipeline provides several pre-packaged activities that accommodate common scenarios, such as moving data from one location to another or running Hive queries. Activities are extensible, so you can run your own custom scripts to handle scenarios that the pre-packaged activities do not cover.
AWS Data Pipeline supports the following types of activities:
- CopyActivity: Copies data from one location to another.
- EmrActivity: Runs an Amazon EMR cluster.
- HiveActivity: Runs a Hive query on an Amazon EMR cluster.
- HiveCopyActivity: Runs a Hive query on an Amazon EMR cluster with support for advanced data filtering and support for S3DataNode and DynamoDBDataNode.
- PigActivity: Runs a Pig script on an Amazon EMR cluster.
- RedshiftCopyActivity: Copies data to and from Amazon Redshift tables.
- ShellCommandActivity: Runs a custom UNIX/Linux shell command as an activity.
- SqlActivity: Runs a SQL query on a database.
Some activities have special support for staging data and database tables. For more information, see Staging Data and Tables with Pipeline Activities.
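To make the idea of an activity as a pipeline component more concrete, the following Python sketch builds a pipeline definition fragment containing a single ShellCommandActivity and prints it as JSON. The helper `make_activity`, the object IDs `MyShellActivity` and `MyEc2Resource`, and the `echo` command are hypothetical names chosen for illustration. The fragment is incomplete as a pipeline: it assumes a resource object with the ID `MyEc2Resource` is defined elsewhere, and it omits scheduling and other objects that a working pipeline would likely need.

```python
import json

# Activity type names taken from the list in this section.
ACTIVITY_TYPES = {
    "CopyActivity",
    "EmrActivity",
    "HiveActivity",
    "HiveCopyActivity",
    "PigActivity",
    "RedshiftCopyActivity",
    "ShellCommandActivity",
    "SqlActivity",
}


def make_activity(activity_id, activity_type, **fields):
    """Build one activity object for a pipeline definition.

    Hypothetical helper for illustration only; it is not part of any AWS SDK.
    """
    if activity_type not in ACTIVITY_TYPES:
        raise ValueError(f"unsupported activity type: {activity_type}")
    return {"id": activity_id, "name": activity_id, "type": activity_type, **fields}


# A custom shell command expressed as an activity.
# "MyEc2Resource" is an assumed ID of a resource object defined elsewhere.
activity = make_activity(
    "MyShellActivity",
    "ShellCommandActivity",
    command="echo hello",
    runsOn={"ref": "MyEc2Resource"},
)

# Pipeline definitions group their components in an "objects" list.
pipeline_definition = {"objects": [activity]}

print(json.dumps(pipeline_definition, indent=2))
```

Validating the type name up front is a design choice of this sketch, not a requirement of the service; it simply catches typos in the `type` field before the definition is submitted.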