AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal.
ShellCommandActivity
Runs a command or script. You can use ShellCommandActivity to run time-series or cron-like scheduled tasks.
When the stage field is set to true and used with an S3DataNode, ShellCommandActivity supports the concept of staging data. This means that you can move data from Amazon S3 to a stage location, such as Amazon EC2 or your local environment, perform work on the data using scripts and the ShellCommandActivity, and move it back to Amazon S3.
In this case, when your shell command is connected to an input S3DataNode, your shell scripts operate directly on the data using ${INPUT1_STAGING_DIR}, ${INPUT2_STAGING_DIR}, and so on, where each variable refers to the corresponding ShellCommandActivity input field.
Similarly, output from the shell command can be staged in an output directory to be automatically pushed to Amazon S3, referred to by ${OUTPUT1_STAGING_DIR}, ${OUTPUT2_STAGING_DIR}, and so on.
These expressions can be passed as command-line arguments to the shell command for use in data transformation logic.
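The following is a minimal sketch of a staged pipeline fragment. The object IDs (`MyInputData`, `MyOutputData`, `MyShellCommandActivity`, `MyEc2Resource`) and the `s3://example-bucket/...` paths are hypothetical, and the Ec2Resource and schedule objects are omitted for brevity. With `stage` set to `true`, the input is expected to be copied from Amazon S3 into `${INPUT1_STAGING_DIR}` before the command runs, and whatever the command writes to `${OUTPUT1_STAGING_DIR}` is expected to be pushed to the output location afterward.

```json
{
  "objects": [
    {
      "id": "MyInputData",
      "type": "S3DataNode",
      "directoryPath": "s3://example-bucket/input/"
    },
    {
      "id": "MyOutputData",
      "type": "S3DataNode",
      "directoryPath": "s3://example-bucket/output/"
    },
    {
      "id": "MyShellCommandActivity",
      "type": "ShellCommandActivity",
      "stage": "true",
      "input": { "ref": "MyInputData" },
      "output": { "ref": "MyOutputData" },
      "command": "cat ${INPUT1_STAGING_DIR}/*.csv | grep -v '^#' > ${OUTPUT1_STAGING_DIR}/cleaned.csv",
      "runsOn": { "ref": "MyEc2Resource" }
    }
  ]
}
```

The command itself only touches local directories; moving data to and from Amazon S3 is left to the staging mechanism.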
ShellCommandActivity returns Linux-style error codes and strings. If a ShellCommandActivity results in an error, the error code returned is a non-zero value.
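As an illustration, the hypothetical command below exits with status 1 when the staged input directory is empty, so the attempt is expected to be reported as failed. The `maximumRetries` and `onFail` fields (described under Optional Fields) then control retries and the follow-up action. The IDs `CheckInput`, `MyInputData`, `MyFailureAlarm`, and `MyEc2Resource` are hypothetical.

```json
{
  "id": "CheckInput",
  "type": "ShellCommandActivity",
  "stage": "true",
  "input": { "ref": "MyInputData" },
  "command": "ls -A ${INPUT1_STAGING_DIR} | grep -q . || exit 1",
  "maximumRetries": "2",
  "onFail": { "ref": "MyFailureAlarm" },
  "runsOn": { "ref": "MyEc2Resource" }
}
```

Here `grep -q .` succeeds only if `ls -A` prints at least one entry; otherwise `exit 1` returns the non-zero code.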
Example
The following is an example of this object type.
```json
{
  "id": "CreateDirectory",
  "type": "ShellCommandActivity",
  "command": "mkdir new-directory"
}
```
Syntax
Object Invocation Fields | Description | Slot Type |
---|---|---|
schedule | This object is invoked within the execution of a schedule interval. To set the dependency execution order for this object, specify a schedule reference to another object. To satisfy this requirement, explicitly set a schedule on the object, for example, by specifying "schedule": {"ref": "DefaultSchedule"}. In most cases, it is better to put the schedule reference on the default pipeline object so that all objects inherit that schedule. To spread the load, AWS Data Pipeline creates physical objects slightly ahead of schedule, but runs them on schedule. For examples of optional schedule configurations, see https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-schedule.html | Reference Object, e.g. "schedule":{"ref":"myScheduleId"} |
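A minimal sketch of schedule inheritance, assuming the default object uses the ID `Default` and a hypothetical schedule `DefaultSchedule` that runs once a day. Because the schedule reference sits on the default object, `CreateDirectory` has no `schedule` field of its own. The `period` and `startDateTime` fields belong to the separately documented Schedule object; the date and the `MyEc2Resource` reference are placeholders.

```json
{
  "objects": [
    {
      "id": "Default",
      "name": "Default",
      "scheduleType": "cron",
      "schedule": { "ref": "DefaultSchedule" }
    },
    {
      "id": "DefaultSchedule",
      "type": "Schedule",
      "period": "1 day",
      "startDateTime": "2024-01-01T00:00:00"
    },
    {
      "id": "CreateDirectory",
      "type": "ShellCommandActivity",
      "command": "mkdir -p new-directory",
      "runsOn": { "ref": "MyEc2Resource" }
    }
  ]
}
```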
Required Group (One of the following is required) | Description | Slot Type |
---|---|---|
command | The command to run. Use $ to reference positional parameters and scriptArgument to specify the parameters for the command. This value and any associated parameters must function in the environment from which you are running the Task Runner. | String |
scriptUri | An Amazon S3 URI path for a file to download and run as a shell command. Specify only one of the scriptUri or command fields. scriptUri cannot use parameters; use command instead. | String |
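For comparison with the command-based example, here is a sketch that downloads and runs a script from Amazon S3 instead. The bucket, key, and IDs are hypothetical. No `scriptArgument` is set, because this field cannot be combined with `scriptUri`.

```json
{
  "id": "RunScriptFromS3",
  "type": "ShellCommandActivity",
  "scriptUri": "s3://example-bucket/scripts/transform.sh",
  "runsOn": { "ref": "MyEc2Resource" }
}
```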
Required Group (One of the following is required) | Description | Slot Type |
---|---|---|
runsOn | The computational resource to run the activity or command, for example, an Amazon EC2 instance or an Amazon EMR cluster. | Reference Object, e.g. "runsOn":{"ref":"myResourceId"} |
workerGroup | Used for routing tasks. If you provide a runsOn value and workerGroup is also set, workerGroup is ignored. | String |
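When a Task Runner that you manage (for example, on your own host) should pick up the activity, a sketch like the following routes it by `workerGroup` instead of `runsOn`. The group name `my-onprem-workers` is hypothetical and is assumed to match the worker group your Task Runner polls. Because no `runsOn` is set, local `stdout` and `stderr` paths are permitted; the paths shown are placeholders.

```json
{
  "id": "LocalShellTask",
  "type": "ShellCommandActivity",
  "command": "df -h",
  "workerGroup": "my-onprem-workers",
  "stdout": "/var/log/datapipeline/local-task.out",
  "stderr": "/var/log/datapipeline/local-task.err"
}
```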
Optional Fields | Description | Slot Type |
---|---|---|
attemptStatus | The most recently reported status from the remote activity. | String |
attemptTimeout | The timeout for the remote work completion. If set, then a remote activity that does not complete within the set time after starting may be retried. | Period |
dependsOn | Specifies a dependency on another runnable object. | Reference Object, e.g. "dependsOn":{"ref":"myActivityId"} |
failureAndRerunMode | Describes consumer node behavior when dependencies fail or are rerun. | Enumeration |
input | The location of the input data. | Reference Object, e.g. "input":{"ref":"myDataNodeId"} |
lateAfterTimeout | The elapsed time after pipeline start within which the object must complete. It is triggered only when the schedule type is not set to ondemand. | Period |
maxActiveInstances | The maximum number of concurrent active instances of a component. Re-runs do not count toward the number of active instances. | Integer |
maximumRetries | The maximum number of attempt retries on failure. | Integer |
onFail | An action to run when the current object fails. | Reference Object, e.g. "onFail":{"ref":"myActionId"} |
onLateAction | Actions that should be triggered if an object has not yet been scheduled or is not completed. | Reference Object, e.g. "onLateAction":{"ref":"myActionId"} |
onSuccess | An action to run when the current object succeeds. | Reference Object, e.g. "onSuccess":{"ref":"myActionId"} |
output | The location of the output data. | Reference Object, e.g. "output":{"ref":"myDataNodeId"} |
parent | The parent of the current object from which slots will be inherited. | Reference Object, e.g. "parent":{"ref":"myBaseObjectId"} |
pipelineLogUri | The Amazon S3 URI, such as 's3://BucketName/Key/', for uploading logs for the pipeline. | String |
precondition | Optionally defines a precondition. A data node is not marked "READY" until all preconditions have been met. | Reference Object, e.g. "precondition":{"ref":"myPreconditionId"} |
reportProgressTimeout | The timeout for successive calls to reportProgress by remote activities. If set, then remote activities that do not report progress for the specified period may be considered stalled and are retried. | Period |
retryDelay | The timeout duration between two retry attempts. | Period |
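scheduleType | Allows you to specify whether the objects in your pipeline definition should be scheduled at the beginning of the interval or at the end of the interval. The values are cron, ondemand, and timeseries. If set to timeseries, instances are scheduled at the end of each interval. If set to cron, instances are scheduled at the beginning of each interval. If set to ondemand, you can run the pipeline one time per activation without cloning or re-creating it. | Enumeration |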
scriptArgument | A JSON-formatted array of strings to pass to the command specified by the command field. For example, if command is echo $1 $2, specify scriptArgument as ["param1", "param2"]. For multiple arguments and parameters, pass the scriptArgument as follows: "scriptArgument":"arg1","scriptArgument":"param1","scriptArgument":"arg2","scriptArgument":"param2". The scriptArgument field can only be used with command; using it with scriptUri causes an error. | String |
stage | Determines whether staging is enabled and allows your shell commands to have access to the staged-data variables, such as ${INPUT1_STAGING_DIR} and ${OUTPUT1_STAGING_DIR}. | Boolean |
stderr | The path that receives redirected system error messages from the command. If you use the runsOn field, this must be an Amazon S3 path because of the transitory nature of the resource running your activity. However, if you specify the workerGroup field, a local file path is permitted. | String |
stdout | The path that receives redirected output from the command. If you use the runsOn field, this must be an Amazon S3 path because of the transitory nature of the resource running your activity. However, if you specify the workerGroup field, a local file path is permitted. | String |
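A sketch combining several optional fields, with hypothetical IDs, bucket paths, and retry values. `scriptArgument` supplies the values for `$1` and `$2` in `command`. Because `runsOn` is set, `stdout` and `stderr` point to Amazon S3 rather than local files. The Period values are assumed to use the "N unit" form, such as "10 minutes".

```json
{
  "id": "EchoWithArguments",
  "type": "ShellCommandActivity",
  "command": "echo $1 $2",
  "scriptArgument": ["param1", "param2"],
  "stdout": "s3://example-bucket/logs/echo/stdout.txt",
  "stderr": "s3://example-bucket/logs/echo/stderr.txt",
  "maximumRetries": "3",
  "retryDelay": "10 minutes",
  "attemptTimeout": "2 hours",
  "runsOn": { "ref": "MyEc2Resource" }
}
```

If this runs as written, the stdout object would be expected to contain the line `param1 param2`.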
Runtime Fields | Description | Slot Type |
---|---|---|
@activeInstances | The list of the currently scheduled active instance objects. | Reference Object, e.g. "activeInstances":{"ref":"myRunnableObjectId"} |
@actualEndTime | The time when the execution of this object finished. | DateTime |
@actualStartTime | The time when the execution of this object started. | DateTime |
cancellationReason | The cancellationReason if this object was cancelled. | String |
@cascadeFailedOn | The description of the dependency chain that caused the object failure. | Reference Object, e.g. "cascadeFailedOn":{"ref":"myRunnableObjectId"} |
emrStepLog | Amazon EMR step logs available only on Amazon EMR activity attempts. | String |
errorId | The errorId if this object failed. | String |
errorMessage | The errorMessage if this object failed. | String |
errorStackTrace | The error stack trace if this object failed. | String |
@finishedTime | The time at which the object finished its execution. | DateTime |
hadoopJobLog | Hadoop job logs available on attempts for Amazon EMR-based activities. | String |
@healthStatus | The health status of the object, which reflects the success or failure of the last object instance that reached a terminated state. | String |
@healthStatusFromInstanceId | The Id of the last instance object that reached a terminated state. | String |
@healthStatusUpdatedTime | The time at which the health status was last updated. | DateTime |
hostname | The host name of the client that picked up the task attempt. | String |
@lastDeactivatedTime | The time at which this object was last deactivated. | DateTime |
@latestCompletedRunTime | The time of the latest run for which the execution completed. | DateTime |
@latestRunTime | The time of the latest run for which the execution was scheduled. | DateTime |
@nextRunTime | The time of the run to be scheduled next. | DateTime |
reportProgressTime | The most recent time that remote activity reported progress. | DateTime |
@scheduledEndTime | The scheduled end time for the object. | DateTime |
@scheduledStartTime | The scheduled start time for the object. | DateTime |
@status | The status of the object. | String |
@version | The AWS Data Pipeline version used to create the object. | String |
@waitingOn | The description of the list of dependencies this object is waiting on. | Reference Object, e.g. "waitingOn":{"ref":"myRunnableObjectId"} |
System Fields | Description | Slot Type |
---|---|---|
@error | The error describing the ill-formed object. | String |
@pipelineId | The Id of the pipeline to which this object belongs. | String |
@sphere | The place of an object in the lifecycle. Component Objects give rise to Instance Objects which execute Attempt Objects. | String |