ShellCommandActivity

Runs a command or script. You can use ShellCommandActivity to run time-series or cron-like scheduled tasks.

When the stage field is set to true and used with an S3DataNode, ShellCommandActivity supports the concept of staging data, which means that you can move data from Amazon S3 to a stage location, such as Amazon EC2 or your local environment, perform work on the data using scripts and the ShellCommandActivity, and move it back to Amazon S3.

In this case, when your shell command is connected to an input S3DataNode, your shell scripts operate directly on the data using ${INPUT1_STAGING_DIR}, ${INPUT2_STAGING_DIR}, and other fields, referring to the ShellCommandActivity input fields.

Similarly, output from the shell-command can be staged in an output directory to be automatically pushed to Amazon S3, referred to by ${OUTPUT1_STAGING_DIR}, ${OUTPUT2_STAGING_DIR}, and so on.

These expressions can pass as command-line arguments to the shell-command for you to use in data transformation logic.

ShellCommandActivity returns Linux-style error codes and strings. If a ShellCommandActivity results in error, the error returned is a non-zero value.

Example

The following is an example of this object type.


{
  "id" : "CreateDirectory",
  "type" : "ShellCommandActivity",
  "command" : "mkdir new-directory"
}

Syntax

Object Invocation Fields Description Slot Type

schedule

Object Invocation Fields	Description	Slot Type
schedule	This object is invoked within the execution of a `schedule` interval. To set the dependency execution order for this object, specify a `schedule` reference to another object. To satisfy this requirement, explicitly set a `schedule` on the object, for example, by specifying `"schedule": {"ref": "DefaultSchedule"}`. In most cases, it is better to put the `schedule` reference on the default pipeline object so that all objects inherit that schedule. If the pipeline consists of a tree of schedules (schedules within the master schedule), create a parent object that has a schedule reference. To spread the load, AWS Data Pipeline creates physical objects slightly ahead of schedule, but runs them on schedule. For more information about example optional schedule configurations, see https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-schedule.html	Reference Object, e.g. "schedule":{"ref":"myScheduleId"}

This object is invoked within the execution of a schedule interval.

To set the dependency execution order for this object, specify a schedule reference to another object.

To satisfy this requirement, explicitly set a schedule on the object, for example, by specifying "schedule": {"ref": "DefaultSchedule"}.

In most cases, it is better to put the schedule reference on the default pipeline object so that all objects inherit that schedule. If the pipeline consists of a tree of schedules (schedules within the master schedule), create a parent object that has a schedule reference.

To spread the load, AWS Data Pipeline creates physical objects slightly ahead of schedule, but runs them on schedule.

For more information about example optional schedule configurations, see https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-schedule.html

Reference Object, e.g. "schedule":{"ref":"myScheduleId"}

Required Group (One of the following is required)	Description	Slot Type
command	The command to run. Use `$` to reference positional parameters and `scriptArgument` to specify the parameters for the command. This value and any associated parameters must function in the environment from which you are running the Task Runner.	String
scriptUri	An Amazon S3 URI path for a file to download and run as a shell command. Specify only one `scriptUri`, or `command` field. `scriptUri` cannot use parameters, use `command` instead.	String

Required Group (One of the following is required)	Description	Slot Type
runsOn	The computational resource to run the activity or command, for example, an Amazon EC2 instance or an Amazon EMR cluster.	Reference Object, e.g. "runsOn":{"ref":"myResourceId"}
workerGroup	Used for routing tasks. If you provide a `runsOn` value and `workerGroup` exists, `workerGroup` is ignored.	String

Optional Fields	Description	Slot Type
attemptStatus	The most recently reported status from the remote activity.	String
attemptTimeout	The timeout for the remote work completion. If set, then a remote activity that does not complete within the specified starting time may be retried.	Period
dependsOn	Specifies a dependency on another runnable object.	Reference Object, e.g. "dependsOn":{"ref":"myActivityId"}
failureAndRerunMode	Describes consumer node behavior when dependencies fail or are rerun.	Enumeration
input	The location of the input data.	Reference Object, e.g. "input":{"ref":"myDataNodeId"}
lateAfterTimeout	The elapsed time after pipeline start within which the object must complete. It is triggered only when the schedule type is not set to `ondemand`.	Period
maxActiveInstances	The maximum number of concurrent active instances of a component. Re-runs do not count toward the number of active instances.	Integer
maximumRetries	The maximum number attempt retries on failure.	Integer
onFail	An action to run when current object fails.	Reference Object, e.g. "onFail":{"ref":"myActionId"}
onLateAction	Actions that should be triggered if an object has not yet been scheduled or is not completed.	Reference Object, e.g. "onLateAction":{"ref":"myActionId"}
onSuccess	An action to run when current object succeeds.	Reference Object, e.g. "onSuccess":{"ref":"myActionId"}
output	The location of the output data.	Reference Object, e.g. "output":{"ref":"myDataNodeId"}
parent	The parent of the current object from which slots will be inherited.	Reference Object, e.g. "parent":{"ref":"myBaseObjectId"}
pipelineLogUri	The Amazon S3 URI, such as `'s3://BucketName/Key/'` for uploading logs for the pipeline.	String
precondition	Optionally defines a precondition. A data node is not marked "READY" until all preconditions have been met.	Reference Object, e.g. "precondition":{"ref":"myPreconditionId"}
reportProgressTimeout	The timeout for successive calls to `reportProgress` by remote activities. If set, then remote activities that do not report progress for the specified period may be considered stalled and are retried.	Period
retryDelay	The timeout duration between two retry attempts.	Period
scheduleType	Allows you to specify whether the objects in your pipeline definition should be scheduled at the beginning of the interval or at the end of the interval. The values are: `cron`, `ondemand`, and `timeseries`. If set to `timeseries`, instances are scheduled at the end of each interval. If set to `Cron`, instances are scheduled at the beginning of each interval. If set to `ondemand`, you can run a pipeline one time, per activation. This means you do not have to clone or recreate the pipeline to run it again. If you use an `ondemand` schedule, specify it in the default object as the only `scheduleType` for objects in the pipeline. To use `ondemand` pipelines, call the `ActivatePipeline` operation for each subsequent run.	Enumeration
scriptArgument	A JSON-formatted array of strings to pass to the command specified by the command. For example, if command is `echo $1 $2`, specify `scriptArgument` as `"param1", "param2"`. For multiple arguments and parameters, pass the `scriptArgument` as follows:`"scriptArgument":"arg1","scriptArgument":"param1","scriptArgument":"arg2","scriptArgument":"param2"`. The `scriptArgument` can only be used with `command`; Using it with `scriptUri` causes an error.	String
stage	Determines whether staging is enabled and allows your shell commands to have access to the staged-data variables, such as `${INPUT1_STAGING_DIR}` and`${OUTPUT1_STAGING_DIR}`.	Boolean
stderr	The path that receives redirected system error messages from the command. If you use the `runsOn` field, this must be an Amazon S3 path because of the transitory nature of the resource running your activity. However, if you specify the `workerGroup` field, a local file path is permitted.	String
stdout	The Amazon S3 path that receives redirected output from the command. If you use the `runsOn` field, this must be an Amazon S3 path because of the transitory nature of the resource running your activity. However, if you specify the `workerGroup` field, a local file path is permitted.	String

Runtime Fields	Description	Slot Type
@activeInstances	The list of the currently scheduled active instance objects.	Reference Object, e.g. "activeInstances":{"ref":"myRunnableObjectId"}
@actualEndTime	The time when the execution of this object finished.	DateTime
@actualStartTime	The time when the execution of this object started.	DateTime
cancellationReason	The `cancellationReason` if this object was cancelled.	String
@cascadeFailedOn	The description of the dependency chain that caused the object failure.	Reference Object, e.g. "cascadeFailedOn":{"ref":"myRunnableObjectId"}
emrStepLog	Amazon EMR step logs available only on Amazon EMR activity attempts.	String
errorId	The `errorId` if this object failed.	String
errorMessage	The `errorMessage` if this object failed.	String
errorStackTrace	The error stack trace if this object failed.	String
@finishedTime	The time at which the object finished its execution.	DateTime
hadoopJobLog	Hadoop job logs available on attempts for Amazon EMR-based activities.	String
@healthStatus	The health status of the object which reflects success or failure of the last object instance that reached a terminated state.	String
@healthStatusFromInstanceId	The Id of the last instance object that reached a terminated state.	String
@healthStatusUpdatedTime	The time at which the health status was updated last time.	DateTime
hostname	The host name of the client that picked up the task attempt.	String
@lastDeactivatedTime	The time at which this object was last deactivated.	DateTime
@latestCompletedRunTime	The time of the latest run for which the execution completed.	DateTime
@latestRunTime	The time of the latest run for which the execution was scheduled.	DateTime
@nextRunTime	The time of the run to be scheduled next.	DateTime
reportProgressTime	The most recent time that remote activity reported progress.	DateTime
@scheduledEndTime	The schedule end time for object.	DateTime
@scheduledStartTime	The schedule start time for object.	DateTime
@status	The status of the object.	String
@version	The AWS Data Pipeline version used to create the object.	String
@waitingOn	The description of the list of dependencies this object is waiting on.	Reference Object, e.g. "waitingOn":{"ref":"myRunnableObjectId"}

System Fields	Description	Slot Type
@error	The error describing the ill-formed object.	String
@pipelineId	The Id of the pipeline to which this object belongs.	String
@sphere	The place of an object in the lifecycle. Component Objects give rise to Instance Objects which execute Attempt Objects.	String

ShellCommandActivity

Example

Syntax

See Also