New-GLUEJob
-Command <JobCommand>
-SourceControlDetails_AuthStrategy <SourceControlAuthStrategy>
-SourceControlDetails_AuthToken <String>
-SourceControlDetails_Branch <String>
-CodeGenConfigurationNode <Hashtable>
-Connections_Connection <String[]>
-DefaultArgument <Hashtable>
-Description <String>
-ExecutionClass <ExecutionClass>
-SourceControlDetails_Folder <String>
-GlueVersion <String>
-SourceControlDetails_LastCommitId <String>
-LogUri <String>
-MaxCapacity <Double>
-ExecutionProperty_MaxConcurrentRun <Int32>
-MaxRetry <Int32>
-Name <String>
-NonOverridableArgument <Hashtable>
-NotificationProperty_NotifyDelayAfter <Int32>
-NumberOfWorker <Int32>
-SourceControlDetails_Owner <String>
-SourceControlDetails_Provider <SourceControlProvider>
-SourceControlDetails_Repository <String>
-Role <String>
-SecurityConfiguration <String>
-Tag <Hashtable>
-Timeout <Int32>
-WorkerType <WorkerType>
-AllocatedCapacity <Int32>
-Select <String>
-PassThru <SwitchParameter>
-Force <SwitchParameter>
-ClientConfig <AmazonGlueConfig>
-AllocatedCapacity <Int32>
This parameter is deprecated. Use MaxCapacity instead.
The number of Glue data processing units (DPUs) to allocate to this Job. You can allocate a minimum of 2 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-ClientConfig <AmazonGlueConfig>
The AmazonGlueConfig client configuration object that the cmdlet should use when making its service calls.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-CodeGenConfigurationNode <Hashtable>
The representation of a directed acyclic graph on which both the Glue Studio visual component and Glue Studio code generation is based.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
Aliases | CodeGenConfigurationNodes
-Command <JobCommand>
The JobCommand that runs this job.
Required? | True
Position? | 1
Accept pipeline input? | True (ByValue, ByPropertyName)
-Connections_Connection <String[]>
The connections used for this job.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
Aliases | Connections_Connections
-DefaultArgument <Hashtable>
The default arguments for this job, specified as name-value pairs. You can specify arguments here that your own job-execution script consumes, in addition to arguments that Glue itself consumes.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
Aliases | DefaultArguments
-Description <String>
Description of the job being defined.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-ExecutionClass <ExecutionClass>
Indicates whether the job is run with a standard or flexible execution class. The standard execution class is ideal for time-sensitive workloads that require fast job startup and dedicated resources. The flexible execution class is appropriate for time-insensitive jobs whose start and completion times may vary. Only jobs with Glue version 3.0 and above and command type glueetl will be allowed to set ExecutionClass to FLEX. The flexible execution class is available for Spark jobs.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
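The following sketch shows how a time-insensitive Spark job on Glue 3.0 might opt into the flexible execution class; the job name, role, and script location are hypothetical placeholders, not values from this page.

$FlexCommand = New-Object Amazon.Glue.Model.JobCommand
$FlexCommand.Name = 'glueetl'
$FlexCommand.ScriptLocation = 's3://my-example-bucket/scripts/nightly-batch.py'   # hypothetical path
# FLEX requires Glue 3.0+ and the glueetl command type.
New-GlueJob -Name 'NightlyFlexJob' -Role 'MyGlueServiceRole' -Command $FlexCommand -GlueVersion '3.0' -WorkerType 'G.1X' -NumberOfWorker 5 -ExecutionClass FLEX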
-ExecutionProperty_MaxConcurrentRun <Int32>
The maximum number of concurrent runs allowed for the job. The default is 1. An error is returned when this threshold is reached.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
Aliases | ExecutionProperty_MaxConcurrentRuns
-Force <SwitchParameter>
This parameter overrides confirmation prompts to force the cmdlet to continue its operation. This parameter should always be used with caution.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-GlueVersion <String>
In Spark jobs, GlueVersion determines the versions of Apache Spark and Python that Glue makes available in a job. The Python version indicates the version supported for jobs of type Spark. Ray jobs should set GlueVersion to 4.0 or greater. However, the versions of Ray, Python and additional libraries available in your Ray job are determined by the Runtime parameter of the Job command. For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide. Jobs that are created without specifying a Glue version default to Glue 0.9.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
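For example, a job definition can pin Glue 4.0 explicitly rather than fall back to the Glue 0.9 default (the job name, role, and script path below are hypothetical):

$Cmd = New-Object Amazon.Glue.Model.JobCommand
$Cmd.Name = 'glueetl'
$Cmd.ScriptLocation = 's3://my-example-bucket/scripts/transform.py'   # hypothetical path
# Pin the Glue runtime version; omitting -GlueVersion falls back to Glue 0.9.
New-GlueJob -Name 'Glue4TransformJob' -Role 'MyGlueServiceRole' -Command $Cmd -GlueVersion '4.0'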
-LogUri <String>
This field is reserved for future use.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-MaxCapacity <Double>
For Glue version 1.0 or earlier jobs, using the standard worker type, the number of Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page. For Glue version 2.0+ jobs, you cannot specify a Maximum capacity. Instead, you should specify a Worker type and the Number of workers. Do not set MaxCapacity if using WorkerType and NumberOfWorkers. The value that can be allocated for MaxCapacity depends on whether you are running a Python shell job, an Apache Spark ETL job, or an Apache Spark streaming ETL job:
- When you specify a Python shell job (JobCommand.Name="pythonshell"), you can allocate either 0.0625 or 1 DPU. The default is 0.0625 DPU.
- When you specify an Apache Spark ETL job (JobCommand.Name="glueetl") or Apache Spark streaming ETL job (JobCommand.Name="gluestreaming"), you can allocate from 2 to 100 DPUs. The default is 10 DPUs. This job type cannot have a fractional DPU allocation.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
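As a sketch of the Python shell case (hypothetical names and script path), a lightweight job can request a fractional DPU via MaxCapacity:

$ShellCmd = New-Object Amazon.Glue.Model.JobCommand
$ShellCmd.Name = 'pythonshell'
$ShellCmd.ScriptLocation = 's3://my-example-bucket/scripts/small-task.py'   # hypothetical path
# Python shell jobs accept 0.0625 or 1 DPU; Spark jobs on Glue 2.0+ should use -WorkerType/-NumberOfWorker instead.
New-GlueJob -Name 'SmallShellJob' -Role 'MyGlueServiceRole' -Command $ShellCmd -MaxCapacity 0.0625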
-MaxRetry <Int32>
The maximum number of times to retry this job if it fails.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
Aliases | MaxRetries
-Name <String>
The name you assign to this job definition. It must be unique in your account.
Required? | True
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-NonOverridableArgument <Hashtable>
Arguments for this job that are not overridden when providing job arguments in a job run, specified as name-value pairs.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
Aliases | NonOverridableArguments
-NotificationProperty_NotifyDelayAfter <Int32>
After a job run starts, the number of minutes to wait before sending a job run delay notification.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-NumberOfWorker <Int32>
The number of workers of a defined workerType that are allocated when a job runs.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
Aliases | NumberOfWorkers
-PassThru <SwitchParameter>
Changes the cmdlet behavior to return the value passed to the Name parameter. The -PassThru parameter is deprecated, use -Select '^Name' instead. This parameter will be removed in a future version.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-Role <String>
The name or Amazon Resource Name (ARN) of the IAM role associated with this job.
Required? | True
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-SecurityConfiguration <String>
The name of the SecurityConfiguration structure to be used with this job.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-Select <String>
Use the -Select parameter to control the cmdlet output. The cmdlet returns the Name property of the service response by default. Specifying -Select '*' will result in the cmdlet returning the whole service response (Amazon.Glue.Model.CreateJobResponse). Specifying the name of a property of that type will result in just that property being returned. Specifying -Select '^ParameterName' will result in the cmdlet returning the selected cmdlet-parameter value.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
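For illustration (reusing the hypothetical $JobParams splat from the example at the end of this page), -Select '*' returns the whole CreateJobResponse instead of only the new job's name:

# Returns the full Amazon.Glue.Model.CreateJobResponse object rather than the Name string.
New-GlueJob @JobParams -Select '*'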
-SourceControlDetails_AuthStrategy <SourceControlAuthStrategy>
The type of authentication, which can be an authentication token stored in Amazon Web Services Secrets Manager, or a personal access token.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-SourceControlDetails_AuthToken <String>
The value of an authorization token.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-SourceControlDetails_Branch <String>
An optional branch in the remote repository.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-SourceControlDetails_Folder <String>
An optional folder in the remote repository.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-SourceControlDetails_LastCommitId <String>
The last commit ID for a commit in the remote repository.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-SourceControlDetails_Owner <String>
The owner of the remote repository that contains the job artifacts.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-SourceControlDetails_Provider <SourceControlProvider>
The provider for the remote repository.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-SourceControlDetails_Repository <String>
The name of the remote repository that contains the job artifacts.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
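The following hedged sketch ties a job definition to a remote repository, reusing the hypothetical $Cmd command object from the GlueVersion sketch above; the owner, repository, and branch values are hypothetical:

New-GlueJob -Name 'RepoBackedJob' -Role 'MyGlueServiceRole' -Command $Cmd `
    -SourceControlDetails_Provider GITHUB `
    -SourceControlDetails_Owner 'example-org' `
    -SourceControlDetails_Repository 'glue-jobs' `
    -SourceControlDetails_Branch 'main'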
-Tag <Hashtable>
The tags to use with this job. You may use tags to limit access to the job. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
Aliases | Tags
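For example (hypothetical tag keys and values), tags are supplied as a hashtable:

# Tags can later be used to scope IAM permissions to this job.
New-GlueJob @JobParams -Tag @{ 'team' = 'data-eng'; 'environment' = 'dev' }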
-Timeout <Int32>
The job timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-WorkerType <WorkerType>
The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, G.8X or G.025X for Spark jobs. Accepts the value Z.2X for Ray jobs.
- For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries; it offers a scalable and cost-effective way to run most jobs.
- For the G.2X worker type, each worker maps to 2 DPU (8 vCPUs, 32 GB of memory) with 128GB disk (approximately 77GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries; it offers a scalable and cost-effective way to run most jobs.
- For the G.4X worker type, each worker maps to 4 DPU (16 vCPUs, 64 GB of memory) with 256GB disk (approximately 235GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).
- For the G.8X worker type, each worker maps to 8 DPU (32 vCPUs, 128 GB of memory) with 512GB disk (approximately 487GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.
- For the G.025X worker type, each worker maps to 0.25 DPU (2 vCPUs, 4 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for low volume streaming jobs. This worker type is only available for Glue version 3.0 streaming jobs.
- For the Z.2X worker type, each worker maps to 2 M-DPU (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120GB free), and provides up to 8 Ray workers based on the autoscaler.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
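A minimal sketch (hypothetical job name and script path) pairing WorkerType with NumberOfWorker for a Spark job on Glue 3.0 or later:

$EtlCmd = New-Object Amazon.Glue.Model.JobCommand
$EtlCmd.Name = 'glueetl'
$EtlCmd.ScriptLocation = 's3://my-example-bucket/scripts/heavy-join.py'   # hypothetical path
# Ten G.2X workers = 20 DPU of capacity; do not combine with -MaxCapacity.
New-GlueJob -Name 'HeavyJoinJob' -Role 'MyGlueServiceRole' -Command $EtlCmd -GlueVersion '4.0' -WorkerType 'G.2X' -NumberOfWorker 10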
-AccessKey <String>
The AWS access key for the user account. This can be a temporary access key if the corresponding session token is supplied to the -SessionToken parameter.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
Aliases | AK
-Credential <AWSCredentials>
An AWSCredentials object instance containing access and secret key information, and optionally a token for session-based credentials.
Required? | False
Position? | Named
Accept pipeline input? | True (ByValue, ByPropertyName)
-EndpointUrl <String>
The endpoint to make the call against. This parameter is primarily for internal AWS use and should not be specified for normal usage; the cmdlets normally determine which endpoint to call based on the region specified to the -Region parameter or set as the shell default. Only specify this parameter if you must direct the call to a specific custom endpoint.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
-NetworkCredential <PSCredential>
Used with SAML-based authentication when ProfileName references a SAML role profile. Contains the network credentials to be supplied during authentication with the configured identity provider's endpoint. This parameter is not required if the user's default network identity can or should be used during authentication.
Required? | False
Position? | Named
Accept pipeline input? | True (ByValue, ByPropertyName)
-ProfileLocation <String>
Used to specify the name and location of the ini-format credential file (shared with the AWS CLI and other AWS SDKs). If this optional parameter is omitted, this cmdlet searches the encrypted credential file used by the AWS SDK for .NET and AWS Toolkit for Visual Studio first, and then the ini-format credential file at the default location.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
Aliases | AWSProfilesLocation, ProfilesLocation
-ProfileName <String>
The user-defined name of an AWS credentials or SAML-based role profile containing credential information. The profile is expected to be found in the secure credential file shared with the AWS SDK for .NET and AWS Toolkit for Visual Studio. You can also specify the name of a profile stored in the ini-format credential file used with the AWS CLI and other AWS SDKs.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
Aliases | StoredCredentials, AWSProfileName
-Region <Object>
The system name of an AWS region or an AWSRegion instance. This governs the endpoint that will be used when calling service operations. Note that the AWS resources referenced in a call are usually region-specific.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
Aliases | RegionToCall
-SecretKey <String>
The AWS secret key for the user account. This can be a temporary secret key if the corresponding session token is supplied to the -SessionToken parameter.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
Aliases | SK, SecretAccessKey
-SessionToken <String>
The session token if the access and secret keys are temporary session-based credentials.
Required? | False
Position? | Named
Accept pipeline input? | True (ByPropertyName)
Aliases | ST
# Create the JobCommand object that defines the script to run.
$Command = New-Object Amazon.Glue.Model.JobCommand
$Command.Name = 'glueetl'
$Command.ScriptLocation = 's3://aws-glue-scripts-000000000000-us-west-2/admin/MyTestGlueJob.py'
$Command
$Source = "source_test_table"
$Target = "target_test_table"
$Connections = $Source, $Target
# Default arguments passed to every run of the job.
$DefArgs = @{
    '--TempDir' = 's3://aws-glue-temporary-000000000000-us-west-2/admin'
    '--job-bookmark-option' = 'job-bookmark-disable'
    '--job-language' = 'python'
}
$DefArgs
# The cmdlet exposes ExecutionProperty.MaxConcurrentRuns as the flattened
# -ExecutionProperty_MaxConcurrentRun parameter, so no ExecutionProperty
# object is needed.
$JobParams = @{
    "Command" = $Command
    "Connections_Connection" = $Connections
    "DefaultArguments" = $DefArgs
    "Description" = "This is a test"
    "ExecutionProperty_MaxConcurrentRun" = 1
    "MaxCapacity" = 5        # AllocatedCapacity is deprecated; MaxCapacity replaces it
    "MaxRetries" = 1
    "Name" = "MyOregonTestGlueJob"
    "Role" = "Amazon-GlueServiceRoleForSSM"
    "Timeout" = 20
}
New-GlueJob @JobParams

This example creates a new job in AWS Glue. The command name value for a Spark ETL job is always glueetl. AWS Glue supports running job scripts written in Python or Scala. In this example, the job script (MyTestGlueJob.py) is written in Python. Python parameters are specified in the $DefArgs variable, and then passed to the PowerShell command in the DefaultArguments parameter, which accepts a hashtable. The parameters in the $JobParams variable come from the CreateJob API, documented in the Jobs (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html) topic of the AWS Glue API reference.
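As a follow-up (assuming the job was created in the current default region), the companion cmdlets can confirm the definition and start a run:

# Retrieve the job definition that was just created.
Get-GlueJob -JobName 'MyOregonTestGlueJob'
# Start a run of the new job.
Start-GlueJobRun -JobName 'MyOregonTestGlueJob'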