AWS Glue 2017-03-31
- Client: Aws\Glue\GlueClient
- Service ID: glue
- Version: 2017-03-31
This page describes the parameters and results for the operations of AWS Glue (2017-03-31) and shows how to use the Aws\Glue\GlueClient object to call them. This documentation is specific to the 2017-03-31 API version of the service.
Operation Summary
Each of the following operations can be created from a client using $client->getCommand('CommandName'), where "CommandName" is the name of one of the following operations. Note: a command is a value that encapsulates an operation and the parameters used to create an HTTP request.
You can also create and send a command immediately using the magic methods available on a client object: $client->commandName(/* parameters */).
You can send the command asynchronously (returning a promise) by appending the word "Async" to the operation name: $client->commandNameAsync(/* parameters */).
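As a minimal sketch of the three invocation styles described above, the helper below calls one operation through a command object, through the magic method, and through the Async variant. It assumes the AWS SDK for PHP is installed through Composer and that credentials are configured. The helper name callThreeWays, the choice of GetDatabases, and the region in the comment are illustrative assumptions, not part of the SDK.

```php
<?php
// Illustrative helper (not part of the SDK): invokes one operation in the
// three styles described above and returns the three results.
// $client is expected to be an Aws\Glue\GlueClient, created for example as:
//   require 'vendor/autoload.php';
//   $client = new Aws\Glue\GlueClient([
//       'region'  => 'us-east-1',   // assumed region, adjust as needed
//       'version' => '2017-03-31',
//   ]);
function callThreeWays($client, array $params = []): array
{
    // 1. Build a command object first, then execute it.
    $command = $client->getCommand('GetDatabases', $params);
    $viaCommand = $client->execute($command);

    // 2. Create and send the command immediately with the magic method.
    $viaMagic = $client->getDatabases($params);

    // 3. Send asynchronously: the Async variant returns a promise,
    //    and wait() blocks until the result is available.
    $promise = $client->getDatabasesAsync($params);
    $viaAsync = $promise->wait();

    return [$viaCommand, $viaMagic, $viaAsync];
}
```

All three calls send the same request. The command form is useful when a command needs to be inspected or modified before it is sent, and the Async form when several requests should be in flight at once.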
- BatchCreatePartition ( array $params = [] )
- Creates one or more partitions in a batch operation.
- BatchDeleteConnection ( array $params = [] )
- Deletes a list of connection definitions from the Data Catalog.
- BatchDeletePartition ( array $params = [] )
- Deletes one or more partitions in a batch operation.
- BatchDeleteTable ( array $params = [] )
- Deletes multiple tables at once.
- BatchDeleteTableVersion ( array $params = [] )
- Deletes a specified batch of versions of a table.
- BatchGetBlueprints ( array $params = [] )
- Retrieves information about a list of blueprints.
- BatchGetCrawlers ( array $params = [] )
- Returns a list of resource metadata for a given list of crawler names.
- BatchGetCustomEntityTypes ( array $params = [] )
- Retrieves the details for the custom patterns specified by a list of names.
- BatchGetDataQualityResult ( array $params = [] )
- Retrieves a list of data quality results for the specified result IDs.
- BatchGetDevEndpoints ( array $params = [] )
- Returns a list of resource metadata for a given list of development endpoint names.
- BatchGetJobs ( array $params = [] )
- Returns a list of resource metadata for a given list of job names.
- BatchGetPartition ( array $params = [] )
- Retrieves partitions in a batch request.
- BatchGetTableOptimizer ( array $params = [] )
- Returns the configuration for the specified table optimizers.
- BatchGetTriggers ( array $params = [] )
- Returns a list of resource metadata for a given list of trigger names.
- BatchGetWorkflows ( array $params = [] )
- Returns a list of resource metadata for a given list of workflow names.
- BatchPutDataQualityStatisticAnnotation ( array $params = [] )
- Annotate datapoints over time for a specific data quality statistic.
- BatchStopJobRun ( array $params = [] )
- Stops one or more job runs for a specified job definition.
- BatchUpdatePartition ( array $params = [] )
- Updates one or more partitions in a batch operation.
- CancelDataQualityRuleRecommendationRun ( array $params = [] )
- Cancels the specified recommendation run that was being used to generate rules.
- CancelDataQualityRulesetEvaluationRun ( array $params = [] )
- Cancels a run where a ruleset is being evaluated against a data source.
- CancelMLTaskRun ( array $params = [] )
- Cancels (stops) a task run.
- CancelStatement ( array $params = [] )
- Cancels the statement.
- CheckSchemaVersionValidity ( array $params = [] )
- Validates the supplied schema.
- CreateBlueprint ( array $params = [] )
- Registers a blueprint with Glue.
- CreateClassifier ( array $params = [] )
- Creates a classifier in the user's account.
- CreateColumnStatisticsTaskSettings ( array $params = [] )
- Creates settings for a column statistics task.
- CreateConnection ( array $params = [] )
- Creates a connection definition in the Data Catalog.
- CreateCrawler ( array $params = [] )
- Creates a new crawler with specified targets, role, configuration, and optional schedule.
- CreateCustomEntityType ( array $params = [] )
- Creates a custom pattern that is used to detect sensitive data across the columns and rows of your structured data.
- CreateDataQualityRuleset ( array $params = [] )
- Creates a data quality ruleset with DQDL rules applied to a specified Glue table.
- CreateDatabase ( array $params = [] )
- Creates a new database in a Data Catalog.
- CreateDevEndpoint ( array $params = [] )
- Creates a new development endpoint.
- CreateJob ( array $params = [] )
- Creates a new job definition.
- CreateMLTransform ( array $params = [] )
- Creates a Glue machine learning transform.
- CreatePartition ( array $params = [] )
- Creates a new partition.
- CreatePartitionIndex ( array $params = [] )
- Creates a specified partition index in an existing table.
- CreateRegistry ( array $params = [] )
- Creates a new registry which may be used to hold a collection of schemas.
- CreateSchema ( array $params = [] )
- Creates a new schema set and registers the schema definition.
- CreateScript ( array $params = [] )
- Transforms a directed acyclic graph (DAG) into code.
- CreateSecurityConfiguration ( array $params = [] )
- Creates a new security configuration.
- CreateSession ( array $params = [] )
- Creates a new session.
- CreateTable ( array $params = [] )
- Creates a new table definition in the Data Catalog.
- CreateTableOptimizer ( array $params = [] )
- Creates a new table optimizer for a specific function.
- CreateTrigger ( array $params = [] )
- Creates a new trigger.
- CreateUsageProfile ( array $params = [] )
- Creates a Glue usage profile.
- CreateUserDefinedFunction ( array $params = [] )
- Creates a new function definition in the Data Catalog.
- CreateWorkflow ( array $params = [] )
- Creates a new workflow.
- DeleteBlueprint ( array $params = [] )
- Deletes an existing blueprint.
- DeleteClassifier ( array $params = [] )
- Removes a classifier from the Data Catalog.
- DeleteColumnStatisticsForPartition ( array $params = [] )
- Delete the partition column statistics of a column.
- DeleteColumnStatisticsForTable ( array $params = [] )
- Deletes table statistics of columns.
- DeleteColumnStatisticsTaskSettings ( array $params = [] )
- Deletes settings for a column statistics task.
- DeleteConnection ( array $params = [] )
- Deletes a connection from the Data Catalog.
- DeleteCrawler ( array $params = [] )
- Removes a specified crawler from the Glue Data Catalog, unless the crawler state is RUNNING.
- DeleteCustomEntityType ( array $params = [] )
- Deletes a custom pattern by specifying its name.
- DeleteDataQualityRuleset ( array $params = [] )
- Deletes a data quality ruleset.
- DeleteDatabase ( array $params = [] )
- Removes a specified database from a Data Catalog.
- DeleteDevEndpoint ( array $params = [] )
- Deletes a specified development endpoint.
- DeleteJob ( array $params = [] )
- Deletes a specified job definition.
- DeleteMLTransform ( array $params = [] )
- Deletes a Glue machine learning transform.
- DeletePartition ( array $params = [] )
- Deletes a specified partition.
- DeletePartitionIndex ( array $params = [] )
- Deletes a specified partition index from an existing table.
- DeleteRegistry ( array $params = [] )
- Delete the entire registry including schema and all of its versions.
- DeleteResourcePolicy ( array $params = [] )
- Deletes a specified policy.
- DeleteSchema ( array $params = [] )
- Deletes the entire schema set, including the schema set and all of its versions.
- DeleteSchemaVersions ( array $params = [] )
- Remove versions from the specified schema.
- DeleteSecurityConfiguration ( array $params = [] )
- Deletes a specified security configuration.
- DeleteSession ( array $params = [] )
- Deletes the session.
- DeleteTable ( array $params = [] )
- Removes a table definition from the Data Catalog.
- DeleteTableOptimizer ( array $params = [] )
- Deletes an optimizer and all associated metadata for a table.
- DeleteTableVersion ( array $params = [] )
- Deletes a specified version of a table.
- DeleteTrigger ( array $params = [] )
- Deletes a specified trigger.
- DeleteUsageProfile ( array $params = [] )
- Deletes the specified Glue usage profile.
- DeleteUserDefinedFunction ( array $params = [] )
- Deletes an existing function definition from the Data Catalog.
- DeleteWorkflow ( array $params = [] )
- Deletes a workflow.
- GetBlueprint ( array $params = [] )
- Retrieves the details of a blueprint.
- GetBlueprintRun ( array $params = [] )
- Retrieves the details of a blueprint run.
- GetBlueprintRuns ( array $params = [] )
- Retrieves the details of blueprint runs for a specified blueprint.
- GetCatalogImportStatus ( array $params = [] )
- Retrieves the status of a migration operation.
- GetClassifier ( array $params = [] )
- Retrieve a classifier by name.
- GetClassifiers ( array $params = [] )
- Lists all classifier objects in the Data Catalog.
- GetColumnStatisticsForPartition ( array $params = [] )
- Retrieves partition statistics of columns.
- GetColumnStatisticsForTable ( array $params = [] )
- Retrieves table statistics of columns.
- GetColumnStatisticsTaskRun ( array $params = [] )
- Get the associated metadata/information for a task run, given a task run ID.
- GetColumnStatisticsTaskRuns ( array $params = [] )
- Retrieves information about all runs associated with the specified table.
- GetColumnStatisticsTaskSettings ( array $params = [] )
- Gets settings for a column statistics task.
- GetConnection ( array $params = [] )
- Retrieves a connection definition from the Data Catalog.
- GetConnections ( array $params = [] )
- Retrieves a list of connection definitions from the Data Catalog.
- GetCrawler ( array $params = [] )
- Retrieves metadata for a specified crawler.
- GetCrawlerMetrics ( array $params = [] )
- Retrieves metrics about specified crawlers.
- GetCrawlers ( array $params = [] )
- Retrieves metadata for all crawlers defined in the customer account.
- GetCustomEntityType ( array $params = [] )
- Retrieves the details of a custom pattern by specifying its name.
- GetDataCatalogEncryptionSettings ( array $params = [] )
- Retrieves the security configuration for a specified catalog.
- GetDataQualityModel ( array $params = [] )
- Retrieve the training status of the model along with more information (CompletedOn, StartedOn, FailureReason).
- GetDataQualityModelResult ( array $params = [] )
- Retrieve a statistic's predictions for a given Profile ID.
- GetDataQualityResult ( array $params = [] )
- Retrieves the result of a data quality rule evaluation.
- GetDataQualityRuleRecommendationRun ( array $params = [] )
- Gets the specified recommendation run that was used to generate rules.
- GetDataQualityRuleset ( array $params = [] )
- Returns an existing ruleset by identifier or name.
- GetDataQualityRulesetEvaluationRun ( array $params = [] )
- Retrieves a specific run where a ruleset is evaluated against a data source.
- GetDatabase ( array $params = [] )
- Retrieves the definition of a specified database.
- GetDatabases ( array $params = [] )
- Retrieves all databases defined in a given Data Catalog.
- GetDataflowGraph ( array $params = [] )
- Transforms a Python script into a directed acyclic graph (DAG).
- GetDevEndpoint ( array $params = [] )
- Retrieves information about a specified development endpoint.
- GetDevEndpoints ( array $params = [] )
- Retrieves all the development endpoints in this Amazon Web Services account.
- GetJob ( array $params = [] )
- Retrieves an existing job definition.
- GetJobBookmark ( array $params = [] )
- Returns information on a job bookmark entry.
- GetJobRun ( array $params = [] )
- Retrieves the metadata for a given job run.
- GetJobRuns ( array $params = [] )
- Retrieves metadata for all runs of a given job definition.
- GetJobs ( array $params = [] )
- Retrieves all current job definitions.
- GetMLTaskRun ( array $params = [] )
- Gets details for a specific task run on a machine learning transform.
- GetMLTaskRuns ( array $params = [] )
- Gets a list of runs for a machine learning transform.
- GetMLTransform ( array $params = [] )
- Gets a Glue machine learning transform artifact and all its corresponding metadata.
- GetMLTransforms ( array $params = [] )
- Gets a sortable, filterable list of existing Glue machine learning transforms.
- GetMapping ( array $params = [] )
- Creates mappings.
- GetPartition ( array $params = [] )
- Retrieves information about a specified partition.
- GetPartitionIndexes ( array $params = [] )
- Retrieves the partition indexes associated with a table.
- GetPartitions ( array $params = [] )
- Retrieves information about the partitions in a table.
- GetPlan ( array $params = [] )
- Gets code to perform a specified mapping.
- GetRegistry ( array $params = [] )
- Describes the specified registry in detail.
- GetResourcePolicies ( array $params = [] )
- Retrieves the resource policies set on individual resources by Resource Access Manager during cross-account permission grants.
- GetResourcePolicy ( array $params = [] )
- Retrieves a specified resource policy.
- GetSchema ( array $params = [] )
- Describes the specified schema in detail.
- GetSchemaByDefinition ( array $params = [] )
- Retrieves a schema by the SchemaDefinition.
- GetSchemaVersion ( array $params = [] )
- Get the specified schema by its unique ID assigned when a version of the schema is created or registered.
- GetSchemaVersionsDiff ( array $params = [] )
- Fetches the schema version difference in the specified difference type between two stored schema versions in the Schema Registry.
- GetSecurityConfiguration ( array $params = [] )
- Retrieves a specified security configuration.
- GetSecurityConfigurations ( array $params = [] )
- Retrieves a list of all security configurations.
- GetSession ( array $params = [] )
- Retrieves the session.
- GetStatement ( array $params = [] )
- Retrieves the statement.
- GetTable ( array $params = [] )
- Retrieves the Table definition in a Data Catalog for a specified table.
- GetTableOptimizer ( array $params = [] )
- Returns the configuration of all optimizers associated with a specified table.
- GetTableVersion ( array $params = [] )
- Retrieves a specified version of a table.
- GetTableVersions ( array $params = [] )
- Retrieves a list of strings that identify available versions of a specified table.
- GetTables ( array $params = [] )
- Retrieves the definitions of some or all of the tables in a given Database.
- GetTags ( array $params = [] )
- Retrieves a list of tags associated with a resource.
- GetTrigger ( array $params = [] )
- Retrieves the definition of a trigger.
- GetTriggers ( array $params = [] )
- Gets all the triggers associated with a job.
- GetUnfilteredPartitionMetadata ( array $params = [] )
- Retrieves partition metadata from the Data Catalog that contains unfiltered metadata.
- GetUnfilteredPartitionsMetadata ( array $params = [] )
- Retrieves partition metadata from the Data Catalog that contains unfiltered metadata.
- GetUnfilteredTableMetadata ( array $params = [] )
- Allows a third-party analytical engine to retrieve unfiltered table metadata from the Data Catalog.
- GetUsageProfile ( array $params = [] )
- Retrieves information about the specified Glue usage profile.
- GetUserDefinedFunction ( array $params = [] )
- Retrieves a specified function definition from the Data Catalog.
- GetUserDefinedFunctions ( array $params = [] )
- Retrieves multiple function definitions from the Data Catalog.
- GetWorkflow ( array $params = [] )
- Retrieves resource metadata for a workflow.
- GetWorkflowRun ( array $params = [] )
- Retrieves the metadata for a given workflow run.
- GetWorkflowRunProperties ( array $params = [] )
- Retrieves the workflow run properties which were set during the run.
- GetWorkflowRuns ( array $params = [] )
- Retrieves metadata for all runs of a given workflow.
- ImportCatalogToGlue ( array $params = [] )
- Imports an existing Amazon Athena Data Catalog to Glue.
- ListBlueprints ( array $params = [] )
- Lists all the blueprint names in an account.
- ListColumnStatisticsTaskRuns ( array $params = [] )
- List all task runs for a particular account.
- ListCrawlers ( array $params = [] )
- Retrieves the names of all crawler resources in this Amazon Web Services account, or the resources with the specified tag.
- ListCrawls ( array $params = [] )
- Returns all the crawls of a specified crawler.
- ListCustomEntityTypes ( array $params = [] )
- Lists all the custom patterns that have been created.
- ListDataQualityResults ( array $params = [] )
- Returns all data quality execution results for your account.
- ListDataQualityRuleRecommendationRuns ( array $params = [] )
- Lists the recommendation runs meeting the filter criteria.
- ListDataQualityRulesetEvaluationRuns ( array $params = [] )
- Lists all the runs meeting the filter criteria, where a ruleset is evaluated against a data source.
- ListDataQualityRulesets ( array $params = [] )
- Returns a paginated list of rulesets for the specified list of Glue tables.
- ListDataQualityStatisticAnnotations ( array $params = [] )
- Retrieve annotations for a data quality statistic.
- ListDataQualityStatistics ( array $params = [] )
- Retrieves a list of data quality statistics.
- ListDevEndpoints ( array $params = [] )
- Retrieves the names of all DevEndpoint resources in this Amazon Web Services account, or the resources with the specified tag.
- ListJobs ( array $params = [] )
- Retrieves the names of all job resources in this Amazon Web Services account, or the resources with the specified tag.
- ListMLTransforms ( array $params = [] )
- Retrieves a sortable, filterable list of existing Glue machine learning transforms in this Amazon Web Services account, or the resources with the specified tag.
- ListRegistries ( array $params = [] )
- Returns a list of registries that you have created, with minimal registry information.
- ListSchemaVersions ( array $params = [] )
- Returns a list of schema versions that you have created, with minimal information.
- ListSchemas ( array $params = [] )
- Returns a list of schemas with minimal details.
- ListSessions ( array $params = [] )
- Retrieve a list of sessions.
- ListStatements ( array $params = [] )
- Lists statements for the session.
- ListTableOptimizerRuns ( array $params = [] )
- Lists the history of previous optimizer runs for a specific table.
- ListTriggers ( array $params = [] )
- Retrieves the names of all trigger resources in this Amazon Web Services account, or the resources with the specified tag.
- ListUsageProfiles ( array $params = [] )
- List all the Glue usage profiles.
- ListWorkflows ( array $params = [] )
- Lists names of workflows created in the account.
- PutDataCatalogEncryptionSettings ( array $params = [] )
- Sets the security configuration for a specified catalog.
- PutDataQualityProfileAnnotation ( array $params = [] )
- Annotate all datapoints for a Profile.
- PutResourcePolicy ( array $params = [] )
- Sets the Data Catalog resource policy for access control.
- PutSchemaVersionMetadata ( array $params = [] )
- Puts the metadata key value pair for a specified schema version ID.
- PutWorkflowRunProperties ( array $params = [] )
- Puts the specified workflow run properties for the given workflow run.
- QuerySchemaVersionMetadata ( array $params = [] )
- Queries for the schema version metadata information.
- RegisterSchemaVersion ( array $params = [] )
- Adds a new version to the existing schema.
- RemoveSchemaVersionMetadata ( array $params = [] )
- Removes a key value pair from the schema version metadata for the specified schema version ID.
- ResetJobBookmark ( array $params = [] )
- Resets a bookmark entry.
- ResumeWorkflowRun ( array $params = [] )
- Restarts selected nodes of a previous partially completed workflow run and resumes the workflow run.
- RunStatement ( array $params = [] )
- Executes the statement.
- SearchTables ( array $params = [] )
- Searches a set of tables based on properties in the table metadata as well as on the parent database.
- StartBlueprintRun ( array $params = [] )
- Starts a new run of the specified blueprint.
- StartColumnStatisticsTaskRun ( array $params = [] )
- Starts a column statistics task run, for a specified table and columns.
- StartColumnStatisticsTaskRunSchedule ( array $params = [] )
- Starts a column statistics task run schedule.
- StartCrawler ( array $params = [] )
- Starts a crawl using the specified crawler, regardless of what is scheduled.
- StartCrawlerSchedule ( array $params = [] )
- Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED.
- StartDataQualityRuleRecommendationRun ( array $params = [] )
- Starts a recommendation run that is used to generate rules when you don't know what rules to write.
- StartDataQualityRulesetEvaluationRun ( array $params = [] )
- Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table).
- StartExportLabelsTaskRun ( array $params = [] )
- Begins an asynchronous task to export all labeled data for a particular transform.
- StartImportLabelsTaskRun ( array $params = [] )
- Enables you to provide additional labels (examples of truth) to be used to teach the machine learning transform and improve its quality.
- StartJobRun ( array $params = [] )
- Starts a job run using a job definition.
- StartMLEvaluationTaskRun ( array $params = [] )
- Starts a task to estimate the quality of the transform.
- StartMLLabelingSetGenerationTaskRun ( array $params = [] )
- Starts the active learning workflow for your machine learning transform to improve the transform's quality by generating label sets and adding labels.
- StartTrigger ( array $params = [] )
- Starts an existing trigger.
- StartWorkflowRun ( array $params = [] )
- Starts a new run of the specified workflow.
- StopColumnStatisticsTaskRun ( array $params = [] )
- Stops a task run for the specified table.
- StopColumnStatisticsTaskRunSchedule ( array $params = [] )
- Stops a column statistics task run schedule.
- StopCrawler ( array $params = [] )
- If the specified crawler is running, stops the crawl.
- StopCrawlerSchedule ( array $params = [] )
- Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running.
- StopSession ( array $params = [] )
- Stops the session.
- StopTrigger ( array $params = [] )
- Stops a specified trigger.
- StopWorkflowRun ( array $params = [] )
- Stops the execution of the specified workflow run.
- TagResource ( array $params = [] )
- Adds tags to a resource.
- TestConnection ( array $params = [] )
- Tests a connection to a service to validate the service credentials that you provide.
- UntagResource ( array $params = [] )
- Removes tags from a resource.
- UpdateBlueprint ( array $params = [] )
- Updates a registered blueprint.
- UpdateClassifier ( array $params = [] )
- Modifies an existing classifier (a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field is present).
- UpdateColumnStatisticsForPartition ( array $params = [] )
- Creates or updates partition statistics of columns.
- UpdateColumnStatisticsForTable ( array $params = [] )
- Creates or updates table statistics of columns.
- UpdateColumnStatisticsTaskSettings ( array $params = [] )
- Updates settings for a column statistics task.
- UpdateConnection ( array $params = [] )
- Updates a connection definition in the Data Catalog.
- UpdateCrawler ( array $params = [] )
- Updates a crawler.
- UpdateCrawlerSchedule ( array $params = [] )
- Updates the schedule of a crawler using a cron expression.
- UpdateDataQualityRuleset ( array $params = [] )
- Updates the specified data quality ruleset.
- UpdateDatabase ( array $params = [] )
- Updates an existing database definition in a Data Catalog.
- UpdateDevEndpoint ( array $params = [] )
- Updates a specified development endpoint.
- UpdateJob ( array $params = [] )
- Updates an existing job definition.
- UpdateJobFromSourceControl ( array $params = [] )
- Synchronizes a job from the source control repository.
- UpdateMLTransform ( array $params = [] )
- Updates an existing machine learning transform.
- UpdatePartition ( array $params = [] )
- Updates a partition.
- UpdateRegistry ( array $params = [] )
- Updates an existing registry which is used to hold a collection of schemas.
- UpdateSchema ( array $params = [] )
- Updates the description, compatibility setting, or version checkpoint for a schema set.
- UpdateSourceControlFromJob ( array $params = [] )
- Synchronizes a job to the source control repository.
- UpdateTable ( array $params = [] )
- Updates a metadata table in the Data Catalog.
- UpdateTableOptimizer ( array $params = [] )
- Updates the configuration for an existing table optimizer.
- UpdateTrigger ( array $params = [] )
- Updates a trigger definition.
- UpdateUsageProfile ( array $params = [] )
- Updates a Glue usage profile.
- UpdateUserDefinedFunction ( array $params = [] )
- Updates an existing function definition in the Data Catalog.
- UpdateWorkflow ( array $params = [] )
- Updates an existing workflow.
Paginators
Paginators handle automatically iterating over paginated API results. Paginators are associated with specific API operations, and they accept the parameters that the corresponding API operation accepts. You can get a paginator from a client class using getPaginator($paginatorName, $operationParameters). This client supports the following paginators:
- GetBlueprintRuns
- GetClassifiers
- GetColumnStatisticsTaskRuns
- GetConnections
- GetCrawlerMetrics
- GetCrawlers
- GetDatabases
- GetDevEndpoints
- GetJobRuns
- GetJobs
- GetMLTaskRuns
- GetMLTransforms
- GetPartitionIndexes
- GetPartitions
- GetResourcePolicies
- GetSecurityConfigurations
- GetTableVersions
- GetTables
- GetTriggers
- GetUnfilteredPartitionsMetadata
- GetUserDefinedFunctions
- GetWorkflowRuns
- ListBlueprints
- ListColumnStatisticsTaskRuns
- ListCrawlers
- ListCustomEntityTypes
- ListDataQualityResults
- ListDataQualityRuleRecommendationRuns
- ListDataQualityRulesetEvaluationRuns
- ListDataQualityRulesets
- ListDevEndpoints
- ListJobs
- ListMLTransforms
- ListRegistries
- ListSchemaVersions
- ListSchemas
- ListSessions
- ListTableOptimizerRuns
- ListTriggers
- ListUsageProfiles
- ListWorkflows
- SearchTables
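The paginators listed above can be iterated with foreach, one page of results per iteration. The following is a minimal sketch that collects every database name through the GetDatabases paginator. The helper name collectDatabaseNames is hypothetical, and the 'DatabaseList' and 'Name' result keys are assumed from the GetDatabases result shape, which is not shown in this section.

```php
<?php
// Illustrative helper (not part of the SDK): walks all pages of the
// GetDatabases paginator and returns the database names.
// $client is expected to be an Aws\Glue\GlueClient.
function collectDatabaseNames($client, array $params = []): array
{
    $names = [];
    // Each iteration yields one page; the paginator requests the next
    // page automatically until none remain.
    foreach ($client->getPaginator('GetDatabases', $params) as $page) {
        // 'DatabaseList' and 'Name' are assumed result keys.
        foreach ($page['DatabaseList'] ?? [] as $database) {
            $names[] = $database['Name'];
        }
    }
    return $names;
}
```

Because the paginator accepts the same parameters as the underlying operation, a filter such as 'CatalogId' can be passed in $params unchanged.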
Operations
BatchCreatePartition
$result = $client->batchCreatePartition([/* ... */]);
$promise = $client->batchCreatePartitionAsync([/* ... */]);
Creates one or more partitions in a batch operation.
Parameter Syntax
$result = $client->batchCreatePartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionInputList' => [ // REQUIRED
        [
            'LastAccessTime' => <integer || string || DateTime>,
            'LastAnalyzedTime' => <integer || string || DateTime>,
            'Parameters' => ['<string>', ...],
            'StorageDescriptor' => [
                'AdditionalLocations' => ['<string>', ...],
                'BucketColumns' => ['<string>', ...],
                'Columns' => [
                    [
                        'Comment' => '<string>',
                        'Name' => '<string>', // REQUIRED
                        'Parameters' => ['<string>', ...],
                        'Type' => '<string>',
                    ],
                    // ...
                ],
                'Compressed' => true || false,
                'InputFormat' => '<string>',
                'Location' => '<string>',
                'NumberOfBuckets' => <integer>,
                'OutputFormat' => '<string>',
                'Parameters' => ['<string>', ...],
                'SchemaReference' => [
                    'SchemaId' => [
                        'RegistryName' => '<string>',
                        'SchemaArn' => '<string>',
                        'SchemaName' => '<string>',
                    ],
                    'SchemaVersionId' => '<string>',
                    'SchemaVersionNumber' => <integer>,
                ],
                'SerdeInfo' => [
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'SerializationLibrary' => '<string>',
                ],
                'SkewedInfo' => [
                    'SkewedColumnNames' => ['<string>', ...],
                    'SkewedColumnValueLocationMaps' => ['<string>', ...],
                    'SkewedColumnValues' => ['<string>', ...],
                ],
                'SortColumns' => [
                    [
                        'Column' => '<string>', // REQUIRED
                        'SortOrder' => <integer>, // REQUIRED
                    ],
                    // ...
                ],
                'StoredAsSubDirectories' => true || false,
            ],
            'Values' => ['<string>', ...],
        ],
        // ...
    ],
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
- Type: string
The ID of the catalog in which the partition is to be created. Currently, this should be the Amazon Web Services account ID.
- DatabaseName
- Required: Yes
- Type: string
The name of the metadata database in which the partition is to be created.
- PartitionInputList
- Required: Yes
- Type: Array of PartitionInput structures
A list of PartitionInput structures that define the partitions to be created.
- TableName
- Required: Yes
- Type: string
The name of the metadata table in which the partition is to be created.
Result Syntax
[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'PartitionValues' => ['<string>', ...],
        ],
        // ...
    ],
]
Result Details
Members
- Errors
- Type: Array of PartitionError structures
The errors encountered when trying to create the requested partitions.
Errors
- InvalidInputException:
The input provided was not valid.
- AlreadyExistsException:
A resource to be created or added already exists.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- InternalServiceException:
An internal service error occurred.
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
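As a rough sketch of how the BatchCreatePartition parameter and result shapes above fit together, the two helpers below build a minimal request and turn the returned Errors member into readable lines. Both helper names are hypothetical, as are the example bucket and database names. The StorageDescriptor here sets only Location, whereas a real partition will usually need more of the members listed in the parameter syntax.

```php
<?php
// Illustrative helper (not part of the SDK): builds the parameter array for
// batchCreatePartition(). $partitions maps a storage location to that
// partition's values, e.g.
//   ['s3://example-bucket/orders/2024/01/' => ['2024', '01']]
function buildBatchCreatePartitionParams(string $database, string $table, array $partitions): array
{
    $inputs = [];
    foreach ($partitions as $location => $values) {
        $inputs[] = [
            // Partition values are sent as strings.
            'Values' => array_values(array_map('strval', $values)),
            'StorageDescriptor' => ['Location' => $location],
        ];
    }
    return [
        'DatabaseName' => $database,        // REQUIRED
        'TableName' => $table,              // REQUIRED
        'PartitionInputList' => $inputs,    // REQUIRED
    ];
}

// Illustrative helper: converts the 'Errors' member of the result into one
// line per partition that could not be created.
function describePartitionErrors($result): array
{
    $lines = [];
    foreach ($result['Errors'] ?? [] as $error) {
        $lines[] = sprintf(
            '%s: %s (%s)',
            implode('/', $error['PartitionValues'] ?? []),
            $error['ErrorDetail']['ErrorMessage'] ?? '',
            $error['ErrorDetail']['ErrorCode'] ?? ''
        );
    }
    return $lines;
}
```

With a configured client, the helpers would be combined as $result = $client->batchCreatePartition(buildBatchCreatePartitionParams(...)) followed by describePartitionErrors($result). Checking Errors matters because a batch call can succeed overall while individual partitions fail.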
BatchDeleteConnection
$result = $client->batchDeleteConnection([/* ... */]);
$promise = $client->batchDeleteConnectionAsync([/* ... */]);
Deletes a list of connection definitions from the Data Catalog.
Parameter Syntax
$result = $client->batchDeleteConnection([
    'CatalogId' => '<string>',
    'ConnectionNameList' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- CatalogId
- Type: string
The ID of the Data Catalog in which the connections reside. If none is provided, the Amazon Web Services account ID is used by default.
- ConnectionNameList
- Required: Yes
- Type: Array of strings
A list of names of the connections to delete.
Result Syntax
[
    'Errors' => [
        '<NameString>' => [
            'ErrorCode' => '<string>',
            'ErrorMessage' => '<string>',
        ],
        // ...
    ],
    'Succeeded' => ['<string>', ...],
]
Result Details
Members
- Errors
- Type: Associative array of custom string keys (NameString) to ErrorDetail structures
A map of the names of connections that were not successfully deleted to error details.
- Succeeded
- Type: Array of strings
A list of names of the connection definitions that were successfully deleted.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
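Unlike the partition operations, the BatchDeleteConnection result keys its Errors member by connection name. The following sketch, with the hypothetical helper name summarizeConnectionDeletion, separates the names that were deleted from those that failed, using only the result members documented above.

```php
<?php
// Illustrative helper (not part of the SDK): splits a batchDeleteConnection()
// result into deleted names and a name => reason map for failures.
function summarizeConnectionDeletion($result): array
{
    $summary = ['deleted' => [], 'failed' => []];

    // 'Succeeded' is a plain list of connection names.
    foreach ($result['Succeeded'] ?? [] as $name) {
        $summary['deleted'][] = $name;
    }

    // 'Errors' is an associative array: connection name => ErrorDetail.
    foreach ($result['Errors'] ?? [] as $name => $detail) {
        $summary['failed'][$name] =
            ($detail['ErrorCode'] ?? 'Unknown') . ': ' . ($detail['ErrorMessage'] ?? '');
    }

    return $summary;
}
```

A caller would pass the return value of $client->batchDeleteConnection([...]) directly, since the SDK result object supports array-style access.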
BatchDeletePartition
$result = $client->batchDeletePartition([/* ... */]);
$promise = $client->batchDeletePartitionAsync([/* ... */]);
Deletes one or more partitions in a batch operation.
Parameter Syntax
$result = $client->batchDeletePartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionsToDelete' => [ // REQUIRED
        [
            'Values' => ['<string>', ...], // REQUIRED
        ],
        // ...
    ],
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
- Type: string
The ID of the Data Catalog where the partition to be deleted resides. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
- Required: Yes
- Type: string
The name of the catalog database in which the table in question resides.
- PartitionsToDelete
- Required: Yes
- Type: Array of PartitionValueList structures
A list of PartitionValueList structures that define the partitions to be deleted.
- TableName
- Required: Yes
- Type: string
The name of the table that contains the partitions to be deleted.
Result Syntax
[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'PartitionValues' => ['<string>', ...],
        ],
        // ...
    ],
]
Result Details
Members
- Errors
- Type: Array of PartitionError structures
The errors encountered when trying to delete the requested partitions.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
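As a rough sketch of how the `PartitionsToDelete` parameter might be assembled, the helper below splits a list of partition value lists into several request arrays. The function name, the sample database and table names, and the batch size are all hypothetical; the batch size is passed in by the caller because this section does not state a per-call limit.

```php
<?php
// Hypothetical helper: builds one parameter array per batch for batchDeletePartition.
// $partitionValues is a list of value lists, e.g. [['2023', '01'], ['2023', '02']].
function buildPartitionDeleteRequests(
    string $database,
    string $table,
    array $partitionValues,
    int $batchSize
): array {
    $requests = [];
    foreach (array_chunk($partitionValues, $batchSize) as $chunk) {
        $requests[] = [
            'DatabaseName' => $database,
            'TableName' => $table,
            // Each entry is a PartitionValueList structure: ['Values' => [...]].
            'PartitionsToDelete' => array_map(
                static fn(array $values): array => ['Values' => array_map('strval', $values)],
                $chunk
            ),
        ];
    }
    return $requests;
}

// Illustrative input only.
$requests = buildPartitionDeleteRequests(
    'sales_db',
    'orders',
    [['2023', '01'], ['2023', '02'], ['2023', '03']],
    2
);
// Each element could then be sent with $client->batchDeletePartition($params),
// checking $result['Errors'] after every call.
```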
BatchDeleteTable
$result = $client->batchDeleteTable([/* ... */]);
$promise = $client->batchDeleteTableAsync([/* ... */]);
Deletes multiple tables at once.
After completing this operation, you no longer have access to the table versions and partitions that belong to the deleted table. Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.
To ensure the immediate deletion of all related resources, before calling BatchDeleteTable, use DeleteTableVersion or BatchDeleteTableVersion, and DeletePartition or BatchDeletePartition, to delete any resources that belong to the table.
Parameter Syntax
$result = $client->batchDeleteTable([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'TablesToDelete' => ['<string>', ...], // REQUIRED
    'TransactionId' => '<string>',
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the table resides. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database in which the tables to delete reside. For Hive compatibility, this name is entirely lowercase.
- TablesToDelete
-
- Required: Yes
- Type: Array of strings
A list of the tables to delete.
- TransactionId
-
- Type: string
The transaction ID at which to delete the table contents.
Result Syntax
[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'TableName' => '<string>',
        ],
        // ...
    ],
]
Result Details
Members
- Errors
-
- Type: Array of TableError structures
A list of errors encountered in attempting to delete the specified tables.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
- ResourceNotReadyException:
A resource was not ready for a transaction.
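The cleanup order described for BatchDeleteTable (table versions and partitions first, then the table) could be sketched as a small planning function. This is only an illustration: `planTableDeletion` is a hypothetical name, the version IDs and partition values are assumed to have been collected by the caller beforehand, and per-call size limits are not handled.

```php
<?php
// Hypothetical helper: returns an ordered list of [operation, params] pairs.
// Dependent resources come first so nothing is left for asynchronous cleanup.
function planTableDeletion(
    string $database,
    string $table,
    array $versionIds,
    array $partitionValues
): array {
    $plan = [];
    if ($versionIds !== []) {
        $plan[] = ['batchDeleteTableVersion', [
            'DatabaseName' => $database,
            'TableName' => $table,
            'VersionIds' => array_map('strval', array_values($versionIds)),
        ]];
    }
    if ($partitionValues !== []) {
        $plan[] = ['batchDeletePartition', [
            'DatabaseName' => $database,
            'TableName' => $table,
            'PartitionsToDelete' => array_map(
                static fn(array $values): array => ['Values' => $values],
                array_values($partitionValues)
            ),
        ]];
    }
    // The table itself is always deleted last.
    $plan[] = ['batchDeleteTable', [
        'DatabaseName' => $database,
        'TablesToDelete' => [$table],
    ]];
    return $plan;
}

// Illustrative input only.
$plan = planTableDeletion('sales_db', 'orders', ['1', '2'], [['2023', '01']]);
// With a real client, each step could be sent as: $client->{$operation}($params);
```

Each step's result carries its own `Errors` list, so a caller would presumably inspect it before moving on to the next step.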
BatchDeleteTableVersion
$result = $client->batchDeleteTableVersion([/* ... */]);
$promise = $client->batchDeleteTableVersionAsync([/* ... */]);
Deletes a specified batch of versions of a table.
Parameter Syntax
$result = $client->batchDeleteTableVersion([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'VersionIds' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the tables reside. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.
- TableName
-
- Required: Yes
- Type: string
The name of the table. For Hive compatibility, this name is entirely lowercase.
- VersionIds
-
- Required: Yes
- Type: Array of strings
A list of the IDs of versions to be deleted. A VersionId is a string representation of an integer. Each version is incremented by 1.
Result Syntax
[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'TableName' => '<string>',
            'VersionId' => '<string>',
        ],
        // ...
    ],
]
Result Details
Members
- Errors
-
- Type: Array of TableVersionError structures
A list of errors encountered while trying to delete the specified table versions.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
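Because a `VersionId` is a string that represents an integer, IDs need to be compared numerically rather than as text (otherwise `'10'` would sort before `'2'`). The sketch below picks the oldest versions for deletion while keeping a number of the highest-numbered ones. The helper name and the retention policy are hypothetical, and it assumes that a higher number means a newer version.

```php
<?php
// Hypothetical helper: returns the version IDs to delete, keeping the $keep highest ones.
function selectOldVersionIds(array $versionIds, int $keep): array
{
    // Convert to integers so that sorting is numeric, not lexicographic.
    $ids = array_map('intval', $versionIds);
    sort($ids, SORT_NUMERIC);
    $old = array_slice($ids, 0, max(0, count($ids) - $keep));
    // The VersionIds parameter expects strings again.
    return array_map('strval', $old);
}

// Illustrative input only.
$toDelete = selectOldVersionIds(['1', '2', '10', '3'], 2);
// $toDelete could then be passed as 'VersionIds' to $client->batchDeleteTableVersion().
```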
BatchGetBlueprints
$result = $client->batchGetBlueprints([/* ... */]);
$promise = $client->batchGetBlueprintsAsync([/* ... */]);
Retrieves information about a list of blueprints.
Parameter Syntax
$result = $client->batchGetBlueprints([
    'IncludeBlueprint' => true || false,
    'IncludeParameterSpec' => true || false,
    'Names' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- IncludeBlueprint
-
- Type: boolean
Specifies whether or not to include the blueprint in the response.
- IncludeParameterSpec
-
- Type: boolean
Specifies whether or not to include the parameters, as a JSON string, for the blueprint in the response.
- Names
-
- Required: Yes
- Type: Array of strings
A list of blueprint names.
Result Syntax
[
    'Blueprints' => [
        [
            'BlueprintLocation' => '<string>',
            'BlueprintServiceLocation' => '<string>',
            'CreatedOn' => <DateTime>,
            'Description' => '<string>',
            'ErrorMessage' => '<string>',
            'LastActiveDefinition' => [
                'BlueprintLocation' => '<string>',
                'BlueprintServiceLocation' => '<string>',
                'Description' => '<string>',
                'LastModifiedOn' => <DateTime>,
                'ParameterSpec' => '<string>',
            ],
            'LastModifiedOn' => <DateTime>,
            'Name' => '<string>',
            'ParameterSpec' => '<string>',
            'Status' => 'CREATING|ACTIVE|UPDATING|FAILED',
        ],
        // ...
    ],
    'MissingBlueprints' => ['<string>', ...],
]
Result Details
Members
- Blueprints
-
- Type: Array of Blueprint structures
Returns a list of blueprints as a Blueprints object.
- MissingBlueprints
-
- Type: Array of strings
Returns a list of BlueprintNames that were not found.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
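Since `ParameterSpec` is returned as a JSON string, a caller will usually want to decode it. The following is a minimal sketch under that assumption; `decodeBlueprintParameterSpecs` is a hypothetical helper, and the JSON in the sample is made up for illustration and does not reflect the real specification format.

```php
<?php
// Hypothetical helper: maps blueprint name => decoded ParameterSpec (or null).
// $result may be the result object returned by the client or a plain array of the same shape.
function decodeBlueprintParameterSpecs($result): array
{
    $specs = [];
    foreach ($result['Blueprints'] ?? [] as $blueprint) {
        $raw = $blueprint['ParameterSpec'] ?? null;
        // The field may be missing, for example when IncludeParameterSpec was not requested.
        $decoded = is_string($raw) ? json_decode($raw, true) : null;
        $specs[$blueprint['Name']] = is_array($decoded) ? $decoded : null;
    }
    return $specs;
}

// Sample data for illustration only.
$sample = [
    'Blueprints' => [
        ['Name' => 'etl-blueprint', 'ParameterSpec' => '{"InputPath":{"type":"String"}}'],
        ['Name' => 'no-spec'],
    ],
    'MissingBlueprints' => ['unknown-blueprint'],
];
$specs = decodeBlueprintParameterSpecs($sample);
```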
BatchGetCrawlers
$result = $client->batchGetCrawlers([/* ... */]);
$promise = $client->batchGetCrawlersAsync([/* ... */]);
Returns a list of resource metadata for a given list of crawler names. After calling the ListCrawlers operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
Parameter Syntax
$result = $client->batchGetCrawlers([
    'CrawlerNames' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- CrawlerNames
-
- Required: Yes
- Type: Array of strings
A list of crawler names, which might be the names returned from the ListCrawlers operation.
Result Syntax
[ 'Crawlers' => [ [ 'Classifiers' => ['<string>', ...], 'Configuration' => '<string>', 'CrawlElapsedTime' => <integer>, 'CrawlerSecurityConfiguration' => '<string>', 'CreationTime' => <DateTime>, 'DatabaseName' => '<string>', 'Description' => '<string>', 'LakeFormationConfiguration' => [ 'AccountId' => '<string>', 'UseLakeFormationCredentials' => true || false, ], 'LastCrawl' => [ 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'MessagePrefix' => '<string>', 'StartTime' => <DateTime>, 'Status' => 'SUCCEEDED|CANCELLED|FAILED', ], 'LastUpdated' => <DateTime>, 'LineageConfiguration' => [ 'CrawlerLineageSettings' => 'ENABLE|DISABLE', ], 'Name' => '<string>', 'RecrawlPolicy' => [ 'RecrawlBehavior' => 'CRAWL_EVERYTHING|CRAWL_NEW_FOLDERS_ONLY|CRAWL_EVENT_MODE', ], 'Role' => '<string>', 'Schedule' => [ 'ScheduleExpression' => '<string>', 'State' => 'SCHEDULED|NOT_SCHEDULED|TRANSITIONING', ], 'SchemaChangePolicy' => [ 'DeleteBehavior' => 'LOG|DELETE_FROM_DATABASE|DEPRECATE_IN_DATABASE', 'UpdateBehavior' => 'LOG|UPDATE_IN_DATABASE', ], 'State' => 'READY|RUNNING|STOPPING', 'TablePrefix' => '<string>', 'Targets' => [ 'CatalogTargets' => [ [ 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Tables' => ['<string>', ...], ], // ... ], 'DeltaTargets' => [ [ 'ConnectionName' => '<string>', 'CreateNativeDeltaTable' => true || false, 'DeltaTables' => ['<string>', ...], 'WriteManifest' => true || false, ], // ... ], 'DynamoDBTargets' => [ [ 'Path' => '<string>', 'scanAll' => true || false, 'scanRate' => <float>, ], // ... ], 'HudiTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... ], 'IcebergTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... 
], 'JdbcTargets' => [ [ 'ConnectionName' => '<string>', 'EnableAdditionalMetadata' => ['<string>', ...], 'Exclusions' => ['<string>', ...], 'Path' => '<string>', ], // ... ], 'MongoDBTargets' => [ [ 'ConnectionName' => '<string>', 'Path' => '<string>', 'ScanAll' => true || false, ], // ... ], 'S3Targets' => [ [ 'ConnectionName' => '<string>', 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Exclusions' => ['<string>', ...], 'Path' => '<string>', 'SampleSize' => <integer>, ], // ... ], ], 'Version' => <integer>, ], // ... ], 'CrawlersNotFound' => ['<string>', ...], ]
Result Details
Members
- Crawlers
-
- Type: Array of Crawler structures
A list of crawler definitions.
- CrawlersNotFound
-
- Type: Array of strings
A list of names of crawlers that were not found.
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
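The "list names, then batch-get details" flow described for this operation might look like the sketch below. To keep it runnable without the SDK, the two service calls are passed in as callables. With a real client they could be `fn(array $p) => $client->listCrawlers($p)` and `fn(array $p) => $client->batchGetCrawlers($p)`. The helper name, the default batch size, and the fake callables are hypothetical. The `CrawlerNames` and `NextToken` fields read from the list call are assumed from the ListCrawlers operation, which is not documented in this section.

```php
<?php
// Hypothetical helper: pages through crawler names, then fetches details in batches.
function fetchAllCrawlers(callable $listCrawlers, callable $batchGetCrawlers, int $batchSize = 25): array
{
    $names = [];
    $params = [];
    do {
        $page = $listCrawlers($params);
        $names = array_merge($names, $page['CrawlerNames'] ?? []);
        $token = $page['NextToken'] ?? null;
        $params = ['NextToken' => $token];
    } while ($token !== null && $token !== '');

    $found = [];
    $notFound = [];
    foreach (array_chunk($names, $batchSize) as $chunk) {
        $result = $batchGetCrawlers(['CrawlerNames' => $chunk]);
        foreach ($result['Crawlers'] ?? [] as $crawler) {
            $found[$crawler['Name']] = $crawler;
        }
        $notFound = array_merge($notFound, $result['CrawlersNotFound'] ?? []);
    }
    return ['Crawlers' => $found, 'CrawlersNotFound' => $notFound];
}

// Fake callables standing in for the service, for illustration only.
$fakeList = function (array $params): array {
    return empty($params['NextToken'])
        ? ['CrawlerNames' => ['a', 'b'], 'NextToken' => 't1']
        : ['CrawlerNames' => ['c']];
};
$fakeBatchGet = function (array $params): array {
    $crawlers = [];
    $missing = [];
    foreach ($params['CrawlerNames'] as $name) {
        if ($name === 'b') {
            $missing[] = $name;
        } else {
            $crawlers[] = ['Name' => $name, 'State' => 'READY'];
        }
    }
    return ['Crawlers' => $crawlers, 'CrawlersNotFound' => $missing];
};
$all = fetchAllCrawlers($fakeList, $fakeBatchGet, 2);
```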
BatchGetCustomEntityTypes
$result = $client->batchGetCustomEntityTypes([/* ... */]);
$promise = $client->batchGetCustomEntityTypesAsync([/* ... */]);
Retrieves the details for the custom patterns specified by a list of names.
Parameter Syntax
$result = $client->batchGetCustomEntityTypes([
    'Names' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- Names
-
- Required: Yes
- Type: Array of strings
A list of names of the custom patterns that you want to retrieve.
Result Syntax
[
    'CustomEntityTypes' => [
        [
            'ContextWords' => ['<string>', ...],
            'Name' => '<string>',
            'RegexString' => '<string>',
        ],
        // ...
    ],
    'CustomEntityTypesNotFound' => ['<string>', ...],
]
Result Details
Members
- CustomEntityTypes
-
- Type: Array of CustomEntityType structures
A list of CustomEntityType objects representing the custom patterns that have been created.
- CustomEntityTypesNotFound
-
- Type: Array of strings
A list of the names of custom patterns that were not found.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
BatchGetDataQualityResult
$result = $client->batchGetDataQualityResult([/* ... */]);
$promise = $client->batchGetDataQualityResultAsync([/* ... */]);
Retrieves a list of data quality results for the specified result IDs.
Parameter Syntax
$result = $client->batchGetDataQualityResult([
    'ResultIds' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- ResultIds
-
- Required: Yes
- Type: Array of strings
A list of unique result IDs for the data quality results.
Result Syntax
[ 'Results' => [ [ 'AnalyzerResults' => [ [ 'Description' => '<string>', 'EvaluatedMetrics' => [<float>, ...], 'EvaluationMessage' => '<string>', 'Name' => '<string>', ], // ... ], 'CompletedOn' => <DateTime>, 'DataSource' => [ 'GlueTable' => [ 'AdditionalOptions' => ['<string>', ...], 'CatalogId' => '<string>', 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], ], 'EvaluationContext' => '<string>', 'JobName' => '<string>', 'JobRunId' => '<string>', 'Observations' => [ [ 'Description' => '<string>', 'MetricBasedObservation' => [ 'MetricName' => '<string>', 'MetricValues' => [ 'ActualValue' => <float>, 'ExpectedValue' => <float>, 'LowerLimit' => <float>, 'UpperLimit' => <float>, ], 'NewRules' => ['<string>', ...], 'StatisticId' => '<string>', ], ], // ... ], 'ProfileId' => '<string>', 'ResultId' => '<string>', 'RuleResults' => [ [ 'Description' => '<string>', 'EvaluatedMetrics' => [<float>, ...], 'EvaluatedRule' => '<string>', 'EvaluationMessage' => '<string>', 'Name' => '<string>', 'Result' => 'PASS|FAIL|ERROR', ], // ... ], 'RulesetEvaluationRunId' => '<string>', 'RulesetName' => '<string>', 'Score' => <float>, 'StartedOn' => <DateTime>, ], // ... ], 'ResultsNotFound' => ['<string>', ...], ]
Result Details
Members
- Results
-
- Required: Yes
- Type: Array of DataQualityResult structures
A list of DataQualityResult objects representing the data quality results.
- ResultsNotFound
-
- Type: Array of strings
A list of result IDs for which results were not found.
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
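One way to consume the nested result is to reduce each entry to its score and a count of rule outcomes. The sketch below assumes the `Results`, `RuleResults`, `Result`, `Score` and `ResultId` keys shown in the result syntax; the helper name and the sample values are hypothetical.

```php
<?php
// Hypothetical helper: maps ResultId => ['Score' => ..., 'Counts' => [...]].
// $result may be the result object returned by the client or a plain array of the same shape.
function summarizeDataQuality($result): array
{
    $summary = [];
    foreach ($result['Results'] ?? [] as $item) {
        // The three outcomes listed in the result syntax.
        $counts = ['PASS' => 0, 'FAIL' => 0, 'ERROR' => 0];
        foreach ($item['RuleResults'] ?? [] as $rule) {
            $outcome = $rule['Result'] ?? '';
            if (array_key_exists($outcome, $counts)) {
                $counts[$outcome]++;
            }
        }
        $summary[$item['ResultId']] = [
            'Score' => $item['Score'] ?? null,
            'Counts' => $counts,
        ];
    }
    return $summary;
}

// Sample data for illustration only; not output captured from the service.
$sample = [
    'Results' => [
        [
            'ResultId' => 'dqresult-1',
            'Score' => 0.5,
            'RuleResults' => [
                ['Name' => 'Rule_1', 'Result' => 'PASS'],
                ['Name' => 'Rule_2', 'Result' => 'FAIL'],
            ],
        ],
    ],
    'ResultsNotFound' => ['dqresult-2'],
];
$summary = summarizeDataQuality($sample);
```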
BatchGetDevEndpoints
$result = $client->batchGetDevEndpoints([/* ... */]);
$promise = $client->batchGetDevEndpointsAsync([/* ... */]);
Returns a list of resource metadata for a given list of development endpoint names. After calling the ListDevEndpoints operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
Parameter Syntax
$result = $client->batchGetDevEndpoints([
    'DevEndpointNames' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- DevEndpointNames
-
- Required: Yes
- Type: Array of strings
The list of DevEndpoint names, which might be the names returned from the ListDevEndpoints operation.
Result Syntax
[ 'DevEndpoints' => [ [ 'Arguments' => ['<string>', ...], 'AvailabilityZone' => '<string>', 'CreatedTimestamp' => <DateTime>, 'EndpointName' => '<string>', 'ExtraJarsS3Path' => '<string>', 'ExtraPythonLibsS3Path' => '<string>', 'FailureReason' => '<string>', 'GlueVersion' => '<string>', 'LastModifiedTimestamp' => <DateTime>, 'LastUpdateStatus' => '<string>', 'NumberOfNodes' => <integer>, 'NumberOfWorkers' => <integer>, 'PrivateAddress' => '<string>', 'PublicAddress' => '<string>', 'PublicKey' => '<string>', 'PublicKeys' => ['<string>', ...], 'RoleArn' => '<string>', 'SecurityConfiguration' => '<string>', 'SecurityGroupIds' => ['<string>', ...], 'Status' => '<string>', 'SubnetId' => '<string>', 'VpcId' => '<string>', 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', 'YarnEndpointAddress' => '<string>', 'ZeppelinRemoteSparkInterpreterPort' => <integer>, ], // ... ], 'DevEndpointsNotFound' => ['<string>', ...], ]
Result Details
Members
- DevEndpoints
-
- Type: Array of DevEndpoint structures
A list of DevEndpoint definitions.
- DevEndpointsNotFound
-
- Type: Array of strings
A list of DevEndpoints not found.
Errors
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
BatchGetJobs
$result = $client->batchGetJobs([/* ... */]);
$promise = $client->batchGetJobsAsync([/* ... */]);
Returns a list of resource metadata for a given list of job names. After calling the ListJobs operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
Parameter Syntax
$result = $client->batchGetJobs([
    'JobNames' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- JobNames
-
- Required: Yes
- Type: Array of strings
A list of job names, which might be the names returned from the ListJobs operation.
Result Syntax
[ 'Jobs' => [ [ 'AllocatedCapacity' => <integer>, 'CodeGenConfigurationNodes' => [ '<NodeId>' => [ 'Aggregate' => [ 'Aggs' => [ [ 'AggFunc' => 'avg|countDistinct|count|first|last|kurtosis|max|min|skewness|stddev_samp|stddev_pop|sum|sumDistinct|var_samp|var_pop', 'Column' => ['<string>', ...], ], // ... ], 'Groups' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'AmazonRedshiftSource' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', ], 'AmazonRedshiftTarget' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'ApplyMapping' => [ 'Inputs' => ['<string>', ...], 'Mapping' => [ [ 'Children' => [...], // RECURSIVE 'Dropped' => true || false, 'FromPath' => ['<string>', ...], 'FromType' => '<string>', 'ToKey' => '<string>', 'ToType' => '<string>', ], // ... 
], 'Name' => '<string>', ], 'AthenaConnectorSource' => [ 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'SchemaName' => '<string>', ], 'CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'CatalogKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'Table' => '<string>', 'WindowSize' => <integer>, ], 'CatalogKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' 
=> true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <DateTime>, 'StreamArn' => '<string>', 'StreamName' => '<string>', ], 'Table' => '<string>', 'WindowSize' => <integer>, ], 'CatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'CatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Table' => '<string>', ], 'ConnectorDataSource' => [ 'ConnectionType' => '<string>', 'Data' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'ConnectorDataTarget' => [ 'ConnectionType' => '<string>', 'Data' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'CustomCode' => [ 'ClassName' => '<string>', 'Code' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... 
], ], 'DirectJDBCSource' => [ 'ConnectionName' => '<string>', 'ConnectionType' => 'sqlserver|mysql|oracle|postgresql|redshift', 'Database' => '<string>', 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', ], 'DirectKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'WindowSize' => <integer>, ], 'DirectKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' => true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <DateTime>, 'StreamArn' => '<string>', 'StreamName' => 
'<string>', ], 'WindowSize' => <integer>, ], 'DropDuplicates' => [ 'Columns' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'DropFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... ], ], 'DropNullFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'NullCheckBoxList' => [ 'IsEmpty' => true || false, 'IsNegOne' => true || false, 'IsNullString' => true || false, ], 'NullTextList' => [ [ 'Datatype' => [ 'Id' => '<string>', 'Label' => '<string>', ], 'Value' => '<string>', ], // ... ], ], 'DynamicTransform' => [ 'FunctionName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Parameters' => [ [ 'IsOptional' => true || false, 'ListType' => 'str|int|float|complex|bool|list|null', 'Name' => '<string>', 'Type' => 'str|int|float|complex|bool|list|null', 'ValidationMessage' => '<string>', 'ValidationRule' => '<string>', 'Value' => ['<string>', ...], ], // ... 
], 'Path' => '<string>', 'TransformName' => '<string>', 'Version' => '<string>', ], 'DynamoDBCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'EvaluateDataQuality' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Output' => 'PrimaryInput|EvaluationResults', 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'EvaluateDataQualityMultiFrame' => [ 'AdditionalDataSources' => ['<string>', ...], 'AdditionalOptions' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'FillMissingValues' => [ 'FilledPath' => '<string>', 'ImputedPath' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'Filter' => [ 'Filters' => [ [ 'Negated' => true || false, 'Operation' => 'EQ|LT|GT|LTE|GTE|REGEX|ISNULL', 'Values' => [ [ 'Type' => 'COLUMNEXTRACTED|CONSTANT', 'Value' => ['<string>', ...], ], // ... ], ], // ... ], 'Inputs' => ['<string>', ...], 'LogicalOperator' => 'AND|OR', 'Name' => '<string>', ], 'GovernedCatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', 'Name' => '<string>', 'PartitionPredicate' => '<string>', 'Table' => '<string>', ], 'GovernedCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'JDBCConnectorSource' => [ 'AdditionalOptions' => [ 'DataTypeMapping' => ['<string>', ...], 'FilterPredicate' => '<string>', 'JobBookmarkKeys' => ['<string>', ...], 'JobBookmarkKeysSortOrder' => '<string>', 'LowerBound' => <integer>, 'NumPartitions' => <integer>, 'PartitionColumn' => '<string>', 'UpperBound' => <integer>, ], 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Query' => '<string>', ], 'JDBCConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'Join' => [ 'Columns' => [ [ 'From' => '<string>', 'Keys' => [ ['<string>', ...], // ... ], ], // ... ], 'Inputs' => ['<string>', ...], 'JoinType' => 'equijoin|left|right|outer|leftsemi|leftanti', 'Name' => '<string>', ], 'Merge' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PrimaryKeys' => [ ['<string>', ...], // ... 
], 'Source' => '<string>', ], 'MicrosoftSQLServerCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'MicrosoftSQLServerCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'MySQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'MySQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'OracleSQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'OracleSQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'PIIDetection' => [ 'EntityTypesToDetect' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'MaskValue' => '<string>', 'Name' => '<string>', 'OutputColumnName' => '<string>', 'PiiType' => 'RowAudit|RowMasking|ColumnAudit|ColumnMasking', 'SampleFraction' => <float>, 'ThresholdFraction' => <float>, ], 'PostgreSQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'PostgreSQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'Recipe' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'RecipeReference' => [ 'RecipeArn' => '<string>', 'RecipeVersion' => '<string>', ], 'RecipeSteps' => [ [ 'Action' => [ 'Operation' => '<string>', 'Parameters' => ['<string>', ...], ], 'ConditionExpressions' => [ [ 'Condition' => '<string>', 'TargetColumn' => '<string>', 'Value' => '<string>', ], // ... ], ], // ... 
], ], 'RedshiftSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', 'TmpDirIAMRole' => '<string>', ], 'RedshiftTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', 'TmpDirIAMRole' => '<string>', 'UpsertRedshiftOptions' => [ 'ConnectionName' => '<string>', 'TableLocation' => '<string>', 'UpsertKeys' => ['<string>', ...], ], ], 'RelationalCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'RenameField' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'SourcePath' => ['<string>', ...], 'TargetPath' => ['<string>', ...], ], 'S3CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'S3CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'S3CatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', 'Name' => '<string>', 'PartitionPredicate' => '<string>', 'Table' => '<string>', ], 'S3CatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3CsvSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Escaper' => '<string>', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', 'OptimizePerformance' => true || false, 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'QuoteChar' => 'quote|quillemet|single_quote|disabled', 'Recurse' => true || false, 'Separator' => 'comma|ctrla|pipe|semicolon|tab', 'SkipFirst' => true || false, 'WithHeader' => true || false, 'WriteHeader' => true || false, ], 'S3DeltaCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3DeltaDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Compression' => 'uncompressed|snappy', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3DeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], ], 'S3DirectTarget' => [ 'Compression' => '<string>', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3GlueParquetTarget' => [ 'Compression' => 'snappy|lzo|gzip|uncompressed|none', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3HudiDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Compression' => 'gzip|lzo|uncompressed|snappy', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], ], 'S3JsonSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'JsonPath' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'Recurse' => true || false, ], 'S3ParquetSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'snappy|lzo|gzip|uncompressed|none', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'Recurse' => true || false, ], 'SelectFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... 
], ], 'SelectFromCollection' => [ 'Index' => <integer>, 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'SnowflakeSource' => [ 'Data' => [ 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SnowflakeTarget' => [ 'Data' => [ 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'SparkConnectorSource' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkSQL' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'SqlAliases' => [ [ 'Alias' => '<string>', 'From' => '<string>', ], // ... ], 'SqlQuery' => '<string>', ], 'Spigot' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Path' => '<string>', 'Prob' => <float>, 'Topk' => <integer>, ], 'SplitFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... ], ], 'Union' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'UnionType' => 'ALL|DISTINCT', ], ], // ... 
], 'Command' => [ 'Name' => '<string>', 'PythonVersion' => '<string>', 'Runtime' => '<string>', 'ScriptLocation' => '<string>', ], 'Connections' => [ 'Connections' => ['<string>', ...], ], 'CreatedOn' => <DateTime>, 'DefaultArguments' => ['<string>', ...], 'Description' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionProperty' => [ 'MaxConcurrentRuns' => <integer>, ], 'GlueVersion' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobRunQueuingEnabled' => true || false, 'LastModifiedOn' => <DateTime>, 'LogUri' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'MaxRetries' => <integer>, 'Name' => '<string>', 'NonOverridableArguments' => ['<string>', ...], 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'ProfileName' => '<string>', 'Role' => '<string>', 'SecurityConfiguration' => '<string>', 'SourceControlDetails' => [ 'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER', 'AuthToken' => '<string>', 'Branch' => '<string>', 'Folder' => '<string>', 'LastCommitId' => '<string>', 'Owner' => '<string>', 'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT', 'Repository' => '<string>', ], 'Timeout' => <integer>, 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], // ... ], 'JobsNotFound' => ['<string>', ...], ]
Result Details
Members
- Jobs
-
- Type: Array of Job structures
A list of job definitions.
- JobsNotFound
-
- Type: Array of strings
A list of names of jobs not found.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
BatchGetPartition
$result = $client->batchGetPartition([/* ... */]);
$promise = $client->batchGetPartitionAsync([/* ... */]);
Retrieves partitions in a batch request.
Parameter Syntax
$result = $client->batchGetPartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionsToGet' => [ // REQUIRED
        [
            'Values' => ['<string>', ...], // REQUIRED
        ],
        // ...
    ],
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partitions reside.
- PartitionsToGet
-
- Required: Yes
- Type: Array of PartitionValueList structures
A list of partition values identifying the partitions to retrieve.
- TableName
-
- Required: Yes
- Type: string
The name of the partitions' table.
Result Syntax
[ 'Partitions' => [ [ 'CatalogId' => '<string>', 'CreationTime' => <DateTime>, 'DatabaseName' => '<string>', 'LastAccessTime' => <DateTime>, 'LastAnalyzedTime' => <DateTime>, 'Parameters' => ['<string>', ...], 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', 'SortOrder' => <integer>, ], // ... ], 'StoredAsSubDirectories' => true || false, ], 'TableName' => '<string>', 'Values' => ['<string>', ...], ], // ... ], 'UnprocessedKeys' => [ [ 'Values' => ['<string>', ...], ], // ... ], ]
Result Details
Members
- Partitions
-
- Type: Array of Partition structures
A list of the requested partitions.
- UnprocessedKeys
-
- Type: Array of PartitionValueList structures
A list of the partition values in the request for which partitions were not returned.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- GlueEncryptionException:
An encryption operation failed.
- InvalidStateException:
An error that indicates your data is in an invalid state.
- FederationSourceException:
A federation source failed.
- FederationSourceRetryableException:
A federation source failed, but the operation may be retried.
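The request and result shapes above can be tied together in a small wrapper. The following is a minimal sketch, not the SDK's own helper: the function names `buildPartitionsToGet` and `fetchPartitions` are hypothetical, and `$client` is assumed to be an Aws\Glue\GlueClient (or any object exposing `batchGetPartition` with the shapes documented above).

```php
<?php
// Hypothetical helper: wraps each list of partition values in the
// PartitionValueList shape ('Values' => [...]) that PartitionsToGet expects.
function buildPartitionsToGet(array $valueLists): array
{
    return array_map(
        fn(array $values): array => ['Values' => array_values($values)],
        $valueLists
    );
}

// Hypothetical helper: $client is assumed to be an Aws\Glue\GlueClient.
// Returns the partitions that were found, plus the value lists for which
// no partition was returned (taken from UnprocessedKeys).
function fetchPartitions($client, string $database, string $table, array $valueLists): array
{
    $result = $client->batchGetPartition([
        'DatabaseName'    => $database,
        'TableName'       => $table,
        'PartitionsToGet' => buildPartitionsToGet($valueLists),
    ]);

    return [
        'partitions'  => $result['Partitions'] ?? [],
        'unprocessed' => array_column($result['UnprocessedKeys'] ?? [], 'Values'),
    ];
}
```

A caller would typically inspect `unprocessed` after each call and decide whether to request those value lists again.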
BatchGetTableOptimizer
$result = $client->batchGetTableOptimizer([/* ... */]);
$promise = $client->batchGetTableOptimizerAsync([/* ... */]);
Returns the configuration for the specified table optimizers.
Parameter Syntax
$result = $client->batchGetTableOptimizer([
    'Entries' => [ // REQUIRED
        [
            'catalogId' => '<string>',
            'databaseName' => '<string>',
            'tableName' => '<string>',
            'type' => 'compaction|retention|orphan_file_deletion',
        ],
        // ...
    ],
]);
Parameter Details
Members
- Entries
-
- Required: Yes
- Type: Array of BatchGetTableOptimizerEntry structures
A list of BatchGetTableOptimizerEntry objects specifying the table optimizers to retrieve.
Result Syntax
[ 'Failures' => [ [ 'catalogId' => '<string>', 'databaseName' => '<string>', 'error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], 'tableName' => '<string>', 'type' => 'compaction|retention|orphan_file_deletion', ], // ... ], 'TableOptimizers' => [ [ 'catalogId' => '<string>', 'databaseName' => '<string>', 'tableName' => '<string>', 'tableOptimizer' => [ 'configuration' => [ 'enabled' => true || false, 'orphanFileDeletionConfiguration' => [ 'icebergConfiguration' => [ 'location' => '<string>', 'orphanFileRetentionPeriodInDays' => <integer>, ], ], 'retentionConfiguration' => [ 'icebergConfiguration' => [ 'cleanExpiredFiles' => true || false, 'numberOfSnapshotsToRetain' => <integer>, 'snapshotRetentionPeriodInDays' => <integer>, ], ], 'roleArn' => '<string>', ], 'lastRun' => [ 'compactionMetrics' => [ 'IcebergMetrics' => [ 'JobDurationInHour' => <float>, 'NumberOfBytesCompacted' => <integer>, 'NumberOfDpus' => <integer>, 'NumberOfFilesCompacted' => <integer>, ], ], 'endTimestamp' => <DateTime>, 'error' => '<string>', 'eventType' => 'starting|completed|failed|in_progress', 'metrics' => [ 'JobDurationInHour' => '<string>', 'NumberOfBytesCompacted' => '<string>', 'NumberOfDpus' => '<string>', 'NumberOfFilesCompacted' => '<string>', ], 'orphanFileDeletionMetrics' => [ 'IcebergMetrics' => [ 'JobDurationInHour' => <float>, 'NumberOfDpus' => <integer>, 'NumberOfOrphanFilesDeleted' => <integer>, ], ], 'retentionMetrics' => [ 'IcebergMetrics' => [ 'JobDurationInHour' => <float>, 'NumberOfDataFilesDeleted' => <integer>, 'NumberOfDpus' => <integer>, 'NumberOfManifestFilesDeleted' => <integer>, 'NumberOfManifestListsDeleted' => <integer>, ], ], 'startTimestamp' => <DateTime>, ], 'type' => 'compaction|retention|orphan_file_deletion', ], ], // ... ], ]
Result Details
Members
- Failures
-
- Type: Array of BatchGetTableOptimizerError structures
A list of errors from the operation.
- TableOptimizers
-
- Type: Array of BatchTableOptimizer structures
A list of BatchTableOptimizer objects.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- ThrottlingException:
The throttling threshold was exceeded.
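Because this operation reports per-entry problems in Failures rather than raising an exception for each one, it can help to build entries and read failures through small helpers. The sketch below is illustrative only: `optimizerEntry` and `describeOptimizerFailures` are hypothetical names, and the allowed types are copied from the Parameter Syntax above.

```php
<?php
// Optimizer types listed in the Parameter Syntax above.
const OPTIMIZER_TYPES = ['compaction', 'retention', 'orphan_file_deletion'];

// Hypothetical helper: builds one BatchGetTableOptimizerEntry.
// Note the lower-camel-case keys used by this operation.
function optimizerEntry(string $catalogId, string $database, string $table, string $type): array
{
    if (!in_array($type, OPTIMIZER_TYPES, true)) {
        throw new InvalidArgumentException("Unknown optimizer type: $type");
    }
    return [
        'catalogId'    => $catalogId,
        'databaseName' => $database,
        'tableName'    => $table,
        'type'         => $type,
    ];
}

// Hypothetical helper: turns the 'Failures' list of a result into readable lines.
function describeOptimizerFailures(array $failures): array
{
    return array_map(
        fn(array $f): string => sprintf(
            '%s.%s (%s): %s',
            $f['databaseName'] ?? '?',
            $f['tableName'] ?? '?',
            $f['type'] ?? '?',
            $f['error']['ErrorMessage'] ?? 'unknown error'
        ),
        $failures
    );
}
```

With a real client, the entries would be passed as `$client->batchGetTableOptimizer(['Entries' => [optimizerEntry(...)]])` and the returned `Failures` list handed to `describeOptimizerFailures`.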
BatchGetTriggers
$result = $client->batchGetTriggers([/* ... */]);
$promise = $client->batchGetTriggersAsync([/* ... */]);
Returns a list of resource metadata for a given list of trigger names. After calling the ListTriggers operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
Parameter Syntax
$result = $client->batchGetTriggers([
    'TriggerNames' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- TriggerNames
-
- Required: Yes
- Type: Array of strings
A list of trigger names, which may be the names returned from the ListTriggers operation.
Result Syntax
[ 'Triggers' => [ [ 'Actions' => [ [ 'Arguments' => ['<string>', ...], 'CrawlerName' => '<string>', 'JobName' => '<string>', 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'SecurityConfiguration' => '<string>', 'Timeout' => <integer>, ], // ... ], 'Description' => '<string>', 'EventBatchingCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Id' => '<string>', 'Name' => '<string>', 'Predicate' => [ 'Conditions' => [ [ 'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', 'CrawlerName' => '<string>', 'JobName' => '<string>', 'LogicalOperator' => 'EQUALS', 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', ], // ... ], 'Logical' => 'AND|ANY', ], 'Schedule' => '<string>', 'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING', 'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT', 'WorkflowName' => '<string>', ], // ... ], 'TriggersNotFound' => ['<string>', ...], ]
Result Details
Members
- Triggers
-
- Type: Array of Trigger structures
A list of trigger definitions.
- TriggersNotFound
-
- Type: Array of strings
A list of names of triggers not found.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
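Since names that do not resolve are returned in TriggersNotFound rather than raising an error, a caller usually wants both lists. The following is a minimal sketch under stated assumptions: `getTriggersByName` is a hypothetical helper, `$client` is assumed to be an Aws\Glue\GlueClient, and de-duplicating the names before the call is a convenience of this sketch, not a documented requirement.

```php
<?php
// Hypothetical helper: $client is assumed to be an Aws\Glue\GlueClient.
// Returns the triggers keyed by name, plus the requested names not found.
function getTriggersByName($client, array $names): array
{
    $result = $client->batchGetTriggers([
        // De-duplication is a convenience of this sketch.
        'TriggerNames' => array_values(array_unique($names)),
    ]);

    $byName = [];
    foreach ($result['Triggers'] ?? [] as $trigger) {
        $byName[$trigger['Name']] = $trigger;
    }

    return [
        'found'   => $byName,
        'missing' => $result['TriggersNotFound'] ?? [],
    ];
}
```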
BatchGetWorkflows
$result = $client->batchGetWorkflows([/* ... */]);
$promise = $client->batchGetWorkflowsAsync([/* ... */]);
Returns a list of resource metadata for a given list of workflow names. After calling the ListWorkflows operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
Parameter Syntax
$result = $client->batchGetWorkflows([
    'IncludeGraph' => true || false,
    'Names' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- IncludeGraph
-
- Type: boolean
Specifies whether to include a graph when returning the workflow resource metadata.
- Names
-
- Required: Yes
- Type: Array of strings
A list of workflow names, which may be the names returned from the ListWorkflows operation.
Result Syntax
[ 'MissingWorkflows' => ['<string>', ...], 'Workflows' => [ [ 'BlueprintDetails' => [ 'BlueprintName' => '<string>', 'RunId' => '<string>', ], 'CreatedOn' => <DateTime>, 'DefaultRunProperties' => ['<string>', ...], 'Description' => '<string>', 'Graph' => [ 'Edges' => [ [ 'DestinationId' => '<string>', 'SourceId' => '<string>', ], // ... ], 'Nodes' => [ [ 'CrawlerDetails' => [ 'Crawls' => [ [ 'CompletedOn' => <DateTime>, 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'StartedOn' => <DateTime>, 'State' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', ], // ... ], ], 'JobDetails' => [ 'JobRuns' => [ [ 'AllocatedCapacity' => <integer>, 'Arguments' => ['<string>', ...], 'Attempt' => <integer>, 'CompletedOn' => <DateTime>, 'DPUSeconds' => <float>, 'ErrorMessage' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionTime' => <integer>, 'GlueVersion' => '<string>', 'Id' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobName' => '<string>', 'JobRunQueuingEnabled' => true || false, 'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', 'LastModifiedOn' => <DateTime>, 'LogGroupName' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'PredecessorRuns' => [ [ 'JobName' => '<string>', 'RunId' => '<string>', ], // ... ], 'PreviousRunId' => '<string>', 'ProfileName' => '<string>', 'SecurityConfiguration' => '<string>', 'StartedOn' => <DateTime>, 'StateDetail' => '<string>', 'Timeout' => <integer>, 'TriggerName' => '<string>', 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], // ... 
], ], 'Name' => '<string>', 'TriggerDetails' => [ 'Trigger' => [ 'Actions' => [ [ 'Arguments' => ['<string>', ...], 'CrawlerName' => '<string>', 'JobName' => '<string>', 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'SecurityConfiguration' => '<string>', 'Timeout' => <integer>, ], // ... ], 'Description' => '<string>', 'EventBatchingCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Id' => '<string>', 'Name' => '<string>', 'Predicate' => [ 'Conditions' => [ [ 'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', 'CrawlerName' => '<string>', 'JobName' => '<string>', 'LogicalOperator' => 'EQUALS', 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', ], // ... ], 'Logical' => 'AND|ANY', ], 'Schedule' => '<string>', 'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING', 'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT', 'WorkflowName' => '<string>', ], ], 'Type' => 'CRAWLER|JOB|TRIGGER', 'UniqueId' => '<string>', ], // ... ], ], 'LastModifiedOn' => <DateTime>, 'LastRun' => [ 'CompletedOn' => <DateTime>, 'ErrorMessage' => '<string>', 'Graph' => [ 'Edges' => [ [ 'DestinationId' => '<string>', 'SourceId' => '<string>', ], // ... ], 'Nodes' => [ [ 'CrawlerDetails' => [ 'Crawls' => [ [ 'CompletedOn' => <DateTime>, 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'StartedOn' => <DateTime>, 'State' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', ], // ... 
], ], 'JobDetails' => [ 'JobRuns' => [ [ 'AllocatedCapacity' => <integer>, 'Arguments' => ['<string>', ...], 'Attempt' => <integer>, 'CompletedOn' => <DateTime>, 'DPUSeconds' => <float>, 'ErrorMessage' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionTime' => <integer>, 'GlueVersion' => '<string>', 'Id' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobName' => '<string>', 'JobRunQueuingEnabled' => true || false, 'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', 'LastModifiedOn' => <DateTime>, 'LogGroupName' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'PredecessorRuns' => [ [ 'JobName' => '<string>', 'RunId' => '<string>', ], // ... ], 'PreviousRunId' => '<string>', 'ProfileName' => '<string>', 'SecurityConfiguration' => '<string>', 'StartedOn' => <DateTime>, 'StateDetail' => '<string>', 'Timeout' => <integer>, 'TriggerName' => '<string>', 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], // ... ], ], 'Name' => '<string>', 'TriggerDetails' => [ 'Trigger' => [ 'Actions' => [ [ 'Arguments' => ['<string>', ...], 'CrawlerName' => '<string>', 'JobName' => '<string>', 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'SecurityConfiguration' => '<string>', 'Timeout' => <integer>, ], // ... ], 'Description' => '<string>', 'EventBatchingCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Id' => '<string>', 'Name' => '<string>', 'Predicate' => [ 'Conditions' => [ [ 'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', 'CrawlerName' => '<string>', 'JobName' => '<string>', 'LogicalOperator' => 'EQUALS', 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', ], // ... 
], 'Logical' => 'AND|ANY', ], 'Schedule' => '<string>', 'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING', 'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT', 'WorkflowName' => '<string>', ], ], 'Type' => 'CRAWLER|JOB|TRIGGER', 'UniqueId' => '<string>', ], // ... ], ], 'Name' => '<string>', 'PreviousRunId' => '<string>', 'StartedOn' => <DateTime>, 'StartingEventBatchCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Statistics' => [ 'ErroredActions' => <integer>, 'FailedActions' => <integer>, 'RunningActions' => <integer>, 'StoppedActions' => <integer>, 'SucceededActions' => <integer>, 'TimeoutActions' => <integer>, 'TotalActions' => <integer>, 'WaitingActions' => <integer>, ], 'Status' => 'RUNNING|COMPLETED|STOPPING|STOPPED|ERROR', 'WorkflowRunId' => '<string>', 'WorkflowRunProperties' => ['<string>', ...], ], 'MaxConcurrentRuns' => <integer>, 'Name' => '<string>', ], // ... ], ]
Result Details
Members
- MissingWorkflows
-
- Type: Array of strings
A list of names of workflows not found.
- Workflows
-
- Type: Array of Workflow structures
A list of workflow resource metadata.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
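The Workflows result is deeply nested, so a short summary pass can make it easier to read. The sketch below is an illustration only: `countNodesByType` and `summarizeWorkflows` are hypothetical helpers, and the code assumes the Graph key may be missing (presumably when IncludeGraph is not set), in which case it reports no nodes.

```php
<?php
// Hypothetical helper: counts graph nodes per Type (CRAWLER, JOB, TRIGGER)
// for one Workflow structure. A missing Graph yields an empty array.
function countNodesByType(array $workflow): array
{
    $counts = [];
    foreach ($workflow['Graph']['Nodes'] ?? [] as $node) {
        $type = $node['Type'] ?? 'UNKNOWN';
        $counts[$type] = ($counts[$type] ?? 0) + 1;
    }
    return $counts;
}

// Hypothetical helper: builds one summary row per workflow from a
// batchGetWorkflows result (an Aws\Result or a plain array of the same shape).
function summarizeWorkflows($result): array
{
    $rows = [];
    foreach ($result['Workflows'] ?? [] as $workflow) {
        $rows[$workflow['Name']] = [
            'lastRunStatus' => $workflow['LastRun']['Status'] ?? null,
            'nodes'         => countNodesByType($workflow),
        ];
    }
    return $rows;
}
```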
BatchPutDataQualityStatisticAnnotation
$result = $client->batchPutDataQualityStatisticAnnotation([/* ... */]);
$promise = $client->batchPutDataQualityStatisticAnnotationAsync([/* ... */]);
Annotates datapoints over time for a specific data quality statistic.
Parameter Syntax
$result = $client->batchPutDataQualityStatisticAnnotation([
    'ClientToken' => '<string>',
    'InclusionAnnotations' => [ // REQUIRED
        [
            'InclusionAnnotation' => 'INCLUDE|EXCLUDE',
            'ProfileId' => '<string>',
            'StatisticId' => '<string>',
        ],
        // ...
    ],
]);
Parameter Details
Members
- ClientToken
-
- Type: string
Client Token.
- InclusionAnnotations
-
- Required: Yes
- Type: Array of DatapointInclusionAnnotation structures
A list of DatapointInclusionAnnotation objects.
Result Syntax
[
    'FailedInclusionAnnotations' => [
        [
            'FailureReason' => '<string>',
            'ProfileId' => '<string>',
            'StatisticId' => '<string>',
        ],
        // ...
    ],
]
Result Details
Members
- FailedInclusionAnnotations
-
- Type: Array of AnnotationError structures
A list of AnnotationError objects.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
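The InclusionAnnotations list repeats the same three keys for every datapoint, so it can be generated. This is a minimal sketch, assuming the caller wants to apply one annotation to several profiles of a single statistic; the helper names `buildInclusionAnnotations` and `failedAnnotationReasons` are hypothetical.

```php
<?php
// Hypothetical helper: builds DatapointInclusionAnnotation entries that apply
// one annotation ('INCLUDE' or 'EXCLUDE') to several profiles of a statistic.
function buildInclusionAnnotations(string $statisticId, array $profileIds, string $annotation): array
{
    if (!in_array($annotation, ['INCLUDE', 'EXCLUDE'], true)) {
        throw new InvalidArgumentException("Annotation must be INCLUDE or EXCLUDE, got: $annotation");
    }
    $entries = [];
    foreach ($profileIds as $profileId) {
        $entries[] = [
            'InclusionAnnotation' => $annotation,
            'ProfileId'           => $profileId,
            'StatisticId'         => $statisticId,
        ];
    }
    return $entries;
}

// Hypothetical helper: maps "ProfileId/StatisticId" to the FailureReason of
// each AnnotationError in a result.
function failedAnnotationReasons($result): array
{
    $reasons = [];
    foreach ($result['FailedInclusionAnnotations'] ?? [] as $failure) {
        $key = ($failure['ProfileId'] ?? '?') . '/' . ($failure['StatisticId'] ?? '?');
        $reasons[$key] = $failure['FailureReason'] ?? '';
    }
    return $reasons;
}
```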
BatchStopJobRun
$result = $client->batchStopJobRun([/* ... */]);
$promise = $client->batchStopJobRunAsync([/* ... */]);
Stops one or more job runs for a specified job definition.
Parameter Syntax
$result = $client->batchStopJobRun([
    'JobName' => '<string>', // REQUIRED
    'JobRunIds' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- JobName
-
- Required: Yes
- Type: string
The name of the job definition for which to stop job runs.
- JobRunIds
-
- Required: Yes
- Type: Array of strings
A list of the JobRunIds that should be stopped for that job definition.
Result Syntax
[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'JobName' => '<string>',
            'JobRunId' => '<string>',
        ],
        // ...
    ],
    'SuccessfulSubmissions' => [
        [
            'JobName' => '<string>',
            'JobRunId' => '<string>',
        ],
        // ...
    ],
]
Result Details
Members
- Errors
-
- Type: Array of BatchStopJobRunError structures
A list of the errors that were encountered in trying to stop JobRuns, including the JobRunId for which each error was encountered and details about the error.
- SuccessfulSubmissions
-
- Type: Array of BatchStopJobRunSuccessfulSubmission structures
A list of the JobRuns that were successfully submitted for stopping.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
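A single call can succeed for some run IDs and fail for others, so both result lists are worth reading. The following is a minimal sketch, assuming `$client` is an Aws\Glue\GlueClient; `stopJobRuns` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: $client is assumed to be an Aws\Glue\GlueClient.
// Returns the run IDs submitted for stopping and a map of
// run ID => error message for the runs that could not be stopped.
function stopJobRuns($client, string $jobName, array $runIds): array
{
    $result = $client->batchStopJobRun([
        'JobName'   => $jobName,
        'JobRunIds' => array_values($runIds),
    ]);

    $failed = [];
    foreach ($result['Errors'] ?? [] as $error) {
        $failed[$error['JobRunId']] = $error['ErrorDetail']['ErrorMessage'] ?? '';
    }

    return [
        'submitted' => array_column($result['SuccessfulSubmissions'] ?? [], 'JobRunId'),
        'failed'    => $failed,
    ];
}
```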
BatchUpdatePartition
$result = $client->batchUpdatePartition([/* ... */]);
$promise = $client->batchUpdatePartitionAsync([/* ... */]);
Updates one or more partitions in a batch operation.
Parameter Syntax
$result = $client->batchUpdatePartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'Entries' => [ // REQUIRED
        [
            'PartitionInput' => [ // REQUIRED
                'LastAccessTime' => <integer || string || DateTime>,
                'LastAnalyzedTime' => <integer || string || DateTime>,
                'Parameters' => ['<string>', ...],
                'StorageDescriptor' => [
                    'AdditionalLocations' => ['<string>', ...],
                    'BucketColumns' => ['<string>', ...],
                    'Columns' => [
                        [
                            'Comment' => '<string>',
                            'Name' => '<string>', // REQUIRED
                            'Parameters' => ['<string>', ...],
                            'Type' => '<string>',
                        ],
                        // ...
                    ],
                    'Compressed' => true || false,
                    'InputFormat' => '<string>',
                    'Location' => '<string>',
                    'NumberOfBuckets' => <integer>,
                    'OutputFormat' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'SchemaReference' => [
                        'SchemaId' => [
                            'RegistryName' => '<string>',
                            'SchemaArn' => '<string>',
                            'SchemaName' => '<string>',
                        ],
                        'SchemaVersionId' => '<string>',
                        'SchemaVersionNumber' => <integer>,
                    ],
                    'SerdeInfo' => [
                        'Name' => '<string>',
                        'Parameters' => ['<string>', ...],
                        'SerializationLibrary' => '<string>',
                    ],
                    'SkewedInfo' => [
                        'SkewedColumnNames' => ['<string>', ...],
                        'SkewedColumnValueLocationMaps' => ['<string>', ...],
                        'SkewedColumnValues' => ['<string>', ...],
                    ],
                    'SortColumns' => [
                        [
                            'Column' => '<string>', // REQUIRED
                            'SortOrder' => <integer>, // REQUIRED
                        ],
                        // ...
                    ],
                    'StoredAsSubDirectories' => true || false,
                ],
                'Values' => ['<string>', ...],
            ],
            'PartitionValueList' => ['<string>', ...], // REQUIRED
        ],
        // ...
    ],
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the catalog in which the partition is to be updated. Currently, this should be the Amazon Web Services account ID.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the metadata database in which the partition is to be updated.
- Entries
-
- Required: Yes
- Type: Array of BatchUpdatePartitionRequestEntry structures
A list of up to 100 BatchUpdatePartitionRequestEntry objects to update.
- TableName
-
- Required: Yes
- Type: string
The name of the metadata table in which the partition is to be updated.
Result Syntax
[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'PartitionValueList' => ['<string>', ...],
        ],
        // ...
    ],
]
Result Details
Members
- Errors
-
- Type: Array of BatchUpdatePartitionFailureEntry structures
The errors encountered when trying to update the requested partitions. A list of BatchUpdatePartitionFailureEntry objects.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- GlueEncryptionException:
An encryption operation failed.
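Because Entries accepts up to 100 items per request, larger updates have to be split across several calls. The following is a minimal sketch of that chunking, assuming `$client` is an Aws\Glue\GlueClient; `updatePartitionsInChunks` and the constant name are hypothetical, and the calls are made sequentially for simplicity.

```php
<?php
// The Entries description above allows up to 100 entries per request.
const MAX_ENTRIES_PER_CALL = 100;

// Hypothetical helper: $client is assumed to be an Aws\Glue\GlueClient.
// Sends the entries in chunks of at most 100 and collects every
// BatchUpdatePartitionFailureEntry returned across all calls.
function updatePartitionsInChunks($client, string $database, string $table, array $entries): array
{
    $errors = [];
    foreach (array_chunk($entries, MAX_ENTRIES_PER_CALL) as $chunk) {
        $result = $client->batchUpdatePartition([
            'DatabaseName' => $database,
            'TableName'    => $table,
            'Entries'      => $chunk,
        ]);
        foreach ($result['Errors'] ?? [] as $error) {
            $errors[] = $error;
        }
    }
    return $errors;
}
```

Each returned error carries the PartitionValueList of the partition that failed, which is enough to identify the entry for a later retry.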
CancelDataQualityRuleRecommendationRun
$result = $client->cancelDataQualityRuleRecommendationRun([/* ... */]);
$promise = $client->cancelDataQualityRuleRecommendationRunAsync([/* ... */]);
Cancels the specified recommendation run that was being used to generate rules.
Parameter Syntax
$result = $client->cancelDataQualityRuleRecommendationRun([ 'RunId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- RunId
-
- Required: Yes
- Type: string
The unique run identifier associated with this run.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
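The errors listed for CancelDataQualityRuleRecommendationRun can be told apart by their error code. The sketch below shows one possible way to decide whether a failed cancel is worth retrying. Treating OperationTimeoutException and InternalServiceException as transient is an assumption, not something this page states, and the helper names and run ID are hypothetical.

```php
<?php
// Assumption: timeouts and internal service errors are treated as transient.
// This page lists the error names but does not say which ones are retryable.
function isTransientGlueError(string $errorCode): bool
{
    return in_array(
        $errorCode,
        ['OperationTimeoutException', 'InternalServiceException'],
        true
    );
}

// Hypothetical helper that builds the single required parameter.
function buildCancelRecommendationRunParams(string $runId): array
{
    if ($runId === '') {
        throw new InvalidArgumentException('RunId is required.');
    }
    return ['RunId' => $runId];
}

$params = buildCancelRecommendationRunParams('dqrun-example-id'); // placeholder ID

// With a configured client (Aws\Exception\AwsException is the SDK's base exception):
// try {
//     $client->cancelDataQualityRuleRecommendationRun($params);
// } catch (\Aws\Exception\AwsException $e) {
//     if (isTransientGlueError((string) $e->getAwsErrorCode())) {
//         // Retry later.
//     }
// }
```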
CancelDataQualityRulesetEvaluationRun
$result = $client->cancelDataQualityRulesetEvaluationRun([/* ... */]);
$promise = $client->cancelDataQualityRulesetEvaluationRunAsync([/* ... */]);
Cancels a run where a ruleset is being evaluated against a data source.
Parameter Syntax
$result = $client->cancelDataQualityRulesetEvaluationRun([ 'RunId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- RunId
-
- Required: Yes
- Type: string
The unique run identifier associated with this run.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
CancelMLTaskRun
$result = $client->cancelMLTaskRun([/* ... */]);
$promise = $client->cancelMLTaskRunAsync([/* ... */]);
Cancels (stops) a task run. Machine learning task runs are asynchronous tasks that Glue runs on your behalf as part of various machine learning workflows. You can cancel a machine learning task run at any time by calling CancelMLTaskRun with the TransformId of the task run's parent transform and the task run's TaskRunId.
Parameter Syntax
$result = $client->cancelMLTaskRun([ 'TaskRunId' => '<string>', // REQUIRED 'TransformId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- TaskRunId
-
- Required: Yes
- Type: string
A unique identifier for the task run.
- TransformId
-
- Required: Yes
- Type: string
The unique identifier of the machine learning transform.
Result Syntax
[ 'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT', 'TaskRunId' => '<string>', 'TransformId' => '<string>', ]
Result Details
Members
- Status
-
- Type: string
The status for this run.
- TaskRunId
-
- Type: string
The unique identifier for the task run.
- TransformId
-
- Type: string
The unique identifier of the machine learning transform.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
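CancelMLTaskRun returns a Status, so a caller may want to know whether the run has actually stopped or is still winding down. The following sketch builds the two required parameters and classifies the returned status. The helper names and IDs are hypothetical, and treating STOPPED, SUCCEEDED, FAILED, and TIMEOUT as final states is an assumption based only on the status names.

```php
<?php
// Hypothetical helper that builds the two required parameters.
function buildCancelMLTaskRunParams(string $transformId, string $taskRunId): array
{
    return ['TransformId' => $transformId, 'TaskRunId' => $taskRunId];
}

// Assumption: STARTING, RUNNING, and STOPPING mean the run is still changing state.
function mlTaskRunIsFinished(string $status): bool
{
    return in_array($status, ['STOPPED', 'SUCCEEDED', 'FAILED', 'TIMEOUT'], true);
}

$params = buildCancelMLTaskRunParams('tfm-example', 'tsk-example'); // placeholder IDs

// With a configured client:
// $result = $client->cancelMLTaskRun($params);
// if (!mlTaskRunIsFinished($result['Status'])) {
//     // The run is still stopping; check its status again later.
// }
```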
CancelStatement
$result = $client->cancelStatement([/* ... */]);
$promise = $client->cancelStatementAsync([/* ... */]);
Cancels the statement.
Parameter Syntax
$result = $client->cancelStatement([ 'Id' => <integer>, // REQUIRED 'RequestOrigin' => '<string>', 'SessionId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Id
-
- Required: Yes
- Type: int
The ID of the statement to be cancelled.
- RequestOrigin
-
- Type: string
The origin of the request to cancel the statement.
- SessionId
-
- Required: Yes
- Type: string
The Session ID of the statement to be cancelled.
Result Syntax
[]
Result Details
Errors
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- IllegalSessionStateException:
The session is in an invalid state to perform a requested operation.
CheckSchemaVersionValidity
$result = $client->checkSchemaVersionValidity([/* ... */]);
$promise = $client->checkSchemaVersionValidityAsync([/* ... */]);
Validates the supplied schema. This call has no side effects; it simply validates the supplied schema using DataFormat as the format. Because it does not take a schema set name, no compatibility checks are performed.
Parameter Syntax
$result = $client->checkSchemaVersionValidity([ 'DataFormat' => 'AVRO|JSON|PROTOBUF', // REQUIRED 'SchemaDefinition' => '<string>', // REQUIRED ]);
Parameter Details
Members
- DataFormat
-
- Required: Yes
- Type: string
The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.
- SchemaDefinition
-
- Required: Yes
- Type: string
The definition of the schema that has to be validated.
Result Syntax
[ 'Error' => '<string>', 'Valid' => true || false, ]
Result Details
Members
- Error
-
- Type: string
A validation failure error message.
- Valid
-
- Type: boolean
Returns true if the schema is valid and false otherwise.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
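SchemaDefinition is passed to CheckSchemaVersionValidity as a string, so a schema held as a PHP array has to be serialized first. The following sketch assumes an illustrative Avro record named Order. The helper name buildCheckSchemaParams is hypothetical, and the local DataFormat check only mirrors the three values listed above.

```php
<?php
// Hypothetical helper: rejects a DataFormat that is not listed on this page
// before any request is made.
function buildCheckSchemaParams(string $dataFormat, string $schemaDefinition): array
{
    $supported = ['AVRO', 'JSON', 'PROTOBUF'];
    if (!in_array($dataFormat, $supported, true)) {
        throw new InvalidArgumentException("Unsupported DataFormat: $dataFormat");
    }
    return ['DataFormat' => $dataFormat, 'SchemaDefinition' => $schemaDefinition];
}

// Illustrative Avro record schema, serialized to the string the API expects.
$avroSchema = json_encode([
    'type'   => 'record',
    'name'   => 'Order',
    'fields' => [
        ['name' => 'order_id', 'type' => 'string'],
        ['name' => 'amount',   'type' => 'double'],
    ],
]);
$params = buildCheckSchemaParams('AVRO', $avroSchema);

// With a configured client:
// $result = $client->checkSchemaVersionValidity($params);
// if (!$result['Valid']) {
//     echo $result['Error'];
// }
```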
CreateBlueprint
$result = $client->createBlueprint([/* ... */]);
$promise = $client->createBlueprintAsync([/* ... */]);
Registers a blueprint with Glue.
Parameter Syntax
$result = $client->createBlueprint([ 'BlueprintLocation' => '<string>', // REQUIRED 'Description' => '<string>', 'Name' => '<string>', // REQUIRED 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- BlueprintLocation
-
- Required: Yes
- Type: string
Specifies a path in Amazon S3 where the blueprint is published.
- Description
-
- Type: string
A description of the blueprint.
- Name
-
- Required: Yes
- Type: string
The name of the blueprint.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags to be applied to this blueprint.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
Returns the name of the blueprint that was registered.
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
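The Tags member of CreateBlueprint is an associative array of tag keys to tag values, even though the syntax block abbreviates it as a list of strings. The following sketch shows that shape. The helper name, blueprint name, S3 path, and tag values are hypothetical.

```php
<?php
// Hypothetical helper: builds CreateBlueprint parameters and checks that
// tags are given as key => value pairs rather than a plain list.
function buildCreateBlueprintParams(string $name, string $s3Location, array $tags = []): array
{
    $params = ['Name' => $name, 'BlueprintLocation' => $s3Location];
    foreach ($tags as $key => $value) {
        if (!is_string($key)) {
            throw new InvalidArgumentException('Tags must map tag keys to tag values.');
        }
        $params['Tags'][$key] = (string) $value;
    }
    return $params;
}

$params = buildCreateBlueprintParams(
    'nightly-etl-blueprint',
    's3://example-bucket/blueprints/nightly-etl.zip', // placeholder path
    ['team' => 'data-platform', 'env' => 'dev']
);

// With a configured client:
// $result = $client->createBlueprint($params);
// echo $result['Name'];
```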
CreateClassifier
$result = $client->createClassifier([/* ... */]);
$promise = $client->createClassifierAsync([/* ... */]);
Creates a classifier in the user's account. This can be a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field of the request is present.
Parameter Syntax
$result = $client->createClassifier([ 'CsvClassifier' => [ 'AllowSingleColumn' => true || false, 'ContainsHeader' => 'UNKNOWN|PRESENT|ABSENT', 'CustomDatatypeConfigured' => true || false, 'CustomDatatypes' => ['<string>', ...], 'Delimiter' => '<string>', 'DisableValueTrimming' => true || false, 'Header' => ['<string>', ...], 'Name' => '<string>', // REQUIRED 'QuoteSymbol' => '<string>', 'Serde' => 'OpenCSVSerDe|LazySimpleSerDe|None', ], 'GrokClassifier' => [ 'Classification' => '<string>', // REQUIRED 'CustomPatterns' => '<string>', 'GrokPattern' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED ], 'JsonClassifier' => [ 'JsonPath' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED ], 'XMLClassifier' => [ 'Classification' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'RowTag' => '<string>', ], ]);
Parameter Details
Members
- CsvClassifier
-
- Type: CreateCsvClassifierRequest structure
A CsvClassifier object specifying the classifier to create.
- GrokClassifier
-
- Type: CreateGrokClassifierRequest structure
A GrokClassifier object specifying the classifier to create.
- JsonClassifier
-
- Type: CreateJsonClassifierRequest structure
A JsonClassifier object specifying the classifier to create.
- XMLClassifier
-
- Type: CreateXMLClassifierRequest structure
An XMLClassifier object specifying the classifier to create.
Result Syntax
[]
Result Details
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
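Since the kind of classifier CreateClassifier creates depends on which field of the request is present, one way to avoid ambiguity is to build the request with a single classifier field. The sketch below assumes exactly one field is supplied per call. The helper name and the classifier settings are hypothetical.

```php
<?php
// Hypothetical helper: wraps one classifier definition under exactly one of
// the four request fields documented for CreateClassifier.
function buildCreateClassifierParams(string $kind, array $definition): array
{
    $kinds = ['CsvClassifier', 'GrokClassifier', 'JsonClassifier', 'XMLClassifier'];
    if (!in_array($kind, $kinds, true)) {
        throw new InvalidArgumentException("Unknown classifier field: $kind");
    }
    if (empty($definition['Name'])) {
        throw new InvalidArgumentException('Name is required for every classifier.');
    }
    return [$kind => $definition];
}

// Illustrative CSV classifier settings.
$params = buildCreateClassifierParams('CsvClassifier', [
    'Name'           => 'orders-csv',
    'Delimiter'      => ',',
    'QuoteSymbol'    => '"',
    'ContainsHeader' => 'PRESENT',
]);

// With a configured client:
// $client->createClassifier($params);
```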
CreateColumnStatisticsTaskSettings
$result = $client->createColumnStatisticsTaskSettings([/* ... */]);
$promise = $client->createColumnStatisticsTaskSettingsAsync([/* ... */]);
Creates settings for a column statistics task.
Parameter Syntax
$result = $client->createColumnStatisticsTaskSettings([ 'CatalogID' => '<string>', 'ColumnNameList' => ['<string>', ...], 'DatabaseName' => '<string>', // REQUIRED 'Role' => '<string>', // REQUIRED 'SampleSize' => <float>, 'Schedule' => '<string>', 'SecurityConfiguration' => '<string>', 'TableName' => '<string>', // REQUIRED 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- CatalogID
-
- Type: string
The ID of the Data Catalog in which the database resides.
- ColumnNameList
-
- Type: Array of strings
A list of column names for which to run statistics.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database where the table resides.
- Role
-
- Required: Yes
- Type: string
The role used for running the column statistics.
- SampleSize
-
- Type: double
The percentage of data to sample.
- Schedule
-
- Type: string
A schedule for running the column statistics, specified in CRON syntax.
- SecurityConfiguration
-
- Type: string
Name of the security configuration that is used to encrypt CloudWatch logs.
- TableName
-
- Required: Yes
- Type: string
The name of the table for which to generate column statistics.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
A map of tags.
Result Syntax
[]
Result Details
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ColumnStatisticsTaskRunningException:
An exception thrown when you try to start another job while running a column stats generation job.
CreateConnection
$result = $client->createConnection([/* ... */]);
$promise = $client->createConnectionAsync([/* ... */]);
Creates a connection definition in the Data Catalog.
Connections used for creating federated resources require the IAM glue:PassConnection permission.
Parameter Syntax
$result = $client->createConnection([ 'CatalogId' => '<string>', 'ConnectionInput' => [ // REQUIRED 'AthenaProperties' => ['<string>', ...], 'AuthenticationConfiguration' => [ 'AuthenticationType' => 'BASIC|OAUTH2|CUSTOM', 'OAuth2Properties' => [ 'AuthorizationCodeProperties' => [ 'AuthorizationCode' => '<string>', 'RedirectUri' => '<string>', ], 'OAuth2ClientApplication' => [ 'AWSManagedClientApplicationReference' => '<string>', 'UserManagedClientApplicationClientId' => '<string>', ], 'OAuth2GrantType' => 'AUTHORIZATION_CODE|CLIENT_CREDENTIALS|JWT_BEARER', 'TokenUrl' => '<string>', 'TokenUrlParametersMap' => ['<string>', ...], ], 'SecretArn' => '<string>', ], 'ConnectionProperties' => ['<string>', ...], // REQUIRED 'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM|SALESFORCE|VIEW_VALIDATION_REDSHIFT|VIEW_VALIDATION_ATHENA', // REQUIRED 'Description' => '<string>', 'MatchCriteria' => ['<string>', ...], 'Name' => '<string>', // REQUIRED 'PhysicalConnectionRequirements' => [ 'AvailabilityZone' => '<string>', 'SecurityGroupIdList' => ['<string>', ...], 'SubnetId' => '<string>', ], 'ValidateCredentials' => true || false, ], 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which to create the connection. If none is provided, the Amazon Web Services account ID is used by default.
- ConnectionInput
-
- Required: Yes
- Type: ConnectionInput structure
A ConnectionInput object defining the connection to create.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags you assign to the connection.
Result Syntax
[ 'CreateConnectionStatus' => 'READY|IN_PROGRESS|FAILED', ]
Result Details
Members
- CreateConnectionStatus
-
- Type: string
The status of the connection creation request. The request can take some time for certain authentication types, for example when creating an OAuth connection with token exchange over VPC.
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- GlueEncryptionException:
An encryption operation failed.
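The ConnectionInput structure for CreateConnection nests several required members (Name, ConnectionType, ConnectionProperties). The following sketch assembles a JDBC connection request and interprets CreateConnectionStatus. The property keys JDBC_CONNECTION_URL and SECRET_ID are assumptions about the ConnectionProperties map and are not listed on this page. The helper names, URL, and secret name are hypothetical.

```php
<?php
// Hypothetical helper: assembles a JDBC ConnectionInput.
// Assumption: JDBC_CONNECTION_URL and SECRET_ID are accepted property keys.
function buildJdbcConnectionParams(
    string $name,
    string $jdbcUrl,
    string $secretId,
    array $tags = []
): array {
    $params = [
        'ConnectionInput' => [
            'Name'                 => $name,
            'ConnectionType'       => 'JDBC',
            'ConnectionProperties' => [
                'JDBC_CONNECTION_URL' => $jdbcUrl,
                'SECRET_ID'           => $secretId,
            ],
        ],
    ];
    if ($tags !== []) {
        $params['Tags'] = $tags;
    }
    return $params;
}

// READY means the connection can be used; IN_PROGRESS and FAILED do not.
function connectionIsReady(string $status): bool
{
    return $status === 'READY';
}

$params = buildJdbcConnectionParams(
    'orders-postgres',
    'jdbc:postgresql://db.example.internal:5432/orders', // placeholder URL
    'example/orders-db-credentials'                      // placeholder secret name
);

// With a configured client:
// $result = $client->createConnection($params);
// if (!connectionIsReady($result['CreateConnectionStatus'])) {
//     // Creation is still in progress or has failed.
// }
```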
CreateCrawler
$result = $client->createCrawler([/* ... */]);
$promise = $client->createCrawlerAsync([/* ... */]);
Creates a new crawler with specified targets, role, configuration, and optional schedule. At least one crawl target must be specified, in the S3Targets field, the JdbcTargets field, or the DynamoDBTargets field.
Parameter Syntax
$result = $client->createCrawler([ 'Classifiers' => ['<string>', ...], 'Configuration' => '<string>', 'CrawlerSecurityConfiguration' => '<string>', 'DatabaseName' => '<string>', 'Description' => '<string>', 'LakeFormationConfiguration' => [ 'AccountId' => '<string>', 'UseLakeFormationCredentials' => true || false, ], 'LineageConfiguration' => [ 'CrawlerLineageSettings' => 'ENABLE|DISABLE', ], 'Name' => '<string>', // REQUIRED 'RecrawlPolicy' => [ 'RecrawlBehavior' => 'CRAWL_EVERYTHING|CRAWL_NEW_FOLDERS_ONLY|CRAWL_EVENT_MODE', ], 'Role' => '<string>', // REQUIRED 'Schedule' => '<string>', 'SchemaChangePolicy' => [ 'DeleteBehavior' => 'LOG|DELETE_FROM_DATABASE|DEPRECATE_IN_DATABASE', 'UpdateBehavior' => 'LOG|UPDATE_IN_DATABASE', ], 'TablePrefix' => '<string>', 'Tags' => ['<string>', ...], 'Targets' => [ // REQUIRED 'CatalogTargets' => [ [ 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', // REQUIRED 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Tables' => ['<string>', ...], // REQUIRED ], // ... ], 'DeltaTargets' => [ [ 'ConnectionName' => '<string>', 'CreateNativeDeltaTable' => true || false, 'DeltaTables' => ['<string>', ...], 'WriteManifest' => true || false, ], // ... ], 'DynamoDBTargets' => [ [ 'Path' => '<string>', 'scanAll' => true || false, 'scanRate' => <float>, ], // ... ], 'HudiTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... ], 'IcebergTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... ], 'JdbcTargets' => [ [ 'ConnectionName' => '<string>', 'EnableAdditionalMetadata' => ['<string>', ...], 'Exclusions' => ['<string>', ...], 'Path' => '<string>', ], // ... ], 'MongoDBTargets' => [ [ 'ConnectionName' => '<string>', 'Path' => '<string>', 'ScanAll' => true || false, ], // ... 
], 'S3Targets' => [ [ 'ConnectionName' => '<string>', 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Exclusions' => ['<string>', ...], 'Path' => '<string>', 'SampleSize' => <integer>, ], // ... ], ], ]);
Parameter Details
Members
- Classifiers
-
- Type: Array of strings
A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.
- Configuration
-
- Type: string
Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see Setting crawler configuration options.
- CrawlerSecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used by this crawler.
- DatabaseName
-
- Type: string
The Glue database where results are written, such as: arn:aws:daylight:us-east-1::database/sometable/*.
- Description
-
- Type: string
A description of the new crawler.
- LakeFormationConfiguration
-
- Type: LakeFormationConfiguration structure
Specifies Lake Formation configuration settings for the crawler.
- LineageConfiguration
-
- Type: LineageConfiguration structure
Specifies data lineage configuration settings for the crawler.
- Name
-
- Required: Yes
- Type: string
Name of the new crawler.
- RecrawlPolicy
-
- Type: RecrawlPolicy structure
A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.
- Role
-
- Required: Yes
- Type: string
The IAM role or Amazon Resource Name (ARN) of an IAM role used by the new crawler to access customer resources.
- Schedule
-
- Type: string
A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
- SchemaChangePolicy
-
- Type: SchemaChangePolicy structure
The policy for the crawler's update and deletion behavior.
- TablePrefix
-
- Type: string
The table prefix used for catalog tables that are created.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags to use with this crawler request. You may use tags to limit access to the crawler. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.
- Targets
-
- Required: Yes
- Type: CrawlerTargets structure
A collection of targets to crawl.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- AlreadyExistsException:
A resource to be created or added already exists.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
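CreateCrawler requires Name, Role, and at least one crawl target, and optionally takes a cron schedule. The following sketch checks the target requirement locally before building the request. The helper name, role ARN, and bucket path are hypothetical, and the schedule reuses the example expression given for the Schedule member.

```php
<?php
// Hypothetical helper: drops empty target lists and refuses to build a
// request that has no crawl target at all.
function buildCreateCrawlerParams(
    string $name,
    string $role,
    array $targets,
    ?string $schedule = null
): array {
    $nonEmpty = array_filter($targets, static function ($list) {
        return is_array($list) && count($list) > 0;
    });
    if ($nonEmpty === []) {
        throw new InvalidArgumentException('At least one crawl target must be specified.');
    }
    $params = ['Name' => $name, 'Role' => $role, 'Targets' => $nonEmpty];
    if ($schedule !== null) {
        $params['Schedule'] = $schedule;
    }
    return $params;
}

$params = buildCreateCrawlerParams(
    'orders-crawler',
    'arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole', // placeholder ARN
    [
        'S3Targets'   => [['Path' => 's3://example-bucket/orders/']],
        'JdbcTargets' => [],
    ],
    'cron(15 12 * * ? *)' // every day at 12:15 UTC
);

// With a configured client:
// $client->createCrawler($params);
```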
CreateCustomEntityType
$result = $client->createCustomEntityType([/* ... */]);
$promise = $client->createCustomEntityTypeAsync([/* ... */]);
Creates a custom pattern that is used to detect sensitive data across the columns and rows of your structured data.
Each custom pattern you create specifies a regular expression and an optional list of context words. If no context words are passed, only the regular expression is checked.
Parameter Syntax
$result = $client->createCustomEntityType([ 'ContextWords' => ['<string>', ...], 'Name' => '<string>', // REQUIRED 'RegexString' => '<string>', // REQUIRED 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- ContextWords
-
- Type: Array of strings
A list of context words. If none of these context words are found within the vicinity of the regular expression, the data will not be detected as sensitive data.
If no context words are passed, only the regular expression is checked.
- Name
-
- Required: Yes
- Type: string
A name for the custom pattern that allows it to be retrieved or deleted later. This name must be unique per Amazon Web Services account.
- RegexString
-
- Required: Yes
- Type: string
A regular expression string that is used for detecting sensitive data in a custom pattern.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
A list of tags applied to the custom entity type.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the custom pattern you created.
Errors
- AccessDeniedException:
Access to a resource was denied.
- AlreadyExistsException:
A resource to be created or added already exists.
- IdempotentParameterMismatchException:
The same unique identifier was associated with two different records.
- InternalServiceException:
An internal service error occurred.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
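A custom pattern for CreateCustomEntityType combines a required RegexString with optional ContextWords. The following sketch builds such a request and runs a local check that the pattern compiles. That check uses PHP's PCRE engine, which may accept or reject patterns differently from the service, so it is only a rough guard. The helper name, pattern, and context words are hypothetical.

```php
<?php
// Hypothetical helper: builds CreateCustomEntityType parameters.
// The compile check is local only; the service may use different regex rules.
function buildCustomEntityTypeParams(string $name, string $regex, array $contextWords = []): array
{
    $delimited = '~' . str_replace('~', '\~', $regex) . '~';
    if (@preg_match($delimited, '') === false) {
        throw new InvalidArgumentException("RegexString does not compile: $regex");
    }
    $params = ['Name' => $name, 'RegexString' => $regex];
    if ($contextWords !== []) {
        // Omitting ContextWords means only the regular expression is checked.
        $params['ContextWords'] = array_values($contextWords);
    }
    return $params;
}

$params = buildCustomEntityTypeParams('EmployeeId', 'EMP-[0-9]{6}', ['employee', 'badge']);

// With a configured client:
// $result = $client->createCustomEntityType($params);
// echo $result['Name'];
```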
CreateDataQualityRuleset
$result = $client->createDataQualityRuleset([/* ... */]);
$promise = $client->createDataQualityRulesetAsync([/* ... */]);
Creates a data quality ruleset with DQDL rules applied to a specified Glue table.
You create the ruleset using the Data Quality Definition Language (DQDL). For more information, see the Glue developer guide.
Parameter Syntax
$result = $client->createDataQualityRuleset([ 'ClientToken' => '<string>', 'DataQualitySecurityConfiguration' => '<string>', 'Description' => '<string>', 'Name' => '<string>', // REQUIRED 'Ruleset' => '<string>', // REQUIRED 'Tags' => ['<string>', ...], 'TargetTable' => [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', // REQUIRED 'TableName' => '<string>', // REQUIRED ], ]);
Parameter Details
Members
- ClientToken
-
- Type: string
Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.
- DataQualitySecurityConfiguration
-
- Type: string
The name of the security configuration created with the data quality encryption option.
- Description
-
- Type: string
A description of the data quality ruleset.
- Name
-
- Required: Yes
- Type: string
A unique name for the data quality ruleset.
- Ruleset
-
- Required: Yes
- Type: string
A Data Quality Definition Language (DQDL) ruleset. For more information, see the Glue developer guide.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
A list of tags applied to the data quality ruleset.
- TargetTable
-
- Type: DataQualityTargetTable structure
A target table associated with the data quality ruleset.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
A unique name for the data quality ruleset.
Errors
- InvalidInputException:
The input provided was not valid.
- AlreadyExistsException:
A resource to be created or added already exists.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
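For CreateDataQualityRuleset, ClientToken is recommended to be a random ID such as a UUID, and Ruleset is a DQDL string. The following sketch generates a version 4 UUID and assembles a small ruleset. The helper names, the table names, and the two rules are illustrative assumptions, and the Glue developer guide remains the reference for DQDL syntax.

```php
<?php
// Generates a random version 4 UUID for use as ClientToken.
function randomUuidV4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set RFC 4122 variant
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

// Hypothetical helper: joins individual rules into one DQDL ruleset string.
function buildCreateRulesetParams(string $name, array $rules, string $database, string $table): array
{
    return [
        'Name'        => $name,
        'Ruleset'     => 'Rules = [ ' . implode(', ', $rules) . ' ]',
        'ClientToken' => randomUuidV4(),
        'TargetTable' => ['DatabaseName' => $database, 'TableName' => $table],
    ];
}

$params = buildCreateRulesetParams(
    'orders-quality',
    ['IsComplete "order_id"', 'IsUnique "order_id"'], // illustrative rules
    'sales_db',
    'orders'
);

// With a configured client:
// $result = $client->createDataQualityRuleset($params);
```

When retrying a failed request, the same ClientToken would need to be reused for the idempotency behavior described above to apply, so a caller would store the token rather than regenerate it.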
CreateDatabase
$result = $client->createDatabase([/* ... */]);
$promise = $client->createDatabaseAsync([/* ... */]);
Creates a new database in a Data Catalog.
Parameter Syntax
$result = $client->createDatabase([ 'CatalogId' => '<string>', 'DatabaseInput' => [ // REQUIRED 'CreateTableDefaultPermissions' => [ [ 'Permissions' => ['<string>', ...], 'Principal' => [ 'DataLakePrincipalIdentifier' => '<string>', ], ], // ... ], 'Description' => '<string>', 'FederatedDatabase' => [ 'ConnectionName' => '<string>', 'Identifier' => '<string>', ], 'LocationUri' => '<string>', 'Name' => '<string>', // REQUIRED 'Parameters' => ['<string>', ...], 'TargetDatabase' => [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'Region' => '<string>', ], ], 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which to create the database. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseInput
-
- Required: Yes
- Type: DatabaseInput structure
The metadata for the database.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags you assign to the database.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- AlreadyExistsException:
A resource to be created or added already exists.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- FederatedResourceAlreadyExistsException:
A federated resource already exists.
CreateDevEndpoint
$result = $client->createDevEndpoint([/* ... */]);
$promise = $client->createDevEndpointAsync([/* ... */]);
Creates a new development endpoint.
Parameter Syntax
$result = $client->createDevEndpoint([ 'Arguments' => ['<string>', ...], 'EndpointName' => '<string>', // REQUIRED 'ExtraJarsS3Path' => '<string>', 'ExtraPythonLibsS3Path' => '<string>', 'GlueVersion' => '<string>', 'NumberOfNodes' => <integer>, 'NumberOfWorkers' => <integer>, 'PublicKey' => '<string>', 'PublicKeys' => ['<string>', ...], 'RoleArn' => '<string>', // REQUIRED 'SecurityConfiguration' => '<string>', 'SecurityGroupIds' => ['<string>', ...], 'SubnetId' => '<string>', 'Tags' => ['<string>', ...], 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ]);
Parameter Details
Members
- Arguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
A map of arguments used to configure the DevEndpoint.
- EndpointName
-
- Required: Yes
- Type: string
The name to be assigned to the new DevEndpoint.
- ExtraJarsS3Path
-
- Type: string
The path to one or more Java .jar files in an S3 bucket that should be loaded in your DevEndpoint.
- ExtraPythonLibsS3Path
-
- Type: string
The paths to one or more Python libraries in an Amazon S3 bucket that should be loaded in your DevEndpoint. Multiple values must be complete paths separated by a comma.
You can only use pure Python libraries with a DevEndpoint. Libraries that rely on C extensions, such as the pandas Python data analysis library, are not yet supported.
- GlueVersion
-
- Type: string
Glue version determines the versions of Apache Spark and Python that Glue supports. The Python version indicates the version supported for running your ETL scripts on development endpoints.
For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.
Development endpoints that are created without specifying a Glue version default to Glue 0.9.
You can specify a version of Python support for development endpoints by using the Arguments parameter in the CreateDevEndpoint or UpdateDevEndpoint APIs. If no arguments are provided, the version defaults to Python 2.
- NumberOfNodes
-
- Type: int
The number of Glue Data Processing Units (DPUs) to allocate to this DevEndpoint.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated to the development endpoint.
The maximum number of workers you can define is 299 for G.1X, and 149 for G.2X.
- PublicKey
-
- Type: string
The public key to be used by this DevEndpoint for authentication. This attribute is provided for backward compatibility because the recommended attribute to use is public keys.
- PublicKeys
-
- Type: Array of strings
A list of public keys to be used by the development endpoints for authentication. The use of this attribute is preferred over a single public key because the public keys allow you to have a different private key per client.
If you previously created an endpoint with a public key, you must remove that key to be able to set a list of public keys. Call the UpdateDevEndpoint API with the public key content in the deletePublicKeys attribute, and the list of new keys in the addPublicKeys attribute.
- RoleArn
-
- Required: Yes
- Type: string
The IAM role for the DevEndpoint.
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used with this DevEndpoint.
- SecurityGroupIds
-
- Type: Array of strings
Security group IDs for the security groups to be used by the new DevEndpoint.
- SubnetId
-
- Type: string
The subnet ID for the new DevEndpoint to use.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags to use with this DevEndpoint. You may use tags to limit access to the DevEndpoint. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated to the development endpoint. Accepts a value of Standard, G.1X, or G.2X.
- For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory and a 50 GB disk, and 2 executors per worker.
- For the G.1X worker type, each worker maps to 1 DPU (4 vCPU, 16 GB of memory, 64 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.
- For the G.2X worker type, each worker maps to 2 DPU (8 vCPU, 32 GB of memory, 128 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.
Known issue: when a development endpoint is created with the G.2X WorkerType configuration, the Spark drivers for the development endpoint will run on 4 vCPU, 16 GB of memory, and a 64 GB disk.
Result Syntax
[ 'Arguments' => ['<string>', ...], 'AvailabilityZone' => '<string>', 'CreatedTimestamp' => <DateTime>, 'EndpointName' => '<string>', 'ExtraJarsS3Path' => '<string>', 'ExtraPythonLibsS3Path' => '<string>', 'FailureReason' => '<string>', 'GlueVersion' => '<string>', 'NumberOfNodes' => <integer>, 'NumberOfWorkers' => <integer>, 'RoleArn' => '<string>', 'SecurityConfiguration' => '<string>', 'SecurityGroupIds' => ['<string>', ...], 'Status' => '<string>', 'SubnetId' => '<string>', 'VpcId' => '<string>', 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', 'YarnEndpointAddress' => '<string>', 'ZeppelinRemoteSparkInterpreterPort' => <integer>, ]
Result Details
Members
- Arguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
The map of arguments used to configure this DevEndpoint.
Valid arguments are:
- "--enable-glue-datacatalog": ""
You can specify a version of Python support for development endpoints by using the Arguments parameter in the CreateDevEndpoint or UpdateDevEndpoint APIs. If no arguments are provided, the version defaults to Python 2.
- AvailabilityZone
-
- Type: string
The Amazon Web Services Availability Zone where this DevEndpoint is located.
- CreatedTimestamp
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The point in time at which this DevEndpoint was created.
- EndpointName
-
- Type: string
The name assigned to the new DevEndpoint.
- ExtraJarsS3Path
-
- Type: string
Path to one or more Java .jar files in an S3 bucket that will be loaded in your DevEndpoint.
- ExtraPythonLibsS3Path
-
- Type: string
The paths to one or more Python libraries in an S3 bucket that will be loaded in your DevEndpoint.
- FailureReason
-
- Type: string
The reason for a current failure in this DevEndpoint.
- GlueVersion
-
- Type: string
Glue version determines the versions of Apache Spark and Python that Glue supports. The Python version indicates the version supported for running your ETL scripts on development endpoints.
For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.
- NumberOfNodes
-
- Type: int
The number of Glue Data Processing Units (DPUs) allocated to this DevEndpoint.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated to the development endpoint.
- RoleArn
-
- Type: string
The Amazon Resource Name (ARN) of the role assigned to the new DevEndpoint.
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure being used with this DevEndpoint.
- SecurityGroupIds
-
- Type: Array of strings
The security groups assigned to the new DevEndpoint.
- Status
-
- Type: string
The current status of the new DevEndpoint.
- SubnetId
-
- Type: string
The subnet ID assigned to the new DevEndpoint.
- VpcId
-
- Type: string
The ID of the virtual private cloud (VPC) used by this DevEndpoint.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated to the development endpoint. May be a value of Standard, G.1X, or G.2X.
- YarnEndpointAddress
-
- Type: string
The address of the YARN endpoint used by this DevEndpoint.
- ZeppelinRemoteSparkInterpreterPort
-
- Type: int
The Apache Zeppelin port for the remote Apache Spark interpreter.
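As a rough illustration of how the result members above might be consumed, the sketch below condenses a CreateDevEndpoint result into a one-line summary. It uses a plain PHP array in place of the SDK result object, which supports the same array-style access. The helper name summarizeDevEndpoint and all sample values are hypothetical.

```php
<?php
// Hypothetical helper: condenses a CreateDevEndpoint result into one line.
// $result may be a plain array or any array-accessible object.
function summarizeDevEndpoint($result): string
{
    $name   = $result['EndpointName'] ?? '(unnamed)';
    $status = $result['Status'] ?? 'UNKNOWN';

    // Prefer NumberOfWorkers/WorkerType when both are present;
    // otherwise fall back to NumberOfNodes (DPUs).
    if (isset($result['NumberOfWorkers'], $result['WorkerType'])) {
        $capacity = sprintf('%d x %s workers', $result['NumberOfWorkers'], $result['WorkerType']);
    } else {
        $capacity = sprintf('%d DPUs', $result['NumberOfNodes'] ?? 0);
    }

    $summary = sprintf('%s [%s] %s', $name, $status, $capacity);

    // Append the failure reason only when one is reported.
    if (!empty($result['FailureReason'])) {
        $summary .= ' - failure: ' . $result['FailureReason'];
    }
    return $summary;
}

// Sample values are made up for illustration.
$sample = [
    'EndpointName'    => 'my-dev-endpoint',
    'Status'          => 'PROVISIONING',
    'NumberOfWorkers' => 2,
    'WorkerType'      => 'G.1X',
];
echo summarizeDevEndpoint($sample), PHP_EOL;
```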
Errors
- AccessDeniedException:
Access to a resource was denied.
- AlreadyExistsException:
A resource to be created or added already exists.
- IdempotentParameterMismatchException:
The same unique identifier was associated with two different records.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- ValidationException:
A value could not be validated.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
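One possible way to react to the errors listed above is to sort their codes into "worth retrying" and "fix the request first". The sketch below does this in plain PHP. The grouping is an assumption made for illustration, not documented SDK behavior, and the name classifyGlueError is hypothetical. In the SDK, the error code of a caught Aws\Exception\AwsException is available through getAwsErrorCode().

```php
<?php
// Error codes listed for this operation, grouped by a suggested reaction.
// The grouping is an assumption of this sketch, not SDK behavior.
const GLUE_TRANSIENT_ERRORS = [
    'InternalServiceException',
    'OperationTimeoutException',
];
const GLUE_CALLER_ERRORS = [
    'AccessDeniedException',
    'AlreadyExistsException',
    'IdempotentParameterMismatchException',
    'InvalidInputException',
    'ValidationException',
    'ResourceNumberLimitExceededException',
];

// Hypothetical helper: returns 'retry', 'do-not-retry', or 'unknown'.
function classifyGlueError(string $code): string
{
    if (in_array($code, GLUE_TRANSIENT_ERRORS, true)) {
        return 'retry';
    }
    if (in_array($code, GLUE_CALLER_ERRORS, true)) {
        return 'do-not-retry';
    }
    return 'unknown';
}

// With the SDK installed and a configured client in $client, usage might be:
// try {
//     $client->createDevEndpoint($params);
// } catch (\Aws\Exception\AwsException $e) {
//     echo classifyGlueError($e->getAwsErrorCode() ?? ''), PHP_EOL;
// }
echo classifyGlueError('OperationTimeoutException'), PHP_EOL;
```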
CreateJob
$result = $client->createJob([/* ... */]);
$promise = $client->createJobAsync([/* ... */]);
Creates a new job definition.
Parameter Syntax
$result = $client->createJob([ 'AllocatedCapacity' => <integer>, 'CodeGenConfigurationNodes' => [ '<NodeId>' => [ 'Aggregate' => [ 'Aggs' => [ // REQUIRED [ 'AggFunc' => 'avg|countDistinct|count|first|last|kurtosis|max|min|skewness|stddev_samp|stddev_pop|sum|sumDistinct|var_samp|var_pop', // REQUIRED 'Column' => ['<string>', ...], // REQUIRED ], // ... ], 'Groups' => [ // REQUIRED ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED ], 'AmazonRedshiftSource' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', ], 'AmazonRedshiftTarget' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'ApplyMapping' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Mapping' => [ // REQUIRED [ 'Children' => [...], // RECURSIVE 'Dropped' => true || false, 'FromPath' => ['<string>', ...], 'FromType' => '<string>', 'ToKey' => '<string>', 'ToType' => '<string>', ], // ... 
], 'Name' => '<string>', // REQUIRED ], 'AthenaConnectorSource' => [ 'ConnectionName' => '<string>', // REQUIRED 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'SchemaName' => '<string>', // REQUIRED ], 'CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', // REQUIRED ], 'CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', // REQUIRED ], 'CatalogKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', // REQUIRED 'DetectSchema' => true || false, 'Name' => '<string>', // REQUIRED 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <integer || string || DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'Table' => '<string>', // REQUIRED 'WindowSize' => <integer>, ], 
'CatalogKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', // REQUIRED 'DetectSchema' => true || false, 'Name' => '<string>', // REQUIRED 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' => true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <integer || string || DateTime>, 'StreamArn' => '<string>', 'StreamName' => '<string>', ], 'Table' => '<string>', // REQUIRED 'WindowSize' => <integer>, ], 'CatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'CatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Table' => '<string>', // REQUIRED ], 'ConnectorDataSource' => [ 'ConnectionType' => '<string>', // REQUIRED 'Data' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... 
], ], 'ConnectorDataTarget' => [ 'ConnectionType' => '<string>', // REQUIRED 'Data' => ['<string>', ...], // REQUIRED 'Inputs' => ['<string>', ...], 'Name' => '<string>', // REQUIRED ], 'CustomCode' => [ 'ClassName' => '<string>', // REQUIRED 'Code' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'DirectJDBCSource' => [ 'ConnectionName' => '<string>', // REQUIRED 'ConnectionType' => 'sqlserver|mysql|oracle|postgresql|redshift', // REQUIRED 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', // REQUIRED ], 'DirectKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', // REQUIRED 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <integer || string || DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'WindowSize' => <integer>, ], 'DirectKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', // REQUIRED 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' => true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 
'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <integer || string || DateTime>, 'StreamArn' => '<string>', 'StreamName' => '<string>', ], 'WindowSize' => <integer>, ], 'DropDuplicates' => [ 'Columns' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED ], 'DropFields' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Paths' => [ // REQUIRED ['<string>', ...], // ... ], ], 'DropNullFields' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'NullCheckBoxList' => [ 'IsEmpty' => true || false, 'IsNegOne' => true || false, 'IsNullString' => true || false, ], 'NullTextList' => [ [ 'Datatype' => [ // REQUIRED 'Id' => '<string>', // REQUIRED 'Label' => '<string>', // REQUIRED ], 'Value' => '<string>', // REQUIRED ], // ... ], ], 'DynamicTransform' => [ 'FunctionName' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Parameters' => [ [ 'IsOptional' => true || false, 'ListType' => 'str|int|float|complex|bool|list|null', 'Name' => '<string>', // REQUIRED 'Type' => 'str|int|float|complex|bool|list|null', // REQUIRED 'ValidationMessage' => '<string>', 'ValidationRule' => '<string>', 'Value' => ['<string>', ...], ], // ... 
], 'Path' => '<string>', // REQUIRED 'TransformName' => '<string>', // REQUIRED 'Version' => '<string>', ], 'DynamoDBCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'EvaluateDataQuality' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Output' => 'PrimaryInput|EvaluationResults', 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', // REQUIRED 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'EvaluateDataQualityMultiFrame' => [ 'AdditionalDataSources' => ['<string>', ...], 'AdditionalOptions' => ['<string>', ...], 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', // REQUIRED 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'FillMissingValues' => [ 'FilledPath' => '<string>', 'ImputedPath' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED ], 'Filter' => [ 'Filters' => [ // REQUIRED [ 'Negated' => true || false, 'Operation' => 'EQ|LT|GT|LTE|GTE|REGEX|ISNULL', // REQUIRED 'Values' => [ // REQUIRED [ 'Type' => 'COLUMNEXTRACTED|CONSTANT', // REQUIRED 'Value' => ['<string>', ...], // REQUIRED ], // ... ], ], // ... 
], 'Inputs' => ['<string>', ...], // REQUIRED 'LogicalOperator' => 'AND|OR', // REQUIRED 'Name' => '<string>', // REQUIRED ], 'GovernedCatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionPredicate' => '<string>', 'Table' => '<string>', // REQUIRED ], 'GovernedCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', // REQUIRED ], 'JDBCConnectorSource' => [ 'AdditionalOptions' => [ 'DataTypeMapping' => ['<string>', ...], 'FilterPredicate' => '<string>', 'JobBookmarkKeys' => ['<string>', ...], 'JobBookmarkKeysSortOrder' => '<string>', 'LowerBound' => <integer>, 'NumPartitions' => <integer>, 'PartitionColumn' => '<string>', 'UpperBound' => <integer>, ], 'ConnectionName' => '<string>', // REQUIRED 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Query' => '<string>', ], 'JDBCConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', // REQUIRED 'ConnectionTable' => '<string>', // REQUIRED 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'Join' => [ 'Columns' => [ // REQUIRED [ 'From' => '<string>', // REQUIRED 'Keys' => [ // REQUIRED ['<string>', ...], // ... ], ], // ... 
], 'Inputs' => ['<string>', ...], // REQUIRED 'JoinType' => 'equijoin|left|right|outer|leftsemi|leftanti', // REQUIRED 'Name' => '<string>', // REQUIRED ], 'Merge' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PrimaryKeys' => [ // REQUIRED ['<string>', ...], // ... ], 'Source' => '<string>', // REQUIRED ], 'MicrosoftSQLServerCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'MicrosoftSQLServerCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'MySQLCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'MySQLCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'OracleSQLCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'OracleSQLCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'PIIDetection' => [ 'EntityTypesToDetect' => ['<string>', ...], // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'MaskValue' => '<string>', 'Name' => '<string>', // REQUIRED 'OutputColumnName' => '<string>', 'PiiType' => 'RowAudit|RowMasking|ColumnAudit|ColumnMasking', // REQUIRED 'SampleFraction' => <float>, 'ThresholdFraction' => <float>, ], 'PostgreSQLCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'PostgreSQLCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED 
], 'Recipe' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'RecipeReference' => [ 'RecipeArn' => '<string>', // REQUIRED 'RecipeVersion' => '<string>', // REQUIRED ], 'RecipeSteps' => [ [ 'Action' => [ // REQUIRED 'Operation' => '<string>', // REQUIRED 'Parameters' => ['<string>', ...], ], 'ConditionExpressions' => [ [ 'Condition' => '<string>', // REQUIRED 'TargetColumn' => '<string>', // REQUIRED 'Value' => '<string>', ], // ... ], ], // ... ], ], 'RedshiftSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', // REQUIRED 'TmpDirIAMRole' => '<string>', ], 'RedshiftTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', // REQUIRED 'TmpDirIAMRole' => '<string>', 'UpsertRedshiftOptions' => [ 'ConnectionName' => '<string>', 'TableLocation' => '<string>', 'UpsertKeys' => ['<string>', ...], ], ], 'RelationalCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'RenameField' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'SourcePath' => ['<string>', ...], // REQUIRED 'TargetPath' => ['<string>', ...], // REQUIRED ], 'S3CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', // REQUIRED ], 'S3CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... 
], 'Table' => '<string>', // REQUIRED ], 'S3CatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionPredicate' => '<string>', 'Table' => '<string>', // REQUIRED ], 'S3CatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', // REQUIRED ], 'S3CsvSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Escaper' => '<string>', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', // REQUIRED 'OptimizePerformance' => true || false, 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], // REQUIRED 'QuoteChar' => 'quote|quillemet|single_quote|disabled', // REQUIRED 'Recurse' => true || false, 'Separator' => 'comma|ctrla|pipe|semicolon|tab', // REQUIRED 'SkipFirst' => true || false, 'WithHeader' => true || false, 'WriteHeader' => true || false, ], 'S3DeltaCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', // REQUIRED ], 'S3DeltaDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Compression' => 'uncompressed|snappy', // REQUIRED 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', // REQUIRED 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3DeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], // REQUIRED ], 'S3DirectTarget' => [ 'Compression' => '<string>', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', // REQUIRED 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3GlueParquetTarget' => [ 'Compression' => 'snappy|lzo|gzip|uncompressed|none', 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'Path' => '<string>', // REQUIRED 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], // REQUIRED 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', // REQUIRED ], 'S3HudiDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], // REQUIRED 'Compression' => 'gzip|lzo|uncompressed|snappy', // REQUIRED 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', // REQUIRED 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... 
], 'Paths' => ['<string>', ...], // REQUIRED ], 'S3JsonSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'JsonPath' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], // REQUIRED 'Recurse' => true || false, ], 'S3ParquetSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'snappy|lzo|gzip|uncompressed|none', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], // REQUIRED 'Recurse' => true || false, ], 'SelectFields' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Paths' => [ // REQUIRED ['<string>', ...], // ... 
], ], 'SelectFromCollection' => [ 'Index' => <integer>, // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED ], 'SnowflakeSource' => [ 'Data' => [ // REQUIRED 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SnowflakeTarget' => [ 'Data' => [ // REQUIRED 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', // REQUIRED ], 'SparkConnectorSource' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', // REQUIRED 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', // REQUIRED 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkSQL' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'SqlAliases' => [ // REQUIRED [ 'Alias' => '<string>', // REQUIRED 'From' => '<string>', // REQUIRED ], // ... ], 'SqlQuery' => '<string>', // REQUIRED ], 'Spigot' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Path' => '<string>', // REQUIRED 'Prob' => <float>, 'Topk' => <integer>, ], 'SplitFields' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Paths' => [ // REQUIRED ['<string>', ...], // ... ], ], 'Union' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'UnionType' => 'ALL|DISTINCT', // REQUIRED ], ], // ... 
], 'Command' => [ // REQUIRED 'Name' => '<string>', 'PythonVersion' => '<string>', 'Runtime' => '<string>', 'ScriptLocation' => '<string>', ], 'Connections' => [ 'Connections' => ['<string>', ...], ], 'DefaultArguments' => ['<string>', ...], 'Description' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionProperty' => [ 'MaxConcurrentRuns' => <integer>, ], 'GlueVersion' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobRunQueuingEnabled' => true || false, 'LogUri' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'MaxRetries' => <integer>, 'Name' => '<string>', // REQUIRED 'NonOverridableArguments' => ['<string>', ...], 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'Role' => '<string>', // REQUIRED 'SecurityConfiguration' => '<string>', 'SourceControlDetails' => [ 'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER', 'AuthToken' => '<string>', 'Branch' => '<string>', 'Folder' => '<string>', 'LastCommitId' => '<string>', 'Owner' => '<string>', 'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT', 'Repository' => '<string>', ], 'Tags' => ['<string>', ...], 'Timeout' => <integer>, 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ]);
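Most of the syntax above is optional. As a minimal sketch, the example below builds a parameter array containing the three top-level keys marked REQUIRED (Command, Name, Role) plus a few optional ones, and checks for missing required keys before the call. The helper name missingCreateJobKeys is hypothetical, and the job name, role ARN, and S3 path are placeholders rather than real resources.

```php
<?php
// Hypothetical helper: lists the top-level keys marked REQUIRED in the
// syntax above (Command, Name, Role) that are absent from $params.
function missingCreateJobKeys(array $params): array
{
    $required = ['Command', 'Name', 'Role'];
    return array_values(array_diff($required, array_keys($params)));
}

// Placeholder values throughout; none refer to real resources.
$params = [
    'Name'    => 'example-etl-job',
    'Role'    => 'arn:aws:iam::123456789012:role/ExampleGlueRole',
    'Command' => [
        'Name'           => 'glueetl',
        'ScriptLocation' => 's3://example-bucket/scripts/job.py',
        'PythonVersion'  => '3',
    ],
    'GlueVersion'     => '4.0',
    'WorkerType'      => 'G.1X',
    'NumberOfWorkers' => 2,
];

$missing = missingCreateJobKeys($params);
if ($missing === []) {
    // With a configured Aws\Glue\GlueClient in $client, the call would be:
    // $result = $client->createJob($params);
    echo "Parameters look complete\n";
} else {
    echo 'Missing: ' . implode(', ', $missing) . "\n";
}
```

The check covers only the top-level keys. Nested REQUIRED members, such as those inside CodeGenConfigurationNodes, would need their own validation.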
Parameter Details
Members
- AllocatedCapacity
-
- Type: int
This parameter is deprecated. Use MaxCapacity instead.
The number of Glue data processing units (DPUs) to allocate to this job. You can allocate a minimum of 2 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
- CodeGenConfigurationNodes
-
- Type: Associative array of custom strings keys (NodeId) to CodeGenConfigurationNode structures
The representation of a directed acyclic graph on which both the Glue Studio visual component and Glue Studio code generation are based.
- Command
-
- Required: Yes
- Type: JobCommand structure
The JobCommand that runs this job.
- Connections
-
- Type: ConnectionsList structure
The connections used for this job.
- DefaultArguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
The default arguments for every run of this job, specified as name-value pairs.
You can specify arguments here that your own job-execution script consumes, as well as arguments that Glue itself consumes.
Job arguments may be logged. Do not pass plaintext secrets as arguments. Retrieve secrets from a Glue Connection, Secrets Manager or other secret management mechanism if you intend to keep them within the Job.
For information about how to specify and consume your own Job arguments, see the Calling Glue APIs in Python topic in the developer guide.
For information about the arguments you can provide to this field when configuring Spark jobs, see the Special Parameters Used by Glue topic in the developer guide.
For information about the arguments you can provide to this field when configuring Ray jobs, see Using job parameters in Ray jobs in the developer guide.
- Description
-
- Type: string
Description of the job being defined.
- ExecutionClass
-
- Type: string
Indicates whether the job is run with a standard or flexible execution class. The standard execution-class is ideal for time-sensitive workloads that require fast job startup and dedicated resources.
The flexible execution class is appropriate for time-insensitive jobs whose start and completion times may vary.
Only jobs with Glue version 3.0 and above and command type glueetl will be allowed to set ExecutionClass to FLEX. The flexible execution class is available for Spark jobs.
- ExecutionProperty
-
- Type: ExecutionProperty structure
An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.
- GlueVersion
-
- Type: string
In Spark jobs, GlueVersion determines the versions of Apache Spark and Python that are available in a job. The Python version indicates the version supported for jobs of type Spark.
Ray jobs should set GlueVersion to 4.0 or greater. However, the versions of Ray, Python and additional libraries available in your Ray job are determined by the Runtime parameter of the Job command.
For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.
Jobs that are created without specifying a Glue version default to Glue 0.9.
- JobMode
-
- Type: string
A mode that describes how a job was created. Valid values are:
- SCRIPT - The job was created using the Glue Studio script editor.
- VISUAL - The job was created using the Glue Studio visual editor.
- NOTEBOOK - The job was created using an interactive sessions notebook.
When the JobMode field is missing or null, SCRIPT is assigned as the default value.
- JobRunQueuingEnabled
-
- Type: boolean
Specifies whether job run queuing is enabled for the job runs for this job.
A value of true means job run queuing is enabled for the job runs. If false or not populated, the job runs will not be considered for queuing.
If this field does not match the value set in the job run, then the value from the job run field will be used.
- LogUri
-
- Type: string
This field is reserved for future use.
- MaintenanceWindow
-
- Type: string
This field specifies a day of the week and hour for a maintenance window for streaming jobs. Glue periodically performs maintenance activities. During these maintenance windows, Glue will need to restart your streaming jobs.
Glue will restart the job within 3 hours of the specified maintenance window. For instance, if you set up the maintenance window for Monday at 10:00 AM GMT, your jobs will be restarted between 10:00 AM GMT and 1:00 PM GMT.
- MaxCapacity
-
- Type: double
For Glue version 1.0 or earlier jobs, using the standard worker type, the number of Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
For Glue version 2.0+ jobs, you cannot specify a Maximum capacity. Instead, you should specify a Worker type and the Number of workers.
Do not set MaxCapacity if using WorkerType and NumberOfWorkers.
The value that can be allocated for MaxCapacity depends on whether you are running a Python shell job, an Apache Spark ETL job, or an Apache Spark streaming ETL job:
- When you specify a Python shell job (JobCommand.Name="pythonshell"), you can allocate either 0.0625 or 1 DPU. The default is 0.0625 DPU.
- When you specify an Apache Spark ETL job (JobCommand.Name="glueetl") or Apache Spark streaming ETL job (JobCommand.Name="gluestreaming"), you can allocate from 2 to 100 DPUs. The default is 10 DPUs. This job type cannot have a fractional DPU allocation.
- MaxRetries
-
- Type: int
The maximum number of times to retry this job if it fails.
- Name
-
- Required: Yes
- Type: string
The name you assign to this job definition. It must be unique in your account.
- NonOverridableArguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
Arguments for this job that are not overridden when providing job arguments in a job run, specified as name-value pairs.
- NotificationProperty
-
- Type: NotificationProperty structure
Specifies configuration properties of a job notification.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated when a job runs.
- Role
-
- Required: Yes
- Type: string
The name or Amazon Resource Name (ARN) of the IAM role associated with this job.
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used with this job.
- SourceControlDetails
-
- Type: SourceControlDetails structure
The details for a source control configuration for a job, allowing synchronization of job artifacts to or from a remote repository.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags to use with this job. You may use tags to limit access to the job. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.
- Timeout
-
- Type: int
The job timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours) for batch jobs.
Streaming jobs must have timeout values less than 7 days or 10080 minutes. When the value is left blank, the job will be restarted after 7 days if you have not set up a maintenance window. If you have set up a maintenance window, it will be restarted during the maintenance window after 7 days.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, G.8X or G.025X for Spark jobs. Accepts the value Z.2X for Ray jobs.
- For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, as it offers a scalable and cost-effective way to run most jobs.
- For the G.2X worker type, each worker maps to 2 DPU (8 vCPUs, 32 GB of memory) with 128GB disk (approximately 77GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, as it offers a scalable and cost-effective way to run most jobs.
- For the G.4X worker type, each worker maps to 4 DPU (16 vCPUs, 64 GB of memory) with 256GB disk (approximately 235GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).
- For the G.8X worker type, each worker maps to 8 DPU (32 vCPUs, 128 GB of memory) with 512GB disk (approximately 487GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.
- For the G.025X worker type, each worker maps to 0.25 DPU (2 vCPUs, 4 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for low volume streaming jobs. This worker type is only available for Glue version 3.0 streaming jobs.
- For the Z.2X worker type, each worker maps to 2 M-DPU (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120GB free), and provides up to 8 Ray workers based on the autoscaler.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The unique name that was provided for this job definition.
Errors
- InvalidInputException:
The input provided was not valid.
- IdempotentParameterMismatchException:
The same unique identifier was associated with two different records.
- AlreadyExistsException:
A resource to be created or added already exists.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
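The capacity rule for CreateJob (do not set MaxCapacity together with WorkerType and NumberOfWorkers) can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the SDK: buildCreateJobParams is a hypothetical helper, and the job name, role ARN, and script location are placeholder values. It only assembles and validates the parameter array; the commented-out line shows where the array would be passed to a configured client.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): assembles CreateJob
// parameters and enforces the documented rule that MaxCapacity must not be
// combined with WorkerType and NumberOfWorkers.
function buildCreateJobParams(string $name, string $role, array $command, array $options = []): array
{
    $usesWorkers = isset($options['WorkerType']) || isset($options['NumberOfWorkers']);

    if (isset($options['MaxCapacity']) && $usesWorkers) {
        throw new InvalidArgumentException(
            'Do not set MaxCapacity if using WorkerType and NumberOfWorkers.'
        );
    }

    // Name, Role, and Command are the required members.
    return array_merge($options, [
        'Name'    => $name,
        'Role'    => $role,
        'Command' => $command,
    ]);
}

// All values below are placeholders chosen for illustration.
$params = buildCreateJobParams(
    'example-etl-job',
    'arn:aws:iam::123456789012:role/ExampleGlueRole',
    ['Name' => 'glueetl', 'ScriptLocation' => 's3://example-bucket/scripts/job.py'],
    ['GlueVersion' => '4.0', 'WorkerType' => 'G.1X', 'NumberOfWorkers' => 2]
);

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createJob($params);
```

Validating locally is a design choice of this sketch; the service performs its own validation and reports problems through the errors listed above.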
CreateMLTransform
$result = $client->createMLTransform([/* ... */]);
$promise = $client->createMLTransformAsync([/* ... */]);
Creates a Glue machine learning transform. This operation creates the transform and all the necessary parameters to train it.
Call this operation as the first step in the process of using a machine learning transform (such as the FindMatches transform) for deduplicating data. You can provide an optional Description, in addition to the parameters that you want to use for your algorithm.
You must also specify certain parameters for the tasks that Glue runs on your behalf as part of learning from your data and creating a high-quality machine learning transform. These parameters include Role, and optionally, AllocatedCapacity, Timeout, and MaxRetries. For more information, see Jobs.
Parameter Syntax
$result = $client->createMLTransform([ 'Description' => '<string>', 'GlueVersion' => '<string>', 'InputRecordTables' => [ // REQUIRED [ 'AdditionalOptions' => ['<string>', ...], 'CatalogId' => '<string>', 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', // REQUIRED 'TableName' => '<string>', // REQUIRED ], // ... ], 'MaxCapacity' => <float>, 'MaxRetries' => <integer>, 'Name' => '<string>', // REQUIRED 'NumberOfWorkers' => <integer>, 'Parameters' => [ // REQUIRED 'FindMatchesParameters' => [ 'AccuracyCostTradeoff' => <float>, 'EnforceProvidedLabels' => true || false, 'PrecisionRecallTradeoff' => <float>, 'PrimaryKeyColumnName' => '<string>', ], 'TransformType' => 'FIND_MATCHES', // REQUIRED ], 'Role' => '<string>', // REQUIRED 'Tags' => ['<string>', ...], 'Timeout' => <integer>, 'TransformEncryption' => [ 'MlUserDataEncryption' => [ 'KmsKeyId' => '<string>', 'MlUserDataEncryptionMode' => 'DISABLED|SSE-KMS', // REQUIRED ], 'TaskRunSecurityConfigurationName' => '<string>', ], 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ]);
Parameter Details
Members
- Description
-
- Type: string
A description of the machine learning transform that is being defined. The default is an empty string.
- GlueVersion
-
- Type: string
This value determines which version of Glue this machine learning transform is compatible with. Glue 1.0 is recommended for most customers. If the value is not set, the Glue compatibility defaults to Glue 0.9. For more information, see Glue Versions in the developer guide.
- InputRecordTables
-
- Required: Yes
- Type: Array of GlueTable structures
A list of Glue table definitions used by the transform.
- MaxCapacity
-
- Type: double
The number of Glue data processing units (DPUs) that are allocated to task runs for this transform. You can allocate from 2 to 100 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
MaxCapacity is a mutually exclusive option with NumberOfWorkers and WorkerType.
- If either NumberOfWorkers or WorkerType is set, then MaxCapacity cannot be set.
- If MaxCapacity is set then neither NumberOfWorkers nor WorkerType can be set.
- If WorkerType is set, then NumberOfWorkers is required (and vice versa).
- MaxCapacity and NumberOfWorkers must both be at least 1.
When the WorkerType field is set to a value other than Standard, the MaxCapacity field is set automatically and becomes read-only.
- MaxRetries
-
- Type: int
The maximum number of times to retry a task for this transform after a task run fails.
- Name
-
- Required: Yes
- Type: string
The unique name that you give the transform when you create it.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated when this task runs.
If WorkerType is set, then NumberOfWorkers is required (and vice versa).
- Parameters
-
- Required: Yes
- Type: TransformParameters structure
The algorithmic parameters that are specific to the transform type used. Conditionally dependent on the transform type.
- Role
-
- Required: Yes
- Type: string
The name or Amazon Resource Name (ARN) of the IAM role with the required permissions. The required permissions include both Glue service role permissions to Glue resources, and Amazon S3 permissions required by the transform.
- This role needs Glue service role permissions to allow access to resources in Glue. See Attach a Policy to IAM Users That Access Glue.
- This role needs permission to your Amazon Simple Storage Service (Amazon S3) sources, targets, temporary directory, scripts, and any libraries used by the task run for this transform.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags to use with this machine learning transform. You may use tags to limit access to the machine learning transform. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.
- Timeout
-
- Type: int
The timeout of the task run for this transform in minutes. This is the maximum time that a task run for this transform can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
- TransformEncryption
-
- Type: TransformEncryption structure
The encryption-at-rest settings of the transform that apply to accessing user data. Machine learning transforms can access user data encrypted in Amazon S3 using KMS.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated when this task runs. Accepts a value of Standard, G.1X, or G.2X.
- For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory and a 50GB disk, and 2 executors per worker.
- For the G.1X worker type, each worker provides 4 vCPU, 16 GB of memory and a 64GB disk, and 1 executor per worker.
- For the G.2X worker type, each worker provides 8 vCPU, 32 GB of memory and a 128GB disk, and 1 executor per worker.
MaxCapacity is a mutually exclusive option with NumberOfWorkers and WorkerType.
- If either NumberOfWorkers or WorkerType is set, then MaxCapacity cannot be set.
- If MaxCapacity is set then neither NumberOfWorkers nor WorkerType can be set.
- If WorkerType is set, then NumberOfWorkers is required (and vice versa).
- MaxCapacity and NumberOfWorkers must both be at least 1.
Result Syntax
[ 'TransformId' => '<string>', ]
Result Details
Members
- TransformId
-
- Type: string
A unique identifier that is generated for the transform.
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- AccessDeniedException:
Access to a resource was denied.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- IdempotentParameterMismatchException:
The same unique identifier was associated with two different records.
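The four capacity rules for CreateMLTransform (MaxCapacity is mutually exclusive with NumberOfWorkers and WorkerType, the latter two must be set together, and numeric values must be at least 1) lend themselves to a small pre-flight check. The sketch below is an illustration only: checkTransformCapacity is a hypothetical helper, not an SDK function, and the messages it returns are invented for this example.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): returns a list of rule
// violations for the capacity-related CreateMLTransform parameters.
function checkTransformCapacity(array $params): array
{
    $errors     = [];
    $hasMax     = isset($params['MaxCapacity']);
    $hasWorkers = isset($params['NumberOfWorkers']);
    $hasType    = isset($params['WorkerType']);

    // MaxCapacity is mutually exclusive with NumberOfWorkers and WorkerType.
    if ($hasMax && ($hasWorkers || $hasType)) {
        $errors[] = 'MaxCapacity cannot be combined with NumberOfWorkers or WorkerType.';
    }
    // WorkerType and NumberOfWorkers must be set together.
    if ($hasType !== $hasWorkers) {
        $errors[] = 'WorkerType and NumberOfWorkers must be set together.';
    }
    // Both numeric values must be at least 1.
    if ($hasMax && $params['MaxCapacity'] < 1) {
        $errors[] = 'MaxCapacity must be at least 1.';
    }
    if ($hasWorkers && $params['NumberOfWorkers'] < 1) {
        $errors[] = 'NumberOfWorkers must be at least 1.';
    }

    return $errors;
}

// A valid combination: worker type plus worker count, no MaxCapacity.
$valid = checkTransformCapacity(['WorkerType' => 'G.1X', 'NumberOfWorkers' => 5]);

// An invalid combination: MaxCapacity with WorkerType, and no NumberOfWorkers.
$invalid = checkTransformCapacity(['MaxCapacity' => 10.0, 'WorkerType' => 'G.2X']);
```

Here $valid is an empty array, while $invalid reports two violations: the mutual-exclusion rule and the missing NumberOfWorkers.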
CreatePartition
$result = $client->createPartition([/* ... */]);
$promise = $client->createPartitionAsync([/* ... */]);
Creates a new partition.
Parameter Syntax
$result = $client->createPartition([ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', // REQUIRED 'PartitionInput' => [ // REQUIRED 'LastAccessTime' => <integer || string || DateTime>, 'LastAnalyzedTime' => <integer || string || DateTime>, 'Parameters' => ['<string>', ...], 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', // REQUIRED 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', // REQUIRED 'SortOrder' => <integer>, // REQUIRED ], // ... ], 'StoredAsSubDirectories' => true || false, ], 'Values' => ['<string>', ...], ], 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The Amazon Web Services account ID of the catalog in which the partition is to be created.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the metadata database in which the partition is to be created.
- PartitionInput
-
- Required: Yes
- Type: PartitionInput structure
A PartitionInput structure defining the partition to be created.
- TableName
-
- Required: Yes
- Type: string
The name of the metadata table in which the partition is to be created.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- AlreadyExistsException:
A resource to be created or added already exists.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- InternalServiceException:
An internal service error occurred.
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
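As an illustration of the CreatePartition parameter structure, the sketch below assembles a small request with only the required members plus Values and a storage Location. buildCreatePartitionParams is a hypothetical helper, and the database, table, and S3 path are placeholder values, not names taken from this documentation.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): builds CreatePartition
// parameters for a partition identified by its partition values.
function buildCreatePartitionParams(string $database, string $table, array $values, string $location): array
{
    return [
        'DatabaseName'   => $database,   // REQUIRED
        'TableName'      => $table,      // REQUIRED
        'PartitionInput' => [            // REQUIRED
            // The syntax above lists Values as strings, so cast each one.
            'Values'            => array_map('strval', $values),
            'StorageDescriptor' => [
                'Location' => $location,
            ],
        ],
    ];
}

// Placeholder names and path, for illustration only.
$params = buildCreatePartitionParams(
    'example_db',
    'events',
    [2024, '01'],
    's3://example-bucket/events/year=2024/month=01/'
);

// With a configured Aws\Glue\GlueClient in $client:
// $client->createPartition($params);
```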
CreatePartitionIndex
$result = $client->createPartitionIndex([/* ... */]);
$promise = $client->createPartitionIndexAsync([/* ... */]);
Creates a specified partition index in an existing table.
Parameter Syntax
$result = $client->createPartitionIndex([ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', // REQUIRED 'PartitionIndex' => [ // REQUIRED 'IndexName' => '<string>', // REQUIRED 'Keys' => ['<string>', ...], // REQUIRED ], 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The catalog ID where the table resides.
- DatabaseName
-
- Required: Yes
- Type: string
Specifies the name of a database in which you want to create a partition index.
- PartitionIndex
-
- Required: Yes
- Type: PartitionIndex structure
Specifies a PartitionIndex structure to create a partition index in an existing table.
- TableName
-
- Required: Yes
- Type: string
Specifies the name of a table in which you want to create a partition index.
Result Syntax
[]
Result Details
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
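A CreatePartitionIndex request nests the index name and its keys under PartitionIndex. The following sketch shows one way to assemble it. buildPartitionIndexParams is a hypothetical helper, the names are placeholders, and rejecting an empty key list is an assumption of this sketch rather than a rule stated in this documentation.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): builds CreatePartitionIndex
// parameters from an index name and a list of key names.
function buildPartitionIndexParams(string $database, string $table, string $indexName, array $keys): array
{
    // Assumption of this sketch: an index without keys is treated as an error.
    if ($keys === []) {
        throw new InvalidArgumentException('Keys is required and must not be empty.');
    }

    return [
        'DatabaseName'   => $database,   // REQUIRED
        'TableName'      => $table,      // REQUIRED
        'PartitionIndex' => [            // REQUIRED
            'IndexName' => $indexName,           // REQUIRED
            'Keys'      => array_values($keys),  // REQUIRED, re-indexed as a list
        ],
    ];
}

// Placeholder names, for illustration only.
$params = buildPartitionIndexParams('example_db', 'events', 'by_year_month', ['year', 'month']);

// With a configured Aws\Glue\GlueClient in $client:
// $client->createPartitionIndex($params);
```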
CreateRegistry
$result = $client->createRegistry([/* ... */]);
$promise = $client->createRegistryAsync([/* ... */]);
Creates a new registry which may be used to hold a collection of schemas.
Parameter Syntax
$result = $client->createRegistry([ 'Description' => '<string>', 'RegistryName' => '<string>', // REQUIRED 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- Description
-
- Type: string
A description of the registry. If description is not provided, there will not be any default value for this.
- RegistryName
-
- Required: Yes
- Type: string
The name of the registry to be created. It can be at most 255 characters long and may contain only letters, numbers, hyphens, underscores, dollar signs, or hash marks. No whitespace.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Amazon Web Services tags that contain a key value pair and may be searched by console, command line, or API.
Result Syntax
[ 'Description' => '<string>', 'RegistryArn' => '<string>', 'RegistryName' => '<string>', 'Tags' => ['<string>', ...], ]
Result Details
Members
- Description
-
- Type: string
A description of the registry.
- RegistryArn
-
- Type: string
The Amazon Resource Name (ARN) of the newly created registry.
- RegistryName
-
- Type: string
The name of the registry.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags for the registry.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- AlreadyExistsException:
A resource to be created or added already exists.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- InternalServiceException:
An internal service error occurred.
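The RegistryName constraints (at most 255 characters; letters, numbers, hyphen, underscore, dollar sign, or hash mark; no whitespace) can be approximated with a regular expression before calling CreateRegistry. This is a sketch under assumptions: isValidRegistryName is a hypothetical helper, "letters" is interpreted as ASCII letters, and the service's own validation remains authoritative.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): checks a registry name
// against the constraints described for RegistryName.
// Assumption: "letters" means ASCII letters A-Z and a-z.
function isValidRegistryName(string $name): bool
{
    // 1 to 255 characters from the allowed set; \A and \z anchor the whole string.
    return preg_match('/\A[A-Za-z0-9_$#-]{1,255}\z/', $name) === 1;
}

$name = 'orders-registry_1';   // placeholder name

$params = [
    'RegistryName' => $name,                      // REQUIRED
    'Description'  => 'Schemas for order events', // placeholder text
    'Tags'         => ['team' => 'data'],         // tag key => tag value
];

// With a configured Aws\Glue\GlueClient in $client:
// if (isValidRegistryName($name)) {
//     $result = $client->createRegistry($params);
// }
```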
CreateSchema
$result = $client->createSchema([/* ... */]);
$promise = $client->createSchemaAsync([/* ... */]);
Creates a new schema set and registers the schema definition. Returns an error if the schema set already exists without actually registering the version.
When the schema set is created, a version checkpoint will be set to the first version. Compatibility mode "DISABLED" restricts any additional schema versions from being added after the first schema version. For all other compatibility modes, validation of compatibility settings will be applied only from the second version onwards when the RegisterSchemaVersion API is used.
When this API is called without a RegistryId, this will create an entry for a "default-registry" in the registry database tables, if it is not already present.
Parameter Syntax
$result = $client->createSchema([ 'Compatibility' => 'NONE|DISABLED|BACKWARD|BACKWARD_ALL|FORWARD|FORWARD_ALL|FULL|FULL_ALL', 'DataFormat' => 'AVRO|JSON|PROTOBUF', // REQUIRED 'Description' => '<string>', 'RegistryId' => [ 'RegistryArn' => '<string>', 'RegistryName' => '<string>', ], 'SchemaDefinition' => '<string>', 'SchemaName' => '<string>', // REQUIRED 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- Compatibility
-
- Type: string
The compatibility mode of the schema. The possible values are:
- NONE: No compatibility mode applies. You can use this choice in development scenarios or if you do not know the compatibility mode that you want to apply to schemas. Any new version added will be accepted without undergoing a compatibility check.
- DISABLED: This compatibility choice prevents versioning for a particular schema. You can use this choice to prevent future versioning of a schema.
- BACKWARD: This compatibility choice is recommended as it allows data receivers to read both the current and one previous schema version. This means that for instance, a new schema version cannot drop data fields or change the type of these fields, so they can't be read by readers using the previous version.
- BACKWARD_ALL: This compatibility choice allows data receivers to read both the current and all previous schema versions. You can use this choice when you need to delete fields or add optional fields, and check compatibility against all previous schema versions.
- FORWARD: This compatibility choice allows data receivers to read both the current and one next schema version, but not necessarily later versions. You can use this choice when you need to add fields or delete optional fields, but only check compatibility against the last schema version.
- FORWARD_ALL: This compatibility choice allows data receivers to read data written by producers of any new registered schema. You can use this choice when you need to add fields or delete optional fields, and check compatibility against all previous schema versions.
- FULL: This compatibility choice allows data receivers to read data written by producers using the previous or next version of the schema, but not necessarily earlier or later versions. You can use this choice when you need to add or remove optional fields, but only check compatibility against the last schema version.
- FULL_ALL: This compatibility choice allows data receivers to read data written by producers using all previous schema versions. You can use this choice when you need to add or remove optional fields, and check compatibility against all previous schema versions.
- DataFormat
-
- Required: Yes
- Type: string
The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.
- Description
-
- Type: string
An optional description of the schema. If description is not provided, there will not be any automatic default value for this.
- RegistryId
-
- Type: RegistryId structure
This is a wrapper shape to contain the registry identity fields. If this is not provided, the default registry will be used. The ARN format for the same will be: arn:aws:glue:us-east-2:<customer id>:registry/default-registry:random-5-letter-id.
- SchemaDefinition
-
- Type: string
The schema definition using the DataFormat setting for SchemaName.
- SchemaName
-
- Required: Yes
- Type: string
The name of the schema to be created. It can be at most 255 characters long and may contain only letters, numbers, hyphens, underscores, dollar signs, or hash marks. No whitespace.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Amazon Web Services tags that contain a key value pair and may be searched by console, command line, or API. If specified, follows the Amazon Web Services tags-on-create pattern.
Result Syntax
[ 'Compatibility' => 'NONE|DISABLED|BACKWARD|BACKWARD_ALL|FORWARD|FORWARD_ALL|FULL|FULL_ALL', 'DataFormat' => 'AVRO|JSON|PROTOBUF', 'Description' => '<string>', 'LatestSchemaVersion' => <integer>, 'NextSchemaVersion' => <integer>, 'RegistryArn' => '<string>', 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaCheckpoint' => <integer>, 'SchemaName' => '<string>', 'SchemaStatus' => 'AVAILABLE|PENDING|DELETING', 'SchemaVersionId' => '<string>', 'SchemaVersionStatus' => 'AVAILABLE|PENDING|FAILURE|DELETING', 'Tags' => ['<string>', ...], ]
Result Details
Members
- Compatibility
-
- Type: string
The schema compatibility mode.
- DataFormat
-
- Type: string
The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.
- Description
-
- Type: string
A description of the schema if specified when created.
- LatestSchemaVersion
-
- Type: long (int|float)
The latest version of the schema associated with the returned schema definition.
- NextSchemaVersion
-
- Type: long (int|float)
The next version of the schema associated with the returned schema definition.
- RegistryArn
-
- Type: string
The Amazon Resource Name (ARN) of the registry.
- RegistryName
-
- Type: string
The name of the registry.
- SchemaArn
-
- Type: string
The Amazon Resource Name (ARN) of the schema.
- SchemaCheckpoint
-
- Type: long (int|float)
The version number of the checkpoint (the last time the compatibility mode was changed).
- SchemaName
-
- Type: string
The name of the schema.
- SchemaStatus
-
- Type: string
The status of the schema.
- SchemaVersionId
-
- Type: string
The unique identifier of the first schema version.
- SchemaVersionStatus
-
- Type: string
The status of the first schema version created.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags for the schema.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- AlreadyExistsException:
A resource to be created or added already exists.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- InternalServiceException:
An internal service error occurred.
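To make the CreateSchema parameters concrete, the sketch below builds a request for an Avro schema and rejects values outside the Compatibility and DataFormat enumerations shown in the parameter syntax. buildCreateSchemaParams is a hypothetical helper, the schema itself is a made-up example, and defaulting to BACKWARD is a choice of this sketch (the text above describes it as the recommended mode). Leaving out RegistryId relies on the documented fallback to the default registry.

```php
<?php
// Allowed values copied from the CreateSchema parameter syntax.
const SCHEMA_COMPATIBILITY_MODES = [
    'NONE', 'DISABLED', 'BACKWARD', 'BACKWARD_ALL',
    'FORWARD', 'FORWARD_ALL', 'FULL', 'FULL_ALL',
];
const SCHEMA_DATA_FORMATS = ['AVRO', 'JSON', 'PROTOBUF'];

// Hypothetical helper (not part of the AWS SDK): builds CreateSchema parameters.
function buildCreateSchemaParams(
    string $schemaName,
    string $dataFormat,
    string $definition,
    string $compatibility = 'BACKWARD',
    ?string $registryName = null
): array {
    if (!in_array($dataFormat, SCHEMA_DATA_FORMATS, true)) {
        throw new InvalidArgumentException('Unsupported DataFormat: ' . $dataFormat);
    }
    if (!in_array($compatibility, SCHEMA_COMPATIBILITY_MODES, true)) {
        throw new InvalidArgumentException('Unsupported Compatibility: ' . $compatibility);
    }

    $params = [
        'SchemaName'       => $schemaName,   // REQUIRED
        'DataFormat'       => $dataFormat,   // REQUIRED
        'SchemaDefinition' => $definition,
        'Compatibility'    => $compatibility,
    ];

    // Without RegistryId, the default registry is used.
    if ($registryName !== null) {
        $params['RegistryId'] = ['RegistryName' => $registryName];
    }

    return $params;
}

// A made-up Avro record definition, encoded as a JSON string.
$definition = json_encode([
    'type'   => 'record',
    'name'   => 'Order',
    'fields' => [['name' => 'id', 'type' => 'string']],
]);

$params = buildCreateSchemaParams('orders', 'AVRO', $definition);

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createSchema($params);
```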
CreateScript
$result = $client->createScript([/* ... */]);
$promise = $client->createScriptAsync([/* ... */]);
Transforms a directed acyclic graph (DAG) into code.
Parameter Syntax
$result = $client->createScript([ 'DagEdges' => [ [ 'Source' => '<string>', // REQUIRED 'Target' => '<string>', // REQUIRED 'TargetParameter' => '<string>', ], // ... ], 'DagNodes' => [ [ 'Args' => [ // REQUIRED [ 'Name' => '<string>', // REQUIRED 'Param' => true || false, 'Value' => '<string>', // REQUIRED ], // ... ], 'Id' => '<string>', // REQUIRED 'LineNumber' => <integer>, 'NodeType' => '<string>', // REQUIRED ], // ... ], 'Language' => 'PYTHON|SCALA', ]);
Parameter Details
Members
- DagEdges
-
- Type: Array of CodeGenEdge structures
A list of the edges in the DAG.
- DagNodes
-
- Type: Array of CodeGenNode structures
A list of the nodes in the DAG.
- Language
-
- Type: string
The programming language of the resulting code from the DAG.
Result Syntax
[ 'PythonScript' => '<string>', 'ScalaCode' => '<string>', ]
Result Details
Members
- PythonScript
-
- Type: string
The Python script generated from the DAG.
- ScalaCode
-
- Type: string
The Scala code generated from the DAG.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
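A CreateScript request describes the DAG as a list of nodes and a list of edges. The sketch below shows one way to assemble that structure and to check locally that every edge points at a declared node. dagNode and buildCreateScriptParams are hypothetical helpers, and the node types and argument names are placeholders: this documentation does not list the accepted NodeType values, so treat them as illustrative only.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): builds one CodeGenNode
// from an ID, a node type, and a name => value map of arguments.
function dagNode(string $id, string $nodeType, array $args): array
{
    $argList = [];
    foreach ($args as $name => $value) {
        $argList[] = ['Name' => (string) $name, 'Value' => (string) $value];
    }

    return ['Id' => $id, 'NodeType' => $nodeType, 'Args' => $argList];
}

// Hypothetical helper: builds CreateScript parameters from nodes and
// [source, target] pairs, verifying that each edge references known node IDs.
function buildCreateScriptParams(array $nodes, array $edges, string $language = 'PYTHON'): array
{
    if (!in_array($language, ['PYTHON', 'SCALA'], true)) {
        throw new InvalidArgumentException('Language must be PYTHON or SCALA.');
    }

    $ids      = array_column($nodes, 'Id');
    $dagEdges = [];
    foreach ($edges as [$source, $target]) {
        if (!in_array($source, $ids, true) || !in_array($target, $ids, true)) {
            throw new InvalidArgumentException(
                sprintf('Edge %s to %s references an unknown node.', $source, $target)
            );
        }
        $dagEdges[] = ['Source' => $source, 'Target' => $target];
    }

    return ['DagNodes' => $nodes, 'DagEdges' => $dagEdges, 'Language' => $language];
}

// Placeholder node types and arguments, for illustration only.
$nodes = [
    dagNode('source0', 'ExampleSource', ['database' => 'example_db']),
    dagNode('sink0', 'ExampleSink', ['path' => 's3://example-bucket/out/']),
];

$params = buildCreateScriptParams($nodes, [['source0', 'sink0']]);

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createScript($params);
// $script = $result['PythonScript'];
```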
CreateSecurityConfiguration
$result = $client->createSecurityConfiguration([/* ... */]);
$promise = $client->createSecurityConfigurationAsync([/* ... */]);
Creates a new security configuration. A security configuration is a set of security properties that can be used by Glue. You can use a security configuration to encrypt data at rest. For information about using security configurations in Glue, see Encrypting Data Written by Crawlers, Jobs, and Development Endpoints.
Parameter Syntax
$result = $client->createSecurityConfiguration([ 'EncryptionConfiguration' => [ // REQUIRED 'CloudWatchEncryption' => [ 'CloudWatchEncryptionMode' => 'DISABLED|SSE-KMS', 'KmsKeyArn' => '<string>', ], 'JobBookmarksEncryption' => [ 'JobBookmarksEncryptionMode' => 'DISABLED|CSE-KMS', 'KmsKeyArn' => '<string>', ], 'S3Encryption' => [ [ 'KmsKeyArn' => '<string>', 'S3EncryptionMode' => 'DISABLED|SSE-KMS|SSE-S3', ], // ... ], ], 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- EncryptionConfiguration
-
- Required: Yes
- Type: EncryptionConfiguration structure
The encryption configuration for the new security configuration.
- Name
-
- Required: Yes
- Type: string
The name for the new security configuration.
Result Syntax
[ 'CreatedTimestamp' => <DateTime>, 'Name' => '<string>', ]
Result Details
Members
- CreatedTimestamp
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time at which the new security configuration was created.
- Name
-
- Type: string
The name assigned to the new security configuration.
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
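As a minimal sketch of how the CreateSecurityConfiguration request shape fits together, the helper below assembles an argument array that enables KMS-based encryption for all three targets. The function name `buildSecurityConfigurationParams`, the configuration name, and the key ARN are hypothetical placeholders; only keys and enum values shown in the Parameter Syntax are used.

```php
<?php
// Hypothetical helper: builds the argument array for createSecurityConfiguration.
function buildSecurityConfigurationParams(string $name, string $kmsKeyArn): array
{
    return [
        'Name' => $name, // REQUIRED
        'EncryptionConfiguration' => [ // REQUIRED
            // S3Encryption is a list of structures, one entry per S3 setting.
            'S3Encryption' => [
                ['S3EncryptionMode' => 'SSE-KMS', 'KmsKeyArn' => $kmsKeyArn],
            ],
            'CloudWatchEncryption' => [
                'CloudWatchEncryptionMode' => 'SSE-KMS',
                'KmsKeyArn' => $kmsKeyArn,
            ],
            'JobBookmarksEncryption' => [
                'JobBookmarksEncryptionMode' => 'CSE-KMS',
                'KmsKeyArn' => $kmsKeyArn,
            ],
        ],
    ];
}

// Placeholder values; replace them with a real name and key ARN.
$params = buildSecurityConfigurationParams(
    'example-security-config',
    'arn:aws:kms:us-east-1:111122223333:key/EXAMPLE'
);

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createSecurityConfiguration($params);
// echo $result['Name'];
```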
CreateSession
$result = $client->createSession([/* ... */]);
$promise = $client->createSessionAsync([/* ... */]);
Creates a new session.
Parameter Syntax
$result = $client->createSession([ 'Command' => [ // REQUIRED 'Name' => '<string>', 'PythonVersion' => '<string>', ], 'Connections' => [ 'Connections' => ['<string>', ...], ], 'DefaultArguments' => ['<string>', ...], 'Description' => '<string>', 'GlueVersion' => '<string>', 'Id' => '<string>', // REQUIRED 'IdleTimeout' => <integer>, 'MaxCapacity' => <float>, 'NumberOfWorkers' => <integer>, 'RequestOrigin' => '<string>', 'Role' => '<string>', // REQUIRED 'SecurityConfiguration' => '<string>', 'Tags' => ['<string>', ...], 'Timeout' => <integer>, 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ]);
Parameter Details
Members
- Command
-
- Required: Yes
- Type: SessionCommand structure
The SessionCommand that runs the job.
- Connections
-
- Type: ConnectionsList structure
The connections to use for the session.
- DefaultArguments
-
- Type: Associative array of custom strings keys (OrchestrationNameString) to strings
A map array of key-value pairs. Max is 75 pairs.
- Description
-
- Type: string
The description of the session.
- GlueVersion
-
- Type: string
The Glue version determines the versions of Apache Spark and Python that Glue supports. The GlueVersion must be greater than 2.0.
- Id
-
- Required: Yes
- Type: string
The ID of the session request.
- IdleTimeout
-
- Type: int
The number of minutes the session can remain idle before it times out. Default for Spark ETL jobs is the value of Timeout. Consult the documentation for other job types.
- MaxCapacity
-
- Type: double
The number of Glue data processing units (DPUs) that can be allocated when the job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB memory.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined WorkerType to use for the session.
- RequestOrigin
-
- Type: string
The origin of the request.
- Role
-
- Required: Yes
- Type: string
The IAM role ARN.
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used with the session.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The map of key value pairs (tags) belonging to the session.
- Timeout
-
- Type: int
The number of minutes before the session times out. Default for Spark ETL jobs is 48 hours (2880 minutes), the maximum session lifetime for this job type. Consult the documentation for other job types.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, or G.8X for Spark jobs. Accepts the value Z.2X for Ray notebooks.
- For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84 GB disk (approximately 34 GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries; it offers a scalable and cost-effective way to run most jobs.
- For the G.2X worker type, each worker maps to 2 DPU (8 vCPUs, 32 GB of memory) with 128 GB disk (approximately 77 GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries; it offers a scalable and cost-effective way to run most jobs.
- For the G.4X worker type, each worker maps to 4 DPU (16 vCPUs, 64 GB of memory) with 256 GB disk (approximately 235 GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).
- For the G.8X worker type, each worker maps to 8 DPU (32 vCPUs, 128 GB of memory) with 512 GB disk (approximately 487 GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.
- For the Z.2X worker type, each worker maps to 2 M-DPU (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120 GB free), and provides up to 8 Ray workers based on the autoscaler.
Result Syntax
[ 'Session' => [ 'Command' => [ 'Name' => '<string>', 'PythonVersion' => '<string>', ], 'CompletedOn' => <DateTime>, 'Connections' => [ 'Connections' => ['<string>', ...], ], 'CreatedOn' => <DateTime>, 'DPUSeconds' => <float>, 'DefaultArguments' => ['<string>', ...], 'Description' => '<string>', 'ErrorMessage' => '<string>', 'ExecutionTime' => <float>, 'GlueVersion' => '<string>', 'Id' => '<string>', 'IdleTimeout' => <integer>, 'MaxCapacity' => <float>, 'NumberOfWorkers' => <integer>, 'ProfileName' => '<string>', 'Progress' => <float>, 'Role' => '<string>', 'SecurityConfiguration' => '<string>', 'Status' => 'PROVISIONING|READY|FAILED|TIMEOUT|STOPPING|STOPPED', 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], ]
Result Details
Members
- Session
-
- Type: Session structure
Returns the session object in the response.
Errors
- AccessDeniedException:
Access to a resource was denied.
- IdempotentParameterMismatchException:
The same unique identifier was associated with two different records.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- ValidationException:
A value could not be validated.
- AlreadyExistsException:
A resource to be created or added already exists.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
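The worker-type descriptions for CreateSession give a DPU figure per worker, so the capacity implied by a request can be estimated before sending it. The sketch below does that arithmetic and builds a request array. The helper names are hypothetical, the command name `glueetl`, the Python version `3`, and Glue version `4.0` are assumed example values that are not stated on this page, and the multiplication is only an illustration, not a billing formula.

```php
<?php
// DPUs per worker, taken from the WorkerType descriptions for Spark worker types.
const DPU_PER_WORKER = ['G.1X' => 1, 'G.2X' => 2, 'G.4X' => 4, 'G.8X' => 8];

// Hypothetical helper: total DPUs implied by a worker type and a worker count.
function estimateSessionDpus(string $workerType, int $numberOfWorkers): int
{
    if (!array_key_exists($workerType, DPU_PER_WORKER)) {
        throw new InvalidArgumentException("No DPU mapping for worker type: $workerType");
    }
    return DPU_PER_WORKER[$workerType] * $numberOfWorkers;
}

// Hypothetical helper: builds a createSession argument array.
function buildSessionParams(string $id, string $roleArn, string $workerType, int $workers): array
{
    return [
        'Id' => $id,        // REQUIRED
        'Role' => $roleArn, // REQUIRED
        'Command' => [      // REQUIRED
            'Name' => 'glueetl',    // assumed command name
            'PythonVersion' => '3', // assumed value
        ],
        'GlueVersion' => '4.0',     // assumed value; must be greater than 2.0
        'WorkerType' => $workerType,
        'NumberOfWorkers' => $workers,
        'IdleTimeout' => 30,        // minutes of idle time before the session times out
    ];
}

// Placeholder identifiers; replace them with real values.
$params = buildSessionParams(
    'example-session',
    'arn:aws:iam::111122223333:role/ExampleGlueRole',
    'G.2X',
    5
);
$dpus = estimateSessionDpus($params['WorkerType'], $params['NumberOfWorkers']);

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createSession($params);
// echo $result['Session']['Status'];
```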
CreateTable
$result = $client->createTable([/* ... */]);
$promise = $client->createTableAsync([/* ... */]);
Creates a new table definition in the Data Catalog.
Parameter Syntax
$result = $client->createTable([ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', // REQUIRED 'OpenTableFormatInput' => [ 'IcebergInput' => [ 'MetadataOperation' => 'CREATE', // REQUIRED 'Version' => '<string>', ], ], 'PartitionIndexes' => [ [ 'IndexName' => '<string>', // REQUIRED 'Keys' => ['<string>', ...], // REQUIRED ], // ... ], 'TableInput' => [ // REQUIRED 'Description' => '<string>', 'LastAccessTime' => <integer || string || DateTime>, 'LastAnalyzedTime' => <integer || string || DateTime>, 'Name' => '<string>', // REQUIRED 'Owner' => '<string>', 'Parameters' => ['<string>', ...], 'PartitionKeys' => [ [ 'Comment' => '<string>', 'Name' => '<string>', // REQUIRED 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Retention' => <integer>, 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', // REQUIRED 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', // REQUIRED 'SortOrder' => <integer>, // REQUIRED ], // ... 
], 'StoredAsSubDirectories' => true || false, ], 'TableType' => '<string>', 'TargetTable' => [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'Name' => '<string>', 'Region' => '<string>', ], 'ViewDefinition' => [ 'Definer' => '<string>', 'IsProtected' => true || false, 'Representations' => [ [ 'Dialect' => 'REDSHIFT|ATHENA|SPARK', 'DialectVersion' => '<string>', 'ValidationConnection' => '<string>', 'ViewExpandedText' => '<string>', 'ViewOriginalText' => '<string>', ], // ... ], 'SubObjects' => ['<string>', ...], ], 'ViewExpandedText' => '<string>', 'ViewOriginalText' => '<string>', ], 'TransactionId' => '<string>', ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which to create the Table. If none is supplied, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The catalog database in which to create the new table. For Hive compatibility, this name is entirely lowercase.
- OpenTableFormatInput
-
- Type: OpenTableFormatInput structure
Specifies an OpenTableFormatInput structure when creating an open format table.
- PartitionIndexes
-
- Type: Array of PartitionIndex structures
A list of partition indexes, PartitionIndex structures, to create in the table.
- TableInput
-
- Required: Yes
- Type: TableInput structure
The TableInput object that defines the metadata table to create in the catalog.
- TransactionId
-
- Type: string
The ID of the transaction.
Result Syntax
[]
Result Details
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- ResourceNotReadyException:
A resource was not ready for a transaction.
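The CreateTable request nests most of its detail inside TableInput and StorageDescriptor, so a small builder can make the required keys easier to see. This is a sketch under stated assumptions: `buildParquetTableParams` is a hypothetical helper, the bucket and names are placeholders, and the `EXTERNAL_TABLE` table type and Hive Parquet class names are common values that this page does not list.

```php
<?php
// Hypothetical helper: builds a createTable argument array for an external
// Parquet table in S3 with one string partition key.
function buildParquetTableParams(
    string $database,
    string $table,
    string $location,
    array $columns,
    string $partitionKey
): array {
    $columnDefs = [];
    foreach ($columns as $name => $type) {
        $columnDefs[] = ['Name' => $name, 'Type' => $type]; // Name is REQUIRED
    }

    return [
        'DatabaseName' => strtolower($database), // REQUIRED; lowercase for Hive compatibility
        'TableInput' => [ // REQUIRED
            'Name' => $table, // REQUIRED
            'TableType' => 'EXTERNAL_TABLE', // assumed value
            'PartitionKeys' => [
                ['Name' => $partitionKey, 'Type' => 'string'],
            ],
            'StorageDescriptor' => [
                'Columns' => $columnDefs,
                'Location' => $location,
                // Hive Parquet class names; assumptions, not taken from this page.
                'InputFormat' => 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat',
                'OutputFormat' => 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat',
                'SerdeInfo' => [
                    'SerializationLibrary' => 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe',
                ],
            ],
        ],
        'PartitionIndexes' => [
            // IndexName and Keys are both REQUIRED for each index.
            ['IndexName' => 'by_' . $partitionKey, 'Keys' => [$partitionKey]],
        ],
    ];
}

$params = buildParquetTableParams(
    'Sales_DB',
    'orders',
    's3://example-bucket/orders/',
    ['order_id' => 'bigint', 'amount' => 'double'],
    'dt'
);

// With a configured Aws\Glue\GlueClient in $client:
// $client->createTable($params); // the result is empty on success
```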
CreateTableOptimizer
$result = $client->createTableOptimizer([/* ... */]);
$promise = $client->createTableOptimizerAsync([/* ... */]);
Creates a new table optimizer for a specific function. The Type parameter accepts compaction, retention, or orphan_file_deletion, as listed in the Parameter Syntax.
Parameter Syntax
$result = $client->createTableOptimizer([ 'CatalogId' => '<string>', // REQUIRED 'DatabaseName' => '<string>', // REQUIRED 'TableName' => '<string>', // REQUIRED 'TableOptimizerConfiguration' => [ // REQUIRED 'enabled' => true || false, 'orphanFileDeletionConfiguration' => [ 'icebergConfiguration' => [ 'location' => '<string>', 'orphanFileRetentionPeriodInDays' => <integer>, ], ], 'retentionConfiguration' => [ 'icebergConfiguration' => [ 'cleanExpiredFiles' => true || false, 'numberOfSnapshotsToRetain' => <integer>, 'snapshotRetentionPeriodInDays' => <integer>, ], ], 'roleArn' => '<string>', ], 'Type' => 'compaction|retention|orphan_file_deletion', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Required: Yes
- Type: string
The Catalog ID of the table.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database in the catalog in which the table resides.
- TableName
-
- Required: Yes
- Type: string
The name of the table.
- TableOptimizerConfiguration
-
- Required: Yes
- Type: TableOptimizerConfiguration structure
A TableOptimizerConfiguration object representing the configuration of a table optimizer.
- Type
-
- Required: Yes
- Type: string
The type of table optimizer. The Parameter Syntax lists compaction, retention, and orphan_file_deletion as valid values.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- ValidationException:
A value could not be validated.
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- AlreadyExistsException:
A resource to be created or added already exists.
- InternalServiceException:
An internal service error occurred.
- ThrottlingException:
The throttling threshold was exceeded.
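Because every top-level CreateTableOptimizer parameter is required, a builder that checks the Type value up front can catch mistakes before the request is sent. The sketch below assumes a hypothetical helper name and placeholder identifiers; the accepted Type values are the ones listed in the Parameter Syntax.

```php
<?php
// Type values listed in the Parameter Syntax for createTableOptimizer.
const TABLE_OPTIMIZER_TYPES = ['compaction', 'retention', 'orphan_file_deletion'];

// Hypothetical helper: builds a createTableOptimizer argument array.
function buildTableOptimizerParams(
    string $catalogId,
    string $database,
    string $table,
    string $type,
    string $roleArn
): array {
    if (!in_array($type, TABLE_OPTIMIZER_TYPES, true)) {
        throw new InvalidArgumentException("Unsupported optimizer type: $type");
    }

    return [
        'CatalogId' => $catalogId,   // REQUIRED
        'DatabaseName' => $database, // REQUIRED
        'TableName' => $table,       // REQUIRED
        'Type' => $type,             // REQUIRED
        'TableOptimizerConfiguration' => [ // REQUIRED
            'enabled' => true,
            'roleArn' => $roleArn,
        ],
    ];
}

// Placeholder identifiers; replace them with real values.
$params = buildTableOptimizerParams(
    '111122223333',
    'sales_db',
    'orders',
    'compaction',
    'arn:aws:iam::111122223333:role/ExampleOptimizerRole'
);

// With a configured Aws\Glue\GlueClient in $client:
// $client->createTableOptimizer($params);
```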
CreateTrigger
$result = $client->createTrigger([/* ... */]);
$promise = $client->createTriggerAsync([/* ... */]);
Creates a new trigger.
Parameter Syntax
$result = $client->createTrigger([ 'Actions' => [ // REQUIRED [ 'Arguments' => ['<string>', ...], 'CrawlerName' => '<string>', 'JobName' => '<string>', 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'SecurityConfiguration' => '<string>', 'Timeout' => <integer>, ], // ... ], 'Description' => '<string>', 'EventBatchingCondition' => [ 'BatchSize' => <integer>, // REQUIRED 'BatchWindow' => <integer>, ], 'Name' => '<string>', // REQUIRED 'Predicate' => [ 'Conditions' => [ [ 'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', 'CrawlerName' => '<string>', 'JobName' => '<string>', 'LogicalOperator' => 'EQUALS', 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', ], // ... ], 'Logical' => 'AND|ANY', ], 'Schedule' => '<string>', 'StartOnCreation' => true || false, 'Tags' => ['<string>', ...], 'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT', // REQUIRED 'WorkflowName' => '<string>', ]);
Parameter Details
Members
- Actions
-
- Required: Yes
- Type: Array of Action structures
The actions initiated by this trigger when it fires.
- Description
-
- Type: string
A description of the new trigger.
- EventBatchingCondition
-
- Type: EventBatchingCondition structure
Batch condition that must be met (specified number of events received or batch time window expired) before the EventBridge event trigger fires.
- Name
-
- Required: Yes
- Type: string
The name of the trigger.
- Predicate
-
- Type: Predicate structure
A predicate to specify when the new trigger should fire.
This field is required when the trigger type is CONDITIONAL.
- Schedule
-
- Type: string
A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
This field is required when the trigger type is SCHEDULED.
- StartOnCreation
-
- Type: boolean
Set to true to start SCHEDULED and CONDITIONAL triggers when created. True is not supported for ON_DEMAND triggers.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags to use with this trigger. You may use tags to limit access to the trigger. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.
- Type
-
- Required: Yes
- Type: string
The type of the new trigger.
- WorkflowName
-
- Type: string
The name of the workflow associated with the trigger.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the trigger.
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- IdempotentParameterMismatchException:
The same unique identifier was associated with two different records.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
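The CreateTrigger parameter details state a few rules that depend on the trigger type: Schedule is required for SCHEDULED, Predicate is required for CONDITIONAL, and StartOnCreation cannot be true for ON_DEMAND. A client-side check along these lines could surface such problems before the call. `validateTriggerParams` is a hypothetical helper and the trigger and job names are placeholders; the service performs its own validation regardless.

```php
<?php
// Hypothetical pre-flight check mirroring the rules stated in Parameter Details.
// Returns a list of human-readable problems; an empty list means none were found.
function validateTriggerParams(array $params): array
{
    $errors = [];
    $type = $params['Type'] ?? null;

    foreach (['Name', 'Type', 'Actions'] as $required) {
        if (empty($params[$required])) {
            $errors[] = "$required is required";
        }
    }
    if ($type === 'SCHEDULED' && empty($params['Schedule'])) {
        $errors[] = 'Schedule is required for SCHEDULED triggers';
    }
    if ($type === 'CONDITIONAL' && empty($params['Predicate'])) {
        $errors[] = 'Predicate is required for CONDITIONAL triggers';
    }
    if ($type === 'ON_DEMAND' && ($params['StartOnCreation'] ?? false) === true) {
        $errors[] = 'StartOnCreation cannot be true for ON_DEMAND triggers';
    }
    return $errors;
}

$scheduled = [
    'Name' => 'example-daily-trigger',
    'Type' => 'SCHEDULED',
    'Schedule' => 'cron(15 12 * * ? *)', // every day at 12:15 UTC
    'StartOnCreation' => true,
    'Actions' => [['JobName' => 'example-job']],
];
$errors = validateTriggerParams($scheduled);

// With a configured Aws\Glue\GlueClient in $client:
// if ($errors === []) {
//     $result = $client->createTrigger($scheduled);
//     echo $result['Name'];
// }
```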
CreateUsageProfile
$result = $client->createUsageProfile([/* ... */]);
$promise = $client->createUsageProfileAsync([/* ... */]);
Creates a Glue usage profile.
Parameter Syntax
$result = $client->createUsageProfile([ 'Configuration' => [ // REQUIRED 'JobConfiguration' => [ '<NameString>' => [ 'AllowedValues' => ['<string>', ...], 'DefaultValue' => '<string>', 'MaxValue' => '<string>', 'MinValue' => '<string>', ], // ... ], 'SessionConfiguration' => [ '<NameString>' => [ 'AllowedValues' => ['<string>', ...], 'DefaultValue' => '<string>', 'MaxValue' => '<string>', 'MinValue' => '<string>', ], // ... ], ], 'Description' => '<string>', 'Name' => '<string>', // REQUIRED 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- Configuration
-
- Required: Yes
- Type: ProfileConfiguration structure
A ProfileConfiguration object specifying the job and session values for the profile.
- Description
-
- Type: string
A description of the usage profile.
- Name
-
- Required: Yes
- Type: string
The name of the usage profile.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
A list of tags applied to the usage profile.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the usage profile that was created.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- AlreadyExistsException:
A resource to be created or added already exists.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- OperationNotSupportedException:
The operation is not available in the region.
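In the CreateUsageProfile request, JobConfiguration and SessionConfiguration are maps from a setting name to a structure whose values are all strings, including numeric limits. The sketch below shows one way to assemble such entries. `profileSetting` is a hypothetical helper, and the setting names `numberOfWorkers` and `workerType` are assumed examples, since this page does not enumerate the allowed keys.

```php
<?php
// Hypothetical helper: builds one configuration entry, omitting unset fields.
// All values are strings, matching the request shape in the Parameter Syntax.
function profileSetting(
    ?string $default,
    ?string $min = null,
    ?string $max = null,
    array $allowed = []
): array {
    $setting = array_filter(
        ['DefaultValue' => $default, 'MinValue' => $min, 'MaxValue' => $max],
        static function ($value) {
            return $value !== null;
        }
    );
    if ($allowed !== []) {
        $setting['AllowedValues'] = $allowed;
    }
    return $setting;
}

$params = [
    'Name' => 'example-profile', // REQUIRED
    'Description' => 'Limits for ad hoc development jobs',
    'Configuration' => [ // REQUIRED
        'JobConfiguration' => [
            // Setting names below are assumptions for illustration.
            'numberOfWorkers' => profileSetting('10', '1', '10'),
            'workerType' => profileSetting('G.1X', null, null, ['G.1X', 'G.2X']),
        ],
    ],
];

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createUsageProfile($params);
// echo $result['Name'];
```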
CreateUserDefinedFunction
$result = $client->createUserDefinedFunction([/* ... */]);
$promise = $client->createUserDefinedFunctionAsync([/* ... */]);
Creates a new function definition in the Data Catalog.
Parameter Syntax
$result = $client->createUserDefinedFunction([ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', // REQUIRED 'FunctionInput' => [ // REQUIRED 'ClassName' => '<string>', 'FunctionName' => '<string>', 'OwnerName' => '<string>', 'OwnerType' => 'USER|ROLE|GROUP', 'ResourceUris' => [ [ 'ResourceType' => 'JAR|FILE|ARCHIVE', 'Uri' => '<string>', ], // ... ], ], ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which to create the function. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database in which to create the function.
- FunctionInput
-
- Required: Yes
- Type: UserDefinedFunctionInput structure
A FunctionInput object that defines the function to create in the Data Catalog.
Result Syntax
[]
Result Details
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- GlueEncryptionException:
An encryption operation failed.
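For CreateUserDefinedFunction, the FunctionInput structure carries the function name, the implementing class, and the resources that contain it. A minimal sketch of a request for a function packaged in a JAR might look like the following; the helper name, class name, owner, and S3 URI are hypothetical placeholders.

```php
<?php
// Hypothetical helper: builds a createUserDefinedFunction argument array
// for a function implemented by a class packaged in a JAR.
function buildUdfParams(
    string $database,
    string $functionName,
    string $className,
    string $jarUri,
    string $owner
): array {
    return [
        'DatabaseName' => $database, // REQUIRED
        'FunctionInput' => [ // REQUIRED
            'FunctionName' => $functionName,
            'ClassName' => $className,
            'OwnerName' => $owner,
            'OwnerType' => 'USER', // one of USER|ROLE|GROUP
            'ResourceUris' => [
                ['ResourceType' => 'JAR', 'Uri' => $jarUri], // JAR|FILE|ARCHIVE
            ],
        ],
    ];
}

// Placeholder values; replace them with real ones.
$params = buildUdfParams(
    'sales_db',
    'normalize_sku',
    'com.example.udf.NormalizeSku',
    's3://example-bucket/udfs/normalize-sku.jar',
    'example-owner'
);

// With a configured Aws\Glue\GlueClient in $client:
// $client->createUserDefinedFunction($params); // the result is empty on success
```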
CreateWorkflow
$result = $client->createWorkflow([/* ... */]);
$promise = $client->createWorkflowAsync([/* ... */]);
Creates a new workflow.
Parameter Syntax
$result = $client->createWorkflow([ 'DefaultRunProperties' => ['<string>', ...], 'Description' => '<string>', 'MaxConcurrentRuns' => <integer>, 'Name' => '<string>', // REQUIRED 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- DefaultRunProperties
-
- Type: Associative array of custom strings keys (IdString) to strings
A collection of properties to be used as part of each execution of the workflow.
- Description
-
- Type: string
A description of the workflow.
- MaxConcurrentRuns
-
- Type: int
You can use this parameter to prevent unwanted multiple updates to data, to control costs, or in some cases, to prevent exceeding the maximum number of concurrent runs of any of the component jobs. If you leave this parameter blank, there is no limit to the number of concurrent workflow runs.
- Name
-
- Required: Yes
- Type: string
The name to be assigned to the workflow. It should be unique within your account.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags to be used with this workflow.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the workflow which was provided as part of the request.
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
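The CreateWorkflow details note that leaving MaxConcurrentRuns unset means no limit on concurrent runs, so a builder should omit the key rather than send a placeholder value. The sketch below illustrates that; `buildWorkflowParams` and the workflow name are hypothetical.

```php
<?php
// Hypothetical helper: builds a createWorkflow argument array.
// MaxConcurrentRuns is only included when a limit is requested.
function buildWorkflowParams(
    string $name,
    array $runProperties = [],
    ?int $maxConcurrentRuns = null
): array {
    $params = ['Name' => $name]; // REQUIRED

    if ($runProperties !== []) {
        // Property values are strings in the request shape, so cast defensively.
        $params['DefaultRunProperties'] = array_map('strval', $runProperties);
    }
    if ($maxConcurrentRuns !== null) {
        $params['MaxConcurrentRuns'] = $maxConcurrentRuns;
    }
    return $params;
}

$limited = buildWorkflowParams('example-workflow', ['env' => 'dev', 'retries' => 3], 1);
$unlimited = buildWorkflowParams('example-workflow');

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createWorkflow($limited);
// echo $result['Name'];
```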
DeleteBlueprint
$result = $client->deleteBlueprint([/* ... */]);
$promise = $client->deleteBlueprintAsync([/* ... */]);
Deletes an existing blueprint.
Parameter Syntax
$result = $client->deleteBlueprint([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the blueprint to delete.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
Returns the name of the blueprint that was deleted.
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
DeleteClassifier
$result = $client->deleteClassifier([/* ... */]);
$promise = $client->deleteClassifierAsync([/* ... */]);
Removes a classifier from the Data Catalog.
Parameter Syntax
$result = $client->deleteClassifier([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
Name of the classifier to remove.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
DeleteColumnStatisticsForPartition
$result = $client->deleteColumnStatisticsForPartition([/* ... */]);
$promise = $client->deleteColumnStatisticsForPartitionAsync([/* ... */]);
Deletes the partition column statistics of a column.
The Identity and Access Management (IAM) permission required for this operation is DeletePartition.
Parameter Syntax
$result = $client->deleteColumnStatisticsForPartition([ 'CatalogId' => '<string>', 'ColumnName' => '<string>', // REQUIRED 'DatabaseName' => '<string>', // REQUIRED 'PartitionValues' => ['<string>', ...], // REQUIRED 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.
- ColumnName
-
- Required: Yes
- Type: string
Name of the column.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partitions reside.
- PartitionValues
-
- Required: Yes
- Type: Array of strings
A list of partition values identifying the partition.
- TableName
-
- Required: Yes
- Type: string
The name of the partitions' table.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
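DeleteColumnStatisticsForPartition identifies the partition by a plain list of values. The sketch below assumes that the values must appear in the same order as the table's partition keys, which this page does not state explicitly, and converts a key-to-value map into such a list. `partitionValuesFor` and all names are hypothetical.

```php
<?php
// Hypothetical helper: orders partition values to match the table's partition keys.
// Assumes the service expects values in partition-key order.
function partitionValuesFor(array $partitionKeys, array $valuesByKey): array
{
    $values = [];
    foreach ($partitionKeys as $key) {
        if (!array_key_exists($key, $valuesByKey)) {
            throw new InvalidArgumentException("Missing value for partition key: $key");
        }
        $values[] = (string) $valuesByKey[$key]; // PartitionValues is an array of strings
    }
    return $values;
}

$params = [
    'DatabaseName' => 'sales_db',   // REQUIRED
    'TableName' => 'orders',        // REQUIRED
    'ColumnName' => 'amount',       // REQUIRED
    'PartitionValues' => partitionValuesFor(
        ['year', 'month'],
        ['month' => 1, 'year' => 2024]
    ), // REQUIRED
];

// With a configured Aws\Glue\GlueClient in $client:
// $client->deleteColumnStatisticsForPartition($params);
```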
DeleteColumnStatisticsForTable
$result = $client->deleteColumnStatisticsForTable([/* ... */]);
$promise = $client->deleteColumnStatisticsForTableAsync([/* ... */]);
Deletes the table statistics of a column.
The Identity and Access Management (IAM) permission required for this operation is DeleteTable.
Parameter Syntax
$result = $client->deleteColumnStatisticsForTable([ 'CatalogId' => '<string>', 'ColumnName' => '<string>', // REQUIRED 'DatabaseName' => '<string>', // REQUIRED 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.
- ColumnName
-
- Required: Yes
- Type: string
The name of the column.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partitions reside.
- TableName
-
- Required: Yes
- Type: string
The name of the partitions' table.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
DeleteColumnStatisticsTaskSettings
$result = $client->deleteColumnStatisticsTaskSettings([/* ... */]);
$promise = $client->deleteColumnStatisticsTaskSettingsAsync([/* ... */]);
Deletes settings for a column statistics task.
Parameter Syntax
$result = $client->deleteColumnStatisticsTaskSettings([ 'DatabaseName' => '<string>', // REQUIRED 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database where the table resides.
- TableName
-
- Required: Yes
- Type: string
The name of the table for which to delete column statistics.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
DeleteConnection
$result = $client->deleteConnection([/* ... */]);
$promise = $client->deleteConnectionAsync([/* ... */]);
Deletes a connection from the Data Catalog.
Parameter Syntax
$result = $client->deleteConnection([ 'CatalogId' => '<string>', 'ConnectionName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the connection resides. If none is provided, the Amazon Web Services account ID is used by default.
- ConnectionName
-
- Required: Yes
- Type: string
The name of the connection to delete.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
DeleteCrawler
$result = $client->deleteCrawler([/* ... */]);
$promise = $client->deleteCrawlerAsync([/* ... */]);
Removes a specified crawler from the Glue Data Catalog, unless the crawler state is RUNNING.
Parameter Syntax
$result = $client->deleteCrawler([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the crawler to remove.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- CrawlerRunningException:
The operation cannot be performed because the crawler is already running.
- SchedulerTransitioningException:
The specified scheduler is transitioning.
- OperationTimeoutException:
The operation timed out.
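Since DeleteCrawler refuses to remove a running crawler, callers often want to tell that case apart from a missing crawler. The sketch below maps the two error codes listed above to simple status strings. It assumes the thrown exception exposes `getAwsErrorCode()`, as the SDK's `Aws\Exception\AwsException` does; `deleteCrawlerIfIdle` is a hypothetical helper, and `$client` is only required to offer a `deleteCrawler` method.

```php
<?php
// Hypothetical helper: deletes a crawler and reports what happened.
// Returns 'deleted', 'running', or 'not_found'; other errors are rethrown.
function deleteCrawlerIfIdle(object $client, string $name): string
{
    try {
        $client->deleteCrawler(['Name' => $name]); // Name is REQUIRED
        return 'deleted';
    } catch (Exception $e) {
        // SDK exceptions expose the service error code through getAwsErrorCode().
        $code = method_exists($e, 'getAwsErrorCode') ? $e->getAwsErrorCode() : null;
        if ($code === 'CrawlerRunningException') {
            return 'running';
        }
        if ($code === 'EntityNotFoundException') {
            return 'not_found';
        }
        throw $e;
    }
}

// With a configured Aws\Glue\GlueClient in $client:
// $status = deleteCrawlerIfIdle($client, 'example-crawler');
```

A caller that receives `running` could stop the crawler and retry later; that retry policy is left out of the sketch.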
DeleteCustomEntityType
$result = $client->deleteCustomEntityType([/* ... */]);
$promise = $client->deleteCustomEntityTypeAsync([/* ... */]);
Deletes a custom pattern by specifying its name.
Parameter Syntax
$result = $client->deleteCustomEntityType([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the custom pattern that you want to delete.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the custom pattern you deleted.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
DeleteDataQualityRuleset
$result = $client->deleteDataQualityRuleset([/* ... */]);
$promise = $client->deleteDataQualityRulesetAsync([/* ... */]);
Deletes a data quality ruleset.
Parameter Syntax
$result = $client->deleteDataQualityRuleset([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the data quality ruleset to delete.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
DeleteDatabase
$result = $client->deleteDatabase([/* ... */]);
$promise = $client->deleteDatabaseAsync([/* ... */]);
Removes a specified database from a Data Catalog.
After completing this operation, you no longer have access to the tables (and all table versions and partitions that might belong to the tables) and the user-defined functions in the deleted database. Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.
To ensure the immediate deletion of all related resources, before calling DeleteDatabase, use DeleteTableVersion or BatchDeleteTableVersion, DeletePartition or BatchDeletePartition, DeleteUserDefinedFunction, and DeleteTable or BatchDeleteTable, to delete any resources that belong to the database.
Parameter Syntax
$result = $client->deleteDatabase([ 'CatalogId' => '<string>', 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the database resides. If none is provided, the Amazon Web Services account ID is used by default.
- Name
-
- Required: Yes
- Type: string
The name of the database to delete. For Hive compatibility, this must be all lowercase.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
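The DeleteDatabase description recommends removing dependent resources first when immediate deletion matters. The sketch below covers only the table step: it pages through GetTables, removes the tables with BatchDeleteTable, and then deletes the database. `deleteDatabaseWithTables` is a hypothetical helper, the batch size of 100 is an assumed per-request limit, and `$client` can be any object that offers `getTables`, `batchDeleteTable`, and `deleteDatabase`. Partitions, table versions, and user-defined functions would need similar loops, and the per-table errors returned by BatchDeleteTable are ignored here.

```php
<?php
// Sketch: delete every table in a database in batches, then the database itself.
// Returns the number of tables that were submitted for deletion.
function deleteDatabaseWithTables(object $client, string $database, int $batchSize = 100): int
{
    // Collect all table names, following NextToken until the listing is exhausted.
    $names = [];
    $token = null;
    do {
        $args = ['DatabaseName' => $database];
        if ($token !== null) {
            $args['NextToken'] = $token;
        }
        $page = $client->getTables($args);
        foreach ($page['TableList'] ?? [] as $table) {
            $names[] = $table['Name'];
        }
        $token = $page['NextToken'] ?? null;
    } while ($token !== null);

    // Delete the tables in chunks (assumed limit of 100 names per request).
    foreach (array_chunk($names, $batchSize) as $chunk) {
        $client->batchDeleteTable([
            'DatabaseName' => $database,
            'TablesToDelete' => $chunk,
        ]);
    }

    $client->deleteDatabase(['Name' => $database]); // Name is REQUIRED
    return count($names);
}

// With a configured Aws\Glue\GlueClient in $client:
// $deleted = deleteDatabaseWithTables($client, 'sales_db');
```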
DeleteDevEndpoint
$result = $client->deleteDevEndpoint([/* ... */]);
$promise = $client->deleteDevEndpointAsync([/* ... */]);
Deletes a specified development endpoint.
Parameter Syntax
$result = $client->deleteDevEndpoint([ 'EndpointName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- EndpointName
-
- Required: Yes
- Type: string
The name of the DevEndpoint.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
DeleteJob
$result = $client->deleteJob([/* ... */]);
$promise = $client->deleteJobAsync([/* ... */]);
Deletes a specified job definition. If the job definition is not found, no exception is thrown.
Parameter Syntax
$result = $client->deleteJob([ 'JobName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- JobName
-
- Required: Yes
- Type: string
The name of the job definition to delete.
Result Syntax
[ 'JobName' => '<string>', ]
Result Details
Members
- JobName
-
- Type: string
The name of the job definition that was deleted.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
DeleteMLTransform
$result = $client->deleteMLTransform([/* ... */]);
$promise = $client->deleteMLTransformAsync([/* ... */]);
Deletes a Glue machine learning transform. Machine learning transforms are a special type of transform that use machine learning to learn the details of the transformation to be performed by learning from examples provided by humans. These transformations are then saved by Glue. If you no longer need a transform, you can delete it by calling DeleteMLTransform. However, any Glue jobs that still reference the deleted transform will no longer succeed.
Parameter Syntax
$result = $client->deleteMLTransform([
    'TransformId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- TransformId
-
- Required: Yes
- Type: string
The unique identifier of the transform to delete.
Result Syntax
[ 'TransformId' => '<string>', ]
Result Details
Members
- TransformId
-
- Type: string
The unique identifier of the transform that was deleted.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
DeletePartition
$result = $client->deletePartition([/* ... */]);
$promise = $client->deletePartitionAsync([/* ... */]);
Deletes a specified partition.
Parameter Syntax
$result = $client->deletePartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionValues' => ['<string>', ...], // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partition to be deleted resides. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database in which the table in question resides.
- PartitionValues
-
- Required: Yes
- Type: Array of strings
The values that define the partition.
- TableName
-
- Required: Yes
- Type: string
The name of the table that contains the partition to be deleted.
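As an illustration of building the PartitionValues list, the sketch below orders values by the table's partition keys and casts them to strings. It rests on an assumption that is not stated on this page: that the values are expected in the same order as the table's partition keys. The helper name partitionValuesFor and the database, table, and key names are hypothetical placeholders.

```php
<?php
// Hypothetical helper: orders partition values to match the table's partition keys.
// ASSUMPTION: PartitionValues must follow the order of the table's partition keys.
function partitionValuesFor(array $partitionKeys, array $valuesByKey): array
{
    $values = [];
    foreach ($partitionKeys as $key) {
        if (!array_key_exists($key, $valuesByKey)) {
            throw new \InvalidArgumentException("Missing value for partition key '$key'");
        }
        $values[] = (string) $valuesByKey[$key]; // the parameter is an array of strings
    }
    return $values;
}

// Request array for deletePartition; names below are placeholders.
$params = [
    'DatabaseName'    => 'sales_db',
    'TableName'       => 'orders',
    'PartitionValues' => partitionValuesFor(['year', 'month'], ['month' => 3, 'year' => 2024]),
];
// $client->deletePartition($params);
```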
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
DeletePartitionIndex
$result = $client->deletePartitionIndex([/* ... */]);
$promise = $client->deletePartitionIndexAsync([/* ... */]);
Deletes a specified partition index from an existing table.
Parameter Syntax
$result = $client->deletePartitionIndex([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'IndexName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The catalog ID where the table resides.
- DatabaseName
-
- Required: Yes
- Type: string
Specifies the name of a database from which you want to delete a partition index.
- IndexName
-
- Required: Yes
- Type: string
The name of the partition index to be deleted.
- TableName
-
- Required: Yes
- Type: string
Specifies the name of a table from which you want to delete a partition index.
Result Syntax
[]
Result Details
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist
- ConflictException:
The CreatePartitions API was called on a table that has indexes enabled.
- GlueEncryptionException:
An encryption operation failed.
DeleteRegistry
$result = $client->deleteRegistry([/* ... */]);
$promise = $client->deleteRegistryAsync([/* ... */]);
Deletes the entire registry, including its schemas and all of their versions. To get the status of the delete operation, you can call the GetRegistry API after the asynchronous call. Deleting a registry will deactivate all online operations for the registry, such as the UpdateRegistry, CreateSchema, UpdateSchema, and RegisterSchemaVersion APIs.
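Because the deletion completes in the background, a caller may want to poll GetRegistry until the registry is gone. The following is a minimal sketch of that pattern, not a definitive implementation. The helper name deleteRegistryAndWait, the poll count, and the five-second default delay are hypothetical, and the sketch assumes that GetRegistry reports EntityNotFoundException once the registry no longer exists.

```php
<?php
// Hypothetical helper: requests deletion, then polls GetRegistry until the
// registry can no longer be found or the poll budget is used up.
// ASSUMPTION: GetRegistry fails with EntityNotFoundException after deletion completes.
function deleteRegistryAndWait(
    object $client,
    string $registryName,
    int $maxPolls = 10,
    ?callable $sleep = null
): bool {
    $sleep = $sleep ?? static function (): void { sleep(5); };
    $registryId = ['RegistryName' => $registryName];

    $client->deleteRegistry(['RegistryId' => $registryId]);

    for ($i = 0; $i < $maxPolls; $i++) {
        try {
            // Still resolvable, for example while Status is DELETING.
            $client->getRegistry(['RegistryId' => $registryId]);
        } catch (\Throwable $e) {
            if (method_exists($e, 'getAwsErrorCode')
                && $e->getAwsErrorCode() === 'EntityNotFoundException') {
                return true; // registry is gone
            }
            throw $e;
        }
        $sleep();
    }
    return false; // still present after $maxPolls checks
}
```

Passing the delay as a callable keeps the loop easy to exercise in tests without real waiting.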
Parameter Syntax
$result = $client->deleteRegistry([
    'RegistryId' => [ // REQUIRED
        'RegistryArn' => '<string>',
        'RegistryName' => '<string>',
    ],
]);
Parameter Details
Members
- RegistryId
-
- Required: Yes
- Type: RegistryId structure
This is a wrapper structure that may contain the registry name and Amazon Resource Name (ARN).
Result Syntax
[
    'RegistryArn' => '<string>',
    'RegistryName' => '<string>',
    'Status' => 'AVAILABLE|DELETING',
]
Result Details
Members
- RegistryArn
-
- Type: string
The Amazon Resource Name (ARN) of the registry being deleted.
- RegistryName
-
- Type: string
The name of the registry being deleted.
- Status
-
- Type: string
The status of the registry. A successful operation will return the DELETING status.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist
- AccessDeniedException:
Access to a resource was denied.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
DeleteResourcePolicy
$result = $client->deleteResourcePolicy([/* ... */]);
$promise = $client->deleteResourcePolicyAsync([/* ... */]);
Deletes a specified policy.
Parameter Syntax
$result = $client->deleteResourcePolicy([
    'PolicyHashCondition' => '<string>',
    'ResourceArn' => '<string>',
]);
Parameter Details
Members
- PolicyHashCondition
-
- Type: string
The hash value returned when this policy was set.
- ResourceArn
-
- Type: string
The ARN of the Glue resource for the resource policy to be deleted.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- ConditionCheckFailureException:
A specified condition was not satisfied.
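PolicyHashCondition can be used as an optimistic-concurrency guard: read the current hash, then delete only if the policy has not changed in the meantime. The sketch below illustrates this under the assumption that GetResourcePolicy returns the hash in a PolicyHash member; the helper name deletePolicyIfUnchanged is hypothetical.

```php
<?php
// Hypothetical helper: deletes the resource policy only if it still matches
// the hash that was read immediately beforehand.
// ASSUMPTION: getResourcePolicy() returns the current hash under 'PolicyHash'.
function deletePolicyIfUnchanged(object $client, ?string $resourceArn = null): bool
{
    $args = $resourceArn === null ? [] : ['ResourceArn' => $resourceArn];
    $current = $client->getResourcePolicy($args);

    try {
        $client->deleteResourcePolicy(
            $args + ['PolicyHashCondition' => $current['PolicyHash']]
        );
        return true;
    } catch (\Throwable $e) {
        if (method_exists($e, 'getAwsErrorCode')
            && $e->getAwsErrorCode() === 'ConditionCheckFailureException') {
            return false; // the policy changed between the read and the delete
        }
        throw $e;
    }
}
```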
DeleteSchema
$result = $client->deleteSchema([/* ... */]);
$promise = $client->deleteSchemaAsync([/* ... */]);
Deletes the entire schema set, including all of its versions. To get the status of the delete operation, you can call the GetSchema API after the asynchronous call. Deleting a schema will deactivate all online operations for the schema, such as the GetSchemaByDefinition and RegisterSchemaVersion APIs.
Parameter Syntax
$result = $client->deleteSchema([
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
]);
Parameter Details
Members
- SchemaId
-
- Required: Yes
- Type: SchemaId structure
This is a wrapper structure that may contain the schema name and Amazon Resource Name (ARN).
Result Syntax
[
    'SchemaArn' => '<string>',
    'SchemaName' => '<string>',
    'Status' => 'AVAILABLE|PENDING|DELETING',
]
Result Details
Members
- SchemaArn
-
- Type: string
The Amazon Resource Name (ARN) of the schema being deleted.
- SchemaName
-
- Type: string
The name of the schema being deleted.
- Status
-
- Type: string
The status of the schema.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist
- AccessDeniedException:
Access to a resource was denied.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
DeleteSchemaVersions
$result = $client->deleteSchemaVersions([/* ... */]);
$promise = $client->deleteSchemaVersionsAsync([/* ... */]);
Removes versions from the specified schema. A version number or range may be supplied. If the compatibility mode forbids deleting a version that is necessary, such as BACKWARDS_FULL, an error is returned. Calling the GetSchemaVersions API after this call will list the status of the deleted versions.
When the range of version numbers contains a checkpointed version, the API returns a 409 conflict and does not proceed with the deletion. You must first remove the checkpoint using the DeleteSchemaCheckpoint API.
You cannot use the DeleteSchemaVersions API to delete the first schema version in the schema set. The first schema version can only be deleted by the DeleteSchema API. This operation also deletes the attached SchemaVersionMetadata under the schema versions. Hard deletes will be enforced on the database.
Parameter Syntax
$result = $client->deleteSchemaVersions([
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
    'Versions' => '<string>', // REQUIRED
]);
Parameter Details
Members
- SchemaId
-
- Required: Yes
- Type: SchemaId structure
This is a wrapper structure that may contain the schema name and Amazon Resource Name (ARN).
- Versions
-
- Required: Yes
- Type: string
A version range may be supplied in one of the following formats:
- a single version number, 5
- a range, 5-8: deletes versions 5, 6, 7, 8
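The two accepted formats can be produced from integers with a small helper. This is only an illustrative sketch; the function name schemaVersionRange is hypothetical and not part of the SDK.

```php
<?php
// Hypothetical helper: builds the 'Versions' string in one of the two
// documented formats, either "5" (single version) or "5-8" (inclusive range).
function schemaVersionRange(int $first, ?int $last = null): string
{
    if ($first < 1) {
        throw new \InvalidArgumentException('Version numbers must be positive.');
    }
    if ($last === null || $last === $first) {
        return (string) $first; // single version number
    }
    if ($last < $first) {
        throw new \InvalidArgumentException('Range end must not precede range start.');
    }
    return $first . '-' . $last; // inclusive range
}

// Request array for deleteSchemaVersions; the schema name is a placeholder.
$params = [
    'SchemaId' => ['RegistryName' => 'my-registry', 'SchemaName' => 'my-schema'],
    'Versions' => schemaVersionRange(5, 8),
];
// $client->deleteSchemaVersions($params);
```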
Result Syntax
[
    'SchemaVersionErrors' => [
        [
            'ErrorDetails' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'VersionNumber' => <integer>,
        ],
        // ...
    ],
]
Result Details
Members
- SchemaVersionErrors
-
- Type: Array of SchemaVersionErrorItem structures
A list of SchemaVersionErrorItem objects, each containing an error and schema version.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist
- AccessDeniedException:
Access to a resource was denied.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
DeleteSecurityConfiguration
$result = $client->deleteSecurityConfiguration([/* ... */]);
$promise = $client->deleteSecurityConfigurationAsync([/* ... */]);
Deletes a specified security configuration.
Parameter Syntax
$result = $client->deleteSecurityConfiguration([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the security configuration to delete.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
DeleteSession
$result = $client->deleteSession([/* ... */]);
$promise = $client->deleteSessionAsync([/* ... */]);
Deletes the session.
Parameter Syntax
$result = $client->deleteSession([
    'Id' => '<string>', // REQUIRED
    'RequestOrigin' => '<string>',
]);
Parameter Details
Members
- Id
-
- Required: Yes
- Type: string
The ID of the session to be deleted.
- RequestOrigin
-
- Type: string
The name of the origin of the delete session request.
Result Syntax
[ 'Id' => '<string>', ]
Result Details
Members
- Id
-
- Type: string
Returns the ID of the deleted session.
Errors
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- IllegalSessionStateException:
The session is in an invalid state to perform a requested operation.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
DeleteTable
$result = $client->deleteTable([/* ... */]);
$promise = $client->deleteTableAsync([/* ... */]);
Removes a table definition from the Data Catalog.
After completing this operation, you no longer have access to the table versions and partitions that belong to the deleted table. Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.
To ensure the immediate deletion of all related resources, before calling DeleteTable, use DeleteTableVersion or BatchDeleteTableVersion, and DeletePartition or BatchDeletePartition, to delete any resources that belong to the table.
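A rough sketch of that cleanup order for partitions is shown below: list every partition, delete them in batches, and only then delete the table. Several details are assumptions rather than statements from this page: the helper name deleteTableWithPartitions is hypothetical, the batch size of 25 is an assumed per-request limit for BatchDeletePartition, and table versions are left out for brevity.

```php
<?php
// Hypothetical helper: removes a table's partitions explicitly before
// deleting the table itself. Table versions are not handled here.
// ASSUMPTION: BatchDeletePartition accepts at most 25 partitions per call.
function deleteTableWithPartitions(object $client, string $database, string $table): void
{
    // 1. Collect all partition value lists first, so that deleting does not
    //    interfere with paging through the listing.
    $toDelete = [];
    $token = null;
    do {
        $args = ['DatabaseName' => $database, 'TableName' => $table];
        if ($token !== null) {
            $args['NextToken'] = $token;
        }
        $page = $client->getPartitions($args);
        foreach ($page['Partitions'] ?? [] as $partition) {
            $toDelete[] = ['Values' => $partition['Values']];
        }
        $token = $page['NextToken'] ?? null;
    } while ($token !== null && $token !== '');

    // 2. Delete the partitions in batches and stop on the first reported error.
    foreach (array_chunk($toDelete, 25) as $chunk) {
        $result = $client->batchDeletePartition([
            'DatabaseName'       => $database,
            'TableName'          => $table,
            'PartitionsToDelete' => $chunk,
        ]);
        if (!empty($result['Errors'])) {
            throw new \RuntimeException('Some partitions could not be deleted.');
        }
    }

    // 3. Finally remove the table definition.
    $client->deleteTable(['DatabaseName' => $database, 'Name' => $table]);
}
```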
Parameter Syntax
$result = $client->deleteTable([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'Name' => '<string>', // REQUIRED
    'TransactionId' => '<string>',
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the table resides. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database in which the table resides. For Hive compatibility, this name is entirely lowercase.
- Name
-
- Required: Yes
- Type: string
The name of the table to be deleted. For Hive compatibility, this name is entirely lowercase.
- TransactionId
-
- Type: string
The transaction ID at which to delete the table contents.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- ResourceNotReadyException:
A resource was not ready for a transaction.
DeleteTableOptimizer
$result = $client->deleteTableOptimizer([/* ... */]);
$promise = $client->deleteTableOptimizerAsync([/* ... */]);
Deletes an optimizer and all associated metadata for a table. The optimization will no longer be performed on the table.
Parameter Syntax
$result = $client->deleteTableOptimizer([
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'Type' => 'compaction|retention|orphan_file_deletion', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Required: Yes
- Type: string
The Catalog ID of the table.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database in the catalog in which the table resides.
- TableName
-
- Required: Yes
- Type: string
The name of the table.
- Type
-
- Required: Yes
- Type: string
The type of table optimizer.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- ThrottlingException:
The throttling threshold was exceeded.
DeleteTableVersion
$result = $client->deleteTableVersion([/* ... */]);
$promise = $client->deleteTableVersionAsync([/* ... */]);
Deletes a specified version of a table.
Parameter Syntax
$result = $client->deleteTableVersion([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'VersionId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the tables reside. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.
- TableName
-
- Required: Yes
- Type: string
The name of the table. For Hive compatibility, this name is entirely lowercase.
- VersionId
-
- Required: Yes
- Type: string
The ID of the table version to be deleted. A VersionID is a string representation of an integer. Each version is incremented by 1.
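Since version IDs are integers encoded as strings, they should be compared numerically rather than lexically ('10' sorts before '9' as text). The sketch below picks the versions to prune while keeping the most recent ones. The helper name tableVersionsToPrune is hypothetical, and it assumes that a higher number means a newer version.

```php
<?php
// Hypothetical helper: given all VersionId strings of a table, returns the
// IDs to delete so that only the $keepLatest highest-numbered versions remain.
// ASSUMPTION: a higher VersionId denotes a newer version.
function tableVersionsToPrune(array $versionIds, int $keepLatest): array
{
    $ids = array_map('intval', $versionIds);
    rsort($ids, SORT_NUMERIC);                      // newest first
    $old = array_slice($ids, max(0, $keepLatest));  // everything past the kept ones
    sort($old, SORT_NUMERIC);                       // oldest first for deletion
    return array_map('strval', $old);               // the API expects strings
}

// Each returned ID could then be passed as 'VersionId' to deleteTableVersion().
$prune = tableVersionsToPrune(['10', '2', '9', '1'], 2);
```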
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
DeleteTrigger
$result = $client->deleteTrigger([/* ... */]);
$promise = $client->deleteTriggerAsync([/* ... */]);
Deletes a specified trigger. If the trigger is not found, no exception is thrown.
Parameter Syntax
$result = $client->deleteTrigger([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the trigger to delete.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the trigger that was deleted.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
DeleteUsageProfile
$result = $client->deleteUsageProfile([/* ... */]);
$promise = $client->deleteUsageProfileAsync([/* ... */]);
Deletes the specified Glue usage profile.
Parameter Syntax
$result = $client->deleteUsageProfile([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the usage profile to delete.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- OperationNotSupportedException:
The operation is not available in the region.
DeleteUserDefinedFunction
$result = $client->deleteUserDefinedFunction([/* ... */]);
$promise = $client->deleteUserDefinedFunctionAsync([/* ... */]);
Deletes an existing function definition from the Data Catalog.
Parameter Syntax
$result = $client->deleteUserDefinedFunction([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'FunctionName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the function to be deleted is located. If none is supplied, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the function is located.
- FunctionName
-
- Required: Yes
- Type: string
The name of the function definition to be deleted.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
DeleteWorkflow
$result = $client->deleteWorkflow([/* ... */]);
$promise = $client->deleteWorkflowAsync([/* ... */]);
Deletes a workflow.
Parameter Syntax
$result = $client->deleteWorkflow([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
Name of the workflow to be deleted.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
Name of the workflow specified in input.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
GetBlueprint
$result = $client->getBlueprint([/* ... */]);
$promise = $client->getBlueprintAsync([/* ... */]);
Retrieves the details of a blueprint.
Parameter Syntax
$result = $client->getBlueprint([
    'IncludeBlueprint' => true || false,
    'IncludeParameterSpec' => true || false,
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- IncludeBlueprint
-
- Type: boolean
Specifies whether or not to include the blueprint in the response.
- IncludeParameterSpec
-
- Type: boolean
Specifies whether or not to include the parameter specification.
- Name
-
- Required: Yes
- Type: string
The name of the blueprint.
Result Syntax
[
    'Blueprint' => [
        'BlueprintLocation' => '<string>',
        'BlueprintServiceLocation' => '<string>',
        'CreatedOn' => <DateTime>,
        'Description' => '<string>',
        'ErrorMessage' => '<string>',
        'LastActiveDefinition' => [
            'BlueprintLocation' => '<string>',
            'BlueprintServiceLocation' => '<string>',
            'Description' => '<string>',
            'LastModifiedOn' => <DateTime>,
            'ParameterSpec' => '<string>',
        ],
        'LastModifiedOn' => <DateTime>,
        'Name' => '<string>',
        'ParameterSpec' => '<string>',
        'Status' => 'CREATING|ACTIVE|UPDATING|FAILED',
    ],
]
Result Details
Members
- Blueprint
-
- Type: Blueprint structure
Returns a Blueprint object.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
GetBlueprintRun
$result = $client->getBlueprintRun([/* ... */]);
$promise = $client->getBlueprintRunAsync([/* ... */]);
Retrieves the details of a blueprint run.
Parameter Syntax
$result = $client->getBlueprintRun([
    'BlueprintName' => '<string>', // REQUIRED
    'RunId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- BlueprintName
-
- Required: Yes
- Type: string
The name of the blueprint.
- RunId
-
- Required: Yes
- Type: string
The run ID for the blueprint run you want to retrieve.
Result Syntax
[
    'BlueprintRun' => [
        'BlueprintName' => '<string>',
        'CompletedOn' => <DateTime>,
        'ErrorMessage' => '<string>',
        'Parameters' => '<string>',
        'RoleArn' => '<string>',
        'RollbackErrorMessage' => '<string>',
        'RunId' => '<string>',
        'StartedOn' => <DateTime>,
        'State' => 'RUNNING|SUCCEEDED|FAILED|ROLLING_BACK',
        'WorkflowName' => '<string>',
    ],
]
Result Details
Members
- BlueprintRun
-
- Type: BlueprintRun structure
Returns a BlueprintRun object.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
GetBlueprintRuns
$result = $client->getBlueprintRuns([/* ... */]);
$promise = $client->getBlueprintRunsAsync([/* ... */]);
Retrieves the details of blueprint runs for a specified blueprint.
Parameter Syntax
$result = $client->getBlueprintRuns([
    'BlueprintName' => '<string>', // REQUIRED
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- BlueprintName
-
- Required: Yes
- Type: string
The name of the blueprint.
- MaxResults
-
- Type: int
The maximum size of a list to return.
- NextToken
-
- Type: string
A continuation token, if this is a continuation request.
Result Syntax
[
    'BlueprintRuns' => [
        [
            'BlueprintName' => '<string>',
            'CompletedOn' => <DateTime>,
            'ErrorMessage' => '<string>',
            'Parameters' => '<string>',
            'RoleArn' => '<string>',
            'RollbackErrorMessage' => '<string>',
            'RunId' => '<string>',
            'StartedOn' => <DateTime>,
            'State' => 'RUNNING|SUCCEEDED|FAILED|ROLLING_BACK',
            'WorkflowName' => '<string>',
        ],
        // ...
    ],
    'NextToken' => '<string>',
]
Result Details
Members
- BlueprintRuns
-
- Type: Array of BlueprintRun structures
Returns a list of BlueprintRun objects.
- NextToken
-
- Type: string
A continuation token, if not all blueprint runs have been returned.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
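The NextToken request and result members support a simple paging loop: pass the token from one response into the next request until no token is returned. A minimal sketch follows; the generator name iterateBlueprintRuns and the default page size of 50 are hypothetical choices.

```php
<?php
// Hypothetical helper: lazily yields every BlueprintRun for a blueprint by
// following NextToken until the service stops returning one.
function iterateBlueprintRuns(object $client, string $blueprintName, int $pageSize = 50): \Generator
{
    $token = null;
    do {
        $args = ['BlueprintName' => $blueprintName, 'MaxResults' => $pageSize];
        if ($token !== null) {
            $args['NextToken'] = $token; // continuation request
        }
        $page = $client->getBlueprintRuns($args);
        foreach ($page['BlueprintRuns'] ?? [] as $run) {
            yield $run;
        }
        $token = $page['NextToken'] ?? null;
    } while ($token !== null && $token !== '');
}
```

A caller could then write foreach (iterateBlueprintRuns($client, 'my-blueprint') as $run) and read $run['State'] for each item, without handling tokens directly.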
GetCatalogImportStatus
$result = $client->getCatalogImportStatus([/* ... */]);
$promise = $client->getCatalogImportStatusAsync([/* ... */]);
Retrieves the status of a migration operation.
Parameter Syntax
$result = $client->getCatalogImportStatus([
    'CatalogId' => '<string>',
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the catalog to migrate. Currently, this should be the Amazon Web Services account ID.
Result Syntax
[
    'ImportStatus' => [
        'ImportCompleted' => true || false,
        'ImportTime' => <DateTime>,
        'ImportedBy' => '<string>',
    ],
]
Result Details
Members
- ImportStatus
-
- Type: CatalogImportStatus structure
The status of the specified catalog migration.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
GetClassifier
$result = $client->getClassifier([/* ... */]);
$promise = $client->getClassifierAsync([/* ... */]);
Retrieves a classifier by name.
Parameter Syntax
$result = $client->getClassifier([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
Name of the classifier to retrieve.
Result Syntax
[
    'Classifier' => [
        'CsvClassifier' => [
            'AllowSingleColumn' => true || false,
            'ContainsHeader' => 'UNKNOWN|PRESENT|ABSENT',
            'CreationTime' => <DateTime>,
            'CustomDatatypeConfigured' => true || false,
            'CustomDatatypes' => ['<string>', ...],
            'Delimiter' => '<string>',
            'DisableValueTrimming' => true || false,
            'Header' => ['<string>', ...],
            'LastUpdated' => <DateTime>,
            'Name' => '<string>',
            'QuoteSymbol' => '<string>',
            'Serde' => 'OpenCSVSerDe|LazySimpleSerDe|None',
            'Version' => <integer>,
        ],
        'GrokClassifier' => [
            'Classification' => '<string>',
            'CreationTime' => <DateTime>,
            'CustomPatterns' => '<string>',
            'GrokPattern' => '<string>',
            'LastUpdated' => <DateTime>,
            'Name' => '<string>',
            'Version' => <integer>,
        ],
        'JsonClassifier' => [
            'CreationTime' => <DateTime>,
            'JsonPath' => '<string>',
            'LastUpdated' => <DateTime>,
            'Name' => '<string>',
            'Version' => <integer>,
        ],
        'XMLClassifier' => [
            'Classification' => '<string>',
            'CreationTime' => <DateTime>,
            'LastUpdated' => <DateTime>,
            'Name' => '<string>',
            'RowTag' => '<string>',
            'Version' => <integer>,
        ],
    ],
]
Result Details
Members
- Classifier
-
- Type: Classifier structure
The requested classifier.
Errors
- EntityNotFoundException:
A specified entity does not exist
- OperationTimeoutException:
The operation timed out.
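The Classifier structure has one member per classifier kind. The sketch below reports which member is populated in a returned classifier. It assumes that only one of the four members is set for any given classifier, which this page does not state explicitly; the helper name classifierKind is hypothetical.

```php
<?php
// Hypothetical helper: returns the name of the populated member of a
// Classifier structure, or null when none is set.
// ASSUMPTION: at most one of the four members is populated per classifier.
function classifierKind(array $classifier): ?string
{
    $kinds = ['CsvClassifier', 'GrokClassifier', 'JsonClassifier', 'XMLClassifier'];
    foreach ($kinds as $kind) {
        if (!empty($classifier[$kind])) {
            return $kind;
        }
    }
    return null;
}

// Example with a hand-written structure shaped like $result['Classifier'].
$kind = classifierKind(['JsonClassifier' => ['Name' => 'events', 'JsonPath' => '$[*]']]);
```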
GetClassifiers
$result = $client->getClassifiers([/* ... */]);
$promise = $client->getClassifiersAsync([/* ... */]);
Lists all classifier objects in the Data Catalog.
Parameter Syntax
$result = $client->getClassifiers([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- MaxResults
-
- Type: int
The size of the list to return (optional).
- NextToken
-
- Type: string
An optional continuation token.
Result Syntax
[
    'Classifiers' => [
        [
            'CsvClassifier' => [
                'AllowSingleColumn' => true || false,
                'ContainsHeader' => 'UNKNOWN|PRESENT|ABSENT',
                'CreationTime' => <DateTime>,
                'CustomDatatypeConfigured' => true || false,
                'CustomDatatypes' => ['<string>', ...],
                'Delimiter' => '<string>',
                'DisableValueTrimming' => true || false,
                'Header' => ['<string>', ...],
                'LastUpdated' => <DateTime>,
                'Name' => '<string>',
                'QuoteSymbol' => '<string>',
                'Serde' => 'OpenCSVSerDe|LazySimpleSerDe|None',
                'Version' => <integer>,
            ],
            'GrokClassifier' => [
                'Classification' => '<string>',
                'CreationTime' => <DateTime>,
                'CustomPatterns' => '<string>',
                'GrokPattern' => '<string>',
                'LastUpdated' => <DateTime>,
                'Name' => '<string>',
                'Version' => <integer>,
            ],
            'JsonClassifier' => [
                'CreationTime' => <DateTime>,
                'JsonPath' => '<string>',
                'LastUpdated' => <DateTime>,
                'Name' => '<string>',
                'Version' => <integer>,
            ],
            'XMLClassifier' => [
                'Classification' => '<string>',
                'CreationTime' => <DateTime>,
                'LastUpdated' => <DateTime>,
                'Name' => '<string>',
                'RowTag' => '<string>',
                'Version' => <integer>,
            ],
        ],
        // ...
    ],
    'NextToken' => '<string>',
]
Result Details
Members
- Classifiers
-
- Type: Array of Classifier structures
The requested list of classifier objects.
- NextToken
-
- Type: string
A continuation token.
Errors
- OperationTimeoutException:
The operation timed out.
GetColumnStatisticsForPartition
$result = $client->getColumnStatisticsForPartition([/* ... */]);
$promise = $client->getColumnStatisticsForPartitionAsync([/* ... */]);
Retrieves partition statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is GetPartition.
Parameter Syntax
$result = $client->getColumnStatisticsForPartition([
    'CatalogId' => '<string>',
    'ColumnNames' => ['<string>', ...], // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionValues' => ['<string>', ...], // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.
- ColumnNames
-
- Required: Yes
- Type: Array of strings
A list of the column names.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partitions reside.
- PartitionValues
-
- Required: Yes
- Type: Array of strings
A list of partition values identifying the partition.
- TableName
-
- Required: Yes
- Type: string
The name of the partitions' table.
Result Syntax
[
    'ColumnStatisticsList' => [
        [
            'AnalyzedTime' => <DateTime>,
            'ColumnName' => '<string>',
            'ColumnType' => '<string>',
            'StatisticsData' => [
                'BinaryColumnStatisticsData' => [
                    'AverageLength' => <float>,
                    'MaximumLength' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'BooleanColumnStatisticsData' => [
                    'NumberOfFalses' => <integer>,
                    'NumberOfNulls' => <integer>,
                    'NumberOfTrues' => <integer>,
                ],
                'DateColumnStatisticsData' => [
                    'MaximumValue' => <DateTime>,
                    'MinimumValue' => <DateTime>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'DecimalColumnStatisticsData' => [
                    'MaximumValue' => [
                        'Scale' => <integer>,
                        'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>,
                    ],
                    'MinimumValue' => [
                        'Scale' => <integer>,
                        'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>,
                    ],
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'DoubleColumnStatisticsData' => [
                    'MaximumValue' => <float>,
                    'MinimumValue' => <float>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'LongColumnStatisticsData' => [
                    'MaximumValue' => <integer>,
                    'MinimumValue' => <integer>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'StringColumnStatisticsData' => [
                    'AverageLength' => <float>,
                    'MaximumLength' => <integer>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'Type' => 'BOOLEAN|DATE|DECIMAL|DOUBLE|LONG|STRING|BINARY',
            ],
        ],
        // ...
    ],
    'Errors' => [
        [
            'ColumnName' => '<string>',
            'Error' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
        ],
        // ...
    ],
]
Result Details
Members
- ColumnStatisticsList
-
- Type: Array of ColumnStatistics structures
List of ColumnStatistics that were retrieved.
- Errors
-
- Type: Array of ColumnError structures
Errors that occurred while retrieving column statistics data.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
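Because a single response can carry both retrieved statistics and per-column errors, it can help to index both by column name before use. The sketch below does this for a result shaped like the Result Syntax above; the helper name indexColumnStatistics and the 'stats' and 'errors' keys of its return value are hypothetical.

```php
<?php
// Hypothetical helper: splits a column statistics result into statistics and
// error messages, each keyed by column name. $result may be a plain array or
// any ArrayAccess object with the same members.
function indexColumnStatistics($result): array
{
    $stats = [];
    foreach ($result['ColumnStatisticsList'] ?? [] as $entry) {
        $stats[$entry['ColumnName']] = $entry['StatisticsData'];
    }

    $errors = [];
    foreach ($result['Errors'] ?? [] as $entry) {
        $errors[$entry['ColumnName']] =
            $entry['Error']['ErrorCode'] . ': ' . $entry['Error']['ErrorMessage'];
    }

    return ['stats' => $stats, 'errors' => $errors];
}

// Example with a hand-written result; real values come from the service.
$indexed = indexColumnStatistics([
    'ColumnStatisticsList' => [
        ['ColumnName' => 'price', 'StatisticsData' => ['Type' => 'DOUBLE']],
    ],
    'Errors' => [
        ['ColumnName' => 'sku', 'Error' => ['ErrorCode' => 'E1', 'ErrorMessage' => 'not found']],
    ],
]);
```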
GetColumnStatisticsForTable
$result = $client->getColumnStatisticsForTable([/* ... */]);
$promise = $client->getColumnStatisticsForTableAsync([/* ... */]);
Retrieves table statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is GetTable.
Parameter Syntax
$result = $client->getColumnStatisticsForTable([
    'CatalogId' => '<string>',
    'ColumnNames' => ['<string>', ...], // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.
- ColumnNames
-
- Required: Yes
- Type: Array of strings
A list of the column names.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partitions reside.
- TableName
-
- Required: Yes
- Type: string
The name of the partitions' table.
Result Syntax
[ 'ColumnStatisticsList' => [ [ 'AnalyzedTime' => <DateTime>, 'ColumnName' => '<string>', 'ColumnType' => '<string>', 'StatisticsData' => [ 'BinaryColumnStatisticsData' => [ 'AverageLength' => <float>, 'MaximumLength' => <integer>, 'NumberOfNulls' => <integer>, ], 'BooleanColumnStatisticsData' => [ 'NumberOfFalses' => <integer>, 'NumberOfNulls' => <integer>, 'NumberOfTrues' => <integer>, ], 'DateColumnStatisticsData' => [ 'MaximumValue' => <DateTime>, 'MinimumValue' => <DateTime>, 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'DecimalColumnStatisticsData' => [ 'MaximumValue' => [ 'Scale' => <integer>, 'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>, ], 'MinimumValue' => [ 'Scale' => <integer>, 'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>, ], 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'DoubleColumnStatisticsData' => [ 'MaximumValue' => <float>, 'MinimumValue' => <float>, 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'LongColumnStatisticsData' => [ 'MaximumValue' => <integer>, 'MinimumValue' => <integer>, 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'StringColumnStatisticsData' => [ 'AverageLength' => <float>, 'MaximumLength' => <integer>, 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'Type' => 'BOOLEAN|DATE|DECIMAL|DOUBLE|LONG|STRING|BINARY', ], ], // ... ], 'Errors' => [ [ 'ColumnName' => '<string>', 'Error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], ], // ... ], ]
Result Details
Members
- ColumnStatisticsList
-
- Type: Array of ColumnStatistics structures
List of ColumnStatistics.
- Errors
-
- Type: Array of ColumnError structures
List of errors for the columns whose ColumnStatistics failed to be retrieved.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
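Because a single call can return statistics for some columns and errors for others, it can help to split the two lists before using them. The following is a minimal sketch, not part of the SDK: `splitColumnStatistics` is a hypothetical helper, and `$sample` is hand-written data that follows the Result Syntax above.

```php
<?php
// Hypothetical helper: separates retrieved statistics from per-column failures.
function splitColumnStatistics(array $result): array
{
    $found = [];
    foreach ($result['ColumnStatisticsList'] ?? [] as $stats) {
        // Map each column name to the statistics type reported for it.
        $found[$stats['ColumnName']] = $stats['StatisticsData']['Type'] ?? null;
    }

    $failed = [];
    foreach ($result['Errors'] ?? [] as $error) {
        // Map each failed column name to its error code.
        $failed[$error['ColumnName']] = $error['Error']['ErrorCode'] ?? 'Unknown';
    }

    return ['found' => $found, 'failed' => $failed];
}

// Hand-written sample standing in for a real response.
$sample = [
    'ColumnStatisticsList' => [
        ['ColumnName' => 'price', 'ColumnType' => 'double', 'StatisticsData' => ['Type' => 'DOUBLE']],
    ],
    'Errors' => [
        ['ColumnName' => 'notes', 'Error' => ['ErrorCode' => 'EntityNotFoundException', 'ErrorMessage' => 'Not found']],
    ],
];

$split = splitColumnStatistics($sample);
```

With a real response, `$result->toArray()` could be passed to the helper in place of `$sample`.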
GetColumnStatisticsTaskRun
$result = $client->getColumnStatisticsTaskRun([/* ... */]);
$promise = $client->getColumnStatisticsTaskRunAsync([/* ... */]);
Gets the metadata for a column statistics task run, given a task run ID.
Parameter Syntax
$result = $client->getColumnStatisticsTaskRun([ 'ColumnStatisticsTaskRunId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- ColumnStatisticsTaskRunId
-
- Required: Yes
- Type: string
The identifier for the particular column statistics task run.
Result Syntax
[ 'ColumnStatisticsTaskRun' => [ 'CatalogID' => '<string>', 'ColumnNameList' => ['<string>', ...], 'ColumnStatisticsTaskRunId' => '<string>', 'ComputationType' => 'FULL|INCREMENTAL', 'CreationTime' => <DateTime>, 'CustomerId' => '<string>', 'DPUSeconds' => <float>, 'DatabaseName' => '<string>', 'EndTime' => <DateTime>, 'ErrorMessage' => '<string>', 'LastUpdated' => <DateTime>, 'NumberOfWorkers' => <integer>, 'Role' => '<string>', 'SampleSize' => <float>, 'SecurityConfiguration' => '<string>', 'StartTime' => <DateTime>, 'Status' => 'STARTING|RUNNING|SUCCEEDED|FAILED|STOPPED', 'TableName' => '<string>', 'WorkerType' => '<string>', ], ]
Result Details
Members
- ColumnStatisticsTaskRun
-
- Type: ColumnStatisticsTaskRun structure
A ColumnStatisticsTaskRun object representing the details of the column stats run.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
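When checking on a task run, two common questions are whether it has finished and how long it took. A minimal sketch follows. The helper names are hypothetical, and treating SUCCEEDED, FAILED, and STOPPED as final states is an assumption based on the Status values listed above. The sample uses DateTimeImmutable objects in place of the SDK's date-time values.

```php
<?php
// Assumption: these three Status values mean the run will not change any further.
function isFinishedTaskRun(array $taskRun): bool
{
    return in_array($taskRun['Status'] ?? '', ['SUCCEEDED', 'FAILED', 'STOPPED'], true);
}

// Wall-clock duration in seconds, or null while StartTime or EndTime is missing.
function taskRunDurationSeconds(array $taskRun): ?int
{
    if (!isset($taskRun['StartTime'], $taskRun['EndTime'])) {
        return null;
    }
    return $taskRun['EndTime']->getTimestamp() - $taskRun['StartTime']->getTimestamp();
}

// Hand-written sample in the shape of the 'ColumnStatisticsTaskRun' member.
$sampleRun = [
    'ColumnStatisticsTaskRunId' => 'example-run-id',
    'Status' => 'SUCCEEDED',
    'StartTime' => new DateTimeImmutable('2024-01-01T00:00:00Z'),
    'EndTime' => new DateTimeImmutable('2024-01-01T00:05:30Z'),
];

$finished = isFinishedTaskRun($sampleRun);
$duration = taskRunDurationSeconds($sampleRun);
```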
GetColumnStatisticsTaskRuns
$result = $client->getColumnStatisticsTaskRuns([/* ... */]);
$promise = $client->getColumnStatisticsTaskRunsAsync([/* ... */]);
Retrieves information about all runs associated with the specified table.
Parameter Syntax
$result = $client->getColumnStatisticsTaskRuns([ 'DatabaseName' => '<string>', // REQUIRED 'MaxResults' => <integer>, 'NextToken' => '<string>', 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database where the table resides.
- MaxResults
-
- Type: int
The maximum size of the response.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
- TableName
-
- Required: Yes
- Type: string
The name of the table.
Result Syntax
[ 'ColumnStatisticsTaskRuns' => [ [ 'CatalogID' => '<string>', 'ColumnNameList' => ['<string>', ...], 'ColumnStatisticsTaskRunId' => '<string>', 'ComputationType' => 'FULL|INCREMENTAL', 'CreationTime' => <DateTime>, 'CustomerId' => '<string>', 'DPUSeconds' => <float>, 'DatabaseName' => '<string>', 'EndTime' => <DateTime>, 'ErrorMessage' => '<string>', 'LastUpdated' => <DateTime>, 'NumberOfWorkers' => <integer>, 'Role' => '<string>', 'SampleSize' => <float>, 'SecurityConfiguration' => '<string>', 'StartTime' => <DateTime>, 'Status' => 'STARTING|RUNNING|SUCCEEDED|FAILED|STOPPED', 'TableName' => '<string>', 'WorkerType' => '<string>', ], // ... ], 'NextToken' => '<string>', ]
Result Details
Members
- ColumnStatisticsTaskRuns
-
- Type: Array of ColumnStatisticsTaskRun structures
A list of column statistics task runs.
- NextToken
-
- Type: string
A continuation token, if not all task runs have yet been returned.
Errors
- OperationTimeoutException:
The operation timed out.
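The NextToken parameter and result member together describe a continuation loop: each response's token is passed to the next request until no token is returned. The sketch below shows one way to write that loop. `collectTaskRuns` and `$fetchPage` are hypothetical names, and the fake pages stand in for real responses so that the example runs without a client.

```php
<?php
// $fetchPage is a hypothetical callable: it receives request parameters and returns one page.
function collectTaskRuns(callable $fetchPage, string $database, string $table): array
{
    $params = ['DatabaseName' => $database, 'TableName' => $table];
    $runs = [];
    do {
        $page = $fetchPage($params);
        foreach ($page['ColumnStatisticsTaskRuns'] ?? [] as $run) {
            $runs[] = $run;
        }
        // Carry the continuation token into the next request, if one was returned.
        $token = $page['NextToken'] ?? null;
        if ($token) {
            $params['NextToken'] = $token;
        }
    } while ($token);

    return $runs;
}

// Two fake pages keyed by the token that requests them ('' means the first call).
$fakePages = [
    '' => ['ColumnStatisticsTaskRuns' => [['ColumnStatisticsTaskRunId' => 'run-1']], 'NextToken' => 'page-2'],
    'page-2' => ['ColumnStatisticsTaskRuns' => [['ColumnStatisticsTaskRunId' => 'run-2']]],
];
$fetchPage = function (array $params) use ($fakePages) {
    return $fakePages[$params['NextToken'] ?? ''];
};

$runs = collectTaskRuns($fetchPage, 'sales_db', 'orders');
$runIds = array_column($runs, 'ColumnStatisticsTaskRunId');
```

With a real client, the callable could simply forward its parameters to `$client->getColumnStatisticsTaskRuns()`.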
GetColumnStatisticsTaskSettings
$result = $client->getColumnStatisticsTaskSettings([/* ... */]);
$promise = $client->getColumnStatisticsTaskSettingsAsync([/* ... */]);
Gets settings for a column statistics task.
Parameter Syntax
$result = $client->getColumnStatisticsTaskSettings([ 'DatabaseName' => '<string>', // REQUIRED 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database where the table resides.
- TableName
-
- Required: Yes
- Type: string
The name of the table for which to retrieve column statistics.
Result Syntax
[ 'ColumnStatisticsTaskSettings' => [ 'CatalogID' => '<string>', 'ColumnNameList' => ['<string>', ...], 'DatabaseName' => '<string>', 'Role' => '<string>', 'SampleSize' => <float>, 'Schedule' => [ 'ScheduleExpression' => '<string>', 'State' => 'SCHEDULED|NOT_SCHEDULED|TRANSITIONING', ], 'SecurityConfiguration' => '<string>', 'TableName' => '<string>', ], ]
Result Details
Members
- ColumnStatisticsTaskSettings
-
- Type: ColumnStatisticsTaskSettings structure
A ColumnStatisticsTaskSettings object representing the settings for the column statistics task.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
GetConnection
$result = $client->getConnection([/* ... */]);
$promise = $client->getConnectionAsync([/* ... */]);
Retrieves a connection definition from the Data Catalog.
Parameter Syntax
$result = $client->getConnection([ 'CatalogId' => '<string>', 'HidePassword' => true || false, 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the connection resides. If none is provided, the Amazon Web Services account ID is used by default.
- HidePassword
-
- Type: boolean
Allows you to retrieve the connection metadata without returning the password. For instance, the Glue console uses this flag to retrieve the connection, and does not display the password. Set this parameter when the caller might not have permission to use the KMS key to decrypt the password, but it does have permission to access the rest of the connection properties.
- Name
-
- Required: Yes
- Type: string
The name of the connection definition to retrieve.
Result Syntax
[ 'Connection' => [ 'AthenaProperties' => ['<string>', ...], 'AuthenticationConfiguration' => [ 'AuthenticationType' => 'BASIC|OAUTH2|CUSTOM', 'OAuth2Properties' => [ 'OAuth2ClientApplication' => [ 'AWSManagedClientApplicationReference' => '<string>', 'UserManagedClientApplicationClientId' => '<string>', ], 'OAuth2GrantType' => 'AUTHORIZATION_CODE|CLIENT_CREDENTIALS|JWT_BEARER', 'TokenUrl' => '<string>', 'TokenUrlParametersMap' => ['<string>', ...], ], 'SecretArn' => '<string>', ], 'ConnectionProperties' => ['<string>', ...], 'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM|SALESFORCE|VIEW_VALIDATION_REDSHIFT|VIEW_VALIDATION_ATHENA', 'CreationTime' => <DateTime>, 'Description' => '<string>', 'LastConnectionValidationTime' => <DateTime>, 'LastUpdatedBy' => '<string>', 'LastUpdatedTime' => <DateTime>, 'MatchCriteria' => ['<string>', ...], 'Name' => '<string>', 'PhysicalConnectionRequirements' => [ 'AvailabilityZone' => '<string>', 'SecurityGroupIdList' => ['<string>', ...], 'SubnetId' => '<string>', ], 'Status' => 'READY|IN_PROGRESS|FAILED', 'StatusReason' => '<string>', ], ]
Result Details
Members
- Connection
-
- Type: Connection structure
The requested connection definition.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- GlueEncryptionException:
An encryption operation failed.
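As described for HidePassword, a caller that cannot use the KMS key can still read the rest of the connection properties. A small sketch of building such a request follows. `buildGetConnectionParams` is a hypothetical helper, and the connection name and catalog ID are made-up values.

```php
<?php
// Hypothetical helper: always sets HidePassword so the password is not returned.
function buildGetConnectionParams(string $name, ?string $catalogId = null): array
{
    $params = ['Name' => $name, 'HidePassword' => true];
    if ($catalogId !== null) {
        // Only send CatalogId when the caller wants a catalog other than the default.
        $params['CatalogId'] = $catalogId;
    }
    return $params;
}

$defaultCatalog = buildGetConnectionParams('example-jdbc');
$otherCatalog = buildGetConnectionParams('example-jdbc', '123456789012');
```

Either array could then be passed to `$client->getConnection()`.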
GetConnections
$result = $client->getConnections([/* ... */]);
$promise = $client->getConnectionsAsync([/* ... */]);
Retrieves a list of connection definitions from the Data Catalog.
Parameter Syntax
$result = $client->getConnections([ 'CatalogId' => '<string>', 'Filter' => [ 'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM|SALESFORCE|VIEW_VALIDATION_REDSHIFT|VIEW_VALIDATION_ATHENA', 'MatchCriteria' => ['<string>', ...], ], 'HidePassword' => true || false, 'MaxResults' => <integer>, 'NextToken' => '<string>', ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the connections reside. If none is provided, the Amazon Web Services account ID is used by default.
- Filter
-
- Type: GetConnectionsFilter structure
A filter that controls which connections are returned.
- HidePassword
-
- Type: boolean
Allows you to retrieve the connection metadata without returning the password. For instance, the Glue console uses this flag to retrieve the connection, and does not display the password. Set this parameter when the caller might not have permission to use the KMS key to decrypt the password, but it does have permission to access the rest of the connection properties.
- MaxResults
-
- Type: int
The maximum number of connections to return in one response.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
Result Syntax
[ 'ConnectionList' => [ [ 'AthenaProperties' => ['<string>', ...], 'AuthenticationConfiguration' => [ 'AuthenticationType' => 'BASIC|OAUTH2|CUSTOM', 'OAuth2Properties' => [ 'OAuth2ClientApplication' => [ 'AWSManagedClientApplicationReference' => '<string>', 'UserManagedClientApplicationClientId' => '<string>', ], 'OAuth2GrantType' => 'AUTHORIZATION_CODE|CLIENT_CREDENTIALS|JWT_BEARER', 'TokenUrl' => '<string>', 'TokenUrlParametersMap' => ['<string>', ...], ], 'SecretArn' => '<string>', ], 'ConnectionProperties' => ['<string>', ...], 'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM|SALESFORCE|VIEW_VALIDATION_REDSHIFT|VIEW_VALIDATION_ATHENA', 'CreationTime' => <DateTime>, 'Description' => '<string>', 'LastConnectionValidationTime' => <DateTime>, 'LastUpdatedBy' => '<string>', 'LastUpdatedTime' => <DateTime>, 'MatchCriteria' => ['<string>', ...], 'Name' => '<string>', 'PhysicalConnectionRequirements' => [ 'AvailabilityZone' => '<string>', 'SecurityGroupIdList' => ['<string>', ...], 'SubnetId' => '<string>', ], 'Status' => 'READY|IN_PROGRESS|FAILED', 'StatusReason' => '<string>', ], // ... ], 'NextToken' => '<string>', ]
Result Details
Members
- ConnectionList
-
- Type: Array of Connection structures
A list of requested connection definitions.
- NextToken
-
- Type: string
A continuation token, if the list of connections returned does not include the last of the filtered connections.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- GlueEncryptionException:
An encryption operation failed.
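Each entry in ConnectionList carries a Status and, optionally, a StatusReason. The sketch below picks out the connections that report a status other than READY. `connectionsNeedingAttention` is a hypothetical helper, and treating every non-READY status as worth a look is an assumption made for illustration. The sample data is hand-written.

```php
<?php
// Hypothetical helper: maps connection name to the reason (or status) for non-READY entries.
function connectionsNeedingAttention(array $connectionList): array
{
    $problems = [];
    foreach ($connectionList as $connection) {
        $status = $connection['Status'] ?? null;
        // Entries without a Status are skipped rather than guessed at.
        if ($status !== null && $status !== 'READY') {
            $problems[$connection['Name']] = $connection['StatusReason'] ?? $status;
        }
    }
    return $problems;
}

// Hand-written sample in the shape of the 'ConnectionList' member.
$sampleList = [
    ['Name' => 'warehouse-jdbc', 'Status' => 'READY'],
    ['Name' => 'crm-salesforce', 'Status' => 'FAILED', 'StatusReason' => 'Authentication failed'],
    ['Name' => 'legacy-kafka'],
    ['Name' => 'events-kafka', 'Status' => 'IN_PROGRESS'],
];

$problems = connectionsNeedingAttention($sampleList);
```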
GetCrawler
$result = $client->getCrawler([/* ... */]);
$promise = $client->getCrawlerAsync([/* ... */]);
Retrieves metadata for a specified crawler.
Parameter Syntax
$result = $client->getCrawler([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the crawler to retrieve metadata for.
Result Syntax
[ 'Crawler' => [ 'Classifiers' => ['<string>', ...], 'Configuration' => '<string>', 'CrawlElapsedTime' => <integer>, 'CrawlerSecurityConfiguration' => '<string>', 'CreationTime' => <DateTime>, 'DatabaseName' => '<string>', 'Description' => '<string>', 'LakeFormationConfiguration' => [ 'AccountId' => '<string>', 'UseLakeFormationCredentials' => true || false, ], 'LastCrawl' => [ 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'MessagePrefix' => '<string>', 'StartTime' => <DateTime>, 'Status' => 'SUCCEEDED|CANCELLED|FAILED', ], 'LastUpdated' => <DateTime>, 'LineageConfiguration' => [ 'CrawlerLineageSettings' => 'ENABLE|DISABLE', ], 'Name' => '<string>', 'RecrawlPolicy' => [ 'RecrawlBehavior' => 'CRAWL_EVERYTHING|CRAWL_NEW_FOLDERS_ONLY|CRAWL_EVENT_MODE', ], 'Role' => '<string>', 'Schedule' => [ 'ScheduleExpression' => '<string>', 'State' => 'SCHEDULED|NOT_SCHEDULED|TRANSITIONING', ], 'SchemaChangePolicy' => [ 'DeleteBehavior' => 'LOG|DELETE_FROM_DATABASE|DEPRECATE_IN_DATABASE', 'UpdateBehavior' => 'LOG|UPDATE_IN_DATABASE', ], 'State' => 'READY|RUNNING|STOPPING', 'TablePrefix' => '<string>', 'Targets' => [ 'CatalogTargets' => [ [ 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Tables' => ['<string>', ...], ], // ... ], 'DeltaTargets' => [ [ 'ConnectionName' => '<string>', 'CreateNativeDeltaTable' => true || false, 'DeltaTables' => ['<string>', ...], 'WriteManifest' => true || false, ], // ... ], 'DynamoDBTargets' => [ [ 'Path' => '<string>', 'scanAll' => true || false, 'scanRate' => <float>, ], // ... ], 'HudiTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... ], 'IcebergTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... 
], 'JdbcTargets' => [ [ 'ConnectionName' => '<string>', 'EnableAdditionalMetadata' => ['<string>', ...], 'Exclusions' => ['<string>', ...], 'Path' => '<string>', ], // ... ], 'MongoDBTargets' => [ [ 'ConnectionName' => '<string>', 'Path' => '<string>', 'ScanAll' => true || false, ], // ... ], 'S3Targets' => [ [ 'ConnectionName' => '<string>', 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Exclusions' => ['<string>', ...], 'Path' => '<string>', 'SampleSize' => <integer>, ], // ... ], ], 'Version' => <integer>, ], ]
Result Details
Members
- Crawler
-
- Type: Crawler structure
The metadata for the specified crawler.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
GetCrawlerMetrics
$result = $client->getCrawlerMetrics([/* ... */]);
$promise = $client->getCrawlerMetricsAsync([/* ... */]);
Retrieves metrics about specified crawlers.
Parameter Syntax
$result = $client->getCrawlerMetrics([ 'CrawlerNameList' => ['<string>', ...], 'MaxResults' => <integer>, 'NextToken' => '<string>', ]);
Parameter Details
Members
- CrawlerNameList
-
- Type: Array of strings
A list of the names of crawlers about which to retrieve metrics.
- MaxResults
-
- Type: int
The maximum size of a list to return.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
Result Syntax
[ 'CrawlerMetricsList' => [ [ 'CrawlerName' => '<string>', 'LastRuntimeSeconds' => <float>, 'MedianRuntimeSeconds' => <float>, 'StillEstimating' => true || false, 'TablesCreated' => <integer>, 'TablesDeleted' => <integer>, 'TablesUpdated' => <integer>, 'TimeLeftSeconds' => <float>, ], // ... ], 'NextToken' => '<string>', ]
Result Details
Members
- CrawlerMetricsList
-
- Type: Array of CrawlerMetrics structures
A list of metrics for the specified crawlers.
- NextToken
-
- Type: string
A continuation token, if the returned list does not contain the last metric available.
Errors
- OperationTimeoutException:
The operation timed out.
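The per-crawler entries in CrawlerMetricsList are easy to roll up into totals. The following sketch sums the table counters and lists the crawlers that are still estimating. `summarizeCrawlerMetrics` is a hypothetical helper and the sample values are invented.

```php
<?php
// Hypothetical helper: totals the table counters across all crawlers in the list.
function summarizeCrawlerMetrics(array $metricsList): array
{
    $summary = ['created' => 0, 'updated' => 0, 'deleted' => 0, 'stillEstimating' => []];
    foreach ($metricsList as $metrics) {
        $summary['created'] += $metrics['TablesCreated'] ?? 0;
        $summary['updated'] += $metrics['TablesUpdated'] ?? 0;
        $summary['deleted'] += $metrics['TablesDeleted'] ?? 0;
        if (!empty($metrics['StillEstimating'])) {
            // Remember which crawlers have not finished estimating their run time.
            $summary['stillEstimating'][] = $metrics['CrawlerName'];
        }
    }
    return $summary;
}

// Hand-written sample in the shape of the 'CrawlerMetricsList' member.
$sampleMetrics = [
    ['CrawlerName' => 'orders', 'TablesCreated' => 3, 'TablesUpdated' => 1, 'TablesDeleted' => 0, 'StillEstimating' => false],
    ['CrawlerName' => 'clicks', 'TablesCreated' => 2, 'TablesUpdated' => 0, 'TablesDeleted' => 1, 'StillEstimating' => true],
];

$summary = summarizeCrawlerMetrics($sampleMetrics);
```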
GetCrawlers
$result = $client->getCrawlers([/* ... */]);
$promise = $client->getCrawlersAsync([/* ... */]);
Retrieves metadata for all crawlers defined in the customer account.
Parameter Syntax
$result = $client->getCrawlers([ 'MaxResults' => <integer>, 'NextToken' => '<string>', ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The number of crawlers to return on each call.
- NextToken
-
- Type: string
A continuation token, if this is a continuation request.
Result Syntax
[ 'Crawlers' => [ [ 'Classifiers' => ['<string>', ...], 'Configuration' => '<string>', 'CrawlElapsedTime' => <integer>, 'CrawlerSecurityConfiguration' => '<string>', 'CreationTime' => <DateTime>, 'DatabaseName' => '<string>', 'Description' => '<string>', 'LakeFormationConfiguration' => [ 'AccountId' => '<string>', 'UseLakeFormationCredentials' => true || false, ], 'LastCrawl' => [ 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'MessagePrefix' => '<string>', 'StartTime' => <DateTime>, 'Status' => 'SUCCEEDED|CANCELLED|FAILED', ], 'LastUpdated' => <DateTime>, 'LineageConfiguration' => [ 'CrawlerLineageSettings' => 'ENABLE|DISABLE', ], 'Name' => '<string>', 'RecrawlPolicy' => [ 'RecrawlBehavior' => 'CRAWL_EVERYTHING|CRAWL_NEW_FOLDERS_ONLY|CRAWL_EVENT_MODE', ], 'Role' => '<string>', 'Schedule' => [ 'ScheduleExpression' => '<string>', 'State' => 'SCHEDULED|NOT_SCHEDULED|TRANSITIONING', ], 'SchemaChangePolicy' => [ 'DeleteBehavior' => 'LOG|DELETE_FROM_DATABASE|DEPRECATE_IN_DATABASE', 'UpdateBehavior' => 'LOG|UPDATE_IN_DATABASE', ], 'State' => 'READY|RUNNING|STOPPING', 'TablePrefix' => '<string>', 'Targets' => [ 'CatalogTargets' => [ [ 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Tables' => ['<string>', ...], ], // ... ], 'DeltaTargets' => [ [ 'ConnectionName' => '<string>', 'CreateNativeDeltaTable' => true || false, 'DeltaTables' => ['<string>', ...], 'WriteManifest' => true || false, ], // ... ], 'DynamoDBTargets' => [ [ 'Path' => '<string>', 'scanAll' => true || false, 'scanRate' => <float>, ], // ... ], 'HudiTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... ], 'IcebergTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... 
], 'JdbcTargets' => [ [ 'ConnectionName' => '<string>', 'EnableAdditionalMetadata' => ['<string>', ...], 'Exclusions' => ['<string>', ...], 'Path' => '<string>', ], // ... ], 'MongoDBTargets' => [ [ 'ConnectionName' => '<string>', 'Path' => '<string>', 'ScanAll' => true || false, ], // ... ], 'S3Targets' => [ [ 'ConnectionName' => '<string>', 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Exclusions' => ['<string>', ...], 'Path' => '<string>', 'SampleSize' => <integer>, ], // ... ], ], 'Version' => <integer>, ], // ... ], 'NextToken' => '<string>', ]
Result Details
Members
- Crawlers
-
- Type: Array of Crawler structures
A list of crawler metadata.
- NextToken
-
- Type: string
A continuation token, if the returned list has not reached the end of those defined in this customer account.
Errors
- OperationTimeoutException:
The operation timed out.
GetCustomEntityType
$result = $client->getCustomEntityType([/* ... */]);
$promise = $client->getCustomEntityTypeAsync([/* ... */]);
Retrieves the details of a custom pattern by specifying its name.
Parameter Syntax
$result = $client->getCustomEntityType([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the custom pattern that you want to retrieve.
Result Syntax
[ 'ContextWords' => ['<string>', ...], 'Name' => '<string>', 'RegexString' => '<string>', ]
Result Details
Members
- ContextWords
-
- Type: Array of strings
A list of context words, if any were specified when you created the custom pattern. If none of these context words is found within the vicinity of a regular expression match, the data is not detected as sensitive data.
- Name
-
- Type: string
The name of the custom pattern that you retrieved.
- RegexString
-
- Type: string
A regular expression string that is used for detecting sensitive data in a custom pattern.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
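To make the interplay between RegexString and ContextWords concrete, here is a rough local illustration of the rule described above. It is not how Glue implements detection: the function name, the 50-byte window that stands in for "vicinity", and the case-insensitive comparison are all assumptions, as are the sample pattern and words.

```php
<?php
// Illustrative only: a match counts as sensitive when a context word appears nearby.
function matchesWithContext(string $text, string $regexString, array $contextWords, int $window = 50): bool
{
    // Assumes the pattern itself contains no "~", which is used as the delimiter here.
    $found = preg_match_all('~' . $regexString . '~', $text, $matches, PREG_OFFSET_CAPTURE);
    if (!$found) {
        return false; // no match, or the pattern was invalid
    }
    foreach ($matches[0] as [$match, $offset]) {
        if ($contextWords === []) {
            return true; // no context words configured: any match counts
        }
        // Take the text around (and including) the match, measured in bytes.
        $start = max(0, $offset - $window);
        $vicinity = substr($text, $start, ($offset - $start) + strlen($match) + $window);
        foreach ($contextWords as $word) {
            if (stripos($vicinity, $word) !== false) {
                return true;
            }
        }
    }
    return false;
}

$regex = 'EMP-\d{6}';           // made-up pattern
$words = ['employee', 'badge']; // made-up context words

$withContext = matchesWithContext('Employee badge EMP-123456 issued', $regex, $words);
$withoutContext = matchesWithContext('Part number EMP-123456 shipped', $regex, $words);
```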
GetDataCatalogEncryptionSettings
$result = $client->getDataCatalogEncryptionSettings([/* ... */]);
$promise = $client->getDataCatalogEncryptionSettingsAsync([/* ... */]);
Retrieves the security configuration for a specified catalog.
Parameter Syntax
$result = $client->getDataCatalogEncryptionSettings([ 'CatalogId' => '<string>', ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog to retrieve the security configuration for. If none is provided, the Amazon Web Services account ID is used by default.
Result Syntax
[ 'DataCatalogEncryptionSettings' => [ 'ConnectionPasswordEncryption' => [ 'AwsKmsKeyId' => '<string>', 'ReturnConnectionPasswordEncrypted' => true || false, ], 'EncryptionAtRest' => [ 'CatalogEncryptionMode' => 'DISABLED|SSE-KMS|SSE-KMS-WITH-SERVICE-ROLE', 'CatalogEncryptionServiceRole' => '<string>', 'SseAwsKmsKeyId' => '<string>', ], ], ]
Result Details
Members
- DataCatalogEncryptionSettings
-
- Type: DataCatalogEncryptionSettings structure
The requested security configuration.
Errors
- InternalServiceException:
An internal service error occurred.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
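A caller often only needs a yes or no answer from these settings. The sketch below reduces the structure to two flags. `describeCatalogEncryption` is a hypothetical helper, and treating a missing CatalogEncryptionMode as DISABLED is an assumption, not documented behavior. The key ID in the sample is a placeholder.

```php
<?php
// Hypothetical helper: reduces the settings structure to two booleans.
function describeCatalogEncryption(array $settings): array
{
    // Assumption: an absent mode is treated the same as DISABLED.
    $mode = $settings['EncryptionAtRest']['CatalogEncryptionMode'] ?? 'DISABLED';
    $returnEncrypted = $settings['ConnectionPasswordEncryption']['ReturnConnectionPasswordEncrypted'] ?? false;

    return [
        'encryptedAtRest' => $mode !== 'DISABLED',
        'passwordsReturnedEncrypted' => (bool) $returnEncrypted,
    ];
}

// Hand-written sample in the shape of the 'DataCatalogEncryptionSettings' member.
$sampleSettings = [
    'ConnectionPasswordEncryption' => ['AwsKmsKeyId' => 'example-key-id', 'ReturnConnectionPasswordEncrypted' => true],
    'EncryptionAtRest' => ['CatalogEncryptionMode' => 'SSE-KMS', 'SseAwsKmsKeyId' => 'example-key-id'],
];

$flags = describeCatalogEncryption($sampleSettings);
```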
GetDataQualityModel
$result = $client->getDataQualityModel([/* ... */]);
$promise = $client->getDataQualityModelAsync([/* ... */]);
Retrieves the training status of the model along with more information (CompletedOn, StartedOn, FailureReason).
Parameter Syntax
$result = $client->getDataQualityModel([ 'ProfileId' => '<string>', // REQUIRED 'StatisticId' => '<string>', ]);
Parameter Details
Members
- ProfileId
-
- Required: Yes
- Type: string
The Profile ID.
- StatisticId
-
- Type: string
The Statistic ID.
Result Syntax
[ 'CompletedOn' => <DateTime>, 'FailureReason' => '<string>', 'StartedOn' => <DateTime>, 'Status' => 'RUNNING|SUCCEEDED|FAILED', ]
Result Details
Members
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp when the data quality model training completed.
- FailureReason
-
- Type: string
The training failure reason.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp when the data quality model training started.
- Status
-
- Type: string
The training status of the data quality model.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
GetDataQualityModelResult
$result = $client->getDataQualityModelResult([/* ... */]);
$promise = $client->getDataQualityModelResultAsync([/* ... */]);
Retrieves a statistic's predictions for a given Profile ID.
Parameter Syntax
$result = $client->getDataQualityModelResult([ 'ProfileId' => '<string>', // REQUIRED 'StatisticId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- ProfileId
-
- Required: Yes
- Type: string
The Profile ID.
- StatisticId
-
- Required: Yes
- Type: string
The Statistic ID.
Result Syntax
[ 'CompletedOn' => <DateTime>, 'Model' => [ [ 'ActualValue' => <float>, 'Date' => <DateTime>, 'InclusionAnnotation' => 'INCLUDE|EXCLUDE', 'LowerBound' => <float>, 'PredictedValue' => <float>, 'UpperBound' => <float>, ], // ... ], ]
Result Details
Members
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp when the data quality model training completed.
- Model
-
- Type: Array of StatisticModelResult structures
A list of StatisticModelResult structures.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
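Each Model entry pairs an ActualValue with a LowerBound and an UpperBound, so one natural use is to list the points that fall outside their bounds. A minimal sketch follows. `pointsOutsideBounds` is a hypothetical helper, the sample numbers are invented, and entries that lack any of the three values are skipped by assumption.

```php
<?php
// Hypothetical helper: returns the entries whose ActualValue lies outside [LowerBound, UpperBound].
function pointsOutsideBounds(array $model): array
{
    $outliers = [];
    foreach ($model as $point) {
        if (!isset($point['ActualValue'], $point['LowerBound'], $point['UpperBound'])) {
            continue; // not enough information to compare
        }
        if ($point['ActualValue'] < $point['LowerBound'] || $point['ActualValue'] > $point['UpperBound']) {
            $outliers[] = $point;
        }
    }
    return $outliers;
}

// Hand-written sample in the shape of the 'Model' member (dates omitted for brevity).
$sampleModel = [
    ['ActualValue' => 100.0, 'PredictedValue' => 98.0, 'LowerBound' => 90.0, 'UpperBound' => 110.0],
    ['ActualValue' => 150.0, 'PredictedValue' => 99.0, 'LowerBound' => 90.0, 'UpperBound' => 110.0],
    ['PredictedValue' => 101.0, 'LowerBound' => 92.0, 'UpperBound' => 112.0],
];

$outliers = pointsOutsideBounds($sampleModel);
```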
GetDataQualityResult
$result = $client->getDataQualityResult([/* ... */]);
$promise = $client->getDataQualityResultAsync([/* ... */]);
Retrieves the result of a data quality rule evaluation.
Parameter Syntax
$result = $client->getDataQualityResult([ 'ResultId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- ResultId
-
- Required: Yes
- Type: string
A unique result ID for the data quality result.
Result Syntax
[ 'AnalyzerResults' => [ [ 'Description' => '<string>', 'EvaluatedMetrics' => [<float>, ...], 'EvaluationMessage' => '<string>', 'Name' => '<string>', ], // ... ], 'CompletedOn' => <DateTime>, 'DataSource' => [ 'GlueTable' => [ 'AdditionalOptions' => ['<string>', ...], 'CatalogId' => '<string>', 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], ], 'EvaluationContext' => '<string>', 'JobName' => '<string>', 'JobRunId' => '<string>', 'Observations' => [ [ 'Description' => '<string>', 'MetricBasedObservation' => [ 'MetricName' => '<string>', 'MetricValues' => [ 'ActualValue' => <float>, 'ExpectedValue' => <float>, 'LowerLimit' => <float>, 'UpperLimit' => <float>, ], 'NewRules' => ['<string>', ...], 'StatisticId' => '<string>', ], ], // ... ], 'ProfileId' => '<string>', 'ResultId' => '<string>', 'RuleResults' => [ [ 'Description' => '<string>', 'EvaluatedMetrics' => [<float>, ...], 'EvaluatedRule' => '<string>', 'EvaluationMessage' => '<string>', 'Name' => '<string>', 'Result' => 'PASS|FAIL|ERROR', ], // ... ], 'RulesetEvaluationRunId' => '<string>', 'RulesetName' => '<string>', 'Score' => <float>, 'StartedOn' => <DateTime>, ]
Result Details
Members
- AnalyzerResults
-
- Type: Array of DataQualityAnalyzerResult structures
A list of DataQualityAnalyzerResult objects representing the results for each analyzer.
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the run for this data quality result was completed.
- DataSource
-
- Type: DataSource structure
The table associated with the data quality result, if any.
- EvaluationContext
-
- Type: string
In a Glue Studio job, each node in the canvas is typically assigned a name, and data quality nodes have names as well. When a job has multiple nodes, the evaluationContext can differentiate them.
- JobName
-
- Type: string
The job name associated with the data quality result, if any.
- JobRunId
-
- Type: string
The job run ID associated with the data quality result, if any.
- Observations
-
- Type: Array of DataQualityObservation structures
A list of DataQualityObservation objects representing the observations generated after evaluating the rules and analyzers.
- ProfileId
-
- Type: string
The Profile ID for the data quality result.
- ResultId
-
- Type: string
A unique result ID for the data quality result.
- RuleResults
-
- Type: Array of DataQualityRuleResult structures
A list of DataQualityRuleResult objects representing the results for each rule.
- RulesetEvaluationRunId
-
- Type: string
The unique run ID associated with the ruleset evaluation.
- RulesetName
-
- Type: string
The name of the ruleset associated with the data quality result.
- Score
-
- Type: double
An aggregate data quality score. Represents the ratio of rules that passed to the total number of rules.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the run for this data quality result started.
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- EntityNotFoundException:
A specified entity does not exist.
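The Score member is described as the ratio of rules that passed to the total number of rules. The sketch below recomputes that ratio from RuleResults as a sanity check. `passRatio` is a hypothetical helper, and whether the service counts ERROR results in exactly this way is an assumption, so the value may not always equal the returned Score.

```php
<?php
// Hypothetical helper: passed rules divided by all rules, or null when there are no rules.
function passRatio(array $ruleResults): ?float
{
    $total = count($ruleResults);
    if ($total === 0) {
        return null; // avoid dividing by zero
    }
    $passed = 0;
    foreach ($ruleResults as $rule) {
        if (($rule['Result'] ?? '') === 'PASS') {
            $passed++;
        }
    }
    return $passed / $total;
}

// Hand-written sample in the shape of the 'RuleResults' member.
$sampleRules = [
    ['Name' => 'Rule_1', 'Result' => 'PASS'],
    ['Name' => 'Rule_2', 'Result' => 'PASS'],
    ['Name' => 'Rule_3', 'Result' => 'FAIL'],
    ['Name' => 'Rule_4', 'Result' => 'PASS'],
];

$ratio = passRatio($sampleRules);
```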
GetDataQualityRuleRecommendationRun
$result = $client->getDataQualityRuleRecommendationRun([/* ... */]);
$promise = $client->getDataQualityRuleRecommendationRunAsync([/* ... */]);
Gets the specified recommendation run that was used to generate rules.
Parameter Syntax
$result = $client->getDataQualityRuleRecommendationRun([ 'RunId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- RunId
-
- Required: Yes
- Type: string
The unique run identifier associated with this run.
Result Syntax
[ 'CompletedOn' => <DateTime>, 'CreatedRulesetName' => '<string>', 'DataQualitySecurityConfiguration' => '<string>', 'DataSource' => [ 'GlueTable' => [ 'AdditionalOptions' => ['<string>', ...], 'CatalogId' => '<string>', 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], ], 'ErrorString' => '<string>', 'ExecutionTime' => <integer>, 'LastModifiedOn' => <DateTime>, 'NumberOfWorkers' => <integer>, 'RecommendedRuleset' => '<string>', 'Role' => '<string>', 'RunId' => '<string>', 'StartedOn' => <DateTime>, 'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT', 'Timeout' => <integer>, ]
Result Details
Members
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when this run was completed.
- CreatedRulesetName
-
- Type: string
The name of the ruleset that was created by the run.
- DataQualitySecurityConfiguration
-
- Type: string
The name of the security configuration created with the data quality encryption option.
- DataSource
-
- Type: DataSource structure
The data source (a Glue table) associated with this run.
- ErrorString
-
- Type: string
The error strings that are associated with the run.
- ExecutionTime
-
- Type: int
The amount of time (in seconds) that the run consumed resources.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
A timestamp. The last point in time when this data quality rule recommendation run was modified.
- NumberOfWorkers
-
- Type: int
The number of G.1X workers to be used in the run. The default is 5.
- RecommendedRuleset
-
- Type: string
When a start rule recommendation run completes, it creates a recommended ruleset (a set of rules). This member has those rules in Data Quality Definition Language (DQDL) format.
- Role
-
- Type: string
An IAM role supplied to encrypt the results of the run.
- RunId
-
- Type: string
The unique run identifier associated with this run.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when this run started.
- Status
-
- Type: string
The status for this run.
- Timeout
-
- Type: int
The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
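Note that Timeout is expressed in minutes while ExecutionTime is expressed in seconds, so the two must be converted to a common unit before they are compared. The following sketch estimates how much time a run has left. `remainingSecondsBeforeTimeout` is a hypothetical helper, and comparing the two members directly is an approximation rather than documented service behavior.

```php
<?php
// Hypothetical helper: seconds left before the run would reach its timeout, never negative.
function remainingSecondsBeforeTimeout(array $run): ?int
{
    if (!isset($run['Timeout'], $run['ExecutionTime'])) {
        return null; // one of the two values is missing
    }
    // Timeout is in minutes, ExecutionTime is in seconds.
    return max(0, $run['Timeout'] * 60 - $run['ExecutionTime']);
}

// Hand-written sample: the default timeout of 2,880 minutes and one hour of execution.
$sampleRun = ['Status' => 'RUNNING', 'Timeout' => 2880, 'ExecutionTime' => 3600];

$remaining = remainingSecondsBeforeTimeout($sampleRun);
```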
GetDataQualityRuleset
$result = $client->getDataQualityRuleset([/* ... */]);
$promise = $client->getDataQualityRulesetAsync([/* ... */]);
Returns an existing ruleset by identifier or name.
Parameter Syntax
$result = $client->getDataQualityRuleset([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the ruleset.
Result Syntax
[
    'CreatedOn' => <DateTime>,
    'DataQualitySecurityConfiguration' => '<string>',
    'Description' => '<string>',
    'LastModifiedOn' => <DateTime>,
    'Name' => '<string>',
    'RecommendationRunId' => '<string>',
    'Ruleset' => '<string>',
    'TargetTable' => [
        'CatalogId' => '<string>',
        'DatabaseName' => '<string>',
        'TableName' => '<string>',
    ],
]
Result Details
Members
- CreatedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
A timestamp. The time and date that this data quality ruleset was created.
- DataQualitySecurityConfiguration
-
- Type: string
The name of the security configuration created with the data quality encryption option.
- Description
-
- Type: string
A description of the ruleset.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
A timestamp. The last point in time when this data quality ruleset was modified.
- Name
-
- Type: string
The name of the ruleset.
- RecommendationRunId
-
- Type: string
When a ruleset was created from a recommendation run, this run ID is generated to link the two together.
- Ruleset
-
- Type: string
A Data Quality Definition Language (DQDL) ruleset. For more information, see the Glue developer guide.
- TargetTable
-
- Type: DataQualityTargetTable structure
The name and database name of the target table.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
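As a rough illustration of reading the GetDataQualityRuleset result, the sketch below condenses it into a few fields. It works on a plain array shaped like the result syntax above. The helper name `summarizeRuleset` and the sample values are hypothetical, and the handling of a missing TargetTable or RecommendationRunId is an assumption, since this page does not say when those members are omitted.

```php
<?php
// Hypothetical helper; not part of the AWS SDK for PHP.
function summarizeRuleset(array $result): array
{
    // Assumption: TargetTable may be absent from the result.
    $target = $result['TargetTable'] ?? null;

    return [
        'name' => $result['Name'] ?? null,
        // "database.table", or null when no target table is present.
        'table' => $target !== null
            ? $target['DatabaseName'] . '.' . $target['TableName']
            : null,
        // RecommendationRunId links a ruleset to the recommendation run that created it.
        'fromRecommendationRun' => isset($result['RecommendationRunId']),
        // The rules themselves, as a DQDL string.
        'dqdl' => $result['Ruleset'] ?? '',
    ];
}

// Sample data shaped like the result syntax above (values are made up).
$summary = summarizeRuleset([
    'Name' => 'orders-checks',
    'Ruleset' => 'Rules = [ IsComplete "order_id" ]',
    'TargetTable' => ['DatabaseName' => 'sales', 'TableName' => 'orders'],
]);

echo $summary['table'], PHP_EOL; // sales.orders
```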
GetDataQualityRulesetEvaluationRun
$result = $client->getDataQualityRulesetEvaluationRun([/* ... */]);
$promise = $client->getDataQualityRulesetEvaluationRunAsync([/* ... */]);
Retrieves a specific run where a ruleset is evaluated against a data source.
Parameter Syntax
$result = $client->getDataQualityRulesetEvaluationRun([
    'RunId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- RunId
-
- Required: Yes
- Type: string
The unique run identifier associated with this run.
Result Syntax
[
    'AdditionalDataSources' => [
        '<NameString>' => [
            'GlueTable' => [
                'AdditionalOptions' => ['<string>', ...],
                'CatalogId' => '<string>',
                'ConnectionName' => '<string>',
                'DatabaseName' => '<string>',
                'TableName' => '<string>',
            ],
        ],
        // ...
    ],
    'AdditionalRunOptions' => [
        'CloudWatchMetricsEnabled' => true || false,
        'CompositeRuleEvaluationMethod' => 'COLUMN|ROW',
        'ResultsS3Prefix' => '<string>',
    ],
    'CompletedOn' => <DateTime>,
    'DataSource' => [
        'GlueTable' => [
            'AdditionalOptions' => ['<string>', ...],
            'CatalogId' => '<string>',
            'ConnectionName' => '<string>',
            'DatabaseName' => '<string>',
            'TableName' => '<string>',
        ],
    ],
    'ErrorString' => '<string>',
    'ExecutionTime' => <integer>,
    'LastModifiedOn' => <DateTime>,
    'NumberOfWorkers' => <integer>,
    'ResultIds' => ['<string>', ...],
    'Role' => '<string>',
    'RulesetNames' => ['<string>', ...],
    'RunId' => '<string>',
    'StartedOn' => <DateTime>,
    'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT',
    'Timeout' => <integer>,
]
Result Details
Members
- AdditionalDataSources
-
- Type: Associative array of custom string keys (NameString) to DataSource structures
A map of reference strings to additional data sources you can specify for an evaluation run.
- AdditionalRunOptions
-
- Type: DataQualityEvaluationRunAdditionalRunOptions structure
Additional run options you can specify for an evaluation run.
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when this run was completed.
- DataSource
-
- Type: DataSource structure
The data source (a Glue table) associated with this evaluation run.
- ErrorString
-
- Type: string
The error strings that are associated with the run.
- ExecutionTime
-
- Type: int
The amount of time (in seconds) that the run consumed resources.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
A timestamp. The last point in time when this data quality rule recommendation run was modified.
- NumberOfWorkers
-
- Type: int
The number of G.1X workers to be used in the run. The default is 5.
- ResultIds
-
- Type: Array of strings
A list of result IDs for the data quality results for the run.
- Role
-
- Type: string
An IAM role supplied to encrypt the results of the run.
- RulesetNames
-
- Type: Array of strings
A list of ruleset names for the run. Currently, this parameter takes only one Ruleset name.
- RunId
-
- Type: string
The unique run identifier associated with this run.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when this run started.
- Status
-
- Type: string
The status for this run.
- Timeout
-
- Type: int
The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
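Because an evaluation run moves through several Status values before ResultIds become useful, callers often poll until the run settles. The sketch below shows one way to do that against any callable that returns a run array. The helper name `waitForRun`, its parameters, and the fake service are hypothetical, and treating STOPPED, SUCCEEDED, FAILED, and TIMEOUT as final is an assumption drawn from the status names rather than something this page states.

```php
<?php
// Hypothetical helper; not part of the AWS SDK for PHP.
// $getRun is any callable that takes a run ID and returns an array shaped
// like the result syntax above.
function waitForRun(callable $getRun, string $runId, int $maxPolls = 60, int $sleepSeconds = 10): array
{
    // Assumption: a run in one of these statuses will not change state again.
    $terminal = ['STOPPED', 'SUCCEEDED', 'FAILED', 'TIMEOUT'];

    for ($poll = 1; $poll <= $maxPolls; $poll++) {
        $run = $getRun($runId);
        if (in_array($run['Status'], $terminal, true)) {
            return $run;
        }
        // Wait between polls, but not after the last one.
        if ($poll < $maxPolls) {
            sleep($sleepSeconds);
        }
    }

    throw new RuntimeException("Run $runId did not finish after $maxPolls polls");
}

// Stand-in for the service: reports RUNNING twice, then SUCCEEDED.
$calls = 0;
$fakeGetRun = function (string $runId) use (&$calls): array {
    $calls++;
    return [
        'RunId' => $runId,
        'Status' => $calls < 3 ? 'RUNNING' : 'SUCCEEDED',
        'ResultIds' => $calls < 3 ? [] : ['result-1'],
    ];
};

$run = waitForRun($fakeGetRun, 'run-123', 5, 0);
echo $run['Status'], PHP_EOL; // SUCCEEDED
```

With a real client, the callable could be something like `fn (string $id) => $client->getDataQualityRulesetEvaluationRun(['RunId' => $id])->toArray()`. The poll count and interval are arbitrary here and would need tuning against the run's Timeout.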
GetDatabase
$result = $client->getDatabase([/* ... */]);
$promise = $client->getDatabaseAsync([/* ... */]);
Retrieves the definition of a specified database.
Parameter Syntax
$result = $client->getDatabase([
    'CatalogId' => '<string>',
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the database resides. If none is provided, the Amazon Web Services account ID is used by default.
- Name
-
- Required: Yes
- Type: string
The name of the database to retrieve. For Hive compatibility, this should be all lowercase.
Result Syntax
[
    'Database' => [
        'CatalogId' => '<string>',
        'CreateTableDefaultPermissions' => [
            [
                'Permissions' => ['<string>', ...],
                'Principal' => [
                    'DataLakePrincipalIdentifier' => '<string>',
                ],
            ],
            // ...
        ],
        'CreateTime' => <DateTime>,
        'Description' => '<string>',
        'FederatedDatabase' => [
            'ConnectionName' => '<string>',
            'Identifier' => '<string>',
        ],
        'LocationUri' => '<string>',
        'Name' => '<string>',
        'Parameters' => ['<string>', ...],
        'TargetDatabase' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'Region' => '<string>',
        ],
    ],
]
Result Details
Members
- Database
-
- Type: Database structure
The definition of the specified database in the Data Catalog.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
- FederationSourceException:
A federation source failed.
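The GetDatabase parameter notes carry two small rules: Name should be all lowercase for Hive compatibility, and CatalogId can be left out so that the account ID is used. A minimal sketch of a request builder that applies both follows. The helper name `buildGetDatabaseParams` is hypothetical, and lowercasing the name on the caller's behalf is an assumption that only makes sense if the database was created under a lowercase name.

```php
<?php
// Hypothetical helper; not part of the AWS SDK for PHP.
function buildGetDatabaseParams(string $name, ?string $catalogId = null): array
{
    // The parameter notes recommend an all-lowercase name for Hive compatibility.
    $params = ['Name' => strtolower($name)];

    // Leave CatalogId out entirely so the account ID default applies.
    if ($catalogId !== null) {
        $params['CatalogId'] = $catalogId;
    }

    return $params;
}

$params = buildGetDatabaseParams('Sales_DB');
echo $params['Name'], PHP_EOL; // sales_db
```

The resulting array could then be passed to `$client->getDatabase($params)`.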
GetDatabases
$result = $client->getDatabases([/* ... */]);
$promise = $client->getDatabasesAsync([/* ... */]);
Retrieves all databases defined in a given Data Catalog.
Parameter Syntax
$result = $client->getDatabases([
    'AttributesToGet' => ['<string>', ...],
    'CatalogId' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'ResourceShareType' => 'FOREIGN|ALL|FEDERATED',
]);
Parameter Details
Members
- AttributesToGet
-
- Type: Array of strings
Specifies the database fields returned by the GetDatabases call. This parameter doesn’t accept an empty list. The request must include the NAME.
- CatalogId
-
- Type: string
The ID of the Data Catalog from which to retrieve Databases. If none is provided, the Amazon Web Services account ID is used by default.
- MaxResults
-
- Type: int
The maximum number of databases to return in one response.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
- ResourceShareType
-
- Type: string
Allows you to specify that you want to list the databases shared with your account. The allowable values are FEDERATED, FOREIGN, or ALL.
  - If set to FEDERATED, lists the federated databases (referencing an external entity) shared with your account.
  - If set to FOREIGN, lists the databases shared with your account.
  - If set to ALL, lists the databases shared with your account, as well as the databases in your local account.
Result Syntax
[
    'DatabaseList' => [
        [
            'CatalogId' => '<string>',
            'CreateTableDefaultPermissions' => [
                [
                    'Permissions' => ['<string>', ...],
                    'Principal' => [
                        'DataLakePrincipalIdentifier' => '<string>',
                    ],
                ],
                // ...
            ],
            'CreateTime' => <DateTime>,
            'Description' => '<string>',
            'FederatedDatabase' => [
                'ConnectionName' => '<string>',
                'Identifier' => '<string>',
            ],
            'LocationUri' => '<string>',
            'Name' => '<string>',
            'Parameters' => ['<string>', ...],
            'TargetDatabase' => [
                'CatalogId' => '<string>',
                'DatabaseName' => '<string>',
                'Region' => '<string>',
            ],
        ],
        // ...
    ],
    'NextToken' => '<string>',
]
Result Details
Members
- DatabaseList
-
- Required: Yes
- Type: Array of Database structures
A list of Database objects from the specified catalog.
- NextToken
-
- Type: string
A continuation token for paginating the returned list of tokens, returned if the current segment of the list is not the last.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
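GetDatabases returns results in pages: when a response carries a NextToken, the caller passes it back in the next request to continue. The sketch below walks every page through a generator. The helper name `iterateDatabases` and the fake page data are hypothetical; the callable stands in for a call to the client so the example runs without the SDK.

```php
<?php
// Hypothetical helper; not part of the AWS SDK for PHP.
// $getDatabases is any callable that accepts the request parameters and
// returns an array shaped like the result syntax above.
function iterateDatabases(callable $getDatabases, array $params = []): Generator
{
    do {
        $page = $getDatabases($params);

        foreach ($page['DatabaseList'] as $database) {
            yield $database;
        }

        // A NextToken in the response means more pages remain.
        $nextToken = $page['NextToken'] ?? null;
        if ($nextToken !== null) {
            $params['NextToken'] = $nextToken;
        }
    } while ($nextToken !== null);
}

// Stand-in for the service: two pages keyed by the token that requests them.
$pages = [
    '' => ['DatabaseList' => [['Name' => 'sales'], ['Name' => 'hr']], 'NextToken' => 'page-2'],
    'page-2' => ['DatabaseList' => [['Name' => 'ops']]],
];
$fakeGetDatabases = function (array $params) use ($pages): array {
    return $pages[$params['NextToken'] ?? ''];
};

$names = [];
foreach (iterateDatabases($fakeGetDatabases, ['MaxResults' => 2]) as $database) {
    $names[] = $database['Name'];
}

echo implode(', ', $names), PHP_EOL; // sales, hr, ops
```

With a real client, the callable could be `fn (array $p) => $client->getDatabases($p)->toArray()`. The SDK also provides a generic `getPaginator()` method on clients, which may cover this operation and would remove the need for a hand-written loop.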
GetDataflowGraph
$result = $client->getDataflowGraph([/* ... */]);
$promise = $client->getDataflowGraphAsync([/* ... */]);
Transforms a Python script into a directed acyclic graph (DAG).
Parameter Syntax
$result = $client->getDataflowGraph([
    'PythonScript' => '<string>',
]);
Parameter Details
Members
- PythonScript
-
- Type: string
The Python script to transform.
Result Syntax
[
    'DagEdges' => [
        [
            'Source' => '<string>',
            'Target' => '<string>',
            'TargetParameter' => '<string>',
        ],
        // ...
    ],
    'DagNodes' => [
        [
            'Args' => [
                [
                    'Name' => '<string>',
                    'Param' => true || false,
                    'Value' => '<string>',
                ],
                // ...
            ],
            'Id' => '<string>',
            'LineNumber' => <integer>,
            'NodeType' => '<string>',
        ],
        // ...
    ],
]
Result Details
Members
- DagEdges
-
- Type: Array of CodeGenEdge structures
A list of the edges in the resulting DAG.
- DagNodes
-
- Type: Array of CodeGenNode structures
A list of the nodes in the resulting DAG.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
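The DagNodes and DagEdges lists returned by GetDataflowGraph describe a graph as two flat arrays. A common first step is to turn them into an adjacency list. The sketch below does that on a plain array shaped like the result syntax above. The helper name `buildAdjacencyList` and the sample graph are hypothetical, and it assumes that each edge's Source and Target hold node Id values, which this page does not spell out.

```php
<?php
// Hypothetical helper; not part of the AWS SDK for PHP.
// Maps each node Id to the Ids of the nodes its outgoing edges point to.
// Assumption: 'Source' and 'Target' on an edge are node Ids.
function buildAdjacencyList(array $graph): array
{
    $adjacency = [];

    // Start every node with no outgoing edges so isolated nodes still appear.
    foreach ($graph['DagNodes'] ?? [] as $node) {
        $adjacency[$node['Id']] = [];
    }

    foreach ($graph['DagEdges'] ?? [] as $edge) {
        $adjacency[$edge['Source']][] = $edge['Target'];
    }

    return $adjacency;
}

// Sample data shaped like the result syntax above (values are made up).
$graph = [
    'DagNodes' => [
        ['Id' => 'source', 'NodeType' => 'DataSource', 'Args' => [], 'LineNumber' => 10],
        ['Id' => 'mapping', 'NodeType' => 'ApplyMapping', 'Args' => [], 'LineNumber' => 14],
        ['Id' => 'sink', 'NodeType' => 'DataSink', 'Args' => [], 'LineNumber' => 20],
    ],
    'DagEdges' => [
        ['Source' => 'source', 'Target' => 'mapping'],
        ['Source' => 'mapping', 'Target' => 'sink'],
    ],
];

$adjacency = buildAdjacencyList($graph);
foreach ($adjacency as $id => $targets) {
    echo $id, ' -> ', implode(', ', $targets) ?: '(none)', PHP_EOL;
}
```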
GetDevEndpoint
$result = $client->getDevEndpoint([/* ... */]);
$promise = $client->getDevEndpointAsync([/* ... */]);
Retrieves information about a specified development endpoint.
When you create a development endpoint in a virtual private cloud (VPC), Glue returns only a private IP address, and the public IP address field is not populated. When you create a non-VPC development endpoint, Glue returns only a public IP address.
Parameter Syntax
$result = $client->getDevEndpoint([
    'EndpointName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- EndpointName
-
- Required: Yes
- Type: string
Name of the DevEndpoint to retrieve information for.
Result Syntax
[
    'DevEndpoint' => [
        'Arguments' => ['<string>', ...],
        'AvailabilityZone' => '<string>',
        'CreatedTimestamp' => <DateTime>,
        'EndpointName' => '<string>',
        'ExtraJarsS3Path' => '<string>',
        'ExtraPythonLibsS3Path' => '<string>',
        'FailureReason' => '<string>',
        'GlueVersion' => '<string>',
        'LastModifiedTimestamp' => <DateTime>,
        'LastUpdateStatus' => '<string>',
        'NumberOfNodes' => <integer>,
        'NumberOfWorkers' => <integer>,
        'PrivateAddress' => '<string>',
        'PublicAddress' => '<string>',
        'PublicKey' => '<string>',
        'PublicKeys' => ['<string>', ...],
        'RoleArn' => '<string>',
        'SecurityConfiguration' => '<string>',
        'SecurityGroupIds' => ['<string>', ...],
        'Status' => '<string>',
        'SubnetId' => '<string>',
        'VpcId' => '<string>',
        'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
        'YarnEndpointAddress' => '<string>',
        'ZeppelinRemoteSparkInterpreterPort' => <integer>,
    ],
]
Result Details
Members
- DevEndpoint
-
- Type: DevEndpoint structure
A DevEndpoint definition.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
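As noted in the GetDevEndpoint description, a VPC endpoint reports only a private address and a non-VPC endpoint reports only a public one. A caller that just needs "the address" can pick whichever is populated. The sketch below does that on a plain array shaped like the DevEndpoint member above. The helper name `devEndpointAddress` and the sample addresses are hypothetical.

```php
<?php
// Hypothetical helper; not part of the AWS SDK for PHP.
// Returns whichever address is populated: VPC endpoints report only a
// private address, non-VPC endpoints only a public one.
function devEndpointAddress(array $devEndpoint): ?string
{
    foreach (['PrivateAddress', 'PublicAddress'] as $field) {
        if (!empty($devEndpoint[$field])) {
            return $devEndpoint[$field];
        }
    }

    // Assumption: neither field is set, for example while provisioning.
    return null;
}

// Sample data shaped like the DevEndpoint member above (values are made up).
$inVpc = ['EndpointName' => 'dev-a', 'VpcId' => 'vpc-0example', 'PrivateAddress' => '10.0.0.12'];
$noVpc = ['EndpointName' => 'dev-b', 'PublicAddress' => '198.51.100.7'];

echo devEndpointAddress($inVpc), PHP_EOL; // 10.0.0.12
echo devEndpointAddress($noVpc), PHP_EOL; // 198.51.100.7
```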
GetDevEndpoints
$result = $client->getDevEndpoints([/* ... */]);
$promise = $client->getDevEndpointsAsync([/* ... */]);
Retrieves all the development endpoints in this Amazon Web Services account.
When you create a development endpoint in a virtual private cloud (VPC), Glue returns only a private IP address and the public IP address field is not populated. When you create a non-VPC development endpoint, Glue returns only a public IP address.
Parameter Syntax
$result = $client->getDevEndpoints([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum size of information to return.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
Result Syntax
[
    'DevEndpoints' => [
        [
            'Arguments' => ['<string>', ...],
            'AvailabilityZone' => '<string>',
            'CreatedTimestamp' => <DateTime>,
            'EndpointName' => '<string>',
            'ExtraJarsS3Path' => '<string>',
            'ExtraPythonLibsS3Path' => '<string>',
            'FailureReason' => '<string>',
            'GlueVersion' => '<string>',
            'LastModifiedTimestamp' => <DateTime>,
            'LastUpdateStatus' => '<string>',
            'NumberOfNodes' => <integer>,
            'NumberOfWorkers' => <integer>,
            'PrivateAddress' => '<string>',
            'PublicAddress' => '<string>',
            'PublicKey' => '<string>',
            'PublicKeys' => ['<string>', ...],
            'RoleArn' => '<string>',
            'SecurityConfiguration' => '<string>',
            'SecurityGroupIds' => ['<string>', ...],
            'Status' => '<string>',
            'SubnetId' => '<string>',
            'VpcId' => '<string>',
            'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
            'YarnEndpointAddress' => '<string>',
            'ZeppelinRemoteSparkInterpreterPort' => <integer>,
        ],
        // ...
    ],
    'NextToken' => '<string>',
]
Result Details
Members
- DevEndpoints
-
- Type: Array of DevEndpoint structures
A list of DevEndpoint definitions.
- NextToken
-
- Type: string
A continuation token, if not all DevEndpoint definitions have yet been returned.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
GetJob
$result = $client->getJob([/* ... */]);
$promise = $client->getJobAsync([/* ... */]);
Retrieves an existing job definition.
Parameter Syntax
$result = $client->getJob([
    'JobName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- JobName
-
- Required: Yes
- Type: string
The name of the job definition to retrieve.
Result Syntax
[ 'Job' => [ 'AllocatedCapacity' => <integer>, 'CodeGenConfigurationNodes' => [ '<NodeId>' => [ 'Aggregate' => [ 'Aggs' => [ [ 'AggFunc' => 'avg|countDistinct|count|first|last|kurtosis|max|min|skewness|stddev_samp|stddev_pop|sum|sumDistinct|var_samp|var_pop', 'Column' => ['<string>', ...], ], // ... ], 'Groups' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'AmazonRedshiftSource' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', ], 'AmazonRedshiftTarget' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'ApplyMapping' => [ 'Inputs' => ['<string>', ...], 'Mapping' => [ [ 'Children' => [...], // RECURSIVE 'Dropped' => true || false, 'FromPath' => ['<string>', ...], 'FromType' => '<string>', 'ToKey' => '<string>', 'ToType' => '<string>', ], // ... 
], 'Name' => '<string>', ], 'AthenaConnectorSource' => [ 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'SchemaName' => '<string>', ], 'CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'CatalogKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'Table' => '<string>', 'WindowSize' => <integer>, ], 'CatalogKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' 
=> true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <DateTime>, 'StreamArn' => '<string>', 'StreamName' => '<string>', ], 'Table' => '<string>', 'WindowSize' => <integer>, ], 'CatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'CatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Table' => '<string>', ], 'ConnectorDataSource' => [ 'ConnectionType' => '<string>', 'Data' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'ConnectorDataTarget' => [ 'ConnectionType' => '<string>', 'Data' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'CustomCode' => [ 'ClassName' => '<string>', 'Code' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... 
], ], 'DirectJDBCSource' => [ 'ConnectionName' => '<string>', 'ConnectionType' => 'sqlserver|mysql|oracle|postgresql|redshift', 'Database' => '<string>', 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', ], 'DirectKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'WindowSize' => <integer>, ], 'DirectKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' => true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <DateTime>, 'StreamArn' => '<string>', 'StreamName' => 
'<string>', ], 'WindowSize' => <integer>, ], 'DropDuplicates' => [ 'Columns' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'DropFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... ], ], 'DropNullFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'NullCheckBoxList' => [ 'IsEmpty' => true || false, 'IsNegOne' => true || false, 'IsNullString' => true || false, ], 'NullTextList' => [ [ 'Datatype' => [ 'Id' => '<string>', 'Label' => '<string>', ], 'Value' => '<string>', ], // ... ], ], 'DynamicTransform' => [ 'FunctionName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Parameters' => [ [ 'IsOptional' => true || false, 'ListType' => 'str|int|float|complex|bool|list|null', 'Name' => '<string>', 'Type' => 'str|int|float|complex|bool|list|null', 'ValidationMessage' => '<string>', 'ValidationRule' => '<string>', 'Value' => ['<string>', ...], ], // ... 
], 'Path' => '<string>', 'TransformName' => '<string>', 'Version' => '<string>', ], 'DynamoDBCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'EvaluateDataQuality' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Output' => 'PrimaryInput|EvaluationResults', 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'EvaluateDataQualityMultiFrame' => [ 'AdditionalDataSources' => ['<string>', ...], 'AdditionalOptions' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'FillMissingValues' => [ 'FilledPath' => '<string>', 'ImputedPath' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'Filter' => [ 'Filters' => [ [ 'Negated' => true || false, 'Operation' => 'EQ|LT|GT|LTE|GTE|REGEX|ISNULL', 'Values' => [ [ 'Type' => 'COLUMNEXTRACTED|CONSTANT', 'Value' => ['<string>', ...], ], // ... ], ], // ... ], 'Inputs' => ['<string>', ...], 'LogicalOperator' => 'AND|OR', 'Name' => '<string>', ], 'GovernedCatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', 'Name' => '<string>', 'PartitionPredicate' => '<string>', 'Table' => '<string>', ], 'GovernedCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'JDBCConnectorSource' => [ 'AdditionalOptions' => [ 'DataTypeMapping' => ['<string>', ...], 'FilterPredicate' => '<string>', 'JobBookmarkKeys' => ['<string>', ...], 'JobBookmarkKeysSortOrder' => '<string>', 'LowerBound' => <integer>, 'NumPartitions' => <integer>, 'PartitionColumn' => '<string>', 'UpperBound' => <integer>, ], 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Query' => '<string>', ], 'JDBCConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'Join' => [ 'Columns' => [ [ 'From' => '<string>', 'Keys' => [ ['<string>', ...], // ... ], ], // ... ], 'Inputs' => ['<string>', ...], 'JoinType' => 'equijoin|left|right|outer|leftsemi|leftanti', 'Name' => '<string>', ], 'Merge' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PrimaryKeys' => [ ['<string>', ...], // ... 
], 'Source' => '<string>', ], 'MicrosoftSQLServerCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'MicrosoftSQLServerCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'MySQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'MySQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'OracleSQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'OracleSQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'PIIDetection' => [ 'EntityTypesToDetect' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'MaskValue' => '<string>', 'Name' => '<string>', 'OutputColumnName' => '<string>', 'PiiType' => 'RowAudit|RowMasking|ColumnAudit|ColumnMasking', 'SampleFraction' => <float>, 'ThresholdFraction' => <float>, ], 'PostgreSQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'PostgreSQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'Recipe' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'RecipeReference' => [ 'RecipeArn' => '<string>', 'RecipeVersion' => '<string>', ], 'RecipeSteps' => [ [ 'Action' => [ 'Operation' => '<string>', 'Parameters' => ['<string>', ...], ], 'ConditionExpressions' => [ [ 'Condition' => '<string>', 'TargetColumn' => '<string>', 'Value' => '<string>', ], // ... ], ], // ... 
], ], 'RedshiftSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', 'TmpDirIAMRole' => '<string>', ], 'RedshiftTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', 'TmpDirIAMRole' => '<string>', 'UpsertRedshiftOptions' => [ 'ConnectionName' => '<string>', 'TableLocation' => '<string>', 'UpsertKeys' => ['<string>', ...], ], ], 'RelationalCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'RenameField' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'SourcePath' => ['<string>', ...], 'TargetPath' => ['<string>', ...], ], 'S3CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'S3CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'S3CatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', 'Name' => '<string>', 'PartitionPredicate' => '<string>', 'Table' => '<string>', ], 'S3CatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3CsvSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Escaper' => '<string>', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', 'OptimizePerformance' => true || false, 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'QuoteChar' => 'quote|quillemet|single_quote|disabled', 'Recurse' => true || false, 'Separator' => 'comma|ctrla|pipe|semicolon|tab', 'SkipFirst' => true || false, 'WithHeader' => true || false, 'WriteHeader' => true || false, ], 'S3DeltaCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3DeltaDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Compression' => 'uncompressed|snappy', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3DeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], ], 'S3DirectTarget' => [ 'Compression' => '<string>', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3GlueParquetTarget' => [ 'Compression' => 'snappy|lzo|gzip|uncompressed|none', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3HudiDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Compression' => 'gzip|lzo|uncompressed|snappy', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], ], 'S3JsonSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'JsonPath' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'Recurse' => true || false, ], 'S3ParquetSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'snappy|lzo|gzip|uncompressed|none', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'Recurse' => true || false, ], 'SelectFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... 
], ], 'SelectFromCollection' => [ 'Index' => <integer>, 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'SnowflakeSource' => [ 'Data' => [ 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SnowflakeTarget' => [ 'Data' => [ 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'SparkConnectorSource' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkSQL' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'SqlAliases' => [ [ 'Alias' => '<string>', 'From' => '<string>', ], // ... ], 'SqlQuery' => '<string>', ], 'Spigot' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Path' => '<string>', 'Prob' => <float>, 'Topk' => <integer>, ], 'SplitFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... ], ], 'Union' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'UnionType' => 'ALL|DISTINCT', ], ], // ... 
], 'Command' => [ 'Name' => '<string>', 'PythonVersion' => '<string>', 'Runtime' => '<string>', 'ScriptLocation' => '<string>', ], 'Connections' => [ 'Connections' => ['<string>', ...], ], 'CreatedOn' => <DateTime>, 'DefaultArguments' => ['<string>', ...], 'Description' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionProperty' => [ 'MaxConcurrentRuns' => <integer>, ], 'GlueVersion' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobRunQueuingEnabled' => true || false, 'LastModifiedOn' => <DateTime>, 'LogUri' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'MaxRetries' => <integer>, 'Name' => '<string>', 'NonOverridableArguments' => ['<string>', ...], 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'ProfileName' => '<string>', 'Role' => '<string>', 'SecurityConfiguration' => '<string>', 'SourceControlDetails' => [ 'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER', 'AuthToken' => '<string>', 'Branch' => '<string>', 'Folder' => '<string>', 'LastCommitId' => '<string>', 'Owner' => '<string>', 'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT', 'Repository' => '<string>', ], 'Timeout' => <integer>, 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], ]
Result Details
Members
- Job
- Type: Job structure
The requested job definition.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
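The Job structure returned by GetJob can be large when the job was authored visually. As a minimal sketch, the helper below lists which node-type keys are present under each node ID in CodeGenConfigurationNodes. The function name listJobNodeTypes and the sample data are hypothetical, chosen for illustration only; the input is assumed to be the plain array found at $result['Job'].

```php
<?php
declare(strict_types=1);

/**
 * Map each node ID in a job's CodeGenConfigurationNodes to the node-type
 * keys present on that node (for example 'S3CsvSource' or 'ApplyMapping').
 * Returns an empty array when the job has no visual nodes.
 */
function listJobNodeTypes(array $job): array
{
    $summary = [];
    foreach ($job['CodeGenConfigurationNodes'] ?? [] as $nodeId => $node) {
        $summary[$nodeId] = array_keys($node);
    }
    return $summary;
}

// Hypothetical data shaped like $result['Job']; this is not real API output.
$job = [
    'Name' => 'example-job',
    'CodeGenConfigurationNodes' => [
        'node-1' => ['S3CsvSource' => ['Name' => 'read csv', 'Paths' => ['s3://example-bucket/in/']]],
        'node-2' => ['ApplyMapping' => ['Name' => 'map', 'Inputs' => ['node-1'], 'Mapping' => []]],
    ],
];

foreach (listJobNodeTypes($job) as $nodeId => $types) {
    echo $nodeId, ': ', implode(', ', $types), PHP_EOL;
}
```

With a configured client, the same helper could be applied to $client->getJob(['JobName' => '...'])['Job'].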
GetJobBookmark
$result = $client->getJobBookmark([/* ... */]);
$promise = $client->getJobBookmarkAsync([/* ... */]);
Returns information on a job bookmark entry.
For more information about enabling and using job bookmarks, see the job bookmark topics in the AWS Glue Developer Guide.
Parameter Syntax
$result = $client->getJobBookmark([
    'JobName' => '<string>', // REQUIRED
    'RunId' => '<string>',
]);
Parameter Details
Members
- JobName
- Required: Yes
- Type: string
The name of the job in question.
- RunId
- Type: string
The unique run identifier associated with this job run.
Result Syntax
[
    'JobBookmarkEntry' => [
        'Attempt' => <integer>,
        'JobBookmark' => '<string>',
        'JobName' => '<string>',
        'PreviousRunId' => '<string>',
        'Run' => <integer>,
        'RunId' => '<string>',
        'Version' => <integer>,
    ],
]
Result Details
Members
- JobBookmarkEntry
- Type: JobBookmarkEntry structure
A structure that defines a point from which a job can resume processing.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ValidationException:
A value could not be validated.
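Because RunId is optional for GetJobBookmark, it can help to build the parameter array conditionally and to read the returned JobBookmarkEntry defensively. The sketch below assumes PHP 7.4 or later; the helper names jobBookmarkParams and formatBookmarkEntry and the sample values are hypothetical, not part of the SDK.

```php
<?php
declare(strict_types=1);

/** Build GetJobBookmark parameters, omitting RunId when it is not given. */
function jobBookmarkParams(string $jobName, ?string $runId = null): array
{
    $params = ['JobName' => $jobName];
    if ($runId !== null) {
        $params['RunId'] = $runId;
    }
    return $params;
}

/**
 * Produce a one-line summary of a GetJobBookmark result.
 *
 * @param array|\ArrayAccess $result Anything indexable like the result syntax above.
 */
function formatBookmarkEntry($result): string
{
    $entry = $result['JobBookmarkEntry'] ?? null;
    if ($entry === null) {
        return 'no bookmark entry returned';
    }
    return sprintf(
        '%s: run %d, attempt %d, version %d',
        $entry['JobName'] ?? '(unknown)',
        $entry['Run'] ?? 0,
        $entry['Attempt'] ?? 0,
        $entry['Version'] ?? 0
    );
}

// Hypothetical result data for illustration; not real API output.
$result = [
    'JobBookmarkEntry' => [
        'JobName' => 'example-job',
        'Run' => 3,
        'Attempt' => 0,
        'Version' => 7,
        'RunId' => 'jr_example',
    ],
];

echo formatBookmarkEntry($result), PHP_EOL;
```

With a configured client, the call would look like $client->getJobBookmark(jobBookmarkParams('example-job')), and the returned result object could be passed to formatBookmarkEntry directly.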
GetJobRun
$result = $client->getJobRun([/* ... */]);
$promise = $client->getJobRunAsync([/* ... */]);
Retrieves the metadata for a given job run. Job run history remains accessible for 90 days for your workflows and job runs.
Parameter Syntax
$result = $client->getJobRun([
    'JobName' => '<string>', // REQUIRED
    'PredecessorsIncluded' => true || false,
    'RunId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- JobName
- Required: Yes
- Type: string
Name of the job definition being run.
- PredecessorsIncluded
- Type: boolean
True if a list of predecessor runs should be returned.
- RunId
- Required: Yes
- Type: string
The ID of the job run.
Result Syntax
[
    'JobRun' => [
        'AllocatedCapacity' => <integer>,
        'Arguments' => ['<string>', ...],
        'Attempt' => <integer>,
        'CompletedOn' => <DateTime>,
        'DPUSeconds' => <float>,
        'ErrorMessage' => '<string>',
        'ExecutionClass' => 'FLEX|STANDARD',
        'ExecutionTime' => <integer>,
        'GlueVersion' => '<string>',
        'Id' => '<string>',
        'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
        'JobName' => '<string>',
        'JobRunQueuingEnabled' => true || false,
        'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
        'LastModifiedOn' => <DateTime>,
        'LogGroupName' => '<string>',
        'MaintenanceWindow' => '<string>',
        'MaxCapacity' => <float>,
        'NotificationProperty' => [
            'NotifyDelayAfter' => <integer>,
        ],
        'NumberOfWorkers' => <integer>,
        'PredecessorRuns' => [
            [
                'JobName' => '<string>',
                'RunId' => '<string>',
            ],
            // ...
        ],
        'PreviousRunId' => '<string>',
        'ProfileName' => '<string>',
        'SecurityConfiguration' => '<string>',
        'StartedOn' => <DateTime>,
        'StateDetail' => '<string>',
        'Timeout' => <integer>,
        'TriggerName' => '<string>',
        'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
    ],
]
Result Details
Members
- JobRun
- Type: JobRun structure
The requested job-run metadata.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
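A common use of GetJobRun is polling JobRunState until a run finishes. The sketch below is one way to do that, assuming PHP 7.4 or later. It treats STARTING, RUNNING, STOPPING, and WAITING as in-progress states and every other listed value as final; that split is an assumption, not something this page states. The names isJobRunFinished and waitForJobRun are hypothetical, and the API call is passed in as a callable so the example runs against a stub instead of a real client.

```php
<?php
declare(strict_types=1);

// Assumption: these JobRunState values mean the run has not finished yet.
const IN_PROGRESS_STATES = ['STARTING', 'RUNNING', 'STOPPING', 'WAITING'];

function isJobRunFinished(string $state): bool
{
    return !in_array($state, IN_PROGRESS_STATES, true);
}

/**
 * Poll a GetJobRun-style callable until the run reaches a final state.
 * Returns the final state, or null if $maxPolls is exhausted first.
 */
function waitForJobRun(
    callable $getJobRun,
    string $jobName,
    string $runId,
    int $maxPolls = 30,
    int $delaySeconds = 10
): ?string {
    for ($attempt = 0; $attempt < $maxPolls; $attempt++) {
        $result = $getJobRun(['JobName' => $jobName, 'RunId' => $runId]);
        $state = $result['JobRun']['JobRunState'];
        if (isJobRunFinished($state)) {
            return $state;
        }
        // Skip the pause after the last allowed poll.
        if ($delaySeconds > 0 && $attempt < $maxPolls - 1) {
            sleep($delaySeconds);
        }
    }
    return null;
}

// Stub standing in for the client: returns one scripted state per call.
$states = ['STARTING', 'RUNNING', 'SUCCEEDED'];
$calls = 0;
$stub = function (array $params) use (&$states, &$calls): array {
    $calls++;
    return ['JobRun' => ['Id' => $params['RunId'], 'JobRunState' => array_shift($states)]];
};

$finalState = waitForJobRun($stub, 'example-job', 'jr_example', 5, 0);
echo $finalState, PHP_EOL;
```

With a configured client, the callable could be fn (array $p) => $client->getJobRun($p). A bounded poll count keeps the loop from running indefinitely if a run never leaves an in-progress state.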
GetJobRuns
$result = $client->getJobRuns([/* ... */]);
$promise = $client->getJobRunsAsync([/* ... */]);
Retrieves metadata for all runs of a given job definition.
Parameter Syntax
$result = $client->getJobRuns([
    'JobName' => '<string>', // REQUIRED
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- JobName
- Required: Yes
- Type: string
The name of the job definition for which to retrieve all job runs.
- MaxResults
- Type: int
The maximum number of job runs to return in a single response.
- NextToken
- Type: string
A continuation token, if this is a continuation call.
Result Syntax
[
    'JobRuns' => [
        [
            'AllocatedCapacity' => <integer>,
            'Arguments' => ['<string>', ...],
            'Attempt' => <integer>,
            'CompletedOn' => <DateTime>,
            'DPUSeconds' => <float>,
            'ErrorMessage' => '<string>',
            'ExecutionClass' => 'FLEX|STANDARD',
            'ExecutionTime' => <integer>,
            'GlueVersion' => '<string>',
            'Id' => '<string>',
            'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
            'JobName' => '<string>',
            'JobRunQueuingEnabled' => true || false,
            'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
            'LastModifiedOn' => <DateTime>,
            'LogGroupName' => '<string>',
            'MaintenanceWindow' => '<string>',
            'MaxCapacity' => <float>,
            'NotificationProperty' => [
                'NotifyDelayAfter' => <integer>,
            ],
            'NumberOfWorkers' => <integer>,
            'PredecessorRuns' => [
                [
                    'JobName' => '<string>',
                    'RunId' => '<string>',
                ],
                // ...
            ],
            'PreviousRunId' => '<string>',
            'ProfileName' => '<string>',
            'SecurityConfiguration' => '<string>',
            'StartedOn' => <DateTime>,
            'StateDetail' => '<string>',
            'Timeout' => <integer>,
            'TriggerName' => '<string>',
            'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
        ],
        // ...
    ],
    'NextToken' => '<string>',
]
Result Details
Members
- JobRuns
- Type: Array of JobRun structures
A list of job-run metadata objects.
- NextToken
- Type: string
A continuation token, if not all requested job runs have been returned.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
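Since GetJobRuns returns a NextToken whenever more runs remain, retrieving every run means repeating the call with that token until none is returned. The following sketch shows one way to wrap that loop in a generator, assuming PHP 7.4 or later. The name iterateJobRuns, the page size, and the stub data are hypothetical; the API call is injected as a callable so the example runs without a real client.

```php
<?php
declare(strict_types=1);

/**
 * Yield every job run for a job, following NextToken across pages.
 * $getJobRuns is any callable that accepts GetJobRuns parameters and
 * returns something indexable like the result syntax above.
 */
function iterateJobRuns(callable $getJobRuns, string $jobName, int $pageSize = 50): Generator
{
    $params = ['JobName' => $jobName, 'MaxResults' => $pageSize];
    do {
        $result = $getJobRuns($params);
        foreach ($result['JobRuns'] ?? [] as $run) {
            yield $run;
        }
        $token = $result['NextToken'] ?? null;
        if (!empty($token)) {
            $params['NextToken'] = $token; // continuation call
        }
    } while (!empty($token));
}

// Stub standing in for the client: two pages keyed by the incoming token.
$pages = [
    '' => [
        'JobRuns' => [
            ['Id' => 'jr_1', 'JobRunState' => 'SUCCEEDED'],
            ['Id' => 'jr_2', 'JobRunState' => 'FAILED'],
        ],
        'NextToken' => 'page-2',
    ],
    'page-2' => [
        'JobRuns' => [
            ['Id' => 'jr_3', 'JobRunState' => 'SUCCEEDED'],
        ],
    ],
];
$stub = fn (array $params): array => $pages[$params['NextToken'] ?? ''];

// Count runs per state across all pages.
$counts = [];
foreach (iterateJobRuns($stub, 'example-job') as $run) {
    $state = $run['JobRunState'];
    $counts[$state] = ($counts[$state] ?? 0) + 1;
}

foreach ($counts as $state => $count) {
    echo $state, ': ', $count, PHP_EOL;
}
```

With a configured client, the callable could be fn (array $p) => $client->getJobRuns($p). A generator keeps only one page in memory at a time, which may matter for jobs with long run histories.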
GetJobs
$result = $client->getJobs([/* ... */]);
$promise = $client->getJobsAsync([/* ... */]);
Retrieves all current job definitions.
Parameter Syntax
$result = $client->getJobs([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- MaxResults
- Type: int
The maximum number of job definitions to return in a single response.
- NextToken
- Type: string
A continuation token, if this is a continuation call.
Result Syntax
[ 'Jobs' => [ [ 'AllocatedCapacity' => <integer>, 'CodeGenConfigurationNodes' => [ '<NodeId>' => [ 'Aggregate' => [ 'Aggs' => [ [ 'AggFunc' => 'avg|countDistinct|count|first|last|kurtosis|max|min|skewness|stddev_samp|stddev_pop|sum|sumDistinct|var_samp|var_pop', 'Column' => ['<string>', ...], ], // ... ], 'Groups' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'AmazonRedshiftSource' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', ], 'AmazonRedshiftTarget' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'ApplyMapping' => [ 'Inputs' => ['<string>', ...], 'Mapping' => [ [ 'Children' => [...], // RECURSIVE 'Dropped' => true || false, 'FromPath' => ['<string>', ...], 'FromType' => '<string>', 'ToKey' => '<string>', 'ToType' => '<string>', ], // ... 
], 'Name' => '<string>', ], 'AthenaConnectorSource' => [ 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'SchemaName' => '<string>', ], 'CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'CatalogKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'Table' => '<string>', 'WindowSize' => <integer>, ], 'CatalogKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' 
=> true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <DateTime>, 'StreamArn' => '<string>', 'StreamName' => '<string>', ], 'Table' => '<string>', 'WindowSize' => <integer>, ], 'CatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'CatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Table' => '<string>', ], 'ConnectorDataSource' => [ 'ConnectionType' => '<string>', 'Data' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'ConnectorDataTarget' => [ 'ConnectionType' => '<string>', 'Data' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'CustomCode' => [ 'ClassName' => '<string>', 'Code' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... 
], ], 'DirectJDBCSource' => [ 'ConnectionName' => '<string>', 'ConnectionType' => 'sqlserver|mysql|oracle|postgresql|redshift', 'Database' => '<string>', 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', ], 'DirectKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'WindowSize' => <integer>, ], 'DirectKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' => true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <DateTime>, 'StreamArn' => '<string>', 'StreamName' => 
'<string>', ], 'WindowSize' => <integer>, ], 'DropDuplicates' => [ 'Columns' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'DropFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... ], ], 'DropNullFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'NullCheckBoxList' => [ 'IsEmpty' => true || false, 'IsNegOne' => true || false, 'IsNullString' => true || false, ], 'NullTextList' => [ [ 'Datatype' => [ 'Id' => '<string>', 'Label' => '<string>', ], 'Value' => '<string>', ], // ... ], ], 'DynamicTransform' => [ 'FunctionName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Parameters' => [ [ 'IsOptional' => true || false, 'ListType' => 'str|int|float|complex|bool|list|null', 'Name' => '<string>', 'Type' => 'str|int|float|complex|bool|list|null', 'ValidationMessage' => '<string>', 'ValidationRule' => '<string>', 'Value' => ['<string>', ...], ], // ... 
], 'Path' => '<string>', 'TransformName' => '<string>', 'Version' => '<string>', ], 'DynamoDBCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'EvaluateDataQuality' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Output' => 'PrimaryInput|EvaluationResults', 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'EvaluateDataQualityMultiFrame' => [ 'AdditionalDataSources' => ['<string>', ...], 'AdditionalOptions' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'FillMissingValues' => [ 'FilledPath' => '<string>', 'ImputedPath' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'Filter' => [ 'Filters' => [ [ 'Negated' => true || false, 'Operation' => 'EQ|LT|GT|LTE|GTE|REGEX|ISNULL', 'Values' => [ [ 'Type' => 'COLUMNEXTRACTED|CONSTANT', 'Value' => ['<string>', ...], ], // ... ], ], // ... ], 'Inputs' => ['<string>', ...], 'LogicalOperator' => 'AND|OR', 'Name' => '<string>', ], 'GovernedCatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', 'Name' => '<string>', 'PartitionPredicate' => '<string>', 'Table' => '<string>', ], 'GovernedCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'JDBCConnectorSource' => [ 'AdditionalOptions' => [ 'DataTypeMapping' => ['<string>', ...], 'FilterPredicate' => '<string>', 'JobBookmarkKeys' => ['<string>', ...], 'JobBookmarkKeysSortOrder' => '<string>', 'LowerBound' => <integer>, 'NumPartitions' => <integer>, 'PartitionColumn' => '<string>', 'UpperBound' => <integer>, ], 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Query' => '<string>', ], 'JDBCConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'Join' => [ 'Columns' => [ [ 'From' => '<string>', 'Keys' => [ ['<string>', ...], // ... ], ], // ... ], 'Inputs' => ['<string>', ...], 'JoinType' => 'equijoin|left|right|outer|leftsemi|leftanti', 'Name' => '<string>', ], 'Merge' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PrimaryKeys' => [ ['<string>', ...], // ... 
], 'Source' => '<string>', ], 'MicrosoftSQLServerCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'MicrosoftSQLServerCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'MySQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'MySQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'OracleSQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'OracleSQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'PIIDetection' => [ 'EntityTypesToDetect' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'MaskValue' => '<string>', 'Name' => '<string>', 'OutputColumnName' => '<string>', 'PiiType' => 'RowAudit|RowMasking|ColumnAudit|ColumnMasking', 'SampleFraction' => <float>, 'ThresholdFraction' => <float>, ], 'PostgreSQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'PostgreSQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'Recipe' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'RecipeReference' => [ 'RecipeArn' => '<string>', 'RecipeVersion' => '<string>', ], 'RecipeSteps' => [ [ 'Action' => [ 'Operation' => '<string>', 'Parameters' => ['<string>', ...], ], 'ConditionExpressions' => [ [ 'Condition' => '<string>', 'TargetColumn' => '<string>', 'Value' => '<string>', ], // ... ], ], // ... 
], ], 'RedshiftSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', 'TmpDirIAMRole' => '<string>', ], 'RedshiftTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', 'TmpDirIAMRole' => '<string>', 'UpsertRedshiftOptions' => [ 'ConnectionName' => '<string>', 'TableLocation' => '<string>', 'UpsertKeys' => ['<string>', ...], ], ], 'RelationalCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'RenameField' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'SourcePath' => ['<string>', ...], 'TargetPath' => ['<string>', ...], ], 'S3CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'S3CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'S3CatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', 'Name' => '<string>', 'PartitionPredicate' => '<string>', 'Table' => '<string>', ], 'S3CatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3CsvSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Escaper' => '<string>', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', 'OptimizePerformance' => true || false, 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'QuoteChar' => 'quote|quillemet|single_quote|disabled', 'Recurse' => true || false, 'Separator' => 'comma|ctrla|pipe|semicolon|tab', 'SkipFirst' => true || false, 'WithHeader' => true || false, 'WriteHeader' => true || false, ], 'S3DeltaCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3DeltaDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Compression' => 'uncompressed|snappy', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3DeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], ], 'S3DirectTarget' => [ 'Compression' => '<string>', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3GlueParquetTarget' => [ 'Compression' => 'snappy|lzo|gzip|uncompressed|none', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3HudiDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Compression' => 'gzip|lzo|uncompressed|snappy', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], ], 'S3JsonSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'JsonPath' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'Recurse' => true || false, ], 'S3ParquetSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'snappy|lzo|gzip|uncompressed|none', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'Recurse' => true || false, ], 'SelectFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... 
], ], 'SelectFromCollection' => [ 'Index' => <integer>, 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'SnowflakeSource' => [ 'Data' => [ 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SnowflakeTarget' => [ 'Data' => [ 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'SparkConnectorSource' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkSQL' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'SqlAliases' => [ [ 'Alias' => '<string>', 'From' => '<string>', ], // ... ], 'SqlQuery' => '<string>', ], 'Spigot' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Path' => '<string>', 'Prob' => <float>, 'Topk' => <integer>, ], 'SplitFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... ], ], 'Union' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'UnionType' => 'ALL|DISTINCT', ], ], // ... 
], 'Command' => [ 'Name' => '<string>', 'PythonVersion' => '<string>', 'Runtime' => '<string>', 'ScriptLocation' => '<string>', ], 'Connections' => [ 'Connections' => ['<string>', ...], ], 'CreatedOn' => <DateTime>, 'DefaultArguments' => ['<string>', ...], 'Description' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionProperty' => [ 'MaxConcurrentRuns' => <integer>, ], 'GlueVersion' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobRunQueuingEnabled' => true || false, 'LastModifiedOn' => <DateTime>, 'LogUri' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'MaxRetries' => <integer>, 'Name' => '<string>', 'NonOverridableArguments' => ['<string>', ...], 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'ProfileName' => '<string>', 'Role' => '<string>', 'SecurityConfiguration' => '<string>', 'SourceControlDetails' => [ 'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER', 'AuthToken' => '<string>', 'Branch' => '<string>', 'Folder' => '<string>', 'LastCommitId' => '<string>', 'Owner' => '<string>', 'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT', 'Repository' => '<string>', ], 'Timeout' => <integer>, 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], // ... ], 'NextToken' => '<string>', ]
Result Details
Members
- Jobs
-
- Type: Array of Job structures
A list of job definitions.
- NextToken
-
- Type: string
A continuation token, if not all job definitions have yet been returned.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
GetMLTaskRun
$result = $client->getMLTaskRun([/* ... */]);
$promise = $client->getMLTaskRunAsync([/* ... */]);
Gets details for a specific task run on a machine learning transform. Machine learning task runs are asynchronous tasks that Glue runs on your behalf as part of various machine learning workflows. You can check the stats of any task run by calling GetMLTaskRun with the TaskRunId and its parent transform's TransformId.
Parameter Syntax
$result = $client->getMLTaskRun([
    'TaskRunId' => '<string>', // REQUIRED
    'TransformId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- TaskRunId
-
- Required: Yes
- Type: string
The unique identifier of the task run.
- TransformId
-
- Required: Yes
- Type: string
The unique identifier of the machine learning transform.
Result Syntax
[ 'CompletedOn' => <DateTime>, 'ErrorString' => '<string>', 'ExecutionTime' => <integer>, 'LastModifiedOn' => <DateTime>, 'LogGroupName' => '<string>', 'Properties' => [ 'ExportLabelsTaskRunProperties' => [ 'OutputS3Path' => '<string>', ], 'FindMatchesTaskRunProperties' => [ 'JobId' => '<string>', 'JobName' => '<string>', 'JobRunId' => '<string>', ], 'ImportLabelsTaskRunProperties' => [ 'InputS3Path' => '<string>', 'Replace' => true || false, ], 'LabelingSetGenerationTaskRunProperties' => [ 'OutputS3Path' => '<string>', ], 'TaskType' => 'EVALUATION|LABELING_SET_GENERATION|IMPORT_LABELS|EXPORT_LABELS|FIND_MATCHES', ], 'StartedOn' => <DateTime>, 'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT', 'TaskRunId' => '<string>', 'TransformId' => '<string>', ]
Result Details
Members
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when this task run was completed.
- ErrorString
-
- Type: string
The error strings that are associated with the task run.
- ExecutionTime
-
- Type: int
The amount of time (in seconds) that the task run consumed resources.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when this task run was last modified.
- LogGroupName
-
- Type: string
The names of the log groups that are associated with the task run.
- Properties
-
- Type: TaskRunProperties structure
The list of properties that are associated with the task run.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when this task run started.
- Status
-
- Type: string
The status for this task run.
- TaskRunId
-
- Type: string
The unique run identifier associated with this run.
- TransformId
-
- Type: string
The unique identifier of the machine learning transform.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
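Because task runs are asynchronous, callers typically poll GetMLTaskRun until the run finishes. The following is a minimal sketch of such a loop, not part of the SDK. The Status values come from the Result Syntax above; treating STOPPED, SUCCEEDED, FAILED, and TIMEOUT as terminal is an assumption, and `waitForTaskRun`, `isTerminalTaskRunStatus`, and the `$fetchTaskRun` callable are hypothetical names introduced for illustration.

```php
<?php
// Statuses assumed to be terminal (an assumption; the values themselves
// are taken from the Status enum in the Result Syntax).
const TERMINAL_TASK_RUN_STATUSES = ['STOPPED', 'SUCCEEDED', 'FAILED', 'TIMEOUT'];

/**
 * Returns true when a task run status means no further progress is expected.
 */
function isTerminalTaskRunStatus(string $status): bool
{
    return in_array($status, TERMINAL_TASK_RUN_STATUSES, true);
}

/**
 * Polls until the task run reaches a terminal status or attempts run out.
 *
 * $fetchTaskRun is a hypothetical callable that returns one GetMLTaskRun
 * result (an array or ArrayAccess value exposing a 'Status' key).
 * Returns the terminal status, or null if none was reached.
 */
function waitForTaskRun(callable $fetchTaskRun, int $maxAttempts = 30, int $delaySeconds = 10): ?string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $fetchTaskRun();
        $status = $result['Status'] ?? null;
        if (is_string($status) && isTerminalTaskRunStatus($status)) {
            return $status;
        }
        // Sleep between attempts, but not after the final one.
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }
    return null; // still STARTING, RUNNING, or STOPPING after all attempts
}
```

With a real client, the callable could wrap the documented call, for example function () use ($client, $taskRunId, $transformId) { return $client->getMLTaskRun(['TaskRunId' => $taskRunId, 'TransformId' => $transformId]); }.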
GetMLTaskRuns
$result = $client->getMLTaskRuns([/* ... */]);
$promise = $client->getMLTaskRunsAsync([/* ... */]);
Gets a list of runs for a machine learning transform. Machine learning task runs are asynchronous tasks that Glue runs on your behalf as part of various machine learning workflows. You can get a sortable, filterable list of machine learning task runs by calling GetMLTaskRuns with their parent transform's TransformId and other optional parameters as documented in this section.
This operation returns a list of historic runs and must be paginated.
Parameter Syntax
$result = $client->getMLTaskRuns([
    'Filter' => [
        'StartedAfter' => <integer || string || DateTime>,
        'StartedBefore' => <integer || string || DateTime>,
        'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT',
        'TaskRunType' => 'EVALUATION|LABELING_SET_GENERATION|IMPORT_LABELS|EXPORT_LABELS|FIND_MATCHES',
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Sort' => [
        'Column' => 'TASK_RUN_TYPE|STATUS|STARTED', // REQUIRED
        'SortDirection' => 'DESCENDING|ASCENDING', // REQUIRED
    ],
    'TransformId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Filter
-
- Type: TaskRunFilterCriteria structure
The filter criteria, in the TaskRunFilterCriteria structure, for the task run.
- MaxResults
-
- Type: int
The maximum number of results to return.
- NextToken
-
- Type: string
A token for pagination of the results. The default is empty.
- Sort
-
- Type: TaskRunSortCriteria structure
The sorting criteria, in the TaskRunSortCriteria structure, for the task run.
- TransformId
-
- Required: Yes
- Type: string
The unique identifier of the machine learning transform.
Result Syntax
[ 'NextToken' => '<string>', 'TaskRuns' => [ [ 'CompletedOn' => <DateTime>, 'ErrorString' => '<string>', 'ExecutionTime' => <integer>, 'LastModifiedOn' => <DateTime>, 'LogGroupName' => '<string>', 'Properties' => [ 'ExportLabelsTaskRunProperties' => [ 'OutputS3Path' => '<string>', ], 'FindMatchesTaskRunProperties' => [ 'JobId' => '<string>', 'JobName' => '<string>', 'JobRunId' => '<string>', ], 'ImportLabelsTaskRunProperties' => [ 'InputS3Path' => '<string>', 'Replace' => true || false, ], 'LabelingSetGenerationTaskRunProperties' => [ 'OutputS3Path' => '<string>', ], 'TaskType' => 'EVALUATION|LABELING_SET_GENERATION|IMPORT_LABELS|EXPORT_LABELS|FIND_MATCHES', ], 'StartedOn' => <DateTime>, 'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT', 'TaskRunId' => '<string>', 'TransformId' => '<string>', ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A pagination token, if more results are available.
- TaskRuns
-
- Type: Array of TaskRun structures
A list of task runs that are associated with the transform.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
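Since this operation must be paginated, a caller has to keep passing the returned NextToken back until none is returned. The sketch below shows one way to do that with a generic helper. `collectAllPages` and its `$fetchPage` callable are hypothetical names introduced here, not SDK functions, and the `$maxPages` guard is an added safety assumption.

```php
<?php
/**
 * Collects every item from a NextToken-paginated operation.
 *
 * $fetchPage is a hypothetical callable: it receives the request parameters
 * and returns one page of results (an array or ArrayAccess value).
 * $itemsKey names the result member that holds the list, e.g. 'TaskRuns'.
 */
function collectAllPages(callable $fetchPage, array $params, string $itemsKey, int $maxPages = 1000): array
{
    $items = [];
    unset($params['NextToken']); // always start from the first page
    for ($page = 0; $page < $maxPages; $page++) {
        $result = $fetchPage($params);
        foreach ($result[$itemsKey] ?? [] as $item) {
            $items[] = $item;
        }
        $next = $result['NextToken'] ?? null;
        if ($next === null || $next === '') {
            break; // no token means this was the last page
        }
        $params['NextToken'] = $next;
    }
    return $items;
}
```

With a real client, the callable could be function (array $p) use ($client) { return $client->getMLTaskRuns($p); }, called with parameters that include the required TransformId and with 'TaskRuns' as the items key.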
GetMLTransform
$result = $client->getMLTransform([/* ... */]);
$promise = $client->getMLTransformAsync([/* ... */]);
Gets a Glue machine learning transform artifact and all its corresponding metadata. Machine learning transforms are a special type of transform that use machine learning to learn the details of the transformation to be performed by learning from examples provided by humans. These transformations are then saved by Glue. You can retrieve their metadata by calling GetMLTransform.
Parameter Syntax
$result = $client->getMLTransform([
    'TransformId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- TransformId
-
- Required: Yes
- Type: string
The unique identifier of the transform, generated at the time that the transform was created.
Result Syntax
[ 'CreatedOn' => <DateTime>, 'Description' => '<string>', 'EvaluationMetrics' => [ 'FindMatchesMetrics' => [ 'AreaUnderPRCurve' => <float>, 'ColumnImportances' => [ [ 'ColumnName' => '<string>', 'Importance' => <float>, ], // ... ], 'ConfusionMatrix' => [ 'NumFalseNegatives' => <integer>, 'NumFalsePositives' => <integer>, 'NumTrueNegatives' => <integer>, 'NumTruePositives' => <integer>, ], 'F1' => <float>, 'Precision' => <float>, 'Recall' => <float>, ], 'TransformType' => 'FIND_MATCHES', ], 'GlueVersion' => '<string>', 'InputRecordTables' => [ [ 'AdditionalOptions' => ['<string>', ...], 'CatalogId' => '<string>', 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], // ... ], 'LabelCount' => <integer>, 'LastModifiedOn' => <DateTime>, 'MaxCapacity' => <float>, 'MaxRetries' => <integer>, 'Name' => '<string>', 'NumberOfWorkers' => <integer>, 'Parameters' => [ 'FindMatchesParameters' => [ 'AccuracyCostTradeoff' => <float>, 'EnforceProvidedLabels' => true || false, 'PrecisionRecallTradeoff' => <float>, 'PrimaryKeyColumnName' => '<string>', ], 'TransformType' => 'FIND_MATCHES', ], 'Role' => '<string>', 'Schema' => [ [ 'DataType' => '<string>', 'Name' => '<string>', ], // ... ], 'Status' => 'NOT_READY|READY|DELETING', 'Timeout' => <integer>, 'TransformEncryption' => [ 'MlUserDataEncryption' => [ 'KmsKeyId' => '<string>', 'MlUserDataEncryptionMode' => 'DISABLED|SSE-KMS', ], 'TaskRunSecurityConfigurationName' => '<string>', ], 'TransformId' => '<string>', 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ]
Result Details
Members
- CreatedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the transform was created.
- Description
-
- Type: string
A description of the transform.
- EvaluationMetrics
-
- Type: EvaluationMetrics structure
The latest evaluation metrics.
- GlueVersion
-
- Type: string
This value determines which version of Glue this machine learning transform is compatible with. Glue 1.0 is recommended for most customers. If the value is not set, the Glue compatibility defaults to Glue 0.9. For more information, see Glue Versions in the developer guide.
- InputRecordTables
-
- Type: Array of GlueTable structures
A list of Glue table definitions used by the transform.
- LabelCount
-
- Type: int
The number of labels available for this transform.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the transform was last modified.
- MaxCapacity
-
- Type: double
The number of Glue data processing units (DPUs) that are allocated to task runs for this transform. You can allocate from 2 to 100 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
When the WorkerType field is set to a value other than Standard, the MaxCapacity field is set automatically and becomes read-only.
- MaxRetries
-
- Type: int
The maximum number of times to retry a task for this transform after a task run fails.
- Name
-
- Type: string
The unique name given to the transform when it was created.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined WorkerType that are allocated when this task runs.
- Parameters
-
- Type: TransformParameters structure
The configuration parameters that are specific to the algorithm used.
- Role
-
- Type: string
The name or Amazon Resource Name (ARN) of the IAM role with the required permissions.
- Schema
-
- Type: Array of SchemaColumn structures
The Map<Column, Type> object that represents the schema that this transform accepts. Has an upper bound of 100 columns.
- Status
-
- Type: string
The last known status of the transform (to indicate whether it can be used or not). One of "NOT_READY", "READY", or "DELETING".
- Timeout
-
- Type: int
The timeout for a task run for this transform in minutes. This is the maximum time that a task run for this transform can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
- TransformEncryption
-
- Type: TransformEncryption structure
The encryption-at-rest settings of the transform that apply to accessing user data. Machine learning transforms can access user data encrypted in Amazon S3 using KMS.
- TransformId
-
- Type: string
The unique identifier of the transform, generated at the time that the transform was created.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated when this task runs. Accepts a value of Standard, G.1X, or G.2X.
- For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory and a 50GB disk, and 2 executors per worker.
- For the G.1X worker type, each worker provides 4 vCPU, 16 GB of memory and a 64GB disk, and 1 executor per worker.
- For the G.2X worker type, each worker provides 8 vCPU, 32 GB of memory and a 128GB disk, and 1 executor per worker.
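The per-worker figures above can be multiplied by NumberOfWorkers to estimate what a task run is allocated. The following is a small sketch of that arithmetic. `WORKER_SPECS` and `totalWorkerResources` are hypothetical names introduced for illustration, and only the three worker types described in this section are covered.

```php
<?php
// Per-worker resources as listed above for the three documented worker types.
const WORKER_SPECS = [
    'Standard' => ['vcpu' => 4, 'memoryGb' => 16, 'diskGb' => 50, 'executors' => 2],
    'G.1X'     => ['vcpu' => 4, 'memoryGb' => 16, 'diskGb' => 64, 'executors' => 1],
    'G.2X'     => ['vcpu' => 8, 'memoryGb' => 32, 'diskGb' => 128, 'executors' => 1],
];

/**
 * Multiplies the per-worker figures by the number of workers.
 * Throws for worker types that this section does not describe.
 */
function totalWorkerResources(string $workerType, int $numberOfWorkers): array
{
    if (!array_key_exists($workerType, WORKER_SPECS)) {
        throw new InvalidArgumentException("No documented spec for worker type: $workerType");
    }
    if ($numberOfWorkers < 1) {
        throw new InvalidArgumentException('numberOfWorkers must be at least 1');
    }
    $totals = [];
    foreach (WORKER_SPECS[$workerType] as $resource => $perWorker) {
        $totals[$resource] = $perWorker * $numberOfWorkers;
    }
    return $totals;
}
```

For example, ten G.2X workers come to 80 vCPU, 320 GB of memory, 1,280 GB of disk, and 10 executors in total.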
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
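The EvaluationMetrics result includes both a ConfusionMatrix and Precision, Recall, and F1 values. As a rough illustration of how those quantities relate, the sketch below applies the standard textbook definitions to the four confusion-matrix counts. It is an assumption that these definitions match how Glue derives the reported values, and `metricsFromConfusionMatrix` is a hypothetical helper name.

```php
<?php
/**
 * Derives precision, recall, and F1 from confusion-matrix counts using the
 * standard definitions. Returns 0.0 for a metric whose denominator is zero.
 */
function metricsFromConfusionMatrix(array $matrix): array
{
    $tp = (int) ($matrix['NumTruePositives'] ?? 0);
    $fp = (int) ($matrix['NumFalsePositives'] ?? 0);
    $fn = (int) ($matrix['NumFalseNegatives'] ?? 0);

    // precision = TP / (TP + FP); recall = TP / (TP + FN)
    $precision = ($tp + $fp) > 0 ? (float) $tp / ($tp + $fp) : 0.0;
    $recall = ($tp + $fn) > 0 ? (float) $tp / ($tp + $fn) : 0.0;

    // F1 is the harmonic mean of precision and recall.
    $sum = $precision + $recall;
    $f1 = $sum > 0 ? 2 * $precision * $recall / $sum : 0.0;

    return ['Precision' => $precision, 'Recall' => $recall, 'F1' => $f1];
}
```

A caller could compare the output against the Precision, Recall, and F1 members returned in FindMatchesMetrics as a sanity check.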
GetMLTransforms
$result = $client->getMLTransforms([/* ... */]);
$promise = $client->getMLTransformsAsync([/* ... */]);
Gets a sortable, filterable list of existing Glue machine learning transforms. Machine learning transforms are a special type of transform that use machine learning to learn the details of the transformation to be performed by learning from examples provided by humans. These transformations are then saved by Glue, and you can retrieve their metadata by calling GetMLTransforms.
Parameter Syntax
$result = $client->getMLTransforms([
    'Filter' => [
        'CreatedAfter' => <integer || string || DateTime>,
        'CreatedBefore' => <integer || string || DateTime>,
        'GlueVersion' => '<string>',
        'LastModifiedAfter' => <integer || string || DateTime>,
        'LastModifiedBefore' => <integer || string || DateTime>,
        'Name' => '<string>',
        'Schema' => [
            [
                'DataType' => '<string>',
                'Name' => '<string>',
            ],
            // ...
        ],
        'Status' => 'NOT_READY|READY|DELETING',
        'TransformType' => 'FIND_MATCHES',
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Sort' => [
        'Column' => 'NAME|TRANSFORM_TYPE|STATUS|CREATED|LAST_MODIFIED', // REQUIRED
        'SortDirection' => 'DESCENDING|ASCENDING', // REQUIRED
    ],
]);
Parameter Details
Members
- Filter
-
- Type: TransformFilterCriteria structure
The filter transformation criteria.
- MaxResults
-
- Type: int
The maximum number of results to return.
- NextToken
-
- Type: string
A pagination token to offset the results.
- Sort
-
- Type: TransformSortCriteria structure
The sorting criteria.
Result Syntax
[ 'NextToken' => '<string>', 'Transforms' => [ [ 'CreatedOn' => <DateTime>, 'Description' => '<string>', 'EvaluationMetrics' => [ 'FindMatchesMetrics' => [ 'AreaUnderPRCurve' => <float>, 'ColumnImportances' => [ [ 'ColumnName' => '<string>', 'Importance' => <float>, ], // ... ], 'ConfusionMatrix' => [ 'NumFalseNegatives' => <integer>, 'NumFalsePositives' => <integer>, 'NumTrueNegatives' => <integer>, 'NumTruePositives' => <integer>, ], 'F1' => <float>, 'Precision' => <float>, 'Recall' => <float>, ], 'TransformType' => 'FIND_MATCHES', ], 'GlueVersion' => '<string>', 'InputRecordTables' => [ [ 'AdditionalOptions' => ['<string>', ...], 'CatalogId' => '<string>', 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], // ... ], 'LabelCount' => <integer>, 'LastModifiedOn' => <DateTime>, 'MaxCapacity' => <float>, 'MaxRetries' => <integer>, 'Name' => '<string>', 'NumberOfWorkers' => <integer>, 'Parameters' => [ 'FindMatchesParameters' => [ 'AccuracyCostTradeoff' => <float>, 'EnforceProvidedLabels' => true || false, 'PrecisionRecallTradeoff' => <float>, 'PrimaryKeyColumnName' => '<string>', ], 'TransformType' => 'FIND_MATCHES', ], 'Role' => '<string>', 'Schema' => [ [ 'DataType' => '<string>', 'Name' => '<string>', ], // ... ], 'Status' => 'NOT_READY|READY|DELETING', 'Timeout' => <integer>, 'TransformEncryption' => [ 'MlUserDataEncryption' => [ 'KmsKeyId' => '<string>', 'MlUserDataEncryptionMode' => 'DISABLED|SSE-KMS', ], 'TaskRunSecurityConfigurationName' => '<string>', ], 'TransformId' => '<string>', 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A pagination token, if more results are available.
- Transforms
-
- Required: Yes
- Type: Array of MLTransform structures
A list of machine learning transforms.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
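When a Sort member is supplied, both Column and SortDirection are required and each accepts a fixed set of values. The following sketch assembles a GetMLTransforms request and rejects unknown values before any call is made. The allowed values are copied from the Parameter Syntax above; `buildGetMLTransformsParams` is a hypothetical helper, not an SDK function.

```php
<?php
// Allowed values copied from the Sort member of the Parameter Syntax.
const TRANSFORM_SORT_COLUMNS = ['NAME', 'TRANSFORM_TYPE', 'STATUS', 'CREATED', 'LAST_MODIFIED'];
const SORT_DIRECTIONS = ['DESCENDING', 'ASCENDING'];

/**
 * Builds the parameter array for GetMLTransforms with a validated Sort member.
 * Filter and MaxResults are only included when provided.
 */
function buildGetMLTransformsParams(
    array $filter,
    string $sortColumn,
    string $sortDirection = 'ASCENDING',
    ?int $maxResults = null
): array {
    if (!in_array($sortColumn, TRANSFORM_SORT_COLUMNS, true)) {
        throw new InvalidArgumentException("Unsupported sort column: $sortColumn");
    }
    if (!in_array($sortDirection, SORT_DIRECTIONS, true)) {
        throw new InvalidArgumentException("Unsupported sort direction: $sortDirection");
    }
    $params = ['Sort' => ['Column' => $sortColumn, 'SortDirection' => $sortDirection]];
    if ($filter !== []) {
        $params['Filter'] = $filter;
    }
    if ($maxResults !== null) {
        $params['MaxResults'] = $maxResults;
    }
    return $params;
}
```

The returned array can be passed directly to $client->getMLTransforms(), for example with ['Status' => 'READY'] as the filter and 'CREATED' as the sort column.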
GetMapping
$result = $client->getMapping([/* ... */]);
$promise = $client->getMappingAsync([/* ... */]);
Creates mappings.
Parameter Syntax
$result = $client->getMapping([
    'Location' => [
        'DynamoDB' => [
            [
                'Name' => '<string>', // REQUIRED
                'Param' => true || false,
                'Value' => '<string>', // REQUIRED
            ],
            // ...
        ],
        'Jdbc' => [
            [
                'Name' => '<string>', // REQUIRED
                'Param' => true || false,
                'Value' => '<string>', // REQUIRED
            ],
            // ...
        ],
        'S3' => [
            [
                'Name' => '<string>', // REQUIRED
                'Param' => true || false,
                'Value' => '<string>', // REQUIRED
            ],
            // ...
        ],
    ],
    'Sinks' => [
        [
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'Source' => [ // REQUIRED
        'DatabaseName' => '<string>', // REQUIRED
        'TableName' => '<string>', // REQUIRED
    ],
]);
Parameter Details
Members
- Location
-
- Type: Location structure
Parameters for the mapping.
- Sinks
-
- Type: Array of CatalogEntry structures
A list of target tables.
- Source
-
- Required: Yes
- Type: CatalogEntry structure
Specifies the source table.
Result Syntax
[ 'Mapping' => [ [ 'SourcePath' => '<string>', 'SourceTable' => '<string>', 'SourceType' => '<string>', 'TargetPath' => '<string>', 'TargetTable' => '<string>', 'TargetType' => '<string>', ], // ... ], ]
Result Details
Members
- Mapping
-
- Required: Yes
- Type: Array of MappingEntry structures
A list of mappings to the specified targets.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- EntityNotFoundException:
A specified entity does not exist.
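The Mapping result is a flat list of MappingEntry structures, one per source and target path pair. When several sinks are requested, it can help to group the entries per target table. The sketch below shows one way to do that; `groupMappingsByTargetTable` is a hypothetical helper introduced for illustration, and it only relies on the member names shown in the Result Syntax.

```php
<?php
/**
 * Groups MappingEntry items by their TargetTable.
 *
 * Returns an array keyed by target table name; each value is a list of
 * ['from' => SourcePath, 'to' => TargetPath] pairs in the original order.
 */
function groupMappingsByTargetTable(iterable $mapping): array
{
    $grouped = [];
    foreach ($mapping as $entry) {
        $target = $entry['TargetTable'] ?? '';
        $grouped[$target][] = [
            'from' => $entry['SourcePath'] ?? null,
            'to' => $entry['TargetPath'] ?? null,
        ];
    }
    return $grouped;
}
```

A caller would pass $result['Mapping'] from a getMapping() call as the argument.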
GetPartition
$result = $client->getPartition([/* ... */]);
$promise = $client->getPartitionAsync([/* ... */]);
Retrieves information about a specified partition.
Parameter Syntax
$result = $client->getPartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionValues' => ['<string>', ...], // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partition in question resides. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partition resides.
- PartitionValues
-
- Required: Yes
- Type: Array of strings
The values that define the partition.
- TableName
-
- Required: Yes
- Type: string
The name of the partition's table.
Result Syntax
[ 'Partition' => [ 'CatalogId' => '<string>', 'CreationTime' => <DateTime>, 'DatabaseName' => '<string>', 'LastAccessTime' => <DateTime>, 'LastAnalyzedTime' => <DateTime>, 'Parameters' => ['<string>', ...], 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', 'SortOrder' => <integer>, ], // ... ], 'StoredAsSubDirectories' => true || false, ], 'TableName' => '<string>', 'Values' => ['<string>', ...], ], ]
Result Details
Members
- Partition
-
- Type: Partition structure
The requested information, in the form of a Partition object.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
- FederationSourceException:
A federation source failed.
- FederationSourceRetryableException:
A federation source failed, but the operation may be retried.
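Because FederationSourceRetryableException signals that the operation may be retried, a caller might wrap the call in a small retry loop. The following is a minimal sketch, assuming exponential backoff and a caller-supplied way to read the error code from an exception. `retryOnErrorCodes` and its callables are hypothetical names; the SDK has its own retry configuration, and whether that covers this error is not asserted here.

```php
<?php
/**
 * Runs $operation and retries it when the thrown error's code is retryable.
 *
 * $errorCodeOf is a hypothetical callable that maps a Throwable to an error
 * code string (or null). Non-retryable errors, and the final failed attempt,
 * are rethrown unchanged.
 */
function retryOnErrorCodes(
    callable $operation,
    callable $errorCodeOf,
    array $retryableCodes,
    int $maxAttempts = 3,
    int $baseDelayMs = 200
) {
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (Throwable $e) {
            $retryable = in_array($errorCodeOf($e), $retryableCodes, true);
            if (!$retryable || $attempt >= $maxAttempts) {
                throw $e;
            }
            // Exponential backoff: base, 2x base, 4x base, ...
            usleep($baseDelayMs * 1000 * (2 ** ($attempt - 1)));
        }
    }
}
```

With the SDK, $errorCodeOf could return $e->getAwsErrorCode() when $e is an Aws\Exception\AwsException and null otherwise, with ['FederationSourceRetryableException'] as the retryable codes.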
GetPartitionIndexes
$result = $client->getPartitionIndexes([/* ... */]);
$promise = $client->getPartitionIndexesAsync([/* ... */]);
Retrieves the partition indexes associated with a table.
Parameter Syntax
$result = $client->getPartitionIndexes([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'NextToken' => '<string>',
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The catalog ID where the table resides.
- DatabaseName
-
- Required: Yes
- Type: string
Specifies the name of a database from which you want to retrieve partition indexes.
- NextToken
-
- Type: string
A continuation token, included if this is a continuation call.
- TableName
-
- Required: Yes
- Type: string
Specifies the name of a table for which you want to retrieve the partition indexes.
Result Syntax
[ 'NextToken' => '<string>', 'PartitionIndexDescriptorList' => [ [ 'BackfillErrors' => [ [ 'Code' => 'ENCRYPTED_PARTITION_ERROR|INTERNAL_ERROR|INVALID_PARTITION_TYPE_DATA_ERROR|MISSING_PARTITION_VALUE_ERROR|UNSUPPORTED_PARTITION_CHARACTER_ERROR', 'Partitions' => [ [ 'Values' => ['<string>', ...], ], // ... ], ], // ... ], 'IndexName' => '<string>', 'IndexStatus' => 'CREATING|ACTIVE|DELETING|FAILED', 'Keys' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, present if the current list segment is not the last.
- PartitionIndexDescriptorList
-
- Type: Array of PartitionIndexDescriptor structures
A list of index descriptors.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- ConflictException:
The CreatePartitions API was called on a table that has indexes enabled.
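The NextToken parameter and result member of GetPartitionIndexes allow a caller to page through the index descriptors. The following is a minimal sketch of that loop, not an SDK feature: the helper name collectPartitionIndexes and the $fetchPage callable are hypothetical, and the callable is assumed to wrap $client->getPartitionIndexes() and return the result.

```php
<?php
/**
 * Collects every PartitionIndexDescriptor by following NextToken.
 * $fetchPage is a hypothetical callable: it receives the request
 * parameters and returns one page of results (array or ArrayAccess).
 */
function collectPartitionIndexes(callable $fetchPage, string $database, string $table): array
{
    $params = ['DatabaseName' => $database, 'TableName' => $table];
    $indexes = [];

    do {
        $page = $fetchPage($params);
        foreach ($page['PartitionIndexDescriptorList'] ?? [] as $descriptor) {
            $indexes[] = $descriptor;
        }
        // A missing or empty NextToken means this was the last page.
        $token = $page['NextToken'] ?? null;
        $params['NextToken'] = $token;
    } while (!empty($token));

    return $indexes;
}
```

With a real client, the callable could be written as fn (array $p) => $client->getPartitionIndexes($p). Injecting it keeps the loop testable without network access.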
GetPartitions
$result = $client->getPartitions([/* ... */]);
$promise = $client->getPartitionsAsync([/* ... */]);
Retrieves information about the partitions in a table.
Parameter Syntax
$result = $client->getPartitions([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'ExcludeColumnSchema' => true || false,
    'Expression' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'QueryAsOfTime' => <integer || string || DateTime>,
    'Segment' => [
        'SegmentNumber' => <integer>, // REQUIRED
        'TotalSegments' => <integer>, // REQUIRED
    ],
    'TableName' => '<string>', // REQUIRED
    'TransactionId' => '<string>',
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partitions in question reside. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partitions reside.
- ExcludeColumnSchema
-
- Type: boolean
When true, the partition column schema is not returned. This is useful when you are interested only in other partition attributes, such as partition values or location. It also avoids a large response caused by returning duplicate data.
- Expression
-
- Type: string
An expression that filters the partitions to be returned.
The expression uses SQL syntax similar to the SQL WHERE filter clause. The SQL statement parser JSQLParser parses the expression.
Operators: The following are the operators that you can use in the Expression API call:
- =
-
Checks whether the values of the two operands are equal; if yes, then the condition becomes true.
Example: Assume 'variable a' holds 10 and 'variable b' holds 20.
(a = b) is not true.
- < >
-
Checks whether the values of two operands are equal; if the values are not equal, then the condition becomes true.
Example: (a < > b) is true.
- >
-
Checks whether the value of the left operand is greater than the value of the right operand; if yes, then the condition becomes true.
Example: (a > b) is not true.
- <
-
Checks whether the value of the left operand is less than the value of the right operand; if yes, then the condition becomes true.
Example: (a < b) is true.
- >=
-
Checks whether the value of the left operand is greater than or equal to the value of the right operand; if yes, then the condition becomes true.
Example: (a >= b) is not true.
- <=
-
Checks whether the value of the left operand is less than or equal to the value of the right operand; if yes, then the condition becomes true.
Example: (a <= b) is true.
- AND, OR, IN, BETWEEN, LIKE, NOT, IS NULL
-
Logical operators.
Supported Partition Key Types: The following are the supported partition keys.
- string
- date
- timestamp
- int
- bigint
- long
- tinyint
- smallint
- decimal
If a type that is not valid is encountered, an exception is thrown.
The following list shows the valid operators on each type. When you define a crawler, the partitionKey type is created as a STRING, to be compatible with the catalog partitions.
Sample API Call:
- MaxResults
-
- Type: int
The maximum number of partitions to return in a single response.
- NextToken
-
- Type: string
A continuation token, if this is not the first call to retrieve these partitions.
- QueryAsOfTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time as of when to read the partition contents. If not set, the most recent transaction commit time will be used. Cannot be specified along with TransactionId.
- Segment
-
- Type: Segment structure
The segment of the table's partitions to scan in this request.
- TableName
-
- Required: Yes
- Type: string
The name of the partitions' table.
- TransactionId
-
- Type: string
The transaction ID at which to read the partition contents.
Result Syntax
[
    'NextToken' => '<string>',
    'Partitions' => [
        [
            'CatalogId' => '<string>',
            'CreationTime' => <DateTime>,
            'DatabaseName' => '<string>',
            'LastAccessTime' => <DateTime>,
            'LastAnalyzedTime' => <DateTime>,
            'Parameters' => ['<string>', ...],
            'StorageDescriptor' => [
                'AdditionalLocations' => ['<string>', ...],
                'BucketColumns' => ['<string>', ...],
                'Columns' => [
                    [
                        'Comment' => '<string>',
                        'Name' => '<string>',
                        'Parameters' => ['<string>', ...],
                        'Type' => '<string>',
                    ],
                    // ...
                ],
                'Compressed' => true || false,
                'InputFormat' => '<string>',
                'Location' => '<string>',
                'NumberOfBuckets' => <integer>,
                'OutputFormat' => '<string>',
                'Parameters' => ['<string>', ...],
                'SchemaReference' => [
                    'SchemaId' => [
                        'RegistryName' => '<string>',
                        'SchemaArn' => '<string>',
                        'SchemaName' => '<string>',
                    ],
                    'SchemaVersionId' => '<string>',
                    'SchemaVersionNumber' => <integer>,
                ],
                'SerdeInfo' => [
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'SerializationLibrary' => '<string>',
                ],
                'SkewedInfo' => [
                    'SkewedColumnNames' => ['<string>', ...],
                    'SkewedColumnValueLocationMaps' => ['<string>', ...],
                    'SkewedColumnValues' => ['<string>', ...],
                ],
                'SortColumns' => [
                    [
                        'Column' => '<string>',
                        'SortOrder' => <integer>,
                    ],
                    // ...
                ],
                'StoredAsSubDirectories' => true || false,
            ],
            'TableName' => '<string>',
            'Values' => ['<string>', ...],
        ],
        // ...
    ],
]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, if the returned list of partitions does not include the last one.
- Partitions
-
- Type: Array of Partition structures
A list of requested partitions.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- GlueEncryptionException:
An encryption operation failed.
- InvalidStateException:
An error that indicates your data is in an invalid state.
- ResourceNotReadyException:
A resource was not ready for a transaction.
- FederationSourceException:
A federation source failed.
- FederationSourceRetryableException:
A federation source failed, but the operation may be retried.
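A GetPartitions request combines several optional members, and QueryAsOfTime cannot be specified along with TransactionId. The sketch below assembles the parameter array and checks that rule before the call is made. The helper name buildGetPartitionsParams, the partition keys year and month, and the sample values are hypothetical. The segment range check assumes that SegmentNumber is a zero-based index below TotalSegments, which this section does not state.

```php
<?php
/**
 * Builds the parameter array for getPartitions().
 * $options may hold any optional member (Expression, Segment,
 * QueryAsOfTime, TransactionId, MaxResults, ...).
 */
function buildGetPartitionsParams(string $database, string $table, array $options = []): array
{
    // The reference states these two members cannot be combined.
    if (isset($options['QueryAsOfTime'], $options['TransactionId'])) {
        throw new InvalidArgumentException('QueryAsOfTime cannot be specified along with TransactionId.');
    }

    if (isset($options['Segment'])) {
        $number = $options['Segment']['SegmentNumber'];
        $total = $options['Segment']['TotalSegments'];
        // Assumption: SegmentNumber is zero-based and below TotalSegments.
        if ($number < 0 || $number >= $total) {
            throw new InvalidArgumentException('SegmentNumber must be between 0 and TotalSegments - 1.');
        }
    }

    return array_merge($options, ['DatabaseName' => $database, 'TableName' => $table]);
}

// Filter on two hypothetical partition keys and scan the first of four segments.
$params = buildGetPartitionsParams('sales', 'orders', [
    'Expression' => "year = '2024' AND month BETWEEN '01' AND '06'",
    'Segment' => ['SegmentNumber' => 0, 'TotalSegments' => 4],
]);
```

The resulting array can be passed directly to $client->getPartitions($params).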
GetPlan
$result = $client->getPlan([/* ... */]);
$promise = $client->getPlanAsync([/* ... */]);
Gets code to perform a specified mapping.
Parameter Syntax
$result = $client->getPlan([
    'AdditionalPlanOptionsMap' => ['<string>', ...],
    'Language' => 'PYTHON|SCALA',
    'Location' => [
        'DynamoDB' => [
            [
                'Name' => '<string>', // REQUIRED
                'Param' => true || false,
                'Value' => '<string>', // REQUIRED
            ],
            // ...
        ],
        'Jdbc' => [
            [
                'Name' => '<string>', // REQUIRED
                'Param' => true || false,
                'Value' => '<string>', // REQUIRED
            ],
            // ...
        ],
        'S3' => [
            [
                'Name' => '<string>', // REQUIRED
                'Param' => true || false,
                'Value' => '<string>', // REQUIRED
            ],
            // ...
        ],
    ],
    'Mapping' => [ // REQUIRED
        [
            'SourcePath' => '<string>',
            'SourceTable' => '<string>',
            'SourceType' => '<string>',
            'TargetPath' => '<string>',
            'TargetTable' => '<string>',
            'TargetType' => '<string>',
        ],
        // ...
    ],
    'Sinks' => [
        [
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'Source' => [ // REQUIRED
        'DatabaseName' => '<string>', // REQUIRED
        'TableName' => '<string>', // REQUIRED
    ],
]);
Parameter Details
Members
- AdditionalPlanOptionsMap
-
- Type: Associative array of custom strings keys (GenericString) to strings
A map to hold additional optional key-value parameters.
Currently, these key-value pairs are supported:
- inferSchema: Specifies whether to set inferSchema to true or false for the default script generated by an AWS Glue job. For example, to set inferSchema to true, pass the following key-value pair:
--additional-plan-options-map '{"inferSchema":"true"}'
- Language
-
- Type: string
The programming language of the code to perform the mapping.
- Location
-
- Type: Location structure
The parameters for the mapping.
- Mapping
-
- Required: Yes
- Type: Array of MappingEntry structures
The list of mappings from a source table to target tables.
- Sinks
-
- Type: Array of CatalogEntry structures
The target tables.
- Source
-
- Required: Yes
- Type: CatalogEntry structure
The source table.
Result Syntax
[ 'PythonScript' => '<string>', 'ScalaCode' => '<string>', ]
Result Details
Members
- PythonScript
-
- Type: string
A Python script to perform the mapping.
- ScalaCode
-
- Type: string
The Scala code to perform the mapping.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
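The required Source and Mapping members, the Language enum, and the inferSchema option can be assembled as sketched below. The helper name buildGetPlanParams and all database, table, and column names are hypothetical. The option value is passed as the string "true" or "false" because AdditionalPlanOptionsMap maps strings to strings.

```php
<?php
/**
 * Builds the parameter array for getPlan().
 * $source is a CatalogEntry; $mapping is a list of MappingEntry arrays.
 */
function buildGetPlanParams(array $source, array $mapping, string $language = 'PYTHON', bool $inferSchema = false): array
{
    // Language accepts only the two values listed in the parameter syntax.
    if (!in_array($language, ['PYTHON', 'SCALA'], true)) {
        throw new InvalidArgumentException("Unsupported language: $language");
    }

    return [
        'Source' => $source,
        'Mapping' => $mapping,
        'Language' => $language,
        // Map values are strings, so the boolean is serialized explicitly.
        'AdditionalPlanOptionsMap' => ['inferSchema' => $inferSchema ? 'true' : 'false'],
    ];
}

// Hypothetical catalog entry and a single column mapping.
$params = buildGetPlanParams(
    ['DatabaseName' => 'sales', 'TableName' => 'orders_raw'],
    [[
        'SourceTable' => 'orders_raw', 'SourcePath' => 'order_id', 'SourceType' => 'string',
        'TargetTable' => 'orders', 'TargetPath' => 'id', 'TargetType' => 'bigint',
    ]],
    'SCALA',
    true
);
```

After $client->getPlan($params) returns, the generated code is read from the PythonScript or ScalaCode result member.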
GetRegistry
$result = $client->getRegistry([/* ... */]);
$promise = $client->getRegistryAsync([/* ... */]);
Describes the specified registry in detail.
Parameter Syntax
$result = $client->getRegistry([
    'RegistryId' => [ // REQUIRED
        'RegistryArn' => '<string>',
        'RegistryName' => '<string>',
    ],
]);
Parameter Details
Members
- RegistryId
-
- Required: Yes
- Type: RegistryId structure
This is a wrapper structure that may contain the registry name and Amazon Resource Name (ARN).
Result Syntax
[
    'CreatedTime' => '<string>',
    'Description' => '<string>',
    'RegistryArn' => '<string>',
    'RegistryName' => '<string>',
    'Status' => 'AVAILABLE|DELETING',
    'UpdatedTime' => '<string>',
]
Result Details
Members
- CreatedTime
-
- Type: string
The date and time the registry was created.
- Description
-
- Type: string
A description of the registry.
- RegistryArn
-
- Type: string
The Amazon Resource Name (ARN) of the registry.
- RegistryName
-
- Type: string
The name of the registry.
- Status
-
- Type: string
The status of the registry.
- UpdatedTime
-
- Type: string
The date and time the registry was updated.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
GetResourcePolicies
$result = $client->getResourcePolicies([/* ... */]);
$promise = $client->getResourcePoliciesAsync([/* ... */]);
Retrieves the resource policies set on individual resources by Resource Access Manager during cross-account permission grants. Also retrieves the Data Catalog resource policy.
If you enabled metadata encryption in Data Catalog settings, and you do not have permission on the KMS key, the operation can't return the Data Catalog resource policy.
Parameter Syntax
$result = $client->getResourcePolicies([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum size of a list to return.
- NextToken
-
- Type: string
A continuation token, if this is a continuation request.
Result Syntax
[
    'GetResourcePoliciesResponseList' => [
        [
            'CreateTime' => <DateTime>,
            'PolicyHash' => '<string>',
            'PolicyInJson' => '<string>',
            'UpdateTime' => <DateTime>,
        ],
        // ...
    ],
    'NextToken' => '<string>',
]
Result Details
Members
- GetResourcePoliciesResponseList
-
- Type: Array of GluePolicy structures
A list of the individual resource policies and the account-level resource policy.
- NextToken
-
- Type: string
A continuation token, if the returned list does not contain the last resource policy available.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- GlueEncryptionException:
An encryption operation failed.
GetResourcePolicy
$result = $client->getResourcePolicy([/* ... */]);
$promise = $client->getResourcePolicyAsync([/* ... */]);
Retrieves a specified resource policy.
Parameter Syntax
$result = $client->getResourcePolicy([
    'ResourceArn' => '<string>',
]);
Parameter Details
Members
- ResourceArn
-
- Type: string
The ARN of the Glue resource for which to retrieve the resource policy. If not supplied, the Data Catalog resource policy is returned. Use GetResourcePolicies to view all existing resource policies. For more information, see Specifying Glue Resource ARNs.
Result Syntax
[
    'CreateTime' => <DateTime>,
    'PolicyHash' => '<string>',
    'PolicyInJson' => '<string>',
    'UpdateTime' => <DateTime>,
]
Result Details
Members
- CreateTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time at which the policy was created.
- PolicyHash
-
- Type: string
Contains the hash value associated with this policy.
- PolicyInJson
-
- Type: string
Contains the requested policy document, in JSON format.
- UpdateTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time at which the policy was last updated.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
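Because PolicyInJson is returned as a JSON string, a caller usually decodes it before inspecting it. The sketch below assumes the document follows the usual IAM policy layout with a top-level Statement element, which may be a single object or a list. The helper name policyStatements and the sample policy are hypothetical.

```php
<?php
/**
 * Decodes PolicyInJson and returns its statements as a list.
 * $result is a getResourcePolicy() result (array or ArrayAccess).
 */
function policyStatements($result): array
{
    // JSON_THROW_ON_ERROR turns malformed JSON into a JsonException.
    $policy = json_decode($result['PolicyInJson'], true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($policy)) {
        throw new UnexpectedValueException('PolicyInJson did not decode to a JSON object.');
    }

    $statements = $policy['Statement'] ?? [];
    // Assumption: IAM-style layout, where a lone statement may appear
    // as an object instead of a one-element list.
    if (isset($statements['Effect'])) {
        $statements = [$statements];
    }

    return $statements;
}

// Hypothetical result containing a single-statement policy.
$sample = ['PolicyInJson' => '{"Version":"2012-10-17","Statement":{"Effect":"Allow","Action":"glue:GetTable","Resource":"*"}}'];
$statements = policyStatements($sample);
```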
GetSchema
$result = $client->getSchema([/* ... */]);
$promise = $client->getSchemaAsync([/* ... */]);
Describes the specified schema in detail.
Parameter Syntax
$result = $client->getSchema([
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
]);
Parameter Details
Members
- SchemaId
-
- Required: Yes
- Type: SchemaId structure
This is a wrapper structure to contain schema identity fields. The structure contains:
- SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. Either SchemaArn, or SchemaName and RegistryName, has to be provided.
- SchemaId$SchemaName: The name of the schema. Either SchemaArn, or SchemaName and RegistryName, has to be provided.
Result Syntax
[
    'Compatibility' => 'NONE|DISABLED|BACKWARD|BACKWARD_ALL|FORWARD|FORWARD_ALL|FULL|FULL_ALL',
    'CreatedTime' => '<string>',
    'DataFormat' => 'AVRO|JSON|PROTOBUF',
    'Description' => '<string>',
    'LatestSchemaVersion' => <integer>,
    'NextSchemaVersion' => <integer>,
    'RegistryArn' => '<string>',
    'RegistryName' => '<string>',
    'SchemaArn' => '<string>',
    'SchemaCheckpoint' => <integer>,
    'SchemaName' => '<string>',
    'SchemaStatus' => 'AVAILABLE|PENDING|DELETING',
    'UpdatedTime' => '<string>',
]
Result Details
Members
- Compatibility
-
- Type: string
The compatibility mode of the schema.
- CreatedTime
-
- Type: string
The date and time the schema was created.
- DataFormat
-
- Type: string
The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.
- Description
-
- Type: string
A description of the schema, if specified when created.
- LatestSchemaVersion
-
- Type: long (int|float)
The latest version of the schema associated with the returned schema definition.
- NextSchemaVersion
-
- Type: long (int|float)
The next version of the schema associated with the returned schema definition.
- RegistryArn
-
- Type: string
The Amazon Resource Name (ARN) of the registry.
- RegistryName
-
- Type: string
The name of the registry.
- SchemaArn
-
- Type: string
The Amazon Resource Name (ARN) of the schema.
- SchemaCheckpoint
-
- Type: long (int|float)
The version number of the checkpoint (the last time the compatibility mode was changed).
- SchemaName
-
- Type: string
The name of the schema.
- SchemaStatus
-
- Type: string
The status of the schema.
- UpdatedTime
-
- Type: string
The date and time the schema was updated.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
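The SchemaId wrapper used by GetSchema accepts either SchemaArn, or SchemaName together with RegistryName. A minimal sketch of a builder that enforces this rule before the request is sent follows. The function name buildSchemaId and the sample names are hypothetical, and preferring the ARN when both forms are given is a design choice of this sketch, not documented behavior.

```php
<?php
/**
 * Builds a SchemaId structure from either an ARN or a name pair.
 */
function buildSchemaId(?string $schemaArn = null, ?string $schemaName = null, ?string $registryName = null): array
{
    // Sketch-level choice: the ARN wins if both forms are supplied.
    if ($schemaArn !== null) {
        return ['SchemaArn' => $schemaArn];
    }
    if ($schemaName !== null && $registryName !== null) {
        return ['SchemaName' => $schemaName, 'RegistryName' => $registryName];
    }

    throw new InvalidArgumentException('Provide SchemaArn, or both SchemaName and RegistryName.');
}

// Identify a schema by name within a hypothetical registry.
$params = ['SchemaId' => buildSchemaId(null, 'orders-value', 'streaming')];
```

The $params array can then be passed to $client->getSchema($params).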
GetSchemaByDefinition
$result = $client->getSchemaByDefinition([/* ... */]);
$promise = $client->getSchemaByDefinitionAsync([/* ... */]);
Retrieves a schema by the SchemaDefinition. The schema definition is sent to the Schema Registry, canonicalized, and hashed. If the hash is matched within the scope of the SchemaName or ARN (or the default registry, if none is supplied), that schema's metadata is returned. Otherwise, a 404 or NotFound error is returned. Schema versions in Deleted statuses will not be included in the results.
Parameter Syntax
$result = $client->getSchemaByDefinition([
    'SchemaDefinition' => '<string>', // REQUIRED
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
]);
Parameter Details
Members
- SchemaDefinition
-
- Required: Yes
- Type: string
The definition of the schema for which schema details are required.
- SchemaId
-
- Required: Yes
- Type: SchemaId structure
This is a wrapper structure to contain schema identity fields. The structure contains:
- SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.
- SchemaId$SchemaName: The name of the schema. One of SchemaArn or SchemaName has to be provided.
Result Syntax
[
    'CreatedTime' => '<string>',
    'DataFormat' => 'AVRO|JSON|PROTOBUF',
    'SchemaArn' => '<string>',
    'SchemaVersionId' => '<string>',
    'Status' => 'AVAILABLE|PENDING|FAILURE|DELETING',
]
Result Details
Members
- CreatedTime
-
- Type: string
The date and time the schema was created.
- DataFormat
-
- Type: string
The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.
- SchemaArn
-
- Type: string
The Amazon Resource Name (ARN) of the schema.
- SchemaVersionId
-
- Type: string
The schema ID of the schema version.
- Status
-
- Type: string
The status of the schema version.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
GetSchemaVersion
$result = $client->getSchemaVersion([/* ... */]);
$promise = $client->getSchemaVersionAsync([/* ... */]);
Gets the specified schema by its unique ID, which is assigned when a version of the schema is created or registered. Schema versions in Deleted status will not be included in the results.
Parameter Syntax
$result = $client->getSchemaVersion([
    'SchemaId' => [
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
    'SchemaVersionId' => '<string>',
    'SchemaVersionNumber' => [
        'LatestVersion' => true || false,
        'VersionNumber' => <integer>,
    ],
]);
Parameter Details
Members
- SchemaId
-
- Type: SchemaId structure
This is a wrapper structure to contain schema identity fields. The structure contains:
- SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. Either SchemaArn, or SchemaName and RegistryName, has to be provided.
- SchemaId$SchemaName: The name of the schema. Either SchemaArn, or SchemaName and RegistryName, has to be provided.
- SchemaVersionId
-
- Type: string
The SchemaVersionId of the schema version. This field is required for fetching by schema ID. Either this or the SchemaId wrapper has to be provided.
- SchemaVersionNumber
-
- Type: SchemaVersionNumber structure
The version number of the schema.
Result Syntax
[
    'CreatedTime' => '<string>',
    'DataFormat' => 'AVRO|JSON|PROTOBUF',
    'SchemaArn' => '<string>',
    'SchemaDefinition' => '<string>',
    'SchemaVersionId' => '<string>',
    'Status' => 'AVAILABLE|PENDING|FAILURE|DELETING',
    'VersionNumber' => <integer>,
]
Result Details
Members
- CreatedTime
-
- Type: string
The date and time the schema version was created.
- DataFormat
-
- Type: string
The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.
- SchemaArn
-
- Type: string
The Amazon Resource Name (ARN) of the schema.
- SchemaDefinition
-
- Type: string
The schema definition for the schema ID.
- SchemaVersionId
-
- Type: string
The SchemaVersionId of the schema version.
- Status
-
- Type: string
The status of the schema version.
- VersionNumber
-
- Type: long (int|float)
The version number of the schema.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
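GetSchemaVersion can be addressed in two ways: by SchemaVersionId, or through the SchemaId wrapper. The sketch below builds the parameters for either path. The helper name buildGetSchemaVersionParams and the sample values are hypothetical. Pairing SchemaId with a SchemaVersionNumber (a specific VersionNumber, or LatestVersion when none is given) is an assumption of this sketch about how a version is selected.

```php
<?php
/**
 * Builds getSchemaVersion() parameters from a version ID, or from a
 * SchemaId structure plus an optional version number.
 */
function buildGetSchemaVersionParams(?string $schemaVersionId, ?array $schemaId = null, ?int $versionNumber = null): array
{
    if ($schemaVersionId !== null) {
        return ['SchemaVersionId' => $schemaVersionId];
    }
    if ($schemaId === null) {
        throw new InvalidArgumentException('Provide SchemaVersionId or the SchemaId wrapper.');
    }

    return [
        'SchemaId' => $schemaId,
        // Assumption: fall back to the latest version when no number is given.
        'SchemaVersionNumber' => $versionNumber === null
            ? ['LatestVersion' => true]
            : ['VersionNumber' => $versionNumber],
    ];
}

// Version 3 of a hypothetical schema, identified by name and registry.
$params = buildGetSchemaVersionParams(
    null,
    ['SchemaName' => 'orders-value', 'RegistryName' => 'streaming'],
    3
);
```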
GetSchemaVersionsDiff
$result = $client->getSchemaVersionsDiff([/* ... */]);
$promise = $client->getSchemaVersionsDiffAsync([/* ... */]);
Fetches the schema version difference in the specified difference type between two stored schema versions in the Schema Registry.
This API allows you to compare two schema versions between two schema definitions under the same schema.
Parameter Syntax
$result = $client->getSchemaVersionsDiff([
    'FirstSchemaVersionNumber' => [ // REQUIRED
        'LatestVersion' => true || false,
        'VersionNumber' => <integer>,
    ],
    'SchemaDiffType' => 'SYNTAX_DIFF', // REQUIRED
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
    'SecondSchemaVersionNumber' => [ // REQUIRED
        'LatestVersion' => true || false,
        'VersionNumber' => <integer>,
    ],
]);
Parameter Details
Members
- FirstSchemaVersionNumber
-
- Required: Yes
- Type: SchemaVersionNumber structure
The first of the two schema versions to be compared.
- SchemaDiffType
-
- Required: Yes
- Type: string
Refers to SYNTAX_DIFF, which is the currently supported diff type.
- SchemaId
-
- Required: Yes
- Type: SchemaId structure
This is a wrapper structure to contain schema identity fields. The structure contains:
- SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.
- SchemaId$SchemaName: The name of the schema. One of SchemaArn or SchemaName has to be provided.
- SecondSchemaVersionNumber
-
- Required: Yes
- Type: SchemaVersionNumber structure
The second of the two schema versions to be compared.
Result Syntax
[
    'Diff' => '<string>',
]
Result Details
Members
- Diff
-
- Type: string
The difference between schemas as a string in JsonPatch format.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
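The Diff member is a string in JsonPatch format, so it needs decoding before it can be inspected. The following sketch counts the patch operations by kind. It assumes the string decodes to a JSON Patch array in which each entry carries an op field. The helper name summarizeSchemaDiff and the sample diff are hypothetical, not output captured from the service.

```php
<?php
/**
 * Counts the operations in a JSON Patch document, keyed by "op".
 */
function summarizeSchemaDiff(string $diff): array
{
    // JSON_THROW_ON_ERROR turns malformed JSON into a JsonException.
    $operations = json_decode($diff, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($operations)) {
        throw new UnexpectedValueException('Diff did not decode to a list of operations.');
    }

    $counts = [];
    foreach ($operations as $operation) {
        $op = $operation['op'] ?? 'unknown';
        $counts[$op] = ($counts[$op] ?? 0) + 1;
    }

    return $counts;
}

// Hypothetical diff string for illustration only.
$diff = '[{"op":"add","path":"/fields/2","value":{"name":"email","type":"string"}},'
      . '{"op":"remove","path":"/fields/0"},'
      . '{"op":"add","path":"/doc","value":"Order record"}]';
$summary = summarizeSchemaDiff($diff);
```

In real use, the input would be $result['Diff'] from $client->getSchemaVersionsDiff().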
GetSecurityConfiguration
$result = $client->getSecurityConfiguration([/* ... */]);
$promise = $client->getSecurityConfigurationAsync([/* ... */]);
Retrieves a specified security configuration.
Parameter Syntax
$result = $client->getSecurityConfiguration([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the security configuration to retrieve.
Result Syntax
[
    'SecurityConfiguration' => [
        'CreatedTimeStamp' => <DateTime>,
        'EncryptionConfiguration' => [
            'CloudWatchEncryption' => [
                'CloudWatchEncryptionMode' => 'DISABLED|SSE-KMS',
                'KmsKeyArn' => '<string>',
            ],
            'JobBookmarksEncryption' => [
                'JobBookmarksEncryptionMode' => 'DISABLED|CSE-KMS',
                'KmsKeyArn' => '<string>',
            ],
            'S3Encryption' => [
                [
                    'KmsKeyArn' => '<string>',
                    'S3EncryptionMode' => 'DISABLED|SSE-KMS|SSE-S3',
                ],
                // ...
            ],
        ],
        'Name' => '<string>',
    ],
]
Result Details
Members
- SecurityConfiguration
-
- Type: SecurityConfiguration structure
The requested security configuration.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
GetSecurityConfigurations
$result = $client->getSecurityConfigurations([/* ... */]);
$promise = $client->getSecurityConfigurationsAsync([/* ... */]);
Retrieves a list of all security configurations.
Parameter Syntax
$result = $client->getSecurityConfigurations([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results to return.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
Result Syntax
[
    'NextToken' => '<string>',
    'SecurityConfigurations' => [
        [
            'CreatedTimeStamp' => <DateTime>,
            'EncryptionConfiguration' => [
                'CloudWatchEncryption' => [
                    'CloudWatchEncryptionMode' => 'DISABLED|SSE-KMS',
                    'KmsKeyArn' => '<string>',
                ],
                'JobBookmarksEncryption' => [
                    'JobBookmarksEncryptionMode' => 'DISABLED|CSE-KMS',
                    'KmsKeyArn' => '<string>',
                ],
                'S3Encryption' => [
                    [
                        'KmsKeyArn' => '<string>',
                        'S3EncryptionMode' => 'DISABLED|SSE-KMS|SSE-S3',
                    ],
                    // ...
                ],
            ],
            'Name' => '<string>',
        ],
        // ...
    ],
]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, if there are more security configurations to return.
- SecurityConfigurations
-
- Type: Array of SecurityConfiguration structures
A list of security configurations.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
GetSession
$result = $client->getSession([/* ... */]);
$promise = $client->getSessionAsync([/* ... */]);
Retrieves the session.
Parameter Syntax
$result = $client->getSession([
    'Id' => '<string>', // REQUIRED
    'RequestOrigin' => '<string>',
]);
Parameter Details
Members
- Id
-
- Required: Yes
- Type: string
The ID of the session.
- RequestOrigin
-
- Type: string
The origin of the request.
Result Syntax
[
    'Session' => [
        'Command' => [
            'Name' => '<string>',
            'PythonVersion' => '<string>',
        ],
        'CompletedOn' => <DateTime>,
        'Connections' => [
            'Connections' => ['<string>', ...],
        ],
        'CreatedOn' => <DateTime>,
        'DPUSeconds' => <float>,
        'DefaultArguments' => ['<string>', ...],
        'Description' => '<string>',
        'ErrorMessage' => '<string>',
        'ExecutionTime' => <float>,
        'GlueVersion' => '<string>',
        'Id' => '<string>',
        'IdleTimeout' => <integer>,
        'MaxCapacity' => <float>,
        'NumberOfWorkers' => <integer>,
        'ProfileName' => '<string>',
        'Progress' => <float>,
        'Role' => '<string>',
        'SecurityConfiguration' => '<string>',
        'Status' => 'PROVISIONING|READY|FAILED|TIMEOUT|STOPPING|STOPPED',
        'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
    ],
]
Result Details
Members
- Session
-
- Type: Session structure
The session object is returned in the response.
Errors
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
GetStatement
$result = $client->getStatement([/* ... */]);
$promise = $client->getStatementAsync([/* ... */]);
Retrieves the statement.
Parameter Syntax
$result = $client->getStatement([
    'Id' => <integer>, // REQUIRED
    'RequestOrigin' => '<string>',
    'SessionId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Id
-
- Required: Yes
- Type: int
The Id of the statement.
- RequestOrigin
-
- Type: string
The origin of the request.
- SessionId
-
- Required: Yes
- Type: string
The Session ID of the statement.
Result Syntax
[
    'Statement' => [
        'Code' => '<string>',
        'CompletedOn' => <integer>,
        'Id' => <integer>,
        'Output' => [
            'Data' => [
                'TextPlain' => '<string>',
            ],
            'ErrorName' => '<string>',
            'ErrorValue' => '<string>',
            'ExecutionCount' => <integer>,
            'Status' => 'WAITING|RUNNING|AVAILABLE|CANCELLING|CANCELLED|ERROR',
            'Traceback' => ['<string>', ...],
        ],
        'Progress' => <float>,
        'StartedOn' => <integer>,
        'State' => 'WAITING|RUNNING|AVAILABLE|CANCELLING|CANCELLED|ERROR',
    ],
]
Result Details
Members
- Statement
-
- Type: Statement structure
Returns the statement.
Errors
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- IllegalSessionStateException:
The session is in an invalid state to perform a requested operation.
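Because a statement moves through several State values, callers often poll GetStatement until it settles. The sketch below shows one way to do that. The helper name waitForStatement and both callables are hypothetical, and treating AVAILABLE, CANCELLED, and ERROR as final states is an assumption of this sketch rather than something the reference states.

```php
<?php
/**
 * Polls until the statement reaches a final state or attempts run out.
 * $fetchStatement is a hypothetical callable wrapping getStatement();
 * $sleep lets callers or tests replace the one-second pause.
 */
function waitForStatement(callable $fetchStatement, int $maxAttempts = 30, ?callable $sleep = null): array
{
    $sleep = $sleep ?? static function (): void {
        sleep(1);
    };
    // Assumption: these three states are final.
    $finalStates = ['AVAILABLE', 'CANCELLED', 'ERROR'];

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $fetchStatement();
        $statement = $result['Statement'];
        if (in_array($statement['State'], $finalStates, true)) {
            return $statement;
        }
        // Pause only when another attempt will follow.
        if ($attempt < $maxAttempts) {
            $sleep();
        }
    }

    throw new RuntimeException("Statement did not finish after $maxAttempts attempts.");
}
```

With a real client, the callable could be fn () => $client->getStatement(['SessionId' => $sessionId, 'Id' => $statementId]), where both variables are supplied by the caller.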
GetTable
$result = $client->getTable([/* ... */]);
$promise = $client->getTableAsync([/* ... */]);
Retrieves the Table definition in a Data Catalog for a specified table.
Parameter Syntax
$result = $client->getTable([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'IncludeStatusDetails' => true || false,
    'Name' => '<string>', // REQUIRED
    'QueryAsOfTime' => <integer || string || DateTime>,
    'TransactionId' => '<string>',
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the table resides. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.
- IncludeStatusDetails
-
- Type: boolean
Specifies whether to include status details related to a request to create or update an AWS Glue Data Catalog view.
- Name
-
- Required: Yes
- Type: string
The name of the table for which to retrieve the definition. For Hive compatibility, this name is entirely lowercase.
- QueryAsOfTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time as of when to read the table contents. If not set, the most recent transaction commit time will be used. Cannot be specified along with TransactionId.
- TransactionId
-
- Type: string
The transaction ID at which to read the table contents.
Result Syntax
[
    'Table' => [
        'CatalogId' => '<string>',
        'CreateTime' => <DateTime>,
        'CreatedBy' => '<string>',
        'DatabaseName' => '<string>',
        'Description' => '<string>',
        'FederatedTable' => [
            'ConnectionName' => '<string>',
            'DatabaseIdentifier' => '<string>',
            'Identifier' => '<string>',
        ],
        'IsMultiDialectView' => true || false,
        'IsRegisteredWithLakeFormation' => true || false,
        'LastAccessTime' => <DateTime>,
        'LastAnalyzedTime' => <DateTime>,
        'Name' => '<string>',
        'Owner' => '<string>',
        'Parameters' => ['<string>', ...],
        'PartitionKeys' => [
            [
                'Comment' => '<string>',
                'Name' => '<string>',
                'Parameters' => ['<string>', ...],
                'Type' => '<string>',
            ],
            // ...
        ],
        'Retention' => <integer>,
        'Status' => [
            'Action' => 'UPDATE|CREATE',
            'Details' => [
                'RequestedChange' => [...], // RECURSIVE
                'ViewValidations' => [
                    [
                        'Dialect' => 'REDSHIFT|ATHENA|SPARK',
                        'DialectVersion' => '<string>',
                        'Error' => [
                            'ErrorCode' => '<string>',
                            'ErrorMessage' => '<string>',
                        ],
                        'State' => 'QUEUED|IN_PROGRESS|SUCCESS|STOPPED|FAILED',
                        'UpdateTime' => <DateTime>,
                        'ViewValidationText' => '<string>',
                    ],
                    // ...
                ],
            ],
            'Error' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'RequestTime' => <DateTime>,
            'RequestedBy' => '<string>',
            'State' => 'QUEUED|IN_PROGRESS|SUCCESS|STOPPED|FAILED',
            'UpdateTime' => <DateTime>,
            'UpdatedBy' => '<string>',
        ],
        'StorageDescriptor' => [
            'AdditionalLocations' => ['<string>', ...],
            'BucketColumns' => ['<string>', ...],
            'Columns' => [
                [
                    'Comment' => '<string>',
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'Type' => '<string>',
                ],
                // ...
            ],
            'Compressed' => true || false,
            'InputFormat' => '<string>',
            'Location' => '<string>',
            'NumberOfBuckets' => <integer>,
            'OutputFormat' => '<string>',
            'Parameters' => ['<string>', ...],
            'SchemaReference' => [
                'SchemaId' => [
                    'RegistryName' => '<string>',
                    'SchemaArn' => '<string>',
                    'SchemaName' => '<string>',
                ],
                'SchemaVersionId' => '<string>',
                'SchemaVersionNumber' => <integer>,
            ],
            'SerdeInfo' => [
                'Name' => '<string>',
                'Parameters' => ['<string>', ...],
                'SerializationLibrary' => '<string>',
            ],
            'SkewedInfo' => [
                'SkewedColumnNames' => ['<string>', ...],
                'SkewedColumnValueLocationMaps' => ['<string>', ...],
                'SkewedColumnValues' => ['<string>', ...],
            ],
            'SortColumns' => [
                [
                    'Column' => '<string>',
                    'SortOrder' => <integer>,
                ],
                // ...
            ],
            'StoredAsSubDirectories' => true || false,
        ],
        'TableType' => '<string>',
        'TargetTable' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'Name' => '<string>',
            'Region' => '<string>',
        ],
        'UpdateTime' => <DateTime>,
        'VersionId' => '<string>',
        'ViewDefinition' => [
            'Definer' => '<string>',
            'IsProtected' => true || false,
            'Representations' => [
                [
                    'Dialect' => 'REDSHIFT|ATHENA|SPARK',
                    'DialectVersion' => '<string>',
                    'IsStale' => true || false,
                    'ValidationConnection' => '<string>',
                    'ViewExpandedText' => '<string>',
                    'ViewOriginalText' => '<string>',
                ],
                // ...
            ],
            'SubObjects' => ['<string>', ...],
        ],
        'ViewExpandedText' => '<string>',
        'ViewOriginalText' => '<string>',
    ],
]
Result Details
Members
- Table
-
- Type: Table structure
The Table object that defines the specified table.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
- ResourceNotReadyException:
A resource was not ready for a transaction.
- FederationSourceException:
A federation source failed.
- FederationSourceRetryableException:
A federation source failed, but the operation may be retried.
GetTableOptimizer
$result = $client->getTableOptimizer([/* ... */]);
$promise = $client->getTableOptimizerAsync([/* ... */]);
Returns the configuration of all optimizers associated with a specified table.
Parameter Syntax
$result = $client->getTableOptimizer([
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'Type' => 'compaction|retention|orphan_file_deletion', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Required: Yes
- Type: string
The Catalog ID of the table.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database in the catalog in which the table resides.
- TableName
-
- Required: Yes
- Type: string
The name of the table.
- Type
-
- Required: Yes
- Type: string
The type of table optimizer.
Result Syntax
[
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>',
    'TableName' => '<string>',
    'TableOptimizer' => [
        'configuration' => [
            'enabled' => true || false,
            'orphanFileDeletionConfiguration' => [
                'icebergConfiguration' => [
                    'location' => '<string>',
                    'orphanFileRetentionPeriodInDays' => <integer>,
                ],
            ],
            'retentionConfiguration' => [
                'icebergConfiguration' => [
                    'cleanExpiredFiles' => true || false,
                    'numberOfSnapshotsToRetain' => <integer>,
                    'snapshotRetentionPeriodInDays' => <integer>,
                ],
            ],
            'roleArn' => '<string>',
        ],
        'lastRun' => [
            'compactionMetrics' => [
                'IcebergMetrics' => [
                    'JobDurationInHour' => <float>,
                    'NumberOfBytesCompacted' => <integer>,
                    'NumberOfDpus' => <integer>,
                    'NumberOfFilesCompacted' => <integer>,
                ],
            ],
            'endTimestamp' => <DateTime>,
            'error' => '<string>',
            'eventType' => 'starting|completed|failed|in_progress',
            'metrics' => [
                'JobDurationInHour' => '<string>',
                'NumberOfBytesCompacted' => '<string>',
                'NumberOfDpus' => '<string>',
                'NumberOfFilesCompacted' => '<string>',
            ],
            'orphanFileDeletionMetrics' => [
                'IcebergMetrics' => [
                    'JobDurationInHour' => <float>,
                    'NumberOfDpus' => <integer>,
                    'NumberOfOrphanFilesDeleted' => <integer>,
                ],
            ],
            'retentionMetrics' => [
                'IcebergMetrics' => [
                    'JobDurationInHour' => <float>,
                    'NumberOfDataFilesDeleted' => <integer>,
                    'NumberOfDpus' => <integer>,
                    'NumberOfManifestFilesDeleted' => <integer>,
                    'NumberOfManifestListsDeleted' => <integer>,
                ],
            ],
            'startTimestamp' => <DateTime>,
        ],
        'type' => 'compaction|retention|orphan_file_deletion',
    ],
]
Result Details
Members
- CatalogId
-
- Type: string
The Catalog ID of the table.
- DatabaseName
-
- Type: string
The name of the database in the catalog in which the table resides.
- TableName
-
- Type: string
The name of the table.
- TableOptimizer
-
- Type: TableOptimizer structure
The optimizer associated with the specified table.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- ThrottlingException:
The throttling threshold was exceeded.
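As an illustration of the parameter and result shapes above, the following is a minimal sketch rather than SDK code. Both function names are hypothetical helpers; the first assembles the four required parameters and rejects a Type outside the three documented values, and the second condenses a result into one line. The result is assumed to be either a plain array or the SDK result object, both of which support array-style access.

```php
<?php
// Hypothetical helper (not part of the SDK): builds the four required parameters.
function buildTableOptimizerRequest(string $catalogId, string $database, string $table, string $type): array
{
    // The three values listed for 'Type' in the Parameter Syntax above.
    $allowedTypes = ['compaction', 'retention', 'orphan_file_deletion'];
    if (!in_array($type, $allowedTypes, true)) {
        throw new InvalidArgumentException("Unsupported optimizer type: {$type}");
    }

    return [
        'CatalogId' => $catalogId,
        'DatabaseName' => $database,
        'TableName' => $table,
        'Type' => $type,
    ];
}

// Hypothetical helper: condenses a GetTableOptimizer result into one line.
// $result may be a plain array or an ArrayAccess object with the documented keys.
function describeLastOptimizerRun($result): string
{
    $optimizer = $result['TableOptimizer'] ?? [];
    $type = $optimizer['type'] ?? 'unknown';
    $state = !empty($optimizer['configuration']['enabled']) ? 'enabled' : 'disabled';

    if (!isset($optimizer['lastRun']['eventType'])) {
        return "{$type}: {$state}, no run recorded";
    }

    $lastRun = $optimizer['lastRun'];
    $summary = "{$type}: {$state}, last run {$lastRun['eventType']}";
    if (!empty($lastRun['error'])) {
        $summary .= " ({$lastRun['error']})";
    }

    return $summary;
}
```

With a configured client, the request array would be passed as `$client->getTableOptimizer($params)` and the returned result handed to `describeLastOptimizerRun()`.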
GetTableVersion
$result = $client->getTableVersion([/* ... */]);
$promise = $client->getTableVersionAsync([/* ... */]);
Retrieves a specified version of a table.
Parameter Syntax
$result = $client->getTableVersion([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'VersionId' => '<string>',
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the tables reside. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.
- TableName
-
- Required: Yes
- Type: string
The name of the table. For Hive compatibility, this name is entirely lowercase.
- VersionId
-
- Type: string
The ID value of the table version to be retrieved. A VersionId is a string representation of an integer. Each version is incremented by 1.
Result Syntax
[ 'TableVersion' => [ 'Table' => [ 'CatalogId' => '<string>', 'CreateTime' => <DateTime>, 'CreatedBy' => '<string>', 'DatabaseName' => '<string>', 'Description' => '<string>', 'FederatedTable' => [ 'ConnectionName' => '<string>', 'DatabaseIdentifier' => '<string>', 'Identifier' => '<string>', ], 'IsMultiDialectView' => true || false, 'IsRegisteredWithLakeFormation' => true || false, 'LastAccessTime' => <DateTime>, 'LastAnalyzedTime' => <DateTime>, 'Name' => '<string>', 'Owner' => '<string>', 'Parameters' => ['<string>', ...], 'PartitionKeys' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Retention' => <integer>, 'Status' => [ 'Action' => 'UPDATE|CREATE', 'Details' => [ 'RequestedChange' => [...], // RECURSIVE 'ViewValidations' => [ [ 'Dialect' => 'REDSHIFT|ATHENA|SPARK', 'DialectVersion' => '<string>', 'Error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], 'State' => 'QUEUED|IN_PROGRESS|SUCCESS|STOPPED|FAILED', 'UpdateTime' => <DateTime>, 'ViewValidationText' => '<string>', ], // ... ], ], 'Error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], 'RequestTime' => <DateTime>, 'RequestedBy' => '<string>', 'State' => 'QUEUED|IN_PROGRESS|SUCCESS|STOPPED|FAILED', 'UpdateTime' => <DateTime>, 'UpdatedBy' => '<string>', ], 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... 
], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', 'SortOrder' => <integer>, ], // ... ], 'StoredAsSubDirectories' => true || false, ], 'TableType' => '<string>', 'TargetTable' => [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'Name' => '<string>', 'Region' => '<string>', ], 'UpdateTime' => <DateTime>, 'VersionId' => '<string>', 'ViewDefinition' => [ 'Definer' => '<string>', 'IsProtected' => true || false, 'Representations' => [ [ 'Dialect' => 'REDSHIFT|ATHENA|SPARK', 'DialectVersion' => '<string>', 'IsStale' => true || false, 'ValidationConnection' => '<string>', 'ViewExpandedText' => '<string>', 'ViewOriginalText' => '<string>', ], // ... ], 'SubObjects' => ['<string>', ...], ], 'ViewExpandedText' => '<string>', 'ViewOriginalText' => '<string>', ], 'VersionId' => '<string>', ], ]
Result Details
Members
- TableVersion
-
- Type: TableVersion structure
The requested table version.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
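Because VersionId is described above as a string representation of an integer that increases by 1 per version, the ID of the preceding version can be derived arithmetically. The sketch below shows one way to do that; both function names are hypothetical, and the lowest version number that actually exists is not stated in this section, so a computed ID may still refer to a version that the service does not have.

```php
<?php
// Hypothetical helper: derives the preceding version ID from a VersionId string.
// Returns null when the computed number would be negative.
function previousTableVersionId(string $versionId): ?string
{
    if (preg_match('/\A[0-9]+\z/', $versionId) !== 1) {
        throw new InvalidArgumentException("VersionId must be a non-negative integer string: {$versionId}");
    }

    $previous = (int) $versionId - 1;

    return $previous >= 0 ? (string) $previous : null;
}

// Hypothetical helper: builds GetTableVersion parameters, leaving out the
// optional keys (CatalogId, VersionId) when no value is supplied.
function buildGetTableVersionRequest(string $database, string $table, ?string $versionId = null, ?string $catalogId = null): array
{
    $params = ['DatabaseName' => $database, 'TableName' => $table];
    if ($versionId !== null) {
        $params['VersionId'] = $versionId;
    }
    if ($catalogId !== null) {
        $params['CatalogId'] = $catalogId;
    }

    return $params;
}
```

A caller comparing two consecutive versions could build one request with the current VersionId and a second with `previousTableVersionId()`, skipping the second call when the helper returns null.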
GetTableVersions
$result = $client->getTableVersions([/* ... */]);
$promise = $client->getTableVersionsAsync([/* ... */]);
Retrieves a list of strings that identify available versions of a specified table.
Parameter Syntax
$result = $client->getTableVersions([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the tables reside. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.
- MaxResults
-
- Type: int
The maximum number of table versions to return in one response.
- NextToken
-
- Type: string
A continuation token, if this is not the first call.
- TableName
-
- Required: Yes
- Type: string
The name of the table. For Hive compatibility, this name is entirely lowercase.
Result Syntax
[ 'NextToken' => '<string>', 'TableVersions' => [ [ 'Table' => [ 'CatalogId' => '<string>', 'CreateTime' => <DateTime>, 'CreatedBy' => '<string>', 'DatabaseName' => '<string>', 'Description' => '<string>', 'FederatedTable' => [ 'ConnectionName' => '<string>', 'DatabaseIdentifier' => '<string>', 'Identifier' => '<string>', ], 'IsMultiDialectView' => true || false, 'IsRegisteredWithLakeFormation' => true || false, 'LastAccessTime' => <DateTime>, 'LastAnalyzedTime' => <DateTime>, 'Name' => '<string>', 'Owner' => '<string>', 'Parameters' => ['<string>', ...], 'PartitionKeys' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Retention' => <integer>, 'Status' => [ 'Action' => 'UPDATE|CREATE', 'Details' => [ 'RequestedChange' => [...], // RECURSIVE 'ViewValidations' => [ [ 'Dialect' => 'REDSHIFT|ATHENA|SPARK', 'DialectVersion' => '<string>', 'Error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], 'State' => 'QUEUED|IN_PROGRESS|SUCCESS|STOPPED|FAILED', 'UpdateTime' => <DateTime>, 'ViewValidationText' => '<string>', ], // ... ], ], 'Error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], 'RequestTime' => <DateTime>, 'RequestedBy' => '<string>', 'State' => 'QUEUED|IN_PROGRESS|SUCCESS|STOPPED|FAILED', 'UpdateTime' => <DateTime>, 'UpdatedBy' => '<string>', ], 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... 
], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', 'SortOrder' => <integer>, ], // ... ], 'StoredAsSubDirectories' => true || false, ], 'TableType' => '<string>', 'TargetTable' => [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'Name' => '<string>', 'Region' => '<string>', ], 'UpdateTime' => <DateTime>, 'VersionId' => '<string>', 'ViewDefinition' => [ 'Definer' => '<string>', 'IsProtected' => true || false, 'Representations' => [ [ 'Dialect' => 'REDSHIFT|ATHENA|SPARK', 'DialectVersion' => '<string>', 'IsStale' => true || false, 'ValidationConnection' => '<string>', 'ViewExpandedText' => '<string>', 'ViewOriginalText' => '<string>', ], // ... ], 'SubObjects' => ['<string>', ...], ], 'ViewExpandedText' => '<string>', 'ViewOriginalText' => '<string>', ], 'VersionId' => '<string>', ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, if the list of available versions does not include the last one.
- TableVersions
-
- Type: Array of TableVersion structures
A list of TableVersion structures describing the available versions of the specified table.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
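The NextToken parameter and result member above imply a loop: call once without a token, then keep passing back the returned token until none is returned. The following is a minimal sketch of that loop, not SDK code. The function name is hypothetical, and the API call is injected as a callable so that the loop does not depend on a configured client.

```php
<?php
// Hypothetical helper: yields every TableVersion across all pages.
// $getTableVersions is any callable that accepts the parameter array and
// returns a result with optional 'TableVersions' and 'NextToken' members.
function iterateTableVersions(callable $getTableVersions, string $database, string $table, int $pageSize = 50): Generator
{
    $params = [
        'DatabaseName' => $database,
        'TableName' => $table,
        'MaxResults' => $pageSize,
    ];

    while (true) {
        $page = $getTableVersions($params);

        foreach ($page['TableVersions'] ?? [] as $tableVersion) {
            yield $tableVersion;
        }

        // Stop when the service returns no continuation token.
        $nextToken = $page['NextToken'] ?? null;
        if ($nextToken === null || $nextToken === '') {
            return;
        }
        $params['NextToken'] = $nextToken;
    }
}
```

With a configured client, the callable could be `fn(array $p) => $client->getTableVersions($p)`. The same loop shape applies to the other operations in this section that accept and return NextToken.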
GetTables
$result = $client->getTables([/* ... */]);
$promise = $client->getTablesAsync([/* ... */]);
Retrieves the definitions of some or all of the tables in a given Database.
Parameter Syntax
$result = $client->getTables([
    'AttributesToGet' => ['<string>', ...],
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'Expression' => '<string>',
    'IncludeStatusDetails' => true || false,
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'QueryAsOfTime' => <integer || string || DateTime>,
    'TransactionId' => '<string>',
]);
Parameter Details
Members
- AttributesToGet
-
- Type: Array of strings
Specifies the table fields returned by the GetTables call. This parameter doesn’t accept an empty list. The request must include NAME.
The following are the valid combinations of values:
  - NAME - Names of all tables in the database.
  - NAME, TABLE_TYPE - Names of all tables and the table types.
- CatalogId
-
- Type: string
The ID of the Data Catalog where the tables reside. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The database in the catalog whose tables to list. For Hive compatibility, this name is entirely lowercase.
- Expression
-
- Type: string
A regular expression pattern. If present, only those tables whose names match the pattern are returned.
- IncludeStatusDetails
-
- Type: boolean
Specifies whether to include status details related to a request to create or update a Glue Data Catalog view.
- MaxResults
-
- Type: int
The maximum number of tables to return in a single response.
- NextToken
-
- Type: string
A continuation token, included if this is a continuation call.
- QueryAsOfTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time as of when to read the table contents. If not set, the most recent transaction commit time will be used. Cannot be specified along with TransactionId.
- TransactionId
-
- Type: string
The transaction ID at which to read the table contents.
Result Syntax
[ 'NextToken' => '<string>', 'TableList' => [ [ 'CatalogId' => '<string>', 'CreateTime' => <DateTime>, 'CreatedBy' => '<string>', 'DatabaseName' => '<string>', 'Description' => '<string>', 'FederatedTable' => [ 'ConnectionName' => '<string>', 'DatabaseIdentifier' => '<string>', 'Identifier' => '<string>', ], 'IsMultiDialectView' => true || false, 'IsRegisteredWithLakeFormation' => true || false, 'LastAccessTime' => <DateTime>, 'LastAnalyzedTime' => <DateTime>, 'Name' => '<string>', 'Owner' => '<string>', 'Parameters' => ['<string>', ...], 'PartitionKeys' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Retention' => <integer>, 'Status' => [ 'Action' => 'UPDATE|CREATE', 'Details' => [ 'RequestedChange' => [...], // RECURSIVE 'ViewValidations' => [ [ 'Dialect' => 'REDSHIFT|ATHENA|SPARK', 'DialectVersion' => '<string>', 'Error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], 'State' => 'QUEUED|IN_PROGRESS|SUCCESS|STOPPED|FAILED', 'UpdateTime' => <DateTime>, 'ViewValidationText' => '<string>', ], // ... ], ], 'Error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], 'RequestTime' => <DateTime>, 'RequestedBy' => '<string>', 'State' => 'QUEUED|IN_PROGRESS|SUCCESS|STOPPED|FAILED', 'UpdateTime' => <DateTime>, 'UpdatedBy' => '<string>', ], 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... 
], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', 'SortOrder' => <integer>, ], // ... ], 'StoredAsSubDirectories' => true || false, ], 'TableType' => '<string>', 'TargetTable' => [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'Name' => '<string>', 'Region' => '<string>', ], 'UpdateTime' => <DateTime>, 'VersionId' => '<string>', 'ViewDefinition' => [ 'Definer' => '<string>', 'IsProtected' => true || false, 'Representations' => [ [ 'Dialect' => 'REDSHIFT|ATHENA|SPARK', 'DialectVersion' => '<string>', 'IsStale' => true || false, 'ValidationConnection' => '<string>', 'ViewExpandedText' => '<string>', 'ViewOriginalText' => '<string>', ], // ... ], 'SubObjects' => ['<string>', ...], ], 'ViewExpandedText' => '<string>', 'ViewOriginalText' => '<string>', ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, present if the current list segment is not the last.
- TableList
-
- Type: Array of Table structures
A list of the requested
Table
objects.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- GlueEncryptionException:
An encryption operation failed.
- FederationSourceException:
A federation source failed.
- FederationSourceRetryableException:
A federation source failed, but the operation may be retried.
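Two constraints stated in the parameter details above can be checked before a request is sent: AttributesToGet must be exactly NAME or NAME with TABLE_TYPE, and QueryAsOfTime cannot be combined with TransactionId. The sketch below performs those checks client-side. The function name is hypothetical, and the service presumably applies its own validation regardless.

```php
<?php
// Hypothetical helper: validates the documented GetTables constraints and
// returns the parameter array for the call.
function buildGetTablesRequest(string $database, array $options = []): array
{
    if (isset($options['QueryAsOfTime'], $options['TransactionId'])) {
        throw new InvalidArgumentException('QueryAsOfTime cannot be specified along with TransactionId.');
    }

    if (array_key_exists('AttributesToGet', $options)) {
        // Normalize order and duplicates so the list can be compared directly.
        $attributes = array_values(array_unique($options['AttributesToGet']));
        sort($attributes);

        $validCombinations = [['NAME'], ['NAME', 'TABLE_TYPE']];
        if (!in_array($attributes, $validCombinations, true)) {
            throw new InvalidArgumentException('AttributesToGet must be [NAME] or [NAME, TABLE_TYPE].');
        }
        $options['AttributesToGet'] = $attributes;
    }

    return ['DatabaseName' => $database] + $options;
}
```

For example, `buildGetTablesRequest('sales', ['AttributesToGet' => ['NAME'], 'Expression' => '^orders_.*'])` yields a request for the names of tables whose names match the pattern; the database name and pattern here are placeholders.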
GetTags
$result = $client->getTags([/* ... */]);
$promise = $client->getTagsAsync([/* ... */]);
Retrieves a list of tags associated with a resource.
Parameter Syntax
$result = $client->getTags([
    'ResourceArn' => '<string>', // REQUIRED
]);
Parameter Details
Members
- ResourceArn
-
- Required: Yes
- Type: string
The Amazon Resource Name (ARN) of the resource for which to retrieve tags.
Result Syntax
[ 'Tags' => ['<string>', ...], ]
Result Details
Members
- Tags
-
- Type: Associative array of custom string keys (TagKey) to strings
The requested tags.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- EntityNotFoundException:
A specified entity does not exist.
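Since Tags is an associative array keyed by tag key, it can be inspected with ordinary array functions. The following sketch reports which required tag keys are absent from a result. The function name and the tag keys in the usage note are hypothetical.

```php
<?php
// Hypothetical helper: returns the required tag keys that are missing from a
// GetTags result. $result may be a plain array or an ArrayAccess object.
function missingTagKeys($result, array $requiredKeys): array
{
    $tags = $result['Tags'] ?? [];

    // Tags maps each TagKey to its value, so only the keys are compared.
    return array_values(array_diff($requiredKeys, array_keys($tags)));
}
```

Given a result whose Tags member is `['team' => 'data', 'env' => 'prod']`, calling `missingTagKeys($result, ['team', 'owner'])` reports that only the `owner` key is missing.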
GetTrigger
$result = $client->getTrigger([/* ... */]);
$promise = $client->getTriggerAsync([/* ... */]);
Retrieves the definition of a trigger.
Parameter Syntax
$result = $client->getTrigger([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the trigger to retrieve.
Result Syntax
[
    'Trigger' => [
        'Actions' => [
            [
                'Arguments' => ['<string>', ...],
                'CrawlerName' => '<string>',
                'JobName' => '<string>',
                'NotificationProperty' => [
                    'NotifyDelayAfter' => <integer>,
                ],
                'SecurityConfiguration' => '<string>',
                'Timeout' => <integer>,
            ],
            // ...
        ],
        'Description' => '<string>',
        'EventBatchingCondition' => [
            'BatchSize' => <integer>,
            'BatchWindow' => <integer>,
        ],
        'Id' => '<string>',
        'Name' => '<string>',
        'Predicate' => [
            'Conditions' => [
                [
                    'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                    'CrawlerName' => '<string>',
                    'JobName' => '<string>',
                    'LogicalOperator' => 'EQUALS',
                    'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                ],
                // ...
            ],
            'Logical' => 'AND|ANY',
        ],
        'Schedule' => '<string>',
        'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING',
        'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT',
        'WorkflowName' => '<string>',
    ],
]
Result Details
Members
- Trigger
-
- Type: Trigger structure
The requested trigger definition.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
GetTriggers
$result = $client->getTriggers([/* ... */]);
$promise = $client->getTriggersAsync([/* ... */]);
Gets all the triggers associated with a job.
Parameter Syntax
$result = $client->getTriggers([
    'DependentJobName' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- DependentJobName
-
- Type: string
The name of the job to retrieve triggers for. The trigger that can start this job is returned, and if there is no such trigger, all triggers are returned.
- MaxResults
-
- Type: int
The maximum size of the response.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
Result Syntax
[
    'NextToken' => '<string>',
    'Triggers' => [
        [
            'Actions' => [
                [
                    'Arguments' => ['<string>', ...],
                    'CrawlerName' => '<string>',
                    'JobName' => '<string>',
                    'NotificationProperty' => [
                        'NotifyDelayAfter' => <integer>,
                    ],
                    'SecurityConfiguration' => '<string>',
                    'Timeout' => <integer>,
                ],
                // ...
            ],
            'Description' => '<string>',
            'EventBatchingCondition' => [
                'BatchSize' => <integer>,
                'BatchWindow' => <integer>,
            ],
            'Id' => '<string>',
            'Name' => '<string>',
            'Predicate' => [
                'Conditions' => [
                    [
                        'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                        'CrawlerName' => '<string>',
                        'JobName' => '<string>',
                        'LogicalOperator' => 'EQUALS',
                        'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
                    ],
                    // ...
                ],
                'Logical' => 'AND|ANY',
            ],
            'Schedule' => '<string>',
            'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING',
            'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT',
            'WorkflowName' => '<string>',
        ],
        // ...
    ],
]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, if not all the requested triggers have yet been returned.
- Triggers
-
- Type: Array of Trigger structures
A list of triggers for the specified job.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
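The DependentJobName description above notes that all triggers are returned when no trigger can start the named job, so a caller may want to confirm client-side which returned triggers really reference the job. The sketch below does that by scanning each trigger's Actions for a matching JobName. The function name is hypothetical, and the check relies only on the Trigger shape shown in the Result Syntax.

```php
<?php
// Hypothetical helper: returns the names of the triggers whose Actions include
// the given job. $triggers is the 'Triggers' list from a GetTriggers result.
function triggersStartingJob(iterable $triggers, string $jobName): array
{
    $matches = [];

    foreach ($triggers as $trigger) {
        foreach ($trigger['Actions'] ?? [] as $action) {
            // Actions that start a crawler carry CrawlerName instead of JobName.
            if (($action['JobName'] ?? null) === $jobName) {
                $matches[] = $trigger['Name'] ?? '(unnamed)';
                break;
            }
        }
    }

    return $matches;
}
```

An empty return value for a non-empty Triggers list suggests that the service fell back to returning every trigger.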
GetUnfilteredPartitionMetadata
$result = $client->getUnfilteredPartitionMetadata([/* ... */]);
$promise = $client->getUnfilteredPartitionMetadataAsync([/* ... */]);
Retrieves partition metadata from the Data Catalog that contains unfiltered metadata.
For IAM authorization, the public IAM action associated with this API is glue:GetPartition.
Parameter Syntax
$result = $client->getUnfilteredPartitionMetadata([
    'AuditContext' => [
        'AdditionalAuditContext' => '<string>',
        'AllColumnsRequested' => true || false,
        'RequestedColumns' => ['<string>', ...],
    ],
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionValues' => ['<string>', ...], // REQUIRED
    'QuerySessionContext' => [
        'AdditionalContext' => ['<string>', ...],
        'ClusterId' => '<string>',
        'QueryAuthorizationId' => '<string>',
        'QueryId' => '<string>',
        'QueryStartTime' => <integer || string || DateTime>,
    ],
    'Region' => '<string>',
    'SupportedPermissionTypes' => ['<string>', ...], // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- AuditContext
-
- Type: AuditContext structure
A structure containing Lake Formation audit context information.
- CatalogId
-
- Required: Yes
- Type: string
The catalog ID where the partition resides.
- DatabaseName
-
- Required: Yes
- Type: string
(Required) Specifies the name of a database that contains the partition.
- PartitionValues
-
- Required: Yes
- Type: Array of strings
(Required) A list of partition key values.
- QuerySessionContext
-
- Type: QuerySessionContext structure
A structure used as a protocol between query engines and Lake Formation or Glue. Contains both a Lake Formation generated authorization identifier and information from the request's authorization context.
- Region
-
- Type: string
Specified only if the base tables belong to a different Amazon Web Services Region.
- SupportedPermissionTypes
-
- Required: Yes
- Type: Array of strings
(Required) A list of supported permission types.
- TableName
-
- Required: Yes
- Type: string
(Required) Specifies the name of a table that contains the partition.
Result Syntax
[ 'AuthorizedColumns' => ['<string>', ...], 'IsRegisteredWithLakeFormation' => true || false, 'Partition' => [ 'CatalogId' => '<string>', 'CreationTime' => <DateTime>, 'DatabaseName' => '<string>', 'LastAccessTime' => <DateTime>, 'LastAnalyzedTime' => <DateTime>, 'Parameters' => ['<string>', ...], 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', 'SortOrder' => <integer>, ], // ... ], 'StoredAsSubDirectories' => true || false, ], 'TableName' => '<string>', 'Values' => ['<string>', ...], ], ]
Result Details
Members
- AuthorizedColumns
-
- Type: Array of strings
A list of column names that the user has been granted access to.
- IsRegisteredWithLakeFormation
-
- Type: boolean
A Boolean value that indicates whether the partition location is registered with Lake Formation.
- Partition
-
- Type: Partition structure
A Partition object containing the partition metadata.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
- PermissionTypeMismatchException:
The operation timed out.
- FederationSourceException:
A federation source failed.
- FederationSourceRetryableException:
A federation source failed, but the operation may be retried.
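Because the metadata is unfiltered, the result carries AuthorizedColumns separately from the partition's full column list, and applying the restriction is left to the caller. The following is a minimal sketch of one such step, assuming the caller wants only the column definitions whose names appear in AuthorizedColumns. The function name is hypothetical, and a real engine would likely need to honor further permission details that this section does not describe.

```php
<?php
// Hypothetical helper: keeps only the partition columns whose names appear in
// AuthorizedColumns. $result may be a plain array or an ArrayAccess object.
function authorizedPartitionColumns($result): array
{
    $authorized = $result['AuthorizedColumns'] ?? [];
    $columns = $result['Partition']['StorageDescriptor']['Columns'] ?? [];

    return array_values(array_filter(
        $columns,
        fn(array $column) => in_array($column['Name'] ?? null, $authorized, true)
    ));
}
```

The returned list preserves the order of the columns in the storage descriptor.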
GetUnfilteredPartitionsMetadata
$result = $client->getUnfilteredPartitionsMetadata([/* ... */]);
$promise = $client->getUnfilteredPartitionsMetadataAsync([/* ... */]);
Retrieves partition metadata from the Data Catalog that contains unfiltered metadata.
For IAM authorization, the public IAM action associated with this API is glue:GetPartitions.
Parameter Syntax
$result = $client->getUnfilteredPartitionsMetadata([
    'AuditContext' => [
        'AdditionalAuditContext' => '<string>',
        'AllColumnsRequested' => true || false,
        'RequestedColumns' => ['<string>', ...],
    ],
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'Expression' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'QuerySessionContext' => [
        'AdditionalContext' => ['<string>', ...],
        'ClusterId' => '<string>',
        'QueryAuthorizationId' => '<string>',
        'QueryId' => '<string>',
        'QueryStartTime' => <integer || string || DateTime>,
    ],
    'Region' => '<string>',
    'Segment' => [
        'SegmentNumber' => <integer>, // REQUIRED
        'TotalSegments' => <integer>, // REQUIRED
    ],
    'SupportedPermissionTypes' => ['<string>', ...], // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- AuditContext
-
- Type: AuditContext structure
A structure containing Lake Formation audit context information.
- CatalogId
-
- Required: Yes
- Type: string
The ID of the Data Catalog where the partitions in question reside. If none is provided, the AWS account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partitions reside.
- Expression
-
- Type: string
An expression that filters the partitions to be returned.
The expression uses SQL syntax similar to the SQL WHERE filter clause. The SQL statement parser JSQLParser parses the expression.
Operators: The following are the operators that you can use in the Expression API call:
- =
-
Checks whether the values of the two operands are equal; if yes, then the condition becomes true.
Example: Assume 'variable a' holds 10 and 'variable b' holds 20.
(a = b) is not true.
- < >
-
Checks whether the values of two operands are equal; if the values are not equal, then the condition becomes true.
Example: (a < > b) is true.
- >
-
Checks whether the value of the left operand is greater than the value of the right operand; if yes, then the condition becomes true.
Example: (a > b) is not true.
- <
-
Checks whether the value of the left operand is less than the value of the right operand; if yes, then the condition becomes true.
Example: (a < b) is true.
- >=
-
Checks whether the value of the left operand is greater than or equal to the value of the right operand; if yes, then the condition becomes true.
Example: (a >= b) is not true.
- <=
-
Checks whether the value of the left operand is less than or equal to the value of the right operand; if yes, then the condition becomes true.
Example: (a <= b) is true.
- AND, OR, IN, BETWEEN, LIKE, NOT, IS NULL
-
Logical operators.
Supported Partition Key Types: The following are the supported partition keys.
- string
- date
- timestamp
- int
- bigint
- long
- tinyint
- smallint
- decimal
If a type is encountered that is not valid, an exception is thrown.
- MaxResults
-
- Type: int
The maximum number of partitions to return in a single response.
- NextToken
-
- Type: string
A continuation token, if this is not the first call to retrieve these partitions.
- QuerySessionContext
-
- Type: QuerySessionContext structure
A structure used as a protocol between query engines and Lake Formation or Glue. Contains both a Lake Formation generated authorization identifier and information from the request's authorization context.
- Region
-
- Type: string
Specified only if the base tables belong to a different Amazon Web Services Region.
- Segment
-
- Type: Segment structure
The segment of the table's partitions to scan in this request.
- SupportedPermissionTypes
-
- Required: Yes
- Type: Array of strings
A list of supported permission types.
- TableName
-
- Required: Yes
- Type: string
The name of the table that contains the partition.
Result Syntax
[ 'NextToken' => '<string>', 'UnfilteredPartitions' => [ [ 'AuthorizedColumns' => ['<string>', ...], 'IsRegisteredWithLakeFormation' => true || false, 'Partition' => [ 'CatalogId' => '<string>', 'CreationTime' => <DateTime>, 'DatabaseName' => '<string>', 'LastAccessTime' => <DateTime>, 'LastAnalyzedTime' => <DateTime>, 'Parameters' => ['<string>', ...], 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', 'SortOrder' => <integer>, ], // ... ], 'StoredAsSubDirectories' => true || false, ], 'TableName' => '<string>', 'Values' => ['<string>', ...], ], ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, if the returned list of partitions does not include the last one.
- UnfilteredPartitions
-
- Type: Array of UnfilteredPartition structures
A list of requested partitions.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
- PermissionTypeMismatchException:
A supported permission type does not match the user's level of permissions on the table.
- FederationSourceException:
A federation source failed.
- FederationSourceRetryableException:
A federation source failed, but the operation may be retried.
GetUnfilteredTableMetadata
$result = $client->getUnfilteredTableMetadata([/* ... */]);
$promise = $client->getUnfilteredTableMetadataAsync([/* ... */]);
Allows a third-party analytical engine to retrieve unfiltered table metadata from the Data Catalog.
For IAM authorization, the public IAM action associated with this API is glue:GetTable.
Parameter Syntax
$result = $client->getUnfilteredTableMetadata([ 'AuditContext' => [ 'AdditionalAuditContext' => '<string>', 'AllColumnsRequested' => true || false, 'RequestedColumns' => ['<string>', ...], ], 'CatalogId' => '<string>', // REQUIRED 'DatabaseName' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'ParentResourceArn' => '<string>', 'Permissions' => ['<string>', ...], 'QuerySessionContext' => [ 'AdditionalContext' => ['<string>', ...], 'ClusterId' => '<string>', 'QueryAuthorizationId' => '<string>', 'QueryId' => '<string>', 'QueryStartTime' => <integer || string || DateTime>, ], 'Region' => '<string>', 'RootResourceArn' => '<string>', 'SupportedDialect' => [ 'Dialect' => 'REDSHIFT|ATHENA|SPARK', 'DialectVersion' => '<string>', ], 'SupportedPermissionTypes' => ['<string>', ...], // REQUIRED ]);
Parameter Details
Members
- AuditContext
-
- Type: AuditContext structure
A structure containing Lake Formation audit context information.
- CatalogId
-
- Required: Yes
- Type: string
The catalog ID where the table resides.
- DatabaseName
-
- Required: Yes
- Type: string
(Required) Specifies the name of a database that contains the table.
- Name
-
- Required: Yes
- Type: string
(Required) Specifies the name of a table for which you are requesting metadata.
- ParentResourceArn
-
- Type: string
The resource ARN of the view.
- Permissions
-
- Type: Array of strings
The Lake Formation data permissions of the caller on the table. Used to authorize the call when no view context is found.
- QuerySessionContext
-
- Type: QuerySessionContext structure
A structure used as a protocol between query engines and Lake Formation or Glue. Contains both a Lake Formation generated authorization identifier and information from the request's authorization context.
- Region
-
- Type: string
Specified only if the base tables belong to a different Amazon Web Services Region.
- RootResourceArn
-
- Type: string
The resource ARN of the root view in a chain of nested views.
- SupportedDialect
-
- Type: SupportedDialect structure
A structure specifying the dialect and dialect version used by the query engine.
- SupportedPermissionTypes
-
- Required: Yes
- Type: Array of strings
Indicates the level of filtering a third-party analytical engine is capable of enforcing when calling the GetUnfilteredTableMetadata API operation. Accepted values are:
- COLUMN_PERMISSION: Column permissions ensure that users can access only specific columns in the table. If particular columns contain sensitive data, data lake administrators can define column filters that exclude access to those columns.
- CELL_FILTER_PERMISSION: Cell-level filtering combines column filtering (include or exclude columns) and row filter expressions to restrict access to individual elements in the table.
- NESTED_PERMISSION: Nested permissions combine cell-level filtering and nested column filtering to restrict access to columns and/or nested columns in specific rows based on row filter expressions.
- NESTED_CELL_PERMISSION: Nested cell permissions combine nested permissions with nested cell-level filtering. This allows different subsets of nested columns to be restricted based on an array of row filter expressions.
Note: Each of these permission types follows a hierarchical order in which each subsequent permission type includes all permissions of the previous type.
Important: If you provide a supported permission type that doesn't match the user's level of permissions on the table, Lake Formation raises an exception. For example, if the third-party engine calling the GetUnfilteredTableMetadata operation can enforce only column-level filtering, and the user has nested cell filtering applied on the table, Lake Formation throws an exception and does not return unfiltered table metadata or data access credentials.
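The accepted values listed above can be checked locally before a request is sent. The following is a minimal sketch, not part of the SDK: `buildUnfilteredTableMetadataParams` and the `ACCEPTED_PERMISSION_TYPES` constant are hypothetical names introduced here for illustration. The helper only assembles the required parameters and rejects unknown permission types; it does not call the service and cannot tell whether the chosen types match the user's permissions on the table.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): assembles the required
// parameters for getUnfilteredTableMetadata and validates the
// SupportedPermissionTypes values against the documented list.

const ACCEPTED_PERMISSION_TYPES = [
    'COLUMN_PERMISSION',
    'CELL_FILTER_PERMISSION',
    'NESTED_PERMISSION',
    'NESTED_CELL_PERMISSION',
];

function buildUnfilteredTableMetadataParams(
    string $catalogId,
    string $databaseName,
    string $tableName,
    array $supportedPermissionTypes
): array {
    if ($supportedPermissionTypes === []) {
        throw new InvalidArgumentException('SupportedPermissionTypes is required.');
    }
    // Anything outside the documented list is rejected before the call is made.
    $unknown = array_diff($supportedPermissionTypes, ACCEPTED_PERMISSION_TYPES);
    if ($unknown !== []) {
        throw new InvalidArgumentException(
            'Unknown permission type(s): ' . implode(', ', $unknown)
        );
    }
    return [
        'CatalogId' => $catalogId,
        'DatabaseName' => $databaseName,
        'Name' => $tableName,
        'SupportedPermissionTypes' => array_values($supportedPermissionTypes),
    ];
}
```

With a configured client, the returned array could be passed straight to `$client->getUnfilteredTableMetadata(...)`; optional members such as AuditContext or Region would be merged in by the caller.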
Result Syntax
[ 'AuthorizedColumns' => ['<string>', ...], 'CellFilters' => [ [ 'ColumnName' => '<string>', 'RowFilterExpression' => '<string>', ], // ... ], 'IsMultiDialectView' => true || false, 'IsProtected' => true || false, 'IsRegisteredWithLakeFormation' => true || false, 'Permissions' => ['<string>', ...], 'QueryAuthorizationId' => '<string>', 'ResourceArn' => '<string>', 'RowFilter' => '<string>', 'Table' => [ 'CatalogId' => '<string>', 'CreateTime' => <DateTime>, 'CreatedBy' => '<string>', 'DatabaseName' => '<string>', 'Description' => '<string>', 'FederatedTable' => [ 'ConnectionName' => '<string>', 'DatabaseIdentifier' => '<string>', 'Identifier' => '<string>', ], 'IsMultiDialectView' => true || false, 'IsRegisteredWithLakeFormation' => true || false, 'LastAccessTime' => <DateTime>, 'LastAnalyzedTime' => <DateTime>, 'Name' => '<string>', 'Owner' => '<string>', 'Parameters' => ['<string>', ...], 'PartitionKeys' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Retention' => <integer>, 'Status' => [ 'Action' => 'UPDATE|CREATE', 'Details' => [ 'RequestedChange' => [...], // RECURSIVE 'ViewValidations' => [ [ 'Dialect' => 'REDSHIFT|ATHENA|SPARK', 'DialectVersion' => '<string>', 'Error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], 'State' => 'QUEUED|IN_PROGRESS|SUCCESS|STOPPED|FAILED', 'UpdateTime' => <DateTime>, 'ViewValidationText' => '<string>', ], // ... ], ], 'Error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], 'RequestTime' => <DateTime>, 'RequestedBy' => '<string>', 'State' => 'QUEUED|IN_PROGRESS|SUCCESS|STOPPED|FAILED', 'UpdateTime' => <DateTime>, 'UpdatedBy' => '<string>', ], 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... 
], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', 'SortOrder' => <integer>, ], // ... ], 'StoredAsSubDirectories' => true || false, ], 'TableType' => '<string>', 'TargetTable' => [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'Name' => '<string>', 'Region' => '<string>', ], 'UpdateTime' => <DateTime>, 'VersionId' => '<string>', 'ViewDefinition' => [ 'Definer' => '<string>', 'IsProtected' => true || false, 'Representations' => [ [ 'Dialect' => 'REDSHIFT|ATHENA|SPARK', 'DialectVersion' => '<string>', 'IsStale' => true || false, 'ValidationConnection' => '<string>', 'ViewExpandedText' => '<string>', 'ViewOriginalText' => '<string>', ], // ... ], 'SubObjects' => ['<string>', ...], ], 'ViewExpandedText' => '<string>', 'ViewOriginalText' => '<string>', ], ]
Result Details
Members
- AuthorizedColumns
-
- Type: Array of strings
A list of column names that the user has been granted access to.
- CellFilters
-
- Type: Array of ColumnRowFilter structures
A list of column row filters.
- IsMultiDialectView
-
- Type: boolean
Specifies whether the view supports the SQL dialects of one or more different query engines and can therefore be read by those engines.
- IsProtected
-
- Type: boolean
A flag that instructs the engine not to push user-provided operations into the logical plan of the view during query planning. However, setting this flag does not guarantee that the engine will comply. Refer to the engine's documentation to understand the guarantees provided, if any.
- IsRegisteredWithLakeFormation
-
- Type: boolean
A Boolean value that indicates whether the table location is registered with Lake Formation.
- Permissions
-
- Type: Array of strings
The Lake Formation data permissions of the caller on the table. Used to authorize the call when no view context is found.
- QueryAuthorizationId
-
- Type: string
A cryptographically generated query identifier created by Glue or Lake Formation.
- ResourceArn
-
- Type: string
The resource ARN of the parent resource extracted from the request.
- RowFilter
-
- Type: string
The filter that applies to the table. For example, when applying the filter in SQL, it would go in the WHERE clause and can be evaluated by using an AND operator with any other predicates applied by the user querying the table.
- Table
-
- Type: Table structure
A Table object containing the table metadata.
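The RowFilter description above says the filter is combined with the user's own predicates using AND. A minimal sketch of that combination follows; `combineRowFilter` is a hypothetical helper, and both inputs are treated as opaque SQL fragments with no parsing or escaping.

```php
<?php
// Sketch: combine the RowFilter returned by the service with a predicate
// supplied by the querying user, joined with AND.
// Hypothetical helper; inputs are opaque SQL fragments and are not validated.

function combineRowFilter(?string $rowFilter, ?string $userPredicate): ?string
{
    $parts = [];
    foreach ([$rowFilter, $userPredicate] as $fragment) {
        if ($fragment !== null && trim($fragment) !== '') {
            // Parenthesize each fragment so an OR inside one of them
            // cannot change how the AND between them is evaluated.
            $parts[] = '(' . trim($fragment) . ')';
        }
    }
    // Null means there is nothing to put in the WHERE clause.
    return $parts === [] ? null : implode(' AND ', $parts);
}
```

For example, combining the row filter `region = 'EU'` with the user predicate `amount > 100 OR priority = 1` yields `(region = 'EU') AND (amount > 100 OR priority = 1)`. The parentheses matter: without them the OR would let rows outside the row filter through.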
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
- PermissionTypeMismatchException:
A supported permission type does not match the user's level of permissions on the table.
- FederationSourceException:
A federation source failed.
- FederationSourceRetryableException:
A federation source failed, but the operation may be retried.
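Because FederationSourceRetryableException signals that the operation may be retried, a caller might wrap the request in a bounded retry loop. The sketch below is one way to do that under stated assumptions: `callWithRetry` is a hypothetical helper, the attempt count and delays are arbitrary, and the decision about which errors are retryable is delegated to a caller-supplied predicate.

```php
<?php
// Sketch of a bounded retry loop with exponential backoff.
// $operation performs the call; $isRetryable decides whether a thrown
// error justifies another attempt. Hypothetical helper, not part of the SDK.

function callWithRetry(
    callable $operation,
    callable $isRetryable,
    int $maxAttempts = 3,
    int $baseDelayMs = 200
) {
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (Throwable $e) {
            // Give up on the last attempt or on a non-retryable error.
            if ($attempt >= $maxAttempts || !$isRetryable($e)) {
                throw $e;
            }
            // Wait baseDelay, then 2x, 4x, ... before the next attempt.
            usleep($baseDelayMs * 1000 * (2 ** ($attempt - 1)));
        }
    }
}
```

With the SDK, the predicate could test `$e instanceof Aws\Exception\AwsException && $e->getAwsErrorCode() === 'FederationSourceRetryableException'`, and the operation could be a closure that calls `$client->getUnfilteredTableMetadata($params)`.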
GetUsageProfile
$result = $client->getUsageProfile([/* ... */]);
$promise = $client->getUsageProfileAsync([/* ... */]);
Retrieves information about the specified Glue usage profile.
Parameter Syntax
$result = $client->getUsageProfile([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the usage profile to retrieve.
Result Syntax
[ 'Configuration' => [ 'JobConfiguration' => [ '<NameString>' => [ 'AllowedValues' => ['<string>', ...], 'DefaultValue' => '<string>', 'MaxValue' => '<string>', 'MinValue' => '<string>', ], // ... ], 'SessionConfiguration' => [ '<NameString>' => [ 'AllowedValues' => ['<string>', ...], 'DefaultValue' => '<string>', 'MaxValue' => '<string>', 'MinValue' => '<string>', ], // ... ], ], 'CreatedOn' => <DateTime>, 'Description' => '<string>', 'LastModifiedOn' => <DateTime>, 'Name' => '<string>', ]
Result Details
Members
- Configuration
-
- Type: ProfileConfiguration structure
A ProfileConfiguration object specifying the job and session values for the profile.
- CreatedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the usage profile was created.
- Description
-
- Type: string
A description of the usage profile.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the usage profile was last modified.
- Name
-
- Type: string
The name of the usage profile.
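The Configuration member nests two maps, JobConfiguration and SessionConfiguration, each keyed by a configuration name. As a rough illustration of walking that shape, the sketch below collects the DefaultValue of every key. `usageProfileDefaults` is a hypothetical helper, and the shape is taken from the Result Syntax above.

```php
<?php
// Sketch: collect the DefaultValue of every key in a usage profile's
// JobConfiguration and SessionConfiguration maps.
// Hypothetical helper; keys without a DefaultValue are skipped.

function usageProfileDefaults(array $configuration): array
{
    $defaults = ['JobConfiguration' => [], 'SessionConfiguration' => []];
    foreach (array_keys($defaults) as $section) {
        foreach ($configuration[$section] ?? [] as $key => $settings) {
            if (isset($settings['DefaultValue'])) {
                $defaults[$section][$key] = $settings['DefaultValue'];
            }
        }
    }
    return $defaults;
}
```

Given a result from `getUsageProfile`, the helper could be called as `usageProfileDefaults($result['Configuration'] ?? [])`.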
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- OperationNotSupportedException:
The operation is not available in the region.
GetUserDefinedFunction
$result = $client->getUserDefinedFunction([/* ... */]);
$promise = $client->getUserDefinedFunctionAsync([/* ... */]);
Retrieves a specified function definition from the Data Catalog.
Parameter Syntax
$result = $client->getUserDefinedFunction([ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', // REQUIRED 'FunctionName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the function to be retrieved is located. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the function is located.
- FunctionName
-
- Required: Yes
- Type: string
The name of the function.
Result Syntax
[ 'UserDefinedFunction' => [ 'CatalogId' => '<string>', 'ClassName' => '<string>', 'CreateTime' => <DateTime>, 'DatabaseName' => '<string>', 'FunctionName' => '<string>', 'OwnerName' => '<string>', 'OwnerType' => 'USER|ROLE|GROUP', 'ResourceUris' => [ [ 'ResourceType' => 'JAR|FILE|ARCHIVE', 'Uri' => '<string>', ], // ... ], ], ]
Result Details
Members
- UserDefinedFunction
-
- Type: UserDefinedFunction structure
The requested function definition.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
GetUserDefinedFunctions
$result = $client->getUserDefinedFunctions([/* ... */]);
$promise = $client->getUserDefinedFunctionsAsync([/* ... */]);
Retrieves multiple function definitions from the Data Catalog.
Parameter Syntax
$result = $client->getUserDefinedFunctions([ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'MaxResults' => <integer>, 'NextToken' => '<string>', 'Pattern' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the functions to be retrieved are located. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Type: string
The name of the catalog database where the functions are located. If none is provided, functions from all the databases across the catalog will be returned.
- MaxResults
-
- Type: int
The maximum number of functions to return in one response.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
- Pattern
-
- Required: Yes
- Type: string
A function-name pattern string that filters the function definitions returned.
Result Syntax
[ 'NextToken' => '<string>', 'UserDefinedFunctions' => [ [ 'CatalogId' => '<string>', 'ClassName' => '<string>', 'CreateTime' => <DateTime>, 'DatabaseName' => '<string>', 'FunctionName' => '<string>', 'OwnerName' => '<string>', 'OwnerType' => 'USER|ROLE|GROUP', 'ResourceUris' => [ [ 'ResourceType' => 'JAR|FILE|ARCHIVE', 'Uri' => '<string>', ], // ... ], ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, if the list of functions returned does not include the last requested function.
- UserDefinedFunctions
-
- Type: Array of UserDefinedFunction structures
A list of requested function definitions.
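When NextToken is present in the result, more function definitions remain, and the token is passed back in the next request. The sketch below follows that loop until no token is returned. `fetchAllPages` is a hypothetical helper; it takes the call as a callable so the same loop can serve any operation in this reference that uses the NextToken convention.

```php
<?php
// Sketch: follow NextToken until the service stops returning one.
// $fetchPage receives the parameter array for one call and returns one page
// (an array, or an ArrayAccess object, holding $listKey and optionally
// NextToken). Hypothetical helper, not part of the SDK.

function fetchAllPages(callable $fetchPage, array $params, string $listKey): array
{
    $items = [];
    unset($params['NextToken']); // always start from the first page
    do {
        $page = $fetchPage($params);
        foreach ($page[$listKey] ?? [] as $item) {
            $items[] = $item;
        }
        // Carry the continuation token into the next request, if any.
        $token = $page['NextToken'] ?? null;
        $params['NextToken'] = $token;
    } while ($token !== null && $token !== '');
    return $items;
}
```

With a configured client, a call might look like `fetchAllPages(function (array $p) use ($client) { return $client->getUserDefinedFunctions($p); }, ['Pattern' => '<string>'], 'UserDefinedFunctions')`. The helper holds every item in memory, so for very large catalogs processing each page as it arrives may be preferable.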
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- GlueEncryptionException:
An encryption operation failed.
GetWorkflow
$result = $client->getWorkflow([/* ... */]);
$promise = $client->getWorkflowAsync([/* ... */]);
Retrieves resource metadata for a workflow.
Parameter Syntax
$result = $client->getWorkflow([ 'IncludeGraph' => true || false, 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- IncludeGraph
-
- Type: boolean
Specifies whether to include a graph when returning the workflow resource metadata.
- Name
-
- Required: Yes
- Type: string
The name of the workflow to retrieve.
Result Syntax
[ 'Workflow' => [ 'BlueprintDetails' => [ 'BlueprintName' => '<string>', 'RunId' => '<string>', ], 'CreatedOn' => <DateTime>, 'DefaultRunProperties' => ['<string>', ...], 'Description' => '<string>', 'Graph' => [ 'Edges' => [ [ 'DestinationId' => '<string>', 'SourceId' => '<string>', ], // ... ], 'Nodes' => [ [ 'CrawlerDetails' => [ 'Crawls' => [ [ 'CompletedOn' => <DateTime>, 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'StartedOn' => <DateTime>, 'State' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', ], // ... ], ], 'JobDetails' => [ 'JobRuns' => [ [ 'AllocatedCapacity' => <integer>, 'Arguments' => ['<string>', ...], 'Attempt' => <integer>, 'CompletedOn' => <DateTime>, 'DPUSeconds' => <float>, 'ErrorMessage' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionTime' => <integer>, 'GlueVersion' => '<string>', 'Id' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobName' => '<string>', 'JobRunQueuingEnabled' => true || false, 'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', 'LastModifiedOn' => <DateTime>, 'LogGroupName' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'PredecessorRuns' => [ [ 'JobName' => '<string>', 'RunId' => '<string>', ], // ... ], 'PreviousRunId' => '<string>', 'ProfileName' => '<string>', 'SecurityConfiguration' => '<string>', 'StartedOn' => <DateTime>, 'StateDetail' => '<string>', 'Timeout' => <integer>, 'TriggerName' => '<string>', 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], // ... ], ], 'Name' => '<string>', 'TriggerDetails' => [ 'Trigger' => [ 'Actions' => [ [ 'Arguments' => ['<string>', ...], 'CrawlerName' => '<string>', 'JobName' => '<string>', 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'SecurityConfiguration' => '<string>', 'Timeout' => <integer>, ], // ... 
], 'Description' => '<string>', 'EventBatchingCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Id' => '<string>', 'Name' => '<string>', 'Predicate' => [ 'Conditions' => [ [ 'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', 'CrawlerName' => '<string>', 'JobName' => '<string>', 'LogicalOperator' => 'EQUALS', 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', ], // ... ], 'Logical' => 'AND|ANY', ], 'Schedule' => '<string>', 'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING', 'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT', 'WorkflowName' => '<string>', ], ], 'Type' => 'CRAWLER|JOB|TRIGGER', 'UniqueId' => '<string>', ], // ... ], ], 'LastModifiedOn' => <DateTime>, 'LastRun' => [ 'CompletedOn' => <DateTime>, 'ErrorMessage' => '<string>', 'Graph' => [ 'Edges' => [ [ 'DestinationId' => '<string>', 'SourceId' => '<string>', ], // ... ], 'Nodes' => [ [ 'CrawlerDetails' => [ 'Crawls' => [ [ 'CompletedOn' => <DateTime>, 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'StartedOn' => <DateTime>, 'State' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', ], // ... 
], ], 'JobDetails' => [ 'JobRuns' => [ [ 'AllocatedCapacity' => <integer>, 'Arguments' => ['<string>', ...], 'Attempt' => <integer>, 'CompletedOn' => <DateTime>, 'DPUSeconds' => <float>, 'ErrorMessage' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionTime' => <integer>, 'GlueVersion' => '<string>', 'Id' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobName' => '<string>', 'JobRunQueuingEnabled' => true || false, 'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', 'LastModifiedOn' => <DateTime>, 'LogGroupName' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'PredecessorRuns' => [ [ 'JobName' => '<string>', 'RunId' => '<string>', ], // ... ], 'PreviousRunId' => '<string>', 'ProfileName' => '<string>', 'SecurityConfiguration' => '<string>', 'StartedOn' => <DateTime>, 'StateDetail' => '<string>', 'Timeout' => <integer>, 'TriggerName' => '<string>', 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], // ... ], ], 'Name' => '<string>', 'TriggerDetails' => [ 'Trigger' => [ 'Actions' => [ [ 'Arguments' => ['<string>', ...], 'CrawlerName' => '<string>', 'JobName' => '<string>', 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'SecurityConfiguration' => '<string>', 'Timeout' => <integer>, ], // ... ], 'Description' => '<string>', 'EventBatchingCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Id' => '<string>', 'Name' => '<string>', 'Predicate' => [ 'Conditions' => [ [ 'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', 'CrawlerName' => '<string>', 'JobName' => '<string>', 'LogicalOperator' => 'EQUALS', 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', ], // ... 
], 'Logical' => 'AND|ANY', ], 'Schedule' => '<string>', 'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING', 'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT', 'WorkflowName' => '<string>', ], ], 'Type' => 'CRAWLER|JOB|TRIGGER', 'UniqueId' => '<string>', ], // ... ], ], 'Name' => '<string>', 'PreviousRunId' => '<string>', 'StartedOn' => <DateTime>, 'StartingEventBatchCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Statistics' => [ 'ErroredActions' => <integer>, 'FailedActions' => <integer>, 'RunningActions' => <integer>, 'StoppedActions' => <integer>, 'SucceededActions' => <integer>, 'TimeoutActions' => <integer>, 'TotalActions' => <integer>, 'WaitingActions' => <integer>, ], 'Status' => 'RUNNING|COMPLETED|STOPPING|STOPPED|ERROR', 'WorkflowRunId' => '<string>', 'WorkflowRunProperties' => ['<string>', ...], ], 'MaxConcurrentRuns' => <integer>, 'Name' => '<string>', ], ]
Result Details
Members
- Workflow
-
- Type: Workflow structure
The resource metadata for the workflow.
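When IncludeGraph is set, the Workflow structure carries a Graph made of Nodes and Edges. The sketch below converts that graph into a name-based adjacency list, which is often easier to inspect. `workflowAdjacency` is a hypothetical helper, and it assumes that each edge's SourceId and DestinationId refer to the UniqueId of a node.

```php
<?php
// Sketch: turn the Graph member of a Workflow (Nodes + Edges) into an
// adjacency list mapping each node name to the names of its successors.
// Assumes Edge.SourceId / Edge.DestinationId match Node.UniqueId.

function workflowAdjacency(array $graph): array
{
    // Map UniqueId => display name (fall back to the ID when Name is absent).
    $names = [];
    foreach ($graph['Nodes'] ?? [] as $node) {
        $names[$node['UniqueId']] = $node['Name'] ?? $node['UniqueId'];
    }
    // Every node gets an entry, even if it has no outgoing edges.
    $adjacency = array_fill_keys(array_values($names), []);
    foreach ($graph['Edges'] ?? [] as $edge) {
        $from = $names[$edge['SourceId']] ?? $edge['SourceId'];
        $to = $names[$edge['DestinationId']] ?? $edge['DestinationId'];
        $adjacency[$from][] = $to;
    }
    return $adjacency;
}
```

Given a result from `getWorkflow` with `'IncludeGraph' => true`, the helper could be called as `workflowAdjacency($result['Workflow']['Graph'] ?? [])`. If two nodes share a name, their entries are merged, so keying by UniqueId instead would be the safer choice in that case.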
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
GetWorkflowRun
$result = $client->getWorkflowRun([/* ... */]);
$promise = $client->getWorkflowRunAsync([/* ... */]);
Retrieves the metadata for a given workflow run. Run history for workflows and job runs is accessible for 90 days.
Parameter Syntax
$result = $client->getWorkflowRun([ 'IncludeGraph' => true || false, 'Name' => '<string>', // REQUIRED 'RunId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- IncludeGraph
-
- Type: boolean
Specifies whether to include the workflow graph in the response.
- Name
-
- Required: Yes
- Type: string
Name of the workflow being run.
- RunId
-
- Required: Yes
- Type: string
The ID of the workflow run.
Result Syntax
[ 'Run' => [ 'CompletedOn' => <DateTime>, 'ErrorMessage' => '<string>', 'Graph' => [ 'Edges' => [ [ 'DestinationId' => '<string>', 'SourceId' => '<string>', ], // ... ], 'Nodes' => [ [ 'CrawlerDetails' => [ 'Crawls' => [ [ 'CompletedOn' => <DateTime>, 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'StartedOn' => <DateTime>, 'State' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', ], // ... ], ], 'JobDetails' => [ 'JobRuns' => [ [ 'AllocatedCapacity' => <integer>, 'Arguments' => ['<string>', ...], 'Attempt' => <integer>, 'CompletedOn' => <DateTime>, 'DPUSeconds' => <float>, 'ErrorMessage' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionTime' => <integer>, 'GlueVersion' => '<string>', 'Id' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobName' => '<string>', 'JobRunQueuingEnabled' => true || false, 'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', 'LastModifiedOn' => <DateTime>, 'LogGroupName' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'PredecessorRuns' => [ [ 'JobName' => '<string>', 'RunId' => '<string>', ], // ... ], 'PreviousRunId' => '<string>', 'ProfileName' => '<string>', 'SecurityConfiguration' => '<string>', 'StartedOn' => <DateTime>, 'StateDetail' => '<string>', 'Timeout' => <integer>, 'TriggerName' => '<string>', 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], // ... ], ], 'Name' => '<string>', 'TriggerDetails' => [ 'Trigger' => [ 'Actions' => [ [ 'Arguments' => ['<string>', ...], 'CrawlerName' => '<string>', 'JobName' => '<string>', 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'SecurityConfiguration' => '<string>', 'Timeout' => <integer>, ], // ... 
], 'Description' => '<string>', 'EventBatchingCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Id' => '<string>', 'Name' => '<string>', 'Predicate' => [ 'Conditions' => [ [ 'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', 'CrawlerName' => '<string>', 'JobName' => '<string>', 'LogicalOperator' => 'EQUALS', 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', ], // ... ], 'Logical' => 'AND|ANY', ], 'Schedule' => '<string>', 'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING', 'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT', 'WorkflowName' => '<string>', ], ], 'Type' => 'CRAWLER|JOB|TRIGGER', 'UniqueId' => '<string>', ], // ... ], ], 'Name' => '<string>', 'PreviousRunId' => '<string>', 'StartedOn' => <DateTime>, 'StartingEventBatchCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Statistics' => [ 'ErroredActions' => <integer>, 'FailedActions' => <integer>, 'RunningActions' => <integer>, 'StoppedActions' => <integer>, 'SucceededActions' => <integer>, 'TimeoutActions' => <integer>, 'TotalActions' => <integer>, 'WaitingActions' => <integer>, ], 'Status' => 'RUNNING|COMPLETED|STOPPING|STOPPED|ERROR', 'WorkflowRunId' => '<string>', 'WorkflowRunProperties' => ['<string>', ...], ], ]
Result Details
Members
- Run
-
- Type: WorkflowRun structure
The requested workflow run metadata.
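The Run member combines a Status value with a Statistics block of per-outcome action counts. As an illustration of reading both together, the sketch below condenses a run into a small summary. `summarizeWorkflowRun` is a hypothetical helper, and treating COMPLETED, STOPPED, and ERROR as terminal states is an assumption drawn from the enum names in the Result Syntax rather than a documented guarantee.

```php
<?php
// Sketch: summarize a WorkflowRun structure.
// Assumption: COMPLETED, STOPPED and ERROR are terminal statuses.

function summarizeWorkflowRun(array $run): array
{
    $stats = $run['Statistics'] ?? [];
    // Actions that ended in any state other than success.
    $unsuccessful = ($stats['FailedActions'] ?? 0)
        + ($stats['ErroredActions'] ?? 0)
        + ($stats['TimeoutActions'] ?? 0)
        + ($stats['StoppedActions'] ?? 0);
    $status = $run['Status'] ?? 'UNKNOWN';
    return [
        'status' => $status,
        'finished' => in_array($status, ['COMPLETED', 'STOPPED', 'ERROR'], true),
        'unsuccessfulActions' => $unsuccessful,
        'totalActions' => $stats['TotalActions'] ?? 0,
    ];
}
```

Given a result from `getWorkflowRun`, the helper could be called as `summarizeWorkflowRun($result['Run'] ?? [])`. A run whose status is COMPLETED can still report unsuccessful actions, which is why the two values are returned separately.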
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
GetWorkflowRunProperties
$result = $client->getWorkflowRunProperties([/* ... */]);
$promise = $client->getWorkflowRunPropertiesAsync([/* ... */]);
Retrieves the workflow run properties which were set during the run.
Parameter Syntax
$result = $client->getWorkflowRunProperties([ 'Name' => '<string>', // REQUIRED 'RunId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
Name of the workflow which was run.
- RunId
-
- Required: Yes
- Type: string
The ID of the workflow run whose run properties should be returned.
Result Syntax
[ 'RunProperties' => ['<string>', ...], ]
Result Details
Members
- RunProperties
-
- Type: Associative array of custom string keys (IdString) to strings
The workflow run properties which were set during the specified run.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
GetWorkflowRuns
$result = $client->getWorkflowRuns([/* ... */]);
$promise = $client->getWorkflowRunsAsync([/* ... */]);
Retrieves metadata for all runs of a given workflow.
Parameter Syntax
$result = $client->getWorkflowRuns([ 'IncludeGraph' => true || false, 'MaxResults' => <integer>, 'Name' => '<string>', // REQUIRED 'NextToken' => '<string>', ]);
Parameter Details
Members
- IncludeGraph
-
- Type: boolean
Specifies whether to include the workflow graph in the response.
- MaxResults
-
- Type: int
The maximum number of workflow runs to be included in the response.
- Name
-
- Required: Yes
- Type: string
The name of the workflow whose run metadata should be returned.
- NextToken
-
- Type: string
A continuation token, if this is a continuation request.
Result Syntax
[ 'NextToken' => '<string>', 'Runs' => [ [ 'CompletedOn' => <DateTime>, 'ErrorMessage' => '<string>', 'Graph' => [ 'Edges' => [ [ 'DestinationId' => '<string>', 'SourceId' => '<string>', ], // ... ], 'Nodes' => [ [ 'CrawlerDetails' => [ 'Crawls' => [ [ 'CompletedOn' => <DateTime>, 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'StartedOn' => <DateTime>, 'State' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', ], // ... ], ], 'JobDetails' => [ 'JobRuns' => [ [ 'AllocatedCapacity' => <integer>, 'Arguments' => ['<string>', ...], 'Attempt' => <integer>, 'CompletedOn' => <DateTime>, 'DPUSeconds' => <float>, 'ErrorMessage' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionTime' => <integer>, 'GlueVersion' => '<string>', 'Id' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobName' => '<string>', 'JobRunQueuingEnabled' => true || false, 'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', 'LastModifiedOn' => <DateTime>, 'LogGroupName' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'PredecessorRuns' => [ [ 'JobName' => '<string>', 'RunId' => '<string>', ], // ... ], 'PreviousRunId' => '<string>', 'ProfileName' => '<string>', 'SecurityConfiguration' => '<string>', 'StartedOn' => <DateTime>, 'StateDetail' => '<string>', 'Timeout' => <integer>, 'TriggerName' => '<string>', 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], // ... ], ], 'Name' => '<string>', 'TriggerDetails' => [ 'Trigger' => [ 'Actions' => [ [ 'Arguments' => ['<string>', ...], 'CrawlerName' => '<string>', 'JobName' => '<string>', 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'SecurityConfiguration' => '<string>', 'Timeout' => <integer>, ], // ... 
], 'Description' => '<string>', 'EventBatchingCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Id' => '<string>', 'Name' => '<string>', 'Predicate' => [ 'Conditions' => [ [ 'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', 'CrawlerName' => '<string>', 'JobName' => '<string>', 'LogicalOperator' => 'EQUALS', 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', ], // ... ], 'Logical' => 'AND|ANY', ], 'Schedule' => '<string>', 'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING', 'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT', 'WorkflowName' => '<string>', ], ], 'Type' => 'CRAWLER|JOB|TRIGGER', 'UniqueId' => '<string>', ], // ... ], ], 'Name' => '<string>', 'PreviousRunId' => '<string>', 'StartedOn' => <DateTime>, 'StartingEventBatchCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Statistics' => [ 'ErroredActions' => <integer>, 'FailedActions' => <integer>, 'RunningActions' => <integer>, 'StoppedActions' => <integer>, 'SucceededActions' => <integer>, 'TimeoutActions' => <integer>, 'TotalActions' => <integer>, 'WaitingActions' => <integer>, ], 'Status' => 'RUNNING|COMPLETED|STOPPING|STOPPED|ERROR', 'WorkflowRunId' => '<string>', 'WorkflowRunProperties' => ['<string>', ...], ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, if not all requested workflow runs have been returned.
- Runs
-
- Type: Array of WorkflowRun structures
A list of workflow run metadata objects.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
ImportCatalogToGlue
$result = $client->importCatalogToGlue([/* ... */]);
$promise = $client->importCatalogToGlueAsync([/* ... */]);
Imports an existing Amazon Athena Data Catalog to Glue.
Parameter Syntax
$result = $client->importCatalogToGlue([ 'CatalogId' => '<string>', ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the catalog to import. Currently, this should be the Amazon Web Services account ID.
Result Syntax
[]
Result Details
The results for this operation are always empty.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
ListBlueprints
$result = $client->listBlueprints([/* ... */]);
$promise = $client->listBlueprintsAsync([/* ... */]);
Lists all the blueprint names in an account.
Parameter Syntax
$result = $client->listBlueprints([ 'MaxResults' => <integer>, 'NextToken' => '<string>', 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum size of a list to return.
- NextToken
-
- Type: string
A continuation token, if this is a continuation request.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Filters the list by an Amazon Web Services resource tag.
Result Syntax
[ 'Blueprints' => ['<string>', ...], 'NextToken' => '<string>', ]
Result Details
Members
- Blueprints
-
- Type: Array of strings
List of names of blueprints in the account.
- NextToken
-
- Type: string
A continuation token, if not all blueprint names have been returned.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
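The NextToken member described above works as a continuation cursor: a result that carries a NextToken has more pages behind it, and passing that token back in the next request returns the following page. The block below is a minimal sketch of that loop. The helper name collectAllPages and its parameters are hypothetical (they are not part of the SDK), and the callable is assumed to wrap a client call such as $client->listBlueprints($params).

```php
<?php
/**
 * Hypothetical helper: gathers every value stored under $listKey across
 * all pages of a paginated "List*" call.
 *
 * $fetchPage is any callable that takes a parameter array and returns an
 * array-like result holding $listKey and, when more pages exist, 'NextToken'.
 */
function collectAllPages(callable $fetchPage, string $listKey, array $params = []): array
{
    $items = [];
    while (true) {
        $result = $fetchPage($params);
        foreach ($result[$listKey] ?? [] as $item) {
            $items[] = $item;
        }
        $token = $result['NextToken'] ?? null;
        if ($token === null || $token === '') {
            break; // no continuation token: this was the last page
        }
        $params['NextToken'] = $token; // ask for the next page
    }
    return $items;
}

// Assumed usage with a configured client:
// $names = collectAllPages(
//     function (array $p) use ($client) { return $client->listBlueprints($p); },
//     'Blueprints',
//     ['MaxResults' => 25]
// );
```

The same helper should apply to the other list operations on this page by changing the wrapped call and the list key, for example 'CrawlerNames' for listCrawlers or 'JobNames' for listJobs.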
ListColumnStatisticsTaskRuns
$result = $client->listColumnStatisticsTaskRuns([/* ... */]);
$promise = $client->listColumnStatisticsTaskRunsAsync([/* ... */]);
Lists all column statistics task runs for a particular account.
Parameter Syntax
$result = $client->listColumnStatisticsTaskRuns([ 'MaxResults' => <integer>, 'NextToken' => '<string>', ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum size of the response.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
Result Syntax
[ 'ColumnStatisticsTaskRunIds' => ['<string>', ...], 'NextToken' => '<string>', ]
Result Details
Members
- ColumnStatisticsTaskRunIds
-
- Type: Array of strings
A list of column statistics task run IDs.
- NextToken
-
- Type: string
A continuation token, if not all task run IDs have yet been returned.
Errors
- OperationTimeoutException:
The operation timed out.
ListCrawlers
$result = $client->listCrawlers([/* ... */]);
$promise = $client->listCrawlersAsync([/* ... */]);
Retrieves the names of all crawler resources in this Amazon Web Services account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.
This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tag filtering, only resources with the tag are retrieved.
Parameter Syntax
$result = $client->listCrawlers([ 'MaxResults' => <integer>, 'NextToken' => '<string>', 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum size of a list to return.
- NextToken
-
- Type: string
A continuation token, if this is a continuation request.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Specifies to return only these tagged resources.
Result Syntax
[ 'CrawlerNames' => ['<string>', ...], 'NextToken' => '<string>', ]
Result Details
Members
- CrawlerNames
-
- Type: Array of strings
The names of all crawlers in the account, or the crawlers with the specified tags.
- NextToken
-
- Type: string
A continuation token, if the returned list does not contain the last crawler available.
Errors
- OperationTimeoutException:
The operation timed out.
ListCrawls
$result = $client->listCrawls([/* ... */]);
$promise = $client->listCrawlsAsync([/* ... */]);
Returns all the crawls of a specified crawler. Only crawls that have occurred since the launch date of the crawler history feature are returned, and only up to 12 months of crawls are retained. Older crawls are not returned.
You may use this API to:
- Retrieve all the crawls of a specified crawler.
- Retrieve all the crawls of a specified crawler within a limited count.
- Retrieve all the crawls of a specified crawler in a specific time range.
- Retrieve all the crawls of a specified crawler with a particular state, crawl ID, or DPU hour value.
Parameter Syntax
$result = $client->listCrawls([
    'CrawlerName' => '<string>', // REQUIRED
    'Filters' => [
        [
            'FieldName' => 'CRAWL_ID|STATE|START_TIME|END_TIME|DPU_HOUR',
            'FieldValue' => '<string>',
            'FilterOperator' => 'GT|GE|LT|LE|EQ|NE',
        ],
        // ...
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- CrawlerName
-
- Required: Yes
- Type: string
The name of the crawler whose runs you want to retrieve.
- Filters
-
- Type: Array of CrawlsFilter structures
Filters the crawls by the criteria you specify in a list of CrawlsFilter objects.
- MaxResults
-
- Type: int
The maximum number of results to return. The default is 20, and maximum is 100.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
Result Syntax
[ 'Crawls' => [ [ 'CrawlId' => '<string>', 'DPUHour' => <float>, 'EndTime' => <DateTime>, 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'MessagePrefix' => '<string>', 'StartTime' => <DateTime>, 'State' => 'RUNNING|COMPLETED|FAILED|STOPPED', 'Summary' => '<string>', ], // ... ], 'NextToken' => '<string>', ]
Result Details
Members
- Crawls
-
- Type: Array of CrawlerHistory structures
A list of CrawlerHistory objects representing the crawl runs that meet your criteria.
- NextToken
-
- Type: string
A continuation token for paginating the returned list of crawls, returned if the current segment of the list is not the last.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
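Each entry in Filters combines a FieldName, a FilterOperator and a FieldValue, and MaxResults is capped at 100. The following is a minimal sketch of assembling and checking such a request before it is sent. The helper buildListCrawlsParams is hypothetical (it is not part of the SDK); the allowed field names and operators are copied from the Parameter Syntax above, and the lower bound of 1 for MaxResults is an assumption.

```php
<?php
/**
 * Hypothetical helper: builds the parameter array for listCrawls.
 *
 * $filters is a list of [fieldName, operator, value] triples.
 */
function buildListCrawlsParams(string $crawlerName, array $filters = [], ?int $maxResults = null): array
{
    // Enumerations taken from the Parameter Syntax of ListCrawls.
    $fields = ['CRAWL_ID', 'STATE', 'START_TIME', 'END_TIME', 'DPU_HOUR'];
    $operators = ['GT', 'GE', 'LT', 'LE', 'EQ', 'NE'];

    if ($crawlerName === '') {
        throw new InvalidArgumentException('CrawlerName is required.');
    }
    $params = ['CrawlerName' => $crawlerName];

    foreach ($filters as [$field, $operator, $value]) {
        if (!in_array($field, $fields, true)) {
            throw new InvalidArgumentException("Unknown FieldName: $field");
        }
        if (!in_array($operator, $operators, true)) {
            throw new InvalidArgumentException("Unknown FilterOperator: $operator");
        }
        $params['Filters'][] = [
            'FieldName' => $field,
            'FilterOperator' => $operator,
            'FieldValue' => (string) $value, // FieldValue is always sent as a string
        ];
    }

    if ($maxResults !== null) {
        // The documented maximum is 100; the lower bound of 1 is an assumption.
        if ($maxResults < 1 || $maxResults > 100) {
            throw new InvalidArgumentException('MaxResults must be between 1 and 100.');
        }
        $params['MaxResults'] = $maxResults;
    }
    return $params;
}

// Assumed usage with a configured client and a hypothetical crawler name:
// $result = $client->listCrawls(
//     buildListCrawlsParams('my-crawler', [['STATE', 'EQ', 'FAILED']], 50)
// );
```

Validating locally only catches misspelled enumeration values early. The service remains the authority on which combinations of field, operator and value it accepts.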
ListCustomEntityTypes
$result = $client->listCustomEntityTypes([/* ... */]);
$promise = $client->listCustomEntityTypesAsync([/* ... */]);
Lists all the custom patterns that have been created.
Parameter Syntax
$result = $client->listCustomEntityTypes([ 'MaxResults' => <integer>, 'NextToken' => '<string>', 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results to return.
- NextToken
-
- Type: string
A pagination token to offset the results.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
A list of key-value pair tags.
Result Syntax
[ 'CustomEntityTypes' => [ [ 'ContextWords' => ['<string>', ...], 'Name' => '<string>', 'RegexString' => '<string>', ], // ... ], 'NextToken' => '<string>', ]
Result Details
Members
- CustomEntityTypes
-
- Type: Array of CustomEntityType structures
A list of CustomEntityType objects representing custom patterns.
- NextToken
-
- Type: string
A pagination token, if more results are available.
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
ListDataQualityResults
$result = $client->listDataQualityResults([/* ... */]);
$promise = $client->listDataQualityResultsAsync([/* ... */]);
Returns all data quality execution results for your account.
Parameter Syntax
$result = $client->listDataQualityResults([
    'Filter' => [
        'DataSource' => [
            'GlueTable' => [ // REQUIRED
                'AdditionalOptions' => ['<string>', ...],
                'CatalogId' => '<string>',
                'ConnectionName' => '<string>',
                'DatabaseName' => '<string>', // REQUIRED
                'TableName' => '<string>', // REQUIRED
            ],
        ],
        'JobName' => '<string>',
        'JobRunId' => '<string>',
        'StartedAfter' => <integer || string || DateTime>,
        'StartedBefore' => <integer || string || DateTime>,
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- Filter
-
- Type: DataQualityResultFilterCriteria structure
The filter criteria.
- MaxResults
-
- Type: int
The maximum number of results to return.
- NextToken
-
- Type: string
A pagination token to offset the results.
Result Syntax
[ 'NextToken' => '<string>', 'Results' => [ [ 'DataSource' => [ 'GlueTable' => [ 'AdditionalOptions' => ['<string>', ...], 'CatalogId' => '<string>', 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], ], 'JobName' => '<string>', 'JobRunId' => '<string>', 'ResultId' => '<string>', 'StartedOn' => <DateTime>, ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A pagination token, if more results are available.
- Results
-
- Required: Yes
- Type: Array of DataQualityResultDescription structures
A list of DataQualityResultDescription objects.
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
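The Filter structure nests the table identity under DataSource and GlueTable, and accepts StartedAfter and StartedBefore as an integer, a string or a DateTime. The following is a minimal sketch of building such a filter for one table and an optional time window. The helper buildDataQualityResultFilter is hypothetical, timestamps are passed as Unix integers (one of the accepted forms), and the check that the window is non-empty is an assumption rather than a documented rule.

```php
<?php
/**
 * Hypothetical helper: builds the 'Filter' member for listDataQualityResults.
 */
function buildDataQualityResultFilter(
    string $databaseName,
    string $tableName,
    ?DateTimeInterface $startedAfter = null,
    ?DateTimeInterface $startedBefore = null
): array {
    if ($databaseName === '' || $tableName === '') {
        throw new InvalidArgumentException('DatabaseName and TableName are required.');
    }
    // Assumption: an empty or inverted time window is treated as a caller error.
    if ($startedAfter !== null && $startedBefore !== null && $startedAfter >= $startedBefore) {
        throw new InvalidArgumentException('StartedAfter must be earlier than StartedBefore.');
    }

    $filter = [
        'DataSource' => [
            'GlueTable' => [
                'DatabaseName' => $databaseName,
                'TableName' => $tableName,
            ],
        ],
    ];
    if ($startedAfter !== null) {
        $filter['StartedAfter'] = $startedAfter->getTimestamp(); // Unix seconds
    }
    if ($startedBefore !== null) {
        $filter['StartedBefore'] = $startedBefore->getTimestamp();
    }
    return $filter;
}

// Assumed usage with a configured client and hypothetical names:
// $result = $client->listDataQualityResults([
//     'Filter' => buildDataQualityResultFilter(
//         'sales_db',
//         'orders',
//         new DateTimeImmutable('2024-01-01T00:00:00Z')
//     ),
//     'MaxResults' => 50,
// ]);
```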
ListDataQualityRuleRecommendationRuns
$result = $client->listDataQualityRuleRecommendationRuns([/* ... */]);
$promise = $client->listDataQualityRuleRecommendationRunsAsync([/* ... */]);
Lists the recommendation runs meeting the filter criteria.
Parameter Syntax
$result = $client->listDataQualityRuleRecommendationRuns([
    'Filter' => [
        'DataSource' => [ // REQUIRED
            'GlueTable' => [ // REQUIRED
                'AdditionalOptions' => ['<string>', ...],
                'CatalogId' => '<string>',
                'ConnectionName' => '<string>',
                'DatabaseName' => '<string>', // REQUIRED
                'TableName' => '<string>', // REQUIRED
            ],
        ],
        'StartedAfter' => <integer || string || DateTime>,
        'StartedBefore' => <integer || string || DateTime>,
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- Filter
-
- Type: DataQualityRuleRecommendationRunFilter structure
The filter criteria.
- MaxResults
-
- Type: int
The maximum number of results to return.
- NextToken
-
- Type: string
A pagination token to offset the results.
Result Syntax
[ 'NextToken' => '<string>', 'Runs' => [ [ 'DataSource' => [ 'GlueTable' => [ 'AdditionalOptions' => ['<string>', ...], 'CatalogId' => '<string>', 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], ], 'RunId' => '<string>', 'StartedOn' => <DateTime>, 'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT', ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A pagination token, if more results are available.
- Runs
-
- Type: Array of DataQualityRuleRecommendationRunDescription structures
A list of DataQualityRuleRecommendationRunDescription objects.
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
ListDataQualityRulesetEvaluationRuns
$result = $client->listDataQualityRulesetEvaluationRuns([/* ... */]);
$promise = $client->listDataQualityRulesetEvaluationRunsAsync([/* ... */]);
Lists all the runs meeting the filter criteria, where a ruleset is evaluated against a data source.
Parameter Syntax
$result = $client->listDataQualityRulesetEvaluationRuns([
    'Filter' => [
        'DataSource' => [ // REQUIRED
            'GlueTable' => [ // REQUIRED
                'AdditionalOptions' => ['<string>', ...],
                'CatalogId' => '<string>',
                'ConnectionName' => '<string>',
                'DatabaseName' => '<string>', // REQUIRED
                'TableName' => '<string>', // REQUIRED
            ],
        ],
        'StartedAfter' => <integer || string || DateTime>,
        'StartedBefore' => <integer || string || DateTime>,
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- Filter
-
- Type: DataQualityRulesetEvaluationRunFilter structure
The filter criteria.
- MaxResults
-
- Type: int
The maximum number of results to return.
- NextToken
-
- Type: string
A pagination token to offset the results.
Result Syntax
[ 'NextToken' => '<string>', 'Runs' => [ [ 'DataSource' => [ 'GlueTable' => [ 'AdditionalOptions' => ['<string>', ...], 'CatalogId' => '<string>', 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], ], 'RunId' => '<string>', 'StartedOn' => <DateTime>, 'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT', ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A pagination token, if more results are available.
- Runs
-
- Type: Array of DataQualityRulesetEvaluationRunDescription structures
A list of DataQualityRulesetEvaluationRunDescription objects representing data quality ruleset runs.
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
ListDataQualityRulesets
$result = $client->listDataQualityRulesets([/* ... */]);
$promise = $client->listDataQualityRulesetsAsync([/* ... */]);
Returns a paginated list of rulesets for the specified list of Glue tables.
Parameter Syntax
$result = $client->listDataQualityRulesets([
    'Filter' => [
        'CreatedAfter' => <integer || string || DateTime>,
        'CreatedBefore' => <integer || string || DateTime>,
        'Description' => '<string>',
        'LastModifiedAfter' => <integer || string || DateTime>,
        'LastModifiedBefore' => <integer || string || DateTime>,
        'Name' => '<string>',
        'TargetTable' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
        ],
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Tags' => ['<string>', ...],
]);
Parameter Details
Members
- Filter
-
- Type: DataQualityRulesetFilterCriteria structure
The filter criteria.
- MaxResults
-
- Type: int
The maximum number of results to return.
- NextToken
-
- Type: string
A pagination token to offset the results.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
A list of key-value pair tags.
Result Syntax
[ 'NextToken' => '<string>', 'Rulesets' => [ [ 'CreatedOn' => <DateTime>, 'Description' => '<string>', 'LastModifiedOn' => <DateTime>, 'Name' => '<string>', 'RecommendationRunId' => '<string>', 'RuleCount' => <integer>, 'TargetTable' => [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A pagination token, if more results are available.
- Rulesets
-
- Type: Array of DataQualityRulesetListDetails structures
A paginated list of rulesets for the specified list of Glue tables.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
ListDataQualityStatisticAnnotations
$result = $client->listDataQualityStatisticAnnotations([/* ... */]);
$promise = $client->listDataQualityStatisticAnnotationsAsync([/* ... */]);
Retrieves annotations for a data quality statistic.
Parameter Syntax
$result = $client->listDataQualityStatisticAnnotations([ 'MaxResults' => <integer>, 'NextToken' => '<string>', 'ProfileId' => '<string>', 'StatisticId' => '<string>', 'TimestampFilter' => [ 'RecordedAfter' => <integer || string || DateTime>, 'RecordedBefore' => <integer || string || DateTime>, ], ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results to return in this request.
- NextToken
-
- Type: string
A pagination token to retrieve the next set of results.
- ProfileId
-
- Type: string
The Profile ID.
- StatisticId
-
- Type: string
The Statistic ID.
- TimestampFilter
-
- Type: TimestampFilter structure
A timestamp filter.
Result Syntax
[ 'Annotations' => [ [ 'InclusionAnnotation' => [ 'LastModifiedOn' => <DateTime>, 'Value' => 'INCLUDE|EXCLUDE', ], 'ProfileId' => '<string>', 'StatisticId' => '<string>', 'StatisticRecordedOn' => <DateTime>, ], // ... ], 'NextToken' => '<string>', ]
Result Details
Members
- Annotations
-
- Type: Array of StatisticAnnotation structures
A list of StatisticAnnotation objects applied to the statistic.
- NextToken
-
- Type: string
A pagination token to retrieve the next set of results.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
ListDataQualityStatistics
$result = $client->listDataQualityStatistics([/* ... */]);
$promise = $client->listDataQualityStatisticsAsync([/* ... */]);
Retrieves a list of data quality statistics.
Parameter Syntax
$result = $client->listDataQualityStatistics([ 'MaxResults' => <integer>, 'NextToken' => '<string>', 'ProfileId' => '<string>', 'StatisticId' => '<string>', 'TimestampFilter' => [ 'RecordedAfter' => <integer || string || DateTime>, 'RecordedBefore' => <integer || string || DateTime>, ], ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results to return in this request.
- NextToken
-
- Type: string
A pagination token to request the next page of results.
- ProfileId
-
- Type: string
The Profile ID.
- StatisticId
-
- Type: string
The Statistic ID.
- TimestampFilter
-
- Type: TimestampFilter structure
A timestamp filter.
Result Syntax
[ 'NextToken' => '<string>', 'Statistics' => [ [ 'ColumnsReferenced' => ['<string>', ...], 'DoubleValue' => <float>, 'EvaluationLevel' => 'Dataset|Column|Multicolumn', 'InclusionAnnotation' => [ 'LastModifiedOn' => <DateTime>, 'Value' => 'INCLUDE|EXCLUDE', ], 'ProfileId' => '<string>', 'RecordedOn' => <DateTime>, 'ReferencedDatasets' => ['<string>', ...], 'RunIdentifier' => [ 'JobRunId' => '<string>', 'RunId' => '<string>', ], 'StatisticId' => '<string>', 'StatisticName' => '<string>', 'StatisticProperties' => ['<string>', ...], ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A pagination token to request the next page of results.
- Statistics
-
- Type: Array of StatisticSummary structures
A StatisticSummaryList.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
ListDevEndpoints
$result = $client->listDevEndpoints([/* ... */]);
$promise = $client->listDevEndpointsAsync([/* ... */]);
Retrieves the names of all DevEndpoint resources in this Amazon Web Services account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.
This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tag filtering, only resources with the tag are retrieved.
Parameter Syntax
$result = $client->listDevEndpoints([ 'MaxResults' => <integer>, 'NextToken' => '<string>', 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum size of a list to return.
- NextToken
-
- Type: string
A continuation token, if this is a continuation request.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Specifies to return only these tagged resources.
Result Syntax
[ 'DevEndpointNames' => ['<string>', ...], 'NextToken' => '<string>', ]
Result Details
Members
- DevEndpointNames
-
- Type: Array of strings
The names of all the DevEndpoints in the account, or the DevEndpoints with the specified tags.
- NextToken
-
- Type: string
A continuation token, if the returned list does not contain the last DevEndpoint available.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
ListJobs
$result = $client->listJobs([/* ... */]);
$promise = $client->listJobsAsync([/* ... */]);
Retrieves the names of all job resources in this Amazon Web Services account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.
This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tag filtering, only resources with the tag are retrieved.
Parameter Syntax
$result = $client->listJobs([ 'MaxResults' => <integer>, 'NextToken' => '<string>', 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum size of a list to return.
- NextToken
-
- Type: string
A continuation token, if this is a continuation request.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Specifies to return only these tagged resources.
Result Syntax
[ 'JobNames' => ['<string>', ...], 'NextToken' => '<string>', ]
Result Details
Members
- JobNames
-
- Type: Array of strings
The names of all jobs in the account, or the jobs with the specified tags.
- NextToken
-
- Type: string
A continuation token, if the returned list does not contain the last job available.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
ListMLTransforms
$result = $client->listMLTransforms([/* ... */]);
$promise = $client->listMLTransformsAsync([/* ... */]);
Retrieves a sortable, filterable list of existing Glue machine learning transforms in this Amazon Web Services account, or the resources with the specified tag. This operation takes the optional Tags field, which you can use as a filter of the responses so that tagged resources can be retrieved as a group. If you choose to use tag filtering, only resources with the tags are retrieved.
Parameter Syntax
$result = $client->listMLTransforms([
    'Filter' => [
        'CreatedAfter' => <integer || string || DateTime>,
        'CreatedBefore' => <integer || string || DateTime>,
        'GlueVersion' => '<string>',
        'LastModifiedAfter' => <integer || string || DateTime>,
        'LastModifiedBefore' => <integer || string || DateTime>,
        'Name' => '<string>',
        'Schema' => [
            [
                'DataType' => '<string>',
                'Name' => '<string>',
            ],
            // ...
        ],
        'Status' => 'NOT_READY|READY|DELETING',
        'TransformType' => 'FIND_MATCHES',
    ],
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'Sort' => [
        'Column' => 'NAME|TRANSFORM_TYPE|STATUS|CREATED|LAST_MODIFIED', // REQUIRED
        'SortDirection' => 'DESCENDING|ASCENDING', // REQUIRED
    ],
    'Tags' => ['<string>', ...],
]);
Parameter Details
Members
- Filter
-
- Type: TransformFilterCriteria structure
A TransformFilterCriteria used to filter the machine learning transforms.
- MaxResults
-
- Type: int
The maximum size of a list to return.
- NextToken
-
- Type: string
A continuation token, if this is a continuation request.
- Sort
-
- Type: TransformSortCriteria structure
A TransformSortCriteria used to sort the machine learning transforms.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Specifies to return only these tagged resources.
Result Syntax
[ 'NextToken' => '<string>', 'TransformIds' => ['<string>', ...], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, if the returned list does not contain the last machine learning transform available.
- TransformIds
-
- Required: Yes
- Type: Array of strings
The identifiers of all the machine learning transforms in the account, or the machine learning transforms with the specified tags.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
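When Sort is supplied, both Column and SortDirection are required, and each accepts only the values listed in the Parameter Syntax. The following is a minimal sketch of building that structure with a local check. The helper buildTransformSort is hypothetical (not part of the SDK), and defaulting the direction to ASCENDING is a convenience chosen for this example, not a documented service default.

```php
<?php
/**
 * Hypothetical helper: builds the 'Sort' member for listMLTransforms.
 */
function buildTransformSort(string $column, string $direction = 'ASCENDING'): array
{
    // Enumerations taken from the Parameter Syntax of ListMLTransforms.
    $columns = ['NAME', 'TRANSFORM_TYPE', 'STATUS', 'CREATED', 'LAST_MODIFIED'];
    $directions = ['DESCENDING', 'ASCENDING'];

    if (!in_array($column, $columns, true)) {
        throw new InvalidArgumentException("Unknown sort column: $column");
    }
    if (!in_array($direction, $directions, true)) {
        throw new InvalidArgumentException("Unknown sort direction: $direction");
    }
    // Both members are required whenever Sort is present.
    return ['Column' => $column, 'SortDirection' => $direction];
}

// Assumed usage with a configured client: ready transforms, newest change first.
// $result = $client->listMLTransforms([
//     'Filter' => ['Status' => 'READY'],
//     'Sort' => buildTransformSort('LAST_MODIFIED', 'DESCENDING'),
// ]);
```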
ListRegistries
$result = $client->listRegistries([/* ... */]);
$promise = $client->listRegistriesAsync([/* ... */]);
Returns a list of registries that you have created, with minimal registry information. Registries in the Deleting status will not be included in the results. Empty results will be returned if there are no registries available.
Parameter Syntax
$result = $client->listRegistries([ 'MaxResults' => <integer>, 'NextToken' => '<string>', ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results per page. If no value is supplied, the default is 25 per page.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
Result Syntax
[ 'NextToken' => '<string>', 'Registries' => [ [ 'CreatedTime' => '<string>', 'Description' => '<string>', 'RegistryArn' => '<string>', 'RegistryName' => '<string>', 'Status' => 'AVAILABLE|DELETING', 'UpdatedTime' => '<string>', ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token for paginating the returned list of registries, returned if the current segment of the list is not the last.
- Registries
-
- Type: Array of RegistryListItem structures
An array of RegistryDetailedListItem objects containing minimal details of each registry.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
ListSchemaVersions
$result = $client->listSchemaVersions([/* ... */]);
$promise = $client->listSchemaVersionsAsync([/* ... */]);
Returns a list of schema versions that you have created, with minimal information. Schema versions in Deleted status will not be included in the results. Empty results will be returned if there are no schema versions available.
Parameter Syntax
$result = $client->listSchemaVersions([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results per page. If no value is supplied, the default is 25 per page.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
- SchemaId
-
- Required: Yes
- Type: SchemaId structure
This is a wrapper structure to contain schema identity fields. The structure contains:
- SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. Either SchemaArn, or SchemaName and RegistryName, must be provided.
- SchemaId$SchemaName: The name of the schema. Either SchemaArn, or SchemaName and RegistryName, must be provided.
Result Syntax
[ 'NextToken' => '<string>', 'Schemas' => [ [ 'CreatedTime' => '<string>', 'SchemaArn' => '<string>', 'SchemaVersionId' => '<string>', 'Status' => 'AVAILABLE|PENDING|FAILURE|DELETING', 'VersionNumber' => <integer>, ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token for paginating the returned list of schema versions, returned if the current segment of the list is not the last.
- Schemas
-
- Type: Array of SchemaVersionListItem structures
An array of SchemaVersionList objects containing details of each schema version.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
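The SchemaId rule for this operation (either SchemaArn, or SchemaName together with RegistryName) can be checked before a request is made. The following is a minimal sketch of such a check. The helper buildSchemaId is hypothetical (not part of the SDK), and letting the ARN take precedence when both forms are supplied is a design choice made for this example, not documented service behavior.

```php
<?php
/**
 * Hypothetical helper: builds the required 'SchemaId' member for
 * listSchemaVersions from either an ARN or a schema/registry name pair.
 */
function buildSchemaId(
    ?string $schemaArn = null,
    ?string $schemaName = null,
    ?string $registryName = null
): array {
    $given = static function (?string $value): bool {
        return $value !== null && $value !== '';
    };

    if ($given($schemaArn)) {
        // Assumption: when an ARN is supplied it wins and the names are dropped.
        return ['SchemaArn' => $schemaArn];
    }
    if ($given($schemaName) && $given($registryName)) {
        return ['RegistryName' => $registryName, 'SchemaName' => $schemaName];
    }
    throw new InvalidArgumentException(
        'Provide either SchemaArn, or both SchemaName and RegistryName.'
    );
}

// Assumed usage with a configured client and hypothetical names:
// $result = $client->listSchemaVersions([
//     'SchemaId' => buildSchemaId(null, 'orders', 'my-registry'),
// ]);
```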
ListSchemas
$result = $client->listSchemas([/* ... */]);
$promise = $client->listSchemasAsync([/* ... */]);
Returns a list of schemas with minimal details. Schemas in Deleting status will not be included in the results. Empty results will be returned if there are no schemas available.
When the RegistryId is not provided, all the schemas across registries will be part of the API response.
Parameter Syntax
$result = $client->listSchemas([ 'MaxResults' => <integer>, 'NextToken' => '<string>', 'RegistryId' => [ 'RegistryArn' => '<string>', 'RegistryName' => '<string>', ], ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results per page. If no value is supplied, the default is 25 per page.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
- RegistryId
-
- Type: RegistryId structure
A wrapper structure that may contain the registry name and Amazon Resource Name (ARN).
Result Syntax
[ 'NextToken' => '<string>', 'Schemas' => [ [ 'CreatedTime' => '<string>', 'Description' => '<string>', 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', 'SchemaStatus' => 'AVAILABLE|PENDING|DELETING', 'UpdatedTime' => '<string>', ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token for paginating the returned list of schemas, returned if the current segment of the list is not the last.
- Schemas
-
- Type: Array of SchemaListItem structures
An array of SchemaListItem objects containing details of each schema.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
ListSessions
$result = $client->listSessions([/* ... */]);
$promise = $client->listSessionsAsync([/* ... */]);
Retrieves a list of sessions.
Parameter Syntax
$result = $client->listSessions([ 'MaxResults' => <integer>, 'NextToken' => '<string>', 'RequestOrigin' => '<string>', 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results.
- NextToken
-
- Type: string
The token for the next set of results, or null if there are no more results.
- RequestOrigin
-
- Type: string
The origin of the request.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Tags belonging to the session.
Result Syntax
[ 'Ids' => ['<string>', ...], 'NextToken' => '<string>', 'Sessions' => [ [ 'Command' => [ 'Name' => '<string>', 'PythonVersion' => '<string>', ], 'CompletedOn' => <DateTime>, 'Connections' => [ 'Connections' => ['<string>', ...], ], 'CreatedOn' => <DateTime>, 'DPUSeconds' => <float>, 'DefaultArguments' => ['<string>', ...], 'Description' => '<string>', 'ErrorMessage' => '<string>', 'ExecutionTime' => <float>, 'GlueVersion' => '<string>', 'Id' => '<string>', 'IdleTimeout' => <integer>, 'MaxCapacity' => <float>, 'NumberOfWorkers' => <integer>, 'ProfileName' => '<string>', 'Progress' => <float>, 'Role' => '<string>', 'SecurityConfiguration' => '<string>', 'Status' => 'PROVISIONING|READY|FAILED|TIMEOUT|STOPPING|STOPPED', 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], // ... ], ]
Result Details
Members
- Ids
-
- Type: Array of strings
Returns the IDs of the sessions.
- NextToken
-
- Type: string
The token for the next set of results, or null if there are no more results.
- Sessions
-
- Type: Array of Session structures
Returns the session objects.
Errors
- AccessDeniedException:
Access to a resource was denied.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
ListStatements
$result = $client->listStatements([/* ... */]);
$promise = $client->listStatementsAsync([/* ... */]);
Lists statements for the session.
Parameter Syntax
$result = $client->listStatements([
    'NextToken' => '<string>',
    'RequestOrigin' => '<string>',
    'SessionId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
- RequestOrigin
-
- Type: string
The origin of the request to list statements.
- SessionId
-
- Required: Yes
- Type: string
The Session ID of the statements.
Result Syntax
[ 'NextToken' => '<string>', 'Statements' => [ [ 'Code' => '<string>', 'CompletedOn' => <integer>, 'Id' => <integer>, 'Output' => [ 'Data' => [ 'TextPlain' => '<string>', ], 'ErrorName' => '<string>', 'ErrorValue' => '<string>', 'ExecutionCount' => <integer>, 'Status' => 'WAITING|RUNNING|AVAILABLE|CANCELLING|CANCELLED|ERROR', 'Traceback' => ['<string>', ...], ], 'Progress' => <float>, 'StartedOn' => <integer>, 'State' => 'WAITING|RUNNING|AVAILABLE|CANCELLING|CANCELLED|ERROR', ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, if not all statements have yet been returned.
- Statements
-
- Type: Array of Statement structures
Returns the list of statements.
Errors
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- IllegalSessionStateException:
The session is in an invalid state to perform a requested operation.
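As an illustration of reading the Statements result, the sketch below pulls out the statements whose State is ERROR together with the error fields from Output. `failedStatements` is a hypothetical helper name, and `$client` is assumed to expose `listStatements` as documented above.

```php
<?php
/**
 * Hypothetical helper: returns the statements of a session whose State is
 * ERROR, keyed by statement Id, with ErrorName and ErrorValue from Output.
 */
function failedStatements(object $client, string $sessionId): array
{
    $failed = [];
    $token = null;
    do {
        $request = ['SessionId' => $sessionId];
        if ($token !== null) {
            $request['NextToken'] = $token;
        }
        $result = $client->listStatements($request);
        foreach ($result['Statements'] ?? [] as $statement) {
            if (($statement['State'] ?? null) === 'ERROR') {
                $failed[$statement['Id']] = [
                    // Output may be absent, so fall back to null.
                    'ErrorName' => $statement['Output']['ErrorName'] ?? null,
                    'ErrorValue' => $statement['Output']['ErrorValue'] ?? null,
                ];
            }
        }
        $token = $result['NextToken'] ?? null;
    } while ($token !== null && $token !== '');

    return $failed;
}
```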
ListTableOptimizerRuns
$result = $client->listTableOptimizerRuns([/* ... */]);
$promise = $client->listTableOptimizerRunsAsync([/* ... */]);
Lists the history of previous optimizer runs for a specific table.
Parameter Syntax
$result = $client->listTableOptimizerRuns([ 'CatalogId' => '<string>', // REQUIRED 'DatabaseName' => '<string>', // REQUIRED 'MaxResults' => <integer>, 'NextToken' => '<string>', 'TableName' => '<string>', // REQUIRED 'Type' => 'compaction|retention|orphan_file_deletion', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Required: Yes
- Type: string
The Catalog ID of the table.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database in the catalog in which the table resides.
- MaxResults
-
- Type: int
The maximum number of optimizer runs to return on each call.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
- TableName
-
- Required: Yes
- Type: string
The name of the table.
- Type
-
- Required: Yes
- Type: string
The type of table optimizer. As shown in the parameter syntax, the accepted values are compaction, retention, and orphan_file_deletion.
Result Syntax
[ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'NextToken' => '<string>', 'TableName' => '<string>', 'TableOptimizerRuns' => [ [ 'compactionMetrics' => [ 'IcebergMetrics' => [ 'JobDurationInHour' => <float>, 'NumberOfBytesCompacted' => <integer>, 'NumberOfDpus' => <integer>, 'NumberOfFilesCompacted' => <integer>, ], ], 'endTimestamp' => <DateTime>, 'error' => '<string>', 'eventType' => 'starting|completed|failed|in_progress', 'metrics' => [ 'JobDurationInHour' => '<string>', 'NumberOfBytesCompacted' => '<string>', 'NumberOfDpus' => '<string>', 'NumberOfFilesCompacted' => '<string>', ], 'orphanFileDeletionMetrics' => [ 'IcebergMetrics' => [ 'JobDurationInHour' => <float>, 'NumberOfDpus' => <integer>, 'NumberOfOrphanFilesDeleted' => <integer>, ], ], 'retentionMetrics' => [ 'IcebergMetrics' => [ 'JobDurationInHour' => <float>, 'NumberOfDataFilesDeleted' => <integer>, 'NumberOfDpus' => <integer>, 'NumberOfManifestFilesDeleted' => <integer>, 'NumberOfManifestListsDeleted' => <integer>, ], ], 'startTimestamp' => <DateTime>, ], // ... ], ]
Result Details
Members
- CatalogId
-
- Type: string
The Catalog ID of the table.
- DatabaseName
-
- Type: string
The name of the database in the catalog in which the table resides.
- NextToken
-
- Type: string
A continuation token for paginating the returned list of optimizer runs, returned if the current segment of the list is not the last.
- TableName
-
- Type: string
The name of the table.
- TableOptimizerRuns
-
- Type: Array of TableOptimizerRun structures
A list of the optimizer runs associated with a table.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- AccessDeniedException:
Access to a resource was denied.
- InvalidInputException:
The input provided was not valid.
- ValidationException:
A value could not be validated.
- InternalServiceException:
An internal service error occurred.
- ThrottlingException:
The throttling threshold was exceeded.
ListTriggers
$result = $client->listTriggers([/* ... */]);
$promise = $client->listTriggersAsync([/* ... */]);
Retrieves the names of all trigger resources in this Amazon Web Services account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.
This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag are retrieved.
Parameter Syntax
$result = $client->listTriggers([ 'DependentJobName' => '<string>', 'MaxResults' => <integer>, 'NextToken' => '<string>', 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- DependentJobName
-
- Type: string
The name of the job for which to retrieve triggers. The trigger that can start this job is returned. If there is no such trigger, all triggers are returned.
- MaxResults
-
- Type: int
The maximum size of a list to return.
- NextToken
-
- Type: string
A continuation token, if this is a continuation request.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Specifies to return only these tagged resources.
Result Syntax
[ 'NextToken' => '<string>', 'TriggerNames' => ['<string>', ...], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, if the returned list does not contain the last trigger available.
- TriggerNames
-
- Type: Array of strings
The names of all triggers in the account, or the triggers with the specified tags.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
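Since Tags is an associative array of tag keys to tag values, a request that filters by tag might be assembled as in the sketch below. `buildListTriggersParams` is a hypothetical helper, and the tag key and value are placeholder data.

```php
<?php
/**
 * Hypothetical helper: builds a ListTriggers request, leaving out every
 * optional member that was not supplied.
 *
 * @param array $tags Associative array of tag key => tag value.
 */
function buildListTriggersParams(
    array $tags = [],
    ?string $dependentJobName = null,
    ?int $maxResults = null
): array {
    $params = [];
    if ($tags !== []) {
        $params['Tags'] = $tags;
    }
    if ($dependentJobName !== null) {
        $params['DependentJobName'] = $dependentJobName;
    }
    if ($maxResults !== null) {
        $params['MaxResults'] = $maxResults;
    }
    return $params;
}

// Placeholder tag; only triggers carrying it would be returned.
$params = buildListTriggersParams(['team' => 'analytics'], null, 50);

// With a configured client, the call would then look like:
// $result = $client->listTriggers($params);
// $names  = $result['TriggerNames'];
```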
ListUsageProfiles
$result = $client->listUsageProfiles([/* ... */]);
$promise = $client->listUsageProfilesAsync([/* ... */]);
Lists all the Glue usage profiles.
Parameter Syntax
$result = $client->listUsageProfiles([ 'MaxResults' => <integer>, 'NextToken' => '<string>', ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of usage profiles to return in a single response.
- NextToken
-
- Type: string
A continuation token, included if this is a continuation call.
Result Syntax
[ 'NextToken' => '<string>', 'Profiles' => [ [ 'CreatedOn' => <DateTime>, 'Description' => '<string>', 'LastModifiedOn' => <DateTime>, 'Name' => '<string>', ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, present if the current list segment is not the last.
- Profiles
-
- Type: Array of UsageProfileDefinition structures
A list of usage profile (UsageProfileDefinition) objects.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- OperationNotSupportedException:
The operation is not available in the region.
ListWorkflows
$result = $client->listWorkflows([/* ... */]);
$promise = $client->listWorkflowsAsync([/* ... */]);
Lists names of workflows created in the account.
Parameter Syntax
$result = $client->listWorkflows([ 'MaxResults' => <integer>, 'NextToken' => '<string>', ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum size of a list to return.
- NextToken
-
- Type: string
A continuation token, if this is a continuation request.
Result Syntax
[ 'NextToken' => '<string>', 'Workflows' => ['<string>', ...], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, if not all workflow names have been returned.
- Workflows
-
- Type: Array of strings
List of names of workflows in the account.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
PutDataCatalogEncryptionSettings
$result = $client->putDataCatalogEncryptionSettings([/* ... */]);
$promise = $client->putDataCatalogEncryptionSettingsAsync([/* ... */]);
Sets the security configuration for a specified catalog. After the configuration has been set, the specified encryption is applied to every catalog write thereafter.
Parameter Syntax
$result = $client->putDataCatalogEncryptionSettings([ 'CatalogId' => '<string>', 'DataCatalogEncryptionSettings' => [ // REQUIRED 'ConnectionPasswordEncryption' => [ 'AwsKmsKeyId' => '<string>', 'ReturnConnectionPasswordEncrypted' => true || false, // REQUIRED ], 'EncryptionAtRest' => [ 'CatalogEncryptionMode' => 'DISABLED|SSE-KMS|SSE-KMS-WITH-SERVICE-ROLE', // REQUIRED 'CatalogEncryptionServiceRole' => '<string>', 'SseAwsKmsKeyId' => '<string>', ], ], ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog to set the security configuration for. If none is provided, the Amazon Web Services account ID is used by default.
- DataCatalogEncryptionSettings
-
- Required: Yes
- Type: DataCatalogEncryptionSettings structure
The security configuration to set.
Result Syntax
[]
Result Details
Errors
- InternalServiceException:
An internal service error occurred.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
PutDataQualityProfileAnnotation
$result = $client->putDataQualityProfileAnnotation([/* ... */]);
$promise = $client->putDataQualityProfileAnnotationAsync([/* ... */]);
Annotates all data points for a profile.
Parameter Syntax
$result = $client->putDataQualityProfileAnnotation([ 'InclusionAnnotation' => 'INCLUDE|EXCLUDE', // REQUIRED 'ProfileId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- InclusionAnnotation
-
- Required: Yes
- Type: string
The inclusion annotation value to apply to the profile.
- ProfileId
-
- Required: Yes
- Type: string
The ID of the data quality monitoring profile to annotate.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
PutResourcePolicy
$result = $client->putResourcePolicy([/* ... */]);
$promise = $client->putResourcePolicyAsync([/* ... */]);
Sets the Data Catalog resource policy for access control.
Parameter Syntax
$result = $client->putResourcePolicy([ 'EnableHybrid' => 'TRUE|FALSE', 'PolicyExistsCondition' => 'MUST_EXIST|NOT_EXIST|NONE', 'PolicyHashCondition' => '<string>', 'PolicyInJson' => '<string>', // REQUIRED 'ResourceArn' => '<string>', ]);
Parameter Details
Members
- EnableHybrid
-
- Type: string
If 'TRUE', indicates that you are using both methods to grant cross-account access to Data Catalog resources:
  - By directly updating the resource policy with PutResourcePolicy
  - By using the Grant permissions command on the Amazon Web Services Management Console.
Must be set to 'TRUE' if you have already used the Management Console to grant cross-account access, otherwise the call fails. Default is 'FALSE'.
- PolicyExistsCondition
-
- Type: string
A value of MUST_EXIST is used to update a policy. A value of NOT_EXIST is used to create a new policy. If a value of NONE or a null value is used, the call does not depend on the existence of a policy.
- PolicyHashCondition
-
- Type: string
The hash value returned when the previous policy was set using PutResourcePolicy. Its purpose is to prevent concurrent modifications of a policy. Do not use this parameter if no previous policy has been set.
- PolicyInJson
-
- Required: Yes
- Type: string
Contains the policy document to set, in JSON format.
- ResourceArn
-
- Type: string
Do not use. For internal use only.
Result Syntax
[ 'PolicyHash' => '<string>', ]
Result Details
Members
- PolicyHash
-
- Type: string
A hash of the policy that has just been set. This must be included in a subsequent call that overwrites or updates this policy.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- ConditionCheckFailureException:
A specified condition was not satisfied.
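The interplay of PolicyExistsCondition, PolicyHashCondition, and the returned PolicyHash can be sketched as follows. `buildPutResourcePolicyParams` is a hypothetical helper. Choosing NOT_EXIST when no previous hash is known is an assumption made for this example, and the policy document shown in the usage note is a placeholder rather than a working policy.

```php
<?php
/**
 * Hypothetical helper: builds a PutResourcePolicy request.
 *
 * With a previous hash the request is framed as an update (MUST_EXIST)
 * guarded by PolicyHashCondition. Without one it is framed as a create
 * (NOT_EXIST) and PolicyHashCondition is left out, as the parameter
 * description advises.
 */
function buildPutResourcePolicyParams(array $policy, ?string $previousHash = null): array
{
    $params = [
        // PolicyInJson must be a JSON string, not a PHP array.
        'PolicyInJson' => json_encode($policy, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES),
    ];
    if ($previousHash === null) {
        $params['PolicyExistsCondition'] = 'NOT_EXIST';
    } else {
        $params['PolicyExistsCondition'] = 'MUST_EXIST';
        $params['PolicyHashCondition'] = $previousHash;
    }
    return $params;
}
```

A typical flow would keep the PolicyHash from one call and pass it into the next, for example `$hash = $client->putResourcePolicy(buildPutResourcePolicyParams($policy))['PolicyHash'];` followed later by `buildPutResourcePolicyParams($newPolicy, $hash)`. If another writer changed the policy in between, the hash condition would be expected to fail with ConditionCheckFailureException.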
PutSchemaVersionMetadata
$result = $client->putSchemaVersionMetadata([/* ... */]);
$promise = $client->putSchemaVersionMetadataAsync([/* ... */]);
Puts the metadata key value pair for a specified schema version ID. A maximum of 10 key value pairs will be allowed per schema version. They can be added over one or more calls.
Parameter Syntax
$result = $client->putSchemaVersionMetadata([ 'MetadataKeyValue' => [ // REQUIRED 'MetadataKey' => '<string>', 'MetadataValue' => '<string>', ], 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => [ 'LatestVersion' => true || false, 'VersionNumber' => <integer>, ], ]);
Parameter Details
Members
- MetadataKeyValue
-
- Required: Yes
- Type: MetadataKeyValuePair structure
The metadata key's corresponding value.
- SchemaId
-
- Type: SchemaId structure
The unique ID for the schema.
- SchemaVersionId
-
- Type: string
The unique version ID of the schema version.
- SchemaVersionNumber
-
- Type: SchemaVersionNumber structure
The version number of the schema.
Result Syntax
[ 'LatestVersion' => true || false, 'MetadataKey' => '<string>', 'MetadataValue' => '<string>', 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', 'SchemaVersionId' => '<string>', 'VersionNumber' => <integer>, ]
Result Details
Members
- LatestVersion
-
- Type: boolean
The latest version of the schema.
- MetadataKey
-
- Type: string
The metadata key.
- MetadataValue
-
- Type: string
The value of the metadata key.
- RegistryName
-
- Type: string
The name for the registry.
- SchemaArn
-
- Type: string
The Amazon Resource Name (ARN) for the schema.
- SchemaName
-
- Type: string
The name for the schema.
- SchemaVersionId
-
- Type: string
The unique version ID of the schema version.
- VersionNumber
-
- Type: long (int|float)
The version number of the schema.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- AlreadyExistsException:
A resource to be created or added already exists.
- EntityNotFoundException:
A specified entity does not exist.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
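Because each call carries a single MetadataKeyValue pair, attaching several pairs means several calls. The sketch below shows one way to do that. `putSchemaVersionMetadataPairs` is a hypothetical helper, and its local check against the limit of 10 only counts the pairs passed in, not pairs already stored on the schema version.

```php
<?php
/**
 * Hypothetical helper: attaches several metadata pairs to one schema
 * version, issuing one PutSchemaVersionMetadata call per pair.
 *
 * @param array $pairs Associative array of metadata key => metadata value.
 * @return int Number of calls made.
 */
function putSchemaVersionMetadataPairs(object $client, string $schemaVersionId, array $pairs): int
{
    // The limit of 10 comes from the operation description above.
    if (count($pairs) > 10) {
        throw new InvalidArgumentException(
            'At most 10 key value pairs are allowed per schema version.'
        );
    }

    $sent = 0;
    foreach ($pairs as $key => $value) {
        $client->putSchemaVersionMetadata([
            'SchemaVersionId' => $schemaVersionId,
            'MetadataKeyValue' => [
                'MetadataKey' => (string) $key,
                'MetadataValue' => (string) $value,
            ],
        ]);
        $sent++;
    }
    return $sent;
}
```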
PutWorkflowRunProperties
$result = $client->putWorkflowRunProperties([/* ... */]);
$promise = $client->putWorkflowRunPropertiesAsync([/* ... */]);
Puts the specified workflow run properties for the given workflow run. If a property already exists for the specified run, its value is overwritten; otherwise, the property is added to the existing properties.
Parameter Syntax
$result = $client->putWorkflowRunProperties([ 'Name' => '<string>', // REQUIRED 'RunId' => '<string>', // REQUIRED 'RunProperties' => ['<string>', ...], // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
Name of the workflow which was run.
- RunId
-
- Required: Yes
- Type: string
The ID of the workflow run for which the run properties should be updated.
- RunProperties
-
- Required: Yes
- Type: Associative array of custom strings keys (IdString) to strings
The properties to put for the specified run.
Result Syntax
[]
Result Details
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
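The override-or-add behavior described for RunProperties can be modeled locally. The sketch below does not call the service. It only illustrates, with placeholder property names, how existing and incoming properties would be expected to combine. `mergeRunProperties` is a hypothetical function.

```php
<?php
/**
 * Local illustration only: models the described rule that an existing
 * property is overwritten and a new property is added.
 */
function mergeRunProperties(array $existing, array $incoming): array
{
    // array_replace overwrites values for matching keys and appends
    // keys that are not yet present, without renumbering keys.
    return array_replace($existing, $incoming);
}

$existing = ['stage' => 'extract', 'owner' => 'etl'];
$incoming = ['stage' => 'load', 'batch' => '42'];
$merged = mergeRunProperties($existing, $incoming);
// $merged is ['stage' => 'load', 'owner' => 'etl', 'batch' => '42']

// The corresponding request would send only the incoming properties:
// $client->putWorkflowRunProperties([
//     'Name' => '<workflow name>',
//     'RunId' => '<run id>',
//     'RunProperties' => $incoming,
// ]);
```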
QuerySchemaVersionMetadata
$result = $client->querySchemaVersionMetadata([/* ... */]);
$promise = $client->querySchemaVersionMetadataAsync([/* ... */]);
Queries for the schema version metadata information.
Parameter Syntax
$result = $client->querySchemaVersionMetadata([ 'MaxResults' => <integer>, 'MetadataList' => [ [ 'MetadataKey' => '<string>', 'MetadataValue' => '<string>', ], // ... ], 'NextToken' => '<string>', 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => [ 'LatestVersion' => true || false, 'VersionNumber' => <integer>, ], ]);
Parameter Details
Members
- MaxResults
-
- Type: int
Maximum number of results required per page. If the value is not supplied, this will be defaulted to 25 per page.
- MetadataList
-
- Type: Array of MetadataKeyValuePair structures
Search key-value pairs for metadata. If they are not provided, all the metadata information is fetched.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
- SchemaId
-
- Type: SchemaId structure
A wrapper structure that may contain the schema name and Amazon Resource Name (ARN).
- SchemaVersionId
-
- Type: string
The unique version ID of the schema version.
- SchemaVersionNumber
-
- Type: SchemaVersionNumber structure
The version number of the schema.
Result Syntax
[ 'MetadataInfoMap' => [ '<MetadataKeyString>' => [ 'CreatedTime' => '<string>', 'MetadataValue' => '<string>', 'OtherMetadataValueList' => [ [ 'CreatedTime' => '<string>', 'MetadataValue' => '<string>', ], // ... ], ], // ... ], 'NextToken' => '<string>', 'SchemaVersionId' => '<string>', ]
Result Details
Members
- MetadataInfoMap
-
- Type: Associative array of custom strings keys (MetadataKeyString) to MetadataInfo structures
A map of a metadata key and associated values.
- NextToken
-
- Type: string
A continuation token for paginating the returned list of tokens, returned if the current segment of the list is not the last.
- SchemaVersionId
-
- Type: string
The unique version ID of the schema version.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
RegisterSchemaVersion
$result = $client->registerSchemaVersion([/* ... */]);
$promise = $client->registerSchemaVersionAsync([/* ... */]);
Adds a new version to the existing schema. Returns an error if the new version of the schema does not meet the compatibility requirements of the schema set. This API will not create a new schema set and will return a 404 error if the schema set is not already present in the Schema Registry.
If this is the first schema definition to be registered in the Schema Registry, this API will store the schema version and return immediately. Otherwise, this call has the potential to run longer than other operations due to compatibility modes. You can call the GetSchemaVersion API with the SchemaVersionId to check compatibility modes.
If the same schema definition is already stored in Schema Registry as a version, the schema ID of the existing schema is returned to the caller.
Parameter Syntax
$result = $client->registerSchemaVersion([ 'SchemaDefinition' => '<string>', // REQUIRED 'SchemaId' => [ // REQUIRED 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], ]);
Parameter Details
Members
- SchemaDefinition
-
- Required: Yes
- Type: string
The schema definition using the DataFormat setting for the SchemaName.
- SchemaId
-
- Required: Yes
- Type: SchemaId structure
This is a wrapper structure to contain schema identity fields. The structure contains:
  - SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. Either SchemaArn or SchemaName and RegistryName has to be provided.
  - SchemaId$SchemaName: The name of the schema. Either SchemaArn or SchemaName and RegistryName has to be provided.
Result Syntax
[ 'SchemaVersionId' => '<string>', 'Status' => 'AVAILABLE|PENDING|FAILURE|DELETING', 'VersionNumber' => <integer>, ]
Result Details
Members
- SchemaVersionId
-
- Type: string
The unique ID that represents the version of this schema.
- Status
-
- Type: string
The status of the schema version.
- VersionNumber
-
- Type: long (int|float)
The version of this schema (for sync flow only, in case this is the first version).
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- InternalServiceException:
An internal service error occurred.
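The SchemaId rule (either SchemaArn, or SchemaName together with RegistryName) can be enforced before the request is sent. The sketch below is one way to do that. `buildSchemaId` is a hypothetical helper, and the identifiers in the usage note are placeholders.

```php
<?php
/**
 * Hypothetical helper: builds the SchemaId structure from either an ARN,
 * or a schema name together with a registry name.
 */
function buildSchemaId(
    ?string $schemaArn = null,
    ?string $schemaName = null,
    ?string $registryName = null
): array {
    if ($schemaArn !== null) {
        return ['SchemaArn' => $schemaArn];
    }
    if ($schemaName !== null && $registryName !== null) {
        return ['SchemaName' => $schemaName, 'RegistryName' => $registryName];
    }
    throw new InvalidArgumentException(
        'Provide SchemaArn, or both SchemaName and RegistryName.'
    );
}
```

The resulting array can be passed as the SchemaId member, for example `$client->registerSchemaVersion(['SchemaId' => buildSchemaId(null, 'orders', 'sales-registry'), 'SchemaDefinition' => $definition])`, where `$definition` holds the schema text in the schema's DataFormat.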
RemoveSchemaVersionMetadata
$result = $client->removeSchemaVersionMetadata([/* ... */]);
$promise = $client->removeSchemaVersionMetadataAsync([/* ... */]);
Removes a key value pair from the schema version metadata for the specified schema version ID.
Parameter Syntax
$result = $client->removeSchemaVersionMetadata([ 'MetadataKeyValue' => [ // REQUIRED 'MetadataKey' => '<string>', 'MetadataValue' => '<string>', ], 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => [ 'LatestVersion' => true || false, 'VersionNumber' => <integer>, ], ]);
Parameter Details
Members
- MetadataKeyValue
-
- Required: Yes
- Type: MetadataKeyValuePair structure
The value of the metadata key.
- SchemaId
-
- Type: SchemaId structure
A wrapper structure that may contain the schema name and Amazon Resource Name (ARN).
- SchemaVersionId
-
- Type: string
The unique version ID of the schema version.
- SchemaVersionNumber
-
- Type: SchemaVersionNumber structure
The version number of the schema.
Result Syntax
[ 'LatestVersion' => true || false, 'MetadataKey' => '<string>', 'MetadataValue' => '<string>', 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', 'SchemaVersionId' => '<string>', 'VersionNumber' => <integer>, ]
Result Details
Members
- LatestVersion
-
- Type: boolean
The latest version of the schema.
- MetadataKey
-
- Type: string
The metadata key.
- MetadataValue
-
- Type: string
The value of the metadata key.
- RegistryName
-
- Type: string
The name of the registry.
- SchemaArn
-
- Type: string
The Amazon Resource Name (ARN) of the schema.
- SchemaName
-
- Type: string
The name of the schema.
- SchemaVersionId
-
- Type: string
The version ID for the schema version.
- VersionNumber
-
- Type: long (int|float)
The version number of the schema.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
ResetJobBookmark
$result = $client->resetJobBookmark([/* ... */]);
$promise = $client->resetJobBookmarkAsync([/* ... */]);
Resets a bookmark entry.
Parameter Syntax
$result = $client->resetJobBookmark([ 'JobName' => '<string>', // REQUIRED 'RunId' => '<string>', ]);
Parameter Details
Members
- JobName
-
- Required: Yes
- Type: string
The name of the job in question.
- RunId
-
- Type: string
The unique run identifier associated with this job run.
Result Syntax
[ 'JobBookmarkEntry' => [ 'Attempt' => <integer>, 'JobBookmark' => '<string>', 'JobName' => '<string>', 'PreviousRunId' => '<string>', 'Run' => <integer>, 'RunId' => '<string>', 'Version' => <integer>, ], ]
Result Details
Members
- JobBookmarkEntry
-
- Type: JobBookmarkEntry structure
The reset bookmark entry.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
ResumeWorkflowRun
$result = $client->resumeWorkflowRun([/* ... */]);
$promise = $client->resumeWorkflowRunAsync([/* ... */]);
Restarts selected nodes of a previous partially completed workflow run and resumes the workflow run. The selected nodes and all nodes that are downstream from the selected nodes are run.
Parameter Syntax
$result = $client->resumeWorkflowRun([ 'Name' => '<string>', // REQUIRED 'NodeIds' => ['<string>', ...], // REQUIRED 'RunId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the workflow to resume.
- NodeIds
-
- Required: Yes
- Type: Array of strings
A list of the node IDs for the nodes you want to restart. The nodes that are to be restarted must have a run attempt in the original run.
- RunId
-
- Required: Yes
- Type: string
The ID of the workflow run to resume.
Result Syntax
[ 'NodeIds' => ['<string>', ...], 'RunId' => '<string>', ]
Result Details
Members
- NodeIds
-
- Type: Array of strings
A list of the node IDs for the nodes that were actually restarted.
- RunId
-
- Type: string
The new ID assigned to the resumed workflow run. Each resume of a workflow run will have a new run ID.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ConcurrentRunsExceededException:
Too many jobs are being run concurrently.
- IllegalWorkflowStateException:
The workflow is in an invalid state to perform a requested operation.
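Since the response lists the nodes that were actually restarted and assigns a new RunId, it can be useful to compare the response with the request. The sketch below shows one way to do that. `resumeAndDiff` is a hypothetical helper, and `$client` is assumed to expose `resumeWorkflowRun` as documented above.

```php
<?php
/**
 * Hypothetical helper: resumes a workflow run and reports the new run ID
 * plus any requested nodes that the response does not list as restarted.
 */
function resumeAndDiff(object $client, string $name, string $runId, array $nodeIds): array
{
    $result = $client->resumeWorkflowRun([
        'Name' => $name,
        'RunId' => $runId,
        'NodeIds' => $nodeIds,
    ]);

    return [
        // Each resume gets a new run ID, so keep it for later calls.
        'NewRunId' => $result['RunId'] ?? null,
        // Requested nodes missing from the response's NodeIds.
        'NotRestarted' => array_values(
            array_diff($nodeIds, $result['NodeIds'] ?? [])
        ),
    ];
}
```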
RunStatement
$result = $client->runStatement([/* ... */]);
$promise = $client->runStatementAsync([/* ... */]);
Executes the statement.
Parameter Syntax
$result = $client->runStatement([ 'Code' => '<string>', // REQUIRED 'RequestOrigin' => '<string>', 'SessionId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Code
-
- Required: Yes
- Type: string
The statement code to be run.
- RequestOrigin
-
- Type: string
The origin of the request.
- SessionId
-
- Required: Yes
- Type: string
The Session Id of the statement to be run.
Result Syntax
[ 'Id' => <integer>, ]
Result Details
Members
- Id
-
- Type: int
Returns the Id of the statement that was run.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- ValidationException:
A value could not be validated.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- IllegalSessionStateException:
The session is in an invalid state to perform a requested operation.
SearchTables
$result = $client->searchTables([/* ... */]);
$promise = $client->searchTablesAsync([/* ... */]);
Searches a set of tables based on properties in the table metadata as well as on the parent database. You can search against text or filter conditions.
You can only get tables that you have access to based on the security policies defined in Lake Formation. You need at least a read-only access to the table for it to be returned. If you do not have access to all the columns in the table, these columns will not be searched against when returning the list of tables back to you. If you have access to the columns but not the data in the columns, those columns and the associated metadata for those columns will be included in the search.
Parameter Syntax
$result = $client->searchTables([ 'CatalogId' => '<string>', 'Filters' => [ [ 'Comparator' => 'EQUALS|GREATER_THAN|LESS_THAN|GREATER_THAN_EQUALS|LESS_THAN_EQUALS', 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'IncludeStatusDetails' => true || false, 'MaxResults' => <integer>, 'NextToken' => '<string>', 'ResourceShareType' => 'FOREIGN|ALL|FEDERATED', 'SearchText' => '<string>', 'SortCriteria' => [ [ 'FieldName' => '<string>', 'Sort' => 'ASC|DESC', ], // ... ], ]);
Parameter Details
Members
- CatalogId
-
- Type: string
A unique identifier, consisting of account_id.
- Filters
-
- Type: Array of PropertyPredicate structures
A list of key-value pairs, and a comparator used to filter the search results. Returns all entities matching the predicate.
The Comparator member of the PropertyPredicate struct is used only for time fields, and can be omitted for other field types. Also, when comparing string values, such as when Key=Name, a fuzzy match algorithm is used. The Key field (for example, the value of the Name field) is split on certain punctuation characters, for example, -, :, #, etc. into tokens. Then each token is exact-match compared with the Value member of PropertyPredicate. For example, if Key=Name and Value=link, tables named customer-link and xx-link-yy are returned, but xxlinkyy is not returned.
- IncludeStatusDetails
-
- Type: boolean
Specifies whether to include status details related to a request to create or update a Glue Data Catalog view.
- MaxResults
-
- Type: int
The maximum number of tables to return in a single response.
- NextToken
-
- Type: string
A continuation token, included if this is a continuation call.
- ResourceShareType
-
- Type: string
Allows you to specify that you want to search the tables shared with your account. The allowable values are FOREIGN or ALL.
  - If set to FOREIGN, will search the tables shared with your account.
  - If set to ALL, will search the tables shared with your account, as well as the tables in your local account.
- SearchText
-
- Type: string
A string used for a text search.
Specifying a value in quotes filters based on an exact match to the value.
- SortCriteria
-
- Type: Array of SortCriterion structures
A list of criteria for sorting the results by a field name, in an ascending or descending order.
Result Syntax
[ 'NextToken' => '<string>', 'TableList' => [ [ 'CatalogId' => '<string>', 'CreateTime' => <DateTime>, 'CreatedBy' => '<string>', 'DatabaseName' => '<string>', 'Description' => '<string>', 'FederatedTable' => [ 'ConnectionName' => '<string>', 'DatabaseIdentifier' => '<string>', 'Identifier' => '<string>', ], 'IsMultiDialectView' => true || false, 'IsRegisteredWithLakeFormation' => true || false, 'LastAccessTime' => <DateTime>, 'LastAnalyzedTime' => <DateTime>, 'Name' => '<string>', 'Owner' => '<string>', 'Parameters' => ['<string>', ...], 'PartitionKeys' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Retention' => <integer>, 'Status' => [ 'Action' => 'UPDATE|CREATE', 'Details' => [ 'RequestedChange' => [...], // RECURSIVE 'ViewValidations' => [ [ 'Dialect' => 'REDSHIFT|ATHENA|SPARK', 'DialectVersion' => '<string>', 'Error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], 'State' => 'QUEUED|IN_PROGRESS|SUCCESS|STOPPED|FAILED', 'UpdateTime' => <DateTime>, 'ViewValidationText' => '<string>', ], // ... ], ], 'Error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], 'RequestTime' => <DateTime>, 'RequestedBy' => '<string>', 'State' => 'QUEUED|IN_PROGRESS|SUCCESS|STOPPED|FAILED', 'UpdateTime' => <DateTime>, 'UpdatedBy' => '<string>', ], 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... 
], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', 'SortOrder' => <integer>, ], // ... ], 'StoredAsSubDirectories' => true || false, ], 'TableType' => '<string>', 'TargetTable' => [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'Name' => '<string>', 'Region' => '<string>', ], 'UpdateTime' => <DateTime>, 'VersionId' => '<string>', 'ViewDefinition' => [ 'Definer' => '<string>', 'IsProtected' => true || false, 'Representations' => [ [ 'Dialect' => 'REDSHIFT|ATHENA|SPARK', 'DialectVersion' => '<string>', 'IsStale' => true || false, 'ValidationConnection' => '<string>', 'ViewExpandedText' => '<string>', 'ViewOriginalText' => '<string>', ], // ... ], 'SubObjects' => ['<string>', ...], ], 'ViewExpandedText' => '<string>', 'ViewOriginalText' => '<string>', ], // ... ], ]
Result Details
Members
- NextToken
-
- Type: string
A continuation token, present if the current list segment is not the last.
- TableList
-
- Type: Array of Table structures
A list of the requested Table objects. The SearchTables response returns only the tables that you have access to.
Errors
- InternalServiceException:
An internal service error occurred.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
StartBlueprintRun
$result = $client->startBlueprintRun([/* ... */]);
$promise = $client->startBlueprintRunAsync([/* ... */]);
Starts a new run of the specified blueprint.
Parameter Syntax
$result = $client->startBlueprintRun([
    'BlueprintName' => '<string>', // REQUIRED
    'Parameters' => '<string>',
    'RoleArn' => '<string>', // REQUIRED
]);
Parameter Details
Members
- BlueprintName
-
- Required: Yes
- Type: string
The name of the blueprint.
- Parameters
-
- Type: string
Specifies the parameters as a BlueprintParameters object.
- RoleArn
-
- Required: Yes
- Type: string
Specifies the IAM role used to create the workflow.
Result Syntax
[ 'RunId' => '<string>', ]
Result Details
Members
- RunId
-
- Type: string
The run ID for this blueprint run.
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- EntityNotFoundException:
A specified entity does not exist.
- IllegalBlueprintStateException:
The blueprint is in an invalid state to perform a requested operation.
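As a minimal sketch, the request for StartBlueprintRun can be assembled as shown here. The helper name buildStartBlueprintRunParams, the blueprint name, the role ARN, and the parameter key are all hypothetical, and encoding the Parameters string as JSON is an assumption (this section only states that Parameters is a string).

```php
<?php
/**
 * Builds the parameter array for $client->startBlueprintRun().
 * Hypothetical helper for illustration; it is not part of the SDK.
 */
function buildStartBlueprintRunParams(string $blueprintName, string $roleArn, array $blueprintParameters = []): array
{
    if ($blueprintName === '' || $roleArn === '') {
        throw new InvalidArgumentException('BlueprintName and RoleArn are required.');
    }

    $params = [
        'BlueprintName' => $blueprintName,
        'RoleArn'       => $roleArn,
    ];

    if ($blueprintParameters !== []) {
        // Assumption: the Parameters string carries the values JSON-encoded.
        $params['Parameters'] = json_encode($blueprintParameters, JSON_THROW_ON_ERROR);
    }

    return $params;
}

// Hypothetical values, for illustration only.
$params = buildStartBlueprintRunParams(
    'example-blueprint',
    'arn:aws:iam::123456789012:role/ExampleGlueRole',
    ['WorkflowName' => 'example-workflow']
);

// With a configured client:
// $result = $client->startBlueprintRun($params);
// $runId  = $result['RunId'];
```

Optional keys are left out of the array when no value is given, so the service default applies.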
StartColumnStatisticsTaskRun
$result = $client->startColumnStatisticsTaskRun([/* ... */]);
$promise = $client->startColumnStatisticsTaskRunAsync([/* ... */]);
Starts a column statistics task run, for a specified table and columns.
Parameter Syntax
$result = $client->startColumnStatisticsTaskRun([
    'CatalogID' => '<string>',
    'ColumnNameList' => ['<string>', ...],
    'DatabaseName' => '<string>', // REQUIRED
    'Role' => '<string>', // REQUIRED
    'SampleSize' => <float>,
    'SecurityConfiguration' => '<string>',
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogID
-
- Type: string
The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.
- ColumnNameList
-
- Type: Array of strings
A list of the column names for which to generate statistics. If none is supplied, all column names for the table will be used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database where the table resides.
- Role
-
- Required: Yes
- Type: string
The IAM role that the service assumes to generate statistics.
- SampleSize
-
- Type: double
The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.
- SecurityConfiguration
-
- Type: string
Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.
- TableName
-
- Required: Yes
- Type: string
The name of the table for which to generate statistics.
Result Syntax
[ 'ColumnStatisticsTaskRunId' => '<string>', ]
Result Details
Members
- ColumnStatisticsTaskRunId
-
- Type: string
The identifier for the column statistics task run.
Errors
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- ColumnStatisticsTaskRunningException:
An exception thrown when you try to start another job while running a column stats generation job.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- InvalidInputException:
The input provided was not valid.
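The parameter rules above can be sketched as a small request builder. The helper name and the sample values are hypothetical, and the range check on SampleSize is an assumption derived from its description as a percentage.

```php
<?php
/**
 * Builds the parameter array for $client->startColumnStatisticsTaskRun().
 * Hypothetical helper for illustration; it is not part of the SDK.
 */
function buildColumnStatisticsTaskRunParams(
    string $database,
    string $table,
    string $role,
    ?array $columns = null,
    ?float $samplePercent = null
): array {
    // Assumption: SampleSize is a percentage, so only values in (0, 100] make sense.
    if ($samplePercent !== null && ($samplePercent <= 0.0 || $samplePercent > 100.0)) {
        throw new InvalidArgumentException('SampleSize must be greater than 0 and at most 100.');
    }

    $params = [
        'DatabaseName' => $database,
        'TableName'    => $table,
        'Role'         => $role,
    ];

    // Omitting ColumnNameList means all columns; omitting SampleSize means the entire table.
    if ($columns !== null && $columns !== []) {
        $params['ColumnNameList'] = array_values($columns);
    }
    if ($samplePercent !== null) {
        $params['SampleSize'] = $samplePercent;
    }

    return $params;
}

// Hypothetical values, for illustration only.
$statsParams = buildColumnStatisticsTaskRunParams(
    'sales_db',
    'orders',
    'arn:aws:iam::123456789012:role/ExampleStatsRole',
    ['order_id', 'amount'],
    25.0
);

// With a configured client:
// $taskRunId = $client->startColumnStatisticsTaskRun($statsParams)['ColumnStatisticsTaskRunId'];
```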
StartColumnStatisticsTaskRunSchedule
$result = $client->startColumnStatisticsTaskRunSchedule([/* ... */]);
$promise = $client->startColumnStatisticsTaskRunScheduleAsync([/* ... */]);
Starts a column statistics task run schedule.
Parameter Syntax
$result = $client->startColumnStatisticsTaskRunSchedule([
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database where the table resides.
- TableName
-
- Required: Yes
- Type: string
The name of the table for which to start a column statistics task run schedule.
Result Syntax
[]
Result Details
Errors
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
StartCrawler
$result = $client->startCrawler([/* ... */]);
$promise = $client->startCrawlerAsync([/* ... */]);
Starts a crawl using the specified crawler, regardless of what is scheduled. If the crawler is already running, returns a CrawlerRunningException.
Parameter Syntax
$result = $client->startCrawler([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
Name of the crawler to start.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- CrawlerRunningException:
The operation cannot be performed because the crawler is already running.
- OperationTimeoutException:
The operation timed out.
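Because StartCrawler fails with CrawlerRunningException when the crawler is already running, callers often want to treat that case as a non-error. The sketch below shows one way to do so. The helper name startCrawlerIfIdle is hypothetical, and the code assumes the thrown exception exposes getAwsErrorCode(), as the SDK's Aws\Exception\AwsException does.

```php
<?php
/**
 * Starts a crawler and reports whether a new crawl was started.
 * Returns false when the crawler was already running.
 * Hypothetical helper; $client is expected to be an Aws\Glue\GlueClient.
 */
function startCrawlerIfIdle(object $client, string $crawlerName): bool
{
    try {
        $client->startCrawler(['Name' => $crawlerName]);
        return true;
    } catch (Exception $e) {
        // Assumption: service errors expose their error code via getAwsErrorCode().
        $code = method_exists($e, 'getAwsErrorCode') ? $e->getAwsErrorCode() : null;
        if ($code === 'CrawlerRunningException') {
            return false;
        }
        // EntityNotFoundException, OperationTimeoutException, and anything else propagate.
        throw $e;
    }
}

// With a configured client (the crawler name is hypothetical):
// $started = startCrawlerIfIdle($client, 'example-crawler');
```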
StartCrawlerSchedule
$result = $client->startCrawlerSchedule([/* ... */]);
$promise = $client->startCrawlerScheduleAsync([/* ... */]);
Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED.
Parameter Syntax
$result = $client->startCrawlerSchedule([
    'CrawlerName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CrawlerName
-
- Required: Yes
- Type: string
Name of the crawler to schedule.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- SchedulerRunningException:
The specified scheduler is already running.
- SchedulerTransitioningException:
The specified scheduler is transitioning.
- NoScheduleException:
There is no applicable schedule.
- OperationTimeoutException:
The operation timed out.
StartDataQualityRuleRecommendationRun
$result = $client->startDataQualityRuleRecommendationRun([/* ... */]);
$promise = $client->startDataQualityRuleRecommendationRunAsync([/* ... */]);
Starts a recommendation run that is used to generate rules when you don't know what rules to write. Glue Data Quality analyzes the data and comes up with recommendations for a potential ruleset. You can then triage the ruleset and modify the generated ruleset to your liking.
Recommendation runs are automatically deleted after 90 days.
Parameter Syntax
$result = $client->startDataQualityRuleRecommendationRun([
    'ClientToken' => '<string>',
    'CreatedRulesetName' => '<string>',
    'DataQualitySecurityConfiguration' => '<string>',
    'DataSource' => [ // REQUIRED
        'GlueTable' => [ // REQUIRED
            'AdditionalOptions' => ['<string>', ...],
            'CatalogId' => '<string>',
            'ConnectionName' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
        ],
    ],
    'NumberOfWorkers' => <integer>,
    'Role' => '<string>', // REQUIRED
    'Timeout' => <integer>,
]);
Parameter Details
Members
- ClientToken
-
- Type: string
Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.
- CreatedRulesetName
-
- Type: string
A name for the ruleset.
- DataQualitySecurityConfiguration
-
- Type: string
The name of the security configuration created with the data quality encryption option.
- DataSource
-
- Required: Yes
- Type: DataSource structure
The data source (Glue table) associated with this run.
- NumberOfWorkers
-
- Type: int
The number of G.1X workers to be used in the run. The default is 5.
- Role
-
- Required: Yes
- Type: string
An IAM role supplied to encrypt the results of the run.
- Timeout
-
- Type: int
The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters
TIMEOUT
status. The default is 2,880 minutes (48 hours).
Result Syntax
[ 'RunId' => '<string>', ]
Result Details
Members
- RunId
-
- Type: string
The unique run identifier associated with this run.
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- ConflictException:
The
CreatePartitions
API was called on a table that has indexes enabled.
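The ClientToken guidance above can be sketched as follows: generate one random token per logical request and reuse the same array if the call has to be retried. The helper name and sample values are hypothetical, and a 32-character hex string stands in for the UUID mentioned above.

```php
<?php
/**
 * Builds the parameter array for $client->startDataQualityRuleRecommendationRun().
 * Hypothetical helper for illustration; it is not part of the SDK.
 */
function buildRuleRecommendationRunParams(
    string $database,
    string $table,
    string $role,
    ?string $createdRulesetName = null
): array {
    $params = [
        'DataSource' => [
            'GlueTable' => [
                'DatabaseName' => $database,
                'TableName'    => $table,
            ],
        ],
        'Role' => $role,
        // Random idempotency token (a hex string is used here instead of a formatted UUID).
        'ClientToken' => bin2hex(random_bytes(16)),
    ];

    if ($createdRulesetName !== null) {
        $params['CreatedRulesetName'] = $createdRulesetName;
    }

    return $params;
}

// Hypothetical values, for illustration only.
$recommendationParams = buildRuleRecommendationRunParams(
    'sales_db',
    'orders',
    'arn:aws:iam::123456789012:role/ExampleDataQualityRole',
    'orders-recommended-rules'
);

// With a configured client, send the SAME array again when retrying:
// $runId = $client->startDataQualityRuleRecommendationRun($recommendationParams)['RunId'];
```

Building the array once and resending it unchanged keeps the token stable across retries, which is what makes the token useful for idempotency.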
StartDataQualityRulesetEvaluationRun
$result = $client->startDataQualityRulesetEvaluationRun([/* ... */]);
$promise = $client->startDataQualityRulesetEvaluationRunAsync([/* ... */]);
Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table). The evaluation computes results which you can retrieve with the GetDataQualityResult
API.
Parameter Syntax
$result = $client->startDataQualityRulesetEvaluationRun([
    'AdditionalDataSources' => [
        '<NameString>' => [
            'GlueTable' => [ // REQUIRED
                'AdditionalOptions' => ['<string>', ...],
                'CatalogId' => '<string>',
                'ConnectionName' => '<string>',
                'DatabaseName' => '<string>', // REQUIRED
                'TableName' => '<string>', // REQUIRED
            ],
        ],
        // ...
    ],
    'AdditionalRunOptions' => [
        'CloudWatchMetricsEnabled' => true || false,
        'CompositeRuleEvaluationMethod' => 'COLUMN|ROW',
        'ResultsS3Prefix' => '<string>',
    ],
    'ClientToken' => '<string>',
    'DataSource' => [ // REQUIRED
        'GlueTable' => [ // REQUIRED
            'AdditionalOptions' => ['<string>', ...],
            'CatalogId' => '<string>',
            'ConnectionName' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
        ],
    ],
    'NumberOfWorkers' => <integer>,
    'Role' => '<string>', // REQUIRED
    'RulesetNames' => ['<string>', ...], // REQUIRED
    'Timeout' => <integer>,
]);
Parameter Details
Members
- AdditionalDataSources
-
- Type: Associative array of custom strings keys (NameString) to DataSource structures
A map of reference strings to additional data sources you can specify for an evaluation run.
- AdditionalRunOptions
-
- Type: DataQualityEvaluationRunAdditionalRunOptions structure
Additional run options you can specify for an evaluation run.
- ClientToken
-
- Type: string
Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.
- DataSource
-
- Required: Yes
- Type: DataSource structure
The data source (Glue table) associated with this run.
- NumberOfWorkers
-
- Type: int
The number of G.1X workers to be used in the run. The default is 5.
- Role
-
- Required: Yes
- Type: string
An IAM role supplied to encrypt the results of the run.
- RulesetNames
-
- Required: Yes
- Type: Array of strings
A list of ruleset names.
- Timeout
-
- Type: int
The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters
TIMEOUT
status. The default is 2,880 minutes (48 hours).
Result Syntax
[ 'RunId' => '<string>', ]
Result Details
Members
- RunId
-
- Type: string
The unique run identifier associated with this run.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- ConflictException:
The
CreatePartitions
API was called on a table that has indexes enabled.
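The following sketch shows how the AdditionalDataSources map fits together with the primary DataSource: each key is a reference string and each value has the same shape as DataSource. Both helper names, the reference key, and the table and ruleset names are hypothetical.

```php
<?php
/** Returns a DataSource structure that points at a Glue table. Hypothetical helper. */
function glueTableSource(string $database, string $table): array
{
    return ['GlueTable' => ['DatabaseName' => $database, 'TableName' => $table]];
}

/**
 * Builds the parameter array for $client->startDataQualityRulesetEvaluationRun().
 * $additionalSources maps a reference string to a DataSource structure.
 * Hypothetical helper for illustration; it is not part of the SDK.
 */
function buildRulesetEvaluationRunParams(
    array $primarySource,
    string $role,
    array $rulesetNames,
    array $additionalSources = []
): array {
    if ($rulesetNames === []) {
        throw new InvalidArgumentException('RulesetNames is required and must not be empty.');
    }

    $params = [
        'DataSource'   => $primarySource,
        'Role'         => $role,
        'RulesetNames' => array_values($rulesetNames),
    ];

    if ($additionalSources !== []) {
        $params['AdditionalDataSources'] = $additionalSources;
    }

    return $params;
}

// Hypothetical values, for illustration only.
$evaluationParams = buildRulesetEvaluationRunParams(
    glueTableSource('sales_db', 'orders'),
    'arn:aws:iam::123456789012:role/ExampleDataQualityRole',
    ['orders-recommended-rules'],
    ['customers_ref' => glueTableSource('sales_db', 'customers')]
);

// With a configured client:
// $runId = $client->startDataQualityRulesetEvaluationRun($evaluationParams)['RunId'];
```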
StartExportLabelsTaskRun
$result = $client->startExportLabelsTaskRun([/* ... */]);
$promise = $client->startExportLabelsTaskRunAsync([/* ... */]);
Begins an asynchronous task to export all labeled data for a particular transform. This task is the only label-related API call that is not part of the typical active learning workflow. You typically use StartExportLabelsTaskRun when you want to work with all of your existing labels at the same time, such as when you want to remove or change labels that were previously submitted as truth. This API operation accepts the TransformId whose labels you want to export and an Amazon Simple Storage Service (Amazon S3) path to export the labels to. The operation returns a TaskRunId. You can check on the status of your task run by calling the GetMLTaskRun API.
Parameter Syntax
$result = $client->startExportLabelsTaskRun([
    'OutputS3Path' => '<string>', // REQUIRED
    'TransformId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- OutputS3Path
-
- Required: Yes
- Type: string
The Amazon S3 path where you export the labels.
- TransformId
-
- Required: Yes
- Type: string
The unique identifier of the machine learning transform.
Result Syntax
[ 'TaskRunId' => '<string>', ]
Result Details
Members
- TaskRunId
-
- Type: string
The unique identifier for the task run.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
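Since the task runs asynchronously, a caller usually polls GetMLTaskRun with the returned TaskRunId. The sketch below is one way to do that. The helper name waitForMLTaskRun is hypothetical, and the Status field and its terminal values (SUCCEEDED, FAILED, STOPPED, TIMEOUT) are assumptions about the GetMLTaskRun result that should be checked against that operation's section.

```php
<?php
/**
 * Polls getMLTaskRun until the task run reaches a terminal status.
 * Hypothetical helper; $client is expected to be an Aws\Glue\GlueClient.
 */
function waitForMLTaskRun(
    object $client,
    string $transformId,
    string $taskRunId,
    int $maxAttempts = 60,
    int $delaySeconds = 10
): string {
    // Assumption: these are the statuses after which the run no longer changes.
    $terminal = ['SUCCEEDED', 'FAILED', 'STOPPED', 'TIMEOUT'];

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $run = $client->getMLTaskRun([
            'TransformId' => $transformId,
            'TaskRunId'   => $taskRunId,
        ]);

        $status = (string) ($run['Status'] ?? '');
        if (in_array($status, $terminal, true)) {
            return $status;
        }

        // Wait before the next check, but not after the final attempt.
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }

    throw new RuntimeException("Task run $taskRunId did not finish after $maxAttempts checks.");
}

// With a configured client (the identifiers and S3 path are hypothetical):
// $taskRunId = $client->startExportLabelsTaskRun([
//     'TransformId'  => 'tfm-example',
//     'OutputS3Path' => 's3://example-bucket/labels/',
// ])['TaskRunId'];
// $finalStatus = waitForMLTaskRun($client, 'tfm-example', $taskRunId);
```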
StartImportLabelsTaskRun
$result = $client->startImportLabelsTaskRun([/* ... */]);
$promise = $client->startImportLabelsTaskRunAsync([/* ... */]);
Enables you to provide additional labels (examples of truth) to be used to teach the machine learning transform and improve its quality. This API operation is generally used as part of the active learning workflow that starts with the StartMLLabelingSetGenerationTaskRun call and that ultimately results in improving the quality of your machine learning transform.
After the StartMLLabelingSetGenerationTaskRun finishes, Glue machine learning will have generated a series of questions for humans to answer. (Answering these questions is often called 'labeling' in machine learning workflows.) In the case of the FindMatches transform, these questions are of the form, “What is the correct way to group these rows together into groups composed entirely of matching records?” After the labeling process is finished, users upload their answers/labels with a call to StartImportLabelsTaskRun. After StartImportLabelsTaskRun finishes, all future runs of the machine learning transform use the new and improved labels and perform a higher-quality transformation.
By default, StartMLLabelingSetGenerationTaskRun continually learns from and combines all labels that you upload unless you set ReplaceAllLabels to true. If you set ReplaceAllLabels to true, StartImportLabelsTaskRun deletes and forgets all previously uploaded labels and learns only from the exact set that you upload. Replacing labels can be helpful if you realize that you previously uploaded incorrect labels, and you believe that they are having a negative effect on your transform quality.
You can check on the status of your task run by calling the GetMLTaskRun operation.
Parameter Syntax
$result = $client->startImportLabelsTaskRun([
    'InputS3Path' => '<string>', // REQUIRED
    'ReplaceAllLabels' => true || false,
    'TransformId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- InputS3Path
-
- Required: Yes
- Type: string
The Amazon Simple Storage Service (Amazon S3) path from where you import the labels.
- ReplaceAllLabels
-
- Type: boolean
Indicates whether to overwrite your existing labels.
- TransformId
-
- Required: Yes
- Type: string
The unique identifier of the machine learning transform.
Result Syntax
[ 'TaskRunId' => '<string>', ]
Result Details
Members
- TaskRunId
-
- Type: string
The unique identifier for the task run.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- InternalServiceException:
An internal service error occurred.
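As a rough sketch, the choice between adding to existing labels and replacing them can be made explicit in a request builder. The helper name and sample values are hypothetical, and the check for an s3:// prefix is an assumption about how the S3 path is written.

```php
<?php
/**
 * Builds the parameter array for $client->startImportLabelsTaskRun().
 * Hypothetical helper for illustration; it is not part of the SDK.
 */
function buildImportLabelsParams(string $transformId, string $inputS3Path, bool $replaceAllLabels = false): array
{
    // Assumption: the path is given as an s3:// URI.
    if (strncmp($inputS3Path, 's3://', 5) !== 0) {
        throw new InvalidArgumentException('InputS3Path is expected to start with s3://');
    }

    $params = [
        'TransformId' => $transformId,
        'InputS3Path' => $inputS3Path,
    ];

    // Only send the flag when replacing: previously uploaded labels are then discarded.
    if ($replaceAllLabels) {
        $params['ReplaceAllLabels'] = true;
    }

    return $params;
}

// Hypothetical values, for illustration only.
$importParams = buildImportLabelsParams('tfm-example', 's3://example-bucket/labels/corrected.csv', true);

// With a configured client:
// $taskRunId = $client->startImportLabelsTaskRun($importParams)['TaskRunId'];
```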
StartJobRun
$result = $client->startJobRun([/* ... */]);
$promise = $client->startJobRunAsync([/* ... */]);
Starts a job run using a job definition.
Parameter Syntax
$result = $client->startJobRun([
    'AllocatedCapacity' => <integer>,
    'Arguments' => ['<string>', ...],
    'ExecutionClass' => 'FLEX|STANDARD',
    'JobName' => '<string>', // REQUIRED
    'JobRunId' => '<string>',
    'JobRunQueuingEnabled' => true || false,
    'MaxCapacity' => <float>,
    'NotificationProperty' => [
        'NotifyDelayAfter' => <integer>,
    ],
    'NumberOfWorkers' => <integer>,
    'SecurityConfiguration' => '<string>',
    'Timeout' => <integer>,
    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
]);
Parameter Details
Members
- AllocatedCapacity
-
- Type: int
This field is deprecated. Use MaxCapacity instead.
The number of Glue data processing units (DPUs) to allocate to this JobRun. You can allocate a minimum of 2 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
- Arguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
The job arguments associated with this run. For this job run, they replace the default arguments set in the job definition itself.
You can specify arguments here that your own job-execution script consumes, as well as arguments that Glue itself consumes.
Job arguments may be logged. Do not pass plaintext secrets as arguments. Retrieve secrets from a Glue Connection, Secrets Manager or other secret management mechanism if you intend to keep them within the Job.
For information about how to specify and consume your own Job arguments, see the Calling Glue APIs in Python topic in the developer guide.
For information about the arguments you can provide to this field when configuring Spark jobs, see the Special Parameters Used by Glue topic in the developer guide.
For information about the arguments you can provide to this field when configuring Ray jobs, see Using job parameters in Ray jobs in the developer guide.
- ExecutionClass
-
- Type: string
Indicates whether the job is run with a standard or flexible execution class. The standard execution-class is ideal for time-sensitive workloads that require fast job startup and dedicated resources.
The flexible execution class is appropriate for time-insensitive jobs whose start and completion times may vary.
Only jobs with Glue version 3.0 and above and command type glueetl will be allowed to set ExecutionClass to FLEX. The flexible execution class is available for Spark jobs.
- JobName
-
- Required: Yes
- Type: string
The name of the job definition to use.
- JobRunId
-
- Type: string
The ID of a previous JobRun to retry.
- JobRunQueuingEnabled
-
- Type: boolean
Specifies whether job run queuing is enabled for the job run.
A value of true means job run queuing is enabled for the job run. If false or not populated, the job run will not be considered for queuing.
- MaxCapacity
-
- Type: double
For Glue version 1.0 or earlier jobs, using the standard worker type, the number of Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
For Glue version 2.0+ jobs, you cannot specify a Maximum capacity. Instead, you should specify a Worker type and the Number of workers.
Do not set MaxCapacity if using WorkerType and NumberOfWorkers.
The value that can be allocated for MaxCapacity depends on whether you are running a Python shell job, an Apache Spark ETL job, or an Apache Spark streaming ETL job:
- When you specify a Python shell job (JobCommand.Name="pythonshell"), you can allocate either 0.0625 or 1 DPU. The default is 0.0625 DPU.
- When you specify an Apache Spark ETL job (JobCommand.Name="glueetl") or Apache Spark streaming ETL job (JobCommand.Name="gluestreaming"), you can allocate from 2 to 100 DPUs. The default is 10 DPUs. This job type cannot have a fractional DPU allocation.
- NotificationProperty
-
- Type: NotificationProperty structure
Specifies configuration properties of a job run notification.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated when a job runs.
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used with this job run.
- Timeout
-
- Type: int
The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. This value overrides the timeout value set in the parent job.
Streaming jobs must have timeout values less than 7 days or 10080 minutes. When the value is left blank, the job will be restarted after 7 days if you have not set up a maintenance window. If you have set up a maintenance window, it will be restarted during the maintenance window after 7 days.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, G.8X or G.025X for Spark jobs. Accepts the value Z.2X for Ray jobs.
- For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84 GB disk (approximately 34 GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, because it offers a scalable and cost-effective way to run most jobs.
- For the G.2X worker type, each worker maps to 2 DPU (8 vCPUs, 32 GB of memory) with 128 GB disk (approximately 77 GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, because it offers a scalable and cost-effective way to run most jobs.
- For the G.4X worker type, each worker maps to 4 DPU (16 vCPUs, 64 GB of memory) with 256 GB disk (approximately 235 GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).
- For the G.8X worker type, each worker maps to 8 DPU (32 vCPUs, 128 GB of memory) with 512 GB disk (approximately 487 GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.
- For the G.025X worker type, each worker maps to 0.25 DPU (2 vCPUs, 4 GB of memory) with 84 GB disk (approximately 34 GB free), and provides 1 executor per worker. We recommend this worker type for low volume streaming jobs. This worker type is only available for Glue version 3.0 streaming jobs.
- For the Z.2X worker type, each worker maps to 2 M-DPU (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120 GB free), and provides up to 8 Ray workers based on the autoscaler.
Result Syntax
[ 'JobRunId' => '<string>', ]
Result Details
Members
- JobRunId
-
- Type: string
The ID assigned to this job run.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentRunsExceededException:
Too many jobs are being run concurrently.
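The capacity rules above (do not combine MaxCapacity with WorkerType and NumberOfWorkers, and each Spark worker type maps to a fixed number of DPUs) can be sketched as a pair of helpers. The helper names, the constant, the job name, and the argument key are hypothetical. The DPU table copies the per-worker figures listed under WorkerType and leaves out Z.2X, which is measured in M-DPU.

```php
<?php
// DPUs per worker for the Spark worker types described under WorkerType.
const GLUE_WORKER_DPU = [
    'G.025X' => 0.25,
    'G.1X'   => 1.0,
    'G.2X'   => 2.0,
    'G.4X'   => 4.0,
    'G.8X'   => 8.0,
];

/**
 * Builds the parameter array for $client->startJobRun().
 * Hypothetical helper for illustration; it is not part of the SDK.
 */
function buildStartJobRunParams(
    string $jobName,
    array $arguments = [],
    ?string $workerType = null,
    ?int $numberOfWorkers = null,
    ?float $maxCapacity = null
): array {
    // MaxCapacity must not be combined with WorkerType / NumberOfWorkers.
    if ($maxCapacity !== null && ($workerType !== null || $numberOfWorkers !== null)) {
        throw new InvalidArgumentException('Set either MaxCapacity or WorkerType/NumberOfWorkers, not both.');
    }

    $params = ['JobName' => $jobName];
    if ($arguments !== []) {
        // These replace the default arguments of the job definition for this run.
        // Arguments may be logged, so do not pass plaintext secrets here.
        $params['Arguments'] = $arguments;
    }
    if ($workerType !== null) {
        $params['WorkerType'] = $workerType;
    }
    if ($numberOfWorkers !== null) {
        $params['NumberOfWorkers'] = $numberOfWorkers;
    }
    if ($maxCapacity !== null) {
        $params['MaxCapacity'] = $maxCapacity;
    }

    return $params;
}

/** Total DPUs implied by a Spark worker type and a worker count. */
function totalDpu(string $workerType, int $numberOfWorkers): float
{
    if (!array_key_exists($workerType, GLUE_WORKER_DPU)) {
        throw new InvalidArgumentException("No DPU mapping for worker type $workerType.");
    }
    return GLUE_WORKER_DPU[$workerType] * $numberOfWorkers;
}

// Hypothetical values, for illustration only.
$jobRunParams = buildStartJobRunParams('example-etl-job', ['--input_path' => 's3://example-bucket/in/'], 'G.2X', 10);

// With a configured client:
// $jobRunId = $client->startJobRun($jobRunParams)['JobRunId'];
```

Rejecting the conflicting combination before the call gives a clearer error than waiting for the service to reject the request.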
StartMLEvaluationTaskRun
$result = $client->startMLEvaluationTaskRun([/* ... */]);
$promise = $client->startMLEvaluationTaskRunAsync([/* ... */]);
Starts a task to estimate the quality of the transform.
When you provide label sets as examples of truth, Glue machine learning uses some of those examples to learn from them. The rest of the labels are used as a test to estimate quality.
Returns a unique identifier for the run. You can call GetMLTaskRun to get more information about the stats of the EvaluationTaskRun.
Parameter Syntax
$result = $client->startMLEvaluationTaskRun([
    'TransformId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- TransformId
-
- Required: Yes
- Type: string
The unique identifier of the machine learning transform.
Result Syntax
[ 'TaskRunId' => '<string>', ]
Result Details
Members
- TaskRunId
-
- Type: string
The unique identifier associated with this run.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- ConcurrentRunsExceededException:
Too many jobs are being run concurrently.
- MLTransformNotReadyException:
The machine learning transform is not ready to run.
StartMLLabelingSetGenerationTaskRun
$result = $client->startMLLabelingSetGenerationTaskRun([/* ... */]);
$promise = $client->startMLLabelingSetGenerationTaskRunAsync([/* ... */]);
Starts the active learning workflow for your machine learning transform to improve the transform's quality by generating label sets and adding labels.
When the StartMLLabelingSetGenerationTaskRun finishes, Glue will have generated a "labeling set" or a set of questions for humans to answer.
In the case of the FindMatches transform, these questions are of the form, “What is the correct way to group these rows together into groups composed entirely of matching records?”
After the labeling process is finished, you can upload your labels with a call to StartImportLabelsTaskRun. After StartImportLabelsTaskRun finishes, all future runs of the machine learning transform will use the new and improved labels and perform a higher-quality transformation.
Parameter Syntax
$result = $client->startMLLabelingSetGenerationTaskRun([
    'OutputS3Path' => '<string>', // REQUIRED
    'TransformId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- OutputS3Path
-
- Required: Yes
- Type: string
The Amazon Simple Storage Service (Amazon S3) path where you generate the labeling set.
- TransformId
-
- Required: Yes
- Type: string
The unique identifier of the machine learning transform.
Result Syntax
[ 'TaskRunId' => '<string>', ]
Result Details
Members
- TaskRunId
-
- Type: string
The unique run identifier that is associated with this task run.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- ConcurrentRunsExceededException:
Too many jobs are being run concurrently.
StartTrigger
$result = $client->startTrigger([/* ... */]);
$promise = $client->startTriggerAsync([/* ... */]);
Starts an existing trigger. See Triggering Jobs for information about how different types of trigger are started.
Parameter Syntax
$result = $client->startTrigger([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the trigger to start.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the trigger that was started.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentRunsExceededException:
Too many jobs are being run concurrently.
StartWorkflowRun
$result = $client->startWorkflowRun([/* ... */]);
$promise = $client->startWorkflowRunAsync([/* ... */]);
Starts a new run of the specified workflow.
Parameter Syntax
$result = $client->startWorkflowRun([
    'Name' => '<string>', // REQUIRED
    'RunProperties' => ['<string>', ...],
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the workflow to start.
- RunProperties
-
- Type: Associative array of custom strings keys (IdString) to strings
The workflow run properties for the new workflow run.
Result Syntax
[ 'RunId' => '<string>', ]
Result Details
Members
- RunId
-
- Type: string
An Id for the new run.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentRunsExceededException:
Too many jobs are being run concurrently.
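RunProperties is a map of string keys to string values, so numbers and booleans need to be converted before the call. A minimal sketch, with a hypothetical helper name and hypothetical property names:

```php
<?php
/**
 * Builds the parameter array for $client->startWorkflowRun(), converting
 * run property values to strings.
 * Hypothetical helper for illustration; it is not part of the SDK.
 */
function buildStartWorkflowRunParams(string $workflowName, array $runProperties = []): array
{
    $params = ['Name' => $workflowName];

    if ($runProperties !== []) {
        $properties = [];
        foreach ($runProperties as $key => $value) {
            if (!is_scalar($value)) {
                throw new InvalidArgumentException("Run property '$key' must be a scalar value.");
            }
            // Booleans are spelled out; everything else uses PHP's string conversion.
            $properties[$key] = is_bool($value) ? ($value ? 'true' : 'false') : (string) $value;
        }
        $params['RunProperties'] = $properties;
    }

    return $params;
}

// Hypothetical values, for illustration only.
$workflowParams = buildStartWorkflowRunParams('example-nightly-workflow', [
    'batch_date' => '2024-01-31',
    'retries'    => 3,
    'dry_run'    => false,
]);

// With a configured client:
// $runId = $client->startWorkflowRun($workflowParams)['RunId'];
```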
StopColumnStatisticsTaskRun
$result = $client->stopColumnStatisticsTaskRun([/* ... */]);
$promise = $client->stopColumnStatisticsTaskRunAsync([/* ... */]);
Stops a task run for the specified table.
Parameter Syntax
$result = $client->stopColumnStatisticsTaskRun([
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database where the table resides.
- TableName
-
- Required: Yes
- Type: string
The name of the table.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- ColumnStatisticsTaskNotRunningException:
An exception thrown when you try to stop a task run when there is no task running.
- ColumnStatisticsTaskStoppingException:
An exception thrown when you try to stop a task run.
- OperationTimeoutException:
The operation timed out.
StopColumnStatisticsTaskRunSchedule
$result = $client->stopColumnStatisticsTaskRunSchedule([/* ... */]);
$promise = $client->stopColumnStatisticsTaskRunScheduleAsync([/* ... */]);
Stops a column statistics task run schedule.
Parameter Syntax
$result = $client->stopColumnStatisticsTaskRunSchedule([
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database where the table resides.
- TableName
-
- Required: Yes
- Type: string
The name of the table for which to stop a column statistics task run schedule.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
StopCrawler
$result = $client->stopCrawler([/* ... */]);
$promise = $client->stopCrawlerAsync([/* ... */]);
If the specified crawler is running, stops the crawl.
Parameter Syntax
$result = $client->stopCrawler([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
Name of the crawler to stop.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- CrawlerNotRunningException:
The specified crawler is not running.
- CrawlerStoppingException:
The specified crawler is stopping.
- OperationTimeoutException:
The operation timed out.
StopCrawlerSchedule
$result = $client->stopCrawlerSchedule([/* ... */]);
$promise = $client->stopCrawlerScheduleAsync([/* ... */]);
Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running.
Parameter Syntax
$result = $client->stopCrawlerSchedule([
    'CrawlerName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CrawlerName
-
- Required: Yes
- Type: string
Name of the crawler whose schedule state to set.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- SchedulerNotRunningException:
The specified scheduler is not running.
- SchedulerTransitioningException:
The specified scheduler is transitioning.
- OperationTimeoutException:
The operation timed out.
StopSession
$result = $client->stopSession([/* ... */]);
$promise = $client->stopSessionAsync([/* ... */]);
Stops the session.
Parameter Syntax
$result = $client->stopSession([ 'Id' => '<string>', // REQUIRED 'RequestOrigin' => '<string>', ]);
Parameter Details
Members
- Id
-
- Required: Yes
- Type: string
The ID of the session to be stopped.
- RequestOrigin
-
- Type: string
The origin of the request.
Result Syntax
[ 'Id' => '<string>', ]
Result Details
Members
- Id
-
- Type: string
Returns the Id of the stopped session.
Errors
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- IllegalSessionStateException:
The session is in an invalid state to perform a requested operation.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
StopTrigger
$result = $client->stopTrigger([/* ... */]);
$promise = $client->stopTriggerAsync([/* ... */]);
Stops a specified trigger.
Parameter Syntax
$result = $client->stopTrigger([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the trigger to stop.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the trigger that was stopped.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- EntityNotFoundException:
A specified entity does not exist
- OperationTimeoutException:
The operation timed out.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
StopWorkflowRun
$result = $client->stopWorkflowRun([/* ... */]);
$promise = $client->stopWorkflowRunAsync([/* ... */]);
Stops the execution of the specified workflow run.
Parameter Syntax
$result = $client->stopWorkflowRun([ 'Name' => '<string>', // REQUIRED 'RunId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the workflow to stop.
- RunId
-
- Required: Yes
- Type: string
The ID of the workflow run to stop.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- IllegalWorkflowStateException:
The workflow is in an invalid state to perform a requested operation.
TagResource
$result = $client->tagResource([/* ... */]);
$promise = $client->tagResourceAsync([/* ... */]);
Adds tags to a resource. A tag is a label you can assign to an Amazon Web Services resource. In Glue, you can tag only certain resources. For information about what resources you can tag, see Amazon Web Services Tags in Glue.
Parameter Syntax
$result = $client->tagResource([ 'ResourceArn' => '<string>', // REQUIRED 'TagsToAdd' => ['<string>', ...], // REQUIRED ]);
Parameter Details
Members
- ResourceArn
-
- Required: Yes
- Type: string
The ARN of the Glue resource to which to add the tags. For more information about Glue resource ARNs, see the Glue ARN string pattern.
- TagsToAdd
-
- Required: Yes
- Type: Associative array of custom strings keys (TagKey) to strings
Tags to add to this resource.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- EntityNotFoundException:
A specified entity does not exist
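Because TagsToAdd is an associative array of tag keys to tag values, passing a plain list by mistake is easy to do. The sketch below builds the parameter array and rejects anything that is not a string-to-string map. The helper name `buildTagResourceParams`, the ARN, and the tag names are placeholders chosen for illustration.

```php
<?php
// Hypothetical helper: builds the parameter array for tagResource().
// TagsToAdd maps tag keys to tag values, so integer (list) keys are rejected.
function buildTagResourceParams(string $resourceArn, array $tags): array
{
    foreach ($tags as $key => $value) {
        if (!is_string($key) || $key === '' || !is_string($value)) {
            throw new InvalidArgumentException(
                'Tags must map non-empty string keys to string values.'
            );
        }
    }
    return ['ResourceArn' => $resourceArn, 'TagsToAdd' => $tags];
}

$params = buildTagResourceParams(
    'arn:aws:glue:us-east-1:123456789012:crawler/example-crawler', // placeholder ARN
    ['team' => 'analytics', 'env' => 'dev']
);
// $client->tagResource($params); // needs a configured GlueClient
```

Note that PHP converts numeric string keys such as '1' to integers, so this check also rejects purely numeric tag keys.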
TestConnection
$result = $client->testConnection([/* ... */]);
$promise = $client->testConnectionAsync([/* ... */]);
Tests a connection to a service to validate the service credentials that you provide.
You can provide either an existing connection name or a TestConnectionInput for testing a connection that does not yet exist. Providing both at the same time causes an error.
If the action is successful, the service sends back an HTTP 200 response.
Parameter Syntax
$result = $client->testConnection([ 'ConnectionName' => '<string>', 'TestConnectionInput' => [ 'AuthenticationConfiguration' => [ 'AuthenticationType' => 'BASIC|OAUTH2|CUSTOM', 'OAuth2Properties' => [ 'AuthorizationCodeProperties' => [ 'AuthorizationCode' => '<string>', 'RedirectUri' => '<string>', ], 'OAuth2ClientApplication' => [ 'AWSManagedClientApplicationReference' => '<string>', 'UserManagedClientApplicationClientId' => '<string>', ], 'OAuth2GrantType' => 'AUTHORIZATION_CODE|CLIENT_CREDENTIALS|JWT_BEARER', 'TokenUrl' => '<string>', 'TokenUrlParametersMap' => ['<string>', ...], ], 'SecretArn' => '<string>', ], 'ConnectionProperties' => ['<string>', ...], // REQUIRED 'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM|SALESFORCE|VIEW_VALIDATION_REDSHIFT|VIEW_VALIDATION_ATHENA', // REQUIRED ], ]);
Parameter Details
Members
- ConnectionName
-
- Type: string
Optional. The name of the connection to test. If only name is provided, the operation will get the connection and use that for testing.
- TestConnectionInput
-
- Type: TestConnectionInput structure
A structure that is used to specify testing a connection to a service.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- GlueEncryptionException:
An encryption operation failed.
- FederationSourceException:
A federation source failed.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist
- ConflictException:
The CreatePartitions API was called on a table that has indexes enabled.
- InternalServiceException:
An internal service error occurred.
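The rule that ConnectionName and TestConnectionInput must not be combined can be checked before the request is sent. The following is a minimal sketch; `buildTestConnectionParams` is a hypothetical helper, and rejecting a call that supplies neither value is an assumption, since the text above only rules out supplying both.

```php
<?php
// Hypothetical helper: builds testConnection() parameters from either a
// connection name or an inline TestConnectionInput, never both.
function buildTestConnectionParams(?string $connectionName, ?array $testConnectionInput): array
{
    $hasName = $connectionName !== null && $connectionName !== '';
    $hasInput = $testConnectionInput !== null;

    // Assumption: supplying neither is treated as a caller error as well.
    if ($hasName === $hasInput) {
        throw new InvalidArgumentException(
            'Provide exactly one of ConnectionName or TestConnectionInput.'
        );
    }
    if ($hasName) {
        return ['ConnectionName' => $connectionName];
    }
    // ConnectionType and ConnectionProperties are marked REQUIRED in the syntax.
    foreach (['ConnectionType', 'ConnectionProperties'] as $required) {
        if (!array_key_exists($required, $testConnectionInput)) {
            throw new InvalidArgumentException("TestConnectionInput is missing $required.");
        }
    }
    return ['TestConnectionInput' => $testConnectionInput];
}

// $client->testConnection(buildTestConnectionParams('example-connection', null));
```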
UntagResource
$result = $client->untagResource([/* ... */]);
$promise = $client->untagResourceAsync([/* ... */]);
Removes tags from a resource.
Parameter Syntax
$result = $client->untagResource([ 'ResourceArn' => '<string>', // REQUIRED 'TagsToRemove' => ['<string>', ...], // REQUIRED ]);
Parameter Details
Members
- ResourceArn
-
- Required: Yes
- Type: string
The Amazon Resource Name (ARN) of the resource from which to remove the tags.
- TagsToRemove
-
- Required: Yes
- Type: Array of strings
Tags to remove from this resource.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- EntityNotFoundException:
A specified entity does not exist
UpdateBlueprint
$result = $client->updateBlueprint([/* ... */]);
$promise = $client->updateBlueprintAsync([/* ... */]);
Updates a registered blueprint.
Parameter Syntax
$result = $client->updateBlueprint([ 'BlueprintLocation' => '<string>', // REQUIRED 'Description' => '<string>', 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- BlueprintLocation
-
- Required: Yes
- Type: string
Specifies a path in Amazon S3 where the blueprint is published.
- Description
-
- Type: string
A description of the blueprint.
- Name
-
- Required: Yes
- Type: string
The name of the blueprint.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
Returns the name of the blueprint that was updated.
Errors
- EntityNotFoundException:
A specified entity does not exist
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- IllegalBlueprintStateException:
The blueprint is in an invalid state to perform a requested operation.
UpdateClassifier
$result = $client->updateClassifier([/* ... */]);
$promise = $client->updateClassifierAsync([/* ... */]);
Modifies an existing classifier (a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field is present).
Parameter Syntax
$result = $client->updateClassifier([ 'CsvClassifier' => [ 'AllowSingleColumn' => true || false, 'ContainsHeader' => 'UNKNOWN|PRESENT|ABSENT', 'CustomDatatypeConfigured' => true || false, 'CustomDatatypes' => ['<string>', ...], 'Delimiter' => '<string>', 'DisableValueTrimming' => true || false, 'Header' => ['<string>', ...], 'Name' => '<string>', // REQUIRED 'QuoteSymbol' => '<string>', 'Serde' => 'OpenCSVSerDe|LazySimpleSerDe|None', ], 'GrokClassifier' => [ 'Classification' => '<string>', 'CustomPatterns' => '<string>', 'GrokPattern' => '<string>', 'Name' => '<string>', // REQUIRED ], 'JsonClassifier' => [ 'JsonPath' => '<string>', 'Name' => '<string>', // REQUIRED ], 'XMLClassifier' => [ 'Classification' => '<string>', 'Name' => '<string>', // REQUIRED 'RowTag' => '<string>', ], ]);
Parameter Details
Members
- CsvClassifier
-
- Type: UpdateCsvClassifierRequest structure
A CsvClassifier object with updated fields.
- GrokClassifier
-
- Type: UpdateGrokClassifierRequest structure
A GrokClassifier object with updated fields.
- JsonClassifier
-
- Type: UpdateJsonClassifierRequest structure
A JsonClassifier object with updated fields.
- XMLClassifier
-
- Type: UpdateXMLClassifierRequest structure
An XMLClassifier object with updated fields.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- VersionMismatchException:
There was a version conflict.
- EntityNotFoundException:
A specified entity does not exist
- OperationTimeoutException:
The operation timed out.
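Since the classifier type is determined by which top-level field is present, it can help to build the request from a single classifier kind at a time. This sketch wraps one set of fields under the matching key. The helper name `buildUpdateClassifierParams` and the example classifier name are hypothetical.

```php
<?php
// Hypothetical helper: places one classifier update under the matching
// top-level key so that only one classifier field is present in the request.
function buildUpdateClassifierParams(string $kind, array $fields): array
{
    $allowed = ['CsvClassifier', 'GrokClassifier', 'JsonClassifier', 'XMLClassifier'];
    if (!in_array($kind, $allowed, true)) {
        throw new InvalidArgumentException("Unknown classifier kind: $kind");
    }
    // Name is marked REQUIRED for every classifier structure in the syntax.
    if (($fields['Name'] ?? '') === '') {
        throw new InvalidArgumentException('The classifier Name is required.');
    }
    return [$kind => $fields];
}

$params = buildUpdateClassifierParams('CsvClassifier', [
    'Name' => 'example-csv-classifier', // placeholder name
    'Delimiter' => ',',
    'ContainsHeader' => 'PRESENT',
]);
// $client->updateClassifier($params); // needs a configured GlueClient
```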
UpdateColumnStatisticsForPartition
$result = $client->updateColumnStatisticsForPartition([/* ... */]);
$promise = $client->updateColumnStatisticsForPartitionAsync([/* ... */]);
Creates or updates partition statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is UpdatePartition.
Parameter Syntax
$result = $client->updateColumnStatisticsForPartition([ 'CatalogId' => '<string>', 'ColumnStatisticsList' => [ // REQUIRED [ 'AnalyzedTime' => <integer || string || DateTime>, // REQUIRED 'ColumnName' => '<string>', // REQUIRED 'ColumnType' => '<string>', // REQUIRED 'StatisticsData' => [ // REQUIRED 'BinaryColumnStatisticsData' => [ 'AverageLength' => <float>, // REQUIRED 'MaximumLength' => <integer>, // REQUIRED 'NumberOfNulls' => <integer>, // REQUIRED ], 'BooleanColumnStatisticsData' => [ 'NumberOfFalses' => <integer>, // REQUIRED 'NumberOfNulls' => <integer>, // REQUIRED 'NumberOfTrues' => <integer>, // REQUIRED ], 'DateColumnStatisticsData' => [ 'MaximumValue' => <integer || string || DateTime>, 'MinimumValue' => <integer || string || DateTime>, 'NumberOfDistinctValues' => <integer>, // REQUIRED 'NumberOfNulls' => <integer>, // REQUIRED ], 'DecimalColumnStatisticsData' => [ 'MaximumValue' => [ 'Scale' => <integer>, // REQUIRED 'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>, // REQUIRED ], 'MinimumValue' => [ 'Scale' => <integer>, // REQUIRED 'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>, // REQUIRED ], 'NumberOfDistinctValues' => <integer>, // REQUIRED 'NumberOfNulls' => <integer>, // REQUIRED ], 'DoubleColumnStatisticsData' => [ 'MaximumValue' => <float>, 'MinimumValue' => <float>, 'NumberOfDistinctValues' => <integer>, // REQUIRED 'NumberOfNulls' => <integer>, // REQUIRED ], 'LongColumnStatisticsData' => [ 'MaximumValue' => <integer>, 'MinimumValue' => <integer>, 'NumberOfDistinctValues' => <integer>, // REQUIRED 'NumberOfNulls' => <integer>, // REQUIRED ], 'StringColumnStatisticsData' => [ 'AverageLength' => <float>, // REQUIRED 'MaximumLength' => <integer>, // REQUIRED 'NumberOfDistinctValues' => <integer>, // REQUIRED 'NumberOfNulls' => <integer>, // REQUIRED ], 'Type' => 'BOOLEAN|DATE|DECIMAL|DOUBLE|LONG|STRING|BINARY', // REQUIRED ], ], // ... 
], 'DatabaseName' => '<string>', // REQUIRED 'PartitionValues' => ['<string>', ...], // REQUIRED 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.
- ColumnStatisticsList
-
- Required: Yes
- Type: Array of ColumnStatistics structures
A list of the column statistics.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partitions reside.
- PartitionValues
-
- Required: Yes
- Type: Array of strings
A list of partition values identifying the partition.
- TableName
-
- Required: Yes
- Type: string
The name of the partitions' table.
Result Syntax
[ 'Errors' => [ [ 'ColumnStatistics' => [ 'AnalyzedTime' => <DateTime>, 'ColumnName' => '<string>', 'ColumnType' => '<string>', 'StatisticsData' => [ 'BinaryColumnStatisticsData' => [ 'AverageLength' => <float>, 'MaximumLength' => <integer>, 'NumberOfNulls' => <integer>, ], 'BooleanColumnStatisticsData' => [ 'NumberOfFalses' => <integer>, 'NumberOfNulls' => <integer>, 'NumberOfTrues' => <integer>, ], 'DateColumnStatisticsData' => [ 'MaximumValue' => <DateTime>, 'MinimumValue' => <DateTime>, 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'DecimalColumnStatisticsData' => [ 'MaximumValue' => [ 'Scale' => <integer>, 'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>, ], 'MinimumValue' => [ 'Scale' => <integer>, 'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>, ], 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'DoubleColumnStatisticsData' => [ 'MaximumValue' => <float>, 'MinimumValue' => <float>, 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'LongColumnStatisticsData' => [ 'MaximumValue' => <integer>, 'MinimumValue' => <integer>, 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'StringColumnStatisticsData' => [ 'AverageLength' => <float>, 'MaximumLength' => <integer>, 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'Type' => 'BOOLEAN|DATE|DECIMAL|DOUBLE|LONG|STRING|BINARY', ], ], 'Error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], ], // ... ], ]
Result Details
Members
- Errors
-
- Type: Array of ColumnStatisticsError structures
Errors that occurred while updating column statistics data.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
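The ColumnStatisticsList entries are deeply nested, so a small builder can make the required fields easier to see. The sketch below computes one LONG entry from raw values. The helper name, the `bigint` column type, and the database, table, and partition names are assumptions made for illustration.

```php
<?php
// Hypothetical helper: builds one ColumnStatistics entry of Type LONG from a
// list of integer values, where null marks a missing value.
function buildLongColumnStatistics(string $columnName, array $values, int $analyzedAtUnix): array
{
    $nonNull = array_values(array_filter($values, static fn($v) => $v !== null));

    $data = [
        'NumberOfNulls' => count($values) - count($nonNull),   // REQUIRED
        'NumberOfDistinctValues' => count(array_unique($nonNull)), // REQUIRED
    ];
    // MinimumValue and MaximumValue are optional, so skip them for empty input.
    if ($nonNull !== []) {
        $data['MinimumValue'] = min($nonNull);
        $data['MaximumValue'] = max($nonNull);
    }

    return [
        'AnalyzedTime' => $analyzedAtUnix, // an integer Unix timestamp is accepted
        'ColumnName' => $columnName,
        'ColumnType' => 'bigint',          // assumed column type for this example
        'StatisticsData' => [
            'Type' => 'LONG',
            'LongColumnStatisticsData' => $data,
        ],
    ];
}

$entry = buildLongColumnStatistics('order_id', [3, null, 7, 3], 1700000000);
$params = [
    'DatabaseName' => 'example_db',        // placeholder names
    'TableName' => 'orders',
    'PartitionValues' => ['2024', '01'],
    'ColumnStatisticsList' => [$entry],
];
// $result = $client->updateColumnStatisticsForPartition($params);
```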
UpdateColumnStatisticsForTable
$result = $client->updateColumnStatisticsForTable([/* ... */]);
$promise = $client->updateColumnStatisticsForTableAsync([/* ... */]);
Creates or updates table statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is UpdateTable.
Parameter Syntax
$result = $client->updateColumnStatisticsForTable([ 'CatalogId' => '<string>', 'ColumnStatisticsList' => [ // REQUIRED [ 'AnalyzedTime' => <integer || string || DateTime>, // REQUIRED 'ColumnName' => '<string>', // REQUIRED 'ColumnType' => '<string>', // REQUIRED 'StatisticsData' => [ // REQUIRED 'BinaryColumnStatisticsData' => [ 'AverageLength' => <float>, // REQUIRED 'MaximumLength' => <integer>, // REQUIRED 'NumberOfNulls' => <integer>, // REQUIRED ], 'BooleanColumnStatisticsData' => [ 'NumberOfFalses' => <integer>, // REQUIRED 'NumberOfNulls' => <integer>, // REQUIRED 'NumberOfTrues' => <integer>, // REQUIRED ], 'DateColumnStatisticsData' => [ 'MaximumValue' => <integer || string || DateTime>, 'MinimumValue' => <integer || string || DateTime>, 'NumberOfDistinctValues' => <integer>, // REQUIRED 'NumberOfNulls' => <integer>, // REQUIRED ], 'DecimalColumnStatisticsData' => [ 'MaximumValue' => [ 'Scale' => <integer>, // REQUIRED 'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>, // REQUIRED ], 'MinimumValue' => [ 'Scale' => <integer>, // REQUIRED 'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>, // REQUIRED ], 'NumberOfDistinctValues' => <integer>, // REQUIRED 'NumberOfNulls' => <integer>, // REQUIRED ], 'DoubleColumnStatisticsData' => [ 'MaximumValue' => <float>, 'MinimumValue' => <float>, 'NumberOfDistinctValues' => <integer>, // REQUIRED 'NumberOfNulls' => <integer>, // REQUIRED ], 'LongColumnStatisticsData' => [ 'MaximumValue' => <integer>, 'MinimumValue' => <integer>, 'NumberOfDistinctValues' => <integer>, // REQUIRED 'NumberOfNulls' => <integer>, // REQUIRED ], 'StringColumnStatisticsData' => [ 'AverageLength' => <float>, // REQUIRED 'MaximumLength' => <integer>, // REQUIRED 'NumberOfDistinctValues' => <integer>, // REQUIRED 'NumberOfNulls' => <integer>, // REQUIRED ], 'Type' => 'BOOLEAN|DATE|DECIMAL|DOUBLE|LONG|STRING|BINARY', // REQUIRED ], ], // ... 
], 'DatabaseName' => '<string>', // REQUIRED 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.
- ColumnStatisticsList
-
- Required: Yes
- Type: Array of ColumnStatistics structures
A list of the column statistics.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partitions reside.
- TableName
-
- Required: Yes
- Type: string
The name of the partitions' table.
Result Syntax
[ 'Errors' => [ [ 'ColumnStatistics' => [ 'AnalyzedTime' => <DateTime>, 'ColumnName' => '<string>', 'ColumnType' => '<string>', 'StatisticsData' => [ 'BinaryColumnStatisticsData' => [ 'AverageLength' => <float>, 'MaximumLength' => <integer>, 'NumberOfNulls' => <integer>, ], 'BooleanColumnStatisticsData' => [ 'NumberOfFalses' => <integer>, 'NumberOfNulls' => <integer>, 'NumberOfTrues' => <integer>, ], 'DateColumnStatisticsData' => [ 'MaximumValue' => <DateTime>, 'MinimumValue' => <DateTime>, 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'DecimalColumnStatisticsData' => [ 'MaximumValue' => [ 'Scale' => <integer>, 'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>, ], 'MinimumValue' => [ 'Scale' => <integer>, 'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>, ], 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'DoubleColumnStatisticsData' => [ 'MaximumValue' => <float>, 'MinimumValue' => <float>, 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'LongColumnStatisticsData' => [ 'MaximumValue' => <integer>, 'MinimumValue' => <integer>, 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'StringColumnStatisticsData' => [ 'AverageLength' => <float>, 'MaximumLength' => <integer>, 'NumberOfDistinctValues' => <integer>, 'NumberOfNulls' => <integer>, ], 'Type' => 'BOOLEAN|DATE|DECIMAL|DOUBLE|LONG|STRING|BINARY', ], ], 'Error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], ], // ... ], ]
Result Details
Members
- Errors
-
- Type: Array of ColumnStatisticsError structures
List of ColumnStatisticsErrors.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
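The result can contain an Errors list for individual columns, so it is worth inspecting before assuming every column was written. This sketch flattens the list into log-friendly strings. `summarizeColumnStatisticsErrors` is a hypothetical helper, and the fallback strings for missing fields are illustrative.

```php
<?php
// Hypothetical helper: turns the Errors list of the result into
// "column: code (message)" strings for logging.
function summarizeColumnStatisticsErrors(array $errors): array
{
    $lines = [];
    foreach ($errors as $entry) {
        $column = $entry['ColumnStatistics']['ColumnName'] ?? '(unknown column)';
        $code = $entry['Error']['ErrorCode'] ?? '(no code)';
        $message = $entry['Error']['ErrorMessage'] ?? '';
        $lines[] = sprintf('%s: %s (%s)', $column, $code, $message);
    }
    return $lines;
}

// $result = $client->updateColumnStatisticsForTable($params);
// foreach (summarizeColumnStatisticsErrors($result['Errors'] ?? []) as $line) {
//     error_log($line);
// }
```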
UpdateColumnStatisticsTaskSettings
$result = $client->updateColumnStatisticsTaskSettings([/* ... */]);
$promise = $client->updateColumnStatisticsTaskSettingsAsync([/* ... */]);
Updates settings for a column statistics task.
Parameter Syntax
$result = $client->updateColumnStatisticsTaskSettings([ 'CatalogID' => '<string>', 'ColumnNameList' => ['<string>', ...], 'DatabaseName' => '<string>', // REQUIRED 'Role' => '<string>', 'SampleSize' => <float>, 'Schedule' => '<string>', 'SecurityConfiguration' => '<string>', 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogID
-
- Type: string
The ID of the Data Catalog in which the database resides.
- ColumnNameList
-
- Type: Array of strings
A list of column names for which to run statistics.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database where the table resides.
- Role
-
- Type: string
The role used for running the column statistics.
- SampleSize
-
- Type: double
The percentage of data to sample.
- Schedule
-
- Type: string
A schedule for running the column statistics, specified in CRON syntax.
- SecurityConfiguration
-
- Type: string
Name of the security configuration that is used to encrypt CloudWatch logs.
- TableName
-
- Required: Yes
- Type: string
The name of the table for which to generate column statistics.
Result Syntax
[]
Result Details
Errors
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- VersionMismatchException:
There was a version conflict.
- OperationTimeoutException:
The operation timed out.
UpdateConnection
$result = $client->updateConnection([/* ... */]);
$promise = $client->updateConnectionAsync([/* ... */]);
Updates a connection definition in the Data Catalog.
Parameter Syntax
$result = $client->updateConnection([ 'CatalogId' => '<string>', 'ConnectionInput' => [ // REQUIRED 'AthenaProperties' => ['<string>', ...], 'AuthenticationConfiguration' => [ 'AuthenticationType' => 'BASIC|OAUTH2|CUSTOM', 'OAuth2Properties' => [ 'AuthorizationCodeProperties' => [ 'AuthorizationCode' => '<string>', 'RedirectUri' => '<string>', ], 'OAuth2ClientApplication' => [ 'AWSManagedClientApplicationReference' => '<string>', 'UserManagedClientApplicationClientId' => '<string>', ], 'OAuth2GrantType' => 'AUTHORIZATION_CODE|CLIENT_CREDENTIALS|JWT_BEARER', 'TokenUrl' => '<string>', 'TokenUrlParametersMap' => ['<string>', ...], ], 'SecretArn' => '<string>', ], 'ConnectionProperties' => ['<string>', ...], // REQUIRED 'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM|SALESFORCE|VIEW_VALIDATION_REDSHIFT|VIEW_VALIDATION_ATHENA', // REQUIRED 'Description' => '<string>', 'MatchCriteria' => ['<string>', ...], 'Name' => '<string>', // REQUIRED 'PhysicalConnectionRequirements' => [ 'AvailabilityZone' => '<string>', 'SecurityGroupIdList' => ['<string>', ...], 'SubnetId' => '<string>', ], 'ValidateCredentials' => true || false, ], 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the connection resides. If none is provided, the Amazon Web Services account ID is used by default.
- ConnectionInput
-
- Required: Yes
- Type: ConnectionInput structure
A ConnectionInput object that redefines the connection in question.
- Name
-
- Required: Yes
- Type: string
The name of the connection definition to update.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
UpdateCrawler
$result = $client->updateCrawler([/* ... */]);
$promise = $client->updateCrawlerAsync([/* ... */]);
Updates a crawler. If a crawler is running, you must stop it using StopCrawler before updating it.
Parameter Syntax
$result = $client->updateCrawler([ 'Classifiers' => ['<string>', ...], 'Configuration' => '<string>', 'CrawlerSecurityConfiguration' => '<string>', 'DatabaseName' => '<string>', 'Description' => '<string>', 'LakeFormationConfiguration' => [ 'AccountId' => '<string>', 'UseLakeFormationCredentials' => true || false, ], 'LineageConfiguration' => [ 'CrawlerLineageSettings' => 'ENABLE|DISABLE', ], 'Name' => '<string>', // REQUIRED 'RecrawlPolicy' => [ 'RecrawlBehavior' => 'CRAWL_EVERYTHING|CRAWL_NEW_FOLDERS_ONLY|CRAWL_EVENT_MODE', ], 'Role' => '<string>', 'Schedule' => '<string>', 'SchemaChangePolicy' => [ 'DeleteBehavior' => 'LOG|DELETE_FROM_DATABASE|DEPRECATE_IN_DATABASE', 'UpdateBehavior' => 'LOG|UPDATE_IN_DATABASE', ], 'TablePrefix' => '<string>', 'Targets' => [ 'CatalogTargets' => [ [ 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', // REQUIRED 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Tables' => ['<string>', ...], // REQUIRED ], // ... ], 'DeltaTargets' => [ [ 'ConnectionName' => '<string>', 'CreateNativeDeltaTable' => true || false, 'DeltaTables' => ['<string>', ...], 'WriteManifest' => true || false, ], // ... ], 'DynamoDBTargets' => [ [ 'Path' => '<string>', 'scanAll' => true || false, 'scanRate' => <float>, ], // ... ], 'HudiTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... ], 'IcebergTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... ], 'JdbcTargets' => [ [ 'ConnectionName' => '<string>', 'EnableAdditionalMetadata' => ['<string>', ...], 'Exclusions' => ['<string>', ...], 'Path' => '<string>', ], // ... ], 'MongoDBTargets' => [ [ 'ConnectionName' => '<string>', 'Path' => '<string>', 'ScanAll' => true || false, ], // ... 
], 'S3Targets' => [ [ 'ConnectionName' => '<string>', 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Exclusions' => ['<string>', ...], 'Path' => '<string>', 'SampleSize' => <integer>, ], // ... ], ], ]);
Parameter Details
Members
- Classifiers
-
- Type: Array of strings
A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.
- Configuration
-
- Type: string
Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see Setting crawler configuration options.
- CrawlerSecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used by this crawler.
- DatabaseName
-
- Type: string
The Glue database where results are stored, such as: arn:aws:daylight:us-east-1::database/sometable/*.
- Description
-
- Type: string
A description of the new crawler.
- LakeFormationConfiguration
-
- Type: LakeFormationConfiguration structure
Specifies Lake Formation configuration settings for the crawler.
- LineageConfiguration
-
- Type: LineageConfiguration structure
Specifies data lineage configuration settings for the crawler.
- Name
-
- Required: Yes
- Type: string
Name of the new crawler.
- RecrawlPolicy
-
- Type: RecrawlPolicy structure
A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.
- Role
-
- Type: string
The IAM role or Amazon Resource Name (ARN) of an IAM role that is used by the new crawler to access customer resources.
- Schedule
-
- Type: string
A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
- SchemaChangePolicy
-
- Type: SchemaChangePolicy structure
The policy for the crawler's update and deletion behavior.
- TablePrefix
-
- Type: string
The table prefix used for catalog tables that are created.
- Targets
-
- Type: CrawlerTargets structure
A list of targets to crawl.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- VersionMismatchException:
There was a version conflict.
- EntityNotFoundException:
A specified entity does not exist
- CrawlerRunningException:
The operation cannot be performed because the crawler is already running.
- OperationTimeoutException:
The operation timed out.
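Because a running crawler must be stopped before it can be updated, callers often need a stop, wait, update sequence. The following is a rough sketch of that flow, assuming the crawler state is read through GetCrawler as `Crawler.State` with the values READY, RUNNING, and STOPPING. The function name, the polling limits, and typing the client as `object` are choices made for this example only.

```php
<?php
// Sketch: stop a running crawler, wait until it reports READY, then update it.
// $client is expected to behave like Aws\Glue\GlueClient; it is typed as
// object here so the flow can be exercised with a stand-in.
function updateCrawlerWhenIdle(
    object $client,
    array $params,
    int $maxPolls = 30,
    int $pollSeconds = 10
): void {
    $name = $params['Name'];
    $state = $client->getCrawler(['Name' => $name])['Crawler']['State'];

    if ($state === 'RUNNING') {
        $client->stopCrawler(['Name' => $name]);
    }

    // Poll until the crawler is idle, covering both RUNNING and STOPPING.
    for ($i = 0; $state !== 'READY' && $i < $maxPolls; $i++) {
        sleep($pollSeconds);
        $state = $client->getCrawler(['Name' => $name])['Crawler']['State'];
    }

    if ($state !== 'READY') {
        throw new RuntimeException("Crawler $name did not become READY in time.");
    }

    $client->updateCrawler($params);
}

// updateCrawlerWhenIdle($client, ['Name' => 'example-crawler', 'Description' => 'Nightly crawl']);
```

A crawler could still be started by its schedule between the last poll and the update, so callers may still want to handle CrawlerRunningException.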
UpdateCrawlerSchedule
$result = $client->updateCrawlerSchedule([/* ... */]);
$promise = $client->updateCrawlerScheduleAsync([/* ... */]);
Updates the schedule of a crawler using a cron expression.
Parameter Syntax
$result = $client->updateCrawlerSchedule([ 'CrawlerName' => '<string>', // REQUIRED 'Schedule' => '<string>', ]);
Parameter Details
Members
- CrawlerName
-
- Required: Yes
- Type: string
The name of the crawler whose schedule to update.
- Schedule
-
- Type: string
The updated cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- VersionMismatchException:
There was a version conflict.
- SchedulerTransitioningException:
The specified scheduler is transitioning.
- OperationTimeoutException:
The operation timed out.
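The Schedule value follows the `cron(...)` form shown in the parameter description, with minutes before hours. This small sketch formats a daily schedule so the field order is not mixed up. `dailyCronExpression` and the crawler name are hypothetical.

```php
<?php
// Hypothetical helper: formats a once-a-day schedule in the cron(...) form
// used by the Schedule parameter.
function dailyCronExpression(int $hourUtc, int $minute): string
{
    if ($hourUtc < 0 || $hourUtc > 23 || $minute < 0 || $minute > 59) {
        throw new InvalidArgumentException('Hour must be 0-23 and minute 0-59.');
    }
    // Field order: minutes, hours, day of month, month, day of week, year.
    return sprintf('cron(%d %d * * ? *)', $minute, $hourUtc);
}

$params = [
    'CrawlerName' => 'example-crawler',       // placeholder name
    'Schedule' => dailyCronExpression(12, 15), // every day at 12:15 UTC
];
// $client->updateCrawlerSchedule($params); // needs a configured GlueClient
```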
UpdateDataQualityRuleset
$result = $client->updateDataQualityRuleset([/* ... */]);
$promise = $client->updateDataQualityRulesetAsync([/* ... */]);
Updates the specified data quality ruleset.
Parameter Syntax
$result = $client->updateDataQualityRuleset([ 'Description' => '<string>', 'Name' => '<string>', // REQUIRED 'Ruleset' => '<string>', ]);
Parameter Details
Members
- Description
-
- Type: string
A description of the ruleset.
- Name
-
- Required: Yes
- Type: string
The name of the data quality ruleset.
- Ruleset
-
- Type: string
A Data Quality Definition Language (DQDL) ruleset. For more information, see the Glue developer guide.
Result Syntax
[ 'Description' => '<string>', 'Name' => '<string>', 'Ruleset' => '<string>', ]
Result Details
Members
- Description
-
- Type: string
A description of the ruleset.
- Name
-
- Type: string
The name of the data quality ruleset.
- Ruleset
-
- Type: string
A Data Quality Definition Language (DQDL) ruleset. For more information, see the Glue developer guide.
Errors
- EntityNotFoundException:
A specified entity does not exist
- AlreadyExistsException:
A resource to be created or added already exists.
- IdempotentParameterMismatchException:
The same unique identifier was associated with two different records.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
UpdateDatabase
$result = $client->updateDatabase([/* ... */]);
$promise = $client->updateDatabaseAsync([/* ... */]);
Updates an existing database definition in a Data Catalog.
Parameter Syntax
$result = $client->updateDatabase([ 'CatalogId' => '<string>', 'DatabaseInput' => [ // REQUIRED 'CreateTableDefaultPermissions' => [ [ 'Permissions' => ['<string>', ...], 'Principal' => [ 'DataLakePrincipalIdentifier' => '<string>', ], ], // ... ], 'Description' => '<string>', 'FederatedDatabase' => [ 'ConnectionName' => '<string>', 'Identifier' => '<string>', ], 'LocationUri' => '<string>', 'Name' => '<string>', // REQUIRED 'Parameters' => ['<string>', ...], 'TargetDatabase' => [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'Region' => '<string>', ], ], 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the metadata database resides. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseInput
-
- Required: Yes
- Type: DatabaseInput structure
A DatabaseInput object specifying the new definition of the metadata database in the catalog.
- Name
-
- Required: Yes
- Type: string
The name of the database to update in the catalog. For Hive compatibility, this is folded to lowercase.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
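As a rough illustration of the UpdateDatabase parameters described above, the following sketch builds the request array in plain PHP. The helper name buildUpdateDatabaseParams and the values sales_db, the description, and the S3 location are hypothetical placeholders chosen for the example; the actual client call is left commented out because it assumes the SDK is installed and credentials are configured.

```php
<?php
// Sketch only: helper name and all sample values are hypothetical.

/**
 * Build the parameter array for updateDatabase.
 */
function buildUpdateDatabaseParams(string $name, array $databaseInput, ?string $catalogId = null): array
{
    // DatabaseInput requires its own Name key; default it to the target name
    // unless the caller supplied one.
    $databaseInput += ['Name' => $name];

    $params = [
        'Name' => $name,                    // REQUIRED: database to update
        'DatabaseInput' => $databaseInput,  // REQUIRED: new definition
    ];

    if ($catalogId !== null) {
        // When CatalogId is omitted, the account ID is used by default.
        $params['CatalogId'] = $catalogId;
    }

    return $params;
}

$params = buildUpdateDatabaseParams('sales_db', [
    'Description' => 'Curated sales data',
    'LocationUri' => 's3://example-bucket/sales/',
]);

// With the SDK available, the call would look like:
// $result = $client->updateDatabase($params);
```

Keeping the array construction separate from the call makes the required keys easy to check before a request is sent.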
UpdateDevEndpoint
$result = $client->updateDevEndpoint([/* ... */]);
$promise = $client->updateDevEndpointAsync([/* ... */]);
Updates a specified development endpoint.
Parameter Syntax
$result = $client->updateDevEndpoint([
    'AddArguments' => ['<string>', ...],
    'AddPublicKeys' => ['<string>', ...],
    'CustomLibraries' => [
        'ExtraJarsS3Path' => '<string>',
        'ExtraPythonLibsS3Path' => '<string>',
    ],
    'DeleteArguments' => ['<string>', ...],
    'DeletePublicKeys' => ['<string>', ...],
    'EndpointName' => '<string>', // REQUIRED
    'PublicKey' => '<string>',
    'UpdateEtlLibraries' => true || false,
]);
Parameter Details
Members
- AddArguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
The map of arguments to add to the map of arguments used to configure the DevEndpoint.
Valid arguments are:
- "--enable-glue-datacatalog": ""
You can specify a version of Python support for development endpoints by using the Arguments parameter in the CreateDevEndpoint or UpdateDevEndpoint APIs. If no arguments are provided, the version defaults to Python 2.
- AddPublicKeys
-
- Type: Array of strings
The list of public keys for the DevEndpoint to use.
- CustomLibraries
-
- Type: DevEndpointCustomLibraries structure
Custom Python or Java libraries to be loaded in the DevEndpoint.
- DeleteArguments
-
- Type: Array of strings
The list of argument keys to be deleted from the map of arguments used to configure the DevEndpoint.
- DeletePublicKeys
-
- Type: Array of strings
The list of public keys to be deleted from the DevEndpoint.
- EndpointName
-
- Required: Yes
- Type: string
The name of the DevEndpoint to be updated.
- PublicKey
-
- Type: string
The public key for the DevEndpoint to use.
- UpdateEtlLibraries
-
- Type: boolean
True if the list of custom libraries to be loaded in the development endpoint needs to be updated, or False otherwise.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- ValidationException:
A value could not be validated.
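One way the UpdateDevEndpoint parameters above might be combined is to swap one public key for another in a single request while adding the "--enable-glue-datacatalog" argument. The sketch below only builds the request array; the helper name buildDevEndpointKeyRotation, the endpoint name, and the key strings are hypothetical placeholders, and the client call is commented out because it assumes a configured SDK.

```php
<?php
// Sketch only: helper name and sample values are hypothetical.

/**
 * Build updateDevEndpoint parameters that add one public key, remove
 * another, and add the "--enable-glue-datacatalog" argument.
 */
function buildDevEndpointKeyRotation(string $endpointName, string $newKey, string $oldKey): array
{
    return [
        'EndpointName' => $endpointName,      // REQUIRED
        'AddPublicKeys' => [$newKey],         // keys to add
        'DeletePublicKeys' => [$oldKey],      // keys to remove
        // AddArguments is an associative array of string keys to strings.
        'AddArguments' => ['--enable-glue-datacatalog' => ''],
    ];
}

$params = buildDevEndpointKeyRotation(
    'example-endpoint',
    'ssh-rsa NEW-KEY-PLACEHOLDER',
    'ssh-rsa OLD-KEY-PLACEHOLDER'
);

// With the SDK available, the call would look like:
// $client->updateDevEndpoint($params);
```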
UpdateJob
$result = $client->updateJob([/* ... */]);
$promise = $client->updateJobAsync([/* ... */]);
Updates an existing job definition. The previous job definition is completely overwritten by this information.
Parameter Syntax
$result = $client->updateJob([ 'JobName' => '<string>', // REQUIRED 'JobUpdate' => [ // REQUIRED 'AllocatedCapacity' => <integer>, 'CodeGenConfigurationNodes' => [ '<NodeId>' => [ 'Aggregate' => [ 'Aggs' => [ // REQUIRED [ 'AggFunc' => 'avg|countDistinct|count|first|last|kurtosis|max|min|skewness|stddev_samp|stddev_pop|sum|sumDistinct|var_samp|var_pop', // REQUIRED 'Column' => ['<string>', ...], // REQUIRED ], // ... ], 'Groups' => [ // REQUIRED ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED ], 'AmazonRedshiftSource' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', ], 'AmazonRedshiftTarget' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'ApplyMapping' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Mapping' => [ // REQUIRED [ 'Children' => [...], // RECURSIVE 'Dropped' => true || false, 'FromPath' => ['<string>', ...], 'FromType' => '<string>', 'ToKey' => '<string>', 'ToType' => '<string>', ], // ... 
], 'Name' => '<string>', // REQUIRED ], 'AthenaConnectorSource' => [ 'ConnectionName' => '<string>', // REQUIRED 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'SchemaName' => '<string>', // REQUIRED ], 'CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', // REQUIRED ], 'CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', // REQUIRED ], 'CatalogKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', // REQUIRED 'DetectSchema' => true || false, 'Name' => '<string>', // REQUIRED 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <integer || string || DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'Table' => '<string>', // REQUIRED 'WindowSize' => <integer>, ], 
'CatalogKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', // REQUIRED 'DetectSchema' => true || false, 'Name' => '<string>', // REQUIRED 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' => true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <integer || string || DateTime>, 'StreamArn' => '<string>', 'StreamName' => '<string>', ], 'Table' => '<string>', // REQUIRED 'WindowSize' => <integer>, ], 'CatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'CatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Table' => '<string>', // REQUIRED ], 'ConnectorDataSource' => [ 'ConnectionType' => '<string>', // REQUIRED 'Data' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... 
], ], 'ConnectorDataTarget' => [ 'ConnectionType' => '<string>', // REQUIRED 'Data' => ['<string>', ...], // REQUIRED 'Inputs' => ['<string>', ...], 'Name' => '<string>', // REQUIRED ], 'CustomCode' => [ 'ClassName' => '<string>', // REQUIRED 'Code' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'DirectJDBCSource' => [ 'ConnectionName' => '<string>', // REQUIRED 'ConnectionType' => 'sqlserver|mysql|oracle|postgresql|redshift', // REQUIRED 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', // REQUIRED ], 'DirectKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', // REQUIRED 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <integer || string || DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'WindowSize' => <integer>, ], 'DirectKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', // REQUIRED 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' => true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 
'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <integer || string || DateTime>, 'StreamArn' => '<string>', 'StreamName' => '<string>', ], 'WindowSize' => <integer>, ], 'DropDuplicates' => [ 'Columns' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED ], 'DropFields' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Paths' => [ // REQUIRED ['<string>', ...], // ... ], ], 'DropNullFields' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'NullCheckBoxList' => [ 'IsEmpty' => true || false, 'IsNegOne' => true || false, 'IsNullString' => true || false, ], 'NullTextList' => [ [ 'Datatype' => [ // REQUIRED 'Id' => '<string>', // REQUIRED 'Label' => '<string>', // REQUIRED ], 'Value' => '<string>', // REQUIRED ], // ... ], ], 'DynamicTransform' => [ 'FunctionName' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Parameters' => [ [ 'IsOptional' => true || false, 'ListType' => 'str|int|float|complex|bool|list|null', 'Name' => '<string>', // REQUIRED 'Type' => 'str|int|float|complex|bool|list|null', // REQUIRED 'ValidationMessage' => '<string>', 'ValidationRule' => '<string>', 'Value' => ['<string>', ...], ], // ... 
], 'Path' => '<string>', // REQUIRED 'TransformName' => '<string>', // REQUIRED 'Version' => '<string>', ], 'DynamoDBCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'EvaluateDataQuality' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Output' => 'PrimaryInput|EvaluationResults', 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', // REQUIRED 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'EvaluateDataQualityMultiFrame' => [ 'AdditionalDataSources' => ['<string>', ...], 'AdditionalOptions' => ['<string>', ...], 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', // REQUIRED 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'FillMissingValues' => [ 'FilledPath' => '<string>', 'ImputedPath' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED ], 'Filter' => [ 'Filters' => [ // REQUIRED [ 'Negated' => true || false, 'Operation' => 'EQ|LT|GT|LTE|GTE|REGEX|ISNULL', // REQUIRED 'Values' => [ // REQUIRED [ 'Type' => 'COLUMNEXTRACTED|CONSTANT', // REQUIRED 'Value' => ['<string>', ...], // REQUIRED ], // ... ], ], // ... 
], 'Inputs' => ['<string>', ...], // REQUIRED 'LogicalOperator' => 'AND|OR', // REQUIRED 'Name' => '<string>', // REQUIRED ], 'GovernedCatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionPredicate' => '<string>', 'Table' => '<string>', // REQUIRED ], 'GovernedCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', // REQUIRED ], 'JDBCConnectorSource' => [ 'AdditionalOptions' => [ 'DataTypeMapping' => ['<string>', ...], 'FilterPredicate' => '<string>', 'JobBookmarkKeys' => ['<string>', ...], 'JobBookmarkKeysSortOrder' => '<string>', 'LowerBound' => <integer>, 'NumPartitions' => <integer>, 'PartitionColumn' => '<string>', 'UpperBound' => <integer>, ], 'ConnectionName' => '<string>', // REQUIRED 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Query' => '<string>', ], 'JDBCConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', // REQUIRED 'ConnectionTable' => '<string>', // REQUIRED 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'Join' => [ 'Columns' => [ // REQUIRED [ 'From' => '<string>', // REQUIRED 'Keys' => [ // REQUIRED ['<string>', ...], // ... ], ], // ... 
], 'Inputs' => ['<string>', ...], // REQUIRED 'JoinType' => 'equijoin|left|right|outer|leftsemi|leftanti', // REQUIRED 'Name' => '<string>', // REQUIRED ], 'Merge' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PrimaryKeys' => [ // REQUIRED ['<string>', ...], // ... ], 'Source' => '<string>', // REQUIRED ], 'MicrosoftSQLServerCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'MicrosoftSQLServerCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'MySQLCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'MySQLCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'OracleSQLCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'OracleSQLCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'PIIDetection' => [ 'EntityTypesToDetect' => ['<string>', ...], // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'MaskValue' => '<string>', 'Name' => '<string>', // REQUIRED 'OutputColumnName' => '<string>', 'PiiType' => 'RowAudit|RowMasking|ColumnAudit|ColumnMasking', // REQUIRED 'SampleFraction' => <float>, 'ThresholdFraction' => <float>, ], 'PostgreSQLCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'PostgreSQLCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED 
], 'Recipe' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'RecipeReference' => [ 'RecipeArn' => '<string>', // REQUIRED 'RecipeVersion' => '<string>', // REQUIRED ], 'RecipeSteps' => [ [ 'Action' => [ // REQUIRED 'Operation' => '<string>', // REQUIRED 'Parameters' => ['<string>', ...], ], 'ConditionExpressions' => [ [ 'Condition' => '<string>', // REQUIRED 'TargetColumn' => '<string>', // REQUIRED 'Value' => '<string>', ], // ... ], ], // ... ], ], 'RedshiftSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', // REQUIRED 'TmpDirIAMRole' => '<string>', ], 'RedshiftTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', // REQUIRED 'TmpDirIAMRole' => '<string>', 'UpsertRedshiftOptions' => [ 'ConnectionName' => '<string>', 'TableLocation' => '<string>', 'UpsertKeys' => ['<string>', ...], ], ], 'RelationalCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'RenameField' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'SourcePath' => ['<string>', ...], // REQUIRED 'TargetPath' => ['<string>', ...], // REQUIRED ], 'S3CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', // REQUIRED ], 'S3CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... 
], 'Table' => '<string>', // REQUIRED ], 'S3CatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionPredicate' => '<string>', 'Table' => '<string>', // REQUIRED ], 'S3CatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', // REQUIRED ], 'S3CsvSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Escaper' => '<string>', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', // REQUIRED 'OptimizePerformance' => true || false, 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], // REQUIRED 'QuoteChar' => 'quote|quillemet|single_quote|disabled', // REQUIRED 'Recurse' => true || false, 'Separator' => 'comma|ctrla|pipe|semicolon|tab', // REQUIRED 'SkipFirst' => true || false, 'WithHeader' => true || false, 'WriteHeader' => true || false, ], 'S3DeltaCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', // REQUIRED ], 'S3DeltaDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Compression' => 'uncompressed|snappy', // REQUIRED 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', // REQUIRED 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3DeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], // REQUIRED ], 'S3DirectTarget' => [ 'Compression' => '<string>', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', // REQUIRED 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3GlueParquetTarget' => [ 'Compression' => 'snappy|lzo|gzip|uncompressed|none', 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'Path' => '<string>', // REQUIRED 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], // REQUIRED 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', // REQUIRED ], 'S3HudiDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], // REQUIRED 'Compression' => 'gzip|lzo|uncompressed|snappy', // REQUIRED 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', // REQUIRED 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... 
], 'Paths' => ['<string>', ...], // REQUIRED ], 'S3JsonSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'JsonPath' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], // REQUIRED 'Recurse' => true || false, ], 'S3ParquetSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'snappy|lzo|gzip|uncompressed|none', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], // REQUIRED 'Recurse' => true || false, ], 'SelectFields' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Paths' => [ // REQUIRED ['<string>', ...], // ... 
], ], 'SelectFromCollection' => [ 'Index' => <integer>, // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED ], 'SnowflakeSource' => [ 'Data' => [ // REQUIRED 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SnowflakeTarget' => [ 'Data' => [ // REQUIRED 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', // REQUIRED ], 'SparkConnectorSource' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', // REQUIRED 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', // REQUIRED 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkSQL' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'SqlAliases' => [ // REQUIRED [ 'Alias' => '<string>', // REQUIRED 'From' => '<string>', // REQUIRED ], // ... ], 'SqlQuery' => '<string>', // REQUIRED ], 'Spigot' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Path' => '<string>', // REQUIRED 'Prob' => <float>, 'Topk' => <integer>, ], 'SplitFields' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Paths' => [ // REQUIRED ['<string>', ...], // ... ], ], 'Union' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'UnionType' => 'ALL|DISTINCT', // REQUIRED ], ], // ... 
], 'Command' => [ 'Name' => '<string>', 'PythonVersion' => '<string>', 'Runtime' => '<string>', 'ScriptLocation' => '<string>', ], 'Connections' => [ 'Connections' => ['<string>', ...], ], 'DefaultArguments' => ['<string>', ...], 'Description' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionProperty' => [ 'MaxConcurrentRuns' => <integer>, ], 'GlueVersion' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobRunQueuingEnabled' => true || false, 'LogUri' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'MaxRetries' => <integer>, 'NonOverridableArguments' => ['<string>', ...], 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'Role' => '<string>', 'SecurityConfiguration' => '<string>', 'SourceControlDetails' => [ 'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER', 'AuthToken' => '<string>', 'Branch' => '<string>', 'Folder' => '<string>', 'LastCommitId' => '<string>', 'Owner' => '<string>', 'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT', 'Repository' => '<string>', ], 'Timeout' => <integer>, 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], ]);
Parameter Details
Members
- JobName
-
- Required: Yes
- Type: string
The name of the job definition to update.
- JobUpdate
-
- Required: Yes
- Type: JobUpdate structure
Specifies the values with which to update the job definition. Unspecified configuration is removed or reset to default values.
Result Syntax
[ 'JobName' => '<string>', ]
Result Details
Members
- JobName
-
- Type: string
Returns the name of the updated job definition.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
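Because unspecified JobUpdate configuration is removed or reset, one cautious pattern is to start from the current job definition and change only what is needed. The following is a minimal sketch under stated assumptions: the helper buildJobUpdate(), the sample job data, and the list of read-only keys are illustrative and not part of the SDK, and a real job definition may contain other fields that need to be dropped before the update is accepted. The client call is left as a comment so that the snippet runs without the SDK.

```php
<?php
// Assumption: these keys appear in a job definition returned by getJob()
// but are not accepted inside JobUpdate.
const READ_ONLY_JOB_KEYS = ['Name', 'CreatedOn', 'LastModifiedOn'];

/**
 * Build a JobUpdate array that keeps the existing settings and applies overrides.
 */
function buildJobUpdate(array $existingJob, array $overrides): array
{
    $jobUpdate = array_diff_key($existingJob, array_flip(READ_ONLY_JOB_KEYS));

    return array_replace($jobUpdate, $overrides);
}

// Hypothetical job definition, shaped like the 'Job' member of a getJob() result.
$existingJob = [
    'Name' => 'nightly-etl',
    'Role' => 'arn:aws:iam::123456789012:role/GlueJobRole',
    'Command' => [
        'Name' => 'glueetl',
        'ScriptLocation' => 's3://example-bucket/scripts/etl.py',
    ],
    'MaxRetries' => 0,
    'Timeout' => 2880,
];

$params = [
    'JobName' => $existingJob['Name'],
    'JobUpdate' => buildJobUpdate($existingJob, ['MaxRetries' => 2]),
];

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->updateJob($params);
```

Only MaxRetries changes here; Role, Command, and Timeout are carried over so that they are not reset by the update.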
UpdateJobFromSourceControl
$result = $client->updateJobFromSourceControl([/* ... */]);
$promise = $client->updateJobFromSourceControlAsync([/* ... */]);
Synchronizes a job from the source control repository. This operation takes the job artifacts that are located in the remote repository and updates the Glue internal stores with these artifacts.
This API supports optional parameters which take in the repository information.
Parameter Syntax
$result = $client->updateJobFromSourceControl([
    'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER',
    'AuthToken' => '<string>',
    'BranchName' => '<string>',
    'CommitId' => '<string>',
    'Folder' => '<string>',
    'JobName' => '<string>',
    'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT',
    'RepositoryName' => '<string>',
    'RepositoryOwner' => '<string>',
]);
Parameter Details
Members
- AuthStrategy
-
- Type: string
The type of authentication, which can be an authentication token stored in Amazon Web Services Secrets Manager, or a personal access token.
- AuthToken
-
- Type: string
The value of the authorization token.
- BranchName
-
- Type: string
An optional branch in the remote repository.
- CommitId
-
- Type: string
A commit ID for a commit in the remote repository.
- Folder
-
- Type: string
An optional folder in the remote repository.
- JobName
-
- Type: string
The name of the Glue job to be synchronized to or from the remote repository.
- Provider
-
- Type: string
The provider for the remote repository. Possible values: GITHUB, AWS_CODE_COMMIT, GITLAB, BITBUCKET.
- RepositoryName
-
- Type: string
The name of the remote repository that contains the job artifacts. For BitBucket providers, RepositoryName should include WorkspaceName. Use the format <WorkspaceName>/<RepositoryName>.
- RepositoryOwner
-
- Type: string
The owner of the remote repository that contains the job artifacts.
Result Syntax
[ 'JobName' => '<string>', ]
Result Details
Members
- JobName
-
- Type: string
The name of the Glue job.
Errors
- AccessDeniedException:
Access to a resource was denied.
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- ValidationException:
A value could not be validated.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
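The RepositoryName rule for BitBucket is easy to get wrong, so a small helper can assemble the value. This is a sketch only: repositoryNameFor(), the job, repository, and workspace names, and the GLUE_GIT_TOKEN environment variable are all hypothetical, and the client call is left as a comment so that the snippet runs without the SDK.

```php
<?php
/**
 * Return the RepositoryName value, prefixing the workspace for BitBucket
 * in the <WorkspaceName>/<RepositoryName> format.
 */
function repositoryNameFor(string $provider, string $repository, ?string $workspace = null): string
{
    if ($provider === 'BITBUCKET') {
        if ($workspace === null || $workspace === '') {
            throw new InvalidArgumentException('BitBucket repositories need a workspace name.');
        }

        return $workspace . '/' . $repository;
    }

    return $repository;
}

$params = [
    'JobName' => 'nightly-etl',                 // hypothetical job name
    'Provider' => 'BITBUCKET',
    'RepositoryName' => repositoryNameFor('BITBUCKET', 'glue-jobs', 'data-team'),
    'BranchName' => 'main',
    'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN',
    // Hypothetical environment variable; an empty string results if it is unset.
    'AuthToken' => (string) getenv('GLUE_GIT_TOKEN'),
];

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->updateJobFromSourceControl($params);
```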
UpdateMLTransform
$result = $client->updateMLTransform([/* ... */]);
$promise = $client->updateMLTransformAsync([/* ... */]);
Updates an existing machine learning transform. Call this operation to tune the algorithm parameters to achieve better results.
After calling this operation, you can call the StartMLEvaluationTaskRun operation to assess how well your new parameters achieved your goals (such as improving the quality of your machine learning transform, or making it more cost-effective).
Parameter Syntax
$result = $client->updateMLTransform([
    'Description' => '<string>',
    'GlueVersion' => '<string>',
    'MaxCapacity' => <float>,
    'MaxRetries' => <integer>,
    'Name' => '<string>',
    'NumberOfWorkers' => <integer>,
    'Parameters' => [
        'FindMatchesParameters' => [
            'AccuracyCostTradeoff' => <float>,
            'EnforceProvidedLabels' => true || false,
            'PrecisionRecallTradeoff' => <float>,
            'PrimaryKeyColumnName' => '<string>',
        ],
        'TransformType' => 'FIND_MATCHES', // REQUIRED
    ],
    'Role' => '<string>',
    'Timeout' => <integer>,
    'TransformId' => '<string>', // REQUIRED
    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
]);
Parameter Details
Members
- Description
-
- Type: string
A description of the transform. The default is an empty string.
- GlueVersion
-
- Type: string
This value determines which version of Glue this machine learning transform is compatible with. Glue 1.0 is recommended for most customers. If the value is not set, the Glue compatibility defaults to Glue 0.9. For more information, see Glue Versions in the developer guide.
- MaxCapacity
-
- Type: double
The number of Glue data processing units (DPUs) that are allocated to task runs for this transform. You can allocate from 2 to 100 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
When the WorkerType field is set to a value other than Standard, the MaxCapacity field is set automatically and becomes read-only.
- MaxRetries
-
- Type: int
The maximum number of times to retry a task for this transform after a task run fails.
- Name
-
- Type: string
The unique name that you gave the transform when you created it.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated when this task runs.
- Parameters
-
- Type: TransformParameters structure
The configuration parameters that are specific to the transform type (algorithm) used. Conditionally dependent on the transform type.
- Role
-
- Type: string
The name or Amazon Resource Name (ARN) of the IAM role with the required permissions.
- Timeout
-
- Type: int
The timeout for a task run for this transform in minutes. This is the maximum time that a task run for this transform can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
- TransformId
-
- Required: Yes
- Type: string
A unique identifier that was generated when the transform was created.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated when this task runs. Accepts a value of Standard, G.1X, or G.2X.
  - For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory and a 50 GB disk, and 2 executors per worker.
  - For the G.1X worker type, each worker provides 4 vCPU, 16 GB of memory and a 64 GB disk, and 1 executor per worker.
  - For the G.2X worker type, each worker provides 8 vCPU, 32 GB of memory and a 128 GB disk, and 1 executor per worker.
Result Syntax
[ 'TransformId' => '<string>', ]
Result Details
Members
- TransformId
-
- Type: string
The unique identifier for the transform that was updated.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- AccessDeniedException:
Access to a resource was denied.
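The interaction between WorkerType and MaxCapacity described above can be handled before the request is built. The sketch below assumes a hypothetical helper, mlTransformUpdateParams(), and a made-up transform ID; it drops MaxCapacity whenever a non-Standard worker type is requested. The client call is left as a comment so that the snippet runs without the SDK.

```php
<?php
/**
 * Build updateMLTransform parameters, omitting MaxCapacity when the worker
 * type is anything other than Standard (it is then set automatically).
 */
function mlTransformUpdateParams(string $transformId, array $settings): array
{
    $workerType = $settings['WorkerType'] ?? null;
    if ($workerType !== null && $workerType !== 'Standard') {
        unset($settings['MaxCapacity']);
    }

    return ['TransformId' => $transformId] + $settings;
}

$params = mlTransformUpdateParams('tfm-0123456789abcdef', [ // hypothetical ID
    'WorkerType' => 'G.1X',
    'NumberOfWorkers' => 10,
    'MaxCapacity' => 10.0, // removed by the helper because WorkerType is G.1X
    'Parameters' => [
        'TransformType' => 'FIND_MATCHES',
        'FindMatchesParameters' => ['PrecisionRecallTradeoff' => 0.9],
    ],
]);

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->updateMLTransform($params);
```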
UpdatePartition
$result = $client->updatePartition([/* ... */]);
$promise = $client->updatePartitionAsync([/* ... */]);
Updates a partition.
Parameter Syntax
$result = $client->updatePartition([ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', // REQUIRED 'PartitionInput' => [ // REQUIRED 'LastAccessTime' => <integer || string || DateTime>, 'LastAnalyzedTime' => <integer || string || DateTime>, 'Parameters' => ['<string>', ...], 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', // REQUIRED 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', // REQUIRED 'SortOrder' => <integer>, // REQUIRED ], // ... ], 'StoredAsSubDirectories' => true || false, ], 'Values' => ['<string>', ...], ], 'PartitionValueList' => ['<string>', ...], // REQUIRED 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partition to be updated resides. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database in which the table in question resides.
- PartitionInput
-
- Required: Yes
- Type: PartitionInput structure
The new partition object to update the partition to.
The Values property can't be changed. If you want to change the partition key values for a partition, delete and recreate the partition.
- PartitionValueList
-
- Required: Yes
- Type: Array of strings
List of partition key values that define the partition to update.
- TableName
-
- Required: Yes
- Type: string
The name of the table in which the partition to be updated is located.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
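Since the Values property cannot change, PartitionInput.Values and PartitionValueList should carry the same key values. A minimal sketch, assuming a hypothetical helper updatePartitionParams() and made-up database, table, and bucket names; a real call would normally pass the complete StorageDescriptor instead of only the Location. The client call is left as a comment so that the snippet runs without the SDK.

```php
<?php
/**
 * Build updatePartition parameters. 'Values' is always taken from
 * $partitionValues, even if $changes contains its own 'Values' key.
 */
function updatePartitionParams(string $database, string $table, array $partitionValues, array $changes): array
{
    $partitionInput = array_replace($changes, ['Values' => $partitionValues]);

    return [
        'DatabaseName' => $database,
        'TableName' => $table,
        'PartitionValueList' => $partitionValues,
        'PartitionInput' => $partitionInput,
    ];
}

$params = updatePartitionParams('sales_db', 'orders', ['2024', '01'], [
    'StorageDescriptor' => [
        'Location' => 's3://example-bucket/orders/year=2024/month=01/',
    ],
]);

// With a configured Aws\Glue\GlueClient in $client:
// $client->updatePartition($params);
```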
UpdateRegistry
$result = $client->updateRegistry([/* ... */]);
$promise = $client->updateRegistryAsync([/* ... */]);
Updates an existing registry which is used to hold a collection of schemas. The updated properties relate to the registry, and do not modify any of the schemas within the registry.
Parameter Syntax
$result = $client->updateRegistry([
    'Description' => '<string>', // REQUIRED
    'RegistryId' => [ // REQUIRED
        'RegistryArn' => '<string>',
        'RegistryName' => '<string>',
    ],
]);
Parameter Details
Members
- Description
-
- Required: Yes
- Type: string
A description of the registry. If description is not provided, this field will not be updated.
- RegistryId
-
- Required: Yes
- Type: RegistryId structure
This is a wrapper structure that may contain the registry name and Amazon Resource Name (ARN).
Result Syntax
[ 'RegistryArn' => '<string>', 'RegistryName' => '<string>', ]
Result Details
Members
- RegistryArn
-
- Type: string
The Amazon Resource Name (ARN) of the updated registry.
- RegistryName
-
- Type: string
The name of the updated registry.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- InternalServiceException:
An internal service error occurred.
UpdateSchema
$result = $client->updateSchema([/* ... */]);
$promise = $client->updateSchemaAsync([/* ... */]);
Updates the description, compatibility setting, or version checkpoint for a schema set.
For updating the compatibility setting, the call will not validate compatibility for the entire set of schema versions with the new compatibility setting. If the value for Compatibility is provided, the VersionNumber (a checkpoint) is also required. The API will validate the checkpoint version number for consistency.
If the value for the VersionNumber (checkpoint) is provided, Compatibility is optional and this can be used to set/reset a checkpoint for the schema.
This update will happen only if the schema is in the AVAILABLE state.
Parameter Syntax
$result = $client->updateSchema([
    'Compatibility' => 'NONE|DISABLED|BACKWARD|BACKWARD_ALL|FORWARD|FORWARD_ALL|FULL|FULL_ALL',
    'Description' => '<string>',
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
    'SchemaVersionNumber' => [
        'LatestVersion' => true || false,
        'VersionNumber' => <integer>,
    ],
]);
Parameter Details
Members
- Compatibility
-
- Type: string
The new compatibility setting for the schema.
- Description
-
- Type: string
The new description for the schema.
- SchemaId
-
- Required: Yes
- Type: SchemaId structure
This is a wrapper structure to contain schema identity fields. The structure contains:
  - SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.
  - SchemaId$SchemaName: The name of the schema. One of SchemaArn or SchemaName has to be provided.
- SchemaVersionNumber
-
- Type: SchemaVersionNumber structure
Version number required for checkpointing. One of VersionNumber or Compatibility has to be provided.
Result Syntax
[ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ]
Result Details
Members
- RegistryName
-
- Type: string
The name of the registry that contains the schema.
- SchemaArn
-
- Type: string
The Amazon Resource Name (ARN) of the schema.
- SchemaName
-
- Type: string
The name of the schema.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- InternalServiceException:
An internal service error occurred.
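The rule that Compatibility needs a checkpoint VersionNumber, while a checkpoint alone is allowed, can be checked on the client side before the request is sent. The helper updateSchemaParams() and the schema and registry names below are hypothetical; this is a sketch of the rule as described above, not SDK behavior, and the client call is left as a comment so that the snippet runs without the SDK.

```php
<?php
/**
 * Build updateSchema parameters, rejecting a Compatibility change
 * that does not come with a checkpoint version number.
 */
function updateSchemaParams(
    array $schemaId,
    ?string $compatibility = null,
    ?int $checkpointVersion = null,
    ?string $description = null
): array {
    if ($compatibility !== null && $checkpointVersion === null) {
        throw new InvalidArgumentException('Compatibility requires a checkpoint VersionNumber.');
    }

    $params = ['SchemaId' => $schemaId];
    if ($compatibility !== null) {
        $params['Compatibility'] = $compatibility;
    }
    if ($checkpointVersion !== null) {
        $params['SchemaVersionNumber'] = ['VersionNumber' => $checkpointVersion];
    }
    if ($description !== null) {
        $params['Description'] = $description;
    }

    return $params;
}

$schemaId = ['RegistryName' => 'default-registry', 'SchemaName' => 'orders'];
$params = updateSchemaParams($schemaId, 'BACKWARD', 3);

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->updateSchema($params);
```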
UpdateSourceControlFromJob
$result = $client->updateSourceControlFromJob([/* ... */]);
$promise = $client->updateSourceControlFromJobAsync([/* ... */]);
Synchronizes a job to the source control repository. This operation takes the job artifacts from the Glue internal stores and makes a commit to the remote repository that is configured on the job.
This API supports optional parameters which take in the repository information.
Parameter Syntax
$result = $client->updateSourceControlFromJob([
    'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER',
    'AuthToken' => '<string>',
    'BranchName' => '<string>',
    'CommitId' => '<string>',
    'Folder' => '<string>',
    'JobName' => '<string>',
    'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT',
    'RepositoryName' => '<string>',
    'RepositoryOwner' => '<string>',
]);
Parameter Details
Members
- AuthStrategy
-
- Type: string
The type of authentication, which can be an authentication token stored in Amazon Web Services Secrets Manager, or a personal access token.
- AuthToken
-
- Type: string
The value of the authorization token.
- BranchName
-
- Type: string
An optional branch in the remote repository.
- CommitId
-
- Type: string
A commit ID for a commit in the remote repository.
- Folder
-
- Type: string
An optional folder in the remote repository.
- JobName
-
- Type: string
The name of the Glue job to be synchronized to or from the remote repository.
- Provider
-
- Type: string
The provider for the remote repository. Possible values: GITHUB, AWS_CODE_COMMIT, GITLAB, BITBUCKET.
- RepositoryName
-
- Type: string
The name of the remote repository that contains the job artifacts. For BitBucket providers, RepositoryName should include WorkspaceName. Use the format <WorkspaceName>/<RepositoryName>.
- RepositoryOwner
-
- Type: string
The owner of the remote repository that contains the job artifacts.
Result Syntax
[ 'JobName' => '<string>', ]
Result Details
Members
- JobName
-
- Type: string
The name of the Glue job.
Errors
- AccessDeniedException:
Access to a resource was denied.
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- ValidationException:
A value could not be validated.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
UpdateTable
$result = $client->updateTable([/* ... */]);
$promise = $client->updateTableAsync([/* ... */]);
Updates a metadata table in the Data Catalog.
Parameter Syntax
$result = $client->updateTable([ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', // REQUIRED 'Force' => true || false, 'SkipArchive' => true || false, 'TableInput' => [ // REQUIRED 'Description' => '<string>', 'LastAccessTime' => <integer || string || DateTime>, 'LastAnalyzedTime' => <integer || string || DateTime>, 'Name' => '<string>', // REQUIRED 'Owner' => '<string>', 'Parameters' => ['<string>', ...], 'PartitionKeys' => [ [ 'Comment' => '<string>', 'Name' => '<string>', // REQUIRED 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Retention' => <integer>, 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', // REQUIRED 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', // REQUIRED 'SortOrder' => <integer>, // REQUIRED ], // ... 
], 'StoredAsSubDirectories' => true || false, ], 'TableType' => '<string>', 'TargetTable' => [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'Name' => '<string>', 'Region' => '<string>', ], 'ViewDefinition' => [ 'Definer' => '<string>', 'IsProtected' => true || false, 'Representations' => [ [ 'Dialect' => 'REDSHIFT|ATHENA|SPARK', 'DialectVersion' => '<string>', 'ValidationConnection' => '<string>', 'ViewExpandedText' => '<string>', 'ViewOriginalText' => '<string>', ], // ... ], 'SubObjects' => ['<string>', ...], ], 'ViewExpandedText' => '<string>', 'ViewOriginalText' => '<string>', ], 'TransactionId' => '<string>', 'VersionId' => '<string>', 'ViewUpdateAction' => 'ADD|REPLACE|ADD_OR_REPLACE|DROP', ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the table resides. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database in which the table resides. For Hive compatibility, this name is entirely lowercase.
- Force
-
- Type: boolean
A flag that can be set to true to ignore the storage descriptor and subobject matching requirements.
- SkipArchive
-
- Type: boolean
By default, UpdateTable always creates an archived version of the table before updating it. However, if skipArchive is set to true, UpdateTable does not create the archived version.
- TableInput
-
- Required: Yes
- Type: TableInput structure
An updated TableInput object to define the metadata table in the catalog.
- TransactionId
-
- Type: string
The transaction ID at which to update the table contents.
- VersionId
-
- Type: string
The version ID at which to update the table contents.
- ViewUpdateAction
-
- Type: string
The operation to be performed when updating the view.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- GlueEncryptionException:
An encryption operation failed.
- ResourceNotReadyException:
A resource was not ready for a transaction.
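A common way to update a table is to derive TableInput from an existing table description and keep only the keys that TableInput accepts. The sketch below takes its key list from the Parameter Syntax for this operation; the helper tableInputFrom() and the sample table are hypothetical, and the client call is left as a comment so that the snippet runs without the SDK.

```php
<?php
// TableInput keys as listed in the updateTable Parameter Syntax.
const TABLE_INPUT_KEYS = [
    'Description', 'LastAccessTime', 'LastAnalyzedTime', 'Name', 'Owner',
    'Parameters', 'PartitionKeys', 'Retention', 'StorageDescriptor',
    'TableType', 'TargetTable', 'ViewDefinition', 'ViewExpandedText',
    'ViewOriginalText',
];

/**
 * Keep only TableInput keys from a table description, then apply overrides.
 */
function tableInputFrom(array $table, array $overrides = []): array
{
    $input = array_intersect_key($table, array_flip(TABLE_INPUT_KEYS));

    return array_replace($input, $overrides);
}

// Hypothetical table description, shaped like the 'Table' member of a getTable() result.
$table = [
    'Name' => 'orders',
    'DatabaseName' => 'sales_db',
    'CreateTime' => '2024-01-01T00:00:00Z',
    'TableType' => 'EXTERNAL_TABLE',
    'Parameters' => ['classification' => 'parquet'],
];

$params = [
    'DatabaseName' => $table['DatabaseName'],
    'TableInput' => tableInputFrom($table, ['Description' => 'Order facts']),
    'SkipArchive' => true, // do not create an archived version before updating
];

// With a configured Aws\Glue\GlueClient in $client:
// $client->updateTable($params);
```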
UpdateTableOptimizer
$result = $client->updateTableOptimizer([/* ... */]);
$promise = $client->updateTableOptimizerAsync([/* ... */]);
Updates the configuration for an existing table optimizer.
Parameter Syntax
$result = $client->updateTableOptimizer([
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'TableOptimizerConfiguration' => [ // REQUIRED
        'enabled' => true || false,
        'orphanFileDeletionConfiguration' => [
            'icebergConfiguration' => [
                'location' => '<string>',
                'orphanFileRetentionPeriodInDays' => <integer>,
            ],
        ],
        'retentionConfiguration' => [
            'icebergConfiguration' => [
                'cleanExpiredFiles' => true || false,
                'numberOfSnapshotsToRetain' => <integer>,
                'snapshotRetentionPeriodInDays' => <integer>,
            ],
        ],
        'roleArn' => '<string>',
    ],
    'Type' => 'compaction|retention|orphan_file_deletion', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Required: Yes
- Type: string
The Catalog ID of the table.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database in the catalog in which the table resides.
- TableName
-
- Required: Yes
- Type: string
The name of the table.
- TableOptimizerConfiguration
-
- Required: Yes
- Type: TableOptimizerConfiguration structure
A TableOptimizerConfiguration object representing the configuration of a table optimizer.
- Type
-
- Required: Yes
- Type: string
The type of table optimizer. The parameter syntax for this operation lists compaction, retention, and orphan_file_deletion as the accepted values.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- ValidationException:
A value could not be validated.
- InternalServiceException:
An internal service error occurred.
- ThrottlingException:
The throttling threshold was exceeded.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
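As an illustration of how the four required parameters fit together, the sketch below builds a request that enables a retention optimizer. The helper tableOptimizerParams(), the account ID, role ARN, and table names are hypothetical; the accepted Type values are taken from the parameter syntax for this operation. The client call is left as a comment so that the snippet runs without the SDK.

```php
<?php
// Type values as listed in the updateTableOptimizer Parameter Syntax.
const OPTIMIZER_TYPES = ['compaction', 'retention', 'orphan_file_deletion'];

/**
 * Build updateTableOptimizer parameters after checking the optimizer type.
 */
function tableOptimizerParams(
    string $catalogId,
    string $database,
    string $table,
    string $type,
    array $configuration
): array {
    if (!in_array($type, OPTIMIZER_TYPES, true)) {
        throw new InvalidArgumentException("Unknown optimizer type: $type");
    }

    return [
        'CatalogId' => $catalogId,
        'DatabaseName' => $database,
        'TableName' => $table,
        'Type' => $type,
        'TableOptimizerConfiguration' => $configuration,
    ];
}

$params = tableOptimizerParams('123456789012', 'sales_db', 'orders', 'retention', [
    'enabled' => true,
    'roleArn' => 'arn:aws:iam::123456789012:role/GlueOptimizerRole',
    'retentionConfiguration' => [
        'icebergConfiguration' => [
            'snapshotRetentionPeriodInDays' => 7,
            'numberOfSnapshotsToRetain' => 3,
            'cleanExpiredFiles' => true,
        ],
    ],
]);

// With a configured Aws\Glue\GlueClient in $client:
// $client->updateTableOptimizer($params);
```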
UpdateTrigger
$result = $client->updateTrigger([/* ... */]);
$promise = $client->updateTriggerAsync([/* ... */]);
Updates a trigger definition.
Parameter Syntax
$result = $client->updateTrigger([ 'Name' => '<string>', // REQUIRED 'TriggerUpdate' => [ // REQUIRED 'Actions' => [ [ 'Arguments' => ['<string>', ...], 'CrawlerName' => '<string>', 'JobName' => '<string>', 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'SecurityConfiguration' => '<string>', 'Timeout' => <integer>, ], // ... ], 'Description' => '<string>', 'EventBatchingCondition' => [ 'BatchSize' => <integer>, // REQUIRED 'BatchWindow' => <integer>, ], 'Name' => '<string>', 'Predicate' => [ 'Conditions' => [ [ 'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', 'CrawlerName' => '<string>', 'JobName' => '<string>', 'LogicalOperator' => 'EQUALS', 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', ], // ... ], 'Logical' => 'AND|ANY', ], 'Schedule' => '<string>', ], ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the trigger to update.
- TriggerUpdate
-
- Required: Yes
- Type: TriggerUpdate structure
The new values with which to update the trigger.
Result Syntax
[ 'Trigger' => [ 'Actions' => [ [ 'Arguments' => ['<string>', ...], 'CrawlerName' => '<string>', 'JobName' => '<string>', 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'SecurityConfiguration' => '<string>', 'Timeout' => <integer>, ], // ... ], 'Description' => '<string>', 'EventBatchingCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Id' => '<string>', 'Name' => '<string>', 'Predicate' => [ 'Conditions' => [ [ 'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', 'CrawlerName' => '<string>', 'JobName' => '<string>', 'LogicalOperator' => 'EQUALS', 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', ], // ... ], 'Logical' => 'AND|ANY', ], 'Schedule' => '<string>', 'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING', 'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT', 'WorkflowName' => '<string>', ], ]
Result Details
Members
- Trigger
-
- Type: Trigger structure
The resulting trigger definition.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
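To show how Predicate, Conditions, and Actions nest inside TriggerUpdate, the sketch below builds a conditional trigger update that starts one job after several others have succeeded. The helper conditionalTriggerUpdate() and all trigger and job names are hypothetical, and the client call is left as a comment so that the snippet runs without the SDK.

```php
<?php
/**
 * Build a TriggerUpdate that starts $jobToStart once every job in
 * $watchedJobs has reached the SUCCEEDED state.
 */
function conditionalTriggerUpdate(array $watchedJobs, string $jobToStart): array
{
    $conditions = [];
    foreach ($watchedJobs as $jobName) {
        $conditions[] = [
            'LogicalOperator' => 'EQUALS',
            'JobName' => $jobName,
            'State' => 'SUCCEEDED',
        ];
    }

    return [
        'Predicate' => ['Logical' => 'AND', 'Conditions' => $conditions],
        'Actions' => [['JobName' => $jobToStart]],
    ];
}

$params = [
    'Name' => 'after-ingest',
    'TriggerUpdate' => conditionalTriggerUpdate(['ingest-a', 'ingest-b'], 'aggregate-daily'),
];

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->updateTrigger($params);
// $state = $result['Trigger']['State'];
```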
UpdateUsageProfile
$result = $client->updateUsageProfile([/* ... */]);
$promise = $client->updateUsageProfileAsync([/* ... */]);
Updates a Glue usage profile.
Parameter Syntax
$result = $client->updateUsageProfile([
    'Configuration' => [ // REQUIRED
        'JobConfiguration' => [
            '<NameString>' => [
                'AllowedValues' => ['<string>', ...],
                'DefaultValue' => '<string>',
                'MaxValue' => '<string>',
                'MinValue' => '<string>',
            ],
            // ...
        ],
        'SessionConfiguration' => [
            '<NameString>' => [
                'AllowedValues' => ['<string>', ...],
                'DefaultValue' => '<string>',
                'MaxValue' => '<string>',
                'MinValue' => '<string>',
            ],
            // ...
        ],
    ],
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Configuration
-
- Required: Yes
- Type: ProfileConfiguration structure
A ProfileConfiguration object specifying the job and session values for the profile.
- Description
-
- Type: string
A description of the usage profile.
- Name
-
- Required: Yes
- Type: string
The name of the usage profile.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the usage profile that was updated.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- OperationNotSupportedException:
The operation is not available in the region.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
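JobConfiguration and SessionConfiguration are maps from a setting name to a structure of string values. The sketch below shows one way to assemble such a map; the helper configurationValue(), the profile name, and the setting names numberOfWorkers and workerType are assumptions made for illustration and are not confirmed by this page. The client call is left as a comment so that the snippet runs without the SDK.

```php
<?php
/**
 * Build one configuration entry, including only the members that are set.
 * All values are strings, as in the Parameter Syntax.
 */
function configurationValue(string $default, array $allowed = [], ?string $min = null, ?string $max = null): array
{
    $value = ['DefaultValue' => $default];
    if ($allowed !== []) {
        $value['AllowedValues'] = $allowed;
    }
    if ($min !== null) {
        $value['MinValue'] = $min;
    }
    if ($max !== null) {
        $value['MaxValue'] = $max;
    }

    return $value;
}

$params = [
    'Name' => 'analyst-profile', // hypothetical profile name
    'Configuration' => [
        'JobConfiguration' => [
            // Setting names below are assumed for illustration.
            'numberOfWorkers' => configurationValue('5', [], '1', '10'),
            'workerType' => configurationValue('G.1X', ['G.1X', 'G.2X']),
        ],
    ],
];

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->updateUsageProfile($params);
```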
UpdateUserDefinedFunction
$result = $client->updateUserDefinedFunction([/* ... */]);
$promise = $client->updateUserDefinedFunctionAsync([/* ... */]);
Updates an existing function definition in the Data Catalog.
Parameter Syntax
$result = $client->updateUserDefinedFunction([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'FunctionInput' => [ // REQUIRED
        'ClassName' => '<string>',
        'FunctionName' => '<string>',
        'OwnerName' => '<string>',
        'OwnerType' => 'USER|ROLE|GROUP',
        'ResourceUris' => [
            [
                'ResourceType' => 'JAR|FILE|ARCHIVE',
                'Uri' => '<string>',
            ],
            // ...
        ],
    ],
    'FunctionName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the function to be updated is located. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the function to be updated is located.
- FunctionInput
-
- Required: Yes
- Type: UserDefinedFunctionInput structure
A FunctionInput object that redefines the function in the Data Catalog.
- FunctionName
-
- Required: Yes
- Type: string
The name of the function.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
UpdateWorkflow
$result = $client->updateWorkflow([/* ... */]);
$promise = $client->updateWorkflowAsync([/* ... */]);
Updates an existing workflow.
Parameter Syntax
$result = $client->updateWorkflow([
    'DefaultRunProperties' => ['<string>', ...],
    'Description' => '<string>',
    'MaxConcurrentRuns' => <integer>,
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- DefaultRunProperties
-
- Type: Associative array of custom string keys (IdString) to strings
A collection of properties to be used as part of each execution of the workflow.
- Description
-
- Type: string
The description of the workflow.
- MaxConcurrentRuns
-
- Type: int
You can use this parameter to prevent unwanted multiple updates to data, to control costs, or in some cases, to prevent exceeding the maximum number of concurrent runs of any of the component jobs. If you leave this parameter blank, there is no limit to the number of concurrent workflow runs.
- Name
-
- Required: Yes
- Type: string
Name of the workflow to be updated.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the workflow which was specified in input.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
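DefaultRunProperties is an associative array of strings, so numeric settings need to be converted before they are sent. A minimal sketch, assuming a hypothetical helper workflowUpdateParams() and made-up workflow and property names; the client call is left as a comment so that the snippet runs without the SDK.

```php
<?php
/**
 * Build updateWorkflow parameters, casting every run property to a string
 * and adding MaxConcurrentRuns only when a limit is requested.
 */
function workflowUpdateParams(string $name, array $runProperties, ?int $maxConcurrentRuns = null): array
{
    $params = [
        'Name' => $name,
        // array_map() keeps the string keys when given a single array.
        'DefaultRunProperties' => array_map('strval', $runProperties),
    ];
    if ($maxConcurrentRuns !== null) {
        $params['MaxConcurrentRuns'] = $maxConcurrentRuns;
    }

    return $params;
}

$params = workflowUpdateParams('daily-ingest', ['env' => 'prod', 'batch_size' => 500], 1);

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->updateWorkflow($params);
```

Leaving out the third argument omits MaxConcurrentRuns, which, per the parameter description above, means no limit on concurrent workflow runs.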
Shapes
AccessDeniedException
Description
Access to a resource was denied.
Members
- Message
-
- Type: string
A message describing the problem.
Action
Description
Defines an action to be initiated by a trigger.
Members
- Arguments
-
- Type: Associative array of custom string keys (GenericString) to strings
The job arguments used when this trigger fires. For this job run, they replace the default arguments set in the job definition itself.
You can specify arguments here that your own job-execution script consumes, as well as arguments that Glue itself consumes.
For information about how to specify and consume your own Job arguments, see the Calling Glue APIs in Python topic in the developer guide.
For information about the key-value pairs that Glue consumes to set up your job, see the Special Parameters Used by Glue topic in the developer guide.
- CrawlerName
-
- Type: string
The name of the crawler to be used with this action.
- JobName
-
- Type: string
The name of a job to be run.
- NotificationProperty
-
- Type: NotificationProperty structure
Specifies configuration properties of a job run notification.
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used with this action.
- Timeout
-
- Type: int
The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours). This overrides the timeout value set in the parent job.
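A minimal sketch of an Action array, as it might be passed inside a trigger definition. The job name, the argument key, and the S3 path are placeholders chosen for illustration; only the member names come from the list above.

```php
<?php
// Illustrative Action array; every value is a placeholder.
$action = [
    'JobName'   => 'example-job',                      // hypothetical job name
    'Arguments' => [
        // Hypothetical argument consumed by the job script.
        '--input_path' => 's3://example-bucket/input/',
    ],
    'Timeout'   => 60,                                 // minutes; overrides the parent job's timeout
];

// Timeout is expressed in minutes, so the documented default of 2,880 equals 48 hours.
$defaultTimeoutHours = intdiv(2880, 60);
```

Because Arguments replaces the job's default arguments for that run, only the keys that need to differ from the job definition have to be listed.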
Aggregate
Description
Specifies a transform that groups rows by chosen fields and computes the aggregated value by specified function.
Members
- Aggs
-
- Required: Yes
- Type: Array of AggregateOperation structures
Specifies the aggregate functions to be performed on specified fields.
- Groups
-
- Required: Yes
- Type: Array of arrays of strings
Specifies the fields to group by.
- Inputs
-
- Required: Yes
- Type: Array of strings
Specifies the fields and rows to use as inputs for the aggregate transform.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
AggregateOperation
Description
Specifies the set of parameters needed to perform aggregation in the aggregate transform.
Members
- AggFunc
-
- Required: Yes
- Type: string
Specifies the aggregation function to apply.
Possible aggregation functions include: avg, countDistinct, count, first, last, kurtosis, max, min, skewness, stddev_samp, stddev_pop, sum, sumDistinct, var_samp, var_pop.
- Column
-
- Required: Yes
- Type: Array of strings
Specifies the column on the data set on which the aggregation function will be applied.
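The Aggregate and AggregateOperation shapes fit together roughly as sketched below. Node names and column names are invented for the example, and the nesting of Groups (each group-by field given as its own path array) is an assumption about how the service expects the value.

```php
<?php
// Illustrative Aggregate node; names are placeholders.
$aggregate = [
    'Name'   => 'TotalSalesByRegion',   // hypothetical transform node name
    'Inputs' => ['node-1'],             // hypothetical ID of the upstream node
    // Assumption: each group-by field is expressed as a path (an array of strings).
    'Groups' => [['region']],
    'Aggs'   => [
        // One AggregateOperation per computed value.
        ['Column' => ['amount'], 'AggFunc' => 'sum'],
        ['Column' => ['order_id'], 'AggFunc' => 'countDistinct'],
    ],
];

// Collect the functions in use, for example to log what the node computes.
$functions = array_column($aggregate['Aggs'], 'AggFunc');
```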
AlreadyExistsException
Description
A resource to be created or added already exists.
Members
- Message
-
- Type: string
A message describing the problem.
AmazonRedshiftAdvancedOption
Description
Specifies an optional value when connecting to the Redshift cluster.
Members
- Key
-
- Type: string
The key for the additional connection option.
- Value
-
- Type: string
The value for the additional connection option.
AmazonRedshiftNodeData
Description
Specifies an Amazon Redshift node.
Members
- AccessType
-
- Type: string
The access type for the Redshift connection. Can be a direct connection or a catalog connection.
- Action
-
- Type: string
Specifies how writing to a Redshift cluster will occur.
- AdvancedOptions
-
- Type: Array of AmazonRedshiftAdvancedOption structures
Optional values when connecting to the Redshift cluster.
- CatalogDatabase
-
- Type: Option structure
The name of the Glue Data Catalog database when working with a data catalog.
- CatalogRedshiftSchema
-
- Type: string
The Redshift schema name when working with a data catalog.
- CatalogRedshiftTable
-
- Type: string
The database table to read from.
- CatalogTable
-
- Type: Option structure
The Glue Data Catalog table name when working with a data catalog.
- Connection
-
- Type: Option structure
The Glue connection to the Redshift cluster.
- CrawlerConnection
-
- Type: string
Specifies the name of the connection that is associated with the catalog table used.
- IamRole
-
- Type: Option structure
Optional. The role name to use when connecting to S3. The IAM role will default to the role on the job when left blank.
- MergeAction
-
- Type: string
The action used to determine how a MERGE in a Redshift sink will be handled.
- MergeClause
-
- Type: string
The SQL used in a custom merge to deal with matching records.
- MergeWhenMatched
-
- Type: string
The action used to determine how a MERGE in a Redshift sink will be handled when an existing record matches a new record.
- MergeWhenNotMatched
-
- Type: string
The action used to determine how a MERGE in a Redshift sink will be handled when an existing record doesn't match a new record.
- PostAction
-
- Type: string
The SQL used after a MERGE or APPEND with upsert is run.
- PreAction
-
- Type: string
The SQL used before a MERGE or APPEND with upsert is run.
- SampleQuery
-
- Type: string
The SQL used to fetch the data from a Redshift source when the SourceType is 'query'.
- Schema
-
- Type: Option structure
The Redshift schema name when working with a direct connection.
- SelectedColumns
-
- Type: Array of Option structures
The list of column names used to determine a matching record when doing a MERGE or APPEND with upsert.
- SourceType
-
- Type: string
The source type to specify whether a specific table is the source or a custom query.
- StagingTable
-
- Type: string
The name of the temporary staging table that is used when doing a MERGE or APPEND with upsert.
- Table
-
- Type: Option structure
The Redshift table name when working with a direct connection.
- TablePrefix
-
- Type: string
Specifies the prefix to a table.
- TableSchema
-
- Type: Array of Option structures
The array of schema output for a given node.
- TempDir
-
- Type: string
The Amazon S3 path where temporary data can be staged when copying out of the database.
- Upsert
-
- Type: boolean
The action used on Redshift sinks when doing an APPEND.
AmazonRedshiftSource
Description
Specifies an Amazon Redshift source.
Members
- Data
-
- Type: AmazonRedshiftNodeData structure
Specifies the data of the Amazon Redshift source node.
- Name
-
- Type: string
The name of the Amazon Redshift source.
AmazonRedshiftTarget
Description
Specifies an Amazon Redshift target.
Members
- Data
-
- Type: AmazonRedshiftNodeData structure
Specifies the data of the Amazon Redshift target node.
- Inputs
-
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Type: string
The name of the Amazon Redshift target.
AnnotationError
Description
A failed annotation.
Members
- FailureReason
-
- Type: string
The reason why the annotation failed.
- ProfileId
-
- Type: string
The Profile ID for the failed annotation.
- StatisticId
-
- Type: string
The Statistic ID for the failed annotation.
ApplyMapping
Description
Specifies a transform that maps data property keys in the data source to data property keys in the data target. You can rename keys, modify the data types for keys, and choose which keys to drop from the dataset.
Members
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names.
- Mapping
-
- Required: Yes
- Type: Array of Mapping structures
Specifies the mapping of data property keys in the data source to data property keys in the data target.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
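A sketch of an ApplyMapping node that renames one key and changes its type. The Mapping member names used here (FromPath, FromType, ToKey, ToType) are assumed from the Mapping shape, which is documented outside this section; node and column names are placeholders.

```php
<?php
// Illustrative ApplyMapping node; all names and values are placeholders.
$applyMapping = [
    'Name'    => 'RenameAndCast',   // hypothetical transform node name
    'Inputs'  => ['node-1'],        // hypothetical upstream node name
    'Mapping' => [
        [
            // Assumed Mapping members: source path and type, target key and type.
            'FromPath' => ['order_id'],
            'FromType' => 'string',
            'ToKey'    => 'id',
            'ToType'   => 'long',
        ],
    ],
];

// List the target keys this node would produce.
$targetKeys = array_column($applyMapping['Mapping'], 'ToKey');
```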
AthenaConnectorSource
Description
Specifies a connector to an Amazon Athena data source.
Members
- ConnectionName
-
- Required: Yes
- Type: string
The name of the connection that is associated with the connector.
- ConnectionTable
-
- Type: string
The name of the table in the data source.
- ConnectionType
-
- Required: Yes
- Type: string
The type of connection, such as marketplace.athena or custom.athena, designating a connection to an Amazon Athena data store.
- ConnectorName
-
- Required: Yes
- Type: string
The name of a connector that assists with accessing the data store in Glue Studio.
- Name
-
- Required: Yes
- Type: string
The name of the data source.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the custom Athena source.
- SchemaName
-
- Required: Yes
- Type: string
The name of the CloudWatch log group to read from. For example, /aws-glue/jobs/output.
AuditContext
Description
A structure containing the Lake Formation audit context.
Members
- AdditionalAuditContext
-
- Type: string
A string containing the additional audit context information.
- AllColumnsRequested
-
- Type: boolean
Whether all columns are requested for audit.
- RequestedColumns
-
- Type: Array of strings
The requested columns for audit.
AuthenticationConfiguration
Description
A structure containing the authentication configuration.
Members
- AuthenticationType
-
- Type: string
A structure containing the authentication configuration.
- OAuth2Properties
-
- Type: OAuth2Properties structure
The properties for OAuth2 authentication.
- SecretArn
-
- Type: string
The Secrets Manager ARN used to store credentials.
AuthenticationConfigurationInput
Description
A structure containing the authentication configuration in the CreateConnection request.
Members
- AuthenticationType
-
- Type: string
A structure containing the authentication configuration in the CreateConnection request.
- OAuth2Properties
-
- Type: OAuth2PropertiesInput structure
The properties for OAuth2 authentication in the CreateConnection request.
- SecretArn
-
- Type: string
The Secrets Manager ARN used to store credentials in the CreateConnection request.
AuthorizationCodeProperties
Description
The set of properties required for the OAuth2 AUTHORIZATION_CODE grant type workflow.
Members
- AuthorizationCode
-
- Type: string
An authorization code to be used in the third leg of the AUTHORIZATION_CODE grant workflow. This is a single-use code which becomes invalid once exchanged for an access token, thus it is acceptable to have this value as a request parameter.
- RedirectUri
-
- Type: string
The redirect URI to which the authorization server redirects the user when issuing an authorization code. The URI is subsequently used when the authorization code is exchanged for an access token.
BackfillError
Description
A list of errors that can occur when registering partition indexes for an existing table.
These errors give the details about why an index registration failed and provide a limited number of partitions in the response, so that you can fix the partitions at fault and try registering the index again. The most common set of errors that can occur are categorized as follows:
- EncryptedPartitionError: The partitions are encrypted.
- InvalidPartitionTypeDataError: The partition value doesn't match the data type for that partition column.
- MissingPartitionValueError: The partition value is missing.
- UnsupportedPartitionCharacterError: Characters inside the partition value are not supported. For example: U+0000, U+0001, U+0002.
- InternalError: Any error which does not belong to other error codes.
Members
- Code
-
- Type: string
The error code for an error that occurred when registering partition indexes for an existing table.
- Partitions
-
- Type: Array of PartitionValueList structures
A list of a limited number of partitions in the response.
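One way to act on these errors is to group the affected partitions by error code before fixing them. The sketch below assumes each entry in Partitions carries its values under a Values key (the PartitionValueList shape is documented elsewhere) and uses the code names exactly as listed above; the service may return them in a different form. groupBackfillErrors is a hypothetical helper.

```php
<?php
// Hypothetical helper: groups partition value lists from BackfillError entries by Code.
function groupBackfillErrors(array $backfillErrors): array
{
    $byCode = [];
    foreach ($backfillErrors as $error) {
        $code = $error['Code'] ?? 'Unknown';
        foreach ($error['Partitions'] ?? [] as $partition) {
            // Assumption: a PartitionValueList exposes its values under 'Values'.
            $byCode[$code][] = $partition['Values'] ?? [];
        }
    }
    return $byCode;
}

// Sample data shaped like a list of BackfillError structures (values are made up).
$sample = [
    [
        'Code' => 'EncryptedPartitionError',
        'Partitions' => [['Values' => ['2024', '01']], ['Values' => ['2024', '02']]],
    ],
    [
        'Code' => 'InternalError',
        'Partitions' => [['Values' => ['2023', '12']]],
    ],
];

$grouped = groupBackfillErrors($sample);
```

Since the response contains only a limited number of partitions per error, the grouped result is a starting point for investigation rather than a complete list.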
BasicCatalogTarget
Description
Specifies a target that uses a Glue Data Catalog table.
Members
- Database
-
- Required: Yes
- Type: string
The database that contains the table you want to use as the target. This database must already exist in the Data Catalog.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of your data target.
- PartitionKeys
-
- Type: Array of arrays of strings
The partition keys used to distribute data across multiple partitions or shards based on a specific key or set of keys.
- Table
-
- Required: Yes
- Type: string
The table that defines the schema of your output data. This table must already exist in the Data Catalog.
BatchGetTableOptimizerEntry
Description
Represents a table optimizer to retrieve in the BatchGetTableOptimizer operation.
Members
- catalogId
-
- Type: string
The Catalog ID of the table.
- databaseName
-
- Type: string
The name of the database in the catalog in which the table resides.
- tableName
-
- Type: string
The name of the table.
- type
-
- Type: string
The type of table optimizer.
BatchGetTableOptimizerError
Description
Contains details on one of the errors in the error list returned by the BatchGetTableOptimizer operation.
Members
- catalogId
-
- Type: string
The Catalog ID of the table.
- databaseName
-
- Type: string
The name of the database in the catalog in which the table resides.
- error
-
- Type: ErrorDetail structure
An ErrorDetail object containing code and message details about the error.
- tableName
-
- Type: string
The name of the table.
- type
-
- Type: string
The type of table optimizer.
BatchStopJobRunError
Description
Records an error that occurred when attempting to stop a specified job run.
Members
- ErrorDetail
-
- Type: ErrorDetail structure
Specifies details about the error that was encountered.
- JobName
-
- Type: string
The name of the job definition that is used in the job run in question.
- JobRunId
-
- Type: string
The JobRunId of the job run in question.
BatchStopJobRunSuccessfulSubmission
Description
Records a successful request to stop a specified JobRun.
Members
- JobName
-
- Type: string
The name of the job definition used in the job run that was stopped.
- JobRunId
-
- Type: string
The JobRunId of the job run that was stopped.
BatchTableOptimizer
Description
Contains details for one of the table optimizers returned by the BatchGetTableOptimizer operation.
Members
- catalogId
-
- Type: string
The Catalog ID of the table.
- databaseName
-
- Type: string
The name of the database in the catalog in which the table resides.
- tableName
-
- Type: string
The name of the table.
- tableOptimizer
-
- Type: TableOptimizer structure
A TableOptimizer object that contains details on the configuration and last run of a table optimizer.
BatchUpdatePartitionFailureEntry
Description
Contains information about a batch update partition error.
Members
- ErrorDetail
-
- Type: ErrorDetail structure
The details about the batch update partition error.
- PartitionValueList
-
- Type: Array of strings
A list of values defining the partitions.
BatchUpdatePartitionRequestEntry
Description
A structure that contains the values and structure used to update a partition.
Members
- PartitionInput
-
- Required: Yes
- Type: PartitionInput structure
The structure used to update a partition.
- PartitionValueList
-
- Required: Yes
- Type: Array of strings
A list of values defining the partitions.
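A request entry pairs the values that identify an existing partition with the definition that should replace it. The sketch below shows one way to assemble such entries; buildPartitionUpdateEntries is a hypothetical helper, and the PartitionInput members it uses (Values, Parameters) are assumed from the PartitionInput shape documented elsewhere.

```php
<?php
// Hypothetical helper: builds BatchUpdatePartitionRequestEntry arrays.
function buildPartitionUpdateEntries(array $updates): array
{
    $entries = [];
    foreach ($updates as $update) {
        $entries[] = [
            'PartitionValueList' => $update['current'], // identifies the partition to update
            'PartitionInput'     => $update['input'],   // the new partition definition
        ];
    }
    return $entries;
}

$entries = buildPartitionUpdateEntries([
    [
        'current' => ['2024', '01'],
        // Assumed PartitionInput members; values are placeholders.
        'input'   => ['Values' => ['2024', '01'], 'Parameters' => ['reviewed' => 'true']],
    ],
]);
```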
BinaryColumnStatisticsData
Description
Defines column statistics supported for bit sequence data values.
Members
- AverageLength
-
- Required: Yes
- Type: double
The average bit sequence length in the column.
- MaximumLength
-
- Required: Yes
- Type: long (int|float)
The size of the longest bit sequence in the column.
- NumberOfNulls
-
- Required: Yes
- Type: long (int|float)
The number of null values in the column.
Blueprint
Description
The details of a blueprint.
Members
- BlueprintLocation
-
- Type: string
Specifies the path in Amazon S3 where the blueprint is published.
- BlueprintServiceLocation
-
- Type: string
Specifies a path in Amazon S3 where the blueprint is copied when you call CreateBlueprint/UpdateBlueprint to register the blueprint in Glue.
- CreatedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time the blueprint was registered.
- Description
-
- Type: string
The description of the blueprint.
- ErrorMessage
-
- Type: string
An error message.
- LastActiveDefinition
-
- Type: LastActiveDefinition structure
When there are multiple versions of a blueprint and the latest version has some errors, this attribute indicates the last successful blueprint definition that is available with the service.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time the blueprint was last modified.
- Name
-
- Type: string
The name of the blueprint.
- ParameterSpec
-
- Type: string
A JSON string that indicates the list of parameter specifications for the blueprint.
- Status
-
- Type: string
The status of the blueprint registration.
- Creating: The blueprint registration is in progress.
- Active: The blueprint has been successfully registered.
- Updating: An update to the blueprint registration is in progress.
- Failed: The blueprint registration failed.
BlueprintDetails
Description
The details of a blueprint.
Members
- BlueprintName
-
- Type: string
The name of the blueprint.
- RunId
-
- Type: string
The run ID for this blueprint.
BlueprintRun
Description
The details of a blueprint run.
Members
- BlueprintName
-
- Type: string
The name of the blueprint.
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the blueprint run completed.
- ErrorMessage
-
- Type: string
Indicates any errors that are seen while running the blueprint.
- Parameters
-
- Type: string
The blueprint parameters as a string. You will have to provide a value for each key that is required from the parameter spec that is defined in the Blueprint$ParameterSpec.
- RoleArn
-
- Type: string
The role ARN. This role will be assumed by the Glue service and will be used to create the workflow and other entities of a workflow.
- RollbackErrorMessage
-
- Type: string
If there are any errors while creating the entities of a workflow, we try to roll back the created entities until that point and delete them. This attribute indicates the errors seen while trying to delete the entities that are created.
- RunId
-
- Type: string
The run ID for this blueprint run.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the blueprint run started.
- State
-
- Type: string
The state of the blueprint run. Possible values are:
- Running: The blueprint run is in progress.
- Succeeded: The blueprint run completed successfully.
- Failed: The blueprint run failed and rollback is complete.
- Rolling Back: The blueprint run failed and rollback is in progress.
- WorkflowName
-
- Type: string
The name of a workflow that is created as a result of a successful blueprint run. If a blueprint run has an error, there will not be a workflow created.
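When polling a blueprint run, it helps to separate the states that can still change from those that are final. The sketch below treats Succeeded and Failed as final and everything else as still in progress. The normalization step is an assumption, added because the exact spelling of the State strings returned by the service is not shown here; isBlueprintRunFinished is a hypothetical helper.

```php
<?php
// Hypothetical helper: reports whether a BlueprintRun State is final.
function isBlueprintRunFinished(string $state): bool
{
    // Normalize so that "Rolling Back" and "ROLLING_BACK" compare equal (assumed spellings).
    $normalized = strtoupper(str_replace(' ', '_', trim($state)));

    // Running and Rolling Back can still change; Succeeded and Failed cannot.
    return in_array($normalized, ['SUCCEEDED', 'FAILED'], true);
}

$finished = isBlueprintRunFinished('Rolling Back');
```

A caller would typically keep polling while this returns false, and read WorkflowName only after a run ends as Succeeded.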
BooleanColumnStatisticsData
Description
Defines column statistics supported for Boolean data columns.
Members
- NumberOfFalses
-
- Required: Yes
- Type: long (int|float)
The number of false values in the column.
- NumberOfNulls
-
- Required: Yes
- Type: long (int|float)
The number of null values in the column.
- NumberOfTrues
-
- Required: Yes
- Type: long (int|float)
The number of true values in the column.
CatalogDeltaSource
Description
Specifies a Delta Lake data source that is registered in the Glue Data Catalog.
Members
- AdditionalDeltaOptions
-
- Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings
Specifies additional connection options.
- Database
-
- Required: Yes
- Type: string
The name of the database to read from.
- Name
-
- Required: Yes
- Type: string
The name of the Delta Lake data source.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the Delta Lake source.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to read from.
CatalogEntry
Description
Specifies a table definition in the Glue Data Catalog.
Members
- DatabaseName
-
- Required: Yes
- Type: string
The database in which the table metadata resides.
- TableName
-
- Required: Yes
- Type: string
The name of the table in question.
CatalogHudiSource
Description
Specifies a Hudi data source that is registered in the Glue Data Catalog.
Members
- AdditionalHudiOptions
-
- Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings
Specifies additional connection options.
- Database
-
- Required: Yes
- Type: string
The name of the database to read from.
- Name
-
- Required: Yes
- Type: string
The name of the Hudi data source.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the Hudi source.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to read from.
CatalogImportStatus
Description
A structure containing migration status information.
Members
- ImportCompleted
-
- Type: boolean
True if the migration has completed, or False otherwise.
- ImportTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time that the migration was started.
- ImportedBy
-
- Type: string
The name of the person who initiated the migration.
CatalogKafkaSource
Description
Specifies an Apache Kafka data store in the Data Catalog.
Members
- DataPreviewOptions
-
- Type: StreamingDataPreviewOptions structure
Specifies options related to data preview for viewing a sample of your data.
- Database
-
- Required: Yes
- Type: string
The name of the database to read from.
- DetectSchema
-
- Type: boolean
Whether to automatically determine the schema from the incoming data.
- Name
-
- Required: Yes
- Type: string
The name of the data store.
- StreamingOptions
-
- Type: KafkaStreamingSourceOptions structure
Specifies the streaming options.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to read from.
- WindowSize
-
- Type: int
The amount of time to spend processing each micro batch.
CatalogKinesisSource
Description
Specifies a Kinesis data source in the Glue Data Catalog.
Members
- DataPreviewOptions
-
- Type: StreamingDataPreviewOptions structure
Additional options for data preview.
- Database
-
- Required: Yes
- Type: string
The name of the database to read from.
- DetectSchema
-
- Type: boolean
Whether to automatically determine the schema from the incoming data.
- Name
-
- Required: Yes
- Type: string
The name of the data source.
- StreamingOptions
-
- Type: KinesisStreamingSourceOptions structure
Additional options for the Kinesis streaming data source.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to read from.
- WindowSize
-
- Type: int
The amount of time to spend processing each micro batch.
CatalogSchemaChangePolicy
Description
A policy that specifies update behavior for the crawler.
Members
- EnableUpdateCatalog
-
- Type: boolean
Whether to use the specified update behavior when the crawler finds a changed schema.
- UpdateBehavior
-
- Type: string
The update behavior when the crawler finds a changed schema.
CatalogSource
Description
Specifies a data store in the Glue Data Catalog.
Members
- Database
-
- Required: Yes
- Type: string
The name of the database to read from.
- Name
-
- Required: Yes
- Type: string
The name of the data store.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to read from.
CatalogTarget
Description
Specifies a Glue Data Catalog target.
Members
- ConnectionName
-
- Type: string
The name of the connection for an Amazon S3-backed Data Catalog table to be a target of the crawl when using a Catalog connection type paired with a NETWORK Connection type.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database to be synchronized.
- DlqEventQueueArn
-
- Type: string
A valid Amazon dead-letter SQS ARN. For example, arn:aws:sqs:region:account:deadLetterQueue.
- EventQueueArn
-
- Type: string
A valid Amazon SQS ARN. For example, arn:aws:sqs:region:account:sqs.
- Tables
-
- Required: Yes
- Type: Array of strings
A list of the tables to be synchronized.
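A short sketch of a CatalogTarget array, together with a check for the two members marked as required above. The database and table names are placeholders, and missingCatalogTargetMembers is a hypothetical helper rather than an SDK function.

```php
<?php
// Hypothetical helper: returns the required CatalogTarget members that are missing or empty.
function missingCatalogTargetMembers(array $target): array
{
    $required = ['DatabaseName', 'Tables'];
    return array_values(array_filter($required, fn($key) => empty($target[$key])));
}

// Illustrative target; names are placeholders.
$catalogTarget = [
    'DatabaseName' => 'sales_db',
    'Tables'       => ['orders', 'customers'],
];

$missing = missingCatalogTargetMembers($catalogTarget);
```

Checking locally like this only catches omissions early; the service still performs its own validation and reports problems through InvalidInputException.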
Classifier
Description
Classifiers are triggered during a crawl task. A classifier checks whether a given file is in a format it can handle. If it is, the classifier creates a schema in the form of a StructType object that matches that data format.
You can use the standard classifiers that Glue provides, or you can write your own classifiers to best categorize your data sources and specify the appropriate schemas to use for them. A classifier can be a grok classifier, an XML classifier, a JSON classifier, or a custom CSV classifier, as specified in one of the fields in the Classifier object.
Members
- CsvClassifier
-
- Type: CsvClassifier structure
A classifier for comma-separated values (CSV).
- GrokClassifier
-
- Type: GrokClassifier structure
A classifier that uses grok.
- JsonClassifier
-
- Type: JsonClassifier structure
A classifier for JSON content.
- XMLClassifier
-
- Type: XMLClassifier structure
A classifier for XML content.
CloudWatchEncryption
Description
Specifies how Amazon CloudWatch data should be encrypted.
Members
- CloudWatchEncryptionMode
-
- Type: string
The encryption mode to use for CloudWatch data.
- KmsKeyArn
-
- Type: string
The Amazon Resource Name (ARN) of the KMS key to be used to encrypt the data.
CodeGenConfigurationNode
Description
CodeGenConfigurationNode enumerates all valid Node types. One and only one of its member variables can be populated.
Members
- Aggregate
-
- Type: Aggregate structure
Specifies a transform that groups rows by chosen fields and computes the aggregated value by specified function.
- AmazonRedshiftSource
-
- Type: AmazonRedshiftSource structure
Specifies a source that reads from a data source in Amazon Redshift.
- AmazonRedshiftTarget
-
- Type: AmazonRedshiftTarget structure
Specifies a target that writes to a data target in Amazon Redshift.
- ApplyMapping
-
- Type: ApplyMapping structure
Specifies a transform that maps data property keys in the data source to data property keys in the data target. You can rename keys, modify the data types for keys, and choose which keys to drop from the dataset.
- AthenaConnectorSource
-
- Type: AthenaConnectorSource structure
Specifies a connector to an Amazon Athena data source.
- CatalogDeltaSource
-
- Type: CatalogDeltaSource structure
Specifies a Delta Lake data source that is registered in the Glue Data Catalog.
- CatalogHudiSource
-
- Type: CatalogHudiSource structure
Specifies a Hudi data source that is registered in the Glue Data Catalog.
- CatalogKafkaSource
-
- Type: CatalogKafkaSource structure
Specifies an Apache Kafka data store in the Data Catalog.
- CatalogKinesisSource
-
- Type: CatalogKinesisSource structure
Specifies a Kinesis data source in the Glue Data Catalog.
- CatalogSource
-
- Type: CatalogSource structure
Specifies a data store in the Glue Data Catalog.
- CatalogTarget
-
- Type: BasicCatalogTarget structure
Specifies a target that uses a Glue Data Catalog table.
- ConnectorDataSource
-
- Type: ConnectorDataSource structure
Specifies a source generated with standard connection options.
- ConnectorDataTarget
-
- Type: ConnectorDataTarget structure
Specifies a target generated with standard connection options.
- CustomCode
-
- Type: CustomCode structure
Specifies a transform that uses custom code you provide to perform the data transformation. The output is a collection of DynamicFrames.
- DirectJDBCSource
-
- Type: DirectJDBCSource structure
Specifies the direct JDBC source connection.
- DirectKafkaSource
-
- Type: DirectKafkaSource structure
Specifies an Apache Kafka data store.
- DirectKinesisSource
-
- Type: DirectKinesisSource structure
Specifies a direct Amazon Kinesis data source.
- DropDuplicates
-
- Type: DropDuplicates structure
Specifies a transform that removes rows of repeating data from a data set.
- DropFields
-
- Type: DropFields structure
Specifies a transform that chooses the data property keys that you want to drop.
- DropNullFields
-
- Type: DropNullFields structure
Specifies a transform that removes columns from the dataset if all values in the column are 'null'. By default, Glue Studio will recognize null objects, but some values such as empty strings, strings that are "null", -1 integers or other placeholders such as zeros, are not automatically recognized as nulls.
- DynamicTransform
-
- Type: DynamicTransform structure
Specifies a custom visual transform created by a user.
- DynamoDBCatalogSource
-
- Type: DynamoDBCatalogSource structure
Specifies a DynamoDB Catalog data store in the Glue Data Catalog.
- EvaluateDataQuality
-
- Type: EvaluateDataQuality structure
Specifies your data quality evaluation criteria.
- EvaluateDataQualityMultiFrame
-
- Type: EvaluateDataQualityMultiFrame structure
Specifies your data quality evaluation criteria. Allows multiple input data and returns a collection of Dynamic Frames.
- FillMissingValues
-
- Type: FillMissingValues structure
Specifies a transform that locates records in the dataset that have missing values and adds a new field with a value determined by imputation. The input data set is used to train the machine learning model that determines what the missing value should be.
- Filter
-
- Type: Filter structure
Specifies a transform that splits a dataset into two, based on a filter condition.
- GovernedCatalogSource
-
- Type: GovernedCatalogSource structure
Specifies a data source in a governed Data Catalog.
- GovernedCatalogTarget
-
- Type: GovernedCatalogTarget structure
Specifies a data target that writes to a governed catalog.
- JDBCConnectorSource
-
- Type: JDBCConnectorSource structure
Specifies a connector to a JDBC data source.
- JDBCConnectorTarget
-
- Type: JDBCConnectorTarget structure
Specifies a connector to a JDBC data target.
- Join
-
- Type: Join structure
Specifies a transform that joins two datasets into one dataset using a comparison phrase on the specified data property keys. You can use inner, outer, left, right, left semi, and left anti joins.
- Merge
-
- Type: Merge structure
Specifies a transform that merges a DynamicFrame with a staging DynamicFrame based on the specified primary keys to identify records. Duplicate records (records with the same primary keys) are not de-duplicated.
- MicrosoftSQLServerCatalogSource
-
- Type: MicrosoftSQLServerCatalogSource structure
Specifies a Microsoft SQL Server data source in the Glue Data Catalog.
- MicrosoftSQLServerCatalogTarget
-
- Type: MicrosoftSQLServerCatalogTarget structure
Specifies a target that uses Microsoft SQL.
- MySQLCatalogSource
-
- Type: MySQLCatalogSource structure
Specifies a MySQL data source in the Glue Data Catalog.
- MySQLCatalogTarget
-
- Type: MySQLCatalogTarget structure
Specifies a target that uses MySQL.
- OracleSQLCatalogSource
-
- Type: OracleSQLCatalogSource structure
Specifies an Oracle data source in the Glue Data Catalog.
- OracleSQLCatalogTarget
-
- Type: OracleSQLCatalogTarget structure
Specifies a target that uses Oracle SQL.
- PIIDetection
-
- Type: PIIDetection structure
Specifies a transform that identifies, removes or masks PII data.
- PostgreSQLCatalogSource
-
- Type: PostgreSQLCatalogSource structure
Specifies a PostgreSQL data source in the Glue Data Catalog.
- PostgreSQLCatalogTarget
-
- Type: PostgreSQLCatalogTarget structure
Specifies a target that uses PostgreSQL.
- Recipe
-
- Type: Recipe structure
Specifies a Glue DataBrew recipe node.
- RedshiftSource
-
- Type: RedshiftSource structure
Specifies an Amazon Redshift data store.
- RedshiftTarget
-
- Type: RedshiftTarget structure
Specifies a target that uses Amazon Redshift.
- RelationalCatalogSource
-
- Type: RelationalCatalogSource structure
Specifies a relational catalog data store in the Glue Data Catalog.
- RenameField
-
- Type: RenameField structure
Specifies a transform that renames a single data property key.
- S3CatalogDeltaSource
-
- Type: S3CatalogDeltaSource structure
Specifies a Delta Lake data source that is registered in the Glue Data Catalog. The data source must be stored in Amazon S3.
- S3CatalogHudiSource
-
- Type: S3CatalogHudiSource structure
Specifies a Hudi data source that is registered in the Glue Data Catalog. The data source must be stored in Amazon S3.
- S3CatalogSource
-
- Type: S3CatalogSource structure
Specifies an Amazon S3 data store in the Glue Data Catalog.
- S3CatalogTarget
-
- Type: S3CatalogTarget structure
Specifies a data target that writes to Amazon S3 using the Glue Data Catalog.
- S3CsvSource
-
- Type: S3CsvSource structure
Specifies a comma-separated value (CSV) data store stored in Amazon S3.
- S3DeltaCatalogTarget
-
- Type: S3DeltaCatalogTarget structure
Specifies a target that writes to a Delta Lake data source in the Glue Data Catalog.
- S3DeltaDirectTarget
-
- Type: S3DeltaDirectTarget structure
Specifies a target that writes to a Delta Lake data source in Amazon S3.
- S3DeltaSource
-
- Type: S3DeltaSource structure
Specifies a Delta Lake data source stored in Amazon S3.
- S3DirectTarget
-
- Type: S3DirectTarget structure
Specifies a data target that writes to Amazon S3.
- S3GlueParquetTarget
-
- Type: S3GlueParquetTarget structure
Specifies a data target that writes to Amazon S3 in Apache Parquet columnar storage.
- S3HudiCatalogTarget
-
- Type: S3HudiCatalogTarget structure
Specifies a target that writes to a Hudi data source in the Glue Data Catalog.
- S3HudiDirectTarget
-
- Type: S3HudiDirectTarget structure
Specifies a target that writes to a Hudi data source in Amazon S3.
- S3HudiSource
-
- Type: S3HudiSource structure
Specifies a Hudi data source stored in Amazon S3.
- S3JsonSource
-
- Type: S3JsonSource structure
Specifies a JSON data store stored in Amazon S3.
- S3ParquetSource
-
- Type: S3ParquetSource structure
Specifies an Apache Parquet data store stored in Amazon S3.
- SelectFields
-
- Type: SelectFields structure
Specifies a transform that chooses the data property keys that you want to keep.
- SelectFromCollection
-
- Type: SelectFromCollection structure
Specifies a transform that chooses one DynamicFrame from a collection of DynamicFrames. The output is the selected DynamicFrame.
- SnowflakeSource
-
- Type: SnowflakeSource structure
Specifies a Snowflake data source.
- SnowflakeTarget
-
- Type: SnowflakeTarget structure
Specifies a target that writes to a Snowflake data source.
- SparkConnectorSource
-
- Type: SparkConnectorSource structure
Specifies a connector to an Apache Spark data source.
- SparkConnectorTarget
-
- Type: SparkConnectorTarget structure
Specifies a target that uses an Apache Spark connector.
- SparkSQL
-
- Type: SparkSQL structure
Specifies a transform where you enter a SQL query using Spark SQL syntax to transform the data. The output is a single DynamicFrame.
- Spigot
-
- Type: Spigot structure
Specifies a transform that writes samples of the data to an Amazon S3 bucket.
- SplitFields
-
- Type: SplitFields structure
Specifies a transform that splits data property keys into two DynamicFrames. The output is a collection of DynamicFrames: one with selected data property keys, and one with the remaining data property keys.
- Union
-
- Type: Union structure
Specifies a transform that combines the rows from two or more datasets into a single result.
CodeGenEdge
Description
Represents a directional edge in a directed acyclic graph (DAG).
Members
- Source
-
- Required: Yes
- Type: string
The ID of the node at which the edge starts.
- Target
-
- Required: Yes
- Type: string
The ID of the node at which the edge ends.
- TargetParameter
-
- Type: string
The target of the edge.
CodeGenNode
Description
Represents a node in a directed acyclic graph (DAG).
Members
- Args
-
- Required: Yes
- Type: Array of CodeGenNodeArg structures
Properties of the node, in the form of name-value pairs.
- Id
-
- Required: Yes
- Type: string
A node identifier that is unique within the node's graph.
- LineNumber
-
- Type: int
The line number of the node.
- NodeType
-
- Required: Yes
- Type: string
The type of node that this is.
CodeGenNodeArg
Description
An argument or property of a node.
Members
- Name
-
- Required: Yes
- Type: string
The name of the argument or property.
- Param
-
- Type: boolean
True if the value is used as a parameter.
- Value
-
- Required: Yes
- Type: string
The value of the argument or property.
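The three structures above fit together as a graph description: nodes carry an Id, a NodeType, and a list of Args, and each edge names the node where it starts and the node where it ends. The following is a minimal sketch of that shape as plain PHP arrays, with a helper that reports edges pointing at unknown node IDs. The node IDs, node types, and argument values are hypothetical, and findDanglingEdges is an illustrative helper, not part of the SDK.

```php
<?php
// Hypothetical two-node graph; IDs, node types, and argument values are illustrative only.
$nodes = [
    [
        'Id' => 'source_1',
        'NodeType' => 'DataSource',
        'Args' => [
            ['Name' => 'database', 'Value' => '"sales"'],
            ['Name' => 'table_name', 'Value' => '"orders"'],
        ],
    ],
    [
        'Id' => 'sink_1',
        'NodeType' => 'DataSink',
        'Args' => [
            ['Name' => 'path', 'Value' => '"s3://example-bucket/out/"', 'Param' => false],
        ],
    ],
];

// Each edge runs from the node named in Source to the node named in Target.
$edges = [
    ['Source' => 'source_1', 'Target' => 'sink_1'],
];

// Illustrative helper: returns every edge whose Source or Target is not a known node Id.
function findDanglingEdges(array $nodes, array $edges): array
{
    $ids = array_column($nodes, 'Id');
    $dangling = [];
    foreach ($edges as $edge) {
        if (!in_array($edge['Source'], $ids, true) || !in_array($edge['Target'], $ids, true)) {
            $dangling[] = $edge;
        }
    }
    return $dangling;
}

$dangling = findDanglingEdges($nodes, $edges);
```

An empty result means every edge connects two declared nodes. The check does not test for cycles.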
Column
Description
A column in a Table.
Members
- Comment
-
- Type: string
A free-form text comment.
- Name
-
- Required: Yes
- Type: string
The name of the Column.
- Parameters
-
- Type: Associative array of custom strings keys (KeyString) to strings
These key-value pairs define properties associated with the column.
- Type
-
- Type: string
The data type of the Column.
ColumnError
Description
Encapsulates a column name that failed and the reason for failure.
Members
- ColumnName
-
- Type: string
The name of the column that failed.
- Error
-
- Type: ErrorDetail structure
An error message with the reason for the failure of an operation.
ColumnImportance
Description
A structure containing the column name and column importance score for a column.
Column importance helps you understand how columns contribute to your model, by identifying which columns in your records are more important than others.
Members
- ColumnName
-
- Type: string
The name of a column.
- Importance
-
- Type: double
The column importance score for the column, as a decimal.
ColumnRowFilter
Description
A filter that uses both column-level and row-level filtering.
Members
- ColumnName
-
- Type: string
A string containing the name of the column.
- RowFilterExpression
-
- Type: string
A string containing the row-level filter expression.
ColumnStatistics
Description
Represents the generated column-level statistics for a table or partition.
Members
- AnalyzedTime
-
- Required: Yes
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp of when column statistics were generated.
- ColumnName
-
- Required: Yes
- Type: string
The name of the column to which the statistics belong.
- ColumnType
-
- Required: Yes
- Type: string
The data type of the column.
- StatisticsData
-
- Required: Yes
- Type: ColumnStatisticsData structure
A ColumnStatisticsData object that contains the statistics data values.
ColumnStatisticsData
Description
Contains the individual types of column statistics data. Only one data object should be set, and it is indicated by the Type attribute.
Members
- BinaryColumnStatisticsData
-
- Type: BinaryColumnStatisticsData structure
Binary column statistics data.
- BooleanColumnStatisticsData
-
- Type: BooleanColumnStatisticsData structure
Boolean column statistics data.
- DateColumnStatisticsData
-
- Type: DateColumnStatisticsData structure
Date column statistics data.
- DecimalColumnStatisticsData
-
- Type: DecimalColumnStatisticsData structure
Decimal column statistics data. UnscaledValues within are Base64-encoded binary objects storing big-endian, two's complement representations of the decimal's unscaled value.
- DoubleColumnStatisticsData
-
- Type: DoubleColumnStatisticsData structure
Double column statistics data.
- LongColumnStatisticsData
-
- Type: LongColumnStatisticsData structure
Long column statistics data.
- StringColumnStatisticsData
-
- Type: StringColumnStatisticsData structure
String column statistics data.
- Type
-
- Required: Yes
- Type: string
The type of column statistics data.
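The DecimalColumnStatisticsData description above says that unscaled values are Base64-encoded, big-endian, two's complement integers. The sketch below shows one way to turn such a value into a PHP number. It assumes the value fits in a native PHP integer and that a scale of 2 applies; both are assumptions for illustration, and unscaledBytesToInt is a hypothetical helper. Depending on how the response reaches your code, the value may already be raw bytes, in which case the base64_decode step is not needed.

```php
<?php
// Interpret raw bytes as a big-endian, two's complement integer.
// Handles at most PHP_INT_SIZE bytes; wider values would need GMP or BCMath.
function unscaledBytesToInt(string $bytes): int
{
    $length = strlen($bytes);
    if ($length === 0 || $length > PHP_INT_SIZE) {
        throw new InvalidArgumentException('Expected between 1 and ' . PHP_INT_SIZE . ' bytes.');
    }
    // Sign-extend: start from -1 (all bits set) when the top bit of the first byte is set.
    $value = (ord($bytes[0]) & 0x80) ? -1 : 0;
    for ($i = 0; $i < $length; $i++) {
        $value = ($value << 8) | ord($bytes[$i]);
    }
    return $value;
}

$encoded = 'MDk=';                                              // Base64 for the two bytes 0x30 0x39
$unscaled = unscaledBytesToInt(base64_decode($encoded, true));  // 0x3039 = 12345
$scale = 2;                                                     // hypothetical scale
$decimal = $unscaled / (10 ** $scale);                          // 12345 / 100
```

Dividing by a power of ten yields a float, which is fine for display but can lose precision; keep the unscaled integer and the scale separately when exact decimal arithmetic matters.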
ColumnStatisticsError
Description
Encapsulates a ColumnStatistics object that failed and the reason for failure.
Members
- ColumnStatistics
-
- Type: ColumnStatistics structure
The ColumnStatistics of the column.
- Error
-
- Type: ErrorDetail structure
An error message with the reason for the failure of an operation.
ColumnStatisticsTaskNotRunningException
Description
An exception thrown when you try to stop a task run when there is no task running.
Members
- Message
-
- Type: string
A message describing the problem.
ColumnStatisticsTaskRun
Description
The object that shows the details of the column stats run.
Members
- CatalogID
-
- Type: string
The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.
- ColumnNameList
-
- Type: Array of strings
A list of the column names. If none is supplied, all column names for the table will be used by default.
- ColumnStatisticsTaskRunId
-
- Type: string
The identifier for the particular column statistics task run.
- ComputationType
-
- Type: string
The type of column statistics computation.
- CreationTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time that this task was created.
- CustomerId
-
- Type: string
The Amazon Web Services account ID.
- DPUSeconds
-
- Type: double
The calculated DPU usage in seconds for all autoscaled workers.
- DatabaseName
-
- Type: string
The database where the table resides.
- EndTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The end time of the task.
- ErrorMessage
-
- Type: string
The error message for the job.
- LastUpdated
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last point in time when this task was modified.
- NumberOfWorkers
-
- Type: int
The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.
- Role
-
- Type: string
The IAM role that the service assumes to generate statistics.
- SampleSize
-
- Type: double
The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.
- SecurityConfiguration
-
- Type: string
Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.
- StartTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The start time of the task.
- Status
-
- Type: string
The status of the task run.
- TableName
-
- Type: string
The name of the table for which column statistics are generated.
- WorkerType
-
- Type: string
The type of workers being used for generating stats. The default is g.1x.
ColumnStatisticsTaskRunningException
Description
An exception thrown when you try to start another job while running a column stats generation job.
Members
- Message
-
- Type: string
A message describing the problem.
ColumnStatisticsTaskSettings
Description
The settings for a column statistics task.
Members
- CatalogID
-
- Type: string
The ID of the Data Catalog in which the database resides.
- ColumnNameList
-
- Type: Array of strings
A list of column names for which to run statistics.
- DatabaseName
-
- Type: string
The name of the database where the table resides.
- Role
-
- Type: string
The role used for running the column statistics.
- SampleSize
-
- Type: double
The percentage of data to sample.
- Schedule
-
- Type: Schedule structure
A schedule for running the column statistics, specified in CRON syntax.
- SecurityConfiguration
-
- Type: string
Name of the security configuration that is used to encrypt CloudWatch logs.
- TableName
-
- Type: string
The name of the table for which to generate column statistics.
ColumnStatisticsTaskStoppingException
Description
An exception thrown when you try to stop a task run.
Members
- Message
-
- Type: string
A message describing the problem.
CompactionMetrics
Description
A structure that contains compaction metrics for the optimizer run.
Members
- IcebergMetrics
-
- Type: IcebergCompactionMetrics structure
A structure containing the Iceberg compaction metrics for the optimizer run.
ConcurrentModificationException
Description
Two processes are trying to modify a resource simultaneously.
Members
- Message
-
- Type: string
A message describing the problem.
ConcurrentRunsExceededException
Description
Too many jobs are being run concurrently.
Members
- Message
-
- Type: string
A message describing the problem.
Condition
Description
Defines a condition under which a trigger fires.
Members
- CrawlState
-
- Type: string
The state of the crawler to which this condition applies.
- CrawlerName
-
- Type: string
The name of the crawler to which this condition applies.
- JobName
-
- Type: string
The name of the job whose JobRuns this condition applies to, and on which this trigger waits.
- LogicalOperator
-
- Type: string
A logical operator.
- State
-
- Type: string
The condition state. Currently, the only job states that a trigger can listen for are SUCCEEDED, STOPPED, FAILED, and TIMEOUT. The only crawler states that a trigger can listen for are SUCCEEDED, FAILED, and CANCELLED.
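The state restrictions above can be checked before a trigger definition is sent. The following is a minimal sketch, assuming that a job condition carries JobName with State and that a crawler condition carries CrawlerName with CrawlState. The helper name conditionProblems, the job and crawler names, and the EQUALS operator value are illustrative assumptions.

```php
<?php
// State lists copied from the State member description.
const JOB_STATES = ['SUCCEEDED', 'STOPPED', 'FAILED', 'TIMEOUT'];
const CRAWLER_STATES = ['SUCCEEDED', 'FAILED', 'CANCELLED'];

// Illustrative helper: returns a list of human-readable problems, empty when none are found.
function conditionProblems(array $condition): array
{
    $problems = [];
    if (isset($condition['JobName'])) {
        $state = $condition['State'] ?? null;
        if (!in_array($state, JOB_STATES, true)) {
            $problems[] = 'Job conditions can only listen for: ' . implode(', ', JOB_STATES);
        }
    }
    if (isset($condition['CrawlerName'])) {
        $state = $condition['CrawlState'] ?? null;
        if (!in_array($state, CRAWLER_STATES, true)) {
            $problems[] = 'Crawler conditions can only listen for: ' . implode(', ', CRAWLER_STATES);
        }
    }
    return $problems;
}

// Hypothetical conditions; the names and the operator value are assumptions.
$jobCondition = ['LogicalOperator' => 'EQUALS', 'JobName' => 'nightly-etl', 'State' => 'SUCCEEDED'];
$crawlerCondition = ['LogicalOperator' => 'EQUALS', 'CrawlerName' => 'orders-crawler', 'CrawlState' => 'RUNNING'];

$jobProblems = conditionProblems($jobCondition);
$crawlerProblems = conditionProblems($crawlerCondition);
```

Here the job condition passes, while the crawler condition is reported because RUNNING is not among the crawler states a trigger can listen for.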
ConditionCheckFailureException
Description
A specified condition was not satisfied.
Members
- Message
-
- Type: string
A message describing the problem.
ConditionExpression
Description
Condition expression defined in the Glue Studio data preparation recipe node.
Members
- Condition
-
- Required: Yes
- Type: string
The condition of the condition expression.
- TargetColumn
-
- Required: Yes
- Type: string
The target column of the condition expressions.
- Value
-
- Type: string
The value of the condition expression.
ConfigurationObject
Description
Specifies the values that an admin sets for each job or session parameter configured in a Glue usage profile.
Members
- AllowedValues
-
- Type: Array of strings
A list of allowed values for the parameter.
- DefaultValue
-
- Type: string
A default value for the parameter.
- MaxValue
-
- Type: string
A maximum allowed value for the parameter.
- MinValue
-
- Type: string
A minimum allowed value for the parameter.
ConflictException
Description
The CreatePartitions API was called on a table that has indexes enabled.
Members
- Message
-
- Type: string
A message describing the problem.
ConfusionMatrix
Description
The confusion matrix shows you what your transform is predicting accurately and what types of errors it is making.
For more information, see Confusion matrix in Wikipedia.
Members
- NumFalseNegatives
-
- Type: long (int|float)
The number of matches in the data that the transform didn't find, in the confusion matrix for your transform.
- NumFalsePositives
-
- Type: long (int|float)
The number of nonmatches in the data that the transform incorrectly classified as a match, in the confusion matrix for your transform.
- NumTrueNegatives
-
- Type: long (int|float)
The number of nonmatches in the data that the transform correctly rejected, in the confusion matrix for your transform.
- NumTruePositives
-
- Type: long (int|float)
The number of matches in the data that the transform correctly found, in the confusion matrix for your transform.
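The four counts above are usually summarized as precision, recall, and F1. The sketch below applies the standard definitions to a ConfusionMatrix-shaped array; the counts are made up for illustration, and matchQuality is a hypothetical helper, not an SDK function.

```php
<?php
// Standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN),
// F1 = harmonic mean of precision and recall. Zero denominators yield 0.0.
function matchQuality(array $matrix): array
{
    $tp = $matrix['NumTruePositives'];
    $fp = $matrix['NumFalsePositives'];
    $fn = $matrix['NumFalseNegatives'];

    $precision = ($tp + $fp) > 0 ? $tp / ($tp + $fp) : 0.0;
    $recall = ($tp + $fn) > 0 ? $tp / ($tp + $fn) : 0.0;
    $f1 = ($precision + $recall) > 0
        ? 2 * $precision * $recall / ($precision + $recall)
        : 0.0;

    return ['precision' => $precision, 'recall' => $recall, 'f1' => $f1];
}

// Made-up counts for illustration.
$matrix = [
    'NumTruePositives' => 80,
    'NumFalsePositives' => 20,
    'NumTrueNegatives' => 890,
    'NumFalseNegatives' => 10,
];

$quality = matchQuality($matrix);
```

With these counts, 80 of the 100 predicted matches are correct and 80 of the 90 real matches are found. NumTrueNegatives does not enter any of the three measures.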
Connection
Description
Defines a connection to a data source.
Members
- AthenaProperties
-
- Type: Associative array of custom strings keys (PropertyKey) to strings
This field is not currently used.
- AuthenticationConfiguration
-
- Type: AuthenticationConfiguration structure
The authentication properties of the connection.
- ConnectionProperties
-
- Type: Associative array of custom strings keys (ConnectionPropertyKey) to strings
These key-value pairs define parameters for the connection:
  - HOST - The host URI: either the fully qualified domain name (FQDN) or the IPv4 address of the database host.
  - PORT - The port number, between 1024 and 65535, of the port on which the database host is listening for database connections.
  - USER_NAME - The name under which to log in to the database. The value string for USER_NAME is "USERNAME".
  - PASSWORD - A password, if one is used, for the user name.
  - ENCRYPTED_PASSWORD - When you enable connection password protection by setting ConnectionPasswordEncryption in the Data Catalog encryption settings, this field stores the encrypted password.
  - JDBC_DRIVER_JAR_URI - The Amazon Simple Storage Service (Amazon S3) path of the JAR file that contains the JDBC driver to use.
  - JDBC_DRIVER_CLASS_NAME - The class name of the JDBC driver to use.
  - JDBC_ENGINE - The name of the JDBC engine to use.
  - JDBC_ENGINE_VERSION - The version of the JDBC engine to use.
  - CONFIG_FILES - (Reserved for future use.)
  - INSTANCE_ID - The instance ID to use.
  - JDBC_CONNECTION_URL - The URL for connecting to a JDBC data source.
  - JDBC_ENFORCE_SSL - A Boolean string (true, false) specifying whether Secure Sockets Layer (SSL) with hostname matching is enforced for the JDBC connection on the client. The default is false.
  - CUSTOM_JDBC_CERT - An Amazon S3 location specifying the customer's root certificate. Glue uses this root certificate to validate the customer’s certificate when connecting to the customer database. Glue only handles X.509 certificates. The certificate provided must be DER-encoded and supplied in Base64 encoding PEM format.
  - SKIP_CUSTOM_JDBC_CERT_VALIDATION - By default, this is false. Glue validates the Signature algorithm and Subject Public Key Algorithm for the customer certificate. The only permitted algorithms for the Signature algorithm are SHA256withRSA, SHA384withRSA or SHA512withRSA. For the Subject Public Key Algorithm, the key length must be at least 2048. You can set the value of this property to true to skip Glue’s validation of the customer certificate.
  - CUSTOM_JDBC_CERT_STRING - A custom JDBC certificate string which is used for domain match or distinguished name match to prevent a man-in-the-middle attack. In Oracle database, this is used as the SSL_SERVER_CERT_DN; in Microsoft SQL Server, this is used as the hostNameInCertificate.
  - CONNECTION_URL - The URL for connecting to a general (non-JDBC) data source.
  - SECRET_ID - The secret ID used for the secret manager of credentials.
  - CONNECTOR_URL - The connector URL for a MARKETPLACE or CUSTOM connection.
  - CONNECTOR_TYPE - The connector type for a MARKETPLACE or CUSTOM connection.
  - CONNECTOR_CLASS_NAME - The connector class name for a MARKETPLACE or CUSTOM connection.
  - KAFKA_BOOTSTRAP_SERVERS - A comma-separated list of host and port pairs that are the addresses of the Apache Kafka brokers in a Kafka cluster to which a Kafka client will connect and bootstrap itself.
  - KAFKA_SSL_ENABLED - Whether to enable or disable SSL on an Apache Kafka connection. Default value is "true".
  - KAFKA_CUSTOM_CERT - The Amazon S3 URL for the private CA cert file (.pem format). The default is an empty string.
  - KAFKA_SKIP_CUSTOM_CERT_VALIDATION - Whether to skip the validation of the CA cert file or not. Glue validates for three algorithms: SHA256withRSA, SHA384withRSA and SHA512withRSA. Default value is "false".
  - KAFKA_CLIENT_KEYSTORE - The Amazon S3 location of the client keystore file for Kafka client side authentication (Optional).
  - KAFKA_CLIENT_KEYSTORE_PASSWORD - The password to access the provided keystore (Optional).
  - KAFKA_CLIENT_KEY_PASSWORD - A keystore can consist of multiple keys, so this is the password to access the client key to be used with the Kafka server side key (Optional).
  - ENCRYPTED_KAFKA_CLIENT_KEYSTORE_PASSWORD - The encrypted version of the Kafka client keystore password (if the user has the Glue encrypt passwords setting selected).
  - ENCRYPTED_KAFKA_CLIENT_KEY_PASSWORD - The encrypted version of the Kafka client key password (if the user has the Glue encrypt passwords setting selected).
  - KAFKA_SASL_MECHANISM - "SCRAM-SHA-512", "GSSAPI", "AWS_MSK_IAM", or "PLAIN". These are the supported SASL Mechanisms.
  - KAFKA_SASL_PLAIN_USERNAME - A plaintext username used to authenticate with the "PLAIN" mechanism.
  - KAFKA_SASL_PLAIN_PASSWORD - A plaintext password used to authenticate with the "PLAIN" mechanism.
  - ENCRYPTED_KAFKA_SASL_PLAIN_PASSWORD - The encrypted version of the Kafka SASL PLAIN password (if the user has the Glue encrypt passwords setting selected).
  - KAFKA_SASL_SCRAM_USERNAME - A plaintext username used to authenticate with the "SCRAM-SHA-512" mechanism.
  - KAFKA_SASL_SCRAM_PASSWORD - A plaintext password used to authenticate with the "SCRAM-SHA-512" mechanism.
  - ENCRYPTED_KAFKA_SASL_SCRAM_PASSWORD - The encrypted version of the Kafka SASL SCRAM password (if the user has the Glue encrypt passwords setting selected).
  - KAFKA_SASL_SCRAM_SECRETS_ARN - The Amazon Resource Name of a secret in Amazon Web Services Secrets Manager.
  - KAFKA_SASL_GSSAPI_KEYTAB - The S3 location of a Kerberos keytab file. A keytab stores long-term keys for one or more principals. For more information, see MIT Kerberos Documentation: Keytab.
  - KAFKA_SASL_GSSAPI_KRB5_CONF - The S3 location of a Kerberos krb5.conf file. A krb5.conf stores Kerberos configuration information, such as the location of the KDC server. For more information, see MIT Kerberos Documentation: krb5.conf.
  - KAFKA_SASL_GSSAPI_SERVICE - The Kerberos service name, as set with sasl.kerberos.service.name in your Kafka Configuration.
  - KAFKA_SASL_GSSAPI_PRINCIPAL - The name of the Kerberos principal used by Glue. For more information, see Kafka Documentation: Configuring Kafka Brokers.
  - ROLE_ARN - The role to be used for running queries.
  - REGION - The Amazon Web Services Region where queries will be run.
  - WORKGROUP_NAME - The name of an Amazon Redshift serverless workgroup or Amazon Athena workgroup in which queries will run.
  - CLUSTER_IDENTIFIER - The cluster identifier of an Amazon Redshift cluster in which queries will run.
  - DATABASE - The Amazon Redshift database that you are connecting to.
- ConnectionType
-
- Type: string
The type of the connection. Currently, SFTP is not supported.
- CreationTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp of the time that this connection definition was created.
- Description
-
- Type: string
The description of the connection.
- LastConnectionValidationTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
A timestamp of the time this connection was last validated.
- LastUpdatedBy
-
- Type: string
The user, group, or role that last updated this connection definition.
- LastUpdatedTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp of the last time the connection definition was updated.
- MatchCriteria
-
- Type: Array of strings
A list of criteria that can be used in selecting this connection.
- Name
-
- Type: string
The name of the connection definition.
- PhysicalConnectionRequirements
-
- Type: PhysicalConnectionRequirements structure
The physical connection requirements, such as virtual private cloud (VPC) and SecurityGroup, that are needed to make this connection successfully.
- Status
-
- Type: string
The status of the connection. Can be one of: READY, IN_PROGRESS, or FAILED.
- StatusReason
-
- Type: string
The reason for the connection status.
ConnectionInput
Description
A structure that is used to specify a connection to create or update.
Members
- AthenaProperties
-
- Type: Associative array of custom strings keys (PropertyKey) to strings
This field is not currently used.
- AuthenticationConfiguration
-
- Type: AuthenticationConfigurationInput structure
The authentication properties of the connection. Used for a Salesforce connection.
- ConnectionProperties
-
- Required: Yes
- Type: Associative array of custom strings keys (ConnectionPropertyKey) to strings
These key-value pairs define parameters for the connection.
- ConnectionType
-
- Required: Yes
- Type: string
The type of the connection. Currently, these types are supported:
  - JDBC - Designates a connection to a database through Java Database Connectivity (JDBC). JDBC Connections use the following ConnectionParameters.
    - Required: All of (HOST, PORT, JDBC_ENGINE) or JDBC_CONNECTION_URL.
    - Required: All of (USERNAME, PASSWORD) or SECRET_ID.
    - Optional: JDBC_ENFORCE_SSL, CUSTOM_JDBC_CERT, CUSTOM_JDBC_CERT_STRING, SKIP_CUSTOM_JDBC_CERT_VALIDATION. These parameters are used to configure SSL with JDBC.
  - KAFKA - Designates a connection to an Apache Kafka streaming platform. KAFKA Connections use the following ConnectionParameters.
    - Required: KAFKA_BOOTSTRAP_SERVERS.
    - Optional: KAFKA_SSL_ENABLED, KAFKA_CUSTOM_CERT, KAFKA_SKIP_CUSTOM_CERT_VALIDATION. These parameters are used to configure SSL with KAFKA.
    - Optional: KAFKA_CLIENT_KEYSTORE, KAFKA_CLIENT_KEYSTORE_PASSWORD, KAFKA_CLIENT_KEY_PASSWORD, ENCRYPTED_KAFKA_CLIENT_KEYSTORE_PASSWORD, ENCRYPTED_KAFKA_CLIENT_KEY_PASSWORD. These parameters are used to configure TLS client configuration with SSL in KAFKA.
    - Optional: KAFKA_SASL_MECHANISM. Can be specified as SCRAM-SHA-512, GSSAPI, or AWS_MSK_IAM.
    - Optional: KAFKA_SASL_SCRAM_USERNAME, KAFKA_SASL_SCRAM_PASSWORD, ENCRYPTED_KAFKA_SASL_SCRAM_PASSWORD. These parameters are used to configure SASL/SCRAM-SHA-512 authentication with KAFKA.
    - Optional: KAFKA_SASL_GSSAPI_KEYTAB, KAFKA_SASL_GSSAPI_KRB5_CONF, KAFKA_SASL_GSSAPI_SERVICE, KAFKA_SASL_GSSAPI_PRINCIPAL. These parameters are used to configure SASL/GSSAPI authentication with KAFKA.
  - MONGODB - Designates a connection to a MongoDB document database. MONGODB Connections use the following ConnectionParameters.
    - Required: CONNECTION_URL.
    - Required: All of (USERNAME, PASSWORD) or SECRET_ID.
  - SALESFORCE - Designates a connection to Salesforce using OAuth authentication.
    - Requires the AuthenticationConfiguration member to be configured.
  - VIEW_VALIDATION_REDSHIFT - Designates a connection used for view validation by Amazon Redshift.
  - VIEW_VALIDATION_ATHENA - Designates a connection used for view validation by Amazon Athena.
  - NETWORK - Designates a network connection to a data source within an Amazon Virtual Private Cloud environment (Amazon VPC). NETWORK Connections do not require ConnectionParameters. Instead, provide a PhysicalConnectionRequirements.
  - MARKETPLACE - Uses configuration settings contained in a connector purchased from Amazon Web Services Marketplace to read from and write to data stores that are not natively supported by Glue. MARKETPLACE Connections use the following ConnectionParameters.
    - Required: CONNECTOR_TYPE, CONNECTOR_URL, CONNECTOR_CLASS_NAME, CONNECTION_URL.
    - Required for JDBC CONNECTOR_TYPE connections: All of (USERNAME, PASSWORD) or SECRET_ID.
  - CUSTOM - Uses configuration settings contained in a custom connector to read from and write to data stores that are not natively supported by Glue.
SFTP is not supported.
For more information about how optional ConnectionProperties are used to configure features in Glue, consult Glue connection properties.
For more information about how optional ConnectionProperties are used to configure features in Glue Studio, consult Using connectors and connections.
- Description
-
- Type: string
The description of the connection.
- MatchCriteria
-
- Type: Array of strings
A list of criteria that can be used in selecting this connection.
- Name
-
- Required: Yes
- Type: string
The name of the connection.
- PhysicalConnectionRequirements
-
- Type: PhysicalConnectionRequirements structure
The physical connection requirements, such as virtual private cloud (VPC) and SecurityGroup, that are needed to successfully make this connection.
- ValidateCredentials
-
- Type: boolean
A flag to validate the credentials during create connection. Used for a Salesforce connection. Default is true.
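The JDBC requirements listed under ConnectionType can be checked locally before a ConnectionInput is submitted. The following is a minimal sketch of that check, covering only the two "Required" rules for JDBC. The helper names hasAll and jdbcPropertyProblems, the connection name, the URL, and the secret ID are all hypothetical.

```php
<?php
// True when every listed key is present with a non-empty value.
function hasAll(array $properties, array $keys): bool
{
    foreach ($keys as $key) {
        if (!isset($properties[$key]) || $properties[$key] === '') {
            return false;
        }
    }
    return true;
}

// Illustrative helper covering the two "Required" rules listed for JDBC connections.
function jdbcPropertyProblems(array $properties): array
{
    $problems = [];
    if (!hasAll($properties, ['HOST', 'PORT', 'JDBC_ENGINE']) && !hasAll($properties, ['JDBC_CONNECTION_URL'])) {
        $problems[] = 'Set HOST, PORT, and JDBC_ENGINE together, or set JDBC_CONNECTION_URL.';
    }
    if (!hasAll($properties, ['USERNAME', 'PASSWORD']) && !hasAll($properties, ['SECRET_ID'])) {
        $problems[] = 'Set USERNAME and PASSWORD together, or set SECRET_ID.';
    }
    return $problems;
}

// Hypothetical ConnectionInput; the name, URL, and secret ID are placeholders.
$connectionInput = [
    'Name' => 'example-jdbc',
    'ConnectionType' => 'JDBC',
    'ConnectionProperties' => [
        'JDBC_CONNECTION_URL' => 'jdbc:postgresql://db.example.internal:5432/sales',
        'SECRET_ID' => 'example/glue/jdbc',
    ],
];

$problems = jdbcPropertyProblems($connectionInput['ConnectionProperties']);
// With no problems found, the array could be passed as the ConnectionInput
// parameter of a CreateConnection call; that call is not made in this sketch.
```

A local check like this only catches missing keys. The service remains the authority on whether the values themselves are acceptable.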
ConnectionPasswordEncryption
Description
The data structure used by the Data Catalog to encrypt the password as part of CreateConnection or UpdateConnection and store it in the ENCRYPTED_PASSWORD field in the connection properties. You can enable catalog encryption or only password encryption.
When a CreateConnection request arrives containing a password, the Data Catalog first encrypts the password using your KMS key. It then encrypts the whole connection object again if catalog encryption is also enabled.
This encryption requires that you set KMS key permissions to enable or restrict access on the password key according to your security requirements. For example, you might want only administrators to have decrypt permission on the password key.
Members
- AwsKmsKeyId
-
- Type: string
A KMS key that is used to encrypt the connection password.
If connection password protection is enabled, the caller of CreateConnection and UpdateConnection needs at least kms:Encrypt permission on the specified KMS key, to encrypt passwords before storing them in the Data Catalog.
You can set the decrypt permission to enable or restrict access on the password key according to your security requirements.
- ReturnConnectionPasswordEncrypted
-
- Required: Yes
- Type: boolean
When the ReturnConnectionPasswordEncrypted flag is set to "true", passwords remain encrypted in the responses of GetConnection and GetConnections. This encryption takes effect independently from catalog encryption.
ConnectionsList
Description
Specifies the connections used by a job.
Members
- Connections
-
- Type: Array of strings
A list of connections used by the job.
ConnectorDataSource
Description
Specifies a source generated with standard connection options.
Members
- ConnectionType
-
- Required: Yes
- Type: string
The connectionType, as provided to the underlying Glue library. This node type supports the following connection types:
  - opensearch
  - azuresql
  - azurecosmos
  - bigquery
  - saphana
  - teradata
  - vertica
- Data
-
- Required: Yes
- Type: Associative array of custom strings keys (GenericString) to strings
A map specifying connection options for the node. You can find standard connection options for the corresponding connection type in the Connection parameters section of the Glue documentation.
- Name
-
- Required: Yes
- Type: string
The name of this source node.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for this source.
ConnectorDataTarget
Description
Specifies a target generated with standard connection options.
Members
- ConnectionType
-
- Required: Yes
- Type: string
The connectionType, as provided to the underlying Glue library. This node type supports the following connection types:
  - opensearch
  - azuresql
  - azurecosmos
  - bigquery
  - saphana
  - teradata
  - vertica
- Data
-
- Required: Yes
- Type: Associative array of custom strings keys (GenericString) to strings
A map specifying connection options for the node. You can find standard connection options for the corresponding connection type in the Connection parameters section of the Glue documentation.
- Inputs
-
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of this target node.
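As a rough illustration of the ConnectorDataSource and ConnectorDataTarget shapes, the sketch below builds a target node as a PHP array and checks its required members and connection type against the list given above. The option keys inside Data, the node names, and the helper connectorNodeProblems are hypothetical; the actual connection options depend on the connection type.

```php
<?php
// Connection types listed for ConnectorDataSource and ConnectorDataTarget.
const CONNECTOR_NODE_TYPES = [
    'opensearch', 'azuresql', 'azurecosmos', 'bigquery', 'saphana', 'teradata', 'vertica',
];

// Illustrative helper: reports missing required members and unsupported connection types.
function connectorNodeProblems(array $node): array
{
    $problems = [];
    foreach (['Name', 'ConnectionType', 'Data'] as $required) {
        if (!isset($node[$required])) {
            $problems[] = "Missing required member: $required";
        }
    }
    if (isset($node['ConnectionType']) && !in_array($node['ConnectionType'], CONNECTOR_NODE_TYPES, true)) {
        $problems[] = 'Unsupported ConnectionType: ' . $node['ConnectionType'];
    }
    return $problems;
}

// Hypothetical target node; the Data keys and the upstream node identifier are placeholders.
$target = [
    'Name' => 'warehouse_target',
    'ConnectionType' => 'teradata',
    'Data' => ['connectionName' => 'example-teradata', 'dbtable' => 'sales.orders'],
    'Inputs' => ['orders_source'],
];

$targetProblems = connectorNodeProblems($target);
```

The same helper applies to a source node, which has the same required members but carries OutputSchemas instead of Inputs.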
Crawl
Description
The details of a crawl in the workflow.
Members
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time on which the crawl completed.
- ErrorMessage
-
- Type: string
The error message associated with the crawl.
- LogGroup
-
- Type: string
The log group associated with the crawl.
- LogStream
-
- Type: string
The log stream associated with the crawl.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time on which the crawl started.
- State
-
- Type: string
The state of the crawler.
Crawler
Description
Specifies a crawler program that examines a data source and uses classifiers to try to determine its schema. If successful, the crawler records metadata concerning the data source in the Glue Data Catalog.
Members
- Classifiers
-
- Type: Array of strings
A list of UTF-8 strings that specify the custom classifiers that are associated with the crawler.
- Configuration
-
- Type: string
Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see Setting crawler configuration options.
- CrawlElapsedTime
-
- Type: long (int|float)
If the crawler is running, contains the total time elapsed since the last crawl began.
- CrawlerSecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used by this crawler.
- CreationTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time that the crawler was created.
- DatabaseName
-
- Type: string
The name of the database in which the crawler's output is stored.
- Description
-
- Type: string
A description of the crawler.
- LakeFormationConfiguration
-
- Type: LakeFormationConfiguration structure
Specifies whether the crawler should use Lake Formation credentials for the crawler instead of the IAM role credentials.
- LastCrawl
-
- Type: LastCrawlInfo structure
The status of the last crawl, and potentially error information if an error occurred.
- LastUpdated
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time that the crawler was last updated.
- LineageConfiguration
-
- Type: LineageConfiguration structure
A configuration that specifies whether data lineage is enabled for the crawler.
- Name
-
- Type: string
The name of the crawler.
- RecrawlPolicy
-
- Type: RecrawlPolicy structure
A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.
- Role
-
- Type: string
The Amazon Resource Name (ARN) of an IAM role that's used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data.
- Schedule
-
- Type: Schedule structure
For scheduled crawlers, the schedule when the crawler runs.
- SchemaChangePolicy
-
- Type: SchemaChangePolicy structure
The policy that specifies update and delete behaviors for the crawler.
- State
-
- Type: string
Indicates whether the crawler is running, or whether a run is pending.
- TablePrefix
-
- Type: string
The prefix added to the names of tables that are created.
- Targets
-
- Type: CrawlerTargets structure
A collection of targets to crawl.
- Version
-
- Type: long (int|float)
The version of the crawler.
CrawlerHistory
Description
Contains the information for a run of a crawler.
Members
- CrawlId
-
- Type: string
A UUID identifier for each crawl.
- DPUHour
-
- Type: double
The number of data processing units (DPU) used in hours for the crawl.
- EndTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time on which the crawl ended.
- ErrorMessage
-
- Type: string
If an error occurred, the error message associated with the crawl.
- LogGroup
-
- Type: string
The log group associated with the crawl.
- LogStream
-
- Type: string
The log stream associated with the crawl.
- MessagePrefix
-
- Type: string
The prefix for a CloudWatch message about this crawl.
- StartTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time on which the crawl started.
- State
-
- Type: string
The state of the crawl.
- Summary
-
- Type: string
A run summary for the specific crawl in JSON. Contains the catalog tables and partitions that were added, updated, or deleted.
CrawlerMetrics
Description
Metrics for a specified crawler.
Members
- CrawlerName
-
- Type: string
The name of the crawler.
- LastRuntimeSeconds
-
- Type: double
The duration of the crawler's most recent run, in seconds.
- MedianRuntimeSeconds
-
- Type: double
The median duration of this crawler's runs, in seconds.
- StillEstimating
-
- Type: boolean
True if the crawler is still estimating how long it will take to complete this run.
- TablesCreated
-
- Type: int
The number of tables created by this crawler.
- TablesDeleted
-
- Type: int
The number of tables deleted by this crawler.
- TablesUpdated
-
- Type: int
The number of tables updated by this crawler.
- TimeLeftSeconds
-
- Type: double
The estimated time left to complete a running crawl.
CrawlerNodeDetails
Description
The details of a Crawler node present in the workflow.
Members
- Crawls
-
- Type: Array of Crawl structures
A list of crawls represented by the crawl node.
CrawlerNotRunningException
Description
The specified crawler is not running.
Members
- Message
-
- Type: string
A message describing the problem.
CrawlerRunningException
Description
The operation cannot be performed because the crawler is already running.
Members
- Message
-
- Type: string
A message describing the problem.
CrawlerStoppingException
Description
The specified crawler is stopping.
Members
- Message
-
- Type: string
A message describing the problem.
CrawlerTargets
Description
Specifies data stores to crawl.
Members
- CatalogTargets
-
- Type: Array of CatalogTarget structures
Specifies Glue Data Catalog targets.
- DeltaTargets
-
- Type: Array of DeltaTarget structures
Specifies Delta data store targets.
- DynamoDBTargets
-
- Type: Array of DynamoDBTarget structures
Specifies Amazon DynamoDB targets.
- HudiTargets
-
- Type: Array of HudiTarget structures
Specifies Apache Hudi data store targets.
- IcebergTargets
-
- Type: Array of IcebergTarget structures
Specifies Apache Iceberg data store targets.
- JdbcTargets
-
- Type: Array of JdbcTarget structures
Specifies JDBC targets.
- MongoDBTargets
-
- Type: Array of MongoDBTarget structures
Specifies Amazon DocumentDB or MongoDB targets.
- S3Targets
-
- Type: Array of S3Target structures
Specifies Amazon Simple Storage Service (Amazon S3) targets.
CrawlsFilter
Description
A list of fields, comparators, and values that you can use to filter the crawler runs for a specified crawler.
Members
- FieldName
-
- Type: string
A key used to filter the crawler runs for a specified crawler. Valid values for each of the field names are:
  - CRAWL_ID: A string representing the UUID identifier for a crawl.
  - STATE: A string representing the state of the crawl.
  - START_TIME and END_TIME: The epoch timestamp in milliseconds.
  - DPU_HOUR: The number of data processing unit (DPU) hours used for the crawl.
- FieldValue
-
- Type: string
The value provided for comparison on the crawl field.
- FilterOperator
-
- Type: string
A defined comparator that operates on the value. The available operators are:
  - GT: Greater than.
  - GE: Greater than or equal to.
  - LT: Less than.
  - LE: Less than or equal to.
  - EQ: Equal to.
  - NE: Not equal to.
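As a rough sketch of how these three members fit together, the helper below builds one CrawlsFilter entry and rejects any field name or operator that is not listed above. The helper name `buildCrawlsFilter`, the crawler name, and the sample values are hypothetical. Passing the resulting array as a `Filters` parameter of `ListCrawls` is an assumption about usage, not something this section states.

```php
<?php
// Hypothetical helper: builds one CrawlsFilter entry from the members described above.
function buildCrawlsFilter(string $fieldName, string $operator, string $value): array
{
    // Field names and comparators as listed in this section.
    $fields = ['CRAWL_ID', 'STATE', 'START_TIME', 'END_TIME', 'DPU_HOUR'];
    $operators = ['GT', 'GE', 'LT', 'LE', 'EQ', 'NE'];

    if (!in_array($fieldName, $fields, true)) {
        throw new InvalidArgumentException("Unknown FieldName: $fieldName");
    }
    if (!in_array($operator, $operators, true)) {
        throw new InvalidArgumentException("Unknown FilterOperator: $operator");
    }

    // FieldValue is typed as a string, even for timestamps and DPU hours.
    return [
        'FieldName' => $fieldName,
        'FilterOperator' => $operator,
        'FieldValue' => $value,
    ];
}

// START_TIME is compared as an epoch timestamp in milliseconds.
$startMs = (string) (strtotime('2024-01-01T00:00:00Z') * 1000);
$filters = [
    buildCrawlsFilter('START_TIME', 'GE', $startMs),
    buildCrawlsFilter('DPU_HOUR', 'GT', '0.5'),
];

// Assumed usage with a configured client (not executed here):
// $result = $client->listCrawls(['CrawlerName' => 'my-crawler', 'Filters' => $filters]);
```

Validating locally keeps a typo in a field name or operator from turning into a service-side validation error.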
CreateCsvClassifierRequest
Description
Specifies a custom CSV classifier for CreateClassifier to create.
Members
- AllowSingleColumn
-
- Type: boolean
Enables the processing of files that contain only one column.
- ContainsHeader
-
- Type: string
Indicates whether the CSV file contains a header.
- CustomDatatypeConfigured
-
- Type: boolean
Enables the configuration of custom datatypes.
- CustomDatatypes
-
- Type: Array of strings
Creates a list of supported custom datatypes.
- Delimiter
-
- Type: string
A custom symbol to denote what separates each column entry in the row.
- DisableValueTrimming
-
- Type: boolean
Specifies not to trim values before identifying the type of column values. The default value is true.
- Header
-
- Type: Array of strings
A list of strings representing column names.
- Name
-
- Required: Yes
- Type: string
The name of the classifier.
- QuoteSymbol
-
- Type: string
A custom symbol to denote what combines content into a single column value. Must be different from the column delimiter.
- Serde
-
- Type: string
Sets the SerDe for processing CSV in the classifier, which will be applied in the Data Catalog. Valid values are OpenCSVSerDe, LazySimpleSerDe, and None. You can specify the None value when you want the crawler to do the detection.
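The following is a minimal sketch of assembling these members into a request array, enforcing the rule that QuoteSymbol must differ from the column delimiter. The helper name `buildCsvClassifierRequest`, the classifier name, and the column names are hypothetical. Wrapping the array in a `CsvClassifier` key for `CreateClassifier` is an assumption about usage.

```php
<?php
// Hypothetical helper: assembles a CreateCsvClassifierRequest array.
function buildCsvClassifierRequest(string $name, string $delimiter, string $quoteSymbol, array $header = []): array
{
    // QuoteSymbol must be different from the column delimiter.
    if ($quoteSymbol === $delimiter) {
        throw new InvalidArgumentException('QuoteSymbol must differ from Delimiter');
    }

    $request = [
        'Name' => $name,              // the only required member
        'Delimiter' => $delimiter,
        'QuoteSymbol' => $quoteSymbol,
        'Serde' => 'None',            // let the crawler do the detection
    ];

    if ($header !== []) {
        // Header is a plain list of column-name strings.
        $request['Header'] = array_values($header);
    }

    return $request;
}

$csv = buildCsvClassifierRequest('orders-csv', '|', '"', ['order_id', 'amount']);

// Assumed usage with a configured client (not executed here):
// $client->createClassifier(['CsvClassifier' => $csv]);
```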
CreateGrokClassifierRequest
Description
Specifies a grok classifier for CreateClassifier to create.
Members
- Classification
-
- Required: Yes
- Type: string
An identifier of the data format that the classifier matches, such as Twitter, JSON, Omniture logs, Amazon CloudWatch Logs, and so on.
- CustomPatterns
-
- Type: string
Optional custom grok patterns used by this classifier.
- GrokPattern
-
- Required: Yes
- Type: string
The grok pattern used by this classifier.
- Name
-
- Required: Yes
- Type: string
The name of the new classifier.
CreateJsonClassifierRequest
Description
Specifies a JSON classifier for CreateClassifier to create.
Members
- JsonPath
-
- Required: Yes
- Type: string
A JsonPath string defining the JSON data for the classifier to classify. Glue supports a subset of JsonPath, as described in Writing JsonPath Custom Classifiers.
- Name
-
- Required: Yes
- Type: string
The name of the classifier.
CreateXMLClassifierRequest
Description
Specifies an XML classifier for CreateClassifier to create.
Members
- Classification
-
- Required: Yes
- Type: string
An identifier of the data format that the classifier matches.
- Name
-
- Required: Yes
- Type: string
The name of the classifier.
- RowTag
-
- Type: string
The XML tag designating the element that contains each record in an XML document being parsed. This can't identify a self-closing element (closed by />). An empty row element that contains only attributes can be parsed as long as it ends with a closing tag (for example, <row item_a="A" item_b="B"></row> is okay, but <row item_a="A" item_b="B" /> is not).
CsvClassifier
Description
A classifier for custom CSV content.
Members
- AllowSingleColumn
-
- Type: boolean
Enables the processing of files that contain only one column.
- ContainsHeader
-
- Type: string
Indicates whether the CSV file contains a header.
- CreationTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time that this classifier was registered.
- CustomDatatypeConfigured
-
- Type: boolean
Enables the custom datatype to be configured.
- CustomDatatypes
-
- Type: Array of strings
A list of custom datatypes including "BINARY", "BOOLEAN", "DATE", "DECIMAL", "DOUBLE", "FLOAT", "INT", "LONG", "SHORT", "STRING", "TIMESTAMP".
- Delimiter
-
- Type: string
A custom symbol to denote what separates each column entry in the row.
- DisableValueTrimming
-
- Type: boolean
Specifies not to trim values before identifying the type of column values. The default value is true.
- Header
-
- Type: Array of strings
A list of strings representing column names.
- LastUpdated
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time that this classifier was last updated.
- Name
-
- Required: Yes
- Type: string
The name of the classifier.
- QuoteSymbol
-
- Type: string
A custom symbol to denote what combines content into a single column value. It must be different from the column delimiter.
- Serde
-
- Type: string
Sets the SerDe for processing CSV in the classifier, which will be applied in the Data Catalog. Valid values are OpenCSVSerDe, LazySimpleSerDe, and None. You can specify the None value when you want the crawler to do the detection.
- Version
-
- Type: long (int|float)
The version of this classifier.
CustomCode
Description
Specifies a transform that uses custom code you provide to perform the data transformation. The output is a collection of DynamicFrames.
Members
- ClassName
-
- Required: Yes
- Type: string
The name defined for the custom code node class.
- Code
-
- Required: Yes
- Type: string
The custom code that is used to perform the data transformation.
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the custom code transform.
CustomEntityType
Description
An object representing a custom pattern for detecting sensitive data across the columns and rows of your structured data.
Members
- ContextWords
-
- Type: Array of strings
A list of context words. If none of these context words are found within the vicinity of the regular expression, the data will not be detected as sensitive data.
If no context words are passed, only the regular expression is checked.
- Name
-
- Required: Yes
- Type: string
A name for the custom pattern that allows it to be retrieved or deleted later. This name must be unique per Amazon Web Services account.
- RegexString
-
- Required: Yes
- Type: string
A regular expression string that is used for detecting sensitive data in a custom pattern.
DQResultsPublishingOptions
Description
Options to configure how your data quality evaluation results are published.
Members
- CloudWatchMetricsEnabled
-
- Type: boolean
Enable metrics for your data quality results.
- EvaluationContext
-
- Type: string
The context of the evaluation.
- ResultsPublishingEnabled
-
- Type: boolean
Enable publishing for your data quality results.
- ResultsS3Prefix
-
- Type: string
The Amazon S3 prefix prepended to the results.
DQStopJobOnFailureOptions
Description
Options to configure how your job will stop if your data quality evaluation fails.
Members
- StopJobOnFailureTiming
-
- Type: string
When to stop the job if your data quality evaluation fails. Options are Immediate or AfterDataLoad.
DataCatalogEncryptionSettings
Description
Contains configuration information for maintaining Data Catalog security.
Members
- ConnectionPasswordEncryption
-
- Type: ConnectionPasswordEncryption structure
When connection password protection is enabled, the Data Catalog uses a customer-provided key to encrypt the password as part of CreateConnection or UpdateConnection and store it in the ENCRYPTED_PASSWORD field in the connection properties. You can enable catalog encryption or only password encryption.
- EncryptionAtRest
-
- Type: EncryptionAtRest structure
Specifies the encryption-at-rest configuration for the Data Catalog.
DataLakePrincipal
Description
The Lake Formation principal.
Members
- DataLakePrincipalIdentifier
-
- Type: string
An identifier for the Lake Formation principal.
DataQualityAnalyzerResult
Description
Describes the result of the evaluation of a data quality analyzer.
Members
- Description
-
- Type: string
A description of the data quality analyzer.
- EvaluatedMetrics
-
- Type: Associative array of custom strings keys (NameString) to doubles
A map of metrics associated with the evaluation of the analyzer.
- EvaluationMessage
-
- Type: string
An evaluation message.
- Name
-
- Type: string
The name of the data quality analyzer.
DataQualityEvaluationRunAdditionalRunOptions
Description
Additional run options you can specify for an evaluation run.
Members
- CloudWatchMetricsEnabled
-
- Type: boolean
Whether or not to enable CloudWatch metrics.
- CompositeRuleEvaluationMethod
-
- Type: string
Sets the evaluation method for composite rules in the ruleset to ROW or COLUMN.
- ResultsS3Prefix
-
- Type: string
Prefix for Amazon S3 to store results.
DataQualityMetricValues
Description
Describes the data quality metric value according to the analysis of historical data.
Members
- ActualValue
-
- Type: double
The actual value of the data quality metric.
- ExpectedValue
-
- Type: double
The expected value of the data quality metric according to the analysis of historical data.
- LowerLimit
-
- Type: double
The lower limit of the data quality metric value according to the analysis of historical data.
- UpperLimit
-
- Type: double
The upper limit of the data quality metric value according to the analysis of historical data.
DataQualityObservation
Description
Describes the observation generated after evaluating the rules and analyzers.
Members
- Description
-
- Type: string
A description of the data quality observation.
- MetricBasedObservation
-
- Type: MetricBasedObservation structure
An object of type MetricBasedObservation representing the observation that is based on evaluated data quality metrics.
DataQualityResult
Description
Describes a data quality result.
Members
- AnalyzerResults
-
- Type: Array of DataQualityAnalyzerResult structures
A list of DataQualityAnalyzerResult objects representing the results for each analyzer.
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when this data quality run completed.
- DataSource
-
- Type: DataSource structure
The table associated with the data quality result, if any.
- EvaluationContext
-
- Type: string
In the context of a job in Glue Studio, each node in the canvas is typically assigned some sort of name and data quality nodes will have names. In the case of multiple nodes, the evaluationContext can differentiate the nodes.
- JobName
-
- Type: string
The job name associated with the data quality result, if any.
- JobRunId
-
- Type: string
The job run ID associated with the data quality result, if any.
- Observations
-
- Type: Array of DataQualityObservation structures
A list of DataQualityObservation objects representing the observations generated after evaluating the rules and analyzers.
- ProfileId
-
- Type: string
The Profile ID for the data quality result.
- ResultId
-
- Type: string
A unique result ID for the data quality result.
- RuleResults
-
- Type: Array of DataQualityRuleResult structures
A list of DataQualityRuleResult objects representing the results for each rule.
- RulesetEvaluationRunId
-
- Type: string
The unique run ID for the ruleset evaluation for this data quality result.
- RulesetName
-
- Type: string
The name of the ruleset associated with the data quality result.
- Score
-
- Type: double
An aggregate data quality score. Represents the ratio of rules that passed to the total number of rules.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when this data quality run started.
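To illustrate the Score member, the sketch below recomputes the ratio of passed rules to total rules from a RuleResults array. The helper name `dataQualityScore` and the sample data are hypothetical. It also assumes that a passing rule reports the string `PASS` in its Result member, since this page only describes Result as "a pass or fail status".

```php
<?php
// Hypothetical helper: recomputes the aggregate score from a RuleResults array.
function dataQualityScore(array $ruleResults): ?float
{
    $total = count($ruleResults);
    if ($total === 0) {
        return null; // no rules, so the ratio is undefined
    }

    $passed = 0;
    foreach ($ruleResults as $rule) {
        // Assumption: a passing rule reports the string 'PASS' in Result.
        if (($rule['Result'] ?? null) === 'PASS') {
            $passed++;
        }
    }

    // Ratio of rules that passed to the total number of rules.
    return $passed / $total;
}

// Sample data shaped like DataQualityRuleResult structures (values are made up).
$ruleResults = [
    ['Name' => 'Rule_1', 'Result' => 'PASS'],
    ['Name' => 'Rule_2', 'Result' => 'FAIL'],
    ['Name' => 'Rule_3', 'Result' => 'PASS'],
    ['Name' => 'Rule_4', 'Result' => 'PASS'],
];

$score = dataQualityScore($ruleResults);
```

With three of four rules passing, the recomputed ratio is 0.75. Whether the service counts any other status as a pass is not covered here.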
DataQualityResultDescription
Description
Describes a data quality result.
Members
- DataSource
-
- Type: DataSource structure
The table name associated with the data quality result.
- JobName
-
- Type: string
The job name associated with the data quality result.
- JobRunId
-
- Type: string
The job run ID associated with the data quality result.
- ResultId
-
- Type: string
The unique result ID for this data quality result.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time that the run started for this data quality result.
DataQualityResultFilterCriteria
Description
Criteria used to return data quality results.
Members
- DataSource
-
- Type: DataSource structure
Filter results by the specified data source. For example, retrieving all results for a Glue table.
- JobName
-
- Type: string
Filter results by the specified job name.
- JobRunId
-
- Type: string
Filter results by the specified job run ID.
- StartedAfter
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Filter results by runs that started after this time.
- StartedBefore
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Filter results by runs that started before this time.
DataQualityRuleRecommendationRunDescription
Description
Describes the result of a data quality rule recommendation run.
Members
- DataSource
-
- Type: DataSource structure
The data source (Glue table) associated with the recommendation run.
- RunId
-
- Type: string
The unique run identifier associated with this run.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when this run started.
- Status
-
- Type: string
The status for this run.
DataQualityRuleRecommendationRunFilter
Description
A filter for listing data quality recommendation runs.
Members
- DataSource
-
- Required: Yes
- Type: DataSource structure
Filter based on a specified data source (Glue table).
- StartedAfter
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Filter based on time for results started after provided time.
- StartedBefore
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Filter based on time for results started before provided time.
DataQualityRuleResult
Description
Describes the result of the evaluation of a data quality rule.
Members
- Description
-
- Type: string
A description of the data quality rule.
- EvaluatedMetrics
-
- Type: Associative array of custom strings keys (NameString) to doubles
A map of metrics associated with the evaluation of the rule.
- EvaluatedRule
-
- Type: string
The evaluated rule.
- EvaluationMessage
-
- Type: string
An evaluation message.
- Name
-
- Type: string
The name of the data quality rule.
- Result
-
- Type: string
A pass or fail status for the rule.
DataQualityRulesetEvaluationRunDescription
Description
Describes the result of a data quality ruleset evaluation run.
Members
- DataSource
-
- Type: DataSource structure
The data source (a Glue table) associated with the run.
- RunId
-
- Type: string
The unique run identifier associated with this run.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the run started.
- Status
-
- Type: string
The status for this run.
DataQualityRulesetEvaluationRunFilter
Description
The filter criteria.
Members
- DataSource
-
- Required: Yes
- Type: DataSource structure
Filter based on a data source (a Glue table) associated with the run.
- StartedAfter
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Filter results by runs that started after this time.
- StartedBefore
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Filter results by runs that started before this time.
DataQualityRulesetFilterCriteria
Description
The criteria used to filter data quality rulesets.
Members
- CreatedAfter
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Filter on rulesets created after this date.
- CreatedBefore
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Filter on rulesets created before this date.
- Description
-
- Type: string
The description of the ruleset filter criteria.
- LastModifiedAfter
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Filter on rulesets last modified after this date.
- LastModifiedBefore
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Filter on rulesets last modified before this date.
- Name
-
- Type: string
The name of the ruleset filter criteria.
- TargetTable
-
- Type: DataQualityTargetTable structure
The name and database name of the target table.
DataQualityRulesetListDetails
Description
Describes a data quality ruleset returned by GetDataQualityRuleset.
Members
- CreatedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time the data quality ruleset was created.
- Description
-
- Type: string
A description of the data quality ruleset.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time the data quality ruleset was last modified.
- Name
-
- Type: string
The name of the data quality ruleset.
- RecommendationRunId
-
- Type: string
When a ruleset was created from a recommendation run, this run ID is generated to link the two together.
- RuleCount
-
- Type: int
The number of rules in the ruleset.
- TargetTable
-
- Type: DataQualityTargetTable structure
An object representing a Glue table.
DataQualityTargetTable
Description
An object representing a Glue table.
Members
- CatalogId
-
- Type: string
The catalog ID where the Glue table exists.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database where the Glue table exists.
- TableName
-
- Required: Yes
- Type: string
The name of the Glue table.
DataSource
Description
A data source (a Glue table) for which you want data quality results.
Members
- GlueTable
-
- Required: Yes
- Type: GlueTable structure
A Glue table.
Database
Description
The Database object represents a logical grouping of tables that might reside in a Hive metastore or an RDBMS.
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the database resides.
- CreateTableDefaultPermissions
-
- Type: Array of PrincipalPermissions structures
Creates a set of default permissions on the table for principals. Used by Lake Formation. Not used in the normal course of Glue operations.
- CreateTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time at which the metadata database was created in the catalog.
- Description
-
- Type: string
A description of the database.
- FederatedDatabase
-
- Type: FederatedDatabase structure
A FederatedDatabase structure that references an entity outside the Glue Data Catalog.
- LocationUri
-
- Type: string
The location of the database (for example, an HDFS path).
- Name
-
- Required: Yes
- Type: string
The name of the database. For Hive compatibility, this is folded to lowercase when it is stored.
- Parameters
-
- Type: Associative array of custom strings keys (KeyString) to strings
These key-value pairs define parameters and properties of the database.
- TargetDatabase
-
- Type: DatabaseIdentifier structure
A DatabaseIdentifier structure that describes a target database for resource linking.
DatabaseIdentifier
Description
A structure that describes a target database for resource linking.
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the database resides.
- DatabaseName
-
- Type: string
The name of the catalog database.
- Region
-
- Type: string
Region of the target database.
DatabaseInput
Description
The structure used to create or update a database.
Members
- CreateTableDefaultPermissions
-
- Type: Array of PrincipalPermissions structures
Creates a set of default permissions on the table for principals. Used by Lake Formation. Not used in the normal course of Glue operations.
- Description
-
- Type: string
A description of the database.
- FederatedDatabase
-
- Type: FederatedDatabase structure
A FederatedDatabase structure that references an entity outside the Glue Data Catalog.
- LocationUri
-
- Type: string
The location of the database (for example, an HDFS path).
- Name
-
- Required: Yes
- Type: string
The name of the database. For Hive compatibility, this is folded to lowercase when it is stored.
- Parameters
-
- Type: Associative array of custom strings keys (KeyString) to strings
These key-value pairs define parameters and properties of the database.
- TargetDatabase
-
- Type: DatabaseIdentifier structure
A DatabaseIdentifier structure that describes a target database for resource linking.
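A minimal sketch of building a DatabaseInput array follows. Because the name is folded to lowercase when stored, the helper lowercases it up front so the caller sees the same value that the catalog will hold. The helper name `buildDatabaseInput` and the sample values are hypothetical. Passing the array under a `DatabaseInput` key to `CreateDatabase` is an assumption about usage.

```php
<?php
// Hypothetical helper: assembles a DatabaseInput array.
function buildDatabaseInput(string $name, string $description = '', array $parameters = []): array
{
    // The name is folded to lowercase when it is stored; doing the same
    // locally keeps later lookups consistent with the stored value.
    $input = ['Name' => strtolower($name)];

    if ($description !== '') {
        $input['Description'] = $description;
    }
    if ($parameters !== []) {
        // Parameters is a map of string keys to string values.
        $input['Parameters'] = array_map('strval', $parameters);
    }

    return $input;
}

$input = buildDatabaseInput('Sales_Raw', 'Raw sales data', ['owner' => 'analytics', 'retention_days' => 30]);

// Assumed usage with a configured client (not executed here):
// $client->createDatabase(['DatabaseInput' => $input]);
```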
DatapointInclusionAnnotation
Description
An Inclusion Annotation.
Members
- InclusionAnnotation
-
- Type: string
The inclusion annotation value to apply to the statistic.
- ProfileId
-
- Type: string
The ID of the data quality profile the statistic belongs to.
- StatisticId
-
- Type: string
The Statistic ID.
Datatype
Description
A structure representing the datatype of the value.
Members
- Id
-
- Required: Yes
- Type: string
The datatype of the value.
- Label
-
- Required: Yes
- Type: string
A label assigned to the datatype.
DateColumnStatisticsData
Description
Defines column statistics supported for timestamp data columns.
Members
- MaximumValue
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The highest value in the column.
- MinimumValue
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The lowest value in the column.
- NumberOfDistinctValues
-
- Required: Yes
- Type: long (int|float)
The number of distinct values in a column.
- NumberOfNulls
-
- Required: Yes
- Type: long (int|float)
The number of null values in the column.
DecimalColumnStatisticsData
Description
Defines column statistics supported for fixed-point number data columns.
Members
- MaximumValue
-
- Type: DecimalNumber structure
The highest value in the column.
- MinimumValue
-
- Type: DecimalNumber structure
The lowest value in the column.
- NumberOfDistinctValues
-
- Required: Yes
- Type: long (int|float)
The number of distinct values in a column.
- NumberOfNulls
-
- Required: Yes
- Type: long (int|float)
The number of null values in the column.
DecimalNumber
Description
Contains a numeric value in decimal format.
Members
- Scale
-
- Required: Yes
- Type: int
The scale that determines where the decimal point falls in the unscaled value.
- UnscaledValue
-
- Required: Yes
- Type: blob (string|resource|Psr\Http\Message\StreamInterface)
The unscaled numeric value.
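As a worked example of how Scale places the decimal point in the unscaled value, the sketch below renders a DecimalNumber as a string. It assumes that UnscaledValue holds a big-endian two's-complement integer, which this page does not state, and it only handles values that fit in a native PHP integer. The helper name `decimalNumberToString` is hypothetical.

```php
<?php
// Hypothetical helper: renders a DecimalNumber as a decimal string.
// Assumption: UnscaledValue is a big-endian two's-complement byte string.
function decimalNumberToString(string $unscaledBytes, int $scale): string
{
    $length = strlen($unscaledBytes);
    if ($length === 0 || $length > PHP_INT_SIZE) {
        throw new InvalidArgumentException('This sketch handles 1 to ' . PHP_INT_SIZE . ' bytes');
    }

    // Sign-extend from the first byte, then shift in every byte in order.
    $value = (ord($unscaledBytes[0]) & 0x80) ? -1 : 0;
    for ($i = 0; $i < $length; $i++) {
        $value = ($value << 8) | ord($unscaledBytes[$i]);
    }

    if ($scale <= 0) {
        // A non-positive scale appends zeros instead of placing a decimal point.
        return $value === 0 ? '0' : $value . str_repeat('0', -$scale);
    }

    $sign = $value < 0 ? '-' : '';
    // Pad so that at least one digit remains left of the decimal point.
    $digits = str_pad(ltrim((string) $value, '-'), $scale + 1, '0', STR_PAD_LEFT);

    return $sign . substr($digits, 0, -$scale) . '.' . substr($digits, -$scale);
}

// 0x3039 is 12345; with Scale 2 the value reads 123.45.
$positive = decimalNumberToString("\x30\x39", 2);
// 0xFF85 is -123 in two's complement; with Scale 3 the value reads -0.123.
$negative = decimalNumberToString("\xFF\x85", 3);
```

Larger values would need an arbitrary-precision extension, which this sketch leaves out.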
DeltaTarget
Description
Specifies a Delta data store to crawl one or more Delta tables.
Members
- ConnectionName
-
- Type: string
The name of the connection to use to connect to the Delta table target.
- CreateNativeDeltaTable
-
- Type: boolean
Specifies whether the crawler will create native tables, to allow integration with query engines that support querying of the Delta transaction log directly.
- DeltaTables
-
- Type: Array of strings
A list of the Amazon S3 paths to the Delta tables.
- WriteManifest
-
- Type: boolean
Specifies whether to write the manifest files to the Delta table path.
DevEndpoint
Description
A development endpoint where a developer can remotely debug extract, transform, and load (ETL) scripts.
Members
- Arguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
A map of arguments used to configure the DevEndpoint.
Valid arguments are:
  - "--enable-glue-datacatalog": ""
You can specify a version of Python support for development endpoints by using the Arguments parameter in the CreateDevEndpoint or UpdateDevEndpoint APIs. If no arguments are provided, the version defaults to Python 2.
- AvailabilityZone
-
- Type: string
The Amazon Web Services Availability Zone where this DevEndpoint is located.
- CreatedTimestamp
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The point in time at which this DevEndpoint was created.
- EndpointName
-
- Type: string
The name of the DevEndpoint.
- ExtraJarsS3Path
-
- Type: string
The path to one or more Java .jar files in an S3 bucket that should be loaded in your DevEndpoint.
You can only use pure Java/Scala libraries with a DevEndpoint.
- ExtraPythonLibsS3Path
-
- Type: string
The paths to one or more Python libraries in an Amazon S3 bucket that should be loaded in your DevEndpoint. Multiple values must be complete paths separated by a comma.
You can only use pure Python libraries with a DevEndpoint. Libraries that rely on C extensions, such as the pandas Python data analysis library, are not currently supported.
- FailureReason
-
- Type: string
The reason for a current failure in this DevEndpoint.
- GlueVersion
-
- Type: string
Glue version determines the versions of Apache Spark and Python that Glue supports. The Python version indicates the version supported for running your ETL scripts on development endpoints.
For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.
Development endpoints that are created without specifying a Glue version default to Glue 0.9.
You can specify a version of Python support for development endpoints by using the Arguments parameter in the CreateDevEndpoint or UpdateDevEndpoint APIs. If no arguments are provided, the version defaults to Python 2.
- LastModifiedTimestamp
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The point in time at which this DevEndpoint was last modified.
- LastUpdateStatus
-
- Type: string
The status of the last update.
- NumberOfNodes
-
- Type: int
The number of Glue Data Processing Units (DPUs) allocated to this DevEndpoint.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated to the development endpoint.
The maximum number of workers you can define is 299 for G.1X, and 149 for G.2X.
- PrivateAddress
-
- Type: string
A private IP address to access the DevEndpoint within a VPC if the DevEndpoint is created within one. The PrivateAddress field is present only when you create the DevEndpoint within your VPC.
- PublicAddress
-
- Type: string
The public IP address used by this DevEndpoint. The PublicAddress field is present only when you create a non-virtual private cloud (VPC) DevEndpoint.
- PublicKey
-
- Type: string
The public key to be used by this DevEndpoint for authentication. This attribute is provided for backward compatibility because the recommended attribute to use is public keys.
- PublicKeys
-
- Type: Array of strings
A list of public keys to be used by the DevEndpoints for authentication. Using this attribute is preferred over a single public key because the public keys allow you to have a different private key per client.
If you previously created an endpoint with a public key, you must remove that key to be able to set a list of public keys. Call the UpdateDevEndpoint API operation with the public key content in the deletePublicKeys attribute, and the list of new keys in the addPublicKeys attribute.
- RoleArn
-
- Type: string
The Amazon Resource Name (ARN) of the IAM role used in this DevEndpoint.
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used with this DevEndpoint.
- SecurityGroupIds
-
- Type: Array of strings
A list of security group identifiers used in this DevEndpoint.
- Status
-
- Type: string
The current status of this DevEndpoint.
- SubnetId
-
- Type: string
The subnet ID for this DevEndpoint.
- VpcId
-
- Type: string
The ID of the virtual private cloud (VPC) used by this DevEndpoint.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated to the development endpoint. Accepts a value of Standard, G.1X, or G.2X.
  - For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory and a 50 GB disk, and 2 executors per worker.
  - For the G.1X worker type, each worker maps to 1 DPU (4 vCPU, 16 GB of memory, 64 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.
  - For the G.2X worker type, each worker maps to 2 DPU (8 vCPU, 32 GB of memory, 128 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.
Known issue: when a development endpoint is created with the G.2X WorkerType configuration, the Spark drivers for the development endpoint will run on 4 vCPU, 16 GB of memory, and a 64 GB disk.
- YarnEndpointAddress
-
- Type: string
The YARN endpoint address used by this DevEndpoint.
- ZeppelinRemoteSparkInterpreterPort
-
- Type: int
The Apache Zeppelin port for the remote Apache Spark interpreter.
DevEndpointCustomLibraries
Description
Custom libraries to be loaded into a development endpoint.
Members
- ExtraJarsS3Path
-
- Type: string
The path to one or more Java .jar files in an S3 bucket that should be loaded in your DevEndpoint.
You can only use pure Java/Scala libraries with a DevEndpoint.
- ExtraPythonLibsS3Path
-
- Type: string
The paths to one or more Python libraries in an Amazon Simple Storage Service (Amazon S3) bucket that should be loaded in your DevEndpoint. Multiple values must be complete paths separated by a comma.
You can only use pure Python libraries with a DevEndpoint. Libraries that rely on C extensions, such as the pandas Python data analysis library, are not currently supported.
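As a minimal sketch of the comma-separated format described above, the following builds a DevEndpointCustomLibraries array. The bucket name and object keys are hypothetical placeholders, not values from this documentation.

```php
<?php
// Hypothetical S3 locations; replace with paths to your own pure-Python libraries.
$pythonLibs = [
    's3://example-bucket/libs/helpers.zip',
    's3://example-bucket/libs/parsers.egg',
];

// Multiple Python library paths are passed as one string of complete paths
// separated by commas.
$customLibraries = [
    'ExtraPythonLibsS3Path' => implode(',', $pythonLibs),
    'ExtraJarsS3Path' => 's3://example-bucket/jars/udfs.jar', // hypothetical path
];
```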
DirectJDBCSource
Description
Specifies the direct JDBC source connection.
Members
- ConnectionName
-
- Required: Yes
- Type: string
The connection name of the JDBC source.
- ConnectionType
-
- Required: Yes
- Type: string
The connection type of the JDBC source.
- Database
-
- Required: Yes
- Type: string
The database of the JDBC source connection.
- Name
-
- Required: Yes
- Type: string
The name of the JDBC source connection.
- RedshiftTmpDir
-
- Type: string
The temp directory of the JDBC Redshift source.
- Table
-
- Required: Yes
- Type: string
The table of the JDBC source connection.
DirectKafkaSource
Description
Specifies an Apache Kafka data store.
Members
- DataPreviewOptions
-
- Type: StreamingDataPreviewOptions structure
Specifies options related to data preview for viewing a sample of your data.
- DetectSchema
-
- Type: boolean
Whether to automatically determine the schema from the incoming data.
- Name
-
- Required: Yes
- Type: string
The name of the data store.
- StreamingOptions
-
- Type: KafkaStreamingSourceOptions structure
Specifies the streaming options.
- WindowSize
-
- Type: int
The amount of time to spend processing each micro batch.
DirectKinesisSource
Description
Specifies a direct Amazon Kinesis data source.
Members
- DataPreviewOptions
-
- Type: StreamingDataPreviewOptions structure
Additional options for data preview.
- DetectSchema
-
- Type: boolean
Whether to automatically determine the schema from the incoming data.
- Name
-
- Required: Yes
- Type: string
The name of the data source.
- StreamingOptions
-
- Type: KinesisStreamingSourceOptions structure
Additional options for the Kinesis streaming data source.
- WindowSize
-
- Type: int
The amount of time to spend processing each micro batch.
DirectSchemaChangePolicy
Description
A policy that specifies update behavior for the crawler.
Members
- Database
-
- Type: string
Specifies the database that the schema change policy applies to.
- EnableUpdateCatalog
-
- Type: boolean
Whether to use the specified update behavior when the crawler finds a changed schema.
- Table
-
- Type: string
Specifies the table in the database that the schema change policy applies to.
- UpdateBehavior
-
- Type: string
The update behavior when the crawler finds a changed schema.
DoubleColumnStatisticsData
Description
Defines column statistics supported for floating-point number data columns.
Members
- MaximumValue
-
- Type: double
The highest value in the column.
- MinimumValue
-
- Type: double
The lowest value in the column.
- NumberOfDistinctValues
-
- Required: Yes
- Type: long (int|float)
The number of distinct values in a column.
- NumberOfNulls
-
- Required: Yes
- Type: long (int|float)
The number of null values in the column.
DropDuplicates
Description
Specifies a transform that removes rows of repeating data from a data set.
Members
- Columns
-
- Type: Array of strings
The names of the columns to be merged or removed if repeating.
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
DropFields
Description
Specifies a transform that chooses the data property keys that you want to drop.
Members
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
- Paths
-
- Required: Yes
- Type: Array of strings
A JSON path to a variable in the data structure.
DropNullFields
Description
Specifies a transform that removes columns from the dataset if all values in the column are 'null'. By default, Glue Studio will recognize null objects, but some values such as empty strings, strings that are "null", -1 integers or other placeholders such as zeros, are not automatically recognized as nulls.
Members
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
- NullCheckBoxList
-
- Type: NullCheckBoxList structure
A structure that represents whether certain values are recognized as null values for removal.
- NullTextList
-
- Type: Array of NullValueField structures
A structure that specifies a list of NullValueField structures that represent a custom null value such as zero or other value being used as a null placeholder unique to the dataset.
The DropNullFields transform removes custom null values only if both the value of the null placeholder and the datatype match the data.
DynamicTransform
Description
Specifies the set of parameters needed to perform the dynamic transform.
Members
- FunctionName
-
- Required: Yes
- Type: string
Specifies the name of the function of the dynamic transform.
- Inputs
-
- Required: Yes
- Type: Array of strings
Specifies the inputs for the dynamic transform that are required.
- Name
-
- Required: Yes
- Type: string
Specifies the name of the dynamic transform.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the dynamic transform.
- Parameters
-
- Type: Array of TransformConfigParameter structures
Specifies the parameters of the dynamic transform.
- Path
-
- Required: Yes
- Type: string
Specifies the path of the dynamic transform source and config files.
- TransformName
-
- Required: Yes
- Type: string
Specifies the name of the dynamic transform as it appears in the Glue Studio visual editor.
- Version
-
- Type: string
This field is not used and will be deprecated in a future release.
DynamoDBCatalogSource
Description
Specifies a DynamoDB data source in the Glue Data Catalog.
Members
- Database
-
- Required: Yes
- Type: string
The name of the database to read from.
- Name
-
- Required: Yes
- Type: string
The name of the data source.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to read from.
DynamoDBTarget
Description
Specifies an Amazon DynamoDB table to crawl.
Members
- Path
-
- Type: string
The name of the DynamoDB table to crawl.
- scanAll
-
- Type: boolean
Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table.
A value of true means to scan all records, while a value of false means to sample the records. If no value is specified, the value defaults to true.
- scanRate
-
- Type: double
The percentage of the configured read capacity units to use by the Glue crawler. Read capacity units is a term defined by DynamoDB, and is a numeric value that acts as a rate limiter for the number of reads that can be performed on that table per second.
The valid values are null or a value between 0.1 and 1.5. A null value is used when the user does not provide a value, and defaults to 0.5 of the configured Read Capacity Unit (for provisioned tables), or 0.25 of the max configured Read Capacity Unit (for tables using on-demand mode).
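The range constraint on scanRate can be checked before a request is built. The following is a minimal sketch: buildDynamoDbTarget is a hypothetical helper, not part of the SDK, and the table name is a placeholder.

```php
<?php
// Hypothetical helper (not an SDK function): builds a DynamoDBTarget array and
// rejects scanRate values outside the documented range of 0.1 to 1.5.
function buildDynamoDbTarget(string $table, bool $scanAll = true, ?float $scanRate = null): array
{
    if ($scanRate !== null && ($scanRate < 0.1 || $scanRate > 1.5)) {
        throw new InvalidArgumentException('scanRate must be null or between 0.1 and 1.5');
    }

    $target = ['Path' => $table, 'scanAll' => $scanAll];
    if ($scanRate !== null) {
        // Leaving the key out lets the service-side default apply.
        $target['scanRate'] = $scanRate;
    }
    return $target;
}

// Sample the table (scanAll = false) using 0.8 of the configured read capacity.
$target = buildDynamoDbTarget('orders', false, 0.8);
```

Omitting scanRate rather than sending an explicit value keeps the documented defaults in effect for both provisioned and on-demand tables.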
Edge
Description
An edge represents a directed connection between two Glue components that are part of the workflow the edge belongs to.
Members
- DestinationId
-
- Type: string
The unique ID of the node within the workflow where the edge ends.
- SourceId
-
- Type: string
The unique ID of the node within the workflow where the edge starts.
EncryptionAtRest
Description
Specifies the encryption-at-rest configuration for the Data Catalog.
Members
- CatalogEncryptionMode
-
- Required: Yes
- Type: string
The encryption-at-rest mode for encrypting Data Catalog data.
- CatalogEncryptionServiceRole
-
- Type: string
The role that Glue assumes to encrypt and decrypt the Data Catalog objects on the caller's behalf.
- SseAwsKmsKeyId
-
- Type: string
The ID of the KMS key to use for encryption at rest.
EncryptionConfiguration
Description
Specifies an encryption configuration.
Members
- CloudWatchEncryption
-
- Type: CloudWatchEncryption structure
The encryption configuration for Amazon CloudWatch.
- JobBookmarksEncryption
-
- Type: JobBookmarksEncryption structure
The encryption configuration for job bookmarks.
- S3Encryption
-
- Type: Array of S3Encryption structures
The encryption configuration for Amazon Simple Storage Service (Amazon S3) data.
EntityNotFoundException
Description
A specified entity does not exist.
Members
- FromFederationSource
-
- Type: boolean
Indicates whether or not the exception relates to a federated source.
- Message
-
- Type: string
A message describing the problem.
ErrorDetail
Description
Contains details about an error.
Members
- ErrorCode
-
- Type: string
The code associated with this error.
- ErrorMessage
-
- Type: string
A message describing the error.
ErrorDetails
Description
An object containing error details.
Members
- ErrorCode
-
- Type: string
The error code for an error.
- ErrorMessage
-
- Type: string
The error message for an error.
EvaluateDataQuality
Description
Specifies your data quality evaluation criteria.
Members
- Inputs
-
- Required: Yes
- Type: Array of strings
The inputs of your data quality evaluation.
- Name
-
- Required: Yes
- Type: string
The name of the data quality evaluation.
- Output
-
- Type: string
The output of your data quality evaluation.
- PublishingOptions
-
- Type: DQResultsPublishingOptions structure
Options to configure how your results are published.
- Ruleset
-
- Required: Yes
- Type: string
The ruleset for your data quality evaluation.
- StopJobOnFailureOptions
-
- Type: DQStopJobOnFailureOptions structure
Options to configure how your job will stop if your data quality evaluation fails.
EvaluateDataQualityMultiFrame
Description
Specifies your data quality evaluation criteria.
Members
- AdditionalDataSources
-
- Type: Associative array of custom strings keys (NodeName) to strings
The aliases of all data sources except primary.
- AdditionalOptions
-
- Type: Associative array of custom strings keys (AdditionalOptionKeys) to strings
Options to configure runtime behavior of the transform.
- Inputs
-
- Required: Yes
- Type: Array of strings
The inputs of your data quality evaluation. The first input in this list is the primary data source.
- Name
-
- Required: Yes
- Type: string
The name of the data quality evaluation.
- PublishingOptions
-
- Type: DQResultsPublishingOptions structure
Options to configure how your results are published.
- Ruleset
-
- Required: Yes
- Type: string
The ruleset for your data quality evaluation.
- StopJobOnFailureOptions
-
- Type: DQStopJobOnFailureOptions structure
Options to configure how your job will stop if your data quality evaluation fails.
EvaluationMetrics
Description
Evaluation metrics provide an estimate of the quality of your machine learning transform.
Members
- FindMatchesMetrics
-
- Type: FindMatchesMetrics structure
The evaluation metrics for the find matches algorithm.
- TransformType
-
- Required: Yes
- Type: string
The type of machine learning transform.
EventBatchingCondition
Description
Batch condition that must be met (specified number of events received or batch time window expired) before the EventBridge event trigger fires.
Members
- BatchSize
-
- Required: Yes
- Type: int
Number of events that must be received from Amazon EventBridge before the EventBridge event trigger fires.
- BatchWindow
-
- Type: int
Window of time in seconds after which the EventBridge event trigger fires. The window starts when the first event is received.
ExecutionProperty
Description
An execution property of a job.
Members
- MaxConcurrentRuns
-
- Type: int
The maximum number of concurrent runs allowed for the job. The default is 1. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.
ExportLabelsTaskRunProperties
Description
Specifies configuration properties for an exporting labels task run.
Members
- OutputS3Path
-
- Type: string
The Amazon Simple Storage Service (Amazon S3) path where you will export the labels.
FederatedDatabase
Description
A database that points to an entity outside the Glue Data Catalog.
Members
- ConnectionName
-
- Type: string
The name of the connection to the external metastore.
- Identifier
-
- Type: string
A unique identifier for the federated database.
FederatedResourceAlreadyExistsException
Description
A federated resource already exists.
Members
- AssociatedGlueResource
-
- Type: string
The associated Glue resource already exists.
- Message
-
- Type: string
The message describing the problem.
FederatedTable
Description
A table that points to an entity outside the Glue Data Catalog.
Members
- ConnectionName
-
- Type: string
The name of the connection to the external metastore.
- DatabaseIdentifier
-
- Type: string
A unique identifier for the federated database.
- Identifier
-
- Type: string
A unique identifier for the federated table.
FederationSourceException
Description
A federation source failed.
Members
- FederationSourceErrorCode
-
- Type: string
The error code of the problem.
- Message
-
- Type: string
The message describing the problem.
FederationSourceRetryableException
Description
A federation source failed, but the operation may be retried.
Members
- Message
-
- Type: string
A message describing the problem.
FillMissingValues
Description
Specifies a transform that locates records in the dataset that have missing values and adds a new field with a value determined by imputation. The input data set is used to train the machine learning model that determines what the missing value should be.
Members
- FilledPath
-
- Type: string
A JSON path to a variable in the data structure for the dataset that is filled.
- ImputedPath
-
- Required: Yes
- Type: string
A JSON path to a variable in the data structure for the dataset that is imputed.
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
Filter
Description
Specifies a transform that splits a dataset into two, based on a filter condition.
Members
- Filters
-
- Required: Yes
- Type: Array of FilterExpression structures
Specifies a filter expression.
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names.
- LogicalOperator
-
- Required: Yes
- Type: string
The operator used to filter rows by comparing the key value to a specified value.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
FilterExpression
Description
Specifies a filter expression.
Members
- Negated
-
- Type: boolean
Whether the expression is to be negated.
- Operation
-
- Required: Yes
- Type: string
The type of operation to perform in the expression.
- Values
-
- Required: Yes
- Type: Array of FilterValue structures
A list of filter values.
FilterValue
Description
Represents a single entry in the list of values for a FilterExpression.
Members
- Type
-
- Required: Yes
- Type: string
The type of filter value.
- Value
-
- Required: Yes
- Type: Array of strings
The value to be associated.
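To show how Filter, FilterExpression, and FilterValue nest, here is a sketch of a Filter node as a PHP array. The node names and column name are hypothetical, and the enum strings ('AND', 'GT', 'COLUMNEXTRACTED', 'CONSTANT') are assumptions that do not appear in this section; check them against the service's list of allowed values before use.

```php
<?php
// A Filter transform node keeping rows where the "age" column is greater than 30.
// Enum values below are assumed, not taken from this section.
$filterNode = [
    'Name' => 'FilterAdults',          // hypothetical transform node name
    'Inputs' => ['SourceNode'],        // hypothetical upstream node name
    'LogicalOperator' => 'AND',        // how multiple FilterExpressions combine
    'Filters' => [
        [
            'Operation' => 'GT',
            'Negated' => false,
            'Values' => [
                // Each FilterValue carries a Type and an array of strings.
                ['Type' => 'COLUMNEXTRACTED', 'Value' => ['age']],
                ['Type' => 'CONSTANT', 'Value' => ['30']],
            ],
        ],
    ],
];
```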
FindMatchesMetrics
Description
The evaluation metrics for the find matches algorithm. The quality of your machine learning transform is measured by getting your transform to predict some matches and comparing the results to known matches from the same dataset. The quality metrics are based on a subset of your data, so they are not precise.
Members
- AreaUnderPRCurve
-
- Type: double
The area under the precision/recall curve (AUPRC) is a single number measuring the overall quality of the transform, that is independent of the choice made for precision vs. recall. Higher values indicate that you have a more attractive precision vs. recall tradeoff.
For more information, see Precision and recall in Wikipedia.
- ColumnImportances
-
- Type: Array of ColumnImportance structures
A list of ColumnImportance structures containing column importance metrics, sorted in order of descending importance.
- ConfusionMatrix
-
- Type: ConfusionMatrix structure
The confusion matrix shows you what your transform is predicting accurately and what types of errors it is making.
For more information, see Confusion matrix in Wikipedia.
- F1
-
- Type: double
The maximum F1 metric indicates the transform's accuracy between 0 and 1, where 1 is the best accuracy.
For more information, see F1 score in Wikipedia.
- Precision
-
- Type: double
The precision metric indicates how often your transform is correct when it predicts a match. Specifically, it measures the fraction of the matches predicted by the transform that are true positives.
For more information, see Precision and recall in Wikipedia.
- Recall
-
- Type: double
The recall metric indicates how often your transform predicts a match when the records actually match. Specifically, it measures the fraction of the actual matches in the source data that the transform finds.
For more information, see Precision and recall in Wikipedia.
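As a worked illustration of how these metrics relate, the sketch below computes precision, recall, and F1 from confusion-matrix counts using the standard definitions (F1 is the harmonic mean of precision and recall). The counts are hypothetical, and this is not necessarily how the service derives the values it reports.

```php
<?php
// Standard definition: F1 is the harmonic mean of precision and recall.
function f1Score(float $precision, float $recall): float
{
    if ($precision + $recall == 0.0) {
        return 0.0; // avoid division by zero when both metrics are zero
    }
    return 2.0 * $precision * $recall / ($precision + $recall);
}

// Hypothetical counts, for illustration only.
$truePositives = 80;
$falsePositives = 20;
$falseNegatives = 20;

// Precision: share of predicted matches that are real matches.
$precision = $truePositives / ($truePositives + $falsePositives);
// Recall: share of real matches that the transform predicted.
$recall = $truePositives / ($truePositives + $falseNegatives);

$f1 = f1Score($precision, $recall); // approximately 0.8 for these counts
```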
FindMatchesParameters
Description
The parameters to configure the find matches transform.
Members
- AccuracyCostTradeoff
-
- Type: double
The value that is selected when tuning your transform for a balance between accuracy and cost. A value of 0.5 means that the system balances accuracy and cost concerns. A value of 1.0 means a bias purely for accuracy, which typically results in a higher cost, sometimes substantially higher. A value of 0.0 means a bias purely for cost, which results in a less accurate FindMatches transform, sometimes with unacceptable accuracy.
Accuracy measures how well the transform finds true positives and true negatives. Increasing accuracy requires more machine resources and cost. But it also results in increased recall.
Cost measures how many compute resources, and thus money, are consumed to run the transform.
- EnforceProvidedLabels
-
- Type: boolean
The value to switch on or off to force the output to match the provided labels from users. If the value is True, the find matches transform forces the output to match the provided labels. The results override the normal conflation results. If the value is False, the find matches transform does not ensure all the labels provided are respected, and the results rely on the trained model.
Note that setting this value to true may increase the conflation execution time.
- PrecisionRecallTradeoff
-
- Type: double
The value selected when tuning your transform for a balance between precision and recall. A value of 0.5 means no preference; a value of 1.0 means a bias purely for precision, and a value of 0.0 means a bias for recall. Because this is a tradeoff, choosing values close to 1.0 means very low recall, and choosing values close to 0.0 results in very low precision.
The precision metric indicates how often your model is correct when it predicts a match.
The recall metric indicates how often your model predicts a match when the records actually match.
- PrimaryKeyColumnName
-
- Type: string
The name of a column that uniquely identifies rows in the source table. Used to help identify matching records.
FindMatchesTaskRunProperties
Description
Specifies configuration properties for a Find Matches task run.
Members
- JobId
-
- Type: string
The job ID for the Find Matches task run.
- JobName
-
- Type: string
The name assigned to the job for the Find Matches task run.
- JobRunId
-
- Type: string
The job run ID for the Find Matches task run.
GetConnectionsFilter
Description
Filters the connection definitions that are returned by the GetConnections API operation.
Members
- ConnectionType
-
- Type: string
The type of connections to return. Currently, SFTP is not supported.
- MatchCriteria
-
- Type: Array of strings
A criteria string that must match the criteria recorded in the connection definition for that connection definition to be returned.
GlueEncryptionException
Description
An encryption operation failed.
Members
- Message
-
- Type: string
The message describing the problem.
GluePolicy
Description
A structure for returning a resource policy.
Members
- CreateTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time at which the policy was created.
- PolicyHash
-
- Type: string
Contains the hash value associated with this policy.
- PolicyInJson
-
- Type: string
Contains the requested policy document, in JSON format.
- UpdateTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time at which the policy was last updated.
GlueSchema
Description
Specifies a user-defined schema when a schema cannot be determined by Glue.
Members
- Columns
-
- Type: Array of GlueStudioSchemaColumn structures
Specifies the column definitions that make up a Glue schema.
GlueStudioSchemaColumn
Description
Specifies a single column in a Glue schema definition.
Members
- Name
-
- Required: Yes
- Type: string
The name of the column in the Glue Studio schema.
- Type
-
- Type: string
The hive type for this column in the Glue Studio schema.
GlueTable
Description
The database and table in the Glue Data Catalog that is used for input or output data.
Members
- AdditionalOptions
-
- Type: Associative array of custom strings keys (NameString) to strings
Additional options for the table. Currently there are two keys supported:
- pushDownPredicate: to filter on partitions without having to list and read all the files in your dataset.
- catalogPartitionPredicate: to use server-side partition pruning using partition indexes in the Glue Data Catalog.
- CatalogId
-
- Type: string
A unique identifier for the Glue Data Catalog.
- ConnectionName
-
- Type: string
The name of the connection to the Glue Data Catalog.
- DatabaseName
-
- Required: Yes
- Type: string
A database name in the Glue Data Catalog.
- TableName
-
- Required: Yes
- Type: string
A table name in the Glue Data Catalog.
GovernedCatalogSource
Description
Specifies the data store in the governed Glue Data Catalog.
Members
- AdditionalOptions
-
- Type: S3SourceAdditionalOptions structure
Specifies additional connection options.
- Database
-
- Required: Yes
- Type: string
The database to read from.
- Name
-
- Required: Yes
- Type: string
The name of the data store.
- PartitionPredicate
-
- Type: string
Partitions satisfying this predicate are deleted. Files within the retention period in these partitions are not deleted. Set to "" (empty) by default.
- Table
-
- Required: Yes
- Type: string
The database table to read from.
GovernedCatalogTarget
Description
Specifies a data target that writes to Amazon S3 using the Glue Data Catalog.
Members
- Database
-
- Required: Yes
- Type: string
The name of the database to write to.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- PartitionKeys
-
- Type: Array of strings
Specifies native partitioning using a sequence of keys.
- SchemaChangePolicy
-
- Type: CatalogSchemaChangePolicy structure
A policy that specifies update behavior for the governed catalog.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to write to.
GrokClassifier
Description
A classifier that uses grok patterns.
Members
- Classification
-
- Required: Yes
- Type: string
An identifier of the data format that the classifier matches, such as Twitter, JSON, Omniture logs, and so on.
- CreationTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time that this classifier was registered.
- CustomPatterns
-
- Type: string
Optional custom grok patterns defined by this classifier. For more information, see custom patterns in Writing Custom Classifiers.
- GrokPattern
-
- Required: Yes
- Type: string
The grok pattern applied to a data store by this classifier. For more information, see built-in patterns in Writing Custom Classifiers.
- LastUpdated
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time that this classifier was last updated.
- Name
-
- Required: Yes
- Type: string
The name of the classifier.
- Version
-
- Type: long (int|float)
The version of this classifier.
HudiTarget
Description
Specifies an Apache Hudi data source.
Members
- ConnectionName
-
- Type: string
The name of the connection to use to connect to the Hudi target. If your Hudi files are stored in buckets that require VPC authorization, you can set their connection properties here.
- Exclusions
-
- Type: Array of strings
A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.
- MaximumTraversalDepth
-
- Type: int
The maximum depth of Amazon S3 paths that the crawler can traverse to discover the Hudi metadata folder in your Amazon S3 path. Used to limit the crawler run time.
- Paths
-
- Type: Array of strings
An array of Amazon S3 location strings for Hudi, each indicating the root folder in which the metadata files for a Hudi table reside. The Hudi folder may be located in a child folder of the root folder.
The crawler will scan all folders underneath a path for a Hudi folder.
IcebergCompactionMetrics
Description
Compaction metrics for Iceberg for the optimizer run.
Members
- JobDurationInHour
-
- Type: double
The duration of the job in hours.
- NumberOfBytesCompacted
-
- Type: long (int|float)
The number of bytes removed by the compaction job run.
- NumberOfDpus
-
- Type: int
The number of DPU hours consumed by the job.
- NumberOfFilesCompacted
-
- Type: long (int|float)
The number of files removed by the compaction job run.
IcebergInput
Description
A structure that defines an Apache Iceberg metadata table to create in the catalog.
Members
- MetadataOperation
-
- Required: Yes
- Type: string
A required metadata operation. Can only be set to CREATE.
- Version
-
- Type: string
The table version for the Iceberg table. Defaults to 2.
IcebergOrphanFileDeletionConfiguration
Description
The configuration for an Iceberg orphan file deletion optimizer.
Members
- location
-
- Type: string
Specifies a directory in which to look for files (defaults to the table's location). You may choose a sub-directory rather than the top-level table location.
- orphanFileRetentionPeriodInDays
-
- Type: int
The number of days that orphan files should be retained before file deletion. If an input is not provided, the default value 3 will be used.
IcebergOrphanFileDeletionMetrics
Description
Orphan file deletion metrics for Iceberg for the optimizer run.
Members
- JobDurationInHour
-
- Type: double
The duration of the job in hours.
- NumberOfDpus
-
- Type: int
The number of DPU hours consumed by the job.
- NumberOfOrphanFilesDeleted
-
- Type: long (int|float)
The number of orphan files deleted by the orphan file deletion job run.
IcebergRetentionConfiguration
Description
The configuration for an Iceberg snapshot retention optimizer.
Members
- cleanExpiredFiles
-
- Type: boolean
If set to false, snapshots are only deleted from table metadata, and the underlying data and metadata files are not deleted.
- numberOfSnapshotsToRetain
-
- Type: int
The number of Iceberg snapshots to retain within the retention period. If an input is not provided, the corresponding Iceberg table configuration field will be used or if not present, the default value 1 will be used.
- snapshotRetentionPeriodInDays
-
- Type: int
The number of days to retain the Iceberg snapshots. If an input is not provided, the corresponding Iceberg table configuration field will be used or if not present, the default value 5 will be used.
IcebergRetentionMetrics
Description
Snapshot retention metrics for Iceberg for the optimizer run.
Members
- JobDurationInHour
-
- Type: double
The duration of the job in hours.
- NumberOfDataFilesDeleted
-
- Type: long (int|float)
The number of data files deleted by the retention job run.
- NumberOfDpus
-
- Type: int
The number of DPU hours consumed by the job.
- NumberOfManifestFilesDeleted
-
- Type: long (int|float)
The number of manifest files deleted by the retention job run.
- NumberOfManifestListsDeleted
-
- Type: long (int|float)
The number of manifest lists deleted by the retention job run.
IcebergTarget
Description
Specifies an Apache Iceberg data source where Iceberg tables are stored in Amazon S3.
Members
- ConnectionName
-
- Type: string
The name of the connection to use to connect to the Iceberg target.
- Exclusions
-
- Type: Array of strings
A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.
- MaximumTraversalDepth
-
- Type: int
The maximum depth of Amazon S3 paths that the crawler can traverse to discover the Iceberg metadata folder in your Amazon S3 path. Used to limit the crawler run time.
- Paths
-
- Type: Array of strings
One or more Amazon S3 paths that contain Iceberg metadata folders, in the form s3://bucket/prefix.
IdempotentParameterMismatchException
Description
The same unique identifier was associated with two different records.
Members
- Message
-
- Type: string
A message describing the problem.
IllegalBlueprintStateException
Description
The blueprint is in an invalid state to perform a requested operation.
Members
- Message
-
- Type: string
A message describing the problem.
IllegalSessionStateException
Description
The session is in an invalid state to perform a requested operation.
Members
- Message
-
- Type: string
A message describing the problem.
IllegalWorkflowStateException
Description
The workflow is in an invalid state to perform a requested operation.
Members
- Message
-
- Type: string
A message describing the problem.
ImportLabelsTaskRunProperties
Description
Specifies configuration properties for an importing labels task run.
Members
- InputS3Path
-
- Type: string
The Amazon Simple Storage Service (Amazon S3) path from where you will import the labels.
- Replace
-
- Type: boolean
Indicates whether to overwrite your existing labels.
InternalServiceException
Description
An internal service error occurred.
Members
- Message
-
- Type: string
A message describing the problem.
InvalidInputException
Description
The input provided was not valid.
Members
- FromFederationSource
-
- Type: boolean
Indicates whether or not the exception relates to a federated source.
- Message
-
- Type: string
A message describing the problem.
InvalidStateException
Description
An error that indicates your data is in an invalid state.
Members
- Message
-
- Type: string
A message describing the problem.
JDBCConnectorOptions
Description
Additional connection options for the connector.
Members
- DataTypeMapping
-
- Type: Associative array of custom strings keys (JDBCDataType) to strings
Custom data type mapping that builds a mapping from a JDBC data type to a Glue data type. For example, the option "dataTypeMapping":{"FLOAT":"STRING"} maps data fields of JDBC type FLOAT into the Java String type by calling the ResultSet.getString() method of the driver, and uses it to build the Glue record. The ResultSet object is implemented by each driver, so the behavior is specific to the driver you use. Refer to the documentation for your JDBC driver to understand how the driver performs the conversions.
- FilterPredicate
-
- Type: string
Extra condition clause to filter data from source. For example: BillingCity='Mountain View'
When using a query instead of a table name, you should validate that the query works with the specified filterPredicate.
- JobBookmarkKeys
-
- Type: Array of strings
The name of the job bookmark keys on which to sort.
- JobBookmarkKeysSortOrder
-
- Type: string
Specifies an ascending or descending sort order.
- LowerBound
-
- Type: long (int|float)
The minimum value of partitionColumn that is used to decide partition stride.
- NumPartitions
-
- Type: long (int|float)
The number of partitions. This value, along with lowerBound (inclusive) and upperBound (exclusive), forms partition strides for generated WHERE clause expressions that are used to split the partitionColumn.
- PartitionColumn
-
- Type: string
The name of an integer column that is used for partitioning. This option works only when it's included with lowerBound, upperBound, and numPartitions. This option works the same way as in the Spark SQL JDBC reader.
- UpperBound
-
- Type: long (int|float)
The maximum value of partitionColumn that is used to decide partition stride.
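The interaction of PartitionColumn, LowerBound, UpperBound, and NumPartitions can be easier to follow with a small sketch. The helper below is hypothetical (it is not part of the SDK or of Glue) and uses a simplified, evenly sized integer stride; the exact stride arithmetic used by Glue or Spark may differ. It only illustrates that the bounds shape the strides: the first and last strides are left open, so rows outside the bounds are still read.

```php
<?php
// Hypothetical helper, not part of the AWS SDK: a simplified sketch of how
// lowerBound, upperBound, and numPartitions could translate into WHERE clauses.
function sketchPartitionPredicates(string $column, int $lowerBound, int $upperBound, int $numPartitions): array
{
    if ($numPartitions < 1 || $upperBound <= $lowerBound) {
        throw new InvalidArgumentException('Expected numPartitions >= 1 and lowerBound < upperBound.');
    }
    if ($numPartitions === 1) {
        return []; // a single partition needs no WHERE clause
    }

    // Assumption: an evenly sized integer stride of at least 1.
    $stride = max(1, intdiv($upperBound - $lowerBound, $numPartitions));

    $predicates = [];
    $current = $lowerBound;
    for ($i = 0; $i < $numPartitions; $i++) {
        $lower = $i === 0 ? null : "$column >= $current";
        $current += $stride;
        $upper = $i === $numPartitions - 1 ? null : "$column < $current";

        if ($lower === null) {
            // First stride is open below, so small values and NULLs are not lost.
            $predicates[] = "$upper OR $column IS NULL";
        } elseif ($upper === null) {
            $predicates[] = $lower; // last stride is open above
        } else {
            $predicates[] = "$lower AND $upper";
        }
    }
    return $predicates;
}

// Example: four strides over an "id" column with bounds 0 and 100.
print_r(sketchPartitionPredicates('id', 0, 100, 4));
```

Each returned string corresponds to one partition's WHERE clause, so NumPartitions also bounds the number of parallel reads against the source.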
JDBCConnectorSource
Description
Specifies a connector to a JDBC data source.
Members
- AdditionalOptions
-
- Type: JDBCConnectorOptions structure
Additional connection options for the connector.
- ConnectionName
-
- Required: Yes
- Type: string
The name of the connection that is associated with the connector.
- ConnectionTable
-
- Type: string
The name of the table in the data source.
- ConnectionType
-
- Required: Yes
- Type: string
The type of connection, such as marketplace.jdbc or custom.jdbc, designating a connection to a JDBC data store.
- ConnectorName
-
- Required: Yes
- Type: string
The name of a connector that assists with accessing the data store in Glue Studio.
- Name
-
- Required: Yes
- Type: string
The name of the data source.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the custom JDBC source.
- Query
-
- Type: string
The table or SQL query to get the data from. You can specify either ConnectionTable or query, but not both.
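As an illustration of the required members and of the ConnectionTable/Query restriction, the following sketch assembles a JDBCConnectorSource-shaped array. The function name and all sample values are hypothetical; only the member names come from the structure described above.

```php
<?php
// Hypothetical helper, not part of the AWS SDK: assembles an array shaped
// like the JDBCConnectorSource structure and rejects ConnectionTable + Query.
function buildJdbcConnectorSource(
    string $name,
    string $connectionName,
    string $connectorName,
    string $connectionType,
    ?string $connectionTable = null,
    ?string $query = null
): array {
    if ($connectionTable !== null && $query !== null) {
        throw new InvalidArgumentException('Specify either ConnectionTable or Query, but not both.');
    }

    // The four required members.
    $source = [
        'Name'           => $name,
        'ConnectionName' => $connectionName,
        'ConnectorName'  => $connectorName,
        'ConnectionType' => $connectionType,
    ];

    // Optional members are added only when provided.
    if ($connectionTable !== null) {
        $source['ConnectionTable'] = $connectionTable;
    }
    if ($query !== null) {
        $source['Query'] = $query;
    }
    return $source;
}

// Sample values below are placeholders.
$source = buildJdbcConnectorSource('orders-source', 'my-connection', 'my-connector', 'custom.jdbc', 'orders');
print_r($source);
```

The resulting array would be nested inside a CodeGenConfigurationNode entry of a job definition; check the CodeGenConfigurationNode structure for the exact key before relying on it.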
JDBCConnectorTarget
Description
Specifies a data target that writes to a JDBC data store through a connector.
Members
- AdditionalOptions
-
- Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings
Additional connection options for the connector.
- ConnectionName
-
- Required: Yes
- Type: string
The name of the connection that is associated with the connector.
- ConnectionTable
-
- Required: Yes
- Type: string
The name of the table in the data target.
- ConnectionType
-
- Required: Yes
- Type: string
The type of connection, such as marketplace.jdbc or custom.jdbc, designating a connection to a JDBC data target.
- ConnectorName
-
- Required: Yes
- Type: string
The name of a connector that will be used.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the JDBC target.
JdbcTarget
Description
Specifies a JDBC data store to crawl.
Members
- ConnectionName
-
- Type: string
The name of the connection to use to connect to the JDBC target.
- EnableAdditionalMetadata
-
- Type: Array of strings
Specify a value of RAWTYPES or COMMENTS to enable additional metadata in table responses. RAWTYPES provides the native-level datatype. COMMENTS provides comments associated with a column or table in the database.
If you do not need additional metadata, keep the field empty.
- Exclusions
-
- Type: Array of strings
A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.
- Path
-
- Type: string
The path of the JDBC target.
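A JdbcTarget is a plain associative array when passed through the PHP client. The sketch below builds one and checks EnableAdditionalMetadata against the two values named above. The helper name, the connection name, the path, and the glob pattern are all illustrative assumptions.

```php
<?php
// Hypothetical helper, not part of the AWS SDK: assembles an array shaped
// like the JdbcTarget structure.
function buildJdbcTarget(string $connectionName, string $path, array $exclusions = [], array $additionalMetadata = []): array
{
    $allowed = ['RAWTYPES', 'COMMENTS'];
    foreach ($additionalMetadata as $value) {
        if (!in_array($value, $allowed, true)) {
            throw new InvalidArgumentException("Unsupported metadata option: $value");
        }
    }

    $target = [
        'ConnectionName' => $connectionName,
        'Path'           => $path,
    ];
    if ($exclusions !== []) {
        $target['Exclusions'] = array_values($exclusions);
    }
    // Leave the field out entirely when no additional metadata is wanted.
    if ($additionalMetadata !== []) {
        $target['EnableAdditionalMetadata'] = array_values(array_unique($additionalMetadata));
    }
    return $target;
}

// Placeholder connection name, path, and exclusion pattern.
print_r(buildJdbcTarget('my-jdbc-connection', 'salesdb/%', ['salesdb/tmp_*'], ['COMMENTS']));
```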
Job
Description
Specifies a job definition.
Members
- AllocatedCapacity
-
- Type: int
This field is deprecated. Use MaxCapacity instead.
The number of Glue data processing units (DPUs) allocated to runs of this job. You can allocate a minimum of 2 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
- CodeGenConfigurationNodes
-
- Type: Associative array of custom strings keys (NodeId) to CodeGenConfigurationNode structures
The representation of a directed acyclic graph on which both the Glue Studio visual component and Glue Studio code generation are based.
- Command
-
- Type: JobCommand structure
The JobCommand that runs this job.
- Connections
-
- Type: ConnectionsList structure
The connections used for this job.
- CreatedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time and date that this job definition was created.
- DefaultArguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
The default arguments for every run of this job, specified as name-value pairs.
You can specify arguments here that your own job-execution script consumes, as well as arguments that Glue itself consumes.
Job arguments may be logged. Do not pass plaintext secrets as arguments. Retrieve secrets from a Glue Connection, Secrets Manager or other secret management mechanism if you intend to keep them within the Job.
For information about how to specify and consume your own Job arguments, see the Calling Glue APIs in Python topic in the developer guide.
For information about the arguments you can provide to this field when configuring Spark jobs, see the Special Parameters Used by Glue topic in the developer guide.
For information about the arguments you can provide to this field when configuring Ray jobs, see Using job parameters in Ray jobs in the developer guide.
- Description
-
- Type: string
A description of the job.
- ExecutionClass
-
- Type: string
Indicates whether the job is run with a standard or flexible execution class. The standard execution class is ideal for time-sensitive workloads that require fast job startup and dedicated resources.
The flexible execution class is appropriate for time-insensitive jobs whose start and completion times may vary.
Only jobs with Glue version 3.0 and above and command type glueetl will be allowed to set ExecutionClass to FLEX. The flexible execution class is available for Spark jobs.
- ExecutionProperty
-
- Type: ExecutionProperty structure
An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.
- GlueVersion
-
- Type: string
In Spark jobs, GlueVersion determines the versions of Apache Spark and Python that Glue makes available in a job. The Python version indicates the version supported for jobs of type Spark.
Ray jobs should set GlueVersion to 4.0 or greater. However, the versions of Ray, Python and additional libraries available in your Ray job are determined by the Runtime parameter of the Job command.
For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.
Jobs that are created without specifying a Glue version default to Glue 0.9.
- JobMode
-
- Type: string
A mode that describes how a job was created. Valid values are:
- SCRIPT - The job was created using the Glue Studio script editor.
- VISUAL - The job was created using the Glue Studio visual editor.
- NOTEBOOK - The job was created using an interactive sessions notebook.
When the JobMode field is missing or null, SCRIPT is assigned as the default value.
- JobRunQueuingEnabled
-
- Type: boolean
Specifies whether job run queuing is enabled for the job runs for this job.
A value of true means job run queuing is enabled for the job runs. If false or not populated, the job runs will not be considered for queueing.
If this field does not match the value set in the job run, then the value from the job run field will be used.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last point in time when this job definition was modified.
- LogUri
-
- Type: string
This field is reserved for future use.
- MaintenanceWindow
-
- Type: string
This field specifies a day of the week and hour for a maintenance window for streaming jobs. Glue periodically performs maintenance activities. During these maintenance windows, Glue will need to restart your streaming jobs.
Glue will restart the job within 3 hours of the specified maintenance window. For instance, if you set up the maintenance window for Monday at 10:00AM GMT, your jobs will be restarted between 10:00AM GMT and 1:00PM GMT.
- MaxCapacity
-
- Type: double
For Glue version 1.0 or earlier jobs, using the standard worker type, the number of Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
For Glue version 2.0 or later jobs, you cannot specify a Maximum capacity. Instead, you should specify a Worker type and the Number of workers.
Do not set MaxCapacity if using WorkerType and NumberOfWorkers.
The value that can be allocated for MaxCapacity depends on whether you are running a Python shell job, an Apache Spark ETL job, or an Apache Spark streaming ETL job:
- When you specify a Python shell job (JobCommand.Name="pythonshell"), you can allocate either 0.0625 or 1 DPU. The default is 0.0625 DPU.
- When you specify an Apache Spark ETL job (JobCommand.Name="glueetl") or Apache Spark streaming ETL job (JobCommand.Name="gluestreaming"), you can allocate from 2 to 100 DPUs. The default is 10 DPUs. This job type cannot have a fractional DPU allocation.
- MaxRetries
-
- Type: int
The maximum number of times to retry this job after a JobRun fails.
- Name
-
- Type: string
The name you assign to this job definition.
- NonOverridableArguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
Arguments for this job that are not overridden when providing job arguments in a job run, specified as name-value pairs.
- NotificationProperty
-
- Type: NotificationProperty structure
Specifies configuration properties of a job notification.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated when a job runs.
- ProfileName
-
- Type: string
The name of a Glue usage profile associated with the job.
- Role
-
- Type: string
The name or Amazon Resource Name (ARN) of the IAM role associated with this job.
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used with this job.
- SourceControlDetails
-
- Type: SourceControlDetails structure
The details for a source control configuration for a job, allowing synchronization of job artifacts to or from a remote repository.
- Timeout
-
- Type: int
The job timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours) for batch jobs.
Streaming jobs must have timeout values less than 7 days or 10080 minutes. When the value is left blank, the job will be restarted after 7 days if you have not set up a maintenance window. If you have set up a maintenance window, it will be restarted during the maintenance window after 7 days.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, G.8X or G.025X for Spark jobs. Accepts the value Z.2X for Ray jobs.
- For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, to offer a scalable and cost-effective way to run most jobs.
- For the G.2X worker type, each worker maps to 2 DPUs (8 vCPUs, 32 GB of memory) with 128GB disk (approximately 77GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, to offer a scalable and cost-effective way to run most jobs.
- For the G.4X worker type, each worker maps to 4 DPUs (16 vCPUs, 64 GB of memory) with 256GB disk (approximately 235GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).
- For the G.8X worker type, each worker maps to 8 DPUs (32 vCPUs, 128 GB of memory) with 512GB disk (approximately 487GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.
- For the G.025X worker type, each worker maps to 0.25 DPU (2 vCPUs, 4 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for low volume streaming jobs. This worker type is only available for Glue version 3.0 streaming jobs.
- For the Z.2X worker type, each worker maps to 2 M-DPUs (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120GB free), and provides up to 8 Ray workers based on the autoscaler.
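The capacity rules described for MaxCapacity, WorkerType, and NumberOfWorkers can be checked on the client side before a job definition is sent. The following is a minimal sketch under that assumption; the function is hypothetical and the service remains the authority on what it accepts.

```php
<?php
// Hypothetical pre-flight check, not part of the AWS SDK. It mirrors the
// capacity rules described for the Job structure and returns a list of
// human-readable problems (an empty list means no rule was violated).
function checkJobCapacity(array $job): array
{
    $problems = [];
    $hasMaxCapacity = isset($job['MaxCapacity']);
    $hasWorkers = isset($job['WorkerType']) || isset($job['NumberOfWorkers']);

    if ($hasMaxCapacity && $hasWorkers) {
        $problems[] = 'Do not set MaxCapacity if using WorkerType and NumberOfWorkers.';
    }

    if ($hasMaxCapacity) {
        $command = $job['Command']['Name'] ?? null;
        $max = (float) $job['MaxCapacity'];

        if ($command === 'pythonshell' && !in_array($max, [0.0625, 1.0], true)) {
            $problems[] = 'Python shell jobs can allocate either 0.0625 or 1 DPU.';
        }
        if (in_array($command, ['glueetl', 'gluestreaming'], true)
            && ($max < 2 || $max > 100 || floor($max) !== $max)) {
            $problems[] = 'Spark jobs can allocate a whole number of DPUs from 2 to 100.';
        }
    }
    return $problems;
}

// Example: a Spark ETL job that mixes both capacity styles.
print_r(checkJobCapacity([
    'Command'     => ['Name' => 'glueetl'],
    'MaxCapacity' => 10,
    'WorkerType'  => 'G.1X',
]));
```

Returning a list of problems rather than throwing lets a caller report every violated rule at once.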
JobBookmarkEntry
Description
Defines a point that a job can resume processing.
Members
- Attempt
-
- Type: int
The attempt ID number.
- JobBookmark
-
- Type: string
The bookmark itself.
- JobName
-
- Type: string
The name of the job in question.
- PreviousRunId
-
- Type: string
The unique run identifier associated with the previous job run.
- Run
-
- Type: int
The run ID number.
- RunId
-
- Type: string
The run ID number.
- Version
-
- Type: int
The version of the job.
JobBookmarksEncryption
Description
Specifies how job bookmark data should be encrypted.
Members
- JobBookmarksEncryptionMode
-
- Type: string
The encryption mode to use for job bookmarks data.
- KmsKeyArn
-
- Type: string
The Amazon Resource Name (ARN) of the KMS key to be used to encrypt the data.
JobCommand
Description
Specifies code that runs when a job is run.
Members
- Name
-
- Type: string
The name of the job command. For an Apache Spark ETL job, this must be glueetl. For a Python shell job, it must be pythonshell. For an Apache Spark streaming ETL job, this must be gluestreaming. For a Ray job, this must be glueray.
- PythonVersion
-
- Type: string
The Python version being used to run a Python shell job. Allowed values are 2 or 3.
- Runtime
-
- Type: string
In Ray jobs, Runtime is used to specify the versions of Ray, Python and additional libraries available in your environment. This field is not used in other job types. For supported runtime environment values, see Supported Ray runtime environments in the Glue Developer Guide.
- ScriptLocation
-
- Type: string
Specifies the Amazon Simple Storage Service (Amazon S3) path to a script that runs a job.
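The four command names listed for Name can be kept in a single lookup so that callers do not repeat the literal strings. The sketch below is illustrative only: the function, the job-kind labels used as keys, and the S3 path are assumptions, while the command names and member names come from the descriptions above.

```php
<?php
// Hypothetical helper, not part of the AWS SDK: builds an array shaped like
// the JobCommand structure from a friendlier job-kind label.
function buildJobCommand(string $jobKind, string $scriptLocation, array $extra = []): array
{
    // Keys are illustrative labels; values are the documented command names.
    $commandNames = [
        'spark'       => 'glueetl',
        'pythonshell' => 'pythonshell',
        'streaming'   => 'gluestreaming',
        'ray'         => 'glueray',
    ];
    if (!isset($commandNames[$jobKind])) {
        throw new InvalidArgumentException("Unknown job kind: $jobKind");
    }

    // $extra can carry PythonVersion (Python shell jobs) or Runtime (Ray jobs).
    return ['Name' => $commandNames[$jobKind], 'ScriptLocation' => $scriptLocation] + $extra;
}

// The bucket and key below are placeholders.
print_r(buildJobCommand('pythonshell', 's3://example-bucket/scripts/job.py', ['PythonVersion' => '3']));
```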
JobNodeDetails
Description
The details of a Job node present in the workflow.
Members
- JobRuns
-
- Type: Array of JobRun structures
The information for the job runs represented by the job node.
JobRun
Description
Contains information about a job run.
Members
- AllocatedCapacity
-
- Type: int
This field is deprecated. Use MaxCapacity instead.
The number of Glue data processing units (DPUs) allocated to this JobRun. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
- Arguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
The job arguments associated with this run. For this job run, they replace the default arguments set in the job definition itself.
You can specify arguments here that your own job-execution script consumes, as well as arguments that Glue itself consumes.
Job arguments may be logged. Do not pass plaintext secrets as arguments. Retrieve secrets from a Glue Connection, Secrets Manager or other secret management mechanism if you intend to keep them within the Job.
For information about how to specify and consume your own Job arguments, see the Calling Glue APIs in Python topic in the developer guide.
For information about the arguments you can provide to this field when configuring Spark jobs, see the Special Parameters Used by Glue topic in the developer guide.
For information about the arguments you can provide to this field when configuring Ray jobs, see Using job parameters in Ray jobs in the developer guide.
- Attempt
-
- Type: int
The number of the attempt to run this job.
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that this job run completed.
- DPUSeconds
-
- Type: double
This field can be set either for job runs with execution class FLEX or when Auto Scaling is enabled, and represents the total time each executor ran during the lifecycle of a job run in seconds, multiplied by a DPU factor (1 for G.1X, 2 for G.2X, or 0.25 for G.025X workers). This value may differ from executionEngineRuntime * MaxCapacity, as in the case of Auto Scaling jobs, because the number of executors running at a given time may be less than the MaxCapacity. Therefore, it is possible that the value of DPUSeconds is less than executionEngineRuntime * MaxCapacity.
- ErrorMessage
-
- Type: string
An error message associated with this job run.
- ExecutionClass
-
- Type: string
Indicates whether the job is run with a standard or flexible execution class. The standard execution class is ideal for time-sensitive workloads that require fast job startup and dedicated resources.
The flexible execution class is appropriate for time-insensitive jobs whose start and completion times may vary.
Only jobs with Glue version 3.0 and above and command type glueetl will be allowed to set ExecutionClass to FLEX. The flexible execution class is available for Spark jobs.
- ExecutionTime
-
- Type: int
The amount of time (in seconds) that the job run consumed resources.
- GlueVersion
-
- Type: string
In Spark jobs, GlueVersion determines the versions of Apache Spark and Python that Glue makes available in a job. The Python version indicates the version supported for jobs of type Spark.
Ray jobs should set GlueVersion to 4.0 or greater. However, the versions of Ray, Python and additional libraries available in your Ray job are determined by the Runtime parameter of the Job command.
For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.
Jobs that are created without specifying a Glue version default to Glue 0.9.
- Id
-
- Type: string
The ID of this job run.
- JobMode
-
- Type: string
A mode that describes how a job was created. Valid values are:
- SCRIPT - The job was created using the Glue Studio script editor.
- VISUAL - The job was created using the Glue Studio visual editor.
- NOTEBOOK - The job was created using an interactive sessions notebook.
When the JobMode field is missing or null, SCRIPT is assigned as the default value.
- JobName
-
- Type: string
The name of the job definition being used in this run.
- JobRunQueuingEnabled
-
- Type: boolean
Specifies whether job run queuing is enabled for the job run.
A value of true means job run queuing is enabled for the job run. If false or not populated, the job run will not be considered for queueing.
- JobRunState
-
- Type: string
The current state of the job run. For more information about the statuses of jobs that have terminated abnormally, see Glue Job Run Statuses.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last time that this job run was modified.
- LogGroupName
-
- Type: string
The name of the log group for secure logging that can be server-side encrypted in Amazon CloudWatch using KMS. This name can be /aws-glue/jobs/, in which case the default encryption is NONE. If you add a role name and SecurityConfiguration name (in other words, /aws-glue/jobs-yourRoleName-yourSecurityConfigurationName/), then that security configuration is used to encrypt the log group.
- MaintenanceWindow
-
- Type: string
This field specifies a day of the week and hour for a maintenance window for streaming jobs. Glue periodically performs maintenance activities. During these maintenance windows, Glue will need to restart your streaming jobs.
Glue will restart the job within 3 hours of the specified maintenance window. For instance, if you set up the maintenance window for Monday at 10:00AM GMT, your jobs will be restarted between 10:00AM GMT and 1:00PM GMT.
- MaxCapacity
-
- Type: double
For Glue version 1.0 or earlier jobs, using the standard worker type, the number of Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
For Glue version 2.0+ jobs, you cannot specify a Maximum capacity. Instead, you should specify a Worker type and the Number of workers.
Do not set MaxCapacity if using WorkerType and NumberOfWorkers.
The value that can be allocated for MaxCapacity depends on whether you are running a Python shell job, an Apache Spark ETL job, or an Apache Spark streaming ETL job:
- When you specify a Python shell job (JobCommand.Name="pythonshell"), you can allocate either 0.0625 or 1 DPU. The default is 0.0625 DPU.
- When you specify an Apache Spark ETL job (JobCommand.Name="glueetl") or Apache Spark streaming ETL job (JobCommand.Name="gluestreaming"), you can allocate from 2 to 100 DPUs. The default is 10 DPUs. This job type cannot have a fractional DPU allocation.
- NotificationProperty
-
- Type: NotificationProperty structure
Specifies configuration properties of a job run notification.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated when a job runs.
- PredecessorRuns
-
- Type: Array of Predecessor structures
A list of predecessors to this job run.
- PreviousRunId
-
- Type: string
The ID of the previous run of this job. For example, the JobRunId specified in the StartJobRun action.
- ProfileName
-
- Type: string
The name of a Glue usage profile associated with the job run.
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used with this job run.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time at which this job run was started.
- StateDetail
-
- Type: string
This field holds details that pertain to the state of a job run. The field is nullable.
For example, when a job run is in a WAITING state as a result of job run queuing, the field has the reason why the job run is in that state.
- Timeout
-
- Type: int
The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. This value overrides the timeout value set in the parent job.
Streaming jobs must have timeout values less than 7 days or 10080 minutes. When the value is left blank, the job will be restarted after 7 days if you have not set up a maintenance window. If you have set up a maintenance window, it will be restarted during the maintenance window after 7 days.
- TriggerName
-
- Type: string
The name of the trigger that started this job run.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, G.8X or G.025X for Spark jobs. Accepts the value Z.2X for Ray jobs.
- For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, to offer a scalable and cost-effective way to run most jobs.
- For the G.2X worker type, each worker maps to 2 DPUs (8 vCPUs, 32 GB of memory) with 128GB disk (approximately 77GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, to offer a scalable and cost-effective way to run most jobs.
- For the G.4X worker type, each worker maps to 4 DPUs (16 vCPUs, 64 GB of memory) with 256GB disk (approximately 235GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).
- For the G.8X worker type, each worker maps to 8 DPUs (32 vCPUs, 128 GB of memory) with 512GB disk (approximately 487GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.
- For the G.025X worker type, each worker maps to 0.25 DPU (2 vCPUs, 4 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for low volume streaming jobs. This worker type is only available for Glue version 3.0 streaming jobs.
- For the Z.2X worker type, each worker maps to 2 M-DPUs (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120GB free), and provides up to 8 Ray workers based on the autoscaler.
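The DPUSeconds description above amounts to a small piece of arithmetic: sum the seconds each executor ran, then multiply by the DPU factor of the worker type. The sketch below works through that calculation. The function and the sample executor runtimes are hypothetical; the factors are the three listed for DPUSeconds, and the value Glue actually reports may be computed differently in detail.

```php
<?php
// Hypothetical helper, not part of the AWS SDK: reproduces the DPUSeconds
// arithmetic from per-executor runtimes (in seconds).
function estimateDpuSeconds(string $workerType, array $executorSeconds): float
{
    // DPU factors listed in the DPUSeconds description.
    $factors = ['G.1X' => 1.0, 'G.2X' => 2.0, 'G.025X' => 0.25];
    if (!isset($factors[$workerType])) {
        throw new InvalidArgumentException("No DPU factor listed for worker type: $workerType");
    }
    return array_sum($executorSeconds) * $factors[$workerType];
}

// Example: three G.2X executors, two of which were scaled in early.
// (600 + 300 + 300) seconds * 2 = 2400 DPU-seconds.
echo estimateDpuSeconds('G.2X', [600, 300, 300]), PHP_EOL;
```

In this example the estimate is below what three G.2X executors running for the full 600 seconds would accrue (3600 DPU-seconds), which matches the note that DPUSeconds can be lower than the full-capacity product when Auto Scaling removes executors.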
JobUpdate
Description
Specifies information used to update an existing job definition. The previous job definition is completely overwritten by this information.
Members
- AllocatedCapacity
-
- Type: int
This field is deprecated. Use MaxCapacity instead.
The number of Glue data processing units (DPUs) to allocate to this job. You can allocate a minimum of 2 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
- CodeGenConfigurationNodes
-
- Type: Associative array of custom strings keys (NodeId) to CodeGenConfigurationNode structures
The representation of a directed acyclic graph on which both the Glue Studio visual component and Glue Studio code generation are based.
- Command
-
- Type: JobCommand structure
The JobCommand that runs this job (required).
- Connections
-
- Type: ConnectionsList structure
The connections used for this job.
- DefaultArguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
The default arguments for every run of this job, specified as name-value pairs.
You can specify arguments here that your own job-execution script consumes, as well as arguments that Glue itself consumes.
Job arguments may be logged. Do not pass plaintext secrets as arguments. Retrieve secrets from a Glue Connection, Secrets Manager or other secret management mechanism if you intend to keep them within the Job.
For information about how to specify and consume your own Job arguments, see the Calling Glue APIs in Python topic in the developer guide.
For information about the arguments you can provide to this field when configuring Spark jobs, see the Special Parameters Used by Glue topic in the developer guide.
For information about the arguments you can provide to this field when configuring Ray jobs, see Using job parameters in Ray jobs in the developer guide.
- Description
-
- Type: string
Description of the job being defined.
- ExecutionClass
-
- Type: string
Indicates whether the job is run with a standard or flexible execution class. The standard execution class is ideal for time-sensitive workloads that require fast job startup and dedicated resources.
The flexible execution class is appropriate for time-insensitive jobs whose start and completion times may vary.
Only jobs with Glue version 3.0 and above and command type glueetl will be allowed to set ExecutionClass to FLEX. The flexible execution class is available for Spark jobs.
- ExecutionProperty
-
- Type: ExecutionProperty structure
An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.
- GlueVersion
-
- Type: string
In Spark jobs, GlueVersion determines the versions of Apache Spark and Python that Glue makes available in a job. The Python version indicates the version supported for jobs of type Spark.
Ray jobs should set GlueVersion to 4.0 or greater. However, the versions of Ray, Python and additional libraries available in your Ray job are determined by the Runtime parameter of the Job command.
For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.
Jobs that are created without specifying a Glue version default to Glue 0.9.
- JobMode
-
- Type: string
A mode that describes how a job was created. Valid values are:
- SCRIPT - The job was created using the Glue Studio script editor.
- VISUAL - The job was created using the Glue Studio visual editor.
- NOTEBOOK - The job was created using an interactive sessions notebook.
When the JobMode field is missing or null, SCRIPT is assigned as the default value.
- JobRunQueuingEnabled
-
- Type: boolean
Specifies whether job run queuing is enabled for the job runs for this job.
A value of true means job run queuing is enabled for the job runs. If false or not populated, the job runs will not be considered for queueing.
If this field does not match the value set in the job run, then the value from the job run field will be used.
- LogUri
-
- Type: string
This field is reserved for future use.
- MaintenanceWindow
-
- Type: string
This field specifies a day of the week and hour for a maintenance window for streaming jobs. Glue periodically performs maintenance activities. During these maintenance windows, Glue will need to restart your streaming jobs.
Glue will restart the job within 3 hours of the specified maintenance window. For instance, if you set up the maintenance window for Monday at 10:00AM GMT, your jobs will be restarted between 10:00AM GMT and 1:00PM GMT.
- MaxCapacity
-
- Type: double
For Glue version 1.0 or earlier jobs, using the standard worker type, the number of Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
For Glue version 2.0+ jobs, you cannot specify a Maximum capacity. Instead, you should specify a Worker type and the Number of workers.
Do not set MaxCapacity if using WorkerType and NumberOfWorkers.
The value that can be allocated for MaxCapacity depends on whether you are running a Python shell job, an Apache Spark ETL job, or an Apache Spark streaming ETL job:
- When you specify a Python shell job (JobCommand.Name="pythonshell"), you can allocate either 0.0625 or 1 DPU. The default is 0.0625 DPU.
- When you specify an Apache Spark ETL job (JobCommand.Name="glueetl") or Apache Spark streaming ETL job (JobCommand.Name="gluestreaming"), you can allocate from 2 to 100 DPUs. The default is 10 DPUs. This job type cannot have a fractional DPU allocation.
- MaxRetries
-
- Type: int
The maximum number of times to retry this job if it fails.
- NonOverridableArguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
Arguments for this job that are not overridden when providing job arguments in a job run, specified as name-value pairs.
- NotificationProperty
-
- Type: NotificationProperty structure
Specifies the configuration properties of a job notification.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated when a job runs.
- Role
-
- Type: string
The name or Amazon Resource Name (ARN) of the IAM role associated with this job (required).
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used with this job.
- SourceControlDetails
-
- Type: SourceControlDetails structure
The details for a source control configuration for a job, allowing synchronization of job artifacts to or from a remote repository.
- Timeout
-
- Type: int
The job timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours) for batch jobs.
Streaming jobs must have timeout values less than 7 days or 10080 minutes. When the value is left blank, the job will be restarted after 7 days if you have not set up a maintenance window. If you have set up a maintenance window, it will be restarted during the maintenance window after 7 days.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, G.8X or G.025X for Spark jobs. Accepts the value Z.2X for Ray jobs.
-
For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, as it offers a scalable and cost-effective way to run most jobs.
-
For the G.2X worker type, each worker maps to 2 DPU (8 vCPUs, 32 GB of memory) with 128GB disk (approximately 77GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, as it offers a scalable and cost-effective way to run most jobs.
-
For the G.4X worker type, each worker maps to 4 DPU (16 vCPUs, 64 GB of memory) with 256GB disk (approximately 235GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).
-
For the G.8X worker type, each worker maps to 8 DPU (32 vCPUs, 128 GB of memory) with 512GB disk (approximately 487GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.
-
For the G.025X worker type, each worker maps to 0.25 DPU (2 vCPUs, 4 GB of memory) with 84GB disk (approximately 34GB free), and provides 1 executor per worker. We recommend this worker type for low volume streaming jobs. This worker type is only available for Glue version 3.0 streaming jobs.
-
For the Z.2X worker type, each worker maps to 2 M-DPU (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120GB free), and provides up to 8 Ray workers based on the autoscaler.
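The MaxCapacity and WorkerType rules described above can be checked locally before a request is built. The following is a minimal sketch under stated assumptions, not SDK functionality: the names WORKER_DPU, jobCapacityProblems, and totalWorkerDpu are hypothetical, the per-worker values are transcribed from the WorkerType description, and the command name is passed separately to stand in for JobCommand.Name. It works on plain PHP arrays and calls no AWS API.

```php
<?php
// Hypothetical helpers for illustration; they are not part of Aws\Glue\GlueClient.

// DPUs per worker, transcribed from the WorkerType description (Z.2X is stated in M-DPU).
const WORKER_DPU = [
    'G.025X' => 0.25,
    'G.1X'   => 1.0,
    'G.2X'   => 2.0,
    'G.4X'   => 4.0,
    'G.8X'   => 8.0,
    'Z.2X'   => 2.0,
];

/**
 * Lists problems with the capacity settings of a job parameter array.
 * $commandName stands for JobCommand.Name ("pythonshell", "glueetl", "gluestreaming").
 */
function jobCapacityProblems(array $job, string $commandName): array
{
    $problems = [];
    $hasMax = isset($job['MaxCapacity']);

    // MaxCapacity is not combined with WorkerType / NumberOfWorkers.
    if ($hasMax && (isset($job['WorkerType']) || isset($job['NumberOfWorkers']))) {
        $problems[] = 'Do not set MaxCapacity when using WorkerType and NumberOfWorkers.';
    }
    if (isset($job['WorkerType']) && !array_key_exists($job['WorkerType'], WORKER_DPU)) {
        $problems[] = 'Unrecognized WorkerType: ' . $job['WorkerType'];
    }
    if ($hasMax) {
        $max = (float) $job['MaxCapacity'];
        // Python shell jobs: either 0.0625 or 1 DPU.
        if ($commandName === 'pythonshell' && $max !== 0.0625 && $max !== 1.0) {
            $problems[] = 'Python shell jobs can allocate either 0.0625 or 1 DPU.';
        }
        // Spark ETL and streaming ETL jobs: 2 to 100 DPUs, no fractions.
        $isSpark = in_array($commandName, ['glueetl', 'gluestreaming'], true);
        if ($isSpark && ($max < 2 || $max > 100 || floor($max) !== $max)) {
            $problems[] = 'Spark jobs can allocate a whole number of DPUs from 2 to 100.';
        }
    }
    return $problems;
}

/** Total DPUs (M-DPUs for Z.2X) implied by a worker type and a worker count. */
function totalWorkerDpu(string $workerType, int $numberOfWorkers): float
{
    return WORKER_DPU[$workerType] * $numberOfWorkers;
}
```

An empty array from jobCapacityProblems means none of the rules encoded here were violated; the service may still apply checks that this sketch does not cover, such as Region or Glue version availability for a worker type.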
Join
Description
Specifies a transform that joins two datasets into one dataset using a comparison phrase on the specified data property keys. You can use inner, outer, left, right, left semi, and left anti joins.
Members
- Columns
-
- Required: Yes
- Type: Array of JoinColumn structures
A list of the two columns to be joined.
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names.
- JoinType
-
- Required: Yes
- Type: string
Specifies the type of join to be performed on the datasets.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
JoinColumn
Description
Specifies a column to be joined.
Members
- From
-
- Required: Yes
- Type: string
The column to be joined.
- Keys
-
- Required: Yes
- Type: Array of arrays of strings
The key of the column to be joined.
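As a rough illustration of how the Join and JoinColumn members fit together, the sketch below builds a Join node as a plain PHP array and checks the required members. Everything specific in it is an assumption made for the example: the node and column names are invented, 'left' is an assumed spelling of the left-join value, the nesting of Keys (a list of key paths, each a list of strings) should be verified against the service model, and joinNodeProblems is a hypothetical helper rather than an SDK function.

```php
<?php
// Hypothetical node and column names, for illustration only.
$join = [
    'Name'     => 'JoinOrdersCustomers',
    'Inputs'   => ['node-orders', 'node-customers'],
    'JoinType' => 'left', // assumed spelling of the left-join value
    'Columns'  => [
        ['From' => 'node-orders',    'Keys' => [['customer_id']]],
        ['From' => 'node-customers', 'Keys' => [['id']]],
    ],
];

/** Lists missing required members of a Join node array (empty when none). */
function joinNodeProblems(array $join): array
{
    $problems = [];
    foreach (['Name', 'Inputs', 'JoinType', 'Columns'] as $required) {
        if (empty($join[$required])) {
            $problems[] = sprintf('%s is required.', $required);
        }
    }
    // Columns is described as a list of the two columns to be joined.
    if (count($join['Columns'] ?? []) !== 2) {
        $problems[] = 'Columns must list the two columns to be joined.';
    }
    foreach ($join['Columns'] ?? [] as $i => $column) {
        foreach (['From', 'Keys'] as $required) {
            if (empty($column[$required])) {
                $problems[] = sprintf('Columns[%d].%s is required.', $i, $required);
            }
        }
    }
    return $problems;
}
```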
JsonClassifier
Description
A classifier for JSON content.
Members
- CreationTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time that this classifier was registered.
- JsonPath
-
- Required: Yes
- Type: string
A JsonPath string defining the JSON data for the classifier to classify. Glue supports a subset of JsonPath, as described in Writing JsonPath Custom Classifiers.
- LastUpdated
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time that this classifier was last updated.
- Name
-
- Required: Yes
- Type: string
The name of the classifier.
- Version
-
- Type: long (int|float)
The version of this classifier.
KafkaStreamingSourceOptions
Description
Additional options for streaming.
Members
- AddRecordTimestamp
-
- Type: string
When this option is set to 'true', the data output will contain an additional column named "__src_timestamp" that indicates the time when the corresponding record was received by the topic. The default value is 'false'. This option is supported in Glue version 4.0 or later.
- Assign
-
- Type: string
The specific TopicPartitions to consume. You must specify at least one of "topicName", "assign" or "subscribePattern".
- BootstrapServers
-
- Type: string
A list of bootstrap server URLs, for example, as b-1.vpc-test-2.o4q88o.c6.kafka.us-east-1.amazonaws.com:9094. This option must be specified in the API call or defined in the table metadata in the Data Catalog.
- Classification
-
- Type: string
An optional classification.
- ConnectionName
-
- Type: string
The name of the connection.
- Delimiter
-
- Type: string
Specifies the delimiter character.
- EmitConsumerLagMetrics
-
- Type: string
When this option is set to 'true', for each batch, it will emit the metrics for the duration between the oldest record received by the topic and the time it arrives in Glue to CloudWatch. The metric's name is "glue.driver.streaming.maxConsumerLagInMs". The default value is 'false'. This option is supported in Glue version 4.0 or later.
- EndingOffsets
-
- Type: string
The end point when a batch query is ended. Possible values are either "latest" or a JSON string that specifies an ending offset for each TopicPartition.
- IncludeHeaders
-
- Type: boolean
Whether to include the Kafka headers. When the option is set to "true", the data output will contain an additional column named "glue_streaming_kafka_headers" with type Array[Struct(key: String, value: String)]. The default value is "false". This option is available in Glue version 3.0 or later only.
- MaxOffsetsPerTrigger
-
- Type: long (int|float)
The rate limit on the maximum number of offsets that are processed per trigger interval. The specified total number of offsets is proportionally split across topicPartitions of different volumes. The default value is null, which means that the consumer reads all offsets until the known latest offset.
- MinPartitions
-
- Type: int
The desired minimum number of partitions to read from Kafka. The default value is null, which means that the number of Spark partitions is equal to the number of Kafka partitions.
- NumRetries
-
- Type: int
The number of times to retry before failing to fetch Kafka offsets. The default value is 3.
- PollTimeoutMs
-
- Type: long (int|float)
The timeout in milliseconds to poll data from Kafka in Spark job executors. The default value is 512.
- RetryIntervalMs
-
- Type: long (int|float)
The time in milliseconds to wait before retrying to fetch Kafka offsets. The default value is 10.
- SecurityProtocol
-
- Type: string
The protocol used to communicate with brokers. The possible values are "SSL" or "PLAINTEXT".
- StartingOffsets
-
- Type: string
The starting position in the Kafka topic to read data from. The possible values are "earliest" or "latest". The default value is "latest".
- StartingTimestamp
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp of the record in the Kafka topic to start reading data from. The possible values are a timestamp string in UTC format of the pattern yyyy-mm-ddTHH:MM:SSZ (where Z represents a UTC timezone offset with a +/-. For example: "2023-04-04T08:00:00+08:00").
Only one of StartingTimestamp or StartingOffsets must be set.
- SubscribePattern
-
- Type: string
A Java regex string that identifies the topic list to subscribe to. You must specify at least one of "topicName", "assign" or "subscribePattern".
- TopicName
-
- Type: string
The topic name as specified in Apache Kafka. You must specify at least one of "topicName", "assign" or "subscribePattern".
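Several KafkaStreamingSourceOptions members constrain one another: at least one of TopicName, Assign, or SubscribePattern is needed, StartingTimestamp and StartingOffsets are alternatives, and SecurityProtocol has two documented values. A minimal sketch of those checks follows, assuming a hypothetical helper named kafkaOptionProblems and invented connection and topic names. It inspects a plain array and calls no AWS API.

```php
<?php
// Hypothetical helper for illustration; not part of the SDK.
function kafkaOptionProblems(array $options): array
{
    $problems = [];

    // At least one topic selector is needed.
    $selectors = ['TopicName', 'Assign', 'SubscribePattern'];
    $present = array_filter($selectors, function (string $key) use ($options): bool {
        return isset($options[$key]) && $options[$key] !== '';
    });
    if (count($present) === 0) {
        $problems[] = 'Specify at least one of TopicName, Assign, or SubscribePattern.';
    }

    // The two starting-position options are alternatives.
    if (isset($options['StartingTimestamp'], $options['StartingOffsets'])) {
        $problems[] = 'Set only one of StartingTimestamp or StartingOffsets.';
    }

    // Documented values are "SSL" or "PLAINTEXT".
    if (isset($options['SecurityProtocol'])
        && !in_array($options['SecurityProtocol'], ['SSL', 'PLAINTEXT'], true)) {
        $problems[] = 'SecurityProtocol must be "SSL" or "PLAINTEXT".';
    }
    return $problems;
}

// Example values (the connection and topic names are made up).
$kafkaOptions = [
    'ConnectionName'   => 'example-kafka-connection',
    'TopicName'        => 'example-topic',
    'StartingOffsets'  => 'earliest',
    'SecurityProtocol' => 'SSL',
    'IncludeHeaders'   => true,
    'NumRetries'       => 3,
];
```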
KeySchemaElement
Description
A partition key pair consisting of a name and a type.
Members
- Name
-
- Required: Yes
- Type: string
The name of a partition key.
- Type
-
- Required: Yes
- Type: string
The type of a partition key.
KinesisStreamingSourceOptions
Description
Additional options for the Amazon Kinesis streaming data source.
Members
- AddIdleTimeBetweenReads
-
- Type: boolean
Adds a time delay between two consecutive getRecords operations. The default value is "False". This option is only configurable for Glue version 2.0 and above.
- AddRecordTimestamp
-
- Type: string
When this option is set to 'true', the data output will contain an additional column named "__src_timestamp" that indicates the time when the corresponding record was received by the stream. The default value is 'false'. This option is supported in Glue version 4.0 or later.
- AvoidEmptyBatches
-
- Type: boolean
Avoids creating an empty microbatch job by checking for unread data in the Kinesis data stream before the batch is started. The default value is "False".
- Classification
-
- Type: string
An optional classification.
- Delimiter
-
- Type: string
Specifies the delimiter character.
- DescribeShardInterval
-
- Type: long (int|float)
The minimum time interval between two ListShards API calls for your script to consider resharding. The default value is 1s.
- EmitConsumerLagMetrics
-
- Type: string
When this option is set to 'true', for each batch, it will emit the metrics for the duration between the oldest record received by the stream and the time it arrives in Glue to CloudWatch. The metric's name is "glue.driver.streaming.maxConsumerLagInMs". The default value is 'false'. This option is supported in Glue version 4.0 or later.
- EndpointUrl
-
- Type: string
The URL of the Kinesis endpoint.
- IdleTimeBetweenReadsInMs
-
- Type: long (int|float)
The minimum time delay between two consecutive getRecords operations, specified in ms. The default value is 1000. This option is only configurable for Glue version 2.0 and above.
- MaxFetchRecordsPerShard
-
- Type: long (int|float)
The maximum number of records to fetch per shard in the Kinesis data stream per microbatch. Note: The client can exceed this limit if the streaming job has already read extra records from Kinesis (in the same get-records call). If MaxFetchRecordsPerShard needs to be strict then it needs to be a multiple of MaxRecordPerRead. The default value is 100000.
- MaxFetchTimeInMs
-
- Type: long (int|float)
The maximum time spent for the job executor to read records for the current batch from the Kinesis data stream, specified in milliseconds (ms). Multiple GetRecords API calls may be made within this time. The default value is 1000.
- MaxRecordPerRead
-
- Type: long (int|float)
The maximum number of records to fetch from the Kinesis data stream in each getRecords operation. The default value is 10000.
- MaxRetryIntervalMs
-
- Type: long (int|float)
The maximum cool-off time period (specified in ms) between two retries of a Kinesis Data Streams API call. The default value is 10000.
- NumRetries
-
- Type: int
The maximum number of retries for Kinesis Data Streams API requests. The default value is 3.
- RetryIntervalMs
-
- Type: long (int|float)
The cool-off time period (specified in ms) before retrying the Kinesis Data Streams API call. The default value is 1000.
- RoleArn
-
- Type: string
The Amazon Resource Name (ARN) of the role to assume using AWS Security Token Service (AWS STS). This role must have permissions for describe or read record operations for the Kinesis data stream. You must use this parameter when accessing a data stream in a different account. Used in conjunction with "awsSTSSessionName".
- RoleSessionName
-
- Type: string
An identifier for the session assuming the role using AWS STS. You must use this parameter when accessing a data stream in a different account. Used in conjunction with "awsSTSRoleARN".
- StartingPosition
-
- Type: string
The starting position in the Kinesis data stream to read data from. The possible values are "latest", "trim_horizon", "earliest", or a timestamp string in UTC format in the pattern yyyy-mm-ddTHH:MM:SSZ (where Z represents a UTC timezone offset with a +/-. For example: "2023-04-04T08:00:00-04:00"). The default value is "latest".
Note: Using a value that is a timestamp string in UTC format for "startingPosition" is supported only for Glue version 4.0 or later.
- StartingTimestamp
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp of the record in the Kinesis data stream to start reading data from. The possible values are a timestamp string in UTC format of the pattern yyyy-mm-ddTHH:MM:SSZ (where Z represents a UTC timezone offset with a +/-. For example: "2023-04-04T08:00:00+08:00").
- StreamArn
-
- Type: string
The Amazon Resource Name (ARN) of the Kinesis data stream.
- StreamName
-
- Type: string
The name of the Kinesis data stream.
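Two of the KinesisStreamingSourceOptions notes lend themselves to a local check: MaxFetchRecordsPerShard acts as a strict limit only when it is a multiple of MaxRecordPerRead, and RoleArn and RoleSessionName are used together for cross-account access. The sketch below is one possible way to express that, assuming a hypothetical helper kinesisOptionProblems and using the documented defaults (100000 and 10000) when a value is not supplied. The ARNs in the usage note are placeholders.

```php
<?php
// Documented defaults for the two fetch limits.
const KINESIS_FETCH_DEFAULTS = [
    'MaxFetchRecordsPerShard' => 100000,
    'MaxRecordPerRead'        => 10000,
];

/** Hypothetical helper: lists problems found in a Kinesis options array. */
function kinesisOptionProblems(array $options): array
{
    $problems = [];

    // Fill in the documented defaults for any limit that was not supplied.
    $effective = $options + KINESIS_FETCH_DEFAULTS;
    $perRead = (int) $effective['MaxRecordPerRead'];
    $perShard = (int) $effective['MaxFetchRecordsPerShard'];

    if ($perRead < 1) {
        $problems[] = 'MaxRecordPerRead must be a positive number.';
    } elseif ($perShard % $perRead !== 0) {
        $problems[] = 'MaxFetchRecordsPerShard is strict only when it is a multiple of MaxRecordPerRead.';
    }

    // Cross-account access uses RoleArn and RoleSessionName together.
    if (isset($options['RoleArn']) !== isset($options['RoleSessionName'])) {
        $problems[] = 'Set RoleArn and RoleSessionName together.';
    }
    return $problems;
}
```

For example, passing MaxFetchRecordsPerShard of 25000 with the default MaxRecordPerRead reports the multiple-of rule, because 25000 is not a multiple of 10000.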
LabelingSetGenerationTaskRunProperties
Description
Specifies configuration properties for a labeling set generation task run.
Members
- OutputS3Path
-
- Type: string
The Amazon Simple Storage Service (Amazon S3) path where you will generate the labeling set.
LakeFormationConfiguration
Description
Specifies Lake Formation configuration settings for the crawler.
Members
- AccountId
-
- Type: string
Required for cross account crawls. For same account crawls as the target data, this can be left as null.
- UseLakeFormationCredentials
-
- Type: boolean
Specifies whether to use Lake Formation credentials for the crawler instead of the IAM role credentials.
LastActiveDefinition
Description
When there are multiple versions of a blueprint and the latest version has some errors, this attribute indicates the last successful blueprint definition that is available with the service.
Members
- BlueprintLocation
-
- Type: string
Specifies a path in Amazon S3 where the blueprint is published by the Glue developer.
- BlueprintServiceLocation
-
- Type: string
Specifies a path in Amazon S3 where the blueprint is copied when you create or update the blueprint.
- Description
-
- Type: string
The description of the blueprint.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time the blueprint was last modified.
- ParameterSpec
-
- Type: string
A JSON string specifying the parameters for the blueprint.
LastCrawlInfo
Description
Status and error information about the most recent crawl.
Members
- ErrorMessage
-
- Type: string
If an error occurred, the error information about the last crawl.
- LogGroup
-
- Type: string
The log group for the last crawl.
- LogStream
-
- Type: string
The log stream for the last crawl.
- MessagePrefix
-
- Type: string
The prefix for a message about this crawl.
- StartTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time at which the crawl started.
- Status
-
- Type: string
Status of the last crawl.
LineageConfiguration
Description
Specifies data lineage configuration settings for the crawler.
Members
- CrawlerLineageSettings
-
- Type: string
Specifies whether data lineage is enabled for the crawler. Valid values are:
-
ENABLE: enables data lineage for the crawler
-
DISABLE: disables data lineage for the crawler
Location
Description
The location of resources.
Members
- DynamoDB
-
- Type: Array of CodeGenNodeArg structures
An Amazon DynamoDB table location.
- Jdbc
-
- Type: Array of CodeGenNodeArg structures
A JDBC location.
- S3
-
- Type: Array of CodeGenNodeArg structures
An Amazon Simple Storage Service (Amazon S3) location.
LongColumnStatisticsData
Description
Defines column statistics supported for integer data columns.
Members
- MaximumValue
-
- Type: long (int|float)
The highest value in the column.
- MinimumValue
-
- Type: long (int|float)
The lowest value in the column.
- NumberOfDistinctValues
-
- Required: Yes
- Type: long (int|float)
The number of distinct values in a column.
- NumberOfNulls
-
- Required: Yes
- Type: long (int|float)
The number of null values in the column.
MLTransform
Description
A structure for a machine learning transform.
Members
- CreatedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
A timestamp. The time and date that this machine learning transform was created.
- Description
-
- Type: string
A user-defined, long-form description text for the machine learning transform. Descriptions are not guaranteed to be unique and can be changed at any time.
- EvaluationMetrics
-
- Type: EvaluationMetrics structure
An EvaluationMetrics object. Evaluation metrics provide an estimate of the quality of your machine learning transform.
- GlueVersion
-
- Type: string
This value determines which version of Glue this machine learning transform is compatible with. Glue 1.0 is recommended for most customers. If the value is not set, the Glue compatibility defaults to Glue 0.9. For more information, see Glue Versions in the developer guide.
- InputRecordTables
-
- Type: Array of GlueTable structures
A list of Glue table definitions used by the transform.
- LabelCount
-
- Type: int
A count identifier for the labeling files generated by Glue for this transform. As you create a better transform, you can iteratively download, label, and upload the labeling file.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
A timestamp. The last point in time when this machine learning transform was modified.
- MaxCapacity
-
- Type: double
The number of Glue data processing units (DPUs) that are allocated to task runs for this transform. You can allocate from 2 to 100 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
MaxCapacity is a mutually exclusive option with NumberOfWorkers and WorkerType.
-
If either NumberOfWorkers or WorkerType is set, then MaxCapacity cannot be set.
-
If MaxCapacity is set then neither NumberOfWorkers nor WorkerType can be set.
-
If WorkerType is set, then NumberOfWorkers is required (and vice versa).
-
MaxCapacity and NumberOfWorkers must both be at least 1.
When the WorkerType field is set to a value other than Standard, the MaxCapacity field is set automatically and becomes read-only.
- MaxRetries
-
- Type: int
The maximum number of times to retry after an MLTaskRun of the machine learning transform fails.
- Name
-
- Type: string
A user-defined name for the machine learning transform. Names are not guaranteed unique and can be changed at any time.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated when a task of the transform runs.
If WorkerType is set, then NumberOfWorkers is required (and vice versa).
- Parameters
-
- Type: TransformParameters structure
A TransformParameters object. You can use parameters to tune (customize) the behavior of the machine learning transform by specifying what data it learns from and your preference on various tradeoffs (such as precision vs. recall, or accuracy vs. cost).
- Role
-
- Type: string
The name or Amazon Resource Name (ARN) of the IAM role with the required permissions. The required permissions include both Glue service role permissions to Glue resources, and Amazon S3 permissions required by the transform.
-
This role needs Glue service role permissions to allow access to resources in Glue. See Attach a Policy to IAM Users That Access Glue.
-
This role needs permission to your Amazon Simple Storage Service (Amazon S3) sources, targets, temporary directory, scripts, and any libraries used by the task run for this transform.
- Schema
-
- Type: Array of SchemaColumn structures
A map of key-value pairs representing the columns and data types that this transform can run against. Has an upper bound of 100 columns.
- Status
-
- Type: string
The current status of the machine learning transform.
- Timeout
-
- Type: int
The timeout in minutes of the machine learning transform.
- TransformEncryption
-
- Type: TransformEncryption structure
The encryption-at-rest settings of the transform that apply to accessing user data. Machine learning transforms can access user data encrypted in Amazon S3 using KMS.
- TransformId
-
- Type: string
The unique transform ID that is generated for the machine learning transform. The ID is guaranteed to be unique and does not change.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated when a task of this transform runs. Accepts a value of Standard, G.1X, or G.2X.
-
For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory and a 50GB disk, and 2 executors per worker.
-
For the G.1X worker type, each worker provides 4 vCPU, 16 GB of memory and a 64GB disk, and 1 executor per worker.
-
For the G.2X worker type, each worker provides 8 vCPU, 32 GB of memory and a 128GB disk, and 1 executor per worker.
MaxCapacity is a mutually exclusive option with NumberOfWorkers and WorkerType.
-
If either NumberOfWorkers or WorkerType is set, then MaxCapacity cannot be set.
-
If MaxCapacity is set then neither NumberOfWorkers nor WorkerType can be set.
-
If WorkerType is set, then NumberOfWorkers is required (and vice versa).
-
MaxCapacity and NumberOfWorkers must both be at least 1.
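The mutual-exclusion rules for MaxCapacity, NumberOfWorkers, and WorkerType on a machine learning transform can be written down as a small check. This is a sketch only: mlTransformCapacityProblems is a hypothetical name, the accepted worker types are the three listed above, and the function looks at a plain parameter array without contacting the service.

```php
<?php
/** Hypothetical helper: lists violations of the MLTransform capacity rules. */
function mlTransformCapacityProblems(array $params): array
{
    $problems = [];
    $hasMax = isset($params['MaxCapacity']);
    $hasType = isset($params['WorkerType']);
    $hasCount = isset($params['NumberOfWorkers']);

    // MaxCapacity excludes NumberOfWorkers and WorkerType, and the reverse.
    if ($hasMax && ($hasType || $hasCount)) {
        $problems[] = 'MaxCapacity cannot be combined with NumberOfWorkers or WorkerType.';
    }
    // WorkerType and NumberOfWorkers are required together.
    if ($hasType !== $hasCount) {
        $problems[] = 'WorkerType and NumberOfWorkers must be set together.';
    }
    // Both numeric settings must be at least 1.
    if ($hasMax && $params['MaxCapacity'] < 1) {
        $problems[] = 'MaxCapacity must be at least 1.';
    }
    if ($hasCount && $params['NumberOfWorkers'] < 1) {
        $problems[] = 'NumberOfWorkers must be at least 1.';
    }
    // Accepted values listed for this structure.
    if ($hasType && !in_array($params['WorkerType'], ['Standard', 'G.1X', 'G.2X'], true)) {
        $problems[] = 'WorkerType must be Standard, G.1X, or G.2X.';
    }
    return $problems;
}
```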
MLTransformNotReadyException
Description
The machine learning transform is not ready to run.
Members
- Message
-
- Type: string
A message describing the problem.
MLUserDataEncryption
Description
The encryption-at-rest settings of the transform that apply to accessing user data.
Members
- KmsKeyId
-
- Type: string
The ID for the customer-provided KMS key.
- MlUserDataEncryptionMode
-
- Required: Yes
- Type: string
The encryption mode applied to user data. Valid values are:
-
DISABLED: encryption is disabled
-
SSEKMS: use of server-side encryption with Key Management Service (SSE-KMS) for user data stored in Amazon S3.
Mapping
Description
Specifies the mapping of data property keys.
Members
- Children
-
- Type: Array of Mapping structures
Only applicable to nested data structures. If you want to change the parent structure, but also one of its children, you can fill out this data structure. It is also Mapping, but its FromPath will be the parent's FromPath plus the FromPath from this structure.
For the children part, suppose you have the structure:
{ "FromPath": "OuterStructure", "ToKey": "OuterStructure", "ToType": "Struct", "Dropped": false, "Children": [{ "FromPath": "inner", "ToKey": "inner", "ToType": "Double", "Dropped": false }] }
You can specify a Mapping that looks like:
{ "FromPath": "OuterStructure", "ToKey": "OuterStructure", "ToType": "Struct", "Dropped": false, "Children": [{ "FromPath": "inner", "ToKey": "inner", "ToType": "Double", "Dropped": false }] }
- Dropped
-
- Type: boolean
If true, then the column is removed.
- FromPath
-
- Type: Array of strings
The table or column to be modified.
- FromType
-
- Type: string
The type of the data to be modified.
- ToKey
-
- Type: string
After the apply mapping, what the name of the column should be. Can be the same as FromPath.
- ToType
-
- Type: string
The data type that the data is to be modified to.
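The Children description says that a child's effective FromPath is the parent's FromPath plus the child's own. A minimal sketch of that rule follows, assuming FromPath is given as an array of strings (as in the type line above) and using a hypothetical helper, effectivePaths, that joins path segments with "." purely for display.

```php
<?php
/**
 * Hypothetical helper: returns the effective source path of a mapping and of
 * every nested child, keyed by the dotted path, with the target type as value.
 */
function effectivePaths(array $mapping, array $parentPath = []): array
{
    // A child's path is appended to its parent's path.
    $path = array_merge($parentPath, (array) ($mapping['FromPath'] ?? []));
    $rows = [implode('.', $path) => $mapping['ToType'] ?? null];

    foreach ($mapping['Children'] ?? [] as $child) {
        $rows += effectivePaths($child, $path);
    }
    return $rows;
}

// The nested structure from the Children description, written as a PHP array.
$mapping = [
    'FromPath' => ['OuterStructure'],
    'ToKey'    => 'OuterStructure',
    'ToType'   => 'Struct',
    'Dropped'  => false,
    'Children' => [[
        'FromPath' => ['inner'],
        'ToKey'    => 'inner',
        'ToType'   => 'Double',
        'Dropped'  => false,
    ]],
];
```

Calling effectivePaths($mapping) yields two entries, one for OuterStructure (Struct) and one for OuterStructure.inner (Double).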
MappingEntry
Description
Defines a mapping.
Members
- SourcePath
-
- Type: string
The source path.
- SourceTable
-
- Type: string
The name of the source table.
- SourceType
-
- Type: string
The source type.
- TargetPath
-
- Type: string
The target path.
- TargetTable
-
- Type: string
The target table.
- TargetType
-
- Type: string
The target type.
Merge
Description
Specifies a transform that merges a DynamicFrame with a staging DynamicFrame based on the specified primary keys to identify records. Duplicate records (records with the same primary keys) are not de-duplicated.
Members
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
- PrimaryKeys
-
- Required: Yes
- Type: Array of arrays of strings
The list of primary key fields to match records from the source and staging dynamic frames.
- Source
-
- Required: Yes
- Type: string
The source DynamicFrame that will be merged with a staging DynamicFrame.
MetadataInfo
Description
A structure containing metadata information for a schema version.
Members
- CreatedTime
-
- Type: string
The time at which the entry was created.
- MetadataValue
-
- Type: string
The metadata key’s corresponding value.
- OtherMetadataValueList
-
- Type: Array of OtherMetadataValueListItem structures
Other metadata belonging to the same metadata key.
MetadataKeyValuePair
Description
A structure containing a key value pair for metadata.
Members
- MetadataKey
-
- Type: string
A metadata key.
- MetadataValue
-
- Type: string
A metadata key’s corresponding value.
MetricBasedObservation
Description
Describes the metric based observation generated based on evaluated data quality metrics.
Members
- MetricName
-
- Type: string
The name of the data quality metric used for generating the observation.
- MetricValues
-
- Type: DataQualityMetricValues structure
An object of type DataQualityMetricValues representing the analysis of the data quality metric value.
- NewRules
-
- Type: Array of strings
A list of new data quality rules generated as part of the observation based on the data quality metric value.
- StatisticId
-
- Type: string
The Statistic ID.
MicrosoftSQLServerCatalogSource
Description
Specifies a Microsoft SQL server data source in the Glue Data Catalog.
Members
- Database
-
- Required: Yes
- Type: string
The name of the database to read from.
- Name
-
- Required: Yes
- Type: string
The name of the data source.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to read from.
MicrosoftSQLServerCatalogTarget
Description
Specifies a target that uses Microsoft SQL.
Members
- Database
-
- Required: Yes
- Type: string
The name of the database to write to.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to write to.
MongoDBTarget
Description
Specifies an Amazon DocumentDB or MongoDB data store to crawl.
Members
- ConnectionName
-
- Type: string
The name of the connection to use to connect to the Amazon DocumentDB or MongoDB target.
- Path
-
- Type: string
The path of the Amazon DocumentDB or MongoDB target (database/collection).
- ScanAll
-
- Type: boolean
Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table.
A value of true means to scan all records, while a value of false means to sample the records. If no value is specified, the value defaults to true.
MySQLCatalogSource
Description
Specifies a MySQL data source in the Glue Data Catalog.
Members
- Database
-
- Required: Yes
- Type: string
The name of the database to read from.
- Name
-
- Required: Yes
- Type: string
The name of the data source.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to read from.
MySQLCatalogTarget
Description
Specifies a target that uses MySQL.
Members
- Database
-
- Required: Yes
- Type: string
The name of the database to write to.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to write to.
NoScheduleException
Description
There is no applicable schedule.
Members
- Message
-
- Type: string
A message describing the problem.
Node
Description
A node represents a Glue component (trigger, crawler, or job) on a workflow graph.
Members
- CrawlerDetails
-
- Type: CrawlerNodeDetails structure
Details of the crawler when the node represents a crawler.
- JobDetails
-
- Type: JobNodeDetails structure
Details of the Job when the node represents a Job.
- Name
-
- Type: string
The name of the Glue component represented by the node.
- TriggerDetails
-
- Type: TriggerNodeDetails structure
Details of the Trigger when the node represents a Trigger.
- Type
-
- Type: string
The type of Glue component represented by the node.
- UniqueId
-
- Type: string
The unique Id assigned to the node within the workflow.
NotificationProperty
Description
Specifies configuration properties of a notification.
Members
- NotifyDelayAfter
-
- Type: int
After a job run starts, the number of minutes to wait before sending a job run delay notification.
NullCheckBoxList
Description
Represents whether certain values are recognized as null values for removal.
Members
- IsEmpty
-
- Type: boolean
Specifies that an empty string is considered as a null value.
- IsNegOne
-
- Type: boolean
Specifies that an integer value of -1 is considered as a null value.
- IsNullString
-
- Type: boolean
Specifies that a value spelling out the word 'null' is considered as a null value.
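The three NullCheckBoxList flags can be read as independent switches. The sketch below shows one way to interpret them on a single value; isTreatedAsNull is a hypothetical helper, and matching the word 'null' exactly in lowercase is an assumption, since the description does not say how letter case is handled.

```php
<?php
/**
 * Hypothetical helper: reports whether a value counts as null under the
 * given NullCheckBoxList flags.
 */
function isTreatedAsNull($value, array $nullCheckBoxList): bool
{
    // IsEmpty: an empty string is considered a null value.
    if (!empty($nullCheckBoxList['IsEmpty']) && $value === '') {
        return true;
    }
    // IsNegOne: an integer value of -1 is considered a null value.
    if (!empty($nullCheckBoxList['IsNegOne']) && $value === -1) {
        return true;
    }
    // IsNullString: the word 'null' is considered a null value
    // (exact lowercase match is an assumption of this sketch).
    if (!empty($nullCheckBoxList['IsNullString']) && $value === 'null') {
        return true;
    }
    return false;
}
```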
NullValueField
Description
Represents a custom null value, such as zeros or another value being used as a null placeholder unique to the dataset.
Members
- Datatype
-
- Required: Yes
- Type: Datatype structure
The datatype of the value.
- Value
-
- Required: Yes
- Type: string
The value of the null placeholder.
OAuth2ClientApplication
Description
The OAuth2 client app used for the connection.
Members
- AWSManagedClientApplicationReference
-
- Type: string
The reference to the SaaS-side client app that is Amazon Web Services managed.
- UserManagedClientApplicationClientId
-
- Type: string
The client application clientID if the ClientAppType is USER_MANAGED.
OAuth2Properties
Description
A structure containing properties for OAuth2 authentication.
Members
- OAuth2ClientApplication
-
- Type: OAuth2ClientApplication structure
The client application type. For example, AWS_MANAGED or USER_MANAGED.
- OAuth2GrantType
-
- Type: string
The OAuth2 grant type. For example, AUTHORIZATION_CODE, JWT_BEARER, or CLIENT_CREDENTIALS.
- TokenUrl
-
- Type: string
The URL of the provider's authentication server, to exchange an authorization code for an access token.
- TokenUrlParametersMap
-
- Type: Associative array of custom strings keys (TokenUrlParameterKey) to strings
A map of parameters that are added to the token GET request.
OAuth2PropertiesInput
Description
A structure containing properties for OAuth2 in the CreateConnection request.
Members
- AuthorizationCodeProperties
-
- Type: AuthorizationCodeProperties structure
The set of properties required for the the OAuth2
AUTHORIZATION_CODE
grant type. - OAuth2ClientApplication
-
- Type: OAuth2ClientApplication structure
The client application type in the CreateConnection request. For example, AWS_MANAGED or USER_MANAGED.
- OAuth2GrantType
-
- Type: string
The OAuth2 grant type in the CreateConnection request. For example, AUTHORIZATION_CODE, JWT_BEARER, or CLIENT_CREDENTIALS.
- TokenUrl
-
- Type: string
The URL of the provider's authentication server, to exchange an authorization code for an access token.
- TokenUrlParametersMap
-
- Type: Associative array of custom strings keys (TokenUrlParameterKey) to strings
A map of parameters that are added to the token GET request.
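As a minimal sketch, the members above can be assembled into an OAuth2PropertiesInput-shaped array in plain PHP. The helper name buildOAuth2PropertiesInput, the token URL, the client ID, and the scope parameter are all hypothetical placeholders. The helper also treats the three grant types named above as the only accepted values, which is an assumption, since the description only lists them as examples. AuthorizationCodeProperties is left out for brevity.

```php
<?php
// Hypothetical helper: assembles an OAuth2PropertiesInput-shaped array.
// All concrete values passed in below are placeholders, not real endpoints.
function buildOAuth2PropertiesInput(
    string $grantType,
    string $tokenUrl,
    string $clientId,
    array $tokenUrlParams = []
): array {
    // Assumption: only the grant types named in the description are accepted.
    $allowed = ['AUTHORIZATION_CODE', 'JWT_BEARER', 'CLIENT_CREDENTIALS'];
    if (!in_array($grantType, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported grant type: $grantType");
    }

    $input = [
        'OAuth2GrantType' => $grantType,
        'TokenUrl' => $tokenUrl,
        'OAuth2ClientApplication' => [
            'UserManagedClientApplicationClientId' => $clientId,
        ],
    ];

    // TokenUrlParametersMap is optional, so add it only when parameters exist.
    if ($tokenUrlParams !== []) {
        $input['TokenUrlParametersMap'] = $tokenUrlParams;
    }

    return $input;
}

$oauth = buildOAuth2PropertiesInput(
    'CLIENT_CREDENTIALS',
    'https://auth.example.com/oauth2/token', // placeholder URL
    'example-client-id',                     // placeholder client ID
    ['scope' => 'read']                      // placeholder parameter
);
```

Validating the grant type before building the array surfaces a typo locally rather than in a service-side error.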
OpenTableFormatInput
Description
A structure representing an open format table.
Members
- IcebergInput
-
- Type: IcebergInput structure
Specifies an IcebergInput structure that defines an Apache Iceberg metadata table.
OperationNotSupportedException
Description
The operation is not available in the region.
Members
- Message
-
- Type: string
A message describing the problem.
OperationTimeoutException
Description
The operation timed out.
Members
- Message
-
- Type: string
A message describing the problem.
Option
Description
Specifies an option value.
Members
- Description
-
- Type: string
Specifies the description of the option.
- Label
-
- Type: string
Specifies the label of the option.
- Value
-
- Type: string
Specifies the value of the option.
OracleSQLCatalogSource
Description
Specifies an Oracle data source in the Glue Data Catalog.
Members
- Database
-
- Required: Yes
- Type: string
The name of the database to read from.
- Name
-
- Required: Yes
- Type: string
The name of the data source.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to read from.
OracleSQLCatalogTarget
Description
Specifies a target that uses Oracle SQL.
Members
- Database
-
- Required: Yes
- Type: string
The name of the database to write to.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to write to.
Order
Description
Specifies the sort order of a sorted column.
Members
- Column
-
- Required: Yes
- Type: string
The name of the column.
- SortOrder
-
- Required: Yes
- Type: int
Indicates that the column is sorted in ascending order (== 1), or in descending order (== 0).
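A short sketch of how the SortOrder encoding might be wrapped so that callers do not have to remember which integer means which direction. The helper name buildOrder and the column names are hypothetical.

```php
<?php
// Hypothetical helper: builds an Order structure for one sorted column.
function buildOrder(string $column, bool $ascending = true): array
{
    return [
        'Column' => $column,
        // Per the SortOrder description: 1 = ascending, 0 = descending.
        'SortOrder' => $ascending ? 1 : 0,
    ];
}

// Placeholder column names for illustration only.
$sortColumns = [
    buildOrder('event_date'),      // ascending
    buildOrder('event_id', false), // descending
];
```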
OrphanFileDeletionConfiguration
Description
The configuration for an orphan file deletion optimizer.
Members
- icebergConfiguration
-
- Type: IcebergOrphanFileDeletionConfiguration structure
The configuration for an Iceberg orphan file deletion optimizer.
OrphanFileDeletionMetrics
Description
A structure that contains orphan file deletion metrics for the optimizer run.
Members
- IcebergMetrics
-
- Type: IcebergOrphanFileDeletionMetrics structure
A structure containing the Iceberg orphan file deletion metrics for the optimizer run.
OtherMetadataValueListItem
Description
A structure containing other metadata for a schema version belonging to the same metadata key.
Members
- CreatedTime
-
- Type: string
The time at which the entry was created.
- MetadataValue
-
- Type: string
The metadata key’s corresponding value for the other metadata belonging to the same metadata key.
PIIDetection
Description
Specifies a transform that identifies, removes or masks PII data.
Members
- EntityTypesToDetect
-
- Required: Yes
- Type: Array of strings
Indicates the types of entities the PIIDetection transform will identify as PII data.
PII type entities include: PERSON_NAME, DATE, USA_SNN, EMAIL, USA_ITIN, USA_PASSPORT_NUMBER, PHONE_NUMBER, BANK_ACCOUNT, IP_ADDRESS, MAC_ADDRESS, USA_CPT_CODE, USA_HCPCS_CODE, USA_NATIONAL_DRUG_CODE, USA_MEDICARE_BENEFICIARY_IDENTIFIER, USA_HEALTH_INSURANCE_CLAIM_NUMBER, CREDIT_CARD, USA_NATIONAL_PROVIDER_IDENTIFIER, USA_DEA_NUMBER, USA_DRIVING_LICENSE
- Inputs
-
- Required: Yes
- Type: Array of strings
The node ID inputs to the transform.
- MaskValue
-
- Type: string
Indicates the value that will replace the detected entity.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
- OutputColumnName
-
- Type: string
Indicates the output column name that will contain any entity type detected in that row.
- PiiType
-
- Required: Yes
- Type: string
Indicates the type of PIIDetection transform.
- SampleFraction
-
- Type: double
Indicates the fraction of the data to sample when scanning for PII entities.
- ThresholdFraction
-
- Type: double
Indicates the fraction of the data that must be met in order for a column to be identified as PII data.
Partition
Description
Represents a slice of table data.
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the partition resides.
- CreationTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time at which the partition was created.
- DatabaseName
-
- Type: string
The name of the catalog database in which to create the partition.
- LastAccessTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last time at which the partition was accessed.
- LastAnalyzedTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last time at which column statistics were computed for this partition.
- Parameters
-
- Type: Associative array of custom strings keys (KeyString) to strings
These key-value pairs define partition parameters.
- StorageDescriptor
-
- Type: StorageDescriptor structure
Provides information about the physical location where the partition is stored.
- TableName
-
- Type: string
The name of the database table in which to create the partition.
- Values
-
- Type: Array of strings
The values of the partition.
PartitionError
Description
Contains information about a partition error.
Members
- ErrorDetail
-
- Type: ErrorDetail structure
The details about the partition error.
- PartitionValues
-
- Type: Array of strings
The values that define the partition.
PartitionIndex
Description
A structure for a partition index.
Members
- IndexName
-
- Required: Yes
- Type: string
The name of the partition index.
- Keys
-
- Required: Yes
- Type: Array of strings
The keys for the partition index.
PartitionIndexDescriptor
Description
A descriptor for a partition index in a table.
Members
- BackfillErrors
-
- Type: Array of BackfillError structures
A list of errors that can occur when registering partition indexes for an existing table.
- IndexName
-
- Required: Yes
- Type: string
The name of the partition index.
- IndexStatus
-
- Required: Yes
- Type: string
The status of the partition index.
The possible statuses are:
  - CREATING: The index is being created. When an index is in a CREATING state, the index or its table cannot be deleted.
  - ACTIVE: The index creation succeeds.
  - FAILED: The index creation fails.
  - DELETING: The index is deleted from the list of indexes.
- Keys
-
- Required: Yes
- Type: Array of KeySchemaElement structures
A list of one or more keys, as KeySchemaElement structures, for the partition index.
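The IndexStatus values above can be checked before attempting a delete. The following is a minimal sketch that works on plain arrays shaped like PartitionIndexDescriptor; the helper name canDeletePartitionIndex and the sample index names are hypothetical, and the sample descriptors are trimmed to the two members the check needs.

```php
<?php
// Hypothetical helper: reports whether an index (or its table) may be deleted.
// Per the IndexStatus description, deletion is not possible while CREATING.
function canDeletePartitionIndex(array $descriptor): bool
{
    if (!isset($descriptor['IndexStatus'])) {
        throw new InvalidArgumentException('IndexStatus is a required member.');
    }
    return $descriptor['IndexStatus'] !== 'CREATING';
}

// Trimmed sample descriptors with placeholder index names.
$descriptors = [
    ['IndexName' => 'by_year', 'IndexStatus' => 'ACTIVE'],
    ['IndexName' => 'by_region', 'IndexStatus' => 'CREATING'],
];

// Collect the names of indexes that are safe to delete right now.
$deletable = [];
foreach ($descriptors as $descriptor) {
    if (canDeletePartitionIndex($descriptor)) {
        $deletable[] = $descriptor['IndexName'];
    }
}
```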
PartitionInput
Description
The structure used to create and update a partition.
Members
- LastAccessTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last time at which the partition was accessed.
- LastAnalyzedTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last time at which column statistics were computed for this partition.
- Parameters
-
- Type: Associative array of custom strings keys (KeyString) to strings
These key-value pairs define partition parameters.
- StorageDescriptor
-
- Type: StorageDescriptor structure
Provides information about the physical location where the partition is stored.
- Values
-
- Type: Array of strings
The values of the partition. Although this parameter is not required by the SDK, you must specify this parameter for a valid input.
The values for the keys for the new partition must be passed as an array of String objects that must be ordered in the same order as the partition keys appearing in the Amazon S3 prefix. Otherwise Glue will add the values to the wrong keys.
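Because the order of Values matters, it can help to derive the array from the partition key order rather than typing it by hand. This is a minimal sketch under that assumption; the helper name buildPartitionInput and the key names year, month, and day are hypothetical.

```php
<?php
// Hypothetical helper: orders partition values to match the partition key order.
function buildPartitionInput(array $partitionKeys, array $valuesByKey, array $extra = []): array
{
    $values = [];
    foreach ($partitionKeys as $key) {
        if (!array_key_exists($key, $valuesByKey)) {
            throw new InvalidArgumentException("Missing value for partition key: $key");
        }
        // Values is an array of strings, so cast each value explicitly.
        $values[] = (string) $valuesByKey[$key];
    }

    // Other PartitionInput members (Parameters, StorageDescriptor, ...) can be
    // passed through $extra; the computed Values entry always takes precedence.
    return ['Values' => $values] + $extra;
}

// Keys in the order they appear in the Amazon S3 prefix,
// for example year=2024/month=05/day=17/ (placeholder layout).
$partitionKeys = ['year', 'month', 'day'];

// The caller may supply the values in any order.
$partitionInput = buildPartitionInput(
    $partitionKeys,
    ['day' => 17, 'year' => 2024, 'month' => '05']
);
```

The resulting array can then be used wherever a PartitionInput structure is expected.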
PartitionValueList
Description
Contains a list of values defining partitions.
Members
- Values
-
- Required: Yes
- Type: Array of strings
The list of values.
PermissionTypeMismatchException
Description
The permission type used in the request does not match the permissions defined on the target table.
Members
- Message
-
- Type: string
There is a mismatch between the SupportedPermissionType used in the query request and the permissions defined on the target table.
PhysicalConnectionRequirements
Description
Specifies the physical requirements for a connection, such as its Availability Zone, security groups, and subnet.
Members
- AvailabilityZone
-
- Type: string
The connection's Availability Zone.
- SecurityGroupIdList
-
- Type: Array of strings
The security group ID list used by the connection.
- SubnetId
-
- Type: string
The subnet ID used by the connection.
PostgreSQLCatalogSource
Description
Specifies a PostgreSQL data source in the Glue Data Catalog.
Members
- Database
-
- Required: Yes
- Type: string
The name of the database to read from.
- Name
-
- Required: Yes
- Type: string
The name of the data source.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to read from.
PostgreSQLCatalogTarget
Description
Specifies a target that uses PostgreSQL.
Members
- Database
-
- Required: Yes
- Type: string
The name of the database to write to.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to write to.
Predecessor
Description
A job run that was used in the predicate of a conditional trigger that triggered this job run.
Members
- JobName
-
- Type: string
The name of the job definition used by the predecessor job run.
- RunId
-
- Type: string
The job-run ID of the predecessor job run.
Predicate
Description
Defines the predicate of the trigger, which determines when it fires.
Members
- Conditions
-
- Type: Array of Condition structures
A list of the conditions that determine when the trigger will fire.
- Logical
-
- Type: string
An optional field if only one condition is listed. If multiple conditions are listed, then this field is required.
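The rule for Logical can be enforced when the predicate is built. The sketch below is plain PHP over arrays; the helper name buildPredicate and the job names are hypothetical. The Condition members shown (LogicalOperator, JobName, State) and the Logical value AND are not described in this section and are assumptions to check against the Condition structure and the service's accepted values.

```php
<?php
// Hypothetical helper: builds a trigger Predicate from a list of conditions.
function buildPredicate(array $conditions, ?string $logical = null): array
{
    // Logical is optional for a single condition, required for several.
    if (count($conditions) > 1 && $logical === null) {
        throw new InvalidArgumentException(
            'Logical is required when multiple conditions are listed.'
        );
    }

    $predicate = ['Conditions' => $conditions];
    if ($logical !== null) {
        $predicate['Logical'] = $logical;
    }
    return $predicate;
}

// Assumed Condition shape and placeholder job names.
$conditions = [
    ['LogicalOperator' => 'EQUALS', 'JobName' => 'extract-job', 'State' => 'SUCCEEDED'],
    ['LogicalOperator' => 'EQUALS', 'JobName' => 'validate-job', 'State' => 'SUCCEEDED'],
];

$predicate = buildPredicate($conditions, 'AND');
```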
PrincipalPermissions
Description
Permissions granted to a principal.
Members
- Permissions
-
- Type: Array of strings
The permissions that are granted to the principal.
- Principal
-
- Type: DataLakePrincipal structure
The principal who is granted permissions.
ProfileConfiguration
Description
Specifies the job and session values that an admin configures in a Glue usage profile.
Members
- JobConfiguration
-
- Type: Associative array of custom strings keys (NameString) to ConfigurationObject structures
A key-value map of configuration parameters for Glue jobs.
- SessionConfiguration
-
- Type: Associative array of custom strings keys (NameString) to ConfigurationObject structures
A key-value map of configuration parameters for Glue sessions.
PropertyPredicate
Description
Defines a property predicate.
Members
- Comparator
-
- Type: string
The comparator used to compare this property to others.
- Key
-
- Type: string
The key of the property.
- Value
-
- Type: string
The value of the property.
QuerySessionContext
Description
A structure used as a protocol between query engines and Lake Formation or Glue. Contains both a Lake Formation generated authorization identifier and information from the request's authorization context.
Members
- AdditionalContext
-
- Type: Associative array of custom strings keys (ContextKey) to strings
An opaque string-string map passed by the query engine.
- ClusterId
-
- Type: string
An identifier string for the consumer cluster.
- QueryAuthorizationId
-
- Type: string
A cryptographically generated query identifier generated by Glue or Lake Formation.
- QueryId
-
- Type: string
A unique identifier generated by the query engine for the query.
- QueryStartTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
A timestamp provided by the query engine for when the query started.
Recipe
Description
A Glue Studio node that uses a Glue DataBrew recipe in Glue jobs.
Members
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the recipe node, identified by id.
- Name
-
- Required: Yes
- Type: string
The name of the Glue Studio node.
- RecipeReference
-
- Type: RecipeReference structure
A reference to the DataBrew recipe used by the node.
- RecipeSteps
-
- Type: Array of RecipeStep structures
Transform steps used in the recipe node.
RecipeAction
Description
Actions defined in the Glue Studio data preparation recipe node.
Members
- Operation
-
- Required: Yes
- Type: string
The operation of the recipe action.
- Parameters
-
- Type: Associative array of custom strings keys (ParameterName) to strings
The parameters of the recipe action.
RecipeReference
Description
A reference to a Glue DataBrew recipe.
Members
- RecipeArn
-
- Required: Yes
- Type: string
The ARN of the DataBrew recipe.
- RecipeVersion
-
- Required: Yes
- Type: string
The RecipeVersion of the DataBrew recipe.
RecipeStep
Description
A recipe step used in a Glue Studio data preparation recipe node.
Members
- Action
-
- Required: Yes
- Type: RecipeAction structure
The transformation action of the recipe step.
- ConditionExpressions
-
- Type: Array of ConditionExpression structures
The condition expressions for the recipe step.
RecrawlPolicy
Description
When crawling an Amazon S3 data source after the first crawl is complete, specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run. For more information, see Incremental Crawls in Glue in the developer guide.
Members
- RecrawlBehavior
-
- Type: string
Specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run.
A value of CRAWL_EVERYTHING specifies crawling the entire dataset again.
A value of CRAWL_NEW_FOLDERS_ONLY specifies crawling only folders that were added since the last crawler run.
A value of CRAWL_EVENT_MODE specifies crawling only the changes identified by Amazon S3 events.
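A small sketch of building a RecrawlPolicy array while guarding against misspelled values. The helper name buildRecrawlPolicy is hypothetical, and the allowed list contains only the three values described above.

```php
<?php
// Hypothetical helper: builds a RecrawlPolicy structure with a checked value.
function buildRecrawlPolicy(string $behavior): array
{
    // The three RecrawlBehavior values described in this section.
    $allowed = ['CRAWL_EVERYTHING', 'CRAWL_NEW_FOLDERS_ONLY', 'CRAWL_EVENT_MODE'];
    if (!in_array($behavior, $allowed, true)) {
        throw new InvalidArgumentException("Unknown RecrawlBehavior: $behavior");
    }
    return ['RecrawlBehavior' => $behavior];
}

// Crawl only folders added since the last crawler run.
$recrawlPolicy = buildRecrawlPolicy('CRAWL_NEW_FOLDERS_ONLY');
```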
RedshiftSource
Description
Specifies an Amazon Redshift data store.
Members
- Database
-
- Required: Yes
- Type: string
The database to read from.
- Name
-
- Required: Yes
- Type: string
The name of the Amazon Redshift data store.
- RedshiftTmpDir
-
- Type: string
The Amazon S3 path where temporary data can be staged when copying out of the database.
- Table
-
- Required: Yes
- Type: string
The database table to read from.
- TmpDirIAMRole
-
- Type: string
The IAM role with permissions.
RedshiftTarget
Description
Specifies a target that uses Amazon Redshift.
Members
- Database
-
- Required: Yes
- Type: string
The name of the database to write to.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- RedshiftTmpDir
-
- Type: string
The Amazon S3 path where temporary data can be staged when copying out of the database.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to write to.
- TmpDirIAMRole
-
- Type: string
The IAM role with permissions.
- UpsertRedshiftOptions
-
- Type: UpsertRedshiftTargetOptions structure
The set of options to configure an upsert operation when writing to a Redshift target.
RegistryId
Description
A wrapper structure that may contain the registry name and Amazon Resource Name (ARN).
Members
- RegistryArn
-
- Type: string
ARN of the registry to be updated. One of RegistryArn or RegistryName has to be provided.
- RegistryName
-
- Type: string
Name of the registry. Used only for lookup. One of RegistryArn or RegistryName has to be provided.
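The "one of RegistryArn or RegistryName" rule can be checked before a request is made. This is a minimal sketch in plain PHP; the helper name buildRegistryId and the registry name are hypothetical.

```php
<?php
// Hypothetical helper: builds a RegistryId structure and enforces that
// at least one of RegistryName or RegistryArn is supplied.
function buildRegistryId(?string $registryName = null, ?string $registryArn = null): array
{
    if ($registryName === null && $registryArn === null) {
        throw new InvalidArgumentException(
            'One of RegistryArn or RegistryName has to be provided.'
        );
    }

    // Drop whichever member was not supplied so it is not sent as null.
    return array_filter(
        ['RegistryName' => $registryName, 'RegistryArn' => $registryArn],
        static fn ($value) => $value !== null
    );
}

$registryId = buildRegistryId('example-registry'); // placeholder name
```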
RegistryListItem
Description
A structure containing the details for a registry.
Members
- CreatedTime
-
- Type: string
The date the registry was created.
- Description
-
- Type: string
A description of the registry.
- RegistryArn
-
- Type: string
The Amazon Resource Name (ARN) of the registry.
- RegistryName
-
- Type: string
The name of the registry.
- Status
-
- Type: string
The status of the registry.
- UpdatedTime
-
- Type: string
The date the registry was updated.
RelationalCatalogSource
Description
Specifies a relational database data source in the Glue Data Catalog.
Members
- Database
-
- Required: Yes
- Type: string
The name of the database to read from.
- Name
-
- Required: Yes
- Type: string
The name of the data source.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to read from.
RenameField
Description
Specifies a transform that renames a single data property key.
Members
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
- SourcePath
-
- Required: Yes
- Type: Array of strings
A JSON path to a variable in the data structure for the source data.
- TargetPath
-
- Required: Yes
- Type: Array of strings
A JSON path to a variable in the data structure for the target data.
ResourceNotReadyException
Description
A resource was not ready for a transaction.
Members
- Message
-
- Type: string
A message describing the problem.
ResourceNumberLimitExceededException
Description
A resource numerical limit was exceeded.
Members
- Message
-
- Type: string
A message describing the problem.
ResourceUri
Description
The URIs for function resources.
Members
- ResourceType
-
- Type: string
The type of the resource.
- Uri
-
- Type: string
The URI for accessing the resource.
RetentionConfiguration
Description
The configuration for a snapshot retention optimizer.
Members
- icebergConfiguration
-
- Type: IcebergRetentionConfiguration structure
The configuration for an Iceberg snapshot retention optimizer.
RetentionMetrics
Description
A structure that contains retention metrics for the optimizer run.
Members
- IcebergMetrics
-
- Type: IcebergRetentionMetrics structure
A structure containing the Iceberg retention metrics for the optimizer run.
RunIdentifier
Description
A run identifier.
Members
- JobRunId
-
- Type: string
The Job Run ID.
- RunId
-
- Type: string
The Run ID.
RunMetrics
Description
Metrics for the optimizer run.
This structure is deprecated. See the individual metric members for compaction, retention, and orphan file deletion.
Members
- JobDurationInHour
-
- Type: string
The duration of the job in hours.
- NumberOfBytesCompacted
-
- Type: string
The number of bytes removed by the compaction job run.
- NumberOfDpus
-
- Type: string
The number of DPU hours consumed by the job.
- NumberOfFilesCompacted
-
- Type: string
The number of files removed by the compaction job run.
S3CatalogDeltaSource
Description
Specifies a Delta Lake data source that is registered in the Glue Data Catalog. The data source must be stored in Amazon S3.
Members
- AdditionalDeltaOptions
-
- Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings
Specifies additional connection options.
- Database
-
- Required: Yes
- Type: string
The name of the database to read from.
- Name
-
- Required: Yes
- Type: string
The name of the Delta Lake data source.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the Delta Lake source.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to read from.
S3CatalogHudiSource
Description
Specifies a Hudi data source that is registered in the Glue Data Catalog. The Hudi data source must be stored in Amazon S3.
Members
- AdditionalHudiOptions
-
- Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings
Specifies additional connection options.
- Database
-
- Required: Yes
- Type: string
The name of the database to read from.
- Name
-
- Required: Yes
- Type: string
The name of the Hudi data source.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the Hudi source.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to read from.
S3CatalogSource
Description
Specifies an Amazon S3 data store in the Glue Data Catalog.
Members
- AdditionalOptions
-
- Type: S3SourceAdditionalOptions structure
Specifies additional connection options.
- Database
-
- Required: Yes
- Type: string
The database to read from.
- Name
-
- Required: Yes
- Type: string
The name of the data store.
- PartitionPredicate
-
- Type: string
Partitions satisfying this predicate are deleted. Files within the retention period in these partitions are not deleted. Set to "" (empty) by default.
- Table
-
- Required: Yes
- Type: string
The database table to read from.
S3CatalogTarget
Description
Specifies a data target that writes to Amazon S3 using the Glue Data Catalog.
Members
- Database
-
- Required: Yes
- Type: string
The name of the database to write to.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- PartitionKeys
-
- Type: Array of strings
Specifies native partitioning using a sequence of keys.
- SchemaChangePolicy
-
- Type: CatalogSchemaChangePolicy structure
A policy that specifies update behavior for the crawler.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to write to.
S3CsvSource
Description
Specifies a comma-separated value (CSV) data store stored in Amazon S3.
Members
- AdditionalOptions
-
- Type: S3DirectSourceAdditionalOptions structure
Specifies additional connection options.
- CompressionType
-
- Type: string
Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".
- Escaper
-
- Type: string
Specifies a character to use for escaping. This option is used only when reading CSV files. The default value is none. If enabled, the character which immediately follows is used as-is, except for a small set of well-known escapes (\n, \r, \t, and \0).
- Exclusions
-
- Type: Array of strings
A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.
- GroupFiles
-
- Type: string
Grouping files is turned on by default when the input contains more than 50,000 files. To turn on grouping with fewer than 50,000 files, set this parameter to "inPartition". To disable grouping when there are more than 50,000 files, set this parameter to "none".
- GroupSize
-
- Type: string
The target group size in bytes. The default is computed based on the input data size and the size of your cluster. When there are fewer than 50,000 input files, "groupFiles" must be set to "inPartition" for this to take effect.
- MaxBand
-
- Type: int
This option controls the duration in milliseconds after which the Amazon S3 listing is likely to be consistent. Files with modification timestamps falling within the last maxBand milliseconds are tracked specially when using JobBookmarks to account for Amazon S3 eventual consistency. Most users don't need to set this option. The default is 900000 milliseconds, or 15 minutes.
- MaxFilesInBand
-
- Type: int
This option specifies the maximum number of files to save from the last maxBand seconds. If this number is exceeded, extra files are skipped and only processed in the next job run.
- Multiline
-
- Type: boolean
A Boolean value that specifies whether a single record can span multiple lines. This can occur when a field contains a quoted new-line character. You must set this option to True if any record spans multiple lines. The default value is False, which allows for more aggressive file-splitting during parsing.
- Name
-
- Required: Yes
- Type: string
The name of the data store.
- OptimizePerformance
-
- Type: boolean
A Boolean value that specifies whether to use the advanced SIMD CSV reader along with Apache Arrow based columnar memory formats. Only available in Glue version 3.0.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the S3 CSV source.
- Paths
-
- Required: Yes
- Type: Array of strings
A list of the Amazon S3 paths to read from.
- QuoteChar
-
- Required: Yes
- Type: string
Specifies the character to use for quoting. The default is a double quote: '"'. Set this to -1 to turn off quoting entirely.
- Recurse
-
- Type: boolean
If set to true, recursively reads files in all subdirectories under the specified paths.
- Separator
-
- Required: Yes
- Type: string
Specifies the delimiter character. The default is a comma: ",", but any other character can be specified.
- SkipFirst
-
- Type: boolean
A Boolean value that specifies whether to skip the first data line. The default value is False.
- WithHeader
-
- Type: boolean
A Boolean value that specifies whether to treat the first line as a header. The default value is False.
- WriteHeader
-
- Type: boolean
A Boolean value that specifies whether to write the header to output. The default value is True.
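S3CsvSource has four required members (Name, Paths, QuoteChar, and Separator), so a quick local check can catch an incomplete node before it is sent. The sketch below is plain PHP; the helper name missingS3CsvSourceMembers, the node name, and the bucket path are hypothetical. The Separator and QuoteChar values shown ('comma' and 'quote') are assumed enum-style names that this section does not list, so verify them against the values the service accepts.

```php
<?php
// Hypothetical helper: lists required S3CsvSource members that are absent.
function missingS3CsvSourceMembers(array $source): array
{
    // Members marked "Required: Yes" in the S3CsvSource description.
    $required = ['Name', 'Paths', 'QuoteChar', 'Separator'];
    return array_values(array_diff($required, array_keys($source)));
}

$csvSource = [
    'Name' => 'orders-csv',                          // placeholder node name
    'Paths' => ['s3://amzn-s3-demo-bucket/orders/'], // placeholder path
    'Separator' => 'comma',                          // assumed value name
    'QuoteChar' => 'quote',                          // assumed value name
    'WithHeader' => true,  // treat the first line as a header
    'Recurse' => true,     // read files in all subdirectories
];

$missing = missingS3CsvSourceMembers($csvSource);
$missingFromPartial = missingS3CsvSourceMembers(['Name' => 'orders-csv']);
```

An empty result means every required member is present; it does not validate the member values themselves.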
S3DeltaCatalogTarget
Description
Specifies a target that writes to a Delta Lake data source in the Glue Data Catalog.
Members
- AdditionalOptions
-
- Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings
Specifies additional connection options for the connector.
- Database
-
- Required: Yes
- Type: string
The name of the database to write to.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- PartitionKeys
-
- Type: Array of strings
Specifies native partitioning using a sequence of keys.
- SchemaChangePolicy
-
- Type: CatalogSchemaChangePolicy structure
A policy that specifies update behavior for the crawler.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to write to.
S3DeltaDirectTarget
Description
Specifies a target that writes to a Delta Lake data source in Amazon S3.
Members
- AdditionalOptions
-
- Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings
Specifies additional connection options for the connector.
- Compression
-
- Required: Yes
- Type: string
Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".
- Format
-
- Required: Yes
- Type: string
Specifies the data output format for the target.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- PartitionKeys
-
- Type: Array of strings
Specifies native partitioning using a sequence of keys.
- Path
-
- Required: Yes
- Type: string
The Amazon S3 path of your Delta Lake data source to write to.
- SchemaChangePolicy
-
- Type: DirectSchemaChangePolicy structure
A policy that specifies update behavior for the crawler.
S3DeltaSource
Description
Specifies a Delta Lake data source stored in Amazon S3.
Members
- AdditionalDeltaOptions
-
- Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings
Specifies additional connection options.
- AdditionalOptions
-
- Type: S3DirectSourceAdditionalOptions structure
Specifies additional options for the connector.
- Name
-
- Required: Yes
- Type: string
The name of the Delta Lake source.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the Delta Lake source.
- Paths
-
- Required: Yes
- Type: Array of strings
A list of the Amazon S3 paths to read from.
S3DirectSourceAdditionalOptions
Description
Specifies additional connection options for the Amazon S3 data store.
Members
- BoundedFiles
-
- Type: long (int|float)
Sets the upper limit for the target number of files that will be processed.
- BoundedSize
-
- Type: long (int|float)
Sets the upper limit for the target size of the dataset in bytes that will be processed.
- EnableSamplePath
-
- Type: boolean
Sets option to enable a sample path.
- SamplePath
-
- Type: string
If enabled, specifies the sample path.
S3DirectTarget
Description
Specifies a data target that writes to Amazon S3.
Members
- Compression
-
- Type: string
Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".
- Format
-
- Required: Yes
- Type: string
Specifies the data output format for the target.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- PartitionKeys
-
- Type: Array of strings
Specifies native partitioning using a sequence of keys.
- Path
-
- Required: Yes
- Type: string
A single Amazon S3 path to write to.
- SchemaChangePolicy
-
- Type: DirectSchemaChangePolicy structure
A policy that specifies update behavior for the crawler.
S3Encryption
Description
Specifies how Amazon Simple Storage Service (Amazon S3) data should be encrypted.
Members
- KmsKeyArn
-
- Type: string
The Amazon Resource Name (ARN) of the KMS key to be used to encrypt the data.
- S3EncryptionMode
-
- Type: string
The encryption mode to use for Amazon S3 data.
S3GlueParquetTarget
Description
Specifies a data target that writes to Amazon S3 in Apache Parquet columnar storage.
Members
- Compression
-
- Type: string
Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- PartitionKeys
-
- Type: Array of strings
Specifies native partitioning using a sequence of keys.
- Path
-
- Required: Yes
- Type: string
A single Amazon S3 path to write to.
- SchemaChangePolicy
-
- Type: DirectSchemaChangePolicy structure
A policy that specifies update behavior for the crawler.
S3HudiCatalogTarget
Description
Specifies a target that writes to a Hudi data source in the Glue Data Catalog.
Members
- AdditionalOptions
-
- Required: Yes
- Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings
Specifies additional connection options for the connector.
- Database
-
- Required: Yes
- Type: string
The name of the database to write to.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- PartitionKeys
-
- Type: Array of strings
Specifies native partitioning using a sequence of keys.
- SchemaChangePolicy
-
- Type: CatalogSchemaChangePolicy structure
A policy that specifies update behavior for the crawler.
- Table
-
- Required: Yes
- Type: string
The name of the table in the database to write to.
S3HudiDirectTarget
Description
Specifies a target that writes to a Hudi data source in Amazon S3.
Members
- AdditionalOptions
-
- Required: Yes
- Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings
Specifies additional connection options for the connector.
- Compression
-
- Required: Yes
- Type: string
Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".
- Format
-
- Required: Yes
- Type: string
Specifies the data output format for the target.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- PartitionKeys
-
- Type: Array of strings
Specifies native partitioning using a sequence of keys.
- Path
-
- Required: Yes
- Type: string
The Amazon S3 path of your Hudi data source to write to.
- SchemaChangePolicy
-
- Type: DirectSchemaChangePolicy structure
A policy that specifies update behavior for the crawler.
S3HudiSource
Description
Specifies a Hudi data source stored in Amazon S3.
Members
- AdditionalHudiOptions
-
- Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings
Specifies additional connection options.
- AdditionalOptions
-
- Type: S3DirectSourceAdditionalOptions structure
Specifies additional options for the connector.
- Name
-
- Required: Yes
- Type: string
The name of the Hudi source.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the Hudi source.
- Paths
-
- Required: Yes
- Type: Array of strings
A list of the Amazon S3 paths to read from.
S3JsonSource
Description
Specifies a JSON data store stored in Amazon S3.
Members
- AdditionalOptions
-
- Type: S3DirectSourceAdditionalOptions structure
Specifies additional connection options.
- CompressionType
-
- Type: string
Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".
- Exclusions
-
- Type: Array of strings
A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.
- GroupFiles
-
- Type: string
Grouping files is turned on by default when the input contains more than 50,000 files. To turn on grouping with fewer than 50,000 files, set this parameter to "inPartition". To disable grouping when there are more than 50,000 files, set this parameter to "none".
- GroupSize
-
- Type: string
The target group size in bytes. The default is computed based on the input data size and the size of your cluster. When there are fewer than 50,000 input files, "groupFiles" must be set to "inPartition" for this to take effect.
- JsonPath
-
- Type: string
A JsonPath string defining the JSON data.
- MaxBand
-
- Type: int
This option controls the duration in milliseconds after which the s3 listing is likely to be consistent. Files with modification timestamps falling within the last maxBand milliseconds are tracked specially when using JobBookmarks to account for Amazon S3 eventual consistency. Most users don't need to set this option. The default is 900000 milliseconds, or 15 minutes.
- MaxFilesInBand
-
- Type: int
This option specifies the maximum number of files to save from the last maxBand milliseconds. If this number is exceeded, extra files are skipped and only processed in the next job run.
- Multiline
-
- Type: boolean
A Boolean value that specifies whether a single record can span multiple lines. This can occur when a field contains a quoted new-line character. You must set this option to True if any record spans multiple lines. The default value is False, which allows for more aggressive file-splitting during parsing.
- Name
-
- Required: Yes
- Type: string
The name of the data store.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the S3 JSON source.
- Paths
-
- Required: Yes
- Type: Array of strings
A list of the Amazon S3 paths to read from.
- Recurse
-
- Type: boolean
If set to true, recursively reads files in all subdirectories under the specified paths.
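The interaction between GroupFiles and the 50,000-file threshold can be sketched as a small decision function. This is only an illustrative model of the rule as worded in the GroupFiles description, not SDK or service code: the function name `isGroupingEnabled`, the node name, and the bucket path are hypothetical, and the behavior at exactly 50,000 files is an assumption because the description does not state it.

```php
<?php
// Illustrative model of the documented GroupFiles rule; not part of the SDK.
function isGroupingEnabled(int $fileCount, ?string $groupFiles = null): bool
{
    if ($groupFiles === 'none') {
        return false;              // grouping explicitly disabled
    }
    if ($groupFiles === 'inPartition') {
        return true;               // grouping explicitly enabled
    }
    // Default: on only when the input has more than 50,000 files
    // (treating exactly 50,000 as "off" is an assumption).
    return $fileCount > 50000;
}

// Hypothetical S3JsonSource node that turns grouping on for a small input.
$source = [
    'Name'       => 'SmallJsonInput',
    'Paths'      => ['s3://example-bucket/input/'],
    'GroupFiles' => 'inPartition',
];

var_dump(isGroupingEnabled(1200, $source['GroupFiles']));
var_dump(isGroupingEnabled(1200));
```

With 1,200 files, grouping is reported as enabled only when GroupFiles is set to "inPartition", which is also the case in which GroupSize takes effect.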
S3ParquetSource
Description
Specifies an Apache Parquet data store stored in Amazon S3.
Members
- AdditionalOptions
-
- Type: S3DirectSourceAdditionalOptions structure
Specifies additional connection options.
- CompressionType
-
- Type: string
Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".
- Exclusions
-
- Type: Array of strings
A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.
- GroupFiles
-
- Type: string
Grouping files is turned on by default when the input contains more than 50,000 files. To turn on grouping with fewer than 50,000 files, set this parameter to "inPartition". To disable grouping when there are more than 50,000 files, set this parameter to "none".
- GroupSize
-
- Type: string
The target group size in bytes. The default is computed based on the input data size and the size of your cluster. When there are fewer than 50,000 input files, "groupFiles" must be set to "inPartition" for this to take effect.
- MaxBand
-
- Type: int
This option controls the duration in milliseconds after which the s3 listing is likely to be consistent. Files with modification timestamps falling within the last maxBand milliseconds are tracked specially when using JobBookmarks to account for Amazon S3 eventual consistency. Most users don't need to set this option. The default is 900000 milliseconds, or 15 minutes.
- MaxFilesInBand
-
- Type: int
This option specifies the maximum number of files to save from the last maxBand milliseconds. If this number is exceeded, extra files are skipped and only processed in the next job run.
- Name
-
- Required: Yes
- Type: string
The name of the data store.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the S3 Parquet source.
- Paths
-
- Required: Yes
- Type: Array of strings
A list of the Amazon S3 paths to read from.
- Recurse
-
- Type: boolean
If set to true, recursively reads files in all subdirectories under the specified paths.
S3SourceAdditionalOptions
Description
Specifies additional connection options for the Amazon S3 data store.
Members
- BoundedFiles
-
- Type: long (int|float)
Sets the upper limit for the target number of files that will be processed.
- BoundedSize
-
- Type: long (int|float)
Sets the upper limit for the target size of the dataset in bytes that will be processed.
S3Target
Description
Specifies a data store in Amazon Simple Storage Service (Amazon S3).
Members
- ConnectionName
-
- Type: string
The name of a connection which allows a job or crawler to access data in Amazon S3 within an Amazon Virtual Private Cloud environment (Amazon VPC).
- DlqEventQueueArn
-
- Type: string
A valid Amazon dead-letter SQS ARN. For example, arn:aws:sqs:region:account:deadLetterQueue.
- EventQueueArn
-
- Type: string
A valid Amazon SQS ARN. For example, arn:aws:sqs:region:account:sqs.
- Exclusions
-
- Type: Array of strings
A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.
- Path
-
- Type: string
The path to the Amazon S3 target.
- SampleSize
-
- Type: int
Sets the number of files in each leaf folder to be crawled when crawling sample files in a dataset. If not set, all the files are crawled. A valid value is an integer between 1 and 249.
Schedule
Description
A scheduling object using a cron statement to schedule an event.
Members
- ScheduleExpression
-
- Type: string
A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
- State
-
- Type: string
The state of the schedule.
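As a minimal sketch of the cron format shown for ScheduleExpression, the helper below builds a daily expression from an hour and minute in UTC. The function name `dailyCronExpression` is hypothetical and not part of the SDK; it covers only the "every day at a fixed time" pattern from the example above, not the full cron syntax.

```php
<?php
// Hypothetical helper (not part of the SDK): builds a daily cron expression
// in the form used by the ScheduleExpression example, cron(15 12 * * ? *).
function dailyCronExpression(int $hour, int $minute): string
{
    if ($hour < 0 || $hour > 23 || $minute < 0 || $minute > 59) {
        throw new InvalidArgumentException('Hour must be 0-23 and minute 0-59.');
    }
    // Field order is minute first, then hour.
    return sprintf('cron(%d %d * * ? *)', $minute, $hour);
}

// Every day at 12:15 UTC, matching the documented example.
$schedule = ['ScheduleExpression' => dailyCronExpression(12, 15)];
echo $schedule['ScheduleExpression'], PHP_EOL;
```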
SchedulerNotRunningException
Description
The specified scheduler is not running.
Members
- Message
-
- Type: string
A message describing the problem.
SchedulerRunningException
Description
The specified scheduler is already running.
Members
- Message
-
- Type: string
A message describing the problem.
SchedulerTransitioningException
Description
The specified scheduler is transitioning.
Members
- Message
-
- Type: string
A message describing the problem.
SchemaChangePolicy
Description
A policy that specifies update and deletion behaviors for the crawler.
Members
- DeleteBehavior
-
- Type: string
The deletion behavior when the crawler finds a deleted object.
- UpdateBehavior
-
- Type: string
The update behavior when the crawler finds a changed schema.
SchemaColumn
Description
A key-value pair representing a column and data type that this transform can run against. The Schema parameter of the MLTransform may contain up to 100 of these structures.
Members
- DataType
-
- Type: string
The type of data in the column.
- Name
-
- Type: string
The name of the column.
SchemaId
Description
The unique ID of the schema in the Glue schema registry.
Members
- RegistryName
-
- Type: string
The name of the schema registry that contains the schema.
- SchemaArn
-
- Type: string
The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.
- SchemaName
-
- Type: string
The name of the schema. One of SchemaArn or SchemaName has to be provided.
SchemaListItem
Description
An object that contains minimal details for a schema.
Members
- CreatedTime
-
- Type: string
The date and time that a schema was created.
- Description
-
- Type: string
A description for the schema.
- RegistryName
-
- Type: string
The name of the registry where the schema resides.
- SchemaArn
-
- Type: string
The Amazon Resource Name (ARN) for the schema.
- SchemaName
-
- Type: string
The name of the schema.
- SchemaStatus
-
- Type: string
The status of the schema.
- UpdatedTime
-
- Type: string
The date and time that a schema was updated.
SchemaReference
Description
An object that references a schema stored in the Glue Schema Registry.
Members
- SchemaId
-
- Type: SchemaId structure
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
- SchemaVersionId
-
- Type: string
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
- SchemaVersionNumber
-
- Type: long (int|float)
The version number of the schema.
SchemaVersionErrorItem
Description
An object that contains the error details for an operation on a schema version.
Members
- ErrorDetails
-
- Type: ErrorDetails structure
The details of the error for the schema version.
- VersionNumber
-
- Type: long (int|float)
The version number of the schema.
SchemaVersionListItem
Description
An object containing the details about a schema version.
Members
- CreatedTime
-
- Type: string
The date and time the schema version was created.
- SchemaArn
-
- Type: string
The Amazon Resource Name (ARN) of the schema.
- SchemaVersionId
-
- Type: string
The unique identifier of the schema version.
- Status
-
- Type: string
The status of the schema version.
- VersionNumber
-
- Type: long (int|float)
The version number of the schema.
SchemaVersionNumber
Description
A structure containing the schema version information.
Members
- LatestVersion
-
- Type: boolean
The latest version available for the schema.
- VersionNumber
-
- Type: long (int|float)
The version number of the schema.
SecurityConfiguration
Description
Specifies a security configuration.
Members
- CreatedTimeStamp
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time at which this security configuration was created.
- EncryptionConfiguration
-
- Type: EncryptionConfiguration structure
The encryption configuration associated with this security configuration.
- Name
-
- Type: string
The name of the security configuration.
Segment
Description
Defines a non-overlapping region of a table's partitions, allowing multiple requests to be run in parallel.
Members
- SegmentNumber
-
- Required: Yes
- Type: int
The zero-based index number of the segment. For example, if the total number of segments is 4, SegmentNumber values range from 0 through 3.
- TotalSegments
-
- Required: Yes
- Type: int
The total number of segments.
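The zero-based numbering can be illustrated with a short sketch that builds one Segment array per parallel request. The helper name `buildSegments` is hypothetical and not part of the SDK; each resulting array has the two required members described above and could be supplied wherever an operation accepts a Segment.

```php
<?php
// Hypothetical helper (not part of the SDK): builds one Segment array per
// parallel request, using the zero-based numbering described above.
function buildSegments(int $totalSegments): array
{
    if ($totalSegments < 1) {
        throw new InvalidArgumentException('TotalSegments must be at least 1.');
    }
    $segments = [];
    for ($i = 0; $i < $totalSegments; $i++) {
        $segments[] = [
            'SegmentNumber' => $i,              // 0 through TotalSegments - 1
            'TotalSegments' => $totalSegments,  // same value in every segment
        ];
    }
    return $segments;
}

$segments = buildSegments(4);
// Lists the segment numbers: [0,1,2,3]
echo json_encode(array_column($segments, 'SegmentNumber')), PHP_EOL;
```

Because the segments do not overlap, each array can be sent in its own request and the results combined afterwards.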
SelectFields
Description
Specifies a transform that chooses the data property keys that you want to keep.
Members
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
- Paths
-
- Required: Yes
- Type: Array of arrays of strings
A JSON path to a variable in the data structure.
SelectFromCollection
Description
Specifies a transform that chooses one DynamicFrame from a collection of DynamicFrames. The output is the selected DynamicFrame.
Members
- Index
-
- Required: Yes
- Type: int
The index for the DynamicFrame to be selected.
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
SerDeInfo
Description
Information about a serialization/deserialization program (SerDe) that serves as an extractor and loader.
Members
- Name
-
- Type: string
Name of the SerDe.
- Parameters
-
- Type: Associative array of custom strings keys (KeyString) to strings
These key-value pairs define initialization parameters for the SerDe.
- SerializationLibrary
-
- Type: string
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
Session
Description
The period in which a remote Spark runtime environment is running.
Members
- Command
-
- Type: SessionCommand structure
The command object. See SessionCommand.
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that this session is completed.
- Connections
-
- Type: ConnectionsList structure
The number of connections used for the session.
- CreatedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time and date when the session was created.
- DPUSeconds
-
- Type: double
The DPUs consumed by the session (formula: ExecutionTime * MaxCapacity).
- DefaultArguments
-
- Type: Associative array of custom strings keys (OrchestrationNameString) to strings
A map array of key-value pairs. Max is 75 pairs.
- Description
-
- Type: string
The description of the session.
- ErrorMessage
-
- Type: string
The error message displayed during the session.
- ExecutionTime
-
- Type: double
The total time the session ran for.
- GlueVersion
-
- Type: string
The Glue version determines the versions of Apache Spark and Python that Glue supports. The GlueVersion must be greater than 2.0.
- Id
-
- Type: string
The ID of the session.
- IdleTimeout
-
- Type: int
The number of minutes when idle before the session times out.
- MaxCapacity
-
- Type: double
The number of Glue data processing units (DPUs) that can be allocated when the job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB memory.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined WorkerType to use for the session.
- ProfileName
-
- Type: string
The name of a Glue usage profile associated with the session.
- Progress
-
- Type: double
The code execution progress of the session.
- Role
-
- Type: string
The name or Amazon Resource Name (ARN) of the IAM role associated with the Session.
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used with the session.
- Status
-
- Type: string
The session status.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated when a session runs. Accepts a value of G.1X, G.2X, G.4X, or G.8X for Spark sessions. Accepts the value Z.2X for Ray sessions.
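The DPUSeconds formula given above (ExecutionTime * MaxCapacity) can be worked through with a short sketch. The function name `dpuSeconds` and the sample values are hypothetical, and the sketch assumes ExecutionTime is expressed in seconds, which the member description does not state explicitly.

```php
<?php
// Hypothetical helper (not part of the SDK): applies the documented formula
// DPUSeconds = ExecutionTime * MaxCapacity to a Session-shaped array.
function dpuSeconds(array $session): float
{
    return (float) $session['ExecutionTime'] * (float) $session['MaxCapacity'];
}

// Sample values only; ExecutionTime is assumed to be in seconds.
$session = [
    'Id'            => 'example-session',
    'ExecutionTime' => 120.0,
    'MaxCapacity'   => 2.0,
];

printf("%.1f DPU-seconds\n", dpuSeconds($session));
```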
SessionCommand
Description
The SessionCommand that runs the job.
Members
- Name
-
- Type: string
Specifies the name of the SessionCommand. Can be 'glueetl' or 'gluestreaming'.
- PythonVersion
-
- Type: string
Specifies the Python version. The Python version indicates the version supported for jobs of type Spark.
SkewedInfo
Description
Specifies skewed values in a table. Skewed values are those that occur with very high frequency.
Members
- SkewedColumnNames
-
- Type: Array of strings
A list of names of columns that contain skewed values.
- SkewedColumnValueLocationMaps
-
- Type: Associative array of custom strings keys (ColumnValuesString) to strings
A mapping of skewed values to the columns that contain them.
- SkewedColumnValues
-
- Type: Array of strings
A list of values that appear so frequently as to be considered skewed.
SnowflakeNodeData
Description
Specifies configuration for Snowflake nodes in Glue Studio.
Members
- Action
-
- Type: string
Specifies what action to take when writing to a table with preexisting data. Valid values: append, merge, truncate, drop.
- AdditionalOptions
-
- Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings
Specifies additional options passed to the Snowflake connector. If options are specified elsewhere in this node, this will take precedence.
- AutoPushdown
-
- Type: boolean
Specifies whether automatic query pushdown is enabled. If pushdown is enabled, then when a query is run on Spark, if part of the query can be "pushed down" to the Snowflake server, it is pushed down. This improves performance of some queries.
- Connection
-
- Type: Option structure
Specifies a Glue Data Catalog Connection to a Snowflake endpoint.
- Database
-
- Type: string
Specifies a Snowflake database for your node to use.
- IamRole
-
- Type: Option structure
Not currently used.
- MergeAction
-
- Type: string
Specifies a merge action. Valid values: simple, custom. If simple, merge behavior is defined by MergeWhenMatched and MergeWhenNotMatched. If custom, defined by MergeClause.
- MergeClause
-
- Type: string
A SQL statement that specifies a custom merge behavior.
- MergeWhenMatched
-
- Type: string
Specifies how to resolve records that match preexisting data when merging. Valid values: update, delete.
- MergeWhenNotMatched
-
- Type: string
Specifies how to process records that do not match preexisting data when merging. Valid values: insert, none.
- PostAction
-
- Type: string
A SQL string run after the Snowflake connector performs its standard actions.
- PreAction
-
- Type: string
A SQL string run before the Snowflake connector performs its standard actions.
- SampleQuery
-
- Type: string
A SQL string used to retrieve data with the query sourcetype.
- Schema
-
- Type: string
Specifies a Snowflake database schema for your node to use.
- SelectedColumns
-
- Type: Array of Option structures
Specifies the columns combined to identify a record when detecting matches for merges and upserts. A list of structures with value, label and description keys. Each structure describes a column.
- SourceType
-
- Type: string
Specifies how retrieved data is specified. Valid values: "table", "query".
- StagingTable
-
- Type: string
The name of a staging table used when performing merge or upsert append actions. Data is written to this table, then moved to table by a generated postaction.
- Table
-
- Type: string
Specifies a Snowflake table for your node to use.
- TableSchema
-
- Type: Array of Option structures
Manually defines the target schema for the node. A list of structures with value, label and description keys. Each structure defines a column.
- TempDir
-
- Type: string
Not currently used.
- Upsert
-
- Type: boolean
Used when Action is append. Specifies the resolution behavior when a row already exists. If true, preexisting rows will be updated. If false, those rows will be inserted.
SnowflakeSource
Description
Specifies a Snowflake data source.
Members
- Data
-
- Required: Yes
- Type: SnowflakeNodeData structure
Configuration for the Snowflake data source.
- Name
-
- Required: Yes
- Type: string
The name of the Snowflake data source.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies user-defined schemas for your output data.
SnowflakeTarget
Description
Specifies a Snowflake target.
Members
- Data
-
- Required: Yes
- Type: SnowflakeNodeData structure
Specifies the data of the Snowflake target node.
- Inputs
-
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the Snowflake target.
SortCriterion
Description
Specifies a field to sort by and a sort order.
Members
- FieldName
-
- Type: string
The name of the field on which to sort.
- Sort
-
- Type: string
An ascending or descending sort.
SourceControlDetails
Description
The details for a source control configuration for a job, allowing synchronization of job artifacts to or from a remote repository.
Members
- AuthStrategy
-
- Type: string
The type of authentication, which can be an authentication token stored in Amazon Web Services Secrets Manager, or a personal access token.
- AuthToken
-
- Type: string
The value of an authorization token.
- Branch
-
- Type: string
An optional branch in the remote repository.
- Folder
-
- Type: string
An optional folder in the remote repository.
- LastCommitId
-
- Type: string
The last commit ID for a commit in the remote repository.
- Owner
-
- Type: string
The owner of the remote repository that contains the job artifacts.
- Provider
-
- Type: string
The provider for the remote repository.
- Repository
-
- Type: string
The name of the remote repository that contains the job artifacts.
SparkConnectorSource
Description
Specifies a connector to an Apache Spark data source.
Members
- AdditionalOptions
-
- Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings
Additional connection options for the connector.
- ConnectionName
-
- Required: Yes
- Type: string
The name of the connection that is associated with the connector.
- ConnectionType
-
- Required: Yes
- Type: string
The type of connection, such as marketplace.spark or custom.spark, designating a connection to an Apache Spark data store.
- ConnectorName
-
- Required: Yes
- Type: string
The name of a connector that assists with accessing the data store in Glue Studio.
- Name
-
- Required: Yes
- Type: string
The name of the data source.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies data schema for the custom spark source.
SparkConnectorTarget
Description
Specifies a target that uses an Apache Spark connector.
Members
- AdditionalOptions
-
- Type: Associative array of custom strings keys (EnclosedInStringProperty) to strings
Additional connection options for the connector.
- ConnectionName
-
- Required: Yes
- Type: string
The name of a connection for an Apache Spark connector.
- ConnectionType
-
- Required: Yes
- Type: string
The type of connection, such as marketplace.spark or custom.spark, designating a connection to an Apache Spark data store.
- ConnectorName
-
- Required: Yes
- Type: string
The name of an Apache Spark connector.
- Inputs
-
- Required: Yes
- Type: Array of strings
The nodes that are inputs to the data target.
- Name
-
- Required: Yes
- Type: string
The name of the data target.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the custom spark target.
SparkSQL
Description
Specifies a transform where you enter a SQL query using Spark SQL syntax to transform the data. The output is a single DynamicFrame.
Members
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names. You can associate a table name with each input node to use in the SQL query. The name you choose must meet the Spark SQL naming restrictions.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
- OutputSchemas
-
- Type: Array of GlueSchema structures
Specifies the data schema for the SparkSQL transform.
- SqlAliases
-
- Required: Yes
- Type: Array of SqlAlias structures
A list of aliases. An alias allows you to specify what name to use in the SQL for a given input. For example, you have a datasource named "MyDataSource". If you specify From as MyDataSource, and Alias as SqlName, then in your SQL you can do: select * from SqlName and that gets data from MyDataSource.
- SqlQuery
-
- Required: Yes
- Type: string
A SQL query that must use Spark SQL syntax and return a single data set.
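The alias example from the SqlAliases description can be written out as a SparkSQL-shaped array. This is a sketch under stated assumptions: the node name `ExampleSparkSqlNode` is hypothetical, the input and alias names are taken from the description above, and the required-member check is an illustrative local test rather than SDK validation.

```php
<?php
// SparkSQL-shaped array using the names from the SqlAliases description.
// 'ExampleSparkSqlNode' is a made-up node name for illustration.
$sparkSql = [
    'Name'       => 'ExampleSparkSqlNode',
    'Inputs'     => ['MyDataSource'],
    'SqlAliases' => [
        // From names the input; Alias is the name used inside the query.
        ['From' => 'MyDataSource', 'Alias' => 'SqlName'],
    ],
    'SqlQuery'   => 'select * from SqlName',
];

// Members marked "Required: Yes" in the SparkSQL member list.
$required = ['Inputs', 'Name', 'SqlAliases', 'SqlQuery'];
$missing  = array_values(array_diff($required, array_keys($sparkSql)));

echo $missing === []
    ? 'all required members present'
    : 'missing: ' . implode(', ', $missing), PHP_EOL;
```

OutputSchemas is omitted here because it is not marked as required.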
Spigot
Description
Specifies a transform that writes samples of the data to an Amazon S3 bucket.
Members
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
- Path
-
- Required: Yes
- Type: string
A path in Amazon S3 where the transform will write a subset of records from the dataset to a JSON file in an Amazon S3 bucket.
- Prob
-
- Type: double
The probability (a decimal value with a maximum value of 1) of picking any given record. A value of 1 indicates that each row read from the dataset should be included in the sample output.
- Topk
-
- Type: int
Specifies a number of records to write starting from the beginning of the dataset.
SplitFields
Description
Specifies a transform that splits data property keys into two DynamicFrames. The output is a collection of DynamicFrames: one with selected data property keys, and one with the remaining data property keys.
Members
- Inputs
-
- Required: Yes
- Type: Array of strings
The data inputs identified by their node names.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
- Paths
-
- Required: Yes
- Type: Array of arrays of strings
A JSON path to a variable in the data structure.
SqlAlias
Description
Represents a single entry in the list of values for SqlAliases.
Members
- Alias
-
- Required: Yes
- Type: string
A temporary name given to a table, or a column in a table.
- From
-
- Required: Yes
- Type: string
A table, or a column in a table.
StartingEventBatchCondition
Description
The batch condition that started the workflow run. Either the number of events in the batch size arrived, in which case the BatchSize member is non-zero, or the batch window expired, in which case the BatchWindow member is non-zero.
Members
- BatchSize
-
- Type: int
Number of events in the batch.
- BatchWindow
-
- Type: int
Duration of the batch window in seconds.
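The rule in the description (exactly one of the two members is non-zero) can be sketched as a small function that reports which condition started the workflow run. The function name `describeBatchCondition` and the returned strings are hypothetical; the input is assumed to be a StartingEventBatchCondition-shaped array as it would appear in a result.

```php
<?php
// Hypothetical helper (not part of the SDK): reports which batch condition
// started the run, based on which member is non-zero.
function describeBatchCondition(array $condition): string
{
    $size   = $condition['BatchSize'] ?? 0;
    $window = $condition['BatchWindow'] ?? 0;

    if ($size > 0) {
        return "batch size reached ($size events)";
    }
    if ($window > 0) {
        return "batch window expired ($window seconds)";
    }
    return 'no batch condition reported';
}

echo describeBatchCondition(['BatchSize' => 10, 'BatchWindow' => 0]), PHP_EOL;
echo describeBatchCondition(['BatchSize' => 0, 'BatchWindow' => 900]), PHP_EOL;
```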
Statement
Description
The statement or request for a particular action to occur in a session.
Members
- Code
-
- Type: string
The execution code of the statement.
- CompletedOn
-
- Type: long (int|float)
The unix time and date that the job definition was completed.
- Id
-
- Type: int
The ID of the statement.
- Output
-
- Type: StatementOutput structure
The output in JSON.
- Progress
-
- Type: double
The code execution progress.
- StartedOn
-
- Type: long (int|float)
The unix time and date that the job definition was started.
- State
-
- Type: string
The state of the statement while the request is being processed.
StatementOutput
Description
The code execution output in JSON format.
Members
- Data
-
- Type: StatementOutputData structure
The code execution output.
- ErrorName
-
- Type: string
The name of the error in the output.
- ErrorValue
-
- Type: string
The error value of the output.
- ExecutionCount
-
- Type: int
The execution count of the output.
- Status
-
- Type: string
The status of the code execution output.
- Traceback
-
- Type: Array of strings
The traceback of the output.
StatementOutputData
Description
The code execution output in JSON format.
Members
- TextPlain
-
- Type: string
The code execution output in text format.
StatisticAnnotation
Description
A Statistic Annotation.
Members
- InclusionAnnotation
-
- Type: TimestampedInclusionAnnotation structure
The inclusion annotation applied to the statistic.
- ProfileId
-
- Type: string
The Profile ID.
- StatisticId
-
- Type: string
The Statistic ID.
- StatisticRecordedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp when the annotated statistic was recorded.
StatisticModelResult
Description
The statistic model result.
Members
- ActualValue
-
- Type: double
The actual value.
- Date
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date.
- InclusionAnnotation
-
- Type: string
The inclusion annotation.
- LowerBound
-
- Type: double
The lower bound.
- PredictedValue
-
- Type: double
The predicted value.
- UpperBound
-
- Type: double
The upper bound.
StatisticSummary
Description
Summary information about a statistic.
Members
- ColumnsReferenced
-
- Type: Array of strings
The list of columns referenced by the statistic.
- DoubleValue
-
- Type: double
The value of the statistic.
- EvaluationLevel
-
- Type: string
The evaluation level of the statistic. Possible values: Dataset, Column, Multicolumn.
- InclusionAnnotation
-
- Type: TimestampedInclusionAnnotation structure
The inclusion annotation for the statistic.
- ProfileId
-
- Type: string
The Profile ID.
- RecordedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp when the statistic was recorded.
- ReferencedDatasets
-
- Type: Array of strings
The list of datasets referenced by the statistic.
- RunIdentifier
-
- Type: RunIdentifier structure
The Run Identifier.
- StatisticId
-
- Type: string
The Statistic ID.
- StatisticName
-
- Type: string
The name of the statistic.
- StatisticProperties
-
- Type: Associative array of custom strings keys (NameString) to strings
A StatisticPropertiesMap, which contains a NameString and DescriptionString.
StatusDetails
Description
A structure containing information about an asynchronous change to a table.
Members
- RequestedChange
-
- Type: Table structure
A Table object representing the requested changes.
- ViewValidations
-
- Type: Array of ViewValidation structures
A list of ViewValidation objects that contain information for an analytical engine to validate a view.
StorageDescriptor
Description
Describes the physical storage of table data.
Members
- AdditionalLocations
-
- Type: Array of strings
A list of locations that point to the path where a Delta table is located.
- BucketColumns
-
- Type: Array of strings
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
- Columns
-
- Type: Array of Column structures
A list of the Columns in the table.
- Compressed
-
- Type: boolean
True if the data in the table is compressed, or False if not.
- InputFormat
-
- Type: string
The input format: SequenceFileInputFormat (binary), or TextInputFormat, or a custom format.
- Location
-
- Type: string
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
- NumberOfBuckets
-
- Type: int
Must be specified if the table contains any dimension columns.
- OutputFormat
-
- Type: string
The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat, or a custom format.
- Parameters
-
- Type: Associative array of custom strings keys (KeyString) to strings
The user-supplied properties in key-value form.
- SchemaReference
-
- Type: SchemaReference structure
An object that references a schema stored in the Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
- SerdeInfo
-
- Type: SerDeInfo structure
The serialization/deserialization (SerDe) information.
- SkewedInfo
-
- Type: SkewedInfo structure
The information about values that appear frequently in a column (skewed values).
- SortColumns
-
- Type: Array of Order structures
A list specifying the sort order of each bucket in the table.
- StoredAsSubDirectories
-
- Type: boolean
True if the table data is stored in subdirectories, or False if not.
StreamingDataPreviewOptions
Description
Specifies options related to data preview for viewing a sample of your data.
Members
- PollingTime
-
- Type: long (int|float)
The polling time in milliseconds.
- RecordPollingLimit
-
- Type: long (int|float)
The limit to the number of records polled.
StringColumnStatisticsData
Description
Defines column statistics supported for character sequence data values.
Members
- AverageLength
-
- Required: Yes
- Type: double
The average string length in the column.
- MaximumLength
-
- Required: Yes
- Type: long (int|float)
The size of the longest string in the column.
- NumberOfDistinctValues
-
- Required: Yes
- Type: long (int|float)
The number of distinct values in a column.
- NumberOfNulls
-
- Required: Yes
- Type: long (int|float)
The number of null values in the column.
SupportedDialect
Description
A structure specifying the dialect and dialect version used by the query engine.
Members
- Dialect
-
- Type: string
The dialect of the query engine.
- DialectVersion
-
- Type: string
The version of the dialect of the query engine. For example, 3.0.0.
Table
Description
Represents a collection of related data organized in columns and rows.
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the table resides.
- CreateTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time when the table definition was created in the Data Catalog.
- CreatedBy
-
- Type: string
The person or entity who created the table.
- DatabaseName
-
- Type: string
The name of the database where the table metadata resides. For Hive compatibility, this must be all lowercase.
- Description
-
- Type: string
A description of the table.
- FederatedTable
-
- Type: FederatedTable structure
A FederatedTable structure that references an entity outside the Glue Data Catalog.
- IsMultiDialectView
-
- Type: boolean
Specifies whether the view supports the SQL dialects of one or more different query engines and can therefore be read by those engines.
- IsRegisteredWithLakeFormation
-
- Type: boolean
Indicates whether the table has been registered with Lake Formation.
- LastAccessTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last time that the table was accessed. This is usually taken from HDFS, and might not be reliable.
- LastAnalyzedTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last time that column statistics were computed for this table.
- Name
-
- Required: Yes
- Type: string
The table name. For Hive compatibility, this must be entirely lowercase.
- Owner
-
- Type: string
The owner of the table.
- Parameters
-
- Type: Associative array of custom strings keys (KeyString) to strings
These key-value pairs define properties associated with the table.
- PartitionKeys
-
- Type: Array of Column structures
A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.
When you create a table used by Amazon Athena, and you do not specify any partitionKeys, you must at least set the value of partitionKeys to an empty list. For example: "PartitionKeys": []
- Retention
-
- Type: int
The retention time for this table.
- Status
-
- Type: TableStatus structure
A structure containing information about the state of an asynchronous change to a table.
- StorageDescriptor
-
- Type: StorageDescriptor structure
A storage descriptor containing information about the physical storage of this table.
- TableType
-
- Type: string
The type of this table. Glue will create tables with the EXTERNAL_TABLE type. Other services, such as Athena, may create tables with additional table types.
Glue related table types:
- EXTERNAL_TABLE
-
Hive compatible attribute - indicates a non-Hive managed table.
- GOVERNED
-
Used by Lake Formation. The Glue Data Catalog understands GOVERNED.
- TargetTable
-
- Type: TableIdentifier structure
A TableIdentifier structure that describes a target table for resource linking.
- UpdateTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last time that the table was updated.
- VersionId
-
- Type: string
The ID of the table version.
- ViewDefinition
-
- Type: ViewDefinition structure
A structure that contains all the information that defines the view, including the dialect or dialects for the view, and the query.
- ViewExpandedText
-
- Type: string
Included for Apache Hive compatibility. Not used in the normal course of Glue operations.
- ViewOriginalText
-
- Type: string
Included for Apache Hive compatibility. Not used in the normal course of Glue operations. If the table is a VIRTUAL_VIEW, this field holds certain Athena configuration encoded in base64.
TableError
Description
An error record for table operations.
Members
- ErrorDetail
-
- Type: ErrorDetail structure
The details about the error.
- TableName
-
- Type: string
The name of the table. For Hive compatibility, this must be entirely lowercase.
TableIdentifier
Description
A structure that describes a target table for resource linking.
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the table resides.
- DatabaseName
-
- Type: string
The name of the catalog database that contains the target table.
- Name
-
- Type: string
The name of the target table.
- Region
-
- Type: string
Region of the target table.
TableInput
Description
A structure used to define a table.
Members
- Description
-
- Type: string
A description of the table.
- LastAccessTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last time that the table was accessed.
- LastAnalyzedTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last time that column statistics were computed for this table.
- Name
-
- Required: Yes
- Type: string
The table name. For Hive compatibility, this is folded to lowercase when it is stored.
- Owner
-
- Type: string
The table owner. Included for Apache Hive compatibility. Not used in the normal course of Glue operations.
- Parameters
-
- Type: Associative array of custom strings keys (KeyString) to strings
These key-value pairs define properties associated with the table.
- PartitionKeys
-
- Type: Array of Column structures
A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.
When you create a table used by Amazon Athena, and you do not specify any partitionKeys, you must at least set the value of partitionKeys to an empty list. For example: "PartitionKeys": []
- Retention
-
- Type: int
The retention time for this table.
- StorageDescriptor
-
- Type: StorageDescriptor structure
A storage descriptor containing information about the physical storage of this table.
- TableType
-
- Type: string
The type of this table. Glue will create tables with the EXTERNAL_TABLE type. Other services, such as Athena, may create tables with additional table types.
Glue related table types:
- EXTERNAL_TABLE
-
Hive compatible attribute - indicates a non-Hive managed table.
- GOVERNED
-
Used by Lake Formation. The Glue Data Catalog understands GOVERNED.
- TargetTable
-
- Type: TableIdentifier structure
A TableIdentifier structure that describes a target table for resource linking.
- ViewDefinition
-
- Type: ViewDefinitionInput structure
A structure that contains all the information that defines the view, including the dialect or dialects for the view, and the query.
- ViewExpandedText
-
- Type: string
Included for Apache Hive compatibility. Not used in the normal course of Glue operations.
- ViewOriginalText
-
- Type: string
Included for Apache Hive compatibility. Not used in the normal course of Glue operations. If the table is a VIRTUAL_VIEW, this field holds certain Athena configuration encoded in base64.
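As a rough illustration of the PartitionKeys note in TableInput, the sketch below assembles a TableInput array for an unpartitioned table. The helper buildTableInput is hypothetical (it is not part of the SDK), and the table name, columns, database name, and S3 location are made up. The commented-out call assumes an already configured Aws\Glue\GlueClient in $client.

```php
<?php
// Hypothetical helper (not part of the SDK): builds a TableInput array.
function buildTableInput(string $name, array $columns, array $partitionKeys = []): array
{
    return [
        // Lowercased up front, since table names are stored lowercase
        // for Hive compatibility.
        'Name' => strtolower($name),
        'TableType' => 'EXTERNAL_TABLE',
        // Always sent, even when empty, so that Athena can use the table.
        'PartitionKeys' => $partitionKeys,
        'StorageDescriptor' => [
            'Columns' => $columns,
            'Location' => 's3://example-bucket/events/', // hypothetical location
        ],
    ];
}

$tableInput = buildTableInput('Events', [
    ['Name' => 'event_id', 'Type' => 'string'],
    ['Name' => 'created_at', 'Type' => 'timestamp'],
]);

// With a configured client, the array would be passed along these lines
// (the database name is hypothetical):
// $client->createTable(['DatabaseName' => 'example_db', 'TableInput' => $tableInput]);
```

Keeping PartitionKeys as a defaulted argument means the key is never omitted by accident, which is the case the Athena note warns about.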
TableOptimizer
Description
Contains details about an optimizer associated with a table.
Members
- configuration
-
- Type: TableOptimizerConfiguration structure
A TableOptimizerConfiguration object that was specified when creating or updating a table optimizer.
- lastRun
-
- Type: TableOptimizerRun structure
A TableOptimizerRun object representing the last run of the table optimizer.
- type
-
- Type: string
The type of table optimizer. The valid values are:
- compaction: for managing compaction with a table optimizer.
- retention: for managing the retention of snapshots with a table optimizer.
- orphan_file_deletion: for managing the deletion of orphan files with a table optimizer.
TableOptimizerConfiguration
Description
Contains details on the configuration of a table optimizer. You pass this configuration when creating or updating a table optimizer.
Members
- enabled
-
- Type: boolean
Whether table optimization is enabled.
- orphanFileDeletionConfiguration
-
- Type: OrphanFileDeletionConfiguration structure
The configuration for an orphan file deletion optimizer.
- retentionConfiguration
-
- Type: RetentionConfiguration structure
The configuration for a snapshot retention optimizer.
- roleArn
-
- Type: string
A role passed by the caller which gives the service permission to update the resources associated with the optimizer on the caller's behalf.
TableOptimizerRun
Description
Contains details for a table optimizer run.
Members
- compactionMetrics
-
- Type: CompactionMetrics structure
A CompactionMetrics object containing metrics for the optimizer run.
- endTimestamp
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Represents the epoch timestamp at which the compaction job ended.
- error
-
- Type: string
An error that occurred during the optimizer run.
- eventType
-
- Type: string
An event type representing the status of the table optimizer run.
- metrics
-
- Type: RunMetrics structure
A RunMetrics object containing metrics for the optimizer run.
This member is deprecated. See the individual metric members for compaction, retention, and orphan file deletion.
- orphanFileDeletionMetrics
-
- Type: OrphanFileDeletionMetrics structure
An OrphanFileDeletionMetrics object containing metrics for the optimizer run.
- retentionMetrics
-
- Type: RetentionMetrics structure
A RetentionMetrics object containing metrics for the optimizer run.
- startTimestamp
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Represents the epoch timestamp at which the compaction job was started within Lake Formation.
TableStatus
Description
A structure containing information about the state of an asynchronous change to a table.
Members
- Action
-
- Type: string
Indicates which action was called on the table, currently only CREATE or UPDATE.
- Details
-
- Type: StatusDetails structure
A StatusDetails object with information about the requested change.
- Error
-
- Type: ErrorDetail structure
An error that will only appear when the state is "FAILED". This is a parent-level exception message; there may be different Errors for each dialect.
- RequestTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
An ISO 8601 formatted date string indicating the time that the change was initiated.
- RequestedBy
-
- Type: string
The ARN of the user who requested the asynchronous change.
- State
-
- Type: string
A generic status for the change in progress, such as QUEUED, IN_PROGRESS, SUCCESS, or FAILED.
- UpdateTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
An ISO 8601 formatted date string indicating the time that the state was last updated.
- UpdatedBy
-
- Type: string
The ARN of the user to last manually alter the asynchronous change (requesting cancellation, etc).
TableVersion
Description
Specifies a version of a table.
Members
- Table
-
- Type: Table structure
The table in question.
- VersionId
-
- Type: string
The ID value that identifies this table version. A VersionId is a string representation of an integer. Each version is incremented by 1.
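Because a VersionId is an integer carried as a string, ordering versions is easier to get right with an explicit integer comparison. The sketch below uses made-up version IDs rather than a real API response.

```php
<?php
// Sample VersionId values as they might appear in TableVersion structures (made up).
$versionIds = ['10', '2', '1', '9'];

// A lexical comparison would place "10" before "2", so compare as integers.
usort($versionIds, fn(string $a, string $b): int => (int) $a <=> (int) $b);

// The highest version present in the sample.
$latest = end($versionIds);

// Each version is incremented by 1, so the following ID would be expected to be:
$next = (string) ((int) $latest + 1);
```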
TableVersionError
Description
An error record for table-version operations.
Members
- ErrorDetail
-
- Type: ErrorDetail structure
The details about the error.
- TableName
-
- Type: string
The name of the table in question.
- VersionId
-
- Type: string
The ID value of the version in question. A VersionId is a string representation of an integer. Each version is incremented by 1.
TaskRun
Description
The sampling parameters that are associated with the machine learning transform.
Members
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last point in time that the requested task run was completed.
- ErrorString
-
- Type: string
The list of error strings associated with this task run.
- ExecutionTime
-
- Type: int
The amount of time (in seconds) that the task run consumed resources.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last point in time that the requested task run was updated.
- LogGroupName
-
- Type: string
The names of the log group for secure logging, associated with this task run.
- Properties
-
- Type: TaskRunProperties structure
Specifies configuration properties associated with this task run.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that this task run started.
- Status
-
- Type: string
The current status of the requested task run.
- TaskRunId
-
- Type: string
The unique identifier for this task run.
- TransformId
-
- Type: string
The unique identifier for the transform.
TaskRunFilterCriteria
Description
The criteria that are used to filter the task runs for the machine learning transform.
Members
- StartedAfter
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Filter on task runs started after this date.
- StartedBefore
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Filter on task runs started before this date.
- Status
-
- Type: string
The current status of the task run.
- TaskRunType
-
- Type: string
The type of task run.
TaskRunProperties
Description
The configuration properties for the task run.
Members
- ExportLabelsTaskRunProperties
-
- Type: ExportLabelsTaskRunProperties structure
The configuration properties for an exporting labels task run.
- FindMatchesTaskRunProperties
-
- Type: FindMatchesTaskRunProperties structure
The configuration properties for a find matches task run.
- ImportLabelsTaskRunProperties
-
- Type: ImportLabelsTaskRunProperties structure
The configuration properties for an importing labels task run.
- LabelingSetGenerationTaskRunProperties
-
- Type: LabelingSetGenerationTaskRunProperties structure
The configuration properties for a labeling set generation task run.
- TaskType
-
- Type: string
The type of task run.
TaskRunSortCriteria
Description
The sorting criteria that are used to sort the list of task runs for the machine learning transform.
Members
- Column
-
- Required: Yes
- Type: string
The column to be used to sort the list of task runs for the machine learning transform.
- SortDirection
-
- Required: Yes
- Type: string
The sort direction to be used to sort the list of task runs for the machine learning transform.
TestConnectionInput
Description
A structure that is used to specify testing a connection to a service.
Members
- AuthenticationConfiguration
-
- Type: AuthenticationConfigurationInput structure
A structure containing the authentication configuration in the TestConnection request. Required for a connection to Salesforce using OAuth authentication.
- ConnectionProperties
-
- Required: Yes
- Type: Associative array of custom strings keys (ConnectionPropertyKey) to strings
The key-value pairs that define parameters for the connection.
JDBC connections use the following connection properties:
- Required: All of (HOST, PORT, JDBC_ENGINE) or JDBC_CONNECTION_URL.
- Required: All of (USERNAME, PASSWORD) or SECRET_ID.
- Optional: JDBC_ENFORCE_SSL, CUSTOM_JDBC_CERT, CUSTOM_JDBC_CERT_STRING, SKIP_CUSTOM_JDBC_CERT_VALIDATION. These parameters are used to configure SSL with JDBC.
SALESFORCE connections require the AuthenticationConfiguration member to be configured.
- ConnectionType
-
- Required: Yes
- Type: string
The type of connection to test. This operation is only available for the JDBC or SALESFORCE connection types.
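The JDBC rules listed under ConnectionProperties can be checked locally before a request is sent. The following is a minimal sketch, not SDK functionality: missingJdbcRequirements is a hypothetical helper, and the connection URL and secret ID are invented placeholders.

```php
<?php
// Hypothetical helper (not part of the SDK): reports which of the two
// required groups of JDBC connection properties is not satisfied.
function missingJdbcRequirements(array $props): array
{
    $has = fn(string $key): bool => isset($props[$key]) && $props[$key] !== '';
    $problems = [];

    // Either HOST + PORT + JDBC_ENGINE, or a complete JDBC_CONNECTION_URL.
    $hasParts = $has('HOST') && $has('PORT') && $has('JDBC_ENGINE');
    if (!$hasParts && !$has('JDBC_CONNECTION_URL')) {
        $problems[] = 'endpoint';
    }

    // Either USERNAME + PASSWORD, or a SECRET_ID.
    $hasLogin = $has('USERNAME') && $has('PASSWORD');
    if (!$hasLogin && !$has('SECRET_ID')) {
        $problems[] = 'credentials';
    }

    return $problems;
}

$testConnectionInput = [
    'ConnectionType' => 'JDBC',
    'ConnectionProperties' => [
        'JDBC_CONNECTION_URL' => 'jdbc:postgresql://db.example.com:5432/sales', // hypothetical
        'SECRET_ID' => 'example/secret', // hypothetical
        'JDBC_ENFORCE_SSL' => 'true',
    ],
];

$problems = missingJdbcRequirements($testConnectionInput['ConnectionProperties']);
```

An empty $problems array only means the required keys are present; it says nothing about whether the values are accepted by the service.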
ThrottlingException
Description
The throttling threshold was exceeded.
Members
- Message
-
- Type: string
A message describing the problem.
TimestampFilter
Description
A timestamp filter.
Members
- RecordedAfter
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp after which statistics should be included in the results.
- RecordedBefore
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp before which statistics should be included in the results.
TimestampedInclusionAnnotation
Description
A timestamped inclusion annotation.
Members
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp when the inclusion annotation was last modified.
- Value
-
- Type: string
The inclusion annotation value.
TransformConfigParameter
Description
Specifies the parameters in the config file of the dynamic transform.
Members
- IsOptional
-
- Type: boolean
Specifies whether the parameter is optional or not in the config file of the dynamic transform.
- ListType
-
- Type: string
Specifies the list type of the parameter in the config file of the dynamic transform.
- Name
-
- Required: Yes
- Type: string
Specifies the name of the parameter in the config file of the dynamic transform.
- Type
-
- Required: Yes
- Type: string
Specifies the parameter type in the config file of the dynamic transform.
- ValidationMessage
-
- Type: string
Specifies the validation message in the config file of the dynamic transform.
- ValidationRule
-
- Type: string
Specifies the validation rule in the config file of the dynamic transform.
- Value
-
- Type: Array of strings
Specifies the value of the parameter in the config file of the dynamic transform.
TransformEncryption
Description
The encryption-at-rest settings of the transform that apply to accessing user data. Machine learning transforms can access user data encrypted in Amazon S3 using KMS.
Additionally, imported labels and trained transforms can now be encrypted using a customer provided KMS key.
Members
- MlUserDataEncryption
-
- Type: MLUserDataEncryption structure
An MLUserDataEncryption object containing the encryption mode and customer-provided KMS key ID.
- TaskRunSecurityConfigurationName
-
- Type: string
The name of the security configuration.
TransformFilterCriteria
Description
The criteria used to filter the machine learning transforms.
Members
- CreatedAfter
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time and date after which the transforms were created.
- CreatedBefore
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time and date before which the transforms were created.
- GlueVersion
-
- Type: string
This value determines which version of Glue this machine learning transform is compatible with. Glue 1.0 is recommended for most customers. If the value is not set, the Glue compatibility defaults to Glue 0.9. For more information, see Glue Versions in the developer guide.
- LastModifiedAfter
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Filter on transforms last modified after this date.
- LastModifiedBefore
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
Filter on transforms last modified before this date.
- Name
-
- Type: string
A unique transform name that is used to filter the machine learning transforms.
- Schema
-
- Type: Array of SchemaColumn structures
Filters on datasets with a specific schema. The Map<Column, Type> object is an array of key-value pairs representing the schema this transform accepts, where Column is the name of a column, and Type is the type of the data such as an integer or string. Has an upper bound of 100 columns.
- Status
-
- Type: string
Filters the list of machine learning transforms by the last known status of the transforms (to indicate whether a transform can be used or not). One of "NOT_READY", "READY", or "DELETING".
- TransformType
-
- Type: string
The type of machine learning transform that is used to filter the machine learning transforms.
TransformParameters
Description
The algorithm-specific parameters that are associated with the machine learning transform.
Members
- FindMatchesParameters
-
- Type: FindMatchesParameters structure
The parameters for the find matches algorithm.
- TransformType
-
- Required: Yes
- Type: string
The type of machine learning transform.
For information about the types of machine learning transforms, see Creating Machine Learning Transforms.
TransformSortCriteria
Description
The sorting criteria that are associated with the machine learning transform.
Members
- Column
-
- Required: Yes
- Type: string
The column to be used in the sorting criteria that are associated with the machine learning transform.
- SortDirection
-
- Required: Yes
- Type: string
The sort direction to be used in the sorting criteria that are associated with the machine learning transform.
Trigger
Description
Information about a specific trigger.
Members
- Actions
-
- Type: Array of Action structures
The actions initiated by this trigger.
- Description
-
- Type: string
A description of this trigger.
- EventBatchingCondition
-
- Type: EventBatchingCondition structure
Batch condition that must be met (specified number of events received or batch time window expired) before EventBridge event trigger fires.
- Id
-
- Type: string
Reserved for future use.
- Name
-
- Type: string
The name of the trigger.
- Predicate
-
- Type: Predicate structure
The predicate of this trigger, which defines when it will fire.
- Schedule
-
- Type: string
A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
- State
-
- Type: string
The current state of the trigger.
- Type
-
- Type: string
The type of trigger that this is.
- WorkflowName
-
- Type: string
The name of the workflow associated with the trigger.
TriggerNodeDetails
Description
The details of a Trigger node present in the workflow.
Members
- Trigger
-
- Type: Trigger structure
The information of the trigger represented by the trigger node.
TriggerUpdate
Description
A structure used to provide information used to update a trigger. This object updates the previous trigger definition by overwriting it completely.
Members
- Actions
-
- Type: Array of Action structures
The actions initiated by this trigger.
- Description
-
- Type: string
A description of this trigger.
- EventBatchingCondition
-
- Type: EventBatchingCondition structure
Batch condition that must be met (specified number of events received or batch time window expired) before EventBridge event trigger fires.
- Name
-
- Type: string
Reserved for future use.
- Predicate
-
- Type: Predicate structure
The predicate of this trigger, which defines when it will fire.
- Schedule
-
- Type: string
A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
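The Schedule format can be produced programmatically. The sketch below builds the daily 12:15 UTC expression from the example and places it in a TriggerUpdate array. dailyCron is a hypothetical helper, the trigger name is made up, and the commented-out call assumes a configured Aws\Glue\GlueClient in $client.

```php
<?php
// Hypothetical helper (not part of the SDK): formats a daily schedule expression.
function dailyCron(int $hourUtc, int $minute): string
{
    if ($hourUtc < 0 || $hourUtc > 23 || $minute < 0 || $minute > 59) {
        throw new InvalidArgumentException('Hour must be 0-23 and minute 0-59.');
    }
    // Field order: minutes, hours, day-of-month, month, day-of-week, year.
    return sprintf('cron(%d %d * * ? *)', $minute, $hourUtc);
}

$triggerUpdate = [
    'Description' => 'Daily run at 12:15 UTC',
    'Schedule' => dailyCron(12, 15),
];

// With a configured client (the trigger name is hypothetical):
// $client->updateTrigger(['Name' => 'example-trigger', 'TriggerUpdate' => $triggerUpdate]);
```

Since a TriggerUpdate overwrites the previous definition completely, a real update would normally also carry the Actions and any Predicate that should be kept.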
UnfilteredPartition
Description
A partition that contains unfiltered metadata.
Members
- AuthorizedColumns
-
- Type: Array of strings
The list of columns the user has permissions to access.
- IsRegisteredWithLakeFormation
-
- Type: boolean
A Boolean value indicating that the partition location is registered with Lake Formation.
- Partition
-
- Type: Partition structure
The partition object.
Union
Description
Specifies a transform that combines the rows from two or more datasets into a single result.
Members
- Inputs
-
- Required: Yes
- Type: Array of strings
The node ID inputs to the transform.
- Name
-
- Required: Yes
- Type: string
The name of the transform node.
- UnionType
-
- Required: Yes
- Type: string
Indicates the type of Union transform.
Specify ALL to join all rows from data sources to the resulting DynamicFrame. The resulting union does not remove duplicate rows.
Specify DISTINCT to remove duplicate rows in the resulting DynamicFrame.
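The difference between the two UnionType values can be pictured with plain PHP arrays standing in for datasets. This only illustrates the row semantics described for Union; it does not use or emulate the Glue runtime, and the rows are invented.

```php
<?php
// Plain-array stand-ins for two input datasets (made-up rows).
$a = [['id' => 1], ['id' => 2]];
$b = [['id' => 2], ['id' => 3]];

// ALL: every row from every input is kept, duplicates included.
$unionAll = array_merge($a, $b);

// DISTINCT: duplicate rows are dropped from the combined result.
$seen = [];
$unionDistinct = [];
foreach ($unionAll as $row) {
    $key = serialize($row); // identical rows serialize to the same string
    if (!isset($seen[$key])) {
        $seen[$key] = true;
        $unionDistinct[] = $row;
    }
}
```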
UpdateCsvClassifierRequest
Description
Specifies a custom CSV classifier to be updated.
Members
- AllowSingleColumn
-
- Type: boolean
Enables the processing of files that contain only one column.
- ContainsHeader
-
- Type: string
Indicates whether the CSV file contains a header.
- CustomDatatypeConfigured
-
- Type: boolean
Specifies the configuration of custom datatypes.
- CustomDatatypes
-
- Type: Array of strings
Specifies a list of supported custom datatypes.
- Delimiter
-
- Type: string
A custom symbol to denote what separates each column entry in the row.
- DisableValueTrimming
-
- Type: boolean
Specifies not to trim values before identifying the type of column values. The default value is true.
- Header
-
- Type: Array of strings
A list of strings representing column names.
- Name
-
- Required: Yes
- Type: string
The name of the classifier.
- QuoteSymbol
-
- Type: string
A custom symbol to denote what combines content into a single column value. It must be different from the column delimiter.
- Serde
-
- Type: string
Sets the SerDe for processing CSV in the classifier, which will be applied in the Data Catalog. Valid values are OpenCSVSerDe, LazySimpleSerDe, and None. You can specify the None value when you want the crawler to do the detection.
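Two constraints stated for UpdateCsvClassifierRequest (QuoteSymbol must differ from the delimiter, and Serde has three valid values) lend themselves to a local check. In the sketch below, checkCsvClassifierUpdate is a hypothetical helper and the classifier name is made up. The commented-out call assumes a configured Aws\Glue\GlueClient in $client.

```php
<?php
// Hypothetical helper (not part of the SDK): sanity-checks a CSV classifier update.
function checkCsvClassifierUpdate(array $update): array
{
    $errors = [];

    // Name is the only required member.
    if (empty($update['Name'])) {
        $errors[] = 'Name is required.';
    }

    // QuoteSymbol must be different from the column delimiter.
    if (isset($update['QuoteSymbol'], $update['Delimiter'])
        && $update['QuoteSymbol'] === $update['Delimiter']) {
        $errors[] = 'QuoteSymbol must differ from Delimiter.';
    }

    // Serde accepts only the three documented values.
    $serdes = ['OpenCSVSerDe', 'LazySimpleSerDe', 'None'];
    if (isset($update['Serde']) && !in_array($update['Serde'], $serdes, true)) {
        $errors[] = 'Unsupported Serde value.';
    }

    return $errors;
}

$csvUpdate = [
    'Name' => 'example-csv-classifier', // hypothetical classifier name
    'Delimiter' => ',',
    'QuoteSymbol' => '"',
    'Header' => ['id', 'name'],
    'Serde' => 'None', // leave SerDe detection to the crawler
];
$errors = checkCsvClassifierUpdate($csvUpdate);

// With a configured client:
// $client->updateClassifier(['CsvClassifier' => $csvUpdate]);
```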
UpdateGrokClassifierRequest
Description
Specifies a grok classifier to update when passed to UpdateClassifier.
Members
- Classification
-
- Type: string
An identifier of the data format that the classifier matches, such as Twitter, JSON, Omniture logs, Amazon CloudWatch Logs, and so on.
- CustomPatterns
-
- Type: string
Optional custom grok patterns used by this classifier.
- GrokPattern
-
- Type: string
The grok pattern used by this classifier.
- Name
-
- Required: Yes
- Type: string
The name of the GrokClassifier.
UpdateJsonClassifierRequest
Description
Specifies a JSON classifier to be updated.
Members
- JsonPath
-
- Type: string
A JsonPath string defining the JSON data for the classifier to classify. Glue supports a subset of JsonPath, as described in Writing JsonPath Custom Classifiers.
- Name
-
- Required: Yes
- Type: string
The name of the classifier.
UpdateXMLClassifierRequest
Description
Specifies an XML classifier to be updated.
Members
- Classification
-
- Type: string
An identifier of the data format that the classifier matches.
- Name
-
- Required: Yes
- Type: string
The name of the classifier.
- RowTag
-
- Type: string
The XML tag designating the element that contains each record in an XML document being parsed. This cannot identify a self-closing element (closed by />). An empty row element that contains only attributes can be parsed as long as it ends with a closing tag (for example, <row item_a="A" item_b="B"></row> is okay, but <row item_a="A" item_b="B" /> is not).
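To spot the self-closing case described for RowTag before configuring a classifier, a sample document can be scanned for it. The following is a simplified sketch using a regular expression rather than a full XML parser; hasSelfClosingRow is a hypothetical helper and does not handle attribute values that contain a > character.

```php
<?php
// Hypothetical helper (not part of the SDK): returns true when the sample
// contains a self-closing element with the given row tag, e.g. <row ... />.
function hasSelfClosingRow(string $xml, string $rowTag): bool
{
    // Match "<tag", optional attributes, then "/>".
    $pattern = '/<' . preg_quote($rowTag, '/') . '(\s[^>]*)?\/>/';
    return preg_match($pattern, $xml) === 1;
}

$parsable   = '<rows><row item_a="A" item_b="B"></row></rows>';
$unparsable = '<rows><row item_a="A" item_b="B" /></rows>';

$okFirst  = !hasSelfClosingRow($parsable, 'row');   // closing tag present
$okSecond = !hasSelfClosingRow($unparsable, 'row'); // self-closing row found
```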
UpsertRedshiftTargetOptions
Description
The options to configure an upsert operation when writing to a Redshift target.
Members
- ConnectionName
-
- Type: string
The name of the connection to use to write to Redshift.
- TableLocation
-
- Type: string
The physical location of the Redshift table.
- UpsertKeys
-
- Type: Array of strings
The keys used to determine whether to perform an update or insert.
UsageProfileDefinition
Description
Describes a Glue usage profile.
Members
- CreatedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the usage profile was created.
- Description
-
- Type: string
A description of the usage profile.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the usage profile was last modified.
- Name
-
- Type: string
The name of the usage profile.
UserDefinedFunction
Description
Represents the equivalent of a Hive user-defined function (UDF) definition.
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the function resides.
- ClassName
-
- Type: string
The Java class that contains the function code.
- CreateTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time at which the function was created.
- DatabaseName
-
- Type: string
The name of the catalog database that contains the function.
- FunctionName
-
- Type: string
The name of the function.
- OwnerName
-
- Type: string
The owner of the function.
- OwnerType
-
- Type: string
The owner type.
- ResourceUris
-
- Type: Array of ResourceUri structures
The resource URIs for the function.
UserDefinedFunctionInput
Description
A structure used to create or update a user-defined function.
Members
- ClassName
-
- Type: string
The Java class that contains the function code.
- FunctionName
-
- Type: string
The name of the function.
- OwnerName
-
- Type: string
The owner of the function.
- OwnerType
-
- Type: string
The owner type.
- ResourceUris
-
- Type: Array of ResourceUri structures
The resource URIs for the function.
ValidationException
Description
A value could not be validated.
Members
- Message
-
- Type: string
A message describing the problem.
VersionMismatchException
Description
There was a version conflict.
Members
- Message
-
- Type: string
A message describing the problem.
ViewDefinition
Description
A structure containing details for representations.
Members
- Definer
-
- Type: string
The definer of a view in SQL.
- IsProtected
-
- Type: boolean
You can set this flag as true to instruct the engine not to push user-provided operations into the logical plan of the view during query planning. However, setting this flag does not guarantee that the engine will comply. Refer to the engine's documentation to understand the guarantees provided, if any.
- Representations
-
- Type: Array of ViewRepresentation structures
A list of representations.
- SubObjects
-
- Type: Array of strings
A list of table Amazon Resource Names (ARNs).
ViewDefinitionInput
Description
A structure containing details for creating or updating a Glue view.
Members
- Definer
-
- Type: string
The definer of a view in SQL.
- IsProtected
-
- Type: boolean
You can set this flag as true to instruct the engine not to push user-provided operations into the logical plan of the view during query planning. However, setting this flag does not guarantee that the engine will comply. Refer to the engine's documentation to understand the guarantees provided, if any.
- Representations
-
- Type: Array of ViewRepresentationInput structures
A list of structures that contains the dialect of the view, and the query that defines the view.
- SubObjects
-
- Type: Array of strings
A list of base table ARNs that make up the view.
ViewRepresentation
Description
A structure that contains the dialect of the view, and the query that defines the view.
Members
- Dialect
-
- Type: string
The dialect of the query engine.
- DialectVersion
-
- Type: string
The version of the dialect of the query engine. For example, 3.0.0.
- IsStale
-
- Type: boolean
Dialects marked as stale are no longer valid and must be updated before they can be queried in their respective query engines.
- ValidationConnection
-
- Type: string
The name of the connection to be used to validate the specific representation of the view.
- ViewExpandedText
-
- Type: string
The expanded SQL for the view. This SQL is used by engines while processing a query on a view. Engines may perform operations during view creation to transform ViewOriginalText to ViewExpandedText. For example:
- Fully qualified identifiers: SELECT * from table1 -> SELECT * from db1.table1
- ViewOriginalText
-
- Type: string
The SELECT query provided by the customer during CREATE VIEW DDL. This SQL is not used during a query on a view (ViewExpandedText is used instead). ViewOriginalText is used for cases like SHOW CREATE VIEW where users want to see the original DDL command that created the view.
ViewRepresentationInput
Description
A structure containing details of a representation to update or create a Lake Formation view.
Members
- Dialect
-
- Type: string
A parameter that specifies the engine type of a specific representation.
- DialectVersion
-
- Type: string
A parameter that specifies the version of the engine of a specific representation.
- ValidationConnection
-
- Type: string
The name of the connection to be used to validate the specific representation of the view.
- ViewExpandedText
-
- Type: string
A string that represents the SQL query that describes the view with expanded resource ARNs.
- ViewOriginalText
-
- Type: string
A string that represents the original SQL query that describes the view.
ViewValidation
Description
A structure that contains information for an analytical engine to validate a view, prior to persisting the view metadata. Used in the case of direct UpdateTable or CreateTable API calls.
Members
- Dialect
-
- Type: string
The dialect of the query engine.
- DialectVersion
-
- Type: string
The version of the dialect of the query engine. For example, 3.0.0.
- Error
-
- Type: ErrorDetail structure
An error associated with the validation.
- State
-
- Type: string
The state of the validation.
- UpdateTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time of the last update.
- ViewValidationText
-
- Type: string
The SELECT query that defines the view, as provided by the customer.
Workflow
Description
A workflow is a collection of multiple dependent Glue jobs and crawlers that are run to complete a complex ETL task. A workflow manages the execution and monitoring of all its jobs and crawlers.
Members
- BlueprintDetails
-
- Type: BlueprintDetails structure
This structure indicates the details of the blueprint that this particular workflow is created from.
- CreatedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the workflow was created.
- DefaultRunProperties
-
- Type: Associative array of custom strings keys (IdString) to strings
A collection of properties to be used as part of each execution of the workflow. The run properties are made available to each job in the workflow. A job can modify the properties for the next jobs in the flow.
- Description
-
- Type: string
A description of the workflow.
- Graph
-
- Type: WorkflowGraph structure
The graph representing all the Glue components that belong to the workflow as nodes and directed connections between them as edges.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the workflow was last modified.
- LastRun
-
- Type: WorkflowRun structure
The information about the last execution of the workflow.
- MaxConcurrentRuns
-
- Type: int
You can use this parameter to prevent unwanted multiple updates to data, to control costs, or in some cases, to prevent exceeding the maximum number of concurrent runs of any of the component jobs. If you leave this parameter blank, there is no limit to the number of concurrent workflow runs.
- Name
-
- Type: string
The name of the workflow.
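The MaxConcurrentRuns rule above (no limit when the value is left blank) can be mirrored in a small client-side check. This is a sketch under assumptions: canStartAnotherRun is a hypothetical helper, the count of active runs is supplied by the caller, and the service itself remains the authority on whether a run may start.

```php
<?php
// Hypothetical helper (not part of the SDK): mirrors the documented rule
// that a missing MaxConcurrentRuns means no limit on concurrent runs.
function canStartAnotherRun(array $workflow, int $activeRuns): bool
{
    if (!isset($workflow['MaxConcurrentRuns'])) {
        return true; // no limit configured
    }
    return $activeRuns < (int) $workflow['MaxConcurrentRuns'];
}

// Sample arrays shaped like the Workflow structure (names are made up).
$limited = ['Name' => 'nightly-etl', 'MaxConcurrentRuns' => 2];
$unlimited = ['Name' => 'adhoc-etl'];

var_dump(canStartAnotherRun($limited, 1));
var_dump(canStartAnotherRun($limited, 2));
var_dump(canStartAnotherRun($unlimited, 50));
```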
WorkflowGraph
Description
A workflow graph represents the complete workflow containing all the Glue components present in the workflow and all the directed connections between them.
Members
WorkflowRun
Description
A workflow run is an execution of a workflow providing all the runtime information.
Members
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the workflow run completed.
- ErrorMessage
-
- Type: string
This error message describes any error that may have occurred in starting the workflow run. Currently the only error message is "Concurrent runs exceeded for workflow: foo."
- Graph
-
- Type: WorkflowGraph structure
The graph representing all the Glue components that belong to the workflow as nodes and directed connections between them as edges.
- Name
-
- Type: string
Name of the workflow that was run.
- PreviousRunId
-
- Type: string
The ID of the previous workflow run.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the workflow run was started.
- StartingEventBatchCondition
-
- Type: StartingEventBatchCondition structure
The batch condition that started the workflow run.
- Statistics
-
- Type: WorkflowRunStatistics structure
The statistics of the run.
- Status
-
- Type: string
The status of the workflow run.
- WorkflowRunId
-
- Type: string
The ID of this workflow run.
- WorkflowRunProperties
-
- Type: Associative array of custom strings keys (IdString) to strings
The workflow run properties which were set during the run.
WorkflowRunStatistics
Description
Workflow run statistics provides statistics about the workflow run.
Members
- ErroredActions
-
- Type: int
Indicates the count of job runs in the ERROR state in the workflow run.
- FailedActions
-
- Type: int
Total number of Actions that have failed.
- RunningActions
-
- Type: int
Total number of Actions in running state.
- StoppedActions
-
- Type: int
Total number of Actions that have stopped.
- SucceededActions
-
- Type: int
Total number of Actions that have succeeded.
- TimeoutActions
-
- Type: int
Total number of Actions that timed out.
- TotalActions
-
- Type: int
Total number of Actions in the workflow run.
- WaitingActions
-
- Type: int
Indicates the count of job runs in WAITING state in the workflow run.
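The counters above can be rolled up into a short summary. The sketch below is illustrative only: summarizeWorkflowRunStatistics is a hypothetical helper, and the grouping of errored, failed, stopped, and timed-out actions as "unsuccessful" is an assumption made here, not something the API defines.

```php
<?php
// Hypothetical helper (not part of the SDK): condenses an array shaped
// like WorkflowRunStatistics. Missing counters are treated as zero.
function summarizeWorkflowRunStatistics(array $stats): array
{
    $get = function (string $key) use ($stats): int {
        return (int) ($stats[$key] ?? 0);
    };

    $total = $get('TotalActions');
    // Assumed grouping: anything that ended without succeeding.
    $unsuccessful = $get('ErroredActions') + $get('FailedActions')
        + $get('StoppedActions') + $get('TimeoutActions');

    return [
        'total' => $total,
        'unsuccessful' => $unsuccessful,
        'inProgress' => $get('RunningActions') + $get('WaitingActions'),
        // Null when there are no actions, to avoid dividing by zero.
        'successRate' => $total > 0 ? $get('SucceededActions') / $total : null,
    ];
}

// Made-up sample counters.
$summary = summarizeWorkflowRunStatistics([
    'TotalActions' => 8,
    'SucceededActions' => 4,
    'FailedActions' => 1,
    'TimeoutActions' => 1,
    'RunningActions' => 1,
    'WaitingActions' => 1,
]);
print_r($summary);
```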
XMLClassifier
Description
A classifier for XML content.
Members
- Classification
-
- Required: Yes
- Type: string
An identifier of the data format that the classifier matches.
- CreationTime
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time that this classifier was registered.
- LastUpdated
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time that this classifier was last updated.
- Name
-
- Required: Yes
- Type: string
The name of the classifier.
- RowTag
-
- Type: string
The XML tag designating the element that contains each record in an XML document being parsed. This can't identify a self-closing element (closed by />). An empty row element that contains only attributes can be parsed as long as it ends with a closing tag (for example, <row item_a="A" item_b="B"></row> is okay, but <row item_a="A" item_b="B" /> is not).
- Version
-
- Type: long (int|float)
The version of this classifier.
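The RowTag restriction on self-closing elements can be checked before data is handed to a classifier. The following is a naive sketch: hasSelfClosingRowElement is a hypothetical helper, and its regular expression is a heuristic that assumes attribute values contain no ">" character. It is not a full XML parser.

```php
<?php
// Hypothetical helper (not part of the SDK): reports whether the given
// XML text contains a self-closing element named $rowTag, which the
// RowTag description says cannot be identified as a record.
function hasSelfClosingRowElement(string $xml, string $rowTag): bool
{
    // Matches "<row/>" or "<row attr="..." />" but not "<row ...></row>".
    $pattern = '/<' . preg_quote($rowTag, '/') . '(\s[^>]*)?\/>/';
    return preg_match($pattern, $xml) === 1;
}

// The two forms from the RowTag description above.
$ok = '<rows><row item_a="A" item_b="B"></row></rows>';
$notOk = '<rows><row item_a="A" item_b="B" /></rows>';

var_dump(hasSelfClosingRowElement($ok, 'row'));
var_dump(hasSelfClosingRowElement($notOk, 'row'));
```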