AWS Glue 2017-03-31
- Client: Aws\Glue\GlueClient
- Service ID: glue
- Version: 2017-03-31
This page describes the parameters and results for the operations of AWS Glue (2017-03-31) and shows how to use the Aws\Glue\GlueClient object to call the described operations. This documentation is specific to the 2017-03-31 API version of the service.
Operation Summary
Each of the following operations can be created from a client using $client->getCommand('CommandName'), where "CommandName" is the name of one of the following operations. Note: a command is a value that encapsulates an operation and the parameters used to create an HTTP request.
You can also create and send a command immediately using the magic methods available on a client object: $client->commandName(/* parameters */).
You can send the command asynchronously (returning a promise) by appending the word "Async" to the operation name: $client->commandNameAsync(/* parameters */).
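The three calling styles described above can be sketched side by side. This is a minimal illustration, not the SDK's own sample: the helper name getTableThreeWays and the database and table names in the usage comment are hypothetical, and the sketch assumes the AWS SDK for PHP is installed with Composer and that credentials are available from the environment.

```php
<?php
// Sketch only. Assumes the AWS SDK for PHP is installed with Composer and that
// credentials come from the environment. All resource names are placeholders.

/**
 * Fetches the same table three ways and returns the three results.
 * (Hypothetical helper, written for illustration.)
 *
 * @param \Aws\Glue\GlueClient $client
 */
function getTableThreeWays($client, array $params): array
{
    // 1. Build a command object, then execute it explicitly.
    $command = $client->getCommand('GetTable', $params);
    $viaCommand = $client->execute($command);

    // 2. Magic method: builds and sends the command in one call.
    $viaMagic = $client->getTable($params);

    // 3. Async variant: returns a promise; wait() blocks until it settles.
    $viaPromise = $client->getTableAsync($params)->wait();

    return [$viaCommand, $viaMagic, $viaPromise];
}

// Usage (needs the SDK and valid credentials):
// require 'vendor/autoload.php';
// $client = new Aws\Glue\GlueClient(['region' => 'us-east-1', 'version' => '2017-03-31']);
// getTableThreeWays($client, ['DatabaseName' => 'example_db', 'Name' => 'example_table']);
```

The explicit command form is mostly useful when a command needs to be inspected or adjusted before it is sent; for everyday calls the magic method is usually enough.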
- BatchCreatePartition ( array $params = [] )
- Creates one or more partitions in a batch operation.
- BatchDeleteConnection ( array $params = [] )
- Deletes a list of connection definitions from the Data Catalog.
- BatchDeletePartition ( array $params = [] )
- Deletes one or more partitions in a batch operation.
- BatchDeleteTable ( array $params = [] )
- Deletes multiple tables at once.
- BatchDeleteTableVersion ( array $params = [] )
- Deletes a specified batch of versions of a table.
- BatchGetBlueprints ( array $params = [] )
- Retrieves information about a list of blueprints.
- BatchGetCrawlers ( array $params = [] )
- Returns a list of resource metadata for a given list of crawler names.
- BatchGetCustomEntityTypes ( array $params = [] )
- Retrieves the details for the custom patterns specified by a list of names.
- BatchGetDataQualityResult ( array $params = [] )
- Retrieves a list of data quality results for the specified result IDs.
- BatchGetDevEndpoints ( array $params = [] )
- Returns a list of resource metadata for a given list of development endpoint names.
- BatchGetJobs ( array $params = [] )
- Returns a list of resource metadata for a given list of job names.
- BatchGetPartition ( array $params = [] )
- Retrieves partitions in a batch request.
- BatchGetTableOptimizer ( array $params = [] )
- Returns the configuration for the specified table optimizers.
- BatchGetTriggers ( array $params = [] )
- Returns a list of resource metadata for a given list of trigger names.
- BatchGetWorkflows ( array $params = [] )
- Returns a list of resource metadata for a given list of workflow names.
- BatchPutDataQualityStatisticAnnotation ( array $params = [] )
- Annotates datapoints over time for a specific data quality statistic.
- BatchStopJobRun ( array $params = [] )
- Stops one or more job runs for a specified job definition.
- BatchUpdatePartition ( array $params = [] )
- Updates one or more partitions in a batch operation.
- CancelDataQualityRuleRecommendationRun ( array $params = [] )
- Cancels the specified recommendation run that was being used to generate rules.
- CancelDataQualityRulesetEvaluationRun ( array $params = [] )
- Cancels a run where a ruleset is being evaluated against a data source.
- CancelMLTaskRun ( array $params = [] )
- Cancels (stops) a task run.
- CancelStatement ( array $params = [] )
- Cancels the statement.
- CheckSchemaVersionValidity ( array $params = [] )
- Validates the supplied schema.
- CreateBlueprint ( array $params = [] )
- Registers a blueprint with Glue.
- CreateClassifier ( array $params = [] )
- Creates a classifier in the user's account.
- CreateConnection ( array $params = [] )
- Creates a connection definition in the Data Catalog.
- CreateCrawler ( array $params = [] )
- Creates a new crawler with specified targets, role, configuration, and optional schedule.
- CreateCustomEntityType ( array $params = [] )
- Creates a custom pattern that is used to detect sensitive data across the columns and rows of your structured data.
- CreateDataQualityRuleset ( array $params = [] )
- Creates a data quality ruleset with DQDL rules applied to a specified Glue table.
- CreateDatabase ( array $params = [] )
- Creates a new database in a Data Catalog.
- CreateDevEndpoint ( array $params = [] )
- Creates a new development endpoint.
- CreateJob ( array $params = [] )
- Creates a new job definition.
- CreateMLTransform ( array $params = [] )
- Creates a Glue machine learning transform.
- CreatePartition ( array $params = [] )
- Creates a new partition.
- CreatePartitionIndex ( array $params = [] )
- Creates a specified partition index in an existing table.
- CreateRegistry ( array $params = [] )
- Creates a new registry which may be used to hold a collection of schemas.
- CreateSchema ( array $params = [] )
- Creates a new schema set and registers the schema definition.
- CreateScript ( array $params = [] )
- Transforms a directed acyclic graph (DAG) into code.
- CreateSecurityConfiguration ( array $params = [] )
- Creates a new security configuration.
- CreateSession ( array $params = [] )
- Creates a new session.
- CreateTable ( array $params = [] )
- Creates a new table definition in the Data Catalog.
- CreateTableOptimizer ( array $params = [] )
- Creates a new table optimizer for a specific function.
- CreateTrigger ( array $params = [] )
- Creates a new trigger.
- CreateUsageProfile ( array $params = [] )
- Creates a Glue usage profile.
- CreateUserDefinedFunction ( array $params = [] )
- Creates a new function definition in the Data Catalog.
- CreateWorkflow ( array $params = [] )
- Creates a new workflow.
- DeleteBlueprint ( array $params = [] )
- Deletes an existing blueprint.
- DeleteClassifier ( array $params = [] )
- Removes a classifier from the Data Catalog.
- DeleteColumnStatisticsForPartition ( array $params = [] )
- Deletes the partition column statistics of a column.
- DeleteColumnStatisticsForTable ( array $params = [] )
- Deletes table statistics of columns.
- DeleteConnection ( array $params = [] )
- Deletes a connection from the Data Catalog.
- DeleteCrawler ( array $params = [] )
- Removes a specified crawler from the Glue Data Catalog, unless the crawler state is RUNNING.
- DeleteCustomEntityType ( array $params = [] )
- Deletes a custom pattern by specifying its name.
- DeleteDataQualityRuleset ( array $params = [] )
- Deletes a data quality ruleset.
- DeleteDatabase ( array $params = [] )
- Removes a specified database from a Data Catalog.
- DeleteDevEndpoint ( array $params = [] )
- Deletes a specified development endpoint.
- DeleteJob ( array $params = [] )
- Deletes a specified job definition.
- DeleteMLTransform ( array $params = [] )
- Deletes a Glue machine learning transform.
- DeletePartition ( array $params = [] )
- Deletes a specified partition.
- DeletePartitionIndex ( array $params = [] )
- Deletes a specified partition index from an existing table.
- DeleteRegistry ( array $params = [] )
- Deletes the entire registry, including its schemas and all of their versions.
- DeleteResourcePolicy ( array $params = [] )
- Deletes a specified policy.
- DeleteSchema ( array $params = [] )
- Deletes the entire schema set, including the schema set and all of its versions.
- DeleteSchemaVersions ( array $params = [] )
- Removes versions from the specified schema.
- DeleteSecurityConfiguration ( array $params = [] )
- Deletes a specified security configuration.
- DeleteSession ( array $params = [] )
- Deletes the session.
- DeleteTable ( array $params = [] )
- Removes a table definition from the Data Catalog.
- DeleteTableOptimizer ( array $params = [] )
- Deletes an optimizer and all associated metadata for a table.
- DeleteTableVersion ( array $params = [] )
- Deletes a specified version of a table.
- DeleteTrigger ( array $params = [] )
- Deletes a specified trigger.
- DeleteUsageProfile ( array $params = [] )
- Deletes the specified Glue usage profile.
- DeleteUserDefinedFunction ( array $params = [] )
- Deletes an existing function definition from the Data Catalog.
- DeleteWorkflow ( array $params = [] )
- Deletes a workflow.
- GetBlueprint ( array $params = [] )
- Retrieves the details of a blueprint.
- GetBlueprintRun ( array $params = [] )
- Retrieves the details of a blueprint run.
- GetBlueprintRuns ( array $params = [] )
- Retrieves the details of blueprint runs for a specified blueprint.
- GetCatalogImportStatus ( array $params = [] )
- Retrieves the status of a migration operation.
- GetClassifier ( array $params = [] )
- Retrieves a classifier by name.
- GetClassifiers ( array $params = [] )
- Lists all classifier objects in the Data Catalog.
- GetColumnStatisticsForPartition ( array $params = [] )
- Retrieves partition statistics of columns.
- GetColumnStatisticsForTable ( array $params = [] )
- Retrieves table statistics of columns.
- GetColumnStatisticsTaskRun ( array $params = [] )
- Gets the associated metadata for a task run, given a task run ID.
- GetColumnStatisticsTaskRuns ( array $params = [] )
- Retrieves information about all runs associated with the specified table.
- GetConnection ( array $params = [] )
- Retrieves a connection definition from the Data Catalog.
- GetConnections ( array $params = [] )
- Retrieves a list of connection definitions from the Data Catalog.
- GetCrawler ( array $params = [] )
- Retrieves metadata for a specified crawler.
- GetCrawlerMetrics ( array $params = [] )
- Retrieves metrics about specified crawlers.
- GetCrawlers ( array $params = [] )
- Retrieves metadata for all crawlers defined in the customer account.
- GetCustomEntityType ( array $params = [] )
- Retrieves the details of a custom pattern by specifying its name.
- GetDataCatalogEncryptionSettings ( array $params = [] )
- Retrieves the security configuration for a specified catalog.
- GetDataQualityModel ( array $params = [] )
- Retrieves the training status of the model along with more information (CompletedOn, StartedOn, FailureReason).
- GetDataQualityModelResult ( array $params = [] )
- Retrieves a statistic's predictions for a given Profile ID.
- GetDataQualityResult ( array $params = [] )
- Retrieves the result of a data quality rule evaluation.
- GetDataQualityRuleRecommendationRun ( array $params = [] )
- Gets the specified recommendation run that was used to generate rules.
- GetDataQualityRuleset ( array $params = [] )
- Returns an existing ruleset by identifier or name.
- GetDataQualityRulesetEvaluationRun ( array $params = [] )
- Retrieves a specific run where a ruleset is evaluated against a data source.
- GetDatabase ( array $params = [] )
- Retrieves the definition of a specified database.
- GetDatabases ( array $params = [] )
- Retrieves all databases defined in a given Data Catalog.
- GetDataflowGraph ( array $params = [] )
- Transforms a Python script into a directed acyclic graph (DAG).
- GetDevEndpoint ( array $params = [] )
- Retrieves information about a specified development endpoint.
- GetDevEndpoints ( array $params = [] )
- Retrieves all the development endpoints in this Amazon Web Services account.
- GetJob ( array $params = [] )
- Retrieves an existing job definition.
- GetJobBookmark ( array $params = [] )
- Returns information on a job bookmark entry.
- GetJobRun ( array $params = [] )
- Retrieves the metadata for a given job run.
- GetJobRuns ( array $params = [] )
- Retrieves metadata for all runs of a given job definition.
- GetJobs ( array $params = [] )
- Retrieves all current job definitions.
- GetMLTaskRun ( array $params = [] )
- Gets details for a specific task run on a machine learning transform.
- GetMLTaskRuns ( array $params = [] )
- Gets a list of runs for a machine learning transform.
- GetMLTransform ( array $params = [] )
- Gets a Glue machine learning transform artifact and all its corresponding metadata.
- GetMLTransforms ( array $params = [] )
- Gets a sortable, filterable list of existing Glue machine learning transforms.
- GetMapping ( array $params = [] )
- Creates mappings.
- GetPartition ( array $params = [] )
- Retrieves information about a specified partition.
- GetPartitionIndexes ( array $params = [] )
- Retrieves the partition indexes associated with a table.
- GetPartitions ( array $params = [] )
- Retrieves information about the partitions in a table.
- GetPlan ( array $params = [] )
- Gets code to perform a specified mapping.
- GetRegistry ( array $params = [] )
- Describes the specified registry in detail.
- GetResourcePolicies ( array $params = [] )
- Retrieves the resource policies set on individual resources by Resource Access Manager during cross-account permission grants.
- GetResourcePolicy ( array $params = [] )
- Retrieves a specified resource policy.
- GetSchema ( array $params = [] )
- Describes the specified schema in detail.
- GetSchemaByDefinition ( array $params = [] )
- Retrieves a schema by the SchemaDefinition.
- GetSchemaVersion ( array $params = [] )
- Gets the specified schema by its unique ID assigned when a version of the schema is created or registered.
- GetSchemaVersionsDiff ( array $params = [] )
- Fetches the schema version difference in the specified difference type between two stored schema versions in the Schema Registry.
- GetSecurityConfiguration ( array $params = [] )
- Retrieves a specified security configuration.
- GetSecurityConfigurations ( array $params = [] )
- Retrieves a list of all security configurations.
- GetSession ( array $params = [] )
- Retrieves the session.
- GetStatement ( array $params = [] )
- Retrieves the statement.
- GetTable ( array $params = [] )
- Retrieves the Table definition in a Data Catalog for a specified table.
- GetTableOptimizer ( array $params = [] )
- Returns the configuration of all optimizers associated with a specified table.
- GetTableVersion ( array $params = [] )
- Retrieves a specified version of a table.
- GetTableVersions ( array $params = [] )
- Retrieves a list of strings that identify available versions of a specified table.
- GetTables ( array $params = [] )
- Retrieves the definitions of some or all of the tables in a given Database.
- GetTags ( array $params = [] )
- Retrieves a list of tags associated with a resource.
- GetTrigger ( array $params = [] )
- Retrieves the definition of a trigger.
- GetTriggers ( array $params = [] )
- Gets all the triggers associated with a job.
- GetUnfilteredPartitionMetadata ( array $params = [] )
- Retrieves partition metadata from the Data Catalog that contains unfiltered metadata.
- GetUnfilteredPartitionsMetadata ( array $params = [] )
- Retrieves partition metadata from the Data Catalog that contains unfiltered metadata.
- GetUnfilteredTableMetadata ( array $params = [] )
- Allows a third-party analytical engine to retrieve unfiltered table metadata from the Data Catalog.
- GetUsageProfile ( array $params = [] )
- Retrieves information about the specified Glue usage profile.
- GetUserDefinedFunction ( array $params = [] )
- Retrieves a specified function definition from the Data Catalog.
- GetUserDefinedFunctions ( array $params = [] )
- Retrieves multiple function definitions from the Data Catalog.
- GetWorkflow ( array $params = [] )
- Retrieves resource metadata for a workflow.
- GetWorkflowRun ( array $params = [] )
- Retrieves the metadata for a given workflow run.
- GetWorkflowRunProperties ( array $params = [] )
- Retrieves the workflow run properties which were set during the run.
- GetWorkflowRuns ( array $params = [] )
- Retrieves metadata for all runs of a given workflow.
- ImportCatalogToGlue ( array $params = [] )
- Imports an existing Amazon Athena Data Catalog to Glue.
- ListBlueprints ( array $params = [] )
- Lists all the blueprint names in an account.
- ListColumnStatisticsTaskRuns ( array $params = [] )
- Lists all task runs for a particular account.
- ListCrawlers ( array $params = [] )
- Retrieves the names of all crawler resources in this Amazon Web Services account, or the resources with the specified tag.
- ListCrawls ( array $params = [] )
- Returns all the crawls of a specified crawler.
- ListCustomEntityTypes ( array $params = [] )
- Lists all the custom patterns that have been created.
- ListDataQualityResults ( array $params = [] )
- Returns all data quality execution results for your account.
- ListDataQualityRuleRecommendationRuns ( array $params = [] )
- Lists the recommendation runs meeting the filter criteria.
- ListDataQualityRulesetEvaluationRuns ( array $params = [] )
- Lists all the runs meeting the filter criteria, where a ruleset is evaluated against a data source.
- ListDataQualityRulesets ( array $params = [] )
- Returns a paginated list of rulesets for the specified list of Glue tables.
- ListDataQualityStatisticAnnotations ( array $params = [] )
- Retrieves annotations for a data quality statistic.
- ListDataQualityStatistics ( array $params = [] )
- Retrieves a list of data quality statistics.
- ListDevEndpoints ( array $params = [] )
- Retrieves the names of all DevEndpoint resources in this Amazon Web Services account, or the resources with the specified tag.
- ListJobs ( array $params = [] )
- Retrieves the names of all job resources in this Amazon Web Services account, or the resources with the specified tag.
- ListMLTransforms ( array $params = [] )
- Retrieves a sortable, filterable list of existing Glue machine learning transforms in this Amazon Web Services account, or the resources with the specified tag.
- ListRegistries ( array $params = [] )
- Returns a list of registries that you have created, with minimal registry information.
- ListSchemaVersions ( array $params = [] )
- Returns a list of schema versions that you have created, with minimal information.
- ListSchemas ( array $params = [] )
- Returns a list of schemas with minimal details.
- ListSessions ( array $params = [] )
- Retrieves a list of sessions.
- ListStatements ( array $params = [] )
- Lists statements for the session.
- ListTableOptimizerRuns ( array $params = [] )
- Lists the history of previous optimizer runs for a specific table.
- ListTriggers ( array $params = [] )
- Retrieves the names of all trigger resources in this Amazon Web Services account, or the resources with the specified tag.
- ListUsageProfiles ( array $params = [] )
- Lists all the Glue usage profiles.
- ListWorkflows ( array $params = [] )
- Lists names of workflows created in the account.
- PutDataCatalogEncryptionSettings ( array $params = [] )
- Sets the security configuration for a specified catalog.
- PutDataQualityProfileAnnotation ( array $params = [] )
- Annotates all datapoints for a Profile.
- PutResourcePolicy ( array $params = [] )
- Sets the Data Catalog resource policy for access control.
- PutSchemaVersionMetadata ( array $params = [] )
- Puts the metadata key value pair for a specified schema version ID.
- PutWorkflowRunProperties ( array $params = [] )
- Puts the specified workflow run properties for the given workflow run.
- QuerySchemaVersionMetadata ( array $params = [] )
- Queries for the schema version metadata information.
- RegisterSchemaVersion ( array $params = [] )
- Adds a new version to the existing schema.
- RemoveSchemaVersionMetadata ( array $params = [] )
- Removes a key value pair from the schema version metadata for the specified schema version ID.
- ResetJobBookmark ( array $params = [] )
- Resets a bookmark entry.
- ResumeWorkflowRun ( array $params = [] )
- Restarts selected nodes of a previous partially completed workflow run and resumes the workflow run.
- RunStatement ( array $params = [] )
- Executes the statement.
- SearchTables ( array $params = [] )
- Searches a set of tables based on properties in the table metadata as well as on the parent database.
- StartBlueprintRun ( array $params = [] )
- Starts a new run of the specified blueprint.
- StartColumnStatisticsTaskRun ( array $params = [] )
- Starts a column statistics task run, for a specified table and columns.
- StartCrawler ( array $params = [] )
- Starts a crawl using the specified crawler, regardless of what is scheduled.
- StartCrawlerSchedule ( array $params = [] )
- Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED.
- StartDataQualityRuleRecommendationRun ( array $params = [] )
- Starts a recommendation run that is used to generate rules when you don't know what rules to write.
- StartDataQualityRulesetEvaluationRun ( array $params = [] )
- Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table).
- StartExportLabelsTaskRun ( array $params = [] )
- Begins an asynchronous task to export all labeled data for a particular transform.
- StartImportLabelsTaskRun ( array $params = [] )
- Enables you to provide additional labels (examples of truth) to be used to teach the machine learning transform and improve its quality.
- StartJobRun ( array $params = [] )
- Starts a job run using a job definition.
- StartMLEvaluationTaskRun ( array $params = [] )
- Starts a task to estimate the quality of the transform.
- StartMLLabelingSetGenerationTaskRun ( array $params = [] )
- Starts the active learning workflow for your machine learning transform to improve the transform's quality by generating label sets and adding labels.
- StartTrigger ( array $params = [] )
- Starts an existing trigger.
- StartWorkflowRun ( array $params = [] )
- Starts a new run of the specified workflow.
- StopColumnStatisticsTaskRun ( array $params = [] )
- Stops a task run for the specified table.
- StopCrawler ( array $params = [] )
- If the specified crawler is running, stops the crawl.
- StopCrawlerSchedule ( array $params = [] )
- Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running.
- StopSession ( array $params = [] )
- Stops the session.
- StopTrigger ( array $params = [] )
- Stops a specified trigger.
- StopWorkflowRun ( array $params = [] )
- Stops the execution of the specified workflow run.
- TagResource ( array $params = [] )
- Adds tags to a resource.
- TestConnection ( array $params = [] )
- Tests a connection to a service to validate the service credentials that you provide.
- UntagResource ( array $params = [] )
- Removes tags from a resource.
- UpdateBlueprint ( array $params = [] )
- Updates a registered blueprint.
- UpdateClassifier ( array $params = [] )
- Modifies an existing classifier (a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field is present).
- UpdateColumnStatisticsForPartition ( array $params = [] )
- Creates or updates partition statistics of columns.
- UpdateColumnStatisticsForTable ( array $params = [] )
- Creates or updates table statistics of columns.
- UpdateConnection ( array $params = [] )
- Updates a connection definition in the Data Catalog.
- UpdateCrawler ( array $params = [] )
- Updates a crawler.
- UpdateCrawlerSchedule ( array $params = [] )
- Updates the schedule of a crawler using a cron expression.
- UpdateDataQualityRuleset ( array $params = [] )
- Updates the specified data quality ruleset.
- UpdateDatabase ( array $params = [] )
- Updates an existing database definition in a Data Catalog.
- UpdateDevEndpoint ( array $params = [] )
- Updates a specified development endpoint.
- UpdateJob ( array $params = [] )
- Updates an existing job definition.
- UpdateJobFromSourceControl ( array $params = [] )
- Synchronizes a job from the source control repository.
- UpdateMLTransform ( array $params = [] )
- Updates an existing machine learning transform.
- UpdatePartition ( array $params = [] )
- Updates a partition.
- UpdateRegistry ( array $params = [] )
- Updates an existing registry which is used to hold a collection of schemas.
- UpdateSchema ( array $params = [] )
- Updates the description, compatibility setting, or version checkpoint for a schema set.
- UpdateSourceControlFromJob ( array $params = [] )
- Synchronizes a job to the source control repository.
- UpdateTable ( array $params = [] )
- Updates a metadata table in the Data Catalog.
- UpdateTableOptimizer ( array $params = [] )
- Updates the configuration for an existing table optimizer.
- UpdateTrigger ( array $params = [] )
- Updates a trigger definition.
- UpdateUsageProfile ( array $params = [] )
- Updates a Glue usage profile.
- UpdateUserDefinedFunction ( array $params = [] )
- Updates an existing function definition in the Data Catalog.
- UpdateWorkflow ( array $params = [] )
- Updates an existing workflow.
Paginators
Paginators handle automatically iterating over paginated API results. Paginators are associated with specific API operations, and they accept the parameters that the corresponding API operation accepts. You can get a paginator from a client class using getPaginator($paginatorName, $operationParameters). This client supports the following paginators:
- GetBlueprintRuns
- GetClassifiers
- GetColumnStatisticsTaskRuns
- GetConnections
- GetCrawlerMetrics
- GetCrawlers
- GetDatabases
- GetDevEndpoints
- GetJobRuns
- GetJobs
- GetMLTaskRuns
- GetMLTransforms
- GetPartitionIndexes
- GetPartitions
- GetResourcePolicies
- GetSecurityConfigurations
- GetTableVersions
- GetTables
- GetTriggers
- GetUnfilteredPartitionsMetadata
- GetUserDefinedFunctions
- GetWorkflowRuns
- ListBlueprints
- ListColumnStatisticsTaskRuns
- ListCrawlers
- ListCustomEntityTypes
- ListDataQualityResults
- ListDataQualityRuleRecommendationRuns
- ListDataQualityRulesetEvaluationRuns
- ListDataQualityRulesets
- ListDevEndpoints
- ListJobs
- ListMLTransforms
- ListRegistries
- ListSchemaVersions
- ListSchemas
- ListSessions
- ListTableOptimizerRuns
- ListTriggers
- ListUsageProfiles
- ListWorkflows
- SearchTables
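As a rough illustration of the paginator interface described above, the following sketch walks every page of GetTables and collects the table names. The helper name listAllTableNames and the database name are hypothetical; the sketch assumes each page exposes the TableList member of a GetTables result.

```php
<?php
// Sketch only. Assumes the AWS SDK for PHP; 'example_db' is a placeholder.

/**
 * Collects every table name in a database by walking all result pages.
 * (Hypothetical helper, written for illustration.)
 *
 * @param \Aws\Glue\GlueClient $client
 * @return string[]
 */
function listAllTableNames($client, string $databaseName): array
{
    $names = [];

    // The paginator accepts the same parameters as GetTables and requests
    // the following pages on the caller's behalf.
    $pages = $client->getPaginator('GetTables', ['DatabaseName' => $databaseName]);

    foreach ($pages as $page) {
        // Each page is one GetTables result; TableList holds the table definitions.
        foreach ($page['TableList'] ?? [] as $table) {
            $names[] = $table['Name'];
        }
    }

    return $names;
}

// Usage (needs the SDK and valid credentials):
// require 'vendor/autoload.php';
// $client = new Aws\Glue\GlueClient(['region' => 'us-east-1', 'version' => '2017-03-31']);
// print_r(listAllTableNames($client, 'example_db'));
```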
Operations
BatchCreatePartition
$result = $client->batchCreatePartition([/* ... */]);
$promise = $client->batchCreatePartitionAsync([/* ... */]);
Creates one or more partitions in a batch operation.
Parameter Syntax
$result = $client->batchCreatePartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionInputList' => [ // REQUIRED
        [
            'LastAccessTime' => <integer || string || DateTime>,
            'LastAnalyzedTime' => <integer || string || DateTime>,
            'Parameters' => ['<string>', ...],
            'StorageDescriptor' => [
                'AdditionalLocations' => ['<string>', ...],
                'BucketColumns' => ['<string>', ...],
                'Columns' => [
                    [
                        'Comment' => '<string>',
                        'Name' => '<string>', // REQUIRED
                        'Parameters' => ['<string>', ...],
                        'Type' => '<string>',
                    ],
                    // ...
                ],
                'Compressed' => true || false,
                'InputFormat' => '<string>',
                'Location' => '<string>',
                'NumberOfBuckets' => <integer>,
                'OutputFormat' => '<string>',
                'Parameters' => ['<string>', ...],
                'SchemaReference' => [
                    'SchemaId' => [
                        'RegistryName' => '<string>',
                        'SchemaArn' => '<string>',
                        'SchemaName' => '<string>',
                    ],
                    'SchemaVersionId' => '<string>',
                    'SchemaVersionNumber' => <integer>,
                ],
                'SerdeInfo' => [
                    'Name' => '<string>',
                    'Parameters' => ['<string>', ...],
                    'SerializationLibrary' => '<string>',
                ],
                'SkewedInfo' => [
                    'SkewedColumnNames' => ['<string>', ...],
                    'SkewedColumnValueLocationMaps' => ['<string>', ...],
                    'SkewedColumnValues' => ['<string>', ...],
                ],
                'SortColumns' => [
                    [
                        'Column' => '<string>', // REQUIRED
                        'SortOrder' => <integer>, // REQUIRED
                    ],
                    // ...
                ],
                'StoredAsSubDirectories' => true || false,
            ],
            'Values' => ['<string>', ...],
        ],
        // ...
    ],
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
- Type: string
The ID of the catalog in which the partition is to be created. Currently, this should be the Amazon Web Services account ID.
- DatabaseName
- Required: Yes
- Type: string
The name of the metadata database in which the partition is to be created.
- PartitionInputList
- Required: Yes
- Type: Array of PartitionInput structures
A list of PartitionInput structures that define the partitions to be created.
- TableName
- Required: Yes
- Type: string
The name of the metadata table in which the partition is to be created.
Result Syntax
[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'PartitionValues' => ['<string>', ...],
        ],
        // ...
    ],
]
Result Details
Members
- Errors
- Type: Array of PartitionError structures
The errors encountered when trying to create the requested partitions.
Errors
- InvalidInputException:
The input provided was not valid.
- AlreadyExistsException:
A resource to be created or added already exists.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- InternalServiceException:
An internal service error occurred.
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
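The following sketch shows one way to build a PartitionInputList and read the per-partition Errors member of the result. The helper names buildPartitionInput and createPartitions, the S3 path, and the resource names are hypothetical. The StorageDescriptor here carries only a Location to keep the example short; a real partition would likely also need the format and SerDe settings that match its table.

```php
<?php
// Sketch only. Helper names, the S3 path, and resource names are placeholders.

/** Builds one PartitionInput entry from partition values and a storage location. */
function buildPartitionInput(array $values, string $location): array
{
    return [
        'Values' => $values,
        // Minimal descriptor for illustration; real tables usually need more fields.
        'StorageDescriptor' => [
            'Location' => $location,
        ],
    ];
}

/**
 * Creates the partitions and returns the failures as
 * "value/value" => error code. An empty array means no partition failed.
 *
 * @param \Aws\Glue\GlueClient $client
 */
function createPartitions($client, string $db, string $table, array $inputs): array
{
    $result = $client->batchCreatePartition([
        'DatabaseName' => $db,
        'TableName' => $table,
        'PartitionInputList' => $inputs,
    ]);

    // Failures for individual partitions are reported in 'Errors'.
    $failed = [];
    foreach ($result['Errors'] ?? [] as $error) {
        $key = implode('/', $error['PartitionValues'] ?? []);
        $failed[$key] = $error['ErrorDetail']['ErrorCode'] ?? 'Unknown';
    }

    return $failed;
}

// Usage (needs the SDK and valid credentials):
// $inputs = [buildPartitionInput(['2024', '01'], 's3://example-bucket/data/2024/01/')];
// $failed = createPartitions($client, 'example_db', 'example_table', $inputs);
```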
BatchDeleteConnection
$result = $client->batchDeleteConnection([/* ... */]);
$promise = $client->batchDeleteConnectionAsync([/* ... */]);
Deletes a list of connection definitions from the Data Catalog.
Parameter Syntax
$result = $client->batchDeleteConnection([
    'CatalogId' => '<string>',
    'ConnectionNameList' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- CatalogId
- Type: string
The ID of the Data Catalog in which the connections reside. If none is provided, the Amazon Web Services account ID is used by default.
- ConnectionNameList
- Required: Yes
- Type: Array of strings
A list of names of the connections to delete.
Result Syntax
[
    'Errors' => [
        '<NameString>' => [
            'ErrorCode' => '<string>',
            'ErrorMessage' => '<string>',
        ],
        // ...
    ],
    'Succeeded' => ['<string>', ...],
]
Result Details
Members
- Errors
- Type: Associative array of custom string keys (NameString) to ErrorDetail structures
A map of the names of connections that were not successfully deleted to error details.
- Succeeded
- Type: Array of strings
A list of names of the connection definitions that were successfully deleted.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
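Because the result separates Succeeded names from an Errors map keyed by connection name, a caller can report both outcomes from a single call. The sketch below illustrates this; the helper name deleteConnections and the connection names are hypothetical.

```php
<?php
// Sketch only. The helper name and connection names are placeholders.

/**
 * Deletes the named connections and returns a report with
 * 'deleted' (list of names) and 'failed' (name => "code: message").
 *
 * @param \Aws\Glue\GlueClient $client
 */
function deleteConnections($client, array $names): array
{
    $result = $client->batchDeleteConnection([
        'ConnectionNameList' => $names,
    ]);

    $report = [
        'deleted' => $result['Succeeded'] ?? [],
        'failed' => [],
    ];

    // 'Errors' maps each connection name that failed to an ErrorDetail structure.
    foreach ($result['Errors'] ?? [] as $name => $detail) {
        $code = $detail['ErrorCode'] ?? 'Unknown';
        $message = $detail['ErrorMessage'] ?? '';
        $report['failed'][$name] = $code . ': ' . $message;
    }

    return $report;
}

// Usage (needs the SDK and valid credentials):
// $report = deleteConnections($client, ['example-conn-a', 'example-conn-b']);
```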
BatchDeletePartition
$result = $client->batchDeletePartition([/* ... */]);
$promise = $client->batchDeletePartitionAsync([/* ... */]);
Deletes one or more partitions in a batch operation.
Parameter Syntax
$result = $client->batchDeletePartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionsToDelete' => [ // REQUIRED
        [
            'Values' => ['<string>', ...], // REQUIRED
        ],
        // ...
    ],
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
- Type: string
The ID of the Data Catalog where the partition to be deleted resides. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
- Required: Yes
- Type: string
The name of the catalog database in which the table in question resides.
- PartitionsToDelete
- Required: Yes
- Type: Array of PartitionValueList structures
A list of PartitionValueList structures that define the partitions to be deleted.
- TableName
- Required: Yes
- Type: string
The name of the table that contains the partitions to be deleted.
Result Syntax
[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'PartitionValues' => ['<string>', ...],
        ],
        // ...
    ],
]
Result Details
Members
- Errors
- Type: Array of PartitionError structures
The errors encountered when trying to delete the requested partitions.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
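Each entry in PartitionsToDelete wraps one partition's values in a 'Values' key. A small conversion step, sketched below, can turn plain value lists into that shape before the call. The helper names toPartitionValueLists and deletePartitions are hypothetical, and casting every value to a string is an assumption made here so that numeric partition keys match the string type shown in the syntax above.

```php
<?php
// Sketch only. Helper names and resource names are placeholders.

/**
 * Wraps plain value lists, e.g. [[2024, 1]], into the
 * [['Values' => ['2024', '1']]] shape used by PartitionsToDelete.
 */
function toPartitionValueLists(array $partitions): array
{
    return array_map(
        // Values are cast to strings to match the documented element type.
        fn (array $values) => ['Values' => array_map('strval', array_values($values))],
        array_values($partitions)
    );
}

/**
 * Deletes the given partitions and returns the raw 'Errors' list.
 *
 * @param \Aws\Glue\GlueClient $client
 */
function deletePartitions($client, string $db, string $table, array $partitions): array
{
    $result = $client->batchDeletePartition([
        'DatabaseName' => $db,
        'TableName' => $table,
        'PartitionsToDelete' => toPartitionValueLists($partitions),
    ]);

    return $result['Errors'] ?? [];
}

// Usage (needs the SDK and valid credentials):
// $errors = deletePartitions($client, 'example_db', 'example_table', [[2024, 1], [2024, 2]]);
```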
BatchDeleteTable
$result = $client->batchDeleteTable([/* ... */]);
$promise = $client->batchDeleteTableAsync([/* ... */]);
Deletes multiple tables at once.
After completing this operation, you no longer have access to the table versions and partitions that belong to the deleted table. Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.
To ensure the immediate deletion of all related resources, before calling BatchDeleteTable, use DeleteTableVersion or BatchDeleteTableVersion, and DeletePartition or BatchDeletePartition, to delete any resources that belong to the table.
Parameter Syntax
$result = $client->batchDeleteTable([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'TablesToDelete' => ['<string>', ...], // REQUIRED
    'TransactionId' => '<string>',
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the table resides. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database in which the tables to delete reside. For Hive compatibility, this name is entirely lowercase.
- TablesToDelete
-
- Required: Yes
- Type: Array of strings
A list of the tables to delete.
- TransactionId
-
- Type: string
The transaction ID at which to delete the table contents.
Result Syntax
[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'TableName' => '<string>',
        ],
        // ...
    ],
]
Result Details
Members
- Errors
-
- Type: Array of TableError structures
A list of errors encountered in attempting to delete the specified tables.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
- ResourceNotReadyException:
A resource was not ready for a transaction.
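The ordering recommended in the description (remove table versions and partitions first, then the table) can be sketched as a small request planner. The function name `planTableCleanup` is hypothetical. It only builds the request arrays and sends nothing. It ignores per-request size limits, and it assumes the caller already knows which version IDs and partitions exist and may be deleted.

```php
<?php
/**
 * Hypothetical helper: builds the ordered requests for removing one table
 * together with its versions and partitions, dependent resources first.
 * Returns [operationName, params] pairs; it does not call the service.
 */
function planTableCleanup(
    string $databaseName,
    string $tableName,
    array $versionIds,
    array $partitionValues
): array {
    $plan = [];
    if ($versionIds !== []) {
        $plan[] = ['batchDeleteTableVersion', [
            'DatabaseName' => $databaseName,
            'TableName' => $tableName,
            // Version IDs are sent as strings.
            'VersionIds' => array_map('strval', $versionIds),
        ]];
    }
    if ($partitionValues !== []) {
        $plan[] = ['batchDeletePartition', [
            'DatabaseName' => $databaseName,
            'TableName' => $tableName,
            'PartitionsToDelete' => array_map(
                fn(array $values) => ['Values' => $values],
                $partitionValues
            ),
        ]];
    }
    // The table itself goes last, so nothing is left for asynchronous cleanup.
    $plan[] = ['batchDeleteTable', [
        'DatabaseName' => $databaseName,
        'TablesToDelete' => [$tableName],
    ]];
    return $plan;
}
```

A caller could then run `foreach (planTableCleanup(...) as [$operation, $params]) { $result = $client->{$operation}($params); }` and inspect `$result['Errors']` after each step before continuing.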
BatchDeleteTableVersion
$result = $client->batchDeleteTableVersion([/* ... */]);
$promise = $client->batchDeleteTableVersionAsync([/* ... */]);
Deletes a specified batch of versions of a table.
Parameter Syntax
$result = $client->batchDeleteTableVersion([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'VersionIds' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the tables reside. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.
- TableName
-
- Required: Yes
- Type: string
The name of the table. For Hive compatibility, this name is entirely lowercase.
- VersionIds
-
- Required: Yes
- Type: Array of strings
A list of the IDs of versions to be deleted. A VersionId is a string representation of an integer. Each version is incremented by 1.
Result Syntax
[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'TableName' => '<string>',
            'VersionId' => '<string>',
        ],
        // ...
    ],
]
Result Details
Members
- Errors
-
- Type: Array of TableVersionError structures
A list of errors encountered while trying to delete the specified table versions.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
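Because each VersionId is an integer encoded as a string, choosing which versions to delete needs a numeric comparison, not a string comparison. The sketch below uses a hypothetical helper, `versionIdsToPrune`. It assumes that a larger number means a newer version, and it returns the IDs to pass in `VersionIds` when keeping only the newest few.

```php
<?php
/**
 * Hypothetical helper: given all known version IDs of a table, returns the
 * IDs (as strings) of every version except the $keep highest-numbered ones.
 * Assumes a higher number means a newer version.
 */
function versionIdsToPrune(array $versionIds, int $keep): array
{
    // Compare as integers so that "10" sorts after "9".
    $numeric = array_map('intval', $versionIds);
    rsort($numeric, SORT_NUMERIC);

    // Drop the newest $keep entries and convert the rest back to strings.
    return array_map('strval', array_slice($numeric, $keep));
}
```

The returned list can be used directly as the `VersionIds` parameter. Splitting it into smaller requests, if a per-request limit applies, is left to the caller.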
BatchGetBlueprints
$result = $client->batchGetBlueprints([/* ... */]);
$promise = $client->batchGetBlueprintsAsync([/* ... */]);
Retrieves information about a list of blueprints.
Parameter Syntax
$result = $client->batchGetBlueprints([
    'IncludeBlueprint' => true || false,
    'IncludeParameterSpec' => true || false,
    'Names' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- IncludeBlueprint
-
- Type: boolean
Specifies whether or not to include the blueprint in the response.
- IncludeParameterSpec
-
- Type: boolean
Specifies whether or not to include the parameters, as a JSON string, for the blueprint in the response.
- Names
-
- Required: Yes
- Type: Array of strings
A list of blueprint names.
Result Syntax
[
    'Blueprints' => [
        [
            'BlueprintLocation' => '<string>',
            'BlueprintServiceLocation' => '<string>',
            'CreatedOn' => <DateTime>,
            'Description' => '<string>',
            'ErrorMessage' => '<string>',
            'LastActiveDefinition' => [
                'BlueprintLocation' => '<string>',
                'BlueprintServiceLocation' => '<string>',
                'Description' => '<string>',
                'LastModifiedOn' => <DateTime>,
                'ParameterSpec' => '<string>',
            ],
            'LastModifiedOn' => <DateTime>,
            'Name' => '<string>',
            'ParameterSpec' => '<string>',
            'Status' => 'CREATING|ACTIVE|UPDATING|FAILED',
        ],
        // ...
    ],
    'MissingBlueprints' => ['<string>', ...],
]
Result Details
Members
- Blueprints
-
- Type: Array of Blueprint structures
Returns a list of blueprints as a Blueprints object.
- MissingBlueprints
-
- Type: Array of strings
Returns a list of BlueprintNames that were not found.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
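A request for several blueprints can succeed while some names are reported under `MissingBlueprints`, so callers usually need to look at both result members. The sketch below merges them into one name-to-status map. The helper name `summarizeBlueprints` and the labels `MISSING` and `UNKNOWN` are local conventions chosen for illustration, not values returned by the service.

```php
<?php
/**
 * Hypothetical helper: maps every requested blueprint name to its status.
 * Accepts the BatchGetBlueprints result (an array or any ArrayAccess object).
 */
function summarizeBlueprints($result): array
{
    $summary = [];
    foreach ($result['Blueprints'] ?? [] as $blueprint) {
        // 'Status' is one of CREATING, ACTIVE, UPDATING, FAILED.
        $summary[$blueprint['Name']] = $blueprint['Status'] ?? 'UNKNOWN';
    }
    foreach ($result['MissingBlueprints'] ?? [] as $name) {
        // Local marker for names the service did not find.
        $summary[$name] = 'MISSING';
    }
    return $summary;
}
```

For example, `summarizeBlueprints($client->batchGetBlueprints(['Names' => $names]))` gives a single array that can be checked for any entry that is not `ACTIVE`.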
BatchGetCrawlers
$result = $client->batchGetCrawlers([/* ... */]);
$promise = $client->batchGetCrawlersAsync([/* ... */]);
Returns a list of resource metadata for a given list of crawler names. After calling the ListCrawlers operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
Parameter Syntax
$result = $client->batchGetCrawlers([
    'CrawlerNames' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- CrawlerNames
-
- Required: Yes
- Type: Array of strings
A list of crawler names, which might be the names returned from the ListCrawlers operation.
Result Syntax
[ 'Crawlers' => [ [ 'Classifiers' => ['<string>', ...], 'Configuration' => '<string>', 'CrawlElapsedTime' => <integer>, 'CrawlerSecurityConfiguration' => '<string>', 'CreationTime' => <DateTime>, 'DatabaseName' => '<string>', 'Description' => '<string>', 'LakeFormationConfiguration' => [ 'AccountId' => '<string>', 'UseLakeFormationCredentials' => true || false, ], 'LastCrawl' => [ 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'MessagePrefix' => '<string>', 'StartTime' => <DateTime>, 'Status' => 'SUCCEEDED|CANCELLED|FAILED', ], 'LastUpdated' => <DateTime>, 'LineageConfiguration' => [ 'CrawlerLineageSettings' => 'ENABLE|DISABLE', ], 'Name' => '<string>', 'RecrawlPolicy' => [ 'RecrawlBehavior' => 'CRAWL_EVERYTHING|CRAWL_NEW_FOLDERS_ONLY|CRAWL_EVENT_MODE', ], 'Role' => '<string>', 'Schedule' => [ 'ScheduleExpression' => '<string>', 'State' => 'SCHEDULED|NOT_SCHEDULED|TRANSITIONING', ], 'SchemaChangePolicy' => [ 'DeleteBehavior' => 'LOG|DELETE_FROM_DATABASE|DEPRECATE_IN_DATABASE', 'UpdateBehavior' => 'LOG|UPDATE_IN_DATABASE', ], 'State' => 'READY|RUNNING|STOPPING', 'TablePrefix' => '<string>', 'Targets' => [ 'CatalogTargets' => [ [ 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Tables' => ['<string>', ...], ], // ... ], 'DeltaTargets' => [ [ 'ConnectionName' => '<string>', 'CreateNativeDeltaTable' => true || false, 'DeltaTables' => ['<string>', ...], 'WriteManifest' => true || false, ], // ... ], 'DynamoDBTargets' => [ [ 'Path' => '<string>', 'scanAll' => true || false, 'scanRate' => <float>, ], // ... ], 'HudiTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... ], 'IcebergTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... 
], 'JdbcTargets' => [ [ 'ConnectionName' => '<string>', 'EnableAdditionalMetadata' => ['<string>', ...], 'Exclusions' => ['<string>', ...], 'Path' => '<string>', ], // ... ], 'MongoDBTargets' => [ [ 'ConnectionName' => '<string>', 'Path' => '<string>', 'ScanAll' => true || false, ], // ... ], 'S3Targets' => [ [ 'ConnectionName' => '<string>', 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Exclusions' => ['<string>', ...], 'Path' => '<string>', 'SampleSize' => <integer>, ], // ... ], ], 'Version' => <integer>, ], // ... ], 'CrawlersNotFound' => ['<string>', ...], ]
Result Details
Members
- Crawlers
-
- Type: Array of Crawler structures
A list of crawler definitions.
- CrawlersNotFound
-
- Type: Array of strings
A list of names of crawlers that were not found.
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
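The list-then-batch-get flow described for this operation can be sketched as follows. The helper `fetchAllCrawlers` is hypothetical. The `CrawlerNames` and `NextToken` members of the ListCrawlers result and the batch size of 100 are assumptions taken from outside this section, so confirm them against that operation's reference before use.

```php
<?php
/**
 * Hypothetical helper: pages through ListCrawlers, then fetches the crawler
 * definitions with BatchGetCrawlers in chunks.
 *
 * @param callable $listCrawlers     e.g. fn(array $p) => $client->listCrawlers($p)
 * @param callable $batchGetCrawlers e.g. fn(array $p) => $client->batchGetCrawlers($p)
 * @param int      $batchSize        Assumed per-request limit for CrawlerNames.
 */
function fetchAllCrawlers(
    callable $listCrawlers,
    callable $batchGetCrawlers,
    int $batchSize = 100
): array {
    // Step 1: collect every crawler name, following NextToken while present.
    $names = [];
    $params = [];
    do {
        $page = $listCrawlers($params);
        foreach ($page['CrawlerNames'] ?? [] as $name) {
            $names[] = $name;
        }
        $token = $page['NextToken'] ?? null;
        $params = ['NextToken' => $token];
    } while ($token !== null && $token !== '');

    // Step 2: resolve the names to crawler definitions, one chunk at a time.
    $crawlers = [];
    $notFound = [];
    foreach (array_chunk($names, $batchSize) as $chunk) {
        $result = $batchGetCrawlers(['CrawlerNames' => $chunk]);
        foreach ($result['Crawlers'] ?? [] as $crawler) {
            $crawlers[] = $crawler;
        }
        foreach ($result['CrawlersNotFound'] ?? [] as $missing) {
            $notFound[] = $missing;
        }
    }
    return ['Crawlers' => $crawlers, 'CrawlersNotFound' => $notFound];
}
```

Passing the two calls as callables keeps the helper independent of the client, which also makes it easy to exercise with stubs.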
BatchGetCustomEntityTypes
$result = $client->batchGetCustomEntityTypes([/* ... */]);
$promise = $client->batchGetCustomEntityTypesAsync([/* ... */]);
Retrieves the details for the custom patterns specified by a list of names.
Parameter Syntax
$result = $client->batchGetCustomEntityTypes([
    'Names' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- Names
-
- Required: Yes
- Type: Array of strings
A list of names of the custom patterns that you want to retrieve.
Result Syntax
[
    'CustomEntityTypes' => [
        [
            'ContextWords' => ['<string>', ...],
            'Name' => '<string>',
            'RegexString' => '<string>',
        ],
        // ...
    ],
    'CustomEntityTypesNotFound' => ['<string>', ...],
]
Result Details
Members
- CustomEntityTypes
-
- Type: Array of CustomEntityType structures
A list of CustomEntityType objects representing the custom patterns that have been created.
- CustomEntityTypesNotFound
-
- Type: Array of strings
A list of the names of custom patterns that were not found.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
BatchGetDataQualityResult
$result = $client->batchGetDataQualityResult([/* ... */]);
$promise = $client->batchGetDataQualityResultAsync([/* ... */]);
Retrieves a list of data quality results for the specified result IDs.
Parameter Syntax
$result = $client->batchGetDataQualityResult([
    'ResultIds' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- ResultIds
-
- Required: Yes
- Type: Array of strings
A list of unique result IDs for the data quality results.
Result Syntax
[ 'Results' => [ [ 'AnalyzerResults' => [ [ 'Description' => '<string>', 'EvaluatedMetrics' => [<float>, ...], 'EvaluationMessage' => '<string>', 'Name' => '<string>', ], // ... ], 'CompletedOn' => <DateTime>, 'DataSource' => [ 'GlueTable' => [ 'AdditionalOptions' => ['<string>', ...], 'CatalogId' => '<string>', 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], ], 'EvaluationContext' => '<string>', 'JobName' => '<string>', 'JobRunId' => '<string>', 'Observations' => [ [ 'Description' => '<string>', 'MetricBasedObservation' => [ 'MetricName' => '<string>', 'MetricValues' => [ 'ActualValue' => <float>, 'ExpectedValue' => <float>, 'LowerLimit' => <float>, 'UpperLimit' => <float>, ], 'NewRules' => ['<string>', ...], 'StatisticId' => '<string>', ], ], // ... ], 'ProfileId' => '<string>', 'ResultId' => '<string>', 'RuleResults' => [ [ 'Description' => '<string>', 'EvaluatedMetrics' => [<float>, ...], 'EvaluatedRule' => '<string>', 'EvaluationMessage' => '<string>', 'Name' => '<string>', 'Result' => 'PASS|FAIL|ERROR', ], // ... ], 'RulesetEvaluationRunId' => '<string>', 'RulesetName' => '<string>', 'Score' => <float>, 'StartedOn' => <DateTime>, ], // ... ], 'ResultsNotFound' => ['<string>', ...], ]
Result Details
Members
- Results
-
- Required: Yes
- Type: Array of DataQualityResult structures
A list of DataQualityResult objects representing the data quality results.
- ResultsNotFound
-
- Type: Array of strings
A list of result IDs for which results were not found.
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
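Each entry in `Results` carries a list of `RuleResults` whose `Result` is PASS, FAIL, or ERROR. The sketch below condenses that into per-result counts. The helper name `countRuleOutcomes` is hypothetical, and any outcome outside those three values is ignored.

```php
<?php
/**
 * Hypothetical helper: for each data quality result, counts how many rules
 * passed, failed, or errored, and carries over the overall score.
 * Accepts the BatchGetDataQualityResult result (array or ArrayAccess).
 */
function countRuleOutcomes($result): array
{
    $summary = [];
    foreach ($result['Results'] ?? [] as $dataQualityResult) {
        $counts = ['PASS' => 0, 'FAIL' => 0, 'ERROR' => 0];
        foreach ($dataQualityResult['RuleResults'] ?? [] as $rule) {
            $outcome = $rule['Result'] ?? null;
            // Count only the three documented outcomes.
            if (is_string($outcome) && isset($counts[$outcome])) {
                $counts[$outcome]++;
            }
        }
        $counts['Score'] = $dataQualityResult['Score'] ?? null;
        $summary[$dataQualityResult['ResultId']] = $counts;
    }
    return $summary;
}
```

IDs listed under `ResultsNotFound` do not appear in the summary, so a caller that needs every requested ID accounted for should compare the summary keys against that list as well.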
BatchGetDevEndpoints
$result = $client->batchGetDevEndpoints([/* ... */]);
$promise = $client->batchGetDevEndpointsAsync([/* ... */]);
Returns a list of resource metadata for a given list of development endpoint names. After calling the ListDevEndpoints operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
Parameter Syntax
$result = $client->batchGetDevEndpoints([
    'DevEndpointNames' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- DevEndpointNames
-
- Required: Yes
- Type: Array of strings
The list of DevEndpoint names, which might be the names returned from the ListDevEndpoints operation.
Result Syntax
[
    'DevEndpoints' => [
        [
            'Arguments' => ['<string>', ...],
            'AvailabilityZone' => '<string>',
            'CreatedTimestamp' => <DateTime>,
            'EndpointName' => '<string>',
            'ExtraJarsS3Path' => '<string>',
            'ExtraPythonLibsS3Path' => '<string>',
            'FailureReason' => '<string>',
            'GlueVersion' => '<string>',
            'LastModifiedTimestamp' => <DateTime>,
            'LastUpdateStatus' => '<string>',
            'NumberOfNodes' => <integer>,
            'NumberOfWorkers' => <integer>,
            'PrivateAddress' => '<string>',
            'PublicAddress' => '<string>',
            'PublicKey' => '<string>',
            'PublicKeys' => ['<string>', ...],
            'RoleArn' => '<string>',
            'SecurityConfiguration' => '<string>',
            'SecurityGroupIds' => ['<string>', ...],
            'Status' => '<string>',
            'SubnetId' => '<string>',
            'VpcId' => '<string>',
            'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
            'YarnEndpointAddress' => '<string>',
            'ZeppelinRemoteSparkInterpreterPort' => <integer>,
        ],
        // ...
    ],
    'DevEndpointsNotFound' => ['<string>', ...],
]
Result Details
Members
- DevEndpoints
-
- Type: Array of DevEndpoint structures
A list of DevEndpoint definitions.
- DevEndpointsNotFound
-
- Type: Array of strings
A list of the names of DevEndpoints that were not found.
Errors
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
BatchGetJobs
$result = $client->batchGetJobs([/* ... */]);
$promise = $client->batchGetJobsAsync([/* ... */]);
Returns a list of resource metadata for a given list of job names. After calling the ListJobs operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
Parameter Syntax
$result = $client->batchGetJobs([
    'JobNames' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- JobNames
-
- Required: Yes
- Type: Array of strings
A list of job names, which might be the names returned from the ListJobs operation.
Result Syntax
[ 'Jobs' => [ [ 'AllocatedCapacity' => <integer>, 'CodeGenConfigurationNodes' => [ '<NodeId>' => [ 'Aggregate' => [ 'Aggs' => [ [ 'AggFunc' => 'avg|countDistinct|count|first|last|kurtosis|max|min|skewness|stddev_samp|stddev_pop|sum|sumDistinct|var_samp|var_pop', 'Column' => ['<string>', ...], ], // ... ], 'Groups' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'AmazonRedshiftSource' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', ], 'AmazonRedshiftTarget' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'ApplyMapping' => [ 'Inputs' => ['<string>', ...], 'Mapping' => [ [ 'Children' => [...], // RECURSIVE 'Dropped' => true || false, 'FromPath' => ['<string>', ...], 'FromType' => '<string>', 'ToKey' => '<string>', 'ToType' => '<string>', ], // ... 
], 'Name' => '<string>', ], 'AthenaConnectorSource' => [ 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'SchemaName' => '<string>', ], 'CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'CatalogKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'Table' => '<string>', 'WindowSize' => <integer>, ], 'CatalogKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' 
=> true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <DateTime>, 'StreamArn' => '<string>', 'StreamName' => '<string>', ], 'Table' => '<string>', 'WindowSize' => <integer>, ], 'CatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'CatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Table' => '<string>', ], 'ConnectorDataSource' => [ 'ConnectionType' => '<string>', 'Data' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'ConnectorDataTarget' => [ 'ConnectionType' => '<string>', 'Data' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'CustomCode' => [ 'ClassName' => '<string>', 'Code' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... 
], ], 'DirectJDBCSource' => [ 'ConnectionName' => '<string>', 'ConnectionType' => 'sqlserver|mysql|oracle|postgresql|redshift', 'Database' => '<string>', 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', ], 'DirectKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'WindowSize' => <integer>, ], 'DirectKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' => true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <DateTime>, 'StreamArn' => '<string>', 'StreamName' => 
'<string>', ], 'WindowSize' => <integer>, ], 'DropDuplicates' => [ 'Columns' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'DropFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... ], ], 'DropNullFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'NullCheckBoxList' => [ 'IsEmpty' => true || false, 'IsNegOne' => true || false, 'IsNullString' => true || false, ], 'NullTextList' => [ [ 'Datatype' => [ 'Id' => '<string>', 'Label' => '<string>', ], 'Value' => '<string>', ], // ... ], ], 'DynamicTransform' => [ 'FunctionName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Parameters' => [ [ 'IsOptional' => true || false, 'ListType' => 'str|int|float|complex|bool|list|null', 'Name' => '<string>', 'Type' => 'str|int|float|complex|bool|list|null', 'ValidationMessage' => '<string>', 'ValidationRule' => '<string>', 'Value' => ['<string>', ...], ], // ... 
], 'Path' => '<string>', 'TransformName' => '<string>', 'Version' => '<string>', ], 'DynamoDBCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'EvaluateDataQuality' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Output' => 'PrimaryInput|EvaluationResults', 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'EvaluateDataQualityMultiFrame' => [ 'AdditionalDataSources' => ['<string>', ...], 'AdditionalOptions' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'FillMissingValues' => [ 'FilledPath' => '<string>', 'ImputedPath' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'Filter' => [ 'Filters' => [ [ 'Negated' => true || false, 'Operation' => 'EQ|LT|GT|LTE|GTE|REGEX|ISNULL', 'Values' => [ [ 'Type' => 'COLUMNEXTRACTED|CONSTANT', 'Value' => ['<string>', ...], ], // ... ], ], // ... ], 'Inputs' => ['<string>', ...], 'LogicalOperator' => 'AND|OR', 'Name' => '<string>', ], 'GovernedCatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', 'Name' => '<string>', 'PartitionPredicate' => '<string>', 'Table' => '<string>', ], 'GovernedCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'JDBCConnectorSource' => [ 'AdditionalOptions' => [ 'DataTypeMapping' => ['<string>', ...], 'FilterPredicate' => '<string>', 'JobBookmarkKeys' => ['<string>', ...], 'JobBookmarkKeysSortOrder' => '<string>', 'LowerBound' => <integer>, 'NumPartitions' => <integer>, 'PartitionColumn' => '<string>', 'UpperBound' => <integer>, ], 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Query' => '<string>', ], 'JDBCConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'Join' => [ 'Columns' => [ [ 'From' => '<string>', 'Keys' => [ ['<string>', ...], // ... ], ], // ... ], 'Inputs' => ['<string>', ...], 'JoinType' => 'equijoin|left|right|outer|leftsemi|leftanti', 'Name' => '<string>', ], 'Merge' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PrimaryKeys' => [ ['<string>', ...], // ... 
], 'Source' => '<string>', ], 'MicrosoftSQLServerCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'MicrosoftSQLServerCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'MySQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'MySQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'OracleSQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'OracleSQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'PIIDetection' => [ 'EntityTypesToDetect' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'MaskValue' => '<string>', 'Name' => '<string>', 'OutputColumnName' => '<string>', 'PiiType' => 'RowAudit|RowMasking|ColumnAudit|ColumnMasking', 'SampleFraction' => <float>, 'ThresholdFraction' => <float>, ], 'PostgreSQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'PostgreSQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'Recipe' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'RecipeReference' => [ 'RecipeArn' => '<string>', 'RecipeVersion' => '<string>', ], 'RecipeSteps' => [ [ 'Action' => [ 'Operation' => '<string>', 'Parameters' => ['<string>', ...], ], 'ConditionExpressions' => [ [ 'Condition' => '<string>', 'TargetColumn' => '<string>', 'Value' => '<string>', ], // ... ], ], // ... 
], ], 'RedshiftSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', 'TmpDirIAMRole' => '<string>', ], 'RedshiftTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', 'TmpDirIAMRole' => '<string>', 'UpsertRedshiftOptions' => [ 'ConnectionName' => '<string>', 'TableLocation' => '<string>', 'UpsertKeys' => ['<string>', ...], ], ], 'RelationalCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'RenameField' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'SourcePath' => ['<string>', ...], 'TargetPath' => ['<string>', ...], ], 'S3CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'S3CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'S3CatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', 'Name' => '<string>', 'PartitionPredicate' => '<string>', 'Table' => '<string>', ], 'S3CatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3CsvSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Escaper' => '<string>', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', 'OptimizePerformance' => true || false, 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'QuoteChar' => 'quote|quillemet|single_quote|disabled', 'Recurse' => true || false, 'Separator' => 'comma|ctrla|pipe|semicolon|tab', 'SkipFirst' => true || false, 'WithHeader' => true || false, 'WriteHeader' => true || false, ], 'S3DeltaCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3DeltaDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Compression' => 'uncompressed|snappy', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3DeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], ], 'S3DirectTarget' => [ 'Compression' => '<string>', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3GlueParquetTarget' => [ 'Compression' => 'snappy|lzo|gzip|uncompressed|none', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3HudiDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Compression' => 'gzip|lzo|uncompressed|snappy', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], ], 'S3JsonSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'JsonPath' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'Recurse' => true || false, ], 'S3ParquetSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'snappy|lzo|gzip|uncompressed|none', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'Recurse' => true || false, ], 'SelectFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... 
], ], 'SelectFromCollection' => [ 'Index' => <integer>, 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'SnowflakeSource' => [ 'Data' => [ 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SnowflakeTarget' => [ 'Data' => [ 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'SparkConnectorSource' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkSQL' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'SqlAliases' => [ [ 'Alias' => '<string>', 'From' => '<string>', ], // ... ], 'SqlQuery' => '<string>', ], 'Spigot' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Path' => '<string>', 'Prob' => <float>, 'Topk' => <integer>, ], 'SplitFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... ], ], 'Union' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'UnionType' => 'ALL|DISTINCT', ], ], // ... 
], 'Command' => [ 'Name' => '<string>', 'PythonVersion' => '<string>', 'Runtime' => '<string>', 'ScriptLocation' => '<string>', ], 'Connections' => [ 'Connections' => ['<string>', ...], ], 'CreatedOn' => <DateTime>, 'DefaultArguments' => ['<string>', ...], 'Description' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionProperty' => [ 'MaxConcurrentRuns' => <integer>, ], 'GlueVersion' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobRunQueuingEnabled' => true || false, 'LastModifiedOn' => <DateTime>, 'LogUri' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'MaxRetries' => <integer>, 'Name' => '<string>', 'NonOverridableArguments' => ['<string>', ...], 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'ProfileName' => '<string>', 'Role' => '<string>', 'SecurityConfiguration' => '<string>', 'SourceControlDetails' => [ 'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER', 'AuthToken' => '<string>', 'Branch' => '<string>', 'Folder' => '<string>', 'LastCommitId' => '<string>', 'Owner' => '<string>', 'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT', 'Repository' => '<string>', ], 'Timeout' => <integer>, 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], // ... ], 'JobsNotFound' => ['<string>', ...], ]
Result Details
Members
- Jobs
-
- Type: Array of Job structures
A list of job definitions.
- JobsNotFound
-
- Type: Array of strings
A list of names of jobs not found.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
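As a rough illustration of reading the result described above, the following sketch indexes the returned job definitions by name and collects the names reported in JobsNotFound. The helper name summarizeBatchGetJobsResult and the sample values are hypothetical; the array keys come from the Result Syntax. With a real client, the helper would presumably receive $result->toArray().

```php
<?php
// Hypothetical helper (not part of the SDK): maps each returned job name to
// its WorkerType and keeps the names the service could not find.
function summarizeBatchGetJobsResult(array $result): array
{
    $workerTypes = [];
    foreach ($result['Jobs'] ?? [] as $job) {
        // 'WorkerType' may be absent on a job definition; default to null.
        $workerTypes[$job['Name']] = $job['WorkerType'] ?? null;
    }

    return [
        'workerTypes' => $workerTypes,
        'missing' => $result['JobsNotFound'] ?? [],
    ];
}

// Made-up sample data shaped like the Result Syntax.
$sample = [
    'Jobs' => [
        ['Name' => 'nightly-etl', 'WorkerType' => 'G.1X'],
        ['Name' => 'legacy-load'],
    ],
    'JobsNotFound' => ['typo-job'],
];

$summary = summarizeBatchGetJobsResult($sample);
```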
BatchGetPartition
$result = $client->batchGetPartition([/* ... */]);
$promise = $client->batchGetPartitionAsync([/* ... */]);
Retrieves partitions in a batch request.
Parameter Syntax
$result = $client->batchGetPartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionsToGet' => [ // REQUIRED
        [
            'Values' => ['<string>', ...], // REQUIRED
        ],
        // ...
    ],
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partitions reside.
- PartitionsToGet
-
- Required: Yes
- Type: Array of PartitionValueList structures
A list of partition values identifying the partitions to retrieve.
- TableName
-
- Required: Yes
- Type: string
The name of the partitions' table.
Result Syntax
[ 'Partitions' => [ [ 'CatalogId' => '<string>', 'CreationTime' => <DateTime>, 'DatabaseName' => '<string>', 'LastAccessTime' => <DateTime>, 'LastAnalyzedTime' => <DateTime>, 'Parameters' => ['<string>', ...], 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', 'SortOrder' => <integer>, ], // ... ], 'StoredAsSubDirectories' => true || false, ], 'TableName' => '<string>', 'Values' => ['<string>', ...], ], // ... ], 'UnprocessedKeys' => [ [ 'Values' => ['<string>', ...], ], // ... ], ]
Result Details
Members
- Partitions
-
- Type: Array of Partition structures
A list of the requested partitions.
- UnprocessedKeys
-
- Type: Array of PartitionValueList structures
A list of the partition values in the request for which partitions were not returned.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- GlueEncryptionException:
An encryption operation failed.
- InvalidStateException:
An error that indicates your data is in an invalid state.
- FederationSourceException:
A federation source failed.
- FederationSourceRetryableException:
A federation source failed, but the operation may be retried.
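A minimal sketch of assembling the BatchGetPartition parameters from plain lists of partition values is shown here. The helper name buildBatchGetPartitionParams and the database, table, and partition values are hypothetical; only the parameter keys are taken from the Parameter Syntax.

```php
<?php
// Hypothetical helper: wraps each list of partition values in the
// PartitionValueList shape expected under 'PartitionsToGet'.
function buildBatchGetPartitionParams(
    string $databaseName,
    string $tableName,
    array $partitionValueLists,
    ?string $catalogId = null
): array {
    $params = [
        'DatabaseName' => $databaseName,
        'TableName' => $tableName,
        'PartitionsToGet' => array_map(
            fn(array $values) => ['Values' => array_values($values)],
            $partitionValueLists
        ),
    ];

    // CatalogId is optional; leave it out so the account default applies.
    if ($catalogId !== null) {
        $params['CatalogId'] = $catalogId;
    }

    return $params;
}

$params = buildBatchGetPartitionParams(
    'sales_db',
    'orders',
    [['2024', '01'], ['2024', '02']]
);

// With a configured client: $result = $client->batchGetPartition($params);
```

Because UnprocessedKeys uses the same PartitionValueList structure as PartitionsToGet, one plausible follow-up is to pass those entries back in a second request, although whether a retry is appropriate depends on why they were not returned.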
BatchGetTableOptimizer
$result = $client->batchGetTableOptimizer([/* ... */]);
$promise = $client->batchGetTableOptimizerAsync([/* ... */]);
Returns the configuration for the specified table optimizers.
Parameter Syntax
$result = $client->batchGetTableOptimizer([
    'Entries' => [ // REQUIRED
        [
            'catalogId' => '<string>',
            'databaseName' => '<string>',
            'tableName' => '<string>',
            'type' => 'compaction|retention|orphan_file_deletion',
        ],
        // ...
    ],
]);
Parameter Details
Members
- Entries
-
- Required: Yes
- Type: Array of BatchGetTableOptimizerEntry structures
A list of BatchGetTableOptimizerEntry objects specifying the table optimizers to retrieve.
Result Syntax
[ 'Failures' => [ [ 'catalogId' => '<string>', 'databaseName' => '<string>', 'error' => [ 'ErrorCode' => '<string>', 'ErrorMessage' => '<string>', ], 'tableName' => '<string>', 'type' => 'compaction|retention|orphan_file_deletion', ], // ... ], 'TableOptimizers' => [ [ 'catalogId' => '<string>', 'databaseName' => '<string>', 'tableName' => '<string>', 'tableOptimizer' => [ 'configuration' => [ 'enabled' => true || false, 'orphanFileDeletionConfiguration' => [ 'icebergConfiguration' => [ 'location' => '<string>', 'orphanFileRetentionPeriodInDays' => <integer>, ], ], 'retentionConfiguration' => [ 'icebergConfiguration' => [ 'cleanExpiredFiles' => true || false, 'numberOfSnapshotsToRetain' => <integer>, 'snapshotRetentionPeriodInDays' => <integer>, ], ], 'roleArn' => '<string>', ], 'lastRun' => [ 'compactionMetrics' => [ 'IcebergMetrics' => [ 'JobDurationInHour' => <float>, 'NumberOfBytesCompacted' => <integer>, 'NumberOfDpus' => <integer>, 'NumberOfFilesCompacted' => <integer>, ], ], 'endTimestamp' => <DateTime>, 'error' => '<string>', 'eventType' => 'starting|completed|failed|in_progress', 'metrics' => [ 'JobDurationInHour' => '<string>', 'NumberOfBytesCompacted' => '<string>', 'NumberOfDpus' => '<string>', 'NumberOfFilesCompacted' => '<string>', ], 'orphanFileDeletionMetrics' => [ 'IcebergMetrics' => [ 'JobDurationInHour' => <float>, 'NumberOfDpus' => <integer>, 'NumberOfOrphanFilesDeleted' => <integer>, ], ], 'retentionMetrics' => [ 'IcebergMetrics' => [ 'JobDurationInHour' => <float>, 'NumberOfDataFilesDeleted' => <integer>, 'NumberOfDpus' => <integer>, 'NumberOfManifestFilesDeleted' => <integer>, 'NumberOfManifestListsDeleted' => <integer>, ], ], 'startTimestamp' => <DateTime>, ], 'type' => 'compaction|retention|orphan_file_deletion', ], ], // ... ], ]
Result Details
Members
- Failures
-
- Type: Array of BatchGetTableOptimizerError structures
A list of errors from the operation.
- TableOptimizers
-
- Type: Array of BatchTableOptimizer structures
A list of BatchTableOptimizer objects.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- ThrottlingException:
The throttling threshold was exceeded.
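The entry keys for BatchGetTableOptimizer are lower camel case (catalogId, databaseName, tableName, type), unlike most other Glue parameters, which is easy to get wrong. The sketch below builds one entry per optimizer type for a single table; the helper name buildOptimizerEntries, the OPTIMIZER_TYPES constant, and the sample identifiers are hypothetical.

```php
<?php
// The three type values listed in the Parameter Syntax.
const OPTIMIZER_TYPES = ['compaction', 'retention', 'orphan_file_deletion'];

// Hypothetical helper: builds the 'Entries' list for one table.
function buildOptimizerEntries(
    string $catalogId,
    string $databaseName,
    string $tableName,
    array $types = OPTIMIZER_TYPES
): array {
    $entries = [];
    foreach ($types as $type) {
        if (!in_array($type, OPTIMIZER_TYPES, true)) {
            throw new InvalidArgumentException("Unknown optimizer type: $type");
        }
        $entries[] = [
            'catalogId' => $catalogId,
            'databaseName' => $databaseName,
            'tableName' => $tableName,
            'type' => $type,
        ];
    }

    return ['Entries' => $entries];
}

// Placeholder account ID and names, for illustration only.
$params = buildOptimizerEntries('123456789012', 'sales_db', 'orders');

// With a configured client: $result = $client->batchGetTableOptimizer($params);
```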
BatchGetTriggers
$result = $client->batchGetTriggers([/* ... */]);
$promise = $client->batchGetTriggersAsync([/* ... */]);
Returns a list of resource metadata for a given list of trigger names. After calling the ListTriggers operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
Parameter Syntax
$result = $client->batchGetTriggers([
    'TriggerNames' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- TriggerNames
-
- Required: Yes
- Type: Array of strings
A list of trigger names, which may be the names returned from the ListTriggers operation.
Result Syntax
[ 'Triggers' => [ [ 'Actions' => [ [ 'Arguments' => ['<string>', ...], 'CrawlerName' => '<string>', 'JobName' => '<string>', 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'SecurityConfiguration' => '<string>', 'Timeout' => <integer>, ], // ... ], 'Description' => '<string>', 'EventBatchingCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Id' => '<string>', 'Name' => '<string>', 'Predicate' => [ 'Conditions' => [ [ 'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', 'CrawlerName' => '<string>', 'JobName' => '<string>', 'LogicalOperator' => 'EQUALS', 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', ], // ... ], 'Logical' => 'AND|ANY', ], 'Schedule' => '<string>', 'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING', 'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT', 'WorkflowName' => '<string>', ], // ... ], 'TriggersNotFound' => ['<string>', ...], ]
Result Details
Members
- Triggers
-
- Type: Array of Trigger structures
A list of trigger definitions.
- TriggersNotFound
-
- Type: Array of strings
A list of names of triggers not found.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
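One possible way to work with a BatchGetTriggers result is to group the returned trigger names by their State value. The helper name groupTriggerNamesByState and the sample data are hypothetical; the keys follow the Result Syntax.

```php
<?php
// Hypothetical helper: groups trigger names by the 'State' field.
function groupTriggerNamesByState(array $result): array
{
    $byState = [];
    foreach ($result['Triggers'] ?? [] as $trigger) {
        // Fall back to a placeholder key if 'State' is missing.
        $byState[$trigger['State'] ?? 'UNKNOWN'][] = $trigger['Name'];
    }

    return $byState;
}

// Made-up sample data shaped like the Result Syntax.
$sample = [
    'Triggers' => [
        ['Name' => 'hourly', 'State' => 'ACTIVATED', 'Type' => 'SCHEDULED'],
        ['Name' => 'on-upload', 'State' => 'DEACTIVATED', 'Type' => 'EVENT'],
        ['Name' => 'daily', 'State' => 'ACTIVATED', 'Type' => 'SCHEDULED'],
    ],
    'TriggersNotFound' => [],
];

$grouped = groupTriggerNamesByState($sample);
```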
BatchGetWorkflows
$result = $client->batchGetWorkflows([/* ... */]);
$promise = $client->batchGetWorkflowsAsync([/* ... */]);
Returns a list of resource metadata for a given list of workflow names. After calling the ListWorkflows operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
Parameter Syntax
$result = $client->batchGetWorkflows([
    'IncludeGraph' => true || false,
    'Names' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- IncludeGraph
-
- Type: boolean
Specifies whether to include a graph when returning the workflow resource metadata.
- Names
-
- Required: Yes
- Type: Array of strings
A list of workflow names, which may be the names returned from the ListWorkflows operation.
Result Syntax
[ 'MissingWorkflows' => ['<string>', ...], 'Workflows' => [ [ 'BlueprintDetails' => [ 'BlueprintName' => '<string>', 'RunId' => '<string>', ], 'CreatedOn' => <DateTime>, 'DefaultRunProperties' => ['<string>', ...], 'Description' => '<string>', 'Graph' => [ 'Edges' => [ [ 'DestinationId' => '<string>', 'SourceId' => '<string>', ], // ... ], 'Nodes' => [ [ 'CrawlerDetails' => [ 'Crawls' => [ [ 'CompletedOn' => <DateTime>, 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'StartedOn' => <DateTime>, 'State' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', ], // ... ], ], 'JobDetails' => [ 'JobRuns' => [ [ 'AllocatedCapacity' => <integer>, 'Arguments' => ['<string>', ...], 'Attempt' => <integer>, 'CompletedOn' => <DateTime>, 'DPUSeconds' => <float>, 'ErrorMessage' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionTime' => <integer>, 'GlueVersion' => '<string>', 'Id' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobName' => '<string>', 'JobRunQueuingEnabled' => true || false, 'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', 'LastModifiedOn' => <DateTime>, 'LogGroupName' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'PredecessorRuns' => [ [ 'JobName' => '<string>', 'RunId' => '<string>', ], // ... ], 'PreviousRunId' => '<string>', 'ProfileName' => '<string>', 'SecurityConfiguration' => '<string>', 'StartedOn' => <DateTime>, 'StateDetail' => '<string>', 'Timeout' => <integer>, 'TriggerName' => '<string>', 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], // ... 
], ], 'Name' => '<string>', 'TriggerDetails' => [ 'Trigger' => [ 'Actions' => [ [ 'Arguments' => ['<string>', ...], 'CrawlerName' => '<string>', 'JobName' => '<string>', 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'SecurityConfiguration' => '<string>', 'Timeout' => <integer>, ], // ... ], 'Description' => '<string>', 'EventBatchingCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Id' => '<string>', 'Name' => '<string>', 'Predicate' => [ 'Conditions' => [ [ 'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', 'CrawlerName' => '<string>', 'JobName' => '<string>', 'LogicalOperator' => 'EQUALS', 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', ], // ... ], 'Logical' => 'AND|ANY', ], 'Schedule' => '<string>', 'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING', 'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT', 'WorkflowName' => '<string>', ], ], 'Type' => 'CRAWLER|JOB|TRIGGER', 'UniqueId' => '<string>', ], // ... ], ], 'LastModifiedOn' => <DateTime>, 'LastRun' => [ 'CompletedOn' => <DateTime>, 'ErrorMessage' => '<string>', 'Graph' => [ 'Edges' => [ [ 'DestinationId' => '<string>', 'SourceId' => '<string>', ], // ... ], 'Nodes' => [ [ 'CrawlerDetails' => [ 'Crawls' => [ [ 'CompletedOn' => <DateTime>, 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'StartedOn' => <DateTime>, 'State' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', ], // ... 
], ], 'JobDetails' => [ 'JobRuns' => [ [ 'AllocatedCapacity' => <integer>, 'Arguments' => ['<string>', ...], 'Attempt' => <integer>, 'CompletedOn' => <DateTime>, 'DPUSeconds' => <float>, 'ErrorMessage' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionTime' => <integer>, 'GlueVersion' => '<string>', 'Id' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobName' => '<string>', 'JobRunQueuingEnabled' => true || false, 'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', 'LastModifiedOn' => <DateTime>, 'LogGroupName' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'PredecessorRuns' => [ [ 'JobName' => '<string>', 'RunId' => '<string>', ], // ... ], 'PreviousRunId' => '<string>', 'ProfileName' => '<string>', 'SecurityConfiguration' => '<string>', 'StartedOn' => <DateTime>, 'StateDetail' => '<string>', 'Timeout' => <integer>, 'TriggerName' => '<string>', 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], // ... ], ], 'Name' => '<string>', 'TriggerDetails' => [ 'Trigger' => [ 'Actions' => [ [ 'Arguments' => ['<string>', ...], 'CrawlerName' => '<string>', 'JobName' => '<string>', 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'SecurityConfiguration' => '<string>', 'Timeout' => <integer>, ], // ... ], 'Description' => '<string>', 'EventBatchingCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Id' => '<string>', 'Name' => '<string>', 'Predicate' => [ 'Conditions' => [ [ 'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR', 'CrawlerName' => '<string>', 'JobName' => '<string>', 'LogicalOperator' => 'EQUALS', 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED', ], // ... 
], 'Logical' => 'AND|ANY', ], 'Schedule' => '<string>', 'State' => 'CREATING|CREATED|ACTIVATING|ACTIVATED|DEACTIVATING|DEACTIVATED|DELETING|UPDATING', 'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT', 'WorkflowName' => '<string>', ], ], 'Type' => 'CRAWLER|JOB|TRIGGER', 'UniqueId' => '<string>', ], // ... ], ], 'Name' => '<string>', 'PreviousRunId' => '<string>', 'StartedOn' => <DateTime>, 'StartingEventBatchCondition' => [ 'BatchSize' => <integer>, 'BatchWindow' => <integer>, ], 'Statistics' => [ 'ErroredActions' => <integer>, 'FailedActions' => <integer>, 'RunningActions' => <integer>, 'StoppedActions' => <integer>, 'SucceededActions' => <integer>, 'TimeoutActions' => <integer>, 'TotalActions' => <integer>, 'WaitingActions' => <integer>, ], 'Status' => 'RUNNING|COMPLETED|STOPPING|STOPPED|ERROR', 'WorkflowRunId' => '<string>', 'WorkflowRunProperties' => ['<string>', ...], ], 'MaxConcurrentRuns' => <integer>, 'Name' => '<string>', ], // ... ], ]
Result Details
Members
- MissingWorkflows
-
- Type: Array of strings
A list of names of workflows not found.
- Workflows
-
- Type: Array of Workflow structures
A list of workflow resource metadata.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
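When IncludeGraph is set, each returned workflow can carry a Graph with Nodes of type CRAWLER, JOB, or TRIGGER. The following sketch counts the nodes of each type for a single workflow entry; the helper name countGraphNodeTypes and the sample workflow are hypothetical, and the code simply returns zero counts when no graph is present.

```php
<?php
// Hypothetical helper: counts graph nodes per 'Type' for one workflow entry.
function countGraphNodeTypes(array $workflow): array
{
    $counts = ['CRAWLER' => 0, 'JOB' => 0, 'TRIGGER' => 0];
    foreach ($workflow['Graph']['Nodes'] ?? [] as $node) {
        $type = $node['Type'] ?? '';
        // Ignore any node type not listed in the Result Syntax.
        if (isset($counts[$type])) {
            $counts[$type]++;
        }
    }

    return $counts;
}

// Made-up sample shaped like one element of 'Workflows'.
$workflow = [
    'Name' => 'ingest',
    'Graph' => [
        'Nodes' => [
            ['Type' => 'TRIGGER', 'Name' => 'start'],
            ['Type' => 'CRAWLER', 'Name' => 'raw-crawler'],
            ['Type' => 'JOB', 'Name' => 'clean'],
            ['Type' => 'JOB', 'Name' => 'load'],
        ],
        'Edges' => [],
    ],
];

$counts = countGraphNodeTypes($workflow);
$noGraph = countGraphNodeTypes(['Name' => 'empty']);
```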
BatchPutDataQualityStatisticAnnotation
$result = $client->batchPutDataQualityStatisticAnnotation([/* ... */]);
$promise = $client->batchPutDataQualityStatisticAnnotationAsync([/* ... */]);
Annotate datapoints over time for a specific data quality statistic.
Parameter Syntax
$result = $client->batchPutDataQualityStatisticAnnotation([
    'ClientToken' => '<string>',
    'InclusionAnnotations' => [ // REQUIRED
        [
            'InclusionAnnotation' => 'INCLUDE|EXCLUDE',
            'ProfileId' => '<string>',
            'StatisticId' => '<string>',
        ],
        // ...
    ],
]);
Parameter Details
Members
- ClientToken
-
- Type: string
Client Token.
- InclusionAnnotations
-
- Required: Yes
- Type: Array of DatapointInclusionAnnotation structures
A list of DatapointInclusionAnnotation objects.
Result Syntax
[
    'FailedInclusionAnnotations' => [
        [
            'FailureReason' => '<string>',
            'ProfileId' => '<string>',
            'StatisticId' => '<string>',
        ],
        // ...
    ],
]
Result Details
Members
- FailedInclusionAnnotations
-
- Type: Array of AnnotationError structures
A list of AnnotationError objects.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
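As an illustration of the InclusionAnnotations shape, the sketch below marks several statistics of one profile with the same annotation. The helper name buildInclusionAnnotations and the profile and statistic IDs are hypothetical.

```php
<?php
// Hypothetical helper: applies one INCLUDE/EXCLUDE annotation to several
// statistic IDs that belong to the same profile.
function buildInclusionAnnotations(
    string $profileId,
    array $statisticIds,
    bool $include
): array {
    $annotations = [];
    foreach ($statisticIds as $statisticId) {
        $annotations[] = [
            'ProfileId' => $profileId,
            'StatisticId' => $statisticId,
            'InclusionAnnotation' => $include ? 'INCLUDE' : 'EXCLUDE',
        ];
    }

    return ['InclusionAnnotations' => $annotations];
}

// Placeholder IDs, for illustration only.
$params = buildInclusionAnnotations('profile-1', ['stat-a', 'stat-b'], false);

// With a configured client:
// $result = $client->batchPutDataQualityStatisticAnnotation($params);
// Any entries in $result['FailedInclusionAnnotations'] carry a FailureReason.
```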
BatchStopJobRun
$result = $client->batchStopJobRun([/* ... */]);
$promise = $client->batchStopJobRunAsync([/* ... */]);
Stops one or more job runs for a specified job definition.
Parameter Syntax
$result = $client->batchStopJobRun([
    'JobName' => '<string>', // REQUIRED
    'JobRunIds' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- JobName
-
- Required: Yes
- Type: string
The name of the job definition for which to stop job runs.
- JobRunIds
-
- Required: Yes
- Type: Array of strings
A list of the JobRunIds that should be stopped for that job definition.
Result Syntax
[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'JobName' => '<string>',
            'JobRunId' => '<string>',
        ],
        // ...
    ],
    'SuccessfulSubmissions' => [
        [
            'JobName' => '<string>',
            'JobRunId' => '<string>',
        ],
        // ...
    ],
]
Result Details
Members
- Errors
-
- Type: Array of BatchStopJobRunError structures
A list of the errors that were encountered in trying to stop JobRuns, including the JobRunId for which each error was encountered and details about the error.
- SuccessfulSubmissions
-
- Type: Array of BatchStopJobRunSuccessfulSubmission structures
A list of the JobRuns that were successfully submitted for stopping.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
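Because BatchStopJobRun reports per-run outcomes in the result rather than failing the whole call, a caller typically needs to inspect both lists. A minimal sketch of that, using a hypothetical helper splitBatchStopJobRunResult and made-up sample data:

```php
<?php
// Hypothetical helper: separates run IDs accepted for stopping from those
// that produced an error, keeping the error message for each failure.
function splitBatchStopJobRunResult(array $result): array
{
    $stopping = array_column($result['SuccessfulSubmissions'] ?? [], 'JobRunId');

    $failed = [];
    foreach ($result['Errors'] ?? [] as $error) {
        $failed[$error['JobRunId']] =
            $error['ErrorDetail']['ErrorMessage'] ?? 'unknown error';
    }

    return ['stopping' => $stopping, 'failed' => $failed];
}

// Made-up sample data shaped like the Result Syntax.
$sample = [
    'SuccessfulSubmissions' => [
        ['JobName' => 'nightly-etl', 'JobRunId' => 'jr_001'],
    ],
    'Errors' => [
        [
            'JobName' => 'nightly-etl',
            'JobRunId' => 'jr_002',
            'ErrorDetail' => [
                'ErrorCode' => 'ExampleCode',
                'ErrorMessage' => 'Example message',
            ],
        ],
    ],
];

$outcome = splitBatchStopJobRunResult($sample);
```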
BatchUpdatePartition
$result = $client->batchUpdatePartition([/* ... */]);
$promise = $client->batchUpdatePartitionAsync([/* ... */]);
Updates one or more partitions in a batch operation.
Parameter Syntax
$result = $client->batchUpdatePartition([ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', // REQUIRED 'Entries' => [ // REQUIRED [ 'PartitionInput' => [ // REQUIRED 'LastAccessTime' => <integer || string || DateTime>, 'LastAnalyzedTime' => <integer || string || DateTime>, 'Parameters' => ['<string>', ...], 'StorageDescriptor' => [ 'AdditionalLocations' => ['<string>', ...], 'BucketColumns' => ['<string>', ...], 'Columns' => [ [ 'Comment' => '<string>', 'Name' => '<string>', // REQUIRED 'Parameters' => ['<string>', ...], 'Type' => '<string>', ], // ... ], 'Compressed' => true || false, 'InputFormat' => '<string>', 'Location' => '<string>', 'NumberOfBuckets' => <integer>, 'OutputFormat' => '<string>', 'Parameters' => ['<string>', ...], 'SchemaReference' => [ 'SchemaId' => [ 'RegistryName' => '<string>', 'SchemaArn' => '<string>', 'SchemaName' => '<string>', ], 'SchemaVersionId' => '<string>', 'SchemaVersionNumber' => <integer>, ], 'SerdeInfo' => [ 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'SerializationLibrary' => '<string>', ], 'SkewedInfo' => [ 'SkewedColumnNames' => ['<string>', ...], 'SkewedColumnValueLocationMaps' => ['<string>', ...], 'SkewedColumnValues' => ['<string>', ...], ], 'SortColumns' => [ [ 'Column' => '<string>', // REQUIRED 'SortOrder' => <integer>, // REQUIRED ], // ... ], 'StoredAsSubDirectories' => true || false, ], 'Values' => ['<string>', ...], ], 'PartitionValueList' => ['<string>', ...], // REQUIRED ], // ... ], 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the catalog in which the partition is to be updated. Currently, this should be the Amazon Web Services account ID.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the metadata database in which the partition is to be updated.
- Entries
-
- Required: Yes
- Type: Array of BatchUpdatePartitionRequestEntry structures
A list of up to 100 BatchUpdatePartitionRequestEntry objects to update.
- TableName
-
- Required: Yes
- Type: string
The name of the metadata table in which the partition is to be updated.
Result Syntax
[
    'Errors' => [
        [
            'ErrorDetail' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'PartitionValueList' => ['<string>', ...],
        ],
        // ...
    ],
]
Result Details
Members
- Errors
-
- Type: Array of BatchUpdatePartitionFailureEntry structures
The errors encountered when trying to update the requested partitions. A list of BatchUpdatePartitionFailureEntry objects.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- GlueEncryptionException:
An encryption operation failed.
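Since Entries accepts at most 100 BatchUpdatePartitionRequestEntry objects per call, a larger update has to be split across several requests. The sketch below does this with array_chunk; the helper name buildBatchUpdatePartitionRequests, the constant name, and the sample partition values are hypothetical.

```php
<?php
// Limit stated in the Entries description above.
const MAX_ENTRIES_PER_REQUEST = 100;

// Hypothetical helper: splits the entries into request-sized chunks and
// returns one parameter array per BatchUpdatePartition call.
function buildBatchUpdatePartitionRequests(
    string $databaseName,
    string $tableName,
    array $entries
): array {
    $requests = [];
    foreach (array_chunk($entries, MAX_ENTRIES_PER_REQUEST) as $chunk) {
        $requests[] = [
            'DatabaseName' => $databaseName,
            'TableName' => $tableName,
            'Entries' => $chunk,
        ];
    }

    return $requests;
}

// Build 250 made-up entries to show the chunking.
$entries = [];
for ($i = 0; $i < 250; $i++) {
    $values = ['2024', (string) $i];
    $entries[] = [
        'PartitionValueList' => $values,          // identifies the partition
        'PartitionInput' => ['Values' => $values], // new partition definition
    ];
}

$requests = buildBatchUpdatePartitionRequests('sales_db', 'orders', $entries);

// With a configured client, each element could be sent in turn:
// foreach ($requests as $params) { $client->batchUpdatePartition($params); }
```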
CancelDataQualityRuleRecommendationRun
$result = $client->cancelDataQualityRuleRecommendationRun([/* ... */]);
$promise = $client->cancelDataQualityRuleRecommendationRunAsync([/* ... */]);
Cancels the specified recommendation run that was being used to generate rules.
Parameter Syntax
$result = $client->cancelDataQualityRuleRecommendationRun([
    'RunId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- RunId
-
- Required: Yes
- Type: string
The unique run identifier associated with this run.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
CancelDataQualityRulesetEvaluationRun
$result = $client->cancelDataQualityRulesetEvaluationRun([/* ... */]);
$promise = $client->cancelDataQualityRulesetEvaluationRunAsync([/* ... */]);
Cancels a run where a ruleset is being evaluated against a data source.
Parameter Syntax
$result = $client->cancelDataQualityRulesetEvaluationRun([
    'RunId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- RunId
-
- Required: Yes
- Type: string
The unique run identifier associated with this run.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
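The two data quality cancel operations take only a RunId and return an empty result, so the outcome of each call is signalled by an exception or its absence. One possible pattern for cancelling several runs is sketched below; the helper name cancelRuns is hypothetical, and the operation is passed in as a callable so that the same code could wrap either cancel method.

```php
<?php
// Hypothetical helper: calls $cancel once per run ID and records the
// message of any exception instead of aborting on the first failure.
function cancelRuns(callable $cancel, array $runIds): array
{
    $failed = [];
    foreach ($runIds as $runId) {
        try {
            $cancel(['RunId' => $runId]);
        } catch (Throwable $e) {
            $failed[$runId] = $e->getMessage();
        }
    }

    return $failed;
}

// Stand-in for a real client call, used here only to exercise the helper.
// With a configured client one might pass, for example:
//   fn(array $p) => $client->cancelDataQualityRulesetEvaluationRun($p)
$fakeCancel = function (array $params): array {
    if ($params['RunId'] === 'dqrun-missing') {
        throw new RuntimeException('A specified entity does not exist.');
    }
    return [];
};

$failed = cancelRuns($fakeCancel, ['dqrun-1', 'dqrun-missing']);
```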
CancelMLTaskRun
$result = $client->cancelMLTaskRun([/* ... */]);
$promise = $client->cancelMLTaskRunAsync([/* ... */]);
Cancels (stops) a task run. Machine learning task runs are asynchronous tasks that Glue runs on your behalf as part of various machine learning workflows. You can cancel a machine learning task run at any time by calling CancelMLTaskRun with the TransformId of the task run's parent transform and the task run's TaskRunId.
Parameter Syntax
$result = $client->cancelMLTaskRun([
    'TaskRunId' => '<string>', // REQUIRED
    'TransformId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- TaskRunId
-
- Required: Yes
- Type: string
A unique identifier for the task run.
- TransformId
-
- Required: Yes
- Type: string
The unique identifier of the machine learning transform.
Result Syntax
[
    'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT',
    'TaskRunId' => '<string>',
    'TransformId' => '<string>',
]
Result Details
Members
- Status
-
- Type: string
The status for this run.
- TaskRunId
-
- Type: string
The unique identifier for the task run.
- TransformId
-
- Type: string
The unique identifier of the machine learning transform.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
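As a rough illustration of the CancelMLTaskRun request and result shapes above, the following sketch builds the parameter array and classifies the returned Status. The helper names and the example IDs are hypothetical and not part of the SDK, and treating STOPPED, SUCCEEDED, FAILED and TIMEOUT as final states is an assumption based only on the status names.

```php
<?php
// Sketch only: helper names and IDs are hypothetical, not part of the SDK.

/** Build the parameter array for cancelMLTaskRun. Both IDs are required. */
function buildCancelMLTaskRunParams(string $transformId, string $taskRunId): array
{
    if ($transformId === '' || $taskRunId === '') {
        throw new InvalidArgumentException('TransformId and TaskRunId are both required.');
    }
    return ['TaskRunId' => $taskRunId, 'TransformId' => $transformId];
}

/** Assumption: a run in one of these states will not change state again. */
function isSettledTaskRunStatus(string $status): bool
{
    return in_array($status, ['STOPPED', 'SUCCEEDED', 'FAILED', 'TIMEOUT'], true);
}

$params = buildCancelMLTaskRunParams('tfm-example', 'tsk-example');
// With a configured client:
//   $result = $client->cancelMLTaskRun($params);
//   $done = isSettledTaskRunStatus($result['Status']);
```

A status of STOPPING right after the call would mean the cancellation has been accepted but the run has not yet settled.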
CancelStatement
$result = $client->cancelStatement([/* ... */]);
$promise = $client->cancelStatementAsync([/* ... */]);
Cancels the statement.
Parameter Syntax
$result = $client->cancelStatement([
    'Id' => <integer>, // REQUIRED
    'RequestOrigin' => '<string>',
    'SessionId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Id
-
- Required: Yes
- Type: int
The ID of the statement to be cancelled.
- RequestOrigin
-
- Type: string
The origin of the request to cancel the statement.
- SessionId
-
- Required: Yes
- Type: string
The Session ID of the statement to be cancelled.
Result Syntax
[]
Result Details
Errors
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- IllegalSessionStateException:
The session is in an invalid state to perform a requested operation.
CheckSchemaVersionValidity
$result = $client->checkSchemaVersionValidity([/* ... */]);
$promise = $client->checkSchemaVersionValidityAsync([/* ... */]);
Validates the supplied schema. This call has no side effects; it simply validates the supplied schema using DataFormat as the format. Because it does not take a schema set name, no compatibility checks are performed.
Parameter Syntax
$result = $client->checkSchemaVersionValidity([
    'DataFormat' => 'AVRO|JSON|PROTOBUF', // REQUIRED
    'SchemaDefinition' => '<string>', // REQUIRED
]);
Parameter Details
Members
- DataFormat
-
- Required: Yes
- Type: string
The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.
- SchemaDefinition
-
- Required: Yes
- Type: string
The definition of the schema that has to be validated.
Result Syntax
[
    'Error' => '<string>',
    'Valid' => true || false,
]
Result Details
Members
- Error
-
- Type: string
A validation failure error message.
- Valid
-
- Type: boolean
Returns true if the schema is valid and false otherwise.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
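Because SchemaDefinition is a plain string, a schema held as a PHP array has to be serialized before it is passed to CheckSchemaVersionValidity. The sketch below shows one way to do that for an Avro schema and to summarize a result shaped like the Result Syntax above. The function names and the sample schema are hypothetical.

```php
<?php
// Sketch only: function names and the sample schema are hypothetical.

/** Build checkSchemaVersionValidity parameters for an Avro schema given as a PHP array. */
function buildAvroValidityParams(array $avroSchema): array
{
    return [
        'DataFormat' => 'AVRO',
        // SchemaDefinition is a string, so the schema is serialized to JSON text.
        'SchemaDefinition' => json_encode($avroSchema, JSON_THROW_ON_ERROR),
    ];
}

/** Summarize a result that exposes 'Valid' and 'Error' via array access. */
function summarizeValidity($result): string
{
    if (!empty($result['Valid'])) {
        return 'valid';
    }
    return 'invalid: ' . ($result['Error'] ?? 'no error message returned');
}

$params = buildAvroValidityParams([
    'type' => 'record',
    'name' => 'Order',
    'fields' => [
        ['name' => 'id', 'type' => 'string'],
    ],
]);
// With a configured client:
//   echo summarizeValidity($client->checkSchemaVersionValidity($params));
```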
CreateBlueprint
$result = $client->createBlueprint([/* ... */]);
$promise = $client->createBlueprintAsync([/* ... */]);
Registers a blueprint with Glue.
Parameter Syntax
$result = $client->createBlueprint([
    'BlueprintLocation' => '<string>', // REQUIRED
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
]);
Parameter Details
Members
- BlueprintLocation
-
- Required: Yes
- Type: string
Specifies a path in Amazon S3 where the blueprint is published.
- Description
-
- Type: string
A description of the blueprint.
- Name
-
- Required: Yes
- Type: string
The name of the blueprint.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags to be applied to this blueprint.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
Returns the name of the blueprint that was registered.
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
CreateClassifier
$result = $client->createClassifier([/* ... */]);
$promise = $client->createClassifierAsync([/* ... */]);
Creates a classifier in the user's account. This can be a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field of the request is present.
Parameter Syntax
$result = $client->createClassifier([
    'CsvClassifier' => [
        'AllowSingleColumn' => true || false,
        'ContainsHeader' => 'UNKNOWN|PRESENT|ABSENT',
        'CustomDatatypeConfigured' => true || false,
        'CustomDatatypes' => ['<string>', ...],
        'Delimiter' => '<string>',
        'DisableValueTrimming' => true || false,
        'Header' => ['<string>', ...],
        'Name' => '<string>', // REQUIRED
        'QuoteSymbol' => '<string>',
        'Serde' => 'OpenCSVSerDe|LazySimpleSerDe|None',
    ],
    'GrokClassifier' => [
        'Classification' => '<string>', // REQUIRED
        'CustomPatterns' => '<string>',
        'GrokPattern' => '<string>', // REQUIRED
        'Name' => '<string>', // REQUIRED
    ],
    'JsonClassifier' => [
        'JsonPath' => '<string>', // REQUIRED
        'Name' => '<string>', // REQUIRED
    ],
    'XMLClassifier' => [
        'Classification' => '<string>', // REQUIRED
        'Name' => '<string>', // REQUIRED
        'RowTag' => '<string>',
    ],
]);
Parameter Details
Members
- CsvClassifier
-
- Type: CreateCsvClassifierRequest structure
A CsvClassifier object specifying the classifier to create.
- GrokClassifier
-
- Type: CreateGrokClassifierRequest structure
A GrokClassifier object specifying the classifier to create.
- JsonClassifier
-
- Type: CreateJsonClassifierRequest structure
A JsonClassifier object specifying the classifier to create.
- XMLClassifier
-
- Type: CreateXMLClassifierRequest structure
An XMLClassifier object specifying the classifier to create.
Result Syntax
[]
Result Details
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
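Since the kind of classifier created by CreateClassifier depends on which field of the request is present, it can help to build the request so that exactly one classifier field is set. The sketch below does this and checks the members marked REQUIRED in the Parameter Syntax above. buildCreateClassifierParams() is a hypothetical helper, not an SDK function, and the example values are placeholders.

```php
<?php
// Sketch only: buildCreateClassifierParams() is a hypothetical helper.

function buildCreateClassifierParams(string $kind, array $definition): array
{
    // Required members per classifier type, taken from the Parameter Syntax above.
    $required = [
        'CsvClassifier'  => ['Name'],
        'GrokClassifier' => ['Classification', 'GrokPattern', 'Name'],
        'JsonClassifier' => ['JsonPath', 'Name'],
        'XMLClassifier'  => ['Classification', 'Name'],
    ];
    if (!isset($required[$kind])) {
        throw new InvalidArgumentException("Unknown classifier type: $kind");
    }
    foreach ($required[$kind] as $member) {
        if (!isset($definition[$member]) || $definition[$member] === '') {
            throw new InvalidArgumentException("$kind requires '$member'.");
        }
    }
    // Only one classifier field is set, so the request describes a single classifier.
    return [$kind => $definition];
}

$params = buildCreateClassifierParams('JsonClassifier', [
    'Name' => 'example-json-classifier',
    'JsonPath' => '$[*]',
]);
// With a configured client: $client->createClassifier($params);
```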
CreateConnection
$result = $client->createConnection([/* ... */]);
$promise = $client->createConnectionAsync([/* ... */]);
Creates a connection definition in the Data Catalog.
Connections used for creating federated resources require the IAM glue:PassConnection permission.
Parameter Syntax
$result = $client->createConnection([
    'CatalogId' => '<string>',
    'ConnectionInput' => [ // REQUIRED
        'AthenaProperties' => ['<string>', ...],
        'AuthenticationConfiguration' => [
            'AuthenticationType' => 'BASIC|OAUTH2|CUSTOM',
            'OAuth2Properties' => [
                'AuthorizationCodeProperties' => [
                    'AuthorizationCode' => '<string>',
                    'RedirectUri' => '<string>',
                ],
                'OAuth2ClientApplication' => [
                    'AWSManagedClientApplicationReference' => '<string>',
                    'UserManagedClientApplicationClientId' => '<string>',
                ],
                'OAuth2GrantType' => 'AUTHORIZATION_CODE|CLIENT_CREDENTIALS|JWT_BEARER',
                'TokenUrl' => '<string>',
                'TokenUrlParametersMap' => ['<string>', ...],
            ],
            'SecretArn' => '<string>',
        ],
        'ConnectionProperties' => ['<string>', ...], // REQUIRED
        'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM|SALESFORCE|VIEW_VALIDATION_REDSHIFT|VIEW_VALIDATION_ATHENA', // REQUIRED
        'Description' => '<string>',
        'MatchCriteria' => ['<string>', ...],
        'Name' => '<string>', // REQUIRED
        'PhysicalConnectionRequirements' => [
            'AvailabilityZone' => '<string>',
            'SecurityGroupIdList' => ['<string>', ...],
            'SubnetId' => '<string>',
        ],
        'ValidateCredentials' => true || false,
    ],
    'Tags' => ['<string>', ...],
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which to create the connection. If none is provided, the Amazon Web Services account ID is used by default.
- ConnectionInput
-
- Required: Yes
- Type: ConnectionInput structure
A ConnectionInput object defining the connection to create.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags you assign to the connection.
Result Syntax
[ 'CreateConnectionStatus' => 'READY|IN_PROGRESS|FAILED', ]
Result Details
Members
- CreateConnectionStatus
-
- Type: string
The status of the connection creation request. The request can take some time for certain authentication types, for example when creating an OAuth connection with token exchange over VPC.
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- GlueEncryptionException:
An encryption operation failed.
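As a hedged sketch of a CreateConnection request, the helper below assembles the three ConnectionInput members marked REQUIRED above and checks the CreateConnectionStatus of a result. The helper names are hypothetical. The property keys JDBC_CONNECTION_URL, USERNAME and PASSWORD, and all example values, are assumptions for illustration; consult the ConnectionInput structure for the keys your connection type accepts.

```php
<?php
// Sketch only: helper names, property keys and values are illustrative assumptions.

function buildConnectionParams(string $name, string $type, array $properties, array $tags = []): array
{
    if ($name === '' || $type === '' || $properties === []) {
        throw new InvalidArgumentException('Name, ConnectionType and ConnectionProperties are required.');
    }
    $params = [
        'ConnectionInput' => [
            'Name' => $name,
            'ConnectionType' => $type,
            'ConnectionProperties' => $properties,
        ],
    ];
    if ($tags !== []) {
        $params['Tags'] = $tags; // associative array: tag key => tag value
    }
    return $params;
}

/** True only when the result reports the connection as READY. */
function connectionIsReady($result): bool
{
    return ($result['CreateConnectionStatus'] ?? null) === 'READY';
}

$params = buildConnectionParams('example-jdbc', 'JDBC', [
    'JDBC_CONNECTION_URL' => 'jdbc:postgresql://db.example.internal:5432/sales',
    'USERNAME' => 'example_user',
    'PASSWORD' => 'example_password',
], ['team' => 'analytics']);
// With a configured client:
//   $ready = connectionIsReady($client->createConnection($params));
```

An IN_PROGRESS status is not a failure; as noted above, some authentication types take time to complete.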
CreateCrawler
$result = $client->createCrawler([/* ... */]);
$promise = $client->createCrawlerAsync([/* ... */]);
Creates a new crawler with specified targets, role, configuration, and optional schedule. At least one crawl target must be specified, in the S3Targets field, the JdbcTargets field, or the DynamoDBTargets field.
Parameter Syntax
$result = $client->createCrawler([
    'Classifiers' => ['<string>', ...],
    'Configuration' => '<string>',
    'CrawlerSecurityConfiguration' => '<string>',
    'DatabaseName' => '<string>',
    'Description' => '<string>',
    'LakeFormationConfiguration' => [
        'AccountId' => '<string>',
        'UseLakeFormationCredentials' => true || false,
    ],
    'LineageConfiguration' => [
        'CrawlerLineageSettings' => 'ENABLE|DISABLE',
    ],
    'Name' => '<string>', // REQUIRED
    'RecrawlPolicy' => [
        'RecrawlBehavior' => 'CRAWL_EVERYTHING|CRAWL_NEW_FOLDERS_ONLY|CRAWL_EVENT_MODE',
    ],
    'Role' => '<string>', // REQUIRED
    'Schedule' => '<string>',
    'SchemaChangePolicy' => [
        'DeleteBehavior' => 'LOG|DELETE_FROM_DATABASE|DEPRECATE_IN_DATABASE',
        'UpdateBehavior' => 'LOG|UPDATE_IN_DATABASE',
    ],
    'TablePrefix' => '<string>',
    'Tags' => ['<string>', ...],
    'Targets' => [ // REQUIRED
        'CatalogTargets' => [
            [
                'ConnectionName' => '<string>',
                'DatabaseName' => '<string>', // REQUIRED
                'DlqEventQueueArn' => '<string>',
                'EventQueueArn' => '<string>',
                'Tables' => ['<string>', ...], // REQUIRED
            ],
            // ...
        ],
        'DeltaTargets' => [
            [
                'ConnectionName' => '<string>',
                'CreateNativeDeltaTable' => true || false,
                'DeltaTables' => ['<string>', ...],
                'WriteManifest' => true || false,
            ],
            // ...
        ],
        'DynamoDBTargets' => [
            [
                'Path' => '<string>',
                'scanAll' => true || false,
                'scanRate' => <float>,
            ],
            // ...
        ],
        'HudiTargets' => [
            [
                'ConnectionName' => '<string>',
                'Exclusions' => ['<string>', ...],
                'MaximumTraversalDepth' => <integer>,
                'Paths' => ['<string>', ...],
            ],
            // ...
        ],
        'IcebergTargets' => [
            [
                'ConnectionName' => '<string>',
                'Exclusions' => ['<string>', ...],
                'MaximumTraversalDepth' => <integer>,
                'Paths' => ['<string>', ...],
            ],
            // ...
        ],
        'JdbcTargets' => [
            [
                'ConnectionName' => '<string>',
                'EnableAdditionalMetadata' => ['<string>', ...],
                'Exclusions' => ['<string>', ...],
                'Path' => '<string>',
            ],
            // ...
        ],
        'MongoDBTargets' => [
            [
                'ConnectionName' => '<string>',
                'Path' => '<string>',
                'ScanAll' => true || false,
            ],
            // ...
        ],
        'S3Targets' => [
            [
                'ConnectionName' => '<string>',
                'DlqEventQueueArn' => '<string>',
                'EventQueueArn' => '<string>',
                'Exclusions' => ['<string>', ...],
                'Path' => '<string>',
                'SampleSize' => <integer>,
            ],
            // ...
        ],
    ],
]);
Parameter Details
Members
- Classifiers
-
- Type: Array of strings
A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.
- Configuration
-
- Type: string
Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see Setting crawler configuration options.
- CrawlerSecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used by this crawler.
- DatabaseName
-
- Type: string
The Glue database where results are written, such as: arn:aws:daylight:us-east-1::database/sometable/*.
- Description
-
- Type: string
A description of the new crawler.
- LakeFormationConfiguration
-
- Type: LakeFormationConfiguration structure
Specifies Lake Formation configuration settings for the crawler.
- LineageConfiguration
-
- Type: LineageConfiguration structure
Specifies data lineage configuration settings for the crawler.
- Name
-
- Required: Yes
- Type: string
Name of the new crawler.
- RecrawlPolicy
-
- Type: RecrawlPolicy structure
A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.
- Role
-
- Required: Yes
- Type: string
The IAM role or Amazon Resource Name (ARN) of an IAM role used by the new crawler to access customer resources.
- Schedule
-
- Type: string
A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
- SchemaChangePolicy
-
- Type: SchemaChangePolicy structure
The policy for the crawler's update and deletion behavior.
- TablePrefix
-
- Type: string
The table prefix used for catalog tables that are created.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags to use with this crawler request. You may use tags to limit access to the crawler. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.
- Targets
-
- Required: Yes
- Type: CrawlerTargets structure
A collection of targets to crawl.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- AlreadyExistsException:
A resource to be created or added already exists.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
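To make the CreateCrawler request shape concrete, the sketch below builds parameters for a crawler with S3 targets and an optional daily schedule in the cron format described under Schedule. Both helper functions are hypothetical, and the bucket path, role ARN and names are placeholders.

```php
<?php
// Sketch only: helper names and all example values are hypothetical.

/** Format a daily schedule, e.g. 12:15 UTC becomes cron(15 12 * * ? *). */
function dailyCronSchedule(int $hourUtc, int $minute): string
{
    if ($hourUtc < 0 || $hourUtc > 23 || $minute < 0 || $minute > 59) {
        throw new InvalidArgumentException('Hour must be 0-23 and minute 0-59.');
    }
    return sprintf('cron(%d %d * * ? *)', $minute, $hourUtc);
}

/** Build createCrawler parameters for one or more S3 paths. */
function buildS3CrawlerParams(string $name, string $role, string $databaseName, array $s3Paths, ?string $schedule = null): array
{
    if ($s3Paths === []) {
        throw new InvalidArgumentException('At least one crawl target must be specified.');
    }
    $params = [
        'Name' => $name,
        'Role' => $role,
        'DatabaseName' => $databaseName,
        'Targets' => [
            // Each S3 target is a structure; only Path is set here.
            'S3Targets' => array_map(fn (string $path): array => ['Path' => $path], array_values($s3Paths)),
        ],
    ];
    if ($schedule !== null) {
        $params['Schedule'] = $schedule;
    }
    return $params;
}

$params = buildS3CrawlerParams(
    'example-crawler',
    'arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole',
    'example_db',
    ['s3://example-bucket/data/'],
    dailyCronSchedule(12, 15)
);
// With a configured client: $client->createCrawler($params);
```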
CreateCustomEntityType
$result = $client->createCustomEntityType([/* ... */]);
$promise = $client->createCustomEntityTypeAsync([/* ... */]);
Creates a custom pattern that is used to detect sensitive data across the columns and rows of your structured data.
Each custom pattern you create specifies a regular expression and an optional list of context words. If no context words are passed, only the regular expression is checked.
Parameter Syntax
$result = $client->createCustomEntityType([
    'ContextWords' => ['<string>', ...],
    'Name' => '<string>', // REQUIRED
    'RegexString' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
]);
Parameter Details
Members
- ContextWords
-
- Type: Array of strings
A list of context words. If none of these context words are found within the vicinity of the regular expression, the data will not be detected as sensitive data.
If no context words are passed, only the regular expression is checked.
- Name
-
- Required: Yes
- Type: string
A name for the custom pattern that allows it to be retrieved or deleted later. This name must be unique per Amazon Web Services account.
- RegexString
-
- Required: Yes
- Type: string
A regular expression string that is used for detecting sensitive data in a custom pattern.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
A list of tags applied to the custom entity type.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the custom pattern you created.
Errors
- AccessDeniedException:
Access to a resource was denied.
- AlreadyExistsException:
A resource to be created or added already exists.
- IdempotentParameterMismatchException:
The same unique identifier was associated with two different records.
- InternalServiceException:
An internal service error occurred.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
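A CreateCustomEntityType request can be assembled as in the sketch below, which omits ContextWords when none are given and runs a local compile check on the regular expression first. The helper is hypothetical, and the local check uses PHP's PCRE engine, which is an assumption: the service may accept a different regex dialect, so passing this check does not guarantee the pattern is accepted.

```php
<?php
// Sketch only: buildCustomEntityTypeParams() is hypothetical, and the PCRE
// check is a local approximation of whatever validation the service applies.

function buildCustomEntityTypeParams(string $name, string $regex, array $contextWords = []): array
{
    // Wrap the expression in "~" delimiters, escaping any "~" it contains.
    $pattern = '~' . str_replace('~', '\~', $regex) . '~';
    if (@preg_match($pattern, '') === false) {
        throw new InvalidArgumentException('RegexString does not compile: ' . $regex);
    }
    $params = ['Name' => $name, 'RegexString' => $regex];
    if ($contextWords !== []) {
        // When omitted, only the regular expression is checked.
        $params['ContextWords'] = array_values($contextWords);
    }
    return $params;
}

$params = buildCustomEntityTypeParams(
    'example-employee-id',
    'EMP-[0-9]{6}',
    ['employee', 'badge']
);
// With a configured client: $client->createCustomEntityType($params);
```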
CreateDataQualityRuleset
$result = $client->createDataQualityRuleset([/* ... */]);
$promise = $client->createDataQualityRulesetAsync([/* ... */]);
Creates a data quality ruleset with DQDL rules applied to a specified Glue table.
You create the ruleset using the Data Quality Definition Language (DQDL). For more information, see the Glue developer guide.
Parameter Syntax
$result = $client->createDataQualityRuleset([
    'ClientToken' => '<string>',
    'DataQualitySecurityConfiguration' => '<string>',
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Ruleset' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
    'TargetTable' => [
        'CatalogId' => '<string>',
        'DatabaseName' => '<string>', // REQUIRED
        'TableName' => '<string>', // REQUIRED
    ],
]);
Parameter Details
Members
- ClientToken
-
- Type: string
Used for idempotency. We recommend setting it to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.
- DataQualitySecurityConfiguration
-
- Type: string
The name of the security configuration created with the data quality encryption option.
- Description
-
- Type: string
A description of the data quality ruleset.
- Name
-
- Required: Yes
- Type: string
A unique name for the data quality ruleset.
- Ruleset
-
- Required: Yes
- Type: string
A Data Quality Definition Language (DQDL) ruleset. For more information, see the Glue developer guide.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
A list of tags applied to the data quality ruleset.
- TargetTable
-
- Type: DataQualityTargetTable structure
A target table associated with the data quality ruleset.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
A unique name for the data quality ruleset.
Errors
- InvalidInputException:
The input provided was not valid.
- AlreadyExistsException:
A resource to be created or added already exists.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
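The ClientToken recommendation for CreateDataQualityRuleset can be sketched as follows: generate a random version 4 UUID once and reuse the same parameter array on retries. The helper names, table names and the one-rule DQDL string are illustrative assumptions; see the Glue developer guide for the DQDL rules you actually need.

```php
<?php
// Sketch only: helper names and example values are hypothetical.

/** Generate a random (version 4) UUID to use as an idempotency token. */
function randomUuidV4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set RFC 4122 variant
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

function buildRulesetParams(string $name, string $dqdl, string $databaseName, string $tableName): array
{
    return [
        'Name' => $name,
        'Ruleset' => $dqdl,
        'ClientToken' => randomUuidV4(),
        'TargetTable' => [
            'DatabaseName' => $databaseName,
            'TableName' => $tableName,
        ],
    ];
}

// Assumed example of a DQDL ruleset with a single completeness rule.
$params = buildRulesetParams(
    'example-orders-ruleset',
    'Rules = [ IsComplete "order_id" ]',
    'example_db',
    'orders'
);
// With a configured client, send the same $params again when retrying so the
// ClientToken stays identical: $client->createDataQualityRuleset($params);
```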
CreateDatabase
$result = $client->createDatabase([/* ... */]);
$promise = $client->createDatabaseAsync([/* ... */]);
Creates a new database in a Data Catalog.
Parameter Syntax
$result = $client->createDatabase([
    'CatalogId' => '<string>',
    'DatabaseInput' => [ // REQUIRED
        'CreateTableDefaultPermissions' => [
            [
                'Permissions' => ['<string>', ...],
                'Principal' => [
                    'DataLakePrincipalIdentifier' => '<string>',
                ],
            ],
            // ...
        ],
        'Description' => '<string>',
        'FederatedDatabase' => [
            'ConnectionName' => '<string>',
            'Identifier' => '<string>',
        ],
        'LocationUri' => '<string>',
        'Name' => '<string>', // REQUIRED
        'Parameters' => ['<string>', ...],
        'TargetDatabase' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'Region' => '<string>',
        ],
    ],
    'Tags' => ['<string>', ...],
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which to create the database. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseInput
-
- Required: Yes
- Type: DatabaseInput structure
The metadata for the database.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags you assign to the database.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- AlreadyExistsException:
A resource to be created or added already exists.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- FederatedResourceAlreadyExistsException:
A federated resource already exists.
CreateDevEndpoint
$result = $client->createDevEndpoint([/* ... */]);
$promise = $client->createDevEndpointAsync([/* ... */]);
Creates a new development endpoint.
Parameter Syntax
$result = $client->createDevEndpoint([
    'Arguments' => ['<string>', ...],
    'EndpointName' => '<string>', // REQUIRED
    'ExtraJarsS3Path' => '<string>',
    'ExtraPythonLibsS3Path' => '<string>',
    'GlueVersion' => '<string>',
    'NumberOfNodes' => <integer>,
    'NumberOfWorkers' => <integer>,
    'PublicKey' => '<string>',
    'PublicKeys' => ['<string>', ...],
    'RoleArn' => '<string>', // REQUIRED
    'SecurityConfiguration' => '<string>',
    'SecurityGroupIds' => ['<string>', ...],
    'SubnetId' => '<string>',
    'Tags' => ['<string>', ...],
    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
]);
Parameter Details
Members
- Arguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
A map of arguments used to configure the DevEndpoint.
- EndpointName
-
- Required: Yes
- Type: string
The name to be assigned to the new DevEndpoint.
- ExtraJarsS3Path
-
- Type: string
The path to one or more Java .jar files in an S3 bucket that should be loaded in your DevEndpoint.
- ExtraPythonLibsS3Path
-
- Type: string
The paths to one or more Python libraries in an Amazon S3 bucket that should be loaded in your DevEndpoint. Multiple values must be complete paths separated by a comma.
You can only use pure Python libraries with a DevEndpoint. Libraries that rely on C extensions, such as the pandas Python data analysis library, are not yet supported.
- GlueVersion
-
- Type: string
Glue version determines the versions of Apache Spark and Python that Glue supports. The Python version indicates the version supported for running your ETL scripts on development endpoints.
For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.
Development endpoints that are created without specifying a Glue version default to Glue 0.9.
You can specify a version of Python support for development endpoints by using the Arguments parameter in the CreateDevEndpoint or UpdateDevEndpoint APIs. If no arguments are provided, the version defaults to Python 2.
- NumberOfNodes
-
- Type: int
The number of Glue Data Processing Units (DPUs) to allocate to this DevEndpoint.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated to the development endpoint.
The maximum number of workers you can define is 299 for G.1X, and 149 for G.2X.
- PublicKey
-
- Type: string
The public key to be used by this DevEndpoint for authentication. This attribute is provided for backward compatibility because the recommended attribute to use is public keys.
- PublicKeys
-
- Type: Array of strings
A list of public keys to be used by the development endpoints for authentication. The use of this attribute is preferred over a single public key because the public keys allow you to have a different private key per client.
If you previously created an endpoint with a public key, you must remove that key to be able to set a list of public keys. Call the UpdateDevEndpoint API with the public key content in the deletePublicKeys attribute, and the list of new keys in the addPublicKeys attribute.
- RoleArn
-
- Required: Yes
- Type: string
The IAM role for the DevEndpoint.
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used with this DevEndpoint.
- SecurityGroupIds
-
- Type: Array of strings
Security group IDs for the security groups to be used by the new DevEndpoint.
- SubnetId
-
- Type: string
The subnet ID for the new DevEndpoint to use.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags to use with this DevEndpoint. You may use tags to limit access to the DevEndpoint. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated to the development endpoint. Accepts a value of Standard, G.1X, or G.2X.
- For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory and a 50GB disk, and 2 executors per worker.
- For the G.1X worker type, each worker maps to 1 DPU (4 vCPU, 16 GB of memory, 64 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.
- For the G.2X worker type, each worker maps to 2 DPU (8 vCPU, 32 GB of memory, 128 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.
Known issue: when a development endpoint is created with the G.2X WorkerType configuration, the Spark drivers for the development endpoint will run on 4 vCPU, 16 GB of memory, and a 64 GB disk.
Result Syntax
[
    'Arguments' => ['<string>', ...],
    'AvailabilityZone' => '<string>',
    'CreatedTimestamp' => <DateTime>,
    'EndpointName' => '<string>',
    'ExtraJarsS3Path' => '<string>',
    'ExtraPythonLibsS3Path' => '<string>',
    'FailureReason' => '<string>',
    'GlueVersion' => '<string>',
    'NumberOfNodes' => <integer>,
    'NumberOfWorkers' => <integer>,
    'RoleArn' => '<string>',
    'SecurityConfiguration' => '<string>',
    'SecurityGroupIds' => ['<string>', ...],
    'Status' => '<string>',
    'SubnetId' => '<string>',
    'VpcId' => '<string>',
    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
    'YarnEndpointAddress' => '<string>',
    'ZeppelinRemoteSparkInterpreterPort' => <integer>,
]
Result Details
Members
- Arguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
The map of arguments used to configure this DevEndpoint.
Valid arguments are:
- "--enable-glue-datacatalog": ""
You can specify a version of Python support for development endpoints by using the Arguments parameter in the CreateDevEndpoint or UpdateDevEndpoint APIs. If no arguments are provided, the version defaults to Python 2.
- AvailabilityZone
-
- Type: string
The Amazon Web Services Availability Zone where this DevEndpoint is located.
- CreatedTimestamp
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The point in time at which this DevEndpoint was created.
- EndpointName
-
- Type: string
The name assigned to the new DevEndpoint.
- ExtraJarsS3Path
-
- Type: string
Path to one or more Java .jar files in an S3 bucket that will be loaded in your DevEndpoint.
- ExtraPythonLibsS3Path
-
- Type: string
The paths to one or more Python libraries in an S3 bucket that will be loaded in your DevEndpoint.
- FailureReason
-
- Type: string
The reason for a current failure in this DevEndpoint.
- GlueVersion
-
- Type: string
Glue version determines the versions of Apache Spark and Python that Glue supports. The Python version indicates the version supported for running your ETL scripts on development endpoints.
For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.
- NumberOfNodes
-
- Type: int
The number of Glue Data Processing Units (DPUs) allocated to this DevEndpoint.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated to the development endpoint.
- RoleArn
-
- Type: string
The Amazon Resource Name (ARN) of the role assigned to the new DevEndpoint.
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure being used with this DevEndpoint.
- SecurityGroupIds
-
- Type: Array of strings
The security groups assigned to the new DevEndpoint.
- Status
-
- Type: string
The current status of the new DevEndpoint.
- SubnetId
-
- Type: string
The subnet ID assigned to the new DevEndpoint.
- VpcId
-
- Type: string
The ID of the virtual private cloud (VPC) used by this DevEndpoint.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated to the development endpoint. May be a value of Standard, G.1X, or G.2X.
- YarnEndpointAddress
-
- Type: string
The address of the YARN endpoint used by this DevEndpoint.
- ZeppelinRemoteSparkInterpreterPort
-
- Type: int
The Apache Zeppelin port for the remote Apache Spark interpreter.
Errors
- AccessDeniedException:
Access to a resource was denied.
- AlreadyExistsException:
A resource to be created or added already exists.
- IdempotentParameterMismatchException:
The same unique identifier was associated with two different records.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- ValidationException:
A value could not be validated.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
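The worker limits stated for CreateDevEndpoint (299 for G.1X and 149 for G.2X) can be checked before the request is sent. The sketch below is one way to do that while preferring PublicKeys over the single PublicKey attribute. The constant and helper names are hypothetical, the example values are placeholders, and worker types without a documented limit are passed through unchecked.

```php
<?php
// Sketch only: names and example values are hypothetical. The limits are the
// ones quoted in the NumberOfWorkers description above.

const DEV_ENDPOINT_MAX_WORKERS = ['G.1X' => 299, 'G.2X' => 149];

function buildDevEndpointParams(string $endpointName, string $roleArn, string $workerType, int $numberOfWorkers, array $publicKeys = []): array
{
    $limit = DEV_ENDPOINT_MAX_WORKERS[$workerType] ?? null;
    if ($limit !== null && $numberOfWorkers > $limit) {
        throw new InvalidArgumentException("At most $limit workers are allowed for $workerType.");
    }
    $params = [
        'EndpointName' => $endpointName,
        'RoleArn' => $roleArn,
        'WorkerType' => $workerType,
        'NumberOfWorkers' => $numberOfWorkers,
    ];
    if ($publicKeys !== []) {
        // PublicKeys is preferred over PublicKey: one private key per client.
        $params['PublicKeys'] = array_values($publicKeys);
    }
    return $params;
}

$params = buildDevEndpointParams(
    'example-endpoint',
    'arn:aws:iam::123456789012:role/ExampleGlueDevEndpointRole',
    'G.1X',
    5,
    ['ssh-rsa AAAAexamplekey user@example']
);
// With a configured client: $result = $client->createDevEndpoint($params);
```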
CreateJob
$result = $client->createJob([/* ... */]);
$promise = $client->createJobAsync([/* ... */]);
Creates a new job definition.
Parameter Syntax
$result = $client->createJob([ 'AllocatedCapacity' => <integer>, 'CodeGenConfigurationNodes' => [ '<NodeId>' => [ 'Aggregate' => [ 'Aggs' => [ // REQUIRED [ 'AggFunc' => 'avg|countDistinct|count|first|last|kurtosis|max|min|skewness|stddev_samp|stddev_pop|sum|sumDistinct|var_samp|var_pop', // REQUIRED 'Column' => ['<string>', ...], // REQUIRED ], // ... ], 'Groups' => [ // REQUIRED ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED ], 'AmazonRedshiftSource' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', ], 'AmazonRedshiftTarget' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'ApplyMapping' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Mapping' => [ // REQUIRED [ 'Children' => [...], // RECURSIVE 'Dropped' => true || false, 'FromPath' => ['<string>', ...], 'FromType' => '<string>', 'ToKey' => '<string>', 'ToType' => '<string>', ], // ... 
], 'Name' => '<string>', // REQUIRED ], 'AthenaConnectorSource' => [ 'ConnectionName' => '<string>', // REQUIRED 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'SchemaName' => '<string>', // REQUIRED ], 'CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', // REQUIRED ], 'CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', // REQUIRED ], 'CatalogKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', // REQUIRED 'DetectSchema' => true || false, 'Name' => '<string>', // REQUIRED 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <integer || string || DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'Table' => '<string>', // REQUIRED 'WindowSize' => <integer>, ], 
'CatalogKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', // REQUIRED 'DetectSchema' => true || false, 'Name' => '<string>', // REQUIRED 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' => true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <integer || string || DateTime>, 'StreamArn' => '<string>', 'StreamName' => '<string>', ], 'Table' => '<string>', // REQUIRED 'WindowSize' => <integer>, ], 'CatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'CatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Table' => '<string>', // REQUIRED ], 'ConnectorDataSource' => [ 'ConnectionType' => '<string>', // REQUIRED 'Data' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... 
], ], 'ConnectorDataTarget' => [ 'ConnectionType' => '<string>', // REQUIRED 'Data' => ['<string>', ...], // REQUIRED 'Inputs' => ['<string>', ...], 'Name' => '<string>', // REQUIRED ], 'CustomCode' => [ 'ClassName' => '<string>', // REQUIRED 'Code' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'DirectJDBCSource' => [ 'ConnectionName' => '<string>', // REQUIRED 'ConnectionType' => 'sqlserver|mysql|oracle|postgresql|redshift', // REQUIRED 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', // REQUIRED ], 'DirectKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', // REQUIRED 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <integer || string || DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'WindowSize' => <integer>, ], 'DirectKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', // REQUIRED 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' => true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 
'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <integer || string || DateTime>, 'StreamArn' => '<string>', 'StreamName' => '<string>', ], 'WindowSize' => <integer>, ], 'DropDuplicates' => [ 'Columns' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED ], 'DropFields' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Paths' => [ // REQUIRED ['<string>', ...], // ... ], ], 'DropNullFields' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'NullCheckBoxList' => [ 'IsEmpty' => true || false, 'IsNegOne' => true || false, 'IsNullString' => true || false, ], 'NullTextList' => [ [ 'Datatype' => [ // REQUIRED 'Id' => '<string>', // REQUIRED 'Label' => '<string>', // REQUIRED ], 'Value' => '<string>', // REQUIRED ], // ... ], ], 'DynamicTransform' => [ 'FunctionName' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Parameters' => [ [ 'IsOptional' => true || false, 'ListType' => 'str|int|float|complex|bool|list|null', 'Name' => '<string>', // REQUIRED 'Type' => 'str|int|float|complex|bool|list|null', // REQUIRED 'ValidationMessage' => '<string>', 'ValidationRule' => '<string>', 'Value' => ['<string>', ...], ], // ... 
], 'Path' => '<string>', // REQUIRED 'TransformName' => '<string>', // REQUIRED 'Version' => '<string>', ], 'DynamoDBCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'EvaluateDataQuality' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Output' => 'PrimaryInput|EvaluationResults', 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', // REQUIRED 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'EvaluateDataQualityMultiFrame' => [ 'AdditionalDataSources' => ['<string>', ...], 'AdditionalOptions' => ['<string>', ...], 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', // REQUIRED 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'FillMissingValues' => [ 'FilledPath' => '<string>', 'ImputedPath' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED ], 'Filter' => [ 'Filters' => [ // REQUIRED [ 'Negated' => true || false, 'Operation' => 'EQ|LT|GT|LTE|GTE|REGEX|ISNULL', // REQUIRED 'Values' => [ // REQUIRED [ 'Type' => 'COLUMNEXTRACTED|CONSTANT', // REQUIRED 'Value' => ['<string>', ...], // REQUIRED ], // ... ], ], // ... 
], 'Inputs' => ['<string>', ...], // REQUIRED 'LogicalOperator' => 'AND|OR', // REQUIRED 'Name' => '<string>', // REQUIRED ], 'GovernedCatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionPredicate' => '<string>', 'Table' => '<string>', // REQUIRED ], 'GovernedCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', // REQUIRED ], 'JDBCConnectorSource' => [ 'AdditionalOptions' => [ 'DataTypeMapping' => ['<string>', ...], 'FilterPredicate' => '<string>', 'JobBookmarkKeys' => ['<string>', ...], 'JobBookmarkKeysSortOrder' => '<string>', 'LowerBound' => <integer>, 'NumPartitions' => <integer>, 'PartitionColumn' => '<string>', 'UpperBound' => <integer>, ], 'ConnectionName' => '<string>', // REQUIRED 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Query' => '<string>', ], 'JDBCConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', // REQUIRED 'ConnectionTable' => '<string>', // REQUIRED 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'Join' => [ 'Columns' => [ // REQUIRED [ 'From' => '<string>', // REQUIRED 'Keys' => [ // REQUIRED ['<string>', ...], // ... ], ], // ... 
], 'Inputs' => ['<string>', ...], // REQUIRED 'JoinType' => 'equijoin|left|right|outer|leftsemi|leftanti', // REQUIRED 'Name' => '<string>', // REQUIRED ], 'Merge' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PrimaryKeys' => [ // REQUIRED ['<string>', ...], // ... ], 'Source' => '<string>', // REQUIRED ], 'MicrosoftSQLServerCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'MicrosoftSQLServerCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'MySQLCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'MySQLCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'OracleSQLCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'OracleSQLCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'PIIDetection' => [ 'EntityTypesToDetect' => ['<string>', ...], // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'MaskValue' => '<string>', 'Name' => '<string>', // REQUIRED 'OutputColumnName' => '<string>', 'PiiType' => 'RowAudit|RowMasking|ColumnAudit|ColumnMasking', // REQUIRED 'SampleFraction' => <float>, 'ThresholdFraction' => <float>, ], 'PostgreSQLCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'PostgreSQLCatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED 
], 'Recipe' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'RecipeReference' => [ 'RecipeArn' => '<string>', // REQUIRED 'RecipeVersion' => '<string>', // REQUIRED ], 'RecipeSteps' => [ [ 'Action' => [ // REQUIRED 'Operation' => '<string>', // REQUIRED 'Parameters' => ['<string>', ...], ], 'ConditionExpressions' => [ [ 'Condition' => '<string>', // REQUIRED 'TargetColumn' => '<string>', // REQUIRED 'Value' => '<string>', ], // ... ], ], // ... ], ], 'RedshiftSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', // REQUIRED 'TmpDirIAMRole' => '<string>', ], 'RedshiftTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', // REQUIRED 'TmpDirIAMRole' => '<string>', 'UpsertRedshiftOptions' => [ 'ConnectionName' => '<string>', 'TableLocation' => '<string>', 'UpsertKeys' => ['<string>', ...], ], ], 'RelationalCatalogSource' => [ 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'Table' => '<string>', // REQUIRED ], 'RenameField' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'SourcePath' => ['<string>', ...], // REQUIRED 'TargetPath' => ['<string>', ...], // REQUIRED ], 'S3CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', // REQUIRED ], 'S3CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... 
], 'Table' => '<string>', // REQUIRED ], 'S3CatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionPredicate' => '<string>', 'Table' => '<string>', // REQUIRED ], 'S3CatalogTarget' => [ 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', // REQUIRED ], 'S3CsvSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Escaper' => '<string>', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', // REQUIRED 'OptimizePerformance' => true || false, 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], // REQUIRED 'QuoteChar' => 'quote|quillemet|single_quote|disabled', // REQUIRED 'Recurse' => true || false, 'Separator' => 'comma|ctrla|pipe|semicolon|tab', // REQUIRED 'SkipFirst' => true || false, 'WithHeader' => true || false, 'WriteHeader' => true || false, ], 'S3DeltaCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', // REQUIRED ], 'S3DeltaDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Compression' => 'uncompressed|snappy', // REQUIRED 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', // REQUIRED 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3DeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], // REQUIRED ], 'S3DirectTarget' => [ 'Compression' => '<string>', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', // REQUIRED 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3GlueParquetTarget' => [ 'Compression' => 'snappy|lzo|gzip|uncompressed|none', 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'Path' => '<string>', // REQUIRED 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], // REQUIRED 'Database' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', // REQUIRED ], 'S3HudiDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], // REQUIRED 'Compression' => 'gzip|lzo|uncompressed|snappy', // REQUIRED 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', // REQUIRED 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... 
], 'Paths' => ['<string>', ...], // REQUIRED ], 'S3JsonSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'JsonPath' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], // REQUIRED 'Recurse' => true || false, ], 'S3ParquetSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'snappy|lzo|gzip|uncompressed|none', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], // REQUIRED 'Recurse' => true || false, ], 'SelectFields' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Paths' => [ // REQUIRED ['<string>', ...], // ... 
], ], 'SelectFromCollection' => [ 'Index' => <integer>, // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED ], 'SnowflakeSource' => [ 'Data' => [ // REQUIRED 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SnowflakeTarget' => [ 'Data' => [ // REQUIRED 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', // REQUIRED ], 'SparkConnectorSource' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', // REQUIRED 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', // REQUIRED 'ConnectionType' => '<string>', // REQUIRED 'ConnectorName' => '<string>', // REQUIRED 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkSQL' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', // REQUIRED 'Type' => '<string>', ], // ... ], ], // ... ], 'SqlAliases' => [ // REQUIRED [ 'Alias' => '<string>', // REQUIRED 'From' => '<string>', // REQUIRED ], // ... ], 'SqlQuery' => '<string>', // REQUIRED ], 'Spigot' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Path' => '<string>', // REQUIRED 'Prob' => <float>, 'Topk' => <integer>, ], 'SplitFields' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'Paths' => [ // REQUIRED ['<string>', ...], // ... ], ], 'Union' => [ 'Inputs' => ['<string>', ...], // REQUIRED 'Name' => '<string>', // REQUIRED 'UnionType' => 'ALL|DISTINCT', // REQUIRED ], ], // ... 
], 'Command' => [ // REQUIRED 'Name' => '<string>', 'PythonVersion' => '<string>', 'Runtime' => '<string>', 'ScriptLocation' => '<string>', ], 'Connections' => [ 'Connections' => ['<string>', ...], ], 'DefaultArguments' => ['<string>', ...], 'Description' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionProperty' => [ 'MaxConcurrentRuns' => <integer>, ], 'GlueVersion' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobRunQueuingEnabled' => true || false, 'LogUri' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'MaxRetries' => <integer>, 'Name' => '<string>', // REQUIRED 'NonOverridableArguments' => ['<string>', ...], 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'Role' => '<string>', // REQUIRED 'SecurityConfiguration' => '<string>', 'SourceControlDetails' => [ 'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER', 'AuthToken' => '<string>', 'Branch' => '<string>', 'Folder' => '<string>', 'LastCommitId' => '<string>', 'Owner' => '<string>', 'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT', 'Repository' => '<string>', ], 'Tags' => ['<string>', ...], 'Timeout' => <integer>, 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ]);
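As a minimal sketch of the syntax above, the following builds a request that sets only the three members marked REQUIRED (Command, Name, Role) plus a small worker configuration. The job name, role ARN, and script location are placeholder values, and buildMinimalJobParams() is a hypothetical helper, not part of the SDK.

```php
<?php
// Hypothetical helper: builds a createJob parameter array containing the
// three REQUIRED members plus an optional worker configuration.
function buildMinimalJobParams(string $name, string $role, string $scriptLocation): array
{
    return [
        'Name' => $name,                         // REQUIRED
        'Role' => $role,                         // REQUIRED: IAM role name or ARN
        'Command' => [                           // REQUIRED
            'Name' => 'glueetl',                 // Apache Spark ETL job
            'ScriptLocation' => $scriptLocation,
        ],
        'GlueVersion' => '4.0',
        'WorkerType' => 'G.1X',
        'NumberOfWorkers' => 2,
    ];
}

// Placeholder values; replace them with your own.
$params = buildMinimalJobParams(
    'example-job',
    'arn:aws:iam::123456789012:role/ExampleGlueRole',
    's3://example-bucket/scripts/example.py'
);

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createJob($params);
```

The array is built separately from the call so that it can be inspected or validated before the request is sent.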
Parameter Details
Members
- AllocatedCapacity
-
- Type: int
This parameter is deprecated. Use MaxCapacity instead.
The number of Glue data processing units (DPUs) to allocate to this Job. You can allocate a minimum of 2 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
- CodeGenConfigurationNodes
-
- Type: Associative array of custom strings keys (NodeId) to CodeGenConfigurationNode structures
The representation of a directed acyclic graph on which both the Glue Studio visual component and Glue Studio code generation are based.
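To illustrate the graph structure, here is a sketch of a three-node CodeGenConfigurationNodes value: a catalog source feeding an ApplyMapping transform, which feeds a catalog target. The node IDs, database, table, and column names are placeholders. It assumes that each entry in Inputs refers to the ID (array key) of another node; findDanglingInputs() is a hypothetical client-side check, not an SDK function.

```php
<?php
// Three-node graph: CatalogSource -> ApplyMapping -> CatalogTarget.
// All IDs and names below are placeholders.
$nodes = [
    'node-1' => [
        'CatalogSource' => [
            'Name' => 'Read orders',
            'Database' => 'example_db',
            'Table' => 'orders_raw',
        ],
    ],
    'node-2' => [
        'ApplyMapping' => [
            'Name' => 'Rename columns',
            'Inputs' => ['node-1'],
            'Mapping' => [
                [
                    'FromPath' => ['order id'],
                    'FromType' => 'string',
                    'ToKey' => 'order_id',
                    'ToType' => 'string',
                ],
            ],
        ],
    ],
    'node-3' => [
        'CatalogTarget' => [
            'Name' => 'Write orders',
            'Inputs' => ['node-2'],
            'Database' => 'example_db',
            'Table' => 'orders_clean',
        ],
    ],
];

// Hypothetical check (assumes Inputs holds node IDs): reports every
// input reference that does not match a key in the graph.
function findDanglingInputs(array $nodes): array
{
    $missing = [];
    foreach ($nodes as $nodeId => $node) {
        foreach ($node as $definition) {
            foreach ($definition['Inputs'] ?? [] as $inputId) {
                if (!array_key_exists($inputId, $nodes)) {
                    $missing[] = "$nodeId -> $inputId";
                }
            }
        }
    }
    return $missing;
}
```

The $nodes array would be passed as the 'CodeGenConfigurationNodes' member of the createJob parameters.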
- Command
-
- Required: Yes
- Type: JobCommand structure
The JobCommand that runs this job.
- Connections
-
- Type: ConnectionsList structure
The connections used for this job.
- DefaultArguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
The default arguments for every run of this job, specified as name-value pairs.
You can specify arguments here that your own job-execution script consumes, as well as arguments that Glue itself consumes.
Job arguments may be logged. Do not pass plaintext secrets as arguments. Retrieve secrets from a Glue Connection, Secrets Manager or other secret management mechanism if you intend to keep them within the Job.
For information about how to specify and consume your own Job arguments, see the Calling Glue APIs in Python topic in the developer guide.
For information about the arguments you can provide to this field when configuring Spark jobs, see the Special Parameters Used by Glue topic in the developer guide.
For information about the arguments you can provide to this field when configuring Ray jobs, see Using job parameters in Ray jobs in the developer guide.
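The following sketch shows one possible DefaultArguments value that mixes arguments consumed by Glue with arguments consumed by the job script. The '--target_prefix' and '--db_secret_id' names are hypothetical script arguments, and suspiciousArgumentNames() is a hypothetical, deliberately crude heuristic for spotting names that suggest a plaintext secret; it is not a substitute for reviewing the values themselves.

```php
<?php
$defaultArguments = [
    '--job-bookmark-option' => 'job-bookmark-enable', // consumed by Glue
    '--TempDir' => 's3://example-bucket/tmp/',         // consumed by Glue
    '--target_prefix' => 's3://example-bucket/out/',   // hypothetical, read by the script
    '--db_secret_id' => 'example/db-credentials',      // hypothetical: a secret's name, not its value
];

// Hypothetical guard: returns argument names that look like they carry
// a plaintext credential (simple name-based heuristic only).
function suspiciousArgumentNames(array $arguments): array
{
    $flagged = array_filter(
        array_keys($arguments),
        fn ($name) => preg_match('/password|passwd|token/i', (string) $name) === 1
    );
    return array_values($flagged);
}
```

Passing the identifier of a secret rather than the secret itself keeps the sensitive value out of any logged job arguments; the script can then fetch it at run time.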
- Description
-
- Type: string
Description of the job being defined.
- ExecutionClass
-
- Type: string
Indicates whether the job is run with a standard or flexible execution class. The standard execution-class is ideal for time-sensitive workloads that require fast job startup and dedicated resources.
The flexible execution class is appropriate for time-insensitive jobs whose start and completion times may vary.
Only jobs with Glue version 3.0 and above and command type glueetl will be allowed to set ExecutionClass to FLEX. The flexible execution class is available for Spark jobs.
- ExecutionProperty
-
- Type: ExecutionProperty structure
An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.
- GlueVersion
-
- Type: string
In Spark jobs, GlueVersion determines the versions of Apache Spark and Python that Glue makes available in a job. The Python version indicates the version supported for jobs of type Spark.
Ray jobs should set GlueVersion to 4.0 or greater. However, the versions of Ray, Python and additional libraries available in your Ray job are determined by the Runtime parameter of the Job command.
For more information about the available Glue versions and corresponding Spark and Python versions, see Glue version in the developer guide.
Jobs that are created without specifying a Glue version default to Glue 0.9.
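The FLEX rule described under ExecutionClass (Glue version 3.0 or above and command type glueetl) combines with the GlueVersion default described here. A minimal sketch of a client-side pre-flight check, where canUseFlex() is a hypothetical helper and the service remains the final authority:

```php
<?php
// Hypothetical pre-flight check: FLEX needs Glue version 3.0 or above
// and a command of type glueetl.
function canUseFlex(array $params): bool
{
    // Jobs created without a Glue version default to Glue 0.9.
    $glueVersion = $params['GlueVersion'] ?? '0.9';
    $commandName = $params['Command']['Name'] ?? '';

    return version_compare($glueVersion, '3.0', '>=') && $commandName === 'glueetl';
}
```

A caller might run this before setting 'ExecutionClass' => 'FLEX' and fall back to STANDARD when it returns false.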
- JobMode
-
- Type: string
A mode that describes how a job was created. Valid values are:
  - SCRIPT - The job was created using the Glue Studio script editor.
  - VISUAL - The job was created using the Glue Studio visual editor.
  - NOTEBOOK - The job was created using an interactive sessions notebook.
When the JobMode field is missing or null, SCRIPT is assigned as the default value.
- JobRunQueuingEnabled
-
- Type: boolean
Specifies whether job run queuing is enabled for the job runs for this job.
A value of true means job run queuing is enabled for the job runs. If false or not populated, the job runs will not be considered for queueing.
If this field does not match the value set in the job run, then the value from the job run field will be used.
- LogUri
-
- Type: string
This field is reserved for future use.
- MaintenanceWindow
-
- Type: string
This field specifies a day of the week and hour for a maintenance window for streaming jobs. Glue periodically performs maintenance activities. During these maintenance windows, Glue will need to restart your streaming jobs.
Glue will restart the job within 3 hours of the specified maintenance window. For instance, if you set up the maintenance window for Monday at 10:00 AM GMT, your jobs will be restarted between 10:00 AM GMT and 1:00 PM GMT.
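The 3-hour restart range can be worked out with simple arithmetic. The sketch below takes only the window's starting hour (0 to 23, GMT) and makes no assumption about the string format of MaintenanceWindow; restartRange() is a hypothetical helper.

```php
<?php
// Hypothetical helper: given the maintenance window's starting hour (GMT),
// returns the hour range in which a restart may occur.
function restartRange(int $windowHour): array
{
    if ($windowHour < 0 || $windowHour > 23) {
        throw new InvalidArgumentException('Hour must be between 0 and 23.');
    }
    $end = $windowHour + 3; // restart happens within 3 hours of the window

    return [
        'from' => $windowHour,
        'to' => $end % 24,
        'nextDay' => $end >= 24, // true when the range crosses midnight
    ];
}
```

For the Monday 10:00 example above, restartRange(10) gives a range from hour 10 to hour 13 on the same day.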
- MaxCapacity
-
- Type: double
For Glue version 1.0 or earlier jobs, using the standard worker type, the number of Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
For Glue version 2.0+ jobs, you cannot specify a Maximum capacity. Instead, you should specify a Worker type and the Number of workers.
Do not set MaxCapacity if using WorkerType and NumberOfWorkers.
The value that can be allocated for MaxCapacity depends on whether you are running a Python shell job, an Apache Spark ETL job, or an Apache Spark streaming ETL job:
  - When you specify a Python shell job (JobCommand.Name="pythonshell"), you can allocate either 0.0625 or 1 DPU. The default is 0.0625 DPU.
  - When you specify an Apache Spark ETL job (JobCommand.Name="glueetl") or Apache Spark streaming ETL job (JobCommand.Name="gluestreaming"), you can allocate from 2 to 100 DPUs. The default is 10 DPUs. This job type cannot have a fractional DPU allocation.
- MaxRetries
-
- Type: int
The maximum number of times to retry this job if it fails.
- Name
-
- Required: Yes
- Type: string
The name you assign to this job definition. It must be unique in your account.
- NonOverridableArguments
-
- Type: Associative array of custom strings keys (GenericString) to strings
Arguments for this job that are not overridden when providing job arguments in a job run, specified as name-value pairs.
- NotificationProperty
-
- Type: NotificationProperty structure
Specifies configuration properties of a job notification.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated when a job runs.
- Role
-
- Required: Yes
- Type: string
The name or Amazon Resource Name (ARN) of the IAM role associated with this job.
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used with this job.
- SourceControlDetails
-
- Type: SourceControlDetails structure
The details for a source control configuration for a job, allowing synchronization of job artifacts to or from a remote repository.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags to use with this job. You may use tags to limit access to the job. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.
- Timeout
-
- Type: int
The job timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours) for batch jobs.
Streaming jobs must have timeout values less than 7 days or 10,080 minutes. When the value is left blank, the job will be restarted after 7 days if you have not set up a maintenance window. If you have set up a maintenance window, the job will be restarted during the maintenance window after 7 days.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, G.8X or G.025X for Spark jobs. Accepts the value Z.2X for Ray jobs.
- For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84 GB disk (approximately 34 GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, because it offers a scalable and cost-effective way to run most jobs.
- For the G.2X worker type, each worker maps to 2 DPU (8 vCPUs, 32 GB of memory) with 128 GB disk (approximately 77 GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries, because it offers a scalable and cost-effective way to run most jobs.
- For the G.4X worker type, each worker maps to 4 DPU (16 vCPUs, 64 GB of memory) with 256 GB disk (approximately 235 GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).
- For the G.8X worker type, each worker maps to 8 DPU (32 vCPUs, 128 GB of memory) with 512 GB disk (approximately 487 GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.
- For the G.025X worker type, each worker maps to 0.25 DPU (2 vCPUs, 4 GB of memory) with 84 GB disk (approximately 34 GB free), and provides 1 executor per worker. We recommend this worker type for low volume streaming jobs. This worker type is only available for Glue version 3.0 streaming jobs.
- For the Z.2X worker type, each worker maps to 2 M-DPU (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120 GB free), and provides up to 8 Ray workers based on the autoscaler.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The unique name that was provided for this job definition.
Errors
- InvalidInputException:
The input provided was not valid.
- IdempotentParameterMismatchException:
The same unique identifier was associated with two different records.
- AlreadyExistsException:
A resource to be created or added already exists.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
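The capacity rules described for MaxCapacity, WorkerType, and NumberOfWorkers can be checked on the client side before a CreateJob request is sent. The following is a minimal sketch rather than anything the SDK provides: the helper name checkJobCapacity is hypothetical, the job command is assumed to be passed under the Command key, and a missing command name is assumed (for illustration only) to mean a Spark ETL job. The DPU limits are copied from the MaxCapacity description.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): returns a list of problems
// found in the capacity-related CreateJob parameters.
function checkJobCapacity(array $params): array
{
    $problems = [];
    $hasMax = isset($params['MaxCapacity']);
    $hasWorkers = isset($params['WorkerType']) || isset($params['NumberOfWorkers']);

    if ($hasMax && $hasWorkers) {
        $problems[] = 'Do not set MaxCapacity together with WorkerType and NumberOfWorkers.';
    }

    if ($hasMax) {
        $max = (float) $params['MaxCapacity'];
        // Assumption: treat a missing command name as a Spark ETL job.
        $command = $params['Command']['Name'] ?? 'glueetl';
        if ($command === 'pythonshell') {
            // Python shell jobs accept either 0.0625 or 1 DPU.
            if (!in_array($max, [0.0625, 1.0], true)) {
                $problems[] = 'Python shell jobs accept 0.0625 or 1 DPU.';
            }
        } elseif ($max < 2 || $max > 100 || floor($max) !== $max) {
            // Spark ETL and streaming ETL jobs take a whole number from 2 to 100.
            $problems[] = 'Spark jobs accept a whole number of DPUs from 2 to 100.';
        }
    }

    return $problems;
}
```

An empty array means no rule was violated; the parameters could then be passed to $client->createJob().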
CreateMLTransform
$result = $client->createMLTransform([/* ... */]);
$promise = $client->createMLTransformAsync([/* ... */]);
Creates a Glue machine learning transform. This operation creates the transform and all the necessary parameters to train it.
Call this operation as the first step in the process of using a machine learning transform (such as the FindMatches transform) for deduplicating data. You can provide an optional Description, in addition to the parameters that you want to use for your algorithm.
You must also specify certain parameters for the tasks that Glue runs on your behalf as part of learning from your data and creating a high-quality machine learning transform. These parameters include Role, and optionally, AllocatedCapacity, Timeout, and MaxRetries. For more information, see Jobs.
Parameter Syntax
$result = $client->createMLTransform([
    'Description' => '<string>',
    'GlueVersion' => '<string>',
    'InputRecordTables' => [ // REQUIRED
        [
            'AdditionalOptions' => ['<string>', ...],
            'CatalogId' => '<string>',
            'ConnectionName' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'MaxCapacity' => <float>,
    'MaxRetries' => <integer>,
    'Name' => '<string>', // REQUIRED
    'NumberOfWorkers' => <integer>,
    'Parameters' => [ // REQUIRED
        'FindMatchesParameters' => [
            'AccuracyCostTradeoff' => <float>,
            'EnforceProvidedLabels' => true || false,
            'PrecisionRecallTradeoff' => <float>,
            'PrimaryKeyColumnName' => '<string>',
        ],
        'TransformType' => 'FIND_MATCHES', // REQUIRED
    ],
    'Role' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
    'Timeout' => <integer>,
    'TransformEncryption' => [
        'MlUserDataEncryption' => [
            'KmsKeyId' => '<string>',
            'MlUserDataEncryptionMode' => 'DISABLED|SSE-KMS', // REQUIRED
        ],
        'TaskRunSecurityConfigurationName' => '<string>',
    ],
    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
]);
Parameter Details
Members
- Description
-
- Type: string
A description of the machine learning transform that is being defined. The default is an empty string.
- GlueVersion
-
- Type: string
This value determines which version of Glue this machine learning transform is compatible with. Glue 1.0 is recommended for most customers. If the value is not set, the Glue compatibility defaults to Glue 0.9. For more information, see Glue Versions in the developer guide.
- InputRecordTables
-
- Required: Yes
- Type: Array of GlueTable structures
A list of Glue table definitions used by the transform.
- MaxCapacity
-
- Type: double
The number of Glue data processing units (DPUs) that are allocated to task runs for this transform. You can allocate from 2 to 100 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the Glue pricing page.
MaxCapacity is a mutually exclusive option with NumberOfWorkers and WorkerType.
- If either NumberOfWorkers or WorkerType is set, then MaxCapacity cannot be set.
- If MaxCapacity is set, then neither NumberOfWorkers nor WorkerType can be set.
- If WorkerType is set, then NumberOfWorkers is required (and vice versa).
- MaxCapacity and NumberOfWorkers must both be at least 1.
When the WorkerType field is set to a value other than Standard, the MaxCapacity field is set automatically and becomes read-only.
- MaxRetries
-
- Type: int
The maximum number of times to retry a task for this transform after a task run fails.
- Name
-
- Required: Yes
- Type: string
The unique name that you give the transform when you create it.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined workerType that are allocated when this task runs.
If WorkerType is set, then NumberOfWorkers is required (and vice versa).
- Parameters
-
- Required: Yes
- Type: TransformParameters structure
The algorithmic parameters that are specific to the transform type used. Conditionally dependent on the transform type.
- Role
-
- Required: Yes
- Type: string
The name or Amazon Resource Name (ARN) of the IAM role with the required permissions. The required permissions include both Glue service role permissions to Glue resources, and Amazon S3 permissions required by the transform.
- This role needs Glue service role permissions to allow access to resources in Glue. See Attach a Policy to IAM Users That Access Glue.
- This role needs permission to your Amazon Simple Storage Service (Amazon S3) sources, targets, temporary directory, scripts, and any libraries used by the task run for this transform.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags to use with this machine learning transform. You may use tags to limit access to the machine learning transform. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.
- Timeout
-
- Type: int
The timeout of the task run for this transform in minutes. This is the maximum time that a task run for this transform can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
- TransformEncryption
-
- Type: TransformEncryption structure
The encryption-at-rest settings of the transform that apply to accessing user data. Machine learning transforms can access user data encrypted in Amazon S3 using KMS.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated when this task runs. Accepts a value of Standard, G.1X, or G.2X.
- For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory and a 50 GB disk, and 2 executors per worker.
- For the G.1X worker type, each worker provides 4 vCPU, 16 GB of memory and a 64 GB disk, and 1 executor per worker.
- For the G.2X worker type, each worker provides 8 vCPU, 32 GB of memory and a 128 GB disk, and 1 executor per worker.
MaxCapacity is a mutually exclusive option with NumberOfWorkers and WorkerType.
- If either NumberOfWorkers or WorkerType is set, then MaxCapacity cannot be set.
- If MaxCapacity is set, then neither NumberOfWorkers nor WorkerType can be set.
- If WorkerType is set, then NumberOfWorkers is required (and vice versa).
- MaxCapacity and NumberOfWorkers must both be at least 1.
Result Syntax
[ 'TransformId' => '<string>', ]
Result Details
Members
- TransformId
-
- Type: string
A unique identifier that is generated for the transform.
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- AccessDeniedException:
Access to a resource was denied.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- IdempotentParameterMismatchException:
The same unique identifier was associated with two different records.
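As an illustration of the parameters above, the following sketch assembles a CreateMLTransform request for a FIND_MATCHES transform and checks the documented capacity rules before sending it. All resource names (transform, role ARN, database, table, column) are placeholders, and the function mlTransformCapacityIsValid is a hypothetical helper, not an SDK method.

```php
<?php
// Placeholder values throughout: the database, table, role, and transform
// names are illustrative and do not refer to real resources.
$params = [
    'Name' => 'dedupe-customers',
    'Role' => 'arn:aws:iam::123456789012:role/ExampleGlueRole',
    'InputRecordTables' => [
        ['DatabaseName' => 'example_db', 'TableName' => 'customers'],
    ],
    'Parameters' => [
        'TransformType' => 'FIND_MATCHES',
        'FindMatchesParameters' => [
            'PrimaryKeyColumnName' => 'customer_id',
        ],
    ],
    // WorkerType and NumberOfWorkers are set together; MaxCapacity is omitted.
    'WorkerType' => 'G.1X',
    'NumberOfWorkers' => 10,
];

// Hypothetical pre-flight check mirroring the documented capacity rules.
function mlTransformCapacityIsValid(array $p): bool
{
    $hasMax = isset($p['MaxCapacity']);
    $hasType = isset($p['WorkerType']);
    $hasCount = isset($p['NumberOfWorkers']);

    if ($hasMax && ($hasType || $hasCount)) {
        return false; // MaxCapacity excludes the worker settings
    }
    if ($hasType !== $hasCount) {
        return false; // WorkerType and NumberOfWorkers require each other
    }
    if ($hasMax && $p['MaxCapacity'] < 1) {
        return false; // MaxCapacity must be at least 1
    }
    if ($hasCount && $p['NumberOfWorkers'] < 1) {
        return false; // NumberOfWorkers must be at least 1
    }
    return true;
}

// With a configured Aws\Glue\GlueClient in $client, the request would be sent as:
// $result = $client->createMLTransform($params);
// $transformId = $result['TransformId'];
```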
CreatePartition
$result = $client->createPartition([/* ... */]);
$promise = $client->createPartitionAsync([/* ... */]);
Creates a new partition.
Parameter Syntax
$result = $client->createPartition([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionInput' => [ // REQUIRED
        'LastAccessTime' => <integer || string || DateTime>,
        'LastAnalyzedTime' => <integer || string || DateTime>,
        'Parameters' => ['<string>', ...],
        'StorageDescriptor' => [
            'AdditionalLocations' => ['<string>', ...],
            'BucketColumns' => ['<string>', ...],
            'Columns' => [
                [
                    'Comment' => '<string>',
                    'Name' => '<string>', // REQUIRED
                    'Parameters' => ['<string>', ...],
                    'Type' => '<string>',
                ],
                // ...
            ],
            'Compressed' => true || false,
            'InputFormat' => '<string>',
            'Location' => '<string>',
            'NumberOfBuckets' => <integer>,
            'OutputFormat' => '<string>',
            'Parameters' => ['<string>', ...],
            'SchemaReference' => [
                'SchemaId' => [
                    'RegistryName' => '<string>',
                    'SchemaArn' => '<string>',
                    'SchemaName' => '<string>',
                ],
                'SchemaVersionId' => '<string>',
                'SchemaVersionNumber' => <integer>,
            ],
            'SerdeInfo' => [
                'Name' => '<string>',
                'Parameters' => ['<string>', ...],
                'SerializationLibrary' => '<string>',
            ],
            'SkewedInfo' => [
                'SkewedColumnNames' => ['<string>', ...],
                'SkewedColumnValueLocationMaps' => ['<string>', ...],
                'SkewedColumnValues' => ['<string>', ...],
            ],
            'SortColumns' => [
                [
                    'Column' => '<string>', // REQUIRED
                    'SortOrder' => <integer>, // REQUIRED
                ],
                // ...
            ],
            'StoredAsSubDirectories' => true || false,
        ],
        'Values' => ['<string>', ...],
    ],
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The Amazon Web Services account ID of the catalog in which the partition is to be created.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the metadata database in which the partition is to be created.
- PartitionInput
-
- Required: Yes
- Type: PartitionInput structure
A PartitionInput structure defining the partition to be created.
- TableName
-
- Required: Yes
- Type: string
The name of the metadata table in which the partition is to be created.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- AlreadyExistsException:
A resource to be created or added already exists.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- InternalServiceException:
An internal service error occurred.
- EntityNotFoundException:
A specified entity does not exist.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
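A small CreatePartition request might be assembled as in the sketch below. The database, table, column names, partition values, and S3 location are placeholders, and missingRequired is a hypothetical helper that only checks the top-level keys marked REQUIRED in the parameter syntax.

```php
<?php
// Illustrative values only: the database, table, column names, and the
// S3 location are placeholders.
$params = [
    'DatabaseName' => 'example_db',
    'TableName' => 'events',
    'PartitionInput' => [
        // Partition values, for example one entry for year and one for month.
        'Values' => ['2024', '01'],
        'StorageDescriptor' => [
            'Location' => 's3://example-bucket/events/year=2024/month=01/',
            'Columns' => [
                ['Name' => 'event_id', 'Type' => 'string'],
                ['Name' => 'payload', 'Type' => 'string'],
            ],
        ],
    ],
];

// Hypothetical helper: lists the required top-level keys that are missing.
function missingRequired(array $params, array $required): array
{
    return array_values(array_diff($required, array_keys($params)));
}

$missing = missingRequired($params, ['DatabaseName', 'TableName', 'PartitionInput']);

// With a configured Aws\Glue\GlueClient in $client and nothing missing:
// $client->createPartition($params);
```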
CreatePartitionIndex
$result = $client->createPartitionIndex([/* ... */]);
$promise = $client->createPartitionIndexAsync([/* ... */]);
Creates a specified partition index in an existing table.
Parameter Syntax
$result = $client->createPartitionIndex([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionIndex' => [ // REQUIRED
        'IndexName' => '<string>', // REQUIRED
        'Keys' => ['<string>', ...], // REQUIRED
    ],
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The catalog ID where the table resides.
- DatabaseName
-
- Required: Yes
- Type: string
Specifies the name of a database in which you want to create a partition index.
- PartitionIndex
-
- Required: Yes
- Type: PartitionIndex structure
Specifies a PartitionIndex structure to create a partition index in an existing table.
- TableName
-
- Required: Yes
- Type: string
Specifies the name of a table in which you want to create a partition index.
Result Syntax
[]
Result Details
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
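The nesting of IndexName and Keys inside PartitionIndex is easy to get wrong, so a thin wrapper can help. The sketch below uses a hypothetical function, partitionIndexParams, with placeholder names; rejecting an empty key list is an assumption based on Keys being marked REQUIRED.

```php
<?php
// Hypothetical helper that assembles the CreatePartitionIndex parameters.
function partitionIndexParams(string $database, string $table, string $indexName, array $keys): array
{
    // Assumption: an empty key list is treated as invalid because Keys is required.
    if ($keys === []) {
        throw new InvalidArgumentException('Keys is required and must not be empty.');
    }

    return [
        'DatabaseName' => $database,
        'TableName' => $table,
        'PartitionIndex' => [
            'IndexName' => $indexName,
            'Keys' => array_values($keys), // re-index to a plain list
        ],
    ];
}

// Placeholder database, table, index, and key names.
$params = partitionIndexParams('example_db', 'events', 'by_year_month', ['year', 'month']);

// With a configured Aws\Glue\GlueClient in $client:
// $client->createPartitionIndex($params);
```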
CreateRegistry
$result = $client->createRegistry([/* ... */]);
$promise = $client->createRegistryAsync([/* ... */]);
Creates a new registry which may be used to hold a collection of schemas.
Parameter Syntax
$result = $client->createRegistry([
    'Description' => '<string>',
    'RegistryName' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
]);
Parameter Details
Members
- Description
-
- Type: string
A description of the registry. If description is not provided, there will not be any default value for this.
- RegistryName
-
- Required: Yes
- Type: string
The name of the registry to be created. It can be at most 255 characters long and may contain only letters, numbers, hyphens, underscores, dollar signs, or hash marks. No whitespace is allowed.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Amazon Web Services tags that contain a key value pair and may be searched by console, command line, or API.
Result Syntax
[
    'Description' => '<string>',
    'RegistryArn' => '<string>',
    'RegistryName' => '<string>',
    'Tags' => ['<string>', ...],
]
Result Details
Members
- Description
-
- Type: string
A description of the registry.
- RegistryArn
-
- Type: string
The Amazon Resource Name (ARN) of the newly created registry.
- RegistryName
-
- Type: string
The name of the registry.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags for the registry.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- AlreadyExistsException:
A resource to be created or added already exists.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- InternalServiceException:
An internal service error occurred.
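The naming rule given for RegistryName can be checked locally before calling the service. The following sketch is one possible reading of that rule, not the service's own validation: isValidRegistryName is a hypothetical function, letters and numbers are assumed to be ASCII, and an empty name is assumed to be invalid.

```php
<?php
// Hypothetical validator for the documented naming rule: at most 255
// characters, limited to letters, numbers, hyphen, underscore, dollar sign,
// and hash mark. ASCII letters and digits are assumed, and at least one
// character is required.
function isValidRegistryName(string $name): bool
{
    // The D modifier stops "$" from matching before a trailing newline.
    return preg_match('/^[A-Za-z0-9_$#-]{1,255}$/D', $name) === 1;
}

// Placeholder registry name.
$params = ['RegistryName' => 'orders-registry_1'];

// With a configured Aws\Glue\GlueClient in $client:
// if (isValidRegistryName($params['RegistryName'])) {
//     $result = $client->createRegistry($params);
// }
```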
CreateSchema
$result = $client->createSchema([/* ... */]);
$promise = $client->createSchemaAsync([/* ... */]);
Creates a new schema set and registers the schema definition. Returns an error if the schema set already exists without actually registering the version.
When the schema set is created, a version checkpoint will be set to the first version. Compatibility mode "DISABLED" restricts any additional schema versions from being added after the first schema version. For all other compatibility modes, validation of compatibility settings will be applied only from the second version onwards when the RegisterSchemaVersion API is used.
When this API is called without a RegistryId, this will create an entry for a "default-registry" in the registry database tables, if it is not already present.
Parameter Syntax
$result = $client->createSchema([
    'Compatibility' => 'NONE|DISABLED|BACKWARD|BACKWARD_ALL|FORWARD|FORWARD_ALL|FULL|FULL_ALL',
    'DataFormat' => 'AVRO|JSON|PROTOBUF', // REQUIRED
    'Description' => '<string>',
    'RegistryId' => [
        'RegistryArn' => '<string>',
        'RegistryName' => '<string>',
    ],
    'SchemaDefinition' => '<string>',
    'SchemaName' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
]);
Parameter Details
Members
- Compatibility
-
- Type: string
The compatibility mode of the schema. The possible values are:
- NONE: No compatibility mode applies. You can use this choice in development scenarios or if you do not know the compatibility mode that you want to apply to schemas. Any new version added will be accepted without undergoing a compatibility check.
- DISABLED: This compatibility choice prevents versioning for a particular schema. You can use this choice to prevent future versioning of a schema.
- BACKWARD: This compatibility choice is recommended as it allows data receivers to read both the current and one previous schema version. This means that for instance, a new schema version cannot drop data fields or change the type of these fields, so they can't be read by readers using the previous version.
- BACKWARD_ALL: This compatibility choice allows data receivers to read both the current and all previous schema versions. You can use this choice when you need to delete fields or add optional fields, and check compatibility against all previous schema versions.
- FORWARD: This compatibility choice allows data receivers to read both the current and one next schema version, but not necessarily later versions. You can use this choice when you need to add fields or delete optional fields, but only check compatibility against the last schema version.
- FORWARD_ALL: This compatibility choice allows data receivers to read data written by producers of any new registered schema. You can use this choice when you need to add fields or delete optional fields, and check compatibility against all previous schema versions.
- FULL: This compatibility choice allows data receivers to read data written by producers using the previous or next version of the schema, but not necessarily earlier or later versions. You can use this choice when you need to add or remove optional fields, but only check compatibility against the last schema version.
- FULL_ALL: This compatibility choice allows data receivers to read data written by producers using all previous schema versions. You can use this choice when you need to add or remove optional fields, and check compatibility against all previous schema versions.
- DataFormat
-
- Required: Yes
- Type: string
The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.
- Description
-
- Type: string
An optional description of the schema. If description is not provided, there will not be any automatic default value for this.
- RegistryId
-
- Type: RegistryId structure
This is a wrapper shape to contain the registry identity fields. If this is not provided, the default registry will be used. The ARN format for the same will be: arn:aws:glue:us-east-2:<customer id>:registry/default-registry:random-5-letter-id.
- SchemaDefinition
-
- Type: string
The schema definition using the DataFormat setting for SchemaName.
- SchemaName
-
- Required: Yes
- Type: string
The name of the schema to be created. It can be at most 255 characters long and may contain only letters, numbers, hyphens, underscores, dollar signs, or hash marks. No whitespace is allowed.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Amazon Web Services tags that contain a key value pair and may be searched by console, command line, or API. If specified, follows the Amazon Web Services tags-on-create pattern.
Result Syntax
[
    'Compatibility' => 'NONE|DISABLED|BACKWARD|BACKWARD_ALL|FORWARD|FORWARD_ALL|FULL|FULL_ALL',
    'DataFormat' => 'AVRO|JSON|PROTOBUF',
    'Description' => '<string>',
    'LatestSchemaVersion' => <integer>,
    'NextSchemaVersion' => <integer>,
    'RegistryArn' => '<string>',
    'RegistryName' => '<string>',
    'SchemaArn' => '<string>',
    'SchemaCheckpoint' => <integer>,
    'SchemaName' => '<string>',
    'SchemaStatus' => 'AVAILABLE|PENDING|DELETING',
    'SchemaVersionId' => '<string>',
    'SchemaVersionStatus' => 'AVAILABLE|PENDING|FAILURE|DELETING',
    'Tags' => ['<string>', ...],
]
Result Details
Members
- Compatibility
-
- Type: string
The schema compatibility mode.
- DataFormat
-
- Type: string
The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.
- Description
-
- Type: string
A description of the schema if specified when created.
- LatestSchemaVersion
-
- Type: long (int|float)
The latest version of the schema associated with the returned schema definition.
- NextSchemaVersion
-
- Type: long (int|float)
The next version of the schema associated with the returned schema definition.
- RegistryArn
-
- Type: string
The Amazon Resource Name (ARN) of the registry.
- RegistryName
-
- Type: string
The name of the registry.
- SchemaArn
-
- Type: string
The Amazon Resource Name (ARN) of the schema.
- SchemaCheckpoint
-
- Type: long (int|float)
The version number of the checkpoint (the last time the compatibility mode was changed).
- SchemaName
-
- Type: string
The name of the schema.
- SchemaStatus
-
- Type: string
The status of the schema.
- SchemaVersionId
-
- Type: string
The unique identifier of the first schema version.
- SchemaVersionStatus
-
- Type: string
The status of the first schema version created.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags for the schema.
Errors
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- EntityNotFoundException:
A specified entity does not exist.
- AlreadyExistsException:
A resource to be created or added already exists.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- InternalServiceException:
An internal service error occurred.
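Because SchemaDefinition is a string, a schema built as a PHP array has to be serialized before it is passed in. The sketch below shows this for an Avro record schema; the schema contents, schema name, and registry name are placeholders chosen for illustration.

```php
<?php
// An illustrative Avro record schema; the record and field names are placeholders.
$avroSchema = [
    'type' => 'record',
    'name' => 'Order',
    'fields' => [
        ['name' => 'order_id', 'type' => 'string'],
        ['name' => 'amount', 'type' => 'double'],
    ],
];

$params = [
    'SchemaName' => 'orders',
    'DataFormat' => 'AVRO',
    'Compatibility' => 'BACKWARD',
    // SchemaDefinition is a string, so the schema is JSON-encoded first.
    'SchemaDefinition' => json_encode($avroSchema, JSON_THROW_ON_ERROR),
    // Omitting RegistryId would target the default registry instead.
    'RegistryId' => ['RegistryName' => 'example-registry'],
];

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createSchema($params);
// $result['SchemaVersionId'] identifies the first schema version.
```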
CreateScript
$result = $client->createScript([/* ... */]);
$promise = $client->createScriptAsync([/* ... */]);
Transforms a directed acyclic graph (DAG) into code.
Parameter Syntax
$result = $client->createScript([
    'DagEdges' => [
        [
            'Source' => '<string>', // REQUIRED
            'Target' => '<string>', // REQUIRED
            'TargetParameter' => '<string>',
        ],
        // ...
    ],
    'DagNodes' => [
        [
            'Args' => [ // REQUIRED
                [
                    'Name' => '<string>', // REQUIRED
                    'Param' => true || false,
                    'Value' => '<string>', // REQUIRED
                ],
                // ...
            ],
            'Id' => '<string>', // REQUIRED
            'LineNumber' => <integer>,
            'NodeType' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'Language' => 'PYTHON|SCALA',
]);
Parameter Details
Members
- DagEdges
-
- Type: Array of CodeGenEdge structures
A list of the edges in the DAG.
- DagNodes
-
- Type: Array of CodeGenNode structures
A list of the nodes in the DAG.
- Language
-
- Type: string
The programming language of the resulting code from the DAG.
Result Syntax
[ 'PythonScript' => '<string>', 'ScalaCode' => '<string>', ]
Result Details
Members
- PythonScript
-
- Type: string
The Python script generated from the DAG.
- ScalaCode
-
- Type: string
The Scala code generated from the DAG.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
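To make the DagNodes and DagEdges shapes concrete, the sketch below builds a two-node graph and verifies that every edge points at a node that exists. The node types and argument names are invented placeholders (the valid values are not listed in this section), and danglingEdges is a hypothetical helper.

```php
<?php
// Placeholder DAG: the node types and argument names are illustrative
// and are not taken from the Glue documentation.
$nodes = [
    [
        'Id' => 'source_1',
        'NodeType' => 'ExampleSource',
        'Args' => [['Name' => 'table', 'Value' => 'events']],
    ],
    [
        'Id' => 'sink_1',
        'NodeType' => 'ExampleSink',
        'Args' => [['Name' => 'path', 'Value' => 's3://example-bucket/out/']],
    ],
];
$edges = [
    ['Source' => 'source_1', 'Target' => 'sink_1'],
];

// Hypothetical check: returns the edges whose Source or Target is not a known node ID.
function danglingEdges(array $nodes, array $edges): array
{
    $ids = array_column($nodes, 'Id');
    return array_values(array_filter($edges, function (array $edge) use ($ids) {
        return !in_array($edge['Source'], $ids, true)
            || !in_array($edge['Target'], $ids, true);
    }));
}

$params = ['DagNodes' => $nodes, 'DagEdges' => $edges, 'Language' => 'PYTHON'];

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createScript($params);
// The generated code is returned in 'PythonScript' or 'ScalaCode'.
```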
CreateSecurityConfiguration
$result = $client->createSecurityConfiguration([/* ... */]);
$promise = $client->createSecurityConfigurationAsync([/* ... */]);
Creates a new security configuration. A security configuration is a set of security properties that can be used by Glue. You can use a security configuration to encrypt data at rest. For information about using security configurations in Glue, see Encrypting Data Written by Crawlers, Jobs, and Development Endpoints.
Parameter Syntax
$result = $client->createSecurityConfiguration([
    'EncryptionConfiguration' => [ // REQUIRED
        'CloudWatchEncryption' => [
            'CloudWatchEncryptionMode' => 'DISABLED|SSE-KMS',
            'KmsKeyArn' => '<string>',
        ],
        'JobBookmarksEncryption' => [
            'JobBookmarksEncryptionMode' => 'DISABLED|CSE-KMS',
            'KmsKeyArn' => '<string>',
        ],
        'S3Encryption' => [
            [
                'KmsKeyArn' => '<string>',
                'S3EncryptionMode' => 'DISABLED|SSE-KMS|SSE-S3',
            ],
            // ...
        ],
    ],
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- EncryptionConfiguration
-
- Required: Yes
- Type: EncryptionConfiguration structure
The encryption configuration for the new security configuration.
- Name
-
- Required: Yes
- Type: string
The name for the new security configuration.
Result Syntax
[ 'CreatedTimestamp' => <DateTime>, 'Name' => '<string>', ]
Result Details
Members
- CreatedTimestamp
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The time at which the new security configuration was created.
- Name
-
- Type: string
The name assigned to the new security configuration.
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
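The sketch below shows one way the EncryptionConfiguration structure might be filled in, using the mode values listed in the parameter syntax. The configuration name and KMS key ARN are placeholders; note that S3Encryption takes a list of structures while the other two members take a single structure.

```php
<?php
// Placeholder key ARN and configuration name.
$kmsKeyArn = 'arn:aws:kms:us-east-1:123456789012:key/00000000-0000-0000-0000-000000000000';

$params = [
    'Name' => 'example-security-config',
    'EncryptionConfiguration' => [
        // S3Encryption is a list of structures, unlike the other two members.
        'S3Encryption' => [
            ['S3EncryptionMode' => 'SSE-KMS', 'KmsKeyArn' => $kmsKeyArn],
        ],
        'CloudWatchEncryption' => [
            'CloudWatchEncryptionMode' => 'SSE-KMS',
            'KmsKeyArn' => $kmsKeyArn,
        ],
        'JobBookmarksEncryption' => [
            'JobBookmarksEncryptionMode' => 'CSE-KMS',
            'KmsKeyArn' => $kmsKeyArn,
        ],
    ],
];

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createSecurityConfiguration($params);
// $result['Name'] and $result['CreatedTimestamp'] describe the new configuration.
```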
CreateSession
$result = $client->createSession([/* ... */]);
$promise = $client->createSessionAsync([/* ... */]);
Creates a new session.
Parameter Syntax
$result = $client->createSession([
    'Command' => [ // REQUIRED
        'Name' => '<string>',
        'PythonVersion' => '<string>',
    ],
    'Connections' => [
        'Connections' => ['<string>', ...],
    ],
    'DefaultArguments' => ['<string>', ...],
    'Description' => '<string>',
    'GlueVersion' => '<string>',
    'Id' => '<string>', // REQUIRED
    'IdleTimeout' => <integer>,
    'MaxCapacity' => <float>,
    'NumberOfWorkers' => <integer>,
    'RequestOrigin' => '<string>',
    'Role' => '<string>', // REQUIRED
    'SecurityConfiguration' => '<string>',
    'Tags' => ['<string>', ...],
    'Timeout' => <integer>,
    'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
]);
Parameter Details
Members
- Command
-
- Required: Yes
- Type: SessionCommand structure
The SessionCommand that runs the job.
- Connections
-
- Type: ConnectionsList structure
The number of connections to use for the session.
- DefaultArguments
-
- Type: Associative array of custom strings keys (OrchestrationNameString) to strings
A map array of key-value pairs. Max is 75 pairs.
- Description
-
- Type: string
The description of the session.
- GlueVersion
-
- Type: string
The Glue version determines the versions of Apache Spark and Python that Glue supports. The GlueVersion must be greater than 2.0.
- Id
-
- Required: Yes
- Type: string
The ID of the session request.
- IdleTimeout
-
- Type: int
The number of minutes the session can be idle before it times out. The default for Spark ETL jobs is the value of Timeout. Consult the documentation for other job types.
- MaxCapacity
-
- Type: double
The number of Glue data processing units (DPUs) that can be allocated when the job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB memory.
- NumberOfWorkers
-
- Type: int
The number of workers of a defined WorkerType to use for the session.
- RequestOrigin
-
- Type: string
The origin of the request.
- Role
-
- Required: Yes
- Type: string
The IAM Role ARN.
- SecurityConfiguration
-
- Type: string
The name of the SecurityConfiguration structure to be used with the session.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The map of key value pairs (tags) belonging to the session.
- Timeout
-
- Type: int
The number of minutes before the session times out. The default for Spark ETL jobs is 48 hours (2,880 minutes), the maximum session lifetime for this job type. Consult the documentation for other job types.
- WorkerType
-
- Type: string
The type of predefined worker that is allocated when a job runs. Accepts a value of G.1X, G.2X, G.4X, or G.8X for Spark jobs. Accepts the value Z.2X for Ray notebooks.
- For the G.1X worker type, each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84 GB disk (approximately 34 GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries; it offers a scalable and cost-effective way to run most jobs.
- For the G.2X worker type, each worker maps to 2 DPU (8 vCPUs, 32 GB of memory) with 128 GB disk (approximately 77 GB free), and provides 1 executor per worker. We recommend this worker type for workloads such as data transforms, joins, and queries; it offers a scalable and cost-effective way to run most jobs.
- For the G.4X worker type, each worker maps to 4 DPU (16 vCPUs, 64 GB of memory) with 256 GB disk (approximately 235 GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs in the following Amazon Web Services Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).
- For the G.8X worker type, each worker maps to 8 DPU (32 vCPUs, 128 GB of memory) with 512 GB disk (approximately 487 GB free), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for Glue version 3.0 or later Spark ETL jobs, in the same Amazon Web Services Regions as supported for the G.4X worker type.
- For the Z.2X worker type, each worker maps to 2 M-DPU (8 vCPUs, 64 GB of memory) with 128 GB disk (approximately 120 GB free), and provides up to 8 Ray workers based on the autoscaler.
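The per-worker figures above can be collected into a lookup table, for example to estimate the total capacity of a session before choosing a worker count. The following is a minimal sketch only: the constant `WORKER_SPECS` and the function `estimateCapacity` are hypothetical names, not part of the SDK, and the numbers are copied from the WorkerType description.

```php
<?php
// Per-worker resources as listed in the WorkerType description.
// Note: for Z.2X the 'dpu' figure is in M-DPU rather than DPU.
const WORKER_SPECS = [
    'G.1X' => ['dpu' => 1, 'vcpus' => 4,  'memoryGb' => 16,  'diskGb' => 84],
    'G.2X' => ['dpu' => 2, 'vcpus' => 8,  'memoryGb' => 32,  'diskGb' => 128],
    'G.4X' => ['dpu' => 4, 'vcpus' => 16, 'memoryGb' => 64,  'diskGb' => 256],
    'G.8X' => ['dpu' => 8, 'vcpus' => 32, 'memoryGb' => 128, 'diskGb' => 512],
    'Z.2X' => ['dpu' => 2, 'vcpus' => 8,  'memoryGb' => 64,  'diskGb' => 128],
];

// Hypothetical helper: multiplies the per-worker figures by the worker count.
function estimateCapacity(string $workerType, int $numberOfWorkers): array
{
    if (!array_key_exists($workerType, WORKER_SPECS)) {
        throw new InvalidArgumentException("Unknown worker type: $workerType");
    }
    if ($numberOfWorkers < 1) {
        throw new InvalidArgumentException('numberOfWorkers must be at least 1');
    }
    $total = [];
    foreach (WORKER_SPECS[$workerType] as $resource => $perWorker) {
        $total[$resource] = $perWorker * $numberOfWorkers;
    }
    return $total;
}

print_r(estimateCapacity('G.2X', 10));
```

The sketch covers only the worker types described in the list above; Standard and G.025X appear in the result syntax but have no figures here.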
Result Syntax
[
    'Session' => [
        'Command' => [
            'Name' => '<string>',
            'PythonVersion' => '<string>',
        ],
        'CompletedOn' => <DateTime>,
        'Connections' => [
            'Connections' => ['<string>', ...],
        ],
        'CreatedOn' => <DateTime>,
        'DPUSeconds' => <float>,
        'DefaultArguments' => ['<string>', ...],
        'Description' => '<string>',
        'ErrorMessage' => '<string>',
        'ExecutionTime' => <float>,
        'GlueVersion' => '<string>',
        'Id' => '<string>',
        'IdleTimeout' => <integer>,
        'MaxCapacity' => <float>,
        'NumberOfWorkers' => <integer>,
        'ProfileName' => '<string>',
        'Progress' => <float>,
        'Role' => '<string>',
        'SecurityConfiguration' => '<string>',
        'Status' => 'PROVISIONING|READY|FAILED|TIMEOUT|STOPPING|STOPPED',
        'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
    ],
]
Result Details
Members
- Session
-
- Type: Session structure
Returns the session object in the response.
Errors
- AccessDeniedException:
Access to a resource was denied.
- IdempotentParameterMismatchException:
The same unique identifier was associated with two different records.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- ValidationException:
A value could not be validated.
- AlreadyExistsException:
A resource to be created or added already exists.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
CreateTable
$result = $client->createTable([/* ... */]);
$promise = $client->createTableAsync([/* ... */]);
Creates a new table definition in the Data Catalog.
Parameter Syntax
$result = $client->createTable([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'OpenTableFormatInput' => [
        'IcebergInput' => [
            'MetadataOperation' => 'CREATE', // REQUIRED
            'Version' => '<string>',
        ],
    ],
    'PartitionIndexes' => [
        [
            'IndexName' => '<string>', // REQUIRED
            'Keys' => ['<string>', ...], // REQUIRED
        ],
        // ...
    ],
    'TableInput' => [ // REQUIRED
        'Description' => '<string>',
        'LastAccessTime' => <integer || string || DateTime>,
        'LastAnalyzedTime' => <integer || string || DateTime>,
        'Name' => '<string>', // REQUIRED
        'Owner' => '<string>',
        'Parameters' => ['<string>', ...],
        'PartitionKeys' => [
            [
                'Comment' => '<string>',
                'Name' => '<string>', // REQUIRED
                'Parameters' => ['<string>', ...],
                'Type' => '<string>',
            ],
            // ...
        ],
        'Retention' => <integer>,
        'StorageDescriptor' => [
            'AdditionalLocations' => ['<string>', ...],
            'BucketColumns' => ['<string>', ...],
            'Columns' => [
                [
                    'Comment' => '<string>',
                    'Name' => '<string>', // REQUIRED
                    'Parameters' => ['<string>', ...],
                    'Type' => '<string>',
                ],
                // ...
            ],
            'Compressed' => true || false,
            'InputFormat' => '<string>',
            'Location' => '<string>',
            'NumberOfBuckets' => <integer>,
            'OutputFormat' => '<string>',
            'Parameters' => ['<string>', ...],
            'SchemaReference' => [
                'SchemaId' => [
                    'RegistryName' => '<string>',
                    'SchemaArn' => '<string>',
                    'SchemaName' => '<string>',
                ],
                'SchemaVersionId' => '<string>',
                'SchemaVersionNumber' => <integer>,
            ],
            'SerdeInfo' => [
                'Name' => '<string>',
                'Parameters' => ['<string>', ...],
                'SerializationLibrary' => '<string>',
            ],
            'SkewedInfo' => [
                'SkewedColumnNames' => ['<string>', ...],
                'SkewedColumnValueLocationMaps' => ['<string>', ...],
                'SkewedColumnValues' => ['<string>', ...],
            ],
            'SortColumns' => [
                [
                    'Column' => '<string>', // REQUIRED
                    'SortOrder' => <integer>, // REQUIRED
                ],
                // ...
            ],
            'StoredAsSubDirectories' => true || false,
        ],
        'TableType' => '<string>',
        'TargetTable' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'Name' => '<string>',
            'Region' => '<string>',
        ],
        'ViewDefinition' => [
            'Definer' => '<string>',
            'IsProtected' => true || false,
            'Representations' => [
                [
                    'Dialect' => 'REDSHIFT|ATHENA|SPARK',
                    'DialectVersion' => '<string>',
                    'ValidationConnection' => '<string>',
                    'ViewExpandedText' => '<string>',
                    'ViewOriginalText' => '<string>',
                ],
                // ...
            ],
            'SubObjects' => ['<string>', ...],
        ],
        'ViewExpandedText' => '<string>',
        'ViewOriginalText' => '<string>',
    ],
    'TransactionId' => '<string>',
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which to create the Table. If none is supplied, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The catalog database in which to create the new table. For Hive compatibility, this name is entirely lowercase.
- OpenTableFormatInput
-
- Type: OpenTableFormatInput structure
Specifies an OpenTableFormatInput structure when creating an open format table.
- PartitionIndexes
-
- Type: Array of PartitionIndex structures
A list of partition indexes, PartitionIndex structures, to create in the table.
- TableInput
-
- Required: Yes
- Type: TableInput structure
The TableInput object that defines the metadata table to create in the catalog.
- TransactionId
-
- Type: string
The ID of the transaction.
Result Syntax
[]
Result Details
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- ResourceNotReadyException:
A resource was not ready for a transaction.
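As an illustration of the parameter syntax above, the following sketch assembles a small createTable request containing the required members (DatabaseName, TableInput, and TableInput.Name) plus a storage descriptor. The helper `buildCreateTableParams`, the database, table, and column names, the column types, and the S3 location are all hypothetical. Lowercasing the database name is an assumption made here to follow the Hive-compatibility note; the call to the client is left as a comment so the sketch runs without the SDK.

```php
<?php
// Hypothetical helper: builds the request array for createTable.
// $columns maps column name => column type (values here are illustrative).
function buildCreateTableParams(
    string $database,
    string $table,
    array $columns,
    string $location
): array {
    $columnDefs = [];
    foreach ($columns as $name => $type) {
        $columnDefs[] = ['Name' => $name, 'Type' => $type];
    }

    return [
        // Assumption: lowercase the name locally, per the Hive-compatibility note.
        'DatabaseName' => strtolower($database),
        'TableInput' => [
            'Name' => $table,
            'StorageDescriptor' => [
                'Columns' => $columnDefs,
                'Location' => $location,
            ],
        ],
    ];
}

$params = buildCreateTableParams(
    'Sales_DB',
    'orders',
    ['order_id' => 'bigint', 'placed_at' => 'timestamp'],
    's3://example-bucket/orders/'
);

// With a configured Aws\Glue\GlueClient in $client:
// $client->createTable($params);
echo json_encode($params, JSON_PRETTY_PRINT), PHP_EOL;
```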
CreateTableOptimizer
$result = $client->createTableOptimizer([/* ... */]);
$promise = $client->createTableOptimizerAsync([/* ... */]);
Creates a new table optimizer for a specific function. The Type parameter accepts compaction, retention, or orphan_file_deletion.
Parameter Syntax
$result = $client->createTableOptimizer([
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'TableOptimizerConfiguration' => [ // REQUIRED
        'enabled' => true || false,
        'orphanFileDeletionConfiguration' => [
            'icebergConfiguration' => [
                'location' => '<string>',
                'orphanFileRetentionPeriodInDays' => <integer>,
            ],
        ],
        'retentionConfiguration' => [
            'icebergConfiguration' => [
                'cleanExpiredFiles' => true || false,
                'numberOfSnapshotsToRetain' => <integer>,
                'snapshotRetentionPeriodInDays' => <integer>,
            ],
        ],
        'roleArn' => '<string>',
    ],
    'Type' => 'compaction|retention|orphan_file_deletion', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Required: Yes
- Type: string
The Catalog ID of the table.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database in the catalog in which the table resides.
- TableName
-
- Required: Yes
- Type: string
The name of the table.
- TableOptimizerConfiguration
-
- Required: Yes
- Type: TableOptimizerConfiguration structure
A TableOptimizerConfiguration object representing the configuration of a table optimizer.
- Type
-
- Required: Yes
- Type: string
The type of table optimizer. The parameter syntax lists compaction, retention, and orphan_file_deletion as the accepted values.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- ValidationException:
A value could not be validated.
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- AlreadyExistsException:
A resource to be created or added already exists.
- InternalServiceException:
An internal service error occurred.
- ThrottlingException:
The throttling threshold was exceeded.
CreateTrigger
$result = $client->createTrigger([/* ... */]);
$promise = $client->createTriggerAsync([/* ... */]);
Creates a new trigger.
Parameter Syntax
$result = $client->createTrigger([
    'Actions' => [ // REQUIRED
        [
            'Arguments' => ['<string>', ...],
            'CrawlerName' => '<string>',
            'JobName' => '<string>',
            'NotificationProperty' => [
                'NotifyDelayAfter' => <integer>,
            ],
            'SecurityConfiguration' => '<string>',
            'Timeout' => <integer>,
        ],
        // ...
    ],
    'Description' => '<string>',
    'EventBatchingCondition' => [
        'BatchSize' => <integer>, // REQUIRED
        'BatchWindow' => <integer>,
    ],
    'Name' => '<string>', // REQUIRED
    'Predicate' => [
        'Conditions' => [
            [
                'CrawlState' => 'RUNNING|CANCELLING|CANCELLED|SUCCEEDED|FAILED|ERROR',
                'CrawlerName' => '<string>',
                'JobName' => '<string>',
                'LogicalOperator' => 'EQUALS',
                'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
            ],
            // ...
        ],
        'Logical' => 'AND|ANY',
    ],
    'Schedule' => '<string>',
    'StartOnCreation' => true || false,
    'Tags' => ['<string>', ...],
    'Type' => 'SCHEDULED|CONDITIONAL|ON_DEMAND|EVENT', // REQUIRED
    'WorkflowName' => '<string>',
]);
Parameter Details
Members
- Actions
-
- Required: Yes
- Type: Array of Action structures
The actions initiated by this trigger when it fires.
- Description
-
- Type: string
A description of the new trigger.
- EventBatchingCondition
-
- Type: EventBatchingCondition structure
Batch condition that must be met (specified number of events received or batch time window expired) before the EventBridge event trigger fires.
- Name
-
- Required: Yes
- Type: string
The name of the trigger.
- Predicate
-
- Type: Predicate structure
A predicate to specify when the new trigger should fire.
This field is required when the trigger type is CONDITIONAL.
- Schedule
-
- Type: string
A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
This field is required when the trigger type is SCHEDULED.
- StartOnCreation
-
- Type: boolean
Set to true to start SCHEDULED and CONDITIONAL triggers when created. True is not supported for ON_DEMAND triggers.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags to use with this trigger. You may use tags to limit access to the trigger. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.
- Type
-
- Required: Yes
- Type: string
The type of the new trigger.
- WorkflowName
-
- Type: string
The name of the workflow associated with the trigger.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the trigger.
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- IdempotentParameterMismatchException:
The same unique identifier was associated with two different records.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
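The parameter details above state several cross-field rules: Predicate is required for CONDITIONAL triggers, Schedule is required for SCHEDULED triggers, and StartOnCreation set to true is not supported for ON_DEMAND triggers. A minimal sketch of a local pre-flight check for those rules follows. The function `validateTriggerParams`, the trigger name, and the job name are hypothetical, and the service performs its own validation regardless.

```php
<?php
// Hypothetical helper: returns a list of problems found in a createTrigger
// request array, based only on the rules stated in the parameter details.
function validateTriggerParams(array $params): array
{
    $problems = [];

    foreach (['Actions', 'Name', 'Type'] as $required) {
        if (empty($params[$required])) {
            $problems[] = "$required is required";
        }
    }

    $type = $params['Type'] ?? null;
    if ($type === 'CONDITIONAL' && empty($params['Predicate'])) {
        $problems[] = 'Predicate is required when Type is CONDITIONAL';
    }
    if ($type === 'SCHEDULED' && empty($params['Schedule'])) {
        $problems[] = 'Schedule is required when Type is SCHEDULED';
    }
    if ($type === 'ON_DEMAND' && ($params['StartOnCreation'] ?? false) === true) {
        $problems[] = 'StartOnCreation = true is not supported for ON_DEMAND';
    }

    return $problems;
}

// A scheduled trigger that runs every day at 12:15 UTC.
$params = [
    'Name' => 'daily-load',
    'Type' => 'SCHEDULED',
    'Schedule' => 'cron(15 12 * * ? *)',
    'StartOnCreation' => true,
    'Actions' => [['JobName' => 'example-etl-job']],
];

$problems = validateTriggerParams($params);
// If $problems is empty, with a configured Aws\Glue\GlueClient in $client:
// $result = $client->createTrigger($params);
echo $problems === [] ? 'ok' : implode('; ', $problems), PHP_EOL;
```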
CreateUsageProfile
$result = $client->createUsageProfile([/* ... */]);
$promise = $client->createUsageProfileAsync([/* ... */]);
Creates a Glue usage profile.
Parameter Syntax
$result = $client->createUsageProfile([
    'Configuration' => [ // REQUIRED
        'JobConfiguration' => [
            '<NameString>' => [
                'AllowedValues' => ['<string>', ...],
                'DefaultValue' => '<string>',
                'MaxValue' => '<string>',
                'MinValue' => '<string>',
            ],
            // ...
        ],
        'SessionConfiguration' => [
            '<NameString>' => [
                'AllowedValues' => ['<string>', ...],
                'DefaultValue' => '<string>',
                'MaxValue' => '<string>',
                'MinValue' => '<string>',
            ],
            // ...
        ],
    ],
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
]);
Parameter Details
Members
- Configuration
-
- Required: Yes
- Type: ProfileConfiguration structure
A ProfileConfiguration object specifying the job and session values for the profile.
- Description
-
- Type: string
A description of the usage profile.
- Name
-
- Required: Yes
- Type: string
The name of the usage profile.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
A list of tags applied to the usage profile.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the usage profile that was created.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- AlreadyExistsException:
A resource to be created or added already exists.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- OperationNotSupportedException:
The operation is not available in the region.
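In the parameter syntax above, every value inside a configuration entry (DefaultValue, MinValue, MaxValue, AllowedValues) is a string, so numeric limits have to be passed as strings. The sketch below shows one way to build such entries. The helpers `rangeValue` and `choiceValue`, the profile name, and the configuration keys `numberOfWorkers` and `workerType` are assumptions for illustration; this section does not list the accepted key names. The range check is a local sanity check, not a documented service rule.

```php
<?php
// Hypothetical helper: a numeric entry with default, minimum, and maximum,
// converted to the string values the parameter syntax expects.
function rangeValue(int $default, int $min, int $max): array
{
    if ($min > $max || $default < $min || $default > $max) {
        throw new InvalidArgumentException('Expected min <= default <= max');
    }
    return [
        'DefaultValue' => (string) $default,
        'MinValue' => (string) $min,
        'MaxValue' => (string) $max,
    ];
}

// Hypothetical helper: an entry restricted to a fixed set of values.
function choiceValue(string $default, array $allowed): array
{
    if (!in_array($default, $allowed, true)) {
        throw new InvalidArgumentException('Default must be one of the allowed values');
    }
    return [
        'DefaultValue' => $default,
        'AllowedValues' => array_values($allowed),
    ];
}

$params = [
    'Name' => 'example-analyst-profile',
    'Configuration' => [
        'JobConfiguration' => [
            // Key names below are assumed for illustration only.
            'numberOfWorkers' => rangeValue(5, 2, 10),
            'workerType' => choiceValue('G.1X', ['G.1X', 'G.2X']),
        ],
    ],
];

// With a configured Aws\Glue\GlueClient in $client:
// $result = $client->createUsageProfile($params);
echo json_encode($params, JSON_PRETTY_PRINT), PHP_EOL;
```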
CreateUserDefinedFunction
$result = $client->createUserDefinedFunction([/* ... */]);
$promise = $client->createUserDefinedFunctionAsync([/* ... */]);
Creates a new function definition in the Data Catalog.
Parameter Syntax
$result = $client->createUserDefinedFunction([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'FunctionInput' => [ // REQUIRED
        'ClassName' => '<string>',
        'FunctionName' => '<string>',
        'OwnerName' => '<string>',
        'OwnerType' => 'USER|ROLE|GROUP',
        'ResourceUris' => [
            [
                'ResourceType' => 'JAR|FILE|ARCHIVE',
                'Uri' => '<string>',
            ],
            // ...
        ],
    ],
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which to create the function. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database in which to create the function.
- FunctionInput
-
- Required: Yes
- Type: UserDefinedFunctionInput structure
A FunctionInput object that defines the function to create in the Data Catalog.
Result Syntax
[]
Result Details
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- EntityNotFoundException:
A specified entity does not exist
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- GlueEncryptionException:
An encryption operation failed.
CreateWorkflow
$result = $client->createWorkflow([/* ... */]);
$promise = $client->createWorkflowAsync([/* ... */]);
Creates a new workflow.
Parameter Syntax
$result = $client->createWorkflow([ 'DefaultRunProperties' => ['<string>', ...], 'Description' => '<string>', 'MaxConcurrentRuns' => <integer>, 'Name' => '<string>', // REQUIRED 'Tags' => ['<string>', ...], ]);
Parameter Details
Members
- DefaultRunProperties
-
- Type: Associative array of custom strings keys (IdString) to strings
A collection of properties to be used as part of each execution of the workflow.
- Description
-
- Type: string
A description of the workflow.
- MaxConcurrentRuns
-
- Type: int
You can use this parameter to prevent unwanted multiple updates to data, to control costs, or in some cases, to prevent exceeding the maximum number of concurrent runs of any of the component jobs. If you leave this parameter blank, there is no limit to the number of concurrent workflow runs.
- Name
-
- Required: Yes
- Type: string
The name to be assigned to the workflow. It should be unique within your account.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
The tags to be used with this workflow.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the workflow which was provided as part of the request.
Errors
- AlreadyExistsException:
A resource to be created or added already exists.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ResourceNumberLimitExceededException:
A resource numerical limit was exceeded.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
DeleteBlueprint
$result = $client->deleteBlueprint([/* ... */]);
$promise = $client->deleteBlueprintAsync([/* ... */]);
Deletes an existing blueprint.
Parameter Syntax
$result = $client->deleteBlueprint([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the blueprint to delete.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
Returns the name of the blueprint that was deleted.
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
DeleteClassifier
$result = $client->deleteClassifier([/* ... */]);
$promise = $client->deleteClassifierAsync([/* ... */]);
Removes a classifier from the Data Catalog.
Parameter Syntax
$result = $client->deleteClassifier([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
Name of the classifier to remove.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- OperationTimeoutException:
The operation timed out.
DeleteColumnStatisticsForPartition
$result = $client->deleteColumnStatisticsForPartition([/* ... */]);
$promise = $client->deleteColumnStatisticsForPartitionAsync([/* ... */]);
Deletes the partition column statistics of a column.
The Identity and Access Management (IAM) permission required for this operation is DeletePartition.
Parameter Syntax
$result = $client->deleteColumnStatisticsForPartition([ 'CatalogId' => '<string>', 'ColumnName' => '<string>', // REQUIRED 'DatabaseName' => '<string>', // REQUIRED 'PartitionValues' => ['<string>', ...], // REQUIRED 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.
- ColumnName
-
- Required: Yes
- Type: string
Name of the column.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partitions reside.
- PartitionValues
-
- Required: Yes
- Type: Array of strings
A list of partition values identifying the partition.
- TableName
-
- Required: Yes
- Type: string
The name of the partitions' table.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
DeleteColumnStatisticsForTable
$result = $client->deleteColumnStatisticsForTable([/* ... */]);
$promise = $client->deleteColumnStatisticsForTableAsync([/* ... */]);
Deletes table statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is DeleteTable.
Parameter Syntax
$result = $client->deleteColumnStatisticsForTable([ 'CatalogId' => '<string>', 'ColumnName' => '<string>', // REQUIRED 'DatabaseName' => '<string>', // REQUIRED 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.
- ColumnName
-
- Required: Yes
- Type: string
The name of the column.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partitions reside.
- TableName
-
- Required: Yes
- Type: string
The name of the partitions' table.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
DeleteConnection
$result = $client->deleteConnection([/* ... */]);
$promise = $client->deleteConnectionAsync([/* ... */]);
Deletes a connection from the Data Catalog.
Parameter Syntax
$result = $client->deleteConnection([ 'CatalogId' => '<string>', 'ConnectionName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the connection resides. If none is provided, the Amazon Web Services account ID is used by default.
- ConnectionName
-
- Required: Yes
- Type: string
The name of the connection to delete.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- OperationTimeoutException:
The operation timed out.
DeleteCrawler
$result = $client->deleteCrawler([/* ... */]);
$promise = $client->deleteCrawlerAsync([/* ... */]);
Removes a specified crawler from the Glue Data Catalog, unless the crawler state is RUNNING.
Parameter Syntax
$result = $client->deleteCrawler([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the crawler to remove.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- CrawlerRunningException:
The operation cannot be performed because the crawler is already running.
- SchedulerTransitioningException:
The specified scheduler is transitioning.
- OperationTimeoutException:
The operation timed out.
DeleteCustomEntityType
$result = $client->deleteCustomEntityType([/* ... */]);
$promise = $client->deleteCustomEntityTypeAsync([/* ... */]);
Deletes a custom pattern by specifying its name.
Parameter Syntax
$result = $client->deleteCustomEntityType([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the custom pattern that you want to delete.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Type: string
The name of the custom pattern you deleted.
Errors
- EntityNotFoundException:
A specified entity does not exist
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
DeleteDataQualityRuleset
$result = $client->deleteDataQualityRuleset([/* ... */]);
$promise = $client->deleteDataQualityRulesetAsync([/* ... */]);
Deletes a data quality ruleset.
Parameter Syntax
$result = $client->deleteDataQualityRuleset([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
A name for the data quality ruleset.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
DeleteDatabase
$result = $client->deleteDatabase([/* ... */]);
$promise = $client->deleteDatabaseAsync([/* ... */]);
Removes a specified database from a Data Catalog.
After completing this operation, you no longer have access to the tables (and all table versions and partitions that might belong to the tables) and the user-defined functions in the deleted database. Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.
To ensure the immediate deletion of all related resources, before calling DeleteDatabase, use DeleteTableVersion or BatchDeleteTableVersion, DeletePartition or BatchDeletePartition, DeleteUserDefinedFunction, and DeleteTable or BatchDeleteTable, to delete any resources that belong to the database.
Parameter Syntax
$result = $client->deleteDatabase([ 'CatalogId' => '<string>', 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the database resides. If none is provided, the Amazon Web Services account ID is used by default.
- Name
-
- Required: Yes
- Type: string
The name of the database to delete. For Hive compatibility, this must be all lowercase.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
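The explicit cleanup recommended before DeleteDatabase can be sketched as an ordered plan of calls. This is a partial illustration under stated assumptions: it covers only user-defined functions and tables (table versions and partitions are omitted for brevity), the helper `planDatabaseCleanup` is hypothetical, the `FunctionName` and `TablesToDelete` member names come from the DeleteUserDefinedFunction and BatchDeleteTable sections of this reference, and the batch size of 100 is an assumed limit that should be checked against the BatchDeleteTable documentation. The lists of table and function names are supplied by the caller.

```php
<?php
// Hypothetical helper: returns [operationName, params] pairs in the order
// they would be sent, ending with deleteDatabase itself.
function planDatabaseCleanup(
    string $database,
    array $tables,
    array $functions,
    int $batchSize = 100 // assumed per-call limit for BatchDeleteTable
): array {
    $plan = [];

    foreach ($functions as $functionName) {
        $plan[] = ['deleteUserDefinedFunction', [
            'DatabaseName' => $database,
            'FunctionName' => $functionName,
        ]];
    }

    foreach (array_chunk($tables, $batchSize) as $chunk) {
        $plan[] = ['batchDeleteTable', [
            'DatabaseName' => $database,
            'TablesToDelete' => $chunk,
        ]];
    }

    $plan[] = ['deleteDatabase', ['Name' => $database]];

    return $plan;
}

$plan = planDatabaseCleanup('sales_db', ['orders', 'customers', 'returns'], ['to_cents'], 2);

foreach ($plan as [$operation, $params]) {
    // With a configured Aws\Glue\GlueClient in $client:
    // $client->$operation($params);
    echo $operation, ' ', json_encode($params), PHP_EOL;
}
```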
DeleteDevEndpoint
$result = $client->deleteDevEndpoint([/* ... */]);
$promise = $client->deleteDevEndpointAsync([/* ... */]);
Deletes a specified development endpoint.
Parameter Syntax
$result = $client->deleteDevEndpoint([ 'EndpointName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- EndpointName
-
- Required: Yes
- Type: string
The name of the DevEndpoint.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
DeleteJob
$result = $client->deleteJob([/* ... */]);
$promise = $client->deleteJobAsync([/* ... */]);
Deletes a specified job definition. If the job definition is not found, no exception is thrown.
Parameter Syntax
$result = $client->deleteJob([ 'JobName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- JobName
-
- Required: Yes
- Type: string
The name of the job definition to delete.
Result Syntax
[ 'JobName' => '<string>', ]
Result Details
Members
- JobName
-
- Type: string
The name of the job definition that was deleted.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
DeleteMLTransform
$result = $client->deleteMLTransform([/* ... */]);
$promise = $client->deleteMLTransformAsync([/* ... */]);
Deletes a Glue machine learning transform. Machine learning transforms are a special type of transform that use machine learning to learn the details of the transformation to be performed by learning from examples provided by humans. These transformations are then saved by Glue. If you no longer need a transform, you can delete it by calling DeleteMLTransform. However, any Glue jobs that still reference the deleted transform will no longer succeed.
Parameter Syntax
$result = $client->deleteMLTransform([ 'TransformId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- TransformId
-
- Required: Yes
- Type: string
The unique identifier of the transform to delete.
Result Syntax
[ 'TransformId' => '<string>', ]
Result Details
Members
- TransformId
-
- Type: string
The unique identifier of the transform that was deleted.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
DeletePartition
$result = $client->deletePartition([/* ... */]);
$promise = $client->deletePartitionAsync([/* ... */]);
Deletes a specified partition.
Parameter Syntax
$result = $client->deletePartition([ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', // REQUIRED 'PartitionValues' => ['<string>', ...], // REQUIRED 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partition to be deleted resides. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database in which the table in question resides.
- PartitionValues
-
- Required: Yes
- Type: Array of strings
The values that define the partition.
- TableName
-
- Required: Yes
- Type: string
The name of the table that contains the partition to be deleted.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
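PartitionValues is a plain list of strings, so the position of each value matters. The sketch below builds that list from a key => value map. It assumes that the values must follow the order of the table's partition keys, which is not stated in this section. The helper `partitionValues` and the database, table, and key names are hypothetical.

```php
<?php
// Hypothetical helper: orders the values by the table's partition keys
// and casts each one to a string, as the parameter syntax expects.
function partitionValues(array $partitionKeys, array $valuesByKey): array
{
    $values = [];
    foreach ($partitionKeys as $key) {
        if (!array_key_exists($key, $valuesByKey)) {
            throw new InvalidArgumentException("Missing value for partition key: $key");
        }
        $values[] = (string) $valuesByKey[$key];
    }
    return $values;
}

$params = [
    'DatabaseName' => 'sales_db',
    'TableName' => 'orders',
    // The map may be given in any order; the key list fixes the output order.
    'PartitionValues' => partitionValues(['year', 'month'], ['month' => 3, 'year' => 2024]),
];

// With a configured Aws\Glue\GlueClient in $client:
// $client->deletePartition($params);
echo json_encode($params), PHP_EOL;
```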
DeletePartitionIndex
$result = $client->deletePartitionIndex([/* ... */]);
$promise = $client->deletePartitionIndexAsync([/* ... */]);
Deletes a specified partition index from an existing table.
Parameter Syntax
$result = $client->deletePartitionIndex([ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', // REQUIRED 'IndexName' => '<string>', // REQUIRED 'TableName' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The catalog ID where the table resides.
- DatabaseName
-
- Required: Yes
- Type: string
Specifies the name of a database from which you want to delete a partition index.
- IndexName
-
- Required: Yes
- Type: string
The name of the partition index to be deleted.
- TableName
-
- Required: Yes
- Type: string
Specifies the name of a table from which you want to delete a partition index.
Result Syntax
[]
Result Details
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist
- ConflictException:
The CreatePartitions API was called on a table that has indexes enabled.
- GlueEncryptionException:
An encryption operation failed.
DeleteRegistry
$result = $client->deleteRegistry([/* ... */]);
$promise = $client->deleteRegistryAsync([/* ... */]);
Deletes the entire registry, including its schemas and all of their versions. To get the status of the delete operation, you can call the GetRegistry API after the asynchronous call. Deleting a registry will deactivate all online operations for the registry, such as the UpdateRegistry, CreateSchema, UpdateSchema, and RegisterSchemaVersion APIs.
Parameter Syntax
$result = $client->deleteRegistry([ 'RegistryId' => [ // REQUIRED 'RegistryArn' => '<string>', 'RegistryName' => '<string>', ], ]);
Parameter Details
Members
- RegistryId
-
- Required: Yes
- Type: RegistryId structure
This is a wrapper structure that may contain the registry name and Amazon Resource Name (ARN).
Result Syntax
[ 'RegistryArn' => '<string>', 'RegistryName' => '<string>', 'Status' => 'AVAILABLE|DELETING', ]
Result Details
Members
- RegistryArn
-
- Type: string
The Amazon Resource Name (ARN) of the registry being deleted.
- RegistryName
-
- Type: string
The name of the registry being deleted.
- Status
-
- Type: string
The status of the registry. A successful operation will return the Deleting status.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist
- AccessDeniedException:
Access to a resource was denied.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
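Because the deletion completes asynchronously, a caller may want to poll GetRegistry until the registry is gone. The following is a minimal sketch of such a loop. The function `waitForRegistryDeletion` is hypothetical, and the status lookup is passed in as a callable so the sketch runs without the SDK. It assumes the callable returns the registry's Status string, or null once the lookup reports that the registry no longer exists.

```php
<?php
// Hypothetical helper: polls until the registry is gone or attempts run out.
// $getStatus returns the registry Status string, or null once it is gone.
function waitForRegistryDeletion(
    callable $getStatus,
    int $maxAttempts = 10,
    int $delaySeconds = 0
): bool {
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $status = $getStatus();
        if ($status === null) {
            return true; // registry no longer exists
        }
        if ($status !== 'DELETING') {
            throw new RuntimeException("Unexpected registry status: $status");
        }
        if ($delaySeconds > 0 && $attempt < $maxAttempts) {
            sleep($delaySeconds);
        }
    }
    return false; // still deleting after the last attempt
}

// Stand-in for GetRegistry: reports DELETING twice, then gone.
// In real use the callable would wrap $client->getRegistry([...]) on a
// configured Aws\Glue\GlueClient, return $result['Status'], and return null
// when the call fails because the registry was not found (an assumption
// about how a completed deletion surfaces).
$responses = ['DELETING', 'DELETING', null];
$done = waitForRegistryDeletion(function () use (&$responses) {
    return array_shift($responses);
});

var_dump($done);
```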
DeleteResourcePolicy
$result = $client->deleteResourcePolicy([/* ... */]);
$promise = $client->deleteResourcePolicyAsync([/* ... */]);
Deletes a specified policy.
Parameter Syntax
$result = $client->deleteResourcePolicy([ 'PolicyHashCondition' => '<string>', 'ResourceArn' => '<string>', ]);
Parameter Details
Members
- PolicyHashCondition
-
- Type: string
The hash value returned when this policy was set.
- ResourceArn
-
- Type: string
The ARN of the Glue resource for the resource policy to be deleted.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- ConditionCheckFailureException:
A specified condition was not satisfied.
DeleteSchema
$result = $client->deleteSchema([/* ... */]);
$promise = $client->deleteSchemaAsync([/* ... */]);
Deletes the entire schema set, including all of its versions. To get the status of the delete operation, you can call the GetSchema API after the asynchronous call. Deleting a schema will deactivate all online operations for the schema, such as the GetSchemaByDefinition and RegisterSchemaVersion APIs.
Parameter Syntax
$result = $client->deleteSchema([
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
]);
Parameter Details
Members
- SchemaId
-
- Required: Yes
- Type: SchemaId structure
This is a wrapper structure that may contain the schema name and Amazon Resource Name (ARN).
Result Syntax
[
    'SchemaArn' => '<string>',
    'SchemaName' => '<string>',
    'Status' => 'AVAILABLE|PENDING|DELETING',
]
Result Details
Members
- SchemaArn
-
- Type: string
The Amazon Resource Name (ARN) of the schema being deleted.
- SchemaName
-
- Type: string
The name of the schema being deleted.
- Status
-
- Type: string
The status of the schema.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist
- AccessDeniedException:
Access to a resource was denied.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
DeleteSchemaVersions
$result = $client->deleteSchemaVersions([/* ... */]);
$promise = $client->deleteSchemaVersionsAsync([/* ... */]);
Removes versions from the specified schema. A version number or range may be supplied. If the compatibility mode forbids deleting a version that is necessary, such as BACKWARDS_FULL, an error is returned. Calling the GetSchemaVersions API after this call will list the status of the deleted versions.
When the range of version numbers contains a checkpointed version, the API will return a 409 conflict and will not proceed with the deletion. You have to remove the checkpoint first using the DeleteSchemaCheckpoint API before using this API.
You cannot use the DeleteSchemaVersions API to delete the first schema version in the schema set. The first schema version can only be deleted by the DeleteSchema API. This operation will also delete the attached SchemaVersionMetadata under the schema versions. Hard deletes will be enforced on the database.
Parameter Syntax
$result = $client->deleteSchemaVersions([
    'SchemaId' => [ // REQUIRED
        'RegistryName' => '<string>',
        'SchemaArn' => '<string>',
        'SchemaName' => '<string>',
    ],
    'Versions' => '<string>', // REQUIRED
]);
Parameter Details
Members
- SchemaId
-
- Required: Yes
- Type: SchemaId structure
This is a wrapper structure that may contain the schema name and Amazon Resource Name (ARN).
- Versions
-
- Required: Yes
- Type: string
A version range may be supplied in one of the following formats:
- a single version number, 5
- a range, 5-8: deletes versions 5, 6, 7, 8
Result Syntax
[
    'SchemaVersionErrors' => [
        [
            'ErrorDetails' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
            'VersionNumber' => <integer>,
        ],
        // ...
    ],
]
Result Details
Members
- SchemaVersionErrors
-
- Type: Array of SchemaVersionErrorItem structures
A list of SchemaVersionErrorItem objects, each containing an error and schema version.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist
- AccessDeniedException:
Access to a resource was denied.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
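The Versions string and the SchemaVersionErrors result of DeleteSchemaVersions can be handled with small helpers along the following lines. This is only a sketch: the function names expandVersions and failedVersions are hypothetical, and the parsing covers just the two formats listed above (a single number or a first-last range).

```php
<?php
// Hypothetical helpers, not part of the SDK.

// Expands a 'Versions' string ("5" or "5-8") into the version numbers it covers.
function expandVersions(string $versions): array
{
    if (!preg_match('/^(\d+)(?:-(\d+))?$/', $versions, $m)) {
        throw new InvalidArgumentException("Unrecognized version range: {$versions}");
    }
    $first = (int) $m[1];
    $last = isset($m[2]) ? (int) $m[2] : $first;
    if ($last < $first) {
        throw new InvalidArgumentException("Range end precedes range start: {$versions}");
    }
    return range($first, $last);
}

// Maps each failed version number to its error message.
// $result is the value returned by deleteSchemaVersions().
function failedVersions($result): array
{
    $failed = [];
    foreach ($result['SchemaVersionErrors'] ?? [] as $item) {
        $failed[$item['VersionNumber']] = $item['ErrorDetails']['ErrorMessage'] ?? '';
    }
    return $failed;
}
```

A caller might compare expandVersions('5-8') against the keys of failedVersions($result) to see which of the requested versions were actually removed.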
DeleteSecurityConfiguration
$result = $client->deleteSecurityConfiguration([/* ... */]);
$promise = $client->deleteSecurityConfigurationAsync([/* ... */]);
Deletes a specified security configuration.
Parameter Syntax
$result = $client->deleteSecurityConfiguration([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the security configuration to delete.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
DeleteSession
$result = $client->deleteSession([/* ... */]);
$promise = $client->deleteSessionAsync([/* ... */]);
Deletes the session.
Parameter Syntax
$result = $client->deleteSession([
    'Id' => '<string>', // REQUIRED
    'RequestOrigin' => '<string>',
]);
Parameter Details
Members
- Id
-
- Required: Yes
- Type: string
The ID of the session to be deleted.
- RequestOrigin
-
- Type: string
The name of the origin of the delete session request.
Result Syntax
[
    'Id' => '<string>',
]
Result Details
Members
- Id
-
- Type: string
Returns the ID of the deleted session.
Errors
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- IllegalSessionStateException:
The session is in an invalid state to perform a requested operation.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
DeleteTable
$result = $client->deleteTable([/* ... */]);
$promise = $client->deleteTableAsync([/* ... */]);
Removes a table definition from the Data Catalog.
After completing this operation, you no longer have access to the table versions and partitions that belong to the deleted table. Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.
To ensure the immediate deletion of all related resources, before calling DeleteTable, use DeleteTableVersion or BatchDeleteTableVersion, and DeletePartition or BatchDeletePartition, to delete any resources that belong to the table.
Parameter Syntax
$result = $client->deleteTable([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'Name' => '<string>', // REQUIRED
    'TransactionId' => '<string>',
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the table resides. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database in which the table resides. For Hive compatibility, this name is entirely lowercase.
- Name
-
- Required: Yes
- Type: string
The name of the table to be deleted. For Hive compatibility, this name is entirely lowercase.
- TransactionId
-
- Type: string
The transaction ID at which to delete the table contents.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
- ResourceNotReadyException:
A resource was not ready for a transaction.
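To illustrate the ordering recommended in the DeleteTable description (remove table versions and partitions first, then the table), here is a minimal sketch. The helper name deleteTableWithChildren is hypothetical, the caller is assumed to have already collected the version IDs and partition value lists (for example from GetTableVersions and GetPartitions), and the batch size of 25 is a conservative assumption rather than a documented limit for both batch operations.

```php
<?php
// Hypothetical helper, not part of the SDK.
// $versionIds:          list of version ID strings, e.g. ['1', '2'].
// $partitionValueLists: list of partition value lists, e.g. [['2024', '01']].
function deleteTableWithChildren(
    object $client,
    string $database,
    string $table,
    array $versionIds,
    array $partitionValueLists,
    int $batchSize = 25 // assumed safe chunk size
): void {
    $target = ['DatabaseName' => $database, 'TableName' => $table];

    // 1. Delete table versions in batches.
    foreach (array_chunk($versionIds, $batchSize) as $chunk) {
        $client->batchDeleteTableVersion($target + ['VersionIds' => $chunk]);
    }

    // 2. Delete partitions in batches.
    foreach (array_chunk($partitionValueLists, $batchSize) as $chunk) {
        $partitions = array_map(fn (array $values) => ['Values' => $values], $chunk);
        $client->batchDeletePartition($target + ['PartitionsToDelete' => $partitions]);
    }

    // 3. Finally remove the table definition itself.
    $client->deleteTable(['DatabaseName' => $database, 'Name' => $table]);
}
```

The sketch ignores the per-item Errors lists that the batch operations return; real code would probably inspect them before going on to delete the table.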
DeleteTableOptimizer
$result = $client->deleteTableOptimizer([/* ... */]);
$promise = $client->deleteTableOptimizerAsync([/* ... */]);
Deletes an optimizer and all associated metadata for a table. The optimization will no longer be performed on the table.
Parameter Syntax
$result = $client->deleteTableOptimizer([
    'CatalogId' => '<string>', // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'Type' => 'compaction|retention|orphan_file_deletion', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Required: Yes
- Type: string
The Catalog ID of the table.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database in the catalog in which the table resides.
- TableName
-
- Required: Yes
- Type: string
The name of the table.
- Type
-
- Required: Yes
- Type: string
The type of table optimizer.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- ThrottlingException:
The throttling threshold was exceeded.
DeleteTableVersion
$result = $client->deleteTableVersion([/* ... */]);
$promise = $client->deleteTableVersionAsync([/* ... */]);
Deletes a specified version of a table.
Parameter Syntax
$result = $client->deleteTableVersion([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
    'VersionId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the tables reside. If none is provided, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.
- TableName
-
- Required: Yes
- Type: string
The name of the table. For Hive compatibility, this name is entirely lowercase.
- VersionId
-
- Required: Yes
- Type: string
The ID of the table version to be deleted. A VersionID is a string representation of an integer. Each version is incremented by 1.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
DeleteTrigger
$result = $client->deleteTrigger([/* ... */]);
$promise = $client->deleteTriggerAsync([/* ... */]);
Deletes a specified trigger. If the trigger is not found, no exception is thrown.
Parameter Syntax
$result = $client->deleteTrigger([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the trigger to delete.
Result Syntax
[
    'Name' => '<string>',
]
Result Details
Members
- Name
-
- Type: string
The name of the trigger that was deleted.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
DeleteUsageProfile
$result = $client->deleteUsageProfile([/* ... */]);
$promise = $client->deleteUsageProfileAsync([/* ... */]);
Deletes the specified Glue usage profile.
Parameter Syntax
$result = $client->deleteUsageProfile([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the usage profile to delete.
Result Syntax
[]
Result Details
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- OperationNotSupportedException:
The operation is not available in the region.
DeleteUserDefinedFunction
$result = $client->deleteUserDefinedFunction([/* ... */]);
$promise = $client->deleteUserDefinedFunctionAsync([/* ... */]);
Deletes an existing function definition from the Data Catalog.
Parameter Syntax
$result = $client->deleteUserDefinedFunction([
    'CatalogId' => '<string>',
    'DatabaseName' => '<string>', // REQUIRED
    'FunctionName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the function to be deleted is located. If none is supplied, the Amazon Web Services account ID is used by default.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the function is located.
- FunctionName
-
- Required: Yes
- Type: string
The name of the function definition to be deleted.
Result Syntax
[]
Result Details
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
DeleteWorkflow
$result = $client->deleteWorkflow([/* ... */]);
$promise = $client->deleteWorkflowAsync([/* ... */]);
Deletes a workflow.
Parameter Syntax
$result = $client->deleteWorkflow([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
Name of the workflow to be deleted.
Result Syntax
[
    'Name' => '<string>',
]
Result Details
Members
- Name
-
- Type: string
Name of the workflow specified in input.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ConcurrentModificationException:
Two processes are trying to modify a resource simultaneously.
GetBlueprint
$result = $client->getBlueprint([/* ... */]);
$promise = $client->getBlueprintAsync([/* ... */]);
Retrieves the details of a blueprint.
Parameter Syntax
$result = $client->getBlueprint([
    'IncludeBlueprint' => true || false,
    'IncludeParameterSpec' => true || false,
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- IncludeBlueprint
-
- Type: boolean
Specifies whether or not to include the blueprint in the response.
- IncludeParameterSpec
-
- Type: boolean
Specifies whether or not to include the parameter specification.
- Name
-
- Required: Yes
- Type: string
The name of the blueprint.
Result Syntax
[
    'Blueprint' => [
        'BlueprintLocation' => '<string>',
        'BlueprintServiceLocation' => '<string>',
        'CreatedOn' => <DateTime>,
        'Description' => '<string>',
        'ErrorMessage' => '<string>',
        'LastActiveDefinition' => [
            'BlueprintLocation' => '<string>',
            'BlueprintServiceLocation' => '<string>',
            'Description' => '<string>',
            'LastModifiedOn' => <DateTime>,
            'ParameterSpec' => '<string>',
        ],
        'LastModifiedOn' => <DateTime>,
        'Name' => '<string>',
        'ParameterSpec' => '<string>',
        'Status' => 'CREATING|ACTIVE|UPDATING|FAILED',
    ],
]
Result Details
Members
- Blueprint
-
- Type: Blueprint structure
Returns a Blueprint object.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
GetBlueprintRun
$result = $client->getBlueprintRun([/* ... */]);
$promise = $client->getBlueprintRunAsync([/* ... */]);
Retrieves the details of a blueprint run.
Parameter Syntax
$result = $client->getBlueprintRun([
    'BlueprintName' => '<string>', // REQUIRED
    'RunId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- BlueprintName
-
- Required: Yes
- Type: string
The name of the blueprint.
- RunId
-
- Required: Yes
- Type: string
The run ID for the blueprint run you want to retrieve.
Result Syntax
[
    'BlueprintRun' => [
        'BlueprintName' => '<string>',
        'CompletedOn' => <DateTime>,
        'ErrorMessage' => '<string>',
        'Parameters' => '<string>',
        'RoleArn' => '<string>',
        'RollbackErrorMessage' => '<string>',
        'RunId' => '<string>',
        'StartedOn' => <DateTime>,
        'State' => 'RUNNING|SUCCEEDED|FAILED|ROLLING_BACK',
        'WorkflowName' => '<string>',
    ],
]
Result Details
Members
- BlueprintRun
-
- Type: BlueprintRun structure
Returns a BlueprintRun object.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
GetBlueprintRuns
$result = $client->getBlueprintRuns([/* ... */]);
$promise = $client->getBlueprintRunsAsync([/* ... */]);
Retrieves the details of blueprint runs for a specified blueprint.
Parameter Syntax
$result = $client->getBlueprintRuns([
    'BlueprintName' => '<string>', // REQUIRED
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- BlueprintName
-
- Required: Yes
- Type: string
The name of the blueprint.
- MaxResults
-
- Type: int
The maximum size of a list to return.
- NextToken
-
- Type: string
A continuation token, if this is a continuation request.
Result Syntax
[
    'BlueprintRuns' => [
        [
            'BlueprintName' => '<string>',
            'CompletedOn' => <DateTime>,
            'ErrorMessage' => '<string>',
            'Parameters' => '<string>',
            'RoleArn' => '<string>',
            'RollbackErrorMessage' => '<string>',
            'RunId' => '<string>',
            'StartedOn' => <DateTime>,
            'State' => 'RUNNING|SUCCEEDED|FAILED|ROLLING_BACK',
            'WorkflowName' => '<string>',
        ],
        // ...
    ],
    'NextToken' => '<string>',
]
Result Details
Members
- BlueprintRuns
-
- Type: Array of BlueprintRun structures
Returns a list of BlueprintRun objects.
- NextToken
-
- Type: string
A continuation token, if not all blueprint runs have been returned.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
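The NextToken continuation pattern used by GetBlueprintRuns (and by the other list operations on this page) can be sketched as a generator that keeps requesting pages until no token is returned. The helper name allBlueprintRuns is hypothetical, and the default page size of 50 is an arbitrary choice for illustration.

```php
<?php
// Hypothetical helper, not part of the SDK.
// Yields every BlueprintRun for a blueprint, following NextToken across pages.
function allBlueprintRuns(object $client, string $blueprintName, int $pageSize = 50): Generator
{
    $params = ['BlueprintName' => $blueprintName, 'MaxResults' => $pageSize];

    do {
        $page = $client->getBlueprintRuns($params);

        foreach ($page['BlueprintRuns'] ?? [] as $run) {
            yield $run;
        }

        // A missing or empty token means the last page has been reached.
        $token = $page['NextToken'] ?? null;
        $params['NextToken'] = $token;
    } while ($token !== null && $token !== '');
}
```

With a configured client, the runs can then be consumed lazily with foreach (allBlueprintRuns($client, 'my-blueprint') as $run), where 'my-blueprint' is a placeholder name.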
GetCatalogImportStatus
$result = $client->getCatalogImportStatus([/* ... */]);
$promise = $client->getCatalogImportStatusAsync([/* ... */]);
Retrieves the status of a migration operation.
Parameter Syntax
$result = $client->getCatalogImportStatus([
    'CatalogId' => '<string>',
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the catalog to migrate. Currently, this should be the Amazon Web Services account ID.
Result Syntax
[
    'ImportStatus' => [
        'ImportCompleted' => true || false,
        'ImportTime' => <DateTime>,
        'ImportedBy' => '<string>',
    ],
]
Result Details
Members
- ImportStatus
-
- Type: CatalogImportStatus structure
The status of the specified catalog migration.
Errors
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
GetClassifier
$result = $client->getClassifier([/* ... */]);
$promise = $client->getClassifierAsync([/* ... */]);
Retrieves a classifier by name.
Parameter Syntax
$result = $client->getClassifier([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
Name of the classifier to retrieve.
Result Syntax
[
    'Classifier' => [
        'CsvClassifier' => [
            'AllowSingleColumn' => true || false,
            'ContainsHeader' => 'UNKNOWN|PRESENT|ABSENT',
            'CreationTime' => <DateTime>,
            'CustomDatatypeConfigured' => true || false,
            'CustomDatatypes' => ['<string>', ...],
            'Delimiter' => '<string>',
            'DisableValueTrimming' => true || false,
            'Header' => ['<string>', ...],
            'LastUpdated' => <DateTime>,
            'Name' => '<string>',
            'QuoteSymbol' => '<string>',
            'Serde' => 'OpenCSVSerDe|LazySimpleSerDe|None',
            'Version' => <integer>,
        ],
        'GrokClassifier' => [
            'Classification' => '<string>',
            'CreationTime' => <DateTime>,
            'CustomPatterns' => '<string>',
            'GrokPattern' => '<string>',
            'LastUpdated' => <DateTime>,
            'Name' => '<string>',
            'Version' => <integer>,
        ],
        'JsonClassifier' => [
            'CreationTime' => <DateTime>,
            'JsonPath' => '<string>',
            'LastUpdated' => <DateTime>,
            'Name' => '<string>',
            'Version' => <integer>,
        ],
        'XMLClassifier' => [
            'Classification' => '<string>',
            'CreationTime' => <DateTime>,
            'LastUpdated' => <DateTime>,
            'Name' => '<string>',
            'RowTag' => '<string>',
            'Version' => <integer>,
        ],
    ],
]
Result Details
Members
- Classifier
-
- Type: Classifier structure
The requested classifier.
Errors
- EntityNotFoundException:
A specified entity does not exist
- OperationTimeoutException:
The operation timed out.
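The Classifier structure returned by GetClassifier carries one sub-structure per classifier type. A small helper like the one below can report which one is populated. This is a sketch under the assumption that only one of the four keys is set for any given classifier; the function name classifierKind is hypothetical.

```php
<?php
// Hypothetical helper, not part of the SDK.
// Returns the name of the populated sub-structure, or null if none is set.
function classifierKind($classifier): ?string
{
    $kinds = ['CsvClassifier', 'GrokClassifier', 'JsonClassifier', 'XMLClassifier'];

    foreach ($kinds as $kind) {
        if (!empty($classifier[$kind])) {
            return $kind;
        }
    }

    return null;
}
```

It would typically be called as classifierKind($result['Classifier']) on the value returned by getClassifier(), or on each entry of the Classifiers list returned by getClassifiers().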
GetClassifiers
$result = $client->getClassifiers([/* ... */]);
$promise = $client->getClassifiersAsync([/* ... */]);
Lists all classifier objects in the Data Catalog.
Parameter Syntax
$result = $client->getClassifiers([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- MaxResults
-
- Type: int
The size of the list to return (optional).
- NextToken
-
- Type: string
An optional continuation token.
Result Syntax
[
    'Classifiers' => [
        [
            'CsvClassifier' => [
                'AllowSingleColumn' => true || false,
                'ContainsHeader' => 'UNKNOWN|PRESENT|ABSENT',
                'CreationTime' => <DateTime>,
                'CustomDatatypeConfigured' => true || false,
                'CustomDatatypes' => ['<string>', ...],
                'Delimiter' => '<string>',
                'DisableValueTrimming' => true || false,
                'Header' => ['<string>', ...],
                'LastUpdated' => <DateTime>,
                'Name' => '<string>',
                'QuoteSymbol' => '<string>',
                'Serde' => 'OpenCSVSerDe|LazySimpleSerDe|None',
                'Version' => <integer>,
            ],
            'GrokClassifier' => [
                'Classification' => '<string>',
                'CreationTime' => <DateTime>,
                'CustomPatterns' => '<string>',
                'GrokPattern' => '<string>',
                'LastUpdated' => <DateTime>,
                'Name' => '<string>',
                'Version' => <integer>,
            ],
            'JsonClassifier' => [
                'CreationTime' => <DateTime>,
                'JsonPath' => '<string>',
                'LastUpdated' => <DateTime>,
                'Name' => '<string>',
                'Version' => <integer>,
            ],
            'XMLClassifier' => [
                'Classification' => '<string>',
                'CreationTime' => <DateTime>,
                'LastUpdated' => <DateTime>,
                'Name' => '<string>',
                'RowTag' => '<string>',
                'Version' => <integer>,
            ],
        ],
        // ...
    ],
    'NextToken' => '<string>',
]
Result Details
Members
- Classifiers
-
- Type: Array of Classifier structures
The requested list of classifier objects.
- NextToken
-
- Type: string
A continuation token.
Errors
- OperationTimeoutException:
The operation timed out.
GetColumnStatisticsForPartition
$result = $client->getColumnStatisticsForPartition([/* ... */]);
$promise = $client->getColumnStatisticsForPartitionAsync([/* ... */]);
Retrieves partition statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is GetPartition.
Parameter Syntax
$result = $client->getColumnStatisticsForPartition([
    'CatalogId' => '<string>',
    'ColumnNames' => ['<string>', ...], // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'PartitionValues' => ['<string>', ...], // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.
- ColumnNames
-
- Required: Yes
- Type: Array of strings
A list of the column names.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partitions reside.
- PartitionValues
-
- Required: Yes
- Type: Array of strings
A list of partition values identifying the partition.
- TableName
-
- Required: Yes
- Type: string
The name of the partitions' table.
Result Syntax
[
    'ColumnStatisticsList' => [
        [
            'AnalyzedTime' => <DateTime>,
            'ColumnName' => '<string>',
            'ColumnType' => '<string>',
            'StatisticsData' => [
                'BinaryColumnStatisticsData' => [
                    'AverageLength' => <float>,
                    'MaximumLength' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'BooleanColumnStatisticsData' => [
                    'NumberOfFalses' => <integer>,
                    'NumberOfNulls' => <integer>,
                    'NumberOfTrues' => <integer>,
                ],
                'DateColumnStatisticsData' => [
                    'MaximumValue' => <DateTime>,
                    'MinimumValue' => <DateTime>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'DecimalColumnStatisticsData' => [
                    'MaximumValue' => [
                        'Scale' => <integer>,
                        'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>,
                    ],
                    'MinimumValue' => [
                        'Scale' => <integer>,
                        'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>,
                    ],
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'DoubleColumnStatisticsData' => [
                    'MaximumValue' => <float>,
                    'MinimumValue' => <float>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'LongColumnStatisticsData' => [
                    'MaximumValue' => <integer>,
                    'MinimumValue' => <integer>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'StringColumnStatisticsData' => [
                    'AverageLength' => <float>,
                    'MaximumLength' => <integer>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'Type' => 'BOOLEAN|DATE|DECIMAL|DOUBLE|LONG|STRING|BINARY',
            ],
        ],
        // ...
    ],
    'Errors' => [
        [
            'ColumnName' => '<string>',
            'Error' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
        ],
        // ...
    ],
]
Result Details
Members
- ColumnStatisticsList
-
- Type: Array of ColumnStatistics structures
List of ColumnStatistics.
- Errors
-
- Type: Array of ColumnError structures
Errors that occurred while retrieving column statistics data.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
GetColumnStatisticsForTable
$result = $client->getColumnStatisticsForTable([/* ... */]);
$promise = $client->getColumnStatisticsForTableAsync([/* ... */]);
Retrieves table statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is GetTable.
Parameter Syntax
$result = $client->getColumnStatisticsForTable([
    'CatalogId' => '<string>',
    'ColumnNames' => ['<string>', ...], // REQUIRED
    'DatabaseName' => '<string>', // REQUIRED
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog where the partitions in question reside. If none is supplied, the Amazon Web Services account ID is used by default.
- ColumnNames
-
- Required: Yes
- Type: Array of strings
A list of the column names.
- DatabaseName
-
- Required: Yes
- Type: string
The name of the catalog database where the partitions reside.
- TableName
-
- Required: Yes
- Type: string
The name of the partitions' table.
Result Syntax
[
    'ColumnStatisticsList' => [
        [
            'AnalyzedTime' => <DateTime>,
            'ColumnName' => '<string>',
            'ColumnType' => '<string>',
            'StatisticsData' => [
                'BinaryColumnStatisticsData' => [
                    'AverageLength' => <float>,
                    'MaximumLength' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'BooleanColumnStatisticsData' => [
                    'NumberOfFalses' => <integer>,
                    'NumberOfNulls' => <integer>,
                    'NumberOfTrues' => <integer>,
                ],
                'DateColumnStatisticsData' => [
                    'MaximumValue' => <DateTime>,
                    'MinimumValue' => <DateTime>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'DecimalColumnStatisticsData' => [
                    'MaximumValue' => [
                        'Scale' => <integer>,
                        'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>,
                    ],
                    'MinimumValue' => [
                        'Scale' => <integer>,
                        'UnscaledValue' => <string || resource || Psr\Http\Message\StreamInterface>,
                    ],
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'DoubleColumnStatisticsData' => [
                    'MaximumValue' => <float>,
                    'MinimumValue' => <float>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'LongColumnStatisticsData' => [
                    'MaximumValue' => <integer>,
                    'MinimumValue' => <integer>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'StringColumnStatisticsData' => [
                    'AverageLength' => <float>,
                    'MaximumLength' => <integer>,
                    'NumberOfDistinctValues' => <integer>,
                    'NumberOfNulls' => <integer>,
                ],
                'Type' => 'BOOLEAN|DATE|DECIMAL|DOUBLE|LONG|STRING|BINARY',
            ],
        ],
        // ...
    ],
    'Errors' => [
        [
            'ColumnName' => '<string>',
            'Error' => [
                'ErrorCode' => '<string>',
                'ErrorMessage' => '<string>',
            ],
        ],
        // ...
    ],
]
Result Details
Members
- ColumnStatisticsList
-
- Type: Array of ColumnStatistics structures
List of ColumnStatistics.
- Errors
-
- Type: Array of ColumnError structures
List of ColumnStatistics that failed to be retrieved.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
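Both GetColumnStatisticsForPartition and GetColumnStatisticsForTable return retrieved statistics and per-column errors side by side, so a caller usually has to inspect both lists. The following sketch shows one way to do that; the helper name summarizeColumnStatistics and the shape of its return value are assumptions made for illustration.

```php
<?php
// Hypothetical helper, not part of the SDK.
// $result is the value returned by getColumnStatisticsForTable()
// or getColumnStatisticsForPartition().
function summarizeColumnStatistics($result): array
{
    // Column name => statistics type (BOOLEAN, DATE, DECIMAL, ...).
    $types = [];
    foreach ($result['ColumnStatisticsList'] ?? [] as $stat) {
        $types[$stat['ColumnName']] = $stat['StatisticsData']['Type'] ?? null;
    }

    // Column name => error message for columns that could not be retrieved.
    $errors = [];
    foreach ($result['Errors'] ?? [] as $error) {
        $errors[$error['ColumnName']] = $error['Error']['ErrorMessage'] ?? '';
    }

    return ['types' => $types, 'errors' => $errors];
}
```

A non-empty 'errors' entry means the call itself succeeded but some of the requested columns have no usable statistics.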
GetColumnStatisticsTaskRun
$result = $client->getColumnStatisticsTaskRun([/* ... */]);
$promise = $client->getColumnStatisticsTaskRunAsync([/* ... */]);
Gets the associated metadata for a task run, given a task run ID.
Parameter Syntax
$result = $client->getColumnStatisticsTaskRun([
    'ColumnStatisticsTaskRunId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- ColumnStatisticsTaskRunId
-
- Required: Yes
- Type: string
The identifier for the particular column statistics task run.
Result Syntax
[
    'ColumnStatisticsTaskRun' => [
        'CatalogID' => '<string>',
        'ColumnNameList' => ['<string>', ...],
        'ColumnStatisticsTaskRunId' => '<string>',
        'CreationTime' => <DateTime>,
        'CustomerId' => '<string>',
        'DPUSeconds' => <float>,
        'DatabaseName' => '<string>',
        'EndTime' => <DateTime>,
        'ErrorMessage' => '<string>',
        'LastUpdated' => <DateTime>,
        'NumberOfWorkers' => <integer>,
        'Role' => '<string>',
        'SampleSize' => <float>,
        'SecurityConfiguration' => '<string>',
        'StartTime' => <DateTime>,
        'Status' => 'STARTING|RUNNING|SUCCEEDED|FAILED|STOPPED',
        'TableName' => '<string>',
        'WorkerType' => '<string>',
    ],
]
Result Details
Members
- ColumnStatisticsTaskRun
-
- Type: ColumnStatisticsTaskRun structure
A ColumnStatisticsTaskRun object representing the details of the column stats run.
Errors
- EntityNotFoundException:
A specified entity does not exist
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
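Because a column statistics task run moves through the STARTING and RUNNING states before finishing, callers often poll GetColumnStatisticsTaskRun until a terminal status appears. The sketch below assumes that SUCCEEDED, FAILED, and STOPPED are the terminal values of the Status enum shown above; the helper name, attempt count, and delay are hypothetical defaults.

```php
<?php
// Hypothetical helper, not part of the SDK.
// Polls until the run reaches an assumed terminal status, then returns it.
function waitForColumnStatisticsTaskRun(
    object $client,
    string $runId,
    int $maxAttempts = 30,
    int $delaySeconds = 10
): string {
    $terminal = ['SUCCEEDED', 'FAILED', 'STOPPED'];

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $client->getColumnStatisticsTaskRun([
            'ColumnStatisticsTaskRunId' => $runId,
        ]);
        $status = $result['ColumnStatisticsTaskRun']['Status'] ?? 'UNKNOWN';

        if (in_array($status, $terminal, true)) {
            return $status;
        }

        // Wait before the next poll, but not after the final attempt.
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds);
        }
    }

    throw new RuntimeException("Task run {$runId} did not finish after {$maxAttempts} attempts.");
}
```

A fixed delay keeps the example short; a backoff schedule may be kinder to the service when many runs are being watched.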
GetColumnStatisticsTaskRuns
$result = $client->getColumnStatisticsTaskRuns([/* ... */]);
$promise = $client->getColumnStatisticsTaskRunsAsync([/* ... */]);
Retrieves information about all runs associated with the specified table.
Parameter Syntax
$result = $client->getColumnStatisticsTaskRuns([
    'DatabaseName' => '<string>', // REQUIRED
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'TableName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- DatabaseName
-
- Required: Yes
- Type: string
The name of the database where the table resides.
- MaxResults
-
- Type: int
The maximum size of the response.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
- TableName
-
- Required: Yes
- Type: string
The name of the table.
Result Syntax
[
    'ColumnStatisticsTaskRuns' => [
        [
            'CatalogID' => '<string>',
            'ColumnNameList' => ['<string>', ...],
            'ColumnStatisticsTaskRunId' => '<string>',
            'CreationTime' => <DateTime>,
            'CustomerId' => '<string>',
            'DPUSeconds' => <float>,
            'DatabaseName' => '<string>',
            'EndTime' => <DateTime>,
            'ErrorMessage' => '<string>',
            'LastUpdated' => <DateTime>,
            'NumberOfWorkers' => <integer>,
            'Role' => '<string>',
            'SampleSize' => <float>,
            'SecurityConfiguration' => '<string>',
            'StartTime' => <DateTime>,
            'Status' => 'STARTING|RUNNING|SUCCEEDED|FAILED|STOPPED',
            'TableName' => '<string>',
            'WorkerType' => '<string>',
        ],
        // ...
    ],
    'NextToken' => '<string>',
]
Result Details
Members
- ColumnStatisticsTaskRuns
-
- Type: Array of ColumnStatisticsTaskRun structures
A list of column statistics task runs.
- NextToken
-
- Type: string
A continuation token, if not all task runs have yet been returned.
Errors
- OperationTimeoutException:
The operation timed out.
GetConnection
$result = $client->getConnection([/* ... */]);
$promise = $client->getConnectionAsync([/* ... */]);
Retrieves a connection definition from the Data Catalog.
Parameter Syntax
$result = $client->getConnection([
    'CatalogId' => '<string>',
    'HidePassword' => true || false,
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the connection resides. If none is provided, the Amazon Web Services account ID is used by default.
- HidePassword
-
- Type: boolean
Allows you to retrieve the connection metadata without returning the password. For instance, the Glue console uses this flag to retrieve the connection, and does not display the password. Set this parameter when the caller might not have permission to use the KMS key to decrypt the password, but it does have permission to access the rest of the connection properties.
- Name
-
- Required: Yes
- Type: string
The name of the connection definition to retrieve.
Result Syntax
[ 'Connection' => [ 'AthenaProperties' => ['<string>', ...], 'AuthenticationConfiguration' => [ 'AuthenticationType' => 'BASIC|OAUTH2|CUSTOM', 'OAuth2Properties' => [ 'OAuth2ClientApplication' => [ 'AWSManagedClientApplicationReference' => '<string>', 'UserManagedClientApplicationClientId' => '<string>', ], 'OAuth2GrantType' => 'AUTHORIZATION_CODE|CLIENT_CREDENTIALS|JWT_BEARER', 'TokenUrl' => '<string>', 'TokenUrlParametersMap' => ['<string>', ...], ], 'SecretArn' => '<string>', ], 'ConnectionProperties' => ['<string>', ...], 'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM|SALESFORCE|VIEW_VALIDATION_REDSHIFT|VIEW_VALIDATION_ATHENA', 'CreationTime' => <DateTime>, 'Description' => '<string>', 'LastConnectionValidationTime' => <DateTime>, 'LastUpdatedBy' => '<string>', 'LastUpdatedTime' => <DateTime>, 'MatchCriteria' => ['<string>', ...], 'Name' => '<string>', 'PhysicalConnectionRequirements' => [ 'AvailabilityZone' => '<string>', 'SecurityGroupIdList' => ['<string>', ...], 'SubnetId' => '<string>', ], 'Status' => 'READY|IN_PROGRESS|FAILED', 'StatusReason' => '<string>', ], ]
Result Details
Members
- Connection
-
- Type: Connection structure
The requested connection definition.
Errors
- EntityNotFoundException:
A specified entity does not exist
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- GlueEncryptionException:
An encryption operation failed.
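The HidePassword and CatalogId members described above can be assembled as in the following minimal sketch. The helper name `buildGetConnectionParams` and its defaults are hypothetical and not part of the SDK; only the member names come from the parameter syntax above.

```php
<?php
/**
 * Builds the parameter array for getConnection().
 * Hypothetical helper: the name and the default of hiding the password
 * are choices made for this sketch, not SDK behavior.
 */
function buildGetConnectionParams(string $name, bool $hidePassword = true, ?string $catalogId = null): array
{
    $params = ['Name' => $name, 'HidePassword' => $hidePassword];
    // Send CatalogId only when one is given; otherwise the service
    // defaults to the caller's account ID.
    if ($catalogId !== null) {
        $params['CatalogId'] = $catalogId;
    }
    return $params;
}

$params = buildGetConnectionParams('my-jdbc-connection');
// With a configured client: $result = $client->getConnection($params);
```

Defaulting HidePassword to true suits callers that may lack permission to use the KMS key, since they only need the remaining connection properties.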
GetConnections
$result = $client->getConnections([/* ... */]);
$promise = $client->getConnectionsAsync([/* ... */]);
Retrieves a list of connection definitions from the Data Catalog.
Parameter Syntax
$result = $client->getConnections([ 'CatalogId' => '<string>', 'Filter' => [ 'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM|SALESFORCE|VIEW_VALIDATION_REDSHIFT|VIEW_VALIDATION_ATHENA', 'MatchCriteria' => ['<string>', ...], ], 'HidePassword' => true || false, 'MaxResults' => <integer>, 'NextToken' => '<string>', ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the connections reside. If none is provided, the Amazon Web Services account ID is used by default.
- Filter
-
- Type: GetConnectionsFilter structure
A filter that controls which connections are returned.
- HidePassword
-
- Type: boolean
Allows you to retrieve the connection metadata without returning the password. For instance, the Glue console uses this flag to retrieve the connection, and does not display the password. Set this parameter when the caller might not have permission to use the KMS key to decrypt the password, but it does have permission to access the rest of the connection properties.
- MaxResults
-
- Type: int
The maximum number of connections to return in one response.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
Result Syntax
[ 'ConnectionList' => [ [ 'AthenaProperties' => ['<string>', ...], 'AuthenticationConfiguration' => [ 'AuthenticationType' => 'BASIC|OAUTH2|CUSTOM', 'OAuth2Properties' => [ 'OAuth2ClientApplication' => [ 'AWSManagedClientApplicationReference' => '<string>', 'UserManagedClientApplicationClientId' => '<string>', ], 'OAuth2GrantType' => 'AUTHORIZATION_CODE|CLIENT_CREDENTIALS|JWT_BEARER', 'TokenUrl' => '<string>', 'TokenUrlParametersMap' => ['<string>', ...], ], 'SecretArn' => '<string>', ], 'ConnectionProperties' => ['<string>', ...], 'ConnectionType' => 'JDBC|SFTP|MONGODB|KAFKA|NETWORK|MARKETPLACE|CUSTOM|SALESFORCE|VIEW_VALIDATION_REDSHIFT|VIEW_VALIDATION_ATHENA', 'CreationTime' => <DateTime>, 'Description' => '<string>', 'LastConnectionValidationTime' => <DateTime>, 'LastUpdatedBy' => '<string>', 'LastUpdatedTime' => <DateTime>, 'MatchCriteria' => ['<string>', ...], 'Name' => '<string>', 'PhysicalConnectionRequirements' => [ 'AvailabilityZone' => '<string>', 'SecurityGroupIdList' => ['<string>', ...], 'SubnetId' => '<string>', ], 'Status' => 'READY|IN_PROGRESS|FAILED', 'StatusReason' => '<string>', ], // ... ], 'NextToken' => '<string>', ]
Result Details
Members
- ConnectionList
-
- Type: Array of Connection structures
A list of requested connection definitions.
- NextToken
-
- Type: string
A continuation token, if the list of connections returned does not include the last of the filtered connections.
Errors
- EntityNotFoundException:
A specified entity does not exist
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
- GlueEncryptionException:
An encryption operation failed.
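Because GetConnections returns a NextToken whenever more connections remain, a caller usually loops until the token is absent. The sketch below shows one way to do that. `collectAllPages` is a hypothetical helper, and the fake page data stands in for real responses so the example runs without network access.

```php
<?php
/**
 * Collects every item from a NextToken-paginated listing.
 *
 * $fetchPage is a hypothetical callable: it receives a parameter array and
 * returns one page (an array or ArrayAccess object) holding $listKey and,
 * when more pages remain, 'NextToken'.
 */
function collectAllPages(callable $fetchPage, string $listKey, array $params = []): array
{
    $items = [];
    do {
        $page = $fetchPage($params);
        foreach ($page[$listKey] ?? [] as $item) {
            $items[] = $item;
        }
        // Carry the continuation token into the next request, if any.
        $token = $page['NextToken'] ?? null;
        $params['NextToken'] = $token;
    } while ($token !== null && $token !== '');
    return $items;
}

// Stand-in for $client->getConnections(...): two fake pages keyed by token.
$fakePages = [
    ''   => ['ConnectionList' => [['Name' => 'a'], ['Name' => 'b']], 'NextToken' => 't1'],
    't1' => ['ConnectionList' => [['Name' => 'c']]],
];
$fetch = function (array $params) use ($fakePages): array {
    return $fakePages[$params['NextToken'] ?? ''];
};
$names = array_column(collectAllPages($fetch, 'ConnectionList', ['HidePassword' => true]), 'Name');
```

With a configured client, the callable could instead be `function (array $p) use ($client) { return $client->getConnections($p); }`. The same loop applies to the other operations on this page that accept and return NextToken.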
GetCrawler
$result = $client->getCrawler([/* ... */]);
$promise = $client->getCrawlerAsync([/* ... */]);
Retrieves metadata for a specified crawler.
Parameter Syntax
$result = $client->getCrawler([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the crawler to retrieve metadata for.
Result Syntax
[ 'Crawler' => [ 'Classifiers' => ['<string>', ...], 'Configuration' => '<string>', 'CrawlElapsedTime' => <integer>, 'CrawlerSecurityConfiguration' => '<string>', 'CreationTime' => <DateTime>, 'DatabaseName' => '<string>', 'Description' => '<string>', 'LakeFormationConfiguration' => [ 'AccountId' => '<string>', 'UseLakeFormationCredentials' => true || false, ], 'LastCrawl' => [ 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'MessagePrefix' => '<string>', 'StartTime' => <DateTime>, 'Status' => 'SUCCEEDED|CANCELLED|FAILED', ], 'LastUpdated' => <DateTime>, 'LineageConfiguration' => [ 'CrawlerLineageSettings' => 'ENABLE|DISABLE', ], 'Name' => '<string>', 'RecrawlPolicy' => [ 'RecrawlBehavior' => 'CRAWL_EVERYTHING|CRAWL_NEW_FOLDERS_ONLY|CRAWL_EVENT_MODE', ], 'Role' => '<string>', 'Schedule' => [ 'ScheduleExpression' => '<string>', 'State' => 'SCHEDULED|NOT_SCHEDULED|TRANSITIONING', ], 'SchemaChangePolicy' => [ 'DeleteBehavior' => 'LOG|DELETE_FROM_DATABASE|DEPRECATE_IN_DATABASE', 'UpdateBehavior' => 'LOG|UPDATE_IN_DATABASE', ], 'State' => 'READY|RUNNING|STOPPING', 'TablePrefix' => '<string>', 'Targets' => [ 'CatalogTargets' => [ [ 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Tables' => ['<string>', ...], ], // ... ], 'DeltaTargets' => [ [ 'ConnectionName' => '<string>', 'CreateNativeDeltaTable' => true || false, 'DeltaTables' => ['<string>', ...], 'WriteManifest' => true || false, ], // ... ], 'DynamoDBTargets' => [ [ 'Path' => '<string>', 'scanAll' => true || false, 'scanRate' => <float>, ], // ... ], 'HudiTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... ], 'IcebergTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... 
], 'JdbcTargets' => [ [ 'ConnectionName' => '<string>', 'EnableAdditionalMetadata' => ['<string>', ...], 'Exclusions' => ['<string>', ...], 'Path' => '<string>', ], // ... ], 'MongoDBTargets' => [ [ 'ConnectionName' => '<string>', 'Path' => '<string>', 'ScanAll' => true || false, ], // ... ], 'S3Targets' => [ [ 'ConnectionName' => '<string>', 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Exclusions' => ['<string>', ...], 'Path' => '<string>', 'SampleSize' => <integer>, ], // ... ], ], 'Version' => <integer>, ], ]
Result Details
Members
- Crawler
-
- Type: Crawler structure
The metadata for the specified crawler.
Errors
- EntityNotFoundException:
A specified entity does not exist
- OperationTimeoutException:
The operation timed out.
GetCrawlerMetrics
$result = $client->getCrawlerMetrics([/* ... */]);
$promise = $client->getCrawlerMetricsAsync([/* ... */]);
Retrieves metrics about specified crawlers.
Parameter Syntax
$result = $client->getCrawlerMetrics([ 'CrawlerNameList' => ['<string>', ...], 'MaxResults' => <integer>, 'NextToken' => '<string>', ]);
Parameter Details
Members
- CrawlerNameList
-
- Type: Array of strings
A list of the names of crawlers about which to retrieve metrics.
- MaxResults
-
- Type: int
The maximum size of a list to return.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
Result Syntax
[ 'CrawlerMetricsList' => [ [ 'CrawlerName' => '<string>', 'LastRuntimeSeconds' => <float>, 'MedianRuntimeSeconds' => <float>, 'StillEstimating' => true || false, 'TablesCreated' => <integer>, 'TablesDeleted' => <integer>, 'TablesUpdated' => <integer>, 'TimeLeftSeconds' => <float>, ], // ... ], 'NextToken' => '<string>', ]
Result Details
Members
- CrawlerMetricsList
-
- Type: Array of CrawlerMetrics structures
A list of metrics for the specified crawler.
- NextToken
-
- Type: string
A continuation token, if the returned list does not contain the last metric available.
Errors
- OperationTimeoutException:
The operation timed out.
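The CrawlerMetricsList shape shown above lends itself to simple aggregation. The following sketch totals table changes across crawlers and lists those whose StillEstimating flag is set. The helper name and the sample data are illustrative assumptions; only the member names come from the result syntax.

```php
<?php
/**
 * Aggregates a CrawlerMetricsList into table-change totals and lists
 * crawlers whose StillEstimating flag is set.
 * Hypothetical helper; missing counters are treated as 0.
 */
function summarizeCrawlerMetrics(array $metricsList): array
{
    $summary = ['created' => 0, 'updated' => 0, 'deleted' => 0, 'stillEstimating' => []];
    foreach ($metricsList as $metrics) {
        $summary['created'] += $metrics['TablesCreated'] ?? 0;
        $summary['updated'] += $metrics['TablesUpdated'] ?? 0;
        $summary['deleted'] += $metrics['TablesDeleted'] ?? 0;
        if (!empty($metrics['StillEstimating'])) {
            $summary['stillEstimating'][] = $metrics['CrawlerName'];
        }
    }
    return $summary;
}

// Hand-written sample data; real values would come from the
// 'CrawlerMetricsList' key of a getCrawlerMetrics() result.
$sample = [
    ['CrawlerName' => 'orders', 'TablesCreated' => 2, 'TablesUpdated' => 1, 'TablesDeleted' => 0, 'StillEstimating' => false],
    ['CrawlerName' => 'events', 'TablesCreated' => 0, 'TablesUpdated' => 4, 'TablesDeleted' => 1, 'StillEstimating' => true],
];
$summary = summarizeCrawlerMetrics($sample);
```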
GetCrawlers
$result = $client->getCrawlers([/* ... */]);
$promise = $client->getCrawlersAsync([/* ... */]);
Retrieves metadata for all crawlers defined in the customer account.
Parameter Syntax
$result = $client->getCrawlers([ 'MaxResults' => <integer>, 'NextToken' => '<string>', ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The number of crawlers to return on each call.
- NextToken
-
- Type: string
A continuation token, if this is a continuation request.
Result Syntax
[ 'Crawlers' => [ [ 'Classifiers' => ['<string>', ...], 'Configuration' => '<string>', 'CrawlElapsedTime' => <integer>, 'CrawlerSecurityConfiguration' => '<string>', 'CreationTime' => <DateTime>, 'DatabaseName' => '<string>', 'Description' => '<string>', 'LakeFormationConfiguration' => [ 'AccountId' => '<string>', 'UseLakeFormationCredentials' => true || false, ], 'LastCrawl' => [ 'ErrorMessage' => '<string>', 'LogGroup' => '<string>', 'LogStream' => '<string>', 'MessagePrefix' => '<string>', 'StartTime' => <DateTime>, 'Status' => 'SUCCEEDED|CANCELLED|FAILED', ], 'LastUpdated' => <DateTime>, 'LineageConfiguration' => [ 'CrawlerLineageSettings' => 'ENABLE|DISABLE', ], 'Name' => '<string>', 'RecrawlPolicy' => [ 'RecrawlBehavior' => 'CRAWL_EVERYTHING|CRAWL_NEW_FOLDERS_ONLY|CRAWL_EVENT_MODE', ], 'Role' => '<string>', 'Schedule' => [ 'ScheduleExpression' => '<string>', 'State' => 'SCHEDULED|NOT_SCHEDULED|TRANSITIONING', ], 'SchemaChangePolicy' => [ 'DeleteBehavior' => 'LOG|DELETE_FROM_DATABASE|DEPRECATE_IN_DATABASE', 'UpdateBehavior' => 'LOG|UPDATE_IN_DATABASE', ], 'State' => 'READY|RUNNING|STOPPING', 'TablePrefix' => '<string>', 'Targets' => [ 'CatalogTargets' => [ [ 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Tables' => ['<string>', ...], ], // ... ], 'DeltaTargets' => [ [ 'ConnectionName' => '<string>', 'CreateNativeDeltaTable' => true || false, 'DeltaTables' => ['<string>', ...], 'WriteManifest' => true || false, ], // ... ], 'DynamoDBTargets' => [ [ 'Path' => '<string>', 'scanAll' => true || false, 'scanRate' => <float>, ], // ... ], 'HudiTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... ], 'IcebergTargets' => [ [ 'ConnectionName' => '<string>', 'Exclusions' => ['<string>', ...], 'MaximumTraversalDepth' => <integer>, 'Paths' => ['<string>', ...], ], // ... 
], 'JdbcTargets' => [ [ 'ConnectionName' => '<string>', 'EnableAdditionalMetadata' => ['<string>', ...], 'Exclusions' => ['<string>', ...], 'Path' => '<string>', ], // ... ], 'MongoDBTargets' => [ [ 'ConnectionName' => '<string>', 'Path' => '<string>', 'ScanAll' => true || false, ], // ... ], 'S3Targets' => [ [ 'ConnectionName' => '<string>', 'DlqEventQueueArn' => '<string>', 'EventQueueArn' => '<string>', 'Exclusions' => ['<string>', ...], 'Path' => '<string>', 'SampleSize' => <integer>, ], // ... ], ], 'Version' => <integer>, ], // ... ], 'NextToken' => '<string>', ]
Result Details
Members
- Crawlers
-
- Type: Array of Crawler structures
A list of crawler metadata.
- NextToken
-
- Type: string
A continuation token, if the returned list has not reached the end of those defined in this customer account.
Errors
- OperationTimeoutException:
The operation timed out.
GetCustomEntityType
$result = $client->getCustomEntityType([/* ... */]);
$promise = $client->getCustomEntityTypeAsync([/* ... */]);
Retrieves the details of a custom pattern by specifying its name.
Parameter Syntax
$result = $client->getCustomEntityType([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the custom pattern that you want to retrieve.
Result Syntax
[ 'ContextWords' => ['<string>', ...], 'Name' => '<string>', 'RegexString' => '<string>', ]
Result Details
Members
- ContextWords
-
- Type: Array of strings
A list of context words, if specified when you created the custom pattern. If none of these context words are found within the vicinity of the regular expression, the data will not be detected as sensitive data.
- Name
-
- Type: string
The name of the custom pattern that you retrieved.
- RegexString
-
- Type: string
A regular expression string that is used for detecting sensitive data in a custom pattern.
Errors
- EntityNotFoundException:
A specified entity does not exist
- AccessDeniedException:
Access to a resource was denied.
- InternalServiceException:
An internal service error occurred.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
GetDataCatalogEncryptionSettings
$result = $client->getDataCatalogEncryptionSettings([/* ... */]);
$promise = $client->getDataCatalogEncryptionSettingsAsync([/* ... */]);
Retrieves the security configuration for a specified catalog.
Parameter Syntax
$result = $client->getDataCatalogEncryptionSettings([ 'CatalogId' => '<string>', ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog to retrieve the security configuration for. If none is provided, the Amazon Web Services account ID is used by default.
Result Syntax
[ 'DataCatalogEncryptionSettings' => [ 'ConnectionPasswordEncryption' => [ 'AwsKmsKeyId' => '<string>', 'ReturnConnectionPasswordEncrypted' => true || false, ], 'EncryptionAtRest' => [ 'CatalogEncryptionMode' => 'DISABLED|SSE-KMS|SSE-KMS-WITH-SERVICE-ROLE', 'CatalogEncryptionServiceRole' => '<string>', 'SseAwsKmsKeyId' => '<string>', ], ], ]
Result Details
Members
- DataCatalogEncryptionSettings
-
- Type: DataCatalogEncryptionSettings structure
The requested security configuration.
Errors
- InternalServiceException:
An internal service error occurred.
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
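As a rough illustration of reading the DataCatalogEncryptionSettings structure above, the sketch below reduces it to two booleans. The helper name is hypothetical, and treating a missing CatalogEncryptionMode as DISABLED is an assumption made for this example.

```php
<?php
/**
 * Reduces a DataCatalogEncryptionSettings array to two flags.
 * Hypothetical helper; a missing mode is assumed to mean DISABLED.
 */
function describeCatalogEncryption(array $settings): array
{
    $mode = $settings['EncryptionAtRest']['CatalogEncryptionMode'] ?? 'DISABLED';
    $returnEncrypted = $settings['ConnectionPasswordEncryption']['ReturnConnectionPasswordEncrypted'] ?? false;
    return [
        'atRestEnabled' => $mode !== 'DISABLED',
        'passwordsReturnedEncrypted' => (bool) $returnEncrypted,
    ];
}

// Hand-written sample; real data would come from the
// 'DataCatalogEncryptionSettings' key of the operation's result.
$flags = describeCatalogEncryption([
    'EncryptionAtRest' => ['CatalogEncryptionMode' => 'SSE-KMS', 'SseAwsKmsKeyId' => 'alias/example'],
    'ConnectionPasswordEncryption' => ['ReturnConnectionPasswordEncrypted' => false],
]);
```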
GetDataQualityModel
$result = $client->getDataQualityModel([/* ... */]);
$promise = $client->getDataQualityModelAsync([/* ... */]);
Retrieve the training status of the model along with more information (CompletedOn, StartedOn, FailureReason).
Parameter Syntax
$result = $client->getDataQualityModel([ 'ProfileId' => '<string>', // REQUIRED 'StatisticId' => '<string>', ]);
Parameter Details
Members
- ProfileId
-
- Required: Yes
- Type: string
The Profile ID.
- StatisticId
-
- Type: string
The Statistic ID.
Result Syntax
[ 'CompletedOn' => <DateTime>, 'FailureReason' => '<string>', 'StartedOn' => <DateTime>, 'Status' => 'RUNNING|SUCCEEDED|FAILED', ]
Result Details
Members
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp when the data quality model training completed.
- FailureReason
-
- Type: string
The training failure reason.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp when the data quality model training started.
- Status
-
- Type: string
The training status of the data quality model.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
GetDataQualityModelResult
$result = $client->getDataQualityModelResult([/* ... */]);
$promise = $client->getDataQualityModelResultAsync([/* ... */]);
Retrieve a statistic's predictions for a given Profile ID.
Parameter Syntax
$result = $client->getDataQualityModelResult([ 'ProfileId' => '<string>', // REQUIRED 'StatisticId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- ProfileId
-
- Required: Yes
- Type: string
The Profile ID.
- StatisticId
-
- Required: Yes
- Type: string
The Statistic ID.
Result Syntax
[ 'CompletedOn' => <DateTime>, 'Model' => [ [ 'ActualValue' => <float>, 'Date' => <DateTime>, 'InclusionAnnotation' => 'INCLUDE|EXCLUDE', 'LowerBound' => <float>, 'PredictedValue' => <float>, 'UpperBound' => <float>, ], // ... ], ]
Result Details
Members
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The timestamp when the data quality model training completed.
- Model
-
- Type: Array of StatisticModelResult structures
A list of StatisticModelResult structures.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
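One plausible use of the Model list is to flag points whose ActualValue falls outside the predicted range. The sketch below does this under the assumption that LowerBound and UpperBound form an inclusive expected range. The helper name and sample values are illustrative only.

```php
<?php
/**
 * Returns the entries of a 'Model' list whose ActualValue lies outside
 * [LowerBound, UpperBound]. Treating the bounds as an inclusive expected
 * range is an assumption made for this sketch.
 */
function findOutOfBoundsPoints(array $model): array
{
    return array_values(array_filter($model, function (array $point): bool {
        if (!isset($point['ActualValue'], $point['LowerBound'], $point['UpperBound'])) {
            return false; // Skip points that lack a value or a bound.
        }
        return $point['ActualValue'] < $point['LowerBound']
            || $point['ActualValue'] > $point['UpperBound'];
    }));
}

// Hand-written sample; real data would come from the 'Model' key of a
// getDataQualityModelResult() result.
$model = [
    ['ActualValue' => 10.0, 'LowerBound' => 8.0, 'UpperBound' => 12.0],
    ['ActualValue' => 20.0, 'LowerBound' => 8.0, 'UpperBound' => 12.0],
    ['PredictedValue' => 9.0, 'LowerBound' => 8.0, 'UpperBound' => 12.0],
];
$outliers = findOutOfBoundsPoints($model);
```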
GetDataQualityResult
$result = $client->getDataQualityResult([/* ... */]);
$promise = $client->getDataQualityResultAsync([/* ... */]);
Retrieves the result of a data quality rule evaluation.
Parameter Syntax
$result = $client->getDataQualityResult([ 'ResultId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- ResultId
-
- Required: Yes
- Type: string
A unique result ID for the data quality result.
Result Syntax
[ 'AnalyzerResults' => [ [ 'Description' => '<string>', 'EvaluatedMetrics' => [<float>, ...], 'EvaluationMessage' => '<string>', 'Name' => '<string>', ], // ... ], 'CompletedOn' => <DateTime>, 'DataSource' => [ 'GlueTable' => [ 'AdditionalOptions' => ['<string>', ...], 'CatalogId' => '<string>', 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], ], 'EvaluationContext' => '<string>', 'JobName' => '<string>', 'JobRunId' => '<string>', 'Observations' => [ [ 'Description' => '<string>', 'MetricBasedObservation' => [ 'MetricName' => '<string>', 'MetricValues' => [ 'ActualValue' => <float>, 'ExpectedValue' => <float>, 'LowerLimit' => <float>, 'UpperLimit' => <float>, ], 'NewRules' => ['<string>', ...], 'StatisticId' => '<string>', ], ], // ... ], 'ProfileId' => '<string>', 'ResultId' => '<string>', 'RuleResults' => [ [ 'Description' => '<string>', 'EvaluatedMetrics' => [<float>, ...], 'EvaluatedRule' => '<string>', 'EvaluationMessage' => '<string>', 'Name' => '<string>', 'Result' => 'PASS|FAIL|ERROR', ], // ... ], 'RulesetEvaluationRunId' => '<string>', 'RulesetName' => '<string>', 'Score' => <float>, 'StartedOn' => <DateTime>, ]
Result Details
Members
- AnalyzerResults
-
- Type: Array of DataQualityAnalyzerResult structures
A list of DataQualityAnalyzerResult objects representing the results for each analyzer.
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the run for this data quality result was completed.
- DataSource
-
- Type: DataSource structure
The table associated with the data quality result, if any.
- EvaluationContext
-
- Type: string
In the context of a job in Glue Studio, each node in the canvas is typically assigned a name, and data quality nodes will have names. When there are multiple nodes, the evaluationContext can differentiate the nodes.
- JobName
-
- Type: string
The job name associated with the data quality result, if any.
- JobRunId
-
- Type: string
The job run ID associated with the data quality result, if any.
- Observations
-
- Type: Array of DataQualityObservation structures
A list of DataQualityObservation objects representing the observations generated after evaluating the rules and analyzers.
- ProfileId
-
- Type: string
The Profile ID for the data quality result.
- ResultId
-
- Type: string
A unique result ID for the data quality result.
- RuleResults
-
- Type: Array of DataQualityRuleResult structures
A list of DataQualityRuleResult objects representing the results for each rule.
- RulesetEvaluationRunId
-
- Type: string
The unique run ID associated with the ruleset evaluation.
- RulesetName
-
- Type: string
The name of the ruleset associated with the data quality result.
- Score
-
- Type: double
An aggregate data quality score. Represents the ratio of rules that passed to the total number of rules.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the run for this data quality result started.
Errors
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
- EntityNotFoundException:
A specified entity does not exist
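Since Score is described above as the ratio of passed rules to total rules, the same figure can be recomputed from RuleResults, and the failing rules listed. The sketch below illustrates this. The helper names and sample rules are hypothetical, and counting both FAIL and ERROR as "not passed" is an assumption.

```php
<?php
/**
 * Lists the names of rules whose Result is anything other than PASS
 * (so both FAIL and ERROR count as not passed in this sketch).
 */
function failedRuleNames(array $ruleResults): array
{
    $names = [];
    foreach ($ruleResults as $rule) {
        if (($rule['Result'] ?? null) !== 'PASS') {
            $names[] = $rule['Name'] ?? '(unnamed)';
        }
    }
    return $names;
}

/**
 * Recomputes the pass ratio: passed rules divided by total rules.
 * Returns null for an empty list to avoid dividing by zero.
 */
function passRatio(array $ruleResults): ?float
{
    $total = count($ruleResults);
    if ($total === 0) {
        return null;
    }
    $passed = $total - count(failedRuleNames($ruleResults));
    return $passed / $total;
}

// Hand-written sample; real data would come from the 'RuleResults' key of
// a getDataQualityResult() result.
$rules = [
    ['Name' => 'Rule_1', 'Result' => 'PASS'],
    ['Name' => 'Rule_2', 'Result' => 'FAIL'],
    ['Name' => 'Rule_3', 'Result' => 'PASS'],
    ['Name' => 'Rule_4', 'Result' => 'PASS'],
];
$failed = failedRuleNames($rules);
$ratio = passRatio($rules);
```

Here three of four rules pass, so the recomputed ratio is 3 / 4 = 0.75.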
GetDataQualityRuleRecommendationRun
$result = $client->getDataQualityRuleRecommendationRun([/* ... */]);
$promise = $client->getDataQualityRuleRecommendationRunAsync([/* ... */]);
Gets the specified recommendation run that was used to generate rules.
Parameter Syntax
$result = $client->getDataQualityRuleRecommendationRun([ 'RunId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- RunId
-
- Required: Yes
- Type: string
The unique run identifier associated with this run.
Result Syntax
[ 'CompletedOn' => <DateTime>, 'CreatedRulesetName' => '<string>', 'DataQualitySecurityConfiguration' => '<string>', 'DataSource' => [ 'GlueTable' => [ 'AdditionalOptions' => ['<string>', ...], 'CatalogId' => '<string>', 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], ], 'ErrorString' => '<string>', 'ExecutionTime' => <integer>, 'LastModifiedOn' => <DateTime>, 'NumberOfWorkers' => <integer>, 'RecommendedRuleset' => '<string>', 'Role' => '<string>', 'RunId' => '<string>', 'StartedOn' => <DateTime>, 'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT', 'Timeout' => <integer>, ]
Result Details
Members
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when this run was completed.
- CreatedRulesetName
-
- Type: string
The name of the ruleset that was created by the run.
- DataQualitySecurityConfiguration
-
- Type: string
The name of the security configuration created with the data quality encryption option.
- DataSource
-
- Type: DataSource structure
The data source (a Glue table) associated with this run.
- ErrorString
-
- Type: string
The error strings that are associated with the run.
- ExecutionTime
-
- Type: int
The amount of time (in seconds) that the run consumed resources.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
A timestamp. The last point in time when this data quality rule recommendation run was modified.
- NumberOfWorkers
-
- Type: int
The number of G.1X workers to be used in the run. The default is 5.
- RecommendedRuleset
-
- Type: string
When a start rule recommendation run completes, it creates a recommended ruleset (a set of rules). This member has those rules in Data Quality Definition Language (DQDL) format.
- Role
-
- Type: string
An IAM role supplied to encrypt the results of the run.
- RunId
-
- Type: string
The unique run identifier associated with this run.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when this run started.
- Status
-
- Type: string
The status for this run.
- Timeout
-
- Type: int
The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
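A run's Status moves through the values listed in the result syntax, so callers often poll until it settles. The sketch below shows one way to do that. `waitForRun` is a hypothetical helper, and the set of statuses treated as terminal (STOPPED, SUCCEEDED, FAILED, TIMEOUT) is an assumption inferred from their names. A fake fetcher replaces the real client so the example runs offline.

```php
<?php
// Assumed terminal statuses; not stated explicitly by the reference above.
const TERMINAL_RUN_STATUSES = ['STOPPED', 'SUCCEEDED', 'FAILED', 'TIMEOUT'];

/**
 * Calls $getRun repeatedly until the returned run has a terminal Status.
 * $getRun is a hypothetical callable returning the run as an array or
 * ArrayAccess object. Throws if the run is still active after $maxAttempts.
 */
function waitForRun(callable $getRun, int $maxAttempts = 10, int $delaySeconds = 0)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $run = $getRun();
        if (in_array($run['Status'] ?? '', TERMINAL_RUN_STATUSES, true)) {
            return $run;
        }
        if ($delaySeconds > 0) {
            sleep($delaySeconds); // Pause between polls.
        }
    }
    throw new RuntimeException("Run did not finish within {$maxAttempts} attempts");
}

// Fake fetcher that walks through three statuses, one per call.
$statuses = ['STARTING', 'RUNNING', 'SUCCEEDED'];
$calls = 0;
$fakeGetRun = function () use ($statuses, &$calls): array {
    $index = min($calls, count($statuses) - 1);
    $calls++;
    return ['RunId' => 'dqrun-example', 'Status' => $statuses[$index]];
};
$finished = waitForRun($fakeGetRun);
```

With a configured client, `$getRun` could wrap `$client->getDataQualityRuleRecommendationRun(['RunId' => $runId])`, with a non-zero delay between polls.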
GetDataQualityRuleset
$result = $client->getDataQualityRuleset([/* ... */]);
$promise = $client->getDataQualityRulesetAsync([/* ... */]);
Returns an existing ruleset by identifier or name.
Parameter Syntax
$result = $client->getDataQualityRuleset([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the ruleset.
Result Syntax
[ 'CreatedOn' => <DateTime>, 'DataQualitySecurityConfiguration' => '<string>', 'Description' => '<string>', 'LastModifiedOn' => <DateTime>, 'Name' => '<string>', 'RecommendationRunId' => '<string>', 'Ruleset' => '<string>', 'TargetTable' => [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], ]
Result Details
Members
- CreatedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
A timestamp. The time and date that this data quality ruleset was created.
- DataQualitySecurityConfiguration
-
- Type: string
The name of the security configuration created with the data quality encryption option.
- Description
-
- Type: string
A description of the ruleset.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
A timestamp. The last point in time when this data quality ruleset was modified.
- Name
-
- Type: string
The name of the ruleset.
- RecommendationRunId
-
- Type: string
When a ruleset was created from a recommendation run, this run ID is generated to link the two together.
- Ruleset
-
- Type: string
A Data Quality Definition Language (DQDL) ruleset. For more information, see the Glue developer guide.
- TargetTable
-
- Type: DataQualityTargetTable structure
The name and database name of the target table.
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
GetDataQualityRulesetEvaluationRun
$result = $client->getDataQualityRulesetEvaluationRun([/* ... */]);
$promise = $client->getDataQualityRulesetEvaluationRunAsync([/* ... */]);
Retrieves a specific run where a ruleset is evaluated against a data source.
Parameter Syntax
$result = $client->getDataQualityRulesetEvaluationRun([ 'RunId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- RunId
-
- Required: Yes
- Type: string
The unique run identifier associated with this run.
Result Syntax
[ 'AdditionalDataSources' => [ '<NameString>' => [ 'GlueTable' => [ 'AdditionalOptions' => ['<string>', ...], 'CatalogId' => '<string>', 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], ], // ... ], 'AdditionalRunOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'CompositeRuleEvaluationMethod' => 'COLUMN|ROW', 'ResultsS3Prefix' => '<string>', ], 'CompletedOn' => <DateTime>, 'DataSource' => [ 'GlueTable' => [ 'AdditionalOptions' => ['<string>', ...], 'CatalogId' => '<string>', 'ConnectionName' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', ], ], 'ErrorString' => '<string>', 'ExecutionTime' => <integer>, 'LastModifiedOn' => <DateTime>, 'NumberOfWorkers' => <integer>, 'ResultIds' => ['<string>', ...], 'Role' => '<string>', 'RulesetNames' => ['<string>', ...], 'RunId' => '<string>', 'StartedOn' => <DateTime>, 'Status' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT', 'Timeout' => <integer>, ]
Result Details
Members
- AdditionalDataSources
-
- Type: Associative array of custom strings keys (NameString) to DataSource structures
A map of reference strings to additional data sources you can specify for an evaluation run.
- AdditionalRunOptions
-
- Type: DataQualityEvaluationRunAdditionalRunOptions structure
Additional run options you can specify for an evaluation run.
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when this run was completed.
- DataSource
-
- Type: DataSource structure
The data source (a Glue table) associated with this evaluation run.
- ErrorString
-
- Type: string
The error strings that are associated with the run.
- ExecutionTime
-
- Type: int
The amount of time (in seconds) that the run consumed resources.
- LastModifiedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
A timestamp. The last point in time when this data quality ruleset evaluation run was modified.
- NumberOfWorkers
-
- Type: int
The number of G.1X workers to be used in the run. The default is 5.
- ResultIds
-
- Type: Array of strings
A list of result IDs for the data quality results for the run.
- Role
-
- Type: string
An IAM role supplied to encrypt the results of the run.
- RulesetNames
-
- Type: Array of strings
A list of ruleset names for the run. Currently, this parameter takes only one Ruleset name.
- RunId
-
- Type: string
The unique run identifier associated with this run.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when this run started.
- Status
-
- Type: string
The status for this run.
- Timeout
-
- Type: int
The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
Errors
- EntityNotFoundException:
A specified entity does not exist
- InvalidInputException:
The input provided was not valid.
- OperationTimeoutException:
The operation timed out.
- InternalServiceException:
An internal service error occurred.
GetDatabase
$result = $client->getDatabase([/* ... */]);
$promise = $client->getDatabaseAsync([/* ... */]);
Retrieves the definition of a specified database.
Parameter Syntax
$result = $client->getDatabase([ 'CatalogId' => '<string>', 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CatalogId
-
- Type: string
The ID of the Data Catalog in which the database resides. If none is provided, the Amazon Web Services account ID is used by default.
- Name
-
- Required: Yes
- Type: string
The name of the database to retrieve. For Hive compatibility, this should be all lowercase.
Result Syntax
[ 'Database' => [ 'CatalogId' => '<string>', 'CreateTableDefaultPermissions' => [ [ 'Permissions' => ['<string>', ...], 'Principal' => [ 'DataLakePrincipalIdentifier' => '<string>', ], ], // ... ], 'CreateTime' => <DateTime>, 'Description' => '<string>', 'FederatedDatabase' => [ 'ConnectionName' => '<string>', 'Identifier' => '<string>', ], 'LocationUri' => '<string>', 'Name' => '<string>', 'Parameters' => ['<string>', ...], 'TargetDatabase' => [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'Region' => '<string>', ], ], ]
Result Details
Members
- Database
-
- Type: Database structure
The definition of the specified database in the Data Catalog.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
- FederationSourceException:
A federation source failed.
GetDatabases
$result = $client->getDatabases([/* ... */]);
$promise = $client->getDatabasesAsync([/* ... */]);
Retrieves all databases defined in a given Data Catalog.
Parameter Syntax
$result = $client->getDatabases([
    'AttributesToGet' => ['<string>', ...],
    'CatalogId' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'ResourceShareType' => 'FOREIGN|ALL|FEDERATED',
]);
Parameter Details
Members
- AttributesToGet
-
- Type: Array of strings
Specifies the database fields returned by the GetDatabases call. This parameter doesn’t accept an empty list. The request must include NAME.
- CatalogId
-
- Type: string
The ID of the Data Catalog from which to retrieve Databases. If none is provided, the Amazon Web Services account ID is used by default.
- MaxResults
-
- Type: int
The maximum number of databases to return in one response.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
- ResourceShareType
-
- Type: string
Allows you to specify that you want to list the databases shared with your account. The allowable values are FEDERATED, FOREIGN, or ALL.
- If set to FEDERATED, lists the federated databases (referencing an external entity) shared with your account.
- If set to FOREIGN, lists the databases shared with your account.
- If set to ALL, lists the databases shared with your account, as well as the databases in your local account.
Result Syntax
[
    'DatabaseList' => [
        [
            'CatalogId' => '<string>',
            'CreateTableDefaultPermissions' => [
                [
                    'Permissions' => ['<string>', ...],
                    'Principal' => [
                        'DataLakePrincipalIdentifier' => '<string>',
                    ],
                ],
                // ...
            ],
            'CreateTime' => <DateTime>,
            'Description' => '<string>',
            'FederatedDatabase' => [
                'ConnectionName' => '<string>',
                'Identifier' => '<string>',
            ],
            'LocationUri' => '<string>',
            'Name' => '<string>',
            'Parameters' => ['<string>', ...],
            'TargetDatabase' => [
                'CatalogId' => '<string>',
                'DatabaseName' => '<string>',
                'Region' => '<string>',
            ],
        ],
        // ...
    ],
    'NextToken' => '<string>',
]
Result Details
Members
- DatabaseList
-
- Required: Yes
- Type: Array of Database structures
A list of Database objects from the specified catalog.
- NextToken
-
- Type: string
A continuation token for paginating the returned list of databases, returned if the current segment of the list is not the last.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- GlueEncryptionException:
An encryption operation failed.
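The NextToken handshake described above (send the token from the previous response until none is returned) can be sketched as a generator. This is a minimal sketch under stated assumptions: iterateDatabases and the $fetchPage callable are hypothetical names introduced here, and the callable is assumed to return one response that can be read with array syntax.

```php
<?php
declare(strict_types=1);

/**
 * Yields every entry of DatabaseList across all pages of GetDatabases.
 *
 * $fetchPage is a hypothetical callable: it receives the request
 * parameters and returns one response (an array or array-like object).
 */
function iterateDatabases(callable $fetchPage, array $params = []): Generator
{
    // Always start from the first page, whatever the caller passed in.
    unset($params['NextToken']);

    do {
        $page = $fetchPage($params);

        foreach ($page['DatabaseList'] ?? [] as $database) {
            yield $database;
        }

        // A missing or empty token means this was the last segment.
        $token = $page['NextToken'] ?? null;
        $params['NextToken'] = $token;
    } while ($token !== null && $token !== '');
}
```

With a real client, the callable could be as small as fn (array $p) => $client->getDatabases($p). The SDK's own getPaginator() method may be a simpler alternative where a paginator is defined for the operation.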
GetDataflowGraph
$result = $client->getDataflowGraph([/* ... */]);
$promise = $client->getDataflowGraphAsync([/* ... */]);
Transforms a Python script into a directed acyclic graph (DAG).
Parameter Syntax
$result = $client->getDataflowGraph([
    'PythonScript' => '<string>',
]);
Parameter Details
Members
- PythonScript
-
- Type: string
The Python script to transform.
Result Syntax
[
    'DagEdges' => [
        [
            'Source' => '<string>',
            'Target' => '<string>',
            'TargetParameter' => '<string>',
        ],
        // ...
    ],
    'DagNodes' => [
        [
            'Args' => [
                [
                    'Name' => '<string>',
                    'Param' => true || false,
                    'Value' => '<string>',
                ],
                // ...
            ],
            'Id' => '<string>',
            'LineNumber' => <integer>,
            'NodeType' => '<string>',
        ],
        // ...
    ],
]
Result Details
Members
- DagEdges
-
- Type: Array of CodeGenEdge structures
A list of the edges in the resulting DAG.
- DagNodes
-
- Type: Array of CodeGenNode structures
A list of the nodes in the resulting DAG.
Errors
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
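Because the result is a flat pair of lists, it can help to fold DagNodes and DagEdges into an adjacency list before walking the graph. The following is a minimal sketch, assuming that each edge's Source and Target hold the Id of a node; dagAdjacency is a hypothetical helper name, not an SDK function.

```php
<?php
declare(strict_types=1);

/**
 * Converts a GetDataflowGraph-shaped result into an adjacency list:
 * node Id => list of Ids that the node points to.
 * Assumption: edge Source and Target values are node Ids.
 */
function dagAdjacency(array $result): array
{
    $adjacency = [];

    // Register every node first so that leaf nodes appear with no targets.
    foreach ($result['DagNodes'] ?? [] as $node) {
        $adjacency[$node['Id']] = [];
    }

    // Append each edge's target under its source node.
    foreach ($result['DagEdges'] ?? [] as $edge) {
        $adjacency[$edge['Source']][] = $edge['Target'];
    }

    return $adjacency;
}
```

The helper expects a plain array, so a result object returned by the client would need to be converted first, for example with its toArray() method.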
GetDevEndpoint
$result = $client->getDevEndpoint([/* ... */]);
$promise = $client->getDevEndpointAsync([/* ... */]);
Retrieves information about a specified development endpoint.
When you create a development endpoint in a virtual private cloud (VPC), Glue returns only a private IP address, and the public IP address field is not populated. When you create a non-VPC development endpoint, Glue returns only a public IP address.
Parameter Syntax
$result = $client->getDevEndpoint([
    'EndpointName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- EndpointName
-
- Required: Yes
- Type: string
Name of the DevEndpoint to retrieve information for.
Result Syntax
[
    'DevEndpoint' => [
        'Arguments' => ['<string>', ...],
        'AvailabilityZone' => '<string>',
        'CreatedTimestamp' => <DateTime>,
        'EndpointName' => '<string>',
        'ExtraJarsS3Path' => '<string>',
        'ExtraPythonLibsS3Path' => '<string>',
        'FailureReason' => '<string>',
        'GlueVersion' => '<string>',
        'LastModifiedTimestamp' => <DateTime>,
        'LastUpdateStatus' => '<string>',
        'NumberOfNodes' => <integer>,
        'NumberOfWorkers' => <integer>,
        'PrivateAddress' => '<string>',
        'PublicAddress' => '<string>',
        'PublicKey' => '<string>',
        'PublicKeys' => ['<string>', ...],
        'RoleArn' => '<string>',
        'SecurityConfiguration' => '<string>',
        'SecurityGroupIds' => ['<string>', ...],
        'Status' => '<string>',
        'SubnetId' => '<string>',
        'VpcId' => '<string>',
        'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
        'YarnEndpointAddress' => '<string>',
        'ZeppelinRemoteSparkInterpreterPort' => <integer>,
    ],
]
Result Details
Members
- DevEndpoint
-
- Type: DevEndpoint structure
A DevEndpoint definition.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
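Since a VPC endpoint carries only a private address and a non-VPC endpoint only a public one, code that needs "the" address of an endpoint can simply take whichever field is populated. The following is a minimal sketch of that rule; endpointAddress is a hypothetical helper name, and treating an empty string the same as a missing field is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Returns the one address populated on a DevEndpoint structure,
 * or null when neither field is set.
 * Assumption: an empty string means "not populated".
 */
function endpointAddress(array $devEndpoint): ?string
{
    foreach (['PublicAddress', 'PrivateAddress'] as $field) {
        $value = $devEndpoint[$field] ?? '';
        if ($value !== '') {
            return $value;
        }
    }

    return null;
}
```

The argument is the inner DevEndpoint structure of the result, not the whole response.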
GetDevEndpoints
$result = $client->getDevEndpoints([/* ... */]);
$promise = $client->getDevEndpointsAsync([/* ... */]);
Retrieves all the development endpoints in this Amazon Web Services account.
When you create a development endpoint in a virtual private cloud (VPC), Glue returns only a private IP address, and the public IP address field is not populated. When you create a non-VPC development endpoint, Glue returns only a public IP address.
Parameter Syntax
$result = $client->getDevEndpoints([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results to return in one response.
- NextToken
-
- Type: string
A continuation token, if this is a continuation call.
Result Syntax
[
    'DevEndpoints' => [
        [
            'Arguments' => ['<string>', ...],
            'AvailabilityZone' => '<string>',
            'CreatedTimestamp' => <DateTime>,
            'EndpointName' => '<string>',
            'ExtraJarsS3Path' => '<string>',
            'ExtraPythonLibsS3Path' => '<string>',
            'FailureReason' => '<string>',
            'GlueVersion' => '<string>',
            'LastModifiedTimestamp' => <DateTime>,
            'LastUpdateStatus' => '<string>',
            'NumberOfNodes' => <integer>,
            'NumberOfWorkers' => <integer>,
            'PrivateAddress' => '<string>',
            'PublicAddress' => '<string>',
            'PublicKey' => '<string>',
            'PublicKeys' => ['<string>', ...],
            'RoleArn' => '<string>',
            'SecurityConfiguration' => '<string>',
            'SecurityGroupIds' => ['<string>', ...],
            'Status' => '<string>',
            'SubnetId' => '<string>',
            'VpcId' => '<string>',
            'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
            'YarnEndpointAddress' => '<string>',
            'ZeppelinRemoteSparkInterpreterPort' => <integer>,
        ],
        // ...
    ],
    'NextToken' => '<string>',
]
Result Details
Members
- DevEndpoints
-
- Type: Array of DevEndpoint structures
A list of DevEndpoint definitions.
- NextToken
-
- Type: string
A continuation token, if not all DevEndpoint definitions have yet been returned.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- InvalidInputException:
The input provided was not valid.
GetJob
$result = $client->getJob([/* ... */]);
$promise = $client->getJobAsync([/* ... */]);
Retrieves an existing job definition.
Parameter Syntax
$result = $client->getJob([
    'JobName' => '<string>', // REQUIRED
]);
Parameter Details
Members
- JobName
-
- Required: Yes
- Type: string
The name of the job definition to retrieve.
Result Syntax
[ 'Job' => [ 'AllocatedCapacity' => <integer>, 'CodeGenConfigurationNodes' => [ '<NodeId>' => [ 'Aggregate' => [ 'Aggs' => [ [ 'AggFunc' => 'avg|countDistinct|count|first|last|kurtosis|max|min|skewness|stddev_samp|stddev_pop|sum|sumDistinct|var_samp|var_pop', 'Column' => ['<string>', ...], ], // ... ], 'Groups' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'AmazonRedshiftSource' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', ], 'AmazonRedshiftTarget' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'ApplyMapping' => [ 'Inputs' => ['<string>', ...], 'Mapping' => [ [ 'Children' => [...], // RECURSIVE 'Dropped' => true || false, 'FromPath' => ['<string>', ...], 'FromType' => '<string>', 'ToKey' => '<string>', 'ToType' => '<string>', ], // ... 
], 'Name' => '<string>', ], 'AthenaConnectorSource' => [ 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'SchemaName' => '<string>', ], 'CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'CatalogKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'Table' => '<string>', 'WindowSize' => <integer>, ], 'CatalogKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' 
=> true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <DateTime>, 'StreamArn' => '<string>', 'StreamName' => '<string>', ], 'Table' => '<string>', 'WindowSize' => <integer>, ], 'CatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'CatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Table' => '<string>', ], 'ConnectorDataSource' => [ 'ConnectionType' => '<string>', 'Data' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'ConnectorDataTarget' => [ 'ConnectionType' => '<string>', 'Data' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'CustomCode' => [ 'ClassName' => '<string>', 'Code' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... 
], ], 'DirectJDBCSource' => [ 'ConnectionName' => '<string>', 'ConnectionType' => 'sqlserver|mysql|oracle|postgresql|redshift', 'Database' => '<string>', 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', ], 'DirectKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'WindowSize' => <integer>, ], 'DirectKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' => true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <DateTime>, 'StreamArn' => '<string>', 'StreamName' => 
'<string>', ], 'WindowSize' => <integer>, ], 'DropDuplicates' => [ 'Columns' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'DropFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... ], ], 'DropNullFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'NullCheckBoxList' => [ 'IsEmpty' => true || false, 'IsNegOne' => true || false, 'IsNullString' => true || false, ], 'NullTextList' => [ [ 'Datatype' => [ 'Id' => '<string>', 'Label' => '<string>', ], 'Value' => '<string>', ], // ... ], ], 'DynamicTransform' => [ 'FunctionName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Parameters' => [ [ 'IsOptional' => true || false, 'ListType' => 'str|int|float|complex|bool|list|null', 'Name' => '<string>', 'Type' => 'str|int|float|complex|bool|list|null', 'ValidationMessage' => '<string>', 'ValidationRule' => '<string>', 'Value' => ['<string>', ...], ], // ... 
], 'Path' => '<string>', 'TransformName' => '<string>', 'Version' => '<string>', ], 'DynamoDBCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'EvaluateDataQuality' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Output' => 'PrimaryInput|EvaluationResults', 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'EvaluateDataQualityMultiFrame' => [ 'AdditionalDataSources' => ['<string>', ...], 'AdditionalOptions' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PublishingOptions' => [ 'CloudWatchMetricsEnabled' => true || false, 'EvaluationContext' => '<string>', 'ResultsPublishingEnabled' => true || false, 'ResultsS3Prefix' => '<string>', ], 'Ruleset' => '<string>', 'StopJobOnFailureOptions' => [ 'StopJobOnFailureTiming' => 'Immediate|AfterDataLoad', ], ], 'FillMissingValues' => [ 'FilledPath' => '<string>', 'ImputedPath' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'Filter' => [ 'Filters' => [ [ 'Negated' => true || false, 'Operation' => 'EQ|LT|GT|LTE|GTE|REGEX|ISNULL', 'Values' => [ [ 'Type' => 'COLUMNEXTRACTED|CONSTANT', 'Value' => ['<string>', ...], ], // ... ], ], // ... ], 'Inputs' => ['<string>', ...], 'LogicalOperator' => 'AND|OR', 'Name' => '<string>', ], 'GovernedCatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', 'Name' => '<string>', 'PartitionPredicate' => '<string>', 'Table' => '<string>', ], 'GovernedCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'JDBCConnectorSource' => [ 'AdditionalOptions' => [ 'DataTypeMapping' => ['<string>', ...], 'FilterPredicate' => '<string>', 'JobBookmarkKeys' => ['<string>', ...], 'JobBookmarkKeysSortOrder' => '<string>', 'LowerBound' => <integer>, 'NumPartitions' => <integer>, 'PartitionColumn' => '<string>', 'UpperBound' => <integer>, ], 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Query' => '<string>', ], 'JDBCConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'Join' => [ 'Columns' => [ [ 'From' => '<string>', 'Keys' => [ ['<string>', ...], // ... ], ], // ... ], 'Inputs' => ['<string>', ...], 'JoinType' => 'equijoin|left|right|outer|leftsemi|leftanti', 'Name' => '<string>', ], 'Merge' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PrimaryKeys' => [ ['<string>', ...], // ... 
], 'Source' => '<string>', ], 'MicrosoftSQLServerCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'MicrosoftSQLServerCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'MySQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'MySQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'OracleSQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'OracleSQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'PIIDetection' => [ 'EntityTypesToDetect' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'MaskValue' => '<string>', 'Name' => '<string>', 'OutputColumnName' => '<string>', 'PiiType' => 'RowAudit|RowMasking|ColumnAudit|ColumnMasking', 'SampleFraction' => <float>, 'ThresholdFraction' => <float>, ], 'PostgreSQLCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'PostgreSQLCatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Table' => '<string>', ], 'Recipe' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'RecipeReference' => [ 'RecipeArn' => '<string>', 'RecipeVersion' => '<string>', ], 'RecipeSteps' => [ [ 'Action' => [ 'Operation' => '<string>', 'Parameters' => ['<string>', ...], ], 'ConditionExpressions' => [ [ 'Condition' => '<string>', 'TargetColumn' => '<string>', 'Value' => '<string>', ], // ... ], ], // ... 
], ], 'RedshiftSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', 'TmpDirIAMRole' => '<string>', ], 'RedshiftTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', 'TmpDirIAMRole' => '<string>', 'UpsertRedshiftOptions' => [ 'ConnectionName' => '<string>', 'TableLocation' => '<string>', 'UpsertKeys' => ['<string>', ...], ], ], 'RelationalCatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'RenameField' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'SourcePath' => ['<string>', ...], 'TargetPath' => ['<string>', ...], ], 'S3CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'S3CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'S3CatalogSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, ], 'Database' => '<string>', 'Name' => '<string>', 'PartitionPredicate' => '<string>', 'Table' => '<string>', ], 'S3CatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3CsvSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Escaper' => '<string>', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', 'OptimizePerformance' => true || false, 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'QuoteChar' => 'quote|quillemet|single_quote|disabled', 'Recurse' => true || false, 'Separator' => 'comma|ctrla|pipe|semicolon|tab', 'SkipFirst' => true || false, 'WithHeader' => true || false, 'WriteHeader' => true || false, ], 'S3DeltaCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3DeltaDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Compression' => 'uncompressed|snappy', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3DeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], ], 'S3DirectTarget' => [ 'Compression' => '<string>', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3GlueParquetTarget' => [ 'Compression' => 'snappy|lzo|gzip|uncompressed|none', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiCatalogTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'SchemaChangePolicy' => [ 'EnableUpdateCatalog' => true || false, 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], 'Table' => '<string>', ], 'S3HudiDirectTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'Compression' => 'gzip|lzo|uncompressed|snappy', 'Format' => 'json|csv|avro|orc|parquet|hudi|delta', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... 
], 'Path' => '<string>', 'SchemaChangePolicy' => [ 'Database' => '<string>', 'EnableUpdateCatalog' => true || false, 'Table' => '<string>', 'UpdateBehavior' => 'UPDATE_IN_DATABASE|LOG', ], ], 'S3HudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], ], 'S3JsonSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'gzip|bzip2', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'JsonPath' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Multiline' => true || false, 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'Recurse' => true || false, ], 'S3ParquetSource' => [ 'AdditionalOptions' => [ 'BoundedFiles' => <integer>, 'BoundedSize' => <integer>, 'EnableSamplePath' => true || false, 'SamplePath' => '<string>', ], 'CompressionType' => 'snappy|lzo|gzip|uncompressed|none', 'Exclusions' => ['<string>', ...], 'GroupFiles' => '<string>', 'GroupSize' => '<string>', 'MaxBand' => <integer>, 'MaxFilesInBand' => <integer>, 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Paths' => ['<string>', ...], 'Recurse' => true || false, ], 'SelectFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... 
], ], 'SelectFromCollection' => [ 'Index' => <integer>, 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'SnowflakeSource' => [ 'Data' => [ 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SnowflakeTarget' => [ 'Data' => [ 'Action' => '<string>', 'AdditionalOptions' => ['<string>', ...], 'AutoPushdown' => true || false, 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Database' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => '<string>', 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'SparkConnectorSource' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkConnectorTarget' => [ 'AdditionalOptions' => ['<string>', ...], 'ConnectionName' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'SparkSQL' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'SqlAliases' => [ [ 'Alias' => '<string>', 'From' => '<string>', ], // ... ], 'SqlQuery' => '<string>', ], 'Spigot' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Path' => '<string>', 'Prob' => <float>, 'Topk' => <integer>, ], 'SplitFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... ], ], 'Union' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'UnionType' => 'ALL|DISTINCT', ], ], // ... 
], 'Command' => [ 'Name' => '<string>', 'PythonVersion' => '<string>', 'Runtime' => '<string>', 'ScriptLocation' => '<string>', ], 'Connections' => [ 'Connections' => ['<string>', ...], ], 'CreatedOn' => <DateTime>, 'DefaultArguments' => ['<string>', ...], 'Description' => '<string>', 'ExecutionClass' => 'FLEX|STANDARD', 'ExecutionProperty' => [ 'MaxConcurrentRuns' => <integer>, ], 'GlueVersion' => '<string>', 'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK', 'JobRunQueuingEnabled' => true || false, 'LastModifiedOn' => <DateTime>, 'LogUri' => '<string>', 'MaintenanceWindow' => '<string>', 'MaxCapacity' => <float>, 'MaxRetries' => <integer>, 'Name' => '<string>', 'NonOverridableArguments' => ['<string>', ...], 'NotificationProperty' => [ 'NotifyDelayAfter' => <integer>, ], 'NumberOfWorkers' => <integer>, 'ProfileName' => '<string>', 'Role' => '<string>', 'SecurityConfiguration' => '<string>', 'SourceControlDetails' => [ 'AuthStrategy' => 'PERSONAL_ACCESS_TOKEN|AWS_SECRETS_MANAGER', 'AuthToken' => '<string>', 'Branch' => '<string>', 'Folder' => '<string>', 'LastCommitId' => '<string>', 'Owner' => '<string>', 'Provider' => 'GITHUB|GITLAB|BITBUCKET|AWS_CODE_COMMIT', 'Repository' => '<string>', ], 'Timeout' => <integer>, 'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X', ], ]
Result Details
Members
- Job
- Type: Job structure
The requested job definition.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
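The errors listed above reach PHP code as exceptions. The following is a minimal sketch, assuming the SDK's `Aws\Exception\AwsException` class and its `getAwsErrorCode()` method; the helper name `fetchJobOrNull` and the choice to map `EntityNotFoundException` to `null` are illustrative and not part of the API.

```php
<?php
use Aws\Exception\AwsException;

/**
 * Hypothetical helper: returns the 'Job' member of a GetJob result,
 * or null when the service reports EntityNotFoundException.
 * $client is expected to be an Aws\Glue\GlueClient; it is left untyped
 * so that a stub can be substituted in tests.
 */
function fetchJobOrNull($client, string $jobName): ?array
{
    try {
        $result = $client->getJob(['JobName' => $jobName]);
        return $result['Job'] ?? null;
    } catch (AwsException $e) {
        if ($e->getAwsErrorCode() === 'EntityNotFoundException') {
            return null; // the job definition does not exist
        }
        throw $e; // InvalidInputException, InternalServiceException, and so on
    }
}
```

Callers that need to distinguish the remaining error codes can catch the rethrown exception and inspect `getAwsErrorCode()` themselves.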
GetJobBookmark
$result = $client->getJobBookmark([/* ... */]);
$promise = $client->getJobBookmarkAsync([/* ... */]);
Returns information on a job bookmark entry.
For more information about enabling and using job bookmarks, see the job bookmark topics in the AWS Glue Developer Guide.
Parameter Syntax
$result = $client->getJobBookmark([
    'JobName' => '<string>', // REQUIRED
    'RunId' => '<string>',
]);
Parameter Details
Members
- JobName
- Required: Yes
- Type: string
The name of the job in question.
- RunId
- Type: string
The unique run identifier associated with this job run.
Result Syntax
[
    'JobBookmarkEntry' => [
        'Attempt' => <integer>,
        'JobBookmark' => '<string>',
        'JobName' => '<string>',
        'PreviousRunId' => '<string>',
        'Run' => <integer>,
        'RunId' => '<string>',
        'Version' => <integer>,
    ],
]
Result Details
Members
- JobBookmarkEntry
- Type: JobBookmarkEntry structure
A structure that defines the point at which a job can resume processing.
Errors
- EntityNotFoundException:
A specified entity does not exist.
- InvalidInputException:
The input provided was not valid.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
- ValidationException:
A value could not be validated.
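As an illustration of the parameters and result shape described above, the sketch below reads a bookmark entry and returns a few of its fields. The helper name `describeJobBookmark` and the shape of its return value are hypothetical; only `getJobBookmark`, its parameters, and the `JobBookmarkEntry` members come from this page.

```php
<?php
/**
 * Hypothetical helper: fetches the bookmark entry for a job and returns
 * a small summary array. $client is expected to be an Aws\Glue\GlueClient;
 * it is left untyped so that a stub can be substituted in tests.
 */
function describeJobBookmark($client, string $jobName, ?string $runId = null): array
{
    $params = ['JobName' => $jobName];
    if ($runId !== null) {
        $params['RunId'] = $runId; // optional parameter
    }

    $result = $client->getJobBookmark($params);
    $entry = $result['JobBookmarkEntry'] ?? [];

    return [
        'run'      => $entry['Run'] ?? null,
        'attempt'  => $entry['Attempt'] ?? null,
        'version'  => $entry['Version'] ?? null,
        'bookmark' => $entry['JobBookmark'] ?? null, // raw string, passed through unchanged
    ];
}
```

The bookmark value is returned as the raw string rather than parsed, because this page does not specify its internal format.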
GetJobRun
$result = $client->getJobRun([/* ... */]);
$promise = $client->getJobRunAsync([/* ... */]);
Retrieves the metadata for a given job run. Job run history for workflows and job runs remains accessible for 90 days.
Parameter Syntax
$result = $client->getJobRun([
    'JobName' => '<string>', // REQUIRED
    'PredecessorsIncluded' => true || false,
    'RunId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- JobName
- Required: Yes
- Type: string
Name of the job definition being run.
- PredecessorsIncluded
- Type: boolean
True if a list of predecessor runs should be returned.
- RunId
- Required: Yes
- Type: string
The ID of the job run.
Result Syntax
[
    'JobRun' => [
        'AllocatedCapacity' => <integer>,
        'Arguments' => ['<string>', ...],
        'Attempt' => <integer>,
        'CompletedOn' => <DateTime>,
        'DPUSeconds' => <float>,
        'ErrorMessage' => '<string>',
        'ExecutionClass' => 'FLEX|STANDARD',
        'ExecutionTime' => <integer>,
        'GlueVersion' => '<string>',
        'Id' => '<string>',
        'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
        'JobName' => '<string>',
        'JobRunQueuingEnabled' => true || false,
        'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
        'LastModifiedOn' => <DateTime>,
        'LogGroupName' => '<string>',
        'MaintenanceWindow' => '<string>',
        'MaxCapacity' => <float>,
        'NotificationProperty' => [
            'NotifyDelayAfter' => <integer>,
        ],
        'NumberOfWorkers' => <integer>,
        'PredecessorRuns' => [
            [
                'JobName' => '<string>',
                'RunId' => '<string>',
            ],
            // ...
        ],
        'PreviousRunId' => '<string>',
        'ProfileName' => '<string>',
        'SecurityConfiguration' => '<string>',
        'StartedOn' => <DateTime>,
        'StateDetail' => '<string>',
        'Timeout' => <integer>,
        'TriggerName' => '<string>',
        'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
    ],
]
Result Details
Members
- JobRun
- Type: JobRun structure
The requested job-run metadata.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
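A common use of GetJobRun is polling a run until it finishes. The sketch below is one way to do that under stated assumptions: the helper name `waitForJobRun`, the polling defaults, and the set of `JobRunState` values treated as final are all illustrative choices, not something this page defines.

```php
<?php
// Assumption: these JobRunState values are treated as final for polling purposes.
const FINAL_JOB_RUN_STATES = ['STOPPED', 'SUCCEEDED', 'FAILED', 'TIMEOUT', 'ERROR', 'EXPIRED'];

/**
 * Hypothetical helper: polls GetJobRun until the run reaches one of the
 * states above, then returns the 'JobRun' structure.
 * $client is expected to be an Aws\Glue\GlueClient; it is left untyped
 * so that a stub can be substituted in tests.
 */
function waitForJobRun($client, string $jobName, string $runId, int $pollSeconds = 30, int $maxPolls = 60): array
{
    for ($i = 0; $i < $maxPolls; $i++) {
        $result = $client->getJobRun([
            'JobName' => $jobName,
            'RunId'   => $runId,
        ]);
        $run = $result['JobRun'];

        if (in_array($run['JobRunState'], FINAL_JOB_RUN_STATES, true)) {
            return $run;
        }
        sleep($pollSeconds); // wait before asking again
    }

    throw new RuntimeException("Job run $runId did not finish after $maxPolls polls");
}
```

The poll count is bounded so that a run stuck in `WAITING` or `RUNNING` cannot block the caller indefinitely.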
GetJobRuns
$result = $client->getJobRuns([/* ... */]);
$promise = $client->getJobRunsAsync([/* ... */]);
Retrieves metadata for all runs of a given job definition.
Parameter Syntax
$result = $client->getJobRuns([
    'JobName' => '<string>', // REQUIRED
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- JobName
- Required: Yes
- Type: string
The name of the job definition for which to retrieve all job runs.
- MaxResults
- Type: int
The maximum number of results to return in the response.
- NextToken
- Type: string
A continuation token, if this is a continuation call.
Result Syntax
[
    'JobRuns' => [
        [
            'AllocatedCapacity' => <integer>,
            'Arguments' => ['<string>', ...],
            'Attempt' => <integer>,
            'CompletedOn' => <DateTime>,
            'DPUSeconds' => <float>,
            'ErrorMessage' => '<string>',
            'ExecutionClass' => 'FLEX|STANDARD',
            'ExecutionTime' => <integer>,
            'GlueVersion' => '<string>',
            'Id' => '<string>',
            'JobMode' => 'SCRIPT|VISUAL|NOTEBOOK',
            'JobName' => '<string>',
            'JobRunQueuingEnabled' => true || false,
            'JobRunState' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT|ERROR|WAITING|EXPIRED',
            'LastModifiedOn' => <DateTime>,
            'LogGroupName' => '<string>',
            'MaintenanceWindow' => '<string>',
            'MaxCapacity' => <float>,
            'NotificationProperty' => [
                'NotifyDelayAfter' => <integer>,
            ],
            'NumberOfWorkers' => <integer>,
            'PredecessorRuns' => [
                [
                    'JobName' => '<string>',
                    'RunId' => '<string>',
                ],
                // ...
            ],
            'PreviousRunId' => '<string>',
            'ProfileName' => '<string>',
            'SecurityConfiguration' => '<string>',
            'StartedOn' => <DateTime>,
            'StateDetail' => '<string>',
            'Timeout' => <integer>,
            'TriggerName' => '<string>',
            'WorkerType' => 'Standard|G.1X|G.2X|G.025X|G.4X|G.8X|Z.2X',
        ],
        // ...
    ],
    'NextToken' => '<string>',
]
Result Details
Members
- JobRuns
- Type: Array of JobRun structures
A list of job-run metadata objects.
- NextToken
- Type: string
A continuation token, if not all requested job runs have been returned.
Errors
- InvalidInputException:
The input provided was not valid.
- EntityNotFoundException:
A specified entity does not exist.
- InternalServiceException:
An internal service error occurred.
- OperationTimeoutException:
The operation timed out.
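Because GetJobRuns returns a `NextToken` when more runs remain, retrieving every run means calling the operation repeatedly. The following sketch shows one way to follow the token by hand; the generator name `iterateJobRuns` and the default page size are hypothetical.

```php
<?php
/**
 * Hypothetical helper: yields every JobRun structure for a job by
 * following NextToken until the service stops returning one.
 * $client is expected to be an Aws\Glue\GlueClient; it is left untyped
 * so that a stub can be substituted in tests.
 */
function iterateJobRuns($client, string $jobName, int $pageSize = 50): Generator
{
    $params = ['JobName' => $jobName, 'MaxResults' => $pageSize];

    do {
        $result = $client->getJobRuns($params);

        foreach ($result['JobRuns'] ?? [] as $run) {
            yield $run;
        }

        // Carry the continuation token into the next request, if there is one.
        $params['NextToken'] = $result['NextToken'] ?? null;
    } while (!empty($params['NextToken']));
}
```

A caller might, for example, loop over `iterateJobRuns($client, 'my-job')` and count the runs whose `JobRunState` is `FAILED`; `'my-job'` here is a placeholder name.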
GetJobs
$result = $client->getJobs([/* ... */]);
$promise = $client->getJobsAsync([/* ... */]);
Retrieves all current job definitions.
Parameter Syntax
$result = $client->getJobs([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- MaxResults
- Type: int
The maximum number of results to return in the response.
- NextToken
- Type: string
A continuation token, if this is a continuation call.
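The `MaxResults` and `NextToken` parameters above can also be driven by the SDK's paginator rather than by hand. The sketch below assumes that the installed SDK version ships a paginator definition for GetJobs and uses the client's `getPaginator()` method; the helper name `listJobNames` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: collects the Name of every job definition.
 * Assumes $client->getPaginator('GetJobs', ...) yields one result page
 * per iteration, each with a 'Jobs' list. $client is expected to be an
 * Aws\Glue\GlueClient; it is left untyped so that a stub can be used.
 */
function listJobNames($client, int $pageSize = 100): array
{
    $names = [];
    $pages = $client->getPaginator('GetJobs', ['MaxResults' => $pageSize]);

    foreach ($pages as $page) {
        foreach ($page['Jobs'] ?? [] as $job) {
            $names[] = $job['Name'];
        }
    }

    return $names;
}
```

If no paginator is available for the operation, the same result can be obtained by passing each returned `NextToken` into the next `getJobs` call.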
Result Syntax
[ 'Jobs' => [ [ 'AllocatedCapacity' => <integer>, 'CodeGenConfigurationNodes' => [ '<NodeId>' => [ 'Aggregate' => [ 'Aggs' => [ [ 'AggFunc' => 'avg|countDistinct|count|first|last|kurtosis|max|min|skewness|stddev_samp|stddev_pop|sum|sumDistinct|var_samp|var_pop', 'Column' => ['<string>', ...], ], // ... ], 'Groups' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'AmazonRedshiftSource' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... 
], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Name' => '<string>', ], 'AmazonRedshiftTarget' => [ 'Data' => [ 'AccessType' => '<string>', 'Action' => '<string>', 'AdvancedOptions' => [ [ 'Key' => '<string>', 'Value' => '<string>', ], // ... ], 'CatalogDatabase' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CatalogRedshiftSchema' => '<string>', 'CatalogRedshiftTable' => '<string>', 'CatalogTable' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'Connection' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'CrawlerConnection' => '<string>', 'IamRole' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'MergeAction' => '<string>', 'MergeClause' => '<string>', 'MergeWhenMatched' => '<string>', 'MergeWhenNotMatched' => '<string>', 'PostAction' => '<string>', 'PreAction' => '<string>', 'SampleQuery' => '<string>', 'Schema' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'SelectedColumns' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'SourceType' => '<string>', 'StagingTable' => '<string>', 'Table' => [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], 'TablePrefix' => '<string>', 'TableSchema' => [ [ 'Description' => '<string>', 'Label' => '<string>', 'Value' => '<string>', ], // ... ], 'TempDir' => '<string>', 'Upsert' => true || false, ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'ApplyMapping' => [ 'Inputs' => ['<string>', ...], 'Mapping' => [ [ 'Children' => [...], // RECURSIVE 'Dropped' => true || false, 'FromPath' => ['<string>', ...], 'FromType' => '<string>', 'ToKey' => '<string>', 'ToType' => '<string>', ], // ... 
], 'Name' => '<string>', ], 'AthenaConnectorSource' => [ 'ConnectionName' => '<string>', 'ConnectionTable' => '<string>', 'ConnectionType' => '<string>', 'ConnectorName' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'SchemaName' => '<string>', ], 'CatalogDeltaSource' => [ 'AdditionalDeltaOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'CatalogHudiSource' => [ 'AdditionalHudiOptions' => ['<string>', ...], 'Database' => '<string>', 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], 'Table' => '<string>', ], 'CatalogKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'Table' => '<string>', 'WindowSize' => <integer>, ], 'CatalogKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'Database' => '<string>', 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' 
=> true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <DateTime>, 'StreamArn' => '<string>', 'StreamName' => '<string>', ], 'Table' => '<string>', 'WindowSize' => <integer>, ], 'CatalogSource' => [ 'Database' => '<string>', 'Name' => '<string>', 'Table' => '<string>', ], 'CatalogTarget' => [ 'Database' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'PartitionKeys' => [ ['<string>', ...], // ... ], 'Table' => '<string>', ], 'ConnectorDataSource' => [ 'ConnectionType' => '<string>', 'Data' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... ], ], 'ConnectorDataTarget' => [ 'ConnectionType' => '<string>', 'Data' => ['<string>', ...], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'CustomCode' => [ 'ClassName' => '<string>', 'Code' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'OutputSchemas' => [ [ 'Columns' => [ [ 'Name' => '<string>', 'Type' => '<string>', ], // ... ], ], // ... 
], ], 'DirectJDBCSource' => [ 'ConnectionName' => '<string>', 'ConnectionType' => 'sqlserver|mysql|oracle|postgresql|redshift', 'Database' => '<string>', 'Name' => '<string>', 'RedshiftTmpDir' => '<string>', 'Table' => '<string>', ], 'DirectKafkaSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddRecordTimestamp' => '<string>', 'Assign' => '<string>', 'BootstrapServers' => '<string>', 'Classification' => '<string>', 'ConnectionName' => '<string>', 'Delimiter' => '<string>', 'EmitConsumerLagMetrics' => '<string>', 'EndingOffsets' => '<string>', 'IncludeHeaders' => true || false, 'MaxOffsetsPerTrigger' => <integer>, 'MinPartitions' => <integer>, 'NumRetries' => <integer>, 'PollTimeoutMs' => <integer>, 'RetryIntervalMs' => <integer>, 'SecurityProtocol' => '<string>', 'StartingOffsets' => '<string>', 'StartingTimestamp' => <DateTime>, 'SubscribePattern' => '<string>', 'TopicName' => '<string>', ], 'WindowSize' => <integer>, ], 'DirectKinesisSource' => [ 'DataPreviewOptions' => [ 'PollingTime' => <integer>, 'RecordPollingLimit' => <integer>, ], 'DetectSchema' => true || false, 'Name' => '<string>', 'StreamingOptions' => [ 'AddIdleTimeBetweenReads' => true || false, 'AddRecordTimestamp' => '<string>', 'AvoidEmptyBatches' => true || false, 'Classification' => '<string>', 'Delimiter' => '<string>', 'DescribeShardInterval' => <integer>, 'EmitConsumerLagMetrics' => '<string>', 'EndpointUrl' => '<string>', 'IdleTimeBetweenReadsInMs' => <integer>, 'MaxFetchRecordsPerShard' => <integer>, 'MaxFetchTimeInMs' => <integer>, 'MaxRecordPerRead' => <integer>, 'MaxRetryIntervalMs' => <integer>, 'NumRetries' => <integer>, 'RetryIntervalMs' => <integer>, 'RoleArn' => '<string>', 'RoleSessionName' => '<string>', 'StartingPosition' => 'latest|trim_horizon|earliest|timestamp', 'StartingTimestamp' => <DateTime>, 'StreamArn' => '<string>', 'StreamName' => 
'<string>', ], 'WindowSize' => <integer>, ], 'DropDuplicates' => [ 'Columns' => [ ['<string>', ...], // ... ], 'Inputs' => ['<string>', ...], 'Name' => '<string>', ], 'DropFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'Paths' => [ ['<string>', ...], // ... ], ], 'DropNullFields' => [ 'Inputs' => ['<string>', ...], 'Name' => '<string>', 'NullCheckBoxList' => [ 'IsEmpty' => true || false, 'IsNegOne' => true || false, 'IsNullString' => true || false, ], 'NullTextList' => [ [ 'Datatype' => [ 'Id' => '<string>', 'Label' => '<string>', ], 'Value' => '<string>', ], // ... ], ], 'DynamicTransform' => [ 'FunctionName' => '<string>', 'Inputs' => ['<string>', ...], 'Name' => '<string>',