StartDataQualityRulesetEvaluationRun - AWS Glue

StartDataQualityRulesetEvaluationRun

Once you have a ruleset definition (either recommended or your own), call this operation to evaluate the ruleset against a data source (an AWS Glue table). The evaluation computes results that you can retrieve with the GetDataQualityResult API.
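The end-to-end flow (start a run, wait for it to finish, then fetch its results) can be sketched in Python as follows. This is a minimal sketch, not an official implementation. It assumes `glue` behaves like a boto3 Glue client, whose `start_data_quality_ruleset_evaluation_run`, `get_data_quality_ruleset_evaluation_run`, and `get_data_quality_result` methods map to the corresponding API operations. The helper name `evaluate_ruleset`, the terminal-status set, the `Status`/`ResultIds` response fields, and the 30-second poll interval are assumptions to check against the GetDataQualityRulesetEvaluationRun reference.

```python
import time
from typing import Any, Dict, List

# Assumed terminal run states; the run's own Timeout bounds how long polling lasts.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


def evaluate_ruleset(glue: Any, database: str, table: str, ruleset: str,
                     role: str, poll_seconds: float = 30.0) -> List[Dict[str, Any]]:
    """Hypothetical helper: start a run, wait for it, and return its results.

    `glue` is expected to behave like a boto3 Glue client.
    """
    # Only the required members are sent; the service fills in the defaults.
    run_id = glue.start_data_quality_ruleset_evaluation_run(
        DataSource={"GlueTable": {"DatabaseName": database, "TableName": table}},
        Role=role,
        RulesetNames=[ruleset],
    )["RunId"]

    # Poll the run until it reaches a terminal state.
    while True:
        run = glue.get_data_quality_ruleset_evaluation_run(RunId=run_id)
        if run["Status"] in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if run["Status"] != "SUCCEEDED":
        raise RuntimeError(f"run {run_id} ended with status {run['Status']}")

    # Each result ID is resolved through GetDataQualityResult.
    return [glue.get_data_quality_result(ResultId=rid)
            for rid in run.get("ResultIds", [])]
```

Injecting the client instead of creating it inside the function keeps the polling logic testable without AWS credentials.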

Request Syntax

{
  "AdditionalDataSources": {
    "string" : {
      "GlueTable": {
        "AdditionalOptions": {
          "string" : "string"
        },
        "CatalogId": "string",
        "ConnectionName": "string",
        "DatabaseName": "string",
        "TableName": "string"
      }
    }
  },
  "AdditionalRunOptions": {
    "CloudWatchMetricsEnabled": boolean,
    "ResultsS3Prefix": "string"
  },
  "ClientToken": "string",
  "DataSource": {
    "GlueTable": {
      "AdditionalOptions": {
        "string" : "string"
      },
      "CatalogId": "string",
      "ConnectionName": "string",
      "DatabaseName": "string",
      "TableName": "string"
    }
  },
  "NumberOfWorkers": number,
  "Role": "string",
  "RulesetNames": [ "string" ],
  "Timeout": number
}
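The request syntax above can be assembled programmatically. The following Python sketch (the `build_request` helper and its parameter names are hypothetical) always includes the three required members and adds an optional member only when the caller supplies it, so omitted fields fall back to the service defaults (5 workers, a 2,880-minute timeout).

```python
import json
from typing import Dict, List, Optional


def build_request(database: str, table: str, role: str, ruleset_names: List[str],
                  number_of_workers: Optional[int] = None,
                  timeout_minutes: Optional[int] = None,
                  client_token: Optional[str] = None) -> str:
    """Hypothetical helper: return a JSON request body with only the supplied fields."""
    body: Dict[str, object] = {
        # Required members.
        "DataSource": {"GlueTable": {"DatabaseName": database, "TableName": table}},
        "Role": role,
        "RulesetNames": ruleset_names,
    }
    # Optional members are left out entirely rather than sent as null.
    if number_of_workers is not None:
        body["NumberOfWorkers"] = number_of_workers
    if timeout_minutes is not None:
        body["Timeout"] = timeout_minutes
    if client_token is not None:
        body["ClientToken"] = client_token
    return json.dumps(body, sort_keys=True)
```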

Request Parameters

For information about the parameters that are common to all actions, see Common Parameters.

The request accepts the following data in JSON format.

AdditionalDataSources

A map of reference strings to additional data sources you can specify for an evaluation run.

Type: String to DataSource object map

Key Length Constraints: Minimum length of 1. Maximum length of 255.

Key Pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\t]*

Required: No
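AdditionalDataSources is a map whose keys are reference strings and whose values are DataSource objects with the same shape as the DataSource member. A minimal Python sketch follows; the helper names and the sample alias `reference` are hypothetical, and only the documented 1 to 255 key length is checked (lengths are counted in Python code points, an assumption).

```python
from typing import Dict, Tuple


def glue_table(database: str, table: str) -> Dict[str, Dict[str, str]]:
    """Build a DataSource object that points at an AWS Glue table."""
    return {"GlueTable": {"DatabaseName": database, "TableName": table}}


def additional_sources(aliases: Dict[str, Tuple[str, str]]) -> Dict[str, dict]:
    """Hypothetical helper: map each reference string to a DataSource object."""
    result = {}
    for alias, (database, table) in aliases.items():
        # Map keys must be 1 to 255 characters long.
        if not 1 <= len(alias) <= 255:
            raise ValueError(f"invalid reference string: {alias!r}")
        result[alias] = glue_table(database, table)
    return result
```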

AdditionalRunOptions

Additional run options you can specify for an evaluation run.

Type: DataQualityEvaluationRunAdditionalRunOptions object

Required: No
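The DataQualityEvaluationRunAdditionalRunOptions object carries the two fields shown in the request syntax, CloudWatchMetricsEnabled and ResultsS3Prefix. A small Python sketch (the `run_options` helper and the bucket name are hypothetical) that emits only the options the caller sets:

```python
from typing import Dict, Optional, Union


def run_options(cloudwatch_metrics: Optional[bool] = None,
                results_s3_prefix: Optional[str] = None) -> Dict[str, Union[bool, str]]:
    """Hypothetical helper: build an AdditionalRunOptions object."""
    opts: Dict[str, Union[bool, str]] = {}
    if cloudwatch_metrics is not None:
        opts["CloudWatchMetricsEnabled"] = cloudwatch_metrics
    if results_s3_prefix is not None:
        opts["ResultsS3Prefix"] = results_s3_prefix
    return opts
```

If the returned dictionary is empty, the AdditionalRunOptions member can simply be omitted from the request, since it is not required.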

ClientToken

Used for idempotency. We recommend setting it to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 255.

Pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\t]*

Required: No
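For ClientToken to provide idempotency, the same token has to be reused on every retry of one logical request; generating a fresh UUID per attempt would defeat it. A minimal Python sketch, assuming a boto3-style client; the helper name is hypothetical, and `ConnectionError` is only a stand-in for whatever transient failure your client actually raises.

```python
import uuid
from typing import Any, Dict


def start_with_retries(glue: Any, request: Dict[str, Any], attempts: int = 3) -> str:
    """Hypothetical helper: start a run, reusing one ClientToken across retries."""
    token = str(uuid.uuid4())  # generated once per logical request, not per attempt
    last_error: Exception = RuntimeError("no attempts made")
    for _ in range(attempts):
        try:
            resp = glue.start_data_quality_ruleset_evaluation_run(
                ClientToken=token, **request)
            return resp["RunId"]
        except ConnectionError as err:  # stand-in for a transient failure
            last_error = err
    raise last_error
```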

DataSource

The data source (AWS Glue table) associated with this run.

Type: DataSource object

Required: Yes

NumberOfWorkers

The number of G.1X workers to be used in the run. The default is 5.

Type: Integer

Required: No

Role

An IAM role supplied to encrypt the results of the run.

Type: String

Required: Yes

RulesetNames

A list of ruleset names.

Type: Array of strings

Array Members: Minimum number of 1 item. Maximum number of 10 items.

Length Constraints: Minimum length of 1. Maximum length of 255.

Pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\t]*

Required: Yes
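The RulesetNames constraints (1 to 10 items, each 1 to 255 characters matching the pattern) can be checked client-side before calling the API. In this Python sketch the helper name is hypothetical, and the pattern's surrogate-pair range `\uD800\uDC00-\uDBFF\uDFFF` is interpreted as the supplementary code points U+10000 to U+10FFFF, which is how Python's str-based regex engine needs them written; that interpretation is an assumption.

```python
import re
from typing import List

# Documented pattern, with the surrogate-pair range rewritten as code points.
NAME_PATTERN = re.compile(r"[\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF\t]*")


def validate_ruleset_names(names: List[str]) -> List[str]:
    """Hypothetical helper: raise ValueError if RulesetNames breaks a constraint."""
    if not 1 <= len(names) <= 10:
        raise ValueError("RulesetNames must contain 1 to 10 items")
    for name in names:
        if not 1 <= len(name) <= 255:
            raise ValueError(f"ruleset name length out of range: {name!r}")
        if NAME_PATTERN.fullmatch(name) is None:
            raise ValueError(f"ruleset name has disallowed characters: {name!r}")
    return names
```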

Timeout

The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

Type: Integer

Valid Range: Minimum value of 1.

Required: No
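Timeout is expressed in whole minutes, so the documented default works out to 48 × 60 = 2,880. A small Python sketch (the helper name is hypothetical) that converts hours and minutes into the field's value and enforces the documented minimum of 1; integer inputs are used to avoid floating-point rounding surprises.

```python
DEFAULT_TIMEOUT_MINUTES = 48 * 60  # 2,880 minutes, the documented default


def timeout_minutes(hours: int = 0, minutes: int = 0) -> int:
    """Hypothetical helper: convert hours plus minutes into a Timeout value."""
    total = hours * 60 + minutes
    # Timeout has a documented minimum value of 1.
    if total < 1:
        raise ValueError("Timeout must be at least 1 minute")
    return total
```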

Response Syntax

{ "RunId": "string" }

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

RunId

The unique run identifier associated with this run.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 255.

Pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\t]*
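When calling the API over raw HTTP rather than through an SDK, the RunId can be pulled out of a successful response as sketched below. The helper name is hypothetical; it treats any status other than 200 as an error and checks the documented RunId length.

```python
import json


def parse_run_id(status_code: int, body: str) -> str:
    """Hypothetical helper: extract RunId from a successful response body."""
    if status_code != 200:
        raise RuntimeError(f"unexpected HTTP status {status_code}")
    run_id = json.loads(body)["RunId"]
    # RunId is documented as 1 to 255 characters long.
    if not 1 <= len(run_id) <= 255:
        raise ValueError("RunId length out of range")
    return run_id
```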

Errors

For information about the errors that are common to all actions, see Common Errors.

ConflictException

The CreatePartitions API was called on a table that has indexes enabled.

HTTP Status Code: 400

EntityNotFoundException

A specified entity does not exist.

HTTP Status Code: 400

InternalServiceException

An internal service error occurred.

HTTP Status Code: 500

InvalidInputException

The input provided was not valid.

HTTP Status Code: 400

OperationTimeoutException

The operation timed out.

HTTP Status Code: 400
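One way to act on these errors is to retry only the ones that do not indicate a problem with the request itself. In this Python sketch the retry policy (retrying InternalServiceException and OperationTimeoutException with exponential backoff) and the helper name are assumptions, not documented guidance. With boto3, the error code string is available as `err.response["Error"]["Code"]` on a `botocore.exceptions.ClientError`.

```python
from typing import List

# Assumed policy: server-side failures and timeouts may succeed on retry;
# the remaining errors point to a problem with the request or its resources.
RETRYABLE = {"InternalServiceException", "OperationTimeoutException"}


def retry_delays(error_code: str, attempts: int = 4, base: float = 1.0) -> List[float]:
    """Hypothetical helper: seconds to wait before each retry, or [] for no retry."""
    if error_code not in RETRYABLE:
        return []
    # Exponential backoff: base, 2*base, 4*base, ...
    return [base * 2 ** i for i in range(attempts)]
```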
