Data Quality API

The Data Quality API describes the data quality data types, and includes the API for creating, deleting, or updating data quality rulesets, runs and evaluations.

Data types

DataSource structure
DataQualityRulesetListDetails structure
DataQualityTargetTable structure
DataQualityRulesetEvaluationRunDescription structure
DataQualityRulesetEvaluationRunFilter structure
DataQualityEvaluationRunAdditionalRunOptions structure
DataQualityRuleRecommendationRunDescription structure
DataQualityRuleRecommendationRunFilter structure
DataQualityResult structure
DataQualityAnalyzerResult structure
DataQualityObservation structure
MetricBasedObservation structure
DataQualityMetricValues structure
DataQualityRuleResult structure
DataQualityResultDescription structure
DataQualityResultFilterCriteria structure
DataQualityRulesetFilterCriteria structure
DataQualityAggregatedMetrics structure
StatisticAnnotation structure
TimestampedInclusionAnnotation structure
AnnotationError structure
DatapointInclusionAnnotation structure
StatisticSummaryList list
StatisticSummary structure
RunIdentifier structure
StatisticModelResult structure
DataQualityGlueTable structure

DataSource structure

A data source (an AWS Glue table) for which you want data quality results.

Fields

GlueTable – A GlueTable object.

An AWS Glue table.
DataQualityGlueTable – A DataQualityGlueTable object.

An AWS Glue table for Data Quality Operations.
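
As a minimal sketch, a DataSource can be written in Python as a nested dictionary, which is the shape an SDK such as boto3 accepts for structures. The helper name and the database and table values are hypothetical, and the sketch assumes the GlueTable object takes DatabaseName, TableName, and an optional CatalogId, like the DataQualityGlueTable structure documented on this page.

```python
def make_data_source(database_name, table_name, catalog_id=None):
    """Build a DataSource dict that wraps a GlueTable (illustrative helper)."""
    glue_table = {"DatabaseName": database_name, "TableName": table_name}
    if catalog_id is not None:
        # CatalogId is optional, so include it only when supplied.
        glue_table["CatalogId"] = catalog_id
    return {"GlueTable": glue_table}


# Hypothetical database and table names.
data_source = make_data_source("sales_db", "orders")
```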

DataQualityRulesetListDetails structure

Describes a data quality ruleset returned by GetDataQualityRuleset.

Fields

Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the data quality ruleset.
Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

A description of the data quality ruleset.
CreatedOn – Timestamp.

The date and time the data quality ruleset was created.
LastModifiedOn – Timestamp.

The date and time the data quality ruleset was last modified.
TargetTable – A DataQualityTargetTable object.

An object representing an AWS Glue table.
RecommendationRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

When a ruleset was created from a recommendation run, this run ID is generated to link the two together.
RuleCount – Number (integer).

The number of rules in the ruleset.

DataQualityTargetTable structure

An object representing an AWS Glue table.

Fields

TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the AWS Glue table.
DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the database where the AWS Glue table exists.
CatalogId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The catalog ID where the AWS Glue table exists.

DataQualityRulesetEvaluationRunDescription structure

Describes the result of a data quality ruleset evaluation run.

Fields

RunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique run identifier associated with this run.
Status – UTF-8 string (valid values: RUNNING | FINISHED | FAILED | PENDING_EXECUTION | TIMED_OUT | CANCELING | CANCELED | RECEIVED_BY_TASKRUNNER).

The status for this run.
StartedOn – Timestamp.

The date and time when the run started.
DataSource – A DataSource object.

The data source (an AWS Glue table) associated with the run.

DataQualityRulesetEvaluationRunFilter structure

The filter criteria.

Fields

DataSource – Required: A DataSource object.

Filter based on a data source (an AWS Glue table) associated with the run.
StartedBefore – Timestamp.

Filter results by runs that started before this time.
StartedAfter – Timestamp.

Filter results by runs that started after this time.

DataQualityEvaluationRunAdditionalRunOptions structure

Additional run options you can specify for an evaluation run.

Fields

CloudWatchMetricsEnabled – Boolean.

Whether or not to enable CloudWatch metrics.
ResultsS3Prefix – UTF-8 string.

Prefix for Amazon S3 to store results.
CompositeRuleEvaluationMethod – UTF-8 string (valid values: COLUMN | ROW).

Sets the evaluation method for composite rules in the ruleset to ROW or COLUMN.
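
The following sketch shows one way to assemble these options as a Python dictionary and reject values outside the documented COLUMN | ROW set before sending a request. The helper name and the example prefix are hypothetical.

```python
VALID_COMPOSITE_METHODS = {"COLUMN", "ROW"}


def make_additional_run_options(cloudwatch_metrics_enabled=False,
                                results_s3_prefix=None,
                                composite_rule_evaluation_method=None):
    """Build a DataQualityEvaluationRunAdditionalRunOptions dict (illustrative)."""
    options = {"CloudWatchMetricsEnabled": cloudwatch_metrics_enabled}
    if results_s3_prefix is not None:
        options["ResultsS3Prefix"] = results_s3_prefix
    if composite_rule_evaluation_method is not None:
        if composite_rule_evaluation_method not in VALID_COMPOSITE_METHODS:
            raise ValueError("CompositeRuleEvaluationMethod must be COLUMN or ROW")
        options["CompositeRuleEvaluationMethod"] = composite_rule_evaluation_method
    return options


# Hypothetical S3 prefix.
run_options = make_additional_run_options(
    cloudwatch_metrics_enabled=True,
    results_s3_prefix="s3://example-bucket/dq-results/",
    composite_rule_evaluation_method="ROW",
)
```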

DataQualityRuleRecommendationRunDescription structure

Describes the result of a data quality rule recommendation run.

Fields

RunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique run identifier associated with this run.
Status – UTF-8 string (valid values: RUNNING | FINISHED | FAILED | PENDING_EXECUTION | TIMED_OUT | CANCELING | CANCELED | RECEIVED_BY_TASKRUNNER).

The status for this run.
StartedOn – Timestamp.

The date and time when this run started.
DataSource – A DataSource object.

The data source (AWS Glue table) associated with the recommendation run.

DataQualityRuleRecommendationRunFilter structure

A filter for listing data quality recommendation runs.

Fields

DataSource – Required: A DataSource object.

Filter based on a specified data source (AWS Glue table).
StartedBefore – Timestamp.

Filter results by runs that started before the provided time.
StartedAfter – Timestamp.

Filter results by runs that started after the provided time.

DataQualityResult structure

Describes a data quality result.

Fields

ResultId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A unique result ID for the data quality result.
ProfileId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Profile ID for the data quality result.
Score – Number (double), not more than 1.0.

An aggregate data quality score. Represents the ratio of rules that passed to the total number of rules.
DataSource – A DataSource object.

The table associated with the data quality result, if any.
RulesetName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the ruleset associated with the data quality result.
EvaluationContext – UTF-8 string.

In an AWS Glue Studio job, each node on the canvas is typically assigned a name, and data quality nodes have names as well. When a job contains multiple data quality nodes, the evaluationContext differentiates them.
StartedOn – Timestamp.

The date and time when this data quality run started.
CompletedOn – Timestamp.

The date and time when this data quality run completed.
JobName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The job name associated with the data quality result, if any.
JobRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The job run ID associated with the data quality result, if any.
RulesetEvaluationRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique run ID for the ruleset evaluation for this data quality result.
RuleResults – An array of DataQualityRuleResult objects, not more than 2000 structures.

A list of DataQualityRuleResult objects representing the results for each rule.
AnalyzerResults – An array of DataQualityAnalyzerResult objects, not more than 2000 structures.

A list of DataQualityAnalyzerResult objects representing the results for each analyzer.
Observations – An array of DataQualityObservation objects, not more than 50 structures.

A list of DataQualityObservation objects representing the observations generated after evaluating the rules and analyzers.
AggregatedMetrics – A DataQualityAggregatedMetrics object.

A summary of DataQualityAggregatedMetrics objects showing the total counts of processed rows and rules, including their pass/fail statistics based on row-level results.
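
The Score field is described above as the ratio of rules that passed to the total number of rules. A minimal sketch of that ratio, computed from a RuleResults list, is shown below. The function name and sample data are hypothetical, and counting ERROR results as not passed is an assumption; the service may compute Score differently.

```python
def passed_rule_ratio(rule_results):
    """Return passed rules / total rules, or None when there are no rules."""
    if not rule_results:
        return None
    passed = sum(1 for rule in rule_results if rule.get("Result") == "PASS")
    return passed / len(rule_results)


# Hypothetical rule results shaped like DataQualityRuleResult objects.
sample_results = [
    {"Name": "Rule_1", "Result": "PASS"},
    {"Name": "Rule_2", "Result": "FAIL"},
    {"Name": "Rule_3", "Result": "PASS"},
    {"Name": "Rule_4", "Result": "ERROR"},
]
ratio = passed_rule_ratio(sample_results)  # 2 of 4 rules passed, so 0.5
```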

DataQualityAnalyzerResult structure

Describes the result of the evaluation of a data quality analyzer.

Fields

Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the data quality analyzer.
Description – UTF-8 string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

A description of the data quality analyzer.
EvaluationMessage – UTF-8 string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

An evaluation message.
EvaluatedMetrics – A map array of key-value pairs.

Each key is a UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Each value is a Number (double).

A map of metrics associated with the evaluation of the analyzer.

DataQualityObservation structure

Describes the observation generated after evaluating the rules and analyzers.

Fields

Description – UTF-8 string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

A description of the data quality observation.
MetricBasedObservation – A MetricBasedObservation object.

An object of type MetricBasedObservation representing the observation that is based on evaluated data quality metrics.

MetricBasedObservation structure

Describes the metric based observation generated based on evaluated data quality metrics.

Fields

MetricName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the data quality metric used for generating the observation.
StatisticId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Statistic ID.
MetricValues – A DataQualityMetricValues object.

An object of type DataQualityMetricValues representing the analysis of the data quality metric value.
NewRules – An array of UTF-8 strings.

A list of new data quality rules generated as part of the observation based on the data quality metric value.

DataQualityMetricValues structure

Describes the data quality metric value according to the analysis of historical data.

Fields

ActualValue – Number (double).

The actual value of the data quality metric.
ExpectedValue – Number (double).

The expected value of the data quality metric according to the analysis of historical data.
LowerLimit – Number (double).

The lower limit of the data quality metric value according to the analysis of historical data.
UpperLimit – Number (double).

The upper limit of the data quality metric value according to the analysis of historical data.
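
To illustrate how these four fields relate, the sketch below checks whether an actual value falls between the lower and upper limits. The function name and sample values are hypothetical, and treating both limits as inclusive is an assumption.

```python
def is_within_limits(metric_values):
    """Return True when ActualValue lies in [LowerLimit, UpperLimit]."""
    actual = metric_values["ActualValue"]
    return metric_values["LowerLimit"] <= actual <= metric_values["UpperLimit"]


# Hypothetical values shaped like a DataQualityMetricValues object.
sample_metric = {
    "ActualValue": 120.0,
    "ExpectedValue": 100.0,
    "LowerLimit": 90.0,
    "UpperLimit": 110.0,
}
within = is_within_limits(sample_metric)  # 120.0 is above the upper limit
```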

DataQualityRuleResult structure

Describes the result of the evaluation of a data quality rule.

Fields

Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the data quality rule.
Description – UTF-8 string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

A description of the data quality rule.
EvaluationMessage – UTF-8 string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

An evaluation message.
Result – UTF-8 string (valid values: PASS | FAIL | ERROR).

A pass or fail status for the rule.
EvaluatedMetrics – A map array of key-value pairs.

Each key is a UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Each value is a Number (double).

A map of metrics associated with the evaluation of the rule.
EvaluatedRule – UTF-8 string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

The evaluated rule.
RuleMetrics – A map array of key-value pairs.

Each key is a UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Each value is a Number (double).

A map containing metrics associated with the evaluation of the rule based on row-level results.

DataQualityResultDescription structure

Describes a data quality result.

Fields

ResultId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique result ID for this data quality result.
DataSource – A DataSource object.

The table name associated with the data quality result.
JobName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The job name associated with the data quality result.
JobRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The job run ID associated with the data quality result.
StartedOn – Timestamp.

The time that the run started for this data quality result.

DataQualityResultFilterCriteria structure

Criteria used to return data quality results.

Fields

DataSource – A DataSource object.

Filter results by the specified data source. For example, retrieving all results for an AWS Glue table.
JobName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Filter results by the specified job name.
JobRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Filter results by the specified job run ID.
StartedAfter – Timestamp.

Filter results by runs that started after this time.
StartedBefore – Timestamp.

Filter results by runs that started before this time.

DataQualityRulesetFilterCriteria structure

The criteria used to filter data quality rulesets.

Fields

Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the ruleset filter criteria.
Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

The description of the ruleset filter criteria.
CreatedBefore – Timestamp.

Filter on rulesets created before this date.
CreatedAfter – Timestamp.

Filter on rulesets created after this date.
LastModifiedBefore – Timestamp.

Filter on rulesets last modified before this date.
LastModifiedAfter – Timestamp.

Filter on rulesets last modified after this date.
TargetTable – A DataQualityTargetTable object.

The name and database name of the target table.

DataQualityAggregatedMetrics structure

A summary of metrics showing the total counts of processed rows and rules, including their pass/fail statistics based on row-level results.

Fields

TotalRowsProcessed – Number (double).

The total number of rows that were processed during the data quality evaluation.
TotalRowsPassed – Number (double).

The total number of rows that passed all applicable data quality rules.
TotalRowsFailed – Number (double).

The total number of rows that failed one or more data quality rules.
TotalRulesProcessed – Number (double).

The total number of data quality rules that were evaluated.
TotalRulesPassed – Number (double).

The total number of data quality rules that passed their evaluation criteria.
TotalRulesFailed – Number (double).

The total number of data quality rules that failed their evaluation criteria.

StatisticAnnotation structure

A Statistic Annotation.

Fields

ProfileId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Profile ID.
StatisticId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Statistic ID.
StatisticRecordedOn – Timestamp.

The timestamp when the annotated statistic was recorded.
InclusionAnnotation – A TimestampedInclusionAnnotation object.

The inclusion annotation applied to the statistic.

TimestampedInclusionAnnotation structure

A timestamped inclusion annotation.

Fields

Value – UTF-8 string (valid values: INCLUDE | EXCLUDE).

The inclusion annotation value.
LastModifiedOn – Timestamp.

The timestamp when the inclusion annotation was last modified.

AnnotationError structure

A failed annotation.

Fields

ProfileId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Profile ID for the failed annotation.
StatisticId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Statistic ID for the failed annotation.
FailureReason – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

The reason why the annotation failed.

DatapointInclusionAnnotation structure

An Inclusion Annotation.

Fields

ProfileId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The ID of the data quality profile the statistic belongs to.
StatisticId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Statistic ID.
InclusionAnnotation – UTF-8 string (valid values: INCLUDE | EXCLUDE).

The inclusion annotation value to apply to the statistic.

StatisticSummaryList list

A list of StatisticSummary.

An array of StatisticSummary objects.

StatisticSummary structure

Summary information about a statistic.

Fields

StatisticId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Statistic ID.
ProfileId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Profile ID.
RunIdentifier – A RunIdentifier object.

The run identifier.
StatisticName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Custom string pattern #16.

The name of the statistic.
DoubleValue – Number (double).

The value of the statistic.
EvaluationLevel – UTF-8 string (valid values: Dataset="DATASET" | Column="COLUMN" | Multicolumn="MULTICOLUMN").

The evaluation level of the statistic. Possible values: Dataset, Column, Multicolumn.
ColumnsReferenced – An array of UTF-8 strings.

The list of columns referenced by the statistic.
ReferencedDatasets – An array of UTF-8 strings.

The list of datasets referenced by the statistic.
StatisticProperties – A map array of key-value pairs.

Each key is a UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Each value is a Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

A StatisticPropertiesMap, which contains a NameString and a DescriptionString.
RecordedOn – Timestamp.

The timestamp when the statistic was recorded.
InclusionAnnotation – A TimestampedInclusionAnnotation object.

The inclusion annotation for the statistic.

RunIdentifier structure

A run identifier.

Fields

RunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Run ID.
JobRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Job Run ID.

StatisticModelResult structure

The statistic model result.

Fields

LowerBound – Number (double).

The lower bound.
UpperBound – Number (double).

The upper bound.
PredictedValue – Number (double).

The predicted value.
ActualValue – Number (double).

The actual value.
Date – Timestamp.

The date.
InclusionAnnotation – UTF-8 string (valid values: INCLUDE | EXCLUDE).

The inclusion annotation.

DataQualityGlueTable structure

The database and table in the AWS Glue Data Catalog that is used for input or output data for Data Quality Operations.

Fields

DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A database name in the AWS Glue Data Catalog.
TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A table name in the AWS Glue Data Catalog.
CatalogId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A unique identifier for the AWS Glue Data Catalog.
ConnectionName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the connection to the AWS Glue Data Catalog.
AdditionalOptions – A map array of key-value pairs, not less than 1 or more than 10 pairs.

Each key is a UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Each value is a Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

Additional options for the table. Currently there are two keys supported:
- pushDownPredicate: to filter on partitions without having to list and read all the files in your dataset.
- catalogPartitionPredicate: to use server-side partition pruning using partition indexes in the AWS Glue Data Catalog.
PreProcessingQuery – UTF-8 string, not more than 51200 bytes long, matching the URI address multi-line string pattern.

A SQL query in SparkSQL format that can be used to pre-process the data for the table in the AWS Glue Data Catalog before running the Data Quality operation.
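
As a hedged sketch, the structure above can be assembled as a Python dictionary, with the pushDownPredicate key placed under AdditionalOptions. The helper name, the table names, and the predicate string are hypothetical.

```python
def make_dq_glue_table_source(database_name, table_name, push_down_predicate=None):
    """Build a DataSource dict that wraps a DataQualityGlueTable (illustrative)."""
    table = {"DatabaseName": database_name, "TableName": table_name}
    if push_down_predicate is not None:
        # One of the two AdditionalOptions keys listed above.
        table["AdditionalOptions"] = {"pushDownPredicate": push_down_predicate}
    return {"DataQualityGlueTable": table}


# Hypothetical partition predicate.
partitioned_source = make_dq_glue_table_source("sales_db", "orders", "year='2024'")
```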

Operations

StartDataQualityRulesetEvaluationRun action (Python: start_data_quality_ruleset_evaluation_run)
CancelDataQualityRulesetEvaluationRun action (Python: cancel_data_quality_ruleset_evaluation_run)
GetDataQualityRulesetEvaluationRun action (Python: get_data_quality_ruleset_evaluation_run)
ListDataQualityRulesetEvaluationRuns action (Python: list_data_quality_ruleset_evaluation_runs)
StartDataQualityRuleRecommendationRun action (Python: start_data_quality_rule_recommendation_run)
CancelDataQualityRuleRecommendationRun action (Python: cancel_data_quality_rule_recommendation_run)
GetDataQualityRuleRecommendationRun action (Python: get_data_quality_rule_recommendation_run)
ListDataQualityRuleRecommendationRuns action (Python: list_data_quality_rule_recommendation_runs)
GetDataQualityResult action (Python: get_data_quality_result)
BatchGetDataQualityResult action (Python: batch_get_data_quality_result)
ListDataQualityResults action (Python: list_data_quality_results)
CreateDataQualityRuleset action (Python: create_data_quality_ruleset)
DeleteDataQualityRuleset action (Python: delete_data_quality_ruleset)
GetDataQualityRuleset action (Python: get_data_quality_ruleset)
ListDataQualityRulesets action (Python: list_data_quality_rulesets)
UpdateDataQualityRuleset action (Python: update_data_quality_ruleset)
ListDataQualityStatistics action (Python: list_data_quality_statistics)
TimestampFilter structure
CreateDataQualityRulesetRequest structure
GetDataQualityRulesetResponse structure
GetDataQualityResultResponse structure
StartDataQualityRuleRecommendationRunRequest structure
GetDataQualityRuleRecommendationRunResponse structure
BatchPutDataQualityStatisticAnnotation action (Python: batch_put_data_quality_statistic_annotation)
GetDataQualityModel action (Python: get_data_quality_model)
GetDataQualityModelResult action (Python: get_data_quality_model_result)
ListDataQualityStatisticAnnotations action (Python: list_data_quality_statistic_annotations)
PutDataQualityProfileAnnotation action (Python: put_data_quality_profile_annotation)

StartDataQualityRulesetEvaluationRun action (Python: start_data_quality_ruleset_evaluation_run)

Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (AWS Glue table). The evaluation computes results which you can retrieve with the GetDataQualityResult API.

Request

DataSource – Required: A DataSource object.

The data source (AWS Glue table) associated with this run.
Role – Required: UTF-8 string.

An IAM role supplied to encrypt the results of the run.
NumberOfWorkers – Number (integer).

The number of G.1X workers to be used in the run. The default is 5.
Timeout – Number (integer), at least 1.

The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
ClientToken – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.
AdditionalRunOptions – A DataQualityEvaluationRunAdditionalRunOptions object.

Additional run options you can specify for an evaluation run.
RulesetNames – Required: An array of UTF-8 strings, not less than 1 or more than 10 strings.

A list of ruleset names.
AdditionalDataSources – A map array of key-value pairs.

Each key is a UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Each value is a DataSource object.

A map of reference strings to additional data sources you can specify for an evaluation run.

Response

RunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique run identifier associated with this run.

Errors

InvalidInputException
EntityNotFoundException
OperationTimeoutException
InternalServiceException
ConflictException
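
The request above can be sketched in Python as follows. The function takes a client object that exposes start_data_quality_ruleset_evaluation_run, for example one created with boto3.client("glue"); the helper name and all argument values in the usage comment are hypothetical.

```python
import uuid


def start_evaluation_run(glue_client, data_source, role, ruleset_names,
                         timeout_minutes=None):
    """Start a ruleset evaluation run and return its RunId (illustrative)."""
    if not 1 <= len(ruleset_names) <= 10:
        raise ValueError("RulesetNames must contain between 1 and 10 names")
    request = {
        "DataSource": data_source,
        "Role": role,
        "RulesetNames": list(ruleset_names),
        # A random token, as recommended above for idempotency.
        "ClientToken": str(uuid.uuid4()),
    }
    if timeout_minutes is not None:
        request["Timeout"] = timeout_minutes
    response = glue_client.start_data_quality_ruleset_evaluation_run(**request)
    return response["RunId"]


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   run_id = start_evaluation_run(
#       boto3.client("glue"),
#       {"GlueTable": {"DatabaseName": "sales_db", "TableName": "orders"}},
#       "arn:aws:iam::123456789012:role/ExampleGlueRole",
#       ["orders_ruleset"],
#   )
```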

CancelDataQualityRulesetEvaluationRun action (Python: cancel_data_quality_ruleset_evaluation_run)

Cancels a run where a ruleset is being evaluated against a data source.

Request

RunId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique run identifier associated with this run.

Response

No Response parameters.

Errors

EntityNotFoundException
InvalidInputException
OperationTimeoutException
InternalServiceException

GetDataQualityRulesetEvaluationRun action (Python: get_data_quality_ruleset_evaluation_run)

Retrieves a specific run where a ruleset is evaluated against a data source.

Request

RunId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique run identifier associated with this run.

Response

RunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique run identifier associated with this run.
DataSource – A DataSource object.

The data source (an AWS Glue table) associated with this evaluation run.
Role – UTF-8 string.

An IAM role supplied to encrypt the results of the run.
NumberOfWorkers – Number (integer).

The number of G.1X workers to be used in the run. The default is 5.
Timeout – Number (integer), at least 1.

The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
AdditionalRunOptions – A DataQualityEvaluationRunAdditionalRunOptions object.

Additional run options you can specify for an evaluation run.
Status – UTF-8 string (valid values: RUNNING | FINISHED | FAILED | PENDING_EXECUTION | TIMED_OUT | CANCELING | CANCELED | RECEIVED_BY_TASKRUNNER).

The status for this run.
ErrorString – UTF-8 string.

The error strings that are associated with the run.
StartedOn – Timestamp.

The date and time when this run started.
LastModifiedOn – Timestamp.

The last point in time when this data quality ruleset evaluation run was modified.
CompletedOn – Timestamp.

The date and time when this run was completed.
ExecutionTime – Number (integer).

The amount of time (in seconds) that the run consumed resources.
RulesetNames – An array of UTF-8 strings, not less than 1 or more than 10 strings.

A list of ruleset names for the run. Currently, this parameter takes only one Ruleset name.
ResultIds – An array of UTF-8 strings, not less than 1 or more than 10 strings.

A list of result IDs for the data quality results for the run.
AdditionalDataSources – A map array of key-value pairs.

Each key is a UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Each value is a A DataSource object.

A map of reference strings to additional data sources you can specify for an evaluation run.

Errors

EntityNotFoundException
InvalidInputException
OperationTimeoutException
InternalServiceException
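
Because a run progresses through several statuses, a caller typically polls this operation until the run stops. The sketch below is one way to do that; the helper name and polling intervals are hypothetical. Treating FINISHED, FAILED, TIMED_OUT, and CANCELED as terminal is an assumption based on the status values listed on this page, so adjust the set if your SDK reports different values.

```python
import time

# Assumed terminal statuses, taken from the valid values listed on this page.
TERMINAL_STATUSES = {"FINISHED", "FAILED", "TIMED_OUT", "CANCELED"}


def wait_for_evaluation_run(glue_client, run_id, poll_seconds=30,
                            max_polls=120, sleep=time.sleep):
    """Poll a ruleset evaluation run until it reaches a terminal status."""
    for _ in range(max_polls):
        run = glue_client.get_data_quality_ruleset_evaluation_run(RunId=run_id)
        if run["Status"] in TERMINAL_STATUSES:
            return run
        sleep(poll_seconds)
    raise TimeoutError(f"Run {run_id} did not finish after {max_polls} polls")
```

On a FINISHED run, the ResultIds field of the returned response can then be passed to GetDataQualityResult.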

ListDataQualityRulesetEvaluationRuns action (Python: list_data_quality_ruleset_evaluation_runs)

Lists all the runs meeting the filter criteria, where a ruleset is evaluated against a data source.

Request

Filter – A DataQualityRulesetEvaluationRunFilter object.

The filter criteria.
NextToken – UTF-8 string.

A pagination token to offset the results.
MaxResults – Number (integer), not less than 1 or more than 1000.

The maximum number of results to return.

Response

Runs – An array of DataQualityRulesetEvaluationRunDescription objects.

A list of DataQualityRulesetEvaluationRunDescription objects representing data quality ruleset runs.
NextToken – UTF-8 string.

A pagination token, if more results are available.

Errors

InvalidInputException
OperationTimeoutException
InternalServiceException
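
The NextToken fields in the request and response support paging through results. A minimal sketch of that loop is shown below, assuming a client object that exposes list_data_quality_ruleset_evaluation_runs (for example, one from boto3.client("glue")). The generator name and default page size are hypothetical.

```python
def iter_evaluation_runs(glue_client, data_source, started_after=None,
                         page_size=100):
    """Yield every run description that matches the filter, page by page."""
    run_filter = {"DataSource": data_source}
    if started_after is not None:
        run_filter["StartedAfter"] = started_after
    request = {"Filter": run_filter, "MaxResults": page_size}
    while True:
        page = glue_client.list_data_quality_ruleset_evaluation_runs(**request)
        yield from page.get("Runs", [])
        token = page.get("NextToken")
        if not token:
            break  # No more pages are available.
        request["NextToken"] = token
```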

StartDataQualityRuleRecommendationRun action (Python: start_data_quality_rule_recommendation_run)

Starts a recommendation run that is used to generate rules when you don't know what rules to write. AWS Glue Data Quality analyzes the data and comes up with recommendations for a potential ruleset. You can then triage the ruleset and modify the generated ruleset to your liking.

Recommendation runs are automatically deleted after 90 days.

Request

The request for a data quality rule recommendation run.

DataSource – Required: A DataSource object.

The data source (AWS Glue table) associated with this run.
Role – Required: UTF-8 string.

An IAM role supplied to encrypt the results of the run.
NumberOfWorkers – Number (integer).

The number of G.1X workers to be used in the run. The default is 5.
Timeout – Number (integer), at least 1.

The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
CreatedRulesetName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A name for the ruleset.
DataQualitySecurityConfiguration – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the security configuration created with the data quality encryption option.
ClientToken – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.

Response

RunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique run identifier associated with this run.

Errors

InvalidInputException
OperationTimeoutException
InternalServiceException
ConflictException
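
As an illustration of the request fields above, the sketch below assembles the keyword arguments for this operation and fills ClientToken with a random UUID, as the field description recommends. The helper name is hypothetical, and the `GlueTable` shape with `DatabaseName` and `TableName` is assumed from the DataSource structure, which is defined outside this section.

```python
import uuid
from typing import Any, Dict, Optional


def build_recommendation_run_request(
    database: str,
    table: str,
    role: str,
    created_ruleset_name: Optional[str] = None,
    number_of_workers: Optional[int] = None,
    timeout_minutes: Optional[int] = None,
    client_token: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for start_data_quality_rule_recommendation_run."""
    request: Dict[str, Any] = {
        # Assumed DataSource shape: a GlueTable with database and table names.
        "DataSource": {
            "GlueTable": {"DatabaseName": database, "TableName": table}
        },
        "Role": role,
        # A random UUID keeps retries of the same call idempotent.
        "ClientToken": client_token or str(uuid.uuid4()),
    }
    if created_ruleset_name is not None:
        request["CreatedRulesetName"] = created_ruleset_name
    if number_of_workers is not None:
        request["NumberOfWorkers"] = number_of_workers
    if timeout_minutes is not None:
        # Timeout is documented as an integer of at least 1 (minutes).
        if timeout_minutes < 1:
            raise ValueError("Timeout must be at least 1 minute")
        request["Timeout"] = timeout_minutes
    return request
```

With a boto3 Glue client, the result could be passed as `glue.start_data_quality_rule_recommendation_run(**request)`, and the returned dictionary would carry the RunId. To retry safely, reuse the same ClientToken rather than generating a new one.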

CancelDataQualityRuleRecommendationRun action (Python: cancel_data_quality_rule_recommendation_run)

Cancels the specified recommendation run that was being used to generate rules.

Request

RunId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique run identifier associated with this run.

Response

No Response parameters.

Errors

EntityNotFoundException
InvalidInputException
OperationTimeoutException
InternalServiceException

GetDataQualityRuleRecommendationRun action (Python: get_data_quality_rule_recommendation_run)

Gets the specified recommendation run that was used to generate rules.

Request

RunId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique run identifier associated with this run.

Response

The response for the Data Quality rule recommendation run.

RunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique run identifier associated with this run.
DataSource – A DataSource object.

The data source (an AWS Glue table) associated with this run.
Role – UTF-8 string.

An IAM role supplied to encrypt the results of the run.
NumberOfWorkers – Number (integer).

The number of G.1X workers to be used in the run. The default is 5.
Timeout – Number (integer), at least 1.

The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
Status – UTF-8 string (valid values: RUNNING | FINISHED | FAILED | PENDING_EXECUTION | TIMED_OUT | CANCELING | CANCELED | RECEIVED_BY_TASKRUNNER).

The status for this run.
ErrorString – UTF-8 string.

The error strings that are associated with the run.
StartedOn – Timestamp.

The date and time when this run started.
LastModifiedOn – Timestamp.

A timestamp. The last point in time when this data quality rule recommendation run was modified.
CompletedOn – Timestamp.

The date and time when this run was completed.
ExecutionTime – Number (integer).

The amount of time (in seconds) that the run consumed resources.
RecommendedRuleset – UTF-8 string, not less than 1 or more than 65536 bytes long.

When a rule recommendation run completes, it creates a recommended ruleset (a set of rules). This member contains those rules in Data Quality Definition Language (DQDL) format.
CreatedRulesetName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the ruleset that was created by the run.
DataQualitySecurityConfiguration – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the security configuration created with the data quality encryption option.

Errors

EntityNotFoundException
InvalidInputException
OperationTimeoutException
InternalServiceException
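
Because a recommendation run is asynchronous, a caller typically polls this operation until Status stops changing. The sketch below is one possible polling loop. It assumes `glue` is a boto3 Glue client and treats FINISHED, FAILED, TIMED_OUT, and CANCELED as terminal; that grouping is an assumption drawn from the status names, not a documented guarantee. The function name and defaults are hypothetical.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states, chosen from the documented Status values.
TERMINAL_STATUSES = {"FINISHED", "FAILED", "TIMED_OUT", "CANCELED"}


def wait_for_recommendation_run(
    glue: Any,
    run_id: str,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_data_quality_rule_recommendation_run until a terminal status.

    Returns the last response, which holds RecommendedRuleset when the run
    finished successfully. Raises TimeoutError after `max_polls` attempts.
    """
    for attempt in range(max_polls):
        run = glue.get_data_quality_rule_recommendation_run(RunId=run_id)
        if run.get("Status") in TERMINAL_STATUSES:
            return run
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"run {run_id} still in progress after {max_polls} polls")
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.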

ListDataQualityRuleRecommendationRuns action (Python: list_data_quality_rule_recommendation_runs)

Lists the recommendation runs meeting the filter criteria.

Request

Filter – A DataQualityRuleRecommendationRunFilter object.

The filter criteria.
NextToken – UTF-8 string.

A pagination token to offset the results.
MaxResults – Number (integer), not less than 1 or more than 1000.

The maximum number of results to return.

Response

Runs – An array of DataQualityRuleRecommendationRunDescription objects.

A list of DataQualityRuleRecommendationRunDescription objects.
NextToken – UTF-8 string.

A pagination token, if more results are available.

Errors

InvalidInputException
OperationTimeoutException
InternalServiceException

GetDataQualityResult action (Python: get_data_quality_result)

Retrieves the result of a data quality rule evaluation.

Request

ResultId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A unique result ID for the data quality result.

Response

The response for the data quality result.

ResultId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A unique result ID for the data quality result.
ProfileId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Profile ID for the data quality result.
Score – Number (double), not more than 1.0.

An aggregate data quality score. Represents the ratio of rules that passed to the total number of rules.
DataSource – A DataSource object.

The table associated with the data quality result, if any.
RulesetName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the ruleset associated with the data quality result.
EvaluationContext – UTF-8 string.

In an AWS Glue Studio job, each node on the canvas is typically assigned a name, including data quality nodes. When a job contains multiple data quality nodes, the evaluationContext distinguishes between them.
StartedOn – Timestamp.

The date and time when the run for this data quality result started.
CompletedOn – Timestamp.

The date and time when the run for this data quality result was completed.
JobName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The job name associated with the data quality result, if any.
JobRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The job run ID associated with the data quality result, if any.
RulesetEvaluationRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique run ID associated with the ruleset evaluation.
RuleResults – An array of DataQualityRuleResult objects, not more than 2000 structures.

A list of DataQualityRuleResult objects representing the results for each rule.
AnalyzerResults – An array of DataQualityAnalyzerResult objects, not more than 2000 structures.

A list of DataQualityAnalyzerResult objects representing the results for each analyzer.
Observations – An array of DataQualityObservation objects, not more than 50 structures.

A list of DataQualityObservation objects representing the observations generated after evaluating the rules and analyzers.
AggregatedMetrics – A DataQualityAggregatedMetrics object.

A summary of DataQualityAggregatedMetrics objects showing the total counts of processed rows and rules, including their pass/fail statistics based on row-level results.

Errors

InvalidInputException
OperationTimeoutException
InternalServiceException
EntityNotFoundException
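
Score is described above as the ratio of rules that passed to the total number of rules. The sketch below recomputes that ratio from RuleResults as a worked example. It assumes each DataQualityRuleResult dictionary carries a `Result` key whose value is PASS for a passing rule (that structure is defined outside this section), and the function name is hypothetical. Treat the Score returned by the service as authoritative; this is only for illustration.

```python
from typing import Any, Dict, List, Optional


def pass_ratio(rule_results: List[Dict[str, Any]]) -> Optional[float]:
    """Return passed rules divided by total rules, or None if there are none.

    Assumes each entry has a "Result" key and that only "PASS" counts as
    passing; any other value is counted as not passed.
    """
    if not rule_results:
        return None
    passed = sum(1 for rule in rule_results if rule.get("Result") == "PASS")
    return passed / len(rule_results)


# Three of four rules pass, so the ratio is 0.75.
example = [
    {"Name": "Rule_1", "Result": "PASS"},
    {"Name": "Rule_2", "Result": "PASS"},
    {"Name": "Rule_3", "Result": "FAIL"},
    {"Name": "Rule_4", "Result": "PASS"},
]
ratio = pass_ratio(example)
```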

BatchGetDataQualityResult action (Python: batch_get_data_quality_result)

Retrieves a list of data quality results for the specified result IDs.

Request

ResultIds – Required: An array of UTF-8 strings, not less than 1 or more than 100 strings.

A list of unique result IDs for the data quality results.

Response

Results – Required: An array of DataQualityResult objects.

A list of DataQualityResult objects representing the data quality results.
ResultsNotFound – An array of UTF-8 strings, not less than 1 or more than 100 strings.

A list of result IDs for which results were not found.

Errors

InvalidInputException
OperationTimeoutException
InternalServiceException
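
Since ResultIds accepts at most 100 strings per call, a longer list of IDs has to be split into batches. The following is a minimal sketch of that chunking, assuming `glue` is a boto3 Glue client; the helper name is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def batch_get_results(
    glue: Any,
    result_ids: List[str],
    batch_size: int = 100,
) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Fetch results in chunks, returning (results, ids_not_found)."""
    # ResultIds is documented as an array of 1 to 100 strings.
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")

    found: List[Dict[str, Any]] = []
    missing: List[str] = []
    for start in range(0, len(result_ids), batch_size):
        chunk = result_ids[start:start + batch_size]
        response = glue.batch_get_data_quality_result(ResultIds=chunk)
        found.extend(response.get("Results", []))
        missing.extend(response.get("ResultsNotFound", []))
    return found, missing
```

An empty `result_ids` list makes no calls and returns two empty lists, which avoids sending a request that would violate the minimum of one ID.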

ListDataQualityResults action (Python: list_data_quality_results)

Returns all data quality execution results for your account.

Request

Filter – A DataQualityResultFilterCriteria object.

The filter criteria.
NextToken – UTF-8 string.

A pagination token to offset the results.
MaxResults – Number (integer), not less than 1 or more than 1000.

The maximum number of results to return.

Response

Results – Required: An array of DataQualityResultDescription objects.

A list of DataQualityResultDescription objects.
NextToken – UTF-8 string.

A pagination token, if more results are available.

Errors

InvalidInputException
OperationTimeoutException
InternalServiceException

CreateDataQualityRuleset action (Python: create_data_quality_ruleset)

Creates a data quality ruleset with DQDL rules applied to a specified AWS Glue table.

You create the ruleset using the Data Quality Definition Language (DQDL). For more information, see the AWS Glue developer guide.

Request

A request to create a data quality ruleset.

Name – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A unique name for the data quality ruleset.
Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

A description of the data quality ruleset.
Ruleset – Required: UTF-8 string, not less than 1 or more than 65536 bytes long.

A Data Quality Definition Language (DQDL) ruleset. For more information, see the AWS Glue developer guide.
Tags – A map array of key-value pairs, not more than 50 pairs.

Each key is a UTF-8 string, not less than 1 or more than 128 bytes long.

Each value is a UTF-8 string, not more than 256 bytes long.

A list of tags applied to the data quality ruleset.
TargetTable – A DataQualityTargetTable object.

A target table associated with the data quality ruleset.
RecommendationRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A unique run ID for the recommendation run.
DataQualitySecurityConfiguration – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the security configuration created with the data quality encryption option.
ClientToken – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.

Response

Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A unique name for the data quality ruleset.

Errors

InvalidInputException
AlreadyExistsException
OperationTimeoutException
InternalServiceException
ResourceNumberLimitExceededException
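
The size limits listed in the request can be checked on the client before calling the service. The sketch below builds the keyword arguments and applies the documented byte and count limits. The helper names, the example column `order_id`, and the sample DQDL text are illustrative assumptions. The `TargetTable` keys `DatabaseName` and `TableName` are assumed from the DataQualityTargetTable structure, which is defined outside this section.

```python
from typing import Any, Dict, Optional


def _check_bytes(field: str, value: str, low: int, high: int) -> None:
    """Raise ValueError unless the UTF-8 encoded length is within [low, high]."""
    size = len(value.encode("utf-8"))
    if not low <= size <= high:
        raise ValueError(f"{field} must be {low}-{high} bytes, got {size}")


def build_create_ruleset_request(
    name: str,
    ruleset: str,
    database: Optional[str] = None,
    table: Optional[str] = None,
    description: Optional[str] = None,
    tags: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for create_data_quality_ruleset."""
    _check_bytes("Name", name, 1, 255)
    _check_bytes("Ruleset", ruleset, 1, 65536)
    request: Dict[str, Any] = {"Name": name, "Ruleset": ruleset}

    if description is not None:
        _check_bytes("Description", description, 0, 2048)
        request["Description"] = description
    if tags is not None:
        if len(tags) > 50:
            raise ValueError("Tags accepts at most 50 pairs")
        for key, value in tags.items():
            _check_bytes("Tag key", key, 1, 128)
            _check_bytes("Tag value", value, 0, 256)
        request["Tags"] = dict(tags)
    if database is not None and table is not None:
        # Assumed DataQualityTargetTable shape.
        request["TargetTable"] = {"DatabaseName": database, "TableName": table}
    return request


# Illustrative DQDL text; the column name is hypothetical.
SAMPLE_RULESET = 'Rules = [ IsComplete "order_id", RowCount > 0 ]'
sample_request = build_create_ruleset_request(
    "orders_rules", SAMPLE_RULESET, database="sales_db", table="orders"
)
```

With a boto3 Glue client, `glue.create_data_quality_ruleset(**sample_request)` would submit the request. The service still performs its own validation, so these checks only catch obvious problems early.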

DeleteDataQualityRuleset action (Python: delete_data_quality_ruleset)

Deletes a data quality ruleset.

Request

Name – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A name for the data quality ruleset.

Response

No Response parameters.

Errors

EntityNotFoundException
InvalidInputException
OperationTimeoutException
InternalServiceException

GetDataQualityRuleset action (Python: get_data_quality_ruleset)

Returns an existing ruleset by identifier or name.

Request

Name – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the ruleset.

Response

Returns the data quality ruleset response.

Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the ruleset.
Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

A description of the ruleset.
Ruleset – UTF-8 string, not less than 1 or more than 65536 bytes long.

A Data Quality Definition Language (DQDL) ruleset. For more information, see the AWS Glue developer guide.
TargetTable – A DataQualityTargetTable object.

The name and database name of the target table.
CreatedOn – Timestamp.

A timestamp. The time and date that this data quality ruleset was created.
LastModifiedOn – Timestamp.

A timestamp. The last point in time when this data quality ruleset was modified.
RecommendationRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

When a ruleset was created from a recommendation run, this run ID is generated to link the two together.
DataQualitySecurityConfiguration – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the security configuration created with the data quality encryption option.

Errors

EntityNotFoundException
InvalidInputException
OperationTimeoutException
InternalServiceException

ListDataQualityRulesets action (Python: list_data_quality_rulesets)

Returns a paginated list of rulesets for the specified list of AWS Glue tables.

Request

NextToken – UTF-8 string.

A pagination token to offset the results.
MaxResults – Number (integer), not less than 1 or more than 1000.

The maximum number of results to return.
Filter – A DataQualityRulesetFilterCriteria object.

The filter criteria.
Tags – A map array of key-value pairs, not more than 50 pairs.

Each key is a UTF-8 string, not less than 1 or more than 128 bytes long.

Each value is a UTF-8 string, not more than 256 bytes long.

A list of key-value pair tags.

Response

Rulesets – An array of DataQualityRulesetListDetails objects.

A paginated list of rulesets for the specified list of AWS Glue tables.
NextToken – UTF-8 string.

A pagination token, if more results are available.

Errors

EntityNotFoundException
InvalidInputException
OperationTimeoutException
InternalServiceException

UpdateDataQualityRuleset action (Python: update_data_quality_ruleset)

Updates the specified data quality ruleset.

Request

Name – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the data quality ruleset.
Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

A description of the ruleset.
Ruleset – UTF-8 string, not less than 1 or more than 65536 bytes long.

A Data Quality Definition Language (DQDL) ruleset. For more information, see the AWS Glue developer guide.

Response

Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the data quality ruleset.
Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

A description of the ruleset.
Ruleset – UTF-8 string, not less than 1 or more than 65536 bytes long.

A Data Quality Definition Language (DQDL) ruleset. For more information, see the AWS Glue developer guide.

Errors

EntityNotFoundException
AlreadyExistsException
IdempotentParameterMismatchException
InvalidInputException
OperationTimeoutException
InternalServiceException
ResourceNumberLimitExceededException

ListDataQualityStatistics action (Python: list_data_quality_statistics)

Retrieves a list of data quality statistics.

Request

StatisticId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Statistic ID.
ProfileId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Profile ID.
TimestampFilter – A TimestampFilter object.

A timestamp filter.
MaxResults – Number (integer), not less than 1 or more than 1000.

The maximum number of results to return in this request.
NextToken – UTF-8 string.

A pagination token to request the next page of results.

Response

Statistics – An array of StatisticSummary objects.

A list of StatisticSummary objects.
NextToken – UTF-8 string.

A pagination token to request the next page of results.

Errors

EntityNotFoundException
InvalidInputException
InternalServiceException

TimestampFilter structure

A timestamp filter.

Fields

RecordedBefore – Timestamp.

The timestamp before which statistics should be included in the results.
RecordedAfter – Timestamp.

The timestamp after which statistics should be included in the results.
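
As a small illustration, the sketch below builds a TimestampFilter covering the last N days. It assumes the filter is passed as a dictionary of timezone-aware `datetime` values, which is how boto3 generally accepts timestamps. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def recorded_window(days: int, now: Optional[datetime] = None) -> Dict[str, datetime]:
    """Return a TimestampFilter selecting statistics from the last `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    end = now if now is not None else datetime.now(timezone.utc)
    return {
        "RecordedAfter": end - timedelta(days=days),  # lower bound
        "RecordedBefore": end,                        # upper bound
    }


# Fixed reference time so the example is reproducible.
reference = datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)
window = recorded_window(7, now=reference)
```

The resulting dictionary could be supplied as the TimestampFilter argument to list_data_quality_statistics or list_data_quality_statistic_annotations.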

CreateDataQualityRulesetRequest structure

A request to create a data quality ruleset.

Fields

Name – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A unique name for the data quality ruleset.
Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

A description of the data quality ruleset.
Ruleset – Required: UTF-8 string, not less than 1 or more than 65536 bytes long.

A Data Quality Definition Language (DQDL) ruleset. For more information, see the AWS Glue developer guide.
Tags – A map array of key-value pairs, not more than 50 pairs.

Each key is a UTF-8 string, not less than 1 or more than 128 bytes long.

Each value is a UTF-8 string, not more than 256 bytes long.

A list of tags applied to the data quality ruleset.
TargetTable – A DataQualityTargetTable object.

A target table associated with the data quality ruleset.
RecommendationRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A unique run ID for the recommendation run.
DataQualitySecurityConfiguration – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the security configuration created with the data quality encryption option.
ClientToken – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.

GetDataQualityRulesetResponse structure

Returns the data quality ruleset response.

Fields

Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the ruleset.
Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

A description of the ruleset.
Ruleset – UTF-8 string, not less than 1 or more than 65536 bytes long.

A Data Quality Definition Language (DQDL) ruleset. For more information, see the AWS Glue developer guide.
TargetTable – A DataQualityTargetTable object.

The name and database name of the target table.
CreatedOn – Timestamp.

A timestamp. The time and date that this data quality ruleset was created.
LastModifiedOn – Timestamp.

A timestamp. The last point in time when this data quality ruleset was modified.
RecommendationRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

When a ruleset was created from a recommendation run, this run ID is generated to link the two together.
DataQualitySecurityConfiguration – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the security configuration created with the data quality encryption option.

GetDataQualityResultResponse structure

The response for the data quality result.

Fields

ResultId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A unique result ID for the data quality result.
ProfileId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Profile ID for the data quality result.
Score – Number (double), not more than 1.0.

An aggregate data quality score. Represents the ratio of rules that passed to the total number of rules.
DataSource – A DataSource object.

The table associated with the data quality result, if any.
RulesetName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the ruleset associated with the data quality result.
EvaluationContext – UTF-8 string.

In an AWS Glue Studio job, each node on the canvas is typically assigned a name, including data quality nodes. When a job contains multiple data quality nodes, the evaluationContext distinguishes between them.
StartedOn – Timestamp.

The date and time when the run for this data quality result started.
CompletedOn – Timestamp.

The date and time when the run for this data quality result was completed.
JobName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The job name associated with the data quality result, if any.
JobRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The job run ID associated with the data quality result, if any.
RulesetEvaluationRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique run ID associated with the ruleset evaluation.
RuleResults – An array of DataQualityRuleResult objects, not more than 2000 structures.

A list of DataQualityRuleResult objects representing the results for each rule.
AnalyzerResults – An array of DataQualityAnalyzerResult objects, not more than 2000 structures.

A list of DataQualityAnalyzerResult objects representing the results for each analyzer.
Observations – An array of DataQualityObservation objects, not more than 50 structures.

A list of DataQualityObservation objects representing the observations generated after evaluating the rules and analyzers.
AggregatedMetrics – A DataQualityAggregatedMetrics object.

A summary of DataQualityAggregatedMetrics objects showing the total counts of processed rows and rules, including their pass/fail statistics based on row-level results.

StartDataQualityRuleRecommendationRunRequest structure

The request for a data quality rule recommendation run.

Fields

DataSource – Required: A DataSource object.

The data source (AWS Glue table) associated with this run.
Role – Required: UTF-8 string.

An IAM role supplied to encrypt the results of the run.
NumberOfWorkers – Number (integer).

The number of G.1X workers to be used in the run. The default is 5.
Timeout – Number (integer), at least 1.

The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
CreatedRulesetName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

A name for the ruleset.
DataQualitySecurityConfiguration – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the security configuration created with the data quality encryption option.
ClientToken – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.

GetDataQualityRuleRecommendationRunResponse structure

The response for the Data Quality rule recommendation run.

Fields

RunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The unique run identifier associated with this run.
DataSource – A DataSource object.

The data source (an AWS Glue table) associated with this run.
Role – UTF-8 string.

An IAM role supplied to encrypt the results of the run.
NumberOfWorkers – Number (integer).

The number of G.1X workers to be used in the run. The default is 5.
Timeout – Number (integer), at least 1.

The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
Status – UTF-8 string (valid values: RUNNING | FINISHED | FAILED | PENDING_EXECUTION | TIMED_OUT | CANCELING | CANCELED | RECEIVED_BY_TASKRUNNER).

The status for this run.
ErrorString – UTF-8 string.

The error strings that are associated with the run.
StartedOn – Timestamp.

The date and time when this run started.
LastModifiedOn – Timestamp.

A timestamp. The last point in time when this data quality rule recommendation run was modified.
CompletedOn – Timestamp.

The date and time when this run was completed.
ExecutionTime – Number (integer).

The amount of time (in seconds) that the run consumed resources.
RecommendedRuleset – UTF-8 string, not less than 1 or more than 65536 bytes long.

When a rule recommendation run completes, it creates a recommended ruleset (a set of rules). This member contains those rules in Data Quality Definition Language (DQDL) format.
CreatedRulesetName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the ruleset that was created by the run.
DataQualitySecurityConfiguration – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The name of the security configuration created with the data quality encryption option.

BatchPutDataQualityStatisticAnnotation action (Python: batch_put_data_quality_statistic_annotation)

Annotates datapoints over time for a specific data quality statistic. Each InclusionAnnotation in the input must include both a profileId and a statisticId. A single call can cover multiple profiles, but only one statisticId.

Request

InclusionAnnotations – Required: An array of DatapointInclusionAnnotation objects.

A list of DatapointInclusionAnnotation objects. Each InclusionAnnotation must contain a profileId and a statisticId. If there are multiple InclusionAnnotations, the list must refer to a single statisticId across multiple profileIds.
ClientToken – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

Client Token.

Response

FailedInclusionAnnotations – An array of AnnotationError objects.

A list of AnnotationError objects.

Errors

EntityNotFoundException
InvalidInputException
InternalServiceException
ResourceNumberLimitExceededException
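
The constraint that every annotation names a profile and a statistic, and that the whole list shares one statistic, can be checked before the call. The following is a minimal sketch of such a check. It assumes each DatapointInclusionAnnotation is a dictionary with `ProfileId` and `StatisticId` keys (the structure is defined outside this section), and the function name is hypothetical.

```python
from typing import Any, Dict, List


def check_inclusion_annotations(annotations: List[Dict[str, Any]]) -> str:
    """Validate the list and return the single StatisticId it refers to.

    Raises ValueError if the list is empty, an entry lacks ProfileId or
    StatisticId, or more than one StatisticId appears.
    """
    if not annotations:
        raise ValueError("InclusionAnnotations is required and cannot be empty")

    statistic_ids = set()
    for index, annotation in enumerate(annotations):
        for key in ("ProfileId", "StatisticId"):
            if not annotation.get(key):
                raise ValueError(f"annotation {index} is missing {key}")
        statistic_ids.add(annotation["StatisticId"])

    if len(statistic_ids) > 1:
        raise ValueError(
            "all annotations must share one StatisticId, got "
            + ", ".join(sorted(statistic_ids))
        )
    return statistic_ids.pop()
```

Running this check first turns a rejected request into a local error with a specific message. The service remains the final authority on what it accepts.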

GetDataQualityModel action (Python: get_data_quality_model)

Retrieve the training status of the model along with more information (CompletedOn, StartedOn, FailureReason).

Request

StatisticId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Statistic ID.
ProfileId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Profile ID.

Response

Status – UTF-8 string (valid values: RUNNING | SUCCEEDED | FAILED).

The training status of the data quality model.
StartedOn – Timestamp.

The timestamp when the data quality model training started.
CompletedOn – Timestamp.

The timestamp when the data quality model training completed.
FailureReason – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The training failure reason.

Errors

EntityNotFoundException
InvalidInputException
OperationTimeoutException
InternalServiceException

GetDataQualityModelResult action (Python: get_data_quality_model_result)

Retrieve a statistic's predictions for a given Profile ID.

Request

StatisticId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Statistic ID.
ProfileId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Profile ID.

Response

CompletedOn – Timestamp.

The timestamp when the data quality model training completed.
Model – An array of StatisticModelResult objects.

A list of StatisticModelResult objects.

Errors

EntityNotFoundException
InvalidInputException
OperationTimeoutException
InternalServiceException

ListDataQualityStatisticAnnotations action (Python: list_data_quality_statistic_annotations)

Retrieve annotations for a data quality statistic.

Request

StatisticId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Statistic ID.
ProfileId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The Profile ID.
TimestampFilter – A TimestampFilter object.

A timestamp filter.
MaxResults – Number (integer), not less than 1 or more than 1000.

The maximum number of results to return in this request.
NextToken – UTF-8 string.

A pagination token to retrieve the next set of results.

Response

Annotations – An array of StatisticAnnotation objects.

A list of StatisticAnnotation objects applied to the statistic.
NextToken – UTF-8 string.

A pagination token to retrieve the next set of results.

Errors

InvalidInputException
InternalServiceException

PutDataQualityProfileAnnotation action (Python: put_data_quality_profile_annotation)

Annotate all datapoints for a Profile.

Request

ProfileId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

The ID of the data quality monitoring profile to annotate.
InclusionAnnotation – Required: UTF-8 string (valid values: INCLUDE | EXCLUDE).

The inclusion annotation value to apply to the profile.

Response

No Response parameters.

Errors

EntityNotFoundException
InvalidInputException
InternalServiceException
