Match input data using a matching workflow

A matching workflow is a data processing job that combines and compares data from different input sources and determines which records match based on different matching techniques. AWS Entity Resolution reads your data from your specified locations, finds matches between records, and assigns a Match ID to each matched set of data.

The following diagram summarizes how to create a matching workflow.

[Diagram: a summary of the four steps to create a matching workflow in AWS Entity Resolution]

Matching workflow types

AWS Entity Resolution supports three types of matching workflows:

Rule-based matching: Uses configurable rules to identify matching records based on exact or fuzzy matching of specified fields. You define the matching criteria, such as matching names that are spelled similarly or addresses that are formatted differently.
Machine learning-based matching: Uses machine learning models to identify similar records, even when the data has variations, errors, or missing fields. This approach can detect more complex matches than rule-based matching.
Provider service-based matching: Uses third-party data providers to enrich and validate your data before matching. This type of matching is not compatible with Amazon Connect Customer Profiles output.

Data output options

AWS Entity Resolution can write data output files to:

An Amazon S3 location that you specify
Amazon Connect Customer Profiles (for customer data deduplication)

Important

Exporting to Amazon Connect Customer Profiles is not compatible with provider service-based matching. To export to Amazon Connect Customer Profiles, you must use rule-based matching or machine learning-based matching.

You can optionally have AWS Entity Resolution hash the output data, which helps you maintain control over your data.

The following table shows the three types of matching workflows and their supported output destinations.

Matching type	S3 output	Customer Profiles output
Rule-based	Yes	Yes
Machine learning-based	Yes	Yes
Provider service-based	Yes	No
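
The compatibility rules in the table above can be checked before a workflow is configured. The following is a minimal sketch, not an AWS API: the enum names, `SUPPORTED_OUTPUTS`, and `is_supported` are hypothetical helpers introduced only for illustration.

```python
from enum import Enum


class MatchingType(Enum):
    """Hypothetical labels for the three matching workflow types."""
    RULE_BASED = "rule-based"
    ML_BASED = "machine learning-based"
    PROVIDER_BASED = "provider service-based"


class OutputDestination(Enum):
    """Hypothetical labels for the two output destinations."""
    S3 = "S3"
    CUSTOMER_PROFILES = "Customer Profiles"


# Mirrors the table: every type supports S3 output; only provider
# service-based matching cannot write to Customer Profiles.
SUPPORTED_OUTPUTS = {
    MatchingType.RULE_BASED: {OutputDestination.S3, OutputDestination.CUSTOMER_PROFILES},
    MatchingType.ML_BASED: {OutputDestination.S3, OutputDestination.CUSTOMER_PROFILES},
    MatchingType.PROVIDER_BASED: {OutputDestination.S3},
}


def is_supported(matching_type, destination):
    """Return True if the matching type can write to the destination."""
    return destination in SUPPORTED_OUTPUTS[matching_type]
```

For example, `is_supported(MatchingType.PROVIDER_BASED, OutputDestination.CUSTOMER_PROFILES)` returns `False`, which corresponds to the "No" cell in the table.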

Matching workflow results

After you create and run a matching workflow, you can view the results in your specified S3 location or in Amazon Connect Customer Profiles. Matching workflows generate Match IDs after the data is indexed.

A matching workflow can have multiple runs. The results of each run (successes or errors) are written to a folder named with that run's jobId.

For each run with an S3 output destination:

The data output contains both a file for successful matches and a file for errors
Successful results are written to a success folder containing multiple files
Errors are written to an error folder with multiple fields
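
As an illustration of the per-run layout described above, the following sketch groups a list of S3 object keys into success and error files for one run. It assumes that the jobId folder contains folders literally named `success` and `error`; the exact key layout, the sample keys, and the `split_run_keys` helper are assumptions for illustration, so check them against your own output location.

```python
def split_run_keys(keys, job_id):
    """Group object keys for one run into success and error lists.

    Assumes keys look like '<prefix>/<job_id>/success/<file>' and
    '<prefix>/<job_id>/error/<file>' (hypothetical layout).
    """
    grouped = {"success": [], "error": []}
    for key in keys:
        parts = key.split("/")
        if job_id not in parts:
            continue  # key belongs to a different run
        # Look only at the path segments that follow the jobId folder.
        after_job = parts[parts.index(job_id) + 1:]
        # Exclude the last segment (the file name) when looking for folders.
        folders = after_job[:-1]
        for name in grouped:
            if name in folders:
                grouped[name].append(key)
                break
    return grouped
```

The key list itself could come from any S3 listing call; keeping the grouping logic separate from the listing makes it easy to test without AWS access.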

For each run with an Amazon Connect Customer Profiles output destination:

Deduplicated customer records are sent directly to your Amazon Connect instance
You can view your recent job history in the AWS Entity Resolution console
Existing profiles in Amazon Connect are not included in the deduplication process

After you create and run a matching workflow, you can use the output of rule-based matching or machine learning-based matching as an input to provider service-based matching, or vice versa, to meet your business needs.

For example, to save provider subscription costs, you can first run rule-based matching to find matches on your data. Then, you can send a subset of unmatched records to provider service-based matching. Note that if you plan to export to Customer Profiles, you should use rule-based or machine learning-based matching only.
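
The subset of unmatched records in the example above could be selected along these lines. This is a sketch under stated assumptions: it assumes each output record carries its Match ID in a field named `MatchID` (a hypothetical field name here) and treats a record as unmatched when no other record shares its Match ID. The `unmatched_records` helper is illustrative, not part of AWS Entity Resolution.

```python
from collections import Counter


def unmatched_records(records, match_id_field="MatchID"):
    """Return records whose Match ID is not shared with any other record.

    `records` is a list of dicts; `match_id_field` is an assumed field name.
    """
    # Count how many records fall into each matched set.
    counts = Counter(record[match_id_field] for record in records)
    # A set of size one means the record matched nothing else.
    return [r for r in records if counts[r[match_id_field]] == 1]
```

The returned records are candidates to send to provider service-based matching, while records that already matched in the rule-based run are left out.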

For more information about troubleshooting errors, see Troubleshooting matching workflows.
