Creating a schema mapping - AWS Entity Resolution

Creating a schema mapping

This topic describes how to create a schema mapping using the AWS Entity Resolution console.

There are three ways to create a schema mapping:

  • Import existing input data using the Import from AWS Glue option – Use this creation method to define input fields in a guided flow, starting with pre-populated columns from an AWS Glue table.

  • Manually define input data using the Build custom schema option – Use this creation method to define the input fields manually in a guided flow.

  • Manually create the schema mapping using the Use JSON editor option – Use a JSON editor to write the schema mapping yourself, start from a sample, or import an existing JSON file.

    Note

    The Unique ID and Input fields aren't available with this option.

Import from AWS Glue
To create a schema mapping by importing existing input data from AWS Glue
  1. Sign in to the AWS Management Console and open the AWS Entity Resolution console with your AWS account, if you haven't yet done so.

  2. In the left navigation pane, under Data preparation, choose Schema mappings.

  3. On the Schema mappings page, in the upper right corner, choose Create schema mapping.

  4. For Step 1: Specify schema details, do the following:

    1. For Name and creation method, enter a Schema mapping name and an optional Description.

    2. For Creation method, choose Import from AWS Glue.

    3. Choose the AWS Glue database from the dropdown, and then choose the AWS Glue table from the dropdown.

      To create a new table, go to the AWS Glue console at https://console.aws.amazon.com/glue/. For more information, see AWS Glue tables in the AWS Glue User Guide.

    4. For Unique ID, specify the column that distinctly references each row of your data.

      For example: Primary_key, Row_ID, or Record_ID.

      Note

      The Unique ID column is required. The Unique ID must be a unique identifier within a single table. However, across different tables, the Unique ID can have duplicate values. If the Unique ID isn't specified, isn't unique within the same source, or overlaps in terms of attribute names across sources, then AWS Entity Resolution rejects the record when the matching workflow is run. If you are using this schema mapping in a rule-based matching workflow, the Unique ID must not exceed 38 characters.

    5. For Input fields, choose the columns you want to use for matching and for optional pass through.

      You can choose a maximum of 34 columns total for both matching and pass through.

      1. Under Matching, choose the columns you want to use as input fields for matching.

        You can choose a maximum of 24 columns total for matching.

      2. Select Add columns for pass through if you want to specify the columns that aren't used for matching.

      3. (Optional) Under Pass through, choose the columns to include as pass through columns.

    6. (Optional) If you want to enable Tags for the resource, choose Add new tag, and then enter the Key and Value pair.

    7. Choose Next.

  5. For Step 2: Map input fields, define the input fields you want to use for matching and for optional pass through.

    1. For Input fields for matching, for each Input field, specify the Input type, Match key, and Hashing status.

      The Input type helps you classify the data. The Match key enables input field comparison to your matching workflow. The Hashing status indicates if the column value for that input field is hashed or cleartext.

      Note

      If you're creating a schema mapping to use with the LiveRamp provider service-based matching technique, then you can:

      • Specify the Input type as LiveRamp ID.

      • Specify the name field as either multiple fields (such as first_name, last_name) or in one field.

      • Specify the street address field as either multiple fields (such as address1, address2) or in one field.

        If matching against an address, a zip code is required.

      • Include email or phone with name, and those fields can match against the street address.

      Note

      If you're creating a schema mapping to use with the machine learning-based matching workflow, your dataset must contain at least one of the following attributes: phonenumber, emailaddress, fullname, addresses, or birthdate.

      Don't specify the Input type for any of these attributes as a Custom string.

    2. (Optional) For Input fields for pass through, add the input fields that won't be matched and their corresponding Hashing status.

      The Hashing status indicates if the column value for that input field is hashed or cleartext.

    3. Choose Next.

  6. For Step 3: Group data, do the following:

    1. Choose the related Name fields, and then enter the Group name and Match key.

      For example, choose input fields First name, Middle name, and Last name. Then enter a Group name called “Full name” and a Match key called “Full name” to enable the comparison.

    2. Choose the related Address fields, and then enter the Group name and Match key.

      For example, choose input fields Home street address 1, Home street address 2, and Home city. Then enter a Group name called “Shipping address” and a Match key called “Shipping address” to enable the comparison.

    3. Choose the related Phone number fields, and then enter the Group name and Match key.

      For example, choose input fields Home phone 1, Home phone 2, and Cell phone. Then enter a Group name called “Shipping phone number” and a Match key called “Shipping phone number” to enable the comparison.

      If you have more than one type of data, you can add more groups.

    4. Choose Next.

  7. For Step 4: Review and create, do the following:

    1. Review the selections that you made for the previous steps and edit if necessary.

    2. Choose Create schema mapping.

      Note

      You can’t modify a schema mapping after you associate it with a workflow. You can clone a schema mapping if you want to use an existing configuration to create a new schema mapping.

After you create the schema mapping, you're ready to create a matching workflow or create an ID namespace.
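
The limits stated in this procedure (at most 34 columns in total, at most 24 of them for matching, and a Unique ID that is present, unique within one table, and no longer than 38 characters for rule-based matching) can be checked locally before you open the console. The following is a minimal sketch, not part of the service: the function names are hypothetical, and it assumes that the 38-character limit applies to each Unique ID value.

```python
# Limits taken from the procedure above.
MAX_TOTAL_COLUMNS = 34         # matching + pass through
MAX_MATCHING_COLUMNS = 24      # matching only
MAX_RULE_BASED_ID_LENGTH = 38  # assumption: applies to each Unique ID value


def check_column_selection(matching, pass_through=()):
    """Return a list of problems with a planned column selection (empty if none)."""
    problems = []
    if len(matching) > MAX_MATCHING_COLUMNS:
        problems.append(
            f"{len(matching)} matching columns; the limit is {MAX_MATCHING_COLUMNS}"
        )
    total = len(matching) + len(pass_through)
    if total > MAX_TOTAL_COLUMNS:
        problems.append(f"{total} columns in total; the limit is {MAX_TOTAL_COLUMNS}")
    return problems


def check_unique_ids(values, rule_based=False):
    """Return a list of problems with the Unique ID values of a single table."""
    problems = []
    seen = set()
    for row, value in enumerate(values, start=1):
        text = "" if value is None else str(value)
        if not text:
            problems.append(f"row {row}: Unique ID is missing")
            continue
        if text in seen:
            problems.append(f"row {row}: Unique ID {text!r} is duplicated")
        seen.add(text)
        if rule_based and len(text) > MAX_RULE_BASED_ID_LENGTH:
            problems.append(
                f"row {row}: Unique ID is longer than {MAX_RULE_BASED_ID_LENGTH} characters"
            )
    return problems
```

An empty list from both functions means the selection stays within the limits described above. It does not guarantee that the matching workflow accepts every record.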

Build custom schema
To create a schema mapping using the Build custom schema option
  1. Sign in to the AWS Management Console and open the AWS Entity Resolution console with your AWS account, if you haven't yet done so.

  2. In the left navigation pane, under Data preparation, choose Schema mappings.

  3. On the Schema mappings page, in the upper right corner, choose Create schema mapping.

  4. For Step 1: Specify schema details, do the following:

    1. For Name and creation method, enter a Schema mapping name and an optional Description.

    2. For Creation method, choose Build custom schema.

    3. For Unique ID, enter a unique ID to identify each row of your data.

      For example: Primary_key, Row_ID, or Record_ID.

      Note

      The Unique ID column is required. The Unique ID must be a unique identifier within a single table. However, across different tables, the Unique ID can have duplicate values. If the Unique ID isn't specified, isn't unique within the same source, or overlaps in terms of attribute names across sources, then AWS Entity Resolution rejects the record when the matching workflow is run. If you are using this schema mapping in a rule-based matching workflow, the Unique ID must not exceed 38 characters.

    4. (Optional) If you want to enable Tags for the resource, choose Add new tag, and then enter the Key and Value pair.

    5. Choose Next.

  5. For Step 2: Map input fields, define the input fields you want to use for matching and for optional pass through.

    You can define a maximum of 34 columns total for both matching and pass through.

    1. For Input fields for matching, add an Input field, and its corresponding Input type, Match key, and Hashing status.

      You can add a maximum of 24 input fields total for matching.

      The Input type helps you classify the data. The Match key enables input field comparison to your matching workflow. The Hashing status indicates if the column value for that input field is hashed or cleartext.

      Note

      If you're creating a schema mapping to use with the LiveRamp provider service-based matching technique, then you can specify the Input type as LiveRamp ID. If you want to include PII data in the output, then you must specify the Input type as Custom string.

      Note

      If you're creating a schema mapping to use with the machine learning-based matching workflow, your dataset must contain at least one of the following attributes: phonenumber, emailaddress, fullname, addresses, or birthdate.

      Don't specify the Input type for any of these attributes as a Custom string.

    2. (Optional) For Input fields for pass through, add the input fields that won't be matched and their corresponding Hashing status.

    3. Choose Next.

  6. For Step 3: Group data, do the following:

    1. Choose the related Name fields, and then enter the Group name and Match key.

      For example, choose input fields First name, Middle name, and Last name. Then enter a Group name called “Full name” and a Match key called “Full name” to enable the comparison.

    2. Choose the related Address fields, and then enter the Group name and Match key.

      For example, choose input fields Home street address 1, Home street address 2, and Home city. Then enter a Group name called “Shipping address” and a Match key called “Shipping address” to enable the comparison.

    3. Choose the related Phone number fields, and then enter the Group name and Match key.

      For example, choose input fields Home phone 1, Home phone 2, and Cell phone. Then enter a Group name called “Shipping phone number” and a Match key called “Shipping phone number” to enable the comparison.

      If you have more than one type of data, you can add more groups.

    4. Choose Next.

  7. For Step 4: Review and create, do the following:

    1. Review the selections that you made for the previous steps and edit if necessary.

    2. Choose Create schema mapping.

      Note

      You can’t modify a schema mapping after you associate it with a workflow. You can clone a schema mapping if you want to use an existing configuration to create a new schema mapping.

After you create the schema mapping, you're ready to create a matching workflow or create an ID namespace.
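
The grouping in Step 3 amounts to tagging related input fields with a shared Group name and Match key. The sketch below models that with plain dictionaries so that the result can be planned before it is entered in the console. The column names, the key names (fieldName, type, groupName, matchKey), and the type values such as NAME_FIRST are assumptions for illustration and are not defined by this procedure.

```python
def group_fields(fields, group_name, match_key):
    """Return copies of the field records tagged with a shared group and match key."""
    return [dict(field, groupName=group_name, matchKey=match_key) for field in fields]


def summarize_groups(fields):
    """Map each group name to the field names it contains, in input order."""
    groups = {}
    for field in fields:
        groups.setdefault(field["groupName"], []).append(field["fieldName"])
    return groups


# Hypothetical columns; the type values are assumed, not taken from this page.
name_fields = [
    {"fieldName": "first_name", "type": "NAME_FIRST"},
    {"fieldName": "middle_name", "type": "NAME_MIDDLE"},
    {"fieldName": "last_name", "type": "NAME_LAST"},
]
address_fields = [
    {"fieldName": "home_street_1", "type": "ADDRESS_STREET1"},
    {"fieldName": "home_street_2", "type": "ADDRESS_STREET2"},
    {"fieldName": "home_city", "type": "ADDRESS_CITY"},
]

# Same group names and match keys as the examples in Step 3.
grouped = (
    group_fields(name_fields, "Full name", "Full name")
    + group_fields(address_fields, "Shipping address", "Shipping address")
)
group_summary = summarize_groups(grouped)
```

Here group_summary lists the three name columns under "Full name" and the three address columns under "Shipping address". The input lists are left unchanged because group_fields returns copies.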

Use JSON editor
To create a schema mapping by using the JSON editor
  1. Sign in to the AWS Management Console and open the AWS Entity Resolution console with your AWS account, if you haven't yet done so.

  2. In the left navigation pane, under Data preparation, choose Schema mappings.

  3. On the Schema mappings page, in the upper right corner, choose Create schema mapping.

  4. For Step 1: Specify schema details, do the following:

    1. For Name and creation method, enter a Schema mapping name and an optional Description.

    2. For Creation method, choose Use JSON editor.

    3. (Optional) If you want to enable Tags for the resource, choose Add new tag, and then enter the Key and Value pair.

    4. Choose Next.

  5. For Step 2: Specify mapping, do the following:

    1. Start building the schema in the JSON editor or choose one of the following options based on your goal:

      Your goal                             Recommended option
      Start building your schema mapping    Insert sample JSON, and then edit the information as necessary.
      Use an existing JSON file             Import from file

    2. Choose Next.

  6. For Step 3: Review and create, do the following:

    1. Review the selections that you made for the previous steps and edit if necessary.

    2. Choose Create schema mapping.

      Note

      You can’t modify a schema mapping after you associate it with a workflow. You can clone a schema mapping if you want to use an existing configuration to create a new schema mapping.

After you create the schema mapping, you're ready to create a matching workflow or create an ID namespace.
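
If you keep your schema in code, you can generate the JSON text to paste into the editor or to save for the Import from file option. The following sketch is illustrative only. It assumes a document with a mappedInputFields list whose entries use the keys fieldName, type, matchKey, groupName, and hashed, and it assumes that the Unique ID is declared as a field of type UNIQUE_ID. None of these names are confirmed by this procedure, so compare the output with what Insert sample JSON produces before you rely on it.

```python
import json

# Assumed optional keys for each field entry; not confirmed by this page.
OPTIONAL_KEYS = ("matchKey", "groupName", "hashed")


def build_schema_document(fields):
    """Return JSON text for a schema mapping built from simple field records."""
    mapped = []
    for field in fields:
        entry = {"fieldName": field["fieldName"], "type": field["type"]}
        for key in OPTIONAL_KEYS:
            if key in field:
                entry[key] = field[key]
        mapped.append(entry)

    # The Unique ID is required; this sketch expects exactly one such field.
    unique_ids = [entry for entry in mapped if entry["type"] == "UNIQUE_ID"]
    if len(unique_ids) != 1:
        raise ValueError("expected exactly one field of type UNIQUE_ID")

    return json.dumps({"mappedInputFields": mapped}, indent=2)


# Hypothetical columns and type values, for illustration only.
example_fields = [
    {"fieldName": "record_id", "type": "UNIQUE_ID"},
    {"fieldName": "email", "type": "EMAIL_ADDRESS", "matchKey": "Email", "hashed": False},
    {"fieldName": "loyalty_tier", "type": "STRING"},
]
schema_json = build_schema_document(example_fields)
```

The value of schema_json is plain text, so it can be pasted into the editor or written to a file for import.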
