Step 8: Use a Blueprint to Create a Workflow - AWS Lake Formation

The AWS Lake Formation workflow generates the AWS Glue jobs, crawlers, and triggers that discover and ingest data into your data lake. You create a workflow based on one of the predefined Lake Formation blueprints.

  1. On the Lake Formation console, in the navigation pane, choose Blueprints, and then choose Use blueprint.

  2. On the Use a blueprint page, under Blueprint type, choose Database snapshot.

  3. Under Import source, for Database connection, choose the connection that you just created, datalake-tutorial, or choose an existing connection for your data source.

  4. For Source data path, enter the path from which to ingest data, in the form <database>/<schema>/<table>.

    You can substitute the percent (%) wildcard for schema or table. For databases that support schemas, enter <database>/<schema>/% to match all tables in <schema> within <database>. Oracle Database and MySQL don’t support schema in the path; instead, enter <database>/%. For Oracle Database, <database> is the system identifier (SID).

    For example, if an Oracle database has orcl as its SID, enter orcl/% to match all tables that the user specified in the JDBC connection has access to.

    Important

    This field is case-sensitive.

  5. Under Import target, specify these parameters:

    Target database: lakeformation_tutorial
    Target storage location: s3://<yourName>-datalake-tutorial
    Data format: (Choose Parquet or CSV)

  6. For Import frequency, choose Run on demand.

  7. Under Import options, specify these parameters:

    Workflow name: lakeformationjdbctest
    IAM role: LakeFormationWorkflowRole
    Table prefix: jdbctest

    Note

    The table prefix must be lowercase.

  8. Choose Create, and wait for the console to report that the workflow was successfully created.

    Tip

    Did you get the following error message?

    User: arn:aws:iam::<account-id>:user/<datalake_administrator_user> is not authorized to perform: iam:PassRole on resource:arn:aws:iam::<account-id>:role/LakeFormationWorkflowRole...

    If so, check that you replaced <account-id> in the inline policy for the data lake administrator user with a valid AWS account number.
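
The tip in step 8 comes down to a placeholder that was never filled in. As a minimal sketch, the check below scans a policy document for IAM role ARNs whose account ID field is not a 12-digit number, which is what a leftover <account-id> placeholder looks like. The function name find_account_id_problems, the sample policy, and the example account ID 111122223333 are hypothetical and for illustration only; the policy shape is an assumption and may differ from the inline policy you attached to the data lake administrator user.

```python
import json
import re

# Captures the account ID field of an IAM role ARN:
# arn:aws:iam::<account ID field>:role/<role name>
ROLE_ARN = re.compile(r"arn:aws:iam::([^:]*):role/")


def find_account_id_problems(policy_json: str) -> list[str]:
    """Return one message per role ARN whose account ID is not 12 digits."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]

    problems = []
    for statement in statements:
        resources = statement.get("Resource", [])
        if isinstance(resources, str):  # a single resource may be a string
            resources = [resources]
        for arn in resources:
            match = ROLE_ARN.match(arn)
            if match is None:
                continue  # not an IAM role ARN, so there is nothing to check
            account_id = match.group(1)
            if not re.fullmatch(r"\d{12}", account_id):
                problems.append(
                    f"{arn}: account ID {account_id!r} is not 12 digits"
                )
    return problems


# Hypothetical policy in which the placeholder was never replaced.
UNEDITED_POLICY = json.dumps(
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                "Resource": [
                    "arn:aws:iam::<account-id>:role/LakeFormationWorkflowRole"
                ],
            }
        ],
    }
)

if __name__ == "__main__":
    for problem in find_account_id_problems(UNEDITED_POLICY):
        print(problem)
```

An empty result only means that the ARNs are well formed. It does not confirm that the account ID is the right one for your account.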
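
The source data path rules in step 4 can be illustrated with a short sketch. It compares the database segment exactly, treats a % in the schema or table position as matching any name, and compares every other segment exactly, including case. The helper path_matches and the sample database, schema, and table names are hypothetical. The code only models the behavior described in step 4 and is not how Lake Formation or AWS Glue implements the matching.

```python
def path_matches(pattern: str, table_path: str) -> bool:
    """Check a table path against a source data path pattern.

    Both arguments use the form <database>/<schema>/<table>, or
    <database>/<table> for engines without schemas in the path.
    """
    pattern_parts = pattern.split("/")
    path_parts = table_path.split("/")

    # A two-part pattern never matches a three-part path, and vice versa.
    if len(pattern_parts) != len(path_parts):
        return False

    # The database segment must match exactly; the comparison is case-sensitive.
    if pattern_parts[0] != path_parts[0]:
        return False

    # In the schema and table positions, % matches any name.
    return all(
        expected == "%" or expected == actual
        for expected, actual in zip(pattern_parts[1:], path_parts[1:])
    )


if __name__ == "__main__":
    print(path_matches("orcl/%", "orcl/EMPLOYEES"))         # Oracle SID plus any table
    print(path_matches("orcl/%", "ORCL/EMPLOYEES"))         # differs only in case
    print(path_matches("sales/dbo/%", "sales/dbo/orders"))  # any table in one schema
```

The second call differs from the first only in the case of the SID, which is the kind of mismatch the Important note in step 4 warns about.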