Working with crawlers on the AWS Glue console

A crawler accesses your data store, extracts metadata, and creates table definitions in the AWS Glue Data Catalog. The Crawlers pane in the AWS Glue console lists all the crawlers that you create. The list displays status and metrics from the last run of your crawler.

To add a crawler using the console

  1. Sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/. Choose Crawlers in the navigation pane.

  2. Choose Add crawler, and follow the instructions in the Add crawler wizard.

    Note

    To get step-by-step guidance for adding a crawler, choose Add crawler under Tutorials in the navigation pane. You can also use the Add crawler wizard to create and modify an IAM role that attaches a policy that includes permissions for your Amazon Simple Storage Service (Amazon S3) data stores.

    Optionally, you can tag your crawler with a Tag key and optional Tag value. Once created, tag keys are read-only. Use tags on some resources to help you organize and identify them. For more information, see AWS tags in AWS Glue.

    Optionally, you can add a security configuration to a crawler to specify at-rest encryption options.

When a crawler runs, the provided IAM role must have permission to access the data store that is crawled.
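The wizard choices described above (name, IAM role, data store, optional tags, optional security configuration) correspond roughly to the parameters of the AWS Glue `CreateCrawler` API. The following is a minimal sketch rather than a definitive implementation. It assumes Python with boto3 installed and AWS credentials configured; `build_crawler_request` is a hypothetical helper, and the crawler name, role ARN, database, bucket path, tag, and security configuration name are placeholders, not values from this guide.

```python
def build_crawler_request(name, role, database, s3_path,
                          tags=None, security_configuration=None):
    """Assemble keyword arguments for the Glue CreateCrawler call.

    Hypothetical helper: it only builds a dictionary and makes no AWS calls.
    """
    if not name:
        raise ValueError("a crawler needs a unique, non-empty name")
    request = {
        "Name": name,
        "Role": role,              # IAM role that can read the data store
        "DatabaseName": database,  # Data Catalog database for the new tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if tags:
        # Tag key -> optional tag value
        request["Tags"] = dict(tags)
    if security_configuration:
        # Name of an existing security configuration (at-rest encryption)
        request["CrawlerSecurityConfiguration"] = security_configuration
    return request


def create_crawler(request):
    """Send the request to AWS Glue (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without it

    glue = boto3.client("glue")
    glue.create_crawler(**request)


# All values below are placeholders chosen for illustration.
request = build_crawler_request(
    name="example-crawler",
    role="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="example_db",
    s3_path="s3://example-bucket/data/",
    tags={"team": "analytics"},
    security_configuration="example-security-config",
)
```

Calling `create_crawler(request)` would submit the request. The role named in `Role` still needs permission to read the S3 path, exactly as it does when the crawler is created in the console.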

When you crawl a JDBC data store, a connection is required. For more information, see Adding an AWS Glue connection. An exclude path is relative to the include path. For example, to exclude a table in your JDBC data store, type the table name in the exclude path.

When you crawl DynamoDB tables, you can choose one table name from the list of DynamoDB tables in your account.
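As an illustration of how these data store settings look outside the console, the sketch below builds the `Targets` structures that the `CreateCrawler` API accepts for JDBC and DynamoDB data stores. It is written under stated assumptions: the helper functions are hypothetical, the connection name, include path, and table names are placeholders, and the `database/%` include path format is assumed for the example database engine.

```python
def jdbc_target(connection_name, include_path, exclusions=()):
    """Describe one JDBC data store for a crawler.

    Exclusion patterns are relative to the include path, so a bare
    table name excludes that table under the included database.
    """
    return {
        "ConnectionName": connection_name,  # an existing AWS Glue connection
        "Path": include_path,
        "Exclusions": list(exclusions),
    }


def dynamodb_target(table_name):
    """Describe one DynamoDB table for a crawler (one table per target)."""
    return {"Path": table_name}


# Placeholder values: crawl every table in salesdb except internal_audit.
jdbc_targets = {
    "JdbcTargets": [
        jdbc_target("example-connection", "salesdb/%", ["internal_audit"])
    ]
}

# Placeholder value: crawl a single DynamoDB table by name.
dynamodb_targets = {"DynamoDBTargets": [dynamodb_target("ExampleOrders")]}
```

Either dictionary could be passed as the `Targets` argument of a `create_crawler` call in boto3.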

Tip

For more information about configuring crawlers, see Crawler properties.

Viewing crawler results and details

After the crawler runs successfully, it creates table definitions in the Data Catalog. Choose Tables in the navigation pane to see the tables that were created by your crawler in the database that you specified.
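The same check can be scripted. The sketch below lists the tables in a database and keeps those attributed to a given crawler. It assumes Python with boto3, and it assumes that crawler-created tables carry an `UPDATED_BY_CRAWLER` entry in their table parameters; verify that assumption against your own catalog. The function names, database name, and sample records are hypothetical.

```python
def tables_from_crawler(tables, crawler_name):
    """Return sorted names of tables whose parameters name the crawler.

    Assumption: crawler-managed tables have Parameters["UPDATED_BY_CRAWLER"].
    """
    return sorted(
        table["Name"]
        for table in tables
        if table.get("Parameters", {}).get("UPDATED_BY_CRAWLER") == crawler_name
    )


def fetch_tables(database):
    """Read every table definition in a database (needs boto3 and credentials)."""
    import boto3  # imported lazily so the filter above runs without it

    glue = boto3.client("glue")
    tables = []
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database):
        tables.extend(page["TableList"])
    return tables


# Hand-written sample records shaped like GetTables output, for illustration.
sample_tables = [
    {"Name": "orders", "Parameters": {"UPDATED_BY_CRAWLER": "example-crawler"}},
    {"Name": "manual_table", "Parameters": {}},
    {"Name": "clicks", "Parameters": {"UPDATED_BY_CRAWLER": "example-crawler"}},
]
crawler_tables = tables_from_crawler(sample_tables, "example-crawler")
```

With real data, `tables_from_crawler(fetch_tables("example_db"), "example-crawler")` would apply the same filter to the database that you specified for the crawler.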

You can view information related to the crawler itself as follows:

  • The Crawlers page on the AWS Glue console displays the following properties for a crawler:

    • Name: When you create a crawler, you must give it a unique name.

    • Status: A crawler can be ready, starting, stopping, scheduled, or schedule paused. A running crawler progresses from starting to stopping. You can resume or pause a schedule attached to a crawler.

    • Schedule: You can choose to run your crawler on demand or choose a frequency with a schedule. For more information about scheduling a crawler, see Scheduling a crawler.

    • Last run: The date and time that the crawler last ran.

    • Log: Links to any available logs from the last run of the crawler.

    • Tables changes from last run: The number of tables in the AWS Glue Data Catalog that were updated by the latest run of the crawler.

  • To view the history of a crawler, choose Crawlers in the navigation pane to see the crawlers you created. Choose a crawler from the list of available crawlers. You can view the crawler properties and view the crawler history in the Crawler runs tab.

    The Crawler runs tab displays information about each time the crawler ran, including Start time (UTC), End time (UTC), Duration, Status, DPU hours, and Table changes.

    
                        The screenshot shows the Crawler runs tab when viewing a crawler's details.
  • To see additional information, choose a tab in the crawler details page. Each tab will display information related to the crawler.

    • Schedule: Any schedules created for the crawler will be visible here.

    • Data sources: All data sources scanned by the crawler will be visible here.

    • Classifiers: All classifiers assigned to the crawler will be visible here.

    • Tags: Any tags created and assigned to the crawler will be visible here.
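
The crawler properties and run history described above can also be read programmatically. The sketch below is one possible approach, assuming Python with boto3: it condenses a `GetCrawler` response into a few of the listed properties and turns `ListCrawls` entries into rows similar to the Crawler runs tab. The function names, output keys, and sample records are hypothetical, and API state names may not match the console's status labels exactly.

```python
from datetime import datetime, timezone


def summarize_crawler(crawler):
    """Pick out name, state, schedule, and last-run details of a crawler."""
    schedule = crawler.get("Schedule") or {}
    last = crawler.get("LastCrawl") or {}
    return {
        "name": crawler["Name"],
        "state": crawler.get("State"),
        "schedule": schedule.get("ScheduleExpression", "on demand"),
        "last_run": last.get("StartTime"),
        "last_status": last.get("Status"),
    }


def summarize_runs(crawls):
    """Build one row per run: start, end, duration, status, DPU hours."""
    rows = []
    for crawl in crawls:
        start, end = crawl.get("StartTime"), crawl.get("EndTime")
        duration = (end - start).total_seconds() if start and end else None
        rows.append({
            "start": start,
            "end": end,
            "duration_seconds": duration,
            "status": crawl.get("State"),
            "dpu_hours": crawl.get("DPUHour"),
        })
    return rows


def fetch_crawler_details(crawler_name):
    """Call AWS Glue (needs boto3 and credentials); reads only the first page of runs."""
    import boto3  # imported lazily so the helpers above run without it

    glue = boto3.client("glue")
    crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
    crawls = glue.list_crawls(CrawlerName=crawler_name)["Crawls"]
    return summarize_crawler(crawler), summarize_runs(crawls)


# Hand-written sample records shaped like the API responses, for illustration.
sample_crawler = {"Name": "example-crawler", "State": "READY"}
sample_crawls = [{
    "StartTime": datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    "EndTime": datetime(2024, 1, 1, 12, 3, 30, tzinfo=timezone.utc),
    "State": "COMPLETED",
    "DPUHour": 0.05,
}]
summary = summarize_crawler(sample_crawler)
runs = summarize_runs(sample_crawls)
```

Keeping the summarizing functions separate from the AWS calls lets them be checked against sample records before they are pointed at a real account.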