AWS Glue
Developer Guide

Working with Crawlers on the AWS Glue Console

A crawler accesses your data store, extracts metadata, and creates table definitions in the AWS Glue Data Catalog. The Crawlers tab in the AWS Glue console lists all the crawlers that you create. The list displays status and metrics from the last run of your crawler.

To add a crawler using the console:

  1. Sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/. Choose the Crawlers tab.

  2. Choose Add crawler, and follow the instructions in the Add crawler wizard.

    Note

    To get step-by-step guidance for adding a crawler, see the Add a crawler tutorial in the console.
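The wizard steps above can also be approximated programmatically. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) is installed and credentials are configured. The crawler name, role ARN, database name, and Amazon S3 path are hypothetical placeholders, not values from this guide.

```python
def build_crawler_request(name, role_arn, database, s3_path, schedule=None):
    """Assemble keyword arguments for the Glue CreateCrawler operation."""
    request = {
        "Name": name,                # must be unique in your account
        "Role": role_arn,            # IAM role that the crawler assumes
        "DatabaseName": database,    # Data Catalog database for new tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule is not None:
        # Optional cron expression; omit it to run the crawler on demand.
        request["Schedule"] = schedule
    return request


def create_crawler(request):
    """Send the request to AWS Glue (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    example = build_crawler_request(
        name="sample-crawler",
        role_arn="arn:aws:iam::123456789012:role/SampleGlueRole",
        database="sampledatabase",
        s3_path="s3://sample-bucket/data/",
    )
    print(example)
```

Keeping the request builder separate from the call makes it possible to review the crawler definition before anything is created in your account.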

Viewing Crawler Results

To view the results of a crawler, find the crawler name in the list and choose the Logs link. This link takes you to CloudWatch Logs, where you can see details about which tables were created in the AWS Glue Data Catalog and any errors that were encountered. You can manage your log retention period in the CloudWatch console. The default log retention is Never Expire. For more information about how to change the retention period, see Change Log Data Retention in CloudWatch Logs.
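The retention period can also be changed outside the console. The following is a minimal sketch that uses the CloudWatch Logs PutRetentionPolicy operation through boto3. The log group name /aws-glue/crawlers is an assumption about where crawler logs are written; confirm the name in the CloudWatch console before you use it. CloudWatch Logs accepts only certain retention values (for example, 1, 7, 30, or 365 days).

```python
# Assumed log group for crawler logs; verify it in the CloudWatch console.
CRAWLER_LOG_GROUP = "/aws-glue/crawlers"


def set_crawler_log_retention(days, log_group=CRAWLER_LOG_GROUP, logs_client=None):
    """Set how many days CloudWatch Logs keeps events in the log group."""
    if logs_client is None:
        import boto3  # imported lazily; needs valid AWS credentials

        logs_client = boto3.client("logs")
    logs_client.put_retention_policy(
        logGroupName=log_group,
        retentionInDays=days,
    )
```

For example, set_crawler_log_retention(30) would keep crawler logs for 30 days instead of indefinitely.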

To see details of a crawler, choose the crawler name in the list. Crawler details include the information you defined when you created the crawler with the Add crawler wizard. When a crawler run completes, choose the Tables tab to see the tables that were created by your crawler in the database you specified.

Note

The crawler assumes the permissions of the IAM role that you specify when you define it. This IAM role must have permission to extract data from your data store and write to the Data Catalog. The AWS Glue console lists only IAM roles that have a trust policy attached for the AWS Glue service principal. For more information about providing roles for AWS Glue, see Using Identity-Based Policies (IAM Policies).
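To illustrate what a trust policy for the AWS Glue service principal looks like, the following sketch builds one as a Python dictionary and checks whether a given policy document trusts glue.amazonaws.com. The helper names are hypothetical, and the check is deliberately simple: it ignores conditions and is not a full IAM policy evaluator.

```python
import json

GLUE_PRINCIPAL = "glue.amazonaws.com"


def glue_trust_policy():
    """Return a trust policy that lets AWS Glue assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": GLUE_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def _as_list(value):
    """IAM allows a single value or a list in several policy fields."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def trusts_glue(policy):
    """Simplified check: does any Allow statement trust the Glue principal?"""
    for statement in _as_list(policy.get("Statement")):
        principal = statement.get("Principal")
        services = _as_list(principal.get("Service")) if isinstance(principal, dict) else []
        if (
            statement.get("Effect") == "Allow"
            and GLUE_PRINCIPAL in services
            and "sts:AssumeRole" in _as_list(statement.get("Action"))
        ):
            return True
    return False


if __name__ == "__main__":
    print(json.dumps(glue_trust_policy(), indent=2))
```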

The following are some important properties of a crawler and metrics from its last run:

Name

When you create a crawler, you must give it a unique name.

Schedule

You can choose to run your crawler on demand or choose a frequency with a schedule. For more information about scheduling a crawler, see Scheduling a Crawler.

Status

A crawler can be ready, starting, stopping, scheduled, or schedule paused. A running crawler progresses from starting to stopping. You can resume or pause a schedule attached to a crawler.

Logs

Links to any available logs from the last run of the crawler.

Last runtime

The amount of time that the most recent run of the crawler took.

Median runtime

The median run time of the crawler across all runs since it was created.

Tables updated

The number of tables in the AWS Glue Data Catalog that were updated by the latest run of the crawler.

Tables added

The number of tables that were added into the AWS Glue Data Catalog by the latest run of the crawler.
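Similar run metrics can be read through the AWS Glue GetCrawlerMetrics operation. The following is a minimal sketch using boto3. Mapping the console's Tables added column to the TablesCreated response field is an assumption, and the function names are hypothetical.

```python
def summarize_crawler_metrics(entry):
    """Pick the console-style metrics out of one CrawlerMetricsList entry."""
    return {
        "name": entry["CrawlerName"],
        "last_runtime_s": entry.get("LastRuntimeSeconds"),
        "median_runtime_s": entry.get("MedianRuntimeSeconds"),
        # Assumed to correspond to "Tables added" in the console.
        "tables_added": entry.get("TablesCreated", 0),
        "tables_updated": entry.get("TablesUpdated", 0),
    }


def fetch_crawler_metrics(names, glue_client=None):
    """Fetch and summarize metrics for the named crawlers."""
    if glue_client is None:
        import boto3  # imported lazily; needs valid AWS credentials

        glue_client = boto3.client("glue")
    response = glue_client.get_crawler_metrics(CrawlerNameList=list(names))
    return [summarize_crawler_metrics(e) for e in response["CrawlerMetricsList"]]
```

Passing a client explicitly makes the summary logic easy to exercise with a stub before you point it at a real account.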

When you crawl a JDBC data store, a connection is required. When you specify the include path for JDBC, prefix the string with jdbc:.

For example:

jdbc:mysql://dnd-rds-glue-test.cbjxkctlvjbe.us-east-1.rds.amazonaws.com:3306/sampledatabase

An exclude path is relative to the include path. For example, to exclude a table in your data store, type the table name in the exclude path.
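As a quick check before you enter a JDBC include path, the following sketch verifies the jdbc: prefix and splits the string into its parts by using only the Python standard library. The function name and the returned keys are hypothetical, and the parsing assumes a URL-style string like the MySQL example above; other JDBC drivers may use formats that this sketch does not handle.

```python
from urllib.parse import urlsplit


def parse_jdbc_include_path(include_path):
    """Validate the jdbc: prefix and split a URL-style JDBC include path."""
    prefix = "jdbc:"
    if not include_path.startswith(prefix):
        raise ValueError("JDBC include path must start with 'jdbc:'")
    # Drop the prefix so the remainder parses like an ordinary URL.
    parts = urlsplit(include_path[len(prefix):])
    return {
        "engine": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    print(parse_jdbc_include_path(
        "jdbc:mysql://dnd-rds-glue-test.cbjxkctlvjbe.us-east-1.rds.amazonaws.com:3306/sampledatabase"
    ))
```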
