There are more AWS SDK examples available in the AWS Doc SDK Examples
Getting started with the Glue Data Catalog
The following code example shows how to:
Create a database
Create a table
Clean up resources
- Bash
-
- AWS CLI with Bash script
-
Note
There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials
repository. #!/bin/bash # AWS Glue Data Catalog Tutorial Script # This script demonstrates how to create and manage AWS Glue Data Catalog resources using the AWS CLI # Cost improvements: Reduced API calls, optimized queries, eliminated redundant operations # Reliability improvements: Enhanced error handling, input validation, resource tracking set -euo pipefail # Setup logging LOG_FILE="glue-tutorial-$(date +%Y%m%d-%H%M%S).log" exec > >(tee -a "$LOG_FILE") 2>&1 echo "Starting AWS Glue Data Catalog tutorial script at $(date)" echo "All operations will be logged to $LOG_FILE" # Generate a unique identifier for resource names UNIQUE_ID=$(openssl rand -hex 4) DB_NAME="tutorial-db-${UNIQUE_ID}" TABLE_NAME="flights-data-${UNIQUE_ID}" TABLE_INPUT_FILE="table-input-${UNIQUE_ID}.json" # Track created resources declare -a CREATED_RESOURCES=() # Set default region if not provided AWS_REGION="${AWS_REGION:-us-east-1}" # Flag to track if database was successfully created DATABASE_CREATED=false # Trap to ensure cleanup on exit trap cleanup_resources EXIT # Function to check command status check_status() { if [ $? -ne 0 ]; then echo "ERROR: $1 failed." >&2 exit 1 fi } # Function to cleanup resources cleanup_resources() { local exit_code=$? echo "Attempting to clean up resources..." # Delete resources in reverse order for ((i=${#CREATED_RESOURCES[@]}-1; i>=0; i--)); do resource=${CREATED_RESOURCES[$i]} resource_type=$(echo "$resource" | cut -d':' -f1) resource_name=$(echo "$resource" | cut -d':' -f2) echo "Deleting $resource_type: $resource_name" case $resource_type in "table") if [ "$DATABASE_CREATED" = true ]; then aws glue delete-table \ --database-name "$DB_NAME" \ --name "$resource_name" \ --region "$AWS_REGION" \ 2>/dev/null || echo "Warning: Failed to delete table $resource_name" fi ;; "database") aws glue delete-database \ --name "$resource_name" \ --region "$AWS_REGION" \ 2>/dev/null || echo "Warning: Failed to delete database $resource_name" ;; *) echo "Unknown resource type: $resource_type" >&2 ;; esac done # Clean up temporary files securely if [ -f "$TABLE_INPUT_FILE" ]; then if command -v shred &> /dev/null; then shred -vfz -n 3 "$TABLE_INPUT_FILE" 2>/dev/null || rm -f "$TABLE_INPUT_FILE" else rm -f "$TABLE_INPUT_FILE" fi fi echo "Cleanup completed." exit $exit_code } # Function to validate prerequisites validate_prerequisites() { # Validate AWS CLI is available if ! command -v aws &> /dev/null; then echo "ERROR: AWS CLI is not installed or not in PATH" >&2 exit 1 fi # Validate AWS CLI version local AWS_CLI_VERSION AWS_CLI_VERSION=$(aws --version 2>&1 | cut -d' ' -f1 | cut -d'/' -f2 | cut -d'.' -f1) if [ "$AWS_CLI_VERSION" -lt 1 ]; then echo "ERROR: AWS CLI is required" >&2 exit 1 fi # Validate jq is available for JSON validation if ! command -v jq &> /dev/null; then echo "ERROR: jq is not installed or not in PATH" >&2 exit 1 fi # Validate AWS credentials and get account identity in single call (cost optimization) local CALLER_IDENTITY CALLER_IDENTITY=$(aws sts get-caller-identity --region "$AWS_REGION" --query 'Account' --output text 2>/dev/null) || { echo "ERROR: Failed to get AWS caller identity. Check credentials and permissions." >&2 exit 1 } if [ -z "$CALLER_IDENTITY" ] || [ "$CALLER_IDENTITY" == "None" ]; then echo "ERROR: Unable to determine AWS account identity" >&2 exit 1 fi echo "Using AWS Account: $CALLER_IDENTITY" echo "Using Region: $AWS_REGION" } # Function to create database with verification create_database() { echo "Step 1: Creating a database named $DB_NAME" if ! aws glue create-database \ --database-input "Name=$DB_NAME,Description=Database for AWS Glue tutorial" \ --region "$AWS_REGION" \ --output json > /dev/null 2>&1; then echo "ERROR: Failed to create database $DB_NAME" >&2 exit 1 fi DATABASE_CREATED=true CREATED_RESOURCES+=("database:$DB_NAME") echo "Database $DB_NAME created successfully." } # Function to prepare table input JSON prepare_table_input() { # Create a temporary JSON file for table input with restricted permissions if ! touch "$TABLE_INPUT_FILE" 2>/dev/null; then echo "ERROR: Failed to create temporary file $TABLE_INPUT_FILE" >&2 exit 1 fi if ! chmod 600 "$TABLE_INPUT_FILE" 2>/dev/null; then echo "ERROR: Failed to set permissions on $TABLE_INPUT_FILE" >&2 rm -f "$TABLE_INPUT_FILE" exit 1 fi cat > "$TABLE_INPUT_FILE" << 'EOF' { "Name": "TABLE_NAME_PLACEHOLDER", "StorageDescriptor": { "Columns": [ { "Name": "year", "Type": "bigint" }, { "Name": "quarter", "Type": "bigint" } ], "Location": "s3://crawler-public-us-west-2/flight/2016/csv", "InputFormat": "org.apache.hadoop.mapred.TextInputFormat", "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "Compressed": false, "NumberOfBuckets": -1, "SerdeInfo": { "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "Parameters": { "field.delim": ",", "serialization.format": "," } } }, "PartitionKeys": [ { "Name": "mon", "Type": "string" } ], "TableType": "EXTERNAL_TABLE", "Parameters": { "EXTERNAL": "TRUE", "classification": "csv", "columnsOrdered": "true", "compressionType": "none", "delimiter": ",", "skip.header.line.count": "1", "typeOfData": "file" } } EOF # Replace placeholder with actual table name if ! sed -i "s/TABLE_NAME_PLACEHOLDER/$TABLE_NAME/g" "$TABLE_INPUT_FILE" 2>/dev/null; then echo "ERROR: Failed to substitute table name in JSON file" >&2 rm -f "$TABLE_INPUT_FILE" exit 1 fi # Validate JSON syntax before using it if ! jq empty "$TABLE_INPUT_FILE" 2>/dev/null; then echo "ERROR: Invalid JSON in table input file" >&2 rm -f "$TABLE_INPUT_FILE" exit 1 fi } # Function to create table create_table() { echo "Step 2: Creating a table named $TABLE_NAME in database $DB_NAME" prepare_table_input if ! aws glue create-table \ --database-name "$DB_NAME" \ --table-input "file://${TABLE_INPUT_FILE}" \ --region "$AWS_REGION" \ --output json > /dev/null 2>&1; then echo "ERROR: Failed to create table $TABLE_NAME" >&2 rm -f "$TABLE_INPUT_FILE" exit 1 fi CREATED_RESOURCES+=("table:$TABLE_NAME") echo "Table $TABLE_NAME created successfully." } # Function to get and display table details display_table_details() { echo "Step 3: Getting details of table $TABLE_NAME" if ! aws glue get-table \ --database-name "$DB_NAME" \ --name "$TABLE_NAME" \ --region "$AWS_REGION" \ --output json; then echo "ERROR: Failed to retrieve table details" >&2 exit 1 fi } # Function to display summary display_summary() { echo "" echo "===========================================" echo "RESOURCES CREATED" echo "===========================================" echo "Database: $DB_NAME" echo "Table: $TABLE_NAME" echo "===========================================" } # Main execution flow validate_prerequisites create_database create_table display_table_details display_summary echo "" echo "===========================================" echo "CLEANUP CONFIRMATION" echo "===========================================" echo "Starting cleanup process..." echo "Script completed at $(date)"-
For API details, see the following topics in AWS CLI Command Reference.
-
Scenarios
AWS Health