使用 AWS SDK for .NET 的 AWS Glue 示例 - AWS SDK for .NET

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

使用 AWS SDK for .NET 的 AWS Glue 示例

以下代码示例演示了如何将AWS SDK for .NET 与 AWS Glue 结合使用,以执行操作和实现常见场景。

操作是大型程序的代码摘录,必须在上下文中运行。您可以通过操作了解如何调用单个服务函数,还可以通过函数相关场景和跨服务示例的上下文查看操作。

场景是指显示如何通过在同一服务中调用多个函数来完成特定任务的代码示例。

每个示例都包含一个指向的链接 GitHub,您可以在其中找到有关如何在上下文中设置和运行代码的说明。

开始使用

以下代码示例演示了如何开始使用 AWS Glue。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

namespace GlueActions; public class HelloGlue { private static ILogger logger = null!; static async Task Main(string[] args) { // Set up dependency injection for AWS Glue. using var host = Host.CreateDefaultBuilder(args) .ConfigureLogging(logging => logging.AddFilter("System", LogLevel.Debug) .AddFilter<DebugLoggerProvider>("Microsoft", LogLevel.Information) .AddFilter<ConsoleLoggerProvider>("Microsoft", LogLevel.Trace)) .ConfigureServices((_, services) => services.AddAWSService<IAmazonGlue>() .AddTransient<GlueWrapper>() ) .Build(); logger = LoggerFactory.Create(builder => { builder.AddConsole(); }) .CreateLogger<HelloGlue>(); var glueClient = host.Services.GetRequiredService<IAmazonGlue>(); var request = new ListJobsRequest(); var jobNames = new List<string>(); do { var response = await glueClient.ListJobsAsync(request); jobNames.AddRange(response.JobNames); request.NextToken = response.NextToken; } while (request.NextToken is not null); Console.Clear(); Console.WriteLine("Hello, Glue. Let's list your existing Glue Jobs:"); if (jobNames.Count == 0) { Console.WriteLine("You don't have any AWS Glue jobs."); } else { jobNames.ForEach(Console.WriteLine); } } }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考ListJobs中的。

操作

以下代码示例演示了如何创建 AWS Glue 爬网程序。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

/// <summary> /// Create an AWS Glue crawler. /// </summary> /// <param name="crawlerName">The name for the crawler.</param> /// <param name="crawlerDescription">A description of the crawler.</param> /// <param name="role">The AWS Identity and Access Management (IAM) role to /// be assumed by the crawler.</param> /// <param name="schedule">The schedule on which the crawler will be executed.</param> /// <param name="s3Path">The path to the Amazon Simple Storage Service (Amazon S3) /// bucket where the Python script has been stored.</param> /// <param name="dbName">The name to use for the database that will be /// created by the crawler.</param> /// <returns>A Boolean value indicating the success of the action.</returns> public async Task<bool> CreateCrawlerAsync( string crawlerName, string crawlerDescription, string role, string schedule, string s3Path, string dbName) { var s3Target = new S3Target { Path = s3Path, }; var targetList = new List<S3Target> { s3Target, }; var targets = new CrawlerTargets { S3Targets = targetList, }; var crawlerRequest = new CreateCrawlerRequest { DatabaseName = dbName, Name = crawlerName, Description = crawlerDescription, Targets = targets, Role = role, Schedule = schedule, }; var response = await _amazonGlue.CreateCrawlerAsync(crawlerRequest); return response.HttpStatusCode == System.Net.HttpStatusCode.OK; }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考CreateCrawler中的。

以下代码示例演示如何创建 AWS Glue 任务定义。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

/// <summary> /// Create an AWS Glue job. /// </summary> /// <param name="jobName">The name of the job.</param> /// <param name="roleName">The name of the IAM role to be assumed by /// the job.</param> /// <param name="description">A description of the job.</param> /// <param name="scriptUrl">The URL to the script.</param> /// <returns>A Boolean value indicating the success of the action.</returns> public async Task<bool> CreateJobAsync(string dbName, string tableName, string bucketUrl, string jobName, string roleName, string description, string scriptUrl) { var command = new JobCommand { PythonVersion = "3", Name = "glueetl", ScriptLocation = scriptUrl, }; var arguments = new Dictionary<string, string> { { "--input_database", dbName }, { "--input_table", tableName }, { "--output_bucket_url", bucketUrl } }; var request = new CreateJobRequest { Command = command, DefaultArguments = arguments, Description = description, GlueVersion = "3.0", Name = jobName, NumberOfWorkers = 10, Role = roleName, WorkerType = "G.1X" }; var response = await _amazonGlue.CreateJobAsync(request); return response.HttpStatusCode == HttpStatusCode.OK; }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考CreateJob中的。

以下代码示例演示了如何删除 AWS Glue 爬网程序。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

/// <summary> /// Delete an AWS Glue crawler. /// </summary> /// <param name="crawlerName">The name of the crawler.</param> /// <returns>A Boolean value indicating the success of the action.</returns> public async Task<bool> DeleteCrawlerAsync(string crawlerName) { var response = await _amazonGlue.DeleteCrawlerAsync(new DeleteCrawlerRequest { Name = crawlerName }); return response.HttpStatusCode == HttpStatusCode.OK; }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考DeleteCrawler中的。

以下代码示例演示了如何从 AWS Glue Data Catalog 中删除数据库。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

/// <summary> /// Delete the AWS Glue database. /// </summary> /// <param name="dbName">The name of the database.</param> /// <returns>A Boolean value indicating the success of the action.</returns> public async Task<bool> DeleteDatabaseAsync(string dbName) { var response = await _amazonGlue.DeleteDatabaseAsync(new DeleteDatabaseRequest { Name = dbName }); return response.HttpStatusCode == HttpStatusCode.OK; }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考DeleteDatabase中的。

以下代码示例演示如何删除 AWS Glue 任务定义和所有相关运行。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

/// <summary> /// Delete an AWS Glue job. /// </summary> /// <param name="jobName">The name of the job.</param> /// <returns>A Boolean value indicating the success of the action.</returns> public async Task<bool> DeleteJobAsync(string jobName) { var response = await _amazonGlue.DeleteJobAsync(new DeleteJobRequest { JobName = jobName }); return response.HttpStatusCode == HttpStatusCode.OK; }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考DeleteJob中的。

以下代码示例演示了如何从 AWS Glue Data Catalog 数据库中删除表。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

/// <summary> /// Delete a table from an AWS Glue database. /// </summary> /// <param name="tableName">The table to delete.</param> /// <returns>A Boolean value indicating the success of the action.</returns> public async Task<bool> DeleteTableAsync(string dbName, string tableName) { var response = await _amazonGlue.DeleteTableAsync(new DeleteTableRequest { Name = tableName, DatabaseName = dbName }); return response.HttpStatusCode == HttpStatusCode.OK; }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考DeleteTable中的。

以下代码示例演示了如何获取 AWS Glue 爬网程序。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

/// <summary> /// Get information about an AWS Glue crawler. /// </summary> /// <param name="crawlerName">The name of the crawler.</param> /// <returns>A Crawler object describing the crawler.</returns> public async Task<Crawler?> GetCrawlerAsync(string crawlerName) { var crawlerRequest = new GetCrawlerRequest { Name = crawlerName, }; var response = await _amazonGlue.GetCrawlerAsync(crawlerRequest); if (response.HttpStatusCode == System.Net.HttpStatusCode.OK) { var databaseName = response.Crawler.DatabaseName; Console.WriteLine($"{crawlerName} has the database {databaseName}"); return response.Crawler; } Console.WriteLine($"No information regarding {crawlerName} could be found."); return null; }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考GetCrawler中的。

以下代码示例演示了如何从 AWS Glue Data Catalog 中获取数据库。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

/// <summary> /// Get information about an AWS Glue database. /// </summary> /// <param name="dbName">The name of the database.</param> /// <returns>A Database object containing information about the database.</returns> public async Task<Database> GetDatabaseAsync(string dbName) { var databasesRequest = new GetDatabaseRequest { Name = dbName, }; var response = await _amazonGlue.GetDatabaseAsync(databasesRequest); return response.Database; }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考GetDatabase中的。

以下代码示例演示如何让 AWS Glue 任务运行。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

/// <summary> /// Get information about a specific AWS Glue job run. /// </summary> /// <param name="jobName">The name of the job.</param> /// <param name="jobRunId">The Id of the job run.</param> /// <returns>A JobRun object with information about the job run.</returns> public async Task<JobRun> GetJobRunAsync(string jobName, string jobRunId) { var response = await _amazonGlue.GetJobRunAsync(new GetJobRunRequest { JobName = jobName, RunId = jobRunId }); return response.JobRun; }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考GetJobRun中的。

以下代码示例演示如何获取 AWS Glue 任务运行情况。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

/// <summary> /// Get information about all AWS Glue runs of a specific job. /// </summary> /// <param name="jobName">The name of the job.</param> /// <returns>A list of JobRun objects.</returns> public async Task<List<JobRun>> GetJobRunsAsync(string jobName) { var jobRuns = new List<JobRun>(); var request = new GetJobRunsRequest { JobName = jobName, }; // No need to loop to get all the log groups--the SDK does it for us behind the scenes var paginatorForJobRuns = _amazonGlue.Paginators.GetJobRuns(request); await foreach (var response in paginatorForJobRuns.Responses) { response.JobRuns.ForEach(jobRun => { jobRuns.Add(jobRun); }); } return jobRuns; }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考GetJobRuns中的。

以下代码示例演示了如何在 AWS Glue Data Catalog 中从数据库获取表。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

/// <summary> /// Get a list of tables for an AWS Glue database. /// </summary> /// <param name="dbName">The name of the database.</param> /// <returns>A list of Table objects.</returns> public async Task<List<Table>> GetTablesAsync(string dbName) { var request = new GetTablesRequest { DatabaseName = dbName }; var tables = new List<Table>(); // Get a paginator for listing the tables. var tablePaginator = _amazonGlue.Paginators.GetTables(request); await foreach (var response in tablePaginator.Responses) { tables.AddRange(response.TableList); } return tables; }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考GetTables中的。

以下代码示例演示如何列出 AWS Glue 任务定义。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

/// <summary> /// List AWS Glue jobs using a paginator. /// </summary> /// <returns>A list of AWS Glue job names.</returns> public async Task<List<string>> ListJobsAsync() { var jobNames = new List<string>(); var listJobsPaginator = _amazonGlue.Paginators.ListJobs(new ListJobsRequest { MaxResults = 10 }); await foreach (var response in listJobsPaginator.Responses) { jobNames.AddRange(response.JobNames); } return jobNames; }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考ListJobs中的。

以下代码示例演示了如何启动 AWS Glue 爬网程序。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

/// <summary> /// Start an AWS Glue crawler. /// </summary> /// <param name="crawlerName">The name of the crawler.</param> /// <returns>A Boolean value indicating the success of the action.</returns> public async Task<bool> StartCrawlerAsync(string crawlerName) { var crawlerRequest = new StartCrawlerRequest { Name = crawlerName, }; var response = await _amazonGlue.StartCrawlerAsync(crawlerRequest); return response.HttpStatusCode == System.Net.HttpStatusCode.OK; }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考StartCrawler中的。

以下代码示例演示如何启动 AWS Glue 任务运行。

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

/// <summary> /// Start an AWS Glue job run. /// </summary> /// <param name="jobName">The name of the job.</param> /// <returns>A string representing the job run Id.</returns> public async Task<string> StartJobRunAsync( string jobName, string inputDatabase, string inputTable, string bucketName) { var request = new StartJobRunRequest { JobName = jobName, Arguments = new Dictionary<string, string> { {"--input_database", inputDatabase}, {"--input_table", inputTable}, {"--output_bucket_url", $"s3://{bucketName}/"} } }; var response = await _amazonGlue.StartJobRunAsync(request); return response.JobRunId; }
  • 有关 API 的详细信息,请参阅 AWS SDK for .NETAPI 参考StartJobRun中的。

场景

以下代码示例演示了操作流程:

  • 创建爬网程序,爬取公有 Amazon S3 存储桶并生成包含 CSV 格式的元数据的数据库。

  • 在 AWS Glue Data Catalog 中列出有关数据库和表的信息。

  • 创建任务,从 S3 存储桶提取 CSV 数据,转换数据,然后将 JSON 格式的输出加载到另一个 S3 存储桶中。

  • 列出有关任务运行的信息,查看转换后的数据,并清理资源。

有关更多信息,请参阅教程:AWS Glue Studio 入门

AWS SDK for .NET
注意

还有更多相关信息 GitHub。在 AWS 代码示例存储库中查找完整示例,了解如何进行设置和运行。

创建一个包含场景中使用的 AWS Glue 函数的类。

using System.Net; namespace GlueActions; public class GlueWrapper { private readonly IAmazonGlue _amazonGlue; /// <summary> /// Constructor for the AWS Glue actions wrapper. /// </summary> /// <param name="amazonGlue"></param> public GlueWrapper(IAmazonGlue amazonGlue) { _amazonGlue = amazonGlue; } /// <summary> /// Create an AWS Glue crawler. /// </summary> /// <param name="crawlerName">The name for the crawler.</param> /// <param name="crawlerDescription">A description of the crawler.</param> /// <param name="role">The AWS Identity and Access Management (IAM) role to /// be assumed by the crawler.</param> /// <param name="schedule">The schedule on which the crawler will be executed.</param> /// <param name="s3Path">The path to the Amazon Simple Storage Service (Amazon S3) /// bucket where the Python script has been stored.</param> /// <param name="dbName">The name to use for the database that will be /// created by the crawler.</param> /// <returns>A Boolean value indicating the success of the action.</returns> public async Task<bool> CreateCrawlerAsync( string crawlerName, string crawlerDescription, string role, string schedule, string s3Path, string dbName) { var s3Target = new S3Target { Path = s3Path, }; var targetList = new List<S3Target> { s3Target, }; var targets = new CrawlerTargets { S3Targets = targetList, }; var crawlerRequest = new CreateCrawlerRequest { DatabaseName = dbName, Name = crawlerName, Description = crawlerDescription, Targets = targets, Role = role, Schedule = schedule, }; var response = await _amazonGlue.CreateCrawlerAsync(crawlerRequest); return response.HttpStatusCode == System.Net.HttpStatusCode.OK; } /// <summary> /// Create an AWS Glue job. /// </summary> /// <param name="jobName">The name of the job.</param> /// <param name="roleName">The name of the IAM role to be assumed by /// the job.</param> /// <param name="description">A description of the job.</param> /// <param name="scriptUrl">The URL to the script.</param> /// <returns>A Boolean value indicating the success of the action.</returns> public async Task<bool> CreateJobAsync(string dbName, string tableName, string bucketUrl, string jobName, string roleName, string description, string scriptUrl) { var command = new JobCommand { PythonVersion = "3", Name = "glueetl", ScriptLocation = scriptUrl, }; var arguments = new Dictionary<string, string> { { "--input_database", dbName }, { "--input_table", tableName }, { "--output_bucket_url", bucketUrl } }; var request = new CreateJobRequest { Command = command, DefaultArguments = arguments, Description = description, GlueVersion = "3.0", Name = jobName, NumberOfWorkers = 10, Role = roleName, WorkerType = "G.1X" }; var response = await _amazonGlue.CreateJobAsync(request); return response.HttpStatusCode == HttpStatusCode.OK; } /// <summary> /// Delete an AWS Glue crawler. /// </summary> /// <param name="crawlerName">The name of the crawler.</param> /// <returns>A Boolean value indicating the success of the action.</returns> public async Task<bool> DeleteCrawlerAsync(string crawlerName) { var response = await _amazonGlue.DeleteCrawlerAsync(new DeleteCrawlerRequest { Name = crawlerName }); return response.HttpStatusCode == HttpStatusCode.OK; } /// <summary> /// Delete the AWS Glue database. /// </summary> /// <param name="dbName">The name of the database.</param> /// <returns>A Boolean value indicating the success of the action.</returns> public async Task<bool> DeleteDatabaseAsync(string dbName) { var response = await _amazonGlue.DeleteDatabaseAsync(new DeleteDatabaseRequest { Name = dbName }); return response.HttpStatusCode == HttpStatusCode.OK; } /// <summary> /// Delete an AWS Glue job. /// </summary> /// <param name="jobName">The name of the job.</param> /// <returns>A Boolean value indicating the success of the action.</returns> public async Task<bool> DeleteJobAsync(string jobName) { var response = await _amazonGlue.DeleteJobAsync(new DeleteJobRequest { JobName = jobName }); return response.HttpStatusCode == HttpStatusCode.OK; } /// <summary> /// Delete a table from an AWS Glue database. /// </summary> /// <param name="tableName">The table to delete.</param> /// <returns>A Boolean value indicating the success of the action.</returns> public async Task<bool> DeleteTableAsync(string dbName, string tableName) { var response = await _amazonGlue.DeleteTableAsync(new DeleteTableRequest { Name = tableName, DatabaseName = dbName }); return response.HttpStatusCode == HttpStatusCode.OK; } /// <summary> /// Get information about an AWS Glue crawler. /// </summary> /// <param name="crawlerName">The name of the crawler.</param> /// <returns>A Crawler object describing the crawler.</returns> public async Task<Crawler?> GetCrawlerAsync(string crawlerName) { var crawlerRequest = new GetCrawlerRequest { Name = crawlerName, }; var response = await _amazonGlue.GetCrawlerAsync(crawlerRequest); if (response.HttpStatusCode == System.Net.HttpStatusCode.OK) { var databaseName = response.Crawler.DatabaseName; Console.WriteLine($"{crawlerName} has the database {databaseName}"); return response.Crawler; } Console.WriteLine($"No information regarding {crawlerName} could be found."); return null; } /// <summary> /// Get information about the state of an AWS Glue crawler. /// </summary> /// <param name="crawlerName">The name of the crawler.</param> /// <returns>A value describing the state of the crawler.</returns> public async Task<CrawlerState> GetCrawlerStateAsync(string crawlerName) { var response = await _amazonGlue.GetCrawlerAsync( new GetCrawlerRequest { Name = crawlerName }); return response.Crawler.State; } /// <summary> /// Get information about an AWS Glue database. /// </summary> /// <param name="dbName">The name of the database.</param> /// <returns>A Database object containing information about the database.</returns> public async Task<Database> GetDatabaseAsync(string dbName) { var databasesRequest = new GetDatabaseRequest { Name = dbName, }; var response = await _amazonGlue.GetDatabaseAsync(databasesRequest); return response.Database; } /// <summary> /// Get information about a specific AWS Glue job run. /// </summary> /// <param name="jobName">The name of the job.</param> /// <param name="jobRunId">The Id of the job run.</param> /// <returns>A JobRun object with information about the job run.</returns> public async Task<JobRun> GetJobRunAsync(string jobName, string jobRunId) { var response = await _amazonGlue.GetJobRunAsync(new GetJobRunRequest { JobName = jobName, RunId = jobRunId }); return response.JobRun; } /// <summary> /// Get information about all AWS Glue runs of a specific job. /// </summary> /// <param name="jobName">The name of the job.</param> /// <returns>A list of JobRun objects.</returns> public async Task<List<JobRun>> GetJobRunsAsync(string jobName) { var jobRuns = new List<JobRun>(); var request = new GetJobRunsRequest { JobName = jobName, }; // No need to loop to get all the log groups--the SDK does it for us behind the scenes var paginatorForJobRuns = _amazonGlue.Paginators.GetJobRuns(request); await foreach (var response in paginatorForJobRuns.Responses) { response.JobRuns.ForEach(jobRun => { jobRuns.Add(jobRun); }); } return jobRuns; } /// <summary> /// Get a list of tables for an AWS Glue database. /// </summary> /// <param name="dbName">The name of the database.</param> /// <returns>A list of Table objects.</returns> public async Task<List<Table>> GetTablesAsync(string dbName) { var request = new GetTablesRequest { DatabaseName = dbName }; var tables = new List<Table>(); // Get a paginator for listing the tables. var tablePaginator = _amazonGlue.Paginators.GetTables(request); await foreach (var response in tablePaginator.Responses) { tables.AddRange(response.TableList); } return tables; } /// <summary> /// List AWS Glue jobs using a paginator. /// </summary> /// <returns>A list of AWS Glue job names.</returns> public async Task<List<string>> ListJobsAsync() { var jobNames = new List<string>(); var listJobsPaginator = _amazonGlue.Paginators.ListJobs(new ListJobsRequest { MaxResults = 10 }); await foreach (var response in listJobsPaginator.Responses) { jobNames.AddRange(response.JobNames); } return jobNames; } /// <summary> /// Start an AWS Glue crawler. /// </summary> /// <param name="crawlerName">The name of the crawler.</param> /// <returns>A Boolean value indicating the success of the action.</returns> public async Task<bool> StartCrawlerAsync(string crawlerName) { var crawlerRequest = new StartCrawlerRequest { Name = crawlerName, }; var response = await _amazonGlue.StartCrawlerAsync(crawlerRequest); return response.HttpStatusCode == System.Net.HttpStatusCode.OK; } /// <summary> /// Start an AWS Glue job run. /// </summary> /// <param name="jobName">The name of the job.</param> /// <returns>A string representing the job run Id.</returns> public async Task<string> StartJobRunAsync( string jobName, string inputDatabase, string inputTable, string bucketName) { var request = new StartJobRunRequest { JobName = jobName, Arguments = new Dictionary<string, string> { {"--input_database", inputDatabase}, {"--input_table", inputTable}, {"--output_bucket_url", $"s3://{bucketName}/"} } }; var response = await _amazonGlue.StartJobRunAsync(request); return response.JobRunId; } }

创建运行场景的类。

global using Amazon.Glue; global using GlueActions; global using Microsoft.Extensions.Configuration; global using Microsoft.Extensions.DependencyInjection; global using Microsoft.Extensions.Hosting; global using Microsoft.Extensions.Logging; global using Microsoft.Extensions.Logging.Console; global using Microsoft.Extensions.Logging.Debug; using Amazon.Glue.Model; using Amazon.S3; using Amazon.S3.Model; namespace GlueBasics; public class GlueBasics { private static ILogger logger = null!; private static IConfiguration _configuration = null!; static async Task Main(string[] args) { // Set up dependency injection for AWS Glue. using var host = Host.CreateDefaultBuilder(args) .ConfigureLogging(logging => logging.AddFilter("System", LogLevel.Debug) .AddFilter<DebugLoggerProvider>("Microsoft", LogLevel.Information) .AddFilter<ConsoleLoggerProvider>("Microsoft", LogLevel.Trace)) .ConfigureServices((_, services) => services.AddAWSService<IAmazonGlue>() .AddTransient<GlueWrapper>() .AddTransient<UiWrapper>() ) .Build(); logger = LoggerFactory.Create(builder => { builder.AddConsole(); }) .CreateLogger<GlueBasics>(); _configuration = new ConfigurationBuilder() .SetBasePath(Directory.GetCurrentDirectory()) .AddJsonFile("settings.json") // Load settings from .json file. .AddJsonFile("settings.local.json", true) // Optionally load local settings. .Build(); // These values are stored in settings.json // Once you have run the CDK script to deploy the resources, // edit the file to set "BucketName", "RoleName", and "ScriptURL" // to the appropriate values. Also set "CrawlerName" to the name // you want to give the crawler when it is created. string bucketName = _configuration["BucketName"]!; string bucketUrl = _configuration["BucketUrl"]!; string crawlerName = _configuration["CrawlerName"]!; string roleName = _configuration["RoleName"]!; string sourceData = _configuration["SourceData"]!; string dbName = _configuration["DbName"]!; string cron = _configuration["Cron"]!; string scriptUrl = _configuration["ScriptURL"]!; string jobName = _configuration["JobName"]!; var wrapper = host.Services.GetRequiredService<GlueWrapper>(); var uiWrapper = host.Services.GetRequiredService<UiWrapper>(); uiWrapper.DisplayOverview(); uiWrapper.PressEnter(); // Create the crawler and wait for it to be ready. uiWrapper.DisplayTitle("Create AWS Glue crawler"); Console.WriteLine("Let's begin by creating the AWS Glue crawler."); var crawlerDescription = "Crawler created for the AWS Glue Basics scenario."; var crawlerCreated = await wrapper.CreateCrawlerAsync(crawlerName, crawlerDescription, roleName, cron, sourceData, dbName); if (crawlerCreated) { Console.WriteLine($"The crawler: {crawlerName} has been created. Now let's wait until it's ready."); CrawlerState crawlerState; do { crawlerState = await wrapper.GetCrawlerStateAsync(crawlerName); } while (crawlerState != "READY"); Console.WriteLine($"The crawler {crawlerName} is now ready for use."); } else { Console.WriteLine($"Couldn't create crawler {crawlerName}."); return; // Exit the application. } uiWrapper.DisplayTitle("Start AWS Glue crawler"); Console.WriteLine("Now let's wait until the crawler has successfully started."); var crawlerStarted = await wrapper.StartCrawlerAsync(crawlerName); if (crawlerStarted) { CrawlerState crawlerState; do { crawlerState = await wrapper.GetCrawlerStateAsync(crawlerName); } while (crawlerState != "READY"); Console.WriteLine($"The crawler {crawlerName} is now ready for use."); } else { Console.WriteLine($"Couldn't start the crawler {crawlerName}."); return; // Exit the application. } uiWrapper.PressEnter(); Console.WriteLine($"\nLet's take a look at the database: {dbName}"); var database = await wrapper.GetDatabaseAsync(dbName); if (database != null) { uiWrapper.DisplayTitle($"{database.Name} Details"); Console.WriteLine($"{database.Name} created on {database.CreateTime}"); Console.WriteLine(database.Description); } uiWrapper.PressEnter(); var tables = await wrapper.GetTablesAsync(dbName); if (tables.Count > 0) { tables.ForEach(table => { Console.WriteLine($"{table.Name}\tCreated: {table.CreateTime}\tUpdated: {table.UpdateTime}"); }); } uiWrapper.PressEnter(); uiWrapper.DisplayTitle("Create AWS Glue job"); Console.WriteLine("Creating a new AWS Glue job."); var description = "An AWS Glue job created using the AWS SDK for .NET"; await wrapper.CreateJobAsync(dbName, tables[0].Name, bucketUrl, jobName, roleName, description, scriptUrl); uiWrapper.PressEnter(); uiWrapper.DisplayTitle("Starting AWS Glue job"); Console.WriteLine("Starting the new AWS Glue job..."); var jobRunId = await wrapper.StartJobRunAsync(jobName, dbName, tables[0].Name, bucketName); var jobRunComplete = false; var jobRun = new JobRun(); do { jobRun = await wrapper.GetJobRunAsync(jobName, jobRunId); if (jobRun.JobRunState == "SUCCEEDED" || jobRun.JobRunState == "STOPPED" || jobRun.JobRunState == "FAILED" || jobRun.JobRunState == "TIMEOUT") { jobRunComplete = true; } } while (!jobRunComplete); uiWrapper.DisplayTitle($"Data in {bucketName}"); // Get the list of data stored in the S3 bucket. var s3Client = new AmazonS3Client(); var response = await s3Client.ListObjectsAsync(new ListObjectsRequest { BucketName = bucketName }); response.S3Objects.ForEach(s3Object => { Console.WriteLine(s3Object.Key); }); uiWrapper.DisplayTitle("AWS Glue jobs"); var jobNames = await wrapper.ListJobsAsync(); jobNames.ForEach(jobName => { Console.WriteLine(jobName); }); uiWrapper.PressEnter(); uiWrapper.DisplayTitle("Get AWS Glue job run information"); Console.WriteLine("Getting information about the AWS Glue job."); var jobRuns = await wrapper.GetJobRunsAsync(jobName); jobRuns.ForEach(jobRun => { Console.WriteLine($"{jobRun.JobName}\t{jobRun.JobRunState}\t{jobRun.CompletedOn}"); }); uiWrapper.PressEnter(); uiWrapper.DisplayTitle("Deleting resources"); Console.WriteLine("Deleting the AWS Glue job used by the example."); await wrapper.DeleteJobAsync(jobName); Console.WriteLine("Deleting the tables from the database."); tables.ForEach(async table => { await wrapper.DeleteTableAsync(dbName, table.Name); }); Console.WriteLine("Deleting the database."); await wrapper.DeleteDatabaseAsync(dbName); Console.WriteLine("Deleting the AWS Glue crawler."); await wrapper.DeleteCrawlerAsync(crawlerName); Console.WriteLine("The AWS Glue scenario has completed."); uiWrapper.PressEnter(); } } namespace GlueBasics; public class UiWrapper { public readonly string SepBar = new string('-', Console.WindowWidth); /// <summary> /// Show information about the scenario. /// </summary> public void DisplayOverview() { Console.Clear(); DisplayTitle("Amazon Glue: get started with crawlers and jobs"); Console.WriteLine("This example application does the following:"); Console.WriteLine("\t 1. Create a crawler, pass it the IAM role and the URL to the public S3 bucket that contains the source data"); Console.WriteLine("\t 2. Start the crawler."); Console.WriteLine("\t 3. Get the database created by the crawler and the tables in the database."); Console.WriteLine("\t 4. Create a job."); Console.WriteLine("\t 5. Start a job run."); Console.WriteLine("\t 6. Wait for the job run to complete."); Console.WriteLine("\t 7. Show the data stored in the bucket."); Console.WriteLine("\t 8. List jobs for the account."); Console.WriteLine("\t 9. Get job run details for the job that was run."); Console.WriteLine("\t10. Delete the demo job."); Console.WriteLine("\t11. Delete the database and tables created for the demo."); Console.WriteLine("\t12. Delete the crawler."); } /// <summary> /// Display a message and wait until the user presses enter. /// </summary> public void PressEnter() { Console.Write("\nPlease press <Enter> to continue. "); _ = Console.ReadLine(); } /// <summary> /// Pad a string with spaces to center it on the console display. /// </summary> /// <param name="strToCenter">The string to center on the screen.</param> /// <returns>The string padded to make it center on the screen.</returns> public string CenterString(string strToCenter) { var padAmount = (Console.WindowWidth - strToCenter.Length) / 2; var leftPad = new string(' ', padAmount); return $"{leftPad}{strToCenter}"; } /// <summary> /// Display a line of hyphens, the centered text of the title and another /// line of hyphens. /// </summary> /// <param name="strTitle">The string to be displayed.</param> public void DisplayTitle(string strTitle) { Console.WriteLine(SepBar); Console.WriteLine(CenterString(strTitle)); Console.WriteLine(SepBar); } }