AWS Glue 使用範例 AWS CLI - AWS Command Line Interface

本文為英文版的機器翻譯版本,如內容有任何歧義或不一致之處,概以英文版為準。

AWS Glue 使用範例 AWS CLI

下列程式碼範例說明如何使用 AWS Command Line Interface 與來執行動作及實作常見案例 AWS Glue。

Actions 是大型程式的程式碼摘錄,必須在內容中執行。雖然動作會告訴您如何呼叫個別服務函數,但您可以在其相關情境和跨服務範例中查看內容中的動作。

Scenarios (案例) 是向您展示如何呼叫相同服務中的多個函數來完成特定任務的程式碼範例。

每個範例都包含一個連結 GitHub,您可以在其中找到如何在內容中設定和執行程式碼的指示。

主題

動作

下列程式碼範例會示範如何使用batch-stop-job-run

AWS CLI

若要停止工作執行

下列batch-stop-job-run範例會停止工作執行。

aws glue batch-stop-job-run \ --job-name "my-testing-job" \ --job-run-id jr_852f1de1f29fb62e0ba4166c33970803935d87f14f96cfdee5089d5274a61d3f

輸出:

{ "SuccessfulSubmissions": [ { "JobName": "my-testing-job", "JobRunId": "jr_852f1de1f29fb62e0ba4166c33970803935d87f14f96cfdee5089d5274a61d3f" } ], "Errors": [], "ResponseMetadata": { "RequestId": "66bd6b90-01db-44ab-95b9-6aeff0e73d88", "HTTPStatusCode": 200, "HTTPHeaders": { "date": "Fri, 16 Oct 2020 20:54:51 GMT", "content-type": "application/x-amz-json-1.1", "content-length": "148", "connection": "keep-alive", "x-amzn-requestid": "66bd6b90-01db-44ab-95b9-6aeff0e73d88" }, "RetryAttempts": 0 } }

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的工作執行

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考BatchStopJobRun中的。

下列程式碼範例會示範如何使用create-connection

AWS CLI

建立 AWS Glue 資料倉庫連線的步驟

下列create-connection範例會在 AWS Glue 資料型錄中建立連線,以提供 Kafka 資料倉庫的連線資訊。

aws glue create-connection \ --connection-input '{ \ "Name":"conn-kafka-custom", \ "Description":"kafka connection with ssl to custom kafka", \ "ConnectionType":"KAFKA", \ "ConnectionProperties":{ \ "KAFKA_BOOTSTRAP_SERVERS":"<Kafka-broker-server-url>:<SSL-Port>", \ "KAFKA_SSL_ENABLED":"true", \ "KAFKA_CUSTOM_CERT": "s3://bucket/prefix/cert-file.pem" \ }, \ "PhysicalConnectionRequirements":{ \ "SubnetId":"subnet-1234", \ "SecurityGroupIdList":["sg-1234"], \ "AvailabilityZone":"us-east-1a"} \ }' \ --region us-east-1 --endpoint https://glue.us-east-1.amazonaws.com

此命令不會產生輸出。

如需詳細資訊,請參閱 Glue 開發人員指南中的 AWS Glue 資料型錄中AWS 的定義連線

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考CreateConnection中的。

下列程式碼範例會示範如何使用create-database

AWS CLI

建立資料庫

下列create-database範例會在 AWS Glue 資料型錄中建立資料庫。

aws glue create-database \ --database-input "{\"Name\":\"tempdb\"}" \ --profile my_profile \ --endpoint https://glue.us-east-1.amazonaws.com

此命令不會產生輸出。

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的在 Data Catalog 中定義資料庫

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考CreateDatabase中的。

下列程式碼範例會示範如何使用create-job

AWS CLI

建立工作以轉換資料

下列 create-job 範例會建立串流工作,它可執行存放在 S3 中的指令碼。

aws glue create-job \ --name my-testing-job \ --role AWSGlueServiceRoleDefault \ --command '{ \ "Name": "gluestreaming", \ "ScriptLocation": "s3://DOC-EXAMPLE-BUCKET/folder/" \ }' \ --region us-east-1 \ --output json \ --default-arguments '{ \ "--job-language":"scala", \ "--class":"GlueApp" \ }' \ --profile my-profile \ --endpoint https://glue.us-east-1.amazonaws.com

test_script.scala 的內容:

import com.amazonaws.services.glue.ChoiceOption import com.amazonaws.services.glue.GlueContext import com.amazonaws.services.glue.MappingSpec import com.amazonaws.services.glue.ResolveSpec import com.amazonaws.services.glue.errors.CallSite import com.amazonaws.services.glue.util.GlueArgParser import com.amazonaws.services.glue.util.Job import com.amazonaws.services.glue.util.JsonOptions import org.apache.spark.SparkContext import scala.collection.JavaConverters._ object GlueApp { def main(sysArgs: Array[String]) { val spark: SparkContext = new SparkContext() val glueContext: GlueContext = new GlueContext(spark) // @params: [JOB_NAME] val args = GlueArgParser.getResolvedOptions(sysArgs, Seq("JOB_NAME").toArray) Job.init(args("JOB_NAME"), glueContext, args.asJava) // @type: DataSource // @args: [database = "tempdb", table_name = "s3-source", transformation_ctx = "datasource0"] // @return: datasource0 // @inputs: [] val datasource0 = glueContext.getCatalogSource(database = "tempdb", tableName = "s3-source", redshiftTmpDir = "", transformationContext = "datasource0").getDynamicFrame() // @type: ApplyMapping // @args: [mapping = [("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")], transformation_ctx = "applymapping1"] // @return: applymapping1 // @inputs: [frame = datasource0] val applymapping1 = datasource0.applyMapping(mappings = Seq(("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")), caseSensitive = false, transformationContext = "applymapping1") // @type: SelectFields // @args: [paths = ["sensorid", "currenttemperature", "status"], transformation_ctx = "selectfields2"] // @return: selectfields2 // @inputs: [frame = applymapping1] val selectfields2 = applymapping1.selectFields(paths = Seq("sensorid", "currenttemperature", "status"), transformationContext = "selectfields2") // @type: ResolveChoice // @args: [choice = "MATCH_CATALOG", database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "resolvechoice3"] // @return: resolvechoice3 // @inputs: [frame = selectfields2] val resolvechoice3 = selectfields2.resolveChoice(choiceOption = Some(ChoiceOption("MATCH_CATALOG")), database = Some("tempdb"), tableName = Some("my-s3-sink"), transformationContext = "resolvechoice3") // @type: DataSink // @args: [database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "datasink4"] // @return: datasink4 // @inputs: [frame = resolvechoice3] val datasink4 = glueContext.getCatalogSink(database = "tempdb", tableName = "my-s3-sink", redshiftTmpDir = "", transformationContext = "datasink4").writeDynamicFrame(resolvechoice3) Job.commit() } }

輸出:

{ "Name": "my-testing-job" }

如需詳細資訊,請參閱 AWS Glue 開發人員指南中的使用AWS Glue 編寫工作

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考CreateJob中的。

下列程式碼範例會示範如何使用create-table

AWS CLI

範例 1:若要建立 Kinesis 資料串流的表格

下列create-table範例會在 AWS Glue 資料型錄中建立描述 Kinesis 資料串流的表格。

aws glue create-table \ --database-name tempdb \ --table-input '{"Name":"test-kinesis-input", "StorageDescriptor":{ \ "Columns":[ \ {"Name":"sensorid", "Type":"int"}, \ {"Name":"currenttemperature", "Type":"int"}, \ {"Name":"status", "Type":"string"} ], \ "Location":"my-testing-stream", \ "Parameters":{ \ "typeOfData":"kinesis","streamName":"my-testing-stream", \ "kinesisUrl":"https://kinesis.us-east-1.amazonaws.com" \ }, \ "SerdeInfo":{ \ "SerializationLibrary":"org.openx.data.jsonserde.JsonSerDe"} \ }, \ "Parameters":{ \ "classification":"json"} \ }' \ --profile my-profile \ --endpoint https://glue.us-east-1.amazonaws.com

此命令不會產生輸出。

如需詳細資訊,請參閱 Glue 開發人員指南中的 AWS Glue 資料型錄中AWS 的定義表格

範例 2:若要建立 Kafka 資料倉庫的資料表

下列create-table範例會在 AWS Glue 資料型錄中建立描述 Kafka 資料倉庫的資料表。

aws glue create-table \ --database-name tempdb \ --table-input '{"Name":"test-kafka-input", "StorageDescriptor":{ \ "Columns":[ \ {"Name":"sensorid", "Type":"int"}, \ {"Name":"currenttemperature", "Type":"int"}, \ {"Name":"status", "Type":"string"} ], \ "Location":"glue-topic", \ "Parameters":{ \ "typeOfData":"kafka","topicName":"glue-topic", \ "connectionName":"my-kafka-connection" }, \ "SerdeInfo":{ \ "SerializationLibrary":"org.apache.hadoop.hive.serde2.OpenCSVSerde"} \ }, \ "Parameters":{ \ "separatorChar":","} \ }' \ --profile my-profile \ --endpoint https://glue.us-east-1.amazonaws.com

此命令不會產生輸出。

如需詳細資訊,請參閱 Glue 開發人員指南中的 AWS Glue 資料型錄中AWS 的定義表格

範例 3:建立 AWS S3 資料存放區的資料表

下列create-table範例會在 AWS Glue 資料型錄中建立一個說明 AWS 簡易儲存服務 (AWS S3) 資料存放區的資料表。

aws glue create-table \ --database-name tempdb \ --table-input '{"Name":"s3-output", "StorageDescriptor":{ \ "Columns":[ \ {"Name":"s1", "Type":"string"}, \ {"Name":"s2", "Type":"int"}, \ {"Name":"s3", "Type":"string"} ], \ "Location":"s3://bucket-path/", \ "SerdeInfo":{ \ "SerializationLibrary":"org.openx.data.jsonserde.JsonSerDe"} \ }, \ "Parameters":{ \ "classification":"json"} \ }' \ --profile my-profile \ --endpoint https://glue.us-east-1.amazonaws.com

此命令不會產生輸出。

如需詳細資訊,請參閱 Glue 開發人員指南中的 AWS Glue 資料型錄中AWS 的定義表格

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考CreateTable中的。

下列程式碼範例會示範如何使用delete-job

AWS CLI

若要刪除工作

下列 delete-job 範例會刪除不再需要的工作。

aws glue delete-job \ --job-name my-testing-job

輸出:

{ "JobName": "my-testing-job" }

如需詳細資訊,請參閱 Glue 開發人員指南中的在 AWS Glu AWS e 主控台上使用工作。

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考DeleteJob中的。

下列程式碼範例會示範如何使用get-databases

AWS CLI

列出 AWS Glue 資料型錄中部分或所有資料庫的定義

下列 get-databases 範例會傳回 Data Catalog 中的資料庫相關資訊。

aws glue get-databases

輸出:

{ "DatabaseList": [ { "Name": "default", "Description": "Default Hive database", "LocationUri": "file:/spark-warehouse", "CreateTime": 1602084052.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" }, { "Name": "flights-db", "CreateTime": 1587072847.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" }, { "Name": "legislators", "CreateTime": 1601415625.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" }, { "Name": "tempdb", "CreateTime": 1601498566.0, "CreateTableDefaultPermissions": [ { "Principal": { "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS" }, "Permissions": [ "ALL" ] } ], "CatalogId": "111122223333" } ] }

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的在 Data Catalog 中定義資料庫

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考GetDatabases中的。

下列程式碼範例會示範如何使用get-job-run

AWS CLI

取得工作執行的相關資訊

以下 get-job-run 範例會擷取工作執行的相關資訊。

aws glue get-job-run \ --job-name "Combine legistators data" \ --run-id jr_012e176506505074d94d761755e5c62538ee1aad6f17d39f527e9140cf0c9a5e

輸出:

{ "JobRun": { "Id": "jr_012e176506505074d94d761755e5c62538ee1aad6f17d39f527e9140cf0c9a5e", "Attempt": 0, "JobName": "Combine legistators data", "StartedOn": 1602873931.255, "LastModifiedOn": 1602874075.985, "CompletedOn": 1602874075.985, "JobRunState": "SUCCEEDED", "Arguments": { "--enable-continuous-cloudwatch-log": "true", "--enable-metrics": "", "--enable-spark-ui": "true", "--job-bookmark-option": "job-bookmark-enable", "--spark-event-logs-path": "s3://aws-glue-assets-111122223333-us-east-1/sparkHistoryLogs/" }, "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 117, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" } }

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的工作執行

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考GetJobRun中的。

下列程式碼範例會示範如何使用get-job-runs

AWS CLI

取得工作的全部工作執行相關資訊

以下 get-job-runs 範例會擷取工作的工作執行相關資訊。

aws glue get-job-runs \ --job-name "my-testing-job"

輸出:

{ "JobRuns": [ { "Id": "jr_012e176506505074d94d761755e5c62538ee1aad6f17d39f527e9140cf0c9a5e", "Attempt": 0, "JobName": "my-testing-job", "StartedOn": 1602873931.255, "LastModifiedOn": 1602874075.985, "CompletedOn": 1602874075.985, "JobRunState": "SUCCEEDED", "Arguments": { "--enable-continuous-cloudwatch-log": "true", "--enable-metrics": "", "--enable-spark-ui": "true", "--job-bookmark-option": "job-bookmark-enable", "--spark-event-logs-path": "s3://aws-glue-assets-111122223333-us-east-1/sparkHistoryLogs/" }, "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 117, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" }, { "Id": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f_attempt_2", "Attempt": 2, "PreviousRunId": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f_attempt_1", "JobName": "my-testing-job", "StartedOn": 1602811168.496, "LastModifiedOn": 1602811282.39, "CompletedOn": 1602811282.39, "JobRunState": "FAILED", "ErrorMessage": "An error occurred while calling o122.pyWriteDynamicFrame. Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 021AAB703DB20A2D; S3 Extended Request ID: teZk24Y09TkXzBvMPG502L5VJBhe9DJuWA9/TXtuGOqfByajkfL/Tlqt5JBGdEGpigAqzdMDM/U=)", "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 110, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" }, { "Id": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f_attempt_1", "Attempt": 1, "PreviousRunId": "jr_03cc19ddab11c4e244d3f735567de74ff93b0b3ef468a713ffe73e53d1aec08f", "JobName": "my-testing-job", "StartedOn": 1602811020.518, "LastModifiedOn": 1602811138.364, "CompletedOn": 1602811138.364, "JobRunState": "FAILED", "ErrorMessage": "An error occurred while calling o122.pyWriteDynamicFrame. Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 2671D37856AE7ABB; S3 Extended Request ID: RLJCJw20brV+PpC6GpORahyF2fp9flB5SSb2bTGPnUSPVizLXRl1PN3QZldb+v1o9qRVktNYbW8=)", "PredecessorRuns": [], "AllocatedCapacity": 10, "ExecutionTime": 113, "Timeout": 2880, "MaxCapacity": 10.0, "WorkerType": "G.1X", "NumberOfWorkers": 10, "LogGroupName": "/aws-glue/jobs", "GlueVersion": "2.0" } ] }

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的工作執行

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考GetJobRuns中的。

下列程式碼範例會示範如何使用get-job

AWS CLI

擷取工作相關資訊

以下 get-job 範例會擷取工作相關資訊。

aws glue get-job \ --job-name my-testing-job

輸出:

{ "Job": { "Name": "my-testing-job", "Role": "Glue_DefaultRole", "CreatedOn": 1602805698.167, "LastModifiedOn": 1602805698.167, "ExecutionProperty": { "MaxConcurrentRuns": 1 }, "Command": { "Name": "gluestreaming", "ScriptLocation": "s3://janetst-bucket-01/Scripts/test_script.scala", "PythonVersion": "2" }, "DefaultArguments": { "--class": "GlueApp", "--job-language": "scala" }, "MaxRetries": 0, "AllocatedCapacity": 10, "MaxCapacity": 10.0, "GlueVersion": "1.0" } }

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的工作

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考GetJob中的。

下列程式碼範例會示範如何使用get-plan

AWS CLI

若要取得將資料從來源表格對映至目標資料表的產生程式碼

以下內容get-plan擷取產生的程式碼,將資料行從資料來源映射到資料目標。

aws glue get-plan --mapping '[ \ { \ "SourcePath":"sensorid", \ "SourceTable":"anything", \ "SourceType":"int", \ "TargetPath":"sensorid", \ "TargetTable":"anything", \ "TargetType":"int" \ }, \ { \ "SourcePath":"currenttemperature", \ "SourceTable":"anything", \ "SourceType":"int", \ "TargetPath":"currenttemperature", \ "TargetTable":"anything", \ "TargetType":"int" \ }, \ { \ "SourcePath":"status", \ "SourceTable":"anything", \ "SourceType":"string", \ "TargetPath":"status", \ "TargetTable":"anything", \ "TargetType":"string" \ }]' \ --source '{ \ "DatabaseName":"tempdb", \ "TableName":"s3-source" \ }' \ --sinks '[ \ { \ "DatabaseName":"tempdb", \ "TableName":"my-s3-sink" \ }]' --language "scala" --endpoint https://glue.us-east-1.amazonaws.com --output "text"

輸出:

import com.amazonaws.services.glue.ChoiceOption import com.amazonaws.services.glue.GlueContext import com.amazonaws.services.glue.MappingSpec import com.amazonaws.services.glue.ResolveSpec import com.amazonaws.services.glue.errors.CallSite import com.amazonaws.services.glue.util.GlueArgParser import com.amazonaws.services.glue.util.Job import com.amazonaws.services.glue.util.JsonOptions import org.apache.spark.SparkContext import scala.collection.JavaConverters._ object GlueApp { def main(sysArgs: Array[String]) { val spark: SparkContext = new SparkContext() val glueContext: GlueContext = new GlueContext(spark) // @params: [JOB_NAME] val args = GlueArgParser.getResolvedOptions(sysArgs, Seq("JOB_NAME").toArray) Job.init(args("JOB_NAME"), glueContext, args.asJava) // @type: DataSource // @args: [database = "tempdb", table_name = "s3-source", transformation_ctx = "datasource0"] // @return: datasource0 // @inputs: [] val datasource0 = glueContext.getCatalogSource(database = "tempdb", tableName = "s3-source", redshiftTmpDir = "", transformationContext = "datasource0").getDynamicFrame() // @type: ApplyMapping // @args: [mapping = [("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")], transformation_ctx = "applymapping1"] // @return: applymapping1 // @inputs: [frame = datasource0] val applymapping1 = datasource0.applyMapping(mappings = Seq(("sensorid", "int", "sensorid", "int"), ("currenttemperature", "int", "currenttemperature", "int"), ("status", "string", "status", "string")), caseSensitive = false, transformationContext = "applymapping1") // @type: SelectFields // @args: [paths = ["sensorid", "currenttemperature", "status"], transformation_ctx = "selectfields2"] // @return: selectfields2 // @inputs: [frame = applymapping1] val selectfields2 = applymapping1.selectFields(paths = Seq("sensorid", "currenttemperature", "status"), transformationContext = "selectfields2") // @type: ResolveChoice // @args: [choice = "MATCH_CATALOG", database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "resolvechoice3"] // @return: resolvechoice3 // @inputs: [frame = selectfields2] val resolvechoice3 = selectfields2.resolveChoice(choiceOption = Some(ChoiceOption("MATCH_CATALOG")), database = Some("tempdb"), tableName = Some("my-s3-sink"), transformationContext = "resolvechoice3") // @type: DataSink // @args: [database = "tempdb", table_name = "my-s3-sink", transformation_ctx = "datasink4"] // @return: datasink4 // @inputs: [frame = resolvechoice3] val datasink4 = glueContext.getCatalogSink(database = "tempdb", tableName = "my-s3-sink", redshiftTmpDir = "", transformationContext = "datasink4").writeDynamicFrame(resolvechoice3) Job.commit() } }

如需詳細資訊,請參閱 AWS Glue 開發人員指南中的在AWS Glue 中編輯指

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考GetPlan中的。

下列程式碼範例會示範如何使用get-tables

AWS CLI

列出指定資料庫中部分或全部資料表的定義

下列 get-tables 範例會傳回指定資料庫中資料表的相關資訊。

aws glue get-tables --database-name 'tempdb'

輸出:

{ "TableList": [ { "Name": "my-s3-sink", "DatabaseName": "tempdb", "CreateTime": 1602730539.0, "UpdateTime": 1602730539.0, "Retention": 0, "StorageDescriptor": { "Columns": [ { "Name": "sensorid", "Type": "int" }, { "Name": "currenttemperature", "Type": "int" }, { "Name": "status", "Type": "string" } ], "Location": "s3://janetst-bucket-01/test-s3-output/", "Compressed": false, "NumberOfBuckets": 0, "SerdeInfo": { "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe" }, "SortColumns": [], "StoredAsSubDirectories": false }, "Parameters": { "classification": "json" }, "CreatedBy": "arn:aws:iam::007436865787:user/JRSTERN", "IsRegisteredWithLakeFormation": false, "CatalogId": "007436865787" }, { "Name": "s3-source", "DatabaseName": "tempdb", "CreateTime": 1602730658.0, "UpdateTime": 1602730658.0, "Retention": 0, "StorageDescriptor": { "Columns": [ { "Name": "sensorid", "Type": "int" }, { "Name": "currenttemperature", "Type": "int" }, { "Name": "status", "Type": "string" } ], "Location": "s3://janetst-bucket-01/", "Compressed": false, "NumberOfBuckets": 0, "SortColumns": [], "StoredAsSubDirectories": false }, "Parameters": { "classification": "json" }, "CreatedBy": "arn:aws:iam::007436865787:user/JRSTERN", "IsRegisteredWithLakeFormation": false, "CatalogId": "007436865787" }, { "Name": "test-kinesis-input", "DatabaseName": "tempdb", "CreateTime": 1601507001.0, "UpdateTime": 1601507001.0, "Retention": 0, "StorageDescriptor": { "Columns": [ { "Name": "sensorid", "Type": "int" }, { "Name": "currenttemperature", "Type": "int" }, { "Name": "status", "Type": "string" } ], "Location": "my-testing-stream", "Compressed": false, "NumberOfBuckets": 0, "SerdeInfo": { "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe" }, "SortColumns": [], "Parameters": { "kinesisUrl": "https://kinesis.us-east-1.amazonaws.com", "streamName": "my-testing-stream", "typeOfData": "kinesis" }, "StoredAsSubDirectories": false }, "Parameters": { "classification": "json" }, "CreatedBy": "arn:aws:iam::007436865787:user/JRSTERN", "IsRegisteredWithLakeFormation": false, "CatalogId": "007436865787" } ] }

如需詳細資訊,請參閱 Glue 開發人員指南中的 AWS Glue 資料型錄中AWS 的定義表格

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考GetTables中的。

下列程式碼範例會示範如何使用start-crawler

AWS CLI

啟動爬蟲程式

以下 start-crawler 範例會啟動爬蟲程式。

aws glue start-crawler --name my-crawler

輸出:

None

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的定義爬蟲程式

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考StartCrawler中的。

下列程式碼範例會示範如何使用start-job-run

AWS CLI

開始執行工作

以下 start-job-run 範例會啟動工作。

aws glue start-job-run \ --job-name my-job

輸出:

{ "JobRunId": "jr_22208b1f44eb5376a60569d4b21dd20fcb8621e1a366b4e7b2494af764b82ded" }

如需詳細資訊,請參閱《AWS Glue 開發人員指南》中的授權工作

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考StartJobRun中的。