AWS CloudFormation for AWS Glue


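AWS CloudFormation is a service that can create many AWS resources. AWS Glue provides API operations to create objects in the AWS Glue Data Catalog. However, it might be more convenient to define and create AWS Glue objects and other related AWS resource objects in an AWS CloudFormation template file. You can then automate the process of creating the objects.

AWS CloudFormation provides a simplified syntax, either JSON (JavaScript Object Notation) or YAML (YAML Ain't Markup Language), to express the creation of AWS resources. You can use AWS CloudFormation templates to define Data Catalog objects such as databases, tables, partitions, crawlers, classifiers, and connections. You can also define ETL objects such as jobs, triggers, and development endpoints. You create a template that describes all the AWS resources you want, and AWS CloudFormation takes care of provisioning and configuring those resources for you.

For more information, see What Is AWS CloudFormation? and Working with AWS CloudFormation Templates in the AWS CloudFormation User Guide.

If you plan to use AWS CloudFormation templates that are compatible with AWS Glue, an administrator must grant access to AWS CloudFormation and to the AWS services and actions on which it depends. To grant permissions to create AWS CloudFormation resources, attach the following policy to the users that work with AWS CloudFormation. This policy covers only AWS CloudFormation actions; the AWS Glue, IAM, and other actions that a template performs must also be allowed, either for the user or for a service role that AWS CloudFormation uses.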


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [                
        "cloudformation:*"        
      ],
      "Resource": "*"
    }
  ]
}

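The following table lists the AWS Glue resources that an AWS CloudFormation template can create on your behalf, the corresponding AWS CloudFormation resource type, and the samples for each. The resource type names refer to the AWS CloudFormation reference pages that describe the properties and property types you can add to a template.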

AWS Glue resource	AWS CloudFormation resource type	AWS Glue samples
Classifier	AWS::Glue::Classifier	Grok classifier, JSON classifier, XML classifier
Connection	AWS::Glue::Connection	MySQL connection
Crawler	AWS::Glue::Crawler	Amazon S3 crawler, MySQL crawler
Database	AWS::Glue::Database	Empty database, Database with tables
Development endpoint	AWS::Glue::DevEndpoint	Development endpoint
Job	AWS::Glue::Job	Amazon S3 job, JDBC job
Machine learning transform	AWS::Glue::MLTransform	Machine learning transform
Data quality ruleset	AWS::Glue::DataQualityRuleset	Data quality ruleset, Data quality ruleset with EventBridge Scheduler
Partition	AWS::Glue::Partition	Partitions of a table
Table	AWS::Glue::Table	Table in a database
Trigger	AWS::Glue::Trigger	On-demand trigger, Scheduled trigger, Conditional trigger

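To get started, use the following sample templates and customize them with your own metadata. Then use the AWS CloudFormation console to create an AWS CloudFormation stack that adds the objects to AWS Glue and any associated services. Many fields in an AWS Glue object are optional. These templates show the fields that are required, or that are necessary for a working, functional AWS Glue object.

An AWS CloudFormation template can be in either JSON or YAML format. These examples use YAML for easier readability, and they contain comments (#) that describe the values defined in the templates.

An AWS CloudFormation template can include a Parameters section. You can change the parameter values by editing the sample text, or override them when you submit the YAML file to the AWS CloudFormation console to create a stack. The Resources section of the template contains the definitions of the AWS Glue objects and related objects. AWS CloudFormation template syntax definitions might contain properties that include more detailed property syntax. Not all properties are required to create an AWS Glue object. The following samples show example values for common properties used to create AWS Glue objects.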
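
To make the relationship between the Parameters and Resources sections concrete, the following Python sketch models how a `Ref` to a template parameter is replaced by that parameter's value: its `Default`, unless you override it when you create the stack. This is a simplified illustration, not AWS CloudFormation's implementation. It only resolves `{"Ref": ...}` nodes that name template parameters, and the `resolve_refs` and `render` helpers and the trimmed-down template are hypothetical.

```python
from __future__ import annotations

from typing import Any


def resolve_refs(node: Any, values: dict[str, Any]) -> Any:
    """Recursively replace {"Ref": name} nodes whose name is a known parameter."""
    if isinstance(node, dict):
        if set(node) == {"Ref"} and node["Ref"] in values:
            return values[node["Ref"]]
        return {key: resolve_refs(value, values) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, values) for item in node]
    return node  # scalars are returned unchanged


def render(template: dict[str, Any], overrides: dict[str, Any] | None = None) -> dict[str, Any]:
    """Return the Resources section with parameter references substituted."""
    # Each parameter starts from its Default; values supplied at stack creation win.
    values = {name: spec.get("Default") for name, spec in template.get("Parameters", {}).items()}
    values.update(overrides or {})
    return resolve_refs(template["Resources"], values)


# A trimmed-down database sample written as a Python dict (hypothetical, for illustration).
TEMPLATE = {
    "Parameters": {"CFNDatabaseName": {"Type": "String", "Default": "cfn-mysampledatabase"}},
    "Resources": {
        "CFNDatabaseFlights": {
            "Type": "AWS::Glue::Database",
            "Properties": {
                # AWS::AccountId is a pseudo parameter, so this sketch leaves it unresolved.
                "CatalogId": {"Ref": "AWS::AccountId"},
                "DatabaseInput": {"Name": {"Ref": "CFNDatabaseName"}},
            },
        }
    },
}

name = render(TEMPLATE)["CFNDatabaseFlights"]["Properties"]["DatabaseInput"]["Name"]
print(name)  # cfn-mysampledatabase
```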

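Sample AWS CloudFormation template for an AWS Glue database

An AWS Glue database in the Data Catalog contains metadata tables. A database has very few properties and can be created in the Data Catalog with an AWS CloudFormation template. The following sample template is provided to get you started and to illustrate the use of AWS CloudFormation stacks with AWS Glue. The only resource that the sample template creates is a database named cfn-mysampledatabase. You can change it by editing the text of the sample, or by changing the value on the AWS CloudFormation console when you submit the YAML.

The following shows example values for common properties used to create an AWS Glue database. For more information about the AWS CloudFormation database template for AWS Glue, see AWS::Glue::Database.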



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CloudFormation template in YAML to demonstrate creating a database named mysampledatabase
# The metadata created in the Data Catalog points to the flights public S3 bucket
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names the resources created in the Data Catalog
Parameters:
  CFNDatabaseName:
    Type: String
    Default: cfn-mysampledatabase

# Resources section defines metadata for the Data Catalog
Resources:
# Create an AWS Glue database
  CFNDatabaseFlights:
    Type: AWS::Glue::Database
    Properties:
      # The database is created in the Data Catalog for your account
      CatalogId: !Ref AWS::AccountId   
      DatabaseInput:
        # The name of the database is defined in the Parameters section above
        Name: !Ref CFNDatabaseName	
        Description: Database to hold tables for flights data
        LocationUri: s3://crawler-public-us-east-1/flight/2016/csv/
        #Parameters: Leave AWS database parameters blank
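
Because AWS CloudFormation accepts JSON as well as YAML, the same database template can also be produced programmatically. The following Python sketch builds the sample above as a dictionary and serializes it with the standard `json` module; in JSON, the short-form `!Ref` tag becomes a `{"Ref": ...}` object. The `build_database_template` helper is hypothetical and covers only the properties used in this sample.

```python
import json


def build_database_template(db_name: str = "cfn-mysampledatabase") -> dict:
    """Return the database sample as a JSON-compatible CloudFormation template."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "CFNDatabaseName": {"Type": "String", "Default": db_name},
        },
        "Resources": {
            "CFNDatabaseFlights": {
                "Type": "AWS::Glue::Database",
                "Properties": {
                    # YAML "!Ref AWS::AccountId" is written as {"Ref": ...} in JSON
                    "CatalogId": {"Ref": "AWS::AccountId"},
                    "DatabaseInput": {
                        "Name": {"Ref": "CFNDatabaseName"},
                        "Description": "Database to hold tables for flights data",
                        "LocationUri": "s3://crawler-public-us-east-1/flight/2016/csv/",
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON form; it can be saved to a file and submitted like the YAML version.
    print(json.dumps(build_database_template(), indent=2))
```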

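Sample AWS CloudFormation template for an AWS Glue database, table, and partitions

An AWS Glue table contains the metadata that defines the structure and location of the data that you want to process with your ETL scripts. Within a table, you can define partitions to parallelize the processing of your data. A partition is a chunk of data that you define with a key. For example, if you use month as the key, all the data for January is contained in the same partition. In AWS Glue, databases contain tables, and tables contain partitions.

The following sample shows how to populate a database, a table, and partitions using an AWS CloudFormation template. The base data format is csv, delimited by a comma (,). Because a database must exist before it can contain a table, and a table must exist before partitions can be created, the template uses the DependsOn attribute to define the dependencies between these objects when they are created.

The values in this sample define a table that contains flight data from a publicly available Amazon S3 bucket. For illustration, only a few columns of the data and one partitioning key are defined. Four partitions are also defined in the Data Catalog. Some fields that describe the storage of the base data are shown in the StorageDescriptor fields.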



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CloudFormation template in YAML to demonstrate creating a database, a table, and partitions
# The metadata created in the Data Catalog points to the flights public S3 bucket
#
# Parameters substituted in the Resources section
# These parameters are names of the resources created in the Data Catalog
Parameters:
  CFNDatabaseName:
    Type: String
    Default: cfn-database-flights-1
  CFNTableName1:
    Type: String
    Default: cfn-manual-table-flights-1
# Resources to create metadata in the Data Catalog
Resources:
###
# Create an AWS Glue database
  CFNDatabaseFlights:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: !Ref CFNDatabaseName	
        Description: Database to hold tables for flights data
###
# Create an AWS Glue table
  CFNTableFlights:
    # Creating the table waits for the database to be created
    DependsOn: CFNDatabaseFlights
    Type: AWS::Glue::Table
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseName: !Ref CFNDatabaseName
      TableInput:
        Name: !Ref CFNTableName1
        Description: Define the first few columns of the flights table
        TableType: EXTERNAL_TABLE
        Parameters:
          classification: "csv"
#       ViewExpandedText: String
        PartitionKeys:
        # Data is partitioned by month
        - Name: mon
          Type: bigint
        StorageDescriptor:
          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          Columns:
          - Name: year
            Type: bigint
          - Name: quarter
            Type: bigint
          - Name: month
            Type: bigint
          - Name: day_of_month
            Type: bigint			
          InputFormat: org.apache.hadoop.mapred.TextInputFormat
          Location: s3://crawler-public-us-east-1/flight/2016/csv/
          SerdeInfo:
            Parameters:
              field.delim: ","
            SerializationLibrary: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
# Partition 1
# Create an AWS Glue partition  
  CFNPartitionMon1:
    DependsOn: CFNTableFlights
    Type: AWS::Glue::Partition
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseName: !Ref CFNDatabaseName
      TableName: !Ref CFNTableName1
      PartitionInput:
        Values:
        - 1
        StorageDescriptor:
          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          Columns:
          - Name: mon
            Type: bigint
          InputFormat: org.apache.hadoop.mapred.TextInputFormat
          Location: s3://crawler-public-us-east-1/flight/2016/csv/mon=1/
          SerdeInfo:
            Parameters:
              field.delim: ","
            SerializationLibrary: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
# Partition 2
# Create an AWS Glue partition 
  CFNPartitionMon2:
    DependsOn: CFNTableFlights
    Type: AWS::Glue::Partition
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseName: !Ref CFNDatabaseName
      TableName: !Ref CFNTableName1
      PartitionInput:
        Values:
        - 2
        StorageDescriptor:
          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          Columns:
          - Name: mon
            Type: bigint
          InputFormat: org.apache.hadoop.mapred.TextInputFormat
          Location: s3://crawler-public-us-east-1/flight/2016/csv/mon=2/
          SerdeInfo:
            Parameters:
              field.delim: ","
            SerializationLibrary: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
# Partition 3
# Create an AWS Glue partition 
  CFNPartitionMon3:
    DependsOn: CFNTableFlights
    Type: AWS::Glue::Partition
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseName: !Ref CFNDatabaseName
      TableName: !Ref CFNTableName1
      PartitionInput:
        Values:
        - 3
        StorageDescriptor:
          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          Columns:
          - Name: mon
            Type: bigint
          InputFormat: org.apache.hadoop.mapred.TextInputFormat
          Location: s3://crawler-public-us-east-1/flight/2016/csv/mon=3/
          SerdeInfo:
            Parameters:
              field.delim: ","
            SerializationLibrary: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
# Partition 4
# Create an AWS Glue partition 
  CFNPartitionMon4:
    DependsOn: CFNTableFlights
    Type: AWS::Glue::Partition
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseName: !Ref CFNDatabaseName
      TableName: !Ref CFNTableName1
      PartitionInput:
        Values:
        - 4
        StorageDescriptor:
          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          Columns:
          - Name: mon
            Type: bigint
          InputFormat: org.apache.hadoop.mapred.TextInputFormat
          Location: s3://crawler-public-us-east-1/flight/2016/csv/mon=4/
          SerdeInfo:
            Parameters:
              field.delim: ","
            SerializationLibrary: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
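
The four partition resources differ only in their logical ID, their partition value, and their Hive-style `mon=N/` location. The following Python sketch generates them in a loop instead of copying the block by hand, and keeps `DependsOn: CFNTableFlights` on each one. The `partition_resource` helper is hypothetical; the parameter names (`CFNDatabaseName`, `CFNTableName1`) and storage settings are taken from the sample above.

```python
import json

LOCATION = "s3://crawler-public-us-east-1/flight/2016/csv/"
TEXT_INPUT = "org.apache.hadoop.mapred.TextInputFormat"
TEXT_OUTPUT = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LAZY_SERDE = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"


def partition_resource(month: int) -> dict:
    """Build one AWS::Glue::Partition resource for the given month value."""
    return {
        "DependsOn": "CFNTableFlights",  # the table must exist first
        "Type": "AWS::Glue::Partition",
        "Properties": {
            "CatalogId": {"Ref": "AWS::AccountId"},
            "DatabaseName": {"Ref": "CFNDatabaseName"},
            "TableName": {"Ref": "CFNTableName1"},
            "PartitionInput": {
                # Partition values are stored as strings in the Data Catalog
                "Values": [str(month)],
                "StorageDescriptor": {
                    "OutputFormat": TEXT_OUTPUT,
                    "Columns": [{"Name": "mon", "Type": "bigint"}],
                    "InputFormat": TEXT_INPUT,
                    # Hive-style layout: <table location>mon=<value>/
                    "Location": f"{LOCATION}mon={month}/",
                    "SerdeInfo": {
                        "Parameters": {"field.delim": ","},
                        "SerializationLibrary": LAZY_SERDE,
                    },
                },
            },
        },
    }


partitions = {f"CFNPartitionMon{m}": partition_resource(m) for m in range(1, 5)}

if __name__ == "__main__":
    print(json.dumps(partitions, indent=2))
```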

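Sample AWS CloudFormation template for an AWS Glue grok classifier

An AWS Glue classifier determines the schema of your data. One type of custom classifier uses a grok pattern to match your data. If the pattern matches, the custom classifier is used to create your table's schema and sets the classification to the value given in the classifier definition.

This sample creates a classifier that creates a schema with one column named message and sets the classification to greedy.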



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating a classifier
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names the resources created in the Data Catalog
Parameters:                                                                                                       
# The name of the classifier to be created
  CFNClassifierName:  
    Type: String
    Default: cfn-classifier-grok-one-column-1                                                               	
#
#
# Resources section defines metadata for the Data Catalog
Resources:
# Create classifier that uses grok pattern to put all data in one column and classifies it as "greedy".	
  CFNClassifierFlights:
    Type: AWS::Glue::Classifier   
    Properties:
      GrokClassifier:
        #Grok classifier that puts all data in one column		
        Name: !Ref CFNClassifierName
        Classification: greedy                                                        	   
        GrokPattern: "%{GREEDYDATA:message}"
        #CustomPatterns: none
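
In the standard grok pattern library, `GREEDYDATA` is the regular expression `.*`, so `%{GREEDYDATA:message}` captures an entire line into a field named `message`. That is why this classifier produces a single-column schema. The following Python sketch approximates the behavior with the `re` module. It is an illustration only: `grok_to_regex` and `parse_line` are hypothetical helpers that know just this one pattern, not AWS Glue's grok engine, and the sample line is made up.

```python
from __future__ import annotations

import re

# Only the built-in pattern this sample uses; real grok libraries define many more.
BUILTIN_PATTERNS = {"GREEDYDATA": r".*"}

# Matches %{PATTERN:field} references inside a grok expression.
GROK_REF = re.compile(r"%\{(\w+):(\w+)\}")


def grok_to_regex(grok: str) -> re.Pattern:
    """Translate %{PATTERN:field} references into named regex groups."""
    def to_group(match: re.Match) -> str:
        pattern_name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{BUILTIN_PATTERNS[pattern_name]})"

    return re.compile(GROK_REF.sub(to_group, grok))


def parse_line(grok: str, line: str) -> dict[str, str] | None:
    """Return the captured fields if the whole line matches the pattern, else None."""
    match = grok_to_regex(grok).fullmatch(line)
    return match.groupdict() if match else None


print(parse_line("%{GREEDYDATA:message}", "2016,1,1,AA,JFK,LAX"))
# {'message': '2016,1,1,AA,JFK,LAX'}
```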

Sample AWS CloudFormation template for an AWS Glue JSON classifier

An AWS Glue classifier determines the schema of your data. One type of custom classifier uses a JsonPath string that defines the JSON data for the classifier to classify. AWS Glue supports a subset of the JsonPath operators, as described in Writing JsonPath Custom Classifiers.

If the pattern matches, the custom classifier is used to create your table's schema.

This sample creates a classifier that creates a schema from each record in the Records3 array of an object.



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating a JSON classifier
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names the resources created in the Data Catalog
Parameters:                                                                                                       
# The name of the classifier to be created
  CFNClassifierName:  
    Type: String
    Default: cfn-classifier-json-one-column-1                                                               	
#
#
# Resources section defines metadata for the Data Catalog
Resources:
# Create classifier that uses a JSON pattern.	
  CFNClassifierFlights:
    Type: AWS::Glue::Classifier   
    Properties:
      JsonClassifier:
        #JSON classifier		
        Name: !Ref CFNClassifierName
        JsonPath: $.Records3[*]
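
The JsonPath `$.Records3[*]` selects every element of the array stored under the top-level `Records3` key, and each element is treated as one record. The following Python sketch evaluates only that specific path form on a made-up document, to show which objects the classifier would treat as records. `select_records` and the sample data are hypothetical; AWS Glue's own JsonPath support is described in Writing JsonPath Custom Classifiers.

```python
from __future__ import annotations

import json
import re
from typing import Any

# This sketch supports only paths of the form $.<key>[*], as used in the sample.
PATH_FORM = re.compile(r"\$\.(\w+)\[\*\]")


def select_records(path: str, document: Any) -> list[Any]:
    """Return the array elements selected by a $.<key>[*] path."""
    match = PATH_FORM.fullmatch(path)
    if match is None:
        raise ValueError(f"path not supported by this sketch: {path}")
    value = document.get(match.group(1)) if isinstance(document, dict) else None
    return list(value) if isinstance(value, list) else []


# Hypothetical input document with two records under "Records3"
sample = json.loads('{"Records3": [{"id": 1, "city": "Seattle"}, {"id": 2, "city": "Boston"}]}')
for record in select_records("$.Records3[*]", sample):
    print(sorted(record))  # the keys of each record are the candidate columns
```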

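Sample AWS CloudFormation template for an AWS Glue XML classifier

An AWS Glue classifier determines the schema of your data. One type of custom classifier specifies an XML tag that designates the element containing each record in the XML document being parsed. If the pattern matches, the custom classifier is used to create your table's schema and sets the classification to the value given in the classifier definition.

This sample creates a classifier that creates a schema from each record in the Records tag and sets the classification to XML.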



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating an XML classifier
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names the resources created in the Data Catalog
Parameters:                                                                                                       
# The name of the classifier to be created
  CFNClassifierName:  
    Type: String
    Default: cfn-classifier-xml-one-column-1                                                               	
#
#
# Resources section defines metadata for the Data Catalog
Resources:
# Create classifier that uses the XML pattern and classifies it as "XML".	
  CFNClassifierFlights:
    Type: AWS::Glue::Classifier   
    Properties:
      XMLClassifier:
        #XML classifier		
        Name: !Ref CFNClassifierName
        Classification: XML   
        RowTag: Records  # name of the element that wraps each record
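
`RowTag` names the element that wraps each record. The following Python sketch uses the standard `xml.etree.ElementTree` module to show how a row tag named `Records` splits a small, made-up XML document into records whose child elements act as columns. It illustrates the concept only and is not how AWS Glue parses XML internally; `records_for_tag` is a hypothetical helper.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical document in which each <Records> element holds one record
SAMPLE = """
<flights>
  <Records><carrier>AA</carrier><origin>JFK</origin></Records>
  <Records><carrier>DL</carrier><origin>ATL</origin></Records>
</flights>
""".strip()


def records_for_tag(xml_text: str, row_tag: str) -> list[dict[str, str]]:
    """Return one dict per <row_tag> element, mapping child tags to their text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "") for child in element}
        for element in root.iter(row_tag)
    ]


print(records_for_tag(SAMPLE, "Records"))
```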

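Sample AWS CloudFormation template for an AWS Glue crawler for Amazon S3

An AWS Glue crawler creates metadata tables in your Data Catalog that correspond to your data. You can then use these table definitions as sources and targets in your ETL jobs.

This sample creates a crawler, the required IAM role, and an AWS Glue database in the Data Catalog. When this crawler runs, it assumes the IAM role and creates a table in the database for the public flights data. The table is created with the prefix "cfn_sample_1_". The IAM role created by this template allows global permissions; for production use, you might want to create a role with narrower permissions. This crawler doesn't define any custom classifiers, so the AWS Glue built-in classifiers are used by default.

When you submit this sample to the AWS CloudFormation console, you must confirm that you want to create the IAM role.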



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating a crawler
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names the resources created in the Data Catalog
Parameters:                                                                                                       
# The name of the crawler to be created
  CFNCrawlerName:  
    Type: String
    Default: cfn-crawler-flights-1
  CFNDatabaseName:
    Type: String
    Default: cfn-database-flights-1
  CFNTablePrefixName:
    Type: String
    Default: cfn_sample_1_	
#
#
# Resources section defines metadata for the Data Catalog
Resources:
#Create IAM Role assumed by the crawler. For demonstration, this role is given all permissions.
  CFNRoleFlights:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Principal:
              Service:
                - "glue.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        -
          PolicyName: "root"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              -
                Effect: "Allow"
                Action: "*"
                Resource: "*"
 # Create a database to contain tables created by the crawler
  CFNDatabaseFlights:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: !Ref CFNDatabaseName
        Description: "AWS Glue container to hold metadata tables for the flights crawler"
 #Create a crawler to crawl the flights data on a public S3 bucket
  CFNCrawlerFlights:
    Type: AWS::Glue::Crawler
    Properties:
      Name: !Ref CFNCrawlerName
      Role: !GetAtt CFNRoleFlights.Arn
      #Classifiers: none, use the default classifier
      Description: AWS Glue crawler to crawl flights data
      #Schedule: none, use default run-on-demand
      DatabaseName: !Ref CFNDatabaseName
      Targets:
        S3Targets:
          # Public S3 bucket with the flights data
          - Path: "s3://crawler-public-us-east-1/flight/2016/csv"
      TablePrefix: !Ref CFNTablePrefixName
      SchemaChangePolicy:
        UpdateBehavior: "UPDATE_IN_DATABASE"
        DeleteBehavior: "LOG"
      Configuration: "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}}}"
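
The crawler's `Configuration` property is a JSON document embedded as a string, which is why the template escapes every quotation mark. Hand-escaping is error-prone, so one option is to generate the string instead. The following Python sketch builds the same settings with `json.dumps` and compact separators; the `crawler_configuration` helper is hypothetical, while the keys and values are copied from the sample.

```python
import json


def crawler_configuration(partitions: str = "InheritFromTable",
                          tables: str = "MergeNewColumns") -> str:
    """Serialize the crawler output settings used in the sample template."""
    settings = {
        "Version": 1.0,
        "CrawlerOutput": {
            "Partitions": {"AddOrUpdateBehavior": partitions},
            "Tables": {"AddOrUpdateBehavior": tables},
        },
    }
    # Compact separators reproduce the string exactly as written in the template
    return json.dumps(settings, separators=(",", ":"))


print(crawler_configuration())
```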

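Sample AWS CloudFormation template for an AWS Glue connection

An AWS Glue connection in the Data Catalog contains the JDBC and network information that is required to connect to a JDBC database. This information is used when you connect to a JDBC database to crawl it or to run ETL jobs.

This sample creates a connection to an Amazon RDS MySQL database named devdb. When this connection is used, an IAM role, database credentials, and network connection values must also be supplied. See the template for details of the necessary fields. The JDBC URL, user name, password, Availability Zone, security group, and subnet in the template are placeholders; replace them with the values for your own database and VPC.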



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating a connection
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names the resources created in the Data Catalog
Parameters:                                                                                                       
# The name of the connection to be created
  CFNConnectionName:  
    Type: String
    Default: cfn-connection-mysql-flights-1
  CFNJDBCString:  
    Type: String
    Default: "jdbc:mysql://xxx-mysql.yyyyyyyyyyyyyy.us-east-1.rds.amazonaws.com:3306/devdb"
  CFNJDBCUser:  
    Type: String
    Default: "master"
  CFNJDBCPassword:  
    Type: String
    Default: "12345678"
    NoEcho: true
#
#
# Resources section defines metadata for the Data Catalog
Resources:
  CFNConnectionMySQL:
    Type: AWS::Glue::Connection
    Properties:
      CatalogId: !Ref AWS::AccountId
      ConnectionInput: 
        Description: "Connect to MySQL database."
        ConnectionType: "JDBC"
        #MatchCriteria: none		
        PhysicalConnectionRequirements:
          AvailabilityZone: "us-east-1d"
          SecurityGroupIdList: 
           - "sg-7d52b812"
          SubnetId: "subnet-84f326ee" 
        ConnectionProperties: {
          "JDBC_CONNECTION_URL": !Ref CFNJDBCString,
          "USERNAME": !Ref CFNJDBCUser,
          "PASSWORD": !Ref CFNJDBCPassword
        }
        Name: !Ref CFNConnectionName
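
The `JDBC_CONNECTION_URL` value follows the usual `jdbc:<engine>://host:port/database` shape. Before you create the stack, it can help to check that the URL you substitute for `CFNJDBCString` has the expected parts. The following Python sketch does that with `urllib.parse` after stripping the `jdbc:` prefix. `split_jdbc_url` and `JdbcTarget` are hypothetical, and the host is the placeholder from the sample.

```python
from __future__ import annotations

from typing import NamedTuple
from urllib.parse import urlparse


class JdbcTarget(NamedTuple):
    engine: str
    host: str
    port: int
    database: str


def split_jdbc_url(url: str) -> JdbcTarget:
    """Split a jdbc:<engine>://host:port/database URL into its parts."""
    if not url.startswith("jdbc:"):
        raise ValueError("expected a URL starting with 'jdbc:'")
    parsed = urlparse(url[len("jdbc:"):])  # parse the remaining <engine>://... URL
    if not parsed.hostname or parsed.port is None:
        raise ValueError("URL must include a host and a port")
    return JdbcTarget(parsed.scheme, parsed.hostname, parsed.port, parsed.path.lstrip("/"))


target = split_jdbc_url(
    "jdbc:mysql://xxx-mysql.yyyyyyyyyyyyyy.us-east-1.rds.amazonaws.com:3306/devdb"
)
print(target.engine, target.port, target.database)  # mysql 3306 devdb
```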

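Sample AWS CloudFormation template for an AWS Glue crawler for JDBC

This sample creates a crawler, the required IAM role, and an AWS Glue database in the Data Catalog. When this crawler runs, it assumes the IAM role and creates a table in the database for the public flights data that is stored in a MySQL database. The crawler reaches that database through the existing connection named cfn-connection-mysql-flights-1. The table is created with the prefix "cfn_jdbc_1_". The IAM role created by this template allows global permissions; for production use, you might want to create a role with narrower permissions. Custom classifiers can't be defined for JDBC data, so the AWS Glue built-in classifiers are used by default.

When you submit this sample to the AWS CloudFormation console, you must confirm that you want to create the IAM role.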



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating a crawler
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names the resources created in the Data Catalog
Parameters:                                                                                                       
# The name of the crawler to be created
  CFNCrawlerName:  
    Type: String
    Default: cfn-crawler-jdbc-flights-1
# The name of the database to be created to contain tables	
  CFNDatabaseName:
    Type: String
    Default: cfn-database-jdbc-flights-1
# The prefix for all tables crawled and created	
  CFNTablePrefixName:
    Type: String
    Default: cfn_jdbc_1_
# The name of the existing connection to the MySQL database
  CFNConnectionName:  
    Type: String
    Default: cfn-connection-mysql-flights-1
# The name of the JDBC path (database/schema/table) with wildcard (%) to crawl	
  CFNJDBCPath:  
    Type: String
    Default: saldev/%		
#
#
# Resources section defines metadata for the Data Catalog
Resources:
#Create IAM Role assumed by the crawler. For demonstration, this role is given all permissions.
  CFNRoleFlights:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Principal:
              Service:
                - "glue.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        -
          PolicyName: "root"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              -
                Effect: "Allow"
                Action: "*"
                Resource: "*"
 # Create a database to contain tables created by the crawler
  CFNDatabaseFlights:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: !Ref CFNDatabaseName
        Description: "AWS Glue container to hold metadata tables for the flights crawler"
 #Create a crawler to crawl the flights data in MySQL database
  CFNCrawlerFlights:
    Type: AWS::Glue::Crawler
    Properties:
      Name: !Ref CFNCrawlerName
      Role: !GetAtt CFNRoleFlights.Arn
      #Classifiers: none, use the default classifier
      Description: AWS Glue crawler to crawl flights data
      #Schedule: none, use default run-on-demand
      DatabaseName: !Ref CFNDatabaseName
      Targets:
        JdbcTargets:
          # JDBC MySQL database with the flights data
          - ConnectionName: !Ref CFNConnectionName
            Path: !Ref CFNJDBCPath
          #Exclusions: none
      TablePrefix: !Ref CFNTablePrefixName
      SchemaChangePolicy:
        UpdateBehavior: "UPDATE_IN_DATABASE"
        DeleteBehavior: "LOG"
      Configuration: "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}}}"
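
The `Path` value `saldev/%` names a database and uses `%` as a wildcard for the table part, so the crawler includes every table in `saldev`. The following Python sketch models that matching by turning each `%` into the regular expression `.*`. It is a simplified illustration with made-up table names, not the crawler's actual include logic, and `jdbc_path_matches` is hypothetical.

```python
import re


def jdbc_path_matches(include_path: str, object_path: str) -> bool:
    """Return True if database/table matches an include path that may contain %."""
    # Treat everything literally, then let each % stand for any run of characters
    pattern = ".*".join(re.escape(part) for part in include_path.split("%"))
    return re.fullmatch(pattern, object_path) is not None


tables = ["saldev/flights", "saldev/airports", "otherdb/flights"]  # hypothetical names
print([t for t in tables if jdbc_path_matches("saldev/%", t)])
```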

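Sample AWS CloudFormation template for an AWS Glue job for Amazon S3 to Amazon S3

An AWS Glue job in the Data Catalog contains the parameter values that are required to run a script in AWS Glue.

This sample creates a job that reads flight data in csv format from an Amazon S3 bucket and writes it to an Amazon S3 Parquet file. The script that this job runs must already exist. You can generate an ETL script for your environment with the AWS Glue console. When this job runs, an IAM role with the correct permissions must also be supplied.

Common parameter values are shown in the template. For example, the template sets AllocatedCapacity (DPUs) to 5.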



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating a job using the public flights S3 table in a public bucket
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names the resources created in the Data Catalog
Parameters:                                                                                                       
# The name of the job to be created
  CFNJobName:  
    Type: String
    Default: cfn-job-S3-to-S3-2
# The name of the IAM role that the job assumes. It must have access to data, script, temporary directory
  CFNIAMRoleName:  
    Type: String
    Default: AWSGlueServiceRoleGA
# The S3 path where the script for this job is located
  CFNScriptLocation:  
    Type: String
    Default: s3://aws-glue-scripts-123456789012-us-east-1/myid/sal-job-test2	
#
#
# Resources section defines metadata for the Data Catalog
Resources:                                      
# Create job to run script which accesses flightscsv table and write to S3 file as parquet.
# The script already exists and is called by this job	
  CFNJobFlights:
    Type: AWS::Glue::Job   
    Properties:
      Role: !Ref CFNIAMRoleName  
      #DefaultArguments: JSON object 
      # If the script is written in Scala, set DefaultArguments={'--job-language': 'scala', '--class': 'your scala class'}
      #Connections:  No connection needed for S3 to S3 job 
      #  ConnectionsList  
      #MaxRetries: Double  
      Description: Job created with CloudFormation  
      #LogUri: String  
      Command:   
        Name: glueetl  
        ScriptLocation: !Ref CFNScriptLocation
             # for access to directories use proper IAM role with permission to buckets and folders that begin with "aws-glue-"					 
             # script uses temp directory from job definition if required (temp directory not used S3 to S3)
             # script defines target for output as s3://aws-glue-target/sal    			 
      AllocatedCapacity: 5  
      ExecutionProperty:   
        MaxConcurrentRuns: 1  
      Name: !Ref CFNJobName
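
`DefaultArguments` holds argument names and string values that AWS Glue passes to the script on each run. The comment in the template shows the Scala case (`--job-language` and `--class`). The following Python sketch assembles a job resource with those arguments and prints it as JSON, which is one way to avoid quoting mistakes in the inline form. `job_resource` and the class name `com.example.FlightsJob` are hypothetical; the other values are copied from the sample.

```python
from __future__ import annotations

import json


def job_resource(script_location: str, role: str, scala_class: str | None = None) -> dict:
    """Build an AWS::Glue::Job resource shaped like the S3-to-S3 sample."""
    properties: dict = {
        "Role": role,
        "Description": "Job created with CloudFormation",
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
        "AllocatedCapacity": 5,
        "ExecutionProperty": {"MaxConcurrentRuns": 1},
        "Name": "cfn-job-S3-to-S3-2",
    }
    if scala_class is not None:
        # A Scala script needs the language and its entry-point class as arguments
        properties["DefaultArguments"] = {
            "--job-language": "scala",
            "--class": scala_class,
        }
    return {"Type": "AWS::Glue::Job", "Properties": properties}


job = job_resource(
    "s3://aws-glue-scripts-123456789012-us-east-1/myid/sal-job-test2",
    "AWSGlueServiceRoleGA",
    scala_class="com.example.FlightsJob",  # hypothetical Scala class name
)
print(json.dumps(job, indent=2))
```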

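Sample AWS CloudFormation template for an AWS Glue job for JDBC to Amazon S3

An AWS Glue job in the Data Catalog contains the parameter values that are required to run a script in AWS Glue.

This sample creates a job that reads flight data from a MySQL JDBC database, as defined by the connection named cfn-connection-mysql-flights-1, and writes it to an Amazon S3 Parquet file. The script that this job runs must already exist. You can generate an ETL script for your environment with the AWS Glue console. When this job runs, an IAM role with the correct permissions must also be supplied.

Common parameter values are shown in the template. For example, the template sets AllocatedCapacity (DPUs) to 5.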



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating a job using a MySQL JDBC DB with the flights data to an S3 file
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names the resources created in the Data Catalog
Parameters:                                                                                                       
# The name of the job to be created
  CFNJobName:  
    Type: String
    Default: cfn-job-JDBC-to-S3-1
# The name of the IAM role that the job assumes. It must have access to data, script, temporary directory
  CFNIAMRoleName:  
    Type: String
    Default: AWSGlueServiceRoleGA
# The S3 path where the script for this job is located
  CFNScriptLocation:  
    Type: String
    Default: s3://aws-glue-scripts-123456789012-us-east-1/myid/sal-job-dec4a	
# The name of the connection used for JDBC data source
  CFNConnectionName:  
    Type: String
    Default: cfn-connection-mysql-flights-1
#
#
# Resources section defines metadata for the Data Catalog
Resources:                                      
# Create job to run script which accesses JDBC flights table via a connection and write to S3 file as parquet.
# The script already exists and is called by this job	
  CFNJobFlights:
    Type: AWS::Glue::Job   
    Properties:
      Role: !Ref CFNIAMRoleName  
      #DefaultArguments: JSON object  
      # For example, if the script requires it, set the temporary directory as DefaultArguments={'--TempDir': 's3://aws-glue-temporary-xyc/sal'}
      Connections:
        Connections:
        - !Ref CFNConnectionName 
      #MaxRetries: Double  
      Description: Job created with CloudFormation using existing script
      #LogUri: String  
      Command:   
        Name: glueetl  
        ScriptLocation: !Ref CFNScriptLocation
             # for access to directories use proper IAM role with permission to buckets and folders that begin with "aws-glue-"					 
             # if required, script defines temp directory as argument TempDir and used in script like redshift_tmp_dir = args["TempDir"] 
             # script defines target for output as s3://aws-glue-target/sal    			 
      AllocatedCapacity: 5  
      ExecutionProperty:   
        MaxConcurrentRuns: 1  
      Name: !Ref CFNJobName
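
The comments in the template mention that a script can read a `TempDir` job argument. Inside AWS Glue this is normally done with `awsglue.utils.getResolvedOptions`, which isn't available outside the AWS Glue runtime. The following Python sketch swaps in the standard `argparse` module to show the same idea locally: a `--TempDir` value is passed on the command line and read by name. Treat it as a stand-in, not the AWS Glue API; `read_job_args` is hypothetical and the bucket path is the placeholder from the template comment.

```python
from __future__ import annotations

import argparse


def read_job_args(argv: list[str] | None = None) -> dict:
    """Parse the job arguments this sketch needs and ignore any others."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--JOB_NAME", default="cfn-job-JDBC-to-S3-1")
    parser.add_argument("--TempDir", default=None)
    # parse_known_args tolerates extra arguments instead of exiting with an error
    known, _unknown = parser.parse_known_args(argv)
    return vars(known)


args = read_job_args(["--TempDir", "s3://aws-glue-temporary-xyc/sal", "--extra", "1"])
redshift_tmp_dir = args["TempDir"]  # same variable name the template comment uses
print(redshift_tmp_dir)
```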

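Sample AWS CloudFormation template for an AWS Glue on-demand trigger

An AWS Glue trigger in the Data Catalog contains the parameter values that are required to start a job run when the trigger fires. An on-demand trigger fires when you start it.

This sample creates an on-demand trigger that starts one job named cfn-job-S3-to-S3-1.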



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating an on-demand trigger
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names the resources created in the Data Catalog
Parameters:
  # The existing job to be started by this trigger 
  CFNJobName:
    Type: String
    Default: cfn-job-S3-to-S3-1
  # The name of the trigger to be created
  CFNTriggerName:
    Type: String
    Default: cfn-trigger-ondemand-flights-1	
#
# Resources section defines metadata for the Data Catalog
# Sample CFN YAML to demonstrate creating an on-demand trigger for a job	
Resources:                                      
# Create trigger to run an existing job (CFNJobName) on an on-demand schedule.	
  CFNTriggerSample:
    Type: AWS::Glue::Trigger   
    Properties:
      Name:
        Ref: CFNTriggerName		
      Description: Trigger created with CloudFormation
      Type: ON_DEMAND                                                        	   
      Actions:
        - JobName: !Ref CFNJobName                	  
        # Arguments: JSON object
      #Schedule: 
      #Predicate:

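Sample AWS CloudFormation template for an AWS Glue scheduled trigger

An AWS Glue trigger in the Data Catalog contains the parameter values that are required to start a job run when the trigger fires. A scheduled trigger fires when it is enabled and its cron schedule comes due.

This sample creates a scheduled trigger that starts one job named cfn-job-S3-to-S3-1. The timer is a cron expression that runs the job every 10 minutes on weekdays.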



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating a scheduled trigger
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names the resources created in the Data Catalog
Parameters:
  # The existing job to be started by this trigger 
  CFNJobName:
    Type: String
    Default: cfn-job-S3-to-S3-1
  # The name of the trigger to be created
  CFNTriggerName:
    Type: String
    Default: cfn-trigger-scheduled-flights-1	
#
# Resources section defines metadata for the Data Catalog
# Sample CFN YAML to demonstrate creating a scheduled trigger for a job
#	
Resources:                                      
# Create trigger to run an existing job (CFNJobName) on a cron schedule.	
  TriggerSample1CFN:
    Type: AWS::Glue::Trigger   
    Properties:
      Name:
        Ref: CFNTriggerName		
      Description: Trigger created with CloudFormation
      Type: SCHEDULED                                                        	   
      Actions:
        - JobName: !Ref CFNJobName                	  
        # Arguments: JSON object
      # Run the trigger every 10 minutes, Monday through Friday
      Schedule: cron(0/10 * ? * MON-FRI *) 
      #Predicate:
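
The expression `cron(0/10 * ? * MON-FRI *)` has six fields: minutes, hours, day of month, month, day of week, and year. `0/10` means every 10 minutes starting at minute 0, and `?` leaves the day of month unspecified because the day of week is given. The following Python sketch checks whether a given time matches this one expression. It is hard-coded for this sample rather than a general cron parser, and it takes times in UTC, the time zone of AWS Glue cron schedules. `matches_sample_schedule` is hypothetical.

```python
from datetime import datetime, timezone

# Python's weekday(): Monday is 0 ... Sunday is 6, so MON-FRI is 0-4.
WEEKDAYS = range(0, 5)


def matches_sample_schedule(when: datetime) -> bool:
    """Check a UTC time against cron(0/10 * ? * MON-FRI *) at minute precision."""
    every_ten_minutes = when.minute % 10 == 0  # 0/10 -> minutes 0, 10, ..., 50
    on_weekday = when.weekday() in WEEKDAYS    # MON-FRI
    return every_ten_minutes and on_weekday     # hours, month, and year are "*"


monday = datetime(2024, 1, 1, 9, 20, tzinfo=timezone.utc)    # 1 January 2024 was a Monday
saturday = datetime(2024, 1, 6, 9, 20, tzinfo=timezone.utc)
print(matches_sample_schedule(monday), matches_sample_schedule(saturday))  # True False
```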

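Sample AWS CloudFormation template for an AWS Glue conditional trigger

An AWS Glue trigger in the Data Catalog contains the parameter values that are required to start a job run when the trigger fires. A conditional trigger fires when it is enabled and its conditions are met, such as another job completing successfully.

This sample creates a conditional trigger that starts one job named cfn-job-S3-to-S3-1. That job starts when the job named cfn-job-S3-to-S3-2 completes successfully.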



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating a conditional trigger for a job, which starts when another job completes
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names the resources created in the Data Catalog
Parameters:
  # The existing job to be started by this trigger
  CFNJobName:
    Type: String
    Default: cfn-job-S3-to-S3-1
  # The existing job whose successful completion causes the trigger to fire
  CFNJobName2:
    Type: String
    Default: cfn-job-S3-to-S3-2
  # The name of the trigger to be created
  CFNTriggerName:
    Type: String
    Default: cfn-trigger-conditional-1
#
Resources:
# Create trigger to run an existing job (CFNJobName) when another job completes (CFNJobName2).
  CFNTriggerSample:
    Type: AWS::Glue::Trigger
    Properties:
      Name:
        Ref: CFNTriggerName
      Description: Trigger created with CloudFormation
      Type: CONDITIONAL
      Actions:
        - JobName: !Ref CFNJobName
        # Arguments: JSON object
      #Schedule: none
      Predicate:
        # A value for Logical is required if more than one condition is listed in Conditions
        Logical: AND
        Conditions:
          - LogicalOperator: EQUALS
            JobName: !Ref CFNJobName2
            State: SUCCEEDED
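
The `Predicate` in this template combines its `Conditions` with `Logical: AND`; `ANY` is the other accepted value. The following Python sketch is an illustrative model of that logic only, not AWS Glue's implementation: each condition compares a watched job's final state using `EQUALS`, and `AND` or `ANY` requires all conditions or at least one condition to hold. The `Condition` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    """One entry of Predicate.Conditions (hypothetical model)."""
    job_name: str
    state: str                        # final job run state, e.g. "SUCCEEDED"
    logical_operator: str = "EQUALS"  # the operator used in the template


def condition_met(cond: Condition, job_states: dict[str, str]) -> bool:
    """True if the watched job's recorded state equals the expected state."""
    if cond.logical_operator != "EQUALS":
        raise ValueError(f"unsupported operator: {cond.logical_operator}")
    return job_states.get(cond.job_name) == cond.state


def predicate_met(logical: str, conditions: list[Condition],
                  job_states: dict[str, str]) -> bool:
    """AND: every condition must hold; ANY: at least one must hold."""
    results = [condition_met(c, job_states) for c in conditions]
    if logical == "AND":
        return all(results)
    if logical == "ANY":
        return any(results)
    raise ValueError(f"unsupported Logical value: {logical}")


# Mirrors the template: start cfn-job-S3-to-S3-1 when cfn-job-S3-to-S3-2 succeeds
conditions = [Condition("cfn-job-S3-to-S3-2", "SUCCEEDED")]
print(predicate_met("AND", conditions, {"cfn-job-S3-to-S3-2": "SUCCEEDED"}))  # True
print(predicate_met("AND", conditions, {"cfn-job-S3-to-S3-2": "FAILED"}))     # False
```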

Sample AWS CloudFormation template for an AWS Glue machine learning transform

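An AWS Glue machine learning transform is a custom transform for cleansing your data. Currently one transform is available, named FindMatches. The FindMatches transform lets you identify duplicate or matching records in your dataset, even when the records do not share a common unique identifier and no fields match exactly.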

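This sample creates a machine learning transform. The template imports the IAM role, database, table, security configuration, and KMS key from other stacks with !ImportValue, so those exports must exist before you deploy it. For more information about the parameters that you need to create a machine learning transform, see Record matching with AWS Lake Formation FindMatches.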



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating a machine learning transform
#
# Resources section defines metadata for the machine learning transform
Resources:
  MyMLTransform:
    Type: "AWS::Glue::MLTransform"
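    # Condition: "isGlueMLGARegion"
    # A resource-level Condition must name a condition defined in the template's
    # Conditions section; this sample defines none, so the reference is commented out.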
    Properties:
      Name: !Sub "MyTransform"
      Description: "The bestest transform ever"
      Role: !ImportValue MyMLTransformUserRole
      GlueVersion: "1.0"
      WorkerType: "Standard"
      NumberOfWorkers: 5
      Timeout: 120
      MaxRetries: 1
      InputRecordTables:
        GlueTables:
          - DatabaseName: !ImportValue MyMLTransformDatabase
            TableName: !ImportValue MyMLTransformTable
      TransformParameters:
        TransformType: "FIND_MATCHES"
        FindMatchesParameters:
          PrimaryKeyColumnName: "testcolumn"
          PrecisionRecallTradeoff: 0.5
          AccuracyCostTradeoff: 0.5
          EnforceProvidedLabels: True
      Tags:
        key1: "value1"
        key2: "value2"
      TransformEncryption:
        TaskRunSecurityConfigurationName: !ImportValue MyMLTransformSecurityConfiguration
        MLUserDataEncryption:
          MLUserDataEncryptionMode: "SSE-KMS"
          KmsKeyId: !ImportValue MyMLTransformEncryptionKey
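
Both tradeoff values in `FindMatchesParameters` accept numbers from 0.0 to 1.0, where 0.5 expresses no preference. For `PrecisionRecallTradeoff`, values toward 1.0 favor precision and values toward 0.0 favor recall; for `AccuracyCostTradeoff`, values toward 1.0 favor accuracy and values toward 0.0 favor lower cost. The hypothetical Python helper below sketches a pre-deployment sanity check for those two values; it is not part of AWS Glue or CloudFormation.

```python
# Hypothetical pre-deployment check for FindMatchesParameters tradeoff values.
TRADEOFFS = {
    # parameter name: (goal favored near 1.0, goal favored near 0.0)
    "PrecisionRecallTradeoff": ("precision", "recall"),
    "AccuracyCostTradeoff": ("accuracy", "lower cost"),
}


def describe_tradeoff(name: str, value: float) -> str:
    """Validate a tradeoff value and describe which goal it leans toward."""
    high_goal, low_goal = TRADEOFFS[name]
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    if value > 0.5:
        return f"biased toward {high_goal}"
    if value < 0.5:
        return f"biased toward {low_goal}"
    return "no preference"


# Values taken from the template above
params = {"PrecisionRecallTradeoff": 0.5, "AccuracyCostTradeoff": 0.5}
for name, value in params.items():
    print(f"{name}: {describe_tradeoff(name, value)}")
```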

Sample AWS CloudFormation template for an AWS Glue Data Quality ruleset

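An AWS Glue Data Quality ruleset contains rules that can be evaluated on a table in the Data Catalog. After the ruleset is placed on your target table, you can go to the Data Catalog and run an evaluation that checks your data against the rules in the ruleset. These rules can range from evaluating row counts to evaluating the referential integrity of your data.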

The following sample is a CloudFormation template that creates a ruleset with a variety of rules on a specified target table.


AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating a DataQualityRuleset
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names of the resources created in the Data Catalog
Parameters:
  # The name of the ruleset to be created
  RulesetName:  
    Type: String
    Default: "CFNRulesetName"
  RulesetDescription:  
    Type: String
    Default: "CFN DataQualityRuleset"
  # Rules that will be associated with this ruleset
  Rules:  
    Type: String
    Default: 'Rules = [
        RowCount > 100,
        IsUnique "id",
        IsComplete "nametype"
        ]'
  # Name of the database and table within the Data Catalog to which the ruleset
  # will be applied
  DatabaseName:  
    Type: String
    Default: "ExampleDatabaseName"
  TableName:  
    Type: String
    Default: "ExampleTableName"

# Resources section defines metadata for the Data Catalog
Resources:
  # Creates a Data Quality ruleset under specified rules 
  DQRuleset:
    Type: AWS::Glue::DataQualityRuleset
    Properties:
      Name: !Ref RulesetName
      Description: !Ref RulesetDescription
      # The String within rules must be formatted in DQDL, a language 
      # used specifically to make rules
      Ruleset: !Ref Rules
      # The targeted table must exist within Data Catalog alongside 
      # the correct database
      TargetTable:
        DatabaseName: !Ref DatabaseName
        TableName: !Ref TableName
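
The `Rules` parameter holds a DQDL document of the form `Rules = [ rule, rule, ... ]`. If you keep your rules in a list, a small helper can assemble that string before you pass it as the parameter value. The Python function below is a hypothetical sketch; it only joins rule strings and does not validate DQDL syntax.

```python
def build_dqdl_ruleset(rules: list[str]) -> str:
    """Join DQDL rule strings into 'Rules = [ ... ]' (hypothetical helper)."""
    if not rules:
        raise ValueError("a ruleset needs at least one rule")
    body = ",\n    ".join(rules)
    return f"Rules = [\n    {body}\n]"


# The same three rules as the template's default Rules value
ruleset = build_dqdl_ruleset([
    'RowCount > 100',
    'IsUnique "id"',
    'IsComplete "nametype"',
])
print(ruleset)
```

In the template's single-quoted YAML default, the line breaks are folded into spaces, so the parameter value arrives as a single line.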

Sample AWS CloudFormation template for an AWS Glue Data Quality ruleset with EventBridge Scheduler

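An AWS Glue Data Quality ruleset contains rules that can be evaluated on a table in the Data Catalog. After the ruleset is placed on your target table, you can go to the Data Catalog and run an evaluation that checks your data against the rules in the ruleset. Instead of going to the Data Catalog manually to evaluate rulesets, you can also add an EventBridge Scheduler schedule to your CloudFormation template to run these ruleset evaluations at a regular interval.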

The following sample is a CloudFormation template that creates a Data Quality ruleset and an EventBridge Scheduler schedule that evaluates that ruleset every five minutes.


AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating a DataQualityRuleset
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names of the resources created in the Data Catalog
Parameters:
  # The name of the ruleset to be created
  RulesetName:  
    Type: String
    Default: "CFNRulesetName"
  # Rules that will be associated with this Ruleset
  Rules:  
    Type: String
    Default: 'Rules = [
        RowCount > 100,
        IsUnique "id",
        IsComplete "nametype"
        ]'
  # The name of the schedule to be created
  ScheduleName:
    Type: String
    Default: "ScheduleDQRulesetEvaluation"
  # This expression determines the rate at which the schedule will evaluate
  # your data using the above ruleset
  ScheduleRate:
    Type: String
    Default: "rate(5 minutes)"
  # The request that is sent must be valid JSON and must match the details of
  # the Data Quality ruleset (database, table, and ruleset name)
  ScheduleRequest:
    Type: String
    Default: '
        { "DataSource": { "GlueTable": { "DatabaseName": "ExampleDatabaseName",
         "TableName": "ExampleTableName" } },
         "Role": "role/AWSGlueServiceRoleDefault",
          "RulesetNames": [ "CFNRulesetName" ] }
        '

# Resources section defines metadata for the Data Catalog
Resources:
  # Creates a Data Quality ruleset under specified rules 
  DQRuleset:
    Type: AWS::Glue::DataQualityRuleset
    Properties:
      Name: !Ref RulesetName
      Description: "CFN DataQualityRuleset"
      # The String within rules must be formatted in DQDL, a language 
      # used specifically to make rules
      Ruleset: !Ref Rules
      # The targeted table must exist within Data Catalog alongside 
      # the correct database
      TargetTable:
        DatabaseName: "ExampleDatabaseName"
        TableName: "ExampleTableName"
  # Create a Scheduler to schedule evaluation runs on the above ruleset
  ScheduleDQEval:
    Type: AWS::Scheduler::Schedule
    Properties: 
      Name: !Ref ScheduleName
      Description: "Schedule DataQualityRuleset Evaluations"
      FlexibleTimeWindow: 
        Mode: "OFF"
      ScheduleExpression: !Ref ScheduleRate
      ScheduleExpressionTimezone: "America/New_York"
      State: "ENABLED"
      Target: 
        # The ARN identifies the API to call; this AWS SDK universal target runs
        # StartDataQualityRulesetEvaluationRun to evaluate the ruleset
        Arn: "arn:aws:scheduler:::aws-sdk:glue:startDataQualityRulesetEvaluationRun"
        # EventBridge Scheduler assumes this role, so it must trust scheduler.amazonaws.com
        # and be allowed to call glue:StartDataQualityRulesetEvaluationRun
        RoleArn: "arn:aws:iam::123456789012:role/AWSGlueServiceRoleDefault"
        # The request sent to the API; it must name the same database, table,
        # and ruleset as the DQRuleset resource above
        Input: !Sub '
        { "DataSource": { "GlueTable": { "DatabaseName": "ExampleDatabaseName",
         "TableName": "ExampleTableName" } },
         "Role": "role/AWSGlueServiceRoleDefault",
          "RulesetNames": [ "${RulesetName}" ] }
        '
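
The scheduler target's `Input` must be a valid JSON request for `StartDataQualityRulesetEvaluationRun`, and its database, table, and ruleset names must match the `DQRuleset` resource. Building the payload with a JSON serializer, rather than by hand, avoids quoting mistakes such as doubled quotes around a ruleset name. The following Python sketch uses the standard `json` module; the helper name is hypothetical, and the field values are copied from the template.

```python
import json


def build_evaluation_request(database: str, table: str, role: str,
                             ruleset_names: list[str]) -> str:
    """Serialize a StartDataQualityRulesetEvaluationRun request body (hypothetical helper)."""
    request = {
        "DataSource": {"GlueTable": {"DatabaseName": database, "TableName": table}},
        "Role": role,
        "RulesetNames": ruleset_names,
    }
    return json.dumps(request)


payload = build_evaluation_request(
    "ExampleDatabaseName", "ExampleTableName",
    "role/AWSGlueServiceRoleDefault", ["CFNRulesetName"],
)
print(payload)

# A hand-written value with doubled quotes around the name is not valid JSON
broken = '{ "RulesetNames": [ ""CFNRulesetName"" ] }'
try:
    json.loads(broken)
except json.JSONDecodeError as err:
    print("invalid JSON:", err.msg)
```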

Sample AWS CloudFormation template for an AWS Glue development endpoint

An AWS Glue development endpoint is an environment that you can use to develop and test your AWS Glue scripts.

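This sample creates a development endpoint with the minimal network parameter values required to create it successfully. For more information about the parameters that you need to set up a development endpoint, see Setting up networking for development for AWS Glue.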

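You provide an existing IAM role ARN (Amazon Resource Name) to create the development endpoint. If you plan to create a notebook server on the development endpoint, supply a valid RSA public key and keep the corresponding private key available.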

Note

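You manage any notebook server that you create and associate with a development endpoint. Therefore, if you delete the development endpoint, you must delete the AWS CloudFormation stack on the AWS CloudFormation console to delete the notebook server.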



---
AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating a development endpoint
#
# Parameters section contains names that are substituted in the Resources section
# These parameters are the names of the resources created in the Data Catalog
Parameters:
# The name of the development endpoint to be created
  CFNEndpointName:  
    Type: String
    Default: cfn-devendpoint-1
  CFNIAMRoleArn:
    Type: String
    Default: arn:aws:iam::123456789012:role/AWSGlueServiceRoleGA
#
#
# Resources section defines metadata for the Data Catalog
Resources:
  CFNDevEndpoint:
    Type: AWS::Glue::DevEndpoint
    Properties:
      EndpointName: !Ref CFNEndpointName
      #ExtraJarsS3Path: String
      #ExtraPythonLibsS3Path: String
      NumberOfNodes: 5
      PublicKey: ssh-rsa public.....key myuserid-key
      RoleArn: !Ref CFNIAMRoleArn
      SecurityGroupIds: 
        - sg-64986c0b
      SubnetId: subnet-c67cccac
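
The `PublicKey` value in the template is a placeholder. An OpenSSH RSA public key line has the form `ssh-rsa <base64 blob> [comment]`, where the decoded blob begins with a 4-byte big-endian length followed by the key type string `ssh-rsa`. The Python sketch below checks only that structure before you paste a key into the template; the function name is hypothetical, it does not verify the RSA parameters themselves, and you would normally generate the key pair with a tool such as `ssh-keygen -t rsa`.

```python
import base64
import struct


def check_ssh_rsa_public_key(line: str) -> str:
    """Return the comment of an 'ssh-rsa' public key line, or raise ValueError."""
    parts = line.strip().split()
    if len(parts) < 2 or parts[0] != "ssh-rsa":
        raise ValueError("expected 'ssh-rsa <base64-key> [comment]'")
    try:
        blob = base64.b64decode(parts[1], validate=True)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise ValueError("key body is not valid base64") from exc
    if len(blob) < 4:
        raise ValueError("key body is too short")
    # The blob starts with a length-prefixed key type string (uint32, big-endian)
    (type_len,) = struct.unpack(">I", blob[:4])
    if blob[4:4 + type_len] != b"ssh-rsa":
        raise ValueError("embedded key type is not ssh-rsa")
    return " ".join(parts[2:])


# A structurally valid (but not cryptographically meaningful) example blob
fake_blob = struct.pack(">I", 7) + b"ssh-rsa" + struct.pack(">I", 1) + b"\x01"
line = "ssh-rsa " + base64.b64encode(fake_blob).decode() + " myuserid-key"
print(check_ssh_rsa_public_key(line))  # myuserid-key

# The template's placeholder is rejected because it is not base64
try:
    check_ssh_rsa_public_key("ssh-rsa public.....key myuserid-key")
except ValueError as err:
    print("rejected:", err)
```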