Use a Delta Lake cluster with Spark and AWS Glue

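To use the AWS Glue Catalog as the metastore for Delta Lake tables, create a cluster with the following steps. For information about specifying the Delta Lake classification using the AWS Command Line Interface, see Supply a configuration using the AWS Command Line Interface when you create a cluster, or Supply a configuration using the Java SDK when you create a cluster.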

Create a Delta Lake cluster

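Create a file named delta_configurations.json (the name that the create-cluster command passes to --configurations) with the following content: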



[
  {
    "Classification": "delta-defaults",
    "Properties": {
      "delta.enabled": "true"
    }
  },
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]

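As a minimal sketch of the step above, the file can be written from the shell and checked for JSON syntax errors before it is handed to the cluster creation command. The file name delta_configurations.json is taken from the --configurations argument of the create-cluster command; the use of python3 for validation is an assumption about the local environment and is not part of the original procedure.

```shell
# Write the two classifications to the file that the
# create-cluster command reads via --configurations.
cat > delta_configurations.json <<'EOF'
[
  {
    "Classification": "delta-defaults",
    "Properties": {"delta.enabled": "true"}
  },
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]
EOF

# Assumption: python3 is available on PATH.
# json.tool exits with a non-zero status if the file is not valid JSON.
python3 -m json.tool delta_configurations.json > /dev/null \
  && echo "delta_configurations.json is valid JSON"
```

This check only catches syntax errors such as a missing brace or comma. It does not confirm that the classification names or property keys are accepted by Amazon EMR.
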
Create a cluster with the following configuration, replacing the example Amazon S3 bucket path and subnet ID with your own values.



aws emr create-cluster \
    --release-label emr-6.9.0 \
    --applications Name=Spark \
    --configurations file://delta_configurations.json \
    --region us-east-1 \
    --name My_Spark_Delta_Cluster \
    --log-uri s3://amzn-s3-demo-bucket/ \
    --instance-type m5.xlarge \
    --instance-count 2 \
    --service-role EMR_DefaultRole_V2 \
    --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,SubnetId=subnet-1234567890abcdef0
