

# AWS Glue PySpark transforms reference
<a name="aws-glue-programming-python-transforms"></a>

AWS Glue provides the following built-in transforms that you can use in PySpark ETL operations. Your data passes from transform to transform in a data structure called a *DynamicFrame*, which is an extension of the Apache Spark SQL `DataFrame`. The `DynamicFrame` contains your data, and you reference its schema to process your data.

Most of these transforms also exist as methods of the `DynamicFrame` class. For more information, see [DynamicFrame transforms](aws-glue-api-crawler-pyspark-extensions-dynamic-frame.md#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-_transforms).
+ [GlueTransform base class](aws-glue-api-crawler-pyspark-transforms-GlueTransform.md)
+ [ApplyMapping class](aws-glue-api-crawler-pyspark-transforms-ApplyMapping.md)
+ [DropFields class](aws-glue-api-crawler-pyspark-transforms-DropFields.md)
+ [DropNullFields class](aws-glue-api-crawler-pyspark-transforms-DropNullFields.md)
+ [ErrorsAsDynamicFrame class](aws-glue-api-crawler-pyspark-transforms-ErrorsAsDynamicFrame.md)
+ [EvaluateDataQuality class](aws-glue-api-crawler-pyspark-transforms-EvaluateDataQuality.md)
+ [FillMissingValues class](aws-glue-api-crawler-pyspark-transforms-fillmissingvalues.md)
+ [Filter class](aws-glue-api-crawler-pyspark-transforms-filter.md)
+ [FindIncrementalMatches class](aws-glue-api-crawler-pyspark-transforms-findincrementalmatches.md)
+ [FindMatches class](aws-glue-api-crawler-pyspark-transforms-findmatches.md)
+ [FlatMap class](aws-glue-api-crawler-pyspark-transforms-flat-map.md)
+ [Join class](aws-glue-api-crawler-pyspark-transforms-join.md)
+ [Map class](aws-glue-api-crawler-pyspark-transforms-map.md)
+ [MapToCollection class](aws-glue-api-crawler-pyspark-transforms-MapToCollection.md)
+ [mergeDynamicFrame](aws-glue-api-crawler-pyspark-extensions-dynamic-frame.md#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-merge)
+ [Relationalize class](aws-glue-api-crawler-pyspark-transforms-Relationalize.md)
+ [RenameField class](aws-glue-api-crawler-pyspark-transforms-RenameField.md)
+ [ResolveChoice class](aws-glue-api-crawler-pyspark-transforms-ResolveChoice.md)
+ [SelectFields class](aws-glue-api-crawler-pyspark-transforms-SelectFields.md)
+ [SelectFromCollection class](aws-glue-api-crawler-pyspark-transforms-SelectFromCollection.md)
+ [Simplify\_ddb\_json class](aws-glue-api-crawler-pyspark-transforms-simplify-ddb-json.md)
+ [Spigot class](aws-glue-api-crawler-pyspark-transforms-spigot.md)
+ [SplitFields class](aws-glue-api-crawler-pyspark-transforms-SplitFields.md)
+ [SplitRows class](aws-glue-api-crawler-pyspark-transforms-SplitRows.md)
+ [Unbox class](aws-glue-api-crawler-pyspark-transforms-Unbox.md)
+ [UnnestFrame class](aws-glue-api-crawler-pyspark-transforms-UnnestFrame.md)

## Data integration transforms
<a name="aws-glue-programming-python-di-transforms"></a>

For AWS Glue 4.0 and later, create or update the job arguments with `key: --enable-glue-di-transforms, value: true`.
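
As an illustration, the argument above can be set at job-creation time with the AWS CLI. This is a hedged sketch only: the job name, role, and script location are placeholders, not values from this document.

```
# Sketch: create a Glue 4.0 job with DI transforms enabled.
# "my-di-job", "MyGlueRole", and the S3 path are hypothetical placeholders.
aws glue create-job \
    --name my-di-job \
    --role MyGlueRole \
    --glue-version 4.0 \
    --command Name=glueetl,ScriptLocation=s3://amzn-s3-demo-bucket/script.py \
    --default-arguments '{"--enable-glue-di-transforms": "true"}'
```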

Example job script:

```
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

from awsgluedi.transforms import *

sc = SparkContext()
# Create a SparkSession so that `spark` is defined for createDataFrame.
spark = SparkSession(sc)

input_df = spark.createDataFrame(
    [(5,), (0,), (-1,), (2,), (None,)],
    ["source_column"],
)

try:
    df_output = math_functions.IsEven.apply(
        data_frame=input_df,
        spark_context=sc,
        source_column="source_column",
        target_column="target_column",
        value=None,
        true_string="Even",
        false_string="Not even",
    )
    df_output.show()
except Exception:
    print("Unexpected error happened")
    raise
```

Example session using a notebook:

```
%idle_timeout 2880
%glue_version 4.0
%worker_type G.1X
%number_of_workers 5
%region eu-west-1
```

```
%%configure
{
    "--enable-glue-di-transforms": "true"
}
```

```
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

from awsgluedi.transforms import *

sc = SparkContext()
# Create a SparkSession so that `spark` is defined for createDataFrame.
spark = SparkSession(sc)

input_df = spark.createDataFrame(
    [(5,), (0,), (-1,), (2,), (None,)],
    ["source_column"],
)

try:
    df_output = math_functions.IsEven.apply(
        data_frame=input_df,
        spark_context=sc,
        source_column="source_column",
        target_column="target_column",
        value=None,
        true_string="Even",
        false_string="Not even",
    )
    df_output.show()
except Exception:
    print("Unexpected error happened")
    raise
```

Example session using the AWS CLI:

```
aws glue create-session --default-arguments "--enable-glue-di-transforms=true"
```

DI transforms:
+ [FlagDuplicatesInColumn class](aws-glue-api-pyspark-transforms-FlagDuplicatesInColumn.md)
+ [FormatPhoneNumber class](aws-glue-api-pyspark-transforms-FormatPhoneNumber.md)
+ [FormatCase class](aws-glue-api-pyspark-transforms-FormatCase.md)
+ [FillWithMode class](aws-glue-api-pyspark-transforms-FillWithMode.md)
+ [FlagDuplicateRows class](aws-glue-api-pyspark-transforms-FlagDuplicateRows.md)
+ [RemoveDuplicates class](aws-glue-api-pyspark-transforms-RemoveDuplicates.md)
+ [MonthName class](aws-glue-api-pyspark-transforms-MonthName.md)
+ [IsEven class](aws-glue-api-pyspark-transforms-IsEven.md)
+ [CryptographicHash class](aws-glue-api-pyspark-transforms-CryptographicHash.md)
+ [Decrypt class](aws-glue-api-pyspark-transforms-Decrypt.md)
+ [Encrypt class](aws-glue-api-pyspark-transforms-Encrypt.md)
+ [IntToIp class](aws-glue-api-pyspark-transforms-IntToIp.md)
+ [IpToInt class](aws-glue-api-pyspark-transforms-IpToInt.md)

### Maven: Bundle the plugin with your Spark applications
<a name="aws-glue-programming-python-di-transforms-maven"></a>

You can bundle the transforms dependency with your Spark applications and your Spark distribution (version 3.3) by adding the plugin dependency to your Maven `pom.xml` while developing your Spark applications locally.

```
<repositories>
   ...
    <repository>
        <id>aws-glue-etl-artifacts</id>
        <url>https://aws-glue-etl-artifacts.s3.amazonaws.com/release/</url>
    </repository>
</repositories>
...
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>AWSGlueTransforms</artifactId>
    <version>4.0.0</version>
</dependency>
```

Alternatively, you can download the binary directly from the AWS Glue Maven artifacts and include it in your Spark application as follows.

```
#!/bin/bash
sudo wget -v https://aws-glue-etl-artifacts.s3.amazonaws.com/release/com/amazonaws/AWSGlueTransforms/4.0.0/AWSGlueTransforms-4.0.0.jar -P /usr/lib/spark/jars/
```