10/12/2022
New functionality:
You can now reuse data flows for different data sets. For more
information, see Reusing Data Flows for Different Datasets.
10/05/2022
New functionality:
You can now use Principal Component Analysis (PCA) as a transform. For more
information, see Reduce Dimensionality within a Dataset.
10/05/2022
New functionality:
You can now refit parameters in your Data Wrangler flow. For more
information, see Export.
10/03/2022
New functionality:
You can now deploy models from your Data Wrangler flow. For more
information, see Automatically Train Models on Your Data
Flow.
9/20/2022
New functionality:
You can now set data retention periods in Athena. For more
information, see Import data from Athena.
6/9/2022
New functionality:
You can now use Amazon SageMaker Autopilot to train a model directly from your Data Wrangler flow. For more
information, see Automatically Train Models on Your Data
Flow.
5/6/2022
New functionality:
You can now use additional m5 and r5 instances. For more
information, see Instances.
4/27/2022
New functionalities:
4/1/2022
New functionality:
You can now use Databricks as a data source. For more
information, see Import data from Databricks (JDBC).
2/2/2022
New functionalities:
-
You can now export using destination nodes. For more
information, see Export.
-
You can import ORC and JSON files. For more information about
file types, see Import.
-
Data Wrangler now supports using the SMOTE transform. For more
information, see Balance Data.
-
Data Wrangler now supports similarity encoding for categorical data.
For more information, see Similarity
encode.
-
Data Wrangler now supports unnesting JSON data. For more information,
see Unnest JSON Data.
-
Data Wrangler now supports expanding the values of an array into
separate columns. For more information, see Explode Array.
-
Data Wrangler now supports reaching out to the service team when you're
having issues. For more information, see Troubleshoot.
-
Data Wrangler supports editing and deleting steps in your data flow.
For more information, see Delete a Step from Your Data
Flow and
Edit a Step in Your Data Wrangler
Flow.
-
You can now perform transformations on multiple columns. For
more information, see Transform Data.
-
Data Wrangler now supports cost allocation tags. For more information,
see Using Cost Allocation Tags.
10/16/2021
New functionality:
Data Wrangler now supports Athena workgroups. For more information, see Import data from Athena.
10/6/2021
New functionality:
Data Wrangler now supports transforming time series data. For more information,
see Transform Time Series.
7/15/2021
New functionalities:
-
You can now use Snowflake as a data source in
Data Wrangler.
-
Added support for custom field delimiters in CSV files. Comma,
colon, semicolon, pipe (|), and tab are now supported.
-
You can now export results directly to Amazon S3.
-
Added three multicollinearity analyzers: Variance
Inflation Factors, Principal Component Analysis, and Lasso
feature selection.
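The custom field delimiter support listed above can be illustrated with a minimal sketch using Python's standard csv module. This is not Data Wrangler's own parser. The names SUPPORTED_DELIMITERS and parse_delimited are hypothetical and exist only for this example.

```python
import csv
import io

# Hypothetical mapping of the delimiters named in the release note
# to the characters they stand for (an assumption for illustration).
SUPPORTED_DELIMITERS = {
    "comma": ",",
    "colon": ":",
    "semicolon": ";",
    "pipe": "|",
    "tab": "\t",
}


def parse_delimited(text, delimiter_name):
    """Parse delimited text into a list of rows using the named delimiter."""
    delimiter = SUPPORTED_DELIMITERS[delimiter_name]
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


rows = parse_delimited("id|name\n1|alice\n2|bob\n", "pipe")
print(rows)
```

The same text parsed with the wrong delimiter would come back as one field per row, which is why the delimiter has to match the file.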
Enhancements:
Bug Fixes:
4/26/2021
Enhancements:
-
Added support for distributed processing jobs. You can use
multiple instances when running a processing job.
-
Data Wrangler processing jobs now automatically coalesce small outputs
when the estimated result size is less than 1 gigabyte.
-
Feature Store Notebook: Improved Feature Store ingestion
performance.
-
Data Wrangler Processing jobs now use 1.x as the authoritative
container tag for future releases.
Bug Fixes:
-
Fixed rendering issues for faceted histograms.
-
Fixed Export to Processing Job to support
vector type columns.
-
Fixed the Extract using regex operator to return the
first captured group if the regular expression contains one or
more capture groups.
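The behavior described in the Extract using regex fix above can be sketched with Python's re module. This is an illustration only, not the operator's actual implementation. The function name extract_first_group is hypothetical, and the fallback to the whole match when the pattern has no capture groups is an assumption that the release note does not state.

```python
import re


def extract_first_group(pattern, text):
    """Return the first captured group when the pattern defines one.

    Assumption: when the pattern has no capture groups, fall back to the
    whole match. Returns None when the pattern does not match at all.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    # match.re.groups is the number of capture groups in the pattern.
    if match.re.groups >= 1:
        return match.group(1)
    return match.group(0)


print(extract_first_group(r"(\d+)-(\d+)", "order 12-34"))  # pattern with two groups
print(extract_first_group(r"\d+-\d+", "order 12-34"))      # pattern with no groups
```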
2/8/2021
New functionalities:
-
Data Wrangler flows now support multiple instances.
-
Updated Export to Data Wrangler Job Notebook to use SageMaker SDK
2.20.0.
-
Updated Export to Pipeline Notebook to use SageMaker SDK
2.20.0.
-
Updated Export to Pipeline Notebook to add XGBoost training
example as an optional step.
Enhancements:
Bug Fixes:
-
Fixed type inference issue in Quick model.
-
Fixed the bias metric bug in bias reports.
-
Fixed the Featurize text transform to work with columns with
missing values.
-
Fixed Histogram and Scatter plot built-in visualizations to
work with datasets that contain array-like columns.
-
An Athena query now reruns if its query execution ID has
expired.