Release Notes - Amazon SageMaker

Release Notes

Data Wrangler is regularly updated with new features and bug fixes. To upgrade the version of Data Wrangler you are using in Studio, follow the instructions in Shut down and Update Studio Apps.

10/12/2022

New functionality:

You can now reuse data flows for different datasets. For more information, see Reusing Data Flows for Different Datasets.

10/05/2022

New functionality:

You can now use Principal Component Analysis (PCA) as a transform. For more information, see Reduce Dimensionality within a Dataset.

10/05/2022

New functionality:

You can now refit parameters in your Data Wrangler flow. For more information, see Export.

10/03/2022

New functionality:

You can now deploy models from your Data Wrangler flow. For more information, see Automatically Train Models on Your Data Flow.

9/20/2022

New functionality:

You can now set data retention periods in Athena. For more information, see Import data from Athena.

6/9/2022

New functionality:

You can now use Amazon SageMaker Autopilot to train a model directly from your Data Wrangler flow. For more information, see Automatically Train Models on Your Data Flow.

5/6/2022

New functionality:

You can now use additional m5 and r5 instances. For more information, see Instances.

4/27/2022

New functionalities:

4/1/2022

New functionality:

You can now use Databricks as a data source. For more information, see Import data from Databricks (JDBC).

2/2/2022

New functionalities:

  • You can now export using destination nodes. For more information, see Export.

  • You can import ORC and JSON files. For more information about file types, see Import.

  • Data Wrangler now supports using the SMOTE transform. For more information, see Balance Data.

  • Data Wrangler now supports similarity encoding for categorical data. For more information, see Similarity encode.

  • Data Wrangler now supports unnesting JSON data. For more information, see Unnest JSON Data.

  • Data Wrangler now supports expanding the values of an array into separate columns. For more information, see Explode Array.

  • Data Wrangler now supports reaching out to the service team when you're having issues. For more information, see Troubleshoot.

  • Data Wrangler now supports editing and deleting steps in your data flow. For more information, see Delete a Step from Your Data Flow and Edit a Step in Your Data Wrangler Flow.

  • You can now perform transformations on multiple columns. For more information, see Transform Data.

  • Data Wrangler now supports cost allocation tags. For more information, see Using Cost Allocation Tags.
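To illustrate what unnesting JSON data and expanding an array into separate columns mean in practice, here is a minimal sketch in plain Python. It is not Data Wrangler's implementation: the function names `unnest_json` and `explode_array`, the underscore separator, and the `column_index` naming of the generated columns are all assumptions made for this example.

```python
from typing import Any, Dict


def unnest_json(record: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dictionaries into a single level of columns.

    Hypothetical helper: the separator and naming scheme are assumptions.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing keys with the parent name.
            flat.update(unnest_json(value, name, sep))
        else:
            flat[name] = value
    return flat


def explode_array(row: Dict[str, Any], column: str) -> Dict[str, Any]:
    """Replace an array-valued column with one column per element.

    Hypothetical helper: output columns are named <column>_<index>.
    """
    values = row.get(column) or []
    out = {k: v for k, v in row.items() if k != column}
    for index, value in enumerate(values):
        out[f"{column}_{index}"] = value
    return out


print(unnest_json({"id": 1, "user": {"name": "a", "geo": {"lat": 2}}}))
print(explode_array({"id": 1, "tags": ["x", "y"]}, "tags"))
```

The two helpers are independent, so they can be chained when a nested record also contains array values.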

10/16/2021

New functionality:

Data Wrangler now supports Athena workgroups. For more information, see Import data from Athena.

10/6/2021

New functionality:

Data Wrangler now supports transforming time series data. For more information, see Transform Time Series.

7/15/2021

New functionalities:

  • Data Wrangler now supports Snowflake. You can use Snowflake as a data source in Data Wrangler.

  • Added support for custom field delimiters in CSV files. Comma, colon, semicolon, pipe (|), and tab are now supported.

  • You can now export results directly to Amazon S3.

  • Added a few new multicollinearity analyzers: Variance Inflation Factors, Principal Component Analysis and Lasso feature selection.
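As an illustration of what the custom field delimiters listed above mean for a file, the following sketch parses delimited text with Python's standard `csv` module. It only demonstrates the concept and does not show how Data Wrangler reads files. The `SUPPORTED_DELIMITERS` mapping and the `parse_delimited` function are hypothetical names introduced for this example.

```python
import csv
import io
from typing import List

# The five delimiters named in the release note, keyed by a hypothetical label.
SUPPORTED_DELIMITERS = {
    "comma": ",",
    "colon": ":",
    "semicolon": ";",
    "pipe": "|",
    "tab": "\t",
}


def parse_delimited(text: str, delimiter_name: str) -> List[List[str]]:
    """Parse delimited text into rows using one of the listed delimiters."""
    if delimiter_name not in SUPPORTED_DELIMITERS:
        raise ValueError(f"Unsupported delimiter: {delimiter_name}")
    reader = csv.reader(io.StringIO(text), delimiter=SUPPORTED_DELIMITERS[delimiter_name])
    return [row for row in reader]


print(parse_delimited("name|age\nana|31\n", "pipe"))
print(parse_delimited("name\tage\nana\t31\n", "tab"))
```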

Enhancements:

  • Analysis charts are no longer packed with overlapping labels.

Bug Fixes:

  • The one-hot encoder now handles empty strings gracefully.

  • Fixed crashes that occurred when a dataframe column name contained dots.

4/26/2021

Enhancements:

  • Added support for distributed processing jobs. You can use multiple instances when running a processing job.

  • Data Wrangler processing jobs now automatically coalesce small outputs when the estimated result size is less than 1 gigabyte.

  • Feature Store Notebook: Improved Feature Store ingestion performance.

  • Data Wrangler Processing jobs now use 1.x as the authoritative container tag for future releases.

Bug Fixes:

  • Fixed rendering issues for faceted histograms.

  • Fixed Export to Processing Job to support vector type columns.

  • Fixed the Extract using regex operator to return the first captured group if the regular expression contains one or more capture groups.
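The capture-group behavior described in the last fix can be sketched with Python's standard `re` module. This is only an illustration, not the operator's actual code. The function name `extract_first_group` is hypothetical, and falling back to the whole match when the pattern defines no capture group is an assumption, since the release note does not say what happens in that case.

```python
import re
from typing import Optional


def extract_first_group(pattern: str, text: str) -> Optional[str]:
    """Return the first captured group if the pattern defines one.

    Assumption: when the pattern has no capture groups, return the whole match.
    Returns None when the pattern does not match at all.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    # groups() is an empty tuple when the pattern defines no capture groups.
    if match.groups():
        return match.group(1)
    return match.group(0)


print(extract_first_group(r"order-(\d+)-(\w+)", "id: order-123-abc"))  # 123
print(extract_first_group(r"order-\d+", "id: order-123-abc"))  # order-123
```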

2/8/2021

New functionalities:

  • Data Wrangler flows now support multiple instances.

  • Updated Export to Data Wrangler Job Notebook to use SageMaker SDK 2.20.0.

  • Updated Export to Pipeline Notebook to use SageMaker SDK 2.20.0.

  • Updated Export to Pipeline Notebook to add XGBoost training example as an optional step.

Enhancements:

  • To improve performance, importing CSV files that contain multiple lines in a single field is no longer supported.

Bug Fixes:

  • Fixed a type inference issue in Quick model.

  • Fixed the bias metric bug in bias reports.

  • Fixed the Featurize text transform to work with columns with missing values.

  • Fixed Histogram and Scatter plot built-in visualizations to work with datasets that contain array-like columns.

  • Athena queries now re-run if the query execution ID has expired.