Release Notes - Amazon SageMaker

Release Notes

Data Wrangler is regularly updated with new features and bug fixes. To upgrade the version of Data Wrangler you are using in Studio Classic, follow the instructions in Shut down and Update Studio Classic Apps.

8/31/2023

New functionality:

You can now create a Data Quality and Insights report on your entire dataset. For more information, see Get Insights On Data and Data Quality.

5/20/2023

New functionality:

You can now import your data from Salesforce Data Cloud. For more information, see Import data from Salesforce Data Cloud.

4/18/2023

New functionality:

You can now get your data in a format that Amazon Personalize can interpret. For more information, see Map Columns for Amazon Personalize.

3/1/2023

New functionality:

You can now use Hive to import your data from Amazon EMR. For more information, see Import data from Amazon EMR.

12/10/2022

New functionality:

You can now export your Data Wrangler flow to an inference endpoint. For more information, see Export to an Inference Endpoint.

New functionality:

You can now use an interactive notebook widget for data preparation. For more information, see Use an Interactive Data Preparation Widget in an Amazon SageMaker Studio Classic Notebook to Get Data Insights.

New functionality:

You can now import data from SaaS platforms. For more information, see Import Data From Software as a Service (SaaS) Platforms.

10/12/2022

New functionality:

You can now reuse data flows for different datasets. For more information, see Reusing Data Flows for Different Datasets.

10/05/2022

New functionality:

You can now use Principal Component Analysis (PCA) as a transform. For more information, see Reduce Dimensionality within a Dataset.

10/05/2022

New functionality:

You can now refit parameters in your Data Wrangler flow. For more information, see Export.

10/03/2022

New functionality:

You can now deploy models from your Data Wrangler flow. For more information, see Automatically Train Models on Your Data Flow.

9/20/2022

New functionality:

You can now set data retention periods in Athena. For more information, see Import data from Athena.

6/9/2022

New functionality:

You can now use Amazon SageMaker Autopilot to train a model directly from your Data Wrangler flow. For more information, see Automatically Train Models on Your Data Flow.

5/6/2022

New functionality:

You can now use additional m5 and r5 instances. For more information, see Instances.

4/27/2022

New functionalities:

4/1/2022

New functionality:

You can now use Databricks as a data source. For more information, see Import data from Databricks (JDBC).

2/2/2022

New functionalities:

  • You can now export using destination nodes. For more information, see Export.

  • You can import ORC and JSON files. For more information about file types, see Import.

  • Data Wrangler now supports using the SMOTE transform. For more information, see Balance Data.

  • Data Wrangler now supports similarity encoding for categorical data. For more information, see Similarity encode.

  • Data Wrangler now supports unnesting JSON data. For more information, see Unnest JSON Data.

  • Data Wrangler now supports expanding the values of an array into separate columns. For more information, see Explode Array.

  • Data Wrangler now supports reaching out to the service team when you're having issues. For more information, see Troubleshoot.

  • Data Wrangler now supports editing and deleting steps in your data flow. For more information, see Delete a Step from Your Data Flow and Edit a Step in Your Data Wrangler Flow.

  • You can now perform transformations on multiple columns. For more information, see Transform Data.

  • Data Wrangler now supports cost allocation tags. For more information, see Using Cost Allocation Tags.

10/16/2021

New functionality:

Data Wrangler now supports Athena workgroups. For more information, see Import data from Athena.

10/6/2021

New functionality:

Data Wrangler now supports transforming time series data. For more information, see Transform Time Series.

7/15/2021

New functionalities:

  • Data Wrangler now supports Snowflake. You can use Snowflake as a data source in Data Wrangler.

  • Added support for custom field delimiters in CSV files. Comma, colon, semicolon, pipe (|), and tab are now supported.

  • You can now export results directly to Amazon S3.

  • Added new multicollinearity analyzers: Variance Inflation Factors, Principal Component Analysis, and Lasso feature selection.
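
To illustrate what the custom field delimiter option means for a file's contents, here is a minimal local sketch using Python's standard csv module. It is not Data Wrangler's implementation; the SUPPORTED_DELIMITERS mapping and the read_rows helper are hypothetical names chosen for this example, and only the five delimiters themselves come from the note above.

```python
import csv
import io

# The five delimiters named in the release note. The mapping and its key
# names are illustrative assumptions, not Data Wrangler identifiers.
SUPPORTED_DELIMITERS = {
    "comma": ",",
    "colon": ":",
    "semicolon": ";",
    "pipe": "|",
    "tab": "\t",
}


def read_rows(text, delimiter_name):
    """Parse delimited text into a list of rows using the named delimiter."""
    delimiter = SUPPORTED_DELIMITERS[delimiter_name]
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# A small pipe-delimited sample, parsed locally before any import.
sample = "id|name|score\n1|alice|0.9\n2|bob|0.7\n"
rows = read_rows(sample, "pipe")
```

Parsing a few rows locally like this can be a quick way to confirm which delimiter a file actually uses before importing it.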

Enhancements:

  • Analysis charts are no longer crowded with overlapping labels.

Bug Fixes:

  • The one-hot encoder now handles empty strings gracefully.

  • Fixed crashes that occurred when a dataframe column name contained dots.

4/26/2021

Enhancements:

  • Added support for distributed processing jobs. You can use multiple instances when running a processing job.

  • Data Wrangler processing jobs now automatically coalesce small outputs when the estimated result size is less than 1 gigabyte.

  • Feature Store Notebook: Improved Feature Store ingestion performance.

  • Data Wrangler Processing jobs now use 1.x as the authoritative container tag for future releases.

Bug Fixes:

  • Fixed rendering issues for faceted histogram.

  • Fixed Export to Processing Job to support vector type columns.

  • Fixed the Extract using regex operator to return the first captured group if the regular expression contains one or more capture groups.
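
The first-captured-group behavior described in the last fix can be sketched with Python's re module. This is an illustration of the rule only, not the operator's actual code; extract_first_group is a hypothetical helper, and returning the whole match when the pattern has no capture groups is an assumption, since the note does not say what happens in that case.

```python
import re


def extract_first_group(pattern, text):
    """Return the first captured group of the first match, if any.

    Assumption: when the pattern defines no capture groups, the whole
    match is returned. The release note does not specify this case.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    if match.re.groups >= 1:
        # The pattern has at least one capture group: return the first.
        return match.group(1)
    return match.group(0)


with_groups = extract_first_group(r"[a-z]+ (\d+)-(\d+)", "order 123-456")
without_groups = extract_first_group(r"\d+-\d+", "order 123-456")
```

Here with_groups holds only the text matched by the first pair of parentheses, even though the pattern defines two groups, while without_groups falls back to the full match.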

2/8/2021

New functionalities:

  • Data Wrangler flows now support multiple instances.

  • Updated Export to Data Wrangler Job Notebook to use SageMaker SDK 2.20.0.

  • Updated Export to Pipeline Notebook to use SageMaker SDK 2.20.0.

  • Updated Export to Pipeline Notebook to add XGBoost training example as an optional step.

Enhancements:

  • To improve performance, importing CSV files that contain multiple lines in a single field is no longer supported.
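
Because fields that span multiple lines are no longer supported on import, it may help to check a file for them beforehand. The following is a minimal local sketch using Python's standard csv module, assuming the file uses standard double-quote quoting; rows_with_multiline_fields is a hypothetical helper and is not part of Data Wrangler.

```python
import csv
import io


def rows_with_multiline_fields(text, delimiter=","):
    """Return 1-based record numbers whose fields contain a line break."""
    # newline="" leaves line endings untouched so that the csv module can
    # see line breaks embedded inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    flagged = []
    for record_number, row in enumerate(reader, start=1):
        if any("\n" in field or "\r" in field for field in row):
            flagged.append(record_number)
    return flagged


# Record 1 is the header; record 2 has a quoted field with a line break.
sample = 'id,comment\n1,"line one\nline two"\n2,ok\n'
flagged = rows_with_multiline_fields(sample)
```

Record numbers count parsed records, including the header, so they can differ from physical line numbers in a file that contains multi-line fields.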

Bug Fixes:

  • Fixed type inference issue in Quick model.

  • Fixed the bias metric bug in bias reports.

  • Fixed the Featurize text transform to work with columns with missing values.

  • Fixed Histogram and Scatter plot built-in visualizations to work with datasets that contain array-like columns.

  • Athena queries now rerun if the query execution ID has expired.