Unified storage in Amazon SageMaker Unified Studio
Amazon SageMaker Unified Studio provides flexible file storage options to support your analytics, AI and ML workflows.
Amazon SageMaker Unified Studio brings together the functionality and tools from existing AWS analytics and AI/ML services into a single data and AI development environment. As you work with tools like JupyterLab, SQL Editor, Visual ETL Builder, or capabilities from Amazon Bedrock inside Amazon SageMaker Unified Studio, you create and manage files that represent your work.
S3 storage
Amazon Simple Storage Service (S3) storage is the default option for storage of project files in Amazon SageMaker Unified Studio.
With S3 storage, you can share files by dragging them between local and shared folders. The file explorer provides a consistent interface across all tools and displays both local and shared directories in a single view. You can create, edit, delete, upload, and download files directly through the interface, and optional auto-save helps prevent data loss.
S3 storage provides basic file versioning when your administrator enables it. This option is available in all AWS Regions where Amazon SageMaker Unified Studio is supported, which suits teams working across different geographic locations.
For more information on configuring S3 storage, see Configuring project storage options.
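As a rough illustration of the versioning point above, the following Python sketch checks whether versioning is turned on for an S3 bucket. It assumes you know the name of the bucket that backs your project and have permission to read its configuration; neither is stated on this page. The helper name versioning_enabled and the bucket name in the usage comment are hypothetical. The only AWS call used is the standard S3 GetBucketVersioning operation, as exposed by a boto3 S3 client.

```python
def versioning_enabled(s3_client, bucket: str) -> bool:
    """Return True if S3 reports versioning as enabled for the bucket.

    s3_client is expected to be a boto3 S3 client (or any object exposing
    get_bucket_versioning). The response carries no "Status" key when
    versioning has never been configured, so a missing key counts as disabled.
    """
    response = s3_client.get_bucket_versioning(Bucket=bucket)
    return response.get("Status") == "Enabled"


# Usage with real credentials (the bucket name is a placeholder, not a
# value taken from this documentation):
#   import boto3
#   versioning_enabled(boto3.client("s3"), "my-project-bucket")
```

A "Suspended" status is treated as disabled here, because new writes to a suspended bucket do not create new versions.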
Key benefits of S3 storage:
Simple file management
Easy file sharing with drag-and-drop between folders
Availability in all AWS Regions where Amazon SageMaker Unified Studio is supported
Git-based storage
For projects requiring advanced version control, Amazon SageMaker Unified Studio allows you to connect your project to a Git repository where all project members can access, store, and collaborate on files. This option provides full version control capabilities including comprehensive commit history, branching, and merging.
When you choose Git-based storage, you'll need to specify a repository and branch during project creation. Once the project is created, you'll be able to see the files that were created during repository bootstrapping directly from the project's home page.
With Git-based storage, you have access to full Git semantics whether you're using space-based tools like JupyterLab or web-based tools like Query Editor. This provides a consistent experience for team members accustomed to working with Git.
Key benefits of Git-based storage:
Full version control with commit history, branching, and merging
Collaboration features like pull requests and code reviews
Cross-project sharing by allowing multiple projects to use the same repository
Integration with existing development workflows
How storage works in different tools
Amazon SageMaker Unified Studio provides a consistent storage experience across different tools while optimizing for each tool's specific requirements.
Web-based tools
When using web-based tools such as Query Editor and Visual ETL, you'll interact with files through a unified File Explorer interface. This explorer displays your shared directory and allows you to navigate and manage shared files seamlessly.
You can perform various file operations directly from the File Explorer:
Create, edit, and delete files and folders
Upload and download files to/from shared storage
Access version history (when available)
Edit files directly within the source
All web-based tools offer optional auto-save functionality, which can be enabled to automatically save your changes as you work. This feature helps prevent data loss if you navigate away from the page or experience connectivity issues.
Space-based tools
Space-based tools like JupyterLab and Code Editor provide access to two types of storage spaces to support both individual work and team collaboration.
Local storage (local folder)
Local storage is backed by dedicated Amazon EBS storage, which performs well for frequent file operations within your workspace. It serves as your personal workspace, and the files in it are private to your space.
Within your local storage, you can create and manage subfolders to organize your files effectively. This helps you maintain a structured workspace for different aspects of your work.
Files saved to local storage follow a "last write wins" principle: each save overwrites the previous contents, and no version history is kept.
Your local folder:
Includes this root folder and any subfolders (except shared)
Serves as your private workspace within each project
Allows you to work on files privately
Is ideal for frequent file access and modification
Is visible only in this space
Remains isolated from other project members, creating a secure environment for experimentation and development
Shared storage (shared folder)
Shared storage is backed by either Amazon S3 or a Git repository and is accessible from all Amazon SageMaker Unified Studio tools. Project members can create and manage subfolders within the shared storage to help organize artifacts effectively.
By default, all project members have read, write, update, and delete access to files within the shared storage. This central repository allows team members to access common resources, share completed work, and maintain project artifacts in a single location.
Shared storage operates on a "last write wins" principle, so you have to coordinate with team members when working on the same files to avoid overwriting each other's changes.
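The effect of "last write wins" can be sketched in a few lines of Python. This is only an illustration: a temporary directory stands in for the shared folder, and the helper names fingerprint and save_if_unchanged are hypothetical rather than part of Amazon SageMaker Unified Studio. The first half shows a plain overwrite discarding an earlier edit. The second half shows one possible courtesy check that refuses to save when the file has changed since it was read.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return a SHA-256 digest of the file's current contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def save_if_unchanged(path: Path, new_text: str, seen: str) -> bool:
    """Write new_text only if the file still matches the digest read earlier.

    This is a courtesy check, not a lock: another writer could still slip in
    between the comparison and the write.
    """
    if fingerprint(path) != seen:
        return False  # someone else changed the file; do not overwrite
    path.write_text(new_text)
    return True


# A temporary directory stands in for the shared folder (assumption).
shared = Path(tempfile.mkdtemp()) / "shared"
shared.mkdir()
notes = shared / "notes.txt"

# Plain writes: the second save silently replaces the first.
notes.write_text("v0")
notes.write_text("alice edit")
notes.write_text("bob edit")
plain_result = notes.read_text()  # Alice's edit is gone

# Guarded writes: both users read the same version, then save in turn.
notes.write_text("v0")
alice_seen = fingerprint(notes)
bob_seen = fingerprint(notes)
alice_ok = save_if_unchanged(notes, "alice edit", alice_seen)
bob_ok = save_if_unchanged(notes, "bob edit", bob_seen)  # stale, refused
guarded_result = notes.read_text()
```

A check like this only reduces accidental overwrites. It does not replace coordinating with team members or, where history matters, using Git-based storage.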
The shared folder (Git and non-Git):
Contains files visible to all project members
Functions as a collaborative workspace accessible to all project members
Is accessible across all your tools
Updates immediately when any member adds or modifies files
Operates on a "last write wins" mechanism, so team members should coordinate when working on the same files
Is not well suited to heavy read/write workloads, because the folder is backed by remote Amazon S3 storage and frequent Amazon S3 access can incur additional costs
Can lose changes if two people modify the same file at the same time
You can copy files between these locations as needed, allowing you to optimize your workflow based on performance requirements and collaboration needs. For example, copy files from shared storage to local storage for ML tasks requiring low latency.
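From a space-based tool such as JupyterLab, that copy step might look like the Python sketch below. The folder layout is an assumption: temporary directories stand in for the shared and local folders, so substitute the paths that your space actually shows. The helper name copy_into is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into(source_file: Path, target_dir: Path) -> Path:
    """Copy one file into target_dir and return the path of the copy."""
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps where the platform allows it.
    return Path(shutil.copy2(source_file, target_dir / source_file.name))


# Temporary directories stand in for the real folders (assumption).
root = Path(tempfile.mkdtemp())
shared_dir = root / "shared"
local_dir = root / "local" / "training-data"
shared_dir.mkdir(parents=True)

# A small dataset that a teammate placed in the shared folder.
dataset = shared_dir / "train.csv"
dataset.write_text("a,b\n1,2\n")

# Stage the dataset on local storage before a read-heavy job.
local_copy = copy_into(dataset, local_dir)

# Later, publish a finished artifact back to the shared folder.
report = local_dir / "metrics.txt"
report.write_text("accuracy=0.9\n")
published = copy_into(report, shared_dir)
```

Copying back to the shared folder overwrites any file of the same name there, so the "last write wins" caveat still applies.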