MLCOST-17: Start training with small datasets - Machine Learning Lens

MLCOST-17: Start training with small datasets

Start experimentation with smaller datasets on a small compute instance or local system. This approach allows you to iterate quickly at low cost. After the experimentation period, scale up to train on the full dataset using a separate compute cluster. Choose the appropriate storage layer for training data based on the performance requirements.

Implementation plan

  • Use SageMaker notebooks - Notebooks are a popular way to explore and experiment with small samples of data. A common machine learning workflow is to iterate locally on a small sample of the dataset, then scale out to distributed training on the full dataset. Amazon SageMaker notebook instances provide a hosted Jupyter environment for this kind of small-sample exploration.
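One way to produce the small sample described above is to stream the full dataset once and keep a random fraction of its rows. The following is a minimal sketch using only the Python standard library, assuming the training data is a CSV file with a header row. The function name `sample_csv`, the file paths, and the default fraction are hypothetical and chosen for illustration; they are not part of any SageMaker API.

```python
import csv
import random


def sample_csv(src_path, dst_path, fraction=0.05, seed=42):
    """Copy roughly `fraction` of the data rows in src_path to dst_path.

    The header row is always kept. Returns the number of data rows written.
    All names and defaults here are illustrative assumptions.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        header = next(reader, None)
        if header is None:
            return 0  # empty input file, nothing to sample
        writer.writerow(header)

        # Stream row by row so the full dataset never has to fit in memory.
        for row in reader:
            # Keep each row independently with probability `fraction`.
            if rng.random() < fraction:
                writer.writerow(row)
                kept += 1
    return kept
```

For example, `sample_csv("full.csv", "sample.csv", fraction=0.05)` would write about 5% of the rows to `sample.csv`, which can then be explored in a notebook while the full file is reserved for the later scaled-up training run. Because each row is kept independently, the sample size is approximate rather than exact.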
