Appendix A: Using Your Own Data - Machine Learning for Telecommunication

This solution includes a synthetic dataset for model training and is intended for data scientists who are new to machine learning (ML). (We thank Ribbon Communications for providing the synthetic dataset. The data was produced by test data generators and contains no customer or sensitive data.) If you choose to run the Jupyter notebooks against your own datasets, we recommend following the AWS best practices for uploading data into Amazon S3 so that your data is uploaded quickly and securely.

Use the following steps to use your own datasets in the Jupyter notebooks:

  1. Update the Source Bucket and Source Prefix parameters in the AWS CloudFormation template to point to the location of your data in Amazon S3.

  2. Transferring the synthetic data is optional. Set the Synthetic Data parameter to No if the demo data transfer is not required.

  3. Select a notebook to run.

  4. Set the value of the bucket_name variable to the Amazon S3 folder location of your Parquet files.

  5. Choose Cell, then select Run all.
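
Before running a notebook, it can help to check that the value given to bucket_name in step 4 is a well-formed Amazon S3 location. The following is a minimal sketch, not code taken from the notebooks: the helper split_s3_location and the example bucket are hypothetical, and it assumes the location is written in s3://bucket/prefix form, which may differ from the form a given notebook expects.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_location(location: str) -> Tuple[str, str]:
    """Split an 's3://bucket/prefix' string into (bucket, prefix).

    Hypothetical helper for illustration; it is not part of the solution.
    """
    parsed = urlparse(location)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 location: {location!r}")
    # Normalize the prefix: no leading slash, exactly one trailing slash.
    prefix = parsed.path.strip("/")
    return parsed.netloc, f"{prefix}/" if prefix else ""


# Hypothetical value; replace it with the location of your own Parquet files.
bucket_name = "s3://my-example-bucket/telecom/parquet/"
bucket, prefix = split_s3_location(bucket_name)
print(bucket, prefix)
```

A malformed value, such as an https:// URL or a string with no bucket, raises ValueError immediately instead of failing later when the notebook tries to read the data.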


AWS Glue charges by the amount of data that is scanned and processed. Customers who have large datasets can reduce costs and improve performance by partitioning or compressing the data, or by converting it into a columnar format such as Apache Parquet.
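
As a rough illustration of partitioning and compression, the sketch below builds a Hive-style partitioned object key (year=/month=/day=) and gzip-compresses a block of repetitive sample rows using only the Python standard library. The function partitioned_key, the key layout, the file name, and the sample rows are all assumptions made for this example and are not taken from the solution. Converting data to Parquet requires an additional library and is not shown.

```python
import gzip
from datetime import date


def partitioned_key(prefix: str, day: date, filename: str) -> str:
    """Build a Hive-style partitioned object key under the given prefix.

    Hypothetical layout: <prefix>/year=YYYY/month=MM/day=DD/<filename>
    """
    return (
        f"{prefix.strip('/')}/year={day.year:04d}/month={day.month:02d}/"
        f"day={day.day:02d}/{filename}"
    )


# Made-up, repetitive CSV rows used only to show the effect of compression.
rows = "".join(f"2020-03-07,call,{i % 5},OK\n" for i in range(10_000)).encode()
compressed = gzip.compress(rows)

print(partitioned_key("telecom/cdr", date(2020, 3, 7), "part-0000.csv.gz"))
print(f"raw bytes: {len(rows)}, gzip bytes: {len(compressed)}")
```

With a date-based layout like this, a job that needs a single day of data can read one prefix rather than the whole dataset, and compressed objects reduce the number of bytes that are read.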