Amazon EC2 Option - Teaching Big Data Skills with Amazon EMR

Amazon EC2 Option

Although this document focuses on teaching big data concepts using Amazon EMR, you can also deploy Hadoop workloads directly on Amazon Elastic Compute Cloud (Amazon EC2) instances. In fact, EC2 is the underlying compute platform that EMR runs on. The value of the EMR managed service is that it simplifies management of these nodes and deploys the EMR software packages needed to run these workloads, all from a single AWS point of management. In addition, EMR handles automatic scaling based on Hadoop workload metrics and maps Unix user permissions to Amazon S3 access policies through the EMR File System (EMRFS). Although it is technically possible to provision the same workloads on EC2 instances, EC2 provisioning is outside the scope of this document. We recommend that you consider EMR to simplify managing your environment and free up more time to develop Hadoop coursework.