This whitepaper is for historical reference only. Some content might be outdated and some links might not be available.
The AWS advantage in big data analytics
Analyzing large datasets requires significant compute capacity that can vary in size, based on the amount of input data and the type of analysis. This characteristic of big data workloads is ideally suited to the pay-as-you-go cloud computing model, where applications can easily scale up and down based on demand. As requirements change, you can easily resize your environment (horizontally or vertically) on AWS to meet your needs, without having to wait for additional hardware or over-investing to provision enough capacity.
For mission-critical applications on more traditional infrastructure, system designers have no choice but to over-provision, because the system must be able to handle a surge in data when business needs increase. By contrast, on AWS, you can provision more capacity and compute in a matter of minutes, so your big data applications grow and shrink as demand dictates and your system runs as close to optimal efficiency as possible.
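The difference between provisioning for peak and resizing with demand can be illustrated with a little arithmetic. The following is a minimal sketch, not a sizing method from this whitepaper: the hourly demand figures, the per-node capacity, and the function names are all hypothetical and chosen only to make the comparison concrete.

```python
from math import ceil


def provisioned_node_hours(hourly_demand, node_capacity):
    """Node-hours used when capacity is fixed at the peak for every hour."""
    peak_nodes = ceil(max(hourly_demand) / node_capacity)
    return peak_nodes * len(hourly_demand)


def elastic_node_hours(hourly_demand, node_capacity):
    """Node-hours used when capacity is resized each hour to match demand."""
    return sum(ceil(d / node_capacity) for d in hourly_demand)


# Hypothetical workload: GB of input to process in each of six hours.
demand_gb = [100, 120, 900, 150, 100, 80]
# Assumed amount of data one node can process per hour (illustrative only).
node_capacity_gb = 100

fixed = provisioned_node_hours(demand_gb, node_capacity_gb)
elastic = elastic_node_hours(demand_gb, node_capacity_gb)
print(fixed, elastic)  # prints "54 16"
```

Under these assumed numbers, the fixed environment pays for nine nodes in every hour to cover a single spike, while the resized environment uses nine nodes only during that spike. Real workloads also incur startup time and minimum billing increments, which this sketch ignores.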
In addition, you get flexible computing on a global infrastructure with access to the many different geographic Regions, along with other scalable services, including:
- Amazon Simple Storage Service (Amazon S3) to store data
- AWS Glue to orchestrate jobs to move and transform the data easily
- AWS IoT, which lets connected devices interact with cloud applications and other connected devices
As the amount of data being generated continues to grow, AWS has many options to get that data to the cloud, including secure devices like AWS Snow Family.
As mobile usage continues to grow rapidly, you can use the suite of services within the AWS Mobile Hub.
These capabilities of AWS make it an ideal fit for solving big data problems, and many customers have implemented successful big data analytics workloads on AWS. For more information about case studies, see Big Data Customer Success Stories.
The following services for collecting, processing, storing, and analyzing big data are described in order:
- Amazon Managed Streaming for Apache Kafka (Amazon MSK)
- Amazon Elastic MapReduce (Amazon EMR)
- Amazon OpenSearch Service (OpenSearch Service)
- Amazon Compute Services (Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Container Service (Amazon ECS), and Amazon Elastic Kubernetes Service (Amazon EKS) are available for self-managed big data applications.)