The AWS advantage in big data analytics

Analyzing large datasets requires significant compute capacity that can vary in size, based on the amount of input data and the type of analysis. This characteristic of big data workloads is ideally suited to the pay-as-you-go cloud computing model, where applications can easily scale up and down based on demand. As requirements change, you can easily resize your environment (horizontally or vertically) on AWS to meet your needs, without having to wait for additional hardware or over-investing to provision enough capacity.

For mission-critical applications on traditional infrastructure, system designers have no choice but to over-provision, because the system must be able to handle a surge in data when business needs increase. By contrast, on AWS you can provision more capacity and compute in a matter of minutes, so your big data applications grow and shrink as demand dictates and your system runs as close to optimal efficiency as possible.
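The difference between sizing for the peak and resizing with demand can be sketched with simple arithmetic. The following Python example is only an illustration: the hourly demand figures, the per-node capacity, and the function names are all hypothetical and do not come from any AWS service or API. It compares the node-hours consumed by a fleet sized once for the peak hour with those consumed by a fleet resized every hour.

```python
from math import ceil
from typing import Sequence

# Hypothetical hourly demand in units of work per hour (illustrative only).
hourly_demand = [20, 20, 25, 40, 90, 160, 90, 40, 25, 20]

# Assumed capacity of a single node, in the same units per hour.
NODE_CAPACITY = 20


def nodes_needed(demand: int, node_capacity: int = NODE_CAPACITY) -> int:
    """Smallest number of nodes that covers the demand (at least one node)."""
    return max(1, ceil(demand / node_capacity))


def fixed_node_hours(demand_series: Sequence[int]) -> int:
    """Node-hours when capacity is sized once, for the peak hour."""
    peak_nodes = max(nodes_needed(d) for d in demand_series)
    return peak_nodes * len(demand_series)


def elastic_node_hours(demand_series: Sequence[int]) -> int:
    """Node-hours when capacity is resized every hour to match demand."""
    return sum(nodes_needed(d) for d in demand_series)


if __name__ == "__main__":
    fixed = fixed_node_hours(hourly_demand)
    elastic = elastic_node_hours(hourly_demand)
    print(f"peak-sized fleet: {fixed} node-hours")
    print(f"elastic fleet:    {elastic} node-hours")
    print(f"elastic as a share of peak-sized: {elastic / fixed:.0%}")
```

With these assumed numbers, the peak-sized fleet runs eight nodes for all ten hours, while the elastic fleet runs eight nodes only during the single peak hour. Real workloads also need headroom for the time it takes new capacity to come online, which this sketch ignores.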

In addition, you get flexible computing on a global infrastructure, with access to the many geographic Regions that AWS offers, along with other scalable services that you can combine to build sophisticated big data applications. These other services include:

Amazon Simple Storage Service (Amazon S3) to store data
AWS Glue to orchestrate jobs to move and transform the data easily
AWS IoT, which lets connected devices interact with cloud applications and other connected devices

As the amount of data being generated continues to grow, AWS offers many options for getting that data to the cloud: secure devices like AWS Snow Family to accelerate petabyte-scale data transfers, delivery streams with Amazon Data Firehose to load streaming data continuously, AWS Database Migration Service to migrate databases, and AWS Direct Connect for scalable private connections.

As mobile usage continues to grow rapidly, you can use the suite of services within AWS Mobile Hub to collect and measure app usage and data, or export that data to another service for further custom analysis.

These capabilities of AWS make it an ideal fit for solving big data problems, and many customers have implemented successful big data analytics workloads on AWS. For more information about case studies, see Big Data Customer Success Stories.

The following services for collecting, processing, storing, and analyzing big data are described in order:

Amazon Kinesis
Amazon Managed Streaming for Apache Kafka (Amazon MSK)
AWS Lambda
Amazon Elastic MapReduce (Amazon EMR)
AWS Glue
AWS Lake Formation
Amazon Machine Learning
Amazon DynamoDB
Amazon Redshift
Amazon OpenSearch Service (OpenSearch Service)
Amazon QuickSight
Amazon compute services: Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Container Service (Amazon ECS), and Amazon Elastic Kubernetes Service (Amazon EKS), which are available for self-managed big data applications
Amazon Athena
