Processing and Analytics Layer - Analytics Lens

Processing and Analytics Layer

The processing and analytics layer provides the tools and services for querying and processing datasets (that is, cleansing, validating, transforming, enriching, and normalizing them) to derive business insights in both batch and real-time streaming modes. Many services can be used in this layer.
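As a rough illustration of the processing steps named above, the following Python sketch applies cleansing, validation, normalization, transformation, and enrichment to one record at a time. The record fields (`order_id`, `country`, `amount_cents`), the `Order` type, and the `REGION_BY_COUNTRY` lookup are hypothetical and exist only for this example; a real workload would typically run equivalent logic in a service such as Amazon EMR or AWS Glue.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Hypothetical lookup table used for the enrichment step.
REGION_BY_COUNTRY = {"US": "AMER", "DE": "EMEA", "JP": "APAC"}


@dataclass(frozen=True)
class Order:
    order_id: str
    country: str
    region: str
    amount_usd: float


def process_record(raw: dict) -> Optional[Order]:
    """Return a processed Order, or None if the raw record is invalid."""
    # Cleanse: trim stray whitespace from string fields.
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}

    # Validate: require an ID, a country, and a non-negative integer amount.
    order_id = cleaned.get("order_id")
    country = cleaned.get("country")
    if not order_id or not country:
        return None
    try:
        amount_cents = int(cleaned.get("amount_cents"))
    except (TypeError, ValueError):
        return None
    if amount_cents < 0:
        return None

    # Normalize: use a consistent casing for country codes.
    country = country.upper()

    # Transform: convert cents to dollars.
    amount_usd = amount_cents / 100

    # Enrich: attach a sales region from the lookup table.
    region = REGION_BY_COUNTRY.get(country, "UNKNOWN")

    return Order(order_id, country, region, amount_usd)


def process_batch(records: Iterable[dict]) -> Iterator[Order]:
    """Lazily process records, skipping any that fail validation."""
    for raw in records:
        order = process_record(raw)
        if order is not None:
            yield order
```

Because `process_batch` is a generator over any iterable, the same function can be applied to a finite batch (a list read from a file) or to an unbounded stream of records, which mirrors the batch and streaming modes described above.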

Amazon EMR is a managed service for running and scaling Apache Spark, Hadoop, HBase, Presto, Hive, and other big data frameworks across dynamically scalable Amazon EC2 instances. These frameworks can interact with data in other AWS data stores, such as Amazon S3 and Amazon DynamoDB.

Amazon Redshift is a fully managed data warehouse that makes it simple and cost-effective to analyze your data using standard SQL and your existing Business Intelligence (BI) tools. Redshift Spectrum is a feature of Amazon Redshift that enables you to run queries against exabytes of structured and semistructured data in Amazon S3, with no loading or ETL required. Redshift Spectrum is designed to run highly sophisticated queries against an exabyte of data or more in minutes.

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. Athena integrates with the AWS Glue Data Catalog, which lets you create a unified metadata repository across various services, crawl data sources to discover schemas, populate your catalog with new and modified table and partition definitions, and maintain schema versioning.
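The query flow described above can be sketched in Python as follows. This is a minimal example, not a definitive implementation: the helper name `run_athena_query` and its polling parameters are assumptions made for illustration, and the `athena` argument is expected to be a client object such as the one returned by `boto3.client("athena")`. The client is passed in rather than created inside the function so the sketch has no import-time dependency on the AWS SDK.

```python
import time
from typing import Any


def run_athena_query(
    athena: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
    max_polls: int = 60,
) -> str:
    """Start an Athena query, wait for it to finish, and return its ID.

    `athena` is assumed to expose the boto3 Athena client methods
    start_query_execution and get_query_execution.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes query results to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    for _ in range(max_polls):
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)

    raise TimeoutError(f"Athena query {query_id} did not finish in time")
```

A caller would pass a real client together with a database registered in the Data Catalog and an S3 output location it owns; the table, database, and bucket names are whatever the caller has configured and are not prescribed here.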

With Amazon Neptune, you can create a fast, reliable, fully managed graph database that makes it easy to build and run applications that work with highly connected datasets. It supports popular graph models, such as Property Graph and W3C's RDF, and their respective query languages, Apache TinkerPop Gremlin and SPARQL. Amazon Neptune can power graph relationship use cases, such as recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security.
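To make the idea of a highly connected dataset concrete, here is a small in-memory Python sketch of the kind of traversal behind a recommendation engine: from a user, follow purchase edges to items, then to other users who bought those items, then to the items those users bought. It does not use the Neptune API, Gremlin, or SPARQL; the `PurchaseGraph` class and its methods are hypothetical and stand in for what a graph database would do at scale.

```python
from collections import Counter, defaultdict
from typing import List


class PurchaseGraph:
    """Tiny bipartite graph of users and items joined by 'purchased' edges."""

    def __init__(self) -> None:
        self._items_by_user = defaultdict(set)
        self._users_by_item = defaultdict(set)

    def add_purchase(self, user: str, item: str) -> None:
        # Store the edge in both directions so it can be walked either way.
        self._items_by_user[user].add(item)
        self._users_by_item[item].add(user)

    def recommend(self, user: str, limit: int = 3) -> List[str]:
        """Rank items bought by users who share a purchase with `user`."""
        owned = self._items_by_user.get(user, set())
        scores = Counter()
        for item in owned:                              # user -> item
            for other in self._users_by_item[item]:     # item -> other user
                if other == user:
                    continue
                for candidate in self._items_by_user[other]:  # other -> item
                    if candidate not in owned:
                        scores[candidate] += 1
        # Highest score first; break ties alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked[:limit]]
```

Each candidate's score is the number of distinct paths from the user to that item, so items reached through more shared purchases rank higher.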

Amazon SageMaker is a fully managed machine learning (ML) platform that enables developers and data scientists to quickly build, train, and deploy ML models at any scale, including models that are trained on data stored in the data lake.

Other services can also be used for processing and analytics, including Amazon Kinesis, Amazon RDS, Apache Kafka, and AWS Glue ETL jobs.