This whitepaper is for historical reference only. Some content might be outdated and some links might not be available.
Network Analytics
This section presents a network analytics architecture on AWS that provides
flexibility, scalability, and innovation through Machine Learning (ML) integration. The
components of a network analytics solution can be divided into four categories: ingestion,
storage, processing and analysis, and consumption. The following reference architecture
illustrates the AWS services that support the proposed architecture.
Data can be ingested through AWS
Transfer for Secure File Transfer Protocol (SFTP) to periodically collect data from
NFx, Domain Managers, Custom Edge collectors, and legacy network performance analytics
solutions. Similarly, you can use Kinesis
and/or Amazon MSK to ingest real-time performance
data such as event-driven messages (for example, UE attach). Kinesis supports real-time data
streaming, in which collected data is available within milliseconds to enable real-time analytics use
cases.
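As an illustration of the real-time path, the following minimal sketch builds and sends a single event to Kinesis Data Streams with the AWS SDK for Python (boto3). The stream name `network-events`, the message fields, and the choice of the network function identifier as the partition key are assumptions made for this example, not part of the reference architecture.

```python
import json
from datetime import datetime, timezone

STREAM_NAME = "network-events"  # hypothetical stream name


def build_kinesis_record(event_type: str, network_function: str, payload: dict) -> dict:
    """Build the keyword arguments for a Kinesis PutRecord call."""
    message = {
        "event_type": event_type,              # for example, "ue_attach"
        "network_function": network_function,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    return {
        "StreamName": STREAM_NAME,
        "Data": json.dumps(message).encode("utf-8"),
        # Records with the same partition key land on the same shard,
        # which keeps events from one network function in order.
        "PartitionKey": network_function,
    }


def send_event(record: dict) -> None:
    """Send one record; requires boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without the SDK

    boto3.client("kinesis").put_record(**record)
```

A collector would call `send_event(build_kinesis_record("ue_attach", "amf-01", {...}))` for each event; batching with `put_records` is a likely refinement at higher volumes.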
Amazon S3 provides flexible, scalable, and performant storage. Amazon S3 enables DSPs to manage
data and access controls and to query data in place for analytics, and it provides a wide range of
cost-effective storage classes. AWS Lake Formation
(Lake Formation) provides an effective, simple way to secure the data lake supporting your network
analytics solution. You can use a single data lake for your data, whether it is
untransformed network performance data or enriched performance data. You can govern access to
the data by, for example, granting an operations team read access to a given table while allowing
a development team to alter it. Data lakes help you reduce
data duplication by governing what can be consumed and how it can be consumed, and by providing
a single view of the DSP’s performance (and configuration) data.
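The read-versus-alter split described above could be expressed as two Lake Formation grants. The sketch below uses the boto3 `grant_permissions` call; the account ID, role names, database name, and table name are hypothetical placeholders.

```python
def table_grant(principal_arn: str, database: str, table: str, permissions: list) -> dict:
    """Build the keyword arguments for a Lake Formation GrantPermissions call."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": permissions,
    }


# Hypothetical principals and table names, for illustration only.
OPERATIONS_GRANT = table_grant(
    "arn:aws:iam::111122223333:role/NetworkOperations",
    "network_analytics", "nf_performance",
    ["SELECT"],           # operations team: read only
)
DEVELOPMENT_GRANT = table_grant(
    "arn:aws:iam::111122223333:role/NetworkDevelopment",
    "network_analytics", "nf_performance",
    ["SELECT", "ALTER"],  # development team: read and alter
)


def apply_grants(grants: list) -> None:
    """Apply each grant; requires boto3 and Lake Formation admin rights."""
    import boto3  # imported here so the definitions above work without the SDK

    client = boto3.client("lakeformation")
    for grant in grants:
        client.grant_permissions(**grant)
```

Keeping the grants as plain data makes them easy to review and version before `apply_grants([OPERATIONS_GRANT, DEVELOPMENT_GRANT])` is run.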
An AWS Glue
Crawler crawls your data lake to identify data formats and to create (or
update) the tables in your Data Catalog. It creates the structure that allows you to query your data.
For example, if an operator initiates ingestion and loads data into the Amazon S3 buckets for a new NFx, DSPs can define an AWS Glue
Crawler that scans the NFx performance data and identifies its metadata. Once the
AWS Glue Data Catalog is built using the AWS Glue Crawler, DSPs can
query their data using Amazon Athena (a serverless
interactive query service that allows you to analyze data in Amazon S3). DSPs can access their
network data on the fly and perform complex SQL queries.
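The crawl-then-query flow can be sketched with boto3 as follows. The crawler name, IAM role, database, S3 paths, and the example table and column names are all assumptions for illustration; the crawler runs asynchronously, so the query should only be submitted after the crawl has finished.

```python
def crawler_definition(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Build the keyword arguments for a Glue CreateCrawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def athena_query(sql: str, database: str, output_location: str) -> dict:
    """Build the keyword arguments for an Athena StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def register_and_start_crawler(definition: dict) -> None:
    """Create the crawler and start a crawl; requires boto3 and credentials."""
    import boto3

    glue = boto3.client("glue")
    glue.create_crawler(**definition)
    glue.start_crawler(Name=definition["Name"])


def submit_query(query: dict) -> str:
    """Submit the query once the crawl has completed; returns the execution ID."""
    import boto3

    response = boto3.client("athena").start_query_execution(**query)
    return response["QueryExecutionId"]


# Hypothetical table and columns, as a crawler might infer them.
EXAMPLE_SQL = (
    "SELECT nf_id, avg(cpu_utilization) AS avg_cpu "
    "FROM nfx_performance GROUP BY nf_id"
)
```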
Amazon EMR can be used to process the vast amounts of
network data. EMR makes it easy for operators to set up, operate, and scale their big data
environment by automating time-consuming tasks (like provisioning capacity and tuning
clusters). Similarly, DSPs can use Kinesis to ingest real-time data and run an AWS Lambda function to transform the
ingested data.
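A Lambda function triggered by a Kinesis stream receives batches of records whose payloads are base64-encoded. The handler below is a minimal sketch of such a transformation; the input field names (`nf_id`, `attach_attempts`, `attach_successes`) and the derived KPI are hypothetical.

```python
import base64
import json


def transform(record: dict) -> dict:
    """Hypothetical transformation: derive an attach success rate per record."""
    attempts = record.get("attach_attempts", 0)
    successes = record.get("attach_successes", 0)
    return {
        "nf_id": record.get("nf_id"),
        # Avoid dividing by zero when no attach attempts were reported.
        "attach_success_rate": successes / attempts if attempts else None,
    }


def lambda_handler(event, context):
    """Decode each Kinesis record in the batch and transform it."""
    results = []
    for item in event.get("Records", []):
        raw = base64.b64decode(item["kinesis"]["data"])
        results.append(transform(json.loads(raw)))
    # Returned here for inspection; a real function would typically write
    # the transformed records to a destination such as Amazon S3.
    return results
```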
DSPs can leverage Amazon Redshift (Redshift) as
a data warehouse solution to create specialized views and procedures, and support their
network analytics needs. AWS Glue ETL jobs can be
leveraged to create a database schema in Redshift and copy data from Amazon S3 to Redshift.
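Loading data from Amazon S3 into Redshift is commonly done with the Redshift `COPY` command. The sketch below only builds such a statement for Parquet files; it is not the AWS Glue job itself, and the table name, S3 prefix, and IAM role are hypothetical.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement that loads Parquet files from S3."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Hypothetical names, for illustration only.
COPY_SQL = build_copy_statement(
    "analytics.nf_performance",
    "s3://example-bucket/enriched/nf_performance/",
    "arn:aws:iam::111122223333:role/RedshiftCopyRole",
)
```

The resulting statement can be run from any SQL client connected to the cluster, or issued as part of a scheduled load.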
Amazon QuickSight makes it easy for DSPs to build
dashboards showing the performance of their network, share that information across engineering
and leadership groups, and quickly integrate ML-powered insights. QuickSight
reads from sources such as Redshift and Amazon S3 (through Athena), making it a useful
Business Intelligence (BI) tool for correlating data at various stages of a given analysis path.
AWS services integrate easily with DSPs’ existing in-house consumption solutions by
providing the necessary tools, APIs, and security. For example, DSPs can run their
existing SQL queries against Redshift to feed their legacy reporting systems,
reusing the queries they already maintain.