Analysis phase - AWS Prescriptive Guidance

Analysis phase

By processing PDF files, you extract content that can be used for further processing and analysis. For example, you can identify cost trends by using the cost fields of daily operations reports, or generate insights by aggregating key performance indicators (KPIs) for business operations. You can also combine extracted content with other data sources, including data lakes, data warehouses, third-party data, or customer relationship management (CRM) data, to perform in-depth business analytics.
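As an illustration of the cost-trend example above, the following minimal sketch sums a cost field per calendar month. It assumes each extracted report is already a JSON-style record, and the field names `report_date` and `cost` are hypothetical; substitute whatever keys your extraction step actually produces.

```python
from collections import defaultdict
from datetime import date


def monthly_cost_totals(records):
    """Sum the (hypothetical) `cost` field per calendar month.

    Each record is assumed to be a dict with an ISO-format
    `report_date` (YYYY-MM-DD) and a numeric `cost`.
    Returns a dict keyed by "YYYY-MM", sorted by month.
    """
    totals = defaultdict(float)
    for record in records:
        day = date.fromisoformat(record["report_date"])
        totals[day.strftime("%Y-%m")] += float(record["cost"])
    return dict(sorted(totals.items()))


# Illustrative records only; real values would come from the extracted JSON files.
sample_records = [
    {"report_date": "2024-01-15", "cost": 100.5},
    {"report_date": "2024-01-16", "cost": 200.25},
    {"report_date": "2024-02-01", "cost": 50.0},
]

for month, total in monthly_cost_totals(sample_records).items():
    print(month, total)
```

The same grouping could be done in a BI tool or a data warehouse query; the sketch only shows the shape of the aggregation.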

Amazon QuickSight is a serverless business intelligence service that can connect to the Amazon Simple Storage Service (Amazon S3) bucket that contains your extracted PDF file data. Your business analysts can then create a dashboard to analyze and visualize the JSON files in the S3 bucket and generate insights directly from them. The dashboard connects to the S3 bucket and automatically updates after new PDF files are processed. You can also share the dashboard with other users, who can subscribe to it and view it on a mobile device. For more information, see Creating a dataset using Amazon S3 files in the Amazon QuickSight documentation.

Most PDF files also contain rich text content inside forms and tables or in free-text paragraphs. After this text is extracted, it can be used by other AWS artificial intelligence and machine learning (AI/ML) services that handle natural language processing (NLP), such as Amazon Comprehend or Amazon Translate. You can also use Amazon Kendra to index and search documents extracted from a large database of PDF files.

Your data scientists and ML engineers can also use Amazon SageMaker to directly access the extracted data in the S3 bucket or Amazon DynamoDB table and then implement advanced ML modeling and prediction.

Best practices for the analysis phase

You can use the following two best practices to help ensure a successful analysis phase:

  • Create a manifest file to use an S3 bucket as a data source for Amazon QuickSight. For more information, see Create an analysis using your own Amazon S3 data in the Amazon QuickSight documentation.

  • Automatically update your dataset to capture any new data added to Amazon S3 and refresh your dashboard. For more information, see Refreshing a dataset on a schedule in the Amazon QuickSight documentation.
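The manifest file mentioned in the first best practice is a small JSON document that tells Amazon QuickSight where the data files are and what format they use. The following sketch builds one for JSON files under a single S3 prefix. The bucket name and prefix are placeholders, and the helper `build_quicksight_manifest` is a hypothetical name introduced here for illustration; check the QuickSight documentation for the full set of manifest options.

```python
import json


def build_quicksight_manifest(bucket, prefix):
    """Return a manifest dict that points QuickSight at JSON files under an S3 prefix.

    `bucket` and `prefix` are placeholders supplied by the caller.
    """
    if not prefix.endswith("/"):
        prefix += "/"
    return {
        # URIPrefixes selects every file under the prefix, so files written
        # there later are picked up when the dataset is refreshed.
        "fileLocations": [{"URIPrefixes": [f"s3://{bucket}/{prefix}"]}],
        "globalUploadSettings": {"format": "JSON"},
    }


# Placeholder bucket and prefix; replace with your own values.
manifest = build_quicksight_manifest("example-extracted-pdf-data", "json-output")
print(json.dumps(manifest, indent=2))
```

You would save the printed JSON as a file (for example, manifest.json) and provide it when you create the S3 dataset. Using a prefix instead of listing individual file URIs pairs naturally with a scheduled dataset refresh, because new files do not require a manifest change.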