Data collection
You can collect data from a variety of sources within AWS, but it's important to choose the right data collection tool for your use case. The following diagram shows how the data collection stage fits into the data engineering automation and access control lifecycle.

AWS provides the following data collection tools:
- Amazon Kinesis helps you collect streaming data. Kinesis also integrates with other AWS services and provides capabilities for processing the data stream.
- AWS Database Migration Service (AWS DMS) helps you ingest data from relational databases. AWS DMS provides configuration options and direct connections between on-premises databases and targets that are hosted on AWS, such as Amazon Simple Storage Service (Amazon S3).
- AWS Glue is an extract, transform, and load (ETL) tool that helps you ingest unstructured data.
There are several use cases for collecting unstructured or semi-structured data and storing it in Amazon S3. For example, a manufacturing site might need to ingest machine history as XML files, event data as JSON files, and purchase data from a relational database. The same use case might also require joining all three data sources.
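To make the manufacturing example more concrete, the following is a minimal local sketch of joining the three kinds of sources on a shared machine identifier. It is not an AWS implementation: the XML layout, the JSON fields, the `purchases` table, and every function name are hypothetical, and an in-memory SQLite database stands in for the relational source. In practice, the files would typically be read from Amazon S3 and the join would run in an ETL job.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample inputs; element and field names are illustrative only.
MACHINE_HISTORY_XML = """
<machines>
  <machine id="M1"><installed>2019-04-01</installed></machine>
  <machine id="M2"><installed>2021-09-15</installed></machine>
</machines>
"""

EVENTS_JSON = """
[
  {"machine_id": "M1", "event": "overheat"},
  {"machine_id": "M1", "event": "restart"},
  {"machine_id": "M2", "event": "restart"}
]
"""


def load_machine_history(xml_text):
    """Parse machine history XML into {machine_id: installed_date}."""
    root = ET.fromstring(xml_text.strip())
    return {m.get("id"): m.findtext("installed") for m in root.iter("machine")}


def load_events(json_text):
    """Group JSON event records into {machine_id: [event, ...]}."""
    events = {}
    for record in json.loads(json_text):
        events.setdefault(record["machine_id"], []).append(record["event"])
    return events


def build_demo_database():
    """Create an in-memory SQLite stand-in for the relational source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (machine_id TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?)",
        [("M1", 1200), ("M1", 300), ("M2", 450)],
    )
    return conn


def load_purchases(conn):
    """Return {machine_id: total purchase amount} from the relational source."""
    cursor = conn.execute(
        "SELECT machine_id, SUM(amount) FROM purchases GROUP BY machine_id"
    )
    return dict(cursor.fetchall())


def join_sources(history, events, purchases):
    """Join all three sources on machine_id, one output row per machine."""
    return [
        {
            "machine_id": machine_id,
            "installed": installed,
            "events": events.get(machine_id, []),
            "purchase_total": purchases.get(machine_id, 0),
        }
        for machine_id, installed in sorted(history.items())
    ]


if __name__ == "__main__":
    connection = build_demo_database()
    joined = join_sources(
        load_machine_history(MACHINE_HISTORY_XML),
        load_events(EVENTS_JSON),
        load_purchases(connection),
    )
    for row in joined:
        print(row)
```

The sketch keys every source on one identifier before joining, which is the main design decision to settle up front: if the three sources do not share a common key, the join step needs a mapping table or a cleanup pass first.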
Before you start the data ingestion process, we recommend that you understand what data must be ingested, and then choose the right tool to collect this data.
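One way to apply this recommendation is to write down an inventory of sources and map each one to a collection tool before building anything. The following sketch assumes the three broad source categories described in this section. The `SourceKind` enum, the `plan_ingestion` function, and the example source names are hypothetical, and a real decision would also weigh volume, latency, and cost.

```python
from enum import Enum


class SourceKind(Enum):
    """Broad source categories, following the tool list in this section."""
    STREAMING = "streaming"
    RELATIONAL = "relational"
    UNSTRUCTURED = "unstructured"


# Mapping taken from the tool descriptions in this section.
TOOL_BY_SOURCE = {
    SourceKind.STREAMING: "Amazon Kinesis",
    SourceKind.RELATIONAL: "AWS DMS",
    SourceKind.UNSTRUCTURED: "AWS Glue",
}


def plan_ingestion(sources):
    """Map each named source to a collection tool.

    `sources` is a dict of {source_name: SourceKind}. A KeyError is raised
    for any kind that has no tool assigned, so gaps surface early.
    """
    return {name: TOOL_BY_SOURCE[kind] for name, kind in sources.items()}


if __name__ == "__main__":
    # Hypothetical inventory; classifying the XML files as UNSTRUCTURED
    # is an assumption made for this illustration.
    inventory = {
        "machine_history_xml": SourceKind.UNSTRUCTURED,
        "sensor_stream": SourceKind.STREAMING,
        "purchase_database": SourceKind.RELATIONAL,
    }
    for name, tool in plan_ingestion(inventory).items():
        print(f"{name}: {tool}")
```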