MLREL-03: Use a data catalog

Use a data catalog to process data across multiple data stores. An advanced data catalog service can also integrate with your ETL processes, which makes data processing more reliable and efficient.

Implementation plan

Use AWS Glue Data Catalog - The AWS Glue Data Catalog provides a way to track the data assets that have been loaded into your ML workload. A data catalog also describes how data is transformed as it is loaded into the data lake and data warehouse. AWS Glue is a fully managed ETL (extract, transform, and load) service that offers a simple and cost-effective way to categorize your data, clean it, enrich it, and move it reliably between data stores and data streams. It includes a central metadata repository, the AWS Glue Data Catalog, and an ETL engine that automatically generates Python or Scala code. Its flexible scheduler handles dependency resolution, job monitoring, and retries.
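As a minimal sketch of how an ML workload might consult the catalog before training, the Python below reads one table's metadata and checks that the columns a job expects are present. It assumes a client object that behaves like the boto3 Glue client (`get_table` with `DatabaseName` and `Name`). The helper names `summarize_table` and `missing_columns`, and the database and table names in the usage comment, are hypothetical and not part of the original guidance.

```python
from typing import Any, Dict, List


def summarize_table(glue_client: Any, database: str, table: str) -> Dict[str, Any]:
    """Return the location, columns, and partition keys of one catalog table.

    `glue_client` is assumed to expose get_table() like the boto3 Glue client.
    """
    response = glue_client.get_table(DatabaseName=database, Name=table)
    meta = response["Table"]
    storage = meta.get("StorageDescriptor", {})
    return {
        "name": meta["Name"],
        "location": storage.get("Location"),
        "columns": [(c["Name"], c.get("Type")) for c in storage.get("Columns", [])],
        "partition_keys": [k["Name"] for k in meta.get("PartitionKeys", [])],
    }


def missing_columns(summary: Dict[str, Any], required: List[str]) -> List[str]:
    """List required column names that are neither columns nor partition keys."""
    present = {name for name, _ in summary["columns"]} | set(summary["partition_keys"])
    return [name for name in required if name not in present]


# Hypothetical usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   glue = boto3.client("glue")
#   summary = summarize_table(glue, "ml_features", "training_events")
#   gaps = missing_columns(summary, ["user_id", "label"])
#   if gaps:
#       raise ValueError(f"catalog table is missing columns: {gaps}")
```

Passing the client in as an argument, rather than creating it inside the function, keeps the check easy to test with a stub and leaves region and credential choices to the caller.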
