Data lake design principles - Amazon Connect Data Lake Best Practices

Building a data lake breaks down data silos and democratizes data so that its value can be extracted across the organization. A central data repository empowers organizations to make data-driven decisions and innovate quickly.

Organizations want cost-effective, elastic storage for disparate data sources that grow exponentially. They want to centrally govern and share vast amounts of data across different business units. Furthermore, they want to empower their employees and stakeholders to derive business insights with a shorter time-to-value.

Consider the following questions when designing a data lake:

  • How do you collect, store, and analyze high-velocity data across various data types, including structured, unstructured, and semi-structured?

  • How do you store and share petabytes of data on-demand globally and cost-effectively?

  • How do you scale IT resources to support a high number of concurrent queries against your data and scale down automatically for cost savings?

  • How do your users view, search, and run queries on multiple data repositories today?

  • How do you derive future insights using historical data patterns and past scenarios?
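One common way to address cost-effective storage and efficient querying at scale is to lay out objects with Hive-style partition prefixes (for example `year=/month=/day=`), which lets query engines prune partitions instead of scanning the whole lake. The sketch below illustrates that layout convention in Python; the function name, the `source=ctr` label, and the key structure are illustrative assumptions, not part of any Amazon Connect API.

```python
from datetime import datetime, timezone

def partitioned_key(prefix: str, source: str,
                    event_time: datetime, object_name: str) -> str:
    """Build a Hive-style partitioned object key (year=/month=/day=).

    Query engines that understand this convention can prune partitions
    by date, scanning only the data a query actually needs.
    (Illustrative sketch; names and layout are assumptions.)
    """
    return (
        f"{prefix}/source={source}"
        f"/year={event_time.year:04d}"
        f"/month={event_time.month:02d}"
        f"/day={event_time.day:02d}"
        f"/{object_name}"
    )

key = partitioned_key(
    "connect-data-lake", "ctr",
    datetime(2023, 5, 17, tzinfo=timezone.utc),
    "contact-records-0001.json.gz",
)
print(key)
# connect-data-lake/source=ctr/year=2023/month=05/day=17/contact-records-0001.json.gz
```

A date-based prefix like this also simplifies lifecycle policies (for example, transitioning older partitions to colder storage tiers), which supports the cost-savings goals above.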