FAQ
This section provides answers to commonly raised questions about designing a data lake for growth and scale on the AWS Cloud.
Is this data lake reference architecture more applicable to enterprise organizations?
This guide's data lake reference architecture applies to organizations of any size. The reference architecture standardizes the data exchange interface, lowers the overhead and cost of maintaining and growing the data lake, and continues to apply as your organization's data lake scales.
Can I still use this reference architecture if my organization only has one data producer?
This guide's data lake reference architecture is still relevant and beneficial even if your organization has only one data producer. Without the centralized catalog, your data producer has to handle every new data consumer itself, which adds complexity and overhead as the number of consumers grows. Your data lake is also a long-term asset, and organizations typically add more data producers over time. For example, you might need an additional data producer to store sensitive data for compliance reasons, or your organization might acquire another business unit that has its own data producer.
My data lake directly connects one data producer with multiple data consumers. Is this guide's data lake reference architecture still relevant?
Yes, the data lake reference architecture would benefit your organization in the long term. You could use a two-step approach: first build the centralized catalog and connect new data consumers to it, and then move your existing data consumers to the centralized catalog.
Should my organization follow the onboarding and access granting workflow without making changes to it?
No. The main purpose of that section is to illustrate the logical activity blocks that the onboarding process requires. Every organization should customize the process, and some might even need multiple processes, depending on the sensitivity of their data.
Another consideration is that the process flow uses the resource-based sharing approach in AWS Lake Formation. Lake Formation also supports other data-sharing methods, such as tag-based sharing, and you can tailor the process to the sharing method that you choose.
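To make the difference between the two sharing methods concrete, the following minimal sketch builds the request parameters for the Lake Formation GrantPermissions API in both styles. The account ID, database, table, and tag names are hypothetical placeholders, and the helper function names are illustrative only; they are not part of this guide's workflow. The sketch only constructs the requests and does not call AWS.

```python
from typing import Any, Dict, List


def resource_based_grant(consumer_account_id: str, database: str,
                         table: str, permissions: List[str]) -> Dict[str, Any]:
    """Build a grant that names one specific table (resource-based sharing)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": consumer_account_id},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


def tag_based_grant(consumer_account_id: str, tag_key: str,
                    tag_values: List[str], permissions: List[str]) -> Dict[str, Any]:
    """Build a grant that covers every table matching an LF-tag expression."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": consumer_account_id},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": tag_key, "TagValues": list(tag_values)}],
            }
        },
        "Permissions": list(permissions),
    }


# Hypothetical values, for illustration only.
by_resource = resource_based_grant("111122223333", "sales_db", "orders", ["SELECT"])
by_tag = tag_based_grant("111122223333", "sensitivity", ["public"], ["SELECT"])

# With AWS credentials configured, either request could be submitted as:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**by_resource)
```

With resource-based sharing, the workflow has to issue one grant for each table that is shared. With tag-based sharing, the workflow grants on a tag expression once, and onboarding a new table becomes a tagging step instead of a new grant. That is the kind of difference to account for when you customize the process.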