Application of the Māori data guidance
This section outlines how the Māori data guidance could be applied to the digital archive solution scenario and the proposed solution architecture. It is written from the perspective of the software development company.
Operational excellence
How do you incorporate Māori views into your technology governance and operations? This section focuses on developing general knowledge of te ao Māori within your organisation, especially as it relates to how your organisation works with Māori as customers. In this scenario, consider the following:
- What level of Māori cultural capability does your organisation have to respond to this scenario? Is it enough to help you understand the needs and requirements that the iwi customer may have? If not, in the short term you may consider engaging a Māori expert to help your organisation work with this specific customer. In the longer term, you can consider developing this knowledge within your organisation.
- If your offering includes ongoing operations or support for the application, what can you do to incorporate the needs of your Māori customer? This may include developing processes flexible enough to support specific tikanga that your Māori customer wants to incorporate into operational and support processes. For example, when responding to a support call, staff may need to access part of the system that stores tapu (sensitive) data. The customer may want staff to follow a specific protocol when accessing this tapu data.
How can you design data collection with your Māori customers in mind? This section focuses on how data is captured or collected. In this scenario, consider the following:
- The system allows external people to register for an account to use the system. Consider what data is required to allow a user to sign up and use the system. In this scenario, it may be appropriate to allow a user to indicate what hapū they associate with, as this could be used to help present relevant content to that user. Alternatively, rather than collecting data about a user, you could allow the user to specify their interests. These interests could then be used to personalise the content suggested to the user. At the time of registration, any relevant personal information collection notices should be clearly presented to the user in easy-to-understand language.
- The system allows users to make submissions for inclusion in the archive. For example, an authorised user could submit photos or images from an event they attended, or historical documents or images that their whānau has collected over time. Because the solution is a digital archive, there should be requirements for capturing metadata, such as where the data came from and what rights the owner has granted to the holder of that information. From a Māori data perspective, the owner may have specific requirements for accessing or handling the data, which may require access and use restrictions to be put in place.
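The metadata requirements described above can be sketched as a record that carries provenance, a rights statement, and use restrictions together with each submitted item. The field names, the `Use` categories, and the group-based restriction are hypothetical illustrations. The actual metadata schema would be agreed with the iwi customer and any archival standards they follow.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Use(Enum):
    VIEW = "view"
    DOWNLOAD = "download"
    AI_PROCESSING = "ai_processing"
    PUBLIC_DISPLAY = "public_display"


@dataclass(frozen=True)
class Provenance:
    contributor: str                    # who submitted the item
    source: str                         # e.g. "event photo", "whānau collection"
    collected_on: Optional[str] = None  # ISO date, if known


@dataclass
class ItemMetadata:
    item_id: str
    title: str
    provenance: Provenance
    rights_statement: str  # rights the owner has granted to the archive
    permitted_uses: frozenset = frozenset({Use.VIEW})
    restricted_to_groups: frozenset = frozenset()  # empty means no group restriction

    def allows(self, use: Use, user_groups: set) -> bool:
        """Check a requested use against the owner's recorded restrictions."""
        if use not in self.permitted_uses:
            return False
        if self.restricted_to_groups and not (self.restricted_to_groups & user_groups):
            return False
        return True
```

Keeping the restrictions on the metadata record means the owner's conditions travel with the item wherever the application uses it.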
How do you use or share Māori data back with Māori? This section focuses on how Māori data is used or shared. Use and sharing can be from the perspective of a Māori organisation using the data, as is the case in this scenario, but it could also apply to third parties who collect, store, or generate Māori data, such as a medical centre, a government agency, or a non-profit delivering community services. The archive system in this scenario is designed for the iwi organisation to capture and store digital items. Some of the considerations in this section are therefore from the perspective of the iwi and how they use or share the data captured and stored in the archive system.
- One of the key objectives for the archive solution is to make data accessible, and the features of the solution reflect this. In this scenario, sharing Māori data back with Māori could mean sharing information from the iwi archive with other Māori organisations. For example, an individual hapū may have their own archives, which could complement data from the broader iwi archive. Alternatively, data about who has accessed and viewed the iwi archive data may help hapū understand the level of interest or engagement with the content they have shared.
- Consider who might want to get data out of the archive system, and the most appropriate way to retrieve that data. For example, the solution may allow individual users to search for content and download a specific item with its metadata. But what happens when a third party wants to retrieve multiple items? If an individual hapū also has an archive system, how could they integrate their system with the iwi archive system? An API could provide programmatic access, making it easier to integrate and retrieve large amounts of content.
- The proposed architecture uses AWS AI services to perform tasks such as text extraction, document comprehension, transcription of video and audio, and object recognition in images and videos. Understand how each service handles its inputs and outputs. Does the service retain the inputs (such as documents and images) for any purpose, and if so, who has access to them? Are inputs used to develop and improve the service, and if so, is there a way to opt out? For example, Amazon Transcribe, Amazon Comprehend, Amazon Textract, Amazon Rekognition, and Amazon Translate all allow customers to opt out of the use of their content to develop and improve these services. For more detail, see Privacy Features of AWS Services.
- Because the architecture uses AWS AI services, work with the iwi customer to confirm that they understand how the proposed AI services work, what function they serve, and what their benefits are. Seek their guidance on whether these tools are suitable for the specific functions and where tikanga may need to be applied. For example, one consideration may be the separation between the living and the dead. Historical content often relates to people who have passed, so there may be a desire to process such items separately or to exclude certain items from AI processing altogether. Another consideration is the accuracy of AI services when analysing te reo Māori in written or spoken form. If the AI services have not been trained on te reo Māori, the transcriptions produced, entities identified, or classifications determined may be inaccurate or incomplete, which reduces the usefulness of the data. The system architecture and features need to incorporate the outcomes of these discussions. For example, there may be logic that checks for a specific tag on an item and uses it to determine whether the item should be sent to AI services for text extraction or object detection.
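For the bulk-retrieval case described above, where a hapū system pulls many items from the iwi archive, APIs commonly return results in pages with a cursor. This sketch is hypothetical throughout: `list_items` stands in for an API endpoint and `fetch_all` for the client loop. The integer offset cursor is a simplification. A production API would typically use opaque cursor tokens and authentication, and would also apply the item's access restrictions before returning anything.

```python
from typing import Iterator, Optional


def list_items(catalogue: list, cursor: Optional[str] = None, limit: int = 100) -> dict:
    """Hypothetical API handler: return one page of items plus a cursor for the next page."""
    start = int(cursor) if cursor else 0
    page = catalogue[start:start + limit]
    next_start = start + len(page)
    # A cursor of None tells the client there are no more pages
    next_cursor = str(next_start) if next_start < len(catalogue) else None
    return {"items": page, "next_cursor": next_cursor}


def fetch_all(catalogue: list, limit: int = 100) -> Iterator[dict]:
    """Client loop that follows cursors until every item has been retrieved (limit must be > 0)."""
    cursor = None
    while True:
        response = list_items(catalogue, cursor, limit)
        yield from response["items"]
        cursor = response["next_cursor"]
        if cursor is None:
            break
```

Paging keeps individual responses small, which matters when an integrating system needs to retrieve a large number of items.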
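If the iwi customer decides to opt out, one mechanism AWS provides is an AI services opt-out policy in AWS Organizations. The sketch below builds such a policy document in Python. It uses the `default` key, which applies the opt-out to all AI services that support it, and `@@none`, which prevents child policies from overriding the setting. The policy syntax is based on AWS Organizations documentation and should be checked against the current version before use. Creating and attaching the policy is not shown.

```python
import json


def build_ai_opt_out_policy() -> dict:
    """Return an AWS Organizations AI services opt-out policy document.

    "default" applies the opt-out to every AI service that supports it;
    "@@none" stops child policies from overriding the setting.
    """
    return {
        "services": {
            "@@operators_allowed_for_child_policies": ["@@none"],
            "default": {
                "@@operators_allowed_for_child_policies": ["@@none"],
                "opt_out_policy": {
                    "@@operators_allowed_for_child_policies": ["@@none"],
                    "@@assign": "optOut",
                },
            },
        }
    }


if __name__ == "__main__":
    # Print the JSON document, ready to paste into the console or pass to the CLI
    print(json.dumps(build_ai_opt_out_policy(), indent=2))
```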
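The tag-checking logic described above can be sketched as a small routing function. The tag names `exclude-from-ai` and `process-separately`, the queue names, and the mapping of one AI task per media type are hypothetical. The real tags and rules would come from the discussions with the iwi customer. The comments name the AWS service each task could use in the proposed architecture.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Hypothetical tags agreed with the iwi customer
EXCLUDE_FROM_AI = "exclude-from-ai"
PROCESS_SEPARATELY = "process-separately"

# Simplified: one AI task per media type
AI_TASKS = {
    "document": "text_extraction",   # e.g. Amazon Textract
    "image": "object_detection",     # e.g. Amazon Rekognition
    "video": "object_detection",     # e.g. Amazon Rekognition
    "audio": "transcription",        # e.g. Amazon Transcribe
}


@dataclass
class ArchiveItem:
    item_id: str
    media_type: str
    tags: set = field(default_factory=set)


def route(item: ArchiveItem) -> Tuple[str, Optional[str]]:
    """Return (queue, task) for an item, honouring tikanga-related tags."""
    if EXCLUDE_FROM_AI in item.tags:
        return ("no_ai", None)
    task = AI_TASKS.get(item.media_type)
    if task is None:
        # Media types with no configured AI task are never sent to AI services
        return ("no_ai", None)
    queue = "separate_ai_queue" if PROCESS_SEPARATELY in item.tags else "standard_ai_queue"
    return (queue, task)
```

The exclusion tag is checked first, so an excluded item is never sent to an AI service, whatever its media type.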
Security
How is Māori data protected? This section focuses on specific security considerations from a Māori data perspective.
- The archive system holds a range of data, some of which may be considered tapu, so data restrictions may need to be in place. There may be a need to store tapu data separately from other data, and guidance is required from the iwi customer on what separation looks like in a digital system. The preceding high-level architecture diagram shows that Amazon S3 and Amazon RDS are used as data stores. Information classified as tapu may need to be stored in a separate S3 bucket. The application would then need logic to determine which bucket to save each item into, and functions to move data between buckets if its classification changes. The Amazon RDS database stores metadata about each item, which may include data about people, places, and events. If some of this metadata is classified as tapu, seek guidance on whether it needs to be stored in a different database table or even a separate database, balanced against the added system complexity and cost.
- The classification of the data may also require additional access and security controls. Access can be restricted through role-based or attribute-based access control, with system administrators controlling access to more sensitive data by granting permissions to a user or role. Audit logging provides traceability of who has accessed items in the archive, and can be used to validate that the access controls are working as intended.
- Protecting data for long-term safety can be achieved by incorporating AWS Well-Architected security and resiliency best practices. From a security perspective, this includes understanding the threats that your application and organisation face, and identifying mitigations that can be implemented as security controls. Given the proposed architecture for the archive solution, one potential threat to the long-term safety of the digital content stored in Amazon S3 is a ransomware event. Once a potential threat is identified, determine the steps you can take to help protect your application, detect whether such an event occurs, and respond to and recover from it. For more detail, see The anatomy of ransomware event targeting data residing in Amazon S3.
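The bucket-selection and reclassification logic described above for tapu data could look like the following sketch. The bucket names and classification labels are hypothetical. The `s3` argument is expected to be a boto3 S3 client, such as `boto3.client("s3")`. `copy_object` and `delete_object` are standard boto3 S3 client methods. A single `copy_object` call handles objects up to 5 GB, so larger video files would need a multipart copy.

```python
# Hypothetical bucket names; real names and labels would be agreed with the iwi customer
BUCKETS = {
    "general": "iwi-archive-general",
    "tapu": "iwi-archive-tapu",
}


def bucket_for(classification: str) -> str:
    """Choose the storage bucket for an item based on its classification."""
    try:
        return BUCKETS[classification]
    except KeyError:
        raise ValueError(f"unknown classification: {classification!r}") from None


def reclassify(s3, key: str, old: str, new: str) -> str:
    """Move an object between buckets when its classification changes.

    `s3` is expected to be a boto3 S3 client. Returns the destination bucket.
    """
    src, dst = bucket_for(old), bucket_for(new)
    if src == dst:
        return dst  # nothing to move
    s3.copy_object(Bucket=dst, Key=key, CopySource={"Bucket": src, "Key": key})
    s3.delete_object(Bucket=src, Key=key)
    return dst
```

The source bucket may have versioning enabled. In that case `delete_object` only adds a delete marker, so earlier versions of a newly tapu item remain in the general bucket until they are removed explicitly. This is worth raising with the iwi customer.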
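The combination of role-based and attribute-based access control with audit logging, described above, can be sketched as a single check that logs every decision. The role names `archive-admin` and `tapu-reader` and the `hapu` attribute are hypothetical. In a deployed system the audit log would be written to durable, tamper-resistant storage rather than the standard Python logger.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("archive.audit")


@dataclass
class User:
    user_id: str
    roles: frozenset   # role-based: e.g. {"tapu-reader"}
    attributes: dict   # attribute-based: e.g. {"hapu": "..."}


@dataclass(frozen=True)
class Item:
    item_id: str
    classification: str         # "general" or "tapu"
    hapu: Optional[str] = None  # if set, only members of this hapū may view


def can_view(user: User, item: Item) -> bool:
    """Decide access, then write an audit record of the decision."""
    if "archive-admin" in user.roles:
        allowed = True
    elif item.classification == "tapu" and "tapu-reader" not in user.roles:
        allowed = False
    elif item.hapu is not None and user.attributes.get("hapu") != item.hapu:
        allowed = False
    else:
        allowed = True
    audit.info("user=%s item=%s action=view allowed=%s", user.user_id, item.item_id, allowed)
    return allowed
```

Because every decision is logged, including denials, the audit trail can be reviewed to confirm that the controls behave as intended.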
How can you identify and classify Māori data? This section focuses on understanding what Māori data is in the context of your organisation and having a method to classify data as Māori data. This can then guide your architecture when capturing, processing, and storing that data. In this scenario, consider the following:
- It's clear that the archive system contains Māori data, considering that the customer is an iwi organisation and the system stores and processes data about their history, knowledge, and people.
- The high-level requirements include being able to control access to data for different users, which indicates that different types of data are stored within the archive. Discuss with the iwi customer how data is classified and how that classification is recorded in the system. Is there an expectation that the system determines the classification automatically, based on the document content or metadata? Or should a user classify each document manually? Once a document is classified, how is the classification recorded as metadata and linked to the digital item?
- The other considerations in this section are mainly for organisations that capture and process Māori data as part of delivering their products and services. They do not apply to this scenario.
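One possible answer to the recording question above is to store each classification as a metadata record linked to the item's ID. The record captures whether the classification was set manually or automatically, who or what set it, and when. Earlier records are kept as history. The `Level` values, the `Method` values, and the store itself are hypothetical. The actual levels would come from the iwi customer, and in the proposed architecture the records would live in the Amazon RDS database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Level(Enum):
    GENERAL = "general"
    RESTRICTED = "restricted"
    TAPU = "tapu"


class Method(Enum):
    MANUAL = "manual"        # set by a person, e.g. an archivist
    AUTOMATIC = "automatic"  # derived from content or metadata by rules


@dataclass(frozen=True)
class Classification:
    item_id: str  # links the classification to the digital item
    level: Level
    method: Method
    classified_by: str
    classified_at: datetime


@dataclass
class ClassificationStore:
    history: dict = field(default_factory=dict)  # item_id -> list of Classification

    def classify(self, item_id: str, level: Level, method: Method, who: str) -> Classification:
        """Append a new classification; earlier ones are kept for traceability."""
        record = Classification(item_id, level, method, who, datetime.now(timezone.utc))
        self.history.setdefault(item_id, []).append(record)
        return record

    def current(self, item_id: str) -> Optional[Classification]:
        """Return the most recent classification, or None if the item is unclassified."""
        records = self.history.get(item_id)
        return records[-1] if records else None
```

Keeping the history lets a manual classification by a person override an earlier automatic one while still showing how the item was first classified.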
How do you maintain the privacy of personal Māori data? This section focuses on maintaining privacy of personal data. In this scenario, consider the following:
- The application is likely to capture personal data when a new user registers. Consider how much data is collected and for what purpose, and how this is communicated to new users. Consider also how ongoing consent is managed, so that users can revoke access to their personal information, and confirm that all collection and management of personal information complies with New Zealand privacy law.
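The consent-management point above can be sketched as a per-purpose consent record. Each record names the fields collected for that purpose and the collection-notice version the user saw, and it can be revoked. All names here, including the purposes and field labels, are hypothetical. Deleting or anonymising data after revocation is not shown.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Consent:
    purpose: str         # e.g. "personalise suggested content"
    fields: frozenset    # personal data collected for this purpose
    notice_version: str  # which collection notice the user was shown
    granted_at: datetime
    revoked_at: Optional[datetime] = None

    @property
    def active(self) -> bool:
        return self.revoked_at is None


@dataclass
class UserConsents:
    user_id: str
    consents: list = field(default_factory=list)

    def grant(self, purpose: str, fields: set, notice_version: str) -> Consent:
        c = Consent(purpose, frozenset(fields), notice_version, datetime.now(timezone.utc))
        self.consents.append(c)
        return c

    def revoke(self, purpose: str) -> None:
        """Mark every active consent for this purpose as revoked."""
        for c in self.consents:
            if c.purpose == purpose and c.active:
                c.revoked_at = datetime.now(timezone.utc)

    def permitted_fields(self) -> set:
        """Fields the application may still use, across all active consents."""
        out = set()
        for c in self.consents:
            if c.active:
                out |= c.fields
        return out
```

Separating consent by purpose lets a user withdraw an optional use, such as hapū-based personalisation, without closing their account.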
Reliability
How do you safely retain data for future generations? This section focuses on understanding that Māori data often needs to be protected and resilient so it can be accessed by future generations. In this scenario, consider the following:
- The archive holds extremely valuable taonga for the iwi organisation. Given its importance, the architecture needs to ensure that the data remains resilient over time. The digital content is stored in Amazon S3, which is designed for 99.999999999% (11 nines) of data durability. This provides a level of protection from data loss caused by service events. Amazon S3 features such as versioning can be used to protect data from deletion or corruption events. For more information on versioning, see Using Versioning in Amazon S3 buckets. Regular backups using Amazon S3 Replication or AWS Backup can provide another layer of protection by creating additional copies of the objects stored in the archive.
- Archive and preservation systems often hold multiple copies of content: a preservation master and one or more copies used for general access. In some cases, the general access copies are modified so they can be consumed more easily on a range of devices, for example by lowering the resolution of video or converting documents from one format to another. Supporting multiple copies increases storage costs. It also requires application features to perform functions such as file copies, file conversions, and image or video resampling.
- The proposed architecture uses an Amazon RDS database to store system data, including data about users, system usage, and metadata about items in the archive. To protect this database, the native database backup feature can be used to create regular backups of the database. Operational processes need to be established to verify that backups are occurring as expected and test the restoration process periodically.
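Enabling versioning on the archive buckets, as suggested above, is a single API call. This sketch expects `s3` to be a boto3 S3 client, such as `boto3.client("s3")`. `put_bucket_versioning` and `get_bucket_versioning` are standard boto3 S3 client methods. Bucket names passed in are up to the deployment, and any shown in usage are hypothetical.

```python
def enable_versioning(s3, bucket: str) -> None:
    """Turn on S3 Versioning so overwritten or deleted objects keep prior versions.

    `s3` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )


def versioning_enabled(s3, bucket: str) -> bool:
    """Return True if versioning is currently enabled on the bucket."""
    # The response omits "Status" for buckets that have never had versioning enabled
    return s3.get_bucket_versioning(Bucket=bucket).get("Status") == "Enabled"
```

Every retained version is billed as storage. S3 Lifecycle rules for noncurrent versions are one way to balance this protection against the cost considerations discussed in the next section.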
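Part of the operational process described above, verifying that backups are occurring, can be automated by checking the RDS instance description. This sketch inspects one entry from the `DBInstances` list returned by `describe_db_instances()`. It uses the `BackupRetentionPeriod` and `LatestRestorableTime` fields. The one-hour staleness threshold is an assumption to adjust. This check does not replace periodically test-restoring a backup into a separate instance.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_rds_backups(
    instance: dict,
    max_lag: timedelta = timedelta(hours=1),  # assumed threshold; tune to your RPO
    now: Optional[datetime] = None,
) -> list:
    """Return problems found in one DBInstances entry; an empty list means checks passed."""
    now = now or datetime.now(timezone.utc)
    problems = []
    if instance.get("BackupRetentionPeriod", 0) == 0:
        problems.append("automated backups are disabled (retention period is 0)")
    latest = instance.get("LatestRestorableTime")
    if latest is None:
        problems.append("no restorable time reported")
    elif now - latest > max_lag:
        problems.append(f"latest restorable time is {now - latest} old")
    return problems


# Example usage with boto3 (not run here):
#   import boto3
#   rds = boto3.client("rds")
#   for db in rds.describe_db_instances()["DBInstances"]:
#       print(db["DBInstanceIdentifier"], check_rds_backups(db))
```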
Cost optimisation
This section focuses on understanding the cost considerations when designing a solution. In this scenario, consider the following:
- Clearly present the cost-benefit trade-offs when looking at all infrastructure options. There may be a desire to have data located close to the iwi organisation. At the time of writing (June 2024), the nearest AWS Regions to New Zealand are Sydney and Melbourne, Australia. The Auckland Local Zone is available and is parented to the AWS Sydney Region. AWS Outposts is a fully managed service that extends AWS infrastructure, services, APIs, and tools to customer and data centre provider premises, so an AWS Outpost could be deployed into a data centre close to the iwi organisation. Cost components include the Outpost itself and the cost of hosting it in a third-party data centre that meets the minimum requirements. The proposed architecture also uses AWS AI services, including Amazon Textract, Amazon Comprehend, and Amazon Rekognition, which are currently available only in AWS Regions. The solution would therefore need to account for the network connectivity and bandwidth requirements between the Outpost and the Region, which may affect the overall solution cost.
Sustainability
How do you design and operate systems to minimise potential impacts on the environment? This section focuses on considering the impacts of technology on the environment.
- Work with your customer to identify if they have specific sustainability goals, and identify what metrics are being used to measure attainment of those goals. Determine how you might produce data from the digital archive solution that can feed into the measurement of those metrics.
- You may prompt the iwi to consider how they can reduce carbon emissions by using AWS instead of alternatives such as on-premises servers. You may also wish to discuss with the iwi’s kaitiaki board the benefits of using a monthly report from the AWS Customer Carbon Footprint Tool to monitor carbon emissions, and of setting a 12-month goal to reduce the carbon emissions associated with their use of AWS through optimisations.