Document institutional knowledge from voice inputs by using Amazon Bedrock and Amazon Transcribe
Created by Praveen Kumar Jeyarajan (AWS), Jundong Qiao (AWS), Megan Wu (AWS), and Rajiv Upadhyay (AWS)
Summary
Capturing institutional knowledge is paramount for ensuring organizational success and resilience. Institutional knowledge represents the collective wisdom, insights, and experiences accumulated by employees over time, often tacit in nature and passed down informally. This wealth of information encompasses unique approaches, best practices, and solutions to intricate problems that might not be documented elsewhere. By formalizing and documenting this knowledge, companies can preserve institutional memory, foster innovation, enhance decision-making processes, and accelerate learning curves for new employees. Additionally, it promotes collaboration, empowers individuals, and cultivates a culture of continuous improvement. Ultimately, harnessing institutional knowledge helps companies use their most valuable asset—the collective intelligence of their workforce—to navigate challenges, drive growth, and maintain competitive advantage in dynamic business environments.
This pattern explains how to capture institutional knowledge through voice recordings from senior employees. It uses Amazon Transcribe and Amazon Bedrock for systematic documentation and verification. By documenting this informal knowledge, you can preserve it and share it with subsequent cohorts of employees. This endeavor supports operational excellence and improves the effectiveness of training programs through the incorporation of practical knowledge acquired through direct experience.
Prerequisites and limitations
Prerequisites
An active AWS account
Docker, installed
AWS Cloud Development Kit (AWS CDK) version 2.114.1 or later, installed and bootstrapped to the us-east-1 or us-west-2 AWS Region
AWS CDK Toolkit version 2.114.1 or later, installed
AWS Command Line Interface (AWS CLI), installed and configured
Python version 3.12 or later, installed
Permissions to create Amazon Transcribe, Amazon Bedrock, Amazon Simple Storage Service (Amazon S3), and AWS Lambda resources
Limitations
This solution is deployed to a single AWS account.
This solution can be deployed only in AWS Regions where Amazon Bedrock and Amazon Transcribe are available. For information about availability, see the documentation for Amazon Bedrock and Amazon Transcribe.
The audio files must be in a format that Amazon Transcribe supports. For a list of supported formats, see Media formats in the Transcribe documentation.
Product versions
AWS SDK for Python (Boto3) version 1.34.57 or later
LangChain version 0.1.12 or later
Architecture
The architecture represents a serverless workflow on AWS. AWS Step Functions orchestrates Lambda functions for audio processing, text analysis, and document generation. The following diagram shows the Step Functions workflow, also known as a state machine.

Each step in the state machine is handled by a distinct Lambda function. The following are the steps in the document generation process:
The preprocess Lambda function validates the input passed to Step Functions and lists all of the audio files present in the provided Amazon S3 URI folder path. Downstream Lambda functions in the workflow use the file list to validate, summarize, and generate the document.
The transcribe Lambda function uses Amazon Transcribe to convert audio files into text transcripts. This Lambda function initiates the transcription process and accurately transforms speech into text, which is then stored for subsequent processing.
The validate Lambda function analyzes the text transcripts and determines the relevance of the responses to the initial questions. By using a large language model (LLM) through Amazon Bedrock, it identifies and separates on-topic answers from off-topic responses.
The summarize Lambda function uses Amazon Bedrock to generate a coherent and concise summary of the on-topic answers.
The generate Lambda function assembles the summaries into a well-structured document. It can format the document according to predefined templates and include any additional necessary content or data.
If any of the Lambda functions fail, you receive an email notification through Amazon Simple Notification Service (Amazon SNS).
Throughout this process, AWS Step Functions makes sure that each Lambda function is initiated in the correct sequence. This state machine has the capacity for parallel processing to enhance efficiency. An Amazon S3 bucket acts as the central storage repository, supporting the workflow by managing the various media and document formats involved.
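To make the transcription step concrete, the following AWS CLI sketch performs the same kind of call that the transcribe Lambda function makes through the AWS SDK. The job name, bucket name, and object keys are illustrative placeholders, not names from this solution.

```shell
# Start an asynchronous transcription job for one audio file.
# The bucket, keys, and job name below are illustrative placeholders.
aws transcribe start-transcription-job \
  --transcription-job-name knowledge-capture-demo \
  --language-code en-US \
  --media MediaFileUri=s3://amzn-s3-demo-bucket/audio/interview-01.wav \
  --output-bucket-name amzn-s3-demo-bucket \
  --output-key transcripts/interview-01.json

# Poll the job status until it is COMPLETED or FAILED.
aws transcribe get-transcription-job \
  --transcription-job-name knowledge-capture-demo \
  --query "TranscriptionJob.TranscriptionJobStatus"
```

In the deployed solution, the transcribe Lambda function issues the equivalent calls programmatically for every file in the list that the preprocess function produces.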
Tools
AWS services
Amazon Bedrock is a fully managed service that makes high-performing foundation models (FMs) from leading AI startups and Amazon available for your use through a unified API.
AWS Lambda is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use.
Amazon Simple Notification Service (Amazon SNS) helps you coordinate and manage the exchange of messages between publishers and clients, including web servers and email addresses.
Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.
AWS Step Functions is a serverless orchestration service that helps you combine AWS Lambda functions and other AWS services to build business-critical applications.
Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text.
Other tools
LangChain is a framework for developing applications that are powered by large language models (LLMs).
Code repository
The code for this pattern is available in the GitHub genai-knowledge-capture repository.
The code repository contains the following files and folders:
assets folder – The static assets for the solution, such as the architecture diagram and the public dataset
code/lambdas folder – The Python code for all Lambda functions
code/lambdas/generate folder – The Python code that generates a document from the summarized data in the S3 bucket
code/lambdas/preprocess folder – The Python code that processes the inputs for the Step Functions state machine
code/lambdas/summarize folder – The Python code that summarizes the transcribed data by using Amazon Bedrock
code/lambdas/transcribe folder – The Python code that converts speech data (audio files) into text by using Amazon Transcribe
code/lambdas/validate folder – The Python code that validates whether all answers pertain to the same topic
code/code_stack.py – The AWS CDK construct Python file that is used to create AWS resources
app.py – The AWS CDK app Python file that is used to deploy AWS resources in the target AWS account
requirements.txt – The list of all Python dependencies that must be installed for the AWS CDK
cdk.json – The input file that provides the values required to create resources
Best practices
The code example provided is for proof-of-concept (PoC) or pilot purposes only. If you want to take the solution to production, use the following best practices:
Enable Amazon S3 access logging
Enable VPC Flow Logs
Epics
Task | Description | Skills required |
---|---|---|
Export variables for the account and AWS Region. | To provide AWS credentials for the AWS CDK by using environment variables, run the following commands. | AWS DevOps, DevOps engineer |
Set up the AWS CLI named profile. | To set up the AWS CLI named profile for the account, follow the instructions in Configuration and credential file settings. | AWS DevOps, DevOps engineer |
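The export commands for the first task are not shown in the table above; the following is a minimal sketch that uses the standard AWS CDK environment variables, with placeholder account and Region values.

```shell
# Tell the AWS CDK which account and Region to target at synthesis time.
# Replace the placeholder values with your own AWS account ID and Region.
export CDK_DEFAULT_ACCOUNT="111122223333"
export CDK_DEFAULT_REGION="us-east-1"
```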
Task | Description | Skills required |
---|---|---|
Clone the repo to your local workstation. | To clone the genai-knowledge-capture repository, run the git clone command in your terminal. | AWS DevOps, DevOps engineer |
(Optional) Replace the audio files. | To customize the sample application to incorporate your own data, replace the sample audio files with your own recordings. | AWS DevOps, DevOps engineer |
Set up the Python virtual environment. | To set up the Python virtual environment, run the following commands. | AWS DevOps, DevOps engineer |
Synthesize the AWS CDK code. | To convert the code to an AWS CloudFormation stack configuration, run the following command. | AWS DevOps, DevOps engineer |
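The virtual environment commands referenced above follow the usual Python venv pattern; a sketch, run from the root of the cloned repository:

```shell
# Create a Python virtual environment in the .venv directory and activate it.
python3 -m venv .venv
. .venv/bin/activate
```

With the environment active, install the dependencies with pip install -r requirements.txt, and then run cdk synth to produce the CloudFormation template.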
Task | Description | Skills required |
---|---|---|
Provision foundation model access. | Enable access to the Anthropic Claude 3 Sonnet model for your AWS account. For instructions, see Add model access in the Amazon Bedrock documentation. | AWS DevOps |
Deploy resources in the account. | To deploy resources in the AWS account by using the AWS CDK, run the cdk deploy command. | AWS DevOps, DevOps engineer |
Subscribe to the Amazon SNS topic. | To receive failure notifications, subscribe an email endpoint to the Amazon SNS topic that the stack creates, and then confirm the subscription. | General AWS |
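A sketch of the deployment and subscription commands, assuming the AWS CDK environment is already bootstrapped; the SNS topic ARN and email address are placeholders that you replace with the values from the deployed stack:

```shell
# Deploy the stack defined in app.py to the target account and Region.
cdk deploy

# Subscribe an email address to the failure-notification topic created by
# the stack. Replace the topic ARN and endpoint with your own values, then
# confirm the subscription from the email message that Amazon SNS sends.
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:111122223333:genai-knowledge-capture-topic \
  --protocol email \
  --notification-endpoint operator@example.com
```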
Task | Description | Skills required |
---|---|---|
Run the state machine. | Start the Step Functions state machine from the AWS Management Console or the AWS CLI, and provide the Amazon S3 URI folder path of the audio files as input. | App developer, General AWS |
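One way to start the workflow is through the AWS CLI. The state machine ARN and the shape of the input document below are assumptions for illustration; the preprocess function expects an Amazon S3 URI folder path, but the exact input key name may differ in the repository.

```shell
# Start a state machine execution with the S3 folder that holds the audio
# files. The ARN and the input key name are illustrative placeholders.
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:111122223333:stateMachine:genai-knowledge-capture \
  --input '{"s3_uri": "s3://amzn-s3-demo-bucket/audio/"}'

# Check the status of the most recent execution.
aws stepfunctions list-executions \
  --state-machine-arn arn:aws:states:us-east-1:111122223333:stateMachine:genai-knowledge-capture \
  --max-results 1
```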
Task | Description | Skills required |
---|---|---|
Remove the AWS resources. | After you test the solution, clean up the resources. | AWS DevOps, DevOps engineer |
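Cleanup typically reverses the deployment; a sketch, where the bucket name is a placeholder for the bucket that the stack created:

```shell
# Empty the S3 bucket first; CloudFormation cannot delete a bucket that
# still contains objects unless the stack is configured to do so.
aws s3 rm s3://amzn-s3-demo-bucket --recursive

# Delete all resources that the AWS CDK stack created.
cdk destroy
```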
Related resources
AWS documentation
Amazon Bedrock resources:
AWS CDK resources:
AWS Step Functions resources:
Other resources