Build RAG-based generative AI applications in AWS using Amazon FSx for NetApp ONTAP with Amazon Bedrock

The post is co-written with Michael Shaul and Sasha Korman from NetApp.

Generative artificial intelligence (AI) applications are commonly built using a technique called Retrieval Augmented Generation (RAG) that provides foundation models (FMs) access to additional data they didn’t have during training. This data is used to enrich the generative AI prompt to deliver more context-specific and accurate responses without continuously retraining the FM, while also improving transparency and minimizing hallucinations.

In this post, we demonstrate a solution using Amazon FSx for NetApp ONTAP with Amazon Bedrock to provide a RAG experience for your generative AI applications on AWS by bringing company-specific, unstructured user file data to Amazon Bedrock in a straightforward, fast, and secure way.

Our solution uses an FSx for ONTAP file system as the source of unstructured data and continuously populates an Amazon OpenSearch Serverless vector database with the user’s existing files and folders and associated metadata. This enables a RAG scenario with Amazon Bedrock by enriching the generative AI prompt using Amazon Bedrock APIs with your company-specific data retrieved from the OpenSearch Serverless vector database.

When developing generative AI applications such as a Q&A chatbot using RAG, customers are also concerned about keeping their data secure and preventing end-users from querying information from unauthorized data sources. Our solution also uses FSx for ONTAP to allow users to extend their current data security and access mechanisms to augment model responses from Amazon Bedrock. We use FSx for ONTAP as the source of associated metadata, specifically the user’s security access control list (ACL) configurations attached to their files and folders and populate that metadata into OpenSearch Serverless. By combining access control operations with file events that notify the RAG application of new and changed data on the file system, our solution demonstrates how FSx for ONTAP enables Amazon Bedrock to only use embeddings from authorized files for the specific users that connect to our generative AI application.

AWS serverless services make it straightforward to focus on building generative AI applications by providing automatic scaling, built-in high availability, and a pay-for-use billing model. Event-driven compute with AWS Lambda is a good fit for compute-intensive, on-demand tasks such as document embedding and flexible large language model (LLM) orchestration, and Amazon API Gateway provides an API interface that allows for pluggable frontends and event-driven invocation of the LLMs. Our solution also demonstrates how to build a scalable, automated, API-driven serverless application layer on top of Amazon Bedrock and FSx for ONTAP using API Gateway and Lambda.

Solution overview

The solution provisions an FSx for ONTAP Multi-AZ file system with a storage virtual machine (SVM) joined to an AWS Managed Microsoft AD domain. An OpenSearch Serverless vector search collection provides a scalable and high-performance similarity search capability. We use an Amazon Elastic Compute Cloud (Amazon EC2) Windows server as an SMB/CIFS client to the FSx for ONTAP volume and configure data sharing and ACLs for the SMB shares in the volume. We use this data and ACLs to test permissions-based access to the embeddings in a RAG scenario with Amazon Bedrock.

The embeddings container component of our solution is deployed on an EC2 Linux server and mounted as an NFS client on the FSx for ONTAP volume. It periodically migrates existing files and folders along with their security ACL configurations to OpenSearch Serverless. It populates an index in the OpenSearch Serverless vector search collection with company-specific data (and associated metadata and ACLs) from the NFS share on the FSx for ONTAP file system.

The solution implements a RAG Retrieval Lambda function that allows RAG with Amazon Bedrock by enriching the generative AI prompt using Amazon Bedrock APIs with your company-specific data and associated metadata (including ACLs) retrieved from the OpenSearch Serverless index that was populated by the embeddings container component. The RAG Retrieval Lambda function stores conversation history for the user interaction in an Amazon DynamoDB table.

End-users interact with the solution by submitting a natural language prompt either through a chatbot application or directly through the API Gateway interface. The chatbot application container is built using Streamlit and fronted by an AWS Application Load Balancer (ALB). When a user submits a natural language prompt to the chatbot UI using the ALB, the chatbot container interacts with the API Gateway interface that then invokes the RAG Retrieval Lambda function to fetch the response for the user. The user can also directly submit prompt requests to API Gateway and obtain a response. We demonstrate permissions-based access to the RAG documents by explicitly retrieving the SID of a user and then using that SID in the chatbot or API Gateway request, where the RAG Retrieval Lambda function then matches the SID to the Windows ACLs configured for the document. As an additional authentication step in a production environment, you may want to also authenticate the user against an identity provider and then match the user against the permissions configured for the documents.

The following diagram illustrates the end-to-end flow for our solution. We start by configuring data sharing and ACLs with FSx for ONTAP, and then these are periodically scanned by the embeddings container. The embeddings container splits the documents into chunks and uses the Amazon Titan Embeddings model to create vector embeddings from these chunks. It then stores these vector embeddings with associated metadata in our vector database by populating an index in a vector collection in OpenSearch Serverless. The following diagram illustrates the end-to-end flow.

The following architecture diagram illustrates the various components of our solution. overall architecture diagram describing all the components of the solution

Prerequisites

Complete the following prerequisite steps:

Make sure you have model access in Amazon Bedrock. In this solution, we use Anthropic Claude v3 Sonnet on Amazon Bedrock.
Install the AWS Command Line Interface (AWS CLI).
Install Docker.
Install Terraform.

Deploy the solution

The solution is available for download on this GitHub repo. Cloning the repository and using the Terraform template will provision all the components with their required configurations.

Clone the repository for this solution:

sudo yum install -y unzip
git clone https://github.com/aws-samples/genai-bedrock-fsxontap.git
cd genai-bedrock-fsxontap/terraform

From the terraform folder, deploy the entire solution using Terraform:
```
terraform init
terraform apply -auto-approve
```

This process can take 15–20 minutes to complete. When finished, the output of the terraform commands should look like the following:

api-invoke-url = "https://9ng1jjn8qi.execute-api.<region>.amazonaws.com/prod"
fsx-management-ip = toset([
"198.19.255.230",])
fsx-secret-id = "arn:aws:secretsmanager:<region>:<account-id>:secret:AmazonBedrock-FSx-NetAPP-ONTAP-a2fZEdIt-0fBcS9"
fsx-svm-smb-dns-name = "BRSVM.BEDROCK-01.COM"
lb-dns-name = "chat-load-balancer-2040177936.<region>.elb.amazonaws.com"

Load data and set permissions

To test the solution, we will use the EC2 Windows server (ad_host) mounted as an SMB/CIFS client to the FSx for ONTAP volume to share sample data and set user permissions that will then be used to populate the OpenSearch Serverless index by the solution’s embedding container component. Perform the following steps to mount your FSx for ONTAP SVM data volume as a network drive, upload data to this shared network drive, and set permissions based on Windows ACLs:

Obtain the ad_host instance DNS from the output of your Terraform template.
Navigate to AWS Systems Manager Fleet Manager on your AWS console, locate the ad_host instance and follow instructions here to login with Remote Desktop. Use the domain admin user bedrock-01\Admin and obtain the password from AWS Secrets Manager. You can find the password using the Secrets Manager fsx-secret-id secret id from the output of your Terraform template.
To mount an FSx for ONTAP data volume as a network drive, under This PC, choose (right-click) Network and then choose Map Network drive.
Choose the drive letter and use the FSx for ONTAP share path for the mount
(\\<svm>.<domain >\c$\<volume-name>):
Upload the Amazon Bedrock User Guide to the shared network drive and set permissions to the admin user only (make sure that you disable inheritance under Advanced):
Upload the Amazon FSx for ONTAP User Guide to the shared drive and make sure permissions are set to Everyone:
On the ad_host server, open the command prompt and enter the following command to obtain the SID for the admin user:
```
wmic useraccount where name="Admin" get sid
```

Test permissions using the chatbot

To test permissions using the chatbot, obtain the lb-dns-name URL from the output of your Terraform template and access it through your web browser:

test with chatbot and enter prompt

For the prompt query, ask any general question on the FSx for ONTAP user guide that is available for access to everyone. In our scenario, we asked “How can I create an FSx for ONTAP file system,” and the model replied back with detailed steps and source attribution in the chat window to create an FSx for ONTAP file system using the AWS Management Console, AWS CLI, or FSx API:

test with chatbot and enter prompt related to the bedrock guide

Now, let’s ask a question about the Amazon Bedrock user guide that is available for admin access only. In our scenario, we asked “How do I use foundation models with Amazon Bedrock,” and the model replied with the response that it doesn’t have enough information to provide a detailed answer to the question.:

Use the admin SID on the user (SID) filter search in the chat UI and ask the same question in the prompt. This time, the model should reply with steps detailing how to use FMs with Amazon Bedrock and provide the source attribution used by the model for the response:

Test permissions using API Gateway

You can also query the model directly using API Gateway. Obtain the api-invoke-url parameter from the output of your Terraform template.

curl -v '<api-invoke-url>/bedrock_rag_retreival' -X POST -H 'content-type: application/json' -d '{"session_id": "1","prompt": "What is an FSxN ONTAP filesystem?", "bedrock_model_id": "anthropic.claude-3-sonnet-20240229-v1:0", "model_kwargs": {"temperature": 1.0, "top_p": 1.0, "top_k": 500}, "metadata": "NA", "memory_window": 10}'

Then invoke the API gateway with Everyone access for a query related to the FSx for ONTAP user guide by setting the value of the metadata parameter to NA to indicate Everyone access:

curl -v '<api-invoke-url>/bedrock_rag_retreival' -X POST -H 'content-type: application/json' -d '{"session_id": "1","prompt": "what is bedrock?", "bedrock_model_id": "anthropic.claude-3-sonnet-20240229-v1:0", "model_kwargs": {"temperature": 1.0, "top_p": 1.0, "top_k": 500}, "metadata": "S-1-5-21-4037439088-1296877785-2872080499-1112", "memory_window": 10}'

Cleanup

To avoid recurring charges, clean up your account after trying the solution. From the terraform folder, delete the Terraform template for the solution:

terraform apply --destroy

Conclusion

In this post, we demonstrated a solution that uses FSx for ONTAP with Amazon Bedrock and uses FSx for ONTAP support for file ownership and ACLs to provide permissions-based access in a RAG scenario for generative AI applications. Our solution enables you to build generative AI applications with Amazon Bedrock where you can enrich the generative AI prompt in Amazon Bedrock with your company-specific, unstructured user file data from an FSx for ONTAP file system. This solution enables you to deliver more relevant, context-specific, and accurate responses while also making sure only authorized users have access to that data. Finally, the solution demonstrates the use of AWS serverless services with FSx for ONTAP and Amazon Bedrock that enable automatic scaling, event-driven compute, and API interfaces for your generative AI applications on AWS.

For more information about how to get started building with Amazon Bedrock and FSx for ONTAP, refer to the following resources:

About the authors

Kanishk Mahajan is Principal, Solutions Architecture at AWS. He leads cloud transformation and solution architecture for ISV customers and partner at AWS. Kanishk specializes in containers, cloud operations, migrations and modernizations, AI/ML, resilience and security and compliance. He is a Technical Field Community (TFC) member in each of those domains at AWS.

Michael Shaul is a Principal Architect at NetApp’s office of the CTO. He has over 20 years of experience building data management systems, applications, and infrastructure solutions. He has a unique in-depth perspective on cloud technologies, builder, and AI solutions.

Sasha Korman is a tech visionary leader of dynamic development and QA teams across Israel and India. With 14-years at NetApp that began as a programmer, his hands-on experience and leadership have been pivotal in steering complex projects to success, with a focus on innovation, scalability, and reliability.