Building high-scale RAG applications with Microsoft Fabric Eventhouse
Introduction
This article walks you through building a Generative AI application in Microsoft Fabric: a RAG (Retrieval Augmented Generation) system that uses Azure OpenAI for embeddings and chat completions and Microsoft Fabric Eventhouse as its vector store.
Why MS Fabric Eventhouse?
Fabric Eventhouse is built on the Kusto engine, which delivers excellent performance for similarity search at high scale.
If you are looking to build a RAG application over a large number of embedding vectors, look no further: Microsoft Fabric gives you both the processing power to build the vector database and the high-performance engine behind Fabric Eventhouse to query it.
If you want to know more about using Fabric Eventhouse as a vector store, here are some useful links:
Azure Data Explorer for Vector Similarity Search
Optimizing Vector Similarity Search on Azure Data Explorer – Performance Update
Optimizing Vector Similarity Searches at Scale
What is RAG – Retrieval Augmented Generation?
Large Language Models (LLMs) excel in creating text that resembles human writing.
Initially, LLMs are equipped with a broad spectrum of knowledge from the extensive datasets used for their training. This grants them flexibility, but not necessarily the specialized focus or up-to-date knowledge required for certain topics.
Retrieval Augmented Generation (RAG) is a technique that improves the pertinence and precision of LLMs by incorporating real-time, relevant information into their responses. With RAG, an LLM is boosted by a search system that sifts through unstructured text to find information, which then refines the LLM’s replies.
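To make the pattern concrete, here is a minimal pseudocode sketch of the RAG flow; embed, vector_store, and llm are illustrative placeholders rather than real APIs, and the actual Fabric Eventhouse implementation follows later in this article.
# minimal RAG sketch; embed(), vector_store, and llm are placeholders, not real APIs
def rag_answer(question: str) -> str:
    question_vector = embed(question)               # 1. embed the question
    context = vector_store.search(question_vector)  # 2. retrieve the closest text chunks
    prompt = f"Question: {question}\nInformation: {context}"
    return llm.complete(prompt)                     # 3. generate a grounded answer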
What is a Vector Database?
The Vector Database is a vital component in the retrieval process in RAG, facilitating the quick and effective identification of relevant text sections in response to a query, based on how closely they match the search terms.
Vector DBs are data stores optimized for storing and processing vector data. Vector data can refer to data types such as geometric shapes, spatial data, or more abstract high-dimensional data used in machine learning applications, such as embeddings.
These databases are designed to efficiently handle operations such as similarity search, nearest-neighbor search, and other operations that are common when dealing with high-dimensional vector spaces.
For example, in machine learning, it’s common to convert text, images, or other complex data into high-dimensional vectors using models like word embeddings, image embeddings, etc. To efficiently search and compare these vectors, a vector database or vector store with specialized indexing and search algorithms would be used.
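As a toy illustration (not from the article's notebook), here is the similarity metric used throughout this guide: cosine similarity, which Eventhouse exposes as the series_cosine_similarity() function, measures how closely two vectors point in the same direction.
import numpy as np

# cosine similarity between two small example vectors
def cosine_similarity(a, b):
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ~0.707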
In our case, we will use the Azure OpenAI Ada embeddings model to create embeddings, which are vector representations of the text we are indexing and storing in the Microsoft Fabric Eventhouse DB.
The code
The code can be found here.
We will use the Moby Dick book from the Gutenberg project in PDF format as our knowledge base.
We will read the PDF file, split the text into chunks of 1,000 characters, and calculate embeddings for each chunk; we will then store the text and the embeddings in our vector database (Fabric Eventhouse).
Finally, we will ask questions, retrieve the most relevant chunks from our vector DB, and send the question together with the retrieved text to Azure OpenAI GPT-4 to get a natural-language response.
Processing the files and indexing the embeddings
We will do this once, only to create the embeddings and save them into our vector database (Fabric Eventhouse):
Read files from Fabric Lakehouse
Create embeddings from the text using Azure OpenAI ada Embeddings model
Save the text and embeddings in our Fabric Eventhouse DB
RAG – Getting answers
Every time we want to search for answers from our knowledge base, we will:
Create the embedding for the question and search Fabric Eventhouse for the most relevant chunks, using similarity search
Combine the question with the retrieved chunks and call the Azure OpenAI GPT-4 model to get a natural-language answer
Prerequisites
To follow this guide, you will need to ensure that you have access to the following services and have the necessary credentials and keys set up.
Microsoft Fabric.
Azure OpenAI Studio to manage and deploy OpenAI models.
Setup
Create a Fabric Workspace
Create a Lakehouse
Upload the Moby Dick PDF file
Create an Eventhouse DB called “GenAI_eventhouse”
Click on the DB name and then “Explore your data” on the top-right side
Create the “bookEmbeddings” table
Paste the following command and run it
.create table bookEmbeddings (document_name:string, content:string, embedding:dynamic)
Import our notebook
Grab your Azure OpenAI endpoint and secret key and paste them into the notebook; replace your model deployment names if needed.
Get the Eventhouse URI and paste it as “KUSTO_URI” in the notebook
Connect the notebook to the Lakehouse
Let’s run our notebook
This will install all the Python libraries we need:
%pip install openai==1.12.0 azure-kusto-data langchain tenacity langchain-openai pypdf
Run cell 2 after configuring the environment variables for:
OPENAI_GPT4_DEPLOYMENT_NAME = "gpt-4"
OPENAI_DEPLOYMENT_ENDPOINT = "<your-azure openai endpoint>"
OPENAI_API_KEY = "<your-azure openai api key>"
OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME = "text-embedding-ada-002"
KUSTO_URI = "<your-eventhouse cluster-uri>"
Run cell 3
Here we create an Azure OpenAI client and define a function to calculate embeddings
client = AzureOpenAI(
    azure_endpoint=OPENAI_DEPLOYMENT_ENDPOINT,
    api_key=OPENAI_API_KEY,
    api_version="2023-09-01-preview"
)

# we use the tenacity library to add delays and retries when calling the OpenAI
# embeddings API, to avoid hitting throttling limits
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def generate_embeddings(text):
    # replace newlines, which can negatively affect performance
    txt = text.replace("\n", " ")
    return client.embeddings.create(input=[txt], model=OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME).data[0].embedding
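As a quick sanity check (not part of the original notebook), you can call the function on a short string; text-embedding-ada-002 returns a 1536-dimensional vector:
vec = generate_embeddings("Call me Ishmael.")  # example string, chosen for illustration
print(len(vec))  # 1536 dimensions for text-embedding-ada-002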
Run cell 4
Read the file and divide it into 1,000-character chunks
# split into 1000-character chunks with a 30-character overlap, using the
# splitter's default separators: ["\n\n", "\n", " ", ""]
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=30,
)
documentName = "moby dick book"
# Copy File API path
fileName = "/lakehouse/default/Files/moby dick.pdf"
loader = PyPDFLoader(fileName)
pages = loader.load_and_split(text_splitter=splitter)
print("Number of pages: ", len(pages))
Run cell 5
Save the text chunks to a pandas dataframe
# save all the pages into a pandas dataframe
import pandas as pd

df = pd.DataFrame(columns=['document_name', 'content', 'embedding'])
for page in pages:
    df.loc[len(df.index)] = [documentName, page.page_content, ""]
df.head()
Run cell 6
Calculate embeddings
# calculate the embeddings using the OpenAI ada model
df["embedding"] = df.content.apply(lambda x: generate_embeddings(x))
print(df.head(2))
Run cell 7
Write the data to MS Fabric Eventhouse
df_sp = spark.createDataFrame(df)

df_sp.write \
    .format("com.microsoft.kusto.spark.synapse.datasource") \
    .option("kustoCluster", KUSTO_URI) \
    .option("kustoDatabase", KUSTO_DATABASE) \
    .option("kustoTable", KUSTO_TABLE) \
    .option("accessToken", accessToken) \
    .mode("Append") \
    .save()
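Note that KUSTO_DATABASE, KUSTO_TABLE, and accessToken are defined earlier in the notebook. If you are adapting the code, here is a minimal sketch of those assignments, assuming the notebook runs inside Fabric where mssparkutils is available; the database and table names come from the Setup section above:
KUSTO_DATABASE = "GenAI_eventhouse"
KUSTO_TABLE = "bookEmbeddings"
# assumption: running inside a Fabric notebook, where mssparkutils can mint a token for the cluster
accessToken = mssparkutils.credentials.getToken(KUSTO_URI)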
Let’s check that the data was saved to our vector database.
Go to the Eventhouse and run this query
bookEmbeddings
| take 10
Go back to the notebook and run the rest of the cells
This creates a function that calls GPT-4 for a natural-language answer
def call_openAI(text):
    response = client.chat.completions.create(
        model=OPENAI_GPT4_DEPLOYMENT_NAME,
        messages=text,
        temperature=0
    )
    return response.choices[0].message.content
This creates a function that retrieves answers from Eventhouse using embeddings and similarity search
def get_answer_from_eventhouse(question, nr_of_answers=1):
    searchedEmbedding = generate_embeddings(question)
    # rank the stored chunks by cosine similarity to the question embedding
    kusto_query = KUSTO_TABLE + " | extend similarity = series_cosine_similarity(dynamic(" \
        + str(searchedEmbedding) + "), embedding) | top " + str(nr_of_answers) + " by similarity desc"
    kustoDf = spark.read \
        .format("com.microsoft.kusto.spark.synapse.datasource") \
        .option("kustoCluster", KUSTO_URI) \
        .option("kustoDatabase", KUSTO_DATABASE) \
        .option("accessToken", accessToken) \
        .option("kustoQuery", kusto_query) \
        .load()
    return kustoDf
Retrieves 2 answers from Eventhouse
nr_of_answers = 2
question = "Why does the coffin prepared for Queequeg become Ishmael's life buoy once the Pequod sinks?"
answers_df = get_answer_from_eventhouse(question, nr_of_answers)
Concatenates the answers
answer = ""
for row in answers_df.rdd.toLocalIterator():
    answer = answer + " " + row['content']
Creates a prompt for GPT-4 with the question and the two retrieved answers
prompt = 'Question: {}'.format(question) + '\n' + 'Information: {}'.format(answer)
# prepare the chat messages: a system instruction plus the user prompt
messages = [{"role": "system", "content": "You are a HELPFUL assistant answering users' questions. Answer the question using the provided information and do not add anything else."},
            {"role": "user", "content": prompt}]
result = call_openAI(messages)
display(result)
That’s it! You have built your very first RAG app using Microsoft Fabric.
All the code can be found here.
Thanks
Denise