Set up a Teams Toolkit RAG bot with Azure AI Search indexes
Hello, I'm new to the forum and I'm trying to learn how to create a Teams RAG bot that uses Azure AI Search indexers as well as the default OpenAI LLM.
I have an Azure OpenAI resource created and working.
I have created an Azure Storage account where I have uploaded a bunch of PDFs, which I have connected as a data source in Azure AI Search.
I have also created an indexer that indexes the files in the Azure Storage account.
This all works without any problems.
But when I debug my bot locally with Teams Toolkit, the bot doesn't have any information from the PDFs I've uploaded and indexed. I have already edited these files:
src/indexers/setup.py line 62: index = 'myindexname'
as well as
src/bot.py line 40: indexName = 'myindexname'
The only working solution I've found is to put my PDFs in src/indexers/data/.
I have edited my src/indexers/get_data.py to look like this:
import os

import PyPDF2


async def get_doc_data(embeddings):
    """Read every PDF in src/indexers/data/ and build one index document per file."""
    docs = []
    data_dir = os.path.join(os.getcwd(), 'src/indexers/data/')
    pdf_files = [f for f in os.listdir(data_dir) if f.endswith('.pdf')]
    for idx, file_name in enumerate(pdf_files):
        file_path = os.path.join(data_dir, file_name)
        with open(file_path, 'rb') as file:  # PDFs must be opened in binary mode
            reader = PyPDF2.PdfReader(file)
            raw_description = ""
            for page in reader.pages:
                # extract_text() can return None for pages without a text layer
                raw_description += page.extract_text() or ""
        doc = {
            "docId": str(idx + 1),
            "docTitle": file_name,
            "description": raw_description,
            "descriptionVector": await get_embedding_vector(raw_description, embeddings=embeddings),
        }
        docs.append(doc)
    return docs


async def get_embedding_vector(text: str, embeddings):
    """Return the embedding vector for text, raising if the embeddings call fails."""
    result = await embeddings.create_embeddings(text)
    if result.status != 'success' or not result.output:
        if result.status == 'error':
            raise Exception(f"Failed to generate embeddings for description: <{text[:200]+'…'}>\n\nError: {result.output}")
        raise Exception(f"Failed to generate embeddings for description: <{text[:200]+'…'}>")
    return result.output[0]
When I run the command "python src/indexers/setup.py", it uploads my PDFs to an index and the Teams Toolkit bot has my PDF data. I can now chat over my PDF data.
But I don't want to be forced to upload my index like this every time. I want to use an index that already exists and already has data sources and indexers connected, so that I don't have to update it manually or run that command.
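To make the goal concrete, here is a minimal sketch of querying an existing index directly over the Azure AI Search REST API, without re-uploading anything. It only builds the request and maps the response. The service name, the API key, and the field names content and metadata_storage_name are assumptions (those are the fields a blob indexer typically produces by default), not values confirmed from my setup, and the helper names are hypothetical.

```python
import json
import urllib.request

API_VERSION = "2023-11-01"  # a stable Azure AI Search REST API version


def build_search_request(service: str, index: str, api_key: str,
                         query: str, top: int = 3) -> urllib.request.Request:
    """Build (but do not send) a search request against an existing index."""
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/docs/search?api-version={API_VERSION}"
    )
    body = {
        "search": query,
        "top": top,
        # Assumed field names; adjust to the schema of the existing index.
        "select": "metadata_storage_name,content",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def to_bot_docs(response_json: dict) -> list[dict]:
    """Map search hits to the docTitle/description shape used in get_data.py."""
    return [
        {
            "docTitle": hit.get("metadata_storage_name", ""),
            "description": hit.get("content", ""),
        }
        for hit in response_json.get("value", [])
    ]
```

Sending the request would then be something like urllib.request.urlopen(req) followed by to_bot_docs(json.load(resp)). If the field names in the existing index differ from the docId/docTitle/description/descriptionVector fields that setup.py creates, that mismatch might be why the bot finds nothing, but that is only a guess on my part.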
When finished, this bot will be uploaded to Teams and will only be available to our users through Teams.
Does anybody know how I can actually achieve this, or have a guide for it?