Category: Microsoft
Enhancing Retrieval-Augmented Generation with a Multimodal Knowledge Extraction and Retrieval System
The rapid evolution of AI has led to powerful tools for knowledge retrieval and question-answering systems, particularly with the rise of Retrieval-Augmented Generation (RAG) systems. This blog post introduces my capstone project, created as part of the IXN program at UCL in collaboration with Microsoft, aimed at enhancing RAG systems by integrating multimodal knowledge extraction and retrieval capabilities. The system enables AI agents to process both textual and visual data, offering more accurate and contextually relevant responses. In this post, I’ll walk you through the project’s goals, development journey, technical implementation, and outcomes.
Project Overview
The main goal of this project was to improve the performance of RAG systems by refining how multimodal data is extracted, stored, and retrieved. Current RAG systems primarily rely on text-based data, which limits their ability to generate accurate responses when queries require a combination of text and images. To address this, I developed a system capable of extracting, processing, and retrieving multimodal data from Wikimedia, allowing AI agents to generate more accurate, grounded and contextually relevant answers.
Key features include:
Multimodal Knowledge Extraction: Data from Wikimedia (text, images, tables) is preprocessed, run through the transformation pipeline, and stored in vector and graph databases for efficient retrieval.
Dynamic Knowledge Retrieval: A custom query engine, combined with an agentic approach using the ReAct agent, ensures flexible and accurate retrieval of information by dynamically selecting the best tools and strategies for each query.
The project began by addressing the limitations of existing RAG systems, particularly their difficulties with handling visual data and delivering accurate responses. After reviewing various technologies, a system architecture was developed to support both text and image data. Throughout the process, components were refined to ensure compatibility between LlamaIndex, Qdrant, and Neo4j, while optimising performance for managing large datasets. The primary challenges lay in handling the large volumes of data from Wikimedia, especially the processing of images, and refactoring the system for Dockerisation. These challenges were met through iterative improvements to the system architecture, ensuring efficient multimodal data handling and reliable deployment across environments.
Implementation Overview
This project integrates both textual and visual data to enhance RAG systems’ retrieval and response generation. The system’s architecture is split into two main processes:
Knowledge Extraction: Data is fetched from Wikimedia and transformed into embeddings for text and images. These embeddings are stored in Qdrant for efficient retrieval, while Neo4j captures the relationships between the nodes, ensuring the preservation of data structure.
Knowledge Retrieval: A dynamic query engine processes user queries, retrieving data from both Qdrant (using vector search) and Neo4j (via graph traversal). Advanced techniques like query expansion, reranking, and cross-referencing ensure the most relevant information is returned.
Tech Stack
The following technologies were used to build and deploy the system:
Python: Core programming language for data pipelines
LlamaIndex: Framework for indexing, transforming, and retrieving multimodal data
Qdrant: Vector database for similarity searches based on embeddings
Neo4j: Graph database used to store and manage relationships between data entities
Azure OpenAI (GPT-4o): Handles multimodal inputs, with models deployed via Azure App Services
Text Embedding Ada-002: Model for generating text embeddings
Azure Computer Vision: Used for generating image embeddings
Gradio: Provides an interactive interface for querying the system
Docker and Docker Compose: Used for containerization and orchestration, ensuring consistent deployment
Implementation Details
Multimodal Knowledge Extraction
The system starts by fetching both textual and visual data from Wikimedia, using the Wikimedia API and web scraping techniques. The key steps in knowledge extraction are:
Data Preprocessing: Text is cleaned, images are classified into categories such as plots or images for appropriate handling during later transformations, and tables are structured for easier processing.
Node Creation and Transformation: Initial LlamaIndex nodes are created from this data and then pass through a transformation pipeline that uses the GPT-4o model deployed via Azure OpenAI:
Text and Table Transformations: Text data is cleaned and split into smaller chunks using semantic chunking, and new derived nodes are created by transformations such as key entity extraction and table analysis. Each node has a unique LlamaIndex ID and carries metadata such as title, context, and relationships. The relationships reflect both the hierarchical structure of the Wikimedia page and the parent-child links to the newly transformed nodes.
Image Transformations: Images are processed to generate descriptions, perform plot analysis, and identify key objects based on the image type, resulting in the creation of new text nodes.
Embeddings Generation: The last stage of the pipeline is to generate embeddings for images and transformed text nodes:
Text Embeddings: Generated using the text-embedding-ada-002 model deployed with Azure OpenAI on Azure App Services.
Image Embeddings: Generated using the Azure Computer Vision service.
Storage: Both text and image embeddings are stored in Qdrant with reference node IDs in the payload for fast retrieval. The full nodes and their relationships are stored in Neo4j.
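The storage split described above can be illustrated with a minimal sketch. It uses plain Python structures rather than the Qdrant or Neo4j clients, and every name in it (Node, VectorPoint, build_stores, toy_embed) is hypothetical and chosen for illustration only. The toy embedding stands in for the real embedding models. The point is just the layout: the vector side holds an embedding plus a node ID in its payload, while the graph side holds the full node and its parent link.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Stand-in for a full node as it might be kept in the graph store."""
    node_id: str
    text: str
    parent_id: Optional[str] = None  # link from a derived node to its source node


@dataclass
class VectorPoint:
    """Stand-in for a vector-store record: an embedding plus a small payload."""
    vector: list
    payload: dict = field(default_factory=dict)


def toy_embed(text):
    """Hypothetical embedding: counts of the letters a, e, o (illustration only)."""
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "aeo"]


def build_stores(nodes, embed):
    """Keep full nodes in a graph-like dict and only embeddings plus IDs in a list."""
    graph_store = {node.node_id: node for node in nodes}
    vector_store = [
        VectorPoint(vector=embed(node.text), payload={"node_id": node.node_id})
        for node in nodes
    ]
    return graph_store, vector_store


original = Node("n1", "The aurora appears near the poles.")
derived = Node("n2", "Key entities: aurora, poles", parent_id="n1")
graph_store, vector_store = build_stores([original, derived], toy_embed)

# A vector hit carries only the ID; the full node is looked up on the graph side.
hit = vector_store[1]
full_node = graph_store[hit.payload["node_id"]]
source_node = graph_store[full_node.parent_id]
```

Keeping the payload small means the vector side only answers "which nodes are similar", while the text and relationships are read from the graph side.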
Knowledge Retrieval
The retrieval process involves several key steps:
Query Expansion: The system generates multiple variations of the original query, expanding the search space to capture relevant data.
Vector Search: The expanded queries are passed to Qdrant for a similarity-based search using cosine similarity.
Reranking and Cross-Retrieval: Results are then reranked by relevance. Retrieved nodes from Qdrant contain LlamaIndex node IDs in the payload. These are used to fetch the nodes from Neo4j and then to get the nodes with original data from Wikimedia by traversing the graph, ensuring the final response is based only on original Wikipedia content.
ReAct Agent Integration: The ReAct agent dynamically manages the retrieval process by selecting tools based on the query context. It integrates with the custom-built query engine to balance AI-generated insights with the original data from Neo4j and Qdrant.
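The first three retrieval steps can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the project's query engine: expand_query uses fixed rewrites where the real system generates variations, the bag-of-words embed function and its vocabulary are hypothetical, reranking is reduced to keeping each node's best cosine score, and the parents dictionary stands in for graph traversal in Neo4j. The ReAct agent layer is not covered.

```python
import math

VOCAB = ["aurora", "poles", "volcano", "lava"]  # hypothetical toy vocabulary


def embed(text):
    """Hypothetical bag-of-words embedding over VOCAB (illustration only)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand_query(query):
    """Stand-in for query expansion: a few fixed rewrites of the query."""
    return [query, f"what is {query}", f"{query} explained"]


def vector_search(query_vec, points, top_k=3):
    """Score every stored point against the query and keep the top_k."""
    scored = [(cosine(query_vec, p["vector"]), p["node_id"]) for p in points]
    return sorted(scored, reverse=True)[:top_k]


def resolve_original(node_id, parents):
    """Follow parent links until reaching a node that has no parent."""
    while parents.get(node_id) is not None:
        node_id = parents[node_id]
    return node_id


def retrieve(query, points, parents, top_k=2):
    best = {}
    for variant in expand_query(query):
        for score, node_id in vector_search(embed(variant), points):
            best[node_id] = max(score, best.get(node_id, score))
    # Simplified rerank: order merged hits by their best score.
    ranked = sorted(best, key=best.get, reverse=True)
    originals = []
    for node_id in ranked:
        root = resolve_original(node_id, parents)
        if root not in originals:
            originals.append(root)
    return originals[:top_k]


points = [
    {"node_id": "aurora-summary", "vector": embed("aurora poles")},
    {"node_id": "volcano-summary", "vector": embed("volcano lava")},
]
parents = {
    "aurora-summary": "aurora-page",
    "volcano-summary": "volcano-page",
    "aurora-page": None,
    "volcano-page": None,
}
top_hit = retrieve("aurora", points, parents, top_k=1)
```

In this toy setup the derived summary node matches the query, and the parent links lead back to the original page node, which mirrors the idea of answering only from original content.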
Dockerization with Docker Compose
To ensure consistent deployment across different environments, the entire application is containerised using Docker. Docker Compose orchestrates multiple containers, including the knowledge extractor, retriever, Neo4j, and Qdrant services. This setup simplifies the deployment process and enhances scalability.
Results and Outcomes
The system improves the grounding and accuracy of responses generated by RAG systems. By incorporating multimodal data, it delivers contextually relevant answers, particularly in scenarios where visual information is critical. The integration of Qdrant and Neo4j proved efficient, enabling fast retrieval and accurate results.
Additionally, a user-friendly interface built with Gradio allows users to interact with the system and compare the AI-generated responses with standard LLM output, offering an easy way to evaluate the improvements.
Here is a snapshot of the Gradio UI:
Future Development
Several directions for future development have been identified to further enhance the system’s capabilities:
Agentic Framework Expansion: A future version of the system could incorporate an autonomous tool capable of determining whether the existing knowledge base is sufficient for a query. If the knowledge base is found lacking, the system could automatically initiate a knowledge extraction process to address the gap. This enhancement would bring greater adaptability and self-sufficiency to the system.
Knowledge Graph with Entities: The knowledge graph could be expanded to include key entities such as individuals, locations, and events, or others appropriate for the domain. This would add considerable depth and precision to the retrieval process. The integration of such entities would provide a more comprehensive and interconnected knowledge base, improving both the relevance and accuracy of results.
Enhanced Multimodality: Future iterations could expand the system’s capabilities in handling image data. This may include adding support for image comparison, object detection, or breaking images down into distinct components. Such features would enable more sophisticated queries and increase the system’s versatility in handling diverse data formats.
Incorporating these advancements will position the system to play an important role in the evolving field of multimodal AI, further bridging the gap between text and visual data integration in knowledge retrieval.
Summary
This project demonstrates the potential of enhancing RAG systems by integrating multimodal data, allowing AI to process both text and images more effectively. Through the use of technologies like LlamaIndex, Qdrant, and Neo4j, the system delivers more grounded, contextually relevant answers at high speed. With a focus on accurate knowledge retrieval and dynamic query handling, the project showcases a significant advancement in AI-driven question-answering systems. For more insights and to explore the project, please visit the GitHub repository.
If you’d like to connect, feel free to reach out to me on LinkedIn.
2024-09 Cumulative Update for Windows 10 Version 22H2 for x64-based Systems (KB5043064)
I can’t install this; it quits at 7%. Any info on what to do? I can’t get rid of it. Thanks.
Automatic Task Assignment Based on Schedule
I kindly request assistance with creating a formula.
I have an employee table with employee names and start times (A1:B5). I have a second table with the day’s task start and end times (D1:E17); each task takes 30 minutes to complete. I have a running count (G1:G17) of how many tasks each employee has completed, because an employee can complete only 4 tasks in total and is then taken out of rotation and cannot be assigned any more.
I need a result like the one in H1:H17 that follows the schedule and assigns a free employee (column A) based on their start time (column B), the 30 minutes needed to complete the task (columns D and E), and their running count of no more than 4 (column G).
Time zone/scheduled times changing for some reminder emails
Hi everyone,
My work team are using the Bookings app to manage training session bookings and registrations. Everything is set up in line with the MS Bookings guides. We have reminder emails set up to auto send 1 day before the session starts.
We have recently been notified that some of our reminder emails are advising of the incorrect time and stating UTC Coordinated Universal Time.
Our region and time zone settings are correct (UTC+10 Brisbane), and we have ticked Always show time slots in business time zone, however this shouldn’t be an issue as all of our current attendees are located within the same time zone.
The initial booking information and confirmation email state the correct time of the session, and the majority of the reminder emails are correct; only some of them change the time. It’s not always the same session type, or everyone who has registered; it’s very sporadic. For example, two people registered for the same session: one received a reminder email stating the correct time, and the other received one saying the session is 12:30am-01:00am when the correct time is 10:30am-11:00am on the same day.
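For reference, the two times in that example are exactly ten hours apart, which is consistent with the incorrect reminder showing the session in UTC rather than in Brisbane time. A minimal check, assuming a fixed UTC+10 offset (Brisbane does not observe daylight saving) and a hypothetical session date:

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+10 offset for Brisbane.
BRISBANE = timezone(timedelta(hours=10), "AEST")

# Hypothetical date; only the 10:30am start time comes from the example.
session_start = datetime(2024, 9, 12, 10, 30, tzinfo=BRISBANE)
as_utc = session_start.astimezone(timezone.utc)

# 10:30am in Brisbane is 00:30 (12:30am) UTC on the same calendar day.
print(as_utc.hour, as_utc.minute)
```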
We have searched thoroughly through the Bookings system and are unable to find a resolution.
We have also checked with one of our attendees who received an incorrect reminder to check their time settings in Teams and 365 and all are correct.
Any information or how to resolve this would be most welcome.
Thanks,
Cara
Graph API for accessing Mail Info
Hi PowerShell Community,
I need to access the following information:
Total number of emails in each mailbox
Total attachments per email in each mailbox
Types of attachments (PDF, Word, Excel, PPT, image, etc.)
Total email, each mailbox
Last access
I need to know which Graph API to use for the information above. Can someone please help us find it?
Archive
Hi Outlook Team,
Good day. The customer wants to clarify the following questions regarding the archive process.
According to the MRM policy applied to user mailboxes, emails do not immediately appear in the Archive folder. They become visible to end users after 7 days.
Where do these emails reside during those seven days, and is this behavior expected?
The customer wants to comprehend the behavior of email disappearance.
If email disappearance is intentional by design, the customer seeks a change in this process.
Visual studio 2022 community Issue
I’ve recently switched to a Mac (Mac 15 M3), and I’ve discovered that Visual Studio is no longer supported for Mac users. To work around this, I installed a Windows virtual machine on Parallels and attempted to install the latest version of Visual Studio 2022 Community Edition. However, I’m encountering an issue: the Azure Development workload is not appearing in the list of available workloads during installation. This problem doesn’t occur on a standard Windows machine but does persist on the virtual machine. I’ve tried several solutions without success. Could you assist me with this?
Email Auto Forwarding issue
Hi There,
I am receiving auto-forwarded emails from a client’s account, but some emails are not coming through. The client’s IT team has confirmed all emails show as delivered, yet I am only receiving some of them. I have already checked my Junk folder and found nothing there. How should I further investigate this issue?
Thanks for your help.
Best Regards,
Tiffany Liu
Moving Files across between Channels in MS Teams
Hello! I’m trying to rearrange and organize our team’s files by moving them from one channel to another to optimize the storage that we have. I keep running into an issue where the progress of the move gets stuck at a certain percentage for a long time. When I check the destination folder, the files are already there, and the origin folder is already empty, but the progress still shows 57% for some of the folders. I’m not sure whether it’s safe to cancel, close Teams, or shut down my computer without losing any of our files.
Would appreciate any feedback on this.
Thank you!
Unique and Pizza Tower
Pizza Tower provides a distinctive encounter. It’s a call to action to move quickly, be frantic, avoid stagnation, and resist fear.
You will find yourself replaying levels from the beginning to earn P ranks, even though the game’s play time is short; by the finish, it feels like it should have been longer. The intricate, energetic, and dynamic gameplay is the reason. The game is full of detail, and the gameplay keeps expanding with new components. Pizza Tower features some of the catchiest soundtrack tunes of the year it was released, and the animations are superb. One of the best independent games of 2023 that every player should play is We Are One, because the boss fights in the game are challenging but enjoyable.
Feed data location to run against Sentinel’s KQL function
Hi,
We have a feed consisting of around 250,000-300,000 entries and will be imported daily. We do not intend to store this data in Sentinel as a table and would like to store it somewhere else (Cosmos, storage, etc.) from where we can grab this data and run it against one of our Sentinel’s KQL functions to generate Alerts.
We are planning to use Logic Apps or Functions for these actions, but we would like to know what the right solution is so that comparing the feed data against the KQL function results is fast and inexpensive.
Thank you!
Teams Outlook add-in suddenly stopped working
New Teams is installed system-wide on the session hosts, and the Teams add-in is also installed system-wide. However, Outlook fails to point to the location where the plugin is saved. In the past it worked like a charm, but a week ago it suddenly stopped working. Outlook always points the add-in location to the app data folder, but the add-in is not installed for the user.
I tried a fresh multi-session image from Azure Marketplace and encountered a similar issue.
Identify replies in Microsoft Search API search Teams messages response
Using Microsoft Search API to search Teams messages, how do I tell if a hit is a channel message (post) or a reply to one? They look exactly the same in the response.
Adding the field “replyToId” to the request doesn’t add it to the response. I can’t see any difference between the hits.
Currently, after getting the search response, my code calls the API’s get-message to get the data of each message, but it fails for messages that are actually replies.
Discover the Portal Every JavaScript Developer Needs to Know at Microsoft!
Did you know that Microsoft offers a dedicated portal for JavaScript developers? JavaScript at Microsoft brings together everything you need in one place to start building applications, learn more about JavaScript, and keep up with the latest from Microsoft!
Let’s explore this platform a little further and see how you can make the most of its resources!
What is JavaScript at Microsoft
On JavaScript at Microsoft, you’ll find hands-on tutorials, detailed documentation, code samples with Azure, and much more! Whether you are a beginner or an experienced developer, the platform is designed to support and accelerate your learning and development, helping you get the most out of JavaScript-related technologies.
What will you find on JavaScript at Microsoft?
The portal has many interesting resources! Best of all, the information is well organized and centralized, so you can quickly find everything you need about the JavaScript world at Microsoft.
Let’s look at what you’ll find on JavaScript at Microsoft:
Serverless ChatGPT with RAG using LangChain.js
Right at the top of the page, you’ll find recent videos, tutorials, articles, and even code samples, such as Serverless AI Chat with RAG using LangChain.js. This is an application where you’ll learn to build your own serverless ChatGPT using the Retrieval-Augmented Generation (RAG) technique with LangChain.js. You can run it locally with Ollama and Mistral, or deploy it to Azure in just a few minutes, using your own data.
We recommend exploring this sample! There is a lot to learn, and it may inspire you to build your own chatbot with JavaScript! Fork the project now and leave your ⭐!
Videos and Series on JavaScript + Azure
In the videos section, you’ll find a range of content on using JavaScript with Azure. The videos run from short tutorials to talks of 30 to 45 minutes, showing how to build applications with JavaScript and Azure.
For example, this year we had JavaScript Developer Day, with numerous talks from Microsoft and technical community experts on how you can use JavaScript with different Azure services! Some notable talks include:
Building a versatile RAG Pattern chat bot with Azure OpenAI, LangChain | JavaScript Dev Day
LangChain.js + Azure: A Generative AI App Journey | JavaScript Dev Day
GitHub Copilot Can Do That? | JavaScript Dev Day
JavaScript + Azure Code Samples and Open Source Projects
In this section, you’ll find several open source projects you can contribute to! Many of these projects are maintained by Microsoft’s JavaScript Advocacy team and Developer Division. They are aimed at enterprise use and follow JavaScript development best practices! Explore these projects, experiment with them, and help us improve them with your contributions!
Tutorials and More Videos!
In the tutorials section, you’ll find a wide variety of video tutorials covering different needs, from using the Visual Studio Code debugger to hosting static applications with Azure Static Web Apps.
Some examples of the tutorials available:
End-to-end browser debugging of your Azure Static Web Apps with Visual Studio Code
Azure libraries packages for JavaScript
Introduction to Playwright: What is Playwright?
Deploy React websites to the cloud with Azure Static Web Apps
Workshops and Documentation
Finally, you’ll find several workshops and official documentation on how to use JavaScript with Azure and other Microsoft technologies.
In the hub, you’ll find workshops such as:
Microservices in practice with Node.js, Docker and Azure
LAB: Build a serverless web application end-to-end on Microsoft Azure
Create your own ChatGPT with Retrieval-Augmented-Generation
Build JavaScript applications with Node.js
Conclusion
JavaScript at Microsoft is a complete portal for anyone who wants to learn more about JavaScript and how to use it with Microsoft technologies. So, if you want to go deeper into JavaScript, Azure, TypeScript, Artificial Intelligence, Testing, and much more, be sure to visit the portal and explore all the resources available!
I hope you enjoyed this article and that you explore JavaScript at Microsoft even further! If you have any questions or suggestions, please leave a comment below! 😎
MacBook mail is not working even after removing the account
I already tried removing the account, but it keeps giving a password error.
Display content (page, List) based on condition
In SharePoint Online, how can I display or hide an object (page, list, etc.) based on a role, group membership, or any other value?
Basically, I want to display the content based on a condition.
Changing Specific Highlight Colors in Microsoft Word Without Affecting Others
Hello Everyone,
I have text highlighted in multiple colors (green, yellow, and blue), and I want to change only the blue highlights to yellow without affecting the green highlights.
Thanks,