Category: Microsoft
System Center Licences help
Hi,
I’m planning to buy System Center and have some doubts about which licences I need.
I have a 3-node VMware cluster with 96 physical cores in total.
I’m planning to use Configuration Manager to manage deployment, updates and app installation on nearly 400 client PCs.
As I understand it, if I buy the Datacenter edition I can manage all Windows servers on the cluster and have to buy 400 Operations Manager client licences for the client PCs, whereas with the Standard edition I also have to buy client licences for the servers.
Is that correct?
Thanks
Adding Teams meetings to personal calendar (free version)
Hi!
I am trying to work out how to add my Teams meetings to my calendar whilst using the free version. There seems to be no calendar link at all. I have seen some extensions but they only work for the organisation/work package. I would prefer to link my Google calendar, though I have an Outlook calendar I could link if it’s easier.
Can anyone help? Thanks 😁
Copilot Does Not Summarise Slides in PowerPoint
I have a 40-slide presentation. Some of the slides are particularly wordy, so I was planning to ask Copilot to summarise a series of slides and add that as a companion document to the presentation.
When I ask it to summarise a single slide with “Summarise slide 5”, it grabs a title from a previous headline slide and creates a summary seemingly based on the whole presentation. If I ask it to summarise multiple slides with “Please summarise slides 5, 7, 10…”, it creates a headline for each slide but summarises the wrong slides throughout and places the wrong titles on each slide in the response.
Has anyone else tried summarising specific slides in a presentation?
The New Entra ID Photo Update Settings Policy for User Profile Photos
A new Entra ID photo update settings policy aims to cure the mish-mash of existing settings controlling how user profile photos are updated in Microsoft 365. The new policy is based on a Microsoft Graph resource. Work is needed to update clients to respect the policy settings and take over from current controls, like the OWA mailbox policy.
https://office365itpros.com/2024/09/16/photo-update-settings-policy/
How do I get HP Bloatware script to run during enrollment?
Hi all
I want to add a script to Intune so that HP bloatware is removed during the deployment phase of Autopilot. However, if I attach this script
https://gist.github.com/mark05e/a79221b4245962a477a49eb281d97388 to the deployment policy, it fails every time.
I can only remove the bloatware by going to the Start menu, running PowerShell as admin, and then running the script manually from there.
What do I need to add so the script can run automatically?
It wouldn’t matter if it can’t be run during the enrollment stage, but I do want to automate this somehow.
Thanks
Windows 365 GPOs instead of Intune Policies
Hi,
Can I set the Windows 365-specific settings (except the provisioning policy), such as the RDS policy, via Group Policy, or do I always have to use Intune profiles?
If GPOs are possible, where do I find the required ADMX files?
Best Regards
Carsten.
Moving from Amazon Quantum Ledger Database (QLDB) to ledger in Azure SQL
Overview
Amazon Web Services (AWS) has announced the discontinuation of its Amazon Quantum Ledger Database (QLDB). Their documentation has been updated to indicate that support for QLDB will end on July 31, 2025. AWS’s decision has prompted many customers to explore alternative solutions for their ledger database needs.
Introducing ledger in Azure SQL
For audit use cases, AWS proposes Aurora PostgreSQL as an alternative to QLDB. While Aurora offers detailed audit logging and permanent log retention, it does not include cryptographic verifiability.
Microsoft offers an excellent replacement through its ledger feature in Azure SQL. It provides similar functionalities, ensuring data integrity through cryptographic verification, and leverages the full capabilities of Azure SQL. Ledger is a feature that offers the power of Blockchain in Azure SQL Database, Azure SQL Managed Instance and SQL Server. Ledger allows establishing trust across different business entities while maintaining the simplicity and performance of a relational database. The data is centrally managed, and you can cryptographically attest to other parties, such as auditors or business partners, that your data can be trusted and hasn’t been tampered with.
More information about ledger in Azure SQL and SQL Server can be found in the ledger documentation.
Key features of ledger
Captures and cryptographically links data changes to make the data tamper-evident and verifiable.
Uses the same SQL environment you already know.
Is easy to deploy and maintain.
Comes at no extra cost.
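To illustrate what "cryptographically links data changes" means in practice, here is a minimal Python sketch of a hash chain, in which each entry's hash covers the hash of the entry before it. This is a conceptual illustration only and does not reproduce how ledger in Azure SQL is implemented internally; the names `append_entry`, `verify_chain` and `GENESIS_HASH` are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(previous_hash, change):
    """Hash a change together with the hash of the entry before it."""
    payload = json.dumps({"prev": previous_hash, "change": change}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain, change):
    """Append a change record linked to the current tail of the chain."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({"change": change, "prev": previous_hash,
                  "hash": entry_hash(previous_hash, change)})


def verify_chain(chain):
    """Recompute every hash; an edited entry breaks the chain from that point."""
    previous_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev"] != previous_hash:
            return False
        if entry["hash"] != entry_hash(previous_hash, entry["change"]):
            return False
        previous_hash = entry["hash"]
    return True


# Toy data, not taken from any real system.
ledger = []
append_entry(ledger, {"op": "INSERT", "account": "A-1", "balance": 100})
append_entry(ledger, {"op": "UPDATE", "account": "A-1", "balance": 80})
```

Calling `verify_chain(ledger)` succeeds on the untouched list and fails as soon as any stored change is edited, which is the tamper-evidence property described in the list above.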
Migration support
Amazon QLDB uses a document-oriented data model. The complexity lies in normalizing data during migration, converting the document model into a relational model, and handling any changes to the document model. Microsoft can assist organizations in migrating from Amazon QLDB to Azure SQL Database. For more detailed information and to initiate the migration process, organizations are encouraged to contact Microsoft support or their Microsoft Account Manager.
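The normalization step can be pictured with a small Python sketch that splits one nested document into a parent row and child rows. The document shape, the table naming scheme and the function `normalize_document` are all hypothetical, and the sketch assumes every list contains flat objects; a real migration would need to handle deeper nesting and schema changes over time.

```python
def normalize_document(doc, parent_table="vehicle", key_field="vin"):
    """Split one nested document into a parent row and child rows."""
    parent_row = {}
    child_rows = {}
    for field, value in doc.items():
        if isinstance(value, list):
            # Each list becomes a child table keyed back to the parent row.
            # Assumes every list item is a flat dict.
            child_rows[f"{parent_table}_{field}"] = [
                {key_field: doc[key_field], **item} for item in value
            ]
        elif isinstance(value, dict):
            # Nested objects are flattened into prefixed parent columns.
            for sub_field, sub_value in value.items():
                parent_row[f"{field}_{sub_field}"] = sub_value
        else:
            parent_row[field] = value
    return parent_row, child_rows


# Hypothetical document, invented for illustration.
document = {
    "vin": "VIN-001",
    "owner": {"name": "Ana", "city": "Lisbon"},
    "inspections": [{"year": 2023, "passed": True}, {"year": 2024, "passed": False}],
}
parent, children = normalize_document(document)
```

Here `parent` holds the scalar and flattened columns, while `children["vehicle_inspections"]` holds one row per inspection, each carrying the parent key so the rows can be joined back in a relational schema.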
Conclusion
Even though QLDB is being phased out, Microsoft offers an excellent alternative for users to host their data with cryptographic immutability. This ensures they can uphold rigorous data integrity standards. Ledger in Azure SQL serves as a strong alternative, with effortless integration into the Azure SQL environment.
Useful links
For more information and to get started with ledger in Azure SQL, see:
Explore the Azure SQL Database ledger documentation
Read the whitepaper
GitHub demo/sample
Data Exposed episode (video)
Listen to the ledger podcast
Share button on Meeting Side Panel is not sharing the CURRENT PAGE to the stage
I am using the teams-js SDK version 2.9.0. I have a web app which can be added to meetings in Teams.
When a user adds my app to a meeting, the app shows a menu of items (the landing page for the meeting) in the meeting side panel. Once the user selects an item from the list, they are redirected to the item details page (a separate page from the landing page). Now, if the user clicks the Share button, the stage shows the landing page (the items list page) instead of the item details page, which is the current page of the side panel.
Sample Reproducing Scenario:
Create a web app with 2 pages. One page (PageA) should show a list of items. The other page (PageB) should show item details when an item on PageA is clicked.
Create a Teams app for your website with the meetingStage and meetingSidePanel contexts enabled in the config.
1. Create a Teams Meeting
2. Load the app you created to the meeting.
3. It will show a list of items (PageA). Select an item.
4. You will be redirected to the item details page (PageB) in the side panel itself.
5. Click the Share button.
6. The stage will load the landing page (PageA, the list of items) again instead of the current page (PageB) in the side panel.
I expect the current state of the side panel to be shared to the stage when Share is clicked.
Can anyone please help with this? @prasad_das-MSFT
Exclamation mark instead of unread messages – bad user experience
Hi,
Since switching to the new Teams, I sometimes get an exclamation mark and an “account problem” with one of my customers. Usually my sign-in has expired there and I need to sign in again.
But I do not need those tenants at the moment. The exclamation mark replaces the “unread messages” count on my taskbar, which I need. Is anyone else experiencing this? Signing in to the customer tenant(s) is no real solution, as it disrupts the workflow.
BR
Stephan
Episode 157 of Microsoft Cloud and Hosting Partner Online Meeting | Tuesday September 17, 2024
I’m looking forward to seeing you online on Tuesday September 17 from 12:30pm Sydney time for the 157th episode of the Microsoft Cloud and Hosting Partner Online Meeting. If you haven’t registered yet it’s not too late. Simply click here for details, the agenda and to register.
Attached here are the slides we’ll cover so you can “follow the bouncing ball” as we cover the usual topics of Commerce and Operations Updates, the feature topic “A Tour of Microsoft Entra Suite”, News, Views, Venues; Did You See; and Technical Updates.
Have a good evening.
Regards, Phil
Beta channel selection disabled and unselectable after reboot
Hello everyone,
Yesterday I downloaded the ISO of the latest 24H2 Release Preview build and installed it, then chose the Beta Channel as the Insider channel. After restarting, the selection was not kept: I was in Release Preview, and the Beta Channel was greyed out and not selectable, while Dev and Canary could be chosen.
Currently, I am in release preview with build 26100.1742 ge_release.
I installed all the updates and restarted again, but nothing has changed.
Can you tell me why and where I went wrong?
Thanks everyone.
XDR deception – decoy working – lures not deploying
Hi everyone,
I am trying to create some custom deceptions with the help of this blog post:
Stack Your Deception: Stacking MDE Deception Rules with Thinkst Canarytokens · Attack the SOC
The decoys are working (if I ping a host I specified, alerts are raised).
But I cannot find the lures. I created some special lures for high-privilege personas and placed them in {HOME} and a file path beneath it.
I cannot find the files (show hidden is on). Are the folders also created by deception?
It has been 5 days now, so time should not be the problem either.
How can I troubleshoot this?
BR
Stephan
Filter SAP Data at source with Synapse/ADF CDC
Hi everyone,
I’m currently working on a project in Azure Synapse where I’m using the SAP CDC connector to connect to an S/4HANA system. My goal is to filter data on the source side before storing it in my ADLS Gen2 account, as there are certain data restrictions that I need to adhere to.
I need to fetch multiple objects from SAP, and I typically use a parameterized approach for this. I have a JSON file that contains parameters and queries for each object I want to retrieve from the source. For instance, I define SQL queries in the JSON file to perform the filtering. This method works well with SQL Connectors.
However, with the SAP CDC Connector, I haven’t been able to find any functionality that allows me to apply such filtering directly at the source.
Here’s what I’m doing so far:
I’m currently using a data flow in a ForEach loop. In the data flow, however, I cannot pass SQL queries and I’m stuck with the expression builder. I cannot figure out how to dynamically pass query-like filtering, so I’m just getting the unfiltered objects, which is not an option. I have so many objects that I can’t maintain a non-parameterized version.
I also tried using a Copy Data activity; however, when selecting it, I do not get the option to choose the SAP CDC integration dataset.
Has anyone successfully managed to filter tables at the source when using the SAP CDC linked service? Any insights or suggestions on how to achieve this would be greatly appreciated.
Thanks in advance for your help!
Grades app – filter by TAG
In the Grades app, I can filter by date range and by grading category. I need to filter by TAG.
(I co-teach a class with a colleague; we use categories for the TYPE of assignment and tags to mark WHICH of us is associated with the assignment, and I’d like to filter on this.)
Hopefully this is an uncontentious tweak to make!
Entra Connect Sync duplicated UPN
Hi
I had Entra Connect running for a long time without issues. Out of the blue, Connect Sync started to report a Duplicate Attribute error on the User Principal Name of 3 users.
The 3 users that Connect Sync believes have a conflicting value in Entra do exist in Entra, but with an SMTP address that matches the UPN and is not the user’s UPN.
If I run the following commands against my on-prem AD, the UPN does not exist under any domain name:
Get-ADUser -Filter {UserPrincipalName -eq "email address removed for privacy reasons"}
Get-ADUser -Filter {UserPrincipalName -eq "e-mail@domain.local"}
Get-ADUser -Filter {UserPrincipalName -eq "email address removed for privacy reasons"}
All my users’ UPNs are different from the configured on-prem ProxyAddresses, so the above error message makes no sense. Furthermore, the 3 users that sync sees as a conflict do not even have ProxyAddresses configured.
Any ideas how to debug this further?
/Robert
Microsoft Attack Simulator Training Foreign Language
I need some help changing the Microsoft Attack Simulator video training from the default of English to a foreign language. The chosen video training does support the language, but I have been unable to find the setting that activates it.
Licence required
Good evening. Today I tried to use the stock history function in Excel on my PC, but it is always blocked, and when I check, it says that I need a licence. Do you know what this means? I hope you can help me.
2 Stocks missing from Stocks Data type in Excel
Two stocks are missing from the Stocks data type.
1. Premier Energies Ltd listed on the National Stock Exchange (NSE) of India & Bombay Stock Exchange (BSE). XNSE:PREMIERENE.
2. Bajaj Housing Finance Ltd. listed on the same exchanges as above.
Both are newly listed stocks. The first was listed on 3rd September 2024; the second was listed today, i.e. 16th September 2024.
How can I get these added? I have given feedback on the first one multiple times.
Where can I log a request?
Thanks.
Switch to Azure Business Continuity Center for your at scale BCDR management needs
In response to evolving customer requirements and environments since COVID-19, including the shift towards hybrid work models and the increase in ransomware attacks, we have observed a growing trend among customers to invest in multiple vendors for data protection. To address these needs, we have developed the Azure Business Continuity (ABC) Center, a streamlined, centralized management center that simplifies backup and disaster recovery across various environments (Azure, Hybrid) and solutions (Azure Backup and Azure Site Recovery). Below are a few resources to learn more about Azure Business Continuity Center:
Business Continuity with ABCC: Part 4: optimize security configuration – Microsoft Community Hub
Business Continuity with ABCC: Part 5: Monitoring protection – Microsoft Community Hub
ABCC, in public preview since November 2023, is designed as an enhanced version of the Backup Center and will eventually replace it. Getting started is simple, with no prerequisites or costs involved. Even if you’ve been using Backup Center, no additional action is needed to begin viewing your protection estate in Azure Business Continuity Center. To start, simply navigate to the Azure portal and search for Azure Business Continuity Center.
Azure Business Continuity Center (ABCC) provides enhanced experiences for business continuity, and we want our customers to adapt to it before it replaces the Backup Center. To support this transition, we have removed the Backup Center from the global search in the Azure portal, but there is still an option in ABCC to go to Backup Center.
Backup Center will no longer appear in the Azure Portal search results across all regions. We encourage you to explore the Azure Business Continuity Center (ABCC) for your BCDR journey and provide your valuable feedback to help us enhance it to better meet your needs.
If you still want to launch Backup Center, first go to Azure Business Continuity Center from the Azure portal search.
Then, from the ABCC help menu, select “Go to Backup Center”.
If you are switching back to the Backup Center, please share your reasons for doing so, including any missing capabilities, performance issues, or other concerns you may have encountered. Your insights are invaluable in helping us enhance the ABCC experience.
Enhancing Retrieval-Augmented Generation with a Multimodal Knowledge Extraction and Retrieval System
The rapid evolution of AI has led to powerful tools for knowledge retrieval and question-answering systems, particularly with the rise of Retrieval-Augmented Generation (RAG) systems. This blog post introduces my capstone project, created as part of the IXN program at UCL in collaboration with Microsoft, aimed at enhancing RAG systems by integrating multimodal knowledge extraction and retrieval capabilities. The system enables AI agents to process both textual and visual data, offering more accurate and contextually relevant responses. In this post, I’ll walk you through the project’s goals, development journey, technical implementation, and outcomes.
Project Overview
The main goal of this project was to improve the performance of RAG systems by refining how multimodal data is extracted, stored, and retrieved. Current RAG systems primarily rely on text-based data, which limits their ability to generate accurate responses when queries require a combination of text and images. To address this, I developed a system capable of extracting, processing, and retrieving multimodal data from Wikimedia, allowing AI agents to generate more accurate, grounded and contextually relevant answers.
Key features include:
Multimodal Knowledge Extraction: Data from Wikimedia (text, images, tables) is preprocessed, run through the transformation pipeline, and stored in vector and graph databases for efficient retrieval.
Dynamic Knowledge Retrieval: A custom query engine, combined with an agentic approach using the ReAct agent, ensures flexible and accurate retrieval of information by dynamically selecting the best tools and strategies for each query.
The project began by addressing the limitations of existing RAG systems, particularly their difficulties with handling visual data and delivering accurate responses. After reviewing various technologies, a system architecture was developed to support both text and image data. Throughout the process, components were refined to ensure compatibility between LlamaIndex, Qdrant, and Neo4j, while optimising performance for managing large datasets. The primary challenges lay in handling the large volumes of data from Wikimedia, especially the processing of images, and refactoring the system for Dockerisation. These challenges were met through iterative improvements to the system architecture, ensuring efficient multimodal data handling and reliable deployment across environments.
Implementation Overview
This project integrates both textual and visual data to enhance RAG systems’ retrieval and response generation. The system’s architecture is split into two main processes:
Knowledge Extraction: Data is fetched from Wikimedia and transformed into embeddings for text and images. These embeddings are stored in Qdrant for efficient retrieval, while Neo4j captures the relationships between the nodes, ensuring the preservation of data structure.
Knowledge Retrieval: A dynamic query engine processes user queries, retrieving data from both Qdrant (using vector search) and Neo4j (via graph traversal). Advanced techniques like query expansion, reranking, and cross-referencing ensure the most relevant information is returned.
Tech Stack
The following technologies were used to build and deploy the system:
Python: Core programming language for data pipelines
LlamaIndex: Framework for indexing, transforming, and retrieving multimodal data
Qdrant: Vector database for similarity searches based on embeddings
Neo4j: Graph database used to store and manage relationships between data entities
Azure OpenAI (GPT-4o): Used for handling multimodal inputs, with models deployed via Azure App Services
Text Embedding Ada-002: Model for generating text embeddings
Azure Computer Vision: Used for generating image embeddings
Gradio: Provides an interactive interface for querying the system
Docker and Docker Compose: Used for containerization and orchestration, ensuring consistent deployment
Implementation Details
Multimodal Knowledge Extraction
The system starts by fetching both textual and visual data from Wikimedia, using the Wikimedia API and web scraping techniques. The key steps in the knowledge extraction implementation are:
Data Preprocessing: Text is cleaned, images are classified into categories such as plots or images for appropriate handling during later transformations, and tables are structured for easier processing.
Node Creation and Transformation: Initial LlamaIndex nodes are created from this data and then undergo several transformations in the transformation pipeline, using the GPT-4o model deployed via Azure OpenAI:
Text and Table Transformations: Text data is cleaned and split into smaller chunks using semantic chunking, and new derived nodes are created from various transformations, such as key entity extraction or table analysis. Each node has a unique LlamaIndex ID and carries metadata such as title, context, and relationships reflecting the hierarchical structure of the Wikimedia page and parent-child relationships with new transformed nodes.
Image Transformations: Images are processed to generate descriptions, perform plot analysis, and identify key objects based on the image type, resulting in the creation of new text nodes.
Embeddings Generation: The last stage of the pipeline is to generate embeddings for images and transformed text nodes:
Text Embeddings: Generated using the text-embedding-ada-002 model deployed with Azure OpenAI on Azure App Services.
Image Embeddings: Generated using the Azure Computer Vision service.
Storage: Both text and image embeddings are stored in Qdrant with reference node IDs in the payload for fast retrieval. The full nodes and their relationships are stored in Neo4j.
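The storage layout described in the steps above can be sketched in plain Python. The dictionaries and lists below are in-memory stand-ins for Neo4j and Qdrant, not calls to either client library, and the relationship name `DERIVED_FROM`, the helper functions and the toy vector are assumptions made for illustration.

```python
import uuid

graph_nodes = {}    # stand-in for Neo4j nodes: node_id -> node properties
graph_edges = []    # stand-in for Neo4j relationships: (child_id, type, parent_id)
vector_points = []  # stand-in for Qdrant points: vector plus payload


def add_node(text, kind, parent_id=None):
    """Store a full node and, for derived nodes, a link back to its parent."""
    node_id = str(uuid.uuid4())
    graph_nodes[node_id] = {"text": text, "kind": kind}
    if parent_id is not None:
        graph_edges.append((node_id, "DERIVED_FROM", parent_id))
    return node_id


def add_embedding(node_id, vector):
    """Store only the vector and a reference node ID, as in the payload design."""
    vector_points.append({"vector": vector, "payload": {"node_id": node_id}})


original_id = add_node("Full text of a Wikimedia section.", kind="original")
summary_id = add_node("Key entities extracted from the section.", kind="derived",
                      parent_id=original_id)
add_embedding(summary_id, [0.1, 0.7, 0.2])  # toy vector, not a real model embedding
```

Keeping only the node ID in the vector payload keeps the vector store small, while the graph remains the single place where full content and parent-child links live.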
Knowledge Retrieval
The retrieval process involves several key steps:
Query Expansion: The system generates multiple variations of the original query, expanding the search space to capture relevant data.
Vector Search: The expanded queries are passed to Qdrant for a similarity-based search using cosine similarity.
Reranking and Cross-Retrieval: Results are then reranked by relevance. Nodes retrieved from Qdrant carry LlamaIndex node IDs in their payload. These IDs are used to fetch the corresponding nodes from Neo4j, and the graph is then traversed to reach the nodes that hold the original Wikimedia data, ensuring the final response is based only on original Wikipedia content.
ReAct Agent Integration: The ReAct agent dynamically manages the retrieval process by selecting tools based on the query context. It integrates with the custom-built query engine to balance AI-generated insights with the original data from Neo4j and Qdrant.
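The retrieval steps above can be sketched in a few lines of plain Python. This is an illustrative approximation under stated assumptions, not the project's implementation: an in-memory dictionary stands in for Qdrant, a parent-ID map stands in for the Neo4j graph, and the function names cosine, search, and resolve_original are hypothetical. Reranking is reduced here to sorting by the best cosine score across all expanded queries.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vectors, index, top_k=3):
    """Score every indexed node against every expanded query.

    index maps node_id -> embedding. Each node keeps its best score
    over all query variations, then nodes are sorted by that score.
    """
    best = {}
    for query in query_vectors:
        for node_id, vector in index.items():
            score = cosine(query, vector)
            if score > best.get(node_id, -1.0):
                best[node_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def resolve_original(node_id, parents):
    """Follow parent links until a node without a parent is reached.

    parents maps node_id -> parent_id (or None for original nodes).
    """
    seen = set()
    while parents.get(node_id) is not None and node_id not in seen:
        seen.add(node_id)
        node_id = parents[node_id]
    return node_id


index = {"summary": [1.0, 0.0], "caption": [0.0, 1.0], "section": [1.0, 1.0]}
parents = {"summary": "section", "caption": "section", "section": None}
hits = search([[1.0, 0.0]], index, top_k=2)
originals = {resolve_original(node_id, parents) for node_id, _ in hits}
print(hits, originals)
```

Resolving each hit back to its original node is what lets derived nodes (summaries, image descriptions) improve recall while the answer itself is still grounded in source content.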
Dockerization with Docker Compose
To ensure consistent deployment across different environments, the entire application is containerised using Docker. Docker Compose orchestrates multiple containers, including the knowledge extractor, retriever, Neo4j, and Qdrant services. This setup simplifies the deployment process and enhances scalability.
Results and Outcomes
The system effectively enhances the grounding and accuracy of responses generated by RAG systems. By incorporating multimodal data, it delivers contextually relevant answers, particularly in scenarios where visual information is critical. The integration of Qdrant and Neo4j proved to be highly efficient, enabling fast retrieval and accurate results.
Additionally, a user-friendly interface built with Gradio allows users to interact with the system and compare the AI-generated responses with standard LLM output, offering an easy way to evaluate the improvements.
Here is a snapshot of the Gradio UI:
Future Development
Several directions for future development have been identified to further enhance the system’s capabilities:
Agentic Framework Expansion: A future version of the system could incorporate an autonomous tool capable of determining whether the existing knowledge base is sufficient for a query. If the knowledge base is found lacking, the system could automatically initiate a knowledge extraction process to address the gap. This enhancement would bring greater adaptability and self-sufficiency to the system.
Knowledge Graph with Entities: The knowledge graph could be expanded to include key entities such as individuals, locations, and events, or others appropriate for the domain. This would add considerable depth and precision to the retrieval process. The integration of such entities would provide a more comprehensive and interconnected knowledge base, improving both the relevance and accuracy of results.
Enhanced Multimodality: Future iterations could expand the system’s capabilities in handling image data. This may include adding support for image comparison, object detection, or breaking images down into distinct components. Such features would enable more sophisticated queries and increase the system’s versatility in handling diverse data formats.
Incorporating these advancements will position the system to play an important role in the evolving field of multimodal AI, further bridging the gap between text and visual data integration in knowledge retrieval.
Summary
This project demonstrates the potential of enhancing RAG systems by integrating multimodal data, allowing AI to process both text and images more effectively. Through the use of technologies like LlamaIndex, Qdrant, and Neo4j, the system delivers more grounded, contextually relevant answers at high speed. With a focus on accurate knowledge retrieval and dynamic query handling, the project showcases a significant advancement in AI-driven question-answering systems. For more insights and to explore the project, please visit the GitHub repository.
If you’d like to connect, feel free to reach out to me on LinkedIn.