Category Archives: Microsoft
Formula (VBA generated) fills the entire column of selected cell
Hello everyone
I’m currently experiencing an issue with my Excel (365). The code is supposed to insert a formula into a cell when I double-click it, but it fills the entire column of the selected cell with the formula.
Private Sub Worksheet_BeforeDoubleClick(ByVal Target As Range, Cancel As Boolean)
    Dim r As Integer
    If Not Intersect(Target, Me.Range("F3:R65")) Is Nothing And IsEmpty(Target.Value) Then
        r = Target.Row
        Target.Formula = "=IF($C" & r & ">$D" & r & ",$D" & r & "+1-$C" & r & ",$D" & r & "-$C" & r & ")"
        Cancel = True
    End If
End Sub
Before:
After (Double-clicked cell I15):
SQL Server: Not Null Cases
Hi,
Please help me on below requirement.
In my table, we have 5 columns, and ID is the key column.
I need logic like the following:
1. If Reason1 is null and Reason2 is not null, output New_Txt.
2. If Reason1 is not null and Reason2 is not null, output Old_txt and also output New_Txt (it has to generate 2 different rows).
Base Table:
ID      R1     R2        Old_txt   New_txt
11200          PRO0003   A         B
11200   P03    Q08       G         D
11200   Q08    Q09       H         E
Required output:
ID      R1     R2     Old_Txt   New_Txt
11200   Null   P03              B
11200   P03    Q08    G
11200   P03    Q08              D
11200   Q08    Q09    H
11200   Q08    Q09              E
Access Projects through Planner
Hello,
Since the new Planner update in Teams, I can see projects of groups that I'm not even part of. Where is access to Projects controlled?
Thanks
Timo Schuldt
Building Retrieval Augmented Generation on VSCode & AI Toolkit
Retrieval Augmented Generation (RAG) on VS Code AI Toolkit:
The AI Toolkit allows users to quickly deploy models locally or in the cloud, test and integrate them via a user-friendly playground or REST API, fine-tune models for specific requirements, and deploy AI-powered features either in the cloud or embedded within device applications.
In the previous blogs, we learned how to get started with the AI Toolkit by installing it and creating a basic application. Please refer to those blogs for detailed insights and updates if this is your first time using the VS Code AI Toolkit.
Visual Studio Code AI Toolkit: Run LLMs locally
Visual Studio AI Toolkit: Building GenAI Applications
Retrieval Augmented Generation (RAG):
LLMs are trained on specific datasets from various domains. When you work with LLMs, there may be information specific to your dataset or domain that the LLM doesn't have enough knowledge about. Generated language might also need to be referenced against the domain and use case. RAG is used in these cases to increase applicability to specific domains and datasets. For example, an LLM might know about legal services in general, but if you want to reference specific statutes in US law and get citations to them, then RAG is a good approach. A similar approach can be applied to any other country's laws as well.
Retrieval-Augmented Generation (RAG) is a hybrid approach in natural language processing (NLP) that combines two key elements: retrieval of relevant information from external data sources and generation of text based on this retrieved information.
Retrieval: These models retrieve relevant text from a large repository of documents when generating responses or completing tasks. They excel at providing factual and specific information. A better retrieval mechanism leads to a more accurate and relevant response.
Instead of relying solely on learned patterns and data, RAG incorporates a retrieval step. This involves searching or retrieving specific relevant snippets from a large database of documents related to the input query or task. RAG improves the accuracy and relevancy of the response based on the domain specific document repository.
Generation: These models generate responses from scratch based on the patterns and data they were trained on. They can create fluent and contextually appropriate text but may struggle with accuracy and factual correctness when dealing with specific queries. For example, if we ask the model for the current dollar rate, it can only generate a response from its training data, which is likely to be inaccurate; for dynamic information such as news and currency prices, the model is accurate only up to its training cutoff. Connecting it to a reliable source therefore lets us extend the model to get the answer from the right place. Similarly, when the model answers from its pretrained data, it may not be able to quote a reference for the data, which can be limiting if the user doesn't know the source of the answer. With RAG, the source can be referenced as the answer is generated.
Some Applications:
Question Answering: RAG can excel in tasks where precise answers backed by evidence are required, such as in open-domain question answering systems.
Content Creation: It can also be used to generate content that is both informative and accurate, leveraging retrieved knowledge to enhance the generation process.
Summarization: RAG can also be used to create both abstractive and extractive summaries. Extractive summaries identify the important sentences from the document(s) and generate a summary based on their relative importance. Abstractive summaries identify the most important ideas and content and synthesize them in their own words.
In this series, let's create a basic RAG application.
Let's discuss the architecture in two parts: the first is the creation of the database, and the second is retrieval.
Creation of Database
We will use a PDF file for the RAG implementation. We will first extract text from the PDF file and then convert it into smaller pieces of documents, often referred to as 'chunks'; the process of doing so is known as 'chunking'. This helps the language model extract the right document without exceeding the context limit. Chunk overlap and chunk size must be balanced well to get good results.
Once we have document chunks, we will next proceed to convert them into embeddings. Embeddings are a foundational concept in NLP. Embeddings enable machines to understand and process human language more effectively by representing words or phrases as vectors in a continuous space where semantic relationships are encoded.
How Do Embeddings Work?
Vector Representation: Each word or phrase is represented as a vector of real numbers. For example, in a 300-dimensional embedding space, each word might be represented by a vector with 300 numerical values.
Semantic Similarity: Words with similar meanings are represented by vectors that are closer together in the embedding space. For instance, vectors for “dog” and “cat” would be closer than vectors for “dog” and “car” (made concrete in the sketch after this list).
Learned from Data: Embeddings are learned from large amounts of textual data using techniques like Word2Vec, GloVe (Global Vectors for Word Representation), or through neural network-based approaches such as Transformer models.
Applications: Embeddings are widely used in NLP tasks such as sentiment analysis, machine translation, text classification, and more. They allow models to generalize better to data they have not seen before and capture intricate relationships between words.
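To make the similarity idea above concrete, here is a minimal sketch (assuming the open-source sentence-transformers package is installed via pip install sentence-transformers) that embeds three words with the same all-MiniLM-L6-v2 model used later in this tutorial and compares them with cosine similarity:

from sentence_transformers import SentenceTransformer
import numpy as np

# Load the same open-source embedding model used later in this tutorial
model = SentenceTransformer('all-MiniLM-L6-v2')

words = ['dog', 'cat', 'car']
vectors = model.encode(words)  # shape (3, 384) for this model

def cosine(a, b):
    # Cosine similarity: closer to 1.0 means more semantically similar
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors[0], vectors[1]))  # 'dog' vs 'cat'
print(cosine(vectors[0], vectors[2]))  # 'dog' vs 'car'

The 'dog'/'cat' score should come out noticeably higher than the 'dog'/'car' score, mirroring the intuition described above.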
Once we have the embeddings, we store them in a special kind of database called a vector database.
Vector Databases
Vector databases are specialized for storing and retrieving vector data efficiently, making them an essential component in applications that rely on similarity-based search and analysis of high-dimensional data vectors. ChromaDB is one such AI-native open-source vector database and will be used in this tutorial. To learn more about ChromaDB, click here.
We will be utilizing ChromaDB from the Langchain framework in this tutorial.
Setting up a virtual environment (venv) is highly recommended while following this blog. Python is a prerequisite; if it is not installed, please install the latest version on the machine. For detailed steps click here. Once the environment is set up, ChromaDB needs to be installed using the Python package installer, pip.
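For reference, a typical way to create and activate a venv on Windows looks like the following (assuming Python is already on the PATH; the folder name .venv is just a convention):
python -m venv .venv
.venv\Scripts\activate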
In the VSCode terminal type the following command,
pip install chromadb
We will also utilize LangChain, a widely used open-source framework for developing GenAI applications. LangChain acts as an orchestrator for building customizable AI applications, especially those that use LLMs/SLMs, and provides tools and abstractions that make it easier to customize, control, and integrate LLMs into applications.
The following are some of the major components of Langchain.
Prompts: Prompts are the text instructions or questions that you provide to the LLM. Well-crafted prompts are crucial for getting accurate and relevant responses from the LLM. LangChain provides templates to structure prompts and make them more reusable.
LLMs: Large Language models (LLMs) like GPT-4o, LLaMA, and others are pre-trained models with vast knowledge and capabilities. LangChain seamlessly integrates with popular LLM providers. You can also use your own custom-trained models.
Chains: Chains are responsible for orchestrating the flow of data and interactions between different components.
Types of Chains:
Sequential Chains: Execute components in a linear order.
Parallel Chains: Execute components concurrently.
Conditional Chains: Execute components based on certain conditions.
Generative Chains: Generate text or other outputs.
Custom Chains: You can create your own custom chains to suit specific use cases
Callbacks:
Monitoring and Logging: Callbacks provide a way to track the progress of your application and log important events.
Customizations: You can implement custom callbacks to perform actions like sending notifications or storing data.
Indexes:
Document Retrieval: Indexes are used to store and retrieve documents that can be used as context for LLMs.
Vector Databases: LangChain supports various vector databases for efficient document retrieval.
Agents:
Autonomous Actions: Agents are capable of taking actions based on the information they gather from the environment.
Decision-Making: Agents use LLMs to make decisions and complete tasks.
Memory:
Context Preservation: Memory allows LLMs to maintain context and remember information from previous interactions.
Types of Memory:
Conversation Memory: Stores the history of a conversation.
Document Memory: Stores information from documents.
Episodic Memory: Stores information about past events.
Tools:
External Integration: Tools enable LLMs to interact with external resources like search engines, calculators, or APIs.
Expanding Capabilities: Tools can enhance the functionality of your applications.
By effectively combining these components, you can create a wide range of applications, including chatbots, question-answering systems, text summarization tools, and more.
To install, type the following command,
pip install langchain
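As a small illustration of the prompt component described above, the following sketch builds a reusable template (treat it as indicative, since exact import paths can vary slightly between LangChain versions):

from langchain.prompts import PromptTemplate

# A reusable template with two placeholders
template = (
    'Answer the question using only the context below.\n'
    'Context: {context}\n'
    'Question: {question}\n'
)
prompt = PromptTemplate.from_template(template)

# Fill the placeholders to produce the final text sent to the model
print(prompt.format(context='AI Toolkit runs models locally.', question='Where do models run?'))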
LangChain Community contains third-party integrations that implement the base interfaces defined in LangChain Core, making them ready-to-use in any LangChain application.
Type the following command in VSCode terminal to install the Langchain community,
pip install langchain-community
Let's now begin by importing the required libraries into our code editor. It is recommended to use a notebook file (.ipynb) for this part. Head to Visual Studio Code and create a new file named dbmaker.ipynb; we should now have a notebook file. Select the kernel of the virtual environment we created earlier, or use the Python version installed on the local machine.
Now import the following libraries,
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.document_loaders import DirectoryLoader,PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
It is quite important to understand the use of the above statements, so let’s look at these one by one,
Chroma:
Module: langchain_community.vectorstores
Purpose: Chroma is a vector store that allows you to store and retrieve high-dimensional vectors. It is used to store embeddings of documents or text and retrieve them based on similarity searches.
Usage: Typically used in applications that require efficient similarity searches, such as document retrieval or question-answering systems.
HuggingFaceEmbeddings:
Module: langchain_community.embeddings
Purpose: HuggingFaceEmbeddings provides a way to generate embeddings using models from the Hugging Face library. These embeddings are numerical representations of text that capture semantic meaning.
Usage: Used to convert text into embeddings that can be stored in a vector store like Chroma for similarity searches.
DirectoryLoader:
Module: langchain_community.document_loaders
Purpose: DirectoryLoader is used to load documents from a specified directory. It can handle various file types and is useful for batch processing of documents.
Usage: Commonly used to load a large number of documents from a directory for further processing, such as embedding generation or text splitting.
PyMuPDFLoader:
Module: langchain_community.document_loaders
Purpose: PyMuPDFLoader is a specialized document loader that uses the PyMuPDF library to load and process PDF documents.
Usage: Used to extract text from PDF files, which can then be processed further, such as generating embeddings or splitting text.
RecursiveCharacterTextSplitter:
Module: langchain.text_splitter
Purpose: RecursiveCharacterTextSplitter is used to split text into smaller chunks based on character count. It recursively splits text to ensure that chunks are of manageable size while preserving semantic meaning.
Usage: Useful in scenarios where large documents need to be broken down into smaller, more manageable pieces for processing, such as embedding generation or indexing.
Once we have done these imports successfully, it’s time now to specify the directory of the documents and also to specify the embedding model.
# Directory of the PDF files
dir = 'docs/'
# Open-source embedding model from Hugging Face
embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')
This line defines a variable ‘dir’ that holds the path to a directory containing PDF files. This directory path will be used later to load and process the PDF files stored in this location.
The model that will be used in this tutorial is all-MiniLM-L6-v2, which is a pre-trained model from Hugging Face.
The embeddings object will be used to convert text into numerical embeddings. These embeddings capture the semantic meaning of the text and can be used for various tasks such as similarity searches, clustering, or feeding into other machine learning models.
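As a quick optional sanity check (assuming the embeddings object above was created successfully), you can embed a single string and inspect the vector size:

# Embed one string; all-MiniLM-L6-v2 produces 384-dimensional vectors
vector = embeddings.embed_query('What is Fine tuning?')
print(len(vector))  # expected: 384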
It's now time to load the documents from the directory, so let's create a function to achieve this,
# Loading the documents
def load_docs(dir):
    loader = DirectoryLoader(dir, loader_cls=PyMuPDFLoader, use_multithreading=True,
                             max_concurrency=128, show_progress=True, silent_errors=True)
    documents = loader.load()
    return documents
Now let’s step through the code and explain what we are doing here,
The load_docs function loads documents from the specified directory. It uses the DirectoryLoader class to handle the loading process, with specific configurations to optimize performance and handle errors gracefully. It accepts one parameter: the directory path where the PDF files are located.
Initialize DirectoryLoader:
dir: The directory containing the PDF files.
loader_cls=PyMuPDFLoader: Specifies that the PyMuPDFLoader class should be used to load the PDF files. This loader is specialized for handling PDF documents.
use_multithreading=True: Enables multithreading to speed up the loading process.
max_concurrency=128: Sets the maximum number of concurrent threads to 128. This allows for parallel processing of multiple files.
show_progress=True: Displays a progress bar to indicate the loading progress.
silent_errors=True: Suppresses error messages, allowing the loading process to continue even if some files fail to load.
Load Documents:
documents = loader.load(): Calls the load method of the DirectoryLoader instance to load the documents from the specified directory.
Return Documents:
return documents: Returns the loaded documents.
This function will now be used when we create chunks; throughout this tutorial we follow a functional style so each piece can be reused wherever needed.
To create chunks, let's now design a function.
# Splitting the documents into chunks
def split_docs(documents, chunk_size=1000, chunk_overlap=100):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    docs = text_splitter.split_documents(documents)
    return docs
The split_docs function splits a list of documents into smaller chunks. This is useful for processing large documents in manageable pieces, especially for tasks like embedding generation or indexing, and it helps when only limited context can be sent to the model because of a token limit.
Parameters of split_docs:
documents: A list of documents to be split. Each document is typically a string or a structured object containing text.
chunk_size (default 1000): The maximum number of characters in each chunk.
chunk_overlap (default 100): The number of characters that overlap between consecutive chunks. This helps to maintain context across chunks.
The chunk size and chunk overlap need to be tuned for the use case, but for this tutorial we keep standard default values.
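If you want to see the effect of these two parameters before running the full pipeline, a tiny experiment like the following (with deliberately small, illustrative values) makes the overlap visible:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Illustrative values only: small chunks so the overlap is easy to spot
sample = 'Retrieval Augmented Generation retrieves relevant chunks from a store. ' * 10
splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=40)
for i, chunk in enumerate(splitter.split_text(sample)):
    # Adjacent chunks repeat up to 40 characters at their boundaries
    print(i, len(chunk), repr(chunk[:40]))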
Steps
Initialize RecursiveCharacterTextSplitter:
chunk_size=chunk_size: Sets the maximum size of each chunk.
chunk_overlap=chunk_overlap: Sets the number of overlapping characters between chunks.
The RecursiveCharacterTextSplitter is designed to split text into chunks while preserving semantic meaning as much as possible.
Split Documents:
docs = text_splitter.split_documents(documents): Calls the split_documents method of the text_splitter instance to split the input documents into smaller chunks.
Return Chunks:
return docs: Returns the list of document chunks.
We have now finished designing the functions that will help us create the vector database.
It's time to use these functions and create the vector database,
documents=load_docs(dir)
len(documents)
doc=split_docs(documents)
print(len(doc))
Let’s call the load_docs function with the directory path, to load the documents from the specified directory. The loaded documents are stored in the documents variable. Then, to check the length of the documents, we can use the len() function.
Now call the split_docs function with the loaded documents to split them into smaller chunks. The resulting chunks are stored in the doc variable. The length of the doc list is printed, indicating how many chunks were created from the original documents.
Now that the chunks are ready, it's time to embed them and store the result in a vector database. As already discussed, we will use ChromaDB in this tutorial.
save_to = Chroma.from_documents(documents=doc, embedding=embeddings, persist_directory='./ai-toolkit')
The above line of code initializes a Chroma vector store from a list of document chunks and their corresponding embeddings. The vector store is then saved to a specified directory for persistence.
Parameters
documents=doc: The list of document chunks that were created by the split_docs function.
embedding=embeddings: The embedding model used to generate embeddings for the document chunks. This is an instance of HuggingFaceEmbeddings.
persist_directory='./ai-toolkit': The directory where the vector store will be saved. This allows the vector store to be persisted and loaded later.
Our database is now ready, and we can try a sample search query against it. Although we haven't yet connected the small language model, we can still see how the retriever works, so let's ask a sample query,
query = "What is Fine tuning"
We have defined our query; now it's time to search our newly created vector database, named 'ai-toolkit'. To do this, we will take the following steps:
Initialize a Chroma vector store by loading it from the specified directory.
Perform a similarity search on the vector store using the provided query.
Print the entire list of search results to the console.
Print the content of the first document or chunk in the search results.
The code is as follows,
db1 = Chroma(persist_directory='./ai-toolkit', embedding_function=embeddings)
results = db1.similarity_search(query)
print(results)
print(results[0].page_content)
Upon successful execution, the search results will appear in the output cell.
In this article we learned the concepts behind RAG and created embeddings successfully. In the next part, we will look at how to use RAG with these embeddings in ChromaDB to get better results, using the AI Toolkit for VSCode and a locally downloaded Phi-3 model.
Meanwhile, you can take a look at the following resources about RAG and the AI Toolkit:
AI Toolkit for VSCode Documentation
AI Toolkit for Visual Studio Code (Github)
RAG in AI studio (Concepts)
RAG in AI Search
Phi-3 Cookbook
DocAider: Automated Documentation Maintenance for Open-source GitHub Repositories
Project Overview
Comprehensive documentation is crucial for end users of open-source software projects, but manual creation and maintenance are time-consuming and costly. Traditional documentation generators like Pydoc rely on predefined rules and in-line code information, while emerging generative AI techniques, particularly Large Language Models (LLMs), offer new possibilities for enhanced documentation generation. Developed in partnership with Microsoft and UCL, DocAider is an AI-powered documentation tool that automatically generates and updates code documentation, leveraging LLM technologies combined with Microsoft Semantic Kernel, Microsoft AutoGen, and Azure AI Studio. The tool uses GitHub Actions workflows to trigger documentation tasks when pull requests (PRs) are opened, providing continuous documentation maintenance. This approach addresses the challenges of automating documentation and ensures that project documentation remains current with minimal human intervention.
This system uses a multi-agent architecture in which multiple agents work together to complete the task. It offers two innovative features: a recursive update mechanism, which ensures that changes ripple throughout all related documentation, and continuous monitoring and updating of documentation via pull requests.
DocAider offers a promising solution for software engineers, with the potential to automatically maintain clean and up-to-date documentation. It allows developers to concentrate more on coding while simplifying the onboarding process for new team members. Additionally, it helps reduce costs and boosts overall efficiency.
Project Journey
This project was completed over 3 months. In the first few weeks, the team focused on the requirements engineering portion of the project: we set functional and non-functional requirements, created context and architecture diagrams, and broke the project down with the stakeholders so that it would be easy to implement in the following months, making sure we included the most important features and requirements. This process also allowed the team to see how much was realistically achievable and what should be kept as optional if time allowed.
During implementation, our team employed agile methodologies, Git practices, and continuous integration and testing. We chose agile as our development approach because it facilitated constant communication between the team and stakeholders. This strategy proved crucial to our product’s success. Bi-weekly meetings with stakeholders allowed us to report progress, plan upcoming tasks and clarify the requirements. Additionally, we held weekly internal meetings for team members to showcase their work and assess overall progress.
Technical Details
DocAider is an LLM-powered tool that generates and updates documentation automatically. It performs the documentation tasks using a customised GitHub Actions workflow and runs in the background. We developed DocAider by integrating Semantic Kernel and AutoGen. The tools facilitate the development of AI-based software. Furthermore, we deployed and managed Azure OpenAI LLMs on Azure AI Studio. To obtain good results, we used the GPT-4-0125-preview model to create documentation for the source code. The temperature parameter was set between 0 and 0.2 for more deterministic and factual LLM responses.
Documentation Generation
AutoGen provides multiple conversation patterns to orchestrate AI agents, such as sequential chats, group chats, nested chats, etc. We used the sequential chats to create documentation. The figure shows our multi-agent architecture, which reduces LLM hallucinations in two ways: appropriate code context information and self-improvement. Four agents perform different tasks in sequence and an agent manager controls the multi-agent conversation.
Code Context Agent creates a graph representation of the entire repository, mapping the relationships between function calls. It then generates comprehensive information about the codebase using the actual source code and the relationship graph.
Documentation Generation Agent produces baseline documentation taking into account the contextual information passed from the previous agent. The documentation contains three basic sections: overview, class/function/method descriptions and input/output examples.
Review Agent assesses the baseline documentation, and suggests the improvements.
Revise Agent modifies the baseline documentation according to the suggestions and returns the improved documentation to Agent Manager.
Agent Manager controls the conversation process and responds to function calling requests from LLM-configured agents.
By using Semantic Kernel, we built skilled agents for performing specific tasks. AutoGen facilitates agent interactions to complete complex workflows. The LLM function calling capability helps to reduce programming efforts and makes agents flexible. An agent can autonomously execute external functions defined in the associated plugins to complete a variety of tasks.
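As a toy illustration of the kind of call-relationship mapping the Code Context Agent performs, the sketch below uses Python's standard ast module to record which functions call which others in a single source file. It is a simplified example of the technique, not DocAider's actual implementation:

import ast

source = '''
def helper():
    return 1

def main():
    return helper() + 1
'''

tree = ast.parse(source)
calls = {}
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        # Collect simple-name calls made anywhere inside this function
        called = [n.func.id for n in ast.walk(node)
                  if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)]
        calls[node.name] = called

print(calls)  # {'helper': [], 'main': ['helper']}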
Documentation Update
To maintain consistency and accuracy across all related documentation when a class/function in a file is changed, the Documentation Update feature performs the update recursively. If a class/function is modified, the system will automatically update the documentation for all dependent files. This includes documentation of the source file, as well as documentation of files that use functions dependent on the changed class/function. This recursive update feature ensures that all related documentation remains up-to-date with the latest changes in the code.
Additionally, Documentation update on PR Comment allows reviewers to trigger the documentation updates on specified files by commenting in a specific format. The reviewer can specify which file needs an update and provide instructions on what changes should be made. The system will then process this comment and update the documentation as instructed. This feature ensures that precise and targeted documentation updates can be made based on reviewer feedback, improving the overall quality and relevance of the documentation. Furthermore, it removes the need for developers to manually change documentation according to reviewers’ comments. The comment triggering this process needs to be in this format:
“Documentation {file_path}: {comment}”. For example, “Documentation main.py: Add more I/O examples”.
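For illustration, a trigger comment in this format could be recognized with a few lines of Python; the helper below is hypothetical and only sketches the idea, not DocAider's actual code:

import re

# Matches: 'Documentation <file_path>: <instruction>'
PATTERN = re.compile(r'^Documentation\s+(?P<path>\S+):\s*(?P<instruction>.+)$')

def parse_doc_comment(comment):
    # Return (file_path, instruction) if the comment matches, else None
    match = PATTERN.match(comment.strip())
    if not match:
        return None
    return match.group('path'), match.group('instruction')

print(parse_doc_comment('Documentation main.py: Add more I/O examples'))
# -> ('main.py', 'Add more I/O examples')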
Results and Outcome
Our evaluation process involved three stages: a case study executing our system on a well-known repository, a comparison of our system against RepoAgent, and a quantitative analysis. Through this process, we could determine our system's performance.
Case Study:
This section presents results from applying our tool to generate and update documentation for the Graphviz repository's Python files. The system produced well-structured documentation, including overviews, global variables, function/class descriptions, and I/O examples, providing clear explanations of file purposes and usage guidelines. When updates were made to the base.py file, adding logging functionality and a new method, the system successfully incorporated these changes while preserving existing content. The system also demonstrated its ability to handle recursive updates, propagating changes from the ParameterBase class in base.py to dependent files like engine.py. Additionally, it responded effectively to a PR comment requesting more input/output examples, showcasing its capacity to incorporate reviewer feedback. Overall, the multi-agent system proved capable of generating, updating, and maintaining comprehensive documentation across all files in a software repository.
Comparison with RepoAgent
We compared DocAider, a multi-agent documentation system, with RepoAgent, another LLM-based documentation generation tool. While RepoAgent produces lengthy paragraphs, DocAider generates concise, bullet-pointed documentation, aligning with developers’ preferences for brevity. DocAider’s multi-agent approach potentially enhances accuracy and reduces hallucinations compared to RepoAgent’s single-agent system. DocAider also implements a Reviewer and Revisor agent to suggest and apply improvements. A notable feature of DocAider is its HTML-based front-end interface, which improves documentation accessibility and organization – factors highly valued by developers.
While our system is well-designed and offers unique features like recursive updates, RepoAgent stands out by providing thorough I/O examples for every function. However, LLMs can make incorrect assumptions as function complexity increases, leading to factual inaccuracies or nonsensical outputs. To mitigate this, we restrict the LLM from making such assumptions, resulting in some functions/classes lacking I/O examples.
Quantitative Analysis
We conducted a quantitative analysis of DocAider's performance across six popular GitHub repositories: colorama, fake-useragent, graphviz, photon, progress, and pywhat. These repositories were selected based on their popularity (over 1000 stars each) and size (small to medium, limited to 20 files per repository); all of them varied in the number of functions and classes. Scores are normalised between 0 and 1, reflecting the presence of these attributes in the documentation. For instance, a score of 1 for Function/Class Description indicates that every class and function in the repository is described in the documentation, while a score of 0.5 for I/O examples means that only half of the functions have I/O examples provided in the documentation.
DocAider achieved perfect scores (1.0) for function/class descriptions across all repositories, demonstrating consistent performance. For parameters/attributes, most repositories received perfect scores, with only colorama scoring slightly lower at 0.94 due to two functions lacking parameter documentation. I/O examples showed the most variation, with scores ranging from 0.54 (photon and progress) to 0.88 (colorama). Lower scores were often due to specific function types without return values (e.g., class init methods) or complex logic that made example generation challenging. Return value documentation was consistently strong, with all repositories scoring 1.0.
Overall, DocAider is proficient in many areas, such as generating function/class descriptions and handling most documentation aspects. However, there is room for improvement in consistently documenting I/O examples, particularly for functions with more complex logic.
Lessons Learned
The development of DocAider provided valuable insights across several key areas.
Firstly, the adoption of a multi-agent approach proved crucial in managing the system's complexity. Initially, a single-agent design led to issues such as hallucinations and incomplete documentation. By transitioning to a multi-agent architecture, the team was able to distribute tasks across specialized agents, each handling specific aspects of the documentation process. This significantly improved the accuracy and reliability of the documentation while also enhancing system scalability, highlighting the importance of modular design and task specialization in complex AI-driven systems.
Secondly, prompt engineering emerged as a critical and unexpectedly challenging aspect of the project. The quality of generated documentation was heavily dependent on the prompts given to the Large Language Models (LLMs). Initial struggles with overly broad or contextually lacking prompts led to irrelevant or inaccurate outputs. Through iterative testing and refinement, the team developed more precise and context-aware prompts, significantly improving documentation quality. This experience underscored the complexity and importance of effective prompt engineering in applications requiring high accuracy and relevance.
Lastly, the team learned the critical importance of managing dependency versions. An incident where a new version of Semantic Kernel (1.3.0) caused the software to crash in Docker due to API changes highlighted the need for version consistency across development and deployment environments, and emphasized the importance of carefully managing and aligning dependency versions to ensure system stability and functionality.
Team Contributions
Jakupov Dias (Team Leader): Team management, Stakeholder communication, development of Documentation Update, Recursive Update, Update on PR comment, prompt engineering.
Chengqi Ke: development of Retrieval Augmented Generation and multi-agent communication using Semantic Kernel and AutoGen.
Fatima Hussain: development of GitHub workflows, evaluation of DocAider’s performance and effectiveness.
Tanmay Thaware: development of Retrieval Augmented Generation, evaluation of DocAider’s performance.
Tomas Kopunec: development of the Abstract Syntax Tree, Recursive Update, and HTML front-end.
Zena Wang: development of GitHub workflows and handled deployment, packaging the tool into a Docker image.
Future Work
Conclusion
DocAider successfully automated the creation and upkeep of accurate, up-to-date documentation, significantly reducing the manual workload for developers. By leveraging AI tools like Microsoft AutoGen, Semantic Kernel, and Azure AI Studio, the project addressed key challenges in maintaining consistent, real-time documentation.
While budget constraints, missing I/O examples, and the limitations of LLMs posed challenges, the project established a solid foundation for future improvements. Beyond solving the immediate need for documentation management, DocAider raised the bar for efficiency and accuracy in software development, showcasing the potential of AI-driven solutions for more advanced applications.
Call To Action
We invite you to explore DocAider further and consider how its innovative approach to documentation maintenance can be applied in your projects. Here are some steps you can take and explore the tools we used:
Connect with Us: Feel free to reach out to our team for more information or collaboration opportunities.
AutoGen: https://www.microsoft.com/en-us/research/project/autogen/
Semantic Kernel: https://learn.microsoft.com/en-us/semantic-kernel/overview/?tabs=Csharp
Special Thanks to Contributors
Each contributor's continuous support and involvement played a crucial role in the success of the project. Here, we present special thanks to the following contributors.
Lee Stott, Principal Cloud Advocate Manager at Microsoft
Diego Colombo, Principal Software Engineer at Microsoft
Jens Krinke, Senior Lecturer and Academic Supervisor
Team
The team involved in developing this project included 6 members. All of us are Master's students at UCL studying Software Systems Engineering.
Dias Jakupov – Team Leader – Full Stack Developer
GitHub URL: https://github.com/Dias2406/
LinkedIn URL: https://www.linkedin.com/in/dias-jakupov-a05258221/
Chengqi Ke – Full Stack Developer
GitHub URL: https://github.com/CQ-Ke/
LinkedIn URL: http://linkedin.com/in/chengqi-ke-9b91a8313/
Tomas Kopunec – Full Stack Developer
GitHub URL: https://github.com/TomasKopunec/
LinkedIn URL: https://www.linkedin.com/in/tomas-kopunec-425b0199/
Fatima Hussain – Full Stack Developer
GitHub URL: https://github.com/fatimahuss/
LinkedIn URL: http://linkedin.com/in/fatima-noor-hussain/
Tanmay Thaware – Full Stack Developer
GitHub URL: https://github.com/tanmaythaware/
LinkedIn URL: http://linkedin.com/in/tanmaythaware/
Zena Wang – Full Stack Developer
GitHub URL: https://github.com/ZenaWangqwq/
LinkedIn URL: https://www.linkedin.com/in/zena-wang-b63a8822b/
staff booking across timezones
Hi everyone!
Our team has 5 members: 2 in India working in IST, one working in GST, and 2 working in EST. I have kept my business hours open 24 hours since the time zones cover most of the day. As the admin, I have my time zone set to IST. How do I keep the time zone settings coherent among members across time zones?
I want the customer to be able to book timings across the time zones. How do I set up the calendar for all my staff, whose working hours do not align? Will Outlook for my EST and GST staff sync with IST if I use IST to set up their time?
old spfx project fails to bundle on windows server 2019 VM
Hi all,
I hope someone can help as I wasted a lot of time troubleshooting.
I have an old spfx project for on-premise sharepoint.
On my laptop I can bundle it using Node v8 and gulp (CLI version 2.3.0, local version 3.9.1).
I have configured a dev VM with windows 2019 and copied over the code, configured the same versions of node and gulp but when I do a bundle it fails on errors that it cannot find files referenced to in css files.
Looking at the errors, they make even less sense because the files are there, but it is looking in the wrong folder: instead of "src\assets" it is looking in "src\components\assets" (which does not exist).
I can only guess that it is the version of windows (2019) somehow impacting webpack’s translation of the relative paths.
The error message says “Can’t resolve ‘../../assets/icons/app.png'”
but the actual code file says
background-image: url(../../../assets/icons/app.png);
So you see it ignores the first "../". Why is this happening only on a server VM? (I created two VMs from scratch to make sure it is not something I did the first time.)
Word Form control markers showing on PDF export
Hello,
As the subject says, I’m getting markers showing in my PDF files that have been exported from Word. The document/template is set up with content controls for the form fields. When the form fields have been filled out, the file is exported as a PDF.
In the image the left side is the Word doc, the right side is the PDF. I’ve indicated the marks with cyan ovals.
The text has exported as an image, and the markers have exported as paths.
I have exported using “Save as Adobe PDF” from the File menu, “Create PDF” from the Ribbon and by selecting PDF from options under “Save As” in the File menu.
I’m running MS365 on Windows 11 Pro. I rarely use Word.
As this file will be used by a number of different users of varying experience levels, I am hoping there is a solution that can be applied to the template.
Any suggestions will be greatly appreciated.
Delete Like Voting
On SharePoint site pages you can like the page at the bottom. How can we delete individual like votes? The likes still show employees who left the company long ago. Is there a way to delete these with PnP PowerShell?
I accidentally cleared the data of the Microsoft Authenticator app, so my two-factor authentication codes are gone
I need this account for work. How can I fix this?
Team Calendar alternative and link to work items
We've started using the Team Calendar, but we were looking for something that could link to work items and dates so we can see how things are planned out in a calendar month(s) view by iteration cycles.
Ideally, we only want to update the work item (e.g. its due date), and have that sync across to the calendar.
I have tried looking for an extension with no luck, and am asking in case someone has found a solution.
Price increase D365 Sandboxes – did I miss something?
Dear all
last week, we received this mail regarding a tier 5 Sandbox and its pricing:
I know about the upcoming price update as of October 1, but a) it is not October yet, and b) this SKU was not announced to be increased at all.
Is this a mistake in Microsoft’s billing system? (almost impossible – ok, sarcasm off now)
Thanks for your inputs
cheers
Daniel
Simple import data and lookup
I want to import data from Excel and update a table in SQL Server.
The Excel file has a data column and a PK to compare with the DB table, in order to update the DB table column.
I am able to get the data using an Excel source, and I am now looking for how to update one column, IsValid, in the database table using the Excel data (the headers in Excel are the same as the headers in the SQL table, and the data is also the same).
The Excel file and the DB table both have a primary key, ID. If the ID from Excel matches the one in the DB, I want to update the table's IsValid column using Excel's IsValid field.
Can this be done using a Lookup, or using a Merge?
Cross-forest and tenant free/busy and GAL sharing
Is it possible to set up free/busy and GAL sharing between two organizations, each with its own distinct, untrusted forest, where one has Exchange on-premises only and the other has Exchange Online only?
Thanks
Unable to run signed scripts using Live response
Hi,
Our scripts uploaded to the MDE library are signed with a certificate. MDE throws an error upon running any of the signed scripts. However, when these scripts are executed using a standalone PowerShell console, they work as intended. MDE support suggests that in order to execute signed scripts, we need to install the certificate on the device before executing them.
What's interesting is that on the system where we ran the script successfully (using the PS console), the certificate was not installed. Also, we were able to validate the cert chain using PowerShell.
Any suggestions on what can be done here? We were hoping MDE executes scripts the same way PS does, and we don't intend to install the cert on every device where the script gets executed.
Thank you !!
Suggestion to change the order in which tasks get added in lists
Hey, I find Microsoft To Do to be an excellent and simple application; however, in some ways I have found it to come across as unrefined and unpleasant to use.
Irritatingly, when adding a task to a list, the task is added at the top of the list, meaning that if your list has an order, you have to add from the bottom up or drag each added task to the bottom.
It would greatly improve the usability of the app if tasks were added to the bottom of the list.
Thank you for your time and consideration, and I hope that you will continue the great work that keeps my tasks organised.
Best regards.
Azure OpenAI Service Dev Day Conference in Japan
Azure OpenAI Service Dev Day, a conference focused on the Azure OpenAI Service, was held in Tokyo. The event attracted 700 attendees, beginning with a keynote session in the morning and featuring two tracks of breakout sessions, sponsor booths, Ask the Speaker areas, and a networking party in the evening. It was a full day packed with energy from both the participants and the organizers, providing an opportunity to learn about the latest updates and use cases of Azure OpenAI Service while expanding professional networks.
What stands out about this conference is that it was organized by a volunteer-led community, not a Microsoft-led event. The event was a festival of multiple technical communities utilizing Azure OpenAI Service, and its planning and execution—from conceptualization to day-of operations—were managed by Microsoft MVPs, Microsoft Regional Directors, and community leaders, who usually work within their own groups but came together for this occasion. The opening session kicked off the event by clearly stating, “This is a technical community event,” which created a strong sense of unity among the attendees, forming a new community in the process.
Kazuyuki Miyake a Microsoft Azure MVP and Microsoft Regional Director, who is one of the founders and co-organizers of this conference, shares the inspiration behind hosting the event: “The inspiration behind hosting this conference was the desire to take action on the generative AI movement that has been building over the past year and a half. We aimed to provide a platform to share insights on generative AI from a developer’s perspective. Additionally, it was important for us to extend the reach of community-driven initiatives from Japan to the whole of Asia, aligning with the mission of the Microsoft Regional Director.”
The organizing team members who gathered early in the morning to attend a meeting at the venue
At this conference, a wide range of sessions were held by industry experts who are leveraging Azure OpenAI Service in business across various sectors in Japan, from innovative startups to globally renowned enterprises. Kazuyuki comments on this, saying, “One of the highlights of the conference was the visit from key AI figures from Microsoft headquarters, who introduced the latest roadmap. Additionally, the breakout sessions featured exciting talks by Azure AI engineers, including several Microsoft MVPs. Moreover, leading Japanese automotive manufacturers and innovative startups showcased their AI usage cases, adding immense value to the event.”
Microsoft speakers, including Marco Casalaina, the Vice President of Products for Azure AI at Microsoft Corporation, who flew in from the United States, along with the event organizers Shingo Yoshida (third from the right) and Kazuyuki Miyake (right)
(From left) Azure MVP speaker, Tatsuro Shibamura, and community leader speaker, Nahoko Ushirokawa
Adding even more diversity to the conference was Mijeong Jeon, a Korean AI Platform MVP. Kazuyuki, with the desire to bring new perspectives not found in Japan, invited her as a speaker from Korea, where both the government and private sector are actively working together on AI initiatives, including the Seoul AI Hub mentioned in our recent blog article, Microsoft AI Tour and Community Engagements in Seoul.
When preparing her presentation for a different audience than usual, Mijeong considered the interests and trends among Japanese developers. With a suggestion from Kazuyuki, she delivered a session explaining Prompty, a topic that had limited information in Japan, which was met with great enthusiasm from the audience. Reflecting on her experience at this event, she shares her thoughts, “This was my first time presenting in a different country, and I anticipated that it might differ from my usual presentations in Korea. However, upon arriving at the venue and meeting the Japanese audience, I was struck by their deep passion for AI and their eagerness to incorporate this technology into their work. I was pleasantly surprised to find that this enthusiasm for cutting-edge technology transcends language barriers and is shared by both Japanese and Korean developers.”
Despite it being her first time presenting at an international conference, Mijeong provided a technical explanation in front of a large audience
Additionally, based on interactions with participants during the event, there was a growing sense of the importance of hosting cross-country events between countries where communication tends to occur more often in the participants’ native languages rather than in English. “One particular moment that stood out was when an audience member approached me after my session to ask detailed questions about the practical aspects of building LLM services in Korea. Our conversation delved into several key issues, including the performance differences between English prompts versus Japanese or Korean prompts when using English based LLMs, and how token usage can vary significantly depending on the language. We also discussed the challenges and benefits of using models specifically developed for their respective languages. Through this exchange, we discovered that, as non-English language users, we share many similar concerns and experiences when working with large language models. This discussion underscored the importance of cross-country events, especially in the field of language models.”
After the session, Mijeong engaged in discussions with participants at the Ask the Speaker area
After each participant had increased their motivation to utilize Azure OpenAI Service and improve their skills, the evening party, AOAI “Connect” Night, was held. The entire event space was utilized, with DJs, including Microsoft Regional Directors, playing music. Some participants enjoyed dancing, others engaged in conversations about technology, and still others participated in a quiz contest or enjoyed giving presentations during the Lightning Talk session. Everyone enjoyed the post-event time in their own way. The organizers led all of these activities, and their vibrant energy in making the event more exciting was highly impressive.
Microsoft Regional Directors Kazuyuki Miyake and Atsushi Kojima, DJ at the event.
Staff energizing the Lightning Talk (from left: Kazuyuki Sakemi, Maki Nagase — Azure MVPs from this August)
250 French cruller donuts, prepared because their design closely resembles the OpenAI logo
Kazuyuki summarized the conference, reflecting with the words “I was astonished that despite having a preparation period of only about two months, we reached approximately 700 attendees. Additionally, it was fascinating that Microsoft speakers introduced some unreleased features, even though it was a community-led conference. The social gathering was also packed with enjoyable activities like LT sessions and a DJ performance, making it a memorable experience.”
While many events centered around generative AI are held worldwide, in-person events specifically focused on Azure OpenAI Service like this one are quite rare. It was a truly valuable learning opportunity. Based on this experience, Kazuyuki holds hopes for the future, aiming to further spread the excitement around Azure AI. “We plan to establish an Azure AI developer community, building on the team that organized this conference. Moreover, we aim to expand community activities not only in Tokyo but also to other regions in Japan and across Asian countries. We are committed to providing a platform for Microsoft MVPs and engineers from various regions and companies to share their insights.”
Mijeong, the only international MVP speaker, offered the following encouraging message for those looking to further develop their skills in Azure OpenAI Service and related technologies. “I believe there are many individuals eager to develop their skills in Azure OpenAI Service and the broader field of large language models (LLMs), and I count myself among them. This is a new and rapidly emerging area, which makes all of us pioneers in the field. To excel as a pioneer, I think the best approach is to experiment with different techniques and apply them to real-world scenarios. Unlike more established technologies, there is still a relative scarcity of documentation and resources available, so hands-on experience is invaluable. Language itself presents unique challenges, such as its diversity, reliance on context, and subtle cultural nuances. Because of this, we can’t always depend on the experiences of others; instead, we must create our own journeys. This is a crucial part of mastering this field.”
She continues, “Moreover, the pace of development in language models and related technologies is incredibly fast. It’s important to focus on what truly matters to your work, given the limited time available. By experimenting with various technologies and keeping an eye on company blogs that introduce the most suitable ones, attending tech events, and following thought leaders’ blogs and YouTube channels, you can quickly broaden your perspective. I hope you discover the best path to mastering cutting-edge technologies, and I look forward to the day when we are on the same page, exploring these advancements together.”
The organizing team members who supported the amazing event until the very end
*Picture Credits: Kensuke Nakai (cover picture: Rie Moriguchi)
Transaction volumes supported, eIDAS compliance, API integration capability
Hi,
I'd like to know the API integration capability of Microsoft eSignature, the transaction volumes supported, pricing, compliance with eIDAS and GDPR, multiparty signature capability, dashboard features, etc. Please share or contact me. Regards
Error 500 when I use filters
I am trying to filter a query on a SharePoint list.
If I add "?$filter=..." to the request, I get a 500 error.
No matter the kind of filter, it fails.
Do we have to do anything special to activate filters? I connect using REST with C#.
Apple iOS devices app restrictions configuration profile – Please review Policy option
I am not sure where to click on this screen to accept the privacy policy; there is no check mark I can click to accept it. How can I proceed to the next step in the configuration profile for Apple iOS devices with restrictions? I have no clue where to click. This is urgent; I need this done by tomorrow. Please help!