Month: September 2024
Formula (VBA generated) fills the entire column of selected cell
Hello everyone
I’m currently experiencing an issue with my Excel (365). The code is supposed to insert a formula into a cell when I double-click it, but it fills the entire column of the selected cell with the formula.
Private Sub Worksheet_BeforeDoubleClick(ByVal Target As Range, Cancel As Boolean)
    Dim r As Integer
    If Not Intersect(Target, Me.Range("F3:R65")) Is Nothing And IsEmpty(Target.Value) Then
        r = Target.Row
        Target.Formula = "=IF($C" & r & ">$D" & r & ",$D" & r & "+1-$C" & r & ",$D" & r & "-$C" & r & ")"
        Cancel = True
    End If
End Sub
Before:
After (Double-clicked cell I15):
Sqlserver- Not Null Cases
Hi,
Please help me with the requirement below.
My table has 5 columns, and ID is the key column.
I need logic like this:
1. If Reason1 is null and Reason2 is not null, output New_Txt.
2. If Reason1 is not null and Reason2 is not null, output Old_txt and also output New_Txt.
(In that case it has to generate 2 different rows.)
Base Table:
ID     R1    R2       Old_txt  New_txt
11200  NULL  PRO0003  A        B
11200  P03   Q08      G        D
11200  Q08   Q09      H        E
Required output:
ID     R1    R2    Old_Txt  New_Txt
11200  Null  P03            B
11200  P03   Q08   G
11200  P03   Q08            D
11200  Q08   Q09   H
11200  Q08   Q09            E
Access Projects through Planner
Hello,
Since the new Planner update in Teams, I can see projects of groups that I'm not even part of. Where is access to Projects controlled?
Thanks
Timo Schuldt
Building Retrieval Augmented Generation on VSCode & AI Toolkit
Retrieval Augmented Generation (RAG) on VS Code AI Toolkit:
The AI Toolkit allows users to quickly deploy models locally or in the cloud, test and integrate them via a user-friendly playground or REST API, fine-tune models for specific requirements, and deploy AI-powered features either in the cloud or embedded within device applications.
In the previous blogs, we learnt how to get started with the AI Toolkit by installing it and creating a basic application. Please refer to those blogs for detailed insights and updates if this is your first time using the VS Code AI Toolkit.
Visual Studio Code AI Toolkit: Run LLMs locally
Visual Studio AI Toolkit: Building GenAI Applications
Retrieval Augmented Generation (RAG):
LLMs are trained on specific datasets from various domains. When you work with an LLM, there may be information specific to your dataset or domain that the LLM doesn't have enough knowledge about. Generated language might also need to be referenced against the domain and use case. RAG is used in these cases to increase applicability to specific domains and datasets. For example, the LLM might know about legal services in general, but if you want to reference specific statutes in US law and get citations to them, RAG is a good approach. A similar approach can be applied to any other country's laws as well.
Retrieval-Augmented Generation (RAG) is a hybrid approach in natural language processing (NLP) that combines two key elements: retrieval of relevant information from external data sources and generation of text based on this retrieved information.
Retrieval: These models retrieve relevant text from a large repository of documents when generating responses or completing tasks. They excel at providing factual and specific information. A better retrieval mechanism leads to a more accurate and relevant response.
Instead of relying solely on learned patterns and data, RAG incorporates a retrieval step. This involves searching or retrieving specific relevant snippets from a large database of documents related to the input query or task. RAG improves the accuracy and relevancy of the response based on the domain specific document repository.
Generation: These models generate responses from scratch based on the learned patterns and data they were trained on. They can create fluent and contextually appropriate text but may struggle with accuracy and factual correctness on specific queries. For example, if we ask the model for the current dollar rate, it can only generate a response from its training data, which is likely to be inaccurate: for dynamic information such as news and currency prices, the model is accurate only up to its training cutoff. Connecting it to a reliable source extends the model to fetch the answer from the right place. Similarly, when the model answers from its pretrained data, it may not be able to quote a reference for the answer, which can be limiting if the user doesn't know the source. With RAG, the source can be referenced as the answer is generated.
Some Applications:
Question Answering: RAG can excel in tasks where precise answers backed by evidence are required, such as in open-domain question answering systems.
Content Creation: It can also be used to generate content that is both informative and accurate, leveraging retrieved knowledge to enhance the generation process.
Summarization: RAG can also be used to create both abstractive and extractive summaries. Extractive summaries identify the important sentences from the document(s) and generate a summary based on their relative importance. Abstractive summaries identify the most important ideas and content and synthesize them in their own words.
In this series, let's create a basic RAG application.
Let's discuss the architecture in two parts: the first is the creation of the database, and the second is retrieval.
Creation of Database
We will use a PDF file for the RAG implementation. We will first extract text from the PDF file and then convert it into smaller pieces of text, often referred to as 'chunks'; the process of doing so is known as 'chunking'. This process helps the language model extract the right document without exceeding the context limit. Chunk size and chunk overlap must be balanced well for good results.
Once we have document chunks, we will next proceed to convert them into embeddings. Embeddings are a foundational concept in NLP. Embeddings enable machines to understand and process human language more effectively by representing words or phrases as vectors in a continuous space where semantic relationships are encoded.
How Do Embeddings Work?
Vector Representation: Each word or phrase is represented as a vector of real numbers. For example, in a 300-dimensional embedding space, each word might be represented by a vector with 300 numerical values.
Semantic Similarity: Words with similar meanings are represented by vectors that are closer together in the embedding space. For instance, vectors for "dog" and "cat" would be closer than vectors for "dog" and "car"; the short sketch after this list makes this concrete.
Learned from Data: Embeddings are learned from large amounts of textual data using techniques like Word2Vec, GloVe (Global Vectors for Word Representation), or through neural network-based approaches such as Transformer models.
Applications: Embeddings are widely used in NLP tasks such as sentiment analysis, machine translation, text classification, and more. They allow models to generalize better to data they have not seen before and capture intricate relationships between words.
Once we have the embeddings, we store them in a special kind of database called a vector database.
Vector Databases
Vector databases are specialized for storing and retrieving vector data efficiently, making them an essential component in applications that rely on similarity-based search and analysis of high-dimensional data vectors. ChromaDB is one such AI-native open-source vector database, and it is the one used in this tutorial. To learn more about ChromaDB, click here.
We will be utilizing ChromaDB through the LangChain framework in this tutorial.
Setting up a virtual environment (venv) is highly recommended while following this blog. Python is a prerequisite; if it is not installed, please install the latest version of Python on the machine. For detailed steps, click here. Once the environment is set up, ChromaDB needs to be installed using the Python package installer, pip.
In the VSCode terminal type the following command,
pip install chromadb
We will also utilize LangChain, a widely used open-source framework for developing GenAI applications. LangChain is used as an orchestrator to build customizable AI applications, especially applications that use LLMs/SLMs. It provides tools and abstractions that make it easier to customize, control, and integrate LLMs into applications.
The following are some of the major components of Langchain.
Prompts: Prompts are the text instructions or questions that you provide to the LLM. Well-crafted prompts are crucial for getting accurate and relevant responses from the LLM. LangChain provides templates to structure prompts and make them more reusable.
LLMs: Large Language models (LLMs) like GPT-4o, LLaMA, and others are pre-trained models with vast knowledge and capabilities. LangChain seamlessly integrates with popular LLM providers. You can also use your own custom-trained models.
Chains: Chains are responsible for orchestrating the flow of data and interactions between different components.
Types of Chains:
Sequential Chains: Execute components in a linear order.
Parallel Chains: Execute components concurrently.
Conditional Chains: Execute components based on certain conditions.
Generative Chains: Generate text or other outputs.
Custom Chains: You can create your own custom chains to suit specific use cases.
Callbacks:
Monitoring and Logging: Callbacks provide a way to track the progress of your application and log important events.
Customizations: You can implement custom callbacks to perform actions like sending notifications or storing data.
Indexes:
Document Retrieval: Indexes are used to store and retrieve documents that can be used as context for LLMs.
Vector Databases: LangChain supports various vector databases for efficient document retrieval.
Agents:
Autonomous Actions: Agents are capable of taking actions based on the information they gather from the environment.
Decision-Making: Agents use LLMs to make decisions and complete tasks.
Memory:
Context Preservation: Memory allows LLMs to maintain context and remember information from previous interactions.
Types of Memory:
Conversation Memory: Stores the history of a conversation.
Document Memory: Stores information from documents.
Episodic Memory: Stores information about past events.
Tools:
External Integration: Tools enable LLMs to interact with external resources like search engines, calculators, or APIs.
Expanding Capabilities: Tools can enhance the functionality of your applications.
By effectively combining these components, you can create a wide range of applications, including chatbots, question-answering systems, text summarization tools, and more.
To install, type the following command,
pip install langchain
LangChain Community contains third-party integrations that implement the base interfaces defined in LangChain Core, making them ready-to-use in any LangChain application.
Type the following command in VSCode terminal to install the Langchain community,
pip install langchain-community
Let's now begin by importing the required libraries into our code editor. It is recommended to use a notebook file (.ipynb) for this part. Head to Visual Studio Code, create a new file, and name it dbmaker.ipynb. With the notebook file in place, set the kernel to the virtual environment we created earlier, or use the Python version installed on the local machine.
Now import the following libraries,
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.document_loaders import DirectoryLoader,PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
It is quite important to understand the use of the above statements, so let’s look at these one by one,
Chroma:
Module: langchain_community.vectorstores
Purpose: Chroma is a vector store that allows you to store and retrieve high-dimensional vectors. It is used to store embeddings of documents or text and retrieve them based on similarity searches.
Usage: Typically used in applications that require efficient similarity searches, such as document retrieval or question-answering systems.
HuggingFaceEmbeddings:
Module: langchain_community.embeddings
Purpose: HuggingFaceEmbeddings provides a way to generate embeddings using models from the Hugging Face library. These embeddings are numerical representations of text that capture semantic meaning.
Usage: Used to convert text into embeddings that can be stored in a vector store like Chroma for similarity searches.
DirectoryLoader:
Module: langchain_community.document_loaders
Purpose: DirectoryLoader is used to load documents from a specified directory. It can handle various file types and is useful for batch processing of documents.
Usage: Commonly used to load a large number of documents from a directory for further processing, such as embedding generation or text splitting.
PyMuPDFLoader:
Module: langchain_community.document_loaders
Purpose: PyMuPDFLoader is a specialized document loader that uses the PyMuPDF library to load and process PDF documents.
Usage: Used to extract text from PDF files, which can then be processed further, such as generating embeddings or splitting text.
RecursiveCharacterTextSplitter:
Module: langchain.text_splitter
Purpose: RecursiveCharacterTextSplitter is used to split text into smaller chunks based on character count. It recursively splits text to ensure that chunks are of manageable size while preserving semantic meaning.
Usage: Useful in scenarios where large documents need to be broken down into smaller, more manageable pieces for processing, such as embedding generation or indexing.
Once we have done these imports successfully, it’s time now to specify the directory of the documents and also to specify the embedding model.
#Directory of the PDF Files
dir = 'docs/'
# Open-source embedding model from Hugging Face
embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')
This line defines a variable ‘dir’ that holds the path to a directory containing PDF files. This directory path will be used later to load and process the PDF files stored in this location.
The model that will be used in this tutorial is all-MiniLM-L6-v2, which is a pre-trained model from Hugging Face.
The embeddings object will be used to convert text into numerical embeddings. These embeddings capture the semantic meaning of the text and can be used for various tasks such as similarity searches, clustering, or feeding into other machine learning models.
It’s now time to load the documents from the library, so let’s create a function to achieve this,
#Loading the documents
def load_docs(dir):
    loader = DirectoryLoader(dir, loader_cls=PyMuPDFLoader, use_multithreading=True, max_concurrency=128, show_progress=True, silent_errors=True)
    documents = loader.load()
    return documents
Now let’s step through the code and explain what we are doing here,
The load_docs function is designed to load documents from the specified directory. It uses the DirectoryLoader class to handle the loading process, with specific configurations to optimize performance and handle unforeseen errors gracefully.
The load_docs function accepts one parameter: the directory path where the PDF files are located.
Initialize DirectoryLoader:
dir: The directory containing the PDF files.
loader_cls=PyMuPDFLoader: Specifies that the PyMuPDFLoader class should be used to load the PDF files. This loader is specialized for handling PDF documents.
use_multithreading=True: Enables multithreading to speed up the loading process.
max_concurrency=128: Sets the maximum number of concurrent threads to 128. This allows for parallel processing of multiple files.
show_progress=True: Displays a progress bar to indicate the loading progress.
silent_errors=True: Suppresses error messages, allowing the loading process to continue even if some files fail to load.
Load Documents:
documents = loader.load(): Calls the load method of the DirectoryLoader instance to load the documents from the specified directory.
Return Documents:
return documents: Returns the loaded documents.
This function will now be used when we create chunks. Throughout this tutorial we follow a functional paradigm so that these functions can be reused efficiently wherever needed.
To create chunks, let's now design a function.
#Splitting the documents into chunks
def split_docs(documents, chunk_size=1000, chunk_overlap=100):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    docs = text_splitter.split_documents(documents)
    return docs
The split_docs function is designed to split a list of documents into smaller chunks. This is useful for processing large documents in manageable pieces, especially for tasks like embedding generation or indexing, and it helps when only limited context can be sent to a model with a token limit.
Parameters of split_docs:
documents: A list of documents to be split. Each document is typically a string or a structured object containing text.
chunk_size (default 1000): The maximum number of characters in each chunk.
chunk_overlap (default 100): The number of characters that overlap between consecutive chunks. This helps to maintain context across chunks.
The chunk size and chunk overlap need to be tweaked according to the use case, but for this tutorial we keep them at standard default values. The sketch below shows how the two parameters interact.
Steps
Initialize RecursiveCharacterTextSplitter:
chunk_size=chunk_size: Sets the maximum size of each chunk.
chunk_overlap=chunk_overlap: Sets the number of overlapping characters between chunks.
The RecursiveCharacterTextSplitter is designed to split text into chunks while preserving semantic meaning as much as possible.
Split Documents:
docs = text_splitter.split_documents(documents): Calls the split_documents method of the text_splitter instance to split the input documents into smaller chunks.
Return Chunks:
return docs: Returns the list of document chunks.
We have now finished designing the functions that will help us perform the tasks needed to create a vector database.
It's time to use these functions and create the vector database:
documents=load_docs(dir)
len(documents)
doc=split_docs(documents)
print(len(doc))
Let’s call the load_docs function with the directory path, to load the documents from the specified directory. The loaded documents are stored in the documents variable. Then, to check the length of the documents, we can use the len() function.
Now call the split_docs function with the loaded documents to split them into smaller chunks. The resulting chunks are stored in the doc variable. The length of the doc list is printed, indicating how many chunks were created from the original documents.
Now that we have the chunks ready, it's time to embed and store them in a vector database. As already discussed, we will use ChromaDB in this tutorial.
save_to = Chroma.from_documents(documents=doc, embedding=embeddings, persist_directory='./ai-toolkit')
The above line of code initializes a Chroma vector store from a list of document chunks and their corresponding embeddings. The vector store is then saved to a specified directory for persistence.
Parameters
documents=doc: The list of document chunks that were created by the split_docs function.
embedding=embeddings: The embedding model used to generate embeddings for the document chunks. This is an instance of HuggingFaceEmbeddings.
persist_directory='./ai-toolkit': The directory where the vector store will be saved. This allows the vector store to be persisted and loaded later.
Our database is now ready, and we can run a sample search query against it. Although we haven't yet connected the small language model, we can still see how the retriever works, so let's ask a sample query:
query = "What is Fine tuning"
We have defined our query; now it's time to search it in our newly created vector database named 'ai-toolkit'. To do this, we take the following steps:
Initialize a Chroma vector store by loading it from the specified directory.
Perform a similarity search on the vector store using the provided query.
Print the entire list of search results to the console.
Print the content of the first document or chunk in the search results.
The code is as follows,
db1 = Chroma(persist_directory='./ai-toolkit', embedding_function=embeddings)
results=db1.similarity_search(query)
print(results)
print(results[0].page_content)
Upon successful execution, results will appear in the output cell.
In this article we have learned the concepts behind RAG and successfully created embeddings. In the next part of this article, we will take a look at how we can use RAG with these embeddings in ChromaDB to get better results, using the AI Toolkit for VS Code and a Phi-3 model downloaded locally.
Meanwhile, you can take a look at the following resources about RAG and the AI Toolkit:
AI Toolkit for VSCode Documentation
AI Toolkit for Visual Studio Code (Github)
RAG in AI studio (Concepts)
RAG in AI Search
Phi-3 Cookbook
DocAider: Automated Documentation Maintenance for Open-source GitHub Repositories
Project Overview
Comprehensive documentation is crucial for end users of open-source software projects, but manual creation and maintenance are time-consuming and costly. Traditional documentation generators like Pydoc rely on predefined rules and in-line code information. However, emerging generative AI techniques, particularly Large Language Models (LLMs), offer new possibilities for enhanced documentation generation. Developed in partnership with Microsoft and UCL, DocAider aims to create an AI-powered documentation tool that automatically generates and updates code documentation. The tool leverages GitHub Actions workflows to trigger documentation tasks when pull requests (PRs) open, providing valuable insights into continuous documentation maintenance. This approach addresses the challenges of automating documentation and ensures that project documentation remains current with minimal human intervention. The project leverages LLM technologies, combined with Microsoft Semantic Kernel, Microsoft AutoGen, and Azure AI Studio, to mitigate the burden of maintaining up-to-date documentation.
This system uses a multi-agent architecture where multiple agents work together to complete the task. It offers two innovative features: a recursive update mechanism, which ensures that changes ripple throughout all related documentation, and continuous monitoring and updating code via pull requests.
DocAider offers a promising solution for software engineers, with the potential to automatically maintain clean and up-to-date documentation. It allows developers to concentrate more on coding while simplifying the onboarding process for new team members. Additionally, it helps reduce costs and boosts overall efficiency.
Project Journey
This project was completed over 3 months. In the first few weeks, the team focused on the requirements engineering portion of the project, where we set functional and non-functional requirements, created context and architecture diagrams, and broke the project down with the stakeholders so that it was easy to implement in the following months, making sure we included the most important features and requirements. This process also allowed the team to see how much was realistically achievable, and what should be kept as optional if time allowed.
During implementation, our team employed agile methodologies, Git practices, and continuous integration and testing. We chose agile as our development approach because it facilitated constant communication between the team and stakeholders. This strategy proved crucial to our product’s success. Bi-weekly meetings with stakeholders allowed us to report progress, plan upcoming tasks and clarify the requirements. Additionally, we held weekly internal meetings for team members to showcase their work and assess overall progress.
Technical Details
DocAider is an LLM-powered tool that generates and updates documentation automatically. It performs the documentation tasks using a customised GitHub Actions workflow and runs in the background. We developed DocAider by integrating Semantic Kernel and AutoGen. The tools facilitate the development of AI-based software. Furthermore, we deployed and managed Azure OpenAI LLMs on Azure AI Studio. To obtain good results, we used the GPT-4-0125-preview model to create documentation for the source code. The temperature parameter was set between 0 and 0.2 for more deterministic and factual LLM responses.
Documentation Generation
AutoGen provides multiple conversation patterns to orchestrate AI agents, such as sequential chats, group chats, nested chats, etc. We used the sequential chats to create documentation. The figure shows our multi-agent architecture, which reduces LLM hallucinations in two ways: appropriate code context information and self-improvement. Four agents perform different tasks in sequence and an agent manager controls the multi-agent conversation.
Code Context Agent creates a graph representation of the entire repository, mapping the relationships between function calls. It then generates comprehensive information about the codebase using the actual source code and the relationship graph.
Documentation Generation Agent produces baseline documentation taking into account the contextual information passed from the previous agent. The documentation contains three basic sections: overview, class/function/method descriptions and input/output examples.
Review Agent assesses the baseline documentation, and suggests the improvements.
Revise Agent modifies the baseline documentation according to the suggestions and returns the improved documentation to Agent Manager.
Agent Manager controls the conversation process and responds to function calling requests from LLM-configured agents.
By using Semantic Kernel, we built skilled agents for performing specific tasks. AutoGen facilitates agent interactions to complete complex workflows. The LLM function calling capability helps to reduce programming efforts and makes agents flexible. An agent can autonomously execute external functions defined in the associated plugins to complete a variety of tasks.
Documentation Update
To maintain consistency and accuracy across all related documentation when a class/function in a file is changed, the Documentation Update feature performs the update recursively. If a class/function is modified, the system will automatically update the documentation for all dependent files. This includes documentation of the source file, as well as documentation of files that use functions dependent on the changed class/function. This recursive update feature ensures that all related documentation remains up-to-date with the latest changes in the code.
Additionally, Documentation update on PR Comment allows reviewers to trigger the documentation updates on specified files by commenting in a specific format. The reviewer can specify which file needs an update and provide instructions on what changes should be made. The system will then process this comment and update the documentation as instructed. This feature ensures that precise and targeted documentation updates can be made based on reviewer feedback, improving the overall quality and relevance of the documentation. Furthermore, it removes the need for developers to manually change documentation according to reviewers’ comments. The comment triggering this process needs to be in this format:
“Documentation {file_path}: {comment}”. For example, “Documentation main.py: Add more I/O examples”.
Results and Outcome
Our evaluation process involved three stages: a case study executing our system on a well-known repository to showcase it, a comparison of our system against RepoAgent, and a quantitative analysis. Through this process, we could determine our system's performance.
Case Study:
This section presents results from applying our tool to generate and update documentation for the Graphviz repository's Python files. The system produced well-structured documentation, including overviews, global variables, function/class descriptions, and I/O examples, providing clear explanations of file purposes and usage guidelines. When updates were made to the base.py file, adding logging functionality and a new method, the system successfully incorporated these changes while preserving existing content. The system also demonstrated its ability to handle recursive updates, propagating changes from the ParameterBase class in base.py to dependent files like engine.py. Additionally, it responded effectively to a PR comment requesting more input/output examples, showcasing its capacity to incorporate reviewer feedback. Overall, the multi-agent system proved capable of generating, updating, and maintaining comprehensive documentation across all files in a software repository.
Comparison with RepoAgent
We compared DocAider, a multi-agent documentation system, with RepoAgent, another LLM-based documentation generation tool. While RepoAgent produces lengthy paragraphs, DocAider generates concise, bullet-pointed documentation, aligning with developers’ preferences for brevity. DocAider’s multi-agent approach potentially enhances accuracy and reduces hallucinations compared to RepoAgent’s single-agent system. DocAider also implements a Reviewer and Revisor agent to suggest and apply improvements. A notable feature of DocAider is its HTML-based front-end interface, which improves documentation accessibility and organization – factors highly valued by developers.
While our system is well-designed and offers unique features like recursive updates, RepoAgent stands out by providing thorough I/O examples for every function. However, LLMs can make incorrect assumptions as function complexity increases, leading to factual inaccuracies or nonsensical outputs. To mitigate this, we restrict the LLM from making such assumptions, resulting in some functions/classes lacking I/O examples.
Quantitative Analysis
We conducted a quantitative analysis of DocAider's performance across six popular GitHub repositories: colorama, fake-useragent, graphviz, photon, progress, and pywhat. These repositories were selected based on their popularity (over 1000 stars each) and size (small to medium, limited to 20 files per repository). All of them varied in the number of functions and classes. Scores are normalised between 0 and 1, reflecting the presence of these attributes in the documentation. For instance, a score of 1 for Function/Class Description indicates that every class and function in the repository is described in the documentation, while a score of 0.5 for I/O examples means that only half of the functions have I/O examples provided in the documentation.
DocAider achieved perfect scores (1.0) for function/class descriptions across all repositories, demonstrating consistent performance. For parameters/attributes, most repositories received perfect scores, with only colorama scoring slightly lower at 0.94 due to two functions lacking parameter documentation. I/O examples showed the most variation, with scores ranging from 0.54 (photon and progress) to 0.88 (colorama). Lower scores were often due to specific function types without return values (e.g., class init methods) or complex logic that made example generation challenging. Return value documentation was consistently strong, with all repositories scoring 1.0.
Overall, DocAider is proficient in many areas, such as generating function/class descriptions and handling most documentation aspects. However, there is room for improvement in consistently documenting I/O examples, particularly for functions with more complex logic.
Lessons Learned
The development of DocAider provided valuable insights across several key areas. Firstly, the adoption of a multi-agent approach proved crucial in managing the system's complexity. Initially, a single-agent design led to issues such as hallucinations and incomplete documentation. By transitioning to a multi-agent architecture, the team was able to distribute tasks across specialized agents, each handling specific aspects of the documentation process. This approach significantly improved the accuracy and reliability of the documentation while also enhancing system scalability. The success of this strategy highlighted the importance of modular design and task specialization in complex AI-driven systems.
Secondly, prompt engineering emerged as a critical and unexpectedly challenging aspect of the project. The quality of generated documentation was heavily dependent on the prompts given to the Large Language Models (LLMs). Initial struggles with overly broad or contextually lacking prompts led to irrelevant or inaccurate outputs. Through iterative testing and refinement, the team developed more precise and context-aware prompts, significantly improving documentation quality. This experience underscored the complexity and importance of effective prompt engineering in applications requiring high accuracy and relevance.
Lastly, the team learned the critical importance of managing dependency versions. An incident where a new version of Semantic Kernel (1.3.0) caused the software to crash in Docker due to API changes highlighted the need for version consistency across development and deployment environments. This experience emphasized the importance of carefully managing and aligning dependency versions to ensure system stability and functionality.
Team Contributions
Jakupov Dias (Team Leader): Team management, Stakeholder communication, development of Documentation Update, Recursive Update, Update on PR comment, prompt engineering.
Chengqi Ke: development of Retrieval Augmented Generation and multi-agent communication using Semantic Kernel and AutoGen.
Fatima Hussain: development of GitHub workflows, evaluation of DocAider’s performance and effectiveness.
Tanmay Thaware: development of Retrieval Augmented Generation, evaluation of DocAider’s performance.
Tomas Kopunec: development of Abstract Syntax Tree, Recursive Update and HTML front-end.
Zena Wang: development of GitHub workflows and handled deployment, packaging the tool into a Docker image.
Future Work
Conclusion
DocAider successfully automated the creation and upkeep of accurate, up-to-date documentation, significantly reducing the manual workload for developers. By leveraging AI tools like Microsoft AutoGen, Semantic Kernel, and Azure AI Studio, the project addressed key challenges in maintaining consistent, real-time documentation.
While budget constraints, missing I/O examples, and the limitations of LLMs posed challenges, the project established a solid foundation for future improvements. Beyond solving the immediate need for documentation management, DocAider raised the bar for efficiency and accuracy in software development, showcasing the potential of AI-driven solutions for more advanced applications.
Call To Action
We invite you to explore DocAider further and consider how its innovative approach to documentation maintenance can be applied in your projects. Here are some steps you can take and explore the tools we used:
Connect with Us: Feel free to reach out to our team for more information or collaboration opportunities.
AutoGen: https://www.microsoft.com/en-us/research/project/autogen/
Semantic Kernel: https://learn.microsoft.com/en-us/semantic-kernel/overview/?tabs=Csharp
Special Thanks to Contributors
Each contributor's continuous support and involvement played a crucial role in the success of the project. Here, we present special thanks to the following contributors.
Lee Stott, Principal Cloud Advocate Manager at Microsoft
Diego Colombo, Principal Software Engineer at Microsoft
Jens Krinke, Senior Lecturer and Academic Supervisor
Team
The team involved in developing this project included 6 members. All of us are Master's students at UCL studying Software Systems Engineering.
Dias Jakupov – Team Leader – Full Stack Developer
GitHub URL: https://github.com/Dias2406/
LinkedIn URL: https://www.linkedin.com/in/dias-jakupov-a05258221/
Chengqi Ke – Full Stack Developer
GitHub URL: https://github.com/CQ-Ke/
LinkedIn URL: http://linkedin.com/in/chengqi-ke-9b91a8313/
Tomas Kopunec – Full Stack Developer
GitHub URL: https://github.com/TomasKopunec/
LinkedIn URL: https://www.linkedin.com/in/tomas-kopunec-425b0199/
Fatima Hussain – Full Stack Developer
GitHub URL: https://github.com/fatimahuss/
LinkedIn URL: http://linkedin.com/in/fatima-noor-hussain/
Tanmay Thaware – Full Stack Developer
GitHub URL: https://github.com/tanmaythaware/
LinkedIn URL: http://linkedin.com/in/tanmaythaware/
Zena Wang – Full Stack Developer
GitHub URL: https://github.com/ZenaWangqwq/
LinkedIn URL: https://www.linkedin.com/in/zena-wang-b63a8822b/
Custom mask Simscape blocks
I created a custom Simscape library with my own blocks using .ssc files. After compiling, MATLAB generates a mask. However, I am only able to add controls of type popup and edit (in the parameters section of the .ssc file). I would like to add different controls such as a link and a table. As far as I know, this is not possible using the .ssc file, correct?
The idea is to create a custom mask on top of the "MATLAB" mask. I want to do this programmatically by "copying" the parameters of the MATLAB mask to the custom mask and add more controls. Is this possible?
If yes, should I do this with a sl_postprocess.m file for every block? I would appreciate an example.
conedisk problem using bvp4c
conedisk2()
function conedisk2
    % Parameter values
    A1 = 1.10629;
    A2 = 1.15;
    A3 = 1.2;
    A4 = 1.1;
    A5 = 1.1;
    A6 = 1.1;
    M = 0.2;
    Grt = 5;
    Pr = 0.71;
    R = 0.2;
    Ec = 0.1;
    Q = 0.1;
    Rew = 12;
    Red = -12;
    n = -1;
    g1 = 0;
    g2 = 0;
    g3 = 0;
    g4 = 0;
    inf = 1;
    % Set solver options to increase the maximum number of mesh points
    options = bvpset('RelTol', 1e-5, 'AbsTol', 1e-7, 'NMax', 5000);
    % Defining parameters
    solinit = bvpinit(linspace(0, 1, 200), [0 g1 g2 0 Rew g3 1 g4]);
    sol = bvp4c(@bvp2D, @bc2D, solinit);
    x = sol.x;
    y = sol.y;
    % Plotting of the velocity
    figure(1)
    plot(x, y(1,:), 'linewidth', 1)
    hold on
    xlabel('eta', 'fontweight', 'bold', 'fontsize', 16)
    ylabel('F(eta)', 'fontweight', 'bold', 'fontsize', 16)
    hold off
    % Plotting of G
    figure(2)
    plot(x, y(5,:), 'linewidth', 1)
    hold on
    xlabel('eta', 'fontweight', 'bold', 'fontsize', 16)
    ylabel('G(eta)', 'fontweight', 'bold', 'fontsize', 16)
    hold off
    % Plotting of theta
    figure(3)
    plot(x, y(7,:), 'linewidth', 1)
    hold on
    xlabel('eta', 'fontweight', 'bold', 'fontsize', 16)
    ylabel('theta(eta)', 'fontweight', 'bold', 'fontsize', 16)
    hold off
    % Residual of the boundary conditions
    function res = bc2D(y0, yinf)
        res = [y0(1); yinf(1); y0(4); yinf(4); y0(5)-Rew; yinf(5)-Red; y0(7)-1; yinf(7)];
    end
    % System of First Order ODEs
    function dydx = bvp2D(eta, y)
        yy1 = (-(1+eta^2)*((10*A1*eta+A2*eta*y(1)-A2*y(4))*y(3))-3*(2*A1+7*A1*(eta^2)+A2*(2*eta^2+1)*y(1)-A2*eta*y(4))*y(2)-2*A2*y(5)*y(6)-3*(A1+A2*y(1))*y(4)+A3*M*y(2)-A4*Grt*y(8))/(A1*(1+eta^2)^2);
        yy2 = eta*y(2);
        yy3 = (-3*A1*eta*y(6)-A2*(eta*y(1)-y(4))*y(6)+A3*M*y(5))/(A1*(eta^2+1));
        yy4 = ((A5+4/3*R)*((eta*((2*n)-1))*y(8)+n^2*y(7))-A3*Pr*M*Ec*(y(1)^2+y(5)^2)-Pr*Q*y(7)+A6*Pr*(y(4)*y(8)+n*y(1)*y(7)-eta*y(1)*y(8)))/(A5+4/3*R)*(1+eta^2);
        dydx = [y(2); y(3); yy1; yy2; y(6); yy3; y(8); yy4];
    end
end
For the above code, I want the graphs for F and G as in the attached JPG, but I am unable to get them. Please help me see where I have to make changes. Thanks in advance.
How to extract and open Json file in folder.
Dear All,
I have a JSON file in a folder, as attached. But when I tried to extract and open it, it failed.
Can someone help me?
Unable to start simulation at condition specified using IC block.
I have built a model to simulate the acceleration and top speed of a vehicle. I want to specify the initial engine RPM as 3500, for which I have added an IC block specifying the same. But even then, the engine RPM is 0 at the start. How can I start my simulation with the engine speed at 3500 RPM?
staff booking across timezones
Hi eve!
Our team has 5 members: 2 in India working in IST, one working in GST, and 2 working in EST. I have kept my business hours open 24 hours, since the time zones cover most of the day. As the admin, I have my time zone set to IST. How do I keep time zones coherent among the members across the time zones?
I want the customer to be able to book timings across the time zones. How do I set up the calendar for all my staff, whose hours do not sync in time? Will Outlook for my EST and GST staff sync with IST if I use IST to set up their time?
old spfx project fails to bundle on windows server 2019 VM
Hi all,
I hope someone can help as I wasted a lot of time troubleshooting.
I have an old spfx project for on-premise sharepoint.
On my laptop I can bundle it using Node v8 and gulp (CLI version 2.3.0, local version 3.9.1).
I have configured a dev VM with windows 2019 and copied over the code, configured the same versions of node and gulp but when I do a bundle it fails on errors that it cannot find files referenced to in css files.
Looking at the errors, they make even less sense because the files are there, but it is looking in the wrong folder: instead of "src\assets" it is looking in "src\components\assets" (which does not exist).
I can only guess that it is the version of windows (2019) somehow impacting webpack’s translation of the relative paths.
The error message says “Can’t resolve ‘../../assets/icons/app.png'”
but the actual code file says
background-image: url(../../../assets/icons/app.png);
so you see it ignores the first "../". Why is this happening only on a server VM? (I created two VMs from scratch to make sure it is not something I did the first time.)
Word Form control markers showing on PDF export
Hello,
As the subject says, I’m getting markers showing in my PDF files that have been exported from Word. The document/template is set up with content controls for the form fields. When the form fields have been filled out, the file is exported as a PDF.
In the image the left side is the Word doc, the right side is the PDF. I’ve indicated the marks with cyan ovals.
The text has exported as an image, and the markers have exported as paths.
I have exported using “Save as Adobe PDF” from the File menu, “Create PDF” from the Ribbon and by selecting PDF from options under “Save As” in the File menu.
I’m running MS365 on Windows 11 Pro. I rarely use Word.
As this file will be used by a number of different users of varying experience levels, I am hoping there is a solution that can be applied to the template.
Any suggestions will be greatly appreciated.
Delete Like Voting
On SharePoint site pages you can like the page at the bottom. How can we delete individual like votes? The likes still show employees who have not been with the company for a long time. Is there a way to delete these with PnP PowerShell?
I accidentally cleared the data of the Microsoft Authenticator app, and the codes for my two-step account sign-in are gone
I need this account for work. How can I fix this?
How to Report the Information Stored in Recoverable Items
Use PowerShell to Report Recoverable Items Data
In August, I published an article about using Microsoft Graph PowerShell SDK cmdlets to access Exchange Online mailbox data. The nature of website articles is that they can’t cover everything, and in turn this means that questions flow in about whether it’s possible to use a technique covered in an article to accomplish a goal.
Last week, I was asked if it’s possible to report items in the Recoverable Items structure within mailboxes. My answer is that it all depends on what data you want.
How Exchange Online Uses Recoverable Items
Recoverable Items includes folders such as Deletions, Purges, SubstrateHolds, and Versions where Exchange Online holds messages and attachments required for eDiscovery. These items might be held by a litigation hold or an in-place hold, or be awaiting removal by the Managed Folder Assistant after their single item recovery period or retention period expires. The Managed Folder Assistant is also responsible for actioning instructions in retention policies and labels by moving items into Recoverable Items.
To make sure that it’s always possible to hold data, Recoverable Items has a separate quota of up to 110 GB. When archive mailboxes are used, Exchange mailbox retention policies can move items to the Recoverable Items folder in the archive mailbox. Microsoft 365 retention policies don’t support a move to archive action. The ability to move items into archive mailboxes for long-term storage is one of the reasons why Exchange mailbox retention policies are still very useful.
The Get-ExoMailboxFolderStatistics cmdlet (or my version of a script to report folder contents) can report how many items are in Recoverable Items folders and the consumed quota.
Accessing Recoverable Items
Outlook clients can access and recover items in the Deletions folder. Administrators can list items in the Deletions folder by running the Get-RecoverableItems cmdlet or through the Exchange admin center (Figure 1).
However, neither users nor administrators can use these options to access content held in the other Recoverable Items folders. Administrators can use the MFCMAPI utility to view the contents of any Recoverable Items folder.
All of this information is valuable, but it didn’t answer the question. The scenario contemplated is for an eDiscovery investigator who needs to review items to see if anything of interest is present. Items might be in any folder, not just Deletions.
Building a Script to Report Recoverable Items
The answer is to use PowerShell to build the report recoverable items script to:
1. Connect to Exchange Online and find the mailboxes of interest. Normally, an eDiscovery investigation is limited to a known subset of mailboxes and other sources. The script (downloadable from GitHub) finds all user mailboxes. Amend this command to find the right target set.
2. Connect to the Microsoft Graph PowerShell SDK using the application identifier of an Entra ID app that has consent to use the Graph Mail.Read application permission. An X.509 certificate loaded into the app is used for authentication. Running the script in an interactive session only allows delegate access to the folders in the mailbox of the signed-in user. An app-only session is required to access all mailboxes.
3. Use the Get-MgUserMailFolder cmdlet to retrieve the identifier of the Recoverable Items folder. “RecoverableItemsRoot” is a well-known folder, which makes the task easier.
4. Use the Get-MgUserMailFolderChildFolder cmdlet to retrieve the set of folders under the root. We’re not interested in some folders, like Calendar Logging and Audit, so the script excludes these from the analysis.
5. Define the time period to find items for. The script looks for items created over the last year.
6. For each folder, use the Get-MgUserMailFolderMessage cmdlet to fetch a limited set of properties (to speed up performance). In an eDiscovery scenario, you might want to fetch the BodyPreview property. The script fetches a single-value extended property containing the item size and formats the size (from bytes) to make it look nice. A minimal sketch of steps 3 to 6 appears after Figure 2.
7. Report what’s found (Figure 2), including generating a CSV file.
Figure 2: Report of Recoverable Items generated by PowerShell
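To make the steps concrete, here is a minimal sketch of the core loop (steps 3 to 6), assuming an app-only Graph session is already connected and Exchange Online has supplied the target mailbox. The mailbox address, the excluded folder names, and the selected properties are illustrative rather than lifted from the actual script:
# Minimal sketch of steps 3 to 6 for one mailbox (placeholder address)
$Upn = 'Kim.Akers@contoso.com'
$StartDate = (Get-Date).ToUniversalTime().AddYears(-1).ToString('yyyy-MM-ddTHH:mm:ssZ')
# Step 3: find the well-known RecoverableItemsRoot folder
$Root = Get-MgUserMailFolder -UserId $Upn -MailFolderId 'recoverableitemsroot'
# Step 4: fetch the child folders and drop the ones we don't need (assumed display names)
$Folders = Get-MgUserMailFolderChildFolder -UserId $Upn -MailFolderId $Root.Id -All |
    Where-Object { $_.DisplayName -notin 'Calendar Logging', 'Audits' }
# Steps 5 and 6: fetch items created in the last year with a limited property set
$Report = foreach ($Folder in $Folders) {
    Get-MgUserMailFolderMessage -UserId $Upn -MailFolderId $Folder.Id -All `
        -Filter "createdDateTime ge $StartDate" -Property Subject, Sender, CreatedDateTime |
        Select-Object @{n='Folder'; e={$Folder.DisplayName}}, Subject, CreatedDateTime
}
$Report | Export-Csv -Path .\RecoverableItems.csv -NoTypeInformation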
The report recoverable items script can access confidential information. Consider using RBAC for applications to block access to sensitive or confidential mailboxes.
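RBAC for applications is configured through Exchange management role assignments. As a simple illustration of scoping, the related application access policy mechanism (the older approach to restricting Graph application permissions) can block an app from a group of mailboxes; the app identifier and group used here are hypothetical:
# Sketch: block the reporting app from a group of sensitive mailboxes
New-ApplicationAccessPolicy -AccessRight DenyAccess `
    -AppId '00000000-0000-0000-0000-000000000000' `
    -PolicyScopeGroupId SensitiveMailboxes@contoso.com `
    -Description 'Block the recoverable items report app from sensitive mailboxes'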
Code Usable for All Folders
The techniques explained here can be used to report items from any mailbox folder. It’s relatively simple PowerShell, and the only thing likely to trip people up is the requirement to access the Graph SDK and use an Entra ID app with an X.509 certificate for authentication and authorization. But now that you know this must be done, it shouldn’t be a surprise.
Support the work of the Office 365 for IT Pros team by subscribing to the Office 365 for IT Pros eBook. Your support pays for the time we need to track, analyze, and document the changing world of Microsoft 365 and Office 365.
In the eCAP module, how does capture event selection work?
I want to measure the position of a BLDC motor using Hall sensors for FOC. As a reference model, I am using mcb_pmsm_foc_hall_f280049c.slx. How does capture event selection work? Below are images of the event selection blocks of the three Hall sensors.
Team Calendar alternative and link to work items
We’ve started using the Team Calendar, but we are looking for something that can link to work items and dates so we can see how things are planned out in a calendar month(s) view by iteration cycle.
Ideally, we only want to update the work item (e.g., its due date), which will then sync across to the calendar.
I have tried looking for an extension with no luck, and am asking here in case someone has found a solution.
Price increase D365 Sandboxes – did I miss something?
Dear all
Last week, we received this mail regarding a tier 5 Sandbox and its pricing:
I know about the upcoming price update as of October 1, but a) it is not October yet, and b) this SKU was not announced to be increased at all.
Is this a mistake in Microsoft’s billing system? (almost impossible – ok, sarcasm off now)
Thanks for your inputs
cheers
Daniel
Simple import data and lookup
I want to import data from Excel and update a table in SQL Server.
The Excel file has a data column and a primary key to compare with the DB table, and the goal is to update a column in the DB table.
I am able to get the data using an Excel source, and am now looking for how to update one column (IsValid) in the database table using the Excel data (the headers in Excel match the headers in the SQL table, and the data matches too).
The Excel file and the DB table both have a primary key value, ID. If the ID from Excel matches the ID in the DB, I want to update the table’s IsValid column using the Excel IsValid field.
Can this be done using a Lookup, or using a Merge?
How to change the baud rate for an STM32 F769I-Discovery in Simulink for PIL simulation?
Hi guys,
For a PIL simulation with a serial connection (USB), I want to change the PIL baud rate for an STM32 F769I-Discovery.
I tried to change the PIL baud rate to, e.g., 1000000 Bd. After I compiled my model, it didn’t run at my desired baud rate and returned an error. When I left the baud rate at 115200 Bd (the default), Simulink compiled my model and ran it at 115200 Bd.
Is this an error, or why can’t I change the baud rate for an STM32 F769I-Discovery even though there is an option to change it (see picture above)?
Greetings
RL