DocAider: Automated Documentation Maintenance for Open-source GitHub Repositories

Project Overview

Comprehensive documentation is crucial for end users of open-source software projects, but creating and maintaining it by hand is time-consuming and costly. Traditional documentation generators such as Pydoc rely on predefined rules and in-line code information. Emerging generative AI techniques, particularly Large Language Models (LLMs), offer new possibilities for documentation generation. Developed in partnership with Microsoft and UCL, DocAider is an AI-powered documentation tool that automatically generates and updates code documentation. It uses GitHub Actions workflows to trigger documentation tasks when a pull request (PR) is opened, which also gave us insight into how documentation can be maintained continuously. This approach keeps project documentation current with minimal human intervention. The project combines LLMs with Microsoft Semantic Kernel, Microsoft AutoGen, and Azure AI Studio to reduce the burden of maintaining up-to-date documentation.
This system uses a multi-agent architecture in which multiple agents work together to complete the task. It offers two innovative features: a recursive update mechanism, which ensures that changes ripple through all related documentation, and continuous monitoring and updating of documentation via pull requests.
DocAider offers a promising solution for software engineers, with the potential to automatically maintain clean and up-to-date documentation. It allows developers to concentrate more on coding while simplifying the onboarding process for new team members. Additionally, it helps reduce costs and boosts overall efficiency.

Project Journey

This project was completed over 3 months. In the first few weeks, the team focused on requirements engineering: we set functional and non-functional requirements, created context and architecture diagrams, and broke the project down with the stakeholders. This made implementation easier in the following months and ensured that we included the most important features and requirements. The process also showed the team how much was realistically achievable and what should be kept as optional, to be completed if time allowed.
During implementation, our team employed agile methodologies, Git practices, and continuous integration and testing. We chose agile as our development approach because it facilitated constant communication between the team and stakeholders. This strategy proved crucial to our product’s success. Bi-weekly meetings with stakeholders allowed us to report progress, plan upcoming tasks and clarify the requirements. Additionally, we held weekly internal meetings for team members to showcase their work and assess overall progress.

Technical Details

DocAider is an LLM-powered tool that generates and updates documentation automatically. It performs its documentation tasks in the background through a customised GitHub Actions workflow. We developed DocAider by integrating Semantic Kernel and AutoGen, two tools that facilitate the development of AI-based software, and we deployed and managed Azure OpenAI LLMs on Azure AI Studio. To obtain good results, we used the GPT-4-0125-preview model to create documentation for the source code. The temperature parameter was set between 0 and 0.2 for more deterministic and factual LLM responses.
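
As a rough illustration of the settings described above, the sketch below keeps the model name and temperature in one place and rejects temperatures outside the 0 to 0.2 range. The class and field names are hypothetical and are not taken from the DocAider codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical settings holder; names are illustrative only."""

    model: str = "gpt-4-0125-preview"  # model named in this post
    temperature: float = 0.0

    def __post_init__(self) -> None:
        # Keep sampling close to deterministic, as described in the text.
        if not 0.0 <= self.temperature <= 0.2:
            raise ValueError("temperature must be between 0 and 0.2")


config = LLMConfig(temperature=0.1)
print(config)
```

Validating the value at construction time means a misconfigured workflow fails early instead of silently producing more varied output.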

Documentation Generation

The Conceptual Diagram of Multi-agent Conversation for Generating Documentation

AutoGen provides multiple conversation patterns to orchestrate AI agents, such as sequential chats, group chats, and nested chats. We used sequential chats to create documentation. The figure shows our multi-agent architecture, which reduces LLM hallucinations in two ways: by supplying appropriate code context and by letting the agents review and improve their own output. Four agents perform different tasks in sequence, and an agent manager controls the multi-agent conversation.

Code Context Agent creates a graph representation of the entire repository, mapping the relationships between function calls. It then generates comprehensive information about the codebase using the actual source code and the relationship graph.
Documentation Generation Agent produces baseline documentation taking into account the contextual information passed from the previous agent. The documentation contains three basic sections: overview, class/function/method descriptions and input/output examples.
Review Agent assesses the baseline documentation and suggests improvements.
Revise Agent modifies the baseline documentation according to the suggestions and returns the improved documentation to Agent Manager.
Agent Manager controls the conversation process and responds to function calling requests from LLM-configured agents.
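
The sequence of agents above can be sketched in plain Python. This is a minimal illustration of the control flow only: it does not use AutoGen or Semantic Kernel, the LLM calls are replaced by simple stand-in functions, and all function names and the shared `state` dictionary are assumptions made for the example.

```python
import ast
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def code_context_agent(state: State) -> State:
    """Map each function in the source to the plain names it calls."""
    tree = ast.parse(state["source"])
    calls: Dict[str, List[str]] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            called = {
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
            calls[node.name] = sorted(called)
    state["call_graph"] = calls
    return state


def generation_agent(state: State) -> State:
    """Stand-in for the LLM: write one line per function."""
    lines = ["Overview", ""]
    for name, callees in state["call_graph"].items():
        uses = ", ".join(callees) if callees else "nothing"
        lines.append(f"{name}: calls {uses}")
    state["draft"] = "\n".join(lines)
    return state


def review_agent(state: State) -> State:
    """Stand-in reviewer: flag a missing Examples section."""
    suggestions = []
    if "Examples" not in state["draft"]:
        suggestions.append("Add an Examples section")
    state["suggestions"] = suggestions
    return state


def revise_agent(state: State) -> State:
    """Apply the reviewer's suggestions to the draft."""
    final = state["draft"]
    if "Add an Examples section" in state["suggestions"]:
        final += "\n\nExamples\n(none available)"
    state["final"] = final
    return state


def agent_manager(source: str, agents: List[Callable[[State], State]]) -> str:
    """Run the agents in order, passing the shared state along."""
    state: State = {"source": source}
    for agent in agents:
        state = agent(state)
    return state["final"]


SAMPLE = '''
def helper(x):
    return x + 1

def main():
    return helper(2)
'''

final_doc = agent_manager(
    SAMPLE,
    [code_context_agent, generation_agent, review_agent, revise_agent],
)
print(final_doc)
```

Each stage only reads what earlier stages wrote into the shared state, which is what makes a sequential chat easy to reason about and to extend with additional agents.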

By using Semantic Kernel, we built skilled agents for performing specific tasks. AutoGen facilitates agent interactions to complete complex workflows. The LLM function calling capability helps to reduce programming efforts and makes agents flexible. An agent can autonomously execute external functions defined in the associated plugins to complete a variety of tasks.
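
To make the function calling idea concrete, here is a small sketch of how a manager might execute a function requested by an agent. It does not use the Semantic Kernel or AutoGen APIs; the registry, the decorator, the `count_lines` function, and the request shape (`name` plus `arguments`) are all assumptions chosen for illustration.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry of functions that agents are allowed to call.
PLUGINS: Dict[str, Callable[..., Any]] = {}


def plugin(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so that it can be requested by name."""
    PLUGINS[func.__name__] = func
    return func


@plugin
def count_lines(text: str) -> int:
    """Example plugin function: count the lines in a piece of text."""
    return len(text.splitlines())


def handle_function_call(request_json: str) -> Any:
    """Execute a request shaped like {"name": ..., "arguments": {...}}."""
    request = json.loads(request_json)
    name = request["name"]
    if name not in PLUGINS:
        raise KeyError(f"unknown function: {name}")
    return PLUGINS[name](**request.get("arguments", {}))


request = json.dumps({"name": "count_lines", "arguments": {"text": "a\nb\nc"}})
result = handle_function_call(request)
print(result)
```

Because the model only names a function and its arguments, new capabilities can be added by registering another function rather than by changing the conversation logic.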

Documentation Update

To maintain consistency and accuracy across all related documentation when a class/function in a file is changed, the Documentation Update feature performs the update recursively. If a class/function is modified, the system will automatically update the documentation for all dependent files. This includes documentation of the source file, as well as documentation of files that use functions dependent on the changed class/function. This recursive update feature ensures that all related documentation remains up-to-date with the latest changes in the code.
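
One way to picture the recursive update is as a traversal of reversed dependency edges. The sketch below works at file level for simplicity, whereas the feature described above tracks classes and functions. The function name, the graph contents, and the file `rendering.py` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def files_to_update(depends_on: Dict[str, Set[str]], changed: str) -> List[str]:
    """Return the changed file followed by every direct or transitive dependent."""
    # Invert the graph: for each file, which files use it?
    dependents: Dict[str, Set[str]] = {}
    for file, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(file)

    order = [changed]
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        # Sort so that the result is stable between runs.
        for user in sorted(dependents.get(current, ())):
            if user not in seen:
                seen.add(user)
                order.append(user)
                queue.append(user)
    return order


# Hypothetical dependency graph: file -> files it depends on.
graph = {
    "engine.py": {"base.py"},
    "rendering.py": {"engine.py"},
    "utils.py": set(),
}
print(files_to_update(graph, "base.py"))
```

The `seen` set keeps the traversal from looping forever when two files depend on each other.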

Additionally, the Documentation Update on PR Comment feature allows reviewers to trigger documentation updates for specified files by commenting in a specific format. The reviewer can specify which file needs an update and provide instructions on what changes should be made. The system then processes this comment and updates the documentation as instructed. This makes precise, targeted documentation updates possible based on reviewer feedback, improving the overall quality and relevance of the documentation. It also removes the need for developers to change documentation manually in response to reviewers’ comments. The comment triggering this process needs to be in this format:

“Documentation {file_path}: {comment}”. For example, “Documentation main.py: Add more I/O examples”.
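
A comment in this format can be recognised with a short regular expression. The following is a sketch only: the pattern and function name are assumptions, and it handles single-line comments and file paths without spaces.

```python
import re
from typing import Optional, Tuple

# "Documentation <file_path>: <comment>", with a path that has no whitespace.
COMMENT_PATTERN = re.compile(r"^Documentation\s+(?P<path>\S+):\s*(?P<comment>.+)$")


def parse_update_request(body: str) -> Optional[Tuple[str, str]]:
    """Return (file_path, instruction), or None if the comment is not a request."""
    match = COMMENT_PATTERN.match(body.strip())
    if match is None:
        return None
    return match.group("path"), match.group("comment").strip()


parsed = parse_update_request("Documentation main.py: Add more I/O examples")
print(parsed)
```

Returning `None` for anything that does not match lets ordinary review comments pass through without triggering an update.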

Results and Outcome

Our evaluation process involved three stages: a case study executing our system on a well-known repository to showcase our system, a comparison of our system against RepoAgent, and a quantitative analysis. Through this process, we could determine our system’s performance.

Case Study:

This section presents results from applying our tool to generate and update documentation for the Graphviz repository’s Python files. The system produced well-structured documentation, including overviews, global variables, function/class descriptions, and I/O examples, providing clear explanations of file purposes and usage guidelines. When updates were made to the base.py file, adding logging functionality and a new method, the system successfully incorporated these changes while preserving existing content. The system also demonstrated its ability to handle recursive updates, propagating changes from the ParameterBase class in base.py to dependent files like engine.py. Additionally, it responded effectively to a PR comment requesting more input/output examples, showcasing its capacity to incorporate reviewer feedback. Overall, the multi-agent system proved capable of generating, updating, and maintaining comprehensive documentation across all files in a software repository.

Documentation Update for base.py

Comparison with RepoAgent

We compared DocAider, a multi-agent documentation system, with RepoAgent, another LLM-based documentation generation tool. While RepoAgent produces lengthy paragraphs, DocAider generates concise, bullet-pointed documentation, aligning with developers’ preferences for brevity. DocAider’s multi-agent approach potentially enhances accuracy and reduces hallucinations compared to RepoAgent’s single-agent system. DocAider also implements a Review Agent and a Revise Agent to suggest and apply improvements. A notable feature of DocAider is its HTML-based front-end interface, which improves documentation accessibility and organization, factors highly valued by developers.
While our system is well-designed and offers unique features like recursive updates, RepoAgent stands out by providing thorough I/O examples for every function. However, LLMs can make incorrect assumptions as function complexity increases, leading to factual inaccuracies or nonsensical outputs. To mitigate this, we restrict the LLM from making such assumptions, resulting in some functions/classes lacking I/O examples.

Quantitative Analysis

We conducted a quantitative analysis of DocAider’s performance across six popular GitHub repositories: colorama, fake-useragent, graphviz, photon, progress, and pywhat. These repositories were selected based on their popularity (over 1000 stars each) and size (small to medium, limited to 20 files per repository), and they vary in the number of functions and classes they contain. For each repository we scored four documentation attributes: function/class descriptions, parameters/attributes, I/O examples, and return values. Scores are normalised between 0 and 1 and reflect how often each attribute is present in the documentation. For instance, a score of 1 for Function/Class Description indicates that every class and function in the repository is described in the documentation, while a score of 0.5 for I/O examples means that only half of the functions have I/O examples provided in the documentation.
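
The normalised score can be read as a simple ratio of documented items to all items. The sketch below shows that arithmetic on invented toy data; the function name, the dictionary keys, and the values are hypothetical and are not results from our evaluation.

```python
from typing import Dict, List


def coverage_score(items: List[Dict[str, bool]], attribute: str) -> float:
    """Fraction of items whose documentation includes the given attribute."""
    if not items:
        return 0.0
    present = sum(1 for item in items if item.get(attribute, False))
    return round(present / len(items), 2)


# Toy data: one entry per function, recording which attributes were found.
functions = [
    {"description": True, "io_example": True},
    {"description": True, "io_example": False},
    {"description": True, "io_example": True},
    {"description": True, "io_example": False},
]

description_score = coverage_score(functions, "description")
io_example_score = coverage_score(functions, "io_example")
print(description_score, io_example_score)
```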

DocAider achieved perfect scores (1.0) for function/class descriptions across all repositories, demonstrating consistent performance. For parameters/attributes, most repositories received perfect scores, with only colorama scoring slightly lower at 0.94 due to two functions lacking parameter documentation. I/O examples showed the most variation, with scores ranging from 0.54 (photon and progress) to 0.88 (colorama). Lower scores were often due to specific function types without return values (e.g., class init methods) or complex logic that made example generation challenging. Return value documentation was consistently strong, with all repositories scoring 1.0.

Overall, DocAider is proficient in many areas, such as generating function/class descriptions and handling most documentation aspects. However, there is room for improvement in consistently documenting I/O examples, particularly for functions with more complex logic.

Lessons Learned

The development of DocAider provided valuable insights across several key areas. Firstly, the adoption of a multi-agent approach proved crucial in managing the system’s complexity. Initially, a single-agent design led to issues such as hallucinations and incomplete documentation. By transitioning to a multi-agent architecture, the team was able to distribute tasks across specialized agents, each handling specific aspects of the documentation process. This approach significantly improved the accuracy and reliability of the documentation while also enhancing system scalability. The success of this strategy highlighted the importance of modular design and task specialization in complex AI-driven systems.
Secondly, prompt engineering emerged as a critical and unexpectedly challenging aspect of the project. The quality of generated documentation was heavily dependent on the prompts given to the LLMs. Initial struggles with overly broad or contextually lacking prompts led to irrelevant or inaccurate outputs. Through iterative testing and refinement, the team developed more precise and context-aware prompts, significantly improving documentation quality. This experience underscored the complexity and importance of effective prompt engineering in applications requiring high accuracy and relevance.
Lastly, the team learned the critical importance of managing dependency versions. An incident where a new version of Semantic Kernel (1.3.0) caused the software to crash in Docker due to API changes highlighted the need for version consistency across development and deployment environments. This experience emphasized the importance of carefully managing and aligning dependency versions to ensure system stability and functionality.

Team Contributions

Dias Jakupov (Team Leader): team management, stakeholder communication, development of Documentation Update, Recursive Update, Update on PR comment, prompt engineering.

Chengqi Ke: development of Retrieval Augmented Generation and multi-agent communication using Semantic Kernel and AutoGen.

Fatima Hussain: development of GitHub workflows, evaluation of DocAider’s performance and effectiveness.

Tanmay Thaware: development of Retrieval Augmented Generation, evaluation of DocAider’s performance.

Tomas Kopunec: development of Abstract Syntax Tree, Recursive Update and HTML front-end.

Zena Wang: development of GitHub workflows, deployment, and packaging of the tool into a Docker image.

Future Work

DocAider’s current implementation successfully automates code documentation, but there are several areas for future improvement. First, enhancing the tool’s ability to provide comprehensive and accurate I/O examples is a priority. This can be achieved by refining agent prompts and potentially integrating context-specific agents to better interpret complex functions. Second, future evaluations should extend to larger, more complex repositories to assess DocAider’s scalability and performance beyond small- and medium-sized projects. This expansion was previously limited by budget constraints. Third, while initial attempts to use Retrieval-Augmented Generation (RAG) didn’t significantly improve documentation quality due to limited contextual knowledge in the repository, future iterations could explore more effective RAG implementations. For instance, retrieving test cases from repositories could enhance the accuracy of I/O examples, and RAG could support the development of repository-specific chatbots to assist developers. Lastly, DocAider’s modular and loosely coupled multi-agent design leaves significant room for extension. The system can integrate new features such as code validation, static analysis, or security evaluation without major architectural changes. This flexibility extends to adding or replacing agents, supporting various LLMs, and expanding language support beyond Python, all while maintaining core functionality.

Conclusion

DocAider successfully automated the creation and upkeep of accurate, up-to-date documentation, significantly reducing the manual workload for developers. By leveraging AI tools like Microsoft AutoGen, Semantic Kernel, and Azure AI Studio, the project addressed key challenges in maintaining consistent, real-time documentation.

While budget constraints, missing I/O examples, and the limitations of LLMs posed challenges, the project established a solid foundation for future improvements. Beyond solving the immediate need for documentation management, DocAider raised the bar for efficiency and accuracy in software development, showcasing the potential of AI-driven solutions for more advanced applications.

Call To Action

We invite you to explore DocAider further and consider how its innovative approach to documentation maintenance can be applied in your projects. Here is how to get in touch, along with links to the tools we used:

Connect with Us: Feel free to reach out to our team for more information or collaboration opportunities.
AutoGen: https://www.microsoft.com/en-us/research/project/autogen/
Semantic Kernel: https://learn.microsoft.com/en-us/semantic-kernel/overview/?tabs=Csharp

Special Thanks to Contributors

Each contributor’s continuous support and involvement played a crucial role in the success of the project. We would like to give special thanks to the following contributors.

Lee Stott, Principal Cloud Advocate Manager at Microsoft
Diego Colombo, Principal Software Engineer at Microsoft
Jens Krinke, Senior Lecturer and Academic Supervisor

Team

The team that developed this project had 6 members, all of us Master’s students at UCL studying Software Systems Engineering.

Dias Jakupov – Team Leader – Full Stack Developer

GitHub URL: https://github.com/Dias2406/

LinkedIn URL: https://www.linkedin.com/in/dias-jakupov-a05258221/

Chengqi Ke – Full Stack Developer

GitHub URL: https://github.com/CQ-Ke/

LinkedIn URL: http://linkedin.com/in/chengqi-ke-9b91a8313/

Tomas Kopunec – Full Stack Developer

GitHub URL: https://github.com/TomasKopunec/

LinkedIn URL: https://www.linkedin.com/in/tomas-kopunec-425b0199/

Fatima Hussain – Full Stack Developer

GitHub URL: https://github.com/fatimahuss/

LinkedIn URL: http://linkedin.com/in/fatima-noor-hussain/

Tanmay Thaware – Full Stack Developer

GitHub URL: https://github.com/tanmaythaware/

LinkedIn URL: http://linkedin.com/in/tanmaythaware/

Zena Wang – Full Stack Developer

GitHub URL: https://github.com/ZenaWangqwq/

LinkedIn URL: https://www.linkedin.com/in/zena-wang-b63a8822b/
