Baseline Agentic AI Systems Architecture
Agentic AI Systems are designed to resolve complex problems with limited direct human supervision [1]. These systems are composed of multiple conversable agents that converse with each other and can be orchestrated centrally or self-organize in a decentralized manner [1, 2]. As the use of multi-agent systems in the enterprise grows to automate complex processes and solve complex tasks, we take a closer look at what the architecture of such systems could look like.
These agents possess capabilities such as planning, allowing them to predict future states and select optimal actions to achieve specific goals. They also incorporate memory, enabling them to recall past interactions, experiences, and knowledge, which is crucial for maintaining continuity in tasks and refining strategies. Additionally, agents can utilize various tools, including APIs and external software, to execute code, query databases, and interact with other systems [1, 3]. This tool usage extends their functionality and enables them to perform a wide range of actions.
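To make these capabilities concrete, the sketch below models an agent with memory and a tool registry as a minimal Python class. The class and tool names are illustrative only and do not correspond to any specific framework; a real agent would use an LLM for the planning step that is elided here.

```python
from typing import Callable, Dict, List

class Agent:
    """Minimal sketch of an agent with memory and a tool registry."""

    def __init__(self, name: str):
        self.name = name
        self.memory: List[dict] = []          # past interactions and outcomes
        self.tools: Dict[str, Callable] = {}  # callable tools (APIs, code, ...)

    def register_tool(self, name: str, fn: Callable) -> None:
        self.tools[name] = fn

    def act(self, goal: str, tool: str, *args):
        """Planning step elided: pick a tool, run it, remember the outcome."""
        result = self.tools[tool](*args)
        self.memory.append({"goal": goal, "tool": tool, "result": result})
        return result

agent = Agent("researcher")
agent.register_tool("add", lambda a, b: a + b)
print(agent.act("sum two numbers", "add", 2, 3))  # → 5
print(len(agent.memory))                          # → 1
```

The memory list is what lets the agent maintain continuity across turns; in the architecture below, that state would live in Cosmos DB or Redis rather than in process memory.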
Because agents can take actions, write, and execute code, there is a potential risk of running code that could be malicious or harmful to the host system or other users [3]. Therefore, understanding the architecture of these systems is crucial to sandboxing code execution, restricting or denying access to production data and services, and mitigating failures, vulnerabilities, and abuses.
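A common first line of defense is to execute generated code in a separate OS process with a hard timeout and an isolated interpreter. The snippet below is a minimal sketch of that idea using only the Python standard library; it is not a complete sandbox, and a production system would layer containers, network restrictions, and filesystem isolation on top.

```python
import subprocess
import sys

def run_untrusted(code: str, timeout_s: float = 5.0) -> str:
    """Execute code in a child interpreter with a hard timeout.
    A real deployment would add containerization and network/
    filesystem restrictions on top of this process isolation."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, no site-packages
            capture_output=True, text=True, timeout=timeout_s,
        )
        return proc.stdout
    except subprocess.TimeoutExpired:
        return "<timed out>"

print(run_untrusted("print(2 + 2)"))              # prints "4"
print(run_untrusted("while True: pass", 1.0))     # prints "<timed out>"
```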
Architecture
Components
Azure AI Studio [5] is a managed cloud service used to train, deploy, automate, and manage machine learning models, including large language models (LLMs), small language models (SLMs), and multi-modal models used by the agents. The platform provides a comprehensive suite of tools and services to facilitate the end-to-end machine learning lifecycle. Key features of Azure AI Studio include:
Prompt Flow [6] is a development tool designed to streamline the entire development lifecycle of Generative AI applications. It supports creating, testing, and deploying prompt flows, which can be used to generate responses or actions based on given prompts. These prompt flows can be deployed to a Machine Learning Workspace or containerized and deployed to Azure Container Apps or Azure Kubernetes Services [7]. AI Studio can also be used to develop and deploy these prompt flows.
Managed Online Endpoints are used by agents and backend services to invoke prompt flows for real-time inference. They provide scalable, reliable, and secure endpoints for deploying machine learning models, enabling real-time decision-making and interactions [7].
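Invoking a managed online endpoint is a plain HTTPS POST authenticated with a bearer key. The sketch below factors out building the request so it can be shown without a live deployment; the endpoint URL and key are placeholders you would replace with values from your own endpoint.

```python
import json
import urllib.request

def build_scoring_request(endpoint_url: str, api_key: str, payload: dict):
    """Build the HTTP request for a managed online endpoint.
    endpoint_url and api_key are deployment-specific placeholders."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(endpoint_url, data=body,
                                  headers=headers, method="POST")

req = build_scoring_request(
    "https://<endpoint>.<region>.inference.ml.azure.com/score",  # placeholder
    "<api-key>",                                                 # placeholder
    {"question": "What is the order status?"},
)
# urllib.request.urlopen(req) would send it against a real deployment.
print(req.get_method())  # → POST
```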
Azure AI Dependencies include essential Azure services and resources that support the functioning of AI Studio and associated projects [8]:
Azure Storage Account stores artifacts for projects, such as prompt flows and evaluation data. It is primarily used by the AI Studio to manage data and model assets.
Azure AI Search is a comprehensive cloud search service that supports full-text search, semantic search, vector search, and hybrid search. It provides search capabilities for AI projects and agents and is essential for implementing the Retrieval-Augmented Generation (RAG) pattern. This pattern involves extracting relevant queries from a prompt, querying the AI Search service, and using the results to generate a response with an LLM or SLM.
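The RAG pattern just described reduces to three steps: retrieve relevant passages, fold them into the prompt, and generate an answer. The sketch below wires those steps together with stub functions standing in for Azure AI Search and an LLM deployment, so the flow is runnable without any cloud services.

```python
from typing import Callable, List

def rag_answer(question: str,
               retrieve: Callable[[str], List[str]],
               generate: Callable[[str], str]) -> str:
    """Retrieval-Augmented Generation: fetch relevant passages,
    fold them into the prompt, then call the model."""
    passages = retrieve(question)            # e.g. Azure AI Search (vector/hybrid)
    context = "\n".join(passages)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)                  # e.g. an Azure OpenAI deployment

# Stubs in place of the real search index and model.
docs = {"returns": "Items may be returned within 30 days."}
answer = rag_answer(
    "What is the returns policy?",
    retrieve=lambda q: [docs["returns"]],
    generate=lambda p: p.splitlines()[1],  # stub model echoes the passage
)
print(answer)  # → Items may be returned within 30 days.
```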
Azure Key Vault is used for securely storing and managing secrets, keys, and certificates required by agents, AI projects, and backend services.
Azure Container Registry stores and manages container images of agents, backend APIs, orchestrators, and other components. It also stores images created when using a custom runtime for prompt flows.
Azure OpenAI service enables natural language processing tasks like text generation, summarization, and conversation.
Azure AI Services offers APIs for vision, speech, language, and decision-making, including custom models.
Document Intelligence extracts data from documents for intelligent processing.
Azure Speech converts speech to text and vice versa, with translation capabilities.
These components and services provided by the Azure AI Studio enable seamless integration, deployment, and management of sophisticated AI solutions, facilitating the development and operation of Agentic AI Systems.
Azure Cosmos DB is well suited for Agentic AI Systems and AI agents [9]. It can provide “session” memory holding the message history for conversable agents (e.g., ConversableAgent.chat_messages in Autogen [9, 10]). It can also be used for LLM caching [9, 11]. Finally, it can serve as a vector database [9, 12].
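Session memory in Cosmos DB can be modeled as one document per conversation, keyed by a session id, with the message history as an array. The document shape below is illustrative; in practice the returned document would be upserted through the azure-cosmos SDK into a container partitioned on the session id.

```python
def append_message(session_doc: dict, role: str, content: str) -> dict:
    """Append a chat message to a session document.
    In production the result would be upserted into a Cosmos DB
    container, e.g. one partitioned on /sessionId."""
    session_doc.setdefault("messages", []).append(
        {"role": role, "content": content}
    )
    return session_doc

doc = {"id": "chat-001", "sessionId": "user-42"}  # illustrative shape
append_message(doc, "user", "Track my order")
append_message(doc, "assistant", "Order 123 ships tomorrow.")
print(len(doc["messages"]))  # → 2
```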
Azure Cache for Redis is an in-memory store that can hold short-term memory for agents and serve as an LLM cache, as in Autogen [11, 13]. It can also be used by backend services to improve performance and as a session store [13].
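LLM caching is typically cache-aside: hash the prompt, look it up, and only call the model on a miss. The sketch below uses a plain dict in place of a Redis client; the `get`/`set` pattern maps directly onto Redis string commands, where you would also attach a TTL.

```python
import hashlib

cache = {}  # stands in for Azure Cache for Redis

def cached_completion(prompt: str, call_model) -> str:
    """Cache-aside LLM caching keyed by a hash of the prompt."""
    key = "llm:" + hashlib.sha256(prompt.encode()).hexdigest()
    if key in cache:                # redis equivalent: GET key
        return cache[key]
    response = call_model(prompt)   # the expensive LLM call
    cache[key] = response           # redis equivalent: SET key (with a TTL)
    return response

calls = []
model = lambda p: calls.append(p) or f"echo:{p}"
print(cached_completion("hi", model))  # model invoked
print(cached_completion("hi", model))  # served from cache
print(len(calls))  # → 1
```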
Azure Container Apps is a serverless platform that lets you focus on containerized applications rather than the underlying infrastructure [22]. It is well suited for Agentic AI Systems: agents, the orchestrator, prompt flows, and backend APIs can all be deployed as Container Apps, which scale automatically with load and provide a reliable runtime. Container Apps also offers Dapr integration, which helps you implement simple, portable, resilient, and secure microservices and agents [23].
For asynchronous communication between agents, and between agents and an orchestrator, we propose Azure Service Bus. It is a fully managed enterprise message broker with message queues and publish-subscribe topics [24], providing decoupled communication between agents and between agents and an orchestrator. Dapr can be used to communicate with Azure Service Bus [24], and Dapr provides resiliency policies for that communication (preview) [25].
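Asynchronous hand-offs between agents usually carry a small JSON envelope: the sender, the intended recipient, a correlation id that ties replies back to requests, and the task payload. The envelope shape below is illustrative; sending it would go through the azure-servicebus SDK or Dapr pub/sub rather than the plain serialization shown here.

```python
import json
import uuid

def make_task_message(sender: str, recipient: str, task: dict) -> str:
    """Serialize a task envelope for a Service Bus queue or topic.
    correlation_id lets the orchestrator match replies to requests."""
    envelope = {
        "sender": sender,
        "recipient": recipient,
        "correlation_id": str(uuid.uuid4()),
        "task": task,
    }
    return json.dumps(envelope)

body = make_task_message(
    "orchestrator", "research-agent",
    {"action": "summarize", "doc": "report.pdf"},  # illustrative task
)
received = json.loads(body)
print(received["recipient"])  # → research-agent
```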
For synchronous communication between agents, and between agents and an orchestrator, you can use Dapr service-to-service invocation. It is a simple way to call another service (agent or orchestrator) directly, with automatic mTLS authentication and encryption and built-in service discovery [24]. Dapr also provides resiliency policies for calling services, but they cannot be applied to requests made using the Dapr Service Invocation API [26].
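With Dapr, a synchronous call to another agent goes through the local sidecar's service-invocation HTTP API at `/v1.0/invoke/<app-id>/method/<method>`; the sidecar resolves the app id via service discovery and applies mTLS. The URL construction below runs on its own, while the actual POST is commented out because it needs a running sidecar, and the app and method names are hypothetical.

```python
def dapr_invoke_url(app_id: str, method: str, dapr_port: int = 3500) -> str:
    """Build the sidecar URL for Dapr service-to-service invocation.
    The sidecar resolves app_id and secures the call with mTLS."""
    return f"http://localhost:{dapr_port}/v1.0/invoke/{app_id}/method/{method}"

url = dapr_invoke_url("planner-agent", "plan")  # hypothetical app id/method
print(url)  # → http://localhost:3500/v1.0/invoke/planner-agent/method/plan

# With a sidecar running alongside this service:
# import urllib.request
# urllib.request.urlopen(urllib.request.Request(url, data=b"{}", method="POST"))
```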
An Azure Kubernetes Service (AKS) architecture is provided below. You can deploy Dapr on Azure Kubernetes Service, or use a service mesh for direct communication between agents and between agents and an orchestrator. Azure Kubernetes Service also provides a reliable platform for your agents, orchestrator, prompt flows, and backend APIs.
Conclusion
Agentic AI Systems represent a significant advancement in artificial intelligence, providing autonomous decision-making and problem-solving capabilities with minimal human intervention. By leveraging conversable agents with planning, memory, and tool usage capabilities, these systems can address complex enterprise challenges. The proposed architecture, utilizing Azure's suite of services, including Azure OpenAI, AI Studio, Azure API Management, Container Apps, and many others, provides a robust foundation for deploying these intelligent systems. Ensuring the safety, reliability, and ethical operation of such systems is critical, particularly in managing code execution and data security. As the field evolves, continuous refinement of these architectures and practices will be essential to maximize the benefits and minimize the risks associated with Agentic AI.
References
Appendix
Thanks
Special thanks to our colleagues for their feedback on the architecture:
Anurag Karuparti
Freddy Ayala
Hitasi Patel
Joji Varghese
Paulrick Garraway
Sam El-Anis
Srikanth Bhakthan
Zouhair Ramram