Automating Azure Monitor VM Insights Deployment with Python and Azure Functions
Real-time deployment of VM Insights
In today’s fast-paced digital landscape, the need for automated, scalable solutions for real-time deployment has never been greater. As many organizations rely on virtual machines as their underlying compute layer, an effective observability strategy is essential to ensure optimal performance and reliability. With a solid observability framework, businesses gain critical insights into their infrastructure and can identify and address issues before they impact service delivery. In other words, it’s important to adopt a proactive approach rather than a reactive one.
Azure Monitor VM Insights offers powerful capabilities for achieving this level of visibility. The easy enablement of VM Insights lets organizations monitor their VMs in real time, providing actionable data and analytics that drive informed decision-making. By leveraging these tools, businesses can enhance their observability strategy, ensuring that their infrastructure remains resilient, efficient, and responsive to evolving demands.
Source: Chart performance with VM insights – Azure Monitor | Microsoft Learn
Table of contents:
Introduction to Azure Monitor
Practical implementation of the solution
How to get started
Security considerations
Limitations & Future Enhancements
Resources
1. Introduction to Azure Monitor
Azure Monitor is a comprehensive solution for collecting and analyzing monitoring data from cloud and on-premises environments, enabling full stack observability and enhancing availability and performance. It aggregates data from various layers across multiple Azure and non-Azure environments, storing it in Azure for central analysis and visualization. For analysis, tools like Azure Monitor Metrics (for real-time metrics) and Log Analytics Workspace (for querying and analyzing logs) can be leveraged.
1.1 VM Insights
The remainder of this article focuses on virtual machine monitoring using VM Insights, a feature within Azure Monitor that provides deep visibility into the performance and health of your virtual machines. It offers an easy and efficient way to begin monitoring client workloads on your virtual machines and virtual machine scale sets. VM Insights supports Windows and Linux operating systems on:
Azure Virtual Machines.
Azure Virtual Machine Scale Sets.
Hybrid Virtual Machines connected with Azure Arc.
On-premises Virtual Machines.
Virtual Machines hosted in another cloud environment.
VM Insights offers a set of predefined workbooks and curated visualizations for performance monitoring. It also includes the Map feature for analyzing dependencies, which gives you a better understanding of the application components running on your VM.
To enable VM Insights, please refer to the following documentation: Enable VM Insights overview – Azure Monitor | Microsoft Learn.
1.2 Azure Monitor Agent
As explained in the documentation, VM Insights uses two agents in the background: the Azure Monitor Agent and the Dependency Agent. The former collects data from the machine and stores it in a Log Analytics Workspace in Azure. The Dependency Agent, on the other hand, relies on the Azure Monitor Agent and captures data about processes running on the virtual machine and their external process dependencies.
The data captured by the Azure Monitor Agent is used by the performance dashboards, e.g., CPU Utilization and Memory Usage. The data captured by the Dependency Agent is used by the Map feature in VM Insights.
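To illustrate how the collected performance data could be read back from the Log Analytics Workspace, the sketch below builds a Log Analytics (KQL) query for average CPU utilization of one machine. It assumes that VM Insights writes performance counters to the InsightsMetrics table with Namespace "Processor" and Name "UtilizationPercentage". The function name and its parameters are hypothetical and not part of the original solution.

```python
def build_cpu_query(computer: str, hours: int = 1, bin_minutes: int = 5) -> str:
    """Return a KQL query for the average CPU utilization of one machine.

    Assumption: VM Insights performance data lands in the InsightsMetrics table.
    """
    if hours <= 0 or bin_minutes <= 0:
        raise ValueError("hours and bin_minutes must be positive")
    # Escape backslashes and double quotes so the name stays a single KQL string literal.
    safe_name = computer.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        "InsightsMetrics",
        f"| where TimeGenerated > ago({hours}h)",
        f'| where Computer == "{safe_name}"',
        '| where Namespace == "Processor" and Name == "UtilizationPercentage"',
        f"| summarize AvgCpu = avg(Val) by bin(TimeGenerated, {bin_minutes}m)",
        "| order by TimeGenerated asc",
    ])


# Hypothetical machine name, for illustration only.
print(build_cpu_query("vm-web-01", hours=2))
```

The resulting string could be pasted into the Log Analytics query editor, or passed to a query client such as LogsQueryClient from the azure-monitor-query package.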
1.3 Data Collection Rules
After the agents are installed on the machine, the next step is to deploy the Data Collection Rules. The Azure Monitor Agent uses Data Collection Rules to determine which data to collect and how it should be processed. To make this connection, you associate the machine running the Azure Monitor Agent with the Data Collection Rule, as shown in the picture below.
Source: Data collection rules in Azure Monitor – Azure Monitor | Microsoft Learn
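To make the association step more concrete, here is a minimal sketch of the Azure Resource Manager request that links a machine to a Data Collection Rule. The helper name, the example resource IDs, and the api-version value (2022-06-01) are assumptions for illustration. The script later in this article uses the Python management SDK instead of building the request by hand.

```python
import json

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2022-06-01"  # assumption: use a version supported in your environment


def build_dcr_association_request(vm_resource_id: str, dcr_resource_id: str,
                                  association_name: str) -> dict:
    """Build the PUT request that associates a machine with a Data Collection Rule."""
    resource_uri = vm_resource_id.strip("/")
    url = (f"{ARM_ENDPOINT}/{resource_uri}"
           f"/providers/Microsoft.Insights/dataCollectionRuleAssociations/"
           f"{association_name}?api-version={API_VERSION}")
    body = {
        "properties": {
            "dataCollectionRuleId": dcr_resource_id,
            "description": "Data Collection Rule Association",
        }
    }
    return {"method": "PUT", "url": url, "body": json.dumps(body)}


# Hypothetical resource IDs, for illustration only.
request = build_dcr_association_request(
    "/subscriptions/0000/resourceGroups/rg-demo/providers/Microsoft.Compute/virtualMachines/vm-demo",
    "/subscriptions/0000/resourceGroups/rg-demo/providers/Microsoft.Insights/dataCollectionRules/dcr-vminsights",
    "vm-demo",
)
print(request["url"])
```

The association is a child resource of the machine, which is why the machine's resource ID forms the first part of the URL.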
2. Practical implementation of the solution
The goal of the solution is to automate the enablement of VM Insights on virtual machines. It uses the following tools:
Azure Functions using Event Grid Trigger
Event Grid System Topic
Python scripting
Visual Studio Code for development and local testing
The architecture of the solution looks as follows:
The main component of this architecture is the Python script, which will execute the following high-level steps:
Authentication to Azure using a Service Principal
Assess environment and retrieve machines
Enable system-assigned managed identity on the discovered machines
Deploy Azure Monitor Agent & optionally the Dependency Agent
Associate the machine with the VM Insights Data Collection Rule
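The agent deployment step depends on the operating system and on whether the Dependency Agent is wanted. A minimal sketch of that decision follows, using the extension names from the script further down. The function name and the os_type strings are hypothetical.

```python
from typing import List

AMA_EXTENSIONS = {
    "windows": "AzureMonitorWindowsAgent",
    "linux": "AzureMonitorLinuxAgent",
}
DEPENDENCY_EXTENSIONS = {
    "windows": "DependencyAgentWindows",
    "linux": "DependencyAgentLinux",
}


def select_extensions(os_type: str, install_dependency_agent: bool) -> List[str]:
    """Return the extension names to install, in installation order."""
    key = os_type.lower()
    if key not in AMA_EXTENSIONS:
        raise ValueError(f"Unsupported OS type: {os_type!r}")
    # The Azure Monitor Agent always comes first because the Dependency Agent relies on it.
    extensions = [AMA_EXTENSIONS[key]]
    if install_dependency_agent:
        extensions.append(DEPENDENCY_EXTENSIONS[key])
    return extensions


print(select_extensions("Linux", install_dependency_agent=True))
```

Keeping the mapping in two small dictionaries avoids one branch per combination of operating system and Dependency Agent setting.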
To fully automate the process, these steps are applied every time a new machine is deployed or enabled via Arc, eliminating the need for manual intervention. To do this, the Python script is deployed to an Azure Function app using an Event Grid Trigger.
The Azure Function is linked to an Event Grid System Topic that subscribes to virtual machine creation events at the subscription level. Every time a new virtual machine is deployed or enabled via Arc, an event is generated and captured by the Event Grid System Topic. This event then triggers the Azure Function, which executes the Python script.
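As a rough sketch of the filtering such a function might do, the helper below inspects an Event Grid event (represented here as a plain dictionary) and extracts the machine details. It assumes the subscription-level event type Microsoft.Resources.ResourceWriteSuccess and the operation names shown in the code. The function name and the sample event are hypothetical, and the repository's actual handler may differ.

```python
import re
from typing import Optional

# Operation names assumed to indicate that a machine was created or updated.
MACHINE_WRITE_OPERATIONS = {
    "microsoft.compute/virtualmachines/write",
    "microsoft.hybridcompute/machines/write",
}

# Matches /subscriptions/<id>/resourceGroups/<rg>/providers/<namespace>/<type>/<name>
_RESOURCE_ID = re.compile(
    r"^/subscriptions/(?P<subscription_id>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/(?P<provider>[^/]+/[^/]+)"
    r"/(?P<name>[^/]+)$",
    re.IGNORECASE,
)


def parse_machine_event(event: dict) -> Optional[dict]:
    """Return machine details for a successful machine write event, else None."""
    if event.get("eventType") != "Microsoft.Resources.ResourceWriteSuccess":
        return None
    data = event.get("data") or {}
    if str(data.get("operationName", "")).lower() not in MACHINE_WRITE_OPERATIONS:
        return None
    match = _RESOURCE_ID.match(event.get("subject", ""))
    return match.groupdict() if match else None


# Hypothetical event, for illustration only.
sample_event = {
    "eventType": "Microsoft.Resources.ResourceWriteSuccess",
    "subject": "/subscriptions/sub-1/resourceGroups/rg-demo/providers/Microsoft.Compute/virtualMachines/vm-demo",
    "data": {"operationName": "Microsoft.Compute/virtualMachines/write"},
}
print(parse_machine_event(sample_event))
```

Write events are also raised when an existing machine is updated, so the downstream steps should be safe to run more than once on the same machine.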
More information about the tools can be found here:
Azure Functions Overview | Microsoft Learn
Azure Event Grid bindings for Azure Functions | Microsoft Learn
Create a Python function using Visual Studio Code – Azure Functions | Microsoft Learn
System topics in Azure Event Grid – Azure Event Grid | Microsoft Learn
Apps & service principals in Microsoft Entra ID – Microsoft identity platform | Microsoft Learn
3. How to get started
If you would like to test the solution in your own environment, please refer to the step-by-step tutorial available in this GitHub repository: claestom/AMA-deployment---DCR-association--Linux-Windows- (github.com).
The repository provides detailed guidance on:
Prerequisites
Configure your local environment
Create the Data Collection Rule
Configure the required permissions
Local testing of the script before deploying to Azure
Configure your environment in Azure
Azure Functions
Event Grid System Topic
Deployment of the script to Azure
import os
from dotenv import load_dotenv
from azure.identity import ClientSecretCredential
from azure.mgmt.subscription import SubscriptionClient
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import DataCollectionRuleAssociationProxyOnlyResource
from azure.mgmt.resource import ResourceManagementClient
from azure.core.exceptions import HttpResponseError
from azure.mgmt.compute.models import VirtualMachineIdentity, ResourceIdentityType

load_dotenv()

# Retrieve the credentials from environment variables
TENANT_ID = os.getenv("TENANT_ID")
CLIENT_ID = os.getenv("CLIENT_ID")
CLIENT_SECRET = os.getenv("CLIENT_SECRET")
DATA_COLLECTION_RULE_ID = os.getenv("DATA_COLLECTION_RULE_ID")

# Tag key/value pairs used as filters; leave both elements empty to disable a filter
VM_TAG = ["", ""]
SUBSCRIPTION_TAG = ["", ""]

# Dependency agent installation
DEP_AGENT = True

def enable_system_assigned_identity(resource_group_name, vm_name, subscription_id, credential):
    compute_client = ComputeManagementClient(credential, subscription_id)
    vm = compute_client.virtual_machines.get(resource_group_name, vm_name)
    if vm.identity and vm.identity.type == ResourceIdentityType.system_assigned:
        print(f"VM: {vm_name} already has a system-assigned managed identity enabled. Proceeding with the script.")
    else:
        print(f"Enabling system-assigned managed identity for VM: {vm_name}")
        vm.identity = VirtualMachineIdentity(type=ResourceIdentityType.system_assigned)
        async_vm_update = compute_client.virtual_machines.begin_create_or_update(resource_group_name, vm_name, vm)
        async_vm_update.result()
        print(f"System-assigned managed identity enabled for VM: {vm_name}")

def check_tag_subscription(subscription_id, credential):
    resource_client = ResourceManagementClient(credential, subscription_id)
    subscription_tags = resource_client.tags.get_at_scope(f"/subscriptions/{subscription_id}")
    # Fall back to an empty dict so an untagged subscription does not raise on .get()
    tags = subscription_tags.properties.tags or {}
    if tags.get(SUBSCRIPTION_TAG[0]) == SUBSCRIPTION_TAG[1] or all(element == "" for element in SUBSCRIPTION_TAG):
        print(f"Subscription {subscription_id} has the required tags. Proceeding.")
        return True
    else:
        print(f"Subscription {subscription_id} does not have the required tags. Skipping")
        return False

def install_ama_extension(compute_client, extension_name, vm, vm_name, resource_group):
    extension_parameters = {
        "location": vm.location,
        "publisher": "Microsoft.Azure.Monitor",
        "type": extension_name,
        "type_handler_version": "1.10",
        "auto_upgrade_minor_version": True,
        "settings": {}
    }
    extensions_result = compute_client.virtual_machine_extensions.list(resource_group, vm_name)
    extensions = extensions_result.value  # Access the list of extensions
    if not extensions or all(extension.name != extension_name for extension in extensions):
        print(f"No {extension_name} extension found on VM {vm_name}. Proceeding with installation.")
        try:
            compute_client.virtual_machine_extensions.begin_create_or_update(
                resource_group_name=resource_group,
                vm_name=vm_name,
                vm_extension_name=extension_name,
                extension_parameters=extension_parameters
            ).result()
            print(f"{extension_name} installed on VM {vm_name}.")
        except HttpResponseError as e:
            print(f"Failed to install {extension_name} on VM {vm_name}. Error: {e}. Potential issue with the VM's OS.")
    else:
        print(f"{extension_name} already installed on VM {vm_name}.")

def install_map_extension(compute_client, extension_name, vm, vm_name, resource_group):
    extension_parameters = {
        "apiVersion": "2015-01-01",
        "location": vm.location,
        "publisher": "Microsoft.Azure.Monitoring.DependencyAgent",
        "type": extension_name,
        "type_handler_version": "9.10",
        "auto_upgrade_minor_version": True,
        "settings": {"enableAMA": "true"}
    }
    extensions_result = compute_client.virtual_machine_extensions.list(resource_group, vm_name)
    extensions = extensions_result.value  # Access the list of extensions
    if not extensions or all(extension.name != extension_name for extension in extensions):
        print(f"No {extension_name} found on VM {vm_name}. Proceeding with installation.")
        try:
            compute_client.virtual_machine_extensions.begin_create_or_update(
                resource_group_name=resource_group,
                vm_name=vm_name,
                vm_extension_name=extension_name,
                extension_parameters=extension_parameters
            ).result()
            print(f"{extension_name} installed on VM {vm_name}.")
        except HttpResponseError as e:
            print(f"Failed to install {extension_name} on VM {vm_name}. Error: {e}")
    else:
        print(f"{extension_name} already installed on VM {vm_name}.")

def associate_data_collection_rule(monitor_client, vm, vm_name):
    association_parameters = DataCollectionRuleAssociationProxyOnlyResource(
        data_collection_rule_id=DATA_COLLECTION_RULE_ID,
        description="Data Collection Rule Association"
    )
    try:
        monitor_client.data_collection_rule_associations.create(
            resource_uri=vm.id,
            association_name=vm_name,
            body=association_parameters
        )
        print(f"VM {vm_name} associated with Data Collection Rule.")
    except HttpResponseError as e:
        print(f"Failed to associate VM {vm_name} with Data Collection Rule. Error: {e}")

def process_vm(vm, compute_client, monitor_client, subscription_id, credential):
    vm_name = vm.name
    # Resource ID format: /subscriptions/<id>/resourceGroups/<name>/..., so index 4 is the resource group
    resource_group = vm.id.split("/")[4]
    instance_view = compute_client.virtual_machines.instance_view(resource_group, vm_name)
    is_running = any(status.code == "PowerState/running" for status in instance_view.statuses)
    if not is_running:
        print(f"VM {vm_name} is not running. Skipping.")
        return
    print(f"VM {vm_name} is running. Proceeding with installation of Azure Monitor agent.")
    tags = vm.tags
    os_profile = vm.os_profile
    # Proceed when the VM carries the required tag, or when no tag filter is configured
    if (tags and tags.get(VM_TAG[0]) == VM_TAG[1]) or all(element == "" for element in VM_TAG):
        enable_system_assigned_identity(resource_group, vm.name, subscription_id, credential)
        if os_profile.windows_configuration and DEP_AGENT:
            install_ama_extension(compute_client, "AzureMonitorWindowsAgent", vm, vm_name, resource_group)
            install_map_extension(compute_client, "DependencyAgentWindows", vm, vm_name, resource_group)
        elif os_profile.windows_configuration and not DEP_AGENT:
            install_ama_extension(compute_client, "AzureMonitorWindowsAgent", vm, vm_name, resource_group)
        elif os_profile.linux_configuration and DEP_AGENT:
            install_ama_extension(compute_client, "AzureMonitorLinuxAgent", vm, vm_name, resource_group)
            install_map_extension(compute_client, "DependencyAgentLinux", vm, vm_name, resource_group)
        elif os_profile.linux_configuration and not DEP_AGENT:
            install_ama_extension(compute_client, "AzureMonitorLinuxAgent", vm, vm_name, resource_group)
        else:
            print(f"VM {vm_name} has an unsupported OS. Skipping.")
            return
        associate_data_collection_rule(monitor_client, vm, vm_name)
    else:
        print(f"VM {vm_name} does not have the required tags. Skipping.")

def process_subscription(subscription, credential):
    subscription_id = subscription.subscription_id
    print(f"Processing subscription: {subscription_id}")
    compute_client = ComputeManagementClient(credential, subscription_id)
    monitor_client = MonitorManagementClient(credential, subscription_id)
    for vm in compute_client.virtual_machines.list_all():
        process_vm(vm, compute_client, monitor_client, subscription_id, credential)

def main():
    # Authenticate using the service principal
    credential = ClientSecretCredential(tenant_id=TENANT_ID, client_id=CLIENT_ID, client_secret=CLIENT_SECRET)
    # Get a list of subscriptions
    subscription_client = SubscriptionClient(credential)
    for subscription in subscription_client.subscriptions.list():
        if check_tag_subscription(subscription.subscription_id, credential) or all(element == "" for element in SUBSCRIPTION_TAG):
            process_subscription(subscription, credential)


if __name__ == "__main__":
    main()
4. Security considerations
This solution relies on client secrets for authentication via the Service Principal and App Registration. To uphold robust security, our system enforces the expiration and periodic renewal of these critical credentials, mitigating potential vulnerabilities. It is essential to maintain vigilant monitoring to proactively detect and address any issues stemming from expired secrets, thereby avoiding disruptions to the solution’s functionality.
Several Azure-based options are available for regularly checking when the credentials expire:
Renew expiring service principal credentials recommendation – Microsoft Entra ID | Microsoft Learn
Use Azure Logic Apps to Notify of Pending AAD Application Client Secrets and Certificate Expirations – Microsoft Community Hub
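Whichever option you choose, the underlying check is the same: compare each secret's end date with a warning threshold. The sketch below shows that comparison in isolation. It assumes credential metadata shaped like Microsoft Graph passwordCredentials entries (displayName and endDateTime fields) with whole-second ISO 8601 timestamps. The function name, the sample data, and the 30-day default are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def expiring_secrets(credentials: List[dict], within_days: int = 30,
                     now: Optional[datetime] = None) -> List[dict]:
    """Return credentials that expire within `within_days` or have already expired."""
    now = now or datetime.now(timezone.utc)
    deadline = now + timedelta(days=within_days)
    flagged = []
    for cred in credentials:
        # Assumption: timestamps look like "2025-01-31T00:00:00Z" (no fractional seconds).
        end = datetime.fromisoformat(cred["endDateTime"].replace("Z", "+00:00"))
        if end <= deadline:
            flagged.append({
                "displayName": cred.get("displayName"),
                "daysLeft": (end - now).days,  # negative when already expired
            })
    return flagged


# Hypothetical credential metadata, for illustration only.
sample = [
    {"displayName": "function-app-secret", "endDateTime": "2025-01-10T00:00:00Z"},
    {"displayName": "long-lived-secret", "endDateTime": "2025-06-01T00:00:00Z"},
]
print(expiring_secrets(sample, now=datetime(2025, 1, 1, tzinfo=timezone.utc)))
```

Passing `now` explicitly keeps the function easy to test. In a scheduled job it can be left out so the current time is used.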
The solution also enables the System-Assigned Managed Identity on the machines to make sure the data gets ingested into the VM Insights dashboards and dependency map. This is fine for test and development environments. When moving into production, however, it’s recommended to switch to a User-Assigned Managed Identity, as illustrated in the following article: Best practice recommendations for managed system identities – Managed identities for Azure resources | Microsoft Learn.
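One detail to watch when changing identities on a machine that already has one: setting the identity type to a single value may replace what is already configured. The sketch below computes a combined identity type instead. It assumes the Azure Resource Manager type strings "SystemAssigned", "UserAssigned", and "SystemAssigned, UserAssigned". The function name is hypothetical, and the published script does not include this logic.

```python
from typing import Optional

_CANONICAL = {"systemassigned": "SystemAssigned", "userassigned": "UserAssigned"}


def merged_identity_type(existing_type: Optional[str], adding: str) -> str:
    """Combine the identity type already on a VM with the one being added."""
    if adding.strip().lower() not in _CANONICAL:
        raise ValueError(f"Unknown identity type: {adding!r}")
    kinds = {adding.strip().lower()}
    # An existing value may be None, "None", a single type, or both types separated by a comma.
    for part in (existing_type or "").split(","):
        part = part.strip().lower()
        if part in _CANONICAL:
            kinds.add(part)
    # Emit the types in a fixed order: system-assigned first, then user-assigned.
    ordered = [_CANONICAL[k] for k in ("systemassigned", "userassigned") if k in kinds]
    return ", ".join(ordered)


print(merged_identity_type("UserAssigned", "SystemAssigned"))
```

When a user-assigned identity is added, the identity block also needs the resource ID of that identity. This sketch only covers the type field.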
5. Limitations & Future Enhancements
This section will cover the current limitations of the solution, as well as the features in development.
First, regarding compliance checks of the environment: the goal is to automatically receive a compliance report. This report would include information such as the percentage of non-compliant virtual machines and a list of those machines.
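As an idea of what such a report could contain, here is a minimal sketch that summarizes a mapping of machine names to compliance flags. How compliance is determined (for example, agent installed and Data Collection Rule associated) is left open. The function name, the report fields, and the sample data are hypothetical.

```python
from typing import Dict


def compliance_report(vm_status: Dict[str, bool]) -> dict:
    """Summarize which machines are non-compliant and what share they represent."""
    total = len(vm_status)
    non_compliant = sorted(name for name, compliant in vm_status.items() if not compliant)
    # Avoid dividing by zero when no machines were discovered.
    percent = round(100 * len(non_compliant) / total, 1) if total else 0.0
    return {
        "total": total,
        "non_compliant_percent": percent,
        "non_compliant": non_compliant,
    }


# Hypothetical inventory, for illustration only.
print(compliance_report({"vm-a": True, "vm-b": False, "vm-c": False, "vm-d": True}))
```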
To improve the script’s performance and speed, an asynchronous check for the discovered virtual machines will be added. Currently, each subscription and virtual machine is processed synchronously.
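One possible way to process machines concurrently is a thread pool, sketched below. This is not the planned implementation, only an illustration. The worker function stands in for the per-VM processing, and all names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def process_concurrently(vm_names: Iterable[str], worker: Callable[[str], str],
                         max_workers: int = 8) -> Dict[str, str]:
    """Run `worker` for every VM name in parallel and collect one result per VM."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, name): name for name in vm_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # One failing VM should not stop the others from being processed.
                results[name] = f"failed: {exc}"
    return results


def demo_worker(vm_name: str) -> str:
    """Stand-in for the real per-VM work (identity, agents, DCR association)."""
    return f"{vm_name} processed"


print(process_concurrently(["vm-a", "vm-b", "vm-c"], demo_worker, max_workers=2))
```

Threads suit this workload because most of the time is spent waiting on long-running Azure operations rather than on local computation.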
Another limitation is in the filtering capabilities. At present, users can only filter by subscription and resource level using tagging. In the future, it would be valuable to introduce filtering mechanisms based on workload characteristics, such as identifying idle or unused machines.
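For reference, the current tag-based filter boils down to the rule sketched below: an empty key/value pair disables the filter, otherwise the resource must carry exactly that tag. The function name is hypothetical. The logic mirrors the checks on VM_TAG and SUBSCRIPTION_TAG in the script.

```python
from typing import Mapping, Optional, Sequence


def matches_tag_filter(tags: Optional[Mapping[str, str]], required: Sequence[str]) -> bool:
    """Return True when the filter is disabled or the resource carries the required tag."""
    key, value = required
    if key == "" and value == "":
        return True  # No filter configured: every resource qualifies.
    # Untagged resources (None or empty mapping) never match an active filter.
    return bool(tags) and tags.get(key) == value


print(matches_tag_filter({"monitoring": "enabled"}, ["monitoring", "enabled"]))
```

A workload-based filter, such as skipping idle machines, could later be added as a second predicate alongside this one.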
Lastly, the solution is limited to Azure Virtual Machines and Azure Arc-enabled Virtual Machines, meaning only Virtual Machines recognized in the Azure portal. We are working on extending this to Virtual Machines that are not Arc-enabled. The approach would be similar to the current flow, but instead of using the Event Grid Trigger within Azure to activate Azure Functions, we could use an HTTP trigger. This would allow Azure Functions to be triggered from outside Azure via HTTP requests. However, some code modifications are still needed.
If you have any new feature ideas or remarks, feel free to raise issues or submit pull requests (PRs) in the GitHub repository!
6. Resources
claestom/AMA-deployment---DCR-association--Linux-Windows- (github.com)
Use the Azure libraries (SDK) for Python – Python on Azure | Microsoft Learn
Azure Monitor Agent overview – Azure Monitor | Microsoft Learn
Data collection rules in Azure Monitor – Azure Monitor | Microsoft Learn
What is VM insights? – Azure Monitor | Microsoft Learn