Simplifying Azure Diagnostics with Category Groups and the New Built-In Policies
Heinrich and Luke (@Luke_Alderman) here to show you the exciting capabilities enabled by v2 of the unified diagnostics settings Policies, which make it possible to manage diagnostics settings at enterprise scale.
History
As many of you know, deploying diagnostics settings at scale was difficult. Jim Britt wrote a widely used script to automate the generation of Policies. I personally have written ~100 Policy definitions for our customers. Luke and others realized this and introduced the categoryGroups “audit” and “allLogs” to diagnostics settings, enabling a uniform approach. Subsequently, Heinrich got involved, working with Luke on scripting the generation of Policy definitions and Initiative definitions. In v1, 33 resource providers supported categoryGroups, and we generated 99 built-in Policies (for Log Analytics, Storage and Event Hubs) and 3 Initiative definitions supporting the “audit” categoryGroup for the same log destinations.
V2 has expanded this to 140 resource providers supporting “allLogs”; 69 of them also support the “audit” categoryGroup. Luke and multiple people in the Policy teams generated 420 (140 RPs x 3 destinations) Policy definitions and 6 Initiatives covering the 3 destinations for “audit” and “allLogs”, respectively: https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/diagnostics-settings-policies-deployifnotexists?tabs=portal#assign-initiatives.
Enough history: let’s talk about how you use it in your organization. We are not covering the mechanics here; the documentation covers that. Instead, we’ll cover variations to deployment.
Choosing your Approach
a) Log everything. This makes every detail available for analysis. However, it will generate huge volumes of data and will likely be very expensive.
b) Collect “audit” logs only. This should yield a good balance between the information collected and cost. You may miss data from the 71 resource providers that only support “allLogs”.
c) Combine the two, for example by collecting “audit” from the 69 resource providers and selecting a subset of the 71 remaining resource providers for which to collect “allLogs”.
d) Augment c) with custom Policies. For example, for Storage accounts, collect logs only for delete actions and not for read or write actions.
Logging “audit” categoryGroup
This is the simplest case: just assign one or more of the 3 “audit” Initiatives at the right scope and, where needed, per region (see the region topic below). The default resource type list will enable this logging for the 69 resource providers supporting “audit”. See https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/diagnostics-settings-policies-deployifnotexists?tabs=portal#assign-initiatives.
Selectively logging “allLogs” categoryGroup
If you want to collect every possible log, just assign one or more of the 3 “allLogs” Initiatives the same way as the “audit” Initiatives; the default resource type list covers all 140 resource providers. Note: if you have also assigned the “audit” Initiatives, some logs for the 69 providers above would be collected twice. More likely, you will either log the 71 resource providers not included in “audit” or a subset of the 71 based on your specific needs.
The parameter “resourceTypeList”, as the name implies, applies the diagnostics settings only to the resource providers selected in the array.
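Selecting the “allLogs”-only subset comes down to a set difference. The following is a minimal sketch, assuming you have collected the two lists of supported resource types yourself; the resource type names are made up for illustration, and the helper name all_logs_only is hypothetical. Only the parameter name “resourceTypeList” comes from the Initiatives.

```python
def all_logs_only(all_logs_types, audit_types):
    """Return the resource types that support "allLogs" but not "audit"."""
    # Compare in lowercase so differences in casing do not cause mismatches.
    audit = {t.lower() for t in audit_types}
    return sorted({t for t in all_logs_types if t.lower() not in audit})


# Made-up inputs for illustration only; take the real lists from the documentation.
all_logs_types = [
    "Example.Compute/widgets",
    "Example.Data/stores",
    "Example.Network/gateways",
]
audit_types = ["example.data/stores"]

resource_type_list = all_logs_only(all_logs_types, audit_types)

# Policy assignment parameters use the {"name": {"value": ...}} shape.
parameters = {"resourceTypeList": {"value": resource_type_list}}
```

The resulting parameters dictionary could then be handed to whichever Policy as Code tool you use for the “allLogs” assignment.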
Azure Regions and Diagnostics Settings
Event Hub Namespaces and Storage Accounts
Azure Monitor requires the resource generating the logs and the Event Hub Namespace or Storage account to be in the same Azure region. The 4 Initiatives targeting those destinations have a “resourceLocation” parameter. Therefore, you must assign the Initiative(s) once per scope and per region used in your tenant. We recommend that you restrict regions with Azure Policy to prevent creation of resources in regions NOT covered by these Policy assignments. The built-in Policies to use are:
Allowed locations (e56962a6-4747-49cd-b67b-bf8b01975c4c)
Allowed locations for resource groups (e765b5de-1225-4ba3-bd56-1ac6695af988)
Azure Cosmos DB allowed locations (0473574d-2d43-4217-aefe-941fcdf7e684)
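The "once per scope and per region" rule for Event Hub and Storage destinations can be sketched as a simple cross product. This is only a planning sketch under stated assumptions: the scope, the destination IDs and the field name destinationId are placeholders, and the function plan_assignments is hypothetical. Only “resourceLocation” is a parameter named by the Initiatives.

```python
from itertools import product


def plan_assignments(scopes, destination_by_region):
    """Plan one Initiative assignment per (scope, region) pair."""
    plans = []
    for scope, region in product(scopes, sorted(destination_by_region)):
        plans.append(
            {
                "scope": scope,
                # Placeholder field: the destination must live in the same region.
                "destinationId": destination_by_region[region],
                "parameters": {"resourceLocation": {"value": region}},
            }
        )
    return plans


# Placeholder scope and destinations for illustration only.
scopes = ["/providers/Microsoft.Management/managementGroups/example-mg"]
destination_by_region = {
    "westeurope": "<storage-account-id-in-westeurope>",
    "eastus": "<storage-account-id-in-eastus>",
}

plans = plan_assignments(scopes, destination_by_region)

# The same region list is the natural input for the "Allowed locations" Policies,
# so that no resource can be created in a region without an assignment.
allowed_locations = sorted(destination_by_region)
```

Deriving both the assignments and the allowed locations from one mapping keeps the two from drifting apart when a region is added.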
Log Analytics Workspaces (LAWS)
LAWS do not require an Azure region affinity. You can use a single LAWS for all regions within an Azure cloud. This works great if you use a limited number of regions in the same geographic area, such as US, North America, Europe, etc. It is not a great idea to ship logs from Australia to a US region. In addition, data sovereignty laws in many countries require that the personal data of their residents stay in an Azure region within that country (Switzerland and others) or block of countries (EU). The two Initiatives have a parameter “resourceLocationList” allowing you to use multiple LAWS, each serving the regions listed for it.
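When several LAWS are in use, each assignment needs the list of regions that its workspace serves. A minimal sketch of that grouping follows, assuming you maintain a region-to-workspace mapping yourself; the workspace IDs, the field name workspaceId and the function name are placeholders, while “resourceLocationList” is the parameter named above.

```python
from collections import defaultdict


def group_regions_by_workspace(region_to_workspace):
    """Build one assignment plan per workspace with its list of regions."""
    grouped = defaultdict(list)
    for region, workspace_id in region_to_workspace.items():
        grouped[workspace_id].append(region)
    return [
        {
            "workspaceId": workspace_id,  # placeholder field name
            "parameters": {"resourceLocationList": {"value": sorted(regions)}},
        }
        for workspace_id, regions in sorted(grouped.items())
    ]


# Placeholder mapping: US regions share one workspace, Switzerland keeps its own.
region_to_workspace = {
    "eastus": "<workspace-id-us>",
    "westus2": "<workspace-id-us>",
    "switzerlandnorth": "<workspace-id-ch>",
}

workspace_plans = group_regions_by_workspace(region_to_workspace)
```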
Resource Providers and Event Hub Association
The two Initiatives addressing the Event Hub destination have a parameter for the Event Hub Namespace called “eventHubAuthorizationRuleId”. Sidebar: Yes, the nomenclature for Event Hubs can be confusing. In addition, you may specify the Event Hub name with the parameter “eventHubName”. It defaults to “Monitoring”. A maximum of 10 Event Hubs can be created.
It is common to direct different types of resource providers to specific Event Hubs within an Event Hub Namespace. You can achieve this by creating one assignment per scope, per region and per Event Hub name. Use the parameter “resourceTypeList” in each assignment to select which logs go to the specified Event Hub.
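Routing resource types to specific Event Hubs can be planned as one parameter set per Event Hub name. The sketch below is illustrative only: the routing table and resource type names are made up, the function is hypothetical, and the check that each resource type goes to exactly one Event Hub is an assumption of this sketch rather than a platform rule. The three parameter names are the ones mentioned above.

```python
MAX_EVENT_HUBS = 10  # limit stated in the text above


def plan_event_hub_routing(routing, authorization_rule_id):
    """Build one parameter set per Event Hub name from a routing table."""
    if len(routing) > MAX_EVENT_HUBS:
        raise ValueError(f"At most {MAX_EVENT_HUBS} Event Hubs are supported")
    seen = {}
    for hub_name, resource_types in routing.items():
        for resource_type in resource_types:
            key = resource_type.lower()
            # Assumption of this sketch: a type is routed to a single Event Hub.
            if key in seen:
                raise ValueError(
                    f"{resource_type} is routed to both {seen[key]} and {hub_name}"
                )
            seen[key] = hub_name
    return [
        {
            "eventHubAuthorizationRuleId": {"value": authorization_rule_id},
            "eventHubName": {"value": hub_name},
            "resourceTypeList": {"value": sorted(resource_types)},
        }
        for hub_name, resource_types in sorted(routing.items())
    ]


# Made-up routing table for illustration only.
routing = {
    "Monitoring": ["Example.Data/stores"],
    "network-logs": ["Example.Network/gateways"],
}
hub_parameters = plan_event_hub_routing(routing, "<event-hub-authorization-rule-id>")
```

Each entry would still need to be assigned once per scope and per region, as described above.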
Policy as Code, aka No Click-Ops
Most of you are using an Everything as Code (EaC) approach, covering actual code, Infrastructure as Code and Documentation as Code (via markdown). Policy is part of Infrastructure as Code and this “special” area is often called Policy as Code (PaC). Never use the Portal (Click-Ops) to manage Policy beyond exploration.
Heinrich is a co-maintainer of Enterprise Policy as Code (EPAC) and recommends using it for Policy management at scale: https://aka.ms/epac. However, if you don’t want to use EPAC, use another IaC tool to implement Policy as Code.
Disclaimer
The sample scripts are not supported under any Microsoft standard support program or service. The sample scripts are provided AS IS without warranty of any kind. Microsoft further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Microsoft, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Microsoft has been advised of the possibility of such damages.