Virtual Network Flow Logs Recipes
You might have heard about the General Availability of Virtual Network Flow Logs in Azure, and even read the announcement blog post. When writing that post with Harsha CS I had the chance to play a bit with VNet Flow Logs and Traffic Analytics, and I would like to share some of the learnings.
What the heck am I talking about? Let me bring you up to speed very quickly (attention, oversimplification ahead!): NSG Flow Logs is a technology that logs every flow going through an NSG: in and out, allowed and dropped. The main issue of NSG Flow Logs is, well, that you need an NSG, and some resources in Azure do not support them, for example Azure Firewall, VPN gateways or ExpressRoute gateways. Enter VNet Flow Logs, which you can enable in a whole VNet or subnet, regardless of whether there are NSGs or not.
What is Traffic Analytics, I hear you say? VNet Flow Logs are stored in Azure Blob Storage. Optionally, you can enable Traffic Analytics, which will do two things: it will enrich the flow logs with additional information, and will send everything to a Log Analytics Workspace for easy querying. This “enrich and forward to Log Analytics” operation will happen in intervals, either every 10 minutes or every hour.
VNet Flow Logs give you much more, for example whether traffic is VNet-encrypted or not and whether traffic is dropped by AVNM security admin rules, but you will find all that in the docs so I am not going to repeat it here.
Table structure: NTAIpDetails
This table will contain some enrichment data about public IP addresses, including whether they belong to Azure services and their region, and geolocation information for other public IPs. Here you can see a sample of what that table looks like:
For example, looking at the NTAIpDetails table you can get an overview of which kinds of communication are taking place. The query is very simple, and it gives you a glimpse of what you can do, especially when joining this table to NTANetAnalytics (see the scenarios further down for examples of how to join this table):
NTAIpDetails
| distinct FlowType, PublicIpDetails, Location
Table structure: NTATopologyDetails
This table contains information about different elements of your topology, including VNets, subnets, route tables, routes, NSGs, Application Gateways and much more. Here you can see what it looks like:
For example, with a simple query you can get the routes in the route tables configured in a given resource group:
NTATopologyDetails
| where TimeGenerated > ago(600d)
| where AzureResourceType == "Route"
| extend name_a = split(Name, "/")
| extend ResourceGroup = tostring(name_a[0]), RouteTableName = tostring(name_a[1]), RouteName = tostring(name_a[2])
| where ResourceGroup == "flowlogs"
| distinct ResourceGroup, RouteTableName, RouteName, NextHopType, NextHopIp
Yeah, there are many other ways of getting topology information in Azure, such as with Azure Resource Graph, but having this in a table is pretty handy to do join queries.
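If you want to see how the split on the Name field behaves before running it against your own workspace, here is a minimal sketch using an inline datatable. The sample rows, route table names and next hops are made up for illustration, and I am assuming the Name format "resource group/route table/route" that the query above relies on:

```kql
// Hypothetical sample rows shaped like Route records in NTATopologyDetails
datatable(Name:string, AzureResourceType:string, NextHopType:string, NextHopIp:string)
[
    "flowlogs/spoke1-rt/default",  "Route", "VirtualAppliance",      "10.1.0.4",
    "flowlogs/spoke1-rt/onprem",   "Route", "VirtualNetworkGateway", "",
    "othergroup/hub-rt/default",   "Route", "Internet",              ""
]
| where AzureResourceType == "Route"
// split() returns a dynamic array, so each element is cast back to string
| extend name_a = split(Name, "/")
| extend ResourceGroup = tostring(name_a[0]), RouteTableName = tostring(name_a[1]), RouteName = tostring(name_a[2])
// Only the two routes in the "flowlogs" resource group remain
| where ResourceGroup == "flowlogs"
| project ResourceGroup, RouteTableName, RouteName, NextHopType, NextHopIp
```

Since the datatable is defined inline, you can paste this into any Log Analytics query window without having flow logs enabled.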
Table structure: NTANetAnalytics
Alright, now we are coming to more interesting things: this table is the one containing the flows we are looking for. Records in this table will contain the usual attributes you would expect such as source and destination IP, protocol, and destination port. Additionally, data will be enriched with information such as:
Source and destination VM
Source and destination NIC
Source and destination subnet
Source and destination load balancer
Flow encryption (yes/no)
Whether the flow is going over ExpressRoute
And many more
Further below you can read some scenarios with detailed queries that show ways you can extract information from VNet Flow Logs and Traffic Analytics. Of course, these are just some of the scenarios that came to mind on my topology. The idea is that you can get inspiration from these queries to support your individual use case.
Scenario 1: traffic to/from a virtual machine
For example, imagine you want to see which IP addresses a given virtual machine has been talking to in the last few days:
NTANetAnalytics
| where TimeGenerated > ago(60d)
| where SrcIp == "10.1.1.8" and strlen(DestIp) > 0
| summarize TotalBytes=sum(BytesDestToSrc+BytesSrcToDest) by SrcIp, DestIp
What if we want to enrich this information with the NTAIpDetails table to get the geolocation of the public IP addresses? Let’s have a look:
NTANetAnalytics
| where TimeGenerated > ago(60d)
| where SrcIp == "10.1.1.8" and strlen(DestIp) > 0
| join kind = leftouter (NTAIpDetails) on $left.DestIp == $right.Ip
| summarize TotalBytes=sum(BytesDestToSrc+BytesSrcToDest) by SrcIp, DestIp, PublicIpDetails, Location
Mmmh, that packet host inc. looks suspicious, so now you might be interested in getting a time distribution of the protocols in use with that address. Nothing easier! We can see here that there have been two big data transfers of around 1GB on two different days, the last one on 23rd April.
NTANetAnalytics
| where TimeGenerated > ago(60d)
| where SrcIp == "10.1.1.8" and DestIp == "136.144.58.113"
| join kind = leftouter (NTAIpDetails) on $left.DestIp == $right.Ip
| extend App = strcat(L4Protocol,tostring(DestPort))
| summarize TotalBytes=sum(BytesDestToSrc+BytesSrcToDest) by App, bin(TimeGenerated, 1d)
| render columnchart
Mmmmh, somebody exfiltrating data?
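One thing to keep in mind with the left outer join used in this scenario: every flow is kept even when there is no enrichment record (private IPs, for example), but if the enrichment side holds more than one row for the same IP, each flow is repeated once per matching row and the byte totals get inflated. I have not verified how many rows per IP your NTAIpDetails table will contain, so take the following as a sketch with invented sample data (the provider name and location are placeholders) that shows the effect and one way to guard against it:

```kql
// Hypothetical flows, shaped like a small subset of NTANetAnalytics
let Flows = datatable(SrcIp:string, DestIp:string, BytesSrcToDest:long, BytesDestToSrc:long)
[
    "10.1.1.8", "10.1.1.70",      1000, 2000,
    "10.1.1.8", "136.144.58.113", 5000, 300,
    "10.1.1.8", "136.144.58.113", 7000, 200
];
// Hypothetical enrichment rows: the same public IP appears twice (two processing intervals)
let IpDetails = datatable(TimeGenerated:datetime, Ip:string, PublicIpDetails:string, Location:string)
[
    datetime(2024-04-22 10:00), "136.144.58.113", "Example ISP", "NL",
    datetime(2024-04-23 10:00), "136.144.58.113", "Example ISP", "NL"
];
Flows
// Deduplicate the enrichment side first so that each flow matches at most one row
| join kind=leftouter (IpDetails | distinct Ip, PublicIpDetails, Location) on $left.DestIp == $right.Ip
| summarize TotalBytes = sum(BytesSrcToDest + BytesDestToSrc) by SrcIp, DestIp, PublicIpDetails, Location
```

With the distinct in place the public IP adds up to 12500 bytes and the private IP to 3000 bytes with empty enrichment columns. If you remove the distinct, the public IP total doubles to 25000, because each of its flows is matched twice.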
Scenario 2: load balancer traffic distribution
You want to look into the traffic distribution of a given application front-ended by a load balancer? Two of the enrichment fields in the NTANetAnalytics table are SrcLoadBalancer and DestLoadBalancer, which we can leverage for this purpose. We will look at the source IP first, meaning traffic going from the VM to the load balancer:
NTANetAnalytics
| where SubType == 'FlowLog' and TimeGenerated > ago(60d)
| where SrcLoadBalancer contains 'web' or DestLoadBalancer contains 'web'
| summarize TotalBytes = sum(BytesSrcToDest + BytesDestToSrc) by tostring(SrcIp)
| render piechart
Interesting, it looks like 10.1.1.70 is not getting too much traffic? Let’s look at the time distribution as well, maybe that machine hasn’t been there for a long time:
NTANetAnalytics
| where SubType == 'FlowLog' and TimeGenerated > ago(60d)
| where SrcLoadBalancer contains 'web' or DestLoadBalancer contains 'web'
| summarize TotalBytes = sum(BytesSrcToDest + BytesDestToSrc) by tostring(SrcIp), bin(TimeGenerated, 1d)
| render barchart
We can have a look at the Destination IP as well, which will show an interesting picture:
NTANetAnalytics
| where SubType == 'FlowLog' and TimeGenerated > ago(60d)
| where SrcLoadBalancer contains 'web' or DestLoadBalancer contains 'web'
| summarize TotalBytes = sum(BytesSrcToDest + BytesDestToSrc) by tostring(DestIp), bin(TimeGenerated, 1d)
| render barchart
Wow, it looks like we need to have a look at these VMs, something seems not to be right with our load distribution!
Scenario 3: Traffic between IP ranges
Traffic Analytics will enrich the flow records with details such as the source and destination subnets:
NTANetAnalytics
| where SubType == 'FlowLog' and FaSchemaVersion == '3' and TimeGenerated > ago(60d)
| where isnotempty(SrcSubnet) and isnotempty(DestSubnet)
| summarize TotalBytes=sum(BytesSrcToDest + BytesDestToSrc) by SrcSubnet, DestSubnet, L4Protocol, DestPort
However, sometimes you want to do a different data aggregation, for example if you would like to see traffic between on-premises and Azure. In this case you can define the aggregation prefixes yourself and use the handy KQL function ipv4_is_in_range:
let prefix1="10.1.1.0/27";
let prefix2="10.1.1.64/27";
NTANetAnalytics
| where SubType == 'FlowLog' and FaSchemaVersion == '3' and TimeGenerated > ago(30d)
| extend SrcIpIsInPrefix1 = ipv4_is_in_range(SrcIp, prefix1), SrcIpIsInPrefix2 = ipv4_is_in_range(SrcIp, prefix2)
| extend DestIpIsInPrefix1 = ipv4_is_in_range(DestIp, prefix1), DestIpIsInPrefix2 = ipv4_is_in_range(DestIp, prefix2)
| where (SrcIpIsInPrefix1 and DestIpIsInPrefix2) or (SrcIpIsInPrefix2 and DestIpIsInPrefix1)
| summarize TotalBytes=sum(BytesSrcToDest + BytesDestToSrc) by SrcIp, DestIp,L4Protocol,DestPort,L7Protocol
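If you want to sanity-check your prefixes before filtering real flows, ipv4_is_in_range can be tried on its own with a print statement. This is just a quick sketch using the two prefixes from the query above and a few addresses I picked for illustration:

```kql
print
    // 10.1.1.0/27 covers 10.1.1.0 to 10.1.1.31
    InPrefix1  = ipv4_is_in_range("10.1.1.8", "10.1.1.0/27"),
    // 10.1.1.64/27 covers 10.1.1.64 to 10.1.1.95
    InPrefix2  = ipv4_is_in_range("10.1.1.70", "10.1.1.64/27"),
    // 10.1.1.40 falls in the gap between both prefixes
    OutOfRange = ipv4_is_in_range("10.1.1.40", "10.1.1.0/27")
```

The first two columns come back as true and the third as false, which is why a flow between 10.1.1.8 and 10.1.1.70 passes the filter in the query above while anything involving 10.1.1.40 would not.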
You want this information but distributed across time, to have a look at traffic evolution? You got it. We can create a new field that combines source, destination and protocol, and render the result as a stacked chart:
let prefix1="10.1.1.0/27";
let prefix2="10.1.1.64/27";
NTANetAnalytics
| where SubType == 'FlowLog' and FaSchemaVersion == '3' and TimeGenerated > ago(30d)
| extend SrcIpIsInPrefix1 = ipv4_is_in_range(SrcIp, prefix1), SrcIpIsInPrefix2 = ipv4_is_in_range(SrcIp, prefix2)
| extend DestIpIsInPrefix1 = ipv4_is_in_range(DestIp, prefix1), DestIpIsInPrefix2 = ipv4_is_in_range(DestIp, prefix2)
| where (SrcIpIsInPrefix1 and DestIpIsInPrefix2) or (SrcIpIsInPrefix2 and DestIpIsInPrefix1)
| extend FlowDescription = strcat(SrcIp, "-", DestIp, "-", L4Protocol, tostring(DestPort))
| summarize TotalBytes=sum(BytesSrcToDest + BytesDestToSrc) by FlowDescription, bin(TimeGenerated, 1d)
| render columnchart
Interesting, a lot of traffic on port 80, maybe somebody should look at migrating to HTTPS?
Scenario 4: ExpressRoute traffic
You know that VNet Flow Logs are not tied to NSGs, which is why they also work for resources such as Azure Firewall. Guess what, this applies as well to VPN and ExpressRoute gateways. More concretely, with ExpressRoute we can even leverage the fields SrcExpressRouteCircuit and DestExpressRouteCircuit:
NTANetAnalytics
| where SubType == 'FlowLog' and TimeGenerated > ago(60d)
// String columns are never null in KQL, so check for non-empty values instead of isnotnull()
| where isnotempty(SrcExpressRouteCircuit) or isnotempty(DestExpressRouteCircuit)
| extend TargetResourceName = tostring(split(TargetResourceId, "/")[2])
| summarize TotalBytes=sum(BytesSrcToDest + BytesDestToSrc) by TargetResourceName, bin(TimeGenerated, 1d)
| render columnchart
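When filtering on these two fields, the choice between isnotnull() and isnotempty() matters. In KQL a string value can be empty but never null, so isnotnull() is true for every string. The sketch below compares both functions on made-up rows. I am assuming here that SrcExpressRouteCircuit and DestExpressRouteCircuit are string columns, and the circuit name is a placeholder:

```kql
// Hypothetical rows: flow 1 does not traverse ExpressRoute, flows 2 and 3 do
datatable(FlowId:int, SrcExpressRouteCircuit:string, DestExpressRouteCircuit:string)
[
    1, "",            "",
    2, "er-circuit1", "",
    3, "",            "er-circuit1"
]
// True for all three rows, because strings are never null
| extend NotNull  = isnotnull(SrcExpressRouteCircuit) or isnotnull(DestExpressRouteCircuit)
// True only for flows 2 and 3, which actually carry a circuit name
| extend NotEmpty = isnotempty(SrcExpressRouteCircuit) or isnotempty(DestExpressRouteCircuit)
```

So if a chart of "ExpressRoute traffic" looks suspiciously similar to your total traffic, the filter function is the first thing I would check.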
We can use the previous recipe for prefix aggregation to show traffic from onprem to Azure and Azure to onprem:
let prefix1="10.4.0.0/16";
let prefix2="10.1.0.0/16";
NTANetAnalytics
| where SubType == 'FlowLog' and FaSchemaVersion == '3' and FlowStartTime > ago(24h)
| extend SrcIpIsInPrefix1 = ipv4_is_in_range(SrcIp, prefix1), SrcIpIsInPrefix2 = ipv4_is_in_range(SrcIp, prefix2)
| extend DestIpIsInPrefix1 = ipv4_is_in_range(DestIp, prefix1), DestIpIsInPrefix2 = ipv4_is_in_range(DestIp, prefix2)
| where (SrcIpIsInPrefix1 and DestIpIsInPrefix2) or (SrcIpIsInPrefix2 and DestIpIsInPrefix1)
| extend Direction = iff((SrcIpIsInPrefix1 and DestIpIsInPrefix2), "Onprem2Azure", "Azure2Onprem")
| summarize TotalBytesSrcToDest=sum(BytesSrcToDest), TotalBytesDestToSrc=sum(BytesDestToSrc) by Direction
| render columnchart
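To make the direction logic easier to follow, here is the same classification applied to three invented flows in an inline datatable. I am assuming, as the labels in the query above suggest, that 10.4.0.0/16 is the on-premises range and 10.1.0.0/16 the Azure one. The variable names and sample flows are mine:

```kql
let onprem = "10.4.0.0/16";   // assumed on-premises range
let azure  = "10.1.0.0/16";   // assumed Azure range
// Hypothetical flows: one started on-premises, one started in Azure, one Azure-internal
datatable(SrcIp:string, DestIp:string, BytesSrcToDest:long, BytesDestToSrc:long)
[
    "10.4.2.10", "10.1.1.8",  100, 900,
    "10.1.1.8",  "10.4.2.10", 400, 50,
    "10.1.1.8",  "10.1.1.70", 700, 700
]
| extend SrcOnprem = ipv4_is_in_range(SrcIp, onprem), DestOnprem = ipv4_is_in_range(DestIp, onprem)
| extend SrcAzure = ipv4_is_in_range(SrcIp, azure), DestAzure = ipv4_is_in_range(DestIp, azure)
// Keep only flows crossing between the two ranges, which drops the Azure-internal flow
| where (SrcOnprem and DestAzure) or (SrcAzure and DestOnprem)
// Direction describes who initiated the flow, not where the bytes went
| extend Direction = iff(SrcOnprem and DestAzure, "Onprem2Azure", "Azure2Onprem")
| summarize TotalBytesSrcToDest = sum(BytesSrcToDest), TotalBytesDestToSrc = sum(BytesDestToSrc) by Direction
```

This returns 100 and 900 bytes for Onprem2Azure, and 400 and 50 bytes for Azure2Onprem. Note that the 900 bytes in the first row actually travelled from Azure to on-premises, as return traffic of a flow initiated on-premises, so read the two byte columns together with the direction label.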
Wrapping up
These were only some examples of how you can slice and dice the data in VNet Flow Logs. Please do not hesitate to let me know in the comments about other cool KQL queries you are using!