Getting started with Private Clusters on HDInsight on AKS for securing your analytics workloads
HDInsight on AKS is a managed Platform as a Service (PaaS) that runs on Azure Kubernetes Service (AKS). It lets you deploy popular open-source analytics workloads such as Apache Spark™, Apache Flink®, and Trino without the overhead of managing and monitoring containers.
HDInsight on AKS clusters allow you to set up outbound network connections from the cluster to any destination that is reachable from the node’s network interface. This means that cluster resources can access any public or private IP address, domain name, or URL on the internet or on your virtual network.
However, in some scenarios you may want to control or restrict the egress traffic from your cluster for security or compliance reasons. For example, you may want to:
Prevent clusters from accessing malicious or unwanted services.
Enforce network policies or firewall rules on the outbound traffic.
Monitor or audit the egress traffic from the cluster for troubleshooting or compliance purposes.
There are different methods for managing the traffic flow. You can learn more about it here.
In this blog, we discuss how to control or restrict the egress traffic from your HDInsight on AKS cluster using user-defined routing (UDR) in your virtual network.
With this setup, no public IP address is created when you spin up an HDInsight on AKS cluster.
Note: A UDR setup requires you to set up firewall rules and define the routing on a custom VNet and subnet before you create an HDInsight on AKS cluster.
Let’s get started.
Step 1: Set up the virtual network (VNet). Required only if you don’t have an existing VNet.
From the Azure portal, search for virtual networks and select the option to create a new one.
Create a VNet named “contoso-hdi-vnet”.
Step 2: Set up the firewall. Deploy the firewall in your virtual network (contoso-hdi-vnet).
To deploy a firewall into the virtual network, you need a subnet called AzureFirewallSubnet.
Navigate to your VNet (contoso-hdi-vnet) and go to Subnets.
Add a subnet with the subnet purpose set to “Azure Firewall”.
Now, go to the Firewall tab and select the option to add a new firewall.
2. Create a firewall named “contoso-hdi-firewall” with the following details:
Setting | Value
Resource group | Same resource group as the virtual network.
Name | “contoso-hdi-firewall” or a name of your choice.
Region | Same region as the virtual network.
Firewall policy | Create one by selecting Add new.
Virtual network | Select the virtual network (contoso-hdi-vnet).
Public IP address | Select an existing address or create one by selecting Add new.
3. Once the deployment is complete, go to the Overview page of the newly created firewall and copy its private IP address. This private IP address is used as the next hop address in the routing rule for the virtual network.
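Before deploying the firewall, it can help to sanity-check the address plan for AzureFirewallSubnet. The following is a minimal sketch using only Python's standard library. The CIDR ranges are assumptions made up for illustration (the post does not state any), and the /26 check reflects the subnet size given in the Azure Firewall guidance rather than anything specific to HDInsight on AKS.

```python
import ipaddress

# Assumed address plan for illustration only; the post does not specify CIDR ranges.
VNET_CIDR = "10.0.0.0/16"             # contoso-hdi-vnet
CLUSTER_SUBNET_CIDR = "10.0.0.0/24"   # the "default" subnet used for the cluster
FIREWALL_SUBNET_CIDR = "10.0.1.0/26"  # AzureFirewallSubnet


def check_firewall_subnet(vnet_cidr, firewall_cidr, other_subnet_cidrs):
    """Return a list of problems found in the planned AzureFirewallSubnet range."""
    vnet = ipaddress.ip_network(vnet_cidr)
    firewall = ipaddress.ip_network(firewall_cidr)
    problems = []

    # The firewall subnet has to live inside the VNet address space.
    if not firewall.subnet_of(vnet):
        problems.append(f"{firewall} is not inside the VNet range {vnet}")

    # A larger prefix length means a smaller subnet; /26 is the documented size.
    if firewall.prefixlen > 26:
        problems.append(f"{firewall} is smaller than /26")

    # Subnets in the same VNet must not overlap.
    for cidr in other_subnet_cidrs:
        other = ipaddress.ip_network(cidr)
        if firewall.overlaps(other):
            problems.append(f"{firewall} overlaps existing subnet {other}")

    return problems
```

Calling `check_firewall_subnet(VNET_CIDR, FIREWALL_SUBNET_CIDR, [CLUSTER_SUBNET_CIDR])` returns an empty list for the assumed plan; any returned strings describe what to fix before creating the subnet.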
Step 3: Create a route table and associate it with your virtual network to route all traffic to the firewall
When you create a virtual network, Azure automatically creates a default route table for each of its subnets and adds system default routes to the table. In this step, you create a user-defined route table that routes all traffic to the firewall, and then associate it with the subnet that the HDInsight on AKS cluster will use.
From the Azure portal, search for “Route tables” and select the Route tables resource.
Create a route table named “contoso-hdi-route-table”.
Note: The region must be the same as the firewall’s region, for example “East US 2” in this case.
Go to the newly created route table and add a route with the following details:
Setting | Value
Destination type | IP Addresses
Destination IP addresses/CIDR ranges | 0.0.0.0/0
Next hop type | Virtual appliance
Next hop address | The private IP address of the firewall that you copied
Go to Subnets and associate the route table with the subnet you want to use for the HDInsight on AKS cluster setup. Here, the “default” subnet is used.
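To see why a single 0.0.0.0/0 route is enough to send internet-bound traffic through the firewall while traffic inside the VNet stays local, here is a small sketch of route selection. It models the documented Azure behavior of longest prefix match, with a user-defined route winning over a system route for the same prefix. The VNet range and the firewall private IP (10.0.1.4) are assumed values, and the `Route` type is a hypothetical helper, not an Azure SDK class.

```python
import ipaddress
from typing import NamedTuple


class Route(NamedTuple):
    prefix: str
    next_hop_type: str
    next_hop_ip: str = ""
    user_defined: bool = False


# Assumed values: the VNet range and firewall private IP are illustrative.
ROUTES = [
    Route("10.0.0.0/16", "VnetLocal"),   # system route for the VNet address space
    Route("0.0.0.0/0", "Internet"),      # system default route
    Route("0.0.0.0/0", "VirtualAppliance", "10.0.1.4", user_defined=True),  # UDR from this step
]


def select_route(destination, routes):
    """Pick a route by longest prefix match; on a tie, prefer the user-defined route."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    return max(
        candidates,
        key=lambda r: (ipaddress.ip_network(r.prefix).prefixlen, r.user_defined),
    )
```

With these assumed routes, `select_route("10.0.0.5", ROUTES)` resolves to the VnetLocal system route, while a public destination such as `"52.1.2.3"` resolves to the VirtualAppliance route and is forwarded to the firewall.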
Step 4: Configure Firewall policies
Navigate to the firewall’s overview page and select its firewall policy.
Add network rules (defined here) with the subnet (to be used for setting up the HDInsight on AKS cluster) as the source address.
Add application rules (defined here) with the subnet (to be used for setting up the HDInsight on AKS cluster) as the source address.
Depending on the cluster type (Spark, Flink, Trino), you need to add the additional network and application rules defined here.
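The effect of application rules can be pictured as a default-deny allow-list keyed on the source subnet and the target FQDN. The sketch below is a simplified model and not the firewall's actual rule engine. The subnet range, the rule name, and the `*.example.com` pattern are hypothetical placeholders; the real rule list comes from the HDInsight on AKS documentation referenced above.

```python
import ipaddress
from fnmatch import fnmatchcase

# Assumed range of the "default" subnet used for the cluster.
CLUSTER_SUBNET = ipaddress.ip_network("10.0.0.0/24")

# Hypothetical placeholder rule; substitute the documented FQDNs for your cluster type.
APPLICATION_RULES = [
    {
        "name": "allow-example-https",
        "source": CLUSTER_SUBNET,
        "port": 443,
        "target_fqdns": ["*.example.com"],
    },
]


def is_allowed(source_ip, fqdn, port, rules):
    """Default deny: traffic passes only if a rule matches source, port, and FQDN."""
    src = ipaddress.ip_address(source_ip)
    host = fqdn.lower()
    for rule in rules:
        if src not in rule["source"]:
            continue  # rule applies to a different source range
        if port != rule["port"]:
            continue  # rule applies to a different port
        if any(fnmatchcase(host, p.lower()) for p in rule["target_fqdns"]):
            return True
    return False
```

In this model, a request from 10.0.0.10 to `api.example.com` on port 443 is allowed, while the same request to an unlisted domain, from another subnet, or on another port is denied.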
Step 5: Set up the HDInsight on AKS cluster pool
From the Azure portal, search for “HDInsight on AKS cluster pool” and create a new HDInsight on AKS cluster pool.
Under Security + network settings, choose the virtual network (contoso-hdi-vnet), the subnet (default), and the egress path (Outbound with userDefinedRouting).
Once the cluster pool is created, verify that no public IP address was created: search for the MC_hdi-<clusterpool deployment id> resource group and check its resources.
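If you prefer to script the check instead of eyeballing the resource group, one option is to export its resource list as JSON (for example with `az resource list`) and filter on the public IP resource type. The sketch below assumes each entry carries `name` and `type` fields; the sample entries are hypothetical and only illustrate the shape of the data.

```python
PUBLIC_IP_TYPE = "Microsoft.Network/publicIPAddresses"


def find_public_ips(resources):
    """Return the names of all resources whose type is a public IP address."""
    return [
        r["name"]
        for r in resources
        if r["type"].lower() == PUBLIC_IP_TYPE.lower()
    ]


# Hypothetical sample of what the managed resource group might contain.
sample_resources = [
    {"name": "aks-nodepool-vmss", "type": "Microsoft.Compute/virtualMachineScaleSets"},
    {"name": "kubernetes-internal", "type": "Microsoft.Network/loadBalancers"},
]
```

An empty result from `find_public_ips(sample_resources)` corresponds to the expected outcome of this step: no public IP address in the managed resource group.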
Step 6: Add the AKS API server address to the network rules in the firewall policy
From the Azure portal, search for the cluster pool name (contoso-hdi-udr-pool) and go to the corresponding Kubernetes resource.
From the Overview tab, copy the API server address.
Navigate to contoso-hdi-firewall-policy and enable DNS proxy from the DNS tab.
Go to the Network rules tab and add a new rule.
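DNS proxy is enabled in this step because the API server address is a host name, and FQDN-based network rules rely on the firewall resolving names itself. The sketch below shows one way to normalize the copied address and describe the rule as plain data. The dictionary keys, the rule name, TCP port 443, and the sample host name are all assumptions for illustration; the post does not list the rule's exact values.

```python
from urllib.parse import urlsplit


def api_server_host(address):
    """Strip an optional scheme, port, and path from the copied API server address."""
    if "//" not in address:
        # Prefix "//" so urlsplit treats the value as a network location.
        address = "//" + address
    return urlsplit(address).hostname


def build_api_server_rule(address, source_cidr):
    """Describe the network rule as a plain dict (hypothetical field names)."""
    return {
        "name": "allow-aks-api-server",      # hypothetical rule name
        "source_addresses": [source_cidr],   # the cluster subnet
        "protocols": ["TCP"],                # assumption: API server over TCP
        "destination_fqdns": [api_server_host(address)],
        "destination_ports": ["443"],        # assumption: HTTPS port
    }
```

For example, `build_api_server_rule("https://contoso-hdi-abc123.hcp.eastus2.azmk8s.io:443", "10.0.0.0/24")` yields a rule whose only destination is the bare host name, which is the value to enter in the firewall policy.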
Step 7: Assign the Network Contributor role to the AKS cluster that matches the cluster pool, scoped to the network resources used for defining the routing, such as the virtual network, route table, and NSG (if used).
Navigate to your VNet (contoso-hdi-vnet), go to Access control, and click “Add role assignment”.
Select the “Network Contributor” role and set the member type to “Managed identity”. In the Managed identity option, select Kubernetes services and then select your cluster pool name.
Click “Review + create” to complete the role assignment.
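The same role assignment has to be repeated for each network resource involved in the routing. As a rough aid, the sketch below builds the Azure resource IDs that serve as the assignment scopes. The subscription ID and resource group name are placeholders you would replace with your own, and the helper functions are hypothetical, not part of any Azure SDK.

```python
def network_resource_id(subscription_id, resource_group, resource_type, name):
    """Build the resource ID of a Microsoft.Network resource."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/{resource_type}/{name}"
    )


def role_assignment_scopes(subscription_id, resource_group, vnet, route_table, nsg=None):
    """List the scopes that need the Network Contributor role assignment."""
    scopes = [
        network_resource_id(subscription_id, resource_group, "virtualNetworks", vnet),
        network_resource_id(subscription_id, resource_group, "routeTables", route_table),
    ]
    if nsg:  # the NSG is optional, as noted in the step above
        scopes.append(
            network_resource_id(subscription_id, resource_group, "networkSecurityGroups", nsg)
        )
    return scopes
```

Calling `role_assignment_scopes("<subscription-id>", "<resource-group>", "contoso-hdi-vnet", "contoso-hdi-route-table")` gives two scopes, one per resource, which makes it easy to confirm that none of them was missed in the portal.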
Step 8: Create the HDInsight on AKS cluster
From the Azure portal, search for the “HDInsight on AKS cluster” service or click + New cluster from the Overview tab of the cluster pool.
Select the cluster pool (contoso-hdi-udr-pool) and the cluster type (Trino), and then click “Review + create”.
Step 9: Access the cluster via a client such as a virtual machine (VM)
Create a Windows virtual machine and copy the public IP address of the VM.
Navigate to the route table (contoso-hdi-route-table) and add the VM’s IP address to the route table.
Log in remotely to the VM; from there you can access the cluster web URLs.
With private AKS clusters and an outbound UDR setup, enterprise customers can ensure that their sensitive data is protected from unauthorized access and theft. They can continue to implement a range of security measures to protect their data-driven applications, and with in-place upgrades they can apply regular security updates and patches to keep their systems up to date and secure.
With all of this available, enterprise customers can now comply with industry-specific regulations related to data privacy, security, and compliance and reduce the risk of data breaches and other security incidents.
Microsoft Tech Community – Latest Blogs