Announcing Face API Liveness Pricing
Microsoft Tech Community – Latest Blogs –Read More
Microsoft and SAP work together to transform identity for SAP customers
SAP has recently announced its collaboration with Microsoft and advises its SAP Identity Management (IDM) customers to move their identity management scenarios to Microsoft Entra ID as SAP IDM approaches the end of maintenance. This latest collaboration creates new possibilities for Microsoft Entra and SAP to offer enhanced integration that will support a comprehensive identity and access governance framework.
Microsoft and SAP will deepen our longstanding partnership to combine our unique areas of expertise. We are committed to delivering the best identity management solutions for our customers and users, and we’re honored to partner with SAP on delivering seamless and secure identity management experiences that will support SAP customers’ digital transformation and cloud adoption goals. Over the years we’ve worked together to integrate our products and services, such as Microsoft Azure, Microsoft 365, SAP Cloud Platform, SAP S/4HANA, and SAP SuccessFactors.
Our aim is to help SAP customers with their migration path so they can continue to connect enterprise software and collaboration tools to work and innovate effectively, quickly, and seamlessly.
To learn more about our latest collaboration, read the blog post here.
Irina Nechaeva, General Manager, Identity and Network Access
Learn more about Microsoft Entra:
Related Articles: SAP’s blog - Preparing for SAP Identity Management’s End-of-Maintenance in 2027.
See recent Microsoft Entra blogs
Dive into Microsoft Entra technical documentation
Learn more at Azure Active Directory (Azure AD) rename to Microsoft Entra ID
Join the conversation on the Microsoft Entra discussion space
Learn more about Microsoft Security
Running GPU accelerated workloads with NVIDIA GPU Operator on AKS
Dr. Wolfgang De Salvador – EMEA GBB HPC/AI Infrastructure Senior Specialist
Dr. Kai Neuffer – Principal Program Manager, Industry and Partner Sales – Energy Industry
Resources and references used in this article:
About the NVIDIA GPU Operator — NVIDIA GPU Operator 23.9.1 documentation
Use GPUs on Azure Kubernetes Service (AKS) – Azure Kubernetes Service | Microsoft Learn
Create a multi-instance GPU node pool in Azure Kubernetes Service (AKS) – Azure Kubernetes Service | Microsoft Learn
As of today, several options are available to run GPU accelerated HPC/AI workloads on Azure, ranging from training to inferencing.
Looking specifically at AI workloads, the most direct and fully managed way to access GPU resources and related orchestration capabilities is Azure Machine Learning, with its distributed training capabilities and the related deployment options for inferencing.
At the same time, specific HPC/AI workloads require a high degree of customization and granular control over the compute-resource configuration, including the operating system, the system packages, the HPC/AI software stack, and the drivers. This is the case, for example, in previous blog posts by our benchmarking team on training the NVIDIA NeMo Megatron model or on MLPerf Training v3.0.
In these scenarios, it is critical to be able to fine-tune the host configuration at the operating system level, to precisely match the ideal setup for getting the most value out of the compute resources.
On Azure, HPC/AI workload orchestration on GPUs is supported by several services, including Azure CycleCloud, Azure Batch, and Azure Kubernetes Service (AKS).
Focus of the blog post
The focus of this article will be on getting NVIDIA GPUs managed and configured in the best way on Azure Kubernetes Service using the NVIDIA GPU Operator.
The guide will be based on the documentation already available in Azure Learn for configuring GPU nodes or multi-instance GPU profile nodes, as well as on the NVIDIA GPU Operator documentation.
However, the main scope of the article is to present a methodology to fully manage the GPU configuration by leveraging native NVIDIA GPU Operator features, including:
Driver versions and custom driver bundles
Time-slicing for GPU oversubscription
MIG profiles for supported GPUs, without having to define the behavior exclusively at node pool creation time
Deploying a vanilla AKS cluster
To deploy a vanilla AKS cluster, follow the standard procedure described in the Azure documentation.
Please be aware that this command will create an AKS cluster with:
Kubenet as the network plugin
A public API server endpoint
Local accounts with Kubernetes RBAC
In general, for production workloads we strongly recommend reviewing the main security concepts for AKS clusters:
Use Azure CNI
Evaluate using Private AKS Cluster to limit API exposure to the public internet
Evaluate using Azure RBAC with Entra ID accounts or Kubernetes RBAC with Entra ID accounts
These topics are out of scope for the present demo, but please be aware that this cluster is meant for NVIDIA GPU Operator demo purposes only.
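For reference, the hardening recommendations above map to az aks create flags roughly as follows. This is an illustrative sketch only; flag availability and defaults change over time, so check the current Azure CLI documentation before use:

```shell
# Illustrative only: hardened variant of the cluster creation command.
# --network-plugin azure      -> use Azure CNI instead of kubenet
# --enable-private-cluster    -> private API server endpoint
# --enable-aad                -> Entra ID (AAD) integration
# --enable-azure-rbac         -> Azure RBAC for Kubernetes authorization
az aks create \
  --resource-group $RESOURCE_GROUP_NAME \
  --name $AKS_CLUSTER_NAME \
  --node-count 2 \
  --generate-ssh-keys \
  --network-plugin azure \
  --enable-private-cluster \
  --enable-aad \
  --enable-azure-rbac
```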
Using the Azure CLI, we can create an AKS cluster with the following procedure (replace the values between angle brackets with your preferred values):
export RESOURCE_GROUP_NAME=<YOUR_RG_NAME>
export AKS_CLUSTER_NAME=<YOUR_AKS_CLUSTER_NAME>
export LOCATION=<YOUR_LOCATION>
## Run the following line only if the resource group does not exist yet
az group create --name $RESOURCE_GROUP_NAME --location $LOCATION
az aks create --resource-group $RESOURCE_GROUP_NAME --name $AKS_CLUSTER_NAME --node-count 2 --generate-ssh-keys
Connecting to the cluster
To connect to the AKS cluster, several methods are documented in the Azure documentation.
Our favorite approach is using a Linux Ubuntu VM with Azure CLI installed.
From there we can run the following (note that for the login command you may need to add --tenant <TENANT_ID> if you have access to multiple tenants, or --identity if the VM runs on Azure and you rely on an Azure Managed Identity):
## Add --tenant <TENANT_ID> in case of multiple tenants
## Add --identity in case of using a managed identity on the VM
az login
az aks install-cli
az aks get-credentials --resource-group $RESOURCE_GROUP_NAME --name $AKS_CLUSTER_NAME
After this is completed, you should be able to perform standard kubectl commands like:
kubectl get nodes
root@aks-gpu-playground-rg-jumpbox:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-nodepool1-25743550-vmss000000 Ready agent 2d19h v1.27.7
aks-nodepool1-25743550-vmss000001 Ready agent 2d19h v1.27.7
The command line will be perfectly fine for all the operations in this blog post. However, if you prefer a TUI experience, we suggest k9s, which can be easily installed on Linux following its installation instructions. On Ubuntu, you can install the version current at the time of writing with:
wget "https://github.com/derailed/k9s/releases/download/v0.31.9/k9s_linux_amd64.deb"
dpkg -i k9s_linux_amd64.deb
k9s allows you to easily interact with the different resources of the AKS cluster directly from a terminal user interface. It can be launched with the k9s command. Detailed documentation on navigating the different resources (Pods, DaemonSets, Nodes) can be found on the official k9s documentation page.
Attaching an Azure Container registry to the Azure Kubernetes Cluster (only required for MIG and NVIDIA GPU Driver CRD)
If you will be using MIG or the NVIDIA GPU Driver CRD, it is necessary to create a private Azure Container Registry and attach it to the AKS cluster:
export ACR_NAME=<ACR_NAME_OF_YOUR_CHOICE>
az acr create --resource-group $RESOURCE_GROUP_NAME --name $ACR_NAME --sku Basic
az aks update --name $AKS_CLUSTER_NAME --resource-group $RESOURCE_GROUP_NAME --attach-acr $ACR_NAME
You will be able to perform pull and push operations against this Container Registry through Docker using the following command on a VM with a container engine installed, provided that the VM has a managed identity with AcrPull/AcrPush permissions:
az acr login --name $ACR_NAME
About taints for AKS GPU nodes
It is important to deeply understand the concept of taints and tolerations for GPU nodes in AKS. This is critical for two reasons:
If spot instances are used in the AKS cluster, they will automatically receive the taint
kubernetes.azure.com/scalesetpriority=spot:NoSchedule
In some cases, it may be useful to add a dedicated taint for GPU SKUs on the AKS cluster, such as
sku=gpu:NoSchedule
This taint is useful mainly because, unlike on-premises and bare-metal Kubernetes clusters, AKS node pools are usually allowed to scale down to 0 instances. When the AKS autoscaler has to make a decision based on an "nvidia.com/gpu" resource request, it may therefore struggle to identify the right node pool to scale up.
However, the latter point can also be addressed in a more elegant and specific way using an affinity declaration in the Job or Pod specs requesting GPUs, for example:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
          - Standard_NC4as_T4_v3
Creating the first GPU pool
The AKS cluster created so far has, by default, only a node pool with 2 Standard_DS2_v2 nodes.
In order to test NVIDIA GPU Operator and run some GPU accelerated workload, we should add a GPU node pool.
It is critical, if the NVIDIA stack is meant to be managed by the GPU Operator, that the node pool is created with the tag:
SkipGPUDriverInstall=true
This can be done using Azure Cloud Shell, for example using an NC4as_T4_v3 and setting the autoscaling from 0 up to 1 node:
az aks nodepool add \
  --resource-group $RESOURCE_GROUP_NAME \
  --cluster-name $AKS_CLUSTER_NAME \
  --name nc4ast4 \
  --node-taints sku=gpu:NoSchedule \
  --node-vm-size Standard_NC4as_T4_v3 \
  --enable-cluster-autoscaler \
  --min-count 0 --max-count 1 --node-count 0 --tags SkipGPUDriverInstall=True
In order to deploy in Spot mode, the following flags should be added to Azure CLI:
--priority Spot --eviction-policy Delete --spot-max-price -1
Recently, a preview feature has been released that allows skipping the creation of the tag:
# Register the aks-preview extension
az extension add --name aks-preview

# Update the aks-preview extension
az extension update --name aks-preview
az aks nodepool add \
  --resource-group $RESOURCE_GROUP_NAME \
  --cluster-name $AKS_CLUSTER_NAME \
  --name nc4ast4 \
  --node-taints sku=gpu:NoSchedule \
  --node-vm-size Standard_NC4as_T4_v3 \
  --enable-cluster-autoscaler \
  --min-count 0 --max-count 1 --node-count 0 --skip-gpu-driver-install
At the end of the process you should get the appropriate node pool defined in the portal and in status “Succeeded”:
az aks nodepool list --cluster-name $AKS_CLUSTER_NAME --resource-group $RESOURCE_GROUP_NAME -o table

Name       OsType    KubernetesVersion    VmSize                Count    MaxPods    ProvisioningState    Mode
---------  --------  -------------------  --------------------  -------  ---------  -------------------  ------
nodepool1  Linux     1.27.7               Standard_DS2_v2       2        110        Succeeded            System
nc4ast4    Linux     1.27.7               Standard_NC4as_T4_v3  0        110        Succeeded            User
Install NVIDIA GPU operator
On the machine with kubectl configured and with context configured above for connection to the AKS cluster, run the following to install helm:
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 && chmod 700 get_helm.sh && ./get_helm.sh
To fine-tune node feature recognition, we will install Node Feature Discovery separately from the NVIDIA GPU Operator. The operator requires the label feature.node.kubernetes.io/pci-10de.present=true on the nodes. Moreover, it is important to configure Node Feature Discovery so that it is scheduled even on Spot instances of the Kubernetes cluster and on instances where the sku=gpu taint is applied:
helm install --wait --create-namespace -n gpu-operator node-feature-discovery node-feature-discovery --repo https://kubernetes-sigs.github.io/node-feature-discovery/charts --set-json master.config.extraLabelNs='["nvidia.com"]' --set-json worker.tolerations='[{"effect": "NoSchedule", "key": "sku", "operator": "Equal", "value": "gpu"},{"effect": "NoSchedule", "key": "kubernetes.azure.com/scalesetpriority", "value": "spot", "operator": "Equal"},{"effect": "NoSchedule", "key": "mig", "value": "notReady", "operator": "Equal"}]'
After enabling Node Feature Discovery, it is important to create a custom rule to precisely match NVIDIA GPUs on the nodes. This can be done by creating a file called nfd-gpu-rule.yaml with the following content:
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: nfd-gpu-rule
spec:
  rules:
  - name: "nfd-gpu-rule"
    labels:
      "feature.node.kubernetes.io/pci-10de.present": "true"
    matchFeatures:
    - feature: pci.device
      matchExpressions:
        vendor: {op: In, value: ["10de"]}
After this file is created, we should apply this to the AKS cluster:
kubectl apply -n gpu-operator -f nfd-gpu-rule.yaml
After this step, it is necessary to add NVIDIA Helm repository:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
The next step is installing the GPU Operator itself, remembering to apply the same tolerations to the GPU Operator DaemonSets and to disable the operator's own Node Feature Discovery (nfd) deployment, since it was installed in the previous step:
helm install --wait --generate-name -n gpu-operator nvidia/gpu-operator --set-json daemonsets.tolerations='[{"effect": "NoSchedule", "key": "sku", "operator": "Equal", "value": "gpu"},{"effect": "NoSchedule", "key": "kubernetes.azure.com/scalesetpriority", "value": "spot", "operator": "Equal"},{"effect": "NoSchedule", "key": "mig", "value": "notReady", "operator": "Equal"}]' --set nfd.enabled=false
Running the first GPU example
Once the configuration has been completed, it is time to check the functionality of the GPU Operator by submitting the first GPU-accelerated Job on AKS. In this stage we will use as a reference the standard TensorFlow example that is also documented in the official AKS pages on Microsoft Learn.
Create a file called gpu-accelerated.yaml with this content:
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: samples-tf-mnist-demo
  name: samples-tf-mnist-demo
spec:
  template:
    metadata:
      labels:
        app: samples-tf-mnist-demo
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values:
                - Standard_NC4as_T4_v3
      containers:
      - name: samples-tf-mnist-demo
        image: mcr.microsoft.com/azuredocs/samples-tf-mnist-demo:gpu
        args: ["--max_steps", "500"]
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - mountPath: /tmp
          name: scratch
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure
      tolerations:
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
      volumes:
      - name: scratch
        hostPath:
          # directory location on host
          path: /mnt/tmp
          # this field is optional
          type: DirectoryOrCreate
This job can be submitted with the following command:
kubectl apply -f gpu-accelerated.yaml
After approximately one minute the node should be automatically provisioned:
root@aks-gpu-playground-rg-jumpbox:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-nc4ast4-81279986-vmss000003 Ready agent 2m38s v1.27.7
aks-nodepool1-25743550-vmss000000 Ready agent 4d16h v1.27.7
aks-nodepool1-25743550-vmss000001 Ready agent 4d16h v1.27.7
We can check that Node Feature Discovery has properly labeled the node:
root@aks-gpu-playground-rg-jumpbox:~# kubectl describe nodes aks-nc4ast4-81279986-vmss000003 | grep pci-
feature.node.kubernetes.io/pci-0302_10de.present=true
feature.node.kubernetes.io/pci-10de.present=true
The NVIDIA GPU Operator DaemonSets will start preparing the node. After the driver installation, the NVIDIA Container Toolkit setup, and the related validation are completed, the job will start.
Once node preparation is completed, the GPU operator will add an allocatable GPU resource to the node:
kubectl describe nodes aks-nc4ast4-81279986-vmss000003
…
Allocatable:
cpu: 3860m
ephemeral-storage: 119703055367
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 24487780Ki
nvidia.com/gpu: 1
pods: 110
…
We can follow the process with the kubectl logs command:
root@aks-gpu-playground-rg-jumpbox:~# kubectl get pods
NAME READY STATUS RESTARTS AGE
samples-tf-mnist-demo-tmpr4 1/1 Running 0 11m
root@aks-gpu-playground-rg-jumpbox:~# kubectl logs samples-tf-mnist-demo-tmpr4 --follow
2024-02-18 11:51:31.479768: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2024-02-18 11:51:31.806125: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0001:00:00.0
totalMemory: 15.57GiB freeMemory: 15.47GiB
2024-02-18 11:51:31.806157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla T4, pci bus id: 0001:00:00.0, compute capability: 7.5)
2024-02-18 11:54:56.216820: I tensorflow/stream_executor/dso_loader.cc:139] successfully opened CUDA library libcupti.so.8.0 locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/tensorflow/input_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/tensorflow/input_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/tensorflow/input_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/tensorflow/input_data/t10k-labels-idx1-ubyte.gz
Accuracy at step 0: 0.1201
Accuracy at step 10: 0.7364
…..
Accuracy at step 490: 0.9559
Adding run metadata for 499
Time-slicing configuration
An extremely useful feature of the NVIDIA GPU Operator is time-slicing, which allows a physical GPU on a node to be shared by multiple Pods. Of course, this is a time-based scheduling partition and not a physical GPU partition: the GPU processes run by the different Pods each receive a proportional share of GPU compute time. However, if a Pod is particularly demanding in terms of GPU processing, it will significantly impact the other Pods sharing the GPU.
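As a toy illustration of what time-slicing changes (and what it does not): the node simply advertises more schedulable GPU resources, while the physical compute stays the same:

```shell
# A node with PHYSICAL_GPUS GPUs and a time-slicing factor of REPLICAS
# advertises PHYSICAL_GPUS * REPLICAS allocatable nvidia.com/gpu resources;
# the Pods still share the same physical compute.
PHYSICAL_GPUS=1
REPLICAS=2
ADVERTISED=$((PHYSICAL_GPUS * REPLICAS))
echo "$ADVERTISED"   # prints 2
```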
The official NVIDIA GPU Operator documentation describes different ways to configure time-slicing. Here, considering that one of the benefits of a cloud environment is the possibility of having multiple node pools, each with a different GPU or configuration, we will focus on a fine-grained definition of time-slicing at the node pool level.
Enabling time-slicing takes three steps:
Label the nodes so they can be referenced in the time-slicing configuration
Create the time-slicing ConfigMap
Enable time-slicing based on the ConfigMap in the GPU Operator cluster policy
As a first step, the nodes should be labelled with the key “nvidia.com/device-plugin.config”.
For example, let’s label our node array from Azure CLI:
az aks nodepool update --cluster-name $AKS_CLUSTER_NAME --resource-group $RESOURCE_GROUP_NAME --nodepool-name nc4ast4 --labels "nvidia.com/device-plugin.config=tesla-t4-ts2"
After this step, let's create the ConfigMap object required to allow for a time-slicing factor of 2 on this node pool, in a file called time-slicing-config.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
data:
  tesla-t4-ts2: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 2
Let’s apply the configuration in the GPU operator namespace:
kubectl apply -f time-slicing-config.yaml -n gpu-operator
Finally, let’s update the cluster policy to enable the time-slicing configuration:
kubectl patch clusterpolicy/cluster-policy \
  -n gpu-operator --type merge \
  -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config"}}}}'
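To confirm the cluster policy now points at the ConfigMap, the devicePlugin config name can be read back (an illustrative check, assuming the cluster policy is named cluster-policy as above):

```shell
# Should print the name of the ConfigMap set in the patch above,
# i.e. time-slicing-config
kubectl get clusterpolicy cluster-policy -n gpu-operator -o jsonpath='{.spec.devicePlugin.config.name}'
```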
Now, let’s try to resubmit the job already used in the first step in two replicas, creating a file called gpu-accelerated-time-slicing.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: samples-tf-mnist-demo-ts
  name: samples-tf-mnist-demo-ts
spec:
  completions: 2
  parallelism: 2
  completionMode: Indexed
  template:
    metadata:
      labels:
        app: samples-tf-mnist-demo-ts
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values:
                - Standard_NC4as_T4_v3
      containers:
      - name: samples-tf-mnist-demo
        image: mcr.microsoft.com/azuredocs/samples-tf-mnist-demo:gpu
        args: ["--max_steps", "500"]
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure
      tolerations:
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
Let’s submit the job with the standard syntax:
kubectl apply -f gpu-accelerated-time-slicing.yaml
Now, after the node has been provisioned, we will find that it exposes two allocatable GPU resources and runs the two Pods concurrently.
kubectl describe node aks-nc4ast4-81279986-vmss000004
…
Allocatable:
cpu: 3860m
ephemeral-storage: 119703055367
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 24487780Ki
nvidia.com/gpu: 2
pods: 110
…..
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
——— —- ———— ———- ————— ————- —
default samples-tf-mnist-demo-ts-0-4tdcf 0 (0%) 0 (0%) 0 (0%) 0 (0%) 29s
default samples-tf-mnist-demo-ts-1-67hn4 0 (0%) 0 (0%) 0 (0%) 0 (0%) 29s
gpu-operator gpu-feature-discovery-lksj7 0 (0%) 0 (0%) 0 (0%) 0 (0%) 59s
gpu-operator node-feature-discovery-worker-wbbct 0 (0%) 0 (0%) 0 (0%) 0 (0%) 8m11s
gpu-operator nvidia-container-toolkit-daemonset-8nmx7 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7m24s
gpu-operator nvidia-dcgm-exporter-76rs8 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7m24s
gpu-operator nvidia-device-plugin-daemonset-btwz7 0 (0%) 0 (0%) 0 (0%) 0 (0%) 55s
gpu-operator nvidia-driver-daemonset-8dkkh 0 (0%) 0 (0%) 0 (0%) 0 (0%) 8m6s
gpu-operator nvidia-operator-validator-s7294 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7m24s
kube-system azure-ip-masq-agent-fjm5d 100m (2%) 500m (12%) 50Mi (0%) 250Mi (1%) 9m18s
kube-system cloud-node-manager-9wpsm 50m (1%) 0 (0%) 50Mi (0%) 512Mi (2%) 9m18s
kube-system csi-azuredisk-node-ckqw6 30m (0%) 0 (0%) 60Mi (0%) 400Mi (1%) 9m18s
kube-system csi-azurefile-node-xmfbd 30m (0%) 0 (0%) 60Mi (0%) 600Mi (2%) 9m18s
kube-system kube-proxy-7l856 100m (2%) 0 (0%) 0 (0%) 0 (0%) 9m18s
A few remarks about time-slicing:
It is critical, in this specific scenario, to benchmark and characterize your GPU workload. Time-slicing is a method to maximize resource utilization, not a way to multiply available resources. Careful benchmarking of GPU usage and GPU memory usage is suggested to determine whether time-slicing is a valid solution; for example, if the average load of a specific GPU process is around 30%, a time-slicing factor of 2 or 3 could be evaluated
Of course, CPU and RAM resources should also be considered in the equation
In AKS it is extremely important to note that when the time-slicing configuration is changed for a node pool that currently has no allocated resources, the change is not immediately reflected in the next autoscaler operation.
Imagine, for example, a node pool scaled down to zero with no time-slicing applied, which is then configured with a time-slicing factor of 2. Submitting a request for 2 GPU resources may still allocate 2 nodes.
This is because the autoscaler remembers that each node provides only 1 allocatable GPU. Once a node correctly exposes 2 allocatable GPUs for the first time, the AKS autoscaler will acknowledge that and act accordingly in future autoscaling operations.
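What the autoscaler will observe can be checked once a node is up, by listing the advertised GPU capacity per node (an illustrative command, assuming kubectl is pointed at the cluster):

```shell
# Show the advertised nvidia.com/gpu allocatable count per node.
# With a time-slicing factor of 2 on a single-GPU node, GPUS shows 2.
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
```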
Multi-Instance GPU (MIG)
NVIDIA Multi-Instance GPU (MIG) allows GPU partitioning on the Ampere and Hopper architectures. Unlike time-slicing, the GPU is partitioned at the hardware level: each Pod gets access to a dedicated portion of the GPU resources, isolated in hardware.
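Outside Kubernetes, MIG mode and the available hardware profiles can be inspected directly with nvidia-smi on a node (illustrative commands; they require a MIG-capable GPU and a recent driver):

```shell
# Check whether MIG mode is currently enabled on GPU 0
nvidia-smi -i 0 --query-gpu=mig.mode.current --format=csv

# List the GPU instance profiles the card supports (slice sizes and counts)
nvidia-smi mig -lgip
```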
In Kubernetes there are two strategies available for MIG: single and mixed.
With the single strategy, the nodes expose the MIG slices as standard "nvidia.com/gpu" resources.
With the mixed strategy, the nodes expose the specific MIG profiles as named resources, as in the example below:
Allocatable:
  nvidia.com/mig-1g.5gb: 1
  nvidia.com/mig-2g.10gb: 1
  nvidia.com/mig-3g.20gb: 1
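With the mixed strategy, a Pod then requests a specific slice by name instead of a generic GPU (a hypothetical container fragment; the profile name must match one actually exposed by the node):

```yaml
# Hypothetical container fragment for the mixed MIG strategy:
# the Pod requests one 1g.5gb slice rather than nvidia.com/gpu.
resources:
  limits:
    nvidia.com/mig-1g.5gb: 1
```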
In order to use MIG, you could follow the standard AKS documentation. However, we would like to propose here a method relying entirely on the NVIDIA GPU Operator.
As a first step, it is necessary to allow reboot of nodes to get MIG configuration enabled:
kubectl patch clusterpolicy/cluster-policy -n gpu-operator --type merge -p '{"spec": {"migManager": {"env": [{"name": "WITH_REBOOT", "value": "true"}]}}}'
Let's start by creating a node pool powered by a MIG-capable GPU on Azure, such as the SKU Standard_NC24ads_A100_v4, and label the nodes with one of the MIG profiles available for the A100 80 GB:
az aks nodepool add \
  --resource-group $RESOURCE_GROUP_NAME \
  --cluster-name $AKS_CLUSTER_NAME \
  --name nc24a100v4 \
  --node-taints sku=gpu:NoSchedule \
  --node-vm-size Standard_NC24ads_A100_v4 \
  --enable-cluster-autoscaler \
  --min-count 0 --max-count 1 --node-count 0 --skip-gpu-driver-install --labels "nvidia.com/mig.config=all-1g.10gb"
There is another important detail to consider at this stage with AKS: autoscaling will bring up nodes with a standard GPU configuration, without MIG activated. The NVIDIA GPU Operator will install the drivers, and then mig-manager will activate the proper MIG configuration profile and reboot the node. Between these two phases there is a small time window in which the node exposes the full GPU resources, which could prematurely trigger a job execution.
To handle this scenario, AKS needs an additional DaemonSet that prevents any Pod from being scheduled while the MIG configuration is in progress. This is available in a dedicated repository.
To deploy the DaemonSet:
export NAMESPACE=gpu-operator
export ACR_NAME=<YOUR_ACR_NAME>
git clone https://github.com/wolfgang-desalvador/aks-mig-monitor.git
cd aks-mig-monitor
sed -i "s/<ACR_NAME>/$ACR_NAME/g" mig-monitor-daemonset.yaml
sed -i "s/<NAMESPACE>/$NAMESPACE/g" mig-monitor-roles.yaml
docker build . -t $ACR_NAME/aks-mig-monitor
docker push $ACR_NAME/aks-mig-monitor
kubectl apply -f mig-monitor-roles.yaml -n $NAMESPACE
kubectl apply -f mig-monitor-daemonset.yaml -n $NAMESPACE
We can now try to submit the job. Create a file called mig-accelerated-job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: samples-tf-mnist-demo-mig
  name: samples-tf-mnist-demo-mig
spec:
  completions: 7
  parallelism: 7
  completionMode: Indexed
  template:
    metadata:
      labels:
        app: samples-tf-mnist-demo-mig
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values:
                - Standard_NC24ads_A100_v4
      containers:
      - name: samples-tf-mnist-demo
        image: mcr.microsoft.com/azuredocs/samples-tf-mnist-demo:gpu
        args: ["--max_steps", "500"]
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure
      tolerations:
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
Then submit the job with kubectl:
kubectl apply -f mig-accelerated-job.yaml
After the node starts up, it will initially carry the taint mig=notReady:NoSchedule, since the MIG configuration is not yet completed. The GPU Operator containers will be installed:
kubectl describe nodes aks-nc24a100v4-42670331-vmss00000a
Name: aks-nc24a100v4-42670331-vmss00000a
…
nvidia.com/mig.config=all-1g.10gb
…
Taints: kubernetes.azure.com/scalesetpriority=spot:NoSchedule
mig=notReady:NoSchedule
sku=gpu:NoSchedule
…
Non-terminated Pods: (13 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
——— —- ———— ———- ————— ————- —
gpu-operator aks-mig-monitor-64zpl 0 (0%) 0 (0%) 0 (0%) 0 (0%) 16s
gpu-operator gpu-feature-discovery-wpd2j 0 (0%) 0 (0%) 0 (0%) 0 (0%) 13s
gpu-operator node-feature-discovery-worker-79h68 0 (0%) 0 (0%) 0 (0%) 0 (0%) 16s
gpu-operator nvidia-container-toolkit-daemonset-q5p9k 0 (0%) 0 (0%) 0 (0%) 0 (0%) 12s
gpu-operator nvidia-dcgm-exporter-9g5kg 0 (0%) 0 (0%) 0 (0%) 0 (0%) 13s
gpu-operator nvidia-device-plugin-daemonset-5wpzk 0 (0%) 0 (0%) 0 (0%) 0 (0%) 13s
gpu-operator nvidia-driver-daemonset-kqkzb 0 (0%) 0 (0%) 0 (0%) 0 (0%) 13s
gpu-operator nvidia-operator-validator-lx77m 0 (0%) 0 (0%) 0 (0%) 0 (0%) 12s
kube-system azure-ip-masq-agent-7rd2x 100m (0%) 500m (2%) 50Mi (0%) 250Mi (0%) 66s
kube-system cloud-node-manager-dc756 50m (0%) 0 (0%) 50Mi (0%) 512Mi (0%) 66s
kube-system csi-azuredisk-node-5b4nk 30m (0%) 0 (0%) 60Mi (0%) 400Mi (0%) 66s
kube-system csi-azurefile-node-vlwhv 30m (0%) 0 (0%) 60Mi (0%) 600Mi (0%) 66s
kube-system kube-proxy-4fkxh 100m (0%) 0 (0%) 0 (0%) 0 (0%) 66s
After the GPU Operator configuration is completed, mig-manager is deployed. The MIG configuration is applied and the node is then set in a rebooting state:
kubectl describe nodes aks-nc24a100v4-42670331-vmss00000a
nvidia.com/mig.config=all-1g.10gb
nvidia.com/mig.strategy=single
nvidia.com/mig.config.state=rebooting
…
Taints: kubernetes.azure.com/scalesetpriority=spot:NoSchedule
mig=notReady:NoSchedule
sku=gpu:NoSchedule
…
Non-terminated Pods: (14 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
——— —- ———— ———- ————— ————- —
gpu-operator aks-mig-monitor-64zpl 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4m6s
gpu-operator gpu-feature-discovery-6btwx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3m33s
gpu-operator node-feature-discovery-worker-79h68 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4m6s
gpu-operator nvidia-container-toolkit-daemonset-wplkb 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3m33s
gpu-operator nvidia-dcgm-exporter-vnscq 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3m33s
gpu-operator nvidia-device-plugin-daemonset-d86dn 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3m33s
gpu-operator nvidia-driver-daemonset-kqkzb 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4m3s
gpu-operator nvidia-mig-manager-t4bw9 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2s
gpu-operator nvidia-operator-validator-jrfkn 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3m33s
kube-system azure-ip-masq-agent-7rd2x 100m (0%) 500m (2%) 50Mi (0%) 250Mi (0%) 4m56s
kube-system cloud-node-manager-dc756 50m (0%) 0 (0%) 50Mi (0%) 512Mi (0%) 4m56s
kube-system csi-azuredisk-node-5b4nk 30m (0%) 0 (0%) 60Mi (0%) 400Mi (0%) 4m56s
kube-system csi-azurefile-node-vlwhv 30m (0%) 0 (0%) 60Mi (0%) 600Mi (0%) 4m56s
kube-system kube-proxy-4fkxh 100m (0%) 0 (0%) 0 (0%) 0 (0%) 4m56s
After the reboot, the MIG configuration will switch to the "success" state and the mig taint will be removed. Scheduling of the 7 Pods of our job will then start:
kubectl describe nodes aks-nc24a100v4-42670331-vmss00000a
…
nvidia.com/mig.capable=true
nvidia.com/mig.config=all-1g.10gb
nvidia.com/mig.config.state=success
nvidia.com/mig.strategy=single
…
Taints: kubernetes.azure.com/scalesetpriority=spot:NoSchedule
sku=gpu:NoSchedule
…
Allocatable:
cpu: 23660m
ephemeral-storage: 119703055367
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 214295444Ki
nvidia.com/gpu: 7
pods: 110
…
Non-terminated Pods: (21 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
——— —- ———— ———- ————— ————- —
default samples-tf-mnist-demo-ts-0-5bs64 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11m
default samples-tf-mnist-demo-ts-1-2msdh 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11m
default samples-tf-mnist-demo-ts-2-ck8c8 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11m
default samples-tf-mnist-demo-ts-3-dlkfn 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11m
default samples-tf-mnist-demo-ts-4-899fr 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11m
default samples-tf-mnist-demo-ts-5-dmgpn 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11m
default samples-tf-mnist-demo-ts-6-pvzm4 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11m
gpu-operator aks-mig-monitor-64zpl 0 (0%) 0 (0%) 0 (0%) 0 (0%) 9m9s
gpu-operator gpu-feature-discovery-5t9gn 0 (0%) 0 (0%) 0 (0%) 0 (0%) 41s
gpu-operator node-feature-discovery-worker-79h68 0 (0%) 0 (0%) 0 (0%) 0 (0%) 9m9s
gpu-operator nvidia-container-toolkit-daemonset-82dgg 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2m22s
gpu-operator nvidia-dcgm-exporter-xbxqf 0 (0%) 0 (0%) 0 (0%) 0 (0%) 41s
gpu-operator nvidia-device-plugin-daemonset-8gkzd 0 (0%) 0 (0%) 0 (0%) 0 (0%) 41s
gpu-operator nvidia-driver-daemonset-kqkzb 0 (0%) 0 (0%) 0 (0%) 0 (0%) 9m6s
gpu-operator nvidia-mig-manager-jbqls 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2m22s
gpu-operator nvidia-operator-validator-5rdbh 0 (0%) 0 (0%) 0 (0%) 0 (0%) 41s
kube-system azure-ip-masq-agent-7rd2x 100m (0%) 500m (2%) 50Mi (0%) 250Mi (0%) 9m59s
kube-system cloud-node-manager-dc756 50m (0%) 0 (0%) 50Mi (0%) 512Mi (0%) 9m59s
kube-system csi-azuredisk-node-5b4nk 30m (0%) 0 (0%) 60Mi (0%) 400Mi (0%) 9m59s
kube-system csi-azurefile-node-vlwhv 30m (0%) 0 (0%) 60Mi (0%) 600Mi (0%) 9m59s
kube-system kube-proxy-4fkxh 100m (0%) 0 (0%) 0 (0%) 0 (0%) 9m59s
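With the `single` MIG strategy, each 1g.10gb slice is exposed as a plain `nvidia.com/gpu` resource, so each of the 7 job pods above simply requests one GPU. A minimal sketch of such a pod spec (the container image is the one used in the AKS TensorFlow MNIST sample and is an assumption here):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: samples-tf-mnist-demo-ts-0   # hypothetical name matching the job above
spec:
  tolerations:
    - key: "sku"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"
  containers:
    - name: mnist
      image: mcr.microsoft.com/azuredocs/samples-tf-mnist-demo:gpu  # assumed sample image
      resources:
        limits:
          nvidia.com/gpu: 1   # one MIG 1g.10gb slice, not a full A100
```

With 7 slices allocatable on the node, seven such pods can run on a single physical GPU.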
Checking the MIG status on the node with nvidia-smi shows the 7 GPU partitions:
nvidia-smi
+—————————————————————————————+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|—————————————–+———————-+———————-+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100 80GB PCIe On | 00000001:00:00.0 Off | On |
| N/A 27C P0 71W / 300W | 726MiB / 81920MiB | N/A Default |
| | | Enabled |
+—————————————–+———————-+———————-+
+—————————————————————————————+
| MIG devices: |
+——————+——————————–+———–+———————–+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+================================+===========+=======================|
| 0 7 0 0 | 102MiB / 9728MiB | 14 0 | 1 0 0 0 0 |
| | 2MiB / 16383MiB | | |
+——————+——————————–+———–+———————–+
| 0 8 0 1 | 104MiB / 9728MiB | 14 0 | 1 0 0 0 0 |
| | 2MiB / 16383MiB | | |
+——————+——————————–+———–+———————–+
| 0 9 0 2 | 104MiB / 9728MiB | 14 0 | 1 0 0 0 0 |
| | 2MiB / 16383MiB | | |
+——————+——————————–+———–+———————–+
| 0 10 0 3 | 104MiB / 9728MiB | 14 0 | 1 0 0 0 0 |
| | 2MiB / 16383MiB | | |
+——————+——————————–+———–+———————–+
| 0 11 0 4 | 104MiB / 9728MiB | 14 0 | 1 0 0 0 0 |
| | 2MiB / 16383MiB | | |
+——————+——————————–+———–+———————–+
| 0 12 0 5 | 104MiB / 9728MiB | 14 0 | 1 0 0 0 0 |
| | 2MiB / 16383MiB | | |
+——————+——————————–+———–+———————–+
| 0 13 0 6 | 104MiB / 9728MiB | 14 0 | 1 0 0 0 0 |
| | 2MiB / 16383MiB | | |
+——————+——————————–+———–+———————–+
+—————————————————————————————+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 7 0 28988 C python 82MiB |
| 0 8 0 29140 C python 84MiB |
| 0 9 0 29335 C python 84MiB |
| 0 10 0 29090 C python 84MiB |
| 0 11 0 29031 C python 84MiB |
| 0 12 0 29190 C python 84MiB |
| 0 13 0 29255 C python 84MiB |
+—————————————————————————————+
A few remarks about MIG to take into account:
MIG provides physical GPU partitioning, so the GPU partition assigned to one Pod is fully reserved for that Pod
CPU and RAM resources must still be considered in the equation: they are not partitioned by MIG and follow the standard AKS requests/limits assignment
In AKS it is extremely important to note that when the MIG configuration of a node pool is changed while that pool has no nodes allocated, the change is not immediately evident in the next autoscaler operation. This means that asking for 7 GPUs on a node pool scaled down to 0, after the first activation of MIG in the terms above, may bring up 7 nodes
The DaemonSets described above only prevent scheduling during the boot-up phase of a node provisioned by the autoscaler. If the MIG profile is changed afterwards by changing the MIG label on the node, the node should be cordoned first. Label changes must be made at the AKS node pool level (using az aks nodepool update) if the label was set through the Azure CLI, or at the single node level (using kubectl patch nodes) if it was set with kubectl
For example, in the case above, to move to another profile it is important to first cordon the node:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-nc24a100v4-42670331-vmss00000c Ready agent 11m v1.27.7
aks-nodepool1-25743550-vmss000000 Ready agent 6d16h v1.27.7
aks-nodepool1-25743550-vmss000001 Ready agent 6d16h v1.27.7
kubectl cordon aks-nc24a100v4-42670331-vmss00000c
node/aks-nc24a100v4-42670331-vmss00000c cordoned
Be aware that cordoning the node will not stop running Pods. Verify that no GPU-accelerated workload is running before submitting the label change.
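A quick way to perform that check is to list every pod scheduled on the node, a sketch using the node name from the example above:

```shell
# List all pods currently scheduled on the cordoned node,
# to confirm no GPU-accelerated workload is still running
kubectl get pods --all-namespaces \
  --field-selector spec.nodeName=aks-nc24a100v4-42670331-vmss00000c
```

Only the gpu-operator and kube-system DaemonSet pods should remain before proceeding.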
Since in our case we applied the label at the AKS node pool level, we need to change it from the Azure CLI:
az aks nodepool update --cluster-name $AKS_CLUSTER_NAME --resource-group $RESOURCE_GROUP_NAME --nodepool-name nc24a100v4 --labels "nvidia.com/mig.config"="all-1g.20gb"
This triggers a reconfiguration of MIG with the new profile applied:
+—————————————————————————————+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|—————————————–+———————-+———————-+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100 80GB PCIe On | 00000001:00:00.0 Off | On |
| N/A 42C P0 77W / 300W | 50MiB / 81920MiB | N/A Default |
| | | Enabled |
+—————————————–+———————-+———————-+
+—————————————————————————————+
| MIG devices: |
+——————+——————————–+———–+———————–+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+================================+===========+=======================|
| 0 3 0 0 | 12MiB / 19968MiB | 14 0 | 1 0 1 0 0 |
| | 0MiB / 32767MiB | | |
+——————+——————————–+———–+———————–+
| 0 4 0 1 | 12MiB / 19968MiB | 14 0 | 1 0 1 0 0 |
| | 0MiB / 32767MiB | | |
+——————+——————————–+———–+———————–+
| 0 5 0 2 | 12MiB / 19968MiB | 14 0 | 1 0 1 0 0 |
| | 0MiB / 32767MiB | | |
+——————+——————————–+———–+———————–+
| 0 6 0 3 | 12MiB / 19968MiB | 14 0 | 1 0 1 0 0 |
| | 0MiB / 32767MiB | | |
+——————+——————————–+———–+———————–+
+—————————————————————————————+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+—————————————————————————————+
We can then uncordon the node/nodes:
kubectl uncordon aks-nc24a100v4-42670331-vmss00000c
node/aks-nc24a100v4-42670331-vmss00000c uncordoned
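If the MIG label had instead been set per node with kubectl (rather than via the AKS node pool as above), the profile change would be applied at the node level with kubectl patch, for example (node name from the example above, a sketch):

```shell
# Only if the MIG label was originally set with kubectl, not via az aks nodepool
kubectl patch node aks-nc24a100v4-42670331-vmss00000c \
  -p '{"metadata":{"labels":{"nvidia.com/mig.config":"all-1g.20gb"}}}'
```

The same cordon/uncordon precautions apply in this case as well.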
Using NVIDIA GPU Driver CRD (preview)
The NVIDIA GPU Driver CRD allows you to define, in a granular way, the driver version and driver image for each of the node pools in use in an AKS cluster. This feature is in preview, as documented in the NVIDIA GPU Operator documentation, and is not recommended by NVIDIA for production systems.
To enable the NVIDIA GPU Driver CRD, run the following command (if you have already installed the NVIDIA GPU Operator you will first need to perform helm uninstall, of course taking care of running workloads):
helm install --wait --generate-name -n gpu-operator nvidia/gpu-operator --set-json daemonsets.tolerations='[{"effect": "NoSchedule", "key": "sku", "operator": "Equal", "value": "gpu"}, {"effect": "NoSchedule", "key": "kubernetes.azure.com/scalesetpriority", "value": "spot", "operator": "Equal"}]' --set nfd.enabled=false --set driver.nvidiaDriverCRD.deployDefaultCR=false --set driver.nvidiaDriverCRD.enabled=true
After this step, it is important to create node pools with a proper label that will be used to select the driver version for their nodes (in this case "driver.config"):
az aks nodepool add \
    --resource-group $RESOURCE_GROUP_NAME \
    --cluster-name $AKS_CLUSTER_NAME \
    --name nc4latest \
    --node-taints sku=gpu:NoSchedule \
    --node-vm-size Standard_NC4as_T4_v3 \
    --enable-cluster-autoscaler \
    --labels "driver.config"="latest" \
    --min-count 0 --max-count 1 --node-count 0 --tags SkipGPUDriverInstall=True
az aks nodepool add \
    --resource-group $RESOURCE_GROUP_NAME \
    --cluster-name $AKS_CLUSTER_NAME \
    --name nc4stable \
    --node-taints sku=gpu:NoSchedule \
    --node-vm-size Standard_NC4as_T4_v3 \
    --enable-cluster-autoscaler \
    --labels "driver.config"="stable" \
    --min-count 0 --max-count 1 --node-count 0 --tags SkipGPUDriverInstall=True
After this step, the driver configuration (the NVIDIADriver objects) should be created. This can be done with a file called driver-config.yaml with the following content:
apiVersion: nvidia.com/v1alpha1
kind: NVIDIADriver
metadata:
  name: nc4-latest
spec:
  driverType: gpu
  env: []
  image: driver
  imagePullPolicy: IfNotPresent
  imagePullSecrets: []
  manager: {}
  tolerations:
    - key: "sku"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"
  nodeSelector:
    driver.config: "latest"
  repository: nvcr.io/nvidia
  version: "535.129.03"
---
apiVersion: nvidia.com/v1alpha1
kind: NVIDIADriver
metadata:
  name: nc4-stable
spec:
  driverType: gpu
  env: []
  image: driver
  imagePullPolicy: IfNotPresent
  imagePullSecrets: []
  manager: {}
  tolerations:
    - key: "sku"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"
  nodeSelector:
    driver.config: "stable"
  repository: nvcr.io/nvidia
  version: "535.104.12"
This can then be applied with kubectl:
kubectl apply -f driver-config.yaml -n gpu-operator
Now, scaling up nodes (e.g. by submitting a GPU workload that targets exactly the desired driver.config label), we can verify that the driver version on each node is the one requested. Running nvidia-smi by attaching a shell to the driver DaemonSet container on each of the two nodes:
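A workload can be pinned to a driver version by selecting the corresponding label, for example via a nodeSelector (pod name and container image are hypothetical; a sketch):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-on-latest   # hypothetical name
spec:
  nodeSelector:
    driver.config: "latest"   # lands on the nc4latest node pool
  tolerations:
    - key: "sku"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.2.2-base-ubuntu22.04  # assumed image tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```

Submitting an analogous pod with `driver.config: "stable"` exercises the other node pool and driver version.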
### On latest
+—————————————————————————————+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|—————————————–+———————-+———————-+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 On | 00000001:00:00.0 Off | Off |
| N/A 30C P8 15W / 70W | 2MiB / 16384MiB | 0% Default |
| | | N/A |
+—————————————–+———————-+———————-+
### On stable
+—————————————————————————————+
| NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2 |
|—————————————–+———————-+———————-+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 On | 00000001:00:00.0 Off | Off |
| N/A 30C P8 14W / 70W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+—————————————–+———————-+———————-+
The NVIDIA GPU Driver CRD also allows you to specify a specific Docker image and Docker registry for the NVIDIA driver installation on each node pool.
This becomes particularly useful when we need to install the Azure-specific virtual GPU (vGPU) drivers on A10 GPUs.
On Azure, NVads_A10_v5 VMs are backed by NVIDIA vGPU technology, so they require vGPU drivers. On Azure, the vGPU drivers are included in the VM cost, so there is no need to obtain a vGPU license. The binaries available on the Azure driver download page can be used on the supported OSes (including Ubuntu 22.04), but only on Azure VMs.
In this case, it is possible to bundle an ad-hoc NVIDIA driver container image to be used on Azure, making it accessible through a dedicated container registry.
The procedure is as follows (assuming we have an ACR named <ACR_NAME> attached to AKS):
export ACR_NAME=<ACR_NAME>
az acr login -n $ACR_NAME
git clone https://gitlab.com/nvidia/container-images/driver
cd driver
cp -r ubuntu22.04 ubuntu22.04-aks
cd ubuntu22.04-aks
cd drivers
wget "https://download.microsoft.com/download/1/4/4/14450d0e-a3f2-4b0a-9bb4-a8e729e986c4/NVIDIA-Linux-x86_64-535.154.05-grid-azure.run"
mv NVIDIA-Linux-x86_64-535.154.05-grid-azure.run NVIDIA-Linux-x86_64-535.154.05.run
chmod +x NVIDIA-Linux-x86_64-535.154.05.run
cd ..
sed -i 's%/tmp/install.sh download_installer%echo "Skipping Driver Download"%g' Dockerfile
sed -i 's%sh NVIDIA-Linux-$DRIVER_ARCH-$DRIVER_VERSION.run -x%sh NVIDIA-Linux-$DRIVER_ARCH-$DRIVER_VERSION.run -x \&\& mv NVIDIA-Linux-$DRIVER_ARCH-$DRIVER_VERSION-grid-azure NVIDIA-Linux-$DRIVER_ARCH-$DRIVER_VERSION%g' nvidia-driver
docker build --build-arg DRIVER_VERSION=535.154.05 --build-arg DRIVER_BRANCH=535 --build-arg CUDA_VERSION=12.3.1 --build-arg TARGETARCH=amd64 . -t $ACR_NAME/driver:535.154.05-ubuntu22.04
docker push $ACR_NAME/driver:535.154.05-ubuntu22.04
After this, let’s create a specific NVIDIADriver object for Azure VGPU with a file named azure-vgpu.yaml and the following content (replace <ACR_NAME> with your ACR name):
apiVersion: nvidia.com/v1alpha1
kind: NVIDIADriver
metadata:
  name: azure-vgpu
spec:
  driverType: gpu
  env: []
  image: driver
  imagePullPolicy: IfNotPresent
  imagePullSecrets: []
  manager: {}
  tolerations:
    - key: "sku"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"
  nodeSelector:
    driver.config: "azurevgpu"
  repository: <ACR_NAME>
  version: "535.154.05"
Let’s apply it with kubectl:
kubectl apply -f azure-vgpu.yaml -n gpu-operator
Now, let’s create an A10 nodepool with Azure CLI:
az aks nodepool add \
    --resource-group $RESOURCE_GROUP_NAME \
    --cluster-name $AKS_CLUSTER_NAME \
    --name nv36a10v5 \
    --node-taints sku=gpu:NoSchedule \
    --node-vm-size Standard_NV36ads_A10_v5 \
    --enable-cluster-autoscaler \
    --labels "driver.config"="azurevgpu" \
    --min-count 0 --max-count 1 --node-count 0 --tags SkipGPUDriverInstall=True
Scaling up a node with a specific workload and waiting for the driver installation to finalize, we can see that the NVIDIA driver image has been pulled from our registry:
root@aks-gpu-playground-rg-jumpbox:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-nodepool1-25743550-vmss000000 Ready agent 6d23h v1.27.7
aks-nodepool1-25743550-vmss000001 Ready agent 6d23h v1.27.7
aks-nv36a10v5-10653906-vmss000000 Ready agent 9m24s v1.27.7
root@aks-gpu-playground-rg-jumpbox:~# kubectl describe node aks-nv36a10v5-10653906-vmss000000| grep gpu-driver
nvidia.com/gpu-driver-upgrade-state=upgrade-done
nvidia.com/gpu-driver-upgrade-enabled: true
gpu-operator nvidia-gpu-driver-ubuntu22.04-56df89b87c-6w8tj 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4m29s
root@aks-gpu-playground-rg-jumpbox:~# kubectl describe pods -n gpu-operator nvidia-gpu-driver-ubuntu22.04-56df89b87c-6w8tj | grep -i Image
Image: nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.6.2
Image ID: nvcr.io/nvidia/cloud-native/k8s-driver-manager@sha256:bb845160b32fd12eb3fae3e830d2e6a7780bc7405e0d8c5b816242d48be9daa8
Image: aksgpuplayground.azurecr.io/driver:535.154.05-ubuntu22.04
Image ID: aksgpuplayground.azurecr.io/driver@sha256:deb6e6311a174ca6a989f8338940bf3b1e6ae115ebf738042063f4c3c95c770f
Normal Pulled 4m26s kubelet Container image "nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.6.2" already present on machine
Normal Pulling 4m23s kubelet Pulling image "aksgpuplayground.azurecr.io/driver:535.154.05-ubuntu22.04"
Normal Pulled 4m16s kubelet Successfully pulled image "aksgpuplayground.azurecr.io/driver:535.154.05-ubuntu22.04" in 6.871887325s (6.871898205s including waiting)
Also, we can see that the A10 vGPU profile is recognized successfully by attaching to the Pod of the device-plugin DaemonSet:
+—————————————————————————————+
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
|—————————————–+———————-+———————-+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A10-24Q On | 00000002:00:00.0 Off | 0 |
| N/A N/A P8 N/A / N/A | 0MiB / 24512MiB | 0% Default |
| | | Disabled |
+—————————————–+———————-+———————-+
Thank you
Thank you for reading our blog posts. Feel free to leave any comments or feedback, ask for clarifications, or report any issues.
What’s New in Microsoft Intune February 2024
We often hear feedback about the balance between optimizing for productivity and security. The choice to prioritize experience or protection shows up in device provisioning, support processes, and day-to-day administration. In this spirit, I’m happy to share a few updates to Intune that will help IT admins balance security and productivity for their end users. For a comprehensive view of updates, visit the documentation.
More unified cross-platform endpoint management
We use the metaphor “single pane of glass” to describe the ideal management environment: one that enables visibility into all your devices and platforms and reduces the need for switching tools (and its associated costs in time and attention). Last year, we declared that macOS device management with Microsoft Intune was entering a new era of capability, and with this month’s additions, the view is getting clearer and wider. I’m pleased to share the general availability of “await final configuration,” a feature of the automated device enrollment process that prepares the device for users before they reach the desktop.
The new “await final configuration” for macOS Automated Device Enrollment (ADE) provides the Setup Assistant experience for end users while company device configuration policies are downloaded and applied. The intent is for the device to be set up with the correct policies such as VPN and WiFi profiles, before end users land on the Home Screen, so there is no confusion or gaps to get productive and be secure. This capability is covered in detail in the new guide to macOS device enrollment.
Autopilot enhancements
Along the same lines of delightful end-user experiences, we're adding a new setting to Autopilot deployments that gives admins the flexibility to install critical applications and get their users productive as soon as possible.
Previously, required applications could be installed under one of two conditions: block for all apps, where any application install failures during the technician phase would cause the entire deployment to fail, or block for some apps, which would only install specified apps during the technician phase and leave the rest for the user phase.
The new setting allows administrators to block only for selected apps and continue if other applications fail to install during the technician phase. For those non-blocking applications, the installation will be tried again when the user signs in for the first time. This new option is based on our customer feedback and will lead to better and more efficient provisioning experiences for end users and administrators.
More efficient updating
We saw a tremendous response from organizations when we introduced driver and firmware updating capabilities to Intune last June. We're excited to announce a new capability to approve driver updates in bulk. This is especially helpful for those who want to retain manual approval over driver deployment but have a diverse set of devices to manage. We hear from organizations that need to edit 50 or even 100 drivers at a time, so we know this will greatly increase their productivity. For those who use automatic approval, this bulk editing capability will help you with drivers that aren't included in automatic approvals. This includes most firmware updates, saving even more effort. Those who previewed the functionality found it especially helpful to be able to schedule driver and firmware updates at the same time as quality updates, which reduces the number of reboots end users may need. For more details, look for updated documentation on Windows driver update management in Microsoft Intune.
Hopefully we’ve given you reasons to be excited and keep your focus. How do you anticipate using these new Intune features? Let me know by reaching out to me on LinkedIn or in the comments below.
Stay up to date! Bookmark the Microsoft Intune Blog and follow us on LinkedIn or @MSIntune on X to continue the conversation.
Announcing the 2024 Imagine Cup Semifinalists!
We’re thrilled to announce the next chapter of the 2024 Imagine Cup, a global technology competition that celebrates the perseverance, grit and brilliance of students who are building startups with AI at their core. The spotlight now shines on the Semifinals. It’s a phase that’s sure to push these founders toward accelerated growth and unlock new possibilities for them in the competition and beyond.
Today we proudly unveil the teams that have earned their place in the Semifinals. These teams are pioneers. They represent the first generation of technologists who are using AI, not just as a tool, but as a foundational element in innovative startups that have the potential to transform industries, uplift communities, and even save lives.
The 2024 Imagine Cup World Champion will be crowned in May during Microsoft Build. The winner will take home the grand prize of USD100,000 and a mentoring session with Microsoft Chairman and CEO, Satya Nadella. The two runners-up will each earn USD50,000.
Curious about the journey ahead? Learn more about these semifinalists’ journey. Explore each team’s startup and discover how these entrepreneurs are using AI to create a tangible difference.
What’s next for the semifinalists?
In upcoming weeks, the semifinalists will refine their solutions, turning every line of code and design choice into a robust, market-ready startup. The journey so far has been nothing short of remarkable.
Here’s a sneak peek into what awaits the teams:
Harnessing the power of AI acceleration: The semifinalists will work with mentors and experts to explore how AI can help propel their startups forward – from intelligent automation to creativity to data-driven insights – this will be the time for founders to embed AI seamlessly into their solutions.
Access to Microsoft for Startups Founders Hub: This will allow the semifinalists to harness additional resources, delve deeper into Azure, and unlock tools poised to refine the trajectory for their startups, such as:
Up to USD150,000 of Azure credits. Plus, offers for 30+ tools and services from Microsoft and our partners, with more credits and benefits as they grow.
Access to Azure AI Studio, the most comprehensive set of generative AI models, including OpenAI GPT-3.5 Turbo, GPT-4, and Llama 2 by Meta.
USD2,500 in OpenAI credits to experiment with LLMs.
1:1 expert advice from AI experts and entrepreneurial mentors: The journey through the semifinals is not a one-size-fits-all experience. Participants will receive guidance tailored to their specific needs, challenges and opportunities as a founder.
2024 Imagine Cup Semifinalists (in alphabetical order)
Discover how these student innovators are using AI to make a tangible difference. Get acquainted with a brief overview of each team:
Team
About (as described in their own words)
Adalat AI
“Leverages AI to revolutionize India’s judicial system, tackling extensive case backlogs and delays. Our technology, including AI-driven transcription tools, expedites court processes, enhancing efficiency and accuracy.”
Aesop AI
“An interactive, educational storybook platform that transforms how stories are told.”
Agricode
“A multiplatform app to help farmers in every stage of their farming activities.”
Astra Wellbeing
“An SMS-based digital Wellness Platform designed to improve the wellbeing of frontline healthcare employees through AI-tailored messages of positive reinforcement and on-demand wellness resources.”
Boats Against the Current
“A multifunctional inspection robot for landscape water, which can realize the automation and intelligence of water inspection and ecological protection…”
BunnyBot
“Our team aims to create AI-powered companion robots to combat loneliness and support the elderly for Alzheimer’s and dementia.”
DevRelax
“A desktop application designed specifically for IT professionals: a comprehensive stress reduction solution tailored to the unique demands of the industry.”
EDARMA
“An augmented reality based educational platform which uses real time visuals to enhance the overall learning experience of students.”
FROM YOUR EYES
“An AI technology company that encapsulates the most technological and customizable form of the visual experience, spanning from humans to machines.”
Galen Health
“Our flagship product, OncoSight, is an AI platform that analyzes patterns in routinely available data from the electronic health record to detect early warning signs of pancreatic cancer.”
HearMe
“An innovative learning tool designed to accelerate vocabulary and language skills in hearing-impaired children.”
JRE
“We develop Al powered products to control manufacturing in heavy industries.”
ObviousAI
“An aggregator website for fashion products that support users to search with their natural language or their own photos and screenshots.”
ParkinSync
“A software platform which aims to facilitate the diagnosis of Parkinson’s Disease. It integrates with wearable sensors and displays data in a customizable UI. It can also be combined with a tremor-suppressing exoskeleton.”
PlanRoadmap
“An AI-powered productivity coach to help people with ADHD who are struggling with task paralysis get their tasks done. Our coach asks questions to identify the user’s obstacles, suggests strategies, and teaches the user about their work style.”
Sign Saathi
“A transformative solution designed to empower the deaf community and transcend the limitations they face by providing instant sign language generation, interpretation, and ultimately, education.”
UpEase
“A copilot for Higher Education! UpEase incorporates user and developer friendly interfaces, seamless integration with Microsoft 365, robust AI technology and a community network, to segregate itself in the education management space.”
Weeg
“A groundbreaking initiative that bridges the digital divide in remote areas through technology. Utilizing Azure’s cloud services, it offers a two-part solution: The Mesh Network and The Hive educational platform.”
WorldDex
“A real-life Pokédex: on our mobile app, you can scan and collect any object, talk to your collection, and share your experiences! WorldDex uses computer vision, LLMs, and other cutting-edge tech for a magical experience.”
Follow Along & Stay Tuned
Don’t miss this chance to be part of a transformative experience. The Imagine Cup community is buzzing with excitement. Tune in, cheer for your favorites, follow along, and get inspired by the ingenuity of these student founders!
Streamlining the Process: Building and Publishing Apps Across the Microsoft Cloud
Publishing apps across the cloud can be a complex endeavor. In our recent webinar, “Building and Publishing Apps Across the Microsoft Cloud,” attendees had the opportunity to gain insights from guest speaker James Anderson, CEO of Akouo. He shared his company’s firsthand experience in developing multi-faceted solutions for Microsoft Teams across various Microsoft Cloud products. Akouo has developed two products: one integrates simultaneous interpretation for meetings and webinars on Microsoft Teams, while the second is a 100% Microsoft-powered multilingual caption generator for Microsoft Teams meetings and webinars. It enables bi-directional conversations with multilingual captions.
James Anderson shared some key points from Akouo’s journey in building and publishing apps across the Microsoft Cloud:
James shared Akouo’s experience building solutions using various Microsoft components such as Teams, Power Platform, Power Automate, Power Pages, Dataverse, React UI libraries, and Graph APIs, and the complexity involved.
Experience in publishing to the Marketplace and the support received from Microsoft throughout the process.
Publishing both transactable (metered product) and non-transactable solutions.
Microsoft’s Anirudha Bakore, Principal PM Manager, and I, Sudi Naidoo, Principal Product Manager, discussed Microsoft’s approach to addressing challenges encountered by Akouo and other ISVs when building and publishing applications:
From an ISV Developer perspective, Microsoft is exploring a Cloud Native Application Bundle (CNAB) solution as a packaging mechanism, which would be cloud-agnostic. This approach would enable developers to create one package and directly publish it to the commercial marketplace.
From a customer viewpoint, Microsoft is exploring a way for customers to discover entire solutions spanning across Microsoft Cloud in one place and then deploy in one-click fashion across all components of the Microsoft Cloud such as Power Platform, Azure, Fabric etc.
Microsoft is also working on enhancing the Cloud Solution Center, the existing portal for deploying industry cloud solutions to customers’ environments. These enhancements aim to provide a more streamlined experience for selecting and deploying solutions that involve multiple cloud components.
Microsoft aims to deliver a seamless end-to-end experience to discover and deploy solutions from ISV developers as well as their customer’s point of view. By streamlining the process of building and publishing apps across the Microsoft Cloud, Microsoft and ISVs like Akouo are working towards making cloud-based solutions more accessible and efficient for both developers and customers alike.
Register to watch the recording! For more information and details on the above content, register to watch the recording.
Interested in continuing the conversation? Register HERE by answering a few questions about your Cross Cloud experience and our Microsoft team will reach out to you.
Streamline SharePoint Governance with Site Lifecycle Management
Site Lifecycle Management Policies in SharePoint Advanced Management
SharePoint is a powerful platform for collaboration and content management; it is not surprising that millions of people use SharePoint daily to perform their tasks. A good governance plan is necessary to manage this growth, and while most organizations have some basic policies and procedures for creating, maintaining, and decommissioning sites, it has always been a challenge to find inactive sites that can be identified for deletion. This is especially important as organizations start to deploy Microsoft 365 Copilot, when cleaning up outdated data is more important than ever.
Site lifecycle management in SharePoint Advanced Management can help organizations to easily track and inform site owners when their site is not active, and allow them to take the appropriate steps to manage their content.
To use site lifecycle management, you must first create an inactive site policy. This will allow you to create rules that define when a site is considered inactive, what sites are included in the policy, and manually exclude sites as needed.
The policy can be adjusted to target sites based on how they were created and the type of site, such as group-connected sites or communication sites. This enables administrators to apply different rules based on how the site is used. For example, you can have a stricter policy for classic sites by setting the policy to consider a site inactive sooner than others.
Administrators can also use these policies to target only sites created by users, and exclude anything created in the SharePoint admin center or via PowerShell.
Once the policy is configured, site owners will receive an email notification asking them to certify that the site is active, or providing them with information on how to delete the site.
Administrators can access a report of inactive sites in the portal, which is in CSV format. This format can help with further actions such as deleting the sites or sending additional messages to the site owners.
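Because the report is plain CSV, it is easy to post-process with a script. The sketch below filters the report for sites with no recent activity; the column names are hypothetical, so check the actual header row of the exported report before relying on them.

```python
import csv
import io

# A miniature stand-in for the exported report; the real file's column
# names may differ -- check the header row before relying on them.
report_csv = """Site name,URL,Last activity date
Marketing Hub,https://contoso.sharepoint.com/sites/marketing,2023-01-15
Old Project,https://contoso.sharepoint.com/sites/oldproject,2022-06-02
"""

# Collect sites whose last recorded activity is before a cutoff date,
# e.g. candidates for a follow-up message to the site owners.
cutoff = "2023-01-01"
stale = [
    row["Site name"]
    for row in csv.DictReader(io.StringIO(report_csv))
    if row["Last activity date"] < cutoff
]
print(stale)  # plain string comparison works because the dates are ISO-formatted
```

In practice you would open the downloaded report file instead of the inline string, and feed the resulting list into whatever follow-up action (deletion, owner notification) your governance plan calls for.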
SharePoint Advanced Management provides a powerful tool for managing site lifecycles through its site lifecycle policies. These policies allow administrators to easily track inactive sites, inform site owners, and take appropriate actions to manage their content. By utilizing these policies, organizations can ensure that their SharePoint environment remains clean and up-to-date, improving overall efficiency and productivity.
Review the full details of site lifecycle policies and everything that’s available with SharePoint Advanced Management in the links below:
Manage site lifecycle policies – SharePoint in Microsoft 365 | Microsoft Learn
Microsoft Tech Community – Latest Blogs –Read More
Armchair Architects: Large Language Models (LLMs) & Vector Databases
David Blank-Edelman and our armchair architects Uli Homann and Eric Charran will be focusing on large language models (LLMs) and vector databases and their role in fueling AI, ML, and LLMs.
What are vector databases?
Eric defines vector databases as a way to store meaningful information about multi-dimensional aspects of data as vectors: numerical representations, typically arrays of numbers, in a system that otherwise works very much like a traditional relational database.
What’s interesting about vector databases is that they help us solve different types of queries. One type of query is a “nearest neighbor” query. For example, if Spotify knows that Eric Charran loves Def Leppard and he loves this one song, what are some other songs that are very similar, based on a number of dimensions Spotify might have, so that it can recommend them? The way it works is that it uses the numerical distance between the vectors to answer that question.
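The nearest-neighbor idea can be sketched in a few lines. The vectors and song names below are made-up toy values, not real Spotify data; the point is only that "similar" reduces to "small numerical distance":

```python
import math

# Toy 3-dimensional "taste" vectors for songs (hypothetical values).
songs = {
    "Pour Some Sugar on Me": [0.9, 0.8, 0.1],
    "Photograph":            [0.85, 0.75, 0.15],
    "Clair de Lune":         [0.05, 0.2, 0.95],
}

def euclidean(a, b):
    # Straight-line distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = songs["Pour Some Sugar on Me"]

# Rank the other songs by distance: smaller distance = more similar.
neighbors = sorted(
    (name for name in songs if name != "Pour Some Sugar on Me"),
    key=lambda name: euclidean(query, songs[name]),
)
print(neighbors[0])  # the nearest neighbor
```

A real vector database does the same comparison at scale, using approximate-nearest-neighbor indexes rather than a brute-force sort.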
Uli added that vector databases in the context of AI effectively take text and convert it into these numerical representations. In the PostgreSQL community, for example, the PostgreSQL teams have already added a plug-in where you can take any text field, turn it into a vector, and then embed that vector into an LLM.
Vectors have been around for a very long time as part of the neural network model, which at the end of the day is also made of vectors. This is now data-specific because it’s not just databases: while databases will be prevalent, you will see search systems also expose their search index as vectors.
Azure Cognitive Search, for example, does that, and Uli theorizes that other search systems do as well. You can take that index and make it part of, for example, an OpenAI system, Bard, or whatever AI system you like.
Vector databases are one way of implementing an AI system; the other method is embedding.
Vector Databases and Natural Language Processing (NLP)
Let’s look at how vector databases are used in the real world and in NLP, where embedding is used: for example, taking word embeddings and sentence embeddings and making them specifically integer-based so LLMs can include them in the corpus of information used to train them. That is one vector database use case; another is the “nearest neighbor” example earlier.
If you recall, the nearest neighbor use case asks: if I have this particular item or object as input, what are the nearest things to it, or the farthest things from it? This can include image and video retrieval, taking unstructured data and vectorizing it so that you can find it, surface it, and do all of those comparisons that are important. It can also include anomaly detection, geospatial data, and machine learning.
What does embedding mean in the context of LLMs?
LLMs get trained primarily on the Internet, so if you’re looking at Bard or at OpenAI, they get a copy of the Internet as the corpus of knowledge, and conceptually, that corpus is vectorized and put into the LLM.
Now that’s a great set of knowledge, and if you use ChatGPT or Bing Chat or something similar, you will effectively access that Internet. This is great, but most of these LLMs are static, for example, the OpenAI models got compiled sometime in 2021. If you ask the model without any helper about an event in 2022, it won’t know because it got compiled, conceptually speaking, with knowledge that didn’t include 2022 events.
So now what happens is you bring in information from the Internet, for example, that effectively allows these LLMs to understand “oh, there is something beyond what I already know” and bring it in. This scenario would apply, for example, to an Internet search.
If you’re an enterprise, you care about the global knowledge, but you want your enterprise’s specific knowledge to also be part of this search, so that if somebody like Eric is looking for specific things in his new company, the company’s knowledge is available to him as well. That’s what’s called data grounding: you ground the model with the data that your enterprise has and expand the knowledge, and embedding is one technique for doing that.
Embedding simply says: take this vector of knowledge and fold it into your larger model so that every time you run a query, the embedding is part of what the system evaluates before it responds to you. The way Eric thinks about it is that the vector database stores the integer-based representation of a concept found within a corpus of information or a web page on the Internet, and what it allows you to do is link near concepts together.
That’s how an LLM, if it’s trained on these vectors and embeddings, really understands concepts. It’s the vectorization of semantic concepts; the distance equation between them allows the model to stitch these things together and respond accordingly.
Using One-shot or Multi-shot Training of an LLM
LLM teams now have the technology and tools to make it easy to bring vector stores and vector databases into your models through embeddings. A tip is to make sure you use the right tooling to help you; however, before you do that, you can use prompt engineering to feed example data of what you’re looking for into the model. Part of prompt engineering is what’s called one-shot or multi-shot training: as part of the prompt, you’re saying, “I’m expecting this kind of output.”
Then the system takes that into account, says “ah, that’s what you’re looking for,” and responds in kind. You can feed it quite a lot of sample data (“this is what I want you to look at”), and that’s obviously far cheaper than doing embeddings, because it’s part of the prompt; it should therefore always be considered first before you go into embeddings.
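A multi-shot prompt is just examples stitched into the prompt text. The sketch below builds one for a hypothetical sentiment task; the examples and labels are made up, and the string could be sent to any chat or completion API:

```python
# Sketch of a multi-shot (few-shot) prompt: a handful of input/output
# examples are placed directly in the prompt so the model infers the
# expected format -- no embeddings or fine-tuning required.
examples = [
    ("The shipment arrived two days late.", "negative"),
    ("Setup took five minutes and just worked.", "positive"),
]

def build_prompt(examples, new_input):
    # Render each example as a "Review / Sentiment" pair, then leave the
    # final "Sentiment:" open for the model to complete.
    shots = "\n".join(
        f"Review: {text}\nSentiment: {label}" for text, label in examples
    )
    return f"{shots}\nReview: {new_input}\nSentiment:"

prompt = build_prompt(examples, "The battery died after one hour.")
print(prompt)
```

Because the examples ride along in every request, this costs only prompt tokens, which is why the discussion above recommends trying it before reaching for embeddings or fine-tuning.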
Corporations will end up using embeddings, but you should start with one-shot and multi-shot training, because there is a lot of knowledge in those models that can be coaxed out if you give them the right prompt.
LLM Fine Tuning
Fine tuning is the way in which you take a model and develop a fit-for-purpose data set. Whether it’s a pre-compiled model that you’ve downloaded or one you’ve already trained, you engage in additional training loops so that it trains on that fit-for-purpose data set, and you can tune it to respond in the ways you intend.
The fine-tuning element adjusts the model’s parameters through iterative training loops over purpose-driven data sets. The key part is that you bring specific data sets and train a layer of the model. You are adding more constraints to the general-purpose model, and it uses your data to shape the training for the specific domain you are in, for example healthcare or industrial automation. Fine tuning also helps enormously with hallucinations, because you tell the system what it needs to pay attention to, and it will effectively adapt and be more precise.
Limitations of LLMs and One-shot or Multi-shot Training
There are two things that large language models are really bad at. One is math, so don’t ask one to do calculus for you; that’s not going to work. The second one, as of today, is that you cannot point an LLM at a structured database and have the system just automatically understand the schema and the data and produce good responses. There is still much more work involved.
The state of the art right now is that you effectively build shims or workarounds: you write code that you can then integrate into the prompt. For example, OpenAI has a way for you to call a function inside your prompt, and that function can, for example, bring back relational or structured data.
From the one-shot or multi-shot training, you can take the result set (which cannot be too large) and feed it into the prompt. There is also pipeline-based programming, which the following scenario explains.
You are an insurance company.
David is a customer of the insurance company and would like to understand his claim status.
David goes to the website and types into the chat. The first thing the system needs to know is: who is David? The CRM system knows who David is, what insurance he has, what claims are open, and so on. That is all structured information from the CRM system and perhaps a claims management system.
The first phase is to parse the language, the text that David entered using GPT for example, and pull out the information that’s relevant and then feed it through structured API calls.
You get the result sets back.
You then create the prompt that you really want to use for the response using prompt engineering with the one and multi-shot training.
Then the system generates the response that you’re looking for in terms of “Hi, David, great to see you again here’s the status of the claim.”
You have then used the OpenAI model twice in this case, not just once.
In summary, you use it first to understand the language and extract what you need to call the structured APIs and then you feed the response from the structured system back into the prompt so that it generates the response you’re looking for.
Increased Adoption of Vector Indexes
Eric brought up the architect’s question: how do I figure out whether I need a dedicated vector database, or whether I can use these things called vector search indexes? Another question is: how do I build a system that helps LLM developers be better, or more efficient, at their job?
Eric thinks we’re reaching a transition point. Vector databases used to be, in effect, a Relational Database Management System (RDBMS) for vectors, answering queries associated with vectors.
He is seeing a lot of the lake house platforms and traditional database management systems adopt vector indexes, so that developers don’t have to pick the data up, move it to another specific place, vectorize it, and store it. Now there are components of relational database management systems or lake houses that create vectors on top of where the data lives, for example on top of delta tables. That’s one architectural consideration, and it should make architects happy, because the heaviest thing in the world to move is data, and architects hate doing it.
Architectural Considerations around Vector Databases
The other architectural consideration is how you actually arrive at the vectorization. Is it an ETL scheme on write? Is there logic associated with the vectorization itself? All of that is important to consider when you’re trying to create a platform for your organization.
If you’re creating your own foundational models, vectorization, and the process by which data becomes vectorized or embedded, becomes very important.
You also have to consider whether you’re allowed to store specific information based on your industry; in financial services, life sciences, or healthcare, you may need to scan and tokenize your data before it goes into the vectorization process.
Another consideration for architects is that although vectorization is a key technique, we have now seen real data, in this case from Microsoft, showing that vectorization alone is not necessarily the answer. Microsoft has seen that a search index plus vectorization is actually faster and more reliable, from a response perspective, for an OpenAI system than just the vectors or just a query of the index.
When developing a solution, you should be much more flexible in this case where you say, “how am I going to go and get this data?” Sometimes it’s a combination of techniques, not just one technique that will work or be most efficient.
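As a toy illustration of that combination, the sketch below blends a keyword score with a vector-similarity score into one ranking. The documents, vectors, and 50/50 weighting are all made up; real hybrid retrieval (e.g. in Azure AI Search) uses proper full-text scoring and approximate-nearest-neighbor indexes:

```python
import math

# Minimal sketch of hybrid retrieval: combine a keyword score with a
# vector-similarity score. Documents, vectors, and weights are invented.
docs = {
    "doc1": {"text": "azure vector search overview", "vec": [1.0, 0.0]},
    "doc2": {"text": "cooking pasta at home",        "vec": [0.0, 1.0]},
}

def keyword_score(query, text):
    # Fraction of query terms that appear in the document text.
    terms = query.split()
    return sum(t in text for t in terms) / len(terms)

def cosine(a, b):
    # Cosine similarity between two 2-d vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def hybrid(query, qvec, weight=0.5):
    # Rank documents by a weighted blend of the two scores.
    return sorted(
        docs,
        key=lambda d: weight * keyword_score(query, docs[d]["text"])
                      + (1 - weight) * cosine(qvec, docs[d]["vec"]),
        reverse=True,
    )

print(hybrid("vector search", [0.9, 0.1])[0])
```

The design choice mirrors the point above: neither signal alone is the answer, and the blend weight itself becomes something the architect tunes per workload.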
For Uli, architecture is about understanding the tools you have and choosing among them, ideally without looking for black-or-white answers. The world is mostly gray, and combining the right tools makes the right answer, rather than any singular tool or technique.
Resources
Vector search in Azure AI Search
Geospatial data processing and analytics
Microsoft Azure AI Fundamentals: Natural Language Processing
Azure Database for PostgreSQL
Vector DB Lookup tool for flows in Azure AI Studio
Related episodes
Armchair Architects: LLMs & Vector Databases (Part 2)
Watch more episodes in the Armchair Architects Series
Watch more episodes in the Well-Architected Series
Recommended Next Steps
If you’d like to learn more about the general principles prescribed by Microsoft, we recommend Microsoft Cloud Adoption Framework for platform and environment-level guidance and Azure Well-Architected Framework. You can also register for an upcoming workshop led by Azure partners on cloud migration and adoption topics and incorporate click-through labs to ensure effective, pragmatic training.
You can view the whole video below and check out more videos from the Azure Enablement Show.
Microsoft Tech Community – Latest Blogs –Read More
Microsoft and SAP work together to transform identity for SAP customers
SAP recently announced its collaboration with Microsoft and advises its SAP Identity Management (IDM) customers to move their identity management scenarios to Microsoft Entra ID as IDM approaches the end of maintenance. This latest collaboration creates new possibilities for Microsoft Entra and SAP to offer enhanced integration that will support a comprehensive identity and access governance framework.
Microsoft and SAP will deepen our longstanding partnership to combine our unique areas of expertise. We are committed to delivering the best identity management solutions for our customers and users, and we’re honored to partner with SAP on delivering seamless and secure identity management experiences that will support SAP customers’ digital transformation and cloud adoption goals. Over the years we’ve worked together to integrate our products and services, such as Microsoft Azure, Microsoft 365, SAP Cloud Platform, SAP S/4HANA, and SAP SuccessFactors.
Our aim is to help SAP customers with their migration path so they can continue to connect enterprise software and collaboration tools to work and innovate effectively, quickly, and seamlessly.
To learn more about our latest collaboration, read the blog post here.
Irina Nechaeva, General Manager, Identity and Network Access
Learn more about Microsoft Entra:
Related Articles: SAP’s blog - Preparing for SAP Identity Management’s End-of-Maintenance in 2027.
See recent Microsoft Entra blogs
Dive into Microsoft Entra technical documentation
Learn more at Azure Active Directory (Azure AD) rename to Microsoft Entra ID
Join the conversation on the Microsoft Entra discussion space
Learn more about Microsoft Security
Microsoft Tech Community – Latest Blogs –Read More