MATLAB R2024b GPU validation fails for a Multi-Instance GPU (MIG) A100
We are currently installing MATLAB R2024b on our HPC cluster. The installation works beautifully across all of our GPUs except an A100 that uses NVIDIA’s Multi-Instance GPU (MIG). When I launch a CLI session using
matlab -nodesktop -nodisplay -nosoftwareopengl
and run "validateGPU", I receive the following error: "Encountered error when calling NVML. The NVML error was: Invalid Argument."
The same sequence does not produce an error when run on one of our other A100 GPUs with the same driver and CUDA version. With MATLAB R2023b we do not receive this error on the MIG GPU, and GPU code runs successfully.
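The same check can be repeated on each node, and under each MATLAB release, without an interactive session. The wrapper below is a minimal sketch built on MATLAB's -batch startup option. It assumes the MATLAB executable (matlab by default, or a path passed as the first argument) is reachable and supports -batch. The function name run_validate_gpu and the install path in the usage line are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run validateGPU non-interactively and capture its output.
# Assumes the MATLAB executable supports -batch (non-interactive, no desktop).
run_validate_gpu() {
  # Allow overriding the executable, e.g. a full path to a specific release.
  local matlab_bin="${1:-matlab}"
  # Merge stderr into stdout so the whole validation log can be saved or piped.
  "$matlab_bin" -batch "validateGPU" 2>&1
}

# Usage on a GPU node, once per release being compared (placeholder path):
#   run_validate_gpu /path/to/R2024b/bin/matlab > validategpu_R2024b.log
```

Saving one log per release makes it easy to diff R2023b against R2024b on the same MIG device.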
For completeness, here is the full output:
nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          On  |   00000000:21:00.0 Off |                   On |
| N/A   29C    P0             32W /  250W |      75MiB /  40960MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          On  |   00000000:81:00.0 Off |                   On |
| N/A   28C    P0             33W /  250W |      75MiB /  40960MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100-PCIE-40GB          On  |   00000000:E2:00.0 Off |                   On |
| N/A   28C    P0             34W /  250W |      75MiB /  40960MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                            |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  0    2   0   0  |              38MiB / 19968MiB    | 42      0 |  3   0    2    0    0 |
|                  |                 0MiB / 32767MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
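The table shows MIG mode enabled on all three A100s, with one MIG device (GI 2, CI 0) created on GPU 0. Running nvidia-smi -L lists each GPU and each MIG device together with its UUID. The sketch below extracts just the MIG UUIDs from that listing. It assumes MIG lines contain "(UUID: MIG-...)", which is how recent drivers print them, but this should be verified on the node. The function name list_mig_uuids is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvidia-smi -L` text on stdin, print one MIG UUID per line.
# Assumes MIG entries contain "(UUID: MIG-...)"; plain GPUs show "UUID: GPU-..." instead.
list_mig_uuids() {
  # Keep only the "UUID: MIG-..." fragment up to the closing parenthesis, then drop the label.
  grep -o 'UUID: MIG-[^)]*' | sed 's/^UUID: //'
}

# Usage on a GPU node:
#   nvidia-smi -L | list_mig_uuids
```

The resulting UUIDs can be compared with the CUDA_VISIBLE_DEVICES value that validateGPU reports further down.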
Launching a session and attempting to validate the GPU:
matlab -nodesktop -nodisplay -nosoftwareopengl
< M A T L A B (R) >
Copyright 1984-2024 The MathWorks, Inc.
R2024b Update 2 (24.2.0.2773142) 64-bit (glnxa64)
October 22, 2024
To get started, type doc.
For product information, visit www.mathworks.com.
Warning: OpenGL Startup options will be removed in a future release.
>> validateGPU
# Beginning GPU validation
# Performing system validation
#  CUDA-supported platform ....................................PASSED
#  CUDA-enabled graphics driver exists ........................PASSED
#    Version: 550.90.12
#  CUDA-enabled graphics driver load ..........................PASSED
#  CUDA environment variables .................................PASSED
#    CUDA_VISIBLE_DEVICES: "0"
#  CUDA device count ..........................................PASSED
#    Found 1 devices.
#  GPU libraries load .........................................PASSED
#
# Performing device validation for device index 1
#  Device exists ..............................................FAILED
#    Encountered error when calling NVML. The NVML error was:
#    Invalid Argument.
#
#  Device supported ...........................................SKIPPED
#  Device available ...........................................SKIPPED
#  Device selectable ..........................................SKIPPED
#  Device memory allocation ...................................SKIPPED
#  Device kernel launch .......................................SKIPPED
# Finished GPU validation with 1 failures.
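validateGPU prints one line per check, each ending in PASSED, FAILED, or SKIPPED. When comparing several nodes or releases, it can help to pull just the failing checks out of a saved log. This is a minimal sketch that assumes the dot-leader line format shown above. The function name failed_checks is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read validateGPU output on stdin and print each FAILED check name.
# Assumes check lines look like "#  <check name> ........PASSED|FAILED|SKIPPED".
failed_checks() {
  # Keep lines whose status is FAILED, then strip the "#" prefix and the dot leader.
  grep 'FAILED$' | sed -e 's/^#[[:space:]]*//' -e 's/[[:space:].]*FAILED$//'
}

# Usage on a GPU node:
#   matlab -batch "validateGPU" 2>&1 | tee validategpu.log | failed_checks
```

For the log above, this would print only "Device exists". The SKIPPED checks follow from that failure rather than failing independently.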
Output using "coder.checkGpuInstall":
>> gpuEnvObj = coder.gpuEnvConfig;
>> gpuEnvObj.GpuId = 0;
>> gpuEnvObj.BasicCodegen = 1;
>> gpuEnvObj.BasicCodeexec = 1;
>> results = coder.checkGpuInstall(gpuEnvObj)
Compatible GPU : FAILED (There is a problem with the graphics driver or with this GPU device. Code execution will not be available. Check that you have a supported GPU and the latest graphics driver.)
CUDA Environment : FAILED (Unable to execute the nvcc command. Check your CUDA Toolkit installation.)
Runtime : PASSED
cuFFT : PASSED
cuSOLVER : PASSED
cuBLAS : PASSED
Host Compiler : PASSED
results =
struct with fields:
gpu: 0
cuda: 0
cudnn: 0
tensorrt: 0
hostcompiler: 1
basiccodegen: 0
basiccodeexec: 0
deepcodegen: 0
tensorrtdatatype: 0
deepcodeexec: 0
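The "CUDA Environment" failure from coder.checkGpuInstall says the nvcc command could not be executed. That concerns the CUDA Toolkit compiler rather than the driver or NVML, so it is plausibly a separate issue from the validateGPU failure. The sketch below checks whether nvcc is reachable from the shell that launches MATLAB. The function name check_nvcc is hypothetical. On many clusters the toolkit comes from an environment module whose name is site-specific.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether nvcc is on the PATH that MATLAB would inherit.
check_nvcc() {
  local nvcc_path
  # `command -v` prints the resolved path and fails if nvcc cannot be found.
  if nvcc_path="$(command -v nvcc)"; then
    echo "nvcc found: $nvcc_path"
    # Show the "Cuda compilation tools, release X.Y" line from the version banner.
    nvcc --version | grep 'release'
  else
    echo "nvcc not found on PATH"
    return 1
  fi
}

# Usage from the same shell (and loaded modules) used to start MATLAB:
#   check_nvcc && matlab -nodesktop -nodisplay
```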