Fail to start parpool on cluster
Hello, I’m submitting a batch job using SLURM:
sbatch shllscrpt96_16jul2024.sh false iter 1
shllscrpt96_16jul2024.sh itself looks like:
#!/bin/bash
#SBATCH -n 96
#SBATCH --mail-type="ALL"
#SBATCH --mem-per-cpu=8000M
module purge
module load matlab/2023b
matlab -nosplash -nodesktop -nodisplay -r "scriptfun_11jul2024($1,'$2',$3); exit"
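To make the argument passing concrete, here is a small sketch of how the string given to -r expands for the submission above. The function name build_matlab_cmd is hypothetical and only used for illustration; it is not part of the actual batch script.

```shell
#!/bin/bash
# Hypothetical helper: builds the string that the batch script hands to `matlab -r`.
build_matlab_cmd() {
    # $1 -> run_Spec, $2 -> ms_Display (quoted, so MATLAB sees a char array), $3 -> Category_MS
    printf "scriptfun_11jul2024(%s,'%s',%s); exit" "$1" "$2" "$3"
}

# Same arguments as in the sbatch call above.
build_matlab_cmd false iter 1
echo
# prints: scriptfun_11jul2024(false,'iter',1); exit
```

So MATLAB should receive run_Spec = false, ms_Display = 'iter' and Category_MS = 1.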
scriptfun_11jul2024(run_Spec, ms_Display, Category_MS) itself looks like:
function scriptfun_11jul2024(run_Spec,ms_Display,Category_MS)
n_cores = str2double(getenv('SLURM_NTASKS'));
pool = parpool('local', n_cores);
% ... lots of statements (I'm happy to provide more details, but I'm 100% sure this is not the part causing the problem) ...
delete(pool)
end
The batch job "successfully" completes, but the output file shows that it failed:
Starting parallel pool (parpool) using the 'local' profile ...
Preserving jobs with IDs: 13 14 15 16 because they contain crash dump files.
You can use 'delete(myCluster.Jobs)' to remove all jobs created with profile Processes. To create 'myCluster' use 'myCluster = parcluster('Processes')'.
Parallel pool using the 'Processes' profile is shutting down.
Error using parpool
Parallel pool failed to start with the following error. For more detailed
information, validate the profile 'Processes' in the Cluster Profile Manager.
Error in kimscriptfun_11jul2024 (line 13)
pool = parpool('local', n_cores);
Caused by:
Error using
parallel.internal.pool.AbstractInteractiveClient>iThrowWithCause
Failed to initialize the interactive session.
Error using
parallel.internal.pool.AbstractInteractiveClient>iThrowIfBadParallelJobStatus
The interactive communicating job failed with no message.
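One thing that might be worth ruling out (an assumption, not a confirmed cause): -n 96 asks Slurm for 96 tasks without fixing the number of nodes, so the allocation may span several nodes, while the 'local' ('Processes') profile starts all of its workers on the node where MATLAB itself is running. A minimal sketch of a check that could go into the batch script before the matlab line is below. The function name check_pool_fits is hypothetical; SLURM_NTASKS, SLURM_JOB_NUM_NODES and SLURM_CPUS_ON_NODE are variables Slurm sets inside a job.

```shell
#!/bin/bash
# Hypothetical helper: compares the number of workers requested for the pool
# with the CPUs Slurm reports for the node running the batch script.
check_pool_fits() {
    local workers="$1" cpus_on_node="$2"
    if [ "$workers" -le "$cpus_on_node" ]; then
        echo "ok: $workers workers fit on $cpus_on_node CPUs"
    else
        echo "warning: $workers workers requested, only $cpus_on_node CPUs on this node"
        return 1
    fi
}

# Inside a job Slurm sets these variables; the defaults only keep the sketch runnable elsewhere.
echo "nodes in allocation: ${SLURM_JOB_NUM_NODES:-unknown}"
check_pool_fits "${SLURM_NTASKS:-96}" "${SLURM_CPUS_ON_NODE:-96}" || true
```

If this reports more than one node, or fewer CPUs on the node than workers requested, the pool size and the allocation do not match, which may or may not be related to the crash dumps above.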
I’d appreciate any and all thoughts on what I might have done wrong. Thank you!