Trainnet with parallel-CPU mode giving incorrect results
I’m using trainnet to train a convolutional regression network that finds the X-Y centroid of a subtle gradient region in an input image. The training data consist of paired 130×326 grayscale images and ground-truth output coordinates. Both the RMSE and the loss reach very small values (e.g. 10^-3) after a few minutes of training on a small dataset. The trained network gives the expected results when trained in single-CPU mode, but when trained in parallel-CPU mode the predictions are significantly off.
To debug this, I scaled back to a very simple network, disabled input normalization, and trained on only two data points, fully expecting the network to memorize the training data perfectly. With single-CPU training, the trained network yields perfect predictions on the training data (as expected), but with parallel-CPU training it does not predict the training data correctly. I added a more verbose loss function and confirmed that the losses reported during training are consistent with the (Y,T) pairs, and that the T values are being read correctly from the training data.
It therefore looks as though the network returned from parallel-CPU training does not correctly capture the result of the training.
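Roughly, this is how I check the returned network against the two training samples (imgs and targets are placeholder names for my actual training pairs):
% Sketch of the check: imgs is a 130x326x1x2 stack of the two training
% images, targets is 2x2 with one [x;y] column per image (placeholders).
X = dlarray(single(imgs),"SSCB");           % spatial-spatial-channel-batch
Ypred = extractdata(predict(FOVCnet,X));    % 2-by-2 output, one column per image
disp(Ypred - targets)                       % ~0 after single-CPU training, far off after parallel-CPU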
I’m running R2024a on a MacBook Pro (M2 Max), using the Apple Accelerate BLAS. (The default BLAS persistently crashed in parallel mode with trainnet.)
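To rule out a BLAS mismatch between the MATLAB client and the pool workers, I also check which BLAS each one reports; this is just a sketch using the current pool:
pool = gcp;                                            % current parallel pool
clientBLAS = version('-blas')                          % BLAS used by the client
f = parfevalOnAll(pool,@() string(version('-blas')),1);
workerBLAS = fetchOutputs(f)                           % BLAS reported by each worker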
Code snippet below…
% Simplified regression network used for the two-sample memorization test
layers = [
    imageInputLayer([130 326 1],"Name","imageinput","Normalization","none")
    convolution2dLayer([10 10],8,"DilationFactor",[2 2],"Name","conv_1")
    maxPooling2dLayer([2 2],"Name","maxpool_4")
    batchNormalizationLayer
    reluLayer("Name","relu_1")
    convolution2dLayer([2 2],16,"Name","conv_2")
    fullyConnectedLayer(2,"Name","fc")];
opts = trainingOptions('sgdm', ...
    'InitialLearnRate',1e-7, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',500, ...
    'LearnRateDropFactor',0.25, ...
    'MaxEpochs',1000, ...
    'Verbose',false, ...
    'ExecutionEnvironment','parallel', ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'OutputNetwork','last-iteration');
FOVCnet = trainnet(trainingData,layers,@modelLoss,opts);
function loss = modelLoss(Y,T)   % custom loss, made verbose for debugging
Y                                % display the predictions each iteration (semicolon omitted on purpose)
T                                % display the targets each iteration
loss = mse(Y,T)                  % display the loss that is reported during training
end
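For completeness, this is roughly the head-to-head comparison I run: the same layer array trained twice, once on a single CPU and once in parallel, followed by the prediction check from the sketch above (optsCPU simply mirrors opts with ExecutionEnvironment set to 'cpu'; imgs is again a placeholder for the two training images):
optsCPU = trainingOptions('sgdm', ...        % same settings as opts, but single-CPU
    'InitialLearnRate',1e-7, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',500, ...
    'LearnRateDropFactor',0.25, ...
    'MaxEpochs',1000, ...
    'Verbose',false, ...
    'ExecutionEnvironment','cpu', ...
    'Shuffle','every-epoch', ...
    'OutputNetwork','last-iteration');

netCPU = trainnet(trainingData,layers,@modelLoss,optsCPU);   % single-CPU run
netPar = trainnet(trainingData,layers,@modelLoss,opts);      % parallel run

X = dlarray(single(imgs),"SSCB");            % the two training images again
Ycpu = extractdata(predict(netCPU,X))        % matches the targets
Ypar = extractdata(predict(netPar,X))        % noticeably off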