Trainnet with parallel-CPU mode giving incorrect results
I’m using trainnet to train a convolutional regression network that finds the X-Y centroid of a subtle gradient region in an input image. The training data consist of paired 130×326 grayscale images and ground-truth output coordinates. Both the RMSE and the loss reach very small values (e.g. 10^-3) after a few minutes of training on a small dataset. The trained network gives the expected results when trained in single-CPU mode, but when trained in parallel-CPU mode the predictions are significantly off.
To debug this, I scaled back to a very simple network, disabled input normalization, and trained on only two data points, fully expecting the network to memorize the training data perfectly. With single-CPU training, the trained network yields perfect predictions on the training data (as expected), but with parallel-CPU training it does not predict the training data correctly. I added a more verbose loss function and confirmed that the losses reported during training are consistent with the (Y,T) pairs, and that the T values are being read correctly from the training data.
It therefore looks as though the network returned from parallel-CPU training does not correctly capture the result of the training.
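Roughly, this is how I check the returned network against the two training samples (imgs and targets are placeholder names for my actual training pairs):
% Sketch of the check: imgs is a 130x326x1x2 stack of the two training
% images, targets is 2x2 with one [x;y] column per image (placeholders).
X = dlarray(single(imgs),"SSCB");           % spatial-spatial-channel-batch
Ypred = extractdata(predict(FOVCnet,X));    % 2-by-2 output, one column per image
disp(Ypred - targets)                       % ~0 after single-CPU training, far off after parallel-CPU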
I’m running R2024a on a MacBook Pro (M2 Max), using the Apple Accelerate BLAS. (The default BLAS persistently crashed in parallel mode with trainnet.)
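To rule out a BLAS mismatch between the MATLAB client and the pool workers, I also check which BLAS each one reports; this is just a sketch using the current pool:
pool = gcp;                                            % current parallel pool
clientBLAS = version('-blas')                          % BLAS used by the client
f = parfevalOnAll(pool,@() string(version('-blas')),1);
workerBLAS = fetchOutputs(f)                           % BLAS reported by each worker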
Code snippet below…
% Simplified regression network used for the two-sample memorization test
layers = [
    imageInputLayer([130 326 1],"Name","imageinput","Normalization","none")
    convolution2dLayer([10 10],8,"DilationFactor",[2 2],"Name","conv_1")
    maxPooling2dLayer([2 2],"Name","maxpool_4")
    batchNormalizationLayer
    reluLayer("Name","relu_1")
    convolution2dLayer([2 2],16,"Name","conv_2")
    fullyConnectedLayer(2,"Name","fc")];
opts = trainingOptions('sgdm', ...
    'InitialLearnRate',1e-7, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',500, ...
    'LearnRateDropFactor',0.25, ...
    'MaxEpochs',1000, ...
    'Verbose',false, ...
    'ExecutionEnvironment','parallel', ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'OutputNetwork','last-iteration');
FOVCnet = trainnet(trainingData,layers,@modelLoss,opts);
function loss = modelLoss(Y,T)   % custom loss, made verbose for debugging
Y                                % display the predictions each iteration (semicolon omitted on purpose)
T                                % display the targets each iteration
loss = mse(Y,T)                  % display the loss that is reported during training
end
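For completeness, this is roughly the head-to-head comparison I run: the same layer array trained twice, once on a single CPU and once in parallel, followed by the prediction check from the sketch above (optsCPU simply mirrors opts with ExecutionEnvironment set to 'cpu'; imgs is again a placeholder for the two training images):
optsCPU = trainingOptions('sgdm', ...        % same settings as opts, but single-CPU
    'InitialLearnRate',1e-7, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',500, ...
    'LearnRateDropFactor',0.25, ...
    'MaxEpochs',1000, ...
    'Verbose',false, ...
    'ExecutionEnvironment','cpu', ...
    'Shuffle','every-epoch', ...
    'OutputNetwork','last-iteration');

netCPU = trainnet(trainingData,layers,@modelLoss,optsCPU);   % single-CPU run
netPar = trainnet(trainingData,layers,@modelLoss,opts);      % parallel run

X = dlarray(single(imgs),"SSCB");            % the two training images again
Ycpu = extractdata(predict(netCPU,X))        % matches the targets
Ypar = extractdata(predict(netPar,X))        % noticeably off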