Using dlupdate to train a network with Nesterov accelerated gradient
Dear all,
I want to adapt an example from the MathWorks website to train a network with Nesterov accelerated gradient (NAG). I found functions for training with SGD and SGDM, but I couldn't find one for NAG. According to the MathWorks documentation, a custom update rule has to be applied with the dlupdate function, so I started from the example there (Update parameters using custom function - MATLAB dlupdate (mathworks.com)). That example works, but I don't know how to change it to use Nesterov accelerated gradient. Here is my code with SGD:
[XTrain,TTrain] = digitTrain4DArrayData;
classes = categories(TTrain);
numClasses = numel(classes);

layers = [
    imageInputLayer([28 28 1],'Mean',mean(XTrain,4))
    convolution2dLayer(5,20)
    reluLayer
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer];
net = dlnetwork(layers);

miniBatchSize = 128;
numEpochs = 30;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
learnRate = 0.01;
numIterations = numEpochs * numIterationsPerEpoch;

monitor = trainingProgressMonitor(Metrics="Loss",Info="Epoch",XLabel="Iteration");

iteration = 0;
epoch = 0;
while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;

    % Shuffle data.
    idx = randperm(numel(TTrain));
    XTrain = XTrain(:,:,:,idx);
    TTrain = TTrain(idx);

    i = 0;
    while i < numIterationsPerEpoch && ~monitor.Stop
        i = i + 1;
        iteration = iteration + 1;

        % Read mini-batch of data and convert the labels to dummy
        % variables.
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X = XTrain(:,:,:,idx);
        T = zeros(numClasses, miniBatchSize,"single");
        for c = 1:numClasses
            T(c,TTrain(idx)==classes(c)) = 1;
        end

        % Convert mini-batch of data to dlarray.
        X = dlarray(single(X),"SSCB");

        % If training on a GPU, then convert data to a gpuArray.
        if canUseGPU
            X = gpuArray(X);
        end

        % Evaluate the model loss and gradients using dlfeval and the
        % modelLoss function.
        [loss,gradients] = dlfeval(@modelLoss,net,X,T);

        % Update the network parameters using the SGD algorithm defined in
        % the sgdFunction helper function.
        updateFcn = @(net,gradients) sgdFunction(net,gradients,learnRate);
        net = dlupdate(updateFcn,net,gradients);

        % Update the training progress monitor.
        recordMetrics(monitor,iteration,Loss=loss);
        updateInfo(monitor,Epoch=epoch + " of " + numEpochs);
        monitor.Progress = 100 * iteration/numIterations;
    end
end

[XTest,TTest] = digitTest4DArrayData;
XTest = dlarray(XTest,"SSCB");
if canUseGPU
    XTest = gpuArray(XTest);
end
YTest = predict(net,XTest);
[~,idx] = max(extractdata(YTest),[],1);
YTest = classes(idx);
accuracy = mean(YTest==TTest)

function [loss,gradients] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end

function parameters = sgdFunction(parameters,gradients,learnRate)
    parameters = parameters - learnRate .* gradients;
end
This gives a good result, with an accuracy of 0.8192.
But when I try Nesterov accelerated gradient:
[XTrain,TTrain] = digitTrain4DArrayData;
classes = categories(TTrain);
numClasses = numel(classes);

layers = [
    imageInputLayer([28 28 1],'Mean',mean(XTrain,4))
    convolution2dLayer(5,20)
    reluLayer
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer];
net = dlnetwork(layers);

miniBatchSize = 128;
numEpochs = 30;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
learnRate = 0.001;
momentum = 0.9; % Momentum parameter for Nesterov algorithm
numIterations = numEpochs * numIterationsPerEpoch;

monitor = trainingProgressMonitor(Metrics="Loss",Info="Epoch",XLabel="Iteration");

iteration = 0;
epoch = 0;
velocities = []; % Initialize velocities for Nesterov algorithm
while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;

    % Shuffle data.
    idx = randperm(numel(TTrain));
    XTrain = XTrain(:,:,:,idx);
    TTrain = TTrain(idx);

    i = 0;
    while i < numIterationsPerEpoch && ~monitor.Stop
        i = i + 1;
        iteration = iteration + 1;

        % Read mini-batch of data and convert the labels to dummy
        % variables.
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X = XTrain(:,:,:,idx);
        T = zeros(numClasses, miniBatchSize,"single");
        for c = 1:numClasses
            T(c,TTrain(idx)==classes(c)) = 1;
        end

        % Convert mini-batch of data to dlarray.
        X = dlarray(single(X),"SSCB");

        % If training on a GPU, then convert data to a gpuArray.
        if canUseGPU
            X = gpuArray(X);
        end

        % Evaluate the model loss and gradients using dlfeval and the
        % modelLoss function.
        [loss,gradients] = dlfeval(@modelLoss,net,X,T);

        % Update the network parameters using the Nesterov momentum
        % algorithm defined in the nesterovFunction helper function.
        updateFcn = @(net,gradients) nesterovFunction(net, gradients, learnRate, momentum, velocities);
        net = dlupdate(updateFcn, net, gradients);

        % Update the training progress monitor.
        recordMetrics(monitor,iteration,Loss=loss);
        updateInfo(monitor,Epoch=epoch + " of " + numEpochs);
        monitor.Progress = 100 * iteration/numIterations;
    end
end

[XTest,TTest] = digitTest4DArrayData;
XTest = dlarray(XTest,"SSCB");
if canUseGPU
    XTest = gpuArray(XTest);
end
YTest = predict(net,XTest);
[~,idx] = max(extractdata(YTest),[],1);
YTest = classes(idx);
accuracy = mean(YTest==TTest)

function [loss,gradients] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end

function parameters = nesterovFunction(parameters, gradients, learnRate, momentum, velocities)
    % Perform Nesterov Accelerated Gradient (NAG) update.
    if isempty(velocities)
        velocities = gradients;
    else
        % Update velocity
        velocities = momentum * velocities + learnRate * gradients;
    end
    % Update parameters
    parameters = parameters - velocities;
end
I get an accuracy of only 0.1, and the loss curve looks wrong.
I'm not sure whether this is really Nesterov accelerated gradient or just SGD with momentum (SGDM). I also don't understand why the loss does not converge toward zero and instead stays constant.
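For anyone reading along, one likely cause is visible in the script itself. nesterovFunction never returns velocities, so the variable in the training loop stays [] on every iteration. The isempty branch therefore always runs and the update reduces to parameters - gradients, that is, plain SGD with an effective learning rate of 1 in which learnRate and momentum are never used. Even if the velocity were carried over, the else branch is the classical momentum (SGDM) rule rather than Nesterov. The fragment below is a minimal sketch of one possible fix, not a tested solution. It assumes the multi-output form [net,X1] = dlupdate(fun,net,A1,A2) and uses the reformulation of Nesterov momentum that works with gradients evaluated at the current parameters. The helper name nesterovStep is made up for illustration, and the three parts are meant to replace the corresponding lines of the second script.

```matlab
% (1) Before the training loop: one zero velocity per learnable, stored in
%     a table with the same layout as net.Learnables.
velocities = dlupdate(@(p) 0.*p, net.Learnables);

% (2) Inside the inner loop, in place of the existing dlupdate call:
%     pass the velocities in and take the updated velocities back out.
updateFcn = @(p,g,v) nesterovStep(p,g,v,learnRate,momentum);
[net,velocities] = dlupdate(updateFcn,net,gradients,velocities);

% (3) At the end of the script, in place of nesterovFunction.
%     nesterovStep is a hypothetical helper, not a toolbox function.
function [p,v] = nesterovStep(p,g,v,learnRate,momentum)
    % Velocity update (same as classical momentum).
    v = momentum.*v - learnRate.*g;
    % Nesterov step: move along the updated velocity once more,
    % in addition to the plain gradient step.
    p = p + momentum.*v - learnRate.*g;
end
```

Replacing the last line of nesterovStep with p = p + v; gives ordinary SGDM, which makes the difference between the two rules easy to compare on the same data. With momentum = 0.9 the effective step is much larger than learnRate alone, so the learning rate may need retuning.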
Best regards,
Daniel