Improve convergence of GD with momentum
Hello,
I have made a simple implementation of the gradient descent (GD) algorithm with momentum, and convergence seems very slow: it takes about 15k iterations to reach the predefined tolerance.
I feel like there should be an improvement to this code, but I don't see where.
Please suggest a better implementation:
clear all
clc
% Initial guess
x0 = [1.3, 0.7, 0.8, 1.9, 1.2]';
% Parameters
alpha = 0.001;
beta = 0.4;
max_iter = 10000;
tol = 1e-8;
x = x0; % find the values of x that minimize the Rosenbrock function
i = 0;
step = zeros(size(x));
mse = rosenbrock(x);
fprintf('Initial MSE = %14.10f x = %s\n', mse, mat2str(x')); % print initial values
while mse > tol
    grad = rosenbrock_gradient(x);
    gnorm = norm(grad); % gradient norm
    step = -(1-beta)*alpha*grad + beta*step; % momentum update: weighted mix of the new gradient step and the previous step
    x = x + step;
    mse = rosenbrock(x); % update the mean squared error
    i = i + 1;
end
fprintf('iterations = %6d\n', i);
fprintf('Final MSE = %14.10f x = %s\n', mse, mat2str(x'));
fprintf('gradient = %s\n', mat2str(grad'));
fprintf('gradient norm = %f\n', gnorm);
% Define the Rosenbrock function
function [mse] = rosenbrock(x)
    mse = sum(100.0 * (x(2:end) - x(1:end-1).^2.0).^2.0 + (1 - x(1:end-1)).^2.0);
end
% Define the gradient of the Rosenbrock function
function [grad] = rosenbrock_gradient(x)
    n = length(x);
    grad = zeros(n, 1);
    grad(1) = -400 * x(1) * (x(2) - x(1)^2) - 2 * (1 - x(1));
    grad(2:n-1) = 200 * (x(2:n-1) - x(1:n-2).^2) - 400 * x(2:n-1) .* (x(3:n) - x(2:n-1).^2) - 2 * (1 - x(2:n-1));
    grad(n) = 200 * (x(n) - x(n-1)^2);
end
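For reference, this is the kind of change I have been considering, but I am not sure it is the right direction, so comments on it are welcome too. It is just a sketch that reuses x0, alpha, tol, max_iter and the two functions above: it actually uses the max_iter safeguard (which I define but never use), and it evaluates the gradient at a look-ahead point (Nesterov-style momentum) with a heavier beta, which is only a guess and would need tuning.

% Sketch of a possible variant of the loop above (same rosenbrock and
% rosenbrock_gradient functions as in the script).
x = x0;
step = zeros(size(x));
beta = 0.9;                                      % heavier momentum than 0.4; needs tuning
mse = rosenbrock(x);
i = 0;
while mse > tol && i < max_iter                  % max_iter now acts as a safeguard
    grad = rosenbrock_gradient(x + beta*step);   % look-ahead (Nesterov) gradient
    step = beta*step - alpha*grad;               % momentum update
    x = x + step;
    mse = rosenbrock(x);
    i = i + 1;
end
fprintf('iterations = %6d Final MSE = %14.10f\n', i, mse);

Note that this drops the (1-beta) factor in front of the gradient term; that is just the more common way of writing the momentum update, so the effective step size is alpha itself.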