Doubt in the backpropagation algorithm
Hi,
I'm studying neural networks and building a NN with 2 hidden layers and one neuron in the output layer.
While studying and coding it, I ran into a doubt.
In the backward step, the math is clear. For the output layer we have

δ_out = e ⊙ f'(z_out),

where ⊙ is the Hadamard product and e is the error (target - output). For the hidden layers we have

δ_l = (W_{l+1}^T δ_{l+1}) ⊙ f'(z_l).

My problem is that when I code these formulas, I have to change the second equation when computing the gradient of the first hidden layer (the one next to the input layer) so that the matrix dimensions match, as shown below:
% Backpropagation
delta_saida = erro_estim.*selecionar_funcao(saida_in_estim,ativ_out,sig_a,tanh_a,tanh_b,'True');
delta_h2 = (w_out'*delta_saida).*selecionar_funcao(h2_in_estim,ativ_out,sig_a,tanh_a,tanh_b,'True');
delta_h1 = (w2*delta_h2')'.*selecionar_funcao(h1_in_estim,ativ_h1,sig_a,tanh_a,tanh_b,'True');
%update weights and biases
w_out = w_out + learning_rate*delta_saida*h2_out_estim';
b_out = b_out + learning_rate*delta_saida;
w2 = w2 + learning_rate*(delta_h2'*h1_out_estim)';
b2 = b2 + learning_rate*sum(delta_h2);
w1 = w1 + learning_rate*delta_h1'*enter_estim;
b1 = b1 + learning_rate*sum(delta_h2);
% I wrote this code partially in Portuguese, so let me explain a little:
%'delta_saida' is the gradient of the output layer
%delta_h2 is the gradient of the second hidden layer
%delta_h1 is the gradient of the first hidden layer
%w_out, w2 and w1 are the weights of the output layer, second hidden layer and first hidden layer, respectively.
%b_out, b2 and b1 are the biases of the output layer, second hidden layer and first hidden layer, respectively.
% the function selecionar_funcao() computes the activation of the layer, or its derivative when the last argument is 'True'
%As you can see, I need to change delta_h1 to make the matrix dimensions match
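Just to make the dimension problem concrete, these are the sizes I get for one training sample with the row-vector convention I use in the forward pass (nothing new, just the shapes implied by the code below):
% shapes per training sample (row-vector convention):
% enter_estim  : 1 x num_entradas      w1    : num_entradas x n_h1
% h1_out_estim : 1 x n_h1              w2    : n_h1 x n_h2
% h2_out_estim : 1 x n_h2              w_out : n_h2 x n_out   (n_out = 1)
% delta_saida  : 1 x n_out (a scalar here)
% delta_h2     : 1 x n_h2  -> w_out'*delta_saida only comes out as a 1 x n_h2 row because delta_saida is a scalar
% delta_h1     : 1 x n_h1  -> that is why I need (w2*delta_h2')' to get a 1 x n_h1 row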
Is it right to change the formula the way I do in my code? I'm asking because, in my mind, the gradient of every hidden layer should be computed the same way, but in my case that isn't true. I'll share part of my code below so anyone can check whether I'm making a mistake.
%weights and biases initialization
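% (the sqrt(2/fan_in) factor below is He-style scaling)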
w1 = randn(num_entradas,n_h1)*sqrt(2/num_entradas);
w2 = randn(n_h1,n_h2) *sqrt(2/n_h1);
w_out = randn(n_h2,n_out) *sqrt(2/n_h2);
b1 = randn(1, n_h1) * sqrt(2/num_entradas);
b2 = randn(1, n_h2) * sqrt(2/n_h1);
b_out = randn(1,n_out) * sqrt(2/n_h2);
%backpropagation
for epoch =1:max_epocas
soma_valid = 0;
soma_estim = 0;
%shuffle the data
conj_estim = embaralhar(conj_estim);
% conj_valid = embaralhar(conj_valid);
%Validating
for j=1:size(conj_valid,1)
enter_valid = conj_valid(j,2:end);
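% note: appending 1 to the input row and stacking the bias as the last row gives [x,1]*[w;b] = x*w + b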
h1_in_valid = [enter_valid,1]*[w1;b1];
h1_out_valid = selecionar_funcao(h1_in_valid,ativ_h1,sig_a,tanh_a,tanh_b,'False');
h2_in_valid = [h1_out_valid,1]*[w2;b2];
h2_out_valid = selecionar_funcao(h2_in_valid,ativ_h2,sig_a,tanh_a,tanh_b,'False');
saida_in_valid = [h2_out_valid,1]*[w_out;b_out];
saida_out_valid = selecionar_funcao(saida_in_valid,ativ_out,sig_a,tanh_a,tanh_b,'False');
erro_valid = conj_valid(j,1) - saida_out_valid;
soma_valid = soma_valid + (erro_valid^2);
end
erro_atual_valid = (soma_valid/(2*size(conj_valid,1)));
erros_epoca_valid = [erros_epoca_valid;erro_atual_valid];
%training
for i =1:size(conj_estim,1)
enter_estim = conj_estim(i,2:end);
h1_in_estim = [enter_estim,1]*[w1;b1];
h1_out_estim = selecionar_funcao(h1_in_estim,ativ_h1,sig_a,tanh_a,tanh_b,'False');
h2_in_estim = [h1_out_estim,1]*[w2;b2];
h2_out_estim = selecionar_funcao(h2_in_estim,ativ_h2,sig_a,tanh_a,tanh_b,'False');
saida_in_estim = [h2_out_estim,1]*[w_out;b_out];
saida_out_estim = selecionar_funcao(saida_in_estim,ativ_out,sig_a,tanh_a,tanh_b,'False');
erro_estim = conj_estim(i,1) - saida_out_estim;
soma_estim = soma_estim + (erro_estim^2);
% Backpropagation
delta_saida = erro_estim.*selecionar_funcao(saida_in_estim,ativ_out,sig_a,tanh_a,tanh_b,'True');
delta_h2 = (w_out'*delta_saida).*selecionar_funcao(h2_in_estim,ativ_out,sig_a,tanh_a,tanh_b,'True');
delta_h1 = (w2*delta_h2')'.*selecionar_funcao(h1_in_estim,ativ_h1,sig_a,tanh_a,tanh_b,'True');
%update weights and biases
w_out = w_out + learning_rate*delta_saida*h2_out_estim';
b_out = b_out + learning_rate*delta_saida;
w2 = w2 + learning_rate*(delta_h2'*h1_out_estim)';
b2 = b2 + learning_rate*sum(delta_h2);
w1 = w1 + learning_rate*delta_h1'*enter_estim;
b1 = b1 + learning_rate*sum(delta_h2);
end
erro_atual_estim = (soma_estim/(2*size(conj_estim,1)));
erros_epoca_estim = [erros_epoca_estim;erro_atual_estim];
if erros_epoca_estim(epoch) <limiar
break
else
end
end
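If it helps, one thing I was also planning to do is a numerical gradient check on a single sample, along the lines of the sketch below. Here forward_loss() and alvo are hypothetical placeholders (not in my code above): forward_loss would just rerun the forward pass with the given weights and return 0.5*(alvo - saida_out)^2, with alvo = conj_estim(i,1).
% (sketch) finite-difference check of one entry of the w1 gradient, for one sample
eps_fd  = 1e-6;
w1_plus = w1;
w1_plus(1,1) = w1_plus(1,1) + eps_fd;
num_grad = (forward_loss(w1_plus,w2,w_out,b1,b2,b_out,enter_estim,alvo) ...
          - forward_loss(w1,     w2,w_out,b1,b2,b_out,enter_estim,alvo)) / eps_fd;
ana_grad = -enter_estim'*delta_h1;        % analytic dE/dw1 (num_entradas x n_h1) in the row convention
disp(abs(num_grad - ana_grad(1,1)));      % should be close to 0 if the backward step is consistent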