Attention layer: Number of parameters doesn’t change when changing number of heads
Changing the number-of-heads property of an attention layer from the MATLAB Deep Learning Toolbox doesn't seem to affect the resulting number of learnable parameters.
The following code results in 1793 total parameters:
% Number of heads for the multi-head attention layer
num_heads = 1;
% Number of key channels for query, key, and value
num_keys = 256;
% Number of classes
num_classes = 5;
% Define architecture
network_layers = [
sequenceInputLayer(1)
selfAttentionLayer(num_heads,num_keys)
fullyConnectedLayer(num_classes)
softmaxLayer
classificationLayer];
% Define layer graph
net = layerGraph;
net = addLayers(net,network_layers);
% Analyze the network and show learnable parameter counts
analyzeNetwork(net)
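For reference, the reported count of 1793 is consistent with the key channels being a fixed total that is split across heads, rather than per-head. This is an assumption, not confirmed documentation; the sketch below (in Python for illustration) assumes the query, key, and value projections each map the 1-channel input to num_keys channels (with biases), and an output projection maps num_keys back to the input size:

```python
# Hypothetical breakdown of the 1793 learnables reported by analyzeNetwork
# for selfAttentionLayer(num_heads, 256) on a 1-channel sequence input.
# Assumption: num_keys is the TOTAL channel count shared across heads,
# so the projection matrices keep the same shape for any head count.
def attention_params(num_keys, input_size=1, num_heads=1):
    # Wq, Wk, Wv: each (num_keys x input_size) weights + num_keys biases
    qkv = 3 * (num_keys * input_size + num_keys)
    # Output projection back to input_size: weights + bias
    out = input_size * num_keys + input_size
    # num_heads never enters the count: heads only partition the channels
    return qkv + out

print(attention_params(256, num_heads=1))   # 1793
print(attention_params(256, num_heads=16))  # 1793
```

Under this assumption, 16 heads would each attend over 256/16 = 16 channels, leaving the total weight sizes (and hence the parameter count) unchanged.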
When changing the number of heads to, e.g., 16, the number of learnable parameters doesn't change.
% Number of heads for the multi-head attention layer
num_heads = 16;
Why is that?
Shouldn't the number of learnable parameters of the attention layer increase in proportion to the number of heads?
Any help is highly appreciated!

transformer, attention, attention layer, learnable parameters, deep learning

MATLAB Answers — New Questions