Why is this autoencoder only predicting a single output regardless of input when using min-max scaling?
Key questions:
Why does a network predict a single, specific output regardless of input, as if the input data contained no information relevant to the prediction?
Why does replacing min-max scaling with standard scaling fix this, at least occasionally?
The problem background: I am trying to train a simple image autoencoder, but I keep getting networks that output a single image regardless of the input. Taking the difference between pairs of output images confirms they are all exactly the same. Googling this issue, I found a Stack Overflow post saying it often arises from improperly dimensioned loss functions. I also saw people mention issues with using a sigmoid activation on the output of autoencoders, but the explanations as to why never rise above guesswork. Changing the scaling from min-max scaling to standard scaling produced a network that breaks out of the single-prediction behavior, but without understanding why, I will have no recourse but trial and error if it breaks again.
Notes on dimensioning loss functions: When calculating the loss between batches of images of shape [imgDim, imgDim, 1, batchSize], the mse loss function outputs a loss of size [1,1,1,batchSize]. Under min-max scaling this loss function has produced defective results, such as the aforementioned degeneration to a single output, as well as an initial loss three orders of magnitude larger than one would expect for inputs and outputs scaled to [0,1]. To be clear, I don't mean the learning is unstable; I mean the absolute values of the loss are absurd.
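For reference, the shape and magnitude of what mse reports can be checked directly on dummy data (a minimal sketch, assuming a formatted dlarray batch like the one described above):
% Inspect the size and values mse reports for a [28 x 28 x 1 x 20] batch.
Y = dlarray(rand(28,28,1,20,'single'),"SSCB");   % stand-in predictions
T = dlarray(rand(28,28,1,20,'single'),"SSCB");   % stand-in targets
L = mse(Y,T);
size(L)            % dimensions of the reported loss
extractdata(L)     % raw loss values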
I tried writing my own loss function that reports a scalar value, but I encountered the same degeneration to a single prediction independent of the input. I then wrote a version that reports an error tensor of the same shape as the output of @mse, but this threw the error listed below, after the custom loss functions in question.
% Version that reports a scalar
function meanAbsErr = myMae(prediction, target)
    % flatten is a helper (assumed defined elsewhere) that collapses the
    % spatial dimensions while keeping the batch dimension.
    meanAbsErr = mean(abs(flatten(prediction) - flatten(target)), 'all');
end
% Version that reports [1,1,1,batchSize]
function meanAbsErr = myMae(prediction, target)
    inDims = size(prediction);
    % Mean over the flattened pixel dimension gives one error per sample.
    meanAbsErr = mean(abs(flatten(prediction) - flatten(target)), 1);
    outDims = ones(1,length(inDims)); outDims(end) = inDims(end);
    meanAbsErr = reshape(meanAbsErr, outDims);
end
Value to differentiate is non-scalar. It must be a traced real dlarray scalar.
Error in mathworksDebug>modelLoss (line 213)
[gradientsE,gradientsD] = dlgradient(loss,netE.Learnables,netD.Learnables);
Error in deep.internal.dlfeval (line 17)
[varargout{1:nargout}] = fun(x{:});
Error in deep.internal.dlfevalWithNestingCheck (line 19)
[varargout{1:nargout}] = deep.internal.dlfeval(fun,varargin{:});
Error in dlfeval (line 31)
[varargout{1:nargout}] = deep.internal.dlfevalWithNestingCheck(fun,varargin{:});
Error in mathworksDebug (line 134)
[loss,gradientsE,gradientsD] = dlfeval(@modelLoss,netE,netD,X,Ztarget);
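For completeness: dlgradient can only differentiate a scalar, so the [1,1,1,batchSize] version would need an explicit reduction inside modelLoss before the gradient call. A minimal sketch of that workaround (which just lands back at the scalar case):
perSampleErr = myMae(Xrecon, X);            % [1,1,1,batchSize]
loss = mean(perSampleErr, 'all');           % scalar, as dlgradient requires
[gradientsE,gradientsD] = dlgradient(loss,netE.Learnables,netD.Learnables);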
Notes on scaling
I wrote a custom scaling function that replicates the behavior of rescale, except that it also returns the extrema it used, so the same scaling can be applied to and inverted on unseen data.
% Min-max scaling between [lb, ub]
function [scaled,smin,smax] = myRescale(varargin)
    datastruct = varargin{1}; lb = varargin{2}; ub = varargin{3};
    if length(varargin) <= 3
        % No extrema supplied: obtain them from the data itself.
        smin = min(datastruct(:)); smax = max(datastruct(:));
    else
        % Reuse extrema obtained from the training set.
        smin = varargin{4}; smax = varargin{5};
    end
    scaled = (datastruct - smin) / (smax - smin) * (ub - lb) + lb;
end
% Invert scaling (note: lb must be subtracted, not added, to undo the map)
function unscaled = myDescale(scaled, lb, ub, smin, smax)
    unscaled = (scaled - lb) * (smax - smin) ./ (ub - lb) + smin;
end
% Converts the data to z-scores
function [standard, center, stddev] = myStandardize(varargin)
    datastruct = varargin{1};
    if length(varargin) == 1
        center = mean(datastruct(:)); stddev = std(datastruct(:));
    else
        center = varargin{2}; stddev = varargin{3};
    end
    standard = (datastruct - center) / stddev;
end
% Converts z-scores back to the data's scale
function destandard = myDestandardize(datastruct, center, stddev)
    destandard = datastruct * stddev + center;
end
In the following code, I have removed the validation set to reduce bloat.
% I intend to regularize the latent space of this autoencoder to classify
% images once it can accomplish basic reconstruction. I made this note so
% it's clear what's going on with the custom losses and so forth.
xTrain = digitTrain4DArrayData;
xTest = digitTest4DArrayData;
%% Scaling that does not work
% Min-max scaling
xlb = 0; xub = 1;
[xTrain, xTrainMin, xTrainMax] = myRescale(xTrain, xlb, xub);
xTest = myRescale(xTest, xlb, xub, xTrainMin, xTrainMax);
%% Scaling that works, at least occasionally
[xTrain, xTrainCenter, xTrainStd] = myStandardize(xTrain);
xTest = myStandardize(xTest, xTrainCenter, xTrainStd);
ntrain = size(xTrain,4);
IMG_DIM = size(xTrain, 1); N_CHANNELS = size(xTrain, 3);
% NOTE: tTrain/tTest (latent-space targets for the planned regularizer) are
% assumed to be defined elsewhere; their construction is not shown here.
OUT_CHANNELS = min(size(tTrain,1), 64);
numLatentChannels = OUT_CHANNELS;
imageSize = [28 28 1];
%% Layer definitions
% Encoder layer
layersE = [
    imageInputLayer(imageSize,Normalization="none")
    convolution2dLayer(3,32,Padding="same",Stride=2)
    reluLayer
    convolution2dLayer(3,64,Padding="same",Stride=2)
    reluLayer
    fullyConnectedLayer(numLatentChannels)
    tanhLayer(Name="latent")];
% Latent projection
projectionSize = [7 7 64]; enc_dim = projectionSize(1);
numInputChannels = imageSize(3);
% Decoder
layersD = [
    featureInputLayer(numLatentChannels)
    projectAndReshapeLayer(projectionSize) % custom layer from the MathWorks autoencoder examples (assumed on the path)
    transposedConv2dLayer(3,64,Cropping="same",Stride=2)
    reluLayer
    transposedConv2dLayer(3,32,Cropping="same",Stride=2)
    reluLayer
    transposedConv2dLayer(3,numInputChannels,Cropping="same")
    sigmoidLayer(Name="Output")
    ];
netE = dlnetwork(layersE);
netD = dlnetwork(layersD);
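One way to sanity-check the wiring before training (a sketch, not part of the training script): push a dummy batch through both networks and confirm the decoder reproduces the input size.
Xdummy = dlarray(zeros(28,28,1,2,'single'),"SSCB");  % two blank images
Zdummy = predict(netE, Xdummy);     % latent codes in "CB" format
size(predict(netD, Zdummy))         % expect [28 28 1 2]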
%% Training Parameters
numEpochs = 150;
miniBatchSize = 20;
learnRate = 1e-3;
dsXTrain = arrayDatastore(xTrain,IterationDimension=4);
dstTrain = arrayDatastore(tTrain,IterationDimension=2);
numOutputs = 2;
dsTrain = combine(dsXTrain, dstTrain);
mbq = minibatchqueue(dsTrain,numOutputs, ...
    MiniBatchSize=miniBatchSize, ...
    MiniBatchFormat=["SSCB", "CB"], ...
    MiniBatchFcn=@preprocessMiniBatch, ...
    PartialMiniBatch="return");
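To confirm the queue produces what the loss expects, one preprocessed mini-batch can be inspected directly (a quick sketch, not part of the training run):
% Peek at one mini-batch, then reset the queue before training.
[Xpeek, Zpeek] = next(mbq);
size(Xpeek)    % expect [28 28 1 miniBatchSize] with format "SSCB"
size(Zpeek)    % expect [numLatentChannels miniBatchSize] with format "CB"
reset(mbq);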
% Initialize the parameters for the Adam solver.
trailingAvgE = [];
trailingAvgSqE = [];
trailingAvgD = [];
trailingAvgSqD = [];
% Calculate the total number of iterations for the training progress monitor.
numIterationsPerEpoch = ceil(ntrain / miniBatchSize);
numIterations = numEpochs * numIterationsPerEpoch;
epoch = 0;
iteration = 0;
% Initialize the training progress monitor.
monitor = trainingProgressMonitor( ...
    Metrics=["TrainingLoss"], ...
    Info=["Epoch", "LearningRate"], ...
    XLabel="Iteration");
%% Training
while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;
    % Shuffle data.
    shuffle(mbq);
    % Loop over mini-batches.
    while hasdata(mbq) && ~monitor.Stop
        iteration = iteration + 1;
        % Read mini-batch of data.
        [X, Ztarget] = next(mbq);
        % Evaluate loss and gradients.
        [loss,gradientsE,gradientsD] = dlfeval(@modelLoss,netE,netD,X,Ztarget);
        % Update learnable parameters.
        [netE,trailingAvgE,trailingAvgSqE] = adamupdate(netE, ...
            gradientsE,trailingAvgE,trailingAvgSqE,iteration,learnRate);
        [netD,trailingAvgD,trailingAvgSqD] = adamupdate(netD, ...
            gradientsD,trailingAvgD,trailingAvgSqD,iteration,learnRate);
        updateInfo(monitor, ...
            LearningRate=learnRate, ...
            Epoch=string(epoch) + " of " + string(numEpochs));
        recordMetrics(monitor,iteration, ...
            TrainingLoss=loss);
        monitor.Progress = 100*iteration/numIterations;
    end
end
%% Testing
dsTest = combine(arrayDatastore(xTest,IterationDimension=4), ...
    arrayDatastore(tTest,IterationDimension=2));
numOutputs = 2;
mbqTest = minibatchqueue(dsTest,numOutputs, ...
    MiniBatchSize=miniBatchSize, ...
    MiniBatchFcn=@preprocessMiniBatch, ...
    MiniBatchFormat=["SSCB", "CB"]);
[YTest, ZTest] = modelPredictions(netE,netD,mbqTest);
reconerr = mean(flatten(xTest-YTest),1);
figure
histogram(reconerr)
xlabel("Reconstruction Error")
ylabel("Frequency")
title("Test Data")
numImages = 64;
ndisplay = 10;
figure
I = imtile(YTest(:,:,:,1:numImages));
imshow(I)
title("Reconstructed Images")
%% Functions
function [loss,gradientsE,gradientsD] = modelLoss(netE,netD,X,Ztarget)
    % Forward through encoder.
    Z = forward(netE,X);
    % Forward through decoder.
    Xrecon = forward(netD,Z);
    % Calculate loss and gradients.
    loss = regularizedLoss(Xrecon,X,Z,Ztarget);
    [gradientsE,gradientsD] = dlgradient(loss,netE.Learnables,netD.Learnables);
end
function loss = regularizedLoss(Xrecon,X,Z,Ztarget)
    % Image reconstruction loss.
    reconstructionLoss = mse(Xrecon, X);
    % Regularization loss (disabled until basic reconstruction works).
    %regLoss = mse(Z, Ztarget);
    % Combined loss.
    loss = reconstructionLoss;% + 0.0*regLoss;
end
function [Xrecon, Zpred] = modelPredictions(netE,netD,mbq)
    Xrecon = [];
    Zpred = [];
    % Loop over mini-batches.
    while hasdata(mbq)
        X = next(mbq);
        % Pass through encoder.
        Z = predict(netE,X);
        % Pass through decoder to get reconstructed images.
        XGenerated = predict(netD,Z);
        % Extract and concatenate predictions.
        Xrecon = cat(4,Xrecon,extractdata(XGenerated));
        Zpred = cat(2,Zpred,extractdata(Z));
    end
end
function loss = assessLoss(netE, netD, X, Ztarget)
    % Forward through encoder.
    Z = predict(netE,X);
    % Forward through decoder.
    Xrecon = predict(netD,Z);
    % Calculate the loss only (no gradients needed for assessment).
    loss = regularizedLoss(Xrecon,X,Z,Ztarget);
end
function [X, Ztarget] = preprocessMiniBatch(Xcell, tCell)
    % Concatenate images along the batch dimension.
    X = cat(4,Xcell{:});
    % Concatenate latent targets along the batch dimension.
    Ztarget = cat(2,tCell{:});
end

Tags: autoencoder, deep learning, scaling, activation functions