kfoldLoss() values have inconsistent precision between different iterations of a loop
I am training an RBF SVM with leave-one-out cross-validation on 94 observations, and I am surprised to find that the value returned by kfoldLoss() isn't bit-for-bit consistent between models that have the same loss (or accuracy). For example, an accuracy of 76/94 does not always produce exactly the same double, with variations of around 1e-15. The error is completely negligible except when comparing values or searching for the maximum. The only thing that should differ between these models is which 76 of the 94 folds are classified correctly, but that should have no effect on the stored value.
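For context, I do understand that floating-point addition is not associative, so the same rational result can round to different doubles depending on the order of accumulation. A quick illustration in Python (the same IEEE-754 doubles MATLAB uses):

```python
# Floating-point addition is not associative: the same exact sum can
# round to different doubles depending on grouping.
a = (0.1 + 0.2) + 0.3   # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)   # 0.6
print(a == b)           # False

# Repeated accumulation drifts from the directly computed value:
acc = sum([0.1] * 10)   # 0.9999999999999999
print(acc == 1.0)       # False
```

But I would have expected kfoldLoss to accumulate the per-fold losses the same way every time, so that identical fold outcomes always land on the same double.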
I'm using a parfor loop to test many combinations of features (around 260K combinations) and measuring accuracy as accuracy = 1 - kfoldLoss(Mdl). I then use max() to find the index of the highest accuracy; however, this sometimes fails because of these tiny variations in the stored values. How is this even possible?
With 94 observations, there are only 94 possible accuracy levels. In my latest test, the peak accuracy is 76 out of 94, which is 0.808510638297872…
Eight of the tested models achieve this 76/94 accuracy, yet the values stored in the same double-precision vector are not all bit-identical. Rounding errors are inevitable, but I would have expected MATLAB to return exactly the same double for 76/94 every time.
I'm using a parfor loop. Could this have something to do with it? Is it possible for one worker to somehow produce different rounding from the others? It's an Intel i7-7700 running MATLAB R2024a on Windows 10.
% Abbreviated code. "combinations" is a cell array with each cell
% containing a vector of the features to select from the training data
accuracies = zeros(1, numel(combinations));
parfor i = 1:numel(combinations)
    td_sel = training_data(:, combinations{i});
    Mdl = fitcsvm(td_sel, response_name, 'KernelFunction', 'RBF', ...
        'KFold', 94, 'CacheSize', 'maximal');
    accuracies(i) = 1 - kfoldLoss(Mdl);
end
[max_val, max_pos] = max(accuracies)
max_val =
0.808510638297872
max_pos =
52793
% Find all values that are very close to the maximum. I don't understand
% how the stored values can differ
a = find(abs(accuracies - max_val) < 1e-10)
accuracies(a)
a =
6829
6891
6989
13699
21936
22778
45270
52793
ans =
0.808510638297872
0.808510638297872
0.808510638297872
0.808510638297872
0.808510638297872
0.808510638297872
0.808510638297872
0.808510638297872
accuracies(a) - max_val
ans =
1.0e-15 *
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
0
accuracies(a) - 76/94
ans =
1.0e-15 *
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
0
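Since every accuracy here is k/94 for an integer k, the workaround I'm considering is to map each value back to its integer count of correct folds before comparing, which sidesteps the 1-ulp noise entirely. A sketch in Python with made-up values (the MATLAB equivalent would be counts = round(accuracies * 94); find(counts == max(counts))):

```python
# Hypothetical accuracy values: the same 76/94 reached along different
# floating-point paths, plus one strictly lower value (75/94).
n = 94
accuracies = [76 / 94, 1 - 18 / 94, 0.8085106382978722, 75 / 94]

# Map each accuracy back to an exact integer count of correct folds;
# rounding absorbs any ~1e-15 accumulation noise.
counts = [round(a * n) for a in accuracies]
best = max(counts)
winners = [i for i, c in enumerate(counts) if c == best]
print(best, winners)   # all three 76/94 variants tie for the maximum
```

Is this the right approach, or is there a way to make kfoldLoss return a consistent value in the first place?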
Thanks.