Adjust classifier performance (sensitivity & specificity)
Hi all,
I’m trying to build a classifier for my highly imbalanced binary data, which has the following class distribution:
tabulate(classes)
  Value    Count   Percent
      0   133412    97.62%
      1     3247     2.38%
My dataset has 119 features. My question is: how can I balance my classifier sensitivity and specificity results (see more details below)?
To deal with the imbalance, I’m using an ensemble classifier with the RUSBoost method and assessing its performance as shown in the code below:
%% Set cross validation – holdout
part = cvpartition(classes, 'Holdout', 0.5);
istrain = training(part); % Data for fitting
istest = test(part); % Data for quality assessment
holdout_train_features = features(istrain,:);
holdout_train_classes = classes(istrain);
holdout_test_features = features(istest,:);
holdout_test_classes = classes(istest);
%% Set classifier
% Set template tree
max_num_splits = round(sum(istrain)/2);
t = templateTree('MaxNumSplits', max_num_splits);
classifier = fitcensemble(holdout_train_features, holdout_train_classes, 'Method', 'RUSBoost', ...
'NumLearningCycles', 1000, 'Learners', t, 'LearnRate', 0.1);
%% Test performance
% Get common classification indicators
[obtained_classes, scores] = predict(classifier, holdout_test_features);
holdout_validation_results = confusionchart(holdout_test_classes, obtained_classes);
TN = holdout_validation_results.NormalizedValues(1,1);
TP = holdout_validation_results.NormalizedValues(2,2);
FP = holdout_validation_results.NormalizedValues(1,2);
FN = holdout_validation_results.NormalizedValues(2,1);
accuracy = (TP + TN)/(TP + TN + FP + FN); % 0.99406
sensitivity = TP/(TP + FN); % 0.86445
specificity = TN/(TN + FP); % 0.99721
PPV = TP/(TP + FP); % 0.88295
NPV = TN/(TN + FN); % 0.9967
% Compute ROC curve
positiveClassIdx = find(classifier.ClassNames == 1);
[X,Y,T,AUC, OPTROCPT] = perfcurve(holdout_test_classes, scores(:,positiveClassIdx), 1);
plot(1-X,Y)
hold on
scatter(1-OPTROCPT(1), OPTROCPT(2), 'filled')
xlabel('Specificity')
ylabel('Sensitivity')
Which gets me the following:
As can be seen, the specificity is very high while the sensitivity is comparatively low. My question is: how can I adjust my classifier to balance sensitivity and specificity (and, of course, PPV and NPV) so that it matches my desired operating point (e.g., the one marked on the ROC curve: 0.97 specificity and 0.961 sensitivity)?
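To make the question concrete, here is a minimal sketch of what moving to the marked ROC point might look like, assuming the variables from the code above (X, Y, T, OPTROCPT, scores, positiveClassIdx, holdout_test_classes) are still in the workspace and that the class labels are numeric 0/1. The names opt_threshold and thresholded_classes are made up for illustration. The threshold here is picked on the same holdout data used for evaluation, so the resulting numbers would be optimistic; a separate validation split would be needed for an honest estimate.

```matlab
% Look up the score threshold that corresponds to the marked ROC point.
% perfcurve counts an observation as positive when its score is >= T(i).
opt_idx = find(X == OPTROCPT(1) & Y == OPTROCPT(2), 1);
opt_threshold = T(opt_idx);

% Re-label the test observations with this threshold instead of the
% default rule used by predict (class with the highest score).
thresholded_classes = double(scores(:, positiveClassIdx) >= opt_threshold);

% Recompute the indicators directly from the labels.
TP = sum(thresholded_classes == 1 & holdout_test_classes == 1);
TN = sum(thresholded_classes == 0 & holdout_test_classes == 0);
FP = sum(thresholded_classes == 1 & holdout_test_classes == 0);
FN = sum(thresholded_classes == 0 & holdout_test_classes == 1);

sensitivity = TP/(TP + FN);
specificity = TN/(TN + FP);
PPV = TP/(TP + FP);
NPV = TN/(TN + FN);
```

With a positive class of only 2.38%, raising sensitivity this way lowers specificity, and even a small drop in specificity adds many false positives, so PPV would be expected to fall noticeably.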
Many thanks for your attention,
Diogo