Automatically select the right number of bins (or combine the bins) for the expected frequencies in crosstab, in order to guarantee at least 5 elements per bin
I have two observed datasets, "x" and "y", that I want to compare the observed frequencies in bins of "x" and "y" against the other, through crosstab. To do so, I first need to place the elements of "x" and "y" into bins, by using the histcounts function. The resulting binned arrays, "cx" and "cy", are then compared to each other with a chi-square test, perfomed by crosstab. The chi-square test of independence is performed to determine if there is a significant association between the frequencies of "x" and "y" across the bins.
However, the chi-square test "is not valid for small samples, and if some of the counts (in the expected frequency) are less than five, you may need to combine some bins in the tails.". In the following example, several bins of the observed frequencies "cx" and "cy" have zero elements, and I do not know if they affect the expected frequencies calculated within/by crosstab.
Therefore, is there a way in crosstab to automatically select the right number of bins for the expected frequencies, or to combine them if some are empty, in order to guarantee at least 5 elements per bin?
rng default; % for reproducibility
a = 0;
b = 100;
nb = 50;
% Create two log-normal distributed random datasets, "x" and "y’
% (but we can use any randomly distributed data)
x = (b-a).*round(lognrnd(1,1,1000,1)) + a;
y = (b-a).*round(lognrnd(0.88,1.1,1000,1)) + a;
% Counts/frequency of "x" and "y"
cx = histcounts(x,’NumBins’,nb);
cy = histcounts(y,’NumBins’,nb);
[~,chi2,p] = crosstab(cx,cy)I have two observed datasets, "x" and "y", that I want to compare the observed frequencies in bins of "x" and "y" against the other, through crosstab. To do so, I first need to place the elements of "x" and "y" into bins, by using the histcounts function. The resulting binned arrays, "cx" and "cy", are then compared to each other with a chi-square test, perfomed by crosstab. The chi-square test of independence is performed to determine if there is a significant association between the frequencies of "x" and "y" across the bins.
However, the chi-square test "is not valid for small samples, and if some of the counts (in the expected frequency) are less than five, you may need to combine some bins in the tails.". In the following example, several bins of the observed frequencies "cx" and "cy" have zero elements, and I do not know if they affect the expected frequencies calculated within/by crosstab.
Therefore, is there a way in crosstab to automatically select the right number of bins for the expected frequencies, or to combine them if some are empty, in order to guarantee at least 5 elements per bin?
rng default; % for reproducibility
a = 0;
b = 100;
nb = 50;
% Create two log-normal distributed random datasets, "x" and "y’
% (but we can use any randomly distributed data)
x = (b-a).*round(lognrnd(1,1,1000,1)) + a;
y = (b-a).*round(lognrnd(0.88,1.1,1000,1)) + a;
% Counts/frequency of "x" and "y"
cx = histcounts(x,’NumBins’,nb);
cy = histcounts(y,’NumBins’,nb);
[~,chi2,p] = crosstab(cx,cy) I have two observed datasets, "x" and "y", that I want to compare the observed frequencies in bins of "x" and "y" against the other, through crosstab. To do so, I first need to place the elements of "x" and "y" into bins, by using the histcounts function. The resulting binned arrays, "cx" and "cy", are then compared to each other with a chi-square test, perfomed by crosstab. The chi-square test of independence is performed to determine if there is a significant association between the frequencies of "x" and "y" across the bins.
However, the chi-square test "is not valid for small samples, and if some of the counts (in the expected frequency) are less than five, you may need to combine some bins in the tails.". In the following example, several bins of the observed frequencies "cx" and "cy" have zero elements, and I do not know if they affect the expected frequencies calculated within/by crosstab.
Therefore, is there a way in crosstab to automatically select the right number of bins for the expected frequencies, or to combine them if some are empty, in order to guarantee at least 5 elements per bin?
rng default; % for reproducibility
a = 0;
b = 100;
nb = 50;
% Create two log-normal distributed random datasets, "x" and "y’
% (but we can use any randomly distributed data)
x = (b-a).*round(lognrnd(1,1,1000,1)) + a;
y = (b-a).*round(lognrnd(0.88,1.1,1000,1)) + a;
% Counts/frequency of "x" and "y"
cx = histcounts(x,’NumBins’,nb);
cy = histcounts(y,’NumBins’,nb);
[~,chi2,p] = crosstab(cx,cy) crosstab, binning, binned, array, histcounts MATLAB Answers — New Questions