Why does my fit to a PDF generated using a histogram not add up to 1 or give me correct expected value?
Hello,
I generate a pdf from a histogram which describes the distribution of values about an average. For example say I have values {x1,x2,….xn}, my PDF is to see how x’={x1,x2,….xn}/mean({x1,x2,….xn}) is distributed.
The PDF of x’ (blue curve) i generated is shown below where I fit a yellow quadratic function to whatever is below the mean <1 and with an exponential fit to whatever is >1.If I sum the blue curve:
(sum(prob))
it is 1 which makes sense since probabilities should add up to 1. In addition, if I do:
sum(x’.*prob)
for the distribution, I get 1 which makes sense because my expected value should be 1. However, If I try to do the same with my curve fits, using trapezoidal integration, I do not get 1! I don’t understand what is missing.. am I not normalizing a quantity?
The full code snippet I used to generate the above figure is attached below. I have also attached the data to this post if you might be interested in taking a look. My expected value from the discrete distribution given by the variable ‘discrete’ is nearly 1. but the ‘continuous’ variable when computed is like 0.18 which is unexpected. Any help is much appreciated!
data = [data_tot_fn_4_0p05/mean(data_tot_fn_4_0p05)]; %# Sample data
xRange = 0:0.2:max(data); %# Range of integers to compute a probability for
N = hist(data,xRange); %# Bin the data
prob=N./numel(data);
semilogy(xRange,prob); %# Plot the probabilities for each integer
xlabel(‘Integer value’);
ylabel(‘Probability’);
greater1=find(xRange>=1);
xgreater1=xRange(greater1)
ygreater1=prob(greater1);
less1=find(xRange<=1);
xless1=xRange(less1)
yless1=prob(less1);
model = @(a, x) exp(-a*(x));
% Initial guess for the parameter ‘a’
initial_guess = 0.5;
% Perform the curve fitting
a_fitgreat = lsqcurvefit(model, initial_guess, xgreater1, ygreater1);
model2=@(p,x)p(1)+p(2)./(p(3).*sqrt(pi/2)).*exp(-2*(x-p(4)).^2./p(3).^2)
model2=@(p,x)p(1)+p(2).*x+p(3).*x.^2;
initial_guess = [0.5,0.5,0.5,0.5];
initial_guess=[0.5,0.5,0.5];
% Perform the curve fitting
a_fitless = lsqcurvefit(model2, initial_guess, xless1, yless1);
% Calculate the fitted curve using the optimized ‘a’ value
y_fit_great = model(a_fitgreat, xgreater1);
y_fit_less = model2(a_fitless, xless1);
hold on;
plot(xgreater1,y_fit_great);
hold on;
plot(xless1,y_fit_less)
discrete=sum(xRange.*prob)
continuous=trapz(fofaveless,(fofaveless).*yfofaveless)+trapz(fofavegreat,(fofavegreat).*yfofavegreat);Hello,
I generate a pdf from a histogram which describes the distribution of values about an average. For example say I have values {x1,x2,….xn}, my PDF is to see how x’={x1,x2,….xn}/mean({x1,x2,….xn}) is distributed.
The PDF of x’ (blue curve) i generated is shown below where I fit a yellow quadratic function to whatever is below the mean <1 and with an exponential fit to whatever is >1.If I sum the blue curve:
(sum(prob))
it is 1 which makes sense since probabilities should add up to 1. In addition, if I do:
sum(x’.*prob)
for the distribution, I get 1 which makes sense because my expected value should be 1. However, If I try to do the same with my curve fits, using trapezoidal integration, I do not get 1! I don’t understand what is missing.. am I not normalizing a quantity?
The full code snippet I used to generate the above figure is attached below. I have also attached the data to this post if you might be interested in taking a look. My expected value from the discrete distribution given by the variable ‘discrete’ is nearly 1. but the ‘continuous’ variable when computed is like 0.18 which is unexpected. Any help is much appreciated!
data = [data_tot_fn_4_0p05/mean(data_tot_fn_4_0p05)]; %# Sample data
xRange = 0:0.2:max(data); %# Range of integers to compute a probability for
N = hist(data,xRange); %# Bin the data
prob=N./numel(data);
semilogy(xRange,prob); %# Plot the probabilities for each integer
xlabel(‘Integer value’);
ylabel(‘Probability’);
greater1=find(xRange>=1);
xgreater1=xRange(greater1)
ygreater1=prob(greater1);
less1=find(xRange<=1);
xless1=xRange(less1)
yless1=prob(less1);
model = @(a, x) exp(-a*(x));
% Initial guess for the parameter ‘a’
initial_guess = 0.5;
% Perform the curve fitting
a_fitgreat = lsqcurvefit(model, initial_guess, xgreater1, ygreater1);
model2=@(p,x)p(1)+p(2)./(p(3).*sqrt(pi/2)).*exp(-2*(x-p(4)).^2./p(3).^2)
model2=@(p,x)p(1)+p(2).*x+p(3).*x.^2;
initial_guess = [0.5,0.5,0.5,0.5];
initial_guess=[0.5,0.5,0.5];
% Perform the curve fitting
a_fitless = lsqcurvefit(model2, initial_guess, xless1, yless1);
% Calculate the fitted curve using the optimized ‘a’ value
y_fit_great = model(a_fitgreat, xgreater1);
y_fit_less = model2(a_fitless, xless1);
hold on;
plot(xgreater1,y_fit_great);
hold on;
plot(xless1,y_fit_less)
discrete=sum(xRange.*prob)
continuous=trapz(fofaveless,(fofaveless).*yfofaveless)+trapz(fofavegreat,(fofavegreat).*yfofavegreat); Hello,
I generate a pdf from a histogram which describes the distribution of values about an average. For example say I have values {x1,x2,….xn}, my PDF is to see how x’={x1,x2,….xn}/mean({x1,x2,….xn}) is distributed.
The PDF of x’ (blue curve) i generated is shown below where I fit a yellow quadratic function to whatever is below the mean <1 and with an exponential fit to whatever is >1.If I sum the blue curve:
(sum(prob))
it is 1 which makes sense since probabilities should add up to 1. In addition, if I do:
sum(x’.*prob)
for the distribution, I get 1 which makes sense because my expected value should be 1. However, If I try to do the same with my curve fits, using trapezoidal integration, I do not get 1! I don’t understand what is missing.. am I not normalizing a quantity?
The full code snippet I used to generate the above figure is attached below. I have also attached the data to this post if you might be interested in taking a look. My expected value from the discrete distribution given by the variable ‘discrete’ is nearly 1. but the ‘continuous’ variable when computed is like 0.18 which is unexpected. Any help is much appreciated!
data = [data_tot_fn_4_0p05/mean(data_tot_fn_4_0p05)]; %# Sample data
xRange = 0:0.2:max(data); %# Range of integers to compute a probability for
N = hist(data,xRange); %# Bin the data
prob=N./numel(data);
semilogy(xRange,prob); %# Plot the probabilities for each integer
xlabel(‘Integer value’);
ylabel(‘Probability’);
greater1=find(xRange>=1);
xgreater1=xRange(greater1)
ygreater1=prob(greater1);
less1=find(xRange<=1);
xless1=xRange(less1)
yless1=prob(less1);
model = @(a, x) exp(-a*(x));
% Initial guess for the parameter ‘a’
initial_guess = 0.5;
% Perform the curve fitting
a_fitgreat = lsqcurvefit(model, initial_guess, xgreater1, ygreater1);
model2=@(p,x)p(1)+p(2)./(p(3).*sqrt(pi/2)).*exp(-2*(x-p(4)).^2./p(3).^2)
model2=@(p,x)p(1)+p(2).*x+p(3).*x.^2;
initial_guess = [0.5,0.5,0.5,0.5];
initial_guess=[0.5,0.5,0.5];
% Perform the curve fitting
a_fitless = lsqcurvefit(model2, initial_guess, xless1, yless1);
% Calculate the fitted curve using the optimized ‘a’ value
y_fit_great = model(a_fitgreat, xgreater1);
y_fit_less = model2(a_fitless, xless1);
hold on;
plot(xgreater1,y_fit_great);
hold on;
plot(xless1,y_fit_less)
discrete=sum(xRange.*prob)
continuous=trapz(fofaveless,(fofaveless).*yfofaveless)+trapz(fofavegreat,(fofavegreat).*yfofavegreat); pdf, histogram, integral, sum, probability MATLAB Answers — New Questions