On the χ² Distribution and Statistic
Source: Internet. Editor: 程序博客网. Time: 2024/05/18 19:37
Recently I was helping my wife review some research papers in her field, physiotherapy. Some of them involve a certain amount of statistical analysis, in particular the χ² statistic, which I later found is not a trivial topic. I realized I had forgotten most of what I learned about probability and statistics at university, and now I have to pick some of it up again. Fortunately, Wikipedia is always very handy for such needs.
First of all, what is the χ² distribution? (It is what the χ² statistic is based on.)
In short, the χ² distribution of order k, or the χ² distribution with k degrees of freedom (k is a positive integer), is the distribution of the sum of the squares of k independent standard normal random variables (random variables with the standard normal (Gaussian) distribution). When k is 1 it reduces to the distribution of the square of a single standard normal random variable.
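The definition can be checked empirically with a small simulation. The sketch below uses only the Python standard library; the function name `sample_chi_square`, the choice k = 3, the seed and the sample count are my own illustrative assumptions. It relies on the known facts that a χ² distribution with k degrees of freedom has mean k and variance 2k.

```python
import random
import statistics

def sample_chi_square(k, rng):
    """Draw one value: the sum of squares of k independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)
k = 3                   # degrees of freedom, chosen only for illustration
draws = [sample_chi_square(k, rng) for _ in range(20000)]

sample_mean = statistics.mean(draws)
sample_var = statistics.variance(draws)

# The sample mean should land near k and the sample variance near 2k.
print(f"k={k}: sample mean {sample_mean:.2f}, sample variance {sample_var:.2f}")
```

With more draws the two numbers should settle closer to k and 2k.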
Its probability density function and cumulative distribution function are both given in the Wikipedia article about it.
However, what is interesting is its use as a mathematical tool in statistical tests.
Think about the following scenario.
Suppose the hypothesis is that, in a certain area, the ratio of the number of men to the number of women is 1.1:1. We can use a test built on the χ² statistic to check whether sample data are consistent with this 'theory' or statement, or give grounds to reject it.
To answer this question, the key is to construct a formula of a similar form to χ², in which the errors, or differences between observed and expected counts, play the role of the individual random variables in χ².
At the same time, we can draw a sample of people from that area, with a size that makes the expected frequencies easy to work with, such as 105, for which the ideal match to the theoretical ratio would be 55 men and 50 women.
The formula mentioned above is defined as follows (note that this test statistic is also called χ², as this is a χ² test):
χ² = (Number of Men from the Sample - 55)^2 / 55 + (Number of Women from the Sample - 50)^2 / 50, provided the size of the sample is 105.
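As a minimal sketch, the formula can be written as a small function over observed and expected counts. The name `chi_square_statistic` is hypothetical, and the observed counts of 59 men and 46 women are just an example sample of size 105.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Expected counts for a sample of 105 under the 1.1:1 hypothesis.
expected = [55, 50]
# An example sample: 59 men and 46 women.
observed = [59, 46]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))  # prints 0.61
```

A sample that matches the hypothesis exactly (55 men, 50 women) gives a statistic of 0, and the value grows as the counts move away from the expected ones.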
If the statement is true, each of the two terms in the sum above is a squared deviation scaled by its expected count. The two terms are not independent, however: they are completely dependent, because the sample size is fixed and once one count is known the other is determined.
So if we end up with 59 men and 46 women in the sample, we get χ² = 0.61. Looking this up in the cdf of the χ² distribution with 1 degree of freedom, we find that the probability of χ² exceeding 0.61 is around 0.43, which is well above the conventional thresholds for statistical significance (typically 0.05, or stricter levels such as 0.01 or 0.001). This probability is denoted by p in some of the literature. So normally we would not reject the null hypothesis.
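The lookup can also be done numerically. For 1 degree of freedom, χ² is the square of a standard normal variable, so the upper-tail probability can be written with the complementary error function as erfc(sqrt(x/2)). The sketch below uses only the standard library; the function name `chi_square_sf_df1` and the significance level 0.05 are my own assumptions for illustration.

```python
import math

def chi_square_sf_df1(x):
    """Upper-tail probability P(X > x) for chi-squared with 1 degree of freedom."""
    if x < 0:
        return 1.0
    # P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x) / sqrt(2)) for standard normal Z
    return math.erfc(math.sqrt(x / 2.0))

# Statistic for the sample with 59 men and 46 women.
statistic = (59 - 55) ** 2 / 55 + (46 - 50) ** 2 / 50
p_value = chi_square_sf_df1(statistic)

alpha = 0.05  # assumed significance level, not fixed by the text
print(f"chi2 = {statistic:.2f}, p = {p_value:.2f}")
print("reject the null hypothesis" if p_value < alpha
      else "do not reject the null hypothesis")
```

This shortcut only covers 1 degree of freedom; for more categories a general χ² cdf from a statistics library would be needed.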
Hmm, the interpretation above does not seem to make complete sense (especially the fact that we use 1 degree of freedom when there are actually two terms involved), but that is what I understand from the Wikipedia articles. I will review and correct this after further study of the subject.
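One way to reconcile the two terms with a single degree of freedom is the following sketch. The notation is my own and not taken from the Wikipedia articles: n is the sample size, p and q = 1 - p are the hypothesized proportions of men and women, and O_m and O_w are the observed counts, with O_m + O_w = n. It assumes people are sampled independently, so that O_m is binomial under the hypothesis.

```latex
\[
\begin{aligned}
O_w - nq &= (n - O_m) - (n - np) = -(O_m - np), \\
\chi^2 &= \frac{(O_m - np)^2}{np} + \frac{(O_w - nq)^2}{nq}
        = (O_m - np)^2\left(\frac{1}{np} + \frac{1}{nq}\right) \\
       &= \frac{(O_m - np)^2}{npq}
        = \left(\frac{O_m - np}{\sqrt{npq}}\right)^2 .
\end{aligned}
\]
```

So the two terms collapse into the square of one standardized count, which is approximately standard normal for a large enough sample. With n = 105 and p = 55/105, npq = 55 × 50 / 105 ≈ 26.19, and 16 / 26.19 ≈ 0.61, the same value as before.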
References:
1. Chi-squared distribution, Wikipedia
2. Pearson's chi-squared test, Wikipedia