Probability Theory: The Gaussian (Normal) Distribution

Source: Internet  Published by: 中国软件资讯网  Edited by: 程序博客网  Time: 2024/05/01 22:57

http://blog.csdn.net/pipisorry/article/details/49516209

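Gaussian distribution (normal distribution)

If a random variable X follows a Gaussian distribution with mean μ and variance σ^2, we write X ~ N(μ, σ^2). Its probability density function is

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)

The mean μ determines where the distribution is centered, and the standard deviation σ determines its spread.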


Figure: probability density function φ(x) of a normal random variable


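Properties of the normal curve

1. For x < μ the curve rises; for x > μ it falls. As the curve extends to infinity on either side, it approaches the x-axis as an asymptote.
2. The normal curve is symmetric about the line x = μ.
3. The larger σ is, the flatter the curve; the smaller σ is, the taller and narrower the curve.
4. The area under the curve and above the x-axis equals 1.
5. The 3σ rule: P(μ-σ < X ≤ μ+σ) ≈ 68.3%, P(μ-2σ < X ≤ μ+2σ) ≈ 95.4%, P(μ-3σ < X ≤ μ+3σ) ≈ 99.7%.
6. If $X \sim N(\mu, \sigma^2)$ and a, b are real numbers, then $aX + b \sim N(a\mu + b, (a\sigma)^2)$ (see expected value and variance).
7. If $X \sim N(\mu_X, \sigma^2_X)$ and $Y \sim N(\mu_Y, \sigma^2_Y)$ are statistically independent normal random variables, then:
  • their sum is also normal: $U = X + Y \sim N(\mu_X + \mu_Y, \sigma^2_X + \sigma^2_Y)$ (proof).
  • their difference is also normal: $V = X - Y \sim N(\mu_X - \mu_Y, \sigma^2_X + \sigma^2_Y)$.
  • U and V are independent of each other when the two variances are equal ($\sigma^2_X = \sigma^2_Y$).
8. If $X \sim N(0, \sigma^2_X)$ and $Y \sim N(0, \sigma^2_Y)$ are independent normal random variables, then:
  • their product XY has the density $p(z) = \frac{1}{\pi\,\sigma_X\,\sigma_Y} \; K_0\left(\frac{|z|}{\sigma_X\,\sigma_Y}\right)$, where $K_0$ is a modified Bessel function of the second kind.
  • their ratio follows a Cauchy distribution: $X / Y \sim \mathrm{Cauchy}(0, \sigma_X / \sigma_Y)$.
9. If $X_1, \cdots, X_n$ are independent standard normal random variables, then $X_1^2 + \cdots + X_n^2$ follows a chi-square distribution with n degrees of freedom.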
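Properties 5 and 6 can be checked numerically. The sketch below is only an illustration: it assumes Python's standard-library `statistics.NormalDist`, and the helper name `prob_within` and the example parameters (μ = 5, σ = 2, a = 3, b = 1) are made up for the demonstration.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """Return P(mu - k*sigma < X <= mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# Property 5: the probabilities do not depend on mu or sigma.
for k in (1, 2, 3):
    print(k, round(prob_within(k, mu=5.0, sigma=2.0), 4))  # about 0.6827, 0.9545, 0.9973

# Property 6: aX + b ~ N(a*mu + b, (a*sigma)^2).
# NormalDist implements this affine arithmetic directly.
x = NormalDist(5.0, 2.0)
y = x * 3 + 1
print(y.mean, y.stdev)
```

The loop prints the same three probabilities for any choice of μ and σ > 0, which is exactly why the rule can be stated in units of σ.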

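    Converting a general normal distribution N(μ,σ^2) to the standard normal distribution N(0, 1)

    Use the substitution Z = (X - μ)/σ: if X ~ N(μ,σ^2), then Z ~ N(0, 1).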
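As a small illustration of standardization, the sketch below computes one probability two ways: directly from N(μ, σ^2), and through the standard normal after standardizing. The helper name `standardize` and the numbers (μ = 100, σ = 15, x = 130) are assumptions chosen for the example.

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Map a value x from N(mu, sigma^2) to the corresponding z-score."""
    return (x - mu) / sigma


mu, sigma = 100.0, 15.0
z = standardize(130.0, mu, sigma)            # (130 - 100) / 15 = 2.0

p_via_z = NormalDist().cdf(z)                # P(Z <= 2) under N(0, 1)
p_direct = NormalDist(mu, sigma).cdf(130.0)  # P(X <= 130) under N(100, 15^2)
print(z, p_via_z, p_direct)
```

Both probabilities agree, which is what makes a single standard normal table sufficient for every choice of μ and σ.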

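    Proof that the normal density is normalized

    The normal (Gaussian) distribution is as essential to statistics as water is to life, and knowing its properties and their proofs is a basic skill. Here we prove that the normal density integrates to 1. This is the first exercise in PRML's treatment of the normal distribution; the proof follows the solution to that exercise, which uses an elegant integration trick.

    The problem is stated as follows. If a random variable X is normally distributed with mean μ and variance σ^2, its density is

    N(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2}(x-\mu)^2 \right\}

    We want to prove that

    \int_{-\infty}^{+\infty} N(x \mid \mu, \sigma^2)\, dx = 1

    The key is to evaluate the integral

    I = \int_{-\infty}^{+\infty} \exp\left( -\frac{1}{2\sigma^2}x^2 \right) dx

    To find I, first compute I^2, which can be written as a double integral:

    I^2 = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \exp\left( -\frac{1}{2\sigma^2}x^2 - \frac{1}{2\sigma^2}y^2 \right) dx\, dy

    Convert the double integral to polar coordinates:

    x = r\cos\theta
    y = r\sin\theta

    I^2 can then be written as

    I^2 = \int_{0}^{2\pi}\int_{0}^{+\infty} \exp\left( -\frac{r^2\cos^2\theta + r^2\sin^2\theta}{2\sigma^2} \right) \left| \frac{\partial(x,y)}{\partial(r,\theta)} \right| dr\, d\theta

    where the Jacobian determinant is

    \frac{\partial(x,y)}{\partial(r,\theta)} = \begin{vmatrix} \frac{\partial x}{\partial r} & \frac{\partial y}{\partial r} \\ \frac{\partial x}{\partial \theta} & \frac{\partial y}{\partial \theta} \end{vmatrix} = \begin{vmatrix} \cos\theta & \sin\theta \\ -r\sin\theta & r\cos\theta \end{vmatrix} = r\cos^2\theta + r\sin^2\theta = r

    Therefore, using \cos^2\theta + \sin^2\theta = 1:

    I^2 = \int_{0}^{2\pi}\int_{0}^{+\infty} \exp\left( -\frac{r^2}{2\sigma^2} \right) r\, dr\, d\theta

    Integrate over θ (which contributes a factor 2π) and substitute u = r^2/(2σ^2), so that r\,dr = σ^2\,du:

    I^2 = 2\pi\sigma^2 \int_{0}^{+\infty} \exp\left( -\frac{r^2}{2\sigma^2} \right) d\left( \frac{r^2}{2\sigma^2} \right)
    I^2 = 2\pi\sigma^2 \left[ -\exp\left( -\frac{r^2}{2\sigma^2} \right) \right]_{0}^{+\infty} = 2\pi\sigma^2

    Hence I = (2\pi\sigma^2)^{1/2}, which proves that the zero-mean normal density is normalized:

    \int_{-\infty}^{+\infty} N(x \mid 0, \sigma^2)\, dx = \frac{1}{(2\pi\sigma^2)^{1/2}} \int_{-\infty}^{+\infty} \exp\left\{ -\frac{x^2}{2\sigma^2} \right\} dx = 1

    For a general mean, change variables in the integral:

    \int_{-\infty}^{+\infty} N(x \mid \mu, \sigma^2)\, dx = \frac{1}{(2\pi\sigma^2)^{1/2}} \int_{-\infty}^{+\infty} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\} d(x-\mu)

    Writing y = x - μ, this proves that N(x \mid \mu, \sigma^2) is normalized:

    \int_{-\infty}^{+\infty} N(x \mid \mu, \sigma^2)\, dx = \frac{1}{(2\pi\sigma^2)^{1/2}} \int_{-\infty}^{+\infty} \exp\left\{ -\frac{y^2}{2\sigma^2} \right\} dy = 1
    [Proof that the normal distribution is normalized]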
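The result can also be sanity-checked numerically. The following is a rough sketch, not part of the proof: it integrates the density with a composite trapezoidal rule over μ ± 10σ, assuming the tails beyond that range are negligible. The function names and the example values μ = 1.5, σ^2 = 4 are illustrative.

```python
import math


def normal_pdf(x, mu, sigma2):
    """Density N(x | mu, sigma^2), written exactly as in the proof."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def integrate(f, lo, hi, n=20000):
    """Composite trapezoidal rule on [lo, hi] with n equal subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


mu, sigma2 = 1.5, 4.0
sigma = math.sqrt(sigma2)

# Total area under the density: should be very close to 1.
area = integrate(lambda x: normal_pdf(x, mu, sigma2), mu - 10 * sigma, mu + 10 * sigma)

# The integral I from the proof: I^2 should be very close to 2*pi*sigma^2.
gauss_integral = integrate(lambda x: math.exp(-x * x / (2.0 * sigma2)), -10 * sigma, 10 * sigma)

print(area, gauss_integral ** 2, 2.0 * math.pi * sigma2)
```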

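    Cumulative distribution function Φ(x)

    The cumulative distribution function gives the probability that the random variable X is less than or equal to x. In terms of the density it is

    F(x;\mu,\sigma)=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^x \exp \left( -\frac{(t - \mu)^2}{2\sigma^2} \right)\, dt.

    The cumulative distribution function of a normal distribution can be expressed through a special function called the error function:

    F(x;\mu,\sigma)=\frac12 \left[1 + \mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt2}\right)\right] .

    The cumulative distribution function of the standard normal distribution is conventionally written Φ; it is simply the case μ = 0, σ = 1:

    \Phi(x)=F(x;0,1)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^x\exp\left(-\frac{t^2}{2}\right)\, dt.

    Specializing the error-function formula to the standard normal distribution gives:

    \Phi(z)=\frac{1}{2} \left[ 1 + \operatorname{erf} \left( \frac{z}{\sqrt{2}} \right) \right].

    Its inverse, the quantile function, can be written in terms of the inverse error function:

    \Phi^{-1}(p)=\sqrt2\;\operatorname{erf}^{-1} \left(2p - 1 \right).

    This quantile function is sometimes called the probit function. The probit function has no elementary antiderivative.

    Φ(x) has no closed-form expression in terms of elementary functions; its values can be approximated by numerical integration, Taylor series, or asymptotic series.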
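The error-function formula translates almost directly into code. This is a minimal sketch using only the Python standard library (`math.erf` and `statistics.NormalDist.inv_cdf`); the function names `normal_cdf` and `probit` are chosen here for illustration.

```python
import math
from statistics import NormalDist


def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x; mu, sigma) = 1/2 * [1 + erf((x - mu) / (sigma * sqrt(2)))]."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def probit(p):
    """Quantile function of the standard normal; requires 0 < p < 1."""
    return NormalDist().inv_cdf(p)


print(normal_cdf(0.0))           # 0.5 by symmetry
print(normal_cdf(1.0))           # about 0.8413
print(probit(normal_cdf(1.3)))   # recovers a value close to 1.3
```

The standard library offers no inverse error function, so the sketch delegates the quantile computation to `NormalDist.inv_cdf` instead of implementing erf^{-1} itself.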

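    Figure: cumulative distribution functions corresponding to the probability density functions



    [Normal distribution - Wikipedia]

    [Normal distribution - repost of the Wikipedia article]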

    皮皮blog


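    Univariate Gaussian distribution



    Multivariate Gaussian distribution

    f(x)=\frac{1}{\sqrt{(2\pi)^k \det\Sigma}} \exp\left( -\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu) \right)

    where μ is the mean, Σ is the covariance matrix, and k is the dimension of the space in which x takes values.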
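To make the formula concrete, here is a small sketch for the special case k = 2, where the 2x2 covariance matrix can be inverted by hand. The function name `mvn_pdf_2d` and the tuple-based matrix layout are assumptions made for this example; a general implementation would use a linear-algebra library.

```python
import math


def mvn_pdf_2d(x, mu, cov):
    """Density of a 2-dimensional Gaussian at point x.

    x and mu are pairs; cov is ((a, b), (c, d)) and must be symmetric
    positive definite.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    # Explicit inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu).
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    k = 2
    return math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** k * det)


# With identity covariance the peak value is 1 / (2*pi).
print(mvn_pdf_2d((0.0, 0.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))))
```

With a diagonal covariance matrix the density factorizes into the product of two univariate normal densities, which gives an easy cross-check.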

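    Truncated normal distribution

    (Some sources conflate it with the log-normal distribution; the two are different distributions.)

    Example: a normal distribution N(1, 1) restricted to the interval [0, 4] has density

    \pi(x) \propto N(x \mid 1, 1)\, I(0 \le x \le 4)

    where I(·) is the indicator function.

    Probability density function (pdf) of the truncated Gaussian distribution

    f(x; \mu, \sigma, a, b) = \frac{1}{\sigma}\, \frac{\varphi\left(\frac{x-\mu}{\sigma}\right)}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)}, \quad a \le x \le b

    Here φ and Φ are the pdf and cdf of the standard normal distribution. The pdf is divided by the probability mass of the truncation interval, \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right), and by σ, so that it integrates to 1.

    By convention, if b = \infty then \Phi\left(\tfrac{b-\mu}{\sigma}\right) = 1, and similarly, if a = -\infty then \Phi\left(\tfrac{a-\mu}{\sigma}\right) = 0.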
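The truncated density is easy to evaluate with the standard library. The sketch below assumes `statistics.NormalDist` for φ and Φ; the function name `truncnorm_pdf` is illustrative and is not the API of any particular library.

```python
from statistics import NormalDist


def truncnorm_pdf(x, mu, sigma, a, b):
    """pdf of N(mu, sigma^2) truncated to the interval [a, b]."""
    if not a <= x <= b:
        return 0.0
    std = NormalDist()  # standard normal: provides phi (pdf) and Phi (cdf)
    mass = std.cdf((b - mu) / sigma) - std.cdf((a - mu) / sigma)
    return std.pdf((x - mu) / sigma) / (sigma * mass)


# The example from the text: N(1, 1) truncated to [0, 4].
print(truncnorm_pdf(1.0, 1.0, 1.0, 0.0, 4.0))   # larger than the untruncated peak
print(truncnorm_pdf(5.0, 1.0, 1.0, 0.0, 4.0))   # 0.0 outside the interval
```

Inside the interval the curve is the normal curve scaled up by a constant factor, so relative probabilities between points in [a, b] are unchanged.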

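    Generating truncated normal data

    Generate normally distributed data with the same mu and sigma and keep only the values that fall between a and b. (This works because the truncated normal distribution only has mass inside the truncation interval, and the relative probabilities inside it are unchanged.)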
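The procedure just described is a form of rejection sampling. A minimal sketch, assuming the standard-library `random` module; the function name `sample_truncnorm` and the seed are illustrative choices.

```python
import random


def sample_truncnorm(mu, sigma, a, b, n, rng=None):
    """Draw n values from N(mu, sigma^2) truncated to [a, b] by rejection."""
    rng = rng or random.Random()
    samples = []
    while len(samples) < n:
        x = rng.gauss(mu, sigma)   # candidate from the untruncated normal
        if a <= x <= b:            # keep it only if it lies in the interval
            samples.append(x)
    return samples


samples = sample_truncnorm(1.0, 1.0, 0.0, 4.0, 10000, random.Random(0))
print(min(samples), max(samples), sum(samples) / len(samples))
```

This is simple but wasteful when [a, b] covers little probability mass (for example an interval far out in a tail), because most candidates are then rejected.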

    Analyzing the truncated normal distribution in Python

    [Truncated normal distribution (in Chinese)]

    [Truncated normal distribution]

    [Truncated normal distribution (in Chinese)]

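    Log-Gaussian (log-normal) distribution

    Conjugate distributions of the Gaussian distribution

    [Topic Model: the mathematics in LDA: conjugate distributions of the Gaussian distribution]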




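    Working with the normal distribution in Python

    Generating normally distributed random variables

    Computer simulations often need normally distributed values. The most basic method applies the inverse of the standard normal cumulative distribution function to a uniform random number. More efficient methods exist: the Box-Muller transform is one of them, and the ziggurat algorithm is faster still. The Box-Muller transform is described below. A simple method that is easy to program is to sum 12 values drawn uniformly from (0,1) and subtract 6 (half of 12); this is adequate for many applications. The sum of the 12 values follows the Irwin-Hall distribution, and 12 is chosen because it makes the variance of the sum exactly 1. The result is confined to (-6,6), and its density is a piecewise polynomial of degree 11 in 12 segments that approximates the normal density.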
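Two of the methods mentioned above, the inverse-CDF method and the sum of 12 uniforms, can be sketched with the standard library alone. The function names and the seed below are illustrative assumptions, and `NormalDist.inv_cdf` stands in for the inverse of Φ.

```python
import random
from statistics import NormalDist, fmean, pstdev

_STD_NORMAL = NormalDist()


def normal_by_inverse_cdf(rng):
    """Inverse-CDF method: apply the quantile function to a uniform draw."""
    u = rng.random()
    while u == 0.0:          # inv_cdf requires 0 < p < 1
        u = rng.random()
    return _STD_NORMAL.inv_cdf(u)


def normal_by_twelve_uniforms(rng):
    """Sum of 12 uniforms on (0, 1) minus 6: mean 0, variance 12 * (1/12) = 1."""
    return sum(rng.random() for _ in range(12)) - 6.0


rng = random.Random(42)
a = [normal_by_inverse_cdf(rng) for _ in range(20000)]
b = [normal_by_twelve_uniforms(rng) for _ in range(20000)]
print(fmean(a), pstdev(a))
print(fmean(b), pstdev(b))
```

Both samples should show a mean near 0 and a standard deviation near 1; the second method can never produce a value outside (-6, 6), so it underrepresents the extreme tails.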

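    The Box-Muller method takes two independent random numbers U and V, each uniformly distributed on (0,1], and produces two independent standard normal random variables X and Y:

     X = \sqrt{- 2 \ln U} \, \cos(2 \pi V) ,
     Y = \sqrt{- 2 \ln U} \, \sin(2 \pi V)

    The construction works because a chi-square variable with two degrees of freedom (see property 9 with n = 2) is easy to generate from an exponential random variable (the -2 ln U in the equations). The variable V selects an angle uniformly around the circle, the exponential variable supplies the squared radius, and the pair is then converted to (normally distributed) x, y coordinates.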
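The two equations can be sketched in a few lines of Python. This is an illustrative version using the standard-library `random` module; the function name `box_muller` and the seed are assumptions, and `1.0 - rng.random()` is used to obtain a uniform value on (0, 1] so that the logarithm stays finite.

```python
import math
import random


def box_muller(rng):
    """Return a pair (X, Y) of independent standard normal values."""
    u = 1.0 - rng.random()             # uniform on (0, 1], so log(u) is finite
    v = rng.random()                   # uniform on [0, 1), selects the angle
    r = math.sqrt(-2.0 * math.log(u))  # radius; r^2 is chi-square with 2 dof
    return r * math.cos(2.0 * math.pi * v), r * math.sin(2.0 * math.pi * v)


rng = random.Random(1)
pairs = [box_muller(rng) for _ in range(20000)]
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]

mean_x = sum(xs) / len(xs)
mean_r2 = sum(x * x + y * y for x, y in pairs) / len(pairs)  # expected value 2
mean_xy = sum(x * y for x, y in pairs) / len(pairs)          # expected value 0
print(mean_x, mean_r2, mean_xy)
```

The average of X^2 + Y^2 should come out close to 2, the mean of a chi-square distribution with two degrees of freedom, which matches the explanation above.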

    [Statistics library scipy.stats: continuous distributions - Norm (Gaussian distribution)]

    C++ code for generating Gaussian-distributed data with the Box-Muller transform

    #include <cmath>
    #include <cstdlib>
    #include <iostream>
    #include <limits>
    using namespace std;

    // Returns one sample from N(mu, sigma^2) using the Box-Muller transform.
    // Each transform yields two values; the second is cached for the next call.
    double generateGaussianNoise(double mu, double sigma){
        const double epsilon = std::numeric_limits<double>::min();
        const double two_pi = 2.0*3.14159265358979323846;

        static double z0, z1;
        static bool generate = false;
        generate = !generate;

        if (!generate)
            return z1 * sigma + mu;  // reuse the cached second value

        double u1, u2;
        do    {
            u1 = rand() * (1.0 / RAND_MAX);
            u2 = rand() * (1.0 / RAND_MAX);
        } while (u1 <= epsilon);  // reject u1 == 0 so that log(u1) is finite

        z0 = sqrt(-2.0 * log(u1)) * cos(two_pi * u2);
        z1 = sqrt(-2.0 * log(u1)) * sin(two_pi * u2);
        return z0 * sigma + mu;
    }

    // Draws one sample from N(mean(y), sigma_2 / T), where T is the length of y.
    double getGaussian(double sigma_2, double* y, int T){
        double mu = 0;
        for (int i = 0; i < T; i++)
            mu += y[i];
        mu = mu / T;

        return generateGaussianNoise(mu, sqrt(sigma_2 / T));
    }

    int main(){
        const int N = 10;
        double y[N] = { 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0 };
        for (int i = 0; i < N; i++)
            cout << getGaussian(1.0, y, N) << endl;
        return 0;
    }

    [Box-Muller transform. https://en.wikipedia.org/wiki/Box-Muller_transform]

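    Plotting the pdf of a one-dimensional normal distribution

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    # Plot the standard normal pdf between its 1% and 99% quantiles.
    x = np.linspace(stats.norm.ppf(0.01), stats.norm.ppf(0.99), 100)
    plt.plot(x, stats.norm.pdf(x), 'r-', alpha=0.6, label='norm pdf')
    plt.legend()
    plt.show()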

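    Plotting a 2-dimensional Gaussian distribution

    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib
    from scipy import stats

    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    rv = stats.multivariate_normal([0, 0], cov=1)  # scalar cov means identity covariance
    x, y = np.mgrid[-3:3:.15, -3:3:.15]
    ax.plot_surface(x, y, rv.pdf(np.dstack((x, y))), rstride=1, cstride=1)
    ax.set_zlim(0, 0.2)
    # plt.savefig('../figures/plot3d_ex.png', dpi=48)
    plt.show()

    [3D plotting with the matplotlib.mplot3d toolkit]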

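    Fixing one dimension of a multivariate Gaussian and plotting the resulting one-dimensional curve

    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib
    from scipy import stats

    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    rv = stats.multivariate_normal([0, 0], cov=1)
    # x is fixed at 0 everywhere, so the surface collapses onto the plane x = 0.
    x, y = np.meshgrid(np.linspace(0, 0, 400), np.linspace(-3, 3, 400))
    ax.plot_surface(x, y, rv.pdf(np.dstack((x, y))), rstride=1, cstride=1)
    ax.set_zlim(0, 0.2)
    # plt.savefig('../figures/plot3d_ex.png', dpi=48)
    plt.show()

    Alternatively, build the grid as before and then zero out x:

    x, y = np.mgrid[-3:3:.15, -3:3:.15]
    x = np.zeros_like(x)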




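    Standard normal distribution table

    The table gives values of the cumulative distribution function Φ(x) of the standard normal distribution.


    Lookup: the row label gives x to one decimal place and the column label adds the second decimal place, so Φ(1.23) is read at row 1.2, column 0.03. In column 0, for example: cdf(x=0) = 0.5 [column 0, row 0], cdf(x=1) = 0.8413 [column 0, row 1], cdf(x=3) = 0.9987 [column 0, row 3].

    Example: to compute PHI(3)-PHI(-1), use the symmetry PHI(-x) = 1 - PHI(x): PHI(3)-PHI(-1) = PHI(3)-(1 - PHI(1)) = PHI(3) + PHI(1) - 1 = 0.9987 + 0.8413 - 1 = 0.84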
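Table lookups like these can be reproduced in code. The sketch below assumes the standard-library `statistics.NormalDist`; the helper name `table_entry` is hypothetical and simply mimics the row/column convention of a printed table.

```python
from statistics import NormalDist

phi = NormalDist().cdf   # standard normal CDF


def table_entry(row, col):
    """Value at the given row (x to one decimal) and column (second decimal)."""
    return round(phi(row + col), 4)


print(table_entry(0.0, 0.00))   # 0.5
print(table_entry(1.0, 0.00))   # 0.8413
print(table_entry(1.2, 0.03))   # 0.8907, i.e. PHI(1.23)

# The worked example: PHI(3) - PHI(-1) = PHI(3) + PHI(1) - 1
print(round(phi(3.0) - phi(-1.0), 4))   # 0.84
```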

    Standard normal distribution table (form 1: Φ(x) = P(X ≤ x), values approach 1)

    x     0.00      0.01      0.02      0.03      0.04      0.05      0.06      0.07      0.08      0.09
    0.0   0.5000    0.5040    0.5080    0.5120    0.5160    0.5199    0.5239    0.5279    0.5319    0.5359
    0.1   0.5398    0.5438    0.5478    0.5517    0.5557    0.5596    0.5636    0.5675    0.5714    0.5753
    0.2   0.5793    0.5832    0.5871    0.5910    0.5948    0.5987    0.6026    0.6064    0.6103    0.6141
    0.3   0.6179    0.6217    0.6255    0.6293    0.6331    0.6368    0.6406    0.6443    0.6480    0.6517
    0.4   0.6554    0.6591    0.6628    0.6664    0.6700    0.6736    0.6772    0.6808    0.6844    0.6879
    0.5   0.6915    0.6950    0.6985    0.7019    0.7054    0.7088    0.7123    0.7157    0.7190    0.7224
    0.6   0.7257    0.7291    0.7324    0.7357    0.7389    0.7422    0.7454    0.7486    0.7517    0.7549
    0.7   0.7580    0.7611    0.7642    0.7673    0.7703    0.7734    0.7764    0.7794    0.7823    0.7852
    0.8   0.7881    0.7910    0.7939    0.7967    0.7995    0.8023    0.8051    0.8078    0.8106    0.8133
    0.9   0.8159    0.8186    0.8212    0.8238    0.8264    0.8289    0.8315    0.8340    0.8365    0.8389
    1.0   0.8413    0.8438    0.8461    0.8485    0.8508    0.8531    0.8554    0.8577    0.8599    0.8621
    1.1   0.8643    0.8665    0.8686    0.8708    0.8729    0.8749    0.8770    0.8790    0.8810    0.8830
    1.2   0.8849    0.8869    0.8888    0.8907    0.8925    0.8944    0.8962    0.8980    0.8997    0.9015
    1.3   0.9032    0.9049    0.9066    0.9082    0.9099    0.9115    0.9131    0.9147    0.9162    0.9177
    1.4   0.9192    0.9207    0.9222    0.9236    0.9251    0.9265    0.9278    0.9292    0.9306    0.9319
    1.5   0.9332    0.9345    0.9357    0.9370    0.9382    0.9394    0.9406    0.9418    0.9430    0.9441
    1.6   0.9452    0.9463    0.9474    0.9484    0.9495    0.9505    0.9515    0.9525    0.9535    0.9545
    1.7   0.9554    0.9564    0.9573    0.9582    0.9591    0.9599    0.9608    0.9616    0.9625    0.9633
    1.8   0.9641    0.9648    0.9656    0.9664    0.9671    0.9678    0.9686    0.9693    0.9700    0.9706
    1.9   0.9713    0.9719    0.9726    0.9732    0.9738    0.9744    0.9750    0.9756    0.9762    0.9767
    2.0   0.9772    0.9778    0.9783    0.9788    0.9793    0.9798    0.9803    0.9808    0.9812    0.9817
    2.1   0.9821    0.9826    0.9830    0.9834    0.9838    0.9842    0.9846    0.9850    0.9854    0.9857
    2.2   0.9861    0.9864    0.9868    0.9871    0.9874    0.9878    0.9881    0.9884    0.9887    0.9890
    2.3   0.9893    0.9896    0.9898    0.9901    0.9904    0.9906    0.9909    0.9911    0.9913    0.9916
    2.4   0.9918    0.9920    0.9922    0.9925    0.9927    0.9929    0.9931    0.9932    0.9934    0.9936
    2.5   0.9938    0.9940    0.9941    0.9943    0.9945    0.9946    0.9948    0.9949    0.9951    0.9952
    2.6   0.9953    0.9955    0.9956    0.9957    0.9959    0.9960    0.9961    0.9962    0.9963    0.9964
    2.7   0.9965    0.9966    0.9967    0.9968    0.9969    0.9970    0.9971    0.9972    0.9973    0.9974
    2.8   0.9974    0.9975    0.9976    0.9977    0.9977    0.9978    0.9979    0.9979    0.9980    0.9981
    2.9   0.9981    0.9982    0.9982    0.9983    0.9984    0.9984    0.9985    0.9985    0.9986    0.9986
    3.0*  0.9987    0.9990    0.9993    0.9995    0.9997    0.9998    0.9998    0.9999    0.9999    1.0000
    3.1   0.999032  0.999065  0.999096  0.999126  0.999155  0.999184  0.999211  0.999238  0.999264  0.999289
    3.2   0.999313  0.999336  0.999359  0.999381  0.999402  0.999423  0.999443  0.999462  0.999481  0.999499
    3.3   0.999517  0.999534  0.999550  0.999566  0.999581  0.999596  0.999610  0.999624  0.999638  0.999660
    3.4   0.999663  0.999675  0.999687  0.999698  0.999709  0.999720  0.999730  0.999740  0.999749  0.999760
    3.5   0.999767  0.999776  0.999784  0.999792  0.999800  0.999807  0.999815  0.999822  0.999828  0.999835
    3.6   0.999841  0.999847  0.999853  0.999858  0.999864  0.999869  0.999874  0.999879  0.999883  0.999888
    3.7   0.999892  0.999896  0.999900  0.999904  0.999908  0.999912  0.999915  0.999918  0.999922  0.999926
    3.8   0.999928  0.999931  0.999933  0.999936  0.999938  0.999941  0.999943  0.999946  0.999948  0.999950
    3.9   0.999952  0.999954  0.999956  0.999958  0.999959  0.999961  0.999963  0.999964  0.999966  0.999967
    4.0   0.999968  0.999970  0.999971  0.999972  0.999973  0.999974  0.999975  0.999976  0.999977  0.999978
    4.1   0.999979  0.999980  0.999981  0.999982  0.999983  0.999983  0.999984  0.999985  0.999985  0.999986
    4.2   0.999987  0.999987  0.999988  0.999988  0.999989  0.999989  0.999990  0.999990  0.999991  0.999991
    4.3   0.999991  0.999992  0.999992  0.999993  0.999993  0.999993  0.999993  0.999994  0.999994  0.999994
    4.4   0.999995  0.999995  0.999995  0.999995  0.999996  0.999996  0.999996  0.999996  0.999996  0.999996
    4.5   0.999997  0.999997  0.999997  0.999997  0.999997  0.999997  0.999997  0.999998  0.999998  0.999998
    4.6   0.999998  0.999998  0.999998  0.999998  0.999998  0.999998  0.999998  0.999998  0.999999  0.999999
    4.7   0.999999  0.999999  0.999999  0.999999  0.999999  0.999999  0.999999  0.999999  0.999999  0.999999
    4.8   0.999999  0.999999  0.999999  0.999999  0.999999  0.999999  0.999999  0.999999  0.999999  0.999999
    4.9   1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000

    * The row for x = 3.0 follows the layout of many printed tables and steps by 0.1 instead of 0.01: its ten entries are Φ(3.0), Φ(3.1), ..., Φ(3.9).



    Standard normal distribution table (form 2: 2Φ(x) - 1 = P(|X| ≤ x), values start at 0)

    x     0.00      0.01      0.02      0.03      0.04      0.05      0.06      0.07      0.08      0.09
    0.0   0.0000    0.0080    0.0160    0.0240    0.0320    0.0398    0.0478    0.0558    0.0638    0.0718
    0.1   0.0796    0.0876    0.0956    0.1034    0.1114    0.1192    0.1272    0.1350    0.1428    0.1506
    0.2   0.1586    0.1664    0.1742    0.1820    0.1896    0.1974    0.2052    0.2128    0.2206    0.2282
    0.3   0.2358    0.2434    0.2510    0.2586    0.2662    0.2736    0.2812    0.2886    0.2960    0.3034
    0.4   0.3108    0.3182    0.3256    0.3328    0.3400    0.3472    0.3544    0.3616    0.3688    0.3758
    0.5   0.3830    0.3900    0.3970    0.4038    0.4108    0.4176    0.4246    0.4314    0.4380    0.4448
    0.6   0.4514    0.4582    0.4648    0.4714    0.4778    0.4844    0.4908    0.4972    0.5034    0.5098
    0.7   0.5160    0.5222    0.5284    0.5346    0.5406    0.5468    0.5528    0.5588    0.5646    0.5704
    0.8   0.5762    0.5820    0.5878    0.5934    0.5990    0.6046    0.6102    0.6156    0.6212    0.6266
    0.9   0.6318    0.6372    0.6424    0.6476    0.6528    0.6578    0.6630    0.6680    0.6730    0.6778
    1.0   0.6826    0.6876    0.6922    0.6970    0.7016    0.7062    0.7108    0.7154    0.7198    0.7242
    1.1   0.7286    0.7330    0.7372    0.7416    0.7458    0.7498    0.7540    0.7580    0.7620    0.7660
    1.2   0.7698    0.7738    0.7776    0.7814    0.7850    0.7888    0.7924    0.7960    0.7994    0.8030
    1.3   0.8064    0.8098    0.8132    0.8164    0.8198    0.8230    0.8262    0.8294    0.8324    0.8354
    1.4   0.8384    0.8414    0.8444    0.8472    0.8502    0.8530    0.8556    0.8584    0.8612    0.8638
    1.5   0.8664    0.8690    0.8714    0.8740    0.8764    0.8788    0.8812    0.8836    0.8860    0.8882
    1.6   0.8904    0.8926    0.8948    0.8968    0.8990    0.9010    0.9030    0.9050    0.9070    0.9090
    1.7   0.9108    0.9128    0.9146    0.9164    0.9182    0.9198    0.9216    0.9232    0.9250    0.9266
    1.8   0.9282    0.9296    0.9312    0.9328    0.9342    0.9356    0.9372    0.9386    0.9400    0.9412
    1.9   0.9426    0.9438    0.9452    0.9464    0.9476    0.9488    0.9500    0.9512    0.9524    0.9534
    2.0   0.9544    0.9556    0.9566    0.9576    0.9586    0.9596    0.9606    0.9616    0.9624    0.9634
    2.1   0.9642    0.9652    0.9660    0.9668    0.9676    0.9684    0.9692    0.9700    0.9708    0.9714
    2.2   0.9722    0.9728    0.9736    0.9742    0.9748    0.9756    0.9762    0.9768    0.9774    0.9780
    2.3   0.9786    0.9792    0.9796    0.9802    0.9808    0.9812    0.9818    0.9822    0.9826    0.9832
    2.4   0.9836    0.9840    0.9844    0.9850    0.9854    0.9858    0.9862    0.9864    0.9868    0.9872
    2.5   0.9876    0.9880    0.9882    0.9886    0.9890    0.9892    0.9896    0.9898    0.9902    0.9904
    2.6   0.9906    0.9910    0.9912    0.9914    0.9918    0.9920    0.9922    0.9924    0.9926    0.9928
    2.7   0.9930    0.9932    0.9934    0.9936    0.9938    0.9940    0.9942    0.9944    0.9946    0.9948
    2.8   0.9948    0.9950    0.9952    0.9954    0.9954    0.9956    0.9958    0.9958    0.9960    0.9962
    2.9   0.9962    0.9964    0.9964    0.9966    0.9968    0.9968    0.9970    0.9970    0.9972    0.9972
    3.0*  0.9974    0.9980    0.9986    0.9990    0.9994    0.9996    0.9996    0.9998    0.9998    1.0000
    3.1   0.998064  0.998130  0.998192  0.998252  0.998310  0.998368  0.998422  0.998476  0.998528  0.998578
    3.2   0.998626  0.998672  0.998718  0.998762  0.998804  0.998846  0.998886  0.998924  0.998962  0.998998
    3.3   0.999034  0.999068  0.999100  0.999132  0.999162  0.999192  0.999220  0.999248  0.999276  0.999320
    3.4   0.999326  0.999350  0.999374  0.999396  0.999418  0.999440  0.999460  0.999480  0.999498  0.999520
    3.5   0.999534  0.999552  0.999568  0.999584  0.999600  0.999614  0.999630  0.999644  0.999656  0.999670
    3.6   0.999682  0.999694  0.999706  0.999716  0.999728  0.999738  0.999748  0.999758  0.999766  0.999776
    3.7   0.999784  0.999792  0.999800  0.999808  0.999816  0.999824  0.999830  0.999836  0.999844  0.999852
    3.8   0.999856  0.999862  0.999866  0.999872  0.999876  0.999882  0.999886  0.999892  0.999896  0.999900
    3.9   0.999904  0.999908  0.999912  0.999916  0.999918  0.999922  0.999926  0.999928  0.999932  0.999934
    4.0   0.999936  0.999940  0.999942  0.999944  0.999946  0.999948  0.999950  0.999952  0.999954  0.999956
    4.1   0.999958  0.999960  0.999962  0.999964  0.999966  0.999966  0.999968  0.999970  0.999970  0.999972
    4.2   0.999974  0.999974  0.999976  0.999976  0.999978  0.999978  0.999980  0.999980  0.999982  0.999982
    4.3   0.999982  0.999984  0.999984  0.999986  0.999986  0.999986  0.999986  0.999988  0.999988  0.999988
    4.4   0.999990  0.999990  0.999990  0.999990  0.999992  0.999992  0.999992  0.999992  0.999992  0.999992
    4.5   0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  0.999994  0.999996  0.999996  0.999996
    4.6   0.999996  0.999996  0.999996  0.999996  0.999996  0.999996  0.999996  0.999996  0.999998  0.999998
    4.7   0.999998  0.999998  0.999998  0.999998  0.999998  0.999998  0.999998  0.999998  0.999998  0.999998
    4.8   0.999998  0.999998  0.999998  0.999998  0.999998  0.999998  0.999998  0.999998  0.999998  0.999998
    4.9   1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000

    * The row for x = 3.0 follows the layout of many printed tables and steps by 0.1 instead of 0.01: its ten entries are the values of 2Φ(x) - 1 for x = 3.0, 3.1, ..., 3.9.

    from: http://blog.csdn.net/pipisorry/article/details/49516209

    ref: The origin of the normal distribution

    Qs – Deep Gaussian Processes

