Python Machine Learning (02)

Source: Internet | Editor: 程序博客网 | Time: 2024/05/17 04:09

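PS: there is no part 01.

Python machine learning

Course modules

Bayesian analysis
From decision trees to random forests, gcForest
Convex optimization
Semi-supervised learning
Feature engineering

Reference book: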

https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

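Extension package used: PyMC

PyMC is fairly hard to install on its own; deploying it through Anaconda is simpler. The reference book ships as ipynb files, so IPython and Jupyter are also required (both are installed by default with Anaconda).
Open the ipynb files in Jupyter to browse the whole book.
Edit the code directly in the code cells,
then press Shift+Enter to run it.

Download the code and data after class.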

Bayes' formula
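
For reference, the standard form of Bayes' formula is given below. The notation is an assumption made here: $A$ and $B$ are events, $\theta$ is the unknown parameter, $x$ the observed sample, $\pi(\theta)$ the prior density and $p(x \mid \theta)$ the likelihood.

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
\qquad
\pi(\theta \mid x) = \frac{p(x \mid \theta)\,\pi(\theta)}{\int p(x \mid \theta')\,\pi(\theta')\,d\theta'}
```

The second form is the one used for parameter inference: the posterior is proportional to the likelihood times the prior, and the integral in the denominator only normalizes it.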


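Information handled by mathematical statistics

The task of mathematical statistics is to infer the population from a sample.
Sampling information = population information + sample information
The theory and methods of statistical inference based on sampling information are called classical statistics.
Prior information: information about the unknown parameters of the inference problem that is available before sampling, usually drawn from experience or historical records.
The theory and methods of statistical inference based on population information + sample information + prior information are called Bayesian statistics.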
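
The difference between the two approaches can be sketched with a coin-tossing estimate. The data (7 heads in 10 tosses) and the Beta(2, 2) prior are hypothetical values chosen only for illustration; the classical estimate uses the sample alone, while the Bayesian one also folds in the prior.

```python
# Hypothetical data: 7 heads observed in 10 tosses
heads, n = 7, 10

# Hypothetical prior belief about p, expressed as a Beta(a, b) distribution
a, b = 2.0, 2.0

# Classical estimate: sample information only (maximum likelihood)
p_classical = heads / n

# Bayesian estimate: posterior mean of Beta(a + heads, b + n - heads)
p_bayes = (a + heads) / (a + b + n)

print(p_classical, p_bayes)
```

With little data the prior pulls the Bayesian estimate toward 0.5; as the number of tosses grows, the two estimates converge.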

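Posterior distribution

In general, the posterior density has no closed-form expression and is usually computed with MCMC.
Only a few special posterior distributions can be written down analytically.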
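
As a rough illustration of both cases, the sketch below samples a coin-bias posterior with a hand-written random-walk Metropolis sampler (one simple MCMC algorithm, not the sampler PyMC uses) and compares it with the closed-form Beta posterior that exists in this special conjugate case. The data, the uniform prior, the step size and the chain length are all assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
heads, n = 7, 10  # hypothetical data: 7 heads in 10 tosses


def log_post(p):
    """Unnormalized log posterior: uniform prior on (0, 1) times binomial likelihood."""
    if p <= 0.0 or p >= 1.0:
        return -np.inf
    return heads * np.log(p) + (n - heads) * np.log(1.0 - p)


# Random-walk Metropolis: propose a nearby value, accept with prob min(1, ratio)
p = 0.5
chain = []
for _ in range(20000):
    proposal = p + rng.normal(0.0, 0.2)
    if np.log(rng.uniform()) < log_post(proposal) - log_post(p):
        p = proposal
    chain.append(p)
samples = np.array(chain[2000:])  # discard burn-in

# Closed form available here: uniform prior + binomial data gives a Beta posterior
exact = stats.beta(1 + heads, 1 + n - heads)

mcmc_mean = samples.mean()
exact_mean = exact.mean()
print(mcmc_mean, exact_mean)
```

The two means should agree to within Monte Carlo error. For models without a conjugate prior only the sampling route remains.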

Bayesian statistical inference

Bayesian point estimation
Bayesian interval estimation
Bayesian hypothesis testing
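
Point and interval estimates are both read off the posterior distribution. A minimal sketch, assuming hypothetical coin data (7 heads in 10 tosses) and a uniform prior, so that the posterior is Beta(8, 4):

```python
from scipy import stats

heads, n = 7, 10  # hypothetical data
posterior = stats.beta(1 + heads, 1 + n - heads)

# Point estimates: posterior mean, and posterior mode (a-1)/(a+b-2) for Beta(a, b)
post_mean = posterior.mean()
post_mode = heads / n

# Interval estimate: central 95% credible interval from the posterior quantiles
lower, upper = posterior.ppf([0.025, 0.975])

print(post_mean, post_mode, (lower, upper))
```

Unlike a confidence interval, the credible interval is a direct probability statement: given the data and the prior, the parameter lies in it with probability 0.95.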

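Bayesian hypothesis testing

Compared with the hypothesis tests of classical statistics, the Bayesian approach is more direct and considerably simpler:
No test statistic has to be designed (which takes a good deal of mathematical skill)
No sampling distribution of a statistic has to be derived
No significance level or rejection region has to be specified
It extends easily to multiple hypotheses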
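
One way to read this: compute the posterior probability of each hypothesis and prefer the one with the larger value. The sketch below assumes hypothetical coin data (7 heads in 10 tosses), a uniform prior, and the hypotheses H0: p <= 0.5 versus H1: p > 0.5.

```python
from scipy import stats

heads, n = 7, 10  # hypothetical data
posterior = stats.beta(1 + heads, 1 + n - heads)

# Posterior probability of each hypothesis, read directly from the posterior CDF
p_h0 = posterior.cdf(0.5)        # P(p <= 0.5 | data)
p_h1 = 1.0 - posterior.cdf(0.5)  # P(p >  0.5 | data)

# Decision rule: accept whichever hypothesis is more probable a posteriori
accepted = "H1" if p_h1 > p_h0 else "H0"
print(p_h0, p_h1, accepted)
```

With more than two hypotheses the same rule applies: partition the parameter space, compute the posterior probability of each piece, and pick the largest.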

PyMC

https://github.com/pymc-devs/pymc
PyMC is a python module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo. Its flexibility and extensibility make it applicable to a large suite of problems.
Along with core sampling functionality, PyMC includes methods for summarizing output, plotting, goodness-of-fit and convergence diagnostics.
Bayesian Methods for Hackers (hereafter "Cameron's book") uses PyMC as its main tool for software experiments and works through many examples of Bayesian analysis.

About Jupyter:
Start Jupyter (Windows): jupyter notebook
Run code: select the cell and press Shift+Enter
Edit: change the code directly in the page.

Bayesian Methods for Hackers:
The code is in the file Ch1_Introduction_PyMC2.ipynb
P5:
Coin-tossing example:

# The code below can be passed over, as it is currently not important, plus it
# uses advanced topics we have not covered yet. LOOK AT PICTURE, MICHAEL!
%matplotlib inline
from IPython.core.pylabtools import figsize
import numpy as np
from matplotlib import pyplot as plt
figsize(11, 9)

import scipy.stats as stats

dist = stats.beta
n_trials = [0, 1, 2, 3, 4, 5, 8, 15, 50, 500]
data = stats.bernoulli.rvs(0.5, size=n_trials[-1])   # Bernoulli distribution
x = np.linspace(0, 1, 100)

# For the already prepared, I'm using Binomial's conj. prior.
for k, N in enumerate(n_trials):
    # Integer division: plt.subplot needs an integer row count under Python 3
    sx = plt.subplot(len(n_trials) // 2, 2, k + 1)
    plt.xlabel("$p$, probability of heads") \
        if k in [0, len(n_trials) - 1] else None
    plt.setp(sx.get_yticklabels(), visible=False)
    heads = data[:N].sum()
    # Posterior after N tosses: Beta(1 + heads, 1 + tails)
    y = dist.pdf(x, 1 + heads, 1 + N - heads)
    plt.plot(x, y, label="observe %d tosses,\n %d heads" % (N, heads))
    plt.fill_between(x, 0, y, color="#348ABD", alpha=0.4)
    plt.vlines(0.5, 0, 4, color="k", linestyles="--", lw=1)

    leg = plt.legend()
    leg.get_frame().set_alpha(0.4)
    plt.autoscale(tight=True)

plt.suptitle("Bayesian updating of posterior probabilities",
             y=1.02,
             fontsize=14)

plt.tight_layout()

Plot 1: prior probability
Plots 2-10: posterior probabilities
Example 2:

Same as the librarian example on page 7.
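
Assuming this refers to the usual "librarian or farmer" question (a person is described as shy; which occupation is more likely?), the calculation is a direct application of Bayes' formula. The numbers below are illustrative assumptions, not values taken from these notes.

```python
# Illustrative numbers, assumed for this sketch
prior_librarian = 1 / 21          # librarians are much rarer than farmers
p_shy_given_librarian = 0.95      # likelihood of the description for a librarian
p_shy_given_farmer = 0.50         # likelihood of the description for a farmer

# Total probability of observing the description
evidence = (p_shy_given_librarian * prior_librarian
            + p_shy_given_farmer * (1 - prior_librarian))

# Bayes' formula: posterior = likelihood * prior / evidence
posterior_librarian = p_shy_given_librarian * prior_librarian / evidence
print(round(posterior_librarian, 3))  # 0.087
```

Even though the description fits a librarian far better, the low prior keeps the posterior probability small.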

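Example 3: inferring behavior from text-message data
P12~25
The code can be checked in Jupyter using the section "Introducing our first hammer: PyMC" and the sections that follow it.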
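
The notebook fits this switchpoint model with PyMC. As a library-independent sketch of the same idea, the code below uses exact enumeration instead of MCMC: with exponential priors on the two Poisson rates, the rates can be integrated out analytically, leaving a discrete posterior over the switch day. The synthetic data, the rates 10 and 40, the true switch day and the variable names are all assumptions made for this example.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)

# Synthetic stand-in for daily message counts (hypothetical sizes and rates)
n_days, true_tau = 60, 40
counts = np.concatenate([rng.poisson(10, true_tau),
                         rng.poisson(40, n_days - true_tau)])

# Exponential(alpha) prior on each rate, with alpha = 1 / mean count
alpha = 1.0 / counts.mean()


def log_marginal(segment):
    """Log marginal likelihood of a Poisson segment, rate integrated out.

    Integral of alpha*exp(-alpha*lam) * lam**s * exp(-m*lam) over lam
    equals alpha * Gamma(s + 1) / (m + alpha)**(s + 1); the product of 1/x_i!
    is dropped because it is the same for every switch day.
    """
    s, m = segment.sum(), len(segment)
    return np.log(alpha) + gammaln(s + 1) - (s + 1) * np.log(m + alpha)


# Uniform prior over switch days; tau = number of days governed by the first rate
taus = np.arange(1, n_days)
log_post = np.array([log_marginal(counts[:t]) + log_marginal(counts[t:])
                     for t in taus])
post = np.exp(log_post - log_post.max())  # subtract max for numerical stability
post /= post.sum()

tau_map = int(taus[post.argmax()])
print(tau_map)
```

Enumeration works here only because the switch day is discrete and the rates have conjugate priors; the MCMC approach in the notebook carries over to models where neither holds.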

Homework
Read the green book
Read "LDA数学八卦" closely

0 0