Two Other Models for Binary Classification
Source: Internet  Editor: 程序博客网  Date: 2024/05/23 17:20
Why
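Besides the commonly used logistic model for predicting $Y \in \{0, 1\}$, there are two other approaches: probit regression and complementary log-log regression. Logistic and probit models give similar results. With a single predictor, there is no evidence that any one of them performs best, but with multiple predictors the logistic model is the usual choice.
Consider the heart disease example from the previous post: $Y = 1$ means diseased and $Y = 0$ means healthy. In practice, disease status is determined from measurements taken on the body, so suppose there is a continuous variable $Y^c$ that describes the body's chemical balance. When $Y^c \leq y^*$ we have $Y = 1$, where $y^*$ is the health threshold.
We assume a linear relationship between $Y^c$ and age $X$: $Y^c = \beta_0^c + \beta_1^c X + \epsilon.$ Therefore $\pi = P(Y = 1 \mid X; \beta) = P(Y^c \leq y^*) = P(\beta_0^c + \beta_1^c X + \epsilon \leq y^*) = P(\epsilon \leq y^* - \beta_0^c - \beta_1^c X).$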
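Whether the model is logistic, probit, or complementary log-log, the difference lies entirely in $\epsilon$: each model makes a different assumption about the error distribution.
In every case the errors are assumed to be independent and identically distributed: $\epsilon_i \sim \text{i.i.d.}$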
Logistic Model
A random variable $Z$ with pdf $f(z) = e^z/(1 + e^z)^2$ and cdf $F(z) = e^z/(1 + e^z)$ is said to follow the standard logistic distribution. Its pdf is bell-shaped and centered at 0, and its cdf is S-shaped. Let $\epsilon = \sigma Z$, where $\sigma > 0$ is a scale parameter. Then $\pi = P(\epsilon \leq y^* - \beta_0^c - \beta_1^c X) = P(Z \leq (y^* - \beta_0^c - \beta_1^c X)/\sigma) = P(Z \leq \beta_0 + \beta_1 X) = e^{\beta_0 + \beta_1 X}/(1 + e^{\beta_0 + \beta_1 X}),$ where $\beta_0 = (y^* - \beta_0^c)/\sigma$ and $\beta_1 = -\beta_1^c/\sigma.$
The logistic model is therefore $\pi' = \log(\pi/(1 - \pi)) = \beta_0 + \beta_1 X.$
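The latent-variable derivation can be checked numerically. The following is a minimal sketch, not part of the original post: the parameter values (`beta0_c`, `beta1_c`, `y_star`, `sigma`) and the age `x = 40` are hypothetical, picked only for illustration. It draws logistic errors, thresholds the simulated $Y^c$ at $y^*$, and compares the fraction of cases with $Y = 1$ against the closed-form $\pi$.

```python
import math
import random


def logistic_cdf(z):
    """Standard logistic cdf F(z) = e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(pi):
    """Logit link: log(pi / (1 - pi))."""
    return math.log(pi / (1.0 - pi))


def simulate_pi(x, beta0_c, beta1_c, y_star, sigma, n=200_000, seed=0):
    """Estimate P(Y = 1 | X = x) by thresholding the simulated latent Y^c."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
            u = rng.random()
        z = math.log(u / (1.0 - u))  # inverse-cdf draw from the standard logistic
        y_c = beta0_c + beta1_c * x + sigma * z
        if y_c <= y_star:  # Y = 1 when the latent value is at or below the threshold
            hits += 1
    return hits / n


# Hypothetical parameter values, chosen only for illustration.
beta0_c, beta1_c, y_star, sigma = 10.0, -0.1, 5.0, 2.0
beta0 = (y_star - beta0_c) / sigma  # -2.5
beta1 = -beta1_c / sigma            # 0.05

x = 40.0
pi_closed_form = logistic_cdf(beta0 + beta1 * x)
pi_simulated = simulate_pi(x, beta0_c, beta1_c, y_star, sigma)

print(f"closed form: {pi_closed_form:.4f}")
print(f"simulated:   {pi_simulated:.4f}")
print(f"logit(pi) = {logit(pi_closed_form):.4f}, beta0 + beta1*x = {beta0 + beta1 * x:.4f}")
```

The simulated fraction should agree with the closed form up to Monte Carlo error, and applying the logit to $\pi$ recovers the linear predictor $\beta_0 + \beta_1 X$.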
Probit Model
Let $\epsilon = \sigma Z$ with $Z \sim N(0, 1)$. Then
$\pi = P(\epsilon \leq y^* - \beta_0^c - \beta_1^c X) = P(Z \leq (y^* - \beta_0^c - \beta_1^c X)/\sigma) = P(Z \leq \beta_0 + \beta_1 X) = \Phi(\beta_0 + \beta_1 X),$ where $\beta_0 = (y^* - \beta_0^c)/\sigma$, $\beta_1 = -\beta_1^c/\sigma$, and $\Phi$ is the cdf of the standard normal distribution. Therefore
$\Phi^{-1}(\pi) = \beta_0 + \beta_1 X.$
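As a small illustration of the probit response and its inverse, here is a sketch using `statistics.NormalDist` from the Python standard library. The coefficients `beta0 = -3.0` and `beta1 = 0.05` and the ages are hypothetical, not estimates from any data.

```python
from statistics import NormalDist

std_normal = NormalDist(0.0, 1.0)


def probit_probability(x, beta0, beta1):
    """pi = Phi(beta0 + beta1 * x)."""
    return std_normal.cdf(beta0 + beta1 * x)


def probit_link(pi):
    """Phi^{-1}(pi), defined for 0 < pi < 1."""
    return std_normal.inv_cdf(pi)


# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -3.0, 0.05

for age in (30, 50, 70):
    pi = probit_probability(age, beta0, beta1)
    eta = probit_link(pi)  # should recover beta0 + beta1 * age
    print(f"age={age}  pi={pi:.4f}  probit(pi)={eta:+.4f}")
```

Applying `probit_link` to the predicted probability returns the linear predictor, which is the statement $\Phi^{-1}(\pi) = \beta_0 + \beta_1 X$ in code form.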
Complementary log-log
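This model is used for very extreme cases, for example when the values are especially small or especially large. A random variable $Z$ with pdf $f(z) = \exp(z - e^z)$ and cdf $F(z) = 1 - \exp(-e^z)$ follows the Gumbel (extreme value) distribution. Unlike the previous two distributions, the Gumbel pdf is not symmetric about $0$: it is skewed to the left. With the same setup as before, we get
$\pi = P(\epsilon \leq y^* - \beta_0^c - \beta_1^c X) = P(Z \leq (y^* - \beta_0^c - \beta_1^c X)/\sigma) = P(Z \leq \beta_0 + \beta_1 X) = 1 - \exp(-e^{\beta_0 + \beta_1 X}),$ where $\beta_0 = (y^* - \beta_0^c)/\sigma$ and $\beta_1 = -\beta_1^c/\sigma.$ Therefore $\log(-\log(1 - \pi)) = \beta_0 + \beta_1 X.$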
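To make the asymmetry concrete, the sketch below evaluates the standard complementary log-log response function $\pi = 1 - \exp(-e^{\eta})$, with $\eta = \beta_0 + \beta_1 X$, next to the logistic response at a few values of $\eta$. The grid of $\eta$ values is arbitrary and the function names are illustrative.

```python
import math


def cloglog_probability(eta):
    """pi = 1 - exp(-exp(eta)), the cdf of the left-skewed Gumbel at eta."""
    return -math.expm1(-math.exp(eta))


def cloglog_link(pi):
    """Complementary log-log link: log(-log(1 - pi))."""
    return math.log(-math.log1p(-pi))


def logistic_probability(eta):
    """pi = e^eta / (1 + e^eta), shown for comparison."""
    return 1.0 / (1.0 + math.exp(-eta))


for eta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    p_c = cloglog_probability(eta)
    p_l = logistic_probability(eta)
    print(f"eta={eta:+.1f}  cloglog={p_c:.4f}  logistic={p_l:.4f}  "
          f"link(cloglog)={cloglog_link(p_c):+.4f}")
```

At $\eta = 0$ the logistic model gives $\pi = 0.5$, while the complementary log-log model gives $1 - e^{-1} \approx 0.632$. The logistic probabilities at $\eta$ and $-\eta$ sum to 1, but the complementary log-log probabilities do not, which reflects the skewed error distribution.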