Machine Learning Algorithms, Lecture 4: Feasibility of Learning (Study Notes)
Source: Internet | Editor: 程序博客网 | Time: 2024/05/08 14:25
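Today is the seventh day of the Lunar New Year. The holiday is over, so it is back to study and work!
The week before the New Year was on and off as well: food poisoning, a fever, no internet access at the library, dinners with friends, and so on. Add the Spring Festival on top of that and I ended up with a long gap, which comes down to my own slacking.
Let's continue below~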
Learning is impossible
A Learning Puzzle
every answer has a valid reason behind it, so an adversarial teacher can always say you 'didn't learn'
A 'Simple' Binary Classification Problem
learning from D (to infer sth. outside D) is doomed if any 'unknown' f can happen
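To make the "doomed" claim concrete, here is a minimal sketch on an assumed toy problem: a 3-bit input space with 5 labelled examples in D. The data set and its labels are made up for illustration and are not from the lecture notes. The sketch enumerates every target f that agrees with D and counts how often a hypothesis that is perfect on D is wrong outside D.

```python
from itertools import product

# All 8 inputs of a 3-bit input space (an assumed toy setting).
X = list(product([0, 1], repeat=3))

# Hypothetical data set D: 5 labelled inputs. The labels are made up.
D = {
    (0, 0, 0): 0,
    (0, 0, 1): 1,
    (0, 1, 0): 1,
    (0, 1, 1): 0,
    (1, 0, 0): 1,
}
outside = [x for x in X if x not in D]

# Every target f that agrees with D: one per labelling of the 3 unseen inputs.
consistent_targets = [
    {**D, **dict(zip(outside, labels))}
    for labels in product([0, 1], repeat=len(outside))
]

def errors_outside(g, f):
    """Number of unseen inputs on which hypothesis g disagrees with target f."""
    return sum(g[x] != f[x] for x in outside)

# Pick any hypothesis that is perfect on D, here the first consistent target.
g = consistent_targets[0]
error_counts = sorted(errors_outside(g, f) for f in consistent_targets)

print(len(consistent_targets))  # 8
print(error_counts)             # [0, 1, 1, 1, 2, 2, 2, 3]
```

Whichever consistent g is chosen, the list of error counts is the same: some equally plausible f makes g wrong on all three unseen inputs, so D alone cannot rank the candidates.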
Probability to Rescue
Inferring Something Unknown
Statistics 101: Inferring Orange Probability
consider a bin of many many orange and green marbles, do we know the orange portion? No.
bin ---
assume
orange probability = u
green probability = 1 - u
with u unknown
sample ---
N marbles sampled independently, with
orange fraction = v
green fraction = 1 - v
with v known
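The bin and sample setup can be sketched as a small simulation. The value u = 0.4 is an assumption chosen only so the gap |v - u| can be printed; in the real setting u stays unknown. The function name `sample_orange_fraction` is hypothetical.

```python
import random

def sample_orange_fraction(u, n, rng):
    """Draw n marbles independently, each orange with probability u; return v."""
    orange = sum(rng.random() < u for _ in range(n))
    return orange / n

rng = random.Random(0)  # fixed seed so the run is repeatable
u = 0.4                 # assumed for the demo; unknown in practice

for n in (10, 100, 10000):
    v = sample_orange_fraction(u, n, rng)
    print(n, v, abs(v - u))  # sample size, orange fraction, gap to u
```

Larger samples tend to give a smaller gap, although any single small sample can still land far from u.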
Possible versus Probable
does in-sample v say anything about out-of-sample u?
No: sample can be mostly green while bin is mostly orange
Yes: in-sample v likely close to unknown u
Hoeffding's Inequality
in big samples (N large), v is probably close to u (within e)
P [ |v-u| > e ] <= 2 exp (-2 e^2 N)
called Hoeffding's Inequality, for marbles, coin, polling ...
the statement 'v=u' is probably approximately correct (PAC)
no need to know u
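As a quick numeric reading of the bound 2 exp(-2 e^2 N), the sketch below evaluates it and inverts it to get a sample size for a target failure probability. The helper names and the parameter `delta` (the tolerated probability of a bad sample) are introduced here for illustration and do not appear in the notes.

```python
import math

def hoeffding_bound(eps, n):
    """Right-hand side of Hoeffding's Inequality: 2 exp(-2 eps^2 N)."""
    return 2.0 * math.exp(-2.0 * eps**2 * n)

def samples_needed(eps, delta):
    """Smallest N for which 2 exp(-2 eps^2 N) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps**2))

print(hoeffding_bound(0.1, 100))   # about 0.27
print(samples_needed(0.05, 0.05))  # 738
```

Note that neither function takes u as an argument, which matches the remark that there is no need to know u.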
Connection to Learning
Added Components
For any fixed h, can probably infer unknown Eout by known Ein
The Formal Guarantee
For any fixed h, in 'big' data (N large)
in sample error Ein(h) is probably close to out-of-sample error Eout(h)
using Hoeffding's Inequality
P[ |Ein(h) - Eout(h)| > e ] <= 2exp(-2e^2N)
no need to know Eout(h)
--- f and P can stay unknown
'Ein(h) = Eout(h)' is probably approximately correct (PAC)
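The guarantee for a single fixed h can be checked empirically with a rough sketch. Everything concrete here is an assumption made for the demo: inputs are uniform on [0, 1), the target `f` and the hypothesis `h` are simple thresholds, and Eout(h) = 0.2 is known only because the demo defines f itself.

```python
import math
import random

def f(x):
    """Hypothetical target; unknown in a real learning problem."""
    return 1 if x > 0.3 else -1

def h(x):
    """One fixed hypothesis, chosen before any data is seen."""
    return 1 if x > 0.5 else -1

# h and f disagree exactly on (0.3, 0.5], so for uniform x Eout(h) = 0.2.
E_OUT = 0.2

def in_sample_error(n, rng):
    """Ein(h): fraction of n random inputs on which h disagrees with f."""
    xs = [rng.random() for _ in range(n)]
    return sum(h(x) != f(x) for x in xs) / n

def bad_sample_rate(n, eps, trials, rng):
    """Fraction of data sets with |Ein(h) - Eout(h)| > eps."""
    bad = sum(abs(in_sample_error(n, rng) - E_OUT) > eps for _ in range(trials))
    return bad / trials

rng = random.Random(0)
n, eps = 200, 0.1
rate = bad_sample_rate(n, eps, trials=2000, rng=rng)
bound = 2 * math.exp(-2 * eps**2 * n)
print(rate, bound)  # observed bad-sample rate vs. the Hoeffding bound
```

The observed rate should sit below the bound, usually far below, since Hoeffding's Inequality is a worst-case statement over all possible Eout values.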
The Verification Flow
Possible versus Probable
Multiple h
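P_D[ BAD D ] = sum over all possible D of P(D) * [[ D is BAD ]]
with M hypotheses, D is BAD when it is BAD for at least one h_m, so by the union bound
P_D[ BAD D ] <= P_D[ BAD D for h_1 ] + ... + P_D[ BAD D for h_M ]
<= 2M exp(-2e^2N)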
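A coin-flipping analogy gives a feel for the factor M in 2M exp(-2e^2N). In this sketch each of M fair coins stands in for one hypothesis with Eout = 0.5, and its heads fraction plays the role of Ein. The analogy, the function names, and the parameter values are assumptions made for illustration.

```python
import math
import random

def union_bound(m, eps, n):
    """Upper bound 2 M exp(-2 eps^2 N) on P[BAD D], capped at 1."""
    return min(1.0, 2.0 * m * math.exp(-2.0 * eps**2 * n))

def heads_fraction(n, rng):
    """Fraction of heads in n fair coin flips (each random bit is one flip)."""
    return bin(rng.getrandbits(n)).count("1") / n

def simulate(m, n, eps, trials, rng):
    """Return (rate the first coin is BAD, rate at least one of m coins is BAD)."""
    first_bad = 0
    any_bad = 0
    for _ in range(trials):
        bad = [abs(heads_fraction(n, rng) - 0.5) > eps for _ in range(m)]
        first_bad += bad[0]
        any_bad += any(bad)
    return first_bad / trials, any_bad / trials

rng = random.Random(0)
m, n, eps = 20, 30, 0.25
single_rate, any_rate = simulate(m, n, eps, trials=2000, rng=rng)
print(single_rate, any_rate, union_bound(m, eps, n))
```

The chance that some coin looks misleading is much higher than the chance for one coin fixed in advance, which is why picking the best-looking h out of M needs the extra factor M in the bound.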
The Statistical Learning Flow