Machine Learning Algorithms, Lecture 4: Feasibility of Learning (Study Notes)

Source: Internet · Editor: 程序博客网 · Time: 2024/05/08 14:25

Today is the seventh day of the Lunar New Year. The holiday is over, so it's back to studying and working!

The week before New Year was also stop-and-go: food poisoning, a fever, no internet at the library, dinners with friends, and so on, followed by the Spring Festival itself, so there was a long gap in the middle. That was my own slackness.

Now, let's continue~


Learning is impossible

A Learning Puzzle

with all answers equally justifiable, an adversarial teacher can always claim you 'didn't learn'

A 'Simple' Binary Classification Problem

learning from D (to infer something outside D) is doomed if any 'unknown' f can happen
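The 'doomed' claim can be made concrete with a tiny counting sketch (the 3-bit input space and the five training inputs below are my own hypothetical choices, not from the lecture): every labeling of the points outside D yields a target f that is perfectly consistent with D, so D alone cannot single out f.

```python
from itertools import product

# Hypothetical illustration: X = {0,1}^3, so |X| = 8 points, and D pins
# down the labels of 5 of them. Each assignment of labels to the unseen
# points gives a target f consistent with D, yet all of these targets
# disagree somewhere outside D.
seen = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0)]  # inputs in D
unseen = [x for x in product([0, 1], repeat=3) if x not in seen]

consistent_targets = 2 ** len(unseen)  # one target per labeling of unseen points
print(len(unseen), consistent_targets)  # 3 unseen points -> 8 consistent targets
```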


Probability to Rescue

Inferring Something Unknown

Statistics 101: Inferring Orange Probability

consider a bin with many, many orange and green marbles; do we know the orange portion? No.

bin ---
assume orange probability = u, green probability = 1 - u, with u unknown

sample ---
N marbles sampled independently, with orange fraction = v, green fraction = 1 - v, with v known

Possible versus Probable

does the in-sample v say anything about the out-of-sample u?
No: the sample can be mostly green while the bin is mostly orange
Yes: if the sample is large, the in-sample v is likely close to the unknown u

Hoeffding's Inequality

in big samples (N large), v is probably close to u (within ε):
P[ |v - u| > ε ] <= 2 exp(-2 ε^2 N)
this is called Hoeffding's Inequality; it applies to marbles, coins, polling, ...
the statement 'v = u' is probably approximately correct (PAC)
no need to know u
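As a sanity check, a minimal Monte Carlo sketch of the bin model (u = 0.6, N = 100, ε = 0.1 are arbitrary assumed values): the empirical frequency of |v - u| > ε should sit below the Hoeffding bound.

```python
import math
import random

# Assumed parameters for illustration: true orange probability u = 0.6,
# sample size N = 100, tolerance eps = 0.1.
random.seed(0)
u, N, eps, trials = 0.6, 100, 0.1, 10_000

bad = 0
for _ in range(trials):
    v = sum(random.random() < u for _ in range(N)) / N  # orange fraction in sample
    if abs(v - u) > eps:
        bad += 1

bound = 2 * math.exp(-2 * eps ** 2 * N)  # Hoeffding: 2 exp(-2 eps^2 N)
print(bad / trials, round(bound, 4))  # empirical bad rate stays below the bound
```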


Connection to Learning

Added Components

for any fixed h, we can probably infer the unknown Eout(h) from the known Ein(h)

The Formal Guarantee

for any fixed h, with 'big' data (N large),
the in-sample error Ein(h) is probably close to the out-of-sample error Eout(h):
P[ |Ein(h) - Eout(h)| > ε ] <= 2 exp(-2 ε^2 N)   (Hoeffding's Inequality)
no need to know Eout(h) --- f and P can stay unknown
'Ein(h) = Eout(h)' is probably approximately correct (PAC)
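The guarantee can be simulated for one fixed h (the target f, hypothesis h, and uniform distribution over [0,1]^2 below are hypothetical stand-ins): Ein(h) on an i.i.d. sample tracks Eout(h), which the simulator estimates on a much larger sample.

```python
import random

# Hypothetical setup: inputs uniform on [0,1]^2; target f and the single
# fixed hypothesis h are chosen only for illustration.
random.seed(1)

def f(x):  # 'unknown' target, visible only to the simulator
    return 1 if x[0] + x[1] > 1.0 else -1

def h(x):  # the one fixed hypothesis being verified
    return 1 if x[0] > 0.5 else -1

def error(points):
    """Fraction of points where h disagrees with f."""
    return sum(h(x) != f(x) for x in points) / len(points)

N = 1_000
Ein = error([(random.random(), random.random()) for _ in range(N)])
Eout = error([(random.random(), random.random()) for _ in range(100_000)])
print(round(Ein, 3), round(Eout, 3))  # the two are close for large N
```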

The Verification Flow

Possible versus Probable

Multiple h

with M hypotheses h1, ..., hM, call D BAD if it is BAD for some h; then
P_D[BAD D] = P_D[BAD D for h1 or ... or BAD D for hM]
          <= P_D[BAD D for h1] + ... + P_D[BAD D for hM]   (union bound)
          <= 2M exp(-2 ε^2 N)
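The effect of multiple h can be felt with a coin-flipping sketch (M = 150 fair coins, N = 5 flips each; the specific numbers are a common illustration, re-derived here by simulation): each coin alone rarely shows all heads (Ein = 0 despite Eout = 0.5), but with many coins some coin almost surely does, so a BAD sample for SOME hypothesis becomes likely.

```python
import random

# Each 'coin' plays the role of a hypothesis with Eout = 0.5; a run of N
# straight heads means Ein = 0, i.e. a BAD (misleading) sample for that
# hypothesis.
random.seed(2)
N, trials = 5, 10_000

def some_coin_all_heads(M):
    """Fraction of trials in which at least one of M coins shows N heads."""
    hits = 0
    for _ in range(trials):
        if any(all(random.random() < 0.5 for _ in range(N)) for _ in range(M)):
            hits += 1
    return hits / trials

p_single = some_coin_all_heads(1)    # ~ (1/2)^5 = 0.03125
p_many = some_coin_all_heads(150)    # ~ 1 - (1 - 1/32)^150, close to 0.99
print(p_single, p_many)
```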

The Statistical Learning Flow





