Machine Learning Algorithms, Lecture 4: Feasibility of Learning (Study Notes)

Source: Internet · Editor: 程序博客网 · Time: 2024/05/08 14:25

Today is the seventh day of the Lunar New Year. The holiday is over, so it's back to studying and working!

The week before New Year was also stop-and-go: food poisoning, a fever, no internet at the library, dinners with friends, and so on, followed by the Spring Festival itself, so there was a long gap in the middle. That was my own slackness.

Now, let's continue~


Learning is impossible

A Learning Puzzle

with all answers equally justifiable, an adversarial teacher can always claim you 'didn't learn'

A 'Simple' Binary Classification Problem

learning from D (to infer something outside D) is doomed if any 'unknown' f can happen
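The 'doomed' claim can be made concrete with a tiny counting sketch (the 3-bit input space and the five training inputs below are my own hypothetical choices, not from the lecture): every labeling of the points outside D yields a target f that is perfectly consistent with D, so D alone cannot single out f.

```python
from itertools import product

# Hypothetical illustration: X = {0,1}^3, so |X| = 8 points, and D pins
# down the labels of 5 of them. Each assignment of labels to the unseen
# points gives a target f consistent with D, yet all of these targets
# disagree somewhere outside D.
seen = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0)]  # inputs in D
unseen = [x for x in product([0, 1], repeat=3) if x not in seen]

consistent_targets = 2 ** len(unseen)  # one target per labeling of unseen points
print(len(unseen), consistent_targets)  # 3 unseen points -> 8 consistent targets
```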


Probability to Rescue

Inferring Something Unknown

Statistics 101: Inferring Orange Probability

consider a bin with many, many orange and green marbles; do we know the orange portion? No.

bin ---
assume orange probability = u, green probability = 1 - u, with u unknown

sample ---
N marbles sampled independently, with orange fraction = v, green fraction = 1 - v, with v known

Possible versus Probable

does the in-sample v say anything about the out-of-sample u?
No: the sample can be mostly green while the bin is mostly orange
Yes: if the sample is large, the in-sample v is likely close to the unknown u

Hoeffding's Inequality

in big samples (N large), v is probably close to u (within ε):
P[ |v - u| > ε ] <= 2 exp(-2 ε^2 N)
this is called Hoeffding's Inequality; it applies to marbles, coins, polling, ...
the statement 'v = u' is probably approximately correct (PAC)
no need to know u
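As a sanity check, a minimal Monte Carlo sketch of the bin model (u = 0.6, N = 100, ε = 0.1 are arbitrary assumed values): the empirical frequency of |v - u| > ε should sit below the Hoeffding bound.

```python
import math
import random

# Assumed parameters for illustration: true orange probability u = 0.6,
# sample size N = 100, tolerance eps = 0.1.
random.seed(0)
u, N, eps, trials = 0.6, 100, 0.1, 10_000

bad = 0
for _ in range(trials):
    v = sum(random.random() < u for _ in range(N)) / N  # orange fraction in sample
    if abs(v - u) > eps:
        bad += 1

bound = 2 * math.exp(-2 * eps ** 2 * N)  # Hoeffding: 2 exp(-2 eps^2 N)
print(bad / trials, round(bound, 4))  # empirical bad rate stays below the bound
```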


Connection to Learning

Added Components

for any fixed h, we can probably infer the unknown Eout(h) from the known Ein(h)

The Formal Guarantee

for any fixed h, with 'big' data (N large),
the in-sample error Ein(h) is probably close to the out-of-sample error Eout(h):
P[ |Ein(h) - Eout(h)| > ε ] <= 2 exp(-2 ε^2 N)   (Hoeffding's Inequality)
no need to know Eout(h) --- f and P can stay unknown
'Ein(h) = Eout(h)' is probably approximately correct (PAC)
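The guarantee can be simulated for one fixed h (the target f, hypothesis h, and uniform distribution over [0,1]^2 below are hypothetical stand-ins): Ein(h) on an i.i.d. sample tracks Eout(h), which the simulator estimates on a much larger sample.

```python
import random

# Hypothetical setup: inputs uniform on [0,1]^2; target f and the single
# fixed hypothesis h are chosen only for illustration.
random.seed(1)

def f(x):  # 'unknown' target, visible only to the simulator
    return 1 if x[0] + x[1] > 1.0 else -1

def h(x):  # the one fixed hypothesis being verified
    return 1 if x[0] > 0.5 else -1

def error(points):
    """Fraction of points where h disagrees with f."""
    return sum(h(x) != f(x) for x in points) / len(points)

N = 1_000
Ein = error([(random.random(), random.random()) for _ in range(N)])
Eout = error([(random.random(), random.random()) for _ in range(100_000)])
print(round(Ein, 3), round(Eout, 3))  # the two are close for large N
```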

The Verification Flow

Possible versus Probable

Multiple h

with M hypotheses h1, ..., hM, call D BAD if it is BAD for some h; then
P_D[BAD D] = P_D[BAD D for h1 or ... or BAD D for hM]
          <= P_D[BAD D for h1] + ... + P_D[BAD D for hM]   (union bound)
          <= 2M exp(-2 ε^2 N)
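The effect of multiple h can be felt with a coin-flipping sketch (M = 150 fair coins, N = 5 flips each; the specific numbers are a common illustration, re-derived here by simulation): each coin alone rarely shows all heads (Ein = 0 despite Eout = 0.5), but with many coins some coin almost surely does, so a BAD sample for SOME hypothesis becomes likely.

```python
import random

# Each 'coin' plays the role of a hypothesis with Eout = 0.5; a run of N
# straight heads means Ein = 0, i.e. a BAD (misleading) sample for that
# hypothesis.
random.seed(2)
N, trials = 5, 10_000

def some_coin_all_heads(M):
    """Fraction of trials in which at least one of M coins shows N heads."""
    hits = 0
    for _ in range(trials):
        if any(all(random.random() < 0.5 for _ in range(N)) for _ in range(M)):
            hits += 1
    return hits / trials

p_single = some_coin_all_heads(1)    # ~ (1/2)^5 = 0.03125
p_many = some_coin_all_heads(150)    # ~ 1 - (1 - 1/32)^150, close to 0.99
print(p_single, p_many)
```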

The Statistical Learning Flow





