10601 Study Notes


Week 2

Example: instances with 4 attributes (1, 2, 3, 4), each attribute taking 2 possible values; the output label is one of [a, b, c].

Input space X: the number of possible unique inputs = 2^4

Concept space: the space of all possible concepts (every labeling of the 2^4 inputs with one of the 3 outputs) = 3^(2^4)

Hypothesis space H: the space of all semantically distinct conjunctions of the type described above = 3^4 + 1 (each attribute constraint is one of its 2 values or "?", plus 1 for the empty hypothesis)

Training samples D:

Target function c:

Version space VS(H,D): the subset of hypotheses from H consistent with all training examples in D; it contains all the reasonable variants of the target concept.

VS(H,D): the version space with respect to hypothesis space H and training samples D.

Determine: A hypothesis h in H such that h(x) = c(x) for all x in D.

In size: |concept space| > |hypothesis space H| > |version space VS(H,D)|.
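As a quick sanity check on these sizes (assuming the example above: 4 attributes with 2 values each, and 3 possible output labels):

|X| = 2^4 = 16 possible inputs

|C| = 3^{|X|} = 3^{16} = 43,046,721 possible concepts

|H| = 3^4 + 1 = 82 semantically distinct conjunctions

so indeed |C| > |H| > |VS(H,D)|, since the version space is a subset of H.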

Find-S algorithm

1. Initialize h to the most specific hypothesis in H.

2. For each positive training instance x:
   for each attribute constraint a(i) in h:
      if the constraint a(i) in h is satisfied by x, then do nothing;
      else replace a(i) in h by the next more general constraint that is satisfied by x.

3. Output hypothesis h.
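A minimal Python sketch of Find-S under the usual boolean-concept setting (the training data below is made up for illustration; None stands for the maximally specific "no value allowed" constraint and '?' for the maximally general "any value" constraint):

```python
def find_s(examples):
    """Find-S: return the most specific conjunctive hypothesis
    consistent with the positive training examples.

    examples: list of (instance, label) pairs, where instance is a
    tuple of attribute values and label is True/False.
    """
    n = len(examples[0][0])
    h = [None] * n                       # most specific hypothesis
    for x, label in examples:
        if not label:                    # Find-S ignores negative examples
            continue
        for i, value in enumerate(x):
            if h[i] is None:             # first positive example seen
                h[i] = value
            elif h[i] != value:          # constraint violated -> generalize
                h[i] = '?'
    return h

# Hypothetical training data: 4 binary attributes, boolean label.
D = [((1, 0, 1, 1), True),
     ((1, 1, 1, 1), True),
     ((0, 0, 0, 1), False)]
print(find_s(D))                         # [1, '?', 1, 1]
```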



Week 3


How surprised are you when an event E occurs?

The answer is I(E) = \log_2 (1 / P(E)).
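For example, a fair coin flip carries I = \log_2(1/0.5) = 1 bit of surprise, while an event with probability 1/8 carries \log_2 8 = 3 bits: the rarer the event, the more surprising it is.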


Definition of Entropy (H): average amount of information in observing the output of the source S

H(X) = -\sum_{x \in X} p(x) \log p(x)

Non-negative.

H(X) is maximized when p(x) = 1/|X| (the uniform distribution).

The further p is from uniform, the lower the entropy.
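A small Python sketch (the distributions below are made up for illustration) that checks these properties numerically:

```python
import math

def entropy(p, base=2):
    """Shannon entropy H(X) = -sum p(x) log p(x), skipping zero-probability outcomes."""
    return -sum(px * math.log(px, base) for px in p if px > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 outcomes -> 2.0 bits (the maximum)
print(entropy([0.7, 0.2, 0.05, 0.05]))    # further from uniform -> lower, about 1.26 bits
print(entropy([1.0, 0.0, 0.0, 0.0]))      # deterministic -> 0.0 bits
```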

 

P : probability

If T and M are not independent:

P(T = t, M = m) ≠ P(T = t) * P(M = m)


Joint Entropy:

H(X,Y) = -\sum_{x \in X, y \in Y} p(x,y) \log p(x,y)


Notice that H(X,Y) ≤ H(X) + H(Y), with equality if and only if X and Y are independent.

p(x,y) means x and y happen at the same time (the joint probability).

Conditional Entropy:

H(Y|X) = \sum_{x \in X} p(x) H(Y|X=x)

= \sum_{x \in X} p(x) [ -\sum_{y \in Y} p(y|x) \log p(y|x) ]

= -\sum_{x \in X, y \in Y} p(x,y) \log p(y|x)


To understand the second expression: for every x, first compute H(Y|X=x), then take the p(x)-weighted sum over all x.

Remember p(x)*p(y|x) = p(x,y)
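A small Python sketch (with a made-up joint table p(x,y)) that computes H(X), H(X,Y) and H(Y|X) directly from the definitions and checks that H(Y|X) = H(X,Y) - H(X), i.e. that using p(x,y) = p(x)*p(y|x) in the last line above is consistent:

```python
import math
from collections import defaultdict

log2 = lambda v: math.log(v, 2)

# Hypothetical joint distribution p(x, y) over X = {0, 1}, Y = {0, 1}.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Marginal p(x) = sum over y of p(x, y).
p_x = defaultdict(float)
for (x, y), p in p_xy.items():
    p_x[x] += p

# Joint entropy H(X, Y) = -sum p(x, y) log p(x, y).
H_xy = -sum(p * log2(p) for p in p_xy.values())

# Entropy of the marginal, H(X).
H_x = -sum(p * log2(p) for p in p_x.values())

# Conditional entropy H(Y|X) = -sum p(x, y) log p(y|x), with p(y|x) = p(x, y) / p(x).
H_y_given_x = -sum(p * log2(p / p_x[x]) for (x, y), p in p_xy.items())

print(H_xy, H_x + H_y_given_x)   # chain rule: H(X,Y) = H(X) + H(Y|X); both about 1.85 bits
```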

Average Mutual Information

I(X,Y) is just another notation for I(X;Y).

H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

PS: this follows from the definition of the log; try deriving it yourself.

For the full derivation, see the reference.


Understanding the semicolon, comma and vertical bar:

',' (comma) is the first to bind, and is used to group multiple RVs and treat them as one. E.g. H(X, Y | Z) means H({X, Y} | Z).

 ';' (semicolon) is the next to bind, and is only used with the Mutual Information Operator.  E.g.  I(X ; Y | Z) means I ({X;Y} | Z)

 '|' (vertical bar) is last to bind.  It conditions everything to its left on everything to its right.  E.g. H(X,Y|Z=z) means the joint entropy of {X,Y} under the condition Z=z.


Notation clarification

I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)


H(X) - H(X|Y=y) may be negative!    But H(X) - H(X|Y) is always non-negative.


Example:

Assume half the class is male and half female. 90% of the males wear glasses, but only 50% of the females wear glasses. So the overall distribution of glasses in the class is 70% / 30%, whereas the distribution among females is 50% / 50%. So:

H(Glasses | Gender = female) = H(0.5, 0.5) = 1 bit > H(Glasses) = H(0.7, 0.3) ≈ 0.88 bits
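A quick numeric check of this example in Python (same class composition as above):

```python
import math

def entropy(p):
    return -sum(px * math.log(px, 2) for px in p if px > 0)

# Class: 50% male, 50% female; 90% of males and 50% of females wear glasses.
H_glasses        = entropy([0.7, 0.3])              # ~0.881 bits
H_glasses_female = entropy([0.5, 0.5])              # 1 bit
H_glasses_male   = entropy([0.9, 0.1])              # ~0.469 bits
H_glasses_gender = 0.5 * H_glasses_female + 0.5 * H_glasses_male   # ~0.735 bits

print(H_glasses - H_glasses_female)   # negative: knowing "female" makes Glasses less predictable
print(H_glasses - H_glasses_gender)   # positive: I(Glasses; Gender) is about 0.147 bits
```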

I(X;Y;Z) = I(X;Y) - I(X;Y | Z)    

 

This is the difference between (how much Y tells us about X on average) and (how much Y tells us about X on average, when Z is known). You can think of it as "how much does the usefulness of Y (with regard to knowing X) change when we also know Z".

 

A negative value occurs when the usefulness of Y for predicting X increases when we also know Z.

 

An analogy: there is a safe (X) with two different locks (Y and Z). You need to open both locks in order to access the content of the safe. If you only have the key to Y, it is not worth anything by itself. But if you already have the key to Z, suddenly the value of Y increases.

 

Another example, closer to the realm of information:  If you divide a password into two halves (Y and Z), either half alone is not directly useful.  But if you are told one of them (Z), the other one (Y) becomes very useful.
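This can be checked numerically with a tiny made-up distribution in the spirit of the password example: let Y and Z be independent fair bits and X = Y XOR Z. Then Y alone tells you nothing about X, but once Z is known Y determines X completely, so I(X;Y) = 0, I(X;Y|Z) = 1 bit, and I(X;Y;Z) = -1 bit:

```python
import math
from itertools import product
from collections import defaultdict

log2 = lambda v: math.log(v, 2)

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Y, Z independent fair bits; X = Y XOR Z.  Joint distribution over (x, y, z).
p = {}
for y, z in product([0, 1], repeat=2):
    p[(y ^ z, y, z)] = 0.25

def marginal(p, idx):
    """Marginal distribution over the coordinates listed in idx (0=X, 1=Y, 2=Z)."""
    m = defaultdict(float)
    for outcome, prob in p.items():
        m[tuple(outcome[i] for i in idx)] += prob
    return m

H = lambda idx: entropy(marginal(p, idx))

# I(X;Y) = H(X) + H(Y) - H(X,Y)
I_xy = H([0]) + H([1]) - H([0, 1])
# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
I_xy_given_z = H([0, 2]) + H([1, 2]) - H([0, 1, 2]) - H([2])

print(I_xy, I_xy_given_z, I_xy - I_xy_given_z)   # approximately 0.0, 1.0, -1.0
```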


Reference:

http://ziketang.com/2013/08/some-notions-about-entropy/
