10601 study notes

Source: Internet | Editor: 程序博客网 | Time: 2024/06/08 08:28

week 2

Example: an input [1, 2, 3, 4] has 4 attributes (1, 2, 3 and 4), each with 2 possible values. The output is one of 3 labels [a, b, c].

Input space X: the number of possible unique inputs is 2^4 = 16.

Concept space: the space of all possible concepts, i.e. all functions that map each of the 2^4 inputs to one of the 3 labels: 3^(2^4).

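Hypothesis space H: the space of all semantically distinct conjunctions of attribute constraints, where each attribute is constrained either to one of its 2 values or to "?" (any value). That gives 3^4 conjunctions, plus 1 for the empty hypothesis that accepts no input: 3^4 + 1.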
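A quick arithmetic check of the three sizes, as a minimal Python sketch. It assumes the encoding of the example above (4 attributes with 2 values each, 3 output labels); the variable names are illustrative only.

```python
# Assumed encoding of the example: 4 attributes, 2 values each, 3 labels.
n_attributes = 4
values_per_attribute = 2
n_labels = 3

# Input space X: every combination of attribute values.
input_space = values_per_attribute ** n_attributes                   # 2^4

# Concept space: every function that maps each input to one of the labels.
concept_space = n_labels ** input_space                              # 3^(2^4)

# Conjunctive hypotheses: each attribute is one of its values or "?",
# plus one extra hypothesis that accepts no input at all.
hypothesis_space = (values_per_attribute + 1) ** n_attributes + 1    # 3^4 + 1

print(input_space, concept_space, hypothesis_space)  # 16 43046721 82
```

The concept space grows doubly exponentially while the conjunctive hypothesis space stays small, which is why restricting H is what makes learning from a few examples possible.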

Training examples D: pairs (x, c(x)).

Target function c: the unknown function we want to learn.

Version space: VS(H,D) is the subset of hypotheses from H that are consistent with all training examples in D. It contains every plausible variant of the target concept.

VS(H,D): the version space with respect to hypothesis space H and training examples D.

Determine: A hypothesis h in H such that h(x) = c(x) for all x in D.
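To make the version space concrete, here is a minimal List-Then-Eliminate sketch: enumerate H and keep the hypotheses consistent with every example. It assumes binary labels (True/False) for a conjunctive hypothesis, attribute values 0/1, "?" for "any value", and None for the empty hypothesis; the training examples and the helper names `hypothesis_space`, `predict` and `version_space` are hypothetical.

```python
from itertools import product

ANY = "?"  # constraint that accepts any value of an attribute

def hypothesis_space(n_attributes=4, values=(0, 1)):
    """All semantically distinct conjunctions: one constraint per attribute
    (a specific value or "?"), plus None for the hypothesis that accepts nothing."""
    hs = [tuple(h) for h in product(values + (ANY,), repeat=n_attributes)]
    return hs + [None]

def predict(h, x):
    """h(x): True iff x satisfies every attribute constraint of h."""
    if h is None:
        return False
    return all(c == ANY or c == v for c, v in zip(h, x))

def version_space(H, D):
    """VS(H, D): the hypotheses in H consistent with every example (x, c(x)) in D."""
    return [h for h in H if all(predict(h, x) == label for x, label in D)]

# Hypothetical training examples (x, c(x)) with binary labels.
D = [((1, 0, 1, 1), True), ((1, 0, 1, 0), True), ((0, 0, 1, 1), False)]

H = hypothesis_space()
VS = version_space(H, D)
print(len(H), len(VS))  # 82 4
for h in VS:
    print(h)
```

Enumerating H is only feasible because H is tiny here; for larger spaces one would keep just the boundary sets instead of the full list.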


Find-S algorithm

1. Initialize h to the most specific hypothesis in H.

2. For each positive training instance x:

   For each attribute constraint a(i) in h:

      If the constraint a(i) in h is satisfied by x, then do nothing;

      else replace a(i) in h by the next more general constraint that is satisfied by x.

3. Output hypothesis h.
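The three steps can be sketched in Python as follows. This is a minimal illustration, assuming attribute values 0/1, "?" as the most general constraint and None as the most specific constraint ∅; the data set D and the function name `find_s` are hypothetical.

```python
ANY = "?"      # most general constraint: any value is accepted
EMPTY = None   # most specific constraint (∅): no value is accepted

def find_s(examples, n_attributes):
    """Find-S: the most specific conjunctive hypothesis covering all positives."""
    h = [EMPTY] * n_attributes            # step 1: most specific hypothesis
    for x, label in examples:             # step 2: positive examples only
        if not label:
            continue
        for i, value in enumerate(x):
            if h[i] == ANY or h[i] == value:
                continue                  # constraint already satisfied by x
            # Next more general constraint satisfied by x:
            # ∅ -> the observed value, a different value -> "?".
            h[i] = value if h[i] is EMPTY else ANY
    return tuple(h)                       # step 3: output h

D = [((1, 0, 1, 1), True), ((1, 0, 1, 0), True), ((0, 0, 1, 1), False)]
print(find_s(D, 4))  # (1, 0, 1, '?')
```

Note that the negative example is never examined: Find-S only moves from specific toward general, driven by positive examples.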


How surprised are you?

The answer is the self-information I(E) = log2(1/P(E)) = -log2 P(E), measured in bits.
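A tiny sketch of the self-information formula (the function name is made up for illustration): the rarer the event, the larger the value.

```python
import math

def self_information(p):
    """I(E) = log2(1 / P(E)) in bits."""
    if not 0 < p <= 1:
        raise ValueError("P(E) must be in (0, 1]")
    return math.log2(1 / p)

print(self_information(1.0))    # 0.0: a certain event is no surprise
print(self_information(0.5))    # 1.0: like a fair coin flip
print(self_information(1 / 8))  # 3.0: rarer event, bigger surprise
```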

Definition of Entropy (H): the average amount of information in observing the output of the source S, i.e. the expected self-information:

H(S) = Σ_x p(x) · log2(1/p(x))

Non-negative: H ≥ 0.

If p(x) = 1/|X| for every x (uniform distribution), H is maximal: H = log2 |X|.

The further p is from uniform, the lower the entropy.
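The definition and its properties can be checked numerically with a minimal sketch. The distributions below are arbitrary examples, and terms with p(x) = 0 are skipped, following the usual convention 0 · log2(1/0) = 0.

```python
import math

def entropy(probs):
    """H = sum over x of p(x) * log2(1 / p(x)), in bits; p(x) = 0 terms count as 0."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 = log2(4): uniform, the maximum
print(entropy([0.7, 0.1, 0.1, 0.1]))      # smaller: further from uniform
print(entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0: no uncertainty
```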


P: probability.

T and M are not independent:

P(T = t, M = m) ≠ P(T = t) · P(M = m)

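Joint Entropy:

H(X,Y) = Σ_x Σ_y p(x,y) · log2(1/p(x,y))

In general H(X,Y) ≤ H(X) + H(Y), with equality if and only if X and Y are independent.

p(x,y) is the probability that X = x and Y = y occur at the same time.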
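A minimal sketch with a hypothetical joint table of two dependent binary variables X and Y. It checks both that p(x,y) ≠ p(x) · p(y) and that H(X,Y) < H(X) + H(Y); the table values and helper names are illustrative assumptions.

```python
import math
from collections import defaultdict

def H(probs):
    """Entropy in bits of a collection of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def marginals(joint):
    """Return p(x) and p(y) from a joint table {(x, y): p(x, y)}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return px, py

# Hypothetical joint distribution: X and Y tend to take the same value.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px, py = marginals(joint)

# Not independent: p(x, y) differs from p(x) * p(y).
print(joint[(0, 0)], px[0] * py[0])   # 0.4 vs 0.25

H_xy = H(joint.values())
H_x, H_y = H(px.values()), H(py.values())
print(H_xy, H_x + H_y)                # about 1.722 < 2.0
```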

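Conditional Entropy:

H(Y|X=x) = Σ_y p(y|x) · log2(1/p(y|x))

H(Y|X) = Σ_x p(x) · H(Y|X=x) = Σ_x Σ_y p(x,y) · log2(1/p(y|x))

Understanding the second formula: for every x, first compute H(Y|X=x), then average these values weighted by p(x).

Remember p(x) · p(y|x) = p(x,y), which turns the weighted average into a single sum over p(x,y).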
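The weighted-average form and the single-sum form should give the same number. This sketch computes both for a hypothetical joint table; the helper names are illustrative.

```python
import math
from collections import defaultdict

def marginal_x(joint):
    """p(x) from a joint table {(x, y): p(x, y)}."""
    px = defaultdict(float)
    for (x, _), p in joint.items():
        px[x] += p
    return px

def cond_entropy_weighted(joint):
    """H(Y|X) = sum over x of p(x) * H(Y | X = x)."""
    total = 0.0
    for x, p_x in marginal_x(joint).items():
        # p(y | x) = p(x, y) / p(x) for every y seen together with this x
        cond = [p / p_x for (xx, _), p in joint.items() if xx == x and p > 0]
        total += p_x * sum(q * math.log2(1 / q) for q in cond)
    return total

def cond_entropy_joint(joint):
    """H(Y|X) = sum over (x, y) of p(x, y) * log2(p(x) / p(x, y)),
    using 1 / p(y|x) = p(x) / p(x, y)."""
    px = marginal_x(joint)
    return sum(p * math.log2(px[x] / p) for (x, _), p in joint.items() if p > 0)

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(cond_entropy_weighted(joint), cond_entropy_joint(joint))  # both about 0.722
```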

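Average Mutual Information:

I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = Σ_x Σ_y p(x,y) · log2( p(x,y) / (p(x) · p(y)) )

I(X,Y) is just another notation for I(X;Y).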


PS: derive the equivalent forms yourself from the definitions, using log(a/b) = log a - log b.

For the full derivation, please see the reference.
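A minimal sketch of the double-sum formula, checked against the equivalent form H(X) + H(Y) - H(X,Y). The two joint tables are hypothetical: one dependent, one independent.

```python
import math
from collections import defaultdict

def H(probs):
    """Entropy in bits of a collection of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) p(y))), in bits."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

# Both marginals of `dependent` are (0.5, 0.5), so H(X) = H(Y) = 1 bit.
print(mutual_information(dependent))                 # about 0.278 bits
print(H([0.5, 0.5]) * 2 - H(dependent.values()))     # same: H(X) + H(Y) - H(X,Y)
print(mutual_information(independent))               # 0.0: independent variables
```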

Understanding the semicolon, comma and vertical bar:

',' (comma) binds first, and is used to group multiple RVs (random variables) and treat them as one. E.g. H(X, Y | Z) means H({X,Y} | Z).

';' (semicolon) binds next, and is only used with the mutual information operator. E.g. I(X ; Y | Z) means I({X;Y} | Z).

'|' (vertical bar) binds last. It conditions everything to its left on everything to its right. E.g. H(X,Y|Z=z) means the joint entropy of {X,Y} under the condition Z=z.

Notion clarification


H(X) - H(X|Y=y) may be negative! But H(X) - H(X|Y) is always non-negative, because it is the average of H(X) - H(X|Y=y) weighted by p(y), which equals I(X;Y) ≥ 0.


Here is another example: assume half the class is male and half female. 90% of males wear glasses, but only 50% of females do. So the overall distribution of glasses in the class is 70%/30% (0.5 × 0.9 + 0.5 × 0.5 = 0.7), whereas the distribution of glasses among females is 50%/50%. So:

           H(Glasses|gender=female) = H(0.5, 0.5) = 1 bit  > H(Glasses) = H(0.7, 0.3) ≈ 0.881 bits
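The glasses numbers can be checked with a short sketch (the helper `H` is illustrative). It shows that the difference for the single value gender=female is negative, while the average over both genders, which is I(Glasses; Gender), stays non-negative.

```python
import math

def H(*probs):
    """Entropy in bits of a discrete distribution given as probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Numbers from the example: P(male) = P(female) = 0.5,
# P(glasses | male) = 0.9, P(glasses | female) = 0.5.
p_glasses = 0.5 * 0.9 + 0.5 * 0.5                  # 0.7

h_glasses = H(p_glasses, 1 - p_glasses)             # H(0.7, 0.3), about 0.881
h_given_female = H(0.5, 0.5)                        # 1.0
h_given_male = H(0.9, 0.1)                          # about 0.469
h_given_gender = 0.5 * h_given_male + 0.5 * h_given_female

print(h_glasses - h_given_female)   # about -0.119: negative for this one value
print(h_glasses - h_given_gender)   # about 0.147: non-negative on average
```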

I(X;Y;Z) = I(X;Y) - I(X;Y|Z)

is the difference between (how much Y tells us about X on average) and (how much Y tells us about X on average, when Z is known). You can think of it as "How much does the usefulness of Y (with regard to knowing X) change when we also know Z?"


A negative value occurs when the usefulness of Y for predicting X INCREASES once we also know Z.


An analogy: there is a safe (X) with two different locks (Y and Z). You need to open both locks in order to access the content of the safe. If you only have a key to lock Y, it is not worth anything by itself. But if you already have the key to lock Z, suddenly the value of the key to Y increases.


Another example, closer to the realm of information:  If you divide a password into two halves (Y and Z), either half alone is not directly useful.  But if you are told one of them (Z), the other one (Y) becomes very useful.
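A common way to see a negative I(X;Y;Z) numerically is a toy version of the password idea, chosen here for illustration and not taken from the notes: Y and Z are independent fair bits and the secret is X = Y xor Z. The sketch computes each mutual information from entropies, using I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); all helper names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

def H(dist):
    """Entropy in bits of a dict {outcome: probability}."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginal distribution of the variables at positions idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return out

def mi(joint, a, b, cond=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); with cond=() this is I(A;B)."""
    c = tuple(cond)
    return (H(marginal(joint, (a,) + c)) + H(marginal(joint, (b,) + c))
            - H(marginal(joint, (a, b) + c)) - H(marginal(joint, c)))

# Toy model: Y and Z are independent fair bits, the secret is X = Y xor Z.
# Outcomes are tuples (x, y, z), each with probability 1/4.
joint = {(y ^ z, y, z): 0.25 for y, z in product((0, 1), repeat=2)}
X, Y, Z = 0, 1, 2  # positions of the variables inside each outcome tuple

i_xy = mi(joint, X, Y)                  # Y alone tells nothing about X
i_xy_given_z = mi(joint, X, Y, (Z,))    # with Z known, Y reveals X completely
print(i_xy, i_xy_given_z, i_xy - i_xy_given_z)  # 0.0 1.0 -1.0
```

So I(X;Y;Z) = -1 bit here: knowing Z makes Y go from useless to fully informative about X.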


