3 The Bayesian Network Representation


3.1 Exploiting Independence Properties

3.1.1 Independent Random Variables

P(X1, ..., Xn) = P(X1) P(X2) ··· P(Xn)

3.1.2 The Conditional Parameterization

P(I, S) = P(I) P(S | I)
chain rule: P(I, D, G, S, L) = P(I) P(D) P(G | I, D) P(S | I) P(L | G)
(the student example: I = intelligence, D = course difficulty, G = grade, S = SAT score, L = recommendation letter)

3.1.3 The Naive Bayes Model

CPDs: conditional probability distributions

naive Bayes assumption: the features are conditionally independent given the instance’s class. (In other words, within each class of instances, the different properties can be determined independently.)
(Xi ⊥ X−i | C) for all i, where X−i = {X1, ..., Xn} − {Xi}

(figure: the naive Bayes network structure, with the class C as the single parent of each feature Xi)

P(C, X1, ..., Xn) = P(C) ∏_{i=1}^n P(Xi | C)

Concept: The Naive Bayes Model:
It is often used in practice, because of its simplicity and the small number of parameters required. The model is generally used for classification.

P(C = c1 | x1, ..., xn) / P(C = c2 | x1, ..., xn) = [P(C = c1) / P(C = c2)] · ∏_{i=1}^n [P(xi | C = c1) / P(xi | C = c2)]
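The ratio above can be evaluated directly, since the evidence term P(x1, ..., xn) cancels. A minimal sketch with made-up priors and likelihood tables for a two-class, two-feature toy problem (all numbers are illustrative, not from the text):

```python
# Toy naive Bayes classifier: compare P(C=c) * prod_i P(x_i | C=c)
# across classes. Priors and likelihoods below are invented numbers.

prior = {"c1": 0.6, "c2": 0.4}
# likelihood[c][i][v] = P(X_i = v | C = c), for two binary features
likelihood = {
    "c1": [{0: 0.2, 1: 0.8}, {0: 0.7, 1: 0.3}],
    "c2": [{0: 0.9, 1: 0.1}, {0: 0.4, 1: 0.6}],
}

def posterior_ratio(x):
    """P(C=c1 | x) / P(C=c2 | x): the evidence P(x) cancels out."""
    num, den = prior["c1"], prior["c2"]
    for i, v in enumerate(x):
        num *= likelihood["c1"][i][v]
        den *= likelihood["c2"][i][v]
    return num / den

r = posterior_ratio((1, 0))
print("c1" if r > 1 else "c2")  # a ratio > 1 favors class c1
```

In practice the products are computed in log space to avoid underflow, but the decision rule is the same.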

3.2 Bayesian Networks

They are not restricted to representing distributions satisfying the strong independence assumptions implicit in the naive Bayes model.

DAG: directed acyclic graph

3.2.1 The Student Example Revisited

local probability models

3.2.1.2 Reasoning Patterns
A joint distribution PB specifies (albeit implicitly) the probability PB(Y = y | E = e) of any event y given any observations e.

causal reasoning or prediction:
from causes to effects
intelligence and the difficulty of the class => the recommendation letter
evidential reasoning or explanation:
from effects to causes
grade and the recommendation letter => intelligence
intercausal reasoning:
Explaining away is an instance of a general reasoning pattern called intercausal reasoning, where different causes of the same effect can interact. This type of reasoning is a very common pattern in human reasoning.
For example, when we have fever and a sore throat, and are concerned about mononucleosis, we are greatly relieved to be told we have the flu.

3.2.2.2 Bayesian Network Semantics
Definition: Bayesian network structure, local independencies
A Bayesian network structure G is a directed acyclic graph whose nodes represent random variables X1, ..., Xn. Let Pa_Xi^G denote the parents of Xi in G, and NonDescendants_Xi denote the variables in the graph that are not descendants of Xi. Then G encodes the following set of conditional independence assumptions, called the local independencies, and denoted by Il(G):
For each variable Xi:

(Xi ⊥ NonDescendants_Xi | Pa_Xi^G)
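The local independencies are purely graphical, so they can be read off mechanically. A small sketch in plain Python (no libraries), with the graph given as a child -> parents mapping; the network used is the student example from the text:

```python
# Enumerate the local independencies Il(G) of a DAG:
# (Xi ⊥ NonDescendants_Xi \ Pa_Xi | Pa_Xi) for each variable Xi.

def children_of(parents):
    ch = {x: set() for x in parents}
    for x, ps in parents.items():
        for p in ps:
            ch[p].add(x)
    return ch

def descendants(x, ch):
    seen, stack = set(), [x]
    while stack:
        for c in ch[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def local_independencies(parents):
    ch = children_of(parents)
    out = {}
    for x, ps in parents.items():
        nondesc = set(parents) - {x} - descendants(x, ch)
        out[x] = (nondesc - set(ps), set(ps))  # (others ⊥ ... | parents)
    return out

parents = {"I": [], "D": [], "G": ["I", "D"], "S": ["I"], "L": ["G"]}
for x, (nd, pa) in local_independencies(parents).items():
    print(f"({x} ⊥ {sorted(nd)} | {sorted(pa)})")
```

For instance, the letter L depends only on the grade G: the loop prints (L ⊥ ['D', 'I', 'S'] | ['G']).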

3.2.3 Graphs and Distributions

I-Maps

Definition: independencies in P
Let P be a distribution over X . We define I(P) to be the set of independence assertions of the
form (X ⊥ Y | Z) that hold in P .

Definition: I-map
Let K be any graph object associated with a set of independencies I(K). We say that K is an
I-map for a set of independencies I if I(K) ⊆ I.

G is an I-map for P if G is an I-map for I(P): any independence that G asserts must also hold in P .

I-Map to Factorization

Definition: factorization
Let G be a BN graph over the variables X1 , … , Xn . We say that a distribution P over the same
space factorizes according to G if P can be expressed as a product:

P(X1, ..., Xn) = ∏_{i=1}^n P(Xi | Pa_Xi^G)

This equation is called the chain rule for Bayesian networks.
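The chain rule makes computing any full joint entry a simple product of CPD lookups. A sketch for the student network, with invented CPD numbers (any valid CPDs would do):

```python
# The BN chain rule: P(x1,...,xn) = prod_i P(xi | pa_xi).
# Variables are binary; CPD numbers below are illustrative only.

parents = {"I": [], "D": [], "G": ["I", "D"], "S": ["I"], "L": ["G"]}

# CPD tables: key = (own value, tuple of parent values) -> probability.
cpd = {
    "I": {(1, ()): 0.3, (0, ()): 0.7},
    "D": {(1, ()): 0.4, (0, ()): 0.6},
    "G": {(1, (1, 1)): 0.5, (0, (1, 1)): 0.5,
          (1, (1, 0)): 0.9, (0, (1, 0)): 0.1,
          (1, (0, 1)): 0.05, (0, (0, 1)): 0.95,
          (1, (0, 0)): 0.3, (0, (0, 0)): 0.7},
    "S": {(1, (1,)): 0.8, (0, (1,)): 0.2,
          (1, (0,)): 0.05, (0, (0,)): 0.95},
    "L": {(1, (1,)): 0.9, (0, (1,)): 0.1,
          (1, (0,)): 0.4, (0, (0,)): 0.6},
}

def joint(assignment):
    """Multiply each variable's CPD entry at its parents' values."""
    p = 1.0
    for x, pa in parents.items():
        p *= cpd[x][(assignment[x], tuple(assignment[q] for q in pa))]
    return p

# P(i1, d0, g1, s1, l1) = 0.3 * 0.6 * 0.9 * 0.8 * 0.9
print(joint({"I": 1, "D": 0, "G": 1, "S": 1, "L": 1}))
```

Note the economy: five CPD tables with a handful of parameters each replace a full joint table over 2^5 = 32 entries.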

Factorization to I-Map

3.3 Independencies in Graphs


3.3.1 D-separation

v-structure: X → Z ← Y
active trail: Let G be a BN structure, and X1 ⇌ ... ⇌ Xn a trail in G. Let Z be a subset of observed variables. The trail X1 ⇌ ... ⇌ Xn is active given Z if
• whenever we have a v-structure Xi−1 → Xi ← Xi+1, then Xi or one of its descendants is in Z;
• no other node along the trail is in Z.
d-separation: Let X, Y , Z be three sets of nodes in G. We say that X and Y are d-separated given Z, denoted d-sepG (X; Y | Z), if there is no active trail between any node X ∈ X and Y ∈ Y given Z.
We use I(G) to denote the set of independencies that correspond to d-separation:
I(G) = {(X ⊥ Y | Z) : d-sepG (X; Y | Z)}.
This set is also called the set of global Markov independencies.
soundness of d-separation: if we find that two nodes X and Y are d-separated given some Z, then we are guaranteed that they are, in fact, conditionally independent given Z.
completeness of d-separation: ideally, d-separation would detect all possible independencies; this holds only in a weaker sense, via the notion of faithfulness below.
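The definition above can be checked directly on small graphs by brute force: enumerate every simple (undirected) trail between the two nodes and test whether any is active given Z. This is exponential in graph size and only a sketch for toy examples; practical implementations use the linear-time reachability algorithm instead.

```python
# Brute-force d-separation check for small DAGs (child -> parents mapping).

def d_separated(parents, x, y, z):
    ch = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:
            ch[p].add(v)

    def descendants(v):
        seen, stack = set(), [v]
        while stack:
            for c in ch[stack.pop()]:
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

    def active(trail):
        # Apply the two conditions from the active-trail definition.
        for i in range(1, len(trail) - 1):
            a, b, c = trail[i - 1], trail[i], trail[i + 1]
            if b in ch[a] and b in ch[c]:          # v-structure a -> b <- c
                if b not in z and not (descendants(b) & z):
                    return False
            elif b in z:                            # non-collider observed
                return False
        return True

    def trails(cur, goal, path):
        if cur == goal:
            yield path
            return
        for nxt in ch[cur] | set(parents[cur]):
            if nxt not in path:
                yield from trails(nxt, goal, path + [nxt])

    return not any(active(t) for t in trails(x, y, [x]))

parents = {"I": [], "D": [], "G": ["I", "D"], "S": ["I"], "L": ["G"]}
print(d_separated(parents, "D", "I", set()))   # True: marginally independent
print(d_separated(parents, "D", "I", {"G"}))   # False: observing G activates D -> G <- I
print(d_separated(parents, "D", "I", {"L"}))   # False: a descendant of G also activates it
```

The last two calls illustrate explaining away: the v-structure D → G ← I is activated by observing G or any of its descendants.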

Definition faithful: A distribution P is faithful to G if, whenever (X ⊥ Y | Z) ∈ I(P ), then d-sepG (X; Y | Z). In other words, any independence in P is reflected in the d-separation properties of the graph.
if X and Y are not d-separated given Z in G, then X and Y are dependent in some distribution P that factorizes over G.

I-Equivalence

I = independence

(figure: three networks (a), (b), (c) over X, Y, Z — the chains X → Z → Y and X ← Z ← Y, and the common cause X ← Z → Y)
All three of them encode precisely the same independence assumptions: (X ⊥ Y | Z).

I-equivalence: Two graph structures K1 and K2 over X are I-equivalent if I(K1 ) = I(K2). The set of all graphs over X is partitioned into a set of mutually exclusive and exhaustive I-equivalence classes, which are the set of equivalence classes induced by the I-equivalence relation.

skeleton: the undirected graph obtained from G by replacing every directed edge with an undirected one.
Theorem: Let G1 and G2 be two graphs over X. If G1 and G2 have the same skeleton and the same set of v-structures, then they are I-equivalent.
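This sufficient condition is easy to test mechanically. A sketch (graphs again as child -> parents mappings), using the three-node example from above:

```python
# Check the sufficient condition for I-equivalence:
# same skeleton and same set of v-structures.

def skeleton(parents):
    return {frozenset((x, p)) for x, ps in parents.items() for p in ps}

def v_structures(parents):
    vs, skel = set(), skeleton(parents)
    for x, ps in parents.items():
        for a in ps:
            for b in ps:
                # a -> x <- b with no edge between a and b
                if a < b and frozenset((a, b)) not in skel:
                    vs.add((a, x, b))
    return vs

def maybe_i_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

g1 = {"X": [], "Z": ["X"], "Y": ["Z"]}   # X -> Z -> Y
g2 = {"Y": [], "Z": ["Y"], "X": ["Z"]}   # X <- Z <- Y
g3 = {"X": [], "Y": [], "Z": ["X", "Y"]} # X -> Z <- Y (v-structure)
print(maybe_i_equivalent(g1, g2))  # True: same skeleton, no v-structures
print(maybe_i_equivalent(g1, g3))  # False: g3 has the v-structure (X, Z, Y)
```

The two chains are I-equivalent, while the v-structure is not: it encodes (X ⊥ Y) rather than (X ⊥ Y | Z).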

minimal I-map: A graph K is a minimal I-map for a set of independencies I if it is an I-map for I, and if the removal of even a single edge from K renders it not an I-map.

perfect map: We say that a graph K is a perfect map (P-map) for a set of independencies I if we have that I(K) = I. We say that K is a perfect map for P if I(K) = I(P ).
