Rough Set Theory


I have been working in data mining for about a year now. I often read articles on CSDN and on blogs abroad, but I have never written up summaries of my own, and things fade from memory quickly, so starting now I want to use this blog to record what I learn at work.


This write-up on Rough Sets is organized mainly from the Wikipedia article. I looked through several other tutorials on Rough Sets and none of them felt very clear, while the Wikipedia entry is clear and well written, so I decided to summarize its content here.

Original article: https://en.wikipedia.org/wiki/Rough_set


1. Definitions

1.1 Information system framework

Let I = (\mathbb{U},\mathbb{A}) be an information system (attribute-value system), where:

       \mathbb{U} is a non-empty, finite set of objects (the universe), e.g. \mathbb{U} = \{O_{1}, \ldots, O_{10}\}

       \mathbb{A} is a non-empty, finite set of attributes, e.g. \mathbb{A} = \{P_{1}, \ldots, P_{5}\}

such that a:\mathbb{U} \rightarrow V_a for every a \in \mathbb{A}, where:

      V_a is the set of values that attribute a may take, e.g. V_a = \{0, 1, 2\} if a = P_{1}

      a(x) is the value that object x takes on attribute a, e.g. a(x) = 1 if a = P_{1} and x = O_{1}


An example information system table (reproduced from the linked Wikipedia article):

Object   P1   P2   P3   P4   P5
O1       1    2    0    1    1
O2       1    2    0    1    1
O3       2    0    0    1    0
O4       0    0    1    2    1
O5       2    1    0    2    1
O6       0    0    1    2    2
O7       2    0    0    1    0
O8       0    1    2    2    1
O9       2    1    0    2    2
O10      2    0    0    1    0

With any P \subseteq \mathbb{A} there is an associated equivalence relation \mathrm{IND}(P):

  \mathrm{IND}(P) = \left\{(x,y) \in \mathbb{U}^2 \mid \forall a \in P, a(x)=a(y)\right\}


The relation \mathrm{IND}(P) is called a P-indiscernibility relation.

The partition of \mathbb{U} is a family of all equivalence classes of \mathrm{IND}(P) and is denoted by \mathbb{U}/\mathrm{IND}(P) (or \mathbb{U}/P).

If (x,y)\in \mathrm{IND}(P), then x and y are indiscernible (or indistinguishable) by attributes from P .
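The P-indiscernibility partition \mathbb{U}/P can be computed by simply grouping objects on their attribute values. A minimal sketch; the tiny table below is a made-up illustration, not the article's table:

```python
def partition(table, attrs):
    """Return the equivalence classes of IND(attrs) as a list of sets.

    table: dict mapping object id -> {attribute: value}
    attrs: the attribute subset P
    """
    classes = {}
    for obj, row in table.items():
        # objects with identical values on every attribute in P are indiscernible
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())

# Hypothetical mini-table with two attributes.
table = {
    "O1": {"P1": 1, "P2": 2},
    "O2": {"P1": 1, "P2": 2},
    "O3": {"P1": 2, "P2": 0},
    "O4": {"P1": 0, "P2": 0},
}

print(sorted(sorted(c) for c in partition(table, ["P1", "P2"])))
# [['O1', 'O2'], ['O3'], ['O4']]
```

Passing a smaller attribute subset can only merge classes, never split them, since dropping attributes coarsens the key each object is grouped by.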


For example:

if P = \{P_{1},P_{2},P_{3},P_{4},P_{5}\}, then the equivalence classes are:

  \{O_{1},O_{2}\},\ \{O_{3},O_{7},O_{10}\},\ \{O_{4}\},\ \{O_{5}\},\ \{O_{6}\},\ \{O_{8}\},\ \{O_{9}\}


if attribute P = \{P_{1}\} alone is selected, then the equivalence classes are:

  \{O_{1},O_{2}\},\ \{O_{3},O_{5},O_{7},O_{9},O_{10}\},\ \{O_{4},O_{6},O_{8}\}


1.2 Definition of a Rough Set

Let X \subseteq \mathbb{U} be a target set that we wish to represent using attribute subset P.

For example

consider the target set X = \{O_{1},O_{2},O_{3},O_{4}\}, and let attribute subset P = \{P_{1}, P_{2}, P_{3}, P_{4}, P_{5}\}, the full available set of features. The set X cannot be expressed exactly, because in [x]_P, objects \{O_{3}, O_{7}, O_{10}\} are indiscernible. Thus, there is no way to represent any set X which includes O_{3} but excludes objects O_{7} and O_{10}.

However, the target set X can be approximated using only the information contained within P by constructing the P-lower and P-upper approximations of X:

  {\underline P}X= \{x \mid [x]_P \subseteq X\}
  {\overline P}X = \{x \mid [x]_P \cap X \neq \emptyset \}


Lower approximation and positive region


{\underline P}X = \{O_{1}, O_{2}\} \cup \{O_{4}\}  
is the union of all equivalence classes in [x]_P which are contained by (i.e., are subsets of) the target set.

The lower approximation is the complete set of objects in \mathbb{U}/P that can be positively (i.e., unambiguously) classified as belonging to target set X.

Upper approximation and negative region


{\overline P}X = \{O_{1}, O_{2}\} \cup \{O_{4}\} \cup \{O_{3}, O_{7}, O_{10}\}
is the union of all equivalence classes in [x]_P which have non-empty intersection with the target set.

The upper approximation is the complete set of objects in \mathbb{U}/P that cannot be positively (i.e., unambiguously) classified as belonging to the complement (\overline X) of the target set X. In other words, the upper approximation is the complete set of objects that are possibly members of the target set X. The set \mathbb{U}-{\overline P}X therefore represents the negative region, containing the set of objects that can be definitely ruled out as members of the target set.

Boundary region

The boundary region, given by set difference {\overline P}X - {\underline P}X, consists of those objects that can neither be ruled in nor ruled out as members of the target set X.

The rough set

The tuple \langle{\underline P}X,{\overline P}X\rangle composed of the lower and upper approximation is called a rough set; thus, a rough set is composed of two crisp sets, one representing a lower boundary of the target set X, and the other representing an upper boundary of the target set X.



The accuracy of the rough-set representation of the set X is defined as:

  \alpha_{P}(X) = \frac{\left|{\underline P}X\right|}{\left|{\overline P}X\right|}

That is, the accuracy \alpha_{P}(X), with 0 \leq \alpha_{P}(X) \leq 1, is the ratio of the number of objects which can positively be placed in X to the number of objects that can possibly be placed in X – this provides a measure of how closely the rough set approximates the target set. In the example above, \alpha_{P}(X) = 3/6 = 0.5.
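The two approximations and the accuracy ratio fall out of the partition directly. A sketch, using the equivalence classes and target set from this running example:

```python
def approximations(classes, X):
    """P-lower and P-upper approximations of X from the classes of U/P."""
    lower, upper = set(), set()
    for c in classes:
        if c <= X:      # class entirely inside X: certainly in X
            lower |= c
        if c & X:       # class meets X: possibly in X
            upper |= c
    return lower, upper

# Equivalence classes of U/P and the target set from the running example.
classes = [{"O1", "O2"}, {"O3", "O7", "O10"}, {"O4"},
           {"O5"}, {"O6"}, {"O8"}, {"O9"}]
X = {"O1", "O2", "O3", "O4"}

lower, upper = approximations(classes, X)
accuracy = len(lower) / len(upper)
print(sorted(lower))   # ['O1', 'O2', 'O4']
print(sorted(upper))   # ['O1', 'O10', 'O2', 'O3', 'O4', 'O7']
print(accuracy)        # 0.5
```

The boundary region is then simply `upper - lower`, here \{O_{3}, O_{7}, O_{10}\}.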


1.3 Definability


If {\overline P}X = {\underline P}X, we say that X is definable on attribute set P; otherwise, X is undefinable.


  • Set X is internally undefinable if {\underline P}X \neq \emptyset and {\overline P}X = \mathbb{U}. This means that on attribute set P, there are objects which we can be certain belong to target set X, but there are no objects which we can definitively exclude from set X.
  • Set X is externally undefinable if {\underline P}X = \emptyset and {\overline P}X \neq \mathbb{U}. This means that on attribute set P, there are no objects which we can be certain belong to target set X, but there are objects which we can definitively exclude from set X.
  • Set X is totally undefinable if {\underline P}X = \emptyset and {\overline P}X = \mathbb{U}. This means that on attribute set P, there are no objects which we can be certain belong to target set X, and there are no objects which we can definitively exclude from set X. Thus, on attribute set P, we cannot decide whether any object is, or is not, a member of X.
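These cases, together with the definable case and the remaining one (which Wikipedia calls roughly definable: non-empty lower approximation and an upper approximation short of \mathbb{U}), can be folded into one small classifier. A sketch:

```python
def definability(lower, upper, universe):
    """Classify a target set X from its P-approximations, per the cases above."""
    if lower == upper:
        return "definable"
    if lower and upper == universe:
        return "internally undefinable"
    if not lower and upper != universe:
        return "externally undefinable"
    if not lower and upper == universe:
        return "totally undefinable"
    return "roughly definable"   # non-empty lower, upper a proper subset of U

U = {"O1", "O2", "O3", "O4"}
print(definability({"O1"}, {"O1", "O2"}, U))  # roughly definable
print(definability(set(), U, U))              # totally undefinable
```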

1.4 Reduct and core

Formally, a reduct is a subset of attributes \mathrm{RED} \subseteq P such that

  • [x]_{\mathrm{RED}} = [x]_P, that is, the equivalence classes induced by the reduced attribute set \mathrm{RED} are the same as the equivalence class structure induced by the full attribute set P.
  • the attribute set \mathrm{RED} is minimal, in the sense that [x]_{(\mathrm{RED}-\{a\})} \neq [x]_P for any attribute a \in \mathrm{RED}; in other words, no attribute can be removed from set \mathrm{RED} without changing the equivalence classes [x]_P.

So, for example:

attribute set \{P_3,P_4,P_5\} is a reduct, and the equivalence-class structure it induces is:

  \{O_{1},O_{2}\},\ \{O_{3},O_{7},O_{10}\},\ \{O_{4}\},\ \{O_{5}\},\ \{O_{6}\},\ \{O_{8}\},\ \{O_{9}\}

which is the same as that of P = \{P_{1}, P_{2}, P_{3}, P_{4}, P_{5}\}. So we can say that the former is a reduct of the latter. Moreover, a reduct is not unique: in this instance, \{P_1,P_2,P_5\} is also a reduct of P = \{P_{1}, P_{2}, P_{3}, P_{4}, P_{5}\}.



The set of attributes which is common to all reducts is called the core.

For the two reducts \{P_1,P_2,P_5\} and \{P_3,P_4,P_5\}, the common attribute is P_{5}, which is therefore the core of this equivalence-class structure: if P_{5} is dropped from the attribute set, the equivalence-class structure changes.


Note that it is possible for the core to be empty, which means that there is no indispensable attribute: any single attribute in such an information system can be deleted without altering the equivalence-class structure. In such cases, there is no essential or necessary attribute which is required for the class structure to be represented.
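For small tables, reducts and the core can be found by brute force: check every attribute subset, smallest first, and keep those that reproduce the full equivalence-class structure without containing an already-found reduct. A sketch; the three-object table is hypothetical, and the search is exponential, so this is only for tiny examples:

```python
from itertools import combinations

def classes_of(table, attrs):
    """Equivalence-class structure induced by an attribute subset, as a hashable value."""
    classes = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return frozenset(frozenset(c) for c in classes.values())

def reducts(table, attrs):
    """All minimal attribute subsets inducing the same classes as the full set."""
    full = classes_of(table, attrs)
    found = []
    for r in range(1, len(attrs) + 1):
        for subset in combinations(attrs, r):
            # a proper superset of a reduct already found cannot be minimal
            if any(set(subset) > set(f) for f in found):
                continue
            if classes_of(table, subset) == full:
                found.append(subset)
    return found

def core(reds):
    """Attributes common to all reducts."""
    return set.intersection(*map(set, reds)) if reds else set()

# Hypothetical three-object table.
table = {
    "O1": {"a": 1, "b": 0, "c": 1},
    "O2": {"a": 1, "b": 1, "c": 1},
    "O3": {"a": 0, "b": 1, "c": 0},
}
reds = reducts(table, ["a", "b", "c"])
print(reds)        # [('a', 'b'), ('b', 'c')]
print(core(reds))  # {'b'}
```

Because subsets are visited in order of increasing size, any subset that matches the full structure and contains no earlier reduct is automatically minimal.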


The article goes on to cover Attribute dependency, Rule extraction, and Incomplete data; I'll come back to those three parts later.

————————————————————————————————————————

After a long break, I can finally get back to my own little research. I recently read a paper that combines Rough Set theory with DS (Dempster–Shafer) evidence theory, using rough sets to compute weights. Let me organize that here:

Basic definitions:


Since the formulas used in the paper differ slightly from those on the wiki, let's restate the definitions.

Define an information system S = (\mathbb{U}, C, D), where:

\mathbb{U} is a non-empty set of objects, i.e. the O_1, O_2, \ldots above;

C = \{C_1, C_2, \ldots, C_m\} is the set of condition attributes, m in total;

D = \{D_1, D_2, \ldots, D_n\} is the set of decision attributes, n in total, though usually we have only one.




The equivalence relation over the full condition-attribute set C is then defined as:

  \mathrm{IND}(C) = \left\{(x,y) \in \mathbb{U}^2 \mid \forall c \in C,\ c(x) = c(y)\right\}

That is, two objects x and y are equivalent (i.e. indiscernible) if their values agree on every condition attribute in C.




Similarly, the equivalence relation after removing C_j (the j-th condition attribute) from the condition-attribute set C is defined as:

  \mathrm{IND}(C - \{C_j\}) = \left\{(x,y) \in \mathbb{U}^2 \mid \forall c \in C - \{C_j\},\ c(x) = c(y)\right\}

That is, ignoring the j-th condition attribute, if x and y agree on all the remaining condition attributes, they are equivalent, i.e. indiscernible.



For the decision attributes D there is likewise an equivalence relation, defined as:

  \mathrm{IND}(D) = \left\{(x,y) \in \mathbb{U}^2 \mid \forall d \in D,\ d(x) = d(y)\right\}

This one is also simple: two objects x and y are equivalent as long as all their decision-attribute values agree.




From the equivalence relations above, we obtain the knowledge systems for the different settings:

\mathbb{U}/\mathrm{IND}(C) is the knowledge system of the objects \mathbb{U} with respect to the full condition-attribute set C, equivalent to P = \{P_{1},P_{2},P_{3},P_{4},P_{5}\} above;

\mathbb{U}/\mathrm{IND}(C - \{C_j\}) is the knowledge system of \mathbb{U} after the j-th condition attribute has been removed, and \mathbb{U}/\mathrm{IND}(D) is the knowledge system of \mathbb{U} with respect to the decision attributes.



Weight calculation:

With the basic definitions above in place, we can compute each condition attribute's weight (its importance to the decision attributes):

Definition 1: the dependency of the decision attributes D on the full condition-attribute set C.

The formula is defined as:

  E(D \mid C) = -\sum_{[x] \in \mathbb{U}/\mathrm{IND}(C)} \frac{|[x]|}{|\mathbb{U}|} \sum_{[y] \in \mathbb{U}/\mathrm{IND}(D)} \frac{|[x] \cap [y]|}{|[x]|} \ln \frac{|[x] \cap [y]|}{|[x]|}

This formula is based on entropy theory, where:

|[x]| / |\mathbb{U}| is the fraction of all objects that fall into the class [x] of the knowledge system \mathbb{U}/\mathrm{IND}(C);

|[x] \cap [y]| / |[x]| is, when relating the decision attributes to the condition attributes, the fraction of a class [x] \in \mathbb{U}/\mathrm{IND}(C) that falls inside a class [y] \in \mathbb{U}/\mathrm{IND}(D).



Definition 2: the significance of the j-th condition attribute.

The significance of the j-th condition attribute is computed by removing it from the condition-attribute set and observing how the entropy of the system changes. The formula is defined as:

  \mathrm{sig}(C_j) = E(D \mid C - \{C_j\}) - E(D \mid C)

That is, the significance of the j-th condition attribute is obtained as the difference between the entropy without it and the entropy with it.



Definition 3: computing the (normalized) weights.

Definition 2 lets us compute the significance of every condition attribute; normalizing these significances yields each condition attribute's weight. The formula is defined as:

  w_j = \frac{\mathrm{sig}(C_j)}{\sum_{k=1}^{m} \mathrm{sig}(C_k)}
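Definitions 1–3 can be wired together in a few lines, reading Definition 1 as the conditional entropy E(D|C): each class of \mathbb{U}/\mathrm{IND}(C), weighted by its share of the objects, contributes the entropy of its overlap fractions with the decision classes, with 0 \times \ln 0 taken as 0. A sketch under that reading; the column names E1, E2, E3, D are stand-ins of my own:

```python
from math import log

def classes_of(table, attrs):
    """Equivalence classes of IND(attrs)."""
    classes = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return list(classes.values())

def cond_entropy(n, cond_classes, dec_classes):
    """Definition 1: E(D|C) = -sum p([x]) sum p([y]|[x]) ln p([y]|[x])."""
    e = 0.0
    for cx in cond_classes:
        px = len(cx) / n                 # |[x]| / |U|
        for dy in dec_classes:
            p = len(cx & dy) / len(cx)   # |[x] ∩ [y]| / |[x]|
            if p > 0:                    # 0 * ln 0 is taken to be 0
                e -= px * p * log(p)
    return e

def weights(table, cond_attrs, dec_attrs):
    """Definitions 2 and 3: significances by attribute removal, then normalize."""
    n = len(table)
    dec = classes_of(table, dec_attrs)
    base = cond_entropy(n, classes_of(table, cond_attrs), dec)
    sig = [cond_entropy(n, classes_of(table, [c for c in cond_attrs if c != a]), dec) - base
           for a in cond_attrs]
    total = sum(sig)
    return [s / total if total else 0.0 for s in sig]

# The seven-company expert data from the example that follows.
data = {
    1: {"E1": 1, "E2": 0, "E3": 1, "D": 1},
    2: {"E1": 1, "E2": 1, "E3": 1, "D": 1},
    3: {"E1": 0, "E2": 1, "E3": 1, "D": 0},
    4: {"E1": 1, "E2": 1, "E3": 1, "D": 1},
    5: {"E1": 0, "E2": 0, "E3": 0, "D": 0},
    6: {"E1": 0, "E2": 0, "E3": 0, "D": 0},
    7: {"E1": 0, "E2": 0, "E3": 0, "D": 1},
}
print(weights(data, ["E1", "E2", "E3"], ["D"]))  # [1.0, 0.0, 0.0]
```

Note that removing an attribute can only coarsen the partition, so the conditional entropy can only stay the same or grow, and the significances are never negative.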


Likewise, let's work through an example:

Suppose three economic experts each predict whether seven companies will go bankrupt (condition attributes, |C| = 3), and a retrospective study gives us the companies' actual outcomes (decision attribute, |D| = 1). The data are shown in the table below:

company | Expert 1 | Expert 2 | Expert 3 | Actual financial condition
   1    |    1     |    0     |    1     |    1
   2    |    1     |    1     |    1     |    1
   3    |    0     |    1     |    1     |    0
   4    |    1     |    1     |    1     |    1
   5    |    0     |    0     |    0     |    0
   6    |    0     |    0     |    0     |    0
   7    |    0     |    0     |    0     |    1

OK, from the data in the table we can now compute the weight of each expert's predictions.


1. First, following the basic definitions, we compute the equivalence relations and knowledge systems we will need:

  \mathbb{U}/\mathrm{IND}(C) = \{\{1\},\ \{2,4\},\ \{3\},\ \{5,6,7\}\}
  \mathbb{U}/\mathrm{IND}(D) = \{\{1,2,4,7\},\ \{3,5,6\}\}

Since our weighting principle is entropy-based, we first compute the entropy of the system in which all the experts participate. By Definition 1, only the class \{5,6,7\} is mixed with respect to the decision attribute, so:

  E(D \mid C) = -\frac{3}{7}\left(\frac{1}{3}\ln\frac{1}{3} + \frac{2}{3}\ln\frac{2}{3}\right) \approx 0.273

By Definition 2, removing Expert 1 merges \{2,4\} and \{3\} into one class, raising the entropy to E(D \mid C - \{C_1\}) \approx 0.546, so \mathrm{sig}(C_1) \approx 0.273; removing Expert 2 or Expert 3 leaves the entropy unchanged, so \mathrm{sig}(C_2) = \mathrm{sig}(C_3) = 0.

One point to note here is the value of 0 \times \ln 0 in these computations: in information-entropy theory, 0 \times \ln 0 is taken to be 0 (see the Wikipedia article on information entropy).

By Definition 3, we get:

  w_1 = 1, \quad w_2 = w_3 = 0

From the computations above we conclude that Expert 1 carries a decision weight of 1, while Experts 2 and 3 both carry weight zero (in other words, Expert 1's predictions are reliable, while Experts 2 and 3 are not, at least compared with Expert 1).


OK, that's all.

Thanks for reading!


