Hadoop 2.7.2 Study Notes 05 - Hadoop FileSystem API Definition - Notation Used in This Document

Source: Internet | Published by: 关于发动机的软件 | Edited by: 程序博客网 | Time: 2024/06/12 22:02

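A formal notation such as the Z notation could be used to define the behaviour of the Hadoop filesystem precisely.

However, it has several drawbacks (not listed here one by one), so this document describes the filesystem's behaviour with a plain mathematical notation instead.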

1. Notation used in this document

The notation includes a subset of the Z notation syntax, written in ASCII form. Python list notation is used to manipulate lists and sets.

iff : iff If and only if
⇒ : implies
→ : --> total function
↛ : -> partial function

∩ : ^: Set Intersection
∪ : +: Set Union
\ : -: Set Difference
∃ : exists Exists predicate
∀ : forall: For all predicate
= : == Equals operator
≠ : != operator. In Java z ≠ y is written as !( z.equals(y)) for all non-simple datatypes
≡ : equivalent-to equivalence operator. This is stricter than equals.
∅ : {} Empty Set. ∅ ≡ {}
≈ : approximately-equal-to operator
¬ : not Not operator. In Java, !
∄ : does-not-exist: Does not exist predicate. Equivalent to not exists
∧ : and : logical and operator. In Java, &&
∨ : or : logical or operator. In Java, ||
∈ : in : element of
∉ : not in : not an element of
⊆ : subset-or-equal-to the subset or equality condition
⊂ : subset-of the proper subset condition
| p | : len(p) the size of a variable
:= : = :
# : Python-style comments
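happens-before : happens-before : Lamport's ordering relationship as defined in Time, Clocks and the Ordering of Events in a Distributed System. (The paper does not decide which of two events a and b comes first by physical time; it uses three rules instead. Sending a message and receiving a message are both events in a process. If a and b occur in the same process and a comes before b, then a happens before b. If a is the sending of a message and b is the receipt of that same message, then a happens before b. If a happens before b and b happens before c, then a happens before c. If neither event happens before the other, a and b are considered concurrent.)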
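
The three rules can be sketched in Python as a transitive closure over program-order and message edges. This is only an illustration: the event names, the `processes` and `messages` inputs, and the helpers `happens_before` and `concurrent` are made up here and appear neither in the Hadoop specification nor in Lamport's paper.

```python
from itertools import product


def happens_before(processes, messages):
    """Return the set of (a, b) pairs such that a happens-before b.

    processes: list of per-process event lists, each in local order.
    messages:  set of (send_event, receive_event) pairs.
    """
    events = [e for proc in processes for e in proc]
    hb = set()
    # Rule 1: events inside one process follow their local order.
    for proc in processes:
        for i, a in enumerate(proc):
            for b in proc[i + 1:]:
                hb.add((a, b))
    # Rule 2: sending a message happens before receiving it.
    hb |= set(messages)
    # Rule 3: transitivity, repeated until no new pair appears.
    changed = True
    while changed:
        changed = False
        for a, b, c in product(events, repeat=3):
            if (a, b) in hb and (b, c) in hb and (a, c) not in hb:
                hb.add((a, c))
                changed = True
    return hb


def concurrent(hb, a, b):
    """Two distinct events are concurrent if neither precedes the other."""
    return a != b and (a, b) not in hb and (b, a) not in hb


# Hypothetical run: process P sends a message at p1, process Q receives it at q2.
P = ["p1", "p2"]
Q = ["q1", "q2", "q3"]
hb = happens_before([P, Q], {("p1", "q2")})
print(("p1", "q3") in hb)          # p1 -> q2 -> q3 follows by transitivity
print(concurrent(hb, "p2", "q1"))  # no chain of rules links p2 and q1
```

The cubic closure loop is chosen for readability; it is fine for a handful of events but not meant for large traces.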

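Python data structures (Sets, Lists, Maps and Strings) are used here.

(1) Lists

A list L is an ordered sequence of elements [e1, e2, ... en]
len(L) is the number of elements in the list.
Elements are addressed by a 0-based index: e1 == L[0]
Python slicing yields sub-lists. The source text gives L[0:3] == [e1,e2] and L[:-1] == en, but I believe the results should be L[0:3] == [e1,e2,e3] and L[:-1] == [e1,e2, ... en-1]; the last element on its own is L[-1] == en.
Lists can be concatenated into a new list: L' = L + [ e3 ]
Entries can be removed from a list: L' = L - [ e2, e1 ]. This differs from Python's del operation, which modifies the list in place.
The membership predicate in returns true if an element is contained in the list: e2 in L
New lists can be created with Python-style list comprehensions: L' = [ x for x in L where x < 5] (Python itself writes if where this notation writes where).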
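
To check the list rules against real Python, here is a small sketch. The `list_minus` helper is a made-up stand-in for the notation's `-` operator, which Python lists do not provide, and the element values are placeholders.

```python
def list_minus(items, removed):
    """Notation-style `L - [...]`: a new list without the removed entries."""
    return [x for x in items if x not in removed]


L = ["e1", "e2", "e3", "e4"]

count = len(L)                 # number of elements
first = L[0]                   # 0-based indexing
head = L[0:3]                  # the first three elements
all_but_last = L[:-1]          # everything except the last element
last = L[-1]                   # the last element alone
L2 = L + ["e5"]                # concatenation builds a new list
L3 = list_minus(L, ["e2", "e1"])            # removal builds a new list too
has_e2 = "e2" in L             # membership predicate
small = [x for x in [3, 7, 4, 9] if x < 5]  # `where` is spelled `if` in Python

print(head, all_but_last, last, L3, small)
```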

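(2) Sets

A set is an unordered collection of elements enclosed in { and }.
Sets are declared with {} rather than Python's set([list]). The assumption is that the contents make it clear whether a value is a set or a dictionary.
The empty set {} has no elements.
All the usual set concepts apply.
The membership predicate in also applies to sets.
New sets can be created with a comprehension in the same way: S' = {s for s in S where len(s)==2}
For a set s, len(s) returns the number of elements.
The - operator returns a new set that excludes the elements of the set on its right-hand side.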
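
The same rules in runnable Python, as a rough sketch with placeholder values. Two spellings differ from the notation: Python's `{}` is an empty dictionary, so the empty set is `set()`, and union and intersection are written `|` and `&` where the notation uses `+` and `^`.

```python
S = {"ab", "cde", "fg"}
empty = set()                        # `{}` would be an empty dict in Python

S2 = {s for s in S if len(s) == 2}   # set comprehension, `where` becomes `if`
S3 = S - {"ab"}                      # set difference returns a new set
union = S | {"x"}                    # the notation writes S + {"x"}
common = S & {"ab", "zz"}            # the notation writes S ^ {"ab", "zz"}

print(len(S), "ab" in S, sorted(S2), sorted(S3))
print(type({}).__name__, type(empty).__name__)
```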

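(3) Maps

Maps resemble Python dictionaries: {"key":value, "key2":value2}

keys(Map) is the set of all keys in the map.
k in Map is equivalent to k in keys(Map)
The empty map is written {:}
The - operator returns a new map without the entries whose keys are listed on its right-hand side.
len(Map) returns the number of entries in the map.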
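
A short Python sketch of the map rules. The `map_minus` helper is a made-up stand-in for the notation's `-` on maps, since Python dictionaries have no subtraction operator, and `{:}` is not valid Python, so the empty map is written `{}`.

```python
def map_minus(mapping, keys):
    """Notation-style `Map - keys`: a new dict without the listed keys."""
    return {k: v for k, v in mapping.items() if k not in keys}


M = {"key": 1, "key2": 2}
empty = {}                      # the notation writes {:}

key_set = set(M)                # keys(Map)
M2 = map_minus(M, {"key2"})     # M itself is left unchanged

print(len(M), "key" in M, "key" in key_set, M2)
```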

(4) Strings

Strings are lists of characters enclosed in double quotes:

"abc" == ['a','b','c']

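(5) State immutability

All declared system state is immutable. By convention a single-quote suffix marks the state of the system after an operation. For example: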

L' = L + ['d','e']
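
In Python the same idea can be sketched with a tuple, which cannot be modified in place. The name `L_prime` stands for L', since Python identifiers cannot contain a quote.

```python
L = ('a', 'b', 'c')            # the state before the operation
L_prime = L + ('d', 'e')       # L' : a new value, L itself is untouched

print(L, L_prime)
```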

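(6) Function specifications

A function is specified by a set of preconditions and a set of postconditions; the postconditions define the new state of the system after the call and the function's return value.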
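
A minimal sketch of this shape in Python, assuming the filesystem state is modelled as a frozenset of paths. The operation `remove_entry` is a toy invented for illustration and does not reproduce the semantics of any real Hadoop call.

```python
def remove_entry(fs, path):
    """Toy operation written as preconditions plus postconditions.

    fs is a frozenset of paths standing in for the filesystem state.
    Returns (fs_prime, result): the new state and the return value.
    """
    # Precondition: the path must exist in the current state.
    if path not in fs:
        raise FileNotFoundError(path)
    # Postcondition: the new state is the old state without the path.
    fs_prime = fs - {path}
    # Postcondition: the operation reports success.
    result = True
    # Sanity check of the postconditions on the new state.
    assert path not in fs_prime and fs_prime <= fs
    return fs_prime, result


FS = frozenset({"/a", "/b"})
FS_prime, result = remove_entry(FS, "/a")
print(sorted(FS), sorted(FS_prime), result)
```

Returning the new state instead of mutating the old one keeps the before and after states available side by side, which is what the primed names in the notation express.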

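2. Exceptions

In classic specification languages, preconditions must be satisfied; otherwise a failure condition is raised.

Hadoop additionally needs to state which failure results when a precondition is not met, usually which exception is raised.

raise <exception-name> indicates that an exception is raised.

It can be used inside an if-then-else:

if not exists(FS, Path) : raise IOException

When an implementation may raise any one of several exceptions, a set of exceptions is given:

if not exists(FS, Path) : raise {FileNotFoundException, IOException}

Exceptions listed earlier in the set are preferred over later ones, because they help more in diagnosing the problem.

We also need to distinguish preconditions that must be satisfied from those that should be satisfied. A specification may mark conditions with Should:, and implementations should meet them; where such an entry specifies a stricter outcome, that outcome should be preferred. For example: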

Should:

if not exists(FS, Path) : raise FileNotFoundException
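
The difference between the permitted set of exceptions and the preferred one can be sketched in Python. Here Python's `FileNotFoundError` and `OSError` stand in for Java's `FileNotFoundException` and `IOException` (in both languages the former is a subclass of the latter), and the dictionary `FS` and the functions `open_strict`, `open_loose` and `raised_by` are hypothetical names used only for this example.

```python
def exists(fs, path):
    return path in fs


def open_strict(fs, path):
    # "Should" behaviour: raise the most specific exception in the set.
    if not exists(fs, path):
        raise FileNotFoundError(path)
    return fs[path]


def open_loose(fs, path):
    # Still permitted: any member of {FileNotFoundError, OSError}.
    if not exists(fs, path):
        raise OSError("cannot open " + path)
    return fs[path]


def raised_by(opener, fs, path):
    """Return the name of the exception type the opener raises, if any."""
    try:
        opener(fs, path)
    except OSError as e:
        return type(e).__name__
    return None


FS = {"/data/a.txt": b"hello"}
for opener in (open_strict, open_loose):
    print(opener.__name__, raised_by(opener, FS, "/missing"))
```

A caller that catches the general exception handles both variants, but only the strict variant tells it that the path was missing.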

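3. Conditions

Further conditions are used in the definitions of preconditions and postconditions. For example:

`supported(instance, method)`

It states that instance must implement the method; an implementation that does not raises UnsupportedOperation instead.

For example, one precondition of FSDataInputStream.seek is that the implementing class must support Seekable.seek: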

supported(FDIS, Seekable.seek) else raise UnsupportedOperation
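
One way to picture the `supported` condition is the following Python sketch. It assumes that "supported" can be approximated by checking for a callable attribute, which is a simplification: a real class may declare the method and still refuse the call. The classes `SeekableStream` and `ForwardOnlyStream` are invented for the example, and Python's `io.UnsupportedOperation` stands in for the exception named in the notation.

```python
import io


def supported(instance, method_name):
    """True if the instance provides a callable attribute of that name."""
    return callable(getattr(instance, method_name, None))


class SeekableStream:
    def __init__(self, data):
        self._data, self._pos = data, 0

    def seek(self, pos):
        self._pos = pos

    def tell(self):
        return self._pos


class ForwardOnlyStream:
    def __init__(self, data):
        self._data = data


def seek(stream, pos):
    # Precondition: supported(stream, seek) else raise UnsupportedOperation
    if not supported(stream, "seek"):
        raise io.UnsupportedOperation("seek")
    stream.seek(pos)
    return stream


moved = seek(SeekableStream(b"abc"), 2)
print(moved.tell(), supported(ForwardOnlyStream(b"abc"), "seek"))
```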

0 0