Notes on "RDF Graph Partitions: a Brief Survey"

Source: Internet. Published by: 软件系统技术说明书. Editor: 程序博客网. Time: 2024/06/05 11:44

Abstract

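The paper motivates graph partitioning and surveys existing solutions. It approaches the partitioning problem through classical graph theory, and it proposes four ways to transform an RDF graph into a classical graph.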

Introduction

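RDF is the core data model of the Semantic Web and of Linked Data environments.
RDF graphs are often too large to be processed on a single machine. The earliest solutions came from RDBMSs.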

Preliminaries

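RDF is a very general data model for describing resources and the relationships between them.
**Definition 2.1 (Subject, predicate and object).** The subject denotes a resource, and the object is the value of the relationship. The predicate denotes a trait or aspect of the resource and expresses the relationship between the subject and the object.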
**Definition 2.2 (RDF triple).** Assume that I is the set of all Internationalized Resource Identifier (IRI) references, B an infinite set of blank nodes, and L the set of RDF literals. An RDF triple t is defined as a triple t=<s,p,o> where s∈I∪B is called the subject, p∈I is called the predicate and o∈I∪B∪L is called the object.
**Definition 2.3 (IRIs).** IRIs serve as global identifiers that can be used to identify any resource.
**Definition 2.4 (Literals).** Literals are a set of lexical values. A literal comprises a string and a datatype.
**Definition 2.5 (Blank nodes).** Blank nodes are existential variables used to denote the existence of some resource for which an IRI or literal is not given.
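**Definition 2.6 (RDF graph).** Let L = L_S ∪ L_L ∪ L_D, O = I∪B∪L and S = I∪B. Then G ⊂ S×I×O is a finite subset of RDF triples, which is called an RDF graph.
My first reaction: what on earth is this definition? Read slowly, it only says that an RDF graph is a finite set of triples in which the subject is an IRI or a blank node, the predicate is an IRI, and the object is an IRI, a blank node or a literal.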
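To make Definition 2.6 more concrete, here is a minimal Python sketch of the positional constraint S×I×O. The class names `IRI`, `BlankNode` and `Literal`, the two helper functions and the example IRIs are my own illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = "http://www.w3.org/2001/XMLSchema#string"


def is_rdf_triple(s, p, o) -> bool:
    """Check that (s, p, o) lies in S x I x O, with S = I ∪ B and O = I ∪ B ∪ L."""
    return (
        isinstance(s, (IRI, BlankNode))
        and isinstance(p, IRI)
        and isinstance(o, (IRI, BlankNode, Literal))
    )


def is_rdf_graph(triples) -> bool:
    """An RDF graph is a finite set of triples that all satisfy the constraint."""
    return all(is_rdf_triple(*t) for t in triples)


# Example terms (hypothetical data, chosen only for illustration).
alice = IRI("http://example.org/alice")
knows = IRI("http://xmlns.com/foaf/0.1/knows")
name = IRI("http://xmlns.com/foaf/0.1/name")

graph = {
    (alice, knows, BlankNode("b0")),
    (alice, name, Literal("Alice")),
}
```

A literal in the subject position or a blank node in the predicate position makes `is_rdf_triple` return `False`, which is exactly what the product S×I×O rules out.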
**Definition 2.7 (Directed labeled graph).** A directed labeled graph G is a quadruple G=(V,E,lbl,L), where V is a set of vertices, E={(v_1,v_2) | v_1,v_2∈V} is a set of directed edges, lbl: E∪V→L is a labeling function, and L is a set of labels.
**Definition 2.8 (k-way graph partition).** Given a graph G=(V,E,lbl,L), a k-way graph partitioning, C, is a division of V into k partitions {P_1, P_2, ..., P_k} such that ⋃_{1≤i≤k} P_i = V and P_i ∩ P_j = ∅ for any i≠j.
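The two conditions in Definition 2.8 (the parts cover V, and the parts are pairwise disjoint) can be checked mechanically. The function below is a small sketch of my own; the name `is_k_way_partition` is not from the paper. Like the definition as quoted, it does not require the parts to be non-empty.

```python
def is_k_way_partition(vertices, parts, k):
    """Return True if `parts` is a k-way partition of `vertices` (Definition 2.8)."""
    if len(parts) != k:
        return False
    seen = set()
    for part in parts:
        part = set(part)
        if seen & part:  # P_i ∩ P_j must be empty for i != j
            return False
        seen |= part
    return seen == set(vertices)  # the union of all P_i must equal V


V = {"a", "b", "c", "d", "e"}
ok = is_k_way_partition(V, [{"a", "b"}, {"c"}, {"d", "e"}], k=3)
overlapping = is_k_way_partition(V, [{"a", "b"}, {"b", "c"}, {"d", "e"}], k=3)
incomplete = is_k_way_partition(V, [{"a", "b"}, {"c"}, {"d"}], k=3)
```

Here `ok` is `True`, while `overlapping` fails the disjointness condition and `incomplete` fails the coverage condition.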

RDF Graph Partition

3.1 Classical Graph Partitioning

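The most widely recognized graph partitioning tool is gpmetis, from the METIS software package. gpmetis is based on multilevel graph partitioning and has three phases: coarsening, initial partitioning, and uncoarsening. In the coarsening phase, the input graph is transformed into a sequence of successively smaller graphs by collapsing pairs of adjacent vertices wherever possible. In the initial partitioning phase, once the resulting graph is small enough, it is partitioned with the Kernighan-Lin algorithm. In the uncoarsening phase, the collapsed vertices are expanded again and the partition is projected back onto the larger graphs.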
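The three phases are easier to picture with a toy example. The sketch below shows one coarsening level (greedy matching of adjacent vertices) and the projection of a coarse partition back onto the original vertices. It is not how gpmetis is implemented: the matching rule is a simplification I picked, the refinement that normally accompanies uncoarsening is left out, and a trivial split by vertex id stands in for the Kernighan-Lin step. The function names `coarsen` and `project` are my own.

```python
def coarsen(adj):
    """One coarsening level: greedily match adjacent vertex pairs and collapse them.

    `adj` maps each vertex to the set of its neighbours (undirected, no self-loops).
    Returns (coarse_adj, mapping), where `mapping` sends each original vertex
    to the id of the coarse vertex it was collapsed into.
    """
    mapping = {}
    next_id = 0
    for v in sorted(adj):
        if v in mapping:
            continue
        mapping[v] = next_id
        # Collapse v with its first still-unmatched neighbour, if there is one.
        for u in sorted(adj[v]):
            if u not in mapping:
                mapping[u] = next_id
                break
        next_id += 1

    coarse_adj = {c: set() for c in range(next_id)}
    for v, neighbours in adj.items():
        for u in neighbours:
            cv, cu = mapping[v], mapping[u]
            if cv != cu:  # edges inside a collapsed pair disappear
                coarse_adj[cv].add(cu)
                coarse_adj[cu].add(cv)
    return coarse_adj, mapping


def project(coarse_part, mapping):
    """Uncoarsening: every original vertex inherits the part of its coarse vertex."""
    return {v: coarse_part[c] for v, c in mapping.items()}


# A path graph 0 - 1 - 2 - 3 - 4 - 5.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
coarse_adj, mapping = coarsen(adj)

# Placeholder for the initial partitioning phase: split coarse vertices by id.
coarse_part = {c: 0 if c < len(coarse_adj) / 2 else 1 for c in coarse_adj}
fine_part = project(coarse_part, mapping)
```

On this path graph the six vertices collapse into three coarse vertices, and the projected partition puts vertices 0 to 3 in part 0 and vertices 4 and 5 in part 1. A real multilevel partitioner repeats the coarsening step many times and refines the partition at every level on the way back.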

3.2 Relation Between Classical Graphs and RDF Graphs

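Any graph partitioning algorithm can be applied to an RDF graph once the graph has been converted into a classical graph representation. Converting the triples one by one is the simplest case.
A triple t=<s,p,o>, where s∈I∪B, p∈I and o∈I∪B∪L, is converted into a directed edge s′ ⟶ o′ labeled p′, where s′,o′∈V and p′∈L (for the primed symbols, L is the label set of Definition 2.7, not the set of literals).
Problem: in RDF, the predicates can overlap with the subjects and objects, because the same IRI may be a predicate in one triple and a subject or object in another. In the resulting graph, however, the edge labels p′ and the vertices s′, o′ are disjoint.
Solutions:

1. Convert <s,p,o> into an edge labeled p′ from vertex v_1 (labeled s′) to vertex v_2 (labeled o′), where v_1,v_2∈V and s′,p′,o′∈L.
2. Convert the RDF graph into a hypergraph, in which an edge may connect more than two vertices. With this approach s, p and o all become vertices. However, it needs specially designed algorithms, and it is less efficient than working with simple graphs.
3. Starting from the hypergraph, convert the RDF graph into a bipartite graph (the original notes illustrated this with a figure).
4. Convert each RDF triple into a distinct graph node, and generate edges between those nodes that share a subject, object and/or predicate.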
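As a rough illustration of the first and the fourth transformation in the list above, here is a short Python sketch. The function names and the sample triples are hypothetical. For the fourth transformation I assume that two triple nodes are connected whenever they have any term in common, regardless of position; the notes do not say whether the shared term must sit in the same position, so treat that as my reading.

```python
from itertools import combinations


def to_labeled_edges(triples):
    """First transformation: each triple <s, p, o> becomes a directed edge s -> o labeled p."""
    vertices = set()
    edges = []
    for s, p, o in triples:
        vertices.update((s, o))
        edges.append((s, o, p))
    return vertices, edges


def to_triple_node_graph(triples):
    """Fourth transformation: each triple is a node; nodes sharing a term are adjacent."""
    nodes = list(triples)
    edges = set()
    for i, j in combinations(range(len(nodes)), 2):
        if set(nodes[i]) & set(nodes[j]):  # assumption: any shared term counts
            edges.add((i, j))
    return nodes, edges


# Hypothetical sample data, with terms written as plain strings for brevity.
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:carol", "foaf:name", "Carol"),
]

vertices, labeled_edges = to_labeled_edges(triples)
nodes, shared_edges = to_triple_node_graph(triples)
```

In the first form the graph has four vertices and three labeled edges. In the fourth form it has three nodes (one per triple), with triples 0 and 1 connected through `foaf:knows` and `ex:bob`, and triples 1 and 2 connected through `ex:carol`. The fourth form compares every pair of triples in this naive version, so the number of comparisons grows quadratically with the number of triples.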

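RDF is more general than classical graphs: a directed labeled graph can easily be converted into an RDF graph, but the reverse transformation is cumbersome. This means that the complexity of any problem on RDF graphs is no better than the complexity of the corresponding classical graph problem. Presumably the reasoning is a reduction: since every classical instance can be rewritten as an RDF instance, an algorithm that solves the RDF version also solves the classical version.
I did not fully understand this part either.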
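The easy direction can be sketched in a few lines. The code below turns a directed labeled graph in the sense of Definition 2.7 into a set of triples. The namespace `http://example.org/`, the use of `rdfs:label` for vertex labels and the function name are all assumptions of mine, and the sketch assumes that edge labels are simple tokens that can be appended to an IRI without escaping.

```python
BASE = "http://example.org/"  # hypothetical namespace for minted IRIs
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def graph_to_triples(vertices, edges, lbl):
    """Encode a directed labeled graph (V, E, lbl, L) as a set of RDF-style triples.

    Vertices get minted IRIs, vertex labels become quoted literal strings,
    and each edge label becomes the predicate IRI of one triple.
    """
    def vertex_iri(v):
        return BASE + "v/" + str(v)

    triples = set()
    for v in vertices:
        triples.add((vertex_iri(v), RDFS_LABEL, '"' + lbl[v] + '"'))
    for u, w in edges:
        triples.add((vertex_iri(u), BASE + "p/" + lbl[(u, w)], vertex_iri(w)))
    return triples


V = {1, 2}
E = {(1, 2)}
lbl = {1: "A", 2: "B", (1, 2): "linksTo"}
encoded = graph_to_triples(V, E, lbl)
```

Every vertex and every edge yields exactly one triple, so the encoding is linear in the size of the graph. Going the other way is harder because a predicate IRI may also occur as a subject or object, which is the problem the four transformations try to work around.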

Experiment

In the experiments, the fourth transformation performed worse than the first.

Conclusions

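The paper presents work from the RDF graph partitioning research area, gives insights into applying classical graph partitioning to RDF graphs, and presents the formal relationships between classical graphs and RDF graphs. In the paper's own words: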
We outlined a partition of the vertices of an RDF graph into two disjoint subsets. In this paper we presented works from the RDF graph partitions research area. This paper provided insights on classical graph partitioning of RDF graphs. Moreover, we presented formal relationships between classical graphs and RDF graphs. Finally, we presented experiments, which showed a great potential for the presented approaches.

0 0