Titan 1.0 Bulk Parallel Data Import Example

Importing data into the Titan graph database is relatively involved. There are three approaches: serial import, batch import, and Hadoop-based bulk import. The first two are simpler and easier to configure; this article walks through Hadoop-based bulk import in detail with a complete example.

Reference: https://groups.google.com/forum/#!topic/aureliusgraphs/fLPl7OlcXt0
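
For contrast, a serial import just opens the graph and inserts elements one at a time from the Gremlin console. A minimal sketch (assuming the same conf/titan-cassandra.properties used by the bulk scripts below):

graph = TitanFactory.open("conf/titan-cassandra.properties")
v1 = graph.addVertex(T.label, "user")
v2 = graph.addVertex(T.label, "user")
v1.addEdge("friend", v2)
graph.tx().commit()

This is fine for small data sets but runs single-threaded, which is what the Hadoop-based path below addresses.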

1) Vertex data set out_vertices.txt

root@cnic-1:~/titan# cat test/out_vertices.txt
1
2
3
4
5
root@cnic-1:~/titan#

2) Edge data set out_edges.txt. Each line is a vertex id followed by two comma-separated adjacency lists, with the three fields separated by |.

root@cnic-1:~/titan# cat test/out_edges.txt
1|2,3,4|5
2|5|1,3
3|2|1,4,5
4|3|1,5
5|1,4,3|2
root@cnic-1:~/titan#

3) The scripts vertices.groovy and edges.groovy, which parse and insert the vertices and edges

vertices.groovy

def parse(line, factory) {
    idstr = line
    def v1 = factory.vertex(idstr, "user")
    return v1
}

edges.groovy

root@cnic-1:~/titan# cat test/edges.groovy
def parse(line, factory) {
    // each line: a vertex id and two comma-separated neighbor lists, delimited by |
    def (id,inv,outv) = line.split("\\|")
    def in_lst = inv.toString().split(",")
    def out_lst = outv.toString().split(",")
    idstr = "${id}".toString()
    def v1 = factory.vertex(idstr, "user")
    // friend edges between v1 and each vertex in the first list
    for (v_id in in_lst) {
        def v2 = factory.vertex(v_id)
        factory.edge(v1, v2, "friend")
    }
    // friend edges in the opposite direction for the second list
    for (v_id in out_lst) {
        def v2 = factory.vertex(v_id)
        factory.edge(v2, v1, "friend")
    }
    return v1
}

vertices.groovy and edges.groovy extract the vertices and edges from the input data sets; the ScriptInputFormat job calls their parse(line, factory) method once for each input line.
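
A quick way to sanity-check the parse logic before launching a Hadoop job is to run edges.groovy locally against a stubbed factory. The sketch below is hypothetical (StubFactory merely records calls; the real factory object is supplied by ScriptInputFormat at runtime) and assumes it is run from the titan directory with a plain groovy interpreter:

// StubFactory is a hypothetical stand-in that only records vertices and edges
class StubFactory {
    def vertices = [:]
    def edges = []
    def vertex(id, label = null) {
        if (!vertices.containsKey(id)) vertices[id] = [id: id, label: label]
        return vertices[id]
    }
    def edge(outV, inV, label) { edges << [from: outV.id, to: inV.id, label: label] }
}

// load test/edges.groovy and feed it the first line of out_edges.txt
def script = new GroovyShell().parse(new File("test/edges.groovy"))
def factory = new StubFactory()
script.parse("1|2,3,4|5", factory)
assert factory.vertices.size() == 5   // vertex 1 plus neighbors 2, 3, 4 and 5
assert factory.edges.size() == 4      // three edges from the first list, one from the second
println factory.edges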

4) Finally, prepare the Gremlin driver script, which sets up the Titan graph schema and launches Hadoop to import the vertices and then the edges

hadoop-script-load-example.groovy 

root@cnic-1:~/titan# cat hadoop-script-load-example.groovy
cassandra_props = "conf/titan-cassandra.properties"
path = "/root/titan/test"

// define the schema: labels, property keys, and the index used by the bulk loader
graph = TitanFactory.open(cassandra_props)
m = graph.openManagement()
user = m.makeVertexLabel("user").make()
friend = m.makeEdgeLabel("friend").make()
blid = m.makePropertyKey("bulkLoader.vertex.id").dataType(Long.class).make()
uid = m.makePropertyKey("uid").dataType(Long.class).make()
m.buildIndex("byBulkLoaderVertexId", Vertex.class).addKey(blid).buildCompositeIndex()
m.commit()

// load the vertices
hdfs.copyFromLocal("${path}/out_vertices.txt", "vertices.txt")
hdfs.copyFromLocal("${path}/vertices.groovy", "vertices.groovy")
graph = GraphFactory.open("conf/hadoop-graph/hadoop-script.properties")
graph.configuration().setInputLocation("vertices.txt")
graph.configuration().setProperty("gremlin.hadoop.scriptInputFormat.script", "vertices.groovy")
blvp = BulkLoaderVertexProgram.build().writeGraph(cassandra_props).create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()

// load the edges
hdfs.copyFromLocal("${path}/out_edges.txt", "edges.txt")
hdfs.copyFromLocal("${path}/edges.groovy", "edges.groovy")
graph = GraphFactory.open("conf/hadoop-graph/hadoop-script.properties")
graph.configuration().setInputLocation("edges.txt")
graph.configuration().setProperty("gremlin.hadoop.scriptInputFormat.script", "edges.groovy")
blvp = BulkLoaderVertexProgram.build().keepOriginalIds(false).writeGraph(cassandra_props).create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
root@cnic-1:~/titan#
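
The driver opens conf/hadoop-graph/hadoop-script.properties, a configuration file shipped with Titan that wires HadoopGraph to ScriptInputFormat. Its key entries look roughly like the sketch below (illustrative only; check the file in your own distribution, and note that the driver above overrides the input location and script name at runtime):

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.hadoop.scriptInputFormat.script=script-input.groovy
spark.master=local[4]
spark.serializer=org.apache.spark.serializer.KryoSerializer
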
5) Run

The input data files out_edges.txt and out_vertices.txt and the parser scripts vertices.groovy and edges.groovy live in the titan/test directory; hadoop-script-load-example.groovy lives in the titan directory. From the titan directory, run ./bin/gremlin.sh hadoop-script-load-example.groovy to bulk-load the data into the Titan graph database.
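
Once both jobs complete, a quick check from the Gremlin console confirms the load (a sketch; the expected vertex count comes from the five ids in out_vertices.txt):

graph = TitanFactory.open("conf/titan-cassandra.properties")
g = graph.traversal()
g.V().hasLabel("user").count()    // expect 5, one per line of out_vertices.txt
g.E().hasLabel("friend").count()  // the friend edges loaded from out_edges.txt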
