Titan 1.0 Bulk Parallel Data Loading Example
Source: Internet  Editor: 程序博客网  Date: 2024/06/05 20:03
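Loading data into the Titan graph database is fairly involved. There are three approaches: serial loading, batch loading, and Hadoop-based loading. The first two are simpler and relatively easy to configure, so this article walks through Hadoop-based bulk loading with a complete example.
Reference (https://groups.google.com/forum/#!topic/aureliusgraphs/fLPl7OlcXt0)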
1) Vertex dataset out_vertices.txt
root@cnic-1:~/titan# cat test/out_vertices.txt
1
2
3
4
5
root@cnic-1:~/titan#
2) Edge dataset out_edges.txt
root@cnic-1:~/titan# cat test/out_edges.txt
1|2,3,4|5
2|5|1,3
3|2|1,4,5
4|3|1,5
5|1,4,3|2
root@cnic-1:~/titan#
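Each line of out_edges.txt holds a vertex id followed by two comma-separated neighbour lists, separated by "|". As a rough illustration of what the file encodes, the sketch below expands the five sample lines into directed pairs, using the same direction convention that edges.groovy in step 3 applies (second field: id -> neighbour, third field: neighbour -> id). The helper name expand is hypothetical and is not part of the original scripts; this is plain Groovy with no Titan dependency.

```groovy
// Hypothetical helper (not part of the original scripts): expand one line of
// out_edges.txt into [from, to] pairs, mirroring the loops in edges.groovy.
def expand(String line) {
    def (id, inv, outv) = line.split("\\|")
    def edges = []
    // second field: pairs recorded as id -> neighbour
    inv.split(",").each { edges << [id, it] }
    // third field: pairs recorded as neighbour -> id
    outv.split(",").each { edges << [it, id] }
    return edges
}

def sample = ["1|2,3,4|5", "2|5|1,3", "3|2|1,4,5", "4|3|1,5", "5|1,4,3|2"]
def all = sample.collectMany { expand(it) }

println all.size()                 // 18 pairs in total
println all.unique(false).size()   // 9 distinct directed pairs
```

In this sample every directed pair shows up twice, once in the line of each endpoint, so the file is an adjacency listing per vertex rather than a flat edge list.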
3) Scripts that parse and insert vertices and edges: vertices.groovy and edges.groovy
vertices.groovy
def parse(line, factory) {
    idstr = line
    def v1 = factory.vertex(idstr, "user")
    return v1
}
edges.groovy
root@cnic-1:~/titan# cat test/edges.groovy
def parse(line, factory) {
    def (id,inv,outv) = line.split("\\|")
    def in_lst = inv.toString().split(",")
    def out_lst = outv.toString().split(",")
    idstr = "${id}".toString()
    def v1 = factory.vertex(idstr, "user")
    for (v_id in in_lst) {
        def v2 = factory.vertex(v_id)
        factory.edge(v1, v2, "friend")
    }
    for (v_id in out_lst) {
        def v2 = factory.vertex(v_id)
        factory.edge(v2, v1, "friend")
    }
    return v1
}
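vertices.groovy and edges.groovy parse the vertices and the edges, respectively, out of the input datasets.
4) Finally, prepare the Gremlin script that sets up the Titan graph database and launches Hadoop to load first the vertices and then the edges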
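Before submitting a Hadoop job, it can help to dry-run a parse function locally. The sketch below is one way to do that under stated assumptions: RecordingFactory is a hypothetical stand-in that only records calls in plain maps, and it is not the real factory class that Hadoop-Gremlin passes in. The parse body is copied from edges.groovy, with idstr declared locally.

```groovy
// Hypothetical stand-in for the factory argument of parse(). It only records
// calls in plain maps; it is NOT the real Hadoop-Gremlin factory class.
class RecordingFactory {
    List vertices = []
    List edges = []

    // Covers both call shapes used by the scripts: vertex(id) and vertex(id, label)
    def vertex(id, label = null) {
        def v = [id: id, label: label]
        vertices << v
        return v
    }

    // Recorded as a directed edge from -> to (assumed to match the argument
    // order the real factory expects)
    def edge(from, to, label) {
        def e = [from: from.id, to: to.id, label: label]
        edges << e
        return e
    }
}

// Body copied from edges.groovy, with idstr declared locally
def parse(line, factory) {
    def (id, inv, outv) = line.split("\\|")
    def in_lst = inv.toString().split(",")
    def out_lst = outv.toString().split(",")
    def idstr = "${id}".toString()
    def v1 = factory.vertex(idstr, "user")
    for (v_id in in_lst) {
        def v2 = factory.vertex(v_id)
        factory.edge(v1, v2, "friend")
    }
    for (v_id in out_lst) {
        def v2 = factory.vertex(v_id)
        factory.edge(v2, v1, "friend")
    }
    return v1
}

def factory = new RecordingFactory()
def result = parse("2|5|1,3", factory)
println result          // the vertex map returned for this line
println factory.edges   // one map per factory.edge() call
```

For the sample line 2|5|1,3 the stub records three "friend" edges: 2 -> 5, 1 -> 2 and 3 -> 2.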
hadoop-script-load-example.groovy
root@cnic-1:~/titan# cat hadoop-script-load-example.groovy
cassandra_props = "conf/titan-cassandra.properties"
path = "/root/titan/test"

// Define the schema in Titan
graph = TitanFactory.open(cassandra_props)
m = graph.openManagement()
user = m.makeVertexLabel("user").make()
friend = m.makeEdgeLabel("friend").make()
blid = m.makePropertyKey("bulkLoader.vertex.id").dataType(Long.class).make()
uid = m.makePropertyKey("uid").dataType(Long.class).make()
m.buildIndex("byBulkLoaderVertexId", Vertex.class).addKey(blid).buildCompositeIndex()
m.commit()

// Stage 1: copy the vertex data and script to HDFS, then load the vertices
hdfs.copyFromLocal("${path}/out_vertices.txt", "vertices.txt")
hdfs.copyFromLocal("${path}/vertices.groovy", "vertices.groovy")
graph = GraphFactory.open("conf/hadoop-graph/hadoop-script.properties")
graph.configuration().setInputLocation("vertices.txt")
graph.configuration().setProperty("gremlin.hadoop.scriptInputFormat.script", "vertices.groovy")
blvp = BulkLoaderVertexProgram.build().writeGraph(cassandra_props).create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()

// Stage 2: copy the edge data and script to HDFS, then load the edges
hdfs.copyFromLocal("${path}/out_edges.txt", "edges.txt")
hdfs.copyFromLocal("${path}/edges.groovy", "edges.groovy")
graph = GraphFactory.open("conf/hadoop-graph/hadoop-script.properties")
graph.configuration().setInputLocation("edges.txt")
graph.configuration().setProperty("gremlin.hadoop.scriptInputFormat.script", "edges.groovy")
blvp = BulkLoaderVertexProgram.build().keepOriginalIds(false).writeGraph(cassandra_props).create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
root@cnic-1:~/titan#
5) Run
The input datasets out_edges.txt and out_vertices.txt and the scripts vertices.groovy and edges.groovy live in the titan/test directory; hadoop-script-load-example.groovy lives in the titan directory. From the titan directory, run ./bin/gremlin.sh hadoop-script-load-example.groovy to bulk-write the data into the Titan graph database.