Accessing data in ES 5.1 from Spark 2.0
The requirement was to read data out of ES and analyze it. I originally wanted to write it in pure Java, but that was too much hassle, so I used Spark SQL for the analysis instead.
I then ran into a problem: plain Java client code cannot do a join over ES data, and even if it could, it would be painful beyond words. So here is my solution.
Dependencies first:
Test dependency:
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
    <scope>test</scope>
</dependency>
ES dependency (any 5.x is much the same):
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>5.2.2</version>
</dependency>
Logging dependencies:
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api</artifactId>
    <version>2.7</version>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.7</version>
</dependency>
JSON dependency:
<dependency>
    <groupId>org.json</groupId>
    <artifactId>json</artifactId>
    <version>20160212</version>
</dependency>
Tokenizer (word segmentation) dependency:
<dependency>
    <groupId>analyzer</groupId>
    <artifactId>IKAnalyzer</artifactId>
    <version>5</version>
</dependency>
MySQL driver dependency:
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.6</version>
</dependency>
Spark SQL dependency (Scala 2.11, Spark 2.0.2):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.0.2</version>
</dependency>
Spark/ES integration dependency:
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark-20_2.11</artifactId>
    <version>5.1.1</version>
</dependency>
Now for the code:
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public static void main(String[] args) {
    // Raise the log level, otherwise the output is too noisy to read
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN);
    // SparkSession is the new Spark 2.0 entry point; see the official docs if it is unfamiliar
    SparkSession spark = SparkSession.builder()
            .appName("EsSparkSql")
            .master("local")
            // Without this setting the job errors out -- one of the pits I stepped in
            .config("spark.sql.warehouse.dir", "file:///D:/tmp/spark-warehouse")
            .getOrCreate();
    registerESTable(spark, "logdt", "logdt9");
}
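A side note: instead of handing an options map to every read (as registerESTable does below), the es-hadoop connector also picks up es.* keys from the Spark configuration, so the connection can be declared once on the builder. A minimal sketch, assuming a single-node ES cluster on localhost:9200:

// Assumption: ES reachable on localhost:9200 -- adjust to your cluster.
SparkSession spark = SparkSession.builder()
        .appName("EsSparkSql")
        .master("local")
        .config("spark.sql.warehouse.dir", "file:///D:/tmp/spark-warehouse")
        .config("es.nodes", "localhost") // read by elasticsearch-spark on load/save
        .config("es.port", "9200")
        .getOrCreate();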
// Called from main: read the given index's "first" type from ES, register temp views,
// join them, and write the result to MySQL
private static void registerESTable(SparkSession spark, String index, String tableName) {
    // Connector options; localhost:9200 assumed here, adjust to your cluster
    Map<String, String> esOptions = new HashMap<>();
    esOptions.put("es.nodes", "localhost");
    esOptions.put("es.port", "9200");
    // Read the data from the index's "first" type
    Dataset<Row> orderDF = spark.read().format("org.elasticsearch.spark.sql")
            .options(esOptions)
            .load(index + "/first");
    // Register it as a temp view for the SQL that follows
    orderDF.createOrReplaceTempView("logdt");
    orderDF.show();
    // First query: the C-link "Begin" records
    Dataset<Row> sql1 = spark.sql("select timestamp timestamp,rectime,link,transcode,serno,globalSeqNo from logdt where link ='C' and transcode = 'Begin' and transcode is not null and globalSeqNo is not null");
    sql1.createOrReplaceTempView("logdt1");
    sql1.show();
    // Second query: the E-link records
    Dataset<Row> sql2 = spark.sql("select timestamp timestamp,link,transcode,serno,globalSeqNo from logdt where link ='E' and transcode is not null and serno is not null");
    sql2.createOrReplaceTempView("logdt2");
    sql2.show();
    // The join itself
    String joinsql = "select logdt1.timestamp timestampc,logdt1.link liknc,logdt1.transcode transcodec,"
            + "logdt1.serno sernoc,logdt1.globalSeqNo globalSeqNoc,"
            + "logdt2.timestamp timestampe,logdt2.link likne,logdt2.transcode transcodee,"
            + "logdt2.serno sernoe,logdt2.globalSeqNo globalSeqNoe from logdt1 inner join logdt2 on logdt1.globalSeqNo=logdt2.serno";
    Dataset<Row> sql3 = spark.sql(joinsql);
    Properties properties = new Properties();
    properties.setProperty("user", "root");
    properties.setProperty("password", "123456");
    // Save the analysis result to the database
    sql3.write().mode(SaveMode.Append).jdbc("jdbc:mysql://localhost:3306/esdata", tableName, properties);
    System.out.println("finished successfully");
}
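To sanity-check the write, the rows can be read straight back over the same JDBC connection; a minimal sketch that could sit at the end of registerESTable, reusing the spark, tableName, and properties variables from above:

// Hypothetical verification step, not in the original post:
// read the joined rows back from MySQL and count them
Dataset<Row> saved = spark.read()
        .jdbc("jdbc:mysql://localhost:3306/esdata", tableName, properties);
saved.show();
System.out.println("rows written: " + saved.count());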
With that, joining data that lives in ES and writing the result to MySQL is done. I hope it helps!