Accessing data in ES 5.1 from Spark 2.0


   The requirement was to read data out of ES and analyze it. I originally wanted to write it in pure Java, but that turned out to be too cumbersome, so I used Spark SQL to get the job done.

 I later found that plain Java code cannot really implement a join over ES data; even if you managed it, it would be a colossal pain. So here is my solution.

   First, the dependencies (all of these go inside the <dependencies> element of the pom.xml):

Test dependency:

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
    <scope>test</scope>
</dependency>

The ES dependency (any 5.x version works much the same):

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>5.2.2</version>
</dependency>

Logging dependencies:

<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api</artifactId>
    <version>2.7</version>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.7</version>
</dependency>

JSON dependency:

<dependency>
    <groupId>org.json</groupId>
    <artifactId>json</artifactId>
    <version>20160212</version>
</dependency>

Tokenizer dependency (IK Analyzer):

<dependency>
    <groupId>analyzer</groupId>
    <artifactId>IKAnalyzer</artifactId>
    <version>5</version>
</dependency>

MySQL driver dependency:

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.6</version>
</dependency>

Spark SQL dependency (Scala 2.11, Spark 2.0.2):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.0.2</version>
</dependency>

Spark-ES integration dependency:

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark-20_2.11</artifactId>
    <version>5.1.1</version>
</dependency>

Now for the code:


import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.spark.sql.SparkSession;

public static void main(String[] args) {

    // Set the log level, otherwise Spark's output is far too noisy
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN);

    // The new Spark 2.0 entry point; see the official docs if it's unfamiliar
    SparkSession sparkorCreate = SparkSession.builder()
            .appName("EsSparkSql")
            .master("local")
            // Without this setting you get an error on Windows (the default
            // spark.sql.warehouse.dir is not a valid URI); one of the pits I stepped in
            .config("spark.sql.warehouse.dir", "file:///D:/tmp/spark-warehouse")
            .getOrCreate();

    registerESTable(sparkorCreate, "logdt", "logdt9");
}
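
The original post never shows the esOptions map or the sqlContext that the code below relies on; presumably they are set up at the top of registerESTable. A minimal sketch of what they might look like, where the ES host and port are my assumptions rather than anything from the original:

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SaveMode;

// Connection settings for the elasticsearch-hadoop connector; point these at your cluster
Map<String, String> esOptions = new HashMap<>();
esOptions.put("es.nodes", "127.0.0.1");   // assumed ES host
esOptions.put("es.port", "9200");         // assumed ES HTTP port

// In Spark 2.0 a SQLContext is still available from the session
// (here sparkorCreate, or whatever SparkSession registerESTable receives)
SQLContext sqlContext = sparkorCreate.sqlContext();

With those in place, the body of the method reads as follows.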

// Read data from the index "logdt", type "first"
Dataset<Row> OrderDF = sqlContext.read().format("org.elasticsearch.spark.sql")
        .options(esOptions)
        .load("logdt/first");
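
Incidentally, the connector can also push a query down to ES at read time through the es.query option, so Spark only receives matching documents instead of the whole index. A sketch (the query body is just an illustration); set it on esOptions before calling read():

// Optional: filter on the ES side rather than pulling everything into Spark
esOptions.put("es.query", "{\"query\":{\"term\":{\"link\":\"C\"}}}");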
        // Register as a temporary table so we can run SQL against it
        OrderDF.registerTempTable("logdt");
        OrderDF.show();
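
As an aside, registerTempTable is deprecated as of Spark 2.0; the current equivalent is:

OrderDF.createOrReplaceTempView("logdt");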
        // The first query: 'C'-link records whose transcode is 'Begin'
        Dataset<Row> sql1 = sqlContext.sql("select timestamp timestamp,rectime,link,transcode,serno,globalSeqNo from logdt where link ='C' and transcode = 'Begin' and transcode is not null and globalSeqNo is not null");


        sql1.registerTempTable("logdt1");
        
        sql1.show();
        // The second query, with different conditions
        Dataset<Row> sql2 = sqlContext.sql("select timestamp timestamp,link,transcode,serno,globalSeqNo from logdt where link ='E' and transcode is not null and serno is not null");
        
        sql2.registerTempTable("logdt2");
        
        sql2.show();
        

// Join the two temp tables on logdt1.globalSeqNo = logdt2.serno
        String joinsql = "select logdt1.timestamp timestampc,logdt1.link linkc,logdt1.transcode transcodec,"
        + "logdt1.serno sernoc,logdt1.globalSeqNo globalSeqNoc,"
        + "logdt2.timestamp timestampe,logdt2.link linke,logdt2.transcode transcodee,"
        + "logdt2.serno sernoe,logdt2.globalSeqNo globalSeqNoe from logdt1 inner join logdt2 on logdt1.globalSeqNo=logdt2.serno";
        
        
        Dataset<Row> sql3 = sqlContext.sql(joinsql);
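
For reference, the same inner join can also be written against the Dataset API instead of SQL. A sketch using the sql1 and sql2 Datasets from above (it skips the column renaming that the SQL version does):

// Equivalent join through the Dataset API
Dataset<Row> joined = sql1.join(sql2, sql1.col("globalSeqNo").equalTo(sql2.col("serno")), "inner");
joined.show();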
        
        Properties properties = new Properties();
        properties.setProperty("user","root");
        properties.setProperty("password","123456");

// Save the analysis result to MySQL; tableName is the method parameter ("logdt9" in the call from main)
        sql3.write().mode(SaveMode.Append).jdbc("jdbc:mysql://localhost:3306/esdata", tableName, properties);
        System.out.println("Finished successfully");
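
To sanity-check the export, the table can be read straight back from MySQL with the same connection properties. A minimal sketch:

// Read the exported table back to verify the write
Dataset<Row> check = sqlContext.read().jdbc("jdbc:mysql://localhost:3306/esdata", tableName, properties);
check.show();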


That completes the job: joining data that lives in ES and writing the result to MySQL. Hope it helps!