[Yunxing Data: Apache Flink in Action Series (Premium Edition)] Apache Flink Batch API Explained with Hands-On Programming 021: Practical DataSet API Explained 021

Source: Internet. Editor: 程序博客网. Time: 2024/06/07 03:22

Flink DataSet Custom API Explained (Scala Edition) -002

flatMap

Operates at element granularity: each input element is transformed into zero or more output elements (a 1:n transformation).

Program:

    package code.book.batch.dataset.advance.api

    import org.apache.flink.api.common.functions.FlatMapFunction
    import org.apache.flink.api.scala.{ExecutionEnvironment, _}
    import org.apache.flink.util.Collector

    object FlatMapFunction001scala {
      def main(args: Array[String]): Unit = {
        // 1. Set up the execution environment and create test data
        val env = ExecutionEnvironment.getExecutionEnvironment
        val text = env.fromElements("flink vs spark", "buffer vs  shuffer")

        // 2. Per element: convert to upper case and append the suffix "--##bigdata##"
        val text2 = text.flatMap(new FlatMapFunction[String, String]() {
          override def flatMap(s: String, collector: Collector[String]): Unit = {
            collector.collect(s.toUpperCase() + "--##bigdata##")
          }
        })
        text2.print()

        // 3. Split each sentence into words: one line can become many words.
        // map can only transform elements 1:1, while flatMap can transform them 1:n.
        // Note: this version emits one Array[String] per line; calling
        // collector.collect(token) for each token would emit the words as separate elements.
        val text3 = text.flatMap {
          new FlatMapFunction[String, Array[String]] {
            override def flatMap(s: String, collector: Collector[Array[String]]): Unit = {
              val arr: Array[String] = s.toUpperCase().split("\\s+")
              collector.collect(arr)
            }
          }
        }

        // Shorthand for printing the result
        text3.collect().foreach(_.foreach(println(_)))

        // Equivalent long form: take each Array[String], then each String inside it
        text3.collect().foreach(arr => {
          arr.foreach(token => {
            println(token)
          })
        })
      }
    }

Output:

    text2.print()
    FLINK VS SPARK--##bigdata##
    BUFFER VS  SHUFFER--##bigdata##

    text3.collect().foreach(_.foreach(println(_)))
    FLINK
    VS
    SPARK
    BUFFER
    VS
    SHUFFER
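The difference between a 1:1 and a 1:n transformation can be sketched with Scala's standard collections alone, with no Flink dependency. This is only an analogy for the DataSet behavior, not Flink code; `FlatMapSketch` and `tokenize` are hypothetical names introduced for illustration.

```scala
object FlatMapSketch {
  // Hypothetical helper: turn one line into n upper-case tokens.
  def tokenize(line: String): Seq[String] =
    line.toUpperCase.split("\\s+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    val lines = Seq("flink vs spark", "buffer vs  shuffer")

    // map is 1:1: two lines in, two token sequences out.
    val mapped: Seq[Seq[String]] = lines.map(tokenize)

    // flatMap is 1:n: two lines in, six individual tokens out.
    val flat: Seq[String] = lines.flatMap(tokenize)

    println(mapped.size) // 2
    println(flat.size)   // 6
    flat.foreach(println)
  }
}
```

Flink's Scala DataSet API also offers a `flatMap` overload that takes a plain Scala function returning a collection, so a function shaped like `tokenize` should be usable there in place of an anonymous `FlatMapFunction`.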

filter

Operates at element granularity: applies a filter predicate to each element. The elements that satisfy the predicate form a new DataSet.

Program:

    package code.book.batch.dataset.advance.api

    import org.apache.flink.api.common.functions.FilterFunction
    import org.apache.flink.api.scala.{ExecutionEnvironment, _}

    object FilterFunction001scala {
      def main(args: Array[String]): Unit = {
        // 1. Set up the execution environment and create test data
        val env = ExecutionEnvironment.getExecutionEnvironment
        val text = env.fromElements(2, 4, 7, 8, 9, 6)

        // 2. Filter the DataSet, keeping only the even elements
        val text2 = text.filter(new FilterFunction[Int] {
          override def filter(t: Int): Boolean = {
            t % 2 == 0
          }
        })
        text2.print()

        // 3. Filter the DataSet, keeping only the elements greater than 5
        val text3 = text.filter(new FilterFunction[Int] {
          override def filter(t: Int): Boolean = {
            t > 5
          }
        })
        text3.print()
      }
    }

Output:

    text2.print()
    2
    4
    8
    6

    text3.print()
    7
    8
    9
    6
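The two predicates above can also be written as ordinary Scala functions. The sketch below uses only Scala's standard collections as an analogy for the DataSet behavior, so it runs without Flink; `FilterSketch`, `isEven`, and `greaterThanFive` are hypothetical names introduced for illustration.

```scala
object FilterSketch {
  // Hypothetical predicates mirroring the two FilterFunction instances above.
  val isEven: Int => Boolean = _ % 2 == 0
  val greaterThanFive: Int => Boolean = _ > 5

  def main(args: Array[String]): Unit = {
    val numbers = Seq(2, 4, 7, 8, 9, 6)

    // Keep only the elements for which the predicate returns true.
    println(numbers.filter(isEven))          // List(2, 4, 8, 6)
    println(numbers.filter(greaterThanFive)) // List(7, 8, 9, 6)

    // Filters compose: even AND greater than 5.
    println(numbers.filter(isEven).filter(greaterThanFive)) // List(8, 6)
  }
}
```

Flink's Scala DataSet API has a `filter` overload that accepts a plain `T => Boolean` function, so predicates of this shape should be usable there as well, for example `text.filter(_ % 2 == 0)`.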