Scala for the Impatient (快学Scala) - Implementing WordCount with Actor Concurrent Programming

Source: Internet  Published: 高等教育出版社 知乎  Editor: 程序博客网  Date: 2024/05/22 03:46

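Before doing WordCount with Scala's multithreading, you should at least know how to do the single-machine version, so we start with a single-machine word count on the command line (the Scala REPL). For a detailed explanation, see the post 单词计数 (word count).
Drive D contains two files, words.txt and words.log, both with the following content: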

hello tom
hello jerry
hello tom
hello jerry
hello tom
hello tom

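Now run WordCount on the contents of words.txt, building the expression up one step at a time (this assumes scala.io.Source has already been imported in the REPL session):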

scala> Source.fromFile("d://words.txt").getLines().toList
res5: List[String] = List(hello tom, hello jerry, hello tom, hello jerry, hello tom, hello tom)

scala> Source.fromFile("d://words.txt").getLines().toList.map(_.split(" "))
res6: List[Array[String]] = List(Array(hello, tom), Array(hello, jerry), Array(hello, tom), Array(hello, jerry), Array(hello, tom), Array(hello, tom))

scala> Source.fromFile("d://words.txt").getLines().toList.flatMap(_.split(" "))
res7: List[String] = List(hello, tom, hello, jerry, hello, tom, hello, jerry, hello, tom, hello, tom)

scala> Source.fromFile("d://words.txt").getLines().toList.flatMap(_.split(" ")).map((_,1))
res8: List[(String, Int)] = List((hello,1), (tom,1), (hello,1), (jerry,1), (hello,1), (tom,1), (hello,1), (jerry,1), (hello,1), (tom,1), (hello,1), (tom,1))

scala> Source.fromFile("d://words.txt").getLines().toList.flatMap(_.split(" ")).map((_,1)).groupBy(_._1)
res9: scala.collection.immutable.Map[String,List[(String, Int)]] = Map(tom -> List((tom,1), (tom,1), (tom,1), (tom,1)), jerry -> List((jerry,1), (jerry,1)), hello -> List((hello,1), (hello,1), (hello,1), (hello,1), (hello,1), (hello,1)))

scala> Source.fromFile("d://words.txt").getLines().toList.flatMap(_.split(" ")).map((_,1)).groupBy(_._1).mapValues(_.size)
res10: scala.collection.immutable.Map[String,Int] = Map(tom -> 4, jerry -> 2, hello -> 6)
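The REPL steps above can be collected into one small function. This is only a minimal sketch: the names WordCountSketch and wordCount are made up for illustration, the input is an in-memory list rather than d://words.txt so it runs anywhere, and the filter for empty strings is an added assumption (to ignore blank lines), not part of the original session.

```scala
object WordCountSketch {
  // Hypothetical helper: the same split -> group -> count pipeline as the REPL session,
  // applied to lines that are already in memory.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // one flat sequence of words
      .filter(_.nonEmpty)      // assumption: skip empty tokens from blank lines
      .groupBy(identity)       // word -> all occurrences of that word
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    // Same six lines as words.txt
    val lines = Seq("hello tom", "hello jerry", "hello tom",
                    "hello jerry", "hello tom", "hello tom")
    println(wordCount(lines))
  }
}
```

For the six sample lines this yields the counts hello -> 6, tom -> 4, jerry -> 2, matching res10. Grouping by identity instead of mapping to (word, 1) pairs first is just a shorter route to the same result.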

With the single-machine version done, let's do the same thing with Scala's Actor:

import scala.actors.{Actor, Future}
import scala.collection.mutable
import scala.collection.mutable.ListBuffer
import scala.io.Source

case class WordCountTask(filename: String)
case class ResultTask(map: Map[String, Int])
case object StopTask

class WordCountActor extends Actor {
  override def act(): Unit = {
    loop {
      react {
        case WordCountTask(filename) => {
          // The result is a map, e.g. Map(tom -> 4, jerry -> 2, hello -> 6)
          val wcResultMap = Source.fromFile(filename).getLines().toList.flatMap(_.split(" ")).map((_, 1)).groupBy(_._1).mapValues(_.size)
          // Wrap the result in a ResultTask and send it back to the sender
          sender ! ResultTask(wcResultMap)
        }
        // Exit
        case StopTask => {
          exit()
        }
      }
    }
  }
}

object WordCountActor {
  def main(args: Array[String]): Unit = {
    val responseSet = new mutable.HashSet[Future[Any]]()
    val resultList = new ListBuffer[ResultTask]
    // The files to run the word count on
    val files = Array("d://words.txt", "d://words.log")
    // Start one actor per file
    for (file <- files) {
      val actor = new WordCountActor
      // Start the actor and send an asynchronous message; the reply arrives later through the Future
      val response = actor.start() !! WordCountTask(file)
      // Keep the Future in the Set
      responseSet += response
    }
    while (responseSet.size > 0) {
      // Collect the Futures that have already received a reply into filterSet.
      // responseSet holds the Future references, but a Future does not necessarily have a value yet.
      val filterSet = responseSet.filter(_.isSet)
      for (ele <- filterSet) {
        // Take the message (ResultTask(wcResultMap)) out of the Future; apply() returns the Future's value
        val result = ele.apply().asInstanceOf[ResultTask]
        // ListBuffer(ResultTask(Map(tom -> 4, jerry -> 2, hello -> 6)), ...)
        resultList += result
        // Remove it from the Set
        responseSet -= ele
      }
      // Sleep for a moment to give the remaining replies time to arrive
      Thread.sleep(300)
    }
    // What follows is the aggregation step, like the reduce in MapReduce
    // ListBuffer((tom,4),(jerry,2),(hello,6)...)
    val r1 = resultList.flatMap(_.map)
    // Map((tom,ListBuffer((tom,4),(tom,4))),(..),(...))
    val r2 = r1.groupBy(_._1)
    val r3 = r2.mapValues(_.foldLeft(0)(_ + _._2))
    println(r3)
  }
}

Output:

Map(tom -> 8, jerry -> 4, hello -> 12)
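The aggregation at the end of main (r1, r2, r3) is the part that plays the role of reduce, and it can be tried on its own without starting any actors. The following is a minimal sketch under that assumption: MergeCountsSketch and mergeCounts are hypothetical names, and the two partial maps are hard-coded stand-ins for the ResultTask payloads the two actors would return.

```scala
object MergeCountsSketch {
  // Hypothetical helper: merge per-file word counts into one total,
  // mirroring the flatMap -> groupBy -> sum steps at the end of main.
  def mergeCounts(partials: Seq[Map[String, Int]]): Map[String, Int] =
    partials
      .flatMap(_.toSeq)    // Seq((tom,4), (jerry,2), (hello,6), (tom,4), ...)
      .groupBy(_._1)       // word -> all (word, count) pairs for that word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    // Stand-ins for the results of words.txt and words.log, which have identical content
    val perFile = Map("tom" -> 4, "jerry" -> 2, "hello" -> 6)
    println(mergeCounts(Seq(perFile, perFile)))
  }
}
```

With two identical partial maps the totals are tom -> 8, jerry -> 4, hello -> 12, the same counts as the program output. Summing the second tuple element is equivalent to the foldLeft(0)(_ + _._2) used in the original.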

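Note: the code above comes from study materials, and a few places seem less than ideal. For example, the sleep does not appear to take into account that Future's apply() blocks until the reply is available, so the same result can be had without filtering on isSet. Replacing the while block with the following works fine: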

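while (responseSet.size > 0) {
  // Iterate over a snapshot so that removing from responseSet inside the loop is safe
  for (ele <- responseSet.toList) {
    // Take the message (ResultTask(wcResultMap)) out of the Future; apply() blocks until the reply arrives
    val result = ele.apply().asInstanceOf[ResultTask]
    // ListBuffer(ResultTask(Map(tom -> 4, jerry -> 2, hello -> 6)), ...)
    resultList += result
    // Remove it from the Set
    responseSet -= ele
  }
}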
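The same "block on each future in turn" idea can be tried without the scala.actors library. The sketch below swaps in the standard scala.concurrent.Future, with Await.result playing the role that apply() plays above; it is not the original actor code. BlockingCollectSketch, countWords and collectAll are hypothetical names, the 10-second timeout is an arbitrary choice, and the inputs are in-memory lines rather than the files on drive D.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object BlockingCollectSketch {
  // Hypothetical stand-in for the work each actor does on one file.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def collectAll(inputs: Seq[Seq[String]]): Seq[Map[String, Int]] = {
    // Start one asynchronous task per input, like one actor per file.
    val futures = inputs.map(lines => Future(countWords(lines)))
    // Await.result blocks until each future has a value,
    // so there is no isSet polling and no Thread.sleep.
    futures.map(f => Await.result(f, 10.seconds))
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("hello tom", "hello jerry", "hello tom",
                    "hello jerry", "hello tom", "hello tom")
    val results = collectAll(Seq(lines, lines))
    println(results)
  }
}
```

Because every future is waited on exactly once, the results come back in the same order as the inputs and no bookkeeping set is needed.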