Scala for the Impatient: Implementing WordCount with Actor Concurrency
Source: Internet | Editor: 程序博客网 | Date: 2024/05/22 03:46
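Before doing a word count with Scala's concurrency support, you should at least know how to do the single-machine version, so we first run a single-machine word count in the Scala REPL. For a detailed explanation, see the post "单词计数" (word count).
Drive D contains two files, words.txt and words.log, both with the following contents: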
hello tom
hello jerry
hello tom
hello jerry
hello tom
hello tom
Now run a word count over the contents of words.txt, building the pipeline up one step at a time:
scala> import scala.io.Source
import scala.io.Source

scala> Source.fromFile("d://words.txt").getLines().toList
res5: List[String] = List(hello tom, hello jerry, hello tom, hello jerry, hello tom, hello tom)

scala> Source.fromFile("d://words.txt").getLines().toList.map(_.split(" "))
res6: List[Array[String]] = List(Array(hello, tom), Array(hello, jerry), Array(hello, tom), Array(hello, jerry), Array(hello, tom), Array(hello, tom))

scala> Source.fromFile("d://words.txt").getLines().toList.flatMap(_.split(" "))
res7: List[String] = List(hello, tom, hello, jerry, hello, tom, hello, jerry, hello, tom, hello, tom)

scala> Source.fromFile("d://words.txt").getLines().toList.flatMap(_.split(" ")).map((_,1))
res8: List[(String, Int)] = List((hello,1), (tom,1), (hello,1), (jerry,1), (hello,1), (tom,1), (hello,1), (jerry,1), (hello,1), (tom,1), (hello,1), (tom,1))

scala> Source.fromFile("d://words.txt").getLines().toList.flatMap(_.split(" ")).map((_,1)).groupBy(_._1)
res9: scala.collection.immutable.Map[String,List[(String, Int)]] = Map(tom -> List((tom,1), (tom,1), (tom,1), (tom,1)), jerry -> List((jerry,1), (jerry,1)), hello -> List((hello,1), (hello,1), (hello,1), (hello,1), (hello,1), (hello,1)))

scala> Source.fromFile("d://words.txt").getLines().toList.flatMap(_.split(" ")).map((_,1)).groupBy(_._1).mapValues(_.size)
res10: scala.collection.immutable.Map[String,Int] = Map(tom -> 4, jerry -> 2, hello -> 6)
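The REPL steps above can be folded into one small function. The following is only a sketch: the object name `WordCountSketch` and the helper `wordCount` are hypothetical names introduced here, and the input is an in-memory list of lines rather than d://words.txt so that it runs without the file. It also filters out empty tokens, which the REPL session does not do.

```scala
object WordCountSketch {
  // Hypothetical helper (not from the original post): count words in lines held in memory.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split every line into words and flatten
      .filter(_.nonEmpty)      // drop empty tokens produced by blank lines or repeated spaces
      .groupBy(identity)       // word -> all occurrences of that word
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    // Same six lines as words.txt in the post
    val lines = Seq("hello tom", "hello jerry", "hello tom", "hello jerry", "hello tom", "hello tom")
    println(wordCount(lines))
  }
}
```

Grouping by `identity` skips the intermediate `(word, 1)` pairs; the counts are the same as with `map((_,1)).groupBy(_._1).mapValues(_.size)`.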
With the single-machine version done, let's do the same thing with Scala's Actor:
import scala.actors.{Actor, Future}
import scala.collection.mutable
import scala.collection.mutable.ListBuffer
import scala.io.Source

case class WordCountTask(filename: String)
case class ResultTask(map: Map[String, Int])
case object StopTask

class WordCountActor extends Actor {
  override def act(): Unit = {
    loop {
      react {
        case WordCountTask(filename) => {
          // The result is a Map, e.g. Map(tom -> 4, jerry -> 2, hello -> 6)
          val wcResultMap = Source.fromFile(filename).getLines().toList.flatMap(_.split(" ")).map((_, 1)).groupBy(_._1).mapValues(_.size)
          // Wrap the result in a ResultTask and send it back to the sender
          sender ! ResultTask(wcResultMap)
        }
        // Exit
        case StopTask => {
          exit()
        }
      }
    }
  }
}

object WordCountActor {
  def main(args: Array[String]): Unit = {
    val responseSet = new mutable.HashSet[Future[Any]]()
    val resultList = new ListBuffer[ResultTask]
    // The files to run the word count on
    val files = Array("d://words.txt", "d://words.log")
    // Start one actor per file
    for (file <- files) {
      val actor = new WordCountActor
      // Start the actor and send an asynchronous message; the reply arrives through a Future
      val response = actor.start() !! WordCountTask(file)
      // Keep the Future in the Set
      responseSet += response
    }
    while (responseSet.size > 0) {
      // Collect the Futures that have already received a reply into filterSet.
      // responseSet holds the Future references, but a Future does not necessarily have a value yet.
      val filterSet = responseSet.filter(_.isSet)
      for (ele <- filterSet) {
        // Take the message (ResultTask(wcResultMap)) out of the Future; apply() returns the data inside it
        val result = ele.apply().asInstanceOf[ResultTask]
        // ListBuffer(ResultTask(Map(tom -> 4, jerry -> 2, hello -> 6)), ...)
        resultList += result
        // Remove it from the Set
        responseSet -= ele
      }
      // Sleep for a while to give the remaining replies time to arrive
      Thread.sleep(300)
    }
    // The following is the aggregation step, like the reduce in MapReduce
    // ListBuffer((tom,4), (jerry,2), (hello,6), ...)
    val r1 = resultList.flatMap(_.map)
    // Map(tom -> ListBuffer((tom,4), (tom,4)), ...)
    val r2 = r1.groupBy(_._1)
    val r3 = r2.mapValues(_.foldLeft(0)(_ + _._2))
    println(r3)
  }
}

Output:
Map(tom -> 8, jerry -> 4, hello -> 12)
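The aggregation at the end of `main` (flatMap, groupBy, then a fold over each group) can also be written as a direct merge of the per-file maps. This is a minimal sketch of that reduce step on its own, assuming the partial results are already plain `Map[String, Int]` values; `MergeCountsSketch` and `mergeCounts` are hypothetical names, not part of the original code.

```scala
object MergeCountsSketch {
  // Hypothetical helper: sum the counts of several partial word-count maps.
  def mergeCounts(partials: Seq[Map[String, Int]]): Map[String, Int] =
    partials.foldLeft(Map.empty[String, Int]) { (acc, partial) =>
      // Add every (word, n) of this partial map onto the running total
      partial.foldLeft(acc) { case (totals, (word, n)) =>
        totals.updated(word, totals.getOrElse(word, 0) + n)
      }
    }

  def main(args: Array[String]): Unit = {
    // Two identical partial results, as produced for words.txt and words.log in the post
    val perFile = Map("tom" -> 4, "jerry" -> 2, "hello" -> 6)
    println(mergeCounts(Seq(perFile, perFile)))
  }
}
```

Note that the partial results are kept in a sequence, not a set: the two files give equal maps, and a set would collapse them into a single entry.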
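Note: the actor code above comes from study materials, and a few parts of it seem less than ideal. For example, the sleep does not appear to take into account that a Future's apply() blocks until the reply is available, so the job can be done without filtering on isSet at all. Changing the while block to the following works fine: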
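while (responseSet.size > 0) {
  // Iterate over a snapshot (toList) so that removing from responseSet
  // does not modify the set that is being traversed
  for (ele <- responseSet.toList) {
    // Take the message (ResultTask(wcResultMap)) out of the Future; apply() blocks until the data is there
    val result = ele.apply().asInstanceOf[ResultTask]
    // ListBuffer(ResultTask(Map(tom -> 4, jerry -> 2, hello -> 6)), ...)
    resultList += result
    // Remove it from the Set
    responseSet -= ele
  }
}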
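The point of the note, that blocking on each future removes the need for polling and sleeping, can be illustrated without the scala.actors library. The sketch below swaps in a different technique, `scala.concurrent.Future` with `Await.result` from the standard library, so it is not the code from the study materials. The names `BlockingCollectSketch`, `countWords` and `collectAll`, the in-memory inputs, and the 10-second timeout are all assumptions made for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object BlockingCollectSketch {
  // Stand-in for the per-file work done by WordCountActor (hypothetical helper).
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(_.split(" ")).filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  // Start one future per input, then block on each in turn.
  // No isSet-style filtering and no Thread.sleep are needed.
  def collectAll(inputs: Seq[Seq[String]]): Seq[Map[String, Int]] = {
    val futures = inputs.map(lines => Future(countWords(lines)))
    futures.map(f => Await.result(f, 10.seconds)) // blocks until that result is ready
  }

  def main(args: Array[String]): Unit = {
    val fileA = Seq("hello tom", "hello jerry")
    val fileB = Seq("hello tom", "hello tom")
    println(collectAll(Seq(fileA, fileB)))
  }
}
```

All futures are created before the first blocking call, so the work still runs concurrently; only the collection of results is sequential.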