Top-K in MapReduce Hadoop Framework
Source: Internet. Editor: 程序博客网. Time: 2024/06/14 06:54
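The top-10 algorithm: we only need to maintain a list of size 10. Initialize it with 10 queries, sorted by each query's count in descending order, then traverse the 3 million records. After reading each record, append it to the list and re-sort the list in descending order. If the list length reaches 11, call pop(), which by default removes the last element, that is, the smallest one.
It is easy to see that the time complexity of this algorithm is roughly O(N*K), where N is the number of records and K is the number of top items to keep: each of the N records triggers a re-sort of a list that holds at most K+1 items and is already sorted except for the newly appended one.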
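The procedure described above can be sketched as a standalone function. This is a minimal sketch rather than the code from the exercise: the function name `top_k` and the `(query, count)` tuple layout are assumptions made purely for illustration.

```python
def top_k(records, k=10):
    """Return the k records with the largest counts, largest first.

    `records` is assumed to be an iterable of (query, count) tuples;
    this layout is hypothetical and chosen only for illustration.
    """
    best = []  # never holds more than k + 1 entries
    for record in records:
        best.append(record)
        # Re-sort by count, largest first; at most k + 1 items are sorted.
        best.sort(key=lambda r: r[1], reverse=True)
        if len(best) > k:
            best.pop()  # drop the smallest entry, which is now last
    return best


sample = [("a", 5), ("b", 12), ("c", 7), ("d", 1), ("e", 9)]
print(top_k(sample, k=3))  # [('b', 12), ('e', 9), ('c', 7)]
```

Because the list is trimmed back to `k` entries after every record, the work per record depends only on K and not on N, which is where the N*K estimate comes from.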
#!/usr/bin/env python3
"""
Your mapper function should print out 10 lines containing longest posts, sorted in
ascending order from shortest to longest.
Please do not use global variables and do not change the "main" function.
"""
import sys
import csv


def mapper():
    reader = csv.reader(sys.stdin, delimiter='\t')
    writer = csv.writer(sys.stdout, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
    a = []  # holds at most the 10 longest posts seen so far
    for line in reader:
        a.append(line)
        # YOUR CODE HERE
        # Sort by the length of the post body (field 4), longest first.
        a.sort(key=lambda x: len(x[4]), reverse=True)
        if len(a) == 11:
            a.pop()  # drop the shortest post, which is now last
    # Emit the kept posts in ascending order, shortest to longest.
    for line in reversed(a[0:10]):
        writer.writerow(line)


test_text = """\"\"\t\"\"\t\"\"\t\"\"\t\"333\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"88888888\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"1\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"11111111111\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"1000000000\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"22\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"4444\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"666666\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"55555\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"999999999\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"7777777\"\t\"\"
"""


# This function allows you to test the mapper with the provided test string
def main():
    import io
    sys.stdin = io.StringIO(test_text)
    mapper()
    sys.stdin = sys.__stdin__


main()
The core of the top-10 code is:
import sys
import csv


def mapper():
    reader = csv.reader(sys.stdin, delimiter='\t')
    writer = csv.writer(sys.stdout, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
    a = []
    for line in reader:
        a.append(line)
        # YOUR CODE HERE
        a.sort(key=lambda x: len(x[4]), reverse=True)
        if len(a) == 11:
            a.pop()
    for line in reversed(a[0:10]):
        writer.writerow(line)
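A lambda expression in Python keeps the code short. Here the parameter is a row x (a list), and the lambda returns the length of x[4] as the sort key; reverse=True sorts in descending order:
a.sort(key=lambda x: len(x[4]), reverse=True)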
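To make the effect of the `key` argument concrete, the sketch below sorts a few hypothetical rows by the length of their fifth field. The sample rows and the helper name `body_length` are invented for illustration. `heapq.nlargest` appears only as a standard-library alternative for picking the largest items; it is not the approach used in this post.

```python
import heapq

# Hypothetical rows: the fifth field (index 4) plays the role of the post body.
rows = [
    ["", "", "", "", "333", ""],
    ["", "", "", "", "1", ""],
    ["", "", "", "", "55555", ""],
    ["", "", "", "", "22", ""],
]

# The lambda from the text: order rows by len(x[4]), longest first.
by_lambda = sorted(rows, key=lambda x: len(x[4]), reverse=True)


def body_length(row):
    """Named equivalent of the lambda above."""
    return len(row[4])


by_function = sorted(rows, key=body_length, reverse=True)

# Alternative (not the post's approach): take the 2 longest rows directly.
two_longest = heapq.nlargest(2, rows, key=body_length)

print([r[4] for r in by_lambda])    # ['55555', '333', '22', '1']
print([r[4] for r in two_longest])  # ['55555', '333']
```

The lambda and the named function produce the same ordering; the lambda simply avoids declaring a one-line helper.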