Problems with implementing group using MongoDB MapReduce
Source: Internet. Editor: 程序博客网. Time: 2024/04/29 05:50
A group written with MapReduce is just too slow!
1) From MongoDB: The Definitive Guide
The price of using MapReduce is speed: group is not particularly speedy, but
MapReduce is slower and is not supposed to be used in “real time.” You run
MapReduce as a background job, it creates a collection of results, and then
you can query that collection in real time.
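The pattern in the quote above (a slow batch job that writes a results collection, followed by cheap lookups) can be sketched in plain Python. This is only an illustration and does not talk to a MongoDB server: the `orders` data, the `category` field, and the `run_map_reduce` helper are all hypothetical stand-ins.

```python
from collections import defaultdict

def map_fn(doc):
    # Emit one (key, value) pair per document: group by "category", count 1.
    yield doc["category"], 1

def reduce_fn(key, values):
    # Combine every value emitted for one key into a single result.
    return sum(values)

def run_map_reduce(docs, map_fn, reduce_fn):
    """Batch step: scan every document and return a results 'collection'."""
    emitted = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            emitted[key].append(value)
    return {key: reduce_fn(key, values) for key, values in emitted.items()}

# Hypothetical sample data standing in for a MongoDB collection.
orders = [
    {"category": "books", "price": 12},
    {"category": "books", "price": 30},
    {"category": "music", "price": 9},
]

# Background job: scans everything, so it is run only occasionally.
results = run_map_reduce(orders, map_fn, reduce_fn)

# "Real time" query: a cheap lookup against the precomputed results.
print(results["books"])  # 2
```

The full scan is paid once in the background job, and every later read only touches the much smaller results.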
2)http://stackoverflow.com/questions/2599943/mongodbs-performance-on-aggregation-queries
The idea is that you improve the performance of aggregation queries by using MapReduce on a sharded database that is distributed over multiple machines.
I did some comparisons of the performance of Mongo's Mapreduce with a group-by-select statement in Oracle on the same machine. I did find that Mongo was approximately 25 times slower. This means that I have to shard the data over at least 25 machines to get the same performance with Mongo as Oracle delivers on a single machine. I used a collection/table with approximately 14 million documents/rows.
Exporting the data from mongo via mongoexport.exe and using the exported data as an external table in Oracle and doing a group-by in Oracle was much faster than using Mongo's own MapReduce.
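The "at least 25 machines" figure in the quote assumes that throughput grows linearly with the number of shards. The following is a rough sketch of that arithmetic; the `parallel_efficiency` parameter and the 0.8 value are hypothetical and only show how the estimate grows when scaling is less than ideal.

```python
import math

def shards_needed(slowdown_factor, parallel_efficiency=1.0):
    """Estimate how many shards are needed to cancel out a slowdown.

    slowdown_factor: how many times slower one machine is (25 in the quote).
    parallel_efficiency: assumed fraction of ideal linear speedup (hypothetical).
    """
    return math.ceil(slowdown_factor / parallel_efficiency)

print(shards_needed(25))       # 25, the ideal linear-scaling case from the quote
print(shards_needed(25, 0.8))  # 32, under an assumed 80% efficiency
```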
3) http://blog.evilmonkeylabs.com/2011/01/27/MongoDB-1_8-MapReduce/ (from the comments below the post)
No clever tricks, unfortunately. As long as MapReduce is single threaded, we're not able to use it. We need to be able to run a few dozen or a few hundred at once but since you only get one MapReduce running per shard, we had to go in another direction. It would be nice to take advantage of new changes, but until it moves beyond being one giant blocking operation...
Any ideas as to when it'll be multithreaded?
----------------
I don't believe anything is scheduled currently to increase concurrency within MapReduce; much of the limitation exists in the JavaScript engine itself.
There are plans for the next major release series of MongoDB to include new aggregation features which will cover many of the common tasks people currently use MapReduce for in a much simpler interface.
-----------------
Unfortunately, MR in 1.8 will still only be single-threaded, meaning you can essentially only run one job per shard. These new features will be really useful once you can run MRs in parallel and distributed.
4)indexes for map/reduce (http://groups.google.com/group/mongodb-user/browse_thread/thread/3327e58e92140407/a16a9a2fa4b143cf?show_docid=a16a9a2fa4b143cf)
The tests show that building an index does not help Map/Reduce!
5) A MongoDB issue ticket
http://jira.mongodb.org/browse/SERVER-1197
6) Why not just use the group command directly?
Connecting directly to the shard server port:
-------------------------
Thu Mar 17 17:14:07 uncaught exception: group command failed: {
"errmsg" : "exception: group() can't handle more than 10000 unique keys",
"code" : 10043,
"ok" : 0
}
-------------------
Connecting directly to the route server port:
Thu Mar 17 17:05:59 uncaught exception: group command failed: { "ok" : 0, "errmsg" : "can't do command: group on sharded collection"
}
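The two errors above suggest a simple pre-check before reaching for group: the collection must not be sharded, and the number of distinct group keys must stay within the limit quoted in the first error message. Below is a minimal sketch in plain Python, assuming the documents are available as a list of dicts. The `can_use_group` helper and its parameters are hypothetical, and treating exactly 10000 keys as acceptable is an assumption read off the wording "more than 10000".

```python
GROUP_KEY_LIMIT = 10000  # taken from the error message above

def can_use_group(docs, key_field, sharded=False):
    """Return True if a group over key_field would avoid both errors above."""
    if sharded:
        # "can't do command: group on sharded collection"
        return False
    unique_keys = {doc[key_field] for doc in docs}
    # "group() can't handle more than 10000 unique keys"
    return len(unique_keys) <= GROUP_KEY_LIMIT

# Hypothetical data: 10001 distinct user ids is one key too many.
too_many = [{"user_id": i} for i in range(GROUP_KEY_LIMIT + 1)]
few = [{"user_id": i % 3} for i in range(100)]

print(can_use_group(few, "user_id"))                # True
print(can_use_group(too_many, "user_id"))           # False
print(can_use_group(few, "user_id", sharded=True))  # False
```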