MongoDB --- Manual sharding
Source: Internet. Published by: 淘宝aj. Editor: 程序博客网. Time: 2024/05/17 22:25
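I recently came across a discussion about manual sharding on a Google Group. I have not tried it myself yet, but the approach looks workable. Since Google Groups can only be reached through a proxy from here, I am posting the thread below for easy reference.
The question posted by Zer0: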
-----------------------------
Sorry for my English
I've read all the documents on the home page and searched many other sites,
but I still cannot configure manual sharding.
Some people say "moveChunk" is the manual way. That's OK, but what I want is
more than that.
For example, a document as follow:
{"name":"John", "age":21}
Shard key is "name:1"
How can I configure shard1 to hold documents whose "name" starts with
A-O, and shard2 to hold names from P-Z?
No auto sharding, no auto rebalance at all.
Thanks so much
You can split at any point you like, even at non-existing keys:
http://www.mongodb.org/display/DOCS/Splitting+Chunks
If you want to move a chunk manually
http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingA...
And you can stop the balancer
http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingA...
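As a rough sketch of how the three steps above could answer the A-O / P-Z question, the snippet below only builds the command documents; it does not talk to a cluster. The namespace "test.people" and the shard ids "shard0000" and "shard0001" are made-up placeholders, and the sketch assumes the collection is already sharded on {name: 1}, the balancer has been stopped, and all names start with an uppercase letter.

```javascript
// Hypothetical names: replace with your own namespace and shard ids.
var NS = "test.people";
var SHARD_A_TO_O = "shard0000";
var SHARD_P_TO_Z = "shard0001";

// Build the admin commands that keep names below "P" on one shard
// and names from "P" upward on the other.
function buildManualShardingCommands(ns, lowShard, highShard) {
  return [
    // Split the key space at "P"; the split point need not be an existing key.
    { split: ns, middle: { name: "P" } },
    // "A" falls into the chunk below "P".
    { moveChunk: ns, find: { name: "A" }, to: lowShard },
    // "P" is the lower bound of the other chunk.
    { moveChunk: ns, find: { name: "P" }, to: highShard }
  ];
}

var commands = buildManualShardingCommands(NS, SHARD_A_TO_O, SHARD_P_TO_Z);

// In the mongo shell, after "use admin", each command could be run with:
//   commands.forEach(function (c) { printjson(db.runCommand(c)); });
```

Building the commands as plain data first makes it easy to review them before anything is changed on the cluster.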
Here's a script I use from the mongo shell.
You will need to change:
-- number of shards
-- min and max values of the shard key
-- value delta between chunks
-- collection name
-Alvin
use admin

// Zero-pad a number to the given width, e.g. pad(3, 4) returns "0003".
function pad(number, length) {
    var str = '' + number;
    while (str.length < length) {
        str = '0' + str;
    }
    return str;
}

var shards = 5;
var min_value = -2061389163;
var max_value = 2061389163;
var inc = 40000000;
var collection_name = "scaleout.blogs";

// Split at every inc-th key value and spread the chunks round-robin over the shards.
for (var j = 0, i = min_value; i < max_value; i += inc, j++) {
    db.runCommand( { split : collection_name, middle : { ts : i } } );
    db.runCommand( { moveChunk : collection_name, find : { ts : i + 1 },
                     to : "shard" + pad((j % shards), 4) } );
}

db.printShardingStatus()
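Before running a script like this against a real cluster, it can help to do a dry run that only computes which split points and target shards the loop would produce. The sketch below reuses the numbers from the script; planChunks is a hypothetical helper name, and the "shardNNNN" naming is taken from the script and may not match your shard ids.

```javascript
// Zero-pad a number to the given width, e.g. pad(3, 4) returns "0003".
function pad(number, length) {
  var str = '' + number;
  while (str.length < length) {
    str = '0' + str;
  }
  return str;
}

// Compute, without touching the cluster, the split point, the key used to
// locate the chunk for moveChunk, and the target shard for each iteration.
function planChunks(minValue, maxValue, inc, shards) {
  var plan = [];
  for (var j = 0, i = minValue; i < maxValue; i += inc, j++) {
    plan.push({
      splitAt: i,
      moveFind: i + 1,
      shard: "shard" + pad(j % shards, 4)
    });
  }
  return plan;
}

var plan = planChunks(-2061389163, 2061389163, 40000000, 5);
// plan.length is the number of split/moveChunk pairs the script would issue.
```

Checking the plan length and the first and last entries is a cheap way to confirm the chosen increment gives a sensible number of chunks per shard.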
-------------------------------------------
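Another Google Group discussion, "fastest way to import a large dataset", also suggests manual sharding first and then running several mongoimport processes, each loading data into its corresponding shard. It is worth reading if you can reach it through a proxy.
The first few posts are reproduced below: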
Hey there,
we are having big trouble importing a large dataset into mongo in a
reasonable time.
We have a 6 node sharded cluster and we tried a couple of different
approaches.
The dataset consist of 1.4B small documents. Average size about 70
bytes.
Fastest import we have seen was 24 hours.
We would have thought that a mongos per machine with a couple of
mongoimports per node should give the best results. But oddly enough -
that's not faster - it's rather slower than a single mongoimport for
the whole cluster.
Right now I am wondering if there is a way to import the pre-sharded
documents into the shard databases using the --dbpath option and then
adjust the config database accordingly. Would that work? ...and be
faster?
Indexes beforehand or after?
cheers,
Torsten
Nat
-------------------------
What is your shard key?
- Indexing after the import is better than indexing beforehand
- If you already preshard the data, turn the balancer off first
- You should break the import data in the same way that you preshard
and use mongoimport to load them up
- Your data should be sorted by shard key if possible
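The advice to break the import data along the pre-shard boundaries and sort it by shard key could be sketched as follows. This is only an illustration: bucketByShardKey is a hypothetical helper, the sample documents are made up, and the single split point "P" is borrowed from the A-O / P-Z example earlier. Each resulting bucket would be written to its own file and loaded by its own mongoimport.

```javascript
// Distribute documents into buckets delimited by sorted split points.
// Bucket k holds keys in [splitPoints[k-1], splitPoints[k]); the first
// bucket is open below and the last bucket is open above.
function bucketByShardKey(docs, key, splitPoints) {
  var buckets = [];
  for (var b = 0; b <= splitPoints.length; b++) {
    buckets.push([]);
  }
  docs.forEach(function (doc) {
    var k = 0;
    while (k < splitPoints.length && doc[key] >= splitPoints[k]) {
      k++;
    }
    buckets[k].push(doc);
  });
  // Sort every bucket by the shard key, as recommended for the import.
  buckets.forEach(function (bucket) {
    bucket.sort(function (x, y) {
      return x[key] < y[key] ? -1 : (x[key] > y[key] ? 1 : 0);
    });
  });
  return buckets;
}

var docs = [
  { name: "Zoe", age: 30 },
  { name: "John", age: 21 },
  { name: "Alice", age: 25 },
  { name: "Pete", age: 40 }
];
var buckets = bucketByShardKey(docs, "name", ["P"]);

// One JSON document per line is the format mongoimport reads by default.
var fileContents = buckets.map(function (bucket) {
  return bucket.map(function (d) { return JSON.stringify(d); }).join("\n");
});
```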
We tried _id (ObjectIds) as well as our preferred keys
So far we have been trying to index while importing.
We can give that another try.
I would shut down config server and mongos for the import.
Is that what you mean?
Of course.
> - Your data should be sorted by shard key if possible
OK
Biggest question: will it be worth it?
cheers,
Torsten
Nat
-----------------------------
- If you use ObjectId as a shard key, you won't be able to scale the
import. The maximum speed is limited by the speed of one machine.
- You can leave your config server and mongos up and do the import via
mongos.
- To turn off balancer,
> use config
> db.settings.update({_id:"balancer"},{$set : {stopped:true}}, true)
> import. The maximum speed is limited by the speed of one machine.
Why is that?
The ObjectIds should be quite different across the machines and so
hopefully fall into different chunks.
> mongos.
Confused - that's what I was doing before.
mongo1: shardsrv mongos 2*mongoimport configsrv
mongo2: shardsrv mongos 2*mongoimport configsrv
mongo3: shardsrv mongos 2*mongoimport configsrv
mongo4: shardsrv mongos 2*mongoimport
mongo5: shardsrv mongos 2*mongoimport
mongo6: shardsrv mongos 2*mongoimport
Or do you mean...
Splitting up the pre-sharded dataset across the nodes. Then turn off
balancing. But instead of using --dbpath use mongos? Wouldn't --dbpath
be faster? Wouldn't writes still get routed to other shards with
mongos?
> > use config
> > db.settings.update({_id:"balancer"},{$set : {stopped:true}}, true)
Ah ... OK.
cheers,
Torsten
Nat
----------------------
- ObjectId is keyed by timestamp first.
- You can use --dbpath but you have to take mongod offline. I just
recommended another way that does not require taking down mongod. As you
will run each mongoimport on data split by shard key, mongos should route
requests to one server per mongoimport.
- Do you have mongostat, iostat, db.stats() during import process?
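To illustrate the point that an ObjectId is keyed by timestamp first: the leading 4 bytes (8 hex characters) of an ObjectId are a big-endian count of seconds since the Unix epoch. The small sketch below extracts that prefix from the hex form; the two ids are made-up values, and objectIdTimestamp is a hypothetical helper name.

```javascript
// Return the creation time, in seconds since the Unix epoch, encoded in the
// first 4 bytes of an ObjectId given as a 24-character hex string.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId");
  }
  return parseInt(hex.slice(0, 8), 16);
}

// Two made-up ids whose timestamp prefixes differ by 0x3c seconds.
var first = "4d5c2f000000000000000001";
var later = "4d5c2f3c0000000000000002";

var gap = objectIdTimestamp(later) - objectIdTimestamp(first);
// The ids sort by their timestamp prefix before anything else.
var sortsByTime = first < later;
```

Because ids generated around the same time share almost the same prefix, they sort next to each other regardless of which machine created them, so new inserts tend to land in the same top chunk.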
True ... but even with our preferred sharding key [user, time] it
doesn't behave much better.
That's fine.
> will run each mongoimport on data split by shard key, mongos should route
> requests to one server per mongoimport.
But doesn't that depend on what chunks are configured in the config server?
Certainly. With the current non-pre-sharded import...
- mongostat shows looong "holes" with no ops at all. I assume that's
the balancer - but not sure. numbers were much better in the beginning
of the import.
- iostat shows quite uneven activity across the nodes.
- db.stats() we are monitoring over time. the following shows the
objects graphed:
https://skitch.com/tcurdt/rpti6/import-speed
Nat
--------------------------
If you use the sharding key [user, time] and turn off the balancer, you
should see better results. Can you post the iostat and mongostat output?
What version are you on?
You should shard on user,time as you want to do.
The speed is probably because of migrations.
2 main options:
- try 1.7.5
- pre-split the collection into a lot of chunks, let the balancer
move them around, then insert.
this will prevent migrates.
I would not mess with --dbpath or turning off the balancer; that's
much more complicated than what you need to do.
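The pre-split option mentioned above could look roughly like this for a [user, time] key. The sketch only generates split commands and leaves the chunk placement to the balancer. It rests on several assumptions that are not in the thread: user ids are numeric and spread evenly over a known range, time is stored as a number, and "scaleout.events" is a placeholder namespace.

```javascript
// Generate evenly spaced split commands over a numeric user id range.
// chunks - 1 split points produce the requested number of chunks.
function buildPreSplitCommands(ns, userMin, userMax, chunks) {
  var commands = [];
  var step = (userMax - userMin) / chunks;
  for (var k = 1; k < chunks; k++) {
    var boundary = Math.floor(userMin + k * step);
    // The split point must name every field of the shard key.
    commands.push({ split: ns, middle: { user: boundary, time: 0 } });
  }
  return commands;
}

var preSplit = buildPreSplitCommands("scaleout.events", 0, 1000, 4);

// In the mongo shell, after "use admin", and before inserting any data:
//   preSplit.forEach(function (c) { printjson(db.runCommand(c)); });
```

With the chunks already spread out before the import starts, inserts go to their final shard directly, which is what avoids the migrations.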
... The thread has more than 60 comments in total; read the rest on the Google Group if you are interested (proxy required).