MongoDB --- Manual sharding


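I recently came across a discussion about manual sharding on Google Groups. I have not tried it myself yet, but the approach looks workable. Google Groups can only be reached through a proxy from here, so I am posting the thread for easy reference.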



Sorry for my English
I've read all the documents on the home page and searched many other sites,
but I still cannot configure manual sharding.
Someone said "moveChunk" is manual. That's OK, but what I want is
more than that.
For example, a document as follows:

{"name":"John", "age":21}

Shard key is "name:1"
How can I configure shard1 to hold documents where "name" starts with
A-O and shard2 to hold names from P-Z?
No auto sharding, no auto rebalance at all.

Thanks so much

Alberto Lerner's first reply (mostly pointers to the official documentation, which is also what the script below relies on):


You can split at any point you like, even at non-existing keys:

If you want to move a chunk manually

And you can stop the balancer
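
Applied to the question above, the three steps (split, move, stop the balancer) might look like the sketch below. It is only an illustration: the namespace "test.people" is a made-up name, and "shard1"/"shard2" are taken from the question and would have to match the shard ids of the actual cluster. The functions only build the command documents, so the snippet also runs outside the mongo shell. The commented lines show how they would be sent from a shell connected to a mongos, using the balancer update quoted later in this post.

```javascript
// Hypothetical namespace; replace with your own "<database>.<collection>".
var ns = "test.people";

// Split the key space at the given name. The key does not have to exist.
// Splitting at "P" leaves names below "P" (A-O) in one chunk and names
// from "P" upwards (P-Z) in the other.
function splitCommand(ns, name) {
    return { split: ns, middle: { name: name } };
}

// Move the chunk that contains the given name to the given shard.
function moveChunkCommand(ns, name, shard) {
    return { moveChunk: ns, find: { name: name }, to: shard };
}

var commands = [
    splitCommand(ns, "P"),
    moveChunkCommand(ns, "A", "shard1"),
    moveChunkCommand(ns, "P", "shard2")
];

// In the mongo shell, connected to a mongos:
//   commands.forEach(function (c) { printjson(db.adminCommand(c)); });
// To stop the balancer (an update on the config database):
//   db.getSiblingDB("config").settings.update(
//       { _id: "balancer" }, { $set: { stopped: true } }, true);
```

String keys compare byte by byte, so with a split point of "P" any lower-case names would sort after "Z" and end up on shard2. Normalizing the case of the shard key field is one way around that.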

Alvin Richards provided an example script:


Here's a script I use from the mongo shell.

You will need to change
-- number of shards
-- min and max values of the shard key
-- value delta between chunks
-- collection name


use admin

// Left-pad a number with zeros, e.g. pad(3, 4) returns "0003"
function pad(number, length) {
    var str = '' + number;
    while (str.length < length) {
        str = '0' + str;
    }
    return str;
}

var shards=5
var min_value=-2061389163
var max_value=2061389163
var inc=40000000
var collection_name="scaleout.blogs"

// Split at every inc-th value of ts and spread the resulting chunks
// round-robin over shard0000 .. shard0004
for (var j=0,i=min_value; i < max_value; i+=inc,j++)  {
        db.runCommand( { split : collection_name, middle : { ts : i }} );
        db.runCommand( { moveChunk: collection_name, find : { ts : i+1}, to : "shard" + pad((j%shards),4) } );
}

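Before running a script like this against a real cluster, it may help to preview which split points and target shards the loop produces. The following dry run is a sketch, not part of the original script. It reuses the same arithmetic but only collects the results in an array, so it can be run with Node.js (in the mongo shell, replace console.log with printjson). The name planChunks is made up for this example.

```javascript
// Left-pad a number with zeros, e.g. pad(3, 4) gives "0003".
function pad(number, length) {
    var str = "" + number;
    while (str.length < length) {
        str = "0" + str;
    }
    return str;
}

// Return the split point and target shard for every loop iteration,
// without contacting any server.
function planChunks(minValue, maxValue, inc, shards) {
    var plan = [];
    for (var j = 0, i = minValue; i < maxValue; i += inc, j++) {
        plan.push({ splitAt: i, shard: "shard" + pad(j % shards, 4) });
    }
    return plan;
}

var plan = planChunks(-2061389163, 2061389163, 40000000, 5);
console.log(plan.length);                      // number of split points
console.log(plan[0], plan[plan.length - 1]);   // first and last entry
```

With the values from the script this yields 104 split points, assigned to shard0000 through shard0004 in turn.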



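Another Google Groups discussion, "fastest way to import a large dataset", also suggests pre-sharding manually and then running several mongoimport processes, each loading its part of the data into the corresponding shard. It is worth reading if you can get past the firewall.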




Hey there,

we are having big trouble importing a large dataset into mongo in a
reasonable time.
We have a 6 node sharded cluster and we tried a couple of different

The dataset consist of 1.4B small documents. Average size about 70
Fastest import we have seen was 24 hours.

We would have thought that a mongos per machine with a couple of
mongoimports per node should give the best results. But oddly enough -
that's not faster - it's rather slower than a single mongoimport for
the whole cluster.

Right now I am wondering if there is a way to import the pre-sharded
documents into the shard databases using the --dbpath option and then
adjust the config database accordingly. Would that work? ...and be
Indexes beforehand or after?




What is your shard key?
- Index after is better than index before hand
- If you already preshard the data, turn the balancer off first
- You should break the import data in the same way that you preshard
and use mongoimport to load them up
- Your data should be sorted by shard key if possible
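
The last two points (break the import data up the same way as the pre-split, and sort it by shard key) can be sketched as follows. This is an illustration under assumptions, not the poster's tooling: the function names are invented, the shard key is a single field, and the data is small enough to hold in memory. Each resulting bucket would then be written to its own file and loaded by a separate mongoimport.

```javascript
// Return the index of the chunk a key belongs to, given ascending split points.
// Chunk 0 holds keys below splitPoints[0]; chunk k holds keys from
// splitPoints[k - 1] (inclusive) up to splitPoints[k] (exclusive).
function chunkIndex(splitPoints, key) {
    var k = 0;
    while (k < splitPoints.length && key >= splitPoints[k]) {
        k++;
    }
    return k;
}

// Group documents by chunk and sort each group by the shard key.
function partitionForImport(docs, keyField, splitPoints) {
    var buckets = [];
    for (var n = 0; n <= splitPoints.length; n++) {
        buckets.push([]);
    }
    docs.forEach(function (doc) {
        buckets[chunkIndex(splitPoints, doc[keyField])].push(doc);
    });
    buckets.forEach(function (bucket) {
        bucket.sort(function (a, b) {
            return a[keyField] < b[keyField] ? -1 : a[keyField] > b[keyField] ? 1 : 0;
        });
    });
    return buckets;
}

var docs = [
    { name: "Zoe", age: 30 }, { name: "John", age: 21 },
    { name: "Alice", age: 25 }, { name: "Pete", age: 40 }
];
var buckets = partitionForImport(docs, "name", ["P"]);
// buckets[0] holds Alice and John (in that order), buckets[1] holds Pete and Zoe
```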

Torsten Curdt


> What is your shard key?

We tried _id (ObjectIds) as well as our preferred keys

> - Index after is better than index before hand

So far we have been trying to index while importing.
We can give that another try.

> - If you already preshard the data, turn the balancer off first

I would shut down config server and mongos for the import.
Is that what you mean?

> - You should break the import data in the same way that you preshard

Of course.

> and use mongoimport to load them up
> - Your data should be sorted by shard key if possible


Biggest question: will it be worth it?




- If you use ObjectId as a shard key, you won't be able to scale the
import. The maximum speed is limited by the speed of one machine.
- You can leave your config server and mongos up and do the import via
mongos.
- To turn off balancer,
   > use config
   > db.settings.update({_id:"balancer"},{$set : {stopped:true}},
true)

Torsten Curdt


> - If you use ObjectId as a shard key, you won't be able to scale the
> import. The maximum speed is limited by the speed of one machine.

Why is that?
The ObjectIds should be quite different across the machines and so
hopefully fall into different chunks.

> - You can leave your config server and mongos up and do the import via
> mongos.

Confused - that's what I was doing before.

mongo1: shardsrv mongos 2*mongoimport configsrv
mongo2: shardsrv mongos 2*mongoimport configsrv
mongo3: shardsrv mongos 2*mongoimport configsrv
mongo4: shardsrv mongos 2*mongoimport
mongo5: shardsrv mongos 2*mongoimport
mongo6: shardsrv mongos 2*mongoimport

Or do you mean...

Splitting up the pre-sharded dataset across the nodes. Then turn off
balancing. But instead of using --dbpath use mongos? Wouldn't --dbpath
be faster? Wouldn't writes still get routed to other shards with

> - To turn off balancer,
>   > use config
>   > db.settings.update({_id:"balancer"},{$set : {stopped:true}},
> true)

Ah ... OK.



- ObjectId is keyed by timestamp first.
- You can use --dbpath but you have to take mongod offline. I just
recommended another way without taking down mongod. As you will
perform mongoimport splitted by shard key, mongos should route
requests to one server per mongoimport.
- Do you have mongostat, iostat, db.stats() during import process?
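
The first point is why ObjectIds generated on different machines still end up next to each other. The sketch below builds strings laid out like an ObjectId (4-byte timestamp first, then 5 bytes that depend on the generating machine and process, then a 3-byte counter) and sorts them. The helper names and all values are made up for illustration; real ObjectIds are generated by the driver or the server.

```javascript
// Format a non-negative integer as a fixed-width, zero-padded hex string.
function toHex(value, bytes) {
    var hex = value.toString(16);
    while (hex.length < bytes * 2) {
        hex = "0" + hex;
    }
    return hex;
}

// 24 hex characters: timestamp (4 bytes) + process part (5 bytes) + counter (3 bytes).
function fakeObjectIdHex(seconds, processHex, counter) {
    return toHex(seconds, 4) + processHex + toHex(counter, 3);
}

var t = 1300000000; // some import start time, in seconds since the epoch
var ids = [
    fakeObjectIdHex(t + 1, "aaaaaaaaaa", 1),  // machine A, one second later
    fakeObjectIdHex(t, "ffffffffff", 7),      // machine B
    fakeObjectIdHex(t + 1, "0000000000", 2),  // machine C, one second later
    fakeObjectIdHex(t, "1111111111", 3)       // machine D
];
ids.sort();
// After sorting, both ids generated at second t come before both ids
// generated at second t + 1, regardless of the machine part.
```

Because the timestamp dominates the ordering, all ids created during an import sort at the upper end of the key range, so the inserts land in the chunk that covers that end and therefore on a single shard at any given time.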

Torsten Curdt


> - ObjectId is keyed by timestamp first.

True ... but even with our preferred sharding key [user, time] it
doesn't behave much better.

> - You can use --dbpath but you have to take mongod offline.

That's fine.

> I just recommended another way without taking down mongod. As you will
> perform mongoimport splitted by shard key, mongos should route
> requests to one server per mongoimport.

But doesn't that depend on what chunks are configured in the config server?

> - Do you have mongostat, iostat, db.stats() during import process?

Certainly. With the current non-pre-sharded import...

- mongostat shows looong "holes" with no ops at all. I assume that's
the balancer - but not sure. numbers were much better in the beginning
of the import.

- iostat shows quite uneven activity across the nodes.

- db.stats() we are monitoring over time. the following shows the
objects graphed:



If you use the sharding key [user, time] and turn off the balancer, you
should see better results. Can you post the iostat and mongostat output?

Eliot Horowitz


What version are you on?
You should shard on user,time as you want to do.
The speed is probably because of migrations.

2 main options:
 - try 1.7.5
 - pre-split the collection into a lot of chunks, let the balancer
move them around, then insert.
   this will prevent migrates.
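
A pre-split on a compound key such as [user, time] might be prepared along these lines. This is a rough sketch under assumptions: the namespace, the user ids, the field names user and time, and the use of time: 0 as the lower bound of each new chunk are all invented for the example. The function only builds the split commands; the commented line shows how they would be sent from a mongo shell connected to a mongos.

```javascript
// Build split commands for a compound shard key { user, time }.
// Each boundary starts a new chunk at the first possible document of that
// user; time: 0 is an assumed lower bound for the time field.
function preSplitCommands(ns, userBoundaries) {
    return userBoundaries.map(function (user) {
        return { split: ns, middle: { user: user, time: 0 } };
    });
}

// Hypothetical namespace and user ids.
var cmds = preSplitCommands("mydb.events", ["user2000", "user4000", "user6000"]);

// In the mongo shell:
//   cmds.forEach(function (c) { printjson(db.adminCommand(c)); });
```

Three boundaries give four chunks. In practice the boundaries would be chosen from the real distribution of user values so that the chunks come out roughly even.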

I would not mess with --dbpath or turning off the balancer, that's
much more complicated than you need to do.
