MongoDB pre-splitting sharding test


There is plenty of material online about MongoDB pre-splitting, and its benefit is obvious: it avoids the performance hit of chunks being split and migrated by the auto-balancer while data is being loaded. The steps below walk through a pre-splitting exercise, purely as a test.

First, set up the environment, ideally replica sets plus sharding. This test was run on a single machine with two shards, each shard a 3-member replica set. Before pre-splitting, work out roughly how large the collection to be sharded will become and how big an average document is. If you are not sure, create a throwaway collection, insert one representative document, and look at db.<coll>.stats(): the avgObjSize field is the size of one document in bytes.
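For example, a quick way to check this (a minimal sketch; the collection name tmp is just a throwaway used for the estimate):

mongos> use test
mongos> db.tmp.insert({_id:1, name:"wzw", age:1, uid:2})   // one representative document
mongos> db.tmp.stats().avgObjSize                          // average document size in bytes (about 112 here)
mongos> db.tmp.drop()                                      // remove the throwaway collection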

The test procedure is as follows. The chunk size is set to 5 MB, and the data is loaded with:

for(var i=1; i<=200000; i++){ db.test.insert({_id:i, name:"wzw", age:i, uid:i+1}); }

(the insert is actually run last; it is shown here only as a reference). Each document is about 112 bytes, so 200,000 documents come to roughly 21 MB.
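The arithmetic behind those numbers, done in the shell just as a sanity check:

mongos> 200000 * 112 / 1024 / 1024    // total data volume: about 21.4 MB
21.3623046875
mongos> 20000 * 112 / 1024 / 1024     // per-chunk size with 10 chunks: about 2.1 MB, well under the 5 MB chunksize
2.13623046875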

The test is done in the test database.

        use test;

        mongos> sh.enableSharding("test");

{ "ok" : 1 }

        mongos> sh.shardCollection("test.test",{"_id":1});
{ "collectionsharded" : "test.test", "ok" : 1 }

The 200,000 documents should end up spread across the two shards, about 10 MB (100,000 documents) per shard, so the plan is:

split into 10 chunks of 20,000 documents each;

stop the auto-balancer before making any changes.

mongos> use test
switched to db test
mongos> sh.stopBalancer();
Waiting for active hosts...
Waiting for the balancer lock...
Waiting again for active hosts after balancer is off...
mongos> use admin
switched to db admin
mongos> for( var y=1; y<10; y++ ) {
...     var x=20000
...     var prefix = x*y;
...     db.runCommand( { split : "test.test" , middle : { "_id": prefix } } );
...   }
{ "ok" : 1 }
mongos> sh.status();
--- Sharding Status --- 
  sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("55c82589acfaad53c0a64845")
}
  shards:
{  "_id" : "shard1",  "host" : "shard1/127.0.0.1:11111,127.0.0.1:22222,127.0.0.1:33333" }
{  "_id" : "shard2",  "host" : "shard2/127.0.0.1:44444,127.0.0.1:55555,127.0.0.1:60000" }
  balancer:
Currently enabled:  no
Currently running:  no
Failed balancer rounds in last 5 attempts:  0
Migration Results for the last 24 hours: 
85 : Success
  databases:
{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
{  "_id" : "test",  "partitioned" : true,  "primary" : "shard1" }
test.test
shard key: { "_id" : 1 }
chunks:
shard1 10
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : 20000 } on : shard1 Timestamp(1, 1) 
{ "_id" : 20000 } -->> { "_id" : 40000 } on : shard1 Timestamp(1, 3) 
{ "_id" : 40000 } -->> { "_id" : 60000 } on : shard1 Timestamp(1, 5) 
{ "_id" : 60000 } -->> { "_id" : 80000 } on : shard1 Timestamp(1, 7) 
{ "_id" : 80000 } -->> { "_id" : 100000 } on : shard1 Timestamp(1, 9) 
{ "_id" : 100000 } -->> { "_id" : 120000 } on : shard1 Timestamp(1, 11) 
{ "_id" : 120000 } -->> { "_id" : 140000 } on : shard1 Timestamp(1, 13) 
{ "_id" : 140000 } -->> { "_id" : 160000 } on : shard1 Timestamp(1, 15) 
{ "_id" : 160000 } -->> { "_id" : 180000 } on : shard1 Timestamp(1, 17) 
{ "_id" : 180000 } -->> { "_id" : { "$maxKey" : 1 } } on : shard1 Timestamp(1, 18) 
{  "_id" : "user",  "partitioned" : true,  "primary" : "shard1" }
user.test
shard key: { "_id" : 1 }
chunks:
shard1 1
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : shard1 Timestamp(1, 0) 

As you can see, the collection has been split into 10 chunks along the predefined boundaries. All of them are still on shard1, and the key ranges are evenly sized (no data has been inserted yet).
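For reference, the same nine split points could also be created with the sh.splitAt() helper instead of the raw split command; a minimal sketch of the equivalent loop:

mongos> for (var y = 1; y < 10; y++) {
...     sh.splitAt("test.test", { _id: y * 20000 });   // split at 20000, 40000, ..., 180000
... }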

mongos> use test
switched to db test
mongos> db.test.stats();
{
"sharded" : true,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"ns" : "test.test",
"count" : 0,
"numExtents" : 1,
"size" : 0,
"storageSize" : 8192,
"totalIndexSize" : 8176,
"indexSizes" : {
"_id_" : 8176
},
"avgObjSize" : 0,
"nindexes" : 1,
"nchunks" : 10,
"shards" : {
"shard1" : {
"ns" : "test.test",
"count" : 0,
"size" : 0,
"numExtents" : 1,
"storageSize" : 8192,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"nindexes" : 1,
"totalIndexSize" : 8176,
"indexSizes" : {
"_id_" : 8176
},
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(1439189515, 1),
"electionId" : ObjectId("55c81b008266db61a673c152")
}
}
},
"ok" : 1
}
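Instead of re-enabling the balancer and waiting for it to migrate chunks, the empty chunks could also be placed on the second shard by hand with sh.moveChunk(); a sketch, assuming the shard names from this setup:

mongos> for (var y = 0; y < 5; y++) {
...     sh.moveChunk("test.test", { _id: y * 20000 }, "shard2");   // move the first five chunks to shard2
... }

In this test the balancer is simply restarted and left to do the moves: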

mongos> sh.startBalancer();

mongos> sh.status();
--- Sharding Status --- 
  sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("55c82589acfaad53c0a64845")
}
  shards:
{  "_id" : "shard1",  "host" : "shard1/127.0.0.1:11111,127.0.0.1:22222,127.0.0.1:33333" }
{  "_id" : "shard2",  "host" : "shard2/127.0.0.1:44444,127.0.0.1:55555,127.0.0.1:60000" }
  balancer:
Currently enabled:  yes
Currently running:  no
Failed balancer rounds in last 5 attempts:  0
Migration Results for the last 24 hours: 
90 : Success
  databases:
{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
{  "_id" : "test",  "partitioned" : true,  "primary" : "shard1" }
test.test
shard key: { "_id" : 1 }
chunks:
shard1 5
shard2 5
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : 20000 } on : shard2 Timestamp(2, 0) 
{ "_id" : 20000 } -->> { "_id" : 40000 } on : shard2 Timestamp(3, 0) 
{ "_id" : 40000 } -->> { "_id" : 60000 } on : shard2 Timestamp(4, 0) 
{ "_id" : 60000 } -->> { "_id" : 80000 } on : shard2 Timestamp(5, 0) 
{ "_id" : 80000 } -->> { "_id" : 100000 } on : shard2 Timestamp(6, 0) 
{ "_id" : 100000 } -->> { "_id" : 120000 } on : shard1 Timestamp(6, 1) 
{ "_id" : 120000 } -->> { "_id" : 140000 } on : shard1 Timestamp(1, 13) 
{ "_id" : 140000 } -->> { "_id" : 160000 } on : shard1 Timestamp(1, 15) 
{ "_id" : 160000 } -->> { "_id" : 180000 } on : shard1 Timestamp(1, 17) 
{ "_id" : 180000 } -->> { "_id" : { "$maxKey" : 1 } } on : shard1 Timestamp(1, 18) 
{  "_id" : "user",  "partitioned" : true,  "primary" : "shard1" }
user.test
shard key: { "_id" : 1 }
chunks:
shard1 1
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : shard1 Timestamp(1, 0) 


After the balancer has run, the chunks are spread evenly across the two shards, five on each. Now insert the data:

for(var i=1; i<=200000; i++){ db.test.insert({_id:i,name:"wzw",age:i,uid:i+1}); }

While the inserts run, watch the cluster from another session with mongostat:

[root@mongodb ~]# mongostat --port 50000
insert query update delete getmore command flushes mapped  vsize  res faults qr|qw ar|aw netIn netOut conn set repl     time
   849    *0     *0     *0       0   850|0       0        263.0M 8.0M      0   0|0   0|0  125k    54k    2      RTR 03:51:22
   937    *0     *0     *0       0   938|0       0        263.0M 8.0M      0   0|0   0|0  138k    59k    2      RTR 03:51:23
   889    *0     *0     *0       0   890|0       0        263.0M 8.0M      0   0|0   0|0  131k    56k    2      RTR 03:51:24
   862    *0     *0     *0       0   863|0       0        263.0M 8.0M      0   0|0   0|0  127k    54k    2      RTR 03:51:25
   774    *0     *0     *0       0   776|0       0        263.0M 8.0M      0   0|0   0|0  114k    50k    2      RTR 03:51:26
   851    *0     *0     *0       0   852|0       0        263.0M 8.0M      0   0|0   0|0  125k    54k    2      RTR 03:51:27
   857    *0     *0     *0       0   858|0       0        263.0M 8.0M      0   0|0   0|0  126k    54k    2      RTR 03:51:28
   800    *0     *0     *0       0   801|0       0        263.0M 8.0M      0   0|0   0|0  118k    51k    2      RTR 03:51:29
   816    *0     *0     *0       0   817|0       0        263.0M 8.0M      0   0|0   0|0  120k    52k    2      RTR 03:51:30
   838    *0     *0     *0       0   840|0       0        263.0M 8.0M      0   0|0   0|0  123k    53k    2      RTR 03:51:31
insert query update delete getmore command flushes mapped  vsize  res faults qr|qw ar|aw netIn netOut conn set repl     time
   875    *0     *0     *0       0   876|0       0        263.0M 8.0M      0   0|0   0|0  129k    55k    2      RTR 03:51:32
   778    *0     *0     *0       0   779|0       0        263.0M 8.0M      0   0|0   0|0  114k    50k    2      RTR 03:51:33
   850    *0     *0     *0       0   851|0       0        263.0M 8.0M      0   0|0   0|0  125k    54k    2      RTR 03:51:34
   897    *0     *0     *0       0   898|0       0        263.0M 8.0M      0   0|0   0|0  132k    56k    2      RTR 03:51:35
   848    *0     *0     *0       0   850|0       0        263.0M 8.0M      0   0|0   0|0  125k    54k    2      RTR 03:51:36
   846    *0     *0     *0       0   847|0       0        263.0M 8.0M      0   0|0   0|0  124k    54k    2      RTR 03:51:37
   886    *0     *0     *0       0   887|0       0        263.0M 8.0M      0   0|0   0|0  130k    56k    2      RTR 03:51:38
   866    *0     *0     *0       0   867|0       0        263.0M 8.0M      0   0|0   0|0  127k    55k    2      RTR 03:51:39
   880    *0     *0     *0       0   881|0       0        263.0M 8.0M      0   0|0   0|0  129k    55k    2      RTR 03:51:40
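mongostat can also be run with --discover, which, when pointed at the mongos, reports every member of the cluster as well and makes it easy to confirm that both shards are receiving writes (the run above watched only the mongos itself):

[root@mongodb ~]# mongostat --port 50000 --discover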

mongos> for(var i=1; i<=200000; i++){
... db.test.insert({_id:i,name:"wzw",age:i,uid:i+1});}
WriteResult({ "nInserted" : 1 })

mongos> db.test.stats();
{
"sharded" : true,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"ns" : "test.test",
"count" : 200000,
"numExtents" : 14,
"size" : 22400000,
"storageSize" : 45015040,
"totalIndexSize" : 5625088,
"indexSizes" : {
"_id_" : 5625088
},
"avgObjSize" : 112,
"nindexes" : 1,
"nchunks" : 10,
"shards" : {
"shard1" : {
"ns" : "test.test",
"count" : 100001,
"size" : 11200112,
"avgObjSize" : 112,
"numExtents" : 7,
"storageSize" : 22507520,
"lastExtentSize" : 11325440,
"paddingFactor" : 1,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"nindexes" : 1,
"totalIndexSize" : 2812544,
"indexSizes" : {
"_id_" : 2812544
},
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(1439189515, 1),
"electionId" : ObjectId("55c81b008266db61a673c152")
}
},
"shard2" : {
"ns" : "test.test",
"count" : 99999,
"size" : 11199888,
"avgObjSize" : 112,
"numExtents" : 7,
"storageSize" : 22507520,
"lastExtentSize" : 11325440,
"paddingFactor" : 1,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"nindexes" : 1,
"totalIndexSize" : 2812544,
"indexSizes" : {
"_id_" : 2812544
},
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(1439187391, 1),
"electionId" : ObjectId("55c82518bd50f12a8dd39e0f")
}
}
},
"ok" : 1
}

As you can see, the test collection ends up evenly distributed across the two shards, with roughly 100,000 documents on each.

That concludes the MongoDB pre-splitting test. The procedure itself is simple, but you have to work out the chunk boundaries carefully in advance; if the split points are poorly chosen, the data can still end up unbalanced. In production you would normally pre-create more chunks to leave room for future growth, and use a hashed shard key so that writes spread evenly instead of piling onto one chunk at a time. Because this was only a test, just 10 chunks were created and a plain (non-hashed) _id key was used.
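As a reference for that last point, a hashed shard key can be pre-split into many chunks at creation time with the numInitialChunks option of the shardCollection command (valid only for hashed keys on an empty collection); a sketch with a hypothetical namespace:

mongos> sh.enableSharding("mydb")
mongos> db.adminCommand({
...     shardCollection: "mydb.events",    // hypothetical collection
...     key: { _id: "hashed" },
...     numInitialChunks: 100              // pre-create 100 chunks, spread evenly across the shards
... })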

