mongodb调优那些事(二)-索引

来源:互联网 发布:股票编程怎么输入 编辑:程序博客网 时间:2024/05/16 06:59

第一次使用mongo开发系统,最近遇到了性能问题,发现索引建的不好,该集合是用来存储采集的信息,字段包括发布日期publishDate(int型)、分值blendedScore(int型)、发布时间publishTime(date型)、相似数量similarNum(int型)、是否垃圾isRubbish(int型)以及信息自身的一些字段(不涉及查询及排序在此不再罗列),数量大概有千万级。
业务描述:
1)可按发布日期倒序+得分倒序+发布时间倒序。
2)可按相似数量倒序+发布时间倒序。
3)可按发布时间倒序。
以上三种情况都可按照发布日期、是否垃圾、关键词等进行搜索。

简化为mongo查询语句如下:

1)db.infos.find{publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}.sort{publishDate:-1,blendedScore:-1,publishTime:-1}2)db.infos.find{publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}.sort{similarNum:-1,publishTime:-1}3)db.infos.find{publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}.sort{publishTime:-1}

为了得到最优的结果进行了以下尝试

第一次创建的索引:

db.infos.ensureIndex({isRubbish:1,publishDate:-1,blendedScore:-1,similarNum:-1,publishTime:-1},{name:"SortIdx",background:true});

考虑到节省空间,第一次按上面的方式试图将所有排序字段都放到一个索引中。这种方式的结果是:
1)业务1走索引速度最快。
2)业务2走索引,速度可接受。
3)业务3走索引,但速度慢到无法接受。
这种结果显然无法接受,所以继续修改索引。

第二次创建的索引:

db.infos.ensureIndex({publishDate:-1,blendedScore:-1,publishTime:-1},{name:"ScoreSortIdx",background:true}); //索引1db.infos.ensureIndex({similarNum:-1,publishTime:-1},{name:"HotSortIdx",background:true});  //索引2db.infos.ensureIndex({publishTime:-1},{name:"TimeSortIdx",background:true});  //索引3

由于每个业务的排序方式差别较大,而mongodb每次查询只会使用一个索引所以按照上面的方式创建了三个索引。这样创建后每个业务查询速度都很快。由于isRubbish是各个查询基本都会带的条件,所以考虑将isRubbish加入索引中,但是isRubbish字段只有0、1的取值,可能用作索引的意义不大,所以针对业务1的查询测试了一下。测试方法就是在该集合上再创建一个增加了isRubbish的字段的索引,创建语句:

db.infos.ensureIndex({isRubbish:1,publishDate:-1,blendedScore:-1,publishTime:-1},{name:"ScoreSortIdx1",background:true})

使用explain来查看执行情况,分别强制使用ScoreSortIdx和ScoreSortIdx1索引,mongo语句如下:

db.infos.find({publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}).sort({publishDate:-1,blendedScore:-1,publishTime:-1}).skip(10000).limit(1).hint("ScoreSortIdx").explain("executionStats");db.infos.find({publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}).sort({publishDate:-1,blendedScore:-1,publishTime:-1}).skip(10000).limit(1).hint("ScoreSortIdx1").explain("executionStats")

ScoreSortIdx explain的执行结果:

{  "executionStats": {    "executionSuccess": true,    "nReturned": 1,    "executionTimeMillis": 62,    "totalKeysExamined": 10438,    "totalDocsExamined": 10438,    "executionStages": {      "stage": "LIMIT",      "nReturned": 1,      "executionTimeMillisEstimate": 50,      "works": 10439,      "advanced": 1,      "needTime": 10437,      "needFetch": 0,      "saveState": 81,      "restoreState": 81,      "isEOF": 1,      "invalidates": 0,      "limitAmount": 0,      "inputStage": {        "stage": "SKIP",        "nReturned": 1,        "executionTimeMillisEstimate": 50,        "works": 10438,        "advanced": 1,        "needTime": 10437,        "needFetch": 0,        "saveState": 81,        "restoreState": 81,        "isEOF": 0,        "invalidates": 0,        "skipAmount": 0,        "inputStage": {          "stage": "FETCH",          "filter": {            "isRubbish": {              "$eq": 0            }          },          "nReturned": 10001,          "executionTimeMillisEstimate": 50,          "works": 10438,          "advanced": 10001,          "needTime": 437,          "needFetch": 0,          "saveState": 81,          "restoreState": 81,          "isEOF": 0,          "invalidates": 0,          "docsExamined": 10438,          "alreadyHasObj": 0,          "inputStage": {            "stage": "IXSCAN",            "nReturned": 10438,            "executionTimeMillisEstimate": 30,            "works": 10438,            "advanced": 10438,            "needTime": 0,            "needFetch": 0,            "saveState": 81,            "restoreState": 81,            "isEOF": 0,            "invalidates": 0,            "keyPattern": {              "publishDate": -1,              "blendedScore": -1,              "publishTime": -1            },            "indexName": "ScoreSortIdx",            "isMultiKey": false,            "direction": "forward",            "indexBounds": {              "publishDate": [                "[20160318.0, 20160310.0]"              ],              "blendedScore": [                "[MaxKey, MinKey]"              ],              "publishTime": [                "[MaxKey, MinKey]"              ]            },            "keysExamined": 10438,            "dupsTested": 0,            "dupsDropped": 0,            "seenInvalidated": 0,            "matchTested": 0          }        }      }    }  }}

ScoreSortIdx1 explain的执行结果:

{  "executionStats": {    "executionSuccess": true,    "nReturned": 1,    "executionTimeMillis": 14,    "totalKeysExamined": 10001,    "totalDocsExamined": 10001,    "executionStages": {      "stage": "LIMIT",      "nReturned": 1,      "executionTimeMillisEstimate": 10,      "works": 10002,      "advanced": 1,      "needTime": 10000,      "needFetch": 0,      "saveState": 78,      "restoreState": 78,      "isEOF": 1,      "invalidates": 0,      "limitAmount": 0,      "inputStage": {        "stage": "SKIP",        "nReturned": 1,        "executionTimeMillisEstimate": 10,        "works": 10001,        "advanced": 1,        "needTime": 10000,        "needFetch": 0,        "saveState": 78,        "restoreState": 78,        "isEOF": 0,        "invalidates": 0,        "skipAmount": 0,        "inputStage": {          "stage": "FETCH",          "nReturned": 10001,          "executionTimeMillisEstimate": 10,          "works": 10001,          "advanced": 10001,          "needTime": 0,          "needFetch": 0,          "saveState": 78,          "restoreState": 78,          "isEOF": 0,          "invalidates": 0,          "docsExamined": 10001,          "alreadyHasObj": 0,          "inputStage": {            "stage": "IXSCAN",            "nReturned": 10001,            "executionTimeMillisEstimate": 10,            "works": 10001,            "advanced": 10001,            "needTime": 0,            "needFetch": 0,            "saveState": 78,            "restoreState": 78,            "isEOF": 0,            "invalidates": 0,            "keyPattern": {              "isRubbish": 1,              "publishDate": -1,              "blendedScore": -1,              "publishTime": -1            },            "indexName": "ScoreSortIdx1",            "isMultiKey": false,            "direction": "forward",            "indexBounds": {              "isRubbish": [                "[0.0, 0.0]"              ],              "publishDate": [                "[20160318.0, 20160310.0]"              ],              "blendedScore": [                "[MaxKey, MinKey]"              ],              "publishTime": [                "[MaxKey, MinKey]"              ]            },            "keysExamined": 10001,            "dupsTested": 0,            "dupsDropped": 0,            "seenInvalidated": 0,            "matchTested": 0          }        }      }    }  }}

可以看到使用ScoreSortIdx1的查询时间比ScoreSortIdx不只快了一倍,当使用上面的条件进行count时,这种差距会更明显。

所以最终创建的索引:

db.infos.ensureIndex({isRubbish:1,publishDate:-1,blendedScore:-1,publishTime:-1},{name:"ScoreSortIdx",background:true}); //索引1db.infos.ensureIndex({isRubbish:1,similarNum:-1,publishTime:-1},{name:"HotSortIdx",background:true});  //索引2db.infos.ensureIndex({isRubbish:1,publishTime:-1},{name:"TimeSortIdx",background:true});  //索引3

发现mongodb的多列索引的第一个字段必须作为排序或查询字段,否则会执行全表扫描。所以在应用中所有的查询都保证必须使用isRubbish作为查询条件,即使不对isRubbish进行过滤,也会将查询条件加上{isRubbish:{$gte:0}}以保证查询都走索引。另外尽量丰富其他查询条件,减少命中的结果集数量。

后面读了一篇大神的文章:http://www.mongoing.com/eshu_explain3 mongo执行计划的详细讲解,得到一个建多列索引的原则:{精确匹配字段,排序字段,范围查询字段} 这样建起来的索引排序会更为高效(已做实际验证,的确是这样)。
由于该业务中存在{isRubbish:{$gte:0}}即范围查询的情况,所以最后将索引改成了下面的方式:

db.infos.ensureIndex({publishDate:-1,blendedScore:-1,publishTime:-1,isRubbish:1},{name:"ScoreSortIdx",background:true}); //索引1db.infos.ensureIndex({similarNum:-1,publishTime:-1,isRubbish:1},{name:"HotSortIdx",background:true});  //索引2db.infos.ensureIndex({publishTime:-1,isRubbish:1},{name:"TimeSortIdx",background:true});  //索引3
1 0
原创粉丝点击