mongodb调优那些事(二)-索引
来源:互联网 发布:股票编程怎么输入 编辑:程序博客网 时间:2024/05/16 06:59
第一次使用mongo开发系统,最近遇到了性能问题,发现索引建的不好,该集合是用来存储采集的信息,字段包括发布日期publishDate(int型)、分值blendedScore(int型)、发布时间publishTime(date型)、相似数量similarNum(int型)、是否垃圾isRubbish(int型)以及信息自身的一些字段(不涉及查询及排序在此不再罗列),数量大概有千万级。
业务描述:
1)可按发布日期倒序+得分倒序+发布时间倒序。
2)可按相似数量倒序+发布时间倒序。
3)可按发布时间倒序。
以上三种情况都可按照发布日期、是否垃圾、关键词等进行搜索。
简化为mongo查询语句如下:
1)db.infos.find{publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}.sort{publishDate:-1,blendedScore:-1,publishTime:-1}2)db.infos.find{publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}.sort{similarNum:-1,publishTime:-1}3)db.infos.find{publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}.sort{publishTime:-1}
为了得到最优的结果进行了以下尝试
第一次创建的索引:
db.infos.ensureIndex({isRubbish:1,publishDate:-1,blendedScore:-1,similarNum:-1,publishTime:-1},{name:"SortIdx",background:true});
考虑到节省空间,第一次按上面的方式试图将所有排序字段都放到一个索引中。这种方式的结果是:
1)业务1走索引速度最快。
2)业务2走索引,速度可接受。
3)业务3走索引,但速度慢到无法接受。
这种结果显然无法接受,所以继续修改索引。
第二次创建的索引:
db.infos.ensureIndex({publishDate:-1,blendedScore:-1,publishTime:-1},{name:"ScoreSortIdx",background:true}); //索引1db.infos.ensureIndex({similarNum:-1,publishTime:-1},{name:"HotSortIdx",background:true}); //索引2db.infos.ensureIndex({publishTime:-1},{name:"TimeSortIdx",background:true}); //索引3
由于每个业务的排序方式差别较大,而mongodb每次查询只会使用一个索引所以按照上面的方式创建了三个索引。这样创建后每个业务查询速度都很快。由于isRubbish是各个查询基本都会带的条件,所以考虑将isRubbish加入索引中,但是isRubbish字段只有0、1的取值,可能用作索引的意义不大,所以针对业务1的查询测试了一下。测试方法就是在该集合上再创建一个增加了isRubbish的字段的索引,创建语句:
db.infos.ensureIndex({isRubbish:1,publishDate:-1,blendedScore:-1,publishTime:-1},{name:"ScoreSortIdx1",background:true})
使用explain来查看执行情况,分别强制使用ScoreSortIdx和ScoreSortIdx1索引,mongo语句如下:
db.infos.find({publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}).sort({publishDate:-1,blendedScore:-1,publishTime:-1}).skip(10000).limit(1).hint("ScoreSortIdx").explain("executionStats");db.infos.find({publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}).sort({publishDate:-1,blendedScore:-1,publishTime:-1}).skip(10000).limit(1).hint("ScoreSortIdx1").explain("executionStats")
ScoreSortIdx explain的执行结果:
{ "executionStats": { "executionSuccess": true, "nReturned": 1, "executionTimeMillis": 62, "totalKeysExamined": 10438, "totalDocsExamined": 10438, "executionStages": { "stage": "LIMIT", "nReturned": 1, "executionTimeMillisEstimate": 50, "works": 10439, "advanced": 1, "needTime": 10437, "needFetch": 0, "saveState": 81, "restoreState": 81, "isEOF": 1, "invalidates": 0, "limitAmount": 0, "inputStage": { "stage": "SKIP", "nReturned": 1, "executionTimeMillisEstimate": 50, "works": 10438, "advanced": 1, "needTime": 10437, "needFetch": 0, "saveState": 81, "restoreState": 81, "isEOF": 0, "invalidates": 0, "skipAmount": 0, "inputStage": { "stage": "FETCH", "filter": { "isRubbish": { "$eq": 0 } }, "nReturned": 10001, "executionTimeMillisEstimate": 50, "works": 10438, "advanced": 10001, "needTime": 437, "needFetch": 0, "saveState": 81, "restoreState": 81, "isEOF": 0, "invalidates": 0, "docsExamined": 10438, "alreadyHasObj": 0, "inputStage": { "stage": "IXSCAN", "nReturned": 10438, "executionTimeMillisEstimate": 30, "works": 10438, "advanced": 10438, "needTime": 0, "needFetch": 0, "saveState": 81, "restoreState": 81, "isEOF": 0, "invalidates": 0, "keyPattern": { "publishDate": -1, "blendedScore": -1, "publishTime": -1 }, "indexName": "ScoreSortIdx", "isMultiKey": false, "direction": "forward", "indexBounds": { "publishDate": [ "[20160318.0, 20160310.0]" ], "blendedScore": [ "[MaxKey, MinKey]" ], "publishTime": [ "[MaxKey, MinKey]" ] }, "keysExamined": 10438, "dupsTested": 0, "dupsDropped": 0, "seenInvalidated": 0, "matchTested": 0 } } } } }}
ScoreSortIdx1 explain的执行结果:
{ "executionStats": { "executionSuccess": true, "nReturned": 1, "executionTimeMillis": 14, "totalKeysExamined": 10001, "totalDocsExamined": 10001, "executionStages": { "stage": "LIMIT", "nReturned": 1, "executionTimeMillisEstimate": 10, "works": 10002, "advanced": 1, "needTime": 10000, "needFetch": 0, "saveState": 78, "restoreState": 78, "isEOF": 1, "invalidates": 0, "limitAmount": 0, "inputStage": { "stage": "SKIP", "nReturned": 1, "executionTimeMillisEstimate": 10, "works": 10001, "advanced": 1, "needTime": 10000, "needFetch": 0, "saveState": 78, "restoreState": 78, "isEOF": 0, "invalidates": 0, "skipAmount": 0, "inputStage": { "stage": "FETCH", "nReturned": 10001, "executionTimeMillisEstimate": 10, "works": 10001, "advanced": 10001, "needTime": 0, "needFetch": 0, "saveState": 78, "restoreState": 78, "isEOF": 0, "invalidates": 0, "docsExamined": 10001, "alreadyHasObj": 0, "inputStage": { "stage": "IXSCAN", "nReturned": 10001, "executionTimeMillisEstimate": 10, "works": 10001, "advanced": 10001, "needTime": 0, "needFetch": 0, "saveState": 78, "restoreState": 78, "isEOF": 0, "invalidates": 0, "keyPattern": { "isRubbish": 1, "publishDate": -1, "blendedScore": -1, "publishTime": -1 }, "indexName": "ScoreSortIdx1", "isMultiKey": false, "direction": "forward", "indexBounds": { "isRubbish": [ "[0.0, 0.0]" ], "publishDate": [ "[20160318.0, 20160310.0]" ], "blendedScore": [ "[MaxKey, MinKey]" ], "publishTime": [ "[MaxKey, MinKey]" ] }, "keysExamined": 10001, "dupsTested": 0, "dupsDropped": 0, "seenInvalidated": 0, "matchTested": 0 } } } } }}
可以看到使用ScoreSortIdx1的查询时间比ScoreSortIdx不只快了一倍,当使用上面的条件进行count时,这种差距会更明显。
所以最终创建的索引:
db.infos.ensureIndex({isRubbish:1,publishDate:-1,blendedScore:-1,publishTime:-1},{name:"ScoreSortIdx",background:true}); //索引1db.infos.ensureIndex({isRubbish:1,similarNum:-1,publishTime:-1},{name:"HotSortIdx",background:true}); //索引2db.infos.ensureIndex({isRubbish:1,publishTime:-1},{name:"TimeSortIdx",background:true}); //索引3
发现mongodb的多列索引的第一个字段必须作为排序或查询字段,否则会执行全表扫描。所以在应用中所有的查询都保证必须使用isRubbish作为查询条件,即使不对isRubbish进行过滤,也会将查询条件加上{isRubbish:{$gte:0}}以保证查询都走索引。另外尽量丰富其他查询条件,减少命中的结果集数量。
后面读了一篇大神的文章:http://www.mongoing.com/eshu_explain3 mongo执行计划的详细讲解,得到一个建多列索引的原则:{精确匹配字段,排序字段,范围查询字段} 这样建起来的索引排序会更为高效(已做实际验证,的确是这样)。
由于该业务中存在{isRubbish:{$gte:0}}即范围查询的情况,所以最后将索引改成了下面的方式:
db.infos.ensureIndex({publishDate:-1,blendedScore:-1,publishTime:-1,isRubbish:1},{name:"ScoreSortIdx",background:true}); //索引1db.infos.ensureIndex({similarNum:-1,publishTime:-1,isRubbish:1},{name:"HotSortIdx",background:true}); //索引2db.infos.ensureIndex({publishTime:-1,isRubbish:1},{name:"TimeSortIdx",background:true}); //索引3
- mongodb调优那些事(二)-索引
- MongoDB - 索引 (二)使用
- mongodb调优那些事(一)-系统设置
- mongodb调优那些事(三)-副本集
- mongodb调优那些事(四)-遇到的坑
- mongodb调优那些事(一)-系统设置
- mongodb中的索引二
- mongodb(二):索引基础知识
- mongodb指南(十八) - developer zone - 索引(二)_id索引、复合索引、稀疏索引、数组索引、唯一索引
- 学习MongoDB 八: MongoDB索引(索引限制条件)(二)
- 学习MongoDB 八: MongoDB索引(索引限制条件)(二)
- 二、MongoDB的高级查询(聚合、游标、管道、索引)
- mongodb索引讲解与性能调优
- MongoDB总结(二):mongoDB的索引、备份和恢复、固定集合及数据库安全
- mongoDB---索引(转)
- MongoDB七(索引)
- mongoDB 索引(四)
- andoirdUI那些事(二)
- Github 项目收藏
- linux下CMYSQL编程1
- 机器学习算法汇总:人工神经网络、深度学习及其它
- HDOJ 2111 Saving HDU
- jxl操作实现导出读取excel
- mongodb调优那些事(二)-索引
- VMware下Ubuntu与宿主Windows共享文件夹
- C++虚继承(五) --- 虚拟继承的概念
- solrconfig.xml配置详解
- Android Studio使用技巧:导入第三方类库
- linux下c mysql编程函数总结2
- 一个月赚多少钱 才能在北京生活下去!
- Problem - 2111 Saving HDU
- Struts 2 和 JSF 的区别