Mongo 聚合框架-Aggregate(一)

来源：互联网发布：mac如何添加输入法编辑：程序博客网时间：2024/05/23 23:56

一概念

1、简介

　　使用聚合框架可以对集合中的文档进行变换和组合。可以用多个构件创建一个管道，用于对一连串的文档进行处理。构件有：筛选、投射、分组、排序、限制和跳过。
MongoDB的聚合管道将MongoDB文档在一个管道处理完毕后将结果传递给下一个管道处理，管道操纵是可以重复的。

2、管道表达式

　　管道操作符作为“键”,所对应的“值”叫做管道表达式。例如{＄match:{status:”A”}}，＄match称为管道操作符，而{status:”A”}称为管道表达式，它可以看作是管道操作符的操作数(Operand)，每个管道表达式是一个文档结构，它是由字段名、字段值、和一些表达式操作符组成的，例如管道表达式＄group: { _id: null, count: { ＄sum: 1 } } 就包含了一个表达式操作符＄sum进行累加求和。
每个管道表达式只能作用于处理当前正在处理的文档，而不能进行跨文档的操作。管道表达式对文档的处理都是在内存中进行的。除了能够进行累加计算的管道表达式外，其他的表达式都是无状态的，也就是不会保留上下文的信息。累加性质的表达式操作符通常和$group操作符一起使用，来统计该组内最大值、最小值等

3、aggregate语法

db.collection.aggregate(pipeline, options)

　pipeline：管道操作数组，也可以为文档类型，如果该参数不是数组形式，无法指定options参数
　options：aggregate命令的附加选项，文档类型。
　　1)explain：布尔值，指定返回结果是否显示该操作的执行计划
　　2)allowDiskUse：
　　　布尔值，指定该聚合操作是否使用磁盘。每个阶段管道限制为100MB的内存。如果一个节点管道超过这个极限,MongoDB将产生一个错误。为了能够在处理大型数据集,可以设置allowDiskUse为true来在聚合管道节点把数据写入临时文件。这样就可以解决100MB的内存的限制。
　　3)cursor：
　　4)maxTimeMS：
　　5)bypassDocumentValidation：
　　6)readConcern：
　　7)collation：

4、分片使用

　db.collection.aggregate()可以作用在分片集合，但结果不能输在分片集合，MapReduce可以作用在分片集合，结果也可以输在分片集合。

5、聚合中的游标

　 db.collection.aggregate()方法可以返回一个指针（cursor），数据放在内存中，直接操作。跟Mongo shell 一样指针操作。

二管道操作符

1、$match

　作用
　　＄match用于对文档集合进行筛选，在＄match中不能使用＄geoNear地理空间操作符及＄where表达式操作符。
　示例
　　获取分数大于70小于或等于90记录，然后将符合条件的记录送到下一阶段$group管道操作符进行处理。

db.articles.aggregate( [     { $match : { score : { $gt : 70, $lte : 90 } } },     { $group: { _id: null, count: { $sum: 1 } } }    ] );

2、$project

　作用
　　使用”$project”可以从子文档中提取字段，可以重命名字段，可以移除字段。
　示例1
　　获取article集合的title字段及author字段，将字段_id屏蔽

db.article.aggregate(    { $project : {        _id : 0 ,        title : 1 ,        author : 1    }});

　示例2
　　通过使用＄add给pageViews字段的值加10，然后将结果赋值给一个新的字段:doctoredPageViews

db.article.aggregate({     $project : {    title : 1,    doctoredPageViews : { $add:["$pageViews", 10] }}});

　　注:必须将$add计算表达式放到中括号里面

3、$group

　作用
　　＄group操作可以根据文档特定字段的不同值进行分组，＄group不能用流式工作方式对文档进行处理。在＄group阶段，内存的大小最大可以为100Mb，如果超过则会出错。当处理大量数据是，可以将参数allowDiskUse置为true，如此＄group操作符可以写入临时文件，
　格式
　　{ $group: { _id: ＜expression>,＜field1>: { ＜accumulator1> : ＜expression1> }, … } }

　　_id字段是强制性的，可以指定其为空来进行其他累加操作，accumulator操作符可以为：＄sum、＄avg、＄first、＄last、＄max、＄min、＄push、＄addToSet、＄stdDevPop、＄stdDevSamp
　示例1

{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-03-01T08:00:00Z") }{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-03-01T09:00:00Z") }{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-03-15T09:00:00Z") }{ "_id" : 4, "item" : "xyz", "price" : 5, "quantity" : 20, "date" : ISODate("2014-04-04T11:21:39.736Z") }{ "_id" : 5, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-04-04T21:23:13.331Z") }

　　根据日期的年月日进行分组统计总价格和平均数量及其数量

db.sales.aggregate(   [      {        $group : {           _id : { month: { $month: "$date" }, day: { $dayOfMonth: "$date" }, year: { $year: "$date" } },           totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } },           averageQuantity: { $avg: "$quantity" },           count: { $sum: 1 }        }      }   ])

　　结果如下

{ "_id" : { "month" : 3, "day" : 15, "year" : 2014 }, "totalPrice" : 50, "averageQuantity" : 10, "count" : 1 }{ "_id" : { "month" : 4, "day" : 4, "year" : 2014 }, "totalPrice" : 200, "averageQuantity" : 15, "count" : 2 }{ "_id" : { "month" : 3, "day" : 1, "year" : 2014 }, "totalPrice" : 40, "averageQuantity" : 1.5, "count" : 2 }

　示例2

{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }

　　统计每个作者的所有书籍

db.books.aggregate([     { $group : { _id : "$author", books: { $push: "$title" } } }   ])

　　结果如下

{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }

　　如果将＄title换为＄＄ROOT,则会将整个文档都push进数组之中，结果如下：

{  "_id" : "Homer",  "books" :     [       { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },       { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }     ]}{  "_id" : "Dante",  "books" :     [       { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },       { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },       { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }     ]}

　　注：$group的输出是无序的，且该操作目前是在内存中进行的，所以不能用它来对大量个数的文档进行分组。

4、$unwind

　作用
　　unwind分割数组嵌入到自己顶层文件,会将数组中的每一个值拆分为单独的文档,这样一个文档会被分为数组长度个文档。
　示例
　　将博客评论映射为cts并查找评论作者为Mark的文档

> db.blog.aggregate([　　{"project":{"cts":"$comments"}},　　　{"$unwind":"$cts"},　　{"$match":{"cts.author":"Mark"}　　]);

　　注：{＄unwind:”＄cts”})不能忘了＄符号，且如果＄unwind目标字段不存在的话，那么该文档将被忽略过滤掉。

5、$limit

　作用
　　＄limit会接受一个数字n，返回结果集中的前n个文档
　语法
　　{ $limit: ＜positive integer> }

　示例
　　查询5条文档记录

db.article.aggregate(    { $limit : 5 });

6、$skip

　作用
　　＄skip接受一个数字n，丢弃结果集中的前n个文档
　语法
　　｛$skip: ＜positive integer> }
　示例
　　获取article集合中第5条数据之后的数据

db.article.aggregate(    { $skip : 5 });

7、$sort

　作用
　　可以根据任何字段进行排序，”$sort”不能用流式工作方式对文档进行处理,必须要接收到所有文档之后才能进行排序。
　　当排序的字段BSON类型不一致时，按一下顺序进行排序

MinKey (internal type)Null<Numbers (ints, longs, doubles, decimals)Symbol,StringObjectArrayBinDataObjectIdBooleanDateTimestampRegular ExpressionMaxKey (internal type)

　　如果将＄sort放到管道前面或者放在＄project、＄unwind和＄group之前可以利用索引，提高效率，如果＄project＄unwind＄group操作发生在＄sort之前，＄sort无法使用任何索引。
　　MongoDB对内存做了优化，在管道中如果＄sort出现在＄limit之前的话，＄sort只会对前＄limit个文档进行操作，这样在内存中也只会保留前＄limit个文档，从而可以极大的节省内存。＄sort操作是在内存中进行的，如果其占有的内存超过物理内存的10%，程序会产生错误
　语法
　　{ ＄sort: { ＜field1>: ＜sort order>, ＜field2>: ＜sort order> … } }
　　sort order可以为：
　　　1 ：升序
　　　-1 ：降序
　　　{ ＄meta: “textScore” }
　示例
　　按照age降序,posts升序排序

db.users.aggregate([     { $sort : { age : -1, posts: 1 } }   ]

8、$lookup

　作用
　　$lookup可以进行两个集合之间左连接操作
　语法：

{   $lookup:     {       from: <collection to join>,       localField: <field from the input documents>,       foreignField: <field from the documents of the "from" collection>,       as: <output array field>     }}

　　from：需要连接的集合
　　localField：当前集合中关联的字段
　　foreignField：外部集合关联的字段
　　as：查询结果输出数组
　示例
　　orders集合

{ "_id" : 1, "item" : "abc", "price" : 12, "quantity" : 2 }{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1 }{ "_id" : 3  }

　　inventory集合

{ "_id" : 1, "sku" : "abc", description: "product 1", "instock" : 120 }{ "_id" : 2, "sku" : "def", description: "product 2", "instock" : 80 }{ "_id" : 3, "sku" : "ijk", description: "product 3", "instock" : 60 }{ "_id" : 4, "sku" : "jkl", description: "product 4", "instock" : 70 }{ "_id" : 5, "sku": null, description: "Incomplete" }{ "_id" : 6 }

　　查询订单及其商品存货详情

db.orders.aggregate([    {      $lookup:        {          from: "inventory",          localField: "item",          foreignField: "sku",          as: "inventory_docs"        }   }])

　　结果如下

{  "_id" : 1,   "item" : "abc",  "price" : 12,  "quantity" : 2,  "inventory_docs" : [    { "_id" : 1, "sku" : "abc", description: "product 1", "instock" : 120 }  ]}{  "_id" : 2,  "item" : "jkl",  "price" : 20,  "quantity" : 1,  "inventory_docs" : [    { "_id" : 4, "sku" : "jkl", "description" : "product 4", "instock" : 70 }  ]}{  "_id" : 3,  "inventory_docs" : [    { "_id" : 5, "sku" : null, "description" : "Incomplete" },    { "_id" : 6 }  ]}

　　注：如果localField是一个数组，需要先用＄unwind操作将其分割嵌入

9、$count

　作用
　　查询符合条件的文档数量
　语法
　　{ ＄count: ＜string> }
　示例

{ "_id" : 1, "subject" : "History", "score" : 88 }{ "_id" : 2, "subject" : "History", "score" : 92 }{ "_id" : 3, "subject" : "History", "score" : 97 }{ "_id" : 4, "subject" : "History", "score" : 71 }{ "_id" : 5, "subject" : "History", "score" : 79 }{ "_id" : 6, "subject" : "History", "score" : 83 }

　　查询分数大于80分的集合的数量

db.scores.aggregate(  [    {      $match: {        score: {          $gt: 80        }      }    },    {      $count: "passing_scores"    }  ])

　　结果如下

{ "passing_scores" : 4 }

10、$redact

　作用
　　根据字段所处的document结构的级别，对文档进行“修剪”，它通常和“判断语句if-else”结合使用即＄cond。
　　＄redact可选值有3个：
　　　1)＄＄DESCEND：包含当前document级别的所有fields。当前级别字段的内嵌文档将会被继续检测。
　　　2)＄＄PRUNE：不包含当前文档或者内嵌文档级别的所有字段，不会继续检测此级别的其他字段，即使这些字段的内嵌文档持有相同的访问级别。
　　　3)＄＄KEEP：包含当前文档或内嵌文档级别的所有字段，不再继续检测此级别的其他字段，即使这些字段的内嵌文档中持有不同的访问级别。
　示例

 {       _id: 1,       tags: [ "G", "STLW" ],       year: 2014,       subsections: [         {           subtitle: "Section 1",           tags: [ "SI", "G" ],         },         {           subtitle: "Section 2",           tags: [ "STLW" ],         },         {           subtitle: "Section 3",           tags: [ "TK" ],           content: {             tags: [ "G" ]           }         }       ]     }

　　根据条件修减文档：如果tags字段与数组[“STLW”,”G”]的交集大于0，则保留该tags字段同层的文档，否则排除

db.ract.aggregate([    {"$redact":        {"$cond":            {"if":                {"$gt":[                    {"$size":                        {"$setIntersection":["$tags",["STLW","G"]]}},0]},        then:"$$DESCEND","else":"$$PRUNE"}}}]);

　　结果如下

   {       "_id" : 1,       "tags" : [ "G", "STLW" ],       "year" : 2014,       "subsections" : [         {           "subtitle" : "Section 1",           "tags" : [ "SI", "G" ],           "content" : "Section 1"         },         {           "subtitle" : "Section 2",           "tags" : [ "STLW" ],           "content" : "Section 2"         }       ]     }

　　对于此文档最高级别的tags值为[“G”,”STLW”]，此级别值为＄＄DESCEND，即此tags同级别的其他字段将会包含；那么继续检测“subsections.tags”级别的所有文档（是个数组，则逐个检测），基本思路类似，如果此级别返回DECEND那么继续检测“subsections.tags.content.tags”是否符合访问规则，如果返回＄＄PRUNE，那么此tags所在的内嵌文档的所有字段将被排除。Section 3的tags不符合要求，即使与此tags同级别的contents.tags符合访问规则，还是会被排除。

阅读全文

2 0

Mongo 聚合框架-Aggregate(一)

一 概念

1、简介

2、管道表达式

3、aggregate语法

4、分片使用

5、聚合中的游标

二 管道操作符

1、$match

2、$project

3、$group

4、$unwind

5、$limit

6、$skip

7、$sort

8、$lookup

9、$count

10、$redact

一概念

二管道操作符