Elasticsearch Core Basics: Aggregations


 

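1 Types of aggregations

1.1 Bucket aggregations

A bucket aggregation groups documents by a chosen field: all documents that share the same value of that field are placed in the same bucket.

GET /<index>/<type>/_search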

{

   "size" : 0,

   "aggs" : {

       "<bucket_aggregation_name>" : {

           "terms" : {

             "field" : "<field_to_group_by>"

            }

        }

    }

}
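To make the bucketing rule concrete, here is a minimal Python sketch of what a terms-style bucket aggregation computes. It runs locally on a list of dictionaries and does not call Elasticsearch; the function name `terms_buckets` and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict

def terms_buckets(docs, field):
    """Group documents by the value of `field`, one bucket per distinct value."""
    buckets = defaultdict(list)
    for doc in docs:
        if field in doc:  # documents without the field join no bucket
            buckets[doc[field]].append(doc)
    return dict(buckets)

# Hypothetical documents, not taken from the article's index.
docs = [
    {"color": "red", "price": 1000},
    {"color": "black", "price": 3000},
    {"color": "red", "price": 2000},
]
buckets = terms_buckets(docs, "color")
# doc_count per bucket, as reported in an aggregation response
doc_counts = {key: len(members) for key, members in buckets.items()}
```

Each bucket keeps the documents that share one field value, and `doc_counts` corresponds to the `doc_count` reported for each bucket.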

1.2 Metric aggregations

A metric aggregation computes a statistic over the documents in each bucket, for example a sum, an average, a maximum or a minimum.

       "aggs": {

           "<metric_aggregation_name>": {

              "<metric_type, e.g. avg or max>": {

                 "field": "<field_to_measure>"

              }

            }

         }
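As a rough illustration of a metric nested under a bucket aggregation, the following Python sketch averages one field per group. It is a local simulation only; `avg_per_bucket` and the sample data are hypothetical names chosen for this example.

```python
from statistics import mean

def avg_per_bucket(docs, group_field, metric_field):
    """Bucket by `group_field`, then average `metric_field` inside each bucket."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc[group_field], []).append(doc[metric_field])
    return {key: mean(values) for key, values in groups.items()}

# Hypothetical documents, not taken from the article's index.
docs = [
    {"color": "red", "price": 1000},
    {"color": "black", "price": 3000},
    {"color": "red", "price": 2000},
]
avg_price = avg_per_bucket(docs, "color", "price")
```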

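1.3 Pipeline aggregations

The input of this kind of aggregation is the output of other aggregations rather than the documents themselves; it computes further metrics from those results.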
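The idea can be sketched in a few lines of Python: the function below receives the per-bucket results of an earlier aggregation and averages them, which is roughly what Elasticsearch's `avg_bucket` pipeline aggregation does. The monthly figures are invented for illustration, and the code never touches a cluster.

```python
def avg_bucket(bucket_metrics):
    """Average a metric across buckets; the input is aggregation output, not documents."""
    values = list(bucket_metrics.values())
    return sum(values) / len(values)

# Hypothetical output of a date histogram with a sum metric per month.
monthly_sales = {"2016-01": 550.0, "2016-02": 60.0, "2016-03": 375.0}
avg_monthly_sales = avg_bucket(monthly_sales)
```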

 

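2 Bucket aggregations

Bucket aggregations create groups of documents based on a specified field.

2.1 Children aggregation

The children aggregation is a special single-bucket aggregation that turns buckets of parent-type documents into buckets of child-type documents. It relies on the _parent field in the mapping and has a single option: type, which names the child type that the parent-space buckets should be mapped to.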

PUT child_example

{

    "mappings": {

        "answer" : {

            "_parent" : {

                "type" :"question"

            }

        }

    }

}

 

PUT child_example/question/1

{

    "body":"<p>I have Windows 2003 server and i bought a new Windows 2008 server...",

    "title": "Whats the best way to file transfer my site from server to a newer one?",

    "tags": [

       "windows-server-2003",

       "windows-server-2008",

        "file-transfer"

    ]

}

 

PUT child_example/answer/1?parent=1&refresh

{

    "owner": {

        "location":"Norfolk, United Kingdom",

        "display_name":"Sam",

        "id": 48

    },

    "body":"<p>Unfortunately you're pretty much limited to FTP...",

    "creation_date":"2009-05-04T13:45:37.030"

}

PUT child_example/answer/2?parent=1&refresh

{

    "owner": {

        "location":"Norfolk, United Kingdom",

        "display_name":"Troll",

        "id": 49

    },

    "body":"<p>Use Linux...",

    "creation_date":"2009-05-05T13:45:37.030"

}

 

POST child_example/_search?size=0

{

    "aggs":{

        "top-tags":{

            "terms":{

               "field":"tags.keyword",

                 "size": 10

            },

            "aggs": {

               "to-answers": {

                   "children":{

                       "type":"answer"

                    },

                   "aggs":{

                       "top-names":{

                           "terms":{

                               "field":"owner.display_name.keyword",

                                "size": 10

                            }

                        }

                    }

                }

            }

        }

    }

}                         

2.2 Histogram aggregation

The histogram aggregation buckets documents by fixed-width value ranges of a numeric field. For example, prices can be grouped into 0~1000, 1000~2000, 2000~3000, and so on.

"histogram":{

      "field":"<field_to_group_by>",

      "interval":<interval>

}

POST /ecommerce/music/_search

{

   "size":0,

   "aggs":{

     "prices":{

          "histogram":{

               "field":"price",

               "interval":1000

           }

        }

   }

}

The prices end up split into five buckets keyed 0, 1000, 2000, 3000 and 4000; each key is the lower bound of a range that is 1000 wide:

 "aggregations":{

     "prices": {

        "buckets": [

            {

              "key": 0,

              "doc_count": 5

           },

            {

              "key": 1000,

              "doc_count": 2

           },

            {

              "key": 2000,

              "doc_count": 0

           },

            {

              "key": 3000,

              "doc_count": 2

           },

            {

              "key": 4000,

              "doc_count": 2

            }

         ]

      }

   }
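The bucket keys above follow from rounding each value down to a multiple of the interval. The Python sketch below reproduces that rule locally, including the empty bucket between the lowest and highest key. The price list is hypothetical, chosen only so that the counts match the response shown above, and offsets are ignored.

```python
import math
from collections import Counter

def histogram(values, interval):
    """Return (key, doc_count) pairs; key = floor(value / interval) * interval."""
    counts = Counter(math.floor(v / interval) * interval for v in values)
    if not counts:
        return []
    lowest, highest = min(counts), max(counts)
    # Empty buckets between the lowest and highest key are kept with a count of 0.
    return [(key, counts.get(key, 0))
            for key in range(lowest, highest + interval, interval)]

# Hypothetical prices that reproduce the counts in the response above.
prices = [100, 200, 300, 500, 999, 1000, 1500, 3000, 3999, 4000, 4500]
buckets = histogram(prices, 1000)
```

Note that 999 falls into the bucket keyed 0 while 1000 starts the next one: the lower bound is inclusive and the upper bound exclusive.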

2.3 Date histogram aggregation

POST /ecommerce/music/_search

{

   "size" : 0,

   "aggs": {

      "sales": {

         "date_histogram": {

            "field":"c_date",

            "interval":"month",

            "format":"yyyy-MM-dd",

            "min_doc_count" : 0,

            "extended_bounds" : {

                "min" :"2016-01-01",

                "max" :"2017-01-01"

            }

         }

      }

   }

}

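aggs.sales: the name of the aggregation

aggs.sales.date_histogram: the bucketing strategy to use

aggs.sales.date_histogram.field: the field used for bucketing

aggs.sales.date_histogram.format: the date format applied to the bucket keys

aggs.sales.date_histogram.interval: the bucket interval; supported keywords are `year`, `quarter`, `month`, `week`, `day`, `hour`, `minute`, `second`

aggs.sales.date_histogram.min_doc_count: 0 means that buckets without any documents are still returned, with a document count of 0

aggs.sales.date_histogram.extended_bounds.min: the lower time bound the buckets must extend to

aggs.sales.date_histogram.extended_bounds.max: the upper time bound the buckets must extend to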

"aggregations":{

    "sales": {

        "buckets": [

            {

                "key_as_string":"2016-01-01",

                "key": 1451606400000,

                "doc_count": 1

            },

            ......

        ]

    }

}

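aggregations.sales: the name of the aggregation

aggregations.sales.buckets: the list of buckets

aggregations.sales.buckets.key_as_string: the bucket key formatted as a date string

aggregations.sales.buckets.key: the bucket key as a timestamp in epoch milliseconds

aggregations.sales.buckets.doc_count: the number of documents in the bucket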
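To show how monthly bucketing interacts with min_doc_count 0 and extended_bounds, here is a small local Python sketch. It is an approximation under simple assumptions (calendar months, no time zones); the helper names and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date

def month_start(d):
    return d.replace(day=1)

def next_month(d):
    # December rolls over to January of the following year.
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)

def monthly_histogram(dates, bounds_min, bounds_max):
    """One bucket per calendar month; empty months inside the bounds get count 0."""
    counts = Counter(month_start(d) for d in dates)
    first = min([month_start(bounds_min), *counts])
    last = max([month_start(bounds_max), *counts])
    buckets = []
    current = first
    while current <= last:
        buckets.append((current.isoformat(), counts[current]))
        current = next_month(current)
    return buckets

# Hypothetical creation dates.
dates = [date(2016, 1, 15), date(2016, 3, 2), date(2016, 3, 20)]
buckets = monthly_histogram(dates, date(2016, 1, 1), date(2016, 4, 1))
```

The bounds only extend the range of buckets; they do not filter out documents that fall outside them.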

2.4 Date range aggregation

POST /ecommerce/music/_search

{

    "size":0,

    "aggs":{

        "music_range":{

            "date_range":{

               "field":"c_date",

               "format":"yyyy-MM-dd",

                "ranges":[

                   {"to":"2015-12-31"},

                   {"from":"2016-01-01","to":"2016-12-31"},

                   {"from":"2017-01-01"}

                ]

            }

        }

    }

}

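aggs.music_range: the name of this date range aggregation

aggs.music_range.date_range: selects the date range bucketing strategy

aggs.music_range.date_range.field: the field to aggregate on

aggs.music_range.date_range.format: the date format used for the string values

aggs.music_range.date_range.ranges: an array of time ranges; each entry can be {"to"}, {"from","to"} or {"from"}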

 

"aggregations":{

  "music_range": {

     "buckets": [

        {

           "key":"*-2015-12-31",

           "to": 1451520000000,

           "to_as_string":"2015-12-31",

           "doc_count": 4

        },

        {

           "key":"2016-01-01-2016-12-31",

           "from": 1451606400000,

           "from_as_string":"2016-01-01",

           "to": 1483142400000,

           "to_as_string":"2016-12-31",

           "doc_count": 3

        },

        {

           "key":"2017-01-01-*",

           "from": 1483228800000,

           "from_as_string":"2017-01-01",

           "doc_count": 4

        }

     ]

  }

}
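The way documents are assigned to ranges can be sketched in Python as follows. In Elasticsearch range aggregations the from value is inclusive and the to value is exclusive, and the sketch follows that rule; the function name, key format and sample dates are assumptions for illustration.

```python
from datetime import date

def date_range_buckets(dates, ranges):
    """Count dates per range; `from` is inclusive, `to` is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(
            1 for d in dates
            if (lo is None or d >= lo) and (hi is None or d < hi)
        )
        key = f"{lo.isoformat() if lo else '*'}-{hi.isoformat() if hi else '*'}"
        buckets.append({"key": key, "doc_count": count})
    return buckets

# Hypothetical dates and contiguous ranges.
dates = [date(2015, 6, 1), date(2016, 1, 1), date(2016, 12, 31), date(2017, 1, 1)]
ranges = [
    {"to": date(2016, 1, 1)},
    {"from": date(2016, 1, 1), "to": date(2017, 1, 1)},
    {"from": date(2017, 1, 1)},
]
buckets = date_range_buckets(dates, ranges)
```

Because the upper bound is exclusive, ranges that should leave no gaps reuse the previous to value as the next from value.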

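2.5 Filter aggregation and filters aggregation

As the name implies, a filter aggregation restricts the aggregation to the documents that match a given filter. The filters aggregation defines several named filters and returns one bucket per filter.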

POST /ecommerce/music/_search

{

    "size":0,

    "aggs":{

        "music_filter":{

            "filter":{

                "term":{"color": "红"}

            }

        }

    }

}

 

POST /ecommerce/music/_search

{

   "size":0,

   "aggs":{

       "color_filter":{ "filters":{

                 "filters":{

                      "吉他":{"match":{"desc":"吉他"}},

                      "贝斯":{"match":{"desc":"贝斯"}},

                      "古筝":{"match":{"desc":"古筝"}},

                      "电子琴":{"match":{"desc":"电子琴"}}

                 }}

        }

    }

}

 

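2.6 Global aggregation

When an aggregation runs together with a query, it only covers the documents matched by that query. What if we also want an aggregation over all the data?

The global aggregation solves this: in a single request we can aggregate over the search results and over the whole data set, so the response contains two sets of results: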

POST /ecommerce/music/_search

{

   "size":0,

   "query":{

       "match":{"desc":"吉他"}

    },

   "aggs":{

      "origin_aggs":{

          "terms":{"field":"origin.keyword"},

          "aggs":{

              "avg_price":{

                  "avg":{"field":"price"}

              }

           }

       },

      "all":{

          "global":{},

          "aggs":{

              "origin_avg_price":{

                   "terms":{"field":"origin.keyword"},

                   "aggs":{

                       "avg_price":{

                          "avg":{"field":"price"}

                       }

                  }

              }

           }

       }

    }

}

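aggs.all.global: an empty object; the global aggregation takes no parameters

aggs.all.aggs: the sub-aggregations registered under the global aggregation; they run against all documents regardless of the query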
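The difference between the two result sets can be illustrated with a local Python sketch: the same grouping function is applied once to the query hits and once to all documents. The substring test stands in for the match query, and the documents and English descriptions are hypothetical.

```python
from statistics import mean

def avg_price_by_origin(docs):
    """terms on origin plus avg on price, over whatever documents are passed in."""
    prices = {}
    for doc in docs:
        prices.setdefault(doc["origin"], []).append(doc["price"])
    return {origin: mean(values) for origin, values in prices.items()}

# Hypothetical documents, not taken from the article's index.
docs = [
    {"desc": "electric guitar", "origin": "USA", "price": 3000},
    {"desc": "folk guitar", "origin": "China", "price": 1000},
    {"desc": "bass", "origin": "China", "price": 2000},
]

hits = [doc for doc in docs if "guitar" in doc["desc"]]  # stands in for the match query
origin_aggs = avg_price_by_origin(hits)   # scoped to the query results
all_origins = avg_price_by_origin(docs)   # what the global aggregation sees
```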

2.7 Histogram aggregation with ordering and extended bounds

POST /ecommerce/music/_search

{

   "size":0,

   "aggs":{

       "prices":{

           "histogram":{

               "field":"price",

               "interval":1000,

               "order":{"_count":"asc"},

               "extended_bounds":{

                   "min":2000,

                   "max":4000

               }

            }

        }

    }

}

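aggs.prices: the name of the histogram aggregation

aggs.prices.histogram: selects the histogram bucketing strategy

aggs.prices.histogram.field: the field to aggregate on

aggs.prices.histogram.interval: the width of each bucket

aggs.prices.histogram.extended_bounds: the lower and upper bounds that the buckets must cover, even where no documents exist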

 

2.8 Range aggregation

{

   "aggs" : {

       "price_ranges" : {

           "range" : {

               "field" : "price",

               "ranges" : [

                   { "to" : 50 },

                   { "from" : 50, "to" : 100 },

                   { "from" : 100 }

               ]

            }

        }

    }

}

2.9 Terms aggregation

The terms aggregation buckets documents by the values of a field that is not analyzed, such as a keyword field.

POST /ecommerce/music/_search

{

   "size":0,

   "aggs":{

       "origin_aggs":{

           "terms":{"field":"origin.keyword","size":2}

        }

    }

}

Here size limits how many buckets are returned: only the 2 buckets with the most documents.
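A rough local equivalent of the size parameter, sketched in Python: count the values, keep the largest N buckets, and report how many documents fell into the buckets that were cut off, which is what the sum_other_doc_count field of a terms response describes. The origin values are invented for illustration.

```python
from collections import Counter

def top_terms(values, size):
    """Return the `size` largest buckets and the document count of the rest."""
    counts = Counter(values)
    top = counts.most_common(size)
    other = sum(counts.values()) - sum(count for _, count in top)
    return top, other

# Hypothetical origin values of six documents.
origins = ["China", "China", "China", "USA", "USA", "Japan"]
buckets, sum_other_doc_count = top_terms(origins, 2)
```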

 

3 Metric aggregations

3.1 Average (avg), sum, minimum (min) and maximum (max)

POST /ecommerce/music/_search

{

   "size":0,

   "query":{

       "match_all":{}

    },

   "aggs":{

       "prices_avg":{

           "avg":{"field":"price"}

        },

        "prices_sum":{

           "sum":{"field":"price"}

        },

       "prices_min":{

           "min":{"field":"price"}

        },

       "prices_max":{

            "max":{"field":"price"}

        }

    }

}

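3.2 Distinct count (cardinality metric)

The cardinality metric deduplicates the values of the given field in each bucket and returns the number of distinct values, similar to COUNT(DISTINCT ...) in SQL.

The example below buckets the documents by year and counts how many different colors each bucket contains.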

POST /ecommerce/music/_search

{

   "aggs":{

       "years":{

           "date_histogram":{

               "field":"c_date",

               "interval":"year",

               "format":"yyyy"

           },

           "aggs":{

               "distinct_colors":{

                   "cardinality":{

                        "field" :"color.keyword"

                   }

               }

            }

        }

    }

}

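aggs.years: the name of the bucket aggregation

aggs.years.date_histogram: the bucketing strategy

aggs.years.date_histogram.field: the field to bucket on

aggs.years.date_histogram.interval: the time interval

aggs.years.date_histogram.format: the date format of the bucket keys

aggs.years.aggs.distinct_colors: the name of the metric aggregation computed in each bucket

aggs.years.aggs.distinct_colors.cardinality: the metric strategy

aggs.years.aggs.distinct_colors.cardinality.field: the field whose distinct values are counted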
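As a plain-Python picture of this request, the sketch below counts distinct colors per year with a set. It is exact, whereas Elasticsearch computes cardinality approximately (using a HyperLogLog++ based algorithm), so results on large data sets can differ slightly. The documents are hypothetical and use a precomputed `year` field instead of a date histogram.

```python
def distinct_per_bucket(docs, bucket_field, value_field):
    """Count the distinct values of `value_field` inside each bucket."""
    seen = {}
    for doc in docs:
        seen.setdefault(doc[bucket_field], set()).add(doc[value_field])
    return {key: len(values) for key, values in seen.items()}

# Hypothetical documents with the year already extracted.
docs = [
    {"year": 2016, "color": "red"},
    {"year": 2016, "color": "red"},
    {"year": 2016, "color": "black"},
    {"year": 2017, "color": "white"},
]
distinct_colors = distinct_per_bucket(docs, "year", "color")
```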

 

 

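3.3 Percentiles (percentiles metric)

The percentiles metric reports the value below which a given percentage of the observations fall. The example below returns the 50th, 80th and 95th percentile of sales for each brand, together with the average: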

POST /ecommerce/music/_search

{

    "size":0,

    "aggs":{

        "brands":{

           "terms":{"field":"brand.keyword"},

            "aggs":{

               "sales_percentiles":{

                   "percentiles":{

                       "field":"sales",

                       "percents":[50,80,95]

                    }

                },

               "sales_avg":{

                    "avg":{

                       "field":"sales"

                    }

                }

            }

        }

    }

}
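For intuition, a percentile can be computed locally with the nearest-rank definition, as in the Python sketch below. This is only one of several common definitions, and Elasticsearch approximates percentiles (with the TDigest algorithm by default), so its numbers may differ. The sales figures are invented.

```python
import math

def percentile(values, percent):
    """Nearest-rank percentile: smallest value with at least `percent`% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical sales figures for one brand.
sales = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
sales_percentiles = {p: percentile(sales, p) for p in (50, 80, 95)}
```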

3.4 Percentile Ranks Aggregation

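The percentile ranks aggregation works the other way round: for each given value it reports roughly what percentage of the observations are at or below that value. The first example computes, for each brand, the percentile ranks of the sales values 1500 and 2500; the second computes the ranks of 100 and 300 over all documents.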

POST /ecommerce/music/_search

{

   "size":0,

   "aggs":{

       "brands":{

           "terms":{"field":"brand.keyword"},

           "aggs":{

               "sales_percentiles":{

                   "percentile_ranks":{

                       "field":"sales",

                       "values":[1500,2500]

                   }

               }

            }

        }

    }

}

 

POST /ecommerce/music/_search

{

   "size":0,

   "aggs":{

       "sales_percentiles":{

           "percentile_ranks":{

               "field":"sales",

               "values":[100,300]

            }

        }

    }

}
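A minimal local version of a percentile rank, sketched in Python: the share of observations at or below a given value. Elasticsearch computes this approximately, so treat the exact counting below as an illustration only; the sales figures are hypothetical.

```python
def percentile_rank(values, threshold):
    """Percentage of observations that are less than or equal to `threshold`."""
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100 * at_or_below / len(values)

# Hypothetical sales figures.
sales = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
ranks = {value: percentile_rank(sales, value) for value in (100, 300)}
```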

3.5 Count, min, max, sum and average in one request (stats aggregation)

POST /ecommerce/music/_search

{

    "size":0,

    "aggs":{

        "stats_aggs":{

           "stats":{"field":"price"}

        }

    }

}

3.6 Min, max, sum, average, standard deviation, variance and more (extended stats aggregation)

POST /ecommerce/music/_search

{

    "size":0,

    "aggs":{

        "stats_aggs":{

           "extended_stats":{"field":"price"}

        }

    }

}
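The statistics returned by these two aggregations can be reproduced locally with Python's standard library, as sketched below. The sketch assumes population variance and standard deviation, and the key names mirror the response fields only loosely; the input numbers are hypothetical.

```python
from statistics import mean, pstdev, pvariance

def extended_stats(values):
    """Summary statistics over a list of numbers (population variance assumed)."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
        "sum": sum(values),
        "variance": pvariance(values),
        "std_deviation": pstdev(values),
    }

# Hypothetical prices.
stats = extended_stats([2, 4, 4, 4, 5, 5, 7, 9])
```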

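3.7 Top N documents per bucket (top hits aggregation)

The top hits aggregation returns the first N documents of each bucket. A terms aggregation can also be limited to N entries, but it cannot sort the documents or restrict which fields are returned.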

POST /ecommerce/music/_search

{

    "size":0,

    "aggs":{

        "date_aggs":{

           "date_histogram":{

               "field":"c_date",

               "interval":"year",

               "format":"yyyy"

            },

            "aggs":{

               "top_tag_hits":{

                   "top_hits":{

                        "sort":[

                            {"review":"desc"}

                         ],

                        "_source": {

                           "includes": [

                               "brand","color","review"

                            ]

                        },

                       "size" : 2

                    }

                }

            }

        }

    }

}      
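What this request asks for can be sketched locally in Python: group the documents, sort each group by a field in descending order, keep the first N, and return only the selected fields. The function name, the precomputed `year` field and the documents are assumptions for illustration.

```python
def top_hits(docs, group_field, sort_field, size, includes):
    """Per group: sort by `sort_field` descending, keep `size` docs, project `includes`."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc[group_field], []).append(doc)
    result = {}
    for key, members in groups.items():
        ranked = sorted(members, key=lambda d: d[sort_field], reverse=True)
        result[key] = [{f: d[f] for f in includes} for d in ranked[:size]]
    return result

# Hypothetical documents with the year already extracted.
docs = [
    {"year": 2016, "brand": "A", "color": "red", "review": 5, "price": 100},
    {"year": 2016, "brand": "B", "color": "black", "review": 9, "price": 200},
    {"year": 2016, "brand": "C", "color": "white", "review": 7, "price": 300},
    {"year": 2017, "brand": "A", "color": "red", "review": 3, "price": 100},
]
hits_per_year = top_hits(docs, "year", "review", 2, ["brand", "color", "review"])
```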

 

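3.8 Count the values of a field (value_count)

For example, count how many color values were indexed in each year. Unlike cardinality, value_count does not deduplicate, so a color that appears several times is counted each time.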

POST /ecommerce/music/_search

{

    "size":0,

    "aggs":{

        "date_aggs":{

           "date_histogram":{

               "field":"c_date",

               "interval":"year",

               "format":"yyyy"

            },

            "aggs" : {

               "colors_count" : { "value_count" : {"field" : "color.keyword" } }

            }  

        }

    }

}

 

4 Understanding the metadata in an aggregation response

Let's look at what an aggregation request returns.

"hits": {

     "total": 11,

     "max_score": 0,

     "hits": []

}

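hits: information about the matched documents

hits.total: the number of documents matched

hits.hits: the matched documents themselves; if size=0 is set on the aggregation request, they are not returned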

"aggregations": {

     "group_colors": {

        "doc_count_error_upper_bound": 0,

        "sum_other_doc_count": 0,

        "buckets": [

            {

              "key": "黑",

              "doc_count": 5,

              "avg_metric": {

                 "value": 1379.6

              }

            }     

        ]

      }

}            

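aggregations: the aggregation results

aggregations.group_colors: the name given to this aggregation

aggregations.group_colors.buckets: the buckets produced by the bucket aggregation

aggregations.group_colors.buckets.key: the key (name) of the bucket

aggregations.group_colors.buckets.doc_count: the number of documents in the bucket

aggregations.group_colors.buckets.avg_metric: the name of the metric aggregation nested under the bucket aggregation

aggregations.group_colors.buckets.avg_metric.value: the result of the metric aggregation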
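Reading such a response in client code amounts to walking nested dictionaries. The Python sketch below hard-codes the response shown above as a dictionary (no request is sent) and flattens each bucket into a row; the variable names are arbitrary.

```python
# The response shown above, written as a Python dictionary.
response = {
    "hits": {"total": 11, "max_score": 0, "hits": []},
    "aggregations": {
        "group_colors": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "黑", "doc_count": 5, "avg_metric": {"value": 1379.6}},
            ],
        }
    },
}

# One row per bucket: (bucket key, document count, metric value).
rows = [
    (bucket["key"], bucket["doc_count"], bucket["avg_metric"]["value"])
    for bucket in response["aggregations"]["group_colors"]["buckets"]
]
```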

 

If only the aggregation results are needed, set size=0.

 

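5 Multi-level drill-down aggregations

For example, first bucket the documents by time (year), then bucket each year by brand, and finally bucket each brand by price.

POST /ecommerce/music/_search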

{

    "size":0,

    "aggs":{

        "date_aggs":{

            "date_histogram":{

               "field":"c_date",

               "interval":"year",

               "format":"yyyy"

            },

            "aggs":{

                "brands":{

                    "terms":{"field":"brand.keyword"},

                    "aggs":{

                        "prices":{

                           "histogram":{"field":"price","interval":2000}

                        }

                    }

                }

            }

        }

    }

}
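The nesting of the three levels can be pictured with a local Python sketch that builds a year, brand and price-range tree of document counts. It assumes each document already carries a `year` field, uses integer prices, and relies on made-up brand names; it does not query Elasticsearch.

```python
from collections import defaultdict

def drill_down(docs, price_interval):
    """Count documents per year, then per brand, then per price bucket."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for doc in docs:
        # Same rule as the histogram aggregation: round down to a multiple of the interval.
        price_key = doc["price"] // price_interval * price_interval
        tree[doc["year"]][doc["brand"]][price_key] += 1
    return {
        year: {brand: dict(prices) for brand, prices in brands.items()}
        for year, brands in tree.items()
    }

# Hypothetical documents with the year already extracted.
docs = [
    {"year": 2016, "brand": "Yamaha", "price": 1500},
    {"year": 2016, "brand": "Yamaha", "price": 2500},
    {"year": 2016, "brand": "Fender", "price": 4100},
    {"year": 2017, "brand": "Yamaha", "price": 500},
]
tree = drill_down(docs, 2000)
```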

 
