Elasticsearch Notes: Aggregations
Source: Internet | Editor: 程序博客网 | Date: 2024/06/05 14:55
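This post covers aggregations in Elasticsearch (ES). Aggregations run statistical analysis over your data. Their role is similar to GROUP BY in SQL, but they are more flexible and more powerful.
Before we start, let's load some data. The article type in the posts index currently contains the following documents: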
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 7,
    "max_score" : 1.0,
    "hits" : [
      { "_index" : "posts", "_type" : "article", "_id" : "5", "_score" : 1.0,
        "_source" : { "id" : 5, "name" : "生活日志", "author" : "wthfeng", "date" : "2015-09-21", "contents" : "这是日常生活的记录", "readNum" : 100 } },
      { "_index" : "posts", "_type" : "article", "_id" : "8", "_score" : 1.0,
        "_source" : { "name" : "ES笔记2", "author" : "hefeng", "contents" : "ES 的 search ", "date" : "2016-10-23", "readNum" : 40 } },
      { "_index" : "posts", "_type" : "article", "_id" : "2", "_score" : 1.0,
        "_source" : { "id" : 2, "name" : "更新后的文档", "author" : "wthfeng", "date" : "2016-10-23", "contents" : "这是我的javascript学习笔记", "brief" : "简介,这是新加的字段", "readNum" : 200 } },
      { "_index" : "posts", "_type" : "article", "_id" : "4", "_score" : 1.0,
        "_source" : { "id" : 4, "name" : "javascript指南", "author" : "wthfeng", "date" : "2016-09-21", "contents" : "js的权威指南", "readNum" : 200 } },
      { "_index" : "posts", "_type" : "article", "_id" : "6", "_score" : 1.0,
        "_source" : { "id" : "6", "name" : "java笔记1", "author" : "hefeng", "contents" : "java String info", "date" : "2016-10-21", "readNum" : 12 } },
      { "_index" : "posts", "_type" : "article", "_id" : "1", "_score" : 1.0,
        "_source" : { "id" : 1, "name" : "ES更新过的文档", "author" : "wthfeng", "date" : "2016-10-25", "contents" : "这是更新内容", "readNum" : 200 } },
      { "_index" : "posts", "_type" : "article", "_id" : "7", "_score" : 1.0,
        "_source" : { "id" : "7", "name" : "ES笔记1", "author" : "hefeng", "contents" : "ES search", "date" : "2016-09-21", "readNum" : 100 } }
    ]
  }
}
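We have 7 documents. Every operation below runs against this data.
Aggregation structure
Aggregations are a search operation on the same level as query and sort, and are specified with the aggs key. The general shape is: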
{
  "query": {},
  "aggs": {}
}
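To make that shape concrete, here is a minimal Python sketch that assembles a request body as a plain dictionary and serializes it to JSON. The helper name build_search_body and its parameters are made up for this illustration; the code only builds the body and does not contact a cluster.

```python
import json

def build_search_body(aggs, query=None):
    """Assemble a search body; query and aggs sit side by side at the top level."""
    body = {"aggs": aggs}
    if query is not None:
        body["query"] = query
    return body

# A stats aggregation over readNum, combined with a match_all query.
body = build_search_body(
    aggs={"readNum_stats": {"stats": {"field": "readNum"}}},
    query={"match_all": {}},
)
print(json.dumps(body, indent=2))
```

The printed JSON could be saved to a file such as search.json and sent as the request body.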
Let's start with an example:
GET /posts/article/_search?pretty&search_type=count -d @search.json
{
  "aggs": {
    "readNum_stats": {
      "stats": { "field": "readNum" }
    }
  }
}
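search_type=count suppresses the individual hits, so only the total hit count and the aggregation results come back. (search_type=count was deprecated in Elasticsearch 2.0 and removed in 5.0; on newer versions, put "size": 0 in the request body instead.) In the request, stats asks for the count, minimum, maximum, average, and sum of a field. readNum_stats is a name we chose ourselves; the result is returned under that name. The response: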
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 7, "max_score" : 0.0, "hits" : [ ] },
  "aggregations" : {
    "readNum_stats" : {
      "count" : 7,
      "min" : 12.0,
      "max" : 200.0,
      "avg" : 121.71428571428571,
      "sum" : 852.0
    }
  }
}
The aggregation results come back under aggregations: the minimum, maximum, average, sum, and count of the readNum field have all been computed.
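As a sanity check on those figures, the following Python sketch recomputes the same five statistics locally from the readNum values of the 7 sample documents. It is only an illustration of what the stats aggregation reports, not how Elasticsearch computes it internally; the function name stats is chosen here for convenience.

```python
# readNum values of the 7 sample documents, in the order they appear in the hits
read_nums = [100, 40, 200, 200, 12, 200, 100]

def stats(values):
    """Return the five figures reported by a stats aggregation."""
    total = sum(values)
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / len(values),
        "sum": float(total),
    }

readnum_stats = stats(read_nums)
print(readnum_stats)
```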
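Aggregation types
There are two main kinds of aggregation: metrics aggregations and bucket aggregations. The example above is a metrics aggregation, which computes statistics over a field (minimum, maximum, average, and so on). A bucket aggregation instead splits documents into groups by some criterion, similar to GROUP BY in SQL. We cover each in turn.
Metrics aggregations
Metrics aggregations work like sum, avg, min, and max in SQL: they produce one or more statistics. Usage is as follows:
1. min, max, avg, and sum aggregations
Each returns the corresponding statistic for the given field. Note that the field must be numeric.
① Find the lowest read count among the documents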
GET /posts/article/_search?pretty&search_type=count -d @search.json
{
  "aggs": {
    "minReadNum": {
      "min": { "field": "readNum" }
    }
  }
}
Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 7, "max_score" : 0.0, "hits" : [ ] },
  "aggregations" : {
    "minReadNum" : { "value" : 12.0 }
  }
}
② Find the total read count
GET /posts/article/_search?pretty&search_type=count -d @search.json
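{
  "aggs": {
    "sum_ReadNum": {
      "sum": { "field": "readNum" }
    }
  }
}
Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 7, "max_score" : 0.0, "hits" : [ ] },
  "aggregations" : {
    "sum_ReadNum" : { "value" : 852.0 }
  }
}
These are all straightforward, so we won't list every one. There is also a metrics aggregation that returns these values together: the stats aggregation demonstrated in the previous section.
2. stats and extended_stats aggregations
The stats aggregation returns the count, minimum, maximum, average, and sum of the given field. extended_stats extends stats with the sum of squares, variance, standard deviation, and related statistics.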
GET /posts/article/_search?pretty&search_type=count -d @search.json
{
  "aggs": {
    "stats_of_readNum": {
      "extended_stats": { "field": "readNum" }
    }
  }
}
Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 7, "max_score" : 0.0, "hits" : [ ] },
  "aggregations" : {
    "stats_of_readNum" : {
      "count" : 7,
      "min" : 12.0,
      "max" : 200.0,
      "avg" : 121.71428571428571,
      "sum" : 852.0,
      "sum_of_squares" : 141744.0,          // sum of squares
      "variance" : 5434.775510204081,       // variance
      "std_deviation" : 73.72092993312063,  // standard deviation
      "std_deviation_bounds" : {
        "upper" : 269.156145580527,
        "lower" : -25.72757415195555
      }
    }
  }
}
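The extra fields can be reproduced by hand. The Python sketch below recomputes them from the sample readNum values, assuming the population variance (divide by the count, not count minus one) and bounds of two standard deviations around the average; both assumptions match the numbers in the response above. The function name extended_stats and the sigma parameter are used here for illustration only.

```python
import math

# readNum values of the 7 sample documents
read_nums = [100, 40, 200, 200, 12, 200, 100]

def extended_stats(values, sigma=2.0):
    """Recompute the extra fields reported by extended_stats."""
    n = len(values)
    avg = sum(values) / n
    sum_of_squares = float(sum(v * v for v in values))
    variance = sum_of_squares / n - avg * avg  # population variance
    std_deviation = math.sqrt(variance)
    return {
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }

ext = extended_stats(read_nums)
print(ext)
```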
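Bucket aggregations
1. terms aggregation
The terms aggregation is similar to GROUP BY in SQL. Consider the following example, which groups the documents by author and counts how many documents each author has: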
GET /posts/article/_search?pretty&search_type=count -d @search.json
{
  "aggs": {
    "author_aggs": {
      "terms": { "field": "author" }
    }
  }
}
Response:
{
  "took" : 125,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 7, "max_score" : 0.0, "hits" : [ ] },
  "aggregations" : {
    "author_aggs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "wthfeng", "doc_count" : 4 },
        { "key" : "hefeng", "doc_count" : 3 }
      ]
    }
  }
}
The response shows that the author wthfeng has 4 documents and hefeng has 3. In SQL this would be:
select author,count(*) from article group by author
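The same grouping can be sketched in a few lines of Python over the author values of the 7 sample documents. This only mirrors the shape of the buckets; it says nothing about how Elasticsearch builds them across shards.

```python
from collections import Counter

# author field of the 7 sample documents, in the order they appear in the hits
authors = ["wthfeng", "hefeng", "wthfeng", "wthfeng", "hefeng", "wthfeng", "hefeng"]

# most_common() lists the groups by descending count, like the default terms ordering
buckets = [
    {"key": author, "doc_count": count}
    for author, count in Counter(authors).most_common()
]
print(buckets)
```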
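By default, the buckets are sorted by document count (doc_count) in descending order. They can also be sorted in ascending order, or by key. To sort by doc_count use _count; to sort by key use _term (newer Elasticsearch versions use _key instead). For example, the following request sorts by key in ascending order:
{
  "aggs": {
    "author_aggs": {
      "terms": {
        "field": "author",
        "order": { "_term": "asc" }
      }
    }
  }
}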
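The effect of the two sort options can be illustrated locally. In the Python sketch below, terms_buckets is a hypothetical helper that accepts "_count" or "_term" plus a direction and orders the groups accordingly; it imitates the ordering behaviour described above and is not an Elasticsearch client call.

```python
from collections import Counter

# author field of the 7 sample documents
authors = ["wthfeng", "hefeng", "wthfeng", "wthfeng", "hefeng", "wthfeng", "hefeng"]

def terms_buckets(values, order_by="_count", direction="desc"):
    """Group values and sort the groups by count or by key."""
    counts = Counter(values)
    if order_by == "_count":
        sort_key = lambda item: item[1]   # sort on doc_count
    elif order_by == "_term":
        sort_key = lambda item: item[0]   # sort on the bucket key
    else:
        raise ValueError("order_by must be '_count' or '_term'")
    ordered = sorted(counts.items(), key=sort_key, reverse=(direction == "desc"))
    return [{"key": key, "doc_count": count} for key, count in ordered]

by_key = terms_buckets(authors, order_by="_term", direction="asc")
by_count = terms_buckets(authors)
print(by_key)
print(by_count)
```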
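2. range aggregation
The range aggregation groups numeric values into ranges that you define. from is the lower bound (inclusive) and to is the upper bound (exclusive). Each range can be given a descriptive name with key. For example, to group the documents by read count: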
GET /posts/article/_search?pretty&search_type=count -d @search.json
{
  "aggs": {
    "read_docs": {
      "range": {
        "field": "readNum",
        "ranges": [
          { "to": 50, "key": "less 50" },
          { "from": 50, "to": 100, "key": "50 - 100" },
          { "from": 100, "to": 150, "key": "100 - 150" },
          { "from": 150, "key": "more than 150" }
        ]
      }
    }
  }
}
Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 7, "max_score" : 0.0, "hits" : [ ] },
  "aggregations" : {
    "read_docs" : {
      "buckets" : [
        { "key" : "less 50", "to" : 50.0, "to_as_string" : "50.0", "doc_count" : 2 },
        { "key" : "50 - 100", "from" : 50.0, "from_as_string" : "50.0", "to" : 100.0, "to_as_string" : "100.0", "doc_count" : 0 },
        { "key" : "100 - 150", "from" : 100.0, "from_as_string" : "100.0", "to" : 150.0, "to_as_string" : "150.0", "doc_count" : 2 },
        { "key" : "more than 150", "from" : 150.0, "from_as_string" : "150.0", "doc_count" : 3 }
      ]
    }
  }
}
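The boundary rule explains why the two documents with readNum 100 land in "100 - 150" and not in "50 - 100". A small Python sketch of the same bucketing, with from inclusive and to exclusive, is shown below; range_buckets is a name chosen for this illustration.

```python
# readNum values of the 7 sample documents
read_nums = [100, 40, 200, 200, 12, 200, 100]

ranges = [
    {"to": 50, "key": "less 50"},
    {"from": 50, "to": 100, "key": "50 - 100"},
    {"from": 100, "to": 150, "key": "100 - 150"},
    {"from": 150, "key": "more than 150"},
]

def range_buckets(values, ranges):
    """Count values per range; 'from' is inclusive, 'to' is exclusive."""
    buckets = []
    for r in ranges:
        low = r.get("from", float("-inf"))   # a missing bound means open-ended
        high = r.get("to", float("inf"))
        count = sum(1 for v in values if low <= v < high)
        buckets.append({"key": r["key"], "doc_count": count})
    return buckets

read_docs = range_buckets(read_nums, ranges)
print(read_docs)
```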
3. date_range aggregation
The date_range aggregation works the same way as range, but is dedicated to date fields. In addition, format can be used to specify the date format.
GET /posts/article/_search?pretty&search_type=count -d @search.json
{
  "aggs": {
    "date_docs": {
      "date_range": {
        "field": "date",
        "format": "yyyy-MM",
        "ranges": [
          { "key": "before 2016", "to": "2016-01" },
          { "key": "first half of 2016", "from": "2016-01", "to": "2016-06" },
          { "key": "second half of 2016", "from": "2016-06", "to": "2016-12" }
        ]
      }
    }
  }
}
Response:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 7, "max_score" : 0.0, "hits" : [ ] },
  "aggregations" : {
    "date_docs" : {
      "buckets" : [
        { "key" : "before 2016", "to" : 1.4516064E12, "to_as_string" : "2016-01", "doc_count" : 1 },
        { "key" : "first half of 2016", "from" : 1.4516064E12, "from_as_string" : "2016-01", "to" : 1.4647392E12, "to_as_string" : "2016-06", "doc_count" : 0 },
        { "key" : "second half of 2016", "from" : 1.4647392E12, "from_as_string" : "2016-06", "to" : 1.4805504E12, "to_as_string" : "2016-12", "doc_count" : 6 }
      ]
    }
  }
}
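To see where those counts come from, the Python sketch below buckets the date values of the 7 sample documents with the same three ranges, treating each yyyy-MM bound as the first day of that month. The helper names parse_month and date_range_buckets are made up for this example.

```python
from datetime import datetime

# date field of the 7 sample documents
dates = ["2015-09-21", "2016-10-23", "2016-10-23", "2016-09-21",
         "2016-10-21", "2016-10-25", "2016-09-21"]

ranges = [
    {"key": "before 2016", "to": "2016-01"},
    {"key": "first half of 2016", "from": "2016-01", "to": "2016-06"},
    {"key": "second half of 2016", "from": "2016-06", "to": "2016-12"},
]

def parse_month(text):
    """Parse a yyyy-MM bound as midnight on the first day of that month."""
    return datetime.strptime(text, "%Y-%m")

def date_range_buckets(values, ranges):
    """Count dates per range; 'from' is inclusive, 'to' is exclusive."""
    parsed = [datetime.strptime(v, "%Y-%m-%d") for v in values]
    buckets = []
    for r in ranges:
        low = parse_month(r["from"]) if "from" in r else datetime.min
        high = parse_month(r["to"]) if "to" in r else datetime.max
        count = sum(1 for d in parsed if low <= d < high)
        buckets.append({"key": r["key"], "doc_count": count})
    return buckets

date_docs = date_range_buckets(dates, ranges)
print(date_docs)
```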