Elasticsearch Aggregations
Source: Internet  Published by: 旋转矩阵中6保6  Editor: 程序博客网  Time: 2024/05/31 19:56
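Contents:
1. Basic concepts
2. Generating test data
Maven
Java code
3. Queries
Metric aggregations
Average, max, min, sum, count, and stats
Percentiles aggregation
Percentile ranks aggregation
Bucket aggregations
Histogram aggregation
Minimum document count
Ordering
Date histogram aggregation
Range aggregation
Filter aggregation
Pipeline aggregations
Avg bucket aggregation
Sum bucket aggregation
Cumulative sum aggregation
Max and min bucket aggregations
Stats bucket aggregation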
—————————————————————————————
1. Basic concepts
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
aggregations
The aggregations framework helps provide aggregated data based on a search query. It is based on simple building blocks called aggregations, that can be composed in order to build complex summaries of the data.
An aggregation can be seen as a unit-of-work that builds analytic information over a set of documents. The context of the execution defines what this document set is (e.g. a top-level aggregation executes within the context of the executed query/filters of the search request).
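There are many different types of aggregations, each with its own purpose and output. To better understand these types, it is often easier to break them into four main families:
Bucketing (bucket aggregations)
A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion. When the aggregation is executed, every bucket criterion is evaluated on every document in the context, and when a criterion matches, the document is considered to "fall in" the relevant bucket. The histogram, date histogram, range, and filter queries later in this post belong to this family.
Metric (metric aggregations)
Aggregations that keep track of and compute metrics over a set of documents.
Matrix (matrix aggregations)
A family of aggregations that operate on multiple fields and produce a matrix result based on the values extracted from the requested document fields. Unlike metric and bucket aggregations, this aggregation family does not yet support scripting.
Pipeline (pipeline aggregations)
Aggregations that aggregate the output of other aggregations and their associated metrics.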
2. Generating test data
Maven
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>5.2.0</version>
</dependency>
Java code
package cn.orcale.com.es;

import java.net.InetAddress;
import java.util.Random;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.index.IndexRequestBuilder;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

/**
 * Bulk-inserts 1000 randomly generated car sales documents into car_shop/sales.
 *
 * @author yuhui
 */
public class insetDatas {

    @SuppressWarnings({ "resource" })
    public static void main(String[] args) throws Exception {
        // Candidate values; every document picks one at random for each field.
        String[] brand = {"奔驰", "宝马", "宝马Z4", "奔驰C300", "保时捷", "奔奔"};
        int[] product_price = {10000, 20000, 30000, 40000};
        int[] sale_price = {10000, 20000, 30000, 40000};
        String[] sale_date = {"2017-08-21", "2017-08-22", "2017-08-23", "2017-08-24"};
        String[] colour = {"white", "black", "gray", "red"};
        String[] info = {"我很喜欢", "Very Nice", "不错, 不错 ", "我以后还会来的"};
        int num = 0;
        Random random = new Random();

        // Connect to a local node over the transport protocol (port 9300).
        Settings settings = Settings.builder()
                .put("cluster.name", "elasticsearch")
                .build();
        @SuppressWarnings("unchecked")
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("localhost"), 9300));

        BulkRequestBuilder bulkRequestBuilder = client.prepareBulk();
        for (int i = 0; i < 1000; i++) {
            num++;
            String brandTemp = brand[random.nextInt(brand.length)];
            // Build the index request for document number num (also used as its id).
            IndexRequestBuilder indexRequestBuilder = client.prepareIndex("car_shop", "sales", num + "").setSource(
                    XContentFactory.jsonBuilder().startObject()
                            .field("num", num)
                            .field("brand", brandTemp)
                            .field("colour", colour[random.nextInt(colour.length)])
                            .field("product_price", product_price[random.nextInt(product_price.length)])
                            .field("sale_price", sale_price[random.nextInt(sale_price.length)])
                            .field("sale_date", sale_date[random.nextInt(sale_date.length)])
                            .field("info", brandTemp + info[random.nextInt(info.length)])
                            .endObject());
            bulkRequestBuilder.add(indexRequestBuilder);
        }
        // Send all 1000 documents in a single bulk request.
        bulkRequestBuilder.get();
        System.out.println("Insert complete");
        client.close();
    }
}
3. Queries
Metric aggregations
Average, max, min, sum, count, and stats
# Compute the average, max, min, sum, value count, and stats of sale_price
# over the documents matched by the request (here, every document in the index).
GET /car_shop/sales/_search
{
  "aggs": {
    "avg_grade": { "avg": { "field": "sale_price" } },
    "max_price": { "max": { "field": "sale_price" } },
    "min_price": { "min": { "field": "sale_price" } },
    "intraday_return": { "sum": { "field": "sale_price" } },
    "grades_count": { "value_count": { "field": "sale_price" } },
    "grades_stats": { "stats": { "field": "sale_price" } }
  },
  "size": 0
}
The response:
"aggregations": {
  "max_price": { "value": 40000 },
  "min_price": { "value": 10000 },
  "grades_stats": { "count": 1000, "min": 10000, "max": 40000, "avg": 24030, "sum": 24030000 },
  "intraday_return": { "value": 24030000 },
  "grades_count": { "value": 1000 },
  "avg_grade": { "value": 24030 }
}
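To make the meaning of these numbers concrete, here is a minimal plain-Java sketch that computes the same five statistics over an in-memory array. The class name StatsSketch and the sample prices are made up for illustration, and this says nothing about how Elasticsearch computes the values internally.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

// Hypothetical helper: mirrors the fields returned by a "stats" aggregation.
final class StatsSketch {

    // count, min, max, avg, and sum over the given values in one pass.
    static DoubleSummaryStatistics stats(double[] values) {
        return Arrays.stream(values).summaryStatistics();
    }

    public static void main(String[] args) {
        double[] salePrices = {10000, 20000, 30000, 40000, 20000}; // made-up sample
        DoubleSummaryStatistics s = stats(salePrices);
        System.out.println("count=" + s.getCount()
                + " min=" + s.getMin()
                + " max=" + s.getMax()
                + " avg=" + s.getAverage()
                + " sum=" + s.getSum());
    }
}
```

The avg value is simply sum divided by count, which is why the response above shows 24030000 / 1000 = 24030.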
Percentiles aggregation
# Percentiles aggregation: "percents" lists the percentiles to compute on a 0-100 scale,
# where 0 is the lowest value and 100 is the highest.
GET /car_shop/sales/_search
{
  "aggs": {
    "load_time_outlier": {
      "percentiles": { "field": "num", "percents": [0, 25, 100] }
    }
  },
  "size": 0
}
The response:
"aggregations": {
  "load_time_outlier": {
    "values": { "0.0": 1, "25.0": 250.75, "100.0": 1000 }
  }
}
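As a sanity check on the 250.75 above, the following sketch computes an exact percentile with linear interpolation between neighbouring ranks. Elasticsearch's percentiles aggregation uses an approximate algorithm (TDigest by default), so this is only an illustration of the idea; the class name PercentileSketch is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: exact percentile via linear interpolation.
final class PercentileSketch {

    static double percentile(double[] values, double percent) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be within 0..100");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        // Fractional position of the requested percentile in the sorted array.
        double position = percent / 100.0 * (sorted.length - 1);
        int lower = (int) Math.floor(position);
        int upper = (int) Math.ceil(position);
        double fraction = position - lower;
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }

    public static void main(String[] args) {
        // The "num" field of the test data takes the values 1..1000.
        double[] nums = new double[1000];
        for (int i = 0; i < nums.length; i++) {
            nums[i] = i + 1;
        }
        System.out.println(percentile(nums, 25));
    }
}
```

For the values 1..1000 the 25th percentile sits at position 0.25 * 999 = 249.75, between 250 and 251, which gives 250.75.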
Percentile ranks aggregation
# Percentile ranks aggregation: for each entry in "values", report the percentage
# of documents whose product_price is at or below that value.
GET /car_shop/sales/_search
{
  "aggs": {
    "load_time_outlier": {
      "percentile_ranks": { "field": "product_price", "values": [5000, 10000, 30000, 40000] }
    }
  },
  "size": 0
}
The response:
"aggregations": {
  "load_time_outlier": {
    "values": { "5000.0": 0, "10000.0": 27.1, "30000.0": 74.6, "40000.0": 100 }
  }
}
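Read this way, a percentile rank is just "what share of the values is at or below this threshold". The sketch below spells that out with an exact count; Elasticsearch computes the ranks approximately, and the class name PercentileRankSketch and the sample data are assumptions made for illustration.

```java
// Hypothetical helper: exact percentile rank of a threshold within a data set.
final class PercentileRankSketch {

    // Percentage (0..100) of values that are less than or equal to threshold.
    static double percentileRank(double[] values, double threshold) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        long atOrBelow = 0;
        for (double v : values) {
            if (v <= threshold) {
                atOrBelow++;
            }
        }
        return 100.0 * atOrBelow / values.length;
    }

    public static void main(String[] args) {
        double[] prices = {10000, 20000, 30000, 40000}; // made-up sample
        for (double threshold : new double[]{5000, 10000, 30000, 40000}) {
            System.out.println(threshold + " -> " + percentileRank(prices, threshold));
        }
    }
}
```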
Bucket aggregations
Histogram aggregation
# Histogram aggregation: "interval": 10000 groups product_price into buckets
# 10000 wide and counts the documents in each bucket.
GET /car_shop/sales/_search
{
  "aggs": {
    "product_price": {
      "histogram": { "field": "product_price", "interval": 10000 }
    }
  },
  "size": 0
}
The response:
"aggregations": {
  "product_price": {
    "buckets": [
      { "key": 10000, "doc_count": 278 },
      { "key": 20000, "doc_count": 251 },
      { "key": 30000, "doc_count": 225 },
      { "key": 40000, "doc_count": 246 }
    ]
  }
}
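Each value lands in the bucket whose key is the value rounded down to a multiple of the interval. A minimal sketch of that rule, assuming an offset of 0 (the class name HistogramSketch and the sample data are hypothetical):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: fixed-width bucketing as done by a histogram aggregation.
final class HistogramSketch {

    // Round the value down to the nearest multiple of interval (offset assumed 0).
    static double bucketKey(double value, double interval) {
        return Math.floor(value / interval) * interval;
    }

    // Count values per bucket key; keys are kept in ascending order.
    static Map<Double, Long> histogram(double[] values, double interval) {
        Map<Double, Long> buckets = new TreeMap<>();
        for (double v : values) {
            buckets.merge(bucketKey(v, interval), 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        double[] prices = {10000, 15000, 20000, 40000}; // made-up sample
        System.out.println(histogram(prices, 10000));
    }
}
```

Unlike this sketch, Elasticsearch by default also returns empty buckets that fall between non-empty ones (here, the 30000 bucket), which is what min_doc_count controls.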
Minimum document count
# min_doc_count: return only buckets that contain at least this many documents.
# Every bucket in this data set is non-empty, so the result matches the plain histogram.
GET /car_shop/sales/_search
{
  "aggs": {
    "product_price": {
      "histogram": { "field": "product_price", "interval": 10000, "min_doc_count": 1 }
    }
  },
  "size": 0
}
The response:
"aggregations": {
  "product_price": {
    "buckets": [
      { "key": 10000, "doc_count": 278 },
      { "key": 20000, "doc_count": 251 },
      { "key": 30000, "doc_count": 225 },
      { "key": 40000, "doc_count": 246 }
    ]
  }
}
Ordering
# Order the buckets by _key or by _count
GET /car_shop/sales/_search
{
  "aggs": {
    "product_price": {
      "histogram": { "field": "product_price", "interval": 10000, "order": { "_key": "desc" } }
    }
  },
  "size": 0
}
The response:
"aggregations": {
  "product_price": {
    "buckets": [
      { "key": 40000, "doc_count": 246 },
      { "key": 30000, "doc_count": 225 },
      { "key": 20000, "doc_count": 251 },
      { "key": 10000, "doc_count": 278 }
    ]
  }
}
Date histogram aggregation
# Date histogram aggregation: bucket by day, month, year, and so on
GET /car_shop/sales/_search
{
  "aggs": {
    "articles_over_time": {
      "date_histogram": { "field": "sale_date", "interval": "1d", "format": "yyyy-MM-dd" }
    }
  },
  "size": 0
}
The response:
"aggregations": {
  "articles_over_time": {
    "buckets": [
      { "key_as_string": "2017-08-21", "key": 1503273600000, "doc_count": 235 },
      { "key_as_string": "2017-08-22", "key": 1503360000000, "doc_count": 259 },
      { "key_as_string": "2017-08-23", "key": 1503446400000, "doc_count": 256 },
      { "key_as_string": "2017-08-24", "key": 1503532800000, "doc_count": 250 }
    ]
  }
}
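The numeric "key" of each bucket is the start of that day expressed as milliseconds since the Unix epoch, while "key_as_string" is the same instant rendered with the requested format. A small sketch, assuming the buckets are aligned to UTC as in the response above (the class name DayBucketKeySketch is hypothetical):

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper: maps a yyyy-MM-dd date to the key of its daily bucket.
final class DayBucketKeySketch {

    // Epoch milliseconds of midnight UTC at the start of the given day.
    static long dayBucketKey(String isoDate) {
        return LocalDate.parse(isoDate)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(dayBucketKey("2017-08-21"));
    }
}
```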
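Range aggregation
# Range aggregation: in each range, "from" is inclusive and "to" is exclusive.
# The 10000-20000 bucket therefore holds only documents priced at exactly 10000,
# and the ranges below leave 20000 and 30000 uncovered.
GET /car_shop/sales/_search
{
  "aggs": {
    "product_price": {
      "range": {
        "field": "product_price",
        "ranges": [
          { "to": 10000 },
          { "from": 10000, "to": 20000 },
          { "from": 40000 }
        ]
      }
    }
  },
  "size": 0
}
The response:
"aggregations": {
  "product_price": {
    "buckets": [
      { "key": "*-10000.0", "to": 10000, "doc_count": 0 },
      { "key": "10000.0-20000.0", "from": 10000, "to": 20000, "doc_count": 266 },
      { "key": "40000.0-*", "from": 40000, "doc_count": 251 }
    ]
  }
}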
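The zero count in the first bucket follows from the boundary rule: a range includes its "from" value and excludes its "to" value. A minimal sketch of that membership test (the class name RangeBucketSketch is hypothetical; null stands for an open end):

```java
// Hypothetical helper: membership test for one bucket of a range aggregation.
final class RangeBucketSketch {

    // "from" is inclusive, "to" is exclusive; null means the side is unbounded.
    static boolean inRange(double value, Double from, Double to) {
        return (from == null || value >= from) && (to == null || value < to);
    }

    public static void main(String[] args) {
        System.out.println(inRange(10000, null, 10000.0));    // bucket "*-10000.0"
        System.out.println(inRange(10000, 10000.0, 20000.0)); // bucket "10000.0-20000.0"
        System.out.println(inRange(40000, 40000.0, null));    // bucket "40000.0-*"
    }
}
```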
Filter aggregation
# Filter aggregation: the average sale price of all red cars
GET /car_shop/sales/_search
{
  "aggs": {
    "car_colour": {
      "filter": { "term": { "colour": "red" } },
      "aggs": {
        "avg_price": { "avg": { "field": "sale_price" } }
      }
    }
  },
  "size": 0
}
The response:
"aggregations": {
  "car_colour": {
    "doc_count": 258,
    "avg_price": { "value": 24069.767441860466 }
  }
}
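The filter aggregation first narrows the document set to a single bucket, and the nested avg then runs only on that bucket. In plain Java the same two steps could be sketched as follows; the Sale class, the method names, and the sample data are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical helper: "filter" bucket followed by a nested "avg" metric.
final class FilterAvgSketch {

    static final class Sale {
        final String colour;
        final double salePrice;

        Sale(String colour, double salePrice) {
            this.colour = colour;
            this.salePrice = salePrice;
        }
    }

    // Made-up sample documents.
    static List<Sale> sampleSales() {
        return Arrays.asList(
                new Sale("red", 10000),
                new Sale("red", 30000),
                new Sale("black", 40000),
                new Sale("white", 20000));
    }

    // Keep only sales of the given colour, then average their sale price.
    // The result is empty when no sale matches the colour.
    static OptionalDouble avgPriceForColour(List<Sale> sales, String colour) {
        return sales.stream()
                .filter(sale -> sale.colour.equals(colour))
                .mapToDouble(sale -> sale.salePrice)
                .average();
    }

    public static void main(String[] args) {
        System.out.println(avgPriceForColour(sampleSales(), "red"));
    }
}
```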
Pipeline aggregations
Avg bucket aggregation
# Avg bucket pipeline aggregation: compute the total sales of each day,
# then the average of those daily totals.
# The buckets_path "sales_per_day>sales" points at the "sales" metric inside the
# "sales_per_day" aggregation; ">" is the aggregation separator, not a comparison.
GET /car_shop/sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": { "field": "sale_date", "interval": "day" },
      "aggs": {
        "sales": { "sum": { "field": "sale_price" } }
      }
    },
    "avg_day_sales": {
      "avg_bucket": { "buckets_path": "sales_per_day>sales" }
    }
  }
}
The response (pipeline result only):
"avg_day_sales": { "value": 6007500 }
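A sibling pipeline aggregation such as avg_bucket works on the output of another aggregation rather than on documents. The sketch below mimics the two stages named in "sales_per_day>sales": group and sum per day first, then average the per-day sums. The class and method names are hypothetical and the input arrays are made-up sample data.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: the two stages behind "sales_per_day>sales" and avg_bucket.
final class AvgBucketSketch {

    // Stage 1 ("sales_per_day" with nested "sales"): sum of sale prices per day.
    static Map<String, Double> salesPerDay(String[] saleDates, double[] salePrices) {
        Map<String, Double> perDay = new TreeMap<>();
        for (int i = 0; i < saleDates.length; i++) {
            perDay.merge(saleDates[i], salePrices[i], Double::sum);
        }
        return perDay;
    }

    // Stage 2 ("avg_bucket"): mean of the per-day sums; NaN when there are no buckets.
    static double avgBucket(Map<String, Double> perDay) {
        return perDay.values().stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        String[] dates = {"2017-08-21", "2017-08-21", "2017-08-22"};
        double[] prices = {10000, 30000, 20000};
        System.out.println(avgBucket(salesPerDay(dates, prices)));
    }
}
```

The other sibling pipelines in this section (sum_bucket, max_bucket, min_bucket, stats_bucket) differ only in the function applied in the second stage.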
Sum bucket aggregation
# Sum bucket pipeline aggregation: add up the daily sales totals of all days
POST /car_shop/sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": { "field": "sale_date", "interval": "day" },
      "aggs": {
        "sales": { "sum": { "field": "sale_price" } }
      }
    },
    "sum_days_sales": {
      "sum_bucket": { "buckets_path": "sales_per_day>sales" }
    }
  }
}
The response (pipeline result only):
"sum_days_sales": { "value": 24030000 }
Cumulative sum aggregation
# Cumulative sum pipeline aggregation: the running total per day
# (day 1, days 1-2, days 1-3, days 1-4)
POST /car_shop/sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": { "field": "sale_date", "interval": "day" },
      "aggs": {
        "sales": { "sum": { "field": "sale_price" } },
        "cumulative_sales": { "cumulative_sum": { "buckets_path": "sales" } }
      }
    }
  }
}
The response:
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 1000, "max_score": 0, "hits": [] },
  "aggregations": {
    "sales_per_day": {
      "buckets": [
        {
          "key_as_string": "2017-08-21T00:00:00.000Z",
          "key": 1503273600000,
          "doc_count": 235,
          "sales": { "value": 5620000 },
          "cumulative_sales": { "value": 5620000 }
        },
        {
          "key_as_string": "2017-08-22T00:00:00.000Z",
          "key": 1503360000000,
          "doc_count": 259,
          "sales": { "value": 6150000 },
          "cumulative_sales": { "value": 11770000 }
        },
        {
          "key_as_string": "2017-08-23T00:00:00.000Z",
          "key": 1503446400000,
          "doc_count": 256,
          "sales": { "value": 6050000 },
          "cumulative_sales": { "value": 17820000 }
        },
        {
          "key_as_string": "2017-08-24T00:00:00.000Z",
          "key": 1503532800000,
          "doc_count": 250,
          "sales": { "value": 6210000 },
          "cumulative_sales": { "value": 24030000 }
        }
      ]
    }
  }
}
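Unlike avg_bucket, cumulative_sum is a parent pipeline aggregation: it sits inside the histogram and adds one value to every bucket. The running total can be sketched in a few lines; the class name CumulativeSumSketch is hypothetical, and the input in main is the four daily "sales" values from the response above.

```java
import java.util.Arrays;

// Hypothetical helper: running total over the ordered bucket values.
final class CumulativeSumSketch {

    // out[i] is the sum of bucketValues[0..i].
    static long[] cumulativeSum(long[] bucketValues) {
        long[] out = new long[bucketValues.length];
        long running = 0;
        for (int i = 0; i < bucketValues.length; i++) {
            running += bucketValues[i];
            out[i] = running;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] dailySales = {5620000, 6150000, 6050000, 6210000};
        System.out.println(Arrays.toString(cumulativeSum(dailySales)));
    }
}
```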
Max and min bucket aggregations
# Max bucket and min bucket pipeline aggregations: find the highest and lowest
# daily sales totals, together with the days on which they occurred.
POST /car_shop/sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": { "field": "sale_date", "interval": "day" },
      "aggs": {
        "sales": { "sum": { "field": "sale_price" } }
      }
    },
    "max_days_sales": {
      "max_bucket": { "buckets_path": "sales_per_day>sales" }
    },
    "min_days_sales": {
      "min_bucket": { "buckets_path": "sales_per_day>sales" }
    }
  }
}
The response (pipeline results only):
"max_days_sales": { "value": 6210000, "keys": [ "2017-08-24T00:00:00.000Z" ] },
"min_days_sales": { "value": 5620000, "keys": [ "2017-08-21T00:00:00.000Z" ] }
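The "keys" field is a list because several buckets can share the extreme value. The sketch below collects every key that attains the maximum; the min_bucket case is symmetric. The class name MaxBucketSketch and its methods are hypothetical, and the sample map reuses the daily totals shown earlier in this post.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: keys of all buckets that hold the maximum value.
final class MaxBucketSketch {

    // Daily sales totals taken from the cumulative sum response in this post.
    static Map<String, Long> sampleDailySales() {
        Map<String, Long> perDay = new LinkedHashMap<>();
        perDay.put("2017-08-21", 5620000L);
        perDay.put("2017-08-22", 6150000L);
        perDay.put("2017-08-23", 6050000L);
        perDay.put("2017-08-24", 6210000L);
        return perDay;
    }

    // Returns every bucket key whose value equals the maximum (ties included).
    static List<String> keysOfMax(Map<String, Long> perDay) {
        long max = Long.MIN_VALUE;
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Long> entry : perDay.entrySet()) {
            long value = entry.getValue();
            if (value > max) {
                max = value;
                keys.clear();
                keys.add(entry.getKey());
            } else if (value == max) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(keysOfMax(sampleDailySales()));
    }
}
```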
Stats bucket aggregation
# Stats bucket pipeline aggregation: count, min, max, average, and sum
# of the daily sales totals across all days.
POST /car_shop/sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": { "field": "sale_date", "interval": "day" },
      "aggs": {
        "sales": { "sum": { "field": "sale_price" } }
      }
    },
    "stats_days_sales": {
      "stats_bucket": { "buckets_path": "sales_per_day>sales" }
    }
  }
}
The response (pipeline result only):
"stats_days_sales": { "count": 4, "min": 5620000, "max": 6210000, "avg": 6007500, "sum": 24030000 }