Elasticsearch Aggregations

Source: Internet  Published by: 旋转矩阵中6保6  Editor: 程序博客网  Time: 2024/05/31 19:56

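Contents:

1. Basic concepts

2. Generating the data

       Maven

       Java code

3. Queries

Metric aggregations

       Average, max, min, sum, count, stats

       Percentiles aggregation

       Percentile ranks aggregation

Bucket aggregations

       Histogram aggregation

       Minimum document count

       Ordering

       Date histogram aggregation

       Range aggregation

       Filter aggregation

Pipeline aggregations

       Avg bucket aggregation

       Sum bucket aggregation

       Cumulative sum aggregation

       Max and min bucket aggregations

       Stats bucket aggregation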

—————————————————————————————

1. Basic concepts

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html

Aggregations
       The aggregations framework helps provide aggregated data based on a search query. It is based on simple building blocks called aggregations, which can be composed to build complex summaries of the data.
       An aggregation can be seen as a unit of work that builds analytic information over a set of documents. The context of the execution defines what this document set is (e.g. a top-level aggregation executes within the context of the executed query/filters of the search request).
       There are many different types of aggregations, each with its own purpose and output. To better understand these types, it is often easier to break them into families. The queries in this article use three of them (metric, bucket and pipeline); the matrix family is listed for completeness.
Metric aggregations
       Aggregations that keep track of and compute metrics over a set of documents.
Bucket aggregations
       A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion. When the aggregation is executed, every document in the context is evaluated against the bucket criteria, and a document that matches a criterion "falls into" the corresponding bucket.
Matrix aggregations
       A family of aggregations that operate on multiple fields and produce a matrix result based on the values extracted from the requested document fields. Unlike metric and bucket aggregations, this aggregation family does not yet support scripting.
Pipeline aggregations
       Aggregations that aggregate the output of other aggregations and their associated metrics.

2. Generating the data

Maven

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>5.2.0</version>
</dependency>

Java code

package cn.orcale.com.es;

import java.net.InetAddress;
import java.util.Random;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.index.IndexRequestBuilder;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

/**
 * Bulk-inserts 1000 randomly generated car sales into the car_shop index.
 *
 * @author yuhui
 */
public class insetDatas {

    @SuppressWarnings({ "resource" })
    public static void main(String[] args) throws Exception {
        // Candidate values; each document picks one of each at random
        String[] brand = {"奔驰", "宝马", "宝马Z4", "奔驰C300", "保时捷", "奔奔"};
        int[] product_price = {10000, 20000, 30000, 40000};
        int[] sale_price = {10000, 20000, 30000, 40000};
        String[] sale_date = {"2017-08-21", "2017-08-22", "2017-08-23", "2017-08-24"};
        String[] colour = {"white", "black", "gray", "red"};
        String[] info = {"我很喜欢", "Very Nice", "不错, 不错 ", "我以后还会来的"};
        int num = 0;
        Random random = new Random();

        Settings settings = Settings.builder().put("cluster.name", "elasticsearch").build();

        @SuppressWarnings("unchecked")
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));

        BulkRequestBuilder bulkRequestBuilder = client.prepareBulk();
        for (int i = 0; i < 1000; i++) {
            num++;
            String brandTemp = brand[random.nextInt(brand.length)];
            // Build the index request for one document; num doubles as the document id
            IndexRequestBuilder indexRequestBuilder = client.prepareIndex("car_shop", "sales", num + "").setSource(
                    XContentFactory.jsonBuilder().startObject()
                            .field("num", num)
                            .field("brand", brandTemp)
                            .field("colour", colour[random.nextInt(colour.length)])
                            .field("product_price", product_price[random.nextInt(product_price.length)])
                            .field("sale_price", sale_price[random.nextInt(sale_price.length)])
                            .field("sale_date", sale_date[random.nextInt(sale_date.length)])
                            .field("info", brandTemp + info[random.nextInt(info.length)])
                            .endObject());
            bulkRequestBuilder.add(indexRequestBuilder);
        }
        // Send all 1000 documents in a single bulk request
        bulkRequestBuilder.get();
        System.out.println("Insert complete");
        client.close();
    }
}

3. Queries

Metric aggregations

Average, max, min, sum, count, stats

# Average, max, min, sum, count, stats
# Computes the average, max, min, sum, value count and stats of sale_price over the documents matched by the query
GET /car_shop/sales/_search
{
  "aggs": {
    "avg_grade": { "avg": { "field": "sale_price" } },
    "max_price": { "max": { "field": "sale_price" } },
    "min_price": { "min": { "field": "sale_price" } },
    "intraday_return": { "sum": { "field": "sale_price" } },
    "grades_count": { "value_count": { "field": "sale_price" } },
    "grades_stats": { "stats": { "field": "sale_price" } }
  },
  "size": 0
}

The response is as follows (the data is generated randomly, so exact values differ from run to run):

  "aggregations": {
    "max_price": {
      "value": 40000
    },
    "min_price": {
      "value": 10000
    },
    "grades_stats": {
      "count": 1000,
      "min": 10000,
      "max": 40000,
      "avg": 24030,
      "sum": 24030000
    },
    "intraday_return": {
      "value": 24030000
    },
    "grades_count": {
      "value": 1000
    },
    "avg_grade": {
      "value": 24030
    }
  }
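As a rough in-memory analogy for what the stats aggregation reports, the following sketch summarises a handful of prices with the JDK's `IntSummaryStatistics`. The class name `MetricSketch` and the sample prices are made up for illustration; this is not how Elasticsearch computes the values internally.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

final class MetricSketch {
    // Returns count, min, max, average and sum in one pass,
    // mirroring the fields of the "stats" aggregation response.
    static IntSummaryStatistics stats(int[] salePrices) {
        return IntStream.of(salePrices).summaryStatistics();
    }

    public static void main(String[] args) {
        int[] salePrices = {10000, 20000, 30000, 40000, 20000}; // hypothetical sample
        IntSummaryStatistics s = stats(salePrices);
        System.out.println("count=" + s.getCount() + " min=" + s.getMin()
                + " max=" + s.getMax() + " avg=" + s.getAverage() + " sum=" + s.getSum());
    }
}
```

The single-value aggregations (avg, max, min, sum, value_count) each correspond to one of these five fields.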

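Percentiles aggregation

# Percentiles aggregation: "percents": [0, 25, 100] asks for the 0th, 25th and 100th percentiles
# Percentiles run from 0 to 100, where 0 is the lowest value and 100 is the highest
GET /car_shop/sales/_search
{
  "aggs": {
    "load_time_outlier": {
      "percentiles": {
        "field": "num",
        "percents": [0, 25, 100]
      }
    }
  },
  "size": 0
}

The response is as follows:

  "aggregations": {
    "load_time_outlier": {
      "values": {
        "0.0": 1,
        "25.0": 250.75,
        "100.0": 1000
      }
    }
  }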
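To make the 250.75 above less mysterious, here is a minimal sketch of an exact percentile with linear interpolation between the two closest ranks. Elasticsearch computes percentiles with an approximate algorithm, so this is only an illustration that happens to agree on this small, evenly spaced data set; `PercentileSketch` is a hypothetical name, and the method assumes a non-empty array.

```java
import java.util.Arrays;

final class PercentileSketch {
    // Exact percentile of a non-empty array, interpolating linearly
    // between the two values that surround the requested rank.
    static double percentile(double[] values, double percent) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double rank = percent / 100.0 * (sorted.length - 1);
        int lower = (int) Math.floor(rank);
        int upper = (int) Math.ceil(rank);
        double fraction = rank - lower;
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }

    public static void main(String[] args) {
        // The num field holds 1..1000 in the generated data
        double[] num = new double[1000];
        for (int i = 0; i < num.length; i++) {
            num[i] = i + 1;
        }
        System.out.println(percentile(num, 0));   // 1.0
        System.out.println(percentile(num, 25));  // 250.75
        System.out.println(percentile(num, 100)); // 1000.0
    }
}
```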

Percentile ranks aggregation

# Percentile ranks aggregation: for each entry in "values": [5000, 10000, 30000, 40000],
# returns the percentage of documents whose product_price is at or below that value
GET /car_shop/sales/_search
{
  "aggs": {
    "load_time_outlier": {
      "percentile_ranks": {
        "field": "product_price",
        "values": [5000, 10000, 30000, 40000]
      }
    }
  },
  "size": 0
}

The response is as follows:

  "aggregations": {
    "load_time_outlier": {
      "values": {
        "5000.0": 0,
        "10000.0": 27.1,
        "30000.0": 74.6,
        "40000.0": 100
      }
    }
  }
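A percentile rank is the inverse question of a percentile: given a value, what share of the data lies at or below it? The sketch below computes that share exactly on a small made-up sample. Elasticsearch returns approximate ranks, so real responses can deviate slightly from an exact count; `PercentileRankSketch` is a hypothetical name.

```java
final class PercentileRankSketch {
    // Percentage (0..100) of values that are less than or equal to the threshold.
    static double percentileRank(long[] values, long threshold) {
        long atOrBelow = 0;
        for (long v : values) {
            if (v <= threshold) {
                atOrBelow++;
            }
        }
        return 100.0 * atOrBelow / values.length;
    }

    public static void main(String[] args) {
        // Hypothetical sample: two documents at each of the four price points
        long[] productPrice = {10000, 10000, 20000, 30000, 40000, 40000, 20000, 30000};
        for (long threshold : new long[] {5000, 10000, 30000, 40000}) {
            System.out.println(threshold + " -> " + percentileRank(productPrice, threshold));
        }
    }
}
```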

Bucket aggregations

Histogram aggregation

# Histogram aggregation: "interval": 10000 splits product_price into buckets 10000 wide and counts the documents in each
GET /car_shop/sales/_search
{
  "aggs": {
    "product_price": {
      "histogram": {
        "field": "product_price",
        "interval": 10000
      }
    }
  },
  "size": 0
}

The response is as follows:

  "aggregations": {
    "product_price": {
      "buckets": [
        {
          "key": 10000,
          "doc_count": 278
        },
        {
          "key": 20000,
          "doc_count": 251
        },
        {
          "key": 30000,
          "doc_count": 225
        },
        {
          "key": 40000,
          "doc_count": 246
        }
      ]
    }
  }
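The bucketing rule behind a histogram is that each value is rounded down to the nearest multiple of the interval, and that multiple becomes the bucket key. A minimal sketch of the rule, assuming no offset; `HistogramSketch` and the sample values are made up for illustration:

```java
import java.util.Map;
import java.util.TreeMap;

final class HistogramSketch {
    // Maps each value to the bucket key floor(value / interval) * interval
    // and counts how many values land in each bucket.
    static Map<Long, Long> histogram(long[] values, long interval) {
        Map<Long, Long> buckets = new TreeMap<>();
        for (long v : values) {
            long key = Math.floorDiv(v, interval) * interval;
            buckets.merge(key, 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        long[] productPrice = {10000, 19999, 20000, 40000, 40000}; // hypothetical sample
        System.out.println(histogram(productPrice, 10000));
    }
}
```

Note that this sketch only creates buckets that contain at least one value, so the sample produces no 30000 bucket. Elasticsearch by default fills such gaps with empty buckets, which is what the `min_doc_count` setting controls.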

Minimum document count

# Minimum document count: "min_doc_count": 1 returns only the buckets that contain at least one document
GET /car_shop/sales/_search
{
  "aggs": {
    "product_price": {
      "histogram": {
        "field": "product_price",
        "interval": 10000,
        "min_doc_count": 1
      }
    }
  },
  "size": 0
}

The response is as follows:

  "aggregations": {
    "product_price": {
      "buckets": [
        {
          "key": 10000,
          "doc_count": 278
        },
        {
          "key": 20000,
          "doc_count": 251
        },
        {
          "key": 30000,
          "doc_count": 225
        },
        {
          "key": 40000,
          "doc_count": 246
        }
      ]
    }
  }

Ordering

# Ordering: buckets can be ordered by _key or by _count
GET /car_shop/sales/_search
{
  "aggs": {
    "product_price": {
      "histogram": {
        "field": "product_price",
        "interval": 10000,
        "order": { "_key": "desc" }
      }
    }
  },
  "size": 0
}

The response is as follows:

  "aggregations": {
    "product_price": {
      "buckets": [
        {
          "key": 40000,
          "doc_count": 246
        },
        {
          "key": 30000,
          "doc_count": 225
        },
        {
          "key": 20000,
          "doc_count": 251
        },
        {
          "key": 10000,
          "doc_count": 278
        }
      ]
    }
  }

Date histogram aggregation

# Date histogram aggregation: buckets by day, month, year and so on
GET /car_shop/sales/_search
{
  "aggs": {
    "articles_over_time": {
      "date_histogram": {
        "field": "sale_date",
        "interval": "1d",
        "format": "yyyy-MM-dd"
      }
    }
  },
  "size": 0
}

The response is as follows:

  "aggregations": {
    "articles_over_time": {
      "buckets": [
        {
          "key_as_string": "2017-08-21",
          "key": 1503273600000,
          "doc_count": 235
        },
        {
          "key_as_string": "2017-08-22",
          "key": 1503360000000,
          "doc_count": 259
        },
        {
          "key_as_string": "2017-08-23",
          "key": 1503446400000,
          "doc_count": 256
        },
        {
          "key_as_string": "2017-08-24",
          "key": 1503532800000,
          "doc_count": 250
        }
      ]
    }
  }

Range aggregation

# Range aggregation: in each range, "from" is inclusive and "to" is exclusive
GET /car_shop/sales/_search
{
  "aggs": {
    "product_price": {
      "range": {
        "field": "product_price",
        "ranges": [
          { "to": 10000 },
          { "from": 10000, "to": 20000 },
          { "from": 40000 }
        ]
      }
    }
  },
  "size": 0
}

The response is as follows:

  "aggregations": {
    "product_price": {
      "buckets": [
        {
          "key": "*-10000.0",
          "to": 10000,
          "doc_count": 0
        },
        {
          "key": "10000.0-20000.0",
          "from": 10000,
          "to": 20000,
          "doc_count": 266
        },
        {
          "key": "40000.0-*",
          "from": 40000,
          "doc_count": 251
        }
      ]
    }
  }
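The first bucket above is empty even though many cars cost exactly 10000, because a range includes its `from` value and excludes its `to` value. A small sketch of that membership test follows, with `null` standing for an open end; `RangeSketch` and the sample prices are hypothetical.

```java
final class RangeSketch {
    // Counts values in [from, to): from is inclusive, to is exclusive.
    // A null bound means the range is open on that side.
    static long countInRange(long[] values, Long from, Long to) {
        long count = 0;
        for (long v : values) {
            boolean aboveFrom = from == null || v >= from;
            boolean belowTo = to == null || v < to;
            if (aboveFrom && belowTo) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] productPrice = {10000, 10000, 20000, 30000, 40000}; // hypothetical sample
        System.out.println(countInRange(productPrice, null, 10000L));   // *-10000
        System.out.println(countInRange(productPrice, 10000L, 20000L)); // 10000-20000
        System.out.println(countInRange(productPrice, 40000L, null));   // 40000-*
    }
}
```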

Filter aggregation

# Filter aggregation (average sale price of all red cars)
GET /car_shop/sales/_search
{
  "aggs": {
    "car_colour": {
      "filter": { "term": { "colour": "red" } },
      "aggs": {
        "avg_price": { "avg": { "field": "sale_price" } }
      }
    }
  },
  "size": 0
}

The response is as follows:

  "aggregations": {
    "car_colour": {
      "doc_count": 258,
      "avg_price": {
        "value": 24069.767441860466
      }
    }
  }
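The filter aggregation narrows the document set to a single bucket first, and the nested avg then runs only on that bucket. In plain Java the same two steps look roughly like this. The `Sale` class, `FilterAvgSketch` and the sample rows are made up for illustration, and returning NaN for an empty bucket is a choice of this sketch.

```java
import java.util.Arrays;
import java.util.List;

final class FilterAvgSketch {
    // Minimal stand-in for a sales document
    static final class Sale {
        final String colour;
        final int salePrice;

        Sale(String colour, int salePrice) {
            this.colour = colour;
            this.salePrice = salePrice;
        }
    }

    // Step 1: keep only sales of the given colour (the "filter" bucket).
    // Step 2: average sale_price inside that bucket (the nested "avg").
    static double avgPriceForColour(List<Sale> sales, String colour) {
        return sales.stream()
                .filter(s -> s.colour.equals(colour))
                .mapToInt(s -> s.salePrice)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<Sale> sales = Arrays.asList(
                new Sale("red", 10000), new Sale("red", 30000), new Sale("black", 40000));
        System.out.println(avgPriceForColour(sales, "red"));
    }
}
```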

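Pipeline aggregations

Avg bucket aggregation

# Avg bucket aggregation (total sales for each day, plus the average of those daily totals)
# avg_bucket averages a metric across the buckets of a sibling aggregation.
# The buckets_path "sales_per_day>sales" points at the sales metric inside the sales_per_day aggregation;
# ">" is the aggregation path separator, not a comparison
GET /car_shop/sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "sale_date",
        "interval": "day"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "sale_price"
          }
        }
      }
    },
    "avg_day_sales": {
      "avg_bucket": {
        "buckets_path": "sales_per_day>sales"
      }
    }
  }
}

The response is as follows:

    "avg_day_sales": {
      "value": 6007500
    }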
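A pipeline aggregation like this one works in two stages: the parent aggregation first produces one metric per bucket, and the pipeline step then aggregates those per-bucket metrics. The following sketch mimics both stages in memory; `AvgBucketSketch` and the sample data are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class AvgBucketSketch {
    // Stage 1: one bucket per day, holding the sum of that day's prices
    // (date_histogram with a nested sum).
    static Map<String, Long> sumPerDay(String[] days, long[] prices) {
        Map<String, Long> perDay = new TreeMap<>();
        for (int i = 0; i < days.length; i++) {
            perDay.merge(days[i], prices[i], Long::sum);
        }
        return perDay;
    }

    // Stage 2: average of the per-bucket sums (avg_bucket).
    static double avgBucket(Map<String, Long> perDay) {
        return perDay.values().stream()
                .mapToLong(Long::longValue)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        String[] days = {"2017-08-21", "2017-08-21", "2017-08-22"}; // hypothetical sample
        long[] prices = {10000, 30000, 20000};
        Map<String, Long> perDay = sumPerDay(days, prices);
        System.out.println(perDay);
        System.out.println(avgBucket(perDay));
    }
}
```

The result is the average of the daily totals, not the average sale price, which is why the response value is in the millions.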

Sum bucket aggregation

# Sum bucket aggregation (adds up the daily sales totals across all days)
POST /car_shop/sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "sale_date",
        "interval": "day"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "sale_price"
          }
        }
      }
    },
    "sum_days_sales": {
      "sum_bucket": {
        "buckets_path": "sales_per_day>sales"
      }
    }
  }
}

The response is as follows:

    "sum_days_sales": {
      "value": 24030000
    }

Cumulative sum aggregation

# Cumulative sum aggregation (running total of daily sales: day 1, days 1-2, days 1-3, days 1-4)
POST /car_shop/sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "sale_date",
        "interval": "day"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "sale_price"
          }
        },
        "cumulative_sales": {
          "cumulative_sum": {
            "buckets_path": "sales"
          }
        }
      }
    }
  }
}

The response is as follows:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "sales_per_day": {
      "buckets": [
        {
          "key_as_string": "2017-08-21T00:00:00.000Z",
          "key": 1503273600000,
          "doc_count": 235,
          "sales": {
            "value": 5620000
          },
          "cumulative_sales": {
            "value": 5620000
          }
        },
        {
          "key_as_string": "2017-08-22T00:00:00.000Z",
          "key": 1503360000000,
          "doc_count": 259,
          "sales": {
            "value": 6150000
          },
          "cumulative_sales": {
            "value": 11770000
          }
        },
        {
          "key_as_string": "2017-08-23T00:00:00.000Z",
          "key": 1503446400000,
          "doc_count": 256,
          "sales": {
            "value": 6050000
          },
          "cumulative_sales": {
            "value": 17820000
          }
        },
        {
          "key_as_string": "2017-08-24T00:00:00.000Z",
          "key": 1503532800000,
          "doc_count": 250,
          "sales": {
            "value": 6210000
          },
          "cumulative_sales": {
            "value": 24030000
          }
        }
      ]
    }
  }
}
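The cumulative_sales values in the response are simply a running total over the sales values, in bucket order. The sketch below reproduces them from the four daily totals shown above; `CumulativeSumSketch` is a hypothetical name.

```java
import java.util.Arrays;

final class CumulativeSumSketch {
    // Element i of the result is the sum of dailySales[0..i].
    static long[] cumulativeSum(long[] dailySales) {
        long[] running = new long[dailySales.length];
        long total = 0;
        for (int i = 0; i < dailySales.length; i++) {
            total += dailySales[i];
            running[i] = total;
        }
        return running;
    }

    public static void main(String[] args) {
        // Daily totals taken from the response above
        long[] dailySales = {5620000, 6150000, 6050000, 6210000};
        System.out.println(Arrays.toString(cumulativeSum(dailySales)));
        // prints [5620000, 11770000, 17820000, 24030000]
    }
}
```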

Max and min bucket aggregations

# Max and min bucket aggregations (the largest and smallest daily sales totals, and the days they fall on)
POST /car_shop/sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "sale_date",
        "interval": "day"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "sale_price"
          }
        }
      }
    },
    "max_days_sales": {
      "max_bucket": {
        "buckets_path": "sales_per_day>sales"
      }
    },
    "min_days_sales": {
      "min_bucket": {
        "buckets_path": "sales_per_day>sales"
      }
    }
  }
}

The response is as follows:

    "max_days_sales": {
      "value": 6210000,
      "keys": [
        "2017-08-24T00:00:00.000Z"
      ]
    },
    "min_days_sales": {
      "value": 5620000,
      "keys": [
        "2017-08-21T00:00:00.000Z"
      ]
    }
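The response carries a list of keys rather than a single key because several buckets can share the maximum (or minimum) value. A minimal sketch of the max case, using the daily totals from the cumulative sum response; `MaxBucketSketch` is a hypothetical name, the day keys are shortened to plain dates, and the map is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MaxBucketSketch {
    // Returns the keys of every bucket whose value equals the maximum.
    static List<String> keysOfMax(Map<String, Long> perDay) {
        long max = Collections.max(perDay.values());
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Long> e : perDay.entrySet()) {
            if (e.getValue() == max) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Long> perDay = new LinkedHashMap<>();
        perDay.put("2017-08-21", 5620000L);
        perDay.put("2017-08-22", 6150000L);
        perDay.put("2017-08-23", 6050000L);
        perDay.put("2017-08-24", 6210000L);
        System.out.println(keysOfMax(perDay)); // prints [2017-08-24]
    }
}
```

The min case is the same with `Collections.min`.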

Stats bucket aggregation

# Stats bucket aggregation (count, min, max, average and sum of the daily sales totals)
POST /car_shop/sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "sale_date",
        "interval": "day"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "sale_price"
          }
        }
      }
    },
    "stats_days_sales": {
      "stats_bucket": {
        "buckets_path": "sales_per_day>sales"
      }
    }
  }
}

The response is as follows:

    "stats_days_sales": {
      "count": 4,
      "min": 5620000,
      "max": 6210000,
      "avg": 6007500,
      "sum": 24030000
    }