Elasticsearch: Analytics

Elasticsearch has a feature called aggregations, which lets you generate sophisticated analytics over your data. It is similar to GROUP BY in SQL, but much more powerful.

For example, let's find the most popular interests among all employees:

        GET /megacorp/employee/_search
        {
            "aggs" : {
                "all_interests" : {
                    "terms" : { "field" : "interests" }
                }
            }
        }

Ignore the syntax for now and just look at the results:

        {
            ...
            "hits" : {...},
            "aggregations" : {
                "all_interests" : {
                    "buckets" : [
                        {
                            "key" : "music",
                            "doc_count" : 2
                        },
                        {
                            "key" : "sports",
                            "doc_count" : 1
                        }
                    ]
                }
            }
        }

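We can see that two employees are interested in music and one in sports. These counts are not precomputed; they are generated on the fly from the documents that match the query.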
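To make the bucket counts concrete, here is a minimal Python sketch of what a terms aggregation computes conceptually. It is not how Elasticsearch implements it internally. The two sample documents and the helper name `terms_aggregation` are assumptions chosen for illustration so that the counts agree with the response above.

```python
from collections import Counter

# Hypothetical sample documents (assumed for illustration only).
employees = [
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Smith", "age": 32, "interests": ["music"]},
]

def terms_aggregation(docs, field):
    """Count how many documents contain each distinct value of `field`."""
    counts = Counter()
    for doc in docs:
        # set() so a document is counted once per distinct value
        for value in set(doc.get(field, [])):
            counts[value] += 1
    # Largest buckets first; ties ordered by key so the output is deterministic
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]

buckets = terms_aggregation(employees, "interests")
print(buckets)
```

Each bucket corresponds to one distinct value of the field, and `doc_count` is the number of documents carrying that value.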


Next, let's find the most popular interests among employees whose last name is "Smith":

        GET /megacorp/employee/_search
        {
            "query" : {
                "match" : {
                    "last_name" : "smith"
                }
            },
            "aggs" : {
                "all_interests" : {
                    "terms" : {
                        "field" : "interests"
                    }
                }
            }
        }
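The point of combining a query with an aggregation is that the buckets are built only from the documents the query matches. A rough Python sketch of that scoping follows. The sample documents (including the third employee), the helper names, and the case-insensitive comparison standing in for the match query are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample documents; the third employee is assumed
# only to show that non-matching documents are left out.
employees = [
    {"last_name": "Smith", "interests": ["sports", "music"]},
    {"last_name": "Smith", "interests": ["music"]},
    {"last_name": "Fir", "interests": ["forestry"]},
]

def match_last_name(docs, text):
    """Simplified stand-in for the match query: case-insensitive equality."""
    return [d for d in docs if d["last_name"].lower() == text.lower()]

def count_interests(docs):
    """Count documents per distinct interest."""
    counts = Counter()
    for d in docs:
        counts.update(set(d["interests"]))
    return dict(counts)

# Filter first, then aggregate over the matching documents only
smith_interests = count_interests(match_last_name(employees, "smith"))
print(smith_interests)
```

In this sketch "forestry" never appears in the result, because the only document carrying it does not match the query.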


Aggregations also allow hierarchical rollups. For example, let's find the average age of the employees who share each interest:

        GET /megacorp/employee/_search
        {
            "aggs" : {
                "all_interests" : {
                    "terms" : { "field" : "interests" },
                    "aggs" : {
                        "avg_age" : {
                            "avg" : { "field" : "age" }
                        }
                    }
                }
            }
        }

The aggregations we get back are a bit more complicated, but still fairly easy to understand:

        ...
        "all_interests": {
            "buckets": [
                {
                    "key": "music",
                    "doc_count": 2,
                    "avg_age": {
                        "value": 28.5
                    }
                },
                {
                    "key": "sports",
                    "doc_count": 1,
                    "avg_age": {
                        "value": 25
                    }
                }
            ]
        }

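This result is richer than the previous one. We still get the list of interests and their counts (the number of employees with each interest), but each interest now has an additional avg_age field showing the average age of the employees who share it.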
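The nested aggregation can be read as two steps: group the documents into buckets, then compute a metric inside each bucket. The Python sketch below illustrates that idea under assumed sample data. The ages 25 and 32 and the helper name `terms_with_avg` are hypothetical, chosen so the averages agree with the values shown above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample documents (assumed for illustration only).
employees = [
    {"age": 25, "interests": ["sports", "music"]},
    {"age": 32, "interests": ["music"]},
]

def terms_with_avg(docs, terms_field, metric_field):
    """Bucket documents by `terms_field`, then average `metric_field` per bucket."""
    # Bucket step: gather the metric value of every document in each bucket
    groups = defaultdict(list)
    for doc in docs:
        for value in set(doc[terms_field]):
            groups[value].append(doc[metric_field])
    # Metric step: one average per bucket
    buckets = [
        {"key": key, "doc_count": len(vals), "avg_age": {"value": mean(vals)}}
        for key, vals in groups.items()
    ]
    # Largest buckets first; ties ordered by key so the output is deterministic
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets

nested = terms_with_avg(employees, "interests", "age")
for bucket in nested:
    print(bucket)
```

The employee with two interests contributes to both buckets, which is why the "music" average combines both ages while the "sports" average uses only one.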