Elasticsearch: filter-then-aggregate versus aggregate-then-filter, explained

Source: Internet  Editor: 程序博客网  Time: 2024/06/16 11:34
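In Elasticsearch, the result of combining an aggregation with a filter does not depend on the order in which you write them. In other words, whether you put the filter condition before the aggregation or after it in your code, the result is the same: the query is always applied first and the aggregation runs on the matching documents, just as a relational database evaluates WHERE before GROUP BY.
If you want a filter condition to change only the hits and leave the aggregation (agg) result untouched, use the setPostFilter() method.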
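The difference between the two kinds of filter can be sketched with a small stand-alone model. This is only a toy illustration, not Elasticsearch code: the class SearchPhaseSketch, its Result type and the search() method are hypothetical names, documents are reduced to their age value (the one sample document without an age field is left out), and buckets are sorted by key rather than by doc_count as the real terms aggregation does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Toy model of the search phases; hypothetical names, NOT the Elasticsearch API.
class SearchPhaseSketch {

    // Outcome of one simulated search: the returned hits and the terms buckets.
    static final class Result {
        final List<Integer> hits;
        final Map<Integer, Integer> buckets;

        Result(List<Integer> hits, Map<Integer, Integer> buckets) {
            this.hits = hits;
            this.buckets = buckets;
        }
    }

    static Result search(List<Integer> ages,
                         Predicate<Integer> query,
                         Predicate<Integer> postFilter) {
        // Phase 1: the query decides which documents match at all.
        List<Integer> matched = new ArrayList<>();
        for (Integer age : ages) {
            if (query.test(age)) {
                matched.add(age);
            }
        }
        // Phase 2: the aggregation counts over every matched document.
        Map<Integer, Integer> buckets = new TreeMap<>();
        for (Integer age : matched) {
            buckets.merge(age, 1, Integer::sum);
        }
        // Phase 3: the post filter trims the hits only; buckets are already final.
        List<Integer> hits = new ArrayList<>();
        for (Integer age : matched) {
            if (postFilter.test(age)) {
                hits.add(age);
            }
        }
        return new Result(hits, buckets);
    }

    public static void main(String[] args) {
        // Ages of the sample employees that have an age field.
        List<Integer> ages = Arrays.asList(22, 30, 19, 18, 30, 35);
        Predicate<Integer> all = age -> true;
        Predicate<Integer> range = age -> age > 30 && age < 40;

        Result asQuery = search(ages, range, all);
        Result asPostFilter = search(ages, all, range);

        System.out.println("query:       hits=" + asQuery.hits + " buckets=" + asQuery.buckets);
        System.out.println("post filter: hits=" + asPostFilter.hits + " buckets=" + asPostFilter.buckets);
    }
}
```

In this model the range condition used as a query shrinks both the hits and the buckets, while the same condition used as a post filter shrinks only the hits.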

Example: all data
Code:
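// Imports used by this and the following snippets (QueryBuilders is needed from section 1 on).
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
// "client" is assumed to be an already connected Client instance.
SearchResponse response = null;
SearchRequestBuilder responsebuilder = client.prepareSearch("company")
.setTypes("employee").setFrom(0).setSize(250);
// Terms aggregation named "agg" on the age field; no query, so every document matches.
AggregationBuilder aggregation = AggregationBuilders
.terms("agg")
.field("age") ;
response = responsebuilder
.addAggregation(aggregation)
.setExplain(true).execute().actionGet();
SearchHits hits = response.getHits();
Terms agg = response.getAggregations().get("agg");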
Result: aggregation only, no filter (compare what appears under hits and under agg)
{
    "took":100,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "failed":0
    },
    "hits":{
        "total":7,
        "max_score":1,
        "hits":[
            {
                "_shard":1,
                "_node":"fvp3NBT5R5i6CqN3y2LU4g",
                "_index":"company",
                "_type":"employee",
                "_id":"5",
                "_score":1,
                "_source":{
                    "name":"Fresh",
                    "age":22
                },
                "_explanation":Object{...}
            },
            {
                "_shard":1,
                "_node":"fvp3NBT5R5i6CqN3y2LU4g",
                "_index":"company",
                "_type":"employee",
                "_id":"10",
                "_score":1,
                "_source":{
                    "name":"Henrry",
                    "age":30
                },
                "_explanation":Object{...}
            },
            {
                "_shard":1,
                "_node":"fvp3NBT5R5i6CqN3y2LU4g",
                "_index":"company",
                "_type":"employee",
                "_id":"9",
                "_score":1,
                "_source":{
                    "address":{
                        "country":"china",
                        "province":"jiangsu",
                        "city":"nanjing",
                        "area":{
                            "pos":"10001"
                        }
                    }
                },
                "_explanation":Object{...}
            },
            {
                "_shard":2,
                "_node":"fvp3NBT5R5i6CqN3y2LU4g",
                "_index":"company",
                "_type":"employee",
                "_id":"2",
                "_score":1,
                "_source":{
                    "address":{
                        "country":"china",
                        "province":"jiangsu",
                        "city":"nanjing"
                    },
                    "name":"jack_1",
                    "age":19,
                    "join_date":"2016-01-01"
                },
                "_explanation":Object{...}
            },
            {
                "_shard":2,
                "_node":"fvp3NBT5R5i6CqN3y2LU4g",
                "_index":"company",
                "_type":"employee",
                "_id":"4",
                "_score":1,
                "_source":{
                    "name":"willam",
                    "age":18
                },
                "_explanation":Object{...}
            },
            {
                "_shard":2,
                "_node":"fvp3NBT5R5i6CqN3y2LU4g",
                "_index":"company",
                "_type":"employee",
                "_id":"6",
                "_score":1,
                "_source":{
                    "name":"Avivi",
                    "age":30
                },
                "_explanation":Object{...}
            },
            {
                "_shard":4,
                "_node":"K7qK1ncMQUuIe0K6VSVMJA",
                "_index":"company",
                "_type":"employee",
                "_id":"3",
                "_score":1,
                "_source":{
                    "address":{
                        "country":"china",
                        "province":"shanxi",
                        "city":"xian"
                    },
                    "name":"marry",
                    "age":35,
                    "join_date":"2015-01-01"
                },
                "_explanation":Object{...}
            }
        ]
    },
    "aggregations":{
        "agg":{
            "doc_count_error_upper_bound":0,
            "sum_other_doc_count":0,
            "buckets":[
                {
                    "key":30,
                    "doc_count":2
                },
                {
                    "key":18,
                    "doc_count":1
                },
                {
                    "key":19,
                    "doc_count":1
                },
                {
                    "key":22,
                    "doc_count":1
                },
                {
                    "key":35,
                    "doc_count":1
                }
            ]
        }
    }
}
1. setQuery() placed before addAggregation()
Code:
SearchResponse response = null;
SearchRequestBuilder responsebuilder = client.prepareSearch("company")
.setTypes("employee").setFrom(0).setSize(250);
AggregationBuilder aggregation = AggregationBuilders
.terms("agg")
.field("age") ;
response = responsebuilder
.setQuery(QueryBuilders.rangeQuery("age").gt(30).lt(40))
.addAggregation(aggregation)
.setExplain(true).execute().actionGet();
SearchHits hits = response.getHits();
Terms agg = response.getAggregations().get("agg");
Result:
{
    "took":538,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":1,
        "hits":[
            {
                "_shard":4,
                "_node":"anlkGjjuQ0G6DODpZgiWrQ",
                "_index":"company",
                "_type":"employee",
                "_id":"3",
                "_score":1,
                "_source":{
                    "address":{
                        "country":"china",
                        "province":"shanxi",
                        "city":"xian"
                    },
                    "name":"marry",
                    "age":35,
                    "join_date":"2015-01-01"
                },
                "_explanation":Object{...}
            }
        ]
    },
    "aggregations":{
        "agg":{
            "doc_count_error_upper_bound":0,
            "sum_other_doc_count":0,
            "buckets":[
                {
                    "key":35,
                    "doc_count":1
                }
            ]
        }
    }
}

2. setQuery() placed after addAggregation()
Code:
SearchResponse response = null;
SearchRequestBuilder responsebuilder = client.prepareSearch("company")
.setTypes("employee").setFrom(0).setSize(250);
AggregationBuilder aggregation = AggregationBuilders
.terms("agg")
.field("age") ;
response = responsebuilder
.addAggregation(aggregation)
.setQuery(QueryBuilders.rangeQuery("age").gt(30).lt(40))
.setExplain(true).execute().actionGet();
SearchHits hits = response.getHits();
Terms agg = response.getAggregations().get("agg");
Result:
{
    "took":538,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":1,
        "hits":[
            {
                "_shard":4,
                "_node":"anlkGjjuQ0G6DODpZgiWrQ",
                "_index":"company",
                "_type":"employee",
                "_id":"3",
                "_score":1,
                "_source":{
                    "address":{
                        "country":"china",
                        "province":"shanxi",
                        "city":"xian"
                    },
                    "name":"marry",
                    "age":35,
                    "join_date":"2015-01-01"
                },
                "_explanation":Object{...}
            }
        ]
    },
    "aggregations":{
        "agg":{
            "doc_count_error_upper_bound":0,
            "sum_other_doc_count":0,
            "buckets":[
                {
                    "key":35,
                    "doc_count":1
                }
            ]
        }
    }
}


3. setPostFilter() placed after addAggregation()
Code:
SearchResponse response = null;
SearchRequestBuilder responsebuilder = client.prepareSearch("company")
.setTypes("employee").setFrom(0).setSize(250);
AggregationBuilder aggregation = AggregationBuilders
.terms("agg")
.field("age") ;
response = responsebuilder
.addAggregation(aggregation)
.setPostFilter(QueryBuilders.rangeQuery("age").gt(30).lt(40))
.setExplain(true).execute().actionGet();
SearchHits hits = response.getHits();
Terms agg = response.getAggregations().get("agg");
Result:
{
    "took":7,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":1,
        "hits":[
            {
                "_shard":4,
                "_node":"fvp3NBT5R5i6CqN3y2LU4g",
                "_index":"company",
                "_type":"employee",
                "_id":"3",
                "_score":1,
                "_source":{
                    "address":{
                        "country":"china",
                        "province":"shanxi",
                        "city":"xian"
                    },
                    "name":"marry",
                    "age":35,
                    "join_date":"2015-01-01"
                },
                "_explanation":Object{...}
            }
        ]
    },
    "aggregations":{
        "agg":{
            "doc_count_error_upper_bound":0,
            "sum_other_doc_count":0,
            "buckets":[
                {
                    "key":30,
                    "doc_count":2
                },
                {
                    "key":18,
                    "doc_count":1
                },
                {
                    "key":19,
                    "doc_count":1
                },
                {
                    "key":22,
                    "doc_count":1
                },
                {
                    "key":35,
                    "doc_count":1
                }
            ]
        }
    }
}

4. setPostFilter() placed before addAggregation()
Code:
SearchResponse response = null;
SearchRequestBuilder responsebuilder = client.prepareSearch("company")
.setTypes("employee").setFrom(0).setSize(250);
AggregationBuilder aggregation = AggregationBuilders
.terms("agg")
.field("age") ;
response = responsebuilder
.setPostFilter(QueryBuilders.rangeQuery("age").gt(30).lt(40))
.addAggregation(aggregation)
.setExplain(true).execute().actionGet();
SearchHits hits = response.getHits();
Terms agg = response.getAggregations().get("agg");
Result:
{
    "took":5115,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":1,
        "hits":[
            {
                "_shard":4,
                "_node":"b8cNIO5cQr2MmsnsuluoNQ",
                "_index":"company",
                "_type":"employee",
                "_id":"3",
                "_score":1,
                "_source":{
                    "address":{
                        "country":"china",
                        "province":"shanxi",
                        "city":"xian"
                    },
                    "name":"marry",
                    "age":35,
                    "join_date":"2015-01-01"
                },
                "_explanation":Object{...}
            }
        ]
    },
    "aggregations":{
        "agg":{
            "doc_count_error_upper_bound":0,
            "sum_other_doc_count":0,
            "buckets":[
                {
                    "key":30,
                    "doc_count":2
                },
                {
                    "key":18,
                    "doc_count":1
                },
                {
                    "key":19,
                    "doc_count":1
                },
                {
                    "key":22,
                    "doc_count":1
                },
                {
                    "key":35,
                    "doc_count":1
                }
            ]
        }
    }
}

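Summary:
The results show that neither setPostFilter() nor setQuery() is sensitive to where it appears in the builder chain. They also show that the condition passed to setQuery() affects both the hits and the aggregation (agg) result, whereas the condition passed to setPostFilter() affects only the hits and leaves the aggregation (agg) result unchanged.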
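One way to picture why the call order has no effect: a request builder only stores each part of the request in its own slot, and the order of execution is decided later by the search engine. The sketch below illustrates that idea with a hypothetical ToyRequestBuilder; it is an assumption-level model with invented names and string placeholders, not the Elasticsearch SearchRequestBuilder.

```java
// Hypothetical stand-in for a search request builder; NOT the Elasticsearch class.
class ToyRequestBuilder {
    private String query;
    private String aggregation;
    private String postFilter;

    // Each method only records its argument in a named slot and returns the builder.
    ToyRequestBuilder setQuery(String query) {
        this.query = query;
        return this;
    }

    ToyRequestBuilder addAggregation(String aggregation) {
        this.aggregation = aggregation;
        return this;
    }

    ToyRequestBuilder setPostFilter(String postFilter) {
        this.postFilter = postFilter;
        return this;
    }

    // The request is written out in a fixed layout, whatever the call order was.
    String build() {
        return "query=" + query + ";aggs=" + aggregation + ";post_filter=" + postFilter;
    }

    public static void main(String[] args) {
        String queryFirst = new ToyRequestBuilder()
                .setQuery("30 < age < 40")
                .addAggregation("terms(age)")
                .build();
        String queryLast = new ToyRequestBuilder()
                .addAggregation("terms(age)")
                .setQuery("30 < age < 40")
                .build();
        // Both chains describe the same request.
        System.out.println(queryFirst.equals(queryLast));
    }
}
```

Because both chains produce the same request, any difference in the response has to come from which slot a condition was placed in (query or post filter), not from the order of the calls.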