Enterprise Search with Elasticsearch 02: Elasticsearch Search
Source: Internet · Editor: 程序博客网 · Date: 2024/05/22 04:31
I. Searching with commands
1) URI search (see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html)
URI search means placing the query and its options in the URI's parameters.
To make searching easier, first add some test data (/root/my.json):
{"index":{"_id":"1"}}{"id":"1","country":"美国","provice":"加利福尼亚州","city":"旧金山","age":"30","name":"John","desc":"John is come from austrina John,s Dad is Johh Super"} {"index":{"_id":"2"}}{"id":"2","country":"美国","provice":"加利福尼亚州","city":"好莱坞","age":"40","name":"Mike","desc":"Mike is come from austrina Mike,s Dad is Mike Super"} {"index":{"_id":"3"}}{"id":"3","country":"美国","provice":"加利福尼亚州","city":"圣地牙哥","age":"50","name":"Cherry","desc":"Cherry is come from austrina Cherry,s Dad is Cherry Super"} {"index":{"_id":"4"}}{"id":"4","country":"美国","provice":"德克萨斯州","city":"休斯顿","age":"60","name":"Miya","desc":"Miya is come from austrina Miya,s Dad is Miya Super"} {"index":{"_id":"5"}}{"id":"5","country":"美国","provice":"德克萨斯州","city":"大学城","age":"70","name":"fubos","desc":"fubos is come from austrina fubos,s Dad is fubos Super"} {"index":{"_id":"6"}}{"id":"6","country":"美国","provice":"德克萨斯州","city":"麦亚伦","age":"20","name":"marry","desc":"marry is come from austrina marry,s Dad is marry Super"} {"index":{"_id":"7"}}{"id":"7","country":"中国","provice":"湖南省","city":"长沙市","age":"18","name":"张三","desc":"张三来自长沙市 是公务员一名"} {"index":{"_id":"8"}}{"id":"8","country":"中国","provice":"湖南省","city":"岳阳市","age":"15","name":"李四","desc":"李四来自岳阳市 是一名清洁工"} {"index":{"_id":"9"}}{"id":"9","country":"中国","provice":"湖南省","city":"株洲市","age":"33","name":"李光四","desc":"李光四 老家岳阳市 来自株洲 是李四的侄子"} {"index":{"_id":"10"}}{"id":"10","country":"中国","provice":"广东省","city":"深圳市","age":"67","name":"王五","desc":"王五来自深圳市 是来自深圳的一名海关缉私精英"} {"index":{"_id":"11"}}{"id":"11","country":"中国","provice":"广东省","city":"广州市","age":"89","name":"王冠宇","desc":"王冠宇是王五的儿子"}使用bulkapi 来批量导入
cd /root && curl -XPOST '192.168.58.147:9200/user/info/_bulk?pretty' --data-binary @my.json
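The bulk payload above interleaves one action line with one source line per document. As a sketch (no client library assumed; the field names come from this article's data), the same NDJSON body could be assembled in Python:

```python
import json

def build_bulk_body(docs):
    """Build an Elasticsearch _bulk payload: one action line plus one
    source line per document, newline-delimited, ending with a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": doc["id"]}}, ensure_ascii=False))
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

docs = [
    {"id": "7", "country": "中国", "provice": "湖南省", "city": "长沙市",
     "age": "18", "name": "张三", "desc": "张三来自长沙市 是公务员一名"},
    {"id": "8", "country": "中国", "provice": "湖南省", "city": "岳阳市",
     "age": "15", "name": "李四", "desc": "李四来自岳阳市 是一名清洁工"},
]
body = build_bulk_body(docs)
print(body)
```

POSTed as the request body, this produces the same shape as the --data-binary @my.json call above.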
Documents are searched with the _search API, e.g. q=field:value:

[root@node1 ~]# curl -XGET 'http://192.168.58.147:9200/_search?q=name:marry&pretty'
{
  "took" : 42,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.6127073,
    "hits" : [
      {
        "_index" : "user",
        "_type" : "info",
        "_id" : "6",
        "_score" : 1.6127073,
        "_source" : {
          "id" : "6",
          "country_s" : "美国",
          "provice_s" : "德克萨斯州",
          "city_s" : "麦亚伦",
          "age_i" : "20",
          "name_s" : "marry",
          "desc_s" : "marry is come from austrina marry,s Dad is marry Super"
        }
      }
    ]
  }
}

Query the user index for everyone aged 50:
curl -XGET 'http://192.168.58.147:9200/user/_search?q=age:50&pretty'
Query the info and money types of the student index for age 30; separate multiple indices or types with commas:
curl -XGET 'http://192.168.58.147:9200/student/info,money/_search?q=age:30&pretty'
Query type info across all indices (_all) for age 30:
curl -XGET 'http://192.168.58.147:9200/_all/info/_search?q=age:30&pretty'
from sets the starting offset and size the number of results returned; the defaults are from=0 and size=10:
curl -XGET 'http://192.168.58.147:9200/user/_search?q=age:30&from=0&size=2&pretty'
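from/size paging is offset-based. A small helper (illustrative only, not part of any client library) converts a 1-based page number into the from offset used above:

```python
def page_params(page, size=10):
    """Translate a 1-based page number into Elasticsearch from/size."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"from": (page - 1) * size, "size": size}

print(page_params(1, 2))  # first two hits, as in the curl above
print(page_params(3))     # hits 20..29 with the default page size
```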
Commonly used query parameters:

Name                      Description
q                         The query string.
df                        The default field used when no field prefix is given in the query.
analyzer                  The analyzer name used when analyzing the query string.
default_operator          The default operator, AND or OR; defaults to OR.
explain                   For every hit, include an explanation of how its score was computed.
_source                   Set to false to skip retrieving the _source field; _source_include and _source_exclude retrieve parts of the document.
fields                    The stored fields of the document to return for each hit.
sort                      Sorting to perform, as fieldName, fieldName:asc, or fieldName:desc; fieldName can be a real field or _score; multiple sort parameters are allowed.
track_scores              When sorting, set to true to also return relevance scores.
timeout                   Defaults to no timeout.
from                      Defaults to 0.
size                      Defaults to 10.
search_type               The type of search execution: dfs_query_then_fetch, dfs_query_and_fetch, query_then_fetch, query_and_fetch, count, or scan; defaults to query_then_fetch.
lowercase_expanded_terms  Whether terms are automatically lowercased; defaults to true.
analyze_wildcard          Whether wildcard and prefix queries are analyzed; defaults to false.
terminate_after           The maximum number of documents to collect per shard; once reached, query execution terminates early. If set, the response carries a boolean terminated_early field indicating whether execution actually stopped early. Defaults to no terminate_after.

Querying by a Chinese name returns no data at all; this looks like a tokenization problem with the analyzer:
curl -XGET 'http://192.168.58.147:9200/user/info/_search?q=name:张&pretty'
curl -XGET 'http://192.168.58.147:9200/user/info/_search?q=name:张三&pretty'

Test the default analyzer's tokenization:

[root@node1 ~]# curl -XPOST 'http://192.168.58.147:9200/_analyze?pretty' -d '
{
  "tokenizer": "standard",
  "text": "我是饺子"
}'
{
  "tokens" : [
    { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 },
    { "token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1 },
    { "token" : "饺", "start_offset" : 2, "end_offset" : 3, "type" : "<IDEOGRAPHIC>", "position" : 2 },
    { "token" : "子", "start_offset" : 3, "end_offset" : 4, "type" : "<IDEOGRAPHIC>", "position" : 3 }
  ]
}
Since tokenization is character by character, q=name:张 should have found something. Odd. Then it occurred to me that Chinese in URL parameters may be garbled over HTTP, so try the request-body style instead of the URL:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query" : { "term" : { "name" : "三" } }}'

Searching this way, 张 or 三 each returns results, but 张三 returns nothing. That is indeed how the standard analyzer behaves.
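What the standard analyzer does to CJK text can be mimicked in a few lines of Python (a deliberate simplification: the real analyzer also handles punctuation, lowercasing, and more). It makes clear why a term query for 张 or 三 hits while 张三 does not: the index only ever contains single-character tokens:

```python
def standard_tokenize_cjk(text):
    """Rough imitation of the standard analyzer on pure CJK input:
    every ideographic character becomes its own token."""
    return [ch for ch in text if '\u4e00' <= ch <= '\u9fff']

# what gets indexed for doc 7's name/desc-style text
indexed_tokens = standard_tokenize_cjk("张三来自长沙市 是公务员一名")

def term_matches(term, tokens):
    """A term query matches only if the exact term exists as a token."""
    return term in tokens

print(term_matches("张", indexed_tokens))    # single character: found
print(term_matches("张三", indexed_tokens))  # two-character word: never indexed
```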
In reality 张三 is one word. The IK analyzer is recommended here: the words it splits out carry actual meaning (e.g. 我是中国人 is split into meaningful terms such as 我, 是, 中国人, 中国).
Install the IK analyzer (plugin home: https://github.com/medcl/elasticsearch-analysis-ik).
My ES is 5.6.4, so download the matching IK 5.6.4:

cd /home/es/elasticsearch-5.6.4 && ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.4/elasticsearch-analysis-ik-5.6.4.zip

After installation, check the plugins directory:
[es@node1 elasticsearch-5.6.4]$ cd plugins/
[es@node1 plugins]$ ll
total 4
drwxr-xr-x 2 es es 4096 Dec 5 18:49 analysis-ik

Restart elasticsearch:

./elasticsearch -Ecluster.name=my_cluster_name -Enode.name=my_node_name -Enetwork.host=192.168.58.147

IK ships with two analyzers:
ik_max_word: splits the text at the finest granularity, producing as many words as possible.
ik_smart: splits at the coarsest granularity; words already split out are not claimed again by other words.
Test them:
[es@node1 plugins]$ curl -XGET 'http://192.168.58.147:9200/_analyze?pretty&analyzer=ik_max_word' -d '我是中国人'
{
  "tokens" : [
    { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 },
    { "token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 },
    { "token" : "中国人", "start_offset" : 2, "end_offset" : 5, "type" : "CN_WORD", "position" : 2 },
    { "token" : "中国", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 3 },
    { "token" : "国人", "start_offset" : 3, "end_offset" : 5, "type" : "CN_WORD", "position" : 4 }
  ]
}
[es@node1 plugins]$ curl -XGET 'http://192.168.58.147:9200/_analyze?pretty&analyzer=ik_smart' -d '我是中国人'
{
  "tokens" : [
    { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 },
    { "token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 },
    { "token" : "中国人", "start_offset" : 2, "end_offset" : 5, "type" : "CN_WORD", "position" : 2 }
  ]
}

Delete the user index and recreate it:

curl -XDELETE 'http://192.168.58.147:9200/user?pretty'
curl -XPUT 'http://192.168.58.147:9200/user?pretty' -d '
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "ik" : { "tokenizer" : "ik_max_word" }
      }
    }
  },
  "mappings" : {
    "info" : {
      "dynamic" : true,
      "properties" : {
        "name" : { "type" : "string", "analyzer" : "ik_max_word" },
        "desc" : { "type" : "string", "analyzer" : "ik_max_word" }
      }
    }
  }
}'

This creation sets an ik analyzer on user; mappings define the type, its fields' types, and their analyzers.
The name and desc fields of the info type use the analyzer.
Go back to /root and import the earlier data file again:
cd /root && curl -XPOST '192.168.58.147:9200/user/info/_bulk?pretty' --data-binary @my.json
Test the analyzer's effect: now 张 finds nothing; only 张三 matches.

[root@node1 ~]# curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query" : { "term" : { "name" : "张" } }}'
{
  "took" : 22,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
2) Request-body search
>> term query
Chinese searches only work as request-body searches, otherwise nothing is found, so request bodies are what is generally used; URI search is for quick tests. The request body uses the query DSL, for example:
Query everything:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query" : { }}'

Query for documents whose name field contains the term 三:

"query" : { "term" : { "name" : "三" } }

At the same level as query, the following parameters are supported:
Name             Description
timeout          Defaults to no timeout.
from             Defaults to 0.
size             Defaults to 10.
search_type      dfs_query_then_fetch, dfs_query_and_fetch, query_then_fetch, query_and_fetch, count, or scan; defaults to query_then_fetch.
query_cache      Whether to cache query results when ?search_type=count.
terminate_after  The maximum number of documents to collect per shard; once reached, query execution terminates early. If set, the response carries a boolean terminated_early field. Defaults to no terminate_after.

from and size set paging; sort sets the ordering (note: do not use tabs in place of spaces in the request body, or it errors):
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '
{
  "from": 0,
  "size": 5,
  "sort": [ { "_score": { "order": "desc" } }, { "age": { "order": "desc" } } ],
  "query" : { "term" : { "desc" : "来自" } }
}'

This sorts descending by each document's score, then breaks ties by age descending. When I imported the data, age was written as a quoted string, which normally throws:

"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on

meaning strings by default cannot be sorted or aggregated; fielddata must be set to true. Edit the mapping created earlier and give age fielddata=true:

"properties" : {
  "name" : { "type" : "string", "analyzer" : "ik_max_word" },
  "desc" : { "type" : "string", "analyzer" : "ik_max_word" },
  "age"  : { "type" : "string", "fielddata" : true }
}

or change its type to integer:

"age" : { "type" : "integer" }

_source selects which fields to return:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query" : { "term" : { "desc" : "来自" } }, "_source":["name", "desc"]}';
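The pieces above (paging, sort, a term query, and _source filtering) combine into a single request body. A sketch in Python that assembles and prints that JSON (the term 来自 presumes the ik-analyzed index from earlier; no client library is used):

```python
import json

body = {
    "from": 0,
    "size": 5,
    # primary sort on score, ties broken by age (needs fielddata=true or an integer age)
    "sort": [{"_score": {"order": "desc"}}, {"age": {"order": "desc"}}],
    "query": {"term": {"desc": "来自"}},
    "_source": ["name", "desc"],  # only these fields come back per hit
}
print(json.dumps(body, ensure_ascii=False, indent=2))
```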
>> terms query
terms works like term but allows searching several terms at once:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query": { "terms" : { "desc" : ["来自","com"] } }}';
>> match query
Comparing term with match:

curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query" : { "term" : { "desc" : "李 来自" } }}'
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query" : { "match" : { "desc" : "李 来自" } }}'

term treats 李 来自 as one single term to match, so it finds nothing.
match analyzes the input into the terms 李 and 来自 and merges the documents matching either.
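The difference shows up directly in the request bodies. A sketch (plain illustrative dicts, not a client API): term passes the input through unanalyzed as one token, while match analyzes it and, with the default OR operator, unions the per-term results:

```python
# term: exact token lookup -- "李 来自" would have to exist verbatim in the index
term_query = {"query": {"term": {"desc": "李 来自"}}}

# match: the input is analyzed into ["李", "来自"]; OR semantics by default,
# so documents containing either term are returned
match_query = {"query": {"match": {"desc": "李 来自"}}}

# match with operator "and" instead requires both terms in the same document
match_and = {"query": {"match": {"desc": {"query": "李 来自", "operator": "and"}}}}
print(term_query, match_query, match_and)
```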
There is also match_phrase, which like term treats 李 来自 as a single unit:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query": { "match_phrase": { "desc" : "李 来自" } }}'

>> bool query
bool combines query clauses. must requires every condition to hold: both 李 and 来自 must appear:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query": { "bool": { "must": [ { "match": { "desc": "李" } }, { "match": { "desc": "来自" } } ] } }}'

should: matching any one of the terms is enough:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query": { "bool": { "should": [ { "match": { "desc": "李" } }, { "match": { "desc": "来自" } } ] } }}'

must_not: the listed terms must be absent:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query": { "bool": { "must_not": [ { "match": { "desc": "李" } }, { "match": { "desc": "来自" } } ] } }}'

Clauses can also be combined: the term 来自 must appear and the term 李 must not:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query": { "bool": { "must_not": [ { "match": { "desc": "李" } } ], "must":[{ "match": { "desc": "来自" } }] } }}';
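bool clauses can also be assembled programmatically. A small helper (a sketch, not the official client API) that mirrors the must/should/must_not combinations above:

```python
def bool_query(must=None, should=None, must_not=None):
    """Build a bool query body, omitting empty clause lists."""
    clauses = {}
    if must:
        clauses["must"] = must
    if should:
        clauses["should"] = should
    if must_not:
        clauses["must_not"] = must_not
    return {"query": {"bool": clauses}}

# must contain 来自 but not 李, as in the last curl above
q = bool_query(must=[{"match": {"desc": "来自"}}],
               must_not=[{"match": {"desc": "李"}}])
print(q)
```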
>> regexp query
Regular-expression matching is supported, for example:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query" : { "regexp" : { "desc" : "李.*" } }}'

>> prefix query
Matches terms beginning with the given text; it must be a single term:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query" : { "prefix" : { "desc" : "李" } }}';
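Both regexp and prefix operate on individual indexed tokens, not on the whole original string. A toy illustration over a token list an ik-style analyzer might produce (the token list here is hypothetical, for demonstration only):

```python
import re

tokens = ["李四", "来自", "岳阳市", "清洁工"]  # hypothetical ik tokens for doc 8

def regexp_hits(pattern, toks):
    """regexp query: the pattern must match an entire token."""
    return [t for t in toks if re.fullmatch(pattern, t)]

def prefix_hits(prefix, toks):
    """prefix query: the token must start with the given text."""
    return [t for t in toks if t.startswith(prefix)]

print(regexp_hits("李.*", tokens))  # tokens beginning with 李
print(prefix_hits("李", tokens))
```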
>> multi_match query
Matches the same term against multiple fields:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query": { "multi_match" : { "query" : "李" , "fields":["name","desc"] } }}'

>> range filter
Range operators include:
- gt :: greater than
- gte:: greater than or equal to
- lt :: less than
- lte:: less than or equal to
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query": { "range" : { "age" : {"gt":20} } }}'

Suppose we need age >= 20 and age <= 30.
Using range:

curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query": { "range" : { "age" : {"gte":20,"lte":30} } }}'

or using bool:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query":{ "bool":{ "must": [ { "range" : { "age" : {"gte":20} } }, { "range" : { "age" : {"lte":30} } } ] } }}'

(Both range conditions go into a single must array; a JSON object cannot repeat the "must" key.)

>> exists and missing filters
exists and missing filters find documents that do or do not contain a given field.
For example, query all docs that have an age field:

curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query":{ "exists":{"field":"age"}}}'

Query all docs without an age field:
missing is deprecated; use bool plus exists instead:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query":{ "bool":{ "must_not": { "exists" : { "field":"age" } } } }}';
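The deprecated missing filter is just the negation of exists, so one body can be derived from the other. A sketch (plain dicts, not a client API):

```python
def exists_query(field):
    """Documents where `field` is present."""
    return {"query": {"exists": {"field": field}}}

def missing_query(field):
    """Replacement for the removed `missing` query:
    wrap exists in bool/must_not."""
    return {"query": {"bool": {"must_not": [{"exists": {"field": field}}]}}}

print(exists_query("age"))
print(missing_query("age"))
```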
3) Filtering
Every search response includes a _score field per hit, a relative measure of how well the document matches the query: the higher the score, the more relevant the document; the lower, the less relevant.
Every query in Elasticsearch triggers this relevance-score computation. For scenarios that do not need scores, Elasticsearch offers filters. Filters are conceptually similar to queries but execute much faster, for two main reasons:
- Filters compute no relevance score, so they are computationally cheaper.
- Filters can be cached in memory, which makes repeated searches much faster than the equivalent queries.
For example, to find docs that have an age field you can use a query, or a filter placed inside bool:
curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query":{ "bool":{ "filter": { "exists" : { "field":"age" } } } }}'

Every _score in the result is 0, for example:

{ "_index" : "user", "_type" : "info", "_id" : "7", "_score" : 0.0, "_source" : { "id" : "7", "country" : "中国", "provice" : "湖南省", "city" : "长沙市", "age" : "18", "name" : "张三", "desc" : "张三来自长沙市 是公务员一名" } }

4) Aggregations
Numeric types (integer, float, long, etc.) aggregate directly by default; to aggregate a string field you must set fielddata=true. Suppose we aggregate by country (a GROUP BY).
First edit the mapping to add the fields (delete the index, recreate it, and re-import the data):

"country" : { "type" : "string", "analyzer" : "ik_max_word", "fielddata": true },
"provice" : { "type" : "string", "analyzer" : "ik_max_word", "fielddata": true },
"city"    : { "type" : "string", "analyzer" : "ik_max_word", "fielddata": true }

Run a simple aggregation (group by a field, counting the elements in each group). Mind the analyzer, or grouping happens token by token:

curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "size": 0, "aggs": { "group_by_country": { "terms": { "field": "country" } } }}'

The aggregation result:

"aggregations" : {
  "group_by_country" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      { "key" : "美国", "doc_count" : 6 },
      { "key" : "中国", "doc_count" : 5 }
    ]
  }
}

Similar to the SQL statement:

SELECT country AS key, COUNT(*) AS doc_count FROM user_info GROUP BY country ORDER BY COUNT(*) DESC

group_by_country above is just a name given to the grouping. Several groupings can run at once; the result then carries two independent grouping results:

curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "size": 0, "aggs": { "group_by_country": { "terms": { "field": "country" } }, "group_by_provice": { "terms": { "field": "provice" } } }}'

size: 0 above means only the grouped result is returned, similar to facets in Solr; with a nonzero size, the response also carries the concrete documents of each group.
ES aggregations can be nested, e.g. group by country and then, inside each country bucket, group by city:

curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "size": 0, "aggs": { "group_by_country": { "terms": { "field": "country" }, "aggs":{ "group_by_age":{ "terms":{ "field":"city" } } } } }}'

Aggregation functions can also be applied inside a nested aggregation to get averages, maxima, or minima.
Group and compute each group's average age; max, min, sum and other aggregation functions are supported likewise:

curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "size": 0, "aggs": { "group_by_country": { "terms": { "field": "country" }, "aggs":{ "avg_age":{ "avg":{ "field":"age" } } } } }}'

Grouping by value ranges is supported too, e.g. group by country and then by age bracket inside each country:

curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "size": 0, "aggs": { "group_by_country": { "terms": { "field": "country" }, "aggs":{ "group_by_age_range":{ "range": {"field": "age","ranges": [ {"from": 10,"to": 20 }, {"from": 20,"to": 40 }, {"from": 40,"to": 100 }] } } } } }}'

II. Searching with the Java API
Searching via the Java API:
package es;

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.StringTerms;
import org.elasticsearch.search.aggregations.bucket.terms.StringTerms.Bucket;
import org.elasticsearch.search.aggregations.metrics.avg.Avg;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

/**
 * Uses the earlier /user/info data as the example.
 * @author jiaozi
 */
public class Search {

    /**
     * Query all docs; equivalent to:
     * curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d '{ "query":{} }'
     * The response nests hits inside hits; the code below reads the same
     * structure, with each document body in _source.
     */
    public static void search() {
        SearchResponse searchResponse = client.prepareSearch("user")
                .setTypes("info")
                .addSort("age", SortOrder.ASC)
                .setFrom(0).setSize(5)
                .get();
        SearchHit[] hits = searchResponse.getHits().getHits();
        for (int i = 0; i < hits.length; i++) {
            System.out.println(hits[i].getSource());
        }
    }

    /**
     * Equivalent to:
     * curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d
     *   '{ "query" : { "term" : { "desc" : "来自" } }, "_source":["name", "desc"]}';
     */
    public static void searchTerms() {
        SearchResponse searchResponse = client.prepareSearch("user")
                .setTypes("info")
                .addSort("age", SortOrder.ASC)
                // QueryBuilders can also build match, match_phrase, regexp, prefix, etc.
                .setQuery(QueryBuilders.termQuery("desc", "来自"))           // query context: scored
                //.setQuery(QueryBuilders.rangeQuery("age").lte(30))         // range query
                //.setQuery(QueryBuilders.regexpQuery("desc", "来自"))        // regexp
                //.setQuery(QueryBuilders.prefixQuery("desc", "张三"))        // prefix
                //.setQuery(QueryBuilders.matchQuery("desc", "张三 来自"))     // match
                //.setQuery(QueryBuilders.existsQuery("desc"))               // exists: field present?
                //.setPostFilter(QueryBuilders.termQuery("desc", "来自"))     // filter: no scoring
                .setFrom(0).setSize(5)
                .get();
        SearchHit[] hits = searchResponse.getHits().getHits();
        for (int i = 0; i < hits.length; i++) {
            System.out.println(hits[i].getSource());
        }
    }

    /**
     * Equivalent to:
     * curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d
     *   '{ "query":{ "bool":{ "must": [ { "range" : { "age" : {"gte":20} } },
     *                                   { "range" : { "age" : {"lte":30} } } ] } }}';
     */
    public static void searchBool() {
        SearchResponse searchResponse = client.prepareSearch("user")
                .setTypes("info")
                .setQuery(QueryBuilders.boolQuery()
                        .must(QueryBuilders.rangeQuery("age").lte(40))
                        .must(QueryBuilders.rangeQuery("age").gte(20))
                        //.mustNot(queryBuilder)
                        //.should(queryBuilder)
                )
                .get();
        SearchHit[] hits = searchResponse.getHits().getHits();
        for (int i = 0; i < hits.length; i++) {
            System.out.println(hits[i].getSource());
        }
    }

    /**
     * Aggregation; equivalent to:
     * curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d
     *   '{ "size": 0, "aggs": { "group_by_country": { "terms": { "field": "country" } } }}';
     */
    public static void aggs() {
        SearchResponse searchResponse = client.prepareSearch("user")
                .addAggregation(AggregationBuilders.terms("group_by_country").field("country"))
                .setSize(4)
                .get();
        StringTerms terms = searchResponse.getAggregations().get("group_by_country");
        List<Bucket> buckets = terms.getBuckets();
        for (int i = 0; i < buckets.size(); i++) {
            Bucket bucket = buckets.get(i);
            System.out.println(bucket.getKey() + "----" + bucket.getDocCount());
        }
        SearchHit[] hits = searchResponse.getHits().getHits();
        for (int i = 0; i < hits.length; i++) {
            System.out.println(hits[i].getSource());
        }
    }

    /**
     * Nested aggregation; equivalent to:
     * curl -XPOST '192.168.58.147:9200/user/_search?pretty' -d
     *   '{ "size": 0, "aggs": { "group_by_country": { "terms": { "field": "country" },
     *      "aggs":{ "avg_age":{ "avg":{ "field":"age" } } } } }}';
     */
    public static void aggsAvgs() {
        SearchResponse searchResponse = client.prepareSearch("user")
                .addAggregation(AggregationBuilders.terms("group_by_country").field("country")
                        .subAggregation(AggregationBuilders.avg("avg_age").field("age")))
                .get();
        StringTerms terms = searchResponse.getAggregations().get("group_by_country");
        List<Bucket> buckets = terms.getBuckets();
        for (int i = 0; i < buckets.size(); i++) {
            Bucket bucket = buckets.get(i);
            Avg st = bucket.getAggregations().get("avg_age");
            System.out.println(bucket.getKey() + "----" + bucket.getDocCount());
            System.out.println(st.getValue());
        }
    }

    public static void main(String[] args) {
        aggsAvgs();
    }

    static TransportClient client;
    static {
        try {
            client = getClient();
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
    }

    /**
     * Build the client object.
     */
    public static TransportClient getClient() throws UnknownHostException {
        Settings settings = Settings.builder()
                .put("cluster.name", "my_cluster_name")
                //.put("index.analysis.analyzer.default.type", "ik_max_word")
                .build();
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("192.168.58.147"), 9300));
        return client;
    }
}