elasticsearch: Search, Basic Usage with Java
Source: Internet  Editor: 程序博客网  Time: 2024/06/16 00:00
Please credit the source when reposting.
I. Search example
a) Preparing test data
curl -XPUT localhost:9200/my_index/my_type/_bulk -d '{ "index": { "_id": 1 }}
{ "title": "The quick brown fox", "age": "18" }
{ "index": { "_id": 2 }}
{ "title": "The quick brown fox jumps over the lazy dog", "age": "20" }
{ "index": { "_id": 3 }}
{ "title": "The quick brown fox jumps over the quick dog", "age": "19" }
{ "index": { "_id": 4 }}
{ "title": "Brown fox brown dog", "age": "18" }
'
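b) Query parameters
Sample request: fetch all documents of type my_type in the index my_index.
from, size: pagination; start at offset 0 and return 10 documents
sort: sort criteria
aggs: aggregation criteria; equivalent to aggregations
bool: combines multiple query clauses; covered later in this article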
curl -XPOST localhost:9200/my_index/my_type/_search?pretty=true -d '{"query": {"bool": {"must": [{"match_all": { }}],"must_not": [ ],"should": [ ]}},"from": 0,"size": 10,"sort": [ ],"aggs": { }}'
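Response:
took: time spent processing the request, in milliseconds
timed_out: whether the request timed out. Tip: if the query times out, the results gathered so far are returned instead of the query being aborted
_shards: the shards involved in this request; here 5 shards in total, 5 successful, 0 failed
hits: the query results
hits.total: total number of documents that match the query
hits.max_score: the highest relevance score. This request has no query conditions, so no relevance is computed and every document scores 1 (_score=1)
hits.hits: the documents returned by this request, that is, the matches from position from up to min(from+size, hits.total)
hits.hits._score: the relevance score of this document; it is 1 here for the same reason
hits.hits._source: the original source of each document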
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 4,
    "max_score" : 1.0,
    "hits" : [
      { "_index" : "my_index", "_type" : "my_type", "_id" : "2", "_score" : 1.0,
        "_source" : { "title" : "The quick brown fox jumps over the lazy dog", "age" : "20" } },
      { "_index" : "my_index", "_type" : "my_type", "_id" : "4", "_score" : 1.0,
        "_source" : { "title" : "Brown fox brown dog", "age" : "18" } },
      { "_index" : "my_index", "_type" : "my_type", "_id" : "1", "_score" : 1.0,
        "_source" : { "title" : "The quick brown fox", "age" : "18" } },
      { "_index" : "my_index", "_type" : "my_type", "_id" : "3", "_score" : 1.0,
        "_source" : { "title" : "The quick brown fox jumps over the quick dog", "age" : "19" } }
    ]
  }
}
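The from/size window that determines hits.hits can be illustrated in plain Java. This is only a sketch of the arithmetic on an in-memory list; the class PageWindow and its page method are hypothetical names and are not part of the Elasticsearch client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not an Elasticsearch API: shows which slice of the
// matching documents a from/size pair selects.
final class PageWindow {

    // Returns the elements at positions [from, min(from + size, total)).
    static <T> List<T> page(List<T> matches, int from, int size) {
        int total = matches.size();
        int start = Math.min(Math.max(from, 0), total);
        int end = (int) Math.min((long) start + Math.max(size, 0), (long) total);
        return new ArrayList<>(matches.subList(start, end));
    }

    public static void main(String[] args) {
        // Document ids in the order of the sample response.
        List<String> ids = Arrays.asList("2", "4", "1", "3");
        System.out.println(page(ids, 0, 10)); // whole list, because total < from + size
        System.out.println(page(ids, 2, 10)); // only the last two ids
    }
}
```

With four matches, from=0 and size=10 return all four documents, which is why hits.hits in the sample response holds every document.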
c) Java query code
Client client = ConnectionUtil.getLocalClient();
SearchRequestBuilder requestBuilder = client.prepareSearch("my_index").setTypes("my_type")
        .setFrom(0).setSize(10);
Log.debug(requestBuilder);
SearchResponse response = requestBuilder.get();
Log.debug(response);
II. Search and filter keywords
term, terms, range, exists, missing
match, match_all, multi_match
highlighting, scroll, sorting
a) term
Used mainly for exact matches on values such as numbers, dates, booleans, or unanalyzed (not_analyzed) strings.
{ "term": { "age": 26 }}
{ "term": { "date": "2014-09-01" }}
{ "term": { "public": true }}
{ "term": { "tag": "full_text" }}
Java code:
QueryBuilder ageBuilder = QueryBuilders.termQuery("age", "10");
b) terms
Similar to term, but accepts several values to match. When several values are given they are combined with or: a document matches if the field contains any of them. The following finds documents whose title contains dog or jumps:
{ "terms": { "title": [ "dog", "jumps" ] }}
Equivalent to:
"bool" : {
  "should" : [
    { "term" : { "title" : "dog" } },
    { "term" : { "title" : "jumps" } }
  ]
}
Java code:
QueryBuilder builder = QueryBuilders.termsQuery("title", "dog", "jumps");
// equivalent to the termsQuery above
builder = QueryBuilders.boolQuery()
        .should(QueryBuilders.termQuery("title", "dog"))
        .should(QueryBuilders.termQuery("title", "jumps"));
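c) range
Finds documents whose field value falls within a given range. It works on numbers, strings, dates, and so on.
Numbers:
{ "range": { "age": { "gte": 20, "lt": 30 } }}
Dates:
"range" : { "timestamp" : { "gt" : "2014-01-01 00:00:00", "lt" : "2014-01-07 00:00:00" }}
When used on a date field, range supports date math. For example, to find all documents from the last hour:
"range" : { "timestamp" : { "gt" : "now-1h" }}
Date math also works on concrete dates, not only on a placeholder such as now. Append a double pipe || to the date, followed by the date math expression:
"range" : { "timestamp" : { "gt" : "2014-01-01 00:00:00", "lt" : "2014-01-01 00:00:00||+1M" <1> }}
<1> Earlier than 2014-01-01 plus one month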
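To make the date math expressions concrete, here is a minimal Java sketch that resolves the two forms used in this section (now-1h and a date followed by ||+1M) with java.time. The class DateMathSketch is hypothetical; it handles a single +/- step with the units h, d, M, and y only, and it does not reproduce Elasticsearch's own parser (no rounding, no chained operations).

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of "now-1h" and "<date>||+1M"; not Elasticsearch code.
final class DateMathSketch {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Resolves "now", "now<op>", "<date>" or "<date>||<op>", where <op> is
    // a sign, an integer amount and one unit letter, for example "+1M".
    static LocalDateTime resolve(String expression, LocalDateTime now) {
        LocalDateTime base;
        String math;
        if (expression.startsWith("now")) {
            base = now;
            math = expression.substring("now".length());
        } else {
            int split = expression.indexOf("||");
            String anchor = split < 0 ? expression : expression.substring(0, split);
            base = LocalDateTime.parse(anchor, FORMAT);
            math = split < 0 ? "" : expression.substring(split + 2);
        }
        if (math.isEmpty()) {
            return base;
        }
        char sign = math.charAt(0);
        if ((sign != '+' && sign != '-') || math.length() < 3) {
            throw new IllegalArgumentException("unsupported date math: " + math);
        }
        long amount = Long.parseLong(math.substring(1, math.length() - 1));
        if (sign == '-') {
            amount = -amount;
        }
        char unit = math.charAt(math.length() - 1);
        switch (unit) {
            case 'h': return base.plusHours(amount);
            case 'd': return base.plusDays(amount);
            case 'M': return base.plusMonths(amount);
            case 'y': return base.plusYears(amount);
            default: throw new IllegalArgumentException("unsupported unit: " + unit);
        }
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        System.out.println(resolve("now-1h", now));
        System.out.println(resolve("2014-01-01 00:00:00||+1M", now));
    }
}
```

Passing now as a parameter keeps the sketch deterministic, so the upper bound of the example range resolves to 2014-02-01 00:00:00 regardless of when it runs.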
Range operators:
gt :: greater than
gte:: greater than or equal to
lt :: less than
lte:: less than or equal to
Java code:
QueryBuilders.rangeQuery("age").gte(18).lt(20);
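The four operators can be read as plain comparisons. The sketch below applies gte 18 and lt 20 to the ages of the sample documents; RangeSketch and its methods are hypothetical names, and the ages are treated as integers here even though the sample data stores them as strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of gt/gte/lt/lte; not an Elasticsearch API.
final class RangeSketch {

    // A null bound means that operator was not specified.
    static boolean inRange(int value, Integer gt, Integer gte, Integer lt, Integer lte) {
        if (gt != null && !(value > gt)) return false;
        if (gte != null && !(value >= gte)) return false;
        if (lt != null && !(value < lt)) return false;
        if (lte != null && !(value <= lte)) return false;
        return true;
    }

    // Ages of the four sample documents, keyed by _id (assumed to be integers).
    static Map<String, Integer> sampleAges() {
        Map<String, Integer> ages = new LinkedHashMap<>();
        ages.put("1", 18);
        ages.put("2", 20);
        ages.put("3", 19);
        ages.put("4", 18);
        return ages;
    }

    // Ids whose age satisfies gte <= age < lt.
    static List<String> idsWithAgeBetween(Map<String, Integer> ageById, int gte, int lt) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : ageById.entrySet()) {
            if (inRange(entry.getValue(), null, gte, lt, null)) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(idsWithAgeBetween(sampleAges(), 18, 20));
    }
}
```

Document 2 (age 20) is excluded because lt is a strict comparison; using lte 20 instead would include it.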
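When filtering strings, string ranges are evaluated in lexicographic (dictionary) order. For example, these values are sorted lexicographically: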
5, 50, 6, B, C, a, ab, abb, abc, b
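The ordering of these values can be reproduced with Java's natural String ordering, which for ASCII values such as these agrees with the order listed: digits sort before uppercase letters, which sort before lowercase letters. The class LexicographicOrder below is a hypothetical illustration and does not call Elasticsearch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of lexicographic string ranges.
final class LexicographicOrder {

    // Sorts a copy of the values with String.compareTo.
    static List<String> sorted(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    // True when lower <= value < upper in String.compareTo order.
    static boolean inStringRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) < 0;
    }

    public static void main(String[] args) {
        List<String> shuffled = Arrays.asList("b", "abc", "6", "a", "50", "C", "abb", "5", "B", "ab");
        System.out.println(sorted(shuffled));
        // "50" falls between "5" and "6" as a string, unlike the number 50.
        System.out.println(inStringRange("50", "5", "6"));
    }
}
```

This is why a string range from "5" to "6" would include "50": the comparison proceeds character by character rather than by numeric value.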
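Tip: numeric and date fields are indexed in a way that makes range calculations very efficient. String fields are not. To run a range over a string field, Elasticsearch performs a term operation for every term that falls within the range, which is much slower than a date or numeric range.
String ranges are fine on a low-cardinality field, that is, a field with a small number of unique terms. The more unique terms there are, the slower the search.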
d) exists, missing
The exists and missing filters find documents that do or do not contain a given field, much like the IS NOT NULL and IS NULL conditions in SQL.
Elasticsearch now discourages the missing filter; use bool with must_not + exists instead.
{ "exists": { "field": "title" }}
{ "missing": { "field": "title" }}
"bool" : { "must_not" : { "exists" : { "field" : "title" } } }
Java code:
// exists
QueryBuilder builder = QueryBuilders.existsQuery("title");
// missing
builder = QueryBuilders.missingQuery("title");
// replacement for missing
builder = QueryBuilders.boolQuery().mustNot(QueryBuilders.existsQuery("title"));
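The IS NOT NULL / IS NULL analogy can be sketched on in-memory documents. ExistsSketch is a hypothetical class that models a document as a Map; a field counts as existing only when it is present with a non-null value, and missing is simply the negation, mirroring the must_not + exists replacement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of exists/missing on a document held as a Map.
final class ExistsSketch {

    // A field exists when it is present and its value is not null.
    static boolean exists(Map<String, Object> doc, String field) {
        return doc.get(field) != null;
    }

    // missing is the negation of exists.
    static boolean missing(Map<String, Object> doc, String field) {
        return !exists(doc, field);
    }

    // Builds a document with a title that may be null.
    static Map<String, Object> doc(String title) {
        Map<String, Object> d = new HashMap<>();
        d.put("title", title);
        return d;
    }

    public static void main(String[] args) {
        System.out.println(exists(doc("Brown fox brown dog"), "title"));
        System.out.println(missing(doc(null), "title"));
    }
}
```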
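e) match, match_all, multi_match
match_all matches every document; it takes no query conditions.
{ "match_all": {}}
It is often used in combination with filters or other queries.
match is the standard query; it works for both full-text and exact-value searches.
When you run match against a full-text field, it passes the query string through the analyzer before searching. Both the query text and the indexed field are analyzed (with whichever analyzer is configured), except for not_analyzed fields.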
{ "match": { "tweet": "About Search" }}
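If you give match an exact value on a field holding a number, a date, a boolean, or a not_analyzed string, it searches for exactly the value you gave:
{ "match": { "age": 26 }}
{ "match": { "date": "2014-09-01" }}
{ "match": { "public": true }}
{ "match": { "tag": "full_text" }}
The match parameters type, operator, and minimum_should_match
type values:
boolean: analyze the text, then run a boolean query on the resulting terms
phrase: match several words as an exact phrase. For title: "brown dog", the title must contain brown and dog, and the two must be adjacent
phrase_prefix: like phrase, but the last search term is matched as a prefix
From the official documentation: The match_phrase_prefix is the same as match_phrase, except that it allows for prefix matches on the last term in the text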
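operator values:
and: for "brown dog", the field must contain both brown and dog
or: for "brown dog", the field must contain brown or dog
minimum_should_match: an integer or a percentage that controls precision. 4 means at least 4 of the query terms must match; 50% means at least half of them must match. It applies to the optional clauses produced by operator or; with operator and, every term is already required, so it has no further effect.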
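The interplay of operator and minimum_should_match described above can be sketched on sets of terms. MinimumShouldMatchSketch is a hypothetical class: it handles only positive integers and positive percentages, rounds percentages down, and clamps integer values to the number of query terms, which is a simplifying assumption of this sketch rather than a statement about the Elasticsearch implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of operator and minimum_should_match.
final class MinimumShouldMatchSketch {

    static Set<String> terms(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    // Number of query terms that must match; percentages are rounded down.
    static int required(String spec, int termCount) {
        String s = spec.trim();
        if (s.endsWith("%")) {
            int percent = Integer.parseInt(s.substring(0, s.length() - 1).trim());
            return termCount * percent / 100; // integer division rounds down
        }
        return Math.min(Integer.parseInt(s), termCount); // clamp: sketch assumption
    }

    // fieldTerms: analyzed terms of the field; queryTerms: analyzed query text.
    static boolean matches(Set<String> fieldTerms, List<String> queryTerms,
                           String operator, String minimumShouldMatch) {
        int hits = 0;
        for (String term : queryTerms) {
            if (fieldTerms.contains(term)) {
                hits++;
            }
        }
        if ("and".equalsIgnoreCase(operator)) {
            return hits == queryTerms.size(); // every term is required
        }
        // operator or: at least one term, or more if minimum_should_match says so
        int need = minimumShouldMatch == null
                ? 1
                : Math.max(1, required(minimumShouldMatch, queryTerms.size()));
        return hits >= need;
    }

    public static void main(String[] args) {
        Set<String> title = terms("the", "quick", "brown", "fox");
        List<String> query = Arrays.asList("brown", "dog");
        System.out.println(matches(title, query, "or", "50%"));
        System.out.println(matches(title, query, "and", null));
    }
}
```

For the query BROWN DOG (two terms after analysis), 50% requires one matching term, so "The quick brown fox" matches with operator or but not with operator and.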
"match" : { "title" : { "query" : "BROWN DOG", "type" : "boolean", "operator" : "OR", "minimum_should_match" : "50%" } }
The multi_match query runs a match query across several fields at once:
{ "multi_match": { "query": "full text search", "fields": [ "title", "body" ] }}
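Tip:
1. The difference between match and term when querying strings
term looks for an exact match of the given value, whereas match runs the text through an analyzer first; the analyzer's tokenizer splits the search text into individual terms (tokens).
For example, search for titles containing Jumps. With the sample data, term finds nothing, but match converts the text to jumps and finds results. This depends on the analyzer in use; the author used the built-in standard analyzer.
http://localhost:9200/my_index/_analyze?pretty=true&field=title&text=Jumps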
{ "tokens" : [ { "token" : "jumps", "start_offset" : 0, "end_offset" : 5, "type" : "<ALPHANUM>", "position" : 0 } ]}
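The difference can be reproduced without a cluster. The sketch below uses a deliberately simplified stand-in for the standard analyzer (lowercase, then split on anything that is not a letter or digit); TermVsMatchSketch and its methods are hypothetical names, and the real analyzer does more than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of term vs. match; not Elasticsearch code.
final class TermVsMatchSketch {

    // Simplified analyzer: lowercase and split on non-alphanumeric characters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // term: the value is compared as given against the indexed tokens.
    static boolean term(String fieldText, String value) {
        return analyze(fieldText).contains(value);
    }

    // match (operator or): the query text is analyzed first, then any token may match.
    static boolean match(String fieldText, String queryText) {
        List<String> indexed = analyze(fieldText);
        for (String token : analyze(queryText)) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String title = "The quick brown fox jumps over the lazy dog";
        System.out.println(term(title, "Jumps"));  // indexed token is "jumps", so no hit
        System.out.println(match(title, "Jumps")); // query is lowercased before lookup
    }
}
```

The indexed tokens are lowercase, so term only finds the document when given jumps, while match finds it for Jumps as well.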
Java code:
QueryBuilder builder = QueryBuilders.matchAllQuery();
builder = QueryBuilders.matchQuery("title", "Jumps");
builder = QueryBuilders.matchQuery("title", "BROWN DOG!")
        .operator(MatchQueryBuilder.Operator.OR)
        .type(MatchQueryBuilder.Type.BOOLEAN);
// multiMatchQuery takes the query text first, then the field names
builder = QueryBuilders.multiMatchQuery("full text search", "title", "body");
f) Highlighting
Not covered in this article.
g) Sorting
Similar to ORDER BY in a database.
"sort": { "date": { "order": "desc" }}
Java code:
SearchRequestBuilder requestBuilder = client.prepareSearch("my_index").setTypes("my_type")
        .setFrom(0).setSize(10)
        .addSort("age", SortOrder.DESC);
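h) scroll
scroll is similar to a cursor in a database; it is used to work through a large result set.
A search request returns only a single page of results (10 documents by default), whereas the scroll API can retrieve large numbers of results (or even all of them) from a single search request, in much the same way as a cursor on a traditional database.
Scrolling is not intended for real-time user requests, but for processing large amounts of data.
From the official documentation (https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-request-scroll.html):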
While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.
Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration.
When retrieving data through scroll, every response returns a scroll_id; to fetch the next batch, this id must be passed to the scroll API.
Client client = ConnectionUtil.getLocalClient();
SearchRequestBuilder requestBuilder = client.prepareSearch("my_index").setTypes("my_type")
        .setScroll(new TimeValue(20000)) // how long to keep the scroll alive (milliseconds)
        .setSize(2);
System.out.println(requestBuilder);
SearchResponse scrollResp = requestBuilder.get();
System.out.println("totalHits:" + scrollResp.getHits().getTotalHits());
while (true) {
    String scrollId = scrollResp.getScrollId();
    System.out.println("scrollId:" + scrollId);
    SearchHits searchHits = scrollResp.getHits();
    for (SearchHit hit : searchHits.getHits()) {
        System.out.println(hit.getId() + "~" + hit.getSourceAsString());
    }
    System.out.println("=================");
    // fetch the next batch with the scrollId
    scrollResp = client.prepareSearchScroll(scrollId)
            .setScroll(new TimeValue(20000)).execute().actionGet();
    if (scrollResp.getHits().getHits().length == 0) {
        break;
    }
}
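The control flow of the loop (fetch a batch, pass the returned id back, stop on an empty batch) can be tried without a cluster. ScrollSketch below is a hypothetical in-memory stand-in: it hands out a fresh id with every batch and remembers the offset behind it. How Elasticsearch actually generates and expires scroll ids is not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory cursor that mimics the scroll request/response cycle.
final class ScrollSketch {

    // One batch of hits plus the id needed to request the next batch.
    static final class Batch {
        final String scrollId;
        final List<String> hits;

        Batch(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    private final List<String> snapshot;
    private final int size;
    private final Map<String, Integer> offsets = new HashMap<>();
    private int nextId = 0;

    ScrollSketch(List<String> results, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.snapshot = new ArrayList<>(results);
        this.size = size;
    }

    Batch start() {
        return slice(0);
    }

    Batch next(String scrollId) {
        Integer offset = offsets.remove(scrollId);
        if (offset == null) {
            throw new IllegalArgumentException("unknown scroll id: " + scrollId);
        }
        return slice(offset);
    }

    private Batch slice(int from) {
        int to = Math.min(from + size, snapshot.size());
        String id = "scroll-" + (nextId++);
        offsets.put(id, to);
        return new Batch(id, new ArrayList<>(snapshot.subList(from, to)));
    }

    // Same loop shape as the client code: stop when a batch comes back empty.
    static List<String> drain(List<String> results, int size) {
        ScrollSketch scroll = new ScrollSketch(results, size);
        List<String> all = new ArrayList<>();
        Batch batch = scroll.start();
        while (!batch.hits.isEmpty()) {
            all.addAll(batch.hits);
            batch = scroll.next(batch.scrollId);
        }
        return all;
    }
}
```

With the four sample documents and a size of 2, drain makes three requests: two batches of two hits and a final empty batch that ends the loop.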
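III. Combining queries
bool: compound query made of must, must_not, and should
weighting (boosting) search terms
a) bool
bool came up several times in the keyword descriptions above; here it is in detail.
bool combines multiple conditions, and a bool can be nested inside another bool to build complex query conditions. It has the following operators: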
must :: every clause must match; equivalent to and.
must_not :: no clause may match; equivalent to not.
should :: at least one clause must match; equivalent to or. When must clauses are also present, should clauses become optional and only raise the score of documents that match them.
Each of these parameters accepts either a single condition or an array of conditions:
{
  "bool": {
    "must":     { "term": { "folder": "inbox" }},
    "must_not": { "match": { "tag": "spam" }},
    "should": [
      { "term": { "starred": true }},
      { "range": { "date": { "gte": "2014-01-01" }}}
    ]
  }
}
Tip: a bool must contain at least one of must, must_not, or should.
Java code:
// (price = 20 OR productId = "1234") AND (price != 30)
QueryBuilder queryBuilder = QueryBuilders.boolQuery()
        .should(QueryBuilders.termQuery("price", "20"))
        .should(QueryBuilders.termQuery("productId", "1234"))
        .mustNot(QueryBuilders.termQuery("price", "30"));
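The boolean logic of the three operators can be sketched with java.util.function.Predicate. BoolSketch is a hypothetical class, not the Elasticsearch BoolQueryBuilder; it assumes that should clauses are required only when there is no must clause, and it ignores scoring entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration of must / must_not / should as predicates.
final class BoolSketch<T> {

    private final List<Predicate<T>> must = new ArrayList<>();
    private final List<Predicate<T>> mustNot = new ArrayList<>();
    private final List<Predicate<T>> should = new ArrayList<>();

    BoolSketch<T> must(Predicate<T> clause) { must.add(clause); return this; }
    BoolSketch<T> mustNot(Predicate<T> clause) { mustNot.add(clause); return this; }
    BoolSketch<T> should(Predicate<T> clause) { should.add(clause); return this; }

    boolean matches(T doc) {
        for (Predicate<T> clause : must) {
            if (!clause.test(doc)) return false;   // all must clauses are required
        }
        for (Predicate<T> clause : mustNot) {
            if (clause.test(doc)) return false;    // any must_not hit excludes the doc
        }
        if (should.isEmpty() || !must.isEmpty()) {
            return true;                           // should is optional next to must
        }
        for (Predicate<T> clause : should) {
            if (clause.test(doc)) return true;     // otherwise one should hit is needed
        }
        return false;
    }

    static Map<String, String> doc(String price, String productId) {
        Map<String, String> d = new HashMap<>();
        d.put("price", price);
        d.put("productId", productId);
        return d;
    }

    // (price = 20 OR productId = "1234") AND (price != 30)
    static BoolSketch<Map<String, String>> example() {
        return new BoolSketch<Map<String, String>>()
                .should(d -> "20".equals(d.get("price")))
                .should(d -> "1234".equals(d.get("productId")))
                .mustNot(d -> "30".equals(d.get("price")));
    }

    public static void main(String[] args) {
        System.out.println(example().matches(doc("10", "1234")));
        System.out.println(example().matches(doc("30", "1234")));
    }
}
```

A document with price 30 and productId 1234 satisfies a should clause but is still rejected, because must_not is checked independently of should.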
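b) Weighting search terms to raise the score
Suppose we want to search for documents about "full-text search", but give more weight to documents that also mention "Elasticsearch" or "Lucene". Documents containing "Elasticsearch" or "Lucene" then get a higher relevance score than those that do not, and appear nearer the top of the results.
A simple bool query lets us express fairly complex logic like this:
"bool": {
  "must": {
    "match": {
      "content": { (1)
        "query": "full text search",
        "operator": "and"
      }
    }
  },
  "should": [ (2)
    { "match": { "content": "Elasticsearch" }},
    { "match": { "content": "Lucene" }}
  ]
}
(1) The content field must contain all three words full, text, and search.
(2) If the content field also contains "Elasticsearch" or "Lucene", the document gets a higher score.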
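In the example above, to make documents containing "Elasticsearch" score higher than those containing "Lucene", specify a boost value to control the weight. boost defaults to 1; a value greater than 1 raises the relative weight of that query clause.
"bool": {
  "must": {
    "match": { (1)
      "content": {
        "query": "full text search",
        "operator": "and"
      }
    }
  },
  "should": [
    { "match": { "content": { "query": "Elasticsearch", "boost": 3 (2) } }},
    { "match": { "content": { "query": "Lucene", "boost": 2 (3) } }}
  ]
}
(1) This clause uses the default boost of 1.
(2) This clause is the most important, because it has the highest boost.
(3) This clause is more important than the first one, but less important than the "Elasticsearch" clause.
Java code:
QueryBuilders.matchQuery("title", "Dog").boost(3);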
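The effect of boost on ranking can be illustrated with a toy scoring function. This is not Lucene's relevance formula: the hypothetical BoostSketch below assumes every matching clause contributes exactly its boost value, which is enough to show the relative ordering the boost values are meant to produce.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical toy scorer: each matching clause adds its boost. Not Lucene scoring.
final class BoostSketch {

    // Simplified analysis: lowercase and split on non-alphanumeric characters.
    static Set<String> tokens(String content) {
        Set<String> result = new HashSet<>();
        for (String part : content.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    static List<String> exampleMust() {
        return Arrays.asList("full", "text", "search");
    }

    // Boost values from the example query.
    static Map<String, Double> exampleBoosts() {
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("elasticsearch", 3.0);
        boosts.put("lucene", 2.0);
        return boosts;
    }

    // Returns 0.0 when a required term is absent, meaning the document does not match.
    static double score(String content, List<String> mustTerms, Map<String, Double> shouldBoosts) {
        Set<String> present = tokens(content);
        if (!present.containsAll(mustTerms)) {
            return 0.0;
        }
        double score = 1.0; // the must clause with its default boost of 1
        for (Map.Entry<String, Double> clause : shouldBoosts.entrySet()) {
            if (present.contains(clause.getKey())) {
                score += clause.getValue();
            }
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(score("Full text search with Elasticsearch", exampleMust(), exampleBoosts()));
        System.out.println(score("Full text search with Lucene", exampleMust(), exampleBoosts()));
    }
}
```

Under this assumption a document mentioning Elasticsearch outranks one mentioning Lucene, and both outrank a document that only satisfies the must clause.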
Parts of this article are excerpted from chapters 12 and 13 of http://es.xiaoleilu.com/
Appendix: complete Java code of the test class
package cn.com.axin.elasticsearch.qwzn.share;

import java.net.UnknownHostException;

import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.sort.SortOrder;

import cn.com.axin.elasticsearch.util.ConnectionUtil;
import cn.com.axin.elasticsearch.util.Log;

/**
 * @Title
 *
 * @author
 * @date 2016-8-11
 */
public class Search {

    public static void main(String[] args) throws Exception {
//        searchAll();
//        execQuery(termSearch());
//        execQuery(termsSearch());
//        execQuery(rangeSearch());
//        execQuery(existsSearch());
//        execQuery(matchSearch());
        execQuery(boolSearch());
//        highlightedSearch();
//        scroll();
    }

    /**
     * @return
     */
    private static QueryBuilder boolSearch() {
        // age > 30 or last_name is Smith
        QueryBuilder queryBuilder = QueryBuilders.boolQuery()
                .should(QueryBuilders.rangeQuery("age").gt("30"))
                .should(QueryBuilders.matchQuery("last_name", "Smith"));
        // raise the weight of a query clause
//        QueryBuilders.matchQuery("title", "Dog").boost(3);
//        QueryBuilders.boolQuery().must(null);
//        QueryBuilders.boolQuery().mustNot(null);
        return queryBuilder;
    }

    private static void scroll() {
        Client client = null;
        try {
            client = ConnectionUtil.getLocalClient(); // obtain the Client connection object
            SearchRequestBuilder requestBuilder = client.prepareSearch("my_index").setTypes("my_type")
//                    .setQuery(QueryBuilders.termQuery("age", "20"))
                    .setScroll(new TimeValue(20000)) // how long to keep the scroll alive (milliseconds)
                    .setSize(2);
            System.out.println(requestBuilder);
            SearchResponse scrollResp = requestBuilder.get();
            System.out.println("totalHits:" + scrollResp.getHits().getTotalHits());
            while (true) {
                String scrollId = scrollResp.getScrollId();
                System.out.println("scrollId:" + scrollId);
                SearchHits searchHits = scrollResp.getHits();
                for (SearchHit hit : searchHits.getHits()) {
                    System.out.println(hit.getId() + "~" + hit.getSourceAsString());
                }
                System.out.println("=================");
                // fetch the next batch with the scrollId
                scrollResp = client.prepareSearchScroll(scrollId)
                        .setScroll(new TimeValue(20000)).execute().actionGet();
                if (scrollResp.getHits().getHits().length == 0) {
                    break;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (null != client) {
                client.close();
            }
        }
    }

    private static void highlightedSearch() {
        QueryBuilder builder = QueryBuilders.termsQuery("age", "18");
        Client client = null;
        try {
            client = ConnectionUtil.getLocalClient();
            SearchRequestBuilder requestBuilder = client.prepareSearch("my_index").setTypes("my_type")
                    .setQuery(builder)
                    .setFrom(0).setSize(10)
                    .addHighlightedField("age");
//                    .addSort("age", SortOrder.DESC);
            Log.debug(requestBuilder);
            SearchResponse response = requestBuilder.get();
            Log.debug(response);
        } catch (UnknownHostException e) {
            e.printStackTrace();
        } finally {
            if (null != client) {
                client.close();
            }
        }
    }

    /**
     * @return
     */
    private static QueryBuilder matchSearch() {
        QueryBuilder builder = QueryBuilders.matchAllQuery();
        builder = QueryBuilders.matchQuery("title", "Jumps");
        /*
         * type:
         *   boolean: analyze the text, then run a boolean query
         *   phrase: match several words as an exact phrase
         *   phrase_prefix: The match_phrase_prefix is the same as match_phrase,
         *     except that it allows for prefix matches on the last term in the text
         */
        builder = QueryBuilders.matchQuery("title", "BROWN DOG!")
                .operator(MatchQueryBuilder.Operator.OR)
                .type(MatchQueryBuilder.Type.BOOLEAN);
        // multiMatchQuery takes the query text first, then the field names
        builder = QueryBuilders.multiMatchQuery("full text search", "title", "body");
        return builder;
    }

    /**
     * @return
     */
    private static QueryBuilder existsSearch() {
        // exists
        QueryBuilder builder = QueryBuilders.existsQuery("title");
        // missing
        builder = QueryBuilders.missingQuery("title");
        // replacement for missing
        builder = QueryBuilders.boolQuery().mustNot(QueryBuilders.existsQuery("title"));
        return builder;
    }

    private static QueryBuilder rangeSearch() {
        // age >= 18 && age < 20
        return QueryBuilders.rangeQuery("age").gte(18).lt(20);
    }

    private static QueryBuilder termSearch() {
        QueryBuilder builder = QueryBuilders.termQuery("title", "brown");
        return builder;
    }

    private static QueryBuilder termsSearch() {
        QueryBuilder builder = QueryBuilders.termsQuery("title", "dog", "jumps");
        // equivalent to the termsQuery above
        builder = QueryBuilders.boolQuery()
                .should(QueryBuilders.termQuery("title", "dog"))
                .should(QueryBuilders.termQuery("title", "jumps"));
        return builder;
    }

    private static void searchAll() {
        Client client = null;
        try {
            client = ConnectionUtil.getLocalClient();
            SearchRequestBuilder requestBuilder = client.prepareSearch("my_index").setTypes("my_type")
                    .setFrom(0).setSize(10)
                    .addSort("age", SortOrder.DESC);
            Log.debug(requestBuilder);
            SearchResponse response = requestBuilder.get();
            Log.debug(response);
        } catch (UnknownHostException e) {
            e.printStackTrace();
        } finally {
            if (null != client) {
                client.close();
            }
        }
    }

    /**
     * @param builder
     * @throws UnknownHostException
     */
    private static void execQuery(QueryBuilder builder) throws UnknownHostException {
        Client client = ConnectionUtil.getLocalClient();
        try {
            SearchRequestBuilder requestBuilder = client.prepareSearch("my_index").setTypes("my_type")
                    .setExplain(true)
                    .setQuery(builder);
            Log.debug(requestBuilder);
            SearchResponse response = requestBuilder.get();
            Log.debug(response);
        } finally {
            client.close();
        }
    }
}
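Code for obtaining the connection object
package cn.com.axin.elasticsearch.util;

import java.net.InetAddress;
import java.net.UnknownHostException;

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class ConnectionUtil {

    /**
     * Returns a connection to the local node (127.0.0.1:9300).
     * @return
     * @throws UnknownHostException
     */
    public static Client getLocalClient() throws UnknownHostException {
        return getClient("127.0.0.1", 9300, "es-stu");
    }

    /**
     * Returns a connection object.
     * @param host host IP
     * @param port port
     * @param clusterName cluster name
     * @return
     * @throws UnknownHostException
     */
    private static Client getClient(String host, int port, String clusterName) throws UnknownHostException {
        // settings
        Settings.Builder builder = Settings.settingsBuilder();
        // enable sniffing
        builder.put("client.transport.sniff", true);
        // cluster name
        builder.put("cluster.name", clusterName);
        Settings settings = builder.build();
        TransportClient transportClient = TransportClient.builder().settings(settings).build();
        Client client = transportClient.addTransportAddress(
                new InetSocketTransportAddress(InetAddress.getByName(host), port));
        // to connect to several addresses:
        // transportClient.addTransportAddresses(transportAddress);
        return client;
    }
}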