Elasticsearch (3): Lightweight Search in Elasticsearch

Source: Internet | Published by: 传淘宝和关联宝贝 | Editor: 程序博客网 | Time: 2024/06/06 06:51

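A plain GET is quite simple: it fetches a specified document directly. Now let's try something slightly more advanced, such as a simple search!

Searching for all employees

The first search we try is about the simplest one possible. We use the following request to search for all employees: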

GET /megacorp/employee/_search

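As you can see, we still use the index megacorp and the type employee, but instead of specifying a document ID we now use the _search endpoint. The response contains all three documents in the hits array. A search returns ten results by default.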

{
   "took":      6,
   "timed_out": false,
   "_shards": { ... },
   "hits": {
      "total":      3,
      "max_score":  1,
      "hits": [
         {
            "_index":         "megacorp",
            "_type":          "employee",
            "_id":            "3",
            "_score":         1,
            "_source": {
               "first_name":  "Douglas",
               "last_name":   "Fir",
               "age":         35,
               "about":       "I like to build cabinets",
               "interests": [ "forestry" ]
            }
         },
         {
            "_index":         "megacorp",
        ...

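Note: the response not only tells us which documents matched, it also includes each whole document itself: all the information needed to display the search results to the end user.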
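The relationship between hits.total and the length of the hits array can be sketched with a small in-memory model. This is only an illustration under assumptions: SearchPageSketch, page and DEFAULT_SIZE are hypothetical names, not part of the Elasticsearch client, and the list of integers stands in for matching documents.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory model of a search page: the total match count versus the hits actually returned. */
final class SearchPageSketch {
    // Mirrors the default page size of ten mentioned in the text.
    static final int DEFAULT_SIZE = 10;

    /** Returns the slice [from, from + size) of the matches, clamped to the list bounds. */
    static <T> List<T> page(List<T> matches, int from, int size) {
        int start = Math.min(Math.max(from, 0), matches.size());
        int end = start + Math.min(Math.max(size, 0), matches.size() - start);
        return new ArrayList<>(matches.subList(start, end));
    }

    public static void main(String[] args) {
        List<Integer> matches = new ArrayList<>();
        for (int id = 1; id <= 25; id++) {
            matches.add(id);
        }
        List<Integer> firstPage = page(matches, 0, DEFAULT_SIZE);
        // "total" counts every match; the page holds at most DEFAULT_SIZE of them.
        System.out.println("total=" + matches.size() + " returned=" + firstPage.size());
    }
}
```

With only three employees indexed, the whole result fits in the first page, which is why the response above lists all of them.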

Client program demo

Add a method:

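import java.util.Arrays;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;

/*
 * GET /megacorp/employee/_search
 * The returned documents are placed in hits[].
 *
 * A fuller search request looks like this:
 *
 *   SearchResponse response5 = client.prepareSearch(index1, index2)
 *           .setTypes(type1, type2)
 *           // Just use this one. The Java API also defines the search types
 *           // QUERY_AND_FETCH and DFS_QUERY_AND_FETCH, but those modes are internal
 *           // optimizations and should not be specified explicitly by API users.
 *           .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
 *           .setQuery(QueryBuilders.termQuery("brandNameNew", 2))               // Query
 *           .setPostFilter(QueryBuilders.rangeQuery("useYears").from(2).to(5))  // Filter
 *           .setFrom(0).setSize(60).setExplain(true)
 *           .get();
 *
 * All parameters are optional, so the simplest form, which searches the whole cluster, is:
 *
 *   SearchResponse response6 = client.prepareSearch().get();
 *
 * This comes from the Search API documentation: the search API executes a search query
 * and returns the hits that match it. It can run across one or more indices and across
 * one or more types. The query can be built with the query Java API.
 *
 *   took:      time the query took, in milliseconds
 *   timed_out: whether the query timed out
 *   _shards:   information about the shards queried: how many were queried, how many
 *              succeeded, how many failed, and so on
 *   hits:      the search results; total is the number of all matching documents, and
 *              hits holds the ones actually returned (10 by default)
 *   _score:    the document's score, which reflects relevance ranking; compare the
 *              result ranking of any major search engine
 *
 * !!! The body of a search request is built with SearchSourceBuilder.
 */
private static void getEmployeesByIndexAndType(Client client, String[] indices, String[] types) {
    System.out.println("Searching the cluster for all data with indices " + Arrays.deepToString(indices)
            + " and types " + Arrays.deepToString(types) + ", starting query...");
    // Run the search
    SearchResponse response = client.prepareSearch(indices)
            .setTypes(types)
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .get();
    // Inspect the response; took is skipped here because it changes on every run
    // timed_out
    boolean isTimedOut = response.isTimedOut();
    System.out.println("timed_out:" + isTimedOut);
    // _shards
    int totalShards = response.getTotalShards();
    int successfulShards = response.getSuccessfulShards();
    int failedShards = response.getFailedShards();
    System.out.println("_shards:{ total=" + totalShards + " successful=" + successfulShards
            + " failed=" + failedShards + "}");
    // The documents are in the hits array; see SearchHits in the API for more methods
    SearchHits searchHits = response.getHits();
    for (SearchHit hit : searchHits) {
        String index = hit.getIndex();
        String type = hit.getType();
        String id = hit.getId();
        float score = hit.getScore();
        System.out.println("index=" + index + " type=" + type + " id=" + id + " score=" + score
                + " source-->" + hit.getSourceAsString());
    }
    System.out.println("Query finished...");
}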

Add a call in main (see the earlier post for the main method; all it really needs is a client connection):

// 3. Search all employee documents (_search)
getEmployeesByIndexAndType(client, new String[] {"megacorp"}, new String[] {"employee"});

The run prints:
Searching the cluster for all data with indices [megacorp] and types [employee], starting query...
timed_out:false
_shards:{ total=5 successful=5 failed=0}
index=megacorp type=employee id=2 score=1.0 source-->{"first_name":"Jane","last_name":"Smith","age":"32","about":"I like to collect rock albums","interests":["music"]}
index=megacorp type=employee id=4 score=1.0 source-->{"first_name":"Douglas1","last_name":"Fir","age":35,"about":"I like to build cabinets","interests":["forestry"]}
index=megacorp type=employee id=1 score=1.0 source-->{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
index=megacorp type=employee id=3 score=1.0 source-->{"first_name":"Douglas","last_name":"Fir","age":35,"about":"I like to build cabinets","interests":["forestry"]}
Query finished...

Head plugin example

[Screenshot not preserved]

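Searching for employees whose last name is Smith

Next, let's search for employees whose last name is Smith. For this we use a lightweight search, which is easy to run from the command line. This approach is usually called a query-string search, because the query is passed to the search endpoint as a URL parameter:
GET /megacorp/employee/_search?q=last_name:Smith
We still use the _search endpoint in the request path and assign the query itself to the q= parameter. The response contains all the Smiths: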

{
   ...
   "hits": {
      "total":      2,
      "max_score":  0.30685282,
      "hits": [
         {
            ...
            "_source": {
               "first_name":  "John",
               "last_name":   "Smith",
               "age":         25,
               "about":       "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            }
         },
         {
            ...
            "_source": {
               "first_name":  "Jane",
               "last_name":   "Smith",
               "age":         32,
               "about":       "I like to collect rock albums",
               "interests": [ "music" ]
            }
         }
      ]
   }
}
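When the lightweight search is issued from code rather than typed by hand, the field:value pair in q= needs URL encoding. The sketch below shows one way to assemble such a URL with the JDK only. LiteSearchUrlSketch and searchUrl are hypothetical names, and http://localhost:9200 is an assumed address, not something the post specifies.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds the URL of a query-string ("lightweight") search such as ?q=last_name:Smith. */
final class LiteSearchUrlSketch {
    /** Joins field and value as field:value and form-encodes the pair for the q= parameter. */
    static String searchUrl(String baseUrl, String index, String type, String field, String value) {
        String query = field + ":" + value;
        try {
            return baseUrl + "/" + index + "/" + type + "/_search?q="
                    + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this branch should be unreachable.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Assumed node address; adjust to wherever the cluster actually listens.
        System.out.println(searchUrl("http://localhost:9200", "megacorp", "employee", "last_name", "Smith"));
    }
}
```

The colon is encoded as %3A, which the server decodes back to last_name:Smith before parsing the query.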

Client program demo

We add the import
import static org.elasticsearch.index.query.QueryBuilders.*;
and use it much as in the first example.
Add a method:

/*
 * Search by the value of a single field
 * GET /megacorp/employee/_search?q=last_name:Smith
 *
 * Uses the term query from QueryBuilders: an exact match on the whole term;
 * the search text is not analyzed (no tokenization).
 */
private static void getEmployeesByFieldEqual(Client client, String field, String text) {
    SearchResponse response = client.prepareSearch("megacorp")
            .setTypes("employee")
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .setQuery(QueryBuilders.termQuery(field, text))
            .get();
    // Print the results
    SearchHits searchHits = response.getHits();
    for (SearchHit hit : searchHits) {
        String index = hit.getIndex();
        String type = hit.getType();
        String id = hit.getId();
        float score = hit.getScore();
        System.out.println("index=" + index + " type=" + type + " id=" + id + " score=" + score
                + " source-->" + hit.getSourceAsString());
    }
}

Add a call in the main method:

// 4. Search for employees whose last name is Smith
getEmployeesByFieldEqual(client, "last_name", "Smith");

Nothing is returned...
Let's first check whether the method works at all:
getEmployeesByFieldEqual(client, "about", "love");
Result:
index=megacorp type=employee id=5 score=0.7884338 source-->{"first_name":"John","last_name":"Smith1","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
index=megacorp type=employee id=1 score=0.7884338 source-->{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}

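So it also returns documents that merely contain the word. It turns out that the about field is of type text, which is analyzed by default (analyzed: the default option; the string is analyzed and indexed as standard full text). In other words, the field is passed through an analyzer: if a document's about field is I love to go rock climbing, it is broken into the tokens [i, love, to, go, rock, climbing] (the standard analyzer also lowercases each token), as shown in the figure:
[Screenshot not preserved]
When matching the term love, any document whose about field contains the word love matches, which explains this result. Reference:
https://www.elastic.co/guide/en/elasticsearch/reference/6.0/query-dsl-term-query.html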
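The behavior described above can be sketched with a toy inverted index. This is a simplified model, not Elasticsearch's implementation: AnalyzedFieldSketch and its methods are hypothetical names, and analyze only approximates the standard analyzer by splitting on non-alphanumeric characters and lowercasing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of an analyzed (text) field: values are split into tokens before indexing. */
final class AnalyzedFieldSketch {
    /** Rough stand-in for the standard analyzer: split on non-alphanumerics, then lowercase. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Builds an inverted index: token -> ids of the documents whose field contains it. */
    static Map<String, Set<String>> index(Map<String, String> fieldValuesById) {
        Map<String, Set<String>> inverted = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fieldValuesById.entrySet()) {
            for (String token : analyze(entry.getValue())) {
                inverted.computeIfAbsent(token, key -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return inverted;
    }

    /** A term lookup compares the given term with the stored tokens verbatim. */
    static Set<String> termLookup(Map<String, Set<String>> inverted, String term) {
        return inverted.getOrDefault(term, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        Map<String, String> about = new LinkedHashMap<>();
        about.put("1", "I love to go rock climbing");
        about.put("2", "I like to collect rock albums");
        about.put("5", "I love to go rock climbing");
        Map<String, Set<String>> inverted = index(about);
        System.out.println("love -> " + termLookup(inverted, "love"));
        System.out.println("rock -> " + termLookup(inverted, "rock"));
    }
}
```

Because each word is stored as its own token, a lookup for love finds every document whose about field contains that word, while a lookup for the whole sentence finds nothing.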

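If we do not want this result:

we can change the field's type in this index to one that is not analyzed
(not_analyzed: no tokenization at index time; the value is indexed as an exact value).
Look at the index mapping:
[Screenshot not preserved]
All the fields turn out to be of the default text type.
The mapping of an existing index cannot be changed. (To make data searchable, Elasticsearch needs to know the data type of each field and how it is indexed. If you change a field's data type from string to date, all the data already indexed for that field becomes useless, and you need to rebuild the index. This rule is not specific to ES; it holds for any database system that supports queries. Doing without an index trades speed for flexibility. Reference: http://blog.csdn.net/jingkyks/article/details/41513063)
For an existing index, Elasticsearch only maps fields automatically when new fields appear. If you really need to change a mapping, use reindex, that is, re-import the data.
(Reference: http://blog.csdn.net/u010994304/article/details/50454025)
(For how to re-import the data, see:
http://blog.csdn.net/jingkyks/article/details/41513063
http://blog.csdn.net/u010994304/article/details/50454025
http://blog.csdn.net/lengfeng92/article/details/38230521
http://www.cnblogs.com/Creator/p/3722408.html)
So either define the field as not analyzed when the index is created (delete the index and create it again),
or re-import the data.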

Example

Now an example: create a new index whose fields are all mapped as not_analyzed
(you could also delete your existing index first; here I create a new one).
[Screenshot not preserved]
(written by following the original index)
(for data types, see https://www.cnblogs.com/xing901022/p/5471419.html)
Put in the same data as before.

[Screenshot not preserved]
Call the earlier method again:
getEmployeesByFieldEqual(client, "about", "love");
No data is returned.
Call:
getEmployeesByFieldEqual(client, "about", "I love to go rock climbing");
Two documents are returned:
index=megacorp1 type=employee1 id=5 score=0.87546873 source-->{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
index=megacorp1 type=employee1 id=1 score=0.87546873 source-->{"first_name":"John","last_name":"Smith1","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
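The two results above follow from how a not-analyzed field is indexed: the whole value is stored as a single term. The sketch below models that in memory. KeywordFieldSketch and its methods are hypothetical names, and the three about values are taken from the sample data in this post.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a not-analyzed (keyword) field: each value is indexed as one untouched term. */
final class KeywordFieldSketch {
    /** Builds an index from the exact field value to the ids of the documents holding it. */
    static Map<String, Set<String>> index(Map<String, String> fieldValuesById) {
        Map<String, Set<String>> exact = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fieldValuesById.entrySet()) {
            exact.computeIfAbsent(entry.getValue(), key -> new TreeSet<>()).add(entry.getKey());
        }
        return exact;
    }

    /** A term lookup succeeds only when the term equals the complete stored value. */
    static Set<String> termLookup(Map<String, Set<String>> exact, String term) {
        return exact.getOrDefault(term, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        Map<String, String> about = new LinkedHashMap<>();
        about.put("1", "I love to go rock climbing");
        about.put("2", "I like to collect rock albums");
        about.put("5", "I love to go rock climbing");
        Map<String, Set<String>> exact = index(about);
        System.out.println("love -> " + termLookup(exact, "love"));
        System.out.println("full sentence -> " + termLookup(exact, "I love to go rock climbing"));
    }
}
```

A single word such as love is no longer a term in this index, so it matches nothing, while the complete sentence matches both documents that store it.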

Add a document as shown in the figure:
[Screenshot not preserved]
Call:
getEmployeesByFieldEqual(client, "last_name", "Smith 1");
Output:
index=megacorp1 type=employee1 id=6 score=1.5404451 source-->{"first_name":"John","last_name":"Smith 1","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}

Call:
getEmployeesByFieldEqual(client, "last_name", "Smith");
Output:
index=megacorp1 type=employee1 id=5 score=1.0296195 source-->{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
index=megacorp1 type=employee1 id=2 score=1.0296195 source-->{"first_name":"Jane","last_name":"Smith","age":"32","about":"I like to collect rock albums","interests":["music"]}
Of course, this can also be written with a query string:

SearchRequestBuilder request = client.prepareSearch("megacorp")
        .setTypes("employee")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(new QueryStringQueryBuilder(text).field(field));
SearchResponse response = request.get();

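Problem solved!

Open question

Why the search started matching once the field was no longer analyzed puzzles me; I have not found the reason. Their tokens are exactly the same
[Screenshot not preserved]
and the request body is exactly the same as well.
{
  "query" : {
    "term" : {
      "last_name" : {
        "value" : "Smith",
        "boost" : 1.0
      }
    }
  }
}
The method itself works, but last_name matched nothing:
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
Explanations welcome...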
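One likely explanation, offered as an assumption rather than something verified against this cluster: the standard analyzer lowercases tokens at index time, so the text field stores smith, while a term query compares the unanalyzed search text Smith against the stored tokens. A query string search analyzes the search text as well, which would explain why it matches. The sketch below models the three cases; TermCaseSketch and its methods are hypothetical names, and analyze only approximates the standard analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Toy comparison of term lookups on analyzed and not-analyzed fields, focusing on letter case. */
final class TermCaseSketch {
    /** Rough stand-in for the standard analyzer: split on non-alphanumerics, then lowercase. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Term query on an analyzed field: the raw term must equal one of the stored tokens. */
    static boolean termOnText(String storedValue, String term) {
        return analyze(storedValue).contains(term);
    }

    /** Term query on a not-analyzed field: the raw term must equal the whole stored value. */
    static boolean termOnKeyword(String storedValue, String term) {
        return storedValue.equals(term);
    }

    /** Analyzed query on an analyzed field: the search text is tokenized too; any shared token matches. */
    static boolean analyzedQueryOnText(String storedValue, String queryText) {
        return !Collections.disjoint(analyze(storedValue), analyze(queryText));
    }

    public static void main(String[] args) {
        System.out.println("term Smith on text field:     " + termOnText("Smith", "Smith"));
        System.out.println("term smith on text field:     " + termOnText("Smith", "smith"));
        System.out.println("term Smith on keyword field:  " + termOnKeyword("Smith", "Smith"));
        System.out.println("analyzed Smith on text field: " + analyzedQueryOnText("Smith", "Smith"));
    }
}
```

If this assumption holds, calling getEmployeesByFieldEqual(client, "last_name", "smith") against the original index should return the Smith documents.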

Head plugin example

[Screenshot not preserved]
