Elasticsearch: The Definitive Guide, 01 Getting Started, 05 Searching: The Basic Tools
Source: Internet. Editor: 程序博客网. Date: 2024/06/05 02:42
https://www.elastic.co/guide/en/elasticsearch/guide/current/search.html
A search can be any of the following:
- A structured query on concrete fields, sorted by a field
- A full-text query, which finds all docs matching the search keywords, and returns them sorted by relevance
- A combination of the two
To use search to its full potential, you need to understand three subjects:
- Mapping: how the data in each field is interpreted
- Analysis: how full text is processed to make it searchable
- Query DSL: the flexible, powerful query language used by Elasticsearch
Each of these is explained in detail in Search in Depth.
Test data:
https://gist.github.com/clintongormley/8579281
1 The Empty Search
The most basic form of the search API is the empty search, which doesn't specify any query but simply returns all documents in all indices in the cluster:
GET /_search
// Response:
{
   "hits" : {
      "total" : 14,   // 14 documents matched in total; 10 are returned by default
      "hits" : [      // array of results
         {
            "_index":  "us",
            "_type":   "tweet",
            "_id":     "7",
            "_score":  1,
            "_source": {
               "date":    "2014-09-17",
               "name":    "John Smith",
               "tweet":   "The Query DSL is really powerful and flexible",
               "user_id": 2
            }
         },
         ... 9 RESULTS REMOVED ...
      ],
      "max_score" : 1
   },
   "took" : 4,
   "_shards" : {
      "failed" :     0,
      "successful" : 10,
      "total" :      10
   },
   "timed_out" : false
}
hits
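The most important section of the response is hits, which contains the total number of documents that matched our query, and a hits array containing the first 10 of those matching documents.
Each result in the hits array contains the _index, _type, and _id of the document, plus the _source field, so the whole document is available directly from the search results.
Each element also has a _score. This is the relevance score, which is a measure of how well the document matches the query.
By default, results are returned sorted by _score in descending order. Here we didn't specify any query, so all documents are equally relevant and every result has the neutral _score of 1. The max_score value is the highest _score of any document that matches the query.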
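As a rough illustration of how a client might read the hits section, the sketch below parses a trimmed copy of the sample response with Python's standard json module. The helper name summarize_hits is an assumption made for this example and is not part of any Elasticsearch client; it also assumes the response shape shown in these notes, where hits.total is a plain number.

```python
import json

# Trimmed copy of the sample response from these notes (only one hit kept).
RESPONSE = json.loads("""
{
  "hits": {
    "total": 14,
    "hits": [
      {"_index": "us", "_type": "tweet", "_id": "7", "_score": 1,
       "_source": {"date": "2014-09-17", "name": "John Smith",
                   "tweet": "The Query DSL is really powerful and flexible",
                   "user_id": 2}}
    ],
    "max_score": 1
  },
  "took": 4,
  "_shards": {"failed": 0, "successful": 10, "total": 10},
  "timed_out": false
}
""")


def summarize_hits(response):
    """Return (total, [(index, type, id, score), ...]) from a search response.

    Hypothetical helper for illustration only.
    """
    hits = response["hits"]
    rows = [(h["_index"], h["_type"], h["_id"], h["_score"]) for h in hits["hits"]]
    return hits["total"], rows


total, rows = summarize_hits(RESPONSE)
print(f"{total} documents matched, {len(rows)} kept in this trimmed sample")
for index, doc_type, doc_id, score in rows:
    print(f"/{index}/{doc_type}/{doc_id} score={score}")
```

Note that total counts every matching document, while the hits array holds only the page that was returned.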
took
The took value tells us how many milliseconds the entire search request took to execute.
shards
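The _shards element tells us the total number of shards that were involved in the query and, of them, how many were successful and how many failed.
Some shards may fail. In that case Elasticsearch reports those shards as failed but still returns results from the remaining shards.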
timeout
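The timed_out value tells us whether the query timed out. By default, search requests do not time out. If low response time matters more to you than complete results, you can specify a timeout such as 10ms (10 milliseconds) or 1s (1 second):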
GET /_search?timeout=10ms
Note:
timeout does not halt the execution of the query; it merely tells the coordinating node to return the results collected so far and to close the connection. In the background, other shards may still be processing the query even though results have been sent.
Use the timeout because it is important to your SLA, not because you want to abort the execution of long-running queries.
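Because both a timeout and a failed shard can leave the result set incomplete, a client may want to check for either condition before trusting the hits. The sketch below is one minimal way to do that; the function name is_partial and the two non-healthy sample responses are hypothetical, invented for this illustration.

```python
def is_partial(response):
    """Return True if the search timed out or at least one shard failed.

    Hypothetical helper: it only inspects the timed_out and _shards
    elements described in these notes.
    """
    shards = response.get("_shards", {})
    return bool(response.get("timed_out")) or shards.get("failed", 0) > 0


# The first sample mirrors the response in these notes; the other two are made up.
complete = {"timed_out": False, "_shards": {"failed": 0, "successful": 10, "total": 10}}
timed_out = {"timed_out": True, "_shards": {"failed": 0, "successful": 10, "total": 10}}
shard_lost = {"timed_out": False, "_shards": {"failed": 2, "successful": 8, "total": 10}}

for name, resp in [("complete", complete), ("timed_out", timed_out), ("shard_lost", shard_lost)]:
    print(name, "->", "partial" if is_partial(resp) else "complete")
```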
2 Multi-index, Multitype
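/_search : search all types in all indices
/gb,us/_search : search all types in the gb and us indices
/g*,u*/_search : search all types in any index whose name begins with g or with u
/gb,us/user,tweet/_search : search types user and tweet in the gb and us indices
/_all/user,tweet/_search : search types user and tweet in all indices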
When you search within a single index, Elasticsearch forwards the search request to a primary or replica of every shard in that index, and then gathers the results from each shard. Searching within multiple indices works in exactly the same way—there are just more shards involved.
Tip:
Searching one index that has five primary shards is exactly equivalent to searching five indices that have one primary shard each.
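The path patterns in this section follow a simple rule: comma-separated index names, then optional comma-separated type names, then _search. The sketch below builds such paths; search_path is a hypothetical helper written for this illustration, and it assumes the index/type URL layout used in these notes.

```python
def search_path(indices=None, types=None):
    """Build a search path such as /gb,us/user,tweet/_search.

    indices and types are lists of names (index names may contain wildcards).
    When types are given without indices, _all stands in for the index part.
    """
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path())
print(search_path(["gb", "us"]))
print(search_path(["g*", "u*"]))
print(search_path(["gb", "us"], ["user", "tweet"]))
print(search_path(types=["user", "tweet"]))
```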
3 Pagination
size: the number of results that should be returned; defaults to 10
from: the number of initial results that should be skipped; defaults to 0
To show five results at a time, pages 1 and 2 can be requested as follows:
GET /_search?size=5
GET /_search?size=5&from=5
Beware of paging too deep or requesting too many results at once. A search request usually spans multiple shards. Each shard generates its own sorted results, which then need to be sorted centrally to ensure that the overall order is correct.
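The central sorting step can be pictured as a merge of several lists that are each already sorted. The sketch below is a toy model of that idea using heapq.merge from the standard library; it is not how Elasticsearch is implemented, and the function name, shard contents, and scores are all hypothetical.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, size=10, from_=0):
    """Merge per-shard (score, doc_id) lists, each sorted by descending score,
    and return the requested page of the global ordering."""
    # heapq.merge expects ascending input, so negate the score in the key.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, from_, from_ + size))


# Hypothetical results from three shards, each already sorted best-first.
shard_a = [(0.9, "a1"), (0.4, "a2")]
shard_b = [(0.8, "b1"), (0.7, "b2")]
shard_c = [(0.5, "c1"), (0.1, "c2")]

print(merge_shard_results([shard_a, shard_b, shard_c], size=3))
print(merge_shard_results([shard_a, shard_b, shard_c], size=3, from_=3))
```

Even in this toy model, the second page cannot be produced without first walking past every hit that belongs to the first page.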
Deep Paging in Distributed Systems
To understand why deep paging is problematic, let’s imagine that we are searching within a single index with five primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the coordinating node, which then sorts all 50 results in order to select the overall top 10.
Now imagine that we ask for page 1,000 (results 10,001 to 10,010). Everything works in the same way, except that each shard has to produce its top 10,010 results. The coordinating node then sorts through all 50,050 results and discards 50,040 of them!
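The arithmetic behind these numbers is easy to reproduce. The sketch below computes how many results each shard must produce, how many the coordinating node has to sort, and how many it throws away; deep_paging_cost is a hypothetical helper for this illustration, and it assumes every shard holds at least from + size matching documents.

```python
def deep_paging_cost(shards, size=10, from_=0):
    """Return (per_shard, sorted_total, discarded) for one page request."""
    per_shard = from_ + size           # each shard must produce this many hits
    sorted_total = shards * per_shard  # what the coordinating node has to sort
    discarded = sorted_total - size    # everything except the requested page
    return per_shard, sorted_total, discarded


print(deep_paging_cost(shards=5))                # results 1 to 10
print(deep_paging_cost(shards=5, from_=10_000))  # results 10,001 to 10,010
```

The work grows linearly with from, which is why very deep pages get expensive even though the page itself stays at 10 results.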
Tip: In Reindexing Your Data we explain how you can retrieve large numbers of documents efficiently (scroll + bulk).
4 Search Lite [for awareness only; not recommended in production]
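There are two forms of the search API:
- a “lite” query-string version, which expects all its parameters to be passed in the query string
- a full request body version, which expects a JSON request body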
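For example, this request finds all documents of type tweet that contain the word elasticsearch in the tweet field:
GET /_all/tweet/_search?q=tweet:elasticsearch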
The next query looks for john in the name field and mary in the tweet field. The actual query is just:
+name:john +tweet:mary
but the percent encoding needed for query-string parameters makes it appear more cryptic than it really is:
GET /_search?q=%2Bname%3Ajohn+%2Btweet%3Amary
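The + prefix indicates conditions that must be satisfied for our query to match. Similarly, a - prefix would indicate conditions that must not match. All conditions without a + or - are optional; the more that match, the more relevant the document.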
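The percent-encoded form does not have to be written by hand. As a small sketch, Python's standard urllib.parse.quote_plus produces the same encoding for the simple query in this section; the variable names are only illustrative.

```python
from urllib.parse import quote_plus

query = "+name:john +tweet:mary"

# quote_plus turns '+' into %2B, ':' into %3A, and each space into '+'.
encoded = quote_plus(query)
print(f"GET /_search?q={encoded}")
```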
The _all Field
This simple search returns all documents that contain the word mary:
GET /_search?q=mary
When you index a document, Elasticsearch takes the string values of all of its fields and concatenates them into one big string, which it indexes as the special _all field.
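Dates, times, and integers all become strings in the _all field. The query-string search uses the _all field unless another field name has been specified.
See also: Metadata: _all Field.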
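To make the idea of the _all field concrete, the sketch below concatenates the values of a document into one string. It is only a toy approximation of the behavior described in this section: build_all_field is a hypothetical name, and the code ignores nested objects, arrays, and analysis.

```python
def build_all_field(source):
    """Join every field value, converted to a string, into one searchable blob.

    Toy approximation for illustration only.
    """
    return " ".join(str(value) for value in source.values())


# The sample document from the empty-search response in these notes.
doc = {
    "date": "2014-09-17",
    "name": "John Smith",
    "tweet": "The Query DSL is really powerful and flexible",
    "user_id": 2,
}

all_text = build_all_field(doc)
print(all_text)
# The integer user_id and the date are now plain words in the combined string.
print("2" in all_text.split(), "2014-09-17" in all_text.split())
```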
More Complicated Queries
The next query searches for tweets using the following criteria:
- The name field contains mary or john
- The date is greater than 2014-09-10
- The _all field contains either of the words aggregations or geo
+name:(mary john) +date:>2014-09-10 +(aggregations geo)
As a properly encoded query string, this becomes the less readable:
?q=%2Bname%3A(mary+john)+%2Bdate%3A%3E2014-09-10+%2B(aggregations+geo)
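When debugging a query string like this one, it can help to decode it back into its readable form. The sketch below does that with urllib.parse.unquote_plus and then re-encodes it as a round-trip check; keeping the parentheses unescaped via safe="()" is a choice made here so the output matches the encoded string in these notes.

```python
from urllib.parse import quote_plus, unquote_plus

encoded = "%2Bname%3A(mary+john)+%2Bdate%3A%3E2014-09-10+%2B(aggregations+geo)"

# Decode: '+' becomes a space, %2B a literal '+', %3A ':' and %3E '>'.
decoded = unquote_plus(encoded)
print(decoded)

# Round trip: parentheses are left literal to match the encoded form above.
print(quote_plus(decoded, safe="()") == encoded)
```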
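Reference: Query String Syntax reference docs
Although concise, query strings are hard to debug and error-prone: a slight syntax error returns an error instead of results.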
Finally, the query-string search allows any user to run potentially slow, heavy queries on any field in your index, possibly exposing private information or even bringing your cluster to its knees!
For these reasons, we don’t recommend exposing query-string searches directly to your users, unless they are power users who can be trusted with your data and with your cluster.