Elasticsearch: The Definitive Guide, 01 Getting Started, 05 Searching—The Basic Tools

Source: Internet. Editor: 程序博客网. Posted: 2024/06/05 02:42

https://www.elastic.co/guide/en/elasticsearch/guide/current/search.html

A search can be any of the following:

  • A structured query on concrete fields, sorted by a field
  • A full-text query, which finds all docs matching the search keywords, and returns them sorted by relevance
  • A combination of the two

To use search to its full potential, you need to understand three subjects:

  • Mapping: how the data in each field is interpreted
  • Analysis: how full text is processed to make it searchable
  • Query DSL: the flexible, powerful query language used by Elasticsearch

Each of these is explained in detail in Search in Depth.

Test data:
https://gist.github.com/clintongormley/8579281.

1 The Empty Search

The most basic form of the search API is the empty search, which doesn't specify any query but simply returns all documents in all indices in the cluster:

GET /_search

Response:

{
   "hits" : {
      "total" :       14,   // total number of matching documents; only 10 are returned by default
      "hits" : [            // the array of results
        {
          "_index":   "us",
          "_type":    "tweet",
          "_id":      "7",
          "_score":   1,
          "_source": {
             "date":    "2014-09-17",
             "name":    "John Smith",
             "tweet":   "The Query DSL is really powerful and flexible",
             "user_id": 2
          }
       },
        ... 9 RESULTS REMOVED ...
      ],
      "max_score" :   1
   },
   "took" :           4,
   "_shards" : {
      "failed" :      0,
      "successful" :  10,
      "total" :       10
   },
   "timed_out" :      false
}

hits
The most important section of the response. It contains the total number of documents that matched our query and a hits array holding the first 10 of those matching documents (the results).

Each result in the hits array contains the document's _index, _type, and _id, plus its _source field.

Each element also has a _score. This is the relevance score, which is a measure of how well the document matches the query.

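By default, results are sorted by _score in descending order. Because the empty search specifies no query, every document has the same neutral _score of 1.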

took
How many milliseconds the entire search request took to execute.
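_shards
The _shards element tells us the total number of shards that were involved in the query and, of them, how many were successful and how many failed.
Shards can occasionally fail, for example if both the primary and the replica copies of the same shard are lost. Elasticsearch then reports that shard as failed but still returns results from the remaining shards.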
timed_out
The timed_out value tells us whether the query timed out. By default, search requests do not time out. If low response times matter more to you than complete results, you can specify a timeout:

GET /_search?timeout=10ms

Note:
timeout does not halt the execution of the query; it merely tells the coordinating node to return the results collected so far and to close the connection. In the background, other shards may still be processing the query even though results have been sent.

Use the time-out because it is important to your SLA, not because you want to abort the execution of long-running queries.
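To tie the response fields together, here is a minimal Python sketch that reads a search response and reports hits.total, the number of hits returned, took, and whether the results may be partial. The response has already been parsed from JSON. The helper name summarize_response is hypothetical. The sample is the empty-search response above, trimmed to one hit. This guide's version returns hits.total as a plain number; newer Elasticsearch releases return an object with a value key, so the sketch accepts both forms.

```python
import json

# Sample response from GET /_search, trimmed to a single hit.
RAW = """
{
  "hits": {
    "total": 14,
    "hits": [
      {"_index": "us", "_type": "tweet", "_id": "7", "_score": 1,
       "_source": {"date": "2014-09-17", "name": "John Smith",
                   "tweet": "The Query DSL is really powerful and flexible",
                   "user_id": 2}}
    ],
    "max_score": 1
  },
  "took": 4,
  "_shards": {"failed": 0, "successful": 10, "total": 10},
  "timed_out": false
}
"""


def summarize_response(resp: dict) -> dict:
    """Extract the response fields discussed above (hypothetical helper)."""
    total = resp["hits"]["total"]
    # Newer Elasticsearch versions report total as {"value": N, "relation": ...}.
    if isinstance(total, dict):
        total = total["value"]
    shards = resp["_shards"]
    return {
        "total": total,
        "returned": len(resp["hits"]["hits"]),
        "ids": [hit["_id"] for hit in resp["hits"]["hits"]],
        "took_ms": resp["took"],
        # Results may be incomplete if any shard failed or the request timed out.
        "partial": shards["failed"] > 0 or resp["timed_out"],
    }


summary = summarize_response(json.loads(RAW))
print(summary)
```

Checking both _shards.failed and timed_out matters because either condition means the hits come from only part of the index.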

2 Multi-index, Multitype

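You can search across several indices and types by listing them in the URL:

  • /_search: search all types in all indices
  • /gb,us/_search: search all types in the gb and us indices
  • /g*,u*/_search: search all types in any indices beginning with g or beginning with u
  • /gb,us/user,tweet/_search: search types user and tweet in the gb and us indices
  • /_all/user,tweet/_search: search types user and tweet in all indices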
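The URL patterns above all follow one rule. The path starts with a comma-separated list of indices (wildcards allowed), optionally followed by a comma-separated list of types, and ends in _search. Below is a small Python sketch of that rule. search_path is a hypothetical helper, not part of any Elasticsearch client.

```python
def search_path(indices=None, types=None):
    """Build a search URL path such as /gb,us/user,tweet/_search.

    indices: list of index names or wildcard patterns (None means all indices).
    types:   list of type names (None means all types).
    """
    parts = []
    if types:
        # A type list needs an index segment in front of it; _all means every index.
        parts.append(",".join(indices) if indices else "_all")
        parts.append(",".join(types))
    elif indices:
        parts.append(",".join(indices))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path())                                 # /_search
print(search_path(["gb", "us"]))                     # /gb,us/_search
print(search_path(["g*", "u*"]))                     # /g*,u*/_search
print(search_path(["gb", "us"], ["user", "tweet"]))  # /gb,us/user,tweet/_search
print(search_path(types=["user", "tweet"]))          # /_all/user,tweet/_search
```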

When you search within a single index, Elasticsearch forwards the search request to a primary or replica of every shard in that index, and then gathers the results from each shard. Searching within multiple indices works in exactly the same way—there are just more shards involved.

Tip:
Searching one index that has five primary shards is exactly equivalent to searching five indices that have one primary shard each.

3 Pagination

Elasticsearch accepts the from and size parameters:

  • size: the number of results that should be returned (defaults to 10)
  • from: the number of initial results that should be skipped (defaults to 0)

GET /_search?size=5
GET /_search?size=5&from=5
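A page number maps onto these parameters as from = (page - 1) * size. The Python sketch below builds the query string for a given page. page_query is a hypothetical helper, and the sketch assumes pages are numbered from 1.

```python
from urllib.parse import urlencode


def page_query(page: int, size: int = 10) -> str:
    """Return the from/size query string for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    # Skip every result that belongs to the earlier pages.
    return urlencode({"size": size, "from": (page - 1) * size})


print("GET /_search?" + page_query(1, 5))  # GET /_search?size=5&from=0
print("GET /_search?" + page_query(2, 5))  # GET /_search?size=5&from=5
```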

Beware of paging too deep or requesting too many results at once. A search request usually spans multiple shards. Each shard generates its own sorted results, which then need to be sorted centrally to ensure that the overall order is correct.
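One way to picture the central sort is as a k-way merge. Each shard hands back a list already sorted by _score, and the coordinating node merges the lists and keeps the top size hits. The Python sketch below uses the standard heapq.merge. The shard contents are invented example values, and the sketch illustrates the idea only; it does not show Elasticsearch's internal implementation.

```python
import heapq
from itertools import islice

# Hypothetical per-shard results: (doc_id, _score), each list sorted by score, highest first.
shard_results = [
    [("a", 2.5), ("b", 1.1), ("c", 0.4)],
    [("d", 3.0), ("e", 0.9)],
    [("f", 1.7), ("g", 1.6), ("h", 0.2)],
]


def merge_top(shards, size):
    """Merge already-sorted shard lists and keep the overall top `size` hits."""
    merged = heapq.merge(*shards, key=lambda hit: hit[1], reverse=True)
    return list(islice(merged, size))


top = merge_top(shard_results, 3)
print(top)  # [('d', 3.0), ('a', 2.5), ('f', 1.7)]
```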

Deep Paging in Distributed Systems
To understand why deep paging is problematic, let’s imagine that we are searching within a single index with five primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the coordinating node, which then sorts all 50 results in order to select the overall top 10.

Now imagine that we ask for page 1,000, that is, results 10,001 to 10,010. Each shard has to produce its top 10,010 results. The coordinating node then sorts through all 50,050 results and discards 50,040 of them!
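The arithmetic in this example generalizes. With n shards, each shard returns its top from + size hits. The coordinating node therefore sorts n * (from + size) hits and discards all but size of them. The Python sketch below reproduces the numbers for five shards; the function name deep_paging_cost is hypothetical.

```python
def deep_paging_cost(num_shards: int, from_: int, size: int) -> dict:
    """Count the hits handled by the coordinating node for one from/size request."""
    per_shard = from_ + size               # each shard returns its top from+size hits
    sorted_total = num_shards * per_shard  # the coordinating node sorts all of them
    return {
        "per_shard": per_shard,
        "sorted_total": sorted_total,
        "discarded": sorted_total - size,  # only `size` hits are returned to the client
    }


print(deep_paging_cost(5, 0, 10))       # page 1
print(deep_paging_cost(5, 10_000, 10))  # page 1,000
```

The work done by the coordinating node grows with from, even though the client still receives only 10 results.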

Tip: Reindexing Your Data explains how to retrieve large numbers of documents efficiently (with scroll and bulk).

4 Search Lite (worth knowing, but not recommended in production)

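There are two forms of the search API:

  1. a “lite” query-string version that expects all its parameters to be passed in the query string
  2. a full request body version that expects a JSON request body and uses the query DSL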
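
The query-string form is handy for ad hoc queries from the command line. For instance, this query finds all documents of type tweet that contain the word elasticsearch in the tweet field:

GET /_all/tweet/_search?q=tweet:elasticsearch

The next query looks for john in the name field and mary in the tweet field. The actual query is just:

+name:john +tweet:mary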

However, the percent encoding needed for query-string parameters makes it appear more cryptic than it really is:

GET /_search?q=%2Bname%3Ajohn+%2Btweet%3Amary

The + prefix indicates conditions that must be satisfied for our query to match.
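The - prefix indicates conditions that must not match. All conditions without a + or - are optional: the more of them that match, the more relevant the document.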
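Python's standard urllib.parse.quote_plus can produce the encoded form. It turns + into %2B and : into %3A, and turns spaces into +. The sketch below encodes the example query from this section, plus a variant that uses the - prefix; the - character needs no encoding.

```python
from urllib.parse import quote_plus

# + marks a condition that must match; - marks one that must not match.
encoded = quote_plus("+name:john +tweet:mary")
excluded = quote_plus("+name:john -tweet:mary")

print("GET /_search?q=" + encoded)   # GET /_search?q=%2Bname%3Ajohn+%2Btweet%3Amary
print("GET /_search?q=" + excluded)  # GET /_search?q=%2Bname%3Ajohn+-tweet%3Amary
```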

The _all Field

This simple search returns all documents that contain the word mary:

GET /_search?q=mary

When you index a document, Elasticsearch takes the string values of all of its fields and concatenates them into one big string, which it indexes as the special _all field.

Dates, times, and integers are all converted to strings in the _all field.

See Metadata: _all Field for details.
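As a rough model of that concatenation, the Python sketch below joins the string form of every value in a flat document. It uses the tweet from the empty-search response. This is a simplified assumption: real Elasticsearch also handles nested objects and then analyzes the resulting string, and this sketch does neither. all_field is a hypothetical name.

```python
def all_field(source: dict) -> str:
    """Concatenate the values of a flat document into one string (simplified _all)."""
    # Every value, including dates and integers, is converted to a string.
    return " ".join(str(value) for value in source.values())


doc = {
    "date": "2014-09-17",
    "name": "John Smith",
    "tweet": "The Query DSL is really powerful and flexible",
    "user_id": 2,
}
print(all_field(doc))
# 2014-09-17 John Smith The Query DSL is really powerful and flexible 2
```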

More Complicated Queries

The next query looks for documents that meet all of the following criteria:

  • The name field contains mary or john
  • The date is greater than 2014-09-10
  • The _all field contains either of the words aggregations or geo
As a query string:
+name:(mary john) +date:>2014-09-10 +(aggregations geo)

Percent-encoded:
?q=%2Bname%3A(mary+john)+%2Bdate%3A%3E2014-09-10+%2B(aggregations+geo)
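The encoded string can be built from the three clauses with quote_plus. Here the safe parameter keeps parentheses unencoded, which is legal in a URL query string and matches the form shown. The clause list is just this section's example restated in Python.

```python
from urllib.parse import quote_plus

# Each clause is prefixed with + so that all three conditions must match.
clauses = [
    "+name:(mary john)",    # name contains mary or john
    "+date:>2014-09-10",    # date is greater than 2014-09-10
    "+(aggregations geo)",  # _all contains aggregations or geo
]
query = " ".join(clauses)

# Parentheses are allowed in a URL query string, so leave them as-is.
encoded = quote_plus(query, safe="()")
print("?q=" + encoded)
# ?q=%2Bname%3A(mary+john)+%2Bdate%3A%3E2014-09-10+%2B(aggregations+geo)
```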

See the Query String Syntax reference docs for the full syntax.

Although this lite syntax is concise, it is hard to debug and error-prone.
Finally, the query-string search allows any user to run potentially slow, heavy queries on any field in your index, possibly exposing private information or even bringing your cluster to its knees!

For these reasons, we don’t recommend exposing query-string searches directly to your users, unless they are power users who can be trusted with your data and with your cluster.
