Searching data in Elasticsearch through the HTTP RESTful API


Tags: api, elasticsearch, search, operations, documentation

Published: 2015-09-30 16:59

Category: es

Sample dataset

The dataset is a set of fictitious bank customer account documents in JSON format. Each document has the following schema:
{
  "account_number": 0,
  "balance": 16623,
  "firstname": "Bradshaw",
  "lastname": "Mckenzie",
  "age": 29,
  "gender": "F",
  "address": "244 Columbus Place",
  "employer": "Euron",
  "email": "bradshawmckenzie@euron.com",
  "city": "Hobucken",
  "state": "CO"
}
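Data like this can be generated at www.json-generator.com.

Loading the sample dataset

Download the sample dataset from the download link.
Extract it to a directory of your choice, then load it into the Elasticsearch cluster: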

Absolute path:
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@/home/cluster/apps/elasticsearch/elasticsearch-1.7.2/test/accounts.json"

Relative path:
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@test/accounts.json"

List the indices to check the result:
curl 'localhost:9200/_cat/indices?v'

Result:
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank    5   1       1000            0    417.1kb        417.1kb

The output above shows that 1000 documents were successfully bulk-indexed into the bank index.
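The file sent to _bulk is newline-delimited JSON: each document line is preceded by an action line, and the payload ends with a newline. The sketch below shows that layout for two tiny documents. The helper name build_bulk_body and the choice of account_number as the document ID are assumptions made for illustration; they are not taken from the original post.

```python
import json

def build_bulk_body(docs):
    """Render documents as a _bulk payload: an action line, then the source line."""
    lines = []
    for doc in docs:
        # Assumption: account_number doubles as the document ID.
        lines.append(json.dumps({"index": {"_id": str(doc["account_number"])}}))
        lines.append(json.dumps(doc))
    # The bulk payload must end with a trailing newline.
    return "\n".join(lines) + "\n"

sample = [
    {"account_number": 0, "balance": 16623, "state": "CO"},
    {"account_number": 1, "balance": 39225, "state": "IL"},
]
body = build_bulk_body(sample)
print(body, end="")
```

Writing the returned string to a file would give something that can be passed to curl with --data-binary, in the same way as accounts.json above.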

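The search API

There are two ways to run a search: send the search parameters in the REST request URI, or send them in the REST request body. The request body form lets you write the search as JSON, which is more expressive and easier to read.

Via the REST request URI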

curl 'localhost:9200/bank/_search?q=*&pretty'

Result:
{
  "took" : 63,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "1",
      "_score" : 1.0, "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "6",
      "_score" : 1.0, "_source" : {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      ...

q=* tells Elasticsearch to match all documents in the bank index.
pretty tells Elasticsearch to pretty-print the JSON response.

Via the REST request body:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} }
}'

Result:
{
  "took" : 26,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "1",
      "_score" : 1.0, "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "6",
      "_score" : 1.0, "_source" : {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "13",
      ...

The difference from the first approach is that, instead of passing q=* in the URI, we POST a JSON-formatted search in the request body.
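To make the contrast concrete, here is a minimal sketch that builds both request forms without sending them. The function names uri_search and body_search, and the BASE address, are assumptions for illustration only.

```python
import json
from urllib.parse import urlencode

BASE = "http://localhost:9200"  # assumption: a local single-node cluster

def uri_search(index, q):
    """URI form: the search parameters travel in the query string."""
    return "GET", f"{BASE}/{index}/_search?{urlencode({'q': q})}", None

def body_search(index, query):
    """Request body form: the search travels as a JSON document."""
    return "POST", f"{BASE}/{index}/_search", json.dumps({"query": query})

uri_request = uri_search("bank", "*")
body_request = body_search("bank", {"match_all": {}})
print(uri_request)
print(body_request)
```

Note that urlencode percent-encodes the asterisk, so the URI form carries q=%2A, which decodes back to the same q=* used above.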

Introducing the query language

Elasticsearch provides a JSON-style domain-specific language for executing queries, known as the Query DSL.

{
  "query": { "match_all": {} }
}

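query: the query definition.
match_all: the type of query to run; it simply matches every document in the specified index.

Besides the query parameter, other parameters can be passed to influence the results.

match_all, returning only the first document: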

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "size": 1
}'

Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "4",
      "_score" : 1.0,
      "_source":{"account_number":4,"balance":27658,"firstname":"Rodriquez","lastname":"Flores","age":31,"gender":"F","address":"986 Wyckoff Avenue","employer":"Tourmania","email":"rodriquezflores@tourmania.com","city":"Eastvale","state":"HI"}
    } ]
  }
}

If size is not specified, it defaults to 10 documents.

match_all, returning documents 11 through 20:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}'

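from: the zero-based offset of the first document to return; defaults to 0.
size: the number of documents to return, starting at from.
This feature is useful for implementing paginated search.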
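As a small illustration of how from and size map onto pages, the sketch below turns a 1-based page number into a search body. The helper name page_query is hypothetical.

```python
def page_query(page, per_page=10):
    """Build a match_all search body for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * per_page,  # zero-based offset of the first hit
        "size": per_page,
    }

print(page_query(2))
```

Page 2 with the default page size yields from 10 and size 10, which is the same body as the request for documents 11 through 20 shown above.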

match_all, sorted by account balance in descending order, returning 10 documents (the default size):

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}'

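Executing searches

By default, a search returns the full JSON document for each hit. This is referred to as the source (the _source field in the search hits). If we do not want the full document returned, we can ask for only specific fields from the source.

Return only account_number and balance: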

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}'

This is somewhat similar to the field list of a SQL SELECT ... FROM statement.
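The effect of the _source field list on each hit can be pictured with a plain dictionary projection. This is only a local sketch under a simplifying assumption: it handles top-level field names and nothing else, and the helper name project_source is hypothetical.

```python
def project_source(source, fields):
    """Keep only the requested top-level fields of one hit's source."""
    return {name: source[name] for name in fields if name in source}

doc = {"account_number": 1, "balance": 39225, "firstname": "Amber", "lastname": "Duke"}
print(project_source(doc, ["account_number", "balance"]))
```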

The match query can be thought of as a basic fielded search query, that is, a search against a specific field.

Return the account with account_number 20:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match": { "account_number": 20 } }
}'

Return all accounts whose address contains the term mill:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match": { "address": "mill" } }
}'

Return all accounts whose address contains the term mill or the term lane:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match": { "address": "mill lane" } }
}'

Return all accounts whose address contains the phrase mill lane (match_phrase):

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_phrase": { "address": "mill lane" } }
}'
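The difference between match and match_phrase on "mill lane" can be sketched locally. This is a deliberately simplified model: lowercasing plus whitespace splitting stands in for the real analyzer, and scoring is ignored. Of the sample addresses, only 880 Holmes Lane comes from the dataset output above; the other two are made up for the example.

```python
def tokens(text):
    # Assumption: lowercase + whitespace split stands in for the real analyzer.
    return text.lower().split()

def match(field_value, query):
    """True if the field contains ANY of the query terms."""
    have = set(tokens(field_value))
    return any(term in have for term in tokens(query))

def match_phrase(field_value, query):
    """True if the query terms appear consecutively and in order."""
    words, phrase = tokens(field_value), tokens(query)
    n = len(phrase)
    return n > 0 and any(words[i:i + n] == phrase for i in range(len(words) - n + 1))

# "990 Mill Road" and "198 Mill Lane" are hypothetical addresses.
addresses = ["990 Mill Road", "880 Holmes Lane", "198 Mill Lane"]
any_term = [a for a in addresses if match(a, "mill lane")]
as_phrase = [a for a in addresses if match_phrase(a, "mill lane")]
print(any_term)
print(as_phrase)
```

Under these assumptions all three addresses satisfy the match query, while only the last one satisfies the phrase query.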

The bool query

Return all accounts whose address contains both mill and lane:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}'

must: every condition must match (similar to &&).

Return all accounts whose address contains mill or lane:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}'

should: it is enough for any one condition to match (similar to ||).

Return all accounts whose address contains neither mill nor lane:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}'

must_not: none of the conditions may match (similar to !(a || b), that is, !a && !b).

Return all accounts with age = 40 and state != ID:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}'
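The way must, should and must_not combine can be modelled with ordinary boolean logic. The sketch below is an in-memory approximation, not the Elasticsearch implementation: the names contains and bool_query are hypothetical, the three documents are made up, and scoring is ignored.

```python
def contains(field, term):
    """Leaf predicate: a simplified stand-in for a single-term match clause."""
    return lambda doc: term.lower() in str(doc[field]).lower().split()

def bool_query(must=(), should=(), must_not=()):
    """Combine predicates the way the bool clauses are described above."""
    def matches(doc):
        if not all(clause(doc) for clause in must):
            return False
        if any(clause(doc) for clause in must_not):
            return False
        # Assumption: with no must clauses, at least one should clause must match.
        if should and not must:
            return any(clause(doc) for clause in should)
        return True
    return matches

docs = [
    {"age": 40, "state": "ID"},
    {"age": 40, "state": "TX"},
    {"age": 32, "state": "IL"},
]
forty_not_idaho = bool_query(must=[contains("age", "40")],
                             must_not=[contains("state", "ID")])
id_or_il = bool_query(should=[contains("state", "ID"), contains("state", "IL")])
print([d for d in docs if forty_not_idaho(d)])
print([d for d in docs if id_or_il(d)])
```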

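Executing filters

Each search hit carries a document score (the _score field in the search results). The score is a numeric value that gives a relative measure of how well the document matches the search query. The higher the score, the more relevant the document is to the search; the lower the score, the less relevant.

Every query in Elasticsearch triggers the computation of relevance scores. When relevance scores are not needed, Elasticsearch offers another query capability in the form of filters.

Filters are similar in concept to queries, except that they are optimized for much faster execution, for two main reasons:
1. Filters do not compute scores, so they execute faster than queries.
2. Filters can be cached in memory, which makes repeated searches faster.

To understand filters, first consider the filtered query, which combines a query (such as match_all, match, or bool) with a filter.
As an example, the range filter lets us filter documents by a range of values; it is generally used for numeric or date filtering.

The following example uses a filtered query to return all accounts with balances in [20000, 30000]. In other words, balance >= 20000 && balance <= 30000: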

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}'

Result:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 217,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "4",
      "_score" : 1.0,
      "_source":{"account_number":4,"balance":27658,"firstname":"Rodriquez","lastname":"Flores","age":31,"gender":"F","address":"986 Wyckoff Avenue","employer":"Tourmania","email":"rodriquezflores@tourmania.com","city":"Eastvale","state":"HI"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "9",
      "_score" : 1.0,
      "_source":{"account_number":9,"balance":24776,"firstname":"Opal","lastname":"Meadows","age":39,"gender":"M","address":"963 Neptune Avenue","employer":"Cedward","email":"opalmeadows@cedward.com","city":"Olney","state":"OH"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "11",
      "_score" : 1.0,
      "_source":{"account_number":11,"balance":20203,"firstname":"Jenkins","lastname":"Haney","age":20,"gender":"M","address":"740 Ferry Place","employer":"Qimonk","email":"jenkinshaney@qimonk.com","city":"Steinhatchee","state":"GA"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "42",
      "_score" : 1.0,
      "_source":{"account_number":42,"balance":21137,"firstname":"Harding","lastname":"Hobbs","age":26,"gender":"F","address":"474 Ridgewood Place","employer":"Xth","email":"hardinghobbs@xth.com","city":"Heil","state":"ND"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "54",
      "_score" : 1.0,
      "_source":{"account_number":54,"balance":23406,"firstname":"Angel","lastname":"Mann","age":22,"gender":"F","address":"229 Ferris Street","employer":"Amtas","email":"angelmann@amtas.com","city":"Calverton","state":"WA"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "66",
      "_score" : 1.0,
      "_source":{"account_number":66,"balance":25939,"firstname":"Franks","lastname":"Salinas","age":28,"gender":"M","address":"437 Hamilton Walk","employer":"Cowtown","email":"frankssalinas@cowtown.com","city":"Chase","state":"VT"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "92",
      "_score" : 1.0,
      "_source":{"account_number":92,"balance":26753,"firstname":"Gay","lastname":"Brewer","age":34,"gender":"M","address":"369 Ditmars Street","employer":"Savvy","email":"gaybrewer@savvy.com","city":"Moquino","state":"HI"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "100",
      "_score" : 1.0,
      "_source":{"account_number":100,"balance":29869,"firstname":"Madden","lastname":"Woods","age":32,"gender":"F","address":"696 Ryder Avenue","employer":"Slumberia","email":"maddenwoods@slumberia.com","city":"Deercroft","state":"ME"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "105",
      "_score" : 1.0,
      ...

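The filtered query contains a match_all query (the query part) and a range filter (the filter part). Any other query can be substituted into the query part, and any other filter into the filter part. In the case above, a range filter makes perfect sense, because all documents that fall within the range match "equally": no document is more relevant than another.

In general, the easiest way to decide between a filter and a query is to ask whether you care about the relevance score. If relevance is not important, use a filter; otherwise, use a query.
Queries and filters are similar in concept to the WHERE clause of a SQL SELECT statement.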
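A filter behaves like a yes/no test on each document, which is why no score is involved. The sketch below models a range filter as a plain predicate over four accounts whose balances appear in the outputs above. The helper name range_filter is hypothetical, and the code is a local approximation rather than how Elasticsearch evaluates filters.

```python
import operator

# Comparison operators accepted by a range clause.
OPS = {"gte": operator.ge, "gt": operator.gt, "lte": operator.le, "lt": operator.lt}

def range_filter(field, **bounds):
    """Return a yes/no predicate; unlike a query, it computes no score."""
    def keep(doc):
        return all(OPS[name](doc[field], limit) for name, limit in bounds.items())
    return keep

accounts = [
    {"account_number": 1, "balance": 39225},
    {"account_number": 4, "balance": 27658},
    {"account_number": 6, "balance": 5686},
    {"account_number": 11, "balance": 20203},
]
in_range = range_filter("balance", gte=20000, lte=30000)
kept = [a["account_number"] for a in accounts if in_range(a)]
print(kept)
```

Accounts 4 and 11 pass the test, consistent with both of them appearing in the filtered result above.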

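Executing aggregations

Aggregations provide the ability to group your data and extract statistics from it.
They are roughly equivalent to SQL GROUP BY and the SQL aggregate functions.

In Elasticsearch, you can run a search that returns hits and, at the same time, aggregated results that are separate from the hits, all in one response. Being able to run queries and multiple aggregations through one concise API and get all the results back in a single call is efficient, because it avoids extra network round trips.

Group all accounts by state and return the top 10 states, sorted by count in descending order (the default):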

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state"
      }
    }
  }
}'

Result:
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "buckets" : [ {
        "key" : "al",
        "doc_count" : 21
      }, {
        "key" : "tx",
        "doc_count" : 17
      }, {
        "key" : "id",
        "doc_count" : 15
      }, {
        "key" : "ma",
        "doc_count" : 15
      }, {
        "key" : "md",
        "doc_count" : 15
      }, {
        "key" : "pa",
        "doc_count" : 15
      }, {
        "key" : "dc",
        "doc_count" : 14
      }, {
        "key" : "me",
        "doc_count" : 14
      }, {
        "key" : "mo",
        "doc_count" : 14
      }, {
        "key" : "nd",
        "doc_count" : 14
      } ]
    }
  }
}

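In SQL, this is conceptually similar to:

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC

size is set to 0 so that no search hits are returned, because we only want to see the aggregation results.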
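For readers who think in code rather than SQL, the same grouping can be sketched in memory with a counter. The helper name terms_agg and the six sample documents are made up for illustration; the default of 10 buckets mirrors the top 10 behaviour described above.

```python
from collections import Counter

def terms_agg(docs, field, size=10):
    """Count documents per field value and keep the `size` largest buckets."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]

# Hypothetical sample: state values only.
docs = [{"state": s} for s in ["al", "tx", "al", "id", "tx", "al"]]
buckets = terms_agg(docs, "state")
print(buckets)
```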

Compute the average account balance per state (again only for the top 10 states, sorted by count in descending order):

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}'

Notice how the average_balance aggregation is nested inside the group_by_state aggregation. This is a common pattern for all aggregations. You can nest aggregations inside aggregations arbitrarily to extract the pivoted summaries you need from your data.

Sort by average balance in descending order:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}'
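The nested pattern (bucket by state, compute an average inside each bucket, then order the buckets by that average) can be sketched in memory as follows. The helper name avg_by is hypothetical, the four documents reuse states and balances from the outputs above, and the bucket layout is simplified compared with a real aggregation response.

```python
from collections import defaultdict

def avg_by(docs, group_field, value_field, descending=True):
    """Bucket docs by group_field, average value_field, sort buckets by the average."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    buckets = [
        {"key": key, "doc_count": len(vals), "average_balance": sum(vals) / len(vals)}
        for key, vals in groups.items()
    ]
    return sorted(buckets, key=lambda b: b["average_balance"], reverse=descending)

docs = [
    {"state": "HI", "balance": 27658},
    {"state": "HI", "balance": 26753},
    {"state": "OH", "balance": 24776},
    {"state": "GA", "balance": 20203},
]
ranked = avg_by(docs, "state", "balance")
for bucket in ranked:
    print(bucket)
```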

Group by age bracket (ages 20-29, 30-39, and 40-49), then by gender, and finally compute the average account balance per age bracket, per gender:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}'
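The three levels of this request (age range, then gender, then average balance) can be traced with a small in-memory sketch. It relies on the range aggregation treating from as inclusive and to as exclusive, which is why ages 20-29 map to the range 20 to 30. The function name age_gender_avg and the bucket labels such as "20-30" are assumptions for illustration; the six documents reuse ages, genders and balances from the outputs above.

```python
from collections import defaultdict

RANGES = [(20, 30), (30, 40), (40, 50)]  # same brackets as the request above

def age_gender_avg(docs):
    """Nest gender buckets inside age-range buckets and average balance in each."""
    result = {}
    for low, high in RANGES:
        by_gender = defaultdict(list)
        for doc in docs:
            # "from" is inclusive and "to" is exclusive.
            if low <= doc["age"] < high:
                by_gender[doc["gender"]].append(doc["balance"])
        result[f"{low}-{high}"] = {
            gender: sum(vals) / len(vals) for gender, vals in by_gender.items()
        }
    return result

docs = [
    {"age": 29, "gender": "F", "balance": 16623},
    {"age": 20, "gender": "M", "balance": 20203},
    {"age": 32, "gender": "M", "balance": 39225},
    {"age": 36, "gender": "M", "balance": 5686},
    {"age": 32, "gender": "F", "balance": 29869},
    {"age": 31, "gender": "F", "balance": 27658},
]
summary = age_gender_avg(docs)
print(summary)
```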

0 0