Using _all, _source, store, and index

Source: Internet. Editor: 程序博客网. Time: 2024/05/16 09:04

1. _all

1.1 _all field

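The _all field is a special, rarely needed field: it concatenates the values of all other fields into one big string, using a space as the delimiter. That string is analyzed and indexed, but not stored. It is useful when you do not know the structure of your documents. For example, index this document: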

curl -XPUT 'http://127.0.0.1:9200/myindex/order/0508' -d '{
    "name": "Scott",
    "age": "24"
}'

Search it through the _all field:

curl -XGET "http://127.0.0.1:9200/myindex/order/_search?pretty" -d '{
    "query": {
        "match": {
            "_all": "Scott 24"
        }
    }
}'

Or use query_string, which searches _all by default:

curl -XGET "http://127.0.0.1:9200/myindex/order/_search?pretty" -d '{
    "query": {
        "query_string": {
            "query": "Scott 24"
        }
    }
}'

Output:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2712221,
    "hits" : [ {
      "_index" : "myindex",
      "_type" : "order",
      "_id" : "0508",
      "_score" : 0.2712221
    } ]
  }
}

Note: because _all is a single analyzed string, values lose their original types. A date, for instance, is analyzed into separate year, month, and day terms. Take this document:

{
  "first_name":    "John",
  "last_name":     "Smith",
  "date_of_birth": "1970-10-24"
}

curl -XGET "http://127.0.0.1:9200/myindex/order/_search?pretty" -d '{
    "query": {
        "match": {
            "_all": "john smith 1970"
        }
    }
}'

The _all field will contain the terms ["john", "smith", "1970", "10", "24"].
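To make this concrete, here is a minimal Python sketch of how an _all value could be assembled and tokenized. It is not Elasticsearch code: build_all and rough_analyze are hypothetical helpers, and the regular expression only approximates the default analyzer for simple ASCII input like this document.

```python
import re


def build_all(doc):
    """Join every field value into one space-delimited string."""
    return " ".join(str(value) for value in doc.values())


def rough_analyze(text):
    """Rough stand-in for the default analyzer: lowercase the text, then
    split on anything that is not an ASCII letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


doc = {
    "first_name": "John",
    "last_name": "Smith",
    "date_of_birth": "1970-10-24",
}

all_value = build_all(doc)
terms = rough_analyze(all_value)
print(all_value)
print(terms)
```

Under these assumptions the date no longer exists as a date inside _all, only as three independent terms, which is why a match on "1970" alone finds the document.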

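In other words, the _all field is just an analyzed string field. It analyzes its values with the default analyzer, regardless of which analyzer is configured on the fields the values came from. And, like any string field, you can configure the analyzer that _all uses: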

PUT /myindex/order/_mapping
{
    "order": {
        "_all": { "analyzer": "whitespace" }
    }
}
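The effect of that setting can be sketched in Python. The whitespace analyzer only splits on whitespace and does not lowercase, so the _all terms for the John Smith document change. whitespace_terms is a hypothetical helper used for illustration, not an Elasticsearch API.

```python
def whitespace_terms(text):
    """Mimic the whitespace analyzer: split on whitespace, keep case."""
    return text.split()


# The space-delimited _all value of the John Smith document.
all_value = "John Smith 1970-10-24"
terms = whitespace_terms(all_value)
print(terms)

# Single-term lookups against the resulting terms.
print("1970" in terms)        # the date is no longer split into parts
print("1970-10-24" in terms)  # the whole date is now one term
print("john" in terms)        # case is preserved, so lowercase misses
```

So the choice of analyzer for _all directly decides which query terms can match.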

1.2 Disable _all field

The _all field costs extra CPU cycles and more disk space. If you do not need it, it is best to disable it:

curl -XPUT 'http://127.0.0.1:9200/myindex/order/_mapping' -d '{
    "order": {
        "_all": {
            "enabled": false
        },
        "properties": {
            .......
        }
    }
}'

1.3 Excluding fields from _all

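Rather than disabling _all entirely, you may want it to contain only certain fields. The include_in_all option controls whether a field is included in _all; it defaults to true. Setting include_in_all on an object changes the default for every field inside that object. For example, to have _all contain only name: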

PUT /myindex/order/_mapping
{
    "order": {
        "include_in_all": false,
        "properties": {
            "name": {
                "type": "string",
                "include_in_all": true
            },
            ...
        }
    }
}
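The way the object-level default and the field-level override combine can be sketched in Python. fields_in_all is a hypothetical helper that only looks at top-level fields, and the age field is made up to show the default at work; neither comes from Elasticsearch.

```python
def fields_in_all(type_mapping):
    """Return the names of top-level fields that would feed _all.

    The type-level include_in_all acts as the default (true when absent);
    a field-level include_in_all overrides it.
    """
    default = type_mapping.get("include_in_all", True)
    return [
        name
        for name, field in type_mapping.get("properties", {}).items()
        if field.get("include_in_all", default)
    ]


# "age" is a made-up second field, added only for illustration.
order_mapping = {
    "include_in_all": False,
    "properties": {
        "name": {"type": "string", "include_in_all": True},
        "age": {"type": "string"},
    },
}

included = fields_in_all(order_mapping)
print(included)
```

Only name opts back in; age inherits the object-level false and stays out of _all.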

2. _source

2.1 Disable _source field

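Elasticsearch represents the document body as a JSON string and keeps it in the _source field. Like other stored fields, _source is compressed before it is written to disk. The _source field is not indexed, so it cannot be searched; it is stored, however, so it still takes up disk space. You can disable it: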

curl -XPUT 'http://127.0.0.1:9200/myindex/order/_mapping' -d '{
    "order": {
        "_source": {
            "enabled": false
        },
        "properties": {
            ......
        }
    }
}'

With _source disabled, however, the following are no longer supported:

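Update requests,
On-the-fly highlighting,
Reindexing from one Elasticsearch index to another, whether to change the mapping or analysis or to upgrade the index to a new version,
Debugging queries and aggregations by viewing the original document as it was at index time,
Automatic repair of index corruption, which may become possible in the future.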
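The first item follows from how a partial update works: Elasticsearch reads the existing _source, merges the changes into it, and reindexes the result. A rough Python sketch of that flow, with a hypothetical partial_update helper and a shallow merge standing in for the real logic:

```python
def partial_update(stored_source, changes):
    """Merge `changes` into the stored document body.

    `stored_source` is None when _source is disabled, in which case
    there is nothing to merge into and the update cannot proceed.
    """
    if stored_source is None:
        raise ValueError("_source is disabled: no document body to update")
    merged = dict(stored_source)
    merged.update(changes)  # shallow merge, enough for this sketch
    return merged


updated = partial_update({"name": "Scott", "age": "24"}, {"age": "25"})
print(updated)

failure = None
try:
    partial_update(None, {"age": "25"})
except ValueError as err:
    failure = str(err)
print(failure)
```

Reindexing depends on the stored body in the same way: without it there is no document to feed into the new index.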

If disk space is the concern, you can raise the compression level instead of disabling _source.

2.2 Including / Excluding fields from _source

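The contents of _source can be pruned after the document has been indexed but before _source is stored. Removing fields from _source has downsides similar to disabling it, in particular that you cannot reindex from one Elasticsearch index to another. Source filtering at request time is an alternative. The official documentation gives this example of pruning with includes and excludes: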

PUT logs
{
  "mappings": {
    "event": {
      "_source": {
        "includes": [
          "*.count",
          "meta.*"
        ],
        "excludes": [
          "meta.description",
          "meta.other.*"
        ]
      }
    }
  }
}

PUT logs/event/1
{
  "requests": {
    "count": 10,
    "foo": "bar"
  },
  "meta": {
    "name": "Some metric",
    "description": "Some metric description",
    "other": {
      "foo": "one",
      "baz": "two"
    }
  }
}

GET logs/event/_search
{
  "query": {
    "match": {
      "meta.other.foo": "one"
    }
  }
}
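What ends up in the stored _source for this document can be approximated in Python. This is a sketch under simplifying assumptions: prune_source and flatten are hypothetical helpers, patterns are matched against dotted leaf paths with fnmatch-style wildcards, and the matching rules in Elasticsearch itself may differ in edge cases.

```python
from fnmatch import fnmatchcase


def flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf in a nested dict."""
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def prune_source(doc, includes, excludes):
    """Keep leaves that match an include pattern and no exclude pattern."""
    kept = {}
    for path, value in flatten(doc):
        if includes and not any(fnmatchcase(path, p) for p in includes):
            continue
        if any(fnmatchcase(path, p) for p in excludes):
            continue
        # Rebuild the nested structure for the surviving leaf.
        node = kept
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return kept


doc = {
    "requests": {"count": 10, "foo": "bar"},
    "meta": {
        "name": "Some metric",
        "description": "Some metric description",
        "other": {"foo": "one", "baz": "two"},
    },
}

stored = prune_source(
    doc,
    includes=["*.count", "meta.*"],
    excludes=["meta.description", "meta.other.*"],
)
print(stored)
```

The search on meta.other.foo in the example still matches, because pruning only affects what is stored in _source, not what is indexed.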

Even with _source enabled (the default), you can ask for only specific fields by restricting _source in the request:

GET /_search
{
    "query":   { "match_all": {}},
    "_source": [ "title", "created" ]
}

3. store

store is a property of a field, for example:

curl -XPUT 'http://127.0.0.1:9200/myindex/order/_mapping' -d '{
    "order": {
        ......
        "properties": {
            "name": {"type": "string", "store": "no", ......},
            ......
        }
    }
}'

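Fields marked as stored are kept separately from the index so that they can be retrieved quickly. Storing a field takes disk space but reduces computation at retrieval time. store accepts yes/no or true/false; the default is no (false).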

Stored fields can be retrieved as follows (for several fields, use fields=f1,f2,f3...):

curl -XGET 'http://hadoop:9200/myindex/order/0508?fields=age&pretty=true'
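If you build such URLs in code, a small helper keeps the comma-separated field list intact. This is only a sketch: stored_fields_url is a hypothetical function, the name field in the example call is an assumption, and no request is actually sent.

```python
from urllib.parse import urlencode


def stored_fields_url(base, index, doc_type, doc_id, fields, pretty=True):
    """Build a GET URL that asks for specific fields of one document."""
    params = {"fields": ",".join(fields)}
    if pretty:
        params["pretty"] = "true"
    # safe="," leaves the commas in the field list unescaped.
    query = urlencode(params, safe=",")
    return f"{base}/{index}/{doc_type}/{doc_id}?{query}"


url = stored_fields_url(
    "http://hadoop:9200", "myindex", "order", "0508", ["age", "name"]
)
print(url)
```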

4. index

Like store, index is a property of a field. It configures how the field is indexed, and its default value is analyzed. index accepts three values:

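no: the field is not indexed and therefore cannot be searched. This is useful for fields you never need to search.
analyzed: the field is run through the configured analyzer before it is indexed. With the Elasticsearch default, the StandardAnalyzer, the value is typically tokenized and lowercased.
not_analyzed: the field is indexed, but its value is not analyzed. It is indexed as-is as a single token, which is how the KeywordAnalyzer treats a field.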

curl -XPUT 'http://127.0.0.1:9200/myindex/order/_mapping' -d '{
    "order": {
        ......
        "properties": {
            "name": {"type": "string", "index": "no", ......},
            ......
        }
    }
}'
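The difference between the three values is easiest to see in the terms each one produces. The following Python sketch is an approximation only: indexed_terms is a hypothetical helper, the regular expression is a rough stand-in for the standard analyzer on ASCII text, and "Scott Tiger" is a made-up value.

```python
import re


def indexed_terms(value, index="analyzed"):
    """Return the terms a string value would contribute to the index."""
    if index == "no":
        return []  # nothing is indexed, so nothing can match
    if index == "not_analyzed":
        return [value]  # the exact value, as one term
    if index == "analyzed":
        # Rough stand-in for the standard analyzer on ASCII text.
        return re.findall(r"[a-z0-9]+", value.lower())
    raise ValueError(f"unknown index option: {index!r}")


for mode in ("no", "analyzed", "not_analyzed"):
    print(mode, indexed_terms("Scott Tiger", mode))
```

With not_analyzed, only an exact match on the whole value "Scott Tiger" can hit; with analyzed, a search for "scott" or "tiger" can.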

0 0