fielddata --> doc values

Source: Internet. Editor: 程序博客网. Time: 2024/05/17 23:03

“fast, efficient and memory-friendly”

Official guide

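Column-oriented, forward (document-to-value) structure. When resources are constrained, doc values rely on the OS's file system cache instead of the JVM heap (no GC). Lock memory?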

inverted index --> search
doc values --> sort/agg/script/parent-child/geo filter etc.,
i.e. anything that needs to look up the value contained in a specific document.
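The difference between the two structures can be sketched in a few lines of Python. This is only a toy model, not how Lucene stores either structure: the corpus, `build_inverted_index`, and `build_doc_values` are hypothetical names made up for illustration. It shows the inverted index answering "which documents contain this term?" and the column answering "which values does this document hold?".

```python
from collections import defaultdict

# Toy corpus (hypothetical): doc id -> terms of a single field.
docs = {
    0: ["brown", "fox"],
    1: ["brown", "dog"],
    2: ["quick", "fox"],
}

def build_inverted_index(docs):
    """term -> list of doc ids, in ascending doc id order."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in docs[doc_id]:
            index[term].append(doc_id)
    return dict(index)

def build_doc_values(docs):
    """Column indexed by doc id -> terms (assumes dense ids 0..n-1)."""
    return [docs[doc_id] for doc_id in sorted(docs)]

inverted = build_inverted_index(docs)
column = build_doc_values(docs)

# Search: the inverted index finds the docs that contain "brown".
matching = inverted["brown"]

# Aggregation: the column gives the terms of each matching doc.
counts = defaultdict(int)
for doc_id in matching:
    for term in column[doc_id]:
        counts[term] += 1
```

Answering the aggregation from the inverted index alone would mean scanning every term to see whether it points at a matching document, which is why a second, document-keyed structure is kept.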

Doc values are generated on a per-segment basis and are immutable.
And, like the inverted index, doc values are serialized to disk.

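ES memory config: a 4-16 GB heap on a 64 GB machine. Doc values reduce the heap that is needed.

Column-store compression: saves disk space and makes access faster. CPU is rarely the bottleneck.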

Disabling Doc Values:
Doc values are enabled by default for all fields except analyzed strings. To turn them off for a field:

    doc_values: false

Special configuration:

  "doc_values": true,  "index": "no"

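With this configuration the field can be aggregated but not searched, because doc values are written while no inverted index is built (make sure you understand why this works).

Never aggregate on analyzed strings, because (1) the number of tokens is unknown and (2) doc values are not enabled for them.

Need both analysis and aggregation on the same field?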

  "state" : {    "type": "string",    "fields": {      "raw" : {        "type": "string",        "index": "not_analyzed"      }    }  }

Doc values are most efficient when each document has one or several tokens, but not thousands.

Doc values are not generated for analyzed strings, yet these fields can still be used in aggregations. How is that possible?
Through fielddata, which is built and managed 100% in memory, living inside the JVM heap.
This is another reason to avoid aggregating analyzed fields: high-cardinality fields consume a large amount of memory when loaded into fielddata.

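Limiting memory usage:
Once analyzed strings have been loaded into fielddata, they will sit there until evicted (or your node crashes).
Fielddata is loaded lazily, on a per-field basis, but loading a field loads its values for all documents, not only those that match the query.

$ES_HEAP_SIZE: no more than 50% of available RAM, and no more than 32 GB.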
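The two limits on the heap combine into a simple ceiling. A minimal sketch, where `max_heap_gb` is a hypothetical helper and not an Elasticsearch setting; it gives an upper bound only, not a recommended value:

```python
def max_heap_gb(ram_gb, cap_gb=32):
    """Upper bound for the heap: half of RAM, but never more than cap_gb."""
    return min(ram_gb * 0.5, cap_gb)
```

For example, `max_heap_gb(16)` gives 8 GB, while any machine with 64 GB of RAM or more is capped at 32 GB.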

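Fielddata Size:
indices.fielddata.cache.size is unbounded by default. When a limit is set and exceeded, previously loaded values are evicted.
This setting is a safeguard, not a solution for insufficient memory.
Loading fielddata is slow, causes heavy disk I/O, and leaves a lot of garbage for the GC to collect.
With time-series data, the fielddata of older indices is no longer used but still stays in memory.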

    indices.fielddata.cache.size:  20%

The least recently used fielddata will be evicted.
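The eviction policy can be illustrated with a small size-bounded LRU cache. This is a sketch of the general technique only, not Elasticsearch's implementation; the `FielddataCache` class, its byte accounting, and the field names are all hypothetical.

```python
from collections import OrderedDict

class FielddataCache:
    """Size-bounded cache that evicts the least recently used field first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # field -> size in bytes, oldest first

    def load(self, field, size_bytes):
        """Load (or touch) a field; return the list of fields evicted."""
        if field in self._entries:
            # Already cached: mark as most recently used.
            self._entries.move_to_end(field)
            return []
        self._entries[field] = size_bytes
        self.used_bytes += size_bytes
        evicted = []
        # The size is checked after loading; evict oldest entries until
        # usage fits, but never evict the entry that was just loaded.
        while self.used_bytes > self.max_bytes and len(self._entries) > 1:
            old_field, old_size = self._entries.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_field)
        return evicted

cache = FielddataCache(max_bytes=100)
cache.load("tag", 60)
cache.load("title", 30)
cache.load("tag", 60)                  # touch: "tag" is now the most recent
evicted = cache.load("body", 40)       # over the limit, so "title" is evicted
```

Note that the new entry is already in memory when the limit is checked, so a single oversized load is never prevented by this cache alone.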

Monitoring Fielddata:

    per-index:           _stats/fielddata?fields=*
    per-node:            _nodes/stats/indices/fielddata?fields=*
    per-index per-node:  _nodes/stats/indices/fielddata?level=indices&fields=*

With ?fields=*, the memory usage is broken down for each field.

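Circuit Breaker:
The fielddata size is checked only after the data has been loaded, so a query that tries to load more than the available memory could end in an OutOfMemoryException.
The fielddata circuit breaker is designed to deal with this situation.
It estimates the memory a query will need; if loading would push usage over the threshold, the circuit breaker is tripped and the query is aborted and returns an exception.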

The available circuit breakers, which ensure memory limits are not exceeded:

indices.breaker.fielddata.limit: 60%
indices.breaker.request.limit: 40%
indices.breaker.total.limit: 70% (wraps fielddata and request…)
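The interaction between the three limits can be sketched as follows. This is an illustrative model under assumptions, not Elasticsearch's code: `HEAP_BYTES`, `check_breakers`, and `CircuitBreakingError` are hypothetical names, and the percentages are the values listed above. The point is that the estimate is checked before anything is loaded, and that the total limit can trip even when each child breaker is within its own limit.

```python
class CircuitBreakingError(Exception):
    """Raised when an estimated allocation would exceed a breaker limit."""

HEAP_BYTES = 1000  # hypothetical heap size, small numbers for readability
LIMIT_PERCENT = {"fielddata": 60, "request": 40, "total": 70}

def limit_bytes(name):
    return HEAP_BYTES * LIMIT_PERCENT[name] // 100

def check_breakers(fielddata_used, request_used, estimate, kind):
    """Check an estimated allocation of `kind` before it is actually made."""
    used = {"fielddata": fielddata_used, "request": request_used}
    used[kind] += estimate
    # Child breakers: each kind has its own limit.
    for name in ("fielddata", "request"):
        if used[name] > limit_bytes(name):
            raise CircuitBreakingError(f"{name} breaker tripped")
    # Parent breaker: the combined usage has a lower limit than 60% + 40%.
    if used["fielddata"] + used["request"] > limit_bytes("total"):
        raise CircuitBreakingError("total breaker tripped")

# 300 used + 200 estimated = 500, within the 600-byte fielddata limit.
check_breakers(fielddata_used=300, request_used=0, estimate=200, kind="fielddata")
```

With 500 bytes of fielddata and 100 bytes of request memory in use, a further 150-byte request stays within the request limit (250 of 400) but pushes the combined usage to 750, over the 700-byte total limit.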

Fielddata Filtering:

    PUT /music/_mapping/song
    {
      "properties": {
        "tag": {
          "type": "string",
          "fielddata": {                 // keyword: controls how fielddata is handled for this field
            "filter": {
              "frequency": {             // filter based on term frequencies
                "min": 0.01,             // load only terms that occur in at least 1% of documents in this segment
                "min_segment_size": 500  // ignore any segments that have fewer than 500 documents
              }
            }
          }
        }
      }
    }
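The effect of the `min` frequency can be sketched per segment in Python. This is a simplified model with made-up data: `terms_to_load` is a hypothetical helper, the frequency is taken as the fraction of documents in the segment that contain the term, and `min_segment_size` is deliberately left out of the model.

```python
from collections import Counter

def terms_to_load(segment_docs, min_freq):
    """Return the terms whose document frequency in this segment is >= min_freq.

    segment_docs: list of term lists, one entry per document in the segment.
    """
    doc_count = len(segment_docs)
    # Count each term at most once per document.
    doc_freq = Counter(term for terms in segment_docs for term in set(terms))
    return {term for term, df in doc_freq.items() if df / doc_count >= min_freq}

# Hypothetical segment of 200 documents:
# "rock" appears in 100 docs, "jazz" in 4, "rare" in 1.
segment = [["rock"]] * 96 + [["rock", "jazz"]] * 4 + [["rare"]] + [[]] * 99
loaded = terms_to_load(segment, min_freq=0.01)
```

Here `rare` occurs in 0.5% of the documents and is dropped, so the long tail of infrequent terms never reaches memory, at the cost of those terms being invisible to aggregations on this field.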

0 0