ES cache: fielddata (avoid) and doc values in detail (continuously updated)
Source: Internet. Editor: 程序博客网. Time: 2024/05/17 07:56
Fielddata (key topic)
- Triggered by: sort, aggs, script
- Drawback: it lives on the Java heap
It is one of the main causes of cluster instability: the most frequent cause of the highest-severity issues.
Understanding fielddata
Inverted index
The inverted index is what makes ES queries fast.
This data structure holds a sorted list of all the unique terms that appear in a field, and each term points to the list of documents that contain that term.
The inverted index answers a "contains" question, not an "equals" question: which documents contain a specific term?
Why sort, aggs, and script cannot use it
The question they need answered is: which terms does field xx of doc1 contain?
The data structure needed:
Docs:  Terms:
----------------------------
1      [ brown ]
2      [ brown, fox, quick ]
This is the purpose of fielddata.
- It is generated at query time by reading the inverted index, inverting the term <-> docs data structure, and storing the result (doc <-> terms) in memory.
- Loading is slow, especially with big segments. It is expensive: light use helps performance, heavy use hurts.
- It consumes a lot of heap space.
- It is loaded on demand, per segment.
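The inversion described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the inverted index is a plain dict from term to document ids, and the function name `uninvert` is hypothetical. Real fielddata is built per segment from Lucene structures, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict


def uninvert(inverted_index):
    """Turn a term -> [doc ids] mapping into a doc id -> [terms] mapping."""
    doc_to_terms = defaultdict(list)
    # Walk terms in sorted order so each document's term list stays sorted.
    for term in sorted(inverted_index):
        for doc_id in inverted_index[term]:
            doc_to_terms[doc_id].append(term)
    return dict(doc_to_terms)


# Toy inverted index matching the two-document example above.
inverted_index = {"brown": [1, 2], "fox": [2], "quick": [2]}

fielddata = uninvert(inverted_index)
print(fielddata)  # {1: ['brown'], 2: ['brown', 'fox', 'quick']}
```

Every posting of every term has to be visited to build the result, which hints at why loading is slow for big segments and why the result is costly to keep on the heap.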
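Eviction happens when:
- the relevant index is deleted or closed
- a segment is removed (by a merge); the data moves rather than goes away (to be confirmed)
- the node restarts
- the relevant fielddata cache is cleared
Automatic eviction to make room for other fielddata does not happen by default, because the cache is unbounded.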
Evicting fielddata when the cache is full leads to a different issue: one request triggers fielddata loading for one field and the next request triggers loading for another, causing the first field to be evicted. This causes memory thrashing and slow garbage collections, so queries become very slow while they wait for their fielddata to be loaded.
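The thrashing pattern can be illustrated with a small simulation. It is a sketch, not Elasticsearch's cache: the class `BoundedFielddataCache`, the LRU policy, and measuring capacity in "number of fields" are all assumptions made for illustration.

```python
from collections import OrderedDict


class BoundedFielddataCache:
    """Toy LRU cache that counts how often fielddata has to be (re)loaded."""

    def __init__(self, capacity):
        self.capacity = capacity      # max number of fields held at once
        self.entries = OrderedDict()  # field name -> loaded fielddata
        self.loads = 0                # number of expensive loads performed

    def get(self, field):
        if field in self.entries:
            self.entries.move_to_end(field)   # mark as most recently used
            return self.entries[field]
        self.loads += 1                       # cache miss: expensive load
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[field] = f"fielddata({field})"
        return self.entries[field]


def count_loads(capacity, requests):
    cache = BoundedFielddataCache(capacity)
    for field in requests:
        cache.get(field)
    return cache.loads


# Requests alternate between two fields, as in the scenario above.
requests = ["catId", "note"] * 5

print(count_loads(1, requests))  # 10: every request reloads its field
print(count_loads(2, requests))  # 2: each field is loaded once
```

With room for only one field, each request evicts what the next request needs, so all ten requests pay the loading cost.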
Fielddata does not go away on its own. In Elasticsearch 1.3 and later, fielddata may consume up to 60% of the Java heap on each node; prior to Elasticsearch 1.3 it was unlimited.
We control this via the Fielddata Circuit Breaker, which checks incoming requests for potential fielddata usage and then blocks them if they require more memory than is currently available.
Any circuit breaker's purpose is to reject bad requests so that they never get the chance to cause a problem (e.g., allocating even more fielddata), but it is important to note that it will not clear any existing fielddata.
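As a rough mental model, the breaker can be sketched as a check made before loading. The class and exception names below are hypothetical, and the estimate is simply passed in as a number; this is not how Elasticsearch implements or estimates it. The 60% default is the figure quoted above.

```python
class CircuitBreakingError(Exception):
    """Raised when a request would push fielddata past the limit."""


class FielddataBreaker:
    """Toy breaker: rejects new loads, never frees what is already loaded."""

    def __init__(self, heap_bytes, limit_percent=60):
        # Integer arithmetic avoids floating-point rounding on the limit.
        self.limit = heap_bytes * limit_percent // 100
        self.used = 0

    def add_estimate(self, estimated_bytes):
        """Reserve memory for a load, or reject the request outright."""
        if self.used + estimated_bytes > self.limit:
            raise CircuitBreakingError(
                f"would use {self.used + estimated_bytes} bytes, "
                f"limit is {self.limit}"
            )
        self.used += estimated_bytes


breaker = FielddataBreaker(heap_bytes=1000)  # limit is 600 bytes
breaker.add_estimate(500)                    # accepted
try:
    breaker.add_estimate(200)                # 700 > 600, so rejected
except CircuitBreakingError as error:
    print("rejected:", error)
print(breaker.used)  # 500: the existing fielddata is untouched
```

The rejected request leaves `used` unchanged, which mirrors the point that the breaker blocks new allocations but does not clear existing fielddata.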
Monitoring
List each node with its fielddata usage:
# curl 172.28.141.11:9240/_cat/fielddata?v
node                   total   catId    note  quantity  jWareId
node-87.14             0b      0b       0b    0b        0b
dm_172.28.141.11:9240  65.5mb  1.2mb    0b    1.2mb     13.8mb
node-gw-87.11          0b      0b       0b    0b        0b
node-87.12             0b      0b       0b    0b        0b
dm_172.28.141.15:9240  80.6mb  993.7kb  0b    1011.8kb  17mb
dm_172.28.141.13:9240  68.8mb  1.3mb    0b    1.3mb     14.7mb
d_172.20.71.30:9240    75.6mb  859.9kb  0b    923.5kb   16mb
gw_172.28.159.29:9203  0b      0b       0b    0b        0b
node-87.13             0b      0b       0b    0b        0b
gw_172.28.159.12:9203  0b      0b       0b    0b        0b
node-gw-87.12          0b      0b       0b    0b        0b
Some columns (id, host, ip, node) and the fielddata of other fields are omitted here. Add ?fields=catId,note to query specific fields only.
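To find the heaviest node from such a listing, the text output can be parsed. The sketch below assumes the column layout shown in this post (node name first, then one size column per field); other Elasticsearch versions format `_cat/fielddata` differently. The helper names are hypothetical and the sample rows are copied from the listing above.

```python
# Two-letter suffixes come first so "mb" is matched before the bare "b".
UNIT_BYTES = {"tb": 1024**4, "gb": 1024**3, "mb": 1024**2, "kb": 1024, "b": 1}


def parse_size(text):
    """Convert a size string such as '65.5mb' or '0b' to bytes."""
    value = text.strip().lower()
    for suffix, factor in UNIT_BYTES.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    raise ValueError(f"unrecognized size: {text!r}")


def parse_cat_fielddata(output):
    """Map node name -> {column name: bytes} for a whitespace-separated table."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    usage = {}
    for line in lines[1:]:
        cells = line.split()
        node, sizes = cells[0], cells[1:]
        usage[node] = {
            column: parse_size(cell) for column, cell in zip(header[1:], sizes)
        }
    return usage


sample = """
node                   total   catId    note  quantity  jWareId
node-87.14             0b      0b       0b    0b        0b
dm_172.28.141.11:9240  65.5mb  1.2mb    0b    1.2mb     13.8mb
dm_172.28.141.15:9240  80.6mb  993.7kb  0b    1011.8kb  17mb
"""

usage = parse_cat_fielddata(sample)
top_node = max(usage, key=lambda node: usage[node]["total"])
print(top_node)  # dm_172.28.141.15:9240
```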
References
Support in the Wild: My Biggest Elasticsearch Problem at Scale (Chris Earle)
Field Data: The Most Common Cause of Elasticsearch Cluster Instability at Scale
fielddata -> doc values
Reference guide: https://www.elastic.co/guide/en/elasticsearch/guide/2.x/docvalues.html
Without repeating too much from the guide, doc values offload this burden by writing the fielddata to disk at index time, thereby allowing Elasticsearch to load the values outside of your Java heap as they are needed.
The values are served through the file system cache (Linux), which gives in-memory performance without the cost of garbage collections.
How to configure doc values
They are enabled by default in v2.x. To configure them manually:
"doc_values" : true // never set this to true on an analyzed string field
Drawbacks of doc values
Cannot be used with analyzed strings.
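The manual setting and this restriction can be combined in a small mapping builder. This is a sketch assuming 2.x-style `string` fields; `string_field` is a hypothetical helper, and treating `catId` as a not_analyzed string is an assumption, since the post does not give its mapping.

```python
import json


def string_field(analyzed, doc_values=None):
    """Build a 2.x-style string field mapping (hypothetical helper)."""
    mapping = {
        "type": "string",
        "index": "analyzed" if analyzed else "not_analyzed",
    }
    if doc_values is not None:
        # Mirror the rule above: no doc values on analyzed strings.
        if analyzed and doc_values:
            raise ValueError(
                "doc_values cannot be enabled on an analyzed string field"
            )
        mapping["doc_values"] = doc_values
    return mapping


# Assumed example field: catId as a not_analyzed string with doc values.
mapping = {
    "properties": {"catId": string_field(analyzed=False, doc_values=True)}
}
print(json.dumps(mapping, indent=2))
```

Calling `string_field(analyzed=True, doc_values=True)` raises `ValueError`, so the mistake is caught before the mapping is ever sent.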
For regular, unstructured search, you will not use any fielddata.
With that in mind, the only time that you should catch yourself using fielddata for analyzed strings is with the significant terms aggregation. All other uses of fielddata should be avoided by using a not_analyzed version of the string.
You can take advantage of both analyzed and not_analyzed strings by using multi-fields.
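A multi-field setup along these lines might look as follows, written as Python dicts that serialize to the JSON request bodies. It assumes 2.x-style mappings; the field name `note`, the sub-field name `raw`, and the aggregation name `top_notes` are illustrative choices, not taken from the original post.

```python
import json

# The main field is analyzed for full-text search; the "raw" sub-field
# keeps the whole string as one term and can be backed by doc values.
mapping = {
    "properties": {
        "note": {
            "type": "string",
            "index": "analyzed",
            "fields": {
                "raw": {
                    "type": "string",
                    "index": "not_analyzed",
                    "doc_values": True,
                }
            },
        }
    }
}

# Search on the analyzed field, but sort and aggregate on the raw one,
# so no fielddata has to be loaded for the analyzed string.
search_body = {
    "query": {"match": {"note": "quick fox"}},
    "aggs": {"top_notes": {"terms": {"field": "note.raw"}}},
    "sort": [{"note.raw": {"order": "asc"}}],
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```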