fielddata -->doc values
Source: Internet  Published under: WeChat ordering/takeout source code  Editor: 程序博客网  Time: 2024/05/17 23:03
“fast, efficient and memory-friendly”
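Official guide:
Column-oriented, forward (document -> values) structure. When memory is constrained, doc values rely on the OS's file system cache rather than the JVM heap, so they add no GC pressure. (Open question: lock memory?)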
Inverted index --> search
Doc values --> sorting, aggregations, scripts, parent-child, geo filters, etc.,
i.e. anything that needs to look up the value contained in a specific document.
Doc values are generated on a per-segment basis and are immutable.
And, like the inverted index, doc values are serialized to disk.
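A toy model of the two structures may help. It uses a few hypothetical, already-tokenized documents, and the names are illustrative only, not Elasticsearch internals. The inverted index maps term -> doc IDs and answers "which documents contain X?". Doc values map doc ID -> values and answer "what does document N contain?", which is the lookup that sorting and aggregations need.

```python
from collections import defaultdict

# Hypothetical sample data: doc ID -> terms in one field (dense IDs 0..n-1).
docs = {
    0: ["brown", "fox"],
    1: ["brown", "dog"],
    2: ["quick", "fox"],
}

def build_inverted_index(docs):
    """term -> sorted list of doc IDs (the structure search uses)."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id])):
            index[term].append(doc_id)
    return dict(index)

def build_doc_values(docs):
    """doc ID -> sorted values, one column entry per document (sort/aggs)."""
    return [sorted(set(docs[doc_id])) for doc_id in range(len(docs))]

inverted = build_inverted_index(docs)
doc_values = build_doc_values(docs)

# Search: which documents contain "fox"? One lookup in the inverted index.
print(inverted["fox"])  # [0, 2]

# Aggregation over the matched docs: read each document's values from the
# doc values column and count them into buckets.
buckets = defaultdict(int)
for doc_id in inverted["fox"]:
    for term in doc_values[doc_id]:
        buckets[term] += 1
print(dict(buckets))  # {'brown': 1, 'fox': 2, 'quick': 1}
```

Answering the aggregation with the inverted index alone would mean scanning every term's posting list to find the terms of document 0 and document 2. The column answers it directly.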
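ES memory config: because doc values live outside the heap, the heap can be reduced, e.g. 4-16 GB on a 64 GB machine.
Column-store compression: saves disk space and makes values faster to access. CPU is rarely the bottleneck.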
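The following is one illustrative column-store trick, not necessarily what Lucene does for any given field. A column holds the values of one field side by side. Numbers can therefore be stored as small offsets from the column minimum, divided by their greatest common divisor, so each value needs only a few bits. The sample column is made up.

```python
from functools import reduce
from math import gcd

def encode(column):
    """Encode ints as (base, divisor, small codes) so codes need few bits."""
    base = min(column)
    offsets = [v - base for v in column]
    divisor = reduce(gcd, offsets, 0) or 1  # all-equal column: avoid div by 0
    codes = [o // divisor for o in offsets]
    return base, divisor, codes

def decode(base, divisor, codes):
    """Reverse of encode: base + code * divisor for every entry."""
    return [base + c * divisor for c in codes]

def bits_needed(codes):
    """Bits per value if every code is stored with the same fixed width."""
    return max(max(codes).bit_length(), 1)

column = [100, 1000, 1500, 1200, 300, 1900, 2000]
base, divisor, codes = encode(column)
print(base, divisor, codes)  # 100 100 [0, 9, 14, 11, 2, 18, 19]
print(bits_needed(codes), "bits per value instead of",
      max(column).bit_length())  # 5 bits per value instead of 11
assert decode(base, divisor, codes) == column
```

Decoding is a multiply and an add, which is why the extra CPU work rarely matters compared with the disk and cache savings.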
Disabling Doc Values:
Doc values are enabled by default for all fields except analyzed strings. To disable them for a field:
"doc_values": false
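Special configuration:
"doc_values": true, "index": "no"
The field can be aggregated but not searched. (Why this works: aggregations read doc values, while search needs the inverted index, which "index": "no" never builds.)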
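The class below is a toy model of why that mapping works. It assumes each mapping flag simply controls whether a structure gets built, and `ToyField` is hypothetical, not Elasticsearch code. With `"index": "no"` there is no inverted index to search, but the doc values column still exists for aggregations.

```python
from collections import Counter, defaultdict

class ToyField:
    """Hypothetical field store: mapping flags decide which structures exist."""

    def __init__(self, values, index=True, doc_values=True):
        # values: doc ID -> single not_analyzed value
        self.inverted = None
        self.column = None
        if index:  # "index": not "no" -> build term -> doc IDs
            self.inverted = defaultdict(list)
            for doc_id, value in sorted(values.items()):
                self.inverted[value].append(doc_id)
        if doc_values:  # "doc_values": true -> build doc ID -> value
            self.column = dict(values)

    def search(self, value):
        if self.inverted is None:
            raise ValueError("field is not indexed, cannot search")
        return self.inverted.get(value, [])

    def terms_agg(self):
        if self.column is None:
            raise ValueError("no doc values, cannot aggregate")
        return Counter(self.column.values())

field = ToyField({0: "NY", 1: "CA", 2: "NY"}, index=False, doc_values=True)
print(field.terms_agg())  # Counter({'NY': 2, 'CA': 1})
try:
    field.search("NY")
except ValueError as err:
    print(err)  # field is not indexed, cannot search
```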
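Aggregating on analyzed strings is strongly discouraged, because (1) the number of tokens per document is unknown and (2) doc values are not enabled for analyzed strings.
Need a field that is analyzed for search but can also be aggregated? Use a multi-field:
"state": {
  "type": "string",
  "fields": {
    "raw": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
Search on state; aggregate on state.raw, which is not_analyzed and therefore has doc values.
Doc values are most efficient when each document has one or a few tokens, not thousands.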
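Aggregating the analyzed field gives surprising buckets, and a rough illustration shows why. It assumes a simplified analyzer that only lowercases and splits on whitespace; real analyzers do more. "New York" ends up as two buckets, `new` and `york`, while the not_analyzed `state.raw` keeps the whole value.

```python
from collections import Counter

def simple_analyzer(text):
    """Simplified stand-in for a real analyzer: lowercase + whitespace split."""
    return text.lower().split()

states = ["New York", "New Mexico", "New York", "Texas"]

# Aggregating the analyzed field: each bucket is a single token.
analyzed_buckets = Counter(tok for s in states for tok in simple_analyzer(s))

# Aggregating the not_analyzed sub-field (state.raw): each bucket is a whole value.
raw_buckets = Counter(states)

print(analyzed_buckets.most_common(2))  # [('new', 3), ('york', 2)]
print(raw_buckets.most_common(1))       # [('New York', 2)]
```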
Doc values are not generated for analyzed strings. Yet these fields can still be used in aggregations. How is that possible?
fielddata is built and managed 100% in memory, living inside the JVM heap.
Another reason to avoid aggregating analyzed fields: high-cardinality fields consume a large amount of memory when loaded into fielddata.
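Analyzed strings have no doc values. Fielddata for them is therefore built in the heap at query time by "uninverting" the inverted index. The toy sketch below uses hypothetical data; real fielddata is a compact per-segment structure, not Python lists. It walks every term's postings and rebuilds the doc -> terms view. This shows why memory grows with the number of unique terms and postings in the field.

```python
def uninvert(inverted, num_docs):
    """Rebuild doc ID -> terms from term -> doc IDs (the fielddata view)."""
    fielddata = [[] for _ in range(num_docs)]
    for term in sorted(inverted):  # visit terms in sorted order
        for doc_id in inverted[term]:
            fielddata[doc_id].append(term)
    return fielddata

# Hypothetical inverted index for an analyzed text field with 3 documents.
inverted = {
    "brown": [0, 1],
    "dog": [1],
    "fox": [0, 2],
    "quick": [2],
}
fielddata = uninvert(inverted, num_docs=3)
print(fielddata)  # [['brown', 'fox'], ['brown', 'dog'], ['fox', 'quick']]

# Every posting of every term ends up on the heap.
print(sum(len(terms) for terms in fielddata))  # 6
```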
Limiting memory usage:
Once analyzed strings have been loaded into fielddata, they will sit there until evicted (or your node crashes).
Fielddata is loaded lazily and on a per-field basis, but when a field is loaded, its values are loaded for all documents, not just the ones the query matched.
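A sketch of the loading behaviour described above follows, using a hypothetical cache class. Nothing is loaded until a field is first needed, and loading happens per field. When it happens, the values for every document in the segment are loaded, even if the query needed only one.

```python
class LazyFielddataCache:
    """Hypothetical per-field cache: loads a whole field on first use."""

    def __init__(self, segment):
        self.segment = segment  # list indexed by doc ID, each a {field: value}
        self.loaded = {}        # field -> values for ALL docs in the segment
        self.loads = 0

    def get(self, field, doc_id):
        if field not in self.loaded:  # lazy: load on first access only
            self.loads += 1
            self.loaded[field] = [doc.get(field) for doc in self.segment]
        return self.loaded[field][doc_id]

segment = [
    {"tag": "rock", "title": "a"},
    {"tag": "jazz", "title": "b"},
    {"tag": "rock", "title": "c"},
]
cache = LazyFielddataCache(segment)
print(cache.loaded)         # {}  (nothing loaded yet)
print(cache.get("tag", 1))  # jazz
print(cache.loaded)         # {'tag': ['rock', 'jazz', 'rock']}
cache.get("tag", 2)         # served from memory, no reload
print(cache.loads)          # 1
```

Note that `title` is never loaded, because no query touched it. The whole `tag` column stays resident, which is why it sits there until something evicts it.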
$ES_HEAP_SIZE: no more than 50% of the available RAM, and no more than 32 GB.
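The heap rule of thumb can be written as a small function. This sketch assumes a cap of 31 GB to stay safely under the 32 GB compressed-pointer threshold; the exact safe value depends on the JVM. The rest of RAM is left to the OS file system cache, which is where doc values are served from.

```python
def recommended_heap_gb(ram_gb, cap_gb=31):
    """Upper bound for the heap: at most half of RAM, and under ~32 GB.
    cap_gb=31 is an assumed safety margin, not an official constant."""
    return min(ram_gb // 2, cap_gb)

for ram in (8, 16, 64, 128):
    heap = recommended_heap_gb(ram)
    print(f"{ram} GB RAM -> ES_HEAP_SIZE={heap}g, "
          f"{ram - heap} GB left for the OS file system cache")
```

This only computes the upper bound. With doc values, a smaller heap is often enough; the notes mention 4-16 GB on a 64 GB machine.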
Fielddata Size:
indices.fielddata.cache.size: unbounded by default. When it is set and the limit is exceeded, previously loaded values are evicted.
This setting is a safeguard, not a solution for insufficient memory.
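If there is not enough memory to keep fielddata resident, it is constantly evicted and reloaded. Loading is slow, causes heavy disk I/O, and produces lots of garbage that the GC must collect.
With time-series data, fielddata for older indices is rarely used anymore but still sits in memory. Capping the cache lets it be evicted: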
indices.fielddata.cache.size: 20%
With this setting, the least recently used fielddata is evicted to make room for newly loaded values.
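A minimal LRU sketch of what `indices.fielddata.cache.size` does follows. It assumes sizes are tracked per (index, field) entry in MB and that the limit is 20% of a hypothetical 10 GB heap. Once the total exceeds the limit, entries are evicted in least-recently-used order. Old time-based indices that nobody queries therefore drop out first. The cap bounds usage but does not remove the reload cost described above.

```python
from collections import OrderedDict

class FielddataLRU:
    """Hypothetical size-bounded LRU cache keyed by (index, field)."""

    def __init__(self, limit_mb):
        self.limit_mb = limit_mb
        self.used_mb = 0
        self.entries = OrderedDict()  # key -> size in MB, least recent first

    def access(self, key, size_mb):
        if key in self.entries:
            self.entries.move_to_end(key)  # hit: mark as most recently used
            return
        self.entries[key] = size_mb        # miss: "load" the fielddata
        self.used_mb += size_mb
        # Evict least recently used entries until back under the limit
        # (never the entry just loaded, to keep the sketch simple).
        while self.used_mb > self.limit_mb and len(self.entries) > 1:
            old_key, old_size = self.entries.popitem(last=False)
            self.used_mb -= old_size
            print("evicted", old_key)

heap_mb = 10 * 1024                                 # hypothetical 10 GB heap
cache = FielddataLRU(limit_mb=heap_mb * 20 // 100)  # 20% -> 2048 MB
cache.access(("logs-2016.01", "host"), 1024)
cache.access(("logs-2016.02", "host"), 800)
cache.access(("logs-2016.01", "host"), 1024)  # hit: 2016.01 is now most recent
cache.access(("logs-2016.03", "host"), 500)   # 2324 MB > 2048 MB: evict the LRU
print(list(cache.entries))  # [('logs-2016.01', 'host'), ('logs-2016.03', 'host')]
```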
Monitoring fielddata:
- per-index: _stats/fielddata?fields=*
- per-node: _nodes/stats/indices/fielddata?fields=*
- per-index per-node: _nodes/stats/indices/fielddata?level=indices&fields=*
With ?fields=*, memory usage is broken down for each field.
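The small helper below builds the three stats URLs listed above and can fetch one with the standard library. It assumes a node reachable at `http://localhost:9200`, which is hypothetical. It does not interpret the response, because the JSON layout differs between Elasticsearch versions.

```python
import json
import urllib.request

# The three fielddata stats endpoints, keyed by granularity.
FIELDDATA_STATS = {
    "per_index": "/_stats/fielddata?fields=*",
    "per_node": "/_nodes/stats/indices/fielddata?fields=*",
    "per_index_per_node": "/_nodes/stats/indices/fielddata?level=indices&fields=*",
}

def stats_url(kind, base="http://localhost:9200"):
    """Build the full URL for one of the fielddata stats endpoints."""
    return base.rstrip("/") + FIELDDATA_STATS[kind]

def fetch_stats(kind, base="http://localhost:9200", timeout=5):
    """GET the stats endpoint and return the decoded JSON body.
    Requires a running node at `base`."""
    with urllib.request.urlopen(stats_url(kind, base), timeout=timeout) as resp:
        return json.load(resp)

print(stats_url("per_node"))
# http://localhost:9200/_nodes/stats/indices/fielddata?fields=*
```

Call `fetch_stats("per_index")` only against a live cluster; `stats_url` alone needs no network.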
Circuit Breaker:
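The fielddata cache size is only checked after the data has been loaded, so a single query that loads too much can still cause an OutOfMemoryException.
Elasticsearch therefore includes a fielddata circuit breaker designed to deal with this situation.
It estimates the memory a query will need before loading; if the estimate exceeds the threshold, the circuit breaker is tripped, the query is aborted, and an exception is returned.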
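The following is a simplified model of the breaker. It assumes the size of a field's fielddata can be estimated up front; Elasticsearch's real estimation is more involved. The exception class and message are hypothetical stand-ins. The breaker compares current usage plus the estimate against the limit and raises before anything is allocated. A post-load size check would only notice once the memory is already in use.

```python
class CircuitBreakingException(Exception):
    """Hypothetical stand-in for the error returned when a breaker trips."""

class FielddataBreaker:
    def __init__(self, heap_bytes, limit_percent=60):
        self.limit = heap_bytes * limit_percent // 100
        self.used = 0

    def add_estimate(self, estimate_bytes, label):
        # Check BEFORE loading: trip if this load would exceed the limit.
        if self.used + estimate_bytes > self.limit:
            raise CircuitBreakingException(
                f"estimate for [{label}] would exceed the fielddata limit "
                f"({self.used + estimate_bytes} > {self.limit})")
        self.used += estimate_bytes

def load_fielddata(breaker, field, estimate_bytes):
    breaker.add_estimate(estimate_bytes, field)  # may raise; nothing loaded yet
    return f"loaded {field}"                     # placeholder for the real load

breaker = FielddataBreaker(heap_bytes=1000)  # limit = 600
print(load_fielddata(breaker, "tag", 400))   # loaded tag
try:
    load_fielddata(breaker, "body", 300)     # 700 > 600: trips
except CircuitBreakingException as err:
    print("query aborted:", err)
print(breaker.used)  # 400, the aborted load reserved nothing
```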
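Available circuit breakers, which ensure memory limits are not exceeded (limits are a percentage of the JVM heap):
- indices.breaker.fielddata.limit: 60% (fielddata loaded into the heap)
- indices.breaker.request.limit: 40% (per-request data structures, e.g. aggregation buckets)
- indices.breaker.total.limit: 70% (wraps the fielddata and request breakers, so their combined usage stays under this limit)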
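The following sketch shows how the parent breaker wraps its children. It assumes each child has its own limit and that every reservation is also checked against the combined total. The percentages come from the list above and are applied to a hypothetical 1000-unit heap. A request can stay within its own child limit and still be rejected by the total limit.

```python
class BreakerTripped(Exception):
    """Hypothetical error raised when any breaker trips."""

class Breakers:
    def __init__(self, heap, limits_percent, total_percent):
        self.limits = {name: heap * p // 100 for name, p in limits_percent.items()}
        self.total_limit = heap * total_percent // 100
        self.used = {name: 0 for name in limits_percent}

    def reserve(self, name, amount):
        # Child check: this breaker's own limit.
        if self.used[name] + amount > self.limits[name]:
            raise BreakerTripped(f"{name} breaker tripped")
        # Parent check: all children combined must stay under the total limit.
        if sum(self.used.values()) + amount > self.total_limit:
            raise BreakerTripped("total (parent) breaker tripped")
        self.used[name] += amount

breakers = Breakers(heap=1000,
                    limits_percent={"fielddata": 60, "request": 40},
                    total_percent=70)
breakers.reserve("fielddata", 500)    # ok: 500 <= 600, total 500 <= 700
try:
    breakers.reserve("request", 300)  # 300 <= 400, but total 800 > 700
except BreakerTripped as err:
    print(err)                        # total (parent) breaker tripped
print(breakers.used)                  # {'fielddata': 500, 'request': 0}
```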
Fielddata Filtering:
PUT /music/_mapping/song
{
  "properties": {
    "tag": {
      "type": "string",
      "fielddata": {
        "filter": {
          "frequency": {
            "min": 0.01,
            "min_segment_size": 500
          }
        }
      }
    }
  }
}
- fielddata: the keyword that controls how fielddata is handled for this field.
- frequency: filter terms based on their frequency.
- min: 0.01: load only terms that occur in at least 1% of the documents in this segment.
- min_segment_size: 500: ignore any segments that have fewer than 500 documents.
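The function below sketches how the frequency filter decides which terms of one segment get loaded. It assumes that "ignore" for small segments means the filter is simply not applied there, so all of their terms load; the reasoning is that frequencies in tiny segments are too coarse to mean much. Otherwise a term is kept only if its document frequency is at least `min` times the segment's document count. The segment data is made up.

```python
def filter_terms(segment_doc_freqs, segment_doc_count,
                 min_ratio=0.01, min_segment_size=500):
    """Return the terms whose fielddata would be loaded for one segment.

    segment_doc_freqs: term -> number of docs in this segment containing it.
    Assumption: segments below min_segment_size are left unfiltered.
    """
    if segment_doc_count < min_segment_size:
        return sorted(segment_doc_freqs)
    threshold = segment_doc_count * min_ratio
    return sorted(t for t, df in segment_doc_freqs.items() if df >= threshold)

big_segment = {"rock": 4000, "jazz": 150, "polka": 3}  # 10,000 docs
small_segment = {"rock": 40, "polka": 1}               # 200 docs

print(filter_terms(big_segment, 10_000))  # ['jazz', 'rock']  (needs >= 100 docs)
print(filter_terms(small_segment, 200))   # ['polka', 'rock']  (filter skipped)
```

Rare terms like `polka` are dropped from the large segment, which is where the memory saving comes from. Aggregations on this field then simply will not see those terms in that segment.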