Elasticsearch Merge Operations and Configuration

Source: Internet | Published by: 数学专业程序员 | Editor: 程序博客网 | Time: 2024/06/05 18:49

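In Elasticsearch, a shard is a complete Lucene index, and a Lucene index is in turn broken down into many segments. A segment is the underlying unit in which an index stores its data, and it is immutable. To keep the number of segments within bounds and to physically remove documents that have been marked as deleted, small segments are periodically merged into larger ones. The merge threads automatically balance merging against other operations (such as search) according to the hardware configuration.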

Merge Scheduling

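The merge scheduler (ConcurrentMergeScheduler) controls how merge operations are executed. Merges run on separate threads; once the maximum number of threads is in use, later merges must wait until an earlier thread finishes and becomes available.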
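The waiting behaviour described above can be pictured with a small simulation. This is only a simplified model under stated assumptions, not how ConcurrentMergeScheduler is implemented: the class MergeSlotDemo and the method runMerges are hypothetical names, a fixed-size thread pool stands in for the merge threads, and a short sleep stands in for the merge work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model: NOT the actual Lucene ConcurrentMergeScheduler logic.
class MergeSlotDemo {

    // Submits `merges` simulated merges to a pool of `maxThreadCount` threads
    // and returns the highest number of merges observed running at once.
    static int runMerges(int merges, int maxThreadCount) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreadCount);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < merges; i++) {
            // Tasks beyond maxThreadCount sit in the pool's queue until a thread is free.
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // stand-in for the actual merge work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent merges: " + runMerges(6, 2));
    }
}
```

With six simulated merges and a limit of two, the observed peak never exceeds two; the remaining merges simply queue until a thread becomes available.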

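The merge scheduler supports dynamically configuring the maximum number of threads through the setting index.merge.scheduler.max_thread_count. Its default value is Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2)). Spinning disks are slow at reads and writes; if the value is set too high, merge I/O can become heavy enough to affect other operations, so on spinning disks it is best to set it to 1.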
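Because the setting is dynamic, it can be changed on a live index through the update index settings API (PUT /<index>/_settings). The following is a minimal sketch, assuming a node reachable at http://localhost:9200 and an index named my-index; both are placeholders, and the class MergeSettingsRequest is a hypothetical helper. It only builds the request with the JDK HTTP client (Java 11+) and does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: builds (but does not send) a settings-update request.
// The host and index name used in main are assumptions for illustration.
class MergeSettingsRequest {

    // JSON body that sets the maximum number of merge threads.
    static String body(int maxThreadCount) {
        return "{\"index.merge.scheduler.max_thread_count\": " + maxThreadCount + "}";
    }

    // PUT <baseUrl>/<index>/_settings with the body above.
    static HttpRequest build(String baseUrl, String index, int maxThreadCount) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body(maxThreadCount)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:9200", "my-index", 1);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body(1));
        // To actually send it:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Keeping the request construction separate from sending makes it easy to inspect the exact URL and body before applying the change to a real cluster.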

The corresponding source code is:

Class file: org.apache.lucene.index.ConcurrentMergeScheduler.java

  /** Used for testing.
   * @lucene.internal
   */
  public static final String DEFAULT_CPU_CORE_COUNT_PROPERTY = "lucene.cms.override_core_count";

  /** Sets max merges and threads to proper defaults for rotational
   *  or non-rotational storage.
   *  Sets the maximum number of merge threads and the maximum number of
   *  in-flight merges according to the disk type.
   *
   * @param spins true to set defaults best for traditional rotational storage (spinning disks),
   *        else false (e.g. for solid-state disks)
   */
  public synchronized void setDefaultMaxMergesAndThreads(boolean spins) {
    if (spins) { // defaults for spinning disks
      maxThreadCount = 1; // maximum number of merge threads
      maxMergeCount = 6;
    } else { // solid-state disks
      int coreCount = Runtime.getRuntime().availableProcessors();
      // Let tests override this to help reproducing a failure on a machine that has a different
      // core count than the one where the test originally failed:
      try {
        String value = System.getProperty(DEFAULT_CPU_CORE_COUNT_PROPERTY);
        // If the system property DEFAULT_CPU_CORE_COUNT_PROPERTY is set, use it; otherwise keep coreCount
        if (value != null) {
          coreCount = Integer.parseInt(value);
        }
      } catch (Throwable ignored) {
      }
      maxThreadCount = Math.max(1, Math.min(4, coreCount/2));
      maxMergeCount = maxThreadCount+5;
    }
  }

org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.java

    void refreshConfig() {
        if (this.getMaxMergeCount() != config.getMaxMergeCount() || this.getMaxThreadCount() != config.getMaxThreadCount()) {
            this.setMaxMergesAndThreads(config.getMaxMergeCount(), config.getMaxThreadCount());
        }
        boolean isEnabled = getIORateLimitMBPerSec() != Double.POSITIVE_INFINITY;
        if (config.isAutoThrottle() && isEnabled == false) {
            enableAutoIOThrottle();
        } else if (config.isAutoThrottle() == false && isEnabled) {
            disableAutoIOThrottle();
        }
    }

org.elasticsearch.index.MergeSchedulerConfig.java setting definitions:

    public static final Setting<Integer> MAX_THREAD_COUNT_SETTING =
        new Setting<>("index.merge.scheduler.max_thread_count",
            (s) -> Integer.toString(Math.max(1, Math.min(4, EsExecutors.boundedNumberOfProcessors(s) / 2))),
            (s) -> Setting.parseInt(s, 1, "index.merge.scheduler.max_thread_count"), Property.Dynamic,
            Property.IndexScope);
    public static final Setting<Integer> MAX_MERGE_COUNT_SETTING =
        new Setting<>("index.merge.scheduler.max_merge_count",
            (s) -> Integer.toString(MAX_THREAD_COUNT_SETTING.get(s) + 5),
            (s) -> Setting.parseInt(s, 1, "index.merge.scheduler.max_merge_count"), Property.Dynamic, Property.IndexScope);
    public static final Setting<Boolean> AUTO_THROTTLE_SETTING =
        Setting.boolSetting("index.merge.scheduler.auto_throttle", true, Property.Dynamic, Property.IndexScope);

    private volatile boolean autoThrottle;
    private volatile int maxThreadCount;
    private volatile int maxMergeCount;

On a local PC with 4 cores, this works out to exactly 2 threads.
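The arithmetic behind that figure can be checked with a short sketch that reproduces the two default expressions from the source quoted in this article: the thread count is half the core count, clamped to the range 1 to 4, and the merge count is the thread count plus 5. The class MergeDefaults and its method names are hypothetical and are not part of Elasticsearch or Lucene.

```java
// Illustrative sketch: reproduces the default-value arithmetic from the quoted source.
// MergeDefaults and its methods are hypothetical names, not Elasticsearch or Lucene APIs.
class MergeDefaults {

    // Default for index.merge.scheduler.max_thread_count given a core count.
    static int maxThreadCount(int cores) {
        return Math.max(1, Math.min(4, cores / 2));
    }

    // Default for index.merge.scheduler.max_merge_count: thread count + 5.
    static int maxMergeCount(int cores) {
        return maxThreadCount(cores) + 5;
    }

    public static void main(String[] args) {
        int[] coreCounts = {1, 2, 4, 8, 16};
        for (int cores : coreCounts) {
            System.out.printf("cores=%d -> max_thread_count=%d, max_merge_count=%d%n",
                    cores, maxThreadCount(cores), maxMergeCount(cores));
        }
    }
}
```

For 4 cores this gives 2 threads and 7 merges; from 8 cores upward the thread count stays capped at 4.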

1 0