Elasticsearch bulk error: EsRejectedExecutionException[rejected execution (queue capacity 50) on.......]

Source: Internet | Published: 西瓜数据 | Editor: 程序博客网 | Time: 2024/06/05 04:05


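Recently, while using Elasticsearch, I ran into this error. The cause is clear at a glance: the queue of pending requests filled up, the node could not process them fast enough, and the request was rejected. According to the official documentation, the default queue size for bulk operations is 50, so this error is easy to trigger and the queue needs to be enlarged. Below is the official documentation's description of the Thread Pool module.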

Thread Pool

A node holds several thread pools in order to improve how thread memory consumption is managed within a node. Many of these pools also have queues associated with them, which allow pending requests to be held instead of discarded.

The following are the more important thread pools. Most of them do not need tuning; the defaults generally meet our needs.

There are several thread pools, but the important ones include:

index

For index/delete operations. Defaults to fixed with a size of # of available processors, queue_size of 200.

search

For count/search operations. Defaults to fixed with a size of 3x # of available processors, queue_size of 1000.

suggest

For suggest operations. Defaults to fixed with a size of # of available processors, queue_size of 1000.

get

For get operations. Defaults to fixed with a size of # of available processors, queue_size of 1000.

bulk

For bulk operations. Defaults to fixed with a size of # of available processors, queue_size of 50. (This is the queue named in the error above; it needs to be increased, e.g. to 100.)

percolate

For percolate operations. Defaults to fixed with a size of # of available processors, queue_size of 1000.

snapshot

For snapshot/restore operations. Defaults to scaling, keep-alive 5m with a size of (# of available processors)/2.

warmer

For segment warm-up operations. Defaults to scaling with a 5m keep-alive.

refresh

For refresh operations. Defaults to scaling with a 5m keep-alive.

listener

Mainly for the Java client, to execute actions when listener threaded is set to true. Default size of (# of available processors)/2, max at 10.

Changing a specific thread pool can be done by setting its type and specific type parameters, for example, changing the index thread pool to have more threads:

threadpool:
    index:
        type: fixed
        size: 30

Note: you can update threadpool settings live using Cluster Update Settings.
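As a rough sketch of such a live update, the Python snippet below builds the JSON body for a Cluster Update Settings request (`PUT /_cluster/settings`) that raises the bulk queue size. The helper names `build_threadpool_update` and `send_update`, the choice of the `transient` scope, and the `http://localhost:9200` address are assumptions for illustration. The setting key follows the `threadpool.<pool>.<param>` form used in this article's configuration.

```python
import json
import urllib.request


def build_threadpool_update(pool, params, scope="transient"):
    """Build the body for PUT /_cluster/settings (hypothetical helper)."""
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    settings = {
        "threadpool.%s.%s" % (pool, key): value
        for key, value in params.items()
    }
    return {scope: settings}


def send_update(body, base_url="http://localhost:9200"):
    """Send the update to a node; base_url is an assumed local address."""
    request = urllib.request.Request(
        base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Only build and print the body here; send_update needs a running node.
body = build_threadpool_update("bulk", {"queue_size": 1000})
print(json.dumps(body, sort_keys=True))
# prints {"transient": {"threadpool.bulk.queue_size": 1000}}
```

Calling `send_update(body)` against a running node would apply the change without a restart. A transient setting is lost on a full cluster restart, so the same value would still belong in the configuration file.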

Thread pool types

The following are the types of thread pools that can be used and their respective parameters:

cache

The cache thread pool is an unbounded thread pool that will spawn a thread if there are pending requests. Here is an example of how to set it:

threadpool:
    index:
        type: cached

fixed

The fixed thread pool holds a fixed size of threads to handle the requests with a queue (optionally bounded) for pending requests that have no threads to service them.

The size parameter controls the number of threads, and defaults to the number of cores times 5.

The queue_size parameter controls the size of the queue of pending requests that have no threads to execute them. By default, it is set to -1, which means it is unbounded. When a request comes in and the queue is full, the request is aborted.

threadpool:
    index:
        type: fixed
        size: 30
        queue_size: 1000
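To make the rejection behavior concrete, here is a small Python simulation of a fixed pool with a bounded queue. It is only an analogy built on the standard library, not Elasticsearch code: the names `FixedPool` and `RejectedExecution` are invented for illustration, and the demo uses one thread and a queue of 2 so the effect is easy to see.

```python
import queue
import threading


class RejectedExecution(Exception):
    """Raised when the pending queue is full (the 'abort' behavior)."""


class FixedPool:
    def __init__(self, size, queue_size):
        # queue.Queue treats maxsize <= 0 as unbounded, like queue_size -1.
        self._tasks = queue.Queue(maxsize=max(queue_size, 0))
        for _ in range(size):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            task = self._tasks.get()
            try:
                task()
            finally:
                self._tasks.task_done()

    def submit(self, task):
        try:
            self._tasks.put_nowait(task)
        except queue.Full:
            raise RejectedExecution(
                "rejected execution (queue capacity %d)" % self._tasks.maxsize
            )

    def join(self):
        self._tasks.join()


started = threading.Event()
release = threading.Event()
results = []


def blocker():
    started.set()
    release.wait()


pool = FixedPool(size=1, queue_size=2)
pool.submit(blocker)
started.wait()  # the single worker thread is now busy

accepted = rejected = 0
for i in range(5):
    try:
        pool.submit(lambda i=i: results.append(i))
        accepted += 1
    except RejectedExecution:
        rejected += 1

release.set()
pool.join()
print(accepted, rejected)  # prints: 2 3
```

With the only thread busy, two tasks fit in the queue and the remaining three are rejected. This is the same situation as a bulk queue of 50 receiving requests faster than the bulk threads can drain them.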

Processors setting

The number of processors is automatically detected, and the thread pool settings are automatically set based on it. Sometimes the number of processors is wrongly detected; in such cases, it can be set explicitly using the processors setting. The example below sets the number of processors to 4, which means that the default search thread pool size is 4 x 3 = 12.

processors: 4

This setting is important when running multiple node instances on a single bare-metal machine. Each node will detect that it has the full number of processors. But in reality, they are sharing processors on the single machine. In other words, it is advised to lower the processors setting accordingly. For example, on a 24 core machine and running 3 nodes, set processors to 8.
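The arithmetic in the two paragraphs above can be sketched in a few lines of Python. The function names are hypothetical, the even split using floor division is an assumption, and the pool sizes are taken from the defaults listed earlier in this article.

```python
def processors_per_node(cores, nodes):
    """Split a machine's cores evenly across co-located nodes (at least 1 each)."""
    if cores < 1 or nodes < 1:
        raise ValueError("cores and nodes must be positive")
    return max(1, cores // nodes)


def default_pool_sizes(processors):
    """Default sizes of a few fixed pools, per the list quoted earlier."""
    return {
        "index": processors,
        "bulk": processors,
        "get": processors,
        "search": 3 * processors,
    }


per_node = processors_per_node(24, 3)
print(per_node)                               # prints: 8
print(default_pool_sizes(4)["search"])        # prints: 12
print(default_pool_sizes(per_node)["bulk"])   # prints: 8
```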

In order to check the number of processors detected, use the nodes info API with the os flag.

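Here is an additional explanation, translated from a Chinese write-up. It appears to describe an older Elasticsearch release: it lists cached as the default type and includes a blocking type, neither of which matches the reference quoted above.

An Elasticsearch node has several thread pools, but the important ones are the following four:
index: mainly for indexing and deleting data (default type: cached)
search: mainly for get, count, and search operations (default type: cached)
bulk: mainly for bulk operations on indices (default type: cached)
refresh: mainly for refresh operations (default type: cached)
You can change a thread pool's type by setting a parameter. For example, to change the index thread pool to the blocking type: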

threadpool:
    index:
        type: blocking
        min: 1
        size: 30
        wait_time: 30s

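The following are the three thread pool types that can be configured.
cache
The cache thread pool is unbounded in size: if there are many requests, it will create many threads. Here is an example: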
threadpool:
    index:
        type: cached

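fixed
The fixed thread pool keeps a fixed number of threads to service the request queue.
The size parameter sets the number of threads; the default is 5 times the number of CPU cores.
The queue_size parameter controls the size of the queue of pending requests. The default is -1, meaning unbounded. When a request arrives and the queue is full, the reject_policy parameter controls what happens. The default is abort, which fails the request. Setting it to caller makes the request execute on the IO thread.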
threadpool:
    index:
        type: fixed
        size: 30
        queue: 1000
        reject_policy: caller

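blocking
The blocking thread pool lets you set a minimum number of threads (min, default 1) and a pool size (size, default 5 times the number of CPU cores). It also has a wait queue whose size (queue_size) defaults to 1000. When that queue is full, the calling IO thread waits for up to the configured wait time (wait_time, default 60 seconds); if the request still has not been executed by then, it fails with an error.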
threadpool:
    index:
        type: blocking
        min: 1
        size: 30
        wait_time: 30s



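So I adjusted my own configuration.

Add the following to elasticsearch.yml:

threadpool.bulk.type: fixed    # use the fixed type; cached is unbounded
threadpool.bulk.size: 8    # number of threads, here the number of available processors
threadpool.bulk.queue_size: 1000    # queue length; the default is 50

This is how to configure the bulk pool. Other pools, such as index, are configured the same way.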
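Since the other pools follow the same key pattern, a small Python helper can render the lines for any of them. This is only an illustrative sketch: `render_threadpool_settings` is a hypothetical name, and the index values shown (8 threads, queue of 200) are example numbers, not a recommendation.

```python
def render_threadpool_settings(pool, **params):
    """Render elasticsearch.yml lines of the form threadpool.<pool>.<param>: <value>."""
    return ["threadpool.%s.%s: %s" % (pool, key, value) for key, value in params.items()]


bulk_lines = render_threadpool_settings("bulk", type="fixed", size=8, queue_size=1000)
index_lines = render_threadpool_settings("index", type="fixed", size=8, queue_size=200)

for line in bulk_lines + index_lines:
    print(line)
```

The first three printed lines reproduce the bulk configuration above; the last three are the equivalent settings for the index pool.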
