ES: How Many Shards Should an Index Have? (Continuously Updated)

Source: Internet · Published in: Industry Database · Editor: 程序博客网 · Time: 2024/05/02 02:20

ES cluster diagram

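Basic concepts:

  • cluster: a set of nodes sharing the same cluster name.
  • node: a single Elasticsearch instance.
  • index: a collection of documents, possibly made up of multiple shards.
  • shard: the unit of ES's distributed design. An index is usually split into elements known as shards that are distributed across multiple nodes. Shards are transparent to the user: ES manages them automatically and rebalances their allocation as needed. If you really need to change the number of shards, you have to reindex.
  • replica: a copy of a shard. By default ES creates 5 shards and 1 replica per index (this was the default before version 7.0; later versions default to 1 primary shard). Replicas provide high availability (HA, failover) and load balancing (LB).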
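
Why changing the shard count requires a reindex can be illustrated with a minimal sketch. Elasticsearch routes each document to a primary shard by hashing its routing value (by default the document ID) modulo the number of primary shards. The code below uses `zlib.crc32` as a stand-in hash (Elasticsearch itself uses a Murmur3-based hash), and `pick_shard` is a hypothetical helper name, not part of any ES client.

```python
import zlib

def pick_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Map a document ID to a shard number: hash(routing) % primaries."""
    # crc32 is only a stand-in for the hash Elasticsearch really uses.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards

# Route the same 1000 made-up document IDs with 5 and then 6 primaries.
doc_ids = [f"doc-{i}" for i in range(1000)]
with_5 = {d: pick_shard(d, 5) for d in doc_ids}
with_6 = {d: pick_shard(d, 6) for d in doc_ids}

# Count the documents whose target shard changes with the shard count.
moved = sum(1 for d in doc_ids if with_5[d] != with_6[d])
print(f"{moved} of {len(doc_ids)} documents would land on a different shard")
```

Because most documents would map to a different shard once the divisor changes, existing data could no longer be found where the routing formula expects it, which is why the primary shard count is fixed at index creation.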

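Differences between primary and replica shards:

  • Only the primary shard can accept indexing requests. Both primaries and replicas serve query requests.
  • The number of primary shards is static and cannot be changed after index creation; the number of replicas is dynamic and can be modified.
  • Sharding is defined per index, through the number_of_shards setting.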

Replicas are primarily for search performance, and a user can add or remove them at any time.
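
The difference between the two settings shows up in the REST API: `number_of_shards` goes in the body of the index-creation request, while `number_of_replicas` can be changed later through the index `_settings` endpoint. The sketch below only builds the request bodies as JSON. The index name `my_index` and both helper names are hypothetical, and sending the requests is left to whatever HTTP client you use.

```python
import json

def create_index_body(shards: int, replicas: int) -> dict:
    """Body for `PUT /my_index`: the primary shard count is fixed from here on."""
    return {"settings": {"number_of_shards": shards,
                         "number_of_replicas": replicas}}

def update_replicas_body(replicas: int) -> dict:
    """Body for `PUT /my_index/_settings`: replicas may be changed at any time."""
    return {"index": {"number_of_replicas": replicas}}

print("PUT /my_index")
print(json.dumps(create_index_body(shards=5, replicas=1), indent=2))
print("PUT /my_index/_settings")
print(json.dumps(update_replicas_body(replicas=2), indent=2))
```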

Announcing Replicated Elasticsearch Clusters on AWS

Too many or too few: aim for a moderate number

A little overallocation is good. A kagillion shards is bad.
Depends on their size and how they are being used.

shard cost:

  1. A shard is essentially a Lucene index, so it consumes file handles, memory, and CPU resources.
  2. Each search request will touch a copy of every shard in the index, which isn’t a problem when the shards are spread across several nodes. Contention arises and performance decreases when the shards are competing for the same hardware resources. Ideally, keep one shard per node.
  3. Elasticsearch uses term frequency statistics to calculate relevance, but these statistics correspond to individual shards. Keeping a small amount of data in many shards results in poor document relevance.
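
The third cost can be made concrete with a toy calculation. The sketch below computes a simplified inverse document frequency, idf = ln(N / df), once per shard and once over the whole index. This formula is a simplification chosen for illustration, not the exact Lucene scoring formula, and the shard contents are made up.

```python
import math

def idf(term: str, docs: list) -> float:
    """Simplified inverse document frequency: ln(N / df)."""
    df = sum(1 for doc in docs if term in doc.split())
    return math.log(len(docs) / df) if df else float("inf")

# Two tiny, made-up shards of the same index.
shard_a = ["quick fox", "lazy dog", "brown fox", "red fox"]
shard_b = ["quick cat", "lazy fox", "brown cow", "red hen"]

print("idf('fox') on shard A:", round(idf("fox", shard_a), 3))
print("idf('fox') on shard B:", round(idf("fox", shard_b), 3))
print("idf('fox') index-wide:", round(idf("fox", shard_a + shard_b), 3))
```

The same term is weighted differently depending on which shard holds the document. With plenty of data per shard the local statistics tend to even out, but with little data spread over many shards the skew can distort ranking.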

There is therefore always a need for contingency planning.

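Rules of thumb: about 30 GB or 2 billion documents per shard, and one shard per data node.
Number of shards: 1.5 to 3 times the number of nodes in your initial configuration.
Shards rebalance automatically when nodes are added.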
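
These rules of thumb can be combined into a rough sizing calculation. This is a sketch under stated assumptions: the 30 GB per-shard target and the 1.5x to 3x node multiplier come from the notes above, while the function name, the way the two rules are combined, and the example inputs are hypothetical.

```python
import math

def suggest_primary_shards(expected_data_gb: float, initial_nodes: int,
                           max_shard_gb: float = 30.0) -> dict:
    """Return a rough range for the number of primary shards."""
    # Enough shards that none exceeds the per-shard size target.
    by_size = math.ceil(expected_data_gb / max_shard_gb)
    # Mild overallocation: 1.5x to 3x the initial node count.
    low = math.ceil(1.5 * initial_nodes)
    high = 3 * initial_nodes
    return {"min_by_size": by_size,
            "range_by_nodes": (low, high),
            "suggested": max(by_size, low)}

print(suggest_primary_shards(expected_data_gb=200, initial_nodes=3))
```

Treat the result as a starting point for capacity testing rather than a final answer, since the right number depends on shard size and usage.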

One shard per index per node.
If you need only one replica, then you’ll need twice as many nodes. Two replicas would require three times the number of nodes.
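
The node arithmetic can be written out as a one-line calculation. A minimal sketch, assuming the "one shard per index per node" layout described here; `nodes_required` is a hypothetical helper name.

```python
def nodes_required(primary_shards: int, replicas: int) -> int:
    """Nodes needed so that every shard copy of one index has its own node."""
    # Each primary has `replicas` extra copies, so total copies = primaries * (1 + replicas).
    return primary_shards * (1 + replicas)

for replicas in (0, 1, 2):
    print(f"{replicas} replica(s):", nodes_required(3, replicas), "nodes")
```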
