1. ES 5.2 Official Documentation: Basic Concepts

Source: Internet  Published by: 匹克模考tpo软件  Editor: 程序博客网  Time: 2024/05/22 05:13

Basic Concepts

There are a few concepts that are core to Elasticsearch. Understanding them from the outset makes the learning process much easier.

Near Realtime (NRT)

Elasticsearch is a near-real-time search platform. This means there is a slight latency (normally one second) between the time you index a document and the time it becomes searchable.
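The latency can be pictured with a toy model. The sketch below is not Elasticsearch code: `ToyIndex` and its methods are hypothetical names, and it assumes that newly indexed documents wait in a buffer until a periodic refresh makes them visible to search.

```python
class ToyIndex:
    """Toy model of near-real-time search; not the real implementation."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet searchable
        self._searchable = []  # visible to search

    def index(self, doc):
        # A newly indexed document is accepted but not yet visible.
        self._buffer.append(doc)

    def refresh(self):
        # Assumed to run periodically (roughly once per second).
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, field, value):
        return [d for d in self._searchable if d.get(field) == value]


idx = ToyIndex()
idx.index({"name": "John Doe"})
before = idx.search("name", "John Doe")  # no refresh has happened yet
idx.refresh()
after = idx.search("name", "John Doe")   # the document is now searchable
print(len(before), len(after))
```

The gap between `index` and the next `refresh` is the "slight latency" described above.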

Cluster

A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes. A cluster is identified by a unique name, which by default is “elasticsearch”. This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.

Make sure that you don’t reuse the same cluster name in different environments; otherwise you might end up with nodes joining the wrong cluster. For instance, you could use logging-dev, logging-stage, and logging-prod for the development, staging, and production clusters.
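As a minimal sketch of that naming scheme, the helper below derives one cluster name per environment and renders the matching `cluster.name` line for `elasticsearch.yml`. The function names and the `ENVIRONMENTS` tuple are assumptions made for illustration.

```python
ENVIRONMENTS = ("dev", "stage", "prod")  # hypothetical environment list


def cluster_name(app, env):
    """Build an environment-specific cluster name such as 'logging-dev'."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{app}-{env}"


def config_line(app, env):
    """Render the cluster.name line for that environment's elasticsearch.yml."""
    return f"cluster.name: {cluster_name(app, env)}"


names = [cluster_name("logging", env) for env in ENVIRONMENTS]
assert len(set(names)) == len(names)  # no name is shared between environments
print(config_line("logging", "prod"))
```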

Note that it is perfectly valid to have a cluster with only a single node in it. Furthermore, you may also have multiple independent clusters, each with its own unique cluster name.

Node

A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities. Just like a cluster, a node is identified by a name, which by default is a random Universally Unique IDentifier (UUID) assigned to the node at startup. You can define any node name you want if you do not want the default. This name is important for administration purposes, where you want to identify which servers in your network correspond to which nodes in your Elasticsearch cluster.

A node can be configured to join a specific cluster by the cluster name. By default, each node is set up to join a cluster named elasticsearch. This means that if you start up a number of nodes on your network, and assuming they can discover each other, they will all automatically form and join a single cluster named elasticsearch.

In a single cluster, you can have as many nodes as you want. Furthermore, if there are no other Elasticsearch nodes currently running on your network, starting a single node will by default form a new single-node cluster named elasticsearch.

Index

An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data. An index is identified by a name (which must be all lowercase), and this name is used to refer to the index when performing indexing, search, update, and delete operations against the documents in it.
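The lowercase rule can be checked before a name is ever sent to the cluster. The sketch below covers only the rule stated above; `check_index_name` is a hypothetical helper, and Elasticsearch applies further naming restrictions that it does not attempt to reproduce.

```python
def check_index_name(name):
    """Raise ValueError if the name is empty or contains uppercase characters."""
    if not name:
        raise ValueError("index name must not be empty")
    if name != name.lower():
        raise ValueError(f"index name must be all lowercase: {name!r}")
    return name


for candidate in ("customer", "product_catalog", "Orders"):
    try:
        check_index_name(candidate)
        print(candidate, "ok")
    except ValueError as err:
        print(candidate, "rejected:", err)
```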

In a single cluster, you can define as many indexes as you want.

Type

Within an index, you can define one or more types. A type is a logical category/partition of your index whose semantics are completely up to you. In general, a type is defined for documents that have a set of common fields. For example, let’s assume you run a blogging platform and store all your data in a single index. In this index, you may define a type for user data, another type for blog data, and yet another type for comments data.

Document

A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order. This document is expressed in JSON (JavaScript Object Notation), which is a ubiquitous internet data interchange format.

Within an index/type, you can store as many documents as you want. Note that although a document physically resides in an index, a document must actually be indexed/assigned to a type inside an index.
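To make the index/type/document relationship concrete, the sketch below serializes one document as JSON and builds the path that would address it. The index name `customer`, the type name `external`, the ID `1`, and the field `name` are illustrative values, and the `/index/type/id` layout assumes the REST conventions of the 5.x series.

```python
import json


def document_path(index, doc_type, doc_id):
    """Build the /index/type/id path that addresses one document."""
    return f"/{index}/{doc_type}/{doc_id}"


customer = {"name": "John Doe"}  # illustrative document body
path = document_path("customer", "external", 1)
body = json.dumps(customer)

# The request that would index this document, printed rather than sent.
print("PUT", path)
print(body)
```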

Shards & Replicas

An index can potentially store a large amount of data that exceeds the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node, or may be too slow to serve search requests from a single node alone.

To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully functional and independent “index” that can be hosted on any node in the cluster.

Sharding is important for two primary reasons:

It allows you to horizontally split/scale your content volume.

It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), thus increasing performance/throughput.

The mechanics of how a shard is distributed, and of how its documents are aggregated back into search results, are completely managed by Elasticsearch and are transparent to you as the user.

In a network/cloud environment where failures can be expected at any time, it is very useful and highly recommended to have a failover mechanism in case a shard/node goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.

Replication is important for two primary reasons:

It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.

It allows you to scale out your search volume/throughput, since searches can be executed on all replicas in parallel.
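The placement rule can be illustrated with a toy allocator. This is only a sketch and not how Elasticsearch actually balances shards: `allocate`, the `P0`/`R0.0` labels, and the round-robin strategy are assumptions. The one property it models is that no node holds two copies of the same shard.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy allocator: returns ({node: [shard labels]}, [unassigned labels])."""
    if not nodes:
        raise ValueError("at least one node is required")
    layout = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        # Spread primaries round-robin across the nodes.
        home = nodes[shard % len(nodes)]
        layout[home].append(f"P{shard}")
        used = {home}
        for copy in range(num_replicas):
            # Only nodes without a copy of this shard are candidates.
            free = [n for n in nodes if n not in used]
            if not free:
                unassigned.append(f"R{shard}.{copy}")
                continue
            target = free[(shard + copy) % len(free)]
            layout[target].append(f"R{shard}.{copy}")
            used.add(target)
    return layout, unassigned


one_node = allocate(2, 1, ["node-1"])
two_nodes = allocate(2, 1, ["node-1", "node-2"])
print(one_node)
print(two_nodes)
```

With a single node the replicas stay unassigned, because the only candidate node already holds the primary; a second node is needed before any replica can be placed.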

To summarize, each index can be split into multiple shards. An index can also be replicated zero (meaning no replicas) or more times. Once replicated, each index will have primary shards (the original shards that were replicated from) and replica shards (the copies of the primary shards). The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you may change the number of replicas dynamically at any time, but you cannot change the number of shards after the fact.
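The sketch below shows what those settings might look like as request bodies, using the `number_of_shards` and `number_of_replicas` index settings. It also includes a toy routing function to suggest why the primary shard count is fixed: a document's shard is typically derived from a hash of its routing key modulo the number of primaries. The helper names are hypothetical, and `zlib.crc32` is a stand-in hash chosen for this example only.

```python
import json
import zlib


def create_index_body(primaries, replicas):
    """Settings body sent when the index is created."""
    return {"settings": {"index": {"number_of_shards": primaries,
                                   "number_of_replicas": replicas}}}


def update_replicas_body(replicas):
    """Settings body for changing the replica count of an existing index."""
    return {"index": {"number_of_replicas": replicas}}


def pick_shard(routing_key, primaries):
    """Toy routing: hash the key, then take it modulo the primary count."""
    return zlib.crc32(routing_key.encode("utf-8")) % primaries


print(json.dumps(create_index_body(5, 1)))
print(json.dumps(update_replicas_body(2)))

# Count how many of 100 keys would land on a different shard with 6 primaries.
moved = sum(pick_shard(str(i), 5) != pick_shard(str(i), 6) for i in range(100))
print(moved, "of 100 keys would map to a different shard")
```

Because the modulus changes with the primary count, existing documents would no longer be found where routing expects them. Changing the replica count does not affect routing, which fits with it being adjustable at any time.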

By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica) for a total of 10 shards per index.
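The arithmetic behind that total is simple enough to write down. In this sketch `total_shards` is a hypothetical helper, and the count assumes the cluster has enough nodes for every replica to be allocated.

```python
def total_shards(primaries, replicas):
    """Primary shards plus one full set of copies per replica."""
    return primaries * (1 + replicas)


print(total_shards(5, 1))  # 5 primaries + 5 replica shards = 10
```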

Note
Each Elasticsearch shard is a Lucene index. There is a maximum number of documents you can have in a single Lucene index. As of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the _cat/shards API.
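The limit translates into a hard lower bound on the number of primary shards for a given document count. The sketch below reproduces the figure from the note; `min_primaries_for` is a hypothetical helper that assumes documents spread evenly across shards, and it says nothing about sensible shard sizes in practice.

```python
JAVA_INTEGER_MAX_VALUE = 2**31 - 1                  # 2,147,483,647
MAX_DOCS_PER_SHARD = JAVA_INTEGER_MAX_VALUE - 128   # 2,147,483,519


def min_primaries_for(doc_count):
    """Smallest primary shard count whose combined limit covers doc_count."""
    return max(1, -(-doc_count // MAX_DOCS_PER_SHARD))  # ceiling division


print(MAX_DOCS_PER_SHARD)
print(min_primaries_for(1_000_000_000))
print(min_primaries_for(5_000_000_000))
```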

With that out of the way, let’s get started with the fun part…
