SolrCloud Features and Architecture

Source: Internet | Editor: 程序博客网 | Time: 2024/05/17 07:38

Excerpted from: http://www.xuebuyuan.com/1742284.html

English original: http://blog.sematext.com/2012/02/01/solrcloud-distributed-realtime-search/


Some of the nice things about SolrCloud are:

  • centralized cluster configuration
  • automatic node fail-over
  • near real time search
  • leader election
  • durable writes

Furthermore, SolrCloud can be configured to:

  • have multiple index shards
  • have one or more replicas of each shard

Shards and Replicas are arranged into Collections. Multiple Collections can be deployed in a single SolrCloud cluster. A single search request can search multiple Collections at once, as long as they are compatible. The diagram below shows a high-level picture of how SolrCloud indexing works.


Figure: SolrCloud Shards, Replicas, Replication

As the above diagram shows, documents can be sent to any SolrCloud node/instance in the SolrCloud cluster. Documents are automatically forwarded to the appropriate Shard Leader (labeled as Shard 1 and Shard 2 in the diagram), and they are sent in batches between Shards. If a Shard has one or more replicas (labeled Shard 1 replica and Shard 2 replica in the diagram), a document will get replicated to one or more replicas. Unlike in traditional master-slave Solr setups, where index/shard replication is performed periodically in batches, replication in SolrCloud is done in real-time. This is how Distributed Indexing works at the high level. We simplified things a bit, of course; for example, there is no ZooKeeper or overseer shown in our diagram.
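The forwarding and replication flow described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the node names, the `CLUSTER` layout, and the MD5-modulo rule in `shard_for` are hypothetical and are not Solr's actual routing algorithm.

```python
import hashlib
from collections import defaultdict

# Hypothetical cluster layout: each shard has one leader and its replicas.
CLUSTER = {
    "shard1": {"leader": "node-a", "replicas": ["node-c"]},
    "shard2": {"leader": "node-b", "replicas": ["node-d"]},
}
SHARD_NAMES = sorted(CLUSTER)


def shard_for(doc_id: str) -> str:
    """Pick a shard from a stable hash of the document id (illustrative rule)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return SHARD_NAMES[int.from_bytes(digest[:4], "big") % len(SHARD_NAMES)]


def index(docs):
    """Simulate any node receiving docs: batch them per shard, forward each
    batch to the shard leader, then copy it to that shard's replicas."""
    batches = defaultdict(list)
    for doc in docs:
        batches[shard_for(doc["id"])].append(doc)

    stored = defaultdict(list)  # node name -> documents held by that node
    for shard, batch in batches.items():
        stored[CLUSTER[shard]["leader"]].extend(batch)   # forwarded to leader
        for replica in CLUSTER[shard]["replicas"]:
            stored[replica].extend(batch)                # replicated immediately
    return dict(stored)


if __name__ == "__main__":
    placement = index([{"id": str(i)} for i in range(6)])
    for node, held in sorted(placement.items()):
        print(node, [d["id"] for d in held])
```

Whichever node receives the documents, each leader and its replica end up holding the same set, which is the property the paragraph describes.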

Setup Details

All configuration files are stored in ZooKeeper. If you are not familiar with ZooKeeper, you can think of it as a distributed file system where SolrCloud configuration files are stored. When the first Solr instance in a SolrCloud cluster is started, configuration files need to be sent to ZooKeeper and one needs to specify how many shards there should be in the cluster. Then, once this Solr instance/node is running, one can start additional Solr instances/nodes and point them to the ZooKeeper instance (ZooKeeper is actually typically deployed as a quorum of 3, 5, or more instances in production environments).
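As a side note on why ZooKeeper ensembles are usually sized at 3 or 5: ZooKeeper stays available only while a strict majority of its instances are up. The arithmetic can be sketched as follows; the function names are made up for this illustration.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of ZooKeeper instances that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many instances can be lost while a majority is still running."""
    return ensemble_size - majority(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} instances: majority={majority(n)}, "
              f"can lose {tolerated_failures(n)}")
```

Going from 3 to 4 instances does not raise the number of tolerated failures, which is why odd ensemble sizes are the usual choice.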

Shard Replicas in SolrCloud serve multiple purposes.

They provide fault tolerance in the sense that when (not if!) a single Solr instance/node containing a portion of the index goes down, you still have one or more replicas of the data that was served by that instance elsewhere in the cluster, and thus you still have the whole data set and no data loss. They also allow you to spread query load over more servers, thus making the cluster capable of handling higher query rates.
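Both roles of a replica, spreading query load and surviving a node failure, can be illustrated with a toy round-robin picker. This is a sketch only: `ShardRouter` and the node names are hypothetical, and SolrCloud's own request routing is not shown here.

```python
from itertools import cycle


class ShardRouter:
    """Round-robin over the live copies (leader and replicas) of one shard."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.live = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        """Record that a node has failed so it is skipped from now on."""
        self.live.discard(node)

    def pick(self):
        """Return the next live node, or fail if no copy of the shard is left."""
        if not self.live:
            raise RuntimeError("no live copy of this shard")
        for node in self._ring:
            if node in self.live:
                return node


if __name__ == "__main__":
    router = ShardRouter(["node-a", "node-c"])
    print([router.pick() for _ in range(4)])  # queries alternate between copies
    router.mark_down("node-a")                # simulate a node failure
    print([router.pick() for _ in range(2)])  # remaining copy serves everything
```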

Indexing

As you saw above, the new SolrCloud really simplifies Distributed Indexing. Document distribution between Shards and Replicas is automatic and real-time. There is no master server one needs to send all documents to. A document can be sent to any SolrCloud instance and SolrCloud takes care of the rest. Because of this, there is no longer a SPOF (Single Point of Failure) in Solr. Previously, the Solr master was a SPOF in all but the most elaborate setups.
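From the client side, "send a document to any instance" might look like the sketch below. The node addresses and the collection name `collection1` are assumptions, and the `/update` path with a JSON body is assumed from Solr's usual update handler conventions; the exact handler path can differ by Solr version and configuration. The request is only built, not sent.

```python
import json
import random
import urllib.request

# Hypothetical node addresses; any node in the cluster may receive updates.
NODES = ["http://node-a:8983/solr", "http://node-b:8983/solr"]


def build_update_request(docs, collection="collection1", nodes=NODES):
    """Build (but do not send) a JSON update request aimed at a random node."""
    base = random.choice(nodes)          # no master: every node is a valid target
    url = f"{base}/{collection}/update"  # assumed update handler path
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_update_request([{"id": "1", "title": "hello"}])
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a running cluster.
```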

Querying

One can query SolrCloud in a few different ways:

  • One can query a single Shard, which is just like querying a single Solr instance.
  • The second option is to query a single Collection (i.e., search all shards holding pieces of a given Collection's index).
  • The third option is to only query some of the Shards by specifying their addresses or names.
  • Finally, one can query multiple Collections, assuming they are compatible and Solr can merge the results they return.
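The four query styles listed above can be sketched as request URLs. The host, the collection names, and the shard names are hypothetical; `distrib`, `shards`, and `collection` are Solr request parameters, but their exact behavior should be checked against the documentation for the Solr version in use.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr"  # hypothetical node address


def select_url(core, **params):
    """Build a /select URL on the given core or collection with extra params."""
    query = {"q": "*:*", **params}
    return f"{BASE}/{core}/select?{urlencode(query)}"


# 1. A single shard: ask the receiving node not to fan the query out.
single_shard = select_url("collection1", distrib="false")

# 2. A whole collection: the default distributed search over all its shards.
whole_collection = select_url("collection1")

# 3. Only some shards, named explicitly.
some_shards = select_url("collection1", shards="shard1,shard2")

# 4. Several compatible collections at once.
multi_collection = select_url("collection1", collection="collection1,collection2")

if __name__ == "__main__":
    for url in (single_shard, whole_collection, some_shards, multi_collection):
        print(url)
```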






