HBase Study Notes --- hbase-indexer WIKI
Source: Internet  Editor: 程序博客网  Time: 2024/06/06 09:31
Introduction
The HBase Indexer project provides indexing (via Solr) for content stored in HBase. It provides a flexible and extensible way of defining indexing rules, and is designed to scale.
Indexing is performed asynchronously, so it does not impact write throughput on HBase. SolrCloud is used for storing the actual index in order to ensure scalability of the indexing.
Getting started with the HBase Indexer
- Make sure you've got the required software installed, as detailed on the Requirements page.
- Follow the Tutorial to get a feel for how to use the HBase Indexer.
- Customize your indexing setup as needed using the other reference documentation provided here.
How it works
The HBase Indexer works by acting as an HBase replication sink. As updates are written to HBase region servers, they are "replicated" asynchronously to the HBase Indexer processes.
The indexer analyzes incoming HBase mutation events, and where applicable it creates Solr documents and pushes them to SolrCloud servers.
The indexed documents in Solr contain enough information to uniquely identify the HBase row that they are based on, allowing you to use Solr to search for content that is stored in HBase.
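As a rough illustration of the mapping step described above, the following Python sketch turns a simplified mutation event into a Solr-style document whose `id` is the HBase row key. The event shape, the `FIELD_MAPPING` table and the function name are hypothetical and are not part of the HBase Indexer API; the real indexer is driven by its own configuration rather than by code like this.

```python
from typing import Dict, Optional

# Hypothetical mapping from "family:qualifier" to a Solr field name.
FIELD_MAPPING: Dict[str, str] = {
    "info:firstname": "firstname_s",
    "info:lastname": "lastname_s",
}


def event_to_document(row_key: str, cells: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Build a Solr-style document from one row's changed cells.

    Returns None when no changed cell is covered by the mapping, which
    corresponds to the "where applicable" case: nothing to index.
    """
    fields = {
        FIELD_MAPPING[column]: value
        for column, value in cells.items()
        if column in FIELD_MAPPING
    }
    if not fields:
        return None
    # The row key is stored as the document id so that a search hit can
    # be traced back to the HBase row it came from.
    return {"id": row_key, **fields}
```

A search hit on such a document carries the row key in `id`, which is enough to fetch the full row from HBase afterwards.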
HBase replication is based on reading the HBase log files, which are the precise source of truth for what is stored in HBase: there are no missing or extra events. In various cases, the log also contains all the information needed for indexing, so that no expensive random read on HBase is necessary (see the read-row attribute in the Indexer Configuration).
HBase replication delivers (small) batches of events. The HBase Indexer exploits this by avoiding double-indexing of a row that was updated twice within a short time frame, and by batching/buffering the updates towards Solr, which gives important performance gains. The updates are applied to Solr before the processing of the events is confirmed to HBase, so that no event loss is possible.
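The batching behaviour described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the indexer's actual implementation: the `Event` shape and the `send_to_solr` and `acknowledge` callbacks are hypothetical stand-ins for the Solr update and for the confirmation sent back to HBase.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Event = Tuple[str, Dict[str, str]]  # (row key, changed cells): assumed shape


def coalesce_by_row(events: Iterable[Event]) -> List[Event]:
    """Merge events for the same row, later values overriding earlier ones."""
    merged: Dict[str, Dict[str, str]] = {}
    for row_key, cells in events:
        merged.setdefault(row_key, {}).update(cells)
    return list(merged.items())


def process_batch(
    events: Iterable[Event],
    send_to_solr: Callable[[List[Event]], None],
    acknowledge: Callable[[], None],
) -> int:
    """Index one replicated batch, acknowledging only after Solr accepted it."""
    updates = coalesce_by_row(events)
    # If this call raises, acknowledge() is never reached, so the batch
    # stays unconfirmed and can be delivered again.
    send_to_solr(updates)
    acknowledge()
    return len(updates)
```

Two updates to the same row within one batch thus produce a single update towards Solr, and the ordering of the two calls is what rules out losing events.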
Horizontal scalability
All information about indexers is stored in ZooKeeper. New indexer hosts can always be added to a cluster, in the same way that HBase regionservers can be added to an HBase cluster.
All indexing work for a single configured indexer is shared over all machines in the cluster. In this way, adding additional indexer nodes allows horizontal scaling.
Automatic failure handling
The HBase replication system upon which the HBase Indexer is based is designed to handle hardware failures. Because the HBase Indexer is based on this system, it also benefits from the same ability to handle failures.
In general, indexing nodes going down or Solr nodes going down will not result in any lost data in the HBase Indexer.
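One way to see why a redelivered batch need not harm the index is sketched below in Python. It assumes that documents are keyed by the HBase row key and that adding a document with an existing id overwrites it; the dict stands in for Solr, and none of the names come from the HBase Indexer itself.

```python
from typing import Dict, List

Document = Dict[str, str]


def upsert(index: Dict[str, Document], docs: List[Document]) -> None:
    """Add or overwrite documents by id, as a keyed index would."""
    for doc in docs:
        index[doc["id"]] = doc


index: Dict[str, Document] = {}
batch = [{"id": "row1", "name_s": "a"}, {"id": "row2", "name_s": "b"}]

upsert(index, batch)  # first delivery; suppose the node dies before confirming
upsert(index, batch)  # the same batch is delivered again after recovery

print(len(index))  # prints 2: the replay did not create duplicates
```

Under these assumptions a failure costs some repeated work, but it leaves the index in the same state as a single successful delivery.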
=========================================================================
Translation notes:
The above can be understood in several parts:
Getting started with the HBase Indexer
How it works
Horizontal scalability
Automatic failure handling
Automatic failure handling: the design follows that of HBase replication, which is built to tolerate hardware failures, so automatic failure handling is effective.