Hadoop 2.7.1 + HBase 1.2.1 Cluster Setup (7): HBase Performance Tuning
Source: Internet. Editor: 程序博客网. Time: 2024/06/07 15:51
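When an HBase node fails, it is usually because ZooKeeper (ZK) has decided the node is unavailable and has removed it from ZK.
The node then notices that it has been removed from ZK and shuts itself down.
The usual approach is to read that node's log and work from there. The following situations are currently known to bring a node down:
1) Full GC. Tune the GC settings by modifying HBASE_REGIONSERVER_OPTS, for example by switching to a concurrent collector.
2) The memstores of all regions flush at the same time and block client requests until the ZK timeout is reached. The root cause is too little memory; increase HBASE_HEAPSIZE.
3) Split operations block reads and writes until the ZK timeout is reached. Plan ahead and pre-split the table so that frequent splits do not happen later.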
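1. Preface
I have been using HBase for a while. Going from initial doubts about its read/write performance to finally being convinced of it was a long process. Here is a personal summary of HBase performance tuning.
2. The official performance tuning guide (most authoritative)
Technical documentation usually includes tuning advice on the official site. To find it, search the documentation for "Performance Tuning".
The official documentation is thorough but brief. It covers tuning advice for the operating system, network, Java, HBase configuration, ZooKeeper, and schema design. I only list the links here and will not go into detail.
2.1 Performance tuning (English): https://hbase.apache.org/0.94/book.html#performance
2.2 Performance tuning (Chinese): http://abloz.com/hbase/book.html#performance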
3. Key points for performance tuning
3.1 Memory
3.1.1 Tuning the HBase client
The key setting is hbase.client.write.buffer, the default size of the HTable client write buffer. The larger this value, the more memory is consumed.
Let us first look at the source code involved in creating an HBase client:
1) Add hbase-client to pom.xml

    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>1.2.1</version>
    </dependency>

2) Create the HTable object used to operate on the table

    Configuration configuration = HBaseConfiguration.create();
    configuration.set("hbase.zookeeper.property.clientPort", "2181");
    configuration.set("hbase.client.write.buffer", "5242880");
    configuration.set("hbase.zookeeper.quorum", "192.168.199.31,192.168.199.32,192.168.199.33,192.168.199.34,192.168.199.35");
    HTable table = new HTable(configuration, "tableName");
    try {
      // Use the table as needed, for a single operation and a single thread
      // construct List<Put> putLists
      // call table.put(putLists)
    } finally {
      table.close();
    }

When the HTable object is created, we pass in the Configuration we built with org.apache.hadoop.hbase.HBaseConfiguration. The following constructor is called first:

    @Deprecated
    public HTable(Configuration conf, final String tableName) throws IOException {
      this(conf, TableName.valueOf(tableName));
    }

which immediately calls:

    @Deprecated
    public HTable(Configuration conf, final TableName tableName) throws IOException {
      this.tableName = tableName;
      this.cleanupPoolOnClose = this.cleanupConnectionOnClose = true;
      if (conf == null) {
        this.connection = null;
        return;
      }
      this.connection = ConnectionManager.getConnectionInternal(conf);
      this.configuration = conf;
      this.pool = getDefaultExecutor(conf);
      this.finishSetup();
    }

It is worth looking at what this.finishSetup() does:

    private void finishSetup() throws IOException {
      if (connConfiguration == null) {
        connConfiguration = new ConnectionConfiguration(configuration);
      }
      this.operationTimeout = tableName.isSystemTable() ?
          connConfiguration.getMetaOperationTimeout() : connConfiguration.getOperationTimeout();
      this.scannerCaching = connConfiguration.getScannerCaching();
      this.scannerMaxResultSize = connConfiguration.getScannerMaxResultSize();
      if (this.rpcCallerFactory == null) {
        this.rpcCallerFactory = connection.getNewRpcRetryingCallerFactory(configuration);
      }
      if (this.rpcControllerFactory == null) {
        this.rpcControllerFactory = RpcControllerFactory.instantiate(configuration);
      }
      // puts need to track errors globally due to how the APIs currently work.
      multiAp = this.connection.getAsyncProcess();
      this.closed = false;
      this.locator = new HRegionLocator(tableName, connection);
    }

In the code above, although we pass in our own Configuration, HTable maintains its own private ConnectionConfiguration connConfiguration. Pay close attention to this object. It is built from the Configuration the programmer passes in. What happens to values the programmer does not set? As you might guess, defaults are used. To see which defaults, read on in the source of org.apache.hadoop.hbase.client.ConnectionConfiguration:
    /**
     * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
     * agreements. See the NOTICE file distributed with this work for additional information regarding
     * copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the
     * "License"); you may not use this file except in compliance with the License. You may obtain a
     * copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable
     * law or agreed to in writing, software distributed under the License is distributed on an "AS IS"
     * BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License
     * for the specific language governing permissions and limitations under the License.
     */
    package org.apache.hadoop.hbase.client;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.classification.InterfaceAudience;
    import com.google.common.annotations.VisibleForTesting;

    /**
     * Configuration parameters for the connection.
     * Configuration is a heavy weight registry that does a lot of string operations and regex matching.
     * Method calls into Configuration account for high CPU usage and have huge performance impact.
     * This class caches connection-related configuration values in the ConnectionConfiguration
     * object so that expensive conf.getXXX() calls are avoided every time HTable, etc is instantiated.
     * see HBASE-12128
     */
    @InterfaceAudience.Private
    public class ConnectionConfiguration {
      public static final String WRITE_BUFFER_SIZE_KEY = "hbase.client.write.buffer";
      public static final long WRITE_BUFFER_SIZE_DEFAULT = 2097152;
      public static final String MAX_KEYVALUE_SIZE_KEY = "hbase.client.keyvalue.maxsize";
      public static final int MAX_KEYVALUE_SIZE_DEFAULT = -1;

      private final long writeBufferSize;
      private final int metaOperationTimeout;
      private final int operationTimeout;
      private final int scannerCaching;
      private final long scannerMaxResultSize;
      private final int primaryCallTimeoutMicroSecond;
      private final int replicaCallTimeoutMicroSecondScan;
      private final int retries;
      private final int maxKeyValueSize;

      /**
       * Constructor
       * @param conf Configuration object
       */
      ConnectionConfiguration(Configuration conf) {
        this.writeBufferSize = conf.getLong(WRITE_BUFFER_SIZE_KEY, WRITE_BUFFER_SIZE_DEFAULT);
        this.metaOperationTimeout = conf.getInt(
            HConstants.HBASE_CLIENT_META_OPERATION_TIMEOUT,
            HConstants.DEFAULT_HBASE_CLIENT_OPERATION_TIMEOUT);
        this.operationTimeout = conf.getInt(
            HConstants.HBASE_CLIENT_OPERATION_TIMEOUT,
            HConstants.DEFAULT_HBASE_CLIENT_OPERATION_TIMEOUT);
        this.scannerCaching = conf.getInt(
            HConstants.HBASE_CLIENT_SCANNER_CACHING,
            HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING);
        this.scannerMaxResultSize =
            conf.getLong(HConstants.HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE_KEY,
                HConstants.DEFAULT_HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE);
        this.primaryCallTimeoutMicroSecond =
            conf.getInt("hbase.client.primaryCallTimeout.get", 10000); // 10ms
        this.replicaCallTimeoutMicroSecondScan =
            conf.getInt("hbase.client.replicaCallTimeout.scan", 1000000); // 1000 ms
        this.retries = conf.getInt(
            HConstants.HBASE_CLIENT_RETRIES_NUMBER,
            HConstants.DEFAULT_HBASE_CLIENT_RETRIES_NUMBER);
        this.maxKeyValueSize = conf.getInt(MAX_KEYVALUE_SIZE_KEY, MAX_KEYVALUE_SIZE_DEFAULT);
      }

      /**
       * Constructor
       * This is for internal testing purpose (using the default value).
       * In real usage, we should read the configuration from the Configuration object.
       */
      @VisibleForTesting
      protected ConnectionConfiguration() {
        this.writeBufferSize = WRITE_BUFFER_SIZE_DEFAULT;
        this.metaOperationTimeout = HConstants.DEFAULT_HBASE_CLIENT_OPERATION_TIMEOUT;
        this.operationTimeout = HConstants.DEFAULT_HBASE_CLIENT_OPERATION_TIMEOUT;
        this.scannerCaching = HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING;
        this.scannerMaxResultSize = HConstants.DEFAULT_HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE;
        this.primaryCallTimeoutMicroSecond = 10000;
        this.replicaCallTimeoutMicroSecondScan = 1000000;
        this.retries = HConstants.DEFAULT_HBASE_CLIENT_RETRIES_NUMBER;
        this.maxKeyValueSize = MAX_KEYVALUE_SIZE_DEFAULT;
      }

      public long getWriteBufferSize() { return writeBufferSize; }
      public int getMetaOperationTimeout() { return metaOperationTimeout; }
      public int getOperationTimeout() { return operationTimeout; }
      public int getScannerCaching() { return scannerCaching; }
      public int getPrimaryCallTimeoutMicroSecond() { return primaryCallTimeoutMicroSecond; }
      public int getReplicaCallTimeoutMicroSecondScan() { return replicaCallTimeoutMicroSecondScan; }
      public int getRetriesNumber() { return retries; }
      public int getMaxKeyValueSize() { return maxKeyValueSize; }
      public long getScannerMaxResultSize() { return scannerMaxResultSize; }
    }
The defaults from the code above, together with other client-side settings, are as follows:
- hbase.client.write.buffer: default 2097152 bytes, i.e. 2 MB
- hbase.client.meta.operation.timeout: default 1200000 ms
- hbase.client.operation.timeout: default 1200000 ms
- hbase.client.scanner.caching: default Integer.MAX_VALUE
- hbase.client.scanner.max.result.size: default 2 MB
- hbase.client.primaryCallTimeout.get: default 10000 microseconds (10 ms, per the field name and comment in the source)
- hbase.client.replicaCallTimeout.scan: default 1000000 microseconds (1000 ms)
- hbase.client.retries.number: default 31
- hbase.client.keyvalue.maxsize: default -1, no limit
- hbase.client.ipc.pool.type
- hbase.client.ipc.pool.size
- hbase.client.pause: 100
- hbase.client.max.total.tasks: 100
- hbase.client.max.perserver.tasks: 2
- hbase.client.max.perregion.tasks: 1
- hbase.client.instance.id
- hbase.client.scanner.timeout.period: 60000
- hbase.client.rpc.codec
- hbase.regionserver.lease.period: replaced by hbase.client.scanner.timeout.period, 60000
- hbase.client.fast.fail.mode.enabled: false
- hbase.client.fastfail.threshold: 60000
- hbase.client.fast.fail.cleanup.duration: 600000
- hbase.client.fast.fail.interceptor.impl
- hbase.client.backpressure.enabled: false
The HTable constructors shown earlier are marked @Deprecated. The same client can be created through ConnectionFactory:

    Configuration configuration = HBaseConfiguration.create();
    configuration.set("hbase.zookeeper.property.clientPort", "2181");
    configuration.set("hbase.client.write.buffer", "5242880");
    configuration.set("hbase.zookeeper.quorum", "192.168.199.31,192.168.199.32,192.168.199.33,192.168.199.34,192.168.199.35");
    Connection connection = ConnectionFactory.createConnection(configuration);
    Table table = connection.getTable(TableName.valueOf("tableName"));
    try {
      // Use the table as needed, for a single operation and a single thread
      // construct List<Put> putLists
      // call table.put(putLists)
    } finally {
      table.close();
      connection.close();
    }

Here too we pass in the Configuration we built ourselves. By default org.apache.hadoop.hbase.client.ConnectionFactory builds the Connection with ConnectionManager.HConnectionImplementation, whose constructor looks like this:
    /**
     * constructor
     * @param conf Configuration object
     * @param managed If true, does not do full shutdown on close; i.e. cleanup of connection
     * to zk and shutdown of all services; we just close down the resources this connection was
     * responsible for and decrement usage counters. It is up to the caller to do the full
     * cleanup. It is set when we want have connection sharing going on -- reuse of zk connection,
     * and cached region locations, established regionserver connections, etc. When connections
     * are shared, we have reference counting going on and will only do full cleanup when no more
     * users of an HConnectionImplementation instance.
     */
    HConnectionImplementation(Configuration conf, boolean managed,
        ExecutorService pool, User user) throws IOException {
      this(conf);
      this.user = user;
      this.batchPool = pool;
      this.managed = managed;
      this.registry = setupRegistry();
      retrieveClusterId();

      this.rpcClient = RpcClientFactory.createClient(this.conf, this.clusterId, this.metrics);
      this.rpcControllerFactory = RpcControllerFactory.instantiate(conf);

      // Do we publish the status?
      boolean shouldListen = conf.getBoolean(HConstants.STATUS_PUBLISHED,
          HConstants.STATUS_PUBLISHED_DEFAULT);
      Class<? extends ClusterStatusListener.Listener> listenerClass =
          conf.getClass(ClusterStatusListener.STATUS_LISTENER_CLASS,
              ClusterStatusListener.DEFAULT_STATUS_LISTENER_CLASS,
              ClusterStatusListener.Listener.class);
      if (shouldListen) {
        if (listenerClass == null) {
          LOG.warn(HConstants.STATUS_PUBLISHED + " is true, but " +
              ClusterStatusListener.STATUS_LISTENER_CLASS + " is not set - not listening status");
        } else {
          clusterStatusListener = new ClusterStatusListener(
              new ClusterStatusListener.DeadServerHandler() {
                @Override
                public void newDead(ServerName sn) {
                  clearCaches(sn);
                  rpcClient.cancelConnections(sn);
                }
              }, conf, listenerClass);
        }
      }
    }

Follow the first statement, this(conf):
    protected HConnectionImplementation(Configuration conf) {
      this.conf = conf;
      this.connectionConfig = new ConnectionConfiguration(conf);
      this.closed = false;
      this.pause = conf.getLong(HConstants.HBASE_CLIENT_PAUSE,
          HConstants.DEFAULT_HBASE_CLIENT_PAUSE);
      this.useMetaReplicas = conf.getBoolean(HConstants.USE_META_REPLICAS,
          HConstants.DEFAULT_USE_META_REPLICAS);
      this.numTries = connectionConfig.getRetriesNumber();
      this.rpcTimeout = conf.getInt(
          HConstants.HBASE_RPC_TIMEOUT_KEY,
          HConstants.DEFAULT_HBASE_RPC_TIMEOUT);
      if (conf.getBoolean(CLIENT_NONCES_ENABLED_KEY, true)) {
        synchronized (nonceGeneratorCreateLock) {
          if (ConnectionManager.nonceGenerator == null) {
            ConnectionManager.nonceGenerator = new PerClientRandomNonceGenerator();
          }
          this.nonceGenerator = ConnectionManager.nonceGenerator;
        }
      } else {
        this.nonceGenerator = new NoNonceGenerator();
      }
      stats = ServerStatisticTracker.create(conf);
      this.asyncProcess = createAsyncProcess(this.conf);
      this.interceptor = (new RetryingCallerInterceptorFactory(conf)).build();
      this.rpcCallerFactory = RpcRetryingCallerFactory.instantiate(conf, interceptor, this.stats);
      this.backoffPolicy = ClientBackoffPolicyFactory.create(conf);
      if (conf.getBoolean(CLIENT_SIDE_METRICS_ENABLED_KEY, false)) {
        this.metrics = new MetricsConnection(this);
      } else {
        this.metrics = null;
      }
      this.hostnamesCanChange = conf.getBoolean(RESOLVE_HOSTNAME_ON_FAIL_KEY, true);
      this.metaCache = new MetaCache(this.metrics);
    }

So ConnectionManager.HConnectionImplementation also maintains its own connectionConfig, and many of its values come from defaults as well. Now look at the code that submits requests to the HBase server, table.put(putLists):
    public void put(final List<Put> puts) throws IOException {
      getBufferedMutator().mutate(puts);
      if (autoFlush) {
        flushCommits();
      }
    }

This first runs getBufferedMutator().mutate(puts). getBufferedMutator() reads the client's hbase.client.write.buffer setting (2 MB by default if not configured) from the ClusterConnection built from the programmer's Configuration, and then mutate(List<? extends Mutation> ms) is called:
    public void mutate(List<? extends Mutation> ms) throws InterruptedIOException,
        RetriesExhaustedWithDetailsException {
      if (closed) {
        throw new IllegalStateException("Cannot put when the BufferedMutator is closed.");
      }

      long toAddSize = 0;
      for (Mutation m : ms) {
        if (m instanceof Put) {
          validatePut((Put) m);
        }
        toAddSize += m.heapSize();
      }

      // This behavior is highly non-intuitive... it does not protect us against
      // 94-incompatible behavior, which is a timing issue because hasError, the below code
      // and setter of hasError are not synchronized. Perhaps it should be removed.
      if (ap.hasError()) {
        currentWriteBufferSize.addAndGet(toAddSize);
        writeAsyncBuffer.addAll(ms);
        backgroundFlushCommits(true);
      } else {
        currentWriteBufferSize.addAndGet(toAddSize);
        writeAsyncBuffer.addAll(ms);
      }

      // Now try and queue what needs to be queued.
      while (currentWriteBufferSize.get() > writeBufferSize) {
        backgroundFlushCommits(false);
      }
    }

Roughly, this means: you submit a batch of List<Put>, the client adds it to its buffer, and then checks whether the total number of bytes buffered exceeds the configured hbase.client.write.buffer. As long as it does, backgroundFlushCommits(false) is called. backgroundFlushCommits looks like this:
    private void backgroundFlushCommits(boolean synchronous) throws InterruptedIOException,
        RetriesExhaustedWithDetailsException {
      LinkedList<Mutation> buffer = new LinkedList<>();
      // Keep track of the size so that this thread doesn't spin forever
      long dequeuedSize = 0;

      try {
        // Walk through the submitted Puts (Put is an implementation of Mutation)
        Mutation m;
        // While (writeBufferSize <= 0 || dequeuedSize < writeBufferSize * 2 || synchronous)
        // and writeAsyncBuffer still holds Mutation objects,
        // keep adding to dequeuedSize and decrementing currentWriteBufferSize
        while ((writeBufferSize <= 0 || dequeuedSize < (writeBufferSize * 2) || synchronous)
            && (m = writeAsyncBuffer.poll()) != null) {
          buffer.add(m);
          long size = m.heapSize();
          dequeuedSize += size;
          currentWriteBufferSize.addAndGet(-size);
        }

        // For backgroundFlushCommits(false): return early only if nothing was dequeued;
        // with a non-empty List<Put> this branch is not taken
        if (!synchronous && dequeuedSize == 0) {
          return;
        }

        // Taken for backgroundFlushCommits(false); does not wait for the result
        if (!synchronous) {
          ap.submit(tableName, buffer, true, null, false);
          if (ap.hasError()) {
            LOG.debug(tableName + ": One or more of the operations have failed -"
                + " waiting for all operation in progress to finish (successfully or not)");
          }
        }
        // Taken for backgroundFlushCommits(true); waits for the result
        if (synchronous || ap.hasError()) {
          while (!buffer.isEmpty()) {
            ap.submit(tableName, buffer, true, null, false);
          }
          RetriesExhaustedWithDetailsException error = ap.waitForAllPreviousOpsAndReset(null);
          if (error != null) {
            if (listener == null) {
              throw error;
            } else {
              this.listener.onException(error, this);
            }
          }
        }
      } finally {
        for (Mutation mut : buffer) {
          long size = mut.heapSize();
          currentWriteBufferSize.addAndGet(size);
          dequeuedSize -= size;
          writeAsyncBuffer.add(mut);
        }
      }
    }

Here ap.submit(tableName, buffer, true, null, false) submits directly without waiting for the result. It in turn calls AsyncProcess.submit(ExecutorService pool, TableName tableName, List<? extends Row> rows, boolean atLeastOne, Batch.Callback<CResult> callback, boolean needResults), whose source is:
    /**
     * Extract from the rows list what we can submit. The rows we can not submit are kept in the
     * list. Does not send requests to replicas (not currently used for anything other
     * than streaming puts anyway).
     *
     * @param pool ExecutorService to use.
     * @param tableName The table for which this request is needed.
     * @param callback Batch callback. Only called on success (94 behavior).
     * @param needResults Whether results are needed, or can be discarded.
     * @param rows - the submitted row. Modified by the method: we remove the rows we took.
     * @param atLeastOne true if we should submit at least a subset.
     */
    public <CResult> AsyncRequestFuture submit(ExecutorService pool, TableName tableName,
        List<? extends Row> rows, boolean atLeastOne, Batch.Callback<CResult> callback,
        boolean needResults) throws InterruptedIOException {
      if (rows.isEmpty()) {
        return NO_REQS_RESULT;
      }

      Map<ServerName, MultiAction<Row>> actionsByServer =
          new HashMap<ServerName, MultiAction<Row>>();
      List<Action<Row>> retainedActions = new ArrayList<Action<Row>>(rows.size());

      NonceGenerator ng = this.connection.getNonceGenerator();
      long nonceGroup = ng.getNonceGroup(); // Currently, nonce group is per entire client.

      // Location errors that happen before we decide what requests to take.
      List<Exception> locationErrors = null;
      List<Integer> locationErrorRows = null;
      do {
        // Wait until there is at least one slot for a new task.
        waitForMaximumCurrentTasks(maxTotalConcurrentTasks - 1);

        // Remember the previous decisions about regions or region servers we put in the
        // final multi.
        Map<HRegionInfo, Boolean> regionIncluded = new HashMap<HRegionInfo, Boolean>();
        Map<ServerName, Boolean> serverIncluded = new HashMap<ServerName, Boolean>();

        int posInList = -1;
        Iterator<? extends Row> it = rows.iterator();
        while (it.hasNext()) {
          Row r = it.next();
          HRegionLocation loc;
          try {
            if (r == null) {
              throw new IllegalArgumentException("#" + id + ", row cannot be null");
            }
            // Make sure we get 0-s replica.
            RegionLocations locs = connection.locateRegion(
                tableName, r.getRow(), true, true, RegionReplicaUtil.DEFAULT_REPLICA_ID);
            if (locs == null || locs.isEmpty() || locs.getDefaultRegionLocation() == null) {
              throw new IOException("#" + id + ", no location found, aborting submit for"
                  + " tableName=" + tableName + " rowkey=" + Bytes.toStringBinary(r.getRow()));
            }
            loc = locs.getDefaultRegionLocation();
          } catch (IOException ex) {
            locationErrors = new ArrayList<Exception>();
            locationErrorRows = new ArrayList<Integer>();
            LOG.error("Failed to get region location ", ex);
            // This action failed before creating ars. Retain it, but do not add to submit list.
            // We will then add it to ars in an already-failed state.
            retainedActions.add(new Action<Row>(r, ++posInList));
            locationErrors.add(ex);
            locationErrorRows.add(posInList);
            it.remove();
            break; // Backward compat: we stop considering actions on location error.
          }

          if (canTakeOperation(loc, regionIncluded, serverIncluded)) {
            Action<Row> action = new Action<Row>(r, ++posInList);
            setNonce(ng, r, action);
            retainedActions.add(action);
            // TODO: replica-get is not supported on this path
            byte[] regionName = loc.getRegionInfo().getRegionName();
            addAction(loc.getServerName(), regionName, action, actionsByServer, nonceGroup);
            it.remove();
          }
        }
      } while (retainedActions.isEmpty() && atLeastOne && (locationErrors == null));

      if (retainedActions.isEmpty()) return NO_REQS_RESULT;

      return submitMultiActions(tableName, retainedActions, nonceGroup, callback, null, needResults,
          locationErrors, locationErrorRows, actionsByServer, pool);
    }

The code above works out, for each Put in the submitted List<Put>, which region and which RegionServer it belongs to, and then submits the Puts in batches. Another setting matters here: hbase.client.max.total.tasks (default 100), the maximum number of concurrent tasks the client keeps in flight. If that many tasks are already outstanding, submit waits until a slot is free. Let us go back to the put code:
    public void put(final List<Put> puts) throws IOException {
      getBufferedMutator().mutate(puts);
      if (autoFlush) {
        flushCommits();
      }
    }

Next, if the HTable property autoFlush is set (it defaults to true), whatever data remains is submitted to the HBase server in one final flush, however little it is. flushCommits() calls getBufferedMutator().flush(), which calls BufferedMutatorImpl.backgroundFlushCommits(true). That in turn calls the ap.submit(tableName, buffer, true, null, false) shown above and then ap.waitForAllPreviousOpsAndReset(null) to wait for the results.
After all that, let us summarize:
- When the client creates an HTable it passes in a Configuration built by org.apache.hadoop.hbase.HBaseConfiguration. Any value you do not set takes its default. The defaults are the ones in the list of client settings given earlier (hbase.client.write.buffer through hbase.client.backpressure.enabled).
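- When you call table.put(List<Put> putLists) to write records to the HBase server, the client compares the total size of the buffered List<Put> with hbase.client.write.buffer. While the total exceeds that value it keeps submitting, until the records still unsubmitted add up to less than hbase.client.write.buffer.
- HTable has autoFlush=true by default, so the client then submits the remaining records itself in one final flush.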
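The buffering behaviour summarized above can be sketched without an HBase dependency. The class below is a simplified model, not HBase code: WriteBufferModel and its method names are hypothetical, mutations are represented only by their heap sizes, and error handling and asynchronous submission are left out. It keeps the two rules visible in mutate and backgroundFlushCommits: drain while the buffered size exceeds the write buffer, and let one background drain take up to twice the write buffer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Simplified, hypothetical model of the client write buffer; sizes stand in for Mutations. */
final class WriteBufferModel {
    private final long writeBufferSize;
    private final Deque<Long> queued = new ArrayDeque<>();
    private long currentSize = 0;
    private int flushCount = 0;

    WriteBufferModel(long writeBufferSize) {
        if (writeBufferSize <= 0) {
            throw new IllegalArgumentException("writeBufferSize must be positive");
        }
        this.writeBufferSize = writeBufferSize;
    }

    /** Models mutate(): buffer everything, then drain while the buffer is over its limit. */
    void mutate(List<Long> mutationSizes) {
        for (long size : mutationSizes) {
            queued.add(size);
            currentSize += size;
        }
        while (currentSize > writeBufferSize) {
            drain(false);
        }
    }

    /** Models the final flush that autoFlush=true triggers: drain everything. */
    void flush() {
        drain(true);
    }

    private void drain(boolean synchronous) {
        long dequeued = 0;
        // A background drain stops once it has taken 2x the write buffer; a synchronous one takes all.
        while ((dequeued < writeBufferSize * 2 || synchronous) && !queued.isEmpty()) {
            long size = queued.poll();
            dequeued += size;
            currentSize -= size;
        }
        if (dequeued > 0) {
            flushCount++; // one submission to the server in the real client
        }
    }

    long bufferedBytes() { return currentSize; }

    int flushCount() { return flushCount; }
}
```

With a 1000-byte buffer, five 300-byte mutations trigger one background drain, while three more stay buffered until flush() is called.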
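So how should we tune the client? First, configure hbase.client.write.buffer yourself. The default of 2 MB is too small; here I raise it to 5 MB, so the code needs this line:

    configuration.set("hbase.client.write.buffer", "5242880");

Second, we should not submit one record at a time, which wastes round trips to the HBase server. Instead we submit in batches, collecting as large a List<Put> as is sensible. How large is that? You first need to know the size of a single Put, which you can measure like this:

    Put put = new Put(rowkey.getBytes());
    put.add("columnFamily".getBytes(), "columnNameUnderColumnFamily".getBytes(), "columnValue".getBytes());
    System.out.println(put.heapSize());

Once you know hbase.client.write.buffer and the size of a single Put, the right length for List<Put> becomes predictable. Why limit it at all? In my tests, as the List<Put> grew, write throughput first increased, reached a peak, and then dropped. The best length is a value below (hbase.client.write.buffer / size of one Put in bytes).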
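The upper bound described above is a single division. Here is a minimal sketch, assuming the Put size has already been measured with put.heapSize(). PutBatchSizing and maxPutsPerBatch are hypothetical names, and the 2048-byte Put size is an example value only.

```java
/** Hypothetical helper: upper bound on Puts per batch for a given write buffer. */
final class PutBatchSizing {
    private PutBatchSizing() {}

    /** Largest number of Puts whose combined heap size fits in the write buffer (at least 1). */
    static int maxPutsPerBatch(long writeBufferBytes, long putHeapBytes) {
        if (writeBufferBytes <= 0 || putHeapBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long count = writeBufferBytes / putHeapBytes;
        return (int) Math.max(1L, Math.min(count, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long writeBuffer = 5L * 1024 * 1024; // the 5 MB hbase.client.write.buffer used in this post
        long putHeapSize = 2048;             // assumed example value; measure yours with put.heapSize()
        System.out.println(maxPutsPerBatch(writeBuffer, putHeapSize)); // prints 2560
    }
}
```

Treat the result as a ceiling. The text above suggests the best batch size lies somewhat below it and should be confirmed by measurement.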
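Another point: to guard against HBase server node failures, HBase enables the WAL by default. With the WAL, data being written to the memstore is also written to a log file on HDFS. (In HBase each column family in each region has its own memstore, 128 MB by default. Data is written to the memstore first and flushed to an HFile once it exceeds 128 MB. If the memstores of all regions on one node together exceeded the JVM heap it would be dangerous, so there are two further thresholds that also trigger flushes to HFiles.) After a memstore has been flushed to an HFile successfully, the expired WAL is cleaned up. The benefit of the WAL is that data is lost as rarely as possible; the cost is lower write performance. If you can tolerate losing some data, disabling the WAL speeds up writes. To disable it, add this line when constructing the Put:

    put.setDurability(Durability.SKIP_WAL);

The final code looks like this: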
    Configuration configuration = HBaseConfiguration.create();
    configuration.set("hbase.zookeeper.property.clientPort", "2181");
    configuration.set("hbase.client.write.buffer", "5242880");
    configuration.set("hbase.zookeeper.quorum", "192.168.199.31,192.168.199.32,192.168.199.33,192.168.199.34,192.168.199.35");
    HTable table = new HTable(configuration, "tableName");
    try {
      // In practice, compute the best batch size from (hbase.client.write.buffer / Put size in bytes)
      int bestSubmitCount = 2500;
      List<Put> putList = new ArrayList<Put>();
      // tableData, RowData and ColumnData are the application's own data holders
      for (RowData row : tableData.getRows()) {
        if (null == row.getColumns() || 0 == row.getColumns().size())
          continue;
        Put put = new Put(row.getRowKey());
        for (ColumnData column : row.getColumns()) {
          put.add(column.getFamily(), column.getQualifier(), column.getValue());
        }
        put.setDurability(Durability.SKIP_WAL);
        putList.add(put);
        if (putList.size() >= bestSubmitCount) {
          // Submit in batches rather than one record at a time.
          // Better: group the list by rowkey so that rows belonging to the same
          // pre-split region go into the same list, then submit each list; this is faster.
          // Better still: submit as soon as the best batch size is reached.
          table.put(putList);
          putList.clear();
        }
      }
      // Submit whatever is left in one final batch
      table.put(putList);
    } finally {
      table.close();
    }
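Notes:
1) Submit in batches rather than one record at a time, and set hbase.client.write.buffer explicitly; 2 MB really is too small.
2) Better: group the list by rowkey so that rows belonging to the same pre-split region go into the same list, then submit each list. This is faster.
3) If you can work out the best batch size in advance, the results are better still.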
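3.1.2 Tuning the HBase server
1) The hbase.regionserver.handler.count parameter sets how many requests a RegionServer can handle concurrently. If it is set too high, throughput drops; if it is set too low, requests block and get no response. You can enable RPC-level logging and read the logs to decide which value suits your cluster. (The request queue also consumes memory.) My configuration:

    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>300</value>
      <description>Count of RPC Listener instances spun up on RegionServers.Same property is used by the Master for count of master handlers.</description>
    </property>

2) Tuning the heap size in hbase-env.sh
Modify HBASE_HEAPSIZE in hbase-1.2.1/conf/hbase-env.sh. The calculations later in this section assume HEAP_SIZE=4G.
3) HBase memory configuration. To configure memory you first need to understand the HBase memory model: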
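- Each region has one memstore per column family. The memstore flush threshold defaults to 128 MB and can be changed with hbase.hregion.memstore.flush.size;
- Regions multiply as splits happen. To keep the sum of all memstores from causing an OOM error, older HBase versions use hbase.regionserver.global.memstore.upperLimit and hbase.regionserver.global.memstore.lowerLimit, while newer versions use hbase.regionserver.global.memstore.size and hbase.regionserver.global.memstore.size.lower.limit;
- With HEAP_SIZE=4G in hbase-env.sh: in older versions hbase.regionserver.global.memstore.upperLimit (default HEAP_SIZE*0.4) = 1.6G and hbase.regionserver.global.memstore.lowerLimit (default HEAP_SIZE*0.35) = 1.4G; in newer versions hbase.regionserver.global.memstore.size (default HEAP_SIZE*0.4) = 1.6G and hbase.regionserver.global.memstore.size.lower.limit (default 0.95 of the memstore size, i.e. HEAP_SIZE*0.4*0.95) = 1.52G;
- When the memstore total reaches the first threshold, the largest memstore is picked and flushed; writes are not blocked;
- When the memstore total reaches the second threshold, updates to the RegionServer are blocked and memstores are flushed until the total drops back below the first threshold.
- Each RegionServer has one BlockCache, shared by all of its regions. Its size defaults to HEAP_SIZE times 0.4 and is set with hfile.block.cache.size;
- A read checks the memstore and the BlockCache first; blocks that are not cached are then read from the HFiles.
- By default HBase requires that the memstore maximum (hbase.regionserver.global.memstore.size, default 0.4) plus the BlockCache maximum (hfile.block.cache.size, default 0.4) not exceed 0.8, because 0.2 of HEAP_SIZE must be left for other operations. See the HBase source file org.apache.hadoop.hbase.io.util.HeapMemorySizeUtil.java.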
/** * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */package org.apache.hadoop.hbase.io.util;import java.lang.management.ManagementFactory;import java.lang.management.MemoryUsage;import org.apache.commons.logging.Log;import org.apache.commons.logging.LogFactory;import org.apache.hadoop.hbase.classification.InterfaceAudience;import org.apache.hadoop.conf.Configuration;import org.apache.hadoop.hbase.HConstants;@InterfaceAudience.Privatepublic class HeapMemorySizeUtil { public static final String MEMSTORE_SIZE_KEY = "hbase.regionserver.global.memstore.size"; public static final String MEMSTORE_SIZE_OLD_KEY = "hbase.regionserver.global.memstore.upperLimit"; public static final String MEMSTORE_SIZE_LOWER_LIMIT_KEY = "hbase.regionserver.global.memstore.size.lower.limit"; public static final String MEMSTORE_SIZE_LOWER_LIMIT_OLD_KEY = "hbase.regionserver.global.memstore.lowerLimit"; public static final float DEFAULT_MEMSTORE_SIZE = 0.4f; // Default lower water mark limit is 95% size of memstore size. 
public static final float DEFAULT_MEMSTORE_SIZE_LOWER_LIMIT = 0.95f; private static final Log LOG = LogFactory.getLog(HeapMemorySizeUtil.class); // a constant to convert a fraction to a percentage private static final int CONVERT_TO_PERCENTAGE = 100; /** * Checks whether we have enough heap memory left out after portion for Memstore and Block cache. * We need atleast 20% of heap left out for other RS functions. * @param conf */ public static void checkForClusterFreeMemoryLimit(Configuration conf) { if (conf.get(MEMSTORE_SIZE_OLD_KEY) != null) { LOG.warn(MEMSTORE_SIZE_OLD_KEY + " is deprecated by " + MEMSTORE_SIZE_KEY); } float globalMemstoreSize = getGlobalMemStorePercent(conf, false); int gml = (int)(globalMemstoreSize * CONVERT_TO_PERCENTAGE); float blockCacheUpperLimit = getBlockCacheHeapPercent(conf); int bcul = (int)(blockCacheUpperLimit * CONVERT_TO_PERCENTAGE); if (CONVERT_TO_PERCENTAGE - (gml + bcul) < (int)(CONVERT_TO_PERCENTAGE * HConstants.HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD)) { throw new RuntimeException("Current heap configuration for MemStore and BlockCache exceeds " + "the threshold required for successful cluster operation. " + "The combined value cannot exceed 0.8. Please check " + "the settings for hbase.regionserver.global.memstore.size and " + "hfile.block.cache.size in your configuration. " + "hbase.regionserver.global.memstore.size is " + globalMemstoreSize + " hfile.block.cache.size is " + blockCacheUpperLimit); } } /** * Retrieve global memstore configured size as percentage of total heap. 
* @param c * @param logInvalid */ public static float getGlobalMemStorePercent(final Configuration c, final boolean logInvalid) { float limit = c.getFloat(MEMSTORE_SIZE_KEY, c.getFloat(MEMSTORE_SIZE_OLD_KEY, DEFAULT_MEMSTORE_SIZE)); if (limit > 0.8f || limit <= 0.0f) { if (logInvalid) { LOG.warn("Setting global memstore limit to default of " + DEFAULT_MEMSTORE_SIZE + " because supplied value outside allowed range of (0 -> 0.8]"); } limit = DEFAULT_MEMSTORE_SIZE; } return limit; } /** * Retrieve configured size for global memstore lower water mark as percentage of total heap. * @param c * @param globalMemStorePercent */ public static float getGlobalMemStoreLowerMark(final Configuration c, float globalMemStorePercent) { String lowMarkPercentStr = c.get(MEMSTORE_SIZE_LOWER_LIMIT_KEY); if (lowMarkPercentStr != null) { return Float.parseFloat(lowMarkPercentStr); } String lowerWaterMarkOldValStr = c.get(MEMSTORE_SIZE_LOWER_LIMIT_OLD_KEY); if (lowerWaterMarkOldValStr != null) { LOG.warn(MEMSTORE_SIZE_LOWER_LIMIT_OLD_KEY + " is deprecated. Instead use " + MEMSTORE_SIZE_LOWER_LIMIT_KEY); float lowerWaterMarkOldVal = Float.parseFloat(lowerWaterMarkOldValStr); if (lowerWaterMarkOldVal > globalMemStorePercent) { lowerWaterMarkOldVal = globalMemStorePercent; LOG.info("Setting globalMemStoreLimitLowMark == globalMemStoreLimit " + "because supplied " + MEMSTORE_SIZE_LOWER_LIMIT_OLD_KEY + " was > " + MEMSTORE_SIZE_OLD_KEY); } return lowerWaterMarkOldVal / globalMemStorePercent; } return DEFAULT_MEMSTORE_SIZE_LOWER_LIMIT; } /** * Retrieve configured size for on heap block cache as percentage of total heap. 
* @param conf */ public static float getBlockCacheHeapPercent(final Configuration conf) { // L1 block cache is always on heap float l1CachePercent = conf.getFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY, HConstants.HFILE_BLOCK_CACHE_SIZE_DEFAULT); float l2CachePercent = getL2BlockCacheHeapPercent(conf); return l1CachePercent + l2CachePercent; } /** * @param conf * @return The on heap size for L2 block cache. */ public static float getL2BlockCacheHeapPercent(Configuration conf) { float l2CachePercent = 0.0F; String bucketCacheIOEngineName = conf.get(HConstants.BUCKET_CACHE_IOENGINE_KEY, null); // L2 block cache can be on heap when IOEngine is "heap" if (bucketCacheIOEngineName != null && bucketCacheIOEngineName.startsWith("heap")) { float bucketCachePercentage = conf.getFloat(HConstants.BUCKET_CACHE_SIZE_KEY, 0F); MemoryUsage mu = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage(); l2CachePercent = bucketCachePercentage < 1 ? bucketCachePercentage : (bucketCachePercentage * 1024 * 1024) / mu.getMax(); } return l2CachePercent; }}综上所述,我在hbase-site.xml中配置信息如下:
<property>
  <name>hfile.block.cache.size</name>
  <value>0.3</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.size.lower.limit</name>
  <value>0.5</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.size</name>
  <value>0.5</value>
</property>

With HEAP_SIZE=4G this gives:
hfile.block.cache.size: 4G*0.3=1.2G;
hbase.regionserver.global.memstore.size: 4G*0.5=2G;
hbase.regionserver.global.memstore.size.lower.limit: 4G*0.5*0.5=1G (the lower limit is a fraction of the global memstore size, not of the heap);
and 0.3+0.5<=0.8, so block cache plus memstore stays within the 0.8 of the heap that HBase allows.
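The arithmetic above can be re-run for other heap sizes before editing hbase-site.xml. The following is a minimal sketch in plain Java, not HBase code: the class `HeapSizing` and its methods are hypothetical names, and the 20% free-heap rule is simply the article's "must not exceed 0.8" constraint restated.

```java
public class HeapSizing {
    // Share of the heap that must stay free for everything else
    // (the article's "block cache + memstore must not exceed 0.8" rule).
    static final int MIN_FREE_PERCENT = 20;

    /** Size in GB of a fraction of a given amount of memory. */
    public static double share(double totalGb, double fraction) {
        return totalGb * fraction;
    }

    /** True when block cache + memstore leave at least MIN_FREE_PERCENT of the heap free. */
    public static boolean leavesEnoughFree(double blockCacheFraction, double memstoreFraction) {
        long usedPercent = Math.round(blockCacheFraction * 100) + Math.round(memstoreFraction * 100);
        return 100 - usedPercent >= MIN_FREE_PERCENT;
    }

    public static void main(String[] args) {
        double heapGb = 4.0;
        double blockCache = 0.3;  // hfile.block.cache.size
        double memstore = 0.5;    // hbase.regionserver.global.memstore.size
        double lowerLimit = 0.5;  // ...memstore.size.lower.limit (fraction of the memstore size)

        double memstoreGb = share(heapGb, memstore);
        System.out.printf("block cache    : %.1f GB%n", share(heapGb, blockCache));
        System.out.printf("memstore upper : %.1f GB%n", memstoreGb);
        System.out.printf("memstore lower : %.1f GB%n", share(memstoreGb, lowerLimit));
        System.out.println("within limit   : " + leavesEnoughFree(blockCache, memstore));
    }
}
```

Raising either fraction so that the two sum to more than 0.8 makes `leavesEnoughFree` return false, which is the situation the configuration above is meant to avoid.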
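3.2 Pre-splitting regions when creating a table
The figure above illustrates the following:
1) Creating a table with pre-split regions is fundamentally different from creating it without them.
2) Without pre-splitting, HBase creates a single region (default maximum size 10 GB). At first all reads and writes go to that one region, which lives on a single machine, so that machine is overloaded while the rest of the cluster sits idle. When the region's files exceed 10 GB, HBase pauses for several minutes to split and compact, producing two regions, but new writes then all land in the second region, so the other machines still sit idle.
3) With pre-splitting, each row is routed by comparing its rowkey against the split points defined at table creation, so reads and writes are spread across the pre-split regions and the load is balanced.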
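To make the routing in point 3 concrete, here is a minimal sketch in plain Java. It assumes, purely for illustration, that every rowkey starts with a two-digit decimal prefix "00" to "99"; the class `PreSplit` and its methods are hypothetical. In a real HBase 1.x client the split points would be converted to `byte[][]` and passed to `Admin.createTable(HTableDescriptor, byte[][])`.

```java
import java.util.ArrayList;
import java.util.List;

public class PreSplit {
    /** Split points dividing the prefix range "00".."99" into `regions` roughly equal parts. */
    public static List<String> splitKeys(int regions) {
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            keys.add(String.format("%02d", i * 100 / regions));
        }
        return keys;
    }

    /**
     * Index of the region that would hold rowKey. A region covers
     * [startKey, endKey), so the index is the number of split keys <= rowKey.
     * String comparison matches byte order only for ASCII keys.
     */
    public static int regionFor(String rowKey, List<String> splitKeys) {
        int index = 0;
        for (String split : splitKeys) {
            if (rowKey.compareTo(split) >= 0) {
                index++;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> splits = splitKeys(4);
        System.out.println("split keys: " + splits);
        for (String key : new String[] {"10_a", "25_b", "60_c", "99_d"}) {
            System.out.println(key + " -> region " + regionFor(key, splits));
        }
    }
}
```

With four regions, writes whose prefixes are evenly distributed end up on four different regions from the first row onward, instead of queuing on one.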
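Conclusion:
Always pre-split when creating a table if you want good concurrent read/write performance from HBase; otherwise, don't bother with HBase.
3.3 ROWKEY design
By default HBase has only a primary index: exact lookups and range scans on the rowkey are fast. So put the fields you query by into the rowkey wherever you can.
As an aside, open-source secondary-index implementations for HBase also exist, for example Huawei's hindex.
The figure above is an example of storing phone call records in HBase. It shows that:
1) Pre-splitting alone is not enough. Rowkey design is critical, and a poor design still leads to data skew.
2) The rowkey should be designed so that data is distributed as evenly as possible.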
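One common way to get an even distribution for call records is to prefix the rowkey with a bucket derived from a hash of the phone number. The sketch below is an assumption for illustration, not the design from the missing figure: the class `CallRecordKey`, the 20-bucket count, and the `bucket_phone_timestamp` layout are all hypothetical.

```java
public class CallRecordKey {
    // Assumed to match the number of pre-split regions of the table.
    static final int BUCKETS = 20;

    /** Bucket for a phone number; stable because String.hashCode() is specified by the JDK. */
    public static int bucketOf(String phone) {
        return Math.floorMod(phone.hashCode(), BUCKETS);
    }

    /** Rowkey layout: two-digit bucket, phone number, zero-padded call time in milliseconds. */
    public static String rowKey(String phone, long callTimeMillis) {
        return String.format("%02d_%s_%013d", bucketOf(phone), phone, callTimeMillis);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("13800000000", 1496822400000L));
        System.out.println(rowKey("13800000001", 1496822400000L));
    }
}
```

Because the bucket can be recomputed from the phone number, all calls of one number stay contiguous and a range scan over `bucket_phone_` still works, while different numbers spread over all buckets.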
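3.4 Split and compact
3.4.1 HBase split
1) Understanding split
If no pre-split points are given at table creation, the table starts with a single region. The default maximum region size is 10 GB; as the data in a region grows, the region is split. The split decision is made by a split policy, configured in hbase-site.xml through hbase.regionserver.region.split.policy.
Policy 1, IncreasingToUpperBoundRegionSplitPolicy: the split threshold depends on the number of regions of the table (counted on the same RegionServer), not only on the configured region size. For example, with 2 regions and a memstore flush size of 128 MB, the next split happens at 2 squared times 128 MB, i.e. 2*2*128M=512MB. (That is the formula from older releases; HBase 1.x uses the cube of the region count times twice the flush size.) The 10 GB set through hbase.hregion.max.filesize only acts as an upper bound, so it does not decide when these early splits happen, which can come as a nasty surprise.
Policy 2, ConstantSizeRegionSplitPolicy: a region splits only once it exceeds the configured region size (10 GB above) and never before.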
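The difference between the two policies is easiest to see as a calculation. The sketch below is plain Java, not HBase code: `squaredThreshold` follows the formula as described above, while `cubedThreshold` (region count cubed times twice the flush size) reflects my reading of the HBase 1.x source and should be treated as an assumption. Both are capped at hbase.hregion.max.filesize; the class and method names are hypothetical.

```java
public class SplitThreshold {
    static final long MB = 1024L * 1024;
    static final long GB = 1024 * MB;

    /** Formula as described in the article: regionCount^2 * flushSize, capped at maxFileSize. */
    public static long squaredThreshold(int regionCount, long flushSize, long maxFileSize) {
        return Math.min(maxFileSize, (long) regionCount * regionCount * flushSize);
    }

    /** Assumed HBase 1.x variant: regionCount^3 * (2 * flushSize), capped at maxFileSize. */
    public static long cubedThreshold(int regionCount, long flushSize, long maxFileSize) {
        long initialSize = 2 * flushSize;
        return Math.min(maxFileSize, (long) regionCount * regionCount * regionCount * initialSize);
    }

    /** ConstantSizeRegionSplitPolicy: the threshold is always the configured maximum. */
    public static long constantThreshold(long maxFileSize) {
        return maxFileSize;
    }

    public static void main(String[] args) {
        long flush = 128 * MB;
        long max = 10 * GB;
        for (int regions = 1; regions <= 5; regions++) {
            System.out.printf("regions=%d squared=%d MB cubed=%d MB constant=%d MB%n",
                regions,
                squaredThreshold(regions, flush, max) / MB,
                cubedThreshold(regions, flush, max) / MB,
                constantThreshold(max) / MB);
        }
    }
}
```

Either increasing formula starts far below 10 GB and only reaches the cap after a few splits, which is why a freshly created table splits much earlier than hbase.hregion.max.filesize suggests.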
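2) Consequences of a split
A split blocks all reads and writes on the table that owns the region, and the impact lasts a long time, so avoid splits as far as possible.
3) What we can do about it:
- In production, always estimate how long data must be retained, so that a TTL can be set on the HBase table to delete expired data.
- Once the retention period is fixed, estimate the daily data volume and from it the maximum data size within the retention period, for example 1 TB.
- From that maximum, choose the per-region file size limit hbase.hregion.max.filesize, for example 50 GB.
- This gives the approximate number of pre-split regions, 20 in this case (1TB/50GB=20). The initial pre-split layout is then close to optimal and no split should be needed later on.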
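The estimate above can be written down as a short calculation. This is a minimal sketch with hypothetical inputs (10 GB per day kept for 100 days, which gives the article's 1 TB when 1 TB is taken as 1000 GB); the class `RegionPlan` is not part of HBase.

```java
public class RegionPlan {
    /** Maximum data volume held at once: daily volume times retention period (the TTL). */
    public static long totalGb(long dailyGb, int retentionDays) {
        return dailyGb * retentionDays;
    }

    /** Number of pre-split regions, rounded up so that no region exceeds maxRegionGb. */
    public static long regionCount(long totalGb, long maxRegionGb) {
        return (totalGb + maxRegionGb - 1) / maxRegionGb;
    }

    public static void main(String[] args) {
        long total = totalGb(10, 100);        // 1000 GB, roughly 1 TB
        long regions = regionCount(total, 50); // hbase.hregion.max.filesize = 50 GB
        System.out.println("total data  : " + total + " GB");
        System.out.println("pre-splits  : " + regions + " regions");
    }
}
```

Rounding up matters when the total is not an exact multiple of the region size; a few extra regions cost little compared with an unplanned split.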
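3.4.2 HBase compact
1) Understanding compact
An HBase compaction operates on the HStores of an HRegion.
There are two kinds of compaction, major and minor. A major compaction merges all HFiles of an HStore into a single HFile and drops KeyValues marked as deleted (a deleted KeyValue is only physically removed during compaction). As you can imagine, a major compaction generates a large amount of IO and affects HBase read/write performance. A minor compaction merges only a few HFiles into one; it is usually fast and its IO cost is comparatively low. During normal working hours major compaction is disabled and it is run on a schedule in idle periods only.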
2) In production, first disable automatic major compaction by adding the following to hbase-site.xml:
<property>
<name>hbase.hregion.majorcompaction</name>
<value>0</value>
</property>
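3) Run major compaction from a Linux shell script during idle periods
# vi hbase_major_compact_small.sh
cd /opt/hbase-1.2.1/bin
# feed the commands to the HBase shell on stdin
./hbase shell <<EOF
major_compact 'small_table1'
major_compact 'small_table2'
quit
EOF
# vi hbase_major_compact_big.sh
cd /opt/hbase-1.2.1/bin
# feed the commands to the HBase shell on stdin
./hbase shell <<EOF
major_compact 'big_table1'
major_compact 'big_table2'
quit
EOF
With these scripts, major_compact can be triggered (for example from cron) when the cluster is relatively idle.
A good article on the topic: http://itindex.net/detail/49632-hbase-%E6%80%A7%E8%83%BD%E8%B0%83%E4%BC%98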
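3.5 HBase GC
After the optimizations above, read and write throughput goes up, and that brings a new problem: under highly concurrent writes a large number of objects are created, and at some point the Java garbage collector has to reclaim them.
There is nothing wrong with garbage collection as such. What we need to care about is how to keep GC from blocking all reads and writes. Once the blocking lasts long enough, the following chain of events occurs:
- the Java old generation fills up and triggers a Full GC, which blocks HBase reads and writes for a long time;
- ZooKeeper considers the RegionServer unavailable and removes it;
- the RegionServer notices that it has been removed from ZooKeeper and shuts itself down through its shutdown hook.
To avoid this, what we can tune is making GC happen earlier. For solutions see:
- hbase 报错gc wal.FSHLog: Error while AsyncSyncer sync, request close of hlog YouAr http://blackproof.iteye.com/blog/2188952
- Using MemStore-Local Allocation Buffers in HBase to solve Full GC problems http://blackproof.iteye.com/blog/2079612
- hbase gc MemStore-Local Allocation Buffer http://blackproof.iteye.com/blog/2079617
My own tuning was, first, to adjust the HBASE_REGIONSERVER_OPTS parameter in hbase-env.sh,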
and then to add the following to hbase-site.xml:
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
  <description>
    Enables the MemStore-Local Allocation Buffer, a feature which works to prevent heap
    fragmentation under heavy write loads. This can reduce the frequency of stop-the-world
    GC pauses on large heaps.
  </description>
</property>
<property>
  <name>hbase.hregion.memstore.mslab.chunksize</name>
  <value>2097152</value>
  <description>
    The default value of hbase.hregion.memstore.mslab.chunksize is defined in file
    org.apache.hadoop.hbase.regionserver.HeapMemStoreLAB, the size is 2048 * 1024 bytes.
  </description>
</property>
<property>
  <name>hbase.hregion.memstore.mslab.max.allocation</name>
  <value>262144</value>
  <description>
    The default value of hbase.hregion.memstore.mslab.max.allocation is defined in file
    org.apache.hadoop.hbase.regionserver.HeapMemStoreLAB, the size is 256 * 1024.
  </description>
</property>
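The idea is somewhat similar to how memcached allocates memory in blocks of fixed sizes: it reduces memory fragmentation so that the heap is used as fully as possible.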
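The allocation idea behind MSLAB can be illustrated with a small chunk allocator. This is a simplified sketch in plain Java and not the HeapMemStoreLAB implementation: the class `ChunkAllocator` is hypothetical, it is single-threaded, and it only mirrors the two settings above (chunk size and maximum allocation size).

```java
import java.nio.ByteBuffer;

public class ChunkAllocator {
    private final int chunkSize;      // cf. hbase.hregion.memstore.mslab.chunksize
    private final int maxAllocation;  // cf. hbase.hregion.memstore.mslab.max.allocation
    private ByteBuffer current;       // chunk currently being carved up
    private int chunksCreated = 0;

    public ChunkAllocator(int chunkSize, int maxAllocation) {
        this.chunkSize = chunkSize;
        this.maxAllocation = maxAllocation;
    }

    /**
     * Returns a slice of `size` bytes taken from the current chunk, or null when
     * the request is larger than maxAllocation (the caller then allocates it
     * directly on the heap instead of wasting chunk space).
     */
    public ByteBuffer allocate(int size) {
        if (size > maxAllocation) {
            return null;
        }
        if (current == null || current.remaining() < size) {
            // Start a new chunk; the old one is freed as a whole once all
            // slices taken from it become unreachable.
            current = ByteBuffer.allocate(chunkSize);
            chunksCreated++;
        }
        ByteBuffer slice = current.slice();
        slice.limit(size);
        current.position(current.position() + size);
        return slice;
    }

    public int chunksCreated() {
        return chunksCreated;
    }

    public static void main(String[] args) {
        ChunkAllocator lab = new ChunkAllocator(2048 * 1024, 256 * 1024);
        for (int i = 0; i < 21; i++) {
            lab.allocate(100 * 1024);
        }
        System.out.println("chunks used: " + lab.chunksCreated());
        System.out.println("oversized request served from chunk: " + (lab.allocate(300 * 1024) != null));
    }
}
```

Many small cell-sized allocations end up packed into a few large, equally sized blocks, so when a memstore is flushed the old generation gets back whole chunks rather than scattered small holes.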