HBase Row Lock Acquisition: Source Code Analysis

gaofengyan 2015-03-19

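Symptoms: many connections on port 60020 stay open for a long time;

           hbase hbck can no longer connect to port 60020;

           the log is flooded with entries like the following:
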

2014-12-24 17:36:47,821 WARN  [RpcServer.handler=1,port=60020] retry.RetryInvocationHandler: Exception while invoking class org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations. Not retrying because failovers (15) exceeded maximum allowed (15)
java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "host184/192.168.5.184"; destination host is: "host150":8020;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
        at org.apache.hadoop.ipc.Client.call(Client.java:1413)
        at org.apache.hadoop.ipc.Client.call(Client.java:1362)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at $Proxy13.getBlockLocations(Unknown Source)
[hadoop-cdh@host184 logs]$ vi hbase-hadoop-cdh-regionserver-host184.log
        at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3469)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3359)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29503)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
        at java.lang.Thread.run(Thread.java:722)
2015-02-27 00:48:19,543 WARN  [RpcServer.handler=16,port=60020] regionserver.HRegion: Failed getting lock in batch put, row=3d87cfc7693eed24f28afee4a0495f30
java.io.IOException: Timed out waiting for lock for row: 3d87cfc7693eed24f28afee4a0495f30
        at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:3462)
        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2382)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2249)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2201)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2205)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4253)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3469)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3359)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29503)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
        at java.lang.Thread.run(Thread.java:722)
2015-02-27 00:48:22,530 WARN  [RpcServer.handler=15,port=60020] regionserver.HRegion: Failed getting lock in batch put, row=3d87cfc7693eed24f28afee4a0495f30
java.io.IOException: Timed out waiting for lock for row: 3d87cfc7693eed24f28afee4a0495f30
        at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:3462)
        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2382)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2249)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2201)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2205)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4253)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3469)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3359)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29503)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
        at java.lang.Thread.run(Thread.java:722)

As the log shows, all 15 attempts failed, because the row lock for row=3d87cfc7693eed24f28afee4a0495f30 could not be acquired.

Relevant source code for the row lock timeout:

public RowLock getRowLock(byte[] row, boolean waitForLock) throws IOException {
    checkRow(row, "row lock");
    startRegionOperation();
    try {
      HashedBytes rowKey = new HashedBytes(row);
      RowLockContext rowLockContext = new RowLockContext(rowKey);

      // loop until we acquire the row lock (unless !waitForLock)
      while (true) {
        // try to take the lock: putIfAbsent the rowKey into the ConcurrentHashMap
        RowLockContext existingContext = lockedRows.putIfAbsent(rowKey, rowLockContext);
        if (existingContext == null) {
          // Row is not already locked by any thread, use newly created context.
          break;
        } else if (existingContext.ownedByCurrentThread()) {
          // Row is already locked by current thread, reuse existing context instead.
          rowLockContext = existingContext;
          break;
        } else {
          // Row is already locked by some other thread, give up or wait for it
          if (!waitForLock) {
            return null;
          }
          try { // wait for the owning thread to count down its latch, i.e. release the lock
            if (!existingContext.latch.await(this.rowLockWaitDuration, TimeUnit.MILLISECONDS)) {
              throw new IOException("Timed out waiting for lock for row: " + rowKey);
            }
          } catch (InterruptedException ie) {
            LOG.warn("Thread interrupted waiting for lock on row: " + rowKey);
            InterruptedIOException iie = new InterruptedIOException();
            iie.initCause(ie);
            throw iie;
          }
        }
      }

      // allocate new lock for this thread
      return rowLockContext.newLock();
    } finally {
      closeRegionOperation();
    }
  }
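
The core of getRowLock is a putIfAbsent on a concurrent map followed by a timed wait on the current owner's latch. The following is a minimal, self-contained sketch of that pattern, not the HBase implementation. The class RowLockSketch, its Context type, and the lock/unlock methods are hypothetical names. Rows are plain strings, and nested locks taken by the same thread are not counted.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Simplified, hypothetical model of the row lock pattern; not HBase code. */
class RowLockSketch {
    /** One entry per locked row; the latch opens when the owner unlocks. */
    static final class Context {
        final Thread owner = Thread.currentThread();
        final CountDownLatch latch = new CountDownLatch(1);
    }

    private final ConcurrentHashMap<String, Context> lockedRows = new ConcurrentHashMap<>();
    private final long waitMillis; // plays the role of hbase.rowlock.wait.duration

    RowLockSketch(long waitMillis) {
        this.waitMillis = waitMillis;
    }

    /** Locks the row for the calling thread, or throws after waitMillis. */
    void lock(String row) throws IOException, InterruptedException {
        Context mine = new Context();
        while (true) {
            Context existing = lockedRows.putIfAbsent(row, mine);
            if (existing == null || existing.owner == Thread.currentThread()) {
                return; // row was free, or is already held by this thread
            }
            // Row is held by another thread: wait for its latch, then retry.
            if (!existing.latch.await(waitMillis, TimeUnit.MILLISECONDS)) {
                throw new IOException("Timed out waiting for lock for row: " + row);
            }
        }
    }

    /** Releases the row if the calling thread owns it and wakes all waiters. */
    void unlock(String row) {
        Context ctx = lockedRows.get(row);
        if (ctx != null && ctx.owner == Thread.currentThread()) {
            lockedRows.remove(row);
            ctx.latch.countDown();
        }
    }

    public static void main(String[] args) throws Exception {
        RowLockSketch locks = new RowLockSketch(100);
        Thread holder = new Thread(() -> {
            try {
                locks.lock("row-1"); // acquired and never released
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        holder.start();
        holder.join();
        try {
            locks.lock("row-1"); // blocks for 100 ms, then fails
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, a holder that never releases the row makes every other caller block for the full wait duration before failing. That is consistent with the symptom above, where RPC handler threads stay occupied while they wait on the same row.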

The row lock must be acquired in the following places:

[Figure: HBase row lock acquisition source code analysis]

Related configuration

<property>
       <name>hbase.rowlock.wait.duration</name>
       <value>90000</value>
       <description>
        Timeout, in milliseconds, for each attempt to acquire a row lock; the default is 30000 (30 s)
       </description>
</property>
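<property>
       <name>hbase.regionserver.lease.period</name>
       <value>180000</value>
       <description>
        RegionServer-side lease period for a client, in milliseconds; since HBase 0.96 this key is deprecated in favor of hbase.client.scanner.timeout.period
       </description>
</property>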

<property>
       <name>hbase.rpc.timeout</name>
       <value>180000</value>
       <description>
        RPC timeout, in milliseconds
       </description>
</property>

<property>
       <name>hbase.client.scanner.timeout.period</name>
       <value>180000</value>
       <description>
        Timeout, in milliseconds, for each client scan request
       </description>
</property>

<property>
       <name>hbase.client.scanner.caching</name>
       <value>100</value>
       <description>
        Number of rows the client fetches per scanner next() call; the default was 1 in older HBase releases
       </description>
</property>
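
To make the override and its fallback concrete, here is a small sketch that reads property entries like the ones above and resolves the row lock wait duration, falling back to 30000 ms when the key is absent. It does not use the HBase Configuration class. SiteConfigSketch, parse, and rowLockWaitMillis are hypothetical names, and the snippet assumes the properties are wrapped in the usual configuration root element of hbase-site.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper that reads name/value pairs from site-style XML. */
class SiteConfigSketch {
    // Default stated in the description above: 30 s.
    static final long DEFAULT_ROWLOCK_WAIT_MS = 30_000L;

    /** Collects every property element into a name -> value map. */
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    /** Returns the configured row lock wait, or the default if it is not set. */
    static long rowLockWaitMillis(Map<String, String> props) {
        String v = props.get("hbase.rowlock.wait.duration");
        return v == null ? DEFAULT_ROWLOCK_WAIT_MS : Long.parseLong(v);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration>"
            + "<property><name>hbase.rowlock.wait.duration</name><value>90000</value></property>"
            + "<property><name>hbase.rpc.timeout</name><value>180000</value></property>"
            + "</configuration>";
        Map<String, String> props = parse(xml);
        System.out.println("row lock wait (ms): " + rowLockWaitMillis(props));
        System.out.println("rpc timeout (ms): " + props.get("hbase.rpc.timeout"));
        System.out.println("row lock wait without override (ms): "
            + rowLockWaitMillis(new HashMap<>()));
    }
}
```

With the values from this article, a handler gives up on a row lock after 90 s, which is below the 180 s RPC timeout configured alongside it.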
