zxiaozhuT 2011-06-15
今天同事问到hbase中in-memory属性的作用,以前没有注意过,今天仔细看了下代码:
// Instantiate priority buckets BlockBucket bucketSingle = new BlockBucket(bytesToFree, blockSize, singleSize()); BlockBucket bucketMulti = new BlockBucket(bytesToFree, blockSize, multiSize()); BlockBucket bucketMemory = new BlockBucket(bytesToFree, blockSize, memorySize()); // Scan entire map putting into appropriate buckets for(CachedBlock cachedBlock : map.values()) { switch(cachedBlock.getPriority()) { case SINGLE: { bucketSingle.add(cachedBlock); break; } case MULTI: { bucketMulti.add(cachedBlock); break; } case MEMORY: { bucketMemory.add(cachedBlock); break; } } } PriorityQueue<BlockBucket> bucketQueue = new PriorityQueue<BlockBucket>(3); bucketQueue.add(bucketSingle); bucketQueue.add(bucketMulti); bucketQueue.add(bucketMemory); int remainingBuckets = 3; long bytesFreed = 0; BlockBucket bucket; while((bucket = bucketQueue.poll()) != null) { long overflow = bucket.overflow(); if(overflow > 0) { long bucketBytesToFree = Math.min(overflow, (bytesToFree - bytesFreed) / remainingBuckets); bytesFreed += bucket.free(bucketBytesToFree); } remainingBuckets--; }
hbase内部的blockcache分三个队列:single、multi以及memory,分别占用25%,50%,25%的大小。这涉及到family属性中的in-memory选项,默认是false。
设为false的话,第一次访问到该数据时,会将它写入single队列,否则写入memory队列。当再次访问该数据并且在single中读到了该数据时,single会升级为multi
这三个队列其实是在共用blockcache的资源,区别是在LRU淘汰数据时,single会优先淘汰,其次为multi,最后为memory。
所以结论有两点:
1同一个family不会占用全部的blockcache资源
2当某些family特别重要时,可以将它的in-memory设为true,单独使用一个缓存队列,保证cache的优先使用