
HBase HLog Structure and Internals


I. Where HLogs live on HDFS and how they map to RegionServers

The HLog is persisted on HDFS. To see where HLogs are stored:

hadoop fs -ls /hbase/.logs

As the HBase architecture shows, there is a one-to-one correspondence between HLogs and HRegionServers:

Found 5 items                                                                                  
drwxr-xr-x - hadoop cug-admin  0 2013-04-11 14:23 /hbase/.logs/HADOOPCLUS02,61020,1365661380729
drwxr-xr-x - hadoop cug-admin  0 2013-04-11 14:23 /hbase/.logs/HADOOPCLUS03,61020,1365661378638
drwxr-xr-x - hadoop cug-admin  0 2013-04-11 14:23 /hbase/.logs/HADOOPCLUS04,61020,1365661379200
drwxr-xr-x - hadoop cug-admin  0 2013-04-11 14:22 /hbase/.logs/HADOOPCLUS05,61020,1365661378053
drwxr-xr-x - hadoop cug-admin  0 2013-04-11 14:23 /hbase/.logs/HADOOPCLUS06,61020,1365661378832

HADOOPCLUS02 through HADOOPCLUS06 are the RegionServers.

The directories listed above are where the HLogs are stored. Once an HLog has expired (every edit previously written to the MemStore has already been persisted to HDFS), its file is moved from /hbase/.logs to /hbase/.oldlogs; the old logs are later deleted, which ends the HLog's life cycle.
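
The same listing can be done programmatically. Below is a minimal sketch using the Hadoop FileSystem API; the ListHLogDirs class name is made up, and /hbase/.logs and /hbase/.oldlogs are the default locations shown above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHLogDirs {
  public static void main(String[] args) throws Exception {
    // Picks up core-site.xml / hdfs-site.xml from the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Live HLogs sit under /hbase/.logs/<regionserver>; expired ones move to /hbase/.oldlogs.
    for (String dir : new String[] {"/hbase/.logs", "/hbase/.oldlogs"}) {
      System.out.println(dir + ":");
      for (FileStatus status : fs.listStatus(new Path(dir))) {
        System.out.println("  " + status.getPath());
      }
    }
  }
}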

II. The HBase write path and where the HLog is written

When data is Put into HBase, the request goes HBase client --> ZooKeeper --> -ROOT- --> .META. --> RegionServer --> Region:

Before writing, the Region first checks the MemStore:

1. If this Region's MemStore already caches the data being written, it returns directly;

2. If not, the data is written to the HLog (WAL) first, then to the MemStore, and only then does the call return.

When the MemStore reaches a certain size, it is flushed into a StoreFile and stored on HDFS.

Because an insert into HBase lands in the in-memory MemStore, writes are fast. For applications with low durability requirements, the HLog can be turned off to gain even higher write performance.
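
To illustrate that trade-off, here is a minimal sketch using the old (0.90-era) client API, where Put.setWriteToWAL(false) skips the HLog for a single write; the table, family and qualifier names are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutWithoutWal {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test_table");   // hypothetical table
    try {
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
      // Skip the HLog (WAL) for this Put: faster, but the edit is lost
      // if the RegionServer dies before the MemStore is flushed.
      put.setWriteToWAL(false);
      table.put(put);
    } finally {
      table.close();
    }
  }
}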

III. HLog-related source code

(Figure: HBase HLog class diagram, referenced below.)

1. Overview

Writes to the HLog go mainly through two HLog methods: doWrite(HRegionInfo info, HLogKey logKey, WALEdit logEdit)

and completeCacheFlush(final byte [] encodedRegionName, final byte [] tableName, final long logSeqId, final boolean isMetaRegion).

Both call this.writer.append(new HLog.Entry(logKey, logEdit)) to perform the actual write: an HLog.Entry is built and appended using the writer the HLog currently holds (see the objects referenced in the figure above).

The concrete writer implementation is org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter;

the HLog method createWriterInstance(fs, newPath, conf) creates that Writer instance.
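
To make the flow concrete, here is a minimal sketch (not the actual HBase code path) that builds an HLog.Entry and appends it with a standalone SequenceFileLogWriter. It assumes a 0.90-era HLogKey constructor taking (encodedRegionName, tablename, logSeqNum, writeTime), mirroring makeKey(...) in the completeCacheFlush() excerpt quoted in the comments at the end of this post; the path, region, table and cell values are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.regionserver.wal.HLog;
import org.apache.hadoop.hbase.regionserver.wal.HLogKey;
import org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class AppendEntrySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path logPath = new Path("/tmp/hlog-sketch");   // hypothetical log path

    SequenceFileLogWriter writer = new SequenceFileLogWriter();
    writer.init(fs, logPath, conf);                // same init() shown below

    WALEdit edit = new WALEdit();
    edit.add(new KeyValue(Bytes.toBytes("row1"), Bytes.toBytes("cf"),
        Bytes.toBytes("q"), Bytes.toBytes("value")));

    // Assumed constructor: (encodedRegionName, tablename, logSeqNum, writeTime).
    HLogKey key = new HLogKey(Bytes.toBytes("region-encoded-name"),
        Bytes.toBytes("test_table"), 1L, System.currentTimeMillis());

    writer.append(new HLog.Entry(key, edit));      // the append() shown below
    writer.sync();                                 // push the edit to HDFS
    writer.close();
  }
}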

2. SequenceFileLogWriter and SequenceFileLogReader

As the SequenceFileLogWriter class shows, the log is written to the file system through a Hadoop SequenceFile.Writer; SequenceFile is the file format in which the HLog is stored on Hadoop.

HLog.Entry is the smallest unit stored in an HLog.

public class SequenceFileLogWriter implements HLog.Writer {
  private final Log LOG = LogFactory.getLog(this.getClass());
  // The hadoop sequence file we delegate to.
  private SequenceFile.Writer writer;
  // The dfsclient out stream gotten made accessible or null if not available.
  private OutputStream dfsClient_out;
  // The syncFs method from hdfs-200 or null if not available.
  private Method syncFs;
  // Key class needed to initialize the writer.
  private Class<? extends HLogKey> keyClass;
  @Override
  public void init(FileSystem fs, Path path, Configuration conf)
      throws IOException {
      // 1. Create the Hadoop SequenceFile.Writer that this class delegates to (initializes writer).
      // 2. Get at the private FSDataOutputStream inside SequenceFile so sync can be called on it (initializes dfsClient_out).
  }
  @Override
  public void append(HLog.Entry entry) throws IOException {
    this.writer.append(entry.getKey(), entry.getEdit());
  }
  @Override
  public void sync() throws IOException {
    if (this.syncFs != null) {
      try {
       this.syncFs.invoke(this.writer, HLog.NO_ARGS);
      } catch (Exception e) {
        throw new IOException("Reflection", e);
      }
    }
  }
}

SequenceFileLogReader is the counterpart used to read HLog.Entry objects back from the log.
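
For illustration, a minimal sketch of dumping a log file; it uses the static HLog.getReader(fs, path, conf) factory of this generation of HBase (which returns a SequenceFileLogReader by default), takes the log path as a command-line argument, and the DumpHLog class name is made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.regionserver.wal.HLog;
import org.apache.hadoop.hbase.util.Bytes;

public class DumpHLog {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    // Path of one HLog file under /hbase/.logs/<regionserver>/, passed in by the caller.
    Path logFile = new Path(args[0]);
    HLog.Reader reader = HLog.getReader(fs, logFile, conf);
    try {
      HLog.Entry entry;
      while ((entry = reader.next()) != null) {
        System.out.println("table=" + Bytes.toString(entry.getKey().getTablename())
            + " seq=" + entry.getKey().getLogSeqNum()
            + " edits=" + entry.getEdit().size());
      }
    } finally {
      reader.close();
    }
  }
}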

3. HLog.Entry and its logSeqNum field

Each Entry consists of an HLogKey and a WALEdit.

HLogKey carries the basic metadata:

  private byte [] encodedRegionName;
  private byte [] tablename;
  private long logSeqNum;
  // Time at which this edit was written.
  private long writeTime;
  private byte clusterId;

logSeqNum is an important field: the sequence number is also written into each StoreFile as a metadata entry, so the maximum logSeqNum can be read directly from a StoreFile:

public class StoreFile {
  static final String HFILE_BLOCK_CACHE_SIZE_KEY = "hfile.block.cache.size";
  private static BlockCache hfileBlockCache = null;
  // Is this from an in-memory store
  private boolean inMemory;
  // Keys for metadata stored in backing HFile.
  // Set when we obtain a Reader (around line 140 of StoreFile.java).
  private long sequenceid = -1;
  /**
   * @return This files maximum edit sequence id.
   */
  public long getMaxSequenceId() {
    return this.sequenceid;
  }
  /**
   * Return the highest sequence ID found across all storefiles in
   * the given list. Store files that were created by a mapreduce
   * bulk load are ignored, as they do not correspond to any edit
   * log items.
   * @return 0 if no non-bulk-load files are provided or, this is Store that
   * does not yet have any store files.
   */
  public static long getMaxSequenceIdInList(List<StoreFile> sfs) {
    long max = 0;
    for (StoreFile sf : sfs) {
      if (!sf.isBulkLoadResult()) {
        max = Math.max(max, sf.getMaxSequenceId());
      }
    }
    return max;
  }
  // Note: in the actual source this method belongs to the StoreFile.Writer inner class.
  /**
   * Writes meta data; this is where maxSequenceId gets WRITTEN.
   * Call before {@link #close()} since its written as meta data to this file.
   * @param maxSequenceId Maximum sequence id.
   * @param majorCompaction True if this file is product of a major compaction
   * @throws IOException problem writing to FS
   */
  public void appendMetadata(final long maxSequenceId, final boolean majorCompaction)
  throws IOException {
    writer.appendFileInfo(MAX_SEQ_ID_KEY, Bytes.toBytes(maxSequenceId));
    writer.appendFileInfo(MAJOR_COMPACTION_KEY,
        Bytes.toBytes(majorCompaction));
    appendTimeRangeMetadata();
  }
  /**
   * Opens reader on this store file.  Called by Constructor.
   * @return Reader for the store file.
   * @throws IOException
   * @see #closeReader()
   */
  private Reader open() throws IOException {
     // ........
     // b is the value of the MAX_SEQ_ID_KEY metadata entry read from the HFile's file info.
     this.sequenceid = Bytes.toLong(b);
      if (isReference()) {
        if (Reference.isTopFileRegion(this.reference.getFileRegion())) {
          this.sequenceid += 1;
        }      
      }
     this.reader.setSequenceID(this.sequenceid);
     return this.reader;
   }
}

The Store class manages its StoreFiles, e.g. during compactions. When many StoreFiles are merged, the largest logSeqNum among them is carried over:

public class Store implements HeapSize {
  /**
   * Compact the StoreFiles.  This method may take some time, so the calling
   * thread must be able to block for long periods.
   *
   * <p>During this time, the Store can work as usual, getting values from
   * StoreFiles and writing new StoreFiles from the memstore.
   *
   * Existing StoreFiles are not destroyed until the new compacted StoreFile is
   * completely written-out to disk.
   *
   * <p>The compactLock prevents multiple simultaneous compactions.
   * The structureLock prevents us from interfering with other write operations.
   *
   * <p>We don't want to hold the structureLock for the whole time, as a compact()
   * can be lengthy and we want to allow cache-flushes during this period.
   *
   * @param forceMajor True to force a major compaction regardless of thresholds
   * @return row to split around if a split is needed, null otherwise
   * @throws IOException
   */
  StoreSize compact(final boolean forceMajor) throws IOException {
    boolean forceSplit = this.region.shouldForceSplit();
    boolean majorcompaction = forceMajor;
    synchronized (compactLock) {
      /* get store file sizes for incremental compacting selection.
       * normal skew:
       *
       *         older ----> newer
       *     _
       *    | |   _
       *    | |  | |   _
       *  --|-|- |-|- |-|---_-------_-------  minCompactSize
       *    | |  | |  | |  | |  _  | |
       *    | |  | |  | |  | | | | | |
       *    | |  | |  | |  | | | | | |
       */
      // .............
      this.lastCompactSize = totalSize;

      // Max-sequenceID is the last key in the files we're compacting
      long maxId = StoreFile.getMaxSequenceIdInList(filesToCompact);

      // Ready to go.  Have list of files to compact.
      LOG.info("Started compaction of " + filesToCompact.size() + " file(s) in cf=" +
          this.storeNameStr +
        (references? ", hasReferences=true,": " ") + " into " +
          region.getTmpDir() + ", seqid=" + maxId +
          ", totalSize=" + StringUtils.humanReadableInt(totalSize));
      StoreFile.Writer writer = compact(filesToCompact, majorcompaction, maxId);
      // Move the compaction into place.
      StoreFile sf = completeCompaction(filesToCompact, writer);
    }
    return checkSplit(forceSplit);
  }
  /**
   * Do a minor/major compaction.  Uses the scan infrastructure to make it easy.
   *
   * @param filesToCompact which files to compact
   * @param majorCompaction true to major compact (prune all deletes, max versions, etc)
   * @param maxId Readers maximum sequence id.
   * @return Product of compaction or null if all cells expired or deleted and
   * nothing made it through the compaction.
   * @throws IOException
   */
  private StoreFile.Writer compact(final List<StoreFile> filesToCompact,
                               final boolean majorCompaction, final long maxId)
      throws IOException {
    // Make the instantiation lazy in case compaction produces no product; i.e.
    // where all source cells are expired or deleted.
    StoreFile.Writer writer = null;
    try {
      // ......
    } finally {
      if (writer != null) {
        // StoreFile.Writer writes the maximum sequence id (maxId) into the file metadata here.
        writer.appendMetadata(maxId, majorCompaction);
        writer.close();
      }
    }
    return writer;
  }
}

During a compaction, the outer compact(final boolean forceMajor) calls compact(final List<StoreFile> filesToCompact, final boolean majorCompaction, final long maxId), and this inner method finishes by calling writer.appendMetadata(maxId, majorCompaction), i.e. the appendMetadata method of StoreFile shown above.

So the maximum logSeqNum is written out in the finally block. Every StoreFile thus records its maximum sequence id as metadata, and open() reads it back when the file is opened.

clusterId stores the ID of the Hadoop cluster.

4. The HLog life cycle

This is where the HLog life cycle comes in. If the HFiles covering an HLog's logSeqNum have already been persisted to HDFS (the check is essentially whether the HLog's logSeqNum is smaller than the maximum sequence id recorded in the corresponding table's StoreFiles on HDFS), the HLog is no longer needed: it is moved to the .oldlogs directory and eventually deleted.
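
The comparison itself is simple. The following is an illustrative sketch only, not the actual HBase implementation, with sequence ids keyed by encoded region name:

import java.util.Map;

public class HLogExpiryCheck {
  /**
   * An HLog file can be archived once, for every region it contains edits for,
   * the StoreFiles already cover those edits, i.e. the max sequence id persisted
   * in the region's StoreFiles is >= the max sequence id the HLog holds for it.
   */
  public static boolean isObsolete(Map<String, Long> maxSeqIdInHLog,
                                   Map<String, Long> maxSeqIdInStoreFiles) {
    for (Map.Entry<String, Long> e : maxSeqIdInHLog.entrySet()) {
      Long flushed = maxSeqIdInStoreFiles.get(e.getKey());
      if (flushed == null || flushed.longValue() < e.getValue().longValue()) {
        return false;   // some edits in this HLog are not yet in any StoreFile
      }
    }
    return true;        // safe to move the file from /hbase/.logs to /hbase/.oldlogs
  }
}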

Conversely, if the system goes down before the flush, the edits can be read back from the HLog on HDFS and the original Puts replayed into HBase.

Further reading:

HBase Architecture 101 – the Write-Ahead Log (WAL)

http://cloudera.iteye.com/blog/911700

HLog structure and life cycle

http://www.spnguru.com/2011/03/hlog%e7%9a%84%e7%bb%93%e6%9e%84%e5%92%8c%e7%94%9f%e5%91%bd%e5%91%a8%e6%9c%9f/

 

#3  marlay  2015-06-01
"If this Region's MemStore already caches the data being written, it returns directly" — I don't think that is right. For any write, "the data is first written to the WAL and only then to the MemStore where it is actually stored". Please double-check.
#2  greatwqs  2013-04-27
The flush is triggered when HRegion periodically calls internalFlushcache:

protected boolean internalFlushcache(final HLog wal, final long myseqid) throws IOException {
  // ... eventually calls completeCacheFlush(...)
}

In that case the HLog is written through
completeCacheFlush(final byte [] encodedRegionName, final byte [] tableName, final long logSeqId, final boolean isMetaRegion):
public void completeCacheFlush(final byte [] encodedRegionName,
    final byte [] tableName, final long logSeqId, final boolean isMetaRegion)
throws IOException {
  try {
    if (this.closed) {
      return;
    }
    synchronized (updateLock) {
      long now = System.currentTimeMillis();
      WALEdit edit = completeCacheFlushLogEdit();
      // This edit marks a cache-flush operation in the log.
      HLogKey key = makeKey(encodedRegionName, tableName, logSeqId,
          System.currentTimeMillis());
      // The key records the table name, the encoded region name and the current log sequence id.
      this.writer.append(new Entry(key, edit));
      // Append the entry to the log file.
      writeTime += System.currentTimeMillis() - now;
      writeOps++;
      this.numEntries.incrementAndGet();
      Long seq = this.lastSeqWritten.get(encodedRegionName);
      if (seq != null && logSeqId >= seq.longValue()) {
        this.lastSeqWritten.remove(encodedRegionName);
        // Drop the region's last-written sequence id: the region has no edits left to persist.
      }
    }
    // sync txn to file system
    this.sync();
    // This flush marker is important and must be synced out to the other HDFS nodes.
  } finally {
    this.cacheFlushLock.unlock();
  }
}
