HBase / HBASE-27733

When an HFile is split during bulkload, the new HFiles do not specify favored nodes


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: tooling
    • Labels: None

    Description

        1. BulkLoadHFilesTool.class

      /**
       * Copy half of an HFile into a new HFile.
       */
      private static void copyHFileHalf(Configuration conf, Path inFile, Path outFile,
        Reference reference, ColumnFamilyDescriptor familyDescriptor) throws IOException {
        FileSystem fs = inFile.getFileSystem(conf);
        CacheConfig cacheConf = CacheConfig.DISABLED;
        HalfStoreFileReader halfReader = null;
        StoreFileWriter halfWriter = null;
        try {
          ReaderContext context = new ReaderContextBuilder().withFileSystemAndPath(fs, inFile).build();
          StoreFileInfo storeFileInfo =
            new StoreFileInfo(conf, fs, fs.getFileStatus(inFile), reference);
          storeFileInfo.initHFileInfo(context);
          halfReader = (HalfStoreFileReader) storeFileInfo.createReader(context, cacheConf);
          storeFileInfo.getHFileInfo().initMetaAndIndex(halfReader.getHFileReader());
          Map<byte[], byte[]> fileInfo = halfReader.loadFileInfo();

          int blocksize = familyDescriptor.getBlocksize();
          Algorithm compression = familyDescriptor.getCompressionType();
          BloomType bloomFilterType = familyDescriptor.getBloomFilterType();
          HFileContext hFileContext = new HFileContextBuilder().withCompression(compression)
            .withChecksumType(StoreUtils.getChecksumType(conf))
            .withBytesPerCheckSum(StoreUtils.getBytesPerChecksum(conf)).withBlockSize(blocksize)
            .withDataBlockEncoding(familyDescriptor.getDataBlockEncoding()).withIncludesTags(true)
            .withCreateTime(EnvironmentEdgeManager.currentTime()).build();
          halfWriter = new StoreFileWriter.Builder(conf, cacheConf, fs).withFilePath(outFile)
            .withBloomType(bloomFilterType).withFileContext(hFileContext).build();
          HFileScanner scanner = halfReader.getScanner(false, false, false);
          scanner.seekTo();

          ...

      When an HFile has to be split during bulkload, copyHFileHalf() writes the new HFiles without specifying favored nodes, which hurts data locality. Internally, we implemented a version of copyHFileHalf() that lets us specify the favored nodes for the split HFiles, so that locality is not compromised.
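
      To illustrate the idea, here is a minimal sketch of one piece such a change might need: turning a list of "host:port" strings into the InetSocketAddress[] that StoreFileWriter.Builder accepts through withFavoredNodes(). This is not the internal implementation mentioned above, which is not shown in this issue. The class name FavoredNodesSketch, the method toFavoredNodes, and the assumption that favored nodes arrive as "host:port" strings are all hypothetical. The sketch uses only the JDK so it can be run without an HBase cluster.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: builds the favored-nodes array
// that a split-HFile writer could be given.
final class FavoredNodesSketch {
  private FavoredNodesSketch() {
  }

  /**
   * Converts "host:port" entries into socket addresses.
   * Addresses are left unresolved so the sketch needs no DNS lookup;
   * a real patch may prefer resolved addresses.
   */
  static InetSocketAddress[] toFavoredNodes(List<String> hostPorts) {
    List<InetSocketAddress> nodes = new ArrayList<>();
    for (String hostPort : hostPorts) {
      int idx = hostPort.lastIndexOf(':');
      if (idx <= 0 || idx == hostPort.length() - 1) {
        throw new IllegalArgumentException("Expected host:port but got: " + hostPort);
      }
      String host = hostPort.substring(0, idx);
      int port = Integer.parseInt(hostPort.substring(idx + 1));
      nodes.add(InetSocketAddress.createUnresolved(host, port));
    }
    return nodes.toArray(new InetSocketAddress[0]);
  }

  // Assumed use inside copyHFileHalf(), shown as a comment because it needs HBase
  // on the classpath:
  //
  //   halfWriter = new StoreFileWriter.Builder(conf, cacheConf, fs).withFilePath(outFile)
  //     .withBloomType(bloomFilterType).withFileContext(hFileContext)
  //     .withFavoredNodes(favoredNodes).build();
}
```

      How the favored nodes for the target region are looked up during bulkload is left open here; the sketch only covers the conversion step and where the resulting array would plug into the writer builder.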

            People

              Assignee: alanlemma alan.zhao
              Reporter: alanlemma alan.zhao
              Votes: 0
              Watchers: 3
