HBase / HBASE-27904

A random data generator tool leveraging bulk load.


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0, 3.0.0-beta-1
    • Component/s: util
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      As of now, HBase has no data generator tool that leverages bulk load. Because bulk load skips the client write path, it generates data much faster, which makes it well suited to load/performance tests where client writes are not mandatory.
      Example: any tooling over HBase that needs x TB of HBase table data for load testing.

      Requirements:
      1. The tool should generate RANDOM data on the fly and should not require pre-generated data, such as CSV/XML files, as input.
      2. The tool should support pre-split tables (number of splits to be taken as input).
      3. Data should be UNIFORMLY distributed across all regions of the table.

      High-level Steps
      1. A table will be created, pre-split with the number of splits taken as input.
      2. The mapper of a custom MapReduce job will generate random key-value pairs and ensure that they are evenly distributed across all regions of the table.
      3. HFileOutputFormat2 will be used to add the reducer to the MR job and create HFiles from the key-value pairs generated by the mapper.
      4. The HFiles will be bulk loaded into the respective regions of the table using LoadIncrementalFiles.
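      One way to picture step 2 is the sketch below. It is not the code of BulkDataGeneratorTool; the class name UniformKeySketch, the four-digit region prefix, and the round-robin assignment are all assumptions made for illustration. Each row key starts with a zero-padded region index, the split points are those same prefixes, and rows rotate over the regions, so every region receives the same number of rows when the row count is a multiple of the region count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch only: the key layout is an assumption, not the tool's actual scheme.
class UniformKeySketch {

  /** Split points "0001", "0002", ... yielding (splitCount + 1) regions. Assumes < 10000 regions. */
  static List<String> splitPoints(int splitCount) {
    List<String> splits = new ArrayList<>();
    for (int i = 1; i <= splitCount; i++) {
      splits.add(String.format(Locale.ROOT, "%04d", i));
    }
    return splits;
  }

  /** Row key = zero-padded region index + random hex suffix; rows rotate round-robin over regions. */
  static String rowKey(long rowIndex, int regionCount, Random random) {
    int region = (int) (rowIndex % regionCount);
    return String.format(Locale.ROOT, "%04d-%016x", region, random.nextLong());
  }

  /** Region index of a key = number of split points that are <= key (lexicographic order). */
  static int regionOf(String key, List<String> splits) {
    int region = 0;
    for (String split : splits) {
      if (key.compareTo(split) >= 0) {
        region++;
      }
    }
    return region;
  }

  /** Generates `rows` keys and counts how many land in each region. */
  static Map<Integer, Integer> countPerRegion(int splitCount, long rows, long seed) {
    List<String> splits = splitPoints(splitCount);
    int regionCount = splitCount + 1;
    Random random = new Random(seed);
    Map<Integer, Integer> counts = new TreeMap<>();
    for (long i = 0; i < rows; i++) {
      int region = regionOf(rowKey(i, regionCount, random), splits);
      counts.merge(region, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // 9 split points -> 10 regions; 1000 rows -> 100 rows per region.
    System.out.println(countPerRegion(9, 1000, 42L));
  }
}
```

      The random suffix keeps keys within a region unpredictable, while the prefix alone decides which region a row lands in. In a real job the prefix would be encoded as bytes and the split points passed to table creation.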

      Results
      We ran a POC of this tool in our organization and tested it on an 11-node HBase cluster (running HBase and Hadoop services). The tool generated:
      1. 100 GB of data in 6 minutes
      2. 340 GB of data in 13 minutes
      3. 3.5 TB of data in 3 hours and 10 minutes
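      For comparison, the three runs can be reduced to an average generation rate. The sketch below only does that arithmetic; it assumes 1 TB = 1000 GB and treats the reported durations as exact, and the class name ThroughputSketch is made up for illustration. Under those assumptions the rates come to roughly 16.7, 26.2, and 18.4 GB/min.

```java
import java.util.Locale;

// Hypothetical helper: converts the reported (size, duration) pairs into GB per minute.
class ThroughputSketch {

  static double gbPerMinute(double gigabytes, double minutes) {
    return gigabytes / minutes;
  }

  public static void main(String[] args) {
    // {size in GB, duration in minutes}; 3.5 TB is taken as 3500 GB (assumption).
    double[][] runs = { {100, 6}, {340, 13}, {3500, 190} };
    for (double[] run : runs) {
      System.out.printf(Locale.ROOT, "%.0f GB in %.0f min = %.1f GB/min%n",
          run[0], run[1], gbPerMinute(run[0], run[1]));
    }
  }
}
```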

      Usage
      hbase org.apache.hadoop.hbase.util.bulkdatagenerator.BulkDataGeneratorTool -mapper-count 100 -table TEST_TABLE_1 -rows-per-mapper 1000000 -split-count 100 -delete-if-exist -table-options "NORMALIZATION_ENABLED=false"

       

          People

            hgwalani81 Himanshu Gwalani