HBase / HBASE-26178

Improve data structure and algorithm for BalanceClusterState to improve computation speed for large cluster


Details

    • Reviewed

    Description

      With ~800 nodes and ~500 regions per node on our large production cluster, the balancer cannot complete within hours, even after we add just 2% more servers following maintenance.

      The unit tests with larger numbers of regions are also taking longer and longer with recent changes to the balancer, as is evident from the increased time limits included in recent PRs.

      It is time to replace some of the data structures to get better time complexity, including:

      int[][] regionsPerServer; // serverIndex -> region list

      int[][] regionsPerHost; // hostIndex -> list of regions
      int[][] regionsPerRack; // rackIndex -> region list

      // serverIndex -> sorted list of regions by primary region index
      ArrayList<HashSet<Integer>> primariesOfRegionsPerServer;

      // hostIndex -> sorted list of regions by primary region index
      int[][] primariesOfRegionsPerHost;

      // rackIndex -> sorted list of regions by primary region index
      int[][] primariesOfRegionsPerRack;
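
      As an illustration of why the array-based layout is costly, moving one region between servers in an int[][] means scanning and copying the source array. The sketch below shows one possible alternative, not necessarily the structure adopted for this issue: a dense list per server plus two region-indexed arrays, so that a move is a swap-remove followed by an append. The class name ServerRegionIndex and all of its fields and methods are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: serverIndex -> regions with amortized O(1) moves. */
final class ServerRegionIndex {
  private final int[][] regionsPerServer; // serverIndex -> dense region list
  private final int[] regionCount;        // serverIndex -> number of used slots
  private final int[] serverOfRegion;     // regionIndex -> serverIndex, or -1
  private final int[] slotOfRegion;       // regionIndex -> slot in its server's list

  ServerRegionIndex(int numServers, int numRegions) {
    regionsPerServer = new int[numServers][4];
    regionCount = new int[numServers];
    serverOfRegion = new int[numRegions];
    slotOfRegion = new int[numRegions];
    Arrays.fill(serverOfRegion, -1);
  }

  int serverOf(int region) {
    return serverOfRegion[region];
  }

  int regionCount(int server) {
    return regionCount[server];
  }

  /** Assigns (or moves) a region to a server without scanning any list. */
  void assign(int region, int server) {
    if (serverOfRegion[region] >= 0) {
      unassign(region);
    }
    int n = regionCount[server];
    if (n == regionsPerServer[server].length) {
      // Grow by doubling, so appends are O(1) amortized.
      regionsPerServer[server] = Arrays.copyOf(regionsPerServer[server], n * 2);
    }
    regionsPerServer[server][n] = region;
    slotOfRegion[region] = n;
    serverOfRegion[region] = server;
    regionCount[server] = n + 1;
  }

  /** Removes a region by moving the last entry of the list into its slot. */
  void unassign(int region) {
    int server = serverOfRegion[region];
    if (server < 0) {
      return;
    }
    int slot = slotOfRegion[region];
    int lastSlot = --regionCount[server];
    int last = regionsPerServer[server][lastSlot];
    regionsPerServer[server][slot] = last;
    slotOfRegion[last] = slot;
    serverOfRegion[region] = -1;
  }
}
```

      The trade-off in this sketch is that the per-server lists are no longer kept in any particular order, so code that relies on sorted region lists would need a different approach.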

      Areas of algorithm improvement include:

      1. O(n) to O(1) time to look up or update per server/host/rack on every move test iteration (n = number of regions per server/host/rack).
      2. O(n) to O(1) time for reverse lookup of region index from primary index.
      3. Recomputation of primaryRegionCountSkewCostFunction reduced from O(n) to O(1).
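
      To illustrate the third point, a skew measure can be maintained incrementally instead of being recomputed from every per-server count after each candidate move. The sketch below is an assumption for illustration only and is not HBase's actual cost formula: it tracks the running sum and sum of squares of per-server primary counts, so the variance is available in O(1) after each move. The class PrimaryCountSkew and its methods are hypothetical names.

```java
/** Hypothetical sketch: O(1) update of a primary-count skew measure per move. */
final class PrimaryCountSkew {
  private final int[] primariesPerServer; // serverIndex -> number of primaries
  private long sum;                       // sum of all per-server counts
  private long sumOfSquares;              // sum of squared per-server counts

  PrimaryCountSkew(int numServers) {
    primariesPerServer = new int[numServers];
  }

  void addPrimary(int server) {
    adjust(server, 1);
  }

  /** Moves one primary between servers, touching only the two counts involved. */
  void movePrimary(int fromServer, int toServer) {
    adjust(fromServer, -1);
    adjust(toServer, 1);
  }

  private void adjust(int server, int delta) {
    long old = primariesPerServer[server];
    long updated = old + delta;
    primariesPerServer[server] = (int) updated;
    sum += delta;
    sumOfSquares += updated * updated - old * old;
  }

  /** Population variance of primaries per server, computed from running totals. */
  double variance() {
    double n = primariesPerServer.length;
    double mean = sum / n;
    return sumOfSquares / n - mean * mean;
  }
}
```

      With two servers holding 4 and 0 primaries, the variance is 4.0; after moving two primaries across, both servers hold 2 and the variance drops to 0.0, without iterating over the servers.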

       



          People

            claraxiong Clara Xiong
            Votes: 0
            Watchers: 6
