Accumulo / ACCUMULO-4667

LocalityGroupIterator very inefficient with large locality groups

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.6, 1.7.3, 1.8.1, 2.0.0
    • Fix Version/s: 1.9.0
    • Component/s: tserver
    • Labels: None

    Description

      On one of our systems we tracked some scans that were taking an extremely long time to complete (many hours). As it turns out, the scan was relatively simple: it was scanning a tablet for all keys that had a specific column family. Note that very little data actually matched this column family. Upon tracing the code, we found that it was spending a large amount of time in the LocalityGroupIterator. Stack traces consistently showed the code at line 128 or 129 of the LocalityGroupIterator. Those line numbers are consistent from the 1.6 series all the way to 2.0.0 (master). In this case the column family being searched for was included in one of a dozen or so locality groups on that table, and the locality group itself had 40 or so column families. We see several things that can be done here:

      1) The code that checks the group column families against those being searched for can exit as soon as it finds a match.
      2) The code that checks the group column families against those being searched for can look at the relative sizes of those two sets and invert the logic appropriately for a more efficient loop.
      3) We could create a cached map of column families to locality groups, allowing us to avoid examining each locality group every time we seek.
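
      The three ideas above can be sketched roughly as follows. This is only an illustration, not the change that was committed: the class and method names are hypothetical, plain String values stand in for Accumulo's column family byte sequences, and the sketch ignores the default locality group and exclusive (non-inclusive) column family filters.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; none of these names come from the Accumulo code base.
class LocalityGroupSketch {

    // Ideas 1 and 2: iterate over the smaller set, probe the larger one,
    // and return as soon as a common column family is found.
    static <T> boolean intersects(Set<T> a, Set<T> b) {
        Set<T> small = a.size() <= b.size() ? a : b;
        Set<T> large = (small == a) ? b : a;
        for (T item : small) {
            if (large.contains(item)) {
                return true; // early exit on the first match
            }
        }
        return false;
    }

    // Idea 3: build, once, a map from column family to the index of the
    // locality group that contains it (assumes each family is in one group).
    static Map<String, Integer> buildFamilyIndex(List<Set<String>> groups) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < groups.size(); i++) {
            for (String family : groups.get(i)) {
                index.put(family, i);
            }
        }
        return index;
    }

    // On each seek, look up only the requested families instead of
    // examining every locality group.
    static Set<Integer> groupsToScan(Map<String, Integer> index,
                                     Set<String> requestedFamilies) {
        Set<Integer> selected = new TreeSet<>();
        for (String family : requestedFamilies) {
            Integer group = index.get(family);
            if (group != null) {
                selected.add(group);
            }
        }
        return selected;
    }
}
```

      With a dozen groups of roughly 40 families each and a single requested family, the cached lookup does one map probe per seek rather than comparing the request against every family in every group.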

          People

            Assignee: Ivan Bella (ivan.bella)
            Reporter: Ivan Bella (ivan.bella)
            Votes: 0
            Watchers: 4

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 7.5h