Apache Tez / TEZ-3709

TezMerger is slow for high number of segments

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0, 0.8.5
    • Component/s: None

    Description

      The code below performs poorly at scale: each ArrayList#remove(0) call shifts every remaining element, so the whole list of segments is memcpy'd once per item in the batch instead of just once per batch.
      This affects both computeBytesInMerges and getSegmentDescriptors.

      for (int i = 0; i < batch; i++) {
        // ArrayList#remove(0): shifts all remaining elements left by one
        segments.remove(0);
      }
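
The difference between per-item and per-batch removal can be sketched as follows. This is a minimal illustration, not the code from the attached patches: the class BatchRemoval, the methods takeBatchSlow and takeBatchFast, and the use of Integer elements in place of real segments are all hypothetical. It relies on the standard behavior that clearing an ArrayList sub-list view removes the range with a single shift of the tail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of TezMerger.
final class BatchRemoval {

    // Pattern from the description: every remove(0) shifts the whole tail,
    // so taking a batch costs roughly batch * n element copies.
    static <T> List<T> takeBatchSlow(ArrayList<T> segments, int batch) {
        int n = Math.min(batch, segments.size());
        List<T> taken = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            taken.add(segments.remove(0));
        }
        return taken;
    }

    // Batched alternative: copy the head, then clear the sub-list view.
    // For ArrayList this removes the range with one shift of the tail.
    static <T> List<T> takeBatchFast(ArrayList<T> segments, int batch) {
        int n = Math.min(batch, segments.size());
        List<T> head = segments.subList(0, n);
        List<T> taken = new ArrayList<>(head);
        head.clear();
        return taken;
    }

    public static void main(String[] args) {
        ArrayList<Integer> segments = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            segments.add(i);
        }
        List<Integer> taken = takeBatchFast(segments, 3);
        System.out.println("taken=" + taken + " remaining=" + segments);
    }
}
```

Both methods return the same batch and leave the same remainder; only the number of element copies differs, which matters when the segment list is large.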
      

      Attachments

        1. TEZ-3709.1.patch
          1 kB
          Jonathan Turner Eagles
        2. TEZ-3709.2.patch
          6 kB
          Jonathan Turner Eagles
        3. TEZ-3709.3.patch
          7 kB
          Jonathan Turner Eagles

          People

            Assignee: jeagles Jonathan Turner Eagles
            Reporter: jeagles Jonathan Turner Eagles
            Votes: 0
            Watchers: 4
