  Apache Tez / TEZ-3936

Reduce TezEvent messaging overhead


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved

    Description

      While revisiting TEZ-3145, I found that in addition to improving how empty partitions are sent from Maps to the AM and from the AM to Reducers, message serialization can be improved to reduce network traffic.

      For example, consider a job with 42,000 Maps and 7,500 Reducers where 95% of the partition data produced is empty. The volume of Tez DME (DataMovementEvent) traffic sent from the AM to the Reducers is num(Maps) * num(Reducers) * size(wrapped DME), here 315 million events. With 95% empty partitions, each message is 450 bytes: 260 bytes carry the empty-partition information and 190 bytes are messaging overhead. Total messaging is 132 GiB, of which 76 GiB is empty-partition data and 56 GiB is messaging overhead not related to empty partitions. This JIRA aims to reduce that 56 GiB of messaging overhead.
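      As a back-of-the-envelope check, the sketch below reproduces the totals above. It assumes one wrapped DME per (Map, Reducer) pair and reports sizes in binary gigabytes (GiB), which is what the 132/76/56 figures work out to. The class and method names are hypothetical illustrations, not Tez APIs.

```java
// Hypothetical estimator for AM -> Reducer DME traffic; not part of Tez.
public class DmeTrafficEstimate {

    // Bytes in one binary gigabyte (GiB).
    private static final double GIB = 1024.0 * 1024.0 * 1024.0;

    /** Total bytes when every (Map, Reducer) pair receives one event of the given size. */
    public static long totalBytes(long numMaps, long numReducers, long bytesPerEvent) {
        return numMaps * numReducers * bytesPerEvent;
    }

    /** Converts a byte count to GiB. */
    public static double toGiB(long bytes) {
        return bytes / GIB;
    }

    public static void main(String[] args) {
        long maps = 42_000L;
        long reducers = 7_500L;
        long payloadBytes = 260L;   // empty-partition information per event
        long overheadBytes = 190L;  // messaging overhead per event

        long total = totalBytes(maps, reducers, payloadBytes + overheadBytes);
        System.out.printf("events   : %d%n", maps * reducers);
        System.out.printf("total    : %.1f GiB%n", toGiB(total));
        System.out.printf("payload  : %.1f GiB%n", toGiB(totalBytes(maps, reducers, payloadBytes)));
        System.out.printf("overhead : %.1f GiB%n", toGiB(totalBytes(maps, reducers, overheadBytes)));
    }
}
```

      Because the 190-byte overhead term is multiplied by num(Maps) * num(Reducers), every byte trimmed from the per-event envelope saves roughly 0.29 GiB (315 million bytes) in this job, independent of the partition payload.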

      Attachments

        1. TEZ-3936.001.patch (9 kB, Jonathan Turner Eagles)
        2. TEZ-3936.002.patch (9 kB, Jonathan Turner Eagles)


            People

              Assignee: Jonathan Turner Eagles (jeagles)
              Reporter: Jonathan Turner Eagles (jeagles)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated: