Uploaded image for project: 'Tajo (Retired)'
  1. Tajo (Retired)
  2. TAJO-1271

Improve memory usage of Hash-shuffle

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.9.0
    • 0.12.0, 0.11.1
    • Data Shuffle

    Description

      Currently, Hash-shuffle keeps intermediate file appender and tuple list in memory and the required memory will be in proportion to the input size
      If input size is 10GB, the hash-join key partition count will be 78125 (10TB / 128MB) and the required memory is 10GB (78125 * 128KB).

      We should improve the hash-shuffle file writer as following :

      • Separate the buffer from the file writer
      • Keep the tuples in off-heap buffer and reuse the buffer
      • Flush the buffers, if total buffer capacity is required more than maxBufferSize
      • Write the partition files asynchronously

      Attachments

        Activity

          People

            jhkim Jinho Kim
            jhkim Jinho Kim
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: