Apache Tez / TEZ-4296

Consider using listStatusIterator instead of listStatus in DatePartitionedLogger

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.1
    • Component/s: None
    • Labels: None

    Description

      DatePartitionedLogger should use listStatusIterator instead of listStatus to avoid running out of memory (OOM) when listing very large directories.

      https://github.com/apache/tez/blob/master/tez-plugins/tez-protobuf-history-plugin/src/main/java/org/apache/tez/dag/history/logging/proto/DatePartitionedLogger.java#L163

      For example, /warehouse/tablespace/managed/hive/sys.db/query_data/date=x-y-z contained far too many files, and listing them in DatePartitionedLogger caused an OOM.
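
The difference between the two listing styles can be sketched with the Java standard library alone. This is an analogue under stated assumptions, not the Tez patch: in Hadoop, FileSystem.listStatus returns a FileStatus[] holding the whole listing, while FileSystem.listStatusIterator returns a RemoteIterator<FileStatus> that is consumed one entry at a time. Here java.nio's DirectoryStream stands in for the iterator style, and the class name LazyListingSketch, the helper countMatching, and the file names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

class LazyListingSketch {

    /** Hypothetical helper: counts entries whose file name starts with the given prefix. */
    static long countMatching(Path dir, String prefix) throws IOException {
        long count = 0;
        // DirectoryStream hands out entries while iterating, so the full
        // listing is never returned to the caller as a single array.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (entry.getFileName().toString().startsWith(prefix)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative data only: three files in a fresh temporary directory.
        Path dir = Files.createTempDirectory("listing-sketch");
        Files.createFile(dir.resolve("query_1"));
        Files.createFile(dir.resolve("query_2"));
        Files.createFile(dir.resolve("other"));
        System.out.println(countMatching(dir, "query_")); // prints 2
    }
}
```

Because each entry is processed and then dropped, the caller's memory use stays roughly constant regardless of how many files the directory holds, which is the property the issue asks for.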

      Attachments

        1. Screenshot 2021-02-22 at 4.18.30 AM.png
          362 kB
          Rajesh Balamohan

            People

              Assignee: harishjp Harish JP
              Reporter: rajesh.balamohan Rajesh Balamohan
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1.5h