Details

    Description

      While profiling a 1 TB TeraGen job on a Hadoop 2.8.2 cluster (Google Dataproc, 2 workers, GCS connector), I saw that the FileSystem.Statistics code paths accounted for 5.58% of wall time and 26.5% of CPU time of the total execution time.

      After switching the FileSystem.Statistics implementation to LongAdder, its share of wall time dropped to 0.006% and its share of CPU time to 0.104% of the total execution time.

      Total job runtime decreased from 66 mins to 61 mins.

      These results are not conclusive, because I did not repeat the benchmark and average the results. Regardless of the performance gains, switching to LongAdder simplifies the code.
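
      To illustrate the kind of change proposed, here is a minimal sketch of counters backed by java.util.concurrent.atomic.LongAdder. It is not the actual patch and does not reproduce Hadoop's FileSystem.Statistics class: the class name StatisticsSketch, its method names, and the runDemo helper are hypothetical and chosen only for this example. The only JDK calls relied on are LongAdder.add and LongAdder.sum.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a statistics holder; NOT Hadoop's FileSystem.Statistics.
final class StatisticsSketch {
    // LongAdder spreads updates over internal cells, so concurrent writers
    // rarely contend on the same memory location.
    private final LongAdder bytesRead = new LongAdder();
    private final LongAdder readOps = new LongAdder();

    void incrementBytesRead(long bytes) { bytesRead.add(bytes); }
    void incrementReadOps(int count) { readOps.add(count); }

    // sum() folds the cells together; it is exact once all writers have finished.
    long getBytesRead() { return bytesRead.sum(); }
    long getReadOps() { return readOps.sum(); }

    // Hypothetical helper: updates the counters from several threads and
    // returns the holder after every worker has completed.
    static StatisticsSketch runDemo(int threads, int opsPerThread, long bytesPerOp) {
        StatisticsSketch stats = new StatisticsSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    stats.incrementBytesRead(bytesPerOp);
                    stats.incrementReadOps(1);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stats;
    }

    public static void main(String[] args) {
        StatisticsSketch stats = runDemo(4, 1000, 128L);
        System.out.println("bytesRead=" + stats.getBytesRead()
                + " readOps=" + stats.getReadOps());
    }
}
```

      In a sketch like this, writers never need per-thread bookkeeping or explicit aggregation logic, which is the simplification referred to above. Any performance effect in a real job would still have to be measured.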

          People

            medb Igor Dvorzhak
            medb Igor Dvorzhak

              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 1h