Hive / HIVE-21458

ACID: Optimize AcidUtils$MetaDataFile.isRawFormat

Details

    Description

      In the transactional subsystem, several code paths check whether a data file contains ROW__ID fields. Each check opens a Reader for that file/split, even when it is repeated within the same query. We could optimize this by performing the check once and caching the result for later use. It may also be unnecessary to perform the check for every split. An example call stack:

      OrcFile.createReader(Path, OrcFile$ReaderOptions) line: 105	
      AcidUtils$MetaDataFile.isRawFormatFile(Path, FileSystem) line: 2026	
      AcidUtils$MetaDataFile.isRawFormat(Path, FileSystem) line: 2022	
      AcidUtils.parsedDelta(Path, String, FileSystem) line: 1007	
      OrcRawRecordMerger$TransactionMetaData.findWriteIDForSynthetcRowIDs(Path, Path, Configuration) line: 1231	
      OrcRawRecordMerger.discoverOriginalKeyBounds(Reader, int, Reader$Options, Configuration, OrcRawRecordMerger$Options) line: 722	
      OrcRawRecordMerger.<init>(Configuration, boolean, Reader, boolean, int, ValidWriteIdList, Reader$Options, Path[], OrcRawRecordMerger$Options) line: 1022	
      OrcInputFormat.getReader(InputSplit, Options) line: 2108	
      OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) line: 2006	
      FetchOperator$FetchInputFormatSplit.getRecordReader(JobConf) line: 776	
      FetchOperator.getRecordReader() line: 344	
      FetchOperator.getNextRow() line: 540	
      FetchOperator.pushRow() line: 509	
      FetchTask.fetch(List) line: 146	
      

      Here, for each split we'll make that check.
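
      The caching idea described above could be sketched roughly as follows. This is only an illustration, not the change proposed for Hive: the class name RawFormatCache, the String path key, and the injected probe predicate are all hypothetical stand-ins (the probe represents whatever currently opens the ORC Reader to inspect the file). It also assumes that the answer for a given path does not change for the lifetime of the cache.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: remembers the "is raw format" answer per file path so
 * that the expensive probe (e.g. opening a Reader) runs at most once per path.
 * None of these names exist in Hive; they are assumptions for illustration.
 */
final class RawFormatCache {
    // Path string -> cached answer. ConcurrentHashMap keeps lookups thread-safe.
    private final ConcurrentMap<String, Boolean> cache = new ConcurrentHashMap<>();

    // Stand-in for the expensive check that inspects the file's schema.
    private final Predicate<String> probe;

    RawFormatCache(Predicate<String> probe) {
        this.probe = probe;
    }

    /** Returns the cached answer, invoking the probe only on the first lookup of a path. */
    boolean isRawFormat(String path) {
        return cache.computeIfAbsent(path, probe::test);
    }

    /** Number of distinct paths probed so far. */
    int size() {
        return cache.size();
    }
}
```

      With a cache like this scoped to a single query, repeated lookups for the same file from different splits would reuse the first answer instead of opening another Reader. Where such a cache should live and when it should be discarded are open design questions that this sketch does not address.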

      Attachments

        1. async-prof-pid-1-cpu-1.svg
          1.60 MB
          Prasanth Jayachandran

          People

            Assignee: Unassigned
            Reporter: Vaibhav Gumashta (vgumashta)
