  1. Apache Tez
  2. TEZ-4244

Consider using RawLocalFileSystem in LocalDiskFetchedInput


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • None
    • 0.10.1
    • None

    Description

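      LocalDiskFetchedInput should consider using RawLocalFileSystem. Its input stream (LocalFSFileInputStream) checks only the (pos < 0) condition in seek(), which avoids the native filesystem call. With the checksummed local filesystem, seek() goes through ChecksumFileSystem$FSDataBoundedInputStream.getFileLength() and ends in File.exists(), as in the thread dump below: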

       

      "TezTR-348763_0_9_6_172_0" #68186 daemon prio=5 os_prio=0 tid=0x000055d7afbce800 nid=0x3877 runnable [0x00007f645019c000]
         java.lang.Thread.State: RUNNABLE
      	at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
      	at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
      	at java.io.File.exists(File.java:821)
      	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:646)
      	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:939)
      	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:640)
      	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:456)
      	at org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1781)
      	at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.getFileLength(ChecksumFileSystem.java:294)
      	at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.seek(ChecksumFileSystem.java:337)
      	- locked <0x00007f9f10196f00> (a org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream)
      	at org.apache.tez.runtime.library.common.shuffle.LocalDiskFetchedInput.getInputStream(LocalDiskFetchedInput.java:73)
      	at org.apache.tez.runtime.library.common.readers.UnorderedKVReader.openIFileReader(UnorderedKVReader.java:226)
      	at org.apache.tez.runtime.library.common.readers.UnorderedKVReader.moveToNextInput(UnorderedKVReader.java:212)
      	at org.apache.tez.runtime.library.common.readers.UnorderedKVReader.next(UnorderedKVReader.java:125)
      	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTableLoader.load(VectorMapJoinFastHashTableLoader.java:144)
      	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:386)
      	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:455)
      	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.initializeOp(MapJoinOperator.java:242)
      	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:555)
      	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:111)
      	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
      	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:193)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
      	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
      	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
      	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
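
      The difference between the two seek paths in the trace above can be modelled with plain JDK classes. This is only a simplified sketch, not Hadoop's code: SeekSketch, boundedSeek, rawSeek and statCalls are hypothetical names, and RandomAccessFile stands in for the real streams. boundedSeek imitates a stream that looks up file metadata on every seek to validate the position, while rawSeek imitates the cheaper (pos < 0) check.

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

public class SeekSketch {
  // Counts metadata lookups so the two strategies can be compared.
  static int statCalls = 0;

  // Models a bounded seek: the file is stat'ed on every call
  // (File.exists() and File.length()) before the position is moved.
  static void boundedSeek(RandomAccessFile raf, File file, long pos) throws IOException {
    statCalls++;
    if (!file.exists()) {
      throw new FileNotFoundException(file.getPath());
    }
    if (pos > file.length()) {
      throw new EOFException("Cannot seek after EOF");
    }
    raf.seek(pos);
  }

  // Models a raw seek: only a negative position is rejected,
  // so no metadata lookup is made.
  static void rawSeek(RandomAccessFile raf, long pos) throws IOException {
    if (pos < 0) {
      throw new EOFException("Cannot seek to a negative offset");
    }
    raf.seek(pos);
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("seek-sketch", ".bin");
    tmp.deleteOnExit();
    Files.write(tmp.toPath(), new byte[16]);
    try (RandomAccessFile raf = new RandomAccessFile(tmp, "r")) {
      boundedSeek(raf, tmp, 4); // one metadata lookup
      rawSeek(raf, 8);          // no metadata lookup
      System.out.println("stat calls: " + statCalls + ", position: " + raf.getFilePointer());
    }
  }
}
```

      In Hadoop itself, the raw filesystem behind a LocalFileSystem is available through LocalFileSystem.getRaw(). Whether the attached patches use that exact call is not stated in this issue.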
       

      Attachments

        1. TEZ-4244.3.patch
          2 kB
          Rajesh Balamohan
        2. TEZ-4244.2.patch
          0.8 kB
          Rajesh Balamohan
        3. TEZ-4244.1.patch
          0.8 kB
          Rajesh Balamohan

          People

            Assignee: Rajesh Balamohan (rajesh.balamohan)
            Reporter: Rajesh Balamohan (rajesh.balamohan)