IMPALA / IMPALA-7047

REFRESH on unpartitioned tables calls getBlockLocations on every file

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.13.0
    • Fix Version/s: Impala 3.2.0
    • Component/s: Catalog

    Description

      In HdfsTable.updateUnpartitionedTableFileMd(), the existing default Partition object is reset and a new, empty one is created. The method then calls refreshPartitionFileMetadata with this new partition, whose list of file descriptors is empty. The refresh lists the directory and, because no file can be found in the empty descriptor list, makes a separate RPC to HDFS for every file to get its block locations.

      This is quite wasteful compared with using the API that returns the located statuses for the whole directory.
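
The cost difference can be illustrated with a toy model. This is a minimal sketch, not Impala or Hadoop code: `CountingFs`, `perFileRefresh`, and `bulkRefresh` are hypothetical names, and the fake file system simply counts round trips. The Hadoop call the description presumably has in mind is `FileSystem.listLocatedStatus`, which returns entries that already carry block locations; a real listing of a large directory may still take more than one round trip, so the "1" below is an idealization.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model that counts round trips; hypothetical, not Impala or Hadoop code. */
class RefreshRpcSketch {
  /** Hypothetical stand-in for a remote file system holding one directory. */
  static final class CountingFs {
    private final Map<String, List<String>> hostsByFile = new LinkedHashMap<>();
    int rpcCount = 0;

    CountingFs(int numFiles) {
      for (int i = 0; i < numFiles; i++) {
        hostsByFile.put("file_" + i, Collections.singletonList("host-" + (i % 3)));
      }
    }

    /** One round trip: file names only, without locations. */
    List<String> listNames() {
      rpcCount++;
      return new ArrayList<>(hostsByFile.keySet());
    }

    /** One round trip per call: locations of a single file. */
    List<String> getLocations(String name) {
      rpcCount++;
      return hostsByFile.get(name);
    }

    /** One round trip (idealized): names together with their locations. */
    Map<String, List<String>> listLocated() {
      rpcCount++;
      return new LinkedHashMap<>(hostsByFile);
    }
  }

  /** Pattern described in the issue: list, then look up every file not already known. */
  static int perFileRefresh(CountingFs fs, Map<String, List<String>> known) {
    for (String name : fs.listNames()) {
      if (!known.containsKey(name)) {
        known.put(name, fs.getLocations(name));
      }
    }
    return fs.rpcCount;
  }

  /** Alternative: fetch names and locations together. */
  static int bulkRefresh(CountingFs fs, Map<String, List<String>> known) {
    known.putAll(fs.listLocated());
    return fs.rpcCount;
  }

  public static void main(String[] args) {
    int n = 1000;
    System.out.println("per-file: " + perFileRefresh(new CountingFs(n), new LinkedHashMap<>()));
    System.out.println("bulk: " + bulkRefresh(new CountingFs(n), new LinkedHashMap<>()));
  }
}
```

With an empty descriptor map, the per-file path costs one round trip per file plus the listing, which is the situation the reset partition creates.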

      Alternatively, the new Partition object should probably keep the old file descriptor list so that the incremental refresh path can work.
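
The second option can be sketched in the same spirit. The code below is a hypothetical illustration, not the actual HdfsTable logic: `FileDesc` and `refresh` are made-up names, and treating a file as unchanged when its length and modification time match is an assumption made for this sketch. It only shows why carrying the old descriptors forward matters: unchanged files need no new location lookup.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an incremental refresh; not the actual Impala code. */
class IncrementalRefreshSketch {
  /** Simplified descriptor holding only what is needed to detect a change. */
  static final class FileDesc {
    final String name;
    final long length;
    final long modificationTime;

    FileDesc(String name, long length, long modificationTime) {
      this.name = name;
      this.length = length;
      this.modificationTime = modificationTime;
    }

    /** Assumption: equal length and modification time mean the file is unchanged. */
    boolean unchangedSince(FileDesc old) {
      return old != null && length == old.length
          && modificationTime == old.modificationTime;
    }
  }

  /**
   * Fills newDescriptors from the directory listing, reusing old descriptors
   * where possible. Returns how many files would need a location lookup.
   */
  static int refresh(Map<String, FileDesc> oldDescriptors, List<FileDesc> listing,
      Map<String, FileDesc> newDescriptors) {
    int lookups = 0;
    for (FileDesc listed : listing) {
      FileDesc old = oldDescriptors.get(listed.name);
      if (listed.unchangedSince(old)) {
        newDescriptors.put(listed.name, old); // reuse, no lookup needed
      } else {
        lookups++; // new or modified file
        newDescriptors.put(listed.name, listed);
      }
    }
    return lookups;
  }

  public static void main(String[] args) {
    List<FileDesc> listing = Arrays.asList(
        new FileDesc("a", 10, 100), new FileDesc("b", 20, 200), new FileDesc("c", 30, 300));
    Map<String, FileDesc> old = new HashMap<>();
    for (FileDesc fd : listing) old.put(fd.name, fd);

    // Reset partition: nothing to reuse, so every file needs a lookup.
    System.out.println(refresh(new HashMap<>(), listing, new HashMap<>()));
    // Old descriptors kept: nothing changed, so no lookups are needed.
    System.out.println(refresh(old, listing, new HashMap<>()));
  }
}
```

Files that have disappeared from the directory drop out naturally, because the new map is built only from the current listing.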

          People

            Assignee: Todd Lipcon (tlipcon)
            Reporter: Todd Lipcon (tlipcon)
