IMPALA-9952

Invalid offset index in Parquet file


Details


    Description

      When reading a Parquet file in Impala 3.4, we encountered the following errors:

      I0714 16:11:48.307806 1075820 runtime-state.cc:207] 8c43203adb2d4fc8:0478df9b0000018b] Error from query 8c43203adb2d4fc8:0478df9b00000000: Invalid offset index in Parquet file hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
      I0714 16:11:48.834901 1075838 status.cc:126] 8c43203adb2d4fc8:0478df9b000002c0] Invalid offset index in Parquet file hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
          @           0xbf4ef9
          @          0x1748c41
          @          0x174e170
          @          0x1750e58
          @          0x17519f0
          @          0x1748559
          @          0x1510b41
          @          0x1512c8f
          @          0x137488a
          @          0x1375759
          @          0x1b48a19
          @     0x7f34509f5e24
          @     0x7f344d5ed35c
      I0714 16:11:48.835763 1075838 runtime-state.cc:207] 8c43203adb2d4fc8:0478df9b000002c0] Error from query 8c43203adb2d4fc8:0478df9b00000000: Invalid offset index in Parquet file hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
      I0714 16:11:48.893784 1075820 status.cc:126] 8c43203adb2d4fc8:0478df9b0000018b] Top level rows aren't in sync during page filtering in file hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
          @           0xbf4ef9
          @          0x1749104
          @          0x17494cc
          @          0x1751aee
          @          0x1748559
          @          0x1510b41
          @          0x1512c8f
          @          0x137488a
          @          0x1375759
          @          0x1b48a19
          @     0x7f34509f5e24
          @     0x7f344d5ed35c
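For context on the first error: a Parquet offset index stores, per column chunk, a list of page locations (the `PageLocation` struct in the Parquet format spec, with `offset`, `compressed_page_size` and `first_row_index`). The sketch below shows the kind of consistency checks a reader might apply before trusting such an index. It is a hypothetical illustration, not Impala's actual validation code; the function name `ValidateOffsetIndex` and the exact set of checks are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Mirrors the fields of PageLocation from the Parquet format's OffsetIndex.
struct PageLocation {
  int64_t offset;                // file offset of the page (header included)
  int32_t compressed_page_size;  // page size in bytes, header included
  int64_t first_row_index;       // row-group-relative index of the page's first row
};

// Hypothetical validator (assumption, not Impala's API): returns an empty
// string if the offset index looks consistent, otherwise the first problem found.
std::string ValidateOffsetIndex(const std::vector<PageLocation>& pages,
                                int64_t num_rows_in_row_group) {
  if (pages.empty()) return "offset index has no pages";
  if (pages[0].first_row_index != 0) return "first page does not start at row 0";
  for (std::size_t i = 0; i < pages.size(); ++i) {
    const PageLocation& p = pages[i];
    if (p.offset < 0 || p.compressed_page_size <= 0) return "bad page location";
    if (p.first_row_index >= num_rows_in_row_group) return "row index out of range";
    if (i > 0) {
      const PageLocation& prev = pages[i - 1];
      // Pages start on row boundaries, so each page holds at least one row.
      if (p.first_row_index <= prev.first_row_index) return "row indexes not increasing";
      // Pages of one column chunk must not overlap in the file.
      if (prev.offset + prev.compressed_page_size > p.offset) return "pages overlap";
    }
  }
  return "";
}
```

A file whose index fails any check like these cannot safely be used for page skipping, which is why a reader would report it rather than guess row positions.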
      

       The corresponding source code, which raises the second error:

      Status HdfsParquetScanner::CheckPageFiltering() {
        // Nothing to verify if no page ranges were selected or there are no scalar readers.
        if (candidate_ranges_.empty() || scalar_readers_.empty()) return Status::OK();
        // After page filtering, every scalar reader must have stopped at the same top-level row.
        int64_t current_row = scalar_readers_[0]->LastProcessedRow();
        for (int i = 1; i < scalar_readers_.size(); ++i) {
          if (current_row != scalar_readers_[i]->LastProcessedRow()) {
            DCHECK(false);
            return Status(Substitute(
                "Top level rows aren't in sync during page filtering in file $0.", filename()));
          }
        }
        return Status::OK();
      }
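The invariant that `CheckPageFiltering()` enforces can be reproduced in isolation. In this minimal sketch, `ScalarReader` is a hypothetical stand-in that only records the last processed top-level row, a boolean replaces `candidate_ranges_`, and an empty string stands in for `Status::OK()`; none of these are Impala types.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a scalar column reader (not Impala's class).
struct ScalarReader {
  int64_t last_processed_row;
  int64_t LastProcessedRow() const { return last_processed_row; }
};

// Returns "" when all readers stopped at the same top-level row, otherwise an
// error message worded like the one in the Impala check.
std::string CheckPageFiltering(const std::vector<ScalarReader>& readers,
                               bool has_candidate_ranges,
                               const std::string& filename) {
  if (!has_candidate_ranges || readers.empty()) return "";
  const int64_t current_row = readers[0].LastProcessedRow();
  for (std::size_t i = 1; i < readers.size(); ++i) {
    if (readers[i].LastProcessedRow() != current_row) {
      return "Top level rows aren't in sync during page filtering in file " +
             filename + ".";
    }
  }
  return "";
}
```

Any single reader that lands on a different row than the first one is enough to trigger the error, regardless of which column caused the drift.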
      
      
      


          People

            boroknagyz Zoltán Borók-Nagy
            guojingfeng guojingfeng
            Votes: 0
            Watchers: 5
