IMPALA-8109

Impala cannot read gzip files larger than 2 GB


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.12.0
    • Fix Version/s: Impala 3.1.0
    • Component/s: Backend
    • Labels: None

    Description

      Querying a partition that contains gzip files fails with the following error:
      WARNINGS: Disk I/O error: Error seeking to -2147483648 in file: hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXXXXXX.gz:
      Error(255): Unknown error 255
      Root cause: EOFException: Cannot seek to negative offset

      The file hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXXXXXX.gz is a delimited text file larger than 2 GB (approximately 2.4 GB). Its uncompressed size is ~13 GB.

      The impalad version is 2.12.0-cdh5.15.0.
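
      Note on the offset in the error: -2147483648 is exactly the minimum value of a signed 32-bit integer, which is what 2^31 (2 GiB) becomes if a byte offset is narrowed to 32 bits. This report does not state where that happens, so the sketch below only illustrates the arithmetic. The helper name `wrap_to_int32` is hypothetical and is not taken from the Impala code base.

```python
# Illustration only: shows how a 2 GiB offset turns into -2147483648
# when only its low 32 bits are kept and read as a signed integer.
# Assumption: the failing seek offset went through such a narrowing;
# the report itself does not confirm this.

def wrap_to_int32(offset: int) -> int:
    """Return the value of `offset` after two's-complement wrap to 32 bits."""
    return (offset + 2**31) % 2**32 - 2**31

TWO_GIB = 2 * 1024**3  # 2147483648 bytes

print(wrap_to_int32(TWO_GIB - 1))  # last offset that still fits
print(wrap_to_int32(TWO_GIB))      # first offset that wraps negative
```

      Any offset from 2 GiB up to the end of a file of roughly 2.4 GB wraps to a negative number in this way, which would be consistent with the "Cannot seek to negative offset" root cause reported above.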


            People

              Assignee: Unassigned
              Reporter: hakki (hakkibc)
              Votes: 0
              Watchers: 4
