Hive / HIVE-28124

Do not allow non-numeric values in Hive table stats during an alter table

Details

    Description

      Hive table properties are strings by nature, but some of them have a special meaning and must hold numeric values, such as "totalSize", "numRows", and "rawDataSize".
      During an "ALTER TABLE" statement, Hive currently validates only the "numRows" and "rawDataSize" table properties; the other table properties can be set to non-numeric values (including an empty string).
      Certain applications (Spark, for example) then fail with quite obscure "NumberFormatException" errors when trying to access such broken tables (see SPARK-47444).
      For example, after the following statement the table can no longer be read from Spark:

      0: jdbc:hive2://hs2host> alter table t1p set tblproperties('totalSize'='', 'STATS_GENERATED_VIA_STATS_TASK'='true');
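
To illustrate why an empty "totalSize" is a problem for consumers, here is a minimal sketch of a reader that parses the property as a long. The class and method names are hypothetical and this is not Spark's or Hive's actual code; it only assumes that the consumer converts the string value with a standard numeric parse.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical consumer of table properties; not Spark's or Hive's actual code.
class StatsReaderSketch {

    // Parses "totalSize" the way a naive consumer might: no validation first.
    static long readTotalSize(Map<String, String> tblProps) {
        return Long.parseLong(tblProps.get("totalSize"));
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // Mirrors the ALTER TABLE statement above: an empty string is stored.
        props.put("totalSize", "");
        try {
            readTotalSize(props);
        } catch (NumberFormatException e) {
            // Long.parseLong rejects the empty string.
            System.out.println("Cannot read table stats: " + e.getMessage());
        }
    }
}
```

The failure surfaces at read time in the consumer, far from the statement that stored the bad value, which is why rejecting it at ALTER TABLE time would be easier to diagnose.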
      

      In AbstractAlterTablePropertiesAnalyzer.java, besides "numRows" and "rawDataSize", we should also validate the other stats-related table properties. The ones currently missing are:
      numFiles, numPartitions, totalSize, runTimeNumRows, numFilesErasureCoded.
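
A rough sketch of what such a check could look like follows. The class name, method name, and exception type are assumptions made for illustration and do not reflect the existing code in AbstractAlterTablePropertiesAnalyzer.java; only the property names are taken from this issue.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not the actual Hive implementation.
class StatsPropertyValidatorSketch {

    // Stats-related properties named in this issue that must be numeric.
    static final Set<String> NUMERIC_STATS = new HashSet<>(Arrays.asList(
        "numRows", "rawDataSize", "numFiles", "numPartitions",
        "totalSize", "runTimeNumRows", "numFilesErasureCoded"));

    // Rejects the whole property change if any stats property is not a valid long.
    // IllegalArgumentException is a stand-in for whatever exception the analyzer uses.
    static void validate(Map<String, String> newProps) {
        for (Map.Entry<String, String> e : newProps.entrySet()) {
            if (!NUMERIC_STATS.contains(e.getKey())) {
                continue; // other properties remain free-form strings
            }
            try {
                Long.parseLong(e.getValue()); // also fails for null and ""
            } catch (NumberFormatException ex) {
                throw new IllegalArgumentException(
                    "Table property " + e.getKey() + " must be numeric, got: '"
                        + e.getValue() + "'");
            }
        }
    }
}
```

With a check like this, the example statement above would be rejected because 'totalSize'='' is not a valid long, while 'STATS_GENERATED_VIA_STATS_TASK'='true' would pass through untouched.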

          People

            mszurap Miklos Szurap