Spark / SPARK-27403

Fix `updateTableStats` to update table stats always with new stats or None


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
    • Fix Version/s: 2.4.2, 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      The system should update table statistics automatically when the user sets spark.sql.statistics.size.autoUpdate.enabled to true. Currently this property has no effect, whether it is enabled or disabled. The feature is similar to Hive's auto-gather feature, where statistics are computed automatically by default when the feature is enabled.
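The difference between the reported and the requested behaviour can be sketched in plain Scala. This is a simplified model inferred from the issue title and the description above, not Spark's actual code: `TableStats`, `observed` and `intended` are hypothetical names, and the assumption that a table without existing stats is skipped is a reading of the reproduction, not something the report states.

```scala
// Simplified model; these types and names are hypothetical, not Spark's API.
final case class TableStats(sizeInBytes: BigInt)

object StatsUpdateSketch {
  // Assumed current behaviour: a table that has no stats yet is skipped,
  // so the autoUpdate flag never takes effect on a freshly created table.
  def observed(autoUpdate: Boolean,
               current: Option[TableStats],
               newSize: => BigInt): Option[TableStats] =
    current.flatMap(_ => if (autoUpdate) Some(TableStats(newSize)) else None)

  // Behaviour the title asks for: always write fresh stats or None.
  def intended(autoUpdate: Boolean, newSize: => BigInt): Option[TableStats] =
    if (autoUpdate) Some(TableStats(newSize)) else None

  def main(args: Array[String]): Unit = {
    val size = BigInt(1024) // hypothetical on-disk size after the insert
    println(observed(autoUpdate = true, current = None, newSize = size)) // None
    println(intended(autoUpdate = true, newSize = size)) // Some(TableStats(1024))
  }
}
```

In this model the flag only matters once stats already exist, which would explain why a newly created table never gets a size estimate.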

      Reference:

      https://cwiki.apache.org/confluence/display/Hive/StatsDev

      Reproducing steps:

      scala> spark.sql("create table table1 (name string,age int) stored as parquet")

      scala> spark.sql("insert into table1 select 'a',29")
      res2: org.apache.spark.sql.DataFrame = []

      scala> spark.sql("desc extended table1").show(false)
      +----------------------------+--------------------------------------------------------------+-------+
      |col_name                    |data_type                                                     |comment|
      +----------------------------+--------------------------------------------------------------+-------+
      |name                        |string                                                        |null   |
      |age                         |int                                                           |null   |
      |                            |                                                              |       |
      |# Detailed Table Information|                                                              |       |
      |Database                    |default                                                       |       |
      |Table                       |table1                                                        |       |
      |Owner                       |Administrator                                                 |       |
      |Created Time                |Sun Apr 07 23:41:56 IST 2019                                  |       |
      |Last Access                 |Thu Jan 01 05:30:00 IST 1970                                  |       |
      |Created By                  |Spark 2.4.1                                                   |       |
      |Type                        |MANAGED                                                       |       |
      |Provider                    |hive                                                          |       |
      |Table Properties            |[transient_lastDdlTime=1554660716]                            |       |
      |Location                    |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 |       |
      |Serde Library               |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe   |       |
      |InputFormat                 |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |       |
      |OutputFormat                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat|       |
      |Storage Properties          |[serialization.format=1]                                      |       |
      |Partition Provider          |Catalog                                                       |       |
      +----------------------------+--------------------------------------------------------------+-------+
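The output above contains no Statistics row. To check whether the property takes effect, one could enable it explicitly and look for that row after another insert. The lines below are a spark-shell sketch, not part of the original report: the second insert and its values are made up for illustration, and no output is shown because it depends on the Spark version.

```scala
// Run inside spark-shell, where `spark` is the active SparkSession.
spark.conf.set("spark.sql.statistics.size.autoUpdate.enabled", "true")

// Hypothetical second insert, so that the table is written after the flag is set.
spark.sql("insert into table1 select 'b',30")

// Keep only the Statistics row of the extended description, if one exists.
spark.sql("desc extended table1").filter("col_name = 'Statistics'").show(false)
```

An empty result here would mean the table size was not recorded despite the flag being enabled.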

            People

              Assignee: S71955 Sujith Chacko
              Reporter: S71955 Sujith Chacko