Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      We should probably revive https://github.com/apache/spark/pull/14750 in order to fix this issue and related classes of issues.

      The only other alternatives are (1) reconciling the on-disk schemas with the metastore schema at planning time, which seems pretty messy, and (2) fixing all the data sources to support case-insensitive matching, which has problems of its own.
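      To make alternative (1) concrete, here is a minimal sketch of what case-insensitive reconciliation could look like. It uses plain column-name lists rather than Spark's schema types, and `SchemaReconciler` and `reconcile` are hypothetical names made up for illustration, not anything in Spark. It assumes the metastore hands back lower-cased column names while the Parquet files keep the original case.

```scala
import java.util.Locale

// Hypothetical helper for illustration only; this is not Spark's implementation.
object SchemaReconciler {
  /**
   * For each metastore column name, return the case-preserving name found on disk,
   * matching case-insensitively. Columns absent from the files (for example,
   * partition columns that live in directory names) keep their metastore spelling.
   */
  def reconcile(metastoreCols: Seq[String], onDiskCols: Seq[String]): Seq[String] = {
    // Index on-disk names by their lower-cased form so ambiguity can be detected.
    val byLower: Map[String, Seq[String]] =
      onDiskCols.groupBy(_.toLowerCase(Locale.ROOT))

    metastoreCols.map { col =>
      byLower.get(col.toLowerCase(Locale.ROOT)) match {
        case Some(Seq(single)) => single // exactly one on-disk match
        case Some(many) =>
          // Two on-disk columns that differ only by case cannot be resolved safely.
          throw new IllegalArgumentException(
            s"Ambiguous match for '$col': ${many.mkString(", ")}")
        case None => col // not present in the data files
      }
    }
  }
}
```

      For example, `SchemaReconciler.reconcile(Seq("normalcol", "partcol1"), Seq("normalCol"))` yields `Seq("normalCol", "partcol1")`. The ambiguous branch hints at why doing this at planning time gets messy: a case-sensitive file format can legitimately hold two columns that differ only by case.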

      Reproduction:

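        import java.io.File

        // Assumes a Spark SQL test suite that provides `spark`, `withTable`, and `withTempDir`.

        // Writes Parquet files whose schema contains the mixed-case column "normalCol",
        // then registers them as an external Hive table partitioned by partCol1 and partCol2.
        private def setupPartitionedTable(tableName: String, dir: File): Unit = {
          spark.range(5).selectExpr("id as normalCol", "id as partCol1", "id as partCol2").write
            .partitionBy("partCol1", "partCol2")
            .mode("overwrite")
            .parquet(dir.getAbsolutePath)

          spark.sql(s"""
            |create external table $tableName (normalCol long)
            |partitioned by (partCol1 int, partCol2 int)
            |stored as parquet
            |location "${dir.getAbsolutePath}"""".stripMargin)
          // Register the partition directories written above with the metastore.
          spark.sql(s"msck repair table $tableName")
        }

        test("filter by mixed case col") {
          withTable("test") {
            withTempDir { dir =>
              setupPartitionedTable("test", dir)
              // Exactly one of the five rows (ids 0 to 4) has normalCol = 3.
              val df = spark.sql("select * from test where normalCol = 3")
              assert(df.count() == 1)
            }
          }
        }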

      cc cloud_fan

          People

            Assignee: cloud_fan Wenchen Fan
            Reporter: ekhliang Eric Liang
            Votes: 0
            Watchers: 5