SPARK-14459: SQL partitioning must match existing tables, but is not checked.

    Description

      Writing into partitioned Hive tables has unexpected results because the table's partitioning is not detected and applied during the analysis phase.

      For example, if I have two tables, source and partitioned, with the same column types:

      -- SQL: an unpartitioned source table and a partitioned target table
      CREATE TABLE source (id bigint, data string, part string);
      CREATE TABLE partitioned (id bigint, data string) PARTITIONED BY (part string);
      
      // Scala: copy all rows from source into partitioned
      sqlContext.table("source").write.insertInto("partitioned")
      

      Copying from source to partitioned appears to succeed, but writes 0 rows into partitioned. The copy works if the partitioning is given explicitly, as in ...write.partitionBy("part").insertInto(...) (see the sketch below). This work-around isn't obvious, and it is error-prone: the partitionBy columns must match the table's partitioning, yet nothing checks that they do.
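
      For reference, the explicit work-around looks like this (a sketch assuming the Spark 1.6-era DataFrameWriter API and the example tables above):

      // Explicitly repeating the table's partitioning makes the insert work.
      // "part" must match the target table's PARTITIONED BY column, but
      // nothing verifies that at analysis time.
      sqlContext.table("source")
        .write
        .partitionBy("part")
        .insertInto("partitioned")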

      I think that when the target relation is resolved, the analyzer should check the write's partitioning against the table's partition columns, and should apply the table's partitioning when none is set.
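
      A minimal sketch of the kind of check I have in mind (a hypothetical helper, not Spark's actual resolution code; partitionColumnsOf stands in for a metastore lookup of the table's partition columns):

      // Returns the partition columns the write should use, or fails fast.
      // `partitionColumnsOf` is a hypothetical catalog lookup; Spark would
      // consult its own metastore client during analysis.
      def resolveWritePartitioning(
          table: String,
          specified: Seq[String],
          partitionColumnsOf: String => Seq[String]): Seq[String] = {
        val expected = partitionColumnsOf(table)
        if (specified.isEmpty) {
          // No partitionBy given: adopt the table's partitioning instead of
          // silently writing 0 rows.
          expected
        } else if (specified == expected) {
          specified
        } else {
          throw new IllegalArgumentException(
            s"partitionBy(${specified.mkString(", ")}) does not match " +
            s"table $table's partitioning (${expected.mkString(", ")})")
        }
      }

      With a check like this, the example above would either copy the data correctly or fail with a clear error, rather than silently producing an empty table.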

          People

            Assignee: Ryan Blue (rdblue)
            Reporter: Ryan Blue (rdblue)
            Votes: 0
            Watchers: 4
