Spark / SPARK-45855

Unable to set compression codec for Hive CTAS

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.5.0
    • Component/s: SQL
    • Labels: None
    • Environment: Spark 3.4.0
      Stackable.tech release 23.7.0, which runs Spark on K8s.

    Description

      Hi,

      We've discovered that code which worked in Spark 3.3.0 no longer works in 3.4.0. I can't find anything in the release notes that explains why, so I wonder if this is a bug. Thank you for looking.

      Here we're using our own custom codec, but we noticed we can't set gzip either.

        SparkConf conf = spark.sparkContext().conf();
        conf.set("hive.exec.compress.output", "true");
        conf.set("mapred.output.compression.codec", D2Codec.class.getName()); 
        spark.sql("CREATE TABLE b AS SELECT id FROM a");

      This will create the table, but it writes uncompressed files, where Spark 3.3.0 would write compressed files. 
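
      One way to confirm whether the files behind the table are actually compressed is to look at their first two bytes. The sketch below is only an illustration and assumes the gzip case mentioned above (gzip streams start with the magic bytes 0x1f 0x8b) and that a part file has been copied to a local path; `GzipCheck` and `isGzip` are hypothetical names, and the check does not apply to a custom codec such as D2Codec.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Spark or Hive.
final class GzipCheck {

    // A gzip stream begins with the two magic bytes 0x1f 0x8b (RFC 1952).
    public static boolean isGzip(byte[] head) {
        return head != null
                && head.length >= 2
                && (head[0] & 0xff) == 0x1f
                && (head[1] & 0xff) == 0x8b;
    }

    // Reads the first two bytes of a local file and checks them.
    public static boolean isGzip(Path file) throws IOException {
        byte[] head = new byte[2];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < 2) {
                int r = in.read(head, n, 2 - n);
                if (r < 0) {
                    break; // file shorter than two bytes
                }
                n += r;
            }
        }
        return n == 2 && isGzip(head);
    }

    // Usage: java GzipCheck <local part file> [<local part file> ...]
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + " gzip=" + isGzip(Paths.get(arg)));
        }
    }
}
```

      Running this against a part file written by 3.3.0 and one written by 3.4.0 should make the difference visible without relying on file extensions.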

      Any advice is appreciated, and I can help run tests. We run Spark on K8s using the stackable.tech distribution.

            People

              Assignee: Unassigned
              Reporter: Tim Robertson (timrobertson100)