Spark / SPARK-39348

Create table in overwrite mode fails when interrupted


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.1
    • Fix Version/s: None
    • Component/s: Input/Output
    • Labels: None

    Description

      When you cancel a running Apache Spark write operation and then rerun it, the following error occurs:

      Error: org.apache.spark.sql.AnalysisException: Cannot create the managed table('`testdb`.`testtable`').
      The associated location ('dbfs:/user/hive/warehouse/testdb.db/metastore_cache_testtable') already exists.;

      This problem can occur if:

      • The cluster is terminated while a write operation is in progress.
      • A temporary network issue occurs.
      • The job is interrupted.

      You can reproduce the problem by following these steps (a consolidated snippet follows the list):

      1. Create a DataFrame:

      val df = spark.range(1000)

      2. Write the DataFrame to a location in overwrite mode:

      df.write.mode(SaveMode.Overwrite).saveAsTable("testdb.testtable")

      3. Cancel the command while it is executing.

      4. Re-run the write command.
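
      For reference, the steps above can be combined into a single snippet. This is only a minimal sketch of the reproduction: it assumes a spark-shell session (so a SparkSession named `spark` is already in scope) and, unlike the original steps, it creates the `testdb` database explicitly so the snippet is self-contained.

      import org.apache.spark.sql.SaveMode

      // Assumed here so the snippet is self-contained; not part of the original steps.
      spark.sql("CREATE DATABASE IF NOT EXISTS testdb")

      // Step 1: create a DataFrame with 1000 rows.
      val df = spark.range(1000)

      // Step 2: write it as a managed table in overwrite mode.
      // Cancel this command while it is running (step 3), then run it again (step 4);
      // the rerun fails with the AnalysisException shown above because the table
      // location left behind by the interrupted write still exists.
      df.write.mode(SaveMode.Overwrite).saveAsTable("testdb.testtable")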

          People

            Assignee: Unassigned
            Reporter: Uvarov Max
            Votes: 0
            Watchers: 2
