SPARK-23007

Add schema evolution test suite for file-based data sources

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.1
    • Fix Version/s: 2.4.0
    • Component/s: SQL, Tests
    • Labels: None

    Description

      A schema can evolve in several ways, and the following kinds of evolution are already supported by file-based data sources.

      1. Add a column
      2. Remove a column
      3. Change a column position
      4. Change a column type
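
The four kinds of evolution listed above can be sketched against a single data set by reading the same files under different user-specified schemas. This is a minimal illustration, not the test suite added by this issue: the object name, helper names, column names, and the choice of JSON as the format are all assumptions made for the example.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, LongType, StringType, StructType}

// Hypothetical helper object; names are illustrative only.
object SchemaEvolutionSketch {

  // Writes two (id: int, name: string) rows as JSON and returns the output path.
  def writeSample(spark: SparkSession): String = {
    import spark.implicits._
    // Spark creates the output directory itself, so use a fresh child path.
    val path = Files.createTempDirectory("evolution").resolve("data").toString
    Seq((1, "a"), (2, "b")).toDF("id", "name").write.json(path)
    path
  }

  // Reads the files back under a caller-supplied (evolved) schema.
  def readWith(spark: SparkSession, path: String, schema: StructType): DataFrame =
    spark.read.schema(schema).json(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("schema-evolution-sketch")
      .getOrCreate()
    val path = writeSample(spark)

    // 1. Add a column: the old files have no "extra" field, so it reads as null.
    val added = new StructType()
      .add("id", IntegerType).add("name", StringType).add("extra", StringType)
    // 2. Remove a column: "name" is simply not read.
    val removed = new StructType().add("id", IntegerType)
    // 3. Change a column position: fields are matched by name, not by order.
    val reordered = new StructType().add("name", StringType).add("id", IntegerType)
    // 4. Change a column type: widen "id" from int to long.
    val widened = new StructType().add("id", LongType).add("name", StringType)

    Seq(added, removed, reordered, widened).foreach { schema =>
      readWith(spark, path, schema).show()
    }
    spark.stop()
  }
}
```

JSON is used here only because the coverage table in this issue lists all four kinds for it; other formats may reject some of these reads.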

      This issue aims to guarantee backward-compatible schema evolution coverage for users of file-based data sources, and to prevent future regressions by adding explicit schema evolution test suites.

      Here, we consider only safe evolution, that is, evolution without data loss. For example, a data type should evolve from a smaller type to a larger one, such as `int` to `long`, not vice versa.
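
The "smaller to larger" rule can be expressed as a small check over widening chains. The sketch below is an assumption-laden illustration: the issue only names `int` to `long`, so the chains, the type names as plain strings, and the `SafeWidening` object are hypothetical and do not describe which conversions any particular data source actually accepts.

```scala
// Hypothetical model of "safe" type evolution; not part of Spark.
object SafeWidening {

  // Assumed widening chains, ordered from smallest to largest type.
  // Only int -> long is stated in the issue; the rest is standard numeric widening.
  private val chains: Seq[Seq[String]] = Seq(
    Seq("byte", "short", "int", "long"),
    Seq("float", "double")
  )

  // A change is treated as safe when both types sit on the same chain
  // and the target is not smaller than the source.
  def isSafe(from: String, to: String): Boolean =
    chains.exists { chain =>
      val i = chain.indexOf(from)
      val j = chain.indexOf(to)
      i >= 0 && j >= 0 && i <= j
    }
}
```

With these chains, `SafeWidening.isSafe("int", "long")` holds while `SafeWidening.isSafe("long", "int")` does not. The model is deliberately conservative: cross-chain changes such as `int` to `double` are rejected even though they may be lossless.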

      As of today, in the master branch, file-based data sources have the following schema evolution coverage. The numbers refer to the evolution types listed above.

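      | File Format | Coverage   | Note                                                  |
      |-------------|------------|-------------------------------------------------------|
      | TEXT        | N/A        | Schema consists of a single string column.            |
      | CSV         | 1, 2, 4    |                                                       |
      | JSON        | 1, 2, 3, 4 |                                                       |
      | ORC         | 1, 2, 3, 4 | Native vectorized ORC reader has the widest coverage. |
      | PARQUET     | 1, 2, 3    |                                                       |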
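
One way to turn the coverage table into a regression guard is to encode it as data and enumerate one test case per supported (format, evolution) pair. The following is only a sketch of that idea; the `CoverageMatrix` object and its layout are hypothetical and are not how the actual test suites are organized.

```scala
// Hypothetical encoding of the coverage table from this issue.
object CoverageMatrix {

  // Evolution kinds, numbered as in the description.
  val evolutions: Map[Int, String] = Map(
    1 -> "add a column",
    2 -> "remove a column",
    3 -> "change a column position",
    4 -> "change a column type"
  )

  // Supported evolution kinds per file format; TEXT is N/A, so it has none.
  val coverage: Seq[(String, Set[Int])] = Seq(
    "TEXT"    -> Set.empty[Int],
    "CSV"     -> Set(1, 2, 4),
    "JSON"    -> Set(1, 2, 3, 4),
    "ORC"     -> Set(1, 2, 3, 4),
    "PARQUET" -> Set(1, 2, 3)
  )

  // One descriptive test-case name per supported (format, evolution) pair.
  def testCases: Seq[String] =
    for {
      (format, kinds) <- coverage
      kind <- kinds.toSeq.sorted
    } yield s"$format: ${evolutions(kind)}"
}
```

Keeping the matrix as data means that a future change in coverage shows up as a one-line edit, and a regression shows up as a failing named case.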

People

    • Assignee: Dongjoon Hyun
    • Reporter: Dongjoon Hyun