Project: Spark
Parent: SPARK-22386 Data Source V2 improvements
Issue: SPARK-23204

DataSourceV2 should support named tables in DataFrameReader and DataFrameWriter


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      DataSourceV2 is currently configured only with a path, which is passed in the options as "path". For many data sources, such as JDBC, a table name is more appropriate. I propose testing the "location" passed to load(String) and save(String) to determine whether it is a path; if it is not, parse it as a table name and pass "database" and "table" options to readers and writers.
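A minimal Python sketch of the proposed location handling, written independently of Spark. The proposal does not say how to tell a path from a table name, so the rule used here (a location that contains a slash or backslash, or starts with a URI scheme such as "s3a:", counts as a path) is an assumption, as are the helper names looks_like_path, parse_table_name, and location_to_options. Quoted identifiers are not handled.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed heuristic: a leading URI scheme such as "hdfs:" or "s3a:" marks a path.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def looks_like_path(location: str) -> bool:
    """Hypothetical path test: path separators or a URI scheme mean 'path'."""
    return "/" in location or "\\" in location or bool(_SCHEME.match(location))


def parse_table_name(name: str) -> Tuple[Optional[str], str]:
    """Split "db.table" or "table" into (database, table); reject anything else."""
    parts = name.split(".")
    if len(parts) == 1 and parts[0]:
        return None, parts[0]
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]
    raise ValueError(f"not a valid table name: {name!r}")


def location_to_options(location: str, options: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the options with either "path" or "database"/"table" set."""
    result = dict(options)
    if looks_like_path(location):
        result["path"] = location
    else:
        database, table = parse_table_name(location)
        if database is not None:
            result["database"] = database
        result["table"] = table
    return result
```

For example, location_to_options("db.events", {}) yields the options "database" = "db" and "table" = "events", while "/data/events" is passed through as "path". The heuristic leaves an ambiguity: a relative file name such as "events.json" would be parsed as database "events" and table "json", so a real implementation would need a firmer rule for separating paths from table names.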

      This also provides a way to pass the table identifier when DataSourceV2 tables are used from SQL. For example, SELECT * FROM db.table creates an UnresolvedRelation(db, table) that could be resolved using the default source, passing the database and table names through the same options. Similarly, we can add a table property to metastore tables that records the data source implementation, and add a rule that converts those tables to DataSourceV2 relations.
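The SQL and metastore cases can be modeled the same way. In this hypothetical Python sketch, TableRef stands in for the (db, table) pair carried by an unresolved relation, and resolve() takes the source from an assumed "provider" table property when the metastore supplies one, otherwise from the default source, then passes the "database" and "table" options described above. The class names, the property key, and the source names in the usage are illustrative only and are not Spark APIs.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class TableRef:
    # Stand-in for the (db, table) pair of an unresolved relation.
    database: Optional[str]
    table: str


@dataclass(frozen=True)
class V2RelationSpec:
    # Hypothetical description of the DataSourceV2 relation to build.
    source: str
    options: Dict[str, str]


# Assumed metastore table property naming the DataSourceV2 implementation.
PROVIDER_PROPERTY = "provider"


def resolve(ref: TableRef, default_source: str,
            metastore_props: Optional[Mapping[str, str]] = None) -> V2RelationSpec:
    """Pick a source (metastore property first, else the default) and build options."""
    source = (metastore_props or {}).get(PROVIDER_PROPERTY, default_source)
    options = {"table": ref.table}
    if ref.database is not None:
        options["database"] = ref.database
    return V2RelationSpec(source, options)
```

For example, resolve(TableRef("db", "events"), "com.example.DefaultSource") selects the default source and passes "database" = "db" and "table" = "events"; a metastore table whose properties name another implementation is routed to that source with the same options.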


People

    • Assignee: Unassigned
    • Reporter: Ryan Blue (rdblue)
    • Votes: 0
    • Watchers: 3
