  1. Spark
  2. SPARK-37749

Built-in ORC reader cannot read data file in sub-directories created by Hive Tez


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.3, 3.1.2, 3.2.0
    • Fix Version/s: None
    • Component/s: Input/Output, SQL
    • Labels: None
    • Environment: HDP 3.1.4

    Description

      A partitioned Hive table is created and loaded with data on HDP 3.1.4. The Hive engine is Tez, and the storage format is ORC. The data directory looks like this:

      table1/statt_dt=2021-12-08/-ext-10000/000000_0

       

      The result of the Spark SQL query "select * from table1" does not include the data in partition 2021-12-08.
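      The layout above places the data file one level below the partition directory. The following is a minimal sketch, not Spark code, that recreates that layout in a temporary directory and compares a listing of the partition directory's direct children with a recursive walk. It assumes the missing rows come from a non-recursive file listing, which the report suggests but does not confirm; the helper names `build_layout`, `direct_files`, and `recursive_files` are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def build_layout(base):
    """Recreate the reported layout under `base` with an empty placeholder file."""
    partition = os.path.join(base, "table1", "statt_dt=2021-12-08")
    os.makedirs(os.path.join(partition, "-ext-10000"))
    # The real file would contain ORC data; an empty file is enough for listing.
    with open(os.path.join(partition, "-ext-10000", "000000_0"), "wb"):
        pass
    return partition


def direct_files(directory):
    """Names of files that sit directly in `directory` (no descent)."""
    return sorted(e.name for e in os.scandir(directory) if e.is_file())


def recursive_files(directory):
    """Paths, relative to `directory`, of files anywhere below it."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), directory))
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        partition = build_layout(base)
        # Assumption: a reader that only looks here sees no data files.
        print("direct children:", direct_files(partition))
        # A recursive walk does find the file under -ext-10000.
        print("recursive walk: ", recursive_files(partition))
```

      Under that assumption, a reader that only inspects the direct children of statt_dt=2021-12-08 finds no data files, which would match the empty partition seen in the query result.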

       

          People

            Assignee: Unassigned
            Reporter: Ye Li (IAmAdele)