Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version: 1.4.0
    • Fix Version: 1.5.0
    • Component: SQL
    • Labels: None

    Description

      For large Parquet tables (e.g., those with thousands of partitions), discovering Parquet metadata for schema merging and generating splits for Spark jobs can be very slow. We need to accelerate this process. One possible solution is to perform the discovery via a distributed Spark job.
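The core of the proposed approach is a map-reduce pattern: each task reads one Parquet footer, and the per-file schemas are merged into a single global schema. The sketch below models that merge step locally in Python; the schema representation (a column-name-to-type dict), the `merge_schemas` helper, and the sample footers are all hypothetical simplifications, not Spark's actual Parquet schema-merging code.

```python
from functools import reduce

def merge_schemas(a, b):
    """Union two simplified Parquet-style schemas (column name -> type),
    failing on conflicting types for the same column."""
    merged = dict(a)
    for col, typ in b.items():
        if col in merged and merged[col] != typ:
            raise ValueError(
                f"Conflicting types for column {col!r}: {merged[col]} vs {typ}")
        merged[col] = typ
    return merged

# Stand-ins for the per-partition schemas each Spark task would read
# from one Parquet footer.
footer_schemas = [
    {"id": "int64", "name": "string"},
    {"id": "int64", "score": "double"},
]

# In the distributed version this reduce would run as a Spark job over the
# file listing (roughly: parallelize(paths).map(read_footer).reduce(merge));
# here we reduce locally for illustration.
global_schema = reduce(merge_schemas, footer_schemas)
# global_schema == {"id": "int64", "name": "string", "score": "double"}
```

Running the merge as a distributed reduce lets the footer reads, which dominate the cost for tables with thousands of partitions, happen in parallel across the cluster instead of serially on the driver.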

      People

        Assignee: Cheng Lian
        Reporter: Cheng Lian
