PIG-5432: OrcStorage fails to detect schema in some cases

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      OrcStorage needs to detect the schema from its input data paths. If some of those paths contain no ORC files (for example, only a _SUCCESS marker is present), schema detection can fail even though other paths contain data.

      For example:

      A = LOAD '/path/to/20230101,/path/to/20230102' USING OrcStorage();
      

      If /path/to/20230101 contains only a _SUCCESS marker and /path/to/20230102 contains data, OrcStorage fails to detect the schema and Pig exits with an unhelpful error along the lines of "Cannot find any ORC files from <locations>. Probably multiple load/store statements in script."

      The code is meant to search all input paths recursively for a data file (via Utils.depthFirstSearchForFile), but the search is implemented incorrectly: in this scenario it returns early, before it reaches the path that contains data.

      See: https://github.com/apache/pig/blob/c0d75ba930f9aa5c6454d0264a96f82b45279202/src/org/apache/pig/builtin/OrcStorage.java#L389-L408

      https://github.com/apache/pig/blob/59ec4a326079c9f937a052194405415b1e3a2b06/src/org/apache/pig/impl/util/Utils.java#L629-L667
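The kind of early return described above can be illustrated with a minimal sketch. This is not the Pig source: it uses java.nio.file instead of Hadoop's FileSystem API, and the names DfsSketch, isVisible, searchReturningEarly and searchAllPaths are hypothetical, chosen only for illustration. It assumes the flaw is that the result of the first directory visited is returned unconditionally, even when that result is null.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch only; not the actual Pig or Hadoop implementation.
class DfsSketch {

    // Hypothetical filter: treat names starting with '_' or '.' (such as _SUCCESS) as non-data files.
    static boolean isVisible(Path p) {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    // List a directory's entries in a deterministic (name-sorted) order.
    static List<Path> sortedChildren(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.sorted().collect(Collectors.toList());
        }
    }

    // Flawed variant: the first directory's result is returned as-is,
    // so a null from an empty directory ends the whole search.
    static Path searchReturningEarly(List<Path> candidates) throws IOException {
        for (Path p : candidates) {
            if (Files.isDirectory(p)) {
                return searchReturningEarly(sortedChildren(p));
            } else if (isVisible(p)) {
                return p;
            }
        }
        return null;
    }

    // Corrected variant: only return from a directory when a file was found;
    // otherwise keep going with the remaining candidates.
    static Path searchAllPaths(List<Path> candidates) throws IOException {
        for (Path p : candidates) {
            if (Files.isDirectory(p)) {
                Path found = searchAllPaths(sortedChildren(p));
                if (found != null) {
                    return found;
                }
            } else if (isVisible(p)) {
                return p;
            }
        }
        return null;
    }
}
```

Given two input directories where the first holds only a _SUCCESS marker and the second holds a data file, searchReturningEarly yields null while searchAllPaths yields the data file, which mirrors the failure reported here.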

      I'll attach a patch.

      Attachments

        1. PIG-5432.v01.patch
          0.7 kB
          Jacob Tolar

          People

            Assignee: Jacob Tolar (jtolar)
            Reporter: Jacob Tolar (jtolar)
            Votes: 0
            Watchers: 2
