  1. Hive
  2. HIVE-13275

Add a toString method to BytesRefArrayWritable


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Trivial
    • Resolution: Unresolved
    • 1.1.0
    • None
    • None

    Description

      RCFileInputFormat cannot be used externally with Hadoop Streaming today, because Streaming generally relies on the K/V pairs being able to emit text representations (via toString()).

      Since BytesRefArrayWritable does not override toString(), using RCFileInputFormat prints the default object representation (class name and hash code), which is not useful.

      Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an array), so it's important to output them in a valid, parseable manner, as opposed to simply joining the string representations of the inner elements with a delimiter, which becomes ambiguous as soon as a value contains that delimiter.

      I propose adding a standardised CSV formatting of the array data, such that users of Streaming can then parse the results in their own script. Since we have OpenCSV as a dependency already, we can make use of it for this purpose.
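
      A minimal sketch of the kind of output this proposal is after, not the code from the attached patch. It uses plain Java only: a List<byte[]> stands in for the columns of a BytesRefArrayWritable, and the hand-rolled quote() helper stands in for what OpenCSV would do. The class name, method names, UTF-8 decoding, and the quote-everything rule are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: each byte[] plays the role of one column of a row.
public class CsvRowSketch {

    // Wrap the field in double quotes and double any embedded quote, so that
    // commas and quotes inside a value cannot split or corrupt the row.
    public static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Render one row as a single CSV line (assumes the bytes are UTF-8 text).
    public static String toCsv(List<byte[]> columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(new String(columns.get(i), StandardCharsets.UTF_8)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<byte[]> row = new ArrayList<>();
        row.add("1".getBytes(StandardCharsets.UTF_8));
        row.add("a,b".getBytes(StandardCharsets.UTF_8));
        row.add("say \"hi\"".getBytes(StandardCharsets.UTF_8));
        // prints: "1","a,b","say ""hi"""
        System.out.println(toCsv(row));
    }
}
```

      With a plain comma join, the second column above would read as two fields; with quoting, a Streaming script can recover exactly three columns using any standard CSV parser.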

      Attachments

        1. HIVE-13275.000.patch
          2 kB
          Harsh J

        Activity


          People

            Assignee: Unassigned
            Reporter: Harsh J (qwertymaniac)
