
ORC-143: DELTA encoding may exaggerate the number of bits required


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.0
    • Component/s: Java
    • Labels: None

    Description

      Consider the following code:

      RunLengthIntegerWriterV2.java, determineEncoding()
          this.min = literals[0];
          long max = literals[0];
          final long initialDelta = literals[1] - literals[0];
          long currDelta = initialDelta;
          long deltaMax = initialDelta;
          this.adjDeltas[0] = initialDelta;
      

      Given the following sequence of longs:

      {0, 10000, 10001, 10002, 10003, 10004, 10005}

      deltaMax would be 10000. deltaMax determines the bit width used for the bit-packed delta array, but the packed output never includes the first delta; that one is encoded separately, as a varint, in the Delta Base field. The result is that the six remaining deltas, each equal to 1 and representable in a single bit, are packed at the 14-bit width implied by 10000.

      I believe deltaMax should be initialized to 0 instead, so that the later (i > 1) check correctly keeps the first delta out of the maximum.
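
      For illustration, here is a minimal, self-contained sketch of that deltaMax bookkeeping under both initializations. It is an assumption-laden demo, not the real writer: bitsRequired is a simplified stand-in for the bit-width helper the writer actually calls (which also rounds up to the nearest width the encoding supports), and the loop reproduces only the deltaMax scan from determineEncoding().

          public class DeltaMaxDemo {
              // Simplified stand-in for the writer's bit-width helper; the real
              // helper also rounds up to the nearest encodable width.
              static int bitsRequired(long v) {
                  return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
              }

              // Mirrors only the deltaMax scan. With skipFirstDelta=true (the
              // proposed fix), the first delta, which travels separately as the
              // Delta Base varint, never influences the packed bit width.
              static long deltaMax(long[] literals, boolean skipFirstDelta) {
                  final long initialDelta = literals[1] - literals[0];
                  long deltaMax = skipFirstDelta ? 0 : initialDelta;
                  for (int i = 2; i < literals.length; i++) {
                      deltaMax = Math.max(deltaMax, Math.abs(literals[i] - literals[i - 1]));
                  }
                  return deltaMax;
              }

              public static void main(String[] args) {
                  long[] seq = {0, 10000, 10001, 10002, 10003, 10004, 10005};
                  // Current behavior: deltaMax = 10000 -> 14 bits per packed delta.
                  System.out.println(deltaMax(seq, false) + " -> "
                      + bitsRequired(deltaMax(seq, false)) + " bits");
                  // Proposed fix: deltaMax = 1 -> 1 bit per packed delta.
                  System.out.println(deltaMax(seq, true) + " -> "
                      + bitsRequired(deltaMax(seq, true)) + " bits");
              }
          }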

      Sorry for not including a pull request with a regression test case; I'm not set up for Java development here. It may also be that I'm reading this wrong.


          People

            Assignee: Douglas Drinka (ddrinka)
            Reporter: Douglas Drinka (ddrinka)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved:
