Apache NiFi / NIFI-12923

PutHDFS to support appending avro data


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.26.0, 2.0.0-M3
    • Component/s: None
    • Labels: None

    Description

      The goal of this ticket is to extend the PutHDFS processor so that it can append Avro records to an existing Avro file. The processor already offers 'append' as a conflict resolution strategy, but that strategy does not work correctly for Avro files: the binary content of the resulting file is invalid, so the file can no longer be deserialized.
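
To illustrate why a byte-level append yields invalid content, here is a toy sketch in Python. It assumes the layout described in the Avro specification, where an object container file starts with the magic bytes `Obj\x01` and a header that ends in a 16-byte sync marker. The `toy_container` helper, the placeholder metadata, and the fixed sync markers are hypothetical simplifications; this is not real Avro encoding and not how PutHDFS is implemented.

```python
MAGIC = b"Obj\x01"  # four bytes that open every Avro object container file


def toy_container(payload: bytes, sync: bytes) -> bytes:
    """Build a simplified stand-in for an Avro container file.

    Layout mimicked: magic, header placeholder, 16-byte sync marker,
    one data block, then the same sync marker again. Real Avro encodes
    the header metadata and block lengths as Avro values; that is omitted.
    """
    assert len(sync) == 16
    header = MAGIC + b"<schema and codec metadata>" + sync
    return header + payload + sync


# Real files use random sync markers; fixed ones keep the sketch deterministic.
existing = toy_container(b"record-1", sync=b"A" * 16)
incoming = toy_container(b"record-2", sync=b"B" * 16)

# What a plain byte-level append produces: two complete files back to back.
appended = existing + incoming

# A reader expects one header at offset 0 and data blocks after it. Here a
# second header sits exactly where the next data block should begin, and its
# sync marker differs from the one the reader took from the first header.
second_header_offset = appended.find(MAGIC, len(MAGIC))
print(second_header_offset == len(existing))  # True
print(appended.count(MAGIC))  # 2
```

An Avro-aware append would instead write only new data blocks, reusing the header and sync marker of the existing file.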

      Some notes about the implementation:

      • The user must explicitly select Avro as the file format and 'append' as the conflict resolution mode to enable 'avro append' mode; otherwise regular append mode works just as before. The MIME type of the incoming flowfile is not auto-detected.
      • The records in the incoming flowfile and those in the existing Avro file must conform to the same Avro schema; otherwise the append operation fails with an incompatible schema error.
      • The 'avro append' mode should only work when the compression type is set to 'none'. If any other compression type is selected in 'avro append' mode, the user should get a validation error.
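
The rules in the notes above can be sketched as follows. This is a minimal Python illustration, not the processor's actual code: the names `WriteMode`, `resolve_write_mode`, `validate_config`, and `check_same_schema`, as well as the string values for the settings, are hypothetical. Comparing schemas as parsed JSON is also an assumption made for brevity; the ticket does not say how schema conformance is checked.

```python
import json
from enum import Enum
from typing import List


class WriteMode(Enum):
    REGULAR = "regular"
    AVRO_APPEND = "avro append"


def resolve_write_mode(file_format: str, conflict_resolution: str) -> WriteMode:
    """Avro append is opt-in: both settings must be chosen explicitly.

    The flowfile's MIME type is deliberately not consulted.
    """
    if file_format == "avro" and conflict_resolution == "append":
        return WriteMode.AVRO_APPEND
    return WriteMode.REGULAR


def validate_config(file_format: str, conflict_resolution: str,
                    compression: str) -> List[str]:
    """Return validation errors; an empty list means the config is valid."""
    errors: List[str] = []
    mode = resolve_write_mode(file_format, conflict_resolution)
    if mode is WriteMode.AVRO_APPEND and compression != "none":
        errors.append(
            "'avro append' mode requires compression type 'none', got '%s'"
            % compression
        )
    return errors


def check_same_schema(existing_schema_json: str, incoming_schema_json: str) -> None:
    """Fail the append when the two schemas differ (simplified comparison)."""
    if json.loads(existing_schema_json) != json.loads(incoming_schema_json):
        raise ValueError("incompatible schema")


# Usage: compression is only rejected when 'avro append' mode is active.
print(validate_config("avro", "append", "none"))
print(validate_config("avro", "append", "gzip"))
print(validate_config("default", "append", "gzip"))
```

Keeping mode resolution separate from validation mirrors the note that regular append must behave exactly as before whenever Avro is not explicitly selected.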

      The changes also need to be applied to the support/nifi-1.x branch.

          People

            Assignee: Balázs Gerner (balazsgerner)
            Reporter: Balázs Gerner (balazsgerner)
            Votes: 0
            Watchers: 3

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h 20m
