Apache NiFi / NIFI-13027

Warn users for small files processing in PutIceberg

Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Extensions

    Description

      While it can be a valid use case, sending many small flow files through the PutIceberg processor is a very bad idea, as it generates a massive number of snapshot files. The clear recommendation is to place a MergeContent or MergeRecord processor before PutIceberg to limit the number of individual files sent to an Iceberg table. We can't force users to do this (although a flow analysis rule could), but we should tell them very clearly that what they're doing is likely a bad idea and point them to the recommended approach. Users who are sure they know what they're doing should still be able to disable the warning.

      This Jira is about adding:

      • a property "Warn for small flow files", set to true by default
      • a property "Minimum recommended file size", set to 10 MB by default and applicable only when the previous property is set to true

      If the warning is enabled and a processed flow file is smaller than the minimum recommended size, the processor logs a warning recommending a Merge processor, so that a bulletin is generated and shown to the user.
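
      The check described above could be sketched as follows. This is a minimal, standalone illustration and not the attached patch: the class `SmallFlowFileCheck`, its method names, and the wording of the message are hypothetical, and 10 MB is interpreted here as 10 * 1024 * 1024 bytes, which the ticket does not specify.

```java
import java.util.Optional;

/**
 * Hypothetical helper (not part of NiFi): decides whether a flow file is small
 * enough to deserve a "merge before PutIceberg" warning.
 */
final class SmallFlowFileCheck {

    // Default from the ticket (10 MB); the exact byte value is an assumption.
    static final long DEFAULT_MIN_RECOMMENDED_BYTES = 10L * 1024 * 1024;

    private final boolean warnForSmallFlowFiles;
    private final long minRecommendedBytes;

    SmallFlowFileCheck(boolean warnForSmallFlowFiles, long minRecommendedBytes) {
        if (minRecommendedBytes < 0) {
            throw new IllegalArgumentException("Minimum recommended size must not be negative");
        }
        this.warnForSmallFlowFiles = warnForSmallFlowFiles;
        this.minRecommendedBytes = minRecommendedBytes;
    }

    /**
     * Returns the warning text when the warning is enabled and the flow file
     * is strictly below the minimum recommended size; otherwise returns empty.
     */
    Optional<String> warningFor(long flowFileSizeBytes) {
        if (!warnForSmallFlowFiles || flowFileSizeBytes >= minRecommendedBytes) {
            return Optional.empty();
        }
        return Optional.of(String.format(
                "Flow file size (%d bytes) is below the minimum recommended size (%d bytes). "
                        + "Consider using a MergeContent or MergeRecord processor before PutIceberg.",
                flowFileSizeBytes, minRecommendedBytes));
    }
}
```

      In the processor itself, a returned message would presumably be passed to the component logger at WARN level, which is what causes the bulletin to appear. Setting the first property to false makes `warningFor` always return empty, which matches the opt-out requested above.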

            People

              Assignee: Pierre Villard (pvillard)
              Reporter: Pierre Villard (pvillard)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h