Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
The Wait/Notify processors are used quite heavily. They are very powerful and support many different use cases, but that power comes at the expense of making them difficult to configure.
The most common use case, it seems, is to simply allow a Process Group to process only a single FlowFile at a time. We see questions about how to accomplish this fairly frequently in Slack and on the mailing list.
I propose that we add a new feature to NiFi so that when a user configures a Process Group, they can configure the FlowFile Concurrency: either unbounded (which is the current behavior) or a single FlowFile at a time on each node. In the latter case, only a single FlowFile will be ingested by a Local Input Port, and no more FlowFiles will be ingested as long as there is data queued in the Process Group. Once all data has left the Process Group, the next FlowFile will be allowed through.
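The admission rule proposed above can be sketched as a small standalone model. This is only an illustration and not NiFi framework code: the class `ProcessGroupModel`, its methods, and the enum constant names are assumptions made for the example, and FlowFiles are represented as plain strings.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical names: a toy model of the proposal, not the NiFi API.
enum FlowFileConcurrency { UNBOUNDED, SINGLE_FLOWFILE_PER_NODE }

final class ProcessGroupModel {
    private final FlowFileConcurrency concurrency;
    // FlowFiles waiting at the Local Input Port, still outside the group.
    private final Queue<String> inputPortQueue = new ArrayDeque<>();
    // Number of FlowFiles currently queued anywhere inside the group.
    private int queuedInsideGroup = 0;

    ProcessGroupModel(FlowFileConcurrency concurrency) {
        this.concurrency = concurrency;
    }

    void enqueueAtInputPort(String flowFile) {
        inputPortQueue.add(flowFile);
    }

    /** Runs the input port once and returns how many FlowFiles were admitted. */
    int triggerInputPort() {
        int admitted = 0;
        while (!inputPortQueue.isEmpty()) {
            boolean groupBusy = queuedInsideGroup > 0;
            if (concurrency == FlowFileConcurrency.SINGLE_FLOWFILE_PER_NODE && groupBusy) {
                break; // data is still queued in the group, so nothing more is ingested
            }
            inputPortQueue.remove();
            queuedInsideGroup++;
            admitted++;
        }
        return admitted;
    }

    /** Called when a FlowFile has been transferred out of (or dropped within) the group. */
    void flowFileLeftGroup() {
        if (queuedInsideGroup == 0) {
            throw new IllegalStateException("no FlowFile is inside the group");
        }
        queuedInsideGroup--;
    }
}

final class ConcurrencyGateDemo {
    public static void main(String[] args) {
        ProcessGroupModel group =
                new ProcessGroupModel(FlowFileConcurrency.SINGLE_FLOWFILE_PER_NODE);
        group.enqueueAtInputPort("flowfile-1");
        group.enqueueAtInputPort("flowfile-2");
        System.out.println("admitted on first trigger: " + group.triggerInputPort());
        System.out.println("admitted while busy: " + group.triggerInputPort());
        group.flowFileLeftGroup();
        System.out.println("admitted after group drained: " + group.triggerInputPort());
    }
}
```

In this model a single counter stands in for "data queued in the Process Group". A real implementation would also have to account for FlowFiles that are split or cloned inside the group, which the sketch leaves out.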
This has several advantages over the Wait/Notify pair of Processors. Firstly, there's no need to create two Processors and ensure that they are used together properly. Secondly, there aren't a lot of properties to configure. Thirdly, implementing this at the framework level with limited features means the implementation can be much simpler than that of Wait/Notify, and therefore much easier to maintain.
Additionally, a related concept can be easily introduced: the notion of a FlowFile Outbound Policy. This is analogous to the FlowFile Concurrency but applies to Output Ports. Here, the user could configure the group so that data is transferred out of the Process Group as soon as it's available (which is the current behavior) or is transferred as a batch. In batch mode, the Output Ports would not transfer any data out of the Process Group until all FlowFiles are queued up at an Output Port (i.e., all processing has finished).
This allows for very simple configuration for an oft-requested capability: the ability to perform some action only after processing of a batch of data has completed.
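The batch outbound rule can be sketched in the same spirit. Again this is a toy model under stated assumptions rather than the framework implementation: `OutputGateModel`, its methods, and the enum constant names are hypothetical, and FlowFiles are plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names: a toy model of the proposal, not the NiFi API.
enum OutboundPolicy { STREAM_WHEN_AVAILABLE, BATCH_OUTPUT }

final class OutputGateModel {
    private final OutboundPolicy policy;
    // FlowFiles inside the group that have not yet reached an Output Port.
    private int stillProcessing = 0;
    // FlowFiles queued up at the group's Output Ports.
    private final List<String> atOutputPorts = new ArrayList<>();

    OutputGateModel(OutboundPolicy policy) {
        this.policy = policy;
    }

    /** A FlowFile entered the group and is now being processed. */
    void admitted() {
        stillProcessing++;
    }

    /** A FlowFile finished processing and is queued at an Output Port. */
    void reachedOutputPort(String flowFile) {
        if (stillProcessing == 0) {
            throw new IllegalStateException("no FlowFile is being processed");
        }
        stillProcessing--;
        atOutputPorts.add(flowFile);
    }

    /** Runs the Output Ports once and returns the FlowFiles transferred out of the group. */
    List<String> triggerOutputPorts() {
        if (policy == OutboundPolicy.BATCH_OUTPUT && stillProcessing > 0) {
            return new ArrayList<>(); // hold everything until all processing has finished
        }
        List<String> released = new ArrayList<>(atOutputPorts);
        atOutputPorts.clear();
        return released;
    }
}

final class BatchOutputDemo {
    public static void main(String[] args) {
        OutputGateModel gate = new OutputGateModel(OutboundPolicy.BATCH_OUTPUT);
        gate.admitted();
        gate.admitted();
        gate.reachedOutputPort("part-1");
        System.out.println("released while one is still processing: " + gate.triggerOutputPorts());
        gate.reachedOutputPort("part-2");
        System.out.println("released once all have finished: " + gate.triggerOutputPorts());
    }
}
```

With `STREAM_WHEN_AVAILABLE` the same model releases each FlowFile on the next trigger after it reaches an Output Port, which corresponds to the current behavior.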
Issue Links
- is related to
  - NIFI-8681 Allow users to configure FlowFile Concurrency more specifically on ProcessGroup (Open)
- relates to
  - NIFI-7507 Update user guide to document FlowFile Concurrency and Outbound Policy of Process Groups (Resolved)
  - NIFI-7633 Provide Process Groups with a FlowFile Concurrency level for transferring a batch of FlowFiles at once (Resolved)
  - NIFI-7517 Update mapping to VersionedProcessGroup so that it allows for new FlowFIleConcurrency/OutboundPolicy (Resolved)
  - NIFI-7552 Add "batch.output.XYZ" attribute when Process Group is using batch output mode (Resolved)