Apache NiFi / NIFI-6599

MergeRecord fails if 'fragment.count' attribute equals the number of records within a FlowFile, where it should wait for the remaining FlowFiles

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11.0
    • Component/s: Extensions
    • Labels: None

    Description

      RecordBinManager.createThresholds has the following code block:

      if (MergeRecord.MERGE_STRATEGY_DEFRAGMENT.getValue().equals(mergeStrategy)) {
          fragmentCountAttribute = MergeContent.FRAGMENT_COUNT_ATTRIBUTE;
          if (!StringUtils.isEmpty(flowfile.getAttribute(fragmentCountAttribute))) {
              minRecords = Integer.parseInt(flowfile.getAttribute(fragmentCountAttribute));
          }
      } else {
          fragmentCountAttribute = null;
      }
      

      The code uses the 'fragment.count' attribute as minRecords. This is wrong because 'fragment.count' is the number of fragments, i.e. the number of FlowFiles that each hold part of the record set, not the number of records.
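      One way to keep the two quantities apart is sketched below in Java, assuming a simplified bin that tracks only how many FlowFiles and how many records it holds. The class DefragmentThresholds and its methods are hypothetical: they are not NiFi's RecordBinThresholds and not the change that resolved this issue. The sketch stores 'fragment.count' as its own threshold instead of overwriting minRecords, and treats a defragment bin as complete only once it holds that many FlowFiles.

```java
import java.util.Map;

// Hypothetical sketch, not NiFi's RecordBinThresholds: keeps the expected
// number of fragments separate from the record-based minimum.
class DefragmentThresholds {
    private final int minRecords;
    private final Integer fragmentCount; // null when not defragmenting

    DefragmentThresholds(int minRecords, Integer fragmentCount) {
        this.minRecords = minRecords;
        this.fragmentCount = fragmentCount;
    }

    // Reads 'fragment.count' from the FlowFile attributes (assumed here to be a
    // plain Map) without touching minRecords.
    static DefragmentThresholds create(boolean defragment, int configuredMinRecords,
                                       Map<String, String> attributes) {
        Integer fragmentCount = null;
        if (defragment) {
            String value = attributes.get("fragment.count");
            if (value != null && !value.isEmpty()) {
                fragmentCount = Integer.parseInt(value);
            }
        }
        return new DefragmentThresholds(configuredMinRecords, fragmentCount);
    }

    // A defragment bin is complete once it holds fragment.count FlowFiles;
    // otherwise the record-based minimum applies.
    boolean isComplete(int flowFilesInBin, int recordsInBin) {
        if (fragmentCount != null) {
            return flowFilesInBin >= fragmentCount;
        }
        return recordsInBin >= minRecords;
    }

    public static void main(String[] args) {
        DefragmentThresholds t = create(true, 1, Map.of("fragment.count", "2"));
        System.out.println(t.isComplete(1, 2)); // false: only 1 of 2 fragments present
        System.out.println(t.isComplete(2, 4)); // true: both fragments present
    }
}
```

      With this split, a first fragment carrying 2 records no longer satisfies a threshold of 2, because the check counts FlowFiles rather than records.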

      Using 'fragment.count' as minRecords causes a FlowFile to be routed to the 'failure' relationship when it should instead be held in the incoming connection. For example, when a FlowFile is split into two FlowFiles with 2 records each, 'fragment.count' is 2. In this case, MergeRecord treats minRecords as 2, whereas the correct value is 4. When the first FlowFile is processed before the second one has arrived, MergeRecord wrongly concludes that the bin has reached the minimum number of records. But since the bin contains only one FlowFile, it routes that FlowFile to 'failure'.
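      The example can be traced step by step. The Java sketch below is a hypothetical simulation, not NiFi code: it assumes a bin is just a list of per-FlowFile record counts, and compares the rule implied by createThresholds (records in bin >= 'fragment.count') with a rule that waits until 'fragment.count' FlowFiles have arrived. The class Nifi6599Scenario and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the scenario in this issue; not NiFi code.
class Nifi6599Scenario {
    // Rule implied by createThresholds: 'fragment.count' ends up as minRecords.
    static boolean buggyReady(int fragmentCount, List<Integer> recordsPerFlowFile) {
        int records = recordsPerFlowFile.stream().mapToInt(Integer::intValue).sum();
        return records >= fragmentCount;
    }

    // Alternative rule: wait until every fragment (FlowFile) is in the bin.
    static boolean fixedReady(int fragmentCount, List<Integer> recordsPerFlowFile) {
        return recordsPerFlowFile.size() >= fragmentCount;
    }

    // Adds the two fragments one at a time and records both decisions.
    static List<String> run() {
        int fragmentCount = 2;   // the original FlowFile was split into two
        int[] arrivals = {2, 2}; // each fragment carries 2 records
        List<Integer> bin = new ArrayList<>();
        List<String> log = new ArrayList<>();
        for (int records : arrivals) {
            bin.add(records);
            log.add("flowFiles=" + bin.size()
                    + " buggyReady=" + buggyReady(fragmentCount, bin)
                    + " fixedReady=" + fixedReady(fragmentCount, bin));
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
        // flowFiles=1 buggyReady=true fixedReady=false
        // flowFiles=2 buggyReady=true fixedReady=true
    }
}
```

      After the first FlowFile, the record-based check already reports the bin as ready even though one fragment is still missing, which corresponds to the premature merge attempt and 'failure' routing described above.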

      The issue can be reproduced with the attached template.

      Attachments

        1. NIFI-6599.xml (44 kB, Koji Kawamura)

    People

      Assignee: Koji Kawamura (ijokarumawak)
      Reporter: Koji Kawamura (ijokarumawak)
      Votes: 0
      Watchers: 1

    Time Tracking

      Estimated: Not Specified
      Remaining: 0h
      Logged: 0.5h