Flume / FLUME-3350

Spooldir source may collect empty files and write them to HDFS

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.9.0
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels: None

    Description

      When collecting data from a spooldir source to HDFS, I found that if an empty file is created in spoolDir, an empty file with the same name appears on HDFS. This seems unreasonable. After reading the source code, I found that the following condition in the SpoolDirectorySource class will never be true.

      public void run() {
        int backoffInterval = 250;
        boolean readingEvents = false;
        try {
          while (!Thread.interrupted()) {
            readingEvents = true;
            List<Event> events = reader.readEvents(batchSize);
            readingEvents = false;

            // This condition will never be true
            if (events.isEmpty()) {
              break;
            }

            .
            .
            .
          }

      Please confirm whether this behavior is a bug. In my opinion, collecting empty files is meaningless. This matters especially for HDFS, which should not store too many small files. Even if a user unknowingly puts many empty files in the spool directory, Flume should handle them itself instead of writing them to HDFS.
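
To illustrate the behavior I am asking for, here is a minimal standalone sketch that lists only the non-empty regular files in a directory. It uses only java.nio.file and no Flume API. The class name EmptyFileFilter and the method nonEmptyFiles are hypothetical names made up for this example; this is not how SpoolDirectorySource is implemented, only one way a zero-length check could look.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flume.
final class EmptyFileFilter {

    // Returns the regular files directly under dir whose size is greater than zero bytes.
    static List<Path> nonEmptyFiles(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // Skip subdirectories and zero-length files.
                if (Files.isRegularFile(entry) && Files.size(entry) > 0) {
                    result.add(entry);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory with one empty and one non-empty file.
        Path dir = Files.createTempDirectory("spool-demo");
        Files.createFile(dir.resolve("empty.log"));
        Files.write(dir.resolve("data.log"), "hello\n".getBytes());
        // Only data.log is expected in the printed list.
        System.out.println(nonEmptyFiles(dir));
    }
}
```

A check like this could also be run by whatever process drops files into spoolDir, as a workaround, so that empty files never reach the source in the first place.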

          People

            Assignee: Unassigned
            Reporter: l00454651 Confused
            Votes: 0
            Watchers: 1