Flume / FLUME-3352

Unnecessary canary test fails on a read-only spoolDir even when a separate trackerDir is set.


Details

    • Patch, Important

    Description

      Phenomenon

      In many deployments, the Flume user has only read permission on spoolDir and write permission on trackerDir.

      However, whenever Flume starts a spooldir source, it always tries to create a '.canary' file in the spooling directory.

      On a read-only spoolDir this check fails and the source cannot start, even though the configuration does not require writing there.

      Steps to reproduce

      (In production, spoolDir is usually mounted read-only from a NAS; here we create a read-only directory instead.)

      First, as root, create the spoolDir and a file inside it. By default both are read-only for other users.

      su root
      mkdir /home/hadoop/testspooldir
      echo 'foo' > /home/hadoop/testspooldir/bar

      Then switch to the user that runs Flume (hadoop), create the trackerDir, and confirm that this user can read the spoolDir.

      su hadoop
      mkdir /home/hadoop/testtrackerdir
      ll /home/hadoop/testspooldir
      >> total 4
      >> -rw-r--r-- 1 root root 4 Jan 16 19:15 bar

      Now create example.conf:

      a1.sources = r1
      a1.sinks = k1
      a1.channels = c1
      a1.sources.r1.type = spooldir
      a1.sources.r1.deletePolicy = never
      a1.sources.r1.spoolDir = /home/hadoop/testspooldir
      a1.sources.r1.trackerDir = /home/hadoop/testtrackerdir
      a1.sources.r1.trackingPolicy = tracker_dir
      a1.sinks.k1.type = logger
      a1.channels.c1.type = memory
      a1.channels.c1.capacity = 1000
      a1.channels.c1.transactionCapacity = 100
      a1.sources.r1.channels = c1
      a1.sinks.k1.channel = c1
      

      Then start Flume with it:

      bin/flume-ng agent --conf conf --conf-file conf/example.conf --name a1 -Dflume.root.logger=INFO,console

      The source then fails to start with an IOException:

      2020-01-16 19:16:12,777 (lifecycleSupervisor-1-0) [ERROR - org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)] Unable to start EventDrivenSourceRunner: { source:Spool Directory source r1: { spoolDir: /home/hadoop/testspooldir } } - Exception follows.
      org.apache.flume.FlumeException: Unable to read and modify files in the spooling directory: /home/hadoop/testspooldir
        at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.<init>(ReliableSpoolingFileEventReader.java:195)
        at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.<init>(ReliableSpoolingFileEventReader.java:89)
        at org.apache.flume.client.avro.ReliableSpoolingFileEventReader$Builder.build(ReliableSpoolingFileEventReader.java:882)
        at org.apache.flume.source.SpoolDirectorySource.start(SpoolDirectorySource.java:111)
        at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:249)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
      Caused by: java.io.IOException: Permission denied
        at java.io.UnixFileSystem.createFileExclusively(Native Method)
        at java.io.File.createTempFile(File.java:2024)
        at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.<init>(ReliableSpoolingFileEventReader.java:185)
        ... 12 more

      Fix

      We add a condition so that the canary test runs only where it is actually necessary.

      The PR/patch is attached below.

      Alternatively, would it be better to keep the check but log a warning instead of throwing an exception?
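
      To make the proposed condition concrete, here is a minimal sketch of one way it could look. It is not the attached patch. The class and method names (CanarySketch, needsWriteAccess, canaryTest, checkSpoolDir) and the temp-file prefix are hypothetical. It also assumes the source writes to spoolDir only when trackingPolicy is rename, when deletePolicy is immediate, or when trackerDir resolves to a location inside spoolDir.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;

// Hypothetical sketch; names are illustrative and not part of Flume's API.
class CanarySketch {

  // Assumption: spoolDir is modified only when consumed files are renamed,
  // deleted, or when the tracker directory lives inside spoolDir.
  static boolean needsWriteAccess(File spoolDir, File trackerDir,
                                  String trackingPolicy, String deletePolicy) {
    boolean renames = "rename".equalsIgnoreCase(trackingPolicy);
    boolean deletes = "immediate".equalsIgnoreCase(deletePolicy);
    Path spool = spoolDir.toPath().toAbsolutePath().normalize();
    Path tracker = trackerDir.toPath().toAbsolutePath().normalize();
    boolean trackerInside = tracker.startsWith(spool);
    return renames || deletes || trackerInside;
  }

  // Creates and deletes a temporary '.canary' file to prove write access.
  static void canaryTest(File spoolDir) throws IOException {
    File canary = File.createTempFile("perm-check-", ".canary", spoolDir);
    if (!canary.delete()) {
      throw new IOException("Unable to delete canary file " + canary);
    }
  }

  // Runs the canary test only when write access is required;
  // otherwise a plain readability check is enough.
  static void checkSpoolDir(File spoolDir, File trackerDir,
                            String trackingPolicy, String deletePolicy)
      throws IOException {
    if (needsWriteAccess(spoolDir, trackerDir, trackingPolicy, deletePolicy)) {
      canaryTest(spoolDir);
    } else if (!spoolDir.canRead()) {
      throw new IOException("Unable to read spooling directory " + spoolDir);
    }
  }
}
```

      With the example.conf above (trackingPolicy = tracker_dir, deletePolicy = never, trackerDir outside spoolDir), needsWriteAccess returns false, so the canary file is never created and the read-only spoolDir is accepted.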

      Attachments

        1. FLUME-3352-0.patch
          2 kB
          taoyang

            People

              Assignee: Unassigned
              Reporter: taoyang (hackty)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m