Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 1.7.0
- Fix Version/s: None
- Component/s: None
- Labels: Hortonworks
Description
The NiFi HDFS processors (ListHDFS, PutHDFS, etc.) do not allow using a nameservice or namenode address that is present in the provided hdfs-site.xml if it is not the HDFS defaultFS:
2019-03-15 11:10:15,589 ERROR [Timer-Driven Process Thread-11] o.apache.nifi.processors.hadoop.ListHDFS ListHDFS[id=8101dc52-0169-1000-ffff-ffffe716ad6e] Failed to perform listing of HDFS due to java.lang.IllegalArgumentException: Wrong FS: hdfs://<fqdn>:8020/HDFSTest3/<custom>-NiFi-test/OneFS-in, expected: hdfs://<hostname>
java.lang.IllegalArgumentException: Wrong FS: hdfs://<fqdn>:8020/HDFSTest3/<custom>-NiFi-test/OneFS-in, expected: hdfs://<hostname>
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:780)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:226)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:974)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:118)
    at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1041)
    at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1038)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1048)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1853)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1895)
    at org.apache.nifi.processors.hadoop.ListHDFS.lambda$getStatuses$0(ListHDFS.java:398)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
    at org.apache.nifi.processors.hadoop.ListHDFS.getStatuses(ListHDFS.java:398)
    at org.apache.nifi.processors.hadoop.ListHDFS.onTrigger(ListHDFS.java:347)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
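For context, the "Wrong FS" exception comes from Hadoop's FileSystem.checkPath, which rejects any path whose scheme/authority does not match the FileSystem instance it is called on. The sketch below is a simplified, self-contained illustration of that comparison, not the actual Hadoop implementation; the URIs are made-up examples:

```java
import java.net.URI;

public class WrongFsDemo {
    // Simplified sketch of the check FileSystem.checkPath performs: a fully
    // qualified path is accepted only if its scheme and authority match the
    // URI of the FileSystem instance it is resolved against.
    static boolean sameFileSystem(URI fsUri, URI pathUri) {
        if (pathUri.getScheme() == null) {
            return true; // relative/unqualified paths inherit the FS's URI
        }
        return fsUri.getScheme().equalsIgnoreCase(pathUri.getScheme())
                && fsUri.getAuthority() != null
                && fsUri.getAuthority().equalsIgnoreCase(pathUri.getAuthority());
    }

    public static void main(String[] args) {
        // Hypothetical hosts: the FS was opened against the defaultFS...
        URI defaultFs = URI.create("hdfs://namenode-a:8020");
        // ...but the processor's directory points at a different namenode.
        URI requested = URI.create("hdfs://namenode-b:8020/HDFSTest3/in");
        System.out.println(sameFileSystem(defaultFs, requested)); // false -> "Wrong FS"
    }
}
```

Because the processors obtain a single FileSystem from the defaultFS of the supplied configuration, any path qualified with a different nameservice fails this check.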
This works fine with other Hadoop clients such as MapReduce, DistCp, and Spark, so it appears to be a limitation of the NiFi HDFS processors.
It can be worked around by providing a separate configuration per HDFS processor, but ideally the processors should accept a normal HDFS configuration containing more than one nameservice, which is common in environments with data-transfer workflows between clusters.
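As an illustration of the kind of configuration meant here, the following hypothetical hdfs-site.xml excerpt declares two nameservices (names and hosts are examples, not taken from the report); with fs.defaultFS pointing at one of them, paths qualified with the other currently trigger the "Wrong FS" error in the NiFi processors:

```xml
<configuration>
  <!-- Two nameservices known to the client; hypothetical names -->
  <property>
    <name>dfs.nameservices</name>
    <value>clusterA,clusterB</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.clusterA</name>
    <value>namenode-a.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.clusterB</name>
    <value>namenode-b.example.com:8020</value>
  </property>
</configuration>
```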