Description
Currently, recursively listing a filtered set of files in a directory is clumsy because the filter has to be applied afterwards to the result list.
Imagine we want to list all non-hidden files recursively.
The non-hidden file filter is defined as:
!name.startsWith("_") && !name.startsWith(".")
Then we can do:
RemoteIterator<LocatedFileStatus> remoteIterator = fs.listFiles(path, /*recursive*/ true);
while (remoteIterator.hasNext()) {
  LocatedFileStatus each = remoteIterator.next();
  if (filter applies to all of the path elements in each) {
    result.add(each);
  }
}
For example, each of these paths should be skipped:
- /.a/b/c
- /a/.b/c
- /a/b/.c/
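The "filter applies to all of the path elements" check above can be sketched as follows. This is only an illustration on plain strings: the class HiddenPathFilter and the method allElementsMatch are hypothetical names, not part of Hadoop, and the path is assumed to be slash-separated.

```java
import java.util.function.Predicate;

/** Hypothetical helper, not part of Hadoop: applies a name filter to every element of a path. */
final class HiddenPathFilter {
    // The non-hidden filter from the description.
    static final Predicate<String> NON_HIDDEN =
        name -> !name.startsWith("_") && !name.startsWith(".");

    /** Returns true only if every non-empty element of the slash-separated path passes the filter. */
    static boolean allElementsMatch(String path, Predicate<String> filter) {
        for (String element : path.split("/")) {
            // A leading slash produces an empty first element; skip it.
            if (!element.isEmpty() && !filter.test(element)) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, allElementsMatch("/a/.b/c", NON_HIDDEN) is false while allElementsMatch("/a/b/c", NON_HIDDEN) is true. In a real listing the check would probably be applied only to the part of each path below the listing root, so that a hidden element in the root itself does not exclude everything.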
It would be much better to have a filter parameter on listFiles. This is needed to solve HIVE-22411 effectively.
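To illustrate why a filter parameter helps, here is a rough sketch of a recursive listing that applies the filter during traversal and prunes rejected directories, so their contents are never enumerated. It uses java.nio.file on the local filesystem as a stand-in; FilteredLister and this listFiles signature are hypothetical and do not depict Hadoop's FileSystem API.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical local-filesystem stand-in for a listFiles call that takes a filter. */
final class FilteredLister {
    static List<Path> listFiles(Path root, Predicate<String> nameFilter) throws IOException {
        List<Path> result = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // The root itself is never filtered; a rejected subdirectory is pruned entirely.
                if (!dir.equals(root) && !nameFilter.test(dir.getFileName().toString())) {
                    return FileVisitResult.SKIP_SUBTREE;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // Only files whose own name passes the filter are collected.
                if (nameFilter.test(file.getFileName().toString())) {
                    result.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return result;
    }
}
```

A call such as FilteredLister.listFiles(root, name -> !name.startsWith("_") && !name.startsWith(".")) would return only files whose every element below root is non-hidden, without a second pass over the result list.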
Issue Links
- blocks HIVE-22411 Performance degradation on single row inserts (Closed)