Details
- Task
- Status: Resolved
- Major
- Resolution: Fixed
Description
tarmstrong pointed out that after IMPALA-7924 the build output started displaying lines such as: "Running thrift 11 compiler on..." even during builds when Thrift files were not modified.
I dug a bit deeper and found the following:
- This also seems to happen for Thrift compilation of the ext-data-source files (e.g. ExternalDataSource.thrift, Types.thrift, etc.); "Running thrift compiler for ext-data-source on..." is always printed
- The cause is that the custom commands for ext-data-source and Thrift 11 compilation specify an OUTPUT file that does not exist (and is not generated by Thrift)
- According to the CMake docs, "if the command does not actually create the OUTPUT then the rule will always run", so Thrift compilation runs during every build
- The underlying difficulty is that you can't know which files Thrift will generate without reading the Thrift file and understanding Thrift internals
- For C++ and Python there is a workaround: for C++, Thrift always generates a file {THRIFT_FILE_NAME}_types.h (Python has a similar file), which can serve as the OUTPUT; for Java, however, no such file necessarily exists (ext-data-source only does Java gen)
- Regular Thrift compilation (e.g. of beeswax.thrift, ImpalaService.thrift, etc.) relies on this workaround, which is why we don't see the issue there
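To illustrate the workaround in the last two bullets, here is a minimal CMake sketch, not the actual Impala build code. The file Example.thrift, the variables THRIFT_COMPILER, THRIFT_FILE, THRIFT_NAME and GEN_DIR, and the target name thrift-cpp are all hypothetical. The point is that OUTPUT names a file the command really creates, so the rule only re-runs when the .thrift file changes.

```cmake
# Sketch only: all names below are illustrative, not taken from the Impala build.
cmake_minimum_required(VERSION 3.10)
project(thrift_output_sketch NONE)

# Assumes a "thrift" executable is on the PATH.
find_program(THRIFT_COMPILER thrift)

set(THRIFT_FILE ${CMAKE_CURRENT_SOURCE_DIR}/Example.thrift)
set(GEN_DIR ${CMAKE_CURRENT_BINARY_DIR}/gen-cpp)

# Strip directory and extension: Example.thrift -> Example
get_filename_component(THRIFT_NAME ${THRIFT_FILE} NAME_WE)

add_custom_command(
  # The C++ generator always writes <name>_types.h, so it is a safe OUTPUT.
  OUTPUT ${GEN_DIR}/${THRIFT_NAME}_types.h
  COMMAND ${CMAKE_COMMAND} -E make_directory ${GEN_DIR}
  COMMAND ${THRIFT_COMPILER} --gen cpp -out ${GEN_DIR} ${THRIFT_FILE}
  DEPENDS ${THRIFT_FILE}
  COMMENT "Running thrift compiler on ${THRIFT_FILE}"
)

# The target depends on the declared OUTPUT, which ties the rule into the build.
add_custom_target(thrift-cpp ALL DEPENDS ${GEN_DIR}/${THRIFT_NAME}_types.h)
```

If OUTPUT instead named a file that thrift never writes, CMake would consider the rule out of date on every build, which matches the behaviour described above.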
A solution for Thrift 11 compilation is to add the generated Python files to the OUTPUT of the custom command.
A solution for Thrift compilation of ext-data-source seems trickier, so open to suggestions.
Ideally, Thrift would provide a way to list the files generated from a .thrift file without actually generating them, but I don't see a way to do that.
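One possible direction for the ext-data-source case is a stamp file. This is a common CMake idiom, offered here only as a sketch; it is not something this issue settles on. Because the set of generated Java files is not known up front, the rule declares a marker file as its OUTPUT and touches it after thrift succeeds. The names THRIFT_COMPILER, THRIFT_FILE, JAVA_GEN_DIR, STAMP_FILE and the target name are hypothetical.

```cmake
# Sketch only: all names below are illustrative, not taken from the Impala build.
cmake_minimum_required(VERSION 3.10)
project(thrift_stamp_sketch NONE)

# Assumes a "thrift" executable is on the PATH.
find_program(THRIFT_COMPILER thrift)

set(THRIFT_FILE ${CMAKE_CURRENT_SOURCE_DIR}/ExternalDataSource.thrift)
set(JAVA_GEN_DIR ${CMAKE_CURRENT_BINARY_DIR}/gen-java)
set(STAMP_FILE ${CMAKE_CURRENT_BINARY_DIR}/ext-data-source-thrift.stamp)

add_custom_command(
  # The stamp is the only declared OUTPUT; the command itself creates it.
  OUTPUT ${STAMP_FILE}
  COMMAND ${CMAKE_COMMAND} -E make_directory ${JAVA_GEN_DIR}
  COMMAND ${THRIFT_COMPILER} --gen java -out ${JAVA_GEN_DIR} ${THRIFT_FILE}
  # Touch the stamp last, so it only exists if generation succeeded.
  COMMAND ${CMAKE_COMMAND} -E touch ${STAMP_FILE}
  DEPENDS ${THRIFT_FILE}
  COMMENT "Running thrift compiler for ext-data-source on ${THRIFT_FILE}"
)

add_custom_target(thrift-ext-data-source ALL DEPENDS ${STAMP_FILE})
```

The trade-off is that CMake only tracks the stamp: if generated Java files are deleted by hand while the stamp remains, the rule will not re-run until the .thrift file changes or the stamp is removed.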
Issue Links
- relates to IMPALA-7924 Generate Thrift 11 Python Code (Resolved)