Details
- Type: Improvement
- Status: Resolved
- Priority: P3
- Resolution: Not A Problem
Description
When attempting to use KafkaIO.Read with the DataflowRunner, I have hit a lot of walls. The brokers need to be accessible both locally and from the Dataflow worker instances. This means that, when using TLS authentication, the keystore/truststore files need to be available both locally and on the instances. I programmatically add the files to the pipeline options with
    List<String> filesToStage = PipelineResources.detectClassPathResourcesToStage(
        IndicatorIngest.class.getClassLoader());
    filesToStage.add("trust.p12");
    filesToStage.add("server.p12");
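For context, the SSL settings that eventually have to point at those staged files are ordinary KafkaConsumer properties. A minimal sketch of the consumer config map involved (the paths and PKCS12 type here are assumptions matching the .p12 files above; the property keys are standard Kafka consumer configs):

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaSslConfig {
    // Build the consumer properties that would be handed to KafkaConsumer
    // (e.g. via KafkaIO's consumer config updates). The crux of this issue
    // is that the two *.location paths must resolve on both the machine
    // launching the pipeline and the Dataflow worker.
    public static Map<String, Object> sslConfig(String truststorePath, String keystorePath) {
        Map<String, Object> config = new HashMap<>();
        config.put("security.protocol", "SSL");
        config.put("ssl.truststore.location", truststorePath);
        config.put("ssl.truststore.type", "PKCS12");
        config.put("ssl.keystore.location", keystorePath);
        config.put("ssl.keystore.type", "PKCS12");
        return config;
    }
}
```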
but even when I do this, the remote file names differ from the local ones. This means I need to determine the remote file name myself, like this:
    PackageAttributes.forFileToStage(new File(filepath), filepath).getDestination().getName();
but that function is package-private, so I need to wrap the call in a custom class placed in org.apache.beam.runners.dataflow.util. Once I have calculated this file name, I can use it to set ssl.<thing>.location, but that path is then wrong locally, and it needs to be correct both locally and remotely. So in my main I need to: calculate the remote names of the local files, copy the files to a local path with the same name, dynamically set the property to this path, and programmatically add these files to be staged so that they hopefully end up with the same name on the worker. KafkaConsumer doesn't seem to provide any other way to specify where to load these keys from.
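The copy-to-a-matching-name step described above can be sketched with plain java.nio. This is a minimal illustration, not the actual Dataflow staging logic: the `remoteName` argument stands in for whatever name the package-private staging code computes, and `KeystoreStager` is a hypothetical helper name:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class KeystoreStager {
    // Copy a local keystore next to itself under the name the file will have
    // after staging, so that a single ssl.<thing>.location value resolves
    // both locally and on the Dataflow worker. `remoteName` is assumed to be
    // obtained from the (package-private) PackageAttributes call shown above.
    public static Path copyToStagedName(Path localFile, String remoteName) throws IOException {
        Path target = localFile.resolveSibling(remoteName);
        Files.copy(localFile, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}
```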
My question is: am I supposed to be jumping through all these hoops, or am I doing something (or multiple things) completely wrong?