Spark / SPARK-24417 Build and Run Spark on JDK11 / SPARK-28877

Investigate/fix JAXB failure running Pyspark tests on JDK 11


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Build, PySpark
    • Labels: None

    Description

      It looks like we might have a test failure in Pyspark with JDK 11:

      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109686/console

      ======================================================================
      ERROR: test_linear_regression_pmml_basic (pyspark.ml.tests.test_persistence.PersistenceTest)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/ml/tests/test_persistence.py", line 69, in test_linear_regression_pmml_basic
          model.write().format("pmml").save(lr_path)
        File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/ml/util.py", line 175, in save
          self._jwrite.save(path)
        File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1286, in __call__
          answer, self.gateway_client, self.target_id, self.name)
        File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/utils.py", line 89, in deco
          return f(*a, **kw)
        File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
          format(target_id, ".", name), value)
      Py4JJavaError: An error occurred while calling o529.save.
      : javax.xml.bind.JAXBException
       - with linked exception:
      [java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory]
      	at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:241)
      	at javax.xml.bind.ContextFinder.find(ContextFinder.java:477)
      	at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:656)
      	at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:599)
      	at org.jpmml.model.JAXBUtil.getContext(JAXBUtil.java:103)
      	at org.jpmml.model.JAXBUtil.createMarshaller(JAXBUtil.java:132)
      	at org.jpmml.model.JAXBUtil.marshal(JAXBUtil.java:77)
      	at org.jpmml.model.JAXBUtil.marshalPMML(JAXBUtil.java:67)
      	at org.apache.spark.mllib.pmml.PMMLExportable.toPMML(PMMLExportable.scala:44)
      	at org.apache.spark.mllib.pmml.PMMLExportable.toPMML(PMMLExportable.scala:78)
      ...
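
      The failing call is easy to exercise outside the test harness. A minimal sketch, assuming a local SparkSession and a writable temp directory; the tiny training set and the lr_path name below are illustrative, not the exact fixture used by test_persistence.py:

          import tempfile

          from pyspark.ml.linalg import Vectors
          from pyspark.ml.regression import LinearRegression
          from pyspark.sql import SparkSession

          spark = SparkSession.builder.master("local[2]").getOrCreate()

          # Tiny illustrative training set using the default column names
          df = spark.createDataFrame(
              [(1.0, Vectors.dense(1.0)), (0.0, Vectors.dense(-1.0))],
              ["label", "features"])
          model = LinearRegression(maxIter=1).fit(df)

          # The same call that raises javax.xml.bind.JAXBException on JDK 11
          lr_path = tempfile.mkdtemp() + "/lr-pmml"
          model.write().format("pmml").save(lr_path)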
      

      The error is typical of JDK 11-related incompatibilities: the JDK's built-in (Sun) JAXB implementation was deprecated in Java 9 and removed entirely in Java 11, yet something on the classpath is still trying to load that 'old' internal implementation (com.sun.xml.internal.bind.v2.ContextFactory).
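
      One way to confirm which JAXB classes the Pyspark driver JVM can actually see is to ask it through Py4J. A rough diagnostic sketch, not part of the test suite: it assumes an active SparkSession named spark, and where_is is just a hypothetical helper for illustration.

          # Rough diagnostic, assuming an active SparkSession `spark`.
          jvm = spark.sparkContext._jvm
          print(jvm.java.lang.System.getProperty("java.version"))

          def where_is(class_name):
              """Report whether the driver JVM can load a class, and from which jar."""
              try:
                  cls = jvm.java.lang.Class.forName(class_name)
              except Exception:
                  return "not found"
              src = cls.getProtectionDomain().getCodeSource()
              return src.getLocation().toString() if src is not None else "JDK built-in"

          # The JAXB API entry point that org.jpmml.model.JAXBUtil calls into
          print(where_is("javax.xml.bind.JAXBContext"))
          # The internal class from the stack trace: present on JDK 8, gone on JDK 11
          print(where_is("com.sun.xml.internal.bind.v2.ContextFactory"))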

      It's curious because the JVM-based tests appear to pass. This suggests it is more about how the Pyspark test classpath is constructed; perhaps an old dependency on that classpath is selecting the removed implementation, for example through a META-INF/services provider entry or a factory override property.
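
      That hypothesis can be checked from the same Py4J session by listing the provider configuration files and the override property that JAXB's ContextFinder consults. Again a sketch, assuming the spark session from above:

          # List every META-INF/services entry for javax.xml.bind.JAXBContext that
          # the driver's context class loader can see; each one names a factory.
          jvm = spark.sparkContext._jvm
          loader = jvm.java.lang.Thread.currentThread().getContextClassLoader()
          urls = loader.getResources("META-INF/services/javax.xml.bind.JAXBContext")
          while urls.hasMoreElements():
              print(urls.nextElement().toString())

          # An explicit system-property override would also force a factory choice.
          print(jvm.java.lang.System.getProperty("javax.xml.bind.context.factory"))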

      It's also curious because we seemed to observe Pyspark tests passing with JDK 11 during earlier testing. This is likely to be more related to how Pyspark tests are run, but still needs a reproduction and an answer.


            People

              Assignee: Dongjoon Hyun (dongjoon)
              Reporter: Sean R. Owen (srowen)
              Votes: 0
              Watchers: 1
