SPARK-28365

Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: ML
    • Labels: None

    Description

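      When the system default locale is not among the locales returned by Locale.getAvailableLocales in the JVM, StopWordsRemover cannot be constructed with its default parameters. I hit this while running the Python tests locally: the StopWordsRemover-related test fails with errors like: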

      Traceback (most recent call last):
        File "/spark-1/python/pyspark/ml/tests/test_feature.py", line 87, in test_stopwordsremover
          stopWordRemover = StopWordsRemover(inputCol="input", outputCol="output")
        File "/spark-1/python/pyspark/__init__.py", line 111, in wrapper
          return func(self, **kwargs)
        File "/spark-1/python/pyspark/ml/feature.py", line 2646, in __init__
          self.uid)
        File "/spark-1/python/pyspark/ml/wrapper.py", line 67, in _new_java_obj
          return java_obj(*java_args)
        File "/spark-1/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__
          answer, self._gateway_client, None, self._fqn)
        File "/spark-1/python/pyspark/sql/utils.py", line 93, in deco
          raise converted
      pyspark.sql.utils.IllegalArgumentException: 'StopWordsRemover_4598673ee802 parameter locale given invalid value en_TW.'
      

      As per hyukjin.kwon's advice, instead of changing the locale just to make the test pass, it is better for StopWordsRemover to fall back to a workable locale (en_US) when the system default locale can't be found among the locales available in the JVM. Otherwise, users have to either change the system locale manually or access the private property _jvm in PySpark to work around the error.
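
      The proposed fallback can be illustrated with a minimal sketch in plain Python. This is not the actual Spark change (which would live in the Scala implementation of StopWordsRemover); the function name resolve_default_locale and the sample set of available locales are hypothetical, chosen only to show the intended behavior.

```python
def resolve_default_locale(system_default, available_locales, fallback="en_US"):
    """Return the system default locale if it is available, else the fallback.

    Hypothetical helper: mirrors the behavior proposed in this issue,
    not the actual Spark implementation.
    """
    if system_default in available_locales:
        return system_default
    return fallback


# Assumed sample of locales a JVM might report; a real JVM reports many more.
available = {"en_US", "en_GB", "zh_TW", "ja_JP"}

# en_TW (the locale from the traceback) is not in the sample set,
# so the fallback is used instead of raising an error.
print(resolve_default_locale("en_TW", available))

# A locale that is available is returned unchanged.
print(resolve_default_locale("zh_TW", available))
```

      With this behavior, constructing StopWordsRemover with default parameters would no longer depend on the machine's locale settings.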

            People

              Assignee: viirya L. C. Hsieh
              Reporter: viirya L. C. Hsieh