SPARK-47854: [PYTHON] Avoid shadowing Python built-ins in Python function variable naming


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.4, 3.4.1, 3.5.0, 3.5.1
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      Given that Spark 4.0.0 is upcoming, I wonder if we should at least consider renaming certain function parameters in PySpark. Otherwise, we may need to wait another 4 years to do so.

      Example

      https://github.com/apache/spark/blob/e6b7950f553cff5adc02b8b5195e79cffff3c97c/python/pyspark/sql/functions/builtin.py#L12768

      There are 8 uses of `len` and 35 uses of `str` as variable names, both of which are Python built-ins. Shadowing `str` is somewhat dangerous because the name no longer refers to the built-in type inside the function body, so the following would be nonsensical:

      def foo(str: "ColumnOrName", bar: "ColumnOrName"):
          # `str` is now a parameter, so it cannot be used as the built-in type
          bar = lit(bar) if isinstance(bar, str) else bar
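
To make the failure concrete, here is a minimal sketch that runs without Spark. The `lit` helper is a hypothetical stand-in for `pyspark.sql.functions.lit`, and `foo_shadowed` / `foo_renamed` are illustrative names that do not appear in PySpark. It assumes the first argument is passed as a plain string such as "x".

```python
def lit(value):
    """Hypothetical stand-in for pyspark.sql.functions.lit (assumption, not the real API)."""
    return ("lit", value)


def foo_shadowed(str, bar):
    # Here `str` is the first argument, not the built-in type, so
    # isinstance() receives a string instance as its second argument.
    return lit(bar) if isinstance(bar, str) else bar


def foo_renamed(col, bar):
    # With the parameter renamed, `str` refers to the built-in type again.
    return lit(bar) if isinstance(bar, str) else bar


try:
    foo_shadowed("x", "y")
    shadowing_error = None
except TypeError as exc:
    # isinstance() rejects a non-type second argument
    shadowing_error = exc

renamed_result = foo_renamed("x", "y")
```

In this sketch the shadowed version fails with a `TypeError` at call time, while the renamed version wraps `bar` as intended.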
      

       

      Now, obviously this would be a breaking change for user code that calls the function with keyword arguments. If we rename `str` to `src` or `col`, existing code that passes the argument by keyword would break:

      # breaks:
      foo(str="x", bar="y")
      
      # okay:
      foo("x", bar="y")
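
One way the break could be softened, which the issue itself does not propose, is to keep accepting the old keyword for a deprecation period. The sketch below is only an illustration under that assumption: `foo`, the `_MISSING` sentinel, the `legacy_kwargs` name, and the choice of `FutureWarning` are all hypothetical and not taken from PySpark.

```python
import warnings

_MISSING = object()  # sentinel so that None remains a legal argument value


def foo(col=_MISSING, bar=_MISSING, **legacy_kwargs):
    """Hypothetical function whose first parameter was renamed from `str` to `col`."""
    if "str" in legacy_kwargs:
        if col is not _MISSING:
            raise TypeError("foo() got both 'col' and its deprecated alias 'str'")
        warnings.warn(
            "the 'str' keyword of foo() is deprecated; use 'col' instead",
            FutureWarning,
            stacklevel=2,
        )
        col = legacy_kwargs.pop("str")
    if legacy_kwargs:
        # Anything left over is a genuinely unknown keyword.
        raise TypeError(f"foo() got unexpected keyword arguments: {sorted(legacy_kwargs)}")
    if col is _MISSING or bar is _MISSING:
        raise TypeError("foo() requires both 'col' and 'bar'")
    return col, bar


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = foo(str="x", bar="y")  # legacy keyword call still works, with a warning
    new_style = foo("x", bar="y")      # positional call is unaffected
```

The trade-off is that the signature becomes less self-documenting, since `**legacy_kwargs` hides the accepted names from introspection and type checkers, so a shim like this would presumably be temporary.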

      Is this change a possibility for 4.0? Or is the break for callers using keyword arguments considered too costly relative to the benefit?

       

       

       


          People

            Assignee: Unassigned
            Reporter: Liu Cao (liucao)