
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      There needs to be user-facing documentation that shows how to enable and use Arrow with Spark, what the user should expect, and any differences from similar existing functionality.
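
      For illustration, a minimal PySpark sketch of what enabling Arrow looks like in practice (assuming Spark 2.3 with PyArrow installed; the config key below follows the comment quoted later in this issue, and released builds may spell it spark.sql.execution.arrow.enabled):

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("arrow-doc-example").getOrCreate()

        # Arrow-based conversion is off by default; flip this flag to opt in.
        # (Key spelled as in the quoted comment; 2.3.0 releases may use
        # spark.sql.execution.arrow.enabled instead.)
        spark.conf.set("spark.sql.execution.arrow.enable", "true")

        # toPandas() now transfers data through Arrow; setting the flag back
        # to "false" restores the original row-by-row conversion path.
        df = spark.range(1 << 20).selectExpr("id", "cast(id AS double) AS value")
        pdf = df.toPandas()
        print(pdf.dtypes)

      The documentation would then spell out what the user should expect from this switch, e.g. which data types are supported and how timestamps are handled.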

      A comment from Xiao Li on https://github.com/apache/spark/pull/18664

      Assume users/applications have Timestamp values in their Dataset, and their processing code also relies on the corresponding time-zone-related assumptions.

      • For new users/applications: suppose they first enable Arrow and later hit an Arrow bug. Can they simply turn off spark.sql.execution.arrow.enable? If not, what should they do?
      • For existing users/applications: they want to use Arrow for better performance. Can they just turn on spark.sql.execution.arrow.enable? What else should they do?

      Note: hopefully the guides/solutions are user-friendly; that is, they must be very simple for most users to understand.
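
      One possible answer to the first question above, sketched purely at the user level (the helper below is hypothetical, not a built-in Spark mechanism): treat the Arrow flag as an optimization and turn it back off if the Arrow path fails.

        from pyspark.sql import SparkSession

        def to_pandas_with_fallback(df):
            """Hypothetical helper: prefer the Arrow-backed toPandas() for speed,
            but fall back to the non-Arrow conversion if the Arrow path raises."""
            spark = SparkSession.builder.getOrCreate()
            spark.conf.set("spark.sql.execution.arrow.enable", "true")
            try:
                return df.toPandas()
            except Exception:
                # e.g. an unsupported data type or an Arrow bug: disable the
                # flag again and retry with the original conversion path.
                spark.conf.set("spark.sql.execution.arrow.enable", "false")
                return df.toPandas()

      Whether such a manual fallback is enough, or whether Spark should handle it for the user, is exactly the kind of behavior the documentation needs to spell out.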

          People

            Assignee: bryanc Bryan Cutler
            Reporter: bryanc Bryan Cutler