ZEPPELIN-4358 (project: Zeppelin)

Seaborn renders plots slowly in Apache Zeppelin notebooks


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.1
    • Fix Version/s: None
    • Component/s: pySpark
    • Labels: None

    Description

      I am currently trying to generate visualizations in Zeppelin (0.8.1) notebooks using the PySpark interpreter with Python 3.7.3.

      Generating the following simple plot with seaborn (0.9.0) takes around 5 minutes, with very high CPU usage throughout:

      %pyspark
      import seaborn as sns
      import numpy as np
      import pandas as pd

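      # 100 rows x 3 columns of uniform random values
      data = pd.DataFrame(np.random.rand(100, 3))

      # 3x3 grid: histograms on the diagonal, scatter plots off the diagonal
      sns.pairplot(data)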

      This behavior seems inconsistent, because the following plot, which involves much more data, renders instantly:

      %pyspark
      import seaborn as sns
      import numpy as np
      import pandas as pd

      # 10,000 rows x 2 columns of uniform random values
      df = pd.DataFrame(data=np.random.rand(10000, 2))

      # a single axes: one line through 10,000 points
      sns.lineplot(x=0, y=1, data=df)

      I noticed that plotting with matplotlib (3.1.0) directly is generally much faster, almost as snappy as I am used to in Jupyter notebook environments.
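      Plain matplotlib is reported to be fast here, so a possible stopgap is to draw a similar grid without seaborn. This is a swapped-in technique, not a fix for the seaborn slowdown. The sketch below uses pandas' matplotlib-based `pandas.plotting.scatter_matrix`, which puts histograms on the diagonal and scatter plots elsewhere. The helper name `scatter_matrix_png` and the figure size are illustrative assumptions. Whether this path avoids the slowdown inside Zeppelin has not been verified.

```python
import io
import time
from typing import Tuple

import matplotlib
matplotlib.use("Agg")  # off-screen rendering for a standalone test
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix


def scatter_matrix_png(data: pd.DataFrame) -> Tuple[bytes, float]:
    """Hypothetical helper: draw a pairplot-like grid with pandas/matplotlib.

    Returns the rendered PNG bytes and the total seconds spent.
    """
    start = time.perf_counter()
    # returns an n x n array of Axes; histograms on the diagonal
    axes = scatter_matrix(data, diagonal="hist", figsize=(6, 6))
    fig = axes[0, 0].get_figure()

    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # full render, comparable to displaying it
    elapsed = time.perf_counter() - start

    plt.close(fig)
    return buf.getvalue(), elapsed


if __name__ == "__main__":
    png, seconds = scatter_matrix_png(pd.DataFrame(np.random.rand(100, 3)))
    print(len(png), "bytes in", round(seconds, 3), "s")
```

      Inside a notebook you would display the figure through the interpreter's usual matplotlib integration instead of returning PNG bytes. The byte return only keeps this sketch self-contained.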

      I have already read issue ZEPPELIN-1894 (https://jira.apache.org/jira/browse/ZEPPELIN-1894), but the scatterplot mentioned there renders instantly for me as well.

      I have already asked this question on StackOverflow, but I think this is a better place for it:


          People

            Assignee: Unassigned
            Reporter: Nawid Sayed (nawidsayed)
            Votes: 0
            Watchers: 3
