Spark / SPARK-29315

RDD.cache() called early creates problems

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 2.4.4
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None
    • Environment: Apache Spark 2.4.4, Windows 10

    Description

      This is the first issue I have posted here. I noticed that when I call RDD.cache() early in my code, the results are all wrong!
      If I remove the call to cache(), or call cache() later in the code (after the first map transformation), it works fine.
      The graph is created from a data structure that already contains the random values.

      I have posted working and non-working versions of the code in this gist:

      https://gist.github.com/mitchi/edd9637687cf47fac2616bb72932f8e7

      Here is the output from a version that works:

      Colors of the graph

      3 2 1 3 2 1 1 4 2 3

      And here is the output from a version that doesn't work:

      Colors of the graph

      25 16 36 49 3 1 6 15 10 3
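
      The report does not say why it was closed as Invalid. One well-known way for the position of cache() to change results, offered here only as an assumption about what may be happening, is a transformation that mutates cached objects in place. With the default MEMORY_ONLY level, Spark keeps the deserialized objects themselves, so a later action sees mutations made by an earlier one, whereas an uncached lineage rebuilds fresh objects on every action. The minimal sketch below runs Spark in local mode and uses a hypothetical mutable Vertex class and a hypothetical CacheMutationSketch object; neither comes from the reporter's gist.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical mutable record; the reporter's real data structure is in the gist.
final class Vertex(val id: Int, var color: Int) extends Serializable

object CacheMutationSketch {
  // Reuse one local SparkContext across calls.
  private def context(): SparkContext =
    SparkContext.getOrCreate(
      new SparkConf().setMaster("local[2]").setAppName("cache-mutation-sketch"))

  // Returns the colors seen by the second of two actions on the same RDD.
  def colors(cacheEarly: Boolean): Seq[Int] = {
    val sc = context()
    val base = sc.parallelize(1 to 5, numSlices = 2).map(i => new Vertex(i, i))
    val source = if (cacheEarly) base.cache() else base

    // Anti-pattern: the transformation mutates its input object and returns it.
    val squared = source.map { v => v.color = v.color * v.color; v }

    squared.count() // first action: if cached, `base` is stored, then its objects are mutated
    squared.map(_.color).collect().toSeq // second action: recomputes `squared` from `source`
  }

  def main(args: Array[String]): Unit = {
    println(s"without cache: ${colors(cacheEarly = false).mkString(" ")}")
    println(s"cached early:  ${colors(cacheEarly = true).mkString(" ")}")
    context().stop()
  }
}
```

      Under these assumptions the uncached run should report each color squared once, while the early-cached run should report each color squared twice, because the second action starts from cached objects that the first action already mutated. Returning a new object from map (for example a copy with the updated color) instead of mutating the input avoids the discrepancy.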

          People

            Assignee: Unassigned
            Reporter: Edmond La Chance (mitchi)
            Votes: 0
            Watchers: 1