[SPARK-19600] ArrayIndexOutOfBoundsException in ALS - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Duplicate
Affects Version/s: 2.0.1
Fix Version/s: None
Component/s: MLlib
Labels: None

Description

I understand that ~~SPARK-3080~~ is closed, but I don't yet understand what causes the issue: memory, parallelism, or a negative user ID or product ID?

I have consistently run into this issue with different training sets. Can you suggest any areas to look at?

java.lang.ArrayIndexOutOfBoundsException: 221529807
at org.apache.spark.ml.recommendation.ALS$$anonfun$partitionRatings$1$$anonfun$apply$6.apply(ALS.scala:944)
at org.apache.spark.ml.recommendation.ALS$$anonfun$partitionRatings$1$$anonfun$apply$6.apply(ALS.scala:940)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:211)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:200)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
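
One of the hypotheses raised above, negative or out-of-range user and product IDs, can be checked before training. The following is a minimal sketch in plain Java rather than Spark code: the class name `RatingIdCheck`, the helper `findSuspectIds`, and the sample ID values are all hypothetical, and it assumes the raw IDs are available as `long` values. A clean result only rules out this one hypothesis; it says nothing about memory or parallelism.

```java
import java.util.ArrayList;
import java.util.List;

public class RatingIdCheck {

    // Hypothetical helper: reports every ID that is negative or that does
    // not fit in a 32-bit int. The label is only used in the message.
    static List<String> findSuspectIds(String label, long[] ids) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            long id = ids[i];
            if (id < 0) {
                problems.add(label + "[" + i + "] is negative: " + id);
            } else if (id > Integer.MAX_VALUE) {
                problems.add(label + "[" + i + "] exceeds Integer.MAX_VALUE: " + id);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        long[] userIds = {1L, -5L, 3_000_000_000L};
        long[] productIds = {10L, 20L, 30L};

        List<String> problems = new ArrayList<>(findSuspectIds("userId", userIds));
        problems.addAll(findSuspectIds("productId", productIds));
        problems.forEach(System.out::println);
    }
}
```

If the check reports nothing for the real training set, the ID hypothesis can be set aside and attention moved to the other suspected areas.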

Issue Links

duplicates: SPARK-3080 ArrayIndexOutOfBoundsException in ALS for Large datasets (Closed)

People

Assignee: Unassigned

Reporter: zhengxiang pan

Votes: 0

Watchers: 1

Dates

Created: 14/Feb/17 19:29

Updated: 14/Feb/17 20:06

Resolved: 14/Feb/17 19:49