Spark / SPARK-29272

dataframe.write.format("libsvm").save() takes too much time


Details

    • Type: Question
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

    Description

      I have a PySpark DataFrame with about 10 thousand records. Writing the whole dataset with the PySpark API takes 10 seconds. When I use the filter API to select just 10 records and write that temp_df, it still takes 8 seconds. Why does it take so much time? How can I improve it? Thank you!

      from pyspark.mllib.util import MLUtils

      MLUtils.convertVectorColumnsToML(dataframe).write.format("libsvm").mode('overwrite').save('path')

      temp_df = dataframe.filter(dataframe['__index'].between(0, 10))
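
      For reference, a minimal end-to-end sketch of the reported workflow. The column names (label, features, __index), the output paths, and the in-memory sample data are assumptions for illustration, not taken from the original report:

      from pyspark.sql import SparkSession
      from pyspark.mllib.linalg import Vectors
      from pyspark.mllib.util import MLUtils

      spark = SparkSession.builder.appName("libsvm-write-sketch").getOrCreate()

      # Hypothetical stand-in for the ~10k-row DataFrame: a numeric label, an
      # old-style pyspark.mllib vector column, and a row index column "__index".
      rows = [(float(i % 2), Vectors.dense([float(i), i * 2.0]), i) for i in range(10000)]
      dataframe = spark.createDataFrame(rows, ["label", "features", "__index"])

      # Convert mllib vectors to ml vectors so the "libsvm" writer accepts them,
      # keep only the label/features columns the libsvm source expects, and write.
      full = MLUtils.convertVectorColumnsToML(dataframe).select("label", "features")
      full.write.format("libsvm").mode("overwrite").save("/tmp/full_libsvm")

      # Same write for a 10-row slice selected by index.
      temp_df = dataframe.filter(dataframe["__index"].between(0, 10))
      small = MLUtils.convertVectorColumnsToML(temp_df).select("label", "features")
      small.write.format("libsvm").mode("overwrite").save("/tmp/slice_libsvm")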

      Attachments

        Activity


          People

            Assignee: Unassigned
            Reporter: accelerator 张焕明
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved:
