Details
- Type: Improvement
- Status: Closed
- Priority: Minor
- Resolution: Won't Do
Description
At the moment, the SGD implementation uses a simple adaptive learning rate strategy, adaptedLearningRate = initialLearningRate / sqrt(iterationNumber), which makes the optimization algorithm sensitive to the choice of initialLearningRate. If this value is chosen badly, SGD can become unstable.
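To illustrate the sensitivity, here is a minimal, self-contained sketch of this decay scheme (the names DecaySketch and adaptedLearningRate are made up for illustration and are not Flink's actual code): the step size only shrinks with the iteration count, so a badly chosen initial value dominates the early updates.

object DecaySketch {
  // adaptedLearningRate = initialLearningRate / sqrt(iterationNumber)
  def adaptedLearningRate(initialLearningRate: Double, iterationNumber: Int): Double =
    initialLearningRate / math.sqrt(iterationNumber.toDouble)

  def main(args: Array[String]): Unit = {
    // With eta0 = 100.0 the step is still 10.0 at iteration 100 -- large
    // enough to diverge on many problems, while eta0 = 0.01 may barely move.
    for (eta0 <- Seq(0.01, 1.0, 100.0); t <- Seq(1, 10, 100))
      println(f"eta0=$eta0%6.2f t=$t%3d -> ${adaptedLearningRate(eta0, t)}%.4f")
  }
}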
There are better ways to calculate the learning rate [1], such as Adagrad [3], Adadelta [4], SGD with momentum [5], and others [2]. These methods promise more stable optimization with less hyperparameter tweaking. It might be worthwhile to investigate these approaches; a sketch of the Adagrad update follows below.
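As a rough sketch of the Adagrad idea from [3] (not a proposed Flink API; the class name is hypothetical and plain arrays stand in for Flink's vector types): each coordinate accumulates its squared gradients and gets its own effective step size, shrinking where gradients have historically been large.

class AdagradSketch(learningRate: Double, dim: Int, epsilon: Double = 1e-8) {
  // Running sum of squared gradients per coordinate.
  private val accumulatedSquares = Array.fill(dim)(0.0)

  /** Updates `weights` in place using the current `gradient`. */
  def step(weights: Array[Double], gradient: Array[Double]): Unit = {
    var i = 0
    while (i < dim) {
      accumulatedSquares(i) += gradient(i) * gradient(i)
      // Per-coordinate step: eta / sqrt(G_i + eps); epsilon avoids division by zero.
      weights(i) -= learningRate / math.sqrt(accumulatedSquares(i) + epsilon) * gradient(i)
      i += 1
    }
  }
}

SGD with momentum [5] is similar in spirit but instead keeps a running velocity, v = mu * v - eta * g, and adds v to the weights, which damps oscillations rather than adapting per-coordinate step sizes.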
It might also be interesting to look at the implementation of Vowpal Wabbit [6].
Resources:
[1] http://imgur.com/a/Hqolp
[2] http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html
[3] http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
[4] http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf
[5] http://www.willamette.edu/~gorr/classes/cs449/momrate.html
[6] https://github.com/JohnLangford/vowpal_wabbit
Issue Links
- relates to FLINK-1889 Create optimization framework (Closed)