Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-12197

Implement sampling for logistic regression classifier

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • streaming expressions
    • None

    Description

      Currently the train Streaming Expression trains a logistic regression model by iterating over the entire distributed training set on each training iteration. Each training iteration involves building a matrix on each shard with the number of rows equal to the size of the training set contained on the shard. The number of columns will be the number of features. This scenario can create very large matrices when working with large training sets and feature sets.

      This ticket will add a sample parameter which will limit the size of the training set on each iteration to a random sample of the training set. This will allow for much larger training sets.

      Attachments

        Activity

          People

            jbernste Joel Bernstein
            jbernste Joel Bernstein
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: