Solr / SOLR-13903

Classification Model Confusion Matrix Discrepancy

Details

    Description

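      Training a model with the features and train stream sources produces model documents with TP, TN, FP, and FN fields (true positives, true negatives, false positives, false negatives). Every training document should fall into exactly one of these four categories, so the four values should sum to the training set size. In some cases the sum is smaller than the training set size.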

      How to reproduce:

      1. Create two collections: cellphones and cellphones-model

      2. Index the attached dataset into cellphones

      3. Run the following expression:

      commit(cellphones-model, update(cellphones-model, batchSize=500,
        train(cellphones,
          features(cellphones, q="*:*", featureSet="featureSet",
            field="title_t",
            outcome="brand_i", numTerms=25),
          q="*:*",
          name="cellphones-classification-model",
          field="title_t",
          outcome="brand_i",
          maxIterations=100)))

      4. Run the following query to retrieve confusion matrix:

      search q=*:*&collection=cellphones-model&fl=name_s,trueNegative_i,truePositive_i,falseNegative_i,falsePositive_i,iteration_i&sort=iteration_i%20desc&rows=100
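
      For anyone reproducing this over HTTP, the request above can be built roughly as follows. This is only a sketch: the base URL `http://localhost:8983/solr` is an assumed local node, and it assumes the query is sent to the collection's standard `/select` handler with a match-all query. The helper name `confusion_matrix_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: a local Solr node; change this to match your deployment.
SOLR_BASE = "http://localhost:8983/solr"

# Fields stored on each model document, as listed in the query above.
FIELDS = [
    "name_s",
    "trueNegative_i",
    "truePositive_i",
    "falseNegative_i",
    "falsePositive_i",
    "iteration_i",
]


def confusion_matrix_url(collection="cellphones-model", rows=100):
    """Build a /select URL returning one model document per iteration."""
    params = {
        "q": "*:*",                  # match every model document
        "fl": ",".join(FIELDS),      # only the confusion-matrix fields
        "sort": "iteration_i desc",  # latest iteration first
        "rows": rows,
        "wt": "json",
    }
    return f"{SOLR_BASE}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(confusion_matrix_url())
```

      The documents are then found under `response.docs` in the JSON body returned by Solr.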

      In this instance, TP + TN + FP + FN is exactly one less than the training set size, and this holds for every iteration.
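
      The check itself is simple arithmetic and can be scripted against the returned model documents. A minimal sketch follows; the function names are hypothetical, and the sample documents and the training set size of 100 are made-up illustrative values, not numbers taken from the attached dataset.

```python
def matrix_total(doc):
    """Sum the four confusion-matrix counts of one model document."""
    return (
        doc["truePositive_i"]
        + doc["trueNegative_i"]
        + doc["falsePositive_i"]
        + doc["falseNegative_i"]
    )


def find_discrepancies(docs, training_set_size):
    """Return (iteration, missing_count) for each iteration whose
    confusion matrix does not account for every training document."""
    return [
        (doc["iteration_i"], training_set_size - matrix_total(doc))
        for doc in docs
        if matrix_total(doc) != training_set_size
    ]


# Illustrative values only (not from cellphones.csv).
sample_docs = [
    {"iteration_i": 1, "truePositive_i": 40, "trueNegative_i": 50,
     "falsePositive_i": 5, "falseNegative_i": 4},
    {"iteration_i": 2, "truePositive_i": 41, "trueNegative_i": 50,
     "falsePositive_i": 5, "falseNegative_i": 4},
]

if __name__ == "__main__":
    for iteration, missing in find_discrepancies(sample_docs, 100):
        print(f"iteration {iteration}: {missing} document(s) unaccounted for")
```

      With the behavior described in this issue, every iteration would be reported with exactly one document unaccounted for.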

      Attachments

        1. cellphones.csv (7 kB, uploaded by Ahmed Adel)

          People

            Assignee: Unassigned
            Reporter: Ahmed Adel (ahmed.adel)