Hivemall / HIVEMALL-78

AUC UDAF for BinaryClassificationMetrics

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: None

    Description

      Add support for the area under the ROC curve (AUROC), a binary classification metric. Two possible interfaces:

      -- option 1
      select
         auroc(label, prob) as auroc
      from
         data;
      
      -- option 2
      WITH roc as (
        select
          roc(label, prob) as (fpr, tpr)
        from
          data
      )
      select
        auc(fpr, tpr) as auroc -- auc is a UDAF; its input must be sorted by fpr asc
      from (  
        select 
          fpr, tpr
        from
          roc
        DISTRIBUTE BY 
          floor(fpr / 0.2) -- 5 bins
        SORT BY 
          fpr ASC
      ) t;
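
      The two options above can be sketched outside Hive as follows. This is a minimal Python illustration, not the Hivemall implementation: roc_points stands in for the proposed roc(label, prob), auc for the proposed auc(fpr, tpr), and auroc for option 1. It assumes labels are 0/1, that higher prob means more likely positive, and that the area is computed with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], probs: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (fpr, tpr) points sorted by fpr ascending (hypothetical stand-in for roc)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sweep the decision threshold from the highest score down.
    ranked = sorted(zip(probs, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (p, y) in enumerate(ranked):
        if y == 1:
            tp += 1
        else:
            fp += 1
        # Emit one point per distinct score, after the last row of a tied group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != p:
            points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under (x, y) points that are already sorted by x ascending."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def auroc(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Option 1 in a single call: AUROC directly from labels and scores."""
    return auc(roc_points(labels, probs))
```

      The trapezoidal sum is additive over consecutive segments, so a partitioned version could add up per-bin areas, provided the segment that crosses each bin boundary is also counted. The sum is only correct when the points arrive in ascending fpr order, which is what the SORT BY in option 2 is for.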
      

      References:
      http://www.citeulike.org/user/myui/article/12615084
      https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
      https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html

            People

              Assignee: takuti Takuya Kitazawa
              Reporter: myui Makoto Yui
              Votes: 0
              Watchers: 1
