Solr / SOLR-16267

JSON Facet Stats methods include docs with no field value when using nested function


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 8.11.1
    • Fix Version/s: None
    • Component/s: Facet Module
    • Environment: Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-184-generic x86_64)
      Java(TM) SE Runtime Environment (build 1.8.0_211-b12)

    Description

      I’m seeing unexpected behavior when using stats functions in the JSON Facet API with nested functions. The example below illustrates it.

      I’m sending the following JSON Facet parameter:

      json.facet={
         "grp_0": {
            "field": "ssnm",
            "limit": -1,
            "type": "terms",
            "mincount": 1,
            "refine": true,
            "sort": {"index": "asc"},
            "facet": {
               "avg_TotalCpuUsec": "avg(TotalCpuUsec)",
               "avg_sqrt_TotalCpuUsec": "avg(sqrt(TotalCpuUsec))",
               "count_TotalCpuUsec": "countvals(TotalCpuUsec)",
               "count_sqrt_TotalCpuUsec": "countvals(sqrt(TotalCpuUsec))",
               "sum_TotalCpuUsec": "sum(TotalCpuUsec)",
               "sum_sqrt_TotalCpuUsec": "sum(sqrt(TotalCpuUsec))"
            }
         }
      }
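For anyone reproducing this outside a browser, the sketch below shows one way to send the facet definition above as the json.facet request parameter from Python. The host, port, collection name (mycollection) and the q/rows values are assumptions for illustration only; the script just builds and prints the URL and does not contact a Solr server.

```python
import json
from urllib.parse import urlencode

# The facet definition from this report, expressed as a Python dict.
facet = {
    "grp_0": {
        "field": "ssnm",
        "limit": -1,
        "type": "terms",
        "mincount": 1,
        "refine": True,
        "sort": {"index": "asc"},
        "facet": {
            "avg_TotalCpuUsec": "avg(TotalCpuUsec)",
            "avg_sqrt_TotalCpuUsec": "avg(sqrt(TotalCpuUsec))",
            "count_TotalCpuUsec": "countvals(TotalCpuUsec)",
            "count_sqrt_TotalCpuUsec": "countvals(sqrt(TotalCpuUsec))",
            "sum_TotalCpuUsec": "sum(TotalCpuUsec)",
            "sum_sqrt_TotalCpuUsec": "sum(sqrt(TotalCpuUsec))",
        },
    }
}

# Hypothetical endpoint: adjust host, port and collection for a real deployment.
BASE_URL = "http://localhost:8983/solr/mycollection/select"

params = {
    "q": "*:*",                       # match all documents (assumed query)
    "rows": 0,                        # only the facet output is of interest
    "json.facet": json.dumps(facet),  # facet definition serialized as JSON text
}

# URL-encode the parameters; json.facet is sent as an ordinary request parameter.
request_url = BASE_URL + "?" + urlencode(params)
print(request_url)
```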

      One of the returned buckets looks like this:

        "facets":{
          "count":32,
          "grp_0":{
            "buckets":[{
                "val":"Activity",
                "count":6,
                "count_sqrt_TotalCpuUsec":6,
                "sum_sqrt_TotalCpuUsec":495.29246931322893,
                "count_TotalCpuUsec":4,
                "sum_TotalCpuUsec":61464.399999999994,
                "avg_TotalCpuUsec":15366.099999999999,
                "avg_sqrt_TotalCpuUsec":82.54874488553816},
      .
      .
      .
      } ]}}}

      Notice that the bucket contains 6 documents, but only 4 of them have the field “TotalCpuUsec”, as reflected in the value of countvals(TotalCpuUsec). My issue is with the calculation of avg(sqrt(TotalCpuUsec)). The value of avg(TotalCpuUsec) is correct: it equals sum(TotalCpuUsec) / 4. However, avg(sqrt(TotalCpuUsec)) equals sum(sqrt(TotalCpuUsec)) / 6, and I think it should be divided by 4, since only 4 documents have a value for this field.

      It appears that sqrt(TotalCpuUsec) returns 0.0 for documents that don’t have the field, so those 2 documents contribute 0.0 to the average. This is consistent with countvals(sqrt(TotalCpuUsec)) being 6 rather than 4.
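A quick check of the arithmetic, using only the numbers reported for the "Activity" bucket, followed by a small model of the suspected behavior. The per-document values in the second half are hypothetical and are not taken from the report; the model simply assumes, as suggested above, that documents without the field contribute 0.0 once wrapped in sqrt().

```python
import math

# Values copied from the "Activity" bucket in the response above.
doc_count = 6                     # "count"
countvals_field = 4               # countvals(TotalCpuUsec)
sum_field = 61464.399999999994    # sum(TotalCpuUsec)
avg_field = 15366.099999999999    # avg(TotalCpuUsec)
sum_sqrt = 495.29246931322893     # sum(sqrt(TotalCpuUsec))
avg_sqrt = 82.54874488553816      # avg(sqrt(TotalCpuUsec))

# avg(TotalCpuUsec) divides by the number of docs that have the field.
assert math.isclose(avg_field, sum_field / countvals_field)

# avg(sqrt(TotalCpuUsec)) divides by every doc in the bucket.
assert math.isclose(avg_sqrt, sum_sqrt / doc_count)

# Dividing by the docs that actually have the field would instead give:
expected_avg_sqrt = sum_sqrt / countvals_field
print(f"reported: {avg_sqrt:.4f}, expected: {expected_avg_sqrt:.4f}")

# Model of the two behaviors on HYPOTHETICAL data (None = field absent).
docs = [4.0, 9.0, None, 16.0, None, 25.0]

# Field-style aggregation: docs without a value are skipped.
present = [math.sqrt(v) for v in docs if v is not None]
avg_skipping_missing = sum(present) / len(present)    # 14 / 4

# Behavior observed for avg(sqrt(...)): missing docs are counted as 0.0.
as_zero = [math.sqrt(v) if v is not None else 0.0 for v in docs]
avg_missing_as_zero = sum(as_zero) / len(as_zero)     # 14 / 6

print(avg_skipping_missing, avg_missing_as_zero)
```

Because the zeros add nothing to the sum, both variants share the same numerator; only the divisor changes, which matches the 6 vs. 4 discrepancy in the report.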

      This looks like a bug, but I wanted to check whether this is “working as expected”, and whether any facet attributes can be set to work around it.


People

    • Assignee: Unassigned
    • Reporter: Gerald Bonfiglio (jbnas)
    • Votes: 0
    • Watchers: 1