  1. Solr
  2. SOLR-16291

Decay function queries gauss,linear,exponential


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 9.0
    • Fix Version/s: None
    • Component/s: query parsers, search
    • Labels: None

    Description


      This is a Solr version of the decay functions [available in Elasticsearch|https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html].

      To see how the functions work see here

      Decay functions score a document with a function that decays depending on the distance of a numeric field value of the document from a user-given origin. This is similar to a range query, but with smooth edges instead of boxes.

      To use distance scoring on a query that has numerical fields, the user has to define an origin and a scale for each field. The origin is needed to define the “central point” from which the distance is calculated, and the scale to define the rate of decay. The decay function is specified as
       

      <decay_function>(<field-name>,scale,origin,offset,decay) for numerical/date field
      <decay_function>(<field-name>,scale,origin_lat,origin_lon,offset,decay) for geo fields 
      • <decay_function> should be one of 'linear', 'exp', or 'gauss'
      • The <field-name> must be a single-valued (NOT multi-valued) NumericFieldType, DatePointField, or LatLonPointSpatialField field, e.g. linear("location","23km",52.0247, -0.490,"0km",0.5)
        In the above example, the field is a geo point, so the origin is given as a latitude and a longitude. scale and offset must be given with a unit in this case. If your field is a date field, you can express scale and offset in days, hours, and so on, as with DateMath.
         
        e.g. gauss(pdate,"+2DAY+6HOUR","2021-07-20T00:00:00Z","+3DAY",0.5)
         
        pdate: DatePointField
        "+2DAY+6HOUR": scale
        "2021-07-20T00:00:00Z": origin (defaults to NOW)
        "+3DAY": offset (defaults to zero)
        0.5: decay
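The date example above can be checked by hand with a small Python sketch. It is not the patch's implementation: the function name `gauss_date_score` is hypothetical, and treating "+2DAY+6HOUR" and "+3DAY" as fixed durations (54 hours and 72 hours) is an assumption about how the DateMath strings are interpreted.

```python
import math
from datetime import datetime, timedelta, timezone

def gauss_date_score(doc_date, origin, scale, offset=timedelta(0), decay=0.5):
    """Gaussian decay on a date value; distances are measured in seconds."""
    distance = abs((doc_date - origin).total_seconds())
    # Documents within `offset` of the origin keep the full score of 1.0.
    effective = max(0.0, distance - offset.total_seconds())
    sigma_sq = -scale.total_seconds() ** 2 / (2.0 * math.log(decay))
    return math.exp(-effective ** 2 / (2.0 * sigma_sq))

origin = datetime(2021, 7, 20, tzinfo=timezone.utc)  # "2021-07-20T00:00:00Z"
scale = timedelta(days=2, hours=6)                    # "+2DAY+6HOUR" (assumed fixed duration)
offset = timedelta(days=3)                            # "+3DAY" (assumed fixed duration)

# One day from the origin is inside the offset, so no decay is applied.
inside_offset = gauss_date_score(origin + timedelta(days=1), origin, scale, offset)
# Exactly offset + scale from the origin, the score should equal decay.
at_scale = gauss_date_score(origin + offset + scale, origin, scale, offset)
print(inside_offset, at_scale)
```

Under these assumptions, a document dated within three days of the origin keeps the full score, and one dated 5 days 6 hours away lands on the decay value.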

       

      • origin The point of origin used for calculating distance. Must be given as a number for numeric fields, a date for date fields, and a geo point for geo fields. Required for geo and numeric fields. For date fields the default is NOW. Date math (for example NOW-1h) is supported for origin.
      • scale Required for all types. Defines the distance from origin + offset at which the computed score equals the decay parameter. For geo fields: can be defined as number+unit ("1km", "12m", …); the default unit is km. For date fields: can be defined as number+unit ("1h", "10d", …). For numeric fields: any number.
      • offset If an offset is defined, decay is only applied to documents whose distance from the origin is greater than the offset; documents within the offset keep the full score of 1. The default is 0.
      • decay Defines the score of documents at the distance given by scale. If no decay is defined, documents at the distance scale will be scored 0.5.

       
      To get a feel for how these functions work, you can see here on Desmos. Adjust origin, offset, scale and decay to see how these parameters change the equation for gauss, exp or linear.

      Supported decay functions

       
      The DECAY_FUNCTION determines the shape of the decay:

      gauss Normal decay, computed as:

      score(doc) = exp(-(max(0, |doc.val - origin| - offset))^2 / (2 * sig^2))

      where sig is computed to ensure that the score takes the value decay at distance scale from origin +/- offset:

      sig^2 = -scale^2 / (2 * ln(decay))
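As a quick sanity check of the gauss formula, here is a minimal Python sketch for a plain numeric field. The function name `gauss_decay` and the sample values (origin 0, scale 10, offset 5) are illustrative only and do not come from the patch.

```python
import math

def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Normal decay: 1.0 within `offset`, `decay` at distance offset + scale."""
    distance = max(0.0, abs(value - origin) - offset)
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-distance ** 2 / (2.0 * sigma_sq))

# Tabulate the score as the field value moves away from the origin.
for value in (0, 5, 15, 25, 45):
    print(value, gauss_decay(value, origin=0, scale=10, offset=5))
```

Values up to 5 sit inside the offset and score 1.0, the value 15 (offset + scale) scores the decay value, and larger values keep shrinking without ever reaching 0.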

      exp Exponential decay, computed as:

      score(doc) = exp(lambda * max(0, |doc.val - origin| - offset))

      lambda = ln(decay) / scale

      where again the parameter lambda is computed to ensure that the score takes the value decay at distance scale from origin +/- offset
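The exponential variant can be sketched the same way. Again, `exp_decay` is a hypothetical helper written for illustration, not code from the patch.

```python
import math

def exp_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Exponential decay: every additional `scale` of distance multiplies the score by `decay`."""
    distance = max(0.0, abs(value - origin) - offset)
    lam = math.log(decay) / scale  # negative whenever 0 < decay < 1
    return math.exp(lam * distance)

one_scale = exp_decay(10, origin=0, scale=10)   # distance == scale
two_scales = exp_decay(20, origin=0, scale=10)  # distance == 2 * scale
print(one_scale, two_scales)
```

With the default decay of 0.5, the score halves for each further `scale` of distance, so it drops quickly near the origin and then flattens out.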

      linear Linear decay, computed as:

      score(doc) = max((s - v) / s, 0)

      where: v = max(0, |doc.val - origin| - offset) and s = scale / (1.0 - decay)

      where again the parameter s is computed to ensure that the score takes the value decay at distance scale from origin +/- offset

      In contrast to the normal and exponential decay, this function actually sets the score to 0 once the distance beyond the offset exceeds s. With the default decay of 0.5, s is twice the user-given scale value.
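A short Python sketch of the linear variant makes the cutoff visible. It uses s = scale / (1.0 - decay), the value for which the score equals decay at distance scale; the name `linear_decay` is illustrative only.

```python
def linear_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Linear decay: falls from 1.0 to `decay` over `scale`, then on to 0.0 at distance s."""
    v = max(0.0, abs(value - origin) - offset)
    s = scale / (1.0 - decay)  # distance at which the score reaches zero
    return max((s - v) / s, 0.0)

# With decay=0.5 and scale=10, the score reaches zero at distance 20.
for value in (0, 5, 10, 20, 25):
    print(value, linear_decay(value, origin=0, scale=10))
```

A larger decay pushes the zero point further out: with decay=0.8 and scale=10, s is about 50 instead of 20.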

      [Figure: the three decay functions and their parameters plotted for a single field, called "age" in the example.]

      Detailed example

      Suppose you are searching for a hotel in a certain town. Your budget is limited. Also, you would like the hotel to be close to the town center, so the farther the hotel is from the desired location the less likely you are to check in.

      You would like the query results that match your criteria (for example, "hotel, Nancy, non-smoker") to be scored with respect to distance to the town center and also the price.

      Intuitively, you would like to define the town center as the origin, and maybe you are willing to walk 2km to the town center from the hotel. In this case your origin for the location field is the town center and the scale is ~2km.

      If your budget is low, you would probably prefer something cheap over something expensive. For the price field, the origin would be 0 Euros and the scale depends on how much you are willing to pay, for example 20 Euros.

      In this example, the fields might be called "price" for the price of the hotel and "location" for the coordinates of this hotel.

      The function for price in this case could be:

      gauss("price",20,0) //or linear,exp 

      and for location:

      gauss("location","2km",11,12) //or linear,exp
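To see what these two functions would contribute for a single document, here is a rough Python sketch. Several things are assumptions: the hotel's coordinates and price are made up, the geo distance is computed with the haversine formula (the issue does not say which distance measure the patch uses), and `haversine_km` and `gauss_factor` are hypothetical helper names.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (assumed distance measure for geo fields)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def gauss_factor(distance, scale, offset=0.0, decay=0.5):
    """Gauss decay written as decay ** ((distance / scale) ** 2), an equivalent form."""
    effective = max(0.0, distance - offset)
    return math.exp(math.log(decay) * (effective / scale) ** 2)

# Made-up hotel: slightly north of the origin (11, 12), costing 13 Euros.
hotel_lat, hotel_lon, hotel_price = 11.01, 12.0, 13.0

distance_km = haversine_km(11, 12, hotel_lat, hotel_lon)
location_factor = gauss_factor(distance_km, scale=2.0)           # gauss("location","2km",11,12)
price_factor = gauss_factor(abs(hotel_price - 0.0), scale=20.0)  # gauss("price",20,0)
print(distance_km, location_factor, price_factor)
```

Both factors lie between 0 and 1, so each one can only pull a document's score down relative to a document sitting exactly at the origin.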

      Suppose you want to multiply the original score by these two functions; the request would look like this:

      b=mul( gauss("price",20,0),gauss("location","2km",11,12)) 
      &q={!boost b=$b v=$qq} 
      &qq={!edismax }*:* 
      &sort=score+desc 
      &fl=*,score
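When sending such a request from a script, the parameters need URL encoding, since they contain quotes, braces and `$`. A minimal sketch with Python's standard library follows; the host, port and collection name (`hotels`) are placeholders, and the gauss function syntax is the one proposed in this issue, so it only works on a Solr build that includes the patch.

```python
from urllib.parse import urlencode

# Parameters copied from the request above.
params = {
    "b": 'mul(gauss("price",20,0),gauss("location","2km",11,12))',
    "q": "{!boost b=$b v=$qq}",
    "qq": "{!edismax }*:*",
    "sort": "score desc",
    "fl": "*,score",
}

# Placeholder endpoint; adjust host and collection to your installation.
base_url = "http://localhost:8983/solr/hotels/select"
url = base_url + "?" + urlencode(params)
print(url)
```

Keeping the boost function in its own parameter (`b`) and referencing it with `$b` means only the dictionary entry has to change when swapping gauss for linear or exp.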

      Suppose your original search results match three hotels:

      • "Backpack Nap"
      • "Drink n Drive"
      • "BnB Bellevue"

      "Drink n Drive" is pretty far from your defined location (nearly 2 km) and is not too cheap (about 13 Euros), so it gets a low factor of 0.56.

      "BnB Bellevue" and "Backpack Nap" are both pretty close to the defined location, but "BnB Bellevue" is cheaper, so it gets a multiplier of 0.86 whereas "Backpack Nap" gets a value of 0.66.

       
       


            People

              Assignee: Unassigned
              Reporter: Dan Rosher (rosher)
                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 0.5h