Solr / SOLR-14189

Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.5, 9.0
    • Component/s: query parsers
    • Labels: None

    Description

      The edismax and some other query parsers treat pure-whitespace queries as empty queries, but they use Java's String.trim() method to normalise queries. That method only strips characters with code points 0-32 (up to and including U+0020). Other whitespace characters exist, such as U+3000 IDEOGRAPHIC SPACE, which bypass the test and lead to 400 Bad Request responses. Compare, for example, /solr/mycollection/select?q=%E3%80%80&defType=edismax with /solr/mycollection/select?q=%20&defType=edismax. The first fails with the exception:

      org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "<EOF>" at line 1, column 0. Was expecting one of: <NOT> ... "+" ... "-" ... <BAREOPER> ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... <REGEXPTERM> ... "[" ... "{" ... <LPARAMS> ... "filter(" ... <NUMBER> ... <TERM> ...
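
The difference between the two requests can be reproduced outside Solr. The following is a minimal sketch, not the parsers' actual code: the class and the emptyAfterTrim helper are hypothetical names used only to illustrate a trim()-based emptiness test of the kind described above.

```java
class TrimVersusWhitespace {
    // Hypothetical stand-in for a trim()-based zero-length test.
    static boolean emptyAfterTrim(String s) {
        return s.trim().isEmpty();
    }

    public static void main(String[] args) {
        String asciiSpace = " ";        // U+0020, sent as q=%20
        String ideographic = "\u3000";  // U+3000, sent as q=%E3%80%80

        // trim() strips only characters <= U+0020, so U+3000 survives.
        System.out.println(emptyAfterTrim(asciiSpace));        // true
        System.out.println(emptyAfterTrim(ideographic));       // false
        // Character.isWhitespace does classify U+3000 as whitespace.
        System.out.println(Character.isWhitespace('\u3000'));  // true
    }
}
```

A query consisting of U+3000 therefore passes a trim()-based emptiness test and goes on to the parser, while a query consisting of U+0020 is caught by it.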
      

      PR 1172 updates the dismax, edismax and rerank query parsers to use StringUtils.isWhitespace(), which also recognises whitespace characters outside the 0-32 range, such as U+3000.
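
A whitespace-aware check of this kind can be sketched with the JDK alone. This is an illustration under stated assumptions, not the code from the PR: isBlankQuery is a hypothetical helper, and it relies on Character.isWhitespace rather than on any particular StringUtils implementation. Treating null as blank is also an assumption made for this sketch.

```java
class WhitespaceAwareCheck {
    // Hypothetical helper: true for null, for the empty string, and for
    // strings whose every code point satisfies Character.isWhitespace.
    static boolean isBlankQuery(String q) {
        return q == null || q.codePoints().allMatch(Character::isWhitespace);
    }

    public static void main(String[] args) {
        System.out.println(isBlankQuery(" "));          // true  (U+0020)
        System.out.println(isBlankQuery("\u3000"));     // true  (U+3000)
        System.out.println(isBlankQuery("greetings"));  // false
        // Character.isWhitespace excludes non-breaking spaces such as U+00A0.
        System.out.println(isBlankQuery("\u00A0"));     // false
    }
}
```

Note that Character.isWhitespace deliberately excludes the non-breaking spaces U+00A0, U+2007 and U+202F, so a check built on it still treats those characters as query content.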

      Prior to the change, rerank behaves differently for U+3000 and U+0020, as annotated on the two requests below. With the change, both give the "reRankQuery parameter is mandatory" message:

      q=greetings&rq={!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80 - generic 400 Bad Request

      q=greetings&rq={!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20 - 400 reporting "reRankQuery parameter is mandatory"
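
To confirm which characters the two rqq values carry, the percent-encoded parameters can be decoded with the JDK. This is a small sketch; the class and method names are hypothetical, and URLDecoder.decode(String, Charset) assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DecodeRerankParam {
    // Decodes a percent-encoded value as UTF-8 and names its first character.
    static String firstCodePoint(String encoded) {
        String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        return String.format("U+%04X", decoded.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(firstCodePoint("%E3%80%80"));  // U+3000
        System.out.println(firstCodePoint("%20"));        // U+0020
    }
}
```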


            People

              Assignee: uschindler Uwe Schindler
              Reporter: andywebb1975 Andy Webb

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2h