Solr / SOLR-16019

UTF-8 parsing errors for parameters should cause a HTTP 400 status code, not 500

Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.0, 8.11.2
    • Component/s: None
    • Labels: None

    Description

      When I make a request to a URI like /solr/my_core/query?q=%C0, I get an HTTP 500 status code with a stack trace originating at

      org.apache.solr.common.SolrException: URLDecoder: Invalid character encoding detected after position 2 of query string / form data (while parsing as UTF-8)
      at org.apache.solr.servlet.SolrRequestParsers.decodeChars(SolrRequestParsers.java:421)

      The obvious reason is that the q parameter value looks like the first byte of a multibyte UTF-8 sequence, but that sequence is incomplete/invalid. I have seen a few more instances of this in our monitoring, some of them surfacing in different places. [Those other cases are unrelated; I will file separate issues.]
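
To illustrate why a lone %C0 cannot be decoded, here is a minimal sketch using only the JDK's strict UTF-8 decoder. It is not Solr's own decoding code; the class name `StrictUtf8Demo` and the helper `isValidUtf8` are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Illustrative only: strict UTF-8 validation with the JDK, not Solr's decoder.
class StrictUtf8Demo {

    /** Returns true if the bytes form a complete, valid UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // %C0 percent-decodes to the single byte 0xC0, which has the bit
        // pattern of a two-byte lead byte but is not followed by anything.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC0}));              // false
        // %C3%A4 is the complete two-byte encoding of "ä".
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC3, (byte) 0xA4})); // true
    }
}
```

Since the input comes straight from the client's query string, a failure at this point is a problem with the request rather than with the server.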

      Instead of HTTP 500, a status code such as HTTP 400 (Bad Request) would be more appropriate. It would also make processing much easier in downstream systems (which have to deal with Solr’s response) if this class of errors could be recognized.

      Also, if I look at the place where the exception is being thrown (https://github.com/apache/solr/blob/releases/lucene-solr/7.7.3/solr/core/src/java/org/apache/solr/servlet/SolrRequestParsers.java#L419-L422), care was taken to use the `ErrorCode.BAD_REQUEST` status. This information, however, seems to be lost along the way.
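
One generic way a status code can get lost is when the original exception is wrapped and only the outermost exception is inspected. Whether that is what happens in Solr here is not established by this report. The sketch below uses hypothetical stand-in types (`CodedException`, `httpStatusFor`), not Solr classes, to show how a code attached deeper in the cause chain could still be recovered.

```java
// Hypothetical stand-ins for illustration; these are not Solr classes.
class StatusMappingDemo {

    /** An exception that carries an HTTP-style status code. */
    static class CodedException extends RuntimeException {
        final int code;

        CodedException(int code, String message, Throwable cause) {
            super(message, cause);
            this.code = code;
        }
    }

    /** Walks the cause chain and returns the first 4xx/5xx code found, else 500. */
    static int httpStatusFor(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof CodedException) {
                int code = ((CodedException) cur).code;
                if (code >= 400 && code < 600) {
                    return code;
                }
            }
        }
        return 500; // nothing in the chain carried a usable code
    }

    public static void main(String[] args) {
        Throwable badInput = new CodedException(400, "invalid UTF-8 in query string", null);
        // A generic wrapper hides the code from anyone who checks only the top level.
        Throwable wrapped = new RuntimeException("request parsing failed", badInput);
        System.out.println(httpStatusFor(wrapped));                        // 400
        System.out.println(httpStatusFor(new IllegalStateException("x"))); // 500
    }
}
```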

          People

            Assignee: janhoy Jan Høydahl
            Reporter: mpdude Matthias Pigulla
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 4h