Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-11136

XMLResponseParser.readDocument makes dangerous assumptions / fails when indent=true and [child] doc transformer

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 7.0, 7.1, 8.0
    • None
    • None

    Description

      Some buggy code in XMLResponseParser.readDocument causes it to indirectly assume that once it encounters a nested START_ELEMENT 'doc' (which it can recursively parse) the only other XML stream events it will find will either be an END_ELEMENT, or more 'doc' START_ELEMENTs...

      protected SolrDocument readDocument( XMLStreamReader parser ) throws XMLStreamException
      {
        if( XMLStreamConstants.START_ELEMENT != parser.getEventType() ) {
          throw new RuntimeException( "must be start element, not: "+parser.getEventType() );
        }
      
        // ...
      
        while( true ) 
        {
          switch (parser.next()) {
          case XMLStreamConstants.START_ELEMENT:
            depth++;
            builder.setLength( 0 ); // reset the text
            type = KnownType.get( parser.getLocalName() );
      
            // ...
            
            // NOTE: nothing in this loop modifies 'type' 
            // so the 'while' is totally inappropriate even if there was no bug
            while( type == KnownType.DOC) {
              doc.addChildDocument(readDocument(parser));
              int event = parser.next();                                // PROBLEMATIC
              if (event == XMLStreamConstants.END_ELEMENT) { //Doc ends
                return doc;
              }
            }
            
            // ...
      

      Because of how the server side XML Writer code works, it's currently true that child documents should always come "after" any other fields or transformers – but depending on that is sketchy. Where this code actually causes real problems is if the server/client uses indent=true because then the parser.next(); call (labeled PROBLEMATIC) can return XMLStreamConstants.CHARACTER (or XMLStreamConstants.WHITESPACE) because the blank space inbetween sibling child docs, or after the last child doc, causing the recursive call to readDocument(parser) to fail (because it expects to find the reader positioned at a START_ELEMENT)

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            hossman Chris M. Hostetter
            hossman Chris M. Hostetter
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment