[SOLR-11136] XMLResponseParser.readDocument makes dangerous assumptions / fails when indent=true and [child] doc transformer - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 7.0, 7.1, 8.0
Component/s: None
Labels: None

Description

Buggy code in XMLResponseParser.readDocument implicitly assumes that once it encounters a nested 'doc' START_ELEMENT (which it parses recursively), the only XML stream events that follow will be either an END_ELEMENT or more 'doc' START_ELEMENTs:

protected SolrDocument readDocument( XMLStreamReader parser ) throws XMLStreamException
{
  if( XMLStreamConstants.START_ELEMENT != parser.getEventType() ) {
    throw new RuntimeException( "must be start element, not: "+parser.getEventType() );
  }

  // ...

  while( true ) 
  {
    switch (parser.next()) {
    case XMLStreamConstants.START_ELEMENT:
      depth++;
      builder.setLength( 0 ); // reset the text
      type = KnownType.get( parser.getLocalName() );

      // ...
      
      // NOTE: nothing in this loop modifies 'type' 
      // so the 'while' is totally inappropriate even if there was no bug
      while( type == KnownType.DOC) {
        doc.addChildDocument(readDocument(parser));
        int event = parser.next();                                // PROBLEMATIC
        if (event == XMLStreamConstants.END_ELEMENT) { //Doc ends
          return doc;
        }
      }
      
      // ...

Because of how the server-side XML writer code works, it's currently true that child documents should always come "after" any other fields or transformers, but depending on that is sketchy. Where this code causes real problems is when the server/client uses indent=true: the parser.next() call (labeled PROBLEMATIC) can then return XMLStreamConstants.CHARACTERS (or XMLStreamConstants.SPACE) because of the whitespace between sibling child docs, or after the last child doc. That event is not an END_ELEMENT and 'type' is still KnownType.DOC, so the loop runs again and the recursive call to readDocument(parser) fails, because it expects to find the reader positioned at a START_ELEMENT.
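The whitespace behavior described above can be reproduced without Solr. The following is a minimal sketch using only the JDK's StAX API; the class name IndentedChildDocs, its helper methods, and the sample XML strings are hypothetical and are not taken from the attached patch. It reports which event follows the first nested doc, and shows one way to count child docs that tolerates whitespace by tracking element depth instead of assuming what the next event will be.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class IndentedChildDocs {

  // Hypothetical sample responses: one parent doc with two child docs.
  static final String COMPACT =
      "<doc><str name=\"id\">p</str><doc><str name=\"id\">c1</str></doc>"
      + "<doc><str name=\"id\">c2</str></doc></doc>";
  static final String INDENTED =
      "<doc>\n  <str name=\"id\">p</str>\n  <doc>\n    <str name=\"id\">c1</str>\n  </doc>\n"
      + "  <doc>\n    <str name=\"id\">c2</str>\n  </doc>\n</doc>";

  static XMLStreamReader open(String xml) throws XMLStreamException {
    return XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
  }

  /** Returns the event type of the next() call made right after the first nested doc closes. */
  static int eventAfterFirstChild(String xml) {
    try {
      XMLStreamReader parser = open(xml);
      int docDepth = 0; // nesting level counted over 'doc' elements only
      while (parser.hasNext()) {
        int event = parser.next();
        if (event == XMLStreamConstants.START_ELEMENT && "doc".equals(parser.getLocalName())) {
          docDepth++;
        } else if (event == XMLStreamConstants.END_ELEMENT && "doc".equals(parser.getLocalName())) {
          if (docDepth == 2) {
            return parser.next(); // same position as the call labeled PROBLEMATIC
          }
          docDepth--;
        }
      }
      throw new IllegalStateException("no nested doc found");
    } catch (XMLStreamException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Counts direct child docs of the root doc; non-element events are simply skipped. */
  static int countChildDocs(String xml) {
    try {
      XMLStreamReader parser = open(xml);
      int depth = 0;
      int children = 0;
      while (parser.hasNext()) {
        switch (parser.next()) {
          case XMLStreamConstants.START_ELEMENT:
            depth++;
            if (depth == 2 && "doc".equals(parser.getLocalName())) {
              children++;
            }
            break;
          case XMLStreamConstants.END_ELEMENT:
            depth--;
            break;
          default:
            break; // CHARACTERS, SPACE, comments, etc. are ignored
        }
      }
      return children;
    } catch (XMLStreamException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("START_ELEMENT=" + XMLStreamConstants.START_ELEMENT
        + " CHARACTERS=" + XMLStreamConstants.CHARACTERS
        + " SPACE=" + XMLStreamConstants.SPACE);
    System.out.println("compact, event after first child: " + eventAfterFirstChild(COMPACT));
    System.out.println("indented, event after first child: " + eventAfterFirstChild(INDENTED));
    System.out.println("indented, child docs counted: " + countChildDocs(INDENTED));
  }
}
```

With the compact input the event after the first child is the next sibling's START_ELEMENT, which is the only case the loop in readDocument handles. With the indented input it is a whitespace event, which is the case that makes the recursive readDocument call throw.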

Attachments

SOLR-11136.patch (8 kB), attached 22/Jul/17 02:56 by Chris M. Hostetter

Issue Links

blocks: SOLR-10494 Switch Solr's Default Response Type from XML to JSON (Closed)

People

Assignee: Chris M. Hostetter

Reporter: Chris M. Hostetter

Votes: 0

Watchers: 2

Dates

Created: 22/Jul/17 02:56

Updated: 08/Jun/19 18:20

Resolved: 22/Jul/17 03:33
