Tika / TIKA-2549

NoSuchMethodException "CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)" parsing certain .docx files

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.17
    • Fix Version/s: 1.19, 2.0.0
    • Component/s: parser
    • Labels: None
    • Environment: Windows 10, JDK 1.8, Tomcat 8.5.x

       

    Description

      Parsing certain Word .docx files causes the stack trace below to be logged. This looks very similar to TIKA-792. If needed, I can provide a test file that reproduces the problem.

      java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)
      at java.lang.Class.getConstructor0(Class.java:3082)
      at java.lang.Class.getDeclaredConstructor(Class.java:2178)
      at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
      at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1954)
      at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1943)
      at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
      at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:927)
      at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1669)
      at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
      at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
      at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
      at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
      at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
      at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
      at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
      at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
      at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
      at org.labkey.search.model.LuceneSearchServiceImpl.parse(LuceneSearchServiceImpl.java:909)
      at org.labkey.search.model.LuceneSearchServiceImpl.processAndIndex(LuceneSearchServiceImpl.java:562)
      at org.labkey.search.model.AbstractSearchService._indexLoop(AbstractSearchService.java:977)
      at org.labkey.search.model.AbstractSearchService.lambda$new$4(AbstractSearchService.java:914)
      at java.lang.Thread.run(Thread.java:748)
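
The top frames of the trace show `Class.getDeclaredConstructor` being asked for a two-argument constructor `(SchemaType, boolean)` that the implementation class does not declare. As a minimal illustration of that kind of lookup failure using only the JDK, the sketch below may help. The class `OneArgImpl` and the helper `hasConstructor` are hypothetical stand-ins introduced for this example; they are not part of Tika, POI, or XMLBeans, and the sketch makes no claim about the root cause of this issue.

```java
import java.lang.reflect.Constructor;

public class ConstructorLookupDemo {

    // Hypothetical stand-in for a generated implementation class that
    // declares only a single-argument constructor.
    static class OneArgImpl {
        private final String type;

        OneArgImpl(String type) {
            this.type = type;
        }
    }

    // Returns true if cls declares a constructor with exactly these
    // parameter types; a missing constructor surfaces as
    // NoSuchMethodException, the same exception type seen in the trace.
    static boolean hasConstructor(Class<?> cls, Class<?>... parameterTypes) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor(parameterTypes);
            return ctor != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The one-argument constructor exists.
        System.out.println(hasConstructor(OneArgImpl.class, String.class));
        // No (String, boolean) constructor is declared, so the lookup fails.
        System.out.println(hasConstructor(OneArgImpl.class, String.class, boolean.class));
    }
}
```

Running `main` prints `true` and then `false`: the lookup succeeds only when the requested parameter list matches a declared constructor exactly.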

      Attachments

        1. blog post.docx (18 kB, Adam Rauch)

            People

              Assignee: Unassigned
              Reporter: Adam Rauch (adam@labkey.com)
              Votes: 0
              Watchers: 4