Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
1.5
Description
org.apache.tika.parser.microsoft.WordExtractor.parseWord6() uses the deprecated Word6Extractor.getParagraphText() method. getParagraphText() is supposed to return a String[] with one element for each paragraph in the text. The replacement is getText(), which returns the text as a single String and leaves the separation of paragraphs, cells, etc. up to the implementation. I'm not sure, at this point, how the POI WordExtractor separates them.
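One possible direction, sketched below, is to call getText() and split the result back into paragraphs inside Tika. This is only a sketch under an assumption: that the String returned by getText() separates paragraphs with line breaks (\r\n, \n, or \r), which is exactly the open question above and would need to be checked against POI. ParagraphSplitter and splitParagraphs are hypothetical names, not part of Tika or POI.

```java
import java.util.ArrayList;
import java.util.List;

class ParagraphSplitter {
    // Hypothetical helper, not part of Tika or POI.
    // ASSUMPTION: paragraphs in the getText() result are separated by
    // \r\n, \n, or \r. The real separator used by POI is unverified.
    static List<String> splitParagraphs(String text) {
        List<String> paragraphs = new ArrayList<>();
        if (text == null || text.isEmpty()) {
            return paragraphs;
        }
        for (String p : text.split("\\r\\n|\\n|\\r")) {
            // Skip empty pieces produced by consecutive line breaks.
            if (!p.isEmpty()) {
                paragraphs.add(p);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        // Stand-in for the String that getText() would return.
        String text = "First paragraph\r\nSecond paragraph\n";
        for (String p : splitParagraphs(text)) {
            System.out.println(p);
        }
    }
}
```

If the separator turns out to be something else (for example a table cell marker), only the split pattern would need to change, and parseWord6() could keep emitting one element per paragraph as it does today.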