PDFBox / PDFBOX-5324

Support getting Unicode from the embedded TrueTypeFont cmap

Details

    • Patch

    Description

      For some special PDF files, such as the one I attached, some text is missing from text extraction. After some debugging and testing, I found that this can be fixed if we also use the cmap from the embedded TrueTypeFont.

      I will submit a patch soon.
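
To illustrate the idea in the description, here is a minimal sketch of the fallback order: try the ToUnicode mapping first, and only if it has no entry, resolve the character code to a glyph ID and look that glyph up in a reversed copy of the font's cmap. This is not the patch itself and it does not use PDFBox or FontBox classes. The class `CmapFallbackSketch`, its methods, and the plain maps standing in for the ToUnicode CMap, the code-to-glyph mapping, and the TrueType cmap are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy model only: plain maps stand in for the PDF ToUnicode CMap, the
// code-to-glyph mapping, and the embedded TrueType cmap. None of these
// names are PDFBox or FontBox APIs.
final class CmapFallbackSketch {

    // A TrueType cmap maps Unicode code point -> glyph ID. Text extraction
    // needs the reverse direction, so invert it once up front.
    static Map<Integer, Integer> invertCmap(Map<Integer, Integer> unicodeToGid) {
        Map<Integer, Integer> gidToUnicode = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : unicodeToGid.entrySet()) {
            // Several code points may share one glyph; keeping the lowest
            // is an arbitrary tie-break chosen for this sketch.
            gidToUnicode.merge(e.getValue(), e.getKey(), Integer::min);
        }
        return gidToUnicode;
    }

    // Resolve a character code to text: ToUnicode first, font cmap second.
    static Optional<String> toUnicode(int code,
                                      Map<Integer, String> toUnicodeCMap,
                                      Map<Integer, Integer> codeToGid,
                                      Map<Integer, Integer> gidToUnicode) {
        String mapped = toUnicodeCMap.get(code);
        if (mapped != null) {
            return Optional.of(mapped);
        }
        // Fallback: character code -> glyph ID -> Unicode via the reversed cmap.
        Integer gid = codeToGid.get(code);
        if (gid == null) {
            return Optional.empty();
        }
        Integer codePoint = gidToUnicode.get(gid);
        if (codePoint == null) {
            return Optional.empty();
        }
        return Optional.of(new String(Character.toChars(codePoint)));
    }

    public static void main(String[] args) {
        // Hypothetical data: code 1 has a ToUnicode entry, code 2 is only
        // recoverable through the font cmap, code 3 cannot be resolved.
        Map<Integer, String> toUnicodeCMap = Map.of(1, "A");
        Map<Integer, Integer> codeToGid = Map.of(1, 10, 2, 11, 3, 12);
        Map<Integer, Integer> gidToUnicode = invertCmap(Map.of(0x41, 10, 0x4E2D, 11));
        for (int code = 1; code <= 3; code++) {
            System.out.println(code + " -> "
                    + toUnicode(code, toUnicodeCMap, codeToGid, gidToUnicode));
        }
    }
}
```

In this model, the cmap is consulted only when the ToUnicode lookup fails, so files that already extract correctly would be unaffected. Whether the real fix should use the same precedence is a design question for the patch.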

            People

              tilman Tilman Hausherr
              iamgd67 qiang Liu
              Votes: 0
              Watchers: 3