Can Infix recognize soft hyphens in a PDF?
If so, why are hyphens retained when I extract text from the PDF, even though “De-hyphenate words” is selected? If I copy the same text from Adobe Reader, the hyphen is not retained for that word, which suggests it is a soft hyphen in the PDF. Is there a way to resolve this issue?
My customer is pressing me hard on this hyphen issue, so a quick resolution would be appreciated.
Infix does try to remove soft hyphens when exporting.
The way it works is discussed in another thread: http://www.iceni.com/forum/viewtopic.php?id=141.
However, there can always be issues with particular PDFs. Could you send me yours to look at? I will then be able to advise you on the cause of the unwanted hyphens. Please send your example to email@example.com
I checked the PDF that you sent us, and Infix de-hyphenates the extracted text correctly.
You are probably not doing one of the following:
You need to set your spelling language to “Dutch”, as the text in the document is in Dutch. To do this, select “File->Preferences…” from the main menu. Click on the “Spelling” tab, select “Dutch” from the “Language:” combo box and press “OK”. Then follow the instructions to install the Dutch language pack.
When exporting the document, click on the “Format…” button on the “Export Pages” dialog to open the “Export Format” dialog. On the “Text” tab, ensure that the “De-Hyphenate words” check box is checked.
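If hyphens still slip through after export, the extracted text can be cleaned up outside Infix. The following is only a rough Python sketch of dictionary-based de-hyphenation, not Infix's actual implementation. It is an assumption that Infix works along similar lines, but it illustrates why the spelling language could matter: a line-end hyphen is removed only when the joined word is found in the word list. The function name `dehyphenate` and the `dictionary` word set are hypothetical.

```python
import re

SOFT_HYPHEN = "\u00ad"  # Unicode soft hyphen (U+00AD)

def dehyphenate(text, dictionary):
    """Remove soft hyphens and rejoin words split across line breaks.

    `dictionary` is a set of lowercase words for the document's language
    (hypothetical; supply your own word list).
    """
    # Soft hyphens are only line-break hints, so they can always be dropped.
    text = text.replace(SOFT_HYPHEN, "")

    def join(match):
        joined = match.group(1) + match.group(2)
        # Remove the hyphen and line break only if the result is a known word.
        if joined.lower() in dictionary:
            return joined
        # Otherwise leave the text untouched; the hyphen may be genuine.
        return match.group(0)

    # Word fragment, hyphen, line break, word fragment.
    return re.sub(r"(\w+)-\s*\n\s*(\w+)", join, text)
```

For example, with a word list containing "voorbeeld", the text "een voor-" followed by "beeld" on the next line becomes "een voorbeeld". With an empty or wrong-language word list, the hyphen is left in place.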