How to output translated OCR?


As I understand it, the OCR feature reads text from a PDF and creates “hidden text”, which I can then export to XML and translate. So far so good (so great, actually!). But when I import the translation back, only the “hidden text” is translated, and it doesn’t seem to align with the original text. I need to produce a translated PDF I can send back to my client - what’s the best way? I feel like I’m so close yet so far.


Hi Marc,

When you run OCR, you’ll need to select the “Editable Text” option from the PDF Style drop-down. This produces a PDF without the original page image in front of the OCRed text, so the recognized text itself is what appears on the page. You should then be able to translate the PDF in the manner you describe. Please be aware that the result may not closely resemble the original, particularly if the original is highly formatted or contains illustrations.

Regards, Simon.