Order in which text is extracted from Infix


We’ve had brief conversations about this, but I’d like to ask on the forum.

When I export text from a PDF in XML format, I’ve noticed that the text does not appear in the XML file in the ‘logical’ order (logical meaning the sequence in which a reader would probably read the text).

This means that the translator often finds himself translating ‘fragments’.

In the ‘bird’ project I sent you, the PDF displays two of the client’s pages on one page of Infix, i.e. Infix page 15 shows pages 29 and 30 of the client’s PDF.

The problem is that Infix seems to have put the right-hand page (2, for example) before the left-hand page (1) in the XML file.

Not only is it difficult to translate, but when I produce a text version for the client, the pages are reversed too,

i.e. 2, 1, 4, 3, 6, 5, etc.

So tonight, I’m wondering if Infix can look at this for me

Incidentally, it may be worth making this an option (left page or right page first), as there are right-to-left languages where the page on the right comes before the page on the left.
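
As a stopgap for the client's text version, the reversed pairs could be swapped back outside Infix. Below is a minimal Python sketch, assuming the exported text has already been split into one chunk per client page. The function name is my own invention and not part of Infix.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def swap_spread_halves(pages: Sequence[T]) -> List[T]:
    """Swap each adjacent pair of pages, e.g. 2, 1, 4, 3 becomes 1, 2, 3, 4.

    Hypothetical helper: assumes every spread came out right half first.
    A trailing unpaired page is left where it is.
    """
    fixed: List[T] = []
    for i in range(0, len(pages), 2):
        pair = list(pages[i:i + 2])   # one spread (two pages, or one at the end)
        fixed.extend(reversed(pair))  # put the left-hand page first
    return fixed


if __name__ == "__main__":
    print(swap_spread_halves([2, 1, 4, 3, 6, 5]))
```

For a right-to-left document where the exported order is already the wanted one, the list would simply be left untouched.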

Thanks in advance for your time


We extract text from a PDF document in the order in which it appears in the document definition. If galleys in a document read as one continuous piece of text but come out ‘fragmented’ in the XML output, you can link them before exporting the document using the Text Plus tool. After they are linked with the Text Plus tool, the galley should come out sequentially in the XML output.

Infix should not swap pages around when you import the XML into the document. Can you email us the PDF (before and after export) and the XML file you are importing to support@iceni.com? We will then have a look at the issue for you.




It’s the bird migration project, if you still have it.