Iceni Technology Forum

Preparing PDFs for a text database


I need to find out whether I can prepare PDFs in text format for a database.

At the moment, I save the PDFs as text files, but the page numbers, headers and footnotes are causing me grief because they ruin the database. Page breaks are also a problem.

So I take them out manually, but this leaves gaps everywhere and splits sentences.
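If the text has already been exported, the split sentences can also be repaired afterwards. The Python sketch below is only an illustration, not an Infix feature. It assumes plain text with one hard-wrapped line per line of the PDF, and it treats a blank line as a real paragraph break only when the line before it ends with sentence punctuation; otherwise it assumes the gap was left by deleted material (a page number, header or footnote) and joins across it. The function name `reflow` is hypothetical.

```python
import re

# Sentence-ending punctuation, optionally followed by a closing quote or bracket.
_TERMINAL = re.compile(r"[.!?:]['\")\]]?$")


def reflow(text: str) -> str:
    """Join hard-wrapped lines into paragraphs (heuristic sketch).

    A blank line closes a paragraph only if the previous line finished a
    sentence; otherwise the gap is assumed to come from removed material.
    """
    paragraphs = []
    current = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Gap: close the paragraph only after a complete sentence.
            if current and _TERMINAL.search(current[-1]):
                paragraphs.append(" ".join(current))
                current = []
            continue
        if current and current[-1].endswith("-") and line[:1].islower():
            # Re-join a word hyphenated across a line break.
            # (Heuristic: this also merges genuine compounds such as "well-known".)
            current[-1] = current[-1][:-1] + line
        else:
            current.append(line)
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = "The first sentence is\n\n\ncontinued here.\n\nNew para-\ngraph starts."
    print(reflow(sample))
```

Run on the sample, the sentence broken by the gap is rejoined and the hyphenated "para-graph" becomes one word, while the genuine paragraph break is kept.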

Basically, I’d like to know whether, after taking that material out, I can select the whole document and ask Infix to reflow everything so that my exported text is ‘perfect’, or as near to it as possible.

I could show you on Skype what I’d like, but thought I’d use the forum.

Can you help?



There is a way to get Infix to export and reflow a sequence of pages as if they are all linked in the same story.
You have to make an article thread spanning the pages concerned.

Go to the first page of the document containing text you want to export.
Select the article thread tool from the toolbar (it looks like a wiggly snake).
Draw a box around the text you want to export.
Since this box will be the template for boxes across the document, make sure it covers not only the text you need on the current page, but is large enough to cover the important text on any following pages too.

Double-click in the box you just drew.

Name the Article and then check the ‘Duplicate’ check-box. You can then enter the page range across which you want the box to be replicated. Press OK and Infix will link up all the boxes.

Now choose File->Export Articles… and export the articles in the document.
You may need to use Format… to choose an output format such as plain text.

Infix will export the text and try to link up paragraphs from the boxes spanning that range of pages.

I’ll try this when I have time.

The text box would have to be big enough to contain all the text on all the pages.
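One way to size that box is to take the union of the text areas on each page. A minimal Python sketch, assuming you have already measured each page's text area as (left, bottom, right, top) in PDF points (for example by reading the coordinates off each page yourself); the function `covering_box`, its `margin` parameter and the sample numbers are all hypothetical:

```python
from typing import Iterable, Tuple

# (left, bottom, right, top) in PDF user-space points, origin at bottom-left.
Box = Tuple[float, float, float, float]


def covering_box(page_boxes: Iterable[Box], margin: float = 0.0) -> Box:
    """Smallest rectangle containing every per-page text box, plus a margin."""
    boxes = list(page_boxes)
    if not boxes:
        raise ValueError("need at least one page box")
    left = min(b[0] for b in boxes) - margin
    bottom = min(b[1] for b in boxes) - margin
    right = max(b[2] for b in boxes) + margin
    top = max(b[3] for b in boxes) + margin
    return (left, bottom, right, top)


if __name__ == "__main__":
    # Hypothetical measurements for two pages whose body text sits slightly differently.
    pages = [(72, 100, 520, 700), (80, 90, 510, 720)]
    print(covering_box(pages, margin=2))
```

If the resulting box reaches into the running header or footer area on some pages, those lines will be picked up as well, so this works best when the body text sits in roughly the same place on every page.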

Try programmatically…PDF from Database
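Doing the clean-up programmatically could look something like the Python sketch below. It assumes the PDF has already been converted to plain text with a form feed (\f) between pages, which is what pdftotext does by default. It drops lines that are only page numbers ("12", "Page 3 of 3") and lines that appear as the first or last line on at least half the pages (running headers and footers). Footnotes are not handled, and headers that change from page to page (e.g. ones containing the page number) will slip through. `strip_page_furniture` and `min_share` are hypothetical names.

```python
import re
from collections import Counter

# Lines consisting only of a page number, e.g. "12", "Page 3" or "Page 3 of 10".
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE)


def strip_page_furniture(text: str, min_share: float = 0.5) -> str:
    """Remove page breaks, page-number lines and repeated running headers/footers."""
    pages = text.split("\f")

    # Count how many pages each first/last non-blank line appears on.
    edge_counts = Counter()
    for page in pages:
        lines = [ln.strip() for ln in page.splitlines() if ln.strip()]
        edge_counts.update(set(lines[:1] + lines[-1:]))
    threshold = max(2, min_share * len(pages))
    furniture = {line for line, n in edge_counts.items() if n >= threshold}

    kept = []
    for page in pages:
        for line in page.splitlines():
            s = line.strip()
            # Note: a body line identical to a header is removed too.
            if PAGE_NUMBER.match(s) or s in furniture:
                continue
            kept.append(line)
        kept.append("")  # leave a blank line where the page break was
    return "\n".join(kept).strip()


if __name__ == "__main__":
    doc = ("Annual Report\nFirst body line\n1\n\f"
           "Annual Report\nSecond body line\n2\n\f"
           "Annual Report\nThird body line\nPage 3 of 3")
    print(strip_page_furniture(doc))
```

The output still has blank lines where page breaks were, so sentences that ran across a page will need to be rejoined afterwards, for instance by merging lines that don't end with sentence punctuation.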