
EUDI: Extract Tabular Data From Any File

PDF is a de facto standard for exchanging and archiving documents because PDF files are not easily editable and they preserve the document layout.

For this reason, much statistical data, such as corporate annual reports, stock performance and sales figures, is available only in the form of PDF files.

But a common problem with PDF files is that it is very difficult to extract data from them. Though there are some workarounds and tools for extracting text from PDF files, they fail to preserve the original document layout.

Say the accounts department sends you the annual salary sheet as a PDF that you want to export to Excel for analysis. Or the sales team sends its weekly sales report in a plain text email that you would like to represent as charts in Excel or even Microsoft Word.

Cogniview, an Israel-based company, has released EUDI 1.0, data extraction software that works with PDF files or with any Windows application that has a Print feature.

Like Acrobat, EUDI [End User Data Integrator] installs as a printer on your machine. When you want to extract data from PDF files, Windows CHM Help files, Notepad files or even web-based email, just print the file or browser window using the EUDI printer.

The document then opens inside the EUDI PDF to Excel conversion software, where you use the mouse to mark the area that you want to extract. EUDI recognizes the table layout and formatting and automatically splits the selected area into rows and columns.

It draws a marquee around the detected rows and columns. If the automatic detection makes a mistake, you can correct it manually, much as you would with cell selections in Microsoft Excel - merge cells, split rows or columns, or even delete them.
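
To make the idea of splitting a selected area into rows and columns concrete, here is a minimal Python sketch. It is not EUDI's actual algorithm, which Cogniview has not published; it simply assumes the printed text is aligned with spaces and treats any character position that is blank in every line as a column gap. The function name `split_columns` and the sample report are made up for illustration.

```python
def split_columns(lines):
    """Split space-aligned text lines into a table of cells.

    Illustrative heuristic only (not EUDI's method): a column gap is any
    character position that is blank in every line of the selection.
    """
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]

    # Mark the positions that contain a space in every row.
    blank = [all(row[i] == " " for row in padded) for i in range(width)]

    # Collect (start, end) spans of consecutive non-blank positions.
    spans = []
    start = None
    for i, is_blank in enumerate(blank):
        if not is_blank and start is None:
            start = i
        elif is_blank and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, width))

    # Cut every row at the same spans and trim the padding from each cell.
    return [[row[a:b].strip() for a, b in spans] for row in padded]


# Hypothetical sales report, as it might arrive in a plain text email.
report = [
    "Region   Q1    Q2",
    "North    120   135",
    "South     98   110",
]
table = split_columns(report)
for row in table:
    print(row)
```

For the sample report this yields three rows of three cells each, for example `['North', '120', '135']`. A heuristic this simple breaks as soon as a cell is wider than its column or the text uses a proportional font, which is why being able to merge and split cells by hand matters.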

The extracted text is shown in real time in the bottom pane. Once you are satisfied with the adjusted layout, you can export it to Microsoft Word or Excel, or just copy it to the clipboard. The file can also be saved in an EUDI-specific format for editing and exporting later.

EUDI's user interface is very intuitive, and you can get started immediately without even reading the help manual. There's also a nice video screencast on their website if you would like to see the software in action.

Download EUDI - This Windows-only software supports Office 97, 2000, XP and 2003. Product Activation is mandatory for using the software. It can be done online or via Fax/E-mail.

Don't confuse EUDI with OCR software like OmniPage or ABBYY FineReader. EUDI extracts only data that is embedded as text; it won't interpret a graphic image such as a text logo.