By Jeremy Wagstaff
from the 21 February 2002 edition of the Far Eastern Economic Review, (c) 2003, Dow Jones & Company, Inc.

If you work for a corporation, institution or any set-up which considers a vision statement to be worthy of its resources, chances are you’ll be required to file regular reports on your comings, goings and sitting-still-and-doing-nothing sessions. And the chances are that no one will ever read these documents top to bottom. In fact, chances are that no one will read them at all. Heck, you probably don’t even read them. But they have to be done, or someone will notice and fire you.

But where does all this stuff go? In the old days we’d say with confidence, “landfill,” but in the digital age, no such luck. It all gets stored on some hard disk somewhere, no easier to find than its hard-copy forebears. Luckily, no one shows a pressing urge to find it, but what happens if they do? The sad truth is that all these zillions of e-mails, Word documents, Acrobat files, PowerPoint presentations and spreadsheets we produce don’t build us a supply of wisdom; they just get lost. In the lingo of the information game, it’s called unstructured data. Unlike its rich cousin, structured data, which gets sifted by sophisticated programs wearing tin hats called data miners, it sits idle, unnoticed and largely inaccessible.

But there are signs that software developers are taking a closer look at this forgotten corner of the information superhighway and figuring out ways of imposing order on this unruly mass.

Logik, from Coredge Software Inc. (www.coredge.com), will take a document, or a whole directory, or hard drive, and sift — or parse — the contents, extracting the most important phrases, or themes as Logik calls them. Logik also generates a summary of the document. It does all this remarkably well, giving you a sense of the document in question along with a list of themes, from names and concepts to phrases like “vision statement” — all in less time than it takes to say: “What exactly is a vision statement and why do we need one?”

This process is great for handling large numbers of documents that you might need to retrieve at some point, but may not have the time to read all the way through. A keyword search for a phrase or theme will throw up a list of files that include that phrase. And if you select one of those documents you get a summary. Logik will also translate documents between major European languages and Japanese. I was impressed by the intuitive, uncluttered feel of this software.

But while automatic summarizing is a great concept which has come a long way in recent years, it’s by no means the main function users want to see in programs that organize their documents for them. To me the most important part of the process is a simple one: Can I find the document I’m looking for quickly, and can I view it immediately? While users can view the original document in Logik, it opens in a new window, making it less seamless than the rest of the program’s functions.

Document Search

For this kind of feature — finding quickly and viewing — you need Enfish Corp’s (www.enfish.com) Find, which indexes your hard drive and lets you find anything from a single word to a complex Boolean string quickly. Another program that offers a similar feature is 80-20 Software’s Retriever (www.80-20.com/products/retriever/) though at present it doesn’t let you preview the whole document (future versions will).
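For the curious, the general idea behind index-then-search tools can be sketched in a few lines of Python. This is only an illustration of a generic inverted index with Boolean matching, not how Enfish Find or Retriever actually work; the function names (`tokenize`, `build_index`, `search`) and the sample file names are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, all_of=(), any_of=(), none_of=()):
    """Boolean search over the index.

    all_of  -> every word must appear (AND)
    any_of  -> at least one word must appear (OR), if any are given
    none_of -> none of these words may appear (NOT)
    """
    # Start from every indexed document, then narrow down.
    results = set().union(*index.values())
    for word in all_of:
        results &= index.get(word.lower(), set())
    if any_of:
        results &= set().union(*(index.get(w.lower(), set()) for w in any_of))
    for word in none_of:
        results -= index.get(word.lower(), set())
    return sorted(results)


# Hypothetical sample documents, standing in for files on a hard drive.
docs = {
    "report.txt": "Quarterly report on the vision statement",
    "memo.txt": "Memo about the budget report",
    "notes.txt": "Notes from the vision workshop",
}
index = build_index(docs)
print(search(index, all_of=["report"], none_of=["budget"]))
```

The slow part, reading every file, happens once when the index is built. After that, each query is just a handful of set operations, which is why this style of tool can answer quickly.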

For software that does straight summarizing, check out Copernic Technologies’ Summarizer (www.copernic.com/products/summarizer), which does a great job of abridging anything on the fly, whether it’s a Web page, a Microsoft document or next door’s cat.
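To give a feel for what automatic summarizing involves, here is a minimal Python sketch of one simple, well-known approach: score each sentence by how often its words appear in the whole document, then keep the top scorers in their original order. It is an assumption-laden toy, not the method Summarizer or Logik uses; the stopword list, the `summarize` function and its `max_sentences` parameter are all invented for the example.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list; a real tool would use a much longer one.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "are",
    "it", "that", "this", "for", "with", "up",
}


def content_words(text):
    """Return the lowercase words in text, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=2):
    """Keep the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        # Average document-wide frequency of the sentence's content words.
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


text = "Reports pile up on disk. Nobody reads the reports. The cat sat outside."
print(summarize(text, max_sentences=2))
```

Averaging the scores rather than summing them keeps long sentences from winning on length alone. Commercial summarizers presumably do a good deal more than this, but the basic notion of ranking sentences and keeping the best is the same.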

These programs make digging up any document you mislaid, or keeping track of colleagues’ documents, a whole lot easier. None of them comes cheap, however: Retriever is $50, Summarizer is $60 and Enfish Find is $70, while the standard version of Logik sells for $150. But to me that’s a good thing: These companies are aiming at a more discerning market with deeper pockets, in fact at exactly the sort of guys who spend their days writing reports that their bosses will never read.

16. January 2003
