Tag Archives: Index

Some PaperPort Tips

Further to my column in this week’s WSJ.com/AWSJ on PaperPort Pro and PaperMaster Pro, here are some tips on getting the most out of the former from Bob Anderson (ScanSoft Regional Director Asia Pacific and Japan):

Tip 1 – Using PaperPort PageViewer as a separate application and displaying PaperPort across two monitors. By default when you double click on a file it opens PageViewer as an integrated window over the PaperPort Desktop. You can change this behaviour under PaperPort Options: Desktop: Page View. Select “Display items in: PageViewer Application”. Now the PaperPort PageViewer “floats” over the PaperPort Desktop and can be moved independently. Unfortunately a single monitor does not offer enough space to make this productive. To really get the benefit of the feature I have an external monitor connected to my laptop. You can also put an additional video card in a desktop computer. The Display control panel in Microsoft Windows will allow you to extend your Windows desktop across two monitors. I keep PaperPort open on my right monitor at all times and when I double click on a file it appears in the PageViewer on my left monitor. Once you start working this way you will never go back to a single monitor again.

I can now instantly “zoom” onto a document without covering the PaperPort Desktop and with the new Page Thumbnails I can click and drag pages from the PaperPort PageViewer onto the PaperPort Desktop and vice-versa. This is essential when working with documents that are all text, like a contract, where one page thumbnail looks like the next because there are no distinguishing graphics or pictures.

Tip 2 – Using Page Names to quickly identify the content of individual pages.

On multi-page PDF documents I can quickly identify a particular page by sight (such as the signature page of a contract or the beginning of a topic) by using Page Names. Click on a document and select Page Thumbnails. Click on the Page Thumbnail you wish to name. Click on the Item Properties button on the command bar. The Item Properties will appear in the Function Pane to the left. There will be a Page (x) of (x) field where you can type a page name. Enter a relevant page name and click the Save button.

If you need to see the full page to properly identify what the page name should be you can also add page names using the PaperPort PageViewer. Double click to open the document in the PaperPort PageViewer. Click on the page you wish to name. Select File: Item Properties under the file menu. The item properties will appear as a floating dialog. Enter the page name in the Page (x) of (x) field and click the Save button.

Tip 3 – Quickly finding an important page within a document. (Searching for the exact location of an annotation within a PDF document)

If I need to find a particular page at a later date I will mark it with a standard annotation so I can search for it later. I use annotations that make sense to me such as “please review”, “please call” and “important”.

I use the “Note” tool in the PaperPort PageViewer to mark the pages with these annotations. I then add the file to the All-in-One search index. When I need to go to that page again I do a search using the All-in-One search to locate the file by using “Name, author, keyword…” search with only “Annotations” checked and “Use All-in-One index” unchecked. Even though I initially added the entire file to the All-in-One index I only want PaperPort to find my note annotations and not all my other files containing the same words as my notes. This will limit my searches to quickly find the specific files I am trying to locate.

Once the file is located, if the annotation is buried on a page deep within a long document, I double-click it to open it in the PaperPort PageViewer. I then select Tools: Find in the PageViewer menu and search the document for annotations only. This immediately jumps me to the page that I had been trying to locate.

dtSearch: Not Dead. Not Yet.

Despite my love of indexers (and I’m in Seventh Heaven now that all the big boys are throwing out desktop search engines like it was a Bay City Rollers’ reunion) I still stick with dtSearch for most of my searching. It’s expensive, it’s tough, it’s ugly, but it gets the job done. And now they’ve added a feature which might not get you too excited, but for me is key: better viewers (or file parsers, if you want to get technical) for Microsoft documents.

Version 6.5 of dtSearch Desktop (free to those who are 6.x users) means you can see Word documents or Excel spreadsheets or PowerPoint presentations in their original glory. Now folks are going to say, well I can do that with X1 or more or less any of the other indexers that include built-in viewers, but I’d like to correct you: You can’t. Well you can if you don’t have big files, but over a certain size, you will get an error. And I have big Word files, all tabled up, and they nearly always don’t appear. In dtSearch they did come up, but not with any decent formatting. Now they do. (Other features listed here.)

dtSearch, long the mainstay of a once sparse field, is not going away quietly. Good for them.

Another Indexing Program…

Further in my pursuit of the perfect search and indexing software, Sean Franzen points me to Vancouver-based Wisetech Software and their Archivarius 3000 which, he says, “recognizes more file formats than DiskMeta, allows you to index data on network drives and locate your indexes on network drives. The price is very competitive also. Development has been very active for the past six months.”

It looks interesting and worth checking out. On initial glance it lacks the thing I love most about X1, Enfish and the others: a preview pane built in that lets you view the whole file, not just the context of the found string. Archivarius 3000 costs between $20 and $45, depending on whether you’re a student, an individual or a commercial entity.

Ukraine Weighs In On The Search Stakes

Another addition to my index of indexing programs: diskMETA, from <META> Inc. “the largest search engine provider in Ukraine and a leader in Cyrillic multilingual search engine morphology technologies”.

A press release issued today says diskMETA is one of the fastest desktop search engines, and is available both as freeware and shareware. The program “is intended for extra large data volumes, UP TO 100 GIGABYTES. It can create up to 100 indexes, index up to ONE MILLION various files. The search time is never more than ONE SECOND”. It works on all Windows platforms (98 or higher).

The file search works with Office document formats (DOC, XLS, RTF, TXT), HTML pages, CHM, PDF files, ZIP and RAR archives. There are three versions: Lite (free), Personal ($50) and Pro ($100), which supports morphological English searches and intranet-wide searches.

The search technology used in diskMETA, apparently, “has a long and glorious history. It is used for a decade in the nationwide biggest and most popular web search engine www.meta.ua, in a series of search tools for web-sites and CD-rooms installed in most governmental and financial national institutions” in Ukraine.

My tupennies’ worth? It’s fast, intuitive and unfussy. You can also view the raw text in a special preview window, but it doesn’t support preview in the same way that X1, dtSearch or the new Copernic Desktop Search do. That said, it’s great to see a new player on the block, especially one so enthusiastic.

This week’s column – Hard-Disk Hunters

This week’s Loose Wire column is about hard disk indexers, a topic familiar to those of you reading this blog. 

CONSIDER THIS: Your hard drive probably contains more info than you could ever imagine. Say you’ve got a modest hard drive of 20 gigabytes. That’s the equivalent of about 20 copies of the Encyclopedia Britannica. Or 20,000 floppy disks. That’s a lot of stuff, and, chances are, you have little or no idea what’s actually on there or, if you do, how to find it. Be ignorant no more: Help is at hand.

Now, I know we’ve been here before. One of my bugbears has been the lack of a decent program to find files on your computer. By this I don’t mean looking for anything particularly obscure, just your last letter home, or the e-mail you got from the accounts department demanding your expense report from covering the Burma Campaign. Simple stuff, and it’s always annoyed me that Internet search engines do this so much better on the World Wide Web than they do on our own Word files or e-mails. (Mac fans will chime in at this point and say they’ve always had this feature; Windows fans will say XP has its own search-and-index function. But, with respect to both groups, I’d say neither is particularly useful or, in the case of XP’s, practical. It’s clunky, hard to figure out, and slows your computer down to a snail’s pace.) But now sharp new programs promise to do something about this, and they are aimed directly at the casual user who just wants to find stuff, without a lot of fuss.

In the column I mention most of the indexers listed here. Full text at the Far Eastern Economic Review (subscription required, trial available) or at WSJ.com (subscription required). Old columns at feer.com here.

Another Way To Find Stuff At Home and On The Net

Here’s another one of those tools that should have been around a long, long time ago (in fact one was but it went away: AltaVista Discovery. And don’t get me started on Enfish Tracker). It’s the desktop search engine that indexes your hard drive, the net, all that kind of stuff. Welcome to HotBot Desktop.

HotBot’s Desktop will let you “search local files, email (Outlook & Outlook Express), browser history, and RSS subscriptions. The HotBot Desktop creates a local index to allow you to quickly find local content as you are on or offline.” It also comes with an RSS feed reader and a built-in pop-up blocker.

ResourceShelf says it’s by no means perfect, saying there are some bugs that Lycos intend to fix in later versions. It will also only work with Internet Explorer. Anyway, it’s great news that these things are back. I’m building up a list of indexing engines here. Please let me know if I’ve missed any.

Offer: Enfish Going Cheap, and Looking It Too

 I’m a tad worried about Enfish. Once the great white hope of computer indexing, I can’t help feeling they’re floundering. I just received an email — about five copies of it, to be precise — which seems to offer a version of Enfish’s Find product at a discount.
 
 
From what I can figure out in the email and on the website, Enfish Find can be bought for $44.95 – 10% off the full price. Fair enough, but why such an incomprehensible email, and why the typos? Enfish is still a good product, but it’s facing stiff competition from the more energetic X1 Technologies. Sloppy promotions aren’t going to help.

Update: X1 Improvements On Their Way

 Further to my posting about X1, the indexing program, X1’s chief cook Mark Goodstein says they are promising an update soon that includes:
  • PDF (Acrobat) and Zip contents indexing.
  • Attachments indexing and display (for Outlook and Eudora).
  • Tighter Outlook integration (responding, moving, etc., from within X1).
  • Some improvements in the interface and performance stuff.
Sounds good. It’s good to see new stuff being added so fast.

Q&A: X1 and The Future of Finding Stuff

  Full text of email interview with Mark Goodstein of X1 (see my column in WSJE and FEER this week)
 
— Who are you aiming at with this product?
 
Not to be too simplistic, we’re aiming at two groups: consumers and professionals, specifically those who have a lot of email and files and who spend more time than they want searching for information on the Internet or intranet. The free version offers a substantial set of features that we hope will entice legions of users to use the product at home and work, for all their information finding needs. The pro version has features that power users will demand, like indexing network drives and viewing files in their native formats, regardless of whether they have the native application installed. Both versions will continue to get richer over the coming weeks and months, as we add more consumer features, like media-specific tabs (pictures, music, etc.) and more powerful web searching and eCommerce-related features. The pro version will get support for indexing attachments, contacts, events, PDFs, and archives. We think these two prongs will encourage great numbers of people to use the product and will eventually allow us to crack the enterprise market, which is straining for simple interfaces to complex data: X1’s specialty.
 
 
— I’ve always thought this kind of product was really basic, and when Enfish came out in 1999, I assumed it would be massive. But it wasn’t, and nothing since has really caught on. Why is this? Does it have to do with new paradigms, or just the product wasn’t right, or people aren’t ready for it, or what?
 
Our approach isn’t that much different than others, but we’re staying focused on simplicity and speed. X1’s interface is visceral and innovative: allowing the user to winnow the searches down from all to just a few, instantly, as opposed to the normal none to many (sometimes with a coffee break) of today’s search engines and desktop search utilities. This interface gives the user the feeling of control over chaos, which is hard to underestimate. Many people have built up complicated directory structures for storing their files and email, all in an effort to just keep track. X1 allows the user to stop caring about the organization and more about the work!
 
This is a difficult question to answer because it seems like Enfish and others have done many of the things we’ve done, but several years in advance. I’m not sure why they failed to catch on like you assumed, but I don’t think the fundamentals have changed. The amount of data we’re responsible for is large and always growing; it’s in disparate formats and locations; the tools that help users wade into this sea of information are, maybe justifiably, difficult to understand and use; and there’s no incentive for market leaders, like Microsoft, to innovate. It doesn’t help that the dotcom bubble excited expectations and the companies responsible never followed through.
 
That said, we really do think we’ve created a beautiful interface to complicated data sets. We think of it as something between a spreadsheet and a database. So, like you said, Enfish should have caught on big, and didn’t. Just like databases were supposed to catch on big at the end-user level, and didn’t. Spreadsheets have tried to fill the gap, becoming more database-y over time. But that’s a little ridiculous, as many people have come to realize.
 
— What’s under the hood? Presumably these programs have different technologies underpinning them? Could you explain a little of the challenges to minimize the downside of such programs — index size, performance loss, ease of use, success ratio of finding what you’re looking for, etc?
 
I assume most indexing technologies are actually pretty close cousins, separated by clever coding and intelligent choices. We all deal with the same limitations of compression, physical memory, disk space, etc., and all have to make trade-offs to deliver a product to market. X1 has an inverted index with all sorts of clever tricks to manage memory and processor use to keep the indexing as invisible and painless as possible. Our goal, from the beginning, was to make a product that was as simple to use as possible, as fast as a machine would allow, and as invisible as possible. We’ve had success on all fronts and we’ll continue to improve and innovate as time goes by. We think the bottom line here is speed and simplicity. Speed allows us to skip all those complicated, frankly under-used, search features, while allowing the user to iteratively search (quickly) through their data. They may search twice before success, but certainly it’ll be faster and more satisfying. This is compounded by our innovative multi-field search interface. That’s it.
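
(An aside for readers wondering what an “inverted index” is: instead of scanning every file for your words each time you search, the program builds a table in advance that maps each word to the files containing it. The Python sketch below is only a toy illustration of that general idea and is not X1’s code. The file names, the sample text, the `tokenize`, `build_index` and `search` helpers, and the prefix matching that mimics narrowing results as you type are all assumptions made up for this example.)

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build the inverted index: each word maps to the set of files containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query, all_docs):
    """Return the files that match every query term, treating each term as a prefix.

    An empty query returns every file, so typing more letters can only
    narrow the list, never widen it.
    """
    results = set(all_docs)
    for term in tokenize(query):
        matches = set()
        for word, names in index.items():
            if word.startswith(term):
                matches |= names
        results &= matches
    return sorted(results)


# Hypothetical files standing in for a real hard drive.
docs = {
    "letter_home.doc": "Dear Mum, the expense report is late again",
    "accounts.eml": "Please send your expense report to the accounts department",
    "notes.txt": "Ideas for next week's column on indexing",
}
index = build_index(docs)

# Each extra keystroke winnows the list further.
for query in ["", "exp", "exp acc"]:
    print(repr(query), "->", search(index, query, docs))
```

A real indexer would keep the word list sorted or in a tree so that prefix lookups avoid looping over every word, and would store the index on disk in compressed form. Those are the kinds of memory and processor trade-offs the answer above alludes to.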
 
— Where do you see this going? Is searching a hard drive going to get more sophisticated a la data mining? Or is this a rough and ready product that will always fit the brute force approach?
 
Not to harp on this too much, but we honestly believe that our mission will be fulfilled and we’ll achieve big success if we stick to our dual goals of speed and simplicity. We can let Oracle do the OLAP while we do away with the DBA…
 

Column: the paper mountain

Loose Wire — Conquer That Paper Mountain: It’s time to get organized; Here’s some software to help you scan and locate photos and documents; But perhaps you shouldn’t ditch the filing cabinet just yet

By Jeremy Wagstaff
 
from the 29 May 2003 edition of the Far Eastern Economic Review, (c) 2003, Dow Jones & Company, Inc.
I’m a little suspicious of programs that, adorned with images of bits of paper and photos disappearing into a smiling computer monitor, promise to give order to the junk that is my life. The paperless office never happened — we still make printouts because it’s so easy — and while everyone seems to be photographing digitally these days, that doesn’t sort out our cupboards full of snaps. And even if this stuff does find its way onto your computer, chances are it’s all over the place, in subfolders with obscure names. A sort of digital chaos, really.

I don’t promise an end to all that. And the programs I’m about to tout are not really a new idea, but they both do a better job than their predecessors of helping you to get organized, whether you’re trying to sift through documents already on your computer, or get a handle on your photos.

First off, ScanSoft’s PaperPort (deluxe version, $100 from www.scansoft.com/paperport/). Now in its ninth version, it’s a lot more sophisticated than its forebears. PaperPort and its competitors allow you to scan documents into the computer, and then let you organize those documents into folders of your choosing and view them. You can then convert them to digital text, a process called OCR or Optical Character Recognition, which in turn allows you to move chunks of the original document into a word-processing file. In theory it’s a great way to get rid of paper clutter on your desk, helping you to find those documents — or parts of them — easily, or to convert them to something you can use in your spreadsheet, document or whatever. In practice, it’s too much of a fiddle. Most folk find it easier to locate the hard copy of a document (behind the bookcase, next to the dead cockroach) than the soft one (What name did I give it? What keyword should I use to find it?), so they just buy another filing cabinet.

PaperPort hasn’t resolved the riddle of why we can always locate something under a messy pile of papers, but never after we’ve cleaned up, but it’s a few steps closer to making it easier to handle documents on your PC. First, you can scan them in a format called PDF, short for Adobe’s Portable Document Format, a widely used standard for viewing documents. By working within this standard — rather than PaperPort’s proprietary standard — everything you scan in PaperPort can be accessed and handled by other programs, or by folk who don’t use PaperPort. Common sense, I know, and they’ve got there at last. Another common-sense feature is a search function that allows you to search through an index of documents, whatever format they’re in, within PaperPort.

For a long time I’ve used PaperMaster, now owned by J2Global, the Internet-faxing company, which promises to have an updated version available later this year. PaperMaster does pretty much what PaperPort does, but it’s been doing it a lot longer and it actually looks like a filing cabinet, which I find reassuring. But it doesn’t work well with Windows XP, and is looking somewhat dated. Most importantly, it won’t save your scans in a file format recognized by anyone else on this planet. What’s more, it sometimes loses whole drawers of documents, which kind of defeats the object of the exercise.

So check out PaperPort. It will handle photos too, but if you’ve got a lot of them, I’d suggest Adobe’s new Photoshop Album ($50 from www.adobe.com/products/photoshopalbum/). Album is elbowing for space among a lot of similar products vying for the burgeoning home-photo market, but it has features and a very intuitive interface that I suspect will put it ahead of the pack.

Basically, it can collate pictures from more or less any source — scanning, digital images on your hard drive, on a digital camera, on a CD-ROM — and give you the tools to touch them up, label them, order them around and generally beat them into submission. You can create the usual things with them — albums, video disks, printouts, slide shows and whatnot — all in as tasteful a way as you can expect from a homespun photo album. I particularly liked the way you could tag photos more than once so, say, a picture of your Uncle Charlie doing the gardening in his pantomime costume could be categorized both under Family and Environmental Pollution Hazard. All in all, a smart program, and not badly priced.

Gripes? They’re a bit stingy on the tools they provide to touch up photos, so all the facial blemishes of my adolescent years are still there if you look closely.

These programs won’t change our lives. They may only make a dent in a filing cabinet and photo drawer. But they’re good enough for what they try to do, which is to lend a little order to our pre-paperless lives.