I’ve written before about how I think Google Earth, or something like it, will become a new form of interface — not just for looking for places and routes, but any kind of information. Some people call it the geo-web, but it’s actually bigger than that. Something like Google Earth will become an environment in its own right. I can imagine people using it to slice and dice company data, set up meetings, organize social networks.

Google is busy marching in this direction, and their newest offering is a great example of this: Google Book Search. This from Brandon Badger, product manager at Google Earth:

Did you ever wonder what Lewis and Clark said about your hometown as they passed through? What about if any other historical figures wrote about your part of the world? Earlier this year, we announced a first step toward geomapping the world’s literary information by starting to integrate information from Google Book Search into Google Maps. Today, the Google Book Search and Google Earth teams are excited to announce the next step: a new layer in Earth that allows you to explore locations through the lens of the world’s books.

Activating the layer peppers the earth with little yellow book icons — all over the place, like in this screenshot from Java:


Click on one of the books and the reference will pop up, including the title of the book, its cover, author, number of pages etc, as well as the actual context of the reference. Click on a link to the page

Is it perfect? No. It’s automated, so a lot of these references are just wrong. Click on a yellow book in Borneo and you find a reference in William Gilmore Simms’ “Life of Francis Marion” to Sampit, which is the name of a town there, but the book almost certainly means the river of the same name in South Carolina.

Many of the books in Google’s database are scanned, so errors are likely to arise from imperfect OCR. Click on a book above the Java town of Kudus, and you get a reference to a History of France, and someone called “Ninon da f Kudus”, which in fact turns out to be the caption for an illustration of Le Grand Dauphin and Ninon de l’Enclos, a French C17 courtesan.

But who cares? By being able to click on the links you can quickly find out whether the references are accurate or not, and I’m guessing Google is going to gradually tidy this up, if not themselves then by allowing us users to correct such errors. (So far there doesn’t seem to be a way to do this.)

This is powerful stuff, and a glimpse of a new way of looking at, storing and retrieving information. Plus it’s kind of fun.

A Beginner’s Guide to Scanning

A lot of folk ask me whether they should buy a scanner: those things that take bits of paper, or photographs, and turn them into files your computer can use.


Frankly, I’m surprised by this (not the taking and turning, but the asking). Why would people not have a scanner? I have four.

Well, five, actually, if you include that little business card scanner sitting in a drawer somewhere. OK, six. I bought a backup scanner once in case all my other scanners eloped. I scan every piece of paper that I can.

I scan whole books I want to read on my computer. I scan coworkers who pass me in the corridor. The truth is that scanners can save you lots of time, space and pain. But I readily accept that my passion for scanning may not have won you over.

First off, don’t get your hopes up. Twelve years ago I bought a scanner that, lulled by the pictures on the box of pages flying into my computer, I thought would rid me of a ridiculous four-drawer filing cabinet full of stuff I had been lugging around Asia.

I was to be disappointed. Scanners won’t digitize everything paper, I learned, and sometimes they will but will take so long the task won’t be finished in your lifetime. No, scanners won’t make you paperless, but they may lighten your load.

So, the second task is to figure out what you have to scan, and then get the right scanner for the job. There are flatbed scanners, which look a bit like the tops of photocopiers and scan one loose sheet of paper at a time. (You can sometimes buy sheet feeders for these units that, well, feed the sheets in.)

These can be cheap: Less than US$100 will buy you a quality Canon device. These are good, and do the job well. They’re fine if you’ve got the odd document or photo to scan, or the odd chapter in a book you want to store on your computer.

But they’re not good if you’ve got lots of stuff. For this, I’d recommend something like the Fujitsu ScanSnap. I have one of the basic models (5110EOX, selling for $300 to $650), which looks a bit like a small fax machine, and it’s still going strong after three years of heavy-duty scanning.

You can only feed single sheets into it (there is no flatbed/photocopier option), but it will scan pages fast, front and back, without you having to do anything other than press a button. The pages are scanned directly to a common file format called PDF.

I love my ScanSnap. I will scan all incoming business mail — bills, receipts, statements, letters of eviction — which means I need keep no formal paperwork except the odd will or letter from Aunt Maude that has sentimental value. The ScanSnap can also handle business cards, which it can scan more or less directly into Microsoft Outlook.

Neither of these options is particularly portable. If you scan and you travel, you may want to consider a small portable scanner. NeatReceipts has two scanners that make more sense if you move around: one a thin, long device that looks more like a truncheon or night stick, and one a small, cigarette box-sized business card scanner.

Which brings me to the important bit of scanning: what happens to the document once it’s scanned. Most software simply converts a physical thing to a digital thing, but to make the text on that physical thing something you can edit, search or add to, you need to run another kind of software over it, called optical character recognition, or OCR.

This software, which usually comes included with the scanner, basically looks at the patterns in the image of your document that the scanning software has created and tries to figure out the letters.

OCR software nowadays is remarkably accurate, so long as you give it good, clean documents to start with. Don’t expect your spidery handwriting or a smudged and heavily annotated tome from the Dark Ages to come out 100 percent accurate.
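
To make the idea of "looking at patterns and figuring out the letters" a bit more concrete, here is a toy sketch in Python. Each letter is a tiny 3x5 bitmap, and a scanned glyph is matched to whichever known template differs from it in the fewest pixels. The templates, the three-letter alphabet and the function names are all invented for illustration; real OCR engines are far more sophisticated than this.

```python
# Toy OCR: match a scanned glyph against known letter templates.
# Templates and names are illustrative assumptions, not any real OCR engine.

TEMPLATES = {
    "I": ["###",
          ".#.",
          ".#.",
          ".#.",
          "###"],
    "L": ["#..",
          "#..",
          "#..",
          "#..",
          "###"],
    "T": ["###",
          ".#.",
          ".#.",
          ".#.",
          ".#."],
}


def pixel_difference(glyph, template):
    """Count the pixels where the scanned glyph and the template disagree."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the letter whose template is closest to the glyph."""
    return min(TEMPLATES, key=lambda letter: pixel_difference(glyph, TEMPLATES[letter]))


# A slightly smudged "L": one stray pixel in the top row.
smudged_l = ["##.",
             "#..",
             "#..",
             "#..",
             "###"]

print(recognize(smudged_l))  # prints "L"
```

A clean page gives glyphs that sit close to one template and far from the rest. Smudges and handwriting push glyphs closer to the wrong template, which is why clean originals matter so much.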

NeatReceipts doesn’t just specialize in digitizing and organizing your receipts: The smaller device handles business cards too. But for most jobs, you’d be better off with something like PaperPort, which will handle all the OCR for you and also help you organize your documents into folders.

Bottom line? Scanning stuff is a very useful way to keep your desk clear and to be able to find stuff. But you have to be disciplined about it, and get a rather perverse joy out of watching paper disappear into a roller.

And be prepared to be regarded by co-workers, friends and family as a bit of a freak.

CAPTCHA Gets Useful


An excellent example of something that leverages a tool that already exists and makes it useful — CAPTCHA forms. AP writes from Pittsburgh:

Researchers estimate that about 60 million of those nonsensical jumbles are solved every day around the world, taking an average of about 10 seconds each to decipher and type in.

Instead of wasting time typing in random letters and numbers, Carnegie Mellon researchers have come up with a way for people to type in snippets of books to put their time to good use, confirm they are not machines and help speed up the process of getting searchable texts online.

“Humanity is wasting 150,000 hours every day on these,” said Luis von Ahn, an assistant professor of computer science at Carnegie Mellon. He helped develop the CAPTCHAs about seven years ago. “Is there any way in which we can use this human time for something good for humanity, do 10 seconds of useful work for humanity?”
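
The two figures in the story can be checked against each other. A quick calculation, using only the numbers quoted above (the variable names are mine), suggests the 150,000-hour figure is a rounded-down estimate, or assumes closer to nine seconds per CAPTCHA:

```python
# Figures quoted in the AP story.
captchas_per_day = 60_000_000
seconds_per_captcha = 10

total_seconds = captchas_per_day * seconds_per_captcha
total_hours = total_seconds / 3600  # 3,600 seconds in an hour

print(f"{total_hours:,.0f} hours per day")  # prints "166,667 hours per day"

# Working backwards from the quoted 150,000 hours:
implied_seconds = 150_000 * 3600 / captchas_per_day
print(implied_seconds)  # prints 9.0
```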

The project, reCAPTCHA, is using people’s deciphering to go through those books being digitized by the Internet Archive that can’t be converted using ordinary OCR, where the results come out like this:


Those words are sent to CAPTCHAs and then the results fed back into the scanning engine. Here’s the neat bit, though, as explained on the website:

But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.

Which I think is kind of neat: the only problems might occur if people know this and mess with the system by getting one right and the other wrong. But how do they know which one?
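
The checking scheme described on the reCAPTCHA site can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: the function names, the sample words and the three-vote threshold are all assumptions of mine (the site only says the new word goes to "a number of other people").

```python
from collections import Counter


def accept_guess(known_answer, typed_known, typed_unknown):
    """Keep a user's reading of the unknown word only if they got the known word right."""
    if typed_known.strip().lower() == known_answer.strip().lower():
        return typed_unknown.strip().lower()
    return None


def settle_word(guesses, min_votes=3):
    """Pick a reading for the unknown word once enough users agree on it.

    min_votes is a made-up threshold for this sketch.
    """
    votes = Counter(g for g in guesses if g is not None)
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


# Each pair: (what the user typed for the known word, what they typed for the unknown word)
submissions = [
    ("morning", "upon"),
    ("morning", "upon"),
    ("mourning", "open"),  # failed the known word, so this reading is ignored
    ("morning", "upon"),
    ("morning", "up0n"),   # passed the known word but misread the unknown one
]

guesses = [accept_guess("morning", known, unknown) for known, unknown in submissions]
print(settle_word(guesses))  # prints "upon"
```

In this sketch a lone prankster who gets the known word right and the other one wrong is simply outvoted, which is presumably the point of showing the same word to several people.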

Loose Change Sept 19 2006

It used to be called Loose Bits, but I prefer Loose Change. For now. It’s the same thing: tidbits I found that might be of interest:

  • First off, NeatReceipts, which sells a small scanner and special software to scan in your receipts while you’re on the road, has announced a new version of its software, which should be in the shops next month. It includes color scanning, a better Document Organizer and better OCR. Version 2.5 will retail for $200, the same price as the current Scanalizer. I reviewed the product a few months back and was impressed, though you’ve got to really love receipts to get into it.
  • Lost in the Crowd allows you to search the web more anonymously by mixing in with your normal searches entirely random ones sent on your behalf: “What searches did you care about versus those that were just made up? There’s no way for the search engine, or anyone else, to tell.” Nice idea. Only hitch I can think of is if those random searches lead down weird alleys that may come back to haunt me.
  • Forget Google anonymity. Just worry about voting. A blog by two Princeton University types reveals an ordinary “hotel minibar” or office key will open the door on Diebold Voting Machines, allowing someone to remove, alter or replace the memory card that stores the votes.
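
The decoy idea behind Lost in the Crowd, mentioned in the list above, can be illustrated with a few lines of Python. This is only a sketch of the general technique, not how the service itself works: the function name, the decoy list and the two-decoys-per-search ratio are all invented for the example.

```python
import random


def mix_with_decoys(real_queries, decoy_pool, decoys_per_query=2, seed=None):
    """Return the real queries shuffled in among randomly chosen decoys."""
    rng = random.Random(seed)
    stream = []
    for query in real_queries:
        stream.append(query)
        # Pick decoys from a fixed pool (hypothetical; a real tool might generate them).
        stream.extend(rng.sample(decoy_pool, decoys_per_query))
    rng.shuffle(stream)
    return stream


real = ["cheap flights jakarta", "scanner reviews"]
decoys = ["weather tomorrow", "banana bread recipe", "football scores", "train timetable"]

stream = mix_with_decoys(real, decoys, decoys_per_query=2, seed=1)
for query in stream:
    print(query)
```

Drawing decoys from a vetted list, as here, would be one way around the "weird alleys" worry, at the cost of decoys that are easier to spot.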

The PaperPort View

Further to my column in this week’s on PaperPort Pro and PaperMaster Pro, here are some e-mailed answers from Bob Anderson (ScanSoft Regional Director Asia Pacific and Japan) in response to my questions about PaperPort Pro:

1) What are the improvements in version 10 (PaperPort Professional 10) over previous versions?

A) Faster and easier folder navigation with Bookmark Workspaces and Split Desktop. Bookmark Workspaces allow you to bookmark your most widely used folders and quickly jump back to them when you need them. Using the Split Desktop you can also look into two folders at once to move files quickly and easily between them or to simply compare the contents of two folders at the same time.

B) A new All-in-One Search Engine and Index Manager for faster and more flexible indexing of scanned files and faster retrieval of documents you search.

C) PDF Support

a) PDF Create! PaperPort includes the capability to create PDF files from all of your MS Office documents or any other PC application with PDF Create! – just like Adobe® Acrobat®. PDF has emerged as the universal standard for sharing and archiving documents and images. The PDF format lets you send any document or image to anyone, regardless of whether they use a PC or a Macintosh® computer, and they can view and print the file – exactly as it looked on your computer – without the need to have the application that was used to create the original file. Now, there is no need to buy any additional PDF creating software – PaperPort 10 Professional does everything you need.

b) PDF Combine, Stack, Drag&Drop, Cut, Paste and Delete: Manipulating PDF files is easier. PaperPort accelerates the creation and assembly of custom documents by mimicking the way you work with paper documents. Easily add or remove pages, reorder them or create new documents with drag and drop tools that make working with electronic documents as simple as working with actual paper documents on your desktop.

c) PDF Security: PaperPort protects your information by allowing you to set security options for individual documents. Keep your content locked down by requiring a password to view or print sensitive information.

d) PageViewer PDF Rendering – Resolution Adjustable: Customise the view on your desktop by increasing or decreasing the resolution of graphics in PageViewer mode to optimise PaperPort’s performance to suit your needs.

D) Superior document assembly capabilities that make working with digital documents as easy as working with real paper by using automatic conversion to PDF, Page Thumbnails and the Split Desktop. Page Thumbnails allow you to cut, copy, paste, delete and rearrange the pages of a PDF document. You may drag whole documents of ANY format (print capable) into the Page Thumbnails of a PDF document. The ability to assemble documents as easily as dragging pages or documents from one document to another is greatly enhanced by the Split Desktop that allows you to merge documents across two different folders.

E) Faster and more productive scanning with a new scanner interface that provides for “One-click Scanning” and “Batch Scanning” capabilities

F) PaperPort SET tools (Scanner Enhancement Technology) are on the desktop for fast document image corrections. The PaperPort SET Tools allow you to rotate, auto-straighten, convert document colours, adjust image colour, hue, etc. all with a right-click on any image document.

G) Unparalleled ease-of-use and accessibility to PDF creation functions, including password protection, integrated throughout PaperPort and popular Microsoft applications.

2) What kind of users, and usage, are you aiming at? Do you have any interesting customer stories about how they’ve used PP, or perhaps your own experience?

We are looking to provide comprehensive desktop document management solutions for individual knowledge workers in the small business marketplace or for functional workgroups in large organisations.

A) Document Management – Anyone that wants to eliminate the use of paper as their primary means of storage and retrieval will become more organised using their computer and PaperPort. This includes home, home office and any business application that currently relies on a paper process and is looking for a way to organise, find and share information more easily and efficiently.

B) Scan-to-Desktop – With the proliferation of network multifunction and digital copier devices, scanning has really become a mainstream business function. PaperPort Professional helps millions of professionals to eliminate paper and to streamline the way they work with all of their documents. With PaperPort, you can easily scan using any connected or networked scanning device including flatbed, All-in-One and Multi-Function Devices. PaperPort supports WIA, TWAIN and ISIS scanner drivers as well as important industry-standard formats including PDF, TIFF and MAX. Its unique ability to work with desktop, network and departmental scanners and even digital copiers make it an invaluable productivity application for organisations of all sizes. Users can e-mail, fax or send documents to repositories on a networked file server with a simple drag-and-drop.

We have many interesting customer case studies that are attached as separate files.

3) This kind of product has, in my view, remained quite niche. People still appear to be somewhat resistant to the idea of committing their paper to a scanner. Is this true, and if not, do you have any statistics to support your view? If it is the case, how are you going about convincing users to change their habits?

Document scanning adoption is increasing as improved, affordable networked multi-function and workgroup/departmental devices penetrate new market segments. Recent research by organisations such as Gartner Group and IDC shows that the adoption of scanners – particularly in the enterprise in a networked environment – has seen tremendous growth in recent years. Multi-function devices and networked or departmental scanners with automatic document feeders are the fastest growing segment for scanner manufacturers, growing at 3.2 percent a year (source: Gartner), while colour multifunction devices are experiencing double digit growth. These scanners are being used primarily for document scanning (rather than photo scanning as may have been the case in the past), and e-mailing attachments has become more commonplace than faxing paper documents. As these devices continue to grow in popularity, so does the need for document management software such as PaperPort that allows people to not only get paper into a PC, but also to organise, find and share that information once it is in a digital environment.

The popularity of document management software also becomes particularly evident when we look at our customer base – nearly 4 million people use PaperPort, and say that, after e-mail, it is the application that they use most on their PC. We do recognise that we will likely never see a truly “paperless office,” but there is indeed a need for document management software such as PaperPort that allows digital and paper information to coexist and work together more efficiently.

According to Keith Kmetz, program director, Hardcopy Peripherals Solutions and Services, IDC, “The proliferation of network multifunction and digital copier devices, combined with intuitive applications such as PaperPort, has finally made document scanning a mainstream business function. Products like PaperPort Professional 10 give organisations the ability to deliver the benefits of scanning, PDF and document assembly to every business desktop.”

4) Where do you see the future of this kind of product? Is it going to morph into other products that index users’ documents (Desktop Search, etc), and do you see any combination of products like PP and voice recognition? After all, character recognition and voice recognition would appear to be cousins, and ScanSoft have strong footprints in both areas.

PaperPort is an environment to manage all kinds of documents. Scanning, document assembly and desktop search are all critical components in the document management lifecycle. PaperPort All-in-One search is already a comprehensive desktop search tool allowing you to search all scanned paper, image and text document types including PDF.

So where in managing documents do our customers need the most improvements to increase their productivity and find value in the next version of PaperPort? Voice recognition is definitely an area of growth as the accuracy of the technology allows for greater acceptance. Document collaboration, distribution and the effective use of multifunction devices are all areas where there is still significant opportunity for improvement and productivity. More and more multifunction devices are replacing single purpose machines making scanned documents more accessible and therefore the tools to manage those documents at the desktop are in greater demand.

With regards to the character and voice recognition, ScanSoft’s optical character recognition (OCR) technology can be used to create an index of a document by converting the ‘picture’ of the textual information within a document into computer text. Similarly, ScanSoft’s AudioMining technology can be used to create XML speech indexes of every spoken word contained in an audio and video file; in short, AudioMining does to audio and video what OCR does to a document. For the enterprise, particularly in industries where recording telephone calls is a requirement (such as banking, insurance, etc), AudioMining can be used to jump directly to a specific location in a conversation. Public Web users could also benefit when searching for training videos, research talks at universities, government hearings and news conferences, etc.

5) The PP interface hasn’t changed much in a decade, arguably. Is this version a major GUI revamp, or are you sticking with what you have?

I’d say we accomplished a lot of both. We have a very large user base that is quite familiar with the features and functions of the program that are directly tied to the easy-to-use interface. We wanted to make changes that specifically enhance the productivity of document assembly, search, PDF creation, password protection and management to further the concept of “electronic paper” without compromising where our customers have already found significant value.

Also worth noting is that PaperPort is available in multiple languages, in Western and now also Asian language versions. ScanSoft added Chinese (Traditional & Simplified), Japanese & Korean with OCR in the native language to the product line-up during the latter half of 2004. The Asian version is coming to market through our various customers including OEM, licensees and general distribution partners.

6) Why is one not able to preview documents in thumbnail view (in other words, being able to see all pages of the document in thumbnail, while being able to see full size preview of a page in the same window by clicking on one of the thumbnails), a feature common in other programs as far back as 1998? I’ve always felt this to be a key weakness in PP.

PaperPort 10 features Page Thumbnails on both the PaperPort Desktop and within the PaperPort PageViewer. We have also added significant value to these thumbnails by allowing you to cut, copy, paste, delete and rearrange the pages of a PDF document. You may drag whole documents of ANY format into the page thumbnails of any PDF document for automatic conversion to PDF and easy document assembly (See PaperPort menu item “View:Page Thumbnails”).

7) Another weakness I’ve felt exists has been the OCR. Given ScanSoft’s strengths in this field, why is PP’s inbuilt engine not more powerful?

The features of PaperPort OCR are dedicated to specific tasks within PaperPort and are generally adequate for most business purposes providing an appropriate amount of value for our customers. To enhance the OCR capabilities a customer may add professional OCR speed, accuracy and value by purchasing ScanSoft OmniPage which will automatically upgrade the PaperPort OCR capability.

8) Where does Acrobat/PDF fit into all this? Both you and PaperMaster now support direct-to-PDF scanning, and products like Fujitsu’s ScanSnap use it by default. Is this the way things will go? Have they already snuffed out alternatives? What are the advantages of this, and are there disadvantages? Adobe’s interface, in my view, doesn’t make it easy for users to tweak, annotate or fix PDF files.

The greatest challenge to streamlining document-based processes in business is the fact that there are two incompatible dominant electronic document formats – Microsoft® Word and PDF. Microsoft Word provides professionals with a rich environment for document creation and collaborative authoring but the editable Microsoft Word format is not well suited for electronic publishing and online document storage. Conversely, PDF has expanded from its traditional roots as a design and pre-press tool to an electronic file sharing standard providing business users with a format that is well suited for the distribution, viewing and archiving of documents.

The result is that Microsoft Word is the standard for authoring and editing business documents, while PDF is becoming the preferred way of distributing and sharing business documents online. The pervasiveness of Microsoft Word (400 million) and Adobe® Acrobat® Reader (500 million) gives rise to the need for document management solutions that enable the seamless movement of documents between the two dominant formats.

However, traditional solutions for PDF creation and management, such as Adobe Acrobat, are not priced or designed with the business user in mind. The ScanSoft family of PDF products – PDF Create!, PDF Converter and PDF Converter Professional 2 – addresses this need by providing business professionals with the ability to more seamlessly move between the two dominant electronic document formats – Microsoft Word and PDF.

We extend this PDF support into PaperPort Professional 10 as well, by including not only Scan-to-PDF and the ability to convert any format to PDF, but also ScanSoft PDF Create!, which allows users to quickly create a PDF from any PC application, or merge multiple files into a PDF. We also continue to support and make available our own internal “PaperPort Image” format as well as supporting TIFF, JPEG, GIF and BMP as long as our customers require these formats. We also believe that a true desktop management system encompasses scanned file formats as well as popular digital application files.

Software: PaperMaster Pro – worth the wait?

Some folk have been waiting nearly five years, but it looks like PaperMaster, a great program for scanning and organizing your paperwork, is back.
PaperMaster (the last full version was 98, to give you some idea how long this software’s been hibernating) was pretty good. It looked like a filing cabinet, and let you scan and store more or less anything you could squeeze through your scanner. The company was sold to j2, which is basically an Internet faxing service and which was very, very quiet about the software until last year, when in response to public interest (well, me and a couple of other people) it released PaperMaster2002, an upgrade for existing licensed users of PaperMaster98 “who have migrated or are planning to migrate to the Microsoft® Windows 2000, XP, or ME operating system”.
That version wasn’t cheap ($150) and didn’t do much apart from fixing a few of the features of PaperMaster98 that wouldn’t work under XP (unless you happened to stumble across some tweaks that fans had posted to websites). Earlier this year, when I complained about the cost of what was basically a minor upgrade, j2 told me “the PaperMaster upgrade was completed primarily for a few select users who were figuratively beating down j2 Global’s door to get the new product. The cost of the upgrade was a result of j2 Global investing significant resources to complete an upgrade designed for limited distribution. Based on customer response, j2 Global’s PaperMaster users seem to be fine with the price”. Not what I heard, but there you go.
Anyway, Pro is here. Nearly. You can pre-order and get 15% off the retail price of $199 (once again, not cheap). Still, it sounds as if it has some serious features:
  • Create PDFs from any office application or scan
  • Organize fast and easy
  • Find anything in seconds
  • Get powerful OCR – Never re-type any document
  • Fax easier via the Internet with built-in eFax®
All of which sound useful. I’ll review it once I’ve got hold of a copy. An earlier release date was set for today, so that could be soon. If you’re in a hurry, see my recent review of PaperPort, which does much the same thing.

Column: the paper mountain

Loose Wire — Conquer That Paper Mountain: It’s time to get organized; Here’s some software to help you scan and locate photos and documents; But perhaps you shouldn’t ditch the filing cabinet just yet

By Jeremy Wagstaff
from the 29 May 2003 edition of the Far Eastern Economic Review, (c) 2003, Dow Jones & Company, Inc.
I’m a little suspicious of programs that, adorned with images of bits of paper and photos disappearing into a smiling computer monitor, promise to give order to the junk that is my life. The paperless office never happened — we still make printouts because it’s so easy — and while everyone seems to be photographing digitally these days, that doesn’t sort out our cupboards full of snaps. And even if this stuff does find its way onto your computer, chances are it’s all over the place, in subfolders with obscure names. A sort of digital chaos, really.

I don’t promise an end to all that. And the programs I’m about to tout are not really a new idea, but they both do a better job than their predecessors of helping you to get organized, whether you’re trying to sift through documents already on your computer, or get a handle on your photos.

First off, ScanSoft’s PaperPort (deluxe version, $100). Into its ninth version, it’s a lot more sophisticated than its forebears. PaperPort and its competitors allow you to scan documents into the computer, and then let you organize those documents into folders of your choosing and view them. You can then convert them to digital text, a process called OCR or Optical Character Recognition, which in turn allows you to move chunks of the original document into a word-processing file. In theory it’s a great way to get rid of paper clutter on your desk, helping you to find those documents (or parts of them) easily, or to convert them to something you can use in your spreadsheet, document or whatever. In practice, it’s too much of a fiddle. Most folk find it easier to locate the hard copy of a document (behind the bookcase, next to the dead cockroach) than the soft one (What name did I give it? What keyword should I use to find it?), so they just buy another filing cabinet.

PaperPort hasn’t resolved the riddle of why we can always locate something under a messy pile of papers, but never after we’ve cleaned up, but it’s a few steps closer to making it easier to handle documents on your PC. First, you can scan them in a format called PDF, short for Adobe’s Portable Document Format, a widely used standard for viewing documents. By working within this standard — rather than PaperPort’s proprietary standard — everything you scan in PaperPort can be accessed and handled by other programs, or by folk who don’t use PaperPort. Common sense, I know, and they’ve got there at last. Another common-sense feature is a search function that allows you to search through an index of documents, whatever format they’re in, within PaperPort.
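
For the curious, searching "through an index of documents" usually means something like the following Python sketch: a table that maps each word to the documents containing it, so a search doesn't have to reread every file. This is a generic illustration, not PaperPort's implementation, and the file names and text are made up.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?\"'()")].add(name)
    return index


def search(index, query):
    """Return the names of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical scanned documents, already converted to text by OCR.
documents = {
    "phone_bill.pdf": "Telephone bill for March. Amount due: 42 dollars.",
    "lease.pdf": "Rental agreement. Amount due monthly.",
    "aunt_maude.pdf": "Dear nephew, thank you for the flowers.",
}

index = build_index(documents)
print(sorted(search(index, "amount due")))  # prints ['lease.pdf', 'phone_bill.pdf']
```

The catch is that the index is only as good as the text behind it, so a scan that was never run through OCR, or was run through it badly, stays invisible to the search.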

For a long time I’ve used PaperMaster, now owned by J2Global, the Internet-faxing company, which promises to have an updated version available later this year. PaperMaster does pretty much what PaperPort does, but it’s been doing it a lot longer and it actually looks like a filing cabinet, which I find reassuring. But it doesn’t work well with Windows XP, and is looking somewhat dated. Most importantly, it won’t save your scans in a file format recognized by anyone else on this planet. What’s more, it sometimes loses whole drawers of documents, which kind of defeats the object of the exercise.

So check out PaperPort. It will handle photos too, but if you’ve got a lot of them, I’d suggest Adobe’s new Photoshop Album ($50). Album is elbowing for space among a lot of similar products vying for the burgeoning home-photo market, but it has features and a very intuitive interface that I suspect will put it ahead of the pack.

Basically, it can collate pictures from more or less any source — scanning, digital images on your hard drive, on a digital camera, on a CD-ROM — and give you the tools to touch them up, label them, order them around and generally beat them into submission. You can create the usual things with them — albums, video disks, printouts, slide shows and whatnot — all in as tasteful a way as you can expect from a homespun photo album. I particularly liked the way you could tag photos more than once so, say, a picture of your Uncle Charlie doing the gardening in his pantomime costume could be categorized both under Family and Environmental Pollution Hazard. All in all, a smart program, and not badly priced.

Gripes? They’re a bit stingy on the tools they provide to touch up photos, so all the facial blemishes of my adolescent years are still there if you look closely.

These programs won’t change our lives. They may only make a dent in a filing cabinet and photo drawer. But they’re good enough for what they try to do, which is to lend a little order to our pre-paperless lives.