Tag Archives: Document

How To Trace The Source of a Hard Copy

Good piece by AP on an Electronic Frontier Foundation report saying that tracking codes in color laser printers have been cracked. The report points to dots that Xerox’s color laser printers embed on the printed page, which can then be traced back to particular printers:

By analyzing test pages printed out by supporters worldwide and by staffers at various FedEx Kinko’s locations, researchers found that some of the dots correspond to the printers’ serial numbers. Other dots refer to the date and time of the printing.

This is done, AP says, to foil currency counterfeiters, but could just as easily be used by governments to track down criminals or dissidents. This is not just the typewriter trick, where a document could be traced back to a particular typewriter, or make of typewriter, by quirks in the typeface and letter alignment. Although that is a part of it: by comparing two documents it is possible to conclude they are from the same printer, so investigators could poleax a suspect accused of being behind a document just by printing something from the suspect’s printer.

But although the article doesn’t mention it, I assume these tracking codes could also allow people to track down a suspect, by looking at the serial number and following the distribution of that printer. Unless the purchaser chose to cover his tracks, it shouldn’t be too hard to trace the printer through the country, town, retailer and credit card receipt. (With the time stamp included, it should be possible to track down the customer even if the end user is in a public printshop.) I’m guessing here, but it all seems plausible.

It’ll be interesting to see where EFF goes with this. Me? I’m no dissident but I’m not crazy about anyone being able to trace back what I print out.

 

The PaperPort View

Further to my column in this week’s WSJ.com/AWSJ on PaperPort Pro and PaperMaster Pro, here are some e-mailed answers from Bob Anderson (ScanSoft Regional Director Asia Pacific and Japan) in response to my questions about PaperPort Pro:

1) What are the improvements in version 10 (PaperPort Professional 10) over previous versions?

A) Faster and easier folder navigation with Bookmark Workspaces and Split Desktop. Bookmark Workspaces allow you to bookmark your most widely used folders and quickly jump back to them when you need them. Using the Split Desktop you can also look into two folders at once to move files quickly and easily between them or to simply compare the contents of two folders at the same time.

B) A new All-in-One Search Engine and Index Manager for faster and more flexible indexing of scanned files and faster retrieval of documents you search.

C) PDF Support

a) PDF Create! PaperPort includes the capability to create PDF files from all of your MS Office documents or any other PC application with PDF Create! – just like Adobe® Acrobat®. PDF has emerged as the universal standard for sharing and archiving documents and images. The PDF format lets you send any document or image to anyone, regardless of whether they use a PC or a Macintosh® computer, and they can view and print the file – exactly as it looked on your computer – without the need to have the application that was used to create the original file. Now, there is no need to buy any additional PDF creating software – PaperPort 10 Professional does everything you need.

b) PDF Combine, Stack, Drag&Drop, Cut, Paste and Delete – Manipulating PDF files is easier. PaperPort accelerates the creation and assembly of custom documents by mimicking the way you work with paper documents. Easily add or remove pages, reorder them or create new documents with drag and drop tools that make working with electronic documents as simple as working with actual paper documents on your desktop.

c) PDF Security – PaperPort protects your information by allowing you to set security options for individual documents. Keep your content locked down by requiring a password to view or print sensitive information.

d) PageViewer PDF Rendering, Resolution Adjustable – Customise the view on your desktop by increasing or decreasing the resolution of graphics in PageViewer mode to optimise PaperPort’s performance to suit your needs.

D) Superior document assembly capabilities that make working with digital documents as easy as working with real paper by using automatic conversion to PDF, Page Thumbnails and the Split Desktop. Page Thumbnails allow you to cut, copy, paste, delete and rearrange the pages of a PDF document. You may drag whole documents of ANY format (print capable) into the Page Thumbnails of a PDF document. The ability to assemble documents as easily as dragging pages or documents from one document to another is greatly enhanced by the Split Desktop that allows you to merge documents across two different folders.

E) Faster and more productive scanning with a new scanner interface that provides for “One-click Scanning” and “Batch Scanning” capabilities.

F) PaperPort SET tools (Scanner Enhancement Technology) are on the desktop for fast document image corrections. The PaperPort SET Tools allow you to rotate, auto-straighten, convert document colours, adjust image colour, hue, etc. all with a right-click on any image document.

G) Unparalleled ease-of-use and accessibility to PDF creation functions, including password protection, integrated throughout PaperPort and popular Microsoft applications.

2) What kind of users, and usage, are you aiming at? Do you have any interesting customer stories about how they’ve used PP, or perhaps your own experience?

We are looking to provide comprehensive desktop document management solutions for individual knowledge workers in the small business marketplace or for functional workgroups in large organisations.

A) Document Management – Anyone that wants to eliminate the use of paper as their primary means of storage and retrieval will become more organised using their computer and PaperPort. This includes home, home office and any business application that currently relies on a paper process and is looking for a way to organise, find and share information more easily and efficiently.

B) Scan-to-Desktop – With the proliferation of network multifunction and digital copier devices, scanning has really become a mainstream business function. PaperPort Professional helps millions of professionals to eliminate paper and to streamline the way they work with all of their documents. With PaperPort, you can easily scan using any connected or networked scanning device including flatbed, All-in-One and Multi-Function Devices. PaperPort supports WIA, TWAIN and ISIS scanner drivers as well as important industry-standard formats including PDF, TIFF and MAX. Its unique ability to work with desktop, network and departmental scanners and even digital copiers makes it an invaluable productivity application for organisations of all sizes. Users can e-mail, fax or send documents to repositories on a networked file server with a simple drag-and-drop.

We have many interesting customer case studies that are attached as separate files.

3) This kind of product has, in my view, remained quite niche. People still appear to be somewhat resistant to the idea of committing their paper to a scanner. Is this true, and if not, do you have any statistics to support your view? If it is the case, how are you going about convincing users to change their habits?

Document scanning adoption is increasing as improved, affordable networked multi-function and workgroup/departmental devices penetrate new market segments. Recent research by organisations such as Gartner Group and IDC shows that the adoption of scanners – particularly in the enterprise in a networked environment – has seen tremendous growth in recent years. Multi-function devices and networked or departmental scanners with automatic document feeders are the fastest growing segment for scanner manufacturers, growing at 3.2 percent a year (source: Gartner), while colour multifunction devices are experiencing double digit growth. These scanners are being used primarily for document scanning (rather than photo scanning as may have been the case in the past), and e-mailing attachments has become more commonplace than faxing paper documents. As these devices continue to grow in popularity, so does the need for document management software such as PaperPort that allows people to not only get paper into a PC, but also to organise, find and share that information once it is in a digital environment.

The popularity of document management software also becomes particularly evident when we look at our customer base – nearly 4 million people use PaperPort, and say that, after e-mail, it is the application that they use most on their PC. We do recognise that we will likely never see a truly “paperless office,” but there is indeed a need for document management software such as PaperPort that allows digital and paper information to coexist and work together more efficiently.

According to Keith Kmetz, program director, Hardcopy Peripherals Solutions and Services, IDC, “The proliferation of network multifunction and digital copier devices, combined with intuitive applications such as PaperPort, has finally made document scanning a mainstream business function. Products like PaperPort Professional 10 give organisations the ability to deliver the benefits of scanning, PDF and document assembly to every business desktop.”

4) Where do you see the future of this kind of product? Is it going to morph into other products that index users’ documents (Desktop Search, etc), and do you see any combination of products like PP and voice recognition? After all, character recognition and voice recognition would appear to be cousins, and ScanSoft have strong footprints in both areas.

PaperPort is an environment to manage all kinds of documents. Scanning, document assembly and desktop search are all critical components in the document management lifecycle. PaperPort All-in-One search is already a comprehensive desktop search tool allowing you to search all scanned paper, image and text document types including PDF.

So where in managing documents do our customers need the most improvements to increase their productivity and find value in the next version of PaperPort? Voice recognition is definitely an area of growth as the accuracy of the technology allows for greater acceptance. Document collaboration, distribution and the effective use of multifunction devices are all areas where there is still significant opportunity for improvement and productivity. More and more multifunction devices are replacing single purpose machines making scanned documents more accessible and therefore the tools to manage those documents at the desktop are in greater demand.

With regard to character and voice recognition, ScanSoft’s optical character recognition (OCR) technology can be used to create an index of a document by converting the ‘picture’ of the textual information within a document into computer text. Similarly, ScanSoft’s AudioMining technology can be used to create XML speech indexes of every spoken word contained in an audio or video file; in short, AudioMining does to audio and video what OCR does to a document. For the enterprise, particularly in industries where recording telephone calls is a requirement (such as banking, insurance, etc), AudioMining can be used to jump directly to a specific location in a conversation. Public Web users could also benefit when searching for training videos, research talks at universities, government hearings and news conferences, etc.

5) The PP interface hasn’t changed much in a decade, arguably. Is this version a major GUI revamp, or are you sticking with what you have?

I’d say we accomplished a lot of both. We have a very large user base that is quite familiar with the features and functions of the program that are directly tied to the easy-to-use interface. We wanted to make changes that specifically enhance the productivity of document assembly, search, PDF creation, password protection and management to further the concept of “electronic paper” without compromising where our customers have already found significant value.

Also worth noting is that PaperPort is available in multiple languages, in Western and now also in Asian language versions. ScanSoft added Chinese (Traditional & Simplified), Japanese & Korean with OCR in the native language to the product line-up during the latter half of 2004. The Asian version is coming to market through our various customers including OEM, licensees and general distribution partners.

6) Why is one not able to preview documents in thumbnail view (in other words, being able to see all pages of the document in thumbnail, while being able to see full size preview of a page in the same window by clicking on one of the thumbnails), a feature common in other programs as far back as 1998? I’ve always felt this to be a key weakness in PP.

PaperPort 10 features Page Thumbnails on both the PaperPort Desktop and within the PaperPort PageViewer. We have also added significant value to these thumbnails by allowing you to cut, copy, paste, delete and rearrange the pages of a PDF document. You may drag whole documents of ANY format into the page thumbnails of any PDF document for automatic conversion to PDF and easy document assembly (See PaperPort menu item “View:Page Thumbnails”).

7) Another weakness I’ve felt exists has been the OCR. Given ScanSoft’s strengths in this field, why is PP’s inbuilt engine not more powerful?

The features of PaperPort OCR are dedicated to specific tasks within PaperPort and are generally adequate for most business purposes providing an appropriate amount of value for our customers. To enhance the OCR capabilities a customer may add professional OCR speed, accuracy and value by purchasing ScanSoft OmniPage which will automatically upgrade the PaperPort OCR capability.

8) Where does Acrobat/PDF fit into all this? Both you and PaperMaster now support direct-to-PDF scanning, and products like Fujitsu’s ScanSnap use it by default. Is this the way things will go? Have they already snuffed out alternatives? What are the advantages of this, and are there disadvantages? Adobe’s interface, in my view, doesn’t make it easy for users to tweak, annotate or fix PDF files.

The greatest challenge to streamlining document-based processes in business is the fact that there are two incompatible dominant electronic document formats – Microsoft® Word and PDF. Microsoft Word provides professionals with a rich environment for document creation and collaborative authoring but the editable Microsoft Word format is not well suited for electronic publishing and online document storage. Conversely, PDF has expanded from its traditional roots as a design and pre-press tool to an electronic file sharing standard providing business users with a format that is well suited for the distribution, viewing and archiving of documents.

The result is that Microsoft Word is the standard for authoring and editing business documents, while PDF is becoming the preferred way of distributing and sharing business documents online. The pervasiveness of Microsoft Word (400 million) and Adobe® Acrobat® Reader (500 million) gives rise to the need for document management solutions that enable the seamless movement of documents between the two dominant formats.

However, traditional solutions for PDF creation and management, such as Adobe Acrobat, are not priced or designed with the business user in mind. The ScanSoft family of PDF products – PDF Create!, PDF Converter, and PDF Converter Professional 2 – addresses this need by providing business professionals with the ability to more seamlessly move between the two dominant electronic document formats – Microsoft Word and PDF.

We extend this PDF support into PaperPort Professional 10 as well, by including not only Scan-to-PDF and the ability to convert any format to PDF, but also ScanSoft PDF Create!, which allows users to quickly create a PDF from any PC application, or merge multiple files into a PDF. We also continue to support and make available our own internal “PaperPort Image” format as well as supporting TIFF, JPEG, GIF and BMP as long as our customers require these formats. We also believe that a true desktop management system encompasses scanned file formats as well as popular digital application files.

News: Shredded Stasi Documents To Be Pieced Back Together

 The kind of story I love: technology used to bring the oppressor to book. The Register reports that documents of the East German State Security Service (Stasi), torn into shreds and stored in 16,000 brown sacks, may soon be pieced together by a software program developed by the Fraunhofer Institute.
 
On Monday, the Institute said it would take five years to solve the world’s biggest jigsaw puzzle electronically. If done by hand, the operation would take several hundred years.
 

Update: Office Update You Should Probably Have

 If you’ve already upgraded to Microsoft Office 2003 (why, exactly?) there’s an update you should download. This update, Microsoft says in its understated way, “fixes a problem that occurs when you try to open or to save a Microsoft Office PowerPoint 2003 file, a Microsoft Office Word 2003 file, or a Microsoft Office Excel 2003 file that includes an OfficeArt shape that was previously modified and saved in an earlier version of Microsoft Office.”
 
It turns out that if you save one of those files containing an OfficeArt shape (a particular kind of graphic) in Office 2003, then open it in an earlier version of Office, you may lose the whole thing. Or, in Microsoft-speak, “you may experience the following symptoms:
The document may not open completely.
The document may be corrupted.
The document may open but with missing content.
You might receive an error message.”
You’ve been warned. More details here.

Update: Documents To Go For Dana

 Further to my review of the excellent Dana keyboard, its makers AlphaSmart, Inc. have announced they plan to offer a wide-screen version of DataViz’s Documents To Go Professional as a bundled software option for new versions of Dana. Documents To Go enables Palm users to work with Microsoft Office documents, such as Word, Excel and PowerPoint.
 
 
I found the Dana an excellent alternative for writing in certain conditions when you just want to get away from your desk, your office, your family, your town. It’s not everyone’s cup of tea, but with tools like Documents To Go, the lines between laptop and Dana tend to blur.

Column: the paper mountain

Loose Wire — Conquer That Paper Mountain: It’s time to get organized; Here’s some software to help you scan and locate photos and documents; But perhaps you shouldn’t ditch the filing cabinet just yet

By Jeremy Wagstaff
 
from the 29 May 2003 edition of the Far Eastern Economic Review, (c) 2003, Dow Jones & Company, Inc.
I’m a little suspicious of programs that, adorned with images of bits of paper and photos disappearing into a smiling computer monitor, promise to give order to the junk that is my life. The paperless office never happened — we still make printouts because it’s so easy — and while everyone seems to be photographing digitally these days, that doesn’t sort out our cupboards full of snaps. And even if this stuff does find its way onto your computer, chances are it’s all over the place, in subfolders with obscure names. A sort of digital chaos, really.

I don’t promise an end to all that. And the programs I’m about to tout are not really a new idea, but they both do a better job than their predecessors of helping you to get organized, whether you’re trying to sift through documents already on your computer, or get a handle on your photos.

First off, ScanSoft’s PaperPort (deluxe version, $100 from www.scansoft.com/paperport/). Into its ninth version, it’s a lot more sophisticated than its forebears. PaperPort and its competitors allow you to scan documents into the computer, and then let you organize those documents into folders of your choosing and view them. You can then convert them to digital text, a process called OCR or Optical Character Recognition, which in turn allows you to move chunks of the original document into a word-processing file. In theory it’s a great way to get rid of paper clutter on your desk, helping you to find those documents — or parts of them — easily, or to convert them to something you can use in your spreadsheet, document or whatever. In practice, it’s too much of a fiddle. Most folk find it easier to locate the hard copy of a document (behind the bookcase, next to the dead cockroach) than the soft one (What name did I give it? What keyword should I use to find it?), so they just buy another filing cabinet.

PaperPort hasn’t resolved the riddle of why we can always locate something under a messy pile of papers, but never after we’ve cleaned up, but it’s a few steps closer to making it easier to handle documents on your PC. First, you can scan them in a format called PDF, short for Adobe’s Portable Document Format, a widely used standard for viewing documents. By working within this standard — rather than PaperPort’s proprietary standard — everything you scan in PaperPort can be accessed and handled by other programs, or by folk who don’t use PaperPort. Common sense, I know, and they’ve got there at last. Another common-sense feature is a search function that allows you to search through an index of documents, whatever format they’re in, within PaperPort.

For a long time I’ve used PaperMaster, now owned by J2Global, the Internet-faxing company, which promises to have an updated version available later this year. PaperMaster does pretty much what PaperPort does, but it’s been doing it a lot longer and it actually looks like a filing cabinet, which I find reassuring. But it doesn’t work well with Windows XP, and is looking somewhat dated. Most importantly, it won’t save your scans in a file format recognized by anyone else on this planet. What’s more, it sometimes loses whole drawers of documents, which kind of defeats the object of the exercise.

So check out PaperPort. It will handle photos too, but if you’ve got a lot of them, I’d suggest Adobe’s new Photoshop Album ($50 from www.adobe.com/products/photoshopalbum/). Album is elbowing for space among a lot of similar products vying for the burgeoning home-photo market, but it has features and a very intuitive interface that I suspect will put it ahead of the pack.

Basically, it can collate pictures from more or less any source — scanning, digital images on your hard drive, on a digital camera, on a CD-ROM — and give you the tools to touch them up, label them, order them around and generally beat them into submission. You can create the usual things with them — albums, video disks, printouts, slide shows and whatnot — all in as tasteful a way as you can expect from a homespun photo album. I particularly liked the way you could tag photos more than once so, say, a picture of your Uncle Charlie doing the gardening in his pantomime costume could be categorized both under Family and Environmental Pollution Hazard. All in all, a smart program, and not badly priced.

Gripes? They’re a bit stingy on the tools they provide to touch up photos, so all the facial blemishes of my adolescent years are still there if you look closely.

These programs won’t change our lives. They may only make a dent in a filing cabinet and photo drawer. But they’re good enough for what they try to do, which is to lend a little order to our pre-paperless lives.

Loose Wire — Click Here

Loose Wire — Click Here to Read Summary

By Jeremy Wagstaff
from the 21 February 2002 edition of the Far Eastern Economic Review, (c) 2003, Dow Jones & Company, Inc.

If you work for a corporation, institution or any set-up which considers a vision statement to be worthy of its resources, chances are you’ll be required to file regular reports on your comings, goings and sitting-still-and-doing-nothing sessions. And the chances are that no one will ever read these documents top to bottom. In fact, chances are that no one will read them at all. Heck, you probably don’t even read them. But they have to be done, or someone will notice and fire you.

But where does all this stuff go? In the old days we’d say with confidence, “landfill,” but in the digital age, no such luck. It all gets stored on some hard disk somewhere, no easier to find than its hard-copy forebears. Luckily, no one shows a pressing urge to want to find it, but what happens if they do? The sad truth is that all these zillions of e-mails, Word documents, Acrobat files, PowerPoint presentations and spreadsheets we produce don’t build us a supply of wisdom; they just get lost. In the lingo of the information game, it’s called unstructured data and unlike its rich cousin, structured data, which gets sifted by sophisticated programs wearing tin hats called data miners, it sits idle and largely inaccessible, unnoticed.

But there are signs that software developers are taking a closer look at this forgotten corner of the information superhighway and figuring out ways of imposing order on this unruly mass.


Logik, from Coredge Software Inc. (www.coredge.com), will take a document, or a whole directory, or hard drive, and sift — or parse — the contents, extracting the most important phrases, or themes as Logik calls them. Logik also generates a summary of the document. It does all this remarkably well, giving you a sense of the document in question along with a list of themes, from names and concepts to phrases like “vision statement” — all in less time than it takes to say: “What exactly is a vision statement and why do we need one?”

This process is great for handling large numbers of documents that you might need to retrieve at some point, but may not have the time to read all the way through. A keyword search for a phrase or theme will throw up a list of files that include that phrase. And if you select one of those documents you get a summary. Logik will also translate documents between major European languages and Japanese. I was impressed by the intuitive, uncluttered feel of this software.
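How Logik actually picks its themes and builds its summaries isn't described here. Purely as an illustration of the general idea, here is a minimal sketch in Python that treats the most frequent words as "themes" and picks the sentences richest in those words as the summary. The stop-word list, the function names and the sample text are all invented for the example; a commercial product would certainly do something more sophisticated.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "we", "for", "on", "that", "this", "are", "be"}


def content_words(text):
    """Lowercase the text and return its words, minus the stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOP_WORDS]


def themes(text, count=3):
    """Return the `count` most frequent content words as rough 'themes'."""
    return [word for word, _ in Counter(content_words(text)).most_common(count)]


def summarize(text, sentences=1):
    """Keep the highest-scoring sentences, in their original order.

    A sentence's score is the sum of the document-wide frequencies
    of the content words it contains.
    """
    parts = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))
    ranked = sorted(parts,
                    key=lambda s: sum(freq[w] for w in content_words(s)),
                    reverse=True)
    keep = set(ranked[:sentences])
    return " ".join(s for s in parts if s in keep)


sample = ("The vision statement guides the company. Lunch was late. "
          "Every vision statement needs a vision.")

print(themes(sample, 2))     # ['vision', 'statement']
print(summarize(sample, 1))  # Every vision statement needs a vision.
```

Word counting like this is crude (it knows nothing about names, phrases or meaning), but it shows why a program can give you a fair sense of a document without anyone having to read it.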

But while automatic summarizing is a great concept which has come a long way in recent years, it’s by no means the main function users want to see in programs that organize their documents for them. To me the most important part of the process is a simple one: Can I find the document I’m looking for quickly, and can I view it immediately? While users can view the original document in Logik, it opens in a new window, making it less seamless than the rest of the program’s functions.

Document Search

For this kind of feature — finding quickly and viewing — you need Enfish Corp’s (www.enfish.com) Find, which indexes your hard drive and lets you find anything from a single word to a complex Boolean string quickly. Another program that offers a similar feature is 80-20 Software’s Retriever (www.80-20.com/products/retriever/) though at present it doesn’t let you preview the whole document (future versions will).
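For the curious, the usual trick behind this sort of fast lookup is an inverted index: a table mapping each word to the documents that contain it, so that Boolean queries become set operations. The sketch below is a toy version in Python, not a description of how Enfish Find or Retriever work internally; the file names, contents and function names are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search_all(index, *words):
    """Boolean AND: documents containing every one of the words."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


def search_any(index, *words):
    """Boolean OR: documents containing at least one of the words."""
    found = set()
    for w in words:
        found |= index.get(w.lower(), set())
    return found


def search_without(index, include, exclude):
    """Boolean AND NOT: documents with `include` but not `exclude`."""
    return index.get(include.lower(), set()) - index.get(exclude.lower(), set())


# Hypothetical documents standing in for files on a hard drive.
documents = {
    "report.txt": "Quarterly report on the vision statement",
    "memo.txt": "Memo about the filing cabinet and the report",
    "notes.txt": "Notes on scanning photos",
}
index = build_index(documents)

print(search_all(index, "report", "vision"))      # {'report.txt'}
print(search_without(index, "report", "vision"))  # {'memo.txt'}
```

The slow part, reading every file, happens once when the index is built. After that each query is just a few set lookups, which is why this kind of search feels instant.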

For software that does straight summarizing, check out Copernic Technologies’ Summarizer (www.copernic.com/products/summarizer), which does a great job of abridging anything on the fly, whether it’s a Web page, a Microsoft document or next door’s cat.

These programs make digging up any document you mislaid — or keeping track of colleagues’ documents — a whole lot easier. None of them comes cheap, however: Retriever is $50, Summarizer is $60 and Enfish Find is $70, while the standard version of Logik sells for $150. But to me that’s a good thing: These companies are aiming at a more discerning market with deeper pockets — in fact at exactly the sort of guys who spend their days writing reports that their bosses will never read.