Tag Archives: Web search engine

The Lost World of Yahoo

This piece was written for a commentary on the BBC World Service Business Daily about Jerry Yang’s decision to resign as CEO.

Back in the early days of the World Wide Web there was really only one name. Yahoo. You could tell it was big because it was what you’d type in your browser to see if your computer was connected to the Internet.

Without fail: Yahoo.com. It’s been around since 1994, since Jerry Yang and David Filo, two grad students at Stanford, built a list of interesting websites, a sort of yellow pages for the Internet. They called it, first, Jerry’s Guide to the World Wide Web, and then Yahoo. By the end of 1994 it had a million hits. By 1996 it had gone public.

And, I reckon, it’s been slightly lost ever since.

Not that you’d know that from the figures. It’s the most popular website in the world. Nearly half that traffic is actually email, according to Alexa, a website that tracks this kind of thing. Nearly everyone on the planet, it seems, has a Yahoo email address.

But there’s also other stuff: search, news, auctions, finance, groups, chat, games, movies, sports. And Yahoo has been pretty consistent for the 14 years of its life: If you look at its homepage, the place where you’d land if you typed in yahoo.com, it wouldn’t look that different in 1995 to what it looked like in 2005. The familiar red Yahoo logo at the top of the page, a little search box, and then some links to directories.

But since then things have got more complicated. The guys at Google made a better search engine, so much so that their name has become a verb, a shorthand way of saying “look up something or someone on the Internet.”

That kind of left Yahoo behind. So far, I’ve not heard Yahoo used as a verb, or a noun, at least in a positive way. And Google also figured out how to make money from it, which stole another bit of Yahoo’s thunder.

But it hasn’t stopped there. Internet speeds have got faster. We’re now connected most of the time, via computer or cellphone. Upstart bloggers have toppled big media conglomerates. So now all the big players (Microsoft, Google, Yahoo) are not quite sure what they are: Media companies? Advertising companies? Software services companies? A mix of all three?

So it’s no surprise that Jerry Yang has been unable to articulate what, exactly, Yahoo itself is. If you’re not sure what your company is, never mind that you founded it, you shouldn’t be sitting in the CEO’s chair.

The truth is that there are two Yahoos. Ask an ordinary user and they’ll know about Yahoo. The email program. The instant messenger. The news portal. To millions of people Yahoo is comfortable and familiar.

Ask a geek and they’ll talk about another Yahoo: all the cool stuff the company’s engineers are doing. Pipes, which lets you mash data together in interesting ways. Fire Eagle, which blends together information about where you are. And there’s the stuff they’ve bought that most people don’t even realise belongs to Yahoo: delicious bookmarks, for example, or Flickr photos.

People may be down on Yahoo right now, and the share price isn’t pretty. But it’s still a big brand, known around the world. And, despite their frustrations, beloved by many geeks.

One day someone will come along and find a way to package all this stuff together, or sell bits of it off. Then Jerry’s Guide to the World Wide Web will find its way again. It just doesn’t look like that person is going to be Jerry himself.

links for 2008-09-24

Google Killer? A Clip Around the Ears, Maybe

There’s a new search engine out there, according to the Guardian, and it sort of tries to figure out what you’re looking for. Which is good. Google searches are great so long as they’re simple. But is Powerset up to snuff?

Here are some searches I did (betraying my interests):

image

Pretty good stuff. And how about me?

image

Even less obvious matches seem to work:

image

Also right on the money. Nixon got second place when I asked “who was the first u.s. president to resign?”, which is good enough:

image

Other searches, though (“how many copies of Office 2007 has Microsoft sold?” and “how far is it from London to Sydney?”), weren’t any good at all.

Of course, Powerset is so far only parsing Wikipedia articles (“only”: there are 2.3 million of those in the English language). And ask Google the same questions and you’re also likely to get the answers high up: first in the case of Nixon, the Taser inventor and the Suharto resignation, though nowhere on my own alleged career (fittingly). Sydney/London throws up a WikiAnswers page, and I’ve given up hope of finding out how many copies of Office 2007 have been sold.

Still, it’s early days for something like this. There’s no question that a better search engine will one day come along, perhaps belonging to Google, perhaps not. Will it need to parse every sentence for meaning? Who knows?


Ring Tones, Drugs and the Spamming of Google News

This week in the WSJ.com (subscription only, I’m afraid) I wrote about web spam: the growing penetration of faux websites that ride up the search engines and muddy the Internet for all of us. I based it around the recent case of subdomain spam, well documented by blogs like Monetize. Briefly, websites controlled by one Moldovan hit the high rankings on several major search engines using techniques that are imaginative, but not exactly beyond the intelligence of savvy search engine builders. It’s not as intrusive as spam in your inbox, but it’s trashing the web and undermining the usefulness of search engines.

But it’s not just ordinary search results that get spammed. It’s news. A search for “ringtones” on Google News, for example, throws up “free mono ringtones” as the top item:

image

(“Ringtone” throws up similar results.) Amazingly, not only is it the top story, but all six “related” stories, shown as green links beneath it, are from the same domain, advertising a range of goods that can hardly be lumped together with ringtones, including sildenafil and tenuate. (Searches of those words on Google News also have the same domain as top ranked, at least at the time of writing. Here and here. In fact the results for tenuate do not throw up a single news story; all eight matches are web spam.)

The sites in question are all subdomains of www.vibe.com, an online magazine which is indexed by Google News for its pieces on musicians. The pages that hit the top rank of results for ringtone and ringtones, however, are community messageboard pages, and clearly marked as such. That makes me wonder either how the web spammer is fooling the Google bots into indexing pages which are clearly not news by any definition, or why Google’s bots aren’t doing the job they’re supposed to be doing.
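For the curious, the pattern described above (a page of results that all trace back to one parent domain) is easy enough to check for mechanically. What follows is only a rough sketch in Python, not a description of how Google or anyone else actually polices their index. The function names, the 80% threshold and the sample URLs are all made up for illustration, and the "last two labels" rule is a naive shortcut that would misread suffixes such as .co.uk.

```python
from collections import Counter
from urllib.parse import urlparse


def base_domain(url: str) -> str:
    """Return the last two labels of the hostname (naive: ignores suffixes like .co.uk)."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def dominant_domain(urls: list[str], threshold: float = 0.8):
    """Return (domain, share) if one base domain supplies at least `threshold` of the results, else None."""
    if not urls:
        return None
    counts = Counter(base_domain(u) for u in urls)
    domain, hits = counts.most_common(1)[0]
    share = hits / len(urls)
    return (domain, share) if share >= threshold else None


# Hypothetical result URLs, invented for this example only.
results = [
    "http://ringtones.example.com/free-mono",
    "http://pills.example.com/offers",
    "http://www.example.com/boards/123",
    "http://news.example.org/story",
]

# Three of the four results share one base domain, so a 70% threshold flags it.
print(dominant_domain(results, threshold=0.7))
```

A real check would need a proper list of public suffixes and a lot more context, but the basic idea of counting how many "different" results share a parent domain is the same.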

Yahoo! News’ search doesn’t do much better: Its first hit is a web spam site under the domain www.ladysilvia.net, which doesn’t even pretend to be a news site:

image

(MSN’s news search comes out well, without any spam in sight, as does A9, which is basically the same engine.) But why are these sites getting indexed and included in news searches? I can only assume ringtones are such big business that it’s worth the web spammers doing their damnedest to push their results up not only ordinary search rankings but news rankings too. Still, I would have thought Google and Yahoo! would be on top of this. Apparently not.

A Directory Of Podcast Directories

To accompany my column this week on podcasting (which will appear here when it’s out; subscription only I’m afraid), here’s a directory of podcast directories (and search engines), in no particular order:

Some are better than others. Depends what you’re looking for. I’m sure there are more: Please feel free to add.

What’s The Difference Between A Search Engine, A Search Destination And A Portal?

LookSmart has today unveiled some more focused search engines, according to a press release from the company:

It calls them ‘vertical search destinations’ to ‘provide niche audiences with essential search results, versus the typically exhaustive returns from other search engines’:

Two additional resources are dedicated to parents:

Here’s LookSmart’s philosophy: “LookSmart believes that search on the Web will become increasingly vertical and personal. Consumers turn to the Web in search of essential content, be it related to a hobby, work or education,” says Debby Richman, senior vice president of consumer products for LookSmart.

The idea is to ‘tightly integrate’ these engines, or ‘destinations’ (kinda blurs the distinction between a search engine and a portal, eh folks?) via Furl, LookSmart’s consumer online filing cabinet. I’m not quite clear how that tight integration is going to work, but it will be interesting to watch.

Yahoo! Goes Outside For Searches

Maybe it’s just Yahoo! trying out the competition, but a press release from Tucson, AZ-based Webglimpse.net, maintainers of the Glimpse search engine, says that Yahoo! has “purchased several licenses” of its software for internal use. Glimpse is a C program for fast searching of large numbers of text files on Unix systems. It is at the core of Webglimpse, a website search engine.

WebGlimpse’s Golda Velez says: “As I understand it this will be used by Yahoo! and Overture developers as a tool to search local datasets, possibly a large code base.” Why isn’t Yahoo using its own software for this kind of thing?
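To give a flavour of what “fast searching of large numbers of text files” involves, here is a minimal Python sketch of the general index-first approach: build a word-to-document lookup once, then answer queries from the lookup instead of rereading every file. This is an illustration of the idea only, not Glimpse’s actual design or code. The function names and the three sample “files” are hypothetical, and the documents are held in memory to keep the example self-contained.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in WORD.findall(text.lower()):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], docs: dict[str, str], query: str) -> list[str]:
    """Return the sorted names of documents containing every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    candidates = set(docs)
    for word in words:
        # Narrow the candidate set one query word at a time.
        candidates &= index.get(word, set())
    return sorted(candidates)


# Hypothetical file contents, invented for this example only.
docs = {
    "notes.txt": "Glimpse is a search engine for text files",
    "todo.txt": "buy milk, fix the engine",
    "memo.txt": "Search the code base",
}
index = build_index(docs)

print(search(index, docs, "search engine"))
print(search(index, docs, "engine"))
```

The cost is paid up front when the index is built; after that a query only touches the entries for the words it contains, which is what makes this kind of tool practical on a large code base.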

LookSmart Acquires Furl

This whole grab-stuff-from-the-net-and-store-it-somewhere-you-might-be-able-to-find-it thing seems to be taking off at long last.

Furl, which allows you to save clips from the Internet and store/share/access/search them easily, has just told its customers in an email (no URL available yet) that it has been bought by LookSmart, a SF-based “provider of Web search and research-quality articles”.

Furl’s Mike Giles, Founder & CEO, has assured its users that “LookSmart has no intention of changing the things that make it great. On the contrary, LookSmart is committed to making existing features even more powerful.” To sweeten the move for users, Furl is giving each user 5 gigabytes of storage, and has promised that the service will remain free.

Ukraine Weighs In On The Search Stakes

Another addition to my index of indexing programs: diskMETA, from <META> Inc. “the largest search engine provider in Ukraine and a leader in Cyrillic multilingual search engine morphology technologies”.

A press release issued today says diskMETA is one of the fastest desktop search engines, and is available both as freeware and shareware. The program “is intended for extra large data volumes, UP TO 100 GIGABYTES. It can create up to 100 indexes, index up to ONE MILLION various files. The search time is never more than ONE SECOND”. It works on all Windows platforms (98 or higher).

The file search works with Office document formats (DOC, XLS, RTF, TXT), HTML pages, CHM, PDF files, ZIP and RAR archives. There are three versions: Lite (free), Personal ($50) and Pro, which supports morphological English searches and intranet-wide searches ($100).

The search technology used in diskMETA, apparently, “has a long and glorious history. It is used for a decade in the nationwide biggest and most popular web search engine www.meta.ua, in a series of search tools for web-sites and CD-rooms installed in most governmental and financial national institutions” in the Ukraine.

My tupennies’ worth? It’s fast, intuitive and unfussy. You can also view the raw text in a special preview window, but it doesn’t support preview in the same way that X1, dtSearch or the new Copernic Desktop Search do. That said, it’s great to see a new player on the block, especially one so enthusiastic.

The New Search Wars

Search is getting big again. Will it work this time around?

Programs that search your hard drive have been around for a while, but few of them seem to last. There was Magellan, askSam (OK, still around, sort of), Altavista’s Desktop Search, dtSearch (still going strong) and Enfish (still around, barely breathing). That was in the 1990s. But it’s only recently we’ve seen folk get really excited about the space again: There’s X1, Tukaroo (bought out pre-launch by Ask Jeeves), HotBot Search, and now something called blinkx (thanks, Marjolein, for pointing it out.)

Blinkx was officially launched last month as “a free new search tool that thinks and links for you, eliminates the need for keywords or complex search methods, easily finding the information you seek whether it is on the Web, in the news or buried deep within files on your PC.” In other words, pretty much what the other guys do. I haven’t looked too closely at it, but the main idea, as co-founder Kathy Rittweger puts it, is easy search without the logistics: “By eliminating the mechanics of search, such as keywords or sorting through dozens of unqualified results, we drive users more quickly to their goal: finding something, even if they didn’t know it was there!”

That’s good, and I would have said before that that was the way to go, but nowadays I’m not so sure. I think that as disk space grows and people’s hard drives become more complex, different users need different grades of configurability. With most of these new search engines pitching to the ‘lite user’ there’s a danger the more serious document hunter gets left behind. It’s actually a simple calculation: Are you aiming at the casual user who is happy to stumble across a few documents they didn’t know they still had, or are you aiming at the user who needs to find all the documents relevant to their search?
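That choice maps roughly onto what information-retrieval people call precision (how much of what came back was relevant) and recall (how much of what was relevant came back). Here is a small Python sketch of the arithmetic. The function name and the file names are made up for illustration, and the two “engines” are just hand-picked result sets, not measurements of any real product.

```python
def precision_recall(returned: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query's result set."""
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical: four documents on the drive are truly relevant to the query.
relevant = {"a.doc", "b.doc", "c.doc", "d.doc"}

# A 'lite' search returns a couple of sure things; a thorough one casts a wide net.
lite_results = {"a.doc", "b.doc"}
thorough_results = {"a.doc", "b.doc", "c.doc", "d.doc", "e.doc", "f.doc", "g.doc", "h.doc"}

print(precision_recall(lite_results, relevant))      # (1.0, 0.5)
print(precision_recall(thorough_results, relevant))  # (0.5, 1.0)
```

The casual user is well served by the first profile; the serious document hunter needs the second, and will put up with some noise to get it.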

Anyway, it’s good to see folk finally seeing this space for what it is: Horribly underserviced, full of missed opportunities and millions of folk lost on their own hard drives. With Google, Microsoft and others about to enter the fray, here’s hoping that we get something really good out of it.