
March 20, 2008

Counting the Words

I've been looking recently at different ways that newspapers can add value to the news they produce, and one of them is using technology to better mine the information that's available to bring out themes and nuances that might otherwise be lost. But does it always work?

The most popular page on the WSJ.com website at the moment is Barack Obama's speech, which has dozens of comments added to it (not all of them illuminating; but that's another story.) What intrigued me was the text analysis box in the text:

Click on that link and you see a sort of tag cloud of words and how frequently they appear in the text of the piece itself. Mouse over a word and a popup tells you how many times Obama used the word. "Black," for example, appears 38 times; "white" appears only 29. That's nearly 25% fewer times.
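For the curious, a counter like this takes very little to build. What follows is a minimal sketch in Python, not the Journal's actual implementation (which I haven't seen); the function names `word_counts` and `percent_fewer` and the sample sentence are my own inventions for illustration. It also checks the arithmetic on the "black" and "white" figures quoted above.

```python
import re
from collections import Counter


def word_counts(text):
    """Count case-insensitive occurrences of each word in text."""
    # Lowercase everything, then pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def percent_fewer(larger, smaller):
    """Return how much smaller `smaller` is than `larger`, as a percentage."""
    return (larger - smaller) / larger * 100


# Hypothetical sample text, not the actual speech.
sample = "Black and white. Black, white, and black again."
counts = word_counts(sample)
print(counts["black"], counts["white"])  # 3 2

# The figures quoted above: 38 uses of "black" against 29 of "white".
print(round(percent_fewer(38, 29), 1))  # 23.7
```

So "nearly 25%" is about 23.7%, and that is all the tool can tell you: a tally, with no sense of what the words were doing.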

Interesting, but useful? My gut reaction is that it cheapens a remarkable speech--remarkable not because of its views, but remarkable because it's a piece of oratory that could have been uttered 10, 20, 50, maybe even 100 years ago and still be understood.

My point? Analyzing a speech using a simple counter is not only pretty pointless--does the fact he said 'black' more times than 'white' tell us anything? What about the words he didn't use?--but it paves the way to speechwriters running their own text analysis over speeches before they're spoken. "Hey, Bob! We need to put more 'whites' in there otherwise people are going to freak out!" "OK how about mentioning you were in White Plains a couple of times last year?"

Maybe this already happens. But oratory is an art form: it doesn't succumb to analysis, just as efforts to subject Shakespeare to text analysis don't really tell us very much about Shakespeare.

The Journal is just messing around, of course, experimenting with what it can to see what might work. We're merely watching a small episode in newspapers trying to be relevant. And it should be applauded for doing so. But I really hope that something more substantial and smart will come along, because this kind of thing not only misses the mark, but is in danger of quickly becoming absurd.

Perhaps more important, it fails to really add value to the data. Without any analysis of the frequency of words, there's not really much one can say in response to the exercise except, maybe, "hmmm." Compare that with a Canadian research project a couple of years back which developed algorithms to measure spin in the 2006 election there. They looked at politicians' use of particular words, such as "exception words" (however, unless), and at the decreased use of personal pronouns (I, we, me, us), which might imply the speaker was distancing him- or herself from what was being said.
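To give a rough idea of what measuring those two signals might involve, here is a small Python sketch. It is emphatically not the researchers' algorithm, which isn't described here in any detail; the word lists contain only the examples mentioned above, and the function name `marker_rates` and the per-100-words scoring are my own assumptions.

```python
import re

# Only the words named in this post; the researchers' real lists and
# scoring are not given here, so treat these as placeholders.
EXCEPTION_WORDS = {"however", "unless"}
PERSONAL_PRONOUNS = {"i", "we", "me", "us"}


def marker_rates(text):
    """Return exception-word and personal-pronoun rates per 100 words."""
    # Contractions such as "I've" stay as one token and are not counted.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {"exception": 0.0, "pronoun": 0.0}
    exception = sum(1 for w in words if w in EXCEPTION_WORDS)
    pronoun = sum(1 for w in words if w in PERSONAL_PRONOUNS)
    return {
        "exception": exception / total * 100,
        "pronoun": pronoun / total * 100,
    }


# Hypothetical sentence for illustration only.
rates = marker_rates("We will act, however, unless they help us first.")
print(rates)
```

Even a toy like this goes a step beyond a bare tally, because it groups words by what they might signal and normalizes for length, so speeches of different sizes can be compared.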

That sounds smart, but was it revealing? New Scientist, writing in January 2006, reported the researchers' conclusion that the incumbent, Prime Minister Paul Martin, of the Liberal Party, spun "dramatically more than Conservative Party leader, Stephen Harper, and the New Democratic Party leader, Jack Layton." Harper, needless to say, won the election.

Oh, and in case you're interested, Shakespeare used the word "black" 174 times in his oeuvre, according to Open Source Shakespeare, and "white" only 148, 15% fewer occurrences. Clearly a story there.
