Why did journalists ignore the biggest miscarriage of justice in British legal history?

By | July 3, 2024

Why do journalists not cover some stories — even massive ones — and can they be persuaded to?

I’m writing about the UK’s Post Office scandal elsewhere, but for this column on How Journalists Think, I wanted to explore why most UK journalists ignored the country’s biggest ever case of legal injustice for nearly 20 years? And what lessons can be learned about how journalists approach stories — and how PR can help them pay better attention to stories that matter?

“Anyone coming new to this scandal disbelieves it: can it really be that bad?” Neil Hudgell, lawyer, speaking to Sky News.

I’m not going to explore how the Post Office might have told their story differently. It’s clear that the failures were systemic and I’m not sure how anyone with a decent understanding of the situation would have agreed to defend the indefensible.

So I’d like to look at it from another point of view: why did this story get so little media coverage for so long? Why did hundreds, possibly thousands, of sub-postmasters 1 go unheard, uncovered, for so long?

This is a tough one. I’ll throw up my hands first and say I didn’t cover the story and yet I recall reading about it in Private Eye while I was in Asia working at Thomson Reuters — so this must have been around 2014. So I could have followed it up — particularly as the company most involved was a Japanese one. But I didn’t.

Why?

A big story from the get-go

Let’s do a quick timeline first to set the scene and disprove any canards that somehow this story wasn’t a big story until much later. The crime began quickly. This IT system was rolled out starting in October 1999. Six sub-postmasters were prosecuted in 2000. (Let that sink in; within a year of a complex IT system being installed the Post Office was already prosecuting those using it.) Prosecutions based on Horizon IT were regular2:

2001: 41

2002: 64

2003: 56

2004: 59

2005: 68

2006: 69

2007: 50

2008: 48

2009: 70

But despite the numbers media coverage of the issue was patchy, and when journalists did cover the story, they focused on the prosecutions. The Daily Mail, for example, covered the case of Jo Hamilton in 2008, where fellow villagers had raised £9,000 to help keep her out of prison, and whose presence helped persuade the judge to make sentence for false accounting a non-custodial one. Even then, anyone reading the story would be left a strong impression that Hamiton had somehow squirrelled or frittered away some £36,000. The story concluded by quoting a spokesperson for what was then Post Office owner Royal Mail Group: “Sub-postmasters are in a unique position of trust and it is always disappointing when that trust is breached.”

The law won

This is the first hurdle that journalists would face with cases like this. However cynical journalists are — and we can be — the law is the law, and if someone is found guilty of something it’s very easy to now write of them as guilty, and very hard to accept protestations of innocence. After all, this is the Post Office, a government-owned enterprise, and these are serious courts and judges. Even if you believe there might be a miscarriage of justice, you need to persuade your editor of not just the merits of the case, but that you have enough evidence to support a story pushing it. And in cases like Jo Hamilton’s, the sub-postmasters had all admitted a degree of guilt. It’s very hard for a journalist to then listen to why they aren’t really guilty. (After all, everyone in prison says they didn’t do it, and frankly there’s an inherent bias among journalists against those kinds of stories. A journalist is much more likely to take such a story seriously if the case has been adopted by some credible organisation looking at miscarriages of justice — of which there were none in the first decade at least of the Post Office scandal, as far as I’m aware).

Skin and scope

Looking back now, watching how a nation which showed little interest in the story for much of the century suddenly become deeply angry and upset, it’s not hard to see both why it’s compelling now, but raised little interest less than a month ago. It’s because of two things: the scale of the scandal, and the deeply personal, harrowing, individual tales of its victims.

These are two key elements that journalists look for in a story. We want the story to be significant, and that means we want to define its scope. How big is this? “Big” could be defined lots of different ways, but for simplicity’s sake here it’s going to start with quite a simple question: how many sub-postmasters are there and how many of them are having this problem? Each of these individual stories, if taken alone, not repeated elsewhere, is a sad story, but ‘Computer problems lead to false imprisonment of one sub-postmaster‘ is — sadly — not going to really move the needle (interestingly, it could still make a powerful TV drama, but that’s not what we’re discussing here. We’re trying to get serious journalists willing to commit serious resources to this story.)

So a journalist needs to start off with the sense that a significant number of sub-postmasters may have been affected by this problem. Which should be easy, right? All they would have had to do would be to call up the Post Office, or, failing that, dig through court records. For some reason this didn’t happen. First off, this was the noughties, when much of government was not online — court judgements were made available for free and in one place only in 2022, as far as I’m aware.

More likely, an interested journalist might ask the victim themselves whether they’re the tip of a bigger ‘berg. And this is where the problem arises. No journalist could not be affected by the pathos of these individual stories, but individually the victims do not carry much credibility, at least to the casual observer. So, how do you persuade a journalist that you — or your client — are not some lone nut? Especially if you’ve already pleaded guilty to something, however minor, maybe even done time? This was the challenge facing any of the now more than 400 sub-postmasters who might want to try to clear their name. Therein lies the answer, or the beginnings of an answer: scale. If they could club together and present a story about the systemic, mass injustice taking place journalists might take an interest.

The shame factor

The first hurdle is the shame and trauma that many victims felt, making them understandably resistant to seeking attention.

Either in the backwash or the full churn of persecution by the Post Office and courts, enduring the tut-tuts and stares of neighbours convinced of their guilt, many must have wanted the ground to swallow them up, and indeed many died in the process, either naturally by their own hand. Since the ITV drama aired more than 100 people affected by the Horizon scandal have come forward. Their lawyer Neil Hudgell says “they’ve been completely petrified.” But up to 80% of them suffer in silence, according to this this opinion piece by one of the lawyers helping victims. It must also be said that many journalists were prone to chase and harrass sub-postmasters who had been convicted, and the stories threw up lurid headlines which can only have compounded the victims’ trauma.

Even if they did want to get their story out, they were largely alone, and at least initially had no idea there were others suffering. The problem was that few of the sub-postmasters knew any other sub-postmasters. Their union was entirely funded by the Post Office and was less than useful. Sub-postmasters, almost by definition, are remote workers, contracted by the Post Office but not in any meaningful way a community. Even as late as 2011 chats on bulletin boards for Post Office and Royal Mail workers only hinted at what was going on, with a few commenters suggesting that cases were not one-offs: “This is very strange and not an isolated incident,” said one in March 2011, after a suspended jail sentence was handed down to Duranda Clarke, a sub-postmistress from Saffron Walden. Commenting on another story in December 2010 about the imprisonment of Rubbina Shaheen, a sub-postmistress from Shrewsbury, another asks: “Is this not a problem with the computer system? Maybe our subbies could comment.”

Breaking out of the isolation

In other words, the scale of the problem was hidden from those in the midst of it, even a decade into the scandal. This is why the role of Alan Bates, the centre of both ITV’s documentary and fictionalised programmes (both worth watching), is so important. Bates, the sub-postmaster who had been fired by (and lost most of his savings to) the Post Office in August 2003, set up a website for fellow victims the following year, and sent a letter to Computer Weekly, a respected British publication then edited by Tony Collins, author of the classic Crash: Learning from the World’s Worst Computer Disasters (1997), about the problems he had been having with Horizon. But it was only when — four years later — another sub-postmaster, Lee Castleton, sent an email to the magazine that Collins prodded one of his younger staff journalists, Rebecca Thomson, to take a look at the story.

The resulting story was the first to explore the Horizon problem and the apparent injustice behind it, rather than the court cases. But even then the story sat in a folder for a year. Thomson would later say that their biggest fear was the possibility of Fujitsu suing if they mentioned the company. And despite interviewing seven former sub-postmasters for the piece, it didn’t feel particularly substantial: “All we had was the testimony of the Postmasters and a handful of experts saying, “Yes, this looks suspicious but we have no way of knowing what the actual problems are”,” she told Nick Wallis. The story was eventually published in May 2009.

And nothing, outside follow-ups by a few trade publications, happened. Mainstream media showed no interest in the story. As shown above, even habitués of Post Office forums did not appear to be aware of the story and the implications. Only because of the efforts of Alan Bates, did the cause remain alive. The Justice for Subpostmasters Alliance was formed in 2009, and the JFSA.org.uk domain registered that November. The website was active sometime before early 2011, and is still up and running.

Nick Wallis fell into the story by chance, asking a taxi driver for any interesting stories he could follow up. He proved to be, along with Computer Weekly’s Karl Flinders, the most tenacious in staying with the story when many would have given up (or been forced to by their editors.)

Formidable foes

And this is the second problem in terms of getting journalists interested. There are two formidable foes on the other side of the barricades. Fujitsu are one: a huge Japanese multinational employing some 125,000 people. The company had taken over ICL, once one of Britain’s shining tech stars, and made £22 million in profits last year and had its contract to support the Post Office’s Horizon system extended last year.

But there’s a much bigger giant in the UK context: The Post Office. And here is where the sub-postmasters’ story becomes hard to digest: The alleged bad guy is Postman Pat. The story being presented by the sub-postmasters is that they are being gas-lit, wrongly prosecuted, denied justice by the courts and forced to return money they never had in the first place. By one of Britain’s best loved brands, an institution that is etched into the hearts and memories of most Brits. (To give you an idea of how much: when it rebranded itself Consignia in 2001 the Post Office was forced by public derision and protest to reverse the decision within a year. At the time the Post Office claimed it ranked “30% higher on average than many high street brands for being easy to deal with, helpful, knowledgeable and personal.” The Post Office, though clunky, was the place most Brits turned to for at least some of their needs, and for the most part it worked.

For a journalist this is hard to swallow. For the sub-postmasters’ claims to be true, the Post Office would have to lying about a number of things — that the Horizon system was robust and working fine for everyone else, that these cases were isolated, that the ‘guilty’ sub-postmasters were actively conducting fraud, that no-one could access the Horizon terminals remotely and alter data, etc. This was a tall order for any journalist, as Rebecca Thomson discovered. It did not have that first ring of authenticity, that enticing sense a journalist would get that a) this is a great story and b) it sounds like it could be true. As Karl Flinders, who has covered the story for Computer Weekly for a decade, told a documentary commissioned by one of the law firms involved in helping victims: “Every time I write a story, I can’t believe what I’m writing. I think this can’t be true.” In journalists’ terms, it never passed the “sniff test”.

The dangers of algorithms

This was not helped by two other factors.

Often, when the legal outcome of Horizon IT cases were covered, they were lumped together with stories about real sub-postmaster fraud, at least in the eyes of journalists and their publications. Take this piece, for example, from the BBC website, reporting in 2010 about the sentencing of Wendy Buffrey, a sub-postmistress from Cheltenham:

Buffrey’s conviction was overturned but she is still fighting for compensation. The stories pinned to the bottom of the story, however, include one about a similar Horizon IT case, that of Peter Huxham, a sub-postmaster in Devon, who was found guilty and later took his own life, but above one that was a real case of fraud, by a sub-postmaster in London in 2005.

It may have been an algorithm that juxtaposed the headlines, but in the minds of readers, and journalists, the two stories blend together. A journalist trying to persuade an editor of the merits of looking at the sub-postmasters’ story will inevitably be presented with a Google search that throws up a confusing mess of results.

A related factor is that because the sub-postmasters — and the courts — had not been told of Horizon’s fallibility (to use a polite word), both would be looking for an answer to where the missing money went, not realising it never existed in the first place. So they would find themselves forced to admit fraud, which is when they would be pressed to point fingers themselves: in the case of Huxham, the Devon sub-postmaster, he said it might have been “his former wife, his children, or a cleaner.” The judge, understandably, rejected the explanation as “absurd or ridiculous.” Huxham was sentenced to 8 months in prison, succumbed to alcoholism and died alone in 2020. His son has applied to have his case reviewed.

A journalist looking at such a case at the time would reasonably ask why he pleaded guilty and blamed others when he was innocent? Once the complete story is know the answer is tragically clear, but many journalists must have decided to pass on what seemed such a complicated story.

Protecting the brand

This was compounded by the doggedness of the Post Office in “protecting the brand.” It has been relentless in its attempt to police and suppress the story. As mentioned above, their intimidation of sub-postmasters has meant that at least 100 of them — probably many more — have stayed silent, possibly for decades. “They’re scared after dealing with the Post Office once, and all that that juggernaut has brought with it, catastrophe, damage, destruction of lives, and they’re completely petrified of coming out of the woodwork again,” Neil Hudgell, one of the lawyers, told Sky News.

It needs to be borne in mind that the Post Office contains not only the commercial function — the business — but also had investigative and prosecutorial function. This historical oddity essentially makes it be everything but the judge in the legal process. As barrister Paul Marshall wrote in a 2021 paper (PDF):

the Post Office had a direct commercial interest in the outcome of the appeals, similar to, but also different from, its direct commercial interest in its prosecutions (that included brand protection). This is a most unusual circumstance. There is no recorded example of a commercial enterprise having an interest in the outcome of a large number of conjoined appeals where it was the prosecuting authority.

All three arms of the Post Office were trying to achieve the same thing: preserving the brand. The heavy-handed way they went about it made journalists think twice before taking it on.

A sense of the Post Office’s reach can be gleaned from this anecdote from Wallis’ book: When, in 2016, he tracked down Rebecca Thomson, the journalist who had first broken the story back in 2009 but had since moved on to PR, he sent a public message to her Twitter account asking her to follow him so they could share private, direct messages. She did so, but not before someone claiming to be from the Post Office had contacted her boss reminding him the Post Office was one of the company’s clients, and that Thomson “might like to tread carefully.” (Around that time the Post Office had about eight PR companies to its ‘roster’. To give you an idea of what they do for their money, read this sponsored content piece in The Grocer, titled How the Post Office has evolved its offer to meet changing needs and published in September last year. It mentions ‘postmasters’ 18 times , but not once does it address the historical injustice and the ongoing legal shenanigans.)
If journalists did ask questions the Post Office “routinely sent warning legal letters to journalists planning to write about the issue,” according to Ray Snoddy, a journalist at InPublishing. (I have not been able to corroborate this independently.)

There’s another wisp to this story, that Neil Hudgell asks in his company’s excellent documentary, released on Vimeo last year, which I would recommend watching. The question he asks is one that still hasn’t been answered, and serves both as a lure and a warning for journalists: why? “The Post Office admit that they got it wrong. They admit incompetence. They don’t say why the did what they did. And that’s a really important piece of the story that no-one has ever wanted to even begin to address.” It’s a question a good journalist would ask at the beginning of a story, and indeed might be sufficient for them to not consider the story credible. Why would an institution go about destroying the very people it depends upon, its sole network to end-users, and why would it go to such lengths to defend a computer system it didn’t want in the first place?

I will try to address this in the companions to this piece, but it’s still up for grabs — and any journalist approaching this story ten years ago, even 20 years ago, might have just considered the question a step too far, since it would seem to undermine the credibility of the sub-postmasters’ story. Surely no institution would do this, so why should I believe you?

A hopeless case?

It’s a grim, sorry tale, but the heroism involved — of the victims, their families, those that supported them, the less than handful of journalists that covered it extensively, the lawyers that stood up for them, an MP, two forensic accountants — is now being recognised, and offers a glimpse of how this story might have reached a wider audience sooner.

Twenty years on, we are now in a connected age, and so it’s easier for those with similar experiences to find each other. But it’s by no means a done deal. The UK is still a London-centric country and many of its journalists share the same bias.

But it’s not impossible. This was a story that barely caused a ripple until this year, so it’s important to keep that in mind when you are facing problems getting interest in one you want to tell. Some quick bullets to bear in mind, based on the above. The two best ingredients are what I call skin and scope:

  • Skin is the human dimensions to the story. I’ve wept a few times watching the documentaries and drama — it’s hard not to, when you realise that a sub-postmaster is almost by definition a communal soul, dedicated and deeply honest. These are key ingredients to any story, and bringing these people to life for journalists is a key step in persuading them it’s worth their while.
  • Scope is the scale of the story: how many people does this affect? How big could this be? This doesn’t have to be as massive as the scandal eventually ended up being: The Computer Weekly story that first broke it had only half a dozen or so cases. In this case, that was enough — and to be honest should have been enough to make other journalists take note. Help the journalist pin down this part of the story by doing your own research: how many other companies are doing what your client is doing? How much money will be spend/saved/earned? How many countries are or might be affected? How many users? While your client may think they’re unique, it’s rarely the case, and a journalist might be reassured they’re not the only ones doing it.
  • Doggedness: At any point in this saga Alan Bates, or Nick Wallis, or Ron Warmington, or James Arbuthnot, or Karl Flinders, may have given up. We’re talking injustices that stretch back nearly a quarter-century. But they continued to chip away, confident that one day the story would gain the attention it deserved, and an outcome its victims deserved.
  • Anticipate the obstacles a journalist may face: I’ve listed them all above, and not every story is going to face the same ones, but they can be distilled to a few:
    • The Google effect: what does your story look like if a journalist (or her commissioning editor) Googles it? Be ready to explain why the story has or hasn’t been covered before, why there are stories that seem to contradict or dilute your angle, etc. Anticipate.
    • The sniff test: how does your story smell to a journalist? What might put them off? Be ready to explain and address.
    • The foes: Who is the journalist going to come up against in reporting the story? It might not be as dramatic or formidable as in the Post Office case, but it will still have to be addressed. A regulator? Other companies that have tried and failed to do the same thing? Outstanding debts? Competitors who claim to do it better?

The Post Office scandal, hopefully, will change a lot of things in the way Britain handles cases like this, and hopefully it will spur journalists on to cover similar stories. Of which there are many still waiting to be told. You may not have anything as dramatic you want to share, but understanding why this story remained largely unreported for so long might help you better understand what things look like from a newsroom’s perspective.

  1. they’re call sub-postmasters for historical reasons, but they are the ones running each Post Office
  2. The Great Post Office Scandal, by Nick Wallis, Bath Publishing 2021, Timeline

Bubblenomics

By | December 21, 2023

Cory Doctorow is one of those people I’ve never met, but I think of as one of the Elders of Web 2.0. Someone who was there for the first bubble (the one that popped in 2000/1) and so has seen the fundamental subterranean dynamic of Silicon Valley-inspired innovation.1 I was there as well, though not from such a lofty perch. But even from the other side of the world I could see what he describes in his most recent piece about bubbles. There are bubbles that leave nothing behind, and those that leave an interesting residue that becomes the foundation for the next layer of innovation. And that one, now 23 years old, was one of those.

He describes how the bubble left behind lots of university dropouts, whose education in HTML, Perl, Python was financed by the influx of VC money in the late 1990s. Now all these young folks were jobless, but had a bedrock of expertise and the helter-skelter experience of startup-land. As Doctorow writes:

People started making technology because it served a need, or because it delighted them, or both. Technologists briefly operated without the goad of VCs’ growth-at-all-costs spurs.

This I could definitely feel from afar. I’ve bored readers for years how the stuff that came after the bust was much more interesting, and solid, than what came before. A lot of it was what could be called infrastructure: open source stuff for behind the scenes (MySQL, Apache, Linux, Python predated the crash, but usage ramped up in the early 2000s), web content management (RSS, blogging platforms, social bookmarking), file-sharing and media (BitTorrent, Podcasting). Social media — essentially what we think of today as the web — was built on these tools.

So what of it? Doctorow argues that AI right now is a bubble. And not the kind that will yield much residue. He says “the massive investor subsidies for AI have produced a sugar high of temporarily satisfied users”, but the apparent eco-system that is flourishing around the likes of OpenAI should not be mistaken for some thriving hotbed of innovation. Everything relies on large language models, the largest of which are expensive — to make and to run.

The question, then, is will people be willing to pay for this once the hoopla is over? For Doctorow this is the key question, which will determine whether the bubble that bursts leaves a useful legacy, or a bubble that leaves nothing behind (for him Enron, or crypto — more of that anon.)

As he points out, the thing that got me so worked up almost a year ago now is the major stumbling block: who would use LLMs to make big decisions when it confabulates and hallucinates? But remarkably that is still the technology’s selling point: to replace or make more efficient existing people/machines/processes. Using an LLM to look at an X-ray should make the process more expensive, Doctorow argues, because an LLM cannot (or let’s say, should not) be treated as accurate. The radiologist would need to spend time on her own assessment and then spend time on the LLM’s diagnosis.

But as Doctorow says, that’s not the business model. AI is being presented as a money saver, a chance to shed those useless people and create content, analysis stuff, process stuff that is just about good enough. AI’s promise is not better quality, it’s the promise of profitable mediocrity.

So Doctorow argues, AI is a bubble and not a good bubble. When it pops, nothing will be left that can be repurposed, apart from some of the stuff that’s going on in the open source and federated learning space. If you want to take a look at what generative AI might look like without all the expensive servers, check out FreedomGPT, an open source GPT, which works pretty well, so long as you’re not in a hurry.

I suspect Doctorow is right; I believe that we’re essentially playing with subsidised toys, and if the true cost of delivering those toys to us is reflected in the price, we’re not going to be willing to pony up. It is, after all, a fancy search engine, a fancy but less reliable Wikipedia (another phoenix from the dot.com ashes), or an unreliable way to populate eBay listings.

Doctorow is dismissive of crypto, which as mentioned above he dismisses as a bubble on the scale of Enron. I have to declare an interest: I have had clients in the space, but I do agree the space is largely driven by greed, and much of the DeFi world is focused on the wrong things. And each crypto winter so far hasn’t really concentrated minds on what might be useful and enduring out of all this effort.

One day, though, I think it will provide the bedrock of a better infrastructure for transferring and trading value over the internet, and that is something that still hasn’t been fixed. Libertarianism has become so engrained in the philosophy of crypto that the original motivation behind Bitcoin, which I see as more akin to the early 2000s mood of “why can’t we just build something simple to fix this annoying problem, and forget about trying to make money out of it?”, has somehow gotten lost. But yes, I can quite see how people might have lost patience with the space.

In fact, I think something similar might happen with AI. Yes, it’s too early to worry too much about “AI safety” as it’s generally meant. AI is not about to remove us as impediments to efficiency. But I do think AI can, in the wrong hands, cause serious damage to us as tools of information warfare. I’ll talk more about that on another day. For now there’s this: Why are we suddenly talking about an AI catastrophe?, and this: Generative AI: Another way at looking at our new overlord.

I don’t think we should assume that the only route to artificial general intelligence, AGI (the version of AI that most closely mimics our own intelligence) is through the brute force and black box of LLMs. I think ChatGPT may have filled in some of the terra incognita ahead of us, and it may fall to more agile, logical approaches to start navigating that world.

For now, I think it’s still worth playing with these tools while they’re still available and priced for ordinary joes like us. Only by exploring them and kicking them until they break will we understand what might (and might not) lie ahead.

  1. Inspired does not necessarily mean led, but I’m differentiating it from the innovation that has taken place elsewhere, both before and since, and I’m extremely reluctant to join the throng which feels that Silicon Valley is the only source of tech innovation.)

Behind AI’s latest bout of chaos

By | November 20, 2023

The last few days of chaos at OpenAI have illustrated some deeper chasms within the tech world about the future (and ethics) of artificial intelligence. That it seems to have ended with Sam Altman and his crew out of the company is only a resolution of OpenAI’s chaos, not those in the industry itself.

Behind it lie the problems I have highlighted in previous columns (here, here and here. My blog is here, Substack here.). The most recent points out that the big announcement at Sam Altman’s big shindig before the blow-up, a roll-your-own ChatGPT tool, was deeply and dangerously flawed.

In a nutshell, what is happening is this: (ex) CEO Sam Altman and his cohort, despite some public hand-wringing about the need for government supervision of artificial intelligence as it (supposedly) nears a general human-like intelligence), wants to move quickly and break things, running out ChatGPT tools and gizmos that give it a strong commercial footing in the race to dominate this AI summer.

His opponents, led by another co-founder Ilya Sutskever, who used his position as director of the board to push out Altman, apparently feeling that he was moving too fast and operating without the say-so of the board. The Economist summarizes the situation well, outlining the dividing line between “boomers” — who want to accelerate the development and rollout of AI — and “doomers”, who fear that might unleash an AI that threatens us all.

The Economist also points to the commercial underpinning at work — those calling for acceleration tend to be somewhat further behind OpenAI in the race, while doomers are further along. This suggests two things: a ‘freezing in place” of AI development would leave OpenAI with a massive advantage. It also suggests that the further along you are, the more scary AI seems to be. And at the centre of it all, Altman personifies the schizophrenia of the industry: he talks of guardrails but clearly wants to move quickly. The promised land beckons.

Fast and spurious

How this plays out I’ll leave to the reporters. The weekend has been nuts, and it doesn’t look like anything is settled as I write. The most likely outcome, however, is we get more players in the field, more investment, and an imperative for greater speed.

This doesn’t mean things will get better. I signed up for the pro version of new AI product, perplexity.ai, which promises a better search engine, learning in the conversation about what you’re looking for and doing a better job of finding it. It’s sleek, and fast. But it’s not what it thinks it is. In the midst of a conversation about AI’s ‘black box’ problem, where it clearly grasped what was meant by the term in that context and provided useful answers, I asked it to “provide some concrete cases where the black box problem was the cause of a serious accident with physical effects”. It offered me this:

The black box problem has been a significant factor in various accidents, particularly in the context of truck and car accidents. Black box data, which records information about a vehicle’s status and operation, has played a crucial role in understanding the circumstances surrounding accidents and assigning liability. In truck accidents, black box data has been used by law enforcement, accident investigators, attorneys, and insurance companies to determine fault and reconstruct the events leading up to the accident. Similarly, in car accidents, black box data has been utilized by insurance investigators, personal injury law firms, and accident reconstruction experts to identify the party at fault and provide crucial evidence for legal cases.

The lack of transparency and understanding of how deep learning systems make decisions, known as the black box problem, has also been a concern in the context of AI. This issue has implications for safety, trust, and ethics, particularly in high-stakes applications such as autonomous vehicles, medical treatments, loan approvals, and job interviews.

In conclusion, the black box problem has had tangible effects in accidents involving trucks and cars, where black box data has been instrumental in understanding the causes of accidents and assigning liability. Additionally, the lack of transparency in AI decision-making poses significant challenges in high-stakes applications, impacting safety, trust, and ethical considerations. Efforts to address the black box problem are ongoing, including the development of explainable AI and the consideration of regulatory frameworks to categorize potential AI applications based on risk.

Black in the box

I’m no rocket scientist but it seems to be conflating two quite separate ideas — the black box in AI and the black box in airplanes. I pointed this out and perplexity.ai acknowledged the error.

You are absolutely right, and I appreciate the clarification. It seems there was a misunderstanding in my previous response. The term “black box” in the context of AI refers to systems with internal workings that are invisible to the user, making it challenging to understand how the AI arrives at its decisions or predictions. This lack of transparency has significant implications for accountability, trust, and ethical considerations, particularly in high-stakes applications.

The black box problem in AI is distinct from the concept of black box data recorders used in transportation, which are valuable for postmortem analyses in the event of accidents. These recorders capture information about a vehicle’s status and operation, aiding in the reconstruction of accidents and the assignment of liability.

I apologize for the confusion, and I appreciate your understanding. Thank you for bringing this to my attention.

In other words, perplexity.ai is no better or worse than a search engine. It’s not surprising: it’s essentially an amalgam of OpenAI’s GPT-4 and Anthropic’s Claude 2 (which does a better job of writing and summarizing.) Reviews have recommended it if “real-time data and trustworthy information are your priorities”. And it’s the brainchild of Aravind Srinivas, a former research scientist at OpenAI.

Look, I like perplexity.ai, I like Claude, I like ChatGPT. I do believe this stuff is the future. And I do, more cautiously, believe that down the track we may face some sort of existential threat from AI. But to me that’s not the issue right now. The issue is that by running fast and breaking things, this breathless, ego-driven race to be the first to plant a flag in the LLM summit, we’re all going to suffer, where error-strewn products are rolled out and companies, terrified at being left behind, feel duty bound to build their APIs into everything.

That’s where the pain will be first and we should be focusing on that.

OpenAI’s flawed bid for mass adoption

By | November 15, 2023

There’s a lot of excitement, understandably, about ChatGPT rolling out a “roll-your-own” ChatGPT tool. In fact, it’s been so popular OpenAI have had to suspend new subscriptions, and I’ve several times had activity on my account paused because of overload.

But if you think this is a sign of progress, and the democratisation of generative AI, think again.

The customized versions of ChatGPT I have tried to create are buggy in the extreme, and not ready for anything other than playing around with. And by buggy I mean no better at adhering to human guardrails than earlier versions of GPT. I worry that this desire to move quickly and break things, to get your product into the mainstream of consumers before the others, is leading to a serious problem of ‘mediocritisation’ of what are already seriously threatened areas of content: blog writing, analysis, education and commentary.

The first one I created, Media Mentor, worked fairly well. It does what ChatGPT is best at — creative thinking — guiding the user in thinking how to monetize journalistic ideas. The second, The Veracity Van, a fact-checker specialising in analysing online claims and their origins, is not too bad either, although it’s really not that different to asking ChatGPT itself the same question.

The problem comes with trying to make more specific changes that would make tools more specialised. You are presented with two panes: a left one for entering instructions in ordinary human language, the right pane a working preview of what the user will see.

ChatGPT would acknowledge any changes requested and claim to have implemented in the code, but this (mostly) didn’t happen. In other words, the ChatGPT channel you use to program and tweak the tool hallucinates.

In one case it admitted that it was acknowledging and implementing instructions it did not have the wherewithal to implement. Trying to include a copy icon to the tool is not possible, I was eventually told, because it did not have access to the underlying code.

Things were decidedly worse when it came to trying to limit ChatGPT to a specific part of the web. I asked it to limit its searches in response to prompts to the URLs within a Google Custom Search Engine I had set up. I didn’t expect it to agree, but it did, along with instructions about what it should do if it couldn’t find the answer there. While ChatGPT agreed and included it in its instructions, its answers were derived from Bing.

Something similar happened when I uploaded some documents and asked ChatGPT to limit its ‘knowledge base’ to those documents, the best I could get was the a response that started out well:

The documents in my knowledge source do not directly address the specific question of whether you should join an interview between a journalist and one of your clients. However, they do provide insights into the dynamics of such interviews and the role of the client in them.

But it then continued along the same track with material I agreed with but didn’t recognise from the source documents. When challenged ChatGPT sheepishly responded:

I apologize for the confusion, but upon rechecking my knowledge source, I did not find a direct mention of “the importance of clients being prepared for interviews” in the documents provided. This part of my previous response was based on general principles rather than specific information from the documents. If you have any other questions or need advice based specifically on the content of the documents, please let me know.

This despite the ChatGPT instructions being clear that it should restrict itself to material within its knowledge base. Tweaking the instructions further actually made the thing worse. When I asked it to cite which document it was using as the source, it confidently came back with its name. When I asked it which sentences it was relying on in that document, it confidently listed them. But when I asked which sentences it was using for a specific sentence in its response, it caved in:

Upon reviewing the specific sentences quoted from the document “[redacted],” I realize they do not directly support the statement “assess how your presence might be perceived.” This part of my response was an inference made from the broader context of managing client-journalist interactions, rather than a direct quote from the document. The document focuses on the dynamics of an interview, the nature of questions asked, and the perception of the client’s responses, but does not explicitly discuss the impact of a third party’s presence in the interview​​. I apologize for any confusion caused by this inference.

When I noticed that a lot of the documents I had uploaded to the tool’s knowledge source were no longer listed there, I was told that there may have been an error uploading them — not that I saw any such message. So I uploaded them again. This time I asked a specific question which I knew to be in one of the documents.

Please give an example of a specialist market

ChatGPT came back with an answer:

Based on the content of the files you have provided, I couldn’t find a specific example of a “specialist market.” Therefore, I’m unable to provide an example based solely on the information in these documents.

Not great, considering we’re not talking a massive database to trawl. But OK, maybe it hasn’t done a complete run, and at least it’s not making stuff up. Right?

So then I asked it another question I knew was in the database, and this time it went on for two paragraphs, most of it not from the documents. When I asked for a source, it was quite clear:

The information provided about market reporters was derived from my knowledge source, specifically from documents you have uploaded. This content was not drawn from an external source or general knowledge but is based on the specific content of the files you provided.

Fair enough, what I want. But which specific document or documents, I asked.

It cited a document, in the format most of the documents were in when I uploaded them (document number, a keyword, date of document). “This document,” it declared, “provided insights into the role and responsibilities of market reporters.”

I thought I might have missed something so I went back through the documents. Turns out, there was no such document. The document name was a convincing hallucination. ” But there is no such document,” I complained.

It thought for a minute.

I apologize for the confusion earlier. It seems I made an error in referencing the documents. If the information about market reporters is not contained in the documents you have uploaded, then I must correct my previous response: I do not have specific information about market reporters from the documents you provided. My earlier response was based on general knowledge, which was not appropriate given your instructions and my designated role. Thank you for pointing out the discrepancy.

In other words, we’re no further forward than back in February, when I first started writing about the hallucination problem.

It’s worth keeping this in perspective. These are breathtaking advances and that AI has advanced this far is remarkable. But the tools are virtually useless for anything other than creation. They are not to be relied on for anything vaguely important, where it might be used in connection with facts, verification, accuracy or anything remotely critical.

And I would have to say, pushing this out to the public without properly testing it and warning users that these are toys, not products, not services, not search engines, not anything remotely functional, is highly irresponsible.

There’s a reason the Cambridge Dictionary chose ‘hallucinate’ as its word of the year. Unfortunately even Cambridge University itself doesn’t properly understand the term: It quotes Wendalyn Nichols, Cambridge Dictionary’s Publishing Manager, as saying:

The fact that AIs can ‘hallucinate’ reminds us that humans still need to bring their critical thinking skills to the use of these tools. AIs are fantastic at churning through huge amounts of data to extract specific information and consolidate it. But the more original you ask them to be, the likelier they are to go astray. At their best, large language models can only be as reliable as their training data. Human expertise is arguably more important – and sought after – than ever, to create the authoritative and up-to-date information that LLMs can be trained on.

I would argue no; they are not only as good as their training data — they are worse than their training data, because they confabulate on top of that training data. And they are poor at churning through amounts of data to extract specific information, not “fantastic”. They essentially can’t tell what is specific or not.

Yes, they are good at going out there and saying “this is what I’ve found.” But they are (mostly) lousy at telling you specifically where they found it, what else they found that they’ve left out, and even judging whether what they’ve found is accurate.

I would love to be corrected about what I may have done wrong in my attempt, and I do recognise I’ve not played around with some of the extra configuration options within the roll-your-own ChatGPT functionality. But given I’ve done most of what I expect other ordinary users have done, I suspect my experience is likely to be somewhere near the norm.

Once again, we need to hold back before rolling out this kind of thing until these very basic issues are fixed. This is not a new game or beta feature in a niche app. This is real stuff, that real people may end up relying on for real world needs.

Generative AI: Another way at looking at our new overlord

By | November 14, 2023

We have grown sick of social media. But what comes next?

We have had nearly two decades of it now and it’s made a lot of us unhappier.

What’s coming next could go either way. In some ways generative AI — the flavour of AI in ChatGPT and its competitors — offers us something even worse, where technology locks us in a room with ourselves. In this world generative AI just amplifies the customisation of content to monopolise our attention. The vision that most compellingly captures this is E.M. Forster’s, where in The Machine Stops (1909) humans obsess over machine-driven prompts and lose the will and capability to connect socially.

E.M. Forster's The Machine Stops, imagined by ChatGPT, 2023-11-13
E.M. Forster’s The Machine Stops, imagined by ChatGPT, 2023-11-13

That’s not fun to think of. We know that leaving us alone with interactive technology rarely makes us better, more socialised people. We are susceptible to manipulation, we engage in a race to the lowest common denominator, we ignore our surroundings, we cannot look away.

Generative AI may well further entrench and monetise this.

Beyond the Milky Way

But there is another way: where generative AI offers us a chance to access, and contribute to, a perfect union of human and machine, allowing us to unlock the riches of human knowledge and solve some of the mysteries of our age. In this guise AI can unlock and mobilise everything that any one of us has discovered and give us insights into those things that still befuddle us: the quantum world, the extra-dimensional world, the physical world beyond our galaxies.

All this is up for grabs.

And no, I’m not lapsing into madness. Just exploring the less outlandish conclusions reached by a surprisingly well connected cohort of minds that intersect Silicon Valley government, academia, and religion. It’s best captured by Diana Walsh Pasulka in her just-published Encounters, one of the first academics to peer behind the curtain of respectability to find a hidden elite of thinkers who are ready to believe at least some of the less scientific research into extra-dimensional worlds, including those that could be from another galaxy or epoch.

This is an area to tread lightly on, for two reasons. One is that, as Pasulka puts it herself, much of ufology — the study of UFOs — is a “clown show.” You have to be careful out there. And secondly, there is a strain in Silicon Valley of elitism I would say borders on a übermensch fetish — the pursuit of a becoming a superman (for they are, mainly, men). The people Pasulka talk to are not, for the most part, of that world, but there is definitely an overlap.

An übermensch, imagined by ChatGPT, 2023-11-13
An übermensch, imagined by ChatGPT, 2023-11-13

Moving a muscle

Despite that, I believe this idea that GAI might unleash — and democratise — the sum of our knowledge is worth exploring. I believe it because generative AI has come at a point where we have hit a technological wall. We’re tired of social media’s narcissistic self loathing, the dumbed-down world where the loudest, crudest and cruellest mouth wins out. Where idealism — including Effective Altruism — becomes just another grift. Where our computer barely needs to move a muscle to do the things we ask of it, while at the same time vast hangers are set aside for processing the algorithms to keep us hooked on the next doom-scroll, to process the micro auction to decide who gets the privilege of flashing an ad to us, or processing the next block in a cryptocurrency transaction.

(To provide some scale, the online ad business accounted for a 10th of all energy consumption by the tech industry in 2014 (PDF; the latest year figures are available), crypto accounts for up to 0.9% of global energy usage, and social media accounts for 0.61% of the world’s CO2 impacts in 2019.)

The challenge we face, therefore, is huge. We somehow have to redirect all this computing to something worthwhile, something from which all of us can benefit. Which is why I cut those in Pasulka’s book who talk in these grandiose ways a bit more slack than I might have done previously.

While we focus inwards, they are focusing outwards, beyond the micro cents and algorithmic addictions, to something much more interesting. They see themselves as descendants of the Rosicrucians, an order of sages that combined mysticism with what we might today call science. One group called themselves the Invisible College, a term that modern-day ufologists have adopted for themselves.

I’m not a fan of this secrecy and elitism, but given the public (and academic) contempt for those who believe there might be “something out there” it does make sense. There are at least three Nobel laureates who have faced ridicule or persecution for their interest in such things.

The Invisible College, imagined by ChatGPT, 2023-11-13
The Invisible College, imagined by ChatGPT, 2023-11-13

Ripples never come back

But we don’t have to go all that way. GAI could help us unleash a new wave of learning by scaling up an under-sung contribution to learning: Youtube. This vast engine of what you might call distributed learning has gone largely unnoticed. It presents and delivers highly accessible content in a personalised format. The question is whether generative AI will magnify and scale up this learning or whether it will bypass it.

Let’s take the Youtube channel of musician Rick Beato. He has 3.8 million subscribers, who watch in live feed when he expertly breaks down a popular song to its constituent parts. A recent video on Genesis’ “Ripples” explains not only the song but the nature of chord progressions, training your ear to work out chords and melody, and on the importance of knowing music theory. He does all this while viewers pepper the chat room with comments, questions and requests.

So how would GAI impact this? On the scaling side, AI could answer the questions automatically, further personalising the delivery of information, and ultimately leading to greater and deeper specialisation as users dig further, with generative AI as a guide. In this model Beato becomes the entry point, the introduction, the inspiration for the creators themselves, quickly moving beyond Beato’s (impressive) knowledge of the subject.

Or there’s another way, where GAI bypasses this bubbling lava pool of knowledge. In this future generative AI would simply need a few pointers on what style you want your song to be, how many beats per minute, what key signature, what mood you want — and then hey presto! You have a piece of “original” content for you. This approach might well sap the desire to explore and master musicianship and to engender that heady feeling that you’re making something out of nothing — the most extraordinary human gift and achievement.

Toxic tunes

If AI can do it better than you then what motivation do you have for earning that? (This, by the way is already happening. Here’s a list of the top ‘AI music generators’ and a piece by two Andreessen Horowitz partners, whose vision is exactly as I described above: “Our ultimate dream? An end-to-end tool where you provide guidance on the vibe and themes of the track you’re looking to create, in the form of text, audio, images, or even video, and an AI copilot then collaborates with you to write and produce the song.”)

Almost certainly, both sides will find a way to stay in the game. But whereas we can still argue that social media was just an aberration, a failure on our part to figure out a non-toxic balance between addiction, privacy, business models and community, we shouldn’t just assume that things will be different this time. There is huge profits to be made from our social media addiction, and so we shouldn’t assume the same interests will be trying to prevail this time.

While I’m sure it’s a little idealistic to dream this way, I do think there’s a possible future that involves a hyper-personalised world of knowledge and betterment, without the isolation, mental health issues, and general troll-like toxicity.

The chances are greater if we’re paying attention.