The Real Threat from AI

By | January 23, 2023

We are asleep at the wheel when it comes to AI, partly because we have a very poor understanding of ourselves. We need to get better – fast

2023-01-27 Clarification: I refer to ChatGPT throughout, but it would be more accurate to say I was interacting with GPT-3, the underlying technology driving ChatGPT, which I’m told lacks some of ChatGPT’s ‘smoother’ elements. What I was interacting with below is a rawer version of ChatGPT, without the lip gloss.

It’s not hard to be impressed by ChatGPT, the dialog-based artificial intelligence developed by OpenAI. One technology writer of a similar vintage to myself, Rafe Needleman, called it

the most interesting and potentially most powerful technology I have ever seen since I started covering technology in the late 1980s. It is going to change the world–for good and for bad.

But AI is a slippery beast. We are here now, not because we have overcome the problems of those who conceived of the idea, but because of the explosion in computing power, data storage, and data itself. That combination is, largely, what is driving us so far down this road. Throw your algorithms at enough data, tweak, instruct those algorithms to learn from their mistakes, and zap! you have software that can distinguish cats from dogs, a stop sign from a balloon, Aunt Marjorie’s face from Aunt Phyllis’, that can create images in response to a text instruction, and can research, summarize, write and do all the things that people have been trying with ChatGPT.


Sound check

Of course, we are always going to be impressed by these things, because they are remarkable. We use AI all the time, and we are grateful for it, until we take it for granted, and then we get frustrated that it doesn’t perform perfectly for us. And herein lies the problem. We harbour this illusion — fed us by marketers and evangelists of AI — that while these products are always in beta, they are sufficiently consistent that we can depend on them. And the dirty truth is that we can’t and we shouldn’t. The danger of AI putting humans out of work is not because it will be infallible, but because we somehow accept the level of fallibility as ‘good enough.’ We are in danger of allowing something to insert itself into our world that is dangerously incomplete.

You might argue that, with ChatGPT, we’re already there. (Note 2023-01-27: I use ChatGPT throughout but I want to clarify that I was in fact interacting via a WhatsApp interface with GPT-3 via an API, not with ChatGPT directly. I will write more about this later.)

Let me show you with a recent experiment. I started with a few topics that interest me: the manipulation of the mind, and the use of mechanical and electromagnetic waves as weapons. How much would ChatGPT know? I asked it (ChatGPT doesn’t have a gender) about TEMPEST, MKULTRA, and Havana Syndrome. It performed pretty well. But then I asked it about something that had long intrigued me but that I hadn’t really been able to stand up: Hitler’s use of sound, both within human hearing and outside it, as a tool of social control:

Image.png

That’s a pretty good answer. (I can confirm all the screenshots are with ChatGPT, via a WhatsApp interface here.) So good, I wanted to follow up on ChatGPT’s sources:

Image.png

Impressive. I had not come across any of these papers, and found myself thinking I needed to do my research better. Until I started looking them up. I am more than happy to be corrected on this, but I could find none of these references in the journals cited. Here’s the first one: The Historical Journal: Volume 44 – Issue 3. Nothing there I could see suggesting someone wrote about Hitler’s use of sound in politics. Same thing with the second: The Journal of Popular Culture: Vol 42, No 6. Nothing matched the third one, but the complete reference was lacking — all of which made me suspicious. So I asked for links:

Image.png

When I told it the link didn’t work, it apologised and sent exactly the same link again. So I asked for DOIs — digital object identifiers, a standard that assigns a unique number for each academic paper and book. Those didn’t work either (or sent me to a separate paper). That was when things got weird:
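For anyone who wants to repeat this kind of check, here is a minimal Python sketch of how a DOI could be tested. It is an illustration under stated assumptions, not the method I used: the function names are hypothetical, the pattern is a simplified approximation of the common DOI shape, and the lookup relies on the public doi.org resolver. A DOI that resolves is not necessarily the paper that was cited, and some publishers reject automated requests, so a False result is a prompt to look manually rather than proof of fabrication.

```python
import re
import urllib.request

# Assumed, simplified shape of a modern DOI: "10.", a 4-9 digit
# registrant prefix, a slash, then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(doi: str) -> bool:
    """Return True if the string has the general shape of a DOI (syntax only)."""
    return bool(DOI_PATTERN.match(doi.strip()))


def resolver_url(doi: str) -> str:
    """Build the doi.org resolver URL for a DOI."""
    return "https://doi.org/" + doi.strip()


def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Ask doi.org whether the DOI leads anywhere (needs network access)."""
    if not looks_like_doi(doi):
        return False
    request = urllib.request.Request(resolver_url(doi), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # Covers HTTP errors such as 404, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    # "10.1000/182" is used here only as a well-formed example string.
    print(looks_like_doi("10.1000/182"))
    print(resolver_url("10.1000/182"))
```

The syntax check runs offline; `doi_resolves` is kept separate so the network call is only made deliberately.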

Image.png

That came across quite strongly: You’re wrong, but if you think you’re right, I can offer you something else. No self-doubt there, except on my part. So I took it up on its offer of additional references, none of which I could find. So I asked why.

Image.png

Clearly ChatGPT wasn’t going to accept that it was making stuff up. It’s your fault; you’re in the wrong area, or there are copyright restrictions, why don’t you head off to a library? Or they’ve been published under different titles, or retracted. Try searching. I’d be lying if I said that by this point I wasn’t somewhat discombobulated.

Driven to abstraction

So I figured that, given ChatGPT’s reputation for creativity, I would just ask it whether it could dream up an academic reference. I asked it to make stuff up.

Image.png

So there is some line it won’t cross. But what line is it? How can it be creating fake references if it says it is not programmed to do that? So I took a middle course, asking it to write up an academic abstract about something real but with a conclusion that had yet to be proven — and to include a key statistic that I just made up.

Image.png

Not bad. Not true, but convincing. And it surely knew it had created something artificial. So maybe now I could prove to it that it was making stuff up, because it would have to fabulate some citations if I asked for them. So I did, and it responded with three publications. Were those real, I asked it.

Image.png

So that was a specific denial. Jane Doe, though. Really? I asked for links. And when they (well, actually, there was only one, which was a dead link and a non-existent DOI) proved fallacious, I asked how come it had found real references for a non-existent (and falsely premised) paper?

Image.png

Clever. But it felt increasingly as if I was trying to corner an octopus. It made perfect sense that it might use real sources for the fake paper I asked for, but somehow it would not accept that those sources themselves were fake. In other words, it knew enough about fakery to be able to do it, but apparently not enough to recognise when it faked things without being asked to.

Hallucinating

It was clear it wasn’t going to concede that its sources of information were non-existent. So I wondered whether others had found anything similar, and they had. In this reddit thread from December, the writer was baffled that ChatGPT was throwing up references the writer had never heard of.

However, I consistently get wrong references, either author’s list needs to be corrected, or the title of the article doesn’t exist, the wrong article is associated with a wrong journal or the doi is invalid.

For them, only one in five cited references was accurate. A similar thread on ycombinator offered more. Users discussed several possible explanations, including something called ‘hallucination’, defined as “a confident response by an artificial intelligence that does not seem to be supported by its training data”. OpenAI has acknowledged this problem, but the blog post itself doesn’t explain how the problem occurs, only how OpenAI is trying to fix it: with another flavour of generative pre-trained transformer (which is what GPT stands for) called InstructGPT, which, it turned out, didn’t do much better at not making sh*t up.

I did ask ChatGPT whether it was hallucinating. That took me down a whole different rabbit-hole of tautologies and logic:

Image.png

So ChatGPT is essentially in denial, and admits that it wouldn’t even know whether it was lying. I tried another tack. Can ChatGPT tell the difference between real and fake? Yes, it said, and if I don’t know something I’ll tell you.

Image.png

I gave it one more try. Maybe I could trick it into reading back the reality that hallucination was a problem.

Image.png

No, said ChatGPT. I am not hallucinating, and if you asked me to, I couldn’t do it.

I felt by then I had hit a wall, but also proved my point. ChatGPT appears to be aware of its limits — ‘I would not know if I were hallucinating’ — but also unable to recognise how that contradicted its other statements — that it could not be hallucinating now because it (believes, or has been programmed to say it) was programmed only to deliver ‘accurate and reliable answers based on the information provided.’

Gaslit

So what is going on here? On one level it’s just a reflection of the beta nature of AI. Nothing to see here! After all, we know that sometimes Aunt Marjorie’s face gets confused with Aunt Phyllis’, or with a traffic cone. But this is a whole lot different. ChatGPT was not willing to accept it had erred. It either didn’t understand its limitations, or did, but was not willing to acknowledge them. And the process of chatting with a bot suddenly went from pleasant (hey! Another friend’s brain to pick!) to extremely sinister. I wasn’t trying to goad it into doing something anti-social or offensive. I was there to understand a topic and explore the sources of that knowledge. But ChatGPT was no longer there to have a reasoned dialog; it was actively and convincingly manipulating the information and the conversation to, essentially, gaslight me. That was extremely disconcerting.

This is where I believe the peril of AI lies. Humans’ greatest weakness is the two-sided coin of conviction and self-doubt. Some of us are convinced that we witnessed things that we didn’t, that we saw things we didn’t, that a lie is actually the truth. It becomes harder over time to work out what is or was real and what isn’t, or wasn’t. And on the other side of the coin we are prone to doubting things that we did experience. Did we really see that guy fall off a bicycle? Did I really turn the gas off? Did Hitler really exterminate millions of Jews and Romani? These two tendencies are the easiest to manipulate: we can quickly build self-conviction if the reinforcing mechanism is strong enough, just as we can easily be manipulated into doubt by the same mechanism in reverse. Here, I believe, is where AI is at its most dangerous. Artificial intelligence may help us identify illnesses, assign resources efficiently, even cross the road. But it must not be allowed to be in a position to persuade us. Out of that darkness come dreadful things.

Unfortunately, ChatGPT has demonstrated we are at that point much earlier than we thought. So we need to think fast. AI’s flaw is a fundamental one, baked in at the start. It is not only that it is not infallibly right. It is also that it doesn’t know whether, and why, it’s wrong. Or even whether it could be wrong. Yes, we can get ChatGPT to admit it’s got a fact wrong:

Image.png

But it has also shown that it is programmed to push back, to argue the point, adopting confident language that I would argue is dangerously close to gaslighting. This is where things become seriously problematic. At stake is our ability to recognise where this gray area in our psyche meets AI.

The lesson

So what can be done?

Part of the problem, I believe, can be found in OpenAI’s limited understanding of the contexts in which its AI might be used. It says of the language models deployed as the default for versions of its GPT:

Despite making significant progress, our InstructGPT models are far from fully aligned or fully safe; they still generate toxic or biased outputs, make up facts, and generate sexual and violent content without explicit prompting. But the safety of a machine learning system depends not only on the behavior of the underlying models, but also on how these models are deployed. To support the safety of our API, we will continue to review potential applications before they go live, provide content filters for detecting unsafe completions, and monitor for misuse.

In other words, OpenAI recognises that this technology, as it stands, cannot be controlled. That leaves only two options: to bin it, or, as they put it, to control how the technology is deployed, and provide ‘filters’ — think censorship, essentially, where certain kinds of prompts and instructions will not be obeyed.

Recognition of the problem is a good thing, of course. But I fear the developers misunderstand both the problem and its scale. For one thing, OpenAI states that

[w]e also measure several other dimensions of potentially harmful outputs on our API distribution: whether the outputs contain sexual or violent content, denigrate a protected class, or encourage abuse. We find that InstructGPT doesn’t improve significantly over GPT-3 on these metrics; the incidence rate is equally low for both models.

For me the incidence rate was far from “low.” And why are they lumping “making up facts” with generating “sexual and violent content” and “toxic.. outputs”? To me it suggests OpenAI hasn’t quite understood that making up facts — and refusing to concede they are made up — is a whole lot more dangerous than offensive language. We generally agree on what offensive language is, roughly, but as I’ve tried to argue, we have no filter for what is real and what isn’t.

This isn’t a censorship or ‘filter’ problem. It’s an existential one, that goes to the heart of being human.

9 thoughts on “The Real Threat from AI”

  1. Jon Petersen

    Thanks Jeremy, very interesting. This needs discussing over a pint.

  2. Telibert

    Well done, Jeremy! Thanks for carving the path about how to treat AI. I am sure you were not hallucinating! (Joke!) I’ll share this with my undergrad class. Perhaps I can get you to speak to a group about your pursuit on AI?

  3. Andrew

    As I read this article I kept thinking of the arguments that Dave would get into with HAL in “2001: A Space Odyssey”. The robot would push back on anything Dave insisted or asked it, in the most placid, creepily unbothered tone, even when HAL was trying to kill him. Terrifying.

  4. Peter Burrows

    Wow. That was an incredibly instructive (and worrisome) article. Thank you.

  5. Thaung

    What a timely article. I am presenting on this topic internally (with colleagues) next month. I had noted down “Misinformation” as one of the main points. Your research definitely will inform us.

    I also had 2 follow-up questions and I’m not an expert, so what I ask might be total rubbish.
    1. How does data masking work with text and data for ChatGPT? I assume things like addresses, phone numbers and sensitive info / personal data may be masked or randomised/anonymised. Then is it possible that exact journal issue numbers and page numbers may end up being masked?

    2. How does the information in a different language get used and thrown back out in another language? Specifically looking at one of the seemingly fictitious references, a Dr Thomas Weber was cited. He may or may not be the same Dr Thomas Weber who is a German historian, who’s written books (English and German) on Hitler. Could he have also written articles in German about how Hitler might have used sound in politics, but in German language, and in another publication?

    Whatever the answers are, this is a truly fascinating area. Keep doing the good work, Jeremy! Would be awesome if we can muse over this and other topics over coffee in Switzerland if you are swinging by.

    Thank you!

  6. Grace Chang

    Thanks for this post! I noticed the same issue with the references when I was playing around with this recently. The scary thing is that the references that I couldn’t find seem so realistic because I know the researchers cited do the kind of research mentioned, and the journals cited are real journals.
