A version of this story appeared in CNN’s What Matters newsletter.
One distressing headline out of New Hampshire as voters prepared to cast in-person primary ballots was that a fake version of President Joe Biden’s voice had been used in automatically generated robocalls to discourage Democrats from taking part in the primary.
It’s not distressing because of the message, per se – Democrats had penalized New Hampshire for insisting on holding its primary Tuesday. Biden was not even on the ballot, although a write-in campaign was launched on his behalf.
The distressing thing is that while the audio appears to be fake, it sounds just like the president and even uses his trademark “malarkey” catchphrase.
Who created the audio? Who is behind the robocall? There have been warnings for years about a new era of deepfakes being used to manipulate US elections, from within or outside the country. That era is undeniably here.
I talked to Donie O’Sullivan, CNN’s correspondent covering both politics and technology, who has been on this story for years. Excerpts of our conversation are below.
WOLF: You cover this stuff every day. What crossed your mind when you saw this headline?
O’SULLIVAN: It’s probably one of the first of many, many, many headlines and stories like this that we are going to be seeing this year.
If we’re not already in it, we’re on the precipice of an explosion of AI-generated disinformation.
It’s most likely that this audio was created using artificial intelligence. I think we’ve all kind of become familiar with the fact that this technology is out there, but I think we’re ready to kick into gear with this current election campaign.
Importantly, there’s been a lot of focus on fake videos over the past few years.
But from the experts and other folks we’ve been speaking to, I think there’s a big concern about audio.
WOLF: Terms like “deepfake” are probably new to a lot of people. What is the basic glossary people need to know to stay on top of things?
O’SULLIVAN: “Deepfake” normally refers to a fake video that has been created using artificial intelligence. It is basically a fake video that looks very realistic.
Over the past few years, artificial intelligence has made the creation of fake images, videos and audio way easier.
AI makes fake videos, images and audio differently than how it has been done traditionally, which was by slicing audio together or by Photoshop. This is the machines, the computers themselves, making the images and audio.
This is a fundamentally different type of technology, and it is a lot more realistic.
WOLF: What do we know about who might be behind this? Not this specific instance, because we don’t know, but what do we know about who is doing this type of thing?
O’SULLIVAN: Anybody and everybody. The challenge we’re going to have in 2024 is that really anybody can make these pretty convincing deepfakes.
A few years ago we did stories about the threat of this type of technology. But then it might only be the likes of a nation-state, Russia or China or somebody else who really has access to this technology.
I think we’ve all seen over the last 14 months or so with ChatGPT that now we all tend to have access to this crazy and powerful technology.
I will give an example from this time last year:
I tried out fake audio creation software, which is widely available online, and then called my parents back in Ireland with my fake voice, and my dad fell for it. My mom kind of knew something was up, but she wasn’t quite sure what was going on. My dad fell for it and had a full conversation with my AI voice.
We were able to make that just by taking a few minutes of a recording of my voice. So basically, you can take anybody’s voice, you can pick up all the clips online of candidates, and you can essentially get them to say anything.
WOLF: I don’t think anybody expects this robocall to change the outcome of an election where Biden isn’t even on the ballot. What is the ultimate threat?
O’SULLIVAN: We’re seeing underlying reporting already about how this type of technology is being used in scams, particularly to target people making it sound like a loved one is calling them and getting them to hand over money. Things like that.
I think if you just look back at political campaigns – 2012 and Mitt Romney’s 47% comment (when he argued that nearly half of Americans were dependent on the government), which was caught on audio. Donald Trump of course, with the “Access Hollywood” tape back in 2016.
All throughout modern American political campaign history, audio and tapes have played fundamentally important roles in campaigns. The concern is that tapes start emerging online that make it sound like Biden or Trump have said something that they didn’t really say, maybe something that’s quite incriminating.
Now, obviously, we have checks and balances in place for that. We consult AI experts and other digital forensic experts that can say, well, this doesn’t sound quite right here. There’s also some technology that’s been developed to try and detect these fakes.
We all know that now in this modern era of mis- and disinformation, even when something is fact checked, millions or tens of millions of people can hear a piece of fake audio before that happens, and it can still place doubts in people’s minds.
WOLF: So what is your advice to people? How should they approach something they see?
O’SULLIVAN: Especially since 2016, over the past decade, I think a lot of us, particularly people who are more and more clued into politics, we’re getting used to this era of mis- and disinformation and knowing that you can’t trust everything you read on the internet.
But now you can’t trust everything you physically hear or watch on the internet or elsewhere either.
That is easier said than done. It is just a general, absolute – being diligent, getting your information from reliable sources, etc. Here’s the thing: I do think that this type of technology allows for a profoundly different form of disinformation.
Because it’s one thing to read something, but if you hear a tape or you watch a video, I think that resonates in a very different way.
There is a lot of disinformation particularly on the political right, but anybody can fall for this stuff, and particularly if it’s playing into an existing narrative.
We all have people in our families who are one way politically inclined, and we like to think it’s the other side who is getting misinformed, but I don’t think that’s the case. And especially with this kind of technology.
WOLF: Have we seen the other side of things where someone created a deepfake to essentially lie about themselves in their own interest?
O’SULLIVAN: I haven’t really seen an example of that.
What I will say is the most mind-bending, dystopian element of all of this is that we all need to be aware that this technology is out there – so we’re living in this kind of new reality, or unreality, as people become more aware that this technology exists, that gives politicians and others the ability to deny something that might really have happened.
For instance, you can imagine that if the “Access Hollywood” tape arrived in 2024, Trump could say that’s a deepfake.
In addition to creating the possibility of fake scenarios, it also allows people the space to deny real scenarios.
There is a tape that is now being investigated by some federal authorities of (pro-Trump political operative) Roger Stone allegedly speaking about assassinating members of Congress.
He denies that he ever said that. So you can already see this kind of defense being utilized.
WOLF: The next-generation version of “my Twitter account was hacked” when it wasn’t.
WOLF: The frightening thing to me is that we only know what we know. We know this particular robocall exists, but we don’t really know what else is out there. We can’t say for sure where it came from or who is behind it.
You described a lag time of being able to verify something when people are exposed to it. And that’s an important thing. We assume it was to push down turnout for a write-in campaign for Biden, but we don’t know that to be true. It’s a mystery. And the more of these things that are out there, we won’t have any idea.
O’SULLIVAN: I think because this stuff is so easy to make, especially when it comes to audio, this is not just going to happen on a presidential level. This can happen all the way to dogcatcher.
If something happens involving Biden and Trump, there are going to be lots of eyeballs and people calling it out.
But I think it’s going to be at every level – state, county, township, city – and that’s going to be harder to catch.
If you want a visual example to point to, there was a lot of disinformation on the elections in Taiwan a few weeks back. There is a video of a US congressman that turned out to be a deepfake where he’s saying he’s soliciting votes for Taiwan’s presidential candidate.
WOLF: This has been a very depressing conversation. Is there an optimistic note you can leave us on?
O’SULLIVAN: You can take some bit of comfort from the fact that my mom was able to figure out that something was up with the deepfake.
I think people are also becoming more aware. Don’t trust everything you read on the internet. And I’d like to think as a society we’re getting a bit better at being a bit more skeptical.
The fact that we’re having these conversations now in January is better than all of us learning about this technology in September or October of this year.