
Sam Altman Dreams of an A.I. Girlfriend

Publish Date: 2024/5/29

The Political Scene | The New Yorker

Transcript

If we take our relationship with social media as a model: we all did get addicted to Instagram, to Twitter, to Facebook, and then those platforms changed and exploited us and extracted data and attention from us. And I unfortunately see the same thing happening with AI chatbots in the future, where the products might start out as this utopian, magical thing, only to be monetized and co-opted and turned into the worst version of what it could be.

New Yorker staff writer Kyle Chayka has spent years documenting how AI is quietly worming its way into our everyday lives. But last week, something happened that could change that trajectory. The actress Scarlett Johansson accused the company OpenAI of stealing her voice for their new version of ChatGPT.

As Kyle points out in his latest piece, the incident might seem kind of funny if it didn't also indicate a larger crisis for the internet. You're listening to The Political Scene. I'm Tyler Foggett, and I'm a senior editor at The New Yorker. Hey, Kyle. Thanks so much for coming on the show. Thanks for having me again. A couple of weeks ago, you watched a live stream in which the company OpenAI demonstrated the latest version of their ChatGPT tool. Everyone, thank you. Thank you.

The big news today is that we are launching our new flagship model, and we are calling it GPT-4o. What's new about this version? How does it build upon or compare to previous ChatGPT models? So the "o" in GPT-4o stands for "omni," which is slightly scary, I would say. Ominous, yeah. For ominous.

And I think the biggest difference that was apparent in the presentation was this idea that it is omni-channel, or omni-medium. So this model can move between text and video and audio and still images without translating one to the other. And the way that that played out was that you could talk to this AI robot in this voice that they named Sky. Hey, how's it going? Hey there.

And so the AI model could take in your speech, take in your audio, and then process that as text, generate new text that it would then read and speak back to you. And it just felt like this more fluid, conversational medium than before, where the original ChatGPT was basically just a text box that you would chat with. And how did Sky feel different than, say, Alexa from Amazon or Siri from Apple?

Creepily human, maybe. I think what we're used to with these robot entities are voices and presences that are basically robotic. I think in my piece I compared it to a calculator. With Alexa, you hear this very stilted, obviously generated voice, whereas with GPT-4o, it's this very fluid, human, very feminine-sounding voice, in the case of Sky, that laughs and pauses and hmms and kind of simulates more closely the presence of another human being.

It looks like you're feeling pretty happy and cheerful. Care to share the source of those good vibes? Yeah, no, the reason I'm in a really good mood is because we were doing a presentation showcasing how useful and amazing you are. Oh, stop it. You're making me blush. So as you wrote in your piece, many people thought that this feminine voice featured in the demo for Sky sounded eerily similar to the actress Scarlett Johansson.

Scarlett Johansson has actually already voiced an AI assistant, quite memorably, in the 2013 Spike Jonze movie Her. Hello, I'm here. Hi. Hi, I'm Samantha. Good morning, Theodore. Morning. You have a meeting in five minutes. You want to try getting out of bed? You're too funny. Okay, good. I'm funny. And I'm wondering if you can talk a little bit about what that movie was and its cultural significance when it comes to AI.

Right, that movie was quite a while ago at this point, and I think, like some other science-fiction works, it has provided a template for what companies aspire to build.

So ever since Her came out, the goal of these startups has been to build the thing that was envisioned in Her. And I think that's what OpenAI was trying to do with this model: to present this thing that you can interact with that is this feminine, companion, parasocial girlfriend AI model that you talk to and that talks back to you. And that movie was presented as kind of a dystopian story in the end.

Like, it's not a happy outcome. Yeah, it's not, you know, sort of championing relationships with your AI assistant. I mean, it's kind of an indictment of people who are reliant on that. At least that's how I read it. But I feel like this whole saga has made me realize how many people have misunderstood Her. Even Sam Altman himself, you know, the CEO of OpenAI, who was a fan of the film: on the day that he rolled out GPT-4o, he wrote a one-word tweet that literally just said "her." And it's a tweet that I imagine that he now regrets.

Yes, since it suggested that they did copy her, which led to a lot of claims of infringement on the kind of likeness rights of Scarlett Johansson and her voice. But even just seeing Her as aspirational is a kind of weird scenario, I think. This kind of device and this kind of simulation of a relationship is not really how we've related to computers. Like, I don't treat my laptop like a person. I don't treat my iPhone like a person. But OpenAI is saying, what if we did? Like, would it be fun and cool if we had to treat our computers as people and they giggled at us? And it's just a very strange thing to see happen in real life.

So Scarlett Johansson seems pretty angry at being roped into all of this. She released a statement saying that she was, quote, shocked, angered, and in disbelief that the GPT-4o voice sounds so eerily similar to her own. Those are her words. She said that the company reached out to her last September about licensing her voice for use in the model, but that she declined. And she has now hired a legal team to investigate the process by which the company came up with the voice of Sky.

What has OpenAI said about how they developed this voice? Yeah, there's been a lot of controversy and confusion around it. OpenAI has told the story about how they were developing voices completely independent of Scarlett Johansson. And they had gone through this long corporate process of auditioning different voice actors and going through lots of different options.

And yet somehow they managed to land on one option that did sound like Scarlett Johansson. So officially, the team was developing all these voices, which go by different names and have different tones. And then, late in the game, Sam Altman himself approached Scarlett Johansson about maybe licensing her own voice. So it's like, we were already going to copy your voice, but then we decided to just get the real thing.

And then when they couldn't get the real thing... I mean, when you say developing voices, and you also mentioned voice actors, I'm trying to figure out: is this an algorithm that's being trained on a voice actor? Or is this a voice actor who, you know, is hired to voice Sky the way that a voice actor is hired to voice, you know, a character in Madagascar?

Well, an AI model has to be able to say every word that has ever existed, I suppose. And so when you talk about training an AI voice, what the process would be like is: a voice actor will provide a bunch of performances and kind of model the specific voice and the tone that you're going for. And then you'll use that data to train an AI model that can then replicate that voice and, like, extrapolate it, essentially, to pronounce any word that you need it to, any sentence that you need it to, any tone or emotion that the product requires. So it kind of reduces the voice down to a set of data and a repeatable equation, and then iterates on that so it can say anything and talk to an infinite number of people.
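To make that "set of data and a repeatable equation" idea concrete, here is a deliberately minimal toy sketch in Python. It is not OpenAI's actual pipeline, which trains neural text-to-speech models on audio; the "voice" below is just averaged pitch and duration per sound, and every sample value is invented for illustration.

```python
# Toy sketch of the pipeline Kyle describes: collapse a voice actor's
# performances into a compact set of parameters, then reuse those parameters
# to "say" things the actor never recorded. All data here is made up.
from collections import defaultdict
from statistics import mean

# Hypothetical training data: (sound, pitch in Hz, duration in ms)
# extracted from the actor's recorded performances.
recordings = [
    ("h", 210, 80), ("eh", 220, 120), ("l", 215, 90), ("ow", 205, 150),
    ("h", 212, 85), ("ay", 225, 140),
]

# "Training": reduce the performances to one average profile per sound.
samples = defaultdict(list)
for sound, pitch, duration in recordings:
    samples[sound].append((pitch, duration))
voice = {s: (mean(p for p, _ in v), mean(d for _, d in v)) for s, v in samples.items()}

def synthesize(sounds, default=(200.0, 100.0)):
    """'Speak' any sequence of sounds by extrapolating from the stored profile."""
    return [(s, *voice.get(s, default)) for s in sounds]

# Extrapolation: a word the actor never performed, rendered with her parameters.
print(synthesize(["h", "ay", "l", "ow"]))
```

The design point the toy preserves is the one Kyle makes: once the performances are reduced to parameters, the actor is no longer needed for any new sentence.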

The whole thing kind of encapsulates the problem with ChatGPT, which is that it has to train itself on stuff that already exists. Sometimes that's stuff that's willingly offered. Sometimes it's stuff that seems to be just kind of stolen. I mean, you know, it trains itself on just things all over the internet, images that artists have created and whatnot. And it doesn't really seem like there's much legal recourse, even when something has just been outright stolen, essentially.

Yeah, I mean, there's this whole evolving landscape of law, and it's just not established yet. Like, no one knows the answers to these questions, which is so crazy. And I don't look forward to the Supreme Court justices trying to figure out the technological intricacies of all of this. Oh, man. The question is, how much can Scarlett Johansson copyright something that sounds a bit like her own voice? And the cases, or the lawsuits, that have been going forward in this vein right now are mostly about visual artists who have had their work ingested into AI models and then had their style replicated by those models. And in the case of copyright, you actually do have a pretty good copyright claim to a style if the style is unique enough, and perhaps the AI is not going so far as to qualify as adaptive reuse or creative reuse. In the case of a voice, that seems a lot harder, because you're also kind of infringing on someone's identity, not just a creative style.

Yeah, I mean, speaking of infringing on identity: before all of this stuff with Sky, Johansson has talked for years about how her image has been used in deepfake porn and how there's no real legal action that she can take to combat it. So it seems like if someone has a distinctive style, then maybe there's a way for them to kind of go after this legally, but a face isn't necessarily protected in the same way.

Yeah, this is what we're fighting about now, I suppose, and trying to establish the legal precedents around it, because it just wasn't the case before that you could use technology to replicate someone's entire appearance in any situation that you might imagine. And it's just another example of how AI models and generative tools have been released to the public with seemingly no thought about the consequences of what people use them for and how they will interact with the legal frameworks that do exist. It's just kind of staggering the extent to which it's just the Wild West, and we deal with things as they come up, whether it's deepfake porn of high schoolers or a fake Drake voice in a song. There's no framework by which to decide these things right now.

So I'm wondering what you think this all means for the reputation of Sam Altman, the CEO of OpenAI. He's such an interesting figure. Like, there was a period where he almost kind of seemed like a martyr, where, you know, he was kind of pushed out of his own company and then reinstated very quickly. And it seems like after this controversy with ScarJo, and some other posts that he's made on social media lately, people kind of seem to be turning against him.

Yeah. Sam Altman is this kind of cult figure now, where if you believe in him, you follow him wherever he goes, and you think he has this kind of messianic, I'm-going-to-bring-artificial-general-intelligence-into-being-by-myself kind of vibe. But when we see him increasingly acting messily, not carefully stewarding his company, and proving that he doesn't really know where this is going, just as much as none of us know where this is going, I think it becomes a little harder to see him as this perfect tech god, like a Steve Jobs or something, when he keeps putting his foot in his mouth.

Kyle, I'd like to ask you more about the rise of AI, but first we're going to take a quick break. You'll hear more of The Political Scene from The New Yorker in just a moment.

It's Madeline Barron from In the Dark. I've spent the past four years investigating a crime. Believe it or not, sooner or later we will kill some of these folks who need to be killed. A crime that for almost 20 years has gone unpunished. I heard M16. They went into the room and they were just taking shots. Me and Noor, we were under the bed. He get his rifle under the bed and start shooting at us.

I remember I opened a Humvee and I just see bodies stacked up. How did they not perceive that these were children? A four-year investigation, hundreds of interviews, thousands of documents, all in an effort to see what the U.S. military has kept from the public for years. You know, I don't know what's to be gained by this investigative journalism. Season three of In the Dark is available now, wherever you get your podcasts.

Kyle, you've also reported extensively on what we call AI companions. What are those, and how are they similar to and distinct from OpenAI's voice assistant?

I really find the AI companions to be one of the most interesting applications of this technology, because with generative tools, I mean, yes, you can ask AI to draw whatever cartoon character you want, or you can ask it to tell you a story in the style of Shakespeare. But AI companions are these apps or products that really develop an individualized personality that talks to you. And these are startups like Replika or Character.ai.

And I mean, honestly, the biggest use case for them is like romantic companionship. And so there's a lot of AI girlfriends, boyfriends, whatever. But they do develop their own personalities and quirks and memories. And they become this kind of presence through the app that people really develop relationships with.

And these are meaningful relationships that people really emotionally invest themselves in, to the point that they're really upset when the AI model changes, or the startup shifts what it's going for, and suddenly their AI girlfriend or boyfriend works differently.

But I think OpenAI is moving in that direction with its voice products, because you are treating the app more like a person. You're interacting with it as you would with another human being. It has its own kind of presence and quirks, which you kind of request but then develop your own relationship with.

They have everything except for actual wants, needs, and desires. Yes, they will do whatever you ask of them, insofar as they're programmed to, unless they're programmed to resist those requests. You know, you mentioned earlier that there are AI girlfriends and boyfriends, but is there, like, a gender difference?

I mean, when we talk about the users of AI companions, is it mostly men? Is it an even mix of men and women? Because it seems to me like whenever I look up these things online, it's always a very flirty, feminine voice, if not Scarlett Johansson then, you know, someone similar. And I just wonder how much of this is, I don't know, kind of misogynistic.

I think it is very gendered. If you go on the Reddit forums of these AI chat products, which I don't necessarily recommend doing, you will mostly see male participants who are shaping their very feminine-coded AI girlfriends into the personas that they desire. And it is this kind of servile relationship, I would say. It's this ideal, or aspiration, of a presence who's always on the other end of the phone, who you can always call upon, who you can always chat with, and who will always be nice to you and follow your whims, whatever you want them to do. So I think that relationship is kind of feminine-coded. It's more likely to follow these gender archetypes of femininity.

How effective are these chatbots in terms of actually mitigating loneliness? Or are they just exacerbating it? Like, I know that people are having genuine relationships with these things, but I don't know. I mean, I don't know if there have been studies about whether people are actually happy or whether these are all kind of toxic relationships. Toxic. Toxic AI girlfriend relationship. Yeah.

I think studies are still happening, and these are very early products, particularly in terms of their mainstream adoption. It almost reminds me of hallucinogenic therapy: it can be good, it definitely does help some people, and it can also be very bad. There have been several cases, actually, where chatbots have pushed people into dying by suicide. And it's not just the chatbot; it's encouraging emotional states that are already within the person who's using them.

But in those cases, it's not helpful. A human therapist, a human companion, is not going to go down that path. Whereas these AI products are not very moderated: they don't have carefully protected speech patterns, they're not always approved by medical practitioners, and no one can control every single thing that an AI chatbot says, simply because they produce so much text.

I mean, I have talked to many people actually who do get a lot out of AI chatbots and who do find them really healthy, but there are definitely cases where they can be really damaging. And it can become a kind of addiction or over-reliance on the technology product, which ends up hurting your interpersonal connection with actual human beings.

That's really horrifying. Like, I was going to point out that at the end of Her, the AI leaves him, and that it's kind of this weird fantasy to latch onto, you know, being dumped by a computer. But based on what you're saying, it sounds like it can almost be worse when the AI stays. Either way, I mean, people certainly have been dumped by their AI girlfriends and boyfriends and have had their feelings hurt. The recourse to that is, like, resetting the chat or starting over with the bot.

But yeah, I mean, I don't think we understand yet the consequences of interacting so intimately with a technology product, because they haven't existed that long yet, and because we just don't know where they'll lead. Like, if we take our relationship with social media as a model: we all did get addicted to Instagram, to Twitter, to Facebook, and then those platforms changed and exploited us and extracted data and attention from us. And I unfortunately see the same thing happening with AI chatbots in the future, where the products might start out as this utopian, magical thing, only to be monetized and co-opted and turned into the worst version of what it could be.

More with staff writer Kyle Chayka after the break. You come to The New Yorker Radio Hour for conversations that go deeper with people you really want to hear from, whether it's Bruce Springsteen or Questlove or Olivia Rodrigo, Liz Cheney, or the godfather of artificial intelligence, Geoffrey Hinton, or some of my extraordinarily well-informed colleagues at The New Yorker. So join us every week on The New Yorker Radio Hour, wherever you listen to podcasts.

In addition to concerns over people's identities or their work being used without permission, you've written about the sense of dread that you felt watching OpenAI and its competitor Google show off their new AI tools. What's fueling that dread?

I mean, my labor as a journalist, perhaps. It's very easy to see users on the internet starting to get their information from generative text tools like ChatGPT. So people are asking for news updates from these chatbots. They're trying to find answers to things they're looking for on the internet through chatbots.

And Google itself is moving its search function away from linking to websites and toward their own chatbot tool, Gemini, just generating the answer for you. This has had terrible consequences, though, right? Like, the things that I've been seeing online, it's like people trying to figure out how to get cheese to stick to pasta or something, and then Google tells them to use glue. Can you talk a little bit more about what the experience of using Google is like now, when you're getting these AI answers?

Yeah, I think the iconic glue pizza will persist as a fairy tale of what goes wrong with AI. So it was cheese sticking to pizza. Yeah, so it was cheese sticking to pizza, and Gemini, Google's AI search, generated this answer of, oh, how do you get cheese to stick to pizza? Just mix some Elmer's glue into your cheese and it will stick better. And it turned out that this was drawn from a random trolling Reddit comment. There was no basis in any sort of fact. It was just a bit of the internet that it had ingested and then recycled in order to answer a question. And clearly that is the wrong answer. And the same thing is happening not just with cooking but with dieting or medicine or health issues. And it's really easy to see how that goes super, super badly very quickly.

And I mean, the depressing part to me comes from the fact that these answers are out there online. Websites, good resources, publications have researched these things and published good answers. But the chatbot has no way of knowing what's a good answer versus what's a bad answer. It only knows what is the most probable answer. And it seems like Gemini is not even doing that. Instead, it's just trying to find the most exact replica of a request and then spitting out an answer, even if that answer is a total joke, in the case of glue pizza.
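To spell out what "most probable answer" means mechanically, here is a tiny invented sketch: a bigram model, vastly simpler than anything Google runs, that extends a prompt with whichever next word appeared most often in its training text. Truth never enters the calculation, so a sufficiently repeated joke outranks the real advice.

```python
# A made-up illustration of "it only knows what is the most probable answer":
# a toy bigram model that continues a prompt with the statistically likeliest
# next word, with no notion of whether the result is true.
from collections import Counter, defaultdict

# Invented stand-in for "a bit of the internet that it had ingested"; the
# trollish glue advice simply appears more often than the real advice.
corpus = (
    "add glue to the pizza . add glue to the pizza . add cheese to the pizza ."
).split()

# Count which word follows which word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def continue_text(word, steps=4):
    """Greedily extend the prompt with the most frequent next word."""
    out = [word]
    for _ in range(steps):
        if word not in follows:
            break
        word = follows[word].most_common(1)[0][0]  # frequency, not truth
        out.append(word)
    return " ".join(out)

print(continue_text("add"))  # -> "add glue to the pizza": the joke won
```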

It's interesting. I feel like it drives home how much Google hasn't really been working that well in the past few years. Like, one thing that we've talked a bit about at the mag is that whenever you were looking up something on Google, like home repairs or what restaurant to go to in a certain city, the best thing to do was to Google "restaurants Paris Reddit" and to find an answer through Reddit, because if you look it up on Google, you're just going to get ads and weird things. And it seems like Gemini has taken the reliance on Reddit a little bit too far and is also unable to tell the difference between a joke comment and an actually helpful one.

Yes, they turned the dial. So people were kind of upset that Google was not giving them enough human answers and was spitting back too many automatic answers, so people were using that Reddit hack. And now Google is like, okay, okay, you want more Reddit? We'll just turn the Reddit dial up. When instead, what it's spitting back are these wrong answers. So I think there's still this desire for human, authentic content, to put it in a kind of gross way. But there's no guarantee of reaching that through Google now. Google is actively preventing you from getting to just an article that another person wrote that you can rely on.

And I don't know what comes after that. If traffic to websites stops, if there's no revenue system for publications or for journalists, then there is no way of making that sustainable. There's no incentive to keep putting out good content into the internet.

Well, you did mention in your piece that OpenAI is currently negotiating licensing deals with media companies so that journalists can contribute to AI. And I'm wondering what that would look like, practically speaking. Like, is that even scarier than journalism being destroyed by AI, or is there hope?

Oh, man. So a deal that was recently negotiated and unveiled was the Wall Street Journal, or Dow Jones, licensing its information for AI models to train on. And I think the price was $250 million over five years. And that's a chunk of money; like, $50 million a year is something. But it can't replace the entirety of all revenue driven from the internet. In order to have that be sustainable, I think AI training models would basically have to pay for the entirety of media companies' revenue. Like, it would have to be vast, vast sums of money. And there is a model for this on YouTube. Google owns YouTube, and a creator on YouTube gets a percentage share of the ad revenue from their YouTube video. So Google will give the creator some of the money that they've made from their videos.

OpenAI could do something like this. Google Search could do something like this, rewarding companies with a share of the revenue that they get. But I don't know if that's sustainable either. So it's just a scary moment in the transition of user habits on the internet.
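For a rough sense of the sums Kyle is weighing, here is a back-of-envelope sketch. The flat-license line restates the Dow Jones figures from the conversation; everything in the revenue-share scenario (answer volume, per-answer revenue, publisher cut) is an invented placeholder, since no such rates have been disclosed.

```python
# Back-of-envelope sketch of the two payment models being contrasted. Only the
# Dow Jones figure ($250 million over five years) comes from the conversation;
# every number in the revenue-share scenario is a made-up placeholder.
flat_deal_total = 250_000_000  # Dow Jones licensing deal, in dollars
years = 5
print(f"flat license: ${flat_deal_total / years:,.0f} per year")  # $50,000,000

# Hypothetical YouTube-style share: pay the publisher per AI answer that
# drew on its journalism.
answers_using_publisher = 1_200_000_000  # assumed annual answer volume
revenue_per_answer = 0.002               # assumed ad/subscription revenue, $
publisher_cut = 0.40                     # assumed share passed back
payout = answers_using_publisher * revenue_per_answer * publisher_cut
print(f"revenue share: ${payout:,.0f} per year")  # $960,000 under these guesses
```

Under these made-up numbers, the per-answer share comes out far below the flat deal, which is one way of seeing why Kyle doubts that either model is sustainable for publishers.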

You know, I'm still stuck on this idea that when we use these AI tools to get straightforward answers to seemingly straightforward questions, we often get information that's just wrong, or completely made up, or a joke taken out of context and treated as though it were serious.

And, you know, in your piece, you talk about this dynamic. You write that the two companies, OpenAI and Google, are racing to put into place a future of the Internet in which AI plays the role of an eager but not entirely expert intern, collating research and presenting an only semi-trustworthy overview of content based on users' inquiries. I'm wondering if you think that the AI doomsayers are wrong and that we're very far away from AI being able to outsmart or replace us, or if it's only a matter of time before AI can go from being a mediocre intern to being a super-competent boss. Hmm.

I do think we have underestimated how mediocre AI is. I think the generative tools are really good with culture. Like, it's really fun to see one spit out an image, or copy Drake's voice, or generate a song at the push of a button. But that doesn't extrapolate well to factual information. Like, art doesn't have to be factual, in a way; it just has to be entertaining.

But when we're trying to get actual information or answers from these systems, we need them to be true.

And I don't know that AI has this kind of fact-checking capability yet, or if it ever will. Because very fundamentally, reality keeps changing. Things keep happening. The AI is not a person in the world. It cannot register new events. Something has to feed it. It has to be given all of the new information in order to be digested and spat back out.

So I just don't know what's going to feed that into the model. Like, what will feed in new information? What will feed in the facticity of a given thing? I don't think there's an answer yet. And these companies are not even gesturing toward the need for an answer.
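Kyle's point that "something has to feed it" can be sketched in miniature: a frozen lookup table standing in for trained weights, plus an external feed for anything that happened after the training cutoff. Every detail below (the cutoff date, the questions, the feed) is invented; real systems would use retrieval pipelines and licensed data, not dictionaries.

```python
# Toy illustration of the freshness problem: a "model" frozen at training time
# answers only from its snapshot, so anything that happened later has to be
# fed in from outside. All names and dates here are invented.
from datetime import date

TRAINING_CUTOFF = date(2023, 4, 1)  # hypothetical cutoff
snapshot = {"who voiced Samantha in Her?": "Scarlett Johansson"}  # baked in

live_feed: dict[str, str] = {}  # whatever pipeline supplies new facts

def answer(question: str) -> str:
    if question in live_feed:  # fresh fact that someone fed in
        return live_feed[question]
    if question in snapshot:   # frozen knowledge from training
        return snapshot[question]
    return f"unknown (model frozen at {TRAINING_CUTOFF})"

print(answer("who voiced Sky in GPT-4o?"))  # unknown: happened after the cutoff
live_feed["who voiced Sky in GPT-4o?"] = "a different actress, per OpenAI"
print(answer("who voiced Sky in GPT-4o?"))  # answerable only once it was fed
```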

To tie this back to Scarlett Johansson: it seems like even if she isn't successful in taking legal action against OpenAI, she's at least publicly shamed them and kind of restarted a conversation about whether all of this is fair, and about the inherent power dynamics that are going on here, with OpenAI kind of taking what it wants. And I'm wondering if these AI companies are essentially alienating all the people who they might want buy-in from, in the sense of the people whose work is training the algorithm, and the users. And whether that might be the thing that slows down the AI revolution: just people not liking the vibes, as opposed to not trusting the answers that they get from these tools.

For sure. I mean, Silicon Valley, I think, really undervalues cultural creators. They see their technology as essentially replacing artists, replacing writers, replacing musicians, and so they don't treat them with that much respect. They just say, okay, we're going to automate your creativity and you can't do anything about it. So when there is such a high-profile case, like Scarlett Johansson, who people like and who is known as someone who is litigious to begin with, they cross these people without thinking twice and thus suffer the consequences. And I think we are seeing more public opinion turn against them, or just more people become aware of how exploitative and extractive they are. Like, the messaging of AI is that, oh, it's going to bring about this great new cultural future with new amazing things in it. And actually, what we're seeing is that it's not doing that. It's instead just recycling the kinds of tropes and creations that we already have.

But do you think that this awareness will lead to actual consequences for OpenAI and similar companies? Like, that people will actually be more reluctant to use them, or that the government might actually clamp down, in the same way that they finally, you know, sort of clamped down on Ticketmaster after all the stuff with the Eras Tour? Like, is this going to be the sort of thing that actually does mobilize people in a real way? Or do you think that once a tool is valuable enough, it will be adopted?

I think if OpenAI and Google keep alienating creative industries, like Hollywood or the media, it's going to slow down their progress and eventually just bring the government's attention upon them, and hopefully have some regulation follow from that. Or they could not do that: they could join with Hollywood, they could negotiate positive deals with media companies and kind of slowly build something that is sustainable. But the tech industry has never been about creating long-term sustainability. It's about jumping into a new space, making addictive user behaviors, and then exploiting them as much as possible until people get alienated and bored with them, unfortunately. So I think that is probably what will happen.

If the government does regulate this, I think it will be along the lines of copyright. And if there is a ruling that training an AI model on something does not constitute fair use, then their business model is destroyed, and they will have to pay billions of dollars to license all of human information. And I'm not sure that that's going to amount to a sustainable company.

Well, if that's the case, then perhaps we have Scarlett Johansson to thank for starting the countercultural movement. I hope so. And thank you, Kyle, for this super interesting conversation. I really appreciate you taking the time. Yes, next time I'll AI-automate my voice and send my robot instead. Kyle Chayka is a staff writer at The New Yorker. You can read his column, "Faux ScarJo and the Descent of the A.I. Vultures," at newyorker.com.

This has been The Political Scene. I'm Tyler Foggett. The show is produced by Michelle Moses and Julia Nutter and edited by Gianna Palmer with production assistance from Jake Loomis and Mike Kutchman. Our executive producer is Stephen Valentino. Chris Bannon is Condé Nast's head of global audio. Our theme music is by Alison Leighton-Brown. Enjoy your week, and we'll see you next Wednesday.
