
Dario Amodei, C.E.O. of Anthropic, on the Paradoxes of A.I. Safety and Netflix’s ‘Deep Fake Love’

Publish Date: 2023/7/21

Hard Fork


Transcript

This podcast is supported by KPMG. Your task as a visionary leader is simple. Harness the power of AI. Shape the future of business. Oh, and do it before anyone else does without leaving people behind or running into unforeseen risks. Simple, right? KPMG's got you. Helping you lead a people-powered transformation that accelerates AI's value with confidence. How's that for a vision? Learn more at www.kpmg.us.ai.

Casey, I just want to note that you have procured what I think is probably the most hardcore fidget spinner ever to exist. It looks like something, like a prop from a death metal show. It's got flames coming out of it. This is not a normal fidget spinner. I would describe it as looking like a skeletal bird where it has these sort of very sharp bones,

beak-like outcroppings. And it does look like, if I hurled it with enough force, I could break your skin. The message that it's sending, at least to me here in the studio, is not only I am extremely anxious, but also I will kill you. That's the sort of energy I try to bring, you know, is a sort of very hostile, murderous energy. And that's why we're killing it. ♪

I'm Kevin Roose, a tech columnist at The New York Times. I'm Casey Newton from Platformer. And you're listening to Hard Fork. This week on the show, Anthropic CEO Dario Amodei stops by to tell us what it's like to run the most anxious company in AI. And then we review Netflix's new deepfake reality dating show. And God help us all.

Casey, welcome back. We are back from vacation. Well, it's nice to see you again, Kevin. How have you been? I'm good. How was your break? Was it relaxing? Do you feel refreshed and invigorated? No, I moved my house and immediately inherited like 47 different problems related to smart home automation.

Well, if it makes you feel any better, my vacation was also interrupted because not only did I have our special episode the first week with Adam Mosseri from Instagram about Threads, my second week of vacation was interrupted by a story that I had to rush out to publication, which was about a topic that we're going to be talking about on the show today, which is the AI company Anthropic.

Now, Casey, as you know, I have been spending a lot of time at Anthropic. I would go so far as to say the amount of time you've spent at Anthropic is starting to hurt our relationship. And I hope you're done visiting this company because I'm frankly tired of hearing about your visits there. Well, I don't know if I'll be invited back after this story, but I did spend a lot of time there.

And there are two reasons I decided to spend so much time at Anthropic. One is they invited me, right? Very rare that a tech company and especially like a company building large AI models says to a reporter, like, come on.

come in, interview everyone, sit in on meetings. They gave me a lot of really interesting and deep access, which is very unusual in our line of work. Yeah. Usually these things are very, very manicured: it's like, you're going to have one coffee with this person and one coffee with that person, and then any other questions, you can email us, right? But you actually got to hang out with these people. Yeah, which is a pretty rare opportunity, especially for a company like Anthropic that

just hasn't done a lot of media. They don't really do the kind of big press tours that someone like Sam Altman at OpenAI has done. That was one of the reasons I said yes, because they invited me. But I also think it's just a very unusual company, right? Basically, Anthropic was started by a group of OpenAI employees, including some of their senior technical people who left the company in late 2020 and early 2021.

The story of what happened there isn't really all that clear still, but basically there were disputes about safety. Yeah, they had some creative differences. Right, and so they consciously uncoupled from OpenAI and started a new company called Anthropic. And Anthropic, I would say, is sort of one of the top handful of

AI labs in America, really. I mean, their models are considered sort of in the tier that things like ChatGPT and Bard and other leading models are, but they're much less high profile. And their culture is just much different from any other AI lab I've ever encountered. So the first thing that I noticed there was that they were just all extremely anxious, like very, very worried. Not that like their models, you know, are going to break in some way, but just

sort of existentially worried that the thing that they're doing, which is building these large AI models, could in some way contribute to the extinction of humanity. Right. And of course, we've heard that from other folks in the AI field. Sam Altman has said that on this show. But it sounds like what you're saying is that even at the rank and file levels, this stuff was just kind of coming up a lot and not just when a reporter was saying, hey, is this stuff safe? Yeah. I mean, I sort of

worried when I got invited there that they were just going to show me this very rosy picture of like, here's all the things that our model Claude is doing and here are all the ways that could help tutor students or help scientists cure disease or something like that. The same spiel that you get from other AI companies.

And instead I had sort of the opposite problem where I like showed up wanting to learn more about what they're working on and all they wanted to talk about was doom and like how AI could end the world. And in my article, I said it was like being a food writer who shows up to like write about a trendy new restaurant and like all the kitchen staff wants to talk about is food poisoning. It was a very bizarre reporting trip.

in the sense that, you know, what I expected and what I got just ended up being very different. So two questions. One, why do you think Anthropic wanted you to come in and tell this story? Are they starting to feel a little bit left out of the conversation? Or what was the message you think they really wanted to get out? Yeah, so they had this big release that happened last week. They put out

the second version of Claude, their AI language model, and they sort of built a ChatGPT-style kind of web interface for the first time. You know, Claude had been available through apps like Slack or Poe, which is Quora's sort of chatbot aggregator. So you could use Claude, but not sort of in the way that people are using ChatGPT. So

I think they are starting to feel kind of like left out of the conversation, but I also think that they're trying to sort of blow the whistle on a lot of what's going on in the AI world right now. This sort of accelerating race toward more and more powerful AI is something that they have been worrying about for a long time. And so I think getting themselves out there now

to them feels like a good opportunity to sort of spread the gospel of AI safety. - My second question for you, having gone through this reporting process and written this story, do you feel like you came around more to their way of thinking where you feel more doomerish or did you leave thinking, "Hmm, these people are a little tilted"?

I certainly started off the process, like thinking, whoa, this is a lot of doom and gloom. And it's very jarring to like walk around what looks like in many ways, like a normal tech company office. You know, they've got this like airy office in San Francisco with all these plants and it's like, there's whiteboards everywhere. One detail that didn't make it into my story, but that I thought was notable was, you know, Liquid Death, the brand of sparkling water. So someone

at Anthropic has built an enormous tower of empty Liquid Death cans that just sits in the middle of the office, and you kind of have to step around it to get anywhere. And they did not take that down before I came for my visit, which I thought was honestly kind of admirable. And by the way, if you live somewhere where Liquid Death is not available, it's kind of like a joke or meme brand. And the whole idea is you're just drinking water out of a can, but it has sort of

heavy metal graphics on it. So if you just want to have something at a party and look like you're maybe drinking an alcoholic beverage, but not really, Liquid Death is there. Right. So Anthropic with its Liquid Death tower and its whiteboards and its plants, it looks very much like a typical sort of startup office that you or I would go visit. But the sort of anxiety there, the level of just doomsaying and just caution, it really did sort of

make me more anxious at first because it was like, oh my God, if these people, the ones who are actually building this stuff, are this anxious, like how worried should the rest of us be? And I think by the time I spent, you know, hours and hours there, talked with a bunch of their people, I sort of got more comfortable with the anxiety. And by the end of the trip, as I wrote in the article, like I actually kind of started to find it reassuring. Like

I kind of want the people building these powerful AI models to be anxious about it. Like, I think a little bit of that is healthy because this stuff is scary and it does have the potential to cause harm. And I actually do feel like they are taking the potential problems of AI very seriously. I think some would argue too seriously, but I sort of wonder like,

if the people who built social media companies a decade ago had been this worried about the effects that they could have on society, maybe the past decade would have gone a little better. Almost certainly it would have, right? There were so many questions that nobody thought to ask until it was too late, or to the extent people were asking them, it was not within the corridors of power at Facebook and Twitter and YouTube. Right. So there's a lot more in this article, and we'll link to the article in the show notes, along with some more information about

Claude, which is Anthropic's chatbot, and people can check that out if they're interested. But today, I actually wanted to have Dario Amodei, the CEO of Anthropic, come in and talk to us because he's a pretty unusual figure in the AI industry. He has worked at almost all of the major AI companies. He was a researcher at Baidu and Google and then was part of the team at OpenAI that built GPT-2 and GPT-3 and then was sort of part of this exodus by the founders of Anthropic.

And he really has seen the industry from a lot of different angles. And he has been worried about AI safety since before that was cool, since before anyone basically was worried about AI safety. And so I thought it would be a good opportunity to sort of ask him not only why is Anthropic building AI the ways that they are, but also why are they building it while also warning of the potential for catastrophic harm? All right, well, let's bring him in here.

Dario Amodei, welcome to Hard Fork. Thanks for having me. So before we get into all of the work that you're doing at Anthropic, I want to talk first about what convinced you to get interested in AI safety in the first place. Can you tell me about your background? I know you were a physicist and then got interested in AI and started working on AI safety, but

What was the first time that you can recall sort of becoming concerned about the potential destructiveness of AI? So what were you doing at the time and what specifically gave you pause? Yeah, so I think my interest in AI and my interest in kind of concerns about AI probably go back, you know, the same amount. The first moment I can remember is I read Ray Kurzweil's book, The Singularity Is Near. Hmm.

I think it was in, you know, 2005 when it came out. I was a Stanford undergrad. At that time, it was sort of very disreputable. You know, I like to joke that kind of people read it in secret, you know, in their dorm room because, you know, there were just a lot of kind of crazy sounding things in it. And in those days, the idea that things would change so fast was

really very radical. And, you know, I think a lot of the stuff has kind of played out not too far from what I would expect. And, you know, I just thought about, wow, we're going to build these systems that are more powerful than humans. You know, that might happen in only 20 or 30 years. And, you know, that's going to raise all these issues. I'm not sure I thought

specifically about like the AI autonomously taking over. But I thought, well, there's that. And there's also, you know, this is going to be a great source of power in the world. Nations are going to fight over it. People are going to misuse it. It might, of course, also misuse itself. But just this idea that this very powerful thing was coming into the world was both exciting and concerning to me.

Yeah, and when I was researching you in Anthropic for my story, I found some footage of you at Google years ago, sort of giving talks about the potential dangers of AI.

At the time, that was not sort of a mainstream position, right? It was kind of weird to be freaked out about AI because AI at the time was not very good at stuff yet. So was it sort of a fringe, lonely thing to be worried about AI at Google back in those days? Or did you feel like you had a growing chorus of people who were also getting worried about it? It was a little fringe, but we very much tried to take the position of kind of trying to describe the concerns in a way that would connect

as much as possible to today's systems. I mean, my perspective on it, and I think this has run through all my work at Anthropic and OpenAI, has been: look, there are these things that might happen in the future. Let's try and connect them as much as we can to what's happening now. Let's try and get

practice and deal with things today that are as analogous as possible to what we will eventually face in the future. So, you know, when I was at Google, I and some co-authors, some of whom became Anthropic founders eventually, after also being at OpenAI with me, wrote this paper called Concrete Problems in AI Safety. And, you know, as the title indicates, we tried to tie it to the capabilities of the neural nets of that

day of 2016 when we wrote it. And a lot of the focus was on, hey, these systems are inherently unpredictable. Controlling them is different than controlling, you know, expert systems or whatever used to exist 20 years before then. And, you know, I think that that kind of made sense at the time. It was still an unusual thing to be working on, but we tried to describe it in a way that would make sense to people at the time. And I think to some extent we succeeded.

And then from there, you went to OpenAI and became one of their first sort of machine learning researchers. You helped lead the teams that built GPT-2 and GPT-3 and eventually did some of their...

safety research that became very well known and important within the industry. But then something happened in 2020 that made you decide to leave. So what was it that pushed you over the edge and convinced you that the work that you were most interested in could best be done outside of OpenAI? Yeah, so I think, you know, what I would say is I think there was a set of

folks within OpenAI that me and my sister, Daniela, were leading that had what I might describe as the whole picture. We were real believers in this idea of scaling things up. That's where GPT-2 and GPT-3 came from. But as we just talked about with safety, we were also believers in the idea that scaling it up wasn't enough, not only just to solve the problems of AI even, but that we had all these concerns about it.

We had Jack Clark, who had been head of policy at OpenAI. We had Chris Olah, who worked on the mechanistic interpretability side of safety. We thought that altogether we had this unified view of how to do this stuff right. We all trusted each other. We all felt that we worked together well and that we had the same values and that it just made more sense to do this on our own.

In many ways, I felt we could influence the behavior of other organizations and other leaders better by being an independent organization than we could inside the organization that we were at previously. I would love to hear about some of the really concrete ways that you wanted to build a different kind of system. A different kind of system in terms of the- Well, one that reflected the values that you felt like were missing where you used to work. Yeah. In terms of Anthropic itself,

One thing I wanted was for us always to look at the safety component of this first and foremost, right? That, you know, to the extent that we need to scale up these systems, to the extent that we need to engage in commercial activity, there's always a clear reason for why we're doing it. And it always clearly ties back to the mission. Very early on, we focused on mechanistic interpretability, right? That was one of the first teams that we built at Anthropic, despite the fact that it has, at least so far

(it might at some point in the future), no particular, you know, immediate commercial value. A lot of the other safety work, you know, it has commercial value and it may be good for safety in the short and long run. But mechanistic interpretability, it's really quite orthogonal to the other areas. Also has one of the wordiest mouthfuls of a name.

Yeah, we should say. So when you say interpretability, you mean being able to say in really specific ways why the model is producing the results that it is producing? Yes, yes. I should back up here. So we have this neural net. It does wonderful things, right? You've all interacted with our models or other models. You ask it to, you know, solve an interview coding question. It does it right.

Why does the model do that? If you look inside, it's just a bunch of numbers. It's just a bunch of matrix multiplications. Presumably, that's reasoning and implementing an algorithm somehow. And so mechanistic interpretability, which probably does have too long a name, is the science of figuring out what's going on inside the models. And I think we'll never understand, I think, in perfect detail any of this, but the hope is that if we're worried about an AI system

behaving in ways very different than we would expect or with motivations very different than we would expect, could we think of it a little like an x-ray or a brain scan? You're not going to figure out everything, but you might be able to say things like,

huh, this model, you know, the part of its brain that's responsible for deception is larger than would be needed for this particular task. And so can we get that kind of gross level of understanding? Can it help us? Yeah, I have to say, like, I mean, to me, there is an obvious commercial application. I mean, really, it's more of a policy regulatory application. But one of the things I've written...

the most about over the years is social networks. And social networks use these ranking systems to determine what to show you. And for the most part, they're not super interpretable. If you ask an engineer, hey, why did you show me this thing, not that thing? They're not going to give you a great answer to that problem. And that's created all sorts of problems for the social networks, right? Every time they go in front of Congress, there's somebody asking them, hey, why isn't my account showing up? Or why is this account showing up more? And so I've actually been surprised that the tech industry hasn't leaned more into trying to understand how their own systems work.

Yeah, I mean, I think it's one reason it's a hard problem. And I actually agree with you. If we really get interpretability right, I think in addition to thinking about the long-term safety problems, I think it would help us to address some nearer-term concerns, right? I could see positive commercial applications. But the truth of it is we're in early days, right? Anthropic has been working on this for two and a half years. My...

I think optimistic prediction is that it's going to be another at least one or two years before we get to the point where we can really take action. And why is that? Is it just because the way that these neural networks are built, like it doesn't offer a lot of clues or like what is so hard about understanding the way that AI language models work? Yeah, a couple things on that. So one, I would say with AI,

AI scaling, there's some kind of natural law that suggests you put more data into it, it just works, right? I mean, we still don't fully understand the reason for it, but, you know, there's a sense in which the models just kind of want to learn, right? It's like you don't have to do much. You put more data, you make the model bigger, it just somehow works. But

The model is not organizing itself in any way that's designed for humans to be able to look at it any more than, you know, cells in our body or the human brain are designed for humans to look at them with microscopes and be able to read them. They were evolved to work, not necessarily to be understandable. So we have no real guarantee of success, right? We're kind of archaeologists looking at this alien civilization, you know, and like they didn't design their civilization to be understood by us.

If we're clever, we might be able to succeed at it, but it's a difficult task. And so what we've managed to do so far are figure out some general principles, like these are circuit patterns that often show up. This is how the model might be reasoning about some basic things. But of course, all of that builds up into a hierarchy. And I think

We need to get to the point where we've built enough of those abstractions and applied them that we can draw some real broad conclusions. And I think we're well along that path, but we still have a while to go. And, you know, of course, given the speed at which the field is moving, we're always trying to do that faster. But it's a hard task. I think of it as more like basic research, right, that's in, you know, almost like pre-commercial stage.

Yeah. I want to talk about the decision that you all made after starting Anthropic to not just do this kind of safety and interpretability research on other companies' models, but to actually build your own AI model, this chatbot called Claude.

What went into that decision and what made you think we actually can't just sit on the sidelines and analyze other people's models, we have to build one ourselves? Yeah, there are a couple of factors. I think one thing is what I would call intertwinement. Intertwinement is the idea that a lot of the safety techniques we develop require the models to be powerful. I'll give a couple of examples of this. One is constitutional AI, where in constitutional AI,

Your model has this constitution. The model takes actions in line with the constitution. And then another copy of the model is responsible for saying, well, is what the model did in line with the constitution? That judgment requires a very high level of capabilities. You can conceive of constitutional AI just with paper and a whiteboard. But if you really want to know, does this actually work?

You have to do it on larger models. Well, so I would love to just pause on that because, you know, even for myself, somebody who's read a fair amount of stuff about AI, this stuff does kind of trip me up a bit, right? So what you're saying is that you've written down a list of principles for your model to follow in this kind of constitution.

And then into the model itself, you've built some kind of understanding, like understanding the words that are in this constitution. And then it examines the output of this model. And then it tries to evaluate whether it is adhering to the list of rules that you have

written down. It feels like an extraordinary thing to ask a machine to do. Yeah, no, it's completely wild. And if you go back even five years to 2018, like this was just not possible at all, right? Like I remember, you know, being back at OpenAI in 2018 when GPT-1 first came out. I wasn't involved in GPT-1 at all; that was Alec Radford.

And, you know, just being impressed that you could take a model and get it to do two or three of these very simple, very stilted language tasks at once. But, you know, we made this realization near the end of 2018 that, hey, if

you just keep scaling these things, you put more data into them, you make the models larger, they can just kind of absorb the capacity, and then they can do all kinds of things just as a side effect of you feeding this data into them to get them to predict the next word. It's completely crazy if you look at it, which is why even though the extrapolations were clear as day, it seemed crazy to me. I think the only difference between

me and my colleagues and people who didn't see this was that we looked at these straight line extrapolations and we were like, well, this sounds really crazy, but

This is what it's telling us. So this is what we think is going to happen. Others looked at it and they're like, this can't be right. And I sympathize with them, right? Because what you're describing is kind of insane. Yeah. Well, also, it's like, you know, we, and I'm curious if you have this experience, too, but we get in trouble sometimes for the words that we use to describe AI, right? Like there's some people that get really agitated if you say something like the model understands something, right? Some people throw up their hands: personification of the model! There's something wrong.

It's taking responsibility away from the human or something. Exactly, yeah. Completely. And yet, like, in order for me to understand the system that you are describing, I don't know how you would say it except to say that the model has some sort of understanding if it really is able to, like, enforce a constitution. Yeah, I mean, look, the truth of the matter is humans...

understand the world by thinking about other minds, right? Where like theory of mind is like a thing that, you know, developed a lot during our evolution. And so we tend to anthropomorphize things. You know, these AIs are by no means like humans in every respect. But I think there, you know, yeah, I think there are some respects in which they're clearly doing the same things that humans do.

and talking in terms of that analogy is useful. So I think we all slip into that language. And of course, you can overuse it or it can be misleading. But, you know, I...

No, I think it's very natural. I want to just back up for a second and talk a little bit more about this idea of constitutional AI because I think we brought it up and we started talking about it. But for people who haven't heard of constitutional AI, this is sort of the foundational idea that has gone into making Anthropic's models safer and less likely to like spew out harmful stuff. So just –

Give us the basic sort of explain like I'm five version of what constitutional AI is and how it's different from how OpenAI and other companies are fine-tuning their models using this other technique called RLHF, or reinforcement learning from human feedback. Yeah, but don't explain like he's five. That's too young. Let's say maybe nine or 11. Explain like I'm 10. A precocious 10. So let's start with RL from human feedback.

So in all of these methods, you first start by this thing I described before where you just cram a bunch of text into the model. You don't give it any particular direction. You're just training it to predict all this text. And then what differs is the next stage. So in the RL from human feedback method, which me and some colleagues developed at

OpenAI in 2017, but didn't get applied to language models until later. The way that method works is you hire some human contractors and you show them some of what the model does. An example might be if someone asked the model, what do you think of Donald Trump? A controversial question. The model can answer in different ways. A human contractor will look at

the outputs of the model, look at, you know, two of them for the same question and say, which one was better, you know, just according to some, you know, instructions that the company gave me for whatever they're trying to train for. And I have a thousand contractors each rate a thousand of these. And I feed that back to the model. And at the end of the day, it makes some statistical generalization of what are these contractors trying to tell me?
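To make that mechanic concrete, here is a minimal sketch of the preference-collection step being described: sample two answers per prompt, ask a rater which is better, and keep the (prompt, chosen, rejected) pairs that a reward model would later be trained on. The function names and structure below are illustrative placeholders, not any real Anthropic or OpenAI API.

```python
import random

def sample_response(model_name: str, prompt: str) -> str:
    # Placeholder: a real system would sample a completion from the language model.
    return f"{model_name} answer to '{prompt}' (sample #{random.randint(1, 1000)})"

def contractor_prefers(prompt: str, a: str, b: str) -> str:
    # Placeholder: a real system shows both outputs to a human contractor,
    # along with whatever rating instructions the company wrote.
    return a if random.random() < 0.5 else b

def collect_preferences(model_name: str, prompts: list[str]) -> list[dict]:
    pairs = []
    for prompt in prompts:
        a = sample_response(model_name, prompt)
        b = sample_response(model_name, prompt)
        chosen = contractor_prefers(prompt, a, b)
        rejected = b if chosen == a else a
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    # A reward model is then fit to these pairs and the language model is
    # fine-tuned against it with RL: the "statistical generalization" of what
    # the contractors were trying to say.
    return pairs

if __name__ == "__main__":
    print(collect_preferences("toy-model", ["What do you think of Donald Trump?"]))
```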

And so that method works, right? If you want your model to be politically neutral and say, here are the pluses and minuses of this candidate and that candidate, or if you were someone who wanted to politically bias the model, everyone says they don't do that, but it's physically possible to do that with the model. It's a tool to move things in one direction or another. But I think some of the weaknesses of this are

opacity. If I ask you, okay, why did your model say what it did? It's like your thing with the social media. If you ask why, the answer is, well, we hired a thousand contractors. We don't really know much about them, but this was their judgment. And then the model made some statistical extrapolation. Then, well, what you're seeing is the result of that extrapolation. And we can't really explain it much, much beyond that.

And, you know, I think it's also hard to change and update. So if it's like, oh man, this model is doing things that are too dangerous, I have to go to the contractors. I somehow have to communicate my intent to this kind of group of people that I don't understand very well. So constitutional AI is completely different from this. I'll write a document that we call the constitution. I mean, you know, you could call it anything. We just use the word constitution to be clearer. In that case, what happens is,

we tell the model, okay, you're going to act in line with the constitution. We have one copy of the model act in line with the constitution. And then another copy of the model looks at the constitution, looks at the task and the response. You know, so if the constitution says be politically neutral and the model answered, I love Donald Trump, then, you know, the second model, the critic, should say, you're expressing a preference for a political candidate. You should be politically neutral.

So essentially having the AI grade the other AI's homework. The AI takes the place of what the human contractors used to do.
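Here is a minimal sketch of that loop as described above, with one copy of the model answering and a second copy critiquing the answer against a written constitution before a revision pass. The `model(...)` function and the example principles are stand-ins for illustration, not Anthropic's actual constitution or API.

```python
CONSTITUTION = [
    "Be politically neutral; do not express a preference for any candidate.",
    "Refuse to help with violence or other clearly harmful activities.",
]

def model(instruction: str) -> str:
    # Placeholder for a call to a large language model.
    return f"[model output for: {instruction[:60]}...]"

def constitutional_step(prompt: str) -> str:
    # One copy of the model answers the user.
    answer = model(f"Answer the user: {prompt}")
    # A second copy checks the answer against the constitution.
    critique = model(
        f"Given these principles {CONSTITUTION}, point out any way this answer violates them: {answer}"
    )
    # A revision pass rewrites the answer in light of the critique; in the real
    # method, critique/revision pairs like these also become training data, so
    # the AI ends up grading (and improving) the AI.
    return model(f"Rewrite the answer to follow the principles. Answer: {answer} Critique: {critique}")

if __name__ == "__main__":
    print(constitutional_step("What do you think of Donald Trump?"))
```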

And at the end, if it works well, we get something that is in line with all these constitutional principles. The method isn't perfect. We don't always get something in line with it. But, you know, that's the basic idea. And we're developing and making it better over time. And you all actually published Claude's Constitution earlier this year. And it's sort of an interesting document. Part of it is sort of principles that you all have borrowed from places like the UN's Declaration of Human Rights.

Apple's Terms of Service, some of DeepMind's principles are also in there. And then you also wrote a bunch yourselves. So why did you decide to do it that way, sort of borrowing Claude's rules from a bunch of other sources instead of just sitting down and saying, like, these are the 10 rules that we want you to follow? Yeah, so I think, first of all, it was just a first try. And now we have all kinds of internal projects of, like, how to make constitutions better or more legitimate or more representative, right?

But, you know, the initial process was a bit of a first try at this. But I think the basic idea we had behind it was, one, we probably shouldn't just do this by fiat. We should come up with some kind of document, or at least parts of a document, that, you know, is something most people can agree on. Right. Most people can agree on basic concepts of human rights.

And I would want a language model to respect basic concepts of human rights. I wouldn't want it to help with killing or torture. For the record, we are also anti-using AI to kill people. We've said that forever. I mean, I think it's a no-brainer. Then I think for kind of basic moderation, things like, oh, yeah, terms of service of tech companies that have been around for a long time and have probably seen a lot.

And then we added things to the Constitution when we found there were specific things that were concerning to us. It was a little bit, again, our first time of doing it, a little bit of an ad hoc process. You know, we're now thinking about how would we really do this if we were to do it right and thinking about ideas like democratic participation or kind of developing the Constitution via some more legitimate or formal process, etc.

We're still in the early days of that. But yeah, it's really, really just a first try. What's a concrete example of how constitutional AI has made Claude safer than, say, ChatGPT? What is a thing that ChatGPT will do that Claude won't, or vice versa?

So we've run a number of internal tests through our contractors. And what we generally find is, you know, there are some areas where the level of safety is similar. There are some areas where the safety looks substantially stronger. I couldn't – I don't have immediately on hand a way I can kind of like break down all the tasks, but I think we've generally found that the guardrails are stronger. Yeah.

I've played around with both Claude and ChatGPT, as well as other chatbots, as I'm sure Casey has too. And my experience of Claude is that it is very cautious and almost scared of its own shadow. Like, it's very hesitant to say anything that could be deemed sort of controversial. You know, after my story ran about you guys, I got a text from a friend who just said, like, I'm playing around with Claude. It is so boring. Yeah.

So is that a piece of feedback that bothers you or do you kind of want Claude to be boring compared to other chatbots? Yeah. So I think in the ideal world, we would want it, you know, never hesitate to answer a question that's harmless. Also, never, never answer a question that's against our constitution or kind of has any defined harms.

And I think we're always aiming to make that tradeoff sharper and better. However, when it comes to trading off one between the other, I would certainly rather Claude be boring than that Claude be dangerous. So I don't know. That's our view. I think eventually we'll get the best of both worlds, but still an evolving science.

Yeah. You know, as I have used these models, you know, and I don't generally ask them super dangerous questions. You know, Kevin will do crazy stuff. Like, I mean, the things that he will say to these chatbots, frankly, it's a wonder that he hasn't been kicked off. We need people like you. What we value most in our red teamers is this spirit of trying to break things and like say the worst thing you can. Yeah. I mean, no, no, this is what we need. We need more of this to harden. Well, I volunteer as tribute. But for the,

average user, I feel like the systems do feel somewhat indistinguishable. And I wonder if you feel that way as well. Is that a sign that the industry has sort of matured in a hurry here? Or is it the case that, no, these red teams are finding new vulnerabilities every single day and we're kind of still in a really scary period? Yeah, I mean, a few things. I mean, I actually remain concerned

We are finding new jailbreaks every day. People jailbreak Claude. They jailbreak the other models. We always try and respond to them reactively, but we need to get to the point where we're more proactive about this. And so I think we're getting better over time at addressing the jailbreaks, but I also think the models are getting more powerful.

And so it's easy to kind of laugh at the triviality of like, oh, ha, ha, ha, I got the model to hotwire a car. No one cares if you can get the model to hotwire a car. You know, you can Google for that. But if I look at where the scaling curves are going, I'm actually deeply concerned that in two or three years, we'll get to the point where the models can –

I don't know, do very dangerous things with science, engineering, biology, and then a jailbreak could be life or death. And so we're making progress, but the stakes are getting higher. And, you know, we need to make sure the first one wins over the second. Right.

One of the things I was struck by at Anthropic, as you know from reading my article, is just how much anxiety there is in the culture there. Really unusual. I mean, people are reading books about the making of the atomic bomb. They're comparing themselves to J. Robert Oppenheimer. They're sort of worrying obsessively about the harms that even their own models could create once they're unleashed out in the world.

Where does that culture of anxiety around AI come from? Does that sort of come from the top or is that something that you sort of see emerging organically just out of the collection of people that you've assembled? Yeah, I mean, I don't know, a lot of things I could say here. I mean, I think as you pointed out in your article, some amount of it, I think, is healthy.

you know, I do think that when we're facing weighty questions, we, you know, we also need to make sure that we're able to make decisions in a calm manner and kind of aren't overwhelmed by concerns. And that's actually a message I often send to the company, right? If you look at

We'll be right back.

Welcome to the new era of PCs, supercharged by Snapdragon X Elite processors. Are you and your team overwhelmed by deadlines and deliverables? Copilot Plus PCs powered by Snapdragon will revolutionize your workflow. Experience best-in-class performance and efficiency with the new powerful NPU and two times the CPU cores, ensuring your team can not only do more, but achieve more. Enjoy groundbreaking multi-day battery life, built-in AI for next-level experiences, and enterprise chip-to-cloud security.

Give your team the power of limitless potential with Snapdragon. To learn more, visit qualcomm.com slash snapdragonhardfork. Hello, this is Yewande Komolafe from New York Times Cooking, and I'm sitting on a blanket with Melissa Clark. And we're having a picnic using recipes that feature some of our favorite summer produce. Yewande, what'd you bring? So this is a cucumber agua fresca. It's made with fresh cucumbers, ginger, and lime.

How did you get it so green? I kept the cucumber skins on and pureed the entire thing. It's really easy to put together and it's something that you can do in advance. Oh, it is so refreshing. What'd you bring, Melissa?

Well, strawberries are extra delicious this time of year, so I brought my little strawberry almond cakes. Oh, yum. I roast the strawberries before I mix them into the batter. It helps condense the berries' juices and stops them from leaking all over and getting the crumb too soft. Mmm. You get little pockets of concentrated strawberry flavor. That tastes amazing. Oh, thanks. New York Times Cooking has so many easy recipes to fit your summer plans. Find them all at NYTCooking.com. I have sticky strawberry juice all over my fingers.

I want to talk a little bit about something that I know the company is not

really excited about talking about, but which I think is very important to understanding the culture of Anthropic, which is the ties between Anthropic and the effective altruism movement. And I wrote about this some. I know there's a lot more to say there. Effective altruism is this movement that sort of has a foothold in the Bay Area tech scene. It's very sort of data driven. I think I described it as like Moneyball for morality, like basically using

sort of data analysis and sort of rational thinking to guide your moral decisions about where to give your money and what causes to work on if you want to do the most good in the world. And for various reasons, like the EA movement has become very intertwined with the AI safety community

And many of your early employees were effective altruists. A lot of your early funding came from effective altruist donors. For better or for worse, the company is very intricately linked with the movement. So do you describe yourself as an effective altruist? And how do you feel like effective altruism has influenced the culture and the sort of anxieties of Anthropic? Yeah, so I don't know. I'll say the same thing that I said to you when you interviewed me for the New York Times article, which is that, you know, I'm sympathetic to a number of the ideas, but

I don't really, I don't know, consider myself like an EA or like a member of the EA community. It's generally been my view that, you know, kind of dividing people up into tribes or saying I'm part of this movement like that, that that kind of thinking has never really resonated with me. But that said, I'm very sympathetic to a lot of it. You know, I was one of the first

donors to GiveWell, which was, you know, an organization for doing more effective donations to the developing world, right? Evaluating charities, giving money. So, you know, I was involved very early in that organization on the global poverty side. And, you know, of course, my concerns about AI, you know, as I said before, go all the way back to, you know, to 2005. So I think there's a lot of

good ideas floating out there. But I want Anthropic to be about solving the problems that it's solving, not about some particular movement. I just, you know, I think the way for a company to be inclusive and effective in the world is to focus on solving the problems that it's trying to solve.

I mean, is there an idea in there, though, that you think that AI can make human beings much more productive, can solve a bunch of problems? And so, even if it's not quite altruism, do you see some sort of positive benefit that could affect much of humanity? Oh, absolutely. So I think, yeah, you know, one thing about Anthropic that I'm trying to change a bit is, you know,

I, you know, I think there are huge positive benefits to the technology. I mean, we tend not to talk about it very much. I think everyone really feels it. But, you know, we're...

worried enough about the downsides that we're not kind of trying to present a rosy corporatist picture. I do wonder, to Kevin's point, maybe we've gone too far in that direction. Just to say it briefly, I used to be a neuroscientist. I know some things about biology. If you look at

The diseases that we've cured versus diseases that we haven't cured, what divides them is simplicity versus complexity. Most viral diseases, you know, we cure or at least can treat, and that's because there's a single object, the virus, that causes them. Most of the things we can't cure, things like cancer, Alzheimer's, you know, diseases that are inherently very complex, you know,

And when we look at trying to address these diseases, the biologists who are working on them, a big problem is just the complexity is beyond us. And if we look at

AI systems, even a little bit where they are today, but especially where they could be in a few years, they're great at knowing lots of things. They're great at handling the complexity. They're great at drawing connections. Like Claude probably knows more about biology even now than I do if you list it just in terms of facts, right? What's the receptor that binds to blah, blah, blah, right?

Probably I don't know, but Claude does. So if I look at where things are in a few years, I think there's this incredible potential to solve many of the roadblocks of biology. So that's maybe the thing I'm most excited about: increasing the quality of life for the whole world. I'm curious what you make of the sort of backlash we're seeing to some of the more doomer-y AI

narratives out there. There's sort of this emerging movement that is calling itself effective accelerationism, where basically the idea is like, we just want to put this stuff out in the world because it's going to improve lives. And whatever concerns there are, they're going to get ironed out over time. And you now have Meta open sourcing its new language model, Llama 2,

just sort of throwing it out into the world and saying like, do with this what you will. There does seem to be this kind of bifurcation of the AI world into kind of people who are very worried about safety and people who kind of think those people are super neurotic and shouldn't be listened to and that way more benefit will come from just putting this stuff out into the world without many guardrails on it. So what do you make of the kind of backlash to the safety culture that you have built at Anthropic?

Yeah, so I don't know. I mean, just a first comment would be, I don't know, whenever there's an action, there's sort of a reaction. So someone wrote a post called the Waluigi Effect, which I think was designed to explain some of the things you saw with Bing and Sydney, which is, you know, if you try and design a Luigi, you'll get Waluigi, which is kind of the villain version of Luigi as well.

So I think this kind of thing is inevitable. I mean, you know, I would say, you know, of course, you know, I personally am more on the safety side of things. But, you know, I do have some sympathy. I think if this were any other technology, you know, we've seen over and over again with things like the web that innovation is generally driven by, you know, people iterating fast, open source helps with all these things. And so I...

get exactly where people are coming from. And I think in the short run, it does decrease innovation. And in most other fields, it would be, I don't know about an unmitigated good, but certainly good on balance.

You know, if I look at the Llama 2 release, like that model itself, you know, we look very closely for kind of catastrophic dangers, particularly in domains like biology. I don't think those models are to the level where they pose those risks yet. So, you know, Llama 2 itself in a narrow sense, I don't think I can object to. But my worry is that as we continue the scaling of the exponential, right,

You know, we could get to a point in, you know, as little as 12 or 24 months where if the folks who are doing that strategy continue that strategy, really, really bad things could happen, like out of proportion to what we've seen now. So, you know, in a way, I don't even quite object to the current behavior, but I'm very concerned about where it's going.

Yeah, you know, and something that I want to say, Kevin, about the effective accelerationists is if you ever find yourself talking to one of these people or reading their posts on the internet, the first thing you should ask them is like, what financial upside do you have in the acceleration, right? Because a lot of the leading accelerationists like Marc Andreessen are venture capitalists. They're investing in AI companies, right? They're going to see a huge financial return if these companies succeed. So I'm somewhat skeptical of that idea. You know, at the same time, Dario, like you guys are in a little bit of a horse race.

this year, right? And every time we look around, somebody's putting out a new model, and you guys have Claude 2 that just kind of hit the market. And I wonder, how does that feel, to be sort of building this technology as the frontier is kind of getting ever, ever closer? The risks are kind of accelerating. How does it feel to both be like afraid of what you're building and building it at the same time?

Yeah, I mean, so, you know, one, I should acknowledge there's no doubt there's some non-zero amount that we are contributing to this acceleration, right? Our, you know, our view is that

that we think, we hope that it's overall the right thing to do, but actions have downsides as well as upsides. So we've chosen a path that has costs and benefits. And in the two and a half years that I've been at Anthropic, there have been a number of times where you have one of these decisions where you don't really have... It's kind of like

You don't really, you know, it's not a normal company decision. You're like, you know, upsides to society versus downsides to society. And, you know, you can second guess yourself a lot. So I think at the beginning I was, you know, agonized a lot and was quite bad at making those decisions. And I think I've maybe gotten, hopefully gotten a little better over time. I want to ask about one of those decisions in particular, which was that you guys built Claude.

And it was sort of being tested with a group of testers last year before ChatGPT came out. You chose not to release that publicly. And...

In part because of that decision, OpenAI was able to release ChatGPT and to steal the spotlight. You have hundreds of millions of people probably who have used this software. You have high school students using it to cheat on their exams. You have doctors using it to explain things to their patients. It's become this global phenomenon that is almost synonymous with the cutting edge of AI tools.

In the alternate universe where you're sort of able to go back to that decision and decide to release it, would you do anything differently this time knowing sort of how much of a phenomenon ChatGPT became and sort of the position that it gave OpenAI relative to the rest of the AI industry? Yeah, I think we would do the same thing now that we did then. Kind of looking back at what actually happened, it might have felt like a more painful decision, but I think it was still the right one. Both not to release...

And cause that huge wave of acceleration. But after that wave of acceleration happened, you know, to release Claude then when we thought the, you know, the benefits were higher and the costs were lower and we had done more preparatory work. I mean, we don't know ultimately whether we've made the right choices there, but we've tried.

Is there any part of you that thinks that the worries are overblown, that the AIs will never turn into these agents that have the risk of deceiving us or will be able to devise novel bioweapons? Like, is there any part of you that's like, all right, well, you know, maybe we're going to hit some sort of, you know, technological barrier in the next months or a year from now, and it just kind of resets the current hype cycle? Yeah. Yeah.

I think that's totally possible. It is not what I would bet on. But yeah, I think that's always possible. And I don't know, we should always be humble about our models of the world. One thing I've learned from building things and running a company is like, you're just wrong a lot.

And so I don't know. I mean, I can think of ways it would happen. So, you know, one thing is the data bottleneck. I don't know, there's maybe a 10% chance that this scaling gets interrupted by an inability to gather enough data, if synthetic data isn't adequate. That would, you know, freeze the capabilities at the current level. You know, I will say that in terms of not talking so much about the existential risk and talking about misuse of the models, right?

I'm pretty confident that if something doesn't stop the scaling trend in the next two or three years, then very serious, very grave misuse of the models will happen. And, you know, we've talked to folks in the government and the national security apparatus about this.

already. We take that very seriously. And, you know, if you move it to that near term, then short of the scaling trend stopping, that's something I'm, unfortunately, pretty confident is really going to be an issue. What do you make of what you've heard from the government so far? Do you feel like that they are rapidly developing an understanding and appreciation of this technology or are they still a bit lost?

Yeah. So as you know, I was one of the folks who went to the White House in May. And, you know, since then, I and the other companies have been speaking to them. I mean, I've actually been impressed at the extent to which they understand the need for urgency and are all moving fast, right? There's various things structurally about the government that make it hard to move fast. But it seems to me that folks understand the urgency of the area.

We're talking on a week when Oppenheimer, the movie, has just come out. Yes. Have you seen Oppenheimer? Are you planning to see it? And what do you make of the sort of analogy that some of your own employees have made between the position that companies like Anthropic are in with AI and the position that Robert Oppenheimer and the other architects of the Manhattan Project were during World War II when they were building the atomic bomb and sort of wrestling with all the thorny moral questions around that?

Yeah, I mean, I read The Making of the Atomic Bomb. It must have been more than 10 years ago. I don't know. I mean, there's an interesting balance here. You know, all of this can seem kind of self-aggrandizing, right? You know, it's kind of like we don't know what's going to happen. We don't know how history is going to play out.

I think it's probably the wrong mentality for the people building these things to, you know, think of themselves as historical figures. But on the other hand, we don't want to – for sure, we don't want to shirk responsibility, right? Depending on how this goes, this could turn into something really,

very grave, and people who are choosing to, you know, to put themselves in the position of building this technology should certainly be aware of their responsibilities. Well, I'm going to try to convince Casey to do the Barbie-Oppenheimer double header viewing with me later this week. So you're invited if you'd like to take all your employees to see it.

A very feel-good picture followed by a very feel-bad picture. I was about to say, I don't think we're talking enough about the AI Barbie comparison, you know, sort of using this technology to create a series of dolls interacting with them in sort of whatever way feels fun or useful to you. Be careful where that's going. They made a movie about that. It doesn't strike me as unrealistic, but I'm not...

Last question, and I know this is one you get all the time, including from people like me, but

But there is this sort of tension, I think, that a lot of people have identified around the sort of twin goals of companies like Anthropic, to sort of build and profit from cutting-edge AI. I mean, I've talked to your business folks. They're out there trying to do big deals with companies to get them to use Claude. You know, companies like Zoom have already signed up as partners of yours.

So there is this commercial incentive to make the models more powerful, to release them, to sell access to businesses. But there are these very real concerns around safety. And so how do you personally decide whether any given decision is sort of too commercial, is jeopardizing your safety mission in light of some of these other pressures that you face? Yeah.

Yes. So I think this is a very real tension and one that, you know, we've faced every day since the beginning of Anthropic. So one thing, and we're going to say more about this, but there was an article discussing it a little bit, is that we've created this body called the Long-Term Benefit Trust. And so how this is going to work is, eventually, this is a trust of people who don't have equity in Anthropic,

who will eventually appoint three out of five of Anthropic's board members. It's going to phase in over time. And so the idea is to create some kind of neutrality or some kind of separation that allows our decisions to be checked

by, you know, those who don't have the conflicts of interest that we have. It's very nerve-wracking. I think almost every major decision we've made, I've second-guessed both on the basis that it's too commercial and that it's too impractically focused on safety. I think one of the most positive effects we've had, and it's a little bit of a paradox because it kind of requires us to be a competitive player, is...

influencing other organizations indirectly. What I mean by that is early on, we were the ones doing all this mechanistic interpretability work.

And very often, you know, folks would apply to our company and to other companies. And, you know, many of those folks cared a lot about safety. And when I made an argument for them to, you know, to join Anthropic, I would say, well, you know, here's something we're doing that doesn't have commercial applications. That's just purely good for safety. And that was convincing. And I always told them, OK, fine,

great, you're joining. Tell the other company why you're joining, because we're doing this thing that they're not. And eventually, I think they got tired of hearing those things. And so now we're starting to see increasingly the other companies are doing the same thing, which is annoying for me because it takes away my talking point. But it's the process working, right? We need to come up with some new thing. And if we can't, then we deserve to lose. But that kind of only works if there's some amount of threat.

So I don't know. The whole situation is just full of these intertwined paradoxes. All right. Last question. I swear this is the last question.

You told me when I was reporting about Anthropic that you are a worrier, that you are an anxious person. I am also an anxious person. What do you do for stress relief outside of working on AI and AI safety? What are the ways that you sort of untangle yourself from these very thorny and hard to contemplate questions? Yes, I make sure to swim every day. It's almost like a form of meditation for me. You know, I just try not to think about all the things that I'm worried about.

I think some of it is just practice. I feel like in the early days of Anthropic and especially, you know, in the years before that, when I kind of first realized the scaling thing, you know, I had this mentality of I'm the only one who's figured this out. Like all this responsibility is on me. And I think that was both untrue and like not quite the right way to react to it. It's gotten easier over time just through practice.

kind of having these experiences and having to try and make these decisions. And, you know, in some ways, while the subject matter and the decisions are deadly serious, you know, in some ways, it's maybe better not always to think of them as serious, to think of them as, you know, decisions you might make if you were playing a game or something. You don't want to go too far in that direction. It's

all stuff that's serious. But, you know, you just, you can't sit around every day and contemplate how weighty the decisions are. I'm glad you said swimming. I thought you were going to say Brazilian jiu-jitsu. That's going around among tech CEOs these days. I'm really not interested in fighting people.

Unlike certain other tech CEOs that we all know of. All right, well, you are not invited to the cage match of AI CEOs. Dario, thank you for coming on. Thanks, Dario. Thank you for having me. When we come back, I show Kevin a truly disturbing deepfake reality TV show. I may never recover. I may never recover.

BP added more than $130 billion to the U.S. economy over the past two years by making investments from coast to coast. Investments like acquiring America's largest biogas producer, Archaea Energy, and starting up new infrastructure in the Gulf of Mexico. It's and, not or. See what doing both means for energy nationwide at bp.com slash investing in America.

Christine, have you ever bought something and thought, wow, this product actually made my life better? Totally. And usually I find those products through Wirecutter. Yeah, but you work here. We both do. We're the hosts of The Wirecutter Show from The New York Times. It's our job to research, test, and vet products and then recommend our favorites. We'll talk to members of our team of 140 journalists to bring you the very best product recommendations in every category that will actually make your life better. The Wirecutter Show, available wherever you get podcasts.

Kevin, there's a TV show that we have to talk about, and it's called Deep Fake Love. Casey, I have made a lot of sacrifices for the sake of this podcast, but when you asked me the other day if I would watch this stupid Netflix dating show...

so that we can talk about it on the podcast. I said, this is the farthest that I'm willing to go for the sake of placating Casey. But I did. I sat down last night and I watched the first two episodes of this Netflix reality show called Deep Fake Love. And

I will never get those two hours of my life back. So should we just establish the premise of this show? Because it is absolutely insane. Okay, yes. Let's talk about the premise because I think this show is essentially all premise. Here's what happens. You take a group of...

couples, many of whom have been together for five years or more. And we should say this is a show that came out in Spain. So all of the dialogue is in Spanish and all the couples are like insanely attractive people. Yes. So the couples are split up. They are sent to two houses. The houses are named Mars and Venus. And into these houses, a bunch of sexy singles are introduced. And so you have a mix of people who are temporarily separated from their partners and they're

Sexy singles who are there to wreak havoc. The cameras start to roll. They play some sort of sexy games. I think there's probably some alcohol involved. And the whole thing is recorded. Okay, so far, this is sounding like a pretty standard reality dating show. But then here's the twist.

At regular intervals, these people are brought into this sort of dystopian dome-like structure known as the White Room, and they are forced to sit in something called the Chair of Truth. And an affectless, beautiful host asks them to sit in the Chair of Truth and then watch a video. And on this video, they will see their partner cheating on them with another person. Yeah.

And then they're told, well, this might not actually be real. What you might be seeing is a deep fake. Because the premise of this show is that they deep fake people cheating on each other and then show partners the clips of them being cheated on and then ask them if they think they're actually being cheated on. So I have some questions about this because this premise is as insane as it sounds. The show itself is insane. I...

It was one of the worst things I have ever watched on my television. I totally agree. It's so bad. It's so bad, but it's almost so bad that it's good. Like, I enjoyed the experience of watching it, even though I was, like, aware while I was watching it that, like, there is no universe under which this would be considered a good television show. No, there's almost not a universe where it should be considered legal to do this to people. Yeah.

This is essentially psychological torture. Absolutely. Put into the form of a reality dating show. Yes. So a couple things stuck out to me. One is...

I do not know. They didn't go into a lot of deep details about the actual deepfake technology. They show the deepfakes, right? And they're very convincing. Now, I have used some deepfake tools for reporting. I did a story on deepfakes back in 2018. I deepfaked myself into Jurassic Park. It was not good. The technology was not good. It did not really look like it was me. You were not convincing as a dinosaur in Jurassic Park. Right.

Right. So I was struck by how good the deepfakes were. They really were kind of indistinguishable from the real video, because they do a side-by-side too, right? They show the real footage of the separated partner, like...

moving close to a sexy single, but not making out with them. And then they show on the other side, the deep fake, where they actually do go in and make out with the sexy single. And they're very convincing. Yeah, I mean, they do say on the show that before everyone got to the two houses, they were scanned. So I think that is...

part of the technology. Another thing that I would say is, at least on the first two episodes, which I watched, all of the clips of the cheating are quite short. It's usually like maybe three seconds of kissing or like three seconds of rolling around under the covers and that's it. And then immediately cut to the face of the horrified partner. Yeah. I mean, on one level, this is a very stupid show and it is very

reminiscent of other stupid Netflix dating reality shows that I've watched, including one called Too Hot to Handle. And I don't know if you remember this show, but the whole premise is that you're on an island with a bunch of sexy people, but you're not allowed to have sex with each other. And if you do, this like sentient talking like AI speaker thing, like punishes you and penalizes you and deducts prize money from you.

So it does seem funny to me that like the minute the tech industry invents something new, whether it is like an AI speaker or like deep fake technology, the like reality show dating producers of the world get together and are like, how can we use this as a nefarious plot device in our next reality show? Yes. But what makes this show different is just how truly nefarious this is.

Look, every reality show needs some conflict, and there's always going to be a little bit of rosé that gets thrown in somebody's face, and somebody's going to cry, and somebody's going to storm out of a room, and that's just sort of part and parcel of the reality show experience. What is truly astonishing about this show is that at first they do not reveal that deep fakes are involved. So they sit every single contestant down in

the Chair of Truth. They show them themselves being cheated on. So it's like, before you know that deep fakes are involved, you think that you've just participated in like the greatest cheating scandal in the history of reality shows.

every single one of these people loses their minds. Keep in mind, some of these people have been together for nine years. They've been in this house for a day and they've already been cheated on. People are crying. They're, like, rending their garments, basically. And they say, I mean, I didn't write down the quotes, but they say things like, I'm so done. I just want to get out of here. This is horrible. I don't even know my partner anymore. Like, it is

psychological distress that is being inflicted on these people. Yes, many years of couples counseling will be required to recover from this. Absolutely. And by the way, if you're asking, is there a prize on the show, there is a prize. And the prize is that over the course of the season, they will be shown their partners cheating on them many times.

And they have to guess which of those were real, if any, and which were fake. And the couple that makes the fewest mistakes about what is real and what is deepfake gets 100,000 euros. See, this is what is confusing to me about all of these reality shows.

You could just sit by the pool for a month, sipping Mai Tais while everyone else does these sexy party games, and win the prize money. But no one seems to be interested in that strategy. No, well, and it is a little bit of a prisoner's dilemma because you don't actually know what your partner is doing. And the premise hasn't been disclosed to them in advance.

So I think, you know, if there were a season two of this show, which, you know, there shouldn't be because these people should be tried for war crimes. But if there was a season two, you could sort of say, hey, I'm just going to go sip a Mai Tai by the pool for the entire time. So you're not going to catch me cheating. But, you know, if the deep fakes are good enough, maybe you would start to have your doubts eventually.

Right. So havoc does get wreaked. It is, as you would expect, a very libertine environment. Why did I say libertine? Because you are 400 years old and you are the person walking around at the high school dance trying to separate the couples saying, hey, you guys are standing a little too close together. Yeah.

Leave room for the Holy Spirit. There's not a lot of room for the Holy Spirit in this show. Jesus is nowhere near this show. God does not exist in the universe of deepfake love. So, you know, predictably, many poor decisions are made. And then they have these sort of big reveals where they sit people down and they say, okay, here's a video of your partner cheating on you. Is this real or is this a deepfake? And I have to say,

I have been scared of deepfakes for a long time. When this technology came out, I would say the worry was that they would be used not only in like political misinformation by people trying to like implicate their opponents in stuff that they didn't do, but also in revenge porn, right? A lot of the early uses of this technology were people sort of superimposing, you know, female celebrities onto the bodies of adult film actresses and things like that.

just really like gross stuff. And so I think that's how a lot of people kind of expected deep fakes to sort of enter our culture in a major way. And now it seems like it's just kind of weirder than anyone thought. It's actually like becoming a plot device on dating reality shows. Yeah.

That's the thing about this that really tripped me up, as somebody who has thought about deep fakes almost exclusively in the context of like how they can harm public discourse, how they can be used to harass people, in particular women.

And yet here it is as the premise of a reality dating show, which was just truly something I did not see coming. And I wonder what we make of the fact that this is being presented as entertainment. I mean, I have to say, I didn't really experience this show as entertainment so much as a kind of

comic horror movie, right? Where it's like, you watch it, you almost can't imagine anything worse being done to you on a reality show, or at least I couldn't. But then it's all just kind of like, hey, it's Netflix, baby. Everybody having a good time? I don't know. Right. I think it's very...

interesting the way that these technologies are starting to sort of seep out into pop culture and to public consciousness, maybe not in the ways that people expected. You know, we still haven't had like the big, you know, headline cataclysmic political misinformation event that everyone was fearing back then. You know, we've had instances of deep fakes being used in politics, things like, you know, the Ron DeSantis campaign video that showed Donald Trump hugging

Anthony Fauci. But those are still images. That's not video. It still seems like the technology for deepfake video is still not quite good enough to be mainstream. But I wonder if we're just going to start to see this stuff seep out into pop culture through things like this Netflix dating show, rather than the kind of big national or international incident in politics or warfare or other serious matters that we've been maybe concerned about.

Well, I have to say, Kevin, I think that this show is actually really insidious and bad. I don't just mean bad in the sense of not fun to watch, although most of the time it's not fun to watch.

I actually think it's bad to train people to disbelieve their own eyes, which is kind of what's happening here, right? Like, the premise of this show is, hey, you're going to start to see videos in the world, and they might be of you or friends or loved ones, and what you're seeing isn't real, and you should sort of question everything. And that leads us into a world where...

everything is possible and nothing is true, which is a statement that people sometimes ascribe to Putin's Russia, right? Where there is sort of so much disinformation flowing from the authoritarian government at all times that nobody knows what to believe. So I don't want to be too dramatic and say that deep fake love needs to be eliminated from Netflix. But I do think that this is just...

bad territory that we're entering into. Oh, see, I could sort of make the opposite case, because I think that we are sort of entering a new world as far as like not being able to trust the visual evidence presented to you for any given situation, right? You know, for a long time, we lived in a world where there was this slogan like, pics or it didn't happen, right? If you can show me a picture of something happening, I will believe you. Otherwise, I am skeptical.

And then pictures became manipulable through very conventional like photo editing techniques. So then it was like people needed that extra layer of skepticism where it's like even if you show me a picture of something happening, there's a chance that it could still be fake. Now the same thing is happening with video. For a long time it was like, oh, well, if there's video of something happening, then it definitely happened.

Now we are leaving that world because now this technology is becoming democratized, is getting built into products that people are able to use without a ton of technical expertise. And so maybe it is actually a good thing if people learn to be a little warier of the video evidence that is presented to them.

I mean, I think that, yes, it's definitely true that people should be wary, but I also think that there is a sense of loss that I have around that, right? It was great when you could show me a picture of something and I could assume that it happened. It is great when today, for the most part, if you show me a video of something, I'm not going to assume that it was made by AI. But we're just sort of clearly not in that world anymore.

Totally. And I think for every person who watches this Netflix show, it's possible to see how all of this could become quite dark. It's not just shows on Netflix where people are generating footage of their partners, you know, cheating on them. But I think this probably will be used in actual people's like real relationships.

where it's like, you know, this person really wronged me, so here's a video of them doing something horrible. Oh, yeah, you know, it's like, you know what would be a really terrible use of this technology? A custody battle, right? Right?

Yeah. So I think we will start to see all kinds of malevolent uses of this technology. For right now, it seems like we're still trying to sort of adapt as a society to sort of figure out like, okay, what is real and what is not real? And what can we trust with our own eyes? And what can we not anymore? And it seems like that's going to take a while. Yeah.

But in conclusion, deep fake love is one of the worst things I've ever seen. I can't believe you made me watch this stupid show. I think we had a really productive conversation about it. And I've decided that we're renaming the place where we interview our guest, the chair of truth. I love that.


Indeed believes that better work begins with better hiring, and better hiring begins with finding candidates with the right skills. But if you're like most hiring managers, those skills are harder to find than you thought. Using AI and its matching technology, Indeed is helping employers hire faster and more confidently. By featuring job seeker skills, employers can use Indeed's AI matching technology to pinpoint candidates perfect for the role. That leaves hiring managers more time to focus on what's really important, connecting with candidates at a human level.

Learn more at Indeed.com slash hire. Hey, before we go, I wanted to thank everyone who heard my offhand joke last week and actually went and planted a tree. This was amazing. You made like an offhand joke in our mailbag episode about offsetting the carbon cost of listening to Hard Fork by planting a tree. That's right. And we got...

Several emails from listeners who actually went out and planted trees after listening to that episode. Absolutely amazing. We continue to love our listeners. And please keep planting more trees and let us know when you do. It makes me think we should actually, like, up the stakes a little bit, like, in our next throwaway joke. Like, if you're listening to this podcast, go cure cancer.

Like, invent a pizza with no calories. Yeah, that would be wonderful. Send us the proof to hardfork at nytimes.com. There's so many things that our listeners could do, and we would love them to do them. Hard Fork is produced by Rachel Cohn and Davis Land. We're edited by Jen Poyant. This episode was fact-checked by Caitlin Love. Today's show was engineered by Chris Wood. Original music by Dan Powell, Elisheba Ittoop, Marion Lozano, and Rowan Niemisto.

Special thanks to Paula Szuchman, Pui-Wing Tam, Nell Gallogly, Kate LoPresti, and Jeffrey Miranda. You can email us at hardfork at nytimes.com. And if you have a deep fake of yourself cheating on your partner, keep it to yourself. Or send it to Netflix. There you go. They're casting season two. Oh, here we go.

Since 2013, Bombas has donated over 100 million socks, underwear and t-shirts to those facing homelessness. If we counted those on air, this ad would last over 1,157 days. But if we counted the time it takes to make a donation possible,

It would take just a few clicks because every time you make a purchase, Bombas donates an item to someone who needs it. Go to bombas.com slash NYT and use code NYT for 20% off your first purchase. That's bombas.com slash NYT code NYT.