
Generative AI in Video and the Future of Storytelling (with Runway CEO Cristobal Valenzuela)

Publish Date: 2023/8/30

ACQ2 by Acquired



Hello, Acquired listeners, and welcome to this episode of ACQ2. Today, we are talking with Cristobal Valenzuela, the co-founder and CEO of RunwayML, one of the most fascinating companies in the AI space right now that I've had a chance to play around with quite a bit, and I'm absolutely blown away by the product. And as we are all diving in, trying to understand the

current state of AI and watching all of this rapidly evolve in real time, I looked at this as an awesome opportunity to get to spend more time with Chris, who is not only contributing to the space, but sort of like inventing it as we're going along. So without further ado, Chris, welcome to Acquired. Thank you for having me. Awesome to have you here.

Well, David and I wanted to start peeling the onion from sort of the highest layer, which is you have created at Runway a text box where I can go and type in text. And then within a minute or so, I have a high resolution video of what I typed in. And it is absolutely remarkable. And I think you just released the Gen 2 model, which is even better. I sort of wanted to just ask an open-ended question, which is,

How does this work, and how did it evolve to... How did we get to this? Yes. ...be what it is today? Okay, there's a few questions there, so I'll try to unpack them all. How does it work, or how we got here first, I guess? It's been a journey of a couple of years from the Runway side. The company is now turning five years old. And so we've been working on this idea of...

synthetic media, or generative models, for some time now. Even before that, the founders met at school, and we've been working on it now, I would say, collectively for like seven, eight years. And this was at NYU, in the art school? This is at NYU, NYU Tisch School of the Arts, which is an arts program that also has a bit of like engineering in it. Think about it as art school for engineers or engineering school for artists.

And so we started playing around early, early on with early neural network research and projects and tried to take some of those ideas and apply them into the fields and into the arts, specifically into filmmaking and designing and art making in general.

So it's definitely taken some time, and that's where we're coming from, and happy to go deeper into it. Now, I guess the model you're referring to more concretely, Gen 2, is a model and a line of research we've been working on for some time now that allows you to transform input mechanisms, or input conditions, like text, as you were referring to, into video. You can also work with images or with other videos as well.

And so maybe the best way to think about the model itself is to think about it in two different ways. One from the product side of things, which is how are people using this

And you know, the best analogy I've come to understand or explain how these things work is really to think about it as a new camera. You have a new kind of camera, and this new camera allows you to create some sort of like video out of it. And you can control the camera with different settings and presets, and you can control the light and the aperture of the lens, etc. These models work pretty much similarly. You have a model that's able, or a system or a technology that's able to generate video

And you can condition the video generation on text, on images, on video, and a few other things as well. So depending on what you're trying to do, if you're trying to create a video out of an existing image, you might choose the image-to-video mode. If you're trying to maybe get some ideas out of your head, you might try the text-to-video mode, which is you type in text and you get video out. So it's a very flexible kind of camera, if you want to put it with that analogy.
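To make that "camera with modes" analogy concrete, here is a minimal, hypothetical sketch of how a client might dispatch between conditioning modes. The class, function, and parameter names are illustrative assumptions for this discussion, not Runway's actual API.

```python
# Hypothetical sketch only: these names are illustrative, not Runway's real API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GenerationRequest:
    prompt: Optional[str] = None        # text conditioning ("text-to-video" mode)
    init_image: Optional[bytes] = None  # image conditioning ("image-to-video" mode)
    init_video: Optional[bytes] = None  # video conditioning (video-to-video)
    seconds: float = 4.0                # length of the clip to generate

def generate_video(request: GenerationRequest) -> bytes:
    """Pick a conditioning mode based on which inputs were provided."""
    if request.init_video is not None:
        mode = "video_to_video"
    elif request.init_image is not None:
        mode = "image_to_video"
    elif request.prompt is not None:
        mode = "text_to_video"
    else:
        raise ValueError("Provide at least one conditioning input.")
    # A real system would now run a video generation model conditioned on the
    # chosen inputs; here we only report the mode and return a placeholder.
    print(f"Generating {request.seconds}s of video in {mode} mode")
    return b""

# Example: "text-to-video" -- type in text and get video out.
generate_video(GenerationRequest(prompt="a vintage green Porsche 911 at sunset"))
```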

So that's, I guess, the first part of it. And the second, I guess, more technical aspect is how these models actually work. There's research we've been conducting for some time now on diffusion models specifically applied to video, which is the kind of baseline model that we built for this. The Runway founding team, you all were heavily involved in, if not the primary authors of, latent diffusion, right? Yeah, we've been pioneering work on generative models and foundational models for both image and video in multimodal systems. For some time now, we are...

the co-authors of a very important paper called "Latent Diffusion" that gave birth to Stable Diffusion, which is a collaboration between LMU Munich and Runway. And that, I was checking Hugging Face the other day, is the most used open-source model in the image domain, and so perhaps one of the most influential models, I would say, in the whole generative AI landscape these days was made by Runway and LMU Munich.

And so we've been working on it for some time for sure. And now the next frontier for us represents video. And so Gen 1 and Gen 2, which are also papers we've published with our research team, have been kind of leading the way in the video side.

Okay, so we could be here for a 16-week course of lectures to try to answer the question of, well, how does it work? But give me the Reddit explain-it-like-I'm-five version of how do these models work? And maybe let's start with the images to produce an image as the output. And then after that, I want to follow up and ask you about video.

Sure. Collectively, models understand patterns and features within a dataset, right? They're just probabilistic models and they're trying to predict what's going to happen next. That's like, I guess, the broad definition of any AI system.

With video generation, you can take that same kind of like concept and apply it to frames, right? So you take one existing frame, let's say a picture you've taken in the real world or a picture you actually can generate with Runway. And the model is basically trying to predict what frames will come after that initial frame.

If you think about video, really, video is a magical trick. It's an optical illusion. There's no actual movement. It's just the optical illusion we've created by stitching frames together at a speed enough that our eyes believe there's movement there, right? But they're just frames.

And so the trick, and how it works, is really trying to build a system and a model that understands how to predict, consistently and temporally consistently, which is the key concept, every single frame, and how that frame relates to the previous frame and to all the frames before and after.

And so for that, you train a large model on a large enough dataset to gather those patterns and data and kind of insights around frames. And then the goal is, well, let's start now conditioning or generating new frames. And for that, you can use an existing image or a text or other conditioning mechanisms as well.
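As a rough, hypothetical illustration of the frame-prediction idea described here, consider the sketch below. This is only the intuition: Gen 2 itself is a latent diffusion model that generates temporally consistent frames rather than a simple next-frame loop, and `FrameModel` is a stand-in, not real Runway code.

```python
# Hypothetical stand-ins only; this sketches the intuition, not an actual model.
import numpy as np

class FrameModel:
    """Stand-in for a trained model that predicts the next frame."""
    def predict_next(self, previous_frames, condition):
        # A real model would use the condition (text or an image) and all
        # previous frames; here we just repeat the last frame as a placeholder.
        return previous_frames[-1].copy()

def generate_clip(model, first_frame, condition, num_frames=24):
    """Generate a clip frame by frame; temporal consistency comes from letting
    each prediction see the frames that came before it."""
    frames = [first_frame]
    for _ in range(num_frames - 1):
        frames.append(model.predict_next(frames, condition))
    return np.stack(frames)

# 24 frames shown at 24 fps is one second of apparent motion -- the "optical
# illusion" of video described above.
clip = generate_clip(FrameModel(), np.zeros((256, 256, 3)), condition="a red balloon")
print(clip.shape)  # (24, 256, 256, 3)
```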

You said something, a large model there. I want to double click on that for folks who maybe hear a large model and they think LLMs, you know, the current, what everybody thinks of with generative AI. That's a whole nother kind of branch of genealogy here, a different type of large model around language, around text.

Images, video, this is a whole other branch, right? Yeah, that's very important to make a distinction about. The key concept here is that AI is not just LLMs. So AI is not just language models. And it's important that we're more specific. I think part of it has been this very reductive view of seeing AI as synonymous with ChatGPT, which has dominated so many of the conversations these days

that people assume that when we speak about AI, we're speaking about LLMs or chatbots or language models. And the truth is that the field of AI is way bigger than just language models. For sure, language models have been perhaps the ones that

people have been particularly excited about. But there are other domains as well that just work differently, or can borrow some ideas from language models but operate in different domains. And in many ways, it feels like just kind of the tip of the spear, if you think about the economy and human activity. It does. It does. Text is very important, but lots of things go beyond text. And so some of the questions and uses that you might come to a chatbot or a language model with might not actually be

relevant or apply to someone working in film, right? And by working in film, I mean you don't have the same constraints, or the same conditions, or the same questions, or the same challenges when making a film as when writing something with text.

And so models, foundational models or large models, can be in different domains or different modalities depending on what they're trying to solve. And there are actually, we can go deeper into it, but there are actually models that are multimodal, which can work in different domains at the same time, or

with different inputs. But most of the time when you're referring to foundational models, it's always good to be specific around the domain you're working with. So there are large models for image, there are large models or foundational models for video, and there are large and foundational models for text these days.

Are the explosions in all these different modalities, which seem to kind of be happening at the same time or within a year or two of each other, does it all date back to the 2017 paper on the transformer? Like, why is this all happening right now? The field itself dates back to like the 40s and even perhaps before that. And so definitely there's collectively been like decades and a lot of years of work into making this happen.

I think for me, a bigger moment in time that helps explain perhaps the wave of more recent developments and

progress we've seen happens around 2015, when ImageNet was around and a paper was published that proved you can use convolutional neural networks, and neural networks in general, to solve some problems that people thought neural networks would never solve. And from there on, a few things started to happen. Researchers were experimenting with using GPUs to compute images

in parallel with neural networks, which wasn't possible before that. I think that was 2012 or 2013.

Yeah, I mean, CUDA was only around for a few years at this point, right? PyTorch was released around 2016 or 2017, I think. TensorFlow was around the same time. So I started working on the idea of Runway around 2016 or so, when most of these things were starting to get momentum. And so I wouldn't say there's one particular paper that has explained or helped

justify the wave, because again, transformers these days are applied mostly in the text domain, the language domain. They do have some applications in the visual domain as well. But the latent diffusion paper that we published is a really important paper and really important research that goes deeper into using some neural network techniques, or deep neural networks, in the image domain. And so that's a different paper, a different genealogy of work.

And so I wouldn't say there's one single thing. It's more of a combination of things that's been, I would say, happening more in particular for the last 12, 13 years, starting perhaps from the AlexNet and ImageNet work. Yeah, it makes sense. So on the video model in particular, does it have to train on video training data in order to understand what I mean when I say a panning shot or a dolly zoom or something like that?

You can think about the training as two separate stages to get that level of control. There's the baseline foundational model training, which is let's first get a model that's able to generate frames, right? So if you think about that, that's a new task. Like the...

idea that you can generate video using nothing but words is relatively new. Like getting to the point where you can do that consistently wasn't even like imaginable, again, a couple of years ago. And so what you do first is you generate the model, you create the model. And then a lot of the work that comes after that is fine tuning, which is specializing the model on specific things.

styles or specific control mechanisms that allow you to take this initial piece of research and define better ways of controlling it, which I guess to your example is like how do you make sure that you can define like the zooming and the panning and other kind of conditions that are relevant for video itself. So a lot of the work has to do with both things, creating the baseline foundational research model and then fine tuning on top.
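A minimal, hypothetical sketch of the two stages described here: pretraining a base model on a broad dataset, then fine-tuning it on clips labeled with specific controls such as panning or zooming. `ToyVideoModel` and the toy datasets are placeholders, not Runway's actual training pipeline.

```python
# Placeholder training loop; illustrative only, not Runway's actual code.

class ToyVideoModel:
    def __init__(self):
        self.steps_seen = 0
    def training_step(self, clip, condition=None):
        # A real model would compute something like a diffusion denoising loss here.
        self.steps_seen += 1
        return 0.0
    def update(self, loss):
        pass  # a real model would apply a gradient step here

def pretrain(model, broad_dataset):
    """Stage 1: learn general frame patterns from a large, broad dataset."""
    for clip in broad_dataset:
        model.update(model.training_step(clip))
    return model

def fine_tune(model, control_dataset):
    """Stage 2: specialize on (clip, control_label) pairs, e.g. 'pan_left',
    so those controls can be requested at generation time."""
    for clip, control_label in control_dataset:
        model.update(model.training_step(clip, condition=control_label))
    return model

model = pretrain(ToyVideoModel(), broad_dataset=["clip_a", "clip_b"])
model = fine_tune(model, control_dataset=[("clip_c", "pan_left"), ("clip_d", "zoom_in")])
print(model.steps_seen)  # 4
```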

We want to thank our longtime friend of the show, Vanta, the leading trust management platform. Vanta, of course, automates your security reviews and compliance efforts. So frameworks like SOC 2, ISO 27001, GDPR, and HIPAA compliance and monitoring, Vanta takes care of these otherwise incredibly time and resource draining efforts for your organization and makes them fast and simple.

Yeah, Vanta is the perfect example of the quote that we talk about all the time here on Acquired. Jeff Bezos, his idea that a company should only focus on what actually makes your beer taste better, i.e. spend your time and resources only on what's actually going to move the needle for your product and your customers and outsource everything else that doesn't. Every company needs compliance and trust with their vendors and customers.

It plays a major role in enabling revenue because customers and partners demand it, but yet it adds zero flavor to your actual product. Vanta takes care of all of it for you. No more spreadsheets, no fragmented tools, no manual reviews to cobble together your security and compliance requirements. It is one single software pane of glass.

that connects to all of your services via APIs and eliminates countless hours of work for your organization. There are now AI capabilities to make this even more powerful, and they even integrate with over 300 external tools. Plus, they let customers build private integrations with their internal systems.

And perhaps most importantly, your security reviews are now real-time instead of static, so you can monitor and share with your customers and partners to give them added confidence. So whether you're a startup or a large enterprise, and your company is ready to automate compliance and streamline security reviews like

Vanta's 7,000 customers around the globe, and go back to making your beer taste better, head on over to vanta.com slash acquired and just tell them that Ben and David sent you. And thanks to friend of the show, Christina, Vanta's CEO, all Acquired listeners get $1,000 of free credit. Vanta.com slash acquired.

Okay, so before you get to the fine-tuning, you've created the model. How do you get the data to train models like these? So every model is different. And at Runway, we have around 30 different models. And so every model can and will be probably trained on a different dataset.

We have internal datasets that we use to train our models, but we also actually train models either from scratch or fine-tune models for our customers and mostly like enterprise users. And so think about your large media company, your large entertainment company, and you have a large volume of data sitting around you.

You can use the data to train a version of a video generation model that has a particular knowledge and understanding of your thing that no one else has.

And if I go back to the analogy of the camera, it's basically the equivalent of building your own customized version of a camera that only works in the type of environments and settings and presets that you need it to work in. You can also do that with particular datasets. Well, this is an amazing segue, and we'll get in here in a second to your customers and use cases for Runway. But based on what you're saying, you could do something like train a Runway camera on a movie.

So you could, like, have a movie with a certain visual style, train a Runway model on it, and create more video with that unique movie's visual style.

Exactly. You can prompt or fine-tune a model with a particular art direction or a particular style, right? So you can use that as a reference material. And remember, these models learn patterns. They don't copy existing data. So by learning the patterns, as a creator, they will allow you to iterate on those ideas or video samples faster and quicker than ever before. And that's something we're doing a lot with

filmmakers these days, helping them ingest their own films or own content or animations and using that to create this customized, very personalized system that now you can use in conjunction with other tools you're using these days.

Yeah, this is a great segue to the use cases. So anyone who's thought about this for five minutes can come up with, ooh, Hollywood movies. And then you start thinking a little deeper and you're maybe like, ooh, what about marketing videos? People can come up with many clever use cases from there. How do you decide where to aim your energy? And I'm curious what your different customer segments look like and where you've found the most fertile ground where AI can be helpful for video. Yeah.

I mean, for video, really, and that's why I go back to the camera analogy, it's a very flexible system. It's a general-purpose creative tool. And so really, the field of video generation and synthetic media will encompass everything from feature films to short films to series to internal HR videos that a company can make to small creations that someone can make on their phone. And that's actually a great representation of the type of content that you see these days.

If you just do the exercise of searching on social media for Runway, you'll see a combination of videos being created by people who might never have thought of themselves as filmmakers or creatives, and you also have professionals who've been working in the film industry for decades using Runway as well. And so it's a very flexible system. And our goal is not to try to build barriers, right?

or constraints around the usage of it. Because like a camera, it can be used for anything. It's a very expressive tool if you know how to use it. I think the one thing, though, it's interesting to recognize, and it happened before with other technologies as well, is that the first thing people try to do with it is to try to replicate the past. And so they try to use it literally as a camera.

And I think if you look at the history of the camera, the first thing people tried to do when they got their hands on a device that was able to capture light was to record theater plays. Because that was the form of art that people thought cameras were supposed to be used for, right? That's part of the experimentation phase of dealing with a new technology. You have to have some sort of grounding in something you know. So you go back to what you know, which in the case of the camera was

theater. But today there are a lot of things that you can do with video models that are perhaps similar to the things that you can do with a camera. And there are other things that a camera would never allow you to do. And we're just starting to scratch the surface of those things. So future-wise and customer-wise and focus-wise, we're really focusing on enabling those new types of creative endeavors to flourish.

And so the movie example is like someone shooting something very cinematically instead of on stage, or I've heard the analogy that, you know, we're not just going to put full-size newspapers on the web, or we're not just going to take desktop websites and put them on the smartphone. There's like a native app. What's that analogy for video with AI instead of reproducing the previous medium?

I think it's part of the collective creative effort to try to uncover those. I think our role is partially just making sure that we can build those models safely and put them in the hands of creatives to figure out those new narratives and those new expression mediums. One that I feel is particularly interesting, though, is this idea of not thinking about film as a singular narrative or as a baked piece of content. You think about any movie you've watched recently or any series you've watched,

someone, a team, a company, a director, a filmmaker, or an artist made that and rendered that. And rendering that means you've collectively defined what the piece of content is. And then the next stage is you need to distribute that to viewers. And so you go onto Netflix, onto YouTube, whatever distribution format you have these days.

The interesting thing is that with generative models, you are going to be able to generate those pixels. Perhaps there's no rendering moment anymore, because you might be generating those pixels as they're being watched or being consumed, which means that the types of stories that you can build can be much more personalized or much more specific or much more nuanced to your audience and viewer. And it also can be variable and can change. So that might be the case. And I'm not saying this is going to be the case for every single piece of

content out there, but there might be a space where it looks way more like a video game than a film. And it's still a story. And maybe you're in that story as well, or you're generating that story. Those are things that you can't do today with traditional editing techniques or traditional cinema, because you're constrained technologically by what you can do with it. I'm reminded of the Neal Stephenson book, Diamond Age, not Snow Crash and VR, the other one. Was it called Ractives, I think? No, I haven't read that.

It was very different. There were actors who were kind of live playing with you, but the concept was a movie or a TV show. The popular form of consumption had become dynamic: it's playing out around you and you're a character in the ractive, I think, is what they call it. Advertisers are going to love this if this becomes a possibility. Like the ability to do one-to-one personalized marketing is crazy.

I think we're somehow living through that era of like personalization, like the Spotify algorithm is like a great example of exactly that. And it combines everything we're chatting right now, AI algorithms and data. And it works. It works so great that you forget that that's an AI system, like behind the scenes. That's ultimately the goal. Right. There are 2 billion unique Facebook news feeds, which is extremely different than 5 million people receiving a newspaper once a day and opening it up and all reading the same thing.

And that's great. That's great. On this topic, before we move on, what's your kind of most mind-blown moment that you've seen so far of something somebody's created with Runway? Oh, there are so many, I would say, creative moments. I think overall, more than one particular example, it's the feeling when you've thought really hard about every possible use case of a model, or a way of using the model,

And then you put this model into the hands of a very talented artist, and that person realizes and uses it in a way that you never thought of before. And I think that that's the adrenaline rush that as toolmakers we always try to find, which is you're making a guitar and you put that guitar into the hands of Jimi Hendrix.

trying to predict the talent that might emerge and the type of emotion and type of art that someone like that can make with an instrument or a tool like that. As toolmakers, it's just joy. We have a few moments in time where we've seen a few Jimi Hendrixes out there playing the guitar in ways that we never thought were possible.

Well, speaking of your Jimi Hendrixes, this was used as one of the tools to make Everything Everywhere All at Once, right? They used it as one of many AI tools to edit a few scenes in that movie. Yes. A small percentage of scenes.

Still pretty cool. Because that was very early. I remember the first time I read that that was the case, I felt like it was only in the last few weeks I had heard things like, you know, what if AI starts augmenting journalists? And what if AI starts augmenting filmmakers? And then you're like, oh, literally the movie that I just saw and has...

crazy visual effects, like very good, very clever, very inventive visual effects. By the way, with a five-person VFX team, not a 500-person VFX team, is already using AI tools as, of course, one of many tools in the workflow. But like, it's not future stuff. It's present. It's happening. That's a perfect example, I would say, of what's to come more, where like, you're not really realizing that a lot of things are already using AI in some sort.

Really, that movie, it's a beautiful movie. If you haven't watched it, I'd definitely recommend stopping and going to watch it. And it was just great, as you were saying, made by a small team of seven editors and VFX people. Extremely talented artists who used many tools, among them one of our tools, to automate and go through the process of building such a massive, visually intense movie. And I think that's, again, a taste of what's to come with regards to

really making sure you can execute VFX really fast. I think the real promise of generative models and the tools that we're building at Runway is to take the cost of creation down to nearly zero.

It shouldn't be a constraint how expensive your ideas are in terms of communicating them. The only thing that should matter is how good they are and how many times you can iterate on them, because every creative endeavor is just a feedback process of iteration. The faster you can iterate, the more things you can make. Like right now, everything is kind of waterfall. I'm going to read listeners the prompt that I put into Runway over the weekend when I was playing around with it.

Lens flare from a sunset while filming a pan shot of a vintage green 1970s Porsche 911 with Los Angeles in the background, super high gloss. I'm watching this shot and just thinking about the camera setup and the short window of perfect light; I would have like a five-minute window to do this shot, with a several-person film crew and very expensive equipment to rent. And here I can just keep iterating on it.

In Runway, you also do this staged level of fidelity, where first I get a still frame and I can choose, oh, sort of like this, and I think you probably use that as some sort of seed to build a shorter cut. And then from there I can do the more expensive thing of a full, high-resolution, longer scene. It is really mind-blowing that just me sitting here, literally on my free credits before I even

paid for a full account, I could do this versus a several-thousand-dollar, you-better-catch-this-in-five-minutes film shot. Yeah, that's a great encapsulation of this overall idea of really thinking about anything great as an equal-odds rule: the more you make, the better the stuff you'll eventually make. And so making and having a tool that allows you to do that work that you're referring to, like shooting something or creating a video with nothing but a word,

allows you to do it at scale. You can do it multiple times, you can do it super fast, you're not constrained to actually going and shooting that in the real world.

And so the best strategy is really to produce as much work as possible because eventually from that amount of work, something great would come out of it. And the best artists, regardless of the medium, are the artists who are always experimenting and creating a lot every single day. Picasso was painting every single day. The best filmmakers are shooting and thinking about cinema every single day. And sometimes...

It's hard and it's expensive because you don't have the tools and maybe you don't have the resources, but now you do. We want to talk all about, beyond just, you know, moviemaking, all the other use cases and applications here for Runway. But even just staying in that: the past decade, and we've talked about this so much on Acquired,

has converged all of commercial moviemaking into the most monolithic, non-iterative, expensive, gatekept, unoriginal state. 25 out of 25 top-grossing films are reused IP every year. Right. Marvel was one of the best acquisitions of all time and one of the most anti-democratizing forces in Hollywood over the last 20 years. Yeah.

And so you're coming at just the right time to liberate this. I think the best movies are yet to be made and the best stories are yet to be told. Like, we could consider the golden era of cinema as happening in one particular decade many years ago. I actually think that we're yet to see the true golden era of cinema. The moment more people are able to create, what used to be only the realm of small studios or small agencies or small

teams is now going to be feasible for anyone. We just released, and I mean just, because this happened like four days ago, a new version of our video generation model that has created some sort of insane wave of creativity and filmmaking. People are using it to create all sorts of interesting, fascinating short films,

And I was just chatting with a producer of a major production house, and I was showing him a clip someone made. You can search for it. It's called Commuters. I think I tweeted about it. Just robots in a subway car in New York. Incredibly well-crafted, very great cinematography, well-shot. It's 30 seconds. It's short.

If you look at that and you think about how long it took and how expensive you think it was, you might guess around a couple of weeks and a couple thousand dollars. But all of it was made by one person, really, in a couple of hours, just using Runway and a few other tools. And the results were just astonishing. I mean, it's just so good. And so that's, for me, where...

If you consider that we're still in the early, early stages of this technology, like, again, think about the camera. We're in the 1910s of the camera. It's a black-and-white camera, it kind of works, but there's a lot of work to be done.

We're going to get to a level of resolution and quality that will enable people to do really, really wild creative stuff. You're six or so years in at this point. Has the thesis changed at all on where you'll need to play in the value chain? Like, I could imagine thinking at first, oh, we want to create

models and we want researchers to use it and we want application developers to build on top of it. But now you have a full-blown application and you've had to build a lot of like real user experience for novices or people that aren't that well-versed in filmmaking. How has that evolved? There's a lot of, I would say, fundamental pieces of the technology that had to be built. Again, we were discussing like

the origins of PyTorch and TensorFlow, which are the frameworks that I guess nowadays every model is using. And those are just a couple of years old. And so if you want to deploy a video generation model into the wild and to millions of users, you need to have some proper infrastructure in place. And so,

from the very beginning at Runway, actually, we started building those underlying systems. And eventually, I would say that you get to a point where, for someone who's shooting a film or who's telling a story, models don't really matter. Like, no one cares, really. No one cares beyond the researchers themselves or the engineers themselves. Or at some point, if you're a technologist and you care about it, you will go deeper into it. But if you're a storyteller, you care about tools that are expressive and controllable, right?

And that's the only thing you care about, because that's the thing you want to use. It's like Shopify. Like, if you're a retailer, you don't really care how Shopify works. You're just like, let me take credit cards, you know? And every piece of major disruptive technology has gone through similar stages, where, like the internet at the beginning, everyone wanted to chat about routers and the internet highway or whatever you want to call it. And you had all these terms to refer to and understand this technology, and

really, nowadays, no one cares. You just open the website and it works. And if it doesn't work, you complain about it. It's somehow similar, where our goal is really not to obsess around the technology, because when you obsess around the technology, you don't find real problems to solve. And we state this as the company's vision:

We're a research-driven company. We've built some of the perhaps most important models, or we build really important models in the space. At the same time, our goal really is to move storytelling forward. We're a storytelling company. We're a company devoted to creativity. And the way to deliver that is like, well, we have to build it. We have to build everything from scratch. So we'll go back to the baseline, lowest level possible to make it happen. But it's always good to obsess around people and not technology.

It feels like a very sort of NVIDIA approach, right? Like, NVIDIA is, you know, my sense from the long series we did on them last year is they don't do what they do just to have the coolest technology. They do what they do

first so that people can make the coolest video games possible. And, you know, now so that this can happen, right? Exactly. I think the best companies are companies that are obsessed around customers and users and use cases rather than technology. I think a common misconception, since we're all so excited about technology, is

that you start obsessing around the technology itself and people, models and data sets, all these things dominate conversations nowadays, but few people are asking themselves like for whom and what, right? And I think we've always started the conversation from the other side, which is like, yeah, filmmakers, how do you make their process and their lives easier? So let's work backwards towards that.

Switching over to the business side of things, there are clearly lots of different use cases. And so how do you do pricing and packaging and go-to-market and customer segmentation when the tool is so versatile? It's hard. It's hard because this is a new field, and it's an ever-evolving field that's changing radically. And so the one thing I would say we've learned over time by building Runway, and building some sort of learnings and heuristics around that, is that

Over-optimizing for the wrong thing at the wrong time can be very costly.

And so making sure innovation is at the core of how you're thinking about the company and the research and the product needs to be front and center. And so monetization and value capture can really depend on the type of model and the type of output that you can make. And I think we're really early on that journey. Video quality and capacity and resolution will continue to improve. The cost of running these models will also go down. Right now, we haven't even entered the...

efficiency stages of the technology where things are going to get cheaper, faster, leaner. I think eventually you will get to a very similar like traditional SaaS model, which we already kind of have, but more optimized. I think there's a lot of optimization that has to be done to make these models really, really effective to be used, hopefully in a real-time basis very soon.

And do you find that the way that you build user experience and controls and the application workspace and support and all these things for enterprise-level customers, and I don't know who your enterprise customers are, are they like Hollywood filmmakers? Like, are there different versions of the product and the experience that you need to craft for different audiences? Totally. Again, I'll go back to the camera. The camera can be flexible enough to be used by a consumer, but you also have RED cameras, which are professional filmmaking cameras that have all these controls and systems and settings.

It's interesting because those controls and ways of manipulating and having the flexibility that you want in a creative tool, specifically for generative models, hasn't been invented. This technology and these things that we can do weren't there a couple of years ago. And so you really have to think about what we call primitives or metaphors to interact with the technology. If you think about panning and zooming, those are all concepts that we've come to understand so we can make sense of how to control a camera.

So you need to make the same metaphors or similar metaphors or distinct metaphors to try to control these models that don't work in a similar fashion, in a similar way. And so a lot of the times, a lot of the research that we do is on that kind of like aspects of understanding. I actually agree that one of the hardest things to do these days on, I would say, the field of AI is actually product building.

It's not building the models. It's not fine-tuning the models. Of course, that's challenging and it requires a lot of things. But ultimately, and I go back to the people problem and the goal of what you're trying to do is trying to find the best interface and the best product to solve a need using a model. That's where like the real challenges start to appear. We spoke about this a few episodes ago in our sort of original exploration of generative AI here on ACQ2 with Jake Saper at Emergence of like,

Yeah, the models and the technical abilities are critical, but you need the UIs and the workflows. In many ways, that's the much more scarce and harder thing to develop. It is. It's unknown territory. We haven't entered the full spectrum of what's possible with these models. And so there's so much to be explored around this:

how to use these models and how to, again, I'll go back to control. Control is just key. You need to have control in a creative tool. And so coming up with those metaphors is totally new. It's a whole new field of research and exploration that we haven't delved into before. If people are noodling on this and they're like, I want to get into this field, what do you think it takes on a founding team for a startup to build a successful paradigm shifting AI company? Well, that's a big question.

I can give you, I guess, a sense of what we think is required to work at Runway and how we decide who we hire. And I think there are a few things that we tend to look for a lot, and that I think are key for working in the space. One is humbleness: really not getting attached to your ideas and being willing and able to learn and question everything.

It's a field that's moving fast, and if you have a lot of preconceptions around how things should work because they've always worked like that,

you're gonna get dizzy very soon. Again, think about painters understanding the camera as a new paintbrush. And the thing is that it's not a new paintbrush, it's a new thing, and it requires you to think about it very differently. There are new challenges with it, new ways of using it, new art forms that emerge with it, new artists. The idea of a photographer or a filmmaker might not have even been conceivable for someone who was painting in the 1800s.

Similarly, now, you might come in with a lot of preconceptions around how creative tools should work and how creatives are actually working, without really questioning why. A good mindset there is to have a first-principles view of the world. Just go and ask yourself why, a lot. Like, are people really so deep into their video editing systems because they like them a lot, or is it because that's the only thing they know how to use, right?

I don't know if there are specifics, but there are traits that I think we tend to look for a lot when hiring people, which could also, I guess, be extrapolated to someone who wants to build in the space. And again, take this with a grain of salt: I would say having a first-principles kind of mindset is the first thing. The second is just humbleness and being able to learn a lot,

and not getting attached to ideas because the space is moving really fast. And I guess the last thing is just focus on people, focus on customers and focus on the goals of things they want to achieve. When you and your co-founders were first starting, did you have like a perfect skill set where, hey, one of us understands the plight of the video creator and another one of us is a pioneering researcher in foundational models? Or were there things that were like totally missing that you had to figure out along the way?

No, totally missing. I mean, the thing that's interesting is that now you can always tell the story backwards and like find all the ways of connecting the dots. There's this very influential book that we used to give everyone at Runway, which I think has really helped us shape our understanding of how to build teams and products and research. It's a book called Why Greatness Cannot Be Planned by Kenneth Stanley.

And it basically outlines that the best way to build great things is to kind of like start laying stepping stones. And every time you lay a stepping stone towards something, you can look around and new doors will open, new doors will be closed. Take one of those, take as many as you can, keep experimenting, move to the next one. And so really, I guess to your point, like the idea of even like foundational models 10 years ago wasn't even in the realm of what people thought. I mean, it wasn't a term.

And so a lot of the things that we thought you might need today, I see this idea of prompt engineering and prompt engineers being a job people want to hire for. Those things didn't exist just four months ago. And so it's not that you need those skills to assemble a team or a company. It's more of just having a mindset of understanding that those things are going to be possible someday. Yeah.

It's funny, as you were saying that I was like, that book and that name are ringing so many bells. And I just looked it up. I think this was one of Patrick's biggest episodes last year. The author, Kenneth, was on Invest Like the Best last year. And I remember listening to it and just being like, wow, that is a completely orthogonal way to think about building companies, growth mindset, a new way to think about it. I just love it. With that in mind of, like, you've got to be humble, you

daily here in this space. How are you thinking about business model right now? What is it right now? You mentioned you, you know, essentially SaaS business model, SaaS pricing. You know, as I think back about this, every time there's been a major revolution in technology in the

video filmmaking space, there's been a business model revolution, right? Like, if I think back to, you know, Kodak, the business model was, yeah, sell cameras, but you make a lot of money selling and developing film. Like, that was the primary business model. And then you think about digital photography and you're like, oh, well, you know, Apple made all the money there. You just sell a device that includes this technology. What do you think it looks like here? I think it's too early to tell

its final shape or form, or to be able to categorize it so definitively into something. I think the interesting insights are that customization matters a lot, because, again, control matters a lot. And so fine-tuning is going to be really relevant for large customers and enterprises.

At the same time, I think distribution opens new possibilities for consumption, and so business models based not on the creation but on the consumption side of things. Again, if we go back to thinking about film and video as a game, or as more closely related to perhaps the space of a video game, then you have many more

options and business models that can be built around that as well. Right, that makes sense. Because in a scenario like that, the cost actually accrues at time of consumption, right? Not at time of creation. Exactly. And so the creation components might be different, because you might charge people differently or value might accrue differently, but also the consumption might be different as well. The compute is happening at the creation time. Yeah.

And compute optimizations will get better over time. And so right now it's a big constraint, but over time I don't think it will be as much of one. Which actually, if you step back and think about it, if that is the way this space evolves, I think that's going to be a successively much better business model than Apple's. Because if you think about how

Apple monetizes. I'm using Apple writ large to put RED cameras in there too. Device makers. Anybody who's selling devices. They monetize a fairly large amount up front. But then...

All of the consumption of, you know, photos and videos taken on Apple devices, you know, that's a trillion plus dollar economy across social media, everything. Apple doesn't monetize any, you know, variable rate with that. Yeah. I mean, yeah, okay, sure. But like, no, like the monetization happens on Instagram, on Snapchat, on TikTok, et cetera. And software makers have figured it out. Like Adobe makes a bunch of money every month by people consuming their software. Right.

Right. But if you could monetize on a variable rate basis with consumption, that's just a way, way, way bigger opportunity.

Yeah, and I think that, again, we're not there yet technologically, but I think we will be. And I think it's interesting to explore those. I think the most interesting technologies create their own markets and create their own business models. I think this is the case where new business opportunities and new business models will be born out of it. I think we're already seeing initial behaviors that will make that the case.

Yeah. And obviously it's too early to tell, but like, I'm sure Tim Cook would in a heartbeat trade Apple's current business model for half a cent on every Instagram view, you know, or every TikTok view out there. Maybe. I'm not sure Tim Cook would trade Apple's business model for much. Fair enough. But the world may go in that direction. Yeah.

Yeah. Chris, I'm curious. This is going to be a little bit of a finance-y question. And let's take it away from Runway and talk about companies in general that make foundational models and productize them to sell them to customers.

So for a company like that, comparing it against like a SaaS company, do you think 10 years from now the income statement between a 2018 SaaS company versus a 10 years from now AI company, do they look the same or are the margins actually different because even if inference cost goes to zero, there's still large training costs required on an ongoing basis?

Yeah, and I think comparing research companies and research labs with traditional SaaS is perhaps not the fairest comparison. And again, first of all, I think every company is different. So every company can operate differently, can pursue different strategies to try to capture or create new markets, or to compete. And so not all companies and not all research labs might have a similar or even the same strategy, I would say. So I think, first of all, with everything in life, it really depends. But overall, I would say that

perhaps a better comparison is to think about research labs and companies building foundational models more like a biotech company, where there's intense capital that needs to be put up front to do the research, to get to where you need to go. And then there's a lot of commercialization and product that can be built on top of it. And more importantly, a know-how of how to do it the next time and the next time and the next time, and also the infrastructure to do it multiple times and at scale.

The margins and the ways of, I would say, thinking about the investment here and the long-term value capture come more from the upfront investment that you will have to make to train a model like Gen 2 and Gen 1, and then starting to commercialize that afterwards as well.

Certainly right now, the amount of money being raised by AI companies is very large. And my assumption has been that's largely because of training costs, that it's just the compute to build these types of companies is just much more expensive. At least right now in history, there really shouldn't be anything else about the company that's much more expensive, right? The go-to-market's the same, the talent maybe a little bit more expensive, but not... No, the talent is definitely more expensive. I mean, there's...

There are only a few people in the world that are able to do the baseline. I mean, a few. There's definitely a lot, but it's not a crowded market. Right. You're not hiring iOS engineers here. Exactly. And so research really matters, and that talent is expensive. That's one thing. I wouldn't say that's the main thing, but you definitely consider that. And you won't see that in other SaaS businesses where you can assemble an amazing business with great engineering full-stack folks,

Research is different, and it takes a type of talent that's becoming more and more common, but it's still rare. And so that's an expensive part of it. But for sure, compute matters a lot. And I would say long-term commitments on compute also matter a lot. It's a scarce resource, and so you need to make sure that you get your hands on those resources if you want to do the kind of work that you want to do. There's an element, like on our Nike episode, we talked about how...

really Nike and, you know, Adidas and the other scale players, they lock up the world's footwear manufacturing capacity for years at a time. And nobody else can produce at that scale. NVIDIA and Apple with TSMC, same thing. Yeah. And there's the same element happening here with compute. There is. And so being able to just

compete there, it's a requirement. If not, nothing else might matter. That's a capital that you need to just get there. Are the hyperscalers sort of reserving that capacity for people with deep pockets? Or if you're a startup, can you sign a big long contract even though you haven't raised the money yet?

I'm not sure. I think every cloud provider might be trying to do something different, so I can't really speak to everyone. I think these days getting compute is hard. It's really hard. And just going to AWS and asking or getting other cloud providers and getting one GPU might be hard because there's a lot of demand and there's a lot of...

different like demands coming from different companies who are trying to get capacity up to speed to train and also run models in inference. Hopefully that will get solved. Yeah. I mean, I guess that kind of brings us to your capital structure and your most recent fundraise. For folks who don't know and didn't see that recently, you just raised $141 million led by Google, but also with NVIDIA in the round and your VCs and plenty of others. There's a very strategic element to that, I would imagine. Yeah.

There is. It's an honor to be able to work with some of the best companies in the world. I think that's, first of all, one of the main takeaways of being able to partner with companies like NVIDIA, Google, and Salesforce: to make sure that, again, we understand that this is not just about models. It's about getting this into people's hands and building great products that solve actual problems in the real world. So who better to partner with than some of the best companies in the world to actually do that? Is the way

the financing market is playing out right now, and tell me if this is directionally correct or not, that pure financial investors are just at a disadvantage because they offer commodity capital, whereas you can go raise a lot of capital from people who can provide access to the scarce resources in AI right now, namely compute,

where you could get your dollars from one or the other, but, you know, if you go with a corporate investor who actually has this sort of access, it's trajectory-changing for the company? It is. And I think the investor landscape is also being radically redrawn and reinvented. I think Nat Friedman has been kind of leading a great example there, building their own cluster of GPUs and offering that to their companies. I think that's

something rare to see and perhaps unimaginable just a couple of years ago. But it tells you a lot that value comes not just from capital; specifically after the last couple of years, where interest rates were zero and capital was actually just free and cheap,

there's more value that's required to build great companies. And if you can provide that by providing or giving infrastructure or doing more than just capital, of course, for companies that will be... It's so funny. All the VCs thought that AI might be coming for their jobs, making investment decisions. No, it turns out the AI disruption in VC is whether you have a GPU cluster or not. It turns out the platform teams we wanted all along were actually just GPUs. Yeah.

I love it. I love it. Totally fascinating.

Well, as we start drifting toward a close here, one question I do have for you is for people whose interest is piqued by this, and the answer can be technical answers or it can be more abstract answers. What are the canonical pieces of reading that people should go do if they want to set aside a weekend or a week or an hour and just try to get deeper on a high level understanding of where we're at today? On just the overall field of AI or in particular? Yeah, favorite pieces you've read about it.

My favorite piece, it's a hard one. I remember Karpathy wrote a blog post in 2015 called, I think, The Unreasonable Effectiveness of Recurrent Neural Networks, which is a kind of model not much in use these days. But I think it opened my eyes to why this would be possible. So it's a great, more historical piece that I often go back to. And then for the visual domain, which I can perhaps relate to more since we're building a lot of visual tools these days at Runway,

There's this piece by an artist called Kyle McDonald that speaks a lot about using early computer vision models and early, early, early models for video making and image making. I'll go back and read that a few times, because it always brings me a lot of interesting ideas and concepts around where things were just a couple of years ago. And again, coming back to the rate of change these days.

And then besides that, you know, these days there are so many things going on that it's hard to keep up. I think just Twitter is a great source of material these days, but don't really get attached to anything, again, because perhaps something I would recommend that you read last week might become obsolete next week. So it's hard to define.

We spoke a bit about your history and being at NYU Tisch. How did you find yourself at this intersection? Like, have you always been fascinated by both engineering and, you know, filmmaking and visual arts? Or did you start in one or the other? What's your journey been here?

My journey, and I think the journey of my co-founders as well, has always been very inspired by a combination of multiple things. I have a background in economics and I worked as a business consultant for some time. I did art and exhibited in major places. I've also worked as a software engineer and freelancer for some time. I think, more particularly, we're just very curious people. And we understand that the best trait to have

is just to be able to learn anything. And when you learn that you can learn anything, that's a superpower. And same with my co-founders. They're engineers and researchers turned into artists, and artists turned into engineers. And that, I think, gives you some perspective on how to build things that, I would say, break the mold, or break the systems that we might have established around what the AI world is and what the research world is and what the engineering world is. And you really start understanding that those are just arbitrary

silos and worlds that you can break apart if you know how to speak the languages of those. I feel like NYU's always had that as, or at least the Tisch program has always had that as kind of part of the ethos, right? Were you part of, is it ITP? What's that? Yeah, you're familiar with ITP. Yeah. Is that what you were a part of? I was, yes. I came to study at NYU at ITP and ITP is, it's a rare program, an intersection of like art and technology. What does it stand for?

It's one of those names that was given, like I would say, like 40 years ago. And I think perfectly encapsulates that moment of time and technology at that moment in time. It stands for Interactive Telecommunications Program.

Awesome. That's right. That's right. Which I love. It's so old school. Nowadays, you do more than telecommunications, which perhaps was the thing people were thinking about 40 years ago when Red Burns founded the program. The best way of thinking about ITP, and I think the ethos of Runway that we've drawn inspiration from, is that ITP is an art program for engineers, an engineering school for artists.

It's a frontier of the recently possible. So you can come and do things that are rare and weird and unique. And so thinking about computer vision and AI in 2015, and thinking about it in the realm of art, was rare and weird. Now it's all over the place, but I think it was the fact that we were able and willing to go into that that got us where we are right now. Super cool to come full circle. I remember...

Right after I graduated from college and lived in New York, I, of course, read Fred Wilson's blog every day at USV. And that's how I first learned about ITP, was him talking about it. Dennis Crowley, was he there? Yeah, Dennis Crowley from Foursquare. I think the initial ideas came out of a class that I think is still running, called

big games where you create live, like real games in the city. And I think Dennis created like a real size Pac-Man in New York. I remember reading about that. In fact, I remember hearing from a friend who was interested in that back when I was in high school, years before Foursquare came out, that someone from NYU had done this crazy real life Pac-Man. I remember the story now, Dennis. It was originally called Dodgeball.

And Google bought it. And then he was at Google for a few years. And then, you know, like many things at that time at Google, it went nowhere. And then he left and restarted it as Foursquare. Exactly. It's a great place. If you want to build and explore technology, ITP is a great place. So cool. Well, Chris, I know there is one other part of Runway that we haven't talked about yet, which is Runway Studios. And that's fun because it's just...

very cool art that people can go and check out. Tell us a little bit about what Runway Studios is and how people can view it. Runway Studios is, I would say, the entertainment division within Runway. And so we have Runway Research that pioneers the research models and the things that we need to make sure we can keep doing to push the boundaries of the field. And Runway Studios is the creative partner of filmmakers and musicians and artists that want to take these models and push them to the next level. And so we're

We've helped produce short films and music videos. We have an active call for grants that people can apply to, to get funding to make content and make videos and make short films and even make feature films with Runway. The best way to think about it is, like, it's Pixar. It's a new type of department or company within Runway that's really pushing the boundaries of storytelling from the creative side of things, not just from the technological, technical side of things.

Pixar is an amazing analogy, because obviously the films were to showcase RenderMan and the Pixar computer. Pixar was a hardware company. I'm a huge fan, of course, of everything Pixar-wise. But I think that the key lesson for me there is that when you are able to merge art and science, great things happen. Yeah.

Love it. Well, where can people reach out to you or Runway if they're interested in being a customer, playing around with the tools, or working at Runway, or working with you guys in any way? Yeah, I mean, we're hiring across the spectrum. So if you're interested in working with us, just go to runwayml.com slash careers. You can also find me on Twitter, and the office we have is in New York, in Tribeca. We spend most of the time here. The team is based here. And for Runway Research and Studios, just search Runway Research and Runway Studios and you'll probably find the right links for that.

Awesome. Well, Chris, thanks so much. Cool. Thank you guys. Thanks, Chris.