NVIDIA's Jensen Huang on AI Chip Design, Scaling Data Centers, and His 10-Year Bets: AI transcript and summary - episode of podcast No Priors: Artificial Intelligence | Technology | Startups
Episode: NVIDIA's Jensen Huang on AI Chip Design, Scaling Data Centers, and his 10-Year Bets
Author: Conviction
Duration: 00:36:53
Episode Shownotes
In this week's episode of No Priors, Sarah and Elad sit down with Jensen Huang, CEO of NVIDIA, for the second time to reflect on the company's extraordinary growth over the past year. Jensen discusses AI's takeover of data centers and NVIDIA's rapid development of xAI's supercluster. The conversation also covers NVIDIA's decade-long infrastructure bets, software longevity, and innovations like NVLink. Jensen shares his views on the future of embodied AI, digital employees, and how AI is transforming scientific discovery.
Sign up for new podcasts every week. Email feedback to [email protected]
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Nvidia
Show Notes:
00:00 Introduction
1:22 NVIDIA's 10-year bets
2:28 Outpacing Moore's Law
3:42 Data centers and NVLink
7:16 Infrastructure flexibility for large-scale training and inference
10:40 Building and optimizing data centers
13:30 Maintaining software and architecture compatibility
15:00 xAI's supercluster
18:55 Challenges of super-scaling data centers
20:39 AI's role in chip design
22:23 NVIDIA's market cap surge and company evolution
27:03 Embodied AI
28:33 AI employees
31:25 Impact of AI on science and engineering
35:40 Jensen's personal use of AI tools
Full Transcript
00:00:05 Speaker_00
Hi, listeners, and welcome to No Priors. Today, we're here again, one year since our last discussion with the one and only Jensen Huang, founder and CEO of NVIDIA.
00:00:15 Speaker_00
Today, NVIDIA's market cap is over $3 trillion, and it's the one literally holding all the chips in the AI revolution.
00:00:21 Speaker_00
We're excited to hang out in NVIDIA's headquarters and talk all things frontier models and data center scale computing and the bets NVIDIA is taking on a 10-year basis. Welcome back, Jensen.
00:00:32 Speaker_01
Thirty years into NVIDIA and looking 10 years out, what are the big bets you think are still to make? Is it all about scale up from here?
00:00:39 Speaker_01
Are we running into limitations in terms of how we can squeeze more compute memory out of the architectures we have? What are you focused on?
00:00:47 Speaker_03
Well, if we take a step back and think about what we've done, we went from coding to machine learning.
00:00:55 Speaker_03
from writing software tools to creating AIs, and all of that running on CPUs that were designed for human coding to now running on GPUs designed for AI coding, basically machine learning. And so the world has changed.
00:01:14 Speaker_03
The way we do computing, the whole stack has changed. And as a result, the scale of the problems we could address has changed a lot, because if you could parallelize your software on one GPU,
00:01:26 Speaker_03
you've set the foundations to parallelize across a whole cluster or maybe across multiple clusters or multiple data centers.
00:01:34 Speaker_03
And so I think we've set ourselves up to be able to scale computing at a level and develop software at a level that nobody's ever imagined before. And so we're at the beginning of that.
00:01:46 Speaker_03
Over the next 10 years, our hope is that we could double or triple performance every year at scale, not at the chip level,
00:01:57 Speaker_03
but at scale, and to be able to therefore drive the cost down by a factor of two or three, drive the energy down by a factor of two or three, every single year.
00:02:05 Speaker_03
When you do that every single year, when you double or triple every year, in just a few years it adds up. And so it compounds really, really aggressively.
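As a rough sense of what that compounding means, here is a back-of-the-envelope sketch; the 2x and 3x rates are the targets Jensen states, everything else is illustrative:

```python
# Back-of-the-envelope compounding, illustrative numbers only: classic
# Moore's law (~2x every two years) versus the 2-3x per year "at scale"
# cadence described above.
years = 10
moore = 2 ** (years / 2)    # ~2x every couple of years -> ~32x
double = 2 ** years         # 2x every year             -> ~1,024x
triple = 3 ** years         # 3x every year             -> ~59,049x

print(f"Moore's law over {years} years: ~{moore:,.0f}x")
print(f"2x per year over {years} years: ~{double:,.0f}x")
print(f"3x per year over {years} years: ~{triple:,.0f}x")
```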
00:02:16 Speaker_03
And so I wouldn't be surprised if, you know, the way people think about Moore's Law being 2x every couple of years, we're going to be on some kind of hyper Moore's Law curve. I fully hope that we continue to do that.

Well, what do you think is the driver of making that happen even faster than Moore's Law? Because I know Moore's Law was sort of self-reflexive, right? It was something that he said, and then people kind of implemented it to make it happen.

You know, the two fundamental
00:02:41 Speaker_03
technical pillars. One of them was Dennard scaling, and the other one was Carver Mead's VLSI scaling. And both of those techniques were rigorous techniques, but those techniques have really run out of steam.
00:02:56 Speaker_03
And so now we need a new way of doing scaling.
00:03:00 Speaker_03
Obviously, the new way of doing scaling is all kinds of things associated with co-design: unless you can modify or change the algorithm to reflect the architecture of the system, and then change the system to reflect the architecture of the new software, and go back and forth,
00:03:19 Speaker_03
unless you can control both sides of it, you have no hope. But if you can control both sides of it, you can do things like move from FP64 to FP32 to BF16 to FP8 to FP4 to who knows what, right? And so I think that co-design is a very big part of that.
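To make the precision ladder concrete, here is a minimal illustration; the 70-billion-parameter model is a hypothetical stand-in, not anything NVIDIA-specific:

```python
# Illustrative only: weight-memory footprint of a hypothetical
# 70B-parameter model at each precision listed above. Halving the bits
# halves the bytes that have to be stored and moved, which is one of
# the levers co-design unlocks.
BITS = {"FP64": 64, "FP32": 32, "BF16": 16, "FP8": 8, "FP4": 4}
params = 70e9  # hypothetical parameter count

for fmt, bits in BITS.items():
    gb = params * bits / 8 / 1e9
    print(f"{fmt:>4}: {gb:7.1f} GB of weights")
```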
00:03:38 Speaker_03
The second part of it, we call it full stack. The second part of it is data center scale: unless you can treat the network as a compute fabric and push a lot of the work into the network, push a lot of the work into the fabric.
00:03:57 Speaker_03
And as a result, you're compressing, doing compression at very large scales. And so that's the reason why we bought Mellanox and started fusing InfiniBand and NVLink in such an aggressive way. And now look where NVLink is gonna go.
00:04:17 Speaker_03
The compute fabric is going to scale out what appears to be one incredible processor called a GPU. Now we're going to hundreds of GPUs that are going to be working together.
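One concrete instance of pushing work into the fabric is the all-reduce collective used to sum gradients across GPUs. Below is a toy, single-process simulation of the ring algorithm; real systems run this over NVLink or InfiniBand, sometimes with the reduction performed in the switch itself, and this sketch only illustrates the idea:

```python
# Toy simulation of a ring all-reduce: n "GPUs" each hold a vector split
# into n chunks; after a reduce-scatter and an all-gather, every rank
# holds the element-wise sum. In a real cluster the additions happen as
# data flows between GPUs (or inside the network switch), which is the
# sense in which the network becomes a compute fabric.

def ring_allreduce(data):
    n = len(data)  # data[rank][chunk] -> list of floats
    # Phase 1: reduce-scatter. At step s, rank r sends chunk (r - s) % n
    # to rank (r + 1) % n, which accumulates it.
    for s in range(n - 1):
        outgoing = [list(data[r][(r - s) % n]) for r in range(n)]
        for r in range(n):
            c, dst = (r - s) % n, (r + 1) % n
            data[dst][c] = [a + b for a, b in zip(data[dst][c], outgoing[r])]
    # Phase 2: all-gather. Each rank now owns the fully reduced chunk
    # (r + 1) % n and circulates it around the ring.
    for s in range(n - 1):
        outgoing = [list(data[r][(r + 1 - s) % n]) for r in range(n)]
        for r in range(n):
            c, dst = (r + 1 - s) % n, (r + 1) % n
            data[dst][c] = outgoing[r]
    return data

# 3 ranks, each holding a 3-chunk vector; every rank ends with the sum.
ranks = [[[1.0], [2.0], [3.0]],
         [[10.0], [20.0], [30.0]],
         [[100.0], [200.0], [300.0]]]
print(ring_allreduce(ranks))  # every rank: [[111.0], [222.0], [333.0]]
```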
00:04:30 Speaker_03
You know, most of these computing challenges that we're dealing with now, one of the most exciting ones, of course, is inference time scaling. It has to do with essentially generating tokens at incredibly low latency.
00:04:47 Speaker_03
Because you're self-reflecting, as you just mentioned. I mean, you're going to be doing tree search. You're going to be doing chain of thought. You're going to be doing probably some amount of simulation in your head.
00:04:57 Speaker_03
You're going to be reflecting on your own answers. Well, you're going to be prompting yourself and generating text silently and still respond hopefully in a second. Well, the only way to do that is if your latency is extremely low.
00:05:15 Speaker_03
Meanwhile, the data center is still about producing high throughput tokens because you still want to keep costs down, you want to keep the throughput high, you want to generate a return.
00:05:28 Speaker_03
And so these two fundamental things about a factory, low latency and high throughput, they're at odds with each other.
00:05:36 Speaker_03
In order for us to create something that is really great at both, we have to go invent something new, and NVLink is really our way of doing that. Now you have a virtual GPU that has an incredible amount of flops, because you need it for context.
00:05:51 Speaker_03
You need a huge amount of memory, working memory, and still have incredible bandwidth for token generation all at the same time.
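A toy cost model of that tension, with entirely made-up constants: batching more requests raises total token throughput, which is good for factory economics, but slows each individual stream, which is bad for an interactive one-second budget:

```python
# Toy latency/throughput model; every constant here is hypothetical.
PEAK_PER_STREAM = 200.0   # tok/s for a lone request
SATURATION = 32.0         # batch size where contention halves per-stream speed
RESPONSE_TOKENS = 500     # visible answer plus "silent" reasoning tokens

for batch in (1, 8, 32, 128, 512):
    per_stream = PEAK_PER_STREAM / (1.0 + batch / SATURATION)  # tok/s per user
    total = per_stream * batch                                 # factory output
    latency = RESPONSE_TOKENS / per_stream                     # seconds per answer
    print(f"batch={batch:3d}  total={total:7.0f} tok/s  latency={latency:5.1f} s")
```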
00:05:58 Speaker_02
I guess in parallel, you also have all the people building the models, actually also optimizing things pretty dramatically.
00:06:03 Speaker_02
Like David on my team pulled data where over the last 18 months or so, the cost of a million tokens going into a GPT-4 equivalent model has basically dropped 240x.
00:06:14 Speaker_02
And so there's just massive optimization and compression happening on that side as well.
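For scale, if that 240x drop were a smooth exponential over 18 months, the implied rate works out as follows; this is a back-of-the-envelope, not the underlying data:

```python
# Implied rate of a 240x price drop over 18 months, assuming a smooth
# exponential decline. Purely a back-of-the-envelope.
months, total_drop = 18, 240.0
monthly = total_drop ** (1 / months)  # ~1.36x cheaper every month
yearly = monthly ** 12                # ~39x cheaper every year
print(f"~{monthly:.2f}x per month, ~{yearly:.0f}x per year")
```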
00:06:19 Speaker_03
And that's just in our layer, just on the layer that we work on. You know, one of the things that we care a lot about, of course, is the ecosystem of our stack and the productivity of our software.
00:06:30 Speaker_03
You know, people forget that because you have the CUDA foundation, and that's a solid foundation, everything above it can change. If the foundation's changing underneath you, it's hard to build a building on top.
00:06:42 Speaker_03
It's hard to create anything interesting on top. And so CUDA made it possible for us to iterate so quickly, just in the last year. And then we just went back and benchmarked against when Llama first came out.
00:06:53 Speaker_03
We've improved the performance of Hopper by a factor of five, without the layer on top ever changing.
00:07:01 Speaker_03
Now, a factor of five in one year is impossible using traditional computing approaches, but with accelerated computing and this way of co-design, we're able to invent all kinds of new things, yeah.
00:07:16 Speaker_01
How much are your biggest customers thinking about the interchangeability of their infrastructure between large scale training and inference?
00:07:29 Speaker_03
Well, infrastructure is disaggregated these days. Sam was just telling me that he had decommissioned Volta just recently. They have Pascals, they have Amperes, all different configurations of Blackwell coming.
00:07:41 Speaker_03
Some of it is optimized for air cooling, some of it's optimized for liquid cooling. Your services are gonna have to take advantage of all of this.
00:07:49 Speaker_03
The advantage that NVIDIA has, of course, is that the infrastructure that you built today for training will just be wonderful for inference tomorrow.
00:07:59 Speaker_03
And most of ChatGPT, I believe, is inferenced on the same type of systems that it was trained on just recently. And so if you can train on it, you can inference on it.
00:08:07 Speaker_03
And so you're leaving a trail of infrastructure that you know is gonna be incredibly good at inference.
00:08:14 Speaker_03
And you have complete confidence that you can then take that return on the investment that you've had and put it into a new infrastructure to go scale with. You know you're going to leave behind something of use.
00:08:27 Speaker_03
And you know that NVIDIA and the rest of the ecosystem are going to be working on improving the algorithm so that the rest of your infrastructure improves by a factor of five in just a year. And so that motion will never change.
00:08:40 Speaker_03
And so the way that people will think about the infrastructure is: even though I built it for training today, it's gotta be great for training, and we know it's gonna be great for inference. Inference is going to be multi-scale.
00:08:53 Speaker_03
I mean, first of all, in order to distill smaller models, it's good to have a larger model to distill from. And so you're still going to create these incredible frontier models.
00:09:03 Speaker_03
They're going to be used for, of course, the groundbreaking work. You're going to use it for synthetic data generation. You're going to use the models, the big models, to teach smaller models and distill down to smaller models.
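As a sketch of that teacher-to-student distillation idea: the small model is trained to match the large model's softened output distribution. This is a generic, pure-Python illustration with invented logits, not NVIDIA's pipeline:

```python
import math

# Minimal sketch of knowledge distillation: a small "student" is trained
# to match the softened output distribution of a large "teacher".
# Illustrative only; real pipelines use a framework and real models.

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened targets.
    A higher temperature exposes the teacher's knowledge about which
    wrong answers are almost right."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(p_teacher, p_student))

# Hypothetical logits over a 4-token vocabulary:
teacher = [4.0, 1.5, 0.2, -1.0]
good_student = [3.8, 1.4, 0.1, -0.9]
bad_student = [0.0, 0.0, 3.0, 0.0]
print(distillation_loss(teacher, good_student))  # low: distributions match
print(distillation_loss(teacher, bad_student))   # high: student disagrees
```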
00:09:13 Speaker_03
And so there's a whole bunch of different things you could do, but in the end, you're gonna have giant models all the way down to little tiny models.
00:09:21 Speaker_03
The little tiny models are gonna be quite effective, you know, not as generalizable, but quite effective. And so, you know, they're gonna perform very specific stunts incredibly well, that one task.
00:09:32 Speaker_03
And we're gonna see superhuman tasks in one little tiny domain from a little tiny model, maybe, you know, it's not a small language model, but, you know, tiny language model, TLMs or, you know, whatever.
00:09:46 Speaker_03
Yeah, so I think we're going to see all kinds of sizes, we hope. Is that right? Just kind of like software today. I think in a lot of ways, artificial intelligence allows us to break new ground in how easy it is to create new applications.
00:10:00 Speaker_03
But everything about computing has largely remained the same. For example, the cost of maintaining software is extremely expensive. And once you build it, you would like it to run on as large an installed base as possible.
00:10:13 Speaker_03
You would like not to write the same software twice. I mean, a lot of people still feel the same way. You'd like to take your engineering and move it forward. And so to the extent that the architecture
00:10:25 Speaker_03
allows you to, on one hand, create software today that runs even better tomorrow with new hardware, that's great. Or software that you create tomorrow, AI that you create tomorrow, runs on a large installed base, that's great too.
00:10:37 Speaker_03
That way of thinking about software is not gonna change.
00:10:40 Speaker_01
NVIDIA has moved into larger and larger, let's say, like unit of support for customers. I think about it going from single chip to, you know, server to rack and NVL72. How do you think about that progression? Like, what's next?
00:10:53 Speaker_01
Like should NVIDIA do full data center?
00:10:56 Speaker_03
In fact, we build full data centers. That's the way we build everything: if you're developing software, you need the computer in its full manifestation. We don't build PowerPoint slides and ship the chips.
00:11:12 Speaker_03
We build the whole data center. And until we get the whole data center built up, how do you know the software works? Until you get the whole data center built up, how do you know
00:11:22 Speaker_03
your fabric works, and that all the efficiencies you expected are there? How do you know it's really going to work at that scale?
00:11:31 Speaker_03
And that's the reason why it's not unusual to see somebody's actual performance be dramatically lower than their peak performance as shown in PowerPoint slides. Computing is just not what it used to be.
00:11:49 Speaker_03
You know, I say that the new unit of computing is the data center. To us, that's what you have to deliver. That's what we build. And we build the whole thing like that.
00:11:58 Speaker_03
And then, for every single thing, every combination, air-cooled, x86, liquid-cooled, Grace, Ethernet, InfiniBand, NVLink, no NVLink, you know what I'm saying? We build every single configuration. We have five supercomputers in our company today.
00:12:12 Speaker_03
Next year, we're going to build easily five more. So if you're serious about software, you build your own computers. If you're serious about software, then you're going to build your whole computer, and we build it all at scale.
00:12:22 Speaker_03
This is the part that is really interesting. We build it at scale, and we build it vertically integrated. We optimize it full stack, and then we disaggregate everything, and we sell it in parts.
00:12:35 Speaker_03
That's the part that is completely utterly remarkable about what we do. The complexity of that is just insane. And the reason for that is we want to be able to graft our infrastructure into GCP, AWS, Azure, OCI.
00:12:51 Speaker_03
All of their control planes, security planes are all different. And all of the way they think about their cluster sizing, all different. But yet we make it possible for them to all accommodate NVIDIA's architecture so that CUDA could be everywhere.
00:13:05 Speaker_03
That's really, in the end, the singular thought, that we would like to have a computing platform that developers could use that's largely consistent.
00:13:15 Speaker_03
Modulo 10% here and there, because people's infrastructure is optimized slightly differently, everything they build will run everywhere.
00:13:25 Speaker_03
This is one of the principles of software that should never be given up, and we protect it quite dearly. It makes it possible for our software engineers to build once, run everywhere.
00:13:38 Speaker_03
That's because we recognize that the investment of software is the most expensive investment and it's easy to test. Look at the size of the whole hardware industry, and then look at the size of the world's industries.
00:13:50 Speaker_03
It's $100 trillion on top of this $1 trillion industry, and that tells you something. The software that you build, you basically maintain for as long as you shall live. We've never given up on a piece of software.
00:14:01 Speaker_03
The reason why CUDA is used is because I told everybody, we will maintain this for as long as we shall live, and we're serious. I just saw a review the other day of NVIDIA Shield, our Android TV. It's the best Android TV in the world.
00:14:18 Speaker_03
We shipped that seven years ago. It is still the number one Android TV for, you know, anybody who enjoys TV. And we just updated the software just this last week, and people wrote a news story about it.
00:14:31 Speaker_03
GeForce, we have 300 million gamers around the world. We've never stranded a single one of them. And so the fact that our architecture is compatible across all of these different areas makes it possible for us to do it.
00:14:42 Speaker_03
Otherwise, we would need software teams a hundred times the size of our company today, if not for this architectural compatibility. So we're very serious about that. And that translates to benefits for the developers.
00:14:56 Speaker_02
One impressive substantiation of that recently was how quickly you brought up a cluster for xAI. Do you want to talk about that? Because that was striking in terms of both the scale and the speed with which you did it.
00:15:06 Speaker_03
You know, a lot of that credit you've got to give to Elon. First of all, the decision to do something, to select the site, to bring cooling to it, power.
00:15:23 Speaker_03
And then decide to build this 100,000 GPU supercluster, which is the largest of its kind in one unit. And then working backwards, we started planning together the date that he was gonna stand everything up.
00:15:41 Speaker_03
And the date that he was gonna stand everything up was determined quite a few months ago.
00:15:49 Speaker_03
And so all of the components, all the OEMs, all the systems, all the software integration we did with their team, all the network simulation, we simulate all the network configurations. I mean, it's like we pre-staged everything as a digital twin.
00:16:04 Speaker_03
We pre-staged all of his supply chain. We pre-staged all of the wiring of the networking. We even set up a small version of it.
00:16:15 Speaker_03
a kind of a, you know, just a first instance of it, you know, ground truth, if you will, reference zero, you know, system zero, before everything else showed up.
00:16:25 Speaker_03
So by the time that everything showed up, everything was staged, all the practicing was done, all the simulations were done, and then, you know, the massive integration. Even then, the massive integration was a
00:16:37 Speaker_03
monument of gargantuan teams of humanity crawling over each other, wiring everything up 24/7. And within a few weeks, the cluster was up. I mean, it's really a testament to his willpower.
00:16:55 Speaker_03
and how he's able to think through mechanical things, electrical things, and overcome what is apparently, you know, extraordinary obstacles.
00:17:05 Speaker_03
I mean, what was done there is the first time that a computer of that large scale has ever been done at that speed.
00:17:13 Speaker_03
And it was all of our teams working, from the networking team, the compute team, the software team, the training team, and the infrastructure team, from the electrical engineers
00:17:22 Speaker_03
to the software engineers, all working together. Yeah, it's really quite a feat to watch.
00:17:27 Speaker_01
Was there a challenge that felt most likely to be blocking from an engineering perspective?
00:17:32 Speaker_03
Just the tonnage of electronics that had to come together. I mean, it'd probably be worth it just to measure it. I mean, it's, you know, tons and tons of equipment. It's just abnormal.
00:17:44 Speaker_03
You know, usually with a supercomputer system like that, you plan it for a couple of years. From the moment the first systems get delivered to the time that you've probably commissioned everything for some serious work, don't be surprised if it's a year.
00:18:00 Speaker_03
I mean, that happens all the time. It's not abnormal. Now, we couldn't afford to do that. So a few years ago, we created an initiative in our company called Data Center as a Product.
00:18:17 Speaker_03
We don't sell it as a product, but we have to treat it like it's a product. Everything about planning for it and then standing it up, optimizing it, tuning it, keeping it operational.
00:18:28 Speaker_03
The goal is that it should be kind of like opening up your beautiful new iPhone and you open it up and everything just kind of works. Now, of course, it's a miracle of technology making it like that, but we now have the skills to do that.
00:18:40 Speaker_03
And so if you're interested in a data center, you just have to give me a space and some power, some cooling, you know, and we'll help you set it up within, call it, 30 days. I mean, it's pretty extraordinary.
00:18:52 Speaker_01
That's wild. If you look ahead to 200,000, 500,000, a million GPUs in a supercluster, or whatever you call it at that point, what do you think is the biggest blocker? Capital? Energy? Supply in one area?
00:19:06 Speaker_03
Everything. Nothing about the scales that you just talked about is normal. Yeah. But nothing is impossible. There are no laws-of-physics limits, but everything is going to be hard. And of course, you know, is it worth it?
00:19:26 Speaker_03
Like you can't believe. You know, to get to something that we would recognize as a computer that's so easily and so ably able to do what we ask it to do, otherwise known as general intelligence of some kind.
00:19:45 Speaker_03
And even if we could argue about, is it really general intelligence? Just getting close to it is going to be a miracle. We know that. And so I think there are five or six endeavors to try to get there.
00:19:58 Speaker_03
I think, of course, OpenAI and Anthropic and X and, of course, Google and Meta and Microsoft. This frontier, the next couple of clicks up that mountain, are just so vital. Who doesn't want to be the first on that mountain?
00:20:22 Speaker_03
I think that the prize for reinventing intelligence altogether, it's too consequential not to attempt it. And so I think that there are no laws of physics. Everything is going to be hard.
00:20:36 Speaker_01
A year ago when we spoke together, we asked what applications you were most excited about that NVIDIA would serve next, in AI and otherwise.
00:20:46 Speaker_01
And you talked about how you let your most extreme customers sort of lead you there, and about some of the scientific applications. I think that's become like a much more mainstream view over the last year.
00:20:59 Speaker_01
Is it still like science and AI's application of science that most excites you?
00:21:05 Speaker_03
I love the fact that we have AI chip designers.
00:21:09 Speaker_01
here at NVIDIA.
00:21:09 Speaker_03
Yeah. I love that we have AI software engineers.
00:21:13 Speaker_01
How effective are AI chip designers today?
00:21:15 Speaker_03
Super good. We couldn't have built Hopper without it. And the reason for that is because they could explore a much larger space than we can. And because they have infinite time, they're running on a supercomputer.
00:21:26 Speaker_03
We have so little time using human engineers that we don't explore as much of the space as we should. And we also can't explore it combinatorially. I can't explore my space while including your exploration and your exploration.
00:21:41 Speaker_03
And so, you know, our chips are so large, it's not like it's designed as one chip. It's designed almost like a thousand chips. And we have to optimize each one of them kind of in isolation.
00:21:53 Speaker_03
You really want to optimize a lot of them together, co-design across modules, and optimize across a much larger space. Obviously, we're going to be able to find local maxima that are hidden behind local minima somewhere.
00:22:11 Speaker_03
And so clearly, we can find better answers. You can't do that without AI engineers. Just simply can't do it. We just don't have enough time.
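A toy picture of why that matters: a single greedy search, like an engineer with limited time, stalls on a local peak, while massive cheap exploration finds better ones. The quality function below is an invented stand-in, not a real chip-design metric:

```python
import math
import random

# Toy design-space search. quality() is an invented, bumpy objective that
# stands in for a chip-design metric with many local optima.
def quality(x):
    return math.sin(5 * x) + 0.5 * math.sin(17 * x) + 0.1 * x

def hill_climb(x, step=0.01, iters=2000):
    # Greedy local search: step to the best neighboring point until stuck.
    for _ in range(iters):
        best = max((x - step, x, x + step), key=quality)
        if best == x:
            break
        x = best
    return x

random.seed(0)
single = hill_climb(0.0)  # one time-limited, engineer-like search
swarm = max((hill_climb(random.uniform(0.0, 10.0)) for _ in range(10_000)),
            key=quality)  # cheap, massive machine exploration
print(f"single climb quality: {quality(single):.3f}")
print(f"10,000 restarts best: {quality(swarm):.3f}")
```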
00:22:20 Speaker_02
One other thing that's changed since we last spoke collectively, and I looked it up, at the time, NVIDIA's market cap was about $500 billion. It's now over $3 trillion.
00:22:30 Speaker_02
So over the last 18 months, you've added $2.5 trillion plus of market cap, which effectively is $100 billion plus a month, or two and a half Snowflakes, or, you know, a Stripe plus a little bit, or however you want to think about it. A country or two.
00:22:43 Speaker_02
A country or two. Obviously, a lot of things have stayed consistent in terms of focus on what you're building, etc. And, you know, walking through here earlier today, I felt the buzz like when I was at Google 15 years ago.
00:22:55 Speaker_02
You could kind of feel the energy of the company and the vibe of excitement. What has changed during that period, if anything?
00:23:00 Speaker_02
Or what is different in terms of either how NVIDIA functions or how you think about the world or the size of bets you can take or?
00:23:07 Speaker_03
Mm-hmm. Well, our company can't change as fast as the stock price. Let's just be clear about that. So in a lot of ways, we haven't changed that much. I think the thing to do is to take a step back and ask ourselves, what are we doing?
00:23:23 Speaker_03
I think that that's really the big observation, realization, awakening for companies and countries is what's actually happening. I think what we were talking about earlier, from our industry perspective, we reinvented computing.
00:23:40 Speaker_03
Now, it hasn't been reinvented for 60 years. That's how big of a deal it is.
00:23:46 Speaker_03
We've driven the marginal cost of computing down probably by a million x in the last 10 years, to the point that we just say, hey, let's just let the computer go exhaustively write the software. That's the big realization.
00:24:00 Speaker_03
And that in a lot of ways, we were kind of saying the same thing about chip design.
00:24:06 Speaker_03
We would love for the computer to go discover something about our chips that we otherwise couldn't have done ourselves, explore our chips and optimize it in a way that we couldn't do ourselves, in the way that we would love for a digital biology or any other field of science.
00:24:23 Speaker_03
And so I think people are starting to realize, one, we reinvented computing. But what does that mean, even? All of a sudden, we create this thing called intelligence. What happened to computing? Well, we went from data centers.
00:24:37 Speaker_03
Data centers are multi-tenant; they store our files. These new data centers we're creating are not data centers. They're not multi-tenant; they tend to be single-tenant. They're not storing any of our files.
00:24:47 Speaker_03
They're producing something: they're producing tokens. These tokens are reconstituted into what appears to be intelligence. Isn't that right? And intelligence of all different kinds. It could be articulation of robotic motion.
00:25:01 Speaker_03
It could be sequences of amino acids. It could be chemical chains. It could be all kinds of interesting things. So what are we really doing? We've created a new instrument, a new machinery that in a lot of ways is the noun of the adjective generative AI.
00:25:21 Speaker_03
Instead of generative AI, it's an AI factory. It's a factory that generates AI. And we're doing that at extremely large scale, and what people are starting to realize is, you know, maybe this is a new industry.
00:25:34 Speaker_03
It generates tokens, it generates numbers. But these numbers constitute something, in a way, that is fairly valuable. And what industry would benefit from it? Then you take a step back and you ask yourself again, what's going on in NVIDIA?
00:25:50 Speaker_03
On the one hand, we reinvented computing as we know it. And so there's a trillion dollars worth of infrastructure that needs to be modernized. That's just... one layer of it.
00:26:00 Speaker_03
The big layer of it is that this instrument we're building is not just for data centers, which we're modernizing; you're using it to produce some new commodity. And how big can this new commodity industry be?
00:26:15 Speaker_03
Hard to say, but it's probably worth trillions. And so that I think is kind of the, if you were to take a step back, you know, we don't build computers anymore, we build factories. And every country is going to need it.
00:26:28 Speaker_03
Every company is going to need it. Give me an example of a company or industry that says, you know what? We don't need to produce intelligence. We've got plenty of it. And so that's the big idea, I think. And that's kind of an abstracted industrial view.
00:26:44 Speaker_03
And, you know, someday people will realize that in a lot of ways, the semiconductor industry wasn't about building chips. It was about building the foundational fabric for society.
00:26:55 Speaker_03
And then all of a sudden everybody goes, Oh, I get it. You know, this is a big deal. It's not just about chips.
00:27:00 Speaker_01
How do you think about embodiment now?
00:27:04 Speaker_03
Well, the thing I'm super excited about is, in a lot of ways, we're close to artificial general intelligence, but we're also close to artificial general robotics. Tokens are tokens. I mean, the question is, can you tokenize it?
00:27:20 Speaker_03
Of course, tokenizing things is not easy, as you guys know.
00:27:24 Speaker_03
But if you were able to tokenize things, align it with large language models and other modalities, if I can generate a video that has Jensen reaching out to pick up the coffee cup, why can't I prompt a robot to generate the tokens to pick up the coffee cup, you know?
00:27:44 Speaker_03
And so intuitively you would think that the problem statement is rather similar for a computer. And so I think that we're that close. That's incredibly exciting.
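A minimal sketch of what tokenizing a motion could look like: continuous joint angles are binned into discrete token ids so the same next-token machinery applies. The binning scheme and ranges here are invented for illustration, in the spirit of published action-token approaches rather than any specific NVIDIA method:

```python
# Hypothetical action tokenizer: continuous robot commands (joint angles,
# in radians) are discretized into a small integer vocabulary so a
# transformer can predict motion tokens the way it predicts text tokens.
NUM_BINS = 256
LOW, HIGH = -3.14, 3.14  # assumed joint range

def action_to_tokens(angles):
    scale = (NUM_BINS - 1) / (HIGH - LOW)
    return [round((a - LOW) * scale) for a in angles]

def tokens_to_action(tokens):
    scale = (HIGH - LOW) / (NUM_BINS - 1)
    return [LOW + t * scale for t in tokens]

reach = [0.10, -1.25, 0.78]      # hypothetical 3-joint "reach" pose
tokens = action_to_tokens(reach)
print(tokens)                    # [132, 77, 159] -- just token ids
print(tokens_to_action(tokens))  # decodes back to ~[0.10, -1.25, 0.78]
```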
00:27:55 Speaker_03
Now, the two brownfield robotic systems, brownfield meaning that you don't have to change the environment for them, are self-driving cars, with digital chauffeurs, and humanoid robots, right?
00:28:09 Speaker_03
Between the cars and the humanoid robot, we could literally bring robotics to the world without changing the world, because we built the world for those two things.
00:28:18 Speaker_03
It's probably not a coincidence that Elon's focused on those two forms of robotics, because they're likely to have the largest potential scale. And so I think that's exciting. But the digital version of it is equally exciting.
00:28:33 Speaker_03
We're talking about digital, or AI, employees. There's no question we're going to have AI employees of all kinds, and our workforce will be some biological and some artificial intelligence, and we will prompt them in the same way. Isn't that right?
00:28:48 Speaker_03
Mostly I prompt my employees, provide them context, ask them to perform a mission. They go and recruit other team members. They come back and we're going back and forth. How is that going to be any different with digital and AI employees of all kinds?
00:29:05 Speaker_03
So we're going to have AI marketing people, AI designers, AI supply chain people, you know. And I'm hoping that NVIDIA is someday biologically bigger, but also from an artificial intelligence perspective, much, much bigger. That's our future company.
00:29:23 Speaker_01
If we came back and talked to you a year from now, what part of the company do you think would be most artificially intelligent?
00:29:31 Speaker_03
I'm hoping it's chip design.
00:29:33 Speaker_01
The most important part. That's right.
00:29:35 Speaker_03
Because I should start where it moves the needle most, and also where we can make the biggest impact. It's such an insanely hard problem. I work with Sassine at Synopsys and Anirudh at Cadence.
00:29:50 Speaker_03
I totally imagine them having Synopsys chip designers that I can rent. They know something about a particular module, their tool, and they've trained an AI to be incredibly good at it, and we'll just hire a whole bunch of them.
00:30:07 Speaker_03
Whenever we're in that phase of chip design, I might rent a million Synopsys engineers to come and help me out, and then go rent a million Cadence engineers to help me out.
00:30:19 Speaker_03
What an exciting future for them, that they have all these agents that sit on top of their tools platform, that use the tools platform and collaborate with other platforms. Christian will do that at SAP and Bill will do that at ServiceNow.
00:30:35 Speaker_03
People say that these SaaS platforms are going to be disrupted. I actually think the opposite.
00:30:40 Speaker_03
that they're sitting on a goldmine, that there's going to be this flourishing of agents that are specialized in Salesforce, in, you know, Salesforce's language, I think they call it Lightning, and SAP's is ABAP, and everybody's got their own language.
00:30:55 Speaker_03
Is that right? And we've got CUDA and we've got OpenUSD for Omniverse, and who's going to create an AI agent that's awesome at OpenUSD? We are, because nobody cares about it more than we do.
00:31:09 Speaker_03
I think in a lot of ways, these platforms are going to be flourishing with agents and we're going to introduce them to each other and they're going to collaborate and solve problems.
00:31:17 Speaker_01
You see a wealth of different people working in every domain in AI. What do you think is under-noticed or that you want more entrepreneurs or engineers or business people to go work on?
00:31:29 Speaker_03
Well, first of all, I think what is misunderstood, maybe underestimated, is the under-the-water activity, the under-the-surface activity, of groundbreaking science, from computer science to science and engineering, that is being affected by AI and machine learning.
00:31:58 Speaker_03
I think you just can't walk into a science department anywhere, a theoretical math department anywhere, where AI and machine learning and the type of work that we're talking about today isn't going to transform what they do tomorrow.
00:32:12 Speaker_03
If you take all of the engineers in the world, all of the scientists in the world, and you say that the way they're working today is early indication of the future, because obviously it is, then you're going to see a tidal wave
00:32:27 Speaker_03
of generative AI, a tidal wave of AI, a tidal wave of machine learning, change everything that we do in some short period of time.
00:32:37 Speaker_03
Now, remember, I saw the early indications of computer vision and the work with Alex and Ilya and Hinton in Toronto, and Yann LeCun, and of course, Andrew Ng here at Stanford.
00:32:55 Speaker_03
You know, I saw the early indications of it, and we were fortunate to have extrapolated from what was observed to be detecting cats into a profound change in computer science, in computing altogether. And that extrapolation was fortunate for us.
00:33:15 Speaker_03
And now, of course, we were so excited by it, so inspired by it, that we changed everything about how we did things. But that took how long?
00:33:26 Speaker_03
It took literally six years from observing that toy, AlexNet, which I think by today's standards would be considered a toy, to superhuman levels of capabilities in object recognition. Well, that was only a few years.
00:33:40 Speaker_03
What is happening right now, the groundswell in all of the fields of science, not one field of science left behind. I mean, just to be very clear.
00:33:48 Speaker_03
Everything from quantum computing to quantum chemistry, every field of science is involved in the approaches that we're talking about. And they've been at it for a couple, two, three years.
00:34:00 Speaker_03
If we give ourselves another couple, two, three years, the world's going to change. There's not going to be one paper, there's not gonna be one breakthrough in science, one breakthrough in engineering, where generative AI isn't at the foundation of it.
00:34:10 Speaker_03
I'm fairly certain of it. And so, you know, there are a lot of questions; every so often I hear about whether this is a fad. You just gotta go back to first principles and observe what is actually happening.
00:34:27 Speaker_03
The computing stack, the way we do computing has changed. If the way you write software has changed, I mean, that is pretty cool. Software is how humans encode knowledge. This is how we encode our algorithms. We encode it in a very different way now.
00:34:45 Speaker_03
That's gonna affect everything. Nothing else will ever be the same. And so I think I'm talking to the converted here, and we all see the same thing.
00:34:55 Speaker_03
And all the startups that you guys work with, and the scientists I work with, and the engineers I work with, nothing will be left behind. I mean, we're gonna take everybody with us.
00:35:05 Speaker_01
I think one of the most exciting things, coming from the computer science world and looking at all these other fields of science, is that I can go to a robotics conference now, a materials science conference, a biotech conference, and I'm like, oh, I understand this.
00:35:19 Speaker_01
You know, not at every level of the science, but in the driving of discovery, it is all the algorithms that are general.
00:35:25 Speaker_03
And there's some universal, unifying concepts. Yeah.
00:35:30 Speaker_01
Yeah. And I think that's like incredibly exciting when you see how effective it is in every domain.
00:35:35 Speaker_03
Yep. Absolutely. Yeah. And I'm so excited that I'm using it myself every day. You know, I don't know about you guys, but it's my tutor now. I mean, I don't learn anything without first going to an AI. Why learn the hard way?
00:35:53 Speaker_03
Just go directly to an AI. I just go directly to ChatGPT. Sometimes I use Perplexity, just depending on the formulation of my questions. I just start learning from there, and then you can always fork off and go deeper if you like.
00:36:07 Speaker_03
But holy cow, it's just incredible. And almost everything I know, I double check. Even though I know it to be a fact, what I consider to be ground truth, I'm the expert. I'll still go to AI and double check. Yeah, it's so great.
00:36:21 Speaker_03
Almost everything I do, I involve it. Yeah.
00:36:24 Speaker_01
I think it's a great note to stop on.
00:36:26 Speaker_03
Thanks so much for your time today. Yeah, I really enjoyed it. Nice to see you guys.
00:36:29 Speaker_01
Thanks, Jensen. Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week.
00:36:43 Speaker_01
And sign up for emails or find transcripts for every episode at no-priors.com.