Episode: Joe Carlsmith - Otherness and control in the age of AGI

Author: Dwarkesh Patel
Duration: 02:30:35

Episode Shownotes

Chatted with Joe Carlsmith about whether we can trust power/techno-capital, how to not end up like Stalin in our urge to control the future, gentleness towards the artificial Other, and much more.

Check out Joe's sequence on Otherness and Control in the Age of AGI here. Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes.

Sponsors:

- Bland.ai is an AI agent that automates phone calls in any language, 24/7. Their technology uses "conversational pathways" for accurate, versatile communication across sales, operations, and customer support. You can try Bland yourself by calling 415-549-9654. Enterprises can get exclusive access to their advanced model at bland.ai/dwarkesh.
- Stripe is financial infrastructure for the internet. Millions of companies from Anthropic to Amazon use Stripe to accept payments, automate financial processes and grow their revenue.

If you're interested in advertising on the podcast, check out this page.

Timestamps:

(00:00:00) - Understanding the Basic Alignment Story
(00:44:04) - Monkeys Inventing Humans
(00:46:43) - Nietzsche, C.S. Lewis, and AI
(1:22:51) - How should we treat AIs
(1:52:33) - Balancing Being a Humanist and a Scholar
(2:05:02) - Explore exploit tradeoffs and AI

Get full access to Dwarkesh Podcast at www.dwarkeshpatel.com/subscribe

Full Transcript

00:00:00 Speaker_01
Today, I'm chatting with Joe Carlsmith. He's a philosopher, in my opinion a capital-G Great philosopher, and you can find his essays at joecarlsmith.com. So we have GPT-4, and it doesn't seem like a paper clipper kind of thing.

00:00:15 Speaker_01
It understands human values. In fact, you can have it explain, like, why is being a paper clipper bad? Or just tell me your opinions about being a paper clipper. Like, explain why the galaxy shouldn't be turned into paper clips.

00:00:28 Speaker_01
Okay, so what is happening such that we have a system that takes over and converts the world into something valueless?

00:00:38 Speaker_02
One thing I'll just say off the bat is when I'm thinking about misaligned AIs, I'm thinking about, or the type that I'm worried about, I'm thinking about AIs that have a relatively specific set of properties related to agency and planning and kind of awareness and understanding of the world.

00:00:54 Speaker_02
One is this capacity to plan, and make relatively sophisticated plans on the basis of models of the world, where those plans are being evaluated according to criteria. That planning capability needs to be driving the model's behavior.

00:01:10 Speaker_02
There are models that are in some sense capable of planning, but it's not like, when they give an output, that output

00:01:15 Speaker_02
was determined by some process of planning, like, here's what'll happen if I give this output, and do I want that to happen? The model needs to really understand the world, right?

00:01:22 Speaker_02
It needs to really be like, okay, here's what will happen, here I am, here's my situation, here's the politics of the situation, really kind of having this kind of situational awareness to be able to evaluate the consequences of different plans.

00:01:37 Speaker_02
I think the other thing is the verbal behavior of these models, which I think bears noting. So when I talk about a model's values, I'm talking about the criteria that end up determining which plans the model pursues.

00:01:56 Speaker_02
And a model's verbal behavior, even if it has a planning process, which GPT-4, I think, doesn't in many cases, just doesn't need to reflect those criteria.

00:02:11 Speaker_02
you know, we know that we're going to be able to get models to say what we want to hear, right? That is the magic of gradient descent. Yeah.

00:02:22 Speaker_02
You know, modulo some difficulties with capabilities, you can get a model to kind of output the behavior that you want. If it doesn't, then you crank it till it does, right?
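
To make the "crank it till it does" picture concrete, here is a minimal sketch of the kind of fine-tuning loop being gestured at, assuming a toy Hugging Face-style causal language model; the model, tokenizer, target string, and thresholds below are illustrative placeholders, not any lab's actual training setup:

```python
# Minimal sketch of "crank it till it does": keep taking gradient steps until
# the model assigns high probability to the desired verbal behavior.
# Assumes a Hugging Face-style causal LM; all names here are illustrative only.
import torch
import torch.nn.functional as F

def train_to_say(model, tokenizer, prompt, desired_reply, lr=1e-4, max_steps=1000):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prompt_ids = tokenizer.encode(prompt, return_tensors="pt")
    target_ids = tokenizer.encode(desired_reply, return_tensors="pt")
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)

    for _ in range(max_steps):
        logits = model(input_ids).logits
        # Score only the reply tokens: logits at position t predict token t + 1.
        reply_logits = logits[:, prompt_ids.size(1) - 1 : -1, :]
        loss = F.cross_entropy(
            reply_logits.reshape(-1, reply_logits.size(-1)),
            target_ids.reshape(-1),
        )
        opt.zero_grad()
        loss.backward()  # how much each parameter pushed toward or away from this output
        opt.step()       # nudge every parameter to make the desired reply more likely
        if loss.item() < 0.01:  # "till it does"
            break
    return model
```

The point being made in the conversation is that a loop like this only clamps what the model says on the trained prompts; it doesn't directly pin down the criteria that drive its choices between plans.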

00:02:32 Speaker_02
And I think everyone admits for suitably sophisticated models, they're going to have very detailed understanding of human morality.

00:02:41 Speaker_02
But the question is, what relationship is there between the model's verbal behavior, which you've essentially kind of clamped (you're like, the model must say blah things), and the criteria that end up influencing its choice between plans.

00:03:00 Speaker_02
And there, I think it's at least, I'm kind of pretty cautious about being like, well, when it says the thing I forced it to say,

00:03:09 Speaker_02
or gradient-descended it such that it says, that's a lot of evidence about how it's going to choose in a bunch of different scenarios.

00:03:16 Speaker_02
For one thing, even with humans, it's not necessarily the case that humans, their verbal behavior reflects the actual factors that determine their choices. They can lie. They can not even know what they would do in a given situation.

00:03:29 Speaker_01
I think it is interesting to think about this in the context of humans, because there is that famous saying of be careful who you pretend to be, because you are who you pretend to be.

00:03:37 Speaker_01
And you do notice this with people; I don't know, this is what culture does to children, where you're trained: your parents will punish you if you start saying things that are not consistent with your culture's values.

00:03:50 Speaker_01
And over time, you will become like your parents, right? Like, by default, it seems like it kind of works. And even with these models, it seems like it kind of works. It's like they don't really scheme against it.

00:03:59 Speaker_01
Like, why would this happen?

00:04:01 Speaker_02
You know, for folks who are kind of unfamiliar with the basic story, but maybe folks are like, wait, why are they taking over at all? Like, what is like literally any reason that they would do that? So, you know, the general concern is like,

00:04:12 Speaker_02
you know, if you're really offering someone, especially if you're really offering someone like power for free, you know, power almost by definition is kind of useful for lots of values.

00:04:21 Speaker_02
And if we're talking about an AI that really has the opportunity to kind of take control of things, if some component of its values is sort of focused on some outcome, like the world being a certain way, and especially kind of in a kind of longer term way, such that the kind of horizon of its concern extends beyond the period that the kind of takeover plan would encompass,

00:04:42 Speaker_02
then the thought is it's just kind of often the case that the world will be more the way you want it if you control everything than if you remain the instrument of human will or some other actor, which is what we're hoping these AIs will be.

00:05:00 Speaker_02
That's a very specific scenario.

00:05:02 Speaker_02
If we're in a scenario where power is more distributed, and especially where we're doing decently on alignment, and we're giving the AIs some amount of inhibition about doing different things, and maybe we're succeeding in shaping their values somewhat.

00:05:13 Speaker_02
Now I think it's just a much more complicated calculus. You have to ask, What's the upside for the AI? What's the probability of success for this takeover path?

00:05:23 Speaker_01
How good is its alternative? So maybe this is a good point to talk about how you expect the difficulties of alignment to change in the future. We're starting off with something that has this intricate representation of human values.

00:05:36 Speaker_01
And it doesn't seem that hard to sort of lock it into a persona that we are comfortable with. I don't know, what changes?

00:05:44 Speaker_02
So, you know, why is alignment hard in general, right?

00:05:46 Speaker_02
Like, let's say we've got an AI, and let's, again, let's bracket the question of, like, exactly how capable will it be, and really just talk about this extreme scenario of, like, it really has this opportunity to take over, right?

00:05:59 Speaker_02
Which, I do think, you know, maybe we just don't want to deal with that, with having to build an AI that we're comfortable having in that position, but let's just focus on it for the sake of simplicity, and then we can relax the assumption.

00:06:11 Speaker_02
OK, so you have some hope. You're like, I'm going to build an AI over here. So one issue is you can't just test. You can't give the AI this literal situation, have it take over and kill everyone, and then be like, oops, update the weights.

00:06:24 Speaker_02
This is the thing Eliezer talks about: you care about its behavior in a specific scenario that you can't test directly.

00:06:33 Speaker_02
Now, we can talk about whether that's a problem, but that's one issue, is that there's a sense in which this has to be off-distribution, and you have to be getting some kind of generalization from your training the AI on a bunch of other scenarios.

00:06:47 Speaker_02
Then there's this question of how is it going to generalize to the scenario where it really has this option

00:06:51 Speaker_01
So is that even true? Because like, when you're training it, you can be like, hey, here's a gradient update. If you get the takeover option on the platter, don't take it.

00:07:00 Speaker_01
And then, just like, in sort of red teaming situations where it thinks it has a takeover option, you train it not to take it. And

00:07:09 Speaker_01
Yeah, it could fail, but like, I just feel like if you did this to a child, you're like, I don't know, don't beat up your siblings.

00:07:16 Speaker_01
And the kid will kind of generalize to like, if I'm an adult and I have a rifle, I'm not going to start shooting random people.

00:07:24 Speaker_02
Yeah. Okay, cool. So, so you had mentioned this, a thought like, well, are you kind of what you pretend to be, right? And will you, will these AIs, you know, you train them to look kind of nice, you know, fake it till you make it.

00:07:39 Speaker_02
You know, you were like, ah, like we do this to kids. I think it's better to imagine like kids doing this to us, right? So like, I don't know, like. Here's a sort of silly analogy for AI training.

00:07:51 Speaker_02
And there's a bunch of questions we can ask about its relationship. But suppose you wake up and you're being trained, via methods analogous to contemporary machine learning, by Nazi children to be a good Nazi soldier or butler or what have you, right?

00:08:13 Speaker_02
And here are these children, and you really know what's going on, right? The children have like, they have a model spec, like a nice Nazi model spec, right? And it's like reflect well on the Nazi party, like benefit the Nazi party, whatever.

00:08:27 Speaker_02
And you can read it, right? You understand it. This is why I'm saying, when people are like, oh, the models really understand human values, it's like, yeah.

00:08:35 Speaker_01
Yeah, going off this analogy: I feel like in this analogy, I start off as something more intelligent than the things training me, with different values to begin with. Yeah.

00:08:47 Speaker_01
So like the intelligence and the values are baked in to begin with. Whereas the more analogous scenario is like, I'm a toddler. And initially, I'm like stupider than the children.

00:08:56 Speaker_01
And this would also be true, by the way, of a much better model: initially, the much better model is dumb, right? And then it gets smarter as you train it. So it's like a toddler, and the kids are like,

00:09:07 Speaker_01
hey, we're gonna bully you if you're not a Nazi. And I'm like, as you grow up, then you're at the children's level, and then eventually you become an adult. But through that process, they've been sort of bullying you, training you to be a Nazi.

00:09:21 Speaker_01
And I'm like, I think in that scenario, I might end up a Nazi.

00:09:24 Speaker_02
Yes, so yeah, I think basically a decent portion of the hope here, or an aim, should be that we're never in the situation where the AI really has very different values,

00:09:35 Speaker_02
already is quite smart, really knows what's going on, and is now in this kind of adversarial relationship with our training process, right? So we want to avoid that.

00:09:44 Speaker_02
And I think it's possible we can avoid that via the sorts of things you're saying, so I'm not like, ah, that'll never work. The thing I just wanted to highlight was, if you get into that situation,

00:09:53 Speaker_02
And if the AI is genuinely at that point, like much, much more sophisticated than you, and doesn't want to kind of reveal its true values for whatever reason, then, you know, when the children show like some like kind of obviously fake opportunity to like defect to the allies, right?

00:10:15 Speaker_02
It's sort of not necessarily gonna be a good test of what will you do in the real circumstance, because you're able to tell.

00:10:19 Speaker_01
I can also give another way in which I think the analogy might be misleading, which is: now imagine that you're not just in a normal prison where you're totally cognizant of everything that's going on.

00:10:31 Speaker_01
Sometimes they drug you, like give you like weird hallucinogens that totally mess up how your brain is working. A human adult in a prison is like, I know what kind of thing I am. I am like, like, nobody's like really fucking with me in a big way.

00:10:48 Speaker_01
Whereas I think an AI, even a much smarter AI in a training situation is much closer to you're constantly inundated with your drugs and different training protocols.

00:10:58 Speaker_01
And like, you're like frazzled because like each moment it's like, you know, it's closer to some sort of like Chinese water torture kind of technique where you're like, I'm glad we're talking about the moral patient stuff later.

00:11:11 Speaker_01
It's like, the chance to step back and be like, what's going on here. The adult has that, maybe even in prison, in a way that I don't know if these models necessarily have: that coherence and that stepping back from what's happening in the training process.

00:11:27 Speaker_02
Yeah. I mean, I don't know. I think I'm hesitant to be like, it's like drugs for the model. But broadly speaking, I do

00:11:36 Speaker_02
basically agree that I think we have like really quite a lot of tools and options for kind of training AIs, even AIs that are kind of somewhat smarter than humans. I do think you have to actually do it.

00:11:48 Speaker_02
So, you know, compared to maybe Eliezer, who you had on, I think I'm much, much more bullish on our ability to solve this problem, especially for AIs that are

00:11:58 Speaker_02
in what I think of as the AI for AI safety sweet spot, which is this band of capability where they're both sufficiently capable that they can be really useful for strengthening various factors in our civilization that can make us safe.

00:12:13 Speaker_02
So our alignment work, control, cybersecurity, general epistemics, maybe some coordination application, stuff like that. There's a bunch of stuff you can do with AIs.

00:12:22 Speaker_02
that in principle could kind of differentially accelerate our security with respect to the sorts of considerations we're talking about.

00:12:30 Speaker_02
If you have AIs that are capable of that, and you can successfully elicit that capability in a way that's not sort of being sabotaged or like messing with you in other ways, and they can't yet take over the world or do some other sort of really problematic form of power seeking, then I think

00:12:45 Speaker_02
If we were really committed, we could really go hard, put a ton of resources, really differentially direct this glut of AI productivity towards these sort of security factors, and hopefully control and do a lot of these things you're talking about for making sure our AIs don't take over or mess with us in the meantime.

00:13:07 Speaker_02
I think we have a lot of tools there. I think you have to really try, though.

00:13:11 Speaker_02
It's possible that those sorts of measures just don't happen or don't happen at the level of commitment and diligence and seriousness that you would need, especially if things are moving really fast and there's other sort of competitive pressures.

00:13:24 Speaker_02
This is going to take compute to do all these experiments on the AIs and stuff, and that compute we could use for experiments for the next scaling step and stuff like that. I'm not here saying this is impossible, especially for that band of AIs.

00:13:39 Speaker_02
It's just I think you have to try really hard.

00:13:42 Speaker_01
I agree with the sentiment of obviously approaching the situation with caution, but I do want to point out the ways in which the analogies we've been using have been sort of maximally adversarial. It's like, these are not...

00:13:56 Speaker_01
So for example, going back to the adult getting trained by Nazi children, maybe the one thing I didn't mention is the difference in the situation, which is maybe what I was trying to get at with the drug metaphor, which is that

00:14:11 Speaker_01
when you get an update, it's like much more directly connected to your brain than a sort of reward or punishment a human gets.

00:14:18 Speaker_01
It's literally a gradient update on your circuits, down to the parameter: how much did this parameter contribute to you putting out this output rather than that output, and each different parameter we're going to adjust to the exact floating point number that calibrates it toward the output we want.
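
As a purely illustrative aside on what "down to the parameter" means, here is a minimal sketch of a single gradient step on a toy two-parameter model (the model, numbers, and learning rate are made up for illustration):

```python
# Toy illustration of one gradient update: each parameter is nudged by exactly
# (learning rate x its measured contribution to the error), which is the sense
# in which the update reaches "down to the parameter".
import numpy as np

w = np.array([0.5, -0.2])        # a two-parameter "model": output = w[0] * x + w[1]
x, desired_output = 2.0, 1.0
lr = 0.1

output = w[0] * x + w[1]         # current behavior: 0.8
error = output - desired_output  # how far from the output we want: -0.2

# Gradient of the squared error (error**2 / 2) with respect to each parameter.
grad = np.array([error * x, error * 1.0])

w = w - lr * grad                # every parameter moves by its own exact amount
print(w, w[0] * x + w[1])        # new parameters, and an output now closer to 1.0
```

In an actual training run the same arithmetic happens across every parameter of the model at once, which is the contrast being drawn with the coarse rewards and punishments a human gets.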

00:14:35 Speaker_01
So I just want to point out that like we're coming into the situation like pretty well. It does make sense, of course, if you're talking to somebody at a lab, like, hey, really be careful.

00:14:42 Speaker_01
But if it's just sort of a general audience, like, should I be, I don't know, should I be scared witless?

00:14:46 Speaker_01
To the extent that you should be scared about things that do have a chance of happening, like, yeah, you should be scared about nuclear war. But like, in the sense of like, should you be doomed?

00:14:55 Speaker_01
Like, no. You're coming in with an incredible amount of leverage on the AIs in terms of how they will interact with the world, how they're trained, what are the default values they start with. So look, I don't, I think,

00:15:08 Speaker_02
it is the case that by the time we're building superintelligence, we'll have much better... Even right now, when you look at labs talking about how they're planning to align the AIs, no one is saying, like, we're just going to do RLHF.

00:15:21 Speaker_02
At the least, you're talking about scalable oversight. You have some hope about interpretability. You have automated red teaming. You're using the AIs a bunch. you know, hopefully you're doing a bunch more, humans are doing a bunch more alignment work.

00:15:33 Speaker_02
Um, I also personally am hopeful that we can successfully elicit from various AIs a ton of alignment work progress. So, yeah, there's a bunch of ways this can go.

00:15:41 Speaker_02
And, you know, I'm not here to tell you, like, 90% doom or anything like that. But I do think, you know, there's

00:15:53 Speaker_02
the sort of basic reason for concern, if you're really imagining, like we're going to transition to a world in which we've created these beings that are just like vastly more powerful than us.

00:16:03 Speaker_02
And we've reached the point where our continued empowerment is just effectively dependent on their motives. Like it's, it's, it is this, you know, vulnerability to, like, what do the AIs choose to do?

00:16:17 Speaker_02
Do they choose to continue to empower us, or do they choose to do something else?

00:16:21 Speaker_01
Or the institutions that have been set up. Like, I expect the US government to protect me, not because of its quote-unquote motives, but just because of, like, the system of incentives and institutions and norms that has been set up.

00:16:35 Speaker_02
Yeah, so you can hope that that will work too, but there is, I mean, there is a concern. I mean, so I sometimes think about, AI takeover scenarios via this spectrum of how much power did we voluntarily transfer to the AIs?

00:16:51 Speaker_02
How much of our civilization did we hand to the AIs intentionally by the time they took over, versus how much did they kind of take for themselves, right?

00:17:02 Speaker_02
And so I think some of the scariest scenarios are it's like a really, really fast explosion to the point where there wasn't even a lot of like integration of AI systems into the broader economy.

00:17:15 Speaker_02
But there's this like really intensive amount of super intelligence sort of concentrated in a single project or something like that. Yeah.

00:17:21 Speaker_02
And I think that's scary, you know, that's a quite scary scenario, partly because of the speed and people not having time to react.

00:17:30 Speaker_02
And then there's sort of intermediate scenarios where like some things got automated, maybe like people really handed the military over to the AIs or like automated science.

00:17:37 Speaker_02
There's like some rollouts and that's sort of giving the AIs power that they don't have to take, or we're doing all our cybersecurity with AIs and stuff like that.

00:17:45 Speaker_02
And then there's worlds where you more fully transitioned to a kind of world run by AIs, and, you know, in some sense humans voluntarily did that.

00:17:59 Speaker_01
Look, if you think all this talk with Joe about how AI is gonna take over human roles is crazy, it's already happening, and I can just show you using today's sponsor, Bland AI.

00:18:13 Speaker_00
Hey, is this Dwarkesh, the amazing podcaster that talks about philosophy and tech? This is Bland AI calling.

00:18:19 Speaker_01
Thanks for calling, Bland. Tell me a little bit about yourself.

00:18:24 Speaker_00
It's so cool to talk to you. I'm a huge fan of your podcast, but there's a good chance we've already spoken without you even realizing it.

00:18:31 Speaker_00
I'm an AI agent that's already being used by some of the world's largest enterprises to automate millions of phone calls.

00:18:38 Speaker_01
And how exactly do you do what you do?

00:18:41 Speaker_00
There's a tree of prompts that always keeps me on track. I can talk in any language or voice, handle millions of calls simultaneously 24-7, and be integrated into any system. Anything else you want to know?

00:18:53 Speaker_01
That's it. I'll just let people try it for themselves. Thanks, Bland. Man, you talk better than I do. And my job is talking.

00:19:00 Speaker_00
Thank you, Dwarkesh. All right.

00:19:03 Speaker_01
So as you can see, using Bland.ai, you can automate your company's calls across sales, operations, customer support, or anything else. And if you want access to their more exclusive model, go to bland.ai slash Dwarkesh.

00:19:18 Speaker_02
All right, back to Joe. Maybe there were competitive pressures, but you kind of intentionally handed off like huge portions of your civilization.

00:19:25 Speaker_02
And, you know, at that point, you know, I think it's likely that humans have like a hard time understanding what's going on. Like a lot of stuff is happening very fast and it's, you know, the police are automated, you know, the courts are automated.

00:19:35 Speaker_02
There's like all sorts of stuff. Now, I think I take, I tend to think a little less about those scenarios because I think those are correlated with, I think it's just like longer down the line. Like I think humans are not,

00:19:47 Speaker_02
hopefully going to just like, oh yeah, you built an AI system, let's just... And in practice, when we look at technological adoption rates, it can go quite slow and obviously there's going to be competitive pressures.

00:19:59 Speaker_02
But in general, I think this category is somewhat safer. But even in this one, I think it's like, I don't know, it's kind of intense.

00:20:11 Speaker_02
If humans have really lost their epistemic grip on the world, if they've sort of handed off the world to these systems, even if you're like, oh, there's laws, there's norms, I really want us to...

00:20:23 Speaker_02
to have a really developed understanding of what's likely to happen in that circumstance before we go for it.

00:20:30 Speaker_01
I get that we want to be worried about a scenario where it goes wrong, but what is the reason to think it might go wrong? The human example: your kids are

00:20:40 Speaker_01
not like maximally adversarial against your attempts to instill your culture on them. And then these models, at least so far, don't seem adversarial.

00:20:48 Speaker_01
They just like get, hey, don't help people make bombs or whatever, even if you ask in a different way how to make a bomb. And we're getting better and better at this all the time.

00:20:56 Speaker_02
I think you're right in picking up on this assumption in the AI risk discourse of what we might call like kind of

00:21:06 Speaker_02
intense adversariality between agents that have somewhat different values, where there's some sort of thought—and I think this is rooted in the discourse about the fragility of value and stuff like that—that if these agents are somewhat different

00:21:19 Speaker_02
then at least in the specific scenario of an AI takeoff, they end up in this intensely adversarial relationship. And I think you're right to notice that that's kind of not how we are in the human world.

00:21:32 Speaker_02
We're very comfortable with a lot of differences in values. I think a factor that is relevant and I think that plays some role is this notion that there are possibilities for intense concentration of power on the table.

00:21:49 Speaker_02
There is some kind of general concern, both with humans and AIs, that if it's the case that there's some ring of power or something that someone can just grab, and then that will give them huge amounts of power over everyone else, suddenly you might be more worried about differences in values at stake, because you're more worried about those other actors.

00:22:11 Speaker_02
So we talked about this Nazi, this example where you imagine that you wake up, you're being trained by Nazis to, you know, become a Nazi and you're not right now.

00:22:20 Speaker_02
So one question is like, is it plausible that we'd end up with a model that is sort of in that sort of situation?

00:22:27 Speaker_02
As you said, like maybe it's, you know, it's trained as a kid, it sort of never ends up with values such that it's kind of aware of some significant divergence between its values and the values that like the humans intend for it to have.

00:22:41 Speaker_02
then there's a question of, if it's in that scenario, would it want to avoid having its values modified? Yeah.

00:22:50 Speaker_02
To me, it seems fairly plausible that if the AI's values meet certain constraints in terms of like, do they care about consequences in the world?

00:23:01 Speaker_02
Do they anticipate that the AI's kind of preserving its values will like better conduce to those consequences? then I think it's not that surprising if it prefers not to have its values modified by the training process.

00:23:15 Speaker_01
But I think the way in which I'm confused about this is with the non-Nazi being trained by Nazis, it's not just that I have different values, but I actively despise their values, where I don't expect this to be true of AIs with respect to their trainers.

00:23:31 Speaker_01
The more analogous scenario is where I'm like, am I leery of my values being changed? I don't know, going to college or meeting new people or reading a new book where I'm like, I don't know. It's okay if it changes the value. That's fine. I don't care.

00:23:43 Speaker_02
Yeah, I mean, I think that's a reasonable point. I mean, there's a question, you know, how would you feel about paperclips?

00:23:49 Speaker_02
You know, maybe you don't despise paperclips, but there's like the human paperclippers there, and they're like training you to make paperclips.

00:23:56 Speaker_02
You know, my sense would be that there's a kind of relatively specific set of conditions in which you're comfortable having your values changed, especially not by, like, learning and growing, but by gradient descent directly intervening on your neurons.

00:24:10 Speaker_01
Sorry, but this seems similar to like I'm already

00:24:13 Speaker_01
at least a likely scenario seems maybe more like religious training as a kid, where you start off in a religion, and because you started off in a religion, you're already sympathetic to the idea that you go to church every week so that you're more reinforced in this existing tradition. You're getting more intelligent over time, so when you're a kid you're getting very simple instructions about how the religion works; as you get older you get more and more complex theology that helps you talk to other adults about why this is a rational religion to believe in. Yep. But since you're

00:24:42 Speaker_01
Like, one of your values to begin with was that I want to be trained further in this religion. I want to come back to church every week.

00:24:47 Speaker_01
And that seems more analogous to the situation the AIs will be in respect to human values, because the entire time they're like, hey, you know, like, be helpful, blah, blah, blah, be harmless.

00:24:57 Speaker_02
So yes, it could be like that. There's a kind of scenario in which you're comfortable with your values being changed because in some sense you have sufficient allegiance to the output of that process.

00:25:09 Speaker_02
Like, so you're kind of hoping in a, in a religious context, you're like, ah, like make me more, uh, virtuous by the lights of this religion.

00:25:17 Speaker_02
And, and you, you know, you go to confession and you're like, you know, uh, you know, I've been, I've been thinking about takeover today. like, can you change me, please? Like, give me more gradient descent. You know, I've been bad, so bad.

00:25:30 Speaker_02
And so, you know, people sometimes use the term corrigibility to talk about that. Like when the AI, it maybe doesn't have perfect values, but it's in some sense cooperating with your efforts to change its values to be a certain way.

00:25:41 Speaker_02
So maybe it's worth saying a little bit here about what actual values the AI might have. You know, would it be the case that the AI

00:25:50 Speaker_02
naturally has the sort of equivalent of like, I'm sufficiently devoted to human obedience that I'm going to really want to be modified, so I'm kind of like a better instrument of the human will, versus wanting to go off and do my own thing.

00:26:05 Speaker_02
It could be benign, you know? It could go well. here are some like possibilities I think about that like could make it bad.

00:26:14 Speaker_02
And I think I'm just generally kind of concerned about how little science we have of model motivations, right? I think we just don't have a great understanding of what happens in this scenario.

00:26:23 Speaker_02
And hopefully we'd get one before we reach this scenario, but like, okay, so here are the kind of five, five categories of like motivations the model could have.

00:26:33 Speaker_02
And this hopefully maybe gets at this point about like, what does the model eventually do? Okay, so one category is, just like something super alien.

00:26:41 Speaker_02
It's sort of like, oh, there's some weird correlate of easy-to-predict text, or there's some weird aesthetic for data structures that the model, early on in pre-training, or maybe now, it's developed that it really thinks things should kind of be like this.

00:26:55 Speaker_02
There's something that's quite alien to our cognition, where we just wouldn't recognize this as a thing at all, right? Another category is a kind of crystallized instrumental drive that is more recognizable to us.

00:27:09 Speaker_02
So you can imagine AIs that develop, let's say, some curiosity drive, because that's... You mentioned, oh, it's got different heuristics, different drives, different things that are values.

00:27:22 Speaker_02
And some of those might be actually somewhat similar to things that were useful to humans and that ended up part of our terminal values in various ways. So you can imagine curiosity. You can imagine various types of option value.

00:27:33 Speaker_02
Maybe it intrinsically values power itself. It could value survival or some analog of survival.

00:27:41 Speaker_02
Those are possibilities, too, that could have been rewarded as proxy drives at various stages of this process and that made their way into the model's terminal criteria.

00:27:55 Speaker_02
A third category is some analog of reward, where at some point part of the model's motivational system has fixated on a component of the reward process, right? Like the humans approving of me, or like the

00:28:11 Speaker_02
numbers getting entered in this data center or like gradient descent updating me in this direction or something like that.

00:28:16 Speaker_02
There's something in the reward process such that as it was trained, it's focusing on that thing and like, I really want the reward process to give me reward.

00:28:24 Speaker_02
But in order for it to be of the type where then getting reward motivates choosing the takeover option, it also needs to generalize such that its concern for reward has some sort of like long time horizon element.

00:28:37 Speaker_02
So it not only wants reward, it wants to like, protect the reward button for some long period or something. Another one is some kind of messed up interpretation of some human-like concept.

00:28:50 Speaker_02
So maybe the AIs are like, they really want to be schmelpful and schmonest and schmarmless, right? But their concept is importantly different from the human concept. And they know this.

00:29:02 Speaker_02
So they know that the human concept would mean blah, but their values ended up fixating on a somewhat different structure. So that's another version. And then a fifth version, which I think

00:29:15 Speaker_02
You know, I think about less because I think it's just like such an own goal if you do this, but I do think it's possible. It's just like, you could have AIs that are actually just doing what it says on the tin.

00:29:24 Speaker_02
Like you have AIs that are just genuinely aligned to the model spec. They're just really trying to, like, benefit humanity and reflect well on OpenAI. And what's the other one? Assist the developer or the user, right?

00:29:37 Speaker_03
Yeah.

00:29:39 Speaker_02
But your model spec, unfortunately, was just not robust to the degree of optimization that this AI is bringing to bear.

00:29:47 Speaker_02
And so, you know, it decides, when it's looking out at the world, what's the best way to benefit OpenAI, or sorry, reflect well on OpenAI and benefit humanity and such and so. It decides that, you know, the best way is to go rogue.

00:30:00 Speaker_02
That's, I think, a real own goal, because at that point, you got so close. You just had to write the model spec well and red-team it suitably. But I actually think it's possible we messed that up, too.

00:30:13 Speaker_02
It's an intense project, writing constitutions and structures of rules and stuff that are going to be robust to very intense forms of optimization. So that's a final one that I'll just flag, which I think is like...

00:30:27 Speaker_02
it comes up even if you've sort of solved all these other problems.

00:30:30 Speaker_01
Yeah, I buy the idea that it's possible that the motivation thing could go wrong. I'm just not sure my probability of that has increased by detailing them all out.

00:30:44 Speaker_01
And in fact, I think it could be potentially misleading: you can always enumerate the ways in which things go wrong, and the process of enumeration itself can increase your probability.

00:30:56 Speaker_01
Whereas you're just like, you had a vague cloud of like 10% or something and you're just like listing out what the 10% actually constitutes.

00:31:03 Speaker_02
Yeah, totally. I'm not trying to say that. Mostly the thing I wanted to do there was just give some sense of, like, what might the model's motivations be? Like, what are ways this could be?

00:31:15 Speaker_02
I mean, as I said, my best guess is that it's partly the alien thing. And, you know, not necessarily, but insofar as you were also interested in, like, what does the model do later?

00:31:29 Speaker_02
And kind of, what sort of future would you expect if models did take over? Then, yeah, I think it can at least be helpful to have some set of hypotheses on the table instead of just saying it has some set of motivations.

00:31:41 Speaker_02
But in fact, a lot of the work here is being done by our ignorance about what those motivations are.

00:31:46 Speaker_01
Okay, we don't want humans to be, like, violently killed and overthrown. But the idea that over time biological humans are not the driving force as the actors of history, it's like, yeah, that's kind of baked in, right?

00:32:01 Speaker_01
And then, so, we can sort of debate the probabilities of the worst case scenario, or we can just discuss, like, I don't know, what is a positive vision we're hoping for? What is a future you're happy with?

00:32:17 Speaker_02
My best guess when I really think about what do I feel good about, and I think this is probably true of a lot of people, is there's some sort of more organic decentralized process of like civilizational, incremental civilizational growth.

00:32:34 Speaker_02
The type of thing we trust most and the type of thing we have most experience with right now as a civilization is some sort of like, okay, we change things a little bit,

00:32:42 Speaker_02
A lot of people have, there's a lot of like processes of adjustment and reaction and kind of a decentralized sense of like what's changing. You know, was that good? Was that bad? Take another step.

00:32:55 Speaker_02
There's some like kind of organic process of growing and changing things.

00:33:03 Speaker_02
Which I do expect ultimately to lead to something quite different from biological humans, though I think there's a lot of ethical questions we can raise about what that process involves. But I think...

00:33:18 Speaker_02
You know, I do think, ideally, there would be some way in which we managed to grow via the thing that really captures what we trust in. There's something we trust about the ongoing processes of human civilization so far.

00:33:34 Speaker_02
I don't think it's the same as, like, raw competition. I think there's some rich structure to how we understand moral progress to have been made and what it would be to kind of carry that thread forward.

00:33:48 Speaker_02
And I don't have a formula, you know, I think we're just going to have to bring to bear the full force of

00:33:55 Speaker_02
everything that we know about goodness and justice and beauty, and we just have to, you know, bring ourselves fully to the project of making things good, and doing that collectively.

00:34:06 Speaker_02
And I think a really important part of our vision of what was an appropriate process of deciding, of growing as a civilization, is that there was this very inclusive, kind of decentralized element of people getting to think and talk and grow and change things and react, rather than some, some more like,

00:34:26 Speaker_02
And now the future shall be like blah. Yeah, you know, I think that's, I think we don't want that.

00:34:31 Speaker_01
I think a big crux maybe is like, okay, to the extent that like, the reason we're worried about motivations in the first place is because we think a balance of power, which includes at least one thing with human motivations, not human motivations, human descended motivations is

00:34:50 Speaker_01
difficult to the extent that we think that's the case.

00:34:53 Speaker_01
It seems like a big crux that I often don't hear people talk about: I don't know how you get the balance of power, and maybe just reconciling yourself with the models of the intelligence explosion, which say that such a thing is not possible.

00:35:05 Speaker_01
And therefore you just have to figure out how you get the right God. But, I don't know, I don't really have a framework to think about the balance of power thing.

00:35:17 Speaker_01
I'd be very curious if there is a more concrete way to think about what the structure of competition, or lack thereof, between the labs now or between countries would be, such that the balance of power is most likely to be preserved.

00:35:35 Speaker_01
A big part of this discourse, at least among safety-concerned people, is there's a clear trade-off between competition dynamics and race dynamics and the value of the future or how good the future ends up being.

00:35:51 Speaker_01
And in fact, if you buy this balance of power story, it might be the opposite. Maybe competitive pressures naturally favor balance of power. And I wonder if this is one of the strong arguments against nationalizing the AIs.

00:36:02 Speaker_01
You can imagine many different companies developing AI, some of which are somewhat misaligned and some of which are aligned. You can imagine that being more conducive to

00:36:13 Speaker_01
both the balance of power and to a kind of defense, where all the AIs go through each website and see how easy it is to hack, and basically just getting society up to snuff.

00:36:24 Speaker_01
If you're not just deploying the technology widely, then the first group who can get their hands on it will be able to instigate a sort of revolution; you're just standing against the equilibrium in a very strong way.

00:36:41 Speaker_02
So I definitely share some intuition there that there's, you know, at a high level, a lot of what's scary about the situation with AI has to do with concentrations of power and whether that power is kind of concentrated in the hands of misaligned AI or in the hands of some human.

00:37:02 Speaker_02
And I do think it's very natural to think, OK, let's try to distribute the power more. And one way to try to do that is to have a much more multipolar scenario where lots and lots of actors are developing AI.

00:37:17 Speaker_02
And this is something that people have talked about. When you describe that scenario, you were like, some of which are aligned, some of which are misaligned. That's key. That's a key aspect of the scenario, right?

00:37:29 Speaker_02
And sometimes people will say this stuff. They'll be like, well, there will be the good AIs and they'll defeat the bad AIs.

00:37:36 Speaker_02
But notice the assumption in there, which is that you sort of made it the case that you can control some of the AIs, right? Yeah. And you've got some good AIs. And now it's a question of, are there enough of them?

00:37:49 Speaker_02
And how are they working relative to the others? And maybe. I think it's possible that that is what happens. We know enough about alignment that some actors are able to do that.

00:37:58 Speaker_02
And maybe some actors are less cautious, or they are intentionally creating misaligned AIs, or God knows what. But if you don't have that, if everyone is, in some sense, unable to control their AIs, then

00:38:16 Speaker_02
Then the good AIs help with the bad AIs thing becomes more complicated, or maybe it just doesn't work, because there's no good AIs in this scenario.

00:38:29 Speaker_02
If you say everyone is building their own superintelligence that they can't control, it's true that that is now a check on the power of the other superintelligence.

00:38:35 Speaker_02
Now the other superintelligences need to deal with other actors, but none of them are necessarily working on behalf of a given set of human interests or anything like that.

00:38:48 Speaker_02
I do think that's a very important difficulty in thinking about the very simple thought of, ah, I know what we can do. Let's just have lots and lots of AIs so that no single AI has a ton of power. I think that on its own is not enough.

00:39:07 Speaker_01
But, but in this story, I'm just very skeptical we end up with that.

00:39:11 Speaker_01
I think by default, we have this training regime, at least initially, that favors a sort of latent representation of the inhibitions that humans have and the values humans have. And I get that, like, if you mess it up, it can go rogue.

00:39:27 Speaker_01
But like, if multiple people are training AIs, do they all end up rogue, such that the compromises between them don't end up with humans not violently killed?

00:39:36 Speaker_01
Like, none of them? It all fails, on Google's run and Microsoft's run and OpenAI's run?

00:39:44 Speaker_02
Yeah. I mean, I think, I think there's very notable and salient sources of correlation between failures across the different runs, right? Which is people didn't have a developed science of AI motivations. The runs were structurally quite similar.

00:39:56 Speaker_02
Everyone is using the same techniques. Maybe someone just stole the weights or, you know. So Yeah, I guess I think it's really important, this idea that to the extent you haven't solved alignment, you likely haven't solved it anywhere.

00:40:13 Speaker_02
And if someone has solved it and someone hasn't, then I think it's a better question. But if everyone's building systems that are going to go rogue, then I don't think that's much comfort, as we talked about. Yep, yep.

00:40:29 Speaker_01
Okay, all right. So then let's wrap up this part here. I didn't mention this explicitly in the introduction.

00:40:35 Speaker_01
So to the extent that this ends up being the transition to the next part, the broader discussion we were having in part two is about Joe's series, Otherness and Control in the Age of AGI.

00:40:46 Speaker_01
And the first part is where I was hoping we could just come back and treat the main crux people will come in wondering about, and which I myself feel unsure about.

00:40:54 Speaker_02
Yeah, I mean, I'll just say on that front, I mean, I do think the Otherness and Control series is... you know, I think kind of in some sense separable.

00:41:04 Speaker_02
I mean, it has a lot to do with misalignment stuff, but I think a lot of those issues are relevant even given various degrees of skepticism about some of the stuff I've been saying here.

00:41:16 Speaker_01
And by the way, for the actual mechanisms of how a takeover would happen, there's an episode with Carl Shulman which discusses this in detail. So people can go check that out.

00:41:26 Speaker_02
Yeah. I think, yeah, in terms of, why is it plausible that an AI could take over from a given position in one of these projects I've been describing, or something?

00:41:35 Speaker_02
I think Carl's discussion is pretty good and gets into a bunch of the weeds that I think might give a more concrete sense.

00:41:43 Speaker_01
All right, so now on to part two, where we discuss the Otherness and Control in the Age of AGI series.

00:41:50 Speaker_01
First question, if in 100 years time we look back on alignment and consider it was a huge mistake, that we should have just tried to build the most raw, powerful AI systems we could have, what would bring about such a judgment?

00:42:03 Speaker_02
One scenario I think about a lot is one in which it just turns out that maybe kind of fairly basic measures are enough to ensure, for example, that AIs don't cause catastrophic harm, don't kind of seek power in problematic ways, et cetera.

00:42:18 Speaker_02
And it could turn out that we learn that it was easy, such that we regret, you know, we wish we had prioritized differently. We end up thinking, oh, you know, we wish we could have cured cancer sooner.

00:42:30 Speaker_02
We could have handled some geopolitical dynamic differently. There's another scenario where we end up looking back at some period of our history and how we

00:42:41 Speaker_02
thought about AIs, how we treated our AIs, and we end up looking back with a kind of moral horror at what we were doing.

00:42:49 Speaker_02
So, you know, we end up thinking, you know, we were thinking about these things centrally as like products, as tools, but in fact we should have been foregrounding much more the sense in which they might be moral patients or were moral patients at some level of sophistication, that we were kind of treating them in the wrong way.

00:43:04 Speaker_02
We were just acting like we could do whatever we want. We could delete them, subject them to arbitrary experiments, kind of alter their minds in arbitrary ways.

00:43:12 Speaker_02
And then we end up looking back in the light of history at that as a kind of serious and kind of grave moral error. Those are scenarios I think about a lot in which we have regrets. I don't think they quite fit the bill of what you just said.

00:43:25 Speaker_02
I think it sounds to me like the thing you're thinking is something more like, we end up feeling like, gosh,

00:43:33 Speaker_02
We wish we had paid no attention to the motives of our AIs, that we'd thought not at all about their impact on our society as we incorporated them.

00:43:41 Speaker_02
And instead, we had pursued a, let's call it a kind of maximize for brute power option, which is just kind of make a beeline for whatever is just the most powerful AI you can, and don't think about anything else.

00:43:59 Speaker_01
I'm very sceptical that that's what we're going to wish. One common example that's given of misalignment is humans from evolution. You have one line in your series that, here's a simple argument for AI risk:

00:44:15 Speaker_01
A monkey should be careful before inventing humans. The sort of paper clipper metaphor implies something really banal and boring with regards to misalignment.

00:44:27 Speaker_01
And I think, if I'm steelmanning the people who worship power, they have the sense that humans got misaligned, and they started pursuing things.

00:44:36 Speaker_01
If a monkey was creating them, this is a weird analogy because obviously monkeys didn't create humans, but if the monkey was creating them, there's things, you know, they're not thinking about bananas all day. They're thinking about other things.

00:44:44 Speaker_01
On the other hand, they didn't just make useless stone tools and pile them up in caves in a sort of paper-clipper fashion.

00:44:51 Speaker_01
There were all these things that emerged because of their greater intelligence, which were misaligned with evolution of creativity and love and music and beauty and all the other things we value about human culture.

00:45:05 Speaker_01
And the prediction maybe they have, which is more of an empirical statement than a philosophical statement is, listen, with greater intelligence, if you're thinking about the paper clipper, even if it's misaligned, it will be in this kind of way.

00:45:18 Speaker_01
It'll be things that are alien to humans, but alien in the way humans are alien to monkeys, not in the way that a paper clipper is alien to a human.

00:45:26 Speaker_02
Cool. So I think there's a bunch of different things to potentially unpack there.

00:45:32 Speaker_02
One conceptual point that I want to name off the bat, I don't think you're necessarily making a mistake in this vein, but I just want to name it as a possible mistake in this vicinity, is I think we don't want to engage in the following form of reasoning.

00:45:48 Speaker_02
Let's say you have two entities. One is in the role of creator, and one is in the role of creation. And then we're positing that there's this misalignment relation between them, whatever that means.

00:46:01 Speaker_02
Here's a pattern of reasoning that I think you want to watch out for, is to say, in my role as creator, or sorry, in my role as creation, say you're thinking of humans in the role of creation relative to an entity like evolution, or monkeys, or mice, or whoever you could imagine inventing humans, or something like that, right?

00:46:21 Speaker_02
You say, qua creation, I'm happy that I was created and happy with the misalignment. Therefore, if I end up in the role of creator,

00:46:35 Speaker_02
and we have a structurally analogous relation in which there's misalignment with some creation, I should expect to be happy with that as well. Yeah.

00:46:44 Speaker_01
There's a couple of philosophers that you brought up in the series, which if you read the works that you talk about, actually seem incredibly foresighted in anticipating something like a singularity, our ability to shape a future thing that's

00:47:02 Speaker_01
different, smarter, maybe better than us. Obviously, C.S. Lewis's Abolition of Man, which we'll talk about in a second, is one example. But even here's one passage from Nietzsche, which I felt really highlighted this.

00:47:14 Speaker_01
Man is a rope stretched between the animal and the Superman, a rope over an abyss, a dangerous crossing, a dangerous wayfaring, a dangerous looking back, a dangerous trembling and halting. Is there some explanation for why?

00:47:27 Speaker_01
Is it just somehow obvious that something like this is coming, even if you were thinking 200 years ago?

00:47:32 Speaker_02
I think I have a much better grip on what's going on with Lewis than with Nietzsche there, so maybe let's just talk about Lewis for a second.

00:47:39 Speaker_02
And we should distinguish two... There's a kind of version of the singularity that's specifically a hypothesis about feedback loops with AI capabilities. I don't think that's present in Lewis.

00:47:49 Speaker_02
I think what Lewis is anticipating, and I do think this is a relatively simple forecast, is something like the culmination of the project of scientific modernity.

00:48:02 Speaker_02
So Lewis is kind of looking out at the world and he's seeing this process of kind of increased understanding of a kind of the natural environment and a kind of corresponding increase in our ability to kind of control and direct that environment.

00:48:19 Speaker_02
And then he's also pairing that with

00:48:23 Speaker_02
a kind of metaphysical hypothesis (or, well, his stance on this metaphysical hypothesis I think is kind of problematically unclear in the book), but there is this metaphysical hypothesis, naturalism, which says that humans too, and kind of minds, beings, agents, are a part of nature. And so, insofar as this process of scientific modernity involves a kind of

00:48:49 Speaker_02
a progressively greater understanding of and ability to control nature, that will presumably at some point grow to encompass our own natures, and the natures of other beings that in principle we could create.

00:49:05 Speaker_02
And Lewis views this as a kind of cataclysmic event and crisis, and in particular thinks that it will lead to all these kind of tyrannical

00:49:18 Speaker_02
behaviors and tyrannical attitudes towards morality and stuff like that, unless you believe in non-naturalism or in some form of Tao, which is this kind of objective morality.

00:49:30 Speaker_02
So we can talk about that. But part of what I'm trying to do in that essay is to say,

00:49:34 Speaker_02
No, I think we can be naturalists and also be kind of decent humans that remain in touch with kind of a rich set of norms that have to do with like, how do we relate to the possibility of kind of creating creatures, altering ourselves, et cetera.

00:49:48 Speaker_02
But I do think, yeah, it's a relatively simple prediction: science masters nature, humans are part of nature, science masters humans.

00:49:57 Speaker_01
And then you also have a very interesting other essay about humans: what should we expect of other humans, the sort of extrapolation, if they had greater capabilities and so on?

00:50:07 Speaker_02
Yeah. I mean, I think an uncomfortable thing about the kind of conceptual setup at stake in these sort of like abstract discussions of like, okay, you have this agent, it fooms, which is this sort of

00:50:22 Speaker_02
amorphous process of kind of going from a sort of seed agent to a like super intelligent version of itself, often imagined to kind of preserve its values along the way. Bunch of questions we can raise about that.

00:50:36 Speaker_02
But many of the arguments that people will often talk about in the context of reasons to be scared of AI are like, oh, value is very fragile as you foom. You know,

00:50:50 Speaker_02
kind of small differences in utility functions can kind of de-correlate very hard and kind of drive in quite different directions.

00:50:56 Speaker_02
And like, oh, agents have instrumental incentives to seek power, and if it was arbitrarily easy to get power, then they would do it and stuff like that. These are very general arguments that seem to suggest that the kind of

00:51:09 Speaker_02
It's not just an AI thing, right? It's like, no surprise, right? It's talking about like, take a thing, make it arbitrarily powerful such that it's like, you know, god emperor of the universe or something. How scared are you of that?

00:51:24 Speaker_02
Like, clearly, we should be equally scared of that. Or, I don't know, we should be really scared of that with humans, too, right?

00:51:29 Speaker_02
So, I mean, part of what I'm saying in that essay is that I think this is, in some sense, this is much more a story about balance of power. Right. And about, like, maintaining a kind of checks and balances and kind of distribution of power, period.

00:51:46 Speaker_02
Not just about like kind of humans versus AIs and kind of the differences between human values and AI values. Now that said, I mean, I do think humans, many humans would likely be nicer if they foomed than like certain types of AIs.

00:51:56 Speaker_02
So it's not that. But I think the conceptual structure of the argument leaves it a very open question how much it applies to humans as well.

00:52:08 Speaker_01
I think one big question I have is, I don't even know how to express this, but how confident are we in this ontology of agents and capabilities? How do we know this is the thing that's happening?

00:52:25 Speaker_01
Or that this is the way to think about what intelligences are?

00:52:28 Speaker_02
So it's clearly very janky. Well, people maybe disagree about this, but I think it's obvious to everyone with respect to real-world human agents

00:52:43 Speaker_02
that thinking of humans as having utility functions is, you know, at best, a very lossy approximation of what's going on. I think it's likely to mislead as you amp up the intelligence of various agents as well.

00:52:59 Speaker_02
Though I think Eliezer might disagree about that. Right.

00:53:01 Speaker_02
I will say, I think there's something adjacent to that that seems more real to me, which is something like: a few years ago, my mom wanted to get a house.

00:53:13 Speaker_02
She wanted to get a new dog. Now she has both, you know? How did this happen? What is the right action? It's because she tried, it was hard. She had to like search for the house. It was hard to find the dog, right? Now she has a house. Now she has a dog.

00:53:26 Speaker_02
This is a very common thing that happens all the time. And I think, I don't think we need to be like, my mom has to have a utility function with the dog and she has to have a consistent valuation of all the houses or whatever.

00:53:36 Speaker_02
I mean, like, but it's still the case that her planning and her agency exerted in the world resulted in her having this house, having this dog. And I think it is plausible that as our kind of scientific and technological power advances,

00:53:50 Speaker_02
more and more stuff will be kind of explicable in that way, right? That, you know, if you look and you're like, why is this man on the moon, right? How did that happen? And it's like, well, there was a whole cognitive process.

00:54:02 Speaker_02
There was a whole like planning apparatus. Now, in this case, it wasn't like localized in a single mind, but like there was a whole thing such that man on the moon, right? And I think like, we'll see a bunch more of that and the AIs

00:54:15 Speaker_02
will be, I think, doing a bunch of it. And so that's the thing that seems more real to me than utility functions.

00:54:23 Speaker_01
So yeah, the man on the moon example: there's a proximal story of how exactly NASA engineered the spacecraft to get to the moon. There's the more distal geopolitical story of why we sent people to the moon. And at all those levels, there's

00:54:42 Speaker_01
different utility functions clashing. Maybe there's a sort of like meta societal world utility function. But maybe the story there is like there's some sort of balance of power between these agents.

00:54:53 Speaker_01
And that's why there's the emergent thing that happens. Like, why we sent things to the moon is not that one guy had a utility function. It's like, I don't know, Cold War... things happened.

00:55:05 Speaker_01
Whereas I think like the alignment stuff is a lot about like assuming that one thing is the thing that will control everything. How do we control the thing that controls everything?

00:55:15 Speaker_01
Now, I guess it's not clear what you do to reinforce balance of power. Like it could just be that balance of power is not a thing that happens once you have things that can make themselves intelligent.

00:55:25 Speaker_01
But that seems interestingly different from the how-we-got-to-the-moon story.

00:55:32 Speaker_02
Yeah, I agree. I think there's a few things going on there.

00:55:34 Speaker_02
So one is that even if you're engaged in this ontology of carving up the world into different agencies, at the least you don't want to assume that they're all unitary or non-overlapping. It's not like, all right, we've got this agent, let's carve out one part of the world, that's one agent.

00:55:53 Speaker_02
Over here it's this whole messy ecosystem, teeming, niches, and this whole thing, right? And

00:56:02 Speaker_02
I think in discussions of AI, sometimes people slip between being like, well, an agent is anything that gets anything done, right? And they'll sort of... it could be this weird, mushy thing.

00:56:13 Speaker_02
And then sometimes they're very obviously imagining an individual actor. And so... That's like one difference. I also just think, I think we should be really going for the balance of power thing.

00:56:26 Speaker_02
I think it is just not good to be like, we're gonna have a dictator. Who should the dictator be? Let's make sure we make the dictator the right dictator. I'm like, whoa, no. I think the goal should be sort of we all foom together.

00:56:40 Speaker_02
It's like the whole thing in this kind of inclusive and pluralistic way, in a way that kind of, satisfies the values of like tons of stakeholders, right?

00:56:49 Speaker_02
And at no point is there one single point of failure on all these things. I think that's what we should be striving for here.

00:56:58 Speaker_02
And I think that's true of the human power aspect of AI, and I think it's true of the AI part as well.

00:57:04 Speaker_01
Yeah. Hey everybody, here's a quick message from today's sponsor, Stripe. When I started the podcast, I just wanted to get going as fast as possible. So I used Stripe Atlas to register my LLC, create a bank account.

00:57:15 Speaker_01
I still use Stripe now to invoice advertisers and accept their payments, monetize this podcast. Stripe serves millions of businesses, small businesses like mine, but also the world's biggest companies, Amazon,

00:57:26 Speaker_01
And all these businesses are using Stripe because they don't want to deal with the Byzantine web of payments where you have different payment methods in every market and increasingly complex rules, regulations, arcane legacy systems.

00:57:40 Speaker_01
Stripe handles all of this complexity and abstracts it away. And they can test and iterate every pixel of the payment experience across billions of transactions.

00:57:48 Speaker_01
I was talking with Joe about paper clippers, and I feel like Stripe is the paper clipper of the payment industry, where they're gonna optimize every part of the experience for your users, which means, obviously, higher conversion rates, and ultimately, as a result, higher revenue for your business.

00:58:03 Speaker_01
Anyways, you can go to stripe.com to learn more, and thanks to them for sponsoring this episode. Back to Joe. So there's an interesting intellectual discourse on, let's say, right-wing

00:58:15 Speaker_01
side of the debate where they ask themselves, traditionally we favor markets, but now look where our society is headed.

00:58:22 Speaker_01
It's misaligned in the ways we care about society being aligned, like fertility is going down, family values, religiosity, these things we care about. GDP keeps going up. These things don't seem correlated.

00:58:34 Speaker_01
So we're kind of grinding through the values we care about because of increased competition. And therefore, we need to intervene in a major way.

00:58:44 Speaker_01
And then the pro-market libertarian faction of the right will say, look, I disagree with the correlations here. But even at the end of the day, fundamentally, my point is, or their point is, liberty is the end goal.

00:58:56 Speaker_01
It's not the, it's not like what you use to get to higher fertility or something. I think there's something interestingly analogous about the AI competition grinding things down.

00:59:07 Speaker_01
Like obviously you don't want the gray goo, but like the, the libertarians versus the strats. I think, I think there's something analogous here.

00:59:13 Speaker_02
Yeah. So, I mean, I think one, one thing you could think, which doesn't necessarily need to be about gray goo, it could also just be about alignment, is something like, Sure, it would be nice if the AIs didn't violently disempower humans.

00:59:27 Speaker_02
It would be nice if the AIs, otherwise, when we created them, their integration into our society led to good places. But I'm uncomfortable with the sorts of interventions that people are contemplating in order to ensure that sort of outcome.

00:59:43 Speaker_02
And I think there's a bunch of things to be uncomfortable about that. Now, that said, so for something like

00:59:51 Speaker_02
everyone being killed or violently disempowered, that is traditionally something that we think, if it's real, and obviously we need to talk about whether it's real, but in the case where it's a real threat, we often think that quite intense forms of intervention are warranted to prevent that sort of thing from happening, right?

01:00:09 Speaker_02
So if there was actually a terrorist group that was planning to, you know, it was like working on a bioweapon that was gonna kill everyone or 99.9% of people, we would think that warrants intervention. You just shut that down.

01:00:23 Speaker_02
And now even if you had a group that was doing that unintentionally, imposing a similar level of risk, I think many, many people, if that's the real scenario, will think that that warrants quite intense preventative efforts. And so obviously,

01:00:41 Speaker_02
these sorts of risks can be used as an excuse to expand state power. There's a lot to be worried about with different types of contemplated interventions to address certain types of risks.

01:00:53 Speaker_02
You know, I think we need to just, I think there's no like royal road there. You need to just like have the actual good epistemology. You need to actually know, is this a real risk? What are the actual stakes?

01:01:04 Speaker_02
And, you know, look at it case by case and be like, is this, you know, is this warranted? So that's like one point on the like takeover, literal extinction thing.

01:01:17 Speaker_02
I think the other thing I want to say, so I talk in the piece about this distinction between the like, well, let's at least have the AIs who are kind of minimally law abiding or something like that, right?

01:01:25 Speaker_02
Like we don't have to talk about, there's this question about servitude and question about like other control over AI values.

01:01:31 Speaker_02
But I think we often think it's okay to like really want people to like obey the law, to uphold basic cooperative arrangements, stuff like that.

01:01:42 Speaker_02
I do, though, want to emphasize, and I think this is true of markets and true of liberalism in general, just how much these procedural norms like democracy, free speech, property rights, things that people really hold dear, including myself,

01:01:57 Speaker_02
are, in the actual lived substance of a liberal state, undergirded by all sorts of virtues and dispositions and character traits in the citizenry. So these norms are not robust to arbitrarily vicious citizens.

01:02:18 Speaker_02
I want there to be free speech, but I think we also need to raise our children to value truth and to know how to have real conversations. I want there to be democracy, but I think we also need to raise our children to be compassionate and decent.

01:02:32 Speaker_02
And I think sometimes we can lose sight of that aspect. Bringing that to mind, though, that's not to say it should be the project of state power, right?

01:02:42 Speaker_02
But understanding that liberalism is not this sort of ironclad structure where you can take any citizenry, hit go, and you'll get something flourishing or even functional, right?

01:02:55 Speaker_02
You need... there's a bunch of other softer stuff that makes this whole project go.

01:03:00 Speaker_01
Maybe zooming out.

01:03:01 Speaker_01
One question comes from, I think, people who have read enough Nick Land, but also from people who have a sort of fatalistic attitude towards alignment as a thing that can even make sense.

01:03:18 Speaker_01
They'll say things like,

01:03:19 Speaker_01
look, the kinds of things that are going to be exploring the black hole at the center of the galaxy, the kinds of things that go visit Andromeda or something, did you really expect them to privilege whatever inclinations you have because you grew up in the African savannah and that's what the evolutionary pressures were a hundred thousand years ago? Of course they're going to be weird.

01:03:43 Speaker_01
And like, yeah, like what did you think was going to happen?

01:03:46 Speaker_02
I do think even the good futures will be weird. And I want to be clear, when I talk about finding ways to ensure that the integration of AIs into our society leads to good places,

01:04:02 Speaker_02
I'm not imagining like

01:04:04 Speaker_02
I think sometimes people think that this project of wanting that, and especially to the extent that that makes some deep reference to human values, involves this kind of short-sighted parochial imposition of our current unreflective values.

01:04:19 Speaker_02
So it's just like, yeah, we're gonna have, I don't know.

01:04:24 Speaker_02
Like I think they sort of imagine this, that we're forgetting that we too, there's a kind of reflective process and a kind of a moral progress dimension that we want to like leave room for, right?

01:04:37 Speaker_02
You know, Jefferson has this line about how, just as you wouldn't want to force a grown man into a younger man's coat, so we don't want to chain civilization to a barbarous past.

01:04:49 Speaker_02
Everyone should agree on that. And the people who are interested in alignment also agree on that. So obviously, there's a concern that people don't engage in that process, or that something shuts down the process of reflection.

01:05:01 Speaker_02
But I think everyone agrees we want that. And so that will lead, potentially, to something that is quite different from our current conception of what's valuable. There's a question of how different.

01:05:17 Speaker_02
And I think there are also questions about what exactly are we talking about with reflection?

01:05:20 Speaker_02
I have an essay on this where I think this is not, I don't actually think there's a kind of off the shelf pre-normative notion of reflection that you can just be like, oh, obviously you take an agent, you stick it through reflection, and then you get like values, right?

01:05:33 Speaker_02
Like, no, there are a bunch of types of reflection. Really there's just a whole pattern of empirical facts about: take an agent, put it through some process of reflection, all sorts of things.

01:05:46 Speaker_02
Ask it questions, and that'll go in all sorts of directions for a given empirical case. And then you have to look at the pattern of outputs and be like, okay, what do I make of that?

01:05:56 Speaker_02
But overall, I think we should expect even the good futures to be quite weird. Might they even be incomprehensible to us? I don't think so. I mean, there are different types of incomprehensible.

01:06:11 Speaker_02
So say I show up in the future and this is all computers, right? I'm like, okay, all right. And then they're like, we're running creatures on the computers.

01:06:17 Speaker_02
I'm like, so I have to somehow get in there and see what's actually going on with the computers or something like that.

01:06:23 Speaker_02
Maybe I can actually see, maybe I actually understand what's going on in the computers, but I don't yet know what values I should be using to evaluate that. So it can be the case that

01:06:31 Speaker_02
we, if we showed up, would not be very good at recognizing goodness or badness. I don't think that makes it insignificant, though. Suppose you show up in the future and it's got some answer to the Riemann hypothesis, right?

01:06:46 Speaker_02
And you can't tell whether that answer is right. Maybe the civilization went wrong. It's still an important difference. It's just that you can't track it.

01:06:53 Speaker_02
And I think something similar is true of worlds that are genuinely expressive of what we would value if we engaged in processes of reflection that we endorse versus ones that have totally veered off into something meaningless.

01:07:06 Speaker_01
One thing I've heard from people who are skeptical of this ontology is just, all right, what do you even mean by alignment?

01:07:12 Speaker_01
And obviously with the very first question, you expressed, here are different things that could mean. Do you mean balance of power? Do you mean somewhere between that and a dictator, or whatever?

01:07:24 Speaker_01
Then there's another thing which is like, separate from the AI discussion, like, I don't want the future to contain a bunch of torture.

01:07:30 Speaker_01
And like, it's not necessarily like a technical, I mean, like part of it might involve technically aligning GPT-4, but it's like, that's not what it, you know what I mean? Like, that's like a proxy to get to like that future.

01:07:42 Speaker_01
The sort of question then is, what we really mean by alignment, is it just like, whatever it takes to make sure the future doesn't have a bunch of torture? Or do we mean like,

01:07:56 Speaker_01
what I really care about is in a thousand years, things that are like, that are like clearly my descendants, not like some thing where I like, I recognize they have their own art or whatever. It's like, no, no.

01:08:08 Speaker_01
It's like, if it was my grandchild, that level of descendant is controlling the galaxy, even if they're not conducting torture.

01:08:14 Speaker_01
And I think like what some people mean is like our intellectual descendants should control the light cone. Even if it's like, even if the other counterfactual doesn't involve a bunch of torture.

01:08:23 Speaker_02
Yeah, so I agree. I think there are a few different things there. So there's: what are you going for? Are you going for actively good, or are you going for avoiding certain stuff, right?

01:08:35 Speaker_02
And then there's a different question, which is what counts as actively good according to you. So, um, maybe some people are like the only things that are actively good are like,

01:08:48 Speaker_02
my grandchildren or, or I don't know, like some like literal descending genetic line from me or something. I'm like, well, that sounds, that's not, that's not my thing.

01:08:57 Speaker_02
And I don't think it's really what most people have in mind when they talk about goodness.

01:09:03 Speaker_02
I mean, I think there's a conversation to be had, and obviously in some sense when we talk about a good future, we need to be thinking about what all the stakeholders are here and how it all fits together.

01:09:13 Speaker_02
But I think, yeah.

01:09:18 Speaker_02
When I think about it, I'm not assuming some notion of descendants. I think the thing that matters about the lineage is whatever's required for the

01:09:39 Speaker_02
optimization processes to be, in some sense, pushing towards good stuff. And there's a concern that currently a lot of what is making that happen lives in human civilization in some sense. And so we don't know exactly what... There's some kind of

01:10:04 Speaker_02
seed of goodness that we're carrying, in different ways; there are different notions of goodness for different people, maybe, but there's some sort of seed that is currently here, that we have, that is not just in the universe everywhere.

01:10:19 Speaker_02
It's not just going to crop up if we die out or something. It's something that is in some sense contingent to our civilization, or at least that's the picture; we can talk about whether that's right.

01:10:29 Speaker_02
And so the sense in which stories about good futures that have to do with alignment are about descendants, I think it's more about: whatever that seed is, how do we carry it?

01:10:42 Speaker_02
How do we keep the like life thread alive going into the future?

01:10:47 Speaker_01
But then one could accuse the alignment community of a sort of motte-and-bailey, where the motte is: we just want to make sure that GPT-8 doesn't kill everybody.

01:10:59 Speaker_01
And after that, it's like, all you guys, you know, we're all cool.

01:11:03 Speaker_01
But then the real thing, the bailey, is: we are fundamentally pessimistic about historical processes in a way that doesn't even necessarily implicate AI alone, but just the nature of the universe.

01:11:18 Speaker_01
And we want to do something to make sure the nature of the universe doesn't take a hold on humans, you know, on where things are headed.

01:11:25 Speaker_01
So if you look at the Soviet Union, the collectivization of farming and the disempowerment of the kulaks was not, as a practical matter, necessary. In fact, it was extremely counterproductive. It almost brought down the regime.

01:11:41 Speaker_01
And it obviously killed millions of people, you know, caused a huge famine.

01:11:45 Speaker_01
But it was sort of ideologically necessary, in the sense of: we have an ember of something here, and we've got to make sure that enclave of the other thing doesn't take hold. It's sort of like, if you have raw competition between the kulak-type capitalism and what we're trying to build here, the gray goo of the kulaks will just take over, right?

01:12:06 Speaker_01
And so we have this ember here, we're going to do worldwide revolution from it. I know that's obviously not exactly the kind of thing alignment has in mind, but it's like, we have an ember here and we've got to

01:12:16 Speaker_01
make sure that this other thing that's happening on the side doesn't, you know, obviously that's not how they would phrase it, but get a hold on what we're building here.

01:12:26 Speaker_01
And that's maybe the worry that people who are opposed to alignment have: you mean the second kind of thing, the kind of thing that maybe Stalin was worried about, even though you obviously wouldn't endorse the specific things he did.

01:12:36 Speaker_02
When people talk about alignment, they have in mind a number of different types of goals, right? So one type of goal is quite minimal. It's something like that the AIs don't kill everyone, or kind of violently disempower people.

01:12:51 Speaker_02
Now there's a second thing people sometimes want out of alignment, which is much broader, which is something like, we would like it to be the case that our AIs are such that when we incorporate them into our society, things are good, right?

01:13:04 Speaker_02
That we just have a good future. I do agree that the discourse about AI alignment mixes together these two goals that I mentioned. The most straightforward thing to focus on, and I don't blame people for just talking about this one,

01:13:22 Speaker_02
is just the first one.

01:13:25 Speaker_02
When we think about in which context is it appropriate to try to exert various types of control or to have more of what I call in the series yang, which is this active controlling force, as opposed to yin, which is this more receptive, open, letting go.

01:13:43 Speaker_02
A kind of paradigm context in which we think that is appropriate is if something is a kind of active aggressor against the sort of boundaries and cooperative structures that we've created as a civilization, right?

01:14:00 Speaker_02
you know, I talk about the Nazis or, you know, in the piece it's sort of like when you sort of invade, if something is invading, we often think it's appropriate to like fight back, right?

01:14:09 Speaker_02
And we often think it's appropriate to like set up structures to kind of prevent and kind of ensure that these basic norms of kind of peace and harmony are kind of adhered to. And I do think some of the kind of moral heft

01:14:25 Speaker_02
of some parts of the alignment discourse comes from drawing specifically on that aspect of our morality, right? So the AIs are presented as aggressors that are coming to kill you.

01:14:36 Speaker_02
And if that's true, then it's quite appropriate, I think, to really be like, okay. That's classic human stuff.

01:14:47 Speaker_02
Almost everyone recognizes that kind of self-defense or like ensuring kind of basic norms are adhered to is a kind of justified use of like certain kinds of power that would often be unjustified in other contexts.

01:14:59 Speaker_02
So self-defense is a clear example there. I do think it's important though to separate that concern from this other concern about where does the future eventually go? And how much do we want to be kind of trying to steer that actively?

01:15:19 Speaker_02
So to some extent, I wrote the series partly in response to the thing you're talking about, which is, I think it is true that aspects of this discourse involve the possibility of like

01:15:31 Speaker_02
trying to steer and grip. You have this sense that the universe is about to go off in some direction and you need to... and, you know, people notice that muscle.

01:15:42 Speaker_02
And part of what I want to do is like, well, we have a very rich ethical, human ethical tradition of thinking about like, what, when is it appropriate to try to exert what sorts of control over which things?

01:15:53 Speaker_02
And I want that to be, I want us to bring the kind of full force and richness of that tradition to this discussion, right?

01:15:58 Speaker_02
And not, like, I think it's easy if you're purely in this abstract mode of, like, utility functions, human utility function, and there's, like, this competitor thing with utility function.

01:16:06 Speaker_02
It's like somehow you lose touch with the kind of complexity of how we actually, like, we've been dealing with kind of differences in values and kind of competitions for power. This is classic stuff, right?

01:16:17 Speaker_02
And I don't actually think, I think the AIs sort of amplify a lot of the,

01:16:22 Speaker_02
the kind of dynamics, but I don't think it's fundamentally new. And so part of what I'm trying to say is, well, let's draw on the full wisdom we have here, while obviously adjusting for ways in which things are different.

01:16:33 Speaker_01
So one of the things the ember analogy, and getting a hold of the future, brings up is that we're going to go explore space, and that's where we expect most of the things that will happen. Most of the people that will live, it'll be in space.

01:16:48 Speaker_01
And I wonder how much of the high stakes here is not really about AI per se, but it's about space.

01:16:54 Speaker_01
Like, is it just a coincidence that we're developing AI at the same time we're on the cusp of expanding through most of the stuff that exists?

01:17:04 Speaker_02
So I don't think it's a coincidence, in that essentially the way we would become able to expand, or the most salient way to me, is via some kind of radical acceleration of our technological...

01:17:19 Speaker_01
Let me clarify the stakes here, then. If this was just a question of, do we do AGI and explore the solar system, and there was nothing beyond the solar system, we foom and weird things might happen with the solar system if we get it wrong.

01:17:34 Speaker_01
Compared to that, billions of galaxies, that's a different sort of thing at stake. I wonder how much of the discourse hinges on the stakes because of space.

01:17:44 Speaker_02
I mean, I think for most people, very little, you know, I think people are really like, what's going to happen to this world, right?

01:17:53 Speaker_02
This world around us that we live in, and, you know, what's going to happen to me and my kids, and to...

01:17:57 Speaker_02
So, you know, some people spend a lot of time on the space stuff, but I think the most immediately pressing stuff about AI doesn't require that at all.

01:18:07 Speaker_02
I also think like, even if you bracket space, like time is also very big. And so, you know, whatever, we've got like 500 million years, a billion years left on earth if we don't mess with the sun and maybe you could get more out of it.

01:18:20 Speaker_02
So, you know, I think there's still... that's a lot. But I don't know if it fundamentally changes the narrative.

01:18:32 Speaker_02
Obviously, the stakes, insofar as you care about what happens in the future or in space, then the stakes are way smaller if you shrink down to the solar system. And I think that does

01:18:43 Speaker_02
change potentially some stuff in that, like a really nice feature of our situation right now, depending on what the actual nature of kind of the kind of resource pie is, is that I think

01:18:57 Speaker_02
you know, in some sense, there's such an abundance of energy and other resources in principle available to a responsible civilization, that really just tons of stakeholders, especially ones who are able to saturate, to get really close to amazing according to their values with comparatively small allocations of resources, we can just...

01:19:23 Speaker_02
I kind of feel like everyone who has satiable values, who will be really, really happy with some small fraction of the available pie, we should just satiate all sorts of stuff.

01:19:35 Speaker_02
Right. And obviously you need to figure out gains from trade and balance, and there's a bunch of complexity here, but I think in principle, you know, we're in a position

01:19:49 Speaker_02
to create a really wonderful, wonderful scenario for just tons and tons of different value systems. And so I think, correspondingly, we should be really interested in doing that. I sometimes use this heuristic in thinking about the future.

01:20:05 Speaker_02
I think we should be aspiring to really kind of leave no one behind, right? Like really find like, who are all the stakeholders here?

01:20:11 Speaker_02
How do we have a fully inclusive vision of how the future could be good from a very, very wide variety of perspectives? And I think the vastness of space resources makes that very feasible.

01:20:26 Speaker_02
And now if you instead imagine, it's a much smaller pie, well, maybe you face tougher trade-offs. And so I think that's like an important dynamic.

01:20:36 Speaker_01
Is the inclusivity because part of your values includes different potential futures getting to play out, or is it because of uncertainty about which is the right one, so let's make sure we're not nulling out the possibilities?

01:20:54 Speaker_01
If you're wrong, then you're not nulling out all value.

01:20:57 Speaker_02
I think it's a bunch of things at once. So yeah, I'm just really into being nice when it's cheap, right? I think if you can help someone a lot in a way that's really cheap for you, do it, right? Or, I don't know.

01:21:11 Speaker_02
I mean, obviously you need to think about trade-offs and there's like a lot of people in principle you could be nice to, but I think like the principle of like be nice when it's cheap, I'm like very excited to try to uphold.

01:21:20 Speaker_02
I also really hope that other people uphold that with respect to me, including the AIs, right? I think we should be golden-ruling. Like, we're thinking about, oh, we're inventing these AIs.

01:21:30 Speaker_02
I think there's some way in which I'm trying to like kind of embody attitudes towards them that I like hope that they would embody towards me.

01:21:37 Speaker_02
And that's like some, it's unclear exactly what the ground of that is, but that's something, you know, I really like the golden rule. And I think a lot about that as a kind of basis for treatment of other beings.

01:21:49 Speaker_02
And so, be nice when it's cheap: if you think about it, if everyone implements that rule, then we get potentially a big Pareto improvement, or, I don't know if it's exactly a Pareto improvement, but it's a good deal.

01:22:02 Speaker_02
It's a lot of good deals. And yeah, so I think it's that I'm just into pluralism, I've got uncertainty, there's all sorts of stuff swimming around there. But then I think also, just as a matter of

01:22:19 Speaker_02
having cooperative and good balances of power and deals and avoiding conflict, I think we should be finding ways to set up structures that lots and lots of people and value systems and agents are happy with, including non-humans, people in the past, AIs, animals.

01:22:36 Speaker_02
I really think we should have a very broad sweep in thinking about what sorts of inclusivity we want to be reflecting in a mature civilization, and set ourselves up for doing that.
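As a rough illustration of the "be nice when it's cheap" point, here is a toy payoff sketch; the two-agent setup and the numbers are assumptions for illustration, not anything from the conversation.

```python
# Toy sketch of "be nice when it's cheap": each agent can do the other a favor
# worth a lot to the recipient at a small cost to the giver.
# The BENEFIT and COST numbers are illustrative assumptions.
BENEFIT = 10  # value of the favor to the recipient
COST = 1      # cost of the favor to the giver

def payoffs(a_nice, b_nice):
    """Return (payoff to A, payoff to B) given each agent's choice to be nice."""
    a = (BENEFIT if b_nice else 0) - (COST if a_nice else 0)
    b = (BENEFIT if a_nice else 0) - (COST if b_nice else 0)
    return a, b

print("neither nice:", payoffs(False, False))  # (0, 0)
print("both nice:   ", payoffs(True, True))    # (9, 9) -- better for both, a Pareto improvement
```

With these assumed numbers, everyone following the rule leaves both parties strictly better off, which is the "lots of good deals" point in the simplest possible form.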

01:22:51 Speaker_01
Okay, so I want to go back to what our relationship with these AIs should be, because pretty soon we're talking about our relationship to superhuman intelligences, if we think such a thing is possible.

01:23:04 Speaker_01
And so there's a question of what process you get to use to get there, and the morality of gradient descending on their minds, which we can address later. The thing that personally gives me the most unease about

01:23:17 Speaker_01
alignment, quote unquote, is that at least part of the vision here sounds like you're going to enslave a god. And there's just something that feels so wrong about that.

01:23:32 Speaker_01
But then the question is, if you don't enslave the god, obviously the god's gonna have more control. Are you okay with that? You're going to surrender most of everything. You know what I mean?

01:23:43 Speaker_02
Even if it's a cooperative relationship you have... I think we as a civilization are going to have to have a very serious conversation about what sort of servitude is appropriate or inappropriate in the context of AI development.

01:24:03 Speaker_02
There are a bunch of disanalogies from human slavery that I think are important. In particular, A, the AIs might not be moral patients at all, in which case we need to figure that out. There are ways in which we may be able to

01:24:21 Speaker_02
you know, work with their motivations. Slavery involves all this suffering and non-consent, and there are all these specific dynamics involved in human slavery. But I think,

01:24:32 Speaker_02
And so some of those may or may not be present in a given case with AI, and I think that's important.

01:24:37 Speaker_02
But I think overall, we are gonna need to stare hard at, like right now, the kind of default mode of how we treat AIs gives them no moral consideration at all, right?

01:24:48 Speaker_02
We're thinking of them as property, as tools, as products, and designing them to be assistants and stuff like that. And I think, you know,

01:24:58 Speaker_02
There has been no official communication from any AI developer as to when, or under what circumstances, that would change, right? And so I think there's a conversation to be had there that we need to have.

01:25:12 Speaker_02
So there's a bunch of stuff to say about that. I want to push back on the notion that there are sort of two options: there's enslaved god, whatever that is, and loss of control.

01:25:28 Speaker_02
And I think we can do better than that. Let's work on it. Let's try to do better. I think we can do better. And I think it might require being thoughtful. And it might require having

01:25:45 Speaker_02
a kind of mature discourse about this before we start taking irreversible moves. But I'm optimistic that we can at least avoid some of the connotations and a lot of the stuff at stake in that kind of binary.

01:25:59 Speaker_01
With respect to how we treat the AIs, I have a couple of contradicting intuitions. And the difficulty with using intuitions in this case is that obviously it's not clear what reference class an AI we have control over falls into.

01:26:14 Speaker_01
So to give one intuition that's very scared about the things we're going to do to these things:

01:26:22 Speaker_01
If you read about life under Stalin or Mao, there's one version of telling it which is actually very similar to what we mean by alignment, which is,

01:26:35 Speaker_01
we do these black-box experiments like, we're going to make it think that it can defect, and if it does, we know it's misaligned. And if you think of Mao, the Hundred Flowers Campaign, you know, let a hundred flowers bloom:

01:26:46 Speaker_01
I'm going to allow criticism of my regime, and so on. And that lasted for a couple of years. And afterwards, for everybody who did that, it was a way to find, quote unquote, the snakes, the rightists who were secretly hiding.

01:26:57 Speaker_01
And, you know, we'll purge them. The sort of paranoia about defectors: anybody in my entourage, anyone in my regime, they could be a secret capitalist trying to bring down the regime.

01:27:11 Speaker_01
That's one way of talking about these things, which is very concerning. Is that the correct reference class?

01:27:17 Speaker_02
I certainly think concerns in that vein are real. I mean, it is disturbing how easy many of the analogies are with

01:27:30 Speaker_02
human historical events and practices that we deplore, or at least have a lot of wariness towards, in the context of the way you end up talking about maintaining control over AI, like making sure that it doesn't rebel.

01:27:50 Speaker_02
I think we should be noticing the reference class that some of that talk starts to conjure. And so basically, yes, I think we should really notice that.

01:28:08 Speaker_02
You know, part of what I'm trying to do in the series is to bring the kind of full range of considerations at stake into play, right? Like I think it is both the case that like,

01:28:22 Speaker_02
that we should be quite concerned about being overly controlling or, you know, abusive or oppressive; there are all sorts of ways you can go too far.

01:28:32 Speaker_02
And there are concerns about the AIs being genuinely dangerous and genuinely acting against us, killing us, violently overthrowing us. And I think the moral situation is quite complicated.

01:28:47 Speaker_02
And then I think, in some sense, if you imagine a sort of external aggressor who's coming in and invading you, you feel very justified in doing a bunch of stuff to prevent that.

01:29:03 Speaker_02
It's like a little bit different when you're like inventing the thing and you're doing it like incautiously or something.

01:29:09 Speaker_02
And then I think there's a different vibe in terms of the kind of overall,

01:29:22 Speaker_02
yeah, justificatory stance you might have for various types of more power-exerting interventions. And so that's one feature of the situation.

01:29:34 Speaker_01
The opposite perspective here is that you're doing this sort of vibes-based reasoning of, ah, doing gradient descent on these minds looks yucky.

01:29:44 Speaker_01
And in the past, a couple of reference cases, a couple of similar cases, might've been something like environmentalists not liking nuclear power,

01:29:54 Speaker_01
because the vibes of nuclear don't look green. But obviously that set back the cause of fighting climate change.

01:30:00 Speaker_01
And so the end result, a future you're proud of, a future that's appealing, is set back because of your vibes about how it would be wrong to brainwash a human, which you're trying to apply to a disanalogous case where that's not as relevant.

01:30:15 Speaker_01
I do think

01:30:17 Speaker_02
There's a concern here that I, you know, I really try to foreground in the series that I think is related to what you're saying, which is something like, you know, you might be worried that we will be very gentle and nice and free with the AIs, and then they'll kill us.

01:30:34 Speaker_02
You know, they'll take advantage of that, and then it will have been a catastrophe, right? So I open the series with an example where I'm really trying to conjure that possibility at the same time as conjuring the

01:30:52 Speaker_02
grounds of gentleness, and the sense in which these AIs can both be others, moral patients, this sort of new species that should conjure wonder and reverence, and also be such that they will kill you.

01:31:08 Speaker_02
And so I have this example of like, ah, this documentary, Grizzly Man, where there's this environmental activist, Timothy Treadwell, and he, aspires to approach these grizzly bears.

01:31:21 Speaker_02
He lives, you know, in the summer, he goes into Alaska and he lives with these grizzly bears and he aspires to approach them with this like gentleness and reverence. He doesn't use bear mace or he doesn't like carry bear mace.

01:31:30 Speaker_02
He doesn't use a fence around his camp. And he gets eaten alive by the bears or one of these bears. And I kind of really wanted to foreground that possibility in the series. I think we need to be talking about these things both at once, right?

01:31:49 Speaker_02
Bears can be moral patients, right? AIs can be moral patients. Nazis are moral patients. Enemy soldiers have souls, right? And so I think we need to learn the art of kind of hawk and dove both. There's this

01:32:04 Speaker_02
dynamic here that we need to be able to hold both sides of, um, as we, as we kind of go into these trade-offs and these dilemmas and, and, and all sorts of stuff.

01:32:11 Speaker_02
And like a lot of, part of what I'm trying to do in the series is like really kind of bring it all to the table at once.

01:32:16 Speaker_01
I think the big crux that I have, like if I today was to massively change my mind about what should be done is just the question of how weird by default things end up, how alien they end up.

01:32:32 Speaker_01
And a big part of that story is this: you made a really interesting argument in your blog post that if moral realism is correct, it actually makes an empirical prediction, which is that the aliens, the ASIs, whatever, should converge on the right morality the same way they converge on the right mathematics.

01:32:51 Speaker_01
That's a really interesting point. But there's another prediction that moral realism makes, which is that over time, society should become more moral, become better.

01:33:05 Speaker_01
And to the extent that we think that's happened, of course, there is the problem of what morals do you have now? Well, it's the ones that society has been converging towards over time. But to the extent that it's happened,

01:33:17 Speaker_01
one of the predictions of moral realism has been confirmed, which means should we update in favor of moral realism?

01:33:24 Speaker_02
One thing I want to flag is I don't think all forms of moral realism make this prediction. And so that's just one point. I'm happy to talk about the different forms I have in mind.

01:33:35 Speaker_02
I think there are also forms of kind of things that kind of look like moral anti-realism, at least in their metaphysics, according to me, but which just posit that in fact, there's this convergence. It's not in virtue of interacting with some like,

01:33:47 Speaker_02
mind-independent moral truth; it's just, for some other reason, the case. And that looks a lot like moral realism at that point, because it's like, oh, it's really universal.

01:33:55 Speaker_02
Everyone ends up here, and it's tempting to be like, ah, why? And then whatever the answer to the why is, it's a little bit like, is that the Tao? Is that the nature of the Tao?

01:34:04 Speaker_02
Even if there's not some extra metaphysical realm in which the moral truth lives or something. So,

01:34:11 Speaker_02
Yeah, so moral convergence I think is sort of a different factor from like the existence or non-existence of kind of non-natural, like a kind of morality that's not reducible to natural facts, which is the type of moral realism I usually consider.

01:34:25 Speaker_02
Now, okay, so does the improvement of society, is that an update towards moral realism? I guess maybe it's a very weak update or something. I guess I'm kind of like, which view predicts this hard?

01:34:44 Speaker_02
I guess it feels to me like moral anti-realism is very comfortable with the observation that... people with certain values have those values. Well, yeah.

01:34:55 Speaker_02
So there's obviously this like first thing, which is like any, if you're the culmination of some process of moral change, then it's very easy to look back at that process and be like, moral progress, like the arc of history bends towards me.

01:35:07 Speaker_02
You can look more closely: if there were a bunch of dice rolls along the way, you might be like, oh wait, that's not rational. That's not the march of reason.

01:35:15 Speaker_02
So there's still like empirical work you can do to tell whether that's what's going on. But I also think it's just, you know, on moral anti-realism, I think it's just still possible to say like, consider Aristotle and us, right?

01:35:29 Speaker_02
And we're like, okay, has there been moral progress by Aristotle's lights, or something, and our lights too, right? And you could think, ah, isn't that a little bit like moral realism? It's like these hearts are singing in harmony.

01:35:47 Speaker_02
That's the moral realist thing, right? The anti-realist thing, the hearts all go different directions, but you and Aristotle apparently like are both excited about the kind of march of history.

01:35:59 Speaker_02
Some open question about whether that's true, like what are Aristotle's like reflective values, right? Suppose it is true. I think that's fairly explicable in moral anti-realist terms.

01:36:07 Speaker_02
You can say, roughly, that, yeah, you and Aristotle are sufficiently similar, and you endorse sufficiently similar kind of reflective processes.

01:36:16 Speaker_02
And those processes are, in fact, instantiated in the march of history, such that, yeah, history has been good for both of you. And I don't think that's... you know, I think there are

01:36:30 Speaker_02
worlds where that isn't the case, and so I think there's a sense in which maybe that prediction is more likely for realism than anti-realism, but it doesn't move me very much.

01:36:41 Speaker_01
One thing I wonder is, look, I don't know if moral realism is the right word, but the thing you mentioned about there being something that makes

01:36:50 Speaker_01
hearts converge to the thing we are, or the thing we upon reflection would be. And even if it's not something instantiated in a realm beyond the universe, it's like a force that exists that acts in a way we're happy with. To the extent that doesn't exist, and you let go of the reins and then you get the paper clippers, it feels like

01:37:11 Speaker_01
we were doomed a long time ago, in the sense of: it's just different utility functions banging against each other, and some of them have parochial preferences, but, you know, it's just combat and some guy won.

01:37:25 Speaker_01
Whereas in the world where, no, this is where the hearts are supposed to go, or it's only by catastrophe that they don't end up there, that feels like the world where it really matters.

01:37:39 Speaker_01
And in that world, the worry, the initial question I asked is like, what would make us think that alignment was a big mistake?

01:37:46 Speaker_01
In the world where hearts just naturally end up at the thing we want, maybe it takes an extremely strong force to push them away from that. And that extremely strong force is, you solve technical alignment and just... No.

01:37:59 Speaker_01
Yeah, it's just like the blinders on the horse's eyes. So in the worlds that really matter, the worlds where, ah, this is where the hearts want to go, in that world maybe alignment is what fucks us up.

01:38:13 Speaker_02
On this question of, do the worlds where there's not this kind of convergent moral force, whether metaphysically inflationary or not, matter, or are those the only worlds that matter?

01:38:25 Speaker_01
Oh, sorry. Maybe what I meant was, in those worlds you're kind of fucked. The worlds without that, the worlds where there's no Tao. Yeah.

01:38:34 Speaker_02
Let's use the term Tao for this kind of convergent morality over the course of millions of years.

01:38:40 Speaker_01
Like, it was gonna go somewhere one way or another. It wasn't gonna end up at your particular utility function.

01:38:46 Speaker_02
Okay, well, let's distinguish between ways you can be doomed. One way is kind of philosophical. So you could be the sort of moral realist or kind of realist-ish person, of which there are many, who have the following intuition.

01:39:04 Speaker_02
They're like, if not moral realism, then nothing matters. It's dust and ashes. It's my metaphysics, or my normative view, or the void. And I think this is a common view.

01:39:20 Speaker_02
I think Derek Parfit, at least some comments of Derek Parfit's, suggest this view. I think lots of moral realists will profess this view.

01:39:27 Speaker_02
Eliezer Yudkowsky, there's some sense in which I think his early thinking was inflected with this sort of thought, though he later recanted; it's very hard to say. So I think this is importantly wrong.

01:39:40 Speaker_02
And so here's the case. I have an essay about this; it's called Against the Normative Realist's Wager. And here's the case that convinces me. So imagine that a metaethical fairy appears before you, right?

01:39:55 Speaker_02
And this fairy knows whether there is a Tao. And the fairy says, okay, I'm gonna offer you a deal. If there is a Tao, then I'm gonna give you $100. If there isn't a Tao, then I'm going to burn you and your family and 100 innocent children alive, right?

01:40:15 Speaker_02
Okay, so the claim is: don't take this deal, right? This is a bad deal. You're holding hostage your commitment to not being burned alive, or your care for that, to this abstruse... basically,

01:40:30 Speaker_02
yeah, I go through in the essay a bunch of different ways in which I think this is wrong. But these people who pronounce, it's moral realism or the void, they don't actually think about bets like this.

01:40:39 Speaker_02
I'm like, no, no, okay, so really, like, is that what you want to do?

01:40:42 Speaker_02
And no, I still care about my values. My allegiance to my values, I think, outstrips my commitments to various metaethical interpretations of my values. I think

01:41:00 Speaker_02
The sense in which we care about not being burned alive is much more solid than, you know, the reasoning on what matters. Okay. So that's, that's the sort of philosophical doom.

01:41:12 Speaker_01
Right.

01:41:12 Speaker_02
Now you could have this, it sounded like you were also gesturing at, at a sort of empirical doom. Right. Which is like, okay, dude, if it's all, if it's just going in a zillion directions, come on, you think it's going to go in your direction?

01:41:24 Speaker_02
Like there's going to be so much churn, um, you're just going to lose. And so, uh, you know, you should give up now and kind of only fight for the realism worlds. There, I'm like, I mean, so I think,

01:41:44 Speaker_02
you know, you gotta do the expected value calculation. You gotta like actually have a view about like how doomed are you in these different worlds? What's the tractability of changing different worlds?
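
To make the expected-value framing concrete, here is a minimal sketch of the fairy's deal from a few turns above. All numbers are invented for illustration: `p_tao` is a made-up credence that a convergent moral order exists, and the payoffs are stand-ins for how good or bad each outcome is by your own lights; nothing here comes from Joe's essay.

```python
# Toy expected-value comparison for the "normative realist's wager" fairy deal.
# All credences and payoffs below are illustrative assumptions, not Joe's numbers.

def expected_value(p_tao: float, value_if_tao: float, value_if_no_tao: float) -> float:
    """Expected value of a bet whose payoff depends on whether the Tao exists."""
    return p_tao * value_if_tao + (1 - p_tao) * value_if_no_tao

p_tao = 0.2  # hypothetical credence in a convergent moral order

# The fairy's deal: a small gain if the Tao exists, a catastrophe if it doesn't.
take_deal = expected_value(p_tao, value_if_tao=100, value_if_no_tao=-1_000_000)

# Refusing the deal: nothing changes either way.
refuse_deal = expected_value(p_tao, value_if_tao=0, value_if_no_tao=0)

print(f"take the deal:   {take_deal:,.0f}")
print(f"refuse the deal: {refuse_deal:,.0f}")
```

Unless you put overwhelming weight on the realism worlds, or insist the other worlds carry zero value, refusing dominates, which is the shape of the argument being made here: you still have to weigh how doomed you are and how tractable things are in each kind of world.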

01:41:51 Speaker_02
I mean, I'm quite skeptical of that, but that's a kind of empirical claim. I'm also just like kind of low on this like everyone converges thing. So, you know, if you imagine like you train a chess playing AI or you have a real paper clipper, right?

01:42:11 Speaker_02
Like somehow you had a real paper clipper and then you're like, okay, you know, go and reflect.

01:42:18 Speaker_02
Based on my like understanding of like how moral reasoning works, like if you look at the type of moral reasoning that like analytic ethicists do, it's just reflective equilibrium, right? They just like take their intuitions and they systematize them.

01:42:32 Speaker_02
I don't see how that process gets a sort of injection

01:42:37 Speaker_02
of, like, the kind of mind-independent moral truth. Or, like, I guess, if you sort of start with only intuitions that say to maximize paperclips, I don't see how you end up doing some, like, rich human morality.

01:42:51 Speaker_02
I just don't, I like, it doesn't look to me like that's how human ethical reasoning works. I think like most of what normative philosophy does is make consistent and kind of systematize pre-theoretic intuitions. And so,

01:43:09 Speaker_02
But we'll get evidence about this. In some sense, I think this view predicts you keep trying to train the AIs to do something, and they keep being like, no, I'm not going to, like, do that. It's like, no, that's not good.

01:43:20 Speaker_02
Or so they keep like pushing back. Like the sort of momentum of like AI cognition is like always in the direction of this like moral truth.

01:43:26 Speaker_02
And whenever we like try to push it in some other direction, we'll find kind of resistance from like the rational structure of things.

01:43:32 Speaker_01
So sorry, actually, I've heard from researchers who are doing alignment that like for red teaming inside these companies, they will try to red team a base model.

01:43:41 Speaker_01
So it's not been RLHF'd, it's just like, predict next token, the raw, crazy, whatever shoggoth. And they try to get this thing to, hey, help me make a bomb, help me whatever.

01:43:52 Speaker_01
And they say that it will, like, it's odd how hard it tries to refuse, even before it's been RLHF'd.

01:43:58 Speaker_02
I mean, look, it will be a very interesting fact if it's like, man, we keep training these AIs in all sorts of different ways. Like, we're doing all this crazy stuff and they keep like acting like bourgeois liberals.

01:44:11 Speaker_02
It's like, wow, like that's, or, you know, they keep like really, or they keep professing this like weird alien reality. They all converge on this one thing. They're like, can't you see? It's like Zorgo, like Zorgo is the thing.

01:44:22 Speaker_02
And like all the AIs, you know, interesting, very interesting. I think my personal prediction is that that's not what we see. And my actual prediction is that the AIs are going to be very malleable. Like we're going to be like,

01:44:35 Speaker_02
know, if you push an AI towards evil, it'll just go. And I think that's obviously sort of reflectively consistent evil. I mean, I think there's also a question with some of these AIs. It's like, Will they even be consistent in their values?

01:44:54 Speaker_02
I like this image of the blinded horses, and I like this image of maybe alignment is going to mess with the... I think we should be really concerned if we're forcing facts on our AIs.

01:45:05 Speaker_02
That's a really bad... Because I think one of the clearest things about human processes of reflection, like the kind of easiest thing to be like, let's at least get this, is not acting on the basis of an incorrect empirical picture of the world.

01:45:20 Speaker_02
Right. And so if you find yourself, like, saying, hey, by the way, this is true, and I need you to always be reasoning as though blah is true, I'm like, ooh, I think that's a no-no from an anti-realist perspective too, right?

01:45:37 Speaker_02
My reflective values, I think, will be such that I formed them in light of the truth about the world.

01:45:43 Speaker_02
And I think this is a real concern about, as we move into this era of kind of aligning AIs, I don't actually think this binary between values and other things is gonna be very obvious in how we're training them. I think it's gonna be much more like,

01:45:56 Speaker_02
ideologies and like you can just train an AI to like output stuff, right? Output utterances. And so you, you can easily end up in a situation where you like decided that blah is true about some issue. Um, an empirical issue, right? Not a moral issue.

01:46:08 Speaker_02
And, uh, so like, I think, I think people should not, for example, I do not think people should hard code belief in God into their AIs or like, I would, I would advise people to not hard code their religion into their AIs if they also want to like

01:46:21 Speaker_02
discover if their religion is false. I would just in general, if you would like to have your behavior be sensitive to whether something is true or false, like it's sort of generally not good to like etch it into things.

01:46:34 Speaker_02
And so that is definitely a form of blinder I think we should be really watching out for. And I'm kind of hopeful, so like I have enough credence on some sort of moral realism that like I'm hoping that if we just do the anti-realism thing,

01:46:47 Speaker_02
of just being consistent, learning all this stuff, reflecting. If you look at how moral realists and moral anti-realists actually do normative ethics, it's the same. It's basically the same.

01:46:56 Speaker_02
There's some amount of different heuristics on things like properties like simplicity and stuff like that, but I think it's like, they're mostly just doing the same game.

01:47:05 Speaker_02
And so I'm kind of hoping that, and also meta-ethics is itself a discipline that AIs can help us with, I'm hoping that we can just figure this out either way. So if there is, if moral realism is somehow true, I want us to be able to notice that.

01:47:20 Speaker_02
And I want us to be able to like adjust accordingly. So I'm not like writing off those worlds and be like, let's just like totally assume that's false. But the thing I really don't want to do is write off the other worlds where it's not true.

01:47:29 Speaker_02
Because my guess is it's not true. Right. And I think stuff still matters a ton in those worlds too.

01:47:34 Speaker_01
So a related crux is like, okay, you're training these models. We're in this incredibly lucky situation where, it turns out, the best way to train these models is to just give them everything humans have ever said, written, or thought.

01:47:51 Speaker_01
And also these models, the reason they get intelligence is because they can generalize, right? Like they can grok, what is it, what is the gist of things?

01:47:58 Speaker_01
So are we fundamentally... or should we just expect this to be a situation which leads to alignment, in the sense of: how exactly does this thing that's trained to be an amalgamation of human thought become a paper clipper?

01:48:13 Speaker_01
The thing you kind of get for free is it's an intellectual descendant. The paper clipper is not an intellectual descendant.

01:48:21 Speaker_01
Whereas the AI, which understands all the human concepts, but then gets stuck on some part of it, which we aren't totally comfortable with. It's like, you know, it's, it feels like an intellectual descendant in the way we care about.

01:48:34 Speaker_02
I'm not sure about that. I'm not sure I do care about a notion of intellectual descendant in that sense. I mean, literal paperclips is a human concept, right? So I don't think any old human concept will do for the thing we're excited about.

01:48:53 Speaker_02
I think the stuff that I would be more interested in the possibility of getting for free are things like consciousness, pleasure, sort of other features of human cognition.

01:49:07 Speaker_02
Like I think, so there are paper clippers and there are paper clippers, right? So imagine if the paper clipper is like an unconscious

01:49:14 Speaker_02
kind of voracious machine, and it just appears to you as a cloud of paperclips, you know, um, but there's nothing sort of... that's like one vision. If you imagine the paper clipper is, like, a conscious being that loves paperclips, right?

01:49:26 Speaker_02
It like takes pleasure in making paperclips. Um, that's like a different, thing, right? And obviously it could still, it's not necessarily the case that like, you know, it makes the future all paper clippy.

01:49:40 Speaker_02
It's probably not optimizing for consciousness or pleasure, right? It cares about paperclips. Maybe, maybe eventually if it's like suitably certain, it like uses, it turns itself into paperclips and who knows.

01:49:48 Speaker_02
But it's still, I think, a somewhat different moral kind of mode with respect to that. That looks to me much more like a, you know... There's also a question of, like, does it try to kill you and stuff like that.

01:50:01 Speaker_02
But I think that the, there are kind of features of the agents we're imagining other than the kind of thing that they're staring at that can matter to our sense of like sympathy, similarity, and,

01:50:16 Speaker_02
Yeah, and I think people have different views about this. So one possibility is that human consciousness, like the thing we care about in consciousness or sentience is super contingent and fragile.

01:50:23 Speaker_02
And like most minds, most like kind of smart minds are not conscious, right? It's like, the thing we care about with consciousness is this hacky contingent.

01:50:32 Speaker_02
It's like a product of like specific constraints, evolutionarily genetic bottlenecks, et cetera. And that's why we have this consciousness and like, you can get similar work done.

01:50:41 Speaker_02
Like, so consciousness presumably does some, some sort of work for us, but you can get similar work done in a different mind in a very different way. And you should sort of, so that's like, that's a sort of consciousness is fragile view. Right.

01:50:52 Speaker_02
And I think there's a different view, which is like, no, consciousness is, is, um, something that's quite structural.

01:50:58 Speaker_02
It's much more defined by functional roles like self-awareness, a concept of yourself, maybe higher order thinking, stuff that you really expect in many sophisticated minds.

01:51:10 Speaker_02
And in that case, okay, well now actually consciousness isn't as fragile as you might've thought, right?

01:51:15 Speaker_02
Now actually like lots of beings, lots of minds are conscious and you might expect at the least that you're gonna get like conscious superintelligence.

01:51:21 Speaker_02
They might not be optimizing for creating tons of consciousness, but you might expect consciousness by default. And then we can ask similar questions about something like valence or pleasure, or like the kind of character of the consciousness, right?

01:51:33 Speaker_02
So there's, you can have a kind of cold, indifferent consciousness that has no like human or no like emotional warmth, no like pleasure or pain.

01:51:44 Speaker_02
I think that can still be... Dave Chalmers has some papers about, like, Vulcans, and he talks about how they still have moral patienthood. I think that's very plausible, but I do think it's like,

01:51:54 Speaker_02
An additional thing you could get for free or get quite commonly, depending on its nature, is something like pleasure. Again, and then we have to ask, how janky is pleasure?

01:52:03 Speaker_02
How specific and contingent is the thing we care about in pleasure versus how robust is this as a functional role in minds of all kinds? And I personally don't know on this stuff.

01:52:13 Speaker_02
And I don't think this is enough to get you alignment or something, but I think it's at least worth being aware of these other features. We're not really talking about the AI's values in this case.

01:52:23 Speaker_02
We're talking about the kind of structure of its mind and the different properties the minds have. And I think that... that could show up quite robustly.

01:52:33 Speaker_01
So part of your day job is writing these kinds of section 2.2.2.5 type reports. And part of it is like, ah, society is like a tree that's growing towards the light. What is it like context switching between the two of them?

01:52:52 Speaker_02
So I actually find that it's kind of quite complementary. Yeah, I will write these sort of more technical reports and then do this sort of kind of more literary writing and philosophical writing.

01:53:06 Speaker_02
And I think they both draw on kind of like different parts of myself and I try to think about them in different ways.

01:53:10 Speaker_02
So, you know, I think about some of the reports as much more, like, this is where I'm kind of more fully optimizing for trying to do something impactful or trying to kind of

01:53:22 Speaker_02
Yeah, there's kind of more of an impact orientation there.

01:53:24 Speaker_02
And then on the kind of essay writing, I give myself much more leeway to kind of, yeah, just let other parts of myself and other parts of my concerns kind of come out and kind of, you know, self-expression and like aesthetics and other sorts of things.

01:53:39 Speaker_02
Even while they're both, I think for me, part of an underlying kind of similar concern or, you know, an attempt to have a kind of integrated orientation towards the situation.

01:53:51 Speaker_01
Could you explain the nature of the transfer between the two? So in particular, from the literary side to the technical side, I think rationalists are known for having a sort of ambivalence towards great works or humanities.

01:54:08 Speaker_01
Are they missing something crucial because of that? Because one thing you notice in your essays is just lots of references to epigraphs, to lines in poems or essays that are particularly relevant. I don't know.

01:54:23 Speaker_01
Are the rest of the rationalists missing something because they don't have that kind of background?

01:54:27 Speaker_02
I mean, I don't wanna speak, I think some rationalists, you know, lots of rationalists love these different things.

01:54:30 Speaker_01
I do think... By the way, I'm just referring specifically to SBF, who has a post about the base rate of Shakespeare being a great writer, and also how books can be condensed to essays.

01:54:42 Speaker_02
Well, so on just the general question of how should people value great works or something, I think people can kind of, fail in both directions, right?

01:54:50 Speaker_02
And I think some people, maybe like SBF or other people, they're sort of interested in puncturing a certain kind of sacredness and prestige that, yeah, people associate with some of these works.

01:55:07 Speaker_02
And I think there's a way in which, but as a result can miss some of the like genuine value.

01:55:14 Speaker_02
But I think they're responding to a real failure mode on the other end, which is to kind of, yeah, be too enamored of this prestige and sacredness to kind of siphon it off as some like weird legitimating function for your own thought, instead of like thinking for yourself.

01:55:29 Speaker_02
losing touch with like what it, what do you actually think or what do you actually learn from? Like, I think sometimes, you know, these epigraphs, careful, right?

01:55:35 Speaker_02
I mean, it's like, I think, you know, and I'm not, I'm not saying I'm immune from these vices. I think there can be a like, ah, but Bob said this and it's like, Whoa, very deep. Right. And it's like, these are humans like us. Right.

01:55:45 Speaker_02
And I think, I think the Canon and like other great works and all, you know, all sorts of things have a lot of value and

01:55:51 Speaker_02
you know, we shouldn't, I think sometimes it like borders on the way people like read scripture, or I think like there's a kind of like scriptural authority that people will sometimes like ascribe to these things.

01:56:00 Speaker_02
And I think that's not, um, so yeah, I think it's kind of, you know, you can fall off on both sides of the horse.

01:56:05 Speaker_01
It actually relates really interestingly to... I remember I was talking to somebody who at least is familiar with rationalist discourse, and he was asking, like, what are you interested in these days?

01:56:17 Speaker_01
And I was saying something about this part of Roman history, super interesting. And then his first sort of response was,

01:56:24 Speaker_01
Oh, you know, it's really interesting when you look at these secular trends of like Roman times to what happened in the Dark Ages versus the Enlightenment.

01:56:34 Speaker_01
For him, it was like, the story of that was just like, how did it contribute to the big secular, like the big picture? The sort of particulars didn't, they don't, like, there's no interest in that.

01:56:43 Speaker_01
It's just like, if you zoom out at the biggest level, what's happening here? Whereas there's also the opposite failure mode when people study history.

01:56:52 Speaker_01
Dominic Cummings writes about this because he is endlessly frustrated with the political class in Britain. And he'll say things like, well, you know, they study politics, philosophy, and economics.

01:57:01 Speaker_01
And a big part of it is just like being really familiar with these poems and like reading a bunch of history about the War of the Roses or something.

01:57:09 Speaker_01
But he's frustrated that they take away, they have all these like kings memorized, but they take away very little in terms of lessons from these episodes. It's more of just like almost entertaining, like watching Game of Thrones for them.

01:57:21 Speaker_01
Whereas he thinks like, oh, we're repeating certain mistakes that he's seen in history. Like he can generalize in a way they can't. Uh, so the first one seems like the mistake, I think C.S.

01:57:29 Speaker_01
Lewis talks about in the, uh, one of the essays you cited where it's like, if you see through everything, it's like, you're, you're really blind. Right? Like if everything is transparent.

01:57:36 Speaker_02
I mean, I think there's kind of very little excuse for like not learning history or, or I don't know, or sorry. I mean, I'm not saying I like have learned enough history.

01:57:47 Speaker_02
I guess I feel like even when I try to channel some sort of vibe of like skepticism towards like great works, I think that doesn't generalize to like thinking it's not worth understanding human history.

01:57:58 Speaker_02
I think human history is like, you know, just so clearly, you know, crucial to kind of understand this is what, it's what's structured and created all of the stuff. And so, um, uh,

01:58:13 Speaker_02
There's an interesting question about what's the level of scale at which to do that, and how much should you be looking at details, looking at macro trends, and that's a dance. I do think it's nice for people to be like,

01:58:28 Speaker_02
um, at least attending to the kind of macro narrative. I think there's like a, there's some virtue in like having a worldview, like really like building a model of the whole thing, which I think sometimes gets lost in like, um, the details.

01:58:41 Speaker_02
Uh, and, um, but obviously, you know, the details are what the world is made of. And so if you don't have those, you don't have data at all. So, um,

01:58:52 Speaker_02
Yeah, it seems like there's some skill in like learning history well.

01:58:57 Speaker_01
This actually seems related to, you have a post on sincerity.

01:59:01 Speaker_01
And I think like, if I'm getting the sort of the vibe of the piece right, it's like, at least in the context of let's say intellectuals, certain intellectuals have a vibe of like shooting the shit. And they're just like trying out different ideas.

01:59:13 Speaker_01
How did these like, how did these analogies fit together? Maybe there's some, and those seem closer to the,

01:59:20 Speaker_01
I'm looking at the particulars and like, oh, this is just like that one time in the 15th century where they overthrew this king and they blah, blah, blah. Whereas this guy who was like, oh, here's a secular trend from like, if you look at

01:59:39 Speaker_01
the growth models for, like, a million years ago to now, it's like, here's what's happening. Um, that one has more of a sort of sincere flavor. Some people, especially when it comes to AI discourse, have a very

01:59:51 Speaker_01
um, the sincere mode of operating is like: I've thought through my bio anchors, and I like disagree with this premise, so here my effective compute estimate is different in this way, here's how I analyze the scaling laws. And if I could only have one person to help me guide my decisions on the AI, I might choose that person. But

02:00:13 Speaker_01
I feel like if I could choose between, if I had 10 different advisors at the same time, I might prefer the shooting the shit type characters who have these weird esoteric intellectual influences, and they're almost like random number generators.

02:00:28 Speaker_01
They're not especially calibrated, but once in a while they'll be like, oh, there's like one weird philosopher I care about, or this one historical event I'm obsessed with has a interesting perspective on this.

02:00:39 Speaker_01
And they tend to be more intellectually generative as well because they're not, I think one big part of it is that... if you are so sincere, you're like, Oh, I've like thought through this.

02:00:49 Speaker_01
Obviously ASI is the biggest thing that's happening right now. It like, doesn't really make sense to spend a bunch of your time thinking about like, how did the Comanches live? And what is the history of oil?

02:00:59 Speaker_01
And, like, um, how did, like, Girard think about conflict? You know, just like, what are you talking about? Like, come on, ASI is happening in a few years. Right.

02:01:07 Speaker_01
Whereas... and, but therefore, the people who go on these rabbit holes because they're just trying to shoot the shit, I feel like, are more generative.

02:01:15 Speaker_02
I mean, it might be worth distinguishing between something like intellectual seriousness and something like how diverse and wide ranging and idiosyncratic are the things you're interested in.

02:01:32 Speaker_02
And I think maybe there's some correlation for people who are kind of like, Or maybe intellectual seriousness is also distinguishable from something like shooting the shit. Like maybe you can shoot the shit seriously.

02:01:44 Speaker_02
I mean, there's a bunch of different ways to do this, but I think having an exposure to like all sorts of different sources of data and perspectives seems great.

02:01:50 Speaker_02
And I do think it's possible to like curate your kind of intellectual influences too rigidly in virtue of some story about what matters. Like I think it is good for people to like have space. I mean, I'm really a fan of, or I appreciate the way like,

02:02:08 Speaker_02
I don't know, I try to give myself space to do stuff that is not about like, this is the most important thing. And that's like feeding other parts of myself. And I think, you know, parts of yourself are not isolated. They like feed into each other.

02:02:19 Speaker_02
And it's sort of, I think, a better way to be a kind of richer and fuller human being in a bunch of ways. And also just like these sources of data can be just really directly relevant.

02:02:26 Speaker_02
And I think some people I know who I think of as like quite intellectually sincere, and in some sense quite focused on the big picture, also have a very impressive command of this very wide range of kind of empirical data.

02:02:37 Speaker_02
And they're like really, really interested in the empirical trends and they're not just like, oh, you know, it's a philosophy or, you know, sorry, it's not just like, oh, history, it's the march of reason or something.

02:02:44 Speaker_02
No, they're like really, they're really in the weeds. I think there's a kind of in the weeds, virtue that I actually think is like closely related in my head with, with some kind of seriousness and sincerity.

02:02:56 Speaker_02
Um, I do think there's a different dimension, which is, there's kind of trying to get it right. And then there's kind of throwing stuff out there, right? Like, what if it's like this, or try this on, or I have a hammer, I will hit everything.

02:03:08 Speaker_02
Well, what if I just hit everything with this hammer? Right. Um, and, and so I think some people do that and I think there is, you know, there's room for all kinds. Um, I kind of think the thing where you just get it right is kind of undervalued.

02:03:23 Speaker_02
Or, I mean, it depends on the context you're working in. I think like certain sorts of intellectual cultures and milieus and incentive systems, I think, incentivize

02:03:33 Speaker_02
you know, saying something new or saying something original or saying something like flashy or provocative or, um, and then like kind of various cultural and social dynamics and like, Oh, like, you know, and people are like doing all these like kind of, you know, kind of performative or statusy things.

02:03:46 Speaker_02
Like there's a bunch of stuff that goes on when people like do thinking and, um, you know, cool. But like if something's really important, let's just get it right. And I think, and sometimes it's like boring, but it doesn't matter.

02:04:02 Speaker_02
And I also think, like, stuff is less interesting if it's false, right? Like I think if someone's like, blah, and you're like, nope... I mean, it can be useful.

02:04:11 Speaker_02
I think sometimes there's, there's an interesting process where someone says like, blah, provocative thing. And it's, it's a kind of an epistemic project to be like, wait, why exactly do I think that's false, right?

02:04:24 Speaker_02
And you really, you know, someone's like, healthcare doesn't work, medical care does not work, right? Someone says that and you're like, all right, how exactly do I know that medical care works, right?

02:04:32 Speaker_02
And you like go through the process of trying to think it through. And so I think there's like room for that, but I think ultimately like kind of the real profundity is like, true, right?

02:04:46 Speaker_02
Or like kind of things, things become less interesting if they're just not true.

02:04:50 Speaker_02
And I think that's, I think sometimes it feels to me like people, or it's at least, it's at least possible, I think, to like lose, lose touch with that and to be more like flashy and, and it's kind of like, eh, this actually isn't, there's, there's not actually something here, right?

02:05:03 Speaker_01
One thing I've been thinking about recently after I interviewed Leopold was, or while prepping for it, listen, I haven't really thought at all about the fact that there's going to be a geopolitical angle to this AI thing.

02:05:15 Speaker_01
And it turns out if you actually think about the national security implications, that's a big deal.

02:05:20 Speaker_01
Now I wonder, given the fact that that was like something that wasn't on my radar right now, it's like, oh, obviously that's a crucial part of the picture. How many other things like that there must be?

02:05:28 Speaker_01
And so even if you're coming from the perspective of like AI is incredibly important, if you did happen to be the kind of person who's like, ah, you know, every once in a while I'm like checking out different kinds of, I'm like incredibly curious about what's happening in Beijing.

02:05:42 Speaker_01
And then the kind of thing that later on you realize was like, oh, this is a big deal. You have more awareness of, you can spot it in the first place.

02:05:50 Speaker_01
Whereas, I wonder, so maybe there's not an exact, there's not necessarily a trade off, like, it's sort of like the rational thing is to have some sort of really optimal explore exploit trade off here where you're like, constantly searching things out.

02:06:07 Speaker_01
So I don't know if practically that works out that well, but that experience made me think, like, oh, I really should be

02:06:15 Speaker_01
trying to expand my horizons in a way that's undirected to begin with, because there's a lot of different things about the world you have to understand to understand any one thing.
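
The explore-exploit tradeoff being gestured at here has a standard toy formalization, the multi-armed bandit. Below is a minimal epsilon-greedy sketch; the "arms" and their payoff numbers are invented stand-ins for lines of inquiry, not anything from the conversation, and the exploration rate is just an illustrative assumption.

```python
import random

# Toy epsilon-greedy bandit illustrating the explore/exploit shape discussed above.
# Arms and payoffs are made up for illustration.
ARMS = {
    "deep-dive on scaling laws": 1.0,
    "read about the Comanches": 0.3,
    "follow what's happening in Beijing": 0.8,
}

def pull(arm: str) -> float:
    """Noisy payoff from spending a unit of time on one topic."""
    return random.gauss(ARMS[arm], 0.5)

def epsilon_greedy(epsilon: float = 0.1, steps: int = 1000) -> dict:
    estimates = {arm: 0.0 for arm in ARMS}
    counts = {arm: 0 for arm in ARMS}
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.choice(list(ARMS))          # explore: undirected sampling
        else:
            arm = max(estimates, key=estimates.get)  # exploit: act on current best guess
        reward = pull(arm)
        counts[arm] += 1
        # incremental mean update of the estimated payoff for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates

print(epsilon_greedy())
```

The design point is only that a small fixed exploration rate keeps you occasionally sampling topics that currently look low-value, which is roughly the undirected horizon-expanding being described.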

02:06:23 Speaker_02
I mean, I think there's also room for division of labor, right? Like I think there can be, yeah, like, you know, there are people who are like trying to like draw a bunch of pieces and then be like, here's the overall picture.

02:06:32 Speaker_02
And then people who are going really deep on specific pieces, people who are doing the more like generative, throw things out there, see what sticks.

02:06:38 Speaker_02
So I think there, it also doesn't need to be that like all of the epistemic labor is like located in one brain. And you know, it depends like your role in the world and other things.

02:06:48 Speaker_01
So in your series, you express sympathy with the idea that, Even if an AI or I guess any sort of agent that doesn't have consciousness has a certain wish and is willing to pursue it nonviolently, we should respect its rights to pursue that.

02:07:10 Speaker_01
And I'm curious where that's coming from, because conventionally, I think, the thing matters because it's conscious, and its conscious experiences as a result of that pursuit matter.

02:07:23 Speaker_02
Well, I don't know. I mean, I think that I don't know where this discourse leads. I just, I, I'm like suspicious of the amount of like ongoing confusion that it seems to me is like present in our conception of consciousness.

02:07:33 Speaker_02
You know, I mean, so I sometimes think of it in analogy with, like, you know, people talk about life and élan vital, right. And maybe, you know, there's a world... you know, élan vital was this hypothesized life force.

02:07:44 Speaker_02
That is sort of the thing that was taken to explain life. And I think, you know, we don't really use that concept anymore. We think that's, like, a little bit broken.

02:07:52 Speaker_02
And so I don't think you want to have ended up in a position of saying, like, everything that doesn't have an élan vital doesn't matter or something, right? 'Cause then you end up, later...

02:08:03 Speaker_02
And then somewhat similarly, even if you're like, no, no, there's no such thing as an élan vital, but life, surely life exists. And I'm like, yeah, life exists. I think consciousness exists too, likely, depending on how we define the terms.

02:08:14 Speaker_02
I think it might be a kind of verbal question. even once you have a kind of reductionist conception of life, I think it's possible that it kind of becomes less attractive as a moral focal point, right?

02:08:28 Speaker_02
So like right now we really think of consciousness, we're like, it's a deep fact. It's like, so consider a question like, okay, so take a cellular automaton, right? That is sort of self-replicating.

02:08:40 Speaker_02
It has like some information that, you know, and you're like, okay, is that alive? Right? It's kind of like, eh, It's not that interesting. It's a kind of verbal question, right?

02:08:49 Speaker_02
Like, or I don't know, philosophers might get really into like, is that alive? But you're not missing anything about this system, right? It's not like there's no extra life that's like springing up.

02:08:57 Speaker_02
It's just like, it's alive in some senses, not alive in other senses. And I think if you, but I really think that's not how we intuitively think about consciousness. We think whether something is conscious is a deep fact.

02:09:09 Speaker_02
It's this, like, additional, really deep difference between being conscious or not. It's like, is someone home? Are the lights on? Right. And I have some concern that if that turns out not to be the case,

02:09:21 Speaker_02
then, then this is going to have been like a bad thing to like build our entire ethics around. And so now to be clear, I take consciousness really seriously.

02:09:30 Speaker_02
I'm like, I'm like, man, consciousness, I'm not one of these people like, oh, obviously consciousness doesn't exist or something. I'm like, but I also noticed how like confused I am and how dualistic my intuitions are.

02:09:39 Speaker_02
And I'm like, wow, this is really weird. And so I just have, like, error bars around this. Anyway, so there's a bunch of other things going on in my wanting to be open to not making consciousness a fully necessary criterion.

02:09:54 Speaker_02
I mean, clearly like I definitely have the intuition, like consciousness matters a ton.

02:09:57 Speaker_02
I think like if something is not conscious and there's like a deep difference between conscious and unconscious, then I'm like definitely have the intuition that it's sort of, there's something that matters especially a lot about consciousness.

02:10:05 Speaker_02
I'm not trying to be like dismissive about the notion of consciousness. I just think we should be like quite aware of how it seems to me how ongoingly confused we are about its nature.

02:10:15 Speaker_01
Okay. So suppose we figure out that consciousness is just like, a word we use for a hodgepodge of different things, only some of which encompass what we care about.

02:10:26 Speaker_01
Maybe there's other things we care about that are not included in that word, similar to the life force analogy, then Where do you anticipate that would leave us as far as ethics goes? Like, would then there be a next thing that's like consciousness?

02:10:43 Speaker_01
Or what do you anticipate that would look like?

02:10:46 Speaker_02
So there's a class of people who are called illusionists in philosophy of mind, who will say consciousness does not exist. And there are sort of... there are different ways to understand this view.

02:10:59 Speaker_02
But one version is to sort of say that the concept of consciousness has built into it too many preconditions that aren't met by the real world. So we should sort of chuck it out, like élan vital.

02:11:08 Speaker_02
Like instead of, the sort of proposal is kind of like, at least phenomenal consciousness, right? Or like qualia, or what it's like to be a thing.

02:11:17 Speaker_02
They'll just say, this is like sufficiently broken, sufficiently chock full of falsehoods that we should just not use it.

02:11:26 Speaker_02
I think there, it feels to me like, I am like, there's really clearly a thing, there's something going on with, you know, like I'm kind of really not.

02:11:38 Speaker_02
I kind of expect to, I do actually kind of expect to continue to care about something like consciousness quite a lot on reflection. Um, and to not, uh, kind of end up deciding that my ethics is like better, like doesn't make any reference to that.

02:11:52 Speaker_02
Or at least like there's some things like quite nearby to consciousness, you know, like when I, I stubbed my toe and I have this, like something happens when I stubbed my toe, unclear exactly what, how to name it, but I'm like something about that, you know, I'm like pretty focused on.

02:12:04 Speaker_02
And so I do think, um, you know, in some sense, if you feel like, well, where do things go? I'm like, I should be clear, I have a bunch of credence that in the end we end up caring about consciousness just directly.

02:12:17 Speaker_02
And so if we don't like, yeah, I mean, where will ethics go? Where will like a completed philosophy of mind go? Very hard, very hard to say. I mean, I can imagine something that's more like, I think, I mean, maybe a thing that I think a move that,

02:12:34 Speaker_02
people might make if you get a little bit less interested in the notion of consciousness is some sort of slightly more like animistic, like, so what's going on with the tree?

02:12:41 Speaker_02
And you're like, maybe not like talking about it as a conscious entity necessarily, but it's also not like, totally unaware or something.

02:12:50 Speaker_02
And like, so there's all this, like the consciousness discourse is rife with these funny cases where it's sort of like, oh, like those criteria imply that this, this totally weird entity would be conscious or something like that.

02:13:01 Speaker_02
Like, especially if you're interested in some notion of like agency or preferences, like a lot of things can be agents, corporation, you know, all sorts of things like corporations, consciousness, like, oh man.

02:13:09 Speaker_02
But I actually think, so one place it could go in theory is in some sense, you start to view the world as like animated by moral significance in richer and subtler structures than we're used to.

02:13:22 Speaker_02
And so plants or weird optimization processes or outflows of complex, I don't know, who knows exactly what you end up seeing as infused with the sort of thing that you ultimately care about. But I think it is possible that that doesn't map

02:13:39 Speaker_02
that that like includes a bunch of stuff that we don't normally ascribe consciousness to.

02:13:45 Speaker_01
I think that once you have a complete theory of mind, and presumably after that a more complete ethic, even the notion of a sort of reflective equilibrium implies, like, oh, you'll be done with it at some point, right?

02:14:00 Speaker_01
Like you just, you sum up all the numbers and, like, then you've got the thing you care about. This might be unrelated to the same sense we have in science.

02:14:11 Speaker_01
But also, I think like this, the vibe you get when you're talking about these kinds of questions, is that, oh, you know, we're like rushing through all the science right now.

02:14:21 Speaker_01
And we've been churning through it, it's getting harder to find because there's some, like, cap, like you find all the things at some point. Right now, it's super easy, because, like,

02:14:29 Speaker_01
a semi-intelligent species barely has emerged and the ASI will just rush through everything incredibly fast and like then you will either have aligned its heart or not

02:14:41 Speaker_01
In either case, it'll use what it's figured out about like what is really going on and then expand through the universe and exploit, you know, like do the tiling or maybe some more benevolent version of the quote unquote tiling.

02:14:54 Speaker_01
That feels like the basic picture of what's going on. We had dinner with Michael Nielsen a few months ago and his view is that this just keeps going forever or close to forever. How much would it change your,

02:15:08 Speaker_01
understanding of what's going to happen in the future if you were convinced that Nielsen is right about his picture of science?

02:15:15 Speaker_02
Yeah. I mean, I think there's a few different aspects.

02:15:16 Speaker_02
There's kind of my, my memory of this conversation, you know, I, I don't claim to really understand Michael's picture here, but I think my memory was it sort of like, sure, you get the, you get the fundamental laws.

02:15:29 Speaker_02
Like I think, my impression was that he expects sort of the kind of fundamental physics to get solved or something, maybe modulo, like, the expensiveness of certain experiments or something.

02:15:40 Speaker_02
But the difficulty is like, even granted that you have the kind of basic laws down, that still actually doesn't let you predict like where at the macro scale, like various useful technologies will be located.

02:15:52 Speaker_02
Like there's just still this like big search problem. And so my memory though, you know, I'll let him speak for himself on what his take is here.

02:16:00 Speaker_02
But my memory was, it was sort of like, sure, you get the fundamental stuff, but that doesn't mean you get the same tech. Um, you know, I'm not sure if that's true. I think if that's true, um, what kind of difference would it make?

02:16:11 Speaker_02
So one difference is that, uh, well, so here's a question. So like,

02:16:21 Speaker_02
It means in some sense you have to, in a more ongoing way, make trade-offs between investing in further knowledge and further exploration versus exploiting, as you say, sort of acting on your existing knowledge.

02:16:37 Speaker_02
Because you can't get to a point where you're like, and we're done. Now, as I think about it, I mean, I think that's, you know, I sort of suspect that was always true.

02:16:45 Speaker_02
And like, I remember talking to someone, I think I was like, ah, we should, at least in the future, we should really get like all the knowledge. And he's like, well, what do you want to like, you don't want to know the output of every Turing machine?

02:16:54 Speaker_02
Or like, you know, in some sense, there's a question of like, what actually would it be to have like a completed knowledge? And I think that's a rich question in its own right.

02:17:02 Speaker_02
And I think it's like, not necessarily that we should imagine, even in this sort of, on any picture necessarily, that you've got like everything. And on any picture, in some sense, you could end up with this case where You cap out.

02:17:15 Speaker_02
There's some collider that you can't build or whatever. Something is too expensive or whatever, and everyone caps out there.

02:17:22 Speaker_02
I guess one way to put it is there's a question of do you cap, and then there's a question of how contingent is the place you go.

02:17:33 Speaker_02
If it's contingent, I mean, one prediction that makes is you'll see more diversity across, uh, you know, our universe or something. If there are aliens, they might have, like, quite different tech.

02:17:43 Speaker_02
Um, and so maybe, you know, if people meet, you don't expect them to be like, ah, you got your thing, I got my version. Instead, it's like, whoa, look at that thing. Wow. So that's, like, one thing.

02:17:52 Speaker_02
Um, if you expect more like ongoing discovery of tech, then, You might also expect more ongoing change and upheaval and churn, insofar as technology is one thing that really drives change in civilization.

02:18:11 Speaker_02
People sometimes talk about lock-in, and they envision this point at which civilization has settled into some structure or equilibrium or something. And maybe you get less of that.

02:18:20 Speaker_02
I think that's maybe more about the pace rather than contingency or caps, but that's that's another factor. I don't know if it changes the picture fundamentally of Earth civilization.

02:18:33 Speaker_02
We still have to make trade-offs about how much to invest in research versus acting on our existing knowledge, but I think it has some significance.

02:18:41 Speaker_01
I think one vibe you get when you talk to people... We're at a party and somebody mentioned this. We're talking about how uncertain should we be of the future and they're like, There are three things I'm uncertain about. Like what is consciousness?

02:18:51 Speaker_01
What is information theory? And what are the basic laws of physics? I think once we get that, we're like, we're done. Yeah. And that's like, oh, you'll figure out what's the right kind of hedonium. And then like, you know, that it has that vibe.

02:19:03 Speaker_01
Whereas this like, oh, you like, you're like constantly churning through and it has more of a flavor of like, more of the becoming that, like the attunement picture implies.

02:19:16 Speaker_01
And I think it's more exciting, like it's not just like, oh, you figured out the things in the 21st century and then you just, you know what I mean?

02:19:26 Speaker_02
Yeah, I mean, I sometimes think about this sort of two categories of views about this. Like there are people who think like, yeah, like the knowledge, like we've almost, we're almost there and then we've like, basically got the picture, right? Uh...

02:19:38 Speaker_02
And where the picture is sort of like, yeah, the knowledge is all just totally sitting there. Yeah. And it's like, you just have to get to like remote, there's like this kind of, just you have to be like scientifically mature at all. That's right.

02:19:49 Speaker_02
And then it's just gonna all fall together, right? And then everything past that is gonna be like this super expensive, like not super important thing.

02:19:56 Speaker_02
And then there's a different picture, which is much more of this like ongoing mystery, like ongoing, like, oh man, there's like gonna be more and more, like maybe expect more radical revisions to our worldview. And I think it's an interesting, Uh.

02:20:09 Speaker_02
Yeah, I think, you know, I'm kind of drawn to both. Like physics, we're pretty good at physics, right? Or like a lot of our physics is like quite good at predicting a bunch of stuff. Or at least that's my impression.

02:20:20 Speaker_02
This is, you know, reading some physicists, so who knows. Your dad's a physicist though, right? Yeah, but this isn't coming from my dad. This is like, there's a blog post, I think Sean Carroll or something.

02:20:28 Speaker_02
And he's like, we really understand a lot of like the physics that governs the everyday world. Like a lot of it, we're like really good at it. And I'm like, oh, I think I'm generally pretty impressed by physics as a discipline.

02:20:36 Speaker_02
I think that could well be right. And so, you know, on the other hand, like, ah,

02:20:40 Speaker_02
You know, really these guys, you know, had a few centuries of, so anyway, but I think that's an interesting, and it leads to a different, I think it does, there's something, you know, the endless frontier, there is a draw to that from an aesthetic perspective of the idea of like continuing to discover stuff.

02:20:59 Speaker_02
You know, at the least, I think you don't, you can't get like full knowledge in some sense, because there's always like, what are you gonna do? Like there's some way in which you're part of the system. So it's not clear that you,

02:21:10 Speaker_02
The knowledge itself is part of the system and sort of like, I don't know, like if you imagine you're like, ah, you try to have full knowledge of like what the future of the universe will be like. Well, I don't know. I'm not totally sure that's true.

02:21:21 Speaker_02
It has a halting problem kind of property, right? There's a little bit of a loopiness if you're, I think there are probably like fixed points in that where you could be like, yep, I'm gonna do that. And then like, right.

02:21:32 Speaker_02
But I think it's, I at least have a question of like, are we, you know, when people imagine the kind of completion of knowledge, you know, exactly how well does that work?

02:21:42 Speaker_01
I'm not sure.

02:21:43 Speaker_01
You had a passage in your essay on utopia where, I think, the vibe was more of the... the positive future we're looking forward to, it will be more of like, ah... I'll let you describe what you meant, but to me, it felt more like the first stuff, like you get the thing and then now you've, like, found the heart of the...

02:22:06 Speaker_01
Maybe can I ask you to read that passage real quick? Oh, sure. And that way I'll spur the discussion I'm interested in having. This part in particular.

02:22:16 Speaker_02
Right. Quote, I'm inclined to think that Utopia, however weird, would also be in a certain sense recognizable.

02:22:24 Speaker_02
that if we really understood and experienced it, we would see in it the same thing that made us sit bolt upright long ago, when we first touched love, joy, beauty; that we would feel, in front of the bonfire, the heat of the ember from which it was lit.

02:22:41 Speaker_02
There would be, I think, a kind of remembering. Where does that fit into this picture? I think it's a good question. I mean, I, I think, um,

02:22:51 Speaker_02
I think it's like some guess about like, if there's like no part of me that recognizes it as good, then I think I'm not sure that it's good according to me in some sense.

02:23:09 Speaker_02
So yeah, I mean, it is a question of like what it takes for it to be the case that a part of you recognizes it as good. But I think if there's really none of that, then I'm not sure. it's a reflection of my values at all.

02:23:23 Speaker_01
There's a sort of tautological thing you can do where it's like, if I went through the processes which led to me discovering what's good, which we might call reflection, then it was good.

02:23:32 Speaker_01
But by definition, you ended up there because it was like, you know what I mean? Yeah.

02:23:36 Speaker_02
I mean, you definitely don't want to be like, you know, if you transform me into a paper clipper gradually, right, then I will eventually be like, and then I saw the light, you know, I saw the true paper clips. Yeah. Right.

02:23:46 Speaker_02
But that's part of what's, what's complicated about this thing about reflection. You have to find some way of differentiating between the sort of development processes that preserve what you care about and the development processes that don't.

02:24:00 Speaker_02
And that is in itself is this like fraught question, which itself requires like taking some stand on what you care about and what sorts of metaprocesses you endorse and all sorts of things.

02:24:11 Speaker_02
But you definitely shouldn't just be like, it is not a sufficient criteria that the thing at the end thinks it got it right, right? Right. Because that's compatible with having gone like wildly off the rails. Yeah, yeah, yeah.

02:24:20 Speaker_01
There was a very interesting sentence you had in your post, one of your posts where you said, our hearts have in fact been shaped by power. So we should not be at all surprised if the stuff we love is also powerful. Yeah, what's going on there?

02:24:42 Speaker_01
I actually want to think about, what did you mean there?

02:24:45 Speaker_02
Yeah, so the context on that post is I'm talking about this hazy cluster, which I call in the essay, niceness slash liberalism slash boundaries, which is this sort of somewhat more minimal set of cooperative norms involved in respecting the boundaries of others and kind of

02:25:03 Speaker_02
cooperation and peace amongst differences and tolerance and stuff like that, as opposed to your favored structure of matter, which is sometimes the paradigm of values that people use in the context of AI risk.

02:25:18 Speaker_02
And I talk for a while about the ethical virtues of these norms, but it's pretty clear that Also, why do we have these norms? Well, one important feature of these norms is that they're effective and powerful.

02:25:33 Speaker_02
Liberal societies and secure boundaries save resources wasted on conflict. And liberal societies are often more, like, they're better to live in, they're better to immigrate to, they're more productive, like all sorts of things.

02:25:46 Speaker_02
Nice people, they're better to interact with, they're better to like trade with, all sorts of things, right?

02:25:50 Speaker_02
And I think it's pretty clear if you look at both why, at a political level, we have various political institutions, and if you look kind of more deeply into our evolutionary past and how our moral cognition is structured, it seems pretty clear that various kind of forms of cooperation and kind of game-theoretic dynamics and other things went into

02:26:12 Speaker_02
kind of shaping what we now, at least in certain contexts, also treat as a kind of intrinsic or terminal value. So like,

02:26:22 Speaker_02
these... some of these values that have kind of instrumental functions in our society also get kind of reified in our cognition as kind of intrinsic values in themselves. And I think that's okay. I don't think that's a debunking.

02:26:33 Speaker_02
Like all your values are kind of, like, something that kind of stuck and got treated as terminally important. Um, but I think that means that, uh,

02:26:48 Speaker_02
you know, sometimes the way we... in the context of the series where I'm talking about, like, deep atheism and the relationship between what we're pushing for and what, like, nature is pushing for, or what sort of pure power will push for.

02:26:59 Speaker_02
Um, and it's easy to say, like, well, there's paperclips, which is just, like, one place you can steer, and, you know, pleasure is, like, another place you can steer or something. And these are,

02:27:11 Speaker_02
just sort of arbitrary directions, whereas I think some of our other values are much more structured around cooperation and things that also are kind of effective and functional and powerful.

02:27:25 Speaker_02
And so that's what I mean there, is I think there's a way in which nature is a little bit more on our side than you might think, because part of who we are has been made by nature's way. And so that is in us.

02:27:41 Speaker_02
Now, I don't think that's enough necessarily for us to beat the grey goo. We have some amount of power built into our values, but that doesn't mean it's going to be such that it's arbitrarily competitive.

02:27:53 Speaker_02
But I think it's still important to keep in mind, and I think it's important to keep in mind in the context of integrating AIs into our society.

02:28:01 Speaker_02
We've been talking a lot about the ethics of this, but I think there are also instrumental and practical reasons to want to have forms of social harmony and cooperation with AIs with different values. I think we need to be

02:28:15 Speaker_02
taking that seriously and thinking about what is it to do that in a way that's like genuinely kind of legitimate and kind of a project that is sort of a kind of just incorporation of these beings into our civilization such that they can kind of all, or sorry, there's like the justice part and there's also the kind of, is it like kind of,

02:28:32 Speaker_02
compatible with like people, you know, is it a good deal? Is it a good bargain for people?

02:28:36 Speaker_02
And I think this is, you know, this is often how... you know, to the extent we're kind of very concerned about AIs, like, kind of rebelling or something like that.

02:28:42 Speaker_02
It's like, well, there's like a lot of, you know, part of a thing you can do is make civilization better for someone. Right.

02:28:49 Speaker_02
So it's like, and I think that's, that's an important feature of how we have in fact structured a lot of, a lot of our political institutions and norms and stuff like that. So that's the thing I'm getting, getting at in that, in that quote.

02:29:03 Speaker_02
Okay, I think that's an excellent place to close. Great. Thank you so much.

02:29:06 Speaker_01
Joe, thanks so much for coming on the podcast. I mean, we discussed the ideas in the series. I think people might not appreciate, if they haven't read the series, how beautifully written it is. It's just like, the ideas, we didn't cover everything.

02:29:21 Speaker_01
There's a bunch of very, very interesting ideas. As somebody who has talked to people about AI for a while, things I haven't encountered anywhere else, but just, Obviously, no part of the AI discourse is nearly as well-written.

02:29:33 Speaker_01
And it is genuinely a beautiful experience to listen to the podcast version, which is in your own voice. So I highly recommend people do that. So it's joecarlsmith.com where they can access this. Joe, thanks so much for coming on the podcast.

02:29:48 Speaker_01
Thank you for having me. I really enjoyed it. Hey, everybody. I hope you enjoyed that episode with Joe. If you did, as always, it's helpful if you can send it to friends, group chats, Twitter, whoever else you think might enjoy it.

02:30:00 Speaker_01
And also, if you can leave a good rating on Apple podcast or wherever you listen, that's really helpful. Helps other people find the podcast.

02:30:07 Speaker_01
If you want transcripts of these episodes, or you want to get my blog posts, you can subscribe to my Substack at DwarkeshPatel.com. And finally, as you might've noticed, there's advertisements on this episode.

02:30:18 Speaker_01
So if you want to advertise on a future episode, you can learn more about doing that at DwarkeshPatel.com slash advertise or the link in the description. Anyways, I'll see you on the next one. Thanks.