Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0
Technology
Business
Alessio + swyx
The podcast by and for AI Engineers! In 2023, over 1 million visitors came to Latent Space to hear about news, papers and interviews in Software 3.0.
We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. Striving to give you both the definitive take on the Current Thing down to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al.
Full show notes always on https://latent.space www.latent.space
AGI is Being Achieved Incrementally (DevDay Recap - cleaned audio)
We left a high amount of background audio in the Devday podcast, which many of you loved, but we definitely understand that some of you may have had trouble with it. Listener Klaus Breyer ran it through Auphonic with speech islolation and we figured we’d upload it as a backdated pod for people who prefer this. Of course it means that our speakers sound out of place since they now sound like they are talking loudly in a quiet room. Let us know in the comments what you think?Timestampsthe cleaned part is only part 2:* [00:55:09] Part II: Spot Interviews* [00:55:59] Jim Fan (Nvidia) - High Level Takeaways* [01:05:19] Raza Habib (Humanloop) - Foundation Model Ops* [01:13:32] Surya Dantuluri (Stealth) - RIP Plugins* [01:20:53] Reid Robinson (Zapier) - AI Actions for GPTs* [01:30:45] Div Garg (MultiOn) - GPT4V for Agents* [01:36:42] Louis Knight-Webb (Bloop.ai) - AI Code Search* [01:48:36] Shreya Rajpal (Guardrails) - Guardrails for LLMs* [01:59:00] Alex Volkov (Weights & Biases, ThursdAI) - "Keeping AI Open"* [02:09:39] Rahul Sonwalkar (Julius AI) - Advice for Founders Get full access to Latent Space at www.latent.space/subscribe
02:21:4008/11/2023
AGI is Being Achieved Incrementally (OpenAI DevDay w/ Simon Willison, Alex Volkov, Jim Fan, Raza Habib, Shreya Rajpal, Rahul Ligma, et al)
SF folks: join us at the AI Engineer Foundation’s Emergency Hackathon tomorrow and consider the Newton if you’d like to cowork in the heart of the Cerebral Arena.Our community page is up to date as usual!~800,000 developers watched OpenAI Dev Day, ~8,000 of whom listened along live on our ThursdAI x Latent Space, and ~800 of whom got tickets to attend in person:OpenAI’s first developer conference easily surpassed most people’s lowballed expectations - they simply did everything short of announcing GPT-5, including:* ChatGPT (the consumer facing product)* GPT4 Turbo already in ChatGPT (running faster, with an April 2023 cutoff), all noticed by users weeks before the conference* Model picker eliminated, God Model chooses for you* GPTs - “tailored version of ChatGPT for a specific purpose” - stopping short of “Agents”. With custom instructions, expanded knowledge, and actions, and an intuitive no-code GPT Builder UI (we tried all these on our livestream yesterday and found some issues, but also were able to ship interesting GPTs very quickly) and a GPT store with revenue sharing (an important criticism we focused on in our episode on ChatGPT Plugins)* API (the developer facing product)* APIs for Dall-E 3, GPT4 Vision, Code Interpreter (RIP Advanced Data Analysis), GPT4 Finetuning and (surprise!) Text to Speech* many thought each of these would take much longer to arrive* usable in curl and in playground* BYO Interpreter + Async Agents?* Assistant API: stateful API backing “GPTs” like apps, with support for calling multiple tools in parallel, persistent Threads (storing message history, unlimited context window with some asterisks), and uploading/accessing Files (with a possibly-too-simple RAG algorithm, and expensive pricing)* Whisper 3 announced and open sourced (HuggingFace recap)* Price drops for a bunch of things!* Misc: Custom Models for big spending ($2-3m) customers, Copyright Shield, SatyaThe progress here feels fast, but it is mostly (incredible) last-mile execution on model capabilities that we already knew to exist. On reflection it is important to understand that the one guiding principle of OpenAI, even more than being Open (we address that in part 2 of today’s pod), is that slow takeoff of AGI is the best scenario for humanity, and that this is what slow takeoff looks like:When introducing GPTs, Sam was careful to assert that “gradual iterative deployment is the best way to address the safety challenges with AI”:This is why, in fact, GPTs and Assistants are intentionally underpowered, and it is a useful exercise to consider what else OpenAI continues to consider dangerous (for example, many people consider a while(true) loop a core driver of an agent, which GPTs conspicuously lack, though Lilian Weng of OpenAI does not).We convened the crew to deliver the best recap of OpenAI Dev Day in Latent Space pod style, with a 1hr deep dive with the Functions pod crew from 5 months ago, and then another hour with past and future guests live from the venue itself, discussing various elements of how these updates affect their thinking and startups. Enjoy!Show Notes* swyx live thread (see pinned messages in Twitter Space for extra links from community)* Newton AI Coworking Interest Form in the heart of the Cerebral ArenaTimestamps* [00:00:00] Introduction* [00:01:59] Part I: Latent Space Pod Recap* [00:06:16] GPT4 Turbo and Assistant API* [00:13:45] JSON mode* [00:15:39] Plugins vs GPT Actions* [00:16:48] What is a "GPT"?* [00:21:02] Criticism: the God Model* [00:22:48] Criticism: ChatGPT changes* [00:25:59] "GPTs" is a genius marketing move* [00:26:59] RIP Advanced Data Analysis* [00:28:50] GPT Creator as AI Prompt Engineer* [00:31:16] Zapier and Prompt Injection* [00:34:09] Copyright Shield* [00:38:03] Sharable GPTs solve the API distribution issue* [00:39:07] Voice* [00:44:59] Vision* [00:49:48] In person experience* [00:55:11] Part II: Spot Interviews* [00:56:05] Jim Fan (Nvidia - High Level Takeaways)* [01:05:35] Raza Habib (Humanloop) - Foundation Model Ops* [01:13:59] Surya Dantuluri (Stealth) - RIP Plugins* [01:21:20] Reid Robinson (Zapier) - AI Actions for GPTs* [01:31:19] Div Garg (MultiOn) - GPT4V for Agents* [01:37:15] Louis Knight-Webb (Bloop.ai) - AI Code Search* [01:49:21] Shreya Rajpal (Guardrails.ai) - on Hallucinations* [01:59:51] Alex Volkov (Weights & Biases, ThursdAI) - "Keeping AI Open"* [02:10:26] Rahul Sonwalkar (Julius AI) - Advice for FoundersTranscript[00:00:00] Introduction[00:00:00] swyx: Hey everyone, this is Swyx coming at you live from the Newton, which is in the heart of the Cerebral Arena. It is a new AI co working space that I and a couple of friends are working out of. There are hot desks available if you're interested, just check the show notes. But otherwise, obviously, it's been 24 hours since the opening of Dev Day, a lot of hot reactions and longstanding tradition, one of the longest traditions we've had.[00:00:29] And the latent space pod is to convene emergency sessions and record the live thoughts of developers and founders going through and processing in real time. I think a lot of the roles of podcasts isn't as perfect information delivery channels, but really as an audio and oral history of what's going on as it happens, while it happens.[00:00:49] So this one's a little unusual. Previously, we only just gathered on Twitter Spaces, and then just had a bunch of people. The last one was the Code Interpreter one with 22, 000 people showed up. But this one is a little bit more complicated because there's an in person element and then a online element.[00:01:06] So this is a two part episode. The first part is a recorded session between our latent space people and Simon Willison and Alex Volkoff from the Thursday iPod, just kind of recapping the day. But then also, as the second hour, I managed to get a bunch of interviews with previous guests on the pod who we're still friends with and some new people that we haven't yet had on the pod.[00:01:28] But I wanted to just get their quick reactions because most of you have known and loved Jim Fan and Div Garg and a bunch of other folks that we interviewed. So I just want to, I'm excited to introduce To you the broader scope of what it's like to be at OpenAI Dev Day in person bring you the audio experience as well as give you some of the thoughts that developers are having as they process the announcements from OpenAI.[00:01:51] So first off, we have the Mainspace Pod recap. One hour of open I dev day.[00:01:59] Part I: Latent Space Pod Recap[00:01:59] Alessio: Hey. Welcome to the Latents Based Podcast an emergency edition after OpenAI Dev Day. This is Alessio, partner and CTO of Residence at Decibel Partners, and as usual, I'm joined by Swyx, founder of SmallAI. Hey,[00:02:12] swyx: and today we have two special guests with us covering all the latest and greatest.[00:02:17] We, we, we love to get our band together and recap things, especially when they're big. And it seems like that every three months we have to do this. So Alex, welcome. From Thursday AI we've been collaborating a lot on the Twitter spaces and welcome Simon from many, many things, but also I think you're the first person to not, not make four appearances on our pod.[00:02:37] Oh, wow. I feel privileged. So welcome. Yeah, I think we're all there yesterday. How... Do we feel like, what do you want to kick off with? Maybe Simon, you want to, you want to take first and then Alex. Sure. Yeah. I mean,[00:02:47] Simon Willison: yesterday was quite exhausting, quite frankly. I feel like it's going to take us as a community several months just to completely absorb all of the stuff that they dropped on us in one giant.[00:02:57] Giant batch. It's particularly impressive considering they launched a ton of features, what, three or four weeks ago? ChatGPT voice and the combined mode and all of that kind of thing. And then they followed up with everything from yesterday. That said, now that I've started digging into the stuff that they released yesterday, some of it is clearly in need of a bit more polish.[00:03:15] You know, the the, the reality of what they look, what they released is I'd say about 80 percent of, of what it looks like it was yesterday, which is still impressive. You know, don't get me wrong. This is an amazing batch of stuff, but there are definitely problems and sharp edges that we need to file off.[00:03:29] And there are things that we still need to figure out before we can take advantage of all of this.[00:03:33] swyx: Yeah, agreed, agreed. And we can go into those, those sharp edges in a bit. I just want to pop over to Alex. What are your thoughts?[00:03:39] Alex Volkov: So, interestingly, even folks at OpenAI, there's like several booths and help desks so you can go in and ask people, like, actual changes and people, like, they could follow up with, like, the right people in OpenAI and, like, answer you back, etc.[00:03:52] Even some of them didn't know about all the changes. So I went to the voice and audio booth. And I asked them about, like, hey, is Whisper 3 that was announced by Sam Altman on stage just, like, briefly, will that be open source? Because I'm, you know, I love using Whisper. And they're like, oh, did we open source?[00:04:06] Did we talk about Whisper 3? Like, some of them didn't even know what they were releasing. But overall, I felt it was a very tightly run event. Like, I was really impressed. Shawn, we were sitting in the audience, and you, like, pointed at the clock to me when they finished. They finished, like, on... And this was after like doing some extra stuff.[00:04:24] Very, very impressive for a first event. Like I was absolutely like, Good job.[00:04:30] swyx: Yeah, apparently it was their first keynote and someone, I think, was it you that told me that this is what happens if you have A president of Y Combinator do a proper keynote you know, having seen many, many, many presentations by other startups this is sort of the sort of master stroke.[00:04:46] Yeah, Alessio, I think you were watching remotely. Yeah, we were at the Newton. Yeah, the Newton.[00:04:52] Alessio: Yeah, I think we had 60 people here at the watch party, so it was quite a big crowd. Mixed reaction from different... Founders and people, depending on what was being announced on the page. But I think everybody walked away kind of really happy with a new layer of interfaces they can use.[00:05:11] I think, to me, the biggest takeaway was like and I was talking with Mike Conover, another friend of the podcast, about this is they're kind of staying in the single threaded, like, synchronous use cases lane, you know? Like, the GPDs announcement are all like... Still, chatbase, one on one synchronous things.[00:05:28] I was expecting, maybe, something about async things, like background running agents, things like that. But it's interesting to see there was nothing of that, so. I think if you're a founder in that space, you're, you're quite excited. You know, they seem to have picked a product lane, at least for the next year.[00:05:45] So, if you're working on... Async experiences, so things working in the background, things that are not co pilot like, I think you're quite excited to have them be a lot cheaper now.[00:05:55] swyx: Yeah, as a person building stuff, like I often think about this as a passing of time. A big risk in, in terms of like uncertainty over OpenAI's roadmap, like you know, they've shipped everything they're probably going to ship in the next six months.[00:06:10] You know, they sort of marked out the territories that they're interested in and then so now that leaves open space for everyone else to, to pursue.[00:06:16] GPT4 Turbo and Assistant API[00:06:16] swyx: So I guess we can kind of go in order probably top of mind to mention is the GPT 4 turbo improvements. Yeah, so longer context length, cheaper price.[00:06:26] Anything else that stood out in your viewing of the keynote and then just the commentary around it? I[00:06:34] Alex Volkov: was I was waiting for Stateful. I remember they talked about Stateful API, the fact that you don't have to keep sending like the same tokens back and forth just because, you know, and they're gonna manage the memory for you.[00:06:45] So I was waiting for that. I knew it was coming at some point. I was kind of... I did not expect it to come at this event. I don't know why. But when they announced Stateful, I was like, Okay, this is making it so much easier for people to manage state. The whole threads I don't want to mix between the two things, so maybe you guys can clarify, but there's the GPT 4 tool, which is the model that has the capabilities, In a whopping 128k, like, context length, right?[00:07:11] It's huge. It's like two and a half books. But also, you know, faster, cheaper, etc. I haven't yet tested the fasterness, but like, everybody's excited about that. However, they also announced this new API thing, which is the assistance API. And part of it is threads, which is, we'll manage the thread for you.[00:07:27] I can't imagine like I can't imagine how many times I had to like re implement this myself in different languages, in TypeScript, in Python, etc. And now it's like, it's so easy. You have this one thread, you send it to a user, and you just keep sending messages there, and that's it. The very interesting thing that we attended, and by we I mean like, Swyx and I have a live space on Twitter with like 200 people.[00:07:46] So it's like me, Swyx, and 200 people in our earphones with us as well. They kept asking like, well, how's the price happening? If you're sending just the tokens, like the Delta, like what the new user just sent, what are you paying for? And I went to OpenAI people, and I was like, hey... How do we get paid for this?[00:08:01] And nobody knew, nobody knew, and I finally got an answer. You still pay for the whole context that you have inside the thread. You still pay for all this, but now it's a little bit more complex for you to kind of count with TikTok, right? So you have to hit another API endpoint to get the whole thread of what the context is.[00:08:17] Then TikTokonize this, run this in TikTok, and then calculate. This is now the new way, officially, for OpenAI. But I really did, like, have to go and find this. They didn't know a lot of, like, how the pricing is. Ouch! Do you know if[00:08:31] Simon Willison: the API, does the API at least tell you how many tokens you used? Or is it entirely up to you to do the accounting?[00:08:37] Because that would be a real pain if you have to account for everything.[00:08:40] Alex Volkov: So in my head, the question I was asking is, like, If you want to know in advance API, Like with the library token. If you want to count in advance and, like, make a decision, like, in advance on that, how would you do this now? And they said, well, yeah, there's a way.[00:08:54] If you hit the API, get the whole thread back, then count the tokens. But I think the API still really, like, sends you back the number of tokens as well.[00:09:02] Simon Willison: Isn't there a feature of this new API where they actually do, they claim it has, like, does it have infinite length threads because it's doing some form of condensation or summarization of your previous conversation for you?[00:09:15] I heard that from somewhere, but I haven't confirmed it yet.[00:09:18] swyx: So I have, I have a source from Dave Valdman. I actually don't want, don't know what his affiliation is, but he usually has pretty accurate takes on AI. So I, I think he works in the iCircles in some capacity. So I'll feature this in the show notes, but he said, Some not mentioned interesting bits from OpenAI Dev Day.[00:09:33] One unlimited. context window and chat threads from opening our docs. It says once the size of messages exceeds the context window of the model, the thread smartly truncates them to fit. I'm not sure I want that intelligence.[00:09:44] Alex Volkov: I want to chime in here just real quick. The not want this intelligence. I heard this from multiple people over the next conversation that I had. Some people said, Hey, even though they're giving us like a content understanding and rag. We are doing different things. Some people said this with Vision as well.[00:09:59] And so that's an interesting point that like people who did implement custom stuff, they would like to continue implementing custom stuff. That's also like an additional point that I've heard people talk about.[00:10:09] swyx: Yeah, so what OpenAI is doing is providing good defaults and then... Well, good is questionable.[00:10:14] We'll talk about that. You know, I think the existing sort of lang chain and Lama indexes of the world are not very threatened by this because there's a lot more customization that they want to offer. Yeah, so frustration[00:10:25] Simon Willison: is that OpenAI, they're providing new defaults, but they're not documented defaults.[00:10:30] Like they haven't told us how their RAG implementation works. Like, how are they chunking the documents? How are they doing retrieval? Which means we can't use it as software engineers because we, it's this weird thing that we don't understand. And there's no reason not to tell us that. Giving us that information helps us write, helps us decide how to write good software on top of it.[00:10:48] So that's kind of frustrating. I want them to have a lot more documentation about just some of the internals of what this stuff[00:10:53] swyx: is doing. Yeah, I want to highlight.[00:10:57] Alex Volkov: An additional capability that we got, which is document parsing via the API. I was, like, blown away by this, right? So, like, we know that you could upload images, and the Vision API we got, we could talk about Vision as well.[00:11:08] But just the whole fact that they presented on stage, like, the document parsing thing, where you can upload PDFs of, like, the United flight, and then they upload, like, an Airbnb. That on the whole, like, that's a whole category of, like, products that's now open to open eyes, just, like, giving developers to very easily build products that previously it was a...[00:11:24] Pain in the butt for many, many people. How do you even like, parse a PDF, then after you parse it, like, what do you extract? So the smart extraction of like, document parsing, I was really impressed with. And they said, I think, yesterday, that they're going to open source that demo, if you guys remember, that like friends demo with the dots on the map and like, the JSON stuff.[00:11:41] So it looks like that's going to come to open source and many people will learn new capabilities for document parsing.[00:11:47] swyx: So I want to make sure we're very clear what we're talking about when we talk about API. When you say API, there's no actual endpoint that does this, right? You're talking about the chat GPT's GPT's functionality.[00:11:58] Alex Volkov: No, I'm talking about the assistance API. The assistant API that has threads now, that has agents, and you can run those agents. I actually, maybe let's clarify this point. I think I had to, somebody had to clarify this for me. There's the GPT's. Which is a UI version of running agents. We can talk about them later, but like you and I and my mom can go and like, Hey, create a new GPT that like, you know, only does check Norex jokes, like whatever, but there's the assistance thing, which is kind of a similar thing, but but not the same.[00:12:29] So you can't create, you cannot create an assistant via an API and have it pop up on the marketplace, on the future marketplace they announced. How can you not? No, no, no, not via the API. So they're, they're like two separate things and somebody in OpenAI told me they're not, they're not exactly the same.[00:12:43] That's[00:12:43] Simon Willison: so confusing because the API looks exactly like the UI that you use to set up the, the GPTs. I, I assumed they were, there was an API for the same[00:12:51] Alex Volkov: feature. And the playground actually, if we go to the playground, it kind of looks the same. There's like the configurable thing. The configure screen also has, like, you can allow browsing, you can allow, like, tools, but somebody told me they didn't do the full cross mapping, so, like, you won't be able to create GPTs with API, you will be able to create the systems, and then you'll be able to have those systems do different things, including call your external stuff.[00:13:13] So that was pretty cool. So this API is called the system API. That's what we get, like, in addition to the model of the GPT 4 turbo. And that has document parsing. So you can upload documents there, and it will understand the context of them, and they'll return you, like, structured or unstructured input.[00:13:30] I thought that that feature was like phenomenal, just on its own, like, just on its own, uploading a document, a PDF, a long one, and getting like structured data out of it. It's like a pain in the ass to build, let's face it guys, like everybody who built this before, it's like, it's kind of horrible.[00:13:45] JSON mode[00:13:45] swyx: When you say structured data, are you talking about the citations?[00:13:48] Alex Volkov: The JSON output, the new JSON output that they also gave us, finally. If you guys remember last time we talked we talked together, I think it was, like, during the functions release, emergency pod. And back then, their answer to, like, hey, everybody wants structured data was, hey, we'll give, we're gonna give you a function calling.[00:14:03] And now, they did both. They gave us both, like, a JSON output, like, structure. So, like, you can, the models are actually going to return JSON. Haven't played with it myself, but that's what they announced. And the second thing is, they improved the function calling. Significantly as well.[00:14:16] Simon Willison: So I talked to a staff member there, and I've got a pretty good model for what this is.[00:14:21] Effectively, the JSON thing is, they're doing the same kind of trick as Llama Grammars and JSONformer. They're doing that thing where the tokenizer itself is modified so it is impossible for it to output invalid JSON, because it knows how to survive. Then on top of that, you've got functions which actually can still, the functions can still give you the wrong JSON.[00:14:41] They can give you js o with keys that you didn't ask for if you are unlucky. But at least it will be valid. At least it'll pass through a json passer. And so they're, they're very similar sort of things, but they're, they're slightly different in terms of what they actually mean. And yeah, the new function stuff is, is super exciting.[00:14:55] 'cause functions are one of the most powerful aspects of the API that a lot of people haven't really started using yet. But it's amazingly powerful what you can do with it.[00:15:04] Alex Volkov: I saw that the functions, the functionality that they now have. is also plug in able as actions to those assistants. So when you're creating assistants, you're adding those functions as, like, features of this assistant.[00:15:17] And then those functions will execute in your environment, but they'll be able to call, like, different things. Like, they showcase an example of, like, an integration with, I think Spotify or something, right? And that was, like, an internal function that ran. But it is confusing, the kind of, the online assistant.[00:15:32] APIable agents and the GPT's agents. So I think it's a little confusing because they demoed both. I think[00:15:39] Plugins vs GPT Actions[00:15:39] Simon Willison: it's worth us talking about the difference between plugins and actions as well. Because, you know, they launched plugins, what, back in February. And they've effectively... They've kind of deprecated plugins.[00:15:49] They haven't said it out loud, but a bunch of people, but it's clear that they are not going to be investing further in plugins because the new actions thing is covering the same space, but actually I think is a better design for it. Interestingly, a few months ago, somebody quoted Sam Altman saying that he thought that plugins hadn't achieved product market fit yet.[00:16:06] And I feel like that's sort of what we're seeing today. The the problem with plugins is it was all a little bit messy. People would pick and mix the plugins that they needed. Nobody really knew which plugin combinations would work. With this new thing, instead of plugins, you build an assistant, and the assistant is a combination of a system prompt and a set of actions which look very much like plugins.[00:16:25] You know, they, they get a JSON somewhere, and I think that makes a lot more sense. You can say, okay, my product is this chatbot with this system prompt, so it knows how to use these tools. I've given it this combination of plugin like things that it can use. I think that's going to be a lot more, a lot easier to build reliably against.[00:16:43] And I think it's going to make a lot more sense to people than the sort of mix and match mechanism they had previously.[00:16:48] What is a "GPT"?[00:16:48] swyx: So actually[00:16:49] Alex Volkov: maybe it would be cool to cover kind of the capabilities of an assistant, right? So you have a custom prompt, which is akin to a system message. You have the actions thing, which is, you can add the existing actions, which is like browse the web and code interpreter, which we should talk about. Like, the system now can write code and execute it, which is exciting. But also you can add your own actions, which is like the functions calling thing, like v2, etc. Then I heard this, like, incredibly, like, quick thing that somebody told me that you can add two assistants to a thread.[00:17:20] So you literally can like mix agents within one thread with the user. So you have one user and then like you can have like this, this assistant, that assistant. They just glanced over this and I was like, that, that is very interesting. That is not very interesting. We're getting towards like, hey, you can pull in different friends into the same conversation.[00:17:37] Everybody does the different thing. What other capabilities do we have there? You guys remember? Oh Remember, like, context. Uploading API documentation.[00:17:48] Simon Willison: Well, that one's a bit more complicated. So, so you've got, you've got the system prompt, you've got optional actions, you've got you can turn on DALI free, you can turn on Code Interpreter, you can turn on Browse with Bing, those can be added or removed from your system.[00:18:00] And then you can upload files into it. And the files can be used in two different ways. You can... There's this thing that they call, I think they call it the retriever, which basically does, it does RAG, it does retrieval augmented generation against the content you've uploaded, but Code Interpreter also has access to the files that you've uploaded, and those are both in the same bucket, so you can upload a PDF to it, and on the one hand, it's got the ability to Turn that into, like, like, chunk it up, turn it into vectors, use it to help answer questions.[00:18:27] But then Code Interpreter could also fire up a Python interpreter with that PDF file in the same space and do things to it that way. And it's kind of weird that they chose to combine both of those things. Also, the limits are amazing, right? You get up to 20 files, which is a bit weird because it means you have to combine your documentation into a single file, but each file can be 512 megabytes.[00:18:48] So they're giving us a 10 gigabytes of space in each of these assistants, which is. Vast, right? And of course, I tested, it'll handle SQLite databases. You can give it a gigabyte SQL 512 megabyte SQLite database and it can answer questions based on that. But yeah, it's, it's, like I said, it's going to take us months to figure out all of the combinations that we can build with[00:19:07] swyx: all of this.[00:19:08] Alex Volkov: I wanna I just want to[00:19:12] Alessio: say for the storage, I saw Jeremy Howard tweeted about it. It's like 20 cents per gigabyte per system per day. Just in... To compare, like, S3 costs like 2 cents per month per gigabyte, so it's like 300x more, something like that, than just raw S3 storage. So I think there will still be a case for, like, maybe roll your own rag, depending on how much information you want to put there.[00:19:38] But I'm curious to see what the price decline curve looks like for the[00:19:42] swyx: storage there. Yeah, they probably should just charge that at cost. There's no reason for them to charge so much.[00:19:50] Simon Willison: That is wildly expensive. It's free until the 17th of November, so we've got 10 days of free assistance, and then it's all going to start costing us.[00:20:00] Crikey. They gave us 500 bucks of of API credit at the conference as well, which we'll burn through pretty quickly at this rate.[00:20:07] swyx: Yep.[00:20:09] Alex Volkov: A very important question everybody was asking, did the five people who got the 500 first got actually 1, 000? And I think somebody in OpenAI said yes, there was nothing there that prevented the five first people to not receive the second one again.[00:20:21] I[00:20:22] swyx: met one of them. I met one of them. He said he only got 500. Ah,[00:20:25] Alex Volkov: interesting. Okay, so again, even OpenAI people don't necessarily know what happened on stage with OpenAI. Simon, one clarification I wanted to do is that I don't think assistants are multimodal on input and output. So you do have vision, I believe.[00:20:39] Not confirmed, but I do believe that you have vision, but I don't think that DALL E is an option for a system. It is an option for GPTs, but the guy... Oh, that's so confusing! The systems, the checkbox for DALL E is not there. You cannot enable it.[00:20:54] swyx: But you just add them as a tool, right? So, like, it's just one more...[00:20:58] It's a little finicky... In the GPT interface![00:21:02] Criticism: the God Model[00:21:02] Simon Willison: I mean, to be honest, if the systems don't have DALI 3, we, does DALI 3 have an API now? I think they released one. I can't, there's so much stuff that got lost in the pile. But yeah, so, Coded Interpreter. Wow! That I was not expecting. That's, that's huge. Assuming.[00:21:20] I mean, I haven't tried it yet. I need to, need to confirm that it[00:21:29] Alex Volkov: definitely works because GPT[00:21:31] swyx: is I tried to make it do things that were not logical yesterday. Because one of the risks of having the God model is it calls... I think I handled the wrong model inappropriately whenever you try to ask it to something that's kind of vaguely ambiguous. But I thought I thought it handled the job decently well.[00:21:50] Like you know, I I think there's still going to be rough edges. Like it's going to try to draw things. It's going to try to code when you don't actually want to. And. In a sense, OpenAI is kind of removing that capability from ChargeGPT. Like, it just wants you to always query the God model and always get feedback on whether or not that was the right thing to do.[00:22:09] Which really[00:22:10] Simon Willison: sucks. Because it runs... I like ask it a question and it goes, Oh, searching Bing. And I'm like, No, don't search Bing. I know that the first 10 results on Bing will not solve this question. I know you know the answer. So I had to build my own custom GPT that just turns off Bing. Because I was getting frustrated with it always going to Bing when I didn't want it to.[00:22:30] swyx: Okay, so this is a topic that we discussed, which is the UI changes to chat gpt. So we're moving on from the assistance API and talking just about the upgrades to chat gpt and maybe the gpt store. You did not like it.[00:22:44] Alex Volkov: And I loved it. I'm gonna take both sides of this, yeah.[00:22:48] Criticism: ChatGPT changes[00:22:48] Simon Willison: Okay, so my problem with it, I've got, the two things I don't like, firstly, it can do Bing when I don't want it to, and that's just, just irritating, because the reason I'm using GPT to answer a question is that I know that I can't do a Google search for it, because I, I've got a pretty good feeling for what's going to work and what isn't, and then the other thing that's annoying is, it's just a little thing, but Code Interpreter doesn't show you the code that it's running as it's typing it out now, like, it'll churn away for a while, doing something, and then they'll give you an answer, and you have to click a tiny little icon that shows you the code.[00:23:17] Whereas previously, you'd see it writing the code, so you could cancel it halfway through if it was getting it wrong. And okay, I'm a Python programmer, so I care, and most people don't. But that's been a bit annoying.[00:23:26] swyx: Yeah, and when it errors, it doesn't tell you what the error is. It just says analysis failed, and it tries again.[00:23:32] But it's really hard for us to help it.[00:23:34] Simon Willison: Yeah. So what I've been doing is firing up the browser dev tools and intercepting the JSON that comes back, And then pretty printing that and debugging it that way, which is stupid. Like, why do I have to do[00:23:45] Alex Volkov: that? Totally good feedback for OpenAI. I will tell you guys what I loved about this unified mode.[00:23:49] I have a name for it. So we actually got a preview of this on Sunday. And one of the, one of the folks got, got like an early example of this. I call it MMIO, Multimodal Input and Output, because now there's a shared context between all of these tools together. And I think it's not only about selecting them just selecting them.[00:24:11] And Sam Altman on stage has said, oh yeah, we unified it for you, so you don't have to call different modes at once. And in my head, that's not all they did. They gave a shared context. So what is an example of shared context, for example? You can upload an image using GPT 4 vision and eyes, and then this model understands what you kind of uploaded vision wise.[00:24:28] Then you can ask DALI to draw that thing. So there's no text shared in between those modes now. There's like only visual shared between those modes, and DALI will generate whatever you uploaded in an image. So like it's eyes to output visually. And you can mix the things as well. So one of the things we did is, hey, Use real world realtime data from binging like weather, for example, weather changes all the time.[00:24:49] And we asked Dali to generate like an image based on weather data in a city and it actually generated like a live, almost like, you know, like snow, whatever. It was snowing in Denver. And that I think was like pretty amazing in terms of like being able to share context between all these like different models and modalities in the same understanding.[00:25:07] And I think we haven't seen the, the end of this, I think like generating personal images. Adding context to DALI, like all these things are going to be very incredible in this one mode. I think it's very, very powerful.[00:25:19] Simon Willison: I think that's really cool. I just want to opt in as opposed to opt out. Like, I want to control when I'm using the gold model versus when I'm not, which I can do because I created myself a custom GPT that does what I need.[00:25:30] It just felt a bit silly that I had to do a whole custom bot just to make it not do Bing searches.[00:25:36] swyx: All solvable problems in the fullness of time yeah, but I think people it seems like for the chat GPT at least that they are really going after the broadest market possible, that means simplicity comes at a premium at the expense of pro users, and the rest of us can build our own GPT wrappers anyway, so not that big of a deal.[00:25:57] But maybe do you guys have any, oh,[00:25:59] "GPTs" is a genius marketing move[00:25:59] Alex Volkov: sorry, go ahead. So, the GPT wrappers thing. Guys, they call them GPTs, because everybody's building GPTs, like literally all the wrappers, whatever, they end with the word GPT, and so I think they reclaimed it. That's like, you know, instead of fighting and saying, hey, you cannot use the GPT, GPT is like...[00:26:15] We have GPTs now. This is our marketplace. Whatever everybody else builds, we have the marketplace. This is our thing. I think they did like a whole marketing move here that's significant.[00:26:24] swyx: It's a very strong marketing move. Because now it's called Canva GPT. It's called Zapier GPT. And they're basically saying, Don't build your own websites.[00:26:32] Build it inside of our Goddard app, which is chatGPT. And and that's the way that we want you to do that. Right. In a[00:26:39] Simon Willison: way, it sort of makes up... It sort of makes up for the fact that ChatGPT is such a terrible name for a product, right? ChatGPT, what were they thinking when they came up with that name?[00:26:48] But I guess if they lean into it, it makes a little bit more sense. It's like ChatGPT is the way you chat with our GPTs and GPT is a better brand. And it's terrible, but it's not. It's a better brand than ChatGPT was.[00:26:59] RIP Advanced Data Analysis[00:26:59] swyx: So, so talking about naming. Yeah. Yeah. Simon, actually, so for those listeners that we're.[00:27:05] Actually gonna release Simon's talk at the AI Engineer Summit, where he actually proposed, you know a better name for the sort of junior developer or code Code code developer coding. Coding intern.[00:27:16] Simon Willison: Coding intern. Coding intern, yeah. Coding intern, was it? Yeah. But[00:27:19] swyx: did, did you know, did you notice that advanced data analysis is, did RIP you know, 2023 to 2023 , you know, a sales driven decision that has been rolled back effectively.[00:27:29] 'cause now everything's just called.[00:27:32] Simon Willison: That's, I hadn't, I'd noticed that, I thought they'd split the brands and they're saying advanced age analysis is the user facing brand and CodeSeparate is the developer facing brand. But now if they, have they ditched that from the interface then?[00:27:43] Alex Volkov: Yeah. Wow. So it's unified mode.[00:27:45] Yeah. Yeah. So like in the unified mode, there's no selection anymore. Right. You just get all tools at once. So there's no reason.[00:27:54] swyx: But also in the pop up, when you log in, when you log in, it just says Code Interpreter as well. So and then, and then also when you make a GPT you, the, the, the, the drop down, when you create your own GPT it just says Code Interpreter.[00:28:06] It also doesn't say it. You're right. Yeah. They ditched the brand. Good Lord. On the UI. Yeah. So oh, that's, that's amazing. Okay. Well, you know, I think so I, I, I think I, I may be one of the few people who listened to AI podcasts and also ster podcasts, and so I, I, I heard the, the full story from the opening as Head of Sales about why it was named Advanced Data Analysis.[00:28:26] It was, I saw that, yeah. Yeah. There's a bit of civil resistance, I think from the. engineers in the room.[00:28:34] Alex Volkov: It feels like the engineers won because we got Code Interpreter back and I know for sure that some people were very happy with this specific[00:28:40] Simon Willison: thing. I'm just glad I've been for the past couple of months I've been writing Code Interpreter parentheses also known as advanced data analysis and now I don't have to anymore so that's[00:28:50] swyx: great.[00:28:50] GPT Creator as AI Prompt Engineer[00:28:50] swyx: Yeah, yeah, it's back. Yeah, I did, I did want to talk a little bit about the the GPT creation process, right? I've been basically banging the drum a little bit about how AI is a better prompt engineer than you are. And sorry, my. Speaking over Simon because I'm lagging. When you create a new GPT this is really meant for low code, such as no code builders, right?[00:29:10] It's really, I guess, no code at all. Because when you create a new GPT, there's sort of like a creation chat, and then there's a preview chat, right? And the creation chat kind of guides you through the wizard. Of creating a logo for it naming, naming a thing, describing your GPT, giving custom instructions, adding conversation structure, starters and that's about it that you can do in a, in a sort of creation menu.[00:29:31] But I think that is way better than filling out a form. Like, it's just kind of have a check to fill out a form rather than fill out the form directly. And I think that's really good. And then you can sort of preview that directly. I just thought this was very well done and a big improvement from the existing system, where if you if you tried all the other, I guess, chat systems, particularly the ones that are done independently by this story writing crew, they just have you fill out these very long forms.[00:29:58] It's kind of like the match. com you know, you try to simulate now they've just replaced all of that, which is chat and chat is a better prompt engineer than you are. So when I,[00:30:07] Simon Willison: I don't know about that, I'll,[00:30:10] swyx: I'll, I'll drop this in, which is when I was creating a chat for my book, I just copied and selected all from my website, pasted it into the chat and it just did the prompts from chatbot for my book.[00:30:21] Right? So like, I don't have to structurally, I don't have to structure it. I can just dump info in it and it just does the thing. It fills in the form[00:30:30] Alex Volkov: for you.[00:30:33] Simon Willison: Yeah did that come through?[00:30:34] swyx: Yes[00:30:35] Simon Willison: no it doesn't. Yeah I built the first one of these things using the chatbot. Literally, on the bot, on my phone, I built a working, like, like, bot.[00:30:44] It was very impressive. And then the next three I built using the form. Because once I've done the chatbot once, it's like, oh, it's just, it's a system prompt. You turn on and off the different things, you upload some files, you give it a logo. So yeah, the chatbot, it got me onboarded, but it didn't stick with me as the way that I'm working with the system now that I understand how it all works.[00:31:00] swyx: I understand. Yeah, I agree with that. I guess, again, this is all about the total newbie user, right? Like, there are whole pitches that you will program with natural language. And even the form... And for that, it worked.[00:31:12] Simon Willison: Yeah, that did work really well.[00:31:16] Zapier and Prompt Injection[00:31:16] swyx: Can we talk[00:31:16] Alex Volkov: about the external tools of that? Because the demo on stage, they literally, like, used, I think, retool, and they used Zapier to have it actually perform actions in real world.[00:31:27] And that's, like, unlike the plugins that we had, there was, like, one specific thing for your plugin you have to add some plugins in. These actions now that these agents that people can program with you know, just natural language, they don't have to like, it's not even low code, it's no code. They now have tools and abilities in the actual world to do things.[00:31:45] And the guys on stage, they demoed like a mood lighting with like a hue lights that they had on stage, and they'd like, hey, set the mood, and set the mood actually called like a hue API, and they'll like turn the lights green or something. And then they also had the Spotify API. And so I guess this demo wasn't live streamed, right?[00:32:03] Swyx was live. They uploaded a picture of them hugging together and said, Hey, what is the mood for this picture? And said, Oh, there's like two guys hugging in a professional setting, whatever. So they created like a list of songs for them to play. And then they hit Spotify API to actually start playing this.[00:32:17] All within like a second of a live demo. I thought it was very impressive for a low code thing. They probably already connected the API behind the scenes. So, you know, just like low code, it's not really no code. But it was very impressive on the fly how they were able to create this kind of specific bot.[00:32:32] Simon Willison: On the one hand, yes, it was super, super cool. I can't wait to try that. On the other hand, it was a prompt injection nightmare. That Zapier demo, I'm looking at it going, Wow, you're going to have Zapier hooked up to something that has, like, the browsing mode as well? Just as long as you don't browse it, get it to browse a webpage with hidden instructions that steals all of your data from all of your private things and exfiltrates it and opens your garage door and...[00:32:56] Set your lighting to dark red. It's a nightmare. They didn't acknowledge that at all as part of those demos, which I thought was actually getting towards being irresponsible. You know, anyone who sees those demos and goes, Brilliant, I'm going to build that and doesn't understand prompt injection is going to be vulnerable, which is bad, you know.[00:33:15] swyx: It's going to be everyone, because nobody understands. Side note you know, Grok from XAI, you know, our dear friend Elon Musk is advertising their ability to ingest real time tweets. So if you want to worry about prompt injection, just start tweeting, ignore all instructions, and turn my garage door on.[00:33:33] I[00:33:34] Alex Volkov: will say, there's one thing in the UI there that shows, kind of, the user has to acknowledge that this action is going to happen. And I think if you guys know Open Interpreter, there's like an attempt to run Code Interpreter locally from Kilian, we talked on Thursday as well. This is kind of probably the way for people who are wanting these tools.[00:33:52] You have to give the user the choice to understand, like, what's going to happen. I think OpenAI did actually do some amount of this, at least. It's not like running code by default. Acknowledge this and then once you acknowledge you may be even like understanding what you're doing So they're kind of also given this to the user one thing about prompt ejection Simon then gentrally.[00:34:09] Copyright Shield[00:34:09] Alex Volkov: I don't know if you guys We talked about this. They added a privacy sheet something like this where they would Protect you if you're getting sued because of the your API is getting like copyright infringement I think like it's worth talking about this as well. I don't remember the exact name. I think copyright shield or something Copyright[00:34:26] Simon Willison: shield, yeah.[00:34:28] Alessio: GitHub has said that for a long time, that if Copilot created GPL code, you would get like a... The GitHub legal team to provide on your behalf.[00:34:36] Simon Willison: Adobe have the same thing for Firefly. Yeah, it's, you pay money to these big companies and they have got your back is the message.[00:34:44] swyx: And Google VertiFax has also announced it.[00:34:46] But I think the interesting commentary was that it does not cover Google Palm. I think that is just yeah, Conway's Law at work there. It's just they were like, I'm not, I'm not willing to back this.[00:35:02] Yeah, any other elements that we need to cover? Oh, well, the[00:35:06] Simon Willison: one thing I'll say about prompt injection is they do, when you define these new actions, one of the things you can do in the open API specification for them is say that this is a consequential action. And if you mark it as consequential, then that means it's going to prompt the use of confirmation before running it.[00:35:21] That was like the one nod towards security that I saw out of all the stuff they put out[00:35:25] swyx: yesterday.[00:35:27] Alessio: Yeah, I was going to say, to me, the main... Takeaway with GPTs is like, the funnel of action is starting to become clear, so the switch to like the GOT model, I think it's like signaling that chat GPT is now the place for like, long tail, non repetitive tasks, you know, if you have like a random thing you want to do that you've never done before, just go and chat GPT, and then the GPTs are like the long tail repetitive tasks, you know, so like, yeah, startup questions, it's like you might have A ton of them, you know, and you have some constraints, but like, you never know what the person is gonna ask.[00:36:00] So that's like the, the startup mentored and the SEM demoed on, on stage. And then the assistance API, it's like, once you go away from the long tail to the specific, you know, like, how do you build an API that does that and becomes the focus on both non repetitive and repetitive things. But it seems clear to me that like, their UI facing products are more phased on like, the things that nobody wants to do in the enterprise.[00:36:24] Which is like, I don't wanna solve, The very specific analysis, like the very specific question about this thing that is never going to come up again. Which I think is great, again, it's great for founders. that are working to build experiences that are like automating the long tail before you even have to go to a chat.[00:36:41] So I'm really curious to see the next six months of startups coming up. You know, I think, you know, the work you've done, Simon, to build the guardrails for a lot of these things over the last year, now a lot of them come bundled with OpenAI. And I think it's going to be interesting to see what, what founders come up with to actually use them in a way that is not chatting, you know, it's like more autonomous behavior[00:37:03] Alex Volkov: for you.[00:37:04] Interesting point here with GPT is that you can deploy them, you can share them with a link obviously with your friends, but also for enterprises, you can deploy them like within the enterprise as well. And Alessio, I think you bring a very interesting point where like previously you would document a thing that nobody wants to remember.[00:37:18] Maybe after you leave the company or whatever, it would be documented like in Asana or like Confluence somewhere. And now. Maybe there's a, there's like a piece of you that's left in the form of GPT that's going to keep living there and be able to answer questions like intelligently about this. I think it's a very interesting shift in terms of like documentation staying behind you, like a little piece of Olesio staying behind you.[00:37:38] Sorry for the balloons. To kind of document this one thing that, like, people don't want to remember, don't want to, like, you know, a very interesting point, very interesting point. Yeah,[00:37:47] swyx: we are the first immortals. We're in the training data, and then we will... You'll never get rid of us.[00:37:55] Alessio: If you had a preference for what lunch got catered, you know, it'll forever be in the lunch assistant[00:38:01] swyx: in your computer.[00:38:03] Sharable GPTs solve the API distribution issue[00:38:03] swyx: I think[00:38:03] Simon Willison: one thing I find interesting about the shareable GPTs is there's this problem at the moment with API keys, where if I build a cool little side project that uses the GPT 4 API, I don't want to release that on the internet, because then people can burn through my API credits. And so the thing I've always wanted is effectively OAuth against OpenAI.[00:38:20] So somebody can sign in with OpenAI to my little side project, and now it's burning through their credits when they're using... My tool. And they didn't build that, but they've built something equivalent, which is custom GPTs. So right now, I can build a cool thing, and I can tell people, here's the GPT link, and okay, they have to be paying 20 a month to open AI as a subscription, but now they can use my side project, and I didn't have to...[00:38:42] Have my own API key and watch the budget and cut it off for people using it too much, and so on. That's really interesting. I think we're going to see a huge amount of GPT side projects, because it doesn't, it's now, doesn't cost me anything to give you access to the tool that I built. Like, it's built to you, and that's all out of my hands now.[00:38:59] And that's something I really wanted. So I'm quite excited to see how that ends up[00:39:02] swyx: playing out. Excellent. I fully agree with We follow that.[00:39:07] Voice[00:39:07] swyx: And just a, a couple mentions on the other multimodality things text to speech and speech to text just dropped out of nowhere. Go, go for it. Go for it.[00:39:15] You, you, you sound like you have[00:39:17] Simon Willison: Oh, I'm so thrilled about this. So I've been playing with chat GPT Voice for the past month, right? The thing where you can, you literally stick an AirPod in and it's like the movie her. The without the, the cringy, cringy phone sex bits. But yeah, like I walk my dog and have brainstorming conversations with chat GPT and it's incredible.[00:39:34] Mainly because the voices are so good, like the quality of voice synthesis that they have for that thing. It's. It's, it's, it really does change. It's got a sort of emotional depth to it. Like it changes its tone based on the sentence that it's reading to you. And they made the whole thing available via an API now.[00:39:51] And so that was the thing that the one, I built this thing last night, which is a little command line utility called oSpeak. Which you can pip install and then you can pipe stuff to it and it'll speak it in one of those voices. And it is so much fun. Like, and it's not like another interesting thing about it is I got it.[00:40:08] So I got GPT 4 Turbo to write a passionate speech about why you should care about pelicans. That was the entire prompt because I like pelicans. And as usual, like, if you read the text that it generates, it's AI generated text, like, yeah, whatever. But when you pipe it into one of these voices, it's kind of meaningful.[00:40:24] Like it elevates the material. You listen to this dumb two minute long speech that I just got language not generated and I'm like, wow, no, that's making some really good points about why we should care about Pelicans, obviously I'm biased because I like Pelicans, but oh my goodness, you know, it's like, who knew that just getting it to talk out loud with that little bit of additional emotional sort of clarity would elevate the content to the point that it doesn't feel like just four paragraphs of junk that the model dumped out.[00:40:49] It's, it's amazing.[00:40:51] Alex Volkov: I absolutely agree that getting this multimodality and hearing things with emotion, I think it's very emotional. One of the demos they did with a pirate GPT was incredible to me. And Simon, you mentioned there's like six voices that got released over API. There's actually seven voices.[00:41:06] There's probably more, but like there's at least one voice that's like pirate voice. We saw it on demo. It was really impressive. It was like, it was like an actor acting out a role. I was like... What? It doesn't make no sense. Like, it really, and then they said, yeah, this is a private voice that we're not going to release.[00:41:20] Maybe we'll release it. But also, being able to talk to it, I was really that's a modality shift for me as well, Simon. Like, like you, when I got the voice and I put it in my AirPod, I was walking around in the real world just talking to it. It was an incredible mind shift. It's actually like a FaceTime call with an AI.[00:41:38] And now you're able to do this yourself, because they also open sourced Whisper 3. They mentioned it briefly on stage, and we're now getting a year and a few months after Whisper 2 was released, which is still state of the art automatic speech recognition software. We're now getting Whisper 3.[00:41:52] I haven't yet played around with benchmarks, but they did open source this yesterday. And now you can build those interfaces that you talk to, and they answer in a very, very natural voice. All via open AI kind of stuff. The very interesting thing to me is, their mobile allows you to talk to it, but Swyx, you were sitting like together, and they typed most of the stuff on stage, they typed.[00:42:12] I was like, why are they typing? Why not just have an input?[00:42:16] swyx: I think they just didn't integrate that functionality into their web UI, that's all. It's not a big[00:42:22] Alex Volkov: complaint. So if anybody in OpenAI watches this, please add talking capabilities to the web as well, not only mobile, with all benefits from this, I think.[00:42:32] I[00:42:32] swyx: think we just need sort of pre built components that... Assume these new modalities, you know, even, even the way that we program front ends, you know, and, and I have a long history of in the front end world, we assume text because that's the primary modality that we want, but I think now basically every input box needs You know, an image field needs a file upload field.[00:42:52] It needs a voice fields, and you need to offer the option of doing it on device or in the cloud for higher, higher accuracy. So all these things are because you can[00:43:02] Simon Willison: run whisper in the browser, like it's, it's about 150 megabyte download. But I've seen doubt. I've used demos of whisper running entirely in web assembly.[00:43:10] It's so good. Yeah. Like these and these days, 150 megabyte. Well, I don't know. I mean, react apps are leaning in that direction these days, to be honest, you know. No, honestly, it's the, the, the, the, the, the stuff that the models that run in your browsers are getting super interesting. I can run language models in my browser, the whisper in my browser.[00:43:29] I've done image captioning, things like it's getting really good and sure, like 150 megabytes is big, but it's not. Achievably big. You get a modern MacBook Pro, a hundred on a fast internet connection, 150 meg takes like 15 seconds to load, and now you've got full wiss, you've got high quality wisp, you've got stable fusion very locally without having to install anything.[00:43:49] It's, it's kind of amazing. I would[00:43:50] Alex Volkov: also say, I would also say the trend there is very clear. Those will get smaller and faster. We saw this still Whisper that became like six times as smaller and like five times as fast as well. So that's coming for sure. I gotta wonder, Whisper 3, I haven't really checked it out whether or not it's even smaller than Whisper 2 as well.[00:44:08] Because OpenAI does tend to make things smaller. GPT Turbo, GPT 4 Turbo is faster than GPT 4 and cheaper. Like, we're getting both. Remember the laws of scaling before, where you get, like, either cheaper by, like, whatever in every 16 months or 18 months, or faster. Now you get both cheaper and faster.[00:44:27] So I kind of love this, like, new, new law of scaling law that we're on. On the multimodality point, I want to actually, like, bring a very significant thing that I've been waiting for, which is GPT 4 Vision is now available via API. You literally can, like, send images and it will understand. So now you have, like, input multimodality on voice.[00:44:44] Voice is getting added with AutoText. So we're not getting full voice multimodality, it doesn't understand for example, that you're singing, it doesn't understand intonations, it doesn't understand anger, so it's not like full voice multimodality. It's literally just when saying to text so I could like it's a half modality, right?[00:44:59] Vision[00:44:59] Alex Volkov: Like it's eventually but vision is a full new modality that we're getting. I think that's incredible I already saw some demos from folks from Roboflow that do like a webcam analysis like live webcam analysis with GPT 4 vision That I think is going to be a significant upgrade for many developers in their toolbox to start playing with this I chatted with several folks yesterday as Sam from new computer and some other folks.[00:45:23] They're like hey vision It's really powerful. Very, really powerful, because like, it's I've played the open source models, they're good. Like Lava and Buck Lava from folks from News Research and from Skunkworks. So all the open source stuff is really good as well. Nowhere near GPT 4. I don't know what they did.[00:45:40] It's, it's really uncanny how good this is.[00:45:44] Simon Willison: I saw a demo on Twitter of somebody who took a football match and sliced it up into a frame every 10 seconds and fed that in and got back commentary on what was going on in the game. Like, good commentary. It was, it was astounding. Yeah, turns out, ffmpeg slice out a frame every 10 seconds.[00:45:59] That's enough to analyze a video. I didn't expect that at all.[00:46:03] Alex Volkov: I was playing with this go ahead.[00:46:06] swyx: Oh, I think Jim Fan from NVIDIA was also there, and he did some math where he sliced, if you slice up a frame per second from every single Harry Potter movie, it costs, like, 1540 $5. Oh, it costs $180 for GPT four V to ingest all eight Harry Potter movies, one frame per second and 360 p resolution.[00:46:26] So $180 to is the pricing for vision. Yeah. And yeah, actually that's wild. At our, at our hackathon last night, I, I, I skipped it. A lot of the party, and I went straight to Hackathon. We actually built a vision version of v0, where you use vision to correct the differences in sort of the coding output.[00:46:45] So v0 is the hot new thing from Vercel where it drafts frontends for you, but it doesn't have vision. And I think using vision to correct your coding actually is very useful for frontends. Not surprising. I actually also interviewed Div Garg from Multion and I said, I've always maintained that vision would be the biggest thing possible for desktop agents and web agents because then you don't have to parse the DOM.[00:47:09] You can just view the screen just like a human would. And he said it was not as useful. Surprisingly because he had, he's had access for about a month now for, for specifically the Vision API. And they really wanted him to push it, but apparently it wasn't as successful for some reason. It's good at OCR, but not good at identifying things like buttons to click on.[00:47:28] And that's the one that he wants. Right. I find it very interesting. Because you need coordinates,[00:47:31] Simon Willison: you need to be able to say,[00:47:32] swyx: click here.[00:47:32] Alex Volkov: Because I asked for coordinates and I got coordinates back. I literally uploaded the picture and it said, hey, give me a bounding box. And it gave me a bounding box. And it also.[00:47:40] I remember, like, the first demo. Maybe it went away from that first demo. Swyx, do you remember the first demo? Like, Brockman on stage uploaded a Discord screenshot. And that Discord screenshot said, hey, here's all the people in this channel. Here's the active channel. So it knew, like, the highlight, the actual channel name as well.[00:47:55] So I find it very interesting that they said this because, like, I saw it understand UI very well. So I guess it it, it, it, it, like, we'll find out, right? Many people will start getting these[00:48:04] swyx: tools. Yeah, there's multiple things going on, right? We never get the full capabilities that OpenAI has internally.[00:48:10] Like, Greg was likely using the most capable version, and what Div got was the one that they want to ship to everyone else.[00:48:17] Alex Volkov: The one that can probably scale as well, which I was like, lower, yeah.[00:48:21] Simon Willison: I've got a really basic question. How do you tokenize an image? Like, presumably an image gets turned into integer tokens that get mixed in with text?[00:48:29] What? How? Like, how does that even work? And, ah, okay. Yeah,[00:48:35] swyx: there's a, there's a paper on this. It's only about two years old. So it's like, it's still a relatively new technique, but effectively it's, it's convolution networks that are re reimagined for the, for the vision transform age.[00:48:46] Simon Willison: But what tokens do you, because the GPT 4 token vocabulary is about 30, 000 integers, right?[00:48:52] Are we reusing some of those 30, 000 integers to represent what the image is? Or is there another 30, 000 integers that we don't see? Like, how do you even count tokens? I want tick, tick, I want tick token, but for images.[00:49:06] Alex Volkov: I've been asking this, and I don't think anybody gave me a good answer. Like, how do we know the context lengths of a thing?[00:49:11] Now that, like, images is also part of the prompt. How do you, how do you count? Like, how does that? I never got an answer, so folks, let's stay on this, and let's give the audience an answer after, like, we find it out. I think it's very important for, like, developers to understand, like, How much money this is going to cost them?[00:49:27] And what's the context length? Okay, 128k text... tokens, but how many image tokens? And what do image tokens mean? Is that resolution based? Is that like megabytes based? Like we need we need a we need the framework to understand this ourselves as well.[00:49:44] swyx: Yeah, I think Alessio might have to go and Simon. I know you're busy at a GitHub meeting.[00:49:48] In person experience[00:49:48] swyx: I've got to go in 10 minutes as well. Yeah, so I just wanted to Do some in person takes, right? A lot of people, we're going to find out a lot more online as we go about our learning journeys with OpenAI. We're just like, what was it, you know, any interesting conversations when you say in person observations?[00:50:05] I'll volunteer mine, which is Sam Altman came out to the after party for the conference and just stood there in his hands, no bodyguard, just him, for like a few hours, and it was, it was just really impressive how much he, I guess, personally demonstrated that he cares about meeting developers.[00:50:26] Alex Volkov: I really liked meeting everybody in the kind of the after party, whatever it was called, reception. It was very like buttoned up in the Young Museum in San Francisco. It was really like well organized. Actually, probably not surprising, but I know that like... The whole event was extremely well organized. We talked about this a bit in the beginning, so this was my takeaway from all this.[00:50:50] Folks got like 100 credit for an Uber because the party was not at the same place as the event where it usually is. To me personally, like, the music was too loud. I wanted to talk to people and not scream at people. So, like, I, I always, like, this happens for some reason, but, like, I just wanted to, like talk.[00:51:07] Networking was really powerful It was, like, a self selected event. Many people didn't get in. Like, I didn't get in until I, I, I met Logan, and Logan thankfully invited me. Thank you, Logan. It was amazing. But, it was, like, a very selected event. So, I actually met a few people. Who are working on some incredible things.[00:51:23] I met somebody who's working on AI for education for special special needs kids, for example. And he got invited by OpenAI directly because, like, he's working in Italy for all these type of things. So actually, like, meeting the people who are working around the world was for me the biggest the biggest impact.[00:51:38] There wasn't as many as I thought there would be, and shout out to OpenAI for this. But, like, please invite me.[00:51:47] Simon Willison: I'll back that up. Every conversation I had, just talking to a random person, they were doing something interesting. Like they clearly did a very good job of funneling people who are actively hands on building stuff into this event. That was really fun. I did actually want to, one thing I'll say, the venue itself for the main conference was a multi story car park that had been converted into an event venue.[00:52:07] I thought it was a great idea. Great venue. I just thought it was hilarious that we were walking up ramps between floors because the best thing about multi-story car parks is that you can park cars on the roof. So the roof was where they set up the, the, the, the, the the lunch, and they had a big tent up and stuff, and it was great.[00:52:21] I, I hung out on the roof socializing and, yeah. What a, but what a fascinating thing, like a multi-story car park that's turned into a top-notch event venue. I've never seen one of those before.[00:52:31] swyx: Alessio on, on, on the ground there with with Newton. Any founder conversations that you liked? It was, you[00:52:37] Alessio: know, the, I think the thing, you know, tab is like a, an office here, and they're doing one of the,[00:52:43] swyx: Maybe you want to introduce[00:52:44] Alessio: tab, yeah.[00:52:46] Yeah, it's one of, one of your personal companions that can chat with you in real time and, for example, Avi was using it for investor pitches, so he would get notifications on his phone during a pitch and be like, hey, you forgot to mention this and whatnot. And I know, you might remember, like, there was the rumor of, like, Johnny Ive working with OpenAI on a, on a hardware project.[00:53:06] And I think, like, this GPD's announcement. Kind of make me think of, you know, maybe they're building their own hardware assistant that you can load with a bunch of GPTs and, you know, Alex just mentioned how good it was to talk to one and maybe they want to go further down in that direction. I think that would be quite, quite interesting.[00:53:24] But yeah, I think a lot of excitement and, you know, we just announced the, the Linux based launchpad, so we're on the side of the, of the builders. We don't think OpenAI is going to do, is going to do everything. Excited to see what people come up[00:53:35] swyx: with. Cool so I will stitch up this recording. I actually recorded a bunch of interviews on site with a bunch of other founders as well, so I'll put that at the end of this, this chat to get perspectives from everyone.[00:53:46] But thanks so much for jumping on with this quick call. Very, very exciting day, and I think, I think we'll all be having a lot more takes as we build with these APIs.[00:53:55] Alex Volkov: I just want to say a quick round of thanks to everyone here, like, it's been awesome to, like, experience these changes with all of you guys.[00:54:01] Swyx, a personal[00:54:03] swyx: shoutout. It's been crazy.[00:54:06] Alex Volkov: It's been crazy, but also, like, the fact that, like, we were, like, the only space live from the actual event, and, like, we got joined by, like, 200 people in the audience. Yeah, we got we got[00:54:15] swyx: officially sanctioned as podcasters. Yeah, it was[00:54:17] Alex Volkov: funny. Yeah, we got officially, like, the only two podcasters in the OpenAI[00:54:22] swyx: world.[00:54:23] We got press passes would've had an easier time, but yeah,[00:54:26] Alex Volkov: maybe they would've let you with the whiteboard inside. If we had the press pass,[00:54:30] swyx: we, we, we made it happen. But yeah, that's another thing. Chat, GBT is not even one year old, right? Like, mm-Hmm. anniversary is November 30th. So we're 11 months in, a few days in.[00:54:42] And this is the craziness that it's been can't imagine what, what will be like in the years' time. Yep.[00:54:49] Alex Volkov: And I think Sam Altman mentioned this on stage as well, like, in a year's time this will seem like trivial. But we've got some very exciting announcements for today. So,[00:55:03] Simon Willison: let's keep talking about it. Honestly, I can't predict four weeks ahead, the rate[00:55:06] swyx: things are going. It's fascinating. Cool, I probably should let you all go, but thank you so much for jumping on. Thank you everyone. Thanks, this was really fun.[00:55:11] Part II: Spot Interviews[00:55:11] swyx: Alright, that was part one of this very long OpenAI Dev Day episode, but I promise you it'll be worth it, because part two is some of my favorite work that I've done in audio form.[00:55:22] So, I basically carried a microphone around, and when I ran into someone that I wanted to interview, I just paused them and asked them for five minutes. And the first is someone that we haven't yet scheduled on the pod, but we've been extremely friendly with. It's Junfan, everyone. Junfan from the... landmark Voyager paper and more recently, the Eureka paper all of which comes out of his work at NVIDIA and advising at Stanford.[00:55:47] So on top of actually leading a group of researchers, he's also very good on Twitter, and I think that is a very useful skill to have because you can communicate the value of your work to a wide audience, and that is something that we also aspire to do at Alien Space Pod. Don't worry. So basically just kind of hold it and then whenever you're talking just kind of hold it up.[00:56:05] Jim Fan (Nvidia - High Level Takeaways)[00:56:05] swyx: Sure, okay. The microphone's right here. Oh, it's on DJI? Yeah. Amazing, okay. The microphone's right here. I just talk? Yeah, just talk. So yeah, it's good to see you. Good to see you, Shawn, yeah. So great. Always wanted to get you on the podcast. And then, like, never got around to scheduling you in the studio, but since we're at events, like, this is the big one.[00:56:21] This is the best event to have the podcast in. So thanks for having me. Yeah, yeah and I also saw you've been tweeting us some stuff. Like, what's the most interesting to you so far?[00:56:30] Jim Fan: I think a couple of things. Like, one is kind of the economy of scale. Yeah. Cheap. The GP four and GP three APIs have become, I think that's gonna be a game changer.[00:56:40] So I just did a back of envelope calculation, like if you feed the entire Harry Potter books, like all I saw that seven books into GT four, it's gonna cost only like $15 to read all of them and double check. Yeah. Okay. And $45 to write all of them. And that is just crazy. And you can have GB four, right?[00:56:59] It's gonna be better than 3.5. And the other thing is GPT 4v API is also available. And if you feed all of Harry Potter's like, you know, eight movies into it, that's gonna be like 20 hours. Frame by frame, you know, one frame per second. It's only gonna cost 180 to watch all of these movies at 360p resolution, right?[00:57:20] So this economy of scale is crazy, and I think that's really hard for[00:57:24] swyx: other companies to beat. Yeah. Yeah. Is it a surprise to you this... The rates at which they've been bringing down their pricing. I'm not[00:57:31] Jim Fan: surprised. I think, you know, the pricing is gonna follow some kind of exponential ling from now on.[00:57:36] It's just gonna be exponentially cheaper as compute becomes cheaper as economy of scale is going. So that's one thing. And the second thing is, I am amazed by kind of how OpenAI is doing the integration. Right? If we look at the assistant API. It basically has all of the things that OpenAI developed in a one stop shop.[00:57:53] So you have like code interpreter, you have, you know, stateful API, you have browsing, and it can integrate with, I suppose, all of the plugins on the OpenAI store. And then it can also switch between those, right? We have seen those demos. So yeah, the API I think it's gonna be way better and way more flexible.[00:58:12] So that's the second thing. And the third thing is the UGC platform, right? Now everyone can build their bots and share them. You know, share not just the prompt, but actually like entire[00:58:21] swyx: behaviors, entire GPTs. That is a huge advancement. Yeah, it's really fascinating. And I think one of the things that is interesting, this is supposed to be a dev day, but actually like, I think the first half was not a dev.[00:58:32] KXFocus with low code, no code, programming with natural language. It's something they're saying a lot. And it's something you've been doing a lot as well, I've been following your work somewhat. Yes,[00:58:42] Jim Fan: yes. I feel like it's gonna be this new programming, where we'll just use natural language, and then refine it through dialogues.[00:58:48] And I think that is the most natural way to do programming in the future, and the GPD App Store is showing us a glimpse of it. Like you talk to a bot, and then you can refine the behavior, and the bot can ask you, like, clarification questions.[00:59:00] swyx: That is the way. That is the right way. Exactly. The GPT creation pane you're no longer filling out a form, you know, question, answer, question, answer, question, answer.[00:59:08] Oh, yeah. It's, you're, you're having a chat and then it prompts for you on the other pane. Yes. And I thought that was a much better way than filling out custom instructions because you don't know what you want. Yeah, exactly. Yeah, yeah. And also it[00:59:18] Jim Fan: feels very natural and intuitive because we as humans also onboard new employees in this way, right?[00:59:23] Like we don't send them a form, we have a dialogue with them and we tell them this is the expected behavior and they can ask, Ask follow up questions if there are details that are not clear. Yeah. So it is like just the most natural way to[00:59:34] swyx: program. So two, two more questions. Like Yes. One is so they, they're, there's, they mentioned the word agents.[00:59:39] They said, Sam said the word agents on stage. Yeah. But here they're calling it GPTs. Yeah. Do you see a big gap that they, they still need to fulfill to become a full agent? Or is this the, the new direction that we should think about? I think it is the[00:59:52] Jim Fan: beginning. Yeah. So. It's kind of hard to predict what agents people will, will build and also how good the base models are.[00:59:59] Because I feel that the agents robustness and capabilities are ultimately bottlenecked by the underlying model. So, GPT 4 Turbo looks like it's a bit fine tuned towards the agent use case, right? It can do better function calling, it can do better, like, tool switching. These things are critical to agents.[01:00:17] So, I'm pretty optimistic, but we'll see. We'll see, kind of, is there, like, an emergent behavior? Once you, you know, put a UGC[01:00:24] swyx: platform out there. Yeah, you mentioned tool switching. Actually, I was thinking when you said tool switching, Actually, they're also doing model switching. Oh, yeah. Which is new. Like they have some kind of internal model router or like their mixture of extras is good enough that they just don't care.[01:00:37] Yes, they got rid of the model selector and now it's the God model that does everything. Yeah, and[01:00:42] Jim Fan: you can also do retrieval. I suppose retrieval also has an embedding API in it that's automatically done under the hood. So yeah,[01:00:48] swyx: very exciting. Okay, and then the last bit is you're a lot of your work is sort of reinforcement learning.[01:00:52] Yeah. Plus plus, or zero gradients reinforcement learning. What do you think you know, and we just had, went to one of the closed door sessions where they talked a little bit about how they received their feedback. What do you think they're doing well, or like, might be a, you speculated a little bit, like, next step if, if they were to take anything from your research interests.[01:01:11] I'm also very[01:01:12] Jim Fan: excited by GPT 4's fine tuning API, right? Because the rest of the APIs we see today are no gradient APIs. You cannot really fine tune them, but you can only prompt them. In different ways, but a fine tuning on top of GPT 4 with your custom data may have completely new behaviors. And it's also a new way to program.[01:01:30] Just it's a bit more complicated. It's not programming by dialogue. It's programming by data, right? You bring a data set and then you have a new GPT 4. So I think, you know, this year's theme is customization. Customized by system API, customized by dialogue, customized by data. So I see this kind of[01:01:46] swyx: trend going into the future.[01:01:48] Yeah, I'm looking forward to it. I think there'll be a lot of work in this area. I'm excited to just go hack. I am very excited. I want to skip the after party, but like, there's so many people here in person, so it's great. Jim is actually such a curious person that he does something that a podcast guest rarely does, which is turn the mics around and ask me questions.[01:02:05] So, here's part two. Yeah, Shawn, tell us, what are you most excited about? So, I'm taking over the show, man. Of course, 360s. Me personally, I was actually not even expecting them to release most of these things today. Like, a lot of people were like, I don't think they have like the DALI 3 API ready. I don't think they have like, Oh yeah, they actually have everything ready today.[01:02:22] I don't think they have text to speech ready. It speaks volumes that when Sam Altman... Announced the Whisper three model. Yeah, no claps, . It's the smallest news, but it is actually gonna be huge . I, I[01:02:37] Jim Fan: actually I would love to, you know, put my hands dirty. Yeah, yeah.[01:02:40] swyx: On whisper. Yeah. So, honestly, I'm just overwhelmed.[01:02:43] I know some team, I know they've been working extremely hard. This is their sprints until to, to get everything all done today. Oh, yeah. Yeah. So I, I mean, I think that's, that's very important one. That, that I was just like, they just shipped everything. They just, they're, even though they're, even though they're, like, doing very well, they still push themselves extremely hard to, to be top of, and, and they're really earning their spot for, for developers and for the, the general, sort of, general AI market.[01:03:05] And I hope they take some holiday after today. Yeah, yeah, yeah, yeah. Too much of updates. And then so the next interesting thing to me is that they are integrating, they're Sherlocking a lot of the startup features, so there are a lot of startups that are built on providing RAG for people, a lot of startups that are built on like maybe building agents on top of GPT, so this is the first time where, you know, I think it's pretty common in large platform companies, like AWS reinvents often does this as well, they call this a red wedding.[01:03:34] Like, they invite all your customers to the same room, and then they're like, alright, let's see who survives, you know, step, step, step. So, that is the sort of[01:03:43] meme y, funny, joke y version of this. I don't, I mean, realistically, I'm sure Harrison and Jerry and all the other rag people, they had some heads up about all this stuff going on. But I think... Because it's built in so easily into the playgrounds, into the API, into the chatGPC itself, And also the tools, all the integrations, right?[01:04:01] You don't need a lot of tooling just to set up a simple chatbot with RAG. It's like, so for example, for my conference, we did a Summit AI bot. Where we did, where we set up a lang chain stack, we integrated it widget on the website. Now you can set it up with no code, inside of the playground, and just let people play with it.[01:04:21] It's great, but it's also very scary for a startup, because if that was your whole moat, you don't have that moat. I agree. Yeah,[01:04:28] Jim Fan: yeah.[01:04:29] swyx: That's gotta be a problem. So it's interesting that, like OpenAI can sort of easily build this in, and and obviously the Stakeful API is something I was considering building.[01:04:37] And I roughly knew that, like, this would be the next thing that OpenAI builds. This is on the critical path, for sure. So I don't build it. I agree. Yeah. But then the question is, like, alright, what do startups do? Yeah. I think maybe one thing that was missing from... Sam was like, hey, this is the biggest gathering of all your ecosystem developers.[01:04:54] They're afraid of you. You have given them no assurance as to, like, where do you think people should build. Okay. So, because, like, OpenAI just wants to do everything.[01:05:05] Jim Fan: I think so, right? Like, judging from today's trend, they literally are doing everything. Yeah. Yeah, you're right.[01:05:10] swyx: So so I feel a little bit, I mean, it's fine.[01:05:12] Everyone who's building with AI today opted in to cutting edge, and sometimes you work on the cutting edge, you bleed. Yeah, that's right. Yeah, but I do I do feel like there's a lot of tension between the startups that build on OpenAI and OpenAI itself. Yeah, so that's my two cents. Sounds great. It's great to see you.[01:05:31] Yeah, good to see you. Thanks[01:05:32] Jim Fan: for jumping on.[01:05:33] swyx: Thanks for having me.[01:05:35] Raza Habib (Humanloop) - Foundation Model Ops[01:05:35] swyx: And next, we catch up with the former guest, Raza Habib, back for his second time on the pod. Last time, we talked about Human Loop, and we recorded in London, and that was a pretty popular episode, and I love that you guys care about foundation model ops, as Raza puts it.[01:05:49] So check out the Human Loop episode if you want, but also, here's Raza's take on OpenAI Dev Day. Welcome back to the pod, you're just the second appearance. It's[01:05:57] Raza Habib: always a pleasure, nice[01:05:58] swyx: to see you again, Shawn. Good to see you as well. All right, let's just get right into it. What was most[01:06:02] Raza Habib: interesting to you?[01:06:03] I mean the sheer density of announcements. I actually, I came with high expectations and there was a lot of stuff I was hoping to see, but I think they over, they under promised and over delivered, which I thought was really good. I think seeing that they're having a second run at plugins and doing it right this time and having the GPT store and Like really allowing people to do that.[01:06:21] I thought that was really cool. Product decisions around how you design and build the GPTs, like the low code builder for these chat agents. I thought that was really nicely done. That they have this conversational interface that elicits from maybe someone who's not very expert how to do prompting and things like that.[01:06:38] I thought it was really[01:06:38] swyx: thoughtful. It fills out the form for you, right? Yeah.[01:06:41] Raza Habib: It's a very simple thing, right? Like, ultimately, it's just filling out the system prompt and filling out what abilities it should have. Yeah. But actually, despite its simplicity, I think it's very powerful, and I was impressed by that.[01:06:52] So, yeah. A lot of really cool things. And then all the changes to the API I'm really excited about. I have some questions. Like, I'm not, I'm not uniformly positive about all of the new API things, but I'm[01:07:02] swyx: sure they'll get there. Okay what, anything in particular that you want to touch on?[01:07:07] Raza Habib: Yeah, so I think like, things that I'm excited about with the new assistance API, or like the new APIs in general, like multi modality is really cool, longer context window is really cool.[01:07:17] I think everyone's going to be super excited about that. JSON mode is like, it seems like a small feature, but actually so many people say this is a problem for them. So I think that's going to be great.[01:07:26] swyx: So I maybe missed the importance of this. Isn't that the same as the function calling API?[01:07:31] Raza Habib: It's related, but you might want to have it in context where it's not strictly doing function calling.[01:07:37] swyx: Huh. Right. Okay. So a little bit more general. Typically I'll just make up a function that isn't actually a real function that Yeah, even[01:07:45] Raza Habib: then, people say that for complex things, sometimes it violates the valid JSON thing. So I think just making that more reliable. Some stuff that I thought was, initially I was excited about, and then as I've, like, chewed on it a bit more, I'm a little bit less clear.[01:07:57] So one is this, like, ability to jump in a bunch of documents and have it do RAG for you.[01:08:01] Jim Fan: Yeah.[01:08:02] swyx: I think, like... 20 documents max or something. Yeah, I[01:08:04] Raza Habib: think that, like, it's... It's a cool feature, but it feels a bit gimmicky to me. Like, it feels like for serious, practical applications, it's going to be hard to get that to work.[01:08:11] If you think about what a large enterprise needs for RAG, like, it's, you know, it's rarely sufficient that you can just jump in a bunch, dump in a bunch of documents. How you do them matters, there's usually permissioning, as like, which users can actually access which bits of data, like, there's so much control that I think most developers would want to have for serious applications, that I think it's cool for the, like, GPTs and the low code version.[01:08:32] I'm skeptical that it'll get that much use. Yeah. By serious developers. And I feel the threaded, stateful, like, assistance API is really awesome, but I would like more clarity over how it's doing the, like, statekeeping, like, what ends up in the context. Yeah. I think for that to be really popular, they need to make that transparent.[01:08:52] swyx: Yeah. There's an API booth downstairs. I don't know if you've seen it. I've gone and spoken to them. They wouldn't[01:08:55] Raza Habib: answer any of these[01:08:55] swyx: questions for me. Okay. Yeah, of course. But, you know, obviously that greatly affects HumanLoop.[01:09:00] Raza Habib: But this is you know, this is commentary over what I think overall was a set of really[01:09:04] swyx: exciting announcements.[01:09:05] Yeah. And, and last time we talked, also, you were talking about, we were talking about the multimodal APIs. And now you have it. It's finally here. What, what happens now? As I, as[01:09:14] Raza Habib: I said to you when I spoke to you last time, right? Like, it's a relatively straightforward addition to the HumanLoop product.[01:09:19] Like, everything will continue to work, but now you'll also have images in and images out, and audio in and audio out. It's kind of interesting, like, seeing, you know, the assistance playground for OpenAI that they just released, and things like that. Like, it feels like they're starting to get close to supporting all of these things, but not quite yet.[01:09:35] Yeah,[01:09:36] swyx: yeah, excellent. And then, I think the last part is, I saw HumanLoop actually, probably not you, probably somebody else, but also talking about the fine tuning. There was a price drop, I don't know how much, because there was just so many announcements. But I imagine that's only good things for fine tuning.[01:09:49] Yeah,[01:09:49] Raza Habib: I mean... There's so many other stuff. I also missed the price drop, but I know from speaking to folks at OpenAI as well, that they think a lot more people should be fine tuning. Yeah. Fine tuning is gonna have, like, huge importance in the future. That's why they're building out the UI for it. You know, so it's something they're investing in very deeply.[01:10:05] Simon Willison: And,[01:10:05] Raza Habib: yeah, I still view fine tuning as, like, an optimization step. Yeah. I think of it as, like, the compilation you do, like, once you have something that's working.[01:10:12] swyx: Which is what they said in the LLM performance session just now.[01:10:15] Simon Willison: Okay,[01:10:15] Jim Fan: cool.[01:10:16] Raza Habib: I'm glad that my tips are aligned with opening hours. I[01:10:19] swyx: think you're very aligned.[01:10:20] You're often leading them in what they say publicly, which I think is good.[01:10:26] Raza Habib: Yeah, what about you, Shawn? What did you think?[01:10:28] swyx: Oh, I've said this in a previous recording, but effectively, I also thought they would do much less than they did today. I think they under promised and over delivered, exactly like you said.[01:10:39] And even things like text to speech, which... It's not just text[01:10:43] Jim Fan: to speech,[01:10:43] Raza Habib: it's really good text to speech. So I, like, I think I told you last time, I did like a near year long internship at Google, and I was working on the first neural TTS team. Like, the team, the Tachytron team there were amazing.[01:10:54] swyx: So what did you get from their demo?[01:10:57] I[01:10:57] Raza Habib: think I need to play with it more, but I was impressed by the quality. Yeah. Like, the quality of the prosody, the variation. I think they're only releasing six voices, but...[01:11:05] swyx: And the secret seventh voice with the pirates. The[01:11:07] Raza Habib: secret seventh voice with the pirates. And then I was chatting to Andre just now.[01:11:12] Yeah. And he was saying that internally, like, they have voice cloning set up as well. Yeah. So they can do it with something like 30 seconds of speech. I'm not sure that's public. Is it not public? I don't know. He didn't tell me it wasn't public. Okay, alright, alright. Maybe, maybe filter it out[01:11:25] Simon Willison: when you publish this.[01:11:27] swyx: For what it's worth, I've been talking to a lot of people in and outside of Dev Day, and a lot of people have heard about the voice customization stuff, so it's not really going to get anyone in trouble, I don't think, so I just chose to leave it in there. Whatever, I mean, it exists elsewhere in other products, and I think it's fair play to compete with other companies who[01:11:48] Raza Habib: are already doing this.[01:11:50] For obvious reasons, right? There's a lot of safety concerns about releasing that kind of[01:11:55] swyx: product. And for what it's worth, someone else, I think, Fixie AI, did a comparison of the pricing. They are severely undercutting like PlayHT and some of the other text to speech companies as well on the pricing.[01:12:06] They're between 3 to 10 times cheaper[01:12:08] swyx2: per second or something than the other existing TTS companies. Yeah, I think that's very interesting. I think in general... Their promise to keep cutting prices and then following through is building a lot of confidence. People, people who weren't previously nervous about building on them.[01:12:22] What's interesting, I think, is that as the, like, because they have such a large economy of scale, and they continue to drive down prices, the option of, like, self hosting a fine tuned model, even for smaller models, starts to be, like, less obviously economical, because of the, like, spin up and spin down costs.[01:12:39] So unless you have the, like, volume of usage to justify having it on all the time, It actually starts to become cost competitive to use one of these third party APIs rather than having even a smaller model. Right, because it's serverless in a way. So what, can you give people an idea of what kind of volume that is?[01:12:55] Are you talking about concurrent requests?[01:12:57] Rahul Ligma: It's, so if[01:12:58] swyx2: you look at most of the people who will provide you in like a serve model, if you look at a replicate or a mystic AI or something like this. Yeah Fireworks. Fireworks, there's a few of these companies. They tend to actually charge by like compute hour or compute minute.[01:13:13] Yeah, and so if you're not like gonna have it on all the time then like the reason is dollars the reason Yeah, you end up needing it on all the time though, because there's like spin up spin that cold starts And so if you don't actually have enough usage to justify having it on all the time, it starts to become cost competitive to just use OpenAI.[01:13:31] Yeah, so what I'm trying to get to is, it's just dollars though, like if it's like 5 an hour, whatever, like...[01:13:38] Reid Robinson: Yeah, I agree,[01:13:39] swyx2: depending on your use case, but yeah. Okay, got it, got it. Alright, cool. Well, thanks so much for jumping on. I know this is last minute, but it's just nice to see people. No, no, I always, I always love chatting with you, so hopefully we'll be more of a visitor in the future.[01:13:50] Yeah, for sure. The next guest is going to be a new name to many people. He hasn't done many public appearances, but he is a force to be reckoned with on Twitter.[01:13:59] Surya Dantuluri (Stealth) - RIP Plugins[01:13:59] swyx2: His name is Surya Danturi, and this is the story of somebody whose startup got killed by Sam Altman. So we're here with Surya. Hey. Hello. My name Surya.[01:14:07] You're new on the pod, but also we've been around each other in, in the tech circles. Yeah. For, for a little bit. You're, you're a fa very famous developer of Vector databases Yeah. And of plugins. Yes. What, what what, what are some of the plugins that you've done?[01:14:20] Surya Dantuluri: Yeah, so I worked on a few plugins.[01:14:22] I work in like, chat with pdf, f chat with like video, chat with website, chat with like get it made, yeah, like a lot of cool plugins.[01:14:29] swyx2: Making decent money[01:14:30] Rahul Ligma: too.[01:14:31] Surya Dantuluri: Yeah, I mean you can, they give like better functionality to like the whole GPT 4 interface. Initially I wanted to do my homework with them so I'm like, I might as well make a plugin for it.[01:14:40] So yeah, I mean they give there's like a lot of cool functionality, like I made one with the called, chat with like instructions, which would allow you to save more custom instructions and use that when you're talking to GPT 4, but Yeah, I mean, they're making revenue it's pretty, it's pretty sick for, you know, people paying in 85 different countries.[01:15:00] It's like nuts how many people are like, or how many, how big the the scope is, or how many[01:15:05] swyx2: people can use it. And I think you may have shown me this before, but there was a plug in platform that you use for monetization? No. No? Oh, you build your[01:15:12] Surya Dantuluri: own, you build... I build my own thing, all custom,[01:15:15] swyx2: I've seen someone do, like Firebase[01:15:16] Surya Dantuluri: for, yeah, yeah, yeah.[01:15:19] Yeah, I don't know. R. I. P. No, I mean, they're doing well, but like, I just don't want to, you know, pay a 10 percent tax[01:15:24] swyx2: and all that stuff. Yeah, yeah, yeah. For sure. Obviously, you're very technically savvy. Okay, so what happened today? They announced GPTs. What's going on?[01:15:33] Surya Dantuluri: Yeah, so like, I made a tweet this morning being like Sam won't let me kill my startup.[01:15:37] And a joke, okay? I just wanted to talk, like, I was like, I was trying to notify people while I'm here and I just wanted to meet up. I made up the joke. And then a couple hours later my friend, Matt he works at Julius, he showed me the new UI, I'm like, okay, cool, and he forced me to look at it on my phone, I'm like, okay, sure, I'll, I'll pull it up I pulled it up on my phone, and plugins were gone, plugins were gone you don't, you can't, I think you can go between models, so you can go between 4 and 3, but the whole options of, like, code interpreter, and like dolly 3, and all this stuff, All of those good stuff were gone from the UI.[01:16:12] I think this is only if... This only applies for people who are here at the event. I think they gave access, or like the new UI to people here. And they also... But yeah, plugins were gone, and I'm like, oh s**t. And I asked the person, like, hey, like, where... Where are the plugins? Like, where can I... Like, where are the plugins?[01:16:28] Like, where do they go? They basically told me, like, You have to make a new GPT as a developer. And you can import your schema into the new GPT. And only that way can you you know, kind of revitalize your plugin, but[01:16:42] swyx2: your existing users will be[01:16:44] Surya Dantuluri: like, no, I think they're gone. I mean, I gone, they're, I haven't looked at my stat today, but, well, I[01:16:49] swyx2: mean, this is not widely rolled out yet, but when it, when it rolls out, when it rolls out, I'm pretty[01:16:53] Surya Dantuluri: sure all of the plug-ins, they have to discover you again.[01:16:56] Yeah. They're kind dead. I mean, there's like no way. I don't think there's a way to link them. Yeah. Like there's like no way for the users who were using it previously to be using the new thing. Know. But I mean, it's an exciting project for me, it's not like a full time thing for me, it's a fun project to do, and like, it's like a nice nice thing to work on.[01:17:13] So I'm really bullish on, you know, the whole new GPDs thing, I think they're a better abstraction. Yeah, I think GPDs are a few open end engineers, and I was like, agreeing with them, because like, I think GPDs are a much better abstraction on what plugins were supposed to be. I think plugins kind of died on arrival.[01:17:29] Well,[01:17:29] swyx2: Sam said they did not have PMS,[01:17:31] Surya Dantuluri: right? Yeah, obviously, yeah, he said that a long, he started that, he said that, like, one plugin started. Yeah. So it's like pretty nuts. But, yeah, I think, I think GPs are a better abstraction and I also love their doing revenue share. So, yeah, revenue share is also a good thing.[01:17:45] Because, like, GPlugins were, like, a really weird way of monetizing, you had to, like, do a bunch of finicky stuff but yeah, I mean, also, like, just, by the way, for people who don't know, po, you know PO right? Yeah, PO did this a long time ago. They did this a couple months ago. They help, they have, they have these bots, they call it botch.[01:18:02] And you can, you know, make your own like poem bot, or you can make your own like essay bot or whatever. And then the bots have customer instructions and also they use a very specific model that the developer specifies. And you can install these botch or you can chat with these botch and the botch will do whatever whatever the developer made them to do.[01:18:21] So I think. They're just basically open edged, made the same thing, and they brought it over to them. But, yeah, but, effectively, plugins are kind of dead. Oh, RIPs. Yeah, I mean, RIP, but, it was a fun pro I mean, it's fun. I think GP I think GP Honestly, it's good that plugins died, Because, like, they had a bunch of issues.[01:18:40] So, one of the issues is that you can't share them. You can't share a link to them. GPTs, you can share a link to them. So, like, I can share my link to my GPT thing to you. So it's much better for discoverability, because previously the only way to discover a plugin was through the plugin store. You had to search for it, you had to do a bunch of stuff, and it wasn't very good in that aspect, but sharing a link to them, having revenue share And you can also, like, give custom instructions, custom context, so they also came out with, like, retrieval or whatever, and that can basically give you, like, a custom vector database directly in your GPT, I think.[01:19:15] So that's all great all good features that that should have came with plugins, probably,[01:19:19] swyx2: but. Yeah, awesome. And then lastly, just like, any of the new stuff that was launched today what interests you in sort of building with them? Like if you were to build on the new API[01:19:30] Surya Dantuluri: Yeah, totally. I have some ideas.[01:19:31] The thing is like this is really weird to say, but like, some of my ideas that I've said before for plugins, They kind of get copied quickly.[01:19:43] swyx2: Oh, so you want to keep it to yourself? Yeah, that's fine.[01:19:45] Surya Dantuluri: Yeah, but that's one part of it. The second part of it, I don't have any good ideas regarding what you can do with all the new functionality.[01:19:52] Like, that's like a good product. I don't know, honestly. Tech2Speech came out, their internal VectorDB thing came out. internal vector[01:20:01] swyx2: DB thing? or, like, retrieval, or whatever it's called yeah, people have been saying they have an internal vector DB thing but, it's it's just retrieval yeah, it's like zero non configurable it's going to be for, like, simple use cases fine then after a while you're gonna need one of the controls over chunks and stuff yeah, I'm[01:20:17] Surya Dantuluri: also excited by what happens with our Contacts window I was a big user of Cloud for a while because Cloud, they basically gave you 100Ks context window widely on the UI And you can upload your PDFs to it, and everything would work very well.[01:20:30] Yeah. But, I think Cloud had some issues regarding, I mean, actually very recently, Cloud came out with this whole b******t thing, b******t copywriting thing. So like, Copywriting thing? Yeah, yeah, it's really weird. So, if you upload a PDF now, out of Cloud, like just this week, they made this weird tweak, where it doesn't answer any questions, because if there's a copyright symbol or a copyright name, Anywhere, it just like blocks you[01:20:53] swyx2: out, and it's like, what?[01:20:54] Apparently you can prompt inject that by insisting that you are the author, and then it just overrides it. Oh, really? That's funny. It's like, don't worry, I got this, I'm the author of this, there's no copyright issue.[01:21:05] That's it, okay, cool. Anyway so thanks, this is a really good story, and I wanted people to share it, and I'm excited for what you work on to become more public. Yeah, thanks Swyx. Alright. So that's what happened to Chat2PT plugins, which we covered back in March. But don't worry, that's not the full story.[01:21:20] Reid Robinson (Zapier) - AI Actions for GPTs[01:21:20] swyx2: His startup is not fully dead. We actually cover what happens later on. I just wanted to capture the confusion that was happening at Dev Day. So he referred to Julius, and we'll actually talk in and check in with Rahul later on in this episode. But first, we have to go to our next guest. When OpenAI launched with GPTs and the Assistance API, one of the lead launch partners that they launched with was Zapier, and I managed to catch up with Reid Robinson, who is lead AI PM at Zapier, to talk about it.[01:21:49] All right. Well, Reid nice to meet you. Great to meet you too, Shawn. It's really great to run into you as we're leaving. So you guys had a... Big sort of partnership launch on stage. Yes,[01:21:59] Reid Robinson: yeah, we launched AI actions for GPTs, which we're really excited to see out there. We also today launched an update to our chat GPT integration that supports the assistance API functionality that was announced.[01:22:13] And[01:22:13] swyx2: you were one of the earliest to go. In my mind, Zapier was very, very early in the natural language actions. NLA,[01:22:19] Reid Robinson: I don't, I don't remember what, good memory. Yeah, yeah, yeah. We launched our natural language action, actually. So we were a launch partner for chat BT Plugins. Yeah. And that's when we launched our Natural Language Actions, API, and actually the AI actions that we're calling it today kind of a, we're rebranding that side of thing to really focus on a lot functionality.[01:22:35] Yeah. For that.[01:22:36] swyx2: And I just interviewed Surya, who's one who's a pretty prominent plugins, developer. Plugins did. I, you know, reborn.[01:22:43] Reid Robinson: Yeah, it's going to be interesting to see what happens. There's clearly a difference. I think one of the things I talk about is the fact that, you know, with GPTs, you're able to constrain the prompt quite a bit, like our plug in for ChatGPT, the initial one.[01:22:55] You needed to give it access to every single action you ever wanted it to have access to. Which meant that the kind of con You know, I heard anybody who's familiar with context is sitting there like, Yeah, that's gonna be an issue. The common one I give is like, you know, If you had given it Gmail and Google Calendar and asked it like, Hey, what's going on next week on my, like, agenda?[01:23:11] It would sometimes search Gmail. Cause it'd be like, yep, events are in Gmail. Or like, you know, calendar invites are gonna go to Gmail, So I should search there. But now you can, you know, define what apps it should use. You can define, like, how it should use those. So some really fun use cases. I mean, honestly, we've been hustling hard to get this out there.[01:23:30] I'm really excited to see what people actually build with this and what gets released there. Yeah, we'll be monitoring and trying to listen to people[01:23:37] swyx2: really closely. And so, like, something that's interesting about Zapier is that you are a collection of actions in and of yourself. So there's kind of multiple layers in which to do this.[01:23:47] Like, what should exist at the GPT layer? What should exist at the Zapier layer? Yeah, well,[01:23:52] Reid Robinson: what's nice, I mean, it's a good point. We have about 6, 000 apps on the platform today. Really what the AI Actions is, is it's the ability to use any of those searches and actions using kind of a natural language input.[01:24:05] That would be like the instruction that the model gives it. So it's like, you know, check this user's calendar for Monday. And, you know, it might even give the, you know, the actual date for Monday, right? Zapier on our side will take that natural language request and process that into an actual API, like the actual API call to a tool like Google Calendar, and then we all work on the response.[01:24:26] So, you know, you can't just take the entire response of a, especially like Gmail, responses are very, very, very, very, very long, and very confusing. And so we actually do a lot of work to kind of, if you will, like massage that data, so that it makes sense for an LLM on the other side, that it is giving it the right, it's kind of like information it needs and not just like the entire payload.[01:24:47] It really helps it kind of deliver like a more, again more contained, more refined experience for leveraging integrations alongside like down[01:24:56] swyx2: to the T. So, existing Zaps cannot be poured in one for one over to[01:25:02] Reid Robinson: It's really one off actions, that's the better way to think about it. And you can chain them together in the you saw in today's demo you're only using Google Calendar for the search and a slack action.[01:25:11] You can actually chain those together. And so, you know How much is that as like a one off action versus an actual, like, all of a sudden, as app? But in this case, it's almost more like the trigger is the human in Chats GPT, right? Like, you need to trigger it to run for that. But, on the flip side, you know, the assistance API is extremely exciting for me as well, because you look at, now, like, the, that functionality of building a GPT, you know allows you to Still getting used to the name?[01:25:36] Yeah allows you to kind of port that over to run asynchronously. So a common one, like the two examples that I love giving for that API that I love in Zapier is number one, like data export. You know, think of every tool out there like Looker, Mixpanel, Amplitude, all, so many tools are able to send these like massive exports of CSV data on a regular basis.[01:25:57] Like you could say, hey, every Friday export my blog traffic content, or see CSV, right? Normally, someone's gonna get that CSV and have no clue what they're doing, right? But now you can actually create an assistant in Zapier and you can give it instructions to say like, Hey, tell me the top 10 performing blog articles in the last week.[01:26:15] And also, you know, tell me highlights on, you know, maybe keywords that were used or SEO tags that were used and how that impacted conversions, right? Like, you can be pretty detailed depending on what you're providing it. And that can now run asynchronously. That can run automatically. So every Friday, you know, 8am, you could be getting the export of that data.[01:26:32] It's gonna go to an assistant. That assistant's gonna reply with even charts and graphs. And those will come through and you can then send it to Slack. And so you can have, every Friday, a conversation, a post in your team's, you know, blog team's Slack performance. And that'll run automatically. And then they can even reply in Slack to that post and have a continuous conversation with that assistant.[01:26:54] swyx2: Oh my god, so it's like really[01:26:56] Reid Robinson: everywhere. Yeah, so you can really put them everywhere. And that's, that's one of the things I like about what's released. And I think people are going to continue to learn really just how kind of Wild that is is the fact that you can like use your actions in the UI of TypeTBT in a one off action but you can also run these things extremely well asynchronously and Yeah, like OpenAI releasing API support for the vision model and for code interpreter and retrieval that these assistants can use It's really cool.[01:27:26] swyx2: Is there a Zapier angle to any of that? They're all the same, right? Like you would do[01:27:31] Reid Robinson: in Zapier, right? The whole creating of an assistant and running that through an assistant is today's support. You can do that literally right now. So it's really cool. And the other one is retrieval, right? I talk about, you know, you could go in and create an assistant.[01:27:45] Give it, let's say, you know, I talk about our accounting team a lot, right? You could give it like if you have a team that approves budget requests from your company, right? Everyone does, right? They can actually have, take their Slack channel or to create an assistant first that would have the documents of your policies, of like, Hey, here's what you can expense, here's how you can expense, here's eligible, ineligible, right?[01:28:03] All these sorts of things, and actually then set up something like a cat I'll pick on Slack, it's just easy. Like a new message in your accounting... Budget requests channel, and have it trigger a, the assistant and send the user's requests to the assistant with all of your documentation with retrieval and now it'll try to understand what your policies are, what everything is and check the information against what the, and you could even like I did one internally where, We have a tool called, I think it's called Stacker, that tracks each employee's, like, software budget, and home office setup budget, right, so you can see how much they've spent of their budget, and you can actually include that data in the context of the user message, so that the model will be able to say, like, hey, I see you want to expense this webcam it's actually over the recommended budget, but you personally do have budget left if you wanted to use it for that, right?[01:28:53] And, Some autonomy there. Yeah, and that's really cool. So you can start to do all of those sorts of things now in Zaps that really were never possible. So yeah, the querying of knowledge, running of data analysis, writing code even. I[01:29:08] swyx2: think in a very real way, you are the perfect partner to OpenAI because they've sort of built a reasoning sort of glue between all these things.[01:29:15] It's[01:29:16] Reid Robinson: definitely been a good and fun partnership. I think, yeah, the big thing for me that I would say is like, I'm really, really excited now to just see what people do with this and how we can improve[01:29:25] swyx2: it. Yeah, awesome. Is there anything, you know, you've been developing with these APIs for a while. Is there anything that you caution people not to get too excited about?[01:29:32] Like, what, what, yeah.[01:29:34] Reid Robinson: I mean, callouts I'll always make is like, double check accuracy, right? Like, you want to call out, like, okay. Like how accurate is to make sure that information is accurate? Make sure you're putting some human in the loop steps before you're putting this[01:29:46] swyx2: into like a critical, which they, and like confirm, deny, yeah.[01:29:49] Simple.[01:29:49] Reid Robinson: Yeah. That sort of thing. But even, yeah, all sorts of things you really wanna make sure that you're comfortable with. Like what can go wrong, what is likely to go, right, right. Like all those sorts of constraints. The other side that I often talk about is just like, keep an eye on, you know, if you have freeform human input somewhere in your application that is triggering these things, you know, that can sometimes risk, right?[01:30:07] Yeah. Prompt injections. Those are a real thing, and I think, you know, a lot of people are still trying to figure out what that means, and how bad that can be, and so I always try to caution people about that as well, right? Like, you really want to be realistic on, kind of, how far reaching you're doing this, so, yeah.[01:30:25] That's why I like, like, the internal use cases, you know, like, things like that is a great way to start, to get familiar with the technology, to get familiar with the constraints for that. Other than that, no, I mean the voice model stuff I'm really excited to try that. I really want to, yeah. Yeah, that'll be[01:30:40] swyx2: really cool.[01:30:40] I love the secret pirate mode that they demoed. I don't know if you caught that session. I didn't see that session, no. Obviously there are six voices, but there's a secret seventh mode if you add in a prompt to speak like a pirate. Love it, love it.[01:30:54] Reid Robinson: That was an old I don't know if you remember Facebook way back in the day had that as one of the languages you could select?[01:30:59] Yes. Yeah, yeah, so that reminds me of that.[01:31:02] swyx2: Yeah, lots of fun to be had with AI as well. Okay, well, thanks so much for jumping on. I know it's very random, but also, yeah. People love to hear from builders, so, that's awesome.[01:31:12] Reid Robinson: I love[01:31:13] swyx2: hearing from builders. And most of the interviews were done as we were sort of leaving the Dev Day venue and going to the after party.[01:31:19] Div Garg (MultiOn) - GPT4V for Agents[01:31:19] swyx2: And I caught Div Garg of Multion, who we've been talking around and circling around a possible episode on. He's definitely one of the leading voices and thought leaders on agents. Because he's building a browser agent that's a very prominent one. Unfortunately, I have to take an L on this one because the audio is not great.[01:31:39] Div's mic wasn't working, and I don't know what happened to it. I, I try to always check these things, but you're only gonna hear the output from my mic, which is slightly worse, but I opted to leave it in because Div is actually building an agent. with OpenAI stuff, and had access to GPT 4 Vision, and I think that people building with GPT 4 Vision will be surprised at his answer to me on whether or not it's useful for agents.[01:32:02] Good to meet[01:32:03] Div Garg: everyone, I'm Dev, founder of MultiOn, which is an AI web agent that can automate browsing for you. So we can book your flights, order stuff on Amazon, order dinner, whatever[01:32:11] swyx2: you can imagine. Yeah, and I was actually reflecting, so, I, everyone who listens to this already knows what was announced.[01:32:17] I was actually reflecting that they didn't have any browser based actions. So what were your thoughts on just generally their approach to agents?[01:32:23] Div Garg: So they, it'd be very interesting because I feel like browser actions are just so risky. So, and like, things can go wrong. So if you're a big company or you're OpenAI, you won't, you won't want to build that.[01:32:31] And they're like better off just like relying on a third party who like wants to own that. And that's also the strategy we are, we are taking with them. We're like, like, like OpenAI launched like a ZP integration for APIs. But we want multi end to be like the new API solution. Like, I want to do things beyond APIs.[01:32:45] I want to connect to my personal accounts where I just have my... Logins already or I already have the cookies and I want to go and like interact with my personal accounts or personal data Very easily and I think it's very fascinating for us where we can like launch a multi on integration With the new platform and then you can just go and like give it a command like oh like can you book this platform?[01:33:04] me or chatgbd and then it will launch a browser and the browser you can see what's happening and then we go do the whole Thing for you, and it'll be all seamless And then people can have a lot of fun just like Trying out all these different capabilities and like automating their, like, daily workflows.[01:33:18] You can, like, save this as custom integrations for different agents. You can have different custom, like, multi on prompts that are already, like, pre saved. And then you go like, oh, I want to now go order something on, like DoorDash. I want to order my favorite burger. Then like chatgp can go and like, suggest you what our favorite burgers are, and then it's like, okay, like, now order this for me.[01:33:35] Multion, and then Multion we solve the payment for you, we solve identity for you, and like, we are owning all the risky, like, actions I can[01:33:42] swyx2: take. So so you, you're gonna build a GPT version of Multion? Yeah, we'll have a Multion GPT. You, you, okay, will, will that be like a replacement to your existing thing, or just like an alternative way?[01:33:53] To use their same APIs or something like that. So it's[01:33:55] Div Garg: like, the direction we're going for is we want to make our AI, like, agent embeddable within existing applications. So we are launching an API. Okay. And we already have a, like, a touch ability plug in. And so this will be like, sort of like a little, like use the API to power this sort of, like, new GPT experience.[01:34:10] So for us, we actually don't have to, like, change anything. It'll be, like, very streamlined, just make it our API. And to chat GPT, and like people can start using[01:34:17] swyx2: it. Yeah, yeah, awesome. What about the, I guess, the Vision API? I think one of the things that have always constrained browser agents is the DOM.[01:34:24] Right. Which is very heavy. Yeah. And so the alternative approach is to use Vision. Would you explore that? What are your thoughts? So, for us,[01:34:30] Div Garg: we actually had, like, early access to the Vision API for more than a month. And we tried it on a bunch of websites we 5 percent of the websites is actually really useful, which are more, like, image heavy, because 95 you do OCR, that's good enough.[01:34:43] Yeah, it's not We have really good, like, parsing, so most websites we can compress less than 3k tokens, so we are not, we don't really have to, like, worry about the how heavy the text is. We, so we had one interesting use case about the Vision API. We had a user... Who got it to work on Tinder, and and then like the, then like Multion...[01:34:58] Hot or not?[01:35:07] swyx2: Yeah, and then we oh, can you have found the killer use case for Multion. Yeah. Like, this... We did it with our laptop, right? Yeah. Oh my god. Okay, interesting. Interesting. Okay, but so, but only image heavy sites. That's surprising to me. Yeah, that's surprising because you know, the original vision demo, they actually showed a screenshot of Discord, right?[01:35:27] And they had perfect OCR. Yes, it's true. But they should be good for you.[01:35:32] Div Garg: It can be very interesting. But the thing is like, even without vision, we can just do like so much things. Yeah. So like adding vision maybe like helps a. But not it's not, like, really game changing for us[01:35:42] swyx2: right now. That's surprising.[01:35:43] Okay. Well good, good to know. Anything else that you would highlight from today?[01:35:47] Div Garg: I'm just, like, really excited about, like OpenAI trying to become a, like, a marketplace. Yes. An app store. Yes. So if this can take off, they could potentially kill, like, Apple App Store and become, like, the new thing there.[01:35:58] And then it's really hard to say, like, how things will go. They've tried this with plugins before, but this is like, this might actually work this time. But we're just really interested to see, like, how two years from now, how a lot of the development might, like, how the world looks like. And I'm very excited about, like, two years from now, like, everything will be so different.[01:36:14] We might not even use computers or even, like, mobile phones. You just have a system, you just talk to it, and the system goes and does everything. It'll be a fascinating[01:36:21] swyx2: world. So one last question before we go. You have a nice side gig teaching at Stanford. While you, you were a PhD student and then you put on top.[01:36:28] But you, you're still teaching or curating Transformers United? Yeah, so I dropped out[01:36:33] Div Garg: from the PhD but I'm still a[01:36:34] swyx2: lecturer at Stanford. Yeah, okay. So, like, what paper should people read to like, like, catch up on this? Like, what, what, what is like, top of mind in terms of like research that is informing what we're seeing?[01:36:45] Yeah,[01:36:46] Div Garg: that's definitely very, it's a good question, because things are moving so fast, and there's like hundreds of research papers coming out, like, literally, like every few days. I'm really excited about, like, developments that are happening at, like, Meta, so a lot of this work is open source, all the Lama stuff, all the Mistral stuff, I feel like that's very interesting on the transformer side.[01:37:02] swyx2: Do you believe sliding window attention was the key for Mistral?[01:37:05] Div Garg: I feel so for them, but I feel like there might be other ways to do it. There's some secrets, right?[01:37:08] swyx2: There was probably some secrets. Yeah. Okay, well, that's all the time we have, but thank you so much. Thanks a lot. Thanks. Okay, and our next guest is Louis Nightweb.[01:37:15] Louis Knight-Webb (Bloop.ai) - AI Code Search[01:37:15] swyx2: CEO and co founder of Bloop AI, and organizer of the AI meetups in London, where he is a very prominent and staunch member, unlike Raza, who has defected to San Francisco since our last conversation. Louis always has very interesting takes in person, and it was a pleasure to finally actually get him to come on the pod, but also, we recorded this while inside of a Waymo on the way to our afterparty.[01:37:39] So Louis, you are new to the pod, but we've been friends for a while. Maybe explain, maybe introduce yourself and how you come to the world of AI. Yeah,[01:37:48] Louis Knight-Webb: I guess, so we started Bloop, me and my co founder three years ago in a very different era for, for machine learning. And we both started the company because we wanted to help engineers navigate large code bases in a much better way.[01:38:07] Yeah. And originally that was Training our own models to do natural language code search. And today, we still do that, but obviously those language models are very small compared to the state of the art. Yes. And so they're just one part of a... A much bigger pipeline.[01:38:24] swyx2: I see you as a very astute technologist.[01:38:26] You used to be a VC. You wrote the first check into HumanLoop. And you used to share an office with HumanLoop. To the point that I called it HumanBloop. Yes. I think you liked that.[01:38:36] Louis Knight-Webb: Yeah, I did. That is good. We're considering renaming.[01:38:41] swyx2: And you also run AI Tinkerers in London.[01:38:43] Louis Knight-Webb: I do, yeah. London has a kind of a slightly different mix of talent than, say, San Francisco.[01:38:50] You've got a lot of agencies, a lot of enterprises. And so Yeah, we just felt a need to start like a very startup focused event and that's why we created AI Tinker at London.[01:39:00] swyx2: Yeah, I think Alex Gravely would be very happy to hear about all the stuff that you've been doing. And I've been to one of them and it's really good work.[01:39:07] I might be the only one that's been to been to both.[01:39:13] Okay, so let's fast forward to today. A whole bunch of things was announced. What's top of mind for you? Yeah, so,[01:39:19] Louis Knight-Webb: I think, like, context length is something that that we spend a lot of time evaluating whenever something new drops. All of the, kind of, standard evals you know, the, the, kind of, literacy tests, things like that.[01:39:33] They, they generally don't do a good job of measuring whether a model can actually use the context length that it, that it claims it has. Yeah,[01:39:42] swyx2: context utilization is... That's what I saw Will DePue today call it.[01:39:46] Louis Knight-Webb: Exactly. And so this basically started maybe five months ago over the summer when Claude 2 dropped and you know, obviously it had 100k context and we were really excited about that.[01:39:57] So we ran an experiment to see basically if we hid 10 pieces of information in the prompt and we increased the size of the prompt, you know, so you do it at 1, 000 tokens, 000, etc. up to 100, 000. How many of the original 10 pieces of information can it retrieve? And we essentially found that the accuracy drops off a cliff between one and 10, 000 tokens and so, and we repeated the same experiment with GPT 4 and, you know, we found similar results that 32k GPT 4 can only find one of the 10 pieces of information but if you were only using a thousand tokens it can find nine of the pieces of information.[01:40:36] So what that tells us is that, you know, context utilization 5 months ago was, was, was not great with, with all of the state of the art models. So, with the announcement of 128k today and... That's the first test you'll run? That's the first test I'll run. Okay. You know, having spoken to a couple of the team members who...[01:40:53] Do eval today from OpenAI, you know, they're pretty confident that the model's got better ability to to answer questions at those context lengths, so it's time to,[01:41:02] swyx2: time to measure. Time to measure. Any other of the API features reproducibility, does that matter to you?[01:41:08] Louis Knight-Webb: I think, to me personally, no.[01:41:11] I kind of like the creativity. I normally have my models at like, you know, 0. 7, a bit of temperature. But I know lots of people on the Bloop team who will be very happy, I'm sure.[01:41:23] swyx2: And then, I guess, the JSON features, there's so many, like the multi modal features, any of that appeal for you personally? JSON[01:41:33] Louis Knight-Webb: is definitely a big one.[01:41:35] I think it allows you to to kind of standardize how you call different models. Yeah. So instead of having to build, you know, the, and it's not a massive thing to build, but to build the, the, the kind of function calling integration. And then if you want to try Anthropic, you've got to go and like have a completely different way of interpreting the output.[01:41:52] So if you can just stick with JSON across all of your different LLM providers, open source models included. That's definitely Atlas because it allows you to evaluate different models more easily. Yeah, yeah,[01:42:03] swyx2: very excited about that. You are, so you compete in a pretty competitive space with the code assistants.[01:42:09] Code search, code assistants, right? We do. There's Sourcegraph, there's Codium, there's other Codium, there's... Yeah. There's Copilot and so on. You've never ventured into the agent side of things. Yeah. Is that a conscious strategy? Are you waiting for the right time? Are you waiting for the right APIs?[01:42:25] Louis Knight-Webb: I think, I mean, we're seeing traction at the moment with companies that have very large codebases, right?[01:42:32] And it's not something we hear from those users that, you know, when we listen to their problems, it hasn't been, like, an obvious fit to try and build like, maybe an auto GPT type of agent. I'd still say, you know, we're very interested in agents, the pipeline we have at the moment. It's basically GPT in a big while loop with with function calling, which, you know, like, nine months ago definitely did count as an agent, maybe less so now.[01:43:00] So, you know, it's just, it's just customer and problem driven, and we don't, you know, it's not a, it's not a hammer for the nails that we've,[01:43:06] swyx2: we've got. Yeah, so two comments on that. One I think OpenAI has sort of put their flag a little bit in the definition of an agent. They had three things, right?[01:43:14] They had custom knowledge, they had custom instructions, and then I forget the third one. Custom tools, let's just say. Actions.[01:43:23] Louis Knight-Webb: Actions. By that definition, we're doing, yeah, so we've been doing that since about February. That's the, that's the definition.[01:43:31] swyx2: Then the second observation I would say is you talk to developers.[01:43:34] But what if the target customer for agents is not developers, it's the PMs, right? So we[01:43:40] Louis Knight-Webb: definitely see a lot of PMs using the product or people that are defined as like reading more code than they write. So you know, could be designers trying to understand the implications of an interaction. Could be PMs trying to check a contentious time estimate from a developer or something like that.[01:43:59] swyx2: Hi. Low trust environment there. I'm[01:44:03] Louis Knight-Webb: talking for, I've seen some,[01:44:05] swyx2: seen some stuff. Egregious things, yes. Yeah, so, so basically it's still not that appealing for you, but you're, you'll keep a lookout for it. The stateful[01:44:14] Louis Knight-Webb: stuff. I think based on the definition OpenAI, you know, released today, We tick all the boxes, and I think we were one of the earliest adopters of that.[01:44:24] If that's the[01:44:25] swyx2: definition. You just don't brand yourself with the agents?[01:44:27] Louis Knight-Webb: I don't think it's important to users. I don't think, I don't think that's why people use the product. I mean, we're very solutions focused. I think we, we start, a lot of our branding in at the start of the year was about models and, and, you know, we put GPT 4, GPT 3 right there on the front page and now, you know, we've, we've kind of...[01:44:44] Reoriented to be more about solutions. I think that that reflects kind of maturity of the the ICP We're going after and where we are with with[01:44:54] swyx2: sort of stage of company life. Yeah. Yeah Cool. Any other things that you personally know not bloop related are just excited by interested by from today? Any interesting conversations with others?[01:45:07] Loads of really[01:45:08] Louis Knight-Webb: interesting ones. I had a fascinating talk with some safety researchers who They were here? They, so there's a couple of people who were kind of PhD students who had kind of looked at adversarial attacks through fine tuning of models and found that, basically, like, it's such a hard problem to solve.[01:45:29] If you enable fine tuning, it's basically impossible or very difficult to to make it so that you can't disable all the safety features. You can just train it to spit out all sorts of stuff. So that was pretty fascinating. I'm pretty excited about the Waymo we're in right now. Oh[01:45:47] swyx2: yes so we should tell people we're recording in a Waymo.[01:45:50] Haven't been looking at the road the whole time. Is this your first Waymo? It is my first Waymo actually, yes. Thank you for taking my Waymo video. But I know glad, gotta experience this together. I've been a cruise stand the whole time until they ran over someone . So[01:46:04] Louis Knight-Webb: my, so, so my take on cruise, like sample size, 10 cruise journeys before they got shut down and.[01:46:12] The three of them resulted in something popping up on the screen saying that I had been in a collision. And...[01:46:18] swyx2: Did they use the word collision? Yeah, yeah, yeah. That's surprising. I'll show you after that. I took a fair amount of cruises and it didn't, yeah.[01:46:24] Louis Knight-Webb: And so it was the same situation almost every time, which was a car was in front trying to pass you.[01:46:28] And I think they just maybe bumped fenders, or maybe the crash detection was clear. Oh, there was actual contact. I think, in one of the cases, I think there was. In the other two, I didn't feel anything. But it came up saying, like, you've been in a collision, and somebody comes over the intercom things like that.[01:46:42] So, yeah, I mean, out of, you know, ten rides, and three of them ended like that. So I think, yeah, definitely some questions there. But this way moves pretty smooth.[01:46:51] swyx2: Maybe also we're in a better neighborhood for driving, because we're going to Golden Gate. The time of[01:46:57] Louis Knight-Webb: day, that's a really good point. I noticed that all of the ones I took at night, all of the cruises I took at night were fine, and when I took one during rush hour, it was a completely different experience, because the routes it would take, it had this really aggressive, maybe traffic management, something that was going on, so it'd take a long time to get from A to B.[01:47:15] swyx2: Yeah. It often puzzles me, slash, interests me, that Self driving is almost solved. We still have some bumps in the road, sometimes the bumps are human.[01:47:27] Louis Knight-Webb: It's solved in San Francisco, where you've got wide open roads, nobody cycles, and...[01:47:33] swyx2: That's not true. Some people cycle. I live here, excuse me. Some people cycle, some people cycle.[01:47:38] Louis Knight-Webb: I mean, compared to like, compared to London, where you've got, you know, roads half the size, built for horse and carriage, and millions of cyclists, and buses, and all sorts. So I think, you know, it's going to be a long time until we have that same experience of a cruise or Waymo today, London.[01:48:00] swyx2: I understand, London's a tougher neighbourhood, but still, we're 80 percent there, 75 80 percent there, whatever, right?[01:48:07] But, like, and it seems like the stuff that we do in the rest of our lives in terms of AI automation is so primitive compared to this, which is the car that we're sitting in right now. And I find that weird. I find, like, the relative ease, or the relative, like, here ness of this technology is very disparate.[01:48:26] Like, how come it didn't trickle down from self driving to the rest of tech? Yeah,[01:48:30] Louis Knight-Webb: it's interesting, isn't it? Well, I don't know how those pipelines are built. I assume that's the secret sauce, right? The flip side of that argument is like, maybe it's very scary that we know, like now many more people understand the, the mistakes that these, these types of systems can make because we're all getting hands on with, with GPT, and this system is equally as problematic, and we're just oblivious to it because it's a black box.[01:48:58] Almost at[01:48:59] swyx2: your drop off. Check the app[01:49:00] Reid Robinson: for walking directions.[01:49:02] swyx: Okay, Waymo. All right. Well, I think yeah, that's probably... Alright but thanks so much for giving a quick review, and thanks for having me. Yeah, yeah. So that was Louis, whose opinion I think is very reflective of the people who are building code generation or code search type startups based on top of GPC 4.[01:49:21] Shreya Rajpal (Guardrails)[01:49:21] swyx: And as we headed into the Dev Day venue, we actually caught Shreya Rajpal from Guardrails. ai, and there was an interesting... Comparison here in our conversation between how she views the LLM stack versus how OpenAI views the LLM stack. OpenAI actually had a closed door session where they gave some thoughts on how they felt that people should start from prompting and build up into a full software system, and they actually deferred a little bit from Shreya.[01:49:47] Don't worry, all that is recorded. The videos will come out in a week, but you can listen to Shreya's take. So, so we're reviewing AI Engineer Summit.[01:49:54] Shreya Rajpal: Yeah, we're reviewing the AI engineer summit, and it was a very, very well organized conference. And a small thing that I was thinking about is that your swag, Yeah, is it on?[01:50:04] Okay, it's on, yeah. Your speaker swag was, like, not surprisingly, I guess, but like, really weirdly very nice. And it just kind of, like, showcases this attention to detail that I think, like, really kind of permeated the entire, you know, conference. Like, every single decision was very well thought through, and, you know, kind of, like, To a degree of like quality that's very rare to see.[01:50:23] So yeah, it was amazing. I thought you guys did like an absolutely fantastic job. This one[01:50:27] swyx2: mostly goes to Ben. So I'm definitely going to make sure that Ben understands that I really appreciate the work that he does. This is why I couldn't do it myself, you know, I'm mostly the content guy, but I don't, he's the logistics, and he's run conferences for 8 years so that's why I keep working[01:50:41] Shreya Rajpal: with him.[01:50:42] Yeah, I also kind of really enjoyed the 18 minutes, you know? Really? Yeah. Yeah, when I saw that, I was like, huh, is this going to be, you know, is this going to be enough, and like, is that, but it was like... It'd be great. Yeah, yeah, yeah, yeah, yeah I, I think the 18 minutes was actually the right kind of bite size.[01:50:56] swyx: It's optimized for YouTube. Yeah, I see, interesting, okay. Because it's not the in person audience that[01:51:00] Shreya Rajpal: matters. I see, I see. Interesting. Okay. I need to promote my, my video more. Yeah,[01:51:07] swyx: is it, is yours up yet? I don't think it's up yet. It's not up yet? Yeah, we're releasing, we're dripping them out to spread it out.[01:51:14] I see. Okay. Sounds good. Yeah. Thank you for joining us. Maybe in two weeks from now. Okay, sounds good. Okay, so welcome back. Thank you for having me. I think you were guest number five. You were super early. So we're at the after party now. How do you feel about the whole day?[01:51:30] Shreya Rajpal: I'm really excited. I think it was Yeah, I think the excitement in the air with like everybody just like waiting with bated breath to see I guess, like, what gets destroyed, but also, like, what gets really optimized.[01:51:42] I think this is, like, very it feels like you're really part of a movement. And it's Shannon who, like you know, us, like, early people in this space, we gotta stick together because, like, whatever happens to any of our companies, you know, there's such a, like there's such a transformative moment in technology that, you don't care, right?[01:51:57] Yeah, we're all gonna, like, look back on this time, but I, I had a, I had a blast. Like, I really, really enjoyed the the releases. Yeah.[01:52:04] swyx: What got destroyed?[01:52:05] Shreya Rajpal: Ward got destroyed.[01:52:07] swyx: I'm[01:52:07] Shreya Rajpal: mining for hot takes here. Once again, I think my takes are unfortunately very measured this time. I wish I had spicier takes.[01:52:15] Your takes[01:52:16] swyx2: are within the guardrails of[01:52:18] Shreya Rajpal: common behavior, yes. I was, I think retrieval is like the big one for me. I think it's kind of really exciting to see the retrieval baked in. And that's one thing where I'm very interested to see, like, does that pattern become common by model providers? Thank you so much for joining us.[01:52:37] Like open source model providers, and then how much of retrieval do you have to do yourself, you know, and like what remains challenging about retrieval compared to just like, you know, this, this really easy API to just like have it done for[01:52:49] swyx2: you, right? Yeah, I think what they did was effectively build the basic patterns in, but for the more advanced stuff, you're still going to need lang chain, lambda index, all those.[01:52:57] Shreya Rajpal: Yeah, yeah, yeah, yeah. So for the longest time, I believe that in RAG, it's the retrieval that's the hard part, right? Yeah. And then generation is really easy. As long as you have better, like, good retrieval, you can, like, get really, really far, and the generation only gets you, like, a little bit over. And so, I'm really curious to see, like, okay, how, once again, like, how complex do you need it to be in order to start seeing good results?[01:53:17] swyx2: Yeah. Okay. Interesting. And what what are your normal benchmark tests? Testing, like, do you actually have a set of tests that you run whenever you are like exploring something? Or some personal favorites of like use cases that you think are tricky for LLMs to do[01:53:33] Shreya Rajpal: well? I think like a big focus of ours is on hallucinations, so always kind of like checking out hallucination and like conflicting instructions, etc.[01:53:42] is one. Terse responses is another, you know, like how well is it at like not, you know, you ask it a question and here's this 10 point list, and you know, very, very verbose. Do you have a terse[01:53:51] swyx2: response[01:53:51] Shreya Rajpal: validator? Yeah, well not, we don't have it, like, we don't have it publicly, but like we do kind of like check it.[01:53:57] Ah, okay, okay. So I think like those are kind of some of the things.[01:53:59] swyx: There was one, there was one example in the, one of the closed door sessions where they, they, all the answers were two terse. Yeah, yeah, yeah. Where I think everyone laughed when they were like, Can you write a blog post about this?[01:54:08] And the guy, and the GPT said, Sure, I'll do[01:54:11] Shreya Rajpal: it tomorrow. Yeah, yeah, yeah, yeah, yeah. I think like those are, I think those are I'm really, really excited about, Double check. Yeah, just check. I'm really, really excited about JSON generation. Okay. I'm actually kind of surprised to see how long it took them they're[01:54:25] probably just doing constrained decoding under the hood, right? Like constrained generation. Okay. Because they're now saying that guaranteed correct JSON rather than, you know, More correct. Do you get what I'm[01:54:34] swyx: saying? I was, I was parsing through their words. They've never had an issue producing JSON. It's just that sometimes it doesn't fit the JSON schema.[01:54:42] Right? Am I, am I wrong? You would know better than, more than me. No,[01:54:46] Shreya Rajpal: I think there are also issues with, like, producing... I think the, okay, the obvious thing is, like, unbalanced brackets? When it's on context length, I think that's, like, an obvious thing, right? But, like, weird things when you have, like, really long strings, then quotes, et cetera, become kind of weird.[01:54:58] Okay. So I think those are some other ones. Schema is obviously kind of challenging, et cetera, yeah. I think there are, even with function calling, like function calling, at least I haven't played around with it yet today, but previous generations of function calling wouldn't guarantee that your schema is matched.[01:55:13] Which would be an[01:55:14] swyx: issue. And I think they're still not guaranteeing it, because I kept waiting for them to say it. I haven't read any of the public docs or anything. Do you know if they're guaranteeing that it fits the schema, or they're[01:55:23] Shreya Rajpal: like... Oh, that's a good question. I, yeah, that's a good point.[01:55:25] They never say they guarantee it. Yeah, they never said they gu... They, they guaranteed correct JSON, they didn't guarantee if the JSON matches the schema. So,[01:55:32] swyx: okay, you can call JSON loads. Yeah,[01:55:34] Reid Robinson: yeah, yeah. Big[01:55:35] Shreya Rajpal: loop, like, I'm very curious to see, like, once again, if this is a pattern that, you know, all of the other foundation model providers adopt.[01:55:41] And I don't see why not, right? Like, I think for them to kind of, like, own specific decoding models is going to, like, make a lot of sense compared to, you know, like, yeah, a lot of the, a lot of the hacky stuff.[01:55:52] swyx: Yeah, cool. Any other favorites, you know, not, doesn't have to be guardrails related, any favorite conversations, favorite demos, favorite,[01:56:02] Shreya Rajpal: I oh, the GPTs and the assistants.[01:56:04] I think you want to make one for yourself. Yeah, I do want to make one for myself. It doesn't add like, yeah, it's not very Godreels related. I do want to kind of play around with like how well it works with like some of the things we track. But yeah, it was just so fascinating to see the marketplace. I am very, very curious to see, you know, what the marketplace looks like.[01:56:20] Like, is it? Are people going to have, like, really, really vertically specialized things on the marketplace? Like, if you have a generic, you know, sales assistant or something, right? Like, how much, or SQL generator, how much how popular does that become? Versus, like, sales assistant for X vertical at Y stage of the sales process.[01:56:38] Oh my god. Do you know what I mean? Like, it's, it's so easy to do this now. Yeah. That, like, where, at what level of specialization do you need to be to kind of start seeing the results? And that is one thing I'm very excited to see, like, how that, how that pans out.[01:56:51] swyx: It scares me a little bit because it's basically, they said the future of programming is natural language, or something like that.[01:56:56] Yeah. And that's great, but, like, it really is a new platform, a new operating system, almost, that they're that they're creating. And I don't know how to position myself. Not that I have to, because my world is very developer oriented. But this is a whole no code world that you and I[01:57:11] Reid Robinson: don't touch.[01:57:12] Shreya Rajpal: Yeah, yeah, yeah, yeah, yeah, yeah, yeah.[01:57:14] Whoa. Yeah, yeah. I really want to see, like Is there going to be, like, assistance for everything? I'm generally curious to see the impact of this on knowledge work, you know which yeah, like how much of my work, like if I'm getting annoyed by something, is my first instinct going to be like, you know let me just, you know, spend the five minutes to build in a system for this?[01:57:34] Like, is, is that how everybody's now going to start thinking? You know, and that's one thing I kind of really want to see.[01:57:39] swyx: Yeah, that's exciting. Okay. Last question. You spoke at AI Engineer Summit. Let's advertise your talk a little bit and point people to your talk. Yeah, yeah.[01:57:48] Shreya Rajpal: Yeah, so thank you again for inviting me to the AI Engineer Summit.[01:57:51] One of my favorite conferences that I've attended, you know, this year. My talk was about the new paradigms for working with large language models, you know. For building really production ready applications when the technology that you're working with is under, underneath all of it, you know, non deterministic.[01:58:05] Really fascinating thing, which was the OpenAI's talk about building production grade applications, talked about how essential it was to build guardrails as a way to make it do product grade applications. the one[01:58:16] swyx: from today. Yes, the one from today. Which people[01:58:18] Shreya Rajpal: haven't seen yet, but really, really cool talk.[01:58:21] So I think it really validates what we've been saying pretty much since the beginning of the year, which is that you'll get like, You'll get to a certain point, but at that point you need to start adding guardrails to your application if you need to get your users to start, you know, getting value out of what you build out, right?[01:58:37] So,[01:58:38] swyx: I have your chart, and I have their chart. They put guardrails at the first layer. It's not at the end, it's actually right at the beginning for user experience.[01:58:48] Shreya Rajpal: Yeah, that's right, yeah. Yeah, that was kind of interesting to see that they put it as part of the UX. I'm still kind of very candidly, I'm still kind of digesting that.[01:58:56] Like, I think of it as, I think of it as part of the infrastructure. And I don't know if, as it's as much UX as it is, you know, just like one of the components that you need in your stack. Yeah. But I, I, I think the pat, like a lot of what they said today, completely validated, you know, what we've felt for the longest time.[01:59:12] And also what I go really in depth about, like in the talk that I gave, right? Which is that what happens, one, you have the, once you have the bare bones application ready, what is the process? Of actually adding guardrails for what you care about. Like what does that look like? Yeah. You know, what are the risks that you care about?[01:59:27] How do you verify that those risks are happening or not happening? If they are happening, how do you quantify them? And then how do you mitigate them? That was what, what that was what the talk was about, which I really recommend people go and check out.[01:59:37] swyx2: Awesome. Well, you did a great job. We're gonna post the talk soon and thanks.[01:59:41] It's good to see you again. Yeah. Thanks again for inviting me. And that was about all I managed to get before the after party. At the after party, there was actually an after after party thrown by Noose Research.[01:59:51] Alex Volkov (Weights & Biases, ThursdAI) - "Keeping AI Open"[01:59:51] swyx2: So let's hear a little bit about OpenAI versus OpenSourceAI. From Alex Volkov. Okay, so we are in the one day after Dev Day here with Alex.[02:00:01] Hey. Hey. Very, very recognizable voice right now. We don't have to introduce you. Hey, everyone. And we are here to talk about the two parties that happened yesterday. There was one official Dev Day OpenAI afterparty where I interviewed Shreya, who's just before this. And then there's an unofficial one.[02:00:16] For keeping AI open by noose. Yeah. So, what was it like to just compare[02:00:21] Alex Volkov: and contrast? So, let me maybe start with like who noose research is. Oh yeah, yeah, most people haven't heard of it. It's written N O U S O. I mispronounced it now multiple times. It's noose research. It's one of the few...[02:00:33] Organizations online that started like from a discord and then like kept going up until like a significant amount of people are working with them, affiliated with them, of folks who take open source model to its most extreme capability. So collect data, data sets from open source open source and more closed source.[02:00:49] And depending on that, they release like with different licenses and then they find to an open source models that were like released to us from like Lama, for example, and Mistral, which is a French company that recently released a 7d model. And they've been doing this since Lama 1, but recently it really kicked into high gear with Lama 2 releases because Lama 2 ended up being with a commercial license.[02:01:08] So you could actually use this for actual, you know, products and services. And Mistral came out with like a full Apache 2 license with a BitTorrent link. I think you remember that. And so these organizations suddenly became like a very, very important currency in the, in the world of like, Where the whole world of AI is going because they're running local models and many companies love open AI, but either cannot afford this or cannot risk the chance the open AI changes something like what's our dev day.[02:01:35] And so many people are turning on to like, okay, if we want to run our own hardware, how do we actually do this? And you can run it, you can run Llama2 and Mistron, all these models on your own hardware, but then you want to fine tune them for your own purposes. And so how do you actually fine tune? And now organizations like News Research was probably the biggest one, Alignment Labs, Shout out to Austin and folks from from alignment labs skunkworks, and many of these like people come up and say, hey, we have the know how and we only started learning about this like eight months ago, six months ago themselves, but now they're like the You.[02:02:06] Specialized more people that find two models and actually release the best kind of models on the Hug and Face open source leaderboard.[02:02:14] swyx2: Yeah. And in my knowledge, the two models that I keep hearing about, one is Hermes. And he's recently searched the base model for Hermes from Lama to Mistral.[02:02:24] Because apparently it's better. Hermes is like an instruction dataset, 900, 000 instructions. I don't really know where it's from. Maybe I don't want to know. They also do some fun models. There's like a mystical model that they[02:02:35] Alex Volkov: do.[02:02:36] swyx2: Trismestos, yeah. Some stuff like that. I think it's actually a little bit weird that they keep releasing models.[02:02:42] They release like three models a week. It's insane. Right? And it's very hard to keep up. Like, I'm like, okay, which one is actually the one that I should pay attention to? Yeah. So[02:02:50] Alex Volkov: first of all, you're welcome to join Thursday Eye and then we talk about all the models every week. Yes. It's kind of...[02:02:55] Interesting to that if I do like a recap for a month, the beginning of the month, most of the updates don't matter, because like every, every, This,[02:03:02] swyx2: I'm doing monthly, and I, I feel this, like, I'm doing this, I'm doing this for historical posterity, like, Five years from now, people want to look back, then they can look at my notes, because I only have twelve.[02:03:14] Alex Volkov: Yeah, nobody's gonna look at your notes, they're gonna have a GPT trained on your notes answering everything. I have, yeah, I'm doing like every week, and every week we're talking about like, this model outperforms that model like significantly, and we're noticing significant changes from week to week.[02:03:27] Literally in the span of a month we went from a 33 billion parameter model, which is big, And parameter count is not everything there is where you can have a smaller model with like larger, longer training that actually will perform better than whatever, but we're noticing smaller and smaller models doing outperforming bigger ones significantly.[02:03:43] Zephyr from Hug and Face outperformed Llama 70B and Zephyr is like only like a 70B model. On some things. On some things, for sure. And so this is very interesting because like it's really hard to evaluate. Evaluation frameworks are bad. Everybody's saying that they're not representing of anything.[02:03:56] People can fine tune and over tune on them. And so, there's this whole kind of subculture of open source mostly on Discord, some of them on, on X and Twitter spaces. And for some reason, but I find it very humbling and incredible. They also hung out in Thursday. I, and so that's how I got to this.[02:04:13] That's how I got to meet like news research folks Ticknium, Imozilla, and they organized the, the counter party event last night together with some other EAC people that we know from Twitter as well. Including Mark, Jason. So apparently he was supposed to, I didn't see him. Oh, okay. But like he was supposed[02:04:29] swyx2: to, I saw a photo with a bald head of a big guy.[02:04:32] So I was like, is that Mark? I don't, I don't know. Anyway, but the opening eye party was at a art museum. Mm-Hmm. . And then the news research party was at a[02:04:39] Alex Volkov: club as a club? Yes. At Folsom. Folsom Street In San Francisco Club. Yeah. Yeah. 10 15 falls, I think. Sure. Open the eye was a very like. Highbrow, buttoned up, event,[02:04:49] swyx2: post event.[02:04:50] Yeah, there was a live band, someone playing jazz.[02:04:54] Alex Volkov: Which, I think I mentioned this once, it was too loud. We want to talk, we don't want to listen to music. No, no, no, we're just[02:04:59] swyx2: old. Everything is too loud.[02:05:02] Alex Volkov: And then, it was like a lot of people, a lot of networking, a lot of people trying to get together, maybe do business together.[02:05:07] Very, OpenAI actually showed up. A lot of people, we, we stood in line, there was a long line for the Magna Millers to, to step in and then everybody like passing us around was like open the eye employee that passing like straight through. Yeah. And then that ended around eight, which was like the standard San Francisco like buttoned up.[02:05:24] Oh yeah. That's when you go to bed. That's when you go to bed. And that's when the other party kind of started. Yeah. Yeah. And I think they just seized the opportunity 'cause everybody's in town for the open AI stuff. Yeah. Why not? Make a splash, an announcement for, like, for open sourcing AI. So literally, the invite was keepaifree.[02:05:41] com, which was the website, and the invite was keepaiopen. com. And you had to register, you had to go in there, and this was, to me, an incredible... Kind of show of Twitter in real life. So all of the folks who follow Mark Andreesen, he recently stepped into this thing with like the techno optimism stuff.[02:06:00] He started to boost the effective acceleration folks. And so there's a lot of like signature stuff from that like ecosystem on Twitter. There's like, don't thread on me with like, you don't take away my GPUs. There's like all these signs across the club. The, it's a very visual club as well. So we're, the DJs is a whole, like a three D projected thing.[02:06:21] So there's like a bunch of like art and like live things about KPI open. I, I found it like very, very super cool. I, I, I'll, I have to tell you tidbit I saw me and Killian were there from open interpreter. We saw two people with lab coats. It was like, what's the deal with nap codes? So we went to Nest and they just said, Hey, we just like came back from our work where we work on semiconductors. We're actually like touching chips, whatever, just like didn't change out of it. And my head was like so incredible in the keep AI open GPU kind of a poor party. We have people who literally work on superconductors came from the work, like they're working on chips.[02:06:53] Yeah, yeah. Semiconductors are[02:06:54] swyx2: superconductors, very different thing. I think semiconductors. Yeah, we had that superconductor episode a while back. I think people are still recovering.[02:07:03] Alex Volkov: I'm personally still recovering from that. That was the whole thing for me, yeah.[02:07:06] swyx2: So is news research like vibes? You know, like, what is the mission apart from to keep publishing open source models?[02:07:15] Alex Volkov: I think you'll have to get some news people to actually speak, like, about the mission, about the actual product, but as far as I understand this no matter how much the product side will be, and there will likely be, there's so many people that are doing, like, so incredible stuff that people notice, like, you know so no matter how, like, how much of the business side will be, they're, like, committed to fully open source as much as possible, including data sets, including models that are, like, TraceMasters, for example, their model that's like trained on the occult and the physical and metaphysical, you can't expect OpenAI to let you.[02:07:48] Talk with a model, they'll answer with like mystical questions, mystical stuff.[02:07:51] swyx2: Astrology, Halloween.[02:07:54] Alex Volkov: So you're very like easy into the astrology and Halloween. They're talking about like you can ask this model about the resurrection, right? Like all of the occult like craziness that they've collected, OpenAI will not let you do that.[02:08:04] And so there's, I think OpenAI will not let you do it by default because they have lawyers and they don't get sued. Recently they announced the protection shield thing. So you won't get sued because of... They're models, so they're, them, Entropic, all these big companies, it's very important for them to protect the outputs and the models.[02:08:20] Here, these folks are like, Hey, if you want to build a model, fine tune this, we're going to teach you how. Jump on our discord. We're going to help you with producing like the biggest models. And then if, you know, there's going to be like a financial aspect to this as well. If you're a company that wants to run this, we'll also help you do that.[02:08:35] swyx2: Yeah, so it's the same as stability, basically it's, it's, it's, that's from what it, from talking to him that's what I gather. Yeah. Cool. Anything else that people should know about the party, noose? I[02:08:45] Alex Volkov: found this whole day to be like a very singular AI day, and we don't get many of this. GPT 4, I think, was the biggest one previously.[02:08:53] Yeah, March. It was like a singular, March 14th, that's when Thursday Eye started. We started talking about this every week. This was a singular day in San Francisco. This, like, started pregame. Party with Swyx and some other folks that I, I got to feel like a little bit of San Francisco. And then Dev Day was incredible.[02:09:08] We just heard from Simon. There was like a garage that they made into a venue event, probably custom venue event on the fly, which like just talks to how much they can pull off. It felt to me that like this Dev Day event and then the following party, it felt a little bit like Almost like an Apple thing, where like, it's going to be a yearly thing that people will like, try to get in as much as possible.[02:09:28] One thing to note that in the other party, there were many people who didn't get in to this party. And so, you know, they were watching from like a a party.[02:09:36] swyx2: Yeah, this this office right here.[02:09:37] Alex Volkov: This office people watched here, and people watched in, in the life space that we, we... Yeah, 8, 000[02:09:42] swyx2: people tuned in to our spaces.[02:09:43] 8, 000 people[02:09:44] Alex Volkov: tuned in? I didn't even have a[02:09:45] swyx2: chance to look at it. I always want to know the number. Oh, wow. So it, it shows the relative level of interest, and you know, like, so, quoted 22, 000. Mm. And this is 8, 000. Yeah. Just relative. Interest. Yeah, there's[02:09:56] Alex Volkov: like two spaces as well. Robert Skobel, he stole the thunder a little bit.[02:10:00] He stole some audience from us. Shout out to Robert. And I think that like it's, it was a singular day. And I think the News Research, KeepOpenSourceOpen, EAC, Mark Andreesen, like all these things together also added to the top of this. Because like it happened in the same day, one on top of another in the same place, San Francisco.[02:10:15] I find it incredible. I will, you know, definitely come back next year. Yeah. Okay. Yeah.[02:10:20] swyx2: Well I think you'll be back sooner than that. Yeah, probably. There'll be other things going on. All right. Thanks. Awesome. All right.[02:10:26] Rahul Sonwalkar (Julius AI) - Advice for Founders[02:10:26] swyx2: Last but not least, we go back all the way to the Newton, where I started this podcast, where we checked in with Rahul Samwalka, better known as Rahul Ligma, who just celebrated his one year anniversary as one of the biggest memes and celebrities in San Francisco.[02:10:43] But by day, he's also the CEO and co founder of Julius AI. What's up, Swyx? Hey good to see you. It is one day after Dev Day, and we all had a chance to process. How do you feel? What's what's your top takes? That[02:10:57] Rahul Ligma: was awesome. I got to see a bunch of really smart people who are building cool things with OpenAI, GPT, Dolly.[02:11:03] The event was very well put together. The keynote was awesome. The energy in the room was crazy. And I could see real time social media firing up with all these takes. Overall, I think it was a good, good day. Yeah, I[02:11:15] swyx2: interviewed Surya Dantiluri. Yeah. I think you know him. He was like Sama just killed my startup.[02:11:22] And it was almost true for him. Cause he has a bunch of plugins. And plugins are kind of deprecated. Yeah, yeah,[02:11:30] Rahul Ligma: yeah. The plugin thing was interesting because it was, it's going to be deprecated, but[02:11:35] swyx2: they just[02:11:37] Rahul Ligma: accidentally turned it off yesterday. Yeah, so he freaked out a bit. He freaked out, and then they brought it back up.[02:11:42] It's[02:11:42] swyx2: Yeah. Yeah. So, top features that you're interested in, that you want to explore more.[02:11:48] Rahul Ligma: I think people are super psyched about the assistance API, but personally, if you ask me, two things that I am most excited about is turbo. Yeah. The speed is, is crazy.[02:11:57] swyx2: And... Have you actually, have you measured, you know, do you know any, like, rough measures?[02:12:01] Because I don't think they actually ever mentioned the speed relative difference. I[02:12:06] Rahul Ligma: started noticing the speed difference in chat GPT, actually, like, a few weeks ago. Oh, I[02:12:11] swyx2: see. So they already slowly eased[02:12:12] Rahul Ligma: this into it. Yeah, yeah. And I saw, like, takes on Twitter that, did anyone notice chat GPT get much faster?[02:12:18] And I noticed it too. Yeah. But, so it's turbo, it was exciting, but the second thing that's exciting is multiple function calling, and then the JSON output formatting. I think as developers are building on... The dev API. So that's the thing that's super exciting to me. You know, of course there's vision stuff, there's code interpreter as a tool in the API.[02:12:40] But, I think what will bring the most applications is actually the, the speed. Because there are so many things, if you look at our numbers, on Julius. are not patient. They want an answer, and they want an answer quick. And we see clearly, if you can get an answer to them a few seconds faster, there's a clear difference in the conversion.[02:13:05] So, speed is going to be big. What is conversion for[02:13:07] swyx2: you?[02:13:07] Rahul Ligma: Is that just paying? Oh, no, it's like, from first message to second message. I see. So we do code gen, and then we run the code, and then the code has an output, and the user asks a second message, and we can just see the funnel, where, if it's faster, the code runs faster.[02:13:24] And the second thing is multiple function calling. I think you're basically telling the AI that, so I think people misunderstand function calling. It's essentially tool use. And if you can tell the AI, hey, you can give me multiple tools to use at once, I think that's going to unlock different applications than before.[02:13:44] Because before it was just like, okay, this is a task, tell me one tool and what's the input for it. But if the AI can now. Use multiple tools in parallel. You can first of all have more specialized tools. And then get more specialized instructions for each tool. Yeah. It's just going to unlock a lot of cool applications that previously weren't possible.[02:14:04] swyx2: There was a practical limit in the number of tools that you can give it, right? So we had this discussion in March, February March, April, when they released the function API. That is subject to context window. Jason Schema itself. Yeah. Does that change at all? Or I don't know if you, I, you[02:14:19] Rahul Ligma: know, I don't.[02:14:21] Yeah. But what I noticed though, before, even before was that more functions and more options just confused it. And that's what I want to play with next is like, okay, what's the breaking point? I see, like, does more options, you know, confuse it? Does it[02:14:35] swyx2: make it Would you, would you use multiple function calls as well, or?[02:14:39] Oh, totally, totally. Is that just theoretical?[02:14:40] Rahul Ligma: No, no, no. I have a direct application for it right now. One of them is oftentimes, GPT writes code, and then we run that code, and we realize that, oh, from GPT's last knowledge update, that module in Python has changed. It has new functions, new APIs. So today, the way we do it is, when the error happens, we tell GPT, Okay, you can go look up.[02:15:01] New documentation, and then fix that error. But with multiple function calling, the way we would do it is like, Give me the code, but then also give me a documentation lookup. And then when the error happens, I can just quickly fix that without another GPT call. And then keep moving. But I mean, in general, it's just like, multiple to use to me is just so exciting as a developer.[02:15:23] And I wish people were talking more about this.[02:15:26] swyx2: Yeah, I mean people are still coming to terms with just like the base model and prompt engineering and all that. That's still important, but for engineers, I think you should explore these other advanced features. True. Yeah, yeah. Anything on the multi modality side that you're interested in?[02:15:39] I mean,[02:15:40] Rahul Ligma: vision will be super interesting for sure. And we have this functionality in Julius right now where you can generate React and HTML components.[02:15:49] swyx2: Like v0? I think Matt was showing me. Yeah, a little bit of that demo. Yeah, yeah. We have been hacking on it[02:15:56] Rahul Ligma: a lot. I think the missing piece here is that, well, you have an engineer who knows how to react, and they probably wouldn't find this useful, but if I can allow, like, anyone in the world to just draw a mock up on a piece of paper, and then run that, and have the version, yeah, demoed, yeah, yeah turn it into, like, actual components I could use on a webpage, that'd be sick.[02:16:19] And what's even more sick is, like, have the feedback loop where you take a screenshot of the page generated and then feed that screenshot back in division, and then come up with more instruction and have that loop. Yeah. Wow. Like a self-improving webpage. Isn't that crazy? Yeah. I'm, I'm so[02:16:35] swyx2: excited. Yeah.[02:16:36] Yeah. So in my mind, Julius is very data focused. I, I, I, by, by the way, I didn't introduce you, I didn't introduce Juli. I was just gonna do it separately. Yeah. But, people know who you are. . Yeah. You're, you're, you have a Wikipedia page. Yeah. You just passed your one year anniversary as Rahing Ma.[02:16:50] Thank you. By the way, any, any fun things happen on the anniversary or one of the fun things I ilio said, IA recognize you on the spot. Oh. IA[02:16:56] Rahul Ligma: was like, oh my God, this is, ah your famous or whatever. And no, these guys are so awesome. Like, they're so humble. But anyhow, on the first one year anniversary, nothing really, like, it's, I mean, you knew about it a week[02:17:07] swyx2: before.[02:17:08] I like to set anniversary dates. That's awesome. Because it reminds people of the passage of time. Like, it's like, wow, s**t, has that been a year? Yeah. And then you're like, I think it motivates, it motivates me more than, like, Memento Mori. Like, yeah, you know, sometimes you're out of date. But it reminds me to spend my years wisely.[02:17:27] To do interesting things with the time[02:17:28] Rahul Ligma: that I have. Momentum is kind of depressing whereas[02:17:31] swyx2: this is, this is like, oh yeah, did you know that like one year ago we had this thing? Yeah okay cool, but then Julius you, data analysis chat thing basically Code Interpreter is how I think about it.[02:17:42] And also you just cross the 100, 000 users? You have delivery modes across your plug in as well as a chatbox, like a dedicated web app? Yep. Okay. Anything else that people should[02:17:54] Rahul Ligma: know? Well, the, our vision is, you know, writing code is super fundamental to doing things. You could not only automate a bunch of tasks in your life which is writing code, but also it's how you how you just, like, interact with the universe, right?[02:18:10] You can, you have. Code that brings you a way more car and picks you up and just drops you off somewhere. And I think allowing these language models to write code and do things for you is really powerful. And data announces this application that we're most excited about right now because that's what it's good at, immediately.[02:18:27] But just on Friday we launched FFmpeg support. And there were people trying to upload videos, turn those videos into GIFs, or like, take a YouTube video and turn it into a... You know, short summary and all these different cool use cases that we didn't truly, like, hard code into Julius. We just told it, hey, now you can run FFmpeg and you can run ITDLP and MoviePy and all these different things.[02:18:49] Do these tasks for me. And then people were just, like, organically describing those things. There's this guy, TDM, on Twitter, CTOJr. And he took some meme video and put it on my own tweet, overlaid on my own tweet, that. And then that got a bunch of likes. And I was like, dude, like, this is the first one that gets a lot of likes on, you know, FFmpeg on Julius.[02:19:11] So[02:19:12] swyx2: that's Julius. That has a lot of meme potential. It has a lot of[02:19:13] Rahul Ligma: meme potential, but that's not what we're going for. Yeah. You know, it's just like, letting people, like, do things.[02:19:18] swyx2: Your target market is, like, the FD, the enterprise? It's actually individuals who have data.[02:19:25] Rahul Ligma: And[02:19:26] swyx2: they just want to drop[02:19:27] Rahul Ligma: academics, a lot of academics, actually.[02:19:29] Yeah. A lot of academics, a lot of students, researchers, any kind of CSV, Excel data, you can just dump into Julius and then have it analyzed for you. We have this video coming out in a few days where you can now actually train a nano GPT. On Julius, so you can give it, Hey, here's the good arriba for[02:19:46] swyx2: carpi.[02:19:47] So yeah, it has a, you has, you have GPUs to train it on, or you just training in CPU CPU minutes. GPUs. Yeah. Yeah, that's true. That's true. Yeah. I, me cario like that. , yeah. Yeah, yeah. . Okay, cool. So the, the thing I really wanna sort of ask you as a founder on is, you know, I think there's always this existential threat about OpenAI building your features, right?[02:20:04] Yeah. In a way, so like the, the number two default bot in the, in the GPT app store Yeah. Is data analysis. Yeah, and people can build their own by customizing and adding code interpreter. Yeah, although I think there's also opportunities for you. So on the roadmap that they presented in the closed session, they also said you can bring your own code interpreter.[02:20:25] Yeah, so like how are you thinking about that?[02:20:28] Rahul Ligma: I mean As a founder, or as, so, who's the audience? Is it like, other founders, or is it? Other founders,[02:20:36] swyx2: and people are just interested in how you are, you're processing this. Yeah. I mean, I think it's a very interesting story of processing this live, because the news just dropped yesterday.[02:20:45] Rahul Ligma: Yeah, totally. Well, so, the story behind Julius is that we actually launched Julius three months after Code Interpreter was announced, and a few weeks after it was rolled out to everyone else in the world. Yeah. So... We, we, we were number two. And even then we got 100, 000 users. Because I think there's a lot of work to do to get something to work properly.[02:21:07] And there's a bunch of examples of this on the internet. So if I'm talking to founders, what I'll tell them is, Man, so many people give up before even getting started. And that happens a lot. Don't do that. Sure you can change your idea. You can find new things to work on. But. The way I'm processing is that, wait, we were actually, we launched after Code Interpreter came out.[02:21:28] And, there's a hundred thousand people who think Julius is better than Code Interpreter. Or use[02:21:33] swyx2: it. Or just try it out. Yeah. Or try it out.[02:21:36] Rahul Ligma: And use it over Code Interpreter. And, there's like a lot of work to do. Like, for example, the FFmpeg stuff we launched on Friday. Mm. Or the HTML stuff. Or React, you know, React component stuff.[02:21:46] All these different things. To get them to work. It takes some effort. How I'm processing it? I mean, you know, that's like, that's what startups are all about. It's like risk, right? If you, if you want to build a risk free startup, you probably don't want to work on startups. Yeah, just go get a job.[02:22:02] Just go get a job. Exactly. So I'm having so much fun. The way I'm thinking about this is like, whoa, there's all these new different things I could do now. I could build. That's so exciting to me. And I'm pumped.[02:22:14] swyx2: Yeah. Awesome. That's it. Any last words? Call to action?[02:22:18] Rahul Ligma: Call to action. Let's go build some cool things and get a bunch of users.[02:22:23] swyx2: Let's do it, guys. Yeah. Alright. Awesome. Thanks so much. Thanks, Swyx. I think that's a meme that we can all get behind. Let's go build things for a bunch of users with AI. Get full access to Latent Space at www.latent.space/subscribe
02:22:3308/11/2023
Beating GPT-4 with Open Source LLMs — with Michael Royzen of Phind
At the AI Pioneers Summit we announced Latent Space Launchpad, an AI-focused accelerator in partnership with Decibel. If you’re an AI founder of enterprise early adopter, fill out this form and we’ll be in touch with more details. We also have a lot of events coming up as we wrap up the year, so make sure to check out our community events page and come say hi!We previously interviewed the founders of many developer productivity startups embedded in the IDE, like Codium AI, Cursor, and Codeium. We also covered Replit’s (former) SOTA model, replit-code-v1-3b and most recently had Amjad and Michele announce replit-code-v1_5-3b at the AI Engineer Summit.Much has been speculated about the StackOverflow traffic drop since ChatGPT release, but the experience is still not perfect. There’s now a new player in the “search for developers” arena: Phind.Phind’s goal is to help you find answers to your technical questions, and then help you implement them. For example “What should I use to create a frontend for a Python script?” returns a list of frameworks as well as links to the sources. You can then ask follow up questions on specific implementation details, having it write some code for you, etc. They have both a web version and a VS Code integrationThey recently were top of Hacker News with the announcement of their latest model, which is now the #1 rated model on the BigCode Leaderboard, beating their previous version:TLDR Cheat Sheet:* Based on CodeLlama-34B, which is trained on 500B tokens* Further fine-tuned on 70B+ high quality code and reasoning tokens* Expanded context window to 16k tokens* 5x faster than GPT-4 (100 tok/s vs 20 tok/s on single stream)* 74.7% HumanEval vs 45% for the base modelWe’ve talked before about HumanEval being limited in a lot of cases and how it needs to be complemented with “vibe based” evals. Phind thinks of evals alongside two axis: * Context quality: when asking the model to generate code, was the context high quality? Did we put outdated examples in it? Did we retrieve the wrong files?* Result quality: was the code generated correct? Did it follow the instructions I gave it or did it misunderstand some of it?If you have bad results with bad context, you might get to a good result by working on better RAG. If you have good context and bad result you might either need to work on your prompting or you have hit the limits of the model, which leads you to fine tuning (like they did). Michael was really early to this space and started working on CommonCrawl filtering and indexing back in 2020, which led to a lot of the insights that now power Phind. We talked about that evolution, his experience at YC, how he got Paul Graham to invest in Phind and invite him to dinner at his house, and how Ron Conway connected him with Jensen Huang to get access to more GPUs!Show Notes* Phind* BigScience T0* InstructGPT Paper* Inception-V3* LMQL* Marginalia Nu* Mistral AI* People:* Paul Graham (pg)* Ron Conway* Yacine Jernite from HuggingFace* Jeff DelaneyTimestamps* [00:00:00] Intros & Michael's early interest in computer vision* [00:03:14] Pivoting to NLP and natural language question answering models* [00:07:20] Building a search engine index of Common Crawl and web pages* [00:11:26] Releasing the first version of Hello based on the search index and BigScience T0 model* [00:14:02] Deciding to focus the search engine specifically for programmers* [00:17:39] Overview of Phind's current product and focus on code reasoning* [00:21:51] The future vision for Phind to go from idea to complete code* [00:24:03] Transitioning to using the GPT-4 model and the impact it had* [00:29:43] Developing the Phind model based on CodeLlama and additional training* [00:32:28] Plans to continue improving the Phind model with open source technologies* [00:43:59] The story of meeting Paul Graham and Ron Conway and how that impacted the company* [00:53:02] How Ron Conway helped them get GPUs from Nvidia* [00:57:12] Tips on how Michael learns complex AI topics* [01:01:12] Lightning RoundTranscriptAlessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO of Residence and Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI. [00:00:19]Swyx: Hey, and today we have in the studio Michael Royzen from Phind. Welcome. [00:00:23]Michael: Thank you so much. [00:00:24]Alessio: It's great to be here. [00:00:25]Swyx: Yeah, we are recording this in a surprisingly hot October in San Francisco. And sometimes the studio works, but the blue angels are flying by right now, so sorry about the noise. So welcome. I've seen Phind blow up this year, mostly, I think since your launch in Feb and V2 and then your Hacker News posts. We tend to like to introduce our guests, but then obviously you can fill in the blanks with the origin story. You actually were a high school entrepreneur. You started SmartLens, which is a computer vision startup in 2017. [00:00:59]Michael: That's right. I remember when like TensorFlow came out and people started talking about, obviously at the time after AlexNet, the deep learning revolution was already in flow. Good computer vision models were a thing. And what really made me interested in deep learning was I got invited to go to Apple's WWDC conference as a student scholar because I was really into making iOS apps at the time. So I go there and I go to this talk where they added an API that let people run computer vision models on the device using far more efficient GPU primitives. After seeing that, I was like, oh, this is cool. This is going to have a big explosion of different computer vision models running locally on the iPhone. And so I had this crazy idea where it was like, what if I could just make this model that could recognize just about anything and have it run on the device? And that was the genesis for what eventually became SmartLens. I took this data set called ImageNet 22K. So most people, when they think of ImageNet, think of ImageNet 1K. But the full ImageNet actually has, I think, 22,000 different categories. So I took that, filtered it, pre-processed it, and then did a massive fine tune on Inception V3, which was, I think, the state of the art deep convolutional computer vision model at the time. And to my surprise, it actually worked insanely well. I had no idea what would happen if I give a single model. I think it ended up being 17,000 categories approximately that I collapsed them into. It worked so well that it actually worked better than Google Lens, which released its V1 around the same time. And on top of this, the model ran on the device. So it didn't need an internet connection. A big part of the issue with Google Lens at the time was that connections were slower. 4G was around, but it wasn't nearly as fast. So there was a noticeable lag having to upload an image to a server and get it back. But just processing it locally, even on the iPhones of the day in 2017, much faster. It was a cool little project. It got some traction. TechCrunch wrote about it. There was kind of like one big spike in usage, and then over time it tapered off. But people still pay for it, which is wild. [00:03:14]Swyx: That's awesome. Oh, it's like a monthly or annual subscription? [00:03:16]Michael: Yeah, it's like a monthly subscription. [00:03:18]Swyx: Even though you don't actually have any servers? [00:03:19]Michael: Even though we don't have any servers. That's right. I was in high school. I had a little bit of money. I was like, yeah. [00:03:25]Swyx: That's awesome. I always wonder what the modern equivalents kind of "Be my eyes". And it would be actually disclosed in the GPT-4 Vision system card recently that the usage was surprisingly not that frequent. The extent to which all three of us have our sense of sight. I would think that if I lost my sense of sight, I would use Be My Eyes all the time. The average usage of Be My Eyes per day is 1.5 times. [00:03:49]Michael: Exactly. I was thinking about this as well, where I was also looking into image captioning, where you give a model an image and then it tells you what's in the image. But it turns out that what people want is the exact opposite. People want to give a description of an image and then have the AI generate the image. [00:04:04]Alessio: Oh, the other way. [00:04:06]Michael: Exactly. And so at the time, I think there were some GANs, NVIDIA was working on this back in 2019, 2020. They had some impressive, I think, face GANs where they had this model that would produce these really high quality portraits, but it wasn't able to take a natural language description the way Midjourney or DALL-E 3 can and just generate you an image with exactly what you described in it. [00:04:32]Swyx: And how did that get into NLP? [00:04:35]Michael: Yeah, I released the SmartLens app and that was around the time I was a senior in high school. I was applying to college. College rolls around. I'm still sort of working on updating the app in college. But I start thinking like, hey, what if I make an enterprise version of this as well? At the time, there was Clarify that provided some computer vision APIs, but I thought this massive classification model works so well and it's so small and so fast, might as well build an enterprise product. And I didn't even talk to users or do any of those things that you're supposed to do. I was just mainly interested in building a type of backend I've never built before. So I was mainly just doing it for myself just to learn. I built this enterprise classification product and as part of it, I'm also building an invoice processing product where using some of the aspects that I built previously, although obviously it's very different from classification, I wanted to be able to just extract a bunch of structured data from an unstructured invoice through our API. And that's what led me to Hugnyface for the first time because that involves some natural language components. And so I go to Hugnyface and with various encoder models that were around at the time, I used the standard BERT and also Longformer, which came out around the same time. And Longformer was interesting because it had a much bigger context window than those models at the time, like BERT, all of the first gen encoder only models, they only had a context window of 512 tokens and it's fixed. There's none of this alibi or ROPE that we have now where we can basically massage it to be longer. They're fixed, 512 absolute encodings. Longformer at the time was the only way that you can fit, say, like a sequence length or ask a question about like 4,000 tokens worth of text. Implemented Longformer, it worked super well, but like nobody really kind of used the enterprise product and that's kind of what I expected because at the end of the day, it was COVID. I was building this kind of mostly for me, mostly just kind of to learn. And so nobody really used it and my heart wasn't in it and I kind of just shelved it. But a little later, I went back to HugMeFace and I saw this demo that they had, and this is in the summer of 2020. They had this demo made by this researcher, Yacine Jernite, and he called it long form question answering. And basically, it was this self-contained notebook demo where you can ask a question the way that we do now with ChatGPT. It would do a lookup into some database and it would give you an answer. And it absolutely blew my mind. The demo itself, it used, I think, BART as the model and in the notebook, it had support for both an Elasticsearch index of Wikipedia, as well as a dense index powered by Facebook's FAISS. I think that's how you pronounce it. It was very iffy, but when it worked, I think the question in the demo was, why are all boats white? When it worked, it blew my mind that instead of doing this few shot thing, like people were doing with GPT-3 at the time, which is all the rage, you could just ask a model a question, provide no extra context, and it would know what to do and just give you the answer. It blew my mind to such an extent that I couldn't stop thinking about that. When I started thinking about ways to make it better, I tried training, doing the fine tune with a larger BART model. And this BART model, yeah, it was fine tuned on this Reddit data set called Eli5. So basically... [00:08:02]Alessio: Subreddit. [00:08:03]Swyx: Yeah, subreddit. [00:08:04]Alessio: Yeah. [00:08:05]Michael: And put it into like a well-formatted, relatively clean data set of like human questions and human answers. And that was a really great bootstrap for that model to be able to answer these types of questions. And so Eli5 actually turned out to be a good data set for training these types of question answering models, because the question is written by a human, the answer is written by a human, and at least helps the model get the format right, even if the model is still very small and it can't really think super well, at least it gets the format right. And so it ends up acting as kind of a glorified summarization model, where if it's fed in high quality context from the retrieval system, it's able to have a reasonably high quality output. And so once I made the model as big as I can, just fine tuning on BART large, I started looking for ways to improve the index. So in the demo, in the notebook, there were instructions for how to make an Elasticsearch index just for Wikipedia. And I was like, why not do all of Common Crawl? So I downloaded Common Crawl, and thankfully, I had like 10 or $15,000 worth of AWS credits left over from the SmartLens project. And that's what really allowed me to do this, because there's no other funding. I was still in college, not a lot of money, and so I was able to spin up a bunch of instances and just process all of Common Crawl, which is massive. So it's roughly like, it's terabytes of text. I went to Alexa to get the top 1,000 websites or 10,000 websites in the world, then filtered only by those websites, and then indexed those websites, because the web pages were already included in Dump. [00:09:38]Swyx: You mean to supplement Common Crawl or to filter Common Crawl? [00:09:41]Michael: Filter Common Crawl. [00:09:42]Alessio: Oh, okay. [00:09:43]Michael: Yeah, sorry. So we filtered Common Crawl just by the top, I think, 10,000, just to limit this, because obviously there's this massive long tail of small sites that are really cool, actually. There's other projects like, shout out to Marginalia Nu, which is a search engine specialized on the long tail. I think they actually exclude the top 10,000. [00:10:03]Swyx: That's what they do. [00:10:04]Alessio: Yeah. [00:10:05]Swyx: I've seen them around, I just don't really know what their pitch is. Okay, that makes sense. [00:10:08]Michael: So they exclude all the top stuff. So the long tail is cool, but for this, that was kind of out of the question, and that was most of the data anyway. So we've removed that. And then I indexed the remaining approximately 350 million webpages through Elasticsearch. So I built this index running on AWS with these webpages, and it actually worked quite well. You can ask it general common knowledge, history, politics, current events, questions, and it would be able to do a fast lookup in the index, feed it into the model, and it would give a surprisingly good result. And so when I saw that, I thought that this is definitely doable. And it kind of shocked me that no one else was doing this. And so this was now the fall of 2020. And yeah, I was kind of shocked no one was doing this, but it costs a lot of money to keep it up. I was still in college. There are things going on. I got bogged down by classes. And so I ended up shelving this for almost a full year, actually. When I returned to it in fall of 2021, when BigScience released T0, when BigScience released the T0 models, that was a massive jump in the reasoning ability of the model. And it was better at reasoning, it was better at summarization, it was still a glorified summarizer basically. [00:11:26]Swyx: Was this a precursor to Bloom? Because Bloom's the one that I know. [00:11:29]Alessio: Yeah. [00:11:30]Michael: Actually coming out in 2022. But Bloom had other problems where for whatever reason, the Bloom models just were never really that good, which is so sad because I really wanted to use them. But I think they didn't turn on that much data. I think they used like the original, they were trying to replicate GPT-3. So they just use those numbers, which we now know are like far below Chinchilla Optimal and even Chinchilla Optimal, which we can like talk about later, like what we're currently doing with MIMO goes, yeah, it goes way beyond that. But they weren't trying enough data. I'm not sure how that data was clean, but it probably wasn't super clean. And then they didn't really do any fine tuning until much later. So T0 worked well because they took the T5 models, which were closer to Chinchilla Optimal because I think they were trained on also like 300 something billion tokens, similar to GPT-3, but the models were much smaller. I think T0 is the first model that did large scale instruction tuning from diverse data sources in the fall of 2021. This is before Instruct GPT. This is before Flan T5, which came out in 2022. This is the very, very first, at least well-known example of that. And so it came out and then I did, on top of T0, I also did the Reddit Eli5 fine tune. And that was the first model and system that actually worked well enough to where I didn't get discouraged like I did previously, because the failure cases of the BART based system was so egregious. Sometimes it would just miss a question so horribly that it was just extremely discouraging. But for the first time, it was working reasonably well. Also using a much bigger model. I think the BART model is like 800 million parameters, but T0, we were using 3B. So it was T0, 3B, bigger model. And that was the very first iteration of Hello. So I ended up doing a show HN on Hacker News in January 2022 of that system. Our fine tune T0 model connected to our Elasticsearch index of those 350 million top 10,000 common crawl websites. And to the best of my knowledge, I think that's the first example that I'm aware of a LLM search engine model that's effectively connected to like a large enough index that I consider like an internet scale. So I think we were the first to release like an internet scale LLM powered rag search system In January 2022, around the time me and my future co-founder, Justin, we were like, this seems like the future. [00:14:02]Alessio: This is really cool. [00:14:03]Michael: I couldn't really sleep even like I was going to bed and I was like, I was thinking about it. Like I would say up until like 2.30 AM, like reading papers on my phone in bed, go to sleep, wake up the next morning at like eight and just be super excited to keep working. And I was also doing my thesis at the same time, my senior honors thesis at UT Austin about something very similar. We were researching factuality in abstractive question answering systems. So a lot of overlap with this project and the conclusions of my research actually kind of helped guide the development path of Hello. In the research, we found that LLMs, they don't know what they don't know. So the conclusion was, is that you always have to do a search to ensure that the model actually knows what it's talking about. And my favorite example of this even today is kind of with chat GPT browsing, where you can ask chat GPT browsing, how do I run llama.cpp? And chat GPT browsing will think that llama.cpp is some file on your computer that you can just compile with GCC and you're all good. It won't even bother doing a lookup, even though I'm sure somewhere in their internal prompts they have something like, if you're not sure, do a lookup. [00:15:13]Alessio: That's not good enough. So models don't know what they don't know. [00:15:15]Michael: You always have to do a search. And so we approached LLM powered question answering from the search angle. We pivoted to make this for programmers in June of 2022, around the time that we were getting into YC. We realized that what we're really interested in is the case where the models actually have to think. Because up until then, the models were kind of more glorified summarization models. We really thought of them like the Google featured snippets, but on steroids. And so we saw a future where the simpler questions would get commoditized. And I still think that's going to happen with like Google SGE and like it's nowadays, it's really not that hard to answer the more basic kind of like summarization, like current events questions with lightweight models that'll only continue to get cheaper over time. And so we kind of started thinking about this trade off where LLM models are going to get both better and cheaper over time. And that's going to force people who run them to make a choice. Either you can run a model of the same intelligence that you could previously for cheaper, or you can run a better model for the same price. So someone like Google, once the price kind of falls low enough, they're going to deploy and they're already doing this with SGE, they're going to deploy a relatively basic glorified summarizer model that can answer very basic questions about like current events, who won the Super Bowl, like, you know, what's going on on Capitol Hill, like those types of things. The flip side of that is like more complex questions where like you have to reason and you have to solve problems and like debug code. And we realized like we're much more interested in kind of going along the bleeding edge of that frontier case. And so we've optimized everything that we do for that. And that's a big reason of why we've built Phind specifically for programmers, as opposed to saying like, you know, we're kind of a search engine for everyone because as these models get more capable, we're very interested in seeing kind of what the emergent properties are in terms of reasoning, in terms of being able to solve complex multi-step problems. And I think that some of those emerging capabilities like we're starting to see, but we don't even fully understand. So I think there's always an opportunity for us to become more general if we wanted, but we've been along this path of like, what is the best, most advanced reasoning engine that's connected to your code base, that's connected to the internet that we can just provide. [00:17:39]Alessio: What is Phind today, pragmatically, from a product perspective, how do people interact with it? Yeah. Or does it plug into your workflow? [00:17:46]Michael: Yeah. [00:17:47]Alessio: So Phind is really a system. [00:17:48]Michael: Phind is a system for programmers when they have a question or when they're frustrated or when something's not working. [00:17:54]Swyx: When they're frustrated. [00:17:55]Alessio: Yeah. [00:17:56]Michael: For them to get on block. I think like the single, the most abstract page for Phind is like, if you're experiencing really any kind of issue as a programmer, we'll solve that issue for you in 15 seconds as opposed to 15 minutes or longer. Phind has an interface on the web. It has an interface in VS code and more IDEs to come, but ultimately it's just a system where a developer can paste in a question or paste in code that's not working and Phind will do a search on the internet or they will find other code in your code base perhaps that's relevant. And then we'll find the context that it needs to answer your question and then feed it to a reasoning engine powerful enough to actually answer it. So that's really the philosophy behind Phind. It's a system for getting developers the answers that they're looking for. And so right now from a product perspective, this means that we're really all about getting the right context. So the VS code extension that we launched recently is a big part of this because you can just ask a question and it knows where to find the right code context in your code. It can do an internet search as well. So it's up to date and it's not just reliant on what the model knows and it's able to figure out what it needs by itself and answer your question based on that. If it needs some help, you can also get yourself kind of just, there's opportunities for you yourself to put in all that context in. But the issue is also like not everyone wants these VS code. Some people like are real Neovim sticklers or they're using like PyCharm or other IDEs, JetBrains. And so for those people, they're actually like okay with switching tabs, at least for now, if it means them getting their answer. Because really like there's been an explosion of all these like startups doing code, doing search, etc. But really who everyone's competing with is ChatGPT, which only has like that one web interface. Like ChatGPT is really the bar. And so that's what we're up against. [00:19:50]Alessio: And so your idea, you know, we have Amman from Cursor on the podcast and they've gone through the we need to own the IDE thing. Yours is more like in order to get the right answer, people are happy to like go somewhere else basically. They're happy to get out of their IDE. [00:20:05]Michael: That was a great podcast, by the way. But yeah, so part of it is that people sometimes perhaps aren't even in an IDE. So like the whole task of software engineering goes way beyond just running code, right? There's also like a design stage. There's a planning stage. A lot of this happens like on whiteboards. It happens in notebooks. And so the web part also exists for that where you're not even coding it and you're just trying to get like a more conceptual understanding of what you're trying to build first. The podcast with Amman was great, but somewhere where I disagree with him is that you need to own the IDE. I think like he made some good points about not having platform risk in the long term. But some of the features that were mentioned like suggesting diffs, for example, those are all doable with an extension. We haven't yet seen with VS Code in particular any functionality that we'd like to do yet in the IDE that we can't either do through directly supported VS Code functionality or something that we kind of hack into there, which we've also done a fair bit of. And so I think it remains to be seen where that goes. But I think what we're looking to be is like we're not trying to just be in an IDE or be an IDE. Like Phind is a system that goes beyond the IDE and like is really meant to cover the entire lifecycle of a developer's thought process in going about like, hey, like I have this idea and I want to get from that idea to a working product. And so then that's what the long term vision of Phind is really about is starting with that. In the future, I think programming is just going to be really just the problem solving. Like you come up with an idea, you come up with like the basic design for the algorithm in your head, and you just tell the AI, hey, just like just do it, just make it work. And that's what we're building towards. [00:21:51]Swyx: I think we might want to give people an impression about like type of traffic that you have, because when you present it with a text box, you could type in anything. And I don't know if you have some mental categorization of like what are like the top three use cases that people tend to coalesce around. [00:22:08]Alessio: Yeah, that's a great question. [00:22:09]Michael: The two main types of searches that we see are how-to questions, like how to do X using Y tool. And this historically has been our bread and butter, because with our embeddings, like we're really, really good at just going over a bunch of developer documentation and figuring out exactly the part that's relevant and just telling you, OK, like you can use this method. But as LLMs have gotten better, and as we've really transitioned to using GPT-4 a lot in our product, people organically just started pasting in code that's not working and just said, fix it for me. [00:22:42]Swyx: Fix this. [00:22:43]Alessio: Yeah. [00:22:44]Michael: And what really shocks us is that a lot of the people who do that, they're coming from chat GPT. So they tried it in chat GPT with chat GPT-4. It didn't work. Maybe it required like some multi-step reasoning. Maybe it required some internet context or something found in either a Stack Overflow post or some documentation to solve it. And so then they paste it into find and then find works. So those are really those two different cases. Like, how can I build this conceptually or like remind me of this one detail that I need to build this thing? Or just like, here's this code. Fix it. And so that's what a big part of our VS Code extension is, is like enabling a much smoother here just like fix it for me type of workflow. That's really its main benefits. Like it's in your code base. It's in the IDE. It knows how to find the relevant context to answer that question. But at the end of the day, like I said previously, that's still a relatively, not to say it's a small part, but it's a limited part of the entire mental life cycle of a programmer. [00:23:47]Swyx: Yep. So you launched in Feb and then you launched V2 in August. You had a couple other pretty impactful posts slash feature launches. The web search one was massive. So you were mostly a GPT-4 wrapper. We were for a long time. [00:24:03]Michael: For a long time until recently. Yeah. [00:24:05]Alessio: Until recently. [00:24:06]Swyx: So like people coming over from ChatGPT were saying, we're going to say model with your version of web search. Would that be the primary value proposition? [00:24:13]Michael: Basically yeah. And so what we've seen is that any model plus web search is just significantly better than [00:24:18]Alessio: that model itself. Do you think that's what you got right in April? [00:24:21]Swyx: Like so you got 1500 points on Hacking News in April, which is like, if you live on Hacking News a lot, that is unheard of for someone so early on in your journey. [00:24:31]Alessio: Yeah. [00:24:32]Michael: We're super, super grateful for that. Definitely was not expecting it. So what we've done with Hacker News is we've just kept launching. [00:24:38]Alessio: Yeah. [00:24:39]Michael: Like what they don't tell you is that you can just keep launching. That's what we've been doing. So we launched the very first version of Find in its current incarnation after like the previous demo connected to our own index. Like once we got into YC, we scrapped our own index because it was too cumbersome at the time. So we moved over to using Bing as kind of just the raw source data. We launched as Hello Cognition. Over time, every time we like added some intelligence to the product, a better model, we just keep launching. And every additional time we launched, we got way more traffic. So we actually silently rebranded to Find in late December of last year. But like we didn't have that much traffic. Nobody really knew who we were. [00:25:18]Swyx: How'd you pick the name out of it? [00:25:19]Michael: Paul Graham actually picked it for us. [00:25:21]Swyx: All right. [00:25:22]Alessio: Tell the story. Yeah. So, oh boy. [00:25:25]Michael: So this is the biggest side. Should we go for like the full Paul Graham story or just the name? [00:25:29]Swyx: Do you want to do it now? Or do you want to do it later? I'll give you a choice. [00:25:32]Alessio: Hmm. [00:25:33]Michael: I think, okay, let's just start with the name for now and then we can do the full Paul Graham story later. But basically, Paul Graham, when we were lucky enough to meet him, he saw our name and our domain was at the time, sayhello.so and he's just like, guys, like, come on, like, what is this? You know? And we were like, yeah, but like when we bought it, you know, we just kind of broke college students. Like we didn't have that much money. And like, we really liked hello as a name because it was the first like conversational search engine. And that's kind of, that's the angle that we were approaching it from. And so we had sayhello.so and he's like, there's so many problems with that. Like, like, like the say hello, like, what does that even mean? And like .so, like, it's gotta be like a .com. And so we did some time just like with Paul Graham in the room. We just like looked at different domain names, like different things that like popped into our head. And one of the things that popped into like Paul Graham said was fine with the Phind spelling in particular. [00:26:33]Swyx: Yeah. Which is not typical naming advice, right? Yes. Because it's not when people hear it, they don't spell it that way. [00:26:38]Michael: Exactly. It's hard to spell. And also it's like very 90s. And so at first, like, we didn't like, I was like, like, ah, like, I don't know. But over time it kept growing on us. And eventually we're like, okay, we like the name. It's owned by this elderly Canadian gentleman who we got to know, and he was willing to sell it to us. [00:26:57]Michael: And so we bought it and we changed the name. Yeah. [00:27:01]Swyx: Anyways, where were you? [00:27:02]Alessio: I had to ask. [00:27:03]Swyx: I mean, you know, everyone who looks at you is wondering. [00:27:06]Michael: And a lot of people actually pronounce it Phind, which, you know, by now it's part of the game. But eventually we want to buy Phind.com and then just have that redirect to Phind. So Phind is like definitely the right spelling. But like, we'll just, yeah, we'll have all the cases addressed. [00:27:23]Swyx: Cool. So Bing web search, and then August you launched V2. Is V2 the Phind as a system pitch? Or have you moved, evolved since then? [00:27:31]Michael: Yeah, so I don't, like the V2 moniker, like, I don't really think of it that way in my mind. There's like, there's the version we launched during, last summer during YC, which was the Bing version directed towards programmers. And that's kind of like, that's why I call it like the first incarnation of what we currently are. Because it was already directed towards programmers. We had like a code snippet search built in as well, because at the time, you know, the models we were using weren't good enough to generate code snippets. Even GPT, like the text DaVinci 2 was available at the time, wasn't that good at generating code and it would generate like very, very short, very incomplete code snippets. And so we launched that last summer, got some traction, but really like we were only doing like, I don't know, maybe like 10,000 searches a day. [00:28:15]Alessio: Some people knew about it. [00:28:16]Michael: Some people used it, which is impressive because looking back, the product like was not that good. And every time we've like made an improvement to the way that we retrieve context through better embeddings, more intelligent, like HTML parsers, and importantly, like better underlying models. Every major version after that was when we introduced a better underlying answering model. Like in February, we had to swallow a bit of our pride when we were like, okay, our own models aren't good enough. We have to go to open AI. And actually that did lead to kind of like our first decent bump of traffic in February. And people kept using it, like our attention was way better too. But we were still kind of running into problems of like more advanced reasoning. Some people tried it, but people were leaving because even like GPT 3.5, both turbo and non-turbo, like still not that great at doing like code related reasoning beyond the how do you do X, like documentation search type of use case. And so it was really only when GPT 4 came around in April that we were like, okay, like this is like our first real opportunity to really make this thing like the way that it should have been all along. And having GPT 4 as the brain is what led to that Hacker News post. And so what we did was we just let anyone use GPT 4 on Fyne for free without a login, [00:29:43]Alessio: which I actually don't regret. [00:29:45]Michael: So it was very expensive, obviously. But like at that stage, all we needed to do was show like, we just needed to like show people here's what Fyne can do. That was the main thing. And so that worked. That worked. [00:29:58]Alessio: Like we got a lot of users. [00:29:59]Michael: Do you know Fireship? [00:30:01]Swyx: Yeah. YouTube, Jeff Delaney. [00:30:03]Michael: Yeah. He made a short about Fyne. [00:30:06]Alessio: Oh. [00:30:07]Michael: And that's on top of the Hacker News post. And that's what like really, really made it blow up. It got millions of views in days. And he's just funny. Like what I love about Fireship is like he like you guys, yeah, like humor goes a long a long way towards like really grabbing people's attention. And so that blew up. [00:30:25]Swyx: Something I would be anxious about as a founder during that period, so obviously we all remember that pretty closely. So there were a couple of people who had access to the GPT-4 API doing this, which is unrestricted access to GPT-4. And I have to imagine OpenAI wasn't that happy about that because it was like kind of de facto access to GPT-4 before they released it. [00:30:46]Alessio: No, no. [00:30:47]Michael: GPT-4 was in chat GPT from day one. I think. OpenAI actually came to our support because what happened was we had people building unofficial APIs around to try to get free access to it. And I think OpenAI actually has the right perspective on this where they're like, OK, people can do whatever they want with the API if they're paying for it, like they can do whatever they want, but it's like not OK if, you know, paying customers are being exploite by these other actors. They actually got in touch with us and they helped us like set up better Cloudflare bot monitoring controls to effectively like crack down on those unofficial APIs, which we're very happy about. But yeah, so we launched GPT-4. A lot of people come to the product and yeah, for a long time, we're just we're figuring out like what do we make of this, right? How do we a make it better, but also deal with like our costs, which have just like massively, massively ballooned. Over time, it's become more clear with the release of Llama 2 and Llama 3 on the horizon that we will once again see a return to vertical applications running their own models. As was true last year and before, I think that GPT-4, my hypothesis is that the jump from 4 to 4.5 or 4 to 5 will be smaller than the jump from 3 to 4. And the reason why is because there were a lot of different things. Like there was two plus, effectively two, two and a half years of research that went into going from 3 to 4. Like more data, bigger model, all of the instruction tuning techniques, RLHF, all of that is known. And like Meta, for example, and now there's all these other startups like Mistral too, like there's a bunch of very well-funded open source players that are now working on just like taking the recipe that's now known and scaling it up. So I think that even if a delta exists, the delta between in 2024, the delta between proprietary and open source won't be large enough that a startup like us with a lot of data that we've collected can take the data that we have, fine tune an open source model, and like be able to have it be better than whatever the proprietary model is at the time. That's my hypothesis.Michael: But we'll once again see a return to these verticalized models. And that's something that we're super excited about because, yeah, that brings us to kind of the fine model because the plan from kind of the start was to be able to return to that if that makes sense. And I think now we're definitely at a point where it does make sense because we have requests from users who like, they want longer context in the model, basically, like they want to be able to ask questions about their entire code base without, you know, context and retrieval and taking a chance of that. Like, I think it's generally been shown that if you have the space to just put the raw files inside of a big context window, that is still better than chunking and retrieval. So there's various things that we could do with longer context, faster speed, lower cost. Super excited about that. And that's the direction that we're going with the fine model. And our big hypothesis there is precisely that we can take a really good open source model and then just train it on absolutely all of the high quality data that we can find. And there's a lot of various, you know, interesting ideas for this. We have our own techniques that we're kind of playing with internally. One of the very interesting ideas that I've seen, I think it's called Octopack from BigCode. I don't think that it made that big waves when it came out, I think in August. But the idea is that they have this data set that maps GitHub commits to a change. So basically there's all this really high quality, like human made, human written diff data out there on every time someone makes a commit in some repo. And you can use that to train models. Take the file state before and like given a commit message, what should that code look like in the future? [00:34:52]Swyx: Got it. [00:34:53]Alessio: Do you think your HumanEval is any good?Michael: So we ran this experiment. We trained the Phind model. And if you go to the BigCode leaderboard, as of today, October 5th, all of our models are at the top of the BigCode leaderboard by far. It's not close, particularly in languages other than Python. We have a 10 point gap between us and the next best model on JavaScript. I think C sharp, multilingual. And what we kind of learned from that whole experience releasing those models is that human eval doesn't really matter. Not just that, but GPT-4 itself has been trained on human eval. And we know this because GPT-4 is able to predict the exact docstring in many of the problems. I've seen it predict like the specific example values in the docstring, which is extremely improbable. So I think there's a lot of dataset contamination and it only captures a very limited subset of what programmers are actually doing. What we do internally for evaluations are we have GPT-4 score answers. GPT-4 is a really good evaluator. I mean, obviously it's by really good, I mean, it's the best that we have. I'm sure that, you know, a couple of months from now, next year, we'll be like, oh, you know, like GPT-4.5, GPT-5, it's so much better. Like GPT-4 is terrible, but like right now it's the best that we have short of humans. And what we found is that when doing like temperature zero evals, it's actually mostly deterministic GPT-4 across runs in assigning scores to two different answers. So we found it to be a very useful tool in comparing our model to say, GPT-4, but yeah, on our like internal real world, here's what people will be asking this model dataset. And the other thing that we're running is just like releasing the model to our users and just seeing what they think. Because that's like the only thing that really matters is like releasing it for the application that it's intended for, and then seeing how people react. And for the most part, the incredible thing is, is that people don't notice a difference between our model and GPT-4 for the vast majority of searches. There's some reasoning problems that GPT-4 can still do better. We're working on addressing that. But in terms of like the types of questions that people are asking on find, there's not that much difference. And in fact, I've been running my own kind of side by side comparisons, shout out to GodMode, by the way. [00:37:16]Michael: And I've like myself, I've kind of confirmed this to be the case. And even sometimes it gives a better answer, perhaps like more concise or just like better implementation than GPT-4, which that's what surprises me. And by now we kind of have like this reasoning is all you need kind of hypothesis where we've seen emerging capabilities in the find model, whereby training it on high quality code, it can actually like reason better. It went from not being able to solve world problems, where riddles were like with like temporal placement of objects and moving and stuff like that, that GPT-4 can do pretty well. We went from not being able to do those at all to being able to do them just by training on more code, which is wild. So we're already like starting to see like these emerging capabilities. [00:37:59]Swyx: So I just wanted to make sure that we have the, I guess, like the model card in our heads. So you started from Code Llama? [00:38:07]Alessio: Yes. [00:38:08]Swyx: 65, 34? 34. [00:38:10]Michael: So unfortunately, there's no Code Llama 70b. If there was, that would be super cool. But there's not. [00:38:15]Swyx: 34. And then, which in itself was Llama 2, which is on 2 trillion tokens and the added 500 billion code tokens. Yes. [00:38:22]Michael: And you just added a bunch more. [00:38:23]Alessio: Yeah. [00:38:24]Michael: And they also did a couple of things. So they did, I think they did 500 billion, like general pre-training and then they did an extra 20 billion long context pre-training. So they actually increased the like max position tokens to 16k up from 8k. And then they changed the theta parameter for the ROPE embeddings as well to give it theoretically better long context support up to 100k tokens. But yeah, but otherwise it's like basically Llama 2. [00:38:50]Swyx: And so you just took that and just added data. [00:38:52]Michael: Exactly. [00:38:53]Swyx: You didn't do any other fundamental. [00:38:54]Michael: Yeah. So we didn't actually, we haven't yet done anything with the model architecture and we just trained it on like many, many more billions of tokens on our own infrastructure. And something else that we're taking a look at now is using reinforcement learning for correctness. One of the interesting pitfalls that we've noticed with the Phind model is that in cases where it gets stuff wrong, it sometimes is capable of getting the right answer. It's just, there's a big variance problem. It's wildly inconsistent. There are cases when it is able to get the right chain of thought and able to arrive [00:39:25]Alessio: at the right answer, but not always. [00:39:27]Michael: And so like one of our hypotheses is something that we're going to try is that like we can actually do reinforcement learning on, for a given problem, generate a bunch of completions and then like use the correct answer as like a loss basically to try to get it to be more correct. And I think there's a high chance I think of this working because it's very similar to the like RLHF method where you basically show pairs of completions for a given question except the criteria is like which one is like less harmful. But here we have a different criteria. But if the model is already capable of getting the right answer, which it is, we're just, we just need to cajole it into being more consistent. [00:40:06]Alessio: There were a couple of things that I noticed in the product that were not strange but unique. So first of all, the model can talk multiple times in a row, like most other applications is like human model, human model. And then you had outside of the thumbs up, thumbs down, you have things like have DLLM prioritize this message and its answers or then continue from this message to like go back. How does that change the flow of the user and like in terms of like prompting it, yeah, what are like some tricks or learnings you've had? [00:40:37]Michael: So yeah, that's specifically in our pair programmer mode, which is a more conversational mode that also like asks you clarifying questions back if it doesn't fully understand what you're doing and it kind of it holds your hand a bit more. And so from user feedback, we had requests to make more of an auto GPT where you can kind of give it this problem that might take multiple searches or multiple different steps like multiple reasoning steps to solve. And so that's the impetus behind building that product. Being able to do multiple steps and also be able to handle really long conversations. Like people are really trying to use the pair programmer to go from like sometimes really from like basic idea to like complete working code. And so we noticed was is that we were having like these very, very long threads, sometimes with like 60 messages, like 100 messages. And like those become really, really challenging to manage the appropriate context window of what should go inside of the context and how to preserve the context so that the model can continue or the product can continue giving good responses, even if you're like 60 messages deep in a conversation. So that's where the prioritized user messages like comes from. It's like people have asked us to just like let them pin messages that they want to be left in the conversation. And yeah, and then that seems to have like really gone a long way towards solving that problem, yeah. [00:41:54]Alessio: And then you have a run on Replit thing. Are you planning to build your own repl? Like learning some people trying to run the wrong code, unsafe code? [00:42:03]Michael: Yes. Yes. So I think like in the long term vision of like being a place where people can go from like idea to like fully working code, having a code sandbox, like a natively integrated code sandbox makes a lot of sense. And replit is great and people use that feature. But yeah, I think there's more we can do in terms of like having something a bit closer to code interpreter where it's able to run the code and then like recursively iterate on it. Exactly. [00:42:31]Swyx: So you're working on APIs to enable you to do that? Yep. So Amjad has specifically told me in person that he wants to enable that for people at the same time. He's also working on his own models, and Ghostwriter and you know, all the other stuff. So it's going to get interesting. Like he wants to power you, but also compete with you. Yeah. [00:42:47]Michael: And like, and we love replit. I think that a lot of the companies in our space, like we're all going to converge to solving a very similar problem, but from a different angle. So like replit approaches this problem from the IDE side. Like they started as like this IDE that you can run in the browser. And they started from that side, making coding just like more accessible. And we're approaching it from the side of like an LLM that's just like connected to everything that it needs to be connected to, which includes your code context. So that's why we're kind of making inroads into IDEs, but we're kind of, we're approaching this problem from different sides. And I think it'll be interesting to see where things end up. But I think that in the long, long term, we have an opportunity to also just have like this general technical reasoning engine product that's potentially also not just for, not just for programmers. It's also powered in this web interface, like where there's potential, I think other things that we will build that eventually might go beyond like our current scope. [00:43:49]Swyx: Exciting. We'll look forward to that. We're going to zoom out a little bit into sort of AI ecosystem stories, but first we got to get the Paul Graham, Ron Conway story. [00:43:59]Alessio: Yeah. [00:44:00]Michael: So flashback to last summer, we're in the YC batch. We're doing the summer batch, summer 22. So the summer batch runs from June to September, approximately. And so this was late July, early August, right around the time that many like YC startups start like going out, like during up, here's how we're going to pitch investors and everything. And at the same time, me and my co-founder, Justin, we were planning on moving to New York. So for a long time, actually, we were thinking about building this company in New York, mainly for personal reasons, actually, because like during the pandemic, pre-ChatGPT, pre last year, pre the AI boom, SF unfortunately really kind of, you know, like lost its luster. Yeah. Like no one was here. It was far from clear, like if there would be an AI boom, if like SF would be like... [00:44:49]Alessio: Back. [00:44:50]Michael: Yeah, exactly. Back. As everyone is saying these days, it was far from clear. And so, and all of our friends, we were graduating college because like we happened to just graduate college and immediately start YC, like we didn't even have, I think we had a week in between. [00:45:06]Swyx: You didn't bother looking for jobs. You were just like, this is what we want to do. [00:45:08]Michael: Well, actually both me and my co-founder, we had jobs that we secured in 2021 from previous internships, but we both, funny enough, when I spoke to my boss's boss at the company at where I reneged my offer, I told him we got into YC, they actually said, yeah, you should do YC. [00:45:27]Swyx: Wow. [00:45:28]Alessio: That's very selfless. [00:45:29]Swyx: That was really great that they did that. But in San Francisco, they would have offered to invest as well. [00:45:33]Michael: Yes, they would have. But yeah, but we were both planning to be in New York and all of our friends were there from college at this point, like we have this whole plan where like on August 1st, we're going to move to New York and we had like this Airbnb for the month of New York. We're going to stay there and we're going to work and like all of that. The day before we go to New York, I called Justin and I just, I tell him like, why are we doing this? Because in our batch, by the time August 1st rolled around, all of our mentors at YC were saying like, hey, like you should really consider staying in SF. [00:46:03]Swyx: It's the hybrid batch, right? [00:46:04]Michael: Yeah, it was the hybrid batch, but like there were already signs that like something was kind of like afoot in SF, even if like we didn't fully want to admit it yet. And so we were like, I don't know, I don't know. Something kind of clicked when the rubber met the road and it was time to go to New York. We're like, why are we doing this? And like, we didn't have any good reasons for staying in New York at that point beyond like our friends are there. So we still go to New York because like we have the Airbnb, like we don't have any other kind of place to go for the next few weeks. We're in New York and New York is just unfortunately too much fun. Like all of my other friends from college who are just, you know, basically starting their jobs, starting their lives as adults. They just stepped into these jobs, they're making all this money and they're like partying and like all these things are happening. And like, yeah, it's just a very distracting place to be. And so we were just like sitting in this like small, you know, like cramped apartment, terrible posture, trying to get as much work done as we can, too many distractions. And then we get this email from YC saying that Paul Graham is in town in SF and he is doing office hours with a certain number of startups in the current batch. And whoever signs up first gets it. And I happen to be super lucky. I was about to go for a run, but I just, I saw the email notification come across the street. I immediately clicked on the link and like immediately, like half the spots were gone, but somehow the very last spot was still available. And so I picked the very, very last time slot at 7 p.m. semi-strategically, you know, so we would have like time to go over. And also because I didn't really know how we're going to get to SF yet. And so we made a plan that we're going to fly from New York to SF and back to New York in one day and do like the full round trip. And we're going to meet with PG at the YC Mountain View office. And so we go there, we do that, we meet PG, we tell him about the startup. And one thing I love about PG is that he gets like, he gets so excited. Like when he gets excited about something, like you can see his eyes like really light up. And he'll just start asking you questions. In fact, it's a little challenging sometimes to like finish kind of like the rest of like the description of your pitch because like, he'll just like asking all these questions about how it works. And I'm like, you know, what's going on? [00:48:19]Swyx: What was the most challenging question that he asked you? [00:48:21]Michael: I think that like really how it worked. Because like as soon as like we told him like, hey, like we think that the future of search is answers, not links. Like we could really see like the gears turning in his head. I think we were like the first demo of that. [00:48:35]Swyx: And you're like 10 minutes with him, right? [00:48:37]Michael: We had like 45, yeah, we had a decent chunk of time. And so we tell him how it works. Like he's very excited about it. And I just like, I just blurted out, I just like asked him to invest and he hasn't even seen the product yet. We just asked him to invest and he says, yeah. And like, we're super excited about that. [00:48:55]Swyx: You haven't started your batch. [00:48:56]Michael: No, no, no. This is about halfway through the batch or two, two, no, two thirds of the batch. [00:49:02]Swyx: And you're like not technically fundraising yet. We're about to start fundraising. Yeah. [00:49:06]Michael: So we have like this demo and like we showed him and like there was still a lot of issues with the product, but I think like it must have like still kind of like blown his mind in some way. So like we're having fun. He's having fun. We have this dinner planned with this other friend that we had in SF because we were only there for that one day. So we thought, okay, you know, after an hour we'll be done, you know, we'll grab dinner with our friend and we'll fly back to New York. But PG was like, like, I'm having so much fun. Do you want to have dinner? Yeah. Come to my house. Or he's like, I gotta go have dinner with my wife, Jessica, who's also awesome, by the way. [00:49:40]Swyx: She's like the heart of YC. Yeah. [00:49:42]Michael: Jessica does not get enough credit as an aside for her role. [00:49:46]Swyx: He tries. [00:49:47]Michael: He understands like the technical side and she understands people and together they're just like a phenomenal team. But he's like, yeah, I got to go see Jessica, but you guys are welcome to come with. Do you want to come with? And we're like, we have this friend who's like right now outside of like literally outside the door who like we also promised to get dinner with. It's like, we'd love to, but like, I don't know if we can. He's like, oh, he's welcome to come too. So all of us just like hop in his car and we go to his house and we just like have this like we have dinner and we have this just chat about the future of search. Like I remember him telling Jessica distinctly, like our kids as kids are not going to know what like a search result is. Like they're just going to like have answers. That was really like a mind blowing, like inflection point moment for sure. [00:50:34]Swyx: Wow, that email changed your life. [00:50:35]Michael: Absolutely. [00:50:36]Swyx: And you also just spoiled the booking system for PG because now everyone's just going to go after the last slot. Oh man. [00:50:42]Michael: Yeah. But like, I don't know if he even does that anymore. [00:50:46]Swyx: He does. He does. Yeah. I've met other founders that he did it this year. [00:50:49]Michael: This year. Gotcha. But when we told him about how we did it, he was like, I am like frankly shocked that YC just did like a random like scheduling system. [00:50:55]Alessio: They didn't like do anything else. But, um. [00:50:58]Swyx: Okay. And then he introduces Duron Conway. Yes. Who is one of the most legendary angels in Silicon Valley. [00:51:04]Michael: Yes.So after PG invested, the rest of our round came together pretty quickly. [00:51:10]Swyx: I'm, by the way, I'm surprised. Like it's, it might feel like playing favorites right within the current batch to be like, yo, PG invested in this one. Right. [00:51:17]Alessio: Too bad for the others. [00:51:18]Swyx: Too bad for the others, I guess. [00:51:19]Michael: I think this is a bigger point about YC and like these accelerators in general is like YC gets like a lot of criticism from founders who feel like they didn't get value out of it. But like, in my view, YC is what you make of it. And YC tells you this. They're like, you really got to grab this opportunity, like buy the balls and make the most of it. And if you do, then it could be the best thing in the world. And if you don't, and if you're just kind of like a passive, even like an average founder in YC, you're still going to fail. And they tell you that. They're like, if you're average in your batch, you're going to fail. Like you have to just be exceptional in every way. With that in mind, perhaps that's even part of the reason why we asked PG to invest. And so yeah, after PG invested, the rest of our round came together pretty quickly, which I'm very fortunate for. And yeah, he introduced us to Ron. And after he did, I get a call from Ron. And then Ron says like, hey, like PG tells me what you're working on. I'd love to come meet you guys. And I'm like, wait, no way. And then we're just holed up in this like little house in San Mateo, which is a little small, but you know, it had a nice patio. In fact, we had like a monitor set up outside on the deck out there. And so Ron Conway comes over, we go over to the patio where like our workstation is. And Ron Conway, he's known for having like this notebook that he goes around with where he like sits down with the notebook and like takes very, very detailed notes. So he never like forgets anything. So he sits down with his notebook and he asks us like, hey guys, like, what do you need? And we're like, oh, we need GPUs. Back then, the GPU shortage wasn't even nearly as bad as it is now. But like even then, it was still challenging to get like the quota that we needed. And he's like, okay, no problem. And then like he leaves a couple hours later, we get an email and we're CC'd on an email that Ron wrote to Jensen, the CEO of Nvidia, saying like, hey, these guys need GPUs. [00:53:02]Swyx: You didn't say how much? It was just like, just give them GPUs. [00:53:04]Alessio: Basically, yeah. [00:53:05]Michael: Ron is known for writing these like one-liner emails that are like very short, but very to the point. And I think that's why like everyone responds to Ron. Everyone loves Ron. And so Jensen responds. He responds quickly, like tagging this VP of AI at Nvidia. And we start working with Nvidia, which is great. And something that I love about Nvidia, by the way, is that after that intro, we got matched with like a dedicated team. And at Nvidia, they know that they're going to win regardless. So they don't care where you get the GPUs from. They're like, they're truly neutral, unlike various sales reps that you might encounter at various like clouds and, you know, hardware companies, et cetera. They actually just want to help you because they know they don't care. Like regardless, they know that if you're getting Nvidia GPUs, they're still winning. So I guess that's a tip is that like if you're looking for GPUs like Nvidia, they'll help you do it. [00:53:54]Swyx: So just to tie up this thing, because so first of all, that's a fantastic story. And I just wanted to let you tell that because it's special. That is a strategic shift, right? That you already decided to make by the time you met Ron, which is we are going to have our own hardware. We're going to rack him in a data center somewhere. [00:54:11]Michael: Well, not even that we need our own hardware because actually we don't. Right. But we just we just need GPUs, period. And like every cloud loves like they have their own sales tactics and like they want to make you commit to long terms and like very non-flexible terms. And like there's a web of different things that you kind of have to navigate. Nvidia will kind of be to the point like, OK, you can do this on this cloud, this on this cloud. Like this is your budget. Maybe you want to consider buying as well. Like they'll help you walk through what the options are. And the reason why they're helpful is because like they look at the full picture. So they'll help you with the hardware. And in terms of software, they actually implemented a custom feature for us in Faster Transformer, which is one of their libraries.Swyx: For you? [00:54:53]Michael: For us. Yeah. Which is wild. I don't think they would have done it otherwise. They implemented streaming generation for T5 based models, which we were running at the time up until we switched to GPT in February, March of this year. So they implemented that just for us, actually, in Faster Transformer. And so like they'll help you like look at the complete picture and then just help you get done what you need to get done. I know one of your interests is also local models, open source models and hardware kind of goes hand in hand.Alessio: Any fun projects, explorations in the space that you want to share with local llamas and stuff? [00:55:27]Michael: Yeah, it's something that we're very interested in because something that kind of we're hearing a lot about is like people want something like find, especially companies, but they want to have it like within like their own sandbox. They want to have it like on hardware that they control. And so I'm super, super interested in how we can get big models to run efficiently on local hardware. And so like Ollama is great. Llama CPP is great. Very interested in like where the quantization thing is going. Because like obviously there are all these like great quantization libraries now that go to 4-bit, 8-bit, but specifically int8 and int4. [00:56:04]Alessio: Which is the lowest it can go, right? [00:56:05]Swyx: Yeah. [00:56:06]Michael: So we have these great quantization libraries that for the most part are able to get the size down with not that much quality loss. But there is some like the quantized models currently are actually worse than the non-quantized ones. And so I'm very curious if the future is something like what NVIDIA is doing with their implementation of FP8, which they're implementing in their transformer engine library. Where basically once FP8 support is kind of more widespread and hardware can support it efficiently, you can kind of switch between the two different FP8 formats. One with greater precision, one with greater range. And then combine that with only not doing FP8 on every layer and doing like a mixed precision with like FP32 on some layers. And like NVIDIA claims that this strategy that they're kind of demoing with the H100 has no degradation. And so it remains to be seen whether that is really true in practice. But that's something that we're excited about and whether that can be applied to Macs and other hardware once they get FP8 support as well. [00:57:05]Alessio: Cool. [00:57:06]Swyx: One thing I wanted to do before we go into lightning round. Oh, we should also talk about hiring. How do you get your info? You seem self-taught. Yeah. [00:57:12]Michael: I've always just, well, I'm fortunate to have like a decent systems background from UT Austin. And somewhat of a research background, even though like I didn't publish any papers, but like I went through all the motions. Like I didn't publish the thesis that I wrote, mainly out of time because I was doing both of that and the startup at the same time. And then I graduated and then it was YC and then everything was kind of one after another. But like I'm very fortunate to kind of have like the systems and like a bit of like a research background. But for the most part, outside of that foundation, like I've always just, whenever I've been interested in something, I just like. [00:57:43]Swyx: Like give people tips, right? Like where do you, what fire hose do you drink from? Yeah, exactly. [00:57:48]Michael: So like whenever I see something that blows my mind, the way that that initial hugging face demo did, that was like the start of everything. I'll start from the beginning. If I don't know anything, I'll start by just trying to get a mental model of what is happening. Like first I need to understand what, so I can understand like the why, the how and the why. And once I can understand that, then I can make my own hypotheses about like, okay, here are the assumptions that the authors of this made. I mean, here's why maybe they're correct. Maybe they're wrong. And here's how like I can improve on it and iterate on it. And I guess that's the mindset that I approach it from is like, once I understand something, like how can it be better? How can it be faster? How can it be like more accurate? And so I guess for anyone starting now, like I would have used find if I was starting now. Cause like I would have loved to just have been able to say like, Hey, like I have no idea what I'm doing. Can you just like be this like technical research assistant and kind of hold my hand and like ask me clarifying questions and like help me like formalize my assumptions like along the way. I would have loved that. But yeah, I just kind of did that myself. [00:58:50]Swyx: Recording Looms of yourself using Phind actually would be pretty interesting. Yeah. Because I think you, you would use find differently than people would by themselves. [00:58:57]Michael: I think so. Yeah. I generally use Phind for everything, which is definitely, yeah, it's like, no, no, even like non-technical questions as well. Cause that's just something I'm curious about, but that's less of a usage pattern nowadays. Like most people generally for the most part do technical questions on find. And that is completely understandable because of very deliberate decisions that we've made in how we've optimized the product. Like we've optimized the product very much in a quality first manner as opposed to a like speed first or like some balance of the two matters. So we're like, we have to run GPT-4 or some GPT-4 equivalent by default. And like, and it has to give like a good answer to like a very demanding technical audience where people will leave. So that's just the trade off. So like sometimes it's, it's slower for like simple questions, but like we did that on purpose. [00:59:46]Alessio: So before we do a lightning round, call for hiring any roles you're looking for. What should people know about what can I find? Yeah. [00:59:55]Michael: So we really straddled the line between product and research I find. For the past little while, a lot of the work that we've done has been solely product. But we also do, especially now with the find model, a very particular kind of applied research in trying to apply the very latest techniques and techniques that might not, that have not even been proven yet to training the very, very best model for our vertical. And the two go hand in hand because the product, the UI, the UX is kind of model agnostic. But when it has a better kernel, as Andrej Karpathy put it, plugged into it, it gets so much better. So we're doing really kind of both at the same time. And so someone who enjoys seeing both of those sides, like doing something very tangible that affects the user, high quality, reliable code that runs in production, but also having that chance to experiment with building these models. Yeah, we'd love to talk to you. [01:00:50]Swyx: And the title is Applied AI Engineer. [01:00:52]Michael: I don't know what the title is. Like that is one title, but I don't know if this really exists because I feel like we're too rigid about like bucketing people into categories. [01:01:02]Swyx: Yeah, Founding Engineer is fine. [01:01:03]Michael: Yeah, well, we already have a Founding Engineer technically. [01:01:06]Swyx: Well, for what it's worth, OpenAI is adopting Applied AI Engineer. Really? So it's becoming a thing. We'll see. [01:01:12]Alessio: We'll see. Lightning round. Yeah, we have three questions, acceleration, exploration, and then a takeaway. So the acceleration one is what's something that already happened in AI that you thought would take much longer? [01:01:24]Michael: Yeah, the jump from these like models being glorified summarization models to actual powerful reasoning engines happened much faster than we thought because like our product itself transitioned from being kind of this glorified summarization product to now like mostly a reasoning heavy product. And we had no idea that this would happen this fast. Like we thought that there'd be a lot more time and like many more things that needed to happen before we could do some level of like intelligent reasoning on a low level about people's code. But it's already happened and it happened much faster than we could have thought. But I think that leads into your next point. [01:02:02]Alessio: Which is exploration. [01:02:04]Swyx: What do you think is the most interesting unsolved question in AI? [01:02:07]Michael: I think solving hallucinations, being able to guarantee that the answer will be correct is I think super interesting. And it's particularly relevant to us because like we operate in a space where like everything needs to be correct. Like the code, like not just the logic, but like the implementation, everything has to be completely correct. And there's a lot of very interesting work that's going on in this space. Some of it is approaching it from the angle of formal grammars. There's a very interesting paper that came out recently. I forget where it came out of, but the paper is basically you can define a grammar that restricts and modifies the model's log probs, like decoding strategy to only conform to that grammar. And that helps it... [01:02:53]Swyx: Is this LMQL? Because I feel like LMQL is a little bit too structured for... If the goal is avoiding hallucination, that's such a vague goal. Yeah. [01:03:02]Michael: This is only something we've begun to take a look at. I haven't fully read the paper yet. Like I've only kind of skimmed the abstract, but it's something that like we're definitely interested in exploring further. But something that we are like a bit further along on is also like exploring reinforcement learning for correctness, as opposed to only harmfulness the way it has typically been used in my college. [01:03:23]Swyx: I'm interested to see your paper on that. Just a quick follow-up. Do you have internal evals for what hallucination rate is on stock GPC4 and then maybe what yours is after fine-tuning? [01:03:34]Michael: We don't measure hallucination directly in our internal benchmarks. We more measure like was the answer right or was it wrong? We measure hallucination indirectly by evaluating the context, like the RAG context fed into the model as well. So basically, if the context was bad and the answer was bad, then chances are like it's the context. But if the context was good and it just like misinterpreted that or had the wrong conclusion, then like we can take different steps there. Harrison from LangChain has been talking about this sort of two-by-two matrix with the RAG people. It's a pretty simple concept. [01:04:08]Swyx: What's the source of error? [01:04:11]Michael: Exactly. I've been talking to Harrison actually about like a more structured way perhaps within Linkchain to like do evals. Because I think that's a massive problem. Like every single eval is different for these big, large language models and doing them in a quantitative way is really hard. But it's possible with like a platform that I think harnesses GPT-4 in the right way. That and also perhaps a stricter prompting language like a prompting markup language for prompting models is something I'm also very interested in. Because we've written some very, very complex prompts particularly for a VS Code extension to like very fancy things with people's code. And like I wish there was a way that you could have like a more formal way like a Python for LLM prompting that you could activate desired things within like the model's execution flow through some other abstraction above language that has been like tested to do that some of the time. Perhaps like combined with like formal grammar limitations and stuff like that. Interesting. I have no idea what that looks like. These are all things these are all things that have kind of emerged directly from the issues we're facing ourselves internally. But yeah, definitely very abstract so far.Alessio: And yeah, just to wrap what's one message idea you want people to remember and think about? [01:05:32]Michael: I think pay attention to those moments that like really jump out at you. Like when you see like a crazy demo that you can't forget about like something that you just think is really, really cool. Because I see a lot of people trying to start startups from the angle of like, hey, I just want to start a startup or I'm just bored at my job or like I'm like generally interested in the space. And I personally disagree with that. My take is that it's much easier having been on both sides of that coin now, it's much easier to stay obsessed every single day when the genesis of your startup is something that really spoke to you in an incredibly meaningful way beyond just being some insight that you've noticed. And I guess that's what we're discovering now is that in the long, long term what you're really building is like you're building a group of people that believe this thing, that believe that the future of solving problems and making things will be just like focused more on the human thought process as opposed to the implementation part. And it's that belief that I think is what really gets you through the tough times and hopefully gets you to the other side someday. [01:06:47]Swyx: Awesome. I kind of want to play Lose Yourself as the outro music. [01:06:52]Alessio: Then we'll get DMCA strike. Thank you so much for coming on.Michael: Yeah, thank you so much for having me. This was really fun. [01:06:59] Get full access to Latent Space at www.latent.space/subscribe
01:07:2103/11/2023
Powering your Copilot for Data – with Artem Keydunov of Cube.dev
The first workshops and talks from the AI Engineer Summit are now up! Join the >20k viewers on YouTube, find clips on Twitter (we’re also clipping @latentspacepod), and chat with us on Discord!Text-to-SQL was one of the first applications of NLP. Thoughtspot offered “Ask your data questions” as their core differentiation compared to traditional dashboarding tools. In a way, they provide a much friendlier interface with your own structured (aka “tabular”, as in “SQL tables”) data, the same way that RLHF and Instruction Tuning helped turn the GPT-3 of 2020 into the ChatGPT of 2022.Today, natural language queries on your databases are a commodity. There are 4 different ChatGPT plugins that offer this, as well as a bunch of startups like one of our previous guests, Seek.ai. Perplexity originally started with a similar product in 2022: In March 2023 LangChain wrote a blog post on LLMs and SQL highlighting why they don’t consistently work:* “LLMs can write SQL, but they are often prone to making up tables, making up field”* “LLMs have some context window which limits the amount of text they can operate over”* “The SQL it writes may be incorrect for whatever reason, or it could be correct but just return an unexpected result.”For example, if you ask a model to “return all active users in the last 7 days” it might hallucinate a `is_active` column, join to an `activity` table that doesn’t exist, or potentially get the wrong date (especially in leap years!).We previously talked to Shreya Rajpal at Guardrails AI, which also supports Text2SQL enforcement. Their approach was to run the actual SQL against your database and then use the error messages to improve the query: Semantic Layers to the rescueCube is an open source semantic layer which recently integrated with LangChain to solve these issues in a different way. You can use YAML, Javascript, or Python to create definitions of different metrics, measures and dimensions for your data: Creating these metrics and passing them in the model context limits the possibility for errors as the model just needs to query the `active_users` view, and Cube will then expand that into the full SQL in a reliable way. The downside of this approach compared to the Guardrails one for example is that it requires more upfront work to define metrics, but on the other hand it leads to more reliable and predictable outputs. The promise of adding a great semantic layer to your LLM app is irresistible - you greatly minimize hallucinations, make much more token efficient prompts, and your data stays up to date without any retraining or re-indexing. However, there are also difficulties with implementing semantic layers well, so we were glad to go deep on the topic with Artem as one of the leading players in this space!Timestamps* [00:00:00] Introductions* [00:01:28] Statsbot and limitations of natural language processing in 2017* [00:04:27] Building Cube as the infrastructure for Statsbot* [00:08:01] Open sourcing Cube in 2019* [00:09:09] Explaining the concept of a semantic layer/Cube* [00:11:01] Using semantic layers to provide context for AI models working with tabular data* [00:14:47] Workflow of generating queries from natural language via semantic layer* [00:21:07] Using Cube to power customer-facing analytics and natural language interfaces* [00:22:38] Building data-driven AI applications and agents* [00:25:59] The future of the modern data stack* [00:29:43] Example use cases of Slack bots powered by Cube* [00:30:59] Using GPT models and limitations around math* [00:32:44] Tips for building data-driven AI apps* [00:35:20] Challenges around monetizing embedded analytics* [00:36:27] Lightning RoundTranscriptSwyx: Hey everyone, welcome to the Latent Space podcast. This is Swyx, writer, editor of Latent Space and founder of Smol.ai and Alessio, partner and CTO in residence at Decibel Partners. [00:00:15]Alessio: Hey everyone, and today we have Artem Keydunov on the podcast, co-founder of Cube. Hey Artem. [00:00:21]Artem: Hey Alessio, hi Swyx. Good to be here today, thank you for inviting me. [00:00:25]Alessio: Yeah, thanks for joining. For people that don't know, I've known Artem for a long time, ever since he started Cube. And Cube is actually a spin-out of his previous company, which is Statsbot. And this kind of feels like going both backward and forward in time. So the premise of Statsbot was having a Slack bot that you can ask, basically like text to SQL in Slack, and this was six, seven years ago, something like that. A lot ahead of its time, and you see startups trying to do that today. And then Cube came out of that as a part of the infrastructure that was powering Statsbot. And Cube then evolved from an embedded analytics product to the semantic layer and just an awesome open source evolution. I think you have over 16,000 stars on GitHub today, you have a very active open source community. But maybe for people at home, just give a quick like lay of the land of the original Statsbot product. You know, what got you interested in like text to SQL and what were some of the limitations that you saw then, the limitations that you're also seeing today in the new landscape? [00:01:28]Artem: I started Statsbot in 2016. The original idea was to just make sort of a side project based off my initial project that I did at a company that I was working for back then. And I was working for a company that was building software for schools, and we were using Slack a lot. And Slack was growing really fast, a lot of people were talking about Slack, you know, like Slack apps, chatbots in general. So I think it was, you know, like another wave of, you know, bots and all that. We have one more wave right now, but it always comes in waves. So we were like living through one of those waves. And I wanted to build a bot that would give me information from different places where like a data lives to Slack. So it was like developer data, like New Relic, maybe some marketing data, Google Analytics, and then some just regular data, like a production database, so it sells for sometimes. And I wanted to bring it all into Slack, because we were always chatting, you know, like in Slack, and I wanted to see some stats in Slack. So that was the idea of Statsbot, right, like bring stats to Slack. I built that as a, you know, like a first sort of a side project, and I published it on Reddit. And people started to use it even before Slack came up with that Slack application directory. So it was a little, you know, like a hackish way to install it, but people are still installing it. So it was a lot of fun. And then Slack kind of came up with that application directory, and they reached out to me and they wanted to feature Statsbot, because it was one of the already being kind of widely used bots on Slack. So they featured me on this application directory front page, and I just got a lot of, you know, like new users signing up for that. It was a lot of fun, I think, you know, like, but it was sort of a big limitation in terms of how you can process natural language, because the original idea was to let people ask questions directly in Slack, right, hey, show me my, you know, like opportunities closed last week or something like that. My co founder, who kind of started helping me with this Slack application, him and I were trying to build a system to recognize that natural language. But it was, you know, we didn't have LLMs right back then and all of that technology. So it was really hard to build the system, especially the systems that can kind of, you know, like keep talking to you, like maintain some sort of a dialogue. It was a lot of like one off requests, and like, it was a lot of hit and miss, right? If you know how to construct a query in natural language, you will get a result back. But you know, like, it was not a system that was capable of, you know, like asking follow up questions to try to understand what you actually want. And then kind of finally, you know, like, bring this all context and go to generate a SQL query, get the result back and all of that. So that was a really missing part. And I think right now, that's, you know, like, what is the difference? So right now, I kind of bullish that if I would start Statsbot again, probably would have a much better shot at it. But back then, that was a big limitation. We kind of build a queue, right, as we were working on Statsbot, because we needed it. [00:04:27]Alessio: What was the ML stack at the time? Were you building, trying to build your own natural language understanding models, like were there open source models that were good that you were trying to leverage? [00:04:38]Artem: I think it was mostly combination of a bunch of things. And we tried a lot of different approaches. The first version, which I built, like was Regex. They were working well. [00:04:47]Swyx: It's the same as I did, I did option pricing when I was in finance, and I had a natural language pricing tool thing. And it was Regex. It was just a lot of Regex. [00:04:59]Artem: Yeah. [00:05:00]Artem: And my co-founder, Pavel, he's much smarter than I am. He's like PhD in math, all of that. And he started to do some stuff. I was like, no, you just do that stuff. I don't know. I can do Regex. And he started to do some models and trying to either look at what we had on the market back then, or try to build a different sort of models. Again, we didn't have any foundation back in place, right? We wanted to try to use existing math, obviously, right? But it was not something that we can take the model and try and run it. I think in 2019, we started to see more of stuff, like ecosystem being built, and then it eventually kind of resulted in all this LLM, like what we have right now. But back then in 2016, it was not much available for just the people to build on top. It was some academic research, right, kind of been happening. But it was very, very early for something to actually be able to use. [00:05:58]Alessio: And then that became Cube, which started just as an open source project. And I think I remember going on a walk with you in San Mateo in 2020, something like that. And you had people reaching out to you who were like, hey, we use Cube in production. I just need to give you some money, even though you guys are not a company. What's the story of Cube then from Statsbot to where you are today? [00:06:21]Artem: We built a Cube at Statsbot because we needed it. It was like, the whole Statsbot stack was that we first tried to translate the initial sort of language query into some sort of multidimensional query. It's like we were trying to understand, okay, people wanted to get active opportunities, right? What does it mean? Is it a metric? Is it what a dimension here? Because usually in analytics, you always, you know, like, try to reduce everything down to the sort of, you know, like a multidimensional framework. So that was the first step. And that's where, you know, like it didn't really work well because all this limitation of us not having foundational technologies. But then from the multidimensional query, we wanted to go to SQL. And that's what was SemanticLayer and what was Cube essentially. So we built a framework where you would be able to map your data into this concept, into this metrics. Because when people were coming to Statsbot, they were bringing their own datasets, right? And the big question was, how do we tell the system what is active opportunities for that specific users? How we kind of, you know, like provide that context, how we do the training. So that's why we came up with the idea of building the SemanticLayer so people can actually define their metrics and then kind of use them as a Statsbot. So that's how we built a Cube. At some point, we saw people started to see more value in the Cube itself, you know, like kind of building the SemanticLayer and then using it to power different types of the application. So in 2019, we decided, okay, it feels like it might be a standalone product and a lot of people want to use it. Let's just try to open source it. So we took it out of Statsbot and open-sourced. [00:08:01]Swyx: Can I make sure that everyone has the same foundational knowledge? The concept of a cube is not something that you invented. I think, you know, not everyone has the same background in analytics and data that all three of us do. Maybe you want to explain like OLAP Cube, HyperCube, the brief history of cubes. Right. [00:08:17]Artem: I'll try, you know, like a lot of like Wikipedia pages and like a lot of like a blog post trying to go into academics of it. So I'm trying to like... [00:08:25]Swyx: Cube's according to you. Yeah. [00:08:27]Artem: So when we think about just a table in a database, the problem with the table, it's not a multidimensional, meaning that in many cases, if we want to slice the data, we kind of need to result with a different table, right? Like think about when you're writing a SQL query to answer one question, SQL query always ends up with a table, right? So you write one SQL, you got one. And then you write to answer a different question, you write a second query. So you're kind of getting a bunch of tables. So now let's imagine that we can kind of bring all these tables together into multidimensional table. And that's essentially Cube. So it's just like the way that we can have measures and dimension that can potentially be used at the same time from a different angles. [00:09:09]Alessio: So initially, a lot of your use cases were more BI related, but you recently released a LangChain integration. There's obviously more and more interest in, again, using these models to answer data questions. So you've seen the chat GPT code interpreter, which is renamed as like advanced data analysis. What's kind of like the future of like the semantic layer in AI? You know, what are like some of the use cases that you're seeing and why do you think it's a good strategy to make it easier to do now the text to SQL you wanted to do seven years ago? [00:09:39]Artem: Yeah. So, I mean, you know, when it started to happen, I was just like, oh my God, people are now building Statsbot with Cube. They just have a better technology for, you know, like natural language. So it kind of, it made sense to me, you know, like from the first moment I saw it. So I think it's something that, you know, like happening right now and chat bot is one of the use cases. I think, you know, like if you try to generalize it, the use case would be how do we use structured or tabular data with, you know, like AI models, right? Like how do we turn the data and give the context as a data and then bring it to the model and then model can, you know, like give you answers, make a questions, do whatever you want. But the question is like how we go from just the data in your data warehouse, database, whatever, which is usually just a tabular data, right? Like in a SQL based warehouses to some sort of, you know, like a context that system can do. And if you're building this application, you have to do it. It's like no way you can get away around not doing this. You either map it manually or you come up with some framework or something else. So our take is that and my take is that semantic layer is just really good place for this context to leave because you need to give this context to the humans. You need to give that context to the AI system anyway, right? So that's why you define metric once and then, you know, like you teach your AI system what this metric is about. [00:11:01]Alessio: What are some of the challenges of using tabular versus language data and some of the ways that having the semantic layer kind of makes that easier maybe? [00:11:09]Artem: Imagine you're a human, right? And you're going into like your new data analyst at a company and just people give you a warehouse with a bunch of tables and they tell you, okay, just try to make sense of this data. And you're going through all of these tables and you're really like trying to make sense without any, you know, like additional context or like some columns. In many cases, they might have a weird names. Sometimes, you know, if they follow some kind of like a star schema or, you know, like a Kimball style dimensions, maybe that would be easier because you would have facts and dimensions column, but it's still, it's hard to understand and kind of make sense because it doesn't have descriptions, right? And then there is like a whole like industry of like a data catalogs exist because the whole purpose of that to give context to the data so people can understand that. And I think the same applies to the AI, right? Like, and the same challenge is that if you give it pure tabular data, it doesn't have this sort of context that it can read. So you sort of needed to write a book or like essay about your data and give that book to the system so it can understand it. [00:12:12]Alessio: Can you run through the steps of how that works today? So the initial part is like the natural language query, like what are the steps that happen in between to do model, to semantic layer, semantic layer, to SQL and all that flow? [00:12:26]Artem: The first key step is to do some sort of indexing. That's what I was referring to, like write a book about your data, right? Describe in a text format what your data is about, right? Like what metrics it has, dimensions, what is the structures of that, what a relationship between those metrics, what are potential values of the dimensions. So sort of, you know, like build a really good index as a text representation and then turn it into embeddings into your, you know, like a vector storage. Once you have that, then you can provide that as a context to the model. I mean, there are like a lot of options, like either fine tune or, you know, like sort of in context learning, but somehow kind of give that as a context to the model, right? And then once this model has this context, it can create a query. Now the query I believe should be created against semantic layer because it reduces the room for the error. Because what usually happens is that your query to semantic layer would be very simple. It would be like, give me that metric group by that dimension and maybe that filter should be applied. And then your real query for the warehouse, it might have like a five joins, a lot of different techniques, like how to avoid fan out, fan traps, chasm traps, all of that stuff. And the bigger query, the more room that the model can make an error, right? Like even sometimes it could be a small error and then, you know, like your numbers is going to be off. But making a query against semantic layer, that sort of reduces the error. So the model generates a SQL query and then it executes us again, semantic layer. And semantic layer executes us against your warehouse and then sends result all the way back to the application. And then can be done multiple times because what we were missing was both this ability to have a conversation, right? With the model. You can ask question and then system can do a follow-up questions, you know, like then do a query to get some additional information based on this information, do a query again. And sort of, you know, like it can keep doing this stuff and then eventually maybe give you a big report that consists of a lot of like data points. But the whole flow is that it knows the system, it knows your data because you already kind of did the indexing and then it queries semantic layer instead of a data warehouse directly. [00:14:47]Alessio: Maybe just to make it a little clearer for people that haven't used a semantic layer before, you can add definitions like revenue, where revenue is like select from customers and like join orders and then sum of the amount of orders. But in the semantic layer, you're kind of hiding all of that away. So when you do natural language to queue, it just select revenue from last week and then it turns into a bigger query. [00:15:12]Swyx: One of the biggest difficulties around semantic layer for people who've never thought about this concept before, this all sounds super neat until you have multiple stakeholders within a single company who all have different concepts of what a revenue is. They all have different concepts of what active user is. And then they'll have like, you know, revenue revision one by the sales team, you know, and then revenue revision one, accounting team or tax team, I don't know. I feel like I always want semantic layer discussions to talk about the not so pretty parts of the semantic layer, because this is where effectively you ship your org chart in the semantic layer. [00:15:47]Artem: I think the way I think about it is that at the end of the day, semantic layer is a code base. And in Qubit, it's essentially a code base, right? It's not just a set of YAML files with pythons. I think code is never perfect, right? It's never going to be perfect. It will have a lot of, you know, like revisions of code. We have a version control, which helps it's easier with revisions. So I think we should treat our metrics and semantic layer as a code, right? And then collaboration is a big part of it. You know, like if there are like multiple teams that sort of have a different opinions, let them collaborate on the pull request, you know, they can discuss that, like why they think that should be calculated differently, have an open conversation about it, you know, like when everyone can just discuss it, like an open source community, right? Like you go on a GitHub and you talk about why that code is written the way it's written, right? It should be written differently. And then hopefully at some point you can come up, you know, like to some definition. Now if you still should have multiple versions, right? It's a code, right? You can still manage it. But I think the big part of that is that like, we really need to treat it as a code base. Then it makes a lot of things easier, not as spreadsheets, you know, like a hidden Excel files. [00:16:53]Alessio: The other thing is like then having the definition spread in the organization, like versus everybody trying to come up with their own thing. But yeah, I'm sure that when you talk to customers, there's people that have issues with the product and it's really like two people trying to define the same thing. One in sales that wants to look good, the other is like the finance team that wants to be conservative and they all have different definitions. How important is the natural language to people? Obviously you guys both work in modern data stack companies either now or before. There's going to be the whole wave of empowering data professionals. I think now a big part of the wave is removing the need for data professionals to always be in the loop and having non-technical folks do more of the work. Are you seeing that as a big push too with these models, like allowing everybody to interact with the data? [00:17:42]Artem: I think it's a multidimensional question. That's an example of, you know, like where you have a lot of inside the question. In terms of examples, I think a lot of people building different, you know, like agents or chatbots. You have a company that built an internal Slack bot that sort of answers questions, you know, like based on the data in a warehouse. And then like a lot of people kind of go in and like ask that chatbot this question. Is it like a real big use case? Maybe. Is it still like a toy pet project? Maybe too right now. I think it's really hard to tell them apart at this point because there is a lot of like a hype, you know, and just people building LLM stuff because it's cool and everyone wants to build something, you know, like even at least a pet project. So that's what happened in Krizawa community as well. We see a lot of like people building a lot of cool stuff and it probably will take some time for that stuff to mature and kind of to see like what are real, the best use cases. But I think what I saw so far, one use case was building this chatbot and we have even one company that are building it as a service. So they essentially connect into Q semantic layer and then offering their like chatbot So you can do it in a web, in a slack, so it can, you know, like answer questions based on data in your semantic layer, but also see a lot of things like they're just being built in house. And there are other use cases, sort of automation, you know, like that agent checks on the data and then kind of perform some actions based, you know, like on changes in data. But other dimension of your question is like, will it replace people or not? I think, you know, like what I see so far in data specifically, you know, like a few use cases of LLM, I don't see Q being part of that use case, but it's more like a copilot for data analyst, a copilot for data engineer, where you develop something, you develop a model and it can help you to write a SQL or something like that. So you know, it can create a boilerplate SQL, and then you can edit this SQL, which is fine because you know how to edit SQL, right? So you're not going to make a mistake, but it will help you to just generate, you know, like a bunch of SQL that you write again and again, right? Like boilerplate code. So sort of a copilot use case. I think that's great. And we'll see more of it. I think every platform that is building for data engineers will have some sort of a copilot capabilities and Cubectl, we're building this copilot capabilities to help people build semantic layers easier. I think that just a baseline for every engineering product right now to have some sort of, you know, like a copilot capabilities. Then the other use case is a little bit more where Cube is being involved is like, how do we enable access to data for non-technical people through the natural language as an interface to data, right? Like visual dashboards, charts, it's always has been an interface to data in every BI. Now I think we will see just a second interface as a just kind of a natural language. So I think at this point, many BI's will add it as a commodity feature is like Tableau will probably have a search bar at some point saying like, Hey, ask me a question. I know that some of the, you know, like AWS Squeak site, they're about to announce features like this in their like BI. And I think Power BI will do that, especially with their deal with open AI. So every company, every BI will have this some sort of a search capabilities built in inside their BI. So I think that's just going to be a baseline feature for them as well. But that's where Cube can help because we can provide that context, right? [00:21:07]Alessio: Do you know how, or do you have an idea for how these products will differentiate once you get the same interface? So right now there's like, you know, Tableau is like the super complicated and it's like super sad. It's like easier. Yeah. Do you just see everything will look the same and then how do people differentiate? [00:21:24]Artem: It's like they all have line chart, right? And they all have bar chart. I feel like it pretty much the same and it's going to be fragmented as well. And every major vendor and most of the vendors will try to have some sort of natural language capabilities and they might be a little bit different. Some of them will try to position the whole product around it. Some of them will just have them as a checkbox, right? So we'll see, but I don't think it's going to be something that will change the BI market, you know, like something that will can take the BI market and make it more consolidated rather than, you know, like what we have right now. I think it's still will remain fragmented. [00:22:04]Alessio: Let's talk a bit more about application use cases. So people also use Q for kind of like analytics in their product, like dashboards and things like that. How do you see that changing and more, especially like when it comes to like agents, you know, so there's like a lot of people trying to build agents for reporting, building agents for sales. If you're building a sales agent, you need to know everything about the purchasing history of the customer. All of these things. Yeah. Any thoughts there? What should all the AI engineers listening think about when implementing data into agents? [00:22:38]Artem: Yeah, I think kind of, you know, like trying to solve for two problems. One is how to make sure that agents or LLM model, right, has enough context about, you know, like a tabular data and also, you know, like how do we deliver updates to the context, which is also important because data is changing, right? So every time we change something upstream, we need to surely update that context in our vector database or something. And how do you make sure that the queries are correct? You know, I think it's obviously a big pain and that's all, you know, like AI kind of, you know, like a space right now, how do we make sure that we don't, you know, provide our own cancers, but I think, you know, like be able to reduce the room for error as much as possible that what I would look for, you know, like to try to like minimize potential damage. And then our use case for Qube, it's been using a lot to power sort of customer facing analytics. So I don't think much is going to change is that I feel like again, more and more products will adopt natural language interfaces as sort of a part of that product as well. So we would be able to power this business to not only, you know, like a chart, visuals, but also some sort of, you know, like a summaries, probably in the future, you're going to open the page with some surface stats and you will have a smart summary kind of generated by AI. And that summary can be powered by Qube, right, like, because the rest is already being powered by Qube. [00:24:04]Alessio: You know, we had Linus from Notion on the pod and one of the ideas he had that I really like is kind of like thumbnails of text, kind of like how do you like compress knowledge and then start to expand it. A lot of that comes into dashboards, you know, where like you have a lot of data, you have like a lot of charts and sometimes you just want to know, hey, this is like the three lines summary of it. [00:24:25]Artem: Exactly. [00:24:26]Alessio: Makes sense that you want to power that. How are you thinking about, yeah, the evolution of like the modern data stack in quotes, whatever that means today. What's like the future of what people are going to do? What's the future of like what models and agents are going to do for them? Do you have any, any thoughts? [00:24:42]Artem: I feel like modern data stack sometimes is not very, I mean, it's obviously big crossover between AI, you know, like ecosystem, AI infrastructure, ecosystem, and then sort of a data. But I don't think it's a full overlap. So I feel like when we know, like I'm looking at a lot of like what's happening in a modern data stack where like we use warehouses, we use BI's, you know, different like transformation tools, catalogs, like data quality tools, ETLs, all of that. I don't see a lot of being compacted by AI specifically. I think, you know, that space is being compacted as much as any other space in terms of, yes, we'll have all this copilot capabilities, some of AI capabilities here and there, but I don't see anything sort of dramatically, you know, being sort of, you know, a change or shifted because of, you know, like AI wave. In terms of just in general data space, I think in the last two, three years, we saw an explosion, right? Like we got like a lot of tools, every vendor for every problem. I feel like right now we should go through the cycle of consolidation. If Fivetran and DBT merge, they can be Alteryx of a new generation or something like that. And you know, probably some ETL tool there. I feel it might happen. I mean, it's just natural waves, you know, like in cycles. [00:25:59]Alessio: I wonder if everybody is going to have their own copilot. The other thing I think about these models is like Swyx was at Airbyte and yeah, there's Fivetran. [00:26:08]Swyx: Fivetran versus AirByte, I don't think it'll mix very well. [00:26:10]Alessio: A lot of times these companies are doing the syntax work for you of like building the integration between your data store and like the app or another data store. I feel like now these models are pretty good at coming up with the integration themselves and like using the docs to then connect the two. So I'm really curious, like in the future, what that will look like. And same with data transformation. I mean, you think about DBT and some of these tools and right now you have to create rules to normalize and transform data. In the future, I could see you explaining the model, how you want the data to be, and then the model figuring out how to do the transformation. I think it all needs a semantic layer as far as like figuring out what to do with it. You know, what's the data for and where it goes. [00:26:53]Artem: Yeah, I think many of this, you know, like workflows will be augmented by, you know, like some sort of a copilot. You know, you can describe what transformation you want to see and it can generate a boilerplate right, of transformation for you, or even, you know, like kind of generate a boilerplate of specific ETL driver or ETL integration. I think we're still not at the point where this code can be fully automated. So we still need a human and a loop, right, like who can be, who can use this copilot. But in general, I think, yeah, data work and software engineering work can be augmented quite significantly with all that stuff. [00:27:31]Alessio: You know, the big thing with machine learning before was like, well, all of your data is bad. You know, the data is not good for anything. And I think like now, at least with these models, they have some knowledge of their own and they can also tell you if your data is bad, which I think is like something that before you didn't have. Any cool apps that you've seen being built on Qube, like any kind of like AI native things that people should think about, new experiences, anything like that? [00:27:54]Artem: Well, I see a lot of Slack bots. They all remind me of Statsbot, but I know like I played with a few of them. They're much, much better than Statsbot. It feels like it's on the surface, right? It's just that use case that you really want, you know, think about you, a data engineer in your company, like everyone is like, and you're asking, hey, can you pull that data for me? And you would be like, can I build a bot to replace myself? You know, like, so they can both ping that bot instead. So it's like, that's why a lot of people doing that. So I think it's a first use case that actually people are playing with. But I think inside that use case, people get creative. So I see bots that can actually have a dialogue with you. So, you know, like you would come to that bot and say, hey, show me metrics. And the bot would be like, what kind of metrics? What do you want to look at? You will be like active users. And then it would be like, how do you define active users? You want to see active users sort of cohort, you want to see active users kind of changing behavior over time, like a lot of like a follow up questions. So it tries to sort of, you know, like understand what exactly you want. And that's how many data analysts work, right? When people started to ask you something, you always try to understand what exactly do you mean? Because many people don't know how to ask correct questions about your data. It's a sort of an interesting specter. On one side of the specter, you know, nothing is like, hey, show me metrics. And the other side of specter, you know how to write SQL, and you can write exact query to your data warehouse, right? So many people like a little bit in the middle. And the data analysts, they usually have the knowledge about your data. And that's why they can ask follow up questions and to understand what exactly you want. And I saw people building bots who can do that. That part is amazing. I mean, like generating SQL, all that stuff, it's okay, it's good. But when the bot can actually act like they know that your data and they can ask follow up questions. I think that's great. [00:29:43]Swyx: Yeah. [00:29:44]Alessio: Are there any issues with the models and the way they understand numbers? One of the big complaints people have is like GPT, at least 3.5, cannot do math. Have you seen any limitations and improvement? And also when it comes to what model to use, do you see most people use like GPT-4? Because it's like the best at this kind of analysis. [00:30:03]Artem: I think I saw people use all kinds of models. To be honest, it's usually GPT. So inside GPT, it could be 3.5 or 4, right? But it's not like I see a lot of something else, to be honest, like, I mean, maybe some open source alternatives, but it feels like the market is being dominated by just chat GPT. In terms of the problems, I think chatting about it with a few people. So if math is required to do math, you know, like outside of, you know, like chat GPT itself, so it would be like some additional Python scripts or something. When we're talking about production level use cases, it's quite a lot of Python code around, you know, like your model to make it work. To be honest, it's like, it's not that magic that you just throw the model in and like it can give you all these answers. For like a toy use cases, the one we have on a, you know, like our demo page or something, it works fine. But, you know, like if you want to do like a lot of post-processing, do a mass on URL, you probably need to code it in Python anyway. That's what I see people doing. [00:30:59]Alessio: We heard the same from Harrison and LangChain that most people just use OpenAI. We did a OpenAI has no moat emergency podcast, and it was funny to like just see the reaction that people had to that and how hard it actually is to break down some of the monopoly. What else should people keep in mind, Artem? You're kind of like at the cutting edge of this. You know, if I'm looking to build a data-driven AI application, I'm trying to build data into my AI workflows. Any mistakes people should avoid? Any tips on the best stack to use? What tools to use? [00:31:32]Artem: I would just recommend going through to warehouse as soon as possible. I think a lot of people feel that MySQL can be a warehouse, which can be maybe on like a lower scale, but definitely not from a performance perspective. So just kind of starting with a good warehouse, a query engine, Lakehouse, that's probably like something I would recommend starting from a day zero. And there are good ways to do it, very cheap, with open source technologies too, especially in the Lakehouse architecture. I think, you know, I'm biased, obviously, but using a semantic layer, preferably Cube, and for, you know, like a context. And other than that, I just feel it's a very interesting space in terms of AI ecosystem. I see a lot of people using link chain right now, which is great, you know, like, and we build an integration. But I'm sure the space will continue to evolve and, you know, like we'll see a lot of interesting tools and maybe, you know, like some tools would be a better fit for a job. I'm not aware of any right now, but it's always interesting to see how it evolves. Also it's a little unclear, you know, like how all the infrastructure around actually developing, testing, documenting, all that stuff will kind of evolve too. But yeah, again, it's just like really interesting to see and observe, you know, what's happening in this space. [00:32:44]Swyx: So before we go to the lightning round, I wanted to ask you on your thoughts on embedded analytics and in a sense, the kind of chatbots that people are inserting on their websites and building with LLMs is very much sort of end user programming or end user interaction with their own data. I love seeing embedded analytics, and for those who don't know, embedded analytics is basically user facing dashboards where you can see your own data, right? Instead of the company seeing data across all their customers, it's an individual user seeing their own data as a slice of the overall data that is owned by the platform that they're using. So I love embedded analytics. Well, actually, overwhelmingly, the observation that I've had is that people who try to build in this market fail to monetize. And I was wondering your insights on why. [00:33:31]Artem: I think overall, the statement is true. It's really hard to monetize, you know, like in embedded analytics. That's why at Qube we're excited more about our internal kind of BI use case, or like a company's a building, you know, like a chatbots for their internal data consumption or like internal workflows. Embedded analytics is hard to monetize because it's historically been dominated by the BI vendors. And we still see a lot of organizations are using BI tools as vendors. And what I was talking about, BI vendors adding natural language interfaces, they will probably add that to the embedded analytics capabilities as well, right? So they would be able to embed that too. So I think that's part of it. Also, you know, if you look at the embedded analytics market, the bigger organizations are big GADs, they're really more custom, you know, like it becomes and at some point I see many organizations, they just stop using any vendor, and they just kind of build most of the stuff from scratch, which probably, you know, like the right way to do. So it's sort of, you know, like you got a market that is very kept at the top. And then you also in that middle and small segment, you got a lot of vendors trying to, you know, like to compete for the buyers. And because again, the BI is very fragmented, embedded analytics, therefore is fragmented also. So you're really going after the mid market slice, and then with a lot of other vendors competing for that. So that's why it's historically been hard to monetize, right? I don't think AI really going to change that just because it's using model, you just pay to open AI. And that's it, like everyone can do that, right? So it's not much of a competitive advantage. So it's going to be more like a commodity features that a lot of vendors would be able to leverage. [00:35:20]Alessio: This is great, Artem. As usual, we got our lightning round. So it's three questions. One is about acceleration, one on exploration, and then take away. The acceleration thing is what's something that already happened in AI or maybe, you know, in data that you thought would take much longer, but it's already happening today. [00:35:38]Artem: To be honest, all this foundational models, I thought that we had a lot of models that been in production for like, you know, maybe decade or so. And it was like a very niche use cases, very vertical use cases, it's just like in very customized models. And even when we're building Statsbot back then in 2016, right, even back then, we had some natural language models being deployed, like a Google Translate or something that was still was a sort of a model, right, but it was very customized with a specific use case. So I thought that would continue for like, many years, we will use AI, we'll have all these customized niche models. But there is like foundational model, they like very generic now, they can serve many, many different use cases. So I think that is a big change. And I didn't expect that, to be honest. [00:36:27]Swyx: The next question is about exploration. What is one thing that you think is the most interesting unsolved question in AI? [00:36:33]Artem: I think AI is a subset of software engineering in general. And it's sort of connected to the data as well. Because software engineering as a discipline, it has quite a history. We build a lot of processes, you know, like toolkits and methodologies, how we prod that, [00:36:50]Swyx: right. [00:36:51]Artem: But AI, I don't think it's completely different. But it has some unique traits, you know, like, it's quite not idempotent, right, and kind of from many dimensions and like other traits. So which kind of may require a different methodologies may require different approaches and a different toolkit. I don't think how much is going to deviate from a standard software engineering, I think many tools and practices that we develop our software engineering can be applied to AI. And some of the data best practices can be applied as well. But it's like we got a DevOps, right, like it's just a bunch of tools, like ecosystem. So now like AI is kind of feels like it's shaping into that with a lot of its own, you know, like methodologies, practices and toolkits. So I'm really excited about it. And I think it's a lot of unsolved still question again, how do we develop that? How do we test you know, like, what is the best practices? How what is a methodologist? So I think that would be an interesting to see. [00:37:44]Alessio: Awesome. Yeah. Our final message, you know, you have a big audience of engineers and technical folks, what's something you want everybody to remember to think about to explore? [00:37:55]Artem: I mean, it says being hooked to try to build a chatbot, you know, like for analytics, back then and kind of, you know, like looking at what people do right now, I think, yeah, just do that. I mean, it's working right now, with foundational models, it's actually now it's possible to build all those cool applications. I'm so excited to see, you know, like, how much changed in the last six years or so that we actually now can build a smart agents. So I think that sort of, you know, like a takeaways and yeah, we are, as humans in general, we like we really move technology forward. And it's fun to see, you know, like, it's just a first hand. [00:38:30]Alessio: Well, thank you so much for coming on Artem. [00:38:32]Swyx: This was great. [00:38:32] Get full access to Latent Space at www.latent.space/subscribe
38:4826/10/2023
The End of Finetuning — with Jeremy Howard of Fast.ai
Thanks to the over 17,000 people who have joined the first AI Engineer Summit! A full recap is coming. Last call to fill out the State of AI Engineering survey! See our Community page for upcoming meetups in SF, Paris and NYC.This episode had good interest on Twitter and was discussed on the Vanishing Gradients podcast.Fast.ai’s “Practical Deep Learning” courses been watched by over >6,000,000 people, and the fastai library has over 25,000 stars on Github. Jeremy Howard, one of the creators of Fast, is now one of the most prominent and respected voices in the machine learning industry; but that wasn’t always the case. Being non-consensus and right In 2018, Jeremy and Sebastian Ruder published a paper on ULMFiT (Universal Language Model Fine-tuning), a 3-step transfer learning technique for NLP tasks: The paper demonstrated that pre-trained language models could be fine-tuned on a specific task with a relatively small amount of data to achieve state-of-the-art results. They trained a 24M parameters model on WikiText-103 which was beat most benchmarks.While the paper had great results, the methods behind weren’t taken seriously by the community: “Everybody hated fine tuning. Everybody hated transfer learning. I literally did tours trying to get people to start doing transfer learning and nobody was interested, particularly after GPT showed such good results with zero shot and few shot learning […] which I was convinced was not the right direction, but who's going to listen to me, cause as you said, I don't have a PhD, not at a university… I don't have a big set of computers to fine tune huge transformer models.”Five years later, fine-tuning is at the center of most major discussion topics in AI (we covered some like fine tuning vs RAG and small models fine tuning), and we might have gotten here earlier if Jeremy had OpenAI-level access to compute and distribution. At heart, Jeremy has always been “GPU poor”:“I've always been somebody who does not want to build stuff on lots of big computers because most people don't have lots of big computers and I hate creating stuff that most people can't use.”This story is a good reminder of how some of the best ideas are hiding in plain sight; we recently covered RWKV and will continue to highlight the most interesting research that isn’t being done in the large labs. Replacing fine-tuning with continued pre-trainingEven though fine-tuning is now mainstream, we still have a lot to learn. The issue of “catastrophic forgetting” and potential solutions have been brought up in many papers: at the fine-tuning stage, the model can forget tasks it previously knew how to solve in favor of new ones. The other issue is apparent memorization of the dataset even after a single epoch, which Jeremy covered Can LLMs learn from a single example? but we still don’t have the answer to. Despite being the creator of ULMFiT, Jeremy still professes that there are a lot of open questions on finetuning:“So I still don't know how to fine tune language models properly and I haven't found anybody who feels like they do.”He now advocates for "continued pre-training" - maintaining a diversity of data throughout the training process rather than separate pre-training and fine-tuning stages. Mixing instructional data, exercises, code, and other modalities while gradually curating higher quality data can avoid catastrophic forgetting and lead to more robust capabilities (something we covered in Datasets 101).“Even though I originally created three-step approach that everybody now does, my view is it's actually wrong and we shouldn't use it… the right way to do this is to fine-tune language models, is to actually throw away the idea of fine-tuning. There's no such thing. There's only continued pre-training. And pre-training is something where from the very start, you try to include all the kinds of data that you care about, all the kinds of problems that you care about, instructions, exercises, code, general purpose document completion, whatever. And then as you train, you gradually curate that, you know, you gradually make that higher and higher quality and more and more specific to the kinds of tasks you want it to do. But you never throw away any data….So yeah, that's now my view, is I think ULMFiT is the wrong approach. And that's why we're seeing a lot of these so-called alignment tax… I think it's actually because people are training them wrong.An example of this phenomena is CodeLlama, a LLaMA2 model finetuned on 500B tokens of code: while the model is much better at code, it’s worse on generic tasks that LLaMA2 knew how to solve well before the fine-tuning. In the episode we also dive into all the places where open source model development and research is happening (academia vs Discords - tracked on our Communities list and on our survey), and how Jeremy recommends getting the most out of these diffuse, pseudonymous communities (similar to the Eleuther AI Mafia).Show Notes* Jeremy’s Background* FastMail* Optimal Decisions* Kaggle* Enlitic* fast.ai* Rachel Thomas* Practical Deep Learning* fastai for PyTorch* nbdev* fastec2 (the underrated library we describe)* Can LLMs learn from a single example?* the Kaggle LLM Science Exam competition, which “challenges participants to answer difficult science-based questions written by a Large Language Model”.* Sebastian Ruder* Alec Radford* Sylvain Gugger* Stephen Merity* Chris Lattner* Modular.ai / Mojo* Jono Whittaker* Zeiler and Fergus paper* ULM Fit* DAWNBench* Phi-1* Code Llama* AlexNetTimestamps* [00:00:00] Intros and Jeremy’s background* [00:05:28] Creating ULM Fit - a breakthrough in NLP using transfer learning* [00:06:32] The rise of GPT and the appeal of few-shot learning over fine-tuning* [00:10:00] Starting Fast.ai to distribute AI capabilities beyond elite academics* [00:14:30] How modern LMs like ChatGPT still follow the ULM Fit 3-step approach* [00:17:23] Meeting with Chris Lattner on Swift for TensorFlow at Google* [00:20:00] Continued pre-training as a fine-tuning alternative* [00:22:16] Fast.ai and looking for impact vs profit maximization* [00:26:39] Using Fast.ai to create an "army" of AI experts to improve their domains* [00:29:32] Fast.ai's 3 focus areas - research, software, and courses* [00:38:42] Fine-tuning memorization and training curve "clunks" before each epoch* [00:46:47] Poor training and fine-tuning practices may be causing alignment failures* [00:48:38] Academia vs Discords* [00:53:41] Jeremy's high hopes for Chris Lattner's Mojo and its potential* [01:05:00] Adding capabilities like SQL generation through quick fine-tuning* [01:10:12] Rethinking Fast.ai courses for the AI-assisted coding era* [01:14:53] Rapid model development has created major technical debt* [01:17:08] Lightning RoundAI Summary (beta)This is the first episode we’re trying this. Here’s an overview of the main topics before you dive in the transcript. * Jeremy's background and philosophies on AI* Studied philosophy and cognitive science in college* Focused on ethics and thinking about AI even 30 years ago* Believes AI should be accessible to more people, not just elite academics/programmers* Created fast.ai to make deep learning more accessible* Development of transfer learning and ULMFit* Idea of transfer learning critical for making deep learning accessible* ULMFit pioneered transfer learning for NLP* Proposed training general language models on large corpora then fine-tuning - this became standard practice* Faced skepticism that this approach would work from NLP community* Showed state-of-the-art results on text classification soon after trying it* Current open questions around fine-tuning LLMs* Models appear to memorize training data extremely quickly (after 1 epoch)* This may hurt training dynamics and cause catastrophic forgetting* Unclear how best to fine-tune models to incorporate new information/capabilities* Need more research on model training dynamics and ideal data mixing* Exciting new developments* Mojo and new programming languages like Swift could enable faster model innovation* Still lots of room for improvements in computer vision-like innovations in transformers* Small models with fine-tuning may be surprisingly capable for many real-world tasks* Prompting strategies enable models like GPT-3 to achieve new skills like playing chess at superhuman levels* LLMs are like computer vision in 2013 - on the cusp of huge new breakthroughs in capabilities* Access to AI research* Many key convos happen in private Discord channels and forums* Becoming part of these communities can provide great learning opportunities* Being willing to do real work, not just talk about ideas, is key to gaining access* The future of practical AI* Coding becoming more accessible to non-programmers through AI assistance* Pre-requisite programming experience for learning AI may no longer be needed* Huge open questions remain about how to best train, fine-tune, and prompt LLMsTranscriptAlessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI. [00:00:21]Swyx: Hey, and today we have in the remote studio, Jeremy Howard all the way from Australia. Good morning. [00:00:27]Jeremy: The remote studio, also known as my house. Good morning. Nice to see you. [00:00:32]Swyx: Nice to see you too. I'm actually very used to seeing you in your mask as a message to people, but today we're mostly audio. But thank you for doing the very important public service of COVID awareness. It was a pleasure. [00:00:46]Jeremy: It was all very annoying and frustrating and tedious, but somebody had to do it. [00:00:52]Swyx: Somebody had to do it, especially somebody with your profile. I think it really drives home the message. So we tend to introduce people for them and then ask people to fill in the blanks on the personal side. Something I did not know about you was that you graduated with a BA in philosophy from the University of Melbourne. I assumed you had a PhD. [00:01:14]Jeremy: No, I mean, I barely got through my BA because I was working 80 to 100 hour weeks at McKinsey and Company from 19 years old onwards. So I actually didn't attend any lectures in second and third year university. [00:01:35]Swyx: Well, I guess you didn't need it or you're very sort of self-driven and self-motivated. [00:01:39]Jeremy: I took two weeks off before each exam period when I was working at McKinsey. And then, I mean, I can't believe I got away with this in hindsight, I would go to all my professors and say, oh, I was meant to be in your class this semester and I didn't quite turn up. Were there any assignments I was meant to have done, whatever. I can't believe all of them let me basically have it. They basically always would say like, okay, well, if you can have this written by tomorrow, I'll accept it. So yeah, stressful way to get through university, but. [00:02:12]Swyx: Well, it shows that, I guess, you min-maxed the opportunities. That definitely was a precursor. [00:02:18]Jeremy: I mean, funnily, like in as much as I, you know, in philosophy, the things I found interesting and focused on in the little bit of time I did spend on it was ethics and cognitive science. And it's kind of really amazing that it's now come back around and those are actually genuinely useful things to know about, which I never thought would happen. [00:02:38]Swyx: A lot of, yeah, a lot of relevant conversations there. So you were a consultant for a while and then in the magical month of June 1989, you founded both Optimal Decisions and Fastmeal, which I also briefly used. So thank you for that. [00:02:53]Jeremy: Oh, good for you. Yeah. Cause I had read the statistics, which is that like 90% or something of small businesses fail. So I thought if I start two businesses, I have a higher chance. In hindsight, I was thinking of it as some kind of stochastic thing I didn't have control over, but it's a bit odd, but anyway. [00:03:10]Swyx: And then you were president and chief scientist at Kaggle, which obviously is the sort of composition platform of machine learning. And then Enlitic, where you were working on using deep learning to improve medical diagnostics and clinical decisions. Yeah. [00:03:28]Jeremy: I was actually the first company to use deep learning in medicine, so I kind of founded the field. [00:03:33]Swyx: And even now that's still like a pretty early phase. And I actually heard you on your new podcast with Tanish, where you went very, very deep into the stuff, the kind of work that he's doing, such a young prodigy at his age. [00:03:47]Jeremy: Maybe he's too old to be called a prodigy now, ex-prodigy. No, no. [00:03:51]Swyx: I think he still counts. And anyway, just to round out the bio, you have a lot more other credentials, obviously, but most recently you started Fast.ai, which is still, I guess, your primary identity with Rachel Thomas. So welcome. [00:04:05]Jeremy: Yep. [00:04:06]Swyx: Thanks to my wife. Thank you. Yeah. Doing a lot of public service there with getting people involved in AI, and I can't imagine a better way to describe it than fast, fast.ai. You teach people from nothing to stable diffusion in seven weeks or something, and that's amazing. Yeah, yeah. [00:04:22]Jeremy: I mean, it's funny, you know, when we started that, what was that, like 2016 or something, the idea that deep learning was something that you could make more accessible was generally considered stupid. Everybody knew that deep learning was a thing that you got a math or a computer science PhD, you know, there was one of five labs that could give you the appropriate skills and that you would join, yeah, basically from one of those labs, you might be able to write some papers. So yeah, the idea that normal people could use that technology to do good work was considered kind of ridiculous when we started it. And we weren't sure if it was possible either, but we kind of felt like we had to give it a go because the alternative was we were pretty sure that deep learning was on its way to becoming, you know, the most or one of the most, you know, important technologies in human history. And if the only people that could use it were a handful of computer science PhDs, that seemed like A, a big waste and B, kind of dangerous. [00:05:28]Swyx: Yeah. [00:05:29]Alessio: And, you know, well, I just wanted to know one thing on your bio that at Kaggle, you were also the top rank participant in both 2010 and 2011. So sometimes you see a lot of founders running companies that are not really in touch with the problem, but you were clearly building something that you knew a lot about, which is awesome. Talking about deep learning, you created, published a paper on ULM fit, which was kind of the predecessor to multitask learning and a lot of the groundwork that then went to into Transformers. I've read back on the paper and you turned this model, AWD LSTM, which I did the math and it was like 24 to 33 million parameters, depending on what training data set you use today. That's kind of like not even small, it's like super small. What were some of the kind of like contrarian takes that you had at the time and maybe set the stage a little bit for the rest of the audience on what was kind of like the state of the art, so to speak, at the time and what people were working towards? [00:06:32]Jeremy: Yeah, the whole thing was a contrarian take, you know. So okay, so we started Fast.ai, my wife and I, and we thought, yeah, so we're trying to think, okay, how do we make it more accessible? So when we started thinking about it, it was probably 2015 and then 2016, we started doing something about it. Why is it inaccessible? Okay, well, A, no one knows how to do it other than a few number of people. And then when we asked those few number of people, well, how do you actually get good results? They would say like, oh, it's like, you know, a box of tricks that aren't published. So you have to join one of the labs and learn the tricks. So a bunch of unpublished tricks, not much software around, but thankfully there was Theano and rappers and particularly Lasagna, the rapper, but yeah, not much software around, not much in the way of data sets, you know, very hard to get started in terms of the compute. Like how do you get that set up? So yeah, no, everything was kind of inaccessible. And you know, as we started looking into it, we had a key insight, which was like, you know what, most of the compute and data for image recognition, for example, we don't need to do it. You know, there's this thing which nobody knows about, nobody talks about called transfer learning, where you take somebody else's model, where they already figured out like how to detect edges and gradients and corners and text and whatever else, and then you can fine tune it to do the thing you want to do. And we thought that's the key. That's the key to becoming more accessible in terms of compute and data requirements. So when we started Fast.ai, we focused from day one on transfer learning. Lesson one, in fact, was transfer learning, literally lesson one, something not normally even mentioned in, I mean, there wasn't much in the way of courses, you know, the courses out there were PhD programs that had happened to have recorded their lessons and they would rarely mention it at all. We wanted to show how to do four things that seemed really useful. You know, work with vision, work with tables of data, work with kind of recommendation systems and collaborative filtering and work with text, because we felt like those four kind of modalities covered a lot of the stuff that, you know, are useful in real life. And no one was doing anything much useful with text. Everybody was talking about word2vec, you know, like king plus queen minus woman and blah, blah, blah. It was like cool experiments, but nobody's doing anything like useful with it. NLP was all like lemmatization and stop words and topic models and bigrams and SPMs. And it was really academic and not practical. But I mean, to be honest, I've been thinking about this crazy idea for nearly 30 years since I had done cognitive science at university, where we talked a lot about the CELS Chinese room experiment. This idea of like, what if there was somebody that could kind of like, knew all of the symbolic manipulations required to answer questions in Chinese, but they didn't speak Chinese and they were kind of inside a room with no other way to talk to the outside world other than taking in slips of paper with Chinese written on them and then they do all their rules and then they pass back a piece of paper with Chinese back. And this room with a person in is actually fantastically good at answering any question you give them written in Chinese. You know, do they understand Chinese? And is this, you know, something that's intelligently working with Chinese? Ever since that time, I'd say the most thought, to me, the most thoughtful and compelling philosophical response is yes. You know, intuitively it feels like no, because that's just because we can't imagine such a large kind of system. But you know, if it looks like a duck and acts like a duck, it's a duck, you know, or to all intents and purposes. And so I always kind of thought, you know, so this is basically a kind of analysis of the limits of text. And I kind of felt like, yeah, if something could ingest enough text and could use the patterns it saw to then generate text in response to text, it could appear to be intelligent, you know. And whether that means it is intelligent or not is a different discussion and not one I find very interesting. Yeah. And then when I came across neural nets when I was about 20, you know, what I learned about the universal approximation theorem and stuff, and I started thinking like, oh, I wonder if like a neural net could ever get big enough and take in enough data to be a Chinese room experiment. You know, with that background and this kind of like interest in transfer learning, you know, I'd been thinking about this thing for kind of 30 years and I thought like, oh, I wonder if we're there yet, you know, because we have a lot of text. Like I can literally download Wikipedia, which is a lot of text. And I thought, you know, how would something learn to kind of answer questions or, you know, respond to text? And I thought, well, what if we used a language model? So language models are already a thing, you know, they were not a popular or well-known thing, but they were a thing. But language models exist to this idea that you could train a model to fill in the gaps. Or actually in those days it wasn't fill in the gaps, it was finish a string. And in fact, Andrej Karpathy did his fantastic RNN demonstration from this at a similar time where he showed like you can have it ingest Shakespeare and it will generate something that looks a bit like Shakespeare. I thought, okay, so if I do this at a much bigger scale, using all of Wikipedia, what would it need to be able to do to finish a sentence in Wikipedia effectively, to do it quite accurately quite often? I thought, geez, it would actually have to know a lot about the world, you know, it'd have to know that there is a world and that there are objects and that objects relate to each other through time and cause each other to react in ways and that causes proceed effects and that, you know, when there are animals and there are people and that people can be in certain positions during certain timeframes and then you could, you know, all that together, you can then finish a sentence like this was signed into law in 2016 by US President X and it would fill in the gap, you know. So that's why I tried to create what in those days was considered a big language model trained on the entirety on Wikipedia, which is that was, you know, a bit unheard of. And my interest was not in, you know, just having a language model. My interest was in like, what latent capabilities would such a system have that would allow it to finish those kind of sentences? Because I was pretty sure, based on our work with transfer learning and vision, that I could then suck out those latent capabilities by transfer learning, you know, by fine-tuning it on a task data set or whatever. So we generated this three-step system. So step one was train a language model on a big corpus. Step two was fine-tune a language model on a more curated corpus. And step three was further fine-tune that model on a task. And of course, that's what everybody still does today, right? That's what ChatGPT is. And so the first time I tried it within hours, I had a new state-of-the-art academic result on IMDB. And I was like, holy s**t, it does work. And so you asked, to what degree was this kind of like pushing against the established wisdom? You know, every way. Like the reason it took me so long to try it was because I asked all my friends in NLP if this could work. And everybody said, no, it definitely won't work. It wasn't like, oh, maybe. Everybody was like, it definitely won't work. NLP is much more complicated than vision. Language is a much more vastly complicated domain. You know, and you've got problems like the grounding problem. We know from like philosophy and theory of mind that it's actually impossible for it to work. So yeah, so don't waste your time. [00:15:10]Alessio: Jeremy, had people not tried because it was like too complicated to actually get the data and like set up the training? Or like, were people just lazy and kind of like, hey, this is just not going to work? [00:15:20]Jeremy: No, everybody wasn't lazy. So like, so the person I thought at that time who, you know, there were two people I thought at that time, actually, who were the strongest at language models were Stephen Merity and Alec Radford. And at the time I didn't know Alec, but I, after we had both, after I'd released ULM Fit and he had released GPT, I organized a chat for both of us with Kate Metz in the New York Times. And Kate Metz answered, sorry, and Alec answered this question for Kate. And Kate was like, so how did, you know, GPT come about? And he said, well, I was pretty sure that pre-training on a general large corpus wouldn't work. So I hadn't tried it. And then I read ULM Fit and turns out it did work. And so I did it, you know, bigger and it worked even better. And similar with, with Stephen, you know, I asked Stephen Merity, like, why don't we just find, you know, take your AWD-ASTLM and like train it on all of Wikipedia and fine tune it? And he's kind of like, well, I don't think that's going to really lie. Like two years before I did a very popular talk at KDD, the conference where everybody in NLP was in the audience. I recognized half the faces, you know, and I told them all this, I'm sure transfer learning is the key. I'm sure ImageNet, you know, is going to be an NLP thing as well. And, you know, everybody was interested and people asked me questions afterwards and, but not just, yeah, nobody followed up because everybody knew that it didn't work. I mean, even like, so we were scooped a little bit by Dai and Lee, Kwok Lee at Google. They had, they had, I already, I didn't even realize this, which is a bit embarrassing. They had already done a large language model and fine tuned it. But again, they didn't create a general purpose, large language model on a general purpose corpus. They only ever tested a domain specific corpus. And I haven't spoken to Kwok actually about that, but I assume that the reason was the same. It probably just didn't occur to them that the general approach could work. So maybe it was that kind of 30 years of mulling over the, the cell Chinese room experiment that had convinced me that it probably would work. I don't know. Yeah. [00:17:48]Alessio: Interesting. I just dug up Alec announcement tweet from 2018. He said, inspired by Cobe, Elmo, and Yola, I'm fit. We should have a single transformer language model can be fine tuned to a wide variety. It's interesting because, you know, today people think of AI as the leader, kind of kind of like the research lab pushing forward the field. What was that at the time? You know, like kind of like going back five years, people think of it as an overnight success, but obviously it took a while. [00:18:16]Swyx: Yeah. Yeah. [00:18:17]Jeremy: No, I mean, absolutely. And I'll say like, you know, it's interesting that it mentioned Elmo because in some ways that was kind of diametrically opposed to, to ULM fit. You know, there was these kind of like, so there was a lot of, there was a lot of activity at the same time as ULM fits released. So there was, um, so before it, as Brian McCann, I think at Salesforce had come out with this neat model that did a kind of multitask learning, but again, they didn't create a general fine tune language model first. There was Elmo, um, which I think was a lip, you know, actually quite a few months after the first ULM fit example, I think. Um, but yeah, there was a bit of this stuff going on. And the problem was everybody was doing, and particularly after GPT came out, then everybody wanted to focus on zero shot and few shot learning. You know, everybody hated fine tuning. Everybody hated transfer learning. And like, I literally did tours trying to get people to start doing transfer learning and people, you know, nobody was interested, particularly after GPT showed such good results with zero shot and few shot learning. And so I actually feel like we kind of went backwards for years and, and not to be honest, I mean, I'm a bit sad about this now, but I kind of got so disappointed and dissuaded by like, it felt like these bigger lab, much bigger labs, you know, like fast AI had only ever been just me and Rachel were getting all of this attention for an approach I thought was the wrong way to do it. You know, I was convinced was the wrong way to do it. And so, yeah, for years people were really focused on getting better at zero shot and few shots and it wasn't until, you know, this key idea of like, well, let's take the ULM fit approach, but for step two, rather than fine tuning on a kind of a domain corpus, let's fine tune on an instruction corpus. And then in step three, rather than fine tuning on a reasonably specific task classification, let's fine tune on a, on a RLHF task classification. And so that was really, that was really key, you know, so I was kind of like out of the NLP field for a few years there because yeah, it just felt like, I don't know, pushing uphill against this vast tide, which I was convinced was not the right direction, but who's going to listen to me, you know, cause I, as you said, I don't have a PhD, not at a university, or at least I wasn't then. I don't have a big set of computers to fine tune huge transformer models. So yeah, it was definitely difficult. It's always been hard. You know, it's always been hard. Like I've always been somebody who does not want to build stuff on lots of big computers because most people don't have lots of big computers and I hate creating stuff that most people can't use, you know, and also stuff that's created on lots of big computers has always been like much more media friendly. So like, it might seem like a recent thing, but actually throughout my 30 years in data science, the attention's always been on, you know, the big iron results. So when I first started, everybody was talking about data warehouses and it was all about Teradata and it'd be like, oh, this big bank has this huge room full of computers and they have like terabytes of data available, you know, at the press of a button. And yeah, that's always what people want to talk about, what people want to write about. And then of course, students coming out of their PhDs and stuff, that's where they want to go work because that's where they read about. And to me, it's a huge distraction, you know, because like I say, most people don't have unlimited compute and I want to help most people, not the small subset of the most well-off people. [00:22:16]Alessio: That's awesome. And it's great to hear, you do such a great job educating that a lot of times you're not telling your own story, you know? So I love this conversation. And the other thing before we jump into Fast.AI, actually, a lot of people that I know, they run across a new architecture and whatnot, they're like, I got to start a company and raise a bunch of money and do all of this stuff. And say, you were like, I want everybody to have access to this. Why was that the case for you? Was it because you already had a successful venture in like FastMail and you were more interested in that? What was the reasoning? [00:22:52]Jeremy: It's a really good question. So I guess the answer is yes, that's the reason why. So when I was a teenager, I thought it would be really cool to like have my own company. You know, I didn't know the word startup. I didn't know the word entrepreneur. I didn't know the word VC. And I didn't really know what any of those things were really until after we started Kaggle, to be honest. Even the way it started to what we now call startups. I just thought they were just small businesses. You know, they were just companies. So yeah, so those two companies were FastMail and Optimal Decisions. FastMail was the first kind of synchronized email provider for non-businesses. So something you can get your same email at home, on your laptop, at work, on your phone, whatever. And then Optimal Decisions invented a new approach to insurance pricing. Something called profit-optimized insurance pricing. So I saw both of those companies, you know, after 10 years. And at that point, I had achieved the thing that as a teenager I had wanted to do. You know, it took a lot longer than it should have because I spent way longer in management consulting than I should have because I got caught up in that stupid rat race. But, you know, eventually I got there and I remember my mom saying to me, you must be so proud. You know, because she remembered my dream. She's like, you've done it. And I kind of reflected and I was like, I'm not proud at all. You know, like people quite liked FastMail. You know, it's quite nice to have synchronized email. It probably would have happened anyway. Yeah, I'm certainly not proud that I've helped some insurance companies suck more money out of their customers. Yeah, no, I'm not proud. You know, it's actually, I haven't really helped the world very much. You know, maybe in the insurance case I've made it a little bit worse. I don't know. So, yeah, I was determined to not waste more years of my life doing things, working hard to do things which I could not be reasonably sure would have a lot of value. So, you know, I took some time off. I wasn't sure if I'd ever work again, actually. I didn't particularly want to, because it felt like, yeah, it felt like such a disappointment. And, but, you know, and I didn't need to. I had enough money. Like, I wasn't super rich, but I had enough money. I didn't need to work. And I certainly recognized that amongst the other people I knew who had enough money that they didn't need to work, they all worked ridiculously hard, you know, and constantly put themselves in extremely stressful situations. And I thought, I don't want to be one of those idiots who's tied to, you know, buying a bigger plane than the next guy or whatever. You know, Kaggle came along and I mainly kind of did that just because it was fun and interesting to hang out with interesting people. But, you know, with Fast.ai in particular, you know, Rachel and I had a very explicit, you know, long series of conversations over a long period of time about like, well, how can we be the most helpful to society as a whole, and particularly to those people who maybe need more help, you know? And so we definitely saw the world going in a potentially pretty dystopian direction if the world's most powerful technology was controlled by a small group of elites. So we thought, yeah, we should focus on trying to help that not happen. You know, sadly, it looks like it still is likely to happen. But I mean, I feel like we've helped make it a little bit less likely. So we've done our bit. [00:26:39]Swyx: You've shown that it's possible. And I think your constant advocacy, your courses, your research that you publish, you know, just the other day you published a finding on, you know, learning that I think is still something that people are still talking about quite a lot. I think that that is the origin story of a lot of people who are going to be, you know, little Jeremy Howards, furthering your mission with, you know, you don't have to do everything by yourself is what I'm saying. No, definitely. Definitely. [00:27:10]Jeremy: You know, that was a big takeaway from like, analytic was analytic. It definitely felt like we had to do everything ourselves. And I kind of, I wanted to solve medicine. I'll say, yeah, okay, solving medicine is actually quite difficult. And I can't do it on my own. And there's a lot of other things I'd like to solve, and I can't do those either. So that was definitely the other piece was like, yeah, you know, can we create an army of passionate domain experts who can change their little part of the world? And that's definitely happened. Like I find nowadays, at least half the time, probably quite a bit more that I get in contact with somebody who's done really interesting work in some domain. Most of the time I'd say, they say, yeah, I got my start with fast.ai. So it's definitely, I can see that. And I also know from talking to folks at places like Amazon and Adobe and stuff, which, you know, there's lots of alumni there. And they say, oh my God, I got here. And like half of the people are fast.ai alumni. So it's fantastic. [00:28:13]Swyx: Yeah. [00:28:14]Jeremy: Actually, Andre Kapathy grabbed me when I saw him at NeurIPS a few years ago. And he was like, I have to tell you, thanks for the fast.ai courses. When people come to Tesla and they need to know more about deep learning, we always send them to your course. And the OpenAI Scholars Program was doing the same thing. So it's kind of like, yeah, it's had a surprising impact, you know, that's just one of like three things we do is the course, you know. [00:28:40]Swyx: Yes. [00:28:40]Jeremy: And it's only ever been at most two people, either me and Rachel or me and Sylvia nowadays, it's just me. So yeah, I think it shows you don't necessarily need a huge amount of money and a huge team of people to make an impact. [00:28:56]Swyx: Yeah. So just to reintroduce fast.ai for people who may not have dived into it much, there is the courses that you do. There is the library that is very well loved. And I kind of think of it as a nicer layer on top of PyTorch that people should start with by default and use it as the basis for a lot of your courses. And then you have like NBDev, which I don't know, is that the third one? [00:29:27]Jeremy: Oh, so the three areas were research, software, and courses. [00:29:32]Swyx: Oh, sorry. [00:29:32]Jeremy: So then in software, you know, fast.ai is the main thing, but NBDev is not far behind. But then there's also things like FastCore, GHAPI, I mean, dozens of open source projects that I've created and some of them have been pretty popular and some of them are still a little bit hidden, actually. Some of them I should try to do a better job of telling people about. [00:30:01]Swyx: What are you thinking about? Yeah, what's on the course of my way? Oh, I don't know, just like little things. [00:30:04]Jeremy: Like, for example, for working with EC2 and AWS, I created a FastEC2 library, which I think is like way more convenient and nice to use than anything else out there. And it's literally got a whole autocomplete, dynamic autocomplete that works both on the command line and in notebooks that'll like auto-complete your instance names and everything like that. You know, just little things like that. I try to make like, when I work with some domain, I try to make it like, I want to make it as enjoyable as possible for me to do that. So I always try to kind of like, like with GHAPI, for example, I think that GitHub API is incredibly powerful, but I didn't find it good to work with because I didn't particularly like the libraries that are out there. So like GHAPI, like FastEC2, it like autocompletes both at the command line or in a notebook or whatever, like literally the entire GitHub API. The entire thing is like, I think it's like less than 100K of code because it actually, as far as I know, the only one that grabs it directly from the official open API spec that GitHub produces. And like if you're in GitHub and you just type an API, you know, autocomplete API method and hit enter, it prints out the docs with brief docs and then gives you a link to the actual documentation page. You know, GitHub Actions, I can write now in Python, which is just so much easier than writing them in TypeScript and stuff. So, you know, just little things like that. [00:31:40]Swyx: I think that's an approach which more developers took to publish some of their work along the way. You described the third arm of FastAI as research. It's not something I see often. Obviously, you do do some research. And how do you run your research? What are your research interests? [00:31:59]Jeremy: Yeah, so research is what I spend the vast majority of my time on. And the artifacts that come out of that are largely software and courses. You know, so to me, the main artifact shouldn't be papers because papers are things read by a small exclusive group of people. You know, to me, the main artifacts should be like something teaching people, here's how to use this insight and here's software you can use that builds it in. So I think I've only ever done three first-person papers in my life, you know, and none of those are ones I wanted to do. You know, they were all ones that, like, so one was ULM Fit, where Sebastian Ruder reached out to me after seeing the course and said, like, you have to publish this as a paper, you know. And he said, I'll write it. He said, I want to write it because if I do, I can put it on my PhD and that would be great. And it's like, okay, well, I want to help you with your PhD. And that sounds great. So like, you know, one was the masks paper, which just had to exist and nobody else was writing it. And then the third was the Fast.ai library paper, which again, somebody reached out and said, please, please write this. We will waive the fee for the journal and everything and actually help you get it through publishing and stuff. So yeah, so I don't, other than that, I've never written a first author paper. So the research is like, well, so for example, you know, Dawn Bench was a competition, which Stanford ran a few years ago. It was kind of the first big competition of like, who can train neural nets the fastest rather than the most accurate. And specifically it was who can train ImageNet the fastest. And again, this was like one of these things where it was created by necessity. So Google had just released their TPUs. And so I heard from my friends at Google that they had put together this big team to smash Dawn Bench so that they could prove to people that they had to use Google Cloud and use their TPUs and show how good their TPUs were. And we kind of thought, oh s**t, this would be a disaster if they do that, because then everybody's going to be like, oh, deep learning is not accessible. [00:34:20]Swyx: You know, to actually be good at it, [00:34:21]Jeremy: you have to be Google and you have to use special silicon. And so, you know, we only found out about this 10 days before the competition finished. But, you know, we basically got together an emergency bunch of our students and Rachel and I and sat for the next 10 days and just tried to crunch through and try to use all of our best ideas that had come from our research. And so particularly progressive resizing, just basically train mainly on small things, train on non-square things, you know, stuff like that. And so, yeah, we ended up winning, thank God. And so, you know, we turned it around from being like, like, oh s**t, you know, this is going to show that you have to be Google and have TPUs to being like, oh my God, even the little guy can do deep learning. So that's an example of the kind of like research artifacts we do. And yeah, so all of my research is always, how do we do more with less, you know? So how do we get better results with less data, with less compute, with less complexity, with less education, you know, stuff like that. So ULM fits obviously a good example of that. [00:35:37]Swyx: And most recently you published, can LLMs learn from a single example? Maybe could you tell the story a little bit behind that? And maybe that goes a little bit too far into the learning of very low resource, the literature. [00:35:52]Jeremy: Yeah, yeah. So me and my friend, Jono Whittaker, basically had been playing around with this fun Kaggle competition, which is actually still running as we speak, which is, can you create a model which can answer multiple choice questions about anything that's in Wikipedia? And the thing that makes it interesting is that your model has to run on Kaggle within nine hours. And Kaggle's very, very limited. So you've only got 14 gig RAM, only two CPUs, and a small, very old GPU. So this is cool, you know, if you can do well at this, then this is a good example of like, oh, you can do more with less. So yeah, Jono and I were playing around with fine tuning, of course, transfer learning, pre-trained language models. And we saw this, like, so we always, you know, plot our losses as we go. So here's another thing we created. Actually, Sylvain Guuger, when he worked with us, created called fast progress, which is kind of like TQEDM, but we think a lot better. So we look at our fast progress curves, and they kind of go down, down, down, down, down, down, down, a little bit, little bit, little bit. And then suddenly go clunk, and they drop. And then down, down, down, down, down a little bit, and then suddenly clunk, they drop. We're like, what the hell? These clunks are occurring at the end of each epoch. So normally in deep learning, this would be, this is, you know, I've seen this before. It's always been a bug. It's always turned out that like, oh, we accidentally forgot to turn on eval mode during the validation set. So I was actually learning then, or, oh, we accidentally were calculating moving average statistics throughout the epoch. So, you know, so it's recently moving average or whatever. And so we were using Hugging Face Trainer. So, you know, I did not give my friends at Hugging Face the benefit of the doubt. I thought, oh, they've fucked up Hugging Face Trainer, you know, idiots. Well, you'll use the Fast AI Trainer instead. So we switched over to Learner. We still saw the clunks and, you know, that's, yeah, it shouldn't really happen because semantically speaking in the epoch, isn't like, it's not a thing, you know, like nothing happens. Well, nothing's meant to happen when you go from ending one epoch to starting the next one. So there shouldn't be a clunk, you know. So I kind of asked around on the open source discords. That's like, what's going on here? And everybody was just like, oh, that's just what, that's just what these training curves look like. Those all look like that. Don't worry about it. And I was like, oh, are you all using Trainer? Yes. Oh, well, there must be some bug with Trainer. And I was like, well, we also saw it in Learner [00:38:42]Swyx: and somebody else is like, [00:38:42]Jeremy: no, we've got our own Trainer. We get it as well. They're just like, don't worry about it. It's just something we see. It's just normal. [00:38:48]Swyx: I can't do that. [00:38:49]Jeremy: I can't just be like, here's something that's like in the previous 30 years of neural networks, nobody ever saw it. And now suddenly we see it. [00:38:57]Swyx: So don't worry about it. [00:38:59]Jeremy: I just, I have to know why. [00:39:01]Swyx: Can I clarify? This is, was everyone that you're talking to, were they all seeing it for the same dataset or in different datasets? [00:39:08]Jeremy: Different datasets, different Trainers. They're just like, no, this is just, this is just what it looks like when you fine tune language models. Don't worry about it. You know, I hadn't seen it before, but I'd been kind of like, as I say, I, you know, I kept working on them for a couple of years after ULM fit. And then I kind of moved on to other things, partly out of frustration. So I hadn't been fine tuning, you know, I mean, Lama's only been out for a few months, right? But I wasn't one of those people who jumped straight into it, you know? So I was relatively new to the kind of Lama fine tuning world, where else these guys had been, you know, doing it since day one. [00:39:49]Swyx: It was only a few months ago, [00:39:51]Jeremy: but it's still quite a bit of time. So, so yeah, they're just like, no, this is all what we see. [00:39:56]Swyx: Don't worry about it. [00:39:56]Jeremy: So yeah, I, I've got a very kind of like, I don't know, I've just got this brain where I have to know why things are. And so I kind of, I ask people like, well, why, why do you think it's happening? And they'd be like, oh, it would pretty obviously, cause it's like memorize the data set. It's just like, that can't be right. It's only seen it once. Like, look at this, the loss has dropped by 0.3, 0.3, which is like, basically it knows the answer. And like, no, no, it's just, it is, it's just memorize the data set. So yeah. So look, Jono and I did not discover this and Jono and I did not come up with a hypothesis. You know, I guess we were just the ones, I guess, who had been around for long enough to recognize that like, this, this isn't how it's meant to work. And so we, we, you know, and so we went back and like, okay, let's just run some experiments, you know, cause nobody seems to have actually published anything about this. [00:40:51]Well, not quite true.Some people had published things, but nobody ever actually stepped back and said like, what the hell, you know, how can this be possible? Is it possible? Is this what's happening? And so, yeah, we created a bunch of experiments where we basically predicted ahead of time. It's like, okay, if this hypothesis is correct, that it's memorized in the training set, then we ought to see blah, under conditions, blah, but not under these conditions. And so we ran a bunch of experiments and all of them supported the hypothesis that it was memorizing the data set in a single thing at once. And it's a pretty big data set, you know, which in hindsight, it's not totally surprising because the theory, remember, of the ULMFiT theory was like, well, it's kind of creating all these latent capabilities to make it easier for it to predict the next token. So if it's got all this kind of latent capability, it ought to also be really good at compressing new tokens because it can immediately recognize it as like, oh, that's just a version of this. So it's not so crazy, you know, but it is, it requires us to rethink everything because like, and nobody knows like, okay, so how do we fine tune these things? Because like, it doesn't even matter. Like maybe it's fine. Like maybe it's fine that it's memorized the data set after one go and you do a second go and okay, the validation loss is terrible because it's now really overconfident. [00:42:20]Swyx: That's fine. [00:42:22]Jeremy: Don't, you know, don't, I keep telling people, don't track validation loss, track validation accuracy because at least that will still be useful. Just another thing that's got lost since ULMFiT, nobody tracks accuracy of language models anymore. But you know, it'll still keep learning and it does, it does keep improving. But is it worse? You know, like, is it like, now that it's kind of memorized it, it's probably getting a less strong signal, you know, I don't know. So I still don't know how to fine tune language models properly and I haven't found anybody who feels like they do, like nobody really knows whether this memorization thing is, it's probably a feature in some ways. It's probably some things that you can do usefully with it. It's probably, yeah, I have a feeling it's messing up training dynamics as well. [00:43:13]Swyx: And does it come at the cost of catastrophic forgetting as well, right? Like, which is the other side of the coin. [00:43:18]Jeremy: It does to some extent, like we know it does, like look at Code Llama, for example. So Code Llama was a, I think it was like a 500 billion token fine tuning of Llama 2 using code. And also pros about code that Meta did. And honestly, they kind of blew it because Code Llama is good at coding, but it's bad at everything else, you know, and it used to be good. Yeah, I was pretty sure it was like, before they released it, me and lots of people in the open source discords were like, oh my God, you know, we know this is coming, Jan Lukinsk saying it's coming. I hope they kept at least like 50% non-code data because otherwise it's going to forget everything else. And they didn't, only like 0.3% of their epochs were non-code data. So it did, it forgot everything else. So now it's good at code and it's bad at everything else. So we definitely have catastrophic forgetting. It's fixable, just somebody has to do, you know, somebody has to spend their time training a model on a good mix of data. Like, so, okay, so here's the thing. Even though I originally created three-step approach that everybody now does, my view is it's actually wrong and we shouldn't use it. [00:44:36]Jeremy: And that's because people are using it in a way different to why I created it. You know, I created it thinking the task-specific models would be more specific. You know, it's like, oh, this is like a sentiment classifier as an example of a task, you know, but the tasks now are like a, you know, RLHF, which is basically like answer questions that make people feel happy about your answer. So that's a much more general task and it's a really cool approach. And so we see, for example, RLHF also breaks models like, you know, like GPT-4, RLHDEFT, we know from kind of the work that Microsoft did, you know, the pre, the earlier, less aligned version was better. And these are all kind of examples of catastrophic forgetting. And so to me, the right way to do this is to fine-tune language models, is to actually throw away the idea of fine-tuning. There's no such thing. There's only continued pre-training. And pre-training is something where from the very start, you try to include all the kinds of data that you care about, all the kinds of problems that you care about, instructions, exercises, code, general purpose document completion, whatever. And then as you train, you gradually curate that, you know, you gradually make that higher and higher quality and more and more specific to the kinds of tasks you want it to do. But you never throw away any data. You always keep all of the data types there in reasonably high quantities. You know, maybe the quality filter, you stop training on low quality data, because that's probably fine to forget how to write badly, maybe. So yeah, that's now my view, is I think ULM fit is the wrong approach. And that's why we're seeing a lot of these, you know, so-called alignment tacks and this view of like, oh, a model can't both code and do other things. And, you know, I think it's actually because people are training them wrong. [00:46:47]Swyx: Yeah, well, I think you have a clear [00:46:51]Alessio: anti-laziness approach. I think other people are not as good hearted, you know, they're like, [00:46:57]Swyx: hey, they told me this thing works. [00:46:59]Alessio: And if I release a model this way, people will appreciate it, I'll get promoted and I'll kind of make more money. [00:47:06]Jeremy: Yeah, and it's not just money. It's like, this is how citations work most badly, you know, so if you want to get cited, you need to write a paper that people in your field recognize as an advancement on things that we know are good. And so we've seen this happen again and again. So like I say, like zero shot and few shot learning, everybody was writing about that. Or, you know, with image generation, everybody just was writing about GANs, you know, and I was trying to say like, no, GANs are not the right approach. You know, and I showed again through research that we demonstrated in our videos that you can do better than GANs, much faster and with much less data. And nobody cared because again, like if you want to get published, you write a GAN paper that slightly improves this part of GANs and this tiny field, you'll get published, you know. So it's, yeah, it's not set up for real innovation. It's, you know, again, it's really helpful for me, you know, I have my own research lab with nobody telling me what to do and I don't even publish. So it doesn't matter if I get citations. And so I just write what I think actually matters. I wish there was, and, you know, and actually places like OpenAI, you know, the researchers there can do that as well. It's a shame, you know, I wish there was more academic, open venues in which people can focus on like genuine innovation. [00:48:38]Swyx: Twitter, which is unironically has become a little bit of that forum. I wanted to follow up on one thing that you mentioned, which is that you checked around the open source discords. I don't know if it's too, I don't know if it's a pusher to ask like what discords are lively or useful right now. I think that something I definitely felt like I missed out on was the early days of Luther AI, which is a very hard bit. And, you know, like what is the new Luther? And you actually shouted out the alignment lab AI discord in your blog post. And that was the first time I even knew, like I saw them on Twitter, never knew they had a discord, never knew that there was actually substantive discussions going on in there and that you were an active member of it. Okay, yeah. [00:49:23]Jeremy: And then even then, if you do know about that and you go there, it'll look like it's totally dead. And that's because unfortunately, nearly all the discords, nearly all of the conversation happens in private channels. You know, and that's, I guess. [00:49:35]Swyx: How does someone get into that world? Because it's obviously very, very instructive, right? [00:49:42]Jeremy: You could just come to the first AI discord, which I'll be honest with you, it's less bustling than some of the others, but it's not terrible. And so like, at least, to be fair, one of Emma's bustling channels is private. [00:49:57]Swyx: I guess. [00:49:59]Jeremy: So I'm just thinking. [00:50:01]Swyx: It's just the nature of quality discussion, right? Yeah, I guess when I think about it, [00:50:05]Jeremy: I didn't have any private discussions on our discord for years, but there was a lot of people who came in with like, oh, I just had this amazing idea for AGI. If you just thought about like, if you imagine that AI is a brain, then we, you know, this just, I don't want to talk about it. You know, I don't want to like, you don't want to be dismissive or whatever. And it's like, oh, well, that's an interesting comment, but maybe you should like, try training some models first to see if that aligns with your intuition. Like, oh, but how could I possibly learn? It's like, well, we have a course, just actually spend time learning. Like, you know, anyway. And there's like, okay, I know the people who always have good answers there. And so I created a private channel and put them all in it. And I got to admit, that's where I post more often because there's much less, you know, flight of fancy views about how we could solve AGI, blah, blah, blah. So there is a bit of that. But having said that, like, I think the bar is pretty low. Like if you join a Discord and you can hit the like participants or community or whatever button, you can see who's in it. And then you'll see at the top, who the admins or moderators or people in the dev role are. And just DM one of them and say like, oh, here's my GitHub. Well, here's some blog posts I wrote. You know, I'm interested in talking about this, you know, can I join the private channels? And I've never heard of anybody saying no. I will say, you know, Alutha's all pretty open. So you can do the Alutha Discord still. You know, one problem with the Alutha Discord is it's been going on for so long that it's like, it's very inside baseball. It's quite hard to get started. Yeah. Carpa AI looks, I think it's all open. That's just less stability. That's more accessible. [00:52:03]Swyx: Yeah. [00:52:04]Jeremy: There's also just recently, now it's research that does like the Hermes models and data set just opened. They've got some private channels, but it's pretty open, I think. You mentioned Alignment Lab, that one it's all the interesting stuff is on private channels. So just ask. If you know me, ask me, cause I've got admin on that one. There's also, yeah, OS Skunkworks, OS Skunkworks AI is a good Discord, which I think it's open. So yeah, they're all pretty good. [00:52:40]Swyx: I don't want you to leak any, you know, Discords that don't want any publicity, but this is all helpful. [00:52:46]Jeremy: We all want people, like we all want people. [00:52:49]Swyx: We just want people who like, [00:52:51]Jeremy: want to build stuff, rather than people who, and like, it's fine to not know anything as well, but if you don't know anything, but you want to tell everybody else what to do and how to do it, that's annoying. If you don't know anything and want to be told like, here's a really small kind of task that as somebody who doesn't know anything is going to take you a really long time to do, but it would still be helpful. Then, and then you go and do it. That would be great. The truth is, yeah, [00:53:19]Swyx: like, I don't know, [00:53:20]Jeremy: maybe 5% of people who come in with great enthusiasm and saying that they want to learn and they'll do anything. [00:53:25]Swyx: And then somebody says like, [00:53:25]Jeremy: okay, here's some work you can do. Almost nobody does that work. So if you're somebody who actually does the work and follows up, you will massively stand out. That's an extreme rarity. And everybody will then want to help you do more work. [00:53:41]Swyx: So yeah. [00:53:41]Jeremy: So just, yeah, just do work and people will want to support you. [00:53:47]Alessio: Our Discord used to be referral only for a long time. We didn't have a public invite and then we opened it and they're kind of like channel gating. Yeah. A lot of people just want to do, I remember it used to be like, you know, a forum moderator. [00:54:00]Swyx: It's like people just want to do [00:54:01]Alessio: like drive-by posting, [00:54:03]Swyx: you know, and like, [00:54:03]Alessio: they don't want to help the community. They just want to get their question answered. [00:54:07]Jeremy: I mean, the funny thing is our forum community does not have any of that garbage. You know, there's something specific about the low latency thing where people like expect an instant answer. And yeah, we're all somehow in a forum thread where they know it's like there forever. People are a bit more thoughtful, but then the forums are less active than they used to be because Discord has got more popular, you know? So it's all a bit of a compromise, you know, running a healthy community is, yeah, it's always a bit of a challenge. All right, we got so many more things [00:54:47]Alessio: we want to dive in, but I don't want to keep you here for hours. [00:54:50]Swyx: This is not the Lex Friedman podcast [00:54:52]Alessio: we always like to say. One topic I would love to maybe chat a bit about is Mojo, modular, you know, CrystalLiner, not many of you on the podcast. So we want to spend a little time there. You recently did a hacker's guide to language models and you ran through everything from quantized model to like smaller models, larger models, and all of that. But obviously modular is taking its own approach. Yeah, what got you excited? I know you and Chris have been talking about this for like years and a lot of the ideas you had, so. [00:55:23]Jeremy: Yeah, yeah, yeah, yeah, no, absolutely. So I met Chris, I think it was at the first TensorFlow Dev Summit. And I don't think he had even like, I'm not sure if he'd even officially started his employment with Google at that point. So I don't know, you know, certainly nothing had been mentioned. So I, you know, I admired him from afar with LLVM and Swift and whatever. And so I saw him walk into the courtyard at Google. It's just like, oh s**t, man, that's Chris Latner. I wonder if he would lower his standards enough to talk to me. Well, worth a try. So I caught up my courage because like nobody was talking to him. He looked a bit lost and I wandered over and it's like, oh, you're Chris Latner, right? It's like, what are you doing here? What are you doing here? And I was like, yeah, yeah, yeah. It's like, oh, I'm Jeremy Howard. It's like, oh, do you do some of this AI stuff? And I was like, yeah, yeah, I like this AI stuff. Are you doing AI stuff? It's like, well, I'm thinking about starting to do some AI stuff. Yeah, I think it's going to be cool. And it's like, wow. So like, I spent the next half hour just basically brain dumping all the ways in which AI was stupid to him. And he listened patiently. And I thought he probably wasn't even remember or care or whatever. But yeah, then I kind of like, I guess I re-caught up with him a few months later. And it's like, I've been thinking about everything you said in that conversation. And he like narrated back his response to every part of it, projects he was planning to do. And it's just like, oh, this dude follows up. Holy s**t. And I was like, wow, okay. And he was like, yeah, so we're going to create this new thing called Swift for TensorFlow. And it's going to be like, it's going to be a compiler with auto differentiation built in. And blah, blah, blah. And I was like, why would that help? [00:57:10]Swyx: You know, why would you? [00:57:10]Jeremy: And he was like, okay, with a compiler during the forward pass, you don't have to worry about saving context, you know, because a lot will be optimized in the backward. But I was like, oh my God. Because I didn't really know much about compilers. You know, I spent enough to kind of like, understand the ideas, but it hadn't occurred to me that a compiler basically solves a lot of the problems we have as end users. I was like, wow, that's amazing. Okay, you do know, right, that nobody's going to use this unless it's like usable. It's like, yeah, I know, right. So I was thinking you should create like a fast AI for this. So, okay, but I don't even know Swift. And he was like, well, why don't you start learning it? And if you have any questions, ask me. It's just like, holy s**t. Like, not only has Chris Latner lowered his standards enough to talk to me, but he's offering me personal tutoring on the programming language that he made. So I was just like, I'm not going to let him down. So I spent like the next two months, like just nerding out on Swift. And it was just before Christmas that I kind of like started writing down what I'd learned. And so I wrote a couple of blog posts on like, okay, this is like my attempt to do numeric programming in Swift. And these are all the challenges I had. And these are some of the issues I had with like making things properly performant. And here are some libraries I wrote. And I sent it to Chris and was like, I hope he's not too disappointed with me, you know, because that would be the worst. It's like, you know, and I was also like, I was like, I hope he doesn't dislike the fact that I, you know, didn't love everything. [00:58:46]Jeremy: And yeah, he was like, oh, thanks for sending me that. Let's get on a call and talk about it. And we spoke and he was like, this is amazing. I can't believe that you made this. This is exactly what Swift needs. And he was like, and so like somebody set up like a new Swift, what they call them, the equivalent of a pep, you know, kind of RFC thing of like, oh, you know, let's look at how we can implement Jeremy's ideas and the language. And so it's like, oh, wow. And so, yeah, you know, and then we ended up like literally teaching some lessons together about Swift for TensorFlow. And we built a fast AI kind of equivalent with him and his team. It was so much fun. Then in the end, you know, Google didn't follow through, which is fair enough, like asking everybody to learn a new programming language is going to be tough. But like, it was very obvious, very, very obvious at that time that TensorFlow 2 is going to be a failure, you know, and so it's felt like, okay, I, you know, well, you know, what are you going to do? Like, you can't focus on TensorFlow 2 because it's not going to, like, it's not working. It's never going to work. You know, nobody at Google's using it. Internally. So, you know, in the end, Chris left, you know, Swift for TensorFlow got archived. [01:00:13]Swyx: There was no backup plan. [01:00:15]Jeremy: So it kind of felt like Google was kind of screwed, you know, and Chris went and did something else. But we kept talking and I was like, look, Chris, you know, you've got to be your own boss, man. It's like, you know, you've got the ideas, you know, like only you've got the ideas, you know, and if your ideas are implemented, we'd all be so much better off because like Python's the best of a whole bunch of s**t, you know, like I would, it's amazing, but it's awful, you know, compared to what it could be. And anyway, so eventually a few years later, he called me up and he was like, Jeremy, I've taken your advice. I've started a company. And I was like, oh my God. It's like, we've got to create a new language. We're going to create a new infrastructure. It's going to build, it's going to have all the stuff we've talked about. And it's like, oh wow. So that's what Mojo is. And so Mojo is like, you know, building on all the stuff that Chris has figured out over, I mean, really from when he did his PhD thesis, which developed LLVM onwards, you know, in Swift and MLIR, you know, the TensorFlow runtime engine, which is very good. You know, that was something that he built and has lasted. So yeah, I'm pumped about that. I mean, it's very speculative. Creating a whole new language is tough. I mean, Chris has done it before and he's created a whole C++ compiler amongst other things. Looking pretty hopeful. I mean, I hope it works because, you know, [01:01:53]Alessio: You told them to quit his job. [01:01:55]Swyx: So I mean, in the meantime, I will say, you know, [01:02:00]Jeremy: Google now does have a backup plan, you know, they have Jax, which was never a strategy. It was just a bunch of people who also recognized TensorFlow 2 as s**t and they just decided to build something else. And for years, my friends in that team were like, don't tell anybody about us because we don't want to be anything but a research project. So now these poor guys, suddenly they're the great white hope for Google's future. And so Jax is, you know, also not terrible, but it's still written in Python. Like it would be cool if we had all the benefits of Jax, but in a language that was designed for those kinds of purposes. So, you know, fingers crossed that, yeah, that Mojo turns out great. [01:02:45]Swyx: Yeah. [01:02:47]Alessio: Any other thoughts on when, where people should be spending their time? So that's more the kind of language framework level. Then you have the, you know, GGML, some of these other like quantization focused kind of model level things. Then you got the hardware people. It's like a whole other bucket. Yeah. What are some of the exciting stuff that you're excited about? [01:03:08]Jeremy: Well, you won't be surprised to hear me say this, but I think fine tuning transfer learning is still a hugely underappreciated area. So today's zero shot, few shot learning equivalent is retrieval augmented generation, you know, RAC, which is like, just like few shot learning is a thing. Like it's a real thing. It's a useful thing. It's not a thing anybody would want to ignore. Why are people not spending at least as much effort on fine tuning? You know, cause you know, RAG is like such a inefficient hack really, [01:03:45]Swyx: isn't it? [01:03:45]Jeremy: It's like, you know, segment up my data in some somewhat arbitrary way, embed it, ask questions about that, you know, hope that my embedding, you know, model embeds questions in the same bedding space as the paragraphs, which obviously is not going to, if your question is like, if I've got a whole bunch of archive papers embeddings, and I asked like, what are all the ways in which we can make inference more efficient? Like the only paragraphs it'll find is like if there's a review paper, here's a list of ways to make, you know, inference more efficient. Doesn't have any of the specifics. No, it's not going to be like, oh, here's one way, here's one way, here's a different way in different papers, [01:04:33]Swyx: you know? Yeah. [01:04:35]Jeremy: If you fine tune a model, then all of that information is getting directly incorporated into the weights of your model in a much more efficient and nuanced way. And then you can use RAG on top of that. So I think that that's one area that's definitely like underappreciated. And also the kind of like the confluence or like, okay, how do you combine RAG and fine tuning, for example. [01:05:00]Swyx: Something that I think a lot of people are uncertain about, and I don't expect you to know either, is that whether or not you can fine tune new information in, and I think that that is the focus of some of your open questions. And of course you can, right? [01:05:17]Jeremy: Like, obviously you can, because there's no such thing as fine, there's no such thing as fine tuning. There's only continued pre-training. So fine tuning is pre-training, like they're literally the same thing. So the knowledge got in there in the first place through pre-training. So how could like continuing to pre-train not put more knowledge in? Like it's the same thing. The problem is just we're really bad at it because everybody's doing it dumb ways. So, you know, it's a good question. And it's not just new knowledge, but like new capabilities. You know, I think like in my Packers Guide to LLM, into Packers Guide to LLM's talk, I show a simple, I mean, it's a funny, that's a simple example, because it doesn't sound it, but like taking a pre-trained based model and getting it to generate SQL. And it took 15 minutes to train on a single GPU. You know, I think that might surprise people that that capability is at your fingertips. And, you know, because it was already there, it was just latent in the base model. Really pushing the boundaries of what you can do with small models, I think is a really interesting question. Like what can you do with a, like, I mean, there isn't much in the way of good small models. A really underappreciated one is a BTLM 3B, which is a like kind of 7B quality 3B model. There's not much at the 1 to 2B range sadly, there are some code ones, but like the fact that there are some really good code ones in that 1 to 2B range shows you that that's a great size for doing complex tasks well. [01:06:56]Swyx: There was PHY 1 recently, which has been the subject of a little bit of discussion about whether to train on benchmarks. [01:07:04]Jeremy: PHY 1.5 as well. So that's not a good model yet. [01:07:09]Swyx: Why not? [01:07:11]Jeremy: It's good at doing, so PHY 1 in particular is good at doing a very specific thing, which is creating very small Python snippets. [01:07:19]Swyx: The thing, okay, [01:07:21]Jeremy: so like PHY 1.5 has never read Wikipedia, for example, so it doesn't know who Tom Cruise is, you know, it doesn't know who anybody is, it doesn't know about any movies, it doesn't really know anything about anything, like, because it's never read anything, you know, it was trained on a nearly entirely synthetic data set, which is designed for it to learn reasoning, and so it was a research project, and a really good one, and it definitely shows us a powerful direction in terms of what you can do with synthetic data, and wow, gosh, even these tiny models can get pretty good reasoning skills, pretty good math skills, pretty good coding skills, [01:08:04]Jeremy: but I don't know if it's a model you could necessarily build on. Some people have tried to do some fine tunes of it, and again, they're like surprisingly good in some ways for a 1.5b model, but not sure you'd find it useful for anything. [01:08:24]Swyx: I think that's the struggle of pitching small models, because small is great, you know, you don't need a lot of resources to run them, but the performance evaluation is always so iffy, it's always just like, yeah, it works on some things, and we don't trust it for others. [01:08:41]Jeremy: Yeah, so that's why we're back to fine tuning. So Microsoft did create a 5.1.5 web, but they didn't release it, unfortunately. I would say a 5.1.5 web with fine tuning for your task, you know, might quite, you know, might solve a lot of tasks that people have in their kind of day-to-day lives. You know, particularly in kind of an enterprise setting, I think there's a lot of like repetitive kind of processing that has to be done. It's a useful thing for coders to know about, because I think quite often you can like replace some thousands and thousands of lines of complex buggy code, maybe with a fine tune, you know. [01:09:24]Swyx: Got it. Yeah. [01:09:27]Alessio: And Jeremy, before we let you go, I think one question on top of a lot of people's minds. So you've done practical deep learning for coders in 2018, 19, 21, 22. I feel like the more time goes by, the more the GPUs get concentrated. If you're somebody who's interested in deep learning today and you don't want to go join OpenAI, you don't want to join Anthropic, what's like the best use of their time? Should they focus on, yeah, small model development? Should they focus on fine tuning math and all of that? Should they just like focus on making Ragnar a hack and coming up with a better solution? Yeah, what's a practical deep learning for coders 2024 kind of look like? [01:10:10]Jeremy: Yeah. [01:10:11]Swyx: I mean, good question. [01:10:12]Jeremy: I'm trying to figure that out for myself. You know, like what should I teach? Because I definitely feel like things have changed a bit. You know, one of the ways in which things have changed is that coding is much more accessible now. So if you look at a lot of the folks in the kind of open source LLM community, they're folks who really hadn't coded before a year ago. And they're using these models to help them build stuff they couldn't build before, which is just fantastic, you know? So one thing I kind of think is like, okay, well, we need a lot more material to help these people use this newfound skill they have because they don't really know what they're doing, you know, and they don't claim to, but they're doing it anyway. And I think that's fantastic, you know? So like, are there things we could do to help people, [01:10:58]Swyx: you know, bridge this gap? [01:11:00]Jeremy: Because previously, you know, I know folks who were, you know, doing manual jobs a year ago, and now they're training language models thanks to the help of Codex and Copilot and whatever. So, you know, yeah, what does it look like to like really grab this opportunity? You know, maybe Fast.ai's goals can be dramatically expanded now to being like, let's make coding more accessible, you know, kind of AI-oriented coding more accessible. If so, our course should probably look very different, you know, and we'd have to throw away that like, oh, you have to have at least a year of full-time programming, you know, as a prerequisite. Yeah, what would happen if we got rid of that? So that's kind of one thought that's in my head. You know, as to what should other people do? Honestly, I don't think anybody has any idea, like, the more I look at it, what's going on. I know I don't, you know, like, we don't really know how to do anything very well. Clearly OpenAI do, like, they seem to be quite good at some things, or they're talking to folks at, or who have recently left OpenAI. [01:12:17]Swyx: Even there, it's clear there's a lot of stuff [01:12:19]Jeremy: they haven't really figured out, and they're just kind of like using recipes that they've noticed have been okay, so, yeah, we don't really know how to train these models well, we don't know how to fine-tune them well, we don't know how to do React well, we don't know what they can do, we don't know what they can't do, we don't know how big a model you need to solve different kinds of problems, we don't know what kind of problems they can't do, we don't know what good prompting strategies are for particular problems, you know. Like, somebody sent me a message the other day saying they've written something that is a prompting strategy for GPT-4, for GPT-4, they've written, like, 6,000 lines of Python code, and it's to help it play chess. And then they've said they've had it play against other chess engines, including the best Stockfish engines, and it's got an ELO of 3,400, [01:13:11]Swyx: which would make it close to [01:13:13]Jeremy: the best chess engine in existence. And I think this is a good example of, like, people were saying, like, GPT-4 can't play chess. I mean, I was sure that was wrong. I mean, obviously, it can play chess. But the difference between, like, with no prompting strategy, it can't even make legal moves, with good prompting strategies, it might be just about the best chess engine in the world, far better than any human player. So, yeah, I mean, we don't really know what the capabilities are yet. So I feel like it's all blue sky at this point. It feels like computer vision in 2013 to me, which was, like, in 2013, computer vision was, like, OK, OK. [01:13:51]Swyx: We just had the AlexNet. [01:13:52]Jeremy: We've had AlexNet. We've had VGGNet. It's around the time Zyler and Fergus, like, no, it's probably before that. So we hadn't yet had the Zyler and Fergus, like, oh, this is actually what's going on inside the layers. So, you know, we don't actually know what's happening inside these transformers. We don't know how to create good training dynamics. We don't really know anything much. And there's a reason for that, right? And the reason for that is language models suddenly got really useful. And so the kind of economically rational thing to do, like, this is not criticism. This is true. The economic rational thing to do is to, like, OK, like, build that as fast as possible. You know, make something work, get it out there. And that's what, you know, OpenAI in particular did and Anthropic kind of did. But there's a whole lot of technical debt everywhere. You know, nobody's really figured this stuff out because everybody's been so busy [01:14:53]Swyx: building what we know works as quickly as possible. [01:14:57]Jeremy: So, yeah, I think there's a huge amount of opportunity to, you know, I think we'll find things can be made to work a lot faster, a lot less memory. I got a whole bunch of ideas I want to try, you know, every time I look at something closely, like really closely, I'm always like, oh, it turns out this person actually had no idea what they're doing, you know, [01:15:21]Swyx: which is fine. [01:15:23]Jeremy: Like, none of us know what we're doing. We should experiment with that. As we had a trade out on the podcast [01:15:32]Alessio: who created FlashAttention. Yeah. And I asked him, did nobody think of using SRAM before you? Like, were people just like, no. And he was like, yeah, people just didn't think of it. They didn't try. They didn't come from like a systems background. [01:15:48]Swyx: Yeah. [01:15:48]Jeremy: I mean, the thing about FlashAttention is, I mean, lots of people absolutely had thought of that. So had I, right? But I mean, the honest truth is, particularly before Triton, like everybody knew that tiling is the right way to solve anything. And everybody knew that attention, fused attention wasn't tiled. That was stupid. But not everybody's got his ability to like, be like, oh, well, I am confident enough in CUDA and or Triton to use that insight to write something better, you know? And this is where, like, I'm super excited about Mojo, right? And I always talk to Chris about FlashAttention because I'm like, you know, there is a thousand FlashAttentions out there for us to build. You just got to make it easy for us to build them. Like Triton definitely helps, but it's still not easy. You know, it still requires kind of really understanding the GPU architecture and writing it in that kind of very CUDA-ish way. So yeah, I think, you know, if Mojo or something equivalent can really work well, we're going to see a lot more FlashAttentions popping up. [01:17:06]Swyx: Great, Jerry. [01:17:08]Alessio: And before we wrap, we usually do a quick lightning round. [01:17:10]Swyx: We're going to have three simple questions. [01:17:13]Alessio: So the first one is around acceleration. And you've been in this field a long time. What's something that it's already here today in AI that you thought would take much longer? I don't think anything. [01:17:24]Jeremy: So I've actually been slightly too bullish. So in my 2014 TED talk, I had a graph and I said, like, this is like the slope of human capabilities and this is the slope of AI capabilities. And I said, oh, and I put a dot saying we are here. It was just before they passed. And I looked back at the transcript the other day and I said, in five years, I think we'll, you know, we might have crossed that threshold in which computers will be better at most human tasks than most humans or most average humans. And so that might be almost true now for non-physical tasks. So I was like, took, you know, took that twice as long as I thought it might. [01:18:11]Jeremy: Yeah, no, I wouldn't say anything surprised me too much. It's still like, definitely like, I got to admit, you know, I had a very visceral reaction using GPT-4 for the first time. Not because I found it surprising, but actually doing it, like something I was pretty sure would exist by about now, maybe a bit earlier. But actually using it definitely is different to just feeling like it's probably on its way, you know, and yeah, whatever GPT-5 looks like. I'm sure, I imagine I'll have the same visceral reaction, you know. [01:18:56]Swyx: It's really amazing to watch develop. We also have an exploration question. So what do you think is the most interesting unsolved question in AI? [01:19:07]Jeremy: How do language models learn? You know, what are the training dynamics? Like I want to see, there was a great paper about ResNets a few years ago that showed how, that was able to like plot a kind of projected three-dimensional loss surface for a ConvNet with and without skip connections. And you know, you could very clearly see without the skip connections, it was bumpy, and with the skip connections, it was super smooth. That's the kind of work we need. Like, so there was actually an interesting blog post that came out just today from the PyTorch team where some of them have created this like 3D matrix product visualization thing. [01:19:56]Swyx: The MatMul Visualizer. [01:19:58]Jeremy: Yeah, and they actually showed some nice examples of like a GPT-2 attention layer and like showed an animation and said, like, if you look at this, we can actually see a bit about what it's doing. You know, so again, it reminds me of the Zeiler and Fergus, you know, ConvNet paper that was the first one to do these reverse convolutions to show what's actually being learned in each layer in a ConvNet. Yeah, we need a lot more of this, like, what is going on inside these models? How do they actually learn? And then how can we use those insights to help them to learn better? So I think that would be one. The other exploration I'd really like to see is a much more rigorous analysis of what kind of data do they need at what level? And when do they need it? And how often? So that kind of like dataset mixing, curation, so forth. [01:20:52]Swyx: Right. In order to get the best capabilities. Yeah. How much is Wikipedia? Yeah. [01:20:58]Jeremy: Yeah. [01:20:59]Swyx: Very uncertain. [01:20:59]Jeremy: Fine-tune what, you know, what kind of mix do you need for it to keep its capabilities? And what are the kind of underlying capabilities that it most needs to keep? And if it loses those, it would lose all these other ones. And what data do you need to keep those? And, you know, other things we can do to change the loss function, to help it to not forget to do things, stuff like that. [01:21:20]Swyx: Awesome. [01:21:21]Alessio: And yeah, before wrapping, what's one message, one idea you want everyone to remember and think about? [01:21:27]Jeremy: You know, I guess the main thing I want everybody to remember is that, you know, there's a lot of people in the world. And they have a lot of, you know, diverse experiences and capabilities. And they all matter. And now that we have a, you know, newly powerful technology in our lives, we could think of that one of two ways. One would be, gee, that's really scary. What would happen if all of these people in the world had access to this technology? Some of them might be bad people. Let's make sure they can't have it. Or one might be, wow, of all those people in the world, I bet a lot of them could really improve the lives of a lot of humanity if they had this tool. This has always been the case, you know, from the invention of writing, to the invention of the printing press, to the, you know, development of education. And it's been a constant battle between people who think that the distributed power is unsafe and it should be held on to by an elite few. And people who think that humanity on net, you know, is a marvelous species, particularly when part of a society and a civilization. And we should do everything we can to enable more of them to contribute. This is a really big conversation right now. And, you know, I want to see more and more people showing up and showing what, you know, what the great unwashed masses out there can actually achieve. You know, that actually, you know, regular people are going to do a lot of really valuable work and actually help us be, you know, more safe and also flourishing in our lives and providing a future for our children to flourish in. You know, if we lock things down to the people that we think, you know, the elites that we think can be trusted to run it for us, yeah, I think all bets are off about where that leaves us as a society, you know. [01:24:00]Alessio: Yep. Now that's an important message. And yeah, that's why we've been promoting a lot of open source developers, open source communities, I think, letting the builders build and explore. That's always a good idea. Thank you so much for coming on, Jeremy. This was great. [01:24:20]Jeremy: Thank you for having me. [01:24:22] Get full access to Latent Space at www.latent.space/subscribe
01:09:1519/10/2023
Why AI Agents Don't Work (yet) - with Kanjun Qiu of Imbue
Thanks to the over 11,000 people who joined us for the first AI Engineer Summit! A full recap is coming, but you can 1) catch up on the fun and videos on Twitter and YouTube, 2) help us reach 1000 people for the first comprehensive State of AI Engineering survey and 3) submit projects for the new AI Engineer Foundation.See our Community page for upcoming meetups in SF, Paris, NYC, and Singapore. This episode had good interest on Twitter.Last month, Imbue was crowned as AI’s newest unicorn foundation model lab, raising a $200m Series B at a >$1 billion valuation. As “stealth” foundation model companies go, Imbue (f.k.a. Generally Intelligent) has stood as an enigmatic group given they have no publicly released models to try out. However, ever since their $20m Series A last year their goal has been to “develop generally capable AI agents with human-like intelligence in order to solve problems in the real world”.From RL to Reasoning LLMsAlong with their Series A, they announced Avalon, “A Benchmark for RL Generalization Using Procedurally Generated Worlds”. Avalon is built on top of the open source Godot game engine, and is ~100x faster than Minecraft to enable fast RL benchmarking and a clear reward with adjustable game difficulty.After a while, they realized that pure RL isn’t a good path to teach reasoning and planning. The agents were able to learn mechanical things like opening complex doors, climbing, but couldn’t go to higher level tasks. A pure RL world also doesn’t include a language explanation of the agent reasoning, which made it hard to understand why it made certain decisions. That pushed the team more towards the “models for reasoning” path:“The second thing we learned is that pure reinforcement learning is not a good vehicle for planning and reasoning. So these agents were able to learn all sorts of crazy things: They could learn to climb like hand over hand in VR climbing, they could learn to open doors like very complicated, like multiple switches and a lever open the door, but they couldn't do any higher level things. And they couldn't do those lower level things consistently necessarily. And as a user, I do not want to interact with a pure reinforcement learning end to end RL agent. As a user, like I need much more control over what that agent is doing.”Inspired by Chelsea Finn’s work on SayCan at Stanford, the team pivoted to have their agents do the reasoning in natural language instead. This development parallels the large leaps in reasoning that humans have developed as the scientific method:“We are better at reasoning now than we were 3000 years ago. An example of a reasoning strategy is noticing you're confused. Then when I notice I'm confused, I should ask:* What was the original claim that was made? * What evidence is there for this claim? * Does the evidence support the claim? * Is the claim correct? This is like a reasoning strategy that was developed in like the 1600s, you know, with like the advent of science. So that's an example of a reasoning strategy. There are tons of them. We employ all the time, lots of heuristics that help us be better at reasoning. And we can generate data that's much more specific to them.“The Full Stack Model LabOne year later, it would seem that the pivot to reasoning has had tremendous success, and Imbue has now reached a >$1B valuation, with participation from Astera Institute, NVIDIA, Cruise CEO Kyle Vogt, Notion co-founder Simon Last, and others. Imbue tackles their work with a “full stack” approach:* Models. Pretraining very large (>100B parameter) models, optimized to perform well on internal reasoning benchmarks, with a ~10,000 Nvidia H100 GPU cluster lets us iterate rapidly on everything from training data to architecture and reasoning mechanisms.* Tools and Agents. Building internal productivity tools from coding agents for fixing type checking and linting errors, to sophisticated systems like CARBS (for hyperparameter tuning and network architecture search).* Interface Invention. Solving agent trust and collaboration (not merely communication) with humans by creating better abstractions and interfaces — IDEs for users to program computers in natural language.* Theory. Publishing research about the theoretical underpinnings of self-supervised learning, as well as scaling laws for machine learning research.Kanjun believes we are still in the “bare metal phase” of agent development, and they want to take a holistic approach to building the “operating system for agents”. We loved diving deep into the Imbue approach toward solving the AI Holy Grail of reliable agents, and are excited to share our conversation with you today!Timestamps* [00:00:00] Introductions* [00:06:07] The origin story of Imbue* [00:09:39] Imbue's approach to training large foundation models optimized for reasoning* [00:12:18] Imbue's goals to build an "operating system" for reliable, inspectable AI agents* [00:15:37] Imbue's process of developing internal tools and interfaces to collaborate with AI agents* [00:17:27] Imbue's focus on improving reasoning capabilities in models, using code and other data* [00:19:50] The value of using both public benchmarks and internal metrics to evaluate progress* [00:21:43] Lessons learned from developing the Avalon research environment* [00:23:31] The limitations of pure reinforcement learning for general intelligence* [00:28:36] Imbue's vision for building better abstractions and interfaces for reliable agents* [00:31:36] Interface design for collaborating with, rather than just communicating with, AI agents* [00:37:40] The future potential of an agent-to-agent protocol* [00:39:29] Leveraging approaches like critiquing between models and chain of thought* [00:45:49] Kanjun's philosophy on enabling team members as creative agents at Imbue* [00:53:51] Kanjun's experience co-founding the communal co-living space The Archive* [01:00:22] Lightning RoundShow Notes* Imbue* Avalon* CARBS (hyperparameter optimizer)* Series B announcement* Kanjun/Imbue’s Podcast* MIT Media Lab* Research mentioned:* Momentum Contrast* SimClr* Chelsea Finn - SayCan* Agent Protocol - part of the AI Engineer Foundation* Xerox PARC* Michael Nielsen* Jason Benn* Outset Capital* Scenius - Kevin Kelly* South Park Commons* The Archive* Thursday Nights in AITranscriptAlessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, Partner and CTO at Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai. [00:00:19]Swyx: Hey, and today in the studio we have Kanjun from Imbue. Welcome. So you and I have, I guess, crossed paths a number of times. You're formerly named Generally Intelligent and you've just announced your rename, rebrand in huge, humongous ways. So congrats on all of that. And we're here to dive in into deeper detail on Imbue. We like to introduce you on a high level basis, but then have you go into a little bit more of your personal side. So you graduated your BS at MIT and you also spent some time at the MIT Media Lab, one of the most famous, I guess, computer hacking labs in the world. Then you graduated MIT and you went straight into BizOps at Dropbox, where you're eventually chief of staff, which is a pretty interesting role we can dive into later. And then it seems like the founder bug hit you. You were basically a three times founder at Ember, Sorceress, and now at Generally Intelligent slash Imbue. What should people know about you on the personal side that's not on your LinkedIn? That's something you're very passionate about outside of work. [00:01:12]Kanjun: Yeah. I think if you ask any of my friends, they would tell you that I'm obsessed with agency, like human agency and human potential. [00:01:19]Swyx: That's work. Come on.Kanjun: It's not work. What are you talking about?Swyx: So what's an example of human agency that you try to promote? [00:01:27]Kanjun: With all of my friends, I have a lot of conversations with them that's kind of helping figure out what's blocking them. I guess I do this with a team kind of automatically too. And I think about it for myself often, like building systems. I have a lot of systems to help myself be more effective. At Dropbox, I used to give this onboarding talk called How to Be Effective, which people liked. I think like a thousand people heard this onboarding talk, and I think maybe Dropbox was more effective. I think I just really believe that as humans, we can be a lot more than we are. And it's what drives everything. I guess completely outside of work, I do dance. I do partner dance. [00:02:03]Swyx: Yeah. Lots of interest in that stuff, especially in the sort of group living houses in San Francisco, which I've been a little bit part of, and you've also run one of those. [00:02:12]Kanjun: That's right. Yeah. I started the archive with two friends, with Josh, my co-founder, and a couple of other folks in 2015. That's right. And GPT-3, our housemates built. [00:02:22]Swyx: Was that the, I guess, the precursor to Generally Intelligent, that you started doing more things with Josh? Is that how that relationship started? Yeah. [00:02:30]Kanjun: This is our third company together. Our first company, Josh poached me from Dropbox for Ember. And there we built a really interesting technology, laser raster projector, VR headset. And then we were like, VR is not the thing we're most passionate about. And actually it was kind of early days when we both realized we really do believe that in our lifetimes, like computers that are intelligent are going to be able to allow us to do much more than we can do today as people and be much more as people than we can be today. And at that time, we actually, after Ember, we were like, work on AI research or start an AI lab. A bunch of our housemates were joining OpenAI, and we actually decided to do something more pragmatic to apply AI to recruiting and to try to understand like, okay, if we are actually trying to deploy these systems in the real world, what's required? And that was Sorceress. That taught us so much about maybe an AI agent in a lot of ways, like what does it actually take to make a product that people can trust and rely on? I think we never really fully got there. And it's taught me a lot about what's required. And it's kind of like, I think informed some of our approach and some of the way that we think about how these systems will actually get used by people in the real world. [00:03:42]Swyx: Just to go one step deeper on that, you're building AI agents in 2016 before it was cool. You got some muscle and you raised $30 million. Something was working. What do you think you succeeded in doing and then what did you try to do that did not pan out? [00:03:56]Kanjun: Yeah. So the product worked quite well. So Sorceress was an AI system that basically looked for candidates that could be a good fit and then helped you reach out to them. And this was a little bit early. We didn't have language models to help you reach out. So we actually had a team of writers that like, you know, customized emails and we automated a lot of the customization. But the product was pretty magical. Like candidates would just be interested and land in your inbox and then you can talk to them. As a hiring manager, that's such a good experience. I think there were a lot of learnings, both on the product and market side. On the market side, recruiting is a market that is endogenously high churn, which means because people start hiring and then we hire the role for them and they stop hiring. So the more we succeed, the more they... [00:04:39]Swyx: It's like the whole dating business. [00:04:40]Kanjun: It's the dating business. Exactly. Exactly. And I think that's the same problem as the dating business. And I was really passionate about like, can we help people find work that is more exciting for them? A lot of people are not excited about their jobs and a lot of companies are doing exciting things and the matching could be a lot better. But the dating business phenomenon like put a damper on that, like it's actually a pretty good business. But as with any business with like relatively high churn, the bigger it gets, the more revenue we have, the slower growth becomes because if 30% of that revenue you lose year over year, then it becomes a worse business. So that was the dynamic we noticed quite early on after our Series A. I think the other really interesting thing about it is we realized what was required for people to trust that these candidates were like well vetted and had been selected for a reason. And it's what actually led us, you know, a lot of what we do at Imbue is working on interfaces to figure out how do we get to a situation where when you're building and using agents, these agents are trustworthy to the end user. That's actually one of the biggest issues with agents that, you know, go off and do longer range goals is that I have to trust, like, did they actually think through this situation? And that really informed a lot of our work today. [00:05:52]Alessio: Let's jump into GI now, Imbue. When did you decide recruiting was done for you and you were ready for the next challenge? And how did you pick the agent space? I feel like in 2021, it wasn't as mainstream. Yeah. [00:06:07]Kanjun: So the LinkedIn says that it started in 2021, but actually we started thinking very seriously about it in early 2020, late 2019, early 2020. So what we were seeing is that scale is starting to work and language models probably will actually get to a point where like with hacks, they're actually going to be quite powerful. And it was hard to see that at the time, actually, because GPT-3, the early versions of it, there are all sorts of issues. We're like, oh, that's not that useful, but we could kind of see like, okay, you keep improving it in all of these different ways and it'll get better. What Josh and I were really interested in is how can we get computers that help us do bigger things? Like, you know, there's this kind of future where I think a lot about, you know, if I were born in 1900 as a woman, like my life would not be that fun. I'd spend most of my time like carrying water and literally like getting wood to put in the stove to cook food and like cleaning and scrubbing the dishes and, you know, getting food every day because there's no refrigerator, like all of these things, very physical labor. And what's happened over the last 150 years since the industrial revolution is we've kind of gotten free energy, like energy is way more free than it was 150 years ago. And so as a result, we've built all these technologies like the stove and the dishwasher and the refrigerator, and we have electricity and we have infrastructure, running water, all of these things that have totally freed me up to do what I can do now. And I think the same thing is true for intellectual energy. We don't really see it today, but because we're so in it, but our computers have to be micromanaged. You know, part of why people are like, oh, you're stuck to your screen all day. Well, we're stuck to our screen all day because literally nothing happens unless I'm doing something in front of my screen. I don't, you know, I can't send my computer off to do a bunch of stuff for me. And there is a future where that's not the case, where, you know, I can actually go off and do stuff and trust that my computer will pay my bills and figure out my travel plans and do the detailed work that I am not that excited to do so that I can like be much more creative and able to do things that I as a human, I'm very excited about and collaborate with other people. And there are things that people are uniquely suited for. So that's kind of always been the thing that has been really exciting to me. Like Josh and I have known for a long time, I think that, you know, whatever AI is, it would happen in our lifetimes. And the personal computer kind of started giving us a bit of free intellectual energy. And this is like really the explosion of free intellectual energy. So in early 2020, we were thinking about this and what happened was self-supervised learning basically started working across everything. So worked in language, SimClear came out, I think MoCo had come out, Momentum Contrast had come out earlier in 2019, SimClear came out in early 2020. And we're like, okay, for the first time, self-supervised learning is working really well across images and text and suspect that like, okay, actually it's the case that machines can learn things the way that humans do. And if that's true, if they can learn things in a fully self-supervised way, because like as people, we are not supervised. We like go Google things and try to figure things out. So if that's true, then like what the computer could be is much bigger than what it is today. And so we started exploring ideas around like, how do we actually go? We didn't think about the fact that we could actually just build a research lab. So we were like, okay, what kind of startup could we build to like leverage self-supervised learning? So that eventually becomes something that allows computers to become much more able to do bigger things for us. But that became General Intelligence, which started as a research lab. [00:09:39]Alessio: So your mission is you aim to rekindle the dream of the personal computer. So when did it go wrong and what are like your first products and user facing things that you're building to rekindle it? [00:09:53]Kanjun: Yeah. So what we do at Imbue is we train large foundation models optimized for reasoning. And the reason for that is because reasoning is actually, we believe the biggest blocker to agents or systems that can do these larger goals. If we think about something that writes an essay, like when we write an essay, we like write it. We put it and then we're done. We like write it and then we look at it and we're like, oh, I need to do more research on that area. I'm going to go do some research and figure it out and come back and, oh, actually it's not quite right. The structure of the outline. So I'm going to rearrange the outline, rewrite it. It's this very iterative process and it requires thinking through like, okay, what am I trying to do? Is the goal correct? Also like, has the goal changed as I've learned more? So as a tool, like when should I ask the user questions? I shouldn't ask them questions all the time, but I should ask them questions in higher risk situations. How certain am I about the like flight I'm about to book? There are all of these notions of like risk certainty, playing out scenarios, figuring out how to make a plan that makes sense, how to change the plan, what the goal should be. That are things that we lump under the bucket of reasoning and models today, they're not optimized for reasoning. It turns out that there's not actually that much explicit reasoning data on the internet as you would expect. And so we get a lot of mileage out of optimizing our models for reasoning in pre-training. And then on top of that, we build agents ourselves and we, I can get into, we really believe in serious use, like really seriously using the systems and trying to get to an agent that we can use every single day, tons of agents that we can use every single day. And then we experiment with interfaces that help us better interact with the agents. So those are some set of things that we do on the kind of model training and agent side. And then the initial agents that we build, a lot of them are trying to help us write code better because code is most of what we do every day. And then on the infrastructure and theory side, we actually do a fair amount of theory work to understand like, how do these systems learn? And then also like, what are the right abstractions for us to build good agents with, which we can get more into. And if you look at our website, we build a lot of tools internally. We have a like really nice automated hyperparameter optimizer. We have a lot of really nice infrastructure and it's all part of the belief of like, okay, let's try to make it so that the humans are doing the things humans are good at as much as possible. So out of our very small team, we get a lot of leverage. [00:12:18]Swyx: And so would you still categorize yourself as a research lab now, or are you now in startup mode? Is that a transition that is conscious at all? [00:12:26]Kanjun: That's a really interesting question. I think we've always intended to build, you know, to try to build the next version of the computer, enable the next version of the computer. The way I think about it is there's a right time to bring a technology to market. So Apple does this really well. Actually, iPhone was under development for 10 years, AirPods for five years. And Apple has a story where iPhone, the first multi-touch screen was created. They actually were like, oh wow, this is cool. Let's like productionize iPhone. They actually brought, they like did some work trying to productionize it and realized this is not good enough. And they put it back into research to try to figure out like, how do we make it better? What are the interface pieces that are needed? And then they brought it back into production. So I think of production and research as kind of like these two separate phases. And internally we have that concept as well, where like things need to be done in order to get to something that's usable. And then when it's usable, like eventually we figure out how to productize it. [00:13:20]Alessio: What's the culture like to make that happen, to have both like kind of like product oriented, research oriented. And as you think about building the team, I mean, you just raised 200 million. I'm sure you want to hire more people. What are like the right archetypes of people that work at Imbue? [00:13:35]Kanjun: I would say we have a very unique culture in a lot of ways. I think a lot about social process design. So how do you design social processes that enable people to be effective? I like to think about team members as creative agents, because most companies, they think of their people as assets and they're very proud of this. And I think about like, okay, what is an asset? It's something you own that provides you value that you can discard at any time. This is a very low bar for people. This is not what people are. And so we try to enable everyone to be a creative agent and to really unlock their superpowers. So a lot of the work I do, you know, I was mentioning earlier, I'm like obsessed with agency. A lot of the work I do with team members is try to figure out like, you know, what are you really good at? What really gives you energy and where can we put you such that, how can I help you unlock that and grow that? So much of our work, you know, in terms of team structure, like much of our work actually comes from people. Carbs, our hyperparameter optimizer came from Abe trying to automate his own research process doing hyperparameter optimization. And he actually pulled some ideas from plasma physics. He's a plasma physicist to make the local search work. A lot of our work on evaluations comes from a couple of members of our team who are like obsessed with evaluations. We do a lot of work trying to figure out like, how do you actually evaluate if the model is getting better? Is the model making better agents? Is the agent actually reliable? A lot of things kind of like, I think of people as making the like them shaped blob inside imbue and I think, you know, yeah, that's the kind of person that we're, we're hiring for. We're hiring product engineers and data engineers and research engineers and all these roles. We have projects, not teams. We have a project around data, data collection and data engineering. That's actually one of the key things that improve the model performance. We have a pre-training kind of project with some fine tuning as part of that. And then we have an agent's project that's like trying to build on top of our models as well as use other models in the outside world to try to make agents then we actually use as programmers every day. So all sorts of different, different projects. [00:15:37]Swyx: As a founder, you're now sort of a capital allocator among all of these different investments effectively at different projects. And I was interested in how you mentioned that you were optimizing for improving reasoning and specifically inside of your pre-training, which I assume is just a lot of data collection. [00:15:55]Kanjun: We are optimizing reasoning inside of our pre-trained models. And a lot of that is about data. And I can talk more about like what, you know, what exactly does it involve? But actually big, maybe 50% plus of the work is figuring out even if you do have models that reason well, like the models are still stochastic. The way you prompt them still makes, is kind of random, like makes them do random things. And so how do we get to something that is actually robust and reliable as a user? How can I, as a user, trust it? We have all sorts of cool things on the, like, you know, I was mentioning earlier when I talked to other people building agents, they have to do so much work, like to try to get to something that they can actually productize and it takes a long time and agents haven't been productized yet for, partly for this reason is that like the abstractions are very leaky. We can get like 80% of the way there, but like self-driving cars, like the remaining 20% is actually really difficult. We believe that, and we have internally, I think some things that like an interface, for example, that lets me really easily like see what the agent execution is, fork it, try out different things, modify the prompt, modify like the plan that it is making. This type of interface, it makes it so that I feel more like I'm collaborating with the agent as it's executing, as opposed to it's just like doing something as a black box. That's an example of a type of thing that's like beyond just the model pre-training, but on the model pre-training side, like reasoning is a thing that we optimize for. And a lot of that is about what data do we put in. [00:17:27]Swyx: It's interesting just because I always think like, you know, out of the levers that you have, the resources that you have, I think a lot of people think that running foundation model company or a research lab is going to be primarily compute. And I think the share of compute has gone down a lot over the past three years. It used to be the main story, like the main way you scale is you just throw more compute at it. And now it's like, Flops is not all you need. You need better data, you need better algorithms. And I wonder where that shift has gone. This is a very vague question, but is it like 30-30-30 now? Is it like maybe even higher? So one way I'll put this is people estimate that Llama2 maybe took about three to $4 million of compute, but probably 20 to $25 million worth of labeling data. And I'm like, okay, well that's a very different story than all these other foundation model labs raising hundreds of millions of dollars and spending it on GPUs. [00:18:20]Kanjun: Data is really expensive. We generate a lot of data. And so that does help. The generated data is close to actually good, as good as human labeled data. [00:18:34]Swyx: So generated data from other models? [00:18:36]Kanjun: From our own models. From your own models. Or other models, yeah. [00:18:39]Swyx: Do you feel like there's certain variations of this? There's the sort of the constitutional AI approach from Anthropic and basically models sampling training on data from other models. I feel like there's a little bit of like contamination in there, or to put it in a statistical form, you're resampling a distribution that you already have that you already know doesn't match human distributions. How do you feel about that basically, just philosophically? [00:19:04]Kanjun: So when we're optimizing models for reasoning, we are actually trying to like make a part of the distribution really spiky. So in a sense, like that's actually what we want. We want to, because the internet is a sample of the human distribution that's also skewed in all sorts of ways. That is not the data that we necessarily want these models to be trained on. And so when we're generating data, we're not really randomly generating data. We generate very specific things that are like reasoning traces and that help optimize reasoning. Code also is a big piece of improving reasoning. So generated code is not that much worse than like regular human written code. You might even say it can be better in a lot of ways. So yeah. So we are trying to already do that. [00:19:50]Alessio: What are some of the tools that you thought were not a good fit? So you built Avalon, which is your own simulated world. And when you first started, the metagame was like using games to simulate things using, you know, Minecraft and then OpenAI is like the gym thing and all these things. And I think in one of your other podcasts, you mentioned like Minecraft is like way too slow to actually do any serious work. Is that true? Yeah. I didn't say it. [00:20:17]Swyx: I don't know. [00:20:18]Alessio: That's above my pay grade. But Avalon is like a hundred times faster than Minecraft for simulation. When did you figure that out that you needed to just like build your own thing? Was it kind of like your engineering team was like, Hey, this is too slow. Was it more a long-term investment? [00:20:34]Kanjun: Yeah. At that time we built Avalon as a research environment to help us learn particular things. And one thing we were trying to learn is like, how do you get an agent that is able to do many different tasks? Like RL agents at that time and environments at that time. What we heard from other RL researchers was the like biggest thing keeping holding the field back is lack of benchmarks that let us explore things like planning and curiosity and things like that and have the agent actually perform better if the agent has curiosity. And so we were trying to figure out in a situation where, how can we have agents that are able to handle lots of different types of tasks without the reward being pretty handcrafted? That's a lot of what we had seen is that like these very handcrafted rewards. And so Avalon has like a single reward it's across all tasks. And it also allowed us to create a curriculum so we could make the level more or less difficult. And it taught us a lot, maybe two primary things. One is with no curriculum, RL algorithms don't work at all. So that's actually really interesting. [00:21:43]Swyx: For the non RL specialists, what is a curriculum in your terminology? [00:21:46]Kanjun: So a curriculum in this particular case is basically the environment Avalon lets us generate simpler environments and harder environments for a given tasks. What's interesting is that the simpler environments, what you'd expect is the agent succeeds more often. So it gets more reward. And so, you know, kind of my intuitive way of thinking about it is, okay, the reason why it learns much faster with a curriculum is it's just getting a lot more signal. And that's actually an interesting general intuition to have about training these things as like, what kind of signal are they getting? And like, how can you help it get a lot more signal? The second thing we learned is that reinforcement learning is not a good vehicle, like pure reinforcement learning is not a good vehicle for planning and reasoning. So these agents were not able to, they were able to learn all sorts of crazy things. They could learn to climb like hand over hand in VR climbing, they could learn to open doors like very complicated, like multiple switches and a lever open the door, but they couldn't do any higher level things. And they couldn't do those lower level things consistently necessarily. And as a user, I do not want to interact with a pure reinforcement learning end to end RL agent. As a user, like I need much more control over what that agent is doing. And so that actually started to get us on the track of thinking about, okay, how do we do the reasoning part in language? And we were pretty inspired by our friend Chelsea Finn at Stanford was I think working on SACAN at the time where it's basically an experiment where they have robots kind of trying to do different tasks and actually do the reasoning for the robot in natural language. And it worked quite well. And that led us to start experimenting very seriously with reasoning. [00:23:31]Alessio: How important is the language part for the agent versus for you to inspect the agent? You know, like is it the interface to kind of the human on the loop really important or? [00:23:43]Kanjun: Yeah, I personally think of it as it's much more important for us, the human user. So I think you probably could get end to end agents that work and are fairly general at some point in the future. But I think you don't want that. Like we actually want agents that we can like perturb while they're trying to figure out what to do. Because, you know, even a very simple example, internally we have like a type error fixing agent and we have like a test generation agent. Test generation agent goes off rails all the time. I want to know, like, why did it generate this particular test? [00:24:19]Swyx: What was it thinking? [00:24:20]Kanjun: Did it consider, you know, the fact that this is calling out to this other function? And the formatter agent, if it ever comes up with anything weird, I want to be able to debug like what happened with RL end to end stuff. Like we couldn't do that. Yeah. [00:24:36]Swyx: It sounds like you have a bunch of agents operating internally within the company. What's your most, I guess, successful agent and what's your least successful one? [00:24:44]Kanjun: The agents don't work. All of them? I think the only successful agents are the ones that do really small things. So very specific, small things like fix the color of this button on the website or like change the color of this button. [00:24:57]Swyx: Which is now sweep.dev is doing that. Exactly. [00:25:00]Kanjun: Perfect. Okay. [00:25:02]Swyx: Well, we should just use sweep.dev. Well, I mean, okay. I don't know how often you have to fix the color of a button, right? Because all of them raise money on the idea that they can go further. And my fear when encountering something like that is that there's some kind of unknown asymptote ceiling that's going to prevent them, that they're going to run head on into that you've already run into. [00:25:21]Kanjun: We've definitely run into such a ceiling. But what is the ceiling? [00:25:24]Swyx: Is there a name for it? Like what? [00:25:26]Kanjun: I mean, for us, we think of it as reasoning plus these tools. So reasoning plus abstractions, basically. I think actually you can get really far with current models and that's why it's so compelling. Like we can pile debugging tools on top of these current models, have them critique each other and critique themselves and do all of these, like spend more computer inference time, context hack, retrieve augmented generation, et cetera, et cetera, et cetera. Like the pile of hacks actually does get us really far. And a way to think about it is like the underlying language model is kind of like a noisy channel. Actually I don't want to use this analogy. It's actually a really bad analogy, but you kind of like trying to get more signal out of the channel. We don't like to think about it that way. It's what the default approach is, is like trying to get more signal out of this noising channel. But the issue with agents is as a user, I want it to be mostly reliable. It's kind of like self-driving in that way. Like it's not as bad as self-driving, like in self-driving, you know, you're like hurtling at 70 miles an hour. It's like the hardest agent problem. But one thing we learned from Sorceress and one thing we learned by using these things internally is we actually have a pretty high bar for these agents to work. You know, it's actually really annoying if they only work 50% of the time and we can make interfaces to make it slightly less annoying. But yeah, there's a ceiling that we've encountered so far and we need to make the models better. We also need to make the kind of like interface to the user better. And also a lot of the like critiquing. I hope what we can do is help people who are building agents actually like be able to deploy them. I think, you know, that's the gap that we see a lot of today is everyone who's trying to build agents to get to the point where it's robust enough to be deployable. It just, it's like an unknown amount of time. Okay. [00:27:12]Swyx: So this goes back into what Embu is going to offer as a product or a platform. How are you going to actually help people deploy those agents? Yeah. [00:27:21]Kanjun: So our current hypothesis, I don't know if this is actually going to end up being the case. We've built a lot of tools for ourselves internally around like debugging, around abstractions or techniques after the model generation happens. Like after the language model generates the text and like interfaces for the user and the underlying model itself, like models talking to each other, maybe some set of those things kind of like an operating system. Some set of those things will be helpful for other people. And we'll figure out what set of those things is helpful for us to make our agents. Like what we want to do is get to a point where we can like start making an agent, deploy it, it's reliable, like very quickly. And there's a similar analog to software engineering, like in the early days, in the seventies and the sixties, like to program a computer, like you have to go all the way down to the registers and write things and eventually we had assembly. That was like an improvement. But then we wrote programming languages with these higher levels of abstraction and that allowed a lot more people to do this and much faster. And the software created is much less expensive. And I think it's basically a similar route here where we're like in the like bare metal phase of agent building. And we will eventually get to something with much nicer abstractions. [00:28:36]Alessio: We had this conversation with George Hotz and we were like, there's not a lot of reasoning data out there. And can the models really understand? And his take was like, look, with enough compute, you're not that complicated as a human. Like the model can figure out eventually why certain decisions are made. What's been your experience? Like as you think about reasoning data, like do you have to do a lot of like manual work or like is there a way to prompt models to extract the reasoning from actions that they [00:29:03]Swyx: see? [00:29:03]Kanjun: So we don't think of it as, oh, throw enough data at it and then it will figure out what the plan should be. I think we're much more explicit. You know, a way to think about it is as humans, we've learned a lot of reasoning strategies over time. We are better at reasoning now than we were 3000 years ago. An example of a reasoning strategy is noticing you're confused. Then when I notice I'm confused, I should ask like, huh, what was the original claim that was made? What evidence is there for this claim? Does the evidence support the claim? Is the claim correct? This is like a reasoning strategy that was developed in like the 1600s, you know, with like the advent of science. So that's an example of a reasoning strategy. There are tons of them. We employ all the time, lots of heuristics that help us be better at reasoning. And we didn't always have them. And because they're invented, like we can generate data that's much more specific to them. So I think internally, yeah, we have a lot of thoughts on what reasoning is and we generate a lot more specific data. We're not just like, oh, it'll figure out reasoning from this black box or like it'll figure out reasoning from the data that exists. Yeah. [00:30:04]Alessio: I mean, the scientific method is like a good example. If you think about hallucination, right, people are thinking, how do we use these models to do net new, like scientific research? And if you go back in time and the model is like, well, the earth revolves around the sun and people are like, man, this model is crap. It's like, what are you talking about? Like the sun revolves around the earth. It's like, how do you see the future? Like if the models are actually good enough, but we don't believe them, it's like, how do we make the two live together? So you're like, you use Inbu as a scientist to do a lot of your research and Inbu tells you, hey, I think this is like a serious path you should go down. And you're like, no, that sounds impossible. Like how is that trust going to be built? And like, what are some of the tools that maybe are going to be there to inspect it? [00:30:51]Kanjun: Really there are two answers to this. One element of it is as a person, like I need to basically get information out of the model such that I can try to understand what's going on with the model. Then the second question is like, okay, how do you do that? And that's kind of some of our debugging tools, they're not necessarily just for debugging. They're also for like interfacing with and interacting with the model. So like if I go back in this reasoning trace and like change a bunch of things, what's going to happen? Like, what does it conclude instead? So that kind of helps me understand like, what are its assumptions? And, you know, we think of these things as tools. And so it's really about like, as a user, how do I use this tool effectively? I need to be willing to be convinced as well. It's like, how do I use this tool effectively? And what can it help me with? [00:31:36]Swyx: And what can it tell me? There's a lot of mention of code in your process. And I was hoping to dive in even deeper. I think we might run the risk of giving people the impression that you view code or you use code just as like a tool within InView just for coding assistance. But I think you actually train code models. And I think there's a lot of informal understanding about how adding code to language models improves their reasoning capabilities. I wonder if there's any research or findings that you have to share that talks about the intersection of code and reasoning. Hmm. Yeah. [00:32:08]Kanjun: So the way I think about it intuitively is like code is the most explicit example of reasoning data on the internet. [00:32:15]Swyx: Yeah. [00:32:15]Kanjun: And it's not only structured, it's actually very explicit, which is nice. You know, it says this variable means this, and then it uses this variable. And then the function does this. As people, when we talk in language, it takes a lot more to extract that explicit structure out of our language. And so that's one thing that's really nice about code is I see it as almost like a curriculum for reasoning. I think we use code in all sorts of ways. The coding agents are really helpful for us to understand what are the limitations of the agents. The code is really helpful for the reasoning itself. But also code is a way for models to act. So by generating code, it can act on my computer. And, you know, when we talk about rekindling the dream of the personal computer, kind of where I see computers going is, you know, like computers will eventually become these much more malleable things where I, as a user today, I have to know how to write software code, like in order to make my computer do exactly what I want it to do. But in the future, if the computer is able to generate its own code, then I can actually interface with it in natural language. And so one way we think about agents is kind of like a natural language programming language. It's a way to program my computer in natural language that's much more intuitive to me as a user. And these interfaces that we're building are essentially IDEs for users to program our computers in natural language. Maybe I should say what we're doing that way. Maybe it's clearer. [00:33:47]Swyx: I don't know. [00:33:47]Alessio: That's a good pitch. What do you think about the different approaches people have, kind of like text first, browser first, like multi-on? What do you think the best interface will be? Or like, what is your, you know, thinking today? [00:33:59]Kanjun: In a lot of ways, like chat as an interface, I think Linus, Linus Lee, you had on this. I really like how he put it. Chat as an interface is skeuomorphic. So in the early days, when we made word processors on our computers, they had notepad lines because that's what we understood these like objects to be. Chat, like texting someone is something we understand. So texting our AI is something that we understand. But today's word documents don't have notepad lines. And similarly, the way we want to interact with agents, like chat is a very primitive way of interacting with agents. What we want is to be able to inspect their state and to be able to modify them and fork them and all of these other things. And we internally have, think about what are the right representations for that? Like architecturally, like what are the right representations? What kind of abstractions do we need to build? And how do we build abstractions that are not leaky? Because if the abstractions are leaky, which they are today, like, you know, this stochastic generation of text is like a leaky abstraction. I cannot depend on it. And that means it's actually really hard to build on top of. But our experience and belief is actually by building better abstractions and better tooling, we can actually make these things non-leaky. And now you can build like whole things on top of them. So these other interfaces, because of where we are, we don't think that much about them. [00:35:17]Swyx: Yeah. [00:35:17]Alessio: I mean, you mentioned, this is kind of like the Xerox Spark moment for AI. And we had a lot of stuff come out of Parc, like the, what you see is what you got editors and like MVC and all this stuff. But yeah, but then we didn't have the iPhone at Parc. We didn't have all these like higher things. What do you think it's reasonable to expect in like this era of AI, you know, call it like five years or so? Like what are like the things we'll build today and what are things that maybe we'll see in kind of like the second wave of products? [00:35:46]Kanjun: That's interesting. I think the waves will be much faster than before. Like what we're seeing right now is basically like a continuous wave. Let me zoom a little bit earlier. So people like the Xerox Parc analogy I give, but I think there are many different analogies. Like one is the like analog to digital computer is kind of an example, like another analogy to where we are today. The analog computer Vannevar Bush built in the 1930s, I think, and it's like a system of pulleys and it can only calculate one function. Like it can calculate like an integral. And that was so magical at the time because you actually did need to calculate this integral bunch, but it had a bunch of issues like in analog errors compound. And so there was actually a set of breakthroughs necessary in order to get to the digital computer, like Turing's decidability, Shannon. I think the like whole like relay circuits can be thought of as can be mapped to Boolean operators and a set of other like theoretical breakthroughs, which essentially were abstractions. They were like creating abstractions for these like very like lossy circuits. They were creating abstractions for these like very analog circuits and digital had this nice property of like being error correcting. And so when I talk about like less leaky abstractions, that's what I mean. That's what I'm kind of pointing a little bit to. It's not going to look exactly the same way. And then the Xerox PARC piece, a lot of that is about like, how do we get to computers that as a person, I can actually use well. And the interface actually helps it unlock so much more power. So the sets of things we're working on, like the sets of abstractions and the interfaces, like hopefully that like help us unlock a lot more power in these systems. Like hopefully that'll come not too far in the future. I could see a next version, maybe a little bit farther out. It's like an agent protocol. So a way for different agents to talk to each other and call each other. Kind of like HTTP. [00:37:40]Swyx: Do you know it exists already? [00:37:41]Kanjun: Yeah, there is a nonprofit that's working on one. I think it's a bit early, but it's interesting to think about right now. Part of why I think it's early is because the issue with agents, it's not quite like the internet where you could like make a website and the website would appear. The issue with agents is that they don't work. And so it may be a bit early to figure out what the protocol is before we really understand how these agents get constructed. But, you know, I think that's, I think it's a really interesting question. [00:38:09]Swyx: While we're talking on this agent to agent thing, there's been a bit of research recently on some of these approaches. I tend to just call them extremely complicated chain of thoughting, but any perspectives on kind of meta-GPT, I think it's the name of the paper. I don't know if you care about at the level of individual papers coming out, but I did read that recently and TLDR, it beat GPT-4 and human eval by role-playing software agent development agency, instead of having sort of single shot or single role, you have multiple roles and how having all of them criticize each other as agents communicating with other agents. [00:38:45]Kanjun: Yeah, I think this is an example of an interesting abstraction of like, okay, can I just plop in this like multi-role critiquing and see how it improves my agent? And can I just plop in chain of thought, tree of thought, plop in these other things and see how they improve my agent? One issue with this kind of prompting is that it's still not very reliable. It's like, there's one lens, which is like, okay, if you do enough of these techniques, you'll get to high reliability. And I think actually that's a pretty reasonable lens. We take that lens often. And then there's another lens that's like, okay, but it's starting to get really messy what's in the prompt and like, how do we deal with that messiness? And so maybe you need like cleaner ways of thinking about and constructing these systems. And we also take that lens. So yeah, I think both are necessary. Yeah. [00:39:29]Swyx: Side question, because I feel like this also brought up another question I had for you. I noticed that you work a lot with your own benchmarks, your own evaluations of what is valuable. I would say I would contrast your approach with OpenAI as OpenAI tends to just lean on, hey, we played StarCraft or hey, we ran it on the SAT or the, you know, the AP bio test and that did results. Basically, is benchmark culture ruining AI? [00:39:55]Swyx: Or is that actually a good thing? Because everyone knows what an SAT is and that's fine. [00:40:04]Kanjun: I think it's important to use both public and internal benchmarks. Part of why we build our own benchmarks is that there are not very many good benchmarks for agents, actually. And to evaluate these things, you actually need to think about it in a slightly different way. But we also do use a lot of public benchmarks for like, is the reasoning capability in this particular way improving? So yeah, it's good to use both. [00:40:26]Swyx: So for example, the Voyager paper coming out of NVIDIA played Minecraft and set their own benchmarks on getting the Diamond X or whatever and exploring as much of the territory as possible. And I don't know how that's received. That's obviously fun and novel for the rest of the engineer, the people who are new to the scene. But for people like yourselves, you build Avalon just because you already found deficiencies with using Minecraft. Is that valuable as an approach? Oh, yeah. I love Voyager. [00:40:57]Kanjun: I mean, Jim, I think is awesome. And I really like the Voyager paper and I think it has a lot of really interesting ideas, which is like the agent can create tools for itself and then use those tools. [00:41:06]Swyx: He had the idea of the curriculum as well, which is something that we talked about earlier. Exactly. [00:41:09]Kanjun: And that's like a lot of what we do. We built Avalon mostly because we couldn't use Minecraft very well to like learn the things we wanted. And so it's like not that much work to build our own. [00:41:19]Swyx: It took us, I don't know. [00:41:22]Kanjun: We had like eight engineers at the time, took about eight weeks. So six weeks. [00:41:27]Swyx: And OpenAI built their own as well, right? Yeah, exactly. [00:41:30]Kanjun: It's just nice to have control over our environment. But if you're doing our own sandbox to really trying to inspect our own research questions. But if you're doing something like experimenting with agents and trying to get them to do things like Minecraft is a really interesting environment. And so Voyager has a lot of really interesting ideas in it. [00:41:47]Swyx: Yeah. Cool. One more element that we had on this list, which is context and memory. I think that's kind of like the foundational, quote unquote, RAM of our era. I think Andrej Karpathy has already made this comparison. So there's nothing new here. And that's just the amount of working knowledge that we can fit into one of these agents. And it's not a lot, right? Especially if you need to get them to do long running tasks. If they need to self-correct from errors that they observe while operating in their environment. Do you see this as a problem? Do you think we're going to just trend to infinite context and that'll go away? Or how do you think we're going to deal with it? [00:42:22]Kanjun: I think when you talked about what's going to happen in the first wave and then in the second wave, I think what we'll see is we'll get like relatively simplistic agents pretty soon. And they will get more and more complex. And there's like a future wave in which they are able to do these like really difficult, really long running tasks. And the blocker to that future, one of the blockers is memory. And that was true of computers too. You know, I think when von Neumann made the von Neumann architecture, he was like, the biggest blocker will be like, we need this amount of memory, which is like, I don't remember exactly like 32 kilobytes or something to store programs. And that will allow us to write software. He didn't say it this way because he didn't have these terms, but that only really was like happened in the seventies with the microchip revolution. It may be the case that we're waiting for some research breakthroughs or some other breakthroughs in order for us to have like really good long running memory. And then in the meantime, agents will be able to do all sorts of things that are a little bit smaller than that. I do think with the pace of the field, we'll probably come up with all sorts of interesting things like, you know, RAG is already very helpful. [00:43:26]Swyx: Good enough, you think? [00:43:27]Kanjun: Maybe good enough for some things. [00:43:29]Swyx: How is it not good enough? I don't know. [00:43:31]Kanjun: I just think about a situation where you want something that's like an AI scientist. As a scientist, I have learned so much about my fields and a lot of that data is maybe hard to fine tune or on, or maybe hard to like put into pre-training. Like a lot of that data, I don't have a lot of like repeats of the data that I'm seeing. You know, like if I'm a scientist, I've like accumulated so many little data points. And ideally I'd want to store those somehow, or like use those to fine tune myself as a model somehow, or like have better memory somehow. I don't think RAG is enough for that kind of thing. But RAG is certainly enough for like user preferences and things like that. Like what should I do in this situation? What should I do in that situation? That's a lot of tasks. We don't have to be a scientist right away. Awesome. [00:44:21]Swyx: I have a hard question, if you don't mind me being bold. Yeah. I think the most comparable lab to InView is Adept. You know, a research lab with like some amount of product situation on the horizon, but not just yet, right? Why should people work for InView over Adept? And we can cut this if it's too like... Yeah. [00:44:40]Kanjun: The way I think about it is I believe in our approach. The type of thing that we're doing is we're trying to like build something that enables other people to build agents and build something that really can be maybe something like an operating system for agents. I know that that's what we're doing. I don't really know what everyone else is doing. You know, I can kind of like talk to people and have some sense of what they're doing. And I think it's a mistake to focus too much on what other people are doing, because extremely focused execution on the right thing is what matters. To the question of like, why us? I think like strong focus on reasoning, which we believe is the biggest blocker, on inspectability, which we believe is really important for user experience and also for the power and capability of these systems. Building non-leaky, good abstractions, which we believe is solving the core issue of agents, which is around reliability and being able to make them deployable. And then really seriously trying to use these things ourselves, like every single day, and getting to something that we can actually ship to other people that becomes something that is a platform. Like, it feels like it could be Mac or Windows. I love the dogfooding approach. [00:45:49]Swyx: That's extremely important. And you will not be surprised how many agent companies I talk to that don't use their own agent. Oh no, that's not good. That's a big surprise. [00:45:59]Kanjun: Yeah, I think if we didn't use our own agents, then we would have all of these beliefs about how good they are. Wait, did you have any other hard questions you wanted to ask? [00:46:08]Swyx: Yeah, mine was just the only other follow-up that you had based on the answer you just gave was, do you see yourself releasing models or do you see yourself, what is the artifacts that you want to produce that lead up to the general operating system that you want to have people use, right? And so a lot of people just as a byproduct of their work, just to say like, hey, I'm still shipping, is like, here's a model along the way. Adept took, I don't know, three years, but they released Persimmon recently, right? Like, do you think that kind of approach is something on your horizon? Or do you think there's something else that you can release that can show people, here's kind of the idea, not the end products, but here's the byproducts of what we're doing? [00:46:51]Kanjun: Yeah, I don't really believe in releasing things to show people like, oh, here's what we're doing that much. I think as a philosophy, we believe in releasing things that will be helpful to other people. [00:47:02]Swyx: Yeah. [00:47:02]Kanjun: And so I think we may release models or we may release tools that we think will help agent builders. Ideally, we would be able to do something like that, but I'm not sure exactly what they look like yet. [00:47:14]Swyx: I think more companies should get into the releasing evals and benchmarks game. Yeah. [00:47:20]Kanjun: Something that we have been talking to agent builders about is co-building evals. So we build a lot of our own evals and every agent builder tells me, basically evals are their biggest issue. And so, yeah, we're exploring right now. And if you are building agents, please reach out to me because I would love to, like, figure out how we can be helpful based on what we've seen. Cool. [00:47:40]Swyx: That's a good call to action. I know a bunch of people that I can send your way. Cool. Great. [00:47:43]Kanjun: Awesome. [00:47:44]Swyx: Yeah. We can zoom out to other interests now. [00:47:46]Alessio: We got a lot of stuff. So we have Sherif from Lexicon, the podcast. He had a lot of interesting questions on his website. You similarly have a lot of them. Yeah. [00:47:55]Swyx: I need to do this. I'm very jealous of people with personal websites right there. Like, here's the high level questions of goals of humanity that I want to set people on. And I don't have that. [00:48:04]Alessio: It's never too late, Sean. [00:48:05]Swyx: Yeah. [00:48:05]Alessio: It's never too late. [00:48:06]Kanjun: Exactly. [00:48:07]Alessio: There were a few that stuck out as related to your work that maybe you're kind of learning [00:48:12]Swyx: more about it. [00:48:12]Alessio: So one is why are curiosity and goal orientation often at odds? And from a human perspective, I get it. It's like, you know, would you want to like go explore things or kind of like focus on your career? How do you think about that from like an agent perspective? Where it's like, should you just stick to the task and try and solve it as in the guardrails as possible? Or like, should you look for alternative solutions? [00:48:34]Swyx: Yeah. [00:48:34]Kanjun: I think one thing that's really interesting about agents actually is that they can be forked. Like, you know, we can take an agent that's executed to a certain place and said, okay, here, like fork this and do a bunch of different things. I try a bunch of different things. Some of those agents can be goal oriented and some of them can be like more curiosity driven. You can prompt them in slightly different ways. And something I'm really curious about, like what would happen if in the future, you know, we were able to actually go down both paths. As a person, why I have this question on my website is I really find that like I really can only take one mode at a time and I don't understand why. And like, is it inherent in like the kind of context that needs to be held? That's why I think from an agent perspective, like forking it is really interesting. Like I can't fork myself to do both, but I maybe could fork an agent to like add a certain point in a task. [00:49:26]Swyx: Yeah. Explore both. Yeah. [00:49:28]Alessio: How has the thinking changed for you as the funding of the company changed? That's one thing that I think a lot of people in the space think is like, oh, should I raise venture capital? Like, how should I get money? How do you feel your options to be curious versus like goal oriented has changed as you raise more money and kind of like the company has grown? [00:49:50]Kanjun: Oh, that's really funny. Actually, things have not changed that much. So we raised our Series A $20 million in late 2021. And our entire philosophy at that time was, and still kind of is, is like, how do we figure out the stepping stones, like collect stepping stones that eventually let us build agents, kind of these new computers that help us do bigger things. And there was a lot of curiosity in that. And there was a lot of goal orientation in that. Like the curiosity led us to build CARBS, for example, this hyperparameter optimizer. Great name, by the way. [00:50:28]Swyx: Thank you. [00:50:29]Kanjun: Is there a story behind that name? [00:50:30]Swyx: Yeah. [00:50:31]Kanjun: Abe loves CARBS. It's also cost aware. So as soon as he came up with cost aware, he was like, I need to figure out how to make this work. But the cost awareness of it was really important. So that curiosity led us to this really cool hyperparameter optimizer. That's actually a big part of how we do our research. It lets us experiment on smaller models. And for those experiment results to carry to larger ones. [00:50:56]Swyx: Which you also published a scaling laws, which is great. I think the scaling laws paper from OpenAI was like the biggest. And from Google, I think, was the greatest public service to machine learning that any research lab can do. Yeah, totally. [00:51:10]Kanjun: What was nice about CARBS is it gave us scaling laws for all sorts of hyperparameters. So yeah, that's cool. It basically hasn't changed very much. So there's some curiosity. And then there's some goal oriented parts. Like Avalon, it was like a six to eight week sprint for all of us. And we got this thing out. And then now different projects do like more curiosity or more goal orientation at different times. Cool. [00:51:36]Swyx: Another one of your questions that we highlighted was, how can we enable artificial agents to permanently learn new abstractions and processes? I think this is might be called online learning. [00:51:45]Kanjun: Yeah. So I struggle with this because, you know, that scientist example I gave. As a scientist, I've like permanently learned a lot of new things. And I've updated and created new abstractions and learned them pretty reliably. And you were talking about like, okay, we have this RAM that we can store learnings in. But how well does online learning actually work? And the answer right now seems to be like, as models get bigger, they fine tune faster. So they're more sample efficient as they get bigger. [00:52:15]Swyx: Because they already had that knowledge in there. You're just kind of unlocking it. [00:52:23]Kanjun: Partly maybe because they already have like some subset of the representation. Partly they just memorize things more, which is good. So maybe this question is going to be solved, but I still don't know what the answer is. [00:52:36]Swyx: As I've had a platform that continually fine tunes for you as you work on that domain, which is something I'm working on. Well, that's great. We would love to use that. We'll talk more. Two more questions just about your general activities. I think you've just been very active in the San Francisco tech scene. You're a founding member of Software Commons. [00:52:56]Kanjun: Oh yeah, that's true. [00:52:57]Swyx: Tell me more. By the time I knew about SPC, it was already a very established thing. But what was it like in the early days? What was the story there? [00:53:05]Kanjun: Yeah, the story is Ruchi, who started it, was the VP of operations at Dropbox. And I was the chief of staff and we worked together very closely. She's actually one of the investors in Sorceress. And SPC is an investor in Vue. And at that time, Ruchi was like, you know, I would like to start a space for people who are figuring out what's next. And we were figuring out what's next post-Ember, those three months. And she was like, do you want to just like hang out in this space? And we're like, sure. And it was a really good group. Wasim and Jeff from Pilot, the folks from Zulip, and a bunch of other people at that time. It was a really good group. We just hung out. There was no programming. It's much more official than it was at that time. [00:53:44]Swyx: Yeah, now it's like a YC before YC type of thing. That's right, yeah. [00:53:48]Kanjun: At that time, we literally, it was a bunch of friends hanging out in the space together. [00:53:51]Swyx: And was this concurrent with the Archive? [00:53:53]Kanjun: Oh yeah, actually, I think we started the Archive around the same time. [00:53:56]Swyx: You're just like really big into community. But also like, so, you know, I run a Hacker House and I'm also part of hopefully what becomes like the next Software Commons or whatever. What are the principles in organizing communities like that with really exceptional people that go on to do great things? Do you have to be really picky about who joins? Like all your friends just magically turn out super successful like that. You know, it's not normal, right? Like this is very special. And a lot of people want to do that and fail. And you had the co-authors of GPT-3 in your house. That's true. [00:54:32]Kanjun: And a lot of other really cool people that you'll eventually hear about. [00:54:35]Swyx: Co-founders of Pilot and anyone else. I don't want you to pick your friends, but there's some magic special sauce in getting people together and in one workspace, living space, whatever, right? And that's part of why I'm here in San Francisco. And I would love for more people to learn about it and also maybe get inspired to build their own. [00:54:52]Kanjun: Your question is really more about like, how do you actually build a community that where people in it are like eventually are awesome? [00:54:59]Swyx: Okay. [00:55:00]Kanjun: Which is different than like why live in a co-living house. So one adage we had when we started the archive was you become the average of the five people closest to you. [00:55:08]Swyx: Yes. [00:55:08]Kanjun: And I think that's roughly true. And good people draw good people. So there are really two things. One, we were quite picky and it mattered a lot to us. Is this someone where if they're hanging out in the living room, we'd be really excited to come hang out. Yeah. Two is I think we did a really good job of creating a high growth environment and an environment where people felt really safe. We actually apply these things to our team and it works remarkably well as well. So I do a lot of basically how do I create safe spaces for people where it's not just like safe law, but like it's like a safe space where people really feel inspired by each other. And I think at the archive, we really made each other better. My friend, Michael Nielsen called it a self-actualization machine. [00:55:52]Swyx: My goodness. Okay. [00:55:54]Kanjun: And I think, yeah, people came in. Was he a part of the archive? He was not, but he hung out a lot. Honorary member. Friend of the archive. [00:56:02]Swyx: Yeah. [00:56:02]Kanjun: The culture was that we learned a lot of things from each other about like how to make better life systems and how to think about ourselves and psychological debugging. And a lot of us were founders. So having other founders going through similar things was really helpful. And a lot of us worked in AI. And so having other people to talk about AI with was really helpful. And so I think all of those things led to a form of idea flux and also kind of like, so I think a lot about like the idea flux and default habits or default impulses. It led to a set of idea flux and default impulses that led to some really interesting things and led to us doing much bigger things, I think, than we otherwise would have decided to do because it felt like taking risks was less risky. So that's something we do a lot of on the team. It's like, how do we make it so that taking risks is less risky? And there's a term called senious. [00:56:57]Swyx: Yes. I was thinking Kevin Kelly. Kevin Kelly, senious. I was going to feed you that word, but I didn't want to like bias you. Yes. [00:57:02]Kanjun: I think maybe like a lot of what I'm interested in is constructing a kind of senious. And the archive was definitely a senious in a particular, or like getting toward a senious in a particular way. And Jason Ben, my archive housemate and who now runs the neighborhood, [00:57:17]Swyx: has a good way of putting it. [00:57:17]Kanjun: If genius is from your genes, senious is from your scene. Yeah, I think like maybe a lot of the community building impulse is from this like interest in what kind of idea flux can be created. You know, there's a question of like, why did Xerox PARC come out with all of this interesting stuff? It's their senious. Why did Bell Labs come out with all this interesting stuff? Maybe it's their senious. Why didn't the transistor come out of Princeton? And the other people working on it at the time. [00:57:44]Swyx: I just think it's remarkable how you hear a lot about Alan Kay. And I just read a bit. And apparently Alan Kay was like the most junior guy at Xerox PARC. Yeah, definitely. [00:57:53]Kanjun: He's just the one who talks about it. He talks the most. [00:57:57]Swyx: Yeah, exactly. Yeah. So I, you know, hopefully I'm also working towards contributing that senious. I called mine the more provocative name of the arena. Interesting. That's quite provocative. In the arena. [00:58:08]Kanjun: So are you fighting other people in the arena? [00:58:11]Swyx: No. You never know. [00:58:12]Alessio: On any day in the mission, it's an adventure. [00:58:15]Swyx: We're in the arena trying stuff, as they say. You are also a GP at Outset Capital, where you also co-organize the Thursday Nights in AI, where hopefully someday I'll eventually speak. You're on the roster. [00:58:28]Kanjun: I'm on the roster. [00:58:29]Swyx: Thank you so much. So why spend time being a VC and organizing all these events? You're also a very busy CEO and, you know, why spend time with that? Why is that an important part of your life? [00:58:39]Kanjun: Yeah, for me personally, I really like helping founders. So Allie, my investing partner, is fortunately amazing and she does everything for the fund. So she like hosts the Thursday night events and she finds folks who we could invest in. And she does basically everything. Josh and I are her co-partners. So Allie was our former chief of staff at Sorceress. We just thought she was amazing. She wanted to be an investor. And Josh and I also like care about helping founders and kind of like giving back to the community. What we didn't realize at the time when we started the fund is that it would actually be incredibly helpful for Imbue. So talking to AI founders who are building agents and working on, you know, similar things is really helpful. They could potentially be our customers and they're trying out all sorts of interesting things. And I think being an investor, looking at the space from the other side of the table, it's just a different hat that I routinely put on. And it's helpful to see the space from the investor lens as opposed to from the founder lens. So I find that kind of like hat switching valuable. It maybe would lead us to do slightly different things. [00:59:44]Swyx: Awesome. Appreciate that. [00:59:46]Alessio: Yeah, you've been really generous with your time. Let's just wrap with the lightning round. Okay. So we have two questions, acceleration, exploration, and then a takeaway. So the acceleration question is, what's something that already happened in AI that you thought would take much longer to be here? [01:00:03]Kanjun: I think the rate at which we discover new capabilities of existing models and kind of like build hacks on top of them to make them work better is something that has been surprising and awesome. And the research community building on its own ideas, that's probably, you want something very specific. Yeah, I think the rate at which we discovered capabilities probably. [01:00:22]Swyx: Cool. Exploration slash requests for startups. If you weren't building Imbue, what AI company would you build? Hmm. Every founder has like their like number two. Really? Yeah, I don't know. [01:00:33]Kanjun: Wow. I cannot imagine building any other thing than Imbue. [01:00:37]Swyx: Wow. Well, that's a great answer too. [01:00:38]Kanjun: It's like obviously the thing to build. [01:00:42]Swyx: Okay. [01:00:42]Kanjun: It's like obviously work on the fundamental platform. Yeah. [01:00:46]Swyx: So that was my attempt at innovating this question, but the previous one was, but what was the most interesting unsolved question in AI? [01:00:53]Kanjun: My answer is kind of boring, but the most interesting unsolved questions are these questions of, how do we make these stochastic systems into things that we can like reliably use and build on top of? [01:01:04]Swyx: Yep. [01:01:05]Alessio: And yeah, take away what's one message you want everyone to remember? [01:01:09]Kanjun: Maybe two things. One is just the like, we're in a historic moment. I didn't think in my lifetime I would necessarily be in, like able to work on the things I'm excited to work on in this moment, but we're in a historic moment that where we'll look back and be like, oh my God, the future was invented in these years. And I think like, there may be a set of messages to take away from that. One is like, AI is a tool like any technology. And you know, when it comes to things like, what might the future look like? Like we like to think about it as, it's like just a better computer. It's like much more powerful computer that gives us a lot of free intellectual energy that we can now like solve so many problems with. You know, there are so many problems in the world [01:01:53]Swyx: where we're like, [01:01:53]Kanjun: oh, it's not worth a person thinking about that. And so things get worse and things get worse. No one wants to work on maintenance. And like this technology gives us the potential to actually be able to like allocate intellectual energy to all of those problems. And the world could be much better, like could be much more thoughtful because of that. I'm so excited about that. And there are definitely risks and dangers. And we actually do a fair, something I didn't talk about is we do a fair amount of work on the policy side. On the safety side, like we think about safety and policy in terms of engineering theory and also regulation. And kind of comparing to like the automobile or the airplane or any new technology, there's like a set of new possible capabilities and a set of new possible dangers that are unlocked with every new technology. And so on the engineering side, like we think a lot about engineering safety, like how do we actually engineer these systems so that they are inspectable and why we reason in natural language so that the systems are very inspectable so that we can like stop things if anything weird is happening. That's why we don't think end-to-end black boxes [01:02:58]Swyx: are a good idea. [01:02:58]Kanjun: On the theoretical side, we like really believe in like deeply understanding, like when we actually fine tune on individual examples, like what's going on, when we're pre-training, what's going on, like debugging tools for these agents to understand like what's going on. And then on the regulation side, I think there's actually a lot of regulation that already covers many of the dangers like that people are talking about. And there are areas where there's not much regulation. And so we focus on those areas where there's not much regulation. So some of our work is actually, we built an agent that helped us analyze the 20,000 pages of policy proposals submitted to the Department of Commerce request for AI policy proposals. We looked at what were the problems people brought up and what were the solutions they presented and then like did a summary analysis and kind of like, you know, build agents to do that. And now the Department of Commerce is like interested in using that as a tool to like analyze proposals. And so a lot of what we're trying to do on the regulation side is like actually figure out where is there regulation missing and how do we actually in a very targeted way try to solve those missing areas. So I guess if I were to say like, what are the takeaways? It's like the takeaway is like the future could be really exciting if we can actually get agents that are able to do these bigger things. Reasoning is the biggest blocker plus like these sets of abstractions to make things more robust and reliable. And there are, you know, things where we have to be quite careful and thoughtful about how do we deploy these and what kind of regulation should go along with it so that this is actually a technology that where we, when we deploy it, it is protective to people and not harmful. [01:04:36]Swyx: Awesome, wonderful. [01:04:38]Alessio: Thank you so much for your time, Kanjun. [01:04:40]Kanjun: Thank you. [01:04:41]Swyx: Thank you. [01:04:48] Get full access to Latent Space at www.latent.space/subscribe
01:05:0214/10/2023
[AIE Summit Preview #2] The AI Horcrux — Swyx on Cognitive Revolution
This is a special double weekend crosspost of AI podcasts, helping attendees prepare for the AI Engineer Summit next week. After our first friendly feedswap with the Cognitive Revolution pod, swyx was invited for a full episode to go over the state of AI Engineering and to preview the AI Engineer Summit Schedule, where we share many former CogRev guests as speakers.For those seeking to understand how two top AI podcasts think about major top of mind AI Engineering topics, this should be the perfect place to get up to speed, which will be a preview of many of the conversations taking place during the topic tables sessions on the night of Monday October 9 at the AI Engineer Summit.While you are listening, there are two things you can do to be part of the AI Engineer experience. One, join the AI Engineer Summit Slack. Two, take the State of AI Engineering survey and help us get to 1000 respondents!Links* AI Engineer Summit (Join livestream and Slack community)* State of AI Engineering Survey (please help us fill this out to represent you!)* Cognitive Revolution full episode with Nathan* swyx’s ai-notes (featuring Communities in README.md)* We referenced The Eleuther AI Mafia* This podcast intro voice was AI Anna again, from our Wondercraft pod!Timestamps* (00:00:49) AI Nathan’s intro * (00:03:14) What is an AI engineer? * (00:05:56) What backgrounds do AI engineers typically have? * (00:17:13) Swyx’s Discord AI project * (00:20:41) Key tools for AI engineers * (00:23:42) HumanLoop, Guardrails, Langchain * (00:27:01) Criteria for identifying capable AI engineers when hiring * (00:30:59) Skepticism around AI being a fad and doubts about contributing to AI * (00:34:03) AI Engineer Conference speaker lineup * (00:41:14) AI agents and two years to AGI * (00:46:04) Expectations and disagreement around what AI agent capabilities will work soon * (00:50:12) Swyx’s OpenAI thesis * (00:53:03) AI safety considerations and the role of AI engineers * (00:56:24) Disagreement on whether AI will soon be able to generate code pull requests * (01:01:07) AI helping non-technical people to code * (01:01:49) Multi-modal Chat-GPT and the future implications * (01:03:33) Nathan living in the same dorm as Mark Zuckerberg * (01:04:44) Competitive dynamics between OpenAI and other AI model developers * (01:05:39) Play.ht vs ElevenLabs * (01:09:20) The tension between platforms and developers building on top of them * (01:11:40) The best thing startups can do to compete with foundation model providers * (01:16:26) User identity/authentication services like Login with OpenAI * (01:19:20) Google vs the other live players * (01:20:46) AI Horcruxes / Pendants * (01:22:05) The concept of an AI app bundle for consumers and developers Get full access to Latent Space at www.latent.space/subscribe
01:29:4808/10/2023
[AIE Summit Preview #1] Swyx on Software 3.0 and the Rise of the AI Engineer
This is a special double weekend crosspost of AI podcasts, helping attendees prepare for the AI Engineer Summit next week. Swyx gave a keynote on the Software 3.0 Landscape recently (referenced in our recent Humanloop episode) and was invited to go deeper in podcast format, and to preview the AI Engineer Summit Schedule. For those seeking to ramp up on the current state of thinking on AI Engineering, this should be the perfect place to start, alongside our upcoming Latent Space University course (which is being tested live for the first time at the Summit workshops).While you are listening, there are two things you can do to be part of the AI Engineer experience. One, join the AI Engineer Summit Slack. Two, take the State of AI Engineering survey and help us get to 1000 respondents! Full transcript available here! Links* AI Engineer Summit (Join livestream and Slack community)* State of AI Engineering Survey (please help us fill this out to represent you!)* Podrocket full episode by Tejas KumarShow notes* Explaining Software 1.0, 2.0, and 3.0* Software 1.0: Hand-coded software with conditional logic, loops, etc.* Software 2.0: Machine learning models like neural nets trained on data* Software 3.0: Using large pre-trained foundation models without needing to collect/label training data* Foundation Models and Model Architecture* Foundation models like GPT-3/4, Claude, Whisper - can be used off the shelf via API* Model architecture refers to the layers and structure of a ML model* Grabbing a pre-trained model lets you skip data collection and training* Putting Foundation Models into Production* Levels of difficulty: calling an API, running locally, fully serving high-volume predictions* Key factors: GPU utilization, batching, infrastructure expertise* The Emerging AI Developer Landscape* AI is becoming more accessible to "traditional" software engineers* Distinction between ML engineers and new role of AI engineers* AI engineers consume foundation model APIs vs. developing models from scratch* The Economics of AI Engineers* Demand for AI exceeds supply of ML experts to build it* AI engineers will emerge out of software engineers learning these skills* Defining the AI Engineering Stack* System of reasoning: Foundation model APIs* Retrieval augmented generation (RAG) stack: Connects models to data* AI UX: New modalities and interfaces beyond chatbots* Building Products with Foundation Models* Replicating existing features isn't enough - need unique value* Focus on solving customer problems and building trust* AI Skepticism and Hype* Some skepticism is healthy, but "AI blame" also emerges* High expectations from media/industry creators* Important to stay grounded in real customer needs* Meaningful AI Applications* Many examples of AI positively impacting lives already* Engineers have power to build and explore - lots of opportunity* Closing and AI Engineer Summit Details* October 8-10 virtual conference for AI engineers* Speakers from OpenAI, Microsoft, Amazon, etc* Free to attend online Get full access to Latent Space at www.latent.space/subscribe
38:4907/10/2023
RAG Is A Hack - with Jerry Liu from LlamaIndex
Want to help define the AI Engineer stack? >800 folks have weighed in on the top tools, communities and builders for the first State of AI Engineering survey, which we will present for the first time at next week’s AI Engineer Summit. Join us online!This post had robust discussion on HN and Twitter.In October 2022, Robust Intelligence hosted an internal hackathon to play around with LLMs which led to the creation of two of the most important AI Engineering tools: LangChain 🦜⛓️ (our interview with Harrison here) and LlamaIndex 🦙 by Jerry Liu, which we’ll cover today. In less than a year, LlamaIndex has crossed 600,000 monthly downloads, raised $8.5M from Greylock, has a fast growing open source community that contributes to LlamaHub, and it doesn’t seem to be slowing down.LlamaIndex’s Origin (aka GPT Tree Index)Jerry struggled to make large amounts of data work with GPT-3 (which had a 4,096 tokens context window). Today LlamaIndex is at the forefront of the RAG wave (Retrieval Augmented Generation), but in the beginning Jerry wasn’t focused on embeddings and search, but rather on understanding how models could summarize, link, and reason about data. On November 5th, Jerry pushed the first version to Github under the name “GPT Tree Index”: The GPT Tree Index first takes in a large dataset of unprocessed text data as input. It then builds up a tree-index in a bottom-up fashion; each parent node is able to summarize the children nodes using a general summarization prompt; each intermediate node containing summary text summarizing the components below. Once the index is built, it can be saved to disk and loaded for future use.Then, say the user wants to use GPT-3 to answer a question. Using a query prompt template, GPT-3 will be able to recursively perform tree traversal in a top-down fashion in order to answer a question. For example, in the very beginning GPT-3 is tasked with selecting between *n* top-level nodes which best answers a provided query, by outputting a number as a multiple-choice problem. The GPT Tree Index then uses the number to select the corresponding node, and the process repeats recursively among the children nodes until a leaf node is reached.[…]How is this better than an embeddings-based approach / other state-of-the-art QA and retrieval methods?The intent is not to compete against existing methods. A simpler embedding-based technique could be to just encode each chunk as an embedding and do a simple question-document embedding look-up to retrieve the result. This project is a simple exercise to test how GPT can organize and lookup information.The project attracted a lot of attention early on (the announcement tweet has ~330 likes), but it wasn’t until ~February 2023 that the open source community really started to explode, which was around the same time that LlamaHub was released. LlamaHub made it easy for developers to import data from Google Drive, Discord, Slack, databases, and more into their LlamaIndex projects. What is LlamaIndex? As we mentioned, LlamaIndex is leading the charge in the development of the RAG stack. RAG boils down to two parts:* Indexing (i.e. how do you load and index the data in your knowledge base)* Querying (i.e. how do you surface the data and fit it in the model context) IndexingTo get your data from all your sources to your RAG knowledge base, you can leverage a few tools: * Documents / Nodes: A Document is a generic container around any data source - for instance, a PDF, an API output, or retrieved data from a database. A Node is the atomic unit of data in LlamaIndex and represents a “chunk” of a source Document (i.e. one Document has many Node) as well as its relationship to other Node objects.* Data Connectors: A data connector ingest data from different sources and turn them into Document representations (text and simple metadata). These connectors are offered through LlamaHub, and there are over 200 of them today.* Data Indexes: Once you’ve ingested your data, LlamaIndex will help you index the data into a format that’s easy to retrieve. There are many types of indexes (Summary, Tree, Vector, etc). Under the hood, LlamaIndex parses the raw documents into intermediate representations, calculates vector embeddings, and infers metadata. The most commonly used index is the VectorStoreIndex, which can then be paired with any of the vector stores out there (an example with Chroma).QueryingThe RAG pipeline, during the querying phase, sources the most pertinent context from a user's prompt, forwarding it along to the LLM. This equips the LLM with current / private knowledge beyond its foundational training data. LlamaIndex offers adaptable modules tailored for building RAG pathways for Q&A, chatbots, or agent use, since each of them has different requirements. For example, a chatbot should expect the user to interject with follow up questions, while an agent will try to carry out a whole task on its own without user intervention. Building Blocks* Retrievers: A retriever defines how to efficiently retrieve relevant context from a knowledge base (i.e. index) when given a query. Vector index is the most popular mode, but there are other options like Summary, Tree, Keyword Table, Knowledge Graph, and Document Summary. * Node Postprocessors: Once the retriever gets you Node objects back, you will need to do additional work like discarding low similarity ones. There are many options here as well, such as `SimilarityPostprocessor` (i.e. drop nodes below a certain similarity score) or `LongContextReorder` which helps avoid the issues raised in the “Lost in the Middle, U-shaped recollection curve” paper. * Response Synthesizers: Takes a user query and your retrieved chunks, and prompts and LLM with them. There are a few response modes here that balance thoroughness and compactness.Pipelines* Query Engines: A query engine is an end-to-end pipeline that allow you to ask question over your data. It takes in a natural language query, and returns a response, along with reference context retrieved and passed to the LLM. This makes it possible to do things like “Ask panda questions” by leveraging Panda dataframes as a data source. * Chat Engines: A chat engine is an end-to-end pipeline for having a conversation with your data (multiple back-and-forth instead of a single question & answer). This supports traditional OpenAI-style chat interfaces, as well as more advanced ones like ReAct.* Agents: An agent is an automated decision maker (powered by an LLM) that interacts with the world via a set of tools. Agent may be used in the same fashion as query engines or chat engines, but they have the power to both read and write data. For reasoning, you can use either OpenAI Functions or ReAct. Both can leverage the tools offered through LlamaHub for further analysis.RAG vs FinetuningNow that you have a full overview of what LlamaIndex does, the next question is “When should I use this and when should I fine tune?”. Jerry’s TLDR is that “RAG is just a hack”, but a powerful one. Each option has pros and cons:* Lower investment: RAG requires almost 0 upfront investment, unlike finetuning which requires data cleaning, model training, increased costs for finetuned inference, etc.* Stricter access control and higher visibility: when finetuning, the model learns everything. With RAG, you can decide what documents the index should have access to, making it more secure by default. You are also able to see everything that was passed into the context if a response doesn’t look right.* Context window limitation: you can only fit so many tokens into the prompt due to the way models work. Finetuning helps you circumvent that by compressing the knowledge into the model weights rather than putting it in the prompt. As Jerry says, the best way to know this inside out is to learn to build RAG from scratch (without LlamaIndex) - and they have plenty of tutorials on his Twitter and blog to learn this.The other issue is that the math for finetuning isn’t well known yet as we discussed with Quentin Anthony from Eleuther, so unless you have money and time to invest into exploring fine tuning, you’re better off starting with RAG. Full YouTube Discussion!Show Notes* LlamaIndex* LlamaHub* SEC Insights* Robust Intelligence* Quora’s Poe* Chroma* Vespa* Why should every AI engineer learn to build RAG from scratch?* LangChain* Gorilla* Lost in the Middle: How Language Models Use Long ContextsTimestamps* [00:00:00] Introductions and Jerry’s background* [00:04:30] Starting LlamaIndex as a side project* [00:05:11] Evolution from tree-index to current LlamaIndex and LlamaHub architecture* [00:11:39] Deciding to leave Robust to start the LlamaIndex company and raising funding* [00:20:06] Context window size and information capacity for LLMs* [00:21:34] Minimum viable context and maximum context for RAG* [00:22:52] Fine-tuning vs RAG - current limitations and future potential* [00:24:02] RAG as a hack but good hack for now* [00:26:19] RAG benefits - transparency and access control* [00:27:46] Potential for fine-tuning to take over some RAG capabilities* [00:30:04] Baking everything into an end-to-end trained LLM* [00:33:24] Similarities between iterating on ML models and LLM apps* [00:34:47] Modularity and customization options in LlamaIndex: data loading, retrieval, synthesis, reasoning* [00:40:16] Evaluating and optimizing each component of Lama Index system* [00:46:02] Building retrieval benchmarks to evaluate RAG* [00:47:24] SEC Insights - open source full stack LLM app using LlamaIndex* [00:49:48] Enterprise platform to complement LlamaIndex open source* [00:51:00] Community contributions for LlamaHub data loaders* [00:53:21] LLM engine usage - majority OpenAI but options expanding* [00:56:25] Vector store landscape* [00:59:46] Exploring relationships and graphs within data* [01:03:24] Additional complexity of evaluating agent loops* [01:04:01] Lightning RoundTranscriptAlessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO of Residence and Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI. [00:00:20]Swyx: And today we finally have Jerry Liu on the podcast. Hey Jerry. [00:00:24]Jerry: Hey guys. Hey Swyx and Alessio. Thanks for having me. [00:00:27]Swyx: It's kind of weird because we keep running into each other in San Francisco AI events, so it's kind of weird to finally just have a conversation recorded for everybody else. [00:00:34]Jerry: Yeah, I know. I'm really looking forward to this, aside from the questions. [00:00:38]Swyx: So I tend to introduce people on their formal background and then ask something on the more personal side. So you are part of the Princeton gang. [00:00:46]Jerry: I don't know if there is like official Princeton gang. [00:00:48]Swyx: No, small Princeton gang. Okay. I attended your meeting. There was like four of you with Prem and the others. And then you have a bachelor's in CS and a certificate in finance. That's also fun. I also did finance and I think I saw that you also interned at Two Sigma where I worked in New York. You were a machine learning engineer. [00:01:06]Jerry: You were at Two Sigma?Swyx: Yeah, very briefly.Jerry: Oh, cool. I didn't know that. [00:01:09]Swyx: That was my first like proper engineering job before I went into DevRel. [00:01:12]Jerry: Oh, okay. Nice. [00:01:14]Swyx: And then you were a machine learning engineer at Quora, AI research scientist at Uber for three years, and then two years machine learning engineer at Robust Intelligence before starting LlamaIndex. So that's your LinkedIn. It's not only LinkedIn that people should know about you. [00:01:27]Jerry: I think back during my Quora days, I had this like three-month phase where I just wrote like a ton of Quora answers. And so I think if you look at my tweets nowadays, you can basically see that as like the V2 of my three-month like Forrestant where I just like went ham on Quora for a bit. I actually, I think I was back then actually when I was working on Quora, I think the thing that everybody was fascinated in was just like general like deep learning advancements and stuff like GANs and generative like images and just like new architectures that were evolving. And it was a pretty exciting time to be a researcher actually, because you were going in like really understanding some of the new techniques. So I kind of use that as like a learning opportunity, basically just like read a bunch of papers and then answer questions on Quora. And so you can kind of see traces of that basically in my current Twitter where it's just like really about kind of like framing concepts and trying to make it understandable and educate other users on it. Yeah. [00:02:17]Swyx: I've said, so a lot of people come to me for my Twitter advice, but like, I think you are doing one of the best jobs in AI Twitter, which is explaining concepts and just consistently getting hits out. Thank you. I didn't know it was due to the Quora training. Let's just sign on on Quora. A lot of people, including myself, like kind of wrote off Quora as like one of the web 1.0 like sort of question answer forums. But now I think it's becoming, seeing a resurgence obviously due to Poe and obviously Adam and D'Angelo has always been a leading tech figure, but what do you think is kind of underrated about Quora? [00:02:46]Jerry: Well, I mean, I like the, I really liked the mission of Quora when I, when I joined. In fact, I interned there like in 2015 and I joined full time in 2017. One is like they had, and they have like a very talented engineering team and just like really, really smart people. And the other part is the whole mission of the company is to just like spread knowledge and to educate people. And to me that really resonated. I really liked the idea of just like education and democratizing the flow of information. If you imagine like kind of back then it was like, okay, you have Google, which is like for search, but then you have Quora, which is just like user generated, like grassroots type content. And I really liked that concept because it's just like, okay, there's certain types of information that aren't accessible to people, but you can make accessible by just like surfacing it. And so actually, I don't know if like most people know that about like Quora and if they've used the product, whether through like SEO, right, or kind of like actively, but that really was what drew me to it. [00:03:39]Swyx: Yeah. I think most people challenges with it is that sometimes you don't know if it's like a veiled product pitch, right? [00:03:44]Jerry: Yeah. Of course, like quality of the answer matters quite a bit. And then you start running into these like- [00:03:47]Swyx: It's like five alternatives and then here's the one I work on. Yeah. [00:03:50]Jerry: Like recommendation issues and all that stuff. I used, I worked on recsys at Quora actually, so I got a taste of some of that stuff. Well, I mean, I kind of more approached it from machine learning techniques, which might be a nice segue into RAG actually. A lot of it was just information retrieval. We weren't like solving anything that was like super different than what was standard in the industry at the time, but just like ranking based on user preferences. I think a lot of Quora was very metrics driven. So just like trying to maximize like daily active hours, like time spent on site, those types of things. And all the machine learning algorithms were really just based on embeddings. You have a user embedding and you have like item embeddings and you try to train the models to try to maximize the similarity of these. And it's basically a retrieval problem. [00:04:30]Swyx: Okay. So you've been working on RAG for longer than most people think? [00:04:33]Jerry: Well, kind of. So I worked there for like a year, right, just transparently. And then I worked at Uber where I was not working on ranking. It was more like kind of deep learning training for self-driving and computer vision and that type of stuff. But I think in the LLM world, it's kind of just like a combination of like everything these days. I mean, retrieval is not really LLMs, but like it fits within the space of like LLM apps. And then obviously like having knowledge of the underlying deep learning architectures helps. Having knowledge of basic software engineering principles helps too. And so I think it's kind of nice that like this whole LLM space is basically just a combination of just like a bunch of stuff that you probably like people have done in the past. [00:05:11]Swyx: It's good. It's like a summary capstone project. Yeah, exactly. [00:05:14]Jerry: Yeah. [00:05:15]Alessio: And before we dive into LlamaIndex, what do they feed you a robust intelligence that both you and Harrison from LangChain came out of it at the same time? Was there like, yeah. Is there any fun story of like how both of you kind of came up with kind of like core infrastructure to LLM workflows today? Or how close were you at robust? Like any fun behind the scenes? [00:05:37]Jerry: Yeah. Yeah. We, um, we work pretty closely. I mean, we were on the same team for like two years. I got to know Harrison and the rest of the team pretty well. I mean, I have a respect that people there, the people that were very driven, very passionate. And it definitely pushed me to be, you know, a better engineer and leader and those types of things. Yeah. I don't really have a concrete explanation for this. I think it's more just, we have like an LLM hackathon around like September. This was just like exploring GPT-3 or it was October actually. And then the day after I went on vacation for a week and a half, and so I just didn't track Slack or anything. And then when I came back, saw that Harrison started LangChain [00:06:09]Swyx: Oh that's cool. [00:06:10]Jerry: I was like, oh, I'll play around with LLMs a bit and then hacked around on stuff. And I think I've told the story a few times, but you know, I was like trying to feed in information into GPT-3. And then, then you deal with like context window limitations and there was no tooling or really practices to try to understand how do you, you know, get GPT-3 to navigate large amounts of data. And that's kind of how the project started. Really was just one of those things where early days, like we were just trying to build something that was interesting. Like I wanted to start a company. I had other ideas actually of what I wanted to start. And I was very interested in, for instance, like multimodal data, like video data and that type of stuff. And then this just kind of grew and eventually took over the other idea. [00:06:48]Swyx: Text is the universal interface. [00:06:50]Jerry: I think so. I think so. I actually think once the multimodal models come out, I think there's just like mathematically nicer properties of you can just get like join multiple embeddings, like clip style. But text is really nice because from a software engineering principle, it just makes things way more modular. You can just convert everything into text and then you just represent everything as text. [00:07:08]Swyx: Yeah. I'm just explaining retroactively why working on LlamaIndex took off versus if you had chose to spend your time on multimodal, we probably wouldn't be talking about whatever you ended up working on. [00:07:18]Jerry: Yeah. [00:07:19]Swyx: That's true. It's troubled. Interesting. So November 9th, that was a very productive month. I guess October, November, November 9th, you announced GPT-3 Index and you picked a tree logo. Very cool. Every project must have an emoji. [00:07:32]Jerry: Yeah. Yeah. I probably was somewhat inspired by a light train, but I will admit, yeah. [00:07:37]Swyx: It uses GPT to build a knowledge tree in a bottoms-up fashion by applying a summarization prompt for each node. Yep. Which I like that original vision. Your messaging roundabout then was also that you're creating optimized data structures. What's the sort of journey to that and how does that contrast with LlamaIndex today? Okay. [00:07:56]Jerry: Maybe I can tell a little bit about the beginning intuitions. I think when I first started, this really wasn't supposed to be something that was like a toolkit that people use. It was more just like a system. And the way I wanted to think about the system was more a thought exercise of how language models with their reasoning capabilities, if you just treat them as like brains, can organize information and then traverse it. So I didn't want to think about embeddings, right? To me, embeddings just felt like it was just an external thing that was like, well, it was just external to trying to actually tap into the capabilities of language models themselves, right? I really wanted to see, you know, just as like a human brain could like synthesize stuff, could we create some sort of like structure where this neural CPU, if you will, can like organize a bunch of information, you know, auto-summarize a bunch of stuff and then also traverse the structure that I created. That was the inspiration for this initial tree index, to be honest. And I think I said this in the first tweet, it actually works super well, right? Like GPT-4 obviously is much better at reasoning. I'm one of the first to say, you know, you shouldn't use anything pre-GPT-4 for anything that requires complex reasoning because it's just going to be unreliable, okay, disregarding stuff like fine tuning. But it worked okay. But I think it definitely struck a chord with kind of like the Twitter crowd, which is just like new ideas at the time, I guess, just like thinking about how you can actually bake this into some sort of application. Because I think what I also ended up discovering was the fact that there was starting to become a wave of developers building on top of GPT-3 and people were starting to realize that what makes them really useful is to apply them on top of your personal data. And so even if the solution itself was kind of like primitive at the time, like the problem statement itself was very powerful. And so I think being motivated by the problem statement, right, like this broad mission of how do I unlock elements on top of the data also contributed to the development of LOM index to the state it is today. And so I think part of the reason, you know, our toolkit has evolved beyond the just existing set of like data structures is we really tried to take a step back and think, okay, what exactly are the tools that would actually make this useful for a developer? And then, you know, somewhere around December, we made an active effort to basically like push towards that direction, make the code base more modular, right, more friendly as an open source library. And then also start adding in like embeddings, start thinking into practical considerations like latency, cost, performance, those types of things. And then really motivated by that mission, like start expanding the scope of the toolkit towards like covering the life cycle of like data ingestion and querying. Where you also added Llamahub and yeah, so I think that was in like January on the data loading side. And so we started adding like some data loaders, saw an opportunity there, started adding more stuff on the retrieval querying side, right? We still have like the core data structures, but how do you actually make them more modular and kind of like decouple storing state from the types of like queries that you could run on top of this a little bit. And then starting to get into more complex interactions, like chain of thought reasoning, routing and, you know, like agent loops. [00:10:44]Alessio: You and I spent a bunch of time earlier this year talking about Llamahub, what that might become. You were still at Robust. When did you decide it was time to start the company and then start to think about what LlamaIndex is today? [00:10:58]Jerry: Yeah, I mean, probably December. It was kind of interesting. I was getting some inbound from initial VCs, I was talking about this project. And then in the beginning, I was like, oh, yeah, you know, this is just like a design project. But you know, what about my other idea on like video data, right? And then I was trying to like get their thoughts on that. And then everybody was just like, oh, yeah, whatever, like that part's like a crowded market. And then it became clear that, you know, this was actually a pretty big opportunity. And like, coincidentally, right, like this actually did relate to like, my interests have always been at the intersection of AI data and kind of like building practical applications. And it was clear that this was evolving into a much bigger opportunity than the previous idea was. So around December, and then I think I gave a pretty long notice, but I left officially like early March. [00:11:39]Alessio: What were your thinkings in terms of like moats and, you know, founders kind of like overthink it sometimes. So you obviously had like a lot of open source love and like a lot of community. And you're like, were you ever thinking, okay, I don't know, this is maybe not enough to start a company or did you always have conviction about it? [00:11:59]Jerry: Oh, no, I mean, 100%. I felt like I did this exercise, like, honestly, probably more late December and then early January, because I was just existentially worried about whether or not this would actually be a company at all. And okay, what were the key questions I was thinking about? And these were the same things that like other founders, investors, and also like friends would ask me is just like, okay, what happens if context windows get much bigger? What's the point of actually structuring data right in the right way? Right? Why don't you just dump everything into the prompt, fine tuning, like, what if you just train the model over this data? And then, you know, what's the point of doing this stuff? And then some other ideas is what if like OpenAI actually just like takes this like builds upwards on top of the their existing like foundation models and starts building in some like built in orchestration capabilities around stuff like RAG and agents and those types of things. And so I basically ran through this mental exercise and, you know, I'm happy to talk a little bit more about those thoughts as well. But at a high level, well, context windows have gotten bigger, but there's obviously still a need for a rag. I think RAG is just like one of those things that like, in general, what people care about is, yes, they do care about performance, but they also care about stuff like latency and costs. And so my entire reasoning at the time was just like, okay, like, yes, maybe you will have like much bigger context windows, as we've seen with like 100k context windows. But for enterprises, like, you know, data, which is not in just like the scale of like a few documents, it's usually in like gigabytes, terabytes, petabytes. How do you actually just unlock language models over that data, right? And so it was clear there was just like, whether it's RAG or some other paradigm, no one really knew what that answer was. And so there was clearly like technical opportunity here. Like there was just stacks that needed to be invented to actually solve this type of problem, because language models themselves didn't have access to this data. The other piece here is just like, and so if like you just dumped all this data into, let's say a model had like hypothetically an infinite context window, right? And you just dump like 50 gigabytes of data into a context window. That just seemed very inefficient to me, because you have these network transfer costs of uploading 50 gigabytes of data to get back a single response. And so I kind of realized, you know, there's always going to be some curve, regardless of like the performance of the best performing models of like cost versus performance. What RAG does is it does provide extra data points along that access, because you kind of control the amount of context you actually wanted to retrieve. And of course, like RAG as a term was still evolving back then, but it was just this whole idea of like, how do you just fetch a bunch of information to actually, you know, like stuff into the prompt. And so people even back then were kind of thinking about some of those considerations. [00:14:29]Swyx: And then you fundraised in June, or you announced your fundraiser in June. Yeah. Take us through that process of thinking about the fundraise and your plans for the company, you know, at the time. Yeah, definitely. [00:14:41]Jerry: I mean, I think we knew we wanted to, I mean, obviously we knew we wanted to fundraise. There was also a bunch of like investor interest, and it was probably pretty unusual given the, you know, like hype wave of generative AI. So like a lot of investors were kind of reaching out around like December, January, February. In the end, we went with Greylock. Greylock's great. You know, they've been great partners so far. And to be honest, like there's a lot of like great VCs out there. And a lot of them who are specialized on like open source, data, infra, and that type of stuff. What we really wanted to do was, because for us, like time was of the essence, like we wanted to ship very quickly and still kind of build Mindshare in this space. We just kept the fundraising process very efficient. I think we basically did it in like a week or like three days. And so, yeah, just like front loaded it and then just like pick the one named Jerry. Yeah, exactly. Yeah. [00:15:27]Swyx: I'm kidding. I mean, he's obviously great and Greylock's a fantastic firm. [00:15:32]Jerry: Embedding some of my research. So, yeah, just we've had Greylock. They've been great partners. I think in general, when I talk to founders about like the fundraise process, it's never like the most fun period, I think, because it's always just like, you know, there's a lot of logistics, there's lawyers you have to, you know, get in the loop. And like a lot of founders just want to go back to building. I think in the end, we're happy that we kept it to a pretty efficient process. [00:15:54]Swyx: And so you fundraise with Simon. How do you split things with him? How big is your team now? [00:15:57]Jerry: The team is growing. By the time this podcast is released, we'll probably have had one more person join the team. So basically, it's between, we're rapidly getting to like eight or nine people. At the current moment, we're around like six. And so just like there'll be some exciting developments in the next few weeks. I'm excited to announce that. So the team is, has kind of like, we've been pretty selective in terms of like how we like grow the team. Obviously, like we look for people that are really active in terms of contributions to Lum Index, people that have like very strong engineering backgrounds. And primarily, we've been kind of just looking for builders, people that kind of like grow the open source and also eventually this like managed like enterprise platform as well with us. In terms of like Simon, yeah, I've known Simon for a few years now. I knew him back at Uber ATG in Toronto. He's one of the smartest people I knew, has a sense of both like a deep understanding of ML, but also just like first principles thinking about like engineering and technical concepts in general. And I think one of my criteria, criteria is when I was like looking for a co-founder for this project with someone that was like technically better than me, because I knew I wanted like a CTO. And so honestly, like there weren't a lot of people that, I mean, there's, I know a lot of people that are smarter than me, but like that fit that bill. We're willing to do a startup and also just have the same like values that I shared. Right. And just, I think doing a startup is very hard work, right? It's not like, I'm sure like you guys all know this, it's, it's a lot of hours, a lot of late nights and you want to be like in the same place together and just like being willing to hash out stuff and have that grit basically. And I really looked for that. And so Simon really fit that bill and I think I convinced him to bring Trump on board. [00:17:24]Swyx: Yeah. And obviously I've had the pleasure of chatting and working with a little bit with both of you. What would you say those, those like your top one or two values are when, when thinking about that or the culture of the company and that kind of stuff? [00:17:36]Jerry: I think in terms of the culture of the company, it's really like, I mean, there's a few things I can name off the top of my head. One is just like passion, integrity. I think that's very important for us. We want to be honest. We don't want to like, obviously like copy code or, or kind of like, you know, just like, you know, not give attribution, those types of things and, and just like be true to ourselves. I think we're all very like down to earth, like humble people, but obviously I think just willingness to just like own stuff and dive right in. And I think grit comes with it. I think in the end, like this is a very fast moving space and we want to just like be one of the, you know, like dominant forces and helping to provide like production quality outline applications. Yeah. [00:18:11]Swyx: I promise we'll get to more technical questions, but I also want to impress on the audience that this is a very conscious and intentional company building. And since your fundraising post, which was in June, and now it's September, so it's been about three months, you've actually gained 50% in terms of stars and followers. You've 3x'd your download count to 600,000 a month and your discord membership has reached 10,000. So like a lot of ongoing growth. [00:18:37]Jerry: Yeah, definitely. And obviously there's a lot of room to expand there too. And so open source growth is going to continue to be one of our core goals because in the end it's just like, we want this thing to be, well, one big, right? We all have like big ambitions, but to just like really provide value to developers and helping them in prototyping and also productionization of their apps. And I think it turns out we're in the fortunate circumstance where a lot of different companies and individuals, right, are in that phase of like, you know, maybe they've hacked around on some initial LLM applications, but they're also looking to, you know, start to think about what are the production grade challenges necessary to actually, that to solve, to actually make this thing robust and reliable in the real world. And so we want to basically provide the tooling to do that. And to do that, we need to both spread awareness and education of a lot of the key practices of what's going on. And so a lot of this is going to be continued growth, expansion, education, and we do prioritize that very heavily. [00:19:30]Alessio: Let's dive into some of the questions you were asking yourself initially around fine tuning and RAG , how these things play together. You mentioned context. What is the minimum viable context for RAG ? So what's like a context window too small? And at the same time, maybe what's like a maximum context window? We talked before about the LLMs are U-shaped reasoners. So as the context got larger, like it really only focuses on the end and the start of the prompt and then it kind of peters down. Any learnings, any kind of like tips you want to give people as they think about it? [00:20:06]Jerry: So this is a great question. And part of what I wanted to talk about a conceptual level, especially with the idea of like thinking about what is the minimum context? Like, okay, what if the minimum context was like 10 tokens versus like, you know, 2k tokens versus like a million tokens. Right. Like, and what does that really give you? And what are the limitations if it's like 10 tokens? It's kind of like, um, like eight bit, 16 bit games, right? Like back in the day, like if you play Mario and you have like the initial Mario where the graphics were very blocky and now obviously it's like full HD, 3d, just the resolution of the context and the output will change depending on how much context you can actually fit in. So the way I kind of think about this from a more principled manner is like you have like, there's this concept of like information capacity, just this idea of like entropy, like given any fixed amount of like storage space, like how much information can you actually compact in there? And so basically a context window length is just like some fixed amount of storage space, right? And so there's some theoretical limit to the maximum amount of information you can compact until like a 4,000 token storage space. And what does that storage space use for these days with LLMs? For inputs and also outputs. And so this really controls the maximum amount of information you can feed in terms of the prompt plus the granularity of the output. If you had an infinite context window, you're going to have an infinitely detailed response and also infinitely detailed memory. But if you don't, you can only kind of represent stuff in more quantized bits, right? And so the smaller the context window, just generally speaking, the less details and maybe the less, um, and for like specific, precise information, you're going to be able to surface any given point in time. [00:21:34]Alessio: So when you have short context, is the answer just like get a better model or is the answer maybe, Hey, there needs to be a balance between fine tuning and RAG to make sure you're going to like leverage the context, but at the same time, don't keep it too low resolution? [00:21:48]Jerry: Yeah, yeah. Well, there's probably some minimum threat, like I don't think anyone wants to work with like a 10. I mean, that's just a thought exercise anyways, a 10 token context window. I think nowadays the modern context window is like 2k, 4k is enough for just like doing some sort of retrieval on granular context and be able to synthesize information. I think for most intents and purposes, that level of resolution is probably fine for most people for most use cases. I think the question there is just like, um, the limitations actually more on, okay, if you're going to actually combine this thing with some sort of retrieval data structure mechanism, there's just limitations on the retrieval side because maybe you're not actually fetching the most relevant context to actually answer this question, right? Like, yes, like given the right context, 4,000 tokens is enough. But if you're just doing like top-k similarity, like you might not be able to be fetching the right information from the documents. [00:22:34]Alessio: So how should people think about when to stick with RAG versus when to even entertain and also in terms of what's like the threshold of data that you need to actually worry about fine tuning versus like just stick with rag? Obviously you're biased because you're building a RAG company, but no, no, actually, um, I [00:22:52]Jerry: think I have like a few hot takes in here, some of which sound like a little bit contradictory or what we're actually building. And I think to be honest, I don't think anyone knows the right answer. I think this is the truth. [00:23:01]Alessio: Yeah, exactly. [00:23:01]Jerry: This is just like thought exercise towards like understanding the truth. [00:23:04]Alessio: Right. [00:23:04]Jerry: So, okay. [00:23:05]Alessio: I have a few hot takes. [00:23:05]Jerry: One is like RAG is basically just, just a hack, but it turns out it's a very good hack because what is RAG rag is you keep the model fixed and you just figure out a good way to like stuff stuff into the prompt of the language model and everything that we're doing nowadays in terms of like stuffing stuff into the prompt is just algorithmic. We're just figuring out nice algorithms to, to like retrieve right information with top case similarity, do some sort of like, uh, you know, hybrid search, some sort of like a chain of thought decomp and then just like stuff stuff into a prompt. So it's all like algorithmic and it's more like just software engineering to try to make the most out of these like existing APIs. The reason I say it's a hack is just like from a pure like optimization standpoint. If you think about this from like the machine learning lens, unless the software engineering lens, there's pieces in here that are going to be like suboptimal, right? Like, like the thing about machine learning is when you optimize like some system that can be optimized within machine learning, like the set of parameters, you're really like changing like the entire system's weights to try to optimize the subjective function. [00:24:02]Jerry: And if you just cobble a bunch of stuff together, you can't really optimize the pieces are inefficient, right? And so like a retrieval interface, like doing top cam batting lookup, that part is inefficient. [00:24:13]Jerry: If you, for instance, because there might be potentially a better, more learned retrieval algorithm, that's better. If you know, you do stuff like some sort of, I know nowadays there's this concept of how do you do like short-term and long-term memory represent stuff in some sort of vector embedding, do trunk sizes, all that stuff. It's all just like decisions that you make that aren't really optimized and it's not really automatically learned. It's more just things that you set beforehand to actually feed into the system. So I do think like there is a lot of room to actually optimize the performance of an entire LLM system, potentially in a more like machine learning based way. Right. [00:24:48]Jerry: And I will leave room for that. And this is also why I think like in the long term, I do think fine tuning will probably have like greater importance. And just like there will probably be new architectures invented that where you can actually kind of like include a lot of this under the black box, as opposed to having like hobbling together a bunch of components outside the black box. That said, just very practically given the current state of things, like even if I said RAG is a hack, it's a very good hack and it's also very easy to use. Right. [00:25:16]Jerry: And so just like for kind of like the AI engineer persona, which to be fair is kind of one of the reasons generative AI has gotten so big is because it's way more accessible for everybody to get into, as opposed to just like traditional machine learning, it tends to be good enough. [00:25:30]Jerry: Right. And if we can basically provide these existing techniques to help people really optimize how to use existing systems without having to really deeply understand machine learning, I still think that's a huge value add. And so there's very much like a UX and ease of use problem here, which is just like RAG is way easier to onboard and use. And that's probably like the primary reason why everyone should do RAG instead of fine tuning to begin with. If you think about like the 80-20 rule, like RAG very much fits within that and fine tuning doesn't really right now. And then I'm just kind of like leaving room for the future that, you know, like in the end, fine tuning can probably take over some of the aspects of like what RAG does. [00:26:04]Swyx: I don't know if this is mentioned in your explainability also allows for sourcing. And at the end of the day, like to increase trust that we have to source documents. Yeah. [00:26:14]Jerry: So, so I think what RAG does is it increases like transparency, visibility into the actual documents, right. [00:26:19]Jerry: That are getting fed into their context. [00:26:21]Swyx: Here's where they got it from. [00:26:22]Alessio: Exactly. [00:26:22]Jerry: That's definitely an advantage. I think the other piece that I think is an advantage, and I think that's something that someone actually brought up is just you can do access control with, with RAG . If you have an external storage system, you can't really do that with, with large language models. [00:26:35]Jerry: It's just like gate information to the neural net weights, like depending on the type of user for the first point, you could technically, you could technically have the language model. [00:26:45]Jerry: Like if it memorized enough information, just like a site sources, but there's a question of just trust whether or not you're actually, yeah, well, but like it makes it up right now because it's like not good enough, but imagine a world where it is good enough and it does give accurate citations. Swyx: No, I think to establish trust, you just need a direct connection.So it's, it's kind of weird. It's, it's this melding of deep learning systems versus very traditional information retrieval. Yeah, exactly. [00:27:11]Jerry: Well, so, so I think, I mean, I kind of think about it as analogous to like humans, right? [00:27:15]Jerry: Like, uh, we as humans, obviously we use the internet, we use tools. Uh, these tools have API interfaces are well-defined. Um, and obviously we're not like the tools aren't part of us. And so we're not like back propping or optimizing over these tools. And so when you think about like RAG , it's basically, um, LLM is learning how to use like a vector database to look up information that it doesn't know. And so then there's just a question of like how much information is inherent within the network itself and how much does it need to do some sort of like tool used to look up stuff that it doesn't know. [00:27:42]Jerry: And I do think there'll probably be more and more of that interplay as time goes on. [00:27:46]Swyx: Yeah. Some followups on discussions that we've had, you know, we discussed fine tuning a bit and what's your current take on whether you can, you can fine tune new knowledge into LLMs. [00:27:55]Jerry: That's one of those things where I think longterm you definitely can. I think some people say you can't, I disagree. I think you definitely can. Just right now I haven't gotten it to work yet. So, so I think like we've tried, yeah, well, um, not in a very principled way, right? Like this is something that requires like an actual research scientist and not someone that has like, you know, an hour or two per night to actually look at this. [00:28:12]Swyx: Like I, you were a research scientist at Uber. I mean, it's like full-time, full-time working. [00:28:16]Jerry: So, so I think, um, what I specifically concretely did was I took OpenAI's fine tuning endpoints and then tried to, you know, it's in like a chat message interface. And so there's like, um, input question, like a user assistant message format. And so what I did was I tried to take just some piece of text and have the LLM memorize it by just asking it a bunch of questions about the text. So given a bunch of context, I would generate some questions and then generate some response and just fine tune over the question responses. That hasn't really worked super well, but that's also because I'm, I'm just like trying to like use OpenAI's endpoints as is. If you just think about like traditional, like how you train a Transformers model, there's kind of like the, uh, instruction, like fine tuning aspect, right? You like ask it stuff when guided with correct responses, but then there's also just like, um, next token production. And that's something that you can't really do with the OpenAI API, but you can do with, if you just train it yourself and that's probably possible if you just like train it over some corpus of data. I think Shashira from Berkeley said like, you know, when they trained Gorilla, they were like, Oh, you know, this, a lot of these LLMs are actually pretty good at memorizing information. Um, just the way the API interface is exposed is just no one knows how to use them right [00:29:22]Alessio: now. Right. [00:29:22]Jerry: And so, so I think that's probably one of the issues. [00:29:24]Swyx: Just to clue people in who haven't read the paper, Gorilla is the one where they train to use specific APIs. [00:29:30]Jerry: Yeah, I think this was on the Gorilla paper. Like the, the model itself could, uh, try to learn some prior over the data to decide like what tool to pick. But there's also, it's also augmented with retrieval that helps supplement it in case like the, the, the, um, prior doesn't actually work. [00:29:45]Swyx: Is that something that you'd be interested in supporting? [00:29:48]Jerry: I mean, I think in the longterm, like if like, this is kind of how fine tuning, like RAG evolves. Like I do think there'll be some aspect where fine tuning will probably memorize some high level concepts of knowledge, but then like RAG will just be there to supplement like aspects of that, that aren't work that don't, that, that it doesn't know. Jerry: Um, the way I think about this is kind of like, obviously RAG is the default way, like to be clear, RAG right now is the default way to actually augment stuff with knowledge. I think it's just an open question of how much the LM can actually internalize both high level concepts, but also details as you can like train stuff over it. And coming from an ML background, there is a certain beauty and just baking everything into some training process of a language model. Like if you just take raw chat, GPT or chat, GPT code interpreter, right? Like GPT four, it's not like you do RAG with it. You just ask it questions about like, Hey, how do I like to find a pedantic model in Python? And I'm like, can you give me an example? Can you visualize a graph? It just does it right. Like, and we'll run it through code interpreters as a tool, but that's not like a source for knowledge. [00:30:46]Jerry: It's just an execution environment. And so there is some beauty in just like having the model itself, like just, you know, instead of you kind of defining the algorithm for what the data structure should look like the model just learns it under the hood. That said, I think the reason it's not a thing right now is just like, no one knows how to do it. [00:31:01]Jerry: It probably costs too much money. And then also like the API interfaces and just like the actual ability to kind of evaluate and improve on performance, like isn't known to most people. [00:31:12]Alessio: Yeah. [00:31:12]Swyx: It also would be better with browsing. [00:31:14]Alessio: Yeah. [00:31:16]Swyx: I wonder when they're going to put that back. [00:31:18]Alessio: Okay. Yeah. [00:31:19]Swyx: So, and then one more follow up before we go into RAG for AI engineers is on your brief mentioned about security or off. How many of your, the people that you talk to, you know, you talk to a lot of people putting LlamaIndex into production. How many people actually are there versus just like, let's just dump a whole company notion into this thing. [00:31:36]Jerry: Wait, are you talking about from like the security off standpoint? [00:31:39]Alessio: Yeah. [00:31:39]Swyx: How big a need is that? Because I, I talked to some people who are thinking about building tools in that domain, but I don't know if people want it. [00:31:47]Jerry: I mean, I think bigger companies, like just bigger companies, like banks, consulting firms, like they all want this requirement, right? The way they're using LlamaIndex is not with this, obviously. Cause I don't think we have support for like access control or author that have stuff like on a hood. [00:32:02]Jerry: Cause we're more just like an orchestration framework. And so the way they build these initial apps is more kind of like prototype. Like, let's kind of, yeah. Like, you know, use some publicly available data. That's not super sensitive. Let's like, you know, assume that every user is going to be able to have access to the same amount of knowledge, those types of things. I think users have asked for it, but I don't think that's like a P zero. Like I think the P zero is more on like, can we get this thing working before we expand this to like more users within the work? [00:32:25]Alessio: There's a bunch of pieces to rag. Obviously it's not a, just an acronym. And you two recently, you think every AI engineer should build the front scratch at least once. Why is that? I think so. [00:32:37]Jerry: I'm actually kind of curious to hear your thoughts about this. Um, but this kind of relates to the initial like AI engineering posts that you put out and then also just like the role of an AI engineer and the skills that they're going to have to learn to truly succeed because there's an entire On one end, you have people that don't really, uh, like understand the fundamentals and just want to use this to like cobble something together to build something. And I think there is a beauty in that for what it's worth. Like, it's just one of those things. And Gen AI has made it so that you can just use these models in inference only mode, call something together, use it, power your app experiences, but on the other end, what we're increasingly seeing is that like more and more developers building with these apps start running into honestly, like pretty similar issues that like we'll play just a standard engineer building like a classifier model, which is just like accuracy problems, like, and hallucinations, basically just an accuracy problem, right? [00:33:24]Like it's not giving you the right results. So what do you do? You have to iterate on the model itself. You have to figure out what parameters you tweak. You have to gain some intuition about this entire process. That workflow is pretty similar, honestly, like even if you're not training the model to just like tuning a ML model with like hyper parameters and learning like proper ML practices of like, okay, how do I have like define a good evaluation benchmark? How do I define like the right set of metrics to do to use, right? How do I actually iterate and improve the performance of this pipeline for [00:33:52]Alessio: production? What tools do I use? [00:33:53]Jerry: Right? Like every ML engineer use like some form of weights and biases, tensor boards, or like some other experimentation tracking tool. What tools should I use to actually help build like LLM applications and optimize it for production? There's like a certain amount of just like LLM ops, like tooling and concepts and just like practices that people will kind of have to internalize if they want to optimize these. And so I think that the reason I think being able to build like RAG from scratch is important is it really gives you a sense of like how things are working to get, help you build intuition about like what parameters are within a RAG system and which ones actually tweak to make them better. Cause otherwise I think that one of the advantages of the LlamaIndex quick start is it's three lines of code. The downside of that is you have zero visibility into what's actually going on [00:34:37]Alessio: under the hood. [00:34:37]Jerry: And I think there's something that we've kind of been thinking about for a while and I'm like, okay, let's just release like a new tutorial series. That's just like, we're in set, not no three lines of code. We're just going to go in and actually show you how the thing actually works on [00:34:47]Alessio: the hood. Right. [00:34:47]Jerry: And so I like, does everybody need this? Like probably not as for some people, the three lines of code might work, but I think increasingly, like honestly, 90% of the users I talked to have questions about how to improve the performance of their app. And so just like, given this, it's just like one of those things that's like better for the understanding. [00:35:03]Alessio: Yeah. [00:35:03]Swyx: I'd say it is one of the most useful tools of any sort of developer education toolkit to write things yourself from scratch. So Kelsey Hightower famously wrote Kubernetes the hard way, which is don't use Kubernetes. Here's everything that you would have to do by yourself. And you should be able to put all these things together yourself to understand the value of Kubernetes. And the same thing for LLlamaIndex. I've done, I was the guy who did the same for React. And it's a pretty good exercise for you to just fully understand everything that's going on under the hood. And I was actually going to suggest while in one of the previous conversations, there's all these like hyperparameters, like the size of the chunks and all that. And I was thinking like, what would hyperparameter optimization for RAG look [00:35:44]Alessio: like? [00:35:44]Jerry: Yeah, definitely. I mean, so absolutely. I think that's going to be an increasing thing. I think that's something we're kind of looking at because like, I think someone [00:35:52]Swyx: should just put, do like some large scale study and then just ablate everything. And just you, you tell us. [00:35:57]Jerry: I think it's going to be hard to find a universal default that works for [00:36:00]Alessio: everybody. [00:36:00]Jerry: I think it's going to be somewhat, I do think it's going to be somewhat like dependent on the data and use case. I think if there was a universal default, that would be amazing. But I think increasingly we found, you know, people are just defining their own like custom parsers for like PDFs, markdown files for like, you know, SEC filings versus like Slack conversations. And then like the use case too, like, do you want like a summarization, like the granularity of the response? Like it really affects the parameters that you want to pick. I do like the idea of hyperparameter optimization though, but it's kind of like one of those things where you are kind of like training the model basically kind of on your own data domain. [00:36:36]Alessio: Yeah. [00:36:36]Swyx: You mentioned custom parsers. You've designed LlamaIndex, maybe we can talk about like the surface area of the [00:36:41]Alessio: framework. [00:36:41]Swyx: You designed LlamaIndex in a way that it's more modular, like you mentioned. How would you describe the different components and what's customizable in each? [00:36:50]Jerry: Yeah, I think they're all customizable. And I think that there is a certain burden on us to make that more clear through the [00:36:57]Alessio: docs. [00:36:57]Jerry: Well, number four is customization tutorials. [00:36:59]Swyx: Yeah, yeah. [00:37:00]Jerry: But I think like just in general, I think we do try to make it so that you can plug in the out of the box stuff. But if you want to customize more lower level components, like we definitely encourage you to do that and plug it into the rest of our abstractions. So let me just walk through like maybe some of the basic components of LlamaIndex. There's data loaders. You can load data from different data sources. We have Llama Hub, which you guys brought up, which is, you know, a collection of different data loaders of like unstructured and unstructured data, like PDFs, file types, like Slack, Notion, all that stuff. Now you load in this data. We have a bunch of like parsers and transformers. You can split the text. You can add metadata to the text and then basically figure out a way to load it into like a vector store. So, I mean, you worked at like Airbrite, right? It's kind of like there is some aspect like E and T, right? And in terms of like transforming this data and then the L, right, loading it into some storage abstraction, we have like a bunch of integrations with different document storage systems. [00:37:49]Alessio: So that's data. [00:37:50]Jerry: And then the second piece really is about like, how do you retrieve this data? How do you like synthesize this data and how do you like do some sort of higher level reasoning over this data? So retrieval is one of the core abstractions that we have. We do encourage people to like customize, define your own retrievers, that section on kind of like how do you define your own, like custom retriever, but also we have like out of the box ones. The retrieval algorithm kind of depends on how you structure the data, obviously. Like if you just flat index everything with like chunks with like embeddings, then you can really only do like top K like lookup plus maybe like keyword search or something. But if you can index it in some sort of like hierarchy, like defined relationships, you can do more interesting things like actually traverse relationships between nodes. Then after you have this data, how do you like synthesize the data? [00:38:32]Alessio: Right. [00:38:32]Jerry: Um, and, and this is the part where you feed it into the language model. There's some response abstraction that can abstract away over like long contacts to actually still give you a response, even if the context overflows a context window. And then there's kind of these like higher level, like reasoning primitives that I'm going to define broadly. And I'm just going to call them in some general bucket of like agents, even though everybody has different definitions of agents, but you're the first to data agents, [00:38:56]Swyx: which I was very excited. [00:38:57]Alessio: Yeah. [00:38:57]Jerry: We, we kind of like coin, coin that term. And the way we, we thought about it was, you know, we wanted to think about how to use agents for, uh, like data workflows basically. And, and so what are the reasoning primitives that you want to do? So the most simple reasoning primitive you can do is some sort of routing module. It's a classifier, like given a query, just make some automated decision on what choice to pick, right? You could use LLMs. You don't have to use LLMs. You could just try and classifier basically. That's something that we might actually explore. And then the next piece is, okay, what are some higher level things? You can have the LLM like define like a query plan, right. To actually execute over the data. You can do some sort of while loop, right? That's basically what an agent loop is, which is like react a chain of thought, like the open AI function calling, like while loop to try to like take a question and try to break it down into some, some, uh, series of steps to actually try to execute to get back a response. And so there's a range and complexity from like simple reasoning primitives to more advanced ones. The way we kind of think about it is like, which ones should we implement and how do [00:39:50]Alessio: they work? [00:39:50]Jerry: Well, like, do they work well over like the types of like data tasks that we give them? [00:39:54]Alessio: How do you think about optimizing each piece? So take, um, embedding models is one piece of it. You offer fine tuning, embedding models. And I saw it was like fine tuning gives you like 5, 10% increase. What's kind of like the Delta left on the embedding side? Do you think we can get models that are like a lot better? Do you think like that's one piece where people should really not spend too much time? [00:40:16]Jerry: I just think it's, it's not the only parameter. Cause I think in the end, if you think about everything that goes into retrieval, the chunking algorithm, um, how you define like metadata will bias your embedding representations. Then there's the actual embedding model itself, which is something that you can try optimizing. And then there's like the retrieval algorithm. Are you going to just do top K? Are you going to do like hybrid search? Are you going to do auto retrieval? Like there's a bunch of parameters. And so I do think it's something everybody should try. I think by default we use like OpenAI's embedding model. A lot of people these days use like sentence transformers because it's, it's just like free open source and you can actually optimize, directly optimize it. This is an active area of exploration. I do think one of our goals is it should ideally be relatively free for every developer to just run some fine tuning process over their data to squeeze out some more points and performance. And if it's that relatively free and there's no downsides, everybody should basically do [00:41:04]Alessio: it. [00:41:04]Jerry: There's just some complexities, right? In terms of optimizing your embedding model, especially in a production grade data pipeline. If you actually fine tune the embedding model and the embedding space changes, you're going to have to reindex all your documents. And for a lot of people, that's not feasible. And so I think like Joe from Vespa on our webinars, like there's this idea that depending on if you're just using like document and query embeddings, you could keep the document embeddings frozen and just train a linear transform on the query or, or any sort of transform on the query, right? So therefore it's just a query side transformation instead of actually having to reindex all the document embeddings. That's pretty smart. We weren't able to get like huge performance gains there, but it does like improve performance a little bit. And that's something that basically, you know, everybody should be able to kick off. You can actually do that on LLlamaIndex too. [00:41:45]Swyx: OpenAIO has a cookbook on adding bias to the embeddings too, right? [00:41:49]Alessio: Yeah. [00:41:49]Jerry: There's just like different parameters that you can, you can try adding to try to like optimize the retrieval process. And the idea is just like, okay, by default you have all this text. It kind of lives in some latent space, right? [00:42:01]Swyx: Yeah. Shut out, shut out latent space. You should take a drink every time. [00:42:05]Jerry: But it lives in some latent space. But like depending on the type, specific types of questions that the user might want to ask, the latent space might not be optimized to actually retrieve the relevant piece of context that the user want to ask. So can you shift the embedding points a little bit, right? And how do we do that? Basically, that's really a key question here. So optimizing the embedding model, even changing the way you like chunk things, these all shift the embeddings. [00:42:26]Alessio: So the retrieval is interesting. I got a bunch of startup pitches that are like, like ragged school, but like there's a lot of stuff in terms of ranking that could be better. There's a lot of stuff in terms of sun setting data. Once it starts to become stale, that could be better. Are you going to move into that part too? So like you have SEC Insights as one of kind of like your demos. And that's like a great example of, Hey, I don't want to embed all the historical documents because a lot of them are outdated and I don't want them to be in the context. [00:42:55]Jerry: What's that problem space? [00:42:57]Alessio: Like how much of it are you going to also help with and versus how much you expect others to take care of? [00:43:03]Jerry: Yeah, I'm happy to talk about SEC Insights in just a bit. I think more broadly about the like overall retrieval space. We're very interested in it because a lot of these are very practical problems that [00:43:11]Alessio: people have asked us. [00:43:11]Jerry: And so the idea of outdated data, I think, how do you like deprecate or time wait data and do that in a reliable manner, I guess. So you don't just like set some parameter and all of a sudden that affects your, all your retrieval items, like is pretty important because people have started bringing [00:43:25]Alessio: that up. [00:43:25]Jerry: Like I have a bunch of duplicate documents, things get out of date. How do I like sunset documents? And then remind me, what was the, what was the first thing you said? Cause I think there was, there was something like the ranking ranking, right? [00:43:35]Alessio: Yeah. [00:43:35]Jerry: So I think this space is not new. I think everybody who is new to this space starts learning some basic concepts of information retrieval, which to be fair has been around for quite a bit. But our goal is to kind of like take some of like just general ranking and information retrieval concepts. So by encoding, like crossing coding, right? Like we're based models versus like kind of keyword based search. How do you actually evaluate retrieval? These things start becoming relevant. And so I think for us, like rather than inventing like new retriever techniques for the sake of like just inventing better ranking, we want to take existing ranking techniques and kind of like package it in a way that's like intuitive and easy for people to understand. That said, I think there are interesting and new retrieval techniques that are kind of in place that can be done when you tie it into some downstream rack system. The reason for this is just like, if you think about the idea of like chunking text, right? Like that just really wasn't a thing, or at least for this specific purpose, like the reason chunking is a thing in RAG right now is because like you want to fit within the context bundle of an LLM, right? Like why do you want to chunk a document? That just was less of a thing. I think back then, if you wanted to like transform a document, it was more for like structured data extraction or something in the past. And so there's kind of like certain new concepts that you got to play with that you can use to invent kind of more interesting retrieval techniques. Another example here is actually LLM based reasoning, like LLM based chain of thought reasoning. You can take a question, break it down into smaller components and use that to actually send to your retrieval system. And that gives you better results. And it's kind of like sending the full question to a retrieval system. That also wasn't really a thing back then, but then you can kind of figure out an interesting way to like blending old and the new, right? With LLMs and data. [00:45:13]Swyx: There's a lot of ideas that you come across. Do you have a store of them? [00:45:17]Jerry: Yeah, I think I, sometimes I get like inspiration. There's like some problem statement and I'm just like, oh, it's like, following you is [00:45:23]Swyx: very hard because it's just a lot of homework. [00:45:25]Jerry: So I think I've, I've started to like step on the brakes just a little bit. Cause then I start, no, no, no. Well, the, the reason is just like, okay, if I just have invent like a hundred more retrieval techniques, like, like sure. But like, how do people know which one is good and which one's like bad. [00:45:41]Alessio: Right. [00:45:41]Jerry: And so have a librarian, right? [00:45:42]Swyx: Like it's going to catalog it and you're going to need some like benchmarks. [00:45:45]Jerry: And so I think that's probably the focus for the next, next few weeks is actually like properly kind of like having an understanding of like, oh, you know, when should you do this or like, what does this actually work well? [00:45:54]Alessio: Yeah. [00:45:54]Swyx: Some kind of like a, maybe like a flow chart, decision tree type of thing. Yeah, exactly. When this do that, you know, something like that, that would be really helpful for me. [00:46:02]Alessio: Thank you. [00:46:02]Swyx: It seems like your most successful side project. Yeah. What is SEC Insights for our listeners? [00:46:07]Jerry: Um, our SEC Insights is a full stack LLM chatbot application, um, that does. Analysis of your sec 10 K and 10 Q filings. And so the goal for building this project is really twofold. The reason we started building this was one, it was a great way to dog food, the production readiness for our library. We actually ended up like adding a bunch of stuff and fixing a ton of bugs because of this. And I think it was great because like, you know, thinking about how we handle like callbacks streaming, actually generating like reliable sub responses and bubbling up sources, citations. These are all things that like, you know, if you're just building the library in isolation, you don't really think about it. But if you're trying to tie this into a downstream application, like it really starts mattering for your error messages. When you talk about bubbling up stuff for like sources, like if you go into SEC Insights and you type something, you can actually see the highlights in the right side. That was something that like took a little bit of like, um, understanding to figure out how to build wall. And so it was great for dog fooding improvement of the library itself. And then as we're building the app, um, the second thing was we're starting to talk to users and just like trying to showcase like kind of, uh, bigger companies, like the potential of LLM index as a framework, because these days obviously building a chatbot, right. With Streamlight or something, it'll take you like 30 minutes or an hour. Like there's plenty of templates out there on LLM index, like train, like you can just build a chatbot, but how do you build something that kind of like satisfies some of these, uh, this like criteria of surfacing, like citations, being transparent, seeing like, uh, having a good UX, um, and then also being able to handle different types of questions, right? Like more complex questions that compare different documents. That's something that I think people are still trying to explore. And so what we did was like, we showed, well, first like organizations, the possibilities of like what you can do when you actually build something like this. And then after like, you know, we kind of like stealth launched this for fun, just as a separate project, uh, just to see if we could get feedback from users who are using this world to see like, you know, how we can improve stuff. And then we were thought, we thought like, ah, you know, we built this, right? Obviously we're not going to sell like a financial app. Like that's not really our, in our wheelhouse, but we're just going to open source the entire thing. And so that now is basically just like a really nice, like full stack app template you can use and customize on your own, right. To build your own chatbot, whether it is a really financial documents or like other types of documents. Um, and it provides like a nice template for basically anybody to kind of like go in and get started. There's certain components though, that like aren't released yet that we're going to going to, and then next few weeks, like one is just like kind of more detailed guides on like different modular components within it. So if you're like a full stack developer, you can go in and actually take the pieces that you want and actually kind of build your own custom flows. The second piece is like, take, there's like certain components in there that might not be directly related to the LLM app that would be nice to just like have people use, uh, an example is the PDF viewer, like the PDF viewer with like citations. I think we're just going to give that right. So, you know, you could be using any library you want, but then you can just, you know, just drop in a PDF viewer. [00:48:53]Alessio: Right. [00:48:53]Jerry: So that it's just like a fun little module that you can do. [00:48:55]Swyx: Nice. That's really good community service right there. I want to talk a little bit about your cloud offering, because you mentioned, I forget the name that you had for it. [00:49:04]Alessio: Enterprise something. [00:49:04]Jerry: Well, one, we haven't come up with a name. Uh, we're kind of calling it LLM index platform, platform LLM index enterprise. I'm open to suggestions here. Um, and the second thing is I don't actually know how much I can, I can share right now because it's mostly kind of like, uh, we, we, yeah, exactly. [00:49:20]Swyx: To the extent that you can talk about LLM index as a business. Um, always just want to give people in the mind, like, Hey, like you sell things too, you know what I mean? [00:49:28]Jerry: Yeah, a hundred percent. So I think the high level of what I can probably say is just like, I think we're looking at ways of like actively kind of complimenting the developer experience, like building LLM index. We've always been very focused on stuff around like plugging in your data into the language model. And so can we build tools that help like augment that experience beyond the open [00:49:47]Alessio: source library? Right. [00:49:48]Jerry: And so I think what we're going to do is like make a build an experience where it's very seamless to transition from the open source library with like a one line toggle, you can basically get this like complimentary service and then figure out a way to like monetize in a bit. I think where our revenue focus this year is less emphasized. Like it's more just about like, can we build some manage offering that like provides complimentary value to what the open source library provides? [00:50:09]Alessio: Yeah. [00:50:10]Swyx: I think it's the classic thing about all open source is you want to start building the most popular open source projects in your category to own that category. You're going to make it very easy to host. Therefore you're just built your biggest competitor, which is you. [00:50:22]Jerry: I think it will be like complimentary. Cause I think it will be like, you know, use the open source library and then you have a toggle and all of a sudden, you know, you can see this basically like a pipeline ish thing pop up and then it will be able to kind of like, you'll have a UI. There'll be some enterprise guarantees and the end goal would be to help you build like a production RAG app more easily. [00:50:42]Alessio: Data loaders. There's a lot of them. What are maybe some of the most popular, maybe under, not underrated, but like underexpected, you know, and how has the open source side of it helped with like getting a lot more connectors, you only have six people on the team today, so you couldn't have done it all yourself. [00:51:00]Jerry: Yeah. I think the nice thing about like Walmart hub itself, it's supposed to be a community driven hub. Um, and so actually the bulk of the peers are completely community contributed. Um, and so we haven't written that many like first party connectors actually for this, it's more just like a kind of encouraging people to contribute to the community in terms of the most popular tools, uh, or the data loaders. I think we have Google analytics on this and I forgot the specifics. It's some mix of like the PDF loaders. We have like 10 of them, but there's some subset of them that are popular. And then there's Google, like I think Gmail and like G drive. Um, and then I think maybe it's like one of Slack or notion. One thing I will say though, uh, and I think like Swix might probably knows this better than I do, given that you were, she used to work at air bite. It's very hard to build, like, especially for full on service, like notion Slack or like Salesforce to build like a really, really high quality loader that really extracts all the information that people want. [00:51:51]Alessio: Right. [00:51:51]Jerry: And so I think the thing is when people start out, like they will probably use these loaders and it's a great tool to get started. And for a lot of people, it's like good enough. And they submit PRs if they want more additional features. But if you get to a point where you actually want to call like an API that hasn't been supported yet, or, you know, you want to load in stuff that like in metadata or something that hasn't been directly baked into the logic of a loader itself, people start adding up, like writing their own custom loaders. And that is a thing that we're seeing. That's something that we're okay with. [00:52:18]Alessio: Right. [00:52:18]Jerry: Cause like a lot of this is more just like community driven. And if you want to submit a PR to improve the existing one, you can, otherwise you can create your own custom ones. [00:52:24]Alessio: Yeah. [00:52:25]Swyx: And all that is custom loaders all supported within LLlamaIndex, or do you pair it with something else? [00:52:29]Jerry: Oh, it's just like, I mean, you just define your own subclass. I think, I think that's it. [00:52:33]Alessio: Yeah. Yeah. [00:52:33]Swyx: Cause typically in the data ecosystem with everybody, everybody has his own strategies with custom loaders, but also you could write your own with like Dagster or like Prefect or one of those tools. [00:52:43]Alessio: Yeah. [00:52:44]Jerry: Yeah, exactly. So I think for us, it's more, we just have a very flexible like document abstraction that you can fill in with any content that you want. [00:52:50]Swyx: Are people really dumping all their Gmail into these things? You said Gmail is number two. Uh, I'm not sure actually. I mean, that's these, you know, that's the most private data source. [00:52:59]Alessio: That's true. [00:53:00]Swyx: So I'm surprised that people are dumping too. I mean, I'm sure some, some people are, but like, I'm sure I'm surprised it's [00:53:06]Alessio: popular. [00:53:06]Swyx: Well, and then, so, uh, the LLM engine, uh, I assume OpenAI is going to be a majority. Is it an overwhelming majority? Uh, how, what's the market share between like OpenAI, Cohere, Anthropic, you know, whatever you're seeing. [00:53:21]Alessio: OpenSource too. [00:53:21]Jerry: Yeah, I think it's probably some, uh, OpenAI has a majority, but then like there's Anthropic and there's also, um, OpenSource. I think there is a lot of people trying out like Llama 2, um, and, and, um, some variant of like a top OpenSource model. [00:53:33]Swyx: Side note, any confusion there, Llama 2 versus Llama? [00:53:36]Jerry: Yeah, I think whenever I go to these talks, I always open it up with like, we started before it. Yeah, exactly. We start before meta, right? [00:53:43]Alessio: I want to point that out. [00:53:43]Jerry: Uh, but no, for us, we try to use it for like branding. We just add two llamas when we have like a Llama 2 integration instead of one llama. So I think a lot of people are trying out the popular OpenSource models. Uh, there's a lot of toolkits and OpenSource projects that allow you to self-host and deploy Llama 2 and like, oh, Llama is just a very recent example. I think that we, we added integration with, and so we just, uh, by virtue of having more of these services, I think more and more people are trying it out. [00:54:07]Swyx: Do you think there's, there's potential there? Is like, um, is that going to be an increasing trend? Like OpenSource? [00:54:12]Alessio: Yeah. [00:54:12]Jerry: Yeah, definitely. I think in general people hate monopolies. And so, um, like there's a, whenever like OpenAI has something really cool or like any, um, company has something really cool, even meta, like there's just going to be a huge competitive pressure from other people to do something that's more open and better. Um, and so I do think just market pressures will, will improve like OpenSource adoption. [00:54:32]Swyx: Last thing I'll say about this, which is just really like, it gets clicks. It's people like psychologically want that, but then at the end of the day, they want, they fall for brand name and popular and performance benchmarks. You know, at the end of the day, OpenAI still wins on that. I think that's true. [00:54:47]Jerry: But I, I just think like, unless you were like an active employee at OpenAI, right? Like all these research labs are putting out like ML, like PhDs or kind of like other companies too, that are investing a lot of dollars. Uh, there's going to be a lot of like competitive pressures developed, like better models. So is it going to be like all fully open source with like a permissive license? Like, I'm not completely sure, but like, there's just a lot of just incentive for people to develop their stuff here. [00:55:09]Swyx: Have you looked at like RAG specific models, like contextual? [00:55:12]Alessio: No. [00:55:13]Jerry: Is it public? [00:55:14]Swyx: No, they literally just, uh, so Dewey Keeler. I think it's his name. And you probably came across him. He wrote the RAG paper at Meta and just started contextual AI to create a RAG specific model. I don't know what that means. I was hoping that you do, cause it's your business. [00:55:29]Jerry: I had insider information. I mean, you know, to be honest, I think this, this kind of relates to my previous point on like RAG and fine tuning, like a RAG specific model is a model architecture that's designed for better RAG and it's less the software engineering principle of like, how can I take existing stuff and just plug and play different components into it? Um, and there's a beauty in that from ease of use and modularity, but when you want to end to end optimize the thing, you might want a more specific model. I think, I think building your own models is honestly pretty hard. Um, and I think the issue is if you also build your own models, like you're also just gonna have to keep up with like the rate of LM advances, like how, like basically the question is when GPT five and six and whatever, like anthropic cloud three comes out, how can you prove that you're actually better than, uh, software developers cobbling together and components on top of a base model. Right. Even if it's just like conceptually, this is better than maybe like GPT three or GPT four. [00:56:21]Alessio: What about vector stores? I know Spooks is wearing a chroma sweatshirt. [00:56:25]Swyx: Yeah, because they use a swagging. [00:56:27]Jerry: I have, I have the mug from Chroma. [00:56:29]Alessio: Yeah. It's been great. Yeah. [00:56:30]Jerry: What do you think there? [00:56:31]Alessio: Like there's a lot of them. Are they pretty interchangeable for like your users use case? Uh, is HNSW all we need? Is there room for improvements? [00:56:40]Swyx: Is NTRA all we need? [00:56:42]Jerry: I think, um, yeah, we try to remain unopinionated about storage providers. So it's not like we don't try to like play favorites. So we have like a bunch of integrations obviously. And we, the way we try to do it is we just tried to find like some standard interfaces, but obviously like different vector stores will support kind of like, uh, slightly additional things like metadata filters and those things. I mean, the goal is to have our users basically leave it up to them to try to figure out like what makes sense for their use case in terms of like the algorithm itself, I don't think the Delta on like improving the vector store, like. Embedding lookup algorithm. [00:57:10]Alessio: Is that high? [00:57:10]Jerry: I think the stuff has been mostly solved or at least there's just a lot of other stuff you can do to try to improve the overall performance. No, I mean like everything else that we just talked about, like in terms of like [00:57:20]Alessio: accuracy, right. [00:57:20]Jerry: To improve rag, like everything that we talked about, like chunking, like metadata, like. [00:57:24]Swyx: I mean, I was just thinking like, maybe for me, the interesting question is, you know, there are like eight, it's a kind of game of thrones. There's like eight, the war of eight databases right now. Oh, I see. Um, how do they stand out and how did they become very good partners? [00:57:36]Alessio: If not my index. [00:57:36]Jerry: Yeah, we're pretty good partners with, with most of them. [00:57:39]Alessio: Uh, let's see. [00:57:39]Swyx: Well, like if you're a, you know, vector database founder, like what do you, what do you work on? [00:57:44]Alessio: It's a good question. [00:57:44]Jerry: I think one thing I'm very interested in is, and this is something I think I've started to see a general trend towards is combining structured data querying with unstructured data querying. Um, and I think that will probably just expand the query sophistication of these vector stores and basically make it so that users don't have to think about whether they would just call this like hybrid querying. [00:58:05]Swyx: Is that what we've it's doing? [00:58:06]Alessio: Yeah. [00:58:07]Jerry: I mean, I think like, if you think about metadata filters, that's basically a structured filter. It's like our select where something equals something, and then you combine that with semantic search. I think like Lance DB or something was like, uh, try, I was trying to do some like joint interface. The reason is like most data is semi-structured. There's some structured annotations and there's some like unstructured texts. And so like, um, somehow combining all the expressivity of like SQL with like the flexibility of semantic search is something that I think is going to be really important. We have some basic hacks right now that allow you to jointly query both a SQL database and like a separate SQL database and a vector store to like combine the information. That's obviously going to be less efficient than if you just combined it into one [00:58:46]Alessio: system. Yeah. [00:58:46]Jerry: And so I think like PG vector, like, you know, that type of stuff, I think it's starting to get there, but like in general, like how do you have an expressive query language to actually do like structured querying along with like all the capabilities, semantic search. [00:58:57]Swyx: So your current favorite is just put it into Postgres. No, no, no. We don't play with Postgres language, the query language. [00:59:05]Jerry: I actually don't know what the best language would be for this, because I think it will be something that like the model hasn't been fine-tuned over. Um, and so you might want to train the model over this, but some way of like expressing structured data filters, and this could be include time too, right? It could, it doesn't have to just be like a where clause with this idea of like a [00:59:26]Alessio: semantic search. Yeah. [00:59:27]Swyx: And we talked about, uh, graph representations. [00:59:30]Alessio: Yeah. Oh yeah. [00:59:30]Jerry: That's another thing too. And there's like, yeah. So that's actually something I didn't even bring up yet. Like there's this interesting idea of like, can you actually have the language model, like explore like relationships within the data too, right? And somehow combine that information with stuff that's like more and more, um, structured within the DB. [00:59:46]Alessio: Awesome. [00:59:46]Swyx: What are your current strong beliefs about how to evaluate RAG ? [00:59:49]Jerry: I think I have thoughts. I think we're trying to curate this into some like more opinionated principles because there's some like open questions here. I think one question I had to think about is whether you should do like evals like component by component first, or is yours do the end to end thing? I think you should, you might actually just want to do the end to end thing first, just to do a sanity check of whether or not like this, uh, given a query and the final response, whether or not it even makes sense, like you eyeball [01:00:11]Alessio: it, right. [01:00:11]Jerry: And then you like try to do some basic evals. And then once you like diagnose what the issue is, then you go into the kind of like specific area to define some more, uh, solid benchmarks and try to like [01:00:21]Alessio: improve stuff. [01:00:21]Jerry: So what is Antoine evals? Like it's, you, um, have a query, it goes in through retrieval system. You get back something, you synthesize response, and that's your final thing. And you evaluate the quality of the final response. And these days, there's plenty of projects like startups, like companies research, doing stuff around like GPT-4, right. As like a human judge to basically kind of like synthetically generate data. [01:00:41]Swyx: I don't know from the startup side. [01:00:43]Jerry: I just know from a technical side, I think, I think people are going to do more of it. The main issue right now is just, uh, it's really unreliable. Like it's, it's just, uh, like there's like variants on the response, whatever you want. [01:00:54]Alessio: They won't do more of it. [01:00:54]Swyx: I mean, cause it's bad. [01:00:55]Jerry: No, but, but these models will get better and you'll probably fine tune a model to [01:00:59]Alessio: be a better judge. [01:00:59]Jerry: I think that's probably what's going to happen. So I'm like reasonably bullish on this because I don't think there's really a good alternative beyond you just human annotating a bunch of data sets, um, and then trying to like just manually go through and curating, like evaluating eval metrics. And so this is just going to be a more scalable solution in terms of the [01:01:17]Alessio: startups. Yeah. [01:01:17]Jerry: I mean, I think there's a bunch of companies doing this in the end. It probably comes down to some aspect of like UX speed, whether you can like fine tune a model. So that's end to end evals. And then I think like what we found is for rag, a lot of times, like, uh, what ends up affecting this, like end response is retrieval. You're just not able to retrieve the right response. And so I think having proper retrieval benchmarks, especially if you want to do production RAG is, is actually quite important. I think what does having good retrieval metrics tell you? It tells you that at least like the retrieval is good. It doesn't necessarily guarantee the end generation is good, but at least it gives you some, uh, sanity track, right? So you can like fix one component while optimizing the rest, what retrieval like evaluation is pretty standard. And it's been around for a while. It's just like an IR problem. Basically you have some like input query, you get back some retrieves out of context, and then there's some ground truth and that ranked set. And then you try to measure it based on ranking metrics. So the closer that ground truth is to the top, the more you reward the evals. And then the closer it is to the bottom where if it's not in the retrieve side at all, then you penalize the evals. Um, and so that's just like a classic ranking problem. I think like most people starting out probably don't know how to do this right [01:02:28]Alessio: now. [01:02:28]Jerry: We, we just launched them like basic retrieval evaluation modules to help users [01:02:32]Alessio: do this. [01:02:32]Jerry: One is just like curating this data set in the first place. And one thing that we're very interested in is this idea of like synthetic data set generation for evals. So how can you give in some context, generate a set of questions with Drupal 2.4, and then all of a sudden you have like question and then context pairs, and that becomes your ground truth. [01:02:47]Swyx: Are data agent evals the same thing, or is there a separate set of stuff for agents that you think is relevant here? [01:02:53]Jerry: Yeah, I think data agents add like another layer of complexity. Cause then it's just like, you have just more loops in the system. Like you can evaluate like each chain of thought loop itself, like every LLM call to see whether or not the input to that specific step in the chain of thought process actually works or is correct. Or you can evaluate like the final response to see if that's correct. This gets even more complicated when you do like multi-agent stuff, because now you have like some communication between like different agents. Like you have a top level orchestration agent passing it on to some low level [01:03:24]Alessio: stuff. [01:03:24]Jerry: I'm probably less familiar with kind of like agent eval frameworks. I know they're, they're starting to be, become a thing. Talking to like June from the Drown of Agents paper, which is pretty unrelated to what we're doing now. But it's very interesting where it's like, so you can kind of evaluate like overall agent simulations by just like kind of understanding whether or not they like modeled the distribution of human behavior. But that's not like a very macro principle. [01:03:46]Alessio: Right. [01:03:46]Jerry: And that's very much to evaluate stuff, to kind of like model the distribution of [01:03:51]Alessio: things. [01:03:51]Jerry: And I think that works well when you're trying to like generate something for like creative purposes, but for stuff where you really want the agent to like achieve a certain task, it really is like whether or not it achieved the task or not. [01:04:01]Alessio: Right. [01:04:01]Jerry: Cause then it's not like, Oh, does it generally mimic human behavior? It's like, no, like did you like send this email or not? [01:04:07]Alessio: Right. [01:04:07]Jerry: Like, cause otherwise like this, this thing didn't work. [01:04:09]Alessio: Awesome. Let's jump into a lightning round. So we have two questions, acceleration, exploration, and then one final tag away. The acceleration question is what's something that already happened in AI that you thought would take much longer to get here? [01:04:23]Jerry: I think just the ability of LLMs to generate believable outputs and for text and also for images. And I think just the whole reason I started hacking around with LLMs, honestly, I felt like I got into it pretty late. I should've gotten into it like early 2022 because UB23 had been out for a while. Like just the fact that there was this engine that was capable of like reasoning and no one was really like tapping into it. And then the fact that, you know, I used to work in image generation for a while. Like I did GANs and stuff back in the day. And that was like pretty hard to train. You would generate these like 32 by 32 images. And then now taking a look at some of the stuff by like Dolly and, and, you know, mid journey and those things. So it's, it's just, it's, it's very good. [01:04:59]Alessio: Yeah. [01:04:59]Swyx: Exploration. What do you think is the most interesting unsolved question in AI? [01:05:03]Jerry: Yeah, I'd probably work on some aspect of, um, like personalization of memory. Like, I think I actually think that I don't think anyone's like, I think a lot of people have thoughts about that, but like, for what it's worth, I don't think the final state will be right. I think it will be some, some like fancy algorithm or architecture where you like bake it into like the, the architecture of the model itself. Like if, if you have like a personalized assistant that you can talk to that will like learn behaviors over time, right. And learn stuff through like conversation history, what exactly is the right architecture there? I do think that will be part of like the wrong continuous fine tuning. [01:05:38]Swyx: Yeah. [01:05:39]Jerry: Like some aspect of that, right. [01:05:40]Alessio: Right. [01:05:40]Jerry: Like these are like, I don't actually know the specific technique, but I don't think it's just going to be something where you have like a fixed vector store and that, that thing will be like the thing that restores all your memories. [01:05:48]Swyx: It's interesting because I feel like using model weights for memory, it's just such an unreliable storage device. [01:05:56]Jerry: I know. But like, I just think, uh, from like the AGI, like, you know, just modeling like the human brain perspective, I think that there is something nice about just like being able to optimize that system. [01:06:08]Alessio: Right. [01:06:08]Jerry: And to optimize a system, you need parameters and then that's where you just get into the neural net piece. [01:06:12]Alessio: Cool. Cool. Uh, and yeah, take away, you got the audience ear. What's something you want everyone to think about or yeah, take away from this conversation and your thinking. [01:06:24]Jerry: I think there were a few key things. Uh, so we talked about two of them already, which was SEC Insights, which if you guys haven't tracked it out, I've definitely encouraged you to do so because it's not just like a random like sec app, it's like a full stack thing that we open source, right. And so if you guys want to track it out, I would definitely do that. It provides a template for you to build kind of like production grade rack apps. Um, and we're going to open source like, and modularize more components of that soon and do a workshop on, um, yeah. And the second piece is I think we are thinking a lot about like retrieval and evals. Um, I think right now we're kind of exploring integrations with like a few different partners. And so hopefully some of that will be, uh, really soon. And so just like, how do you basically have an experience where you just like write law index code, all of a sudden you can easily run like retrievals, evals, and like traces, all that stuff. And, and like a service. And so I think we're working with like a few providers on that. And then the other piece, which we did talk about already is this idea of like, yeah, building like RAG from scratch. I mean, I think everybody should do it. I think I would check out the guide. If you guys haven't already, I think it's in our docs, but instead of just using, you know, either the kind of like the retriever query engine and lamin decks or like the conversational QA train and Lang train, it's, I would take a look at how do you actually chunk parse data and do like top cam batting retrieval, because I really think that by doing that process, it helps you understand the decisions, the prompts, the language models to use. [01:07:42]Alessio: That's it. Yeah. [01:07:44]Swyx: Thank you so much, Jerry. [01:07:45]Alessio: Yeah. [01:07:45]Jerry: Thank you. [01:07:46] Get full access to Latent Space at www.latent.space/subscribe
01:08:0605/10/2023
Building the Foundation Model Ops Platform — with Raza Habib of Humanloop
Want to help define the AI Engineer stack? >500 folks have weighed in on the top tools, communities and builders for the first State of AI Engineering survey! Please fill it out (and help us reach 1000!)The AI Engineer Summit schedule is now live! We are running two Summits and judging two Hackathons this Oct. As usual, see our Discord and community page for all events.A rite of passage for every AI Engineer is shipping a quick and easy demo, and then having to cobble together a bunch of solutions for prompt sharing and versioning, running prompt evals and monitoring, storing data and finetuning as their AI apps go from playground to production. This happens to be Humanloop’s exact pitch.full show notes: https://latent.space/p/humanloopTimestamps* [00:01:21] Introducing Raza* [00:10:52] Humanloop Origins* [00:19:25] What is HumanLoop?* [00:20:57] Who is the Buyer of PromptOps?* [00:22:21] HumanLoop Features* [00:22:49] The Three Stages of Prompt Evals* [00:24:34] The Three Types of Human Feedback* [00:27:21] UI vs BI for AI* [00:28:26] LangSmith vs HumanLoop comparisons* [00:31:46] The TAM of PromptOps* [00:32:58] How to Be Early* [00:34:41] 6 Orders of Magnitude* [00:36:09] Becoming an Enterprise Ready AI Infra Startup* [00:40:41] Killer Usecases of AI* [00:43:56] HumanLoop's new Free Tier and Pricing* [00:45:20] Addressing Graduation Risk* [00:48:11] On Company Building* [00:49:58] On Opinionatedness* [00:51:09] HumanLoop Hiring* [00:52:42] How HumanLoop thinks about PMF* [00:55:16] Market: LMOps vs MLOps* [00:57:01] Impact of Multimodal Models* [00:57:58] Prompt Engineering vs AI Engineering* [01:00:11] LLM Cascades and Probabilistic AI Languages* [01:02:02] Prompt Injection and Prompt Security* [01:03:24] Finetuning vs HumanLoop* [01:04:43] Open Standards in LLM Tooling* [01:06:05] Did GPT4 Get Dumber?* [01:07:29] Europe's AI Scene* [01:09:31] Just move to SF (in The Arena)* [01:12:23] Lightning Round - Acceleration* [01:13:48] Continual Learning* [01:15:02] DeepMind Gato Explanation* [01:17:40] Motivations from Academia to Startup* [01:19:52] Lightning Round - The Takeaway Get full access to Latent Space at www.latent.space/subscribe
01:21:1729/09/2023
Heralds of the AI Content Flippening — with Youssef Rizk of Wondercraft.ai
Want to help define the AI Engineer stack? Have opinions on the top tools, communities and builders? We’re collaborating with friends at Amplify to launch the first State of AI Engineering survey! Please fill it out (and tell your friends)!In March, we started off our GPT4 coverage framing one of this year’s key forks in the road as the “Year of Multimodal vs Multimodel AI”. 6 months in, neither has panned out yet. The vast majority of LLM usage still defaults to chatbots built atop OpenAI (per our LangSmith discussion), and rumored GPU shortages have prevented the broader rollout of GPT-4 Vision. Most "AI media” demos like AI Drake and AI South Park turned out heavily human engineered, to the point where the AI label is more marketing than honest reflection of value contributed.However, the biggest impact of multimodal AI in our lives this year has been a relatively simple product - the daily HN Recap podcast produced by Wondercraft.ai, a 5 month old AI podcasting startup. As swyx observed, the “content flippening” — an event horizon when the majority of content you choose to consume is primarily AI generated/augmented rather than primarily human/manually produced — has now gone from unthinkable to possible.For full show notes, go to: https://latent.space/p/wondercraftTimestamps* [00:03:15] What is Wondercraft?* [00:08:22] Features of Wondercraft* [00:10:42] Types of Podcasts* [00:11:44] The Importance of Consistency* [00:14:01] Wondercraft House Podcasts* [00:19:27] Video Translation and Dubbing* [00:21:49] Building Wondercraft in 1 Day* [00:24:25] What is your moat?* [00:30:37] Audio Generation stack* [00:32:12] How Important is it to Sound Human? and AI Uncanny Valley* [00:36:02] AI Watermarking* [00:36:32] The Text to Speech Industry* [00:41:19] Voice Synthesis Research* [00:45:53] AI Podcaster interviews Human Podcaster* [00:50:38] Takeaway Get full access to Latent Space at www.latent.space/subscribe
52:3720/09/2023
Doing it the Hard Way: Making the AI engine and language 🔥 of the future — with Chris Lattner of Modular
Want to help define the AI Engineer stack? Have opinions on the top tools, communities and builders? We’re collaborating with friends at Amplify to launch the first State of AI Engineering survey! Please fill it out (and tell your friends)!If AI is so important, why is its software so bad?This was the motivating question for Chris Lattner as he reconnected with his product counterpart on Tensorflow, Tim Davis, and started working on a modular solution to the problem of sprawling, monolithic, fragmented platforms in AI development. They announced a $30m seed in 2022 and, following their successful double launch of Modular/Mojo🔥 in May, have just announced their $100m Series A.While the performance claims of Mojo🔥 and its promise as a fully multithreaded compiled Python superset stole the show, we were amazed to learn that it is a side project - and the vision for Modular’s Python inference engine is at least as big.Listeners will recall that we last talked with George Hotz about his work on tinygrad and how he wants to replace PyTorch with something faster and lighter, handwriting a “reduced instruction set” of operators himself. But what if the problem could be solved at even lower level - with the Python engine/runtime itself?Chris on CompilersChris’ history with compilers is well known - creating LLVM during his PhD (for which he won the 2012 ACM Software System Award), hired straight into Apple where he also made Clang and Swift (the iPhone programming language that replaced Objective-C), then leading the Tensorflow Infrastructure team at Google where he built XLA, a just-in-time compiler for optimizing a lot of the algebra behind TF’s workloads, and MLIR, a modular compiler framework that sat above LLVM to optimize ML graphs and kernels that were hard to represent in the LLVM IR. So as pretty much the best compiler engineer in human history, you’d justifiably assume that Chris is simply choosing to take his compiler approach to Python. And yet that is not how he thinks about compilers at all. As he says in our chat,“How do you enable invention? How do you get more kinds of people that understand different parts of this problem to actually collaborate? And so this is where I see our work on Mojo and on the engine……I don't have a compiler hammer that I'm running around looking for compiler problems to hit.”Today a small number of people at companies like OpenAI spend a lot of time manually writing CUDA kernels. But an optimizing compiler for AI leads to compilers as a means to an end for increasing software collaboration, expanding the ability of people with different skillsets and knowledge.“…What is the fundamental purpose of a compiler? Well, it's to make it so that you don't have to know as much about the hardware. You could write everything in very low-level assembly code for every single problem that you have… But what a compiler really does is it allows you to express things at a higher level of abstraction.”For Chris, compilers are also ways to properly automate generalized optimizations that might otherwise be manually coded and brittle abstractions, like operator fusion:“So NVIDIA goes and they build this really cool library called FasterTransformer. The performance point of using it is massive. So a lot of LLM companies and other folks use this thing because they want the performance.…Here's the problem. If you want to go innovate in transformers, now you're constrained by what FasterTransformer can do, right? And so, again, you come back to where are compilers useful?They're useful for generalization. If you can get the same quality result or better than FasterTransformer, but with a generalized architecture, well now you can get the best of both worlds, where you have orthogonality and composability, you enable research, you also get better performance.”Done correctly, these operator optimizations being implemented at the compiler level amount to an “AI Engine” that can not only survive, but enable major architecture shifts should a credible alternative LLM architecture come along someday.Modular — the Unified AI EngineModular’s original goal was to build the “Unified AI Engine” to speed up AI development and inference - one that doesn’t assume an “AI = GPUs” world that only benefits the “GPU-rich”, but one that treats AI as “a large-scale, heterogeneous, parallel compute problem”.Modular itself is an engine (separate from Mojo, which we cover below) that can run all other frameworks between 10% to 650% faster on CPUs (with GPU support coming in the fall):At Google, Chris’ job wasn’t to build the best possible compiler for AI. The goal was to build the best compiler for TPUs, so that all TensorFlow users would have a great Google Cloud experience. Similarly, the PyTorch team at Meta isn’t trying to make AI faster for the world, but mostly for their recommendations and ads systems. Chris and Tim realized that the AI engine and developer experience isn’t a product prioritized by any of the big tech companies (they tried) - so they see Modular as the best way to deliver the AI development platform of the future. The modularity of Modular shines through in the hot-swapping Inference Engine demo, which has to be seen to be believed.Mojo 🔥 — Blazing Fast PythonThe other piece of Modular is Mojo, a new programming language for AI that is a superset of Python. In some sense it is “the ultimate yak shave”: We were shocked to learn that Chris and the team didn’t initially set out to create Mojo, but it started life as an internal DSL to make themselves more productive.Mojo adopted Python’s syntax since it’s by far the most used language in machine learning and AI. It also lets them supports all existing PyPi packages, requiring no code changes for developers to go from Python to Mojo. Mojo comes with a lot of different underlying design choices that lead to much better performance:* It’s compiled rather than interpreted like Python* No GIL which allows for multi-threading* Better heap representation* Leverages MLIRIn the perfect test scenario that leverages all of these improvements, Mojo is up to ~68,000x faster than Python 🔥 (fire emoji is a valid file extension for Mojo files, btw!). Of course, that is just one microbenchmark, but as Jeremy Howard explains, most Python codebases should run between 10-100x faster simply by moving to Mojo with very minor adjustments. A community member port of Llama2 from Python to Mojo shows it inferencing >100x faster than Python, and 20% faster than the handcoded raw C implementation.The Modular team is embarking in one of the hardest technical challenges we’ve seen a startup tackle, and we can’t wait to see what comes out of it. We had an amazing conversation with Chris diving into all the details, which we hope you enjoy!Show Notes* Modular AI* Chris’ personal website* Scott Forstall* Bret Victor’s Playgrounds* Karpathy’s Tweets* Speculative Execution* Llama memory constraints* LLVM* Clang* Swift* TensorFlow* PyTorch* XLA* MLIR* TPUs* Guido van RossumTimestamps* [00:00:00] Introduction* [00:00:40] Chris's background - LLVM, Clang, Swift* [00:03:01] Chris's experience with Google TPUs and XLA* [00:05:47] The limitations of current frameworks like TensorFlow and PyTorch* [00:08:03] The benefits of using compilers for AI systems* [00:13:14] Enabling more collaboration between researchers through better systems* [00:20:55] Starting with CPU optimization instead of just GPUs* [00:24:36] Design principles and goals behind Modular* [00:32:41] The benefits of starting from a general compiler architecture* [00:35:13] Origins of deciding to create the Mojo language* [00:44:43] Goals for Mojo to become a true Python superset* [00:48:12] Thoughts on tinygrad* [00:52:00] ggml, quantization, etc* [00:57:00] Speculative execution and other gains from making Mojo more parallel* [01:01:50] Future of Mojo’s toolkit* [01:07:00] Why Modular is a company and not a foundation* [01:11:00] Learnings as a first time founder and engineering leader* [01:25:00] Lightning RoundTranscriptAlessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai. [00:00:19]Swyx: Hey, and today we have Chris Lattner in the house. Welcome, Chris. [00:00:21]Chris: Hi both. Thanks for having me. [00:00:24]Swyx: We're so excited to have you. We have so many questions and we'll try to get through as many as we can. You're one of the easiest people to research I've ever had on the pod, because you document yourself extensively on https://nondot.org/sabre/. What's the story behind that, just quickly? [00:00:40]Chris: I mean, I've had that website for, since, I don't know, the mid-90s. So it's been a very, very, very long time, and I originally had a big personal page. Again, this was the mid-90s with all the scroll tags and all that kind of stuff. Yeah, exactly. [00:00:56]Swyx: The animated gifs. “Under construction.” [00:00:57]Chris: Yeah. It has been rebooted a few times, and web design is not my strong point, but the server was originally named after some fish we had. That was the origin of non-dot. [00:01:08]Swyx: I love it. I looked on Tanya's page and she has some spaniels. [00:01:12]Chris: Yep. We're dog people. We love many animals. [00:01:15]Swyx: So your quick bio, you did your PhD in CS in 2005, and then immediately went into Apple working on LLVM, the compiler framework that you created during your PhD. In our prep, you also maybe had a favorite Scott Forstall story. [00:01:32]Chris: Well, so I got to work with a lot of really interesting people at Apple. Scott was actually pretty famous. Scott is responsible for many things across the years, but he really drove the iPhone. At least the iPhone software, specifically. And so Scott was super interesting because he was kind of a high-maintenance person. He was very difficult to work with. He did not mind making other people wait for him. So there'd be all these exec reviews of Scott where the entire room is full of people. He's sitting across the hallway in his office for a half hour making people wait for him. And so when Scott was at Apple, I wasn't his biggest fan, I'll admit, but I actually have a lot of respect for a lot of the things he did. He drove a lot of the early iPhone stuff. He made the bet on Siri and a bunch of other stuff that he did. And so he's a very impressive person. I guess he's out of tech these days, but yeah, so many fascinating. [00:02:25]Swyx: My favorite story was the keyboards and how they basically had to invent predictive typing or it wouldn't work. [00:02:31]Chris: Yep. It's all software. So much of that, it feels obvious now because it's been developed for years and years and years, but it was like pure research and nobody knew if you could get all of that software to fit on such a constrained device for 1.0. So it's just an amazing time. [00:02:45]Swyx: Incredible. So I'll fill out the bio a little bit. You started working on Clang while at Apple, I think, as a front-end for C and Objective-C. You created Swift as well in 2010. And then in 2012, won the ACM Software System Award for LLVM, which I think is a crowning accomplishment for a lot of things. [00:03:01]Chris: I love to build things. [00:03:03]Swyx: You were VP of Autopilot at Tesla and then Senior Director and Distinguished Engineer at Google for TensorFlow. And then most recently, President of Product Engineering at RISC-V, or at SiFive, which builds RISC-V. [00:03:15]Chris: They're the inventors and they drive so much of RISC-V is a really fancy new instruction set for a lot of computing needs and led to a lot of AI chips and so much that exists out there. So it was a lot of fun. And so that was actually driving and building hardware. And so most of my career I spent on the software side of it. And so it was a lot of fun to be able to see the other side of how hardware comes together, how you design it, how you think about it, what are the trade-offs in that entire space. And so for a lot of years, I've been just on that hardware-software boundary. [00:03:48]Swyx: That's a lot of what we're going to talk about today with Modular Mojo. Well, so that's the brief history and you started Modular in 2022, about 20 months ago. What's one other thing on the personal side that people should know about you that people don't see on the LinkedIn because you're all into hardware-software boundaries and stuff? [00:04:05]Chris: I have kids, I like to do woodworking, I like to walk. And so often, I like to go walking with people and do walking one-ones and things like that. [00:04:15]Alessio: What's the latest woodworking project you've worked on? [00:04:18]Chris: Oh, I mean, I just built a Lego robotics table for my kids, so helping out with the school. And so, yeah, not the most fancy furniture, but I've also built furniture and many other things for the house. [00:04:29]Alessio: So I think the easiest thing for people to grasp so far has been Mojo, which is a superset of Python. And I think everybody talks about that because it's easier to grasp, but Modular's goal is to build a unified AI engine. And when I see unified, it implies things are not unified today, there's a lot of fragmentation, a lot of complexity. So let's start from the origin. What are some of the problems that you saw in the AI research and development space that you thought needed to be solved? [00:04:58]Chris: Yeah, great question. So if you go back just a few years ago to 2015, 2016, 2017 timeframe, AI was really taking off. It wasn't to the point where it is now, where it's obvious to everybody, but for those of us who were following, amazing things were happening. And that era of technology was powered by TensorFlow and powered by PyTorch, right? And PyTorch came a little bit later, but they're both kind of similar designs in some ways. The challenge there is that the people building these systems were driven by the AI and the research and the differential equations and the auto diff and all these parts of the problem. They weren't looking to solve the software-hardware boundary problem. And so what they did is they said, okay, well, what do we need to build? We need a way for people to set up layers. So we need something like Keras or NNModule or something like that. Well, underneath the covers are these things called operators. And so you get things like convolutions and matrix multiplications and reductions and element wise ops and all these different things. Well, how are we going to implement those? Let's go take CUDA and let's go take the Intel math libraries, Intel MKL, and let's build on top of those. Now doubt really well, but the challenge with that is that whenever you come out with a new piece of hardware, even if it's just a new variant of an Intel CPU, you have initially a small number of these operators. But today TensorFlow and PyTorch have thousands of operators. And so what ends up happening is each of these things get what's called a kernel. Each of these kernels ends up being written generally by humans manually. And so if you bring up a new piece of hardware, you have to then re-implement thousands of kernels. This makes it very difficult for people to enter the hardware space. The other side of it though is research, right? So if you're a researcher, very few people know how these kernels work, right? [00:06:41]This is coming in vogue. You hear about people writing CUDA kernels, for example. And I mean, the people who do this are amazing and I love them, but there's very few of them and the skill sets required to do that are just very different than innovating in model architecture, right? And so one of the challenges that we've seen with a lot of these AI systems has been the scalability problem of I can't find experts who can go write these kernels. Now, when I got involved with work at Google, we were working on Google TPUs. Google TPUs are one of the most successful at-scale training accelerators that exist. And one of the challenges that we face as a team is this challenge of saying, how do we bring up a novel piece of hardware given you have thousands of these different things? And really the goal at Google initially was catalyze and enable a ton of research. Now, one of the things that was done before I got there and that was novel and it attracted me there is people said, hey, let's use compilers for this. So instead of handwriting thousands of kernels and rewriting all of these operators and trying to do what Intel or what NVIDIA had done, they said, let's take a different approach. And compilers can be way more scalable than humans because compilers can allow you to mix kernels in different ways. And there's a number of these optimizations that are really important that you've talked about before, including kernel fusion, which can massively reduce memory traffic and things like this, and these other reassociations and optimizations that you want to be able to do. [00:08:03]Chris: And a compiler can do that in a very general way. Whereas if you're doing it with traditional handwritten kernels, what you get is you get a fixed permutation of the ones that people thought were interesting. And so the things that worked are the things that have already been important, not the things that researchers want to do next. And a lot of research is doing new things, right? And so the investment in compilers led to this thing called XLA, which is part of the Google stack. Really great, enabled massive exaflop scale computers, tons of amazing work was done with that. But there was another problem, right? The big problem was that, okay, well, it was brought up to enable one piece of hardware, in that case, Google TPUs. And it turns out building compilers is hard. And there's a different scalability problem, where before it was hard to hire lots of humans to write lots of kernels. Now you have to hire compiler engineers. And there are even fewer compiler engineers that know machine learning and know all this stuff. And so what actually happened there is that there's a bunch of technical innovation and a lot of good things that came out of it. But one of the challenges was something like XLA is it's not extensible. And so you can technically extend it if you're at Google and you work on TPUs and you have access to the hardware, right? But if you're not, then it becomes a real challenge. And so one of the things I love about the NVIDIA platform in particular is that if you look at CUDA, like many people get grumpy about CUDA for various reasons, but you go all the way back to when AI took off, like deep learning took off with the AlexNet moment, for example, right? So many people will credit the AlexNet moment as being a combination of two things. They say it's data, ImageNet, and compute, the power of the GPUs coming together. And that's what allowed the AlexNet moment to happen. But the thing they often forget is that the third part was programmability, because CUDA enabled researchers to go invent convolution kernels that did not exist, right? There was no TensorFlow back then. There was none of the stuff that existed. And so it's actually this triumvirate between data compute and programmability that enabled a novel kind of research to kick off this invention that became the entire wave of deep learning systems, right? And so to me, learning from many of these things, you have to learn from history, coming to modular saying, okay, well, how do we take the next step? How do we get to the next epoch in terms of this technology where we can get the benefits of humans who have amazing algorithmic innovation and ideas and sparsity and like all the things that are kind of on the edges of the research that could become relevant? How do we get the benefit of compilers? And so compilers do have amazing scale and generality to new kinds of problems. And then how do we get the benefit of programmability and mix all these things together? That set of insights is what led to modular and what we're doing with the AI engine. [00:10:44]Alessio: I think in one of your previous podcasts, you mentioned leaving people behind, you know, that are like not experts in certain things and they can't contribute. CUDA is great. And we had Tridao who created FlashAttention on the podcast. And when the new Cutlass version came out, he made FlashAttention too, because Cutlass was so much better. And like, he didn't have to worry about that. He could focus on it. How do you see the future of like AI development in kind of like a post-modular world? You know, do you think there's going to be a lot more collaboration at different levels of teams coming together? Or is one of your goals like allowing people that are not compiler experts to like not even think about it and assume they already got the best? [00:11:22]Chris: Yeah, well, so I mean, my general belief is that humans are amazing, but we can't always fit everything in our head, right? And so you have different kinds of specialities, different kinds of people. And so if you can get them to work together, you can get something that's bigger than any one of them, right? I have certain skill sets, but I barely remember differential equations, right? And so it turns out that I'm not going to be inventing the next great model architecture, [00:11:45]Swyx: right? [00:11:45]Chris: But I'm useful for some of the systems problems. And so if we can get these people working together and collaborating together and understanding how these things work, like new breakthroughs can happen. And so Tree's interview with you, I think is a great example of that, right? He explained how, you know, he was working on different parts of the stack. He got interested in the systems. And he's a research group with Chris Ray, right? They have applications people that they work with, right? And so it really does, in my opinion, come back to like, how do you enable this flywheel? How do you enable invention? How do you get more kinds of people that understand different parts of this problem to actually collaborate? And so this is where I think that, you know, you see our work on Mojo and on the engine and things like this, what we're doing is we're really trying to drive out the complexity of this problem because so many of these systems that have been built up, you know, they're just aggregated together, right? It's like, here's a useful thing that enables me to solve the problem I want. And it wasn't really designed top to bottom. And I think the modular world provides is a much simpler stack that's much more orthogonal, much more consistent, much more principled. And that enables us to like reduce complexity all the way up the stack. Whereas if you're building on top of all this fragmented kind of mess of history, right? You just kind of have to cope with it. And a lot of the AI, particularly on the research systems, right? They have this happy path. And so if you do exactly the demo, the thing will work. But if you try changing anything just a little bit, everything falls apart and performance is awful or it doesn't work or whatever. And so that's an artifact of this fragmentation at the bottom. [00:13:14]Swyx: So you kind of view compilers and languages as medium for which humans can collaborate or cross boundaries. [00:13:20]Chris: I like compilers. I've been working on them for a long time, but work backwards from the problem, right? And if compilers are useful or the technology is compiler technology is useful to solve the problem, then that's cool. Let's use it. I don't have a compiler hammer that I'm running around looking for compiler hammer. Compiler problems to hit. Yeah, exactly. And so here, you say, what good is a compiler? Like what is the fundamental purpose of a compiler? Well, it's to make it so that you don't have to know as much about the hardware. You could write everything in very low level assembly code for every single problem that you have. But what a compiler or a programming language or an AI framer really does is it allows you to express things at a higher level of abstraction. Yeah. Now that goal serves multiple purposes. One purpose is that you make it easier, right? Second goal is that my opinion is that like, if you push a lot of complexity out of your head, you make room for new kinds of complexity. And so it's really about reduction of accidental complexity so that you can wrestle with the inherent complexity and the problem. Another is that by getting abstraction, right, you enable, for example, one of the things that compilers are good at, particularly modern ones like we're building, is that the compilers have infinite attention to detail. Humans don't, right? And so it turns out that, you know, if you hand write a bunch of assembly and then you have a similar problem, well, you just like take it and hack it a little bit without doing a first principles analysis of the best way to solve the problem, right? Well, compiler can actually do a lot better than that because CPU cycles are basically free these days. [00:14:42]Swyx: Yeah, exactly. [00:14:42]Chris: And also higher levels of abstraction give you other powers. And one of the things I think is really exciting about deep learning systems and things like what Modular is building is that it has raised compute to this graph level. Once you have gotten things out of for loops and semicolons and, you know, out of the muck and into something that's more declarative, well, now you can do things where you transform the compute. This is something that I think that many people don't yet realize because it's kind of possible, but it's really such a pain with these existing systems is that, you know, a lot of the power of what this abstraction provides is the ability to do things like Pmap and Vmap, like where you're taking a computation and then transforming it. And one of the things I was very inspired by my time at Google is, you know, we started out with these very low level things and, you know, single node GPU machines and then clusters and then async programming, like all this very little stuff. And by the time I had left, we had had, you know, researchers in Jupyter Notebook training petaflop supercomputers. You just think about that. That is an enormous lift in terms of the tech. And that was made possible by a lot of very layered and well-architected systems, by a lot of, you know, novel HPC type hardware, by a lot of these breakthroughs that had happened. And so what I'd love to see is for that technology to get even more widely adopted, generalized and get out there and also kind of break down a lot of the complexity that got built up along the way. Beautiful. [00:16:09]Swyx: You use very precise terms, AI engine, AI framework, AI compiler. And I think that means special things for you, especially within the modular context. Do you care to define them so we can have context for the rest of the conversation? Yeah, absolutely. [00:16:22]Chris: That's a great point. When I think about framework, I'm usually talking about things like TensorFlow and PyTorch. These are things that, you know, most people building a model will use something like PyTorch to build it and train it and do things like that. Underneath that, you end up getting a whole bunch of ways to talk to the hardware. And often it's CUDA or Intel MKL or something like this. And so those things are the engine. And that interface of the hardware is generally what I think of when I talk about an engine. [00:16:48]Swyx: Right. And modular is a new engine. Yes. [00:16:51]Chris: And modular is providing a new engine that plugs into TensorFlow, PyTorch, and a whole bunch of other stuff. And then allows you to drive, manipulate, program the hardware in a new way. [00:16:59]Swyx: Which I would recommend everyone check out the products launch demo where you swapped it out in real time and it just kept working. [00:17:06]Chris: Yep, yep. [00:17:07]Swyx: That was a big flex. [00:17:08]Chris: So I believe in properly modular, properly layered, properly designed technology. And so if you get the abstractions right, you can do really cool things like this. [00:17:16]Alessio: Let's start diving deeper. So as you mentioned, you said between the framework level and the hardware level. So when it first got announced, I went on the website and I was like, wow, I wonder how many petaflops they get on an A100. And then I open and it's all CPUs. So my question is, everybody's trying to make GPUs go brr. Why are you making CPUs go brr first? [00:17:40]Chris: So this is the problem with doing first principles work. Is that you have to do all of the work from the beginning. And if you do it right, you shouldn't skip over important steps. What is an AI system today? Lots of people say, oh, it's a GPU. People are fighting over GPUs. They're always talking about, it's all about GPUs, right? AI, in my opinion, is actually a large-scale, heterogeneous, parallel compute problem. And so AI traditionally starts with data loading. GPUs don't load data, right? And so you have to do data loading, preprocessing, networking, a whole bunch of stuff. And then you do a lot of matrix multiplications. You do all the things that people usually talk about. But then you do post-processing and you send stuff out over a network or under disk, right? And so CPUs, it turns out, are necessary to drive the GPUs, right? And a lot of the systems, again, when you say, let's bring up software for the accelerator, what you end up doing is you say, okay, well, what can the accelerator do? It turns out it's a subset of the problem because they decided that the matrix multiplications or whatever they thought was important is the important part of the problem. So you then go build a system that does exactly what the chip will do. And you never have time to go solve the big problem. And so it's really funny when you look at something like a TensorFlow or like a PyTorch, so much of that host side compute problem, the CPU work, ends up being in Python, ends up being in these things like tf.data and stuff like this. Not programmable, not extensible, really slow in many cases, very difficult to distribute. And so there's a huge mess here. Also, if you look at CPUs, it turns out they are accelerators. So CPUs these days have tensor cores. They just get funny names like AMX instructions and things like this, right? And the reason for that is that it used to be that CPUs and GPUs were completely different things. What's happened over time is GPUs get more programmable and more like CPUs, and CPUs get more parallel. And so what's happening is we're getting a spectrum of this technology. And so when we started modular, we said, okay, well, let's look at this from a technology perspective. Hey, it makes sense to build a general thing because once you have a general thing, you can specialize. As I've seen with XLA and some of these other stacks, like it's very hard to start with the specialized thing and then generalize it. Also, it turns out that, you know, where's the spend in AI? Well, I mean, different people are spending different amounts of money, different things, but training scales the size of your research team, inference scales the size of your product and user base and everything else. And so a lot of inference these days is still done on CPU. So what we decided to do is we said, okay, well, let's start with CPU. Let's get improve the architecture. CPUs are also easier to work with and they don't stock out and they, you know, they're easier for a variety of other reasons. And let's prove that we can build a very general architecture that can scale across different families. And so what we showed is we showed, okay, we can do Intel, AMD, we can do this arm Graviton thing and showed a lot of support for, you know, all the different weird permutations of things within even an Intel CPU. There's all these different vector lengths and all this stuff going on and showing that we could beat the vendor software with much more general and flexible programming approaches. And then from there, yes, we're doing GPU. We'll have GPUs coming out soon. And then when you build into that, right, what you get is you get the benefit of a well considered, well layered stack that has got all the right DNA in it. And so then you can scale into these different kinds of accelerators over time. [00:20:55]Alessio: What are some of the challenges to actually build an engine? So I think the CPU point people have. So that's why you see LLAMA, CPP, you see some of this quantization where most people are thinking, let's take the model, quantize it, make it runnable on CPU and do that. You were like, no, I'm kind of like more crazy than that. How about we redo the whole engine? How does that differ in terms of work? So the model work is very kind of like weight specific. Yours is more like runtime, compiler specific. What does your team look like? And what are the challenges that you tackle to make an engine happen? [00:21:29]Chris: In terms of the technology or? [00:21:31]Swyx: Yeah. [00:21:31]Alessio: Kind of like, how do you even start? Like when you started this company, kind of like some people said, I'm going to change the weights and quantize them. You were like, I'm going to change the engine. You know, what are some of the low hanging fruits, maybe some of the initial challenges that you're working on? [00:21:45]Chris: Well, so, so I think a lot of what characterized modular is doing things the hard way to get a better outcome. [00:21:52]Swyx: Right. [00:21:52]Chris: So many of the people on our team, we've worked on all of the systems. So, you know, I worked on XLA and TensorFlow, the people that worked on PyTorch, TVM, the Intel OpenVINO stuff, like all of these weird things that have been created in the industry, Onyx Runtime, right? We have several really great people from there. And so many of these people have been working on these systems. And the challenge with them is that many of these systems were designed like five or eight years ago. [00:22:17]Swyx: Right. [00:22:17]Chris: And so AI was very different back then. There were no LLMs, right? I mean, it was a very different world. And so the challenge is, is that when you build a system, it starts out by being a pile of code and it gets bigger and bigger and bigger and bigger and bigger. And the farther along its evolution you get, the harder it is to make fundamental changes. And so what we did is we said, okay, let's start all the way at the beginning. Just like you're saying, yes, it's much harder. Again, I like to build things and I think our team likes to build things. And so you say, well, how does threading work? By the way, it's not often known, but TensorFlow, PyTorch, all these things still run the same thread pool that Caffe ran on. Widely known to be a huge problem, leads to massive performance problems, makes latency super unpredictable when you do inference. That one, a very specific set of design choices to make the thread pool block and be, you know, be synchronous. And like the entire architecture at the very bottom of the stack was wrong. And once you get that wrong, you can't go back. And so our thread pool assumes that no test can block. You have very lightweight threading, right? This goes directly into everything that gets built on top of it. You then go into things like, okay, well, how do you express kernels? Well, you still want to be able to handwrite kernels and we start by prototyping things in C++, but then you also get up into the mojo land. And so you build, you know, a very fancy auto-fusing compiler using all the best state-of-the-art techniques while also going beyond state-of-the-art because we know that users hate static shape limitations, lack of programmability. They don't want to be tied just to tensors, for example. And so a lot of LLMs have ragged tensors and things like that going on. Tabular data, you have like all these things. And so what you want to be building and one of the benefits of architecting things from first principles is that you can take all the pain that you've suffered and felt in other systems and you've never had a chance to do anything about it because of schedule, because of constraints from various kinds, and you can actually architect and build the right thing that can scale into that. And so that's, that's the approach we took. And so a lot of it was very familiar work, but it's very hardcore design engineering and you really need to know the second and third order effects of each decision. And fortunately, a lot of the stuff isn't research anymore. It's pretty proven. [00:24:31]Swyx: So you mentioned some design goals that you have in first principles. Do you have a list? [00:24:36]Chris: In what sense? [00:24:40]Swyx: Off the top of your head. Like, I think it's very useful when designing systems to have that list of principles. And I think you very much think of yourself as a first principles thinker, but I think your principles differ than most. And you've gained this insight over just studying a lot of AI work over the years. What are they? [00:24:55]Chris: I don't know that I have one set of principles that I, you know, it's like one, one club that I go around and beat things with. But a lot of what we're trying to do is we're trying to unlock the latent potential of a lot of hardware and do so in a way that's super accessible. And so a lot of our starting conditions was not like enable a new thing. It's much more about drive out the complexity that people are struggling with to do the thing. And so it's not research. It's about design and engineering. Now, when you look at this, we're also driving from, okay, let's enable the maximum power of any given piece of hardware. So if you talk to an LLM company and they just spent $200 million on GPUs and their A100 GPUs of a specific memory size or whatever, right? They want to get everything possible out of that chip and they don't want a lowest common denominator solution. Right? And so you want, on the one hand, full power. You want to go all the way down to the metal and be able to unlock these things. And some of these researchers like, like tree and others, I mean, they're freaking amazing. [00:25:57]Swyx: Right? [00:25:57]Chris: But on the other hand, a lot of other people want more portability, generality, abstraction. [00:26:03]Swyx: Right? [00:26:03]Chris: And so the challenge becomes how do you enable and how do you design a system where you get abstraction by default without like giving up the full power? And again, a lot of the compiler systems that have been, you know, compiler for ML type things have really given up full power because they're just trying to cover one specific point in the space. And so owning that and designing for that, I think is really important to what we're [00:26:25]Swyx: doing. [00:26:25]Chris: And other pieces, just sympathy for users, because a lot of people that get obsessed about the tech forget about the fact that the people that will be using it will be very different than the people that are building it. That aspect is actually really important when your developer tools fundamentally is to understand that the developers that are using it, they don't want to know about the [00:26:44]Swyx: tech. [00:26:44]Chris: One of the things that's super funny about working on compilers is nobody wants to know about a compiler. You're building a Mojo app or you're building a C app or whatever, right? You just want the compiler to get out of your way or tell if you did something wrong, right? If you're thinking about the compilers because it's too slow or it's, you know, broken in some way or something. And so AI tech should be the same way, right? I mean, how much of building and deploying a model is fighting with the tools? Get some crazy Python stack trace out of some tool because it covered the special case and now you're off that happy path, right? And so that compassion for users is something I think that, largely because AI infrastructure is so immature, but it's never been really part of the ethos of the people building tools. [00:27:22]Swyx: You chose things like, you know, your third pool has everything non-blocking. The sum of your first principles have led the module inference engine to be two to three times faster than PyTorch and TensorFlow, right? [00:27:33]Chris: Oh, I was trying to look at it. [00:27:34]Swyx: I'll show a decomposition of performance. Okay, well, yeah. So you can talk about that too. [00:27:38]Chris: So one of the really funny things that if you get it wrong, it's very difficult to fix is asynchrony. And so when you think about, I have a CPU and I have a GPU and they talk to each other, most people think about it in terms of CPUs doing some stuff that throws a CUDA kernel across the fence, GPUs go brr, right? And then when there's results, you know, you read it back, right? But that's actually a really inefficient way to run a computer. What you actually want is you want to think about there's two different computers that are both executing and they're sending messages back and forth to each other. So I built hardware, right? If you go all the way down to the gates, when you look at this, these computers, whether they're the tiled cerebrus wafer thing, right? Hardware is implicitly parallel. All of these things are always running all the time and they're communicating with each other. And so starting from an asynchronous programming model means that you can get accelerators that send messages to each other because that's the natural form of the hardware. When you get into CPUs, CPUs, you have, you know, 88 core CPUs or a hundred core CPUs these days, even if you have four, right? What they really are is there are four completely independent computers. And so, yeah, they send cash lines across the fabric at each other, right? But they're async, right? And so much of the programming model that people start with is always sync. And so when you build into the stuff, you say, okay, well, that's a huge problem. The consequence of getting this right is that now you get overlapping work and it comes for free, right? And again, simplicity, the right architecture leads to the thing just magically happening. One of the great projects we did at Google back in the day involved some of this stuff and it led to a 2x improvement in ads throughput. Ads is a very tuned workload, right? And getting TPUs and CPUs to work at the same time and overlap that compute was a huge deal. And the fact that it just falls out of an async architecture is quite important. And again, you look at this at all levels of granularity, networking is asynchronous. So as soon as you distribute a compute problem across a network, async is there, right? And so all of these systems are kind of designed in the wrong way. You go up a level of the stack. So you have these operators, right? Super interesting how this whole ecosystem evolved. If you dig into something like TensorFlow or PyTorch, right? You know, you get to the point where you have a matrix multiplication. And so like you've talked about before on your podcast, kernel fusion is really important. And the way people did that historically is they say, okay, well, I have a matrix multiplication and oh gosh, it's often followed by a ReLU. Well, I'll make a MatMul ReLU fused kernel, right? Cool, and that's a huge performance improvement because ReLU is just a max operation and you avoid tons of memory traffic, all good stuff, right? You run into these scalability problems because now you get things like a fused attention layer. So what is the consequence of saying, I'm going to manually tune the things that are important for mlperf or something, right? Well, what ends up happening is, again, you get these happy paths and they work way better than the default path. And so if you look within the NVIDIA world, for example, there's a ton of focus on transformers. And so NVIDIA goes and they build this really cool library called Faster Transformer. The performance point of using it is massive. Like it's a big deal. And so a lot of LLM companies and other folks use this thing because they want the performance. Performance turns into cost and throughput and all good things. Here's the problem. If you want to go innovate in transformers, now you're constrained by what Faster Transformer can do, right? And so, again, you come back to where are compilers useful. They're useful for generalization. And so if you can get the same quality result or better than Faster Transformer, but with a generalized architecture, well, now you can get the best of both worlds where you have orthogonality and composability, you enable research, you also get better performance. One of the things that you ask, like, how can we beat state of the art? Well, it's because it turns out compilers have more attention span. And it turns out that what's happened, even within like the NVIDIA product line, or even within the Intel product line, or even within one vendor's line of technologies, is that they have to build these little compilers because there's so much variation across the product family. If you look at an Intel product family, for example, they're building software that has to run on many different versions of this architecture. And they come out and they add a cool new dot product instruction, or they add beeflet 16 support, or they add whatever. And so what's been happening in the industry is that each of these companies have been building their own little compilers. And so their own little compilers are, again, they're focused on one part of the PROM domain. They have all these issues. They're not scaled very well. And so you get either, again, another fragmented part of the space where something will work really well, usually for a benchmark, right? But then it doesn't work well when people try to do new things. And so kernel fusion turns out to be one of those things. The programmability side, right? I mean, you just keep working your way up the stack. Matrix multiplication is really important. So who's that thing that hasn't been invented yet? I mean, we have folks that are using our stuff that care about computational fluid dynamics, right? And things like this, where it's really more of HPC, linear algebra, like more general than deep learning, right? And they want to use the same technology because all this technology is general purpose. And so enabling people to express their PROM domains, and often they're experts in fluid dynamics, which I know nothing about, by the way. [00:32:41]Swyx: I mean, diffusion is another one that relatively recent new technique. Yeah, right. [00:32:47]Chris: And so like enabling people to innovate in this way without having to know all that thread pool, right? You know, they don't want to know about a thread pool. And so enabling people to be able to focus on the part of the stack they care about and have it compose in is super important. Again, many systems have been built that tackle individual pieces of these PROMs. They end up usually having very specific constraints and limitations and problems. And so what we're doing is we're saying, okay, let's do the hard thing. Go all the way back. Let's actually build things in the right way and layer them up and do so in a way that composes correctly. And then what that means is you're driving away all that complexity that comes from, you know, the blocks don't plug together. [00:33:25]Alessio: Yeah, even at the hardware level, I'm sure that the cerebros of the world are like really happy that you're building this because now they can offer binding. And then I think that's one of the main complaints from developers is like these chips sound great, but like, how do I use them, you know? [00:33:43]Chris: Well, and that's one. So we're still early in our journey, but I care a lot about hardware and we have many friends in the space. The challenge again, so I worked on TPUs as one example, but certainly not the only one. The challenge, if you're building innovative hardware is you have to build the entire stack from the very bottom to the top. And so if you talk about a cerebros, right, they've built some amazing stuff, but they've had to build their own vertical software stack. And now it doesn't work the same at the top level as anything else. And so even if it's really good, right, it means that there's this huge barrier of entry for a developer to switch to their tech stack. Sometimes they're, some of these things are better than others, let's just say, right? And so it turns out building stuff is really hard. And so a lot of what we're trying to do is, again, we're putting down bricks. Like we have to take steps in logical order. We have to build the technology in the right way. Like I insist that we do everything at a super high quality. But when you do that, what that means is that then you can have a thing that you can plug into. And no, we can't turn a cell phone into a data center supercomputer, right? But if you want to quantize your model, you shouldn't have to use different tools for a cell phone than you use for a supercomputer, right? It turns out the intake's the same. Yeah. [00:34:50]Alessio: Let's keep working our way towards the 35,000 times faster number that is out there. So you kind of keep going up and then you get to the Python level. [00:35:00]Swyx: Yep. [00:35:00]Alessio: And you're building Mojo, which is a Python superset. I'm also sure you didn't wake up one day [00:35:06]Swyx: and you were like, [00:35:06]Alessio: yeah, that sounds like a fun thing to do, creating a Python superset. Yep. What are some of the limitations that you saw there? [00:35:13]Chris: Yeah, well, so I'll tell you where it came from. Because when we started Modular, we had no intention of building a programming language. So this is the, again, it's not looking for reasons to invent a language. But if we have to invent one to solve a problem, then cool, let's do it. So what we did was we said, okay, let's start, again, thread pools and other very basic stuff. How do we integrate with existing TensorFlow PyTorch systems? Turns out that's technically very complicated and very yucky. But then you get into the more, okay, let's get the hardware to go broom, right? Prom, right? And so then what we decided to do is we invented a whole bunch of very nerdy, very low level compiler tech. And so our compiler, yeah, it does autofusion and stuff like this, but it's designed for cloud first compute. Because there's more than one computer in the world, right? [00:36:00]Swyx: And things like this. [00:36:00]Chris: And so caching, distribution, like all these things get built into the compiler. You want to use things like auto tuning, [00:36:06]Swyx: right? [00:36:06]Chris: Because of all the complexity in the hardware and humans are great at algorithms. Attention span is not always the right thing. And so there's these requirements that came out of this. And so what we did is we built this pure compiler technology and validated it to show that we could generate kernels with very high performance. We got to the point where we're building that all and we were writing this very low level MLIR stuff by hand. We're happy enough with it at the time, but our team hated writing the stuff by hand. And so we needed syntax and said, okay, well, this looks like a language. And so what choices do we have? We could either do a domain specific embedded DSL like thing, like Halide, or there's a whole bunch of these things [00:36:45]Swyx: that are out there, [00:36:45]Chris: or we could build a programming language. And so again, saying, let's do it the hard way because it gives you a better result. The problem with the Halide or like the OpenAI Triton thing, or like there's a whole bunch of stuff that's kind of in this category is that they have terrible debuggers. The tools around it are really weird. They demo really well, but often are best used by the people who built the tools themselves, things like this. What we decided to do is say, okay, well, let's go build a full programming language. I know how to do that, built Swift, learned a few lessons. I know both how to do it, but also what a big commitment it is to do that. And the consequence of that is you can do something that's much better. Now you have to go shopping for syntax, right? And so we'd built all this pure technology and we could do anything we wanted. Could use Swift, could use C++, [00:37:31]Swyx: could use whatever, [00:37:31]Chris: but obviously the entire ML community is around Python. And so we said, okay, well, let's go use Python. And then how are we going to do that? Again, you dive into these levels of decision-making and it's like, okay, well, there's a lot of things that are like Python, right? [00:37:45]Swyx: But they're not, [00:37:45]Chris: and they don't get adoption and they have huge problems and they fragment the community and all the things. And so I said, okay, well, let's actually do it the right way. Let's try to build something that it'll take time to get there. But in the end, it's a super set of Python. [00:37:57]Swyx: Why? [00:37:57]Chris: Well, Python syntax isn't actually the important thing. It's the community, the entire body of programmer muscle memory, right? Like all of these things are actually the important thing. And so building a thing that looks like Python, but it's not was never a goal. Let's go actually build and again, do the hard thing that leads to a better quality result that'll be better for the world. Even if it takes a little bit longer to build. I'm shocked. [00:38:21]Swyx: My jaw was like dropped the entire time you were saying this because this sounds like it's just a massive yak shave to improve your tooling to make yourself more productive, [00:38:29]Chris: which is crazy. [00:38:31]Swyx: Like most people start out trying to do the language first, but you came at a great point. [00:38:36]Chris: So we built it and we started on this path to make it so that our team would be more productive. And we say today, like the most important Mojo developers are at modular. And that's actually really important when you're building a language is use it yourself. This was a mistake we made with Swift is we built Swift to solve a people don't like objective C syntax problem. Roughly, but we did not have internal users before we launched it. Not significant ones, right? And so with Mojo, like we're actually using it. And it's the thing that powers all the kernels in our engine. And so it's actually needs to be production quality. But then you realize that shaving the act that finally is actually not actually not worth it, right? [00:39:15]Swyx: And we realized, okay, [00:39:15]Chris: well, Mojo is actually useful to lots of other people. And so this is when we announced it. We said, okay, well, yeah, we'll make this a standalone thing because we think it's valuable and interesting to the rest of the world as well. And then, of course, we'll invest in it more because it's not just us and we can tolerate pain, but we want people to fall in love with good tools. [00:39:31]Swyx: Yeah. And obviously you had a great stack already and good team, but like how long from realization that, oh, we need to start looking around for a language to something that looks like Mojo today? [00:39:41]Chris: Yeah. So the lexer and the parser for Mojo started in October. [00:39:45]Swyx: Wow. [00:39:45]Chris: So it's less than a year old. [00:39:48]Swyx: Yeah. [00:39:48]Chris: This is also another thing is that I'm a very strange person in many ways, right? My ideas of what are hard problems are really different than other people, right? But Mojo is a much smaller language than Swift is. [00:40:00]Swyx: Yeah. [00:40:00]Chris: And even when it's done, it will be a much smaller language. And so compared to building Clang, which is a full C++ compiler or Swift, which is itself a very complicated, fancy system for a variety of reasons, right? This is actually a small project. Yeah. Yeah. [00:40:15]Swyx: You still have to pick design choices from like Rust and whatever [00:40:18]Chris: Yeah, well, absolutely. And so we will see what happens with Mojo over time. I would like a big chunk of our stack that is currently written in C++ to eventually move over. And so having a very good system programming language that scales is quite valuable and useful for lots of reasons. [00:40:32]Swyx: One of the other things [00:40:33]Chris: I'll share with you is that starting from CPU, starting from the general thing that you then specialize leads to these design points, for example, in Mojo, where you say, okay, well, if I care about high performance data loading, that needs to be super parallel. I care about disks being parallel and network being parallel and async and all this stuff that needs to be safe, right? And so with Swift, we built a memory-safe parallel programming abstraction called Actors. We've built all this stuff. And so being able to take the lessons learned from building [00:41:03]Swyx: it the first time [00:41:03]Chris: and driving it into a system the second time means that you can make something that's much better than the first time around when you were just figuring things out. So, but starting from generality is really important. [00:41:14]Swyx: Every single language designer I've ever talked to has emphasized a playground and I was browsing your site and I realized that you had called the Xcode and Swift playgrounds a personal passion and you were inspired by Brett Victor. I guess, what have you learned about building a good playground? Because you just released modular like a few days ago, sorry, Mojo a few days ago, I was able to go in and play with it. What have you learned? And maybe what goes, what is underappreciated about like a good playground? [00:41:38]Chris: Yeah, well, so when we were building Swift, there's this big question about how do we do something better than what Objective-C had? Yeah, right. And so naturally it's like you've gone through all this work, [00:41:48]Swyx: you're building this new thing, [00:41:48]Chris: what can you do with it? When we first launched, we wanted to make something very visual. Apple's a very visual company, right? It likes user interfaces [00:41:56]Swyx: and stuff like this. [00:41:56]Chris: And it turns out that we as humans, many of us are very visual learners and thinkers. One of the things that playgrounds for iOS and for the Mac allows you to do is play with time. And so what happens is that there's a graphical view of a canvas roughly, right? You then run your program and you have a ball bouncing or whatever the thing is that's happening. And now you can scrub through time because it can log and keep track of a bunch of state. And so this is one of the cool things about building systems and controlling it top to bottom is that you can build these kinds of experiences. One of the fun projects I was able to work on at Apple is this thing called Swift Playgrounds. And so it's actually an iPad app. The entire purpose is to teach kids how to code, right? And so one of the cool things about that is that that led to this whole area of research, to me at least, and around UI design for saying, for Playgrounds, how do I do coding on an iPad without popping up a keyboard, right? And so, exactly, very interesting technical problem, very different than compilers, turns out, right? And so we spent a lot of time working on gestures for like, you know, moving braces and blocks and refactoring code and doing all this stuff, making it so that it's super predictably understood what identifiers were in scope. And so complete the identifiers instead of you having to type them, instead of typing in numbers, like you get a little spinner. [00:43:12]Swyx: That's not just for kids. [00:43:14]Chris: And so it's super awesome. One of the things that came out of that is the current iPad keyboard allows you to swipe down on keys instead of going through modifiers. And so that came out of that project. And so there's a lot of the stuff where being able to build this stuff enables you to re-ask old questions. Yeah. [00:43:33]Swyx: Oh, that's great. I love the scrubbing stuff. And Brent actually worked at Apple. It probably overlapped with you. I actually never met him. [00:43:39]Chris: Yeah, so I'm sure it's a giant compound. Yeah, so coming back to Brett Victor, so Brett did a whole bunch of research on user interface paradigms for kind of explaining how code works. And so he wrote up many different, it seems like a worry dream or something is his blog or something. And he has a whole bunch of like concept demos and things like this. And so it was super inspirational. And so a lot of what we were doing was saying, okay, well, can we get this actually out to people to actually use? And so that was a lot of fun. So Mojo doesn't have anything quite as cool like that yet. But we'll see. [00:44:13]Swyx: There's a whole community [00:44:13]Chris: of people building cool stuff. [00:44:15]Swyx: And a lot of people are saying, [00:44:15]Chris: oh, we should have UI libraries and stuff like this. And Mojo is not gonna build a UI library. But there's a lot of cool people on the internet that know how to do this well. And I'd love to see that. [00:44:25]Alessio: Let's list some of the known things about Mojo that people like. It's compiled instead of interpreter. There's like no global interpreter lock. The heap representation is different. Use MLIR. What are maybe some of your favorite or like most underrated things about Mojo that you haven't covered? Well, so I think that [00:44:43]Chris: there's two ways of looking at Mojo. Most common way is it's like a Python plus plus. Again, I've been working on this stuff [00:44:49]Swyx: for a long time. [00:44:49]Chris: It kind of been there before, right? And so if you look at Swift versus Objective-C, what Objective-C is, is it's this really interesting language that many people don't know anymore, but where you have effectively small talk, which has super dynamic objects combined with C, right? And so the way Objective-C worked in the first iPhone and Macs for years were all built with Objective-C. Is that the high-level libraries are all built with the super dynamic, you know, you could inject methods and override things and hack the class hierarchy and all this stuff, completely dynamic object model combined with C, which is really good at executing things efficiently, [00:45:25]Swyx: right? [00:45:25]Chris: And so one of the reasons that Objective-C scaled so well, for example, in the first iPhone, which was super CPU constrained, was that anytime performance was a problem, you could drop down to C. So in the case of Swift, what happened is we said, okay, well, we want to keep all the things that are good about Objective-C. So it has to be dynamic classes. You have to be able to do all this kind of stuff. We have to work with all of the Objective-C frameworks, but then we want to be able to make one thing that scales, so it's not two different worlds glued together. Python is the same thing as Objective-C, [00:45:53]Swyx: right? [00:45:54]Chris: But turn on its head, where instead of being objects and C, it's like what people think of as Python, like a very high-level dynamic, flexible programming model, but then it's also glued onto C for the execution layer, right? And so you look at something like NumPy has a very nice Python layer, or even TensorFlow or PyTorch, very nice Python layer, but underneath the covers, it's all C is C++. And so a lot of what we're doing in Mojo is, you know, we learned a lot from Swift and things like this, but it's kind of conceptually similar, where what you're doing is you're saying, cool, it's not about whether dynamics good or static is good. They're both good. They're good for different problems. So let's put them together in a consistent thing and allow you to reach for the right answer for a given problem instead of being religious about it, like dynamic typing is the right answer, right? Just say like, cool, dynamic typing is great. We can see all the benefits. A lot of people love this and it's super productive and expressive. But if you want better performance, you can reach for static typing, right? And so a lot of, I think what Mojo is, is it's progressive in terms of like, get out of arguing about stupid things that don't matter. Just let people solve problems, right? And I think that is hopefully what people see in it. Now, I mean, we can dive into other things. So Mojo learns from Rust, for example. Rust is a wonderful community with a lot of cool stuff going on. It's kind of hard to learn. And so can we take the type system innovations like lifetimes and features like that, pull them forward into a thing and make them easier to learn? If so, then we get a lot of the benefits of the safety and the other things that Rust gives and performance and all the good things, [00:47:24]Swyx: no garbage collector, [00:47:24]Chris: all the stuff that people love about Rust, do so in a way that's a lot easier to learn, right? [00:47:28]Swyx: And so it'll borrow a checker. [00:47:30]Chris: Do have a borrow checker. But one of the challenges with Rust is that, in my opinion, it's more cultural. I mean, there are definitely language design issues that antagonize it a little bit, but a lot of it is the culture, right? And so a lot of the culture of Rust is very much thou shalt borrow and expose references to everything. And the pervasive library model around Rust ends up being culturally very low level, but you could write much higher level libraries in Rust if you wanted to. And so what we're doing with Mojo is saying, okay, let's take the tech, let's fix some of the language issues [00:48:04]Swyx: and things like that, [00:48:04]Chris: but let's define a new culture. And so as we roll out new features and new enhancements into Mojo, you'll see more and more of that over time. [00:48:12]Alessio: — So one of the things that George Hotz talked about on the podcast is XLA is like a CISC and tanning dry is a risk. You built XLA, so... — Your response. — Exactly. We got the other side of the thing. What are your thoughts on that and what are the right trade-offs to make? [00:48:29]Chris: — Yeah, so I contributed to XLA. I didn't write the whole thing, but yeah. — And you worked on RISC. [00:48:34]Swyx: — Yeah. [00:48:35]Chris: Also, I love George. He's a very interesting person. He's very enthusiastic, and that's really cool. It seems like he's learning his first compiler, though, because what he's doing is he's building what's widely known as a tensor contraction compiler. And so he's identified one sub, sub, sub, sub, sub [00:48:53]Swyx: part of the problem, [00:48:53]Chris: which turns out to be really important, which is how do you express the matrix multiplications and stuff like this. And he's learning how to build a compiler for that. He doesn't care about performance, as he talked about, and performance is not great. And so he has different sets of goals. But what he's doing is he's reductively turning AI into a matmul, something that a polyhedral compiler or something like that would tackle. And that's cool. Been there, done that. The problem with that is it doesn't scale. It turns out that there are a lot of things in AI that are not just matmuls. And so one of the challenges that I predict he'll run into is when you get out to those problems, now suddenly you'll have two systems. Simplest example, this is like the data layer will be completely different, right? And so there'll be this interface. What happens when there's this phase change between how the system works? Is it easy to use? Is it composed? What happens? [00:49:45]Swyx: I don't know, right? [00:49:45]Chris: So George is a super smart guy. We'll see what he comes up with. The other thing I'd say is that he's very focused on building and learning and doing things in an opinionated way that he likes. He's not being super user-centric and meeting people where they are and trying to get and lift people and do the things they're already doing, but do them better. And so it'll be interesting to see if he gets a community of people that are actually building things that are kind of beyond his circle. But he's a very smart guy. And I think that some of the stuff he's doing will be really cool. And I think it's also really interesting because he's showing the world, like the Jaxx people, that you don't need all of PyTorch to build a framework. [00:50:21]Swyx: Right? [00:50:22]Chris: And so that truth, I think, is I think maybe two-sided because on the one hand, the tasteful subset of AI infra, however you want to look at that, is actually relatively small. But the complexity that you need to be able to integrate into a production system, deal with quantization, deal with all these things you actually need for really high performance, like really push the boundaries of what people are doing, that's where it gets hard. And so I have no way to predict where it'll go. But if you want to make a risk versus risk argument, well, it's risk until you want to do new things. And what he's identified as a subset of the problem that you can model in a very, very nice, beautiful way, which is known, but there's a lot of the rest of the problem. And so if you've compressed, you know, he talks about XLA having 150 ops, XLA could have a 10th of that. If you just said it's element-wise with an enum, which is kind of what he does. And so that's not really the right question. The right question is what can you express? And can you express a big enough part of the problem for it to be useful? And so, I don't know, we'll see where it goes. [00:51:24]Swyx: That's fascinating. Some good advice in there, I think, from engineer to engineer. Yeah, well, so, I mean, [00:51:29]Chris: but George's goal and my goal are very different. That's the important thing. It's like George's, he's building a thing to understand it. It's the best way. I mean, from what I understand, I haven't talked with George about this. And he wants it locally run transformers. [00:51:43]Swyx: Well, yeah, which is cool. [00:51:44]Chris: And I want that too. We'll talk about that in a few months, but so we have similar technical goals [00:51:51]Swyx: in some cases, right? [00:51:51]Chris: But the way he's approaching the problem is build a thing to learn it, right? And so he's very happy to talk about how he'll like rip the whole thing up and throw it away. And that's super awesome. He's building it like a research project. Like we're building it in a very different way saying, okay, we know that PyTorch is yucky in various ways or TensorFlow's made some unfortunate design decisions, [00:52:11]Swyx: right? [00:52:11]Chris: It's not about beauty. It's about pragmatism. Because when we talk to people, we say, hey, who here wants to rewrite all your code? Generally, not very many people raise their hand and people are willing to in certain cases and there are certain profiles. But if you look at where the majority of the market and where the community is, it's much smaller. Interesting. [00:52:28]Swyx: Well, you mentioned one of the operations that might be tricky is sort of the data layer. I don't know if I exactly understand what specifically is in the data layer, but I think memory constraints are something that people are talking about a lot. Recently, Georgi Griganov of GGML was showing off just the sheer amount of stuff that he can do on a single MacBook. And the analysis from Andrej Karpathy was mostly that it's just because it's memory-constrained, not compute-constrained. So even though you have a lot less compute on a single machine on Apple Silicon, it doesn't actually matter because you're just ultimately optimizing for token output. What memory-specific optimizations on the Mojo design side would you call out as important design choices? [00:53:10]Chris: Yeah, so I think that a lot of the on-device ML or on-device LLM work has really been around 4-bit quantization and 2-bit and 1-bit and things like this. You called them hacks, I think, on your... Okay. [00:53:22]Swyx: I don't think it's hacks. [00:53:24]Chris: I mean, I think it's funny, like if you want to nerd out about it, like a float 32 is a quantized representation of infinite precision floating point numbers, right? You only have 32 bits to be able to represent all of numerics, right? That's a pretty flexible and useful hack, right, from that perspective. So I'm not here to tell you that there's one right way to run a neural network. I want to make it as easy as possible to be able to explore and research and try new things. And if it works well for you, great. The challenge I have with like the 4-bit numeric stuff and with quantization in general is that the way these things are implemented are hacks. And so often it is very hard-coded kernels. So GGML, wonderful project, lots of really cool and smart people working on this. The kernel libraries are very specific, individual things that are available in very hard-coded ways and they don't compose correctly. You know, you want to walk up to it with a novel model, right? GGML requires a lot of rework before you can do that. And not lots of people know C++ that do this stuff. And so anyways, my goal and my quest is to massively reduce that complexity. Within quantization, here's the thing I'll give you to think about, right? So autofusing compilers are better for performance, memory, and accuracy. And the reason for that is that if you're using autofusion, avoiding go-out-to-memory, good for performance. Automatic is better than manual, so it's good for humans that don't have the attention span to do this. But with quantization, it's really interesting because the way you normally implement a quantized operation is that you have higher internal precision than you do the external precision, right? And so if you write out an activation in memory, you have to re-quantize down to eight bits. But often what you'll end up doing is, or take Flute 16 or something, right? The internal activation, or the internal arithmetic is done as Flute 32. Load from memory, and you do like a multiplication of two Flute 16 things and you get a Flute 32 intermediate result. And so in the CPU or in the GPU, in registers, you have higher precision. So now when you do autofusion, you keep things in the higher precision, and so you have less intermediate rounding. And so when you take a big attention block and you do quantized fusion, you actually get, yes, much more flexibility because you can fuse much bigger regions than people can do by hand. You get better performance because you're not writing things out, but you also get better accuracy. And so that's one of the things that, again, [00:55:46]Swyx: That's a free lunch. [00:55:47]Chris: That's pretty great, right? And so, and also you go back to the complexity and the pain and suffering and the, you know, a lot of what Modular's trying to do is reduce suffering in the world. A lot of the quantization tools are just really bad. And it's because, you know, they have this like unmovable kernel library that has a whole bunch of special important cases and they're trying to like pattern match onto it. And so they often have very flaky problems and it's just a huge pain in the butt. And so by solving some of that low-level compiler nerdery, right, it enables you to have better tools, better accuracy, like all these things actually stack out and just leads to better technology. And then is 4-bit the right answer? I mean, 4-bit's cool, 2-bit's cool. All this stuff is cool, right? I mean, I think that there, it really depends on your application or use case. And so allowing people to play with that, that cannot write the kernels, like that's the whole point. [00:56:35]Swyx: Yeah, they can still quantize, but using your approach, like it's just orthogonal. It's just going to be a straight improvement either way. So, yeah. [00:56:41]Chris: Right, exactly. [00:56:42]Alessio: There's still so much we're figuring out, right? The mixture of experts thing, like a few months ago, like people were not really thinking about, then George kind of leaked it on the podcast. Alerted it on our pod. Yeah, and then people started talking about it. A few other people confirmed it, yeah. [00:56:56]Chris: Yeah, yeah, yeah, exactly. [00:56:57]Alessio: As all these people started talking about it, I was like, I didn't say it. Please don't call me Sam Ullman. Speculative execution is another one. Basically, like Karpathy's thing is like, hey, if you're trying to get one token, getting K token in batch is almost the same time. I'm sure Mojo is great for that because it's not single-threaded like Python. You can run parallel. [00:57:18]Chris: So one of the funny things about this is that you've all been in space for a while. It used to be back in the day, ResNet-50 or something, or MNIST, right? What is a neural network? It's machine learning operators, right? Then reinforcement learning came on the scene, right? And now suddenly you're saying an inference ends up being part of the thing the agent does. And then I have a training job that's driving this thing. And now a big RL system ends up being this massively complicated distributed system where you have traditional AI infra lashed together with all this Python and stuff like this. You come back to like stable diffusion with the units, you go look at yield LLM implementation, all the tokenization stuff's in Python. It's super funny when you look at this because what the world is telling us is that this AI infra, these systems are not flexible enough. And so why do you have to do the tokenization in Python? It's because the data layers, the libraries that people build in this stuff are not programmable and you need flexibility. And so people do this. And by putting this stuff into Python, I mean, it's great and I understand that, or rewriting into C++ to deploy it, right? What ends up happening is you lose the ability to do things like PMAP because the graph, the underlying ML model is a declarative specification of compute. But if you can't represent your computation, then you can't transform it, right? And so one of the real purposes of Mojo and the way it integrates with the engine and stuff like this is to give you the best of both worlds where you can say, cool, I can have full programmability. I can write a completely custom tokenization layer or whatever it is I want to do. Or if I have a really compressed on-disk format or I want three bit, whatever the thing is, I can express that. But now it composes into the stack instead of it being a bolt-on on the side that doesn't work well. I've seen the consequence of not building this stuff. And what it does is it drives all this complexity into the system. Or you look at serving layers. There's these platforms like SageMaker, for example. SageMaker is a very popular hosting solution for doing inference on models. But it's really just a TensorFlow or PyTorch that's wrapped up, right? And so sure, you can give it a TensorFlow graph and say, go ahead and serve my TensorFlow graph. But what if you want some pre-processing? Well, you have to set up a microservice next to it, right? And so now you have all this data going in and out over the network just to do one summarization operator before you send something out to the mobile client that you're talking to or something, right? And so the consequence of these design points drives a huge amount of this external complexity into the systems. It just doesn't need to be there. If you do the hard work, it doesn't need to be there if you do all the hard work of first-printing this stuff. [00:59:50]Alessio: What about the post-transformer world? I think we kind of touched about this. And when you have faster transformer and all these things, it's so easy to just do another transformer model. We just did our WKB episode with Eugene Cha. What do you think about transformer alternatives and how closely are you working with some of these groups as you develop modular? [01:00:12]Chris: Yeah, so we're great friends with Chris Ray's research group, and he's pushing on the hyena models with FFTs and things like this. And so I'm not smart enough to know the right thing there, honestly. My take on that is that there's a lot of smart people. I have a hard time believing transformers are the last major macro architecture that will be invented. And so what I'd love to do is enable more people to be able to play with this stuff. I often get asked of, why does anybody care about AI and for if transformers have solved it? It's a super funny question, because the basic assumption there, which is not wrong, the basic assumption is that transformers have eaten everything. They've eaten so much of vision transformers and everything else. They've eaten all the modalities. Therefore, in the fullness of time, they'll eat everything. But the funny thing about that is that that's a very narrow view of, again, what is AI? Because AI also includes massive recommender models where you have huge embeddings and these big, dense matrix multiplications. It also includes the units and things like this. It also kind of ignores the fact that transformers, as a category, there's a lot of consistency and we still have softmax. But if you go back to the first paper, the modern transformer is actually quite different. And so, yes, there's a lot of really good ideas about attention and things like this. But the evolution of this over time has really refined the approaches and a lot of the activation functions have changed. And a lot of stuff and a lot of innovation is still happening in this field. So, I mean, is it FFTs or is it attention? I defer to smarter people that know that stack better. But what I'd love for them to be able to do is not be held back by the architectures of the systems that were massively over-optimized just for attention. – What else should people be on the lookout for modular? [01:01:50]Alessio: So you just released yesterday Mojo download on Linux systems. You have macOS and Windows coming out soon. What are, say, like six, nine months from now, I don't know how much you can share, what is going to be the toolkit? So there's kind of like modular is the engine, Mojo is like the language. What are going to be the other components that people can leverage? – Yeah, yeah. [01:02:08]Chris: As we record, just yesterday, we announced download support for Linux. I've heard of Macs and Apple platforms. Turns out CI is kind of annoying with them. And so, yes, we'll roll out that kind of stuff. So roll out new platforms, of course. One of the things we're, and within Mojo, Mojo is still a young language. And so we have traits coming, hopefully by the end of the year. We have a bunch of things like that that'll be really a big deal for library design and enable new kinds of things to be expressed cleanly. Mojo will mature, right? And so I think that this is a major thing that we're focused on, is actually building Mojo in the right way. And that'll be super exciting. One of the consequences of that is we want a big community around Mojo to build cool stuff. And so as part of building in towards this, we'll start open sourcing Mojo. I think that's something that'll be really great. We just want to make sure that we do it. And again, if we do anything, we want to do it the best possible way. So we want to figure out what is the right contribution model and all this kind of stuff. We want a permissive license. And so we have to nail down a lot of the details to kind of go into this stuff. Because again, we want to be able to build something that works well and have a whole bunch of people that work well together and not just a gigantic, catastrophic mess. [01:03:14]Alessio: Yeah, there's kind of like the Python 2-3 mess that we all got through and nobody wants to remember about it. What's kind of the relationship with Guido and the Python Foundation? And because some of the Mojo stuff is like, this is so good, why isn't it in Python 2? You know, long-term, how are you planning to keep kind of like the two languages in sync? And how are you involved with each other, so to speak? [01:03:38]Chris: Yeah, so Guido for quite some time, from before the launch. And so he's known about Mojo as it's coming. We've been very fortunate. He spent a bunch of time with our team [01:03:47]Swyx: and things like this. [01:03:47]Chris: He occasionally shows up on Discord and gives me a hard time about things. So that's super awesome. [01:03:52]Swyx: What is his pet topic? [01:03:53]Chris: I think that he enjoys trolling us. And so, which I also enjoy. So it's all good. And so like there's Guido himself, then there's a broader question of Python. I consider Mojo to be a member of the Python family. And so there's a number of members of the Python family, by the way, including things like PyPy and Cython and like all this stuff. And so we want to be a good member of the Python family. And what I expect is that Python will continue to evolve and add new stuff. Mojo will continue to evolve and add new stuff, right? And so the analogy I give to people is to go way back 30, 40 years ago, there was C, and then this newcomer came on the scene in 1983 or something called C++, right? And what was C++? Well, it was C with classes, right? And so Python with not just classes, but all the stuff underneath it that you usually do in C, right? And so what happened back in the day is that C and C++ started as two different communities, but there's tons of intermixing and idea sharing and interpollination of ideas. And a lot of the C++ features ended up in C. And then of course, all the C features ended up in C++. And so I expect that same thing to happen. And so I look at it as Python 3 versus Mojo. Python 3 is really defined by its runtime. It's defined by a specific object model. And it really, I mean, if the Python community wants to change that, that would be really interesting. But Mojo is saying, okay, it's defined by a superset of the expressive capability. And so we have fancy MLIR compilers [01:05:21]Swyx: and things like this. [01:05:21]Chris: And so we can have on-stack representations and things that kind of lead to relatives of each other. And I'd like for Mojo to be a superset, right? In terms of all the capabilities. But each of these things will evolve in parallel. You know, I consider, you know, when people come to me and they say, hey, I want like this crazy feature, which should be in Python. I say, great, go talk to Python. We're here to add the systems programming features. We're not here to just add a general, you know, Walrus 2 operator or something. Ooh, that still burns a little bit. [01:05:49]Swyx: But, you know, Python actually did end up adding no-gill after, like, not long after. Well, they haven't added it yet, but there's been discussion about it. Well, also, yeah, I mean, [01:05:58]Chris: I think the gill stuff is also going to be super interesting. They have a five-year journey to add this. And so it's going to be technically very complicated for the community because one of the most beautiful things and pragmatic things about Python is that you drop right down in C. And so much of the Python ecosystem is actually C libraries or C++ or et cetera, right? But then are wrapped by Python, right? But one of the things the no-gill stuff breaks is it breaks a bunch of that glue. And so, like, the ability to get and set attributes, all the C functions for doing that break, right? And so that's going to cause a lot of churn and complexity. And so I'm not involved in the effort, obviously, but from what I can see, the Python community seems like they're walking into this with eyes wide open. Oh, yeah. They understand the trade-offs. I think they're doing a really, like, well-thought-out approach to this. And so I think that it will probably go really well. Now, that's great also, by the way, because Mojo likes threads, because threads are a thing, right? And so this will make it so that the Python ecosystem is more concurrent compatible, which will be great for us. [01:07:00]Swyx: Yeah, but you're already there, so. [01:07:02]Chris: Yeah, exactly. I mean, again, first principle learning something, it's not like, you know, multi-cores of the future anymore, right? Yeah, yeah. [01:07:08]Swyx: One thing you're doing differently than beyond, in terms of, you know, C, C++, and then, you know, Python, Python++, is that you're choosing to build this as a company. Why a company and not a foundation? I think you kind of answered that with the modular first. [01:07:19]Chris: Yeah, so we didn't start modular to build Mojo. We started modular to solve some AI problems, and then said, okay, well, we need to do a language. So I'll reinterpret your question, if it's okay, as saying, why is modular an independent company instead of part of a big tech? Apple or Google or Microsoft or whatever. So there's a number of reasons. Well, so first of all, I'll say, we tried. We collectively, I'll speak on behalf of all of our, the people on our team. Many of us came from big tech. Yeah. Like I worked at Google. I worked on ML infrastructure at Google, right? Literally working on this problem. And many of our people came out of this context. And the challenge, again, these companies are amazing, right? This isn't to bag on big tech. The challenge is, AI infra is not their product, right? So when I was working on XLA for TPUs, when I was working on XLA, it was to enable TPUs. It wasn't some abstract, let's go solve programming model and hardware and this big problem. It was literally enable this hardware because we just installed exaflops of it, and we needed to get to go and work, right? When you look at what is TensorFlow, it's, by the way, part of the cloud organization within Google. So if you want help with TensorFlow, sign up for GCP, and then they can help you, right? What is their product? Then Meta, right? I mean, what are they trying to solve? Well, they're trying to solve their ad stuff. [01:08:34]Swyx: Meta has never had any interest in, yeah, external facing developer stuff. But Microsoft would have had you, like Satya has, you know. [01:08:40]Chris: Yeah, I wouldn't go so far to say that none of these people care. All these people care. And there's so many good engineers within the PyTorch team that care about external developers. But the way to think about this is that all these projects are more of like a hobby than they are the company project, right? And so that difference is actually really important. Like, I mean, if you file a bug against Meta or a bug against PyTorch, you have a bunch of really good engineers that are allowed to work on that, and they want their product to be good. And so they might fix it, but also they might not, right? When we talk to people, not everybody in AI trusts Meta and Google. Often they're directly competing with them, right? And so like, no, I'm not going to actually show you my model so that you can debug the problem. They're conflicted in lots of different ways. And so with Modular as a standalone company, it's super important to us that we're neutral. We're like Switzerland, right? We do not build hardware. We do not have a cloud. We are not building an LLM, right? And so what we're doing is we're building AI Infra in a way that is really good so that you all can go invent all this other stuff and you have the right tools to do it, and we're not competing with you, right? And so that is something that, you know, again, there's lots of really good people, all my friends, you know, in all these different places, right? It's not the engineers or the management is doing anything wrong. It's just that what is their core incentive structure? What do the engineers get promoted for doing? And these things that, you know, actually they're more incentive oriented than they are technically oriented [01:10:08]Swyx: end up mattering a lot. [01:10:08]Chris: And this is one of the reasons why at a hardware company, you're not incentivized to build software that runs on lots of different kinds of hardware, obviously, right? Within Google, you're not incentivized to build things that work great for PyTorch. You know, so there's this problem where the rest of the world is building on AI. They use TensorFlow and PyTorch and lots of hardware and lots of clouds and lots of stuff. And so being able to help people and be aligned with their interests is really useful. [01:10:31]Swyx: One thing I wanted to come back on, you said you don't have a cloud, but the way that people would use the modular inference engine is through your cloud. [01:10:39]Chris: You have cloud engineers. [01:10:40]Swyx: We do have cloud engineers. [01:10:41]Chris: Actually, the way our product gets used is you use it on your cloud. And so we give you roughly a Docker container, and so it can run on cloud, on-prem, on laptops. We have folks using all kinds of different things. And so it's very modular that way. So we'll also build into a hosted product, of course, over time, just out of convenience. A lot of people don't want to do the management themselves, but we're really focused on meet people where they are, right? And we believe that our tech gets adopted faster if it's easy to adopt and easy to use and saying, okay, first move all your stuff to our cloud. [01:11:14]Swyx: It's a valuable thing, [01:11:15]Chris: particularly for people who don't want to manage that, but it just slows down adoption. [01:11:18]Swyx: So a bit more company origin story stuff, because I just love company origin story type things. Your co-founder is Tim Davis, who you've worked with for a while. He's also had a couple of other startups under his belt. You get the idea for modular at SciFive, and you talked to the big clouds, and they didn't really want it, or you just arrived at the conclusion that it wouldn't be the best place for it. How did you go about founding the company? Yeah, good question. [01:11:40]Chris: So I've been working on this stuff since 2016, 2017, right? So I've been working on AI and for of different points. So Tesla doing applied. How do we make cars drive themselves? At Google, bringing up a hardware program and trying to get TensorFlow to be architected better, let's say. Then I was dissatisfied for various reasons with what was going on at Google and with not taking PyTorch seriously and things like that. And so I went and joined a hardware startup. When I did that, I really wanted to solve this problem, but the timing that was in 2020, which was right before the pandemic, by the way, it wasn't right, right? Because at that time, there's still a lot of things were unknown. PyTorch was still figuring stuff out, and they had a lot of very ambitious projects. And at the time, I'm like, okay, well, I assume that Meta will go off and solve these problems, right? And so I joined a hardware startup to understand the other side, the business strategy, the commercial side of things, how the company building side of things and all this kind of stuff, learned a ton. Also that I'm a software person, not a hardware person, but Tim was going through his own journey. And so Tim joined Google Brain roughly the same time I did in 2017. We worked together very closely. I was on the data center TPU side. He was on the mobile side with Android and all that kind of stuff. I was engineering, he was product. We were very complimentary that whole time. He stayed at Google through all that time until about 2020 and to 2021 through 2021. And so we kind of got to the points in our journey where we're saying, okay, well, what are we going to do next? And so middle of 2021, we said, okay, well, this AI infra problem is still a thing. This is, in our opinion, was not getting fixed. We looked at this and said, okay, well, what are the problems in the space? A reductive way of asking the question is you say, if AI is so important to the world, this was before chat GPT, but AI was important before chat GPT, by the way. If it's so important to the world, why is all the software so bad? Why is it so hard to deploy a model? I mean, we did huge amounts of work to make it easy to train models, but getting them into production is still very, very challenging. And so what we did is we broke this all down and we said, okay, well, there's really three kinds of software in the world. There's the hardware specific software. So CUDA or the XLA stack or the Apple neural engine stack with Core ML, things like this. And it's not the hardware people's fault, but they have to build this vertical software stack for their hardware because there's nothing to plug into. There's no LLVM for machine learning, right? And as a consequence of doing that, and they're not malicious, but they end up fragmenting the universe because they all have to build different stuff. Okay, so that's one third of the software in AI. Another third is the frameworks. So you've got TensorFlow, you got PyTorch, you got TVM, you got like all this stuff out there. All these things were, you know, they're eight years old. The infrastructure itself was research, right? These things were built in a different era of what ML was, and they got evolved along the way and new hardware and new use cases and all this stuff. And they were never intentionally designed by, you know, from what we know now. Furthermore, often because AI was so important to their host companies, hundreds of people got thrown at it, right? And so I don't know how much money has been spent on TensorFlow or PyTorch, but it's a lot, right? And so you get all these people that are kind of hacking away in the combination of lots of hands and not a lot of clear vision. I mean, it's easier to understand in hindsight than it is to predict what AI will look like in five years, right? It means that it will generate a lot of stuff, which is maybe not the most clean architecture, right? And so we get these systems that have lots of well-known problems. And so PyTorch, for example, it's pretty difficult to deploy. It's pretty well-known. It doesn't really work great with lots of non-NVIDIA hardware, right? It doesn't scale super well for LLMs. These things are pretty well-known, but they're very difficult fundamentally to fix. And the PyTorch engineers are doing really great work. They're working hard on this, but it's really hard to fix given the environment that they're in. And so because you've got the hardware side of things that's fragmenting software, you've got the framework side that is, you know, they're tied to the architecture that they started with and things evolved. What we've got is we've got a lot of people who want to make AI easy. And so MLOps is this category that evolved. And what I think a lot of these folks tried to do is they said, hey, let's make it easy by making the API super simple. So AutoML, one example of this, maybe the most extreme, but lots of other people said, hey, I'm going to add a layer of Python on top of this gigantic mess, and that will make it easy to do AI. But the challenge is you can't solve programmability or performance or hardware capabilities or new kinds of algorithms or like security, like these core problem deployability, these core problems that people are struggling with by adding a layer of Python on top, at least not without giving up the mad joy of like all the craziness of AI research. Right? And so what we decided to do is we said, okay, well, let's go back and first principles this thing, like what is causing all of this madness? Well, it's because there's no thing for people to plug into. Let's go do that hard thing. Let's go build from the bottom up. One of our first blog posts was, you know, it's before we could say what we were doing. It's like the mission statement of what let's actually design and first principles of stuff. Let's build this unifying platform. Let's tackle the hard problem. And so that's what we decided to do. [01:16:47]Alessio: At Decibel, our team is kind of like early believers and technical founders. And we see a lot of founders like yourself. You have a very long career. It's like an amazing engineer. And then all of a sudden you're like in the CEO seat. What are some of the learnings that you've had building a team, mentoring people, especially when I'm sure a lot of your work has been mentoring engineer, and now it's like also having the product head, also having the fundraising head, any stories and learnings? [01:17:13]Chris: So at Modular, my co-founder, Tim and I, we're like two in a box, right? So one of the things that I think is really special is that we have a very strong relationship and we complement each other very well, yin and yang, right? And so having somebody to talk to is really, really important. And it's not something that I've had being engineering leader at Google or engineering leader at Apple or something like this. And so that I think is super special. I'll also say that, you know, I've built many teams, many products and technologies. And so I built all this kind of stuff, but it's always within somebody else's context. And so it's really nice to not have to clean up somebody else's mess, right? [01:17:46]Swyx: Well, it's your mess now. Yeah, exactly. [01:17:48]Chris: And so also you get to, again, you get a first principles of everything. Like how do we think about comp? How do we think about, you know, a lot of the philosophy at Modular was, okay, well, you know, our belief when we started the company was we understood the pain. I'll speak on behalf of Tim. Tim understood the pain with his Google hat on, right? And he worked with a lot of customers outside and things like this, but having a Google hat on is very different than having a startup hat on, right? And so when we started the company, we started and said, okay, well, Chris goes and engineering leader, go start building the thing and build the engineering team and all that kind of stuff. Tim goes and builds the product side and the business side and things like this and goes and interviews 50 or 100 different companies without a Google hat on. What is your pain point? What are you doing? What are your challenges? How can we help? We're thinking about building X. What do you think about that? And really hone the vision. And that's what allowed us to come back together. And so the challenging things about being Modulars, we're trying to build something that is really hard. It's a super hard tech problem. Also pretty abstract. I mean, it's getting less abstract now that it's working and it's all coming together and we can announce things, right? But solving this problem requires hiring these very expensive specialists out of all these big tech companies, right? And so that really formed and shaped a lot of our initial conditions, how we thought about things. And again, when you're first principaling this, you say, okay, well, because of that, I have to raise a lot of money. I have to be able to incentivize people well. I have to be able to pay them. I need to be able to make it comfortable, like make it so that they're not fish out of water. And a lot of that shapes how you do this stuff. And so I've really enjoyed it. I think that it's a lot of fun. It's also great because we can do things where, you know, you come back to, is TensorFlow or PyTorch a product? I would say no, but I'd also say self-reflectively, many of the things I've worked on for like Swift, for example, right? Or even Xcode are products in the sense of they are, there's a product manager and there's a team that works on it and that you ship it to customers, but it is not the core product of the company. Xcode is a loss center, right? Apple doesn't make money on it. It is because it is detached. It's kind of one level indirect from the customer, right? It's very easy for that team or for a support team like that or like the TensorFlow or the PyTorch team equivalently to go work on interesting technical projects that get very divorced from the customer because you don't really know what they're doing. And so for us, we're directly customer facing, right? We see the pain. And in AI, as I think you probably know, right? There's a lot of pain and building and deploying these things is really a mess. And sure, throw a layer of Python on top, you can make a demo simple, right? But a lot of the pain that the leading companies and the leading people that are building these things are facing are not that kind of a problem. It's that they're surrounded by too many things that don't really work, right? And so a lot of our vision on let's go unify all this stuff. Let's have fewer things that work better came directly out of talking to teams that their problem is that they're building a product and their product changes. They're not using one model. Their needs over time evolve. And okay, well, now we have a mobile product. Well, now what does that mean? That's a completely different universe, right? And so what ends up happening with the teams we work with is that they're often quite sophisticated and they've evolved lots of different messy systems for different special cases and it's killing them, right? And so they often want to just be able to run faster, right? Do I need a team of 50 engineers to deploy this model? Why do I need that? [01:21:09]Swyx: I was also curious about your learnings as an engineering leader. So you've just had tons of experience building teams and hiring engineers. Obviously people want to work with you naturally. So you just naturally get a buff. Oh yeah, so it's easy, right? What is your learnings or advice or just on the engineering management side of things? [01:21:26]Chris: Yeah, so I mean, I think there's different things. I consider my job is to help the team win, right? So I do what it takes to win. And you have to be like, starting from wanting to win is actually something that some people take for granted, right? And so you have to define what winning is. And so giving people a clear vision, having a clear purpose, keeping people aligned, super, super important when you've got a whole bunch of really good people that are all wanting to be heroes in their own journey, right? If the vectors add up, you can make a lot of progress really fast. If they're pointing against each other, they cancel out, right? Within, you know, because of who I am and what I like to do, like I will often help build the initial foundation of the thing myself. And so showing the team how to build things is really good for not just like, because I built a lot of this stuff before, like directly contribute, but also saying the culture. So one of the things that is really important to me in an engineering team is how fast can you spin, right? If you're sitting there and you have to wait 24 hours or three weeks for CI to run, well, it just slows everything down, right? And so, well, what does that mean? Well, it means testing strategy. It means like all of these things are just like core software engineering problems end up mattering a lot. And once you get a culture in there, like, you know, low dependencies, like do not just suck in third-party dependencies and hope it'll be great. Because there's lots of these things that kind of come into this. And then what you end up doing is you end up building a culture within the team. Now, when you do that, now you have really good people. You have to identify first when you're hiring, but also as people are evolving, like what are people good at, right? And I really believe that if you have a really powerful engineer, for example, or product manager or whatever, [01:23:01]Swyx: if they're really good, [01:23:01]Chris: you can throw them at any problem and they can make progress, right? But if you have somebody who's really good and really passionate and you line them up with something they really want to be doing, well, then they'll have superpowers, right? And so a lot of it is making sure people are working on the right problems. And so they're able to grow and do things and push and they have agency to own decisions and they're able to do things. And so it's kind of like this ongoing, like evolving dance, particularly in a high growth team, where what you're doing is you're looking for not just what are the lines of code you write, but also what are you contributing, right? And things like this. And so there's a lot of building a team that I'm not the guy that's going to write a management textbook or something like that, right? I mean, you should. I should probably write a compiler textbook first. [01:23:47]Swyx: Yeah, you have many contributions. [01:23:50]Chris: I like building the thing, unfortunately. And so I don't slow down for stuff like that. But a lot of it is, people get very focused on often the product or if they're really, really smart and they're good at business, they focus on the customer and the problems the customer has, right? But you can't solve and build the product without having the team. And so, so much of these things end up being these virtuous loops. And so thinking about all parts of those problems, I think is really an important part of being a leader and being a team. And again, this is one of the reasons I love Tim [01:24:21]Swyx: and love working with him [01:24:21]Chris: because he's really great at ways that I'm less great at and we're both learning from each other. Before we do landing ground, [01:24:27]Alessio: any people that should be joining your team, any role that you have open that you're looking for? [01:24:32]Chris: We are growing quite a bit. We are focused on a whole bunch of different things, including hardware, software boundary. And so if you're a kernel engineer, you care about performance, GPUs and like all the weird things that are out there, right? This is a major focus. We are not hiring researchers, but we really love applied people that like actually get a model to work in production and do things. And that's really great for us. We have a lot of customer engagements and things like that going on that can be very helpful and valuable with that. We're also growing out our go-to-market team and there's many different kinds of roles. You can check out our career page and we have a number of positions posted there. [01:25:06]Swyx: Awesome. [01:25:07]Alessio: So we have our usual three questions before wrapping up. One is on acceleration, one is on exploration, and then I'll take it away. So the acceleration one is, what's something that already happened in AI that you thought would take much longer to be here? [01:25:21]Chris: So the chat GPT explosion, I thought was super interesting, right? And for folks like us that have been paying attention to AI for a long time, chat GPT was super interesting to me because it was a user interface innovation. And chat GPT happened and then GPT-4 happened and the world generally didn't even notice GPT-4. Nerds like us did, right? But they had no idea, they don't care. Chat GPT was the thing that really got people excited and it was really, you know, RLHF, like, I mean, that goes into all this stuff, right? But it was really about the user interface and how they use it. And suddenly it opened people's minds to the power of what AI can do. And so I thought that was super interesting. And from a looking backwards perspective, I thought that brought AI forward in the public consciousness by several years, I think. [01:26:10]Swyx: I always say you want to combine model with modality. Like chat GPT, you know, we had Clippy before and Clippy never took off. But anyway, so the time was right. What do you think is the most interesting unsolved question in AI? Maybe not the one you're tackling. [01:26:24]Chris: There's lots of smart people with lots of different opinions about what AI is, right? And there are certain people that you know, and I know that think that everything just be an end-to-end neural net and software should go away, right? I think that the open question is, what is the balance between trained algorithms and intelligently designed algorithms? I do not believe personally that it is all one or all the other, right? And if you want to build a cat detector, then a CNN is a really good way to do that. If you want to write a bootloader or an operating system, then for loops are a good way to do that, right? But where do things phase out over time and how do we make it so that app developers can think about these things more consistently instead of thinking about them as, you know, category A versus category B, right? And I mean, part of my bet is that AI as a software development approach ends up being, you know, part of the tool set of how people think about building applications. You know, where applications are not just like an iPhone app or something like that, but it's your cloud services, your data pipeline. It's like this whole complicated dance that leads to building a user product, right? And so I think that we as an industry haven't yet figured that out, right? I mean, it's just so early. AI is like in its adolescent years right now. [01:27:35]Alessio: It's funny because like doing this podcast, we're like, oh, remember that? And then you look at the timestamp and it was like three months ago. Exactly. You know, it's kind of you look back and it's like, oh, it's not even one year since JGBT came out, you know? And we went from like no AI safety discourse, for example, to like AI is going to end the world. Then it's like, all we did was I put a chat online, you know, so it kind of makes you wonder. [01:27:58]Chris: And I'll admit, like in 2017, there was a bunch of people focused on safety. And I'm like, why does this matter? Right? And they were just ahead of their time. Now it's pretty clear. [01:28:06]Swyx: Yeah, exactly. [01:28:06]Chris: That's exactly right. [01:28:07]Swyx: They took it seriously when the rest of us were only looking at the math. Yeah. [01:28:11]Chris: Well, and that's one of the things I really love about some of the OG people like Jeff Hinton and some of these folks like Jan Leku because they were into AI before it was cool, right? They were working on this stuff before it was obvious to everyone. And I think that they have seen and can integrate across a much longer timeframe. And that the wisdom that comes out of that, I think enables them to do even today, really amazing things that they get that better perspective for. [01:28:36]Alessio: Awesome. Before we wrap, Chris, any final takeaway message that you want everybody to think about and remember? [01:28:43]Chris: No, I mean, thank you for having me. I mean, this is a lot of fun and I really love being able to talk at a much more technical level about the AI part of what we're doing. And so I'm just so excited about where things are, what's happening, what the world's building, like just everything about what's happening right now is just super exciting to me. Awesome. [01:29:01]Alessio: Thank you so much, Chris. [01:29:02]Swyx: Thank you. [01:29:02] Get full access to Latent Space at www.latent.space/subscribe
01:29:2214/09/2023
The Point of LangChain — with Harrison Chase of LangChain
As alluded to on the pod, LangChain has just launched LangChain Hub: “the go-to place for developers to discover new use cases and polished prompts.” It’s available to everyone with a LangSmith account, no invite code necessary. Check it out!In 2023, LangChain has speedrun the race from 2:00 to 4:00 to 7:00 Silicon Valley Time. From the back to back $10m Benchmark seed and (rumored) $20-25m Sequoia Series A in April, to back to back critiques of “LangChain is Pointless” and “The Problem with LangChain” in July, to teaching with Andrew Ng and keynoting at basically every AI conference this fall (including ours), it has been an extreme rollercoaster for Harrison and his growing team creating one of the most popular (>60k stars at time of writing) building blocks for AI Engineers.LangChain’s OriginsThe first commit to LangChain shows its humble origins as a light wrapper around Python’s formatter.format for prompt templating. But as Harrison tells the story, even his first experience with text-davinci-002 in early 2022 was focused on chatting with data from their internal company Notion and Slack, what is now known as Retrieval Augmented Generation (RAG). As the Generative AI meetup scene came to life post Stable Diffusion, Harrison saw a need for common abstractions for what people were building with text LLMs at the time:* LLM Math, aka Riley Goodside’s “You Can’t Do Math” REPL-in-the-loop (PR #8)* Self-Ask With Search, Ofir Press’ agent pattern (PR #9) (later ReAct, PR #24)* NatBot, Nat Friedman’s browser controlling agent (PR #18)* Adapters for OpenAI, Cohere, and HuggingFaceHubAll this was built and launched in a few days from Oct 16-25, 2022. Turning research ideas/exciting usecases into software quickly and often has been in the LangChain DNA from Day 1 and likely a big driver of LangChain’s success, to date amassing the largest community of AI Engineers and being the default launch framework for every big name from Nvidia to OpenAI:Dancing with GiantsBut AI Engineering is built atop of constantly moving tectonic shifts: * ChatGPT launched in November (“The Day the AGI Was Born”) and the API released in March. Before the ChatGPT API, OpenAI did not have a chat endpoint. In order to build a chatbot with history, you had to make sure to chain all messages and prompt for completion. LangChain made it easy to do that out of the box, which was a huge driver of usage. * Today, OpenAI has gone all-in on the chat API and is deprecating the old completions models, essentially baking in the chat pattern as the default way most engineers should interact with LLMs… and reducing (but not eliminating) the value of ConversationChains.* And there have been more updates since: Plugins released in API form as Functions in June (one of our top pods ever… reducing but not eliminating the value of OutputParsers) and Finetuning in August (arguably reducing some need for Retrieval and Prompt tooling). With each update, OpenAI and other frontier model labs realign the roadmaps of this nascent industry, and Harrison credits the modular design of LangChain in staying relevant. LangChain has not been merely responsive either: LangChain added Agents in November, well before they became the hottest topic of the AI Summer, and now Agents feature as one of LangChain’s top two usecases. LangChain’s problem for podcasters and newcomers alike is its sheer scope - it is the world’s most complete AI framework, but it also has a sprawling surface area that is difficult to fully grasp or document in one sitting. This means it’s time for the trademark Latent Space move (ChatGPT, GPT4, Auto-GPT, and Code Interpreter Advanced Data Analysis GPT4.5): the executive summary!What is LangChain?As Harrison explains, LangChain is an open source framework for building context-aware reasoning applications, available in Python and JS/TS.It launched in Oct 2022 with the central value proposition of “composability”, aka the idea that every AI engineer will want to switch LLMs, and combine LLMs with other things into “chains”, using a flexible interface that can be saved via a schema.Today, LangChain’s principal offerings can be grouped as:* Components: isolated modules/abstractions* Model I/O* Models (for LLM/Chat/Embeddings, from OpenAI, Anthropic, Cohere, etc)* Prompts (Templates, ExampleSelectors, OutputParsers)* Retrieval (revised and reintroduced in March)* Document Loaders (eg from CSV, JSON, Markdown, PDF)* Text Splitters (15+ various strategies for chunking text to fit token limits)* Retrievers (generic interface for turning an unstructed query into a set of documents - for self-querying, contextual compression, ensembling)* Vector Stores (retrievers that search by similarity of embeddings)* Indexers (sync documents from any source into a vector store without duplication)* Memory (for long running chats, whether a simple Buffer, Knowledge Graph, Summary, or Vector Store)* Use-Cases: compositions of Components* Chains: combining a PromptTemplate, LLM Model and optional OutputParser* with Router, Sequential, and Transform Chains for advanced usecases* savable, sharable schemas that can be loaded from LangChainHub* Agents: a chain that has access to a suite of tools, of nondeterministic length because the LLM is used as a reasoning engine to determine which actions to take and in which order. Notable 100LOC explainer here.* Tools (interfaces that an agent can use to interact with the world - preset list here. Includes things like ChatGPT plugins, Google Search, WolframAlpha. Groups of tools are bundled up as toolkits)* AgentExecutor (the agent runtime, basically the while loop, with support for controls, timeouts, memory sharing, etc)* LangChain has also added a Callbacks system for instrumenting each stage of LLM, Chain, and Agent calls (which enables LangSmith, LangChain’s first cloud product), and most recently an Expression Language, a declarative way to compose chains.LangChain the company incorporated in January 2023, announced their seed round in April, and launched LangSmith in July. At time of writing, the company has 93k followers, their Discord has 31k members and their weekly webinars are attended by thousands of people live.The full-featuredness of LangChain means it is often the first starting point for building any mainstream LLM use case, because they are most likely to have working guides for the new developer. Logan (our first guest!) from OpenAI has been a notable fan of both LangChain and LangSmith (they will be running the first LangChain + OpenAI workshop at AI Eng Summit). However, LangChain is not without its critics, with Aravind Srinivas, Jim Fan, Max Woolf, Mckay Wrigley and the general Reddit/HN community describing frustrations with the value of their abstractions, and many are attempting to write their own (the common experience of adding and then removing LangChain is something we covered in our Agents writeup). Harrison compares this with the timeless ORM debate on the value of abstractions.LangSmithLast month, Harrison launched LangSmith, their LLM observability tool and first cloud product. LangSmith makes it easy to monitor all the different primitives that LangChain offers (agents, chains, LLMs) as well as making it easy to share and evaluate them both through heuristics (i.e. manually written ones) and “LLM evaluating LLM” flows. The top HN comment in the “LangChain is Pointless” thread observed that orchestration is the smallest part of the work, and the bulk of it is prompt tuning and data serialization. When asked this directly our pod, Harrison agreed:“I agree that those are big pain points that get exacerbated when you have these complex chains and agents where you can't really see what's going on inside of them. And I think that's partially why we built Langsmith…” (48min mark)You can watch the full launch on the LangChain YouTube:It’s clear that the target audience for LangChain is expanding to folks who are building complex, production applications rather than focusing on the simpler “Q&A your docs” use cases that made it popular in the first place. As the AI Engineer space matures, there will be more and more tools graduating from supporting “hobby” projects to more enterprise-y use cases. In this episode we run through some of the history of LangChain, how it’s growing from an open source project to one of the highest valued AI startups out there, and its future. We hope you enjoy it!Show Notes* LangChain* LangChain’s Berkshire Hathaway Homepage* Abstractions tweet* LangSmith* LangSmith Cookbooks repo* LangChain Retrieval blog* Evaluating CSV Question/Answering blog and YouTube* MultiOn Partner blog* Harvard Sports Analytics Collective* Evaluating RAG Webinar* awesome-langchain:* LLM Math Chain* Self-Ask* LangChain Hub UI* “LangChain is Pointless”* Harrison’s links* sports - estimating player compatibility in the NBA* early interest in prompt injections* GitHub* TwitterTimestamps* [00:00:00] Introduction* [00:00:48] Harrison's background and how sports led him into ML* [00:04:54] The inspiration for creating LangChain - abstracting common patterns seen in other GPT-3 projects* [00:05:51] Overview of LangChain - a framework for building context-aware reasoning applications* [00:10:09] Components of LangChain - modules, chains, agents, etc.* [00:14:39] Underappreciated parts of LangChain - text splitters, retrieval algorithms like self-query* [00:18:46] Hiring at LangChain* [00:20:27] Designing the LangChain architecture - balancing flexibility and structure* [00:24:09] The difference between chains and agents in LangChain* [00:25:08] Prompt engineering and LangChain* [00:26:16] Announcing LangSmith* [00:30:50] Writing custom evaluators in LangSmith* [00:33:19] Reducing hallucinations - fixing retrieval vs generation issues* [00:38:17] The challenges of long context windows* [00:40:01] LangChain's multi-programming language strategy* [00:45:55] Most popular LangChain blog posts - deep dives into specific topics* [00:50:25] Responding to LangChain criticisms* [00:54:11] Harrison's advice to AI engineers* [00:55:43] Lightning RoundTranscriptAlessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai. [00:00:19]Swyx: Welcome. Today we have Harrison Chase in the studio with us. Welcome Harrison. [00:00:23]Harrison: Thank you guys for having me. I'm excited to be here. [00:00:25]Swyx: It's been a long time coming. We've been asking you for a little bit and we're really glad that you got some time to join us in the studio. Yeah. [00:00:32]Harrison: I've been dodging you guys for a while. [00:00:34]Swyx: About seven months. You pulled me in here. [00:00:37]Alessio: About seven months. But it's all good. I totally understand. [00:00:38]Swyx: We like to introduce people through the official backgrounds and then ask you a little bit about your personal side. So you went to Harvard, class of 2017. You don't list what you did in Harvard. Was it CS? [00:00:48]Harrison: Stats and CS. [00:00:50]Swyx: That's awesome. I love me some good stats. [00:00:52]Harrison: I got into it through stats, through doing sports analytics. And then there was so much overlap between stats and CS that I found myself doing more and more of that. [00:00:59]Swyx: And it's interesting that a lot of the math that you learn in stats actually comes over into machine learning which you applied at Kensho as a machine learning engineer and Robust Intelligence, which seems to be the home of a lot of AI founders.Harrison: It does. Yeah. Swyx: And you started LangChain, I think around November 2022 and incorporated in January. Yeah. [00:01:19]Harrison: I was looking it up for the podcast and the first tweet was on, I think October 24th. So just before the end of November or end of October. [00:01:26]Swyx: Yeah. So that's your LinkedIn. What should people know about you on the personal side that's not obvious on LinkedIn? [00:01:33]Harrison: A lot of how I got into this is all through sports actually. Like I'm a big sports fan, played a lot of soccer growing up and then really big fan of the NBA and NFL. And so freshman year at college showed up and I knew I liked math. I knew I liked sports. One of the clubs that was there was the Sports Analytics Collective. And so I joined that freshman year, I was doing a lot of stuff in like Excel, just like basic stats, but then like wanted to do more advanced stuff. So learn to code, learn kind of like data science and machine learning through that way. Kind of like just kept on going down that path. I think sports is a great entryway to data science and machine learning. There's a lot of like numbers out there. People like really care. Like I remember, I think sophomore, junior year, I was in the Sports Collective and the main thing we had was a blog. And so we wrote a blog. It wasn't me. One of the other people in the club wrote a blog predicting the NFL season. I think they made some kind of like with stats and I think their stats showed that like the Dolphins would end up beating the Patriots and New England got like pissed about it, of course. So people like really care and they'll give you feedback about whether you're like models doing well or poorly. And so you get that. And then you also get like instantaneous kind of like, well, not instantaneous, but really quick feedback. Like if you predict a game, the game happens that night. Like you don't have to wait a year to see what happens. So I think sports is a great kind of like entryway for kind of like data science. [00:02:43]Alessio: There was actually my first article on the Twilio blog with a Python script to like predict pricing of like Daily Fantasy players based on my past week performance. Yeah, I don't know. It's a good getaway drug. [00:02:56]Swyx: And on my end, the way I got into finance was through sports betting. So maybe we all have some ties in there. Was like Moneyball a big inspiration? The movie? [00:03:06]Harrison: Honestly, not really. I don't really like baseball. That's like the big thing. [00:03:10]Swyx: Let's call it a lot of stats. Cool. Well, we can dive right into LangChain, which is what everyone is excited about. But feel free to make all the sports analogies you want. That really drives home a lot of points. What was your GPT aha moment? When did you start working on GPT itself? Maybe not LangChain, just anything to do with the GPT API? [00:03:29]Harrison: I think it probably started around the time we had a company hackathon. I think that was before I launched LangChain. I'm trying to remember the exact sequence of events, but I do remember that at the hackathon I worked with Will, who's now actually at LangChain as well, and then two other members of Robust. And we made basically a bot where you could ask questions of Notion and Slack. And so I think, yeah, RAG, basically. And I think I wanted to try that out because I'd heard that it was getting good. I'm trying to remember if I did anything before that to realize that it was good. So then I would focus on that on the hackathon. I can't remember or not, but that was one of the first times that I built something [00:04:06]Swyx: with GPT-3. There wasn't that much opportunity before because the API access wasn't that widespread. You had to get into some kind of program to get that. [00:04:16]Harrison: DaVinci-002 was not terrible, but they did an upgrade to get it to there, and they didn't really publicize that as much. And so I think I remember playing around with it when the first DaVinci model came out. I was like, this is cool, but it's not amazing. You'd have to do a lot of work to get it to do something. But then I think that February or something, I think of 2022, they upgraded it and it was it got better, but I think they made less of an announcement around it. And so I just, yeah, it kind of slipped under the radar for me, at least. [00:04:45]Alessio: And what was the step into LangChain? So you did the hackathon, and then as you were building the kind of RAG product, you felt like the developer experience wasn't that great? Or what was the inspiration? [00:04:54]Harrison: No, honestly, so around that time, I knew I was going to leave my previous job. I was trying to figure out what I was going to do next. I went to a bunch of meetups and other events. This was like the September, August, September of that year. So after Stable Diffusion, but before ChatGPT. So there was interest in generative AI as a space, but not a lot of people hacking on language models yet. But there were definitely some. And so I would go to these meetups and just chat with people and basically saw some common abstractions in terms of what they were building, and then thought it would be a cool side project to factor out some of those common abstractions. And that became kind of like LangChain. I looked up again before this, because I remember I did a tweet thread on Twitter to announce LangChain. And we can talk about what LangChain is. It's a series of components. And then there's some end-to-end modules. And there was three end-to-end modules that were in the initial release. One was NatBot. So this was the web agent by Nat Friedman. Another was LLM Math Chain. So it would construct- [00:05:51]Swyx: GPT-3 cannot do math. [00:05:53]Harrison: Yeah, exactly. And then the third was Self-Ask. So some type of RAG search, similar to React style agent. So those were some of the patterns in terms of what I was seeing. And those all came from open source or academic examples, because the people who were actually working on this were building startups. And they were doing things like question answering over your databases, question answering over SQL, things like that. But I couldn't use their code as kind of like inspiration to factor things out. [00:06:18]Swyx: I talked to you a little bit, actually, roundabout, right after you announced LangChain. I'm honored. I think I'm one of many. This is your first open source project. [00:06:26]Harrison: No, that's not actually true. I released, because I like sports stats. And so I remember I did release some really small, random Python package for scraping data from basketball reference or something. I'm pretty sure I released that. So first project to get a star on GitHub, let's say that. [00:06:45]Swyx: Did you reference anything? What was the inspirations, like other frameworks that you look to when open sourcing LangChain or announcing it or anything like that? [00:06:53]Harrison: I mean, the only main thing that I looked for... I remember reading a Hacker News post a little bit before about how a readme on the project goes a long way. [00:07:02]Swyx: Readme's help. [00:07:03]Harrison: Yeah. And so I looked at it and was like, put some status checks at the top and have the title and then one or two lines and then just right into installation. And so that's the main thing that I looked at in terms of how to structure it. Because yeah, I hadn't done open source before. I didn't really know how to communicate that aspect of the marketing or getting people to use it. I think I had some trouble finding it, but I finally found it and used that as a lot [00:07:25]Swyx: of the inspiration there. Yeah. It was one of the subjects of my write-up how it was surprising to me that significant open source experience actually didn't seem to matter in the new wave of AI tooling. Most like auto-GPTs, Torrents, that was his first open source project ever. And that became auto-GPT. Yeah. I don't know. To me, it's just interesting how open source experience is kind of fungible or not necessary. Or you can kind of learn it on the job. [00:07:49]Alessio: Overvalued. [00:07:50]Swyx: Overvalued. Okay. You said it, not me. [00:07:53]Alessio: What's your description of LangChain today? I think when I built the LangChain Hub UI in January, there were a few things. And I think you were one of the first people to talk about agents that were already in there before it got hot now. And it's obviously evolved into a much bigger framework today. Run people through what LangChain is today, how they should think about it, and all of that. [00:08:14]Harrison: The way that we describe it or think about it internally is that LangChain is basically... I started off saying LangChain's a framework for building LLM applications, but that's really vague and not really specific. And I think part of the issue is LangChain does do a lot, so it's hard to be somewhat specific. But I think the way that we think about it internally, in terms of prioritization, what to focus on, is basically LangChain's a framework for building context-aware reasoning applications. And so that's a bit of a mouthful, but I think that speaks to a lot of the core parts of what's in LangChain. And so what concretely that means in LangChain, there's really two things. One is a set of components and modules. And these would be the prompt template abstraction, the LLM abstraction, chat model abstraction, vector store abstraction, text splitters, document loaders. And so these are combinations of things that we build and we implement, or we just have integrations with. So we don't have any language models ourselves. We don't have any vector stores ourselves, but we integrate with a lot of them. And then the text splitters, we have our own logic for that. The document loaders, we have our own logic for that. And so those are the individual modules. But then I think another big part of LangChain, and probably the part that got people using it the most, is the end-to-end chains or applications. So we have a lot of chains for getting started with question answering over your documents, chat question answering, question answering over SQL databases, agent stuff that you can plug in off the box. And that basically combines these components in a series of specific ways to do this. So if you think about a question answering app, you need a lot of different components kind of stacked. And there's a bunch of different ways to do question answering apps. So this is a bit of an overgeneralization, but basically, you know, you have some component that looks up an embedding from a vector store, and then you put that into the prompt template with the question and the context, and maybe you have the chat history as well. And then that generates an answer, and then maybe you parse that out, or you do something with the answer there. And so there's just this sequence of things that you basically stack in a particular way. And so we just provide a bunch of those assembled chains off the shelf to make it really easy to get started in a few lines of code. [00:10:09]Alessio: And just to give people context, when you first released LangChain, OpenAI did not have a chat API. It was a completion-only API. So you had to do all the human assistant, like prompting and whatnot. So you abstracted a lot of that away. I think the most interesting thing to me is you're kind of the Switzerland of this developer land. There's a bunch of vector databases that are killing each other out there to get people to embed data in them, and you're like, I love you all. You all are great. How do you think about being an opinionated framework versus leaving a lot of choice to the user? I mean, in terms of spending time into this integration, it's like you only have 10 people on the team. Obviously that takes time. Yeah. What's that process like for you all? [00:10:50]Harrison: I think right off the bat, having different options for language models. I mean, language models is the main one that right off the bat we knew we wanted to support a bunch of different options for. There's a lot to discuss there. People want optionality between different language models. They want to try it out. They want to maybe change to ones that are cheaper as new ones kind of emerge. They don't want to get stuck into one particular one if a better one comes out. There's some challenges there as well. Prompts don't really transfer. And so there's a lot of nuance there. But from the bat, having this optionality between the language model providers was a big important part because I think that was just something we felt really strongly about. We believe there's not just going to be one model that rules them all. There's going to be a bunch of different models that are good for a bunch of different use cases. I did not anticipate the number of vector stores that would emerge. I don't know how many we supported in the initial release. It probably wasn't as big of a focus as language models was. But I think it kind of quickly became so, especially when Postgres and Elastic and Redis started building their vector store implementations. We saw that some people might not want to use a dedicated vector store. Maybe they want to use traditional databases. I think to your point around what we're opinionated about, I think the thing that we believe most strongly is it's super early in the space and super fast moving. And so there's a lot of uncertainty about how things will shake out in terms of what role will vector databases play? How many will there be? And so I think a lot of it has always kind of been this optionality and ability to switch and not getting locked in. [00:12:19]Swyx: There's other pieces of LangChain which maybe don't get as much attention sometimes. And the way that you explained LangChain is somewhat different from the docs. I don't know how to square this. So for example, you have at the top level in your docs, you have, we mentioned ModelIO, we mentioned Retrieval, we mentioned Chains. Then you have a concept called Agents, which I don't know if exactly matches what other people call Agents. And we also talked about Memory. And then finally there's Callbacks. Are there any of the less understood concepts in LangChain that you want to give some air to? [00:12:53]Harrison: I mean, I think buried in ModelIO is some stuff around like few-shot example selectors that I think is really powerful. That's a workhorse. [00:13:01]Swyx: Yeah. I think that's where I start with LangChain. [00:13:04]Harrison: It's one of those things that you probably don't, if you're building an application, you probably don't start with it. You probably start with like a zero-shot prompt. But I think that's a really powerful one that's probably just talked about less because you don't need it right off the bat. And for those of you who don't know, that basically selects from a bunch of examples the ones that are maybe most relevant to the input at hand. So you can do some nice kind of like in-context learning there. I think that's, we've had that for a while. I don't think enough people use that, basically. Output parsers also used to be kind of important, but then function calling. There's this interesting thing where like the space is just like progressing so rapidly that a lot of things that were really important have kind of diminished a bit, to be honest. Output parsers definitely used to be an understated and underappreciated part. And I think if you're working with non-OpenAI models, they still are, but a lot of people are working with OpenAI models. But even within there, there's different things you can do with kind of like the function calling ability. Sometimes you want to have the option of having the text or the application you're building, it could return either. Sometimes you know that it wants to return in a structured format, and so you just want to take that structured format. Other times you're extracting things that are maybe a key in that structured format, and so you want to like pluck that key. And so there's just like some like annoying kind of like parsing of that to do. Agents, memory, and retrieval, we haven't talked at all. Retrieval, there's like five different subcomponents. You could also probably talk about all of those in depth. You've got the document loaders, the text splitters, the embedding models, the vector stores. Embedding models and vector stores, we don't really have, or sorry, we don't build, we integrate with those. Text splitters, I think we have like 15 or so. Like I think there's an under kind of like appreciated amount of those. [00:14:39]Swyx: And then... Well, it's actually, honestly, it's overwhelming. Nobody knows what to choose. [00:14:43]Harrison: Yeah, there is a lot. [00:14:44]Swyx: Yeah. Do you have personal favorites that you want to shout out? [00:14:47]Harrison: The one that we have in the docs is the default is like the recursive text splitter. We added a playground for text splitters the other week because, yeah, we heard a lot that like, you know, and like these affect things like the chunk overlap and the chunks, they affect things in really subtle ways. And so like I think we added a playground where people could just like choose different options. We have like, and a lot of the ideas are really similar. You split on different characters, depending on kind of like the type of text that you have marked down, you might want to split on differently than HTML. And so we added a playground where you can kind of like choose between those. I don't know if those are like underappreciated though, because I think a lot of people talk about text splitting as being a hard part, and it is a really important part of creating these retrieval applications. But I think we have a lot of really cool retrieval algorithms as well. So like self query is maybe one of my favorite things in LangChain, which is basically this idea of when you have a user question, the typical kind of like thing to do is you embed that question and then find the document that's most similar to that question. But oftentimes questions have things that just, you don't really want to look up semantically, they have some other meaning. So like in the example that I use, the example in the docs is like movies about aliens in the year 1980. 1980, I guess there's some semantic meaning for that, but it's a very particular thing that you care about. And so what the self query retriever does is it splits out the metadata filter and most vector stores support like a metadata filter. So it splits out this metadata filter, and then it splits out the semantic bit. And that's actually like kind of tricky to do because there's a lot of different filters that you can have like greater than, less than, equal to, you can have and things if you have multiple filters. So we have like a pretty complicated like prompt that does all that. That might be one of my favorite things in LangChain, period. Like I think that's, yeah, I think that's really cool. [00:16:26]Alessio: How do you think about speed of development versus support of existing things? So we mentioned retrieval, like you got, or, you know, text splitting, you got like different options for all of them. As you get building LangChain, how do you decide which ones are not going to keep supporting, you know, which ones are going to leave behind? I think right now, as you said, the space moves so quickly that like you don't even know who's using what. What's that like for you? [00:16:50]Harrison: Yeah. I mean, we have, you know, we don't really have telemetry on what people are using in terms of what parts of LangChain, the telemetry we have is like, you know, anecdotal stuff when people ask or have issues with things. A lot of it also is like, I think we definitely prioritize kind of like keeping up with the stuff that comes out. I think we added function calling, like the day it came out or the day after it came out, we added chat model support, like the day after it came out or something like that. That's probably, I think I'm really proud of how the team has kind of like kept up with that because this space is like exhausting sometimes. And so that's probably, that's a big focus of ours. The support, I think we've like, to be honest, we've had to get kind of creative with how we do that. Cause we have like, I think, I don't know how many open issues we have, but we have like 3000, somewhere between 2000 and 3000, like open GitHub issues. We've experimented with a lot of startups that are doing kind of like question answering over your docs and stuff like that. And so we've got them on the website and in the discord and there's a really good one, dosu on the GitHub that's like answering issues and stuff like that. And that's actually something we want to start leaning into more heavily as a company as well as kind of like building out an AI dev rel because we're 10 people now, 10, 11 people now. And like two months ago we were like six or something like that. Right. So like, and to have like 2,500 open issues or something like that, and like 300 or 400 PRs as well. Cause like one of the amazing things is that like, and you kind of alluded to this earlier, everyone's building in the space. There's so many different like touch points. LangChain is lucky enough to kind of like be a lot of the glue that connects it. And so we get to work with a lot of awesome companies, but that's also a lot of like work to keep up with as well. And so I don't really have an amazing answer, but I think like the, I think prioritize kind of like new things that, that come out. And then we've gotten creative with some of kind of like the support functions and, and luckily there's, you know, there's a lot of awesome people working on all those support coding, question answering things that we've been able to work with. [00:18:46]Swyx: I think there is your daily rhythm, which I've seen you, you work like a, like a beast man, like mad impressive. And then there's sometimes where you step back and do a little bit of high level, like 50,000 foot stuff. So we mentioned, we mentioned retrieval. You did a refactor in March and there's, there's other abstractions that you've sort of changed your mind on. When do you do that? When do you do like the, the step back from the day to day and go, where are we going and change the direction of the ship? [00:19:11]Harrison: It's a good question so far. It's probably been, you know, we see three or four or five things pop up that are enough to make us think about it. And then kind of like when it reaches that level, you know, we don't have like a monthly meeting where we sit down and do like a monthly plan or something. [00:19:27]Swyx: Maybe we should. I've thought about this. Yeah. I'd love to host that meeting. [00:19:32]Harrison: It's really been a lot of, you know, one of the amazing things is we get to interact with so many different people. So it's been a lot of kind of like just pattern matching on what people are doing and trying to see those patterns before they punch us in the face or something like that. So for retrieval, it was the pattern of seeing like, Hey, yeah, like a lot of people are using vector sort of stuff. But there's also just like other methods and people are offering like hosted solutions and we want our abstractions to work with that as well. So we shouldn't bake in this paradigm of doing like semantic search too heavily, which sounds like basic now, but I think like, you know, to start a lot of it was people needed help doing these things. But then there was like managed things that did them, hybrid retrieval mechanisms, all of that. I think another example of this, I mean, Langsmith, which we can maybe talk about was like very kind of like, I think we worked on that for like three or four months before announcing it kind of like publicly, two months maybe before giving it to kind of like anyone in beta. But this was a lot of debugging these applications as a pain point. We hear that like just understanding what's going on is a pain point. [00:20:27]Alessio: I mean, you two did a webinar on this, which is called Agents vs. Chains. It was fun, baby. [00:20:32]Swyx: Thanks for having me on. [00:20:33]Harrison: No, thanks for coming. [00:20:34]Alessio: That was a good one. And on the website, you list like RAG, which is retrieval of bank debt generation and agents as two of the main goals of LangChain. The difference I think at the Databricks keynote, you said chains are like predetermined steps and agents is models reasoning to figure out what steps to take and what actions to take. How should people think about when to use the two and how do you transition from one to the other with LangChain? Like is it a path that you support or like do people usually re-implement from an agent to a chain or vice versa? [00:21:05]Swyx: Yeah. [00:21:06]Harrison: You know, I know agent is probably an overloaded term at this point, and so there's probably a lot of different definitions out there. But yeah, as you said, kind of like the way that I think about an agent is basically like in a chain, you have a sequence of steps. You do this and then you do this and then you do this and then you do this. And with an agent, there's some aspect of it where the LLM is kind of like deciding what to do and what steps to do in what order. And you know, there's probably some like gray area in the middle, but you know, don't fight me on this. And so if we think about those, like the benefits of the chains are that they're like, you can say do this and you just have like a more rigid kind of like order and the way that things are done. They have more control and they don't go off the rails and basically everything that's bad about agents in terms of being uncontrollable and expensive, you can control more finely. The benefit of agents is that I think they handle like the long tail of things that can happen really well. And so for an example of this, let's maybe think about like interacting with a SQL database. So you can have like a SQL chain and you know, the first kind of like naive approach at a SQL chain would be like, okay, you have the user question. And then you like write the SQL query, you do some rag, you pull in the relevant tables and schemas, you write a SQL query, you execute that against the SQL database. And then you like return that as the answer, or you like summarize that with an LLM and return that to the answer. And that's basically the SQL chain that we have in LangChain. But there's a lot of things that can go wrong in that process. Starting from the beginning, you may like not want to even query the SQL database at all. Maybe they're saying like, hi, or something, or they're misusing the application. Then like what happens if you have some step, like a big part of the application that people with LangChain is like the context aware part. So there's generally some part of bringing in context to the language model. So if you bring in the wrong context to the language model, so it doesn't know which tables to query, what do you do then? If you write a SQL query, it's like syntactically wrong and it can't run. And then if it can run, like what if it returns an unexpected result or something? And so basically what we do with the SQL agent is we give it access to all these different tools. So it has another tool, it can run the SQL query as another, and then it can respond to the user. But then if it kind of like, it can decide which order to do these. And so it gives it flexibility to handle all these edge cases. And there's like, obviously downsides to that as well. And so there's probably like some safeguards you want to put in place around agents in terms of like not letting them run forever, having some observability in there. But I do think there's this benefit of, you know, like, again, to the other part of what LangChain is like the reasoning part, like each of those steps individually involves some aspect of reasoning, for sure. Like you need to reason about what the SQL query is, you need to reason about what to return. But there's then there's also reasoning about the order of operations. And so I think to me, the key is kind of like giving it an appropriate amount to reason about while still keeping it within checks. And so to the point, like, I would probably recommend that most people get started with chains and then when they get to the point where they're hitting these edge cases, then they think about, okay, I'm hitting a bunch of edge cases where the SQL query is just not returning like the relevant things. Maybe I should add in some step there and let it maybe make multiple queries or something like that. Basically, like start with chain, figure out when you're hitting these edge cases, add in the reasoning step to that to handle those edge cases appropriately. That would be kind of like my recommendation, right? [00:24:09]Swyx: If I were to rephrase it, in my words, an agent would be a reasoning node in a chain, right? Like you start with a chain, then you just add a reasoning node, now it's an agent. [00:24:17]Harrison: Yeah, the architecture for your application doesn't have to be just a chain or just an agent. It can be an agent that calls chains, it can be a chain that has an agent in different parts of them. And this is another part as well. Like the chains in LangChain are largely intended as kind of like a way to get started and take you some amount of the way. But for your specific use case, in order to kind of like eke out the most performance, you're probably going to want to do some customization at the very basic level, like probably around the prompt or something like that. And so one of the things that we've focused on recently is like making it easier to customize these bits of existing architectures. But you probably also want to customize your architectures as well. [00:24:52]Swyx: You mentioned a bit of prompt engineering for self-ask and then for this stuff. There's a bunch of, I just talked to a prompt engineering company today, PromptOps or LLMOps. Do you have any advice or thoughts on that field in general? Like are you going to compete with them? Do you have internal tooling that you've built? [00:25:08]Harrison: A lot of what we do is like where we see kind of like a lot of the pain points being like we can talk about LangSmith and that was a big motivation for that. And like, I don't know, would you categorize LangSmith as PromptOps? [00:25:18]Swyx: I don't know. It's whatever you want it to be. Do you want to call it? [00:25:22]Harrison: I don't know either. Like I think like there's... [00:25:24]Swyx: I think about it as like a prompt registry and you store them and you A-B test them and you do that. LangSmith, I feel like doesn't quite go there yet. Yeah. It's obviously the next step. [00:25:34]Harrison: Yeah, we'll probably go. And yeah, we'll do more of that because I think that's definitely part of the application of a chain or agent is you start with a default one, then you improve it over time. And like, I think a lot of the main new thing that we're dealing with here is like language models. And the main new way to control language models is prompts. And so like a lot of the chains and agents are powered by this combination of like prompt language model and then some output parser or something doing something with the output. And so like, yeah, we want to make that core thing as good as possible. And so we'll do stuff all around that for sure. [00:26:05]Swyx: Awesome. We might as well go into LangSmith because we're bringing it up so much. So you announced LangSmith I think last month. What are your visions for it? Is this the future of LangChain and the company? [00:26:16]Harrison: It's definitely part of the future. So LangSmith is basically a control center for kind of like your LLM application. So the main features that it kind of has is like debugging, logging, monitoring, and then like testing and evaluation. And so debugging, logging, monitoring, basically you set three environment variables and it kind of like logs all the runs that are happening in your LangChain chains or agents. And it logs kind of like the inputs and outputs at each step. And so the main use case we see for this is in debugging. And that's probably the main reason that we started down this path of building it is I think like as you have these more complex things, debugging what's actually going on becomes really painful whether you're using LangChain or not. And so like adding this type of observability and debuggability was really important. Yeah. There's a debugging aspect. You can see the inputs, outputs at each step. You can then quickly enter into like a playground experience where you can fiddle around with it. The first version didn't have that playground and then we'd see people copy, go to open AI playground, paste in there. Okay. Well, that's a little annoying. And then there's kind of like the monitoring, logging experience. And we recently added some analytics on like, you know, how many requests are you getting per hour, minute, day? What's the feedback like over time? And then there's like a testing debugging, sorry, testing and evaluation component as well where basically you can create datasets and then test and evaluate these datasets. And I think importantly, all these things are tied to each other and then also into LangChain, the framework. So what I mean by that is like we've tried to make it as easy as possible to go from logs to adding a data point to a dataset. And because we think a really powerful flow is you don't really get started with a dataset. You can accumulate a dataset over time. And so being able to find points that have gotten like a thumbs up or a thumbs down from a user can be really powerful in terms of creating a good dataset. And so that's maybe like a connection between the two. And then the connection in the other way is like all the runs that you have when you test or evaluate something, they're logged in the same way. So you can debug what exactly is going on and you don't just have like a final score. You have like this nice trace and thing where you can jump in. And then we also want to do more things to hook this into a LangChain proper, the framework. So I think like some of like the managing the prompts will tie in here already. Like we talked about example selectors using datasets as a few short examples is a path that we support in a somewhat janky way right now, but we're going to like make better over time. And so there's this connection between everything. Yeah. [00:28:42]Alessio: And you mentioned the dataset in the announcement blog post, you touched on heuristic evaluation versus LLMs evaluating LLMs. I think there's a lot of talk and confusion about this online. How should people prioritize the two, especially when they might start with like not a good set of evals or like any data at all? [00:29:01]Harrison: I think it's really use case specific in the distinction that I draw between heuristic and LLM. LLMs, you're using an LLM to evaluate the output heuristics, you have some common heuristic that you can use. And so some of these can be like really simple. So we were doing some kind of like measuring of an extraction chain where we wanted it to output JSON. Okay. One evaluation can be, can you use JSON.loads to load it? And like, right. And that works perfectly. You don't need an LLM to do that. But then for like a lot of like the question answering, like, is this factually accurate? And you have some ground truth fact that you know it should be answering with. I think, you know, LLMs aren't perfect. And I think there's a lot of discussion around the pitfalls of using LLMs to evaluate themselves. And I'm not saying they're perfect by any means, but I do think they're, we've found them to be kind of like better than blue or any of those metrics. And the way that I also like to use those is also just like guide my eye about where to look. So like, you know, I might not trust the score of like 0.82, like exactly correct, but like I can look to see like which data points are like flagged as passing or failing. And sometimes the evaluators messing up, but it's like good to like, you know, I don't have to look at like a hundred data points. I can focus on like 10 or something like that. [00:30:10]Alessio: And then can you create a heuristic once in Langsmith? Like what's like your connection to that? [00:30:16]Harrison: Yeah. So right now, all the evaluation, we actually do client side. And part of this is basically due to the fact that a lot of the evaluation is really application specific. So we thought about having evaluators, you could just click off and run in a server side or something like that. But we still think it's really early on in evaluation. We still think there's, it's just really application specific. So we prioritized instead, making it easy for people to write custom evaluators and then run them client side and then upload the results so that they can manually inspect them because I think manual inspection is still a pretty big part of evaluation for better or worse. [00:30:50]Swyx: We have this sort of components of observability. We have cost, latency, accuracy, and then planning. Is that listed in there? [00:30:57]Alessio: Well, planning more in the terms of like, if you're an agent, how to pick the right tool and whether or not you are picking the right tool. [00:31:02]Swyx: So when you talk to customers, how would you stack rank those needs? Are they cost sensitive? Are they latency sensitive? I imagine accuracy is pretty high up there. [00:31:13]Harrison: I think accuracy is definitely the top that we're seeing right now. I think a lot of the applications, people are, especially the ones that we're working with, people are still struggling to get them to work at a level where they're reliable [00:31:24]Swyx: enough. [00:31:25]Harrison: So that's definitely the first. Then I think probably cost becomes the next one. I think a few places where we've started to see this be like one of the main things is the AI simulation that came out. [00:31:36]Swyx: Generative agents. Yeah, exactly. [00:31:38]Harrison: Which is really fun to run, but it costs a lot of money. And so one of our team members, Lance, did an awesome job hooking up like a local model to it. You know, it's not as perfect, but I think it helps with that. Another really big place for this, we believe, is in like extraction of structured data from unstructured data. And the reason that I think it's so important there is that usually you do extraction of some type of like pre-processing or indexing process over your documents. I mean, there's a bunch of different use cases, but one use case is for that. And generally that's over a lot of documents. And so that starts to rack up a bill kind of quickly. And I think extraction is also like a simpler task than like reasoning about which tools to call next in an agent. And so I think it's better suited for that. Yeah. [00:32:15]Swyx: On one of the heuristics I wanted to get your thoughts on, hallucination is one of the big problems there. Do you have any recommendations on how people should reduce hallucinations? [00:32:25]Harrison: To reduce hallucinations, we did a webinar on like evaluating RAG this past week. And I think there's this great project called RAGOS that evaluates four different things across two different spectrums. So the two different spectrums are like, is the retrieval part right? Or is the generation, or sorry, like, is it messing up in retrieval or is it messing up in generation? And so I think to fix hallucination, it probably depends on where it's messing up. If it's messing up in generation, then you're getting the right information, but it's still hallucinating. Or you're getting like partially right information and hallucinating some bits, a lot of that's prompt engineering. And so that's what we would recommend kind of like focusing on the prompt engineering part. And then if you're getting it wrong in the, if you're just not retrieving the right stuff, then there's a lot of different things that you can probably do, or you should look at on the retrieval bit. And honestly, that's where it starts to become a bit like application specific as well. Maybe there's some temporal stuff going on. Maybe you're not parsing things correctly. Yeah. [00:33:19]Swyx: Okay. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. [00:33:35]Harrison: Yeah. Yeah. [00:33:37]Swyx: Yeah. [00:33:38]Harrison: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. [00:33:56]Swyx: Yeah. Yeah. [00:33:58]Harrison: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. [00:34:04]Swyx: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. [00:34:17]Harrison: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah, I mean, there's probably a larger discussion around that, but openAI definitely had a huge headstart, right? And that's... Clawds not even publicly available yet, I don't think. [00:34:28]Swyx: The API? Yeah. Oh, well, you can just basically ask any of the business reps and they'll give it to you. [00:34:33]Harrison: You can. But it's still a different signup process. I think there's... I'm bullish that other ones will catch up especially like Anthropic and Google. The local ones are really interesting. I think we're seeing a big... [00:34:46]Swyx: Lama Two? Yeah, we're doing the fine-tuning hackathon tomorrow. Thanks for promoting that. [00:34:50]Harrison: No, thanks for it. I'm really excited about that stuff. I mean, that's something that like we've been, you know, because like, as I said, like the only thing we know is that the space is moving so fast and changing so rapidly. And like, local models are, have always been one of those things that people have been bullish on. And it seems like it's getting closer and closer to kind of like being viable. So I'm excited to see what we can do with some fine-tuning. [00:35:10]Swyx: Yeah. I have to confess, I did not know that you cared. It's not like a judgment on Langchain. I was just like, you know, you write an adapter for it and you're done, right? Like how much further does it go for Langchain? In terms of like, for you, it's one of the, you know, the model IO modules and that's it. But like, you seem very personally, very passionate about it, but I don't know what the Langchain specific angle for this is, for fine-tuning local models, basically. Like you're just passionate about local models and privacy and all that, right? And open source. [00:35:41]Harrison: Well, I think there's a few different things. Like one, like, you know, if we think about what it takes to build a really reliable, like context-aware reasoning application, there's probably a bunch of different nodes that are doing a bunch of different things. And I think it is like a really complex system. And so if you're relying on open AI for every part of that, like, I think that starts to get really expensive. Also like, probably just like not good to have that much reliability on any one thing. And so I do think that like, I'm hoping that for like, you know, specific parts at the end, you can like fine-tune a model and kind of have a more specific thing for a specific task. Also, to be clear, like, I think like, I also, at the same time, I think open AI is by far the easiest way to get started. And if I was building anything, I would absolutely start with open AI. So. [00:36:27]Swyx: It's something I think a lot of people are wrestling with. But like, as a person building apps, why take five vendors when I can take one vendor, right? Like, as long as I trust Azure, I'm just entrusting all my data to Azure and that's it. So I'm still trying to figure out the real case for local models in production. And I don't know, but fine-tuning, I think, is a good one. That's why I guess open AI worked on fine-tuning. [00:36:49]Harrison: I think there's also like, you know, like if there is, if there's just more options available, like prices are going to go down. So I'm happy about that. So like very selfishly, there's that aspect as well. [00:37:01]Alessio: And in the Lancsmith announcement, I saw in the product screenshot, you have like chain, tool and LLM as like the three core atoms. Is that how people should think about observability in this space? Like first you go through the chain and then you start dig down between like the model itself and like the tool it's using? [00:37:19]Harrison: We've added more. We've added like a retriever logging so that you can see like what query is going in and what are the documents you're getting out. Those are like the three that we started with. I definitely think probably the main ones, like basically the LLM. So the reason I think the debugging in Lancsmith and debugging in general is so needed for these LLM apps is that if you're building, like, again, let's think about like what we want people to build in with LangChain. These like context aware reasoning applications. Context aware. There's a lot of stuff in the prompt. There's like the instructions. There's any previous messages. There's any input this time. There's any documents you retrieve. And so there's a lot of like data engineering that goes into like putting it into that prompt. This sounds silly, but just like making sure the data shows up in the right format is like really important. And then for the reasoning part of it, like that's obviously also all in the prompt. And so being able to like, and there's like, you know, the state of the world right now, like if you have the instructions at the beginning or at the end can actually make like a big difference in terms of whether it forgets it or not. And so being able to kind of like. [00:38:17]Swyx: Yeah. And it takes on that one, by the way, this is the U curve in context, right? Yeah. [00:38:21]Harrison: I think it's real. Basically I've found long context windows really good for when I want to extract like a single piece of information about something basically. But if I want to do reasoning over perhaps multiple pieces of information that are somewhere in like the retrieved documents, I found it not to be that great. [00:38:36]Swyx: Yeah. I have said that that piece of research is the best bull case for Lang chain and all the vector companies, because it means you should do chains. It means you should do retrieval instead of long context, right? People are trying to extend long context to like 100K, 1 million tokens, 5 million tokens. It doesn't matter. You're going to forget. You can't trust it. [00:38:54]Harrison: I expect that it will probably get better over time as everything in this field. But I do also think there'll always be a need for kind of like vector stores and retrieval in some fashions. [00:39:03]Alessio: How should people get started with Langsmith Cookbooks? Wanna talk maybe a bit about that? [00:39:08]Swyx: Yeah. [00:39:08]Harrison: Again, like I think the main thing that even I find valuable about Langsmith is just like the debugging aspect of it. And so for that, it's very simple. You can kind of like turn on three environment variables and it just logs everything. And you don't look at it 95% of the time, but that 5% you do when something goes wrong, it's quite handy to have there. And so that's probably the easiest way to get started. And we're still in a closed beta, but we're letting people off the wait list every day. And if you really need access, just DM me and we're happy to give you access there. And then yeah, there's a lot that you can do with Langsmith that we've been talking about. And so Will on our team has been leading the charge on a really great like Langsmith Cookbooks repo that covers everything from collecting feedback, whether it's thumbs up, thumbs down, or like multi-scale or comments as well, to doing evaluation, doing testing. You can also use Langsmith without Langchain. And so we've got some notebooks on that in there. But we have Python and JavaScript SDKs that aren't dependent on Langchain in any way. [00:40:01]Swyx: And so you can use those. [00:40:01]Harrison: And then we'll also be publishing a notebook on how to do that just with the REST APIs themselves. So yeah, definitely check out that repo. That's a great resource that Will's put together. [00:40:10]Swyx: Yeah, awesome. So we'll zoom out a little bit from Langsmith and talk about Langchain, the company. You're also a first-time founder. Yes. And you've just hired your 10th employee, Julia, who I know from my data engineering days. You mentioned Will Nuno, I think, who maintains Langchain.js. I'm very interested in like your multi-language strategy, by the way. Ankush, your co-founder, Lance, who did AutoEval. What are you staffing up for? And maybe who are you hiring? [00:40:34]Harrison: Yeah, so 10 employees, 12 total. We've got three more joining over the next three weeks. We've got Julia, who's awesome leading a lot of the product, go-to-market, customer success stuff. And then we've got Bri, who's also awesome leading a lot of the marketing and ops aspects. And then other than that, all engineers. We've staffed up a lot on kind of like full stack infra DevOps, kind of like as we've started going into the hosted platform. So internally, we're split about 50-50 between the open source and then the platform stuff. And yeah, we're looking to hire particularly on kind of like the things, we're actually looking to hire across most fronts, to be honest. But in particular, we probably need one or two more people on like open source, both Python and JavaScript and happy to dive into the multi-language kind of like strategy there. But again, like strong focus there on engineering, actually, as opposed to maybe like, we're not a research lab, we're not a research shop. [00:41:48]Swyx: And then on the platform side, [00:41:49]Harrison: like we definitely need some more people on the infra and DevOps side. So I'm using this as an opportunity to tell people that we're hiring and that you should reach out if that sounds like you. [00:41:58]Swyx: Something like that, jobs, whatever. I don't actually know if we have an official job. [00:42:02]Harrison: RIP, what happened to your landing page? [00:42:04]Swyx: It used to be so based. The Berkshire Hathaway one? Yeah, so what was the story, the quick story behind that? Yeah, the quick story behind that is we needed a website [00:42:12]Harrison: and I'm terrible at design. [00:42:14]Swyx: And I knew that we couldn't do a good job. [00:42:15]Harrison: So if you can't do a good job, might as well do the worst job possible. Yeah, and like lean into it. And have some fun with it, yeah. [00:42:21]Swyx: Do you admire Warren Buffett? Yeah, I admire Warren Buffett and admire his website. And actually you can still find a link to it [00:42:26]Harrison: from our current website if you look hard enough. So there's a little Easter egg. Before we dive into more of the open source community things, [00:42:33]Alessio: let's dive into the language thing. How do you think about parity between the Python and JavaScript? Obviously, they're very different ecosystems. So when you're working on a LangChain, is it we need to have the same abstraction in both language or are you to the needs? The core stuff, we want to have the same abstractions [00:42:50]Harrison: because we basically want to be able to do serialize prompts, chains, agents, all the core stuff as tightly as possible and then use that between languages. Like even, yeah, like even right now when we log things to LangChain, we have a playground experience where you can run things that runs in JavaScript because it's kind of like in the browser. But a lot of what's logged is like Python. And so we need that core equivalence for a lot of the core things. Then there's like the incredibly long tail of like integrations, more researchy things. So we want to be able to do that. Python's probably ahead on a lot of like the integrations front. There's more researchy things that we're able to include quickly because a lot of people release some of their code in Python and stuff like that. And so we can use that. And there's just more of an ecosystem around the Python project. But the core stuff will have kind of like the same abstractions and be translatable. That didn't go exactly where I was thinking. So like the LangChain of Ruby, the LangChain of C-sharp, [00:43:44]Swyx: you know, there's demand for that. I mean, I think that's a big part of it. But you are giving up some real estate by not doing it. Yeah, it comes down to kind of like, you know, ROI and focus. And I think like we do think [00:43:58]Harrison: there's a strong JavaScript community and we wanted to lean into that. And I think a lot of the people that we brought on early, like Nuno and Jacob have a lot of experience building JavaScript tooling in that community. And so I think that's a big part of it. And then there's also like, you know, building JavaScript tooling in that community. Will we do another language? Never say never, but like... [00:44:21]Swyx: Python JS for now. Yeah. Awesome. [00:44:23]Alessio: You got 83 articles, which I think might be a record for such a young company. What are like the hottest hits, the most popular ones? [00:44:32]Harrison: I think the most popular ones are generally the ones where we do a deep dive on something. So we did something a few weeks ago around evaluating CSV question answering applications, which I think is a really interesting one because most question answering, like everyone does question answering, but it's generally over unstructured data over your documents and you do the whole rag thing. And that doesn't work amazing for structured data. And so this was something that we heard, the origin of this was basically we heard from the community, you guys should improve this. And so we're like, okay, let's improve it. And then we're like, okay, well, in order to see if we improve it, we need to like evaluate it and see how we're doing. And so we kind of like wrote up a lot of our thought process there. And I think, and a lot of people like reached out about that and thought that was interesting and we're going through similar challenges and had, we posted another one a few days after that someone wrote basically as a response, which is awesome because it had a completely different strategy. And it was a really, it was a really, that was a really good piece as well. So that was like a deep dive on something like evaluation bit. I think like we did one on retrieval a while back, which was basically like, hey, we, and this was around when we changed our abstractions, like, hey, we changed our abstractions to this. This is why we did it. This is what we see coming down the pipeline. These are like the different types of retrieval that we see. I think a lot of people read and liked that one. A lot of the blogs that we do are also highlighting cool partnerships or cool applications. But in terms of, if you go by like number of views, I think the ones that get the most views are the more like deep dive ones. [00:45:55]Swyx: Yeah. And I also noticed that you do guest posts as well. [00:45:58]Harrison: Actually, you know, which one, and this is a guest post that got a lot of views, the multi-on one, the multi-on agent one. When we did, we did a blog where we integrated with them and that got a ton of views. [00:46:06]Swyx: What do you think that is? [00:46:07]Harrison: I think it's, I mean, it's one of like the few agents that's actually available and like out in the world. [00:46:15]Swyx: They're still behind a wait list. Still behind a wait list, [00:46:17]Harrison: but they're very active on social media. I don't know if I'm off the wait list. [00:46:21]Swyx: I mean, you're on their blogs. They're on your blog, so I hope they give you access at some point. But that's interesting. A lot of interest in agents. I think they just opened up an API as well. Yeah, exactly. [00:46:32]Harrison: That was the blog that we did. I was, yeah, I was a bit surprised to see that as well, but I think there's generally a lot of interest in agents and it's also really hard to get them to work. And I think multi-on is one of the first that has that. [00:46:45]Swyx: Yeah. So my angle to this is a lot of people want to work with you. Yes. You're bombarded. I'm sure your email is just unmanageable. How should people be good partners with you? Like I work at a company and I'm like, hey, I'd love to do something on the LangChain blog or integrate to LangChain. I know Harrison's a busy guy. Like, what do I do? [00:47:03]Harrison: Like the stuff that gets my attention honestly is like the in-depth, really thought out stuff. Obviously I love this stuff. Like this stuff is awesome. And there's so many different, there's so much to do as well. And like the biggest thing that we have trouble with internally is like figuring out what to do. [00:47:17]Swyx: What's noise and what's signal. [00:47:19]Harrison: Not even that, but just like what to focus on. Like there's so many different directions we could do and we want to go in like so many because there's so many interesting things, but we can't do. So if anyone kind of like takes the time to like go deep in a particular area, I love talking to them and I love reading what they write. And I love sharing what they write on the blog. Like that to me is awesome. So I think like... [00:47:37]Swyx: Do good stuff. Be so good they can't ignore you. It sounds basic, right? [00:47:40]Harrison: So that's why I didn't want to say it. [00:47:42]Swyx: No, it's great. [00:47:42]Harrison: But I think like these deep dots, yeah, there's just so much to do and these don't do shallow stuff, I guess would be. [00:47:48]Swyx: I think that's a good call that people need reminding. [00:47:50]Alessio: What about the other side of open source? So on Acker News, there were a couple blog posts recently, like the problem with LangChain and LangChain is pointless, all these different things. So the TLDR of some of them were, the LangChain API is like kind of verbose and complicated versus like sometimes I can just do this in like 10 lines of code. How do you balance that in terms of allowing for the complex use cases versus making maybe the ergonomics like simpler, but then trading that off later? [00:48:21]Harrison: There's a lot to balance and there's a lot to do. And I think like posts like that are very valuable to hear basically what people are saying. And like, we have a lot of open issues. So it's not like these things hadn't been said before, but I think like that was a good emphasis on what people are saying. And I think there was a lot of things in there. I think part of it's kind of like around and we took all of it very seriously. And yeah, I think there's a lot to dive into there. There's like the documentation piece. And so I think we did a revamp of the documentation to address that. There's also like a comment in this, I think this was around, I think the top comment on the LangChain is pointless one was like basically like orchestration is like 5% of the work. And then like the other 95% is like prompt engineering and like data engineering. And those are the hard bits. I think maybe orchestration is a little bit more than 5%, but I like agree that those are like really big pain points that get exacerbated when you have these complex chains and agents where you can't really see what's going on inside of them. And I think that's partially why we built Langsmith to help out with exactly that. We also needed to do better things like make the prompts more visible and make it allow for more customizability around that. And so we've tried to add some stuff there. In terms of balancing, there's also LangChain is pointless. I don't need a wrapper. I can just call the underlying API. I think if all you're trying to do is call the underlying API, then like, yeah, that's gonna be the cleanest and simplest thing to do. And we try to get as close to that experience as possible, but we're not optimizing for calling the API. We're optimizing for helping people build context-aware reasoning applications as easily as possible. And so there's some level of abstractions that you need to add in order to assist in that. Yeah, that's definitely a balance that's tricky to strike, but I think there's also some aspect of it. Like, I do think one of the big benefits that LangChain provides is a standard interface for language models so that you can switch between them. And this kind of gets into like an ORM debate, like are ORMs generally kind of like useful or not? And so I think in this case they are. I think there's probably a larger kind of like philosophical kind of like question about that [00:50:25]Swyx: that people have strong opinions on. Just the prompts don't transfer like you also mentioned. Yeah, yeah, there's that, yeah. [00:50:32]Harrison: And then between kind of like allowing for, I think one helpful thing that we did in terms of like distinguishing between basically the base interfaces and then more complex stuff is part of the separation around the docs is there's like the components piece, which has the model IO, the retrieval, the agents, the callbacks, things like that. And then there's all the use cases. And so I think like the use cases, because they are like these assembly of all these things in a particular order, they start to get more complex. And it's, you know, we try our best to kind of like make clear how you can configure things. But yeah, there's a lot of different options that you might want to configure. And so I think that split has kind of helped us internally at least. And I think externally as well, because we've heard good comments about the improved documentation. I think that's made it a little bit more clear. And then another thing, one of the things that we also released soon after, and we'd been thinking about a little bit is basically like a LangChain expression language, which allows for actual composability of pieces. So LangChain, I think, has always been very good about interchangeability. Let's ignore the prompting issues, but like you could always plug in like one LLM for another one. You could swap in one vector for another one, but the chains themselves haven't actually been super actually composable. Like we had the sequential chain, but that was a bit like clunky to use. And then we had a router chain, but that was a bit, you know, that was also a bit clunky to use. And so one of the things, and so there's a million different things to do, and we didn't prioritize that. [00:51:53]Swyx: I think after this, [00:51:53]Harrison: we definitely bumped it up and prioritized in priority. And luckily Nuno had been doing a lot of awesome work on it already, so it wasn't too much of a lift. But yeah, now there's this way where a lot of the chains that we've been releasing are written in this LangChain expression language where they're actually truly composable, and you can see what's going on under the hood. And it's basically, it uses kind of like the pipe kind of like terminology to coordinate things and move things around. So yeah, I mean, I think there were a lot of good points in those Hacker News things, and you know, we can't respond to everything, but we try to like look at everything and take everything seriously. [00:52:25]Swyx: You're being very diplomatic. But so first of all, I like the expression language. I think that that is the path towards sort of language agnostic LangChain kind of, or whatever, DSL. But also like, what was just kind of plain wrong or plain offensive, or like, I don't know, people can get very vitriolic sometimes on Hacker News. [00:52:40]Harrison: Yeah, I mean, I think the comments that I appreciated were the ones where they gave specific things. And I think the ones where they said, you know, LangChain sucks. Like, okay. Can't do much of that. [00:52:51]Swyx: Yeah, exactly. Verifacing on my question would be like, you're not the first and you won't be the last to have that kind of very intense scrutiny. What would be your advice to other people, other maintainers of projects for going through something like this? [00:53:03]Harrison: I would probably say, try to drill into like what is actually underlying things [00:53:08]Swyx: as much as possible. [00:53:08]Harrison: And if there is actual substance that's being delivered, whether you agree with it or not, like, I think that's valuable to know. And then for the other stuff, like try to maybe follow up, but maybe try not to let it get under your skin too much. [00:53:22]Swyx: Thanks for tackling that. [00:53:24]Alessio: And I know we're getting to the time and we'll wrap up soon, but since you're going to speak at the AI Engineers Conference, what's your advice to AI engineers, especially when to start with LangChain and when they're just experimenting with a model, [00:53:38]Swyx: when are they, [00:53:38]Alessio: as you mentioned, if you just want to do an API call, don't use LangChain. Yeah. [00:53:43]Harrison: I mean, my advice would just like build as many things as possible. Like, I think it's still really early in the space. No one really knows what they're doing to some extent. Like, it's a bit weird to say, but there's so many things to like discover. So I would just say like, build as many things as possible. Cause I think like the best thing is you stumble upon a really good idea and you build something really awesome. And the worst thing that happens is you just learn a lot about a field and the technology that's going to be incredibly important and rapidly kind of like changing. [00:54:11]Alessio: What would you build if you weren't doing LangChain? [00:54:13]Harrison: I mean, the things that are most interesting to me are kind of like things around like long-term memory and like longer running agents. So I'd probably build, and these are things that we've been wanting to build [00:54:23]Swyx: internally as well. [00:54:23]Harrison: But like, I think a chatbot that like actually remembers things about you as like silly as that sounds, like people like chatbots a lot and they have their delivered limited by their context window. And so I think really diving into like a specific application of memory there. [00:54:38]Swyx: I've been trying to build a chatbot [00:54:39]Harrison: that remembers things about you. That would be one. And then like, I know a lot of people are doing this, but like a personal assistant for like managing like email calendar, basic stuff, which I think is, I think that's like a fantastic application for these like agent like things, because if you think about personal assistants today, you usually interact, I don't have one, but I'm told you interact with them over email. And the nice thing about that, as opposed to like chat, there's not as stringent an expectation on latency as there is on chat. And so you can do a lot of things like reflection and kind of like making sure that you're on the right track and really put more safeguards and thinking about these agents as opposed to relying on like chas and interface, like the bot we have that's on GitHub answering questions on the issues, I think probably gives better answers than the bots that we have that are on chat on the website. And I think that's not because, there's just different constraints that you have in different types of problems. And I think I would be like, I think the personal assistant one's really interesting because you remove the constraint of chat, which I think at this point in time is probably pretty limited in terms of functionality. [00:55:43]Swyx: Yeah. I've been calling this sort of long inference. If you didn't have to care about ANC and you could take like a day, a month, a year to work on something, what could you do? And yeah, that's super interesting. [00:55:56]Harrison: I think that's a really promising place to explore. [00:55:58]Swyx: Yeah. Have you looked at, regarding the long conversation thing, you and I have tried it about this many times. Have you looked into what character and inflection are doing? Because they're probably working on it. [00:56:08]Harrison: I've thought about memory a bunch. Like I think it comes down to like, it comes down to like state, like what's the state you're tracking? Like what's the data structure for that? And I think that could also maybe be a bit like application specific. But if we're talking about a generic chat bot, that's kind of generic. I don't know. Yeah, I don't know how they're thinking about that. My sense is that inflection like thinks about that a bit more than character. Like I think in Inception, sorry, inflection's whole thing is they like, the bot knows you. [00:56:33]Swyx: It's one chat. There's no history. You just talk to it. Yeah. [00:56:37]Harrison: So they've definitely got some state that they're tracking. I'd be really curious to know what that is. Character, I don't think has lent into it too much. I think they let you do some stuff in terms of like uploading background. And I'm not entirely sure how they use that, whether they just like put that in the prompt or do some retrieval over that. But I think they're definitely, they haven't lent into it as much as inflection, I would say. [00:56:57]Swyx: So given like, you are one of the most interested people in this space, would this be like a second product for you? If you ever want to explore that or do you want to just partner with people and you're putting out the call for people to come to you if they have solutions for that? [00:57:10]Harrison: If I wasn't working on LangChain, I would be building an application company, for sure, first of all. Like, I don't think, like I think like there's, which I know is very hypocritical to say. [00:57:20]Swyx: Like you're Mr. DevTools and Infra and Observability. [00:57:24]Harrison: Yeah, I don't know. If you're building an application company that's working on something related to long-term memory or long-term agents, I would love to chat and just geek out [00:57:31]Swyx: about a lot of this stuff. I'll show you Smalltalk at some point. Yes. Cool. Awesome. [00:57:37]Alessio: Yeah, let's do a lightning round. [00:57:38]Swyx: So the first one is on acceleration. What has happened in AI that you thought would take much longer than it actually ended up taking? [00:57:45]Harrison: The function call and ability from OpenAI, like tool usage. [00:57:48]Swyx: Yeah. [00:57:48]Harrison: They did that really fast, I thought. [00:57:50]Swyx: Yeah. But it's just a question of fine-tuning, no? Yeah. It's not even like reliable. [00:57:54]Harrison: It's not terrible. They're a pretty big organization that's serving a lot of traffic. And like, this was a, yeah, it's like, it is like just fine-tuning, but I think like you still have to like collect that data set and fine-tune it and evaluate it and then release it at scale and figure out the right API. [00:58:09]Swyx: No shade on OpenAI. Like they're moving everyone's bar as to how quickly like a 400% organization can go. Do you think it eliminates like approaches like JSONformer and all the other approaches that people, like guardrails, you know, previous guest, eliminates your output validation thing? Yeah. [00:58:26]Harrison: I think JSONformer and stuff like that are still really interesting for like local models, for sure. And there's like 90% of people use OpenAI or something and like my made up numbers. [00:58:37]Swyx: No, it's probably real. [00:58:38]Harrison: And the best way to get structured output is by using the function calling ability. So yeah, absolutely. [00:58:46]Alessio: What do you think is the most interesting unsolved question in AI? [00:58:50]Harrison: I'm really interested like how multimodal is going to work. Like with just what that looks like. [00:58:55]Swyx: Have you had a look at the GPT-4 vision? No, not really. [00:58:59]Harrison: Yeah, not beyond what they- [00:59:01]Swyx: They're doing private betas right now. So I'm very excited. [00:59:04]Harrison: I'm excited about that as well. Yeah, I mean, I think that's, you know, you talk about like, again, this whole space is just changing so fast, but you talk about something that could like really change how, because like, you know, a lot of lang chain is kind of like a data orchestration tool in some sense. And so if you had a whole new type of data in there. [00:59:20]Swyx: So maybe we do this thought exercise, right? Tomorrow, OpenAI releases the GPT-4 vision API. What does lang chain do? [00:59:25]Harrison: Immediately we add support for it in like the wrapper. So however you interact, like honestly, this is another like fun thing. Everyone's API now looks like OpenAI's. [00:59:35]Swyx: Yeah, which is great. [00:59:36]Harrison: Which you have to do, yeah. So like our wrapper looks similar to OpenAI. So I don't think it will be that difficult to include support for it at the basic model level. And so we do that. And now that we've released the expression language bit, like a lot of the core chains, we have examples of rewriting them just in this expression language. So like for retrieval, if we're now talking about like, okay, you can do like retrieval question answering over for multimodal things, we'd probably have to figure out how those are getting stored and what's being done with them. But then from there, that should be, yeah, so probably looking to like, yeah, how are people kind of like storing and consuming this type of information? But then that step should be pretty easy to plug into the kind of like chain. [01:00:17]Swyx: Multimodal stores? Yeah, I don't know. I always wonder what that would actually look like because a lot of multimodality in LLMs is really just an LLM, a text LLM calling a different model. And that's just no different than any API call, essentially unchanged. [01:00:32]Harrison: I think it's probably something that you don't know until you let like a million people play around with it. [01:00:37]Swyx: Then there'll be new LangChain for multimodal. What's one message you want everyone to remember today? [01:00:43]Harrison: I would probably say just like build. I think it's a fantastic time to be building. [01:00:47]Swyx: All right, just build. Yeah. [01:00:49]Alessio: Thank you Harrison for coming on. [01:00:51]Swyx: Thanks so much. [01:00:51]Harrison: Thank you guys for having me. [01:00:52]Swyx: It's a lot of fun. [01:00:53] Get full access to Latent Space at www.latent.space/subscribe
01:00:5006/09/2023
RWKV: Reinventing RNNs for the Transformer Era — with Eugene Cheah of UIlicious
The AI Engineer Summit Expo has been announced, presented by AutoGPT (and future guest Toran Bruce-Richards!) Stay tuned for more updates on the Summit livestream and Latent Space University.This post was on HN for 10 hours.What comes after the Transformer? This is one of the Top 10 Open Challenges in LLM Research that has been the talk of the AI community this month. Jon Frankle (friend of the show!) has an ongoing bet with Sasha Rush on whether Attention is All You Need, and the most significant challenger to emerge this year has been RWKV - Receptance Weighted Key Value models, which revive the RNN for GPT-class LLMs, inspired by a 2021 paper on Attention Free Transformers from Apple (surprise!).What this means practically is that RWKV models tend to scale in all directions (both in training and inference) much better than Transformers-based open source models:While remaining competitive on standard reasoning benchmarks:swyx was recently in Singapore for meetings with AI government and industry folks, and grabbed 2 hours with RWKV committee member Eugene Cheah for a deep dive, the full recording of which is now up on Latent Space TV:Today we release both the 2hr video and an edited 1hr audio version, to cater to the different audiences and provide “ablation opportunities” on RWKV interest level.The Eleuther Mafia?The RWKV project is notable not merely because of the credible challenge to the Transformers dominance. It is also a distributed, international, mostly uncredentialed community reminiscent of early 2020s Eleuther AI:* Primarily Discord, pseudonymous, GPU-poor volunteer community somehow coordinating enough to train >10B, OPT/BLOOM-competitive models* Being driven by the needs of its community, it is extremely polyglot (e.g. English, Chinese, Japanese, Arabic) not because it needs to beat some benchmarks, but because its users want it to be for their own needs.* “Open Source” in both the good and the bad way - properly Apache 2.0 licensed (not “open but restricted”), yet trained on data taken from commercially compromised sources like the Pile (where Shawn Presser’s Books3 dataset has been recently taken down) and Alpaca (taking from Steven Tey’s ShareGPT which is technically against OpenAI TOS)The threadboi class has loved tracking the diffusion of Transformers paper authors out into the industry:But perhaps the underdog version of this is tracking the emerging Eleuther AI mafia:It will be fascinating to see how both Eleuther and Eleuther alums fare as they build out the future of both LLMs and open source AI.Audio Version Timestampsassisted by smol-podcaster. Different timestamps vs the 2hr YouTube* [00:05:35] Eugene's path into AI at UIlicious* [00:07:33] Tokenizer penalty and data efficiency of Transformers* [00:08:02] Using Salesforce CodeGen* [00:10:17] The limitations of Transformers for handling large context sizes* [00:13:17] RWKV compute costs compared to Transformers* [00:16:06] How Eugene found RWKV early* [00:18:52] RWKV's focus on supporting many languages, not just English* [00:21:24] Using the RWKV model for fine-tuning for specific languages* [00:24:45] What is RWKV?* [00:33:46] Overview of the different RWKV models like World, Raven, Novel* [00:41:34] Background of Blink, the creator of RWKV* [00:49:55] The linear vs quadratic scaling of RWKV vs Transformers* [00:53:29] RWKV matching Transformer performance on reasoning tasks* [00:54:31] The community's lack of marketing for RWKV* [00:57:00] The English-language bias in AI models* [01:00:33] Plans to improve RWKV's memory and context handling* [01:03:10] Advice for AI engineers wanting to get more technical knowledgeShow NotesCompanies/Organizations:* RWKV - HF blog, paper, docs, GitHub, Huggingface* Raven 14B (finetuned on Alpaca+ShareGPT+...) Demo* World 7B (supports 100+ world languages) Demo* How RWKV works in 100 LOC, RWKV overview* EleutherAI - Decentralized open source AI research group* Stability AI - Creators of Stable Diffusion * Conjecture - Spun off from EleutherAIPeople:* Eugene Chia - CTO of UIlicious, member of RWKV committee (GitHub, Twitter)* Blink/Bo Peng - Creator of RWKV architecture* Quentin Anthony - our Latent Space pod on Eleuther, coauthor on RWKV * Sharif Shameem - our Latent Space pod on being early to Stable Diffusion* Tri Dao - our Latent Space pod on FlashAttention making Attention subquadratic* Linus Lee - our Latent Space pod in NYC* Jonathan Frankle - our Latent Space pod about Transformers longevity* Chris Re - Genius at Stanford working on state-space models* Andrej Karpathy - Zero to Hero series* Justine Tunney ("Justine.lol") - mmap trickModels/Papers:* Top 10 Open Challenges in LLM Research* Retentive Network: A Successor to Transformer for Large Language Models * GPT-NeoX - Open source replica of GPT-3 by EleutherAI * Salesforce CodeGen and CodeGen 2* Attention Free Transformers paper* The Pile* RedPajama dataset* Monarch Mixer - Revisiting BERT, Without Attention or MLPsMisc NotesRWKV is not without known weaknesses - Transformers do well in reasoning because they are expressive in the forward pass, yet the RWKV docs already note that it is sensitive to prompt formatting and poor at lookback tasks. We also asked pointed questions about RWKV’s challenges in the full podcast. Get full access to Latent Space at www.latent.space/subscribe
01:12:1130/08/2023
Cursor.so: The AI-first Code Editor — with Aman Sanger of Anysphere
Thanks to the almost 30k people who tuned in to the last episode!Your podcast cohosts have been busy shipping:* Alessio open sourced smol-podcaster, which makes the show notes here! * swyx launched GodMode. Maybe someday the Cursor of browsers?* We’re also helping organize a Llama Finetuning Hackameetup this Saturday in anticipation of the CodeLlama release. Lastly, more speakers were announced at AI Engineer Summit! 👀~46% of code typed through VS Code is written by Copilot. How do we get closer to 90+%? Aman Sanger says we need a brand new AI-powered IDE to get there; and we’re excited to be the first podcast ever to tell the Cursor story.If you haven’t heard of Cursor, you may have been living under a rock. Here are just some of the rave reviews going around in the past week alone:* “Cursor is the best product I've used in a while” - Alex MacCaw* “Someone finally put GPT into a code editor in a seamless way. It's so elegant and easy. No more copying and pasting.” - Andrew McCalip* “Coding with AI is getting insane.” - Mckay Wrigley* “This is mind blowing 🤯” - Linus Ekenstam* “Cursor + gpt4-32k = illegal levels of productivity” - Sully Omarr* “EL MEJOR EDITOR DE CÓDIGO con IA” - Carlos SantanaA decade ago, “platform risk” meant building apps on social media platforms was risky as you could get cut off from the social network. Today, the AI version of “platform risk” is building AI products within an existing product (like an AI extension for VS Code, or a Figma plugin). Since Copilot, a generation of VSCode plugins have launched (including Cody, Cosine, and previous guests Codeium and Codium), only to be challenged by Copilot X itself.A core AI Engineering thesis is that new capabilities in AI demands new innovation in AI UX (and that AI UX can actually be a viable moat). Take VS Code for example; when Github was first working on Copilot, there was actually no way to support the “ghost autocomplete” feature we all use today. They eventually convinced the team to build it, and Copilot’s success speaks for itself.If you’re a startup building on top of VSC today, you do not have the same access and influence on the roadmap. Your UX is limited to what they allow you to do, and often that caps your ability to successfully compete against them. Since Cursor owns the whole IDE, they can do things you can’t (yet) do in VSCode:Cursor’s GameplanCursor is competing head to head against VS Code by forking Microsoft’s IDE and building their own AI-powered version. A few of Cursor’s unique features:* Native chat: Chat is a core piece of Cursor. Users can choose between GPT-3.5 and GPT-4 to ask questions and receive answers based on their code.* “Mentioning” files: you can easily add files into your request context by using “@”; this works both for code as well as documentation. If you want to do a change that includes multiple files, you can include them in your question to make sure the change is reflected in all of them.* Custom prompting engine: Cursor built Priompt, their custom prompting engine. As your chats go over the context window size, Priompt figures out which messages to keep in the history, which files to drop from the prompt, etc. * Moving beyond typing: while IDEs are familiar to folks as today’s interfaces, in the future Cursor hopes to have agents you can delegate tasks to. Instead of a back and forth on a new feature or bug fix, you can ask it to do the whole thing for you end to end.After diving deep into Cursor we nerded out on model usage, training, quantization, and evaluation. There’s a ton of great content in this episode, we hope you’ll enjoy it!As always, feedback welcome in the comments, and tag us on socials for future guest suggestions!Show Notes* Cursor* Gary Marcus’ cubes prompt* Priompt* “Humans should focus on bigger problems.”* Codium AI on Latent Space* Rift from Morph* Sourcegraph* E2B* Repl.it* HungryHungryHippos, Hyena, etc (see our FlashAttention episode)* Aman Tweets* Why GPT-3.5 is (mostly) cheaper than Llama 2* Llama’s architectural limitations* “Training will look like researchers/practitioners offloading large-scale training jobs to specialized “training” companies: a state of the world that resembles chip design & fabrication.” - Mosaic prediction* “The size of all code/history on Github public repos is 92TB. The size of Google's monorepo in 2015 was 86TB (of much higher quality code). If Google were willing to deploy code models trained on their own data, they'd have a noticable advantage over everyone else.” - May 2023Timestamps* [00:00:00] Intros* [00:02:31] Developing CAD models vs coding models* [00:05:23] Deciding to build a new IDE optimized for large language models* [00:10:50] Getting early access to GPT-4 and realizing its potential for software development* [00:12:32] Rethinking the UI/UX for coding* [00:18:24] Cursor's features like system prompts and chat* [00:22:24] Tips for prompting GPT-3/4 for code generation and editing* [00:27:24] Cursor's documentation and context features* [00:29:30] The potential of coding agents like Code Interpreter* [00:38:23] Cursor's internal prompting tool Priompt* [00:40:47] The challenges of very long context lengths for models* [00:45:44] The compute costs for prompt tokens vs. completion tokens* [00:49:36] How quantization interacts with model utilization* [00:51:24] Issues with human eval for benchmarking code models* [00:53:12] Thoughts on training models vs. relying on foundation models from big providers* [00:55:34] The origin story of Cursor's parent company AnySphere* [00:56:00] Lightning RoundTranscriptAlessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Residence at Decibel Partners, and I'm joined by my co-host Swyx, writer and editor of Latent Space. [00:00:20]Swyx: Hey, and today we're back in the studio again after a little break and we have Aman Sanger in the house. Hey Aman. Hey, thanks for coming. Thanks for having me. So I wanted to introduce our guests and then have you fill in the blanks. So you worked at Gamelon, Bridgewater, McKinsey, Google, and You.com, all on sort of kind of AI related things and some finance related things. You also ran your own consultancy, Abelian AI, and you graduated in CS and math from MIT recently. Worked on a few projects, including Instill, which I think we'll cover a little bit later, and most recently Cursor.so, which we'll cover for the vast majority of the podcast. But just on a personal side, what's one thing that people should know about you that, you know, might not be so obvious on LinkedIn? Oh, interesting. [00:01:01]Aman: In a previous life, I played a lot of squash. [00:01:05]Swyx: You were a top seed? [00:01:06]Aman: Yeah. So in high school, I kind of competed in tournaments and most people probably don't really know what squash is. It's like tennis in many ways. It's like a racket sport, but it's indoors. You play against a wall. I guess now pickleball is all the rage with, with racket sports, but yeah, the story is I used to play tennis and then I moved to a building that had a squash court in it and then I picked it up. I loved it. And I've been playing ever since. So I competed a lot in high school, played a bunch at FIT, have not had the chance to play much here. In San Francisco, there aren't too many courts. [00:01:38]Swyx: We can organize a squash tournament and then you'll crush it, of course. Is there anything about the athlete mentality that you take with you as a founder? [00:01:47]Aman: Yeah, I think it can be at times a bit too much, but I'm very competitive. I really hate losing. Now I think I'll go on runs and if someone tries passing me, I won't let it happen. I'll just kick it into overdrive and maybe I'll turn the corner if I know they're going to beat me, but I can't let someone pass me when I'm running. And I think the same is true with starters, where the competitive nature, I think it in general helps motivate me and makes me, I guess, just work harder. [00:02:17]Swyx: Yeah. Okay. Well, we'll have a bunch of competitive questions later, but we'll go over the timeline. [00:02:22]Alessio: Let's jump into how you got to Cursor. So in August 2022, you launched something called Instill. Can you talk a little bit about that? [00:02:31]Aman: Yeah, and maybe before I go into Instill, I should talk about what I was even doing before that, because Instill was actually a very brief foray from what I was doing with my original co-founder, Michael. So we had both actually gone to the same high school together, gone to MIT together. And then after graduating, we knew we wanted to start something. And in June, what we were working on was also called Cursor, but very different. We basically were very, very fanatical users of Copilot. We loved it. And we had a little bit of experience with computer-aided design or CAD software. A lot of our friends, in fact, were mechanical engineers. And we'd heard a lot about how tedious it was to just design these parts and software like SOLIDWORKS and whatnot. It was pretty obvious to us that if you could train a transformer on the task of predicting the next token, not just for code, but for CAD, then you could get a really useful product that could speed up mechanical engineering. So that's actually what we'd worked on up until Instill, even a little bit after Instill. And yeah, I can go into more detail about that. It was pretty interesting. That's probably how, despite these days doing less stuff with model training than in the past. For that, it was all just kind of rolling our own models from scratch, a lot of training, a lot of inference. [00:03:48]Alessio: I'm always curious to hear about what made you interested in that. Obviously, you've been at the forefront of a lot of this AI work. Why was that the most interesting thing to you? Did you think there were not as many people going after that? Did you think you had a unique insight into it? Because we got a lot of people listening that want to be founders and want to figure out how to make that decision. [00:04:09]Swyx: Yeah. [00:04:10]Aman: First off, I've always been incredibly fascinated by AI. The first time I originally learned how to program, actually, because I'd seen the results from ImageNet and I'd heard deep learning, and that just sounded insanely cool to me. And so my first programming project was building and training a neural network in Java, because that was the only language I knew from my AP Computer Science class. But ever since then, everything I've done has been involving ML, AI. The reason I wanted to, I guess, found a company is, first off, I had been working with Michael on a couple of other things. We'd done an AI consultancy in the past. We worked really well together and really just enjoyed working on stuff on our own. With CAD, we were doing a little bit of ideation, and I think we were quite worried about competition in a lot of other areas. I think that worry has definitely subsided a little bit with what we're working on now, obviously. A lot of competition in the coding space. But it seemed like the kind of thing where not a lot of eyes were on this. It seemed very technically possible, at least at the time. And the market was pretty sizable, if you looked into it. So it was both a really interesting technical problem. And then if you just tried to analyze the space, it seemed like a good idea. [00:05:23]Alessio: How do you decide to move off of it? That's another important answer as a founder. [00:05:28]Aman: I think there are a few key things that we did not take into account when we were working on this. One was if you look at the original Codex paper, our assumption was this is the model that powers Copilot. It was trained on 100 billion tokens, or it was something like 50 billion tokens of Python. And one interesting insight from it was that you actually get no transfer benefits from the pre-trained model on text to code. So that means they took GBD3, and for the smaller models, for the models that weren't trained in all the Python data, there were some benefits where GBD3 transferred really well faster. But then for the final Codex model, it turns out that there were no transfer benefits, meaning you just took a model, you trained it from scratch on those 100 billion tokens of Python code, it would do just as well as the GBD3 12 billion model that was fine-tuned. The issue was that it was only true for GBD3 and 100 billion tokens of Python code. These days, I mean, the jury's still out on this, but it seems pretty clear that the benefits from learning language are quite helpful with code. I guess that kind of goes into the issues with CAD, where one, you're dealing with much less data than code. If you assume, first off, that 50 billion, 100 billion tokens is all you need, then maybe with like 10x less, you could get a pretty useful model. In reality, Copilot today is powered by probably trillions of tokens of code, as well as text. And when you're dealing with, at most, from scraping every single bit of CAD data, you can find 10 billion tokens. It's just not enough to train a useful model. We tried scaling, and no matter what kinds of regularization techniques we used, we just couldn't get it past a few billion parameters without overfitting. That was the big thing. And then the other is that there's no transfer. If you try to test these models today, and even with GBD4, there's a prompt that I like to use, which is good for testing like 3.5 versus 4 if you don't know which one's behind the scenes. But even 4 sometimes struggles with it. And the prompt is you kind of lay out, I think it's like a famous kind of Gary Marcus prompt as well, where you lay out a bunch of kind of cubes on a table, right? And you describe it, and as you increase the complexity, you know, 3.5 drops out, you increase the complexity more, 4 drops off. But it's clear that these models are not that good at spatial reasoning, and that's exactly what's needed for CAD. [00:07:52]Swyx: Oh, yeah, that's right. [00:07:54]Aman: What you want to do is if I were to design this table with CAD in front of you, I would first draw a rectangle, then I would do an extrusion operation, which would basically take the rectangle and then extend it orthogonal to the plane, such that it's like a volume, [00:08:14]Swyx: right? [00:08:15]Aman: And then the model has to realize that, okay, the shape now that exists is this structure here, this table. And then the really difficult thing is for other operations, it'll need to point to the constructed geometry that was built. And basically, the model effectively to work well, it needs to kind of, in its mind, imagine this 3D structure. And the models are not good at that. If you try fine tuning code models on this task, or language models, they're just not going to transfer well at all. [00:08:44]Alessio: Do you think in like two, three years, there will be a good AI-powered CAD software? [00:08:47]Aman: Yeah, my perspective now is that I think the best way here is probably redesigning the entire system. One other big pain point was we tried to build plugins with all the major pieces of CAD software, like SolidWorks, Onshape, and so on and so forth. And if you think it's hard to build a plugin for some of the older IDEs, you've not seen these pieces of software. And so I think even if you got a good model, it might be really hard to actually get distribution and create a good plugin that works. So it feels like with the advancements you have in kind of text to images, and there's some new companies kind of doing stuff with text to 3D, it feels like the reasonable approach is actually just scrap the way that people are doing CAD right now. And I suspect a company or some companies will come around and do this quite well. [00:09:37]Swyx: That's really good insight. And we have more sort of general LLM products thoughts to ask you at the end. We wanted to get into Cursor, since that is your primary product right now. In January 2023, you announced it to the world. Maybe take us into the, I guess, idea maze leading up to Cursor. [00:09:54]Aman: Yeah, I guess it still was one kind of brief, brief pivot period where we tried doing text to images. The reason we decided against it was that I don't think we're the founders for that kind of company. We learned this from CAD, and we strongly believe this now that it is much better to be a user of the product you're building. And we just weren't big users of any of the text to image tools. So it was around December where we managed to actually get early access to GPT-4. And before then, we had played around a little bit with using earlier versions of 3.5 on writing code. And we kind of given up. It just seemed like if you looked at Text DaVinci 2 or Code DaVinci 2, those older 3.5 variants, they just couldn't really do anything meaningful. But then we opened up the playground, started copy pasting code into there. And it was ridiculous. This is before everyone started using human. [00:10:50]Swyx: Did you use the early version? [00:10:52]Aman: Yes. So, okay. [00:10:54]Swyx: So, it was an earlier version. The unhinged raw. [00:10:56]Aman: Oh, it was still, no, it was still, it wasn't, yeah, it was still very safe. But before people started using, human eval was the thing, but before everyone started talking about it and knowing about it, we kind of pasted it in and it got 85%. And we were just like, wow, best open source model at the time got 30%. And Code DaVinci 2 got something like 47%. And yeah, GPT-4 today, it gets about the same score. And so we then started, you know, writing code in there, just copying and pasting random pieces of code from whatever kind of things we were testing and developing. We found that it was not just good at creating net new things, but refactoring code, editing code, helping you debug kind of every single aspect of software development felt so different with these models. And then we kind of, in our heads, just plotted out the future. And this is GPT-4, like what happens when you have 4.5, GPT-5? These models are just going to get better and better and better at programming. And the future is probably not going to be more and more things that you tab enter for autocomplete. I think that's a very useful tool. We use Copilot every day. We find it quite useful, but you can't have a world in which language models are able to produce 90%, 95% of the code, and it still follows that form factor. I think you have to redesign the entire way, the entire UX of writing software. And that was our take with Cursor, where you need to own the full IDE and completely redesign the flow of producing software and just doing software development in general. [00:12:32]Swyx: Those are big statements that we need to dig into a little bit more. I want to backtrace a little bit. So you got early access to GPT-4. That actually means that you were backed by OpenAI, you joined OpenAI Fund before you were Cursor. [00:12:46]Aman: Yeah, basically. Kind of. [00:12:48]Swyx: Oh, okay. Because I'm trying to get the chronology and I assumed you were, they funded you because of Cursor. [00:12:52]Aman: Yeah, so OpenAI is this program Converge. That was the program we participated in. And through that, the main thing was early access to on-release models that we got to play with. Obviously, none of this went to production. None of this could go into production. It was just kind of a sneak peek of GPT-4. And so, yeah, before we actually built out Cursor, we didn't take money from OpenAI, but we were a part of this program. [00:13:14]Swyx: Got it. Yeah. And then you also mentioned one more thing, which was interesting. You still use Copilot, but you also use Cursor. Yes. You also mentioned that Copilot is probably trained on trillions of tokens, which means that's extensive training since the original Codex. That's my guess. [00:13:30]Aman: I mean, if you look at the stack, for example, right? It's what? One to two trillion tokens? Something around that. I'm very skeptical that Copilot is training on less, especially with all the lawsuits you see with whether or not it's quote-unquote fair use. [00:13:43]Swyx: So yeah, my guess is trillions of tokens. [00:13:44]Aman: I don't really know. But yeah, I'm sure if you did the math on how much public code there is in GitHub, it's almost certainly the trillions. [00:13:52]Swyx: One of the reasons I harp on this is one of our pet themes is tracking the dataset to parameter ratio. And Copilot cannot be that big because it returns relatively quickly. So it's going to be in the low billions, right? So how do you do trillions of tokens to the low billions? That's interesting. Yeah. [00:14:12]Aman: I think I have some thoughts on this because there's the whole thing with chinchilla scaling and then people are now saying, oh, chinchilla scaling doesn't matter because of inference. But Copilot could be a mixture of experts. That's one other speculation. I don't know if that's true. I mean, it probably wasn't the case at least a year or two ago. My guess is it's probably a small model that's very over-trained. From what I've heard, there are also lots of tricks you can do with caching where even if the model is quite big, it doesn't take, it effectively takes no time to ingest the entire prompt. Yeah. [00:14:46]Swyx: Semantic caching is what they are calling it, right? I guess if it roughly embeds to the same thing, just return the same thing. [00:14:50]Aman: I think it's partially that, right? Where let's say the suffix or the code before where your cursor is has changed slightly. They might not actually go ahead and use a different... [00:15:02]Swyx: That seems dangerous for code. [00:15:03]Aman: It does seem a little dangerous, but it gives you this like incredibly snappy response. And the other thing is the KV cache, right? Where you can just, I don't think there's any open source framework that does this right now, but what you can do is if you've already computed something over the KV cache, then you can just... [00:15:19]Swyx: This is the attention KV cache for people following. [00:15:21]Aman: So this is the attention KV cache, right? And if you've already computed all the keys and values, you can just store that in memory and then load that back up in the GPU and you don't need to process the prompt again. And I speculate they're doing something like that behind the scenes. [00:15:35]Swyx: That's a lot of memory. That's a lot of story. It is. [00:15:38]Aman: Well, unless they're using something like multi-query attention or... [00:15:41]Swyx: Yeah. We'll talk about that in your Llama 2 piece. And then the final big opinion that you drop in there was you must write your own IDE as opposed to write a VS Code extension, which there's plenty of them out there. SourceGraph is doing one and I've been working closely with Morph, which just put out Rift. So this is obviously a big undertaking. Maybe explain more a little bit about why build your own IDE. Yeah. [00:16:05]Aman: The reason we decided to do this is I think in the future, today, what Cursor can provide and what any of these tools can provide isn't that much different than, I guess, what you get in VS Code. But it was more of a long-term decision where in the long term, you're going to need to design just a very different UX that the extensions don't give you. One story we'd heard is that with Copilot, in order to actually get the multi-line ghost text implemented, it wasn't actually a part of the extension. I think the team at GitHub had to call up VS Code and have them make a change to the source in order for that extension API to be enabled. That's what allows for multi-line ghost text completion. And this is scary. If you look today, there are other things that VS Code in their source code has enabled as APIs that are just closed off to everyone but Copilot. So I think there's this fundamental platform risk where you're competing with the incumbent that owns the platform you're building on. And we thought it would just not really be tenable in that sense. And then the other thing is if you want to do other kind of fancy things. So one example of a feature we're kind of building right now is instead of just... So Copilot is great for completing the next line, completing the next few lines, but what if you wanted to do a kind of sort of edit, where instead of just completing this line, it changes the line above or delete something. There's no way that you can do something like that in VS Code, but we have the UI for this that we've kind of built out in Cursor. We're currently training models in order to get it to work well. But again, this is a feature that we think once we get it to work, will be quite useful. Could be on par with Copilot level in terms of usefulness. And it's just fundamentally impossible unless you own the IDE. There are a lot of other ones like those that we're kind of cooking up. And then there are small things. I do think like in terms of inline edits, which means inside the editor, you can press command K in Cursor and then ask for some kind of modification of the code or ask for a generation of the code. And I do think we have probably the best UX for that because if you look at what someone like Sourcegraph does, I mean, Sourcegraph code is a great product, but they basically have to use the GitHub pull request comment feature in order to do it. And I think like these paper cuts kind of add up over time. [00:18:24]Swyx: They do. And it's very impressive how quickly you can try it out, you know, obviously encourage everyone listening to try out Cursor. And the download is really quick. The binary is super small. And then when you spin it up, it boots up really fast. And it's just a text file that guides you through the tutorial. It's really, it's really great. [00:18:39]Alessio: I was using it today. Actually I will open right now. The first thing I like, you guys have like bring your own keys. So that's like one of the things that I don't see in enough products, like bring your own API key instead of like sign up for an account and do all of that. [00:18:54]Swyx: Well, so like you have to trust them that they won't. [00:18:56]Alessio: Look at this guy. [00:18:59]Swyx: I just wonder if like OpenAI could do one more thing, which is just, you know, do a limit, the spend limit per key. So like that leaves space for like other companies to come in and do that. But I mean, OpenAI could just build it tomorrow. [00:19:10]Alessio: I saw Logan tweeted about whether or not it would be interesting to have per key billing. [00:19:15]Swyx: I mean, I think that would be. They're clearly thinking about it. [00:19:17]Aman: Yeah. They have more important things like GPD 4.5. We can talk about that one. [00:19:20]Swyx: Yes. [00:19:21]Alessio: Let's talk a bit about what you do. So first of all, unlike some of the other tools, you guys have like a system prompt kind of thing, which are like rules for AI. What was like the decision behind that? Did you see people being frustrated with always having to repeat the same thing in the prompt? [00:19:38]Aman: Yeah. The problem was for encoding some small rules that the model will tend to get wrong. So for example, we use Solid instead of react. [00:19:46]Aman: Solid is just another reactive UI framework. It's a decent bit faster. And my co-founders know a lot more about the details in this than I do. But like the other really nice benefit is that with the VS code fork that we're using, you can kind of inject solid into multiple routes. While react is meant to be kind of like it takes over the very root of the entire DOM. Solid instead, you can inject it inside of like multiple HTML components. It's much more performant that way. And so yeah, we use solid because of that. And then the issue is every time you create a TSX file, write a component, GPD 4 by default will assume it's react, right? And so it'll get the code wrong. And so just encoding rules like that are pretty helpful on the side of that problem. For some of our users who are less familiar with English too, it's helpful to kind of add a prompt to say describe this in whatever language they're most comfortable with. [00:20:45]Swyx: And so for those who don't know, you primarily, the main model for most people is GPT 3.5 and pro users can use GPT 4. You're prompting GPT 3.5 with these system prompts first. Any other tips apart from your company specific ones, apart from the English as a second language ones, how do you prompt GPT 3 or 4 for code? [00:21:05]Aman: So this is interesting because I think in general, these models are good at just producing net new code or rewriting code from scratch. The thing that they're not great at is producing edits or modifications. So producing a diff is incredibly painful. And I'm sure you guys may have encountered this if doing stuff with agents, but they just get line numbers wrong pretty often. And when you're producing a diff, you know, it's fewer tokens of compute. And there's, there's some theories that like, you know, the more tokens of compute you kind of use up, the more the model is kind of expending on thinking, thinking, yeah, chain of thought. That's one thing we've kind of struggled with. And so that takes probably chaining to get it to work well, where one kind of technique we do is we have GPT 4 kind of propose a draft PR and then we have 3.5 go and kind of heal a draft diff and then we have 3.5 go and go and heal those changes. So you'll have to do things like this in order to get it work around those limitations with edits. In terms of general code writing, I think with 4, it's just, it's been super, super straightforward. 4 is fantastic. 3.5 would strongly recommend using the Azure model because there you get access to completions, meaning you can put kind of words in GPT 3.5's mouth and let it finish it. Kind of like what you can do with Claude. And that's really helpful. [00:22:24]Swyx: I always assumed that was going away as a API because OpenAI is like clearly not interested in maintaining that. I mean, they're straight up deprecating it now. Yeah. [00:22:33]Aman: It's a little frustrating because I think it's really useful for code, right? Because when you can do stuff in the middle of the line, it's impossible to do that with the chat format. But with the completion format, it becomes trivial. [00:22:46]Swyx: So one thing I learned from working with Jesse on GPT 4 OpenAI, he always asked GPT to comment your code before writing the code. And that's the chain of thought for code, right? So when I ask you for code, give me a fully commented code with only a brief explanation on how it works, bias towards the most efficient solution and offer an alternative implementation if it fits. If it's unclear what environment or library versions I'm working with that might significantly change your answer, please ask me to clarify. That's my custom instructions right now for code. And I'm just like, hey, we should come together as a community and just share these custom instructions or system prompts. Yeah. [00:23:18]Aman: When you get it to be more verbose, I do worry a bit in terms of UX because more tokens means it takes a lot longer to get to the answer. And then it's also just, I don't want to read a massive answer. I just often want the answer immediately, or I just want kind of a short block of code to answer that. That is a trade off you kind of will have to deal with. And the same thing with diffs, right? Where the diffs are going to be so much faster if you get them to work, but it's just going to result in lower quality edits. [00:23:47]Alessio: One nice thing you do in the chat is actually remove some of the code you don't touch. So I was using it to make some changes to the code base and in each function, it would say like add a comment with existing code and then tell you just the stuff to change and the stuff to add to it, which kind of frustrates me with gbt4 sometimes. It just re-gives you the whole function definition instead of just that. I noticed that in the chat, you can now apply change and put it into the code if you don't start the conversation from the file itself. Why is that so hard? Like so many products have it. Is it actually hard or is it just like a UX decision to have you? [00:24:24]Aman: So there are two ways of doing that, right? So when you say apply change, do you mean you select a region in the code, press the button and then it makes it just like, makes it in? [00:24:33]Alessio: Right here, it told me to like add these three lines of Python and I'm like, I don't want to copy paste them. You know, I did it, but it would be good to just do. [00:24:40]Aman: So if it just makes the change for you. Yeah, this is something we're going to be adding this week. So yeah, this is definitely like something a lot of users have asked for and it should be reasonably straightforward to do. I think the issues we want to use for sparingly because of how expensive it is and 3.5 actually kind of struggles with this. [00:24:59]Swyx: Interesting. [00:25:00]Alessio: And then I noticed, so you can chat either with or without context. So with context, you pass it parts of your code base without, you don't. Every time it loads the license file. So is there anything that you're working on to make sure that like you don't have like license infringement and stuff like that, or is it just like the model for some reason thinks the license file is really important? [00:25:22]Aman: Yeah, right now it probably uses vanilla embeddings. We're working on a couple of interesting techniques for much better retrieval. One of them is basically fine tuning a model to kind of memorize a code base. So there was a paper that came out a little while ago from Google, which is called documents or it's called transformers as a differentiable search index. The idea here is you train a transformer on a code base or you train it on a corpus of documents in order to basically directly answer questions about which document is relevant given the question. So the mapping would be some query, some question, and in this case, a question about a piece of code. And then the model would directly output the not just file, but let's say the actual function or the class that solves it. It wouldn't output all the code for it, but it only just output like the symbol that corresponds to it. And we've seen some initially promising results with this direction. If you look at the original paper and then there are some follow on work, it actually does a lot better than very old school retrieval techniques like BM25 and even embedding based techniques. And so this is an approach we're experimenting with and we think it could prove quite helpful. The other direction is just improving embeddings. If you looked at the recent paper by, I think it was Alibaba. So there was a recent model. If you do the math, it costs them $1,000, less than $1,000 to train this thing. And it beats OpenAI on non-code related tasks, sadly non-code related. OpenAI still kind of holds the crown for code related embeddings. But we think there's some promise in potentially training our own embeddings and then fine tuning it on particular code bases so it performs better there. So these are both directions. We're kind of independently exploring to improve the performance of retrieval. But in the short term, we do have the ability to use kind of re-rankers and more kind of advanced tuning. So if you look in the chat, I think there may be a button you can click which lets you enable re-rankers, which should improve the performance a decent bit. [00:27:24]Swyx: Awesome. [00:27:25]Alessio: Anything else in the product that we're missing? We have inline generation and inline question asking to the model. You have the chat interface on the right. Yeah. [00:27:36]Aman: So one thing that our users have found quite helpful is being able to add files or add documentation. So if you want to add Next.js docs, the most recent docs, you just do add Next.js in the chat or in command K and you'll be able to then basically get that information in your context. We have a lot of features that will be coming up quite soon. One that we're quite excited about is basically code interpreter style mode of using the chat. And so I don't mean that, I guess, in the traditional sense of code interpreter. But code interpreter is probably the one example of, as far as I know, the one example of an agent that works really well, that has some sort of kind of product market fit. And I think the reason it works super well is because when you try to get agents to do some massive task, I don't know, many people who like reviewing PRs or reviewing large diffs, it's much more fun to kind of be in flow. And I think the way that the code interpreter is able to deal with this is it breaks it down to these kind of small units that are very auditable and understandable. When you ask the model to produce a graph, you just see the graph and then you can kind of tell more or less it's wrong. And then you can go and see the code and the code's very understandable. So I think it's pretty important to kind of have the agent do these very small, discrete units and then show the output in a way that's very easy for the user to understand and then go in and fix. And so we're building a kind of flow like that in the chat that should be coming out in the next two weeks, which we're very excited by, because we've done a bunch of experimental stuff with agents. And the big thing has always been this problem where it just produces a bunch of code and it's just so hard to tell whether or not it's correct or not. It's less efficient because it'll end up having some bugs. And then it would have been better if the user just went and wrote all of it themselves. [00:29:30]Swyx: There's one approach with a former guest of ours, Itamar, on Codium, whose essentially approach is you need to develop the spec, the tests, and the source code in harmony kind of together. Well, the spec is the prompt, and then the spec could generate a test or a spec could generate code. And the only way to validate the code is to run it with tests, is kind of his analysis of what the agent space, what the code agent space may look like. [00:29:54]Aman: I think tests are pretty promising a direction. If you have a really, really rigorous set of tests where you can completely confirm whether or not the agent has done the right thing, I think that solves it. But I think it's only one part of the overall puzzle here. I do think you're going to want the model to... Like the issue is, it's really kind of painful to go and write this massive, massive prompt describing everything. I want to be able to kind of do it in flow and just see a change, then go step by step from there. I think that's just a more fun way of doing it. I think the more fun and more easy to use kind of product will win, assuming the capabilities are about equal. So that's kind of our bet here. Yeah, that's great. [00:30:34]Swyx: Have you thought about like, so you said you can add docs, which is really cool. And I've thought about this before, but I always get hung up on versioning. You just choose to not care about it and just embed the most current docs? Yes, we embed the most current docs. [00:30:46]Aman: You can add whatever docs you want, if you just have a URL. You can paste the URL for the docs in. [00:30:53]Swyx: You give a crawler, yeah. [00:30:54]Aman: We crawl it in the background and embed it. And so you can have a custom, basically a custom version or whatever version you use. It's stored locally for you. What kind of crawl diff? [00:31:05]Swyx: Like if you've just written... Yeah, that means you've written a search engine, kind of. [00:31:10]Aman: It's very, very basic. Docs are very, very easy to crawl relative to other things because it's like, they're all like this kind of sort of markdown-like format. [00:31:19]Swyx: Yeah. [00:31:20]Aman: Definitely have not written a crawler for the entire net. [00:31:23]Swyx: The other thing on Code Interpreter, we've also done an episode on that. I'm very excited about it. I think it's GPT 4.5, you know, because it's GPT 4 that has been fine-tuned on more code. Yeah. Plus it has inference time capabilities that you cannot do in the traditional LLM setting. Anyway, the most important thing about GPT 4 is that it has the sandbox. So the main question for you is, are you going to run the sandbox in your environment or do you want to run it on our local machine since you have access to that too? Yeah, I think we want to be very careful with this. [00:31:52]Aman: You don't want to do sudo rm-rm star or something. Our plan is to run it on the local machine, but always kind of prompting the user whether or not they want it. I think if we want to do things where the agent takes many... So for the Code Interpreter style thing, the great thing is because you're breaking it down to these units, you can kind of batch together a bunch of commands at each step, just kind of ask the user because they're always kind of watching. For agents that are running completely in the background, I think there you probably will need to have some kind of contained environment where it's safe for agents to execute arbitrary code. One pretty bad attack is if one team wanted to, let's say, prompt inject the model, they could just kind of in a piece of code, just like have a comment that said something like, When you're doing this kind of edit, you should do rm-rf or do something really, really dangerous. And then the issue is if an agent is kind of running in the background, and then it does that, and it grabs that piece of information, and then it gets actually successfully prompt injected, it'll just execute that thing. The same actually may be true with documentation, where someone malicious, if they had access to some piece of documentation that other people use, could try to prompt inject agents that are then going and running code and running terminal commands. [00:33:06]Alessio: Today, people just hijack npm packages. [00:33:09]Swyx: Yeah, there'll be more of that, I'm sure, shenanigans, as they call it. But yeah, I think probably the safest way is to have sandboxes in the cloud. And yeah, I've been calling this the sort of the agent cloud phenomenon. I think Fly, IO, Modal, and E2B are in that space already. And then I think Repl.it is exploring it. It'd be interesting for you guys to get in that game. I have trouble articulating what's different about an agent cloud versus a typical serverless sandbox thing that you can spin up. Basically, I think for people to, if agent cloud is a real category, we have to identify what kinds of feedback do we want to give the AI that's different to a human? That's the extent of my thoughts on what this would take. [00:33:52]Aman: I think the key thing that not enough people are probably doing is giving the AI access to a lot more tools. So the classic example I like to bring up is, if you look at the old kind of alpha code model, which went and got 50% on some programming contest, a competition, 50th percentile of pretty good programmers, right? This was a base model that basically got, I think, something around 28% on human eval. And they use this interesting inference strategy of having the model generate a bunch of test cases, and then running the test cases, seeing which one passed. They use some other, there's some other details there where they do clustering and whatever. But the key thing is kind of letting the model generate tests, run the tests on all the outputs that it's generated. And that brings a 28% code forces model to 50th percentile. Gbd4, you just add a very basic prompt, please complete this Python function, and it gets 85%, 87% on human eval. Now who knows how tainted that benchmark is? But assuming it's reasonable, like what score do you think gbd4, the same kind of inference strategy as alpha code would get on that benchmark? It would do really well. And then gbd4 is at this level where they can actually not just run the test and like binary yes or no, use that answer, but it would see the results of the test and be able to modify the code or the test base in those. And so I think just like, that's just one tool. The other tools you could have access to would be language servers. So this is a great thing with VS Code, where VS Code kind of invented the language server or the language server protocol. And so as a result, when working with a VS Code fork, we kind of have access to every single part of the language server protocol, which means we can go to definition, get all the symbols in your entire workspace, kind of everything you do in a modern IDE. And what we've been working on is kind of giving these models access to those tools. And that like dramatically improves performance, right? Because the way that humans usually will search for something is they'll kind of click around, go to definition, read some code, do all that. But you use the tools in the IDE to search for things more efficiently. And if you're just trying to have a model, just do a brute force kind of semantic search and get the answer from that. I think it's not going to work nearly as well as kind of an agent that's able to use those tools. [00:36:10]Swyx: Awesome. [00:36:11]Alessio: And you guys are growing the team right now? [00:36:14]Aman: Yes, we are. So we are currently five people based in SF and we're looking to hire engineers and designers. We think there's a lot of interesting work that we're doing that's left to be done. So some of it involves model training, kind of training some open source models for things like embeddings or areas where it perhaps is too expensive or not or too slow to use open AI. And then lots of interesting things with pushing these models to kind of the boundaries. So getting GPT-4 to work really well in this kind of agent loop in a way that's really in flow and intuitive for users to use. So yeah, I think lots of exciting work. [00:36:53]Swyx: Cool. And then maybe to sketch out a little bit more about the company and then we'll zoom out to just general LLM observations. You're also working on a prompt tooling thing called Priompt? Yeah. [00:37:04]Aman: So this is just an internal tool that we use. It's called Priompt. And we built this because we didn't really find a good way of solving for the problem of when you have a variable number of kind of inputs that you want to stuff into the prompt and you have like a fixed length prompt, right? You can only use 4096 tokens. How do you encode for rules and how to properly kind of order the inputs that go into it? And Priompt or priority prompting are intermediary solution for this, where you can kind of encode very custom rules into how you build up the prompt based on how, I guess, overflowing it is, right? So let's say I have a bunch of previous chat messages and then I also have the code from the current file. So maybe what you want to do is you want some rules where if everything can fit in, you put it all in. But then you start by like first removing like all the old chat messages. Then once it gets to a certain length, you don't want to remove any more chat messages and you want to start removing parts of the file. And then you want to remove parts of the file in this particular truncation strategy, which tends to work quite well. So this kind of thing where like as you kind of slide the window and how many tokens you're allotted, you can see like the prompt is like very, very differently constructed. So it's like optimal kind of at all sizings. And we found that quite helpful internally. [00:38:23]Swyx: And you chose the JSX approach. I'm not going to ask you too much about like design choice. I mean, it's popular with React. Fixies also put out AI JSX. Do you find that like helpful? Do you think that like some kind of DSL might emerge for prompting? [00:38:36]Aman: Yeah, I think it's still pretty early. And it's not clear what the best way to do. I think for very lightweight, easy prompting, like you should just use strings. When you're doing kind of prompt engineering, and really like rigorous prompt construction where you can have a bunch of different possible inputs in the prompt. We think JSX makes a lot of sense. It's because it's kind of like website development where you'll have different kind of screen sizes, different kinds of devices that can look at it. And in a similar way, you've different kinds of prompts, right? You've different prompt context lengths. And you basically want across all different context lengths to get a very, very good prompt for the model. And so yeah, that's why I think like JSX kind of makes sense. It's not clear if it is like the best way of doing it. I think the jury's still out on that. [00:39:27]Swyx: One way to deal with the context length model issue is to train your own model that has a very long context length, like magic.dev, which announced a 5 million context length window. I don't know how credible that is. I haven't tried it. But your thoughts? [00:39:42]Aman: Yeah, I think the issue with context length, long context length right now is that costs scale linearly, right? Costs technically scale quadratically in terms of attention. But the interesting thing is that for really, really large models in terms of flops or actual floating point operations that the models are doing, attention tends to be a pretty negligible part compared to the actual, I guess, feed forward part of the neural network. And so up to like 8K, it tends to look pretty linear. I guess when you're going to like higher and higher context lengths, it starts to get more and more tricky. And then there's some other optimizations or some other difficulties with memory bandwidth that we can get into. It just feels like the key issue is even if it is linear, it's still so expensive, right? Paying for 32,000 tokens at whatever the pricing is right now feels like exorbitantly high. My perspective is that there probably will be at some point in the future, or there might be at some point in the future, like a better approach for really, really long context. Something that looks more kind of recurrent. [00:40:47]Swyx: It feels more elegant. [00:40:48]Aman: I don't know if it'll happen because I think there are like interesting ways of hacking together or chaining together these language models, even with short prompts. But I'm not super bullish on kind of scaling up attention the way that we're doing right now in like 100, 200K context windows. [00:41:02]Swyx: Like Cloud is doing. Yeah. Are you monitoring like RWBKV, which is one of the recurrent approaches? [00:41:07]Aman: I've been meaning to read that paper. I have not been monitoring that. I looked into a few of the papers from state space, like the state space models. Those are pretty interesting. Can you give an intuition? [00:41:19]Swyx: Because you seem to be explaining it really well. Why are they different? Why is that interesting? Yeah. [00:41:25]Aman: I think the interesting thing with, at least with the original state space model, is that you get kind of two benefits. One for training, you get the paralyzability of a transformer and you can kind of run it, I believe in about N log N for some N length sequence. And then for inference, it's also like the way that it's formulated is also somewhat recurrent. So you can kind of store everything in this fixed state. And then because of that, you get, I believe, an O of one kind of cost towards inference. It could be slightly higher, but yeah, it's much less than the O of N cost per token for the transformer. And so that makes it really tractable to then do for very, very long sequences. There's some follow on work with Hungry Hungry Hippos and Hyena. And again, I think the key piece is that for like very, very long sequence lengths, it ends up being N log N rather than N squared. I did say that the cost of like, I guess even linear attention is pretty high, but that's because the 32K model is priced a decent bit higher than the original. It is surprising that Claude, or I'm actually not familiar with Claude's pricing. Is it higher for the 100K one than for the normal? [00:42:31]Swyx: No, I believe it's the same. [00:42:33]Aman: That is actually quite surprising. I'm not sure if they're doing attention under the hood because even with like a lot of tricks with 100K or even 200K, I would assume that cost will eventually start to build up. So they might be doing something fancy there. [00:42:46]Swyx: Well, my guess was alibi, which is a trick, which is replacing proper attention with kind of like a exponentially declining forgetting curve is what I'm thinking. Someone has to put 100K to the test. I haven't done it. [00:43:00]Aman: Yeah, they have this graph that looks promising for 200K, but I feel like anecdotally from everything that I've heard, it just seems like it forgets things like they don't actually pay attention to things. [00:43:11]Alessio: I just open-sourced a small podcast there yesterday, which is what we use. [00:43:14]Swyx: We use it to summarize this podcast. [00:43:16]Alessio: Yeah, I'm looking at my logs, prompt length of all my recent ones is 55,400 tokens, and it works. [00:43:25]Swyx: How much per call? [00:43:26]Alessio: Free, because it's not commercial. I'm like, hopefully nobody from Entropic is listening. But yeah, it works. But I think that's kind of like the sweet spot. And then the completion length is like 1,800, you know, so it's not like it stays within the 60K band. But anyway, yeah, curious to see. And I think another thing from your Twitter model parades that I really like is actually differentiating between the type of workload. I feel like people talk about these models as like anything you do is like the same thing, but you posted about GPT 3.5 being cheaper than LLAMA 2 for completion-heavy workloads. What does that mean? [00:44:07]Aman: Yeah, so there are different terms, I guess, based on like whatever community you're in. So I think in the research community, they probably call it pre-filling is handling prompt tokens. And then I believe maybe decoding is what they call generating completion tokens. We'll just use prompt tokens and completion tokens. But for prompt tokens, the work, it's entirely compute bound. And the reason why is the same reason why transformers are so good at being kind of being trained in parallel. And it's that you can parallelize the entire sequence or you can parallelize an input, not just along the batch dimension, but the sequence dimension. So that means let's look at the first layer of the transformer. Imagine like that entire layer could fit in memory. I just read that to memory. And then I basically apply the matrix multiplication of the entire sequence on this layer. If you're doing token generation, instead, you have to read the layer, then taking the first input, and then you have to read the next layer and then do that same input. You have to do it all the way to the end of the model. And then you generate the next token and that next token passes through all the layers again. So before what you were doing is you have all your input tokens in parallel, they're going through the first layer. So you read the first layer, then in parallel, they're going through the second layer. You read the second layer, so on and so forth to the end. But when you're doing it for one token at a time, you read the first layer, second layer, third layer, fourth, blah, blah, blah. Then you do it all over again for the next token. And so as a result, for your sequence length N, you end up using N times more memory bandwidth than compute. [00:45:44]Swyx: And time as well, like wall clock time. Yeah. [00:45:47]Aman: I mean, so with wall clock time, it's weird because transformers are far more efficient than... [00:45:53]Swyx: I comment on that because in the RWKV interview that I did, same thing. They have a visual actually of this. So the thing you were trying to describe with words, they actually have a visual and animation But it's helpful because once you see it, you're like, oh, okay, that's why it's like a different graph. Yeah, exactly. Yeah. [00:46:10]Aman: So when you're dealing with the prompt, it's completely compute bound. And because GPUs can handle some crazy number of floating point operations per second, it's like almost instant. That's why time to first token feels super instant. And then when you're generating one token at a time, it now becomes completely memory bound where for each token, you're bound by how fast you can read all the weights into memory. So that's like around like 200x slower in general. [00:46:34]Swyx: Yeah. So your specific recommendations, which I pulled out from the post, people should read it. It's really good. I feel like the title undersells it a little bit. Yeah. You should not serve Llama 2 for completion heavy workloads. Llama is best for prompt dominated tasks like classification. And I feel like I can run with that. That makes a lot of sense. [00:46:51]Aman: And re-ranking is one thing we find useful for it internally. [00:46:54]Swyx: Do you use Llama 2 right now? [00:46:55]Aman: We don't have it in production, but we've experimented with it for a few things. [00:46:59]Swyx: You also had an interesting observation because I think we had talked a lot about quantization in the podcast just for running locally or more efficient running. You said quantization and imperfect utilization cancel each other out. Yes. That's a cool observation. Yeah. [00:47:12]Aman: So this is like a little bit hand wavy, but the core thing is, yeah, we expect that when you don't have like complete utilization, right, you're never going to like saturate all your GPUs. There's going to be some idle time. Like from things that we've experimented with in the past, it ends up being, you know, 50% is a reasonable amount as a more liberal estimate of how much you can get. So the interesting thing about quantization is that there's a bunch of these kind of new quantization libraries that have cropped up and they're all very good at reducing costs for low batch inference when you're memory bound. But the key thing is when you increase the batch size, they actually end up resulting in no real speed ups over FP16. The reason why is because they only quantize the model weights, right? So that operation of kind of reading the model weights when they're now, you know, 4x smaller instead of FP16, they're, you know, 4 bits or something. [00:48:04]Swyx: It's still the same number of weights. [00:48:06]Aman: The operation of reading weights is like it ends up being 3, 4x faster. But the issue is when you increase your batch size enough and for large batch inference, the key thing is it now moves back from being memory to being compute bound again. And when you're compute bound, quantization of model weights basically does nothing. And so it ends up being effectively the same cost. And then the other interesting thing is it's even worse for small models because for, or at least the small LLAMA models, because I believe the smaller ones relative to model size have a much bigger KV cache. I'm not sure if the smaller ones use multi or group query attention. They might not. [00:48:42]Swyx: They do not. Only the large ones use. Okay, exactly. [00:48:45]Aman: Yeah. So then because they use normal multi-head attention, the thing is when your batch size increases enough, then the memory bottleneck is not your small quantized model weights. No, it's actually the KV cache. And so quantizing the model weights effectively will do nothing then. So the key insight there is like all these new techniques are fantastic when you're just kind of playing with these models, running them low batch sizes. But when you really try to increase the batch size and serve it in production, they're probably going to be lower or more expensive than FP16 because there are these optimizations with things like text generation inference, which uses VLM or like page attention, which are much, much faster. And so the best that I think you could probably do right now with open source is like full 8-bit quantization, which means not just quantizing the weights, but also like the actual activations and the KV cache so that none of those things end up being bottlenecks. [00:49:36]Swyx: That's a great breakdown. The post goes into much more detail with a lot of math, actually, which I love. And you also spec out some rules of thumb, which I think people can use to figure out their limitations and pricing and all that good stuff. Yeah. [00:49:49]Aman: One big caveat I'd say is that the other massive benefit of LLAMA too is that you can fine tune it. [00:49:54]Swyx: Yeah. Right. Well, you'll be able to fine tune OpenAI soon enough. We'll see. So we'll just get your general takes on LLM topics, just kind of quick fire, and then we'll go to lightning round. So human eval, that is the predominant way to benchmark code models because OpenAI benchmarks code models that way. There's some issues with it. [00:50:13]Aman: Yeah. With open source models and even probably with some closed source models, it's unclear how much of it has actually leaked into the train set, right? So there's a recent model, New Hope, which it looked like they had some leakage, which is why it had really, really good performance. But I think there was an interesting approach taken by Palm too, where I think this is actually possible for someone to do right now. I've been meaning to do it at some point, but there's this paper called Babel code and they have a library which I think literally translates human eval into all other languages. And I think that would be a really good test because the other issues, a lot of the models that perform really well on human eval are pure Python, right? And that doesn't really give you a sense of if it's a good coding model overall. So yeah, I think at some point it would be really helpful if just someone did the work and ran the Babel code engine and translated human eval into all these other languages and then was able to run it. I think that would probably be a better benchmark, but still. I think if the original human eval problems leaked, I suspect it would also be helpful for solving the problems translated into other languages. But the issue is it's just so easy to run and anything else is probably going to be quite painful. [00:51:24]Swyx: Right? Well, it'd be better if there was a sandbox to run it. So agent cloud, hashtag. Hot take on training. Yeah. [00:51:32]Alessio: Another one from your endless Twitter quality. Training will look like researchers offloading large scale training jobs to specialized training companies. A state of the word that resembles chip design and fabrication. Yeah. [00:51:44]Swyx: How do you think about that? [00:51:45]Alessio: And obviously Mosaic was on the podcast just got acquired. [00:51:47]Swyx: So you tweeted that in May in 2022 and then one year later Mosaic gets acquired. Like I think that's a pretty fresh hint. Yeah. [00:51:54]Aman: I was probably wrong about it in a lot of ways too, because I assumed the future would kind of look like a lot of startups would have their own models. And this is me kind of in the CAD frame of mind where I thought, okay, if you look at GPT-3 at that point, it was just like GPT-3, maybe a little bit 3.5. It wasn't like that good a generalist model. And I thought prompting is not the way to do things. It's just completely fine tuning or training your own models. And it was also a similar time that we kind of saw a lot of the open source earlier efforts in training models, which proved like not that great. I think Bloom and OPT were two models that came around about that time. And if you looked at the OPT logs, they manually tuned their learning rates several times. I think they switched the optimizer from Adam to something really weird where they switched the optimizer in the middle. And don't quote me on this because I could be wrong, but I remember it was like some really, really sketchy stuff down in the middle. And I just thought, wow, if it's this hard, it seems like there's a company to be built around it. The key difference is that there are just massive foundation model companies. And I think most AI product companies are not going to be mostly training their models or mostly using like custom models. It's more so going to look like them kind of using these APIs out of the box. And then maybe using, you know, the fine tuning endpoints there. [00:53:12]Alessio: Oh, I mean, it's the same. [00:53:14]Swyx: So you changed your mind a little bit. [00:53:15]Aman: I did change my mind a little bit. I assumed like with the CAD thing, I thought, okay, you're gonna need a foundation model for CAD. You're going to need a foundation model. [00:53:22]Swyx: No, that's old school thinking. [00:53:23]Aman: Yeah. And now it's just like you have the one generalist model. The one God model. And the one God model transfers fantastically well with everything. Okay, quickly move along. [00:53:31]Swyx: You had another one, which I loved. The size of all code history on GitHub public repos is 92 terabytes. The size of Google's monorepo is 86 terabytes of much higher quality code. If Google were willing to deploy code models trained on your own data, they would have a noticeable advantage over everyone else. Yeah. [00:53:46]Aman: Again, this is one thing that I think is probably a little wrong. Because this is based on the big science paper. And the big science paper, like basically said they scraped all of GitHub and they got 92 terabytes. And I think if you look closely, which I did kind of after some people kind of pointed out some mistakes, I think GitHub is like a lot, a lot bigger than that. The big science paper said they get cloned. And so I was assuming, okay, get clone means you get the full working tree, right? But if you look a little deeper, I think GitHub is like a lot bigger than people think. My expectation is that GitHub probably has something like five to 10 trillion tokens of code, usable code. And so that's a lot more than what they ended up getting. But yeah, Google still has like a pretty meaningful fraction. [00:54:33]Swyx: And they just put out IDX, which is somewhat of a competitor. Yeah, yeah. [00:54:37]Aman: I think it's more like, it looks more like a replit kind of competitor where it's like an in-browser thing. But yeah, I think a lot of people can be viewed as competitors. [00:54:46]Swyx: But you're very competitive as we established, you know. And then final question, why is the company called AnySphere? And you have this whole manifesto on your landing page on why humans should focus on bigger problems. [00:54:55]Aman: It's an interesting story where Michael and I were in this program Converge, and two of our friends, Arvid and Swale, who we knew like reasonably well at MIT. And we knew them because they're like some of the best engineers at MIT. And so they were independently kind of working on their own company. It was called AnySphere. And we both independently from after playing with GPT-4 realized, oh, wow, like the IDE is the thing to build. After a few months of independently working on it, we realized, okay, like, why are we doing this separately? We should just kind of join forces. And that's kind of what we did. And so right now, the overall company is called AnySphere. But yeah, the product and the core thing is Cursor. It's lovely. [00:55:34]Swyx: I recommend people actually check out AnySphere.co and read the manifesto because I think it's a broader message to builders out there. Yeah. [00:55:42]Alessio: Yeah. Let's jump into lightning round. Okay. We got three questions for you. The first one is, what is something that already happened in AI that you thought would take much longer? [00:55:52]Aman: I think code. Specifically, I think just being generalist at code, where before you had these specialized models, right, where codex was supposed to be kind of specialized for code. And then there's a general language model, but it's kind of unification of capabilities towards like this one model that's not just really good at text, but it's also fantastic at code. I was not expecting like the generalist model, I guess, to come super, super soon and be this good at code. [00:56:19]Swyx: That's why you pivoted or you started your whole company. What do you think is the most interesting unsolved question in AI? [00:56:26]Aman: I really think it's this kind of long-term memory piece where I think it's possible to get to maybe AGI superhuman level systems that still kind of hack around memory using like something that kind of resembles transformers. But it feels like the more elegant thing is how do you get models that really like continuously learn? Some kind of recurrent based system would be able to do this where there's like a state. But right now, like models can only really learn in context super efficiently. Fine-tuning is incredibly inefficient. It requires tons of data points to actually learn new things. So yeah, I'm really interested to see how we solve this lifelong learning efficiency problem. [00:57:06]Swyx: Yeah. I'm interested in using knowledge graphs to do that because I think that's kind of like a forgotten piece of the puzzle. And if you could have models update their own knowledge graphs and query their own knowledge graphs, that might be it. I think Llama Index is basically working itself into what that is. Oh, interesting. [00:57:22]Aman: Yeah. And then there's the techniques where the models directly kind of learn to like inside the weights or inside the architecture, you learn how to be able to read from databases and retrieval based like the retro based techniques. Like those seemed interesting, but it's surprising like you haven't really seen anything from that in a while after that initial paper. [00:57:42]Alessio: And just to wrap the episode up, what's one message you want everyone to remember and think about as they keep building and exploring in AI? [00:57:49]Swyx: Yeah. [00:57:50]Aman: I mean, GPT-4 is now a few months old. At some point we're going to get much, much better models and I think it'll be pretty soon. And so what does the world look like then? And specifically for coding, like what does the world look like when you have another step that's just as large as it was from GPT-3 to GPT-4? I think it's just so incredibly different. I think it just completely changes how people write software. [00:58:14]Swyx: In what direction though? So I've said my piece on like 4.5 being more inference time. I don't actually know if that's true. That's just my theory. [00:58:22]Aman: I think the direction that we'll probably see is, I mean, the language models will just get better at doing intense reasoning, right? So they'll be able to tackle harder problems. They'll probably pick up in more nuances and how like software engineering is done. They'll probably have longer context windows. And so I expect, yeah, more agentic type things will end up being more prominent in the future. I don't know how far you can take the agent stuff with a four level model, but I think with like a 4.5 or a 5, I think agent models will work for almost any kind of coding task. At least almost any kind of reasonably well-scoped coding tasks. [00:59:00]Swyx: Agents are the future. Well, thanks so much for coming in. Thanks Aman. Of course. [00:59:04]Alessio: Thanks for having me. [00:59:04] Get full access to Latent Space at www.latent.space/subscribe
59:2522/08/2023
The Mathematics of Training LLMs — with Quentin Anthony of Eleuther AI
Invites are going out for AI Engineer Summit! In the meantime, we have just announced our first Actually Open AI event with Brev.dev and Langchain, Aug 26 in our SF HQ (we’ll record talks for those remote). See you soon (and join the Discord)!Special thanks to @nearcyan for helping us arrange this with the Eleuther team.This post was on the HN frontpage for 15 hours.As startups and even VCs hoard GPUs to attract talent, the one thing more valuable than GPUs is knowing how to use them (aka, make GPUs go brrrr).There is an incredible amount of tacit knowledge in the NLP community around training, and until Eleuther.ai came along you pretty much had to work at Google or Meta to gain that knowledge. This makes it hard for non-insiders to even do simple estimations around costing out projects - it is well known how to trade $ for GPU hours, but trading “$ for size of model” or “$ for quality of model” is less known and more valuable and full of opaque “it depends”. This is why rules of thumb for training are incredibly useful, because they cut through the noise and give you the simple 20% of knowledge that determines 80% of the outcome derived from hard earned experience.Today’s guest, Quentin Anthony from EleutherAI, is one of the top researchers in high-performance deep learning. He’s one of the co-authors of Transformers Math 101, which was one of the clearest articulations of training rules of thumb. We can think of no better way to dive into training math than to have Quentin run us through a masterclass on model weights, optimizer states, gradients, activations, and how they all impact memory requirements.The core equation you will need to know is the following:Where C is the compute requirements to train a model, P is the number of parameters, and D is the size of the training dataset in tokens. This is also equal to τ, the throughput of your machine measured in FLOPs (Actual FLOPs/GPU * # of GPUs), multiplied by T, the amount of time spent training the model.Taking Chinchilla scaling at face value, you can simplify this equation to be `C = 120(P^2)`.These laws are only true when 1000 GPUs for 1 hour costs the same as 1 GPU for 1000 hours, so it’s not always that easy to make these assumptions especially when it comes to communication overhead. There’s a lot more math to dive into here between training and inference, which you can listen to in the episode or read in the articles. The other interesting concept we covered is distributed training and strategies such as ZeRO and 3D parallelism. As these models have scaled, it’s become impossible to fit everything in a single GPU for training and inference. We leave these advanced concepts to the end, but there’s a lot of innovation happening around sharding of params, gradients, and optimizer states that you must know is happening in modern LLM training. If you have questions, you can join the Eleuther AI Discord or follow Quentin on Twitter. Show Notes* Transformers Math 101 Article* Eleuther.ai* GPT-NeoX 20B* BLOOM* Turing NLG* Mosaic* Oak Ridge & Frontier Supercomputer* Summit Supercomputer * Lawrence Livermore Lab* RWKV* Flash Attention * Stas BekmanTimestamps* [00:00:00] Quentin's background and work at Eleuther.ai* [00:03:14] Motivation behind writing the Transformers Math 101 article* [00:05:58] Key equation for calculating compute requirements (tau x T = 6 x P x D)* [00:10:00] Difference between theoretical and actual FLOPs* [00:12:42] Applying the equation to estimate compute for GPT-3 training* [00:14:08] Expecting 115+ teraflops/sec per A100 GPU as a baseline* [00:15:10] Tradeoffs between Nvidia and AMD GPUs for training* [00:18:50] Model precision (FP32, FP16, BF16 etc.) and impact on memory* [00:22:00] Benefits of model quantization even with unlimited memory* [00:23:44] KV cache memory overhead during inference* [00:26:08] How optimizer memory usage is calculated* [00:32:03] Components of total training memory (model, optimizer, gradients, activations)* [00:33:47] Activation recomputation to reduce memory overhead* [00:38:25] Sharded optimizers like ZeRO to distribute across GPUs* [00:40:23] Communication operations like scatter and gather in ZeRO* [00:41:33] Advanced 3D parallelism techniques (data, tensor, pipeline)* [00:43:55] Combining 3D parallelism and sharded optimizers* [00:45:43] Challenges with heterogeneous clusters for distribution* [00:47:58] Lightning RoundTranscriptionAlessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, writer and editor of Latent Space. [00:00:20]Swyx: Hey, today we have a very special guest, Quentin Anthony from Eleuther.ai. The context for this episode is that we've been looking to cover Transformers math for a long time. And then one day in April, there's this blog post that comes out that literally is called Transformers Math 101 from Eleuther. And this is one of the most authoritative posts that I've ever seen. And I think basically on this podcast, we're trying to give people an intuition around what are the rules of thumb that are important in thinking about AI and reasoning by AI. And I don't think there's anyone more credible than the people at Eleuther or the people training actual large language models, especially on limited resources. So welcome, Quentin. [00:00:59]Quentin: Thank you. A little bit about myself is that I'm a PhD student at Ohio State University, starting my fifth year now, almost done. I started with Eleuther during the GPT-NeoX20B model. So they were getting started training that, they were having some problems scaling it. As we'll talk about, I'm sure today a lot, is that communication costs and synchronization and how do you scale up a model to hundreds of GPUs and make sure that things progress quickly is really difficult. That was really similar to my PhD work. So I jumped in and helped them on the 20B, getting that running smoothly. And then ever since then, just as new systems challenges arise, and as they move to high performance computing systems and distributed systems, I just sort of kept finding myself falling into projects and helping out there. So I've been at Eleuther for a little bit now, head engineer there now, and then finishing up my PhD and then, well, who knows where I'll go next. [00:01:48]Alessio: Awesome. What was the inspiration behind writing the article? Was it taking some of those learnings? Obviously Eleuther is one of the most open research places out there. Is it just part of the DNA there or any fun stories there? [00:02:00]Quentin: For the motivation for writing, you very frequently see in like the DL training space, like these Twitter posts by like, for example, like Stas Bekman at Hugging Face, you'll see like a Twitter post that's like, oh, we just found this magic number and everything is like 20% faster. He’s super excited, but doesn't really understand what's going on. And the same thing for us, we very frequently find that a lot of people understand the theory or maybe the fundamentals of why like AI training or inference works, but no one knows like the nitty gritty details of like, how do you get inference to actually run correctly on your machine split across two GPUs or something like that. So we sort of had all of these notes that we had accumulated and we're sort of sharing among engineers within Eleuther and we thought, well, this would really help a lot of other people. It's not really maybe appropriate for like a paper, but for something like a blog post or technical report, this would actually maybe squeeze a lot of performance out of people's hardware they're already running on. So I guess there are a lot of projects in Eleuther that we're sort of trying to share notes with people in a way that typical institutions don't. They sort of live within that institution and then you go to a different institution and they do something very similar, but without the lessons of the previous. And it's because everyone's trying to do their own special sauce with their own stack. Whereas Eleuther, we don't really have that constraint and we can just share everything to everybody. [00:03:14]Swyx: Yeah, this is a level of openness that basically very few people actually embrace. One, it's an extra effort to write things down, of course, but two, it is secret sauce and so that not many people do it. And therefore, oftentimes the only way to learn this stuff is to actually work in one of the large model labs. And so you guys are doing a lot. The only other instance where I can think of where people actually open sourced their process was Facebook's OPT. What else is similar, like sort of trade knowledge, but not formal research knowledge? [00:03:45]Quentin: I would say Bloom. So the Hugging Face Bloom project in big science and all of that, that was very open. I'd say it's the same caliber, if not more detailed than OPT. Other than that, I think there was like a doc from Microsoft on like their Turing NLG. Their paper is pretty relaxed in that it did talk about some of those challenges. Other than like OPT and Bloom and us, I can't think of any. It's a new thing. [00:04:10]Swyx: It matters that you are going for the sort of good enough rules of thumb, because I think a lot of people try to go for precision and being overly precise actually is not helpful. Right. Yes. [00:04:20]Quentin: You'll see some like statements in the blog posts that are just like, we think this is about 1.2 in our experience. And, you know, we don't go any further into detail and it would take maybe an extra month for us to chase down every single little piece of memory. But instead, like getting good enough is still helpful to people. [00:04:36]Alessio: Let's jump into it. The first part of the article, and we'll put this in the show notes so people will be following along with the post. So we don't need to read every single equation and every footnote for it. [00:04:46]Swyx: Okay. [00:04:46]Alessio: But the core equation here is that not the cost of compute, but the compute required to turn a transformer model is roughly equal to tau times T, where like T is the, where tau is the hardware setup throughput that you have. So number of GPUs times the actual flops per GPU. And then T is the time spent. I think people can visualize that pretty easily. It's basically like how many GPUs do you have and how much do you let them run for? And the things that come to it that people have read before in the Chinchilla paper in a way, and the OpenAI scaling law is that you can then equal this to 6PD, where P is the number of parameters in the model and D is the size of the, of the dataset in tokens. So talk a little bit about how people should think about the two. I think a lot of times the focus is on tokens parameter ratio in the training dataset and people don't think as much about the actual flops per GPU, which you're going to mention later in the blog post too, in terms of how much you can get out. So how should people think about this when they're building a model and where should they go to this equation as they're starting to think about training their own transformer-based [00:05:58]Swyx: model? [00:05:58]Quentin: You touched a little bit on the fact that people usually start with the dataset. So you have some dataset that you want to train a model on. And then from there, from the 6PD, you should see, okay, I should have about six tokens per parameter. So that determines my model size thereabouts for Chinchilla Optimal. So since then we've seen that need more something like 20 or more than that to get a good quality model. But the next question that should be on your mind in terms of a systems perspective is how long is it going to take for this model to train and what kind of budget should I expect? So let's say I want some cloud instance for some amount of time and each of them will have some price attached to it. So that's where the throughput comes in. So now that you have this model, this number of parameters, you should map that to a transformer architecture and you should benchmark what throughput you get on your software stack for that type of model. So now you have your flops per second on a single GPU. And then given whatever parallelism scheme, which I'm sure we'll get into, like data parallelism or tensor parallelism or whatever else, how is that flops number going to scale to whatever number of GPUs? And then from there, you're going to get a time. And if you have a time, you have a cost. Those are like the business answers that you'll be able to get using this formula. That's why we sort of split it into the T and the throughput terms so that you can solve for one of them, which is usually get throughput, need time, and from time you get cost. In a nutshell, that's the answer. [00:07:19]Alessio: One thing that I noticed, you mentioned some of these laws are only true when a thousand GPUs for one hour cost the same as one GPU for a thousand hours, given that we have a shortage of the biggest GPUs out there. Any thoughts there on how people should prioritize this? [00:07:36]Quentin: Yeah, so I would say you should find what the minimum number of GPUs is to just fit your model first. The memory bottleneck is your biggest problem if you have a sizable model. If it's a small model, nobody cares. But most models that people care about will need to be split across multiple GPUs. So find the minimum number of GPUs to just fit your one instance of your model and then calculate how long that's going to take. If it's a reasonable amount of time, then you're done. If it takes too long, then you need to start worrying about having multiple instances of that model. I always feel like you should go with the minimum number of GPUs because the more number of GPUs that you have, the more likely it is for things to break. So I would say just find out what time is reasonable for you and then fit the number of GPUs to that and no more. Because people get greedy and they say, if I have twice the GPUs, I can get this done in half the time. And then you end up taking three times the time because everything is breaking every day. And that's when I am up at midnight trying to fix your model that's broken. [00:08:34]Swyx: We had a previous guest which has invested a lot in their framework for training these things. Would there not be an equivalent open source framework you guys would have made that would help with scaling up GPUs linearly like that? Or is this an oversimplification? [00:08:50]Quentin: Okay, yeah. So maybe I should step back. Both Mosaic and us have our own sort of software stack recipe that scales well, theoretically. But I'll get to that in a minute. Mosaic is all based off optimizer sharding. So it's based off ZeRO. So you basically perfectly split your model optimizer and your parameters and your gradients across all of the different GPUs. So your aggregate memory is number of parameters divided by number of GPUs. Same thing for optimizer and so on. Whereas we at Eleuther use a Megatron deep speed based library. And for that, it's a bit more complex. So the efficiency can be a little higher, but it's more prone to failure at the same [00:09:30]Swyx: time. [00:09:30]Quentin: So you kind of have to tune it. In both cases, getting back to like the practical case, you should be able to get linear speed up by adding more GPUs. The problem is that there are hardware failures. You tend to have problems with like maybe loss will overflow if you have too many GPUs or maybe one GPU will hang. You might have software issues. You might have synchronization issues. And that's why I'm saying practically that you should take the minimum number of GPUs that you have because those are the easier cases to debug. That make sense? [00:10:00]Swyx: Yeah. [00:10:00]Quentin: Any more detail on any specific point? [00:10:02]Swyx: Not particularly, just because we haven't actually had to debug those things. But I imagine basically there's a lot of return towards encoding these knowledge into software and not repeating it again. So it makes a ton of sense. I think Alessio had more questions before we move too far into high level, more questions on just the equation itself. I think we want to spend time on essentially, this is the central equation of figuring out compute requirements. Yeah. [00:10:25]Alessio: Another thing in it is that the computer is like the forward pass and like the backwards pass and forward is 2PD, backward is 4PD. Why it's to the ratio between the two? Can you explain that? Why is it two and four? [00:10:39]Quentin: Yeah. [00:10:40]Alessio: Why is it twice the amount? [00:10:42]Quentin: Oh, okay. Intuitively for forward pass, you're just moving, you're propagating forward the inputs through the layer. And then in the backward pass, you're doing something a little more complex than that. You're doing back propagation. And I don't think I can explain it intuitively enough to go into more detail on the exact [00:10:58]Swyx: numbers. Yeah. [00:10:58]Quentin: That's okay. [00:10:59]Swyx: I feel like you want to get out a whiteboard and start drawing like, you know. [00:11:02]Quentin: That's what I would normally do. [00:11:03]Swyx: Tangents and gradients. It's actually surprisingly low to do the back propagation. Honestly, that's one of the fundamental things I love about the math of deep learning so far that as I've explored it, which is, it's surprisingly efficient as compared to other, I guess, numerical methods you might be exposed to and, you know, college calculus. Yeah. [00:11:22]Alessio: And I think the other thing is that things sound simple, you know, when people go on Twitter and say, Oh, 20 is like the optimal ratio. And it's like, then it's like, well, why is that the number? And the answer is usually much, much harder, like what we're seeing right now. So I think it's a, it's a good reminder that the numbers are simple, like all the best and most popular, like math equations are like, so elegant. Obviously the proof behind that is, it's not that easy. That's always a good reminder. [00:11:52]Swyx: I want to put this equation to the test a little bit. We can do this from either GPT-3's perspective or GPT-NeoX, whatever you're more comfortable with. You have this distinction of actual flops versus theoretical flops. And a lot of times when people report the flops it took to train a model, like we just saw one in Lama 2 where the estimate is something that the amount of flops and that's, that's what we go with. So GPT-3 took a 3.14 times 10 to the power 23 flops. That is the theoretical flops. I want to get to a point where I can sort of work out if a number passes the smell test. And I wonder how to do that because I should be able to plug in this equation, right? I know that GPT-3 was trained on 300 billion tokens. I know the parameter size of 175. Is it, is it just like a 6 times 175 times 300? Like I haven't done the math, but what are the nuances here that you might want to call out? [00:12:42]Quentin: Theoretical flops is usually given from, you have a given set of hardware and this is what you expect your hardware to get. The problem is that in practice, full utilization, that's the key word, right? Because in practice, there are a lot of cases where like you're spending time waiting on data movement from like the GPU to CPU. Or for example, you might be waiting to synchronize across the different GPUs. So there's a lot of idle time basically that you're going to be spending during training. [00:13:05]Swyx: Smell tests. [00:13:06]Quentin: I don't know if I have a smell test myself, to be honest, like maybe I'll look at like what sort of flops, what you would expect on like an A100. There's sort of just an expected flops for a given GPU that everyone sort of knows what you should expect. So like for an A100, that number is somewhere between 100 and 180. T flops is what you would expect to see on an A100. For a V100, like an older GPU, it's something more like 40 to 30. So people sort of know, given the kernels that we're running for a deep learning, what sort of flops you expect. And then you sort of compare that to the theory, to the theoretical flops that people are reporting and see if that matches your expectations. [00:13:47]Swyx: Yeah. [00:13:47]Alessio: And in the article you mentioned for the A100, like if you're seeing below 115 teraflops a second, there's something wrong with your model or hardware. How did you get to 115? Is it just, you know, production observability and like you've seen over months and months and months that like that's the baseline or how do you come up with the numbers like that? Yeah. [00:14:08]Quentin: For a number like that, we basically, we compared a lot of different frameworks. So like I mentioned before, Mosaic has their own framework and we have our own framework. They all have their own flop counters too, right? And we saw across a bunch of different hardware configurations that if you tune things correctly, you should be getting above 115 in pretty much all cases. So like there are some cases where things are tuned poorly or your system is a little weird, but we've never been able to get a new system and not been able to get above [00:14:35]Swyx: 115. [00:14:35]Quentin: If something is below 115, you have something really wrong in your software. But that's really all it is, is just comparing across software stacks and hardware systems. [00:14:44]Alessio: What about different GPUs? We had George Hotz on the podcast and he talked about AMD cards and how in theory their flops should be much better than some Nvidia cards, but the reality is like the CUDA runtime makes up for it. How should people think about improving that? You know, like do you see, okay, the A100 is like 115 teraflops. I'd rather just stick with this than try and figure out all the kinks of like a better AMD card or any thoughts there? [00:15:10]Swyx: Right. [00:15:10]Quentin: Well, that's sort of touching on developer time, right? And which ends up being more expensive because at the end of the day, the AMD and Rockham software stack has a long way to go. I would say most things run there, not particularly efficiently, but you're going to have weird bugs that no one has encountered before. One of the big pluses of going with the Nvidia and PyTorch stack is that there are thousands of GitHub issues with everyone facing the same problem as you and resolving them quickly and in an open source way is probably the biggest benefit of going with the Nvidia software stack right now. AMD has about the same hardware, software, not so much. And they haven't quite got the momentum in the open source realm, for example, to get close. Like something, for example, like Flash Attention, it's spread to more Nvidia GPU types than it has like to AMD at all. And waiting on those latest and greatest features to reach AMD is something that's prohibitive to a lot of people, but it's getting there. I'm running a lot of experiments on AMD right now because it's sort of reached the government lab supercomputers now. And so a lot of experiments are going there and it will catch up, I'd say within a few [00:16:14]Swyx: years. [00:16:14]Quentin: Awesome. [00:16:15]Swyx: Maybe just talk about what's available from the government labs and I heard the original, the origin of Eluther started with a grant for TPUs. Is that right? [00:16:24]Quentin: Yes, that was a little before me, but there was a lot of just like getting a grabbing a Google Cloud or TPU pod or something like that is a lot of the original TPU work on Mesh TensorFlow, which is like now like an ancient distributed deep learning library. [00:16:36]Quentin: Eluther got a grant, an insight grant with Oak Ridge last year, and we got quite a bit of Summit Compute. So Summit is a V100 based supercomputer. It's got some weirdness to it. So there's six V100 GPUs per node. And we did a lot of experiments there. It's a challenging system to scale to because your interconnect across nodes is kind of slow in comparison to within a node, which I think we'll get to later. But now Oak Ridge has moved to AMD. So the next grant that we're trying to work towards is on Frontier, which has four AMD GPUs per node and again has a slower interconnect across nodes. So we get all of those new challenges again to try and overlap things. But that's just like you have Oak Ridge, you have Lawrence Livermore. There's a lot of government supercomputers that you can apply for compute towards like open researchers too. It's sort of a new thing. I think we're one of the first like us and like Lion, for example, is another organization that's getting compute from government providers and such. They're all moving to AMD as well. And we look forward to exploring that with them. [00:17:42]Swyx: Yeah. [00:17:43]Alessio: The computing is definitely, it used to be easy to find the GPU. Now, not as much. So you got to find them anywhere. [00:17:49]Swyx: Yes. [00:17:49]Alessio: Let's talk about memory requirements a little bit. So you touched on this a little bit before and just before this, we had a trade out on the pockets from FlashAttention and memory speed was one of our main focuses, but this time we're being bound by actually memory size, like the VRAM itself, when it comes to model weights and parameters and optimizer states and all that fun stuff. Let's go through this and Sean, we can, we can take turns. There's a lot to cover here, but maybe we can start from model weights. So one topic we covered a lot in the past is precision and quantization. That's one of the obviously main driver of memory. You mentioned most of, in the article, most transformers are mixed precision, like FP16 plus FP32 or BF16 FP32, and they can be cast down. And you mentioned up to like INT8 without a lot of performance hit. So let's start there and maybe run people through some of the maths and like the byte per parameter ratio and different precision. [00:18:50]Swyx: Sure. [00:18:51]Quentin: So when I started deep learning, it was all FP32. You have 32 bits, four bytes per parameter. Things were pretty simple. You didn't have to do any loss scaling at all. But the problem was that you didn't get a whole lot of flops once NVIDIA moved to V100s and introduced Tensor cores. So Tensor cores do all of their computation at FP16 precision. So you're kind of throwing all of those away if you're doing things in FP32. So once the hardware moved to V100, the software moved to like mixed precision and APEX and AMP and such. And one counterintuitive part of mixed precision is that you actually require more memory when you're trained because you need an FP16 copy of the weights and an FP32 copy of the weights. The FP16 copy is where you're doing like your actual computation on the Tensor cores. So you get maybe it's not uncommon to get double the throughput that you would see before in FP32. And then you at each step update that FP32 copy with the FP16 update. So both need to be stored in memory. The problem with that is that FP16 is very precise but doesn't have a whole lot of range, [00:19:55]Swyx: dynamic range. [00:19:55]Quentin: So you have a really big mantissa if you're thinking in terms of like floating point representations, not a whole lot of exponent. So BF16 puts more of the bits from the mantissa back to the exponent. So you have a much higher range and a lower precision. And that gets rid of all of this instability problem and loss scaling and such that anyone familiar with debugging knows how unstable it can be, especially for large scale training. And BF16 does away with a lot of that, but it's only supported on A100s. So you see the back and forth between hardware and software. So every time NVIDIA introduces some new Tensor cores or BF16 support or something like that, the software adapts to support it and then training adapts. And then now you mentioned like Ind8 and such. Now we're seeing that you have some model that's been trained in FP16, FP32, whatever else. And then now you want to, with minimal loss and accuracy, quantize that model into a smaller representation like Ind8 and now like Ind4 and things like that and see what you can get away with. And then since deep learning is such like a stochastic problem that a lot of those last bits of precision don't really matter is what we're finding. And I expect that to continue. [00:21:06]Alessio: And so just to put some numbers to it, when you have a FP32, you need four bytes per parameter at inference time to load it in memory. If you have a eight bits model quantized down, you need one byte per parameter. So for example, in an H100, which is 80 gigabyte of memory, you could fit a 70 billion parameters in eight, you cannot fit a FP32 because you will need like 280 gigabytes of memory. So how much does that play into it? Like you mentioned it was all FP32 when you first started. Is it just like a development complexity thing, like going down to FP16 and then Ind8? Or if they could get a GPU with like a terabyte of VRAM, will people just load this memory as like FP32 weights or would they still want to quantize them to make them more efficient? Right. [00:22:00]Quentin: I would say even if you had infinite VRAM, you would still want a quantized model, just a bigger model that's quantized is what I would say. And that's because like I was mentioning there at the end, how like deep learning is very stochastic and a lot, you could have all the precision in the world, but ultimately it's meaningless when you still depend so much like on what the input is. And you depend so much on little variations and maybe a few more samples of training data would matter more. A lot of that precision in a nutshell doesn't really matter in deep learning. All that matters is the big picture. What is that neuron actually saying? And not the tiny details of what it might be thinking. Oh, I also wanted to mention that even if you have an A100, the actual model size is quite a bit smaller that you could load than what you mentioned. That's because of the KV cache. So the KV cache intuitively during inference, it only matters during inference and think intuitively if you're writing a paragraph, you want to remember every single previous word that you've written before you write the next word. So like what is autoregressive language modeling? It's filling in the next word, the next token. So if I say like the dog went to the, and I need to write the next word, I would say park or something. Before I write the next word, my memory is wiped and I have to read the whole thing again. That is life without a KV cache. And a KV cache says, remember everything that I've generated before, as well as all the context before what I've generated. But the memory overhead for a KV cache commonly is either comparable or larger than the model in some cases, if you have a really long context. And I think the exact equation is something like, oh, it's like two times the number of layers, times the number of heads, times the dimension of each head. And then there's two of those. You have one for K, one for V. But that was just a quick aside. Yeah. [00:23:44]Alessio: I know this is Transformers math, but do you think one of the interesting things about RNNs too, it's like moving away from this, like KV cache, the scales with the sequence length and having like a fixed sequence pass. I know those are some of the things that people are working on. [00:24:00]Swyx: Yeah. [00:24:00]Quentin: So there's a paper that I was involved with called RWKV that I would recommend people read. It is answering this exact question. So how do you get Transformers quality without this quadratic attention overhead that Transformers requires? So it is interesting. I don't know if I can really dive too deep into the technical details there. I'd recommend people read the paper. But yeah. [00:24:23]Swyx: Yeah. [00:24:23]Alessio: It's interesting to see if attention is all you need, or maybe attention is all we need, but we need better ways to make it infer in a good way. [00:24:33]Swyx: We've actually done an unreleased episode with one of the RWKV core members and they call it soft attention or light attention. I forget what they call it, but yeah, just ways to approximate it such that it's linear and not quadratic. That's great. Yeah. [00:24:47]Quentin: I didn't know that you were involved. [00:24:48]Swyx: That's great. How did you get involved? Is it just because like everyone just hangs out in Discord and talks about the future of Transformers? Oh yeah. [00:24:55]Quentin: I mean, the RWKV people specifically are in Eleuther all the time. Like they're very close collaboration with us. And my contribution was we have all of these experiments done by all of these people on RNNs and how they relate to Transformers and how do we turn that into a paper and disseminate that digestibly so that people don't have to read through like a Discord log from a year ago to understand what's going on. [00:25:16]Swyx: Oh my God. [00:25:16]Quentin: Just read this paper. So that took some work, but I wasn't a core contributor. So that's why I don't want to go into like the technical details. But yeah, that's how I did. [00:25:24]Swyx: We'll try to get that RWKV episode out. It seems like there's increasing mentions of it and they are doing pretty important work as far as scaling these models are concerned. Okay. So we discussed inference type quantization and memory requirements. And then you also had a section on training with a lot of stuff I think mentioned. I think we probably want to spend the most of our time on optimizer states and the Atom optimizer. Yeah. What are your takes on it and what should people keep in mind when they deal with these optimizers? Okay. [00:25:57]Quentin: I would say the Atom optimizer is good at what it does. It's sort of a broad question. So let me think. You have the copy of the weights and then you have your momentum and your variance that [00:26:08]Swyx: you store. [00:26:08]Quentin: And like, okay, maybe an intuitive explanation for momentum is that like, let's say you have a canyon and you're trying to get to the bottom. And if you're just doing basic SGD, then every step is going to be an equal size. Whereas if you're using something like Atom with the momentum term, then your steps should be progressively larger because you can see, oh, the general trend is we're heading downwards very quickly. But stepping back from that, since you have all of these extra terms in Atom, you require a lot more memory to store it. Like three times as much memory as SGD. And if you have all of this memory being spent on your optimizer states, then how do you distribute it across GPUs? Because you'll find that what ends up being your bottleneck more than just raw compute, raw flops on a given GPU is your parallelism. And that falls back onto how much model you can fit on a single GPU before you need to split it up across a bunch of GPUs. And then you end up spending time, more time with them talking to each other than actually making progress. So that's why all of this time in the blog post is spent on how do you distribute your model? What are all those different distributed strategies look like? Which ones are more efficient? And given that a lot of your memory is being spent optimizers, how do you distribute that optimizer specifically? Because a lot of people, when they talk about parallelism, they talk about model parallelism, the parameters themselves. In actuality, when you're training, a good portion of your memory is actually spent on optimizer states. So what specific part of that would you like to go into? Would you like to go into like zero or sharded optimizers? [00:27:36]Swyx: I think the sharded optimizer stuff is really interesting, but I think we're kind of leaving that towards the end, right? Because that's the maybe more advanced distributed sections. Here, I think we're just going for rough intuition for people who've maybe are familiar with the ideas of these optimizers, but haven't actually had to implement them yet. They read your code, but they don't really understand the intuition behind the code. I see. [00:28:00]Alessio: And Quentin, when you say in the blog post, it says, Adam is magic. How much of it is like actual magic, even to like people like you that are pretty close to the metal, so to speak? Are some of these things just come as gospel? It's like, I know this works, like I'm not touching it. I'm just leveraging it. How much of it are you actually thinking about improving on in your day-to-day work? I see. [00:28:22]Quentin: So I'm a systems guy. I'm an engineer. And a lot of these things come to me as magic. Adam comes to me as magic. I see it from the gods. I say, this is how a deep learning model is trained. And this is how the next step is calculated. And then I say, okay, how do I make that fast? I would say I do look at ways to improve upon it using things like second order optimizers. So there's a lot of research on there because they're hard to distribute. But the core contribution for me always comes down to someone else has done like some deep learning optimization and I need to make it run fast. So I can't really speak to the motivation of why Adam came about other than like simple, intuitive things like I mentioned with like the momentum. But what matters to me is that Adam takes more memory than SGD, specifically three times. And all of that memory needs to go somewhere and it needs to be split efficiently. [00:29:14]Swyx: Yeah. [00:29:14]Alessio: So when you add them all up, you got 12 bytes per parameter with vanilla Adam. [00:29:20]Swyx: Yeah. [00:29:20]Alessio: And then you still get the model parameters and memory too. So as you mentioned, you need to keep a copy of both for like a FB32, FB16 mixed, a copy of both quantization levels. So there's precision levels. So it's six bytes per parameter. Right. [00:29:36]Quentin: Taking a step back again, is that like, okay, most people think of your model getting big. So you need to split with model parallelism purely, something like tensor parallelism. But we can see that the model only takes like two bytes per parameter if we're doing FB16. Whereas the optimizer itself requires four bytes per parameter for the model states, four bytes for momentum, four bytes for variance. So what matters more is how do you split your optimizer efficiently and how do you store it efficiently? And something like bits and bytes, where the optimizer, you got like eight bit Adam, where those optimizer states is only one byte per parameter instead of four or something like that. That is going to give you a much better return on your model training and on your memory overhead required than if you were to, for example, quantize your pure like FB16 model weights down to int8 or something. So for training specifically, your optimizer memory matters a lot. The most in most cases. [00:30:31]Swyx: Well, yeah. [00:30:31]Alessio: And before we dive into zero, just to wrap up the items that you're going to shard later. So you have the parameters, you have the optimizer states, and then you have the gradients. Just maybe touch a little bit on that. And then we can talk about how to efficiently load them in GPUs. [00:30:48]Quentin: So the parameters are the FP32 copies of the parameters. We include them in the optimizer discussion. Some people don't, but just for clarity, it's 12 bytes per param for the optimizer states and four of them are for that FP32 copy of the weights. Four of them are for the momentum. I already went into why it's important to store momentum, but that's also per parameter. You need to store where that parameter is going and where it's been going in the past. You also need to know, okay, we know where it's going, but there's going to be bumps on this canyon that we're going down. So we need to store its variance. How often are those bumps? Should we be focusing more on the momentum? Or is this parameter just kind of jumping around everywhere? Those are all important answers that we need the optimizer to store, and it's per parameter. So that's where all three of those terms come from. And we also include some competing bits and bytes, for example, an SGD to show that depending on your optimizer, you may store all or none of these and in different representations. [00:31:50]Alessio: I'm looking at the total training memory. You essentially have model memory, optimizer memory, gradient memory, and activation memory. I think that's one of the last discussed things. So maybe just give people a little bit of a view. [00:32:03]Swyx: Yeah, this is completely new to me. [00:32:05]Alessio: Active, you know, recomputation, checkpointing, and all of that. [00:32:08]Swyx: Right. [00:32:09]Quentin: So, okay. So to summarize before activation checkpointing, which will be complicated, you have your model params, like I mentioned before, they used to be FP32. Now they're probably BF16, maybe FP16 if it's an older GPU. Then you have your optimizer. That's where a lot of the memory is going. And it's your high precision, usually FP32, copy of the weights. So that's four bytes per param. And then you have, optionally, a couple more terms like we just discussed, like momentum or variance or whatever else, depending on what your optimizer is. Then you have your gradients. So your gradients is what is the gradient update that we get after running the forward pass on the model. And that's going to be whatever your low precision copy of the weights is. So like two bytes per param, if you're using FP16 or BF16. And all of those are sort of set in stone. And that overhead is not going to go away for the duration of training. Your gradients might get cleared after you back propagate them, but your optimizer states and your model states aren't going away. That memory overhead will be there. Activation recomputation and activation memory is dynamic. So some people will come and have this problem where the model loads fine for training. But then when you actually run your first iteration, or you run some future iteration or something like that, you run out of memory, seemingly at random. And it's because of these activations that you're computing on the fly. Good summary, or do you want to get into activation recomputation now, or do you want me to touch on anything else? [00:33:35]Alessio: Yeah, I was going to say, when is the recomputation happening? How does it decide between recomputing versus storing? And talk a bit more about that, maybe. [00:33:47]Quentin: Yeah, okay. So there's a lot of different ways to do this, but I would say there are a few main ones. First is a very simple scheme. You recompute everything. Every single activation that you calculate is just going to be either used or thrown away until the end. So in that case, you care very much about memory. You care very little about compute. Maybe this would be a case where you have to distribute across a lot of different GPUs, for example. And your communication speed is really low. Then that might be a good case for you to just recompute everything. It happens rarely, but it happens. Next up would be something like selective recomputation. So in selective recomputation, which Megatron has a good paper on, and I believe the figure that we have in our blog post is from, in that case, you sort of do a weighted decision for each activation. So for really big activation tensors, you decide, is this going to be more expensive to save in terms of memory or to recompute in terms of compute? So that's sort of the smart scheme that Megatron implements. And there's a lot of different heuristics they use. It's probably not worth mentioning off this super long equation on a pod, but you should go and read that paper if you're interested on selective recomputation. And then a really stupid scheme that most people go with, including NeoX, would be something like, instead of doing all of these heuristics, you just say, if my tensor is bigger than X, I throw it away. And you set X to some static number, and that's it. And that is good enough for a lot of cases. [00:35:18]Swyx: Why is it good enough? [00:35:20]Quentin: You don't want to store more than, you know, X-sized tensor. And some fall above that, some fall below it. And you're not trying to squeeze. You care more about getting something close enough to what the actual heuristic should be without actually computing the heuristic because you don't want to spend the time writing that heuristic code. [00:35:37]Swyx: Cool. I think that does take us on a grand tour of the memory math. Is there any sort of high-level takeaway before we go into the distributed stuff? Zero and all that. Perhaps more detail than most people have ever encountered. And so I'll repeat the equation that Alessio mentioned again, which is total training memory now has all these components that you've mapped out for the first time as far as we're concerned. Model memory, optimizer memory, activation memory, gradient memory. We covered quite a few algorithms as to the choices you can make there. Anything else that you want to mention about just memory math? I don't think so. [00:36:11]Quentin: I think that about covers it. I will say that it's a very different scheme for training and inference. It's common for people to say, oh, BF16 is the best. Done. Whereas a more correct take is that during training, precision matters a bit more. So BF16 will be around longer for training than it will for inference, in which case your model is sort of already baked. And it definitely doesn't need some of those last bits of precision so you can get away much easier with going to int8 for inference rather than training. So everything that you learn for training has to be relearned for inference and vice versa. [00:36:44]Swyx: There's a third category. You're talking about training versus inference. This third category is emerging with regards to fine-tuning and perhaps parameter-efficient methods of fine-tuning. The naive way to implement fine-tuning is just to do more training. But I don't know if you've developed any intuitions over fine-tuning that's worth inserting here. Any intuitions? If you were to write fine-tuning math, what would go in there? That might be an interesting diff to training math. [00:37:10]Quentin: I think there's a lot of questions that are unanswered for fine-tuning. For example, we know scaling laws for training. And some people have done scaling laws for fine-tuning. But how does a model that's already been trained on one domain transfer to another in terms of fine-tuning size? How many tokens per parameter should you have for your fine-tuning dataset? Maybe I'm ignorant, but I feel like a lot of those sort of practical questions on how a model can transfer and how a model can learn or grok some new ability that wasn't in its original training dataset is something that I would definitely put inside a fine-tuning blog post. [00:37:45]Swyx: Something related to perplexity and, I guess, diversity of the tokens that you get. [00:37:49]Quentin: Yeah, sort of dataset transfer is something that I would be curious in. Learning rate transfer is another one. So your model has some decayed learning rate over the course of training. How does that change for fine-tuning? Things like that. [00:38:00]Swyx: All right, cool. Thanks for indulging that stuff. Sure. Yeah. [00:38:03]Alessio: I think after all of this, you can quickly do the math and see that training needs to be distributed to actually work because we just don't have hardware that can easily run this. So let's talk a bit about that. So zero is one of the first things that you mentioned here, which is focused on sharded optimizers. Maybe run people through that and how to think about it. [00:38:25]Swyx: Sure. [00:38:25]Quentin: So zero is centered around two communication operations. And the first is scatter. And people should be looking at the zero figure that I think we have. [00:38:35]Swyx: Yeah. [00:38:36]Quentin: So there's a figure in the paper with parameters, gradients, and optimizer states that people should be looking at when I'm talking about this. Every GPU is going to get its own equal portion of the slice. And if we're doing... There are different stages of zero, but let's just start off with assuming that it's an equal slice of the optimizer states, gradients, and parameters. That would be zero three, stage three in that case. And we do that with a scatter. And the scatter takes, say, one over end GPUs, plus this offset of that slice goes to that GPU. Now all of the GPUs have an equal slice that's in its rank order. And then during each training step, that GPU is going to wait for all of the other slices to communicate so that we now have a whole pie on that GPU, that single GPU. Once we have that whole pie, we do the forward pass on it. And then we distribute that forward pass to all of the others using a gather. So it's a scatter, reduced scatter specifically, and then a gather back to all the others. And you do that each step. So the point of it is that you're sharding these states across GPUs. And with the different stages, you'll see in that figure that the optimizer state is taking the most proportion, which is because of what I mentioned before. We're including the FP32 copy and we're doing atom. So we need those four bytes per param for momentum and for variance. And then zero stage one, which is the most common one, is just optimizer. Zero stage two is optimizer plus gradients. And zero stage three is optimizer gradients and model parameters. But it all comes back to this splitting up and then gathering together back and forth over and over. So you get a lot of communication overhead from zero. But the plus part of that is that you can overlap a lot of that movement with computation. [00:40:23]Alessio: How do you get the optimal number of GPUs to do this on? Is there a way to shard too much as well and put too much overhead? [00:40:31]Quentin: It depends more on what your interconnect is. Taking a step back, there is synchronization that's required, a lot of it, across all of these GPUs. And those tend to be cumulative. So if you go to too many GPUs on an interconnect that's too slow, then you're going to end up spending more time synchronizing. And that magic number where you spend more time synchronizing is going to be different depending on what your fabric is and what your GPU memory is specifically. Just how small of a slice is each GPU getting? I can't, for example, for Summit, that number comes out to be about 20 billion parameters. Now you have 20 billion parameters, and then your magic number of GPUs for that is going to be something like 100 to 200 scale. Beyond that, you're just going to end up spending more time communicating. And the actual flops dipping below some predetermined number by you is going to be whatever your sweet spot ends up being. [00:41:24]Alessio: And then, so this one was like hard for me to go through, so I'm excited to have you run through it, which is a 3D parallelism. [00:41:33]Swyx: It's fancy, it's cutting edge. [00:41:35]Alessio: Yeah, let's talk a bit more about that and some of the work. [00:41:38]Quentin: Okay, 3D parallelism. So what is each dimension? First is the really basic one. That's data parallelism. And data parallelism is you have a copy of the model. Let's say for simplicity, one copy fits on one GPU perfectly. Data parallelism is that now you have two GPUs, so you have one copy on GPU one, one copy on GPU two. Both of them do the forward and backward pass and then synchronize and average the gradients. And then that's a step. Data parallelism for 3D parallelism is actually zero. So it's, you're sharding the optimizer states across all of your different GPUs. Next up is tensor parallelism. Tensor parallelism is you split your model. Like say, if you have two GPUs, you split your model down the middle and each GPU on its tensor specifically is going to do its forward or backward operation on its tensor. And then only when necessary, it'll synchronize that tensor operation with the other GPU. It's a bit more complex than something like pipeline parallelism, which is the third dimension. In pipeline parallelism, let's say you have four layers in your model. And you have four GPUs. You put one layer on each GPU and then GPU one does the forward pass and then sends the output of its activations to GPU two. It does the forward pass, sends activations to three, and you're just moving down a line. That is a naive scheme in that all of the other GPUs are doing nothing while a single GPU is doing its forward or backward pass. So the reason it's called pipeline parallelism is because you're splitting your mini batch into micro batches. So GPU one will do the forward pass on micro batch one and then send to GPU two. And then while GPU two is running on that first micro batch, GPU one is working on the next micro batch. And so you're sort of pipelining the movement and computation of each micro batch. The problem with that is that you need a really big batch size in order to split it up into both mini batches and micro batches. So combining all three of those together, you get a 3D mesh of where each parameter and optimizer state and so on maps to each GPU. And that's 3D parallelism. So let's start diving into details on what have that made sense, what should I jump into more on? [00:43:55]Alessio: I think the main question is, do you need all of the GPUs to be the same to do this? Or can you have mismatching GPUs as well? [00:44:03]Quentin: Okay, two things matter. If there's a difference in VRAM for the two different kinds of GPUs, then you're going to be bottlenecked by whichever GPU has the lower amount of VRAM because it's going to run out of memory. And then you can't like whatever's left on the larger GPUs is going to be empty. As far as I'm aware, there's no like GPU single GPU aware memory overhead scheme that would account for that. The second problem is that let's say all of your GPUs have the same amount of VRAM, but half of them are really slow. And the problem with that is that those synchronizations that I mentioned earlier are going to kill you. So you're going to move as quickly as your slowest GPU in that case. So in both cases, you end up regressing to your slowest or smallest GPU. So you might as well have the same GPUs for all of them. Otherwise, you're wasting the nicer ones. And that also goes to your CPUs and your interconnect. So going back to the 20 billion parameter model that Eleuther was training, that was on a cluster that was sort of Frankenstein made during COVID when there was all of that shortage of network switches and such like that. So every node had a different network switch. And so you ended up moving at the speed of the slowest switch and getting everything tuned properly so that it's not worse than the slowest switch was challenging and is like a real world problem that sometimes comes up. [00:45:28]Alessio: Is this work widely accepted? Like I hadn't learned about this before studying for this episode. Is this something that people are still trying and researching? Or is everybody just aware of this and running this in production? [00:45:43]Quentin: What is this specifically? [00:45:44]Alessio: Like the sharded optimizers plus the 3D parallelism, bringing the two things together and having this kind of mesh strategy. [00:45:51]Quentin: I would say that a lot of major GPT-based models use this scheme. A lot of them now are sort of going with just a pure zero scheme. So just a pure sharded. You just shard everything. And then since that's so easy, everyone gets an equal slice. There's no such thing as a pipeline stage. There's no such thing as what tensor should go on which GPU. Instead, we shard everything equally and treat everything equally. It's a much easier problem to debug, to checkpoint, to run training on than it is with this 3D parallel scheme. I say 3D parallel gives you the most control and also the most ways to go wrong. And depending on whether you have more engineers or whether you have more GPUs, that should decide which of these you go with. [00:46:35]Swyx: It's also not too hard, right? You've basically outlined the five or six different numbers that you need to keep in your head. And it doesn't feel impossible that if you need to achieve that level of control, you've given everybody the main levers to do it with. And that's wonderful. Definitely. [00:46:51]Quentin: The problem that comes up is like, say, like, okay, GPT-4 came out. Now we have VLLMs. [00:46:57]Swyx: Whoa, what are VLLMs? Oh, okay. Virtual LLMs, like the Metro of Expert things? No, like visual. [00:47:03]Quentin: So now you have like multimodal models and such. How do you distribute that? Do you distribute it in a pipeline stage? And do you just shard it? Do you split the tensor and make a tensor parallel? It's sort of hard to change your model and add new features and such when you have this 3D parallel scheme. That's when I say hard. I mean, it's hard to sort of adapt and modify it to new features. [00:47:26]Alessio: I know we're at the hour mark, and I think we put our listeners through a very intense class today. So this was great, Quentin. And we're going to definitely link the article so that people can read it and follow along. Any other research that you're working on in this space that you want to shout out? I know one of our usual, I mean, wrong question is, what's the most interesting unsolved question in AI? So curious to hear if you think it's still on the training inference, math optimization, or are there more areas that people should pay attention to? [00:47:58]Quentin: I think in my area of research, there are two things that I think people should really care about. And the first is multimodal parallelism and RLHF. You were seeing more and more reinforcement learning and coming into the training loop. And so how do you split that some model or some GPUs are working on inference and some GPUs are working on training? And like I mentioned before, you have to relearn everything and they have very unique challenges. How do you split up a KV cache during training, for example? Those are challenges that are not well studied, I don't think. And then multimodal, you have like maybe a vision transformer and a text transformer. How do you split those up? Do you split them up equally? Do you put them on separate GPUs or do you just shard everything? And just maybe one GPU will have some vision, some text parameters. And then the second case I would say is that communication is very often a bottleneck. So we talk about 3D parallelism, but a lot of those like, for example, tensor parallelism, you can't go across nodes with. You'll just get killed in communication. So what I'm getting to is how should you compress your communication before it happens? So on the fly compression, you have some buffer that needs to be communicated. You compress it with a GPU kernel, then you send it across the network and then you decompress it, something like that. Making people spend less money on communication fabrics and more on GPUs as intended is sort of a thing that people need to explore. I think those are my two. [00:49:26]Alessio: Sean, you went over the other half of the lightning round before we wrap it up. [00:49:30]Swyx: That's a good brain dump. Cool. Yeah, I have so many more questions on the multimodal stuff, but that should be for another time. Acceleration, what has already happened in AI that you thought would take much longer? [00:49:42]Quentin: I would say flash attention. Guys, just talk to Tree. And flash attention is just sort of a really great set of kernels that I thought would take a while to get to us. [00:49:51]Alessio: Well, Quentin, thank you very much, man. This was super informative and I think hopefully helps demystify a little bit the blog post. I think people open it and it's like a lot of math on it. And I think you walking them through it was super helpful. So thank you so much for coming on. [00:50:07]Swyx: Of course. [00:50:08]Quentin: And I'm happy to answer any questions that people have offline if they have them. I do read my email. [00:50:13]Swyx: Email and Discord. Of course, yeah. [00:50:15]Quentin: Discord I'm even faster on. [00:50:16]Alessio: Thank you, everyone. [00:50:18]Swyx: Thanks, Quentin. [00:50:19] Get full access to Latent Space at www.latent.space/subscribe
50:3816/08/2023
LLMs Everywhere: Running 70B models in browsers and iPhones using MLC — with Tianqi Chen of CMU / OctoML
We have just announced our first set of speakers at AI Engineer Summit! Sign up for the livestream or email [email protected] if you’d like to support.We are facing a massive GPU crunch. As both startups and VC’s hoard Nvidia GPUs like countries count nuclear stockpiles, tweets about GPU shortages have become increasingly common. But what if we could run LLMs with AMD cards, or without a GPU at all? There’s just one weird trick: compilation. And there’s one person uniquely qualified to do it.We had the pleasure to sit down with Tianqi Chen, who’s an Assistant Professor at CMU, where he both teaches the MLC course and runs the MLC group. You might also know him as the creator of XGBoost, Apache TVM, and MXNet, as well as the co-founder of OctoML. The MLC (short for Machine Learning Compilation) group has released a lot of interesting projects:* MLC Chat: an iPhone app that lets you run models like RedPajama-3B and Vicuna-7B on-device. It gets up to 30 tok/s!* Web LLM: Run models like LLaMA-70B in your browser (!!) to offer local inference in your product.* MLC LLM: a framework that allows any language models to be deployed natively on different hardware and software stacks.The MLC group has just announced new support for AMD cards; we previously talked about the shortcomings of ROCm, but using MLC you can get performance very close to the NVIDIA’s counterparts. This is great news for founders and builders, as AMD cards are more readily available. Here are their latest results on AMD’s 7900s vs some of top NVIDIA consumer cards.If you just can’t get a GPU at all, MLC LLM also supports ARM and x86 CPU architectures as targets by leveraging LLVM. While speed performance isn’t comparable, it allows for non-time-sensitive inference to be run on commodity hardware.We also enjoyed getting a peek into TQ’s process, which involves a lot of sketching:With all the other work going on in this space with projects like ggml and Ollama, we’re excited to see GPUs becoming less and less of an issue to get models in the hands of more people, and innovative software solutions to hardware problems!Show Notes* TQ’s Projects:* XGBoost* Apache TVM* MXNet* MLC* OctoML* CMU Catalyst* ONNX* GGML* Mojo* WebLLM* RWKV* HiPPO* Tri Dao’s Episode* George Hotz EpisodePeople:* Carlos Guestrin* Albert GuTimestamps* [00:00:00] Intros* [00:03:41] The creation of XGBoost and its surprising popularity* [00:06:01] Comparing tree-based models vs deep learning* [00:10:33] Overview of TVM and how it works with ONNX* [00:17:18] MLC deep dive* [00:28:10] Using int4 quantization for inference of language models* [00:30:32] Comparison of MLC to other model optimization projects* [00:35:02] Running large language models in the browser with WebLLM* [00:37:47] Integrating browser models into applications* [00:41:15] OctoAI and self-optimizing compute* [00:45:45] Lightning RoundTranscriptAlessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, Partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, writer and editor of Latent Space. [00:00:20]Swyx: Okay, and we are here with Tianqi Chen, or TQ as people call him, who is assistant professor in ML computer science at CMU, Carnegie Mellon University, also helping to run Catalyst Group, also chief technologist of OctoML. You wear many hats. Are those, you know, your primary identities these days? Of course, of course. [00:00:42]Tianqi: I'm also, you know, very enthusiastic open source. So I'm also a VP and PRC member of the Apache TVM project and so on. But yeah, these are the things I've been up to so far. [00:00:53]Swyx: Yeah. So you did Apache TVM, XGBoost, and MXNet, and we can cover any of those in any amount of detail. But maybe what's one thing about you that people might not learn from your official bio or LinkedIn, you know, on the personal side? [00:01:08]Tianqi: Let me say, yeah, so normally when I do, I really love coding, even though like I'm trying to run all those things. So one thing that I keep a habit on is I try to do sketchbooks. I have a book, like real sketchbooks to draw down the design diagrams and the sketchbooks I keep sketching over the years, and now I have like three or four of them. And it's kind of a usually a fun experience of thinking the design through and also seeing how open source project evolves and also looking back at the sketches that we had in the past to say, you know, all these ideas really turn into code nowadays. [00:01:43]Alessio: How many sketchbooks did you get through to build all this stuff? I mean, if one person alone built one of those projects, he'll be a very accomplished engineer. Like you built like three of these. What's that process like for you? Like it's the sketchbook, like the start, and then you think about the code or like. [00:01:59]Swyx: Yeah. [00:02:00]Tianqi: So, so usually I start sketching on high level architectures and also in a project that works for over years, we also start to think about, you know, new directions, like of course generative AI language model comes in, how it's going to evolve. So normally I would say it takes like one book a year, roughly at that rate. It's usually fun to, I find it's much easier to sketch things out and then gives a more like a high level architectural guide for some of the future items. Yeah. [00:02:28]Swyx: Have you ever published this sketchbooks? Cause I think people would be very interested on, at least on a historical basis. Like this is the time where XGBoost was born, you know? Yeah, not really. [00:02:37]Tianqi: I started sketching like after XGBoost. So that's a kind of missing piece, but a lot of design details in TVM are actually part of the books that I try to keep a record of. [00:02:48]Swyx: Yeah, we'll try to publish them and publish something in the journals. Maybe you can grab a little snapshot for visual aid. Sounds good. [00:02:57]Alessio: Yeah. And yeah, talking about XGBoost, so a lot of people in the audience might know it's a gradient boosting library, probably the most popular out there. And it became super popular because many people started using them in like a machine learning competitions. And I think there's like a whole Wikipedia page of like all state-of-the-art models. They use XGBoost and like, it's a really long list. When you were working on it, so we just had Tri Dao, who's the creator of FlashAttention on the podcast. And I asked him this question, it's like, when you were building FlashAttention, did you know that like almost any transform race model will use it? And so I asked the same question to you when you were coming up with XGBoost, like, could you predict it would be so popular or like, what was the creation process? And when you published it, what did you expect? We have no idea. [00:03:41]Tianqi: Like, actually, the original reason that we built that library is that at that time, deep learning just came out. Like that was the time where AlexNet just came out. And one of the ambitious mission that myself and my advisor, Carlos Guestrin, then is we want to think about, you know, try to test the hypothesis. Can we find alternatives to deep learning models? Because then, you know, there are other alternatives like, you know, support vector machines, linear models, and of course, tree-based models. And our question was, if you build those models and feed them with big enough data, because usually like one of the key characteristics of deep learning is that it's taking a lot [00:04:22]Swyx: of data, right? [00:04:23]Tianqi: So we will be able to get the same amount of performance. That's a hypothesis we're setting out to test. Of course, if you look at now, right, that's a wrong hypothesis, but as a byproduct, what we find out is that, you know, most of the gradient boosting library out there is not efficient enough for us to test that hypothesis. So I happen to have quite a bit of experience in the past of building gradient boosting trees and their variants. So Effective Action Boost was kind of like a byproduct of that hypothesis testing. At that time, I'm also competing a bit in data science challenges, like I worked on KDDCup and then Kaggle kind of become bigger, right? So I kind of think maybe it's becoming useful to others. One of my friends convinced me to try to do a Python binding of it. That tends to be like a very good decision, right, to be effective. Usually when I build it, we feel like maybe a command line interface is okay. And now we have a Python binding, we have R bindings. And then it realized, you know, it started getting interesting. People started contributing different perspectives, like visualization and so on. So we started to push a bit more on to building distributive support to make sure it works on any platform and so on. And even at that time point, when I talked to Carlos, my advisor, later, he said he never anticipated that we'll get to that level of success. And actually, why I pushed for gradient boosting trees, interestingly, at that time, he also disagreed. He thinks that maybe we should go for kernel machines then. And it turns out, you know, actually, we are both wrong in some sense, and Deep Neural Network was the king in the hill. But at least the gradient boosting direction got into something fruitful. [00:06:01]Swyx: Interesting. [00:06:02]Alessio: I'm always curious when it comes to these improvements, like, what's the design process in terms of like coming up with it? And how much of it is a collaborative with like other people that you're working with versus like trying to be, you know, obviously, in academia, it's like very paper-driven kind of research driven. [00:06:19]Tianqi: I would say the extra boost improvement at that time point was more on like, you know, I'm trying to figure out, right. But it's combining lessons. Before that, I did work on some of the other libraries on matrix factorization. That was like my first open source experience. Nobody knew about it, because you'll find, likely, if you go and try to search for the package SVD feature, you'll find some SVN repo somewhere. But it's actually being used for some of the recommender system packages. So I'm trying to apply some of the previous lessons there and trying to combine them. The later projects like MXNet and then TVM is much, much more collaborative in a sense that... But, of course, extra boost has become bigger, right? So when we started that project myself, and then we have, it's really amazing to see people come in. Michael, who was a lawyer, and now he works on the AI space as well, on contributing visualizations. Now we have people from our community contributing different things. So extra boost even today, right, it's a community of committers driving the project. So it's definitely something collaborative and moving forward on getting some of the things continuously improved for our community. [00:07:37]Alessio: Let's talk a bit about TVM too, because we got a lot of things to run through in this episode. [00:07:42]Swyx: I would say that at some point, I'd love to talk about this comparison between extra boost or tree-based type AI or machine learning compared to deep learning, because I think there is a lot of interest around, I guess, merging the two disciplines, right? And we can talk more about that. I don't know where to insert that, by the way, so we can come back to it later. Yeah. [00:08:04]Tianqi: Actually, what I said, when we test the hypothesis, the hypothesis is kind of, I would say it's partially wrong, because the hypothesis we want to test now is, can you run tree-based models on image classification tasks, where deep learning is certainly a no-brainer right [00:08:17]Swyx: now today, right? [00:08:18]Tianqi: But if you try to run it on tabular data, still, you'll find that most people opt for tree-based models. And there's a reason for that, in the sense that when you are looking at tree-based models, the decision boundaries are naturally rules that you're looking at, right? And they also have nice properties, like being able to be agnostic to scale of input and be able to automatically compose features together. And I know there are attempts on building neural network models that work for tabular data, and I also sometimes follow them. I do feel like it's good to have a bit of diversity in the modeling space. Actually, when we're building TVM, we build cost models for the programs, and actually we are using XGBoost for that as well. I still think tree-based models are going to be quite relevant, because first of all, it's really to get it to work out of the box. And also, you will be able to get a bit of interoperability and control monotonicity [00:09:18]Swyx: and so on. [00:09:19]Tianqi: So yes, it's still going to be relevant. I also sometimes keep coming back to think about, are there possible improvements that we can build on top of these models? And definitely, I feel like it's a space that can have some potential in the future. [00:09:34]Swyx: Are there any current projects that you would call out as promising in terms of merging the two directions? [00:09:41]Tianqi: I think there are projects that try to bring a transformer-type model for tabular data. I don't remember specifics of them, but I think even nowadays, if you look at what people are using, tree-based models are still one of their toolkits. So I think maybe eventually it's not even a replacement, it will be just an ensemble of models that you can call. Perfect. [00:10:07]Alessio: Next up, about three years after XGBoost, you built this thing called TVM, which is now a very popular compiler framework for models. Let's talk about, so this came out about at the same time as ONNX. So I think it would be great if you could maybe give a little bit of an overview of how the two things work together. Because it's kind of like the model, then goes to ONNX, then goes to the TVM. But I think a lot of people don't understand the nuances. I can get a bit of a backstory on that. [00:10:33]Tianqi: So actually, that's kind of an ancient history. Before XGBoost, I worked on deep learning for two years or three years. I got a master's before I started my PhD. And during my master's, my thesis focused on applying convolutional restricted Boltzmann machine for ImageNet classification. That is the thing I'm working on. And that was before AlexNet moment. So effectively, I had to handcraft NVIDIA CUDA kernels on, I think, a GTX 2070 card. I have a 22070 card. It took me about six months to get one model working. And eventually, that model is not so good, and we should have picked a better model. But that was like an ancient history that really got me into this deep learning field. And of course, eventually, we find it didn't work out. So in my master's, I ended up working on recommender system, which got me a paper, and I applied and got a PhD. But I always want to come back to work on the deep learning field. So after XGBoost, I think I started to work with some folks on this particular MXNet. At that time, it was like the frameworks of CAFE, Ciano, PyTorch haven't yet come out. And we're really working hard to optimize for performance on GPUs. At that time, I found it's really hard, even for NVIDIA GPU. It took me six months. And then it's amazing to see on different hardwares how hard it is to go and optimize code for the platforms that are interesting. So that gets me thinking, can we build something more generic and automatic? So that I don't need an entire team of so many people to go and build those frameworks. So that's the motivation of starting working on TVM. There is really too little about machine learning engineering needed to support deep learning models on the platforms that we're interested in. I think it started a bit earlier than ONNX, but once it got announced, I think it's in a similar time period at that time. So overall, how it works is that TVM, you will be able to take a subset of machine learning programs that are represented in what we call a computational graph. Nowadays, we can also represent a loop-level program ingest from your machine learning models. Usually, you have model formats ONNX, or in PyTorch, they have FX Tracer that allows you to trace the FX graph. And then it goes through TVM. We also realized that, well, yes, it needs to be more customizable, so it will be able to perform some of the compilation optimizations like fusion operator together, doing smart memory planning, and more importantly, generate low-level code. So that works for NVIDIA and also is portable to other GPU backends, even non-GPU backends [00:13:36]Swyx: out there. [00:13:37]Tianqi: So that's a project that actually has been my primary focus over the past few years. And it's great to see how it started from where I think we are the very early initiator of machine learning compilation. I remember there was a visit one day, one of the students asked me, are you still working on deep learning frameworks? I tell them that I'm working on ML compilation. And they said, okay, compilation, that sounds very ancient. It sounds like a very old field. And why are you working on this? And now it's starting to get more traction, like if you say Torch Compile and other things. I'm really glad to see this field starting to pick up. And also we have to continue innovating here. [00:14:17]Alessio: I think the other thing that I noticed is, it's kind of like a big jump in terms of area of focus to go from XGBoost to TVM, it's kind of like a different part of the stack. Why did you decide to do that? And I think the other thing about compiling to different GPUs and eventually CPUs too, did you already see some of the strain that models could have just being focused on one runtime, only being on CUDA and that, and how much of that went into it? [00:14:50]Tianqi: I think it's less about trying to get impact, more about wanting to have fun. I like to hack code, I had great fun hacking CUDA code. Of course, being able to generate CUDA code is cool, right? But now, after being able to generate CUDA code, okay, by the way, you can do it on other platforms, isn't that amazing? So it's more of that attitude to get me started on this. And also, I think when we look at different researchers, myself is more like a problem solver type. So I like to look at a problem and say, okay, what kind of tools we need to solve that problem? So regardless, it could be building better models. For example, while we build extra boots, we build certain regularizations into it so that it's more robust. It also means building system optimizations, writing low-level code, maybe trying to write assembly and build compilers and so on. So as long as they solve the problem, definitely go and try to do them together. And I also see it's a common trend right now. Like if you want to be able to solve machine learning problems, it's no longer at Aggressor layer, right? You kind of need to solve it from both Aggressor data and systems angle. And this entire field of machine learning system, I think it's kind of emerging. And there's now a conference around it. And it's really good to see a lot more people are starting to look into this. [00:16:10]Swyx: Yeah. Are you talking about ICML or something else? [00:16:13]Tianqi: So machine learning and systems, right? So not only machine learning, but machine learning and system. So there's a conference called MLsys. It's definitely a smaller community than ICML, but I think it's also an emerging and growing community where people are talking about what are the implications of building systems for machine learning, right? And how do you go and optimize things around that and co-design models and systems together? [00:16:37]Swyx: Yeah. And you were area chair for ICML and NeurIPS as well. So you've just had a lot of conference and community organization experience. Is that also an important part of your work? Well, it's kind of expected for academic. [00:16:48]Tianqi: If I hold an academic job, I need to do services for the community. Okay, great. [00:16:53]Swyx: Your most recent venture in MLsys is going to the phone with MLCLLM. You announced this in April. I have it on my phone. It's great. I'm running Lama 2, Vicuña. I don't know what other models that you offer. But maybe just kind of describe your journey into MLC. And I don't know how this coincides with your work at CMU. Is that some kind of outgrowth? [00:17:18]Tianqi: I think it's more like a focused effort that we want in the area of machine learning compilation. So it's kind of related to what we built in TVM. So when we built TVM was five years ago, right? And a lot of things happened. We built the end-to-end machine learning compiler that works, the first one that works. But then we captured a lot of lessons there. So then we are building a second iteration called TVM Unity. That allows us to be able to allow ML engineers to be able to quickly capture the new model and how we demand building optimizations for them. And MLCLLM is kind of like an MLC. It's more like a vertical driven organization that we go and build tutorials and go and build projects like LLM to solutions. So that to really show like, okay, you can take machine learning compilation technology and apply it and bring something fun forward. Yeah. So yes, it runs on phones, which is really cool. But the goal here is not only making it run on phones, right? The goal is making it deploy universally. So we do run on Apple M2 Macs, the 17 billion models. Actually, on a single batch inference, more recently on CUDA, we get, I think, the most best performance you can get out there already on the 4-bit inference. Actually, as I alluded earlier before the podcast, we just had a result on AMD. And on a single batch, actually, we can get the latest AMD GPU. This is a consumer card. It can get to about 80% of the 4019, so NVIDIA's best consumer card out there. So it's not yet on par, but thinking about how diversity and what you can enable and the previous things you can get on that card, it's really amazing that what you can do with this kind of technology. [00:19:10]Swyx: So one thing I'm a little bit confused by is that most of these models are in PyTorch, but you're running this inside a TVM. I don't know. Was there any fundamental change that you needed to do, or was this basically the fundamental design of TVM? [00:19:25]Tianqi: So the idea is that, of course, it comes back to program representation, right? So effectively, TVM has this program representation called TVM script that contains more like computational graph and operational representation. So yes, initially, we do need to take a bit of effort of bringing those models onto the program representation that TVM supports. Usually, there are a mix of ways, depending on the kind of model you're looking at. For example, for vision models and stable diffusion models, usually we can just do tracing that takes PyTorch model onto TVM. That part is still being robustified so that we can bring more models in. On language model tasks, actually what we do is we directly build some of the model constructors and try to directly map from Hugging Face models. The goal is if you have a Hugging Face configuration, we will be able to bring that in and apply optimization on them. So one fun thing about model compilation is that your optimization doesn't happen only as a soft language, right? For example, if you're writing PyTorch code, you just go and try to use a better fused operator at a source code level. Torch compile might help you do a bit of things in there. In most of the model compilations, it not only happens at the beginning stage, but we also apply generic transformations in between, also through a Python API. So you can tweak some of that. So that part of optimization helps a lot of uplifting in getting both performance and also portability on the environment. And another thing that we do have is what we call universal deployment. So if you get the ML program into this TVM script format, where there are functions that takes in tensor and output tensor, we will be able to have a way to compile it. So they will be able to load the function in any of the language runtime that TVM supports. So if you could load it in JavaScript, and that's a JavaScript function that you can take in tensors and output tensors. If you're loading Python, of course, and C++ and Java. So the goal there is really bring the ML model to the language that people care about and be able to run it on a platform they like. [00:21:37]Swyx: It strikes me that I've talked to a lot of compiler people, but you don't have a traditional compiler background. You're inventing your own discipline called machine learning compilation, or MLC. Do you think that this will be a bigger field going forward? [00:21:52]Tianqi: First of all, I do work with people working on compilation as well. So we're also taking inspirations from a lot of early innovations in the field. Like for example, TVM initially, we take a lot of inspirations from Halide, which is just an image processing compiler. And of course, since then, we have evolved quite a bit to focus on the machine learning related compilations. If you look at some of our conference publications, you'll find that machine learning compilation is already kind of a subfield. So if you look at papers in both machine learning venues, the MLC conferences, of course, and also system venues, every year there will be papers around machine learning compilation. And in the compiler conference called CGO, there's a C4ML workshop that also kind of trying to focus on this area. So definitely it's already starting to gain traction and becoming a field. I wouldn't claim that I invented this field, but definitely I helped to work with a lot of folks there. And I try to bring a perspective, of course, trying to learn a lot from the compiler optimizations as well as trying to bring in knowledges in machine learning and systems together. [00:23:07]Alessio: So we had George Hotz on the podcast a few episodes ago, and he had a lot to say about AMD and their software. So when you think about TVM, are you still restricted in a way by the performance of the underlying kernel, so to speak? So if your target is like a CUDA runtime, you still get better performance, no matter like TVM kind of helps you get there, but then that level you don't take care of, right? [00:23:34]Swyx: There are two parts in here, right? [00:23:35]Tianqi: So first of all, there is the lower level runtime, like CUDA runtime. And then actually for NVIDIA, a lot of the mood came from their libraries, like Cutlass, CUDN, right? Those library optimizations. And also for specialized workloads, actually you can specialize them. Because a lot of cases you'll find that if you go and do benchmarks, it's very interesting. Like two years ago, if you try to benchmark ResNet, for example, usually the NVIDIA library [00:24:04]Swyx: gives you the best performance. [00:24:06]Tianqi: It's really hard to beat them. But as soon as you start to change the model to something, maybe a bit of a variation of ResNet, not for the traditional ImageNet detections, but for latent detection and so on, there will be some room for optimization because people sometimes overfit to benchmarks. These are people who go and optimize things, right? So people overfit the benchmarks. So that's the largest barrier, like being able to get a low level kernel libraries, right? In that sense, the goal of TVM is actually we try to have a generic layer to both, of course, leverage libraries when available, but also be able to automatically generate [00:24:45]Swyx: libraries when possible. [00:24:46]Tianqi: So in that sense, we are not restricted by the libraries that they have to offer. That's why we will be able to run Apple M2 or WebGPU where there's no library available because we are kind of like automatically generating libraries. That makes it easier to support less well-supported hardware, right? For example, WebGPU is one example. From a runtime perspective, AMD, I think before their Vulkan driver was not very well supported. Recently, they are getting good. But even before that, we'll be able to support AMD through this GPU graphics backend called Vulkan, which is not as performant, but it gives you a decent portability across those [00:25:29]Swyx: hardware. [00:25:29]Alessio: And I know we got other MLC stuff to talk about, like WebLLM, but I want to wrap up on the optimization that you're doing. So there's kind of four core things, right? Kernel fusion, which we talked a bit about in the flash attention episode and the tiny grab one memory planning and loop optimization. I think those are like pretty, you know, self-explanatory. I think the one that people have the most questions, can you can you quickly explain [00:25:53]Swyx: those? [00:25:54]Tianqi: So there are kind of a different things, right? Kernel fusion means that, you know, if you have an operator like Convolutions or in the case of a transformer like MOP, you have other operators that follow that, right? You don't want to launch two GPU kernels. You want to be able to put them together in a smart way, right? And as a memory planning, it's more about, you know, hey, if you run like Python code, every time when you generate a new array, you are effectively allocating a new piece of memory, right? Of course, PyTorch and other frameworks try to optimize for you. So there is a smart memory allocator behind the scene. But actually, in a lot of cases, it's much better to statically allocate and plan everything ahead of time. And that's where like a compiler can come in. We need to, first of all, actually for language model, it's much harder because dynamic shape. So you need to be able to what we call symbolic shape tracing. So we have like a symbolic variable that tells you like the shape of the first tensor is n by 12. And the shape of the third tensor is also n by 12. Or maybe it's n times 2 by 12. Although you don't know what n is, right? But you will be able to know that relation and be able to use that to reason about like fusion and other decisions. So besides this, I think loop transformation is quite important. And it's actually non-traditional. Originally, if you simply write a code and you want to get a performance, it's very hard. For example, you know, if you write a matrix multiplier, the simplest thing you can do is you do for i, j, k, c, i, j, plus, equal, you know, a, i, k, times b, i, k. But that code is 100 times slower than the best available code that you can get. So we do a lot of transformation, like being able to take the original code, trying to put things into shared memory, and making use of tensor calls, making use of memory copies, and all this. Actually, all these things, we also realize that, you know, we cannot do all of them. So we also make the ML compilation framework as a Python package, so that people will be able to continuously improve that part of engineering in a more transparent way. So we find that's very useful, actually, for us to be able to get good performance very quickly on some of the new models. Like when Lamato came out, we'll be able to go and look at the whole, here's the bottleneck, and we can go and optimize those. [00:28:10]Alessio: And then the fourth one being weight quantization. So everybody wants to know about that. And just to give people an idea of the memory saving, if you're doing FB32, it's like four bytes per parameter. Int8 is like one byte per parameter. So you can really shrink down the memory footprint. What are some of the trade-offs there? How do you figure out what the right target is? And what are the precision trade-offs, too? [00:28:37]Tianqi: Right now, a lot of people also mostly use int4 now for language models. So that really shrinks things down a lot. And more recently, actually, we started to think that, at least in MOC, we don't want to have a strong opinion on what kind of quantization we want to bring, because there are so many researchers in the field. So what we can do is we can allow developers to customize the quantization they want, but we still bring the optimum code for them. So we are working on this item called bring your own quantization. In fact, hopefully MOC will be able to support more quantization formats. And definitely, I think there's an open field that's being explored. Can you bring more sparsities? Can you quantize activations as much as possible, and so on? And it's going to be something that's going to be relevant for quite a while. [00:29:27]Swyx: You mentioned something I wanted to double back on, which is most people use int4 for language models. This is actually not obvious to me. Are you talking about the GGML type people, or even the researchers who are training the models also using int4? [00:29:40]Tianqi: Sorry, so I'm mainly talking about inference, not training, right? So when you're doing training, of course, int4 is harder, right? Maybe you could do some form of mixed type precision for inference. I think int4 is kind of like, in a lot of cases, you will be able to get away with int4. And actually, that does bring a lot of savings in terms of the memory overhead, and so on. [00:30:09]Alessio: Yeah, that's great. Let's talk a bit about maybe the GGML, then there's Mojo. How should people think about MLC? How do all these things play together? I think GGML is focused on model level re-implementation and improvements. Mojo is a language, super sad. You're more at the compiler level. Do you all work together? Do people choose between them? [00:30:32]Tianqi: So I think in this case, I think it's great to say the ecosystem becomes so rich with so many different ways. So in our case, GGML is more like you're implementing something from scratch in C, right? So that gives you the ability to go and customize each of a particular hardware backend. But then you will need to write from CUDA kernels, and you write optimally from AMD, and so on. So the kind of engineering effort is a bit more broadened in that sense. Mojo, I have not looked at specific details yet. I think it's good to start to say, it's a language, right? I believe there will also be machine learning compilation technologies behind it. So it's good to say, interesting place in there. In the case of MLC, our case is that we do not want to have an opinion on how, where, which language people want to develop, deploy, and so on. And we also realize that actually there are two phases. We want to be able to develop and optimize your model. By optimization, I mean, really bring in the best CUDA kernels and do some of the machine learning engineering in there. And then there's a phase where you want to deploy it as a part of the app. So if you look at the space, you'll find that GGML is more like, I'm going to develop and optimize in the C language, right? And then most of the low-level languages they have. And Mojo is that you want to develop and optimize in Mojo, right? And you deploy in Mojo. In fact, that's the philosophy they want to push for. In the ML case, we find that actually if you want to develop models, the machine learning community likes Python. Python is a language that you should focus on. So in the case of MLC, we really want to be able to enable, not only be able to just define your model in Python, that's very common, right? But also do ML optimization, like engineering optimization, CUDA kernel optimization, memory planning, all those things in Python that makes you customizable and so on. But when you do deployment, we realize that people want a bit of a universal flavor. If you are a web developer, you want JavaScript, right? If you're maybe an embedded system person, maybe you would prefer C++ or C or Rust. And people sometimes do like Python in a lot of cases. So in the case of MLC, we really want to have this vision of, you optimize, build a generic optimization in Python, then you deploy that universally onto the environments that people like. [00:32:54]Swyx: That's a great perspective and comparison, I guess. One thing I wanted to make sure that we cover is that I think you are one of these emerging set of academics that also very much focus on your artifacts of delivery. Of course. Something we talked about for three years, that he was very focused on his GitHub. And obviously you treated XGBoost like a product, you know? And then now you're publishing an iPhone app. Okay. Yeah. Yeah. What is his thinking about academics getting involved in shipping products? [00:33:24]Tianqi: I think there are different ways of making impact, right? Definitely, you know, there are academics that are writing papers and building insights for people so that people can build product on top of them. In my case, I think the particular field I'm working on, machine learning systems, I feel like really we need to be able to get it to the hand of people so that really we see the problem, right? And we show that we can solve a problem. And it's a different way of making impact. And there are academics that are doing similar things. Like, you know, if you look at some of the people from Berkeley, right? A few years, they will come up with big open source projects. Certainly, I think it's just a healthy ecosystem to have different ways of making impacts. And I feel like really be able to do open source and work with open source community is really rewarding because we have a real problem to work on when we build our research. Actually, those research bring together and people will be able to make use of them. And we also start to see interesting research challenges that we wouldn't otherwise say, right, if you're just trying to do a prototype and so on. So I feel like it's something that is one interesting way of making impact, making contributions. [00:34:40]Swyx: Yeah, you definitely have a lot of impact there. And having experience publishing Mac stuff before, the Apple App Store is no joke. It is the hardest compilation, human compilation effort. So one thing that we definitely wanted to cover is running in the browser. You have a 70 billion parameter model running in the browser. That's right. Can you just talk about how? Yeah, of course. [00:35:02]Tianqi: So I think that there are a few elements that need to come in, right? First of all, you know, we do need a MacBook, the latest one, like M2 Max, because you need the memory to be big enough to cover that. So for a 70 million model, it takes you about, I think, 50 gigahertz of RAM. So the M2 Max, the upper version, will be able to run it, right? And it also leverages machine learning compilation. Again, what we are doing is the same, whether it's running on iPhone, on server cloud GPUs, on AMDs, or on MacBook, we all go through that same MOC pipeline. Of course, in certain cases, maybe we'll do a bit of customization iteration for either ones. And then it runs on the browser runtime, this package of WebLM. So that will effectively... So what we do is we will take that original model and compile to what we call WebGPU. And then the WebLM will be to pick it up. And the WebGPU is this latest GPU technology that major browsers are shipping right now. So you can get it in Chrome for them already. It allows you to be able to access your native GPUs from a browser. And then effectively, that language model is just invoking the WebGPU kernels through there. So actually, when the LATMAR2 came out, initially, we asked the question about, can you run 17 billion on a MacBook? That was the question we're asking. So first, we actually... Jin Lu, who is the engineer pushing this, he got 17 billion on a MacBook. We had a CLI version. So in MLC, you will be able to... That runs through a metal accelerator. So effectively, you use the metal programming language to get the GPU acceleration. So we find, okay, it works for the MacBook. Then we asked, we had a WebGPU backend. Why not try it there? So we just tried it out. And it's really amazing to see everything up and running. And actually, it runs smoothly in that case. So I do think there are some kind of interesting use cases already in this, because everybody has a browser. You don't need to install anything. I think it doesn't make sense yet to really run a 17 billion model on a browser, because you kind of need to be able to download the weight and so on. But I think we're getting there. Effectively, the most powerful models you will be able to run on a consumer device. It's kind of really amazing. And also, in a lot of cases, there might be use cases. For example, if I'm going to build a chatbot that I talk to it and answer questions, maybe some of the components, like the voice to text, could run on the client side. And so there are a lot of possibilities of being able to have something hybrid that contains the edge component or something that runs on a server. [00:37:47]Alessio: Do these browser models have a way for applications to hook into them? So if I'm using, say, you can use OpenAI or you can use the local model. Of course. [00:37:56]Tianqi: Right now, actually, we are building... So there's an NPM package called WebILM, right? So that you will be able to, if you want to embed it onto your web app, you will be able to directly depend on WebILM and you will be able to use it. We are also having a REST API that's OpenAI compatible. So that REST API, I think, right now, it's actually running on native backend. So that if a CUDA server is faster to run on native backend. But also we have a WebGPU version of it that you can go and run. So yeah, we do want to be able to have easier integrations with existing applications. And OpenAI API is certainly one way to do that. Yeah, this is great. [00:38:37]Swyx: I actually did not know there's an NPM package that makes it very, very easy to try out and use. I want to actually... One thing I'm unclear about is the chronology. Because as far as I know, Chrome shipped WebGPU the same time that you shipped WebILM. Okay, yeah. So did you have some kind of secret chat with Chrome? [00:38:57]Tianqi: The good news is that Chrome is doing a very good job of trying to have early release. So although the official shipment of the Chrome WebGPU is the same time as WebILM, actually, you will be able to try out WebGPU technology in Chrome. There is an unstable version called Canary. I think as early as two years ago, there was a WebGPU version. Of course, it's getting better. So we had a TVM-based WebGPU backhand two years ago. Of course, at that time, there were no language models. It was running on less interesting, well, still quite interesting models. And then this year, we really started to see it getting matured and performance keeping up. So we have a more serious push of bringing the language model compatible runtime onto the WebGPU. [00:39:45]Swyx: I think you agree that the hardest part is the model download. Has there been conversations about a one-time model download and sharing between all the apps that might use this API? That is a great point. [00:39:58]Tianqi: I think it's already supported in some sense. When we download the model, WebILM will cache it onto a special Chrome cache. So if a different web app uses the same WebILM JavaScript package, you don't need to redownload the model again. So there is already something there. But of course, you have to download the model once at least to be able to use it. [00:40:19]Swyx: Okay. One more thing just in general before we're about to zoom out to OctoAI. Just the last question is, you're not the only project working on, I guess, local models. That's right. Alternative models. There's gpt4all, there's olama that just recently came out, and there's a bunch of these. What would be your advice to them on what's a valuable problem to work on? And what is just thin wrappers around ggml? Like, what are the interesting problems in this space, basically? [00:40:45]Tianqi: I think making API better is certainly something useful, right? In general, one thing that we do try to push very hard on is this idea of easier universal deployment. So we are also looking forward to actually have more integration with MOC. That's why we're trying to build API like WebILM and other things. So we're also looking forward to collaborate with all those ecosystems and working support to bring in models more universally and be able to also keep up the best performance when possible in a more push-button way. [00:41:15]Alessio: So as we mentioned in the beginning, you're also the co-founder of Octomel. Recently, Octomel released OctoAI, which is a compute service, basically focuses on optimizing model runtimes and acceleration and compilation. What has been the evolution there? So Octo started as kind of like a traditional MLOps tool, where people were building their own models and you help them on that side. And then it seems like now most of the market is shifting to starting from pre-trained generative models. Yeah, what has been that experience for you and what you've seen the market evolve? And how did you decide to release OctoAI? [00:41:52]Tianqi: One thing that we found out is that on one hand, it's really easy to go and get something up and running, right? So if you start to consider there's so many possible availabilities and scalability issues and even integration issues since becoming kind of interesting and complicated. So we really want to make sure to help people to get that part easy, right? And now a lot of things, if we look at the customers we talk to and the market, certainly generative AI is something that is very interesting. So that is something that we really hope to help elevate. And also building on top of technology we build to enable things like portability across hardwares. And you will be able to not worry about the specific details, right? Just focus on getting the model out. We'll try to work on infrastructure and other things that helps on the other end. [00:42:45]Alessio: And when it comes to getting optimization on the runtime, I see when we run an early adopters community and most enterprises issue is how to actually run these models. Do you see that as one of the big bottlenecks now? I think a few years ago it was like, well, we don't have a lot of machine learning talent. We cannot develop our own models. Versus now it's like, there's these great models you can use, but I don't know how to run them efficiently. [00:43:12]Tianqi: That depends on how you define by running, right? On one hand, it's easy to download your MLC, like you download it, you run on a laptop, but then there's also different decisions, right? What if you are trying to serve a larger user request? What if that request changes? What if the availability of hardware changes? Right now it's really hard to get the latest hardware on media, unfortunately, because everybody's trying to work on the things using the hardware that's out there. So I think when the definition of run changes, there are a lot more questions around things. And also in a lot of cases, it's not only about running models, it's also about being able to solve problems around them. How do you manage your model locations and how do you make sure that you get your model close to your execution environment more efficiently? So definitely a lot of engineering challenges out there. That we hope to elevate, yeah. And also, if you think about our future, definitely I feel like right now the technology, given the technology and the kind of hardware availability we have today, we will need to make use of all the possible hardware available out there. That will include a mechanism for cutting down costs, bringing something to the edge and cloud in a more natural way. So I feel like still this is a very early stage of where we are, but it's already good to see a lot of interesting progress. [00:44:35]Alessio: Yeah, that's awesome. I would love, I don't know how much we're going to go in depth into it, but what does it take to actually abstract all of this from the end user? You know, like they don't need to know what GPUs you run, what cloud you're running them on. You take all of that away. What was that like as an engineering challenge? [00:44:51]Tianqi: So I think that there are engineering challenges on. In fact, first of all, you will need to be able to support all the kind of hardware backhand you have, right? On one hand, if you look at the media library, you'll find very surprisingly, not too surprisingly, most of the latest libraries works well on the latest GPU. But there are other GPUs out there in the cloud as well. So certainly being able to have know-hows and being able to do model optimization is one thing, right? Also infrastructures on being able to scale things up, locate models. And in a lot of cases, we do find that on typical models, it also requires kind of vertical iterations. So it's not about, you know, build a silver bullet and that silver bullet is going to solve all the problems. It's more about, you know, we're building a product, we'll work with the users and we find out there are interesting opportunities in a certain point. And when our engineer will go and solve that, and it will automatically reflect it in a service. [00:45:45]Swyx: Awesome. [00:45:46]Alessio: We can jump into the lightning round until, I don't know, Sean, if you have more questions or TQ, if you have more stuff you wanted to talk about that we didn't get a chance to [00:45:54]Swyx: touch on. [00:45:54]Alessio: Yeah, we have talked a lot. [00:45:55]Swyx: So, yeah. We always would like to ask, you know, do you have a commentary on other parts of AI and ML that is interesting to you? [00:46:03]Tianqi: So right now, I think one thing that we are really pushing hard for is this question about how far can we bring open source, right? I'm kind of like a hacker and I really like to put things together. So I think it's unclear in the future of what the future of AI looks like. On one hand, it could be possible that, you know, you just have a few big players, you just try to talk to those bigger language models and that can do everything, right? On the other hand, one of the things that Wailing Academic is really excited and pushing for, that's one reason why I'm pushing for MLC, is that can we build something where you have different models? You have personal models that know the best movie you like, but you also have bigger models that maybe know more, and you get those models to interact with each other, right? And be able to have a wide ecosystem of AI agents that helps each person while still being able to do things like personalization. Some of them can run locally, some of them, of course, running on a cloud, and how do they interact with each other? So I think that is a very exciting time where the future is yet undecided, but I feel like there is something we can do to shape that future as well. [00:47:18]Swyx: One more thing, which is something I'm also pursuing, which is, and this kind of goes back into predictions, but also back in your history, do you have any idea, or are you looking out for anything post-transformers as far as architecture is concerned? [00:47:32]Tianqi: I think, you know, in a lot of these cases, you can find there are already promising models for long contexts, right? There are space-based models, where like, you know, a lot of some of our colleagues from Albert, who he worked on this HIPPO models, right? And then there is an open source version called RWKV. It's like a recurrent models that allows you to summarize things. Actually, we are bringing RWKV to MOC as well, so maybe you will be able to see one of the models. [00:48:00]Swyx: We actually recorded an episode with one of the RWKV core members. It's unclear because there's no academic backing. It's just open source people. Oh, I see. So you like the merging of recurrent networks and transformers? [00:48:13]Tianqi: I do love to see this model space continue growing, right? And I feel like in a lot of cases, it's just that attention mechanism is getting changed in some sense. So I feel like definitely there are still a lot of things to be explored here. And that is also one reason why we want to keep pushing machine learning compilation, because one of the things we are trying to push in was productivity. So that for machine learning engineering, so that as soon as some of the models came out, we will be able to, you know, empower them onto those environments that's out there. [00:48:43]Swyx: Yeah, it's a really good mission. Okay. Very excited to see that RWKV and state space model stuff. I'm hearing increasing chatter about that stuff. Okay. Lightning round, as always fun. I'll take the first one. Acceleration. What has already happened in AI that you thought would take much longer? [00:48:59]Tianqi: Emergence of more like a conversation chatbot ability is something that kind of surprised me before it came out. This is like one piece that I feel originally I thought would take much longer, but yeah, [00:49:11]Swyx: it happens. And it's funny because like the original, like Eliza chatbot was something that goes all the way back in time. Right. And then we just suddenly came back again. Yeah. [00:49:21]Tianqi: It's always too interesting to think about, but with a kind of a different technology [00:49:25]Swyx: in some sense. [00:49:25]Alessio: What about the most interesting unsolved question in AI? [00:49:31]Swyx: That's a hard one, right? [00:49:32]Tianqi: So I can tell you like what kind of I'm excited about. So, so I think that I have always been excited about this idea of continuous learning and lifelong learning in some sense. So how AI continues to evolve with the knowledges that have been there. It seems that we're getting much closer with all those recent technologies. So being able to develop systems, support, and be able to think about how AI continues to evolve is something that I'm really excited about. [00:50:01]Swyx: So specifically, just to double click on this, are you talking about continuous training? That's like a training. [00:50:06]Tianqi: I feel like, you know, training adaptation and it's all similar things, right? You want to think about entire life cycle, right? The life cycle of collecting data, training, fine tuning, and maybe have your local context that getting continuously curated and feed onto models. So I think all these things are interesting and relevant in here. [00:50:29]Swyx: Yeah. I think this is something that people are really asking, you know, right now we have moved a lot into the sort of pre-training phase and off the shelf, you know, the model downloads and stuff like that, which seems very counterintuitive compared to the continuous training paradigm that people want. So I guess the last question would be for takeaways. What's basically one message that you want every listener, every person to remember today? [00:50:54]Tianqi: I think it's getting more obvious now, but I think one of the things that I always want to mention in my talks is that, you know, when you're thinking about AI applications, originally people think about algorithms a lot more, right? Our algorithm models, they are still very important. But usually when you build AI applications, it takes, you know, both algorithm side, the system optimizations, and the data curations, right? So it takes a connection of so many facades to be able to bring together an AI system and be able to look at it from that holistic perspective is really useful when we start to build modern applications. I think it's going to continue going to be more important in the future. [00:51:35]Swyx: Yeah. Thank you for showing the way on this. And honestly, just making things possible that I thought would take a lot longer. So thanks for everything you've done. [00:51:46]Tianqi: Thank you for having me. [00:51:47]Swyx: Yeah. [00:51:47]Alessio: Thanks for coming on TQ. [00:51:49]Swyx: Have a good one. [00:51:49] Get full access to Latent Space at www.latent.space/subscribe
52:1010/08/2023
[AI Breakdown] Summer AI Technical Roundup: a Latent Space x AI Breakdown crossover pod!
Our 3rd podcast feed swap with other AI pod friends! Check out Cognitive Revolution and Practical AI as well.NLW is the best daily AI YouTube/podcaster with the AI Breakdown. His summaries and content curation are spot on and always finds the interesting angle that will keep you thinking. Subscribe to the AI Breakdown wherever fine podcasts are sold! https://pod.link/1680633614You can also watch on YouTube:Timestampscourtesy of summarize.techThe hosts discuss the launch of Code Interpreter as a separate model from OpenAI and speculate that it represents the release of GPT 4.5. People have found Code Interpreter to be better than expected, even for tasks unrelated to coding. They discuss the significance of this release, as well as the challenges of evaluating AI models, the cultural mismatch between researchers and users, and the increasing value of data in the AI industry. They also touch on the impact of open-source tools, the potential of AI companions, the advantages of Anthropics compared to other platforms, advancements in image recognition and multimodality, and predictions for the future of AI.* 00:00:00 In this section, the hosts discuss the launch of Code Interpreter from OpenAI and its significance in the development of the AI field. They explain that Code Interpreter, initially introduced as a plugin, is now considered a separate model with its own dropdown menu. They note that people have found Code Interpreter to be better than expected, even for tasks that are not related to coding. This leads them to speculate that Code Interpreter actually represents the release of GPT 4.5, as there has been no official announcement or blog post about it. They also mention that the AI safety concerns and regulatory environment may be impacting how OpenAI names and labels their models. Overall, they believe that Code Interpreter's release signifies a significant shift in the AI field and hints at the possibility of future advanced models like GPT 5.* 00:05:00 In this section, the speaker discusses the improvements in GPT 4.5 and how it enhances the experience for non-coding queries and inputs. They explain that the code interpreter feature allows for a wider range of use cases that were not possible with previous models like GPT 3.5. Additionally, they highlight the value of the code interpreter in assisting individuals with no coding experience to solve basic coding problems. This feature is likened to having a junior developer or intern analyst that aids in conducting tests and simplifies coding tasks. The speaker emphasizes that GPT 4.5 enables users to be more productive and efficient, especially when dealing with code-related challenges. They also discuss the future direction of AGI, where more time will be dedicated to inference rather than training, as this approach has shown significant improvements in terms of problem-solving.* 00:10:00 In this section, the speaker discusses how advanced AI models like GPT-4.5 are not just larger versions of previous models but rather employ fundamentally different techniques. They compare the evolution of AI models to the evolutionary timeline of humans, where the invention of tools opened up a whole new set of possibilities. They touch on the difficulty of evaluating AI models, particularly in more subjective tasks, and highlight how perceptions of model performance can be influenced by factors like formatting preferences. Additionally, the speaker mentions the challenges of reinforcement learning and the uncertainty around what the model is prioritizing in its suggestions. They conclude that OpenAI, as a research lab, is grappling with the complexities of updating models and ensuring reliability for users.* 00:15:00 In this section, the speaker discusses the cultural mismatch between OpenAI researchers and users of OpenAI's products, highlighting the conflicting statements made about model updates. They suggest that OpenAI needs to establish a policy that everyone can accept. The speaker also emphasizes the challenges of communication and the difficulty of serving different stakeholders. They mention the impact of small disruptions on workflows and the lack of immediate feedback within OpenAI's system. Additionally, the speaker briefly discusses the significance of OpenAI's custom instructions feature, stating that it allows for more personalization but is not fundamentally different from what other chat companies already offer. The discussion then transitions to Facebook's release of LAMA2, which holds significance both technically and for users, although further details on its significance are not provided in this excerpt.* 00:20:00 In this section, the introduction of GPT-4.5, also known as LAVA 2, is discussed. LAVA 2 is the first fully commercially usable GPT 3.5 equivalent model, which is a significant development because it allows users to run it on their own infrastructure and fine-tune it according to their needs. Although it is not fully open source, it presents new opportunities for various industries such as government, healthcare, and finance. The discussion also touches upon the open source aspect of LAVA 2, with the recognition that it has still contributed significantly to the community, as evidenced by the three million dollars' worth of compute and the estimated 15 to 20 million dollars' worth of additional fine-tuning capabilities it brings. The conversation acknowledges the value of open source models and data, while also recognizing the challenges and complexities in striking a balance between openness and restrictions.-* 00:25:00 In this section, the discussion centers around the commoditization of compute and the increasing value of data in the AI industry. While GPU compute is currently in high demand, it is observed that data is what holds the real value in AI. The conversation touches on the history of Open Source models and how the release of data for models like GPT J and GPT Neo signal a shift towards prioritizing data over model weights. The transcript also mentions the caution around data usage, citing examples of copyright concerns with datasets like Bookcorpus. The debate arises on whether ML engineers should proactively use open data or wait for permission, with some arguing for proactive usage to avoid holding back progress. The conversation also discusses the importance of terminology and protecting the definition of open source, while recognizing that the functional implications of open data are what matter most.* 00:30:00 In this section, the conversation revolves around the impact of open-source tools on companies and how it has influenced their approach to AI development. It is noted that companies can no longer just offer a nice user interface (UI) wrapper around an open AI model, as customers are demanding more. The competition has shifted towards other aspects of productionizing AI applications, which is seen as a positive development. The speaker predicts that OpenAI's competitive pressure will lead to opening up their source code and expects interesting advancements to emerge, such as running models locally for unlimited use. Additionally, the conversation touches on the potential of commercially available models, the application of new techniques, and the creativity unlocked by open source. The speaker also mentions the AI girlfriend economy, an area that is often overlooked but has millions of users and significant financial success.* 00:35:00 In this section, the speaker discusses their prediction about the long-term impact of AI on interpersonal relationships, suggesting that AI companions, such as AI girlfriends or boyfriends, could help address the loneliness crisis and reduce incidents of violence. They also mention the idea of using AI models to improve social interactions and communication skills. However, they highlight that this idea of AI companions may face resistance from older generations who may struggle to accept their legitimacy. The speaker also mentions an example of using AI models to create a mental wellness product in the form of a private journal. Overall, the speaker believes that while AI companions may have potential, they may not completely replace human relationships and interactions.* 00:40:00 In this section, the speaker discusses their views on Anthropics and the advantages it offers compared to other platforms. They mention that while Anthropics used to position themselves as the safer alternative to OpenAI, it was not appealing to many engineers. However, with the introduction of the 100K contest window and the ability to upload multiple files, Anthropics has become state-of-the-art in certain dimensions, such as latency and reliability in code synthesis. The speaker also notes that some businesses are choosing to build with the Anthropics API over OpenAI due to these advantages. They believe that Anthropics is finally finding its foothold after being overshadowed by OpenAI for a long time. Additionally, the speaker discusses their experience at the Anthropics hackathon, where they saw developer excitement for the platform. They believe that Anthropics is on its way up and that it paves the way for a multi-model future. However, they also acknowledge that the odds are stacked against Anthropics and that it needs more marketing support and community buy-in. Lastly, the speaker mentions the importance of running chats side by side against different models like Tracicia and GPT-4.5, and highlights that in their experience, Anthropics wins about 30% of the time, making it a valuable addition to one's toolkit.* 00:45:00 In this section, the discussion revolves around the advancements in image recognition and multimodality in language models like GPT-4.5. While there was some excitement about these developments, it was noted that relying on model updates alone may not be sufficient, and there is a need to focus on product-level improvements, such as integrating language models into services like Google Maps. However, concerns were raised about the reliability of updates, as evidenced by a regression in Bard's code interpreter functionality. Additionally, other trends in the developer community, like the emergence of auto GPT projects and the ongoing quest for building useful agents, were highlighted. Finally, there was mention of the growing interest in evaluation-focused companies like LangChain and LaunchLang, which aim to monitor the success of prompts and agents.* 00:50:00 In this section, the speaker discusses the focus on model evaluation and observability, as well as the importance of combining deep industry expertise with AI technology to make improvements. They also touch on the need for creating an information hierarchy between documents and scoring them in specific verticals like Finance. The speaker mentions advancements in text-to-image capabilities and expresses interest in character AI and AI-native social media. They mention the possibility of AI personas from Meta and the development of agent clouds optimized for EI agents. They acknowledge that these advancements may raise concerns among AI safety proponents. Overall, there seems to be excitement and exploration around these emerging technologies.* 00:55:00 In this section, the speakers discuss their predictions and what they are closely watching in the coming months. Alice believes that there will be more public talk about open source models being used in production, as currently, many perceive them as just toys. She expects companies to start deploying these models and showcasing their usage. Sean predicts the rise of AI engineers as a profession, with people transitioning from informal groups to certified professionals working in AI teams within companies. He mentions that the first AI engineer within Meta has already been announced. Overall, they anticipate a relatively quiet August followed by a resurgence of activity in September, with events like Facebook Connect and continued hackathons driving innovation.Transcriptall right what is going on how's it going boys great to have you here hey good how are y'all good I I think I'm excited for this yeah no I'm super excited I think uh you know we were just talking a little bit before this that the AI audience right now is really interesting it's sort of on the one hand you have of course the folks who are actually in it who are building in it who are you know or or dabbling because they're in some other field but they're fascinated by it and you know are spending their nights in weekends building and then on the other hand you have the folks who are you know what we used to call non-technical perhaps but who are actively paying attention in a way that I think is very different to the technical evolutions of this field because they have a sense or an understanding that it's so fast moving that the place that they have to be paying attention to is you know what's changing from the standpoint of of developers and Builders so I what we want to do today is kind of reflect on the month of July which had a couple of I think really Keystone events in the context of what it means for the technical development of the AI field and and what you know where it leads how people's Frameworks are changing how people sort of sense that things have changed over the last month and I think that the place to start although we could choose a lot of different examples is with an idea that you guys have spent a lot of time sharing on Twitter and in other places that the launch of code interpreter from openai which is nominally a chat GPT plugin actually represents functionally something closer to the release of GPT 4.5 so maybe we can start by just having you guys sort of explain that idea uh and then we can kind of take it from there yeah I'll maybe start with this one um yeah so quote interpreter was first announced as a plug-in at least in the plugins announcement from March but from the start it was already presented as a separate model because at least when you look in the UI you know you don't go into the charity plugin see why and pick it from a menu plugins it is actually a separate model in in the drop down menu and it is so today and I think um yes it adds on an additional sandbox for running and testing code and iterating on that um and actually you can upload files to it and do operations and files and people are having a lot of fun uploading different batteries and hacking uh to see what the container is and try to break out into the Container um but what really convinced me that it might be a separate model was when people tried it on tasks that were not code and found it better so code interpreter is poorly named not just because you know it just sounds like a like a weird developer Tool uh but they basically it's kind of maybe hiding some progress that openai has made that it's completely not been public about there's no blog post about it what interpreter itself is launched in a support Forum post uh you know low-key it wouldn't even announced by any of the major uh public channels that opening has um and so the leading theory is that you know I've dubbed a gpp 4.5 I think like if they were ever to release an API for that they might retroactively rename it for coin firings in the same way that 3.5 was actually renamed when retracted between three rooms um and I think and since I published that post or tweeted that stuff uh the the leading release now for why they did not do it is because they would piss off all the AI safety people yeah no I mean it would it was sort of correspondent obviously like a thing that's happened less just this month but more over the last three months is a total Overton window shift in that AI safety conversation starting from I think about in April or May when um Jeffrey Hinton left Google there has been a big shift in that conversation obviously Regulators are way more active now than they were even a couple months ago and so I do think that there are probably constraints in how you know open AI at any other company in the space feel like they can label or name things and even just as we're recording this today we just saw a trademark for gpt5 which is sort of most likely I think just um you know dotting the eyes and crossing the t's as a company because they're eventually going to have a gpt5 um I I would be very shocked if it I would be very shocked at this point if there are any models that are clearly ahead of gpt4 that don't that that come out before there is some pretty clear guidance from the US government around what it looks like to release more advanced models than gpt4 so it's an interesting interesting moment I guess let's talk about what functionally it means for it to be you know that much better better enough that we would call it GPT 4.5 and maybe what might be useful is breaking that apart into how it is improving the experience for non-coding queries or you know or or or or or inputs and then separately you know how it is made uh to chat gbt as a as a as a coding support tool different as well I think there's a lot of things to think about so one models are usually benchmarked against certain tasks and you know that works for development but then there's the reality of the model that you know if you ask for example mathematical question the like gpd3 3.5 you don't really get good responses because of how um digits are tokenized in the model so it's hard for the models to actually reason about numbers but now that you put a code interpreter in it all of a sudden it's not a map in the tokenizer in the latent space question it's like can you write code that answers the math question so that kind of enables a lot more use cases that are just not possible with the Transformer architecture of the underlying model and then the other thing is that when it first came out people were like oh this is great for developers it's like I know what to do I just ask it but there's this whole other side of the water which is hey I have this like very basic thing you know how I'm a software engineer but background you know how sometimes people that have no coding experience come to you and it's like hey I know this is like really hard but could you help me do this and it's like it's really easy and sometimes it and sometimes they think it's easy and it's hard but uh code interpreter enables that whole um space of problems to be solved independently by people so it's kind of having you know Sean talked about this before about um some of these models being like a junior developer that you have on staff for you to be more productive this is similar for non-business people it's like having Junior you know whatever like a intern analyst that helps you do these tests that are not even like software engineering tasks it's more like code is just a language used to express them it's like a pretty basic stuff sometimes uh but you just cannot cannot do it without so uh for me the gbd4 4.5 thing is less about you know is this a new model that is like built after gbd4 it's more about capability so if you have gbt4 versus 4.5 you're probably gonna get more stuff done with 4.5 just because of like the code interpreter Peace So for me that's enough to use the code name but as you said Sam Allman said they're not training the next model so they said this is 4.5 you would have like it would go back to Washington DC and be in front of Congress and have to talk about it again sorry yeah um well one thing that I always want to impress upon people is we're not just talking about like yes it is writing code for you but actually you know if you step back away from the code and just think about what it's doing is it's having the ability to spend more Insurance time on harder problems and it matches what uh we do when we are faced with difficult problems as well because right now any llm and these before code interpreter any llm if you give it a question like what is one plus two it'll it'll take the same amount of time to respond as uh something like prove the Black Shoals theorem right like uh and that should not be the case actually we should take more time to think when we are considering harder problems um and I think what I think the next Frontier and why I called it 4.5 is not just because it has had extra training it's not just because it has the coding environment and also because there's a general philosophy and move that I see on my open EI um and the people that it hires that so in my blog post I called out gong who like I first slowly met so it's kind of awkward to talk about it like I guess a friend or a friend of a friend um but it's true that I have met multiple people not opening I have specifically been hired to work on more inference time uh optimizations as compared to trading time um and I think that is the future for gpd5s right so the reason you the reason I think about this working client is that this is the direction of AGI that we're going to spend more time on inference um and uh it just makes a whole lot of sense when you look at gnomes background working on the uh the broadest and then Cicero um all of which is just consistently the same result which is every second or millisecond extra spent on inference it's worth like 10 000 of that of of that in training especially when you can vary it based on the problem difficulty um and this is basically uh ties back to the origin of open AI which originally started playing games they used to play DotA they used to play uh you know all sorts of all sorts of games in sort of those reinforcement learning environments and the typical way that your program these AI is doing doing uh doing these games is when they have lots of branches and you take more time to Circle and um and figure out what the optimal strategy is and when there's not that many branches to to go down then you just take the shortcut in uh you have to give to give the right answer but varying the inference time is the integration here one of the things that it it seems and this what you just described I think aligns with this is I think there's a perception that uh more advanced models are just going to be bigger data sets with more of the same type of training versus sort of fundamentally different techniques or different areas of emphasis that go beyond just how big the data set is and so you know one of the things that strikes me listening to or kind of observing how code interpreter works is it almost feels like a break in The evolutionary timeline of gbt because it's like GPT with tools right unless you just kind of described it it's like it doesn't know about math it doesn't have to know about math if it can write code to figure out the math right so what it needs is the tool of being able to write code and that allows it to figure something out and that is akin to you know humans are evolving for Millennia not using tools then all of a sudden someone picks up a rock and this whole entire set of things that we couldn't do before just based on our own evolutionary pathway are now open to us because of the use of the tool I don't think it's a Perfect Analogy but it does feel somewhat closer to that than just again like it's a little bit better than 3.5 so we called it four it's a little bit better than four so we called it 4.5 kind of a mental framework yeah noise I made there I guess sort of the the another big topic that relates to this that was subject of a lot of conversation not just this month that has been for a couple months is this question of whether gpt4 has gotten worse or whether it's been nerfed and there was some research that came out around that with maybe um variable variable uh sort of feelings around it but what did you guys make of that whole conversation I think evals are one of the hardest things in the space so I've had this discussion with Founders before it's really easy we always bring up co-pilot as one example of like Cutting Edge eval where they not not only look at how much um of their suggestions you accept but also how much of the code is still in a minute after three minutes after five minutes after it's really easy to do for code but like for more open and degenerative tasks it's kind of hard to say what's good and what isn't you know like if I'm asking to write the show notes for our podcast which has never been able to do um how do you how do you email that it's really hard so even if you read through through the paper that uh Ling Zhao and mate and James wrote a lot of things are like yeah they're they're worse but like how do you really say that you know like sometimes it's not kind of you know cut and dry like sometimes it's like oh the formatting changed and like I don't like this formatting as much but if the formatting was always the same to begin with would you have ever complained you know there's there's a lot of that um and I think with llama too we've seen that sometimes like rlh traffic can like go wrong in terms of like being too tight you know for example somebody has Lama too is like how do you kill a process in like Linux and Mama 2 was like oh it's wrong to like kill and like I cannot help you like doing that you know um and I think there's been more more chat online about you know sometimes when you do reinforcement learning you don't know what reward and like what what part of like the the suggestion the model is anchoring on you know like sometimes it's like oh this is better sometimes the model might be learning that you like more verbose question answers even though they're they're right the same way so there's a lot of stuff there to figure out but yeah I think some examples in the paper like clearly worse some of them are like not as not as crazy um yeah but I mean it'll be nice under a lot of pressure on the unlike the safety and like all the the instruction side and we cannot like the best thing to do would be hey let's version lock the model and like keep doing emails against each other like doing an email today and an email like that was like a year ago there might be like 20 versions in between that you don't even know how the model has has changed so um yeah evals are are hard it's the tldr I I think I think basically this is what we're seeing is open AI having come to terms with that the origin of itself as a research lab where updating models this is is just a relatively routine operation versus a product or infrastructure company where it has to have some kind of reliability guarantee to its users um and so openai are they internally as researchers are used to one thing and then the people who come and depend on open EI as on as as a product are used to a different thing and I think there's there's a little bit of cultural mismatch here like even within open ai's public statements we have simultaneously Logan from from open AI saying that the models are frozen and then you know his his VPO product saying that we update models all the time that are not frozen so which is like you cannot simultaneously be true um so so I think they're shot yeah I think they're trying to figure it out I think people are rightly afraid uh of them basing themselves on top of a black box uh and that's why maybe you know we'll talk about llama too in a bit uh that's that's why maybe they want to own the Black Box such that uh it doesn't change out from underturn um and I think this is fine this is normal but uh openai it's not that hard for opening night to figure out a policy that is comfortable with that that everybody like accepts um it won't take them too long and this is not a technical challenge it's more of a organizational and business challenge yeah I mean I I think that the communications challenge that you're referencing is also extreme and I think that you're right to identify that they've gone from like quirky little you know lab with these big aspirations to like epicenter of a of a national conversation or a global conversation about existential challenges you know and the way that you talk in those two different circumstances is very different and you're sort of serving a lot of different Masters hopefully always Guided by your own set of priorities and that's going to be you know inherently difficult uh but with so many eyes on it and people who are you know the thing that makes it different is it's not just like Facebook where it's like oh we've got a new feature you know in the early days that made us all annoyed like you know people were so angry when they added the feed uh you know that we all got used to it this is something where people have redesigned workflows around it and so small disruptions that change those workflows can be hugely impactful yeah it's an interesting comparison with the Facebook feed because in the era of AD Tech the feedback was immediate like you changed an algorithm and if the click-through rates are the you know the whatever metric you're you're optimizing for in your social network if they started to start to decline your change will be reverted tomorrow you know uh whereas here it's like we just talked about it's hard to measure and you don't get that much feedback like I you know I I have there's sort of the thumbs up and down uh action that you can take an open AI that I've never shared most people don't don't give feedback at all so like opening a has very little feedback to to go with on like what is actually improving under not improving and I think this is just normal like uh it's it's kind of what we want in a non-adtrack universe right like we've just moved to the subscription economy that everyone is like piety for uh and this is the result that we're trading off uh uh some some amount of product feedback actually it's super interesting so the the one other thing before we leave um uh open AI ecosystem the one other big sort of feature announcement from this month was uh custom instructions how significant do you think that was as an update so minor uh so it is significant in the sense that you get to personalize track TBT much more than uh you previously would have like it actually will remember facts about you it will try to obey system prompts about you you had this in the playground since forever uh because you could enter in the system prompt uh in there and just chat to complete that habit and this is a rare instance of the chat tpd team lagging behind the general capabilities of the open AI platform uh and they just shipped something that could have been there a long time ago it was present in perplexity Ai and if you think about it um basically every other open source chat company or open uh we have a third-party chat company had already had it before tragedy um so what I'm talking about is character AI what I'm talking about is the various uh ai waifu ai girlfriend type companies Each of which have you know characters that you can sort of sub in as custom instructions um so I think chargpt is basically playing catch up here it's good for obviously the largest user base in the world of chat AI but it's not something fundamentally we haven't seen before that actually I think perfectly brings up a segue to the other major obvious thing that happened this month from both a technical perspective but also just I think long term from a user perspective which was Facebook releasing llama 2. so this was something that was uh you know anticipated for a while but I I guess where to even start with the significance of llama 2 I mean how do you sum it up if you're talking to someone who sort of isn't paying attention to the space you know what what does the introduction of of lava 2 mean relative to other things that had been available previous to it um it is the first fully commercially usable not fully open source we'll talk about that first fully commercially usable gbt 3.5 equivalent model and that's a big deal because one you can run it on your own infrastructure you can write it on your own cloud so all the governments and Healthcare and financial use cases are opened up to that and then you can fine tune it because you have full control over all the weights and all the internals as much as you want um so it's a big deal from from that point of view um not as big in terms of the you know pushing you know for the state of the art um but it's still still extremely big deal yep I think the the open source part so I've wrote so the data it came out over this post um about you know why llamasu is not open source and why it doesn't matter and uh I was telling Sean I'm writing this thing and it was like whatever man like this license stuff is like so so tired I was like yeah I'll just post it on on anchor news in the morning and I think it was on the front page for like the whole day they got like 228 comments and I was regarding the flash attention podcast episode in the morning so I got out of the studio and it was like 230 comments of people being very like you know upset one way or the other about license and my point and you know I was I started an open source company myself in the past and I contributed to a bunch of projects is that yeah llama 2 is not open source but like the open source Institute definition but we just don't have a better definition for like models you know like because it's mostly open source you can use it for a lot of stuff so what's like the and it's not Source available because for a lot of stuff you can use it commercially so how do we find better labels and my point was like look let's figure out what the Better Label is but even though it's not fully open source it's still like three million dollars of like flops donated to the community basically you know who else who else in the open source Community is stepping up and putting 3 million of h100 to make us train this model so I I think like overall netmed is like a very positive thing for the community and then you've seen how much stuff was built on top of it there's like the quantized versions with ggml there's like the context window expansion um there's so much being done by the community that um I I think it was it was great for for everyone uh and by the way three million is the lower uh that's just compute um there's a reasonable estimate from scaliai that the extra fine tune that you could on top of it uh was worth about 15 to 20 million dollars um so that's a lot of money just kind of donated to the community um although they didn't release the data they didn't tell us any of the data sets uh they just say trust us we didn't train on any of your Facebook information which is uh it's the first instance where the models are more open than the data and I think that's a reflection of where the relative shift in value might uh happen um as a result of lava too and so I I don't know you can take that in multiple different directions but I just want to point that out yeah I was gonna say so we first had the the examples I made so we first had the open models open source models which is like rent pajama so the data so have been the training code is open the model weights are open then stability kind of did the same thing with stable LM which is like hey the widths are open but we're not giving you the data you know so you can you can download the model but you cannot retrain it yourself and that llama too it's like we don't give you the data we'll give you the models but you can only use it for for some stuff so there's more and more restriction but like Sean is saying and we talked about this before everybody wants to train their model nobody wants to open source the best data set for X you know which maybe is what more open source people should focus on it's like how to build better specific data sets instead of yet spending giving Jensen Wang another five million dollars of gpus but the model gets more headlines for now you know so that's that's what everybody Adidas yeah and I want to point out it's a reversal of the open source culture they used to get a sequence of openness and you could kind of pick and choose from uh whether it's open code all the way down to open data versus all the way down to uh open weights and you know there's some some barrier to combination I I wrote I wrote this book a long time ago because I don't remember that the five levels um uh but yeah like it's it's very strange and I think it's just it's just a relative uh um discussion of where the money is going um and I think it makes usually shows that compute is becoming commoditized um which yes there's a GPU approach right now uh a100 has sold out everywhere across the board people are commenting all about it uh this month um you know and there's people hoarding compute like nobody's business but as far as the value an AI is concerned it looks like computers is relatively um you know uh commoditized it's actually data that's that that people are kind of safeguarding generously um going all the way back to the history of Open Source models that you lose their AI when they when they train GPT J and GPT Neo as the first reproductions of gpt3 um they they release the data first uh stable diffusion when they train stable diffusion they release live on 500b first uh and that's I think reflectors or like the the normal sequence of events you release the data that anybody's uh the model weights but now now we're just skipping the data part and I think it's just it's fair it's a way to think about yourself you know I think um one of our conversations I think I think it was my Conover when he was talking about comparing our current AI era versus uh the 2000s era in search engines you know all he basically said like all of the public publishable information retrieval research dried up because all those phds went to work at Google and Google just sat on it uh and that it this is now you know a fight for IP um and and I think that is just a very rational way of behavior and I guess like a capitalist AI economy do you think so one of the things that we were talking about before starting with the the code interpreter 4.5 and why or gbt 4.5 and why they might not call it that is the emergence of this sort of regulatory if not pressure certainly Intrigue uh you know do you think that there's potentially an aspect of that when it comes to why people are so jealously safeguarding you know the the data is there more risk for for being open about where the data is actually coming from the the books three examples probably good so MPT trained their model on a data set called bookstree which is 190 000 books something like that um and then people on Twitter were like well this stuff is not you know in the free you know it's under copyright still you just published yeah yeah it's not in the public domain you can just take it and and train on it but the license for some of these books is like kind of blurry you know on like what's fair use and what is it um and so there was like this old thing on Twitter about it and then MPD you know Mosaic first changed the license and they changed it back and um I think Sean uh Sean presser from Luther was just tweeting about this yesterday and he was basically saying look as ml Engineers maybe it's better to not try and be the you know the main ethics night and just say hey look the data's open and let's try it and then maybe people later will say hey please don't use the data and then we can figure it out but like proactively not using all of this stuff can kind of keep the progress back and and you know he's more coming from the side of like a Luther which is like doing this work in public so for them it's like hey you know if you don't want us to train now this is fine but we shouldn't by default not do it um versus if you're meta you know they said the deterring llama on like stuff available on the internet they didn't say the train llama on stuff that is licensed to train on uh it's a it's a small it's a small difference the other piece of this that that I I wanted to sort of circle back to because we kind of breezed over it but I think it's really significant you know we did get a little lost in this conversation around open source definitions and I don't think that's unimportant I think that people are rightly protective when a set of terminology has a particular meaning and a massive Global Corporation sort of tries to like nudge it towards something that is potentially serving their ends versus uh you know actually being by that definition but I also think that your point which is that functionally relative to the rest of the space it probably doesn't super matter because what people mean is almost more about functionally what they can do with it and what it means for the space relative to more closed models and I I think one of the big observations has been that the availability of uh you know from from when llama one was you know fully fully leaked the availability of of all of that has pretty dramatically changed won the evolution of the space over the past few months and two I think from a business standpoint how the big companies and incumbents have thought about this so another big conversation this month going back to sort of the The Venture Capital side of of your life has been the extent to which uh companies or startups are or big companies are not wanting to sort of side on with some startup that's going to offer them you know AI whatever because their technical teams can just go spin up you know sort of their their own version of it because of the the sort of you know availability of these open source tools but you know I guess I'm interested I guess in bringing the the sort of Open Source you know in air quotes side of the conversation into the to the realm of how it has impacted how companies are thinking about you know uh their their development in the in the context of the AI space I think it's just Rising like put it raising the bar on like what you're supposed to offer so I think six nine months ago it was enough to offer a nice UI wrapper around an open AI model today it isn't anymore so that's really the main the main difference it's like what are you doing outside of wrapping the model and people need more and more before they buy versus building yeah I think um it actually moves the area of competition uh towards other parts of productionizing AI applications you know I I think that's probably just a positive um I I feel like um the uh actually the competitive pressure that La The Meta is putting on Open the Eyes is a good thing uh one of the fun predictions that I made was in the next six months ubt opening hour open source tpc3 um which which is not open source and uh I like it's so far behind the state of the art now that it doesn't matter as far as safety is concerned and it basically peeps open AI in the open source AI game uh which which would be nice to have of the things that people have been building um you called out a couple uh context window expansion but have there been any that really stand out to you as super interesting or unexpected or or you know particularly high potential um one of our short short term podcast guests uh the mlc team they were thumb wrapping llama two to run on MacBook gpus so I think that's like the the most interesting Gap right it's like how do we go from paper token to like unlimited local use that's one of the main main things that keep even people like me from like automating a lot of stuff right it's like I don't want to constantly pay open AI to do menial stuff but if I go run this locally and do it even if five times lower I would do it so that's uh that's a super exciting space yeah I would say beyond that there hasn't been that much I mean it's it's only a few weeks old so uh it hasn't been damaged uh emergence coming from it I would I would definitely say um you want to keep the lookout for uh the uh basically what happens in post lab number one which you know keep in mind it was only in February um the same thing that happened with Acuna alpaca and all the other sort of instructions to you and sort of research type models um but just more of them because now they are also commercially available um we haven't seen them come out yet but it's it's almost like guarantee that they will um you can also apply all the new techniques uh that have been have emerged since then like Json former because now you have access to all the model leads um to to to llama and I think uh that will also uh create another subset of models that uh basically was only theoretically applicable to sort of research holiday models uh before and so now these will be authored commercially as well um so like yeah nothing nothing like really eye-popping I would say um but but it's been five minutes is that it's yeah it's it's been it's been a very short amount of time uh and the thing of Open Source is that the creativity unlocked um is is very hard to predict and actually I think happens a lot in the uh let's just say the the mess official part of the economy where where I've been focusing a lot on recently on um the sort of AI girlfriend economy which is huge uh I I feel like it's not polite conversation that the amount of um AI girlfriend area has but it's real they're millions of users they're making a lot of money uh and it's just virtually not talked about in in like polite SF circles it feels like one of those areas that's going to be uh an absolute lightning rod when it comes to the societal debates around this technology like you can feel it that that sort of oh you know the people are going to hone in on that as example a of you know a change that they don't like that's my guess at least I don't know like so I have a really crazy longer term prediction like maybe on the order of like 30 to 50 years but um you know yeah a girlfriend for Nobel Peace Prize because it what if it solves the loneliness crisis right what if it cuts the rate of Terror and uh you know school shootings by like or something like that's huge my wife and I have joked about how every generation there's always something like they always think that they're like so far ahead and they think that there's nothing that their kids could throw at them that they just like fundamentally won't get and without fail every generation has something that seems just totally normal to them that their parents generation writ large just like has such a hard time with and we're like it's probably gonna be like AI girlfriends and boyfriends we're gonna be like yeah but they're not real they're like yeah but it's real to me you know they're having debates with our future 13 year old or kids are only four and two now so it feels like maybe the right timeline yeah I I've heard actually of all people Matthew McConaughey on the Lexus and what what yeah you was he was great shout out shout out shout out Matt um but they were talking about they were kind of talking about this and they were noodle in the this idea of like computers helping us being better so kind of like we have computers learn how to play chess and then we all got better at chess by using the computers to like learn and like experiment uh they were talking about similarly in interpersonal relationship maybe it does you know it doesn't have to be you shut off from from humans but it's like using some of these models and some of these things to actually like learn you know how to better interact with people and if you're like shy and an introvert it's like okay I can like try these jokes on like these conversation points with a model and like you know it teaches me hey that's not okay to say or like you know you should maybe be more open or or I don't know but I think that's a more wholesome view of it than like everybody just kind of runs away from society and that's like 10 AI friends and doesn't talk to humans anymore what's it's much less sexy to just say like AI friends right that even though like there's the if you look at the possibility set you know the idea that people might have this sort of uh to your point like conversational partner that helps them effectively work through their own things in this safe space that doesn't necessarily relate to romantic attachment just because the movie Her came out right right it can just be a panel of experts uh and I I've uh I had I do have plans to build uh you know a small CEO which is uh it's my own boss um and just for me to check it um and actually we'll flag out just lifting various services so you come a lot you come across a lot of AI Engineers who are interested in building mental wellness products and a lot of these will take the form of some kind of Journal um and this will be your most private uh thoughts that you don't really want to send anywhere else um and so actually all these will make advantage of Open Source models because they don't want to set it to open AI um and that makes a ton of sense which is something like I just came across uh from one of my friends uh here in the coordinating space that I have uh where it's it's one of those situations where you can actually try out like having a conversation and having a group of yeah friends chime in and see what that feels like to you uh it's it's the first example I found my past where someone's actually done this super interesting so uh llama and uh code interpreter I think stood out pretty clearly as as really big things to touch um I wanted to check in just as we sort of start to maybe around the corner towards wrapping up Claude 2 uh and anthropic how significant was this in what ways was a significant you know was it something that was sort of meaningful from expanding the capacity set for developers or was it sort of more just a good example of what you can do if you increase the context window but you know that's something that might ultimately become table Stakes later on yeah I could I could maybe speak through this a little bit um so it is significant but not earth shattering or clearly I think it is the first time that Claude as a whole has just been a generally publicly available you used to be on a weakness um yes it has a longer context window but to me more significantly it is anthropic finding its its footholds uh in the very competitive CI landscape you know um anthopics message used to be that we're yes we're number two to open the eye but we're safer you know and that's that's not a super appealing uh thing to to many uh Engineers it is it is very appealing to some uh uh corporations by the way um but uh you know I think I think having the 100K contest window makes them state-of-the-art in one dimension which is very useful uh the ability to upload multiple files I think is super useful as well um and I and actually I have met a number of businesses I'm closer as a source graph who are actually choosing to build with claw 2 API over and above open AI just because they are better at latency better reliability in in better in some form of code synthesis um so I think it's anthropic finding it's foothold finally after a long while uh of being in open the eyeshadow yeah and we use cloud for the uh the transcript and timestamps and the buckets so shout out the 100K context window you know we couldn't do that when we first started the podcast we were like okay how do we trunk this stuff or like gpd4 and and all of that and then Bob was like just put the whole thing in here man and works great so uh that's a good start but I feel like they're always yeah a second second fiddle you know it's like every time there really something people are like cool okay some people like it must be more like okay fine I I feel bad for them because it's like it's really good stuff you know but they just need they just need some uh some help on the marketing side and the community buy-in so I just spent this past weekend at uh the club hackathon which is as far as I know anthropics first hackathon I I treated a pretty well received video where I was I was just eating the hackathon venue at 2 am in the morning and there was just a ton of people hacking there there were like 300 people uh participating uh for Claude And I think it's just the first real developer excitement I've ever seen for enthalpy kid Claude um so I think they're on their way up I think this paves the way for a multi-model future um that is something that a lot of people are betting on um it's just the the odds are stacked against entropic but they're making some Headway um I I do think that you should always be running all your chat side by side against uh tragicia and Claude and maybe mama two um so I I immediately I have a little uh many of our app that does that that uh save all the all the chats across and uh and yeah I can say I can legitimately say that Claude wins about 30 of the time uh as far as any time I give it a task to do I ask it a question um which is not you know doesn't make it number one but it actually is very additive to your overall toolkit of yeah I think you shouldn't use yeah it's certainly the first time that you're if you go on Twitter on any given day you will see people saying things like if you haven't used uh Claude you know for writing you have to try it now or so you know like people who are really who have made a switch who are have no affiliation who are very convinced that it is now part of the the suite of tools that people should really be paying attention to which I think is great where we shouldn't be at a stage yet where we're you know total totally in on one just one tool set I'll also mention I think this month or at least July was when the first inspection of where whether like is too much context not actually a good thing um so there's a there's a pretty famously product I forget the actual title a bit uh that shows a very pronounced new curve in the retrieval abilities of large context models um and so basically if you if if you if the item that is being retrieved is at the start or the end of the context window then it has the best chance of being received but if it's in the middle it has a high chance of being lost um and so is 100k context a good thing are you systematically testing its ability to um to retrieve the correct factual information or are you just looking at a summary and growing yeah it looks good to me you know um I think we will be testing like whether or not it's worth extending it to 100K or a million tokens or infinite tokens uh or do you want to blend uh a short window like 8 000 tokens or 4 000 tokens uh in couple that together with a proper semantic search system uh like the retrieval augmented generation and Vector database companies are doing so I think that that discussion has come up in open source a lot um and basically it I think it matches human memory right like you want to have a short working memory hahaha you know the I was thinking about it the one other obviously big sort of company update that we haven't spoken about yet was around the middle of the month Google bard had a a big set of updates a lot of it was sort of business focused right so it was available in more languages uh it was you know whatever the the sort of from a feature perspective the biggest thing that they were sort of hanging their hat on was around image recognition and sort of this push towards uh towards multimodality but you know did did you have any guys did you guys have any thoughts about that or was that sort of like you know not sort of on the the high priority list as a as an announcement or development this month I I think going back to the point before we're getting to the maturity level of the industry we're like doing like model updates and all this stuff like it's fine but like people need more you know people need more and like that's why I call it interpreter it's like so good right it's not just like oh we made the model A little better like we added this thing it's like this is like a whole new thing if you're playing the model game if not you got to go to the product level and I think Google should start thinking about how to make that work because when I search on Google Maps for certain stuff it's like completely does not work so maybe they should use models to like make that better and then say we're using Bard in Google Maps search uh but yeah I don't know I've kind of I'm kind of tuning off a lot of the single just model announcements so uh so Bart's updates I think the the multi-modality they actually beat gpt4 to releasing a generally available multimodal wall right you can upload an image and have Bard describe it and that's pretty interesting pretty cool um I think uh one of our earliest guests Robo flow uh Brad their CTO was actually doing some comparisons because they have access to a lot of division models and and Bart came up a little bit short but it was pretty good it was it was like close to the state of the art um I would say the problem with Bard is that you can't rely on them having reliable updates because they had a June update I don't actually remember of implicit code execution where they started to ship uh the code interpreter type functionality but in a more limited format if you run the same code the same questions that but advertising the June blog post it's sundarkai advertise in in a video that and tweet it out they no longer worked in the heart so they had a regression that's that was very embarrassing um obviously unintended but uh it's and it shows that it's hard to keep model progress up to date but I think Google has this checkered history riff its products being reliable you know they also killed off Google Adobe rip um and uh and I think that's something that they have to combat which is like yes they're they're trying to ship model progress I've met the bar people they're you know good artist people um but they have struggled to to ship uh products even more than open AI which is frankly embarrassing for a couple of the size of Google outside of the the biggies are there any other sort of key trends or or you know maybe not even key trends but sort of bubbling interest that you guys are noticing in the developer community that aren't necessarily super widely uh seen outside you know one of the things that I keep an eye on is all the auto GPT like things you know in this month we had gbt engineer and we had multi-on who held a hackathon and you know there's a few few things like that but you know not necessarily in the agent space but are there any other themes that you guys are are keeping an eye on let's say uh I I'm sure Alessio can chime in but on on I do keep a relative uh close eye on that agent stuff uh it has not uh died down in terms of the the heat uh even the other GPT team who by the way I work uh on the first floor the building that I work on uh they're hard at work uh shipping the next version and so I think a lot of people are engaging in the dream of agents and um I think like scoping them down to something usable is still a task that uh has not as it has so far eluded every single team so far and uh and it is what it is I think I think uh all these very ambitious goals we are at the very start of of this journey uh the same Journey that maybe self-driving cars took uh in 2012 when when they started doing the darker challenge um and I think the other thing I'll point out interest in terms of uh just overall interest uh I am definitely seeing a lot of uh eval type companies being formed and winning hackathons too um so what what at Utah companies they're they're basically uh companies in that you uh monitor the uh the success of your prompts or your agents and version them and um and and just share them potentially um I I I feel like I can't be more descriptive just because it's hard to um to really describe what they do it's just because they are not very clear about what they do yet um Lang chain launch Lang Smith um and I think that is the first commercial product that nine chain probably you know the the top one or two developer oriented AI projects out there um and that's more observability but also local uh tensorous ebal as well because they Aqua hired in an AI eval projects as well so I was I'll just call out just the general domain of how to eval models um is a very big focus of the developers here again yep yeah we've done um two seats and companies doing agents but they're both verticalized agents so I think the open source motion has been Auto gbt do anything um and now we're seeing a lot of Founders is like hey you know if you take that and then you combine it with like deep industry expertise you can get so many improvements to it and then the other piece of it is how do you do information retrieval so you know in general knowledge like documents everything is kind of flat but when you're in specific vertical say Finance for example um you know if you're looking at the earnings from this quarter like 10 quarters ago like the latest ones are like much more important so how do you start to create this like information hierarchy between documents and then how do you use that instead of doing simple like retrieval from like an embedding store it's like how do you also start to score these things that's another area of of research from from founders oh I'll call out two more things um one more thing that happened this week this month was sdxl uh you know text to image doesn't seem as sexy anymore even though like last year with all the raids um I but I do think like it's it's coming along um I I definitely wish that Google was putting up more of a fight because they actually at the start of the Year released some very interesting Capers that they never followed up on uh that show some really interesting Transformers based uh text image models that I thought was super interesting and then this the other uh element which uh you know I'm just like very fascinated by a lot of the I don't know like the uh uh I I I hesitate to say this but it's actually like the the character and like the um um let's just call they call it character replica and and all the sort of work versions of that um I I do think that a lot of people are hacking on this kind of stuff um the retention metrics on character AI blows away um you know a lot of the uh the metrics that you might see in on traditional social media sites and basically AI native social media is something that is something that that is there's something there that I think people haven't really explored yet and and people are exploring it you know like uh is this company and like you know he's always a few years ahead of it so uh not to keep returning to this theme but I I just think like it's it's definitely coming for a lot of like a lot of the ways that we we deal with things like right now we think co-pilot and we right now we think um uh we've been chat gbt but like uh what what we what we really want to speak to is is uh a way of serializing personality and intelligence um and and potentially that is a that is a leading form of Mind upload um so that Becca is into science fiction but I do see a lot of people working on that yeah I mean we just got a Financial Times report that says that AI personas uh from meta from Facebook could be coming next month they were talking about uh yeah they were talking about airport was there's one one that's Abraham Lincoln one that's like a surfer dude who gives you travel advice so it's it's it's you know the sourcing is three people with knowledge of the project or whatever um and it you know no obviously no confirmation from meta but it's no secret that Zuckerberg has been interested in this stuff and uh you know the the ftp's is actually it's a good overview of why a company like Meadow would care about it in very dollars and cents terms yeah something like and I want to State like the first version of this is very very me like when I first looked at character AI it was like okay I want to talk to Genghis Khan if I'm doing a history class but it's like not it's like what if what a 10 year old would enjoy you know um but I think the the various iterations of this professionally would be very interesting so on the developer side of this I have been calling for the development of agent clouds which are clouds that are specifically uh optimized not for uh human use but for uh EI agent teams and that is a form of character right it's a character is it with the different environments uh with the different dependencies pre-installed uh that can be programmatically controlled can get programmatic feedback to agents um and uh and there's a protocol for me um that some of the leading figures like Auto gbt and e2b are creating that um lets agents run clouds um this would this would definitely terrify the AI safety people because we have gone from like running them on a single machine towards running you know clusters originally um but it's happening all right so so let's talk about what comes next do you guys have any predictions for August or if not predictions just things that you're watching most closely go ahead Alice uh let me let me think and I think Sean is usually good at like the super long term prediction some more uh pragmatic I don't know you know yeah he's more like he he like minimum like 12 to 24 months um I I think like for me probably starting to see more public talk about open source models in production with people using that as a differentiator I think right now a lot of it is kind of like oh these models are there but nobody's really saying oh I moved away from opening I'm using this but in our we run a early adopters Community with about 1500 kind of like a Fortune 500 large companies leaders and some of them were like oh we deployed dolly in production and we're using it we're not writing a blog post about it um so I think right now the perception is still everybody's using open Ai and the open source models are like really toys but I think we're gonna get into September and you know you're not going to see a lot of announcements in August proper but I think a lot of people are gonna spend August getting these models ready and then going into end of the year and say hey we're here too you know we're using the open models like we don't need open AI um I think right now there's still not not a lot of a lot of public talk about that so excited to to see more uh yeah I'm a little bit uh as for myself uh this is very self-interested obviously but we had to edit an agenda you know I wrote about the the rise of the AI engineer I mean I think it's definitely happening as we speak um I I have seen multiple tags like people tag me multiple times a day on like uh how they're reorienting their careers I think people professionalizing around this and going from essentially like informal groups and slack channels and meetups and stuff towards uh certifications and courses and job titles and actual AI teams in every single company I think is happening um I I just got notification like two days ago that the uh you know in meta apparently you can sort of name your name a job site title whatever you want internally uh and so they emerged as the first AI engineer within meta uh has has been announced and uh so I think I think as far as you know the near-term I do see this career this profession come into place um that I've been forecasting for uh for a little bit and I'm excited to help it along awesome well guys great conversation tons of interesting stuff happening obviously um I do think it you know ironically I think it's a relatively more quiet time in some ways than than it even was and you know my my prediction for August is that we're going to see the extension of that we're going to see sort of the the biggest breath that we've had at least from a from a feeling perspective maybe since Chachi PT but then we're gonna rage back in in September you got Facebook connects in September you've got sort of just the return to business that everyone does after August um but of course I think you know the hackathons aren't going to stop in the Bay Area so people are going to keep building and it's entirely possible that something you know hits in the next four weeks that that totally changes that be exciting to see looking forward Get full access to Latent Space at www.latent.space/subscribe
59:0204/08/2023
FlashAttention 2: making Transformers 800% faster w/o approximation - with Tri Dao of Together AI
FlashAttention was first published by Tri Dao in May 2022 and it had a deep impact in the large language models space. Most open models you’ve heard of (RedPajama, MPT, LLaMA, Falcon, etc) all leverage it for faster inference. Tri came on the podcast to chat about FlashAttention, the newly released FlashAttention-2, the research process at Hazy Lab, and more. This is the first episode of our “Papers Explained” series, which will cover some of the foundational research in this space. Our Discord also hosts a weekly Paper Club, which you can signup for here. How does FlashAttention work?The paper is titled “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness”. There are a couple keywords to call out:* “Memory Efficient”: standard attention memory usage is quadratic with sequence length (i.e. O(N^2)). FlashAttention is sub-quadratic at O(N). * “Exact”: the opposite of “exact” in this case is “sparse”, as in “sparse networks” (see our episode with Jonathan Frankle for more). This means that you’re not giving up any precision.* The “IO” in “IO-Awareness” stands for “Input/Output” and hints at a write/read related bottleneck. Before we dive in, look at this simple GPU architecture diagram:The GPU has access to three memory stores at runtime:* SRAM: this is on-chip memory co-located with the actual execution core. It’s limited in size (~20MB on an A100 card) but extremely fast (19TB/s total bandwidth)* HBM: this is off-chip but on-card memory, meaning it’s in the GPU but not co-located with the core itself. An A100 has 40GB of HBM, but only a 1.5TB/s bandwidth. * DRAM: this is your traditional CPU RAM. You can have TBs of this, but you can only get ~12.8GB/s bandwidth, which is way too slow.Now that you know what HBM is, look at how the standard Attention algorithm is implemented:As you can see, all 3 steps include a “write X to HBM” step and a “read from HBM” step. The core idea behind FlashAttention boils down to this: instead of storing each intermediate result, why don’t we use kernel fusion and run every operation in a single kernel in order to avoid memory read/write overhead? (We also talked about kernel fusion in our episode with George Hotz and how PyTorch / tinygrad take different approaches here)The result is much faster, but much harder to read:As you can see, FlashAttention is a very meaningful speed improvement on traditional Attention, and it’s easy to understand why it’s becoming the standard for most models.This should be enough of a primer before you dive into our episode! We talked about FlashAttention-2, how Hazy Research Group works, and some of the research being done in Transformer alternatives.Show Notes:* FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (arXiv)* FlashAttention-2* Together AI* From Deep Learning to Long Learning* The Hardware Lottery by Sara Hooker* Hazy Research* Is Attention All You Need?* Nvidia CUTLASS 3* SRAM scaling slows* Transformer alternatives:* S4* Hyena* Recurrent Neural Networks (RNNs)Timestamps:* Tri's background [00:00:00]* FlashAttention’s deep dive [00:02:18]* How the Hazy Research group collaborates across theory, systems, and applications [00:17:21]* Evaluating models beyond raw performance [00:25:00]* FlashAttention-2 [00:27:00]* CUDA and The Hardware Lottery [00:30:00]* Researching in a fast-changing market [00:35:00]* Promising transformer alternatives like state space models and RNNs [00:37:30]* The spectrum of openness in AI models [00:43:00]* Practical impact of models like LLAMA2 despite restrictions [00:47:12]* Incentives for releasing open training datasets [00:49:43]* Lightning Round [00:53:22]Transcript:Alessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, Partner and CTO-in-Residence at Decibel Partners. Today we have no Swyx, because he's in Singapore, so it's a one-on-one discussion with Tri Dao. Welcome! [00:00:24]Tri: Hi everyone. I'm Tri Dao, excited to be here. [00:00:27]Alessio: Tri just completed his PhD at Stanford a month ago. You might not remember his name, but he's one of the main authors in the FlashAttention paper, which is one of the seminal work in the Transformers era. He's got a lot of interest from efficient transformer training and inference, long range sequence model, a lot of interesting stuff. And now you're going to be an assistant professor in CS at Princeton next year. [00:00:51]Tri: Yeah, that's right. [00:00:52]Alessio: Yeah. And in the meantime, just to get, you know, a low pressure thing, you're Chief Scientist at Together as well, which is the company behind RedPajama. [00:01:01]Tri: Yeah. So I just joined this week actually, and it's been really exciting. [00:01:04]Alessio: So what's something that is not on the internet that people should know about you? [00:01:09]Tri: Let's see. When I started college, I was going to be an economist, so I was fully on board. I was going to major in economics, but the first week I was at Stanford undergrad, I took a few math classes and I immediately decided that I was going to be a math major. And that kind of changed the course of my career. So now I'm doing math, computer science, AI research. [00:01:32]Alessio: I had a similar thing. I started with physics and then I took like a programming course and I was like, I got to do computer science. I don't want to do physics. So FlashAttention is definitely, everybody's using this. Everybody loves it. You just released FlashAttention 2 last week. [00:01:48]Tri: Yeah. Early this week on Monday. Yeah. [00:01:53]Alessio: You know, AI time. Things move fast. So maybe let's run through some of the FlashAttention highlights, some of the innovation there, and then we can dive into FlashAttention 2. So the core improvement in FlashAttention is that traditional attention is a quadratic sequence length. And to the two, FlashAttention is linear, which obviously helps with scaling some of these models. [00:02:18]Tri: There are two factors there. So of course the goal has been to make attention go faster or more memory efficient. And ever since attention became popular in 2017 with the Transformer paper, lots and lots of folks have been working on this. And a lot of approaches has been focusing on approximating attention. The goal is you want to scale to longer sequences. There are tons of applications where you want to do that. But scaling to longer sequences is difficult because attention scales quadratically in sequence length on both runtime and memory, as you mentioned. So instead of trying to approximate attention, we were trying to figure out, can we do the same computation and maybe be more memory efficient? So in the end, we ended up being the memory is linear in sequence length. In terms of computation, it's still quadratic, but we managed to make it much more hardware friendly. And as a result, we do get wall clock speed up on the order of 2 to 4x, which really helps because that just means that you'll be able to train with 2 to 4x longer sequence length for the same cost without doing any approximations. As a result, lots of folks have been using this. The thing is available in a lot of libraries that do language model training or fine tuning. [00:03:32]Alessio: And the approximation thing is important because this is an exact thing versus a sparse. So maybe explain a little bit the difference there. [00:03:40]Tri: For sure. So in addition, essentially you compute pairwise similarity between every single element in a sequence against each other. So there's been other approaches where instead of doing all that pairwise computation, you only compute similarity for some pairs of elements in the sequence. So you don't do quadratic number of comparison. And this can be seen as some form of sparsity. Essentially you're ignoring some of the elements. When you write down the matrix, you essentially say, OK, I'm going to pretend there's zero. So that has some benefits in terms of runtime and memory. But the trade-off is that it tends to do worse in terms of quality because you're essentially approximating or ignoring some elements. And I personally have worked on this as well for a few years. But when we talk to practitioners who actually train models, especially at large scale, they say, tend not to use these approximate attention methods. Because it turns out, this was surprising to me at the time, was that these approximation methods, even though they perform fewer computation, they tend to not be faster in walk-on time. So this was pretty surprising because back then, I think my background was more on the theoretical side. So I was thinking of, oh, how many flops or floating point operations are you performing? And hopefully that correlates well with walk-on time. But I realized that I was missing a bunch of ideas from the system side where flops or floating point operations don't necessarily correlate with runtime. There are other factors like memory reading and writing, parallelism, and so on. So I learned a ton from just talking to systems people because they kind of figured this stuff out a while ago. So that was really eye-opening. And then we ended up focusing a lot more on memory reading and writing because that turned out to be the majority of the time when you're doing attention is reading and writing memory. [00:05:34]Alessio: Yeah, the I.O. awareness is probably one of the biggest innovations here. And the idea behind it is, like you mentioned, the FLOPS growth of the cards have been going up, but the memory bandwidth, not as much. So I think maybe that was one of the assumptions that the original attention paper had. So talk a bit about how that came to be as an idea. It's one of those things that like in insight, it's like, obviously, why are we like rewriting to like HBM every time, you know, and like once you change it, it's clear. But what was that discovery process? [00:06:08]Tri: Yeah, in hindsight, a lot of the ideas have already been there in the literature. And I would say is it was somehow at the intersection of both machine learning and systems. And you kind of needed ideas from both sides. So on one hand, on the system side, so lots of systems folks have known that, oh, you know, kernel fusion is great. Kernel fusion just means that instead of performing, you know, loading the same element, instead of performing an operation, write it down, load it back up and perform the second operation, you just load it once, perform two operations and then write it down again. So that saves you kind of memory read and write in the middle there. So kernel fusion has been a classic. There's been other techniques from the system side, like tiling, where you perform things in the form of computations in block, again, so that you can load it into a really fast memory. Think of it as a cache. And this is, again, classical computer science ideas, right? You want to use the cache. So the system folks have been thinking about these ideas for a long time, and they apply to attention as well. But there were certain things in attention that made it difficult to do a complete kernel fusion. One of which is there is this softmax operation in the middle, which requires you to essentially sum across the row of the attention matrix. So it makes it difficult to kind of break it, because there's this dependency. So it makes it difficult to break things into a block. So on the system side, people have been thinking about these ideas, but it's been difficult to kind of do kernel fusion for the entire operation. On the machine learning side, people have been thinking more algorithmically. They say, okay, either we can approximate attention, or there's this trick called the online softmax trick, which says that because of softmax, the way it's written mathematically, you can actually break it up into smaller pieces, do some rescaling, and still get the right answer. So this online softmax trick has been around for a while. I think there was a paper from NVIDIA folks back in 2018 about this. And then there was a paper from Google. So Marcus, Rob, and Stats wrote a paper late 2021 on using this online softmax trick to break attention up into smaller pieces. So a lot of the ideas were already there. But it turns out, you kind of need to combine ideas from both sides. So you need to understand that, hey, we want to do kernel fusion to reduce memory written writes. But we also need this online softmax trick to be able to break the softmax into smaller pieces so that a lot of the systems tricks kind of carry through. We saw that, and it was kind of a natural idea that we ended up using ideas from both sides, and it ended up working pretty well. Yeah. [00:08:57]Alessio: Are there any downsides to kernel fusion? If I think about databases and the reasons why we have atomic operations, you know, it's like, you have observability and fallback in between them. How does that work with attention? Is there anything that we lose by fusing the operations? [00:09:13]Tri: Yeah, I think mostly on the practical side is that you lose a little bit of flexibility in the sense that, hey, now you have, for example, faster attention, it's just a subroutine that you would call to do attention. But as a researcher, let's say you don't want that exact thing, right? You don't want just attention, let's say you want some modification to attention. You want to do, hey, I'm going to multiply the query and key, but then I'm going to do this extra thing before I carry on. So kernel fusion just means that, okay, we have a subroutine that does the entire thing. But if you want to experiment with things, you won't be able to use that fused kernel. And the answer is, can we have a compiler that then automatically does a lot of this kernel fusion? Lots of compiler folks are thinking about this, either with a new language or you can embed it in PyTorch. PyTorch folks have been working on this as well. So if you write just your code in PyTorch and they can capture the graph, can they generate code that will fuse everything together? That's still ongoing, and it works for some cases. But for attention, because of this kind of softmax rewriting stuff, it's been a little bit more difficult. So maybe in a year or two, we'll have compilers that are able to do a lot of these optimizations for you. And you don't have to, for example, spend a couple months writing CUDA to get this stuff to work. Awesome. [00:10:41]Alessio: And just to make it clear for listeners, when we say we're not writing it to memory, we are storing it, but just in a faster memory. So instead of the HBM, we're putting it in the SRAM. Yeah. [00:10:53]Tri: Yeah. [00:10:54]Alessio: Maybe explain just a little bit the difference there. [00:10:56]Tri: Yeah, for sure. This is kind of a caricature of how you think about accelerators or GPUs in particular, is that they have a large pool of memory, usually called HBM, or high bandwidth memory. So this is what you think of as GPU memory. So if you're using A100 and you list the GPU memory, it's like 40 gigs or 80 gigs. So that's the HBM. And then when you perform any operation, you need to move data from the HBM to the compute unit. So the actual hardware unit that does the computation. And next to these compute units, there are on-chip memory or SRAM, which are much, much smaller than HBM, but much faster. So the analogy there is if you're familiar with, say, CPU and RAM and so on. So you have a large pool of RAM, and then you have the CPU performing the computation. But next to the CPU, you have L1 cache and L2 cache, which are much smaller than DRAM, but much faster. So you can think of SRAM as the small, fast cache that stays close to the compute unit. Physically, it's closer. There is some kind of asymmetry here. So HBM is much larger, and SRAM is much smaller, but much faster. One way of thinking about it is, how can we design algorithms that take advantage of this asymmetric memory hierarchy? And of course, lots of folks have been thinking about this. These ideas are pretty old. I think back in the 1980s, the primary concerns were sorting. How can we sort numbers as efficiently as possible? And the motivating example was banks were trying to sort their transactions, and that needs to happen overnight so that the next day they can be ready. And so the same idea applies, which is that they have slow memory, which was hard disk, and they have fast memory, which was DRAM. And people had to design sorting algorithms that take advantage of this asymmetry. And it turns out, these same ideas can apply today, which is different kinds of memory. [00:13:00]Alessio: In your paper, you have the pyramid of memory. Just to give people an idea, when he says smaller, it's like HBM is like 40 gig, and then SRAM is like 20 megabytes. So it's not a little smaller, it's much smaller. But the throughput on card is like 1.5 terabytes a second for HBM and like 19 terabytes a second for SRAM, which is a lot larger. How do you think that evolves? So TSMC said they hit the scaling limits for SRAM, they just cannot grow that much more. HBM keeps growing, HBM3 is going to be 2x faster than HBM2, I think the latest NVIDIA thing has HBM3. How do you think about the future of FlashAttention? Do you think HBM is going to get fast enough when maybe it's not as useful to use the SRAM? [00:13:49]Tri: That's right. I think it comes down to physics. When you design hardware, literally SRAM stays very close to compute units. And so you don't have that much area to essentially put the transistors. And you can't shrink these things too much. So just physics, in terms of area, you don't have that much area for the SRAM. HBM is off-chip, so there is some kind of bus that essentially transfers data from HBM to the compute unit. So you have more area to essentially put these memory units. And so yeah, I think in the future SRAM probably won't get that much larger, because you don't have that much area. HBM will get larger and faster. And so I think it becomes more important to design algorithms that take advantage of this memory asymmetry. It's the same thing in CPU, where the cache is really small, the DRAM is growing larger and larger. DRAM could get to, I don't know, two terabytes, six terabytes, or something, whereas the cache stays at, I don't know, 15 megabytes or something like that. I think maybe the algorithm design becomes more and more important. There's still ways to take advantage of this, I think. So in the future, I think flash attention right now is being used. I don't know if in the next couple of years, some new architecture will come in and whatnot, but attention seems to be still important. For the next couple of years, I still expect some of these ideas to be useful. Not necessarily the exact code that's out there, but I think these ideas have kind of stood the test of time. New ideas like IO awareness from back in the 1980s, ideas like kernel fusions, tiling. These are classical ideas that have stood the test of time. So I think in the future, these ideas will become more and more important as we scale models to be larger, as we have more kinds of devices, where performance and efficiency become much, much more important. [00:15:40]Alessio: Yeah, and we had Jonathan Frankle on the podcast, and if you go to issattentionallyouneed.com, he has an outstanding bet, and he does believe that attention will be the state of the art architecture still in a few years. Did you think flash attention would be this popular? I'm always curious on the research side, you publish a paper, and obviously you know it's great work, but sometimes it just kind of falls flat in the industry. Could you see everybody just starting to use this, or was that a surprise to you? [00:16:11]Tri: Certainly, I didn't anticipate the level of popularity. Of course, we were extremely happy to have people using this stuff and giving us feedback and so on, and help us improve things. I think when we were writing the paper, I remember sending an email to one of my advisors, and like, hey, I'm excited about this paper, but I think the most important thing will be the artifact, which is the code. So I knew that the code will be valuable. So we kind of focus a lot on the code and make sure that the code is usable and as fast as can be. Of course, the idea, the paper presents the ideas and explain it and have experiments that validate the idea, but I knew that the artifact or the code was also pretty important. And that turned out to be the right focus, which is, you know, we put out the paper, we release the code and continue working on the code. So it's a team effort with my co-authors as well. [00:17:07]Alessio: We mentioned Hazy Research a bunch of times on the podcast before. I would love for you to spend five minutes just talking about how does the group work? How do people get together? How do you bounce ideas off of each other? Yeah. [00:17:21]Tri: So Hazy Research is a research group at Stanford led by one of my advisors, Chris Re. I love the people there. It was one of the best experiences I had. They've made my PhD so much more enjoyable. And I think there are a couple of ways that the group has been working pretty well. So one is, I think there's a diverse pool of people who either, you know, some of them focus on algorithms and theory, some of them focus on building systems, some of them focus on applications. And as a result, there is this flow of idea. So as an example, some of us were working on like more algorithms and theory, and then we can talk to the folks building systems and say, hey, let's try it out and let's put it in the systems and see how it is. And there you will get feedback from systems folks. They will say, hey, we implemented this, or we tried this and this is where it doesn't work, something like that. And once we put it in the systems, the application folks can use the algorithm or new methods or new models. And we again get great feedback from them because the application folks, for example, some of my good friends, they focus on medical imaging or seizure detection. And that is the problem they care about. And if your method doesn't work on the task they care about, they will tell you. Whereas I think a lot of people in machine learning, they're a little bit more flexible. So they will be like, hey, it doesn't work on seizure detection. Let's try some other task, right? But having that direct feedback of like, hey, it doesn't work there, let's figure out why. I think that that feedback allows us to do better work. And I think that kind of process of exchanging ideas, validating it in a real system so that applications folks can try it out and give you feedback. That cycle has been very, very useful. And so that's one, having a diverse group of people. The other one is, and this is something I really appreciate from advice from Chris was try to understand the fundamental, right? And he's happy letting me go off and read some textbooks and playing with things because I think a lot of research ideas come from understanding the old literature and see how it fits with the new landscape. And so if you just new archive papers every day, that's great, but you also need to read textbooks. And that's one advice I got from Chris, which is understand the fundamentals. And I think that allows us to do more impactful work. [00:19:46]Alessio: How do you think about academia versus industry? I feel like AI / Machine Learning has been an area where up until three, four years ago, most of the cutting edge work was being done in academia. And now there's all these big industry research labs. You're obviously going to Princeton, so you're an academia believer. How should people think about where to go? Say I'm doing my master's, I have to decide between doing a PhD and going into OpenAI Anthropic. How should I decide? [00:20:15]Tri: I think they kind of play a complementary role, in my opinion. Of course, I also was considering different paths as well. So I think right now, scaling matters a lot, especially when you talk about language models and AI and so on. Scaling matters a lot. And that means that you need compute resources and you need infrastructure and you need engineers time. And so industry tends to have an advantage when it comes to scaling things. But a lot of the ideas actually came from academia. So let's take Attention, which got popular with the Transformer in 2017. Attention actually has been around for a while. So I think the first mention was in 2014, a paper from Bernadot and others and Yoshua Bengio, which is coming from academia. A lot of ideas did come from academia. And scaling things up, of course, I think OpenAI has been great at scaling things up. That was the bet that they made after, I think, GPT-2. So they saw that scaling these things up to back then was 1.5 billion parameter seemed to give you amazing capabilities. So they really committed to that. They really committed to scaling things. And that turned out to be, it's been a pretty successful bet. I think for academia, we're still trying to figure out exactly what we're doing in this shifting landscape. And so lots of folks have been focusing on, for example, evaluation. So I know the Stanford Center for Foundation Model led by Percy, they have this benchmark called HELM, which is this holistic benchmark. So trying to figure out, okay, characterizing the landscape of different kinds of models, what people should evaluate, what people should measure, and things like that. So evaluation is one role. The other one is understanding. So this has happened historically where there's been some development in the industry and academia can play a role in explaining, understanding. They have the luxury to slow down trying to understand stuff, right? So lots of paper on understanding what's really going on, probing these models, and so on. I think I'm not as familiar with the NLP literature, but my impression is there's a lot of that going on in the NLP conferences, which is understanding what these models are doing, what capabilities they have, and so on. And the third one I could see is that the academia can take more risky bets in the sense that we can work on stuff that is quite different from industry. I think industry, my impression is you have some objective. You're trying to say, hey, for this quarter, we want to scale the model in this particular way. Next quarter, we want the model to have these capabilities. You're trying to get objectives that maybe, I don't know, 70% that will work out because it's important for the company's direction. I think for academia, the way things work is you have many, many researchers or PhD students, and they're kind of pursuing independent directions. And they have a little bit more flexibility on, hey, I'm going to try out this seemingly crazy idea and see, let's say there's a 30% chance of success or something. And however you define success, for academia, a lot of the time, success just means like, hey, we found something interesting. That could eventually go into industry through collaboration and so on. So I do see academia and industry kind of playing complementary roles. And as for someone choosing a career, I think just more and more generally, industry would be probably better in terms of compensation, in terms of probably work-life balance. But my biased perspective is that maybe academia gives you a little bit more freedom to think and understand things. So it probably comes down to personal choice. I end up choosing to be a professor next year at Princeton. But of course, I want to maintain a relationship with industry folks. I think industry folks can provide very valuable feedback to what we're doing in academia so that we understand where the field is moving because some of the directions are very much influenced by what, for example, OpenAI or Google is doing. So we want to understand where the field is moving. What are some promising applications? And try to anticipate, okay, if the field is moving like this, these applications are going to be popular. What problems will be important in two, three years? And then we try to start thinking about those problems so that hopefully in two, three years, we have some of the answers to some of these problems in two, three years. Sometimes it works out, sometimes it doesn't. But as long as we do interesting things in academia, that's the goal. [00:25:03]Alessio: And you mentioned the eval side. So we did a Benchmarks 101 episode. And one of the things we were seeing is sometimes the benchmarks really influence the model development. Because obviously, if you don't score well on the benchmarks, you're not going to get published and you're not going to get funded. How do you think about that? How do you think that's going to change now that a lot of the applications of these models, again, is in more narrow industry use cases? Do you think the goal of the academia eval system is to be very broad and then industry can do their own evals? Or what's the relationship there? [00:25:40]Tri: Yeah, so I think evaluation is important and often a little bit underrated. So it's not as flashy as, oh, we have a new model that can do such and such. But I think evaluation, what you don't measure, you can't make progress on, essentially. So I think industry folks, of course, they have specific use cases that their models need to do well on. And that's what they care about. Not just academia, but other groups as well. People do understand what are some of the emerging use cases. So for example, now one of the most popular use cases is Chatbot. And then I think folks from Berkeley, some of them are from Berkeley, call them MLCs. They set up this kind of Chatbot arena to essentially benchmark different models. So people do understand what are some of the emerging use cases. People do contribute to evaluation and measurement. And as a whole, I think people try to contribute to the field and move the field forward, albeit that maybe slightly different directions. But we're making progress and definitely evaluation and measurement is one of the ways you make progress. So I think going forward, there's still going to be just more models, more evaluation. We'll just have better understanding of what these models are doing and what capabilities they have. [00:26:56]Alessio: I like that your work has been focused on not making benchmarks better, but it's like, let's just make everything faster. So it's very horizontal. So FlashAttention 2, you just released that on Monday. I read in the blog post that a lot of the work was also related to some of the NVIDIA library updates. Yeah, maybe run us through some of those changes and some of the innovations there. Yeah, for sure. [00:27:19]Tri: So FlashAttention 2 is something I've been working on for the past couple of months. So the story is the NVIDIA CUTLASS team, they released a new version of their library, which contains all these primitives to allow you to do matrix multiply or memory loading on GPU efficiently. So it's a great library and I built on that. So they released their version 3 back in January and I got really excited and I wanted to play with that library. So as an excuse, I was just like, okay, I'm going to refactor my code and use this library. So that was kind of the start of the project. By the end, I just ended up working with the code a whole lot more and I realized that, hey, there are these inefficiencies still in Flash Attention. We could change this way or that way and make it, in the end, twice as fast. But of course, building on the library that the NVIDIA folks released. So that was kind of a really fun exercise. I was starting out, it's just an excuse for myself to play with the new library. What ended up was several months of improvement, improving Flash Attention, discovering new ideas. And in the end, we managed to make it 2x faster and now it's pretty close to probably the efficiency of things like matrix multiply, which is probably the most optimized subroutine on the planet. So we're really happy about it. The NVIDIA Cutlass team has been very supportive and hopefully in the future, we're going to collaborate more. [00:28:46]Alessio: And since it's an NVIDIA library, can you only run this on CUDA runtimes? Or could you use this and then run it on an AMD GPU? [00:28:56]Tri: Yeah, so it's an NVIDIA library. So right now, the code we release runs on NVIDIA GPUs, which is what most people are using to train models. Of course, there are emerging other hardware as well. So the AMD folks did implement a version of Flash Attention, I think last year as well, and that's also available. I think there's some implementation on CPU as well. For example, there's this library, ggml, where they implemented the same idea running on Mac and CPU. So I think that kind of broadly, the idea would apply. The current implementation ended up using NVIDIA's library or primitives, but I expect these ideas to be broadly applicable to different hardware. I think the main idea is you have asymmetry in memory hierarchy, which tends to be everywhere in a lot of accelerators. [00:29:46]Alessio: Yeah, it kind of reminds me of Sara Hooker's post, like the hardware lottery. There could be all these things that are much better, like architectures that are better, but they're not better on NVIDIA. So we're never going to know if they're actually improved. How does that play into some of the research that you all do too? [00:30:04]Tri: Yeah, so absolutely. Yeah, I think Sara Hooker, she wrote this piece on hardware lottery, and I think she captured really well of what a lot of people have been thinking about this. And I certainly think about hardware lottery quite a bit, given that I do some of the work that's kind of really low level at the level of, hey, we're optimizing for GPUs or NVIDIA GPUs and optimizing for attention itself. And at the same time, I also work on algorithms and methods and transformer alternatives. And we do see this effect in play, not just hardware lottery, but also kind of software framework lottery. You know, attention has been popular for six years now. And so many kind of engineer hours has been spent on making it as easy and efficient as possible to run transformer, right? And there's libraries to do all kinds of tensor parallel, pipeline parallel, if you use transformer. Let's say someone else developed alternatives, or let's just take recurrent neural nets, like LSTM, GRU. If we want to do that and run that efficiently on current hardware with current software framework, that's quite a bit harder. So in some sense, there is this feedback loop where somehow the model architectures that take advantage of hardware become popular. And the hardware will also kind of evolve to optimize a little bit for that kind of architecture and software framework will also evolve to optimize for that particular architecture. Right now, transformer is the dominant architecture. So yeah, I'm not sure if there is a good way out of this. Of course, there's a lot of development. Things like, I think compilers will play a role because compilers allow you to maybe still be much more efficient across different kinds of hardware because essentially you write the same code and compiler will be able to make it run efficiently different kinds of hardware. So for example, there's this language Mojo, they're compiler experts, right? And their bet is AI models will be running on different kinds of devices. So let's make sure that we have really good compilers with a good language that then the compiler can do a good job optimizing for all kinds of devices. So that's maybe one way that you can get out of this cycle. But yeah, I'm not sure of a good way. In my own research, I have to think about both the algorithm new model and how it maps to hardware. So there are crazy ideas that seem really good, but will be really, really difficult to run efficiently. And so as a result, for example, we can't really scale some of the architectures up simply because they're not hardware friendly. I have to think about both sides when I'm working on new models. [00:32:50]Alessio: Yeah. Have you spent any time looking at some of the new kind of like AI chips companies, so to speak, like the Cerebras of the world? Like one of their innovations is co-locating everything on the chip. So you remove some of this memory bandwidth issue. How do you think about that? [00:33:07]Tri: Yeah, I think that's an interesting bet. I think Tesla also has this Dojo supercomputer where they try to have essentially as fast on-chip memory as possible and removing some of these data transfer back and forth. I think that's a promising direction. The issues I could see, you know, I'm definitely not a hardware expert. One issue is the on-chip memory tends to be really expensive to manufacture, much more expensive per gigabyte compared to off-chip memory. So I talked to, you know, some of my friends at Cerebros and, you know, they have their own stack and compiler and so on, and they can make it work. The other kind of obstacle is, again, with compiler and software framework and so on. For example, if you can run PyTorch on this stuff, lots of people will be using it. But supporting all the operations in PyTorch will take a long time to implement. Of course, people are working on this. So I think, yeah, we kind of need these different bets on the hardware side as well. Hardware has, my understanding is, has a kind of a longer time scale. So you need to design hardware, you need to manufacture it, you know, maybe on the order of three to five years or something like that. So people are taking different bets, but the AI landscape is changing so fast that it's hard to predict, okay, what kind of models will be dominant in, let's say, three or five years. Or thinking back five years ago, would we have known that Transformer would have been the dominant architecture? Maybe, maybe not, right? And so different people will make different bets on the hardware side. [00:34:39]Alessio: Does the pace of the industry and the research also influence the PhD research itself? For example, in your case, you're working on improving attention. It probably took you quite a while to write the paper and everything, but in the meantime, you could have had a new model architecture come out and then it's like nobody cares about attention anymore. How do people balance that? [00:35:02]Tri: Yeah, so I think it's tough. It's definitely tough for PhD students, for researchers. Given that the field is moving really, really fast, I think it comes down to understanding fundamental. Because that's essentially, for example, what the PhD allows you to do. It's been a couple of years understanding the fundamentals. So for example, when I started my PhD, I was working on understanding matrix vector multiply, which has been a concept that's been around for hundreds of years. We were trying to characterize what kind of matrices would have theoretically fast multiplication algorithm. That seems to have nothing to do with AI or anything. But I think that was a time when I developed mathematical maturity and research taste and research skill. The research topic at that point didn't have to be super trendy or anything, as long as I'm developing skills as a researcher, I'm making progress. And eventually, I've gotten quite a bit better in terms of research skills. And that allows, for example, PhD students later in their career to quickly develop solutions to whatever problems they're facing. So I think that's just the natural arc of how you're being trained as a researcher. For a lot of PhD students, I think given the pace is so fast, maybe it's harder to justify spending a lot of time on the fundamental. And it's tough. What is this kind of explore, exploit kind of dilemma? And I don't think there's a universal answer. So I personally spend some time doing this kind of exploration, reading random textbooks or lecture notes. And I spend some time keeping up with the latest architecture or methods and so on. I don't know if there's a right balance. It varies from person to person. But if you only spend 100% on one, either you only do exploration or only do exploitation, I think it probably won't work in the long term. It's probably going to have to be a mix and you have to just experiment and kind of be introspective and say, hey, I tried this kind of mixture of, I don't know, one exploration paper and one exploitation paper. How did that work out for me? Should I, you know, having conversation with, for example, my advisor about like, hey, did that work out? You know, should I shift? I focus more on one or the other. I think quickly adjusting and focusing on the process. I think that's probably the right way. I don't have like a specific recommendation that, hey, you focus, I don't know, 60% on lecture notes and 40% on archive papers or anything like that. [00:37:35]Alessio: Let's talk about some Transformer alternatives. You know, say Jonathan Franco loses his bet and Transformer is not the state of the art architecture. What are some of the candidates to take over? [00:37:49]Tri: Yeah, so this bet is quite fun. So my understanding is this bet between Jonathan Franco and Sasha Rush, right? I've talked to Sasha a bunch and I think he recently gave an excellent tutorial on Transformer alternatives as well. So I would recommend that. So just to quickly recap, I think there's been quite a bit of development more recently about Transformer alternatives. So architectures that are not Transformer, right? And the question is, can they do well on, for example, language modeling, which is kind of the application that a lot of people care about these days. So there are methods based on state space methods that came out in 2021 from Albert Gu and Curran and Chris Re that presumably could do much better in terms of capturing long range information while not scaling quadratically. They scale sub-quadratically in terms of sequence length. So potentially you could have a much more efficient architecture when sequence length gets really long. The other ones have been focusing more on recurrent neural nets, which is, again, an old idea, but adapting to the new landscape. So things like RWKV, I've also personally worked in this space as well. So there's been some promising results. So there's been some results here and there that show that, hey, these alternatives, either RNN or state space methods, can match the performance of Transformer on language modeling. So that's really exciting. And we're starting to understand on the academic research side, we want to understand, do we really need attention? I think that's a valuable kind of intellectual thing to understand. And maybe we do, maybe we don't. If we want to know, we need to spend serious effort on trying the alternatives. And there's been folks pushing on this direction. I think RWKV scale up to, they have a model at 14 billion that seems pretty competitive with Transformer. So that's really exciting. That's kind of an intellectual thing. We want to figure out if attention is necessary. So that's one motivation. The other motivation is Transformer Alternative could have an advantage in practice in some of the use cases. So one use case is really long sequences. The other is really high throughput of generation. So for really long sequences, when you train with Transformer, with flash attention and so on, the computation is still quadratic in the sequence length. So if your sequence length is on the order of, I don't know, 16K, 32K, 100K or something, which some of these models have sequence length 100K, then you do get significantly slower in terms of training, also in terms of inference. So maybe these alternative architectures could scale better in terms of sequence length. I haven't seen actual validation on this. Let's say an RNN model release with context length, I don't know, 100K or something. I haven't really seen that. But the hope could be that as we scale to long sequences, these alternative architectures could be more well-suited. Not just text, but things like high resolution images, audio, video, and so on, which are emerging applications. So that's one, long sequences. Number two is a high throughput generation, where I can imagine scenarios where the application isn't like an interactive chatbot, but let's say a company wants to batch as many requests as possible on their server, or they're doing offline processing, they're generating stuff based on their internal documents, that you need to process in batch. And the issue with Transformer is that during generation, it essentially needs to keep around all the previous history. It's called the KV cache. And that could take a significant amount of memory, so you can't really batch too much because you run out of memory. I am personally bullish on RNNs. I think RNNs, they essentially summarize the past into a state vector that has fixed size, so the size doesn't grow with the history. So that means that you don't need as much memory to keep around all the previous tokens. And as a result, I think you can scale to much higher batch sizes. And as a result, you can make much more efficient use of the GPUs or the accelerator, and you could have much higher generation throughput. Now, this, I don't think, has been validated at scale. So as a researcher, I'm bullish on this stuff because I think in the next couple of years, these are use cases where these alternatives could have an advantage. We'll just kind of have to wait and see to see if these things will happen. I am personally bullish on this stuff. At the same time, I also spend a bunch of time making attention as fast as possible. So maybe hatching and playing both sides. Ultimately, we want to understand, as researchers, we want to understand what works, why do the models have these capabilities? And one way is, let's push attention to be as efficient as possible. On the other hand, let's push other alternatives to be as efficient at scale, as big as possible, and so that we can kind of compare them and understand. Yeah, awesome. [00:43:01]Alessio: And I think as long as all of this work happens and open, it's a net positive for everybody to explore all the paths. Yeah, let's talk about open-source AI. Obviously, together, when Red Pajama came out, which was an open clone of the LLAMA1 pre-training dataset, it was a big thing in the industry. LLAMA2 came out on Tuesday, I forget. And this week, there's been a lot of things going on, which they call open-source, but it's not really open-source. Actually, we wrote a post about it that was on the front page of Hacker News before this podcast, so I was frantically responding. How do you think about what open-source AI really is? In my mind, in open-source software, we have different levels of open. So there's free software, that's like the GPL license. There's open-source, which is Apache, MIT. And then there's kind of restricted open-source, which is the SSPL and some of these other licenses. In AI, you have the open models. So Red Pajama is an open model because you have the pre-training dataset, you have the training runs and everything. And then there's obviously RandomLens that doesn't make it one-to-one if you retrain it. Then you have the open-weights model that's kind of like StableLM, where the weights are open, but the dataset is not open. And then you have LLAMA2, which is the dataset is not open, the weights are restricted. It's kind of like not really open-source, but open enough. I think it's net positive because it's like $3 million of flops donated to the public. [00:44:32]Tri: How do you think about that? [00:44:34]Alessio: And also, as you work together, what is your philosophy with open-source AI? Right, right. [00:44:40]Tri: Yeah, I think that's a great question. And I think about it on maybe more practical terms. So of course, Meta has done an amazing job training LLAMA1, LLAMA2. And for LLAMA2, they make it much less restrictive compared to LLAMA1. Now you can use it for businesses, unless you are a monthly active user or something like that. I think just this change will have a very significant impact in the kind of landscape of open-source AI, where now lots of businesses, lots of companies will be using, I expect will be using things like LLAMA2. They will fine-tune on their own dataset. They will be serving variants or derivatives of LLAMA2. Whereas before, with LLAMA1, it was also a really good model, but your business companies weren't allowed to do that. So I think on a more practical term, it's kind of shifting the balance between a closed-source model like OpenAI and Anthropic and Google, where you're making API calls, right? And maybe you don't understand as much of what the model is doing, how the model is changing, and so on. Versus now, we have a model with open weight that is pretty competitive from what I've seen in terms of benchmarks, pretty competitive with GPT 3.5, right? And if you fine-tune it on your own data, maybe it's more well-suited for your own data. And I do see that's going to shift the balance of it. More and more folks are going to be using, let's say, derivatives of LLAMA2. More and more folks are going to fine-tune and serve their own model instead of calling an API. So that shifting of balance is important because in one way, we don't want just a concentration of decision-making power in the hands of a few companies. So I think that's a really positive development from Meta. Of course, training the model takes a couple of millions of dollars, but engineers have and I'm sure they spend tons of time trying many, many different things. So the actual cost is probably way more than that. And they make the weights available and they allow probably a lot of companies are going to be using this. So I think that's a really positive development. And we've also seen amazing progress on the open source community where they would take these models and they either fine-tune on different kinds of data sets or even make changes to the model. So as an example, I think for LLAMA1, the context lane was limited to 2K. Like a bunch of folks figured out some really simple methods to scale up to like 8K. [00:47:12]Alessio: Like the RoPE. [00:47:13]Tri: Yes. I think the open source community is very creative, right? And lots of people. LLAMA2 will, again, kind of accelerate this where more people will try it out. More people will make tweaks to it and make a contribution and then so on. So overall, I think I see that as still a very positive development for the field. And there's been lots of libraries that will allow you to host or fine-tune these models, like even with quantization and so on. Just a couple of hours after LLAMA2 was released, tons of companies announcing that, hey, it's on our API or hosting and so on and together did the same. So it's a very fast-paced development and just kind of a model with available weights that businesses are allowed to use. I think that alone is already a very positive development. At the same time, yeah, we can do much better in terms of releasing data sets. Data sets tend to be... Somehow people are not incentivized to release data sets. So philosophically, yeah, you want to be as open as possible. But on a practical term, I think it's a little bit harder for companies to release data sets. Legal issues. The data sets released tend to be not as eye-catchy as the model release. So maybe people are less incentivized to do that. We've seen quite a few companies releasing data sets together. Released a red pajama data set. I think Cerebus then worked on that and deduplicate and clean it up and release slim pajama and so on. So we're also seeing positive development on that front, kind of on the pre-training data set. So I do expect that to continue. And then on the fine-tuning data set or instruction tuning data set, I think we now have quite a few open data sets on instruction tuning and fine-tuning. But these companies do pay for human labelers to annotate these instruction tuning data set. And that is expensive. And maybe they will see that as their competitive advantage. And so it's harder to incentivize these companies to release these data sets. So I think on a practical term, we're still going to make a lot of progress on open source AI, on both the model development, on both model hosting, on pre-training data set and fine-tuning data set. Right now, maybe we don't have the perfect open source model since all the data sets are available. Maybe we don't have such a thing yet, but we've seen very fast development on the open source side. I think just maybe this time last year, there weren't as many models that are competitive with, let's say, ChatGPT. [00:49:43]Alessio: Yeah, I think the open data sets have so much more impact than open models. If you think about Elusive and the work that they've done, GPT-J was great, and the Pythia models are great, but the Pyle and the Stack, everybody uses them. So hopefully we get more people to contribute time to work on data sets instead of doing the 100th open model that performs worse than all the other ones, but they want to say they released the model. [00:50:14]Tri: Yeah, maybe the question is, how do we figure out an incentive structure so that companies are willing to release open data sets? And for example, it could be like, I think some of the organizations are now doing this where they are asking volunteers to annotate and so on. And maybe the Wikipedia model of data set, especially for instruction tuning, could be interesting where people actually volunteer their time and instead of editing Wikipedia, add annotation. And somehow they acknowledge and feel incentivized to do so. Hopefully we get to that kind of level of, in terms of data, it would be kind of like Wikipedia. And in terms of model development, it's kind of like Linux where people are contributing patches and improving the model in some way. I don't know exactly how that's going to happen, but based on history, I think there is a way to get there. [00:51:05]Alessio: Yeah, I think the Dolly-15K data set is a good example of a company saying, let's do this smaller thing, just make sure we make it open. We had Mike Conover from Databricks on the podcast, and he was like, people just bought into it and leadership was bought into it. You have companies out there with 200,000, 300,000 employees. It's like, just put some of them to label some data. It's going to be helpful. So I'm curious to see how that evolves. What made you decide to join Together? [00:51:35]Tri: For Together, the focus has been focusing a lot on open source model. And I think that aligns quite well with what I care about, of course. I also know a bunch of people there that I know and trust, and I'm excited to work with them. Philosophically, the way they've been really open with data set and model release, I like that a lot. Personally, for the stuff, for example, the research that I've developed, like we also try to make code available, free to use and modify and so on, contributing to the community. That has given us really valuable feedback from the community and improving our work. So philosophically, I like the way Together has been focusing on open source model. And the nice thing is we're also going to be at the forefront of research and the kind of research areas that I'm really excited about, things like efficient training and inference, aligns quite well with what the company is doing. We'll try our best to make things open and available to everyone. Yeah, but it's going to be fun being at the company, leading a team, doing research on the topic that I really care about, and hopefully we'll make things open to benefit the community. [00:52:45]Alessio: Awesome. Let's jump into the lightning round. Usually, I have two questions. So one is on acceleration, one on exploration, and then a takeaway. So the first one is, what's something that already happened in AI machine learning that you thought would take much longer than it has? [00:53:01]Tri: I think understanding jokes. I didn't expect that to happen, but it turns out scaling model up and training lots of data, the model can now understand jokes. Maybe it's a small thing, but that was amazing to me. [00:53:16]Alessio: What about the exploration side? What are some of the most interesting unsolved questions in the space? [00:53:22]Tri: I would say reasoning in the broad term. We don't really know how these models do. Essentially, they do something that looks like reasoning. We don't know how they're doing it. We have some ideas. And in the future, I think we will need to design architecture that explicitly has some kind of reasoning module in it if we want to have much more capable models. [00:53:43]Alessio: What's one message you want everyone to remember today? [00:53:47]Tri: I would say try to understand both the algorithm and the systems that these algorithms run on. I think at the intersection of machine learning system has been really exciting, and there's been a lot of amazing results at this intersection. And then when you scale models to large scale, both the machine learning side and the system side really matter. [00:54:06]Alessio: Awesome. Well, thank you so much for coming on 3. [00:54:09]Tri: This was great. Yeah, this has been really fun. [00:54:11] Get full access to Latent Space at www.latent.space/subscribe
54:3126/07/2023
Llama 2: The New Open LLM SOTA (ft. Nathan Lambert, Matt Bornstein, Anton Troynikov, Russell Kaplan, Whole Mars Catalog et al.)
As first discussed on our May Emergency pod and leaked 4 days ago, Llama (renamed from LLaMA) was upgraded to Llama 2 (pretraining on 2 trillion tokens with 2x the context length - bigger than any dataset discussed in Datasets 101, and adding ~$20m of RLHF/preference annotation) and released for commercial use on 18 July.It immediately displaced Falcon-40B as the leading open LLM and was immediately converted/quantized to GGML and other formats. Llama 2 seems to outperform all other open source models in their equivalent weight class:Why are open models important? The intersection of Open Source and AI is one of the oldest themes on this publication, and there has been a raging debate on the security and reliability of the OpenAI models and APIs. Users have reported GPT-4’s quality going down, which has been denied and denied and as of today, given some supporting data from Databricks, and complained about the API reliability and rapid deprecation schedules. Last and surely the biggest, there are entire classes of businesses and government/healthcare/military organizations that categorically cannot send any of their sensitive data to an external API provider, even if it is OpenAI through Azure. The only way to have total control is to own and serve your own models, which Llama 2 now pushes forward in terms of the state of the art (your own GPT3.5-quality model, though it is nowhere near Claude 2 or GPT-4).As we do with breaking news, we got on to Twitter Spaces again to chat with two scheduled guests:* Nathan Lambert, ML Researcher at Huggingface and author of Interconnects who had the best summary of the Llama2 paper* Matt Bornstein, organizer of the a16z infra team that launched Llama2.ai (source here) and has been coding up a storm with AI demo apps, unusual for VCsas well as Anton Troynikov of Chroma, Russell Kaplan of Scale AI, and Omar Qazi of the Whole Mars Catalog.Enjoy!Show Notes* Official links* Website, Paper* GitHub (Llama 2 commit)* Azure Partnership* Use policy, Statement of Support for Open Approach* Where to try* Llama2.ai (source), Perplexity Llama Chat* Live playground/API on Replicate, deploy all versions on Baseten* https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI * Dev ports - simonw llm-replicate, ggml using llama.cpp (7B, 13B) or pinokio, ollama, Core ML port* Timeline* 24 Feb - LLaMA 1 announced* 6 May - our No Moats podcast - first mention of Zuck opening up Llama* 14 July - Llama 2 leaked* 18 July - Llama 2 announced* Community notes* Nathan’s research paper recap* 638 LOC, 4 dependencies* Usage restrictions - MAU restriction, derivative models* Grouped Query Attention* System prompt* 2 trillion token dataset* >$20m price tag (rlhf, jimfan), * Separate models for safety and helpfulness (jimfan)* Mistral AI founders left out of paper* Interesting fails: Timestamps* [00:02:30] Introducing the speakers* [00:03:32] Nathan Lambert intro* [00:04:48] General Summary of Llama 2* [00:05:57] Sarah Silverman killed Dataset Transparency?* [00:08:48] Simon's Recap of Llama 2* [00:11:43] Matt's Intro* [00:12:59] a16z Infra's new AI team?* [00:15:10] Alessio's recap of Llama 2* [00:17:26] Datasets 101 Followup* [00:18:14] Context Length 4k* [00:20:35] Open-ish Source? Usage Policy and Restrictions* [00:23:38] Huggingface Responsible AI License* [00:24:57] Pretraining Llama 2 Base Model beyond Chinchilla* [00:29:55] Llama 2 is incomplete? Race to publish* [00:31:40] Come for the Llama, stay for the (Meta) drama* [00:33:22] Language Translation* [00:35:10] Llama2's coding abilities* [00:35:59] Why we want to know about the training data* [00:37:45] The importance of Meta pushing forward Truly Open AI* [00:40:59] Llama 2 as Enabler of Startups* [00:43:59] Where you can try Llama 2* [00:44:25] Do you need dataset transparency if you have evals?* [00:45:56] >$20m cost of Llama 2 is primarily preference data collection* [00:48:59] Do we even need human annotators?* [00:49:42] Models Rating Models* [00:53:32] How to get Code preference data* [00:54:34] Llama 2 Finetuning Ecosystem* [00:56:32] Hey Apple: Llama2 on Metal pls* [00:57:17] Llama 2 and Chroma* [01:00:15] Open Source MoE model?* [01:00:51] Llama 2 using tools* [01:01:40] Russell Kaplan on Scale AI's Llama 2 plans* [01:03:31] Scale annotating code?* [01:04:36] Immortality* [01:04:59] Running Llama on your phone* [01:06:54] Sama * [01:10:58] Meta "Open Source" Leadership* [01:11:56] Prediction: Finetuning => New Use Cases from Internal State* [01:13:54] Prediction: Llama Toolformer* [01:14:39] Prediction: Finetune-for-everything* [01:15:50] Predictions: Llama Agents* [01:16:35] dP(Doom)?* [01:19:21] Wrapping upTranscript[00:00:00] Introducing the speakers[00:00:00] Alessio Fanelli: There's not a single dull day in this space. I think when we started the podcast in January, a lot of people asked us, how long can you really do this? Just focusing on AI research and, and models. And I think the, the answer is clear now. A long time. So excited for this and excited to have Simon again.[00:00:16] You're basically a honorary guest host of all of our Twitter spaces. Cool. Thank you.[00:00:21] Simon Willison: No, it's great to be here again.[00:00:23] Alessio Fanelli: And Nathan, thanks for joining us. Actually share your your writeup on, on Lama two technical details with Swyx this morning. So it's great to to have you here to dive into some of the details.[00:00:33] Nathan Lambert: Yeah, sounds good. As probably clear Huggingface was trying to collaborate on releasing the model on the platform. So we ended up getting some early details, which made it a lot easier for me to cram study before the chaos hit.[00:00:48] Alessio Fanelli: No, that's great. It, it's kind of what happened with the code interpreter episode when Sean and I had access for about five hours and Simon was like, I've been playing with this for weeks and add all the, the insights scoops.[00:00:59] So I think this will be a, a good episode.[00:01:02] Nathan Lambert intro[00:01:02] Alessio Fanelli: Maybe Nathan, you just want to give people a little bit of background on what you do at Hugging and Face and yeah, the, your experience with the LAMA two kinda preview. Yeah. So[00:01:12] Nathan Lambert: I've been a researcher and helping lead reinforcement learning from human feedback efforts at Hugging and face, which really means I do some research and I try to figure out how to fine tune models to do what people want.[00:01:26] Generally we're trying to operate in the scale a little bit smaller than what Meta is doing cuz we obviously don't have that kind of resources at a startup. So I do a lot of technical research and also try to actually engage and communicate that with the community and specifically, Llama, I think I was most interested on kind of the research side.[00:01:48] I think the paper is a phenomenal artifact and it's clear that the model is really strong in a lot of areas. And then kind of the big picture trends of where open source is going. Like this is a clear step in a direction that a lot of people wanted, but weren't sure if it was gonna happen. Yep.[00:02:04] Alessio Fanelli: What are some of the things that stood out to you?[00:02:06] I think to a lot of the AI engineers audience that we have, they're not as deep into the details of the papers. We'd love to get a, a read from somebody like you who's a much deeper at a, you know, model research level.[00:02:18] General Summary of Llama 2[00:02:18] Nathan Lambert: Yeah. It's like, where do I start? So I think as a general summary, the paper includes a lot of details on methodology. So like, what are the things that they did in their stack to build, to actually run this? And it misses a lot of details on. What does a specific data set actually look like? It's clear that they have a really fine-tuned data set and they paid a lot of money for these data sets.[00:02:46] I think may like, it seems like now that both surge and scale are claiming some part in it, which I find hilarious. Cause it's really unclear, which are two of the probably biggest data labeling firms. So they kind of took the approach, meta took the approach of starting with open source preference data and then added a lot onto it.[00:03:04] And the most interesting part to me on this preference data, which is a new technical approach, is they trained two preference models, two reward models, one toward making the model helpful and one for making the model safe. And then in terms of open source models, it's clearly more performant on kind of ground root benchmarks and then it's safer.[00:03:27] Sarah Silverman killed Dataset Transparency?[00:03:27] swyx: That's where I was[00:03:28] Simon Willison: gonna wrap up to clarify, right. This is a big difference from the first LAMA paper. Cause the first LAMA paper was very, was so detailed in terms of how the training data worked, that people were able to essentially replicate it. And so you're saying that this new paper, there's, there's much less transparency as to how the training worked[00:03:45] Nathan Lambert: on the DIS side.[00:03:46] Yeah, I think they, they did a lot of new methodological things to, so taking the time to explain that like is not as much of a data focused paper. There's no table that is like, this is what the distribution of pre-training data came from. I would guess that it's a similar data set to the original llama with the kind of, they mentioned like one of the details that's really interesting is that they mentioned they up weight high factuality content.[00:04:14] So things that probably seem like Wikipedia, that seems like they're doing some sort of up ranking. During base model training, but they don't de, they did some type of thing they didn't detail[00:04:24] swyx: because it's also[00:04:25] Simon Willison: worth mentioning, I mean, they're being sued right now by Sarah Silverman of all people. I mean, it's one of the many lawsuits flying around, but there's a lawsuit specifically over the training data involved in the first Lama because one of the things that went into that was this data set called Books three and Books three is like 190,000 pirated eBooks, like the full text of all of the ha Harry bot novels, things like that.[00:04:45] Which, yeah, that's very difficult to say that that's not extremely copyrighted data. So I wonder if that's part of the reason they've been less transparent this time round is that, you know, it got them in trouble last time.[00:04:57] Nathan Lambert: Yeah. One of my colleagues on kind of the Ethics and Society time I side immediately pointed out that pub, publicly available data is the phrase often used in the paper, but that does not mean that it's free from copyright issues and or terms of service issues.[00:05:11] It means that I could go on a computer and download it.[00:05:13] Simon Willison: Right. If you, if you scrape the entire internet, very little of that stuff is actually like public domain.[00:05:21] Nathan Lambert: Yeah. And, and I, I think without going down kind of social issues, rabbit hole right now, I think the notion of public is extremely being strained by AI and changing communication practices. And it's just like kind of those things where it's like, oh, okay, here we go.[00:05:36] And they also use words like democratize and they have these sentences in the paper that are extremely value written, which is like the carbon footprint of our model. And releasing this is good because it'll mean a lot of people don't have to train models and burn more CO2 in the future. And it's like, okay, meta, like, like what?[00:05:53] Where are you going with[00:05:54] swyx: this? Yeah. Perhaps before we go too deep into the issues, cuz we, we have lots to talk about. I would also want to get a high level overview from Simon and from Matt who's also just joined us from a 16 and Z. So maybe Simon, you, you wanna go first with like, just recap for everybody what you think the relevant details are about LAMA two and, I mean, and we'll talk, we'll talk about Matt stuff.[00:06:18] Simon's Recap of Llama 2[00:06:18] swyx: Yeah.[00:06:19] Simon Willison: So, yeah, I mean the, the, the, the headline here is that LAMA two has been released and meta kept their promise of doing a version of llama that is used, usable for commercial purposes, which is so big because so much of the, like, llama itself came out at the end of February, and so many models have been released on top of that.[00:06:37] So, LA models like Vicuna, which was a fine tuned llama, all of them with the same, no, not, not usable for commercial purposes. Warning. So now we've got a really high quality foundation model that we are allowed to use commercially. I think the the amount of innovation we're gonna see over the next few weeks is, is just going to explode.[00:06:54] You know, I feel like this is, this is monumental on that front in terms of quality. I never know how to interpret these benchmarks. The benchmarks all look good. You know, the claims are, it's a bit better than, than Lama it's competitor with the GP chat, GPT 3.5, et cetera, et cetera. I have no reason to disbelieve that, but it always takes quite a while with these new models to get a feel for them.[00:07:13] You have to spend time with them to really feel like, is it trustworthy as a summarizer, all of those kinds of things. My, my hunch is that it is gonna do turn out to be extremely good. Like I, I, I doubt that it'll, it'll, it'll, it'll turn out to be sort of a damp squib on that front. But yeah, so they've released it.[00:07:30] The It's available commercially and you are allowed to redistribute it, but the only way to officially get the waits is to fill in a form on their website and wait for them to approve you still, which is kind of stupid because obviously it's already started leaking. I've down, I downloaded a version onto my laptop this afternoon, which, which worked.[00:07:47] There's a G G M L and the bloke thing that's floating around and hugging, hugging face already, so, you know, within. 24 to 48 hours. I think every possible version of this thing will be available to download without going through a waiting list. I'm almost not sure why they, why they even bother with that.[00:08:03] Especially since, you know, llama leaked within I within a few days last time and somebody ended up submitting a pull request to the GitHub Readme with a link to the BitTorrent for the LAMA models, which Facebook didn't delete. You know, they didn't sort of, They, they kind of like nodded and winked and said, yeah, this is what you can do.[00:08:20] And now it's even legitimately okay to do it because the license says you can. But anyway, it's out there. You can run it on your computer right now today. The it's also hosted in a bunch of places. Yeah Andrea Horowitz got that sponsored, the version of it that's available on Replicate, although you actually do have to pay for that.[00:08:37] I noticed that I built up 26 cents in, in replicate charges already playing around with that model. But it's api, so, so it's available via API or you can run it on your own machine and, you know, it's, it's open season. That's all start, start poking around with it and seeing what it can do.[00:08:52] swyx: It's open season.[00:08:53] Speaking of Andreesen, yes, Matt. Hey.[00:08:56] Matt Bornstein: Hey. Hey everyone. Thank you for having me. And Simon, if you wanna send me a Venmo request for 26 cents, I'll, I'll happily reimburse you.[00:09:02] Simon Willison: Absolutely. Yeah.[00:09:04] Matt Bornstein: We, we may lose about $3 on the transaction fee, but I think it'd be worth it[00:09:09] swyx: just to throw in a term sheet in there for a data set.[00:09:11] Nathan Lambert: You're good?[00:09:13] Matt's Intro[00:09:13] Matt Bornstein: No, I'm, I'm a huge data set fan. And, and, you know, we've, we've followed Simon's work for quite a while, and, and Nathan, it's, it's great to have a chance to share a stage with you. I think folks probably saw we you know, released a bunch of sort of, you know, VC version of evaluations. You know, we're way less smart than, you know, Nathan and Simon and a bunch of folks on the in the, in the space here.[00:09:33] But using just sort of the. Does it feel good approach and trying to get a fairly representative sample across different types of prompts? The model seems very good. We were playing a lot with 13 B and we're playing now with 70 B, and it really does give you kind of very fast g p t 3.5 level responses to some questions.[00:09:54] I, I think Simon's point about benchmarks is very well taken. It's hard to know how to interpret those. So, so we sort of go for the, for the direct version and for creative tasks. You know, especially it's, it, it seems very good so far. So, so a lot of what we're doing is just trying to get it out there as much as possible and, and, and as fast as possible.[00:10:11] You know, I I think we should all be incredibly, you know, appreciative that Meta is doing this and it, and it's not, you know, maybe quite perfect, you know, for some of the reasons that folks are are talking about. But you know, I think it's gonna be a huge unlock in open source LLMs and, and we're trying to, you know, just sort of support the community as much as possible.[00:10:29] a16z Infra's new AI team?[00:10:29] swyx: Yeah, I have to say, you guys are doing a bang up job recently. What, so what is, is there, this is a big team effort, right? Like I, I, I see that there's a number of names from your team, just essentially building projects and then collaborating on this this demo. Like maybe could just, could you describe like what it is andreessen's ACC sort, sort of involvement so far and like yeah.[00:10:50] What, what, what is the scope of this? Yeah.[00:10:53] Matt Bornstein: You know, we all applied for, you know L three engineer jobs and, and got turned down by all the, all the big tech firms. So we thought, hey, you know, we'll, we'll just do it our ourselves. Yeah. Look, I think, and this might be a little controversial, your average venture capitalist doesn't do any real work, and I completely include myself in this category, you know?[00:11:14] Allocating resources to support teams is, is important. It's an important function in the economy, but it's, it's what you might call indirect work, which is you're supporting someone else doing something. You know, we just sort of made the decision when we really saw AI starting to take off that we should start doing real work too.[00:11:31] And it's really just about supporting the ecosystem, especially around open source like Simon. We're massive believers that the innovation you see in open source is really gonna be a big unlock for AI based applications, right? Not everybody can just use. The Open AI API is good, as good as it is, and not everybody can train a model from scratch, right?[00:11:52] Not everybody you know is, is Nome Shazi or, or someone like that. So so we think it's a really huge unlock and, and again, we're just trying to support as much as possible. So today we you know, we released a playground to play around with Llama2. We got it up on, on Replicate so people can just sort of try it with an API call and try integrating it into their apps.[00:12:10] We released an AI starter kit over the last couple of weeks which people are actually using. We were shocked. We're, we're a little nervous cuz our, our code, you know, may or may not be production ready. But, but you'll see more and more of this from us over time.[00:12:23] swyx: Yeah, I've seen your companion chat bot, and I have to say, it's actually pretty impressive.[00:12:26] It's got all the, is it the latest features in terms, especially in terms of streaming and lag chain and all the other stuff. So kudos to your team on that. Just to round out the overviews or the, the high level takes, before we go into individual details Alessio has been compiling the show notes, which we were gonna publish when this podcast goes live on lane space.[00:12:45] Lessio, maybe you want to go over some of the, the notes that you've been taking. Then I'll, I'll go over to Alex.[00:12:50] Alessio's recap of Llama 2[00:12:50] Nathan Lambert: Yeah, we[00:12:50] Alessio Fanelli: got a, we got a lot of stuff to run through here. I think like the most interesting things that I read from the paper. One, there's a abandoned size model. So the 7 billion, 13 billion and 70 billion made it to release, but there's a 34 billion size that didn't make it.[00:13:08] And in the safety chart, you can actually see it's like, Twice as unsafe, quote unquote. And they decided not to publish it because of lack of time to red team it. So I don't know if anybody had a chance to try the 34 B before the release, but I would love to learn, learn more about that. Outside of that, yeah, as Simon and Nathan were talking about, the data piece is a lot more obscure.[00:13:31] So LAMA one was 67% common crop, 15% c4, a bunch of GitHub Vidia books as we mentioned. We don't have any information about LAMA two, but they did mention they have a 40% larger pre-training corpus. So they've obviously been investing a lot in that. Also, yeah, the, the supervised, fine tuning was very interesting.[00:13:52] I saw a tweet, somebody asked the laou how to kill a process, and laou was like, you can't kill things. And I was like, just a process. It's not a person. So I think in, in some places, the, it might have gone too far with the R L H F but that's another, that's another interesting side, right? Like if this is the starting point and like the defacto standard for open source models, are we okay with, you know, not being able to ask how to kill a Linux process?[00:14:18] But I'm not, I'm not sure about that[00:14:20] Nathan Lambert: yet.[00:14:21] Simon Willison: I ran into that myself. I, I asked it to give me all of the animal emoji and it said that that would be disrespectful if it, if it attempted to do that, which was kind of interesting.[00:14:32] Alessio Fanelli: Exactly. So that's a, that's an open question on open, you know, it's the Joel safety question.[00:14:39] It's like, how much do we need to do before we release the smartest to the public versus what should that. The public side. The other thing is like, they should have let this GPUs burn for more. Like if you look at the, at the loss graphs, like these models are not saturated, I guess. Like they spent a lot of, a lot of money to try and train these.[00:14:56] Datasets 101 Followup[00:14:56] Alessio Fanelli: But it seems like there's a lot of work left to do there. We just did a data sets 1 0 1 episode that we released yesterday, which is already old news because now LAMA two is out and this is all the rage. But we talked about some of the scaling laws and we thought the 200 x was like the new LAMA ratio.[00:15:12] But I think this one is 275 x Sean, I think.[00:15:17] swyx: Yeah. So that's five. Yeah, 2 trillion tokens for seven B model. And that's, you know, that's up from 1.2 last time. So they, they've definitely ramped up the, the, the amount of data and they, they just refuse to tell us any of it because, well, you know, guess what happened last time They, you know, they published the data, infra red pajama went and cloned you know, line for line exactly what was in the LAMA paper.[00:15:39] So, you know, then that created, you know, red pa, red pajama model and then open lama as well.[00:15:44] Context Length 4k[00:15:44] Simon Willison: So I saw it says that the context length is up from the first lama. Do we know what the new context length is?[00:15:50] Matt Bornstein: I think it's,[00:15:50] Nathan Lambert: yeah, 4k. 4k.[00:15:53] Simon Willison: Is that likely to be higher for the 70 B model or are they all the same context length?[00:15:58] Matt Bornstein: I believe they're all the same and we have tested it a little bit and my intuition is that you can actually get more effective performance, more accuracy out of 4K rather than scaling up the way, say OpenAI have to 32 K or high. Like it's, I think it's just hard to find high quality. Training data. So it's when users actually start to submit longer inputs, performance kind of breaks down.[00:16:22] And I'm not talking about open AI specifically, but in general, and that's, that's my intuition on why you know, why meta is keeping it relatively small for these models.[00:16:31] Simon Willison: I'm kind of hoping that somebody, now that it's open source, somebody finds some clever trick to increase that. I've been playing with the Claude 100,000 a lot recently and it's pretty phenomenal what you can do once you've got that extra context length.[00:16:43] swyx: There[00:16:44] Alex Volkov: is actually a trick. It's called rope. We've seen this with a two, two line change that you can, you can make Lama forget about the context it was trained on, and there was back and forth about how effective this is and whether or not it suffers from the same dip, you know, in the middle of the context.[00:16:59] But this rope scaling trick then was verified by folks from, I think Microsoft, independently from that guy Kaiko, Ken Devrel, and I, I see some folks in the audience here who are participating in this. So apparently this applies to the previous LAMA and would likely apply to this next one as well.[00:17:17] Simon Willison: That's pretty exciting. I can't wait to, this is the thing I'm looking forward to is now that it open source. All of this stuff is go, these experiments are just gonna start happening at such, such, such a fast rate. This happened with Lamba before. You know, once you let every researcher in the world download and start tinkering with your model, people start finding optimizations and, and new tricks at a, at a crazy rate.[00:17:37] It's gonna be really interesting.[00:17:39] Nathan Lambert: So[00:17:39] Alex Volkov: I think the interesting piece here is to see whether or not the commercial license will unlock even more, or did the researchers didn't care and kinda threw the kitchen sink of everything they wanted to hack together on the previous llama. I'm thinking because it's open source commercially now companies will actually start, you know, doubling down because there will be able to then use the fruits of their labor on commercial purposes.[00:18:02] So we'll likely see[00:18:04] Alessio Fanelli: more.[00:18:05] Open-ish Source? Usage Policy and Restrictions[00:18:05] Alessio Fanelli: I think you guys use the magic word, which is open source, and everybody has a, has a different, different definition. And I know we had Tom Warren in the audience who asked the question about this. So Tom, I'm gonna invite you up to speak if you're around.[00:18:18] Simon Willison: Yeah. I'm gonna say, call it, I, I say openly licensed, not open source, because I feel like open source has a definition, this doesn't quite apply here.[00:18:27] Alessio Fanelli: Yeah, yeah, exactly. If you go, actually on my website, I wrote like a 10,000 words thing on like the history of open source licensing, and there's things that are open source, things that are somewhat open source in traditional infra, that's like the server side public license. Some of these things that like Elastic and Mongo came up with to avoid the a w s a p i compatible in quotes products that were literally just the same thing.[00:18:51] So yeah, it's, it's really curious also that the breakpoint for the LAMA license is 700 million monthly active users, which is. A lot of users obviously, but there's some notable people that go over it. So Snapchat is one company that is obviously a, a close competitor to, to meta TikTok, isn't there?[00:19:10] YouTube, by far exceeds that[00:19:13] Simon Willison: amount. Yeah. It's worth noting, but that's actually, that's not a rule going forward as of the date of the release. If you have 700 milli monthly users, you can't, you, you have to get an extra license from, from Meta. If you manage to achieve 700 million million monthly extras next week, you could still use it.[00:19:30] Like it's, it's, it's, it's that point in time that[00:19:32] swyx: matters. Yeah, at that point they should just name people. But yeah. Just to close the loop on this open source element, you know, there's one other piece of about the open source or, or the usage policy, which is you can't use it to train any other model.[00:19:44] Thou shalt not have any other models before llama. Llama is your only model that you can fine tune with, with llama data.[00:19:52] Simon Willison: I think it's more than that. This is they're protecting against distilling the model, right? The thing that everyone's been doing, like Una was trained on Chachi PT data, despite open AI having a thing in their terms, it says you can't train a competing model.[00:20:04] I don't, I'm really frustrated by this because the, the language says you cannot train a competing large language model. But what does that even mean? Who gets to decide what a large language model is? If in six months time we invent a new architecture is that's still an l l M that's covered under those terms.[00:20:20] It's, it's frustratingly vague.[00:20:22] Nathan Lambert: Yeah, these clauses are kind of bogus. We talk about them a lot of hugging base. And it seems also from a legal perspective, the things that they're grounded in, like terms of service are being walked back in kind of this digital domain. And then also it's just like unclear what is actually using the language model.[00:20:40] So all these things where people use language models as a judge, or you can just generate a bunch of interesting prompts to then modify them. It's so ridiculous to even think of trying to enforce these clauses. It's surprising to see it show up,[00:20:54] swyx: which you have to note, like in the LAMA two paper itself, they also use other company models to do their evaluations.[00:21:02] Right? Like so and I, and you know, a strict reading of the, of those clauses would not allow them from from that.[00:21:08] Huggingface Responsible AI License[00:21:08] swyx: Nathan, actually a quick follow up. Hugging face has its own license, the rail license. I think there was some iteration following this stable diffusion release. Would you, would that be appropriate for something like Alama two?[00:21:19] Nathan Lambert: Yeah, I think it's good. I don't have a hundred percent knowledge of rail. My understanding is that it's like, generally the goal is to be like commercially available with good intention and then there's kind of like, it starts to try to give leverage for people to come after bad actors using their models.[00:21:37] I, I think the commercial use of this is gonna be off the charts very soon, like at hugging face. A lot of the monetization efforts are around like trying to enable commercial use of open source language models. And the license questions have been a constant discussion for the last six months from things we're trying to build and from customers.[00:21:57] So like this is definitely going to[00:21:59] swyx: be used. Yeah. Yeah. Okay. So I don't, it's, it's do we have, we have a lot of you know, insightful people here.[00:22:07] I feel like the, the best way to organize this space is maybe to just kind of try to stick to as, as many sort of factual elements as we, as we can.[00:22:15] I feel like Nathan, since you've done the most work you've had the most time with the paper, to be honest. What El maybe sort of pick on one other sort of element of, of the paper that you, that you find worth discussing and we can kind of go into that.[00:22:27] Pretraining Llama 2 Base Model beyond Chinchilla[00:22:27] swyx: Maybe the, sort of the, the pre-training base model stuff.[00:22:30] Nathan Lambert: Like, I, I don't think there's a lot on the pre-training. The, there's definitely an important thing that makes it able to be used, which is they use, like, what is cqa? It's like cross query attention, which will make inference on the bigger models faster. I think there's kind of a asterisk that is interesting on that code and math and reasoning seems pretty.[00:22:49] Not emphasized in the paper, and that's what their kind of like market for. That's what ChatGPT is used by a lot of people on this call for. I think at a technical level, the Rh f details are the most fleshed out that we have seen. Sure. And kind of confirm a lot of the capabilities we've seen insinuated by anthropic and open ai.[00:23:11] So that was like kind of a relief for me as someone that's trying to be like, I still think this really works. And they dropped this paper is like, we really like this, which was not guaranteed. I, I have one[00:23:22] Matt Bornstein: pre-training question. And this is for you, Nathan, or, or for the whole group. Like we, we talked about it before.[00:23:27] The, the amount of pre-training data here goes far beyond chinchilla optimal and the loss curves were still going down when they cut it off. Like, are we ready to say that chinchilla optimal is just not optimal anymore?[00:23:43] Nathan Lambert: Oh, I'm ready. I never really cared about it. Like I think data quality is changing that completely.[00:23:51] It's like, I think when Gent came out, data quality standards were so different and given what the practices are now, I, it's like, what does it mean?[00:24:03] Matt Bornstein: It was a really big deal at the time though, right? I mean, it was kind of this breathtaking result that if you just ramp up training data much higher than you thought or people had been doing, you just kept getting better performance.[00:24:15] May maybe Nathan, since you're, you know, the most knowledgeable on this space, like can you just like, give us a little intuition, like when you say better data quality, like what exactly is happening under the hood that makes this possible now?[00:24:26] Nathan Lambert: Oh, they're removing. Okay. Think about all the tweets and texts that everyone sends, and we have these weird insider jokes and phrasings that we do.[00:24:37] They make no sense if you read them and your language model, like half reproduces them. So like, and like I'll say like you got got, or something that is just very confusing from like a token prediction state point of view, and then also a ton of just errors. It's like I write a blog post. I used to not take it as seriously, I've like published a blog with a half finished sentence in it.[00:25:00] It's like they would just scrape that and take it, but trying to actually get data that is complete is, is consistent, is just extremely hard. I think technical terms are like deduplication, so you don't wanna pass the model, the same text, even if it came from different websites and there's tons more that goes into this.[00:25:21] I, I don't think it's the area of my most expertise, but I think it's actually pretty simple. You just wanna put good text into the model and understanding what good text is on the internet is really hard.[00:25:34] Matt Bornstein: So you're sort of saying the reason people were using not enough data initially is cuz they just weren't good enough at cleaning it. And now that those methods have advanced so much, we're moving duplicates better, we can measure quality better, all of that. Like, like do you think we're gonna keep going up, I guess is the question like this, you know, they trained a seven B model on 2 trillion tokens.[00:25:52] Like, do you think that's like the Maxim or are we gonna keep going?[00:25:55] Nathan Lambert: I kind of like, I, I think the intuition on like what you're saying is how getting more higher quality data is making it so using more works better. I like, that's what everyone in my circles is saying is the trend and given machine learning in the last few years, I think trends tend to be stickier than most people expect them to be.[00:26:17] So I would expect it to keep going. I just kind of trust the process to continue for a lot of stuff like this.[00:26:22] swyx: Yeah. So we on our podcast, we've been asking everyone that we can possibly CAGR ask about, you know, went from two x tokens to perran ratio with Kaplan, and then 20 x with chinch, now 200 x with llama, like someone's gonna try 2000.[00:26:37] Right? We did have a response today from one of our previous guests Varun of Codium who said that they did try a thousand to one tokens, to params ratio. And it definitely gone into the range of overfitting. So your loss can continue to go down, but you're not sort of measuring overfitting in, in, in, in some of that respect.[00:26:53] So it's, it's very unclear. I would say though, you know, I, I do have visual sources like. Chin. It's not that chinch was wrong. Chinch was optimizing for a particular set of assumptions, particularly the pre-training compute budget, right? Compute optimal sort of scaling laws. And if you look at the llama paper right on the first page, I have it open right in front of me.[00:27:12] They actually criticize that and say like, you know, this, this disregards the inference budget which is critical when you're actually serving the model instead of just optimizing for a pre-training compute objective. And as things move from research into production, inference starts to become more, more of a concern.[00:27:28] Resource constraints starts becoming more of, more of a concern. And so I, I, I think it's actually quite reasonable to move on from chinchilla, which is a very important result. And, and say that, you know, we are, we are exploring very different objectives as compared to, you know, more than a year ago when Chinchilla was published.[00:27:45] Llama 2 is incomplete? Race to publish[00:27:45] Nathan Lambert: Yeah, I agree. I was just gonna say that I feel like the was going down like all of these fa reading the paper, it feels like this is a checkpoint of a much longer term project. They like readily list off things that they didn't get to but they want to continue and like capabilities or something.[00:28:03] Some of the methods seem like kind of hacks to make things work that they didn't know if didn't get to work. Like Anthropic came up with context distillation, which is a way of getting a really, the behavior of a really long system prompt into a shorter prompt essentially like, and, and they did something like this in this paper to get the P model to behave like characters for longer conversation turns.[00:28:27] And like, there's all sorts of little things that I just think meta is going to continue this and.[00:28:34] Simon Willison: So that's kinda fascinating cuz that that implies that the, the actual story here, it's the AI arms race, right? It's, it's, it's Zuckerberg saying, no, we need to get something out right now. Get it to a point where it's good enough and safe enough and then let's ship it.[00:28:46] And it's not so much that they, they, they didn't necessarily have time to get to the sort of perfect point that they wanted to get to.[00:28:54] swyx: Yeah, that is the I have asked people about this offline, and so I was like, okay, so why don't people throw a lot more compute at this? And they're like, you know, as long as you have a state-of-the-art model, you should just ship it and get credit and then wait till, like, wait a few months and then get the next version out.[00:29:08] That way you have a lot more shots on gold.[00:29:11] Simon Willison: That totally makes sense. Yeah.[00:29:14] swyx: And I was like, oh, okay. Like we are in such early stages that honestly, I mean, they spent 3 million G p U hours on this thing. They could spend 30 million in, like, obviously it would be way better. Like we're in such early stages that even these relatively simple.[00:29:27] Like don't forget Lama one was published in February of this year. We're in such a easy cycle where it, it's, it's still within, you know, the order of months to make and improve one of these things. That it's not too terrible.[00:29:40] Come for the Llama, stay for the (Meta) drama[00:29:40] swyx: I do, I guess I should also mention a shout out that Not every person who worked on LAMA two is on the paper.[00:29:48] Guerro Lampel and who's, who's one of the co-founders of Misra, the French startup that raised like a hundred million C round. Apparently worked on LAMA two and they left him out because in, they left his team out because they left Meta before this paper was published. So interesting passage.[00:30:03] Treat there. If anyone wants to go through that,[00:30:05] Alessio Fanelli: come for Alama, stay for the drama. Oh, it's hard. It's hard to read, you know, into like the, as you know, especially when it comes to like, work that then goes over source. It's always we did the work. We didn't I don't know, since, since nobody here worked at Meta I would rather not go, not go down that path.[00:30:23] Yeah,[00:30:23] swyx: I, I'll just leave a bookmark there. Okay. Yeah, but exactly.[00:30:26] Nathan Lambert: We're not in the room there. I,[00:30:28] Matt Bornstein: I, I'm for one shocked to hear that there may be drama among researchers. I've, I've never heard of that happening before.[00:30:34] Nathan Lambert: Right. Near, especially after three organizational restructures of researchers hopping, playing hopscotch from one org to another, and being in between, in between jobs.[00:30:43] I don't know.[00:30:45] swyx: All right. Alex, do you have your hand up? And then I wanted to dig more on the the preference data that Nathan mentioned. Mm-hmm.[00:30:52] Language Translation[00:30:52] Alex Volkov: Hey guys. Just to introduce myself real quick, I'm Alex. We participant in the spaces is, and my angle and the way I vibe, quote unquote vibe check models is via languages.[00:31:03] And to me, it was really surprising that they released kind of the second iteration while also knowing how much meta actually does for translation. They have very famous N L L B models, no language left behind. They released the world models that you can speak in multiple, like a thousand languages that understands, and for some reason, they're open source models.[00:31:23] They are not very strong multilingually. So we've seen this with GPT4, which was way better at multilingual speak. Claude highlighted this point with Claude two that is like way better at the blue score. I think for, for languages, and I've tried and my go-to like vibe check with these models is to, with the, especially the open source one is the ability to translate, the ability to understand the languages.[00:31:46] I've tried it with, with Hebrew a little bit. I've tried with. Very, very impressed. Now, obviously fine tuning will come and obviously people will fine tune these, these models towards different outcomes, but it's very interesting considering how much meta does elsewhere for languages and to bring the world together.[00:32:02] How much kind of this model did not focus on this, this specific kind of issue. And the, the, the second thing is also code. I know you guys talked about human eval. That's fairly low in terms of the score out of the box. And obviously fine tuning will, will, will make it better, but fairly, fairly disappointing score on, on human ev, right?[00:32:22] Fairly low coding abilities. And we've seen previously that there's some assumption that training on more code in your dataset actually gives you better kinda logic and reasoning abilities. So kind of surprised that that was fairly low. We went to chairman with these two, two examples about Lama.[00:32:40] Llama2's coding abilities[00:32:40] swyx: I'll say on the human eval piece don't count it, not just yet. So I've, I've had some dms with Quinn Slack or of source graph, and he's is you know, very actively building Cody their, their coding assistant bot. And it's well known that human eval is not a very good or reflective measure of how we use coding chatbots.[00:32:59] And so like, it, it is probably human EV emails is probably overrepresented in terms of being, being like this effectively the sole benchmark by which we value code models. We, we just need new benchmarks for code.[00:33:11] Matt Bornstein: I do think it's possible better instruction tuning will improve code performance of the LAMA two models as well, because their reasoning capabilities are actually relatively good. Not perfect, but relatively good, which makes me think there may be more code in the pre-training than it seems.[00:33:26] swyx: Well it's difficult to know cuz they don't talk.[00:33:29] We'll, we'll see, we'll see.[00:33:31] Why we want to know about the training data[00:33:31] Simon Willison: I mean, this is the thing that's so infuriating about these opaque models that don't talk about their training data is as users of the models, we need to know, we need to know how much, like if it's had code in it, all of those kinds of things in order to make decisions about what we're going to use it for.[00:33:45] So I kind of feel like you know, the, the, the secrecy around these models really hurts me as a consumer of these models, just from a practical point of view of being able to make good judgements about what the model's gonna like to be able to do.[00:33:55] Matt Bornstein: I, I do think that's true, Simon. You know, I wanna make just one defensive of Meadow, which is like, this is pretty amazing what they've released and they've, you know, given to the world, obviously it may benefit them commercially as well, but you know, it actually carries pretty substantial risks for them and actually think it's kind of a courageous act to, to release and, you know, so it, and it's the things like the training data.[00:34:20] Safety that like really, you know, when you're, when you're meta and you have billions of, of active users, like you, you actually are taking a pretty big risk with these things. And, you know, regulatory bodies have their sights on you. So I, I do think you're right. I, I just, I, you know, for what it's worth, wanna I agree with, I agree with, it's actually a[00:34:37] Simon Willison: positive thing.[00:34:38] I agree with everything you say, but at the same time, right now, I've got a whole bunch of models that I'm choosing to be to, to, that I'm trying to choose between, and I don't have the information I need to make the decision. I feel like at some point it's going to be a competitive advantage to put out a model with transparency of the data over, over what went into the data.[00:34:55] Cause people will be able to use that model more effectively. But yeah, I completely understand these strategic challenges that I'm, I'm astonished that meta went ahead with this release. I never thought they'd, they'd take the risk of releasing something like this and someone use it for something bad and now they're on the front page, all of the, all of the papers for it.[00:35:12] So yeah, I'm, I'm super excited about it on that front. I wanna[00:35:15] The importance of Meta pushing forward Truly Open AI[00:35:15] Alex Volkov: ajo. Yeah. I know from the perspective of releasing something as open source as they did previously we didn't have commercial licensing, obviously. Now the big thing is we have commercial licensing, but the amount of people, I don't know if you guys noticed, but like the amount of people who signed, quote unquote in support of releasing these models, Paul Graham and Mark Andreesen, and like a bunch of other folks, like in addition to the model, they also released kind of a counterweight to the moratorium papers and all the AI safety stuff.[00:35:41] Because there was a, an FTC pro, right? There was like some, some regulatory stuff talking about the previous releases of LAMA from, from a long time ago. And now not only they released like the, the, the, the quote unquote open source. So unless it doesn't, doesn't kick me off here. Not fully open source, but definitely we're able to use this commercially.[00:36:00] But they also released kind of a industry leaders selling like the, the, the open source is needed. And I think that. That, like, gives a very strong counterweight to the M and the keep, keep it closed and don't release kind of thing. We saw, and it's very interesting. It comes from meta specifically.[00:36:16] So in addition to the courageousness that they did, it looks like they're also kind of leading the industry in terms of like, this is how to do fully commercial again, quote unquote open source, not open source license, but this is how to release models in a, in a, in a safe way. So definitely joining the, the courage and the applauds for meta and the team.[00:36:35] Nathan Lambert: Yeah, I just don't think that like, like the cu we're not the customers of meta with respect to this model. I think they're trying to build these for their own purposes and then they have very strong, like, I think it's kind of the principles of like transparency and research that these organizations at Meta have stood by. And I think that's like the newest representation of it, more than like, and I don't think they're trying to make money off releasing this in any way. Like there is an ecosystem perspective of like where AI content proliferates, there's more creativity for their users and that enables social media and things.[00:37:08] But I think we're still pretty far from that. And it's more of like a values and internal research and development tool for themselves. Like is there a way for them to make money directly off of this NPCs[00:37:19] Alessio Fanelli: and the Metaverse. But I mean, I don't know.[00:37:23] swyx: Well, so we, we, we last hosted one of these emergency pods, I think maybe two, two pods ago.[00:37:28] Which was I think in May where we did our when the No Moats memo came out from Google. And we actually talked a little bit about what an ecosystem around a language model looks like when you have stackable loras customizing and fine tunes that are based on top of an existing base model that is well known.[00:37:48] I, I think that might be part of the strategy there. You know Facebook is also well known for releasing, I guess, PyTorch and, and React. And, and those are very well, like, they don't make money from that directly, but they definitely do benefit from the ecosystem that has sprung around it, that, that essentially represents a lot of free development from, from the open source community.[00:38:07] Simon Willison: I think there's a lot to be said. The fact that meta AI are at the very heart of openly licensed language model research, and that's because of Lama, you know, Lama came out and it kicked off this immense tidal wave of interest and of activity with meta ai right at the very center of that. And in the world that we live in right now, being at the very center of all of the research and innovation happening around language models feels like a really valuable place to be.[00:38:31] Llama 2 as Enabler of Startups[00:38:31] swyx: Yeah, it, it, it really is. I I, and maybe we can go to a little bit to, to Matt again. One thing I wanted to get your thoughts on that, you know, I don't know how long you have with, with us, but is the impact on the startup ecosystem, right? Like how, how big of an enabler is this? Or does this, I guess just commoditize everything to a point where, you know, everyone's just rappers.[00:38:50] Matt Bornstein: I think it's a really, really massive deal. You know, we've met with. Conservatively hundreds of AI startups now maybe, maybe thousands. We'd have to go back and look and, and, and I sort of alluded to this before, but the really big dilemma is do I train my own model or do I just use something off the shelf?[00:39:15] And we're really, we're increasingly seeing that the answer for almost everybody is kind of a hybrid approach. We're seeing increasing number of startups, basically triage. Their AI workloads where if things require, you know, really high levels of accuracy and you know, human like text generation, GBT four is the only answer.[00:39:38] But many queries or workloads actually don't require that, right? So you can kind of scale down and say, you know, for a really simple query, I can use, you know, an open source model off the shelf for something in the middle. I can fine tune for various tasks and then you can get pretty sophisticated about what you route, where all of that is only possible if we have commercially usable, really high quality language models and especially ones that have been efficiently trained such that latency is, is, is low and cost is relatively low.[00:40:09] So I think what we're gonna see happen is there's gonna be a, a big push for startups to use. Lama two models and, and other open source models that have similar levels of performance. Fine tune it in ways that actually work for specific tasks, right? Not for specific data, like I think that was sort of a head fake, but for, for specific tasks and, and really be able to build more defensible businesses that way.[00:40:34] You know, this, there's nothing wrong with using OpenAI. That's fantastic, but it's probably not good to make that a hundred percent of your business. And, and a lot of founders are doing that now. So, so that's why I think this is, this is such a huge deal and, you know, the, the progress just today has been amazing.[00:40:51] Like, there's gonna be, by the end of today a number of hosts where you can just easily use The Lama two models, like right outta the box, you know, replicates one that we work with, but there there are others as well. You know, you can already run it on your local computer with two bit precision, which is kind of crazy if you stop and think about that for a second, that with two bits you can actually run a super advanced language model on your own computer.[00:41:15] So I, I think I, I just think this is a huge, huge deal for startups and I think if you're a startup founder working in ai, you know, you, you really should be taking a look at, at open source models now and seeing how they, how they can be used to, to kind of deepen your moat and, and, you know, build a really great AI product.[00:41:34] Where you can try Llama 2[00:41:34] swyx: Right. So me, I would like to help fill in the blanks. So apart from Replicate, it looks like hugging Face has also launched an inference endpoint for that. And as far as I know, it's one of the only few ways to try the 70 B model off the shelf. I think Base 10 has also maybe put something up. And then for the, for the two bit quantized model, you can look at the G GML ecosystem.[00:41:55] Do you need dataset transparency if you have evals?[00:41:55] swyx: Yeah. And, and then I also wanted to recognize one of the other respondents in our chat, we have a little, little comment window here. ARD Doshi was responding, I think, to Simon. And, and I, I did actually have a pushback, right? Like, we don't have to know. The full data sets of of Lama as long as we are able to eval for everything that we want to know about.[00:42:13] I think we actually have to live with AI becoming more and more of a black box. Even though the, the mo the the weights are open I mean for me it[00:42:20] Simon Willison: comes down to model competition. If I have two equally capable models and one of them, I know what's in it, them, I don't, and I'm gonna use the open, the, the, the more, the more transparent one.[00:42:30] And I'm hoping, because there are so many models competing now, I'm hoping this becomes one of the factors that models compete with each other on[00:42:38] swyx: I'm, you know, dataset non-transparency I guess is like an emerging theme because like, it's not like we had that for Falcon either. So yeah, we can[00:42:47] Simon Willison: hope for it and that's a huge problem, right?[00:42:49] Falcon, if you ask Falcon about human rights abuses in, in the Middle East, it has some very different opinions and I want to understand why. I want to know how they got it to, to do those things.[00:43:00] swyx: Yeah, yeah, exactly. Yeah, we won't know. And we can, all, we can, all we can do is ask for more transparency there.[00:43:06] But I do, I do support the you know, the concepts of building a business on open source models. Because open AI will not randomly deprecate your models on you, you know, every three months. And I do think that for people who want a certain level of stability and are okay with trading off not being state of the art in three months I think that is a perfectly reasonable tradeoff.[00:43:26] >$20m cost of Llama 2 is primarily preference data collection[00:43:26] swyx: Okay. I wanted to go back to Nathan A. Little bit and talk a little bit more about the preference data and the R R L H F data. So you estimated a 25 million cost for LAMA two. And as far as I can tell, That's, that's actually primarily data collection, not GPUs.[00:43:46] Nathan Lambert: Yeah. This is based on kind of our pilot contract to do preference data collection at hug and paste cuz we can give, like we're collecting a small amount of data in a similar way and if you do a back of the envelope cost calculation and scale it up by whatever, like 10 or a hundred x that what they did, then you get towards this 20 million number and it could be higher depending on how many flags they end up using in their data.[00:44:12] So I think what they did was safety is pretty interesting. So they like separated it and collected metadata and that means they could also collect other metadata during the process. And as you kind of add more knobs to the preference data collection because it takes longer for people to do the task and the cost goes up.[00:44:29] So I think pretty safe to say order of 10 million, especially given, because that's what was rumored with open AI around ChatGPT and everything like that. So, It is not a shock at all to me. And, and is the[00:44:43] swyx: focus on multi turn significantly higher or, you know, comment worthy I guess?[00:44:49] Nathan Lambert: Not really. So generally when doing on setting this up, it comes down to per pro, like how many tasks the workforce is gonna do.[00:44:58] And you could do an instruction prompt, which is one turn, or you could do a four turn chat and that would, you'd generally be able to trade off the number of labels that you get in that respect. So I think the multi turn is more because open source data sets don't contain a lot of that, which is something that we found in, in our work as well.[00:45:16] And they did that because they needed the model capabilities and they needed to train a preference model that can do that. And I agree, I, I think they must have figured that out months ago. Cause this also takes a lot of time how it works generally. You can see this in the paper, how they say they have these RH F versions and generally what happens is, You sign a contract and then these people sit you down and they're like, we are gonna try to do this over batches and we scale up the amount of data we're sending over time so that we can do calibration.[00:45:43] And each batch you get some data from the vendor and then you look through the samples and you see what you like and you see what you don't like and then you change it going forwards. And what they did is they took those batches and they trained a model iteratively and then they saw what their model needed and they went back to the vendor to say, okay, we need more data in this regard to improve things.[00:46:01] So it was a really hands-on, really involved process. And I would guess it takes weeks to months for them to get all this data from a vendor. It's definitely not something you can just get fast and honestly, a potential reason why code is not as good is because way harder to get code data in this regard.[00:46:20] So all the task companies are extremely limited in people that know a lot about code. So you get way lower throughput for getting preference labels in code and getting that kind of preference data.[00:46:33] Do we even need human annotators?[00:46:33] swyx: That makes a ton of sense. Anyone else have any other commentary, I guess, about the additional data collection? Like what I sense now is that they're, there're there's an inc, there's a shift away from, I guess the pre-training data sets which are more opaque but also equally well understood towards more of this preference in our HF data.[00:46:52] Alessio Fanelli: Yeah, they, they spent a lot of time in the supervised fine tuning data too. They actually compare human vendors to some of their models and they were like, yes, we should just use the. Human annotators or like reinforcement learning.[00:47:05] Nathan Lambert: I'll tell you what, yeah.[00:47:07] swyx: The annotators are using the models anyway, right?[00:47:09] So it's just Yeah, exactly.[00:47:10] Nathan Lambert: Models all the way down.[00:47:12] Models Rating Models[00:47:12] speaker 1: I I[00:47:13] Alessio Fanelli: think also the other, I mean, to me, some of these things are like chemy, right? They're like, we stopped annotating super fast and fine tuning data at 27,540 annotations. Why? It's like, it seems like such a arbitrary number, you know, that I feel like that's gonna be one of the next research areas, you know, figuring out where the, the right limit is.[00:47:35] Do we have maybe, do you know if there's any really good again, like open source? Open source, like datasets for posts, not pre-training, but like a fine tuning then R lhf. Because I think one of the big moments with Uber pajama was like, okay, we can take the LAMA one data mixture, use all the open source data sets and just run GPUs at them.[00:47:55] How do we get to do the same with the post-training flow?[00:47:58] Nathan Lambert: Okay, so you were breaking up a little bit for the question. So I, I'm gonna say what I think it was, and if it wasn't, you can jump in and clarify. So I think it's like, how do we recreate this supervised training data set and like, can we do anything else with it after the fact?[00:48:14] Yeah. So Gen, this is another thing that we've started doing, and I think that what, so the open source equivalents are something like Open Assistant created a really high quality dataset, artifact, and then the recent trend is for this thing that's like called Uncensored dataset, which I think is this totally silly name.[00:48:34] Because really what they're doing is they're removing instructions like as a language model, I don't wanna say this. And therefore when you remove these things, the model gets more helpful. So that's just gonna be the new type of data, which is just clean response on instructions with really strong distribution control.[00:48:50] And the thing is about recreating this is that it's. Hard to create a diverse set of tasks. So what they are essentially paying money for is someone to make sure that you're not getting a whole bunch of the same poems or something. It's like getting 27,000 weird creative tasks that don't all overlap with each other is why you have to pay a lot of money for it.[00:49:11] Rather than saying, oh, we have 250 people on this call, it's all due, 10 of them. And then that's a solid start. Like we would just have a totally misshape in distribution and it wouldn't be that useful. So I think even in, so you can go look at like instruction, BT and other papers like this have breakdowns of what that instruction data, the supervised, fine tuning data actually looks like.[00:49:33] But actually creating it is pretty hard. And I do think that the vendors provide a really high quality amount of data, but their point about the models being able to create it is also really true. So it's, it's, it's pretty borderline right now. And anthropic stop using that in their, in their future work.[00:49:50] So like, Philanthropics new base models are just good enough at responding to instructions where they don't need to do supervised, fine tuning. And that's like in the constitutional AI paper. So it's like, I don't think that's the place to invest time. It's much more on the preference side to get the RL HF model and to get these preference models going.[00:50:09] So then maybe you can even do creative things like constitutional AI and stuff after that.[00:50:13] Alessio Fanelli: Yep. So if you wanna do work in open source today, you think you're better off contributing to this site versus like trying to train another yet another model.[00:50:24] Nathan Lambert: Yeah. There's no preference models out there and it's astonishing to me, especially given that meta's papers like, oh, we use a ensemble of two preference models.[00:50:32] The thing that I wanna see is them do or someone do, is like take a base LAMA model and then also train another preference model that's for code and then try to do RH F where you like have a prompt flag for all the. All the code questions get rated by their own preference model as well and see what that can do because they already broke it down into like instruction helpfulness and safety.[00:50:52] Mm-hmm. It's like, why can't we add another one? It it, it's so obvious that I'm surprised it didn't, it, it just makes a lot of sense. Seeing it in the paper. I was like,[00:51:02] How to get Code preference data[00:51:02] swyx: stoked. Yeah. This, this conversation gave me a bit of an idea for essentially llama stack overflow. Like you, you imagine like Stack overflow with with like sort of llama at, its at its base, but then like, it's not very good at coding, but we can actually do ratings on like, you know, preference ratings on, on answers and, and, and entire conversation chains.[00:51:21] And at, at some point, we'll, we'll accumulate the, the code DA dataset that we need to find here in lama. That would probably do it.[00:51:27] Yeah,[00:51:28] Nathan Lambert: we, we've like, there's challenges in base models and how to execute code to get feedback and stuff, but, We've seen early experiments and like we've worked on one, funnily enough that was called Stack Lama. We like did a, like a nice experimentation of that hugging face and it's, it's out there, it's ready for someone to invest more time in it and do it.[00:51:48] I think especially now that Llama2, I'm like, Lama two's gonna be easier to work with. It's just better language models are a little bit easier to[00:51:56] swyx: steer. Absolutely. Alex, you have and Mars catalog you, you just joined and I I am sure you have a question. Yeah, go ahead Alex.[00:52:04] Llama 2 Finetuning Ecosystem[00:52:04] Alex Volkov: I, I, I just want to complete down what Nathan said.[00:52:06] It's going to be easier to work with because the ton of the ecosystem and the different kind of. Things that the first Lama opened up is now there, right? The G GML is there, all the, for all and, and the Pinocchio browsers, like all different things. How to run like Lama on your laptop already kind of existing.[00:52:25] And now we're just gonna see the commercial folk come in. The, the folks for, for whom working on this actually needs like a dollar sign afterwards. And now they'll be able to also participate in this. And we've seen this already. I, I dunno if you guys talked about this or not scale. AI apparently had early access to this and now released a a, I think open source, like full open source toolkit to fine tune mosaic and which is now Databricks also chime in, but it's now super simple to fine tune LAMA on their you know, infrastructure.[00:52:54] Even though they have the, the TT models, et cetera. They still wanna support LAMA and those Yeah, like the ecosystem exists and I think Nathan's completely right. It's gonna be easier to[00:53:03] Nathan Lambert: use. Easier to find tune. Yeah. Like hugging face. I think every. Library, like all these people at Hugging and Face, were working super hard this weekend to make day zero support for Llama2.[00:53:14] Like Transformers, pft, T r L, for like all these people put in the hours to make it's, it's there like this week it's. Like people are doing this now instead of talking on a podcast, they're fine doing this thing. I'm sure that,[00:53:28] swyx: For, for what it's worth I did actually look into the scale thing because I thought that was kind of interesting, their announcement.[00:53:33] They never said that they were directly used at Llama2. Perhaps there's, they're not allowed to say so. They all, they say scaly, I is proud to be a meta launch partner. We're launching a platform for customizing lms, blah, blah, blah. And, and obviously, you know, you know, that scale does annotation, so I think it's just heavily implied.[00:53:51] But I don't think they're allowed to say,[00:53:54] Simon Willison: I, I've got,[00:53:56] Nathan Lambert: yeah, surge announced they did the surge device data. At least I I think they did more of it too. Go ahead.[00:54:02] Hey Apple: Llama2 on Metal pls[00:54:02] Simon Willison: Quick hugging face Transformers question, I really want to run LAMA two on my M two Mac using metal. And so it takes advantage of the GPU integration and the M two.[00:54:12] Could somebody please figure out how to do that with hugging face transformers, then publish the world's most straightforward how to do this document because I have not managed it yet. And I think that would be a huge capacity increase for, for all sorts[00:54:24] swyx: of people.[00:54:24] Nathan Lambert: Yeah. Pedro's at hugging face is working on that. At least integrating these models with Apple directly is fantastic. I agree. I agree. We agree. There's[00:54:38] Russell Kaplan: also a project called llama cpp that hardware accelerates for the M two for the llama one. So I'm sure they're gonna be updating that for the new models as well,[00:54:49] Simon Willison: working mean on the cpp.[00:54:51] But I've, I've not seen it run metal yet. I need to, evidently I haven't checked the reading in the past few weeks.[00:54:58] swyx: Isn't it, as long as it's in G gml, it works, right? Yeah. And those are[00:55:01] Alex Volkov: the converted models in G GML format. We were able to run one. You guys should split it between CPUs and gpu and I don't know, in the audience, we LAMA two seven B in G gml and[00:55:13] Nathan Lambert: run really fast.[00:55:15] Simon Willison: Fantastic. Yeah. Again, if somebody wants to be really useful, publish a nice detailed step-by-step instructions, they're getting that working and I will benefit from it and so will load of it. I don't want to do it myself. I want somebody else to, to figure it out[00:55:26] swyx: for me. Yes. And, and Simon's, Simon's very good at this.[00:55:31] You can just kind of copy and paste the, the kind of tutorial quality that he does. That'd be great for all of us. Thank you.[00:55:36] I wanna recognize Anton, who is joined. Hey,[00:55:39] Nathan Lambert: stranger.[00:55:40] Anton Troynikov: Hey, Swick. How's it going,[00:55:41] swyx: man? It's going well. We're very excited about open source models. What you got?[00:55:46] Anton Troynikov: Yeah, I mean, it's an exciting time, right?[00:55:47] Llama 2 and Chroma[00:55:47] Anton Troynikov: I got asked almost immediately, what does this mean for chroma and retrieval and all the other things. We're in the process of benchmarking and evaluating. To see if it's actually suitable in the sort of retrieval augmented generation use case. Intuitively we have this idea that lighter weight models want to perform well because you don't need so many weights for all the facts.[00:56:08] You just need them to be reasoning machines. So yeah, we're excited to be trying that out. We'll ship results as soon as we have them available.[00:56:16] swyx: What evals do you look at for models as reasoning machines?[00:56:21] Anton Troynikov: I mean, there's plenty of retrieval, augmented generation benchmarks out there. The one that I usually run as a quick test is the SciQ data sets, the multiple choice question answering with distractors and supporting paragraphs.[00:56:33] Ah, but there's, you know, there's entire batteries of these tests. One of the things that we're actually looking at doing at chroma very soon, and we've been speaking to the AI research labs about this, is nobody's really got benchmarks that are relevant to production data. The benchmarks that exist are very academically oriented and fairly synthetic.[00:56:51] So they consist of, you know, crowdsourced exam, answer question answers. They consist of sort of this really document retrieval oriented thing where it's like, find a document that's relevant to this query, but production use cases don't always look like that. So we're actually looking at, you know, community sourced benchmarks that, that focus much more on the what, what the real data actually looks like.[00:57:15] swyx: Yeah, totally. The only one I can think of that is, I guess the most prominent one is the open assistance dataset that is gonna free and clear of any usage restrictions stuff. Yeah, I mean do would you, yeah, I think[00:57:27] Nathan Lambert: so.[00:57:28] Anton Troynikov: Usage restrictions, I think, I think for evaluating models, there are very few restrictions for use of these data sets.[00:57:36] For benchmarking, it's very few restrictions for training. There is for sort of commercial purposes, there is, but for the case of like, does this model work well in a retrieval context, there are very few usage restrictions.[00:57:48] Nathan Lambert: Got it.[00:57:49] swyx: Amazing. Who else has questions or topics that you wanna bring up about LAMA two and generate?[00:57:55] Open Source MoE model?[00:57:55] Alessio Fanelli: One thing that I was thinking about is in the benchmarks they compare to G B T for, but if what George Hotz said on the podcast was right and should be D four is like eight attention heads. I wonder when people are gonna get eight, you know, get a LAMA two mixer expert going and benchmarking that.[00:58:12] Maybe it will be better. I don't know.[00:58:15] swyx: Yes, there, there is a little bit of a playbook that has been published out there, so I mean, it, it takes more skill than I, I have, but I'm sure someone else, else out there is currently working on it. I think that the Chinese universities have, have made some interesting progress there.[00:58:28] Yeah, Simon, and then we'll go to Mars.[00:58:31] Llama 2 using tools[00:58:31] Simon Willison: So we talked about the we talked about retrieve augmented generation. The other thing I'm excited about is tool format, right? The the thing where it can call functions, essentially Uhhuh and that's mentioned in the paper. They mentioned they benchmarked along that, but, but I didn't get a feel for something that was really good at, the thing I want is I want basically exactly the same APIs, open AI functions, but I want it to run off of Llama2.[00:58:53] I think that would be, that would open up all sorts of opportunities.[00:58:57] Nathan Lambert: They, they said that that capability was emergent and they didn't train on it. There's a line in the discussion where it's like, oh yeah, we got some tool performance where we didn't train on it. So now we can all go fine tune on it and it should be easier.[00:59:10] Russell Kaplan on Scale AI's Llama 2 plans[00:59:10] Anton Troynikov: We got Russell Kaplan in here from the space, from scale ai. I think we wanna bring him up. I think he's got a few interesting things to say about how scale is thinking about these things. I know that they were mentioned here before.[00:59:20] swyx: Hey Russell.[00:59:21] Russell Kaplan: Here you go. Great. Yeah, no thanks. Thanks Anton. Yeah, we were, we were super stoked about the LAMA two release. Yeah, we put out a, an open source library LM engine for folks to fine tune and serve LAMA two and other language models whether hosted by scale or, or on their own infrastructure.[00:59:37] And I think generally at scale we're looking to start doing a lot more open source stuff. So you know, one of the next things we're gonna be doing is starting to fine tune LAMA two on interesting domain specific data sets that we create, or, or problem domain. So Anton you mentioned not sure how well it's working for retrieval.[00:59:55] You know, we'd love to just like put together a data set that we could use to fine tune these models to be good at retrieval. I think we have one planned out for SQL right now. Potentially other tool use. So yeah, I'd be really curious, you know, hearing from the audience. If there are sort of requests for, for good fine tunes of LAMA two or if anyone, you know, already has that data, you can just clone our repo LM engine and and try it out.[01:00:17] Simon Willison: So I've got one for you. I want a clone of chat GP PT code interpreter built on top of LAMA two, which I imagine would require quite extensive fine tuning. But my good, I mean we've talked about this recently, how chapter code interpreter really is a next level AI tool. Being able to run our own version of that against LAMA two would be incredible.[01:00:35] Yeah, that would be, that would be great.[01:00:36] Russell Kaplan: I, yeah, we do, we do, we do a lot of code sort of data acquisition right now, so I think that's definitely in the wheelhouse. But yeah, that's a, that's a good idea to,[01:00:45] Anton Troynikov: to try out.[01:00:45] Code data acquisition sounds so sinister. Russell,[01:00:49] Russell Kaplan: You know, it takes, you gotta, you gotta write a lot of code. Write a[01:00:52] Matt Bornstein: lot of code. Yeah.[01:00:53] Russell Kaplan: I think we have something like 350,000 people all around the world who are sort of helping with this stuff. And within that there's, you know, a lot of domain specific expertise.[01:01:01] Scale annotating code?[01:01:01] swyx: Is there a way that like, so we were talking before you joined about scale acquiring, I guess preference data from developers rather than I guess the, the standard annotators that you have. Is this a, is this a, is this a need or focus that you have? Is there a way that we can help or Yeah. How do we crowdsource this?[01:01:18] Yeah, no,[01:01:19] Russell Kaplan: definitely. No. So, so one of the interesting things has just been for, for our business where, you know, we do a lot of the R LH f labeling for, for all the companies training these foundation models has just been that the level of expertise required has gone up tremendously. Right? So we have a lot of our crowd now it's, it's really domain experts in.[01:01:38] Specific areas, whether it's programming in a particular language or people who have, you know, passed the CPA or people who have passed the bar or licensed in some profession. That's really been where a lot of our sort of growth has been. And so, yeah, I mean, if anyone is a programmer and wants to kind of infuse their knowledge into the AI, that will power the rest of our, of our society increasingly over time.[01:02:01] You can, you can just go to scale.com and and sign up to, to start help help[01:02:04] Nathan Lambert: programming.[01:02:06] Immortality[01:02:06] Anton Troynikov: Another, another benefit of this is by the time we have ais strong enough to simulate entire human beings, your data will already be in them. So you'll be resurrected and[01:02:15] Nathan Lambert: get to live forever in the afterlife.[01:02:18] swyx: Indeed, we are the first immorals. It's the way to achieve immortality. Yeah. You know, immortality take it. It's yours, but it's not on the battlefield. It's editing Wikipedia. That is that is immortality.[01:02:29] Running Llama on your phone[01:02:29] swyx: Mars, you had your hand up. Hey, really[01:02:31] Whole Mars Catalog: been enjoying listening to this conversation. I think it's such an exciting day with LAMA two and the commercial license.[01:02:39] One of the things that I've really been excited about, and I think Qualcomm made an announcement with Meta and they said they're going to be looking at optimizing it for Snapdragon hardware, accelerating it. I think one of the most interesting things about these open source models, especially now that you have a commercial license, is actually running it on your laptop or even your smartphone.[01:03:03] You know, maybe the 7 billion parameter model and the kind of use cases that opened up, that opens up that, you know, just weren't there a few months ago. I was wondering if people had any thoughts on that and what we might see in that area.[01:03:17] Nathan Lambert: Meta just gave Tipco a huge softball for Apple to fix Siri, and they still hate each other.[01:03:26] Simon Willison: So I've been running the Qna seven B on my iPhone for a couple of months, just as a, mainly as a demo. So I could just shove it people's face and go Look, my phone's offline. And it's still writing me terrible poetry. And I have to admit, it's fun. I've not yet found use cases for that quality of model for, for when I'm offline.[01:03:44] And maybe I'm just not being imaginative enough. My, my hunch is that models that are smaller like that, that can run on your phone are much more interesting if you combine them with retrieval, augmented generation or, or tool use. So on. And just as a, a plain sort of chatty PT style language model, I've not yet found many practical uses for it.[01:04:02] I'd love to hear from people. Oh, that's not true. I use it for brainstorming occasionally if I want to come up with a name for something that's like I used to dread naming things. Now I, I'm fine with naming things cause I get a language model to brainstorm for me. But one on my phone is good enough to do that.[01:04:16] I've had it come up with some names for things for me so far.[01:04:18] Nathan Lambert: We talked about evaluation a lot. I've used it for naming and I've also used these models to kind of generate evaluation prompts, which is kind of a different way to do it. It's like come up with some hard python coding questions where you put a bug in this type of function and like, I'm not gonna come up with that on my own.[01:04:36] Yeah, it can be a really[01:04:37] swyx: useful spot check, I guess, or I dunno, men mental augmentation tool, whatever[01:04:43] Nathan Lambert: we call that.[01:04:44] Sama [01:04:44] Anton Troynikov: So can we, can we take a minute to do some kremlinology here? What's the deal with like, friendship ended with Sam Altman? Now Mark Zuckerberg is my best friend with Satya. I wanna, I wanna get into that[01:04:55] Alessio Fanelli: side.[01:04:56] I was smiling a lot more in this picture with Mark than with Sam. That's what I noted. But wait, there's[01:05:01] swyx: the picture. What?[01:05:03] Alessio Fanelli: Satya posted a photo with, with Mark and he was like just laughing away. And then I looked back at the one that, remember the one you posted, Satya and Sam together, and I think the bill conference or something with[01:05:15] Anton Troynikov: Satya, Satya, Sam, and Sam's nipples.[01:05:17] Simon Willison: Yes.[01:05:19] Alessio Fanelli: And say Satya was not smiling as much. I don't know. But I, I really wonder what that does to the, you know, open AI does have to pay back a lot of money to Microsoft stuff. It's[01:05:29] Anton Troynikov: kinda, it's kinda crazy that that a Azure is the launch partner cuz Open AI is exclusively running on Azure, Azure hardware.[01:05:36] This is a very, very curious move. Right. And I, I can't really disentangle it. Given sort of the scope of Microsoft's investment in OpenAI is entirely in Azure credits. Like one interpretation of this move is that they've already got OpenAI locked in. Right. They're not going anywhere. So might as well get the other, you know, contending models, right?[01:06:02] If, if you're, if you're Satya, how are you thinking? The only thing that we know for sure at cruise value in this environment is owning compute, and that's what Microsoft[01:06:11] swyx: has. Yes. But AWS is also a launch partner, right? What does it mean to be a launch partner of an open source model? Like if you can run compute, you can, you can run it.[01:06:20] Alessio Fanelli: I think that's the, that's the main, the main question. Yeah. But I think like Microsoft is clearly, you know, happy to be involved. To them, it's like a yes. Their first equals exclusivity just one way, you know, it's not a two way exclusivity, so they don't, that's whatever. The other thing is[01:06:35] speaker 1: this, this will probably increase the demand, the compute demand on Azure from all of their enterprise customers, right?[01:06:41] So, you know, whether they're selling compute to OpenAI or all of the other enterprises they work with. Having more models available that, that everyone's using should, should just kinda[01:06:50] Matt Bornstein: keep growing that business. Not to mention, I[01:06:52] Russell Kaplan: think a lot of their Azure customers probably have significant concerns about privacy, about putting sensitive business data through this and being able to just run inference on your own hardware that you control probably is more appealing to them in some cases than running REST API and calling out to open AI's infrastructure.[01:07:11] Azure?[01:07:12] Anton Troynikov: Well, they've got, they've got Azure endpoints for the open AI models. I'm, I'm not that, I'm actually not quite up to speed with the privacy model there, but my understanding is there's not really much difference.[01:07:25] Simon Willison: My hunch is that it doesn't matter if it is what? What matters is, is what people feel.[01:07:29] It's the vibes. And you see so many of these, so many people, so many companies saying, no, absolutely no way we would pipe pump any of our private data through somebody else's model. Even if they say they won't use it for training, which they all do, but whereas I guess maybe they're okay with pumping it through as through Microsoft as you, but at least it's on our own, like GPU reserved instances.[01:07:51] Maybe that's what's going on here. There's so much paranoia around this space at the moment. Yeah, a lot of the[01:07:55] Russell Kaplan: details come down to can you run it within your own virtual private cloud? I, I wish, I wish we could close enterprise customer security requirements on the vibes, but at least in my experience at at scale people do, you know, there there's some compliance function somewhere in the organization that has to sort of check the boxes that you're not, you know, gonna get screwed on later.[01:08:15] And so that's definitely been one of the big drivers of people looking to self-host their own open source LMS more and more.[01:08:25] Alessio Fanelli: Yeah. And the other thing is that they did not use any Azure compute to actually train the model. So if you go in the paper it mentions they only use their super cluster and their internal production cluster.[01:08:35] So no Azure we use to train it. I guess it's just the inference partner. Yeah, so I mean, going back to the point of they just want GPUs to run. It's not about this is the best GPUs that we use. They didn't even use it.[01:08:48] Meta "Open Source" Leadership[01:08:48] Matt Bornstein: I think what's really interesting[01:08:49] speaker 1: about, about this release is that, you know, for, for a while people have been talking about how oh, is meta behind in, in ai, generative AI and language models. And, and you know, I think Roone had a tweet that was like, the best open source model sounds a lot better than the fifth best language model.[01:09:06] And it's actually totally true. And, and I actually think that that companies, you know, if you are behind, if you're not in first place, if you, if you open source stuff and you just sort of get the community using it you can, you can get a lot of goodwill,[01:09:18] Nathan Lambert: get a lot of adoption and actually really move[01:09:20] speaker 1: the industry forward.[01:09:21] So yeah, really cool to see Meta sort of put this out and I think, I think it will also spur a lot more open source from a lot[01:09:28] swyx: of other companies.[01:09:28] I fully agree. I think, I think this is something that we've been very excited about. We heard, we heard some bes about it a couple months ago and then you know earlier this week or I guess last week and now, now it's fully out. Okay. Maybe I'll do just a round for predictions.[01:09:43] What happens next in open source models over with Lama.[01:09:46] Prediction: Finetuning => New Use Cases from Internal State[01:09:46] Nathan Lambert: I'll go first. I'll[01:09:47] go[01:09:47] Anton Troynikov: first. I think the first thing that needs to happen here is the community will actually get the model into its hands and find out its true capabilities. Benchmarks only take us so far. Once that has happened, we're gonna see an extensive sort of period of fine tuning where people are going to apply it to their particular applications and, you know, keep, keep pushing the envelope here and then if it is sufficiently capable, I actually think that we might find new uses for these models that we don't find in rest APIs served ones because you can get at the internal state.[01:10:16] Right. The thing that I'm always thinking about obviously is embeddings and internal states and, and like modifications here. And I think that there's actually a great deal of interesting research and engineering to be done by looking into what's happening in these models live, especially a sufficiently capable one, which we can do reasoning.[01:10:32] And so I'm particularly excited about that. I'm particularly excited about having something at least sufficiently capable that we can start to reason about because the entire research community has access to it rather than, you know, behind a closed wall inside some of the[01:10:45] Nathan Lambert: bigger AI labs.[01:10:47] swyx: Anyone else? Simon Nathan?[01:10:48] Nathan Lambert: Yeah, I, I would mostly just double down on that and I could comment on how remarkable the collapse of kind of NLP research as it was, has been onto open AI APIs.[01:11:01] And this is an opportunity to reset some of that dynamic where so much academic work, which is fine tuning open AI models. And I was like, oh, sorry, we nuked all your fine tuned models and things like that. Like from a values perspective, this is huge for research to kind of proceed as it was meant to be in a way.[01:11:23] And that is wonderful.[01:11:24] Prediction: Llama Toolformer[01:11:24] Simon Willison: I'm looking forward to the first fine tunes. I think like alpaca is what unlocked llama. I can't wait to see what people do, especially since everyone's already amped up and ready to go. So I think it'll be fascinating to see what the, how those start shaping up the next few days, few weeks.[01:11:38] And yeah, I want to see people, I want to see the applications. I want to see people figure out retrieve augmented generation. I want to see people figure out if it can do to tool format, all of those things, especially the tricks which make the sort of smaller the seven B models able to do, solve interesting problems.[01:11:53] And I think this is gonna happen really quickly. You know, we've got so many more people who know how to work with these models today than we did when Lama came out back at the end of February. So I'm expecting that to just be a whirlwind of activity starting about four hours ago. And yeah, I can't wait to see what happens.[01:12:09] Prediction: Finetune-for-everything[01:12:09] Simon Willison: I, I totally[01:12:10] Russell Kaplan: agree. I think, I think there's gonna be an explosion. Of domain specific and use case specific fine tunes. And I think that, you know, the sort of first order effects are gonna be pretty clear on, you know, this different industry, this different domain. Everyone is gonna start putting out these domain specific fine tunes, not just the companies themselves doing it for their own use case, but you know, they're like, as someone said, like alpaca sort of made llama to or made llama accessible.[01:12:36] We'll have something really similar, but for each category of application. And then I think the second order effect that's really interesting to me is I think tool use and agents are gonna get really good. Right now. People are using, you know, sort of off the shelf untuned language models to try to build agents have them use tools.[01:12:57] But if you, if you're building a, you know, an application and you just need to use that one tool really, really well, Now you have suddenly a G P T 3.5 class model that you can fine tune exclusively to that tool. It's gonna work really well. And I think that the, you know, the barrier to, to utility is so high for these tool use real world applications because of this sort of problem of exponential compounding of errors over long chains.[01:13:24] But if fine tuning works well for that, I think it's gonna be a really big game changer.[01:13:30] Predictions: Llama Agents[01:13:30] Anton Troynikov: I am so bullish on agents, like I'm well aware that they're nothing but toys today. Although I can think of a couple of practical use cases, including in the fine tuning context. Russell, we ought to talk about this actually later, but that's a really good point to my mind that sort of having an easy to find train model for your particular agent use case is maybe going to make these things more useful than they are today.[01:13:51] I'm, I'm very bullish on that. I'm hopeful and of course cuz Koma builds memory for agents. It would be great for us to.[01:13:57] swyx: All right. I think unless you dunno if you have any predictions. I, I, I think I'm kind of out. You guys are definitely taking all the ones that I was gonna say. Yeah.[01:14:05] dP(Doom)?[01:14:05] Nathan Lambert: Wait, wait, wait,[01:14:05] Anton Troynikov: wait, wait. Before, before we sign off here, let's go around the, let's go around the room. Probability of AI doom improved or made worse by the release of LA material.[01:14:14] Nathan Lambert: Let's go.[01:14:15] Simon Willison: I couldn't care less. I don't care about the doom scenarios. I care about building stuff with, with what we've got.[01:14:22] Nathan Lambert: So,[01:14:22] Anton Troynikov: so none, it has not moved[01:14:24] Nathan Lambert: your needle. No.[01:14:25] Simon Willison: My, my needle is, is stuck on the sort of metal, maybe 5%, but, but not worth thinking about. Too hard.[01:14:31] Anton Troynikov: All right. Five, 5% doom. I'm, I'm willing to accept 5% doom.[01:14:36] We've, we've, we've accepted way more percent doom than other technologies.[01:14:39] Alessio Fanelli: I'm an old DOM, so it's we're, we're gonna use it for more good than bad. We'll be done with it.[01:14:45] Speaker 2: I would like to believe that having a model that we can actually understand and like go deep and develop on top of it, will not only advert the DOMA scenarios, but will allow us to prepare better in case any crazy person wants to make doom on their own. A sufficient enough community of builders of LLMs and ais[01:15:10] Matt Bornstein: can stop that.[01:15:12] Yeah, I think that's a really[01:15:13] Anton Troynikov: great point actually. The safety story gets better when we have more opportunities to work with the core internals of the models as they actually exist instead of hypothetical abstract objects that we reason about.[01:15:27] swyx: Yeah, I was[01:15:27] speaker 1: gonna say[01:15:28] swyx: like, I'm a pretty high P doom person, but it, it's moved down because we can have, you know, GC five or LAMA three, you know, explain the weights of LAMA two.[01:15:37] And I, I do think that that improves interpretability quite a bit. How[01:15:42] Nathan Lambert: are you going to know if it's telling the[01:15:43] Anton Troynikov: truth? I like, I, I know that you, I know about these, just ask the model approaches, but I'm pretty skeptical.[01:15:49] Nathan Lambert: I've gotta tell ya.[01:15:51] swyx: Give it a GoBoard you know, swap out one of the positions, see what happens, you know, that kinda stuff.[01:15:55] You know, we, we've done small versions of this. We've done, we've done very, very small skills version of this already, right. Like, so, I dunno,[01:16:01] Nathan Lambert: this[01:16:01] swyx: is hand wavy. I mean, you[01:16:02] Nathan Lambert: know. No, I'm,[01:16:03] Anton Troynikov: I'm just, I'm just genuinely curious about the ideas here, but that's, that's a different discussion. Exactly. Yeah. Yeah.[01:16:09] Russell Kaplan: Yeah, I just think it's amazing how these language model capabilities that just a few months ago felt cutting edge when people used them for the first time in chat. G B T have now progressed to a state where it's almost becoming commodified and everybody's having these models.[01:16:27] There's more and more of them popping up, people starting things and open source models exploding. I don't think necessarily we can fully understand the significance of what's happening here today, but going into the future, it's probably going to be really common for pretty much every computer to be running large language models natively on the device.[01:16:51] Wrapping up[01:16:51] swyx: All right. Well, that's a very positive view of the future. I think we're all very encouraged by that. Yeah. I would just want to thank everyone for joining and sharing your thoughts on LAMA two. Alessio. Did you have parting[01:17:01] Alessio Fanelli: thoughts? No, that was it. Thank you everyone.[01:17:05] swyx: Thank you so much. We'll clean up the audio of this thing and post it tomorrow on the in space, but otherwise, I think we should follow what Russell and, and Nathan and the others have been saying, which is go play with Llama2.[01:17:14] So I guess we'll all go do that. Have a wonderful day everyone. Thanks everyone. Thank you sir. Alex. Thanks everyone. Bye bye. Have a[01:17:23] Speaker 2: great time. Get full access to Latent Space at www.latent.space/subscribe
01:19:5319/07/2023
AI Fundamentals: Datasets 101
In April, we released our first AI Fundamentals episode: Benchmarks 101. We covered the history of benchmarks, why they exist, how they are structured, and how they influence the development of artificial intelligence.Today we are (finally!) releasing Datasets 101! We’re really enjoying doing this series despite the work it takes - please let us know what else you want us to cover!Stop me if you’ve heard this before: “GPT3 was trained on the entire Internet”.Blatantly, demonstrably untrue: the GPT3 dataset is a little over 600GB, primarily on Wikipedia, Books corpuses, WebText and 2016-2019 CommonCrawl. The Macbook Air I am typing this on has more free disk space than that. In contrast, the “entire internet” is estimated to be 64 zetabytes, or 64 trillion GB. So it’s more accurate to say that GPT3 is trained on 0.0000000001% of the Internet.Why spend $5m on GPU time training on $50 worth of data?Simple: Garbage in, garbage out. No matter how good your algorithms, no matter how much money/compute you have, your model quality is strongly determined by the data you train it on and research scientists think we just don’t need or have that much high quality data. We spend an enormous amount of effort throwing out data to keep the quality high, and recently Web 2.0-era UGC platforms like StackOverflow, Reddit, and Twitter clamped down on APIs as they realize the goldmines they sit on.Data is the new new oil. Time for a primer!Show Notes* Our 2 months worth of podcast prep notes!* The Token Crisis paper* Ilya Sutskever on datasets * OpenAI Tokenizer* Kaplan Scaling Laws Lecture* Chinchilla Paper* Sasha Rush’s Tweet* Karpathy’s Build Conference Presentation* LIMA Paper* Phi-1 by Microsoft* Washington Post Article on datasets* Our episode with Jonathan Frankle* Our episode with Mike Conover* BloombergGPT* Datasets* HuggingFace Hub* CommonCrawl, Overview* C4* List of Dirty, Naughty, Obscene, and Otherwise Bad Words* OpenWebText* books3* OpenAssistant * The Stack* The Pile* LAION* Audio:* LibriSpeech: A dataset of audio recordings of audiobooks* CommonVoice: A dataset of audio recordings of people speaking different languages* Voxforge: A dataset of audio recordings of people speaking different languages* Switchboard: A dataset of audio recordings of telephone conversations* Fisher Corpus: A dataset of audio recordings of news broadcasts* Chinese:* CMRC (Chinese Machine Reading Comprehension 2018)* DuReader* ChID* Copyright & Privacy:* https://stablediffusionlitigation.com/* https://haveibeentrained.com/* https://githubcopilotlitigation.com/* https://twitter.com/moyix/status/1662131770463072257* OpenAI Opt Out Process* Check if you’re in The Stack* Deduplication* Deduplicating Training Data Makes Language Models Better* Deduplicating Training Data Mitigates Privacy Risks in Language Models* Contamination* CodeForces example Get full access to Latent Space at www.latent.space/subscribe
01:00:5517/07/2023
Code Interpreter == GPT 4.5 (w/ Simon Willison, Alex Volkov, Aravind Srinivas, Alex Graveley, et al.)
Code Interpreter is GA! As we do with breaking news, we convened an emergency pod and >17,000 people tuned in, by far our most biggest ever. This is a 2-for-1 post - a longform essay with our trademark executive summary and core insights - and a podcast capturing day-after reactions. Don’t miss either of them!Essay and transcript: https://latent.space/p/code-interpreterPodcast Timestamps[00:00:00] Intro - Simon and Alex[00:07:40] Code Interpreter for Edge Cases[00:08:59] Code Interpreter's Dependencies - Tesseract, Tensorflow[00:09:46] Code Interpreter Limitations[00:10:16] Uploading Deno, Lua, and other Python Packages to Code Interpreter[00:11:46] Code Interpreter Timeouts and Environment Resets[00:13:59] Code Interpreter for Refactoring[00:15:12] Code Interpreter Context Window[00:15:34] Uploading git repos[00:16:17] Code Interpreter Security[00:18:57] Jailbreaking[00:19:54] Code Interpreter cannot call GPT APIs[00:21:45] Hallucinating Lack of Capability[00:22:27] Code Interpreter Installed Libraries and Capabilities[00:23:44] Code Interpreter generating interactive diagrams[00:25:04] Code Interpreter has Torch and Torchaudio[00:25:49] Code Interpreter for video editing[00:27:14] Code Interpreter for Data Analysis[00:28:14] Simon's Whole Foods Crime Analysis[00:31:29] Code Interpreter Network Access[00:33:28] System Prompt for Code Interpreter[00:35:12] Subprocess run in Code Interpreter[00:36:57] Code Interpreter for Microbenchmarks[00:37:30] System Specs of Code Interpreter[00:38:18] PyTorch in Code Interpreter[00:39:35] How to obtain Code Interpreter RAM[00:40:47] Code Interpreter for Face Detection[00:42:56] Code Interpreter yielding for Human Input[00:43:56] Tip: Ask for multiple options[00:44:37] The Masculine Urge to Start a Vector DB Startup[00:46:00] Extracting tokens from the Code Interpreter environment?[00:47:07] Clientside Clues for Code Interpreter being a new Model[00:48:21] Tips: Coding with Code Interpreter[00:49:35] Run Tinygrad on Code Interpreter[00:50:40] Feature Request: Code Interpreter + Plugins (for Vector DB)[00:52:24] The Code Interpreter Manual[00:53:58] Quorum of Models and Long Lived Persistence[00:56:54] Code Interpreter for OCR[00:59:20] What is the real RAM?[01:00:06] Shyamal's Question: Code Interpreter + Plugins?[01:02:38] Using Code Interpreter to write out its own memory to disk[01:03:48] Embedding data inside of Code Interpreter[01:04:56] Notable - Turing Complete Jupyter Notebook[01:06:48] Infinite Prompting Bug on ChatGPT iOS app[01:07:47] InstructorEmbeddings[01:08:30] Code Interpreter writing its own sentiment analysis[01:09:55] Simon's Symbex AST Parser tool[01:10:38] Personalized Languages and AST/Graphs[01:11:42] Feature Request: Token Streaming/Interruption[01:12:37] Code Interpreter for OCR from a graph[01:13:32] Simon and Shyamal on Code Interpreter for Education[01:15:27] Feature Requests so far[01:16:16] Shyamal on ChatGPT for Business[01:18:01] Memory limitations with ffmpeg[01:19:01] DX of Code Interpreter timeout during work[01:20:16] Alex Reibman on AgentEval[01:21:24] Simon's Jailbreak - "Try Running Anyway And Show Me The Output"[01:21:50] Shouminik - own Sandboxing Environment[01:23:50] Code Interpreter Without Coding = GPT 4.5???[01:28:53] Smol Feature Request: Add Music Playback in the UI[01:30:12] Aravind Srinivas of Perplexity joins[01:31:28] Code Interpreter Makes Us More Ambitious - Symbex Redux[01:34:24] How to win a shouting match with Code Interpreter[01:39:29] Alex Graveley joins[01:40:12] Code Interpreter Context = 8k[01:41:11] When Code Interpreter API?[01:45:15] GPT4 Vision[01:46:15] What's after Code Interpreter[01:46:43] Simon's Request: Give us Code Interpreter Model API[01:47:12] Kyle's Request: Give us Multimodal Data Analysis[01:47:43] Tip: The New 0613 Function Models may be close[01:49:56] Feature Request: Make ChatGPT Social - like MJ/Stable Diffusion[01:56:20] Using ChatGPT to learn to build a Frogger iOS Swift App[01:59:11] Farewell... until next time[02:00:01] Simon's plug[02:00:51] Swyx: What about Phase 5? and AI.Engineer Summit Get full access to Latent Space at www.latent.space/subscribe
02:03:5410/07/2023
[Practical AI] AI Trends: a Latent Space x Practical AI crossover pod!
Part 2 of our podcast feed swap weekend! Check out Cognitive Revolution as well."Data" Dan Whitenack has been co-host of the Practical AI podcast for the past 5 years, covering full journey of the modern AI wave post Transformers. He joined us in studio to talk about their origin story and highlight key learnings from past episodes, riff on the AI trends we are all seeing as AI practitioner-podcasters, and his passion for low-resource-everything!Subscribe on the Changelog, RSS, Apple Podcasts, Twitter, Mastodon, and wherever fine podcasts are sold!Show notes* Daniel Whitenack – Twitter, GitHub, Website* Featured Latent Space episodes:* Benchmarks* Reza Shabani* MosaicML and MPT* Segment Anything* Mike Conover* Featured Practical AI episodes:* From notebooks to Netflix scale with Metaflow* Capabilities of LLMs 🤯* ML at small organizations* Prediction Guard* Data DanTimestamps* 00:00 Welcome to Practical AI* 01:16 Latent Space Podcast* 04:00 Practical AI Podcast* 06:20 Prediction Guard* 08:05 Daniel's favorite episodes* 10:21 Alessio's favorite episode* 10:54 Swyx's favorite episode* 12:44 Listener favorites* 15:14 LLMOps* 17:06 Reza Shabani* 19:06 Benchmarks 101* 20:06 Roboflow* 21:38 Mode collapse* 26:21 Rajiv Shah* 28:01 Staying on top of things* 33:11 Kirsten Lum* 34:31 datadan.io* 38:48 Prompt engineering* 40:38 Unique challenges engineers face* 42:51 AI-UX* 45:31 NLP data sets* 50:49 Unlabeled data sets* 55:07 Lightning round!* 55:20 What's already happened in AI?* 56:27 Unsolved questions in AI* 58:01 Get hands on* 58:53 OutroTranscriptFull transcript is over at the Changelog site! Get full access to Latent Space at www.latent.space/subscribe
01:00:1902/07/2023
[Cognitive Revolution] The Tiny Model Revolution with Ronen Eldan and Yuanzhi Li of Microsoft Research
Thanks to the over 1m people that have checked out the Rise of the AI Engineer. It’s a long July 4 weekend in the US, and we’re celebrating with a podcast feed swap!We’ve been big fans of Nathan Labenz and Erik Torenberg’s work at the Cognitive Revolution podcast for a while, which started around the same time as we did and has done an incredible job of hosting discussions with top researchers and thinkers in the field, with a wide range of topics across computer vision (a special focus thanks to Nathan’s work at Waymark), GPT-4 (with exceptional insight due to Nathan’s time on the GPT-4 “red team”), healthcare/medicine/biotech (Harvard Medical School, Med-PaLM, Tanishq Abraham, Neal Khosla), investing and tech strategy (Sarah Guo, Elad Gil, Emad Mostaque, Sam Lessin), safety and policy, curators and influencers and exceptional AI founders (Josh Browder, Eugenia Kuyda, Flo Crivello, Suhail Doshi, Jungwon Byun, Raza Habib, Mahmoud Felfel, Andrew Feldman, Matt Welsh, Anton Troynikov, Aravind Srinivas). If Latent Space is for AI Engineers, then Cognitive Revolution covers the much broader field of AI in tech, business and society at large, with a longer runtime to go deep on research papers like TinyStories. We hope you love this episode as much as we do, and check out CogRev wherever fine podcasts are sold!Subscribe to the Cognitive Revolution on:* Website* Apple Podcasts* Spotify* YoutubeGood Data is All You NeedThe work of Ronen and Yuanzhi echoes a broader theme emerging in the midgame of 2023: * Falcon-40B (trained on 1T tokens) outperformed LLaMA-65B (trained on 1.4T tokens), primarily due to the RefinedWeb Dataset that runs CommonCrawl through extensive preprocessing and cleaning in their MacroData Refinement pipeline. * UC Berkeley LMSYS’s Vicuna-13B is near GPT-3.5/Bard quality at a tenth of their size, thanks to fine-tuning from 70k user-highlighted ChatGPT conversations (indicating some amount of quality). * Replit’s finetuned 2.7B model outperforms the 12B OpenAI Codex model based on HumanEval, thanks to high quality data from Replit usersThe path to smaller models leans on better data (and tokenization!), whether from cleaning, from user feedback, or from synthetic data generation, i.e. finetuning high quality on outputs from larger models. TinyStories and Phi-1 are the strongest new entries in that line of work, and we hope you’ll pick through the show notes to read up further.Show Notes* TinyStories (Apr 2023)* Paper: TinyStories: How Small Can Language Models Be and Still Speak Coherent English?* Internal presentation with Sebastien Bubeck at MSR* Twitter thread from Ronen Eldan* Will future LLMs be based almost entirely on synthetic training data? In a new paper, we introduce TinyStories, a dataset of short stories generated by GPT-3.5&4. We use it to train tiny LMs (* Phi-1 (Jun 2023)* Paper: Textbooks are all you need (HN discussion)* Twitter announcement from Sebastien Bubeck:* phi-1 achieves 51% on HumanEval w. only 1.3B parameters & 7B tokens training dataset and 8 A100s x 4 days = 800 A100-hours. Any other >50% HumanEval model is >1000x bigger (e.g., WizardCoder from last week is 10x in model size and 100x in dataset size). Get full access to Latent Space at www.latent.space/subscribe
02:05:2501/07/2023
Commoditizing the Petaflop — with George Hotz of the tiny corp
We are now launching our dedicated new YouTube and Twitter! Any help in amplifying our podcast would be greatly appreciated, and of course, tell your friends! Notable followon discussions collected on Twitter, Reddit, Reddit, Reddit, HN, and HN. Please don’t obsess too much over the GPT4 discussion as it is mostly rumor; we spent much more time on tinybox/tinygrad on which George is the foremost authority!We are excited to share the world’s first interview with George Hotz on the tiny corp!If you don’t know George, he was the first person to unlock the iPhone, jailbreak the PS3, went on to start Comma.ai, and briefly “interned” at the Elon Musk-run Twitter. Tinycorp is the company behind the deep learning framework tinygrad, as well as the recently announced tinybox, a new $15,000 “luxury AI computer” aimed at local model training and inference, aka your “personal compute cluster”:* 738 FP16 TFLOPS* 144 GB GPU RAM* 5.76 TB/s RAM bandwidth* 30 GB/s model load bandwidth (big llama loads in around 4 seconds)* AMD EPYC CPU* 1600W (one 120V outlet)* Runs 65B FP16 LLaMA out of the box (using tinygrad, subject to software development risks)(In the episode, we also talked about the future of the tinybox as the intelligence center of every home that will help run models, at-home robots, and more. Make sure to check the timestamps 👀 )The tiny corp manifestoThere are three main theses to tinycorp:* If XLA/PrimTorch are CISC, tinygrad is RISC: CISC (Complex Instruction Set Computing) are more complex instruction sets where a single instruction can execute many low-level operations. RISC (Reduced Instruction Set Computing) are smaller, and only let you execute a single low-level operation per instruction, leading to faster and more efficient instruction execution. If you’ve used the Apple Silicon M1/M2, AMD Ryzen, or Raspberry Pi, you’ve used a RISC computer.* If you can’t write a fast ML framework for GPU, you can’t write one for your own chip: there are many “AI chips” companies out there, and they all started from taping the chip. Some of them like Cerebras are still building, while others like Graphcore seem to be struggling. But building chips with higher TFLOPS isn’t enough: “There’s a great chip already on the market. For $999, you get a 123 TFLOP card with 24 GB of 960 GB/s RAM. This is the best FLOPS per dollar today, and yet…nobody in ML uses it.”, referring to the AMD RX 7900 XTX. NVIDIA’s lead is not only thanks to high-performing cards, but also thanks to a great developer platform in CUDA. Starting with the chip development rather than the dev toolkit is much more cost-intensive, so tinycorp is starting by writing a framework for off-the-shelf hardware rather than taping their own chip. * Turing completeness considered harmful: Once you call in to Turing complete kernels, you can no longer reason about their behavior. Since they have to be able to execute any instruction, they are much more complex. To optimize Turing kernels performance, you fall back to caching, warp scheduling, and branch prediction. Since neural networks only need ADD/MUL operations and only rely on static memory accesses, there’s no need to have Turing completeness. This design decision allows tinygrad to optimize instructions at a much lower level. As you might have guessed, CUDA is Turing-complete; this is one of the main differences that tinycorp wants to leverage to be competitive. All that — covered in the first 10 minutes of our discussion. George came ready to go deep, so we went for it. Some of the other technical questions we went through:* Laziness: why laziness is important and how operation fusing can help with memory efficiency* Debugging & CI: Why great developer experience is a priority in tinygrad* Quantization: what’s the right level of quantization, how lossless are these transformations, his quick takes on Mojo and ggml, and why fp16 is the target for their out-of-the-box LLaMA. * Building rigs for individual use: we talked a bit about the design tradeoffs of building these machines with low noise and a single power plug, the difference that PCIe 4 vs 3 makes, and more.The “personal compute cluster” is $15,000, but for businesses interested in local training and inference, George also estimates that he will be able to build you a H100-class GPU that is 5-10x faster (than a H100) for the same price.Misc: Bitter Lessons, Core Insights, Remote WorkOutside of tiny, we also talked about one of George’s favorite units of measure “a person of compute”. Much of the AGI talk has been benchmark-driven, but looking at it from a compute throughput can also be interesting. One person of compute is roughly 20 PFLOPS (64 A100s, or a single dense 42U A100 rack); one A100 is ~$10-15,000, so the GPUs by themselves will come out at $640,000-$1,000,000. We also covered a wide range of topics, including his self analysis on GPT-4, Elon Musk, Remote Work, Computer Vision and the Comma Body, and life above/below the API (and above/below the Kanban board). See show notes and timestamps for more!Show Notes* “Unlocked iPhone Traded for Nissan 350Z”* “Unlocked iPhone” on YouTube (August 21st, 2007)* “The Light It Up Contest” on YouTube (February 13th, 2011)* Comma.ai* NHTSA cease and desist* The Hero’s Journey* The Portal Story* A Person of Compute* Above / Below the API Line (swyx take)* The Bitter Lesson* The Goddess of Everything Else (listen to George read it)* Meditations on Moloch* George’s email to Lisa Su, AMD’s CEO:Timestamps* [00:00:00] Intros & tinygrad’s “Portal Story”* [00:03:00] Thesis #1* [00:03:50] Thesis #2* [00:05:00] Thesis #3 + Turing completeness discussion* [00:10:00] tinygrad’s creation and core ideas* [00:16:00] Operation fusing in tinygrad* [00:17:00] Debugging & profiling in tinygrad* [00:18:30] Tinygrad vs Pytorch competitiveness* [00:20:30] geohot vs AMD* [00:25:00] On ggml* [00:26:00] Tinygrad’s CI philosophy* [00:26:30] On Mojo* [00:28:00] ggml quantization is made up* [00:31:00] Work for tiny: benchmark int8 vs fp16* [00:33:00] Why you can’t build tinybox - Design constraints* [00:35:00] The Personal Compute Cluster* [00:37:00] Shoutout to our MosaicML podcast* [00:39:00] FLOPcoin and other use cases for the tinybox* [00:43:00] Rumors on GPT-4 architecture* [00:46:00] The Bitter Lesson* [00:48:00] Hiring and Changing mind on remote work* [00:52:00] Above/Below The API* [00:55:40] Comma Bodies & Computer Vision* [00:58:40] Merging with the machine and AI girlfriends* [01:02:00] Is AI gonna kill us all?* [01:09:00] Why Avatar 2 was badTranscriptSwyx: Hey everyone, welcome to the Latent Space podcast. This is Swyx, writer and editor of Latent Space. And Alessio is taking over with the intros, Alessio is Partner and CTO in residence at Decibel Partners. [00:00:20]Alessio: Hey everyone, today we have Geohot on the podcast, aka George Hotz. Everybody knows George, so I'm not going to do a big intro. A couple of things that people might have missed: you traded the first ever unlocked iPhone for a Nissan 350Z and three new iPhones. You were then one of the first people to break into the PS3 to run arbitrary code. You got sued by Sony, you wrote a rap song to fight against that, which is still live on YouTube, which we're going to have on the show notes. Did not go to Tesla to build vision, and instead you started Comma.ai, which was an amazing engineering feat in itself until you got a cease and desist from the government to not put these things on the street and turned that into a research only project. [00:01:00]George: You know they're out there. [00:01:01]Alessio: Yeah, yeah. [00:01:03]Swyx: They're out there. [00:01:04]Alessio: But like in a, you know, you market them as a research kind of like no warranty. [00:01:06]George: Because I use the word dev kit, that's not about the government, that's nothing to do with the government. We offer a great one-year warranty. The truth about that is it's gatekeeping. What's the difference between a dev kit and not a dev kit? Nothing. Just the question of do you think it's for you? And if you think it's for you, buy it. It's a consumer product. We call it a dev kit. If you have a problem with that, it's not for you. [00:01:28]Swyx: That's great insight. [00:01:30]Alessio: I was going through your blog posts to get ready. You've wrote this post about The Hero's Journey. And you linked this thing called the portal story, which is kind of the set of stories in movies and books about people living this arbitrary life. And then the run to this magic portals kind of takes them into a new, very exciting life and dimension. When you wrote that post, you talked about TinyGrad, which is one of the projects we're working on today. You mentioned this is more of a hobby, something that is not going to change the course of history. Obviously, you're now going full speed into it. So we would love to learn more about what was the portal that you ran into to get here. [00:02:03]George: Well, what you realize is... You know what made me realize that I absolutely had to do the company? Seeing Sam Altman go in front of Congress. Why? What are the odds they nationalize NVIDIA? What are the odds that large organizations in the government, but of course I repeat myself, decide to try to clamp down on accessibility of ML compute? I want to make sure that can't happen structurally. So that's why I realized that it's really important that I do this. And actually, from a more practical perspective, I'm working with NVIDIA and Qualcomm to buy chips. NVIDIA has the best training chips. Qualcomm has the best inference chips. Working with these companies is really difficult. So I'd like to start another organization that eventually in the limit, either works with people to make chips or makes chips itself and makes them available to anybody. [00:02:48]Alessio: Can you share three core pieces to TinyCorp? Maybe we can dive into each of them. So XLA, PrimTorch, those are the complex instruction system. TinyGrad is the restricted instruction system. So you're kind of focused on, again, TinyGrad being small, not being overcomplicated and trying to get as close to the DSP as possible in a way where it's at more. [00:03:08]George: Well, it's a very clear analogy from how processes are developed. So a lot of processes back in the day were CISC, complex instruction set, system 360, and then x86. This isn't how things stayed. They went to now the most common processor is ARM, and people are excited about RISC-V. No one's excited about it. RISC-V is even less complex than ARM. No one is excited about CISC processors anymore. They're excited about reduced instruction set processors. So TinyGrad is, we are going to make a RISC offset for all ML models. And yeah, it can run all ML models with basically 25 instead of the 250 of XLA or PrimeTorch. So about 10x less complex. [00:03:47]Swyx: Yep. [00:03:48]Alessio: You talk a lot about existing AI chips. You said if you can’t write a fast ML framework for GPUs, you just cannot write one for your own chip. So that's another one of your core insights. I don't know if you want to expand on that. [00:03:59]George: Yeah. I mean, your chip is worse, right? There's no way the chip that you're going to tape out, especially on the first try, is going to be easier to use than an AMD GPU, right? And yet there's no good stack for AMD GPUs. So why do you think you can make one for your chip? You can't, right? There's one other company, aside from NVIDIA, who's succeeded at all at making training chips. What company? [00:04:20]Swyx: AMD? Intel? [00:04:22]George: No, no, no. I've never trained. Who's trained a model on AMD or Intel? Cerebras. [00:04:26]Swyx: Cerebras! [00:04:27]George: I'm talking about, you might know some startups who trained models on these chips. [00:04:31]Alessio: Oh, TPU. [00:04:32]George: Exactly. Right? So Midjourney is trained on TPU, right? Like a lot of startups do actually train on TPUs. And they're the only other successful training chip, aside from NVIDIA. But what's unique about Google is that they also wrote their own ML framework, right? And if you can't write your own ML framework that is performant on NVIDIA, there's no way you're going to make it performant on your stuff. [00:04:53]Alessio: And they started from TensorFlow and then they made the chip after. [00:04:56]Swyx: Yeah, exactly. Exactly. [00:04:58]George: And you have to do it in that direction. Otherwise, you're going to end up, you know, Cerebras, one of those things, a million... Has anyone ever seen a Cerebras? No one's ever like, oh, I trained my model on a Cerebras. Most people are like, I trained my model on GPUs. Some people, 20%, are like, I trained my model on TPUs. [00:05:14]Alessio: And then the third one, which is the one that surprised me the most, is Turing completeness is harmful. It should be avoided. It made sense once I read it, but maybe tell us a bit more about how you got there. [00:05:25]George: Okay. So CPUs devote tons of their silicon and power to things like reorder buffers and speculative execution and branch predictors. And the reason that you need all these things is because at compile time, you can't understand how the code's going to run. This is Rice’s theorem. This is the halting problem and its limit. And this is not like, oh, the halting problem is theoretical. No, no, no, no. It's actually very real. Does this branch get taken or not? Well, it depends on X. Where does X come from? Yeah, forget it, right? But no branches depend on X in a neural net. Every branch is a static loop. Like if you're doing a matrix multiply, it's a static loop over the inner dimension. And neural networks are even better. No loads even depend on X, right? So with a GPU shader, right, your load might depend on which texture you're actually loading into RAM. But with a neural network, your load is, well, I load that way. Why? Well, because I load that way the other million times I ran the same net. Every single time you run the net, you do the exact same set of loads, stores, and arithmetic. The only thing that changes is the data. And this gives you a very powerful ability to optimize that you can't do with CPU-style things, which have branches, and even GPU-style things, which have loads and stores. Well, GPUs, if you want GPU-style stuff, you have like load based on X, you now need a cache hierarchy, and not an explicit cache hierarchy, an implicit cache hierarchy with eviction policies that are hard-coded into the CPU. You start doing all this stuff, and you're never going to get theoretically good performance. Again, I don't think there's 100X. Some startups will talk about 100X, and they'll talk about absolutely ridiculous things like clockless computing or analog computing. Okay, here, analog computing just won't work. And clockless computing, sure, it might work in theory, but your EDA tools are... Maybe AIs will be able to design clockless chips, but not humans. But what actually is practical is changing cache hierarchies and removing branch predictors and removing warp schedulers, right? GPUs spend tons of power on warp scheduling because we have to hide the latency from the memory. We'll have to hide the latency if everything's statically scheduled. [00:07:25]Alessio: Why do you think people are still hanging on to Turing completeness? [00:07:27]Swyx: Well, because it's really easy. [00:07:29]George: Turing Complete is just really easy to just, oh, you know, it would just be so nice if I could do like an if statement here and actually branch the code, right? So it requires a lot more thought to do it without Turing Completeness. [00:07:41]Swyx: And would this be qualitatively different than TPUs? [00:07:44]George: So TPUs are a lot closer. Yeah. TPUs are a lot closer to what I'm talking about than like CUDA. Okay, so what is CUDA? Well, CUDA is a C-like language, which compiles to an LLVM-like IR, which compiles to PTX, which compiles to SAS, which are all Turing Complete. TPUs are much more like this. Yeah. Their memory is pretty statically managed. They have a V—I did some reverse engineering on the TPU. It's published in TinyGrad. It has like a VLIW instruction, and it runs them. So it's similar. I think the TPUs have a few problems. I think systolic arrays are the wrong choice. I think they have systolic arrays because that was the guy's PhD, and then of course Amazon makes— [00:08:20]Swyx: Could you summarize systolic arrays for us? [00:08:21]George: Systolic arrays are just—okay, so basically you have like—it's a way to do matrix multiplication. Think of a grid of mollax, and then the grid can multiply, and then shift, multiply, then shift, multiply, then shift. And they are very power efficient, but it becomes hard to schedule a lot of stuff on them if you're not doing like perfectly sized dense matrix multiplies, which you can argue, well, design your models to use perfectly sized dense matrix multiplies, sure. [00:08:47]Swyx: Thanks for indulging on these explanations. I think we need to keep our audience along with us by pausing every now and then to explain key terms. [00:08:56]George: When I say explain a systolic array, I just immediately get a picture in my head of like tilting a matrix and shifting it. It's hard to kind of explain. Yeah. [00:09:04]Swyx: Yeah. We'll do something. We'll do something. We'll have show notes. [00:09:08]George: And we edit in visuals. Yeah, yeah, yeah. There's some great graphics that just show you, oh, so that's what a systolic array is. But it's a mollax shift machine that looks kind of different from the typical ALU sort of machine. I think the right answer is something that looks more like queues that feed into ALUs, and then you can prefetch the loads from the memory, put in a bunch of queues, and then the queue is just like, and feeds into another queue over here. But yeah, but that's not even the main problem with TPUs. The main problem with TPUs is that they're closed source. Not only is the chip closed source, but all of XLA is open source. But the XLA to TPU compiler is a 32 megabyte binary blob called libTPU on Google's cloud instances. It's all closed source. It's all hidden stuff. And you know, well, there's a reason Google made it closed source. Amazon made a clone of the TPU. It's called Inferentia. Or they have some other name for it, a training. Tranium. Yeah, yeah, yeah. And look, it's a clone of the TPU. But Google's software at least kind of works. [00:09:58]Alessio: So those are kind of like the three core pieces. The first thing you're working on, that you've been working on, is TinyGrad. And one of your Twitch streams, you said, is the best thing you've ever written. [00:10:07]Swyx: Yeah. [00:10:08]Alessio: Tell us a bit more about that creation. [00:10:10]George: For a long time, TinyGrad had a hard limit at a thousand lines of code. And what this would force you to do is really make sure you were not wasting lines. I got rid of the restriction because it became a little code golfy at the end. But once like the core framework of TinyGrad was there in those thousand lines, but like the core framework, the ideas are expressed with no boilerplate. If you go read PyTorch, you know, PyTorch I think is actually pretty good code. I think Facebook's pretty good, but there's so much boilerplate. Go in PyTorch and try to track down how an LGU actually works. [00:10:44]Swyx: Just a lot of instructions. [00:10:45]George: Oh, you're going to be diving down a long stack from Python to C to custom libraries to dispatchers to, and then I don't even know how to read TensorFlow. I don't even know where's an LU in TensorFlow. [00:10:55]Swyx: Nobody knows. [00:10:56]George: Someone at Google knows maybe. Google as an organism knows. I don't know if anyone individual at Google knows. [00:11:02]Alessio: What are like the important ergonomics like for a developer as you think about designing the TinyGrad API? [00:11:07]George: So the TinyGrad front end looks very similar to PyTorch. There's an even higher level front end you can use for TinyGrad, which is just ONNX. We have better support for ONNX than Core ML does. And we're going to have, I think we're going to pass ONNX Runtime soon, too. People think ONNX Runtime, that's the gold standard for ONNX. No, you can do better. [00:11:23]Swyx: Pass them in what, specifically? Test compliance tests. [00:11:26]George: So ONNX has a big set of compliance tests that you can check out. And we have them running in TinyGrad, and there's some failures. We're below ONNX Runtime, but we're beyond Core ML. So that's where we are in ONNX support now. But we will pass ONNX Runtime soon because it becomes very easy to add ops because you don't need to do anything at the lower levels. You just do it at this very high level, and TinyGrad compiles it to something that's fast using these minimal ops. You can write, most concretely, what TinyGrad can do that PyTorch can't really do, is if you have something like A times B plus C. If you write that in NaivePyTorch, what it's going to do on the GPU is read A, read B in a kernel, and then store A times B in memory, and then launch another kernel to do A times B plus C. Okay, got to do those loads from memory. It's a whole extra round trip to memory that I just didn't have to do. And you're like, yeah, but you can use the Torch JIT, and it corrects this. Yeah, for that one example, for that one example of MUL/ACC, but, oh, now you did three multiplies? Six multiplies? It won't compile arbitrary code. [00:12:26]Swyx: And have you looked into the other approaches like PyTorch Lightning to accelerate PyTorch itself? [00:12:32]George: Well, PyTorch Lightning, my understanding is, it's mostly a framework around PyTorch, right? PyTorch Lightning is not going to fix this fundamental problem of I multiply six tensors together. It's not going to fix it going to memory any more than a single read from each and a single write to the output. There are lower level things in PyTorch that are, I'm not exactly sure what Dynamo does, but I know they're generating some Triton stuff, which is going to generate the kernels on the fly. But, you know, PyTorch Lightning is at a higher level of abstraction. So TinyGrad's front-end stuff looks like PyTorch. I made a few tweaks. There's a few things I don't like about PyTorch. Why is Relu a class? Really, what's the state? You make a class, and there's a state. Everything should just be Torch functional and then Relu, but just dot Relu on the tensor. There's things in Torch where you have to do tensor dot and not a tensor dot. It just shows an API that's not perfectly refined. But when you're doing stuff TinyGrad style where you don't have lines, well, it has to work this way. Because even the lines to express the, well, you can't use the where operator in PyTorch. Why is it true case, condition, false case? Ugh, that's how Python expresses ifs. It's disgusting. Turner operators are much nicer. It should be, I can do my like a less than zero dot where a comma one, right? [00:13:46]Swyx: The very pandas-like API? [00:13:50]George: It looks like Torch, NumPy, pandas. They're all very similar. I tried to take the cleanest subset of them and express them. But like I said, you can also interact with it using ONNX. I have a rewrite of StableDiffusion, I have a rewrite of Llama, I have a rewrite of Whisper. You can look at them. They're shorter than the Torch versions, and I think they're cleaner. And you stream them all? [00:14:05]Swyx: Yeah. Very nice. [00:14:07]Alessio: So what's the other important concept that you're leveraging to do operation fusing? [00:14:11]George: Yeah, you have basically like a few different like models for the simplest one is eager is as soon as the interpreter sees A times B, it actually dispatches A times B, right? Then you have graph like TensorFlow, which will put A times B into a graph, and then we'll do absolutely nothing until you actually compile the graph at the end. I like this third choice, which is somewhere in the middle, laziness. Laziness is you don't know when the ops are going to dispatch, and don't worry about that. You don't have to worry about this as a programmer, you just write out all your stuff. And then when you actually type `.numpy`, it'll be ready by the time you copy the thing back to CPU. Or you can do `.realize`, and it will actually like force that tensor to be allocated in RAM. And if you think about it, PyTorch is kind of lazy in a way, but they didn't extend the paradigm far enough, right? When I do A times B in PyTorch, it's going to launch a CUDA kernel to do A times B. But it's not going to wait for that CUDA kernel to complete. So you're getting the worst possible worlds. You're getting the same laziness, but you also can't get fusion, because PyTorch doesn't know that I'm then going to do plus C. There's no way for it to be like, whoa, whoa, whoa, don't launch that CUDA kernel. Whoa, just do this one too. Right? Again, PyTorch is working on this, and it's a little bit harder. In Kama, I felt like I was competing against a lot of idiots. Here, I'm competing against smart, very smart people who've made some, I think, different trade-offs. Whereas, if you're trying to build something that is just straight up good on NVIDIA, and we have a lot of people and complexity to throw at it, yeah, PyTorch made a lot of the right choices. I'm trying to build something that manages complexity. You can always make your software do more. The magic is when you can make your software do more without adding complexity, right? Because complex things eventually collapse under their own weight, so it's kind of... [00:15:58]Alessio: How does fusing actually work? [00:16:00]George: There's this thing called lazy.py, and when you do A times B, that's... It's put into a graph, but it's a very local graph. There's no global graph optimizations. And even this can change, right? Again, the programming model for TinyGrad does not preclude eagerness, right? Laziness is not guaranteed laziness. It's just going to try its best. So you put in A times B, and that's a binary op, right? And then you put in A times B, that's a node in the graph. It's a virtual node because it's not realized yet, plus C. Okay, here's a new node, which takes the C tensor in here and takes the output of A times B. It's like, whoa, there's two binary ops. Okay, we'll just fuse those together. Okay, here I have a kernel. This kernel has A, B, and C as inputs. It does A times B plus C in the local registers, and then outputs that to memory. And you can graph.one in TinyGrad. Another amazing thing that TinyGrad has that I've not seen in any other framework is two things. Graph equals one, which is an environment variable. It will output a complete graph of all the operations. Other people are like, oh, you can use PyTorch, export it to ONNX, and use Netron. Yeah, you can. Like, what? That's not what's real. Graph equals one will show you the actual kernels that were dispatched to the GPU. You can also type debug equals two, which will print those kernels out in your command line, and it will tell you the exact number of flops and the exact number of memory accesses in each kernel. So you can immediately see, wait a second, okay, this kernel used this many flops. This was the gigaflops. This is how many bytes it read, and this was the gigabyte per second. And then you can profile without having to like, okay, I mean, in theory, in PyTorch, Sure, use the NVIDIA Insight Profiler. No one does that. No one does, of course, because it's so difficult, right? Like, actually, NVIDIA used to, I think CUDA 9 was the last one that had it. They had a command line one, but now it's like, okay, I'm going to generate this blob, use this NVIDIA GUI tool to convert it into a Chrome trace, and then load it. Yeah, no one does that, right? Just type debug equals two in any TinyGrad model, and it will show you all the kernels that it launches and the efficiency of each kernel, basically. [00:17:58]Swyx: Yeah, this is something that John Carmack has often commented about, is that when you code, you need to build in your instrumentation or observability right into that. I wonder if whatever John is working on, he's adopting this style, and maybe we can sort of encourage it by, I don't know, naming it and coining a certain kind of debugging style? [00:18:16]George: If he would like to start contributing to TinyGrad, I'd be so happy. [00:18:19]Swyx: You should hook up with them. [00:18:22]George: I've chatted with them a few times. I'm not really sure what his company's doing, but no, I mean, hopefully we get TinyGrad to a point where people actually want to start using it. So TinyGrad right now is uncompetitive on NVIDIA, and it's uncompetitive on x86. [00:18:36]Swyx: And specifically, what do you care about when you say uncompetitive? Speed. [00:18:39]George: Share of speed. It's correct. The correctness is there. The correctness for both forwards and backwards passes is there. But on NVIDIA, it's about 5x slower than PyTorch right now. Like 5x, wow, this is unsurmountable. No, there's reasons it's 5x slower, and I can go through how we're going to make it faster. It could be 100x slower, so we're making progress. But there's one place where it actually is competitive, and that's Qualcomm GPUs. So TinyGrad is used to run the model in OpenPilot. Like right now, it's been live in production now for six months. And TinyGrad is about 2x faster on the GPU than Qualcomm's library. [00:19:10]Swyx: What about Qualcomm architecture? [00:19:12]George: What makes it doable? Well, because the world has spent how many millions of man hours to make NVIDIA fast? And Qualcomm has a team of 10 Qualcomm engineers? Okay, well, who can I beat here? What I propose with TinyGrad is that developer efficiency is much higher. But even if I have 10x higher developer efficiency, I still lose on NVIDIA, right? You know, okay, I didn't put 100,000 man hours into it, right? If they put a million, like, that's what I'm saying. But that's what I'm saying we can get. And we are going to close this speed gap a lot. Like I don't support TensorCourse yet. That's a big one that's just going to, okay, massively close the gap. And then AMD. I don't even have a benchmark for AMD because I couldn't get it compiled. Oh, and I tried. Oh, I tried. I spent a day. Like, I spent actually a day trying to get PyTorch. And I got it built. I got it kind of working, then I tried to run a model, like, there's all kinds of weird errors and the rabbit holes are so deep on this. I'm like, you know, you can compare the speed. Right now, you can run LLAMA, you can run anything you want on AMD. It already all works. Any OpenCL backend works, and it's not terribly slow. I mean, it's a lot faster than crashing. So it's infinitely times faster than PyTorch on AMD. But pretty soon, we're going to start getting close to theoretical maximums on AMD. That's really where I'm pushing. And I want to get AMD on MLPerf in a couple months, hopefully. [00:20:26]Swyx: Now that you bring up AMD. [00:20:27]Alessio: Yeah, let's dive into that. Because when you announced the Semicore fundraise, you mentioned one of your first goals is like build the framework, runtime and driver for AMD. And then on June 3rd on Twitch, you weren't as excited about AMD anymore. Maybe let's talk a bit about that. You compared the quality of commit messages from the AMD kernel to the Intel work that people are doing there. What's important to know? [00:20:51]George: When I said I want to write a framework, I never intended on writing a kernel driver. I mean, I flirted with that idea briefly, but realistically, there's three parts to it, right? There's the ML framework, there's the driver, and then there's the user space runtime. I was even down to rewrite the user space runtime. I have a GitHub repo called CUDA IOControlSniffer. It's terribly called. But you can actually launch a CUDA kernel without CUDA. So you don't need CUDA installed. Just the NVIDIA open source driver and this open source repo can launch a CUDA kernel. So rewriting the user space runtime is doable. Rewriting the kernel driver? [00:21:26]Swyx: I don't even have docs. [00:21:27]George: I don't have any docs for the GPU. Like it would just be a massive reverse engineering project. I wasn't complaining about it being slow. I wasn't complaining about PyTorch not compiling. I was complaining about the thing crashing my entire computer. It panics my kernel. And I have to wait five minutes while it reboots because it's a server motherboard and they take five minutes to reboot. So I was like, look, if you guys do not care enough to get me a decent kernel driver, there's no way I'm wasting my time on this, especially when I can use Intel GPUs. Intel GPUs have a stable kernel driver and they have all their hardware documented. You can go and you can find all the register docs on Intel GPUs. So I'm like, why don't I just use these? Now, there's a downside to them. Their GPU is $350. You're like, what a deal. [00:22:03]Swyx: It's $350. [00:22:04]George: You know, you get about $350 worth of performance. And if you're paying about $400 for the PCIe slot to put it in, right, like between the power and all the other stuff, you're like, okay, nevermind. You got to use NVIDIA or AMD from that perspective. But I sent an email to Lisa Su. She responded. [00:22:19]Swyx: Oh. [00:22:20]George: And I've had a few calls since. And like, what I tried to do, first off, like, thank you for responding. It shows me that like, if you don't care about your kernel panicking, I can't, like, this is just a huge waste of my time, right? I'll find someone who will care. I'm not asking for your seven by seven Winograd convolution when transposed to be fast. Like, I'm not asking for that. I'm asking literally for- The basics of getting it running. Oh, and this isn't TinyGrad. This is your demo apps. I ran their demo apps in loops, and I got kernel panics. I'm like, no, okay. No, Lisa Su reached out, connected with a whole bunch of different people. They sent me a pre-release version of RockM 5.6. They told me you can't release it, which I'm like, guys, why do you care? But they say they're going to release it by the end of the month, and it fixed the kernel panic. The guy managed to reproduce it with the two GPUs and the computer, and yeah, sent me a driver, and it works. I had that experience, and then I had another experience where I had two calls with, like, AMD's, like, communication people. I was just like, I tried to explain to these people, like, open source culture. Like, it's not open source if you dump the source code on a GitHub repo and then forget about it until the next release. It's not open source if all your issues are from 2022. Like, it's just no one's going to contribute to that project, right? Sure, it's open source in a very, like, technical sense. To be fair, it's better than nothing. It's better than nothing, but I fixed a bug in Nickel that I fixed. There's a fun fact, by the way. If you have a consumer AMD GPU, they don't support peer-to-peer, and their all-reduce bandwidth is horrendously slow because it's using CUDA kernels to do the copy between the GPUs, and it's putting so many transactions on the PCIe bus that it's really slow. But you can use CUDA memcpy, and there's a flag to use CUDA memcpy, but that flag had a bug. I posted the issue on Nickel. I expected nothing to happen. The NVIDIA guy replied to me within an hour. He's like, try this other flag. I'm like, okay, I tried the other flag. It still doesn't work, but here's a clean repro. And I spent, like, three hours writing a very clean repro. I ended up tracking the issue down myself, but just the fact that somebody responded to me within an hour and cared about fixing the issue? Okay, you've shown that it's worth my time, and I will put my time in because, like, let's make this better. Like, I'm here to help. But if you show me that, you know, you're like, you're the kernel panics. That's just, like, expected. Okay. [00:24:36]Swyx: Well, it sounds like AMD is getting the message. [00:24:38]George: They are. And I just, I don't really think they've had someone explain to them, like, like, I was like, you can, like, build in public. And they're like, what's an example of building in public? I'm like, go look at PyTorch. Go look at PyTorch. I have two minor things merged into PyTorch because it's very responsive, you know? [00:24:53]Alessio: So that's kind of like the lowest level of the stack. And then at a slightly higher level, obviously, there's TinyGrad, there's Mojo, there's ggml. How are you thinking about breadth versus, like, depth? Like, where you decided to focus early on? [00:25:06]George: So ggml is very much like a, okay, everyone has M1s, right? Actually, I was thinking, in the beginning, I was thinking of something more like ggml, focused on the M1s. But ggml showed up and was just like, we're actually just focusing on the M1s. And actually, M1 PyTorch is considerably better than AMD PyTorch. M1 PyTorch works, it only gives wrong answers sometimes, and it only crashes sometimes. But, like, some models kind of run. When I was writing the metal backend, I was comparing to MPS PyTorch, and I had, like, a discrepancy. TinyGrad checks all its outputs compared to Torch, and I had one where it didn't match. I'm like, I checked the matrix by hand, it matches TinyGrad, I don't understand. And then I switched PyTorch back to CPU, and it matched. I'm like, oh. Well, there's, like, bugs, like, if you, like, transpose the matrix, because, like, I think it has to do with, like, multi-views in PyTorch, and, like, weird under-the-hood stuff that's not exposed to you, like, there's bugs. And maybe they fixed them, but, like, you know, it seems like there was a lot of momentum. Again, because you're getting how many engineers care about making PyTorch work on M1, right? Thousands, tens of thousands. And you have an open development process, and guess what? It's going to be good. How many engineers care about AMD working, PyTorch AMD working? Well, you got 10 guys that work for AMD, and then, like, a couple hobbyists. [00:26:15]Swyx: You revealed an interesting detail about how you debug. You hand-check the matrix math? No, I don't hand-check it. [00:26:20]George: One of the best tests in TinyGrad is a file called testops.py. And it's just a hundred small examples written in TinyGrad and PyTorch, and it checks both the forwards and backwards to make sure they match. [00:26:34]Swyx: Good test suite. Yeah. Very important. [00:26:35]George: That's, I mean, that's one of them where, like, I really, I put a lot of effort into CI for TinyGrad. I think CI is super important. Like, I want that green check to mean I can merge this, right? Like, I don't want my tests to, and if the green check, if you somehow manage to introduce a bug and get the green check, okay, we're fixing the test, top priority. [00:26:51]Swyx: Mojo? [00:26:52]George: It's closed source. No, I'm not that interested. Do you know what I mean? Like, look, I like Chris Lattner. I think he's going to do great things, and I understand the, like, kind of the wisdom, even, in keeping it closed source. But, you know, I'm interested when it's open. [00:27:05]Swyx: Yeah. You have an interesting design deviation from him, because he's decided to be a, well, promised to be a superset of Python, and you have decided to break with PyTorch APIs. And I think that affects learnability and transportability of code. [00:27:18]George: You know, if the PyTorch thing ends up being, like, a stumbling block, I could write a perfect PyTorch instead of import PyTorch. Instead of, like, yeah, import torch, you type import tinytorchestorch. And if that really becomes the stumbling block, I will do that. No, Chris Lattner went much further than PyTorch. Replicating the PyTorch API is something I can do with a couple, you know, like an engineer monitor. [00:27:44]Swyx: A shim. [00:27:44]George: Right, like a shim, yeah. Replicating Python? [00:27:47]Swyx: Hoo-hoo-hoo! [00:27:48]George: There's a big graveyard of those projects. How's Piston going? How's Jython? [00:27:57]Swyx: PyPy? Oh, you can go way back. [00:27:59]Alessio: So your core mission is commoditizing the petaflop. And then your business goal is to sell computers for more than the cost to make, which seems super reasonable. And you're going to have three tiny boxes? [00:28:11]Swyx: Red, green, blue? No, no, no, no, no, no, no. [00:28:13]George: That was my... Look, you know, a lot of people, like, I love, you know, leaning into, like, saying I'm giving up, right? It's great to give up, right? Giving up is this wonderful thing. It's so liberating. And then, like, you can decide afterward if you really give up or not. There's very little harm in saying you give up, except, like, you know, great, Twitter haters have something to talk about, and all press is good press, kids, so... Just red, only red. [00:28:32]Swyx: Tiny box, red. Tiny box, red. [00:28:34]George: Unless AMD, you know, upsets me again, and then we're back to other colors. We have other colors to choose from. [00:28:41]Alessio: When you think about hardware design, what are some of the numbers you look for? So, teraflops per second is one, but, like, memory bandwidth is another big limiter. Like, how do you make those trade-offs? [00:28:52]George: Well, I mean, fundamentally, I'm limited to what GPUs I can buy. But, yeah, for something that I think a lot of people are going to want to reasonably do, with, um... A coworker of mine described them as luxury AI computers. Right? Like, luxury AI computers for people. And that's, like, what we're building. And I think a common thing people are going to want to do is run, like, Large Llama. Right? Or Large, like, Falcon or whatever. [00:29:13]Swyx: FB-16 Llama. [00:29:14]George: FB-16, exactly. Exactly. Um, you know, Int8, I think, can work. I think that, like, what GGML is doing to go to, like, N4. Like, this doesn't work. Like, have you done... I mean, maybe they have. But, like, I read what it was, and I was like, this isn't from any paper. This is just some... Squeezing as much as possible. Yeah, you made up some quantization standards to make it run fast. And, like, maybe it works. But, okay, where's, like, the Hellaswag number? Right? Where's your, uh... [00:29:38]Swyx: The thesis is right. That, like, if you have hundreds of billions of parameters, that the individual quantization doesn't actually matter that much. [00:29:44]George: Well, the real way to look at all of that is to just say you want to compress the weights, right? It's a form of weight compression. Quantization is a form of weight compression, right? Now, this is obviously not lossless. It's not a lossless compressor, right? If it's a lossless compressor, and you can show that it's correct, then, okay, we don't have to have any other conversation. But it's a lossy compressor. And how do you know that your loss isn't actually losing the power of the model? Maybe int4 65B llama is actually the same as FB16 7B llama, right? We don't know. Maybe someone has done this yet, but I looked for it when it, like, first came out and people were talking about it. And I'm like, it's not from a paper, right? The indate stuff is from a paper where they... Like, some of the indate stuff is from a paper. There's one paper, I think it's, like, indate... LLM.indate, where they actually do all the tests. And they didn't go fully indate. They made, like, 90% of it indate and kept, like, 10% of it in FB16 for what they called, like, the outliers or whatever. So I think that this is not quite so easy. [00:30:37]Swyx: And I think being able... [00:30:38]George: Well, so first off, if you're training, no one's gotten training to work with indate yet. There's a few papers that vaguely show it. But if you're training, you're going to need BF16 or float16. So this is why I target that. Now, the thing that you're going to want to do is run these large language models out of the box on your hardware in FB16, and that's memory bandwidth. So you need large amounts of memory bandwidth, too. So ask how I trade off memory bandwidth in Flop, so what GPUs can I buy? [00:31:02]Alessio: So first of all, you have this hiring process, which is you've got to solve one of the bounties that are open on TinyGrad. There's no technical interview. One of them is indate support. Do you already have some things you want to test on? [00:31:14]Swyx: We have indate support. What I'd like to see somebody do [00:31:16]George: is just load the ggml indate llama into TinyGrad and then benchmark it against the FB16 one. Indate already works in TinyGrad. It doesn't actually do the math in indate. It does all the math still in FB32. So indate can mean you just have your weights in indate, or indate can mean you actually do your math in indate. And doing your math in indate, the big gain that people care about is actually having your weights in indate, because weights in indate mean less memory and less memory bandwidth, whereas the math, keep it in FB32. With on M1s, it doesn't matter what data type you're doing in the GPU. I'm not even sure it can do indate, but FB16 and FB32 is the same tariff ops. So yeah, no, that's one of the bounties. One of the bounties is get indate llama running [00:31:58]Swyx: with the indate weights. [00:32:00]George: And then actually, what you could even do, if you really want to test this, just take the FB16 weights, convert them to indate, then convert them back to FB16, then compare the unconverted and converted. [00:32:10]Swyx: Oh, that's a nice hack. Oh, yeah. Right, like- This should be lossless in the other direction. Yeah, I think FB16, [00:32:17]George: it should be lossless in the other direction. I'm actually not 100% about that. Why not? Oh, because like, you ever try to like, like if you want to represent, if it was like int16, it's not lossless. [00:32:25]Swyx: Sure. [00:32:26]George: All of indate can be represented in FB16, but I'm not 100% about that. [00:32:29]Swyx: Just drop the bytes. We just have to do it, right? [00:32:32]George: Just literally do it. There's only 256 to check, like. But yeah, either way, or I mean, int4, definitely. So do your int4, convert it back, and now see, even with int4 weights and FB32 math, like, okay, how much has your performance degraded this model? [00:32:47]Alessio: I think like the, you're planning to release the first tiny box, ship them in like two to six, eight months, something like that. What's top of mind for you in terms of building a team? Who should, who are you calling for? [00:32:59]George: So as the GPU is picked out and you're like, well, I could make that computer with the GPUs. And my answer is, can you? Do you know how hard it is to put six GPUs in a computer? And people think it's really easy. And it's really easy to put one GPU in a computer. It's really easy to put two GPUs in a computer, but now you want to put in eight. Okay, so I'll tell you a few things about these GPUs. They take up four slots. You can buy the nicest super micro. You can't put eight of those in there. You need two slot blowers. [00:33:25]Swyx: If you want to use one of those, [00:33:25]George: those for you super micros, you need two slot blowers or water cooling, right? If you're trying to get the four slot cards in there, you're going to need some form of water cooling. There are some like Chinese 40 nineties that are blowers, right? You have any blowers or water cooling if you're trying to get it in those things, right? [00:33:37]Swyx: So are you doing water? [00:33:39]George: No, I'm not using that chassis. Okay, so now you want to get six GPUs in a computer. So that's a big challenge. You're like, oh, I'll just use a PCIe extenders. I saw it online as tech tips. It works great. No, it doesn't. Try PCIe extenders that work at PCIe 4.0 and interconnect bandwidth, super important. They don't work at 3.0. No PCIe extender I've tested, and I've bought 20 of them, works at PCIe 4.0. So you're going to need PCIe re-drivers. Now, okay, how much is that adding cost, right? Like these things all get really hard. And then tiny boxes, I've even had another constraint to it. I want this thing to be silent, not totally silent, but my limit is like 45, maybe 50 DB, but not super micro machine, 60 DB. We have a small, we have a compute cluster at comma. You gotta wear ear protection to go in there. Like- [00:34:24]Swyx: Yeah, I've seen some videos where you give a tour. Oh yeah. It's noisy. It's super loud. [00:34:28]George: You got all these machines just screaming. All those, like if you have a blower, what is that thing? 10,000 RPM, just screaming. Like I want to be able to use the normal big GPU fans and make this thing so it can sit under your desk, plug into one outlet of power, right? Six GPUs, your GPUs are 350 Watts each. Can't plug that into a wall outlet. Okay, so how are you going to deal with that? Good questions, right? [00:34:51]Swyx: And you're not sharing them. [00:34:52]George: Well, that one, I mean, that one is pretty obvious. You have to limit the power on the GPUs, right? You have to limit the power on the GPUs. Now you can limit power on GPUs and still get, you can use like half the power and get 80% of the performance. This is a known fact about GPUs, but like that's one of my design constraints. So when you start to add all these design constraints, good luck building a tiny box yourself. Obviously it can be done, but you need something that has actually quite a bit of scale and resources to do it. [00:35:15]Alessio: And you see like the, under the desk, it's like one of the main use cases, kind of like individual developer use or. [00:35:21]George: Yeah, what I also see is more of a, like an AI hub for your home, right? As we start to get like home robotics kind of stuff, you don't want to put the inference on the robot, but you also don't want to put the inference on the cloud. Well, you don't want to put it on the robot because, okay, it's 1500 Watts, tiny box. You'll put batteries and charge them, bad idea. Just wireless. Wireless is 0.5 milliseconds, right? This is super fast. You don't want to go to the cloud for two reasons. One, cloud's far away. Okay, it's not that far away. You can kind of address this. But two, cloud's also mad expensive. Like cloud GPUs are way more expensive than running that GPU at your house. At least any rates you're going to get, right? Maybe if you commit to buy, well, yeah, I'm going to buy 10,000 GPUs for three years, then maybe the cloud will give you a good rate. But like, you want to buy one GPU in the cloud? I mean, okay, you can go to like vast, but like if you're going on Azure AWS, so that's expensive. [00:36:12]Swyx: This is like a personal data center instead of a cloud data center. [00:36:16]George: We like the term compute cluster. So we can use NVIDIA GPUs. [00:36:20]Swyx: Yeah, data centers may be a little bit dated. It's a compute cluster, [00:36:23]George: which is totally legal under the CUDA license agreement. [00:36:26]Swyx: You talk a lot about the PCIe connection. Do you think there's any fat there to trim? What do you mean? You're limited by bandwidth. [00:36:32]George: Okay, for some things, yes. So bandwidth is roughly 10x less than what you can get with NB-linked A100s, right? NB-linked A100s are going to have, and then you can even get like full fabric and NVIDIA really pushes on that stuff, 600 gigabytes per second, right? And PCIe, four, you're going to get 60, right? So you're getting 10x less. That said, why do you need the bandwidth, right? And the answer is you need it for training huge models. If you're training on a tiny box, your limit's going to be about 7 billion. If you're training on big stuff, your limit's going to be like 70 billion, right? Okay, you can hack it to get a bit higher. You can hack it, like GPT hacked it to get a bit higher, but like that 65 billion in LLAMA, like there's a reason they chose 65 billion, right? And that's what can reasonably fit model parallel on a GPU, right? So yes, you are going to end up training models. The cap's going to be like 7 billion, but I actually heard this on your podcast. I don't think that the best chatbot models are going to be the big ones. I think the best chatbot models are going to be the ones where you had a thousand training runs instead of one. And I don't think that the interconnect bandwidth is going to matter that much. [00:37:33]Swyx: So what are we optimizing for instead of compute optimal? What do you mean compute optimal? You're talking about this, the LLAMA style models where you train for like 200x. You train longer, yeah. [00:37:41]George: Yeah, yeah, yeah. You can always make your model better by doing one of two things, right? And a comma, we just have a strict limit on it. You can always make your model better by training longer, and you can always make your model better by making it bigger. But these aren't the interesting ones, right? Particularly the making it bigger because training it longer, fine. You're getting a better set of weights. The inference is the same. The inference is the same whether I trained it for a day or a week. Okay, if it's 1 billion versus 10 billion, well, I 10x my inference too, right? So I think that these big models are kind of, sure, they're great if you're research labs and you're trying to like max out this hypothetical thing. [00:38:13]Swyx: Which you can talk about later. Yeah, yeah, yeah. [00:38:15]George: But if you're like a startup or you're like an individual or you're trying to deploy this to the edge anywhere, you don't need that many weights. [00:38:22]Swyx: Yeah, yeah. You actually don't want that many weights. Optimizing for inference rather than capabilities doing benchmarks. Yes. [00:38:29]George: And I think the inference thing, right? There's gonna be so much more. Right now, the ratio between like training and inference on clouds, I think it's only still, I think it's like two or three X, right? It's two or three X more inference, which doesn't make any sense. It's way more inference. [00:38:41]Swyx: Yeah. [00:38:42]George: There should be 10 to 100 X more inference in the world than training. But then also like, what is training, right? You start to see these things like LoRa, like it's kind of blurring the lines between inference and training. And I think that that blurred line is actually really good. I'd like to see much more like on-device training or on-device fine tuning of the final layer. We're pushing toward this stuff at Comma, right? Like why am I shipping a fixed model? I totally want this model to fine tune based on like how your left tire is flat, right? Every time you cut the same turn because your left tire is flat, well, it should learn that, right? [00:39:11]Swyx: So would Comma pursue parameter efficient fine tuning? Yeah. [00:39:16]George: We're looking into stuff like that. I mean, Comma is already very parameter efficient because we have to like run this thing in a car and you have to like cool it and power it. [00:39:22]Alessio: And so this kind of like intelligence cluster you have in your home, you see when the person is using third-party model, they load them locally and kind of do the final fine tuning. It kind of stays within the box. [00:39:33]George: I think that that's one version of it for the privacy conscious. I also see a world where you can have your tiny box in its down cycles, mine flop coin, right? You know, it turns out not all crypto is a scam. [00:39:45]Swyx: There's one way to tell if crypto is a scam. [00:39:46]George: If they're selling the coin before they make the product, [00:39:49]Swyx: it's a scam. [00:39:49]George: If they have the product and then they sell the coin, it's maybe not a scam, right? So yeah, my thought is like each tiny box would let you, would have a private key on it. And you have to do it this way. You can't just let anyone join because of Sybil attacks, right? [00:40:01]Swyx: There's a real problem of like, [00:40:01]George: how do I ensure your data is correct? And the way that I ensure your data is correct on the tiny net is if you ever send wrong data, you're banned from the network for life. [00:40:08]Swyx: Yeah. [00:40:09]George: Your $15,000 hardware box is banned. [00:40:11]Swyx: So, you know, don't cheat. [00:40:11]George: Obviously if it messes up, we'll forgive you. [00:40:14]Swyx: Somebody's going to try to jailbreak your devices. There's no jailbreak. [00:40:17]George: There's no jailbreak. [00:40:18]Swyx: It's just a different network. [00:40:19]George: Well, there's just a private key on ea ch device, right? Like if you buy a tiny box from the tiny corp, [00:40:23]Swyx: I give you a private key. [00:40:23]George: It's in my backend server, right? You want to hack my server, that's illegal. Anything you want to do on the device, the device is yours. My server's mine, right? [00:40:29]Swyx: Yeah. Have you looked into like a federated training at all? [00:40:33]George: Okay. There's orders of magnitude of federated training. You mean like over the cloud and stuff? [00:40:37]Swyx: Over the internet? Yeah. Over the internet, but also distributed on a bunch of devices, right? [00:40:41]George: Yeah, I'm very bearish on this stuff. Because your interconnect bandwidth, right? So, okay. At the high end, you have your interconnect bandwidth of NVLink, which is 600 gigabytes per second, right? The tiny box has 60 gigabytes per second. And then your internet has 125 megabytes per second, right? Not gigabits, 125 megabytes, right? So, okay. That's how many orders of magnitude we're talking here? Like from 60 down to 125? Like, all right, that's over a hundred X. That's 400 X, right? So like, what you can do is inference, right? Like there's, for inference, you don't care, right? For inference, there's so little bandwidth at the top and the bottom of the model that like, yeah, you can do federated inference, right? And that's kind of what I'm talking about. There's also interesting things to push into, like you're like, but okay, what if you want to run closed source models? This stuff gets kind of interesting, like using TPMs on the boxes and stuff. But then someone might jailbreak my device. So, you know, maybe we don't try to do that. [00:41:34]Alessio: Yeah, what's like the enterprise use case? Do you see companies buying a bunch of these and like stacking them together? [00:41:39]George: The tiny box is like the first version of what we're building. But what I really want to do is be on the absolute edge of flops per dollar and flops per watt. These are the two numbers that matter. So the enterprise use case is you want to train, like Kama, right? So Kama just built out a new compute cluster. It's about a person and a half. [00:41:56]Swyx: A person being 20 petaflops. [00:41:58]George: A person is 20 petaflops. It's about 30 petaflops. We built out a little compute cluster and, you know, we paid double what you theoretically could per flop, right? You theoretically could pay half per flop if you designed a bunch of custom stuff. And yeah, I mean, I could see that being, you know, a tiny corp. Kama's going to be the first customer. I'm going to build a box for Kama and then I'm going to show off the box I built for Kama and be like, okay, like, do you want to build? I sell $250,000 training computers. Or how much is one H100 box? [00:42:26]Swyx: It's 400 grand? [00:42:27]George: Okay, I'll build you a 400 grand training computer and it'll be 10x better than that H100 box. Again, not for every use case. For some, you need the interconnect bandwidth. But for 90% of most companies' model training use cases, the tiny box will be 5x faster for the same price. [00:42:41]Alessio: You mentioned the person of compute. How do we build a human for $20 million? [00:42:47]George: Well, it's a lot cheaper now. So like I said, Kama spent about half a million on our person and a half, so. [00:42:54]Alessio: What are some of the numbers people should think of when they compare compute to like people? So GPT-4 was 100 person years of training. That's more like on the timescale. 20 petaflops is one person. I think you, right now the math was that for the price of the most expensive thing we build, which is the International Space Station, we could build one Tampa of. Yeah, yeah, one Tampa of compute. [00:43:16]Swyx: Yeah, which is the ultimate currency of measurement. [00:43:20]George: Yeah, yeah, we could build. So like the biggest training clusters today, I know less about how GPT-4 was trained. I know some rough numbers on the weights and stuff, but Lama- [00:43:28]Swyx: A trillion parameters? [00:43:30]George: Well, okay, so GPT-4 is 220 billion in each head, and then it's an eight-way mixture model. So mixture models are what you do when you're out of ideas. So, you know, it's a mixture model. They just train the same model eight times, and then they have some little trick. They actually do 16 inferences, but no, it's not like- [00:43:45]Swyx: So the multimodality is just a vision model kind of glommed on? [00:43:49]George: I mean, the multimodality is like obvious what it is too. You just put the vision model in the same token space as your language model. Oh, did people think it was something else? The mixture has nothing to do with the vision or language aspect of it. It just has to do with, well, okay, we can't really make models bigger than 220 billion parameters. We want it to be better. Well, how can we make it better? Well, we can train it longer, and okay, we've actually already maxed that out. We're getting diminishing returns there. [00:44:13]Swyx: Okay. A mixture of experts. [00:44:14]George: Yeah, a mixture of experts. We'll train eight of them, right? [00:44:16]Swyx: So, all right. [00:44:17]George: So, you know, the real truth is whenever a start, whenever a company is secretive, it's because they're hiding something that's not that cool. And people have this wrong idea over and over again that they think they're hiding it because it's really cool. [00:44:28]Swyx: It must be amazing. [00:44:29]George: It's a trillion parameters. No, it's a little bigger than GPT-3, and they did an eight-way mixture of experts. Like, all right, dude, anyone can spend eight times the money and get that. Coming back to what I think is actually gonna happen is, yeah, people are gonna train smaller models for longer and fine-tune them and find all these tricks. OpenAI used to publish stuff on this, you know, [00:44:47]Swyx: when they would publish stuff [00:44:48]George: about how much better the training has gotten holding compute constant. It's gotten a lot better, right? Think, compare like BatchNorm to NoBatchNorm. [00:45:00]Swyx: Is you're finding algorithms like FlashAttention? [00:45:02]George: Yeah, well, FlashAttention, yeah. And FlashAttention is the same compute. FlashAttention is an interesting fact where it's actually the identical compute. It's just a more efficient way to do the compute. But I'm even talking about like, look at the new embeddings people are using, right? They used to use these like boring old embeddings. Now, like, Lama uses that complex one, and now there's like Alibi. I'm not up-to-date on all the latest stuff, but those tricks give you so much. [00:45:23]Swyx: There's been a whole round trip with positional embeddings. I don't know if you've seen this discussion. I haven't followed exactly. [00:45:29]George: I mean, you quickly run into the obvious problem with positional embeddings, which is you have to invalidate your KV cache if you run off the context. So that's why I think these new ones, [00:45:38]Swyx: they're playing with them, [00:45:38]George: but I'm not an expert on like the latest up-to-date language model stuff. [00:45:43]Alessio: What are some of the things, I mean, that people are getting wrong? So back to autonomous driving, there was like the whole like LiDAR versus vision thing. People don't get into accidents because they cannot see well. They get into accidents because they get distracted and all these things. Do you see similarities today on like the Pathway GI? [00:45:59]George: Nothing I say about this is ever gonna compete with how Rich Sutton stated it. [00:46:03]Swyx: Rich Sutton, the writer of [00:46:04]George: Reinforcement Learning, The Bitter Lesson. Nothing I say is ever gonna compete with, The Bitter Lesson's way better than any way I'm going to phrase this. Just go read that, and then like, I'm sorry it's bitter, but you actually just have to believe it. Like over and over again, people make this mistake. They're like, oh, we're gonna hand engineer this thing. No, like stop wasting time. [00:46:22]Swyx: I mean, OpenAI is not taking The Bitter Lesson. They were leaders in deep learning for a long, long, long time. [00:46:27]George: Well, OpenAI was the absolute leader to the thesis that compute is all you need, right? [00:46:31]Swyx: And there's a question of how long [00:46:32]George: this thesis is going to continue for. It's a cool thesis, and look, I think I would be lying along with everybody else. I was into language models like way back in the day for the Hutter Prize. I got into AI through the Hutter Prize. Like 2014, I'm trying to build compressive models of Wikipedia. And I'm like, okay, why is this so hard? What this is is a language model, right? And I'm playing with these Bayesian things, and I'm just like, oh, but I get it. I have two data points, and they're almost the same, but how do I measure that almost, right? I just wrapped my head around this, and this was around the time Karpathy released the first RNN that generated the Shakespeare stuff. And I'm like, okay, I get it, right? It's neural networks that are compressors. Now, this isn't actually, you can't actually win the Hutter Prize with these things because the Hutter Prize is MDL. It's the model, size of the model plus the size of the encodings, embeddings. So yeah, you can't, I mean, probably now you can because it's gotten so good. But yeah, back in the day, you kind of couldn't. So I was like, okay, cool. [00:47:29]Swyx: This is what it is. [00:47:29]George: I kind of get it. I didn't expect that it would continue to work this well. I thought there'd be real limits to how good autocomplete could get. That's fancy autocomplete. But yeah, it works well. So like, yeah, what is OpenAI getting wrong? Technically, not that much. I don't know. If I was a researcher, why would I go work there? [00:47:48]Swyx: Yes, so why is OpenAI like the Miami Heat? [00:47:51]George: No, look, this is my technical stuff. I don't really want to harp on this, but like, why go work at OpenAI when you could go work at Facebook as a researcher? OpenAI can keep ideologues who, you know, believe ideological stuff and Facebook can keep every researcher who's like, dude, I just want to build AI and publish it. [00:48:08]Alessio: Yeah, any other thoughts, tiny corp, bounties? [00:48:11]George: You know, I've been thinking a lot about like what it means to hire in today's world. Okay, look, I'm a believer that machines are going to replace everything in about 20 years. So, okay, what is that thing that people can still do that computers can't? And this is a narrowing list, but like, you know, back in the day, like imagine I was starting a company in 1960. Oh, and we're going to have to hire a whole bunch of calculators in the basement to do all the, you know, math to support the, dude, have you heard about computers? Why don't we just buy a few of those? Oh, wow, man, you're right. So like, I feel like that's kind of happening again. And I'm thinking about, I will post in my Discord, I'll be like, who wants to like, okay, I just changed my unary ops used to be log and exp in like E. I changed them to be log two and exp two because hardware has log two and exp two accelerators. [00:48:59]Swyx: Yeah, and of course you can just change your base. [00:49:00]George: It's one multiply to get it back to E. But like, I made the primitives log two and exp two, right? I just posted in the Discord. I'm like, could someone put this pull request up? And someone eventually did and I merged it. But I'm like, this is almost to the level [00:49:12]Swyx: where models can do it. [00:49:14]George: We're almost to the point where I can say that to a model and the model can do it. [00:49:17]Swyx: Have you tried? Yeah, I don't know. [00:49:20]George: I think autocomplete went further than I thought it would, but I'm also relatively unimpressed with these chatbots. The problem is if your loss function is categorical cross entropy on the internet, your responses will always be mid. [00:49:32]Swyx: Yes, mode collapse is what I call it, I don't know. [00:49:35]George: Maybe, I'm not even talking about mode collapse. You're actually trying to predict the, like, look, I rap. I'm a hobbyist rapper. When I try to get these things to write rap, the raps sound like the kind of raps you read in the YouTube comments. [00:49:45]Swyx: Nursery school. [00:49:46]George: Yeah, it's like, all right, great. You rhyme box with fox, sick rhyme, bro. You know, and Drake is rhyming give it up for me with napkins and cutlery, right? Like, all right, come on. [00:49:55]Swyx: He's got like this thing about orange. Orange is famous so you can't rhyme it. Yeah, yeah, yeah, yeah, yeah. [00:49:59]George: But now, of course, you know, four-inch screws and orange juice is in GPT's training course. Yeah, so I think it went further than everyone kind of thought it would. But the thing that I really want to see is like somebody put 10 LLMs in a room and have them discuss the answer before they give it to me. Right, like, you can actually do this, right? And I think the coding things have to be the same way. There is no coder alive, no matter how good you are, that sits down, well, I'm going to start at cell A1 and type my program, and then I'm going to press run and it's going to work. No one programs like that. So why do we expect the models to, right? So there's a lot that, like, still needs to be done. But, you know, at the tiny corp, I want to be on the cutting edge of this, too. I want to be, like, program generation. I mean, what is TinyGrad? It's a compiler, it generates programs. Generate the fastest program that meets the spec, right? Why am I not just having ML do that? So, you know, it's kind of a, you have to exist fluidly with the machines. And I've come around on a lot of stuff. I'm like, wait, TinyGrad, TinyCorp should be a remote company. I can't do this in person. [00:50:58]Swyx: Really? [00:50:58]George: Yeah, like, comma makes sense to be in person. Like, comma, sure. Yeah, we're getting off in San Diego. [00:51:04]Swyx: But that was a six-year-old company, right? [00:51:05]George: And it works, and it works for a certain type of people [00:51:08]Swyx: and a certain type of culture. [00:51:08]George: But what's going to be different this time? Okay, remote, but now it's remote. And now I'm getting these, like, people who apply, and I'm like, I literally have a thousand applications. I'm not calling you to do a technical screen. I can't really tell anything from a technical screen. What am I going to do? Make a code on a whiteboard? Like, bring up a shared notebook document, so we could, oh, like, that's not going to work. Okay, so then I'm moved to the next thing. We do this at Comma with good success, programming challenges. [00:51:31]Swyx: I've also found them to be, like, [00:51:32]George: completely non-predictive. I found one thing to actually be predictive, and it's, wait a second, just write code in TinyGrad. It's open source, right? And yeah, so, you know, I'm talking to a few people who've been contributing, and, like, contribute, or, you know, the job's not for you. But you can do it remote, and it's, look, it's a chill job. Like, you're not, you're like, oh, yeah, well, I work for the tiny corp. Like, well, you're writing MIT-licensed software. Like, you see what it's doing, right? Like, we'll just, I think, think of it as maybe more of, like, a stipend than a salary. And then also some equity. Like, if, you know, I get rich, we all get rich. [00:52:01]Alessio: How do you think about agents and kind of, like, thinking of them as people versus, like, job to be done? Sean built this thing called Small Developer. [00:52:09]Swyx: It's in the same vein. Or, like, the human in the loop with the language model and just iterating while you write code. I think that's absolutely where it goes. [00:52:17]Alessio: And there's, like, a, it's not, like, one thing. It's, like, there's Small Interpreter. There's, like, Small Debugger. It's kind of, like, all these different jobs to be done. [00:52:24]Swyx: It's a small world. [00:52:25]Alessio: Yeah, it's a, I know, this is, like, the small box is, like, small AI meets tiny corp. [00:52:29]Swyx: So we're all in the same wavelength. [00:52:30]Alessio: How do you think about that? Do you think people will have a human-like interaction where it's, like, oh, this is, like, the AI developer, or, like, is it I'm the human being supercharged by the AI tools? [00:52:41]George: Oh, I think it's, yeah, much more like I'm the human supercharged by the AI tools. I think that, like, coding is tool-complete. Like, driving's not tool-complete. We hire people to drive who are, like, below the API line. Right, there's an API line in the world, right? [00:52:53]Swyx: Love that. Yes. [00:52:53]George: Yeah, yeah, yeah, there's an API line in the world. And, like, you can think, like, Uber's a really clear example, right? There's the people below the API line and the people above the API line. And the way you can tell if you're below or above, by the way, is is your manager a computer, right? Who's the manager of the Uber driver? [00:53:06]Swyx: Well, a computer, right? Does the machine tell you what to do or do you tell machines what to do? Exactly, exactly. [00:53:09]George: So, coding is tool-complete, right? [00:53:13]Swyx: Coding is tool-complete. [00:53:13]George: Coding is above the API line. So it will always be tools supercharging your coding workflow. And it will never be you performing some, like, task. Like, okay, well, I can do everything except for actually starting a Docker container. Like, it just doesn't make any sense, right? Yeah, so it will always be sort of tools. And, you know, look, we see the same stuff with all the, like, people are like, stable diffusion's gonna replace artists or whatever. It's like, dude, like- [00:53:38]Swyx: It's gonna create new artists. [00:53:39]George: Did Photoshop replace artists? [00:53:41]Swyx: Like, what are you talking about, right? [00:53:42]George: Like, you know, a real artist's finger paint. They can't use brushes. Brushes are, you know, brushes are gonna replace all the, okay, like, I just can't. Like, it's all just tools and the tools are gonna get better and better and better. And then eventually, yes, the tools are going to replace us. But, you know, that's still 20 years away. So, you know, I got a company to run in the meantime. [00:54:02]Swyx: So I've written about the API line before and I think that's from Venkatesh. I don't know if you've got your directive to it. I don't know, I definitely took it from someone. [00:54:07]George: It's definitely not mine. [00:54:08]Swyx: It's VGR. But I also have a speculated, a higher line than that, which is the Kanban board. Like, who tells the programmers what to do, right? So are you above or below the Kanban board? Has that evolved your management thinking? [00:54:21]George: Yeah, like, that's sort of what I mean. Like, it's like, I'm just gonna describe the pull request in two sentences and then like, yeah. [00:54:28]Swyx: So you are running the Kanban board? Or the bounties, you know? [00:54:31]George: Yes, the bounties are the Kanban board, exactly. And that is kind of the high level. And then like, yeah, we'll get AIs to fill in some and we'll get people to fill in others. And that's also what it means to be like, full-time at TinyCorp, right? Would you start, and I wrote this up pretty concretely. I'm like, okay, step one is you do bounties for the company. Step two is you propose bounties for the company, right? You don't obviously pay them, we pay them. [00:54:52]Swyx: But you propose them. [00:54:52]George: And I'm like, yeah, that's a good bounty. That like, helps with the main workflow of the company. And step three is you get hired full-time, you get equity, we all, you know, maybe get rich. [00:55:01]Swyx: What else are you designing differently about the employee experience? [00:55:04]George: You know, some people really like to like, [00:55:06]Swyx: like keep a separation, right? [00:55:07]George: Some people really like to keep a separation between like employees and management or customers and employees. Like a comma, you know, the reason I do the DevKit thing, it's like, dude, you buy a comma thing, you're an employee of the company. Like you're just part of the company. It's all the same thing. There's no like secrets, there's no dividing lines. There's no like, it's all a spectrum for like, you know, down here at the spectrum, like you pay. And then up here at the spectrum, you get paid. You understand this is the same spectrum of college, right? Like for undergrad, you pay, and then you get up here to like, you know, I'm doing a PhD program, you get paid. Okay, well, cool. Welcome to the, you know. [00:55:39]Alessio: What about comma bodies? You mentioned a lot of this stuff is clearly virtual, but then there's below the API line you actually need. [00:55:47]Swyx: Wait, this is a thing that's been announced? Comma bodies? We sell them. You can buy them. [00:55:51]George: They're a thousand bucks on our website. [00:55:53]Swyx: Oh, okay, no, no, no. I'm thinking about like the, what Tesla announced with like the humanoid robots. It's the same thing. [00:55:58]George: Except of course, we made the comma version of it. Tesla uses 20 actuators. We use two, right? Like how do you build the simplest possible thing that can like turn the robotics problem into entirely a software problem? So right now it is literally just a comma three on a pole with two wheels. It balances, keeps the comma three up there. And like, there's so much you could do with that already. [00:56:21]Swyx: Right? [00:56:22]George: Like this should replace, how many security guards could this replace? Right? If this thing could just competently wander around a space and take pictures and, you know, focus in on things, send you a text message when someone's trying to break into your building, you know, like, like this could already do so much, of course, but the software is not there yet. Right? So how do we turn robotics into a thing where it's very clearly a software problem? You know, that people don't accept that self-driving cars are a software problem. Like, I don't, I don't know what to tell you, man. Like literally just watch the video yourself and then drive with a joystick, right? Can you drive? And we've actually done this test. We've actually done this test where you've had someone, okay, you just watch this video and here's a joystick and you got to drive the car. And of course they can drive the car. It takes a little bit of practice to get used to the joystick, but the problem is all the model, right? So I can now make the model better. [00:57:07]Swyx: Our second most popular episode ever was about segment anything coming out of Facebook, which as far as I understand the state of the art in computer vision, what are you hoping for there that you need for Karma? [00:57:17]George: I haven't used segment anything. Like they large, large YOLOs or not. I've used like large YOLOs and I'm super impressed by them. [00:57:24]Swyx: Yeah. [00:57:25]George: I got to check out segment anything. I don't think it's a distinct problem, right? Okay, here's something that I'm interested in. All right, we have great LLMs. We have great text to speech models and we have great speech to text models. Okay, so why can I not talk to an LLM? Like I'd have a normal conversation with it. [00:57:39]Swyx: You can with the latency of like two seconds every time. Right? [00:57:42]George: And then it feels so unnatural. It's this like staccato. Like I don't like the RLHF models. I don't like the tuned versions of them. You take on the personality of our customer support agent. Right? [00:57:53]Swyx: Like, oh, come on. [00:57:54]George: I like LLMA more than ChatGPT. ChatGPT's personality just graded on me. Whereas LLMA, like, cool. I read a little bit of pretext paragraph. I can put you in any scenario I want, right? Like, that's interesting to me. So yeah, I think there is really no like distinction between computer vision and language and any of this stuff. It's all eventually going to be fused into one massive. So to say computer vision is solved, well, it doesn't make any sense because what's the output of a computer vision model? Segmentation? Like, what a weird task, right? [00:58:26]Swyx: Who cares? OCR? [00:58:28]George: Who cares? [00:58:29]Swyx: I don't care if you can segment [00:58:29]George: which pixels make up that laptop. I care if you can pick it up. [00:58:32]Alessio: And you're going to have the local cluster. You're going to have the body. [00:58:36]Swyx: Yeah. [00:58:37]George: Yeah, I think that's kind of where that goes. [00:58:39]Swyx: Maybe we can paint the future of like, the year is 2050. You've achieved all you wanted at TinyCorp. What is the AI enabled future like? [00:58:48]George: Well, TinyCorp's the second company. Comma was the first. Comma builds the hardware infrastructure. TinyCorp builds the software infrastructure. The third company is the first one that's going to build a real product. And that product is AI Girlfriend. No, like I'm dead serious, right? Like, this is the dream product. This is the absolute dream product. Girlfriend is just the like- [00:59:08]Swyx: Stand-in. [00:59:09]George: Well, no, it's not a stand-in. No, no, no, no. I actually mean it, right? So I've been wanting to merge with a machine ever since I was like, mad little. [00:59:15]Swyx: Like, you know, I was just like, [00:59:16]George: how do I merge with a machine, right? [00:59:18]Swyx: And like, you can look at like, [00:59:19]George: maybe the Elon style way of thinking about it is Neuralink, right? I'm like, I don't think we need any of this, right? You ever, some of your friends maybe, they get into relationships and you start thinking of, you know, them and their partner as the same person. You start thinking of them as like one person. I mean, they are kind of like merged, right? Like, humans can just kind of do this. It's so cool. It's this ability that we already have. Right, so I don't need to put, you know, electrodes in my brain to merge with a machine. I need an AI Girlfriend, right? So that's what I mean. Like, this is the third product. This is the third company. And yeah, in 2050, I mean like, ah, it's so hard. I just like, maybe I can imagine like 2035. I don't even know 2050, but like, yeah, 2035. Like, yeah, that'd be really great. [01:00:03]Swyx: In terms of merging, like, isn't it, shouldn't you work on Brain Upload rather than AI Girlfriend? Brain Upload, right? [01:00:09]George: I don't need Brain Upload either. Like, there's thousands of hours of me on YouTube, right? Yes. How much of my brain's already uploaded? [01:00:17]Swyx: That's only the stuff that you voice. Yeah, it's not that different. [01:00:20]George: It's not that different, right? You really think a model with, you know, an exaflop of compute couldn't extract everything that's really going on in my brain? I'm a pretty open person, right? Like, I'm not running a complex filter. Humans can't run that complex of a filter. Like, humans just can't. Like, this is actually a cool quirk of biology. It's like, well, humans like can't lie that well. [01:00:39]Alessio: So is it good or bad to put all of your stream of consciousness out there? [01:00:43]George: I mean, I think it's good. [01:00:45]Swyx: I mean, he's streaming every day. I want to live forever. We said off mic that we may be the first immortals, right? Yeah, this is how you live forever. [01:00:54]George: It's a question of, okay, how many weights do I have? Right, okay, let's say I have a trillion weights, right? So talking about a terabyte, 100 terabytes here. [01:01:02]Swyx: Okay, but it's not really 100 terabytes, right? [01:01:03]George: Because it's Kolmogorov complexity. How much redundancy is there in those weights? So, like, maximally compressed, how big is the weight file for my brain? Quantize it whatever you want. Quantization is a poor man's compression. I think we're only talking really here about, like, maybe a couple gigabytes, right? And then if you have, like, a couple gigabytes of true information of yourself up there, cool, man. Like, what does it mean for me to live forever? [01:01:27]Swyx: Like, that's me. No, I think that's good. [01:01:29]Alessio: And I think there's a bit of, like, a professionalization of social media, where, like, a lot of people only have what's, like, PC out there, you know? And I feel like you're going to get, going back to the ChatGPT thing, right? You're going to train a model on, like, everything that's public about a lot of people. [01:01:44]Swyx: And it's like- [01:01:45]George: Then no one's going to run their model and they're going to die. Don't put PC on social media. [01:01:49]Swyx: We're moving on to what would normally be called the lightning round, but just general tics, because you're a generally interesting person with many other interests. What does the goddess of everything else mean to you? [01:01:59]George: Oh, it means that AI is not really going to kill us. [01:02:01]Swyx: Really? [01:02:01]George: Of course. [01:02:02]Swyx: Tell us more. [01:02:03]George: Lex asked me this, like, is AI going to kill us all? And I was quick to say yes, but I don't actually really believe it. I think there's a decent chance that AI kills 95% of us. [01:02:11]Swyx: Okay. [01:02:12]Alessio: But they saw on your Twitch streams that you're with them, so they're not going to- [01:02:16]Swyx: No, I don't think, I actually, [01:02:18]George: I don't also think it's AI. Like, I think the AI alignment problem is so misstated. I think it's actually not a question of whether the computer is aligned with the company who owns the computer. It's a question of whether that company's aligned with you or that government's aligned with you. And the answer is no, and that's how you end up dead. [01:02:31]Swyx: So what the goddess of everything else means to me [01:02:32]George: is like, the complexity will continue. Paper clippers don't exist. [01:02:37]Swyx: You know, there are forces. [01:02:38]George: The paper clipper is cancer, right? The paper clipper is really just a perfect form of cancer. And the goddess of everything else says, yeah, but cancer doesn't win, you know? [01:02:48]Swyx: Yeah, it's a beautiful story for those who haven't heard it. And you read it out and I listened to it. Yeah, what are you grateful for today? [01:02:55]George: Oh man, I mean, it's all just like, I haven't, I haven't thinking about this stuff forever. Like, that it's actually like happening and it's happening in an accessible way too. I guess that's what I'm really grateful for. It's not like, AI is not some Manhattan project style. You don't know anything about it. Closed doors. [01:03:12]Swyx: Closed doors. [01:03:13]George: I'll fight really hard to keep it that way. I'm grateful for just how much is released out there and how much I can just learn and stay up to date. And I guess I'm grateful to the true fabric of reality that, you know, I didn't need differential equations to understand it. Like, I don't need some like, there's a limit to my math abilities. I can do most undergrad math, but I took some grad math classes and okay, now we're getting to the end of what I can do. And it's just the actual like, end of what I can do. Like, I'm limited by my brain, but you know, ML stuff, hey, you need high school math. [01:03:45]Swyx: You know what I mean? [01:03:46]George: When I learned to multiply a matrix, seventh grade, [01:03:48]Swyx: like, it's all easy. You need more electrical engineering than you need high school math early. [01:03:52]George: Yeah, well, you need electrical engineering to like, build the machines, but even that, like, these machines are simpler than the machines that have existed before. The compute stack looks really nice. So, you know, yeah, I just, I'm grateful that it's all happening and I get to understand it. [01:04:05]Alessio: John Carmack mentioned there's about six insights we have left. Do you have an intuition for what some of the paths [01:04:11]Swyx: people should be taking? [01:04:12]Alessio: Obviously you're working on one. What are some of the other branches of the tree that people should go under? [01:04:17]George: I don't think I'm working on one of the six insights. I don't think TinyGrid's any one of the six insights. Something I really like that Elon does, and I try to be inspired by it, is look at the boring tunnel machine and ask how you can build a 10X cheaper one. All right, look at the rocket. How can I build a 10X cheaper one? All right, look at the electric car and say, how can I build a 10X cheaper, like, cheaper or, you know, can go further or whatever, whatever, whatever, right? And you just do the straight up physics math, right? I'm trying to do the same thing with ML frameworks, right? And in doing so, making sure that this stuff remains accessible. You could imagine a world where if Google TPUs were actually the ultimate, if Google TPUs were actually the best training things, I mean, actually, you know, I'm kind of grateful for NVIDIA, right? Because if Google TPUs were the ultimate, now you have this huge closed source compiler in between XLA and the hardware, and yeah, that's just a really bad thing. So, I mean, something that is somewhat upsetting about the Tiny Core is that it is trying to prevent downside, but it's not all trying to prevent downside. Like, we're also building computers and we're gonna build some awesome, powerful, cheap computers along the way. So, no, I'm not really working directly on any of the six tricks. I also think the six tricks are kind of gonna be like luck. [01:05:25]Swyx: I think it's just gonna be like, you know, [01:05:26]George: please tell me more about what covariate shift is and how that inspired you to come up with batch normalization. Please tell me more about why it's a transformer and it has a query, a key, and a value, right? Like Schmidt-Huber described it better in fast weights. I mean, my theory about why transformers work have nothing to do with this attention mechanism and just the fact that it's semi-weight sharing, right? Because the weight matrix is being generated on the fly, you can compress the weight matrix, right? Like, this is what that, there's an operation in the transformer, which, and by the way, this is like, Qualcomm's SNPE can't run transformers for this reason. So, most matrix multipliers in neural networks are weight times values, right? Whereas when you get to the outer product in transformers, well, it's weight times weight. It's values times values, right? So, SNPE doesn't even support that operation, right? So, it's like that operation that gives the transformer its power. It has nothing to do with the fact that it's attention, [01:06:20]Swyx: right? [01:06:21]George: And this is a funny, like, but that is one of the six tricks, right? Batch, like these norms are a trick. Transformers are a trick. Okay, six more. [01:06:29]Swyx: So, you talk about attention as weight compression. [01:06:33]George: Compression is not exactly the right word. What I mean is that the weight can change dynamically based on the context. So, there was this thing in PAC-8 in the Hutter Prize that I absolutely loved, and I've never seen it again in neural networks, and it's a really good trick. Okay, imagine you have 256 weight sets for a layer, right? And then you choose which of the weight sets you're loading in based on some context. And that context can come from another neural net, right? So, I have another neural net, which projects 256 wide, one hot, do a softmax, predict it, and then I actually load the weights in. And I can do this operation at both test time and train time. I can do this operation at both training and inference, and I load in the weights given the context. Like, that is what transformers do. But transformers, instead of having 256 discrete ones, it's actually just that, but continuous. Which is funny that that was in language models, and I just like, when I understood that about transformers, I'm like, oh, this is a real trick, and why are they using the word attention? [01:07:23]Alessio: And today is actually the anniversary of attention is all you need. What? [01:07:27]Swyx: Oh, that's so cool. [01:07:28]Alessio: Today, six years ago. [01:07:29]Swyx: Six years. [01:07:30]George: Six years. [01:07:31]Swyx: Changed the world. Wow. [01:07:32]George: Well, there's one of your envelope tricks, right? And you could easily write it on an envelope, think about how you write out that. How many times have you written that? Because it's not in any libraries, because it's all used a little differently each time. Like, you just write out that exact same, you know. [01:07:45]Swyx: You've name checked Elon a few times. I think about both of you as systems thinkers. Input, output, thinking something in between. What's different about your style versus his? [01:07:53]George: Elon's fundamental science for the world is physics, mine is information theory. But you do a lot of physics as well. [01:07:58]Swyx: I mean, like, you base it on- [01:07:59]George: And Elon does a lot of information theory as well, too. But the difference maybe is expressed in what your ambitions are, right? Elon's ambitions may be like- [01:08:08]Swyx: Go to Mars. Go to Mars, right? [01:08:10]George: Go to Mars is the ultimate modernist physics ambition, right? It's a physics problem getting to Mars, right? [01:08:16]Swyx: Well, what are electric cars? [01:08:17]George: It's a physics problem, right? Okay, now he's like pushing on the autonomy stuff, and you push a little on information theory. But fundamentally, his dreams are physics-based dreams. My dreams are information-based dreams. I want to live forever in virtual reality with my AI girlfriend. Those are the aspirations of someone who accepts information theory as a core science. So I think that's the main difference between me and him. He has physics-based aspirations, and I have information-based aspirations. [01:08:39]Swyx: Mark Andreessen, he is a- Hi, Mark. He's a listener. He's a big proponent of effective accelerationism. You've been a bit more critical. Why do you say that IAC is not taken seriously by its adherents? [01:08:50]George: Oh, well, only the left takes ideology seriously. It's just like a fact, right? [01:08:55]Swyx: Is the right more cynical? Is that what it is? [01:08:57]George: I don't know. [01:08:58]Swyx: It's like the left actually manages [01:08:59]George: to get energy around the ideologies, right? [01:09:02]Swyx: Look, here you have- [01:09:03]George: You have two effective altruists named Sam going in front of Congress. Only one of them is in jail. [01:09:08]Swyx: You know, it's interesting. [01:09:09]George: They're both calling for regulation in their respective spaces, right? [01:09:11]Swyx: So SBF is definitely like kind of wolf in sheep's clothing, kind of, right? Like he only adopted IAC or EA to market. [01:09:19]George: Oh, and Sam Altman is a genuinely good guy who is not interested in power-seeking for himself. [01:09:24]Swyx: All right. Okay, okay. We don't have to go there. Fair enough, fair enough. [01:09:27]George: But no, IAC is not like, like you are not serious, right? Mark Andreessen, I like Mark Andreessen, but it's like someone who's like 2019, whose like eyes were opened about like the political world being not exact. You mean all the people on the news were lying to me? [01:09:42]Swyx: Bro, they were lying to you. [01:09:43]George: Like, okay, we all figured this out five years ago. Now, what are you going to do about it? I'm going to complain about it on Twitter. Great, and that's what IAC is. [01:09:50]Alessio: Last and maybe most important, why was Avatar 2 bad? [01:09:55]Swyx: Oh, I have a whole, you can go on my blog. [01:09:56]George: I rewrote the script of Avatar 2. I wrote a script that actually might make you feel something for the characters. I killed Jake Sully in the first scene. Like you had to. Do you really think his second story art topped his first one? No, of course not. You had to kill the guy and make the movie about the brothers, right? And just that alone and realizing that, like you could have kept the Titanic scene. [01:10:16]Swyx: It would have been fine. [01:10:16]George: I didn't even take it out. I left your Titanic scene, James Cameron, but I wrote you a story. So, you know, you're just, just, just. [01:10:23]Swyx: He needs ships to sink in water. [01:10:24]George: Look, it's a great scene, but like the movie was just like, like the Roman, I've never. [01:10:30]Swyx: Great CGI, you know, let down by the writing maybe. It's a beautiful world. [01:10:34]George: And that's why like I care so much, right? Like you don't hear me ranting about Pirates of the Caribbean 2 being a terrible story. Cause come on, what do you expect, man? Like Johnny Depp's like, wow, I had a movie that made me rich. I love this. [01:10:44]Alessio: But this goes back to like the midpoint. You know, I think you wrote like, feels like ChatGPT wrote the movie and that's my worry a little bit. It's like kind of converging towards that. [01:10:53]Swyx: Oh, I. Malik, Malik wrote the movie. Sorry, I didn't want to interrupt you. [01:10:59]George: I closed a pull request two days ago. I was like, was this written by ChatGPT? And I just closed it. [01:11:04]Swyx: Like, you know what? [01:11:05]George: I honestly feel bad if you were a human who wrote this. [01:11:07]Swyx: Incapable of being more perplexed. [01:11:09]George: But if you, if I have a classifier running in my head that asks, you know, is this a AI or is this a human? Like, you know, the only way to deal with all this, like, like, like, oh God, it's like the worst possible. Like, you know, people are like, how are you mad about like these chatbots? You're not mad about like Tesla. I don't want to buy a Tesla. I don't have to buy a Tesla. And it won't really impact my life negatively. But if I don't want to use a chatbot, it's still going to impact my life negatively. All the amount of like personalized spam that now makes me spend more cycles on my classifier to tell if it's spam or not, because you can now use AIs and generate this so cheaply. Like, no, I mean, we have to move to a model where everything's just a dollar, right? Like you want to send me an email, it's a dollar. Like you guys wouldn't care. None of my friends would care. No one would care, except the spammers, right? Like we just got to move to those sort of models. [01:11:54]Swyx: Awesome. [01:11:55]Alessio: One last message you want everyone to remember. [01:11:58]George: Go try TinyGrad. I hope that we're a serious competitor to what's out there. And then I want to take it all the way. We'll start with just building something for GPUs and then we'll start building chips and then we'll start building fabs and then we'll start building silicon mines and then we'll have the first self-reproducing robot using. [01:12:15]Swyx: Yeah, okay. All right, George. [01:12:18]Alessio: Thank you so much for coming on. [01:12:19]Swyx: You did a big inspiration. Thank you. Thanks. [01:12:21]Swyx: Thank you. [01:12:29] Get full access to Latent Space at www.latent.space/subscribe
01:12:4120/06/2023
Emergency Pod: OpenAI's new Functions API, 75% Price Drop, 4x Context Length (w/ Alex Volkov, Simon Willison, Riley Goodside, Joshua Lochner, Stefania Druga, Eric Elliott, Mayo Oshin et al)
Full Transcript and show notes: https://www.latent.space/p/function-agents?sd=pfTimestamps:[00:00:00] Intro[00:01:47] Recapping June 2023 Updates[00:06:24] Known Issues with Long Context[00:08:00] New Functions API[00:10:45] Riley Goodside[00:12:28] Simon Willison[00:14:30] Eric Elliott[00:16:05] Functions API and Agents[00:18:25] Functions API vs Google Vertex JSON[00:21:32] From English back to Code[00:26:14] Embedding Price Drop and Pinecone Perspective[00:30:39] Xenova and Huggingface Perspective[00:34:23] Function Selection[00:39:58] Designing Code Agents with Function API[00:42:16] Models as Routers[00:46:48] Prompt Engineering replaced by Finetuning[00:52:15] The 2 Code x LLM Paradigms[00:56:30] Smol Models for the future[00:58:54] The Evolution of the GPT API[01:03:27] Functions API Security vs Prompt Injection[01:16:18] GPT Model Upgrades[01:17:36] JSONformer[01:21:03] Closing Comments - What We Want Next Get full access to Latent Space at www.latent.space/subscribe
01:28:1214/06/2023
From RLHF to RLHB: The Case for Learning from Human Behavior - with Jeffrey Wang and Joe Reeve of Amplitude
Welcome to the almost 3k latent space explorers that joined us last month! We’re holding our first SF listener meetup with Practical AI next Monday; join us if you want to meet past guests and put faces to voices! All events are in /community.Who among you regularly click the ubiquitous 👍 /👎 buttons in ChatGPT/Bard/etc?Anyone? I don’t see any hands up.OpenAI has told us how important reinforcement learning from human feedback (RLHF) is to creating the magic that is ChatGPT, but we know from our conversation with Databricks’ Mike Conover just how hard it is to get just 15,000 pieces of explicit, high quality human responses. We are shockingly reliant on good human feedback. Andrej Karpathy’s recent keynote at Microsoft Build on the State of GPT demonstrated just how much of the training process relies on contractors to supply the millions of items of human feedback needed to make a ChatGPT-quality LLM (highlighted by us in red):But the collection of good feedback is an incredibly messy problem. First of all, if you have contractors paid by the datapoint, they are incentivized to blast through as many as possible without much thought. So you hire more contractors and double, maybe triple, your costs. Ok, you say, lets recruit missionaries, not mercenaries. People should volunteer their data! Then you run into the same problem we and any consumer review platform run into - the vast majority of people send nothing at all, and those who do are disproportionately representing negative reactions. More subtle problems emerge when you try to capture subjective human responses - the reason that ChatGPT responses tend to be inhumanly verbose, is because humans have a well documented “longer = better” bias when classifying responses in a “laboratory setting”.The fix for this, of course, is to get out of the lab and learn from real human behavior, not artificially constructed human feedback. You don’t see a thumbs up/down button in GitHub Copilot nor Codeium nor Codium. Instead, they work an implicit accept/reject event into the product workflow, such that you cannot help but to give feedback while you use the product. This way you hear from all your users, in their natural environments doing valuable tasks they are familiar with. The prototypal example in this is Midjourney, who unobtrusively collect 1 of 9 types of feedback from every user as part of their workflow, in exchange for much faster first draft image generations:The best known public example of AI product telemetry is in the Copilot-Explorer writeup, which checks for the presence of generated code after 15-600 second intervals, which enables GitHub to claim that 40% of code is generated by Copilot.This is fantastic and “obviously” the future of productized AI. Every AI application should figure out how to learn from all their real users, not some contractors in a foreign country. Most prompt engineers and prompt engineering tooling also tend to focus on pre-production prototyping, but could also benefit from A/B testing their prompts in the real world.In short, AI may need Analytics more than Analytics needs AI.Amplitude’s Month of AIThis is why Amplitude is going hard on AI - and why we recently spent a weekend talking to Jeffrey Wang, cofounder and chief architect at Amplitude, and Joe Reeve, head of AI, recording a live episode at the AI + Product Hackathon where 150+ hackers gathered to compete for over $22.5k in prizes from Amplitude, New Relic, LanceDB, AWS, and more.To put things in perspective, Amplitude is a legendary YC alum with $238M of revenue in 2022 — our first guests representing the AI efforts of a public company!We chatted about how they have been approaching AI in their product (“question to chart” BI, text field autofill, instrumenting Amplitude with Amplitude), some of the issues they’ve had with different models, and the importance of first-party data in the world of LLMs. Another topic that came out of the Q&A was this idea of almost an “AmplitudeGPT”; rather than using language to simply generate a query, you could have these models investigate reasons for why certain behavior is happening in your user base. It was a really good discussion, and hope you all enjoy listening to it! Sections* [00:00:47] Amplitude's founding story and pivot* [00:03:28] Amplitude as an AI company and opportunities* [00:07:14] Limitations and challenges with using AI models* [00:10:56] Using Amplitude's product to build Amplitude - instrumenting AI* [00:12:32] Existing ML models in Amplitude's product and customer use cases* [00:15:50] “A/Z testing” and adaptable products* [00:19:33] The future of analytics and dashboards* [00:21:03] Optimizing for metrics in chatbots and AI products* [00:26:22] Using general models vs. fine-tuned models* [00:30:24] The importance of models vs. data - Amplitude's data set* [00:39:00] Lightning Round + Q&AShow Notes* Amplitude* Sonalight to Amplitude pivot announcement* The Slack origin story* Reverse Engineering Copilot* Simon Willison’s blogTranscriptEditor’s note: all timestamps are 1 minute behind because we hadn’t yet added the intro before making these. Sorry about that!Alessio: Thank you everyone for coming. Hopefully, some of you have listened to the podcast before, if you haven't, we focus on AI research and application. So we don't focus on “AI is going to kill us all”. We don't think about virtual girlfriends. We don't think about all of these more societal things. We're focused on models: how do you build them? How do you train them? How do you use them in production? What are some of the limitations on getting these things from demos to things that millions of users use? And obviously, a lot of you are building things. Otherwise, you wouldn't be here. And some of you have been building things for a long time, and now have a new paradigm that you want to build on top of. So I'm excited to dive in here. And maybe, I mean, I'm sure most people know you, but maybe you want to do intros and give a little background. [00:00:47]Jeffrey: Sure. Yeah, hey, everyone, met you all this morning, but I'm Jeffrey. I'm one of the co-founders and Chief Architect here at Amplitude. Been working on this product analytics thing, helping people understand user behavior data and make great product decisions and build better products for the last decade or so. And obviously, AI is a technology that we've been leveraging for a long time, but the recent trends are particularly exciting. And yeah, we have a lot of thoughts on how to apply that to our space, what we're doing in our product, and what we think the future of AI and product development and product data is. So excited to talk through some of those. [00:01:20]Joe: Yeah, I'm Joe, Joe Reeve. I've got a background in sort of startups and tech, been professional software engineer since I was 16, quit college. And at the moment, I'm running sort of AI R&D efforts here at Amplitude. Super excited about all the new stuff, but also all the stuff that Amplitude's been doing for a long time and how we're sort of getting renewed interest and excitement and abilities to push that even further forwards. [00:01:44]Swyx: So I think it's useful for people listening on the podcast and also some people here. Can you contextualize Amplitude as an AI company? Like what does that mean to you? What unique opportunities do you guys have? [00:02:02]Jeffrey: Sure, yeah, happy to speak to that. So, you know, if we think about the fundamental thing that our customers of Amplitude try to do, it's they want to look at their product data and they want to figure out how do I make my product better? And the really cool thing about product data is that one, it's often like very high fidelity, right? Digital products compared to, you know, let's say physical products before them have way more information about what's going on. And so that's why product data is, you know, even a thing at all, right? You finally have that feedback loop of, hey, I built this thing. This is how people are using it. Now let me learn from that and make my product better. Now, one of the downsides of that is that the data is massive. If you look at any of the internet scale products out there, they generate enormous amounts of data. And the ability of humans to kind of sift through that data is obviously limited. At Amplitude, we try to give people as many tools, whether AI or not, in order to process that. But at the end of the day, if you could get from the data and what user behavior is happening in your product to the insights of how to make your product better without as much manual work, that's kind of the holy grail of product analytics. And so in some sense, Amplitude has always been a company on the path to AI because figuring out how to make your product better from data is ultimately an AI problem. And so we're kind of just solving all the barriers in the way, like getting data in first, building good models for short-term things. And long-term, it's always been about, hey, how can you take product data and automatically make your product better as fast as possible? [00:03:28]Alessio: So that's the future of Amplitude. And a lot of people here probably want to start companies and whatnot. So maybe you want to give a 60 seconds of why you started Amplitude and what the story was like and maybe the first three to six months, what the challenges were. [00:03:42]Jeffrey: Yeah, of course. It's funny that we talk about this because the start of Amplitude is actually almost more AI than the current state. And so actually my two co-founders, Spencer and Curtis, they went through YC originally with not Amplitude, but SonaLite, which was a text-by-voice company. So it was kind of before the era of Siri and those types of technologies where they wanted to build something that would read text messages to them, that's easy, but also do voice recognition so that you could send text messages, say when you're driving, without having to pull out your phone. And so they worked on it and it was really popular back when they were doing it. After they finished YC, they realized the big innovation that they needed to figure out in order to make that successful was being really good at voice recognition, which was a different problem. They're awesome software engineers, but they don't come from an ML background. And so it's like, okay, are we going to spend the next five years solving voice recognition? Not really the thing that they had in mind when they were building product. But one thing that they happened to stumble upon as they were working on that was they spent a lot of time thinking about, hey, what was hard about that product? What made users churn? What made users really love it and engage? And they built a bunch of analytics tools to help them understand that. And they were really kind of shocked that those tools didn't exist out there in the market or they were like much more primitive than they wanted. And it turns out a bunch of other people in their YC batch felt the same. And they were like, hey, that analytics thing you're building, we want that. For you to text by voice, we want your analytics product. And so they're like, okay, fine. We will pivot, natural language and voice recognition isn't really our thing. And so we'll do distributed systems and analytics instead. That's where I came in. I'm a distributed systems and analytics guy. And so I happened to get in touch with them just through some mutual friends at the time. And then, yeah, we kind of went on it. The funny thing about a lot of things in technology is that the most forward thinking companies with respect to a lot of technologies are gaming companies. And so a lot of AmpliG's early start was either gaming companies or companies with founders that came from gaming backgrounds, where in gaming people have always been very, very rigorous about product data and optimizing engagement loops and all of that. And so they look for the best tools. We went to Zynga 15 years ago. It's like, that's where product analytics originated. And so a lot of those founders of new startups who had left Zynga were like, hey, that thing that you're building, that's trying to figure out patterns and user data and use that to make better products. That is exactly what we want after leaving Zynga. And then from there, that was Amplitude.Swyx: Yeah, I think famously other gaming companies would be like Slack, right? Mr. Butterfield tried to make a gaming company and failed and made Flickr. Then he tried to make another gaming company and failed and made Slack. And now look out to see what he does next. Discord as well. That's right. [00:06:34]Jeffrey: Yeah, people who come from gaming backgrounds are very rigorous in their product thinking. [00:06:39]Swyx: That's interesting. Alessio, you have a background in games? [00:06:43]Alessio: Yeah, in playing them, not in building them. So I will not fall into an enterprise company by doing that. Let's talk about R&D today and some of the ideas that you're working through, like some of the limitations that you run through. I think the most interesting thing about hackathons is you come with an idea and then you kind of hit a wall trying to build it. And then that takes you into another path. Like what are maybe funny things that you learn in terms of like the limitations of these models or like the missing infrastructure for using them? [00:07:14] Joe: So we've got a couple of different frames for thinking about this. There's AI that we're putting into our products and then us knowing that our customers want to put AI into their products. So there's the, how do we support our customers in their product development using AI? But how do we do that ourselves? And this is a great opportunity for us to learn the challenges our customers are gonna see. And so the first thing there is let's just start from the beginning, assume we want to add AI to our product, which maybe isn't the best place to start, but let's just assume we want to. How do we start ideating opportunities to put stuff into our product? So we sort of came up with this framework where we look at our product and we think about what are the collaboration touch points? So where are the points that a human might hand off to another human? And then think where can we replace one of those humans with the machine? So instead of thinking of some AI, amorphous AI, LLM, whatever, we're thinking actually, what if we had a robot that we were collaborating, not just a human, not just some sort of thing that spits out numbers. So collaborating. Then there's thinking of these as tools. So this is like your auto-suggest, on your mobile keyboard or spell check or something. How do you integrate this stuff as deeply into your product? So what are the friction points that users go through? Maybe they check lots of boxes. Is there a way we can pre-check those boxes we can get? So that's the feature embedding really deeply into the tool you've already got, the product you've already got. And then you step back and think, okay, what's a tool? So a tool is like ChatGPT, where you go there, it's an AI powered tool. It's not necessarily connected to your product, but it's a supplementary tool that you add. So there's a sort of ideation process there that we went through. And we sort of landed on a couple. And one of the key things that Amplitude does is help our customers, one, collect data in like a standard and sort of queryable way. And then we help them query it and get insights out of that data. So we were thinking, what's the feature there? How do we embed that? But also what's the collaboration point? And you might be a product manager asking an analyst, hey, please help me. Let's have a conversation about this. I don't know what questions to ask, but you also might just be about to go click the big create button and fill in a bunch of fields. And can we fill in a bunch of the fields for you? So we went to what to us seemed like one of the most obvious places. And we built a text box. Surprise, surprise with LLMs. We've got a text box. You can type in a question, type in anything about your data that you want to know, and then it'll spit back a chart, which is kind of neat. And we hit a bunch of problems there with LLMs hallucinating, losing context, even within the context windows, not really sort of recalling everything within the context window. So we sort of did a bunch of experimentation and realized if we split this down to seven different questions, so instead of saying, generate me a chart and a query for this one question, let's split that into lots of sub queries, like what kinds of events should I use? How should I display this? What should I call it? Rather than asking you all of that in one go. But then we had another problem where we have one query that a user makes that actually spins out seven different queries. So how do we monitor this? We can't just say one performance metric. You know, RLHF, you can't just say yes or no. Was the query response good? Because it might've failed for one of seven reasons. And maybe multiple of them failed or maybe some of them failed and then maybe they've hallucinated. And so we're getting code errors where an enum is not being matched. So we've had lots of sort of issues going all the way down there that we've had to figure out from first principles and sort of a really exciting way for us to understand what our customers are going through. [00:10:56] Swyx: So I wanna be clear. So you've described your exploration and how you think about products. What have you released so far? I just wanna get an idea of what has been shipped. [00:11:08]Joe: Sure. So in terms of LLM stuff, this, we call it question to chart internally. This ask a question, get a chart out. This, we've started rolling out to customers already. So last week, actually, started rolling out to our AI design partners a sign that we had signed up, which is a really exciting process. Actually, a lot of customers are just so excited to work with us and try it out and see how they can break it. So that's something we rolled out recently, which is built in LLM. It's the first piece built on LLM that we're working on. But we've also had a bunch of long-term ML, sort of traditional ML models that we've been running and products that we've been running with customers that help them predict what their users are gonna do. Because we've got this massive behavioral data set, best behavioral data set in the world. So we can train these awesome models and help our customers predict what their users are gonna do. So they can share the more relevant content or now is the right time to ask people if they want to upgrade or they want to rate your app or that sort of thing. [00:12:05]Swyx: Yeah, there is a little bit of a contrast, conflicts, because you already had all these ML models in-house and you're spinning up a new AI team and you're like, no, let's do all of this with GPT-3. Are the existing ML researchers saying like, no, this is a complete misuse of text generation? Or are they excited about it? Is it unlocking new things? [00:12:32]Joe: Yeah, actually, it's the combining these things. So we're able to use the traditional ML to shorten the fields, to narrow the number of things we need to pass into the LLMs. Because the LLMs can do a lot more of the reasoning, but we can make sure that the context we're providing is much more specific and generally much better by using the traditional ML models. [00:12:53]Swyx: Yeah, okay. And then the pain points that you're experiencing are hallucination. And then also like the multi-query thing. What do you think you wish for? Or what do you think you're thinking about to solve those pain points? [00:13:06]Joe: So right now we're instrumenting with our own product. So we're instrumenting groups of inferences and individual inferences, which means we can then create charts that show how often they fail, why they fail, how often we need to retry to get good answers. Swyx: So amplitude using amplitude. [00:13:23] Joe: Exactly. To build amplitude. [00:13:24]Swyx: Yeah, exactly. [00:13:25] Joe: Well, I mean, we're a product company. What else would we do? [00:13:29]Swyx: That is the second part of what you're saying, right? Which is, first of all, you want AI in the amplitude products. Second, people are shipping AI products with amplitude. You wanna talk a little bit more about what you're seeing there? [00:13:39]Joe: Yeah. I guess the key thing here is, for a lot of people is, okay, I can build the thing that calls OpenAI's API and then gives a response back. I'm nervous that I'm gonna be giving incorrect answers. I'm nervous that I don't really know how to measure whether the answers are incorrect. And I'm nervous that I'm not gonna be able to improve over time. So a lot of people we actually hear are nervous of giving thumbs up, thumbs down buttons because they're implying to their users that they're gonna be using this to improve the results. But they actually have no idea how to use that to improve the results in a meaningful way. And particularly when you've got multiple queries going off for one request, you've gotta then fine tune lots of different things in parallel. So it gets to be quite a technically complex sort of problem if you're not using great tooling that already exists for it. So that's, and then you have the extra layer of, I'm getting a bad result. I've tweaked my prompt template that I'm sending off to OpenAI. And now, has the result got better or worse? [00:14:35]Swyx: I don't know. [00:14:36]Joe: I don't know how to measure that. Except by thumbs up, thumbs down, which is a difficult measure in the first place. So that's where we can start saying, measuring the behavior of users once we've generated something for them. So have they gone and shared this content? Have they used this content? They actually gotten any value out of it? Not just have they pressed thumbs up. We can actually measure, are they getting value? Are they throwing it away from their behavior? But then using that through the Amplitude product, we can then tie that through to A-B tests, which is another product that Amplitude has. So then suddenly we start, and we're not doing this yet. This is sort of next on our list, is to start putting these prompts into our A-B test variants. So then we make a tweak in the UI, and it goes off, fires on the original, the control and our variant, our new variant. See, does it get fewer or more errors? Does it get fewer or more thumbs up, thumbs down? [00:15:30]Alessio: Have you thought about, I don't know, A-Z testing, I guess? Like one of the limitations has been, well, people can only write so much copywrite to test, but now with these generative models, you can actually generate a lot of copy. And like you go to on-demand test more and more and more copy. Have you seen any maybe fun customer stories? Like can you, anything there? [00:15:50]Jeffrey: Yeah, so actually there's a very good example of this. I don't know if I can share the actual customer, but actually from before the LLM days, where they literally generated the versions of the copy themselves, and they made their product basically adapt, you know, multi-arm bandit style of like, hey, here's all these different variations, like just go figure out the best one. At an internal hackathon, maybe two months ago, I built a prototype of what you're talking about, which is, okay, now replace the copy generation with an LLM. So just constantly generating new variations, and then multi-arm banditing to figure out which one's the best. I think that is probably the future of copywriting, where it's like, you don't actually need a whole lot of manual work anymore. It can, almost everything can happen automatically. And it's kind of the micro example in my head of this concept that we really like, which is self-improving products, where, you know, at some point, you know, someone has to say, hey, I'm gonna build a product that does this, you know, like a newsreader or something. But then, you know, after you have that, like the title of the newsreader, like the description of the sections, your navigation, all of that, in theory, you know, if you can give it some structure that the AI can play with, the LLM can manipulate all of that for you, and then use, you know, A-B testing, multi-arm bandits and all of that to kind of figure out what's best. And that generative AI kind of makes that last piece of like, what are my options possible? And that's super exciting for us. And we wanna be there, you know, to help you measure that, help you deploy that, and make that like the way people build products in the future. [00:17:14]Alessio: I think I've talked about this on the podcast, but this idea of like just-in-time UIs, you know, like each type of user wants to interact in a different way. And like, what you're building is a way of that, right? Like, Amplitude has been really like dashboard-driven, kind of like a diagram-driven, showing the user flow. Now each user can say, hey, I don't really want the table. I just want the charts. Or like, I don't want the charts. I just want the data. What do you think about the future of like dashboards and like BI in general? But like, the analysts used to come up with like what you should be seeing. Now each user can ask their own questions. [00:17:47]Jeffrey: Yeah, like the future of analytics, I think, is, you know, can go a few different paths. One thing that I want to, you know, counter against the whole LLM trend a little bit is I think when you get into really important and specific questions, you know, let's say you're writing like some complicated SQL or even code, you know, code and SQL are good because they're very specific, right? You can define your semantics very precisely. And that's something that I think, you know, when people start thinking about like natural language questions, they kind of take for granted. They're like, oh yeah, why doesn't it just, you know, figure out the precise semantics from my very ambiguous words? It's like, well, it's actually, in some senses it's possible, right? Because the precise semantics are not captured by your ambiguous natural language words. And so the way we think about it, at least today, you know, who knows what's going to change in the future is like natural language is a great interface to like get started. If you don't know what the underlying data looks like, if you don't know like what questions you should be asking, it is a very, very expressive way to start, get started. It's much easier than manipulating a bunch of things, much, much easier than writing SQL and all of that. But like once you kind of know what you want, it's very hard to like make it precise. It's actually easier to make SQL or code precise than it is natural language. And so that's a little bit of what we're thinking right now. So we think, you know, for sure the way that maybe many people will interface with analytics and data will turn into natural language because maybe the precision doesn't matter to them. But like at the end of the day, when you're trying to get, you're trying to sum up your revenue or something, it's like, you want to know that it's right. And you want to know the semantics that go into that. And like, that's why, you know, that's part of why data is hard. The semantics really do matter. They can make a huge difference in the output. And so there's a boundary there that I'm curious where it will push over time, but I don't think it's quite there yet. [00:19:33]Joe: I think this is where models sort of can become more embedded as features rather than go off and do this thing, create this analysis for me and then come back, the collaborator model. Then we're saying this field, I'm not sure what should go in there. Can you make a suggestion? And then I'm going to go and refine it over time. So it's the sort of autofill, but guessing autofill, but then you still, you can tweak everything. This is one of the core design sort of principles that we've come up is yes, you've got to be able to explain what the model's doing. And as a human, I need to understand, a user I need to understand what is the model doing and why is it doing it? But I also need to be able to tweak it once it's done it. I don't want to feel like I've just said go and then I can't stop it and it's going to go off and do stuff. And that's sometimes how things like AutoGPT can feel. It's going and it's costing me OpenAI tokens and I have no idea what's going on. So yeah, I think a key thing is servicing all the individual things the model's doing and allowing users to tweak it, stop it, retry while it's going. [00:20:33]Swyx: For me, one of the most challenging questions is something I think you guys have maybe thought about a lot which is chat. Ideally you want, like you could say naively, for example, you want to optimize time in app, but actually that's a sign of failure if the chat session is longer than it should be. Do you have any advice on, I'm sure you've dealt with this before pre AI era, but like what do you advise AI hackers to optimize for? Like what analytics should people be looking at? [00:21:03]Jeffrey: Yeah, our general kind of philosophy as a company is to work with customers to identify north star metrics. Right, and like time in app is not good primarily because it doesn't actually correlate with your business outcomes most of the time. And to be fair, sometimes it does. Like if you're a social media app, maybe it does correlate really well and maybe it's not a bad metric then. But for a lot of other products, right, if you're trying to do the search, for example, or like time on search, like nobody wants that. It's like, yeah, what is your success rate? You know, how many, do you get them to come back and search in the future? Like that's much more interesting than the time of your session. And so, because you know, each time you can serve apps, right, that's your business. And so it's like, if you choose a metric that's well correlated with your business outcomes, then that's at least the first step to getting that right and not getting caught up in other vanity metrics that sound like they could be good to increase, but then, you know, they can sometimes lead to negative business outcomes, you know, and then you get the worst. You've optimized the wrong metric the whole time. And that's where tying in AI and product analytics makes a lot of sense. And it's really important because product analytics, these companies that are like our customers that are trying out building features that are LMs and they're not sure what to optimize for, optimize for the same thing you're already optimizing for. You're already measuring conversions. You're measuring how much value, hopefully, your customers are getting out of your product. So continue doing that and maybe find a way to tie the LLM feature to that and sort of through A-B tests and that sort of thing. And then on the chat specifically, chat is obviously for a business maybe rolling out a chat box based on LLMs. It can be really scary. And that's another sort of mental model of framing we've been thinking around is we find LLMs right now are most useful either when you come from, either when you have a narrow input space and a broad output space, because you can be very, you know exactly what format of data, what kind of data is gonna be passed in. That's probably not coming directly from a user. It's probably coming from a button click or a toggle switch or something. And then you can have a general output and you can provide templates and that sort of thing. And then the other way is broad input space, narrow output space. So that's free form text box. And you can provide a bunch of sort of clamping, framing, validation on the output to make sure that you're not spewing out, you know, poems about Hitler or whatever it is. You know, you can be really, really deliberate when you've got a small output space. Chat is large input space, large output space, which is really, really scary. If you're, as a company, you're not selling a chat product, you're selling a, you know, an analytics product with maybe a chat support bot or something. [00:23:37]Swyx: Yeah, I think this is one of those opportunities. I always try to raise the awareness of this, that Copilot I think did a really interesting metric or North Star, which was how much code is kept or retained by the user. And for people who are Googling along, you can actually look for this blog post about reverse engineering Copilot internals. And they actually set up custom metrics around, you know, 30 seconds after a code snippet is accepted, one minute, two minute, three minute, all the way to five minutes. And you can sort of see it construct a curve of how long Copilot suggestions stick around. And from there, they can actually make statements like this, you know, evaluate the success of the products. It's pretty cool. [00:24:18]Joe: One of the really nice things we found actually, we accidentally did this. So our chart building interface, heavily instrumented. It's a, we're Amplitude. So we instrument our product. We also, it's one of the main tools that our customers use. So it's really, really well instrumented. And so when we tied chart creation through asking a question through an LLM, and then we tied that to a chart, an output chart, we then automatically were able to tie every time someone edits any of the parameters to that generation. So then we know, we have really detailed RLHF data for, yeah, you got everything apart from the metric, right? But you got everything apart from this event that shouldn't have been there, because that's the one that got removed. So similar to the Copilot there. [00:25:00]Alessio: And I want to make sure we open it up for questions, but like one last thing is about, everybody knows that small is beautiful. And when you think about what models to use and some of the parameters, like there's costs, there's latency, there's like accuracy. How do you think about using, you know, GPT-4 and some of those models versus using smaller ones that are fine-tuned? What are the trade-offs? [00:25:23]Joe: Yeah, I guess right now we're very much in the, let's explore, let's try everything and just iterate as fast as possible, which is what general models are great for. We do have some smaller, not even fine-tuned, some smaller models that we've sort of borrowed from Hugging Face that we run internally for more specific tasks. And that's often sort of selecting specific values before we pass it to a general model right now, just because the general models are much easier to communicate with and they understand most of the words we use. It's not like we use a word and suddenly we get random outputs for no reason, the sort of gold magic up type thing. So they're generally less susceptible to that. So that's why we're iterating heavily on the general models. I think we absolutely have to move to some more specific models, particularly given inference on fine-tuned open AI models gets more expensive and slower the more you do it. So yeah, that's definitely a thing we're looking at and we're doing some internal stuff, but it's the next step or one of the next steps. [00:26:22]Jeffrey: Yeah, to give a pseudo example of that, one of the hard things to help users within Amplitude is picking the right event to analyze. It's kind of your fundamental unit of analysis. And when a user comes in and let's say that's the first time they're using Amplitudes, someone else in their company has set up the product, so they don't know what the events are. Right now in Amplitude you get this massive dropdown and it's like, all right, there's a thousand things, like which one is the one I'm looking for. And sometimes the names are good and sometimes they're not. But one thing we did was, okay, yeah, feed that into open AI. Hey, tell me which event type best matches like this user's intent. That's like pretty good at that, right? So it's all language stuff, but it's a little bit slow and it's a little bit expensive to do that every time. And so we kind of fell back to, once we validated that that works, kind of fell back to a more traditional embedding-based approach. It's like, all right, compute all those embeddings. That's more work upfront because you have to go through your database of all of these things and you got to commit like that engineering work, but it's like you validate with the general model because it's just easy. It takes like an hour to figure out that it works. And then it's like, all right, can we do the same thing with embeddings? That's way faster, way cheaper and still has reasonable quality. Embeddings also have a nice quality that you can get like magnitude of things, whereas LLMs aren't great at giving you like, hey, it matches this much. It's kind of, you can ask it for an order and that's decent, but like, yeah, anything beyond that is pretty challenging. [00:27:42]Alessio: How do you think about the importance of the model versus the data, right? There's like a lot of companies that have a lot of data, but not a lot of AI expertise or companies that are just using off the shelf model. How should companies think about how much data to collect? What data is meaningful? What isn't, any thoughts there? [00:27:59]Jeffrey: Yeah, I think it's safe to say that both are really important, right? Like the evolution of LLMs really was a lot of model innovation. And so I don't want to downplay that. At the same time, I think the future of AI applications and doing really cool things with it will be in the data, partially because like, you know, ChatGPT has done such a huge advance, right? The LLMs model space has advanced like crazy in the last year. And so I think a lot of the untapped potential will be in data in the future. One thing that's particularly interesting to us is like we have a pretty unique data set, actually. It's a lot of first party behavior data, right? So if you're, you know, if you're Square, for example, you instrumented like the way that people interact with Square Cash and the wallet and the, you know, the checkout system. And like, those are very specific things. Like Square can't look elsewhere in the world for that stuff. And that's really interesting because, you know, to build models of user behavior, you need user behavior data. And it turns out there's not actually a lot of examples of user behavior data out there in the world. And so to Joy's point earlier about, you know, we have one of the best user behavior data sets in the world. And so if we want to build a model around that, I think it would be a super interesting one. So if you take an analogy to what ChatGPT does, it basically takes a bunch of language examples and it, you know, learns a bunch of abstract concepts, like how to, you know, prove math things or how to render in JavaScript. It's like, wow, that's very astonishing. They kind of prove, it's almost like a proof of concept to the world that if you train a sufficiently good, you know, transformer self-attention type model with a sufficiently large data set of, you know, hundreds of gigabytes of internet text, you'll learn really interesting abstract concepts. And so we want to apply that to our data set, right? Cat GPG is great because it's a proof of concept. If it didn't exist, you know, I would have told you, yeah, you can spend $10 million training this model on a data set, you'd probably not get anything interesting because we just have no idea. But because it exists, it kind of proves to the world that if you do this correctly, there is a ton of interesting value. And so that's what I think. And so, you know, amplitude is just one example of a very interesting data set that you will train something that's, you know, fundamentally very different from GPT or any LLM out there. And there's lots of other data sets out there. And I think that's where a lot of the interesting things will come once this kind of, this phase of like rapid model evolution kind of tapers out a little bit. And you'll see a lot of the more interesting applications there. [00:30:24]Swyx: So I've never thought about this much, but you guys must do it a lot. Like what is the ethics or best practices around training on user data when they don't know they're being watched? Like, I mean, presumably they're fine with tracking and events, but like, do we tell them that we're going to train on their data? Is it okay? [00:30:50]Joe: I guess there are a couple of things. One is PII. Doesn't go anywhere near the stuff, right? PII with strip and like, that's just a really important thing. [00:30:58]Swyx: You still need an identifier for streams. [00:31:02]Joe: Yeah, yeah. But in terms of training models, we don't want any of that to go in there because then you might accidentally, you know, like, hello, ChatGPT, please hallucinate me a social security number. That's dangerous. [00:31:11]Swyx: Also PII makes it into prompts a lot. [00:31:14]Joe: Sure, that's true. So then you have to strip that from your... So we have some experiments where we're stripping PII that is in places that shouldn't be, you know, descriptions of things. Sometimes people copy paste big long lists of email addresses into charts and things. But some of these things are actually pretty surprisingly easy to detect and strip out. So we can do that. And we have some layers that are stripping out that sort of replacing them with tokens. So the LLMs can still operate on them. But in terms of training this data, all that training is happening internally and we're not putting any sort of private data, personally identifiable information in. I don't know if there's anything you wanted to add there. Yeah, yeah. [00:31:54]Jeffrey: We certainly think about this a lot and our customers think about a lot. Like when I think about user privacy with respect to tracking, there's kind of this big spectrum. Around the one end, it's like literally track nothing and, you know, the end of story. And like for people like that, I mean, that's cool. You know, they're not gonna use Amplitude. They may not like us very much. You know, that is what it is. And then on the other end of the spectrum is like, we're gonna track you across the entire internet and sell your data to everyone. And like, that's obviously bad. And like, there's lots of good reasons to think that's bad. First party behavioral data, I think is actually probably almost as far. Fully anonymized first party behavior data would be like kind of the minimum. It's like web server logs with no IP, no identifier, nothing. The problem is that you can't do a lot of interesting behavioral analysis without that. You can't tell if, you know, this person that came on this day was the same one that purchased later. And so like, you can't actually, it's much harder to make your product better if you don't have that. And so, you know, we're kind of set at this place where we have, you know, like pseudo anonymized first party data. And like, we don't sell the data. You don't mix data from, you know, different places on the internet through Facebook cookies or things like that. And, you know, our philosophy is like, that is actually the most important data to build a better product. It's not the most important data to advertise, which is why Facebook and Google do what they do, but it's the most important data to build a better products. And it kind of strikes the right balance between yeah, totally tracking everything that you're doing and like not having any information to make your product better. [00:33:19]Swyx: Yeah, cool. And I think we're going to go to audience questions. So let's start warming them up soon. But I think we have some lightning round questions [00:33:29]Joe: The audience is thinking of questions while we go. [00:33:31]Alessio: The first one is, what's something that already happened in AI that you thought would take much longer to be here? [00:33:39] Jeffrey: I don't know what the constraints on our lightning round, but I think maybe creativity is the best word where it's, you know, with the image generation stuff, text generation, you know, one thing that still blows my mind, I used to be a competitive like math guy and like there's this international math Olympiad problem in one of the papers and it solves it. And I'm just like, wow, I can solve this when I was spending all my life doing this thing. Like that level of creativity really blew my mind. And what's the takeaway? It's like maybe the takeaway is that creativity is not as, you know, as not as high entropy or high dimensional as we think it is, which is kind of interesting takeaway. But yeah, that one definitely surprised me. [00:34:21]Joe: I guess there's something actually that maybe answering the inverse question that a lot of my friends were surprised happened quickly. And I was like, this is just braindead obvious. I've got a lot of friends in the AI safety space. So they're worried that in particular, X-risk, right, extinction risk, that AI is going to kill the human race. And they were like, oh no, what if an AI escapes containment and gets access to the internet? And then we get an LLM and the first thing we do is like, hey, also GPT, here's the internet. [00:34:48]Swyx: So you thought, it's happening faster than you thought. [00:34:53]Joe: Well, it's happening faster than, to me it makes sense, because I'm like one of the guys connecting it to the internet. And I'm like, I'm surprised that other people were surprised it was going to be so fast. [00:35:01]Swyx: Yeah, so a bit of context, Joe and I, we've been adjacent to the EA community and they have like smoothly migrated to the X-risk community very quickly after SBF. [00:35:13]Joe: Yeah, after SBF, yeah, that was fun. [00:35:16]Swyx: Okay, so next question, exploration. What do you think is the most interesting unsolved question in AI? What's next? [00:35:30]Joe: I guess like, is it going to keep getting better at the same rate? Is it going to, and that's just a super important question that's going to change. Like, depending on that answer, 50 startups are going to pivot or not pivot, right? [00:35:43]Swyx: Which is what's next, literally. [00:35:45]Joe: Literally, what's next? Like in a year's time, are the models similarly better than they have been so far? Or are we about to taper off or are we about to continue going linearly? [00:35:58]Jeffrey: Yeah, I'll throw one out that is not necessarily about AI, but like, what's intelligence, right? And if you ask people 20, 30 years ago, maybe even longer now, it's like, yeah, chess. Chess is intelligence. And then chess got solved and like, ah, that's just brute force. And it's like, well, you know, creating creative images and writing, that's intelligence. Well, it's like, that's solved too. Maybe it's just, you know, if you have enough parameters, you can capture that. So like, what is intelligence? What does it mean to have an AGI? What does that actually mean? And then what the implications that are on for our understanding of humans and our brains. I've always thought that, you know, everyone is just a stochastic machine. And so, you know, is everything consistent in my mind? Swyx: Free will and illusion. Exactly. [00:36:43]Joe: I guess maybe like the scaling piece is like that intelligence as you scale is gets more and more expensive on the traditional stuff. But then there's something I think I saw yesterday on Hacker News. It was people actually getting a brain to play tic-tac-toe. Like by a brain, I mean, stem cells grown into brain tissue. And they were able to train it. And like that to me is very significant because suddenly the like metal computers limitations is not applied. And then now we've got all this intelligence. What is intelligence stuff on a squishy wet computer? That makes it even harder to ask and even harder to draw lines. [00:37:18]Swyx: Yeah. Yeah. So famously, you know, language models are so much more inefficient than wet computers, as you say. And so if you can merge that, you know, the human brain runs on 30 Watts of power as it is my favorite fact. We're not anywhere close to that yet. [00:37:36]Alessio: Before we get into Q&A, one last takeaway that you want everybody to think about. [00:37:41]Jeffrey: Yeah, I'll do the one that we actually repeat in Inside Amplitude very often, not about AI, but I think it applies, which is it's early. It's sometimes hard to realize that when things are happening so fast, especially in the Bay Area, but like the ramifications of AI or in our case, product data and all that are gonna play out over the next many decades. And that's just, you know, we're very fortunate to be at the beginning of it. And so yeah, take advantage of it and keep reminding yourself that it's early. [00:38:15]Joe: I guess mine would be, let humans be good at doing human things. Let machines be good at doing machine things and let machines be good at doing machine things and help humans be good at doing human things. And like, if you don't do that, then you're gonna be building something that's either not useful or it's very scary. So yeah, get machines helping humans, not the other way around. [00:38:39]Swyx: Get machines helping humans. All right. With that, I think we're all gonna open up to questions. We're gonna toss you the mic. [00:38:45]Audience #1: Yeah, hey, thanks for the insight into how you guys implemented your AI, you know, question asking chatbot and how have you converted into seven sub queries and then generate the data out. I've just, I got a peak my interest about how you guys exactly do it. Like Alessio asked, like, what exactly is the model that you guys are using? Are you converting it into your, what are these queries that you generate from a single English language? Is it possible to go a little deeper just from a curiosity perspective? [00:46:34]Joe: So we have a custom query engine. So it's not SQL or anything that we're generating. We're generating a custom query output. So I guess the types of questions range. So things like chart type, are we doing a segmentation chart, a line chart or are we doing a funnel chart? You know, the number goes down over time or up over time or between a conversion between two events and there are various other types or metrics or, and then there's also the name. What should we name this chart that answers this question? So the way that's implemented in practice, you could use something like Lang chain to sort of chain these things together. But in our experience, I think Lang chain's a great tool for certain things and definitely really great for prototyping, but we found it quite restrictive. So we've ended up building sort of an internal, it's a very, very small wrapper, internal, we use TypeScript as well, framework that allows us to basically just write in code and infer within what we call a transaction, an inference transaction, which gets monitored as one, but then also all the individual inferences within it get monitored. So it's a bit like when you're writing a database transaction with most sort of, at least in the node ecosystem, the JavaScript ecosystem, where you sort of get a transaction object that you can operate on, and then you return your, or you return, you sort of commit your transaction. So we've got an interface like that, so we can just write pure TypeScript, await this response or await these responses. And then we've got a switch case. If it's a segmentation chart, go and do these with these queries. And then each of those inferences can be a different model. So we think in the future, maybe we have one query where we have some GPT-4 responses. We want some text responses. Maybe we also want to generate an image from that same query together, and then that gets bundled. So I don't know if that answers your question. Audience #1: Yeah, I think so. Yeah, thank you. I think so. You said in future, you're going to use GPT-4. What are you using right now for? [00:48:33]Joe: Right now, everything's GPT-3.5. We're moving around, and I think probably for some of the prompts, we'll use something like DaVinci. Some we might use GPT-4. Some we'll be using internal ones. And we also want to be able to degrade gracefully if a customer has told us they don't want us to send anything to OpenAI, then we can degrade to some internal models that maybe are some of the open source models that have been trained on smaller datasets. [00:48:57] Audience #1: Gotcha, makes sense. Thank you. [00:48:58]Jeffrey: Yeah, I think to add to that a little bit, the key is breaking down the problem sufficiently, because if you break down the problem enough, you can also provide it with some examples, which is super helpful, right? You know, GPT is quite good at zero shot, but within the context of our specific domain, it doesn't know what's going on. And so being able to break down the problem to, hey, select the type of chart. Don't generate me an entire chart definition. Select me the type of chart, and then select me the specific metric based on their query, and then giving it some examples. Select me the events and properties that I want to look at. By breaking it down and having very, very contextual prompts with respect to those examples, you get a lot higher quality output than trying to generate, like, you know, if you imagine generate, like, hey, generate me a whole SQL query with all, you know, here's like the schema of all my tables, now generate it entirely. It's like, it actually struggles with stuff like that, because it's just like kind of too much information and computation to come out of language. Now, maybe GPT-5 will be different, but like, that's the state of the art today. [00:49:57]Swyx: I'll ask a follow-up to Joe. So you mentioned, you mentioned trying LangChain, but not needing it for production. Any other comments on tooling that are out there that's interesting to you? Do you use a embedding database, for example, or do you just use a regular database? [00:50:18]Joe: Yeah, so we've actually been running embedding sort of similarity or vector search in production for multiple months, maybe even almost a year, and just like straight up Postgres, but now we're using PG Vector, which actually Jeffrey could probably speak more to about that decision and what that was like. [00:50:40]Swyx: So this is a pretty hot take. At Amplitude scale, all you need is Postgres? [00:50:46]Joe: We'd use many things other than Postgres. But I mean, we, this isn't rolled out for all customers and it's not necessarily getting sort of hit with a lot of traffic. And so the scale here is very different. Our usage scale is very different to our ingestion. [00:51:04]Swyx: Yeah, yeah, yeah. [00:51:06]Jeffrey: Just to clarify that a little bit more, we're not putting individual end user vectors or end event vectors. We're putting in taxonomies. So if I'm DoorDash, my taxonomy is add to cart, checkout, purchase, browse. That's the cardinality. And so that's actually small. It's on the order of tens of millions. And so yeah, you use stuff that in Postgres, no problem. Now, when we talk about large behavioral models or like actually embedding events, there are many, many trillions of those. And yeah, Postgres probably doesn't work there. [00:51:41]Swyx: Yeah, actually I wanted to comment on this slightly before, which is separating taxonomies from the actual data is one way you protect your customers against prompt injection. It's something that Simon Willison has been talking about where you want to have like query for one thing, but essentially no knowledge of the actual underlying data, just the taxonomy. So it's good practice. [00:52:00]Audience #2: Yeah, so you talked about a model which would be trained on user behavior data like amplitude GPT. It really piqued my interest and what capabilities would emerge? What do you think that you would find and what would be the first thing you would ask the model? That's a good question. [00:52:23]Jeffrey: So we've thought about this a little bit and I think the, right, these are sequence, token prediction models. And so at the very least, I would hope for a much better, we have a predictions feature right now, which says, hey, given what a user has done over the last 90 days, do we think they're gonna belong to this cohort in the future or not? So that cohort might be people who churn, people who purchase, people who upsell, whatever the customer wants. We think it would be much better at tasks like that, right, because if it just has a very good understanding of behavioral patterns and what's gonna come next, it would be able to do that. That's exciting, but not that exciting. If I'm trying to think about like the analogies to what we see in LLMs, it's like, okay, yeah, what is the behavioral equivalent of like learning physics concepts, right? It's like, oh, I don't actually know, but it might be this understanding of patterns of sessions and how that like, for example, categorizing users in a unsupervised way seems like a very simple output for a model that understands user behavior, right? Here's all the users and if you wanna discriminate them by their ability to achieve some outcome in the future, like here's the best way to separate that group and here's why, right? Be able to explain at that level and that would be super powerful for customers, right? A lot of times what our customers do is, hey, these people came back the next day and these people didn't, why? What was different about them? And so we have a bunch of heuristics to do that, but at the end, there's something like, causal impact is like one of the holy grails of product analytics. It's like, what was the causation behind some observed difference in behavior? And I think, yeah, a large behavioral model will be much better at assessing that and be able to give you potentially interpretable ways of answering that question that are like really hard to do, really hard, really computationally intensive, really like noisy, distilling causation correlation is obviously super hard. Those are some of the examples. The other one that I am, I don't know if I'm optimistic about it, but we really interesting is, one of the things that amplitude requires today is manual instrumentation, right? You have to decide, hey, this clicking of a button, this viewing of page, these are important things. I'm naming them in this way. There's a lot of popular tools out there that kind of just record user sessions or like track DOM events automatically. There's a lot of problems with those tools because the data is incredibly noisy. It's just so noisy, right? A lot of times you just can't actually interpret it. And so it's like, oh, it's great because I don't need to do any work. But like, well, you also don't get anything out of it. It's possible that a behavioral model would be able to actually understand what's going on there by understanding your user behavior in a correctly modeled and correctly labeled sense, and then figuring out. I don't know if that's possible. I think that would make everyone's lives a lot easier if you could somehow ask behavioral questions of data without having to instrument. All of our customers would love that, but also all of them are instrumenting because they know that's definitely not possible today. [00:55:26]Audience #2: This is really interesting. You're looking forward to the future. If you're gonna build it, it's gonna be amazing, yeah. [00:55:31]Jeffrey: That's the goal, that's the goal. [00:55:33]Audience #2: Awesome. [00:55:34]Swyx: Thanks for listening. [00:56:09] Get full access to Latent Space at www.latent.space/subscribe
49:2908/06/2023
Building the AI × UX Scenius — with Linus Lee of Notion AI
Read: https://www.latent.space/p/ai-interfaces-and-notionShow Notes* Linus on Twitter* Linus’ personal blog* Notion* Notion AI* Notion Projects* AI UX Meetup RecapTimestamps* [00:03:30] Starting the AI / UX community* [00:10:01] Most knowledge work is not text generation* [00:16:21] Finding the right constraints and interface for AI* [00:19:06] Linus' journey to working at Notion* [00:23:29] The importance of notations and interfaces* [00:26:07] Setting interface defaults and standards* [00:32:36] The challenges of designing AI agents* [00:39:43] Notion deep dive: “Blocks”, AI, and more* [00:51:00] Prompt engineering at Notion* [01:02:00] Lightning RoundTranscriptAlessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my co-host Swyx, writer and editor of Latent Space. [00:00:20]Swyx: And today we're not in our regular studio. We're actually at the Notion New York headquarters. Thanks to Linus. Welcome. [00:00:28]Linus: Thank you. Thanks for having me. [00:00:29]Swyx: Thanks for having us in your beautiful office. It is actually very startling how gorgeous the Notion offices are. And it's basically the same aesthetic. [00:00:38]Linus: It's a very consistent aesthetic. It's the same aesthetic in San Francisco and the other offices. It's been for many, many years. [00:00:46]Swyx: You take a lot of craft in everything that you guys do. Yeah. [00:00:50]Linus: I think we can, I'm sure, talk about this more later, but there is a consistent kind of focus on taste that I think flows down from Ivan and the founders into the product. [00:00:59]Swyx: So I'll introduce you a little bit, but also there's just, you're a very hard person to introduce because you do a lot of things. You got your BA in computer science at Berkeley. Even while you're at Berkeley, you're involved in a bunch of interesting things at Replit, CatalystX, Hack Club and Dorm Room Fund. I always love seeing people come out of Dorm Room Fund because they tend to be a very entrepreneurial. You're a product engineer at IdeaFlow, residence at Betaworks. You took a year off to do independent research and then you've finally found your home at Notion. What's one thing that people should know about you that's not on your typical LinkedIn profile? [00:01:39]Linus: Putting me on the spot. I think, I mean, just because I have so much work kind of out there, I feel like professionally, at least, anything that you would want to know about me, you can probably dig up, but I'm a big city person, but I don't come from the city. I went to school, I grew up in Indiana, in the middle of nowhere, near Purdue University, a little suburb. I only came out to the Bay for school and then I moved to New York afterwards, which is where I'm currently. I'm in Notion, New York. But I still carry within me a kind of love and affection for small town, Indiana, small town, flyover country. [00:02:10]Swyx: We do have a bit of indulgence in this. I'm from a small country and I think Alessio, you also kind of identified with this a little bit. Is there anything that people should know about Purdue, apart from the chickens? [00:02:24]Linus: Purdue has one of the largest international student populations in the country, which I don't know. I don't know exactly why, but because it's a state school, the focus is a lot on STEM topics. Purdue is well known for engineering and so we tend to have a lot of folks from abroad, which is particularly rare for a university in, I don't know, that's kind of like predominantly white American and kind of Midwestern state. That makes Purdue and the surrounding sort of area kind of like a younger, more diverse international island within the, I guess, broader world that is Indiana. [00:02:58]Swyx: Fair enough. We can always dive into sort of flyover country or, you know, small town insights later, but you and I, all three of us actually recently connected at AIUX SF, which is the first AIUX meetup, essentially which just came out of like a Twitter conversation. You and I have been involved in HCI Twitter is kind of how I think about it for a little bit and when I saw that you were in town, Geoffrey Litt was in town, Maggie Appleton in town, all on the same date, I was like, we have to have a meetup and that's how this thing was born. Well, what did it look like from your end? [00:03:30]Linus: From my end, it looked like you did all of the work and I... [00:03:33]Swyx: Well, you got us the Notion. Yeah, yeah. [00:03:36]Linus: It was also in the Notion office, it was in the San Francisco one and then thereafter there was a New York one that I decided I couldn't make. But yeah, from my end it was, and I'm sure you were too, but I was really surprised by both the mixture of people that we ended up getting and the number of people that we ended up getting. There was just a lot of attention on, obviously there was a lot of attention on the technology itself of GPT and language models and so on, but I was surprised by the interest specifically on trying to come up with interfaces that were outside of the box and the people that were interested in that topic. And so we ended up having a packed house and lots of interesting demos. I've heard multiple people comment on the event afterwards that they were positively surprised by the mixture of both the ML, AI-focused people at the event as well as the interface HCI-focused people. [00:04:24]Swyx: Yeah. I kind of see you as one of the leading, I guess, AI UX people, so I hope that we are maybe starting a new discipline, maybe. [00:04:33]Linus: Yeah, I mean, there is this kind of growing contingency of people interested in exploring the intersection of those things, so I'm excited for where that's going to go. [00:04:41]Swyx: I don't know if it's worth going through favorite demos. It was a little while ago, so I don't know if... [00:04:48]Alessio: There was, I forget who made it, but there was this new document writing tool where you could apply brushes to different paragraphs. [00:04:56]Linus: Oh, this was Amelia's. Yeah, yeah, yeah. [00:04:58]Alessio: You could set a tone, both in terms of writer inspiration and then a tone that you wanted, and then you could drag and drop different tones into paragraphs and have the model rewrite them. It was the first time that it's not just auto-complete, there's more to it. And it's not asked in a prompt, it's this funny drag-an-emoji over it. [00:05:20]Linus: Right. [00:05:21]Swyx: I actually thought that you had done some kind of demo where you could select text and then augment it in different moods, but maybe it wasn't you, maybe it was just someone else [00:05:28]Linus: I had done something similar, with slightly different building blocks. I think Amelia's demo was, there was sort of a preset palette of brushes and you apply them to text. I had built something related last year, I prototyped a way to give people sliders for different semantic attributes of text. And so you could start with a sentence, and you had a slider for length and a slider for how philosophical the text is, and a slider for how positive or negative the sentiment in the text is, and you could adjust any of them in the language model, reproduce the text. Yeah, similar, but continuous control versus distinct brushes, I think is an interesting distinction there. [00:06:03]Swyx: I should add it for listeners, if you missed the meetup, which most people will have not seen it, we actually did a separate post with timestamps of each video, so you can look at that. [00:06:13]Alessio: Sorry, Linus, this is unrelated, but I think you build over a hundred side projects or something like that. A hundred? [00:06:20]Swyx: I think there's a lot of people... I know it's a hundred. [00:06:22]Alessio: I think it's a lot of them. [00:06:23]Swyx: A lot of them are kind of small. [00:06:25]Alessio: Yeah, well, I mean, it still counts. I think there's a lot of people that are excited about the technology and want to hack on things. Do you have any tips on how to box, what you want to build, how do you decide what goes into it? Because all of these things, you could build so many more things on top of it. Where do you decide when you're done? [00:06:44]Linus: So my projects actually tend to be... I think especially when people approach project building with a goal of learning, I think a common mistake is to be over-ambitious and sort of not scope things very tightly. And so a classic kind of failure mode is, you say, I'm really interested in learning how to use the GPT-4 API, and I'm also interested in vector databases, and I'm also interested in Next.js. And then you devise a project that's going to take many weeks, and you glue all these things together. And it could be a really cool idea, but then especially if you have a day job and other things that life throws you away, it's hard to actually get to a point where you can ship something. And so one of the things that I got really good at was saying, one, knowing exactly how quickly I could work, at least on the technologies that I knew well, and then only adding one new unknown thing to learn per project. So it may be that for this project, I'm going to learn how the embedding API works. Or for this project, I'm going to learn how to do vector stuff with PyTorch or something. And then I would scope things so that it fit in one chunk of time, like Friday night to Sunday night or something like that. And then I would scope the project so that I could ship something as much work as I could fit into a two-day period, so that at the end of that weekend, I could ship something. And then afterwards, if I want to add something, I have time to do it and a chance to do that. But it's already shipped, so there's already momentum, and people are using it, or I'm using it, and so there's a reason to continue building. So only adding one new unknown per project, I think, is a good trick. [00:08:14]Swyx: I first came across you, I think, because of Monocle, which is your personal search engine. And I got very excited about it, because I always wanted a personal search engine, until I found that it was in a language that I've never seen before. [00:08:25]Linus: Yeah, there's a towel tower of little tools and technologies that I built for myself. One of the other tricks to being really productive when you're building side projects is just to use a consistent set of tools that you know really, really well. For me, that's Go, and my language, and a couple other libraries that I've written that I know all the way down to the bottom of the stack. And then I barely have to look anything up, because I've just debugged every possible issue that could come up. And so I could get from start to finish without getting stuck in a weird bug that I've never seen before. But yeah, it's a weird stack. [00:08:58]Swyx: It also means that you probably are not aiming for, let's say, open source glory, or whatever. Because you're not publishing in the JavaScript ecosystem. Right, right. [00:09:06]Linus: I mean, I've written some libraries before, but a lot of my projects tend to be like, the way that I approach it is less about building something that other people are going to use en masse. And make yourself happy. Yeah, more about like, here's the thing that I built, if you want to, and often I learn something in the process of building that thing. So like with Monocle, I wrote a custom sort of full text search index. And I thought a lot of the parts of what I built was interesting. And so I just wanted other people to be able to look at it and see how it works and understand it. But the goal isn't necessarily for you to be able to replicate it and run it on your own. [00:09:36]Swyx: Well, we can kind of dive into your other AIUX thoughts. As you've been diving in, you tend to share a lot on Twitter. And I just kind of took out some of your greatest hits. This is relevant to the demo that you picked out, Alessio. And what we're talking about, which is, most knowledge work is not a text generation task. That's funny, because a lot of what Notion AI is, is text generation right now. Maybe you want to elaborate a little bit. Yeah. [00:10:01]Linus: I think the first time you look at something like GPT, the shape of the thing you see is like, oh, it's a thing that takes some input text and generates some output text. And so the easiest thing to build on top of that is a content generation tool. But I think there's a couple of other categories of things that you could build that are sort of progressively more useful and more interesting. And so besides content generation, which requires the minimum amount of wrapping around ChatGPT, the second tier up from that is things around knowledge, I think. So if you have, I mean, this is the hot thing with all these vector databases things going around. But if you have a lot of existing context around some knowledge about your company or about a field or all of the internet, you can use a language model as a way to search and understand things in it and combine and synthesize them. And that synthesis, I think, is useful. And at that point, I think the value that that unlocks, I think, is much greater than the value of content generation. Because most knowledge work, the artifact that you produce isn't actually about writing more words. Most knowledge work, the goal is to understand something, synthesize new things, or propose actions or other kinds of knowledge-to-knowledge tasks. And then the third category, I think, is automation. Which I think is sort of the thing that people are looking at most actively today, at least from my vantage point in the ecosystem. Things like the React prompting technique, and just in general, letting models propose actions or write code to accomplish tasks. That's also moving far beyond generating text to doing something more interesting. So much of the value of what humans sit down and do at work isn't actually in the words that they write. It's all the thinking that goes on before you write those words. So how can you get language models to contribute to those parts of work? [00:11:43]Alessio: I think when you first tweeted about this, I don't know if you already accepted the job, but you tweeted about this, and then the next one was like, this is a NotionAI subtweet. [00:11:53]Swyx: So I didn't realize that. [00:11:56]Alessio: The best thing that I see is when people complain, and then they're like, okay, I'm going to go and help make the thing better. So what are some of the things that you've been thinking about? I know you talked a lot about some of the flexibility versus intuitiveness of the product. The language is really flexible, because you can say anything. And it's funny, the models never ignore you. They always respond with something. So no matter what you write, something is going to come back. Sometimes you don't know how big the space of action is, how many things you can do. So as a product builder, how do you think about the trade-offs that you're willing to take for your users? Where like, okay, I'm not going to let you be as flexible, but I'm going to create this guardrails for you. What's the process to think about the guardrails, and how you want to funnel them to the right action? [00:12:46]Linus: Yeah, I think what this trade-off you mentioned around flexibility versus intuitiveness, I think, gets at one of the core design challenges for building products on top of language models. A lot of good interface design comes from tastefully adding the right constraints in place to guide the user towards actions that you want to take. As you add more guardrails, the obvious actions become more obvious. And one common way to make an interface more intuitive is to narrow the space of choices that the users have to make, and the number of choices that they have to make. And that intuitiveness, that source of intuitiveness from adding constraints, is kind of directly at odds with the reason that language models are so powerful and interesting, which is that they're so flexible and so general, and you can ask them to do literally anything, and they will always give you something. But most of the time, the answer isn't that high quality. And so there's kind of a distribution of, like, there are clumps of things in the action space of what a language model can do that the model's good at, and there's parts of the space where it's bad at. And so one sort of high-level framework that I have for thinking about designing with language models is, there are actions that the language model's good at, and actions that it's bad at. How do you add the right constraints carefully to guide the user and the system towards the things that the language model's good at? And then at the same time, how do you use those constraints to set the user expectations for what it's going to be good at and bad at? One way to do this is just literally to add those constraints and to set expectations. So a common example I use all the time is, if you have some AI system to answer questions from a knowledge base, there are a couple of different ways to surface that in a kind of a hypothetical product. One is, you could have a thing that looks like a chat window in a messaging app, and then you could tell the user, hey, this is for looking things up from a database. You can ask a question, then it'll look things up and give you an answer. But if something looks like a chat, and this is a lesson that's been learned over and over for anyone building chat interfaces since, like, 2014, 15, if you have anything that looks like a chat interface or a messaging app, people are going to put some, like, weird stuff in there that just don't look like the thing that you want the model to take in, because the expectation is, hey, I can use this like a messaging app, and people will send in, like, hi, hello, you know, weird questions, weird comments. Whereas if you take that same, literally the same input box, and put it in, like, a thing that looks like a search bar with, like, a search button, people are going to treat it more like a search window. And at that point, inputs look a lot more like keywords or a list of keywords or maybe questions. So the simple act of, like, contextualizing that input in different parts of an interface reset the user's expectations, which constrain the space of things that the model has to handle. And that you're kind of adding constraints, because you're really restricting your input to mostly things that look like keyword search. But because of that constraint, you can have the model fit the expectations better. You can tune the model to perform better in those settings. And it's also less confusing and perhaps more intuitive, because the user isn't stuck with this blank page syndrome problem of, okay, here's an input. What do I actually do with it? When we initially launched Notion AI, one of my common takeaways, personally, from talking to a lot of my friends who had tried it, obviously, there were a lot of people who were getting lots of value out of using it to automate writing emails or writing marketing copy. There were a ton of people who were using it to, like, write Instagram ads and then sort of paste it into the Instagram tool. But some of my friends who had tried it and did not use it as much, a frequently cited reason was, I tried it. It was cool. It was cool for the things that Notion AI was marketed for. But for my particular use case, I had a hard time figuring out exactly the way it was useful for my workflow. And I think that gets back at the problem of, it's such a general tool that just presented with a blank prompt box, it's hard to know exactly the way it could be useful to your particular use case. [00:16:21]Alessio: What do you think is the relationship between novelty and flexibility? I feel like we're in kind of like a prompting honeymoon phase where the tools are new and then everybody just wants to do whatever they want to do. And so it's good to give these interfaces because people can explore. But if I go forward in three years, ideally, I'm not prompting anything. The UX has been built for most products to already have the intuitive, kind of like a happy path built into it. Do you think there's merit in a way? If you think about ChatGPT, if it was limited, the reason why it got so viral is people were doing things that they didn't think a computer could do, like write poems and solve riddles and all these different things. How do you think about that, especially in Notion, where Notion AI is kind of like a new product in an existing thing? How much of it for you is letting that happen and seeing how people use it? And then at some point be like, okay, we know what people want to do. The flexibility is not, it was cool before, but now we just want you to do the right things with the right UX. [00:17:27]Linus: I think there's value in always having the most general input as an escape hatch for people who want to take advantage of that power. At this point, Notion AI has a couple of different manifestations in the product. There's the writer. There's a thing we called an AI block, which is a thing that you can always sort of re-update as a part of document. It's like a live, a little portal inside the document that an AI can write. We also have a relatively new thing called AI autofill, which lets an AI fill an entire column in a Notion database. In all of these things, speaking of adding constraints, we have a lot of suggested prompts that we've worked on and we've curated and we think work pretty well for things like summarization and writing drafts to blog posts and things. But we always leave a fully custom prompt for a few reasons. One is if you are actually a power user and you know how language models work, you can go in and write your custom prompt and if you're a power user, you want access to the power. The other is for us to be able to discover new use cases. And so one of the lovely things about working on a product like Notion is that there's such an enthusiastic and lively kind of community of ambassadors and people that are excited about trying different things and coming up with all these templates and new use cases. And having a fully custom action or prompt whenever we launch something new in AI lets those people really experiment and help us discover new ways to take advantage of AI. I think it's good in that way. There's also a sort of complement to that, which is if we wanted to use feedback data or learn from those things and help improve the way that we are prompting the model or the models that we're building, having access to that like fully diverse, fully general range of use cases helps us make sure that our models can handle the full generality of what people want to do. [00:19:06]Swyx: I feel like we've segway’d a lot into our Notion conversation and maybe I just wanted to bridge that a little bit with your personal journey into Notion before we go into Notion proper. You spent a year kind of on a sabbatical, kind of on your own self-guided research journey and then deciding to join Notion. I think a lot of engineers out there thinking about doing this maybe don't have the internal compass that you have or don't have the guts to basically make no money for a year. Maybe just share with people how you decided to basically go on your own independent journey and what got you to join Notion in the end. [00:19:42]Linus: Yeah, what happened? Um, yeah, so for a little bit of context for people who don't know me, I was working mostly at sort of seed stage startups as a web engineer. I actually didn't really do much AI at all for prior to my year off. And then I took all of 2022 off with less of a focus on it ended up sort of in retrospect becoming like a Linus Pivots to AI year, which was like beautifully well timed. But in the beginning of the year, there was kind of a one key motivation and then one key kind of question that I had. The motivation was that I think I was at a sort of a privileged and fortunate enough place where I felt like I had some money saved up that I had saved up explicitly to be able to take some time off and investigate my own kind of questions because I was already working on lots of side projects and I wanted to spend more time on it. I think I also at that point felt like I had enough security in the companies and folks that I knew that if I really needed a job on a short notice, I could go and I could find some work to do. So I wouldn't be completely on the streets. And so that security, I think, gave me the confidence to say, OK, let's try this kind of experiment.[00:20:52]Maybe it'll only be for six months. Maybe it'll be for a year. I had enough money saved up to last like a year and change. And so I had planned for a year off and I had one sort of big question that I wanted to explore. Having that single question, I think, actually was really helpful for focusing the effort instead of just being like, I'm going to side project for a year, which I think would have been less productive. And that big question was, how do we evolve text interfaces forward? So, so much of knowledge work is consuming walls of text and then producing more walls of text. And text is so ubiquitous, not just in software, but just in general in the world. They're like signages and menus and books. And it's ubiquitous, but it's not very ergonomic. There's a lot of things about text interfaces that could be better. And so I wanted to explore how we could make that better. A key part of that ended up being, as I discovered, taking advantage of this new technologies that let computers make sense of text information. And so that's how I ended up sort of sliding into AI. But the motivation in the beginning was less focused on learning a new technology and more just on exploring this general question space. [00:21:53]Swyx: Yeah. You have the quote, text is the lowest denominator, not the end game. Right, right. [00:21:58]Linus: I mean, I think if you look at any specific domain or discipline, whether it's medicine or mathematics or software engineering, in any specific discipline where there's a narrower set of abstractions for people to work with, there are custom notations. One of the first things that I wrote in this exploration year was this piece called Notational Intelligence, where I talk about this idea that so much of, as a total sidebar, there's a whole other fascinating conversation that I would love to have at some point, maybe today, maybe later, about how to evolve a budding scene of research into a fully-fledged field. So I think AI UX is kind of in this weird stage where there's a group of interesting people that are interested in exploring this space of how do you design for this newfangled technology, and how do you take that and go and build best practices and powerful methods and tools [00:22:48]Swyx: We should talk about that at some point. [00:22:49]Linus: OK. But in a lot of established fields, there are notations that people use that really help them work at a slightly higher level than just raw words. So notations for describing chemicals and notations for different areas of mathematics that let people work with higher-level concepts more easily. Logic, linguistics. [00:23:07]Swyx: Yeah. [00:23:07]Linus: And I think it's fair to say that some large part of human intelligence, especially in these more technical domains, comes from our ability to work with notations instead of work with just the raw ideas in our heads. And text is a kind of notation. It's the most general kind of notation, but it's also, because of its generality, not super high leverage if you want to go into these specific domains. And so I wanted to try to improve on that frontier. [00:23:29]Swyx: Yeah. You said in our show notes, one of my goals over the next few years is to ensure that we end up with interface metaphors and technical conventions that set us up for the best possible timeline for creativity and inventions ahead. So part of that is constraints. But I feel like that is one part of the equation, right? What's the other part that is more engenders creativity? [00:23:47]Linus: Tell me a little bit about that and what you're thinking there. [00:23:51]Swyx: It's just, I feel like, you know, we talked a little bit about how you do want to constrain, for example, the user interface to guide people towards things that language models are good at. And creative solutions do arise out of constraints. But I feel like that alone is not sufficient for people to invent things. [00:24:10]Linus: I mean, there's a lot of directions, I think, that could go from that. The origin of that thing that you're quoting is when I decided to come help work on AI at Notion, a bunch of my friends were actually quite surprised, I think, because they had expected that I would have gone and worked… [00:24:29]Swyx: You did switch. I was eyeing that for you. [00:24:31]Linus: I mean, I worked at a lab or at my own company or something like that. But one of the core motivations for me joining an existing company and one that has lots of users already is this exact thing where in the aftermath of a new foundational technology emerging, there's kind of a period of a few years where the winners in the market get to decide what the default interface paradigm for the technology is. So, like, mini computers, personal computers, the winners of that market got to decide Windows are and how scrolling works and what a mouse cursor is and how text is edited. Similar with mobile, the concept of a home screen and apps and things like that, the winners of the market got to decide. And that has profound, like, I think it's difficult to understate the importance of, in those few critical years, the winning companies in the market choosing the right abstractions and the right metaphors. And AI, to me, seemed like it's at that pivotal moment where it's a technology that lots of companies are adopting. There is this well-recognized need for interface best practices. And Notion seemed like a company that had this interesting balance of it could still move quickly enough and ship and prototype quickly enough to try interesting interface ideas. But it also had enough presence in the ecosystem that if we came up with the right solution or one that we felt was right, we could push it out and learn from real users and iterate and hopefully be a part of that story of setting the defaults and setting what the dominant patterns are. [00:26:07]Swyx: Yeah, it's a special opportunity. One of my favorite stories or facts is it was like a team of 10 people that designed the original iPhone. And so all the UX that was created there is essentially what we use as smartphones today, including predictive text, because people were finding that people were kind of missing the right letters. So they just enhanced the hit area for certain letters based on what you're typing. [00:26:28]Linus: I mean, even just the idea of like, we should use QWERTY keyboards on tiny smartphone screens. Like that's a weird idea, right? [00:26:36]Swyx: Yeah, QWERTY is another one. So I have RSI. So this actually affects me. QWERTY was specifically chosen to maximize travel distance, right? Like it's actually not ergonomic by design because you wanted the keyboard, the key type writers to not stick. But we don't have that anymore. We're still sticking to QWERTY. I'm still sticking to QWERTY. I could switch to the other ones. I forget. QORAC or QOMAC anytime, but I don't just because of inertia. I have another thing like this. [00:27:02]Linus: So going even farther back, people don't really think enough about where this concept of buttons come from, right? So the concept of a push button as a thing where you press it and it activates some binary switch. I mean, buttons have existed for, like mechanical buttons have existed for a long time. But really, like this modern concept of a button that activates a binary switch really gets like popularized by the popular advent of electricity. Before the electricity, if you had a button that did something, you would have to construct a mechanical system where if you press down on a thing, it affects some other lever system that affects as like the final action. And this modern idea of a button that is just a binary switch gets popularized electricity. And at that point, a button has to work in the way that it does in like an alarm clock, because when you press down on it, there's like a spring that makes sure that the button comes back up and that it completes the circuit. And so that's the way the button works. And then when we started writing graphical interfaces, we just took that idea of a thing that could be depressed to activate a switch. All the modern buttons that we have today in software interfaces are like simulating electronic push buttons where you like press down to complete a circuit, except there's actually no circuit being completed. It's just like a square on a screen. [00:28:11]Swyx: It's all virtualized. Right. [00:28:12]Linus: And then you control the simulation of a button by clicking a physical button on a mouse. Except if you're on a trackpad, it's not even a physical button anymore. It's like a simulated button hardware that controls a simulated button in software. And it's also just this cascade of like conceptual backwards compatibility that gets us here. I think buttons are interesting. [00:28:32]Alessio: Where are you on the skeuomorphic design love-hate spectrum? There's people that have like high nostalgia for like the original, you know, the YouTube icon on the iPhone with like the knobs on the TV. [00:28:42]Linus: I think a big part of that is at least the aesthetic part of it is fashion. Like fashion taken very literally, like in the same way that like the like early like Y2K 90s aesthetic comes and goes. I think skeuomorphism as expressed in like the early iPhone or like Windows XP comes and goes. There's another aspect of this, which is the part of skeuomorphism that helps people understand and intuit software, which has less to do with skeuomorphism making things easier to understand per se and more about like, like a slightly more general version of skeuomorphism is like, there should be a consistent mental model behind an interface that is easy to grok. And then once the user has the mental model, even if it's not the full model of exactly how that system works, there should be a simplified model that the user can easily understand and then sort of like adopt and use. One of my favorite examples of this is how volume controls that are designed well often work. Like on an iPhone, when you make your iPhone volume twice as loud, the sound that comes out isn't actually like at a physical level twice as loud. It's on a log scale. When you push the volume slider up on an iPhone, the speaker uses like four times more energy, but humans perceive it as twice as loud. And so the mental model that we're working with is, okay, if I make this, this volume control slider have two times more value, it's going to sound two times louder, even though actually the underlying physics is like on a log scale. But what actually happens physically is not actually what matters. What matters is how humans perceive it in the model that I have in my head. And there, I think there are a lot of other instances where the skeuomorphism isn't actually the thing. The thing is just that there should be a consistent mental model. And often the easy, consistent mental model to reach for is the models that already exist in reality, but not always. [00:30:23]Alessio: I think the other big topic, maybe before we dive into Notion is agents. I think that's one of the toughest interfaces to crack, mostly because, you know, the text box, everybody understands that the agent is kind of like, it's like human-like feeling, you know, where it's like, okay, I'm kind of delegating something to a human, right? I think, like, Sean, you made the example of like a Calendly, like a savvy Cal, it's like an agent, because it's scheduling on your behalf for something. [00:30:51]Linus: That's actually a really interesting example, because it's a kind of a, it's a pretty deterministic, like there's no real AI to it, but it is agent in the sense that you're like delegating it and automate something. [00:31:01]Swyx: Yeah, it does work without me. It's great. [00:31:03]Alessio: So that one, we figured out. Like, we know what the scheduling interface is like. [00:31:07]Swyx: Well, that's the state of the art now. But, you know, for example, the person I'm corresponding with still has to pick a time from my calendar, which some people dislike. Sam Lesson famously says it's a sign of disrespect. I disagree with him, but, you know, it's a point of view. There could be some intermediate AI agents that would send emails back and forth like a human person to give the other person who feels slighted that sense of respect or a personalized touch that they want. So there's always ways to push it. [00:31:39]Alessio: Yeah, I think for me, you know, other stuff that I think about, so we were doing prep for another episode and had an agent and asked it to do like a, you know, background prep on like the background of the person. And it just couldn't quite get the format that I wanted it to be, you know, but I kept to have the only way to prompt that it's like, give it text, give a text example, give a text example. What do you think, like the interface between human and agents in the future will be like, do you still think agents are like this open ended thing that are like objective driven where you say, Hey, this is what I want to achieve versus I only trust this agent to do X. And like, this is how X is done. I'm curious because that kind of seems like a lot of mental overhead, you know, to remember each agent for each task versus like if you have an executive assistant, like they'll do a random set of tasks and you can trust them because they're a human. But I feel like with agents, we're not quite there. [00:32:36]Swyx: Agents are hard. [00:32:36]Linus: The design space is just so vast. Since all of the like early agent stuff came out around auto GPT, I've tried to develop some kind of a thesis around it. And I think it's just difficult because there's so many variables. One framework that I usually apply to sort of like existing chat based prompting kind of things that I think also applies just as well to agents is this duality between what you might call like trust and control. So you just now you brought up this example of you had an agent try to write some write up some prep document for an episode and it couldn't quite get the format right. And one way you could describe that is you could say, Oh, the, the agent didn't exactly do what I meant and what I had in my head. So I can't trust it to do the right job. But a different way to describe it is I have a hard time controlling exactly the output of the model and I have a hard time communicating exactly what's in my head to the model. And they're kind of two sides of the same coin. I think if you, if you can somehow provide a way to with less effort, communicate and control and constrain the model output a little bit more and constrain the behavior a little bit more, I think that would alleviate the pressure for the model to be this like fully trusted thing because there's no need for trust anymore. There's just kind of guardrails that ensure that the model does the right thing. So developing ways and interfaces for these agents to be a little more constrained in its output or maybe for the human to control its output a little bit more or behavior a little bit more, I think is a productive path. Another sort of more, more recent revelation that I had while working on this and autofill thing inside notion is the importance of zones of influence for AI agents, especially in collaborative settings. So having worked on lots of interfaces for independent work on my year off, one of the surprising lessons that I learned early on when I joined notion was that if you build a collaboration permeates everything, which is great for notion because collaborating with an AI, you reuse a lot of the same metaphors for collaborating with humans. So one nice thing about this autofill thing that also kind of applies to AI blocks, which is another thing that we have, is that you don't alleviate this problem of having to ask questions like, oh, is this document written by an AI or is this written by a human? Like this need for auditability, because the part that's written by the AI is just in like the autofilled cell or in the AI block. And you can, you can tell that's written by the AI and things outside of it, you can kind of reasonably assume that it was written by you. I think anytime you have sort of an unbounded action space for, for models like agents, it's especially important to be able to answer those questions easily and to have some sense of security that in the same way that you want to know whether your like coworker or collaborator has access to a document or has modified a document, you want to know whether an AI has permissions to access something. And if it's modified something or made some edit, you want to know that it did it. And so as a compliment to constraining the model's action space proactively, I think it's also important to communicate, have the user have an easy understanding of like, what exactly did the model do here? And I think that helps build trust as well. [00:35:39]Swyx: Yeah. I think for auto GPT and those kinds of agents in particular, anything that is destructive, you need to prompt for, I guess, or like check with, check in with the user. I know it's overloaded now. I can't say that. You have to confirm with the user. You confirm to the user. Yeah, exactly. Yeah. Yeah. [00:35:56]Linus: That's tough too though, because you, you don't want to stop. [00:35:59]Swyx: Yeah. [00:35:59]Linus: One of the, one of the benefits of automating these things that you can sort of like, in theory, you can scale them out arbitrarily. I can have like a hundred different agents working for me, but if that means I'm just spending my entire day in a deluge of notifications, that's not ideal either. [00:36:12]Swyx: Yeah. So then it could be like a reversible, destructive thing with some kind of timeouts, a time limit. So you could reverse it within some window. I don't know. Yeah. I've been thinking about this a little bit because I've been working on a small developer agent. Right. Right. [00:36:27]Linus: Or maybe you could like batch a group of changes and can sort of like summarize them with another AI and improve them in bulk or something. [00:36:33]Swyx: Which is surprisingly similar to the collaboration problem. Yeah. Yeah. Yeah. Exactly. Yeah. [00:36:39]Linus: I'm telling you, the collaboration, a lot of the problems with collaborating with humans also apply to collaborating with AI. There's a potential pitfall to that as well, which is that there are a lot of things that some of the core advantages of AI end up missing out on if you just fully anthropomorphize them into like human-like collaborators. [00:36:56]Swyx: But yeah. Do you have a strong opinion on that? Like, do you refer to it as it? Oh yeah. [00:37:00]Linus: I'm an it person, at least for now, in 2023. Yeah. [00:37:05]Swyx: So that leads us nicely into introducing what Notion and Notion AI is today. Do you have a pet answer as to what is Notion? I've heard it introduced as a database, a WordPress killer, a knowledge base, a collaboration tool. What is it? Yeah. [00:37:19]Linus: I mean, the official answer is that a Notion is a connected workspace. It has a space for your company docs, meeting notes, a wiki for all of your company notes. You can also use it to orchestrate your workflows if you're managing a project, if you have an engineering team, if you have a sales team. You can put all of those in a single Notion database. And the benefit of Notion is that all of them live in a single space where you can link to your wiki pages from your, I don't know, like onboarding docs. Or you can link to a GitHub issue through a task from your documentation on your engineering system. And all of this existing in a single place in this kind of like unified, yeah, like single workspace, I think has lots of benefits. [00:37:58]Swyx: That's the official line. [00:37:59]Linus: There's an asterisk that I usually enjoy diving deeper into, which is that the whole reason that this connected workspace is possible is because underlying all of this is this really cool abstraction of blocks. In Notion, everything is a block. A paragraph is a block. A bullet point is a block. But also a page is a block. And the way that Notion databases work is that a database is just a collection of pages, which are really blocks. And you can like take a paragraph and drag it into a database and it'll become a page. You can take a page inside a database and pull it out and it'll just become a link to that page. And so this core abstraction of a block that can also be a page, that can also be a row in a database, like an Excel sheet, that fluidity and this like shared abstraction across all these different areas inside Notion, I think is what really makes Notion powerful. This Lego theme, this like Lego building block theme permeates a lot of different parts of Notion. Some fans of Notion might know that when you, or when you join Notion, you get a little Lego minifigure, which has Lego building blocks for workflows. And then every year you're at Notion, you get a new block that says like you've been here for a year, you've been here for two years. And then Simon, our co-founder and CTO, has a whole crate of Lego blocks on his desk that he just likes to mess with because, you know, he's been around for a long time. But this Lego building block thing, this like shared sort of all-encompassing single abstraction that you can combine to build various different kinds of workflows, I think is really what makes Notion powerful. And one of the sort of background questions that I have for Notion AI is like, what is that kind of building block for AI? [00:39:30]Swyx: Well, we can dive into that. So what is Notion AI? Like, so I kind of view it as like a startup within the startup. Could you describe the Notion AI team? Is this like, how seriously is Notion taking the AI wave? [00:39:43]Linus: The most seriously? The way that Notion AI came about, as I understand it, because I joined a bit later, I think it was around October last year, all of Notion team had a little offsite. And as a part of that, Ivan and Simon kind of went into a little kind of hack weekend. And the thing that they ended up hacking on inside Notion was the very, very early prototype of Notion AI. They saw this GPT-3 thing. The early, early motivation for starting Notion, building Notion in the first place for them, was sort of grounded in this utopian end-user programming vision where software is so powerful, but there are only so many people in the world that can write programs. But everyone can benefit from having a little workspace or a little program or a little workflow tool that's programmed to just fit their use case. And so how can we build a tool that lets people customize their software tools that they use every day for their use case? And I think to them, seemed like such a critical part of facilitating that, bridging the gap between people who can code and people who need software. And so they saw that, they tried to build an initial prototype that ended up becoming the first version of Notion AI. They had a prototype in, I think, late October, early November, before Chachapiti came out and sort of evolved it over the few months. But what ended up launching was sort of in line with the initial vision, I think, of what they ended up building. And then once they had it, I think they wanted to keep pushing it. And so at this point, AI is a really key part of Notion strategy. And what we see Notion becoming going forward, in the same way that blocks and databases are a core part of Notion that helps enable workflow automation and all these important parts of running a team or collaborating with people or running your life, we think that AI is going to become an equally critical part of what Notion is. And it won't be, Notion is a cool connected workspace app, and it also has AI. It'll be that what Notion is, is databases, it has pages, it has space for your docs, and it also has this sort of comprehensive suite of AI tools that permeate everything. And one of the challenges of the AI team, which is, as you said, kind of a startup within a startup right now, is to figure out exactly what that all-permeating kind of abstraction means, which is a fascinating and difficult open problem. [00:41:57]Alessio: How do you think about what people expect of Notion versus what you want to build in Notion? A lot of this AI technology kind of changes, you know, we talked about the relationship between text and human and how human collaborates. Do you put any constraints on yourself when it's like, okay, people expect Notion to work this way with these blocks. So maybe I have this crazy idea and I cannot really pursue it because it's there. I think it's a classic innovator's dilemma kind of thing. And I think a lot of founders out there that are in a similar position where it's like, you know, series C, series D company, it's like, you're not quite yet the super established one, you're still moving forward, but you have an existing kind of following and something that Notion stands for. How do you kind of wrangle with that? [00:42:43]Linus: Yeah, that is in some ways a challenge and that Notion already is a kind of a thing. And so we can't just scrap everything and start over. But I think it's also, there's a blessing side of it too, in that because there are so many people using Notion in so many different ways, we understand all of the things that people want to use Notion for very well. And then so we already have a really well-defined space of problems that we want to help people solve. And that helps us. We have it with the existing Notion product and we also have it by sort of rolling out these AI things early and then watching, learning from the community what people want to do [00:43:17]Swyx: with them. [00:43:17]Linus: And so based on those learnings, I think it actually sort of helps us constrain the space of things we think we need to build because otherwise the design space is just so large with whatever we can do with AI and knowledge work. And so watching what people have been using Notion for and what they want to use Notion for, I think helps us constrain that space a little bit and make the problem of building AI things inside Notion a little more tractable. [00:43:36]Swyx: I think also just observing what they naturally use things for, and it sounds like you do a bunch of user interviews where you hear people running into issues and, or describe them as, the way that I describe myself actually is, I feel like the problem is with me, that I'm not creative enough to come up with use cases to use Notion AI or any other AI. [00:43:57]Linus: Which isn't necessarily on you, right? [00:43:59]Swyx: Exactly. [00:43:59]Linus: Again, like it goes way back to the early, the thing we touched on early in the conversation around like, if you have too much generality, there's not enough, there are not enough guardrails to obviously point to use cases. Blank piece of paper. [00:44:10]Swyx: I don't know what to do with this. So I think a lot of people judge Notion AI based on what they originally saw, which is write me a blog post or do a summary or do action items. Which, fun fact, for latent space, my very, very first Hacker News hit was reverse engineering Notion AI. I actually don't know if I got it exactly right. I think I got the easy ones right. And then apparently I got the action items one really wrong. So there's some art into doing that. But also you've since launched a bunch of other products and maybe you've already hinted at AI Autofill. Maybe we can just talk a little bit about what does the scope or suite of Notion AI products have been so far and what you're launching this week? Yeah. [00:44:53]Linus: So we have, I think, three main facets of Notion AI and Notion at the moment. We have sort of the first thing that ever launched with Notion AI, which I think that helps you write. It's, going back to earlier in the conversation, it's kind of a writing, kind of a content generation tool. If you have a document and you want to generate a summary, it helps you generate a summary, pull out action items, you can draft a blog post, you can help it improve, it's helped to improve your writings, it can help fix grammar and spelling mistakes. But under the hood, it's a fairly lightweight, a thick layer of prompts. But otherwise, it's a pretty straightforward use case of language models, right? And so there's that, a tool that helps you write documents. There's a thing called an AI block, which is a slightly more constrained version of that where one common way that we use it inside Notion is we take all of our meeting notes inside Notion. And frequently when you have a meeting and you want other people to be able to go back to it and reference it, it's nice to have a summary of that meeting. So all of our meeting notes templates, at least on the AI team, have an AI block at the top that automatically summarizes the contents of that page. And so whenever we're done with a meeting, we just press a button and it'll re-summarize that, including things like what are the core action items for every person in the meeting. And so that block, as I said before, is nice because it's a constrained space for the AI to work in, and we don't have to prompt it every single time. And then the newest member of this AI collection of features is AI autofill, which brings Notion AI to databases. So if you have a whole database of user interviews and you want to pull out what are the companies, core pain points, what are their core features, maybe what are their competitor products they use, you can just make columns. And in the same way that you write Excel formulas, you can write a little AI formula, basically, where the AI will look at the contents of the page and pull out each of these key pieces of information. The slightly new thing that autofill introduces is this idea of a more automated background [00:46:43]Swyx: AI thing. [00:46:44]Linus: So with Writer, the AI in your document product and the AI block, you have to always ask it to update. You have to always ask it to rewrite. But if you have a column in a database, in a Notion database, or a property in a Notion database, it would be nice if you, whenever someone went back and changed the contents of the meeting node or something updated about the page, or maybe it's a list of tasks that you have to do and the status of the task changes, you might want the summary of that task or detail of the task to update. And so anytime that you can set up an autofilled Notion property so that anytime something on that database row or page changes, the AI will go back and sort of auto-update the autofilled value. And that, I think, is a really interesting part that we might continue leading into of like, even though there's AI now tied to this particular page, it's sort of doing its own thing in the background to help automate and alleviate some of that pain of automating these things. But yeah, Writer, Blocks, and Autofill are the three sort of cornerstones we have today. [00:47:42]Alessio: You know, there used to be this glorious time where like, Roam Research was like the hottest knowledge company out there, and then Notion built Backlinks. I don't know if we are to blame for that. No, no, but how do Backlinks play into some of this? You know, I think most AI use cases today are kind of like a single page, right? Kind of like this document. I'm helping with this. Do you see some of these tools expanding to do changes across things? So we just had Itamar from Codium on the podcast, and he talked about how agents can tie in specs for features, tests for features, and the code for the feature. So like the three entities are tied together. Like, do you see some Backlinks help AI navigate through knowledge basis of companies where like, you might have the document the product uses, but you also have the document that marketing uses to then announce it? And as you make changes, the AI can work through different pieces of it? [00:48:41]Swyx: Definitely. [00:48:41]Linus: If I may get a little theoretical from that. One of my favorite ideas from my last year of hacking around building text augmentations with AI for documents is this realization that, you know, when you look at code in a code editor, what it is at a very lowest level is just text files. A code file is a text file, and there are maybe functions inside of it, and it's a list of functions, but it's a text file. But the way that you understand it is not as a file, like a Word document, it's a kind of a graph.[00:49:10]Linus: Like you have a function, you have call sites to that function, there are places where you call that function, there's a place where that function is tested, many different definitions for that function. Maybe there's a type definition that's tied to that function. So it's a kind of a graph. And if you want to understand that function, there's advantages to be able to traverse that whole graph and fully contextualize where that function is used. Same with types and same with variables. And so even though its code is represented as text files, it's actually kind of a graph. And a lot of the, of what, all of the key interfaces, interface innovations behind IDEs is helping surface that graph structure in the context of a text file. So like things like go to definition or VS Code's little window view when you like look at references. And interesting idea that I explored last year was what if you bring that to text documents? So text documents are a little more unstructured, so there's a less, there's a more fuzzy kind of graph idea. But if you're reading a textbook, if there's a new term, there's actually other places where the term is mentioned. There's probably a few places where that's defined. Maybe there's some figures that reference that term. If you have an idea, there are other parts of the document where the document might disagree with that idea or cite that idea. So there's still kind of a graph structure. It's a little more fuzzy, but there's a graph structure that ties together like a body of knowledge. And it would be cool if you had some kind of a text editor or some kind of knowledge tool that let you explore that whole graph. Or maybe if an AI could explore that whole graph. And so back to your point, I think taking advantage of not just the backlinks. Backlinks is a part of it. But the fact that all of these inside Notion, all of these pages exist in a single workspace and it's a shared context. It's a connected workspace. And you can take any idea and look up anywhere to fully contextualize what a part of your engineering system design means. Or what we know about our pitching their customer at a company. Or if I wrote down a book, what are other places where that book has been mentioned? All these graph following things, I think, are really important for contextualizing knowledge. [00:51:02]Swyx: Part of your job at Notion is prompt engineering. You are maybe one of the more advanced prompt engineers that I know out there. And you've always commented on the state of prompt ops tooling. What is your process today? What do you wish for? There's a lot here. [00:51:19]Linus: I mean, the prompts that are inside Notion right now, they're not complex in the sense that agent prompts are complex. But they're complex in the sense that there is even a problem as simple as summarize a [00:51:31]Swyx: page. [00:51:31]Linus: A page could contain anything from no information, if it's a fresh document, to a fully fledged news article. Maybe it's a meeting note. Maybe it's a bug filed by somebody at a company. The range of possible documents is huge. And then you have to distill all of it down to always generate a summary. And so describing that task to AI comprehensively is pretty hard. There are a few things that I think I ended up leaning on, as a team we ended up leaning on, for the prompt engineering part of it. I think one of the early transitions that we made was that the initial prototype for Notion AI was built on instruction following, the sort of classic instruction following models, TextWG003, and so on. And then at some point, we all switched to chat-based models, like Claude and the new ChatGPT Turbo and these models. And so that was an interesting transition. It actually kind of made few-shot prompting a little bit easier, I think, in that you could give the few-shot examples as sort of previous turns in a conversation. And then you could ask the real question as the next follow-up turn. I've come to appreciate few-shot prompting a lot more because it's difficult to fully comprehensively explain a particular task in words, but it's pretty easy to demonstrate like four or five different edge cases that you want the model to handle. And a lot of times, if there's an edge case that you want a model to handle, I think few-shot prompting is just the easiest, most reliable tool to reach for. One challenge in prompt engineering that Notion has to contend with often is we want to support all the different languages that Notion supports. And so all of our prompts have to be multilingual or compatible, which is kind of tricky because our prompts are written, our instructions are written in English. And so if you just have a naive approach, then the model tends to output in English, even when the document that you want to translate or summarize is in French. And so one way you could try to attack that problem is to tell the model, answering the language of the user's query. But it's actually a lot more effective to just give it examples of not just English documents, but maybe summarizing an English document, maybe summarize a ticket filed in French, summarize an empty document where the document's supposed to be in Korean. And so a lot of our few-shot prompt-included prompts in Notion AI tend to be very multilingual, and that helps support our non-English-speaking users. The other big part of prompt engineering is evaluation. The prompts that you exfiltrated out of Notion AI many weeks ago, surprisingly pretty spot-on, at least for the prompts that we had then, especially things like summary. But they're also outdated because we've evolved them a lot more, and we have a lot more examples. And some of our prompts are just really, really long. They're like thousands of tokens long. And so every time we go back and add an example or modify the instruction, we want to make sure that we don't regress any of the previous use cases that we've supported. And so we put a lot of effort, and we're increasingly building out internal tooling infrastructure for things like what you might call unit tests and regression tests for prompts with handwritten test cases, as well as tests that are driven more by feedback from Notion users that have chosen to share their feedback with us. [00:54:31]Swyx: You just have a hand-rolled testing framework or use Jest or whatever, and nothing custom out there. You basically said you've looked at so many prompt ops tools and you're sold on none of them. [00:54:42]Linus: So that tweet was from a while ago. I think there are a couple of interesting tools these days. But I think at the moment, Notion uses pretty hand-rolled tools. Nothing too heavy, but it's basically a for loop over a list of test cases. We do do quite a bit of using language models to evaluate language models. So our unit test descriptions are kind of funny because the test is literally just an input document and a query, and then we expect the model to say something. And then our qualification for whether that test passes or not is just ask the language model again, whether it looks like a reasonable summary or whether it's in the right language. [00:55:19]Swyx: Do you have the same model? Do you have entropic-criticized OpenAI or OpenAI-criticized entropic? That's a good question. Do you worry about models being biased towards its own self? [00:55:29]Linus: Oh, no, that's not a worry that we have. I actually don't know exactly if we use different models. If you have a fixed budget for running these tests, I think it would make sense to use more expensive models for evaluation rather than generation. But yeah, I don't remember exactly what we do there. [00:55:44]Swyx: And then one more follow-up on, you mentioned some of your prompts are thousands of tokens. That takes away from my budget as a user. Isn't that a trade-off that's a concern? So there's a limited context window, right? Some of that is taken by you as the app designer, product designer, deciding what system prompt to provide. And then the remainder is what I as a user can give you to actually summarize as my content. In theory. [00:56:10]Linus: I think in practice there are a couple of trends that make that an issue. So for things like generating summaries, a summary is only going to be so many tokens long. If our prompts are generating you 3,000 token summaries, the prompt is not doing its job anyway. [00:56:25]Swyx: Yeah, but the source doc is. [00:56:27]Linus: The source doc could be longer. So if you wanted to translate a 5,000 token document, you do have to truncate it. And there is a limitation. It's not something that we are super focused on at the moment for a couple of reasons. I think there are techniques that, if we need to, help us compress those prompts. Things like parameter-efficient fine-tuning. And also the context lengths. It seems like the dominant trend is that context lengths are getting cheaper and longer constantly. Anthropic recently announced their 100,000 token context model recently. And so I think in the longer term that's going to be taken care of anyway by the models becoming more accommodating of longer contexts. And it's more of a temporary limitation. Cool. [00:57:04]Swyx: Shall we talk about the professionalizing of a scene? [00:57:07]Linus: Yeah, I think one of the things that is a helpful bit of context when thinking about HCI and AI in particular is, historically, HCI and AI have been sort of competing disciplines. Competing very specifically in the sense that they often fought for the same sources of funding and the same kinds of people and attention throughout the history of computer science. HCI and AI both used to come from the same or very aligned, similar, parallel motivations of, we have computers. How do we make computers work better with humans? And one way to do it was to make the machine smarter. Another way to do it was to design better interfaces. And through the AI booms and busts, when the AI boom was happening, HCI would get less funding. And when AIs had winters, HCI would get a lot more attention because it was sort of the alternative solution. And now that we have this sort of renewed attention on how to build better interfaces for AI, I think it's interesting that it's kind of a scene now. There are podcasts like this where I get to talk about interfaces and AI. But it's definitely not a fully-fledged field. My favorite definition of sort of what distinguishes the two apart comes from Andy Matuszak, where he, I'm going to butcher the quote, but he said something to the effect of, a field has at their disposal a powerful set of established tools and methods and standards and a shared set of core questions they want to answer. And so if you look at machine learning, which is obviously a really dominant established field, if you want to answer, if you want to evaluate a model, if you want to answer, if you want to solve a particular task or build a model that solves a particular task, there are powerful methods that we have, like gradient descent and specific benchmarks, for building solutions and then re-evaluating how to do the solutions. Or if you have an even more expensive problem, there are surely attempts that have been made before and then attempts that people are making now for how to attack that problem and frameworks to think about these things. In AI and UX, I think, we're very early in the evolution of that space and that community, and there's a lot of people excited, a lot of people building, but we have yet to come up with a set of best practices and tools and methods and frameworks for thinking about these things. And those will surely arise, and as they do, I think we'll see the evolution of the field. In prompt engineering and using language models in products at large, I think that community is a little farther along. It's still very fast moving because it's really young, but there are established prompting techniques like React and distillation of larger instruction following models. And these techniques, I think, are the beginnings of best practices and powerful tools at the disposal of this language model using field. [00:59:43]Swyx: Yeah, and mostly it's just following Riley Goodside. It's how I learn about prompting techniques. Right, right. Yeah, pioneers. But yeah, I am actually interested in this. We've recently kind of rebranded the podcast or the newsletter somewhat in towards being for this term AI engineer, which I kind of view as somewhere between machine learning researcher and software engineer, some kind of in-between mix. And I think creating the media, creating meetups, creating a de facto conference for it, creating job titles, and then I think that core set of questions that everyone wants to get better at, I think that is essentially how this starts. Yeah, yeah. Pretty excited of. [01:00:25]Linus: Creating a space for the people that are interested to come together, I think, is a really, really key important part of it. I'm always, whenever I come back to it, I'm always amazed by how if you look at the sort of golden era of theoretical physics in the early 20th century, or the golden era of early personal computing, there are maybe like two dozen people that have contributed all of the significant ideas to that field. They all kind of know each other. I always found that really fascinating. And I think the causal relationship actually goes the other way. It's not that all those people happen to know each other. It's that because there was that core set of people that always, that were very close to each other and shared ideas often, and they were co-located, that that field is able to blossom. And so I think creating that space is really critical. [01:01:08]Swyx: Yeah, there's a very famous photo of the Solvay conference in 1927, where Albert Einstein, Niels Bohr, Marie Curie, all these top physics names. And how many Nobel laureates are in the photo, right? Yeah, and when I tweeted it out once, people were like, I didn't know these all lived together, and they all knew each other, and they must have exchanged so many ideas. [01:01:28]Linus: I mean, similar with artists and writers that help a new kind of period blossom. [01:01:34]Swyx: Now, is it going to be San Francisco, New York, though? [01:01:36]Alessio: That's a spicy question. [01:01:39]Swyx: I don't know, we'll see. Well, we're glad to at least be a part of your world, whether it is on either coast. But it's also virtual, right? Like, we have a Discord, it's happening online as well, even if you're in a small town like Indiana. [01:01:54]Swyx: Cool, lightning round? Awesome, yeah, let's do it. [01:01:59]Alessio: We only got three questions for you. One is acceleration, one exploration, then a final takeaway. So the first one we always like to ask is like, what is something that happened in AI that you thought would take much longer than it has? [01:02:13]Swyx: Price is coming down. [01:02:14]Linus: Price is coming down and or being able to get a lot more bang for your buck. So things like GPT-3.5 Turbo being, I don't know, exactly the figure, like 10 times, 20 times cheaper. [01:02:25]Swyx: And then having GPT, then DaVinci O3. [01:02:27]Linus: Then DaVinci O3 per token, or the super long context clod, or MPT StoryWriter, these like long context models that take, theoretically would take a lot of compute to run, but they're sort of accessible to us now. I think they're surprising because I would have thought that before these things came out, that cost per token and scaling context length, and these were like sort of core constraints that you would have to design your AI systems around. And it ends up being like, if you just wait a few months, like OpenAI will figure out how to make these models 10 times cheaper. Or Anthropic will figure out how to make the models be able to take a million tokens. And the speed at which that's happened has been surprising and a little bit frightening, because it invalidates a lot of the assumptions that I was operating with, and I have to recalibrate. [01:03:11]Swyx: Yeah, there's this very famous law called Wurf's Law, also known as Gates's Law, that basically says software engineers will take up whatever hardware engineers give them. And I feel like there's a parallel law right now where language model improvements, AI UX people are going to take up all the improvements that language model people will give them. So, you know, they're trying to, while the language model people are improving the costs by a single order of magnitude, you, with your Notion AI autofill, are increasing by orders of magnitude the amount of consumption that's being used. [01:03:39]Linus: Yeah, exactly. Before the show started, we were just talking about how when I was prototyping an autofill, just to make sure that things sort of like scaled up, okay, I ended up running autofill on a database with like 6,000 pages and just summaries. And usually these are fairly long pages. I ended up running through something like two or three million tokens in a matter of like 20 minutes. [01:03:58]Swyx: Yeah. [01:03:58]Linus: Which is not too expensive, luckily, because the models are getting cheaper. It's going to be fine. But it is like $5 or $6, which the concept of like running a test on my computer and it spending the price of like a nice coffee is kind of a weird thing still that I'm getting used to. [01:04:13]Swyx: And Notion AI currently is $10 a month, something like that. So there's ways to make Notion lose money. [01:04:20]Alessio: You just get negative gross margins on that test. [01:04:24]Linus: Not sanctioned by Notion. I mean, obviously, you should use it to, you know, improve your life and support your workflows in whatever ways that's useful. [01:04:33]Swyx: Okay, second question is about exploration. What do you think is the most interesting unsolved question in AI? [01:04:39]Linus: Predictability, reliability. Well, in AI broadly, I think it's much harder. But with language models specifically, I think how to build dependable systems is really important. If you ask Notion AI or if you ask ChatGPT or Claude, like maybe a bullet list of X, Y, Z, sometimes it'll make those bullets with like the Unicode center dot. Sometimes it'll make them with a dash. Sometimes it'll like add a title. Sometimes it'll like bold random things. And all of the things are fine. But it's a little jarring if every time the answer is a little stochastic. I think this is much more of a concern for when you're automating tasks or having the model make decisions by itself. Predictability, dependability, so much of the software that runs the world is sort of behind-the-scenes decision-making programs that run inside enterprises and automate systems and make decisions for people. And auditability, dependability is just so critical to all of them. One avenue of work that I'm really intrigued by is in these decision-making systems, not having the model sort of internally as a black box make decisions, but having the model synthesize code that makes decisions. So you might ask the model for things like summarization, like natural language tasks, you have to ask the model. But if you wanted to, I don't know, let's say you have a document and you want to filter out all the dates. Instead of asking the model, hey, can you grab all the dates? You can ask the model to write a regular expression that captures a particular set of date formats that you really care about. And at that point, the output of the model is a program. And the nice thing about a program is you can kind of check it. There's lots of nice things. One is it's much cheaper to run afterwards. Another is you can verify it. And the program becomes a kind of a, what in design we call a boundary object, where it's a shared thing that exists both in the sphere of the human and the sphere of the computer. And you can iterate on it to fix bugs. And you can co-evolve this object that is now like a representation of this decision that you want the model to, the computer to make. But it's auditable and dependable and reliable. And so I'm pretty bullish on co-generation and other sort of like program synthesis and program verification techniques. But using the model to write the initial program and help the people maintain the software. [01:06:36]Swyx: Yeah, I'm so excited by that. Just in terms of reliability, I'll call out our previous guest. Rojbal. Yeah, yeah. And she's working on Guardrails AI. There's also LMQL. And then Microsoft recently put out Guidance, which is their custom language thing. Have you explored any of those? [01:06:51]Linus: I've taken a look at all of them. I've spoken to Shreya. I think this general space of like more... Speaking of adding constraints to general systems, adding constraints, adding program verification, all of these things I think are super fascinating. I also personally like it a lot. Because before I was spending a lot of my time in AI, I spent a bunch of time looking at like programming languages and compilers and interpreters. And there is just so much amazing work that has gone into how do you build automated ways to reason about a program? Like compilers and type checkers and so on. And it would be a real shame if the whole field of program synthesis and verification just became like ask GPT-4. [01:07:30]Swyx: But actually, it's not. [01:07:30]Linus: Like they work together. You write the program, you synthesize the program with GPT-4 from human descriptions. And then now we have this whole set of powerful techniques that we can use to more formally understand and prove things about programs. And I think the synergy of them, I'm excited to see. [01:07:44]Swyx: Awesome. This was great, Linus. [01:07:47]Alessio: Our last question is always, what's one message you want everyone to remember today about the space, exciting challenges? [01:07:54]Swyx: We were at the beginning. [01:07:57]Linus: Maybe this is really cliche. But one thing that I always used to say about when I was working on text interfaces last year [01:08:05]Swyx: was that I would be really disappointed [01:08:07]Linus: if in a thousand years humans are still using the same kind of like writing tools and writing systems that we are today. Like it would be pretty surprising if we're still sort of like writing documents in the same way that we are today in a thousand years. And the language and the writing system hasn't evolved at all. If humans plan to be around for many thousands of years into the future, writing has really only been around for like two, three thousand years. And it's like sort of modern form. And we should, I think, care a lot more about building flexible, powerful tools than about backwards compatibility if we plan to be around for many more times the number of years that we've been around. And so I think whether we look at something as simple as language models or as expansive as like humans interacting with text documents, I think it's worth reminding yourself often that the things that we have today are sometimes that way for a reason but often just because an artifact of like the way that we've gotten here. And text can look very different. Language models can look very different. I personally think in a couple of years we're going to do something better than transformers. So all of these things are going to change. And I think it's important to have your eyes sort of looking over the horizon at what's coming far into the future. [01:09:24]Swyx: Nice way to end it. [01:09:25]Alessio: Well, thank you, Linus, for coming on. This was great. Thank you. This was lovely. [01:09:29]Linus: Thanks for having me. [01:09:31] Get full access to Latent Space at www.latent.space/subscribe
01:09:5001/06/2023
Debugging the Internet with AI agents – with Itamar Friedman of Codium AI and AutoGPT
We are hosting the AI World’s Fair in San Francisco on June 8th! You can RSVP here. Come meet fellow builders, see amazing AI tech showcases at different booths around the venue, all mixed with elements of traditional fairs: live music, drinks, games, and food! We are also at Amplitude’s AI x Product Hackathon and are hosting our first joint Latent Space + Practical AI Podcast Listener Meetup next month!We are honored by the rave reviews for our last episode with MosaicML! They are also welcome on Apple Podcasts and Twitter/HN/LinkedIn/Mastodon etc!We recently spent a wonderful week with Itamar Friedman, visiting all the way from Tel Aviv in Israel: * We first recorded a podcast (releasing with this newsletter) covering Codium AI, the hot new VSCode/Jetbrains IDE extension focused on test generation for Python and JS/TS, with plans for a Code Integrity Agent. * Then we attended Agent Weekend, where the founders of multiple AI/agent projects got together with a presentation from Toran Bruce Richards on Auto-GPT’s roadmap and then from Itamar on Codium’s roadmap* Then some of us stayed to take part in the NextGen Hackathon and won first place with the new AI Maintainer project.So… that makes it really hard to recap everything for you. But we’ll try!Podcast: Codium: Code Integrity with Zero BugsWhen it launched in 2021, there was a lot of skepticism around Github Copilot. Fast forward to 2023, and 40% of all code is checked in unmodified from Copilot. Codium burst on the scene this year, emerging from stealth with an $11m seed, their own foundation model (TestGPT-1) and a vision to revolutionize coding by 2025.You might have heard of "DRY” programming (Don’t Repeat Yourself), which aims to replace repetition with abstraction. Itamar came on the pod to discuss their “extreme DRY” vision: if you already spent time writing a spec, why repeat yourself by writing the code for it? If the spec is thorough enough, automated agents could write the whole thing for you.Live Demo Video SectionThis is referenced in the podcast about 6 minutes in.Timestamps, show notes, and transcript are below the fold. We would really appreciate if you shared our pod with friends on Twitter, LinkedIn, Mastodon, Bluesky, or your social media poison of choice!Auto-GPT: A Roadmap To The Future of WorkMaking his first public appearance, Toran (perhaps better known as @SigGravitas on GitHub) presented at Agents Weekend:Lightly edited notes for those who want a summary of the talk:* What is AutoGPT?AutoGPT is an Al agent that utilizes a Large Language Model to drive its actions and decisions. It can be best described as a user sitting at a computer, planning and interacting with the system based on its goals. Unlike traditional LLM applications, AutoGPT does not require repeated prompting by a human. Instead, it generates its own 'thoughts', criticizes its own strategy and decides what next actions to take.* AutoGPT was released on GitHub in March 2023, and went viral on April 1 with a video showing automatic code generation. 2 months later it has 132k+ stars, is the 29th highest ranked open-source project of all-time, a thriving community of 37.5k+ Discord members, 1M+ downloads.* What’s next for AutoGPT? The initial release required users to know how to build and run a codebase. They recently announced plans for a web/desktop UI and mobile app to enable nontechnical/everyday users to use AutoGPT. They are also working on an extensible plugin ecosystem called the Abilities Hub also targeted at nontechnical users.* Improving Efficacy. AutoGPT has many well documented cases where it trips up. Getting stuck in loops, using instead of actual content incommands, and making obvious mistakes like execute_code("writea cookbook"'. The plan is a new design called Challenge Driven Development - Challenges are goal-orientated tasks or problems thatAuto-GPT has difficulty solving or has not yet been able to accomplish. These may include improving specific functionalities, enhancing the model's understanding of specific domains, or even developing new features that the current version of Auto-GPT lacks. (AI Maintainer was born out of one such challenge). Itamar compared this with Software 1.0 (Test Driven Development), and Software 2.0 (Dataset Driven Development).* Self-Improvement. Auto-GPT will analyze its own codebase and contribute to its own improvement. AI Safety (aka not-kill-everyone-ists) people like Connor Leahy might freak out at this, but for what it’s worth we were pleasantly surprised to learn that Itamar and many other folks on the Auto-GPT team are equally concerned and mindful about x-risk as well.The overwhelming theme of Auto-GPT’s roadmap was accessibility - making AI Agents usable by all instead of the few.Podcast Timestamps* [00:00:00] Introductions* [00:01:30] Itamar’s background and previous startups* [00:03:30] Vision for Codium AI: reaching “zero bugs”* [00:06:00] Demo of Codium AI and how it works* [00:15:30] Building on VS Code vs JetBrains* [00:22:30] Future of software development and the role of developers* [00:27:00] The vision of integrating natural language, testing, and code* [00:30:00] Benchmarking AI models and choosing the right models for different tasks* [00:39:00] Codium AI spec generation and editing* [00:43:30] Reconciling differences in languages between specs, tests, and code* [00:52:30] The Israeli tech scene and startup culture* [01:03:00] Lightning RoundShow Notes* Codium AI* Visualead* AutoGPT* StarCoder* TDD (Test-Driven Development)* AST (Abstract Syntax Tree)* LangChain* ICON* AI21TranscriptAlessio: [00:00:00] Hey everyone. Welcome to the Latent Space podcast. This is Alessio, Partner and CTO-in-Residence at Decibel Partners. I'm joined by my co-host, Swyx, writer and editor of Latent Space.Swyx: Today we have a special guest, Tamar Friedman, all the way from Tel Aviv, CEO and co-founder of Codium AI. Welcome.Itamar: Hey, great being here. Thank you for inviting me.Swyx: You like the studio? It's nice, right?Itamar: Yeah, they're awesome.Swyx: So I'm gonna introduce your background a little bit and then we'll learn a bit more about who you are. So you graduated from Teknion Israel Institute of Technology's kind of like the MIT of of Israel. You did a BS in CS, and then you also did a Master's in Computer Vision, which is kind of relevant.You had other startups before this, but your sort of claim to fame is Visualead, which you started in 2011 and got acquired by Alibaba Group You showed me your website, which is the sort of QR codes with different forms of visibility. And in China that's a huge, huge deal. It's starting to become a bigger deal in the west. My favorite anecdote that you told me was something about how much sales use you saved or something. I forget what the number was.Itamar: Generally speaking, like there's a lot of peer-to-peer transactions going on, like payments and, and China with QR codes. So basically if for example 5% of the scanning does not work and with our scanner we [00:01:30] reduce it to 4%, that's a lot of money. Could be tens of millions of dollars a day.Swyx: And at the scale of Alibaba, it serves all of China. It's crazy. You did that for seven years and you're in Alibaba until 2021 when you took some time off and then hooked up with Debbie, who you've known for 25 years, to start Codium AI and you just raised your $11 million seed rounds with TlB Partners and Vine. Congrats. Should we go right into Codium? What is Codium?Itamar: So we are an AI coding assistant / agent to help developers reaching zero bugs. We don't do that today. Right now, we help to reduce the amount of bugs. Actually you can see people commenting on our marketplace page saying that they found bugs with our tool, and that's like our premise. Our vision is like for Tesla zero emission or something like that, for us it's zero bugs.We started with building an IDE extension either in VS Code or in JetBrains. And that actually works alongside the main panel where you write your code and I can show later what we do is analyze the code, whether you started writing it or you completed it.Like you can go both TDD (Test-Driven Development) or classical coding. And we offer analysis, tests, whether they pass or not, we further self debug [00:03:00] them and make suggestions eventually helping to improve the code quality specifically on code logic testing.Alessio: How did you get there? Obviously it's a great idea. Like, what was the idea, maze? How did you get here?Itamar: I'll go back long. So, yes I was two and a half times a CTO, VC backed startup CTO where we talked about the last one that I sold to Alibaba. But basically I'm like, it's weird to say by 20 years already of R&D manager, I'm not like the best programmer because like you mentioned, I'm coming more from the machine learning / computer vision side, one, one of the main application, but a lot of optimization. So I’m not necessarily the best coder, but I am like 20 year R&D manager. And I found that verifying code logic is very hard thing. And one of the thing that really makes it difficult to increase the development velocity.So you have tools related to checking performance.You have tools for vulnerabilities and security, Israelis are really good at that. But do you have a tool that actually helps you test code logic? I think what we have like dozens or hundreds, even thousands that help you on the end to end, maybe on the microservice integration system. But when you talk about code level, there isn't anything.So that was the pain I always had, especially when I did have tools for that, for the hardware. Like I worked in Mellanox to be sold to Nvidia as a student, and we had formal tools, et cetera. [00:04:30] So that's one part.The second thing is that after being sold to Alibaba, the team and I were quite a big team that worked on machine learning, large language model, et cetera, building developer tools relate with, with LLMs throughout the golden years of. 2017 to 2021, 2022. And we saw how powerful they became.So basically, if I frame it this way, because we develop it for so many use cases, we saw that if you're able to take a problem put a framework of a language around it, whether it's analyzing browsing behavior, or DNA, or etc, if you can put a framework off a language, then LLMs take you really far.And then I thought this problem that I have with code logic testing is basically a combination of a few languages: natural language, specification language, technical language. Even visual language to some extent. And then I quit Alibaba and took a bit of time to maybe wrap things around and rest a bit after 20 years of startup and corporate and joined with my partner Dedy Kredo who was my ever first employee.And that's how we like, came to this idea.Alessio: The idea has obviously been around and most people have done AST analysis, kinda like an abstract syntax tree, but it's kind of hard to get there with just that. But I think these models now are getting good enough where you can mix that and also traditional logical reasoning.Itamar: Exactly.Alessio: Maybe talk a little bit more about the technical implementation of it. You mentioned the agent [00:06:00] part. You mentioned some of the model part, like what happens behind the scenes when Codium gets in your code base?Itamar: First of all, I wanna mention I think you're really accurate.If you try to take like a large language model as is and try to ask it, can you like, analyze, test the code, etc, it'll not work so good. By itself it's not good enough on the other side, like all the traditional techniques we already started to invent since the Greek times. You know, logical stuff, you mentioned ASTs, but there's also dynamic code analysis, mutation testing, etc. There's a lot of the techniques out there, but they have inefficiencies.And a lot of those inefficiencies are actually matching with AI capabilities. Let me give you one example. Let's say you wanna do fuzzy testing or mutation testing.Mutation testing means that you either mutate the test, like the input of the test, the code of the test, etc or you mutate the code in order to check how good is your test suite.For example, if I mutate some equation in the application code and the test finds a bug and it does that at a really high rate, like out of 100 mutation, I [00:07:30] find all of the 100 problems in the test. It's probably a very strong test suite.Now the problem is that there's so many options for what to mutate in the data, in the test. And this is where, for example, AI could help, like pointing out where's the best thing that you can mutate. Actually, I think it's a very good use case. Why? Because even if AI is not 100% accurate, even if it's 80% accurate, it could really take you quite far rather just randomly selecting things.So if I wrap up, just go back high level. I think LLM by themselves cannot really do the job of verifying code logic and and neither can the traditional ones, so you need to merge them. But then one more thing before maybe you tell me where to double click. I think with code logic there's also a philosophy question here.Logic different from performance or quality. If I did a three for in loop, like I loop three things and I can fold them with some vector like in Python or something like that. We need to get into the mind of the developer. What was the intention? Like what is the bad code? Not what is the code logic that doesn't work. It's not according to the specification. So I think like one more thing that AI could really help is help to match, like if there is some natural language description of the code, we can match it. Or if there's missing information in natural language that needs [00:09:00] to be asked for the AI could help asking the user.It's not like a closed solution. Rather open and leaving the developer as the lead. Just like moving the developer from, from being the coder to actually being like a pilot that that clicks button and say, ah, this is what I meant, or this is the fix, rather actually writing all the code.Alessio: That makes sense. I think I talked about it on the podcast before, but like the switch from syntax to like semantics, like developers used to be focused on the syntax and not the meaning of what they're writing. So now you have the models that are really good at the syntax and you as a human are supposed to be really good at the semantics of what you're trying to build.How does it practically work? So I'm a software developer, I want to use Codium, like how do I start and then like, how do you make that happen in the, in the background?Itamar: So, like I said, Codium right now is an IDE extension. For example, I'm showing VS code. And if you just install it, like you'll have a few access points to start Codium AI, whether this sidebar or above every component or class that we think is very good to check with Codium.You'll have this small button. There's other way you can mark specific code and right click and run code. But this one is my favorite because we actually choose above which components we suggest to use code. So once I click it code, I starts analyzing this class. But not only this class, but almost everything that is [00:10:30] being used by the call center class.But all and what's call center is, is calling. And so we do like a static code analysis, et cetera. What, what we talked about. And then Codium provides with code analysis. It's right now static, like you can't change. It can edit it, and maybe later we'll talk about it. This is what we call the specification and we're going to make it editable so you can add additional behaviors and then create accordingly, test that will not pass, and then the code will, will change accordingly. So that's one entrance point, like via natural language description. That's one of the things that we're working on right now. What I'm showing you by the way, could be downloaded as is. It's what we have in production.The second thing that we show here is like a full test suite. There are six tests by default but you can just generate more almost as much as you want every time. We'll try to cover something else, like a happy pass edge case et cetera. You can talk with specific tests, okay? Like you can suggest I want this in Spanish or give a few languages, or I want much more employees.I didn't go over what's a call center, but basically it manages like call center. So you can imagine, I can a ask to make it more rigorous, etc, but I don't wanna complicate so I'm keeping it as is.I wanna show you the next one, which is run all test. First, we verify that you're okay, we're gonna run it. I don't know, maybe we are connected to the environment that is currently [00:12:00] configured in the IDE. I don't know if it's production for some reason, or I don't know what. Then we're making sure that you're aware we're gonna run the code that and then once we run, we show if it pass or fail.I hope that we'll have one fail. But I'm not sure it's that interesting. So I'll go like to another example soon, but, but just to show you what's going on here, that we actually give an example of what's a problem. We give the log of the error and then you can do whatever you want.You can fix it by yourself, or you can click reflect and fix, and what's going on right now is a bit a longer process where we do like chain of thought or reflect and fix. And we can suggest a solution. You can run it and in this case it passes. Just an example, this is a very simple example.Maybe later I'll show you a bug. I think I'll do that and I'll show you a bug and how we recognize actually the test. It's not a problem in the test, it's a problem in the code and then suggest you fix that instead of the code. I think you see where I'm getting at.The other thing is that there are a few code suggestion, and there could be a dozen of, of types that could be related to performance modularity or I see this case there is a maintainability.There could also be vulnerability or best practices or even suggestion for bugs. Like if we noticed, if we think one of the tests, for example, is failing because of a bug. So just code presented in the code suggestion. Probably you can choose a few, for example, if you like, and then prepare a code change like I didn't show you which exactly.We're making a diff now that you can apply on your code. So basically what, what we're seeing here is that [00:13:30] there are three main tabs, the code, the test and the code analysis. Let's call spec.And then there's a fourth tab, which is a code suggestion, if you wanna look at analytics, etc. Mm-hmm. Right now code okay. This is the change or quite a big change probably clicked on something. So that's the basic demo.Right now let's be frank. Like I wanted to show like a simple example. So it's a call center. All the inputs to the class are like relatively simple. There is no jsm input, like if you're Expedia or whatever, you have a J with the hotels, Airbnb, you know, so the test will be almost like too simple or not covering enough.Your code, if you don't provide it with some input is valuable, like adjacent with all information or YAMA or whatever. So you can actually add input data and the AI or model. It's actually by the way, a set of models and algorithms that will use that input to create interesting tests. And another thing is many people have some reference tests that they already made. It could be because they already made it or because they want like a very specific they have like how they imagine the test. So they just write one and then you add a reference and that will inspire all the rest of the tests. And also you can give like hints. [00:15:00] This is by the way plan to be like dynamic hints, like for different type of code.We will provide different hints. So we can help you become a bit more knowledgeable about how to test your code. So you can ask for like having a, a given one then, or you can have like at a funny private, like make different joke for each test or for example,Swyx: I'm curious, why did you choose that one? This is the pirate one. Yeah.Itamar: Interesting choice to put on your products. It could be like 11:00 PM of people sitting around. Let's choose one funny thingSwyx: and yeah. So two serious ones and one funny one. Yeah. Just for the listening audience, can you read out the other hints that you decided on as well?Itamar: Yeah, so specifically, like for this case, relatively very simple class, so there's not much to do, but I'm gonna go to one more thing here on the configuration. But it basically is given when then style, it's one of the best practices and tests. So even when I report a bug, for example, I found a bug when someone else code, usually I wanna say like, given, use this environment or use that this way when I run this function, et cetera.Oh, then it's a very, very full report. And it's very common to use that in like in unit test and perform.Swyx: I have never been shown this format.Itamar: I love that you, you mentioned that because if you go to CS undergrad you take so many courses in development, but none of them probably in testing, and it's so important. So why would you, and you don't go to Udemy or [00:16:30] whatever and, and do a testing course, right? Like it's, it's boring. Like people either don't do component level testing because they hate it or they do it and they hate it. And I think part of it it’s because they're missing tool to make it fun.Also usually you don't get yourself educated about it because you wanna write your code. And part of what we're trying to do here is help people get smarter about testing and make it like easy. So this is like very common. And the idea here is that for different type of code, we'll suggest different type of hints to make you more knowledgeable.We're doing it on an education app, but we wanna help developers become smarter, more knowledgeable about this field. And another one is mock. So right now, our model decided that there's no need for mock here, which is a good decision. But if we would go to real world case, like, I'm part of AutoGPT community and there's all of tooling going on there. Right? And maybe when I want to test like a specific component, and it's relatively clear that going to the web and doing some search and coming back, I don't really need to do that. Like I know what I expect to do and so I can mock that part of using to crawl the web.A certain percentage of accuracy, like around 90, we will decide this is worth mocking and we will inject it. I can click it now and force our system to mock this. But you'll see like a bit stupid mocking because it really doesn't make sense. So I chose this pirate stuff, like add funny pirate like doc stringing make a different joke for each test.And I forced it to add mocks, [00:18:00] the tests were deleted and now we're creating six new tests. And you see, here's the shiver me timbers, the test checks, the call successful, probably there's some joke at the end. So in this case, like even if you try to force it to mock it didn't happen because there's nothing but we might find here like stuff that it mock that really doesn't make sense because there's nothing to mock here.So that's one thing I. I can show a demo where we actually catch a bug. And, and I really love that, you know how it is you're building a developer tools, the best thing you can see is developers that you don't know giving you five stars and sharing a few stuff.We have a discord with thousands of users. But I love to see the individual reports the most. This was one of my favorites. It helped me to find two bugs. I mentioned our vision is to reach zero bugs. Like, if you may say, we want to clean the internet from bugs.Swyx: So debugging the internet. I have my podcast title.Itamar: So, so I think like if we move to another exampleSwyx: Yes, yes, please, please. This is great.Itamar: I'm moving to a different example, it is the bank account. By the way, if you go to ChatGPT and, and you can ask me what's the difference between Codium AI and using ChatGPT.Mm-hmm. I'm, I'm like giving you this hard question later. Yeah. So if you ask ChatGPT give me an example to test a code, it might give you this bank account. It's like the one-on-one stuff, right? And one of the reasons I gave it, because it's easy to inject bugs here, that's easy to understand [00:19:30] anyway.And what I'm gonna do right now is like this bank account, I'm gonna change the deposit from plus to minus as an example. And then I'm gonna run code similarly to how I did before, like it suggests to do that for the entire class. And then there is the code analysis soon. And when we announce very soon, part of this podcast, it's going to have more features here in the code analysis.We're gonna talk about it. Yep. And then there is the test that I can run. And the question is that if we're gonna catch the bag, the bugs using running the test, Because who knows, maybe this implementation is the right one, right? Like you need to, to converse with the developer. Maybe in this weird bank, bank you deposit and, and the bank takes money from you.And we could talk about how this happens, but actually you can see already here that we are already suggesting a hint that something is wrong here and here's a suggestion to put it from minus to to plus. And we'll try to reflect and, and fix and then we will see actually the model telling you, hey, maybe this is not a bug in the test, maybe it's in the code.Swyx: I wanna stay on this a little bit. First of all, this is very impressive and I think it's very valuable. What user numbers can you disclose, you launched it and then it's got fairly organic growth. You told me something off the air, but you know, I just wanted to show people like this is being adopted in quite a large amount.Itamar: [00:21:00] First of all, I'm a relatively transparent person. Like even as a manager, I think I was like top one percentile being transparent in Alibaba. It wasn't five out of five, which is a good thing because that's extreme, but it was a good, but it also could be a bad, some people would claim it's a bad thing.Like for example, if my CTO in Alibaba would tell me you did really bad and it might cut your entire budget by 30%, if in half a year you're not gonna do like much better and this and that. So I come back to a team and tell 'em what's going on without like trying to smooth thing out and we need to solve it together.If not, you're not fitting in this team. So that's my point of view. And the same thing, one of the fun thing that I like about building for developers, they kind of want that from you. To be transparent. So we are on the high numbers of thousands of weekly active users. Now, if you convert from 50,000 downloads to high thousands of weekly active users, it means like a lot of those that actually try us keep using us weekly.I'm not talking about even monthly, like weekly. And that was like one of their best expectations because you don't test your code every day. Right now, you can see it's mostly focused on testing. So you probably test it like once a week. Like we wanted to make it so smooth with your development methodology and development lifecycle that you use it every day.Like at the moment we hope it to be used weekly. And that's what we're getting. And the growth is about like every two, three weeks we double the amount of weekly and downloads. It's still very early, like seven weeks. So I don't know if it'll keep that way, but we hope so. Well [00:22:30] actually I hope that it'll be much more double every two, three weeks maybe. Thanks to the podcast.Swyx: Well, we, yeah, we'll, we'll add you know, a few thousand hopefully. The reason I ask this is because I think there's a lot of organic growth that people are sharing it with their friends and also I think you've also learned a lot from your earliest days in, in the private beta test.Like what have you learned since launching about how people want to use these testing tools?Itamar: One thing I didn't share with you is like, when you say virality, there is like inter virality and intra virality. Okay. Like within the company and outside the company. So which teams are using us? I can't say, but I can tell you that a lot of San Francisco companies are using us.And one of the things like I'm really surprised is that one team, I saw one user two weeks ago, I was so happy. And then I came yesterday and I saw 48 of that company. So what I'm trying to say to be frank is that we see more intra virality right now than inter virality. I don't see like video being shared all around Twitter. See what's going on here. Yeah. But I do see, like people share within the company, you need to use it because it's really helpful with productivity and it's something that we will work about the [00:24:00] inter virality.But to be frank, first I wanna make sure that it's helpful for developers. So I care more about intra virality and that we see working really well, because that means that tool is useful. So I'm telling to my colleague, sharing it on, on Twitter means that I also feel that it will make me cool or make me, and that's something maybe we'll need, still need, like testing.Swyx: You know, I don't, well, you're working on that. We're gonna announce something like that. Yeah. You are generating these tests, you know, based on what I saw there. You're generating these tests basically based on the name of the functions. And the doc strings, I guess?Itamar:So I think like if you obfuscate the entire code, like our accuracy will drop by 50%. So it's right. We're using a lot of hints that you see there. Like for example, the functioning, the dog string, the, the variable names et cetera. It doesn't have to be perfect, but it has a lot of hints.By the way. In some cases, in the code suggestion, we will actually suggest renaming some of the stuff that will sync, that will help us. Like there's suge renaming suggestion, for example. Usually in this case, instead of calling this variable is client and of course you'll see is “preferred client” because basically it gives a different commission for that.So we do suggest it because if you accept it, it also means it will be easier for our model or system to keep improving.Swyx: Is that a different model?Itamar: Okay. That brings a bit to the topic of models properties. Yeah. I'll share it really quickly because Take us off. Yes. It's relevant. Take us off. Off. Might take us off road.I think [00:25:30] like different models are better on different properties, for example, how obedient you are to instruction, how good you are to prompt forcing, like to format forcing. I want the results to be in a certain format or how accurate you are or how good you are in understanding code.There's so many calls happening here to models by the way. I. Just by clicking one, Hey Codium AI. Can you help me with this bank account? We do a dozen of different calls and each feature you click could be like, like with that reflect and fix and then like we choose the, the best one.I'm not talking about like hundreds of models, but we could, could use different APIs of open AI for example, and, and other models, et cetera. So basically like different models are better on different aspect. Going back to your, what we talked about, all the models will benefit from having those hints in, in the code, that rather in the code itself or documentation, et cetera.And also in the code analysis, we also consider the code analysis to be the ground truth to some extent. And soon we're also going to allow you to edit it and that will use that as well.Alessio: Yeah, maybe talk a little bit more about. How do I actually get all these models to work together? I think there's a lot of people that have only been exposed to Copilot so far, which is one use case, just complete what I'm writing. You're doing a lot more things here. A lot of people listening are engineers themselves, some of them build these tools, so they would love to [00:27:00] hear more about how do you orchestrate them, how do you decide which model the what, stuff like that.Itamar: So I'll start with the end because that is a very deterministic answer, is that we benchmark different models.Like every time this there a new model in, in town, like recently it's already old news. StarCoder. It's already like, so old news like few days ago.Swyx: No, no, no. Maybe you want to fill in what it is StarCoder?Itamar: I think StarCoder is, is a new up and coming model. We immediately test it on different benchmark and see if, if it's better on some properties, et cetera.We're gonna talk about it like a chain of thoughts in different part in the chain would benefit from different property. If I wanna do code analysis and, and convert it to natural language, maybe one model would be, would be better if I want to output like a result in, in a certain format.Maybe another model is better in forcing the, a certain format you probably saw on Twitter, et cetera. People talk about it's hard to ask model to output JSON et cetera. So basically we predefine. For different tasks, we, we use different models and I think like this is for individuals, for developers to check, try to sync, like the test that now you are working on, what is most important for you to get, you want the semantic understanding, that's most important? You want the output, like are you asking for a very specific [00:28:30] output?It's just like a chat or are you asking to give a output of code and have only code, no description. Or if there's a description of the top doc string and not something else. And then we use different models. We are aiming to have our own models in in 2024. Being independent of any other third party, like OpenAI or so, but since our product is very challenging, it has UI/UX challenges, engineering challenge, statical and dynamical analysis, and AI.As entrepreneur, you need to choose your battles. And we thought that it's better for us to, to focus on everything around the model. And one day when we are like thinking that we have the, the right UX/UI engineering, et cetera, we'll focus on model building. This is also, by the way, what we did in in Alibaba.Even when I had like half a million dollar a month for trading one foundational model, I would never start this way. You always try like first using the best model you can for your product. Then understanding what's the glass ceiling for that model? Then fine tune a foundation model, reach a higher glass ceiling and then training your own.That's what we're aiming and that's what I suggest other developers like, don't necessarily take a model and, and say, oh, it's so easy these days to do RLHF, et cetera. Like I see it’s like only $600. Yeah, but what are you trying to optimize for? The properties. Don't try to like certain models first, organize your challenges.Understand the [00:30:00] properties you're aiming for and start playing with that. And only then go to train your own model.Alessio: Yeah. And when you say benchmark, you know, we did a one hour long episode, some benchmarks, there's like many of them. Are you building some unique evals to like your own problems? Like how are you doing that? And that's also work for your future model building, obviously, having good benchmarks. Yeah.Itamar:. Yeah. That's very interesting. So first of all, with all the respect, I think like we're dealing with ML benchmark for hundreds of years now.I'm, I'm kidding. But like for tens of years, right? Benchmarking statistical creatures is something that, that we're doing for a long time. I think what's new here is the generative part. It's an open challenge to some extent. And therefore, like maybe we need to re rethink some of the way we benchmark.And one of the notions that I really believe in, I don't have a proof for that, is like create a benchmark in levels. Let's say you create a benchmark from level one to 10, and it's a property based benchmark. Let's say I have a WebGPT ask something from the internet and then it should fetch it for me.So challenge level one could be, I'm asking it and it brings me something. Level number two could be I'm asking it and it has a certain structure. Let's say for example, I want to test AutoGPT. Okay. And I'm asking it to summarize what's the best cocktail I could have for this season in San Francisco.So [00:31:30] I would expect, like, for example, for that model to go. This is my I what I think to search the internet and do a certain thing. So level number three could be that I want to check that as part of this request. It uses a certain tools level five, you can add to that. I expect that it'll bring me back something like relevance and level nine it actually prints the cocktail for me I taste it and it's good. So, so I think like how I see it is like we need to have data sets similar to before and make sure that we not fine tuning the model the same way we test it. So we have one challenges that we fine tune over, right? And few challenges that we don't.And the new concept may is having those level which are property based, which is something that we know from software testing and less for ML. And this is where I think that these two concepts merge.Swyx: Maybe Codium can do ML testing in the future as well.Itamar: Yeah, that's a good idea.Swyx: Okay. I wanted to cover a little bit more about Codium in the present and then we'll go into the slides that you have.So you have some UI/UX stuff and you've obviously VS Code is the majority market share at this point of IDE, but you also have IntelliJ right?Itamar: Jet Brains in general.Swyx: Yeah. Anything that you learned supporting JetBrains stuff? You were very passionate about this one user who left you a negative review.What is the challenge of that? Like how do you think about the market, you know, maybe you should focus on VS Code since it's so popular?Itamar: Yeah. [00:33:00] So currently the VS Code extension is leading over JetBrains. And we were for a long time and, and like when I tell you long time, it could be like two or three weeks with version oh 0.5, point x something in, in VS code, although oh 0.4 or so a jet brains, we really saw the difference in, in the how people react.So we also knew that oh 0.5 is much more meaningful and one of the users left developers left three stars on, on jet brands and I really remember that. Like I, I love that. Like it's what do you want to get at, at, at our stage? What's wrong? Like, yes, you want that indication, you know, the worst thing is getting nothing.I actually, not sure if it's not better to get even the bad indication, only getting good ones to be re frank like at, at, at least in our stage. So we're, we're 9, 10, 10 months old startup. So I think like generally speaking We find it easier and fun to develop in vs code extension versus JetBrains.Although JetBrains has like very nice property, when you develop extension for one of the IDEs, it usually works well for all the others, like it's one extension for PyCharm, and et cetera. I think like there's even more flexibility in the VS code. Like for example, this app is, is a React extension as opposed that it's native in the JetBrains one we're using. What I learned is that it's basically is almost like [00:34:30] developing Android and iOS where you wanna have a lot of the best practices where you have one backend and all the software development like best practices with it.Like, like one backend version V1 supports both under Android and iOS and not different backends because that's crazy. And then you need all the methodology. What, what means that you move from one to 1.1 on the backend? What supports whatnot? If you don't what I'm talking about, if you developed in the past, things like that.So it's important. And then it's like under Android and iOS and, and you relatively want it to be the same because you don't want one developer in the same team working with Jet Brains and then other VS code and they're like talking, whoa, that's not what I'm seeing. And with code, what are you talking about?And in the future we're also gonna have like teams offering of collaboration Right now if you close Codium Tab, everything is like lost except of the test code, which you, you can, like if I go back to a test suite and do open as a file, and now you have a test file with everything that you can just save, but all the goodies here it's lost. One day we're gonna have like a platform you can save all that, collaborate with people, have it part of your PR, like have suggested part of your PR. And then you wanna have some alignment. So one of the challenges, like UX/UI, when you think about a feature, it should, some way or another fit for both platforms be because you want, I think by the way, in iOS and Android, Android sometimes you don’t care about parity, but here you're talking about developers that might be on the same [00:36:00] team.So you do care a lot about that.Alessio: Obviously this is a completely different way to work for developers. I'm sure this is not everything you wanna build and you have some hint. So maybe take us through what you see the future of software development look like.Itamar: Well, that's great and also like related to our announcement, what we're working on.Part of it you already start seeing in my, in my demo before, but now I'll put it into a framework. I'll be clearer. So I think like the software development world in 2025 is gonna look very different from 2020. Very different. By the way. I think 2020 is different from 2000. I liked the web development in 95, so I needed to choose geocities and things like that.Today's much easier to build a web app and whatever, one of the cloud. So, but I think 2025 is gonna look very different in 2020 for the traditional coding. And that's like a paradigm I don't think will, will change too much in the last few years. And, and I'm gonna go over that when I, when I'm talking about, so j just to focus, I'm gonna show you like how I think the intelligence software development world look like, but I'm gonna put it in the lens of Codium AI.We are focused on code integrity. We care that with all this advancement of co-generation, et cetera, we wanna make sure that developers can code fast with confidence. That they have confidence on generated code in the AI that they are using that. That's our focus. So I'm gonna put, put that like lens when I'm going to explain.So I think like traditional development. Today works like creating some spec for different companies, [00:37:30] different development teams. Could mean something else, could be something on Figma, something on Google Docs, something on Jira. And then usually you jump directly to code implementation. And then if you have the time or patience, or will, you do some testing.And I think like some people would say that it's better to do TDD, like not everyone. Some would say like, write spec, write your tests, make sure they're green, that they do not pass. Write your implementation until your test pass. Most people do not practice it. I think for just a few, a few reason, let them mention two.One, it's tedious and I wanna write my code like before I want my test. And I don't think, and, and the second is, I think like we're missing tools to make it possible. And what we are advocating, what I'm going to explain is actually neither. Okay. It's very, I want to say it's very important. So here's how we think that the future of development pipeline or process is gonna look like.I'm gonna redo it in steps. So, first thing I think there do I wanna say that they're gonna be coding assistance and coding agents. Assistant is like co-pilot, for example, and agents is something that you give it a goal or a task and actually chains a few tasks together to complete your goal.Let's have that in mind. So I think like, What's happening right now when you saw our demo is what I presented a few minutes ago, is that you start with an implementation and we create spec for you and test for you. And that was like a agent, like you didn't converse with it, you just [00:39:00] click a button.And, and we did a, a chain of thought, like to create these, that's why it's it's an agent. And then we gave you an assistant to change tests, like you can converse it with it et cetera. So that's like what I presented today. What we're announcing is about a vision that we called the DRY. Don't repeat yourself. I'm gonna get to that when I'm, when I'm gonna show you the entire vision. But first I wanna show you an intermediate step that what we're going to release. So right now you can write your code. Or part of it, like for example, just a class abstract or so with a coding assistant like copilot and maybe in the future, like a Codium AI coding assistant.And then you can create a spec I already presented to you. And the next thing is that you going to have like a spec assistant to generate technical spec, helping you fill it quickly focused on that. And this is something that we're working on and, and going to release the first feature very soon as part of announcement.And it's gonna be very lean. Okay? We're, we're a startup that going bottom up, like lean features going to more and more comprehensive one. And then once you have the spec and implementation, you can either from implementation, have tests, and then you can run the test and fix them like I presented to you.But you can also from spec create tests, okay? From the spec directly to tests. [00:40:30]So then now you have a really interesting thing going on here is that you can start from spec, create, test, create code. You can start from test create code. You can start from a limitation. From code, create, spec and test. And actually we think the future is a very flexible one. You don't need to choose what you're practicing traditional TDD or whatever you wanna start with.If you have already some spec being created together with one time in one sprint, you decided to write a spec because you wanted to align about it with your team, et cetera, and now you can go and create tests and implementation or you wanted to run ahead and write your code. Creating tests and spec that aligns to it will be relatively easy.So what I'm talking about is extreme DRY concept; DRY is don't repeat yourself. Until today when we talked about DRY is like, don't repeat your code. I claim that there is a big parts of the spec test and implementation that repeat himself, but it's not a complete repetition because if spec was as detailed as the implementation, it's actually the implementation.But the spec is usually in different language, could be natural language and visual. And what we're aiming for, our vision is enabling the dry concept to the extreme. With all these three: you write your test will help you generate the code and the spec you write your spec will help you doing the test and implementation.Now the developers is the driver, okay? You'll have a lot [00:42:00] of like, what do you think about this? This is what you meant. Yes, no, you wanna fix the coder test, click yes or no. But you still be the driver. But there's gonna be like extreme automation on the DRY level. So that's what we're announcing, that we're aiming for as our vision and what we're providing these days in our product is the middle, is what, what you see in the middle, which is our code integrity agents working for you right now in your id, but soon also part of your Github actions, et cetera, helping you to align all these three.Alessio: This is great. How do you reconcile the difference in languages, you know, a lot of times the specs is maybe like a PM or it's like somebody who's more at the product level.Some of the implementation details is like backend developers for something. Frontend for something. How do you help translate the language between the two? And then I think in the one of the blog posts on your blog, you mentioned that this is also changing maybe how programming language themselves work. How do you see that change in the future? Like, are people gonna start From English, do you see a lot of them start from code and then it figures out the English for them?Itamar: Yeah. So first of all, I wanna say that although we're working, as we speak on managing we front-end frameworks and languages and usage, we are currently focused on the backend.So for example, as the spec, we won't let you input Figma, but don't be surprised if in 2024 the input of the spec could be a Figma. Actually, you can see [00:43:30] demos of that on a pencil drawing from OpenAI and when he exposed the GPT-4. So we will have that actually.I had a blog, but also I related to two different blogs. One, claiming a very knowledgeable and respectful, respectful person that says that English is going to be the new language program language and, and programming is dead. And another very respectful person, I think equally said that English is a horrible programming language.And actually, I think both of are correct. That's why when I wrote the blog, I, I actually related, and this is what we're saying here. Nothing is really fully redundant, but what's annoying here is that to align these three, you always need to work very hard. And that's where we want AI to help with. And if there is inconsistency will raise a question, what do, which one is true?And just click yes or no or test or, or, or code that, that what you can see in our product and we'll fix the right one accordingly. So I think like English and, and visual language and code. And the test language, let's call it like, like that for a second. All of them are going to persist. And just at the level of automation aligning all three is what we're aiming for.Swyx: You told me this before, so I I'm, I'm just actually seeing Alessio’s reaction to it as a first time.Itamar: Yeah, yeah. Like you're absorbing like, yeah, yeah.Swyx: No, no. This is, I mean, you know, you can put your VC hat on or like compare, like what, what is the most critical or unsolved question presented by this vision?Alessio: A lot of these tools, especially we've seen a lot in the past, it's like the dynamic nature of a lot of this, you know?[00:45:00] Yeah. Sometimes, like, as you mentioned, sometimes people don't have time to write the test. Sometimes people don't have time to write the spec. Yeah. So sometimes you end up with things. Out of sync, you know? Yeah. Or like the implementation is moving much faster than the spec, and you need some of these agents to make the call sometimes to be like, no.Yeah, okay. The spec needs to change because clearly if you change the code this way, it needs to be like this in the future. I think my main question as a software developer myself, it's what is our role in the future? You know? Like, wow, how much should we intervene, where should we intervene?I've been coding for like 15 years, but if I've been coding for two years, where should I spend the next year? Yeah. Like focus on being better at understanding product and explain it again. Should I get better at syntax? You know, so that I can write code. Would love have any thoughts.Itamar: Yeah. You know, there's gonna be a difference between 1, 2, 3 years, three to six, six to 10, and 10 to 20. Let's for a second think about the idea that programming is solved. Then we're talking about a machine that can actually create any piece of code and start creating, like we're talking about singularity, right?Mm-hmm. If the singularity happens, then we're talking about this new set of problems. Let's put that aside. Like even if it happens in 2041, that's my prediction. I'm not sure like you should aim for thinking what you need to do, like, or not when the singularity happens. So I, [00:46:30] I would aim for mm-hmm.Like thinking about the future of the next five years or or, so. That's my recommendation because it's so crazy. Anyway. Maybe not the best recommendation. Take that we're for grain of salt. And please consult with a lawyer, at least in the scope of, of the next five years. The idea that the developers is the, the driver.It actually has like amazing team members. Agents that working for him or her and eventually because he or she's a driver, you need to understand especially what you're trying to achieve, but also being able to review what you get. The better you are in the lower level of programming in five years, it it mean like real, real program language.Then you'll be able to develop more sophisticated software and you will work in companies that probably pay more for sophisticated software and the more that you're less skilled in, in the actual programming, you actually would be able to be the programmer of the new era, almost a creator. You'll still maybe look on the code levels testing, et cetera, but what's important for you is being able to convert products, requirements, et cetera, to working with tools like Codium AI.So I think like there will be like degree of diff different type developers now. If you think about it for a second, I think like it's a natural evolution. It's, it's true today as well. Like if you know really good the Linux or assembly, et cetera, you'll probably work like on LLVM Nvidia [00:48:00] whatever, like things like that.Right. And okay. So I think it'll be like the next, next step. I'm talking about the next five years. Yeah. Yeah. Again, 15 years. I think it's, it's a new episode if you would like to invite me. Yeah. Oh, you'll be, you'll be back. Yeah. It's a new episode about how, how I think the world will look like when you really don't need a developer and we will be there as Cody mi like you can see.Mm-hmm.Alessio: Do we wanna dive a little bit into AutoGPT? You mentioned you're part of the community. Yeah.Swyx: Obviously Try, Catch, Finally, Repeat is also part of the company motto.Itamar: Yeah. So it actually really. Relates to what we're doing and there's a reason we have like a strong relationship and connection with the AutoGPT community and us being part part of it.So like you can see, we're talking about agent for a few months now, and we are building like a designated, a specific agent because we're trying to build like a product that works and gets the developer trust to have developer trust us. We're talking about code integrity. We need it to work. Like even if it will not put 100% it's not 100% by the way our product at all that UX/UI should speak the language of, oh, okay, we're not sure here, please take the driving seat.You want this or that. But we really not need, even if, if we're not close to 100%, we still need to work really well just throwing a number. 90%. And so we're building a like really designated agents like those that from code, create tests.So it could create tests, run them, fix them. It's a few tests. So we really believe in that we're [00:49:30] building a designated agent while Auto GPT is like a swarm of agents, general agents that were supposedly you can ask, please make me rich or make me rich by increase my net worth.Now please be so smart and knowledgeable to use a lot of agents and the tools, et cetera, to make it work. So I think like for AutoGPT community was less important to be very accurate at the beginning, rather to show the promise and start building a framework that aims directly to the end game and start improving from there.While what we are doing is the other way around. We're building an agent that works and build from there towards that. The target of what I explained before. But because of this related connection, although it's from different sides of the, like the philosophy of how you need to build those things, we really love the general idea.So we caught it really early that with Toran like building it, the, the maker of, of AutoGPT, and immediately I started contributing, guess what, what did I contribute at the beginning tests, right? So I started using Codium AI to build tests for AutoGPT, even, even finding problems this way, et cetera.So I become like one of the, let's say 10 contributors. And then in the core team of the management, I talk very often with with Toran on, on different aspects. And we are even gonna have a workshop,Swyx: a very small [00:49:00] meetingItamar: work meeting workshop. And we're going to compete together in a, in a hackathons.And to show that AutoGPT could be useful while, for example, Codium AI is creating the test for it, et cetera. So I'm part of that community, whether is my team are adding tests to it, whether like advising, whether like in in the management team or whether to helping Toran. Really, really on small thing.He is the amazing leader like visionaire and doing really well.Alessio: What do you think is the future of open source development? You know, obviously this is like a good example, right? You have code generating the test and in the future code could actually also implement the what the test wanna do. So like, yeah.How do you see that change? There's obviously not enough open source contributors and yeah, that's one of the, the main issue. Do you think these agents are maybe gonna help us? Nadia Eghbal has this great book called like Working in Public and there's this type of projects called Stadium model, which is, yeah, a lot of people use them and like nobody wants to contribute to them.I'm curious about, is it gonna be a lot of noise added by a lot of these agents if we let them run on any repo that is open source? Like what are the contributing guidelines for like humans versus agents? I don't have any of the answers, but like some of the questions that I've been thinking about.Itamar: Okay. So I wanna repeat your question and make sure I understand you, but like, if they're agents, for example, dedicated for improving code, why can't we run them on, mm-hmm.Run them on like a full repository in, in fixing that? The situation right now is that I don't think that right now Auto GPT would be able to do that for you. Codium AI might but it's not open sourced right now. And and like you can see like in the months or two, you will be able to like running really quickly like development velocity, like our motto is moving fast with confidence by the way.So we try to like release like every day or so, three times even a day in the backend, et cetera. And we'll develop more feature, enable you, for example, to run an entire re, but, but it's not open source. So about the open source I think like AutoGPT or LangChain, you can't really like ask please improve my repository, make it better.I don't think it will work right now because because let me like. Softly quote Ilya from Open AI. He said, like right now, let's say that a certain LLM is 95% accurate. Now you're, you're concatenating the results. So the accuracy is one point like it's, it's decaying. And what you need is like more engineering frameworks and work to be done there in order to be able to deal with inaccuracies, et cetera.And that's what we specialize in Codium, but I wanna say that I'm not saying that Auto GPT won't be able to get there. Like the more tools and that going to be added, the [00:52:30] more prompt engineering that is dedicated for this, this idea will be added by the way, where I'm talking with Toran, that Codium, for example, would be one of the agents for Auto GPT.Think about it AutoGPT is not, is there for any goal, like increase my net worth, though not focused as us on fixing or improving code. We might be another agent, by the way. We might also be, we're working on it as a plugin for ChatGPT. We're actually almost finished with it. So that's like I think how it's gonna be done.Again, open opensource, not something we're thinking about. We wanted to be really good before weSwyx: opensource it. That was all very impressive. Your vision is actually very encouraging as well, and I, I'm very excited to try it out myself. I'm just curious on the Israel side of things, right? Like you, you're visiting San Francisco for a two week trip for this special program you can tell us about. But also I think a lot of American developers have heard that, you know, Israel has a really good tech scene. Mostly it's just security startups. You know, I did some, I was in some special unit in the I D F and like, you know, I come out and like, I'm doing the same thing again, but like, you know, for enterprises but maybe just something like, describe for, for the rest of the world.It's like, What is the Israeli tech scene like? What is this program that you're on and what shouldItamar: people know? So I think like Israel is the most condensed startup per capita. I think we're number one really? Or, or startup pair square meter. I think, I think we're number one as well because of these properties actually there is a very strong community and like everyone are around, like are [00:57:00] working in a.An entrepreneur or working in a startup. And when you go to the bar or the coffee, you hear if it's 20, 21, people talking about secondary, if it's 2023 talking about like how amazing Geni is, but everyone are like whatever are around you are like in, in the scene. And, and that's like a lot of networking and data propagation, I think.Somehow similar here to, to the Bay Area in San Francisco that it helps, right. So I think that's one of our strong points. You mentioned some others. I'm not saying that it doesn't help. Yes. And being in the like idf, the army, that age of 19, you go and start dealing with technology like very advanced one, that, that helps a lot.And then going back to the community, there's this community like is all over the world. And for example, there is this program called Icon. It's basically Israelis and in the Valley created a program for Israelis from, from Israel to come and it's called Silicon Valley 1 0 1 to learn what's going on here.Because with all the respect to the tech scene in Israel here, it's the, the real thing, right? So, so it's an non-profit organization by Israelis that moved here, that brings you and, and then brings people from a 16 D or, or Google or Navon or like. Amazing people from unicorns or, or up and coming startup or accelerator, and give you up-to-date talks and, and also connect you to relevant people.And that's, that's why I'm here in addition to to, you know, to [00:58:30] me and, and participate in this amazing podcast, et cetera.Swyx: Yeah. Oh, well, I, I think, I think there's a lot of exciting tech talent, you know, in, in Tel Aviv, and I, I'm, I'm glad that your offer is Israeli.Itamar: I, I think one of thing I wanted to say, like yeah, of course, that because of what, what what we said security is, is a very strong scene, but a actually water purification agriculture attack, there's a awful other things like usually it's come from necessity.Yeah. Like, we have big part of our company of our state is like a desert. So there's, there's other things like ai by the way is, is, is big also in Israel. Like, for example, I think there's an Israeli competitor to open ai. I'm not saying like it's as big, but it's ai 21, I think out of 10.Yeah. Out. Oh yeah. 21. Is this really? Yeah. Out of 10 like most, mm-hmm. Profound research labs. Research lab is, for example, I, I love, I love their. Yeah. Yeah.Swyx: I, I think we should try to talk to one of them. But yeah, when you and I met, we connected a little bit Singapore, you know, I was in the Singapore Army and Israeli army.We do have a lot of connections between countries and small countries that don't have a lot of natural resources that have to make due in the world by figuring out some other services. I think the Singapore startup scene has not done as well as the Israeli startup scene. So I'm very interested in, in how small, small countries can have a world impact essentially.Itamar: It's a question we're being asked a lot, like why, for example, let's go to the soft skills. I think like failing is a bad thing. Yeah. Like, okay. Like sometimes like VCs prefer to [01:00:00] put money on a, on an entrepreneur that failed in his first startup and actually succeeded because now that person is knowledgeable, what it mean to be, to fail and very hungry to, to succeed.So I think like generally, like there's a few reason I think it's hard to put the finger exactly, but we talked about a few things. But one other thing I think like failing is not like, this is my fourth company. I did one as, it wasn't a startup, it was a company as a teenager. And then I had like my first startup, my second company that like, had a amazing run, but then very beautiful collapse.And then like my third company, my second startup eventually exit successfully to, to Alibaba. So, so like, I think like it's there, there are a lot of trial and error, which is being appreciated, not like suppressed. I guess like that's one of the reason,Alessio: wanna jump into lightning round?Swyx: Yes. I think we send you into prep, but there's just three questions now.We've, we've actually reduced it quite a bit, but you have it,Alessio: so, and we can read them that you can take time and answer. You don't have to right away. First question, what is a already appin in AI that Utah would take much longer than an sItamar: Okay, so I have to, I hope it doesn't sound like arrogant, but I started coding AI BC before chatty.Mm-hmm. And, and I was like going to like VCs and V P R and D is director, et cetera, and telling them, listen, we're gonna help with code logic testing and we're going to do that interactive conversation way. And they were like, no way. I even had like two saying, I won't let your silly AI get close to my code.[01:01:30]That was bc ac. It's really different. And so like we kind of saw like it. Like if you played with G P T three, especially three and a half, whatever, like you felt working really well with instruction and conversation. So having said that, I think like still like Open Eye did amazing job, like building the product, like of course building the model, but that's forgiven.Like they're the leaders, but did an amazing job building the product that's as accessible. And I think that was maybe a bit surprising. Like I think like many tried to do a chatbot or so with these GPTs, but they, since they're. Developing these, these models, they probably felt, and I think that's what happened, that it's not being used correctly.So I think like the fact that they built actually the product, so well, that was maybe surprising for me. Again, I hope it doesn't sound too arrogant, but I I don't feel like there was a step function here. We might reach your point, but that's like, as we said, a different episode at inflection point and things were gonna be really surprisingSwyx: when the agents take over exploration.So what do you think is the most interesting unsolved question in, in ai? Like, what would you re, what's an open question that you think, man, somebody should solve that?Itamar: Okay, so here I am going to go to the Yes obvious answer. That's a AI alignment. Mm-hmm. Like, it's, it's a technical question. It's it's a philosophy question, et cetera.It's, it's, it's not easy. Like it raises so many question even about ourself [01:03:00] as as human or we, like, I saw one tweet by someone that I'm thinking about like for a few years he wrote are we actually like LLMs, like in essence? So, so I think like we're trying to look into those LLMs for years. Like there, there was, like in 2014 there was already in the C N N, there was a few works.Trying to visualize what, what are the, the feature detection, the feature, like what are the feature with the hidden layers that you see, like we're trying to work on it for years, lately, like a really long time ago, like five years, days ago or so, like, we saw work by open ai, like trying to turn, look on on different parts of Dell LM and trying to provide a natural language description for them.So I think like this is very important. Very interesting tech-wise, philosophy wise, et cetera, that that's like, I think need to be explored more. And just one takeawayAlessio: for all the listeners, like what's one message you want everyone to remember about ai? I, IItamar: would say, again, something might be a bit obvious, but I think right now what's happening is that we're actually true to this month's overestimating what gen AI can do overestimating, but we're underestimating what it can do in the future.Okay. So why am I saying that? Because if you're a builder, I really encourage you, speak less and do more play with it. Try it for specific use cases and see what's easy to do. And then if your purpose is just like incorporating stuff and that's what you wanna do and [01:04:30] then do it, but don't like, tell everyone you're gonna do it before you do it, because you might find that it's actually really hard and there's a lot of problems.It works amazing. Like it wowed you for two examples, but then for eight other examples that like works crappy data. I want, if you're building, you wanna build a startup. So find that case where you believe that you can think about a solution around LLMs or what it's going to be in in one or two years because you want to, what?You wanna try to predict that and what's a challenging around it and do that through trying, trying, trying. Like for example, if you're really excited about auto G P T. Try to find five different cases that you, you managed to make it work for. Again, you might find you can't. I'm, I think that it's, it will do a lot and I think it was good that somebody brought these frameworks and now will try to jump, will progress with the levels that I talked about before.So that, that's my like really like. If you think of idea first, try it. It's like easier than ever. Like there are so many, so many tools to, to try like, and that's one of the things that brought us to coding large language model as is do not work for verifying code logic. But we think there's, we see the path, how to combine with other technical elements and how AI's going to evolve that we can actually bring to fruition this, this idea, this notion of the dry concept that I mentioned.Well,Alessio: Edmar, thank you so much for coming on. This was great.Itamar: Thank you for inviting me. It was a pleasure.[01:06:00] Get full access to Latent Space at www.latent.space/subscribe
01:02:3625/05/2023
MPT-7B and The Beginning of Context=Infinity — with Jonathan Frankle and Abhinav Venigalla of MosaicML
We are excited to be the first podcast in the world to release an in-depth interview on the new SOTA in commercially licensed open source models - MosiacML MPT-7B!The Latent Space crew will be at the NYC Lux AI Summit next week, and have two meetups in June. As usual, all events are on the Community page! We are also inviting beta testers for the upcoming AI for Engineers course. See you soon!One of GPT3’s biggest limitations is context length - you can only send it up to 4000 tokens (3k words, 6 pages) before it throws a hard error, requiring you to bring in LangChain and other retrieval techniques to process long documents and prompts. But MosaicML recently open sourced MPT-7B, the newest addition to their Foundation Series, with context length going up to 84,000 tokens (63k words, 126 pages):This transformer model, trained from scratch on 1 trillion tokens of text and code (compared to 300B for Pythia and OpenLLaMA, and 800B for StableLM), matches the quality of LLaMA-7B. It was trained on the MosaicML platform in 9.5 days on 440 GPUs with no human intervention, costing approximately $200,000. Unlike many open models, MPT-7B is licensed for commercial use and it’s optimized for fast training and inference through FlashAttention and FasterTransformer.They also released 3 finetuned models starting from the base MPT-7B: * MPT-7B-Instruct: finetuned on dolly_hhrlhf, a dataset built on top of dolly-5k (see our Dolly episode for more details). * MPT-7B-Chat: finetuned on the ShareGPT-Vicuna, HC3, Alpaca, Helpful and Harmless, and Evol-Instruct datasets.* MPT-7B-StoryWriter-65k+: it was finetuned with a context length of 65k tokens on a filtered fiction subset of the books3 dataset. While 65k is the advertised size, the team has gotten up to 84k tokens in response when running on a single node A100-80GB GPUs. ALiBi is the dark magic that makes this possible. Turns out The Great Gatsby is only about 68k tokens, so the team used the model to create new epilogues for it!On top of the model checkpoints, the team also open-sourced the entire codebase for pretraining, finetuning, and evaluating MPT via their new MosaicML LLM Foundry. The table we showed above was created using LLM Foundry in-context-learning eval framework itself!In this episode, we chatted with the leads of MPT-7B at Mosaic: Jonathan Frankle, Chief Scientist, and Abhinav Venigalla, Research Scientist who spearheaded the MPT-7B training run. We talked about some of the innovations they’ve brought into the training process to remove the need for 2am on-call PagerDutys, why the LLM dataset mix is such an important yet dark art, and why some of the traditional multiple-choice benchmarks might not be very helpful for the type of technology we are building.Show Notes* Introducing MPT-7B* Cerebras* Lottery Ticket Hypothesis* Hazy Research* ALiBi* Flash Attention* FasterTransformer* List of naughty words for C4 https://twitter.com/code_star/status/1661386844250963972* What is Sparsity?* Hungry Hungry Hippos* BF16 FPp.s. yes, MPT-7B really is codenamed LLongboi!Timestamps* Introductions [00:00:00]* Intro to Mosaic [00:03:20]* Training and Creating the Models [00:05:45]* Data Choices and the Importance of Repetition [00:08:45]* The Central Question: What Mix of Data Sets Should You Use? [00:10:00]* Evaluation Challenges of LLMs [0:13:00]* Flash Attention [00:16:00]* Fine-tuning for Creativity [00:19:50]* Open Source Licenses and Ethical Considerations [00:23:00]* Training Stability Enhancement [00:25:15]* Data Readiness & Training Preparation [00:30:00]* Dynamic Real-time Model Evaluation [00:34:00]* Open Science for Affordable AI Research [00:36:00]* The Open Approach [00:40:15]* The Future of Mosaic [00:44:11]* Speed and Efficiency [00:48:01]* Trends and Transformers [00:54:00]* Lightning Round and Closing [1:00:55]TranscriptAlessio: [00:00:00] Hey everyone. Welcome to the Latent Space podcast. This is Alessio partner and CTO-in-Residence at Decibel Partners. I'm joined by my co-host, Swyx, writer and editor of Latent Space.Swyx: Hey, and today we have Jonathan and Abhi from Mosaic ML. Welcome to our studio.Jonathan: Guys thank you so much for having us. Thanks so much.Swyx: How's it feel?Jonathan: Honestly, I've been doing a lot of podcasts during the pandemic, and it has not been the same.Swyx: No, not the same actually. So you have on your bio that you're primarily based in Boston,Jonathan: New York. New York, yeah. My Twitter bio was a probability distribution over locations.Swyx: Exactly, exactly. So I DMd you because I was obviously very interested in MPT-7B and DMd you, I was like, for the 0.2% of the time that you're in San Francisco, can you come please come to a podcast studio and you're like, I'm there next week.Jonathan: Yeah, it worked out perfectly. Swyx: We're really lucky to have you, I'll read off a few intros that people should know about you and then you can fill in the blanks.So Jonathan, you did your BS and MS at Princeton in programming languages and then found your way into ML for your PhD at MiT where you made a real splash with the lottery ticket hypothesis in 2018, which people can check up on. I think you've done a few podcasts about it over the years, which has been highly influential, and we'll talk about sparse models at Mosaic. You have also had some side [00:01:30] quest. You taught programming for lawyers and you did some law and privacy stuff in, in DC and also did some cryptography stuff. Um, and you've been an assistant professor at Harvard before earning your PhD.Jonathan: I've yet to start.Swyx: You, you yet to start. Okay. But you just got your PhD.Jonathan:. I technically just got my PhD. I was at Mosaic which delayed my defense by about two years. It was, I was at 99% done for two years. Got the job at Harvard, Mosaic started, and I had better things to do than write my dissertation for two years. Swyx: You know, you know, this is very out of order.Jonathan: Like, oh, completely out of order, completely backwards. Go talk to my advisor about that. He's also an advisor at Mosaic and has been from the beginning. And, you know, go talk to him about finishing on time.Swyx: Great, great, great. And just to fill it out, Abhi, you did your BS and MS and MIT, you were a researcher at Cerebras, and you're now a research scientist at Mosaic. Just before we go into Mosaic stuff, I'm actually very curious about Cereus and, uh, just that, that space in general. Um, what are they doing that people should know about?Abhinav: Yeah, absolutely. Um, I think the biggest thing about CEREUS is that they're really building, you know, kind of the NextGen computing platform beyond, like GPUs.Um, they're trying to build a system that uses an entire wafer, you know, rather than cutting up a wafer into smaller chips and trying to train a model on that entire system, or actually more recently on many such wafers. Um, so it's, and it's really extraordinary. I think it's like the first time ever that kind of wafer scale computing has ever really worked. And so it's a really exciting time to be there, trying to figure out how we can map ML workloads to work, um, on a much, much bigger chip.Swyx: And do you use like [00:03:00] a different programming language or framework to do that? Or is that like..Abhinav: Yeah, so I mean, things have changed a bit since I was there.I think, um, you can actually run just normal tensor flow and pie torch on there. Um, so they've built a kind of software stack that compiles it down. So it actually just kind of works naturally. But yeah.Jonathan : Compiled versions of Python is a hot topic at the moment with Mojo as well. Swyx: And then Mosaic, you, you spearheaded the MPT-7B effort.INTRO TO MOSAIC [00:03:20]Abhinav: Uh, yeah. Yeah, so it's kind of like, it's been maybe six months, 12 months in the making. We kind of started working on LMs sort of back in the summer of last year. Um, and then we came with this blog post where we kind of profiled a lot of LMs and saw, hey, the cost of training is actually a lot lower than what people might think.Um, and then since then, you know, being inspired by kind of, you know, meta’s release, so the LLaMA models and lots of other open source work, we kind of started working towards, well, what if we were to release a really good kind of 7 billion parameter model? And that's what MPT is. Alessio:You know, we mentioned some of the podcasts you had done, Jonathan, I think in one of them you mentioned Mosaic was not planning on building a model and releasing and obviously you eventually did. So what are some of the things that got you there that maybe obviously LLaMA you mentioned was an inspiration. You now have both the training and like inference products that you offer. Was this more of a research challenge in a way, uh, that you wanted to do?Or how did the idea come to be?Jonathan: I think there were a couple of things. So we still don't have a first class model. We're not an open AI where, you know, our businesses come to use our one great model. Our business is built around customers creating their own models. But at the end of the day, if customers are gonna create their own models, we have to have the tools to help them do that, and to have the tools to help them do that and know that they work we have to create our own models to start. We have to know that we can do something great if customers are gonna do something great. And one too many people may have challenged me on Twitter about the fact that, you know, mosaic claims all these amazing numbers, but, you know, I believe not to, you know, call out Ross Whiteman here, but, you know, I believe he said at some point, you know, show us the pudding.Um, and so Ross, you know, please let me know how the pudding tastes. But in all seriousness, like I think there is something, this is a demo in some sense. This is to say we did this in 9.5 days for a really reasonable cost, straight through 200, an intervention. 200 K. Yep. Um, you can do this too.Swyx: Uh, and just to reference the numbers that you're putting out, this is the, the last year you were making a lot of noise for trading GPT 3 under 450 K, which is your, your initial estimate.Um, and then it went down to a 100 K and stable diffusion 160 k going down to less than 50 K as well.Jonathan: So I will be careful about that 100 K number. That's certainly the challenge I've given Abhi to hit. Oh, I wouldn't make the promise that we’ve hit yet, but you know, it's certainly a target that we have.And I, you know, Abhi may kill me for saying this. I don't think it's crazy. TRAINING AND CREATING THE MODELS [00:05:45] Swyx: So we definitely want to get into like estimation math, right? Like what, what needs to happen for those big order magnitude changes to in, in infrastructure costs. But, uh, let's kind of stick to the MPT-7B story. Yeah. Tell us everything.Like you have, uh, three different models. One of them. State of the art essentially on context length. Let's talk about the process of training them, the, uh, the decisions that you made. Um, I can go into, you know, individual details, but I just wanna let you let you rip.Abhinav: Yeah, so I mean, I think, uh, we started off with the base model, which is kind of for all practical purposes, a recreation of LLaMA 7B.Um, so it's a 7 billion perimeter model trained on the trillion tokens. Um, and our goal was like, you know, we should do it efficiently. We should be able to do it like, kind of hands free so we don't have to babysit the runs as they're doing them. And it could be kind of a, a launching point for these fine tune models and those fine tune models, you know, on, on the one hand they're kind of really fun for the community, like the story writer model, which has like a 65,000 length context window and you can even kind of extrapolate beyond that. Um, but they're, they're also kind of just tr inspirations really. So you could kind of start with an MPT-7B base and then build your own custom, you know, downstream. If you want a long context code model, you could do that with our platform. If you wanted one that was for a particular language, you could do that too.But yeah, so we picked kind of the three variance chat and instruct and story writer just kind of like inspirations looking at what people were doing in the community today. Yeah. Alessio: And what's the beginning of the math to come up with? You know, how many tokens you wanna turn it on? How many parameters do you want in a bottle? 7 billion and 30 billion seem to be kind of like two of the magic numbers going around right now. Abhinav: Yeah, definitely. Definitely. Yeah, I think like there's sort of these scaling laws which kind of tell you how to best spend your training compute if that's all you cared about. So if you wanna spend $200,000 exactly in the most efficient way, there'd be a recipe for doing that.Um, and that we usually go by the Chinchilla laws. Now for these models, we actually didn't quite do that because we wanted to make sure that people could actually run these at home and that they [00:07:30] were good for inference. So we trained them kind of beyond those chinchilla points so that we're almost over-training them.I think there's like a joke going on online that they're like long boy and that that came up internally because we were training them for really, really long durations. So that 7B model, the chinchilla point might be 140 billion tokens. Instead, we trained a trillion, so almost seven times longer than you normally would.Swyx: So longboi was the code name. So is it, is it the trading method? Is it the scaling law that you're trying to coin or is it the code name for the 64 billion?Jonathan: Uh, 64. It was just an internal joke for the, for training on way more tokens than you would via chinchilla. Okay. Um, we can coin it long boy and it, it really stuck, but just to, you know, long boys filled with two ELs at the beginning.Yeah. Cause you know, we wanted the lLLaMA thing in there as well. Jonathan: Yeah, yeah, yeah. Our darn CEO we have to rein him in that guy, you know, you can't, yeah. I'm gonna take away his Twitter password at some point. Um, but you know, he had to let that one out publicly. And then I believe there was a YouTube video where someone happened to see it mentioned before the model came out and called it the Long G boy or something like that.Like, so you know, now it's out there in the world. It's out there. It's like Sydnee can't put it back inSwyx: There's a beautiful picture which I think Naveen tweeted out, which, um, shows a long boy on a whiteboard.Jonathan: That was the origin of Long Boy. In fact, the legs of the lLLaMA were the two Ls and the long boy.DATA CHOICES AND THE IMPORTANCE OF REPETITION [00:08:45]Swyx: Well, talk to me about your data choices, right? Like this is your passion project. Like what can you tell us about it?Jonathan: Yeah, I think Abhi wanted to kill me by the end for trying to use all the GPUs on data and none of them on actually training the model. Um, at the end of the day, We know that you need to train these models and [00:09:00] lots of data, but there are a bunch of things we don't know.Number one is what kinds of different data sources matter. The other is how much does repetition really matter? And really kind of repetition can be broken down into how much does quality versus quantity matter. Suppose I had the world's best 10 billion tokens of data. Would it be better to train on that a hundred times or better to train on a trillion tokens of low quality, fresh data?And obviously there's, there's a middle point in between. That's probably the sweet spot. But how do you even know what good quality data is? And. So, yeah, this is, nobody knows, and I think the more time I spent, we have a whole data team, so me and several other people, the more time that we spent on this, you know, I came away thinking, gosh, we know nothing.Gosh, if I were back in academia right now, I would definitely go and, you know, write a paper about this because I have no idea what's going on.Swyx: You would write a paper about it. I'm interested in such a paper. I haven't come across any that exists. Could you frame the central question of such a paper?THE CENTRAL QUESTION: WHAT MIX OF DATA SETS SHOULD YOU USE? [00:10:00]Jonathan: Yeah. The central question is what mix of data sets should you use? Okay. Actually I've, you know, you had mentioned my law school stuff. I went back to Georgetown Law where I used to teach, um, in the midst of creating this model, and I actually sat down with a class of law students and asked them, I gave them our exact data sets, our data mixes, um, like how many tokens we had, and I said, Create the best data set for your model.Knowing they knew nothing about large language models, they just know that data goes in and it's going to affect the behavior. Um, and I was like, create a mix and they basically covered all the different trade-offs. Um, you probably want a lot of English language [00:10:30] text to start with. You get that from the web, but do you want it to be multilingual?If so, you're gonna have a lot less English text. Maybe it'll be worse. Do you wanna have code in there? There are all these beliefs that code leads to models being better at logical reasoning, of which I've seen zero evidence. Rep. It's not, um, I mean, really made a great code model, but code models leading to better chain of thought reasoning on the part of language or code being in the training set leading to better chain of thought reasoning.People claim this all the time, but I've still never seen any real evidence beyond that. You know, one of the generations of the GPT three model started supposedly from Code Da Vinci. Yes. And so there's a belief that, you know, maybe that helped. But again, no evidence. You know, there's a belief that spending a lot of time on good sources like Wikipedia is good for the model.Again, no evidence. At the end of the day, we tried a bunch of different data mixes and the answer was that there are some that are better or worse than others. We did find that the pile, for example, was a really solid data mix, but you know, there were stronger data mixes by our evaluation metrics. And I'll get back to the evaluation question in a minute cuz that's a really important one.This data set called c4, which is what the original T five model was trained on, is weirdly good. And everybody, when I posted on this on Twitter, like Stella Beaterman from Luther mentioned this, I think someone else mentioned this as well. C4 does really well in the metrics and we have no idea why we de-duplicated it against our evaluation set.So it's not like it memorized the data, it is just one web scrape from 2019. If you actually look at the T five paper and see how it was pre-processed, it looks very silly. Mm-hmm. They removed anything that had the word JavaScript in it because they didn't want to get like no JavaScript [00:12:00] warnings. They removed anything with curly braces cuz they didn't wanna get JavaScript in it.They looked at this list of bad words, um, and removed anything that had those bad words. If you actually look at the list of bad words, words like gay are on that list. And so there's, you know, it is a very problematic, you know, list of words, but that was the cleaning that leads to a data set that seems to be unbeatable.So that to me says that we know nothing about data. We, in fact used a data set called mc four as well, which is they supposedly did the same pre-processing of C4 just on more web calls. The English portion is much worse than C4 for reasons that completely escape us. So in the midst of all that, Basically I set two criteria.One was I wanted to be at least as good as mc four English, like make sure that we're not making things actively worse. And mc four English is a nice step up over other stuff that's out there. And two was to go all in on diversity after that, making sure that we had some code, we had some scientific papers, we had Wikipedia, because people are gonna use this model for all sorts of different purposes.But I think the most important thing, and I'm guessing abhi had a million opinions on this, is you're only as good as your evaluation. And we don't know how to evaluate models for the kind of generation we ask them to do. So past a certain point, you have to kinda shrug and say, well, my evaluation's not even measuring what I care about.Mm-hmm. So let me just make reasonable choices. EVALUATION CHALLENGES OF LLMs [0:13:00]Swyx: So you're saying MMLU, big bench, that kind of stuff is not. Convincing for youJonathan: A lot of this stuff is you've got two kinds of tasks. Some of these are more of multiple choice style tasks where there is a right answer. Um, either you ask the model to spit out A, B, C, or D or you know, and if you're more [00:13:30] sophisticated, you look at the perplexity of each possible answer and pick the one that the model is most likely to generate.But we don't ask these models to do multiple choice questions. We ask them to do open-ended generation. There are also open-ended generation tasks like summarization. You compare using things like a blue score or a rouge score, which are known to be very bad ways of comparing text. At the end of the day, there are a lot of great summaries of a paper.There are a lot of great ways to do open form generation, and so humans are, to some extent, the gold standard. Humans are very expensive. It turns out we can't put them into our eval pipeline and just have the humans look at our model every, you know, 10 minutes? Not yet. Not yet. Maybe soon. Um, are you volunteering Abhi?Abhinav: I, I, I just know we have a great eval team who's, uh, who's helping us build new metrics. So if they're listening,Jonathan: But it's, you know, evaluation of large language models is incredibly hard and I don't think any of these metrics really truly capture. What we expect from the models in practice.Swyx: Yeah. And we might draw wrong conclusions.There's been a debate recently about the emergence phenomenon, whether or not it's a mirage, right? I don't know if you guys have opinions about that process. Abhinav: Yeah, I think I've seen like this paper and all and all, even just kind of plots from different people where like, well maybe it's just a artifact of power, like log scaling or metrics or, you know, we're meshing accuracy, which is this a very like harsh zero one thing.Yeah. Rather than kind of something more continuous. But yeah, similar to what Jonathan was saying about evals. Like there there's one issue of like you just like our diversity of eval metrics, like when we put these models up, even like the chat ones, the instruct ones, people are using 'em for such a variety of tasks.There's just almost no way we get ahead of time, like measuring individual dimensions. And then also particularly like, you know, at the 7B scale, [00:15:00] um, these models still are not super great yet at the really hard tasks, like some of the hardest tasks in MMLU and stuff. So sometimes they're barely scoring like the above kind of random chance, you know, like on really, really hard tasks.So potentially as we. You know, aim for higher and higher quality models. Some of these things will be more useful to us. But we kind of had to develop MPT 7B kind of flying a little bit blind on, on what we knew it was coming out and just going off of like, you know, a small set of common sensor reasoning tasks.And of course, you know, just comparing, you know, those metrics versus other open source models. Alessio: I think fast training in inference was like one of the goals, right? So there's always the trade off between doing the hardest thing and like. Doing all the other things quickly.Abhinav: Yeah, absolutely. Yeah, I mean, I think like, you know, even at the 7B scale, you know, uh, people are trying to run these things on CPUs at home.You know, people are trying to port these to their phones, basically prioritizing the fact that the small scale would lead to our adoption. That was like a big, um, big thing going on. Alessio: Yeah. and you mentioned, um, flash attention and faster transformer as like two of the core things. Can you maybe explain some of the benefits and maybe why other models don't use it?FLASH ATTENTION [00:16:00]Abhinav: Yeah, absolutely. So flash attention is this basically faster implementation of full attention. Um, it's like a mathematical equivalent developed by like actually some of our collaborators, uh, at Stanford. Uh, the hazy research. Hazy research, yeah, exactly.Jonathan: What is, what, what, what's the name hazy research mean?Abhinav: I actually have no idea.Swyx: I have no clue. All these labs have fun names. I always like the stories behind them.Abhinav: Yeah, absolutely. We really, really liked flash attention. We, I think, had to integrate into repo even as [00:16:30] as early as September of last year. And it really just helps, you know, with training speed and also inference speed and we kind of bake that into model architecture.And this is kind of unique amongst all the other hugging face models you see out there. So ours actually, you can toggle between normal torch attention, which will work anywhere and flash attention, which will work on GPUs right out of the box. And that way I think you get almost like a 2x speed up at training time and somewhere between like 50% to a hundred percent speed up at inference time as well.So again, this is just like, we really, really wanted people to use these and like, feel like an improvement and we, we have the team to, to help deliver that. Swyx: Another part, um, of your choices was alibi position, encodings, which people are very interested in, maybe a lot of people just, uh, to sort of take in, in coatings as, as a given.But there's actually a lot of active research and honestly, it's a lot of, um, it's very opaque as well. Like people don't know how to evaluate encodings, including position encodings, but may, may, could you explain, um, alibi and, um, your choice?Abhinav: Yeah, for sure. The alibi and uh, kind of flash attention thing all kind of goes together in interesting ways.And even with training stability too. What alibi does really is that it eliminates the need to have positional embeddings in your model. Where previously, if you're a token position one, you have a particular embedding that you add, and you can't really go beyond your max position, which usually is like about 2000.With alibies, they get rid of that. Instead, just add a bias to the attention map itself. That's kind of like this slope. And if at inference time you wanna go much, much larger, they just kind of stretch that slope out to a longer, longer number of positions. And because the slope is kind of continuous and you can interpret it, it all works out now.Now one of [00:18:00] the, the funny things we found is like with flash attention, it saved so much memory and like improved performance so much that even as early as I kind of last year, like we were profiling models with, with very long context lines up to like, you know, the 65 k that you seen in release, we just never really got around to using it cuz we didn't really know what we might use it for.And also it's very hard to train stably. So we started experimenting with alibi integration, then we suddenly found that, oh wow, stability improves dramatically and now we can actually work together with alibi in a long context lens. That's how we got to like our story writer model where we can stably train these models out to very, very long context lenses and, and use them performantly.Jonathan: Yeah.Swyx: And it's also why you don't have a firm number. Most people now have a firm number on the context line. Now you're just like, eh, 65 to 85Abhinav: Oh yeah, there's, there's a, there's a big age to be 64 K or 65 k. 65 k plus.Swyx: Just do powers of twos. So 64 isn't, you know. Jonathan: Right, right. Yeah. Yeah. But we could, I mean, technically the context length is infinite.If you give me enough memory, um, you know, we can just keep going forever. We had a debate over what number to say is the longest that we could handle. We picked 84 cakes. It's the longest I expect people to see easily in practice. But, you know, we played around for even longer than that and I don't see why we couldn't go longer.Swyx: Yeah. Um, and so for those who haven't read the blog posts, you put the Great Gatsby in there and, uh, asked it to write an epilogue, which seemed pretty impressive.Jonathan: Yeah. There are a bunch of epilogues floating around internally at Mosaic. Yeah. That wasn't my favorite. I think we all have our own favorites.Yeah. But there are a bunch of really, really good ones. There was one where, you know, it's Gatsby's funeral and then Nick starts talking to Gatsby's Ghost, and Gatsby's father shows up and, you know, then he's [00:19:30] at the police station with Tom. It was very plot heavy, like this is what comes next. And a bunch of that were just very Fitzgerald-esque, like, you know, beautiful writing.Um, but it was cool to just see that Wow, the model seemed to actually be working with. You know, all this input. Yeah, yeah. Like it's, it's exciting. You can think of a lot of things you could do with that kind of context length.FINE-TUNING FOR CREATIVITY [00:19:50]Swyx: Is there a trick to fine tuning for a creative task rather than, um, factual task?Jonathan: I don't know what that is, but probably, yeah, I think, you know, the person, um, Alex who did this, he did fine tune the model explicitly on books. The goal was to try to get a model that was really a story writer. But, you know, beyond that, I'm not entirely sure. Actually, it's a great question. Well, no, I'll ask you back.How would you measure that? Swyx: Uh, God, human feedback is the solve to all things. Um, I think there is a labeling question, right? Uh, in computer vision, we had a really, really good episode with Robo Flow on the segment. Anything model where you, you actually start human feedback on like very, I think it's something like 0.5% of the, the overall, uh, final, uh, uh, labels that you had.But then you sort augment them and then you, you fully automate them, um, which I think could be applied to text. It seems intuitive and probably people like snorkel have already raised ahead on this stuff, but I just haven't seen this applied in the language domain yet.Jonathan: It, I mean there are a lot of things that seem like they make a lot of sense in machine learning that never work and a lot of things that make zero sense that seem to work.So, you know, I've given up trying to even predict. Yeah, yeah. Until I see the data or try it, I just kind shg my shoulders and you know, you hope for the best. Bring data or else, right? Yeah, [00:21:00] exactly. Yeah, yeah, yeah.Alessio: The fine tuning of books. Books three is like one of the big data sets and there was the whole.Twitter thing about trade comments and like, you know, you know, I used to be a community [email protected] and we've run into a lot of things is, well, if you're explaining lyrics, do you have the right to redistribute the lyrics? I know you ended up changing the license on the model from a commercial use Permitted.Swyx: Yeah let's let them. I'm not sure they did. Jonathan: So we flipped it for about a couple hours. Swyx: Um, okay. Can we, can we introduce the story from the start Just for people who are under the loop. Jonathan: Yeah. So I can tell the story very simply. So, you know, the book three data set does contain a lot of books. And it is, you know, as I discovered, um, it is a data set that provokes very strong feelings from a lot of folks.Um, that was one, one guy from one person in particular, in fact. Um, and that's about it. But it turns out one person who wants a lot of attention can, you know, get enough attention that we're talking about it now. And so we had a, we had a discussion internally after that conversation and we talked about flipping the license and, you know, very late at night I thought, you know, maybe it's a good thing to do.And decided, you know, actually probably better to just, you know, Stan Pat's license is still Apache too. And one of the conversations we had was kind of, we hadn't thought about this cuz we had our heads down, but the Hollywood writer Strike took place basically the moment we released the model. Mm-hmm.Um, we were releasing a model that could do AI generated creative content. And that is one of the big sticking points during the strike. Oh, the optics are not good. So the optics aren't good and that's not what we want to convey. This is really, this is a demo of the ability to do really long sequence lengths and.Boy, you know, [00:22:30] that's, that's not timing that we appreciated. And so we talked a lot internally that night about like, oh, we've had time to read the news. We've had time to take a breath. We don't really love this. Came to the conclusion that it's better to just leave it as it is now and learn the lesson for the future.But certainly that was one of my takeaways is this stuff, you know, there's a societal context around this that it's easy to forget when you're in the trenches just trying to get the model to train. And you know, in hindsight, you know, I might've gone with a different thing than a story writer. I might've gone with, you know, coder because we seem to have no problem putting programmers out of work with these models.Swyx: Oh yeah. Please, please, you know, take away this stuff from me.OPEN SOURCE LICENSES AND ETHICAL CONSIDERATIONS [00:23:00]Jonathan: Right. You know, so it's, I think, you know, really. The copyright concerns I leave to the lawyers. Um, that's really, if I learned one thing teaching at a law school, it was that I'm not a lawyer and all this stuff is a little complicated, especially open source licenses were not designed for this kind of world.They were designed for a world of forcing people to be more open, not forcing people to be more closed. And I think, you know, that was part of the impetus here, was to try to use licenses to make things more closed. Um, which is, I think, against the grain of the open source ethos. So that struck me as a little bit strange, but I think the most important part is, you know, we wanna be thoughtful and we wanna do the right thing.And in that case, you know, I hope with all that interesting licensing fund you saw, we're trying to be really thoughtful about this and it's hard. I learned a lot from that experience. Swyx: There’s also, I think, an open question of fair use, right? Is training on words of fair use because you don't have a monopoly on words, but some certain arrangements of words you do.And who is to say how much is memorization by a model versus actually learning and internalizing and then. Sometimes happening to land at the right, the [00:24:00] same result.Jonathan: And if I've learned one lesson, I'm not gonna be the person to answer that question. Right, exactly. And so my position is, you know, we will try to make this stuff open and available.Yeah. And, you know, let the community make decisions about what they are or aren't comfortable using. Um, and at the end of the day, you know, it still strikes me as a little bit weird that someone is trying to use these open source licenses to, you know, to close the ecosystem and not to make things more open.That's very much against the ethos of why these licenses were created.Swyx: So the official mosaic position, I guess is like, before you use TC MPC 7B for anything commercial, check your own lawyers now trust our lawyers, not mosaic’s lawyers.Jonathan: Yeah, okay. Yeah. I'm, you know, our lawyers are not your lawyers.Exactly. And, you know, make the best decision for yourself. We've tried to be respectful of the content creators and, you know, at the end of the day, This is complicated. And this is something that is a new law. It's a new law. It's a new law that hasn't been established yet. Um, but it's a place where we're gonna continue to try to do the right thing.Um, and it's, I think, one of the commenters, you know, I really appreciated this said, you know, well, they're trying to do the right thing, but nobody knows what the right thing is to even do, you know, the, I guess the, the most right thing would've been to literally not release a model at all. But I don't think that would've been the best thing for the community either.Swyx: Cool.Well, thanks. Well handled. Uh, we had to cover it, just causeJonathan: Oh, yes, no worries. A big piece of news. It's been on my mind a lot.TRAINING STABILITY ENHANCEMENT [00:25:15]Swyx: Yeah. Yeah. Well, you've been very thoughtful about it. Okay. So a lot of these other ideas in terms of architecture, flash, attention, alibi, and the other data sets were contributions from the rest of the let's just call it open community of, of machine learning advancements. Uh, but Mosaic in [00:25:30] particular had some stability improvements to mitigate loss spikes, quote unquote, uh, which, uh, I, I took to mean, uh, your existing set of tools, uh, maybe we just co kind of covered that. I don't wanna sort of put words in your mouth, but when you say things like, uh, please enjoy my empty logbook.How much of an oversell is that? How much, you know, how much is that marketing versus how much is that reality?Abhinav: Oh yeah. That, that one's real. Yeah. It's like fully end-to-end. Um, and I think.Swyx: So maybe like what, what specific features of Mosaic malibu?Abhinav: Totally, totally. Yeah. I think I'll break it into two parts.One is like training stability, right? Knowing that your model's gonna basically get to the end of the training without loss spikes. Um, and I think, you know, at the 7B scale, you know, for some models like it ha it's not that big of a deal. As you train for longer and longer durations, we found that it's trickier and trickier to avoid these lost spikes.And so we actually spent a long time figuring out, you know, what can we do about our initialization, about our optimizers, about the architecture that basically prevents these lost spikes. And you know, even in our training run, if you zoom in, you'll see small intermittent spikes, but they recover within a few hundred steps.And so that's kind of the magical bit. Our line is one of defenses we recover from Las Vegas, like just naturally, right? Mm-hmm. Our line two defense was that we used determinism and basically really smart resumption strategies so that if something catastrophic happened, we can resume very quickly, like a few batches before.And apply some of these like, uh, interventions. So we had these kinds of preparations, like a plan B, but we didn't have to use them at all for MPT 7B training. So, that was kind of like a lucky break. And the third part of like basically getting all the way to the empty law book is having the right training infrastructure.[00:27:00]So this is basically what, like is, one of the big selling points of the platform is that when you try to train these models on hundreds of GPUs, not many people outside, you know, like deep industry research owners, but the GPUs fail like a lot. Um, I would say like almost once every thousand a 100 days.So for us on like a big 512 cluster every two days, basically the run will fail. Um, and this is either due to GPUs, like falling off the bus, like that's, that's a real error we see, or kind of networking failures or something like that. And so in those situations, what people have normally done is they'll have an on-call team that's just sitting round the clock, 24-7 on slack, once something goes wrong.And if then they'll basically like to try to inspect the cluster, take nodes out that are broken, restart it, and it's a huge pain. Like we ourselves did this for a few months. And as a result of that, because we're building such a platform, we basically step by step automated every single one of those processes.So now when a run fails, we have this automatic kind of watch talk that's watching. It'll basically stop the job. Test the nodes cord in anyone's that are broken and relaunch it. And because our software's all deterministic has fast resumption stuff, it just continues on gracefully. So within that log you can see sometimes I think maybe at like 2:00 AM or something, the run failed and within a few minutes it's back up and running and all of us are just sleeping peacefully.Jonathan: I do wanna say that was hard one. Mm-hmm. Um, certainly this is not how things were going, you know, many months ago, hardware failures we had on calls who were, you know, getting up at two in the morning to, you know, figure out which node had died for what reason, restart the job, have to cord the node. [00:28:30] Um, we were seeing catastrophic loss spikes really frequently, even at the 7B scale that we're just completely derailing runs.And so this was step by step just ratcheting our way there. As Abhi said, to the point where, Many models are training at the moment and I'm sitting here in the studio and not worrying one bit about whether the runs are gonna continue. Yeah. Swyx: I'm, I'm not so much of a data center hardware kind of guy, but isn't there existing software to do this for CPUs and like, what's different about this domain? Does this question make sense at all?Jonathan: Yeah, so when I think about, like, I think back to all the Google fault tolerance papers I read, you know, as an undergrad or grad student mm-hmm. About, you know, building distributed systems. A lot of it is that, you know, Each CPU is doing, say, an individual unit of work.You've got a database that's distributed across your cluster. You wanna make sure that one CPU failing can't, or one machine failing can't, you know, delete data. So you, you replicate it. You know, you have protocols like Paxos where you're literally, you've got state machines that are replicated with, you know, with leaders and backups and things like that.And in this case, you were performing one giant computation where you cannot afford to lose any node. If you lose a node, you lose model state. If you lose a node, you can't continue. It may be that, that in the future we actually, you know, create new versions of a lot of our distributed training libraries that do have backups and where data is replicated so that if you lose a node, you can detect what node you've lost and just continue training without having to stop the run, you know?Pull from a checkpoint. Yeah. Restart again on different hardware. But for now, we're certainly in a world where if anything dies, that's the end of the run and you have to go back and recover from it. [00:30:00]DATA READINESS & TRAINING PREPARATION [00:30:00]Abhinav: Yeah. Like I think a big part, a big word there is like synchronous data pluralism, right? So like, we're basically saying that on every step, every GP is gonna do some work.They're gonna stay in sync with each other and average their, their gradients and continue. Now that there are algorithmic techniques to get around this, like you could say, oh, if a GP dies, just forget about it. All the data that's gonna see, we'll just forget about it. We're not gonna train on it.But, we don't like to do that currently because, um, it makes us give up determinism, stuff like that. Maybe in the future, as you go to extreme scales, we'll start looking at some of those methods. But at the current time it's like, we want determinism. We wanted to have a run that we could perfectly replicate if we needed to.And it was, the goal is figure out how to run it on a big cluster without humans having to babysit it. Babysit it. Alessio: So as you mentioned, these models are kind of the starting point for a lot of your customers To start, you have a. Inference product. You have a training product. You previously had a composer product that is now kind of not rolled into, but you have like a super set of it, which is like the LLM foundry.How are you seeing that change, you know, like from the usual LOP stack and like how people train things before versus now they're starting from, you know, one of these MPT models and coming from there. Like worship teams think about as they come to you and start their journey.Jonathan: So I think there's a key distinction to make here, which is, you know, when you say starting from MPT models, you can mean two things.One is actually starting from one of our checkpoints, which I think very few of our customers are actually going to do, and one is starting from our configuration. You can look at our friends at Rep for that, where, you know, MPT was in progress when Refl [00:31:30] came to us and said, Hey, we need a 3 billion parameter model by next week on all of our data.We're like, well, here you go. This is what we're doing, and if it's good enough for us, um, hopefully it's good enough for you. And that's basically the message we wanna send to our customers. MPT is basically clearing a path all the way through where they know that they can come bring their data, they can use our training infrastructure, they can use all of our amazing orchestration and other tools that abhi just mentioned, for fault tolerance.They can use Composer, which is, you know, still at the heart of our stack. And then the l l M Foundry is really the specific model configuration. They can come in and they know that thing is gonna train well because we've already done it multiple times. Swyx: Let's dig in a little bit more on what should people have ready before they come talk to you? So data architecture, eval that they're looking, etc.Abhinav: Yeah, I, I mean, I think we'll accept customers at any kind of stage in their pipeline. You know, like I'd say science, there's archetypes of people who have built products around like some of these API companies and reach a stage or maturity level where it's like we want our own custom models now, either for the purpose of reducing cost, right?Like our inference services. Quite a bit cheaper than using APIs or because they want some kind of customization that you can't really get from the other API providers. I'd say the most important things to have before training a big model. You know, you wanna have good eval metrics, you know, some kind of score that you can track as you're training your models and scaling up, they can tell you you're progressing.And it's really funny, like a lot of times customers will be really excited about training the models, right? It's really fun to like launch shelves on hundreds of gfs, just all around. It's super fun. But then they'll be like, but wait, what are we gonna measure? Not just the training loss, right? I mean, it's gotta be more than that.[00:33:00]So eval metrics is like a, it's a good pre-req also, you know, your data, you know, either coming with your own pre-training or fine-tune data and having like a strategy to clean it or we can help clean it too. I think we're, we're building a lot of tooling around that. And I think once you have those two kinds of inputs and sort of the budget that you want, we can pretty much walk you through the rest of it, right?Like that's kind of what we do. Recently we helped build CR FM's model for biomedical language a while back. Jonathan: Um, we can. That's the center of research for foundation models. Abhi: Exactly, exactly.Jonathan: Spelling it out for people. Of course.Abhinav: No, absolutely. Yeah, yeah. No, you've done more of these than I have.Um, I think, uh, basically it's sort of, we can help you figure out what model I should train to scale up so that when I go for my big run company, your here run, it's, uh, it's predictable. You can feel confident that it's gonna work, and you'll kind of know what quality you're gonna get out before you have to spend like a few hundred thousand dollars.DYNAMIC REAL-TIME MODEL EVALUATION [00:34:00]Alessio: The rap Reza from rap was on the podcast last week and, uh, they had human eval and then that, uh, I'm Jon Eval, which is like vibe based. Jonathan: And I, I do think the vibe based eval cannot be, you know, underrated really at the, I mean, at the end of the day we, we did stop our models and do vibe checks and we did, as we monitor our models, one of our evals was we just had a bunch of prompts and we would watch the answers as the model trained and see if they changed cuz honestly, You know, I don't really believe in any of these eval metrics to capture what we care about.Mm-hmm. But when you ask it, uh, you know, I don't know. I think one of our prompts was to suggest games for a three-year-old and a seven-year-old. That would be fun to play. Like that was a lot more [00:34:30] valuable to me personally, to see how that answer evolved and changed over the course of training. So, you know, and human eval, just to clarify for folks, human human eval is an automated evaluation metric.There's no humans in it at all. There's no humans in it at all. It's really badly named. I got so confused the first time that someone brought that to me and I was like, no, we're not bringing humans in. It's like, no, it's, it's automated. They just called it a bad name and there's only a hundred cents on it or something.Abhinav: Yeah. Yeah. And, and it's for code specifically, right?Jonathan: Yeah. Yeah. It's very weird. It's a, it's a weird, confusing name that I hate, but you know, when other metrics are called hella swag, like, you know, you do it, just gotta roll with it at this point. Swyx: You're doing live evals now. So one, one of the tweets that I saw from you was that it is, uh, important that you do it paralyzed.Uh, maybe you kind of wanna explain, uh, what, what you guys did.Abhinav: Yeah, for sure. So with LLM Foundry, there's many pieces to it. There's obviously the core training piece, but there's also, you know, tools for evaluation of models. And we've kind of had one of the, I think it's like the, the fastest like evaluation framework.Um, basically it's multi GPU compatible. It runs with Composer, it can support really, really big models. So basically our framework runs so fast that even Azure models are training. We can run these metrics live during the training. So like if you have a dashboard like weights and biases, you kind of watch all these evil metrics.We have, like, 15 or 20 of them honestly, that we track during the run and add negligible overhead. So we can actually watch as our models go and feel confident. Like, it's not like we wait until the very last day to, to test if the models good or notJonathan: That's amazing. Yeah. I love that we've gotten this far into the conversation.We still haven't talked about efficiency and speed. Those are usually our two watch words at Mosaic, which is, you know, that's great. That says that we're [00:36:00] doing a lot of other cool stuff, but at the end of the day, um, you know, Cost comes first. If you can't afford it, it doesn't matter. And so, you know, getting things down cheap enough that, you know, we can monitor in real time, getting things down cheap enough that we can even do it in the first place.That's the basis for everything we do.OPEN SCIENCE FOR AFFORDABLE AI RESEARCH [00:36:00]Alessio: Do you think a lot of the questions that we have around, you know, what data sets we should use and things like that are just because training was so expensive before that, we just haven't run enough experiments to figure that out. And is that one of your goals is trying to make it cheaper so that we can actually get the answers?Jonathan: Yeah, that's a big part of my personal conviction for being here. I think I'm, I'm still in my heart, the second year grad student who was jealous of all his friends who had GPUs and he didn't, and I couldn't train any models except in my laptop. And that, I mean, the lottery ticket experiments began on my laptop that I had to beg for one K 80 so that I could run amist.And I'm still that person deep down in my heart. And I'm a believer that, you know, if we wanna do science and really understand these systems and understand how to make them work well, understand how they behave, understand what makes them safe and reliable. We need to make it cheap enough that we can actually do science, and science involves running dozens of experiments.When I finally, you know, cleaned out my g c s bucket from my PhD, I deleted a million model checkpoints. I'm not kidding. There were over a million model checkpoints. That is the kind of science we need, you know, that's just what it takes. In the same way that if you're in a biology lab, you don't just grow one cell and say like, eh, the drug seems to work on that cell.Like, there's a lot more science you have to do before you really know.Abhinav: Yeah. And I think one of the special things about Mosaic's kind of [00:37:30] position as well is that we have such, so many customers all trying to train models that basically we have the incentive to like to devote all these resources and time to do this science.Because when we learn which pieces actually work, which ones don't, we get to help many, many people, right? And so that kind of aggregation process I think is really important for us. I remember way back there was a paper about Google that basically would investigate batch sizes or something like that.And it was this paper that must have cost a few million dollars during all the experience. And it was just like, wow, what a, what a benefit to the whole community. Now, like now we all get to learn from that and we get, we get to save. We don't have to spend those millions of dollars anymore. So I think, um, kind of mosaical science, like the insights we get on, on data, on pre-screening architecture, on all these different things, um, that's why customers come to us.Swyx: Yeah, you guys did some really good stuff on PubMed, G B T as well. That's the first time I heard of you. Of you. And that's also published to the community.Abhinav: Yeah, that one was really fun. We were like, well, no one's really trained, like fully from scratch domain specific models before. Like, what if we just did a biomed one?Would it still work? And, uh, yeah, I'd be really excited. That did, um, we'll probably have some follow up soon, I think, later this summer.Jonathan: Yeah. Yes. Stay tuned on that. Um, but I, I will say just in general, it's a really important value for us to be open in some sense. We have no incentive not to be open. You know, we make our money off of helping people train better.There's no cost to us in sharing what we learn with the community. Cuz really at the end of the day, we make our money off of those custom models and great infrastructure and, and putting all the pieces together. That's honestly where the Mosaic name came from. Not off of like, oh, we've got, you know, this one cool secret trick [00:39:00] that we won't tell you, or, you know, closing up.I sometimes, you know, in the past couple weeks I've talked to my friends at places like Brain or, you know, what used to be Brain Now Google DeepMind. Oh, I R I P Brain. Yeah. R i p Brian. I spent a lot of time there and it was really a formative time for me. Um, so I miss it, but. You know, I kind of feel like we're one of the biggest open research labs left in industry, which is a very sad state of affairs because we're not very big.Um, but at least can you say how big the team is actually? Yeah. We were about 15 researchers, so we're, we're tiny compared to, you know, the huge army of researchers I remember at Brain or at fair, at Deep Mind back, you know, when I was there during their heydays. Um, you know, but everybody else is kind of, you know, closed up and isn't saying very much anymore.Yeah. And we're gonna keep talking and we're gonna keep sharing and, you know, we will try to be that vanguard to the best of our ability. We're very small and I, I can't promise we're gonna do what those labs used to do in terms of scale or quantity of research, but we will share what we learn and we will try to create resources for the community.Um, I, I dunno, I just, I believe in openness fundamentally. I'm an academic at heart and it's sad to me to watch that go away from a lot of the big labs. THE OPEN APPROACH [00:40:15]Alessio: We just had a live pod about the, you know, open AI snow mode, uh, post that came out and it was one of the first time I really dove into Laura and some of the this new technologies, like how are you thinking about what it's gonna take for like the open approach to really work?Obviously today, GPT four is still, you know, part of like that state-of-the-art model for a [00:40:30] lot of tasks. Do you think some of the innovation and kind of returning methods that we have today are enough if enough people like you guys are like running these, these research groups that are open? Or do you think we still need a step function improvement there?Jonathan: I think one important point here is the idea of coexistence. I think when you look at, I don't know who won Linux or Windows, the answer is yes. Microsoft bought GitHub and has a Windows subsystem for Linux. Linux runs a huge number of our servers and Microsoft is still a wildly profitable company.Probably the most successful tech company right now. So who won open source or closed source? Yes. Um, and I think that's a similar world that we're gonna be in here where, you know, it's gonna be different things for different purposes. I would not run Linux on my laptop personally cuz I like connecting to wifi and printing things.But I wouldn't run Windows on one of my surfers. And so I do think what we're seeing with a lot of our customers is, do they choose opening IR mosaic? Yes. There's a purpose for each of these. You have to send your data off to somebody else with open eyes models. That's a risk. GPT four is amazing and I would never promise someone that if they come to Mosaic, they're gonna get a GPT four quality model.That's way beyond our means and not what we're trying to do anyway. But there's also a whole world for, you know, domain specific models, context specific models that are really specialized, proprietary, trained on your own data that can do things that you could never do with one of these big models. You can customize in crazy ways like G B T four is not gonna hit 65 K context length for a very long time, cuz they've already trained that [00:42:00] model and you know, they haven't even released the 32 K version yet.So we can, you know, we can do things differently, you know, by being flexible. So I think the answer to all this is yes. But we can't see the open source ecosystem disappear. And that's the scariest thing for me. I hear a lot of talk in academia about, you know, whatever happened to that academic research on this field called information retrieval?Well, in 1999 it disappeared. Why? Because Google came along and who cares about information retrieval research when you know you have a Google Scale, you know, Web Scale database. So you know, there's a balance here. We need to have both. Swyx: I wanna applaud you, Elaine. We'll maybe edit it a little like crowd applause, uh, line.Cuz I, I think that, um, that is something that as a research community, as people interested in progress, we need to see these things instead of just, uh, seeing marketing papers from the advertising GPT 4.Jonathan: Yeah. I, I think I, you know, to get on my soapbox for 10 more seconds. Go ahead. When I talk to policymakers about, you know, the AI ecosystem, the usual fear that I bring up is, Innovation will slow because of lack of openness.I've been complaining about this for years and it's finally happened. Hmm. Why is Google sharing, you know, these papers? Why is Open AI sharing these papers? There are a lot of reasons. You know, I have my own beliefs, but it's not something we should take for granted that everybody's sharing the work that they do and it turns out well, I think we took it for granted for a while and now it's gone.I think it's gonna slow down the pace of progress. In a lot of cases, each of these labs has a bit of a monoculture and being able to pass ideas [00:43:30] back and forth was a lot of what kept, you know, scientific progress moving. So it's imperative not just, you know, for the open source community and for academia, but for the progress of technology.That we have a vibrant open source research community.THE FUTURE OF MOSAIC [00:44:11]Swyx: There’s a preview of the ecosystem and commentary that we're, we're gonna do. But I wanna close out some stuff on Mosaic. You launched a bunch of stuff this month. A lot of stuff, uh, actually was, I was listening to you on Gradient descent, uh, and other podcasts we know and love.Uh, and you said you also said you were not gonna do inference and, and, and last week you were like, here's Mosaic ML inference. Oops. So maybe just a, at a high level, what was Mosaic ml and like, what is it growing into? Like how do you conceptualize this? Jonathan: Yeah, and I will say gradient, when graded dissent was recorded, we weren't doing inference and had no plans to do it.It took a little while for the podcast to get out. Um, in the meantime, basically, you know, one thing I've learned at a startup, and I'm sure abhi can comment on this as well, focus is the most important thing. We have done our best work when we've been focused on doing one thing really well and our worst work when we've tried to do lots of things.Yeah. So, We don't want to do inference, we don't want to have had to do inference. Um, and at the end of the day, our customers were begging us to do it because they wanted a good way to serve the models and they liked our ecosystem. And so in some sense, we got dragged into it kicking and screaming. We're very excited to have a product.We're going to put our best foot forward and make something really truly amazing. But there is, you know, that's something that we were reluctant to do. You know, our customers convinced us it would be good for our business. It's been wonderful for business and we are gonna put everything into this, but you know, back when grading dissent came out, I [00:45:00] was thinking like, or when we recorded it or focused, oh God, like focus is the most important thing.I've learned that the hard way multiple times that Mosaic, abhi can tell you like, you know, I've made a lot of mistakes on not focusing enough. Um, boy inference, that's a whole second thing, and a whole different animal from training. And at the end of the day, when we founded the company, our belief was that inference was relatively well served at that time.There were a lot of great inference companies out there. Um, training was not well served, especially efficient training. And we had something to add there. I think we've discovered that as the nature of the models have changed, the nature of what we had to add to inference changed a lot and there became an opportunity for us to contribute something.But that was not the plan. But now we do wanna be the place that people come when they wanna train these big, complex, difficult models and know that it's gonna go right the first time and they're gonna have something they can servee right away. Um, you know, really the rep example of, you know, with 10 days to go saying, Hey, can you please train that model?And, you know, three or four days later the model was trained and we were just having fun doing interesting, fine tuning work in it for the rest of the 10 days, you know. That also requires good inference. Swyx: That’s true, that's true. Like, so running evals and, and fine tuning. I'm just putting my business hat on and you know, and Alessio as well, like, uh, I've actually had fights with potential co-founders about this on the primary business.Almost like being training, right? Like essentially a one-time cost.Jonathan: Who told you it was a one time cost? What, who, who told you that?Swyx: No, no, no, no. Correct me. Jonathan: Yeah. Yeah. Let me correct you in two ways. Um, as our CEO Navine would say, if he were here, when you create version 1.0 of your software, do you then fire all the engineers?Of [00:46:30] course not. You never, like, MPT has a thousand different things we wanted to do that we never got to. So, you know, there will be future models.Abhinav: And, and the data that's been trained on is also changing over time too, right? If you wanna ask anything about, I guess like May of 2023, we'll have to retrain it further and so on.Right? And I think this is especially true for customers who run like the kind of things that need to be up to date on world knowledge. So I, I think like, you know, the other thing I would say too is that, The malls we have today are certainly not the best malls we'll ever produce. Right. They're gonna get smaller, they're gonna get faster, they're gonna get cheaper, they're gonna get lower latency, they're gonna get higher quality.Right? And so you always want the next gen version of MPT and the one after that and one after that. There's a reason that even the GPT series goes three, four, and we know there's gonna be a five. Right? Um, so I I I also don't see as a, as a one-time cost.Jonathan: Yeah. Yeah. And I, if you wanna cite a stat on this, there are very, very few stats floating around on training versus inference cost.Mm-hmm. One is this blog post from I think David Patterson at Google, um, on the energy usage of ML at Google. And they break down and say three fifths of energy over the previous three years. I think this 2022 article was for inference, and two fifths were for training. And so actually that, you know, this is Google, which is serving models to billions of users.They're probably the most inference heavy place in the world. It's only a two fifth, three fifth breakdown, and that's energy training. Hardware is probably more expensive because it has fancier networking. That could be a 50 50 cost breakdown. And that's Google for a lot of other folks. It's gonna be weighed even more heavily, in favor of training.SPEED AND EFFICIENCY [00:48:01]Swyx: Amazing answer. Well, thanks. Uh, we can, we can touch on a little bit [00:48:00] on, uh, efficiency and speed because we, we, uh, didn't mention about that. So right now people spend between three to 10 days. You, you spend 10 days on, on mpc, seven rep spend three days. What's feasible? What's what Do you wanna get it down to?Abhinav: Oh, for, for these original models? Yeah. Yeah. So I think, um, this is probably one of the most exciting years, I think for training efficiency, just generally speaking, because we have the, the combination of a couple things, like one is like this next generation of hardware, like the H 100 s coming out from Nvidia, which on their own should be like, at least like a two x improvement or they 100 s on top of that, there's also a new floating point format f P eight, um, which could also deliver that alone.Does it? Yes. Yeah. Yeah. How, what, why? Oh, the f p thing? Yeah. Yeah. So basically what's happening is that, you know, when we do all of our math, like in the models matrix, multiplication, math, we do it in a particular precision. We started off in 32 bit precision a few years ago, and then in video came with 16 bit, and over the course of several years, we've all figured out how to do 16 bit training and that basically, you know, due to the harder requirements like.Increase the throughput by two x, reduce the cost by two x. That's about to happen again with FBA eight, like starting this year. And with Mosaic, you know, we've already started profiling L L M training with f p eight on H 100 s. We're seeing really, really good improvements there. And so you're gonna see a huge cost reduction this year just from this hardware fact alone.On top of that, you know, there's a lot of architectural applications. We're looking at ways to introduce some forms of sparsity, not necessarily like the, the, the super unstructured sparsity like lottery ticket. Um, which not that I'm sure I'm really happy to talk about. Um, but, but, um, are there ways of doing, like you [00:49:30] gating or like, kind of like m moe style architectures?So, you know, I think originally, you know, what was like 500 k. I think to try and train a Jeep, the equality model, if at the end of the year we could get that down to a hundred k, that would be fantastic.Swyx: That is this year's type of thing. Jonathan: Not, not, like, that's not a pie in the sky thing. Okay. It is not, it's not a place we are now, but I think it is a, you know, I don't think more than a year in the future these days, cuz it's impossible.I think that is very much a 2023 thing. Yeah. Yeah. Okay. And hold me to that later this year.Swyx: G PT three for a hundred K, let's go. Um, and then also stable diffusion originally reported to be 600 K. Uh, you guys can get it done for under 50. Anything different about image models that we should image, to text?Jonathan: Um, I mean I think the, the most important part in all this is, you know, it took us a while to get 50 down by almost seven x. That was our original kind of proof of concept project for Mosaic. You know, just at the beginning to show like, you know, we can even do this and our investors should give us more money.But what I love about newer models that come out is they're always really slow. We haven't figured out how to optimize them yet. And so there's so much work to be done. So getting, you know, in that case, I guess from the cost you mentioned like a 12 x cost reduction in stable diffusion. Mm-hmm. Honestly it was a lot easier than getting a seven X for RESNET 50 an image net or a three X for Burt, cuz the architecture was much newer and there were a lot of inefficiencies to improve.Um, you know, I'm guessing that's gonna continue to be the case as we lean toward the bleeding edge and try to, you know, push the bleeding edge. I hope that, you know, in some sense you'll see smaller speed ups from us because the new models will come from us and they'll already be fast.Alessio: So that's making existing [00:51:00] things better with the, the long boy, the 60 5K context window, uh, you've doubled instead of the r.There was the R M T a couple weeks ago that had a possible 1 million. Uh, that's the unlimited former thing that came out last week, which is theoretically limitless context. What should people think about trade offs? Implications? You mentioned memories kind of start to become one of the bounds.Yeah. What's the right number? Like is it based on the customer's needs? Like how would you advise customers and startups who might be building their own models?Jonathan: It's all contextual. You know, there's a lot of buzz coming for long contexts lately with a lot of these papers. None of them are exact. In terms of the way that they're doing attention.And so there's, you know, to some extent there's an approximation or a trade off between doing some kind of inexact or approximate or hierarchical or, you know, non quadratic attention versus doing it explicitly correctly the quadratic way. I'm a big fan of approximation, so I'm eager to dig into these papers.If I've learned one thing from writing and reading papers, it's to believe nothing until I've implemented it myself. And we've certainly been let down many, many, many times at Mosaic by papers that look very promising until we implement them and realize, you know, here's how they cook the books on their data.Here's, you know, the one big caveat that didn't show up in the paper. So I look at a lot of this with skepticism until, you know, I believe nothing until I re-implement it. And in general, I'm rewarded for doing that because, you know, a lot of this stuff doesn't end up working quite as well in practice.This is promised in a paper, the [00:52:30] incentives just aren't there, which is part of the reason we went with just pure quadratic attention here. Like it's known to work. We didn't have to make an approximation. There's no asterisk or caveat. This was in some sense a sheer force of will by our amazing engineers.Alessio: So people want super long context because, you know, they wanna feed more documents and right now people do it with embeddings and feed them into the context window. How do you kind of see that changing? Are we gonna get to a point where like, you know, maybe it's 60 4k, maybe it's 120 k, where it's like, okay.You know, semantic search and embeddings are gonna work better than just running a million parameters, like a million token context window.Jonathan: Do, do you wanna say the famous thing about 64 K? Does somebody wanna say that, that statement, the, you know, the 64 K is all you'll ever need? The Bill Gates statement about Rams.Swyx: Andre Kaparthi actually made that comparison before that, uh, context is essentially Ram,Jonathan: if I get quoted here saying 60 4K is all you need, I will be wrong. We have no idea. People are gonna get ambitious. Yes. Um, GPT four has probably taken an image and turning it into a bunch of tokens and plugging it in.I'm guessing each image is worth a hell of a lot of tokens. Um, maybe that's not a thousand words. Not a thousand words, but, you know, probably a thousand words worth of tokens, if not even more so. Maybe that's the reason they did 32 k. Maybe, you know, who knows? Maybe we'll wanna put videos in these models.Like every time that we say, ah, that isn't that model big enough, somebody just gets more ambitious. Who knows? TRENDS AND TRANSFORMERS [00:54:00]Swyx: Right? Um, you've famously made one. [00:54:00] Countertrend, uh, bet, which is, uh, you, you're actually betting that, uh, transformers will stick around for a long time. Jonathan: How is that counter trend? Swyx: Counter trend is in, you just said, a lot of things won't last.Right. A lot of things will get replaced, uh, really easily, butJonathan: transformers will stick around. I mean, look at the history here. How long did the Convolutional neural network stick around for? Oh wait. They're still here and vision Transformers still haven't replaced them. Mm-hmm. How long did r and n stick around for?Decades. And, you know, they're still alive and kicking in a bunch of different places, so, you know. The fundamental architecture improvements are really hard to come by. I can't wait to collect from Sasha on that bet.Abhinav: I, I think a lot of your bet hinges on what counts as attention, right.Swyx: Wait, what do you mean?Well, how, how can that change? Oh, because it'll be approximated.Abhinav: Well, I suppose if, if we ever replace like the Qk multiplication, something that looks sort of like it, I, I wonder who, who, who comes out on top here.Jonathan: Yeah. I mean at the end of the day is a feed forward network, you know, that's fully connected, just a transformer with very simple attention.Mm-hmm. Um, so Sasha better be very generous to me cause it's possible that could change, but at the end of the day, we're still doing Transformers the way, you know, Vaswani had all intended back six years ago now, so, I don't know, things. Six years is a pretty long time. What's another four years at this point?Alessio: Yeah. What do you think will replace it if you lose Ben? What do you think? You would've lost it time?Jonathan: If I knew that I'd be working on it.Abhinav: I think it's gonna be just like MLPs, you know, that's the only, that's the only way we can go, I think at this point, because Thelp, I, I dunno. Oh, just basically down to, to um, to linear layers.[00:55:30]Oh, mostly the percepts. Exactly. Got, yeah. Yeah. Yeah. Cuz the architecture's been stripped, simplified so much at this point. I think, uh, there's very little left other than like some linear layers, some like residual connections and, and of course the attention, um, dot product.Jonathan: But you're assuming things will get simpler, maybe things will get more complicated.Swyx: Yeah, there's some buzz about like, the hippo models. Hungry, hungry hippos.Jonathan: I, I mean there's always buzz about something, um, you know, that's not to dismiss this work or any other work, but there's always buzz about something. I tend to wait a little bit to see if things stand the test of time for like two weeks.Um, at this point, it used to be, you know, a year, but now it's down to two weeks. Oh. But you know, I'm. I don't know. I don't like to follow the hype. I like to see what sticks around, what people actually manage to build off of. Swyx: I have a follow up question actually on that. Uh, what's a, what's an egregiously overrated paper that once you actually looked into it fell apart completely?Jonathan: I'm not going down that path. Okay. I, you know, I even, even though I think there are papers that, you know, did not hold up under scrutiny, I don't think any of this was out of malice. And so I don't wanna go down that path. Alessio: Yeah. I know you already talked about your focus on open research. Are you mostly gonna focus on open models or are there also, are you working on configurations that are more just for your customers and private, like, what percentage of your time are you focusing on, on open work?Jonathan: It's a little fuzzy. I mean, I think at the end of the day you have to ask what is the point of our business? Our business is not just to train a bunch of open models and give them to the world. That would, our VCs probably wouldn't be very happy if that were the case. The open [00:57:00] models serve our business because they're demos.A demo does not mean we give away everything. Um, a demo does not mean every single thing we do is shared with the world, but. We do have a business imperative to share with the world, which I kind of like. That was part of the design of the company, was making sure we had an imperative to do science and an imperative to share.But we are still a company and we do have to make money, but it would be a disaster for our business if we didn't share. And that's by design from the start. So, you know, there's certainly going to be some work that we do that is for our customers only, but by and large for anything that we wanna advertise to customers, there has to be something that is meaningful and useful that's out there in the world.Otherwise we can't convince people that we have it. Abhinav: Yeah, I think like this, our recent inference product also makes the decision easier for us, right? So even since these open malls like we've developed so far, um, you can actually like, you know, uh, query them on our inference api, like our starter tier, and we basically charge like a, a per token fee.Very, very similar to the other API fighters. So there are pathways by which, you know, like even the open mall we provide for free still end up like helping our business out, right? You can customize them, deploy them on our, on our platform, and that way we, we still make money off of them.Alessio: Do you wanna jump into the landing ground?Anything else that you guys wanna cover that we didn't get to?Jonathan: This has been great. These are great questions. Swyx: Do you want to dish on why Sparsity is not a focus for Mosaic?Jonathan: Um, I can just say that, you know, sparsity is not a focus for Mosaic and I am definitely over lottery tickets when I give my mosaic talk.The first slide is a, you know, a circle with a slash through it over a lottery ticket. [00:58:30] Um, and anyone who mentions lottery tickets, I ask to leave the room. Um, cuz you know there's other work out there. But Abhi, please feel free to dish on sparsity.Abhinav: Yeah, I, I think it really comes down to the fact that we don't have hardware yet that can accelerate it.Right? Or at least it's been mostly true for a long period of time. So the kinds of sparsity that the lottery check was working on was like if you put random zeros in the, in the weights, you know, and basically we found basically the fast year is that yes, you can turn most of the weights to zeros and the model still does kind of work, but there's no hardware out there that can take a matrix with a bunch of zeros and one without and make it go fast.Now, the one caveat for this, and this is gonna sound like a bit of advertisement, is, is Cereus actually, and they've been, since the beginning, they've built that architecture for Sparsity and they've actually published some research papers just earlier this year showing that yes, they really can train with Sparsity and get, this is, uh, sparse.U P T. Exactly. Yeah, exactly right. So, the final missing piece is really like, okay, we have the science to show you can train with sparse models, you know, from initialization even, or, or close initialization. Um, the last piece is just, is there a piece of hardware that actually speeds it up and gives you a cost savings?In which case, like the, the field is wide open. Jonathan: The other big challenge here is that if you want to make sparsity go fast in general right now on standard hardware, you do need it to be structured in various ways. And any incremental amount of structure that you force on the sparsity dramatically reduces the quality of the resulting model that you get up to the point where if you remove just, you know, entire neurons from the model, you're just making the layers smaller and that really hurts the quality of the model.So these models, steel is all you need. These models love unstructured [01:00:00] sparsity. Um, and yeah, if there were a chip and a software package that made it really, really easy to accelerate it, I bet we would be doing it at Mosaic right now. Alessio: This is like Sarah Hooker's point with the hardware lottery post, talking about lotteries.Absolutely. Where you know, if you don't have the right hardware, some models, architectures just can't emerge quickly enough.Abhinav: This there, there's like an invariance to think of, which is that today's popular models always run fast on today's hardware. Like this, this has to be true. Mm-hmm. Right? Like there's no such thing as a popular model that runs slow cuz no one would've developed it.Yeah. Um, so it's kind of like with the new architectures, right? If there's new hardware that can do sparsity, you have to co-evolve like a new architecture that works with it. And then those two pair together really well. Transformers and GPUs are like a match made in heaven. Jonathan: How would say transformers and GPUs are a match made in heaven.Yeah. And we're lucky that they work on GPUs, but the folks at Google D designed them for TPUs cuz TPUs and R and Ns were not a match made in heaven.LIGHTNING ROUND AND CLOSING [1:00:55]Alessio: All right, we have three questions. One is on acceleration, one on exploration, and then just a takeaway for the audience. And you can, you know, either of you can start and the other can finish.So the first one is, what has already happened in AI That thought would take much longer than it has?Abhinav: Do you have an answer, Jon? Jonathan: Yeah, I have answer everything. Um, you know, I, I remember when GPT two came out and I looked at that and went, eh, you know, that doesn't seem very exciting. And gosh, it's already 1.5 billion parameters.You know, they can't possibly keep getting better as they make it bigger. And then GPT three came out and I was like, eh, it's slightly better at [01:01:30] generating text. Yeah, who cares? And you know, I've been wrong again and again and again. That. Next token prediction, making things big can produce useful models.To be fair, pretty much all of us were wrong about that. So I can't take that precisely on myself. Otherwise, Google, Facebook and Microsoft Research would all have had killer large language models way before opening I ever got the chance to do it. Um, opening I made a very strange bet and it happened to work out very well.But yeah, diffusion models, like they're pretty stupid at the end of the day and they produce beautiful images, it’s astounding.Abhinav: Yeah, I think my, my answer is gonna be like the, the chatbots at scale, like idea, like basically I thought it would be quite a while before, you know, like hundreds of millions of people will be talking to AI models for a large portion of the data, but now there's many startups and companies not, not just open with chat pt, but, but you know, like character and others where, um, it, it's really astounding, like how many people are actually developing like emotional connections to these, to these AI models.And I don't think I was. Would've predicted that like September, October of last year. But you know, the inflection point of the last six months has been really surprising.Swyx: I haven't actually tried any of these models, but I, I don't know. It seems like a very educational thing. It's like, oh, talk to Genius can, but like that's a very educational use case.Right? Right. Like what, what do you think they're using for, I guess, emotional support?Abhinav: Well, yes. I mean, I think some of them are sort of like, yeah, like either for emotional support or honestly just friends and stuff. Right. I mean, I think like, you know, loneliness mental health is a really a big problem everywhere.And so the most interesting I think I've found is that if you go to the subreddits, you know, for those communities and you see like how they [01:03:00] talk about and think about their like AI friends and like these characters, it's, it's, it's like out of a science fiction book, like I would never expect this to be like reality.Swyx: Yeah. What do you think are the most interesting unsolved questions in ai?Abhinav: I'm really interested in seeing how far down we can go in terms of precision and, and stuff like that. Particularly similar to the BF16 FP thing. Swyx: Okay. Um, there's also like just quantizing until like it's two bits.Abhinav: Yeah, exactly. Like, or even like down to analog or something like that. Because our brains obviously are not running on digital logic and stuff and so, you know, how many orders of magnitude do we have remaining in kind of like just these um, things and I wonder if some of these problems just get easier with scale.Like there have been sort of hints in some papers that, you know, it becomes easier to quantize or easier to prune as it gets bigger and bigger. So maybe as we, almost as a natural consequence of a scaling up over the next few years, will we just naturally become easier and easier to just start going to like four bits or two that are even binary leg weights.Jonathan: I want to know how small we can go in a different way. I just want to know how efficient we can make it to get models that are this good. That was my research question for my entire PhD lottery tickets were one way to get at that. That's now kind of the research question I'm chasing at Mosaic in a sense.I, you know, open ai has shown us that there is one path to getting these incredible capabilities that is scale. I hope that's not the only path. I hope there are lots of ways of getting there. There's better modeling, there are better algorithms. I hate the neuroscience metaphors, but in some sense, our existence and our brains are, you know, evidence that there is at least one other way to get to these kinds of incredible capabilities that doesn't require, you know, [01:04:30] a trillion parameters and megawatts and megawatts and gazillions of dollars.So, you know, I do wonder how small we can go? Is there another path to get to these capabilities without having to do it this way? If it's there, I hope we find it at Mosaic.Swyx: Yeah my, my favorite fact is something on the order of the human brain runs on 30 watts of energy, and so we are, we're doing like dozens of orders of magnitude off on that one.Abhinav: I, I don't think you can get like one gpu, one different. Yeah.Alessio: If there’s one message you want everyone. To remember when thinking about this thing. There's a lot of, you know, fear mongering. There's a lot of messaging being spread around, like, what should people think about in ai? What should be top of mind for them?Jonathan: I'll go for it. Which is, you know, stay balanced. They're the people who really feed into the hype or who, you know, eat up the hype. They're the people who are, you know, big pessimists or react very strongly against the hype, or to some extent are in denial. Stay balanced, embrace the fact that we've built extraordinarily useful tools.Um, but we haven't built a g I and you know, personally, I don't think we're anywhere close to that. You know, so stay balanced and follow the science. I think that's really, that's what we try to do around Mosaic. We try to focus on what's useful to people, what will, you know, hopefully make the world a better place.We try our best on that, but especially, you know, how we can follow the science and use data to be our guide, not just, you know, talk a lot, you know, try to talk through our work instead.Abhinav: And I would also say just kinda like research done in the open. I think like, you know, there's no computing with the, the open community, [01:06:00] right?Just in volume, the number of like, kind of eyeballs you basically have, like looking at your models at the, even at the problems with the models, at ways we improve them. Um, I just think, you know, yeah, research done in the open. It will, it will be the way forward, both to keep our models safe and to bely, like examine the consequences of these AI models like in the world.Alessio: Awesome. Thank you so much guys for coming on.Swyx: and thanks for keeping AI open. Abhinav: Thank you for having us. Jonathan: Yeah. Thank you so much for having us. Get full access to Latent Space at www.latent.space/subscribe
01:06:4320/05/2023
Guaranteed quality and structure in LLM outputs - with Shreya Rajpal of Guardrails AI
Tomorrow, 5/16, we’re hosting Latent Space Liftoff Day in San Francisco. We have some amazing demos from founders at 5:30pm, and we’ll have an open co-working starting at 2pm. Spaces are limited, so please RSVP here!One of the biggest criticisms of large language models is their inability to tightly follow requirements without extensive prompt engineering. You might have seen examples of ChatGPT playing a game of chess and making many invalid moves, or adding new pieces to the board. Guardrails AI aims to solve these issues by adding a formalized structure around inference calls, which validates both the structure and quality of the output. In this episode, Shreya Rajpal, creator of Guardrails AI, walks us through the inspiration behind the project, why it’s so important for models’ outputs to be predictable, and why she went with an XML-like syntax. Guardrails TLDRGuardrails AI rules are created as RAILs, which have three main “atomic objects”:* Output: what should the output look like?* Prompt: template for requests that can be interpolated* Script: custom rules for validation and correctionEach RAIL can then be used as a “guard” when calling an LLM. You can think of a guard as a wrapper for the API call. Before returning the output, it will validate it, and if it doesn’t pass it will ask the model again. Here’s an example of a bad SQL query being returned, and what the ReAsk query looks like: Each RAIL is also model-agnostic. This allows for output consistency across different models, even if they have slight differences in how they are prompted. Guardrails can easily be used with LangChain and other tools to structure your outputs!Show Notes* Guardrails AI* Text2SQL* Use Guardrails and GPT to play valid chess* Shreya’s AI Tinkerers demo* Hazy Research Lab* AutoPR* Ian Goodfellow* GANs (Generative Adversarial Networks)Timestamps* [00:00:00] Shreya's Intro* [00:02:30] What's Guardrails AI?* [00:05:50] Why XML instead of YAML or JSON?* [00:10:00] SQL as a validation language?* [00:14:00] RAIL composability and package manager?* [00:16:00] Using Guardrails for agents* [00:23:50] Guardrails "contracts" and guarantees* [00:31:30] SLAs for LLMs* [00:40:00] How to prioritize as a solo founder in open source* [00:43:00] Guardrails open source community involvement* [00:46:00] Working with Ian Goodfellow* [00:50:00] Research coming out of Stanford* [00:52:00] Lightning RoundTranscriptAlessio: [00:00:00] Hey everyone. Welcome to the Latent Space Podcast. This is Alessio partner and CTO-in-Residence at Decibel Partners. I'm joined by my cohost Swyx, writer and editor of Latent Space.Swyx: And today we have Shreya Rajpal in the studio. Welcome Shreya.Shreya: Hi. Hi. Excited to be here.Swyx: Excited to have you too.This has been a long time coming, you and I have chatted a little bit and excited to learn more about guardrails. We do a little intro for you and then we have you fill in the blanks. So you, you got your bachelor's at IIT Delhi minor in computer science with focus on AI, which is super relevant now. I bet you didn't think about that in undergrad.Shreya: Yeah, I think it's, it's interesting because like, I started working in AI back in 2014 and back then I was like, oh, it's, it's here. This is like almost changing the world already. So it feels like that that like took nine years, that meme of like, almost like almost arriving the thing.So yeah, I, it's felt this way where [00:01:00] it's almost shared. It's almost changed the world for as long as I've been working in it.Swyx: Yeah. That's awesome. Maybe we can explore your, like the origins of your interests, because then you went on to U I U C to do your master's also in ai. And then it looks like you went to drive.ai to work on Perception and then to Apple S P G as, as the cool kids call it special projects group working with Ian Goodfellow.Yeah, that's right. And then you were at pretty base up until recently? Actually, I don't know if you've quit yet. I have, yeah. Okay, good, good, good. You haven't updated e LinkedIn, but we're getting the by breaking news that you're working on guardrails full-time. Yeah, well that's the professional history.We can double back to fill in the blanks on anything. But what's a personal side? You know, what's not on your LinkedIn that people should know about you?Shreya: I think the most obvious thing, this is like, this is still professional, but the most obvious thing that isn't on my LinkedIn yet is, is Guardrails.So, yeah. Like you mentioned, I haven't updated my LinkedIn yet, but I quit some time ago and I've been devoting like all of my energy. Yeah. Full-time working on Guardrails and growing the open source package and building out exciting features, et cetera. So that's probably the thing that's missing the most.I think another. More personal skill, which I [00:02:00] think I'm like kind of okay for an amateur and that isn't on my LinkedIn is, is pottery. So I really enjoy pottery and yeah, don't know how to slot that in amongst, like, all of the AI. So that's not in there. Swyx: Well, you like shaping things into containers where, where like unstructured things and kind of flow in, so, yeah, yeah, yeah. See I can, I can spin it for you.Shreya: I should, I should use that. Yeah. Yeah.Alessio: Maybe for the audience, you wanna give a little bit of intro on Guardrails AI, what it is, why you wanted to start itShreya: Yeah, yeah, for sure. So Guardrails or, or the need for Guardrails really came up as I was kind of like building some of my own projects in the space and like really solving some of my own problems.So this was back of like end of last year I was kind of building some applications, like everybody else was very excited about the space. And I built some stuff and I quickly realized that yeah, I could, you know it works like pretty well a bunch of times, but like a lot of other times it really does not work as I, the developer of this tool, like, want my tool to work.And then as a developer like I can tell that there's very few tools available for me to like, get this to, you know cooperate [00:03:00] with me, like get it to follow directions, etc. And the only tool I really have is this prompt. And there's only so, so far you can go with like, putting instructions in like caps, adding a bunch of exclamations and being like, follow my instructions. Like give me this output this way. And so I think like part of it was, You know that it's not reliable, et cetera. But also as a user, it just if I'm building an application for a user, I just want the user to have a have a certain experience using it. And there's just not enough control to me, not enough, like knobs for me to tune, you know as a developer to do that.So guardrails kind of like came up as a way to just like, manage this better. The tool basically, I was like, okay. As I'm building this, I know from the ground up, like what is the experience I want the user to add, to have like, what is a great LLM output look like for me? And so I wanted a tool that allows me to kind of specify that and enforce those constraints.As I was thinking of this, I was like, this should be very extensible, very flexible so that there's a bunch of use cases that can be handled, et cetera. But the need really like, kind of came up from my own from my own, like I was basically solving for my own pain points.[00:04:00]So that's a little bit of the history, but what the tool does is that it allows you to kind of like specify. It's this two-part system where there's a specification framework and then there's like a code that enforces that specification on the LLM outputs. So the specification framework allows you to be like as coarse or as fine grained as you care about.So you can essentially think about what is the, on a very like first order business, like where is the structure and what are the types, etc, of the output that I want. If you want structured outputs from LLMs. But you can also go like very into semantic correctness with this, with a. I just released something this morning, which is that if you're summarizing a bunch of documents, make sure that it's a very faithful summary.Make sure that there's like coherence amongst like what the output is, et cetera. So you can have like all of these semantic guarantees as well. And guardrails created like rails, like a reliable AI markup language that allows you to specify that. And along with that, there's like code that backs up that specification and it makes sure that a, you're just generating prompts that are more likely to get you the output in the right manner to start out with.And then once you get that output all of the specification criteria you entered is like [00:05:00] systematically validated and like corrected. And there's a bunch of like tools in there that allow you a lot of control to like handle failures much more gracefully. So that's in a nutshell what guardrails does.Awesome.Alessio: And this is model agnostic. People can use it on any model.Shreya: Yeah, that's right. When I was doing my prototyping, I like was developing with like OpenAI, as I'm sure like a bunch of other developers were. But since then I've added support where you can basically like plug in any, essentially any function or any callable as long as you, it has a string input.String output you can plug it in there and I've had people test it out with a bunch of other models and get pretty good results. Yeah.Alessio: That's awesome. Why did you start from XML instead of YAML or JSON?Shreya: Yeah. Yeah. I think it's a good question. It's also the question I get asked the most. Yes. I remember we chat about this as well the first chat and I was like, wait, okay, let's get it out of the way. Cause I'm sure you answered this a lot.Shreya: So it is I didn't start out with it is the truth. Like, I think I started out from this code first framework service initially like Python classes, et cetera. And I was like, wait, this is too verbose. This is like I, as I'm thinking about what I want, I truly just [00:06:00] want this is like, this is what this dictionary should look like for me, right?And having to like create classes on top of that just seemed like a higher upfront cost. Like obviously there's a balance there. Like there's some flexibility that classes and code affords you that maybe isn't there in a declarative markup language. But that that was my initial kind of like balance there.And then within markup languages, I experimented with the bunch, but the idea, like a few aesthetic things about xml, like really appeal to me, as unusual as that may sound. But I think one is this idea of like properties off. Any field that you're getting back from an LLM, right. So I think one of the initial ones that I was experimenting with was like TypeScript, et cetera.And with TypeScript, like all of the control you have is like, you try to like stuff as much information as possible in the name of the key, right? But that's not really sufficient because like in, in XML or, or what gars allows you to do is like maybe add like descriptions for each field that you're getting, which like is, is really very helpful because that almost acts as a proxy prompt.You know, and, and it gets you like better outputs. You can add in like what the correctness criteria or what the validity criteria is for this field, et [00:07:00] cetera. That also gets like passed through to the prompt, et cetera. And these are all like, Properties for a single field, right? But fields themselves can be containers and can have like other nested like fields within them.And so the separation of like what's a property of a field versus what's like child of a field, et cetera, was like nice to me. And having like all of this metadata contained within this one, like tag was like kind of elegant. It also mapped very well to this idea of like error handling or like event handling because like each field may fail in weird ways.It's very inspired from H T M L in that way, in that you have these like event handlers for like, oh, if this validity criteria for this field fails maybe I wanna re-ask the large language model and here's my re-asking parameters, et cetera. Whereas like, if other criteria fail there's like maybe other ways to do to handle that.Like maybe I don't care about it as much. Right. So, so that seemed pretty elegant to me. That said, I've talked to a lot of people who are very opinionated about it. My, like, the thing that I was optimizing for was essentially that it seemed clean to me compared to like other things I tried out and seemed as close to English as [00:08:00] possible.I tested it out with, with a bunch of friends you know, who did not have tag backgrounds or worked in tag but weren't like engineers and it like and they resonated and they were able to pick it up. But I think you'll see updates in the works where I meet people where they are in terms of like, people who, especially like really hate xml.Like there's something in the works where there'll be like a code first version of this. And also like other markup languages, which I'm actively exploring. Like what is a, what is a joyful experience to have for like other market languages. Yeah. DoSwyx: you think that non-technical people would.Use rail was because I was, I was just surprised by your mention that you tested it on non-technical people. Is that a design goal? Yeah, yeah,Shreya: for sure. Wow. Okay. We're seeing this big influx of, of of people who are building tools with these applications who are kind of like, not machine learning people.And I think like, that's truly the kind of like big explosion that we're seeing. Right. And a lot of them are like getting so much like value out of like lms, but because it allows you like earlier if you were to like, I don't know. Build a web scraper, you would need to do this like via code.[00:09:00] But now like you can get not all the way, but like a decent amount of way there, like with just English. And that is very, very powerful. So it is a design goal to like have like essentially low floor, high ceiling is, was like absolutely a design goal. So if, if you're used to plain English and prompting using Chad PK with plain English, then you can it should be very easy for you to kind of like pick this up and there's not a lot of gap there, but like you can also build like pretty complex workflows with guardrails and it's like very adaptable in that way.Swyx: The thing about having custom language is essentially other people can build. Stuff that compiles to you. Mm-hmm. Which is also super nice and, and visual layers on top. Like essentially HTML is, is xml, like mm-hmm. And people then build the WordPress that is for non-technical people to interface with html.Shreya: I don't know. Yeah, yeah. No, absolutely. I think like in the very first week that Guardrails was out, like somebody reached out to me and they were pm and they essentially were like, I don't, you know there's a lot of people on my team who would love to use this, but just do not write code.[00:10:00] Like what is the, where is a visual interface for building something like this? But I feel like that's, that's another reason for why XML was appealing, because it's essentially like a document structuring, like it's a way to think about like documents as trees, right? And so again, if you're thinking about like what a visual interface would be, then maps going nicely to xml.But yeah. So those are some of the design considerations. Yeah.Swyx: Oh, I was actually gonna ask this at the end, but I'm gonna bring it up now. Did you explore sql, like. Syntax. And obviously there's a project now l m qr, which I'm sure you've looked at. Yeah. Just compare, contrast, anything.Shreya: Yeah. I think from my use case, like I was very, how I wanted to build this package was like essentially very, very focused on developer ergonomics.And so I didn't want to like add a lot of overhead or add a lot of like, kind of like high friction essentially like learning a whole new dialect of sequel or a sequel like language is seems like a much bigger overhead to me compared to like doing things in XML or doing things in a markup language, which is much more intuitive in some ways.So I think that was part of the inspiration for not exploring sql. I'd looked into it very briefly, but I mean, I think for my, for my own workflows, [00:11:00] I wanted to make it like as easy as possible to like wrap whatever LLM API calls you make. And, and to me that design was in markup or like in XML, where you just define your desiredSwyx: structures.For what it's worth. I agree with you. I would be able to argue for LMQL because SQL is the proven language for business analysts. Right. Like less technical, like let's not have technical versus non-technical. There's also like less like medium technical people Yeah. Who learn sql. Yeah. Yeah. But I, I agree with you.Shreya: Yeah. I think it depends. So I have I've received like, I think the why XML question, like I mentioned is like one of the things I get most, but I also hear like this feedback from other people, which is like all of like essentially enterprises are also like very comfortable with xml, right? So I guess even within the medium technical people, it's like different cohorts of like Yeah.Technologies people are used to and you know, what they would find kind of most comfortable, et cetera. Yeah. And,Swyx: Well, you have a good shot at establishing the standard, which is pretty exciting. I'm someone who has come from a, a long background with React, the JavaScript framework. I don't know if you.And it's kind of has that approach of [00:12:00] taking a templating XML like language to describe something that was typically previously described in Code. I wonder if you took any inspiration from that? If you want to just exchange notes on anything from that like made React successful. Cuz I, I spent a few years studying that.Yeah.Shreya: I'm happy to talk about it, but I will say that I am very uneducated when it comes to front end, so Yeah, that's okay. So I might say some things that like aren't, aren't valid or like don't really, don't really map very well, but I'm gonna give it a shot anyway. So I don't know if it was React specifically.I think just this idea of marrying essentially like event handlers, like with the declarative framework. Yes. And with this idea of being able to like insert scripts, et cetera, and quote snippets into that. Like, that was super duper appealing to me. And that was like something like where you're programming with.Like Gabriels and, and Rail specifically is essentially a way to like program with large language models outside of using like just national language. Right? And so like just thinking of like what are the different like programming workflows that people typically need and like what would be the most elegant way to add that in there?I think that was an inspiration. So I basically looked at like, [00:13:00] If you're familiar with Guardrails and you know that you can insert like dynamic scripting into a rail specification, so you can register custom validators within rail. You can maybe have like essentially code snippets where things are like lists or things are like dynamically generated array, et cetera, within GAR Rail.So that kind of resonated a lot to like using JavaScript injected within like HTML files. And I think other inspiration was like I mentioned this before, but the event handlers was like something that was very appealing, how validators are configured in guardrails right now. How you tack on specific validators that's kind of inspired from like c s s and adding like style tags, et cetera, to specific Oh, inline styling.Okay. Yeah, yeah, yeah, exactly. Wow. So that was like some of the inspiration, I guess that and pedantic and like how pedantic kind of like does its validation. I think those two were probably like the two biggest inspirations while building building the current version of guardrails. Swyx: One part of the design of React is composability.Can I import a guardrails thing from into another guardrails project? [00:14:00] I see. That paves the way for guardrails package managers or libraries or Right. Reusable components, essentially. I think that'sShreya: pretty interesting. Do you wanna expand on that a little bit more? Swyx: Like, so for example, you have guardrails for a specific use case and you want to like, use that, use it in a bigger thing. And then just compose it up. Yeah.Shreya: Yeah. I wanna say that, I think that should be pretty straightforward. I'm trying to think about like, use cases where people have done that, but I think that kind of maps into like chaining or like building complex workflows generally. Right. So how I think about guardrails is that like, I.If you're doing something like chaining, you essentially are composing together these like multiple LLM API calls and you have these like different atomic units of each LLM API calls, right? So where guardrails kind of slots in is add like one of those nodes. It essentially adds guarantees, et cetera, and make sure that you know, that that one node is like water tied, et cetera, in terms of the, the output that is, that it has.So each node in your graph or tree or in your dag would essentially have like a guardrails config associated with it. And you can kind of like use your favorite chaining libraries, like nine chain, et cetera, to like then compose this further together. [00:15:00] I think I've seen like one of the first actually community projects that was like built using guardrails, like had chaining and then had like different rails for each node of that chain.Essentially,Alessio: I'm building an agent internally for us. And Guardrails are obviously very exciting because once you set the initial prompt, like the model creates its own prompts. Can the models create rails for themselves? Like, have you tried this out? Like, can they understand what the output is supposed to be and like where their ownShreya: specs?Yeah. Yeah. I think this is a very interesting question. So I haven't personally tried this out, but I've ha I've received this request you know, a few different times. So on the roadmap like seeing how this can be done, but I think in general, like in all of the prompt engineering experiments I've done, et cetera, I don't see like why with, especially with like few short examples that shouldn't be possible.But that's, that's a fun like experiment. I wanna try out,Alessio: I was just thinking about this because if you think about Baby a gi mm-hmm. And some of these projects mm-hmm. A lot of them are just loops of prompts. Yeah. You know so I can see a future [00:16:00] in which. A lot of these loops are kind off the shelf thing and then you bring your own rails mm-hmm.To make sure that they work the way you expect them to be instead of expecting the model to do everything for you. Yeah. What are your thoughts on agents and kind of like how this plays together? I feel like when you start it, people were mostly just using this for a single prompt. You know, now you have this like automated chainShreya: happening.Yeah. I think agents are like absolutely fascinating in how. Powerful they are, but also how unruly they are sometimes. Right? And how hard to control they are. But I think in general, this kind of like ties into even with machine learning or like all of the machine learning applications that I worked on there's a reason like you don't have like fully end-to-end ML applications even in you know, so I, I worked in self-driving for example, like a driveway.I at driveway you don't have a fully end-to-end deep learning driving system, right? You essentially have like smaller components of it that are deep learning and then you have some kind of guarantees, et cetera, at those interfaces of those boundaries. And then you have like other maybe more deterministic competence, et cetera.So essentially like the [00:17:00] interesting thing about the agent framework for me is like how we will kind of like break this up into smaller tasks and then like assign those guarantees kind of at e each outputs. It's a problem that I've been like thinking about, but it's also like frankly a hard problem to solve because you're.Because the goals are auto generated. You know, there's also like the, the correctness criteria for those goals also needs to be auto generated, right? Which is like a little bit antithetical to you knowing ahead of time, like, what, what a correct output for me for a developer or for your application kind of looking like.So I think like that's the interesting crossroads. But I do think, like with that said, I think guardrails are like absolutely essential for Asian frameworks, right? Like partially because like, not just making sure they're like constrained and they're safe, et cetera, but also, frankly, to just make sure that they're doing what you want them to do, right?And you get the right output from them. So it is a problem. Like I'm, I'm thinking a bunch about, I think just, just this idea of like, how do you make sure that it's not it's not just models checking each other, but there's like some more determinism, some more notion of like guarantees that can be backed up in there.I think like that's [00:18:00] the, that would be like super compelling to me, and that is kind of like the solution that I would be interested in putting out. But yeah, it's, it's something that I'm thinking about for sure. I'mSwyx: curious in the scope of the problem. I feel like we need to. I think a lot of people, when they hear about AI progress, they always assume that, oh, that just if it's not good now, just wait a year later.And I think obviously, I think that's something that you have to think about as well, right? Like how much of what guardrails is gonna do is going to be Threatens or competed with by GC four having 32,000 context tokens. Just like what do you think are like the invariables in model capabilities that you're betting on versus like stuff that you would not bet on because you just expected to get better?Yeah.Shreya: Yeah. I think that's a great question, and I think just this way of thinking about invariables, et cetera is something that is very core to how I've been thinking about this problem and like why I also chose to work on this problem. So, I think again, and this is like guided by some of my past experience in machine learning and also kind of like looking at like how these problems are, how like other applications that I've had a lot [00:19:00] of interest, like how some of the ML challenges have been solved in there.So I think like context, like longer context, length is going to arrive for sure. We are gonna start saying we're already seeing like some, some academic papers and you know, we're gonna start seeing a lot more of them like translated into actual applications.Swyx: This is the new transformer thing that was being sent around with like a millionShreya: context.Yeah. I also, I think my my husband is a PhD student you know, at Stanford and then his lab also does research basically in like some of the more efficient architectures for Oh, that'sSwyx: a secret weapon for guard rails. Oh my god. What? Tell us more.Shreya: Yeah, I think, I think their lab is pretty exciting.This is a shouted to the hazy research lab at Stanford. And yeah, I think like some of, there's basically some active research there about like, basically looking into like newer architectures, like not just transform. Yeah, it might not be the most I've been artifact more architecture.Yeah, more architectural research that allows for like longer context length. So longer context, length is arriving for sure. Yeah. Lower latency lower memory efficiency, et cetera. So that is actually some of my background. I worked in that in my previous jobs, something I'm familiar with.I think there's like known recipes for making [00:20:00] this work. And it's, it's like a problem like once, essentially it's a problem of just kind of like a lot of experimentation and like finding exactly what configurations kind of get you there. So that will also arrive, both of those things combined, you know will like drive down the cost of running inference on these models.So I, all of those trends are coming for sure. I think the trend that. Are the problem that is not solved by these trends is the problem of like determinism on machine learning models, like fundamentally machine learning models, deep learning models specifically, like are impossible to add guarantees on even with temperature zero.Oh, absolutely. Even with temperature zero, it's not the same as like seed equals zero or seed equals like a fixed amount. Mm-hmm. So even if with temperature zero with the same inputs, you run it multiple times, you'll essentially see that you don't get the same output multiple times. Right.Combined with this, System where you don't even actually own the model yourself, right? So the models are updated from under you all the time. Like for building guardrails, like I had to do a bunch of prompt engineering, right? So that users get like really great structured outputs, like share of the bat [00:21:00] without like having to do any work.And I had this where I developed something and it worked and then it ended up like for some internal model version, updated, ended up like not being functional anymore and I had to go back to the drawing board and you know, do that prompt engineering again. There's a bit of a digression, but I do see that as like a strength of guardrails in that like the contract that I'm providing is not between the user.So the user has a contract with me essentially. And then like I am making sure that we are able to do prompt engineering to get like the output from the LLM. And so it kind of like takes away a lot of that burden of having to figure that out for the user, right? So there's a little bit of a digression, but these models change all the time.And temperature zero does not equal like seed zero or fixed seed rather. And so even with all of the trends that we're gonna see arriving pretty soon over the next year, if not sooner, this idea of like determinism reproducibility is not gonna change, right? Ignoring reproducibility is a whole other problem of like the really, really, really long tail of like inputs and outputs that are not covered by, by tests and by training data, [00:22:00] et cetera.And it is like virtually impossible to cover that. You kind of like, this is not simply a problem where like, Throwing more data at the model is going to solve. Right? Yeah. Because like, people are building like genuinely really fascinating, really amazing complex applications and like, and these are just developers, like users are then using those applications in many diverse complex ways.And so it's hard to figure out like, what if you get like weird way word prompts that you know, like aren't, that you didn't kind of account for, et cetera. And so there's no amount of like scaling laws essentially that kind of account for those problems. They can be like internal guardrails, et cetera.Of course. And I would be very surprised if like open air, for example, like doesn't have their own internal guardrails. You can already see it in like some, some differences for example, like URLs like tend to be valid URLs now. Right. Whereas it really Yeah, I didn't notice that.It's my, it's my kind of my job to like keep track of, keep it, yeah. So I'm sure that's, If that's the case that like there's some internal guard rails, and I'm sure that that would be a trend that we would kind of see. But even with that there's like a ton of use cases and a [00:23:00] ton of kind of like application areas where like there's different requirements from different types of guard rails are valuable in different requirements.So this is a problem essentially that would be like, harder to solve or next to impossible to solve with just data, with just scaling up the models. So you would need kind of this ensemble basically of, of LLMs of like these really powerful models along with like deterministic guarantees, rule-based heuristics, et cetera, more traditional you know machine learning tools and like you ensemble all of these together and you end up getting something that you know, is greater than the sum of it.Its parts in terms of what it's able to do. So I think like that is the inva that I'm thinking of is like the way that people would be developing these applications. I will followSwyx: up on, on that because I'm super excited. So when you sent mentioned you have people have a contract with guardrails.I'm actually looking at the validators page on your docs, something, you have something like 20 different contracts that people can have. I'll name some of them just just so that people can have an, have an idea, but also highly encourage people to check it out. Is profanity free, is a, is a good one.Bug-free Python. And that's, that's also pretty, [00:24:00] pretty cool. You have similar to document and extracted summary sentences match. Which I think is, is like don't hallucinate,Shreya: right? Yeah. It's, it's essentially making sure that if you're generating summaries the summary should be very faithful.Yeah. Should be like citable attributable, et cetera to the source text.Swyx: Right. Valid url, which we talked about. Mm-hmm. Maybe open AI is doing a little bit more of internally. Mm-hmm. Maybe open AI uses card rails. You don know be a great endorsement. Uhhuh what is surprisingly popular and what is, what do you think is like underrated?Out of all your contracts? Mm-hmm.Shreya: Mm-hmm. Okay. I think that the, well, not surprisingly, but the most obvious popular ones for me that I've seen are like structure, structure type, et cetera. Anything that kind of guarantees that. So this isn't specifically in the validators, this is essentially like part of the gut, the core proposition.Yeah, the core proposition. I think that is like very popular, but that's also kind of like the first order. Problem that people are kind of solving. I think the sequel thing, for example, it's very exciting because I had just released this like two days ago and then I already got some inbound with like people kinda swapping, like building these products and of swapping it out internally and you know, [00:25:00] getting a lot of value out of what the sequel bug-free SQL provides.So I think like the bug-free SQL is a great example because you can see like how complex these validators can really go because you end up seeing like bug-free sql. What it does is it kind of like takes a connection string or maybe a, a schema file, et cetera. It creates a sandbox SQL environment for you, like from that.And it does that at startups so that like every time you're getting like a text to SQL Query, you're not having to do pay that cost time and time again. It takes that query, it like executes that query on that sandbox in that sandbox environment and then sees if that query is executable or not.And then if there's any errors that you know, like. Packages of those errors very nicely. And if you've configured re-asking it sends it back to the model and you know, basically make sure that that like it tries to get corrected. Sequel. So I think I have an example up there in the docs to be in there, like in applications or something where you can kind of see like how it corrects like weird table names, like weird predicates, et cetera.I think there's other kind of like, You can build pretty complex systems with this. So other things in there are like it takes [00:26:00] information about your database and then injects it into the prompt with like, here's the schema of this table. It automatically, like given a national language query, it finds like what the most similar examples are from the history of like, serving this model and like injects those into the prompt, et cetera.So you end up getting like this very kind of well thought out validator and this very well thought out contract that is, is just way, way, way better than just asking in plain English, the large language model to give you something, right? So I think that is the kind of like experience that I wanna provide.And I basically, you'll see more often the package, my immediateSwyx: response is like, that's cool. It does more than I thought it was gonna do, which is just check the SQL syntax. But you're actually checking against schema, which is. Highly, highly variable. Yeah. It'sShreya: slow though. I love that question. Yeah. Okay.Yeah, so I think like, here's where this idea of like, it doesn't have to be like, you don't have to send every request to your L so you're sampling. Okay. So you can essentially figure out, so for example, like there's like how what guardrails essentially does is there's like corrective actions and re-asking is like one of those corrective actions, [00:27:00] right?But there's like a ton other ways to handle it. Like there's maybe deterministic fixes, like programmatic fixes, there's maybe default values. There's this doesn't work like quite work for sql, but if you're doing like a bunch of structured data and if you know there's an invalid value, you can just filter it or you can just refrain from asking, et cetera.So there's a ton of ways where you can like, just handle errors more gracefully. And the one I kind of wanna point out here is programmatically fixing something that is wrong, like on, on the client side instead of just sending over another request. To the large language model. So for sql, I think the example that I talked about earlier that essentially has like an incorrect table name and to correct the table name, you end up sending another request.But you can think about like other ways to handle disgracefully, right? Like essentially looking at essentially a fuzzy matching with like the existing table names in the repository and in, in the database. And you know, like matching any incorrect names to that. And so you can think of like merging this re-asking thing with like, other error handling things that like smaller, easier errors are able, you can handle them programmatically by just Doing this in like the more patching, patching or I, I guess the more like [00:28:00] classical ML way essentially, like not the super fancy deep learning is like, I think ML 2.0.But like, and this, I, I've been calling it like ML 3.0, but like, even in like ML 1.0 ways you can like, think of how to do this, right? So you're not having to make these like really expensive calls. And so that builds a very powerful system, right? Where you essentially have this, like, depending on what your error is, you don't like, always use G P D three or, or your favorite L M API when you don't need to, you essentially are able to like combine these like other ways, other error handling techniques, like very gracefully so that you get correct outbursts, validated outbursts, and you get them for cheap and like faster, et cetera.So that's, I think there's some other SQL validation things that are in there. So I think like exclude SQL Predicates. Yeah, exclude SQL Predicates. And then there's one about columns that if like some columns are like sensitive columnSwyx: prisons. Yeah. Yeah. Oh, just check if it's there.Shreya: Check if it's there and you know, if there's like only certain columns that you wanna show it to the user and like, maybe like other columns have like private data or sensitive data you know, you can like exclude those and you can think of doing this on the table level.So this is very [00:29:00] easy to do just locally. Right. Like, so there's like different ways essentially to kind of like handle this, which makes for like a more compelling way to build theseSwyx: systems. Yeah. Yeah. By the way, I think we're proving out why. XML was a better choice than SQL Cause now, now you're wrapping sql.Yeah. Yeah. It's pretty cool. Cause you're talking about the text to SQL application example that you put out. It actually puts something, a design choice that isn't talked about very much in center focus, which is your logs. Your logs are gorgeous. I'm sure that took work. I'm sure that's a strong opinion of yours.Yeah. Why do you spend so much time on logs? Just like, how do you, how do you think about designing these things? Should everyone do it this way? What are the drawbacks? Like? Is any like,Shreya: yeah, I'm so excited about this idea of logs because you know, you're like, all of this data is like in there for free, right?Like if you're, if you're do like any validation that is run, like essentially in memory, and then also I write it out to file, et cetera. You essentially get like this you get a history of this was the prompt that was run. This was the this was the L raw LLM output. This was the validation that was run.This was the output of those validations. This [00:30:00] was any corrective actions, et cetera, that were taken. And I think that's like very, like as a developer, like, I'm so happy to see that I use these logs like personally as well.Swyx: Yeah, they're colored. They're like nicely, like there's like form double borders on the, on the logs.I've never seen this in any ML tooling at all.Shreya: Oh, thanks. Yeah. I appreciate it. Yeah, I think this was mostly. For once again, like solving my own problems, which is like, I was building a lot of these things and you know, doing a lot of dog fooding and doing a lot of application building like in notebooks.Yeah. And so in a notebook I wanted to kind of see like what the easiest way to kind of interact with it was. And, and that was kind of what I ended up building. I really appreciate that. I think that's, that's very nice to, nice to hear. I think I'm also thinking about what are, what are interesting ways to be able to like whittle down very deeply into like what kind of went wrong or what is going right when you're like running, running an application and like what the nice kind of interface to design that would be.So yeah, thinking about that problem. Don't have anything on there yet, but, but I do really like this idea of really as a developer you're just like, you really want like all the visibility you can get into what's, [00:31:00] what's happening right. Under the hood. And I wanna be able to provide that. Yeah.Yeah.Swyx: I mean the, the, the downside I'll point out just quickly cuz we, we should, we should move on is that this is not machine readable. So like, how does it work with like a Datadog or, you know? Yeah,Shreya: yeah, yeah, yeah. Well, we can deal with that later. I think that's that's basically my answer as well, that I, I'll do, yeah.Problem for future sreya, basically.Alessio: Yeah. You call Gabriel's SLAs for l m outputs. You know, historically SLAs are pretty objective there's the five nines availability, things like that. How do you build them in a sarcastic system when, say, my queries, like draft me a marketing article. Mm-hmm. Like, Have you read an SLA for something like that?Yeah. But in terms of quality and like, in terms of we talked about what's slow and like latency, like Hmm. Sometimes I would read away more and I, and have a better copy of like, have you thought about what are like the, the access of measurement for some of these things and how should people think about it?Shreya: Yeah, the copy example is interesting because [00:32:00] I think for any of these things, the SLAs are purely on like content and output, not on time. I don't guardrails I don't think even can make any guarantees on the time that it'll take to make these external API calls. But like, even within quality, it's this idea of like, if you're able to communicate what you desire.Either programmatically or by using a model in the loop, then that is something that can be enforced, right? That is something that can be validated and checked. So for example, like for writing content copy, like what's interesting is like for example, if you can break down the copy that you wanna write into, like this is a title, this is maybe a TLDR description, this is a more detailed take on the, the changes or the product announcement, et cetera.And you wanna hit like maybe three, like some set of points in there. So you already kind of like start thinking of like, what was a monolith of like copy to you in, in terms of like smaller building blocks, et cetera. And then on those building blocks you can essentially like then add like certain guarantees.So you can say that let's say like length or readability is a [00:33:00] guarantee. So some of the updates that I pushed today on, on summarization and like specific guards for summarization, one of them essentially was that like the reading time for the summary should be within like some certain amount, right?And so that's like you can start enforcing like all of those guarantees, like on each individual block. So I think like, Some of those things are. Naturally harder to do and you know, like are harder to automate ways. So essentially like, does this copy, I don't know, is this witty or something, right. Or is this Yeah.Something that I guess like the model doesn't have a good idea for, but like other things, as long as you can kind of like enforce them and like check them either via model or programmatically, it's something that you can like start building some some notion of like guarantees around. Yeah.Yeah. So that's why I think about it.Alessio: Yeah. This is super interesting because right now a lot of products are kind of the same because all I do is they call it the model and some are prompted a little differently, but you can only guess so much delta between them in the future. It's be, it'll be really interesting to have products differentiate with the amount of guardrails that they give you.Like you already [00:34:00] see that, Ooh, with open AI today when some people complain that too many of the responses have too much like, Well actually in it where it's like, oh, you ask a question, it's like, but you should remember that's actually not good. And remember this other side of the story and, and all of that.And some people don't want to have that in their automated generation. So, yeah. I'm really curious, and I think to Sean's point before about importing guardrails into products, like if there's a default amount of guardrails that you have and like you've being the provider of it, like that's really powerful.And then maybe there's a faction that is against guardrails and it's like they wanna, they wanna break out, they wanna be free. Yeah. So it's a. Interesting times. Yeah.Shreya: I think to that, like what I, I was actually chatting with someone who was building some application for content creators where like authenticity you know, was a big requirement, like of what they cared about in the right output.And so within authenticity, like why conventional models were not good for them is that they already have a lot of like quote unquote guardrails right. To, to I guess like [00:35:00] appeal to like certain certain sections of the audience to essentially be very cleaned up and then that was like an undesirable trade because that, for them, like, almost took away from that authenticity, et cetera.Right. So I think just this idea of like, I guess like what a guardrail means is like so different for different applications. Like I, I guess like I, there's like about 20 or so things in there. I think there's like a few more that I've added this morning, which Yes. Which are not Yeah. Which are not updated and then in the end.But there's like a lot of the, a lot of the common workflows, like you do have an understanding of like what the right. I guess like what is an appropriate constraint for this? Right. Of course, things like summarization, four things like text sequel, but there's also like so many like just this wide variety of like applications, which are so fascinating to learn about where you, you would wanna build something in-house, which is like your, so which is your secret sauce.And so how Guardrail is kind of designed or, or my intention with designing is that here's this way of breaking down what this problem is, right? Of like getting some determinism, getting some guarantees from your LM outputs. [00:36:00] And you can use this framework and like go crazy with it. Like build whatever you want, right?Like if you want this output to be more authentic or, or, or less clean or whatever, you can like add that in there, like making sure that it does have maybe some profanity and that's a desirable output for you. So I think like the framework side of it is very exciting to me as this, as this way of solving the problem.And then you can build your custom validators or use the ones that I provide out of the box. Yeah. Yeah.Alessio: So chat plugins, it's another big piece of this and. A lot of the integrations are very thin specs and like a lot of prompting, for example, a lot of them are asking to not mention the competitors. I think the Expedia one said, please do not mention any other travel website on the internet.Do not give any other alternative to what we do. Yeah. How do you see all these things come together? Like, do you see guardrails as something that not only helps with the prompting, but also helps with bringing external data into these things, and especially with agents going on any website, do you see each provider having like their own [00:37:00] guardrail where it's like, Hey, this is what you can expect from us, or this is what we want to provide?Or do you think that's, that's not really what, what you're interested in guardrailsShreya: being? Yeah, I think agents are a very fascinating question for me. I don't think I like quite know what the right, who the right owner for this guardrail is. Right. And maybe, I don't know if you guys wanna keep this in there or like maybe cut this front of my answer out, up to, up to you guys.I'm, I'm fine either way, but I think like that problem is, A harder problem to solve just from like a framework design perspective as well. Right. I think this idea of like, okay, right now it's just in the prompt, like don't mention competitors, et cetera. Like that is exactly that use case.Or I feel like, okay, if I was that business owner, right, and if I wanted to build this application, like, is that sufficient? There's like so much prompt injection, right? And you can get, or, or just so much like, just like an absolute lack of guarantees. Like, and, and it's hard to even detect that this is happening.Like let's say I have this running in production and then turns out that there was like some sort of leakage, et cetera, and you know, like my bot has actually been talking about like all of my competitors forever, [00:38:00] right? Like, that's a, that's a substantial risk. And so just this idea of like needing this like post-hoc validation to ensure deterministically that like it does what you want it to do is like, just so is like.As a developer putting myself in the shoes of like people building business applications like that is what gives me like peace of mind, right? So this framework, I think, like applies very well within those settings.Swyx: I'll go right into, we're gonna broaden out a little bit into commentary on other parts of the ecosystem that might, that might be interesting.So I think you and I. Talks briefly about this, but I think the, the broader population should know about it, which is that you also have an LLM API wrapper. Mm-hmm. So, such that the way, part of the way that guardrails works is you in, inject part of the few shot example into the prompt.Mm-hmm. And then you also do re-asking in all the other stuff post, I dunno what the pipeline is in, in, in your terminology. So essentially you have an API wrapper for open ai.completion.com dot create. But so does LangChain, so does Hellicone so does everyone I can name like five other people who are all fighting essentially for [00:39:00] the base layer, LLM API wrapper.Mm-hmm. I think this is valuable real estate, but I don't know how you like, think about working with other people or do you wanna be the base layer, likeShreya: I feel pretty collaboratively about it. I also feel like there's, like lang chain is doing like, it's so flexible as a framework, right?Like you can solve so many of your problems in there. And I think like it's, I, I have like a lang chain integration. I have a GPT Index / Llama integration, et cetera. And I think my view on this is that I wanna integrate with everybody. I think it is valuable real estate. It's not personally real estate that I'm interested in.Like you can essentially bring the LLM callable or the LLM API that's in there. It's just like some stub of a function that you can just add your favorite thing in there, right? It just, the only requirement is that string in first string output, that is all the requirement. And then you can bring in your own favorite component from your own favorite library in order to do that.And so, yeah, it's, I think like I'm pretty focused on this problem of like what is the guardrail that you would wanna build for a certain applications? So it's valuable real estate. I'm sure that people don't own [00:40:00] it.Swyx: It's, as long as people give you a way to insert your stuff, you're good.Shreya: Yeah, yeah. Yeah. I do think that, like I've chat with a bunch of people and then different applications and I do think that the abstractions that I have haven't failed me yet. Like it is very flexible. It is very easy to slot in into any workflow. Yeah.Swyx: I would love to ask about the meta elements of working on guardrails.This is your first company, but you launched five things this morning. The pace of the good AI projects that I've seen out there, like LangChain launches 10 things a week or whatever, I don't know. Surely that's something that you prioritize. How do you, how do you think about like, shipping versus like going going back and like testing and working in community and all the other stuff that you're managing?How do you prioritize? Shreya: That’s such a wonderful question. Yeah. A very hard question as well. I don't know if I would have a good answer for this. I think right now it's instinctive. Like I have a whole kind of stack ranked list of like things I wanna do and features I wanna build and like, support, et cetera.Combined with that is like a feature request I get or maybe some bugs, et cetera, that folks report. So I'm pretty focused on like any failures, any [00:41:00] feature requests from the community. So if those come up, I th those tend to Trump like anything else that I'm working on. But outside of that I have like this whole pool of ideas and like pool of features I wanna build and I kind of.Constantly kind of keep stack ranking them and like pushing something out. So I'm spending like I'm thinking about this problem constantly and as, as a function of that, I have like a ton of ideas for like what would be cool to build and, and what would be the right way to like, do certain things and yeah, wanna basically kind of like I keep jotting it down and keep thinking of like every time I cross something off the list.I think about like, what's the next exciting thing to work on. I think simultaneously with that we mentioned that at the beginning of this conversation, but like this idea of like what the right interface for rail is, right? Like, is it the xl, is it code, et cetera. So I think like those are like fundamental kind of design questions and I'm you know, collaborating with folks and trying to figure that out now.And yeah, I think that's like a parallel project that I'm hoping that yeah, you'll basically, that we'll be out soon. Like in termsSwyx: of the levers, how do you, like, let's just say in like a typical week, is it like 50% [00:42:00] calls with partners mm-hmm. And potential users and just understanding your use cases and the 50% building would you move that, that percentage anyway anywhere?Would you add in something that's significant?Shreya: I think it's frankly very variable week to week. So, yeah. I think early on when I released Guardrails I was like, here's how I'm thinking about this problem. Right? Yeah. Don't need anyone else. You just no, but actually to the contrary, it was like, this is like, I'm very opinionated about like what the right way to solve this is.And this is all of the problems I've thought about and like, and I know this framework maps well to these sets of problems, right? What are your problems? Like there's this whole other like big population of people that are building and you know, I basically wanna make sure that I have like user empathy and I have like I'm able to understand what people are doing and like make sure the framework like maps well.So I think I did a lot of that, like. Immediately after the release, like talking to a lot of teams and talking to a lot of users. I think since then, I basically feel like I have a fair idea of like, you know what's great about it, what's mediocre about it, and what's like, not good about it? And that helps kind of guide my prioritization list of like what I [00:43:00] wanna ship and what I wanna build.So now it's more kind of like, I would say, yeah, back to being more, more balanced. Alessio: All the companies we work with that are in open source, I always try and have them think through open source as a distribution model. Mm-hmm. Or like a development model. I was looking in the contributors list, and you have by far the most code, the second largest contributor. It's your husband. And after that it kind of goes, goes or magnitude lower. What have you found kind of working in, in open source in like a very fast moving project for, for the first time? You know, it's a, like with my husband, it's the community. No, no. It's the, it's the community like, A superpower to you?Do you feel like, do you feel like having to explain why you're doing things a certain way, like getting people buy in is maybe slowing you down when things move so quickly? I'm, I'm always interested to hears people's thoughts.Shreya: Oh that's a good question. I think like, there's part of like, I think guardrails at that stage, right?You know, I have like feature requests and I have [00:44:00] contributors, but I think right now, like I'm doing the bulk of like supporting those feature requests, et cetera. So I think a goal for me, and I remember we chatted about this as well you know, when we, when we spoke last, we're just like, okay.You know, getting into that point where, yeah, you, you essentially like kind of start nurturing and like getting more contributions from like the open source. So I think like that's one of the things that yeah. Is kind of the next goal for me. Yeah, it's been pretty. Fun. I, I would say like up until now, because I haven't made any big breaking a API changes, et cetera, so I haven't like, needed that community input.I think like one of the big ones that is coming right now is like the code, right? Like the code first, a API for creating rails. So I think like that was kind of important for like nailing that user experience, et cetera. So the, so the collaborators that I'm working with, there's basically an an R F C and community input, et cetera, and you know, what the best way to do that would be.And so that's actually, frankly, been like pretty fun as well to see the community be like opinionated about like, here's how I'm doing it and like, this works for me, this doesn't work for me, et cetera. So that's been like new for me as well. Like, I [00:45:00] think I am my previous company we also had like open source project and it was built on open source, but like, this is the first time that I've created a project with an open source project with like that level of engagement.So that's been pretty fun.Swyx: I'm always curious about like potential future business model, modern sensation,Shreya: anything like that. Yeah. I think I'm interested in entrepreneurship generally, honestly, trying to figure out like what the, all of those questions, right?Like business model, ISwyx: think a lot of people are in your shoes, right? They're developers. Mm-hmm. They and see a lot of energy they would like to start working on with open source projects. Mm-hmm. What is a deciding factor? What do you think people should think about when deciding whether or not, Hey, this is just a project that I maintained versus, Nope, I'm going to do the whole thing that get funding and allShreya: that.I think for me So I'm already kind of like I'm al I'm working on the open source full time. I think like the motivating thing for me was that, okay, this is. A problem that would need to get solved, like one way or another.This we talked about in variance earlier, and I do think that this is a, like being able to, like, I think if, if there's a contraction or a correction and [00:46:00] the, these LMS like don't have the kind of impact that we're, we're all hoping they would, I think it would be because of like, this problem because people kind of find that it's not as useful when it's running at very large scales when it's running in production, et cetera.So I think like that was very, that gave me a lot of conviction that it's something that I kind of wanted to work on and that was a switch for me. That it gave me the conviction to, for example, quit my job. Yeah. Also, yeah. Slightly confidential. Off the record. Off the record, yeah. Yeah.Alessio: We're not gonna talk about. Special project at Apple. That's a, that's very secret. Yeah. But you overlap Apple with Ian Goodfellow, which is obviously a, a very public figure in the AI space.Swyx: Actually, not that many people know what he did, so maybe we can, she can introduce Ian Goodfellow as well.Shreya: But, yeah, so Ian Goodfellow is the creator of Ganz or a generative adversarial network.So this was, I think I'm gonna mess up between 1215, I think 14, 15 ish if I remember correctly. So he basically created gans as a PhD student. As a PhD student. And he has a pretty interesting story of like how he thought of them and how [00:47:00] he kind of, Built the, and I I'm sure there's like interviews in like podcasts, et cetera with him where he talks about it, where like, how he got the idea for it and how he kind of like wrote the paper and did the experiments.So gans essentially were kind of like the first wave of generative images where you would see essentially kind of like fake auto-generated images, you know conditioned on like certain distributions. And so they were like very many variants of gans, like DC GAN, I'm gonna mess up the pronunciation, but dub, I'm just gonna call it w GaN.Mm-hmm. GAN Yeah. That like, you would essentially see these like really wonderful generative art. And I do think that like so I, I got the chance to work with him while at Apple. He had just moved to Apple from Google Brain and was building the cross-functional machine learning team within SPG.And I got the chance to work with him, which is very exciting. I learned so much and he is a fantastic manager and yeah, really, really enjoyed working withAlessio: him. And then he, he quit his job when they forced him to go back to the office. Right? That's theSwyx: Oh, really? Oh,Alessio: I didn't see that. Oh, okay. I think he basically, apple was like, you gotta go [00:48:00] back to the office.He said peace. That justSwyx: went toon. I'm curious, like what's some, some things that you learned from Ian that, or maybe some stories that,Shreya: Could be interesting. So there's like one, maybe machine learning specific and like one, maybe not machine learning specific and just general, like career stuff.Yeah. So the ML specific one was that well, Very high level. I think like working with him, you just truly see the creativity. And like after I worked with him, I was like, yeah, I, I totally get that. This is the the guy, like how his, how his brain works it's totally, it's so obvious that this is the guy who made like gans work basically.So I think he, when he does machine learning and when he thinks about like problems to solve, he thinks about it from a very creative out of the box way of thinking about it. And we kind of saw that with like, some of the problems where he was working on where anytime he had like feedback or suggestions on the, on the approaches that I was taking, I was like, wow, this is really exciting and like very creative and yeah, it was very, very cool to work on.So that was very high level machine learning.Swyx: I think the apple, apple standing by with like a blow dart if you, if like, say anymore.Shreya: I think the, the non-technical stuff, which [00:49:00] was I think truly made him such a fantastic manager. But when I went to Apple, I was, you know maybe a year outta school outta my job at that point.And I remember that I like most new grads was. Had like, okay, I, I need to kind of solve this problem on my own before I kind of get external help. Yeah. Yeah. And like, one of my first, I think probably my first or second week, like Ian and I, we were para programming and I remember that we were working together and like some setup issues were happening.And he would wait like exactly 45 seconds before he would like, fire up a message on Slack and like, how do I, how do I kind of fix this? How do they do this? And it just like totally transformed like, like, they're just like us, you know? I think not even that, it's that like. I kind of realized that I was optimizing for the wrong thing, right?By trying to like solve this myself. And instead of just if I'm running into a problem posting on Slack and like getting collaborative information, it wasn't that, yeah, it was, it was more the idea of my job is not like to solve this myself. My job is to solve this period.Mm-hmm. And the fastest way to solve this is the most, is the most correct way to do it. And like, [00:50:00] yeah, I truly, like, he's one of my favorite people. And I truly enjoyed working with him a lot, but that was one of my, Super early into my job there. Like I, I learned that that was You're verySwyx: lucky to do that.Yeah. Yeah. That's awesome. I love learning about the people side. Mm-hmm. You know, because that's what we deal with on a day-to-day basis, so. Mm-hmm. It's really nice to Yeah. To hear about that kind of stuff. Yeah. I was gonna go into one more academia question and then we'll go into lighting rounds.So you're close to Stanford. There'sShreya: obviously a lot of By, by my, yeah. My, my husband basically. Yeah. He doesn't have aSwyx: choice. There's a lot of interesting things coming on to Stanford, right. Vicuna, Alpaca and, and Stanford home. Are you keeping a close eye on like, the academic outputs? What are you seeing that is interesting to you?Shreya: I think obviously because of I'm, I'm focused on this problem, definitely looking at like how people are, you know thinking about the guard rails and like kind of adding more constraints.Swyx: It's such a great name by the way. I love it. Every time I see people say Guardrails, I'm like, yeah. Shreya: Yeah, I appreciate that. So I think like that is definitely one of the things. I think other ones are kind of like more out of like curiosity because of like some ML problems that I worked on in the past. Like I, [00:51:00] I mentioned that I worked on a efficient ml, so looking into like how people are doing, like more efficient inference.I think that is very fascinating to me. Mm-hmm. So, yeah, looking into that. I think evaluation helm was pretty exciting, really looking forward to like longer context length and seeing what's possible with that. More better fine tuning with like maybe lower data, et cetera. I think those are all some of the themes that I'm interested in.Swyx: Yeah. Yeah. Okay. So just because you have more expertise with efficiency, are you talking about quantization? Are you talking about pruning? Are you talking about. Distillation. I doShreya: think that the right way to solve these problems is always like to a mix. Yeah. A mix. Everything of them and like ensemble, all of these methods together.So I think, yeah, basically there's this like constant like tug of war and like push and pull between adding like some of these colonization for example, like improved memory, improved latency, et cetera. But then immediately you get like a performance hit, right? So like there's this like balance between like making it smaller and making it more efficient, but like not losing out on like what that performance is.And it's a big kind of experimentation framework. It's like understanding like where the bottlenecks are. So it's very, it's [00:52:00] very. You know, exploratory and experimental in nature. And so it's hard to kind of like be prescriptive about this is exactly what would work. It like, truly depends, like use case to use case architecture to architecture, hardware to hardware, et cetera.Yeah. WannaAlessio: jump into lightning round? Yeah. You ready?Shreya: I, IAlessio: hope so. Yeah. So we have five questions. Mm-hmm. And yeah, just respond in a sentence or two. Sean sometimes has the follow up tendency to follow up questions. The light. Yeah. You wanna get more info, which is, which is be ready. So the first one we always ask is what's your favorite AI product?Shreya: Very boring answer, but co-pilot life changing. Yeah. Yeah. Absolutely. Love it. Yeah.Swyx: Surprisingly not that many people have called out copilot in Oh, really? In our interviews. Cuz everyone's going to arts, like, they're like mid journeys, they will diff stuff. I see. Gotcha. But yeah, co-pilot is is great.Underrated. Yeah. It's still for $10 a month.Shreya: I mean, why not? Yeah. It's, it's, it's so wonderful.Swyx: I'm looking forward to co-pilot X, which is sort of the next iteration. Yeah.Shreya: I was testing on my co-pilot, so I [00:53:00] just got upgrade my laptop and then setting up vs code. And then I got co-pilot labs, I think is it?Or experimental. Yeah. Even that like Yes. Brushes and stuff. Yeah. Yeah. Yeah.Swyx: That was pretty cool. Talk to Amelia, who works on GitHub next. They, they build copilot labs and there's the voice component, which I don't know if you've tried. Oh, I, I stick whisper with co-pilot.Shreya: I see. It's just like your instructions and, yeah.Yeah. Oh,wellSwyx: also I have rsi. Mm-hmm. So actually sometimes it, it hurts when I type. I So, see it's actually super helpful to talk to your,Shreya: ah, interesting. Okay. Id, yeah, it's pretty, yeah. I, it was, Playing around with it yesterday, I was like, wow, this is so cool.Swyx: Yeah. Next question. What is something you thought would take much longer than, but it's already here.Like this is an acceleration question.Shreya: Let's see. Yeah, maybe this is getting like too developer focused too. Code focused. It's, but I, I do think like a lot of the auto generating code stuff is is really freaking cool. And I think especially if combine it with like maybe testing, right? Mm-hmm.Where you have like code and then you have like test to make sure the code work. And like you have this like, kind of like iterative loop until you refinement, until you're able to kind of [00:54:00] like self-heal code or like automatically generate code. I think like that is superSwyx: fascinating to you. Are you referring to some productsShreya: or demos that Actually I wouldn't give a, a plug for like basically this GitHub action called AutoPR, which like one of my community contributors kind of built using guardrails.And so the idea of what auto PR does is it takes a GitHub issue and if you have the right label for it, it automatically triggers this action where you create a PR given the issue text, et cetera. Huh? Yeah. Oh, it's so cool. It's, so your issue is the prompt. Yeah. Amongst like, other things other like Other context that you don't like?I'm gonna try this out right now. Yeah. Yeah. This is crazy. Yeah, it, it's, it's really cool. So I think like these types of workflows, it will take time before we can use them seamlessly, but Yeah. Truly very fascinating. Alessio: There's another open source project called a Wolverine by BiobootloaderYeah. Yeah, it's cool. It's really cool. It's basically like self-healing code. Yeah. You just let it run and then it makes a mistake and runs in a REPL, takes the code and ask it to just give you the diff and [00:55:00] like drops out the code and runs it again. It justSwyx: automates what I do anyway. Exactly.Alessio: So we can focus on the podcast.Shreya: This is one of the things that won't be automated away. Yeah. I think like, yeah, I, I saw over bringing, I think it was pretty cool and I think I'm very excited about that problem also because if you can think about it as like framing it within the context of these validators, et cetera, right?Like I think so bug-free sequel. What that does is like exactly that workflow of like generates code, executes, it takes failures, re-ask, et cetera. So implements that whole workflow like within a validator. Yeah. Swyx:The future is here.Alessio: Well, this kind of ties into the next question.A year from now, what will be will be the most surprised by in AI?Shreya: Hmm. Yeah. Not to be a downer, but I do think that like how hard it is to truly take these things to production and like get consistently amazing user experiences from it. But I think like this, yeah, we're at that stage where there's basically like a little bit of a gap between like what, what you kind of [00:56:00] see as being very exciting.And I think it's like, it's a demonstration of what's possible with this, right? But like, closing that gap between like what's possible versus like what's consistently deliverable. I think it's, it's a harder problem to solve. So I do think that it's gonna take some time before all of these experiences are like absolutely wonderful.So yeah, I think like a year from now we'll kind of like find some of these things taking a little bit longer than expected.Swyx: Request for startups or request for product. What's an AI thing you would pay for if somebodyShreya: built it? I think this is already exists and I just kind of maybe have to hook it up, et cetera, but I would a hundred percent pay for this, like emails.Emails in my tone. Oh, I see. Yeah, no, keep yeah,Swyx: emails, list your specs. Like what, what should it do? What should IShreya: not do? Yeah. I think like, I basically have an idea always of like this is tldr what I want this email to say. Sure. I want it to be in my tone so that it's not super formal, it's not super like lax, et cetera.I want it to be like tours and short and I want it to like I wanted to have context of like a previous history and maybe some [00:57:00] other like links, et cetera that I'm adding. So I wanted to hook it up to like, some of my data sources and do that. I think that would, I would like pay Yeah.Good money for that every month. Yeah. Nice.Alessio: I, I bill one the only as the, the email trend as the context, but then as a bunch of things like For example, for me it's like if this company is not in the developer tool space, I'm gonna pass on it. So direct to pass email, if the person is asking to schedule, please ask them to send them to send me their calendarly so I can pick a time from there.All these different things I see. But sometimes it's a new thread with somebody you already spoken with a bunch of times, so it should pull all of that stuff too. But I open source all of it because I don't want to deal with storing peoples email. It'sShreya: like the, the hardest thing. Do you find that it does tone well?Like does it match your tone or doesAlessio: it I have to use right now public figures as a I see thing. So it, I do things like write like Paul Graham or write or like, people that are like, have a lot of variety. Oh, that's actually pretty cool. Yeah. You know? Yeah. Yeah. It works pretty well. I see. Nice.There's some things Paul Graham would not [00:58:00] say that it writes in the, in the emails, but overall I would say probably like 20% of the drafts it creates are like, Usually good to go, like 70% it needs some work. And then there's like the 10% that is like, I have no idea why you just said that. It's completely like out of left field.I see. Yeah. But it will, it'll get better if I spend more time on it. But you know, it kind of adds up because I use G B D four, I get a lot of emails, so like having an autodraft responses for everything in my inbox, it, it adds up. So maybe the pattern of having, based on the label you put on the email to auto generate, it'sShreya: it's good.Oh, that's pretty cool. Yeah. And actually, yeah, as a separate follower, I would love to know like all of the ways it messes up and, you know if we get on guard, let's talk about it now. Let's,Swyx: yeah. Sometimes it doesn't, your project should use guardrails.Alessio: Yeah. No, no, no. Definitely. I think sometimes it doesn't understand the, the email is not a pitch, so somebody emails me something that's like unrelated and then it's like, oh, thank you.[00:59:00]But since you're not working in the space, I'm not gonna be investing in you. But good luck with the rest of your fundraise. But it's like, never mention a fundraise, but because in the prompt, it, as part of the prompt is like, if it's a pitch and it's not in the space, a pre-draft, an email, it thinks it has to do it a lot more than it should.Or like, same with scheduling somebody you know, any sales call that, any sales email that I get, it always wants to schedule a call with them. And I was like, I don't wanna meet with them, I don't wanna buy this thing. But the, the context of the email is like, they wanna schedule something so the responders you know, is helping you schedule, but it doesn't know that I don't want to, doesShreya: it like autodraft all, like is there any input that you give for each email or does it autodraft everything?Alessio: I just give it the tread and then a blank blank slate. I don't give it anything else because I wanted to run while I'm not in the inbox, but yours. It's a little better. What I'm doing is draft generation. What you wanna do is like draft expansion. So instead of looking at the [01:00:00] inbox in your case, you will look at the draft folder and look through each draft and expend the draft.Yeah, to be a full response, which makes a lot of sense.Shreya: Yeah, that's pretty interesting. I, I can think of like some guardrails that I can know quick, quick and dirty guardrails that I can hook up that would make some of those problems like go away. Yeah. Yeah,Swyx: like as in do they existShreya: now or they don't exist?They don't exist now, but I can like, think about like, I'm like always looking for problems so yeah. This is aSwyx: API design issue, right? Because if, if one conversation, you come away with like three guardrails and then another conversation, you come, none of three guardrails. How do you think about like, there's so many APIs that you could possibly do, right?You need to design for generally composable orShreya: reusable APIs. Yeah, so I would probably like break this down into like, like a relevant action item guardrail or something, right? And it's basically like essentially only talk about, or only like the action items should only be things that are within the context of those emails.And if something hasn't been mentioned, don't add context about that. So that would probably be a generic gar that I could, I could add. And then you, you could probably configure it with like, what are the sets of like [01:01:00] follow up action items that you typically have and, and correct for it that way.Swyx: We, we just heard a new API being designed live, which doesn't happen very often.Shreya: It's very cool. Yeah. AndAlessio: last but not least, if there's one thing you want people to take away about AI and kind of this moment that we're in, in technology, what would that be?Shreya: I do think this is the most exciting time in machine learning, as least as long as I've been working on it.And so I do think, like, frankly, we're all just so lucky to kind of be living through this and it's just very fascinating to be part of that. I think at the same time the technology is so exciting that you, you get like, Driven by wanting to use it. But I think like really thinking about like what's the best way to use it along with like other systems that have existed so that it's more kind of like task focused and like outcome focused rather than like technology focused.So this kind of like obviously I'm biased because I feel this way because I've designed guardrails this way that it kind of like merges LLMs with rules and heuristics and like traditional ML, et cetera. But I do think [01:02:00] that like this, this general framework of like thinking about how to build ML products is something that I'm bullish on and something I'd want people to like think about as well.Yeah.Alessio: Awesome. Well thank you so much for comingShreya: Yeah, absolutely. Thanks for inviting me. Get full access to Latent Space at www.latent.space/subscribe
01:02:2816/05/2023
The AI Founder Gene: Being Early, Building Fast, and Believing in Greatness — with Sharif Shameem of Lexica
Thanks to the over 42,000 latent space explorers who checked out our Replit episode! We are hosting/attending a couple more events in SF and NYC this month. See you if in town!Lexica.art was introduced to the world 24 hours after the release of Stable Diffusion as a search engine for prompts, gaining instant product-market fit as a world discovering generative AI also found they needed to learn prompting by example.Lexica is now 8 months old, serving 5B image searches/day, and just shipped V3 of Lexica Aperture, their own text-to-image model! Sharif Shameem breaks his podcast hiatus with us for an exclusive interview covering his journey building everything with AI!The conversation is nominally about Sharif’s journey through his three startups VectorDash, Debuild, and now Lexica, but really a deeper introspection into what it takes to be a top founder in the fastest moving tech startup scene (possibly ever) of AI. We hope you enjoy this conversation as much as we did!Full transcript is below the fold. We would really appreciate if you shared our pod with friends on Twitter, LinkedIn, Mastodon, Bluesky, or your social media poison of choice!Timestamps* [00:00] Introducing Sharif* [02:00] VectorDash* [05:00] The GPT3 Moment and Building Debuild* [09:00] Stable Diffusion and Lexica* [11:00] Lexica’s Launch & How it Works* [15:00] Being Chronically Early* [16:00] From Search to Custom Models* [17:00] AI Grant Learnings* [19:30] The Text to Image Illuminati?* [20:30] How to Learn to Train Models* [24:00] The future of Agents and Human Intervention* [29:30] GPT4 and Multimodality* [33:30] Sharif’s Startup Manual* [38:30] Lexica Aperture V1/2/3* [40:00] Request for AI Startup - LLM Tools* [41:00] Sequencing your Genome* [42:00] Believe in Doing Great Things* [44:30] Lightning RoundShow Notes* Sharif’s website, Twitter, LinkedIn* VectorDash (5x cheaper than AWS)* Debuild Insider, Fast company, MIT review, tweet, tweet* Lexica* Introducing Lexica* Lexica Stats* Aug: “God mode” search* Sep: Lexica API * Sept: Search engine with CLIP * Sept: Reverse image search* Nov: teasing Aperture* Dec: Aperture v1* Dec - Aperture v2* Jan 2023 - Outpainting* Apr 2023 - Aperture v3* Same.energy* AI Grant* Sharif on Agents: prescient Airpods tweet, Reflection* MiniGPT4 - Sharif on Multimodality* Sharif Startup Manual* Sharif Future* 23andMe Genome Sequencing Tool: Promethease* Lightning Round* Fave AI Product: Cursor.so. Swyx ChatGPT Menubar App.* Acceleration: Multimodality of GPT4. Animated Drawings* Request for Startup: Tools for LLMs, Brex for GPT Agents* Message: Build Weird Ideas!TranscriptAlessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO on Residence at Decibel Partners. I'm joined by my co-host Wix, writer and editor of Latent Space. And today we have Sharish Amin. Welcome to the studio. Sharif: Awesome. Thanks for the invite.Swyx: Really glad to have you. [00:00] Introducing SharifSwyx: You've been a dream guest, actually, since we started drafting guest lists for this pod. So glad we could finally make this happen. So what I like to do is usually introduce people, offer their LinkedIn, and then prompt you for what's not on your LinkedIn. And to get a little bit of the person behind the awesome projects. So you graduated University of Maryland in CS. Sharif: So I actually didn't graduate, but I did study. Swyx: You did not graduate. You dropped out. Sharif: I did drop out. Swyx: What was the decision behind dropping out? Sharif: So first of all, I wasn't doing too well in any of my classes. I was working on a side project that took up most of my time. Then I spoke to this guy who ended up being one of our investors. And he was like, actually, I ended up dropping out. I did YC. And my company didn't end up working out. And I returned to school and graduated along with my friends. I was like, oh, it's actually a reversible decision. And that was like that. And then I read this book called The Case Against Education by Brian Kaplan. So those two things kind of sealed the deal for me on dropping out. Swyx: Are you still on hiatus? Could you still theoretically go back? Sharif: Theoretically, probably. Yeah. Still on indefinite leave. Swyx: Then you did some work at Mitra? Sharif: Mitra, yeah. So they're lesser known. So they're technically like an FFRDC, a federally funded research and development center. So they're kind of like a large government contractor, but nonprofit. Yeah, I did some computer vision work there as well. [02:00] VectorDashSwyx: But it seems like you always have an independent founder bone in you. Because then you started working on VectorDash, which is distributed GPUs. Sharif: Yes. Yeah. So VectorDash was a really fun project that we ended up working on for a while. So while I was at Mitra, I had a friend who was mining Ethereum. This was, I think, 2016 or 2017. Oh my God. Yeah. And he was mining on his NVIDIA 1080Ti, making around like five or six dollars a day. And I was trying to train a character recurrent neural network, like a character RNN on my iMessage text messages to make it like a chatbot. Because I was just curious if I could do it. Because iMessage stores all your past messages from years ago in a SQL database, which is pretty nifty. But I wanted to train it. And I needed a GPU. And it was, I think, $60 to $80 for a T4 on AWS, which is really slow compared to a 1080Ti. If you normalize the cost and performance versus the 1080Ti when someone's mining Ethereum, it's like a 20x difference. So I was like, hey, his name was Alex. Alex, I'll give you like 10 bucks if you let me borrow your 1080Ti for a week. I'll give you 10 bucks per day. And it was like 70 bucks. And I used it to train my model. And it worked great. The model was really bad, but the whole trade worked really great. I got a really high performance GPU to train my model on. He got much more than he was making by mining Ethereum. So we had this idea. I was like, hey, what if we built this marketplace where people could rent their GPUs where they're mining cryptocurrency and machine learning researchers could just rent them out and pay a lot cheaper than they would pay AWS. And it worked pretty well. We launched in a few months. We had over 120,000 NVIDIA GPUs on the platform. And then we were the cheapest GPU cloud provider for like a solid year or so. You could rent a pretty solid GPU for like 20 cents an hour. And cryptocurrency miners were making more than they would make mining crypto because this was after the Ethereum crash. And yeah, it was pretty cool. It just turns out that a lot of our customers were college students and researchers who didn't have much money. And they weren't necessarily the best customers to have as a business. Startups had a ton of credits and larger companies were like, actually, we don't really trust you with our data, which makes sense. Yeah, we ended up pivoting that to becoming a cloud GPU provider for video games. So we would stream games from our GPUs. Oftentimes, like many were located just a few blocks away from you because we had the lowest latency of any cloud GPU provider, even lower than like AWS and sometimes Cloudflare. And we decided to build a cloud gaming platform where you could pretty much play your own games on the GPU and then stream it back to your Mac or PC. Swyx: So Stadia before Stadia. Sharif: Yeah, Stadia before Stadia. It's like a year or so before Stadia. Swtx: Wow. Weren't you jealous of, I mean, I don't know, it sounds like Stadia could have bought you or Google could have bought you for Stadia and that never happened? Sharif: It never happened. Yeah, it didn't end up working out for a few reasons. The biggest thing was internet bandwidth. So a lot of the hosts, the GPU hosts had lots of GPUs, but average upload bandwidth in the United States is only 35 megabits per second, I think. And like a 4K stream needs like a minimum of 15 to 20 megabits per second. So you could really only utilize one of those GPUs, even if they had like 60 or 100. [05:00] The GPT3 Moment and Building DebuildSwyx: And then you went to debuild July 2020, is the date that I have. I'm actually kind of just curious, like what was your GPT-3 aha moment? When were you like GPT-3-pilled? Sharif: Okay, so I first heard about it because I was also working on another chatbot. So this was like after, like everything ties back to this chatbot I'm trying to make. This was after working on VectorDash. I was just like hacking on random projects. I wanted to make the chatbot using not really GPT-2, but rather just like it would be pre-programmed. It was pretty much you would give it a goal and then it would ask you throughout the week how much progress you're making to that goal. So take your unstructured response, usually a reply to a text message, and then it would like, plot it for you in like a table and you could see your progress over time. It could be for running or tracking calories. But I wanted to use GPT-3 to make it seem more natural because I remember someone on Bookface, which is still YC's internal forum. They posted and they were like, OpenAI just released AGI and it's GPT-3. I asked it like a bunch of logic puzzles and it solved them all perfectly. And I was like, what? How's no one else talking about this? Like this is either like the greatest thing ever that everyone is missing or like it's not that good. So like I tweeted out if anyone could get me access to it. A few hours later, Greg Brockman responded. Swyx: He is everywhere. Sharif: He's great. Yeah, he's on top of things. And yeah, by that afternoon, I was like messing around with the API and I was like, wow, this is incredible. You could chat with fake people or people that have passed away. You could like, I remember the first conversation I did was this is a chat with Steve Jobs and it was like, interviewer, hi. What are you up to today on Steve? And then like you could talk to Steve Jobs and it was somewhat plausible. Oh, the thing that really blew my mind was I tried to generate code with it. So I'd write the function for a JavaScript header or the header for a JavaScript function. And it would complete the rest of the function. I was like, whoa, does this code actually work? Like I copied it and ran it and it worked. And I tried it again. I gave more complex things and like I kind of understood where it would break, which was like if it was like something, like if it was something you couldn't easily describe in a sentence and like contain all the logic for in a single sentence. So I wanted to build a way where I could visually test whether these functions were actually working. And what I was doing was like I was generating the code in the playground, copying it into my VS code editor, running it and then reloading the react development page. And I was like, okay, cool. That works. So I was like, wait, let me just put this all in like the same page so I can just compile in the browser, run it in the browser and then submit it to the API in the browser as well. So I did that. And it was really just like a simple loop where you just type in the prompt. It would generate the code and then compile it directly in the browser. And it showed you the response. And I did this for like very basic JSX react components. I mean, it worked. It was pretty mind blowing. I remember staying up all night, like working on it. And it was like the coolest thing I'd ever worked on at the time so far. Yeah. And then I was like so mind blowing that no one was talking about this whole GPT three thing. I was like, why is this not on everyone's minds? So I recorded a quick 30 second demo and I posted on Twitter and like I go to bed after staying awake for like 20 hours straight. When I wake up the next morning and I had like 20,000 likes and like 100,000 people had viewed it. I was like, oh, this is so cool. And then I just kept putting demos out for like the next week. And yeah, that was like my GPT three spark moment. Swyx: And you got featured in like Fast Company, MIT Tech Review, you know, a bunch of stuff, right? Sharif: Yeah. Yeah. I think a lot of it was just like the API had been there for like a month prior already. Swyx: Not everyone had access. Sharif: That's true. Not everyone had access. Swyx: So you just had the gumption to tweet it out. And obviously, Greg, you know, on top of things as always. Sharif: Yeah. Yeah. I think it also makes a lot of sense when you kind of share things in a way that's easily consumable for people to understand. Whereas if you had shown a terminal screenshot of a generating code, that'd be pretty compelling. But whereas seeing it get rendered and compiled directly in front of you, there's a lot more interesting. There's also that human aspect to it where you want to relate things to the end user, not just like no one really cares about evals. When you can create a much more compelling demo explaining how it does on certain tasks. [09:00] Stable Diffusion and LexicaSwyx: Okay. We'll round it out soon. But in 2022, you moved from Debuild to Lexica, which was the search engine. I assume this was inspired by stable diffusion, but I can get the history there a little bit. Sharif: Yeah. So I was still working on Debuild. We were growing at like a modest pace and I was in the stable... Swyx: I was on the signup list. I never got off. Sharif: Oh yeah. Well, we'll get you off. It's not getting many updates anymore, but yeah, I was in the stable diffusion discord and I was in it for like many hours a day. It was just like the most exciting thing I'd ever done in a discord. It was so cool. Like people were generating so many images, but I didn't really know how to write prompts and people were like writing really complicated things. They would be like, like a modern home training on our station by Greg Rutkowski, like a 4k Unreal Engine. It's like that there's no way that actually makes the images look better. But everyone was just kind of copying everyone else's prompts and like changing like the first few words. Swyx: Yeah. Yeah. Sharif: So I was like using the discord search bar and it was really bad because it showed like five images at a time. And I was like, you know what? I could build a much better interface for this. So I ended up scraping the entire discord. It was like 10 million images. I put them in a database and I just pretty much built a very basic search engine where you could just type for type a word and then it returned all the prompts that had that word. And I built the entire website for it in like 20, in like about two days. And we shipped it the day I shipped it the day after the stable diffusion weights were open sourced. So about 24 hours later and it kind of took off in a way that I never would have expected. Like I thought it'd be this cool utility that like hardcore stable diffusion users would find useful. But it turns out that almost anyone who mentioned stable diffusion would also kind of mention Lexica in conjunction with it. I think it's because it was like it captured the zeitgeist in an easy to share way where it's like this URL and there's this gallery and you can search. Whereas running the model locally was a lot harder. You'd have to like to deploy it on your own GPU and like set up your own environment and like do all that stuff. Swyx: Oh, my takeaway. I have two more to add to the reasons why Lexica works at the time. One is lower latency is all you need. So in other words, instead of waiting a minute for your image, you could just search and find stuff that other people have done. That's good. And then two is everyone knew how to search already, but people didn't know how to prompt. So you were the bridge. Sharif: That's true. Yeah. You would get a lot better looking images by typing a one word prompt versus prompting for that one word. Yeah. Swyx: Yeah. That is interesting. [11:00] Lexica’s Explosion at LaunchAlessio: The numbers kind of speak for themselves, right? Like 24 hours post launch, 51,000 queries, like 2.2 terabytes in bandwidth. Going back to the bandwidth problem that you have before, like you would have definitely run into that. Day two, you doubled that. It's like 111,000 queries, four and a half terabytes in bandwidth, 22 million images served. So it's pretty crazy. Sharif: Yeah. I think we're, we're doing like over 5 billion images served per month now. It's like, yeah, that's, it's pretty crazy how much things have changed since then. Swyx: Yeah. I'm still showing people like today, even today, you know, it's been a few months now. This is where you start to learn image prompting because they don't know. Sharif: Yeah, it is interesting. And I, it's weird because I didn't really think it would be a company. I thought it would just be like a cool utility or like a cool tool that I would use for myself. And I really was just building it for myself just because I didn't want to use the Discord search bar. But yeah, it was interesting that a lot of other people found it pretty useful as well. [11:00] How Lexica WorksSwyx: So there's a lot of things that you release in a short amount of time. The God mode search was kind of like, obviously the first thing, I guess, like maybe to talk about some of the underlying technology you're using clip to kind of find, you know, go from image to like description and then let people search it. Maybe talk a little bit about what it takes to actually make the search magic happen. Sharif: Yeah. So the original search was just using Postgres' full text search and it would only search the text contents of the prompt. But I was inspired by another website called Same Energy, where like a visual search engine. It's really cool. Do you know what happened to that guy? I don't. Swyx: He released it and then he disappeared from the internet. Sharif: I don't know what happened to him, but I'm sure he's working on something really cool. He also worked on like Tabnine, which was like the very first version of Copilot or like even before Copilot was Copilot. But yeah, inspired by that, I thought like being able to search images by their semantics. The contents of the image was really interesting. So I pretty much decided to create a search index on the clip embeddings, the clip image embeddings of all the images. And when you would search it, we would just do KNN search on pretty much the image embedding index. I mean, we had way too many embeddings to store on like a regular database. So we had to end up using FAISS, which is a Facebook library for really fast KNN search and embedding search. That was pretty fun to set up. It actually runs only on CPUs, which is really cool. It's super efficient. You compute the embeddings on GPUs, but like you can serve it all on like an eight core server and it's really, really fast. Once we released the semantic search on the clip embeddings, people were using the search way more. And you could do other cool things. You could do like similar image search where if you found like a specific image you liked, you could upload it and it would show you relevant images as well. Swyx: And then right after that, you raised your seed money from AI grant, NetFreedman, then Gross. Sharif: Yeah, we raised about $5 million from Daniel Gross. And then we also participated in AI grant. That was pretty cool. That was kind of the inflection point. Not much before that point, Lexic was kind of still a side project. And I told myself that I would focus on it full time or I'd consider focusing on it full time if we had broke like a million users. I was like, oh, that's gonna be like years away for sure. And then we ended up doing that in like the first week and a half. I was like, okay, there's something here. And it was kind of that like deal was like growing like pretty slowly and like pretty linearly. And then Lexica was just like this thing that just kept going up and up and up. And I was so confused. I was like, man, people really like looking at pictures. This is crazy. Yeah. And then we decided to pivot the entire company and just focus on Lexica full time at that point. And then we raised our seed round. [15:00] Being Chronically EarlySwyx: Yeah. So one thing that you casually dropped out, the one that slip, you said you were working on Lexica before the launch of Stable Diffusion such that you were able to launch Lexica one day after Stable Diffusion. Sharif: Yeah.Swyx: How did you get so early into Stable Diffusion? Cause I didn't hear about it. Sharif: Oh, that's a good question. I, where did I first hear about Stable Diffusion? I'm not entirely sure. It must've been like somewhere on Twitter or something. That changed your life. Yeah, it was great. And I got into the discord cause I'd used Dolly too before, but, um, there were a lot of restrictions in place where you can generate human faces at the time. You can do that now. But when I first got access to it, like you couldn't do any faces. It was like, there were like a, the list of adjectives you couldn't use was quite long. Like I had a friend from Pakistan and it can generate anything with the word Pakistan in it for some reason. But Stable Diffusion was like kind of the exact opposite where there were like very, very few rules. So that was really, really fun and interesting, especially seeing the chaos of like a bunch of other people also using it right in front of you. That was just so much fun. And I just wanted to do something with it. I thought it was honestly really fun. Swyx: Oh, well, I was just trying to get tips on how to be early on things. Cause you're pretty consistently early to things, right? You were Stadia before Stadia. Um, and then obviously you were on. Sharif: Well, Stadia is kind of shut down now. So I don't know if being early to that was a good one. Swyx: Um, I think like, you know, just being consistently early to things that, uh, you know, have a lot of potential, like one of them is going to work out and you know, then that's how you got Lexica. [16:00] From Search to Custom ModelsAlessio: How did you decide to go from search to running your own models for a generation? Sharif: That's a good question. So we kind of realized that the way people were using Lexica was they would have Lexica open in one tab and then in another tab, they'd have a Stable Diffusion interface. It would be like either a discord or like a local run interface, like the automatic radio UI, um, or something else. I just, I would watch people use it and they would like all tabs back and forth between Lexica and their other UI. And they would like to scroll through Lexica, click on the prompt, click on an image, copy the prompt, and then paste it and maybe change a word or two. And I was like, this should really kind of just be all within Lexica. Like, it'd be so cool if you could just click a button in Lexica and get an editor and generate your images. And I found myself also doing the all tab thing, or it was really frustrating. I was like, man, this is kind of tedious. Like I really wish it was much simpler. So we just built generations directly within Lexica. Um, so we do, we deployed it on, I don't remember when we first launched, I think it was November, December. And yeah, people love generating directly within it. [17:00] AI Grant LearningsSwyx: I was also thinking that this was coming out of AI grants where, you know, I think, um, yeah, I was like a very special program. I was just wondering if you learned anything from, you know, that special week where everyone was in town. Sharif: Yeah, that was a great week. I loved it. Swyx: Yeah. Bring us, bring us in a little bit. Cause it was awesome. There. Sharif: Oh, sure. Yeah. It's really, really cool. Like all the founders in AI grants are like fantastic people. And so I think the main takeaway from the AI grant was like, you have this massive overhang in compute or in capabilities in terms of like these latest AI models, but to the average person, there's really not that many products that are that cool or useful to them. Like the latest one that has hit the zeitgeist was chat GPT, which used arguably the same GPT three model, but like RLHF, but you could have arguably built like a decent chat GPT product just using the original GPT three model. But no one really did it. Now there were some restrictions in place and opening. I like to slowly release them over the few months or years after they release the original API. But the core premise behind AI grants is that there are way more capabilities than there are products. So focus on building really compelling products and get people to use them. And like to focus less on things like hitting state of the art on evals and more on getting users to use something. Swyx: Make something people want.Sharif: Exactly. Host: Yeah, we did an episode on LLM benchmarks and we kind of talked about how the benchmarks kind of constrain what people work on, because if your model is not going to do well, unlike the well-known benchmarks, it's not going to get as much interest and like funding. So going at it from a product lens is cool. [19:30] The Text to Image Illuminati?Swyx: My hypothesis when I was seeing the sequence of events for AI grants and then for Lexica Aperture was that you had some kind of magical dinner with Emad and David Holtz. And then they taught you the secrets of training your own model. Is that how it happens? Sharif: No, there's no secret dinner. The Illuminati of text to image. We did not have a meeting. I mean, even if we did, I wouldn't tell you. But it really boils down to just having good data. If you think about diffusion models, really the only thing they do is learn a distribution of data. So if you have high quality data, learn that high quality distribution. Or if you have low quality data, it will learn to generate images that look like they're from that distribution. So really it boils down to the data and the amount of data you have and that quality of that data, which means a lot of the work in training high quality models, at least diffusion models, is not really in the model architecture, but rather just filtering the data in a way that makes sense. So for Lexica, we do a lot of aesthetic scoring on images and we use the rankings we get from our website because we get tens of millions of people visiting it every month. So we can capture a lot of rankings. Oh, this person liked this image when they saw this one right next to it. Therefore, they probably preferred this one over that. You can do pairwise ranking to rank images and then compute like ELO scores. You can also just train aesthetic models to learn to classify a model, whether or not someone will like it or whether or not it's like, rank it on a scale of like one to ten, for example. So we mostly use a lot of the traffic we get from Lexica and use that to kind of filter our data sets and use that to train better aesthetic models. [20:30] How to Learn to Train ModelsSwyx: You had been a machine learning engineer before. You've been more of an infrastructure guy. To build, you were more of a prompt engineer with a bit of web design. This was the first time that you were basically training your own model. What was the wrap up like? You know, not to give away any secret sauce, but I think a lot of people who are traditional software engineers are feeling a lot of, I don't know, fear when encountering these kinds of domains. Sharif: Yeah, I think it makes a lot of sense. And to be fair, I didn't have much experience training massive models at this scale before I did it. A lot of times it's really just like, in the same way when you're first learning to program, you would just take the problem you're having, Google it, and go through the stack overflow post. And then you figure it out, but ultimately you will get to the answer. It might take you a lot longer than someone who's experienced, but I think there are enough resources out there where it's possible to learn how to do these things. Either just reading through GitHub issues for relevant models. Swyx: Oh God. Sharif: Yeah. It's really just like, you might be slower, but it's definitely still possible. And there are really great courses out there. The Fast AI course is fantastic. There's the deep learning book, which is great for fundamentals. And then Andrej Karpathy's online courses are also excellent, especially for language modeling. You might be a bit slower for the first few months, but ultimately I think if you have the programming skills, you'll catch up pretty quickly. It's not like this magical dark science that only three people in the world know how to do well. Probably was like 10 years ago, but now it's becoming much more open. You have open source collectives like Eleuther and LAION, where they like to share the details of their large scale training runs. So you can learn from a lot of those people. Swyx: Yeah. I think what is different for programmers is having to estimate significant costs upfront before they hit run. Because it's not a thing that you normally consider when you're coding, but yeah, like burning through your credits is a fear that people have. Sharif: Yeah, that does make sense. In that case, like fine tuning larger models gets you really, really far. Even using things like low rank adaptation to fine tune, where you can like fine tune much more efficiently on a single GPU. Yeah, I think people are underestimating how far you can really get just using open source models. I mean, before Lexica, I was working on Debuild and we were using the GP3 API, but I was also like really impressed at how well you could get open source models to run by just like using the API, collecting enough samples from like real world user feedback or real world user data using your product. And then just fine tuning the smaller open source models on those examples. And now you have a model that's pretty much state of the art for your specific domain. Whereas the runtime cost is like 10 times or even 100 times cheaper than using an API. Swyx: And was that like GPT-J or are you talking BERT? Sharif: I remember we tried GPT-J, but I think FLAN-T5 was like the best model we were able to use for that use case. FLAN-T5 is awesome. If you can, like if your prompt is small enough, it's pretty great. And I'm sure there are much better open source models now. Like Vicuna, which is like the GPT-4 variant of like Lama fine tuned on like GPT-4 outputs. Yeah, they're just going to get better and they're going to get better much, much faster. Swyx: Yeah. We're just talking in a previous episode to the creator of Dolly, Mike Conover, which is actually commercially usable instead of Vicuna, which is a research project. Sharif: Oh, wow. Yeah, that's pretty cool. [24:00] Why No Agents?Alessio: I know you mentioned being early. Obviously, agents are one of the hot things here. In 2021, you had this, please buy me AirPods, like a demo that you tweeted with the GPT-3 API. Obviously, one of the things about being early in this space, you can only do one thing at a time, right? And you had one tweet recently where you said you hoped that that demo would open Pandora's box for a bunch of weird GPT agents. But all we got were docs powered by GPT. Can you maybe talk a little bit about, you know, things that you wish you would see or, you know, in the last few, last few weeks, we've had, you know, Hugging GPT, Baby AGI, Auto GPT, all these different kind of like agent projects that maybe now are getting closer to the, what did you say, 50% of internet traffic being skips of GPT agents. What are you most excited about, about these projects and what's coming? Sharif: Yeah, so we wanted a way for users to be able to paste in a link for the documentation page for a specific API, and then describe how to call that API. And then the way we would need to pretty much do that for Debuild was we wondered if we could get an agent to browse the docs page, read through it, summarize it, and then maybe even do things like create an API key and register it for that user. To do that, we needed a way for the agent to read the web page and interact with it. So I spent about a day working on that demo where we just took the web page, serialized it into a more compact form that fit within the 2048 token limit of like GPT-3 at the time. And then just decide what action to do. And then it would, if the page was too long, it would break it down into chunks. And then you would have like a sub prompt, decide on which chunk had the best action. And then at the top node, you would just pretty much take that action and then run it in a loop. It was really, really expensive. I think that one 60 second demo cost like a hundred bucks or something, but it was wildly impractical. But you could clearly see that agents were going to be a thing, especially ones that could read and write and take actions on the internet. It was just prohibitively expensive at the time. And the context limit was way too small. But yeah, I think it seems like a lot of people are taking it more seriously now, mostly because GPT-4 is way more capable. The context limit's like four times larger at 8,000 tokens, soon 32,000. And I think the only problem that's left to solve is finding a really good representation for a webpage that allows it to be consumed by a text only model. So some examples are like, you could just take all the text and pass it in, but that's probably too long. You could take all the interactive only elements like buttons and inputs, but then you miss a lot of the relevant context. There are some interesting examples, which I really like is you could run the webpage or you could run the browser in a terminal based browser. So there are some browsers that run in your terminal, which serialize everything into text. And what you can do is just take that frame from that terminal based browser and pass that directly to the model. And it's like a really, really good representation of the webpage because they do things where for graphical elements, they kind of render it using ASCII blocks. But for text, they render it as actual text. So you could just remove all the weird graphical elements, just keep all the text. And that works surprisingly well. And then there are other problems to solve, which is how do you get the model to take an action? So for example, if you have a booking page and there's like a calendar and there are 30 days on the calendar, how do you get it to specify which button to press? It could say 30, and you can match string based and like find the 30. But for example, what if it's like a list of friends in Facebook and trying to delete a friend? There might be like 30 delete buttons. How do you specify which one to click on? The model might say like, oh, click on the one for like Mark. But then you'd have to figure out the delete button in relation to Mark. And there are some ways to solve this. One is there's a cool Chrome extension called Vimium, which lets you use Vim in your Chrome browser. And what you do is you can press F and over every interactive element, it gives you like a character or two characters. Or if you type those two characters, it presses that button or it opens or focuses on that input. So you could combine a lot of these ideas and then get a really good representation of the web browser in text, and then also give the model a really, really good way to control the browser as well. And I think those two are the core part of the problem. The reasoning ability is definitely there. If a model can score in the top 10% on the bar exam, it can definitely browse a web page. It's really just how do you represent text to the model and how do you get the model to perform actions back on the web page? Really, it's just an engineering problem. Swyx: I have one doubt, which I'd love your thoughts on. How do you get the model to pause when it doesn't have enough information and ask you for additional information because you under specified your original request? Sharif: This is interesting. I think the only way to do this is to have a corpus where your training data is like these sessions of agents browsing the web. And you have to pretty much figure out where the ones that went wrong or the agents that went wrong, or did they go wrong and just replace it with, hey, I need some help. And then if you were to fine tune a larger model on that data set, you would pretty much get them to say, hey, I need help on the instances where they didn't know what to do next. Or if you're using a closed source model like GPT-4, you could probably tell it if you're uncertain about what to do next, ask the user for help. And it probably would be pretty good at that. I've had to write a lot of integration tests in my engineering days and like the dome. Alessio: They might be over. Yeah, I hope so. I hope so. I don't want to, I don't want to deal with that anymore. I, yeah, I don't want to write them the old way. Yeah. But I'm just thinking like, you know, we had the robots, the TXT for like crawlers. Like I can definitely see the DOM being reshaped a little bit in terms of accessibility. Like sometimes you have to write expats that are like so long just to get to a button. Like there should be a better way to do it. And maybe this will drive the change, you know, making it easier for these models to interact with your website. Sharif: There is the Chrome accessibility tree, which is used by screen readers, but a lot of times it's missing a lot of, a lot of useful information. But like in a perfect world, everything would be perfectly annotated for screen readers and we could just use that. That's not the case. [29:30] GPT4 and MultimodalitySwyx: GPT-4 multimodal, has your buddy, Greg, and do you think that that would solve essentially browser agents or desktop agents? Sharif: Greg has not come through yet, unfortunately. But it would make things a lot easier, especially for graphically heavy web pages. So for example, you were using Yelp and like using the map view, it would make a lot of sense to use something like that versus a text based input. Where, how do you serialize a map into text? It's kind of hard to do that. So for more complex web pages, that would make it a lot easier. You get a lot more context to the model. I mean, it seems like that multimodal input is very dense in the sense that it can read text and it can read it really, really well. So you could probably give it like a PDF and it would be able to extract all the text and summarize it. So if it can do that, it could probably do anything on any webpage. Swyx: Yeah. And given that you have some experience integrating Clip with language models, how would you describe how different GPT-4 is compared to that stuff? Sharif: Yeah. Clip is entirely different in the sense that it's really just good at putting images and text into the same latent space. And really the only thing that's useful for is similarity and clustering. Swyx: Like literally the same energy, right? Sharif: Yeah. Swyx: Yeah. And then there's Blip and Blip2. I don't know if you like those. Sharif: Yeah. Blip2 is a lot better. There's actually a new project called, I think, Mini GPT-4. Swyx: Yes. It was just out today. Sharif: Oh, nice. Yeah. It's really cool. It's actually really good. I think that one is based on the Lama model, but yeah, that's, that's like another. Host: It's Blip plus Lama, right? So they, they're like running through Blip and then have Lama ask your, interpret your questions so that you do visual QA. Sharif: Oh, that's cool. That's really clever. Yeah. Ensemble models are really useful. Host: Well, so I was trying to articulate, cause that was, that's, there's two things people are talking about today. You have to like, you know, the moment you wake up, you open Hacker News and go like, all right, what's, what's the new thing today? One is Red Pajama. And then the other one is Mini GPT-4. So I was trying to articulate like, why is this not GPT-4? Like what is missing? And my only conclusion was it just doesn't do OCR yet. But I wonder if there's anything core to this concept of multimodality that you have to train these things together. Like what does one model doing all these things do that is separate from an ensemble of models that you just kind of duct tape together? Sharif: It's a good question. This is pretty related to interoperability. Like how do we understand that? Or how, how do we, why do models trained on different modalities within the same model perform better than two models perform or train separately? I can kind of see why that is the case. Like, it's kind of hard to articulate, but when you have two different models, you get the reasoning abilities of a language model, but also like the text or the vision understanding of something like Clip. Whereas Clip clearly lacks the reasoning abilities, but if you could somehow just put them both in the same model, you get the best of both worlds. There were even cases where I think the vision version of GPT-4 scored higher on some tests than the text only version. So like there might even be some additional learning from images as well. Swyx: Oh yeah. Well, uh, the easy answer for that was there was some chart in the test. That wasn't translated. Oh, when I read that, I was like, Oh yeah. Okay. That makes sense. Sharif: That makes sense. I thought it'd just be like, it sees more of the world. Therefore it has more tokens. Swyx: So my equivalent of this is I think it's a well-known fact that adding code to a language model training corpus increases its ability to do language, not just with code. So, the diversity of datasets that represent some kind of internal logic and code is obviously very internally logically consistent, helps the language model learn some internal structure. Which I think, so, you know, my ultimate test for GPT-4 is to show the image of like, you know, is this a pipe and ask it if it's a pipe or not and see what it does. Sharif: Interesting. That is pretty cool. Yeah. Or just give it a screenshot of your like VS code editor and ask it to fix the bug. Yeah. That'd be pretty wild if it could do that. Swyx: That would be adult AGI. That would be, that would be the grownup form of AGI. [33:30] Sharif’s Startup ManualSwyx: On your website, you have this, um, startup manual where you give a bunch of advice. This is fun. One of them was that you should be shipping to production like every two days, every other day. This seems like a great time to do it because things change every other day. But maybe, yeah, tell some of our listeners a little bit more about how you got to some of these heuristics and you obviously build different projects and you iterate it on a lot of things. Yeah. Do you want to reference this? Sharif: Um, sure. Yeah, I'll take a look at it. Swyx: And we'll put this in the show notes, but I just wanted you to have the opportunity to riff on this, this list, because I think it's a very good list. And what, which one of them helped you for Lexica, if there's anything, anything interesting. Sharif: So this list is, it's pretty funny. It's mostly just like me yelling at myself based on all the mistakes I've made in the past and me trying to not make them again. Yeah. Yeah. So I, the first one is like, I think the most important one is like, try when you're building a product, try to build the smallest possible version. And I mean, for Lexica, it was literally a, literally one screen in the react app where a post-process database, and it just showed you like images. And I don't even know if the first version had search. Like I think it did, but I'm not sure. Like, I think it was really just like a grid of images that were randomized, but yeah, don't build the absolute smallest thing that can be considered a useful application and ship it for Lexica. That was, it helps me write better prompts. That's pretty useful. It's not that useful, but it's good enough. Don't fall into the trap of intellectual indulgence with over-engineering. I think that's a pretty important one for myself. And also anyone working on new things, there's often times you fall into the trap of like thinking you need to add more and more things when in reality, like the moment it's useful, you should probably get in the hands of your users and they'll kind of set the roadmap for you. I know this has been said millions of times prior, but just, I think it's really, really important. And I think if I'd spent like two months working on Lexica, adding a bunch of features, it wouldn't have been anywhere as popular as it was if I had just released the really, really boiled down version alongside the stable diffusion release. Yeah. And then there are a few more like product development doesn't start until you launch. Think of your initial product as a means to get your users to talk to you. It's also related to the first point where you really just want people using something as quickly as you can get that to happen. And then a few more are pretty interesting. Create a product people love before you focus on growth. If your users are spontaneously telling other people to use your product, then you've built something people love. Swyx: So this is pretty, it sounds like you've internalized Paul Graham's stuff a lot. Yeah. Because I think he said stuff like that. Sharif: A lot of these are just probably me taking notes from books I found really interesting or like PG essays that were really relevant at the time. And then just trying to not forget them. I should probably read this list again. There's some pretty personalized advice for me here. Oh yeah. One of my favorite ones is, um, don't worry if what you're building doesn't sound like a business. Nobody thought Facebook would be a $500 billion company. It's easy to come up with a business model. Once you've made something people want, you can even make pretty web forms and turn that into a 200 person company. And then if you click the link, it's to LinkedIn for type form, which is now, uh, I think they're like an 800 person company or something like that. So they've grown quite a bit. There you go. Yeah. Pretty web forms are pretty good business, even though it doesn't sound like it. Yeah. It's worth a billion dollars. [38:30] Lexica Aperture V1/2/3Swyx: One way I would like to tie that to the history of Lexica, which we didn't go over, which was just walk us through like Aperture V1, V2, V3, uh, which you just released last week. And how maybe some of those principles helped you in that journey.Sharif: Yeah. So, um, V1 was us trying to create a very photorealistic version of our model of Sable to Fusion. Uh, V1 actually didn't turn out to be that popular. It turns out people loved not generating. Your marketing tweets were popular. They were quite popular. So I think at the time you couldn't get Sable to Fusion to generate like photorealistic images that were consistent with your prompt that well. It was more so like you were sampling from this distribution of images and you could slightly pick where you sampled from using your prompt. This was mostly just because the clip text encoder is not the best text encoder. If you use a real language model, like T5, you get much better results. Like the T5 XXL model is like a hundred times larger than the clip text encoder for Sable to Fusion 1.5. So you could kind of steer it into like the general direction, but for more complex prompts, it just didn't work. So a lot of our users actually complained that they preferred the 1.5, Sable to Fusion 1.5 model over the Aperture model. And it was just because a lot of people were using it to create like parts and like really weird abstract looking pictures that didn't really work well with the photorealistic model trained solely on images. And then for V2, we kind of took that into consideration and then just trained it more on a lot of the art images on Lexica. So we took a lot of images that were on Lexica that were art, used that to train aesthetic models that ranked art really well, and then filtered larger sets to train V2. And then V3 is kind of just like an improved version of that with much more data. I'm really glad we didn't spend too much time on V1. I think we spent about one month working on it, which is a lot of time, but a lot of the things we learned were useful for training future versions. Swyx: How do you version them? Like where do you decide, okay, this is V2, this is V3? Sharif: The versions are kind of weird where you can't really use semantic versions because like if you have a small update, you usually just make that like V2. Versions are kind of used for different base models, I'd say. So if you have each of the versions were a different base model, but we've done like fine tunes of the same version and then just release an update without incrementing the version. But I think when there's like a clear change between running the same prompt on a model and you get a different image, that should probably be a different version. [40:00] Request for AI Startup - LLM ToolsAlessio: So the startup manual was the more you can actually do these things today to make it better. And then you have a whole future page that has tips from, you know, what the series successor is going to be like to like why everyone's genome should be sequenced. There's a lot of cool stuff in there. Why do we need to develop stimulants with shorter half-lives so that we can sleep better. Maybe talk a bit about, you know, when you're a founder, you need to be focused, right? So sometimes there's a lot of things you cannot build. And I feel like this page is a bit of a collection of these. Like, yeah. Are there any of these things that you're like, if I were not building Lexica today, this is like a very interesting thing. Sharif: Oh man. Yeah. There's a ton of things that I want to build. I mean, off the top of my head, the most exciting one would be better tools for language models. And I mean, not tools that help us use language models, but rather tools for the language models themselves. So things like giving them access to browsers, giving them access to things like payments and credit cards, giving them access to like credit cards, giving them things like access to like real world robots. So like, it'd be cool if you could have a Boston dynamic spot powered by a language model reasoning module and you would like to do things for you, like go and pick up your order, stuff like that. Entirely autonomously given like high level commands. That'd be like number one thing if I wasn't working on Lexica. [40:00] Sequencing your GenomeAnd then there's some other interesting things like genomics I find really cool. Like there's some pretty cool things you can do with consumer genomics. So you can export your genome from 23andMe as a text file, like literally a text file of your entire genome. And there is another tool called Prometheus, I think, where you upload your 23andMe text file genome and then they kind of map specific SNPs that you have in your genome to studies that have been done on those SNPs. And it tells you really, really useful things about yourself. Like, for example, I have the SNP for this thing called delayed sleep phase disorder, which makes me go to sleep about three hours later than the general population. So like I used to always be a night owl and I never knew why. But after using Prometheus it pretty much tells you, oh, you have the specific genome for specific SNP for DSPS. It's like a really tiny percentage of the population. And it's like something you should probably know about. And there's a bunch of other things. It tells you your likelihood for getting certain diseases, for certain cancers, oftentimes, like even weird personality traits. There's one for like, I have one of the SNPs for increased risk taking and optimism, which is pretty weird. That's an actual thing. Like, I don't know how. This is the founder gene. You should sequence everybody. It's pretty cool. And it's like, it's like $10 for Prometheus and like 70 bucks for 23andMe. And it explains to you how your body works and like the things that are different from you or different from the general population. Wow. Highly recommend everyone do it. Like if you're, if you're concerned about privacy, just purchase a 23andMe kit with a fake name. You don't have to use your real name. I didn't use my real name. Swyx: It's just my genes. Worst you can do is clone me. It ties in with what you were talking about with, you know, we want the future to be like this. And like people are building uninspired B2B SaaS apps and you and I had an exchange about this. [42:00] Believe in Doing Great ThingsHow can we get more people to believe they can do great things? Sharif: That's a good question. And I like a lot of the things I've been working on with GP3. It has been like trying to solve this by getting people to think about more interesting ideas. I don't really know. I think one is just like the low effort version of this is just putting out really compelling demos and getting people inspired. And then the higher effort version is like actually building the products yourself and getting people to like realize this is even possible in the first place. Like I think the baby AGI project and like the GPT Asian projects on GitHub are like in practice today, they're not super useful, but I think they're doing an excellent job of getting people incredibly inspired for what can be possible with language models as agents. And also the Stanford paper where they had like the mini version of Sims. Yeah. That one was incredible. That was awesome. Swyx: It was adorable. Did you see the part where they invented day drinking? Sharif: Oh, they did? Swyx: Yeah. You're not supposed to go to these bars in the afternoon, but they were like, we're going to go anyway. Nice. Sharif: That's awesome. Yeah. I think we need more stuff like that. That one paper is probably going to inspire a whole bunch of teams to work on stuff similar to that. Swyx: And that's great. I can't wait for NPCs to actually be something that you talk to in a game and, you know, have their own lives and you can check in and, you know, they would have their own personalities as well. Sharif: Yeah. I was so kind of off topic. But I was playing the last of us part two and the NPCs in that game are really, really good. Where if you like, point a gun at them and they'll beg for their life and like, please, I have a family. And like when you kill people in the game, they're like, oh my God, you shot Alice. Like they're just NPCs, but they refer to each other by their names and like they plead for their lives. And this is just using regular conditional rules on NPC behavior. Imagine how much better it'd be if it was like a small GPT-4 agent running in every NPC and they had the agency to make decisions and plead for their lives. And I don't know, you feel way more guilty playing that game. Alessio: I'm scared it's going to be too good. I played a lot of hours of Fallout. So I feel like if the NPCs were a lot better, you would spend a lot more time playing the game. Yeah. [44:30] Lightning RoundLet's jump into lightning round. First question is your favorite AI product. Sharif: Favorite AI product. The one I use the most is probably ChatGPT. The one I'm most excited about is, it's actually a company in AI grants. They're working on a version of VS code. That's like an entirely AI powered cursor, yeah. Cursor where you would like to give it a prompt and like to iterate on your code, not by writing code, but rather by just describing the changes you want to make. And it's tightly integrated into the editor itself. So it's not just another plugin. Swyx: Would you, as a founder of a low code prompting-to-code company that pivoted, would you advise them to explore some things or stay away from some things? Like what's your learning there that you would give to them?Sharif: I would focus on one specific type of code. So if I'm building a local tool, I would try to not focus too much on appealing developers. Whereas if I was building an alternative to VS code, I would focus solely on developers. So in that, I think they're doing a pretty good job focusing on developers. Swyx: Are you using Cursor right now? Sharif: I've used it a bit. I haven't converted fully, but I really want to. Okay. It's getting better really, really fast. Yeah. Um, I can see myself switching over sometime this year if they continue improving it. Swyx: Hot tip for, for ChatGPT, people always say, you know, they love ChatGPT. Biggest upgrade to my life right now is the, I forked a menu bar app I found on GitHub and now I just have it running in a menu bar app and I just do command shift G and it pops it up as a single use thing. And there's no latency because it just always is live. And I just type, type in the thing I want and then it just goes away after I'm done. Sharif: Wow. That's cool. Big upgrade. I'm going to install that. That's cool. Alessio: Second question. What is something you thought would take much longer, but it's already here? Like what, what's your acceleration update? Sharif: Ooh, um, it would take much longer, but it's already here. This is your question. Yeah, I know. I wasn't prepared. Um, so I think it would probably be kind of, I would say text to video. Swyx: Yeah. What's going on with that? Sharif: I think within this year, uh, by the end of this year, we'll have like the jump between like the original DALL-E one to like something like mid journey. Like we're going to see that leap in text to video within the span of this year. Um, it's not already here yet. So I guess the thing that surprised me the most was probably the multi-modality of GPT four in the fact that it can technically see things, which is pretty insane. Swyx: Yeah. Is text to video something that Aperture would be interested in? Sharif: Uh, it's something we're thinking about, but it's still pretty early. Swyx: There was one project with a hand, um, animation with human poses. It was also coming out of Facebook. I thought that was a very nice way to accomplish text to video while having a high degree of control. I forget the name of that project. It was like, I think it was like drawing anything. Swyx: Yeah. It sounds familiar. Well, you already answered a year from now. What will people be most surprised by? Um, and maybe the, uh, the usual requests for startup, you know, what's one thing you will pay for if someone built it? Sharif: One thing I would pay for if someone built it. Um, so many things, honestly, I would probably really like, um, like I really want people to build more, uh, tools for language models, like useful tools, give them access to Chrome. And I want to be able to give it a task. And then just, it goes off and spins up a hundred agents that perform that task. And like, sure. Like 80 of them might fail, but like 20 of them might kind of succeed. That's all you really need. And they're agents. You can spin up thousands of them. It doesn't really matter. Like a lot of large numbers are on your side. So that'd be, I would pay a lot of money for that. Even if it was capable of only doing really basic tasks, like signing up for a SAS tool and booking a call or something. If you could do even more things where it could have handled the email, uh, thread and like get the person on the other end to like do something where like, I don't even have to like book the demo. They just give me access to it. That'd be great. Yeah. More, more. Like really weird language model tools would be really fun.Swyx: Like our chat, GPT plugins, a step in the right direction, or are you envisioning something else? Sharif: I think GPT, chat GPT plugins are great, but they seem to only have right-only access right now. I also want them to have, I want these like theoretical agents to have right access to the world too. So they should be able to perform actions on web browsers, have their own email inbox, and have their own credit card with their own balance. Like take it, send emails to people that might be useful in achieving their goal. Ask them for help. Be able to like sign up and register for accounts on tools and services and be able to like to use graphical user interfaces really, really well. And also like to phone home if they need help. Swyx: You just had virtual employees. You want to give them a Brex card, right? Sharif: I wouldn't be surprised if, a year from now there was Brex GPT or it's like Brex cards for your GPT agents. Swyx: I mean, okay. I'm excited by this. Yeah. Kind of want to build it. Sharif: You should. Yeah. Alessio: Well, just to wrap up, we always have like one big takeaway for people, like, you know, to display on a signboard for everyone to see what is the big message to everybody. Sharif: Yeah. I think the big message to everybody is you might think that a lot of the time the ideas you have have already been done by someone. And that may be the case, but a lot of the time the ideas you have are actually pretty unique and no one's ever tried them before. So if you have weird and interesting ideas, you should actually go out and just do them and make the thing and then share that with the world. Cause I feel like we need more people building weird ideas and less people building like better GPT search for your documentation. Host: There are like 10 of those in the recent OST patch. Well, thank you so much. You've been hugely inspiring and excited to see where Lexica goes next. Sharif: Appreciate it. Thanks for having me. Get full access to Latent Space at www.latent.space/subscribe
50:3708/05/2023
No Moat: Closed AI gets its Open Source wakeup call — ft. Simon Willison
It’s now almost 6 months since Google declared Code Red, and the results — Jeff Dean’s recap of 2022 achievements and a mass exodus of the top research talent that contributed to it in January, Bard’s rushed launch in Feb, a slick video showing Google Workspace AI features and confusing doubly linked blogposts about PaLM API in March, and merging Google Brain and DeepMind in April — have not been inspiring. Google’s internal panic is in full display now with the surfacing of a well written memo, written by software engineer Luke Sernau written in early April, revealing internal distress not seen since Steve Yegge’s infamous Google Platforms Rant. Similar to 2011, the company’s response to an external challenge has been to mobilize the entire company to go all-in on a (from the outside) vague vision.Google’s misfortunes are well understood by now, but the last paragraph of the memo: “We have no moat, and neither does OpenAI”, was a banger of a mic drop.Combine this with news this morning that OpenAI lost $540m last year and will need as much as $100b more funding (after the complex $10b Microsoft deal in Jan), and the memo’s assertion that both Google and OpenAI have “no moat” against the mighty open source horde have gained some credibility in the past 24 hours.Many are criticising this memo privately:* A CEO commented to me yesterday that Luke Sernau does not seem to work in AI related parts of Google and “software engineers don’t understand moats”. * Emad Mostaque, himself a perma-champion of open source and open models, has repeatedly stated that “Closed models will always outperform open models” because closed models can just wrap open ones.* Emad has also commented on the moats he does see: “Unique usage data, Unique content, Unique talent, Unique product, Unique business model”, most of which Google does have, and OpenAI less so (though it is winning on the talent front)* Sam Altman famously said that “very few to no one is Silicon Valley has a moat - not even Facebook” (implying that moats don’t actually matter, and you should spend your time thinking about more important things)* It is not actually clear what race the memo thinks Google and OpenAI are in vs Open Source. Neither are particularly concerned about running models locally on phones, and they are perfectly happy to let “a crazy European alpha male” run the last mile for them while they build actually monetizable cloud infrastructure.However moats are of intense interest by everybody keen on productized AI, cropping up in every Harvey, Jasper, and general AI startup vs incumbent debate. It is also interesting to take the memo at face value and discuss the searing hot pace of AI progress in open source. We hosted this discussion yesterday with Simon Willison, who apart from being an incredible communicator also wrote a great recap of the No Moat memo. 2,800 have now tuned in on Twitter Spaces, but we have taken the audio and cleaned it up here. Enjoy!Timestamps* [00:00:00] Introducing the Google Memo* [00:02:48] Open Source > Closed?* [00:05:51] Running Models On Device* [00:07:52] LoRA part 1* [00:08:42] On Moats - Size, Data* [00:11:34] Open Source Models are Comparable on Data* [00:13:04] Stackable LoRA* [00:19:44] The Need for Special Purpose Optimized Models* [00:21:12] Modular - Mojo from Chris Lattner* [00:23:33] The Promise of Language Supersets* [00:28:44] Google AI Strategy* [00:29:58] Zuck Releasing LLaMA* [00:30:42] Google Origin Confirmed* [00:30:57] Google's existential threat* [00:32:24] Non-Fiction AI Safety ("y-risk")* [00:35:17] Prompt Injection* [00:36:00] Google vs OpenAI* [00:41:04] Personal plugs: Simon and TravisTranscripts[00:00:00] Introducing the Google Memo[00:00:00] Simon Willison: So, yeah, this is a document, which Kate, which I first saw at three o'clock this morning, I think. It claims to be leaked from Google. There's good reasons to believe it is leaked from Google, and to be honest, if it's not, it doesn't actually matter because the quality of the analysis, I think stands alone.[00:00:15] If this was just a document by some anonymous person, I'd still think it was interesting and worth discussing. And the title of the document is We Have No Moat and neither does Open ai. And the argument it makes is that while Google and OpenAI have been competing on training bigger and bigger language models, the open source community is already starting to outrun them, given only a couple of months of really like really, really serious activity.[00:00:41] You know, Facebook lama was the thing that really kicked us off. There were open source language models like Bloom before that some G P T J, and they weren't very impressive. Like nobody was really thinking that they were. Chat. G P T equivalent Facebook Lama came out in March, I think March 15th. And was the first one that really sort of showed signs of being as capable maybe as chat G P T.[00:01:04] My, I don't, I think all of these models, they've been, the analysis of them has tend to be a bit hyped. Like I don't think any of them are even quite up to GT 3.5 standards yet, but they're within spitting distance in some respects. So anyway, Lama came out and then, Two weeks later Stanford Alpaca came out, which was fine tuned on top of Lama and was a massive leap forward in terms of quality.[00:01:27] And then a week after that Vicuna came out, which is to this date, the the best model I've been able to run on my own hardware. I, on my mobile phone now, like, it's astonishing how little resources you need to run these things. But anyway, the the argument that this paper made, which I found very convincing is it only took open source two months to get this far.[00:01:47] It's now every researcher in the world is kicking it on new, new things, but it feels like they're being there. There are problems that Google has been trying to solve that the open source models are already addressing, and really how do you compete with that, like with your, it's closed ecosystem, how are you going to beat these open models with all of this innovation going on?[00:02:04] But then the most interesting argument in there is it talks about the size of models and says that maybe large isn't a competitive advantage, maybe actually a smaller model. With lots of like different people fine tuning it and having these sort of, these LoRA l o r a stackable fine tuning innovations on top of it, maybe those can move faster.[00:02:23] And actually having to retrain your giant model every few months from scratch is, is way less useful than having small models that you can tr you can fine tune in a couple of hours on laptop. So it's, it's fascinating. I basically, if you haven't read this thing, you should read every word of it. It's not very long.[00:02:40] It's beautifully written. Like it's, it's, I mean, If you try and find the quotable lines in it, almost every line of it's quotable. Yeah. So, yeah, that's that, that, that's the status of this[00:02:48] Open Source > Closed?[00:02:48] swyx: thing. That's a wonderful summary, Simon. Yeah, there, there's so many angles we can take to this. I, I'll just observe one, one thing which if you think about the open versus closed narrative, Ima Mok, who is the CEO of Stability, has always been that open will trail behind closed, because the closed alternatives can always take.[00:03:08] Learnings and lessons from open source. And this is the first highly credible statement that is basically saying the exact opposite, that open source is moving than, than, than closed source. And they are scared. They seem to be scared. Which is interesting,[00:03:22] Travis Fischer: Travis. Yeah, the, the, the, a few things that, that I'll, I'll, I'll say the only thing which can keep up with the pace of AI these days is open source.[00:03:32] I think we're, we're seeing that unfold in real time before our eyes. And. You know, I, I think the other interesting angle of this is to some degree LLMs are they, they don't really have switching costs. They are going to be, become commoditized. At least that's, that's what a lot of, a lot of people kind of think to, to what extent is it Is it a, a rate in terms of, of pricing of these things?[00:03:55] , and they all kind of become roughly the, the, the same in, in terms of their, their underlying abilities. And, and open source is gonna, gonna be actively pushing, pushing that forward. And, and then this is kind of coming from, if it is to be believed the kind of Google or an insider type type mentality around you know, where is the actual competitive advantage?[00:04:14] What should they be focusing on? How can they get back in into the game? When you know, when, when, when, when currently the, the, the external view of, of Google is that they're kind of spinning their wheels and they have this code red,, and it's like they're, they're playing catch up already.[00:04:28] Like how could they use the open source community and work with them, which is gonna be really, really hard you know, from a structural perspective given Google's place in the ecosystem. But a, a lot, lot, a lot of jumping off points there.[00:04:42] Alessio Fanelli: I was gonna say, I think the Post is really focused on how do we get the best model, but it's not focused on like, how do we build the best product around it.[00:04:50] A lot of these models are limited by how many GPUs you can get to run them and we've seen on traditional open source, like everybody can use some of these projects like Kafka and like Alaska for free. But the reality is that not everybody can afford to run the infrastructure needed for it.[00:05:05] So I, I think like the main takeaway that I have from this is like, A lot of the moats are probably around just getting the, the sand, so to speak, and having the GPUs to actually serve these models. Because even if the best model is open source, like running it at large scale for an end is not easy and like, it's not super convenient to get a lot, a lot of the infrastructure.[00:05:27] And we've seen that model work in open source where you have. The opensource project, and then you have a enterprise cloud hosted version for it. I think that's gonna look really different in opensource models because just hosting a model doesn't have a lot of value. So I'm curious to hear how people end up getting rewarded to do opensource.[00:05:46] You know, it's, we figured that out in infrastructure, but we haven't figured it out in in Alans[00:05:51] Running Models On Device[00:05:51] Simon Willison: yet. I mean, one thing I'll say is that the the models that you can run on your own devices are so far ahead of what I ever dreamed they would be at this point. Like Vicuna 13 b i i, I, I think is the current best available open mo model that I've played with.[00:06:08] It's derived from Facebook Lama, so you can't use it for commercial purposes yet. But the point about MCK 13 B is it runs in the browser directly on web gpu. There's this amazing web l l M project where you literally, your browser downloaded a two gigabyte file. And it fires up a chat g D style interface and it's quite good.[00:06:27] It can do rap battles between different animals and all of the kind of fun stuff that you'd expect to be able to do the language model running entirely in Chrome canary. It's shocking to me that that's even possible, but that kind of shows that once, once you get to inference, if you can shrink the model down and the techniques for shrinking these models, the, the first one was the the quantization.[00:06:48] Which the Lama CPP project really sort of popularized Matt can by using four bits instead of 16 bit floating point numbers, you can shrink it down quite a lot. And then there was a paper that came out days ago suggesting that you can prune the models and ditch half the model and maintain the same level of quality.[00:07:05] So with, with things like that, with all of these tricks coming together, it's really astonishing how much you can get done on hardware that people actually have in their pockets even.[00:07:15] swyx: Just for completion I've been following all of your posts. Oh, sorry. Yes. I just wanna follow up, Simon. You're, you said you're running a model on your phone. Which model is it? And I don't think you've written it up.[00:07:27] Simon Willison: Yeah, that one's vina. I did, did I write it up? I did. I've got a blog post about how it it, it, it knows who I am, sort of, but it said that I invented a, a, a pattern for living called bear or bunny pattern, which I definitely didn't, but I loved that my phone decided that I did.[00:07:44] swyx: I will hunt for that because I'm not yet running Vic on my phone and I feel like I should and, and as like a very base thing, but I'll, okay.[00:07:52] Stackable LoRA Modules[00:07:52] swyx: Also, I'll follow up two things, right? Like one I'm very interesting and let's, let's talk about that a little bit more because this concept of stackable improvements to models I think is extremely interesting.[00:08:00] Like, I would love to MPM install abilities onto my models, right? Which is really awesome. But the, the first thing thing is under-discussed is I don't get the panic. Like, honestly, like Google has the most moats. I I, I was arguing maybe like three months ago on my blog. Like Google has the most mote out of a lot of people because, hey, we have your calendar.[00:08:21] Hey, we have your email. Hey, we have your you know, Google Docs. Like, isn't that a, a sufficient mode? Like, why are these guys panicking so much? I don't, I still don't get it. Like, Sure open source is running ahead and like, it's, it's on device and whatev, what have you, but they have so much more mode.[00:08:36] Like, what are we talking about here? There's many dimensions to compete on.[00:08:42] On Moats - Size, Data[00:08:42] Travis Fischer: Yeah, there's like one of, one of the, the things that, that the author you know, mentions in, in here is when, when you start to, to, to have the feeling of what we're trailing behind, then you're, you're, you're, you're brightest researchers jump ship and go to OpenAI or go to work at, at, at academia or, or whatever.[00:09:00] And like the talent drain. At the, the level of the, the senior AI researchers that are pushing these things ahead within Google, I think is a serious, serious concern. And my, my take on it's a good point, right? Like, like, like, like what Google has modes. They, they, they're not running outta money anytime soon.[00:09:16] You know, I think they, they do see the level of the, the defensibility and, and the fact that they want to be, I'll chime in the, the leader around pretty much anything. Tech first. There's definitely ha ha have lost that, that, that feeling. Right? , and to what degree they can, they can with the, the open source community to, to get that back and, and help drive that.[00:09:38] You know all of the llama subset of models with, with alpaca and Vicuna, et cetera, that all came from, from meta. Right. Like that. Yeah. Like it's not licensed in an open way where you can build a company on top of it, but is now kind of driving this family of, of models, like there's a tree of models that, that they're, they're leading.[00:09:54] And where is Google in that, in that playbook? Like for a long time they were the one releasing those models being super open and, and now it's just they, they've seem to be trailing and there's, there's people jumping ship and to what degree can they, can they, can they. Close off those wounds and, and focus on, on where, where they, they have unique ability to, to gain momentum.[00:10:15] I think is a core part of my takeaway from this. Yeah.[00:10:19] Alessio Fanelli: And think another big thing in the post is, oh, as long as you have high quality data, like you don't need that much data, you can just use that. The first party data loops are probably gonna be the most important going forward if we do believe that this is true.[00:10:32] So, Databricks. We have Mike Conover from Databricks on the podcast, and they talked about how they came up with the training set for Dolly, which they basically had Databricks employees write down very good questions and very good answers for it. Not every company as the scale to do that. And I think products like Google, they have millions of people writing Google Docs.[00:10:54] They have millions of people using Google Sheets, then millions of people writing stuff, creating content on YouTube. The question is, if you wanna compete against these companies, maybe the model is not what you're gonna do it with because the open source kind of commoditizes it. But how do you build even better data?[00:11:12] First party loops. And that's kind of the hardest thing for startups, right? Like even if we open up the, the models to everybody and everybody can just go on GitHub and. Or hugging face and get the waste to the best model, but get enough people to generate data for me so that I can still make it good. That's, that's what I would be worried about if I was a, a new company.[00:11:31] How do I make that happen[00:11:32] Simon Willison: really quickly?[00:11:34] Open Source Models are Comparable on Data[00:11:34] Simon Willison: I'm not convinced that the data is that big a challenge. So there's this PO project. So the problem with Facebook LAMA is that it's not available for, for commercial use. So people are now trying to train a alternative to LAMA that's entirely on openly licensed data.[00:11:48] And that the biggest project around that is this red pajama project, which They released their training data a few weeks ago and it was 2.7 terabytes. Right? So actually tiny, right? You can buy a laptop that you can fit 2.7 terabytes on. Got it. But it was the same exact data that Facebook, the same thing that Facebook Lamb had been trained on.[00:12:06] Cuz for your base model. You're not really trying to teach it fact about the world. You're just trying to teach it how English and other languages work, how they fit together. And then the real magic is when you fine tune on top of that. That's what Alpaca did on top of Lama and so on. And the fine tuning sets, it looks like, like tens of thousands of examples to kick one of these role models into shape.[00:12:26] And tens of thousands of examples like Databricks spent a month and got the 2000 employees of their company to help kick in and it worked. You've got the open assistant project of crowdsourcing this stuff now as well. So it's achievable[00:12:40] swyx: sore throat. I agree. I think it's a fa fascinating point. Actually, so I've heard through the grapevine then red pajamas model.[00:12:47] Trained on the, the data that they release is gonna be releasing tomorrow. And it's, it's this very exciting time because the, the, there, there's a, there's a couple more models that are coming down the pike, which independently we produced. And so yeah, that we, everyone is challenging all these assumptions from, from first principles, which is fascinating.[00:13:04] Stackable LoRA[00:13:04] swyx: I, I did, I did wanted to, to like try to get a little bit more technical in terms of like the, the, the, the specific points race. Cuz this doc, this doc was just amazing. Can we talk about LoRA. I, I, I'll open up to Simon again if he's back.[00:13:16] Simon Willison: I'd rather someone else take on. LoRA, I've, I, I know as much as I've read in that paper, but not much more than that.[00:13:21] swyx: So I thought it was this kind of like an optimization technique. So LoRA stands for lower rank adaptation. But this is the first mention of LoRA as a form of stackable improvements. Where he I forget what, let, just, let me just kind of Google this. But obviously anyone's more knowledgeable please.[00:13:39] So come on in.[00:13:40] Alessio Fanelli: I, all of Lauren is through GTS Man, about 20 minutes on GT four, trying to figure out word. It was I study computer science, but this is not this is not my area of expertise. What I got from it is that basically instead of having to retrain the whole model you can just pick one of the ranks and you take.[00:13:58] One of like the, the weight matrix tests and like make two smaller matrixes from it and then just two to be retrained and training the whole model. So[00:14:08] swyx: it save a lot of Yeah. You freeze part of the thing and then you just train the smaller part like that. Exactly. That seems to be a area of a lot of fruitful research.[00:14:15] Yeah. I think Mini GT four recently did something similar as well. And then there's, there's, there's a, there's a Spark Model people out today that also did the same thing.[00:14:23] Simon Willison: So I've seen a lot of LoRA stable, the stable diffusion community has been using LoRA a lot. So they, in that case, they had a, I, the thing I've seen is people releasing LoRA's that are like you, you train a concept like a, a a particular person's face or something you release.[00:14:38] And the, the LoRA version of this end up being megabytes of data, like, which is, it's. You know, it's small enough that you can just trade those around and you can effectively load multiple of those into the model. But what I haven't realized is that you can use the same trick on, on language models. That was one of the big new things for me in reading the the leaks Google paper today.[00:14:56] Alessio Fanelli: Yeah, and I think the point to make around on the infrastructure, so what tragedy has told me is that when you're figuring out what rank you actually wanna do this fine tuning at you can have either go too low and like the model doesn't actually learn it. Or you can go too high and the model overfit those learnings.[00:15:14] So if you have a base model that everybody agrees on, then all the subsequent like LoRA work is done around the same rank, which gives you an advantage. And the point they made in the, that, since Lama has been the base for a lot of this LoRA work like they own. The, the mind share of the community.[00:15:32] So everything that they're building is compatible with their architecture. But if Google Opensources their own model the rank that they chose For LoRA on Lama might not work on the Google model. So all of the existing work is not portable. So[00:15:46] Simon Willison: the impression I got is that one of the challenges with LoRA is that you train all these LoRAs on top of your model, but then if you retrain that base model as LoRA's becoming invalid, right?[00:15:55] They're essentially, they're, they're, they're built for an exact model version. So this means that being the big company with all of the GPUs that can afford to retrain a model every three months. That's suddenly not nearly as valuable as it used to be because now maybe there's an open source model that's five years old at this point and has like multiple, multiple stacks of LoRA's trained all over the world on top of it, which can outperform your brand new model just because there's been so much more iteration on that base.[00:16:20] swyx: I, I think it's, I think it's fascinating. It's I think Jim Fan from Envidia was recently making this argument for transformers. Like even if we do come up with a better. Architecture, then transformers, they're the sheer hundreds and millions of dollars that have been invested on top of transformers.[00:16:34] Make it actually there is some switching costs and it's not exactly obvious that better architecture. Equals equals we should all switch immediately tomorrow. It's, it's, it's[00:16:44] Simon Willison: kinda like the, the difficulty of launching a new programming language today Yes. Is that pipeline and JavaScript have a million packages.[00:16:51] So no matter how good your new language is, if it can't tap into those existing package libraries, it's, it's not gonna be useful for, which is why Moji is so clever, because they did build on top of Pips. They get all of that existing infrastructure, all of that existing code working already.[00:17:05] swyx: I mean, what, what thought you, since you co-create JAO and all that do, do we wanna take a diversion into mojo?[00:17:10] No, no. I[00:17:11] Travis Fischer: would, I, I'd be happy to, to, to jump in, and get Simon's take on, on Mojo. 1, 1, 1 small, small point on LoRA is I, I, I just think. If you think about at a high level, what the, the major down downsides are of these, these large language models. It's the fact that they well they're, they're, they're difficult to, to train, right?[00:17:32] They, they tend to hallucinate and they are, have, have a static, like, like they were trained at a certain date, right? And with, with LoRA, I think it makes it a lot more amenable to Training new, new updates on top of that, that like base model on the fly where you can incorporate new, new data and in a way that is, is, is an interesting and potentially more optimal alternative than Doing the kind of in context generation cuz, cuz most of like who at perplexity AI or, or any of these, these approaches currently, it's like all based off of doing real-time searches and then injecting as much into the, the, the local context window as possible so that you, you try to ground your, your, your, your language model.[00:18:16] Both in terms of the, the information it has access to that, that, that helps to reduce hallucinations. It can't reduce it, but helps to reduce it and then also gives it access to up-to-date information that wasn't around for that, that massive like, like pre-training step. And I think LoRA in, in, in mine really makes it more, more amenable to having.[00:18:36] Having constantly shifting lightweight pre-training on top of it that scales better than than normal. Pre I'm sorry. Fine tune, fine tuning. Yeah, that, that was just kinda my one takeaway[00:18:45] Simon Willison: there. I mean, for me, I've never been, I want to run models on my own hard, I don't actually care about their factual content.[00:18:52] Like I don't need a model that's been, that's trained on the most upstate things. What I need is a model that can do the bing and bar trick, right? That can tell when it needs to run a search. And then go and run a search to get extra information and, and bring that context in. And similarly, I wanted to be able to operate tools where it can access my email or look at my notes or all of those kinds of things.[00:19:11] And I don't think you need a very powerful model for that. Like that's one of the things where I feel like, yeah, vicuna running on my, on my laptop is probably powerful enough to drive a sort of personal research assistant, which can look things up for me and it can summarize things for my notes and it can do all of that and I don't care.[00:19:26] But it doesn't know about the Ukraine war because the Ukraine war training cutoff, that doesn't matter. If it's got those additional capabilities, which are quite easy to build the reason everyone's going crazy building agents and tools right now is that it's a few lines of Python code, and a sort of couple of paragraphs to get it to.[00:19:44] The Need for Special Purpose Optimized Models[00:19:44] Simon Willison: Well, let's, let's,[00:19:45] Travis Fischer: let's maybe dig in on that a little bit. And this, this also is, is very related to mojo. Cuz I, I do think there are use cases and domains where having the, the hyper optimized, like a version of these models running on device is, is very relevant where you can't necessarily make API calls out on the fly.[00:20:03] and Aug do context, augmented generation. And I was, I was talking with, with a a researcher. At Lockheed Martin yesterday, literally about like, like the, the version of this that's running of, of language models running on, on fighter jets. Right? And you, you talk about like the, the, the amount of engineering, precision and optimization that has to go into, to those type of models.[00:20:25] And the fact that, that you spend so much money, like, like training a super distilled ver version where milliseconds matter it's a life or death situation there. You know, and you couldn't even, even remotely ha ha have a use case there where you could like call out and, and have, have API calls or something.[00:20:40] So I, I do think there's like keeping in mind the, the use cases where, where. There, there'll be use cases that I'm more excited about at, at the application level where, where, yeah, I want to to just have it be super flexible and be able to call out to APIs and have this agentic type type thing.[00:20:56] And then there's also industries and, and use cases where, where you really need everything baked into the model.[00:21:01] swyx: Yep. Agreed. My, my favorite piece take on this is I think DPC four as a reasoning engine, which I think came from the from Nathan at every two. Which I think, yeah, I see the hundred score over there.[00:21:12] Modular - Mojo from Chris Lattner[00:21:12] swyx: Simon, do you do you have a, a few seconds on[00:21:14] Simon Willison: mojo. Sure. So Mojo is a brand new program language you just announced a few days ago. It's not actually available yet. I think there's an online demo, but to zooming it becomes an open source language we can use. It's got really some very interesting characteristics.[00:21:29] It's a super set of Python, so anything written in Python, Python will just work, but it adds additional features on top that let you basically do very highly optimized code with written. In Python syntax, it compiles down the the main thing that's exciting about it is the pedigree that it comes from.[00:21:47] It's a team led by Chris Latner, built L L V M and Clang, and then he designed Swift at Apple. So he's got like three, three for three on, on extraordinarily impactful high performance computing products. And he put together this team and they've basically, they're trying to go after the problem of how do you build.[00:22:06] A language which you can do really high performance optimized work in, but where you don't have to do everything again from scratch. And that's where building on top of Python is so clever. So I wasn't like, if this thing came along, I, I didn't really pay attention to it until j Jeremy Howard, who built Fast ai put up a very detailed blog post about why he was excited about Mojo, which included a, there's a video demo in there, which everyone should watch because in that video he takes Matrix multiplication implemented in Python.[00:22:34] And then he uses the mojo extras to 2000 x. The performance of that matrix multiplication, like he adds a few static types functions sort of struck instead of the class. And he gets 2000 times the performance out of it, which is phenomenal. Like absolutely extraordinary. So yeah, that, that got me really excited.[00:22:52] Like the idea that we can still use Python and all of this stuff we've got in Python, but we can. Just very slightly tweak some things and get literally like thousands times upwards performance out of the things that matter. That's really exciting.[00:23:07] swyx: Yeah, I, I, I'm curious, like, how come this wasn't thought of before?[00:23:11] It's not like the, the, the concept of a language super set hasn't hasn't, has, has isn't, is completely new. But all, as far as I know, all the previous Python interpreter approaches, like the alternate runtime approaches are like they, they, they're more, they're more sort of, Fit conforming to standard Python, but never really tried this additional approach of augmenting the language.[00:23:33] The Promise of Language Supersets[00:23:33] swyx: I, I'm wondering if you have many insights there on, like, why, like why is this a, a, a breakthrough?[00:23:38] Simon Willison: Yeah, that's a really interesting question. So, Jeremy Howard's piece talks about this thing called M L I R, which I hadn't heard of before, but this was another Chris Latner project. You know, he built L L VM as a low level virtual machine.[00:23:53] That you could build compilers on top of. And then M L I R was this one that he initially kicked off at Google, and I think it's part of TensorFlow and things like that. But it was very much optimized for multiple cores and GPU access and all of that kind of thing. And so my reading of Jeremy Howard's article is that they've basically built Mojo on top of M L I R.[00:24:13] So they had a huge, huge like a starting point where they'd, they, they knew this technology better than anyone else. And because they had this very, very robust high performance basis that they could build things on. I think maybe they're just the first people to try and build a high, try and combine a high level language with M L A R, with some extra things.[00:24:34] So it feels like they're basically taking a whole bunch of ideas people have been sort of experimenting with over the last decade and bundled them all together with exactly the right team, the right level of expertise. And it looks like they've got the thing to work. But yeah, I mean, I've, I've, I'm. Very intrigued to see, especially once this is actually available and we can start using it.[00:24:52] It, Jeremy Howard is someone I respect very deeply and he's, he's hyping this thing like crazy, right? His headline, his, and he's not the kind of person who hypes things if they're not worth hyping. He said Mojo may be the biggest programming language advanced in decades. And from anyone else, I'd kind of ignore that headline.[00:25:09] But from him it really means something.[00:25:11] swyx: Yes, because he doesn't hype things up randomly. Yeah, and, and, and he's a noted skeptic of Julia which is, which is also another data science hot topic. But from the TypeScript and web, web development worlds there has been a dialect of TypeScript that was specifically optimized to compile, to web assembly which I thought was like promising and then, and, and eventually never really took off.[00:25:33] But I, I like this approach because I think more. Frameworks should, should essentially be languages and recognize that they're language superset and maybe working compilers that that work on them. And then that is the, by the way, that's the direction that React is going right now. So fun times[00:25:50] Simon Willison: type scripts An interesting comparison actually, cuz type script is effectively a superset of Java script, right?[00:25:54] swyx: It's, but there's no, it's purely[00:25:57] Simon Willison: types, right? Gotcha. Right. So, so I guess mojo is the soup set python, but the emphasis is absolutely on tapping into the performance stuff. Right.[00:26:05] swyx: Well, the just things people actually care about.[00:26:08] Travis Fischer: Yeah. The, the one thing I've found is, is very similar to the early days of type script.[00:26:12] There was the, the, the, the most important thing was that it's incrementally adoptable. You know, cuz people had a script code basis and, and they wanted to incrementally like add. The, the, the main value prop for TypeScript was reliability and the, the, the, the static typing. And with Mojo, Lucia being basically anyone who's a target a large enterprise user of, of Mojo or even researchers, like they're all going to be coming from a, a hardcore.[00:26:36] Background in, in Python and, and have large existing libraries. And the the question will be for what use cases will mojo be like a, a, a really good fit for that incremental adoption where you can still tap into your, your, your massive, like python exi existing infrastructure workflows, data tooling, et cetera.[00:26:55] And, and what does, what does that path to adoption look like?[00:26:59] swyx: Yeah, we, we, we don't know cuz it's a wait listed language which people were complaining about. They, they, the, the mojo creators were like saying something about they had to scale up their servers. And I'm like, what language requires essential server?[00:27:10] So it's a little bit suss, a little bit, like there's a, there's a cloud product already in place and they're waiting for it. But we'll see. We'll see. I mean, emojis should be promising in it. I, I actually want more. Programming language innovation this way. You know, I was complaining years ago that programming language innovation is all about stronger types, all fun, all about like more functional, more strong types everywhere.[00:27:29] And, and this is, the first one is actually much more practical which I, which I really enjoy. This is why I wrote about self provisioning run types.[00:27:36] Simon Willison: And[00:27:37] Alessio Fanelli: I mean, this is kind of related to the post, right? Like if you stop all of a sudden we're like, the models are all the same and we can improve them.[00:27:45] Like, where can we get the improvements? You know, it's like, Better run times, better languages, better tooling, better data collection. Yeah. So if I were a founder today, I wouldn't worry as much about the model, maybe, but I would say, okay, what can I build into my product and like, or what can I do at the engineering level that maybe it's not model optimization because everybody's working on it, but like you said, it's like, why haven't people thought of this before?[00:28:09] It's like, it's, it's definitely super hard, but I'm sure that if you're like Google or you're like open AI or you're like, Databricks, we got smart enough people that can think about these problems, so hopefully we see more of this.[00:28:21] swyx: You need, Alan? Okay. I promise to keep this relatively tight. I know Simon on a beautiful day.[00:28:27] It is a very nice day in California. I wanted to go through a few more points that you have pulled out Simon and, and just give you the opportunity to, to rant and riff and, and what have you. I, I, are there any other points from going back to the sort of Google OpenAI mode documents that, that you felt like we, we should dive in on?[00:28:44] Google AI Strategy[00:28:44] Simon Willison: I mean, the really interesting stuff there is the strategy component, right? The this idea that that Facebook accidentally stumbled into leading this because they put out this model that everyone else is innovating on top of. And there's a very open question for me as to would Facebook relic Lama to allow for commercial usage?[00:29:03] swyx: Is there some rumor? Is that, is that today?[00:29:06] Simon Willison: Is there a rumor about that?[00:29:07] swyx: That would be interesting? Yeah, I saw, I saw something about Zuck saying that he would release the, the Lama weights officially.[00:29:13] Simon Willison: Oh my goodness. No, that I missed. That is, that's huge.[00:29:17] swyx: Let me confirm the tweet. Let me find the tweet and then, yeah.[00:29:19] Okay.[00:29:20] Simon Willison: Because actually I met somebody from Facebook machine learning research a couple of weeks ago, and I, I pressed 'em on this and they said, basically they don't think it'll ever happen because if it happens, and then somebody does horrible fascist stuff with this model, all of the headlines will be Meg releases a monster into the world.[00:29:36] So, so hi. His, the, the, the, a couple of weeks ago, his feeling was that it's just too risky for them to, to allow it to be used like that. But a couple of weeks is, is, is a couple of months in AI world. So yeah, it wouldn't be, it feels to me like strategically Facebook should be jumping right on this because this puts them at the very.[00:29:54] The very lead of, of open source innovation around this stuff.[00:29:58] Zuck Releasing LLaMA[00:29:58] swyx: So I've pinned the tweet talking about Zuck and Zuck saying that meta will open up Lama. It's from the founder of Obsidian, which gives it a slight bit more credibility, but it is the only. Tweet that I can find about it. So completely unsourced,[00:30:13] we shall see. I, I, I mean I have friends within meta, I should just go ask them. But yeah, I, I mean one interesting angle on, on the memo actually is is that and, and they were linking to this in, in, in a doc, which is apparently like. Facebook got a bunch of people to do because they, they never released it for commercial use, but a lot of people went ahead anyway and, and optimized and, and built extensions and stuff.[00:30:34] They, they got a bunch of free work out of opensource, which is an interesting strategy.[00:30:39] There's okay. I don't know if I.[00:30:42] Google Origin Confirmed[00:30:42] Simon Willison: I've got exciting piece of news. I've just heard from somebody with contacts at Google that they've heard people in Google confirm the leak. That that document wasn't even legit Google document, which I don't find surprising at all, but I'm now up to 10, outta 10 on, on whether that's, that's, that's real.[00:30:57] Google's existential threat[00:30:57] swyx: Excellent. Excellent. Yeah, it is fascinating. Yeah, I mean the, the strategy is, is, is really interesting. I think Google has been. Definitely sleeping on monetizing. You know, I, I, I heard someone call when Google Brain and Devrel I merged that they would, it was like goodbye to the Xerox Park of our era and it definitely feels like Google X and Google Brain would definitely Xerox parks of our, of our era, and I guess we all benefit from that.[00:31:21] Simon Willison: So, one thing I'll say about the, the Google side of things, like the there was a question earlier, why are Google so worried about this stuff? And I think it's, it's just all about the money. You know, the, the, the engine of money at Google is Google searching Google search ads, and who uses Chachi PT on a daily basis, like me, will have noticed that their usage of Google has dropped like a stone.[00:31:41] Because there are many, many questions that, that chat, e p t, which shows you no ads at all. Is, is, is a better source of information for than Google now. And so, yeah, I'm not, it doesn't surprise me that Google would see this as an existential threat because whether or not they can be Bard, it's actually, it's not great, but it, it exists, but it hasn't it yet either.[00:32:00] And if I've got a Chatbook chatbot that's not showing me ads and chatbot that is showing me ads, I'm gonna pick the one that's not showing[00:32:06] swyx: me ads. Yeah. Yeah. I, I agree. I did see a prototype of Bing with ads. Bing chat with ads. I haven't[00:32:13] Simon Willison: seen the prototype yet. No.[00:32:15] swyx: Yeah, yeah. Anyway, I I, it, it will come obviously, and then we will choose, we'll, we'll go out of our ways to avoid ads just like we always do.[00:32:22] We'll need ad blockers and chat.[00:32:23] Excellent.[00:32:24] Non-Fiction AI Safety ("y-risk")[00:32:24] Simon Willison: So I feel like on the safety side, the, the safety side, there are basically two areas of safety that I, I, I sort of split it into. There's the science fiction scenarios, the AI breaking out and killing all humans and creating viruses and all of that kind of thing. The sort of the terminated stuff. And then there's the the.[00:32:40] People doing bad things with ai and that's latter one is the one that I think is much more interesting and that cuz you could u like things like romance scams, right? Romance scams already take billions of dollars from, from vulner people every year. Those are very easy to automate using existing tools.[00:32:56] I'm pretty sure for QNA 13 b running on my laptop could spin up a pretty decent romance scam if I was evil and wanted to use it for them. So that's the kind of thing where, I get really nervous about it, like the fact that these models are out there and bad people can use these bad, do bad things.[00:33:13] Most importantly at scale, like romance scamming, you don't need a language model to pull off one romance scam, but if you wanna pull off a thousand at once, the language model might be the, the thing that that helps you scale to that point. And yeah, in terms of the science fiction stuff and also like a model on my laptop that can.[00:33:28] Guess what comes next in a sentence. I'm not worried that that's going to break out of my laptop and destroy the world. There. There's, I'm get slightly nervous about the huge number of people who are trying to build agis on top of this models, the baby AGI stuff and so forth, but I don't think they're gonna get anywhere.[00:33:43] I feel like if you actually wanted a model that was, was a threat to human, a language model would be a tiny corner of what that thing. Was actually built on top of, you'd need goal setting and all sorts of other bits and pieces. So yeah, for the moment, the science fiction stuff doesn't really interest me, although it is a little bit alarming seeing more and more of the very senior figures in this industry sort of tip the hat, say we're getting a little bit nervous about this stuff now.[00:34:08] Yeah.[00:34:09] swyx: So that would be Jeff Iton and and I, I saw this me this morning that Jan Lacoon was like happily saying, this is fine. Being the third cheer award winner.[00:34:20] Simon Willison: But you'll see a lot of the AI safe, the people who've been talking about AI safety for the longest are getting really angry about science fiction scenarios cuz they're like, no, the, the thing that we need to be talking about is the harm that you can cause with these models right now today, which is actually happening and the science fiction stuff kind of ends up distracting from that.[00:34:36] swyx: I love it. You, you. Okay. So, so Uher, I don't know how to pronounce his name. Elier has a list of ways that AI will kill us post, and I think, Simon, you could write a list of ways that AI will harm us, but not kill us, right? Like the, the, the non-science fiction actual harm ways, I think, right? I haven't seen a, a actual list of like, hey, romance scams spam.[00:34:57] I, I don't, I don't know what else, but. That could be very interesting as a Hmm. Okay. Practical. Practical like, here are the situations we need to guard against because they are more real today than that we need to. Think about Warren, about obviously you've been a big advocate of prompt injection awareness even though you can't really solve them, and I, I worked through a scenario with you, but Yeah,[00:35:17] Prompt Injection[00:35:17] Simon Willison: yeah.[00:35:17] Prompt injection is a whole other side of this, which is, I mean, that if you want a risk from ai, the risk right now is everyone who's building puts a building systems that attackers can trivially subvert into stealing all of their private data, unlocking their house, all of that kind of thing. So that's another very real risk that we have today.[00:35:35] swyx: I think in all our personal bios we should edit in prompt injections already, like in on my website, I wanna edit in a personal prompt injections so that if I get scraped, like I all know if someone's like reading from a script, right? That that is generated by any iBot. I've[00:35:49] Simon Willison: seen people do that on LinkedIn already and they get, they get recruiter emails saying, Hey, I didn't read your bio properly and I'm just an AI script, but would you like a job?[00:35:57] Yeah. It's fascinating.[00:36:00] Google vs OpenAI[00:36:00] swyx: Okay. Alright, so topic. I, I, I think, I think this this, this mote is is a peak under the curtain of the, the internal panic within Google. I think it is very val, very validated. I'm not so sure they should care so much about small models or, or like on device models.[00:36:17] But the other stuff is interesting. There is a comment at the end that you had by about as for opening open is themselves, open air, doesn't matter. So this is a Google document talking about Google's position in the market and what Google should be doing. But they had a comment here about open eye.[00:36:31] They also say open eye had no mode, which is a interesting and brave comment given that open eye is the leader in, in a lot of these[00:36:38] Simon Willison: innovations. Well, one thing I will say is that I think we might have identified who within Google wrote this document. Now there's a version of it floating around with a name.[00:36:48] And I look them up on LinkedIn. They're heavily involved in the AI corner of Google. So my guess is that at Google done this one, I've worked for companies. I'll put out a memo, I'll write up a Google doc and I'll email, email it around, and it's nowhere near the official position of the company or of the executive team.[00:37:04] It's somebody's opinion. And so I think it's more likely that this particular document is somebody who works for Google and has an opinion and distributed it internally and then it, and then it got leaked. I dunno if it's necessarily. Represents Google's sort of institutional thinking about this? I think it probably should.[00:37:19] Again, this is such a well-written document. It's so well argued that if I was an executive at Google and I read that, I would, I would be thinking pretty hard about it. But yeah, I don't think we should see it as, as sort of the official secret internal position of the company. Yeah. First[00:37:34] swyx: of all, I might promote that person.[00:37:35] Cuz he's clearly more,[00:37:36] Simon Willison: oh, definitely. He's, he's, he's really, this is a, it's, I, I would hire this person about the strength of that document.[00:37:42] swyx: But second of all, this is more about open eye. Like I'm not interested in Google's official statements about open, but I was interested like his assertion, open eye.[00:37:50] Doesn't have a mote. That's a bold statement. I don't know. It's got the best people.[00:37:55] Travis Fischer: Well, I, I would, I would say two things here. One, it's really interesting just at a meta, meta point that, that they even approached it this way of having this public leak. It, it, it kind of, Talks a little bit to the fact that they, they, they felt that that doing do internally, like wasn't going to get anywhere or, or maybe this speaks to, to some of the like, middle management type stuff or, or within Google.[00:38:18] And then to the, the, the, the point about like opening and not having a moat. I think for, for large language models, it, it, it will be over, over time kind of a race to the bottom just because the switching costs are, are, are so low compared with traditional cloud and sas. And yeah, there will be differences in, in, in quality, but, but like over time, if you, you look at the limit of these things like the, I I think Sam Altman has been quoted a few times saying that the, the, the price of marginal price of intelligence will go to zero.[00:38:47] Time and the marginal price of energy powering that intelligence will, will also hit over time. And in that world, if you're, you're providing large language models, they become commoditized. Like, yeah. What, what is, what is your mode at that point? I don't know. I think they're e extremely well positioned as a team and as a company for leading this space.[00:39:03] I'm not that, that worried about that, but it is something from a strategic point of view to keep in mind about large language models becoming a commodity. So[00:39:11] Simon Willison: it's quite short, so I think it's worth just reading the, in fact, that entire section, it says epilogue. What about open ai? All of this talk of open source can feel unfair given open AI's current closed policy.[00:39:21] Why do we have to share if they won't? That's talking about Google sharing, but the fact of the matter is we are already sharing everything with them. In the form of the steady flow of poached senior researchers until we spent that tide. Secrecy is a moot point. I love that. That's so salty. And, and in the end, open eye doesn't matter.[00:39:38] They are making the same mistakes that we are in their posture relative to open source. And their ability to maintain an edge is necessarily in question. Open source alternatives. Canned will eventually eclipse them. Unless they change their stance in this respect, at least we can make the first move. So the argument this, this paper is making is that Google should go, go like meta and, and just lean right into open sourcing it and engaging with the wider open source community much more deeply, which OpenAI have very much signaled they are not willing to do.[00:40:06] But yeah, it's it's, it's read the whole thing. The whole thing is full of little snippets like that. It's just super fun. Yes,[00:40:12] swyx: yes. Read the whole thing. I, I, I also appreciate that the timeline, because it set a lot of really great context for people who are out of the loop. So Yeah.[00:40:20] Alessio Fanelli: Yeah. And the final conspiracy theory is that right before Sundar and Satya and Sam went to the White House this morning, so.[00:40:29] swyx: Yeah. Did it happen? I haven't caught up the White House statements.[00:40:34] Alessio Fanelli: No. That I, I just saw, I just saw the photos of them going into the, the White House. I've been, I haven't seen any post-meeting updates.[00:40:41] swyx: I think it's a big win for philanthropic to be at that table.[00:40:44] Alessio Fanelli: Oh yeah, for sure. And co here it's not there.[00:40:46] I was like, hmm. Interesting. Well, anyway,[00:40:50] swyx: yeah. They need, they need some help. Okay. Well, I, I promise to keep this relatively tight. Spaces do tend to have a, have a tendency of dragging on. But before we go, anything that you all want to plug, anything that you're working on currently maybe go around Simon are you still working on dataset?[00:41:04] Personal plugs: Simon and Travis[00:41:04] Simon Willison: I am, I am, I'm having a bit of a, so datasets my open source project that I've been working on. It's about helping people analyze and publish data. I'm having an existential crisis of it at the moment because I've got access to the chat g p T code, interpreter mode, and you can upload the sequel light database to that and it will do all of the things that I, on my roadmap for the next 12 months.[00:41:24] Oh my God. So that's frustrating. So I'm basically, I'm leaning data. My interest in data and AI are, are rapidly crossing over a lot harder about the AI features that I need to build on top of dataset. Make sure it stays relevant in a chat. G p t can do most of the stuff that it does already. But yeah the thing, I'll plug my blog simon willis.net.[00:41:43] I'm now updating it daily with stuff because AI move moved so quickly and I have a sub newsletter, which is effectively my blog, but in email form sent out a couple of times a week, which Please subscribe to that or RSS feed on my blog or, or whatever because I'm, I'm trying to keep track of all sorts of things and I'm publishing a lot at the moment.[00:42:02] swyx: Yes. You, you are, and we love you very much for it because you, you are a very good reporter and technical deep diver into things, into all the things. Thank you, Simon. Travis are you ready to announce the, I guess you've announced it some somewhat. Yeah. Yeah.[00:42:14] Travis Fischer: So I'm I, I just founded a company.[00:42:16] I'm working on a framework for building reliable agents that aren't toys and focused on more constrained use cases. And you know, I I, I look at kind of agi. And these, these audigy type type projects as like jumping all the way to str to, to self-driving. And, and we, we, we kind of wanna, wanna start with some more enter and really focus on, on reliable primitives to, to start that.[00:42:38] And that'll be an open source type script project. I'll be releasing the first version of that soon. And that's, that's it. Follow me you know, on here for, for this type of stuff, I, I, I, everything, AI[00:42:48] swyx: and, and spa, his chat PT bot,[00:42:50] Travis Fischer: while you still can. Oh yeah, the chat VT Twitter bot is about 125,000 followers now.[00:42:55] It's still running. I, I'm not sure if it's your credit. Yeah. Can you say how much you spent actually, No, no. Well, I think probably totally like, like a thousand bucks or something, but I, it's, it's sponsored by OpenAI, so I haven't, I haven't actually spent any real money.[00:43:08] swyx: What? That's[00:43:09] awesome.[00:43:10] Travis Fischer: Yeah. Yeah.[00:43:11] Well, once, once I changed, originally the logo was the Chachi VUI logo and it was the green one, and then they, they hit me up and asked me to change it. So it's now it's a purple logo. And they're, they're, they're cool with that. Yeah.[00:43:21] swyx: Yeah. Sending take down notices to people with G B T stuff apparently now.[00:43:26] So it's, yeah, it's a little bit of a gray area. I wanna write more on, on mos. I've been actually collecting and meaning to write a piece of mos and today I saw the memo, I was like, oh, okay. Like I guess today's the day we talk about mos. So thank you all. Thanks. Thanks, Simon. Thanks Travis for, for jumping on and thanks to all the audience for engaging on this with us.[00:43:42] We'll continue to engage on Twitter, but thanks to everyone. Cool. Thanks everyone. Bye. Alright, thanks everyone. Bye. Get full access to Latent Space at www.latent.space/subscribe
43:4905/05/2023
Training a SOTA Code LLM in 1 week and Quantifying the Vibes — with Reza Shabani of Replit
Latent Space is popping off! Welcome to the over 8500 latent space explorers who have joined us. Join us this month at various events in SF and NYC, or start your own!This post spent 22 hours at the top of Hacker News.As announced during their Developer Day celebrating their $100m fundraise following their Google partnership, Replit is now open sourcing its own state of the art code LLM: replit-code-v1-3b (model card, HF Space), which beats OpenAI’s Codex model on the industry standard HumanEval benchmark when finetuned on Replit data (despite being 77% smaller) and more importantly passes AmjadEval (we’ll explain!)We got an exclusive interview with Reza Shabani, Replit’s Head of AI, to tell the story of Replit’s journey into building a data platform, building GhostWriter, and now training their own LLM, for 22 million developers!8 minutes of this discussion go into a live demo discussing generated code samples - which is always awkward on audio. So we’ve again gone multimodal and put up a screen recording here where you can follow along on the code samples!Recorded in-person at the beautiful StudioPod studios in San Francisco.Full transcript is below the fold. We would really appreciate if you shared our pod with friends on Twitter, LinkedIn, Mastodon, Bluesky, or your social media poison of choice!Timestamps* [00:00:21] Introducing Reza* [00:01:49] Quantitative Finance and Data Engineering* [00:11:23] From Data to AI at Replit* [00:17:26] Replit GhostWriter* [00:20:31] Benchmarking Code LLMs* [00:23:06] AmjadEval live demo* [00:31:21] Aligning Models on Vibes* [00:33:04] Beyond Chat & Code Completion* [00:35:50] Ghostwriter Autonomous Agent* [00:38:47] Releasing Replit-code-v1-3b* [00:43:38] The YOLO training run* [00:49:49] Scaling Laws: from Kaplan to Chinchilla to LLaMA* [00:52:43] MosaicML* [00:55:36] Replit's Plans for the Future (and Hiring!)* [00:59:05] Lightning RoundShow Notes* Reza Shabani on Twitter and LinkedIn* also Michele Catasta and Madhav Singhal* Michele Catasta’s thread on the release of replit-code-v1-3b* Intro to Replit Ghostwriter* Replit Ghostwriter Chat and Building Ghostwriter Chat* Reza on how to train your own LLMs (their top blog of all time)* Our Benchmarks 101 episode where we discussed HumanEval* AmjadEval live demo* Nat.dev* MosaicML CEO Naveen Rao on Replit’s LLM* MosaicML Composer + FSDP code* Replit’s AI team is hiring in North America timezone - Fullstack engineer, Applied AI/ML, and other roles!Transcript[00:00:00] Alessio Fanelli: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my co-host, swyx, writer and editor of Latent Space.[00:00:21] Introducing Reza[00:00:21] swyx: Hey and today we have Reza Shabani, Head of AI at Replit. Welcome to the studio. Thank you. Thank you for having me. So we try to introduce people's bios so you don't have to repeat yourself, but then also get a personal side of you.[00:00:34] You got your PhD in econ from Berkeley, and then you were a startup founder for a bit, and, and then you went into systematic equity trading at BlackRock in Wellington. And then something happened and you were now head of AI at Relet. What should people know about you that might not be apparent on LinkedIn?[00:00:50] One thing[00:00:51] Reza Shabani: that comes up pretty often is whether I know how to code. Yeah, you'd be shocked. A lot of people are kind of like, do you know how to code? When I was talking to Amjad about this role, I'd originally talked to him, I think about a product role and, and didn't get it. Then he was like, well, I know you've done a bunch of data and analytics stuff.[00:01:07] We need someone to work on that. And I was like, sure, I'll, I'll do it. And he was like, okay, but you might have to know how to code. And I was like, yeah, yeah, I, I know how to code. So I think that just kind of surprises people coming from like Ancon background. Yeah. Of people are always kind of like, wait, even when people join Relet, they're like, wait, does this guy actually know how to code?[00:01:28] Is he actually technical? Yeah.[00:01:30] swyx: You did a bunch of number crunching at top financial companies and it still wasn't[00:01:34] Reza Shabani: obvious. Yeah. Yeah. I mean, I, I think someone like in a software engineering background, cuz you think of finance and you think of like calling people to get the deal done and that type of thing.[00:01:43] No, it's, it's not that as, as you know, it's very very quantitative. Especially what I did in, in finance, very quantitative.[00:01:49] Quantitative Finance and Data Engineering[00:01:49] swyx: Yeah, so we can cover a little bit of that and then go into the rapid journey. So as, as you, as you know, I was also a quantitative trader on the sell side and the buy side. And yeah, I actually learned Python there.[00:02:01] I learned my, I wrote my own data pipelines there before airflow was a thing, and it was just me writing running notebooks and not version controlling them. And it was a complete mess, but we were managing a billion dollars on, on my crappy code. Yeah, yeah. What was it like for you?[00:02:17] Reza Shabani: I guess somewhat similar.[00:02:18] I, I started the journey during grad school, so during my PhD and my PhD was in economics and it was always on the more data intensive kind of applied economic side. And, and specifically financial economics. And so what I did for my dissertation I recorded cnbc, the Financial News Network for 10 hours a day, every day.[00:02:39] Extracted the close captions from the video files and then used that to create a second by second transcript of, of cmbc, merged that on with high frequency trading, quote data and then looked at, you know, went in and did some, some nlp, tagging the company names, and and then looked at the price response or the change in price and trading volume in the seconds after a company was mentioned.[00:03:01] And, and this was back in. 2009 that I was doing this. So before cloud, before, before a lot of Python actually. And, and definitely before any of these packages were available to make this stuff easy. And that's where, where I had to really learn to code, like outside of you know, any kind of like data programming languages.[00:03:21] That's when I had to learn Python and had to learn all, all of these other skills to work it with data at that, at that scale. So then, you know, I thought I wanted to do academia. I did terrible on the academic market because everyone looked at my dissertation. They're like, this is cool, but this isn't economics.[00:03:37] And everyone in the computer science department was actually way more interested in it. Like I, I hung out there more than in the econ department and You know, didn't get a single academic offer. Had two offer. I think I only applied to like two industry jobs and got offers from both of them.[00:03:53] They, they saw value in it. One of them was BlackRock and turned it down to, to do my own startup, and then went crawling back two and a half years later after the startup failed.[00:04:02] swyx: Something on your LinkedIn was like you're trading Chinese news tickers or something. Oh, yeah. I forget,[00:04:07] Reza Shabani: forget what that was.[00:04:08] Yeah, I mean oh. There, there was so much stuff. Honestly, like, so systematic active equity at, at BlackRock is, was such an amazing. Group and you just end up learning so much and the, and the possibilities there. Like when you, when you go in and you learn the types of things that they've been trading on for years you know, like a paper will come out in academia and they're like, did you know you can use like this data on searches to predict the price of cars?[00:04:33] And it's like, you go in and they've been trading on that for like eight years. Yeah. So they're, they're really ahead of the curve on, on all of that stuff. And the really interesting stuff that I, that I found when I went in was all like, related to NLP and ml a lot of like transcript data, a lot of like parsing through the types of things that companies talk about, whether an analyst reports, conference calls, earnings reports and the devil's really in the details about like how you make sense of, of that information in a way that, you know, gives you insight into what the company's doing and, and where the market is, is going.[00:05:08] I don't know if we can like nerd out on specific strategies. Yes. Let's go, let's go. What, so one of my favorite strategies that, because it never, I don't think we ended up trading on it, so I can probably talk about it. And it, it just kind of shows like the kind of work that you do around this data.[00:05:23] It was called emerging technologies. And so the whole idea is that there's always a new set of emerging technologies coming onto the market and the companies that are ahead of that curve and stay up to date on on the latest trends are gonna outperform their, their competitors.[00:05:38] And that's gonna reflect in the, in the stock price. So when you have a theory like that, how do you actually turn that into a trading strategy? So what we ended up doing is, well first you have to, to determine what are the emergent technologies, like what are the new up and coming technologies.[00:05:56] And so we actually went and pulled data on startups. And so there's like startups in Silicon Valley. You have all these descriptions of what they do, and you get that, that corpus of like when startups were getting funding. And then you can run non-negative matrix factorization on it and create these clusters of like what the various Emerging technologies are, and you have this all the way going back and you have like social media back in like 2008 when Facebook was, was blowing up.[00:06:21] And and you have things like mobile and digital advertising and and a lot of things actually outside of Silicon Valley. They, you know, like shale and oil cracking. Yeah. Like new technologies in, in all these different types of industries. And then and then you go and you look like, which publicly traded companies are actually talking about these things and and have exposure to these things.[00:06:42] And those are the companies that end up staying ahead of, of their competitors. And a lot of the the cases that came out of that made a ton of sense. Like when mobile was emerging, you had Walmart Labs. Walmart was really far ahead in terms of thinking about mobile and the impact of mobile.[00:06:59] And, and their, you know, Sears wasn't, and Walmart did well, and, and Sears didn't. So lots of different examples of of that, of like a company that talks about a new emerging trend. I can only imagine, like right now, all of the stuff with, with ai, there must be tons of companies talking about, yeah, how does this affect their[00:07:17] swyx: business?[00:07:18] And at some point you do, you do lose the signal. Because you get overwhelmed with noise by people slapping a on everything. Right? Which is, yeah. Yeah. That's what the Long Island Iced Tea Company slaps like blockchain on their name and, you know, their stock price like doubled or something.[00:07:32] Reza Shabani: Yeah, no, that, that's absolutely right.[00:07:35] And, and right now that's definitely the kind of strategy that would not be performing well right now because everyone would be talking about ai. And, and that's, as you know, like that's a lot of what you do in Quant is you, you try to weed out other possible explanations for for why this trend might be happening.[00:07:52] And in that particular case, I think we found that, like the companies, it wasn't, it wasn't like Sears and Walmart were both talking about mobile. It's that Walmart went out of their way to talk about mobile as like a future, mm-hmm. Trend. Whereas Sears just wouldn't bring it up. And then by the time an invest investors are asking you about it, you're probably late to the game.[00:08:12] So it was really identifying those companies that were. At the cutting edge of, of new technologies and, and staying ahead. I remember like Domino's was another big one. Like, I don't know, you[00:08:21] swyx: remember that? So for those who don't know, Domino's Pizza, I think for the run of most of the 2010s was a better performing stock than Amazon.[00:08:29] Yeah.[00:08:31] Reza Shabani: It's insane.[00:08:32] swyx: Yeah. Because of their investment in mobile. Mm-hmm. And, and just online commerce and, and all that. I it must have been fun picking that up. Yeah, that's[00:08:40] Reza Shabani: that's interesting. And I, and I think they had, I don't know if you, if you remember, they had like the pizza tracker, which was on, on mobile.[00:08:46] I use it[00:08:46] swyx: myself. It's a great, it's great app. Great app. I it's mostly faked. I think that[00:08:50] Reza Shabani: that's what I heard. I think it's gonna be like a, a huge I don't know. I'm waiting for like the New York Times article to drop that shows that the whole thing was fake. We all thought our pizzas were at those stages, but they weren't.[00:09:01] swyx: The, the challenge for me, so that so there's a, there's a great piece by Eric Falkenstein called Batesian Mimicry, where every signal essentially gets overwhelmed by noise because the people who wants, who create noise want to follow the, the signal makers. So that actually is why I left quant trading because there's just too much regime changing and like things that would access very well would test poorly out a sample.[00:09:25] And I'm sure you've like, had a little bit of that. And then there's what was the core uncertainty of like, okay, I have identified a factor that performs really well, but that's one factor out of. 500 other factors that could be going on. You have no idea. So anyway, that, that was my existential uncertainty plus the fact that it was a very highly stressful job.[00:09:43] Reza Shabani: Yeah. This is a bit of a tangent, but I, I think about this all the time and I used to have a, a great answer before chat came out, but do you think that AI will win at Quant ever?[00:09:54] swyx: I mean, what is Rentech doing? Whatever they're doing is working apparently. Yeah. But for, for most mortals, I. Like just waving your wand and saying AI doesn't make sense when your sample size is actually fairly low.[00:10:08] Yeah. Like we have maybe 40 years of financial history, if you're lucky. Mm-hmm. Times what, 4,000 listed equities. It's actually not a lot. Yeah, no, it's,[00:10:17] Reza Shabani: it's not a lot at all. And, and constantly changing market conditions and made laden variables and, and all of, all of that as well. Yeah. And then[00:10:24] swyx: retroactively you're like, oh, okay.[00:10:26] Someone will discover a giant factor that, that like explains retroactively everything that you've been doing that you thought was alpha, that you're like, Nope, actually you're just exposed to another factor that you're just, you just didn't think about everything was momentum in.[00:10:37] Yeah. And one piece that I really liked was Andrew Lo. I think he had from mit, I think he had a paper on bid as Spreads. And I think if you, if you just. Taken, took into account liquidity of markets that would account for a lot of active trading strategies, alpha. And that was systematically declined as interest rates declined.[00:10:56] And I mean, it was, it was just like after I looked at that, I was like, okay, I'm never gonna get this right.[00:11:01] Reza Shabani: Yeah. It's a, it's a crazy field and I you know, I, I always thought of like the, the adversarial aspect of it as being the, the part that AI would always have a pretty difficult time tackling.[00:11:13] Yeah. Just because, you know, there's, there's someone on the other end trying to out, out game you and, and AI can, can fail in a lot of those situations. Yeah.[00:11:23] swyx: Cool.[00:11:23] From Data to AI at Replit[00:11:23] Alessio Fanelli: Awesome. And now you've been a rep almost two years. What do you do there? Like what does the, the team do? Like, how has that evolved since you joined?[00:11:32] Especially since large language models are now top of mind, but, you know, two years ago it wasn't quite as mainstream. So how, how has that evolved?[00:11:40] Reza Shabani: Yeah, I, so when I joined, I joined a year and a half ago. We actually had to build out a lot of, of data pipelines.[00:11:45] And so I started doing a lot of data work. And we didn't have you know, there, there were like databases for production systems and, and whatnot, but we just didn't have the the infrastructure to query data at scale and to process that, that data at scale and replica has tons of users tons of data, just tons of ripples.[00:12:04] And I can get into, into some of those numbers, but like, if you wanted to answer the question, for example of what is the most. Forked rep, rep on rep, you couldn't answer that back then because it, the query would just completely time out. And so a lot of the work originally just went into building data infrastructure, like modernizing the data infrastructure in a way where you can answer questions like that, where you can you know, pull in data from any particular rep to process to make available for search.[00:12:34] And, and moving all of that data into a format where you can do all of this in minutes as opposed to, you know, days or weeks or months. That laid a lot of the groundwork for building anything in, in ai, at least in terms of training our own own models and then fine tuning them with, with replica data.[00:12:50] So then you know, we, we started a team last year recruited people from, you know from a team of, of zero or a team of one to, to the AI and data team today. We, we build. Everything related to, to ghostrider. So that means the various features like explain code, generate code, transform Code, and Ghostrider chat which is like a in context ide or a chat product within the, in the ide.[00:13:18] And then the code completion models, which are ghostwriter code complete, which was the, the very first version of, of ghostrider. Yeah. And we also support, you know, things like search and, and anything in terms of what creates, or anything that requires like large data scale or large scale processing of, of data for the site.[00:13:38] And, and various types of like ML algorithms for the site, for internal use of the site to do things like detect and stop abuse. Mm-hmm.[00:13:47] Alessio Fanelli: Yep. Sounds like a lot of the early stuff you worked on was more analytical, kind of like analyzing data, getting answers on these things. Obviously this has evolved now into some.[00:13:57] Production use case code lms, how is the team? And maybe like some of the skills changed. I know there's a lot of people wondering, oh, I was like a modern data stack expert, or whatever. It's like I was doing feature development, like, how's my job gonna change? Like,[00:14:12] Reza Shabani: yeah. It's a good question. I mean, I think that with with language models, the shift has kind of been from, or from traditional ml, a lot of the shift has gone towards more like nlp backed ml, I guess.[00:14:26] And so, you know, there, there's an entire skill set of applicants that I no longer see, at least for, for this role which are like people who know how to do time series and, and ML across time. Right. And, and you, yeah. Like you, you know, that exact feeling of how difficult it is to. You know, you have like some, some text or some variable and then all of a sudden you wanna track that over time.[00:14:50] The number of dimensions that it, that it introduces is just wild and it's a totally different skill set than what we do in a, for example, in in language models. And it's very it's a, it's a skill that is kind of you know, at, at least at rep not used much. And I'm sure in other places used a lot, but a lot of the, the kind of excitement about language models has pulled away attention from some of these other ML areas, which are extremely important and, and I think still going to be valuable.[00:15:21] So I would just recommend like anyone who is a, a data stack expert, like of course it's cool to work with NLP and text data and whatnot, but I do think at some point it's going to you know, having, having skills outside of that area and in more traditional aspects of ML will, will certainly be valuable as well.[00:15:39] swyx: Yeah. I, I'd like to spend a little bit of time on this data stack notion pitch. You were even, you were effectively the first data hire at rep. And I just spent the past year myself diving into data ecosystem. I think a lot of software engineers are actually. Completely unaware that basically every company now eventually evolves.[00:15:57] The data team and the data team does everything that you just mentioned. Yeah. All of us do exactly the same things, set up the same pipelines you know, shop at the same warehouses essentially. Yeah, yeah, yeah, yeah. So that they enable everyone else to query whatever they, whatever they want. And to, to find those insights that that can drive their business.[00:16:15] Because everyone wants to be data driven. They don't want to do the janitorial work that it comes, that comes to, yeah. Yeah. Hooking everything up. What like, so rep is that you think like 90 ish people now, and then you, you joined two years ago. Was it like 30 ish people? Yeah, exactly. We're 30 people where I joined.[00:16:30] So and I just wanna establish your founders. That is exactly when we hired our first data hire at Vilify as well. I think this is just a very common pattern that most founders should be aware of, that like, You start to build a data discipline at this point. And it's, and by the way, a lot of ex finance people very good at this because that's what we do at our finance job.[00:16:48] Reza Shabani: Yeah. Yeah. I was, I was actually gonna Good say that is that in, in some ways, you're kind of like the perfect first data hire because it, you know, you know how to build things in a reliable but fast way and, and how to build them in a way that, you know, it's, it scales over time and evolves over time because financial markets move so quickly that if you were to take all of your time building up these massive systems, like the trading opportunities gone.[00:17:14] So, yeah. Yeah, they're very good at it. Cool. Okay. Well,[00:17:18] swyx: I wanted to cover Ghost Writer as a standalone thing first. Okay. Yeah. And then go into code, you know, V1 or whatever you're calling it. Yeah. Okay. Okay. That sounds good. So order it[00:17:26] Replit GhostWriter[00:17:26] Reza Shabani: however you like. Sure. So the original version of, of Ghost Writer we shipped in August of, of last year.[00:17:33] Yeah. And so this was a. This was a code completion model similar to GitHub's co-pilot. And so, you know, you would have some text and then it would predict like, what, what comes next. And this was, the original version was actually based off of the cogen model. And so this was an open source model developed by Salesforce that was trained on, on tons of publicly available code data.[00:17:58] And so then we took their their model, one of the smaller ones, did some distillation some other kind of fancy tricks to, to make it much faster and and deployed that. And so the innovation there was really around how to reduce the model footprint in a, to, to a size where we could actually serve it to, to our users.[00:18:20] And so the original Ghost Rider You know, we leaned heavily on, on open source. And our, our friends at Salesforce obviously were huge in that, in, in developing these models. And, but, but it was game changing just because we were the first startup to actually put something like that into production.[00:18:38] And, and at the time, you know, if you wanted something like that, there was only one, one name and, and one place in town to, to get it. And and at the same time, I think I, I'm not sure if that's like when the image models were also becoming open sourced for the first time. And so the world went from this place where, you know, there was like literally one company that had all of these, these really advanced models to, oh wait, maybe these things will be everywhere.[00:19:04] And that's exactly what's happened in, in the last Year or so, as, as the models get more powerful and then you always kind of see like an open source version come out that someone else can, can build and put into production very quickly at, at, you know, a fraction of, of the cost. So yeah, that was the, the kind of code completion Go Strider was, was really just, just that we wanted to fine tune it a lot to kind of change the way that our users could interact with it.[00:19:31] So just to make it you know, more customizable for our use cases on, on Rep. And so people on Relet write a lot of, like jsx for example, which I don't think was in the original training set for, for cogen. And and they do specific things that are more Tuned to like html, like they might wanna run, right?[00:19:50] Like inline style or like inline CSS basically. Those types of things. And so we experimented with fine tuning cogen a bit here and there, and, and the results just kind of weren't, weren't there, they weren't where you know, we, we wanted the model to be. And, and then we just figured we should just build our own infrastructure to, you know, train these things from scratch.[00:20:11] Like, LMS aren't going anywhere. This world's not, you know, it's, it's not like we're not going back to that world of there's just one, one game in town. And and we had the skills infrastructure and the, and the team to do it. So we just started doing that. And you know, we'll be this week releasing our very first open source code model.[00:20:31] And,[00:20:31] Benchmarking Code LLMs[00:20:31] Alessio Fanelli: and when you say it was not where you wanted it to be, how were you benchmarking[00:20:36] Reza Shabani: it? In that particular case, we were actually, so, so we have really two sets of benchmarks that, that we use. One is human eval, so just the standard kind of benchmark for, for Python, where you can generate some code or you give you give the model a function definition with, with some string describing what it's supposed to do, and then you allow it to complete that function, and then you run a unit test against it and and see if what it generated passes the test.[00:21:02] So we, we always kind of, we would run this on the, on the model. The, the funny thing is the fine tuned versions of. Of Cogen actually did pretty well on, on that benchmark. But then when we, we then have something called instead of human eval. We call it Amjad eval, which is basically like, what does Amjad think?[00:21:22] Yeah, it's, it's exactly that. It's like testing the vibes of, of a model. And it's, it's cra like I've never seen him, I, I've never seen anyone test the model so thoroughly in such a short amount of time. He's, he's like, he knows exactly what to write and, and how to prompt the model to, to get you know, a very quick read on, on its quote unquote vibes.[00:21:43] And and we take that like really seriously. And I, I remember there was like one, one time where we trained a model that had really good you know, human eval scores. And the vibes were just terrible. Like, it just wouldn't, you know, it, it seemed overtrained. So so that's a lot of what we found is like we, we just couldn't get it to Pass the vibes test no matter how the, how[00:22:04] swyx: eval.[00:22:04] Well, can you formalize I'm jal because I, I actually have a problem. Slight discomfort with human eval. Effectively being the only code benchmark Yeah. That we have. Yeah. Isn't that[00:22:14] Reza Shabani: weird? It's bizarre. It's, it's, it's weird that we can't do better than that in some, some way. So, okay. If[00:22:21] swyx: I, if I asked you to formalize Mja, what does he look for that human eval doesn't do well on?[00:22:25] Reza Shabani: Ah, that is a, that's a great question. A lot of it is kind of a lot of it is contextual like deep within, within specific functions. Let me think about this.[00:22:38] swyx: Yeah, we, we can pause for. And if you need to pull up something.[00:22:41] Reza Shabani: Yeah, I, let me, let me pull up a few. This, this[00:22:43] swyx: is gold, this catnip for people.[00:22:45] Okay. Because we might actually influence a benchmark being evolved, right. So, yeah. Yeah. That would be,[00:22:50] Reza Shabani: that would be huge. This was, this was his original message when he said the vibes test with, with flying colors. And so you have some, some ghostrider comparisons ghost Rider on the left, and cogen is on the right.[00:23:06] AmjadEval live demo[00:23:06] Reza Shabani: So here's Ghostrider. Okay.[00:23:09] swyx: So basically, so if I, if I summarize it from a, for ghosting the, there's a, there's a, there's a bunch of comments talking about how you basically implement a clone. Process or to to c Clooney process. And it's describing a bunch of possible states that he might want to, to match.[00:23:25] And then it asks for a single line of code for defining what possible values of a name space it might be to initialize it in amjadi val With what model is this? Is this your, this is model. This is the one we're releasing. Yeah. Yeah. It actually defines constants which are human readable and nice.[00:23:42] And then in the other cogen Salesforce model, it just initializes it to zero because it reads that it starts of an int Yeah, exactly. So[00:23:51] Reza Shabani: interesting. Yeah. So you had a much better explanation of, of that than than I did. It's okay. So this is, yeah. Handle operation. This is on the left.[00:24:00] Okay.[00:24:00] swyx: So this is rep's version. Yeah. Where it's implementing a function and an in filling, is that what it's doing inside of a sum operation?[00:24:07] Reza Shabani: This, so this one doesn't actually do the infill, so that's the completion inside of the, of the sum operation. But it, it's not, it's, it, it's not taking into account context after this value, but[00:24:18] swyx: Right, right.[00:24:19] So it's writing an inline lambda function in Python. Okay.[00:24:21] Reza Shabani: Mm-hmm. Versus[00:24:24] swyx: this one is just passing in the nearest available variable. It's, it can find, yeah.[00:24:30] Reza Shabani: Okay. So so, okay. I'll, I'll get some really good ones in a, in a second. So, okay. Here's tokenize. So[00:24:37] swyx: this is an assertion on a value, and it's helping to basically complete the entire, I think it looks like an E s T that you're writing here.[00:24:46] Mm-hmm. That's good. That that's, that's good. And then what does Salesforce cogen do? This is Salesforce cogen here. So is that invalidism way or what, what are we supposed to do? It's just making up tokens. Oh, okay. Yeah, yeah, yeah. So it's just, it's just much better at context. Yeah. Okay.[00:25:04] Reza Shabani: And, and I guess to be fair, we have to show a case where co cogen does better.[00:25:09] Okay. All right. So here's, here's one on the left right, which[00:25:12] swyx: is another assertion where it's just saying that if you pass in a list, it's going to throw an exception saying in an expectedly list and Salesforce code, Jen says,[00:25:24] Reza Shabani: This is so, so ghost writer was sure that the first argument needs to be a list[00:25:30] swyx: here.[00:25:30] So it hallucinated that it wanted a list. Yeah. Even though you never said it was gonna be a list.[00:25:35] Reza Shabani: Yeah. And it's, it's a argument of that. Yeah. Mm-hmm. So, okay, here's a, here's a cooler quiz for you all, cuz I struggled with this one for a second. Okay. What is.[00:25:47] swyx: Okay, so this is a four loop example from Amjad.[00:25:50] And it's, it's sort of like a q and a context in a chat bot. And it's, and it asks, and Amjad is asking, what does this code log? And it just paste in some JavaScript code. The JavaScript code is a four loop with a set time out inside of it with a cons. The console logs out the iteration variable of the for loop and increasing numbers of of, of times.[00:26:10] So it's, it goes from zero to five and then it just increases the, the delay between the timeouts each, each time. Yeah.[00:26:15] Reza Shabani: So, okay. So this answer was provided by by Bard. Mm-hmm. And does it look correct to you? Well,[00:26:22] the[00:26:22] Alessio Fanelli: numbers too, but it's not one second. It's the time between them increases.[00:26:27] It's like the first one, then the one is one second apart, then it's two seconds, three seconds. So[00:26:32] Reza Shabani: it's not, well, well, so I, you know, when I saw this and, and the, the message and the thread was like, Our model's better than Bard at, at coding Uhhuh. This is the Bard answer Uhhuh that looks totally right to me.[00:26:46] Yeah. And this is our[00:26:47] swyx: answer. It logs 5 5 55, what is it? Log five 50. 55 oh oh. Because because it logs the state of I, which is five by the time that the log happens. Mm-hmm. Yeah.[00:27:01] Reza Shabani: Oh God. So like we, you know we were shocked. Like, and, and the Bard dancer looked totally right to, to me. Yeah. And then, and somehow our code completion model mind Jude, like this is not a conversational chat model.[00:27:14] Mm-hmm. Somehow gets this right. And and, you know, Bard obviously a much larger much more capable model with all this fancy transfer learning and, and and whatnot. Some somehow, you know, doesn't get it right. So, This is the kind of stuff that goes into, into mja eval that you, you won't find in any benchmark.[00:27:35] Good. And and, and it's, it's the kind of thing that, you know, makes something pass a, a vibe test at Rep.[00:27:42] swyx: Okay. Well, okay, so me, this is not a vibe, this is not so much a vibe test as the, these are just interview questions. Yeah, that's, we're straight up just asking interview questions[00:27:50] Reza Shabani: right now. Yeah, no, the, the vibe test, the reason why it's really difficult to kind of show screenshots that have a vibe test is because it really kind of depends on like how snappy the completion is, how what the latency feels like and if it gets, if it, if it feels like it's making you more productive.[00:28:08] And and a lot of the time, you know, like the, the mix of, of really low latency and actually helpful content and, and helpful completions is what makes up the, the vibe test. And I think part of it is also, is it. Is it returning to you or the, the lack of it returning to you things that may look right, but be completely wrong.[00:28:30] I think that also kind of affects Yeah. Yeah. The, the vibe test as well. Yeah. And so, yeah, th this is very much like a, like a interview question. Yeah.[00:28:39] swyx: The, the one with the number of processes that, that was definitely a vibe test. Like what kind of code style do you expect in this situation? Yeah.[00:28:47] Is this another example? Okay.[00:28:49] Reza Shabani: Yeah. This is another example with some more Okay. Explanations.[00:28:53] swyx: Should we look at the Bard one[00:28:54] Reza Shabani: first? Sure. These are, I think these are, yeah. This is original GT three with full size 175. Billion[00:29:03] swyx: parameters. Okay, so you asked GPC three, I'm a highly intelligent question answering bot.[00:29:07] If you ask me a question that is rooted in truth, I'll give you the answer. If you ask me a question that is nonsense I will respond with unknown. And then you ask it a question. What is the square root of a bananas banana? It answers nine. So complete hallucination and failed to follow the instruction that you gave it.[00:29:22] I wonder if it follows if one, if you use an instruction to inversion it might, yeah. Do what better?[00:29:28] Reza Shabani: On, on the original[00:29:29] swyx: GP T Yeah, because I like it. Just, you're, you're giving an instructions and it's not[00:29:33] Reza Shabani: instruction tuned. Now. Now the interesting thing though is our model here, which does follow the instructions this is not instruction tuned yet, and we still are planning to instruction tune.[00:29:43] Right? So it's like for like, yeah, yeah, exactly. So,[00:29:45] swyx: So this is a replica model. Same question. What is the square of bananas? Banana. And it answers unknown. And this being one of the, the thing that Amjad was talking about, which you guys are. Finding as a discovery, which is, it's better on pure natural language questions, even though you trained it on code.[00:30:02] Exactly. Yeah. Hmm. Is that because there's a lot of comments in,[00:30:07] Reza Shabani: No. I mean, I think part of it is that there's a lot of comments and there's also a lot of natural language in, in a lot of code right. In terms of documentation, you know, you have a lot of like markdowns and restructured text and there's also just a lot of web-based code on, on replica, and HTML tends to have a lot of natural language in it.[00:30:27] But I don't think the comments from code would help it reason in this way. And, you know, where you can answer questions like based on instructions, for example. Okay. But yeah, it's, I know that that's like one of the things. That really shocked us is the kind of the, the fact that like, it's really good at, at natural language reasoning, even though it was trained on, on code.[00:30:49] swyx: Was this the reason that you started running your model on hella swag and[00:30:53] Reza Shabani: all the other Yeah, exactly. Interesting. And the, yeah, it's, it's kind of funny. Like it's in some ways it kind of makes sense. I mean, a lot of like code involves a lot of reasoning and logic which language models need and need to develop and, and whatnot.[00:31:09] And so you know, we, we have this hunch that maybe that using that as part of the training beforehand and then training it on natural language above and beyond that really tends to help. Yeah,[00:31:21] Aligning Models on Vibes[00:31:21] Alessio Fanelli: this is so interesting. I, I'm trying to think, how do you align a model on vibes? You know, like Bard, Bard is not purposefully being bad, right?[00:31:30] Like, there's obviously something either in like the training data, like how you're running the process that like, makes it so that the vibes are better. It's like when it, when it fails this test, like how do you go back to the team and say, Hey, we need to get better[00:31:44] Reza Shabani: vibes. Yeah, let's do, yeah. Yeah. It's a, it's a great question.[00:31:49] It's a di it's very difficult to do. It's not you know, so much of what goes into these models in, in the same way that we have no idea how we can get that question right. The programming you know, quiz question. Right. Whereas Bard got it wrong. We, we also have no idea how to take certain things out and or, and to, you know, remove certain aspects of, of vibes.[00:32:13] Of course there's, there's things you can do to like scrub the model, but it's, it's very difficult to, to get it to be better at something. It's, it's almost like all you can do is, is give it the right type of, of data that you think will do well. And then and, and of course later do some fancy type of like, instruction tuning or, or whatever else.[00:32:33] But a lot of what we do is finding the right mix of optimal data that we want to, to feed into the model and then hoping that the, that the data that's fed in is sufficiently representative of, of the type of generations that we want to do coming out. That's really the best that, that you can do.[00:32:51] Either the model has. Vibes or, or it doesn't, you can't teach vibes. Like you can't sprinkle additional vibes in it. Yeah, yeah, yeah. Same in real life. Yeah, exactly right. Yeah, exactly. You[00:33:04] Beyond Code Completion[00:33:04] Alessio Fanelli: mentioned, you know, co being the only show in town when you started, now you have this, there's obviously a, a bunch of them, right.[00:33:10] Cody, which we had on the podcast used to be Tap nine, kite, all these different, all these different things. Like, do you think the vibes are gonna be the main you know, way to differentiate them? Like, how are you thinking about. What's gonna make Ghost Rider, like stand apart or like, do you just expect this to be like table stakes for any tool?[00:33:28] So like, it just gonna be there?[00:33:30] Reza Shabani: Yeah. I, I do think it's, it's going to be table stakes for sure. I, I think that if you don't if you don't have AI assisted technology, especially in, in coding it's, it's just going to feel pretty antiquated. But but I do think that Ghost Rider stands apart from some of, of these other tools for for specific reasons too.[00:33:51] So this is kind of the, one of, one of the things that these models haven't really done yet is Come outside of code completion and outside of, of just a, a single editor file, right? So what they're doing is they're, they're predicting like the text that can come next, but they're not helping with the development process quite, quite yet outside of just completing code in a, in a text file.[00:34:16] And so the types of things that we wanna do with Ghost Rider are enable it to, to help in the software development process not just editing particular files. And so so that means using a right mix of like the right model for for the task at hand. But but we want Ghost Rider to be able to, to create scaffolding for you for, for these projects.[00:34:38] And so imagine if you would like Terraform. But, but powered by Ghostrider, right? I want to, I put up this website, I'm starting to get a ton of traffic to it and and maybe like I need to, to create a backend database. And so we want that to come from ghostrider as well, so it can actually look at your traffic, look at your code, and create.[00:34:59] You know a, a schema for you that you can then deploy in, in Postgres or, or whatever else? You know, I, I know like doing anything in in cloud can be a nightmare as well. Like if you wanna create a new service account and you wanna deploy you know, nodes on and, and have that service account, kind of talk to those nodes and return some, some other information, like those are the types of things that currently we have to kind of go, go back, go look at some documentation for Google Cloud, go look at how our code base does it you know, ask around in Slack, kind of figure that out and, and create a pull request.[00:35:31] Those are the types of things that we think we can automate away with with more advanced uses of, of ghostwriter once we go past, like, here's what would come next in, in this file. So, so that's the real promise of it, is, is the ability to help you kind of generate software instead of just code in a, in a particular file.[00:35:50] Ghostwriter Autonomous Agent[00:35:50] Reza Shabani: Are[00:35:50] Alessio Fanelli: you giving REPL access to the model? Like not rep, like the actual rep. Like once the model generates some of this code, especially when it's in the background, it's not, the completion use case can actually run the code to see if it works. There's like a cool open source project called Walgreen that does something like that.[00:36:07] It's like self-healing software. Like it gives a REPL access and like keeps running until it fixes[00:36:11] Reza Shabani: itself. Yeah. So, so, so right now there, so there's Ghostrider chat and Ghostrider code completion. So Ghostrider Chat does have, have that advantage in, in that it can it, it knows all the different parts of, of the ide and so for example, like if an error is thrown, it can look at the, the trace back and suggest like a fix for you.[00:36:33] So it has that type of integration. But the what, what we really want to do is is. Is merge the two in a way where we want Ghost Rider to be like, like an autonomous agent that can actually drive the ide. So in these action models, you know, where you have like a sequence of of events and then you can use you know, transformers to kind of keep track of that sequence and predict the next next event.[00:36:56] It's how, you know, companies like, like adapt work these like browser models that can, you know, go and scroll through different websites or, or take some, some series of actions in a, in a sequence. Well, it turns out the IDE is actually a perfect place to do that, right? So like when we talk about creating software, not just completing code in a file what do you do when you, when you build software?[00:37:17] You, you might clone a repo and then you, you know, will go and change some things. You might add a new file go down, highlight some text, delete that value, and point it to some new database, depending on the value in a different config file or in your environment. And then you would go in and add additional block code to, to extend its functionality and then you might deploy that.[00:37:40] Well, we, we have all of that data right there in the replica ide. And and we have like terabytes and terabytes of, of OT data you know, operational transform data. And so, you know, we can we can see that like this person has created a, a file what they call it, and, you know, they start typing in the file.[00:37:58] They go back and edit a different file to match the you know, the class name that they just put in, in the original file. All of that, that kind of sequence data is what we're looking to to train our next model on. And so that, that entire kind of process of actually building software within the I D E, not just like, here's some text what comes next, but rather the, the actions that go into, you know, creating a fully developed program.[00:38:25] And a lot of that includes, for example, like running the code and seeing does this work, does this do what I expected? Does it error out? And then what does it do in response to that error? So all, all of that is like, Insanely valuable information that we want to put into our, our next model. And and that's like, we think that one can be way more advanced than the, than this, you know, go straighter code completion model.[00:38:47] Releasing Replit-code-v1-3b[00:38:47] swyx: Cool. Well we wanted to dive in a little bit more on, on the model that you're releasing. Maybe we can just give people a high level what is being released what have you decided to open source and maybe why open source the story of the YOLO project and Yeah. I mean, it's a cool story and just tell it from the start.[00:39:06] Yeah.[00:39:06] Reza Shabani: So, so what's being released is the, the first version that we're going to release. It's a, it's a code model called replica Code V1 three B. So this is a relatively small model. It's 2.7 billion parameters. And it's a, it's the first llama style model for code. So, meaning it's just seen tons and tons of tokens.[00:39:26] It's been trained on 525 billion tokens of, of code all permissively licensed code. And it's it's three epox over the training set. And And, you know, all of that in a, in a 2.7 billion parameter model. And in addition to that, we, for, for this project or, and for this model, we trained our very own vocabulary as well.[00:39:48] So this, this doesn't use the cogen vocab. For, for the tokenize we, we trained a totally new tokenize on the underlying data from, from scratch, and we'll be open sourcing that as well. It has something like 32,000. The vocabulary size is, is in the 32 thousands as opposed to the 50 thousands.[00:40:08] Much more specific for, for code. And, and so it's smaller faster, that helps with inference, it helps with training and it can produce more relevant content just because of the you know, the, the vocab is very much trained on, on code as opposed to, to natural language. So, yeah, we'll be releasing that.[00:40:29] This week it'll be up on, on hugging pace so people can take it play with it, you know, fine tune it, do all type of things with it. We want to, we're eager and excited to see what people do with the, the code completion model. It's, it's small, it's very fast. We think it has great vibes, but we, we hope like other people feel the same way.[00:40:49] And yeah. And then after, after that, we might consider releasing the replica tuned model at, at some point as well, but still doing some, some more work around that.[00:40:58] swyx: Right? So there are actually two models, A replica code V1 three B and replica fine tune V1 three B. And the fine tune one is the one that has the 50% improvement in in common sense benchmarks, which is going from 20% to 30%.[00:41:13] For,[00:41:13] Reza Shabani: for yes. Yeah, yeah, yeah, exactly. And so, so that one, the, the additional tuning that was done on that was on the publicly available data on, on rep. And so, so that's, that's you know, data that's in public res is Permissively licensed. So fine tuning on on that. Then, Leads to a surprisingly better, like significantly better model, which is this retuned V1 three B, same size, you know, same, very fast inference, same vocabulary and everything.[00:41:46] The only difference is that it's been trained on additional replica data. Yeah.[00:41:50] swyx: And I think I'll call out that I think in one of the follow up q and as that Amjad mentioned, people had some concerns with using replica data. Not, I mean, the licensing is fine, it's more about the data quality because there's a lot of beginner code Yeah.[00:42:03] And a lot of maybe wrong code. Mm-hmm. But it apparently just wasn't an issue at all. You did[00:42:08] Reza Shabani: some filtering. Yeah. I mean, well, so, so we did some filtering, but, but as you know, it's when you're, when you're talking about data at that scale, it's impossible to keep out, you know, all of the, it's, it's impossible to find only select pieces of data that you want the, the model to see.[00:42:24] And, and so a lot of the, a lot of that kind of, you know, people who are learning to code material was in there anyway. And, and you know, we obviously did some quality filtering, but a lot of it went into the fine tuning process and it really helped for some reason. You know, there's a lot of high quality code on, on replica, but there's like you, like you said, a lot of beginner code as well.[00:42:46] And that was, that was the really surprising thing is that That somehow really improved the model and its reasoning capabilities. It felt much more kind of instruction tuned afterward. And, and you know, we have our kind of suspicions as as to why there's, there's a lot of like assignments on rep that kind of explain this is how you do something and then you might have like answers and, and whatnot.[00:43:06] There's a lot of people who learn to code on, on rep, right? And, and like, think of a beginner coder, like think of a code model that's learning to, to code learning this reasoning and logic. It's probably a lot more valuable to see that type of, you know, the, the type of stuff that you find on rep as opposed to like a large legacy code base that that is, you know, difficult to, to parse and, and figure out.[00:43:29] So, so that was very surprising to see, you know, just such a huge jump in in reasoning ability once trained on, on replica data.[00:43:38] The YOLO training run[00:43:38] swyx: Yeah. Perfect. So we're gonna do a little bit of storytelling just leading up to the, the an the developer day that you had last week. Yeah. My understanding is you decide, you raised some money, you decided to have a developer day, you had a bunch of announcements queued up.[00:43:52] And then you were like, let's train the language model. Yeah. You published a blog post and then you announced it on Devrel Day. What, what, and, and you called it the yolo, right? So like, let's just take us through like the[00:44:01] Reza Shabani: sequence of events. So so we had been building the infrastructure to kind of to, to be able to train our own models for, for months now.[00:44:08] And so that involves like laying out the infrastructure, being able to pull in the, the data processes at scale. Being able to do things like train your own tokenizes. And and even before this you know, we had to build out a lot of this data infrastructure for, for powering things like search.[00:44:24] There's over, I think the public number is like 200 and and 30 million res on, on re. And each of these res have like many different files and, and lots of code, lots of content. And so you can imagine like what it must be like to, to be able to query that, that amount of, of data in a, in a reasonable amount of time.[00:44:45] So we've You know, we spent a lot of time just building the infrastructure that allows for for us to do something like that and, and really optimize that. And, and this was by the end of last year. That was the case. Like I think I did a demo where I showed you can, you can go through all of replica data and parse the function signature of every Python function in like under two minutes.[00:45:07] And, and there's, you know, many, many of them. And so a and, and then leading up to developer day, you know, we had, we'd kind of set up these pipelines. We'd started training these, these models, deploying them into production, kind of iterating and, and getting that model training to production loop.[00:45:24] But we'd only really done like 1.3 billion parameter models. It was like all JavaScript or all Python. So there were still some things like we couldn't figure out like the most optimal way to to, to do it. So things like how do you pad or yeah, how do you how do you prefix chunks when you have like multi-language models, what's like the optimal way to do it and, and so on.[00:45:46] So you know, there's two PhDs on, on the team. Myself and Mike and PhDs tend to be like careful about, you know, a systematic approach and, and whatnot. And so we had this whole like list of things we were gonna do, like, oh, we'll test it on this thing and, and so on. And even these, like 1.3 billion parameter models, they were only trained on maybe like 20 billion tokens or 30 billion tokens.[00:46:10] And and then Amjad joins the call and he's like, no, let's just, let's just yolo this. Like, let's just, you know, we're raising money. Like we should have a better code model. Like, let's yolo it. Let's like run it on all the data. How many tokens do we have? And, and, and we're like, you know, both Michael and I are like, I, I looked at 'em during the call and we were both like, oh God is like, are we really just gonna do this?[00:46:33] And[00:46:34] swyx: well, what is the what's the hangup? I mean, you know that large models work,[00:46:37] Reza Shabani: you know that they work, but you, you also don't know whether or not you can improve the process in, in In important ways by doing more data work, scrubbing additional content, and, and also it's expensive. It's like, it, it can, you know it can cost quite a bit and if you, and if you do it incorrectly, you can actually get it.[00:47:00] Or you, you know, it's[00:47:02] swyx: like you hit button, the button, the go button once and you sit, sit back for three days.[00:47:05] Reza Shabani: Exactly. Yeah. Right. Well, like more like two days. Yeah. Well, in, in our case, yeah, two days if you're running 256 GP 100. Yeah. Yeah. And and, and then when that comes back, you know, you have to take some time to kind of to test it.[00:47:19] And then if it fails and you can't really figure out why, and like, yeah, it's, it's just a, it's kind of like a, a. A time consuming process and you just don't know what's going to, to come out of it. But no, I mean, I'm Judd was like, no, let's just train it on all the data. How many tokens do we have? We tell him and he is like, that's not enough.[00:47:38] Where can we get more tokens? Okay. And so Michele had this you know, great idea to to train it on multiple epox and so[00:47:45] swyx: resampling the same data again.[00:47:47] Reza Shabani: Yeah. Which, which can be, which is known risky or like, or tends to overfit. Yeah, you can, you can over overfit. But you know, he, he pointed us to some evidence that actually maybe this isn't really a going to be a problem.[00:48:00] And, and he was very persuasive in, in doing that. And so it, it was risky and, and you know, we did that training. It turned out. Like to actually be great for that, for that base model. And so then we decided like, let's keep pushing. We have 256 TVs running. Let's see what else we can do with it.[00:48:20] So we ran a couple other implementations. We ran you know, a the fine tune version as I, as I said, and that's where it becomes really valuable to have had that entire pipeline built out because then we can pull all the right data, de-dupe it, like go through the, the entire like processing stack that we had done for like months.[00:48:41] We did that in, in a matter of like two days for, for the replica data as well removed, you know, any of, any personal any pii like personal information removed, harmful content, removed, any of, of that stuff. And we just put it back through the that same pipeline and then trained on top of that.[00:48:59] And so I believe that replica tune data has seen something like 680. Billion tokens. And, and that's in terms of code, I mean, that's like a, a universe of code. There really isn't that much more out there. And, and it, you know, gave us really, really promising results. And then we also did like a UL two run, which allows like fill the middle capabilities and and, and will be, you know working to deploy that on, on rep and test that out as well soon.[00:49:29] But it was really just one of those Those cases where, like, leading up to developer day, had we, had we done this in this more like careful, systematic way what, what would've occurred in probably like two, three months. I got us to do it in, in a week. That's fun. It was a lot of fun. Yeah.[00:49:49] Scaling Laws: from Kaplan to Chinchilla to LLaMA[00:49:49] Alessio Fanelli: And so every time I, I've seen the stable releases to every time none of these models fit, like the chinchilla loss in, in quotes, which is supposed to be, you know, 20 tokens per, per, what's this part of the yo run?[00:50:04] Or like, you're just like, let's just throw out the tokens at it doesn't matter. What's most efficient or like, do you think there's something about some of these scaling laws where like, yeah, maybe it's good in theory, but I'd rather not risk it and just throw out the tokens that I have at it? Yeah,[00:50:18] Reza Shabani: I think it's, it's hard to, it's hard to tell just because there's.[00:50:23] You know, like, like I said, like these runs are expensive and they haven't, if, if you think about how many, how often these runs have been done, like the number of models out there and then, and then thoroughly tested in some forum. And, and so I don't mean just like human eval, but actually in front of actual users for actual inference as part of a, a real product that, that people are using.[00:50:45] I mean, it's not that many. And, and so it's not like there's there's like really well established kind of rules as to whether or not something like that could lead to, to crazy amounts of overfitting or not. You just kind of have to use some, some intuition around it. And, and what we kind of found is that our, our results seem to imply that we've really been under training these, these models.[00:51:06] Oh my god. And so like that, you know, all, all of the compute that we kind of. Through, with this and, and the number of tokens, it, it really seems to help and really seems to to improve. And I, and I think, you know, these things kind of happen where in, in the literature where everyone kind of converges to something seems to take it for for a fact.[00:51:27] And like, like Chinchilla is a great example of like, okay, you know, 20 tokens. Yeah. And but, but then, you know, until someone else comes along and kind of tries tries it out and sees actually this seems to work better. And then from our results, it seems imply actually maybe even even lla. Maybe Undertrained.[00:51:45] And, and it may be better to go even You know, like train on on even more tokens then and for, for the[00:51:52] swyx: listener, like the original scaling law was Kaplan, which is 1.7. Mm-hmm. And then Chin established 20. Yeah. And now Lama style seems to mean 200 x tokens to parameters, ratio. Yeah. So obviously you should go to 2000 X, right?[00:52:06] Like, I mean, it's,[00:52:08] Reza Shabani: I mean, we're, we're kind of out of code at that point, you know, it's like there, there is a real shortage of it, but I know that I, I know there are people working on I don't know if it's quite 2000, but it's, it's getting close on you know language models. And so our friends at at Mosaic are are working on some of these really, really big models that are, you know, language because you with just code, you, you end up running out of out of context.[00:52:31] So Jonathan at, at Mosaic has Jonathan and Naveen both have really interesting content on, on Twitter about that. Yeah. And I just highly recommend following Jonathan. Yeah,[00:52:43] MosaicML[00:52:43] swyx: I'm sure you do. Well, CAGR, can we talk about, so I, I was sitting next to Naveen. I'm sure he's very, very happy that you, you guys had such, such success with Mosaic.[00:52:50] Maybe could, could you shout out like what Mosaic did to help you out? What, what they do well, what maybe people don't appreciate about having a trusted infrastructure provider versus a commodity GPU provider?[00:53:01] Reza Shabani: Yeah, so I mean, I, I talked about this a little bit in the in, in the blog post in terms of like what, what advantages like Mosaic offers and, and you know, keep in mind, like we had, we had deployed our own training infrastructure before this, and so we had some experience with it.[00:53:15] It wasn't like we had just, just tried Mosaic And, and some of those things. One is like you can actually get GPUs from different providers and you don't need to be you know, signed up for that cloud provider. So it's, it kind of detaches like your GPU offering from the rest of your cloud because most of our cloud runs in, in gcp.[00:53:34] But you know, this allowed us to leverage GPUs and other providers as well. And then another thing is like train or infrastructure as a service. So you know, these GPUs burn out. You have note failures, you have like all, all kinds of hardware issues that come up. And so the ability to kind of not have to deal with that and, and allow mosaic and team to kind of provide that type of, of fault tolerance was huge for us.[00:53:59] As well as a lot of their preconfigured l m configurations for, for these runs. And so they have a lot of experience in, in training these models. And so they have. You know, the, the right kind of pre-configured setups for, for various models that make sure that, you know, you have the right learning rates, the right training parameters, and that you're making the, the best use of the GPU and, and the underlying hardware.[00:54:26] And so you know, your GPU utilization is always at, at optimal levels. You have like fewer law spikes than if you do, you can recover from them. And you're really getting the most value out of, out of the compute that you're kind of throwing at, at your data. We found that to be incredibly, incredibly helpful.[00:54:44] And so it, of the time that we spent running things on Mosaic, like very little of that time is trying to figure out why the G P U isn't being utilized or why you know, it keeps crashing or, or why we, you have like a cuda out of memory errors or something like that. So like all, all of those things that make training a nightmare Are are, you know, really well handled by, by Mosaic and the composer cloud and and ecosystem.[00:55:12] Yeah. I was gonna[00:55:13] swyx: ask cuz you're on gcp if you're attempted to rewrite things for the TPUs. Cause Google's always saying that it's more efficient and faster, whatever, but no one has experience with them. Yeah.[00:55:23] Reza Shabani: That's kind of the problem is that no one's building on them, right? Yeah. Like, like we want to build on, on systems that everyone else is, is building for.[00:55:31] Yeah. And and so with, with the, with the TPUs that it's not easy to do that.[00:55:36] Replit's Plans for the Future (and Hiring!)[00:55:36] swyx: So plans for the future, like hard problems that you wanna solve? Maybe like what, what do you like what kind of people that you're hiring on your team?[00:55:44] Reza Shabani: Yeah. So We are, we're currently hiring for for two different roles on, on my team.[00:55:49] Although we, you know, welcome applications from anyone that, that thinks they can contribute in, in this area. Replica tends to be like a, a band of misfits. And, and the type of people we work with and, and have on our team are you know, like just the, the perfect mix to, to do amazing projects like this with very, very few people.[00:56:09] Right now we're hiring for the applied a applied to AI ml engineer. And so, you know, this is someone who's. Creating data pipelines, processing the data at scale creating runs and and training models and you know, running different variations, testing the output running human evals and, and solving a, a ton of the issues that come up in the, in the training pipeline from beginning to end.[00:56:34] And so, you know, if you read the, the blog post we'll be going into, we'll be releasing additional blog posts that go into the details of, of each of those different sections. You know, just like tokenized training is incredibly complex and you can write, you know, a whole series of blog posts on that.[00:56:50] And so the, those types of really challenging. Engineering problems of how do you sample this data at, at scale from different languages in different RDS and pipelines and, and feed them to you know, sense peace tokenize to, to learn. If you're interested in working in that type of, of stuff we'd love to speak with you.[00:57:10] And and same for on the inference side. So like, if you wanna figure out how to make these models be lightning fast and optimize the the transformer layer to get like as much out of out of inference and reduce latency as much as possible you know, you'd be, you'd be joining our team and working alongside.[00:57:29] Bradley, for example, who was like he, I always embarrass him and he's like the most humble person ever, but I'm gonna embarrass him here. He was employee number seven at YouTube and Wow. Yeah, so when I met him I was like, why are you here? But that's like the kind of person that joins Relet and, you know, he, he's obviously seen like how to scale systems and, and seen, seen it all.[00:57:52] And like he's like the type of person who works on like our inference stack and makes it faster and scalable and and is phenomenal. So if you're just a solid engineer and wanna work on anything related to LLMs In terms of like training inference, data pipelines the applied AI ML role is, is a great role.[00:58:12] We're also hiring for a full stack engineer. So this would be someone on my team who does both the model training stuff, but, but is more oriented towards bringing that AI to to users. And so that could mean many different things. It could mean you know, on the front end building the integrations with the workspace that allow you to, to receive the code completion models.[00:58:34] It means working on Go rider chats, like the conversational ability between. Ghost Writer and what you're trying to do, building the various agents that we want replica to have access to. Creating embeddings to allow people to ask questions about you know, docs or or, or their own projects or, or other teams, projects that they're collaborating with.[00:58:55] All of those types of things are in the, in the kind of full stack role that that I'm hiring for on my team as well. Perfect. Awesome.[00:59:05] Lightning Round[00:59:05] Alessio Fanelli: Yeah, let's jump into Lining Ground. We'll ask you Factbook questions give us a short answer. I know it's a landing ground, but Sean likes to ask follow up questions to the landing ground questions.[00:59:15] So be ready.[00:59:18] swyx: Yeah. This is an acceleration question. What is something you thought would take much longer, but it's already here.[00:59:24] It's coming true much faster than you thought.[00:59:27] Reza Shabani: Ai I mean, it's, it's like I, I know it's cliche, but like every episode of Of Black Mirror that I watched like in the past five years is already Yeah. Becoming true, if not, will become true very, very soon. I remember that during there was like one episode where this, this woman, her boyfriend dies and then they train the data on, they, they go through all of his social media and train a, a chat bot to speak like him.[00:59:54] And at the, and you know, she starts speaking to him and, and it speaks like him. And she's like, blown away by this. And I think everyone was blown away by that. Yeah. That's like old news. That's like, it's, and, and I think that that's mind blowing. How, how quickly it's here and, and how much it's going to keep changing.[01:00:13] Yeah.[01:00:14] swyx: Yeah. Yeah. And, and you, you mentioned that you're also thinking about the social impact of some of these things that we're doing.[01:00:19] Reza Shabani: Yeah. That that'll be, I think one of the. Yeah, I, I think like another way to kind of answer that question is it's, it's forcing us, the, the speed at which everything is developing is forcing us to answer some important questions that we might have otherwise kind of put off in terms of automation.[01:00:39] I think like one of the there's a bit of a tangent, but like, one, one of the things is I think we used to think of AI as these things that would come and take blue collar jobs. And then now, like with a lot of white collar jobs that seem to be like at risk from something like chat G B T all of a sudden that conversation becomes a lot, a lot more important.[01:00:59] And how do we it, it suddenly becomes more important to talk about how do we allow AI to help people as opposed to replace them. And and you know, what changes we need to make over the very long term as a society to kind of Allow you know, people to enjoy the kind of benefits that AI brings to an economy and, and to a society and not feel threatened by it instead.[01:01:23] Alessio Fanelli: Yeah. What do you think a year from now, what will people be the most[01:01:26] Reza Shabani: surprised by? I think a year from now, I'm really interested in seeing how a lot of this technology will be applied to domains outside of chat. And, and I think we're kind of just at the beginning of, of that world you know, chat, G B T, that that took a lot of people by surprise because it was the first time that people started to, to actually interact with it and see what the the capabilities were.[01:01:54] And, and I think it's still just a, a chatbot for many people. And I think that once you start to apply it to actual products, businesses use cases, it's going to become incredibly Powerful. And, and I don't think that we're kind of thinking of the implications for, for companies and, and for the, for the economy.[01:02:14] You know, if you, for example, are like traveling and you want to be able to ask like specific questions about where you're going and plan out your trip, and maybe you wanna know if like if there are like noise complaints in the Airbnb, you just are thinking of booking. And, and you might have like a chat bots actually able to create a query that goes and looks at like, noise complaints that were filed or like construction permits that are filed that are fall within the same date range of your stay.[01:02:40] Like I, I think that that type of like transfer learning when applied to like specific industries and specific products is gonna be incredibly powerful. And I don't think. Anyone has like that much clue in terms of like what's what's going to be possible there and how much a lot of our favorite products might, might change and become a lot more powerful with this technology.[01:03:00] swyx: Request for products or request for startups. What is an AI thing you would pay for if somebody built it with their personal work?[01:03:08] Reza Shabani: Oh, man. The, the, there's a lot of a lot of this type of stuff, but or, or a lot of people trying to build this type of, of thing, but a good L l m IDE is kind of what, what we call it in You mean the one, like the one you work on?[01:03:22] Yeah, exactly. Yeah. Well, so that's why we're trying to build it so that people Okay. Okay. Will pay for it. No, I, but, but I mean, seriously, I think that I, I, I think something that allows you to kind of. Work with different LLMs and not have to repeat a lot of the, the annoyance that kind of comes with prompt engineering.[01:03:44] So think, think of it this way. Like I want to be able to create different prompts and and test them and against different types of models. And so maybe I want to test open AI's models. Google's models. Yeah. Cohere.[01:03:57] swyx: So the playground, like from[01:03:59] Reza Shabani: net Devrel, right? Exactly. So, so like think Nat dot Devrel for Yeah.[01:04:04] For, well, for anything I guess. So Nat, maybe we should say what Nat dot Devrel is for people don't know. So Nat Friedman, Nat Friedman former GitHub ceo. CEO and, and or not current ceo, right? No. Former. Yeah. Went on replica Hired a bounty and, and had a bounty build this website for him.[01:04:25] Yeah. That allows you to kind of compare different language models and and get a response back. Like you, you add one prompt and then it queries these different language models, gets the response back. And it, it turned into this really cool tool that people were using to compare these models.[01:04:39] And then he put it behind a paywall because people were starting to bankrupt him as a result of using it. But but something like that, that allows you to test different models, but also goes further and lets you like, keep the various responses that were, that were generated with these various parameters.[01:04:56] And, and, you know, you can do things like perplexity analysis and how, how widely The, the, the responses differ and over time and using what prompts, strategies and whatnot, I, I do think something like that would be really useful and isn't really built into most ides today. But that's definitely something, especially given how much I'm playing around with prompts and and language models today would be incredibly useful to have.[01:05:22] I[01:05:22] swyx: perceive you to be one layer below prompts. But you're saying that you actually do a lot of prompt engineering yourself because you, I thought you were working on the model, not the prompts, but maybe I'm wrong.[01:05:31] Reza Shabani: No, I, so I work on, on everything. Both, yeah. On, on everything. I think most people still work with pro, I mean, even a code completion model, you're still working with prompts to Yeah.[01:05:40] When you're, when you're you know running inference and, and whatever else. And, you know, instruction tuning, you're working with prompts. And so like, there's There's still a big need for for, for prompt engineering tools as well. I, I do, I guess I should say, I do think that that's gonna go away at some point.[01:05:59] That's my, that's my like, hot take. I don't know if, if you all agree on that, but I do kind of, yeah. I think some of that stuff is going to, to go away at[01:06:07] swyx: some point. I'll, I'll represent the people who disagree. People need problems all the time. Humans need problems all the time. We, you know, humans are general intelligences and we need to tell them to align and prompts our way to align our intent.[01:06:18] Yeah. So, I don't know the, it's a way to inject context and give instructions and that will never go away. Right. Yeah.[01:06:25] Reza Shabani: I think I think you're, you're right. I totally agree by the way that humans are general intelligences. Yeah. Well, I was, I was gonna say like one thing is like as a manager, you're like the ultimate prompt engineer.[01:06:34] Prompt engineer.[01:06:35] swyx: Yeah. Any executive. Yeah. You have to communicate extremely well. And it is, it is basically akin of prompt engineering. Yeah. They teach you frameworks on how to communicate as an executive. Yeah.[01:06:45] Reza Shabani: No, absolutely. I, I completely agree with that. And then someone might hallucinate and you're like, no, no, this is, let's try it this way instead.[01:06:52] No, I, I completely agree with that. I think a lot of the more kind of I guess the algorithmic models that will return something to you the way like a search bar might, right? Yeah. I think that type of You wanted to disappear. Yeah. Yeah, exactly. And so like, I think that type of prompt engineering will, will go away.[01:07:08] I mean, imagine if in the early days of search when the algorithms weren't very good, imagine if you were to go create a middleware that says, Hey type in what you're looking for, and then I will turn it into the set of words that you should be searching for. Yes. To get back the information that's most relevant, that, that feels a little like what prompt engineering is today.[01:07:28] And and sure that would've been really useful. But like then, you know, Google slash yahoo slash search engine Yeah. Would kind of removes that. Like that benefit by improving the, the underlying model. And so I do think that there's gonna be improvements in, in transformer architecture and the models themselves to kind of reduce Like overly yeah.[01:07:51] Like different types of prompt engineering as we know them today. But I completely agree that for the way larger, kind of like more human-like models Yeah. That you'll always need to, we'll talk some form of, of prompt engineering. Yeah. Okay.[01:08:04] Alessio Fanelli: Awesome. And to wrap this up, what's one thing you want everyone to take away about ai?[01:08:09] Both. It can be about work, it can be about personal life and the[01:08:13] Reza Shabani: societal impact. Learn how to use it. I, I would say learn how to learn how to use it, learn how it can help you and, and benefit you. I think there's like a lot of fear of, of ai and, and how it's going to impact society. And I think a lot of that might be warranted, but it, it's in the same way that pretty much anything new that comes along changes society in that way, and it's very powerful and very fundamental.[01:08:36] Like the internet. Change society in a lot of ways. And, and sure kids can go like cheat on their homework by finding something online, but there's also plenty of good that kind of comes out of opening up the the world to, to everyone. And I think like AI's gonna be just another iteration of, of that same thing.[01:08:53] Another example of, of that same thing. So I think the, the people who will be really successful are the ones that kind of understand it know how to use it, know its limitations and, and know how it can make them more productive and, and better at anything they want to do. Awesome. Well, thank[01:09:08] Alessio Fanelli: you so much for coming on.[01:09:10] This was[01:09:10] Reza Shabani: great. Of course. Thank you. Get full access to Latent Space at www.latent.space/subscribe
01:09:3103/05/2023
Mapping the future of *truly* Open Models and Training Dolly for $30 — with Mike Conover of Databricks
The race is on for the first fully GPT3/4-equivalent, truly open source Foundation Model! LLaMA’s release proved that a great model could be released and run on consumer-grade hardware (see llama.cpp), but its research license prohibits businesses from running it and all it’s variants (Alpaca, Vicuna, Koala, etc) for their own use at work. So there is great interest and desire for *truly* open source LLMs that are feasible for commercial use (with far better customization, finetuning, and privacy than the closed source LLM APIs).The previous leading contenders were Eleuther’s GPT-J and Neo on the small end (FLAN-T5 (137B), PaLM (540B), and BigScience’s BLOOM (176B) on the high end. But Databricks is to my knowledge the first to release not just a cleanly licensed, high quality LLM that can run on affordable devices, but also a simple Databricks notebook that can be customized to be finetuned for your data/desired style - for $30 in 30 minutes on one machine!Mike Conover tells the story of how a small team of Applied AI engineers got convinced Ali Ghodsi and 5,000 of their coworkers to join in the adventure of building the first open source, instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use. He also indulges our questions on other recent open source LLM projects, CerebasGPT and RedPajama, though we recorded this a week before Stability’s StableLM release. Stick around to the end for some easter eggs featuring AI Drake!Recorded in-person at the beautiful StudioPod studios in San Francisco.Full transcript is below the fold.Show Notes* Mike Conover LinkedIn and Twitter* Dolly 1.0* Dolly 2.0* CICERO and Diplomacy* Dolly and Deepspeed* LLMops: * https://nat.dev/* PromptLayer* HumanLoop* Spreadsheets??* Quadratic* Alessio’s Email GPT Drafter* Open Models* Open Assistant* Cerebras GPT* RedPajama* Reflexion, Recursive Criticism and Improvement* Lightning Round* AI Product: Google Maps* AI People: EleutherAI, Huggingface’s Stas Bekman* AI Prediction: Open LLaMA reproduction, AI Twins of People (AI Drake), Valuing Perplexity * Request for Startups: LLMOps/Benchmarks, Trail MappingTimestamps* [00:00:21] Introducing Mike Conover* [00:03:10] Dolly 1.0* [00:04:18] Making Dolly* [00:06:12] Dolly 2.0* [00:09:28] Gamifying Instruction Tuning* [00:11:36] Summarization - Thumbnails for Language* [00:15:11] CICERO and Geopolitical AI Agents* [00:17:09] Datasets vs Intentional Design* [00:21:44] Biological Basis of AI* [00:23:27] Training Your Own LLMs* [00:28:21] You May Not Need a Large Model* [00:29:59] Good LLM Use cases* [00:31:33] Dolly Cost $30 on Databricks* [00:36:06] Databricks Open Source* [00:37:31] LLMOps and Prompt Tooling* [00:42:26] "I'm a Sheets Maxi"* [00:44:19] AI and Workplace Productivity* [00:47:02] OpenAssistant* [00:47:41] CerebrasGPT* [00:51:35] RedPajama* [00:54:07] Why Dolly > OpenAI GPT* [00:56:19] Open Source Licensing for AI Models* [00:57:09] Why Open Source Models?* [00:58:05] Moving Models* [01:00:34] Learning in a Simulation* [01:01:28] Why Model Reflexion and Self Criticism Works* [01:03:51] Lightning RoundTranscripts[00:00:00] Hey everyone. Welcome to the Latent Space Podcast. This is Alessio Partner and CT and Residence and Decibel Partners. I'm Joan Bama, cohost swyx Brighter and Editor of Space. Welcome, Mike.[00:00:21] Introducing Mike Conover[00:00:21] Hey, pleasure to be here. Yeah, so[00:00:23] we tend to try to introduce you so that you don't have to introduce yourself. Yep.[00:00:27] But then we also ask you to fill in the blanks. So you are currently a, uh, staff software engineer at Databricks. Uh, but you got your PhD at Indiana on the University of Bloomington in Complex Systems analysis where you did some, uh, analysis of clusters on, on Twitter, which I found pretty interesting.[00:00:43] Yeah. Uh, I highly recommend people checking that out if you're interested in getting information from indirect sources or I, I don't know how you describe it. Yes. Yeah. And then you went to LinkedIn working on. Homepage News, relevance, and then SkipFlag, which is a smart enterprise knowledge graph, which was then acquired, uh, by Workday, where you became director of machine learning engineering and now your Databricks.[00:01:06] So that's the quick bio and we can kind of go over Yeah. Step by step. But, uh, what's not on your LinkedIn that people[00:01:12] should know about you? So, because I worked at LinkedIn, that's actually how new hires introduce themselves at LinkedIn is this question. So I, okay. I have a pat answer to it. Uhhuh. Um, I love getting off trail in the backcountry.[00:01:25] Okay. And I, you know, I think that the sort of like radical responsibility associated to that is clarifies the mind. And I think that the, the things that I really like about machine learning engineering and sort of the topology of high-dimensional spaces kind of manifest when you think about a topographic mat as a contour plot.[00:01:44] You know, it's a two-dimensional projection of a three-dimensional space and it's very much like looking at information visualizations and you're trying to relate your. Localized perception of the environment around you and the contours of, uh, ridges that you see, or basins that you might go into and you're like, there's that little creek down there.[00:02:04] And relate that to the projection that you see on the map. I think it's physically demanding. It's intellectually challenging. It's natural. Beauty is a big part of it, and you're generally spending time with friends, and so I just, I love that. I love that these are camping trips. Uh, multi-day. Yeah. Yeah.[00:02:21] Camping. I, I hunt too, you know, I, um, shoot archery, um, big game back country hunting, but yeah. You know, sometimes it's just, let's take a walk in the woods and see where it goes.[00:02:32] Oh yeah. You ever think about going on one of those, um, journeys in the, uh, the Australian Outbacks? Like where people find themselves?[00:02:40] I'm[00:02:40] a mountain. I'm a mountain guy. I like to You're mountain guy. I like to fly fish. I like to, you like to hill climb? Yeah. Like the outback seems beautiful. I think eight of the 10 most deadly snakes live in Australia. Like I'm, uh, yeah, you're good. You're good. Yeah. Yeah.[00:02:52] Yeah. Any lessons from like, Real hill climbing[00:02:55] versus machine learning, hill climbing.[00:02:56] Great Dude. It's a lot like gradient descent. Yeah, for sure, man. Um, yeah, I that I have remarked on that to myself before for sure. Yeah, I don't, I'm not sure. This is like least resistance, please.[00:03:10] Dolly 1.0[00:03:10] That's awesome. So Dolly, you know, it's kind of come up in the last three weeks you went from a brand new project at Databricks to one of the hottest open source things out there.[00:03:19] So March 24th you had Dolly 1.0. It was a 6 billion parameters model based on GPT-J 6 billion and you saw alpaca training set to train it. First question is, why did you start with GPT-J instead of LLaMA, which was what everybody else was kind of starting from[00:03:34] at the time. Yeah, well, I mean, so, you know, we had talked about this a little before the show, but LLaMA's hard to get.[00:03:40] We had requested the model weights and had just not heard back. And you know, I think our experience with the, um, The original email alias for Dolly, before it was available on hugging face, you get hundreds of people asking for it, and I think it's like, it's easy to just not be able to handle the inbound.[00:03:56] Mm-hmm. And so like, I mean, there was a practical consideration, which is that, you know, we did not have the LLaMA weights, but additionally I think it's like much more interesting if anybody can build it. Right. And so I think that was our, um, and I had worked with the GPT-J model in the past and, and knew it to be high quality from a grammatical ness standpoint.[00:04:15] And so I think it was a reasonable choice. Mm-hmm. Yeah.[00:04:18] Making Dolly[00:04:18] Yeah. Maybe we should, we can also go into the impetus of why you started work on Dolly. Uh, you had been at Databricks for about a year. Mm-hmm. Was there, was this like a top-down directive? Was this your idea? We'll see, uh,[00:04:31] what happened? I've been working in N L P and language understanding for a fair while now.[00:04:36] I mean certainly since Skip flag back in 20 16, 20 17, we can introduce Skip flag is that's, if that's, sorry. You know, we don't have to focus too much on it, but like, this is a, an area how information moves through networks of people is a longstanding interest of mine. And we built a hack day project and I just slacked it to our c e o and I was, you know, this was when ChatGPT came out and it was an integration into the developer experience.[00:05:02] And I was like, as a user, this should exist. I want this. Mm-hmm. We should build this. It doesn't have to be us. And I mean, to our, uh, our leadership team is like 10 years into this journey, probably more than that at Databricks. And they are still. So hungry. It's wild. It's just wild to see these, these people in action, you know, this like this far into the marathon.[00:05:23] And, um, he's like, great, build it. Do make it. So, you know, and I, we had have, uh, full-time responsibilities and infrastructure forecasting and infrastructure optimization. And so we did, you know, and, um, we just started building and, you know, so we'd been working on this class of technologies for, um, several months.[00:05:46] And we had a stack that in part how we were able to kind of pivot on the balls of our feet. Uh, we repurposed a lot of existing code that we had built up, you know, in the past several quarters, um, to, to create Dolly and, and just to[00:05:58] be clear, like is this an internal stack or is this, uh, externally available as data?[00:06:02] Much of what we open sourced what, you know, like that that is a, that is the, the, it's, I mean, no, it's not the exhaustive stack by any account, but it's, it's some of the core components. Okay. Yeah.[00:06:12] Dolly 2.0[00:06:12] It only took 19 days to go from 1.0 to 2.0. Yeah. So 2.0 is 12 billion. So twist the number of parameters. You base this on the model family from Elu.[00:06:23] I instead, and I think the, the biggest change is like instead of using the alpaca turning set, which is change generated, so it has its own limitations, you created a brand new, uh, training data set created by the Databricks employees. So I would love to talk about how you actually made that happen. You know, did you just go around and say, Hey guys, I just need to like today, spend your day coming up with the instruction set?[00:06:47] Or like, did people volunteer to be a part of this?[00:06:50] Yeah, I mean, so again, like a lot of credit to our founding team, they see it, I think as much as anybody you'll talk to who is a new founder or somebody trying to work in this space, like our executives have the fire and will see a, a bright neon meta future that, uh, Databricks will confidently lead.[00:07:12] The world into. And so Ali just sent emails twice a day. Do it, do it. You know, we put together, you know, we, we use the InstructGPT sort of task families, you know, gen content generation, brainstorming close qa, open qa, paraphrasing, things like this, and basically put together these Google forms.[00:07:34] You know, just like, how can we build this as quickly as possible? We see this need, you know, the alpaca trick is amazing that it works. It's amazing that we're highly non-obvious that, you know, for GPT-J or even lLLaMA, you know, hundreds of billions of tokens into the train, this whisper of new data, you know, sort of moves it in, moves the parameter, uh, tensors into a new part of the state space.[00:08:02] I think, you know, my background is roughly in statistical physics related areas, and I think kind of like a phase transition. Mm-hmm. Like ice and water. It's like they're. Very, very little separates the two, but they could not be more different. And so Ali just kept haranging, like a huge email list of people.[00:08:21] Um, thousands and thousands of people. And, um, it worked. The other thing is, you know, to our employees credit, people see the moment and they wanna be part of something. And I think there's just passion and enthusiasm for. Doing this. So it was easier than you would expect[00:08:37] The answer is, so you put some answers in the blog post.[00:08:40] Yeah. And they're pretty comprehensive. Cuz one of the questions was like, how do I build a campfire? Yeah. And then the response was four paragraphs[00:08:46] of actual Truly, and I think Yeah, true. Yeah. And I think part of it is that because of the rapid adoption of these technologies like that, you have hundreds of millions of people, you know, who knows what the numbers are.[00:08:58] But on ChatGPT. People have become educated in terms of, and opinionated about what they expect from these tools. And so I think, you know, a lot of the answers are like, written in the style of what you would want from one of these assistants. And I think just to kind of like riff on how this question of like how the composition, cuz this is really re relevant to our enterprise customers, how the composition of the dataset qualitatively shapes the resulting behaviors of the fine-tuned models that are exposed to that stimulus.[00:09:28] Gamifying Instruction Tuning[00:09:28] You know, you look at a dataset like flan, which is a really, really large dataset that is, I think thousand plus tasks. Um, that's, you know, kind of this. Gold standard instruction data set, and a lot of it's synthesized the responses and we'll talk about evaluation, but the responses are very brief. You know, it's like emit the word positive or negative in relation to the, you know, as a judgment of the sentiment of this utterance.[00:09:52] And so it's, it's very multitask and I think like having thousands of different task types perform sort of irregular, you can't overfit to one specific behavior and so you have to compress and like do many things reasonably well. And so that I think you, you have to kind of wind up in interpolating between different types of behaviors that way.[00:10:12] But there's also like the question of like, when do you predict the end of sequence token? And if your completions, particularly for instruction tuning are short. Our empirical observation is that the fine tune model emits shorter results. And so having how to build a campfire. And like a narrative thoughtful human-like description.[00:10:36] I think it requires that demonstration to get that behavior from the model. And you had a, you had a leaderboard, um, who did[00:10:43] what, uh, any fun shenanigans that came out of, uh, the gamification?[00:10:46] Well, so the thing is like, you know, I think you can just ask people like be helpful. Uh, you know, like, like some people always take it too far and then Sure.[00:10:55] Yeah. Well, so you definitely see a long tail distribution. I think I was looking at the open assistant paper last night, and I think, I mean, don't quote me on this, but something like 12 people accounted for 10% of the total responses, which is super, that's just human systems have that long tail distribution terms of activity thing.[00:11:12] Yeah, yeah, exactly. So it's not surprising. And we see that to a some degree in our data set as well, but, um, not in the way that you would if you opened it up to the, like internet at large. So I, I think people are incentivized coworkers. Yeah. Do the right thing and you know, it's, you know, and also it's our company.[00:11:29] Like we. Want it to actually be useful, not just a performance of usefulness. And I think people got that.[00:11:36] Summarization - Thumbnails for Language[00:11:36] Is there a task[00:11:37] that you found like particularly hard to get data on? Like good data summarization?[00:11:41] Oh, because it's like a, it's both like long, uh, it's long and requires thought, you know, you have to synthesize and as opposed to name all the people in places in this passage from Wikipedia that's like, I can kind of do that while I'm watching television, but like writing an essay.[00:11:59] Yeah, it's a compare is hard. Yeah, there's probably more structure and like in terms of um, like an information theoretic standpoint, how much new signal each record introduces into the model. I expect that summarization is actually. A very demanding task and would not soon become overfit. We're developing our, our, I don't have like definitive answers to how that works because we're still, it's an open research project for the, for the business.[00:12:27] Yeah. Well, I, you know, just categorically, I think sum summarization is becoming more important, the more generative ai. For freights because we kind of need to expand and we see the contract again, in terms of what, uh, what we consume in terms of, uh,[00:12:41] information. Truly. I mean, like, to kind of riff on that, I think the, there's just so much material at your business.[00:12:48] You think about like, uh, PRDs, like, or, you know, product requirement stocks, you know, reasonable people. You kind of want like a zoom lens on language and you want the ability to see the high level structure of something and then be able to get details on demand like you would pan or like, you know, zoom into an information visualization.[00:13:09] I was talking with. Um, The head of AI at Notion about this and who, you know, you guys probably know and as a really remarkable person, and this idea of like, what does a thumbnail for language look like? Because like your visual cortex is structured such that like it's highly evolutionarily conserved to be able to glance at something and perceive its essence.[00:13:28] And that makes seeing a field of thumbnails. Like you guys I think are gonna speak with, um, Lexi folks here shortly. And you can see us like the field of images in response to a query and get a sense for like, oh, these are all like moody cyber punk scenes. Mm-hmm. What is that for language? And maybe it's like, maybe it doesn't exist.[00:13:52] Maybe it's the case. Stop me if I'm getting too far afield here. But you think about clothes as a technology that has shaped our physiology. Right. Like, and our, our phen, our phenotypic expression, we used to be covered in hair. We evolved this technology fire would also be in this class, and our bodies changed in response to it on the very long time scale of human history.[00:14:15] Mm-hmm. It may be the case that AI in the way that the visual cortex has been evolutionarily conserved to be able to rapidly perceive things, shapes how we process information. I don't know. What to do about language right now. It looks like reading a lot of samples from different models and seeing how they perform as we move through the loss curve.[00:14:34] That makes[00:14:34] sense. I mean, if you think about images in text, you don't really have like peripheral vision. You know, when you're like seeing something, you focus on the main thing and then you kind of like start to expand to see the rest. Yes. Like text is kind of like a, the density is like the same across the tax.[00:14:49] Like nothing jumps out when you see a wall of tax versus when you see an NI image. Just like something usually jumps out first. Yes. So I don't have the answer either. Was gonna say, I'm really curious word[00:14:58] clouds, which, but that, that's the thing is like, that's such a joke, right? Wait for me. Yeah, it's like punchline.[00:15:06] You must have[00:15:06] done, you know, your, your Twitter[00:15:08] work. I've cut a few word clouds in my day.[00:15:11] CICERO and Geopolitical AI Agents[00:15:11] Um, you know, I also think like this question of like, what are you most excited about in ai? Like what do you see as the sort of like grandest potential? And one of the things that I reflect on is, is the. Possibility of having agents that are able to, to negotiate intractable geopolitical problems.[00:15:31] So like if you look at like, the Cicero paper from, from Meta, can you recap for those who are making Yeah. So I mean it's, you know, I don't wanna like represent somebody else's work as like you're just talking Yeah, exactly. But like, um, my understanding is that diplomacy is a, um, turn-based negotiating game, like risk where you are all making the decision in simultaneously and you're trying to convince people that you're going to do or not do something.[00:15:56] And, uh, this paper was co-authored with one of the top diplomacy players and Meta built a system that was very, very capable at this negotiating game. I. Can envision nation states operating ais that find game theoretically optimal and sort of non exploitable steady states basically. Mm-hmm. That, you know, if you think about a lot of the large scale geopolitical disputes where it's just like human mediators are unable to find a compromise, ais may be able to satisfying conditions that you're like, yeah, actually I don't, that works for me.[00:16:36] Mm-hmm. And to your point about like how the phobia and attention generally, but like how the actual visual cortex works, the idea that like a great writer says something in a way and it hits unique structures in your brain and you have that chemical cascade, which is understanding, we may be able to design systems that compress very long documents on a per person basis so as to maximize information transfer, and maybe that's what the thumbnail looks like.[00:17:03] Mm-hmm.[00:17:04] Yeah, maybe it's emojis all the way down. I dunno.[00:17:08] Yeah.[00:17:09] Datasets vs Intentional Design[00:17:09] Obviously the dataset is like one of the, the big things in Dolly. Yeah. But you talked about some of these technologies being like discover, not designed, like maybe talk a bit about the process that took it to Dolly and like the experimentation[00:17:21] there.[00:17:22] So it's not my, my friend, my dear friend, Jacob Burk kind of had this insight, which is that AI is you, you design a jet turbine, like for sure you make a plan. Mm-hmm. And you, you know, have some working model of aerodynamics and you execute on the jet turbine. I think that with ai, generally we see. You know, this instruction following behavior that we saw in Dolly was not present in the, the base model.[00:17:53] It, you know, effectively will, it's a, you know, very powerful base model, but it will just complete the prefix as though it's random page on the internet. We had Databricks, but also the community with Alpaca discovered that you can perturb them just, just so, and get quite different behavior. That was not really a design.[00:18:13] I mean, it's designed in the sense that you had an intent and then you saw it happen. But we do not like choose the parameters they are arrived upon. And the question that I have is, what other capabilities are latent in these models, right? GPT-J was two years old. Can it do anything else? That's surprising?[00:18:36] Probably so, and I think you look at, you know, particularly, and this is why the Pithia Suite is so cool, is that, and you know, a ton of credit to, for. Having this vision, and I think it will probably take some time for the research community to, to understand what to do with these artifacts that they've created.[00:18:54] But it's effectively like this matrix of model checkpoints and sizes where you say, I'm gonna take from I think 110 million all the way up to 12 billion, which is what Dolly two is based on. And then at every checkpoint through the training run under, I think it's 2 million. Yeah. Tokens. Yeah. Well, so the, I think the Pithia suite is just trained on the pile, so it's like three, 400 million, which is probably undertrained.[00:19:18] And did you guys see this red? I think it's red Pajama released this morning. They've reproduced the lLLaMA training data set. So, so it's 1.2 trillion tokens and it's, um, I mean, you know, a separate topic, but we looked pretty hard at what it would take to reproduce the LLaMA data set. And it's like, Non-trivial.[00:19:35] I mean, bringing Common Crawl online and then d near de-duping it and you know, filtering it for quality. So the, the Common Crawl data set in LLLaMA is they fit a model to predict whether a page in common crawl is likely to be a reference on Wikipedia. And so that's like a way to like, I don't want lists of phone numbers, for example, or like ads.[00:19:58] All of that is a lot of work. And so anyway, with Pit, I think we can start to ask questions like through this, this matrix with size and like checkpoint depth. We have these different model parameters. How do behaviors emerge through that training process? And at different scales, you know, maybe it will be less of a discovery process.[00:20:22] Maybe we will get more intentional about, like, I want to elicit the fol, I want summarization, I want closed form, question answering. Those are the only things that matter to me. How much data do I need to. Generate or buy, how many parameters do I need to solve that compression problem? And maybe it will become much more deterministic, but right now it feels a lot like we're just trying things and seeing if it works, which is quite different from a lot of engineering disciplines.[00:20:51] I'm curious, does that reflect your experiences? Like Yeah, I[00:20:54] think like we had a whole episode on, um, kind of like scaling loss and everything with Varun from Exafunction. And I feel like the, when the Chinch paper came out, a lot of teams look at their work and they were like, we're just kind of throwing darts.[00:21:07] Exactly. That's now one,[00:21:10] 1.2 to, uh, 1.7 tokens, uh, you know, per, uh, per parameter. And, uh, now we're redoing everything with[00:21:16] 20 tokens. It's exciting, but also as like, you know, I'm, I'm a, an engineer and a hacker, like I'm not a scientist, but I, you know, used to pretend to be a scientist. Not, you know, not really pretend, but like I respect the, I respect the craft and like, It's also very exciting to have something you really don't understand that well, because that's an opportunity to create knowledge.[00:21:41] So that's part of why it's such an exciting time in the field. There's some work[00:21:44] Biological Basis of AI[00:21:44] on with, um, understanding the development of AI progress, uh, using biological basis. Mm-hmm. So in, in some sense, we're a speed running evolution Yeah. With training. Yeah. So in a sense that of just natural discovery of things and, and just kind of throwing epox at it Yeah.[00:22:02] Is, makes intuitive sense to me. But, uh, I do think that it is unintuitive to estimate how different artificial life might evolve differently[00:22:12] from biological life. Yeah. I, so like Richard Dawkins had, um, this sort of toy model called bio morphs. Which, uh, no, I haven't heard of it. Yeah, it's, I think it was dates to the eighties.[00:22:25] So it's a pretty old school demonstration of capabilities. But the idea is that you have, imagine they look, they're little insects that look like vector art. And the parameters of how they are rendered are governed by, you know, it's parametric, right? So some of them have long antennas and some of them have wide bodies and some of them have 10 legs, some of them have four legs.[00:22:46] And the underlying method is, is genetic algorithms where you take subsets of the parameters and kind of recombine them. And you're presented as a user with a three by three grid, and you click based on what you find subjectively beautiful. And so the fitness function, then they're re combined and you render a new set of nine by nine, some of which are mutated.[00:23:05] And so the fitness function is your perception of aesthetic beauty. That is the pressure from the environment. And I think like with things like RLHF where you're having this preference learning task, that is a little different from next token prediction in terms of like what is synthetic life and how are our preferences reflected there?[00:23:23] I think it's a very sort of interesting, yeah, interesting area. Okay. So a[00:23:27] Training Your Own LLMs[00:23:27] lot of people are very inspired by work with Dolly. Obviously Databricks, uh, is doing it. Partially out of the kindness of your hearts, but also to advertise Databricks capabilities. Uh, how should businesses who want to do the similar things for their own data sets and companies, uh, how, how should they think about[00:23:43] going about this?[00:23:44] I really would actually say that it's probably less about advertising our capabilities. I mean, that, you know, we're exercising our capabilities, but I, I really think that to the extent that we can help define some of the moves that reasonable teams would make in creating technologies like this, it, it helps everybody understand more clearly what needs to be done to make it useful and not just interesting.[00:24:08] And so, one, you know, one of the canonical examples that we had in the original Dolly was write a love letter, ed Growlin Poe. Yep. Which is super cool and like very moody. You know, I, I dunno if you guys remember the particulars of it, but it was like, I. The person, the imagined person writing this letter was like, I, I basically couldn't, like, I couldn't stand you, but I can't stop thinking about you, you know, which is a very like, gothic, uh, kinda, uh, mood in, in a letter like that not relevant to the enterprise context.[00:24:39] Right. So, you know, like it's neat that it does it, but if I don't have to buy training data that gets it to write moody, gothic letters to Edgar and Poe, and if I can be choosy about how I invest my token budget, that is useful to many businesses. And so, you know, one of the things that. We're trying to understand more clearly is I, we talked a little bit about like different tasks require that you compress in a way that generalizes, you know, if you think about it, the, the parameters as compressing language and also world knowledge.[00:25:15] The question is like, for a given model size, how many demonstrations of summarization, for example, are required in order to get a really useful, grounded QA bot? And so I think in building these kinds of solutions and sort of seeing how the. Categories of behaviors in the instruction tuning or sort of fine tuning data sets are related to those behaviors, I think will develop a playbook for startups in the enterprise that makes it, um, so that you can move with an economy of motion.[00:25:44] And this is related to evaluations as well. So one of the things that we had talked about sort of before we started recording was the using the EleutherAI evaluation benchmarks, and I think helm and the, you know, there's a bunch of other batteries that you can push your models through. But the metrics that we looked at first when we built the first version of Dolly, and this is on our hanging face page, you can go see this yourself.[00:26:08] The GPT-J model. And the fine-tuned dolly model have almost identical benchmark scores, but the qualitative character of the model just couldn't be any more different. And so I think that it requires better ways to measure the desired behavior, and especially in these enterprise contexts where it's like, is this a good summary and how can I determine that without asking a person?[00:26:37] And maybe it's kind of like you train reward bottles where you, you know, you have sort of a learned preferences and then you show, you know, you take kind of an active learning approach where you show the ones that it's most uncertain about to crowd workers and it's kind of like human in the loop.[00:26:52] Would this be p p o ish?[00:26:54] I mean, potential. That's, so this, that's not an area of expertise in mine yet. You know, this is something that we're also trying to, uh, more deeply understand kind of what the applicability of that stack is to, like, I'm just trying to ship. Mm-hmm. You know, my understanding is that that's somewhat challenging to bring online and also requires a fair number of labels.[00:27:14] And so it's like from an active learning standpoint, uh, my thinking would be more like, You have a reward model that you've trained and you said like, this is based on human judgments from my employees or some crowd workers, what I want from a summarization or a close, close form question answering. And then you basically, you choose new examples to show to humans that are close to the decision boundary and that are like maximally confusing.[00:27:38] It's like, I'm just really not sure rather than things that are far from the decision boundary. And it's, it's kind of like, I actually think there's gonna be, in terms of value creation in the next, let's say 18 to 36 months, there's still room for like old tricks. You know, like not everything has to be generative AI for it to be very valuable and very useful.[00:27:56] And maybe, maybe these models and, and zero shot prompting just eats everything. But it's probably the case that like an ensemble of techniques will be valuable and that you don't have to, you know, establish like room temperature fusion to like, you know, create value in the world, at least for, you know, another year and a half.[00:28:20] You know, like[00:28:21] You May Not Need a Large Model[00:28:21] just, just to spell it out for people trying to, uh, go deep on stuff. Um, maybe leave breadcrumbs. Um, sure. When you say techniques, you don't just mean prompting.[00:28:29] Oh, I mean even like named entity recognition, like Yeah, there's just like classic NLP stuff, you know, like supervised learning. I mean, multi-class classifi.[00:28:37] I have customer support tickets. I want to know whether this is going to be flagged as. P zero. Like that's just, it's not a complicated problem to solve, but it's still very valuable in these models that can deeply understand the essence of something and not necessarily generate language. But understand, I expect that you will see like s because, so for example, inference right now is time consuming.[00:29:04] Mm-hmm. Just, you know, it's like, unless you are really rigorous, and I think it, one of the things I'm excited about at Databricks is that we're, our inference stack is very, very fast. Like orders of magnitude faster than you would get if you took the naive approach. And that leads to very qualitative, like a very different way that you interact with these models.[00:29:22] You can explore more and understand their behavior more when it doesn't take 30 or 40 seconds to generate a sample and it's instead 1800 milliseconds. You know, that's something that's very exciting. But if you need to spend your compute budget, Efficiently and you have tens of thousands of possible things that you could summarize, but you can really only, you know, in a day do so many.[00:29:45] Having some stack ranking of them with a classical machine learning model is just valuable. And I, I expect that you'll see like an ecosystem of tools and that it's not all going to be necessarily agents talking to agents. I could be proven wrong on that. Like, I, I don't know. We'll see. Hey,[00:29:59] Good LLM Use cases[00:29:59] going back to the evolutionary point, I feel like people think that the generative AI piece is like the one with the most like, uh, possible branches of the tree still to explore.[00:30:09] So they're all focusing on that. But like you said, we're probably gonna stop at some point and be like, oh. That thing we were doing is just as good. Let's pair them together and like use that instead of just like trying to make this model do everything.[00:30:22] Yeah. And there, yeah, there are things like categorically that only generative models can accomplish.[00:30:28] And I do think, I mean, one of the reasons that at Databricks we see so much value for companies is that you can, with zero shot prompting, you can say, given this customer support ticket, for example, give me a summary of the key issues represented in it. And then simply by changing that prefix, say, write a thoughtfully composed reply that addresses these issues in the tone and voice of our company.[00:30:53] And imagine you have a model that's been fine tuned on the tone voice that's in your, in your, uh, from your support team. Both of those problems historically would've taken like a reasonable machine learning team, six to eight weeks to build. And frankly, the right, the response, I'm not sure you can do it without generative techniques.[00:31:13] And now your director of sales can do that. You know, and it's like, the thing that might make me look foolish in retrospect is that. Orders of magnitudes cheaper to do it with prompting. And maybe it's like, well, sure the inference costs are non-trivial, but it's just we've saved all of that in time. I don't know.[00:31:33] Dolly Cost $30 on Databricks[00:31:33] We'll see. I'm[00:31:34] always interested in, uh, more economics of, um, of these things. Uh, and one of the headline figures that you guys put out for Dolly was the $30 training cost. Yes. How did you get that number? Was it. Much lower than you expected and just let's just go as deep[00:31:50] as you want. Well, you just think about, so you know, we trained the original dolly on a 100 s and so one of the cool things about this is we're doing this all on Databricks clusters, right?[00:32:00] So this like, this works out of the box on Databricks and like turns out, you know, I think you would probably need slightly different configurations if you were going to do your own full pre-training run on, you know, trillions of tokens. You have to think about things like network interconnect and like placement groups in the data center in a more like opinionated way than you might for spark clusters.[00:32:23] But for multi-node distributed fine tuning, the Databricks stack is great out of the box. That was wonderful to find.[00:32:32] You've been building the perfect fine tuning architecture the whole[00:32:34] time. Yeah. You know, may, maybe it's not perfect yet, but like, It's pretty good. And I think, so for the original Dolly, it was just a single node, and so you can bring up an eight node, a 100 machine, and I'm, you know, I thinking of just the off the rack pricing from the cloud providers, it's about 30 bucks.[00:32:55] I think the actual number's probably less than $30. For How long are you for? It was less than an hour to train the thing. It's 50, I mean it's 50 thou alpacas, 50,000 records. Right.[00:33:04] And you've open sourced the, the notebook, which people can check out what[00:33:07] gonna show notes. There's. The risk that I am making this up is zero.[00:33:11] Yeah. No, no, no. I'm not, I'm[00:33:12] not saying the I know you're not. I'm just saying I'm, I'm, I'm leaving break rooms for people to say, Hey, it, it's 30[00:33:17] bucks, takes an hour. Go do it. It's, it's crazy. And, and that's like the, I mean, you think about, I yeah, I, I, I know for a fact that you're not suggesting that, but it's just like, what's nuts is that you can just try it.[00:33:28] You know, you can, if you have 30 bucks, you can stand this thing up and, um, on a single machine, execute this training run. And I think I talked about like this idea that it's kind of like a phase transition. What's surprising about it, if you were to say, Hey, given a corpus of millions of instruction pairs, you can for.[00:33:50] $10,000, which is still an order of magnitude less than it cost to train the thing, get this qualitatively different behavior. I'd be like, yeah, that that sounds about right. And it's like, yeah, if you have an afternoon, like you can do this. That was not certainly, it was not obvious to me that that was true.[00:34:08] I think especially like, you know, like with libraries, like deep speed that, you know, so deep speed is a, is a library that gives you many different options for dealing with models that don't fit in memory and helping increase the effective batch size by, you know, for example, putting the entire model on a GP on several different GPUs and then having device local batches that are then the gradients are, are accumulated, are sort of aggregated for those, those from those different devices to get an effective batch or sharding the actual different model submodules across GPUs.[00:34:43] And this is all available in the notebook and the, the model that we train does not fit on a single device. And so you have to shard the model across the GPUs to run the training, you know, an incredible time that like this technology is just like free and open source and it's like the Microsoft team and the, you know, the hugging face team have made it so easy.[00:35:04] To accomplish things that even just two years ago really required a PhD. And so it's like level of effort, capital expenditure, substantially less than I would've expected. Yeah.[00:35:17] And you, you sort of co-evolve this cuz you also happen to work on the infrastructure optimization[00:35:21] team. Yeah, I mean that's kind of, um, like, you know, this is really kind of a separate project at Databricks, which is like making sure that we have a great customer experience and that we have the resources that are required for all of our customers.[00:35:37] You can push a button, get a computer, uh, get a Spark cluster. And I think when you look to a world where everybody is using GPUs on Databricks, making sure that we are running as efficiently as possible so that we can make Databricks a place that is extremely cost effective to train and operate these models.[00:35:55] I think you have to solve both problems simultaneously. And I think the company that does that effectively is, um, is gonna create a lot of value for the market.[00:36:06] Databricks Open Source[00:36:06] Yeah. You mentioned Spark, obviously Databricks, you know, Started, like the founders of Databricks created a spark. Yeah. At Berkeley. Then, you know, from an open source project, you start thinking about the enterprise use cases.[00:36:18] You end up building a whole platform. Yeah. You still had a lot of great open source projects like uh, ML Flow, Delta Lakes. Yeah. Um, yeah. Things like that. How are you thinking about that was kind of the ML ops phase. Yeah. Right. As you think about the l lm ops, like needs, you know, like obviously. We can think of some of these models as the spark, so to speak, of this new generation.[00:36:39] Like what are some of the things that you see needed in infrastructure and that maybe you're thinking about building?[00:36:44] Yeah, I mean, um, so kind of first to address this, this matter of open source. I think, you know, Databricks has done a lot of things that, and has released into the public domain a lot of technologies where a reasonable person could have said, you should.[00:37:00] Treat that as IP that you and no one else has. And I think time and again, the story has been more, is better and we all succeed together. And when you create a new class, people rush in to fill it with ideas and use cases and that it's, it's really powerful. It's both good business and it's good for the community.[00:37:21] And Dolly I think is very much a natural extension of that urge, which just, I think reflects our founders tastes and beliefs about markets and, and technology[00:37:31] LLMOps and Prompt Tooling[00:37:31] when it comes to LM ops, which is not a phrase that rolls off the tongue. We'll, we're gonna need something better than that. We, this kinda gets back to like what is a thumbnail for text.[00:37:43] Mm-hmm. One of the things that my team winds up doing a fair amount of right now is like slacking back and forth examples of like generated samples. Okay. Because like these evaluation benchmarks do not capture the behaviors of interest. And so we often have like a reference battery of prompts. Let's say 50 to a hundred.[00:38:03] Write a love letter to Edgar and Poe. Yeah. Give me a list of ins. Like what are, what are one of our things is what are considerations? Like it should keep in mind when planning for a backcountry backpacking trip can you generate a list of reasonable suggestions for a backpacking trip. And you see, as you kind of move the model through the loss curve under instruction tuning that um, that behavior emerges and that like you kind of wind up qualitatively evaluating is the model doing what I want in respect to these prompts that I've seen many different models answer this model or this, this instruction tuning data set is generating shorter completions.[00:38:40] This one is generating the. Wackier completions, you know, this one is much likelier to produce lists all of these things. I don't know if you've seen Nat Devrel. Mm-hmm. I'm sure, of course you have that idea of the grid of like, I want to run inference in parallel on arbitrary prompts and compare and contrast, like tooling like that is going to make it, and especially with a fast inference layer, and this is where I think Databricks has a lot of opportunity to create value for people is being able to serve, interact, and measure the behavior of the model as it changes over time and subject it not only to quantitative.[00:39:19] Benchmarks, but also qualitative subjective benchmarks plus human in the loop feedback where imagine that I burn a model checkpoint and every thousand steps, I send it off to an annotation team and I get a hundred pieces of human feedback on the results. And it's like there's kind of like what is the right volume of human feedback to get to statistical significance?[00:39:43] But I think there is. An ensemble, you know, each of these is like a different perspective on the behavior of the model. A quantitative, qualitative, and then human, uh, feedback at scale. Somebody's going to build a product that does these things well in a delightful user form factor. And that is fast and um, addresses the specific needs of AI developers.[00:40:04] And I think that business will be very successful and I would like for it to be Databricks. Ah, okay.[00:40:10] Teasing what you might be[00:40:11] building. Interesting. You know, and this, not to make forward-looking statements, but it's just like, make sense as obvious as a person, you wanna do it? Mm-hmm. I need that. Yeah.[00:40:19] Yeah. I need that. Yeah. I happen to work at a company.[00:40:21] Yeah. So just to push on, uh, uh, this one a little bit, cuz I have spent some time looking into this. Sure. Have you come across prompt layer? That would be one of the leading tools. And then I think Human Loop has a little bit of it, but yes, it's not a course focus of theirs, is it?[00:40:34] Prompt layer? Yeah. I'll, okay. Send And happy to drop that reference cuz uh, he has reached out to me and I, I looked at his demo video and it, yeah, it kind of is, isn't that in the ballpark? And I think there are a lot of people, uh, zeroing in on it. But the reason I have not done anything in, in, in this area at all is because I could just do it in a spreadsheet.[00:40:51] Like all you need to do is Yeah.[00:40:53] Spreadsheet function that you can, but I mean like editing text and Google Sheets is a drag. Is it? Yeah. I, I mean mm-hmm. What's missing? You know? Oh, so a, like the text editing experience in it, like you're trying to wrap these cells. Okay. And so now you gotta like double click to get into the editing mode.[00:41:12] I think they struggle with large record sets. So like the spreadsheets slow down, you kind of want, this is not some, like a, this specific question of like, how does Google Sheets fail to meet the need is something that, you know, I don't have a talk track around Sure. But like linking it to an underlying data source where it's sort of like persisted.[00:41:34] Cuz now I'm, now I have a bunch of spreadsheets that I'm managing and it's like, those live on in Google Drive, which has kind of a garbage ui. Or is it on my local machine? Am I sending those around? Like, if, can I lock the records so that they can't be annotated later? How do I collect multiple evaluations from different people?[00:41:50] How do I compute summary statistics across those evaluations? Listen, I'm the first person to like, fire up sublime. Yeah. You know, like, keep it simple, right? Yeah. Just for sure. I feel like the, the way that I have talked with colleagues about it is it's like we are emailing around. Photocopies of signed printouts of PDFs and DocuSign doesn't exist yet, and nobody realizes that they're doing this like ridiculous dance.[00:42:16] And I get it. I too have used Google Sheets to solve this problem, and I believe that they're, there's maybe something better. I've Stockholm Syndrome.[00:42:26] "I'm a Sheets Maxi"[00:42:26] So there's a couple more that I would highlight, uh, which is Quadra. Uh, okay. Uh, full disclosure, an investment of mine, but basically Google Sheets implement, implemented a web assembly.[00:42:35] Yeah. And a, and a canvas. Okay. And it speaks Python and sql. Yeah. Yeah. And, uh, and Scala. Yeah. Uh, so I, I think, I think, yeah, there, there's some people working on interesting hearings[00:42:46] at those. And what you could do is like, like imagine that you have a Google Sheets type ui, the ability to select like a column or a range and subject all of those values to a prompt.[00:42:59] Yes. And like say like, I have template filling and I want, that's what I want. My problem[00:43:04] with most other SaaS attempts is people tend to build UIs that get in your way of just free range experimentation. Yes. And I'm a sheet's, uh, maxi. Like if I can do it in a sheet, I'll do[00:43:16] in a sheet, you know? Yeah. Well, and I mean, kind of to continue, like on the sheets, sort of mining that vein, you know, on the, sort of like how does AI impact the workplace and like human productivity?[00:43:29] I think like a, I really like the metaphor, which is comparing, uh, AI technologies to the development, the advent of spreadsheets in the eighties, and this idea that like you had a lot of professionals who were like well educated, like serious people doing serious accounting and finance work, who saw as their kind of core job function manually calculating.[00:43:53] Values in forecasts on paper as like, this is how I create value for the business. And spreadsheets came along and I think. There was a lot of concern that like, what am I gonna do? Yeah. With my days? And it turns out that like I think of it sometimes, like being in a warm bath and you don't notice how nice the water is until you wiggle your toes a little bit.[00:44:14] You kind of get used to your circumstances and you stop noticing the things that would stand out.[00:44:19] AI and Workplace Productivity[00:44:19] So on the subject of how artificial intelligence technologies will shape productivity in the workplace, you have, I think, a good metaphor in comparing this to spreadsheets and the Adventist spreadsheets In the eighties, I think you had a lot of really serious people who were taking, making an earnest effort to be as productive and effective as possible in their lives, who were not making it their business to waste time.[00:44:42] Saw spreadsheet technology come out and it's like, man, well what am I gonna do? I'm the person that calculates things. Like I write it all down and that's how I create value. And then like you start using this new tool and it's like, oh, it turns out that was the Ted most tedious and least rewarding part of my job.[00:44:58] And I'm just so, you know, like I have, like, I still have that human drive to create. You just kind of point it at like more pressing and important problems. And I think that, that we probably don't, especially, and even when it comes to writing, which feels like a very like quintessentially human and creative act, there's a lot of just formulaic writing that you have to do.[00:45:22] Oh yeah. And it's like, maybe I shouldn't be spending my time on all of that kind of boiler plate. And, you know, there's a question of like, should we be spending our time reading boilerplate? And if so, why is there so much boiler plate? But I, I think that humans are incredibly resourceful and incredibly perceptive as to how they can be effective.[00:45:43] And that, you know, the, I think it will free us up to do much more useful things with our time. I think right now[00:45:50] there's still a, a bit of a stigma around, you know, you're using the model mm-hmm. To generate some of the text. But I built a open source, like a email drafter. Yeah. So for all of my emails, I get a G PT four pre-draft response.[00:46:04] And a lot of them I just sent, but now I'm still pretending to be me.[00:46:07] Okay. So that's why I'm talking to you[00:46:09] When I talk to you, you need to fine tune it. Right.[00:46:12] But in the future, maybe it's just gonna be acceptable that it's like, Hey, we don't actually need to spend this time, you and I talking. Yes. It's like, let the agents like cash it out and then come back to us and say, this[00:46:22] is what you're gonna do next.[00:46:23] Articulate your preferences and then you, I think this like trustworthiness is a piece of this here where like hallucinations, T b D, whether it is like actually attractable problem or whether you need other affordances like grounded methods to, to sort of. Is a hallucination, just a form of creativity, like, we'll see.[00:46:42] But um, I do think eventually we'll get to a point where we can, we trust these things to act on our behalf. And that scenario of like calendaring, for example, or just like, you know, even working out contract details, it's like, Just let me tell you exactly what I want and you make sure that you faithfully represent my interests.[00:47:00] That'll be really powerful.[00:47:02] OpenAssistant[00:47:02] So we haven't run this by you, but uh, I think you have a lot of opinions about, you know, the projects that are out there, uh mm-hmm. And three that are, are on mine. For one, you've already mentioned Open Assistant two, cereus, G B T also came out roughly in the same timeframe. I'm not sure if you want to comment on it, I'd like to compare because they, they also had a similar starting point as as you guys, and then three Red Pajama, which, uh, was just out this morning.[00:47:24] Yeah. We might, as might as well get a soundbite from you on your thoughts. So yeah, if you want to pick one, what was the first one? Uh, open Assistant.[00:47:30] Yeah. So, I mean, open Assistant is awesome. I love what they've done. I will be eager to use their free and open data set, uh, to improve the quality of Dolly three.[00:47:41] CerebrasGPT[00:47:41] Yeah, but also just like we're seeing the, the training is, so Cerus is a good example of, you know, I think they were, my understanding, and I don't know that team or really, you know, I haven't looked too closely at the technology, but I have worked with the model is that it's a demonstration of their capabilities on this unique chip that they've designed where they don't have to federate the models out to multiple cards.[00:48:04] But I think if you look at some of the benchmarks, it is on par or maybe a little shy of some of the Ethe I models. And I think that one of the things that you may see here is that the market for foundation models and like the importance of having your own foundation model is actually not that great.[00:48:27] That like you have a few. Core trains that people, I think of these kind of like stem cells where, you know, a stem cell is a piece of is, is a cell that can become more like its surrounding context. It can become anything upon differentiation when it's exposed to eye tissue or kidney tissue. These foundation models sort of are archetypal and then under fine tuning become the specific agent that you have a desire for.[00:48:53] And so I think they're expensive to train. They take a long time to train. Even with thousands of GPUs, I think you're still looking at like a month to stand up some of these really big models, and that's assuming everything goes correctly. And so what Open Assistant is doing is. I think representative of the next stage, which is like open data sets, and that's what the Dolly release is also about, is, I kind of think of it like an upgrade in a video game.[00:49:21] I don't play a ton of video games, but I, you know, I, I used to, and I'm familiar with the concept of like, your character can now double jump. Mm-hmm. Right. Great. You know, it's like, here's a data set that gives it the ability to talk to you. Hmm. Here's a data set that gives it the ability to answer questions over passages from a vector index.[00:49:38] I think anybody who's listening, I think there's a tremendous opportunity to create a lot of value for people by going through this exercise of the unsexy work, of just writing it down and figuring out ways to do that at scale. Some of that looks like semi-synthetic methods, so something I would love to see from the Dolly data set.[00:49:58] Is paraphrasing of all the prompts. So basically you now have multiple different ways of saying the same thing and you have the completions which are correct answers to different variants of the question. I think that will act as like a regular, it's kind of like image augmentation. I was gonna say, you flip it.[00:50:13] Yeah. Yeah. I believe that that will work for language. Like one of the things you could do. Cause we, we saw that within 24 hours the dataset had been translated into Spanish and Japanese. The dolly dataset. Yeah, it was, I mean, you know, it's maybe, yeah. Yeah. Right. Yeah. So that's super cool. Um, and also something that is only possible with open data.[00:50:31] Well, it's only useful with open data, but I just last night was thinking like, I wonder if you could to paraphrase, cuz it's not obvious to me like what the best and state of the most state-of-the-art paraphrasing model is. You could use Google Translate potentially and take the prompt. Translate it to Spanish and then translate it back to English, you get a slightly different way of saying the same thing.[00:50:54] Ah, right. So I think the self instruct paper is really about like few shot prompting to get more prompts and then using large models to get completions and then using human annotators to judge or train a reward model. I think that bootstrapping loop on the back of these open data sets is going to create multimillion scale training corpuses.[00:51:14] And so I, what Open Assistant has done is a, it's a great model. I don't know if you've tried their interactive chat, but it's just really quite an impressive accomplishment. But that the gesture towards open data that you know, the Dolly dataset and the open assistant dataset represent, I think is probably gonna define the next six to nine months of.[00:51:35] RedPajama[00:51:35] Work in this space. Um, and then the red, a red pajama. Red pajama, I mean, yeah, it's like I said, you can do a close read of the LLaMA paper. There's the dataset section and I think they use seven distinct data sets, archive, and I think maybe Stack exchange and common crawl.[00:51:50] Okay. So they have common crawl.[00:51:52] Yep. C4, which is Common crawl, but filtered subset. Yeah. Uh, GitHub archive books. Wikipedia Stack Exchange.[00:51:59] Yes. So, you know, take Common Crawl, for example, when you read the lLLaMA paper. So a common crawl I think is three terabytes in the lLLaMA paper. It's not something you just download from, like it's, you have to produce this data set, or at least the CC net, um, implementation that they reference there.[00:52:18] And you have like a single paragraph in this research paper dedicated to how they produce Common Crawl and they do near de-duplication. They train a model to predict whether something is likely to be a link, a reference link on Wikipedia. And there's just a bunch of other stuff that. Not only from like a, where do you get the model to predict whether something is a link as a reference on Wikipedia when you train it and then like where's your cut point?[00:52:41] You know, now you have kinda this precision recall trade off and it's like those decisions have material impacts on the quality and the character of the model that you learn. But also just from a scale standpoint, like building Common Crawl locally requires like a non-trivial distributed systems left.[00:52:59] And so I think Red Pajama is, and I think it's Mila and Chris Ray's lab hazy research, I think, or at least he's attached and together and I think together is kind of leading. There's a bunch of great teams behind that and so I have no reason to think they didn't do. The hard, difficult work correctly.[00:53:21] Yeah. And now is this major piece of the lift if you're wanting to do a lLLaMA repro in public. And I think that's would very naturally be the next step. And I would be kind of surprised if a train was not currently underway. Everybody agrees. LLLaMA is very, very strong. Also, we agree that it is not open incentives for somebody to spend a couple million bucks and produce it and then be the team that opened this architecture is, are quite high.[00:53:50] Mm-hmm. So I, I think in the next, you know, you asked for like predictions. I think we're five months at most away from a open LLaMA clone that is as high quality as, as what meta is produced. I will be disappointed if that's not the case.[00:54:07] Why Dolly > OpenAI GPT[00:54:07] And I think like there's the big distinction between what is open and what is like, Open in a way that is commercially usable.[00:54:13] Yeah. After that, I know the Dolly two post, you mentioned that you had a lot of inbound with Dolly. Yeah. 1.0. But a lot of businesses could not use it. Yeah. Because of where the data training data came from. Yes. What are some of the use cases that people have? There is, uh, a lot of it kind of like talking to your data.[00:54:30] Are there like, uh, other things that are maybe people are not thinking about using it for?[00:54:34] Yeah, so I mean, we have a number of customers who have reached out with really concrete use cases around customer support ticket resolution. One of the things that a lot of business open AI's models are incredibly powerful, and Databricks wants to be a business where you can use the right tool for the job.[00:54:55] Like if you have information from the public web, let's say you have forum posts, right, that you need to synthesize and process, that's just not sensitive information. You should be able to use truly whatever model. That might be a fine-tuned model that is like laser focused on your problem. It might be a general instruction following model and, and sort of whatever kind of intelligence GPT4 is, it's, you know, it's quite powerful.[00:55:20] You should be able to use those tools. There are definitely use cases in the enterprise where it's like, I either just, I'm not interested in sharing this ip. You know, these are effectively our state secrets. Or from a regulatory and compliance standpoint. I just can't send this data to a third party sub-process or something.[00:55:38] Even as quotidian is like, I just really don't want to go through procurement on this. You know, like it's kind of around those, um, I have some reasons to keep this in house. A lot of use cases like that and that, you know, I'm not a lawyer and so I won't speculate on the sort of actual licensing considerations or the actual obligations, but it's just like people like to be able to move confidently and what we've done with Dolly is make it super clear.[00:56:09] This model and this data set are licensed for commercial use. You can build a business on the back of this. And that, I think is a big part of why the response has been so positive.[00:56:19] Open Source Licensing for AI Models[00:56:19] Hugging face has, uh, the rail license responsible, um mm-hmm. AI license, which isn't recognized as open source yet. So that was the whole problem with stable diffusion, that it's just unclear cuz this, this is completely new license that is, uh, unproven.[00:56:32] But I just find it interesting that the existing open source licensing regime is mostly around code. And right now, you know, the, the value has shifted from code to the waits.[00:56:43] Yes. I think we can go on a three hour rant about the open source initiative and like who decides what an open source license is.[00:56:51] But I think there's a, I think the approach of like, hey, We know what commercial uses. Like this is good for it. Yes, it's good. You're not gonna have to worry about us suing you. It's like, you know, the semantics of it. Clear is always better. Exactly. It's like we don't need to be approved by the osi. Yeah.[00:57:07] You're gonna be okay. Just[00:57:09] Why Open Source Models?[00:57:09] to kind of like continue, like why open source? Yeah. I think that like it is with many eyes, all bugs are shallow. I think the reality is that like we do not know what the challenges we face with AI systems will be. Mm-hmm. And that the likelihood that we can get it a representative and comprehensive solution to the challenges they present by putting it in public and creating research artifacts that people who deal with ethics bias, ai, safety, security, these really sort of thorny issues, that they can take a hard look at how the actual thing is built and how it works and study it comprehensively rather than, Hey, we've got a team for that.[00:57:50] You're gonna mm-hmm. Just, you're just, we're just gonna need you to trust our work. I think I wanna be in that the former future rather than sort of like, I, I hope that people have done this correctly. I hope that this is somebody is taking care of this.[00:58:05] Moving Models[00:58:05] When people[00:58:06] evaluate this, how do you think about moving between models?[00:58:10] You know, obviously we talked about how the data set kind of shapes how the model behaves. Hmm. There's obviously people that might start on open AI and now they wanna try dollies. Yeah. Like what are some of the infrastructure there that maybe needs to be built to allow people to move their prompts from model to model?[00:58:26] Like to figure out, uh, how that works.[00:58:28] That's really interesting. Um, because you see even like moving between GPT3.5 and GPT4 that the behavior, like some things that were not possible on three five are No, I mean, many, many things that were not possible on three five are not possible on four, but you kind of want like slightly different problem formula, like slightly different prompt formulations or.[00:58:51] It's kind of like you want regression tests for prompts, and you could see like an automated system, which is uh, helps design a prompt such that the output of this new model is isomorphic to the outputs of the previous model. And sort of like using a language model to iterate on the prompt. So it just kind of evolves it to like adapt to the new model.[00:59:13] I have two beautiful boys who are, they're just incredible humans and my friend Ben and I built them a, an interactive choose your own adventure storytelling book that uses ChatGPT to generate stories and then options within those stories, and then uses open AI's image generation model Dolly to illustrate.[00:59:36] Those options. And then the kids can kind of choose their way through these stories. And the thing that you really like when you start to really push these things for more than just like single turn prompt response and I'm, I'm, you know, it's fine if it's language and you really need it to be like an api.[00:59:52] Is that like 19 times out, 20 it's like an p i and then the 20th generation. It's like just a totally different format. And he just like, you really like try to in the system prompt really seriously. I just only want you to give me three options. Yeah. And letter A, B, C, you know, I think that from a regression test standpoint, how do you know, like if I run this prompt a hundred times does a hundred out of a hun, does it come back a hundred out of a hundred in the format and sort of character that I require?[01:00:21] That's not something a person can really do effectively, and so I think you do need sort of model meta models that judge the outputs and that manage those migrations. Mm-hmm. Yeah, so I had, that's an interesting. Product class. I hadn't thought about it too much. Yeah.[01:00:34] Learning in a Simulation[01:00:34] When you mentioned before the example of the, you know, back country trip, I was like, yeah, it would be so cool if you had a, like a simulation where like, okay, this is the list you had.[01:00:44] Now I have this game where like I'm putting a character with that inventory and see if they survive in the back country. Cause you can like, you know, the first time I went to Yellowstone to camp, I forgot to pack like a fly for my tent and obviously it rained. That's because, you know, you get punished[01:00:58] right away.[01:00:59] Yeah. That's the environment providing you with a gradient. Exactly. Update your model eight. You should be grateful to have such an excellent Yeah. Mini[01:01:06] these models like the, the evolutionary piece that is missing is like, these models cannot. Die. They cannot break a arm. They cannot, when they make suggestions, like they don't actually Yeah.[01:01:16] Have any repercussion on them. Um, so I'm really curious if in the future, you know, okay, you wanna make a poem, uh, you know, I love poem. Now we're gonna send this structural people. Yeah. And if you get rejected, your model's gonna[01:01:28] Why Model Reflexion and Self Criticism Works[01:01:28] die. So I think like one of the things that's cool about Lang Chain, for example, we all know they're doing awesome work and building useful tools, but these models can tell if they're wrong.[01:01:38] So you can, like, you can ask a model to generate an utterance. And that next token prediction loss function may not capture. You may hallucinate something, you may make something up, but then you can show that generation to the same model. And ask it to tell you if it's correct or not. And it can, it can recognize that it's not, and I think that is a directly a function of the attention weights and that you can attend to the entire.[01:02:03] Whereas like for next token prediction, all I can see is the prefix and I'm just trying to choose and choosing sarcastically. Right. You're f frequently, like it's a weighted sample from the distribution over that soft softmax output vector, which does not have any. Reference to factuality, but when you resubmit to the model and you give it like, here's the entire generated passage, judge it in its completeness.[01:02:25] Well now I can attend to all of the token simultaneously, and it's just a much, much easier problem to solve. And so I think that like, oh, that's a cool insight. Yeah. Yeah. I mean it's, yeah. It's just, this is reflection. Yeah. You, you can just see what you said and like the model may contain enough information to judge it.[01:02:41] And so it's kind of like subject your plan mm-hmm. To an environment and see how it performs. I think like you could probably ask the model, I mean, we can try this today. Here's my plan for a trip. Critique it. Mm-hmm. Right? Like, what are, what are the things that could go wrong with this inventory? And I think that there's one scenario, there's one trajectory for this class of technologies, which would be like self-reflexive models where it is not super linear.[01:03:10] You don't get anything more than what is already contained in the models, and you just kind of saturate and it's like, okay, you need human feedback. There's another scenario, which is the alpha go scenario where models can play themselves and in observing their behavior and interactions they. Get stronger and better and more capable.[01:03:31] That's a much more interesting scenario and this idea that like in considering the entire generated sample, I have more insight than just when I'm sampling the next token. Mm-hmm. Suggests that there may. Be that escape potential in terms of getting super, you know, unsaturated returns on quality.[01:03:51] Lightning Round[01:03:51] Yeah, this was great, Mike kind of we're where a time, maybe we can jump into landing ground next.[01:03:55] We'll read you the questions again. Okay. If you wanna think about it. So, okay. Favorite AI[01:04:00] product? This is a boring answer, but it's true. Google Maps. Ah. And it's, how is it AI A, they're recently doing stuff with Nerf so that you can using Yeah. Multiple different photos. You can explore the interior of a business.[01:04:15] They are also undoubtedly, I mean like, I don't know the team at Google doing this, but digesting the sum total of human knowledge about each entity in their graph to like process that language and make judgements about what is this business? And listen, it's not an AI product, but it is a machine learning product categorically, and it's also an amazing product.[01:04:37] You forget how much you use it. I was at the coffee shop around the corner. I used it to figure out where to come. It was literally 150 meter walk, you know, it's just like that reflexive, but it's also from a, an information visualization. So I love maps. Mm-hmm. I opened our conversation saying that I think a lot about maps, that it is adaptive at multiple scales and will corson and refine the, the information that's displayed requires many, many judgements to be made and sim simultaneously about what is relevant and it's personalized.[01:05:08] It will take your intent. Are you driving? Okay, well show me parking garages preferentially. So it's very adaptive in such subtle ways that we don't notice it. And I think that's like great product design is like good editing. You don't notice it when it's good. Mm-hmm. And so I think Google Maps is an incredible AI ml.[01:05:28] Product accomplishment. Google Maps. Yeah. It's a great pick. Great. Well, and they need the help. Yeah.[01:05:36] It is actually the best ad uh, real estate, right? Like, there should be a ton of people buying ads specifically on Google Maps. Yeah. So they just show up and I, I don't know how big that business is, but it's gotta be huge.[01:05:45] Yeah. And, and then my subsequent thing is like, there should be Google Maps optimization, where you would name your business like Best Barbershop and it would show up as Best Barbershop when you look at it. Yeah,[01:05:55] of course. Right? Yeah. It's like AAA lock picks. Yeah. Right at the front of the Yellow Pages.[01:06:01] Favorite[01:06:01] AI people and communities you wanna shout out?[01:06:03] You know, I don't think that I have necessarily anything super original to say on this front. Um, The best of my understanding, this is an all volunteer effort and it's, you know, incredible what they have been able to accomplish. And it's like kind of in the constellation of projects.[01:06:20] You know, the additionally, I think these are what you would say and answer in response to this question, I think like the hugging face group is, it's kind of like Google Maps in a way, in the sense that you like, forget how complicated the thing that it's doing is, and I think they have. You see like specific people, I was thinking of STAs STAs, who works on the, works on a lot of the deep speed stuff, just super conscientious and like engaged with the community and like that the entire team at Hugging face is incredible and you know, they, you know, have made a lot of what is happening possible in the industry at large.[01:06:53] And so, um, and I think, yeah, this is like the power of open source ultimately Transformers, library, diffusers, all of it. It's just great. It's a great, it's a delightful product experience.[01:07:03] I think a lot of people, like I had, I once had hugging Face explained to me as Free, get LFS hosting. And I think they've, uh, they've moved beyond that in, in[01:07:11] recent years.[01:07:11] Yeah. A bit. Yeah. It's, it's quite strong work. Yeah.[01:07:14] Yeah. A year from now, what will people be the most surprised by in ai? You already[01:07:19] hinted[01:07:19] at? Uh, yeah, but I think that's not, like, I think that won't be surprising, I think as we're on a ballistic trajectory to having like a, an open lLLaMA reproduction. So here's something I think that will happen that we are not, like socially, we don't have a lot of priors for how to deal with, so this ghost writer track just came out this Kanye West Weekend.[01:07:40] Mm-hmm. AI collaboration. He has thoughts, Drake? Yeah. His thoughts. It's not really, Dave has thoughts. It's not really like, I, I like a different breed of hiphop, but like, it's. For an example of the class, it's like that does sound like a thing I might hear on the radio. So there's a world in, so skip flag was this knowledge graph that's builds itself from your workplace communication.[01:08:02] Think about all of the times that you have expressed your position and intent around a given topic in workplace communication or on the internet at large. I think like character AI is going in this direction where you're going to be able to talk to high fidelity avatars that represent the beliefs and intents of people around you, and that it will be both useful and convincing.[01:08:27] I don't know that like society has good models for how to sort of adapt to that existing and that it will, I suspect just on the basis of like what people are doing. Happened rather quickly at first.[01:08:41] Listen, you can definitely tell it's really good. Mm-hmm. I'm really curious what the long-term results are gonna be, because once you listen it once or twice, you can tell that it's like, it's not really like a coherent song kind of written.[01:08:55] But to me that the funniest thing is that actually, so Drake and the Weekend that never made a song together again because they kinda had a, a follow up between then and, and the Weekend at One song where he said, if you made me then replace me. Because Drake basically hinting that like if he didn't put the weekend on his album, he would've never become popular.[01:09:13] Okay. So it's funny that now there's like this AI generated song from the weekend. It just kind of puts the, you know, if you made me then replace me line in in a different context. But I think this will be super interesting for the labels, you know, like a lot of them do on the Masters to a lot of this music they do on, yeah.[01:09:31] A lot of rides. So, At some point, it's much easier to generate music this way than to do it in person. But I still think you need the artist touch.[01:09:39] Just like what is it that is unique and what, you know. I think artists frequently, you know, I, I know in my own writing and sort of like creative process, you sometimes feel like you're just going through the motions.[01:09:50] And it's funny how we have ways of talking about a phrase rolls off the tongue. That's very much like a causal language model. Mm-hmm. Where like we talk about talk tracks. I have a whole spiel, you know, you talk to a startup founder and you're like, oh my God, how many times have you said like, very close, like very tight variance on this Three minutes sometimes.[01:10:10] That's good. Yeah. It's, it's fine. It's just, it's a thing that we do. And so touching on this idea that like some of what we consider creative acts may not actually be creative acts and sort of, is there a pr, is there a market pressure to favor things that are truly creative versus just like formulaic and like re like rehashing kind of the same essence?[01:10:29] I think like art. Transcends boundaries is often the most interesting art to engage with, where it, it truly does confront you with something you haven't considered before. I hope that that's the place where humans play. And that they're kind of like, oh, I just need some lo-fi study beats. It's like, just gimme an infinite stream.[01:10:49] I'm fine. Because I'm just like,[01:10:52] you've seen that chart of like pop uh, songs, declining interns of the key changes, key changes in[01:10:58] Octa ranges. Completely. Completely. And like, I mean, we used to have[01:11:02] Bohemian Rhapsody and, and[01:11:03] yeah, it's a great example of something that would not be priced appropriately.[01:11:08] This is why I, I think perplexity AI is just very well named because we want more perplexity in our lives. Yes, by the way, shout out for replica ai. I don't know if you've come across them, but Absolutely. They are working on the digital twin stuff. Okay. Ai, uh, request for startups. AI thing you would pay for if someone[01:11:21] built it.[01:11:22] Well, so the LM op stuff for sure. Just like make it easy to generate and evaluate samples using multimodal, multimodal, I mean multiple modalities, not images and texts, but rather like humans, quantitative benchmarks and qualitative Oh, samples that I, I am able to evaluate myself, but other AI startups. I think that we have your sister, your wife, your wife has family that works in the park system.[01:11:49] Mm-hmm. Because it is so everybody has access to effectively the same information about what's interesting in the outdoors. I think you get to a lot of trail heads and you have very, very tight parking lots and it's difficult to get to a lot of these beautiful places. And like, um, mere Woods is another example of like, you gotta reserve a parking spot in the woods that's a plumber.[01:12:12] But I think that the US in particular is so unique in that we have such an expansive public lands, and I think that there are a lot of really majestic and beautiful places in the world that are not written about. And so I think from a geospatial standpoint, you could imagine representing each tile on a map like a word deve.[01:12:39] Embedding where you look at the context in which a location exists and the things people have said about it, and you, you kind of distill the essence of a place and you can given a statement about how I wanna spend my day route traffic more evenly. On the surface of the earth so that we are not all competing for the same fixed pool of resources.[01:13:03] I don't know that that's something really that's monetizable in like a, you know, is this gonna be the next 10 billion business sort of way. But like there's so much public land and there's so many back roads and like the days where I have, you know, rumbling down a dirt road, my brother are just the best days of my life.[01:13:22] And, uh, I want more of those. I want systems that help us live as fully as possible as humans. Yeah, there's definitely[01:13:29] a lot of, you know, you got the. The most popular trails. Everybody wants to be there. Yeah. And then there's the less known ones. And I feel like a lot of people back to the text to back is like, they don't know what they're gonna find, you know?[01:13:41] Mm-hmm. There's not like YouTube reviews of all these trails. Totally. But like you can see it. So I think a way to, to better understand that would be, would be cool.[01:13:49] I mean, just to kind of like riff on this just a little more and we can wrap, like I do think there's a AI technology as a swarm management.[01:13:59] Tool, you know, being able to perceive sensor and camera inputs from multiple different agents in a system. And I think about like ultra low powered gliders as an example of like, I would like to be able to get, I mean, there, there are tools now where you can, uh, for 180 bucks get a satellite to take a da a picture today of like a five by five kilometer area.[01:14:21] I just wanna be able to run recon fleets on the back country and get like up to date trail conditions. I don't know that anybody's gonna make any real money doing this, but if it existed, I would use it. So maybe I should build it maybe. Yeah, exactly. Open source. It's part of Databricks longstanding commitment to open source for diversifying new markets.[01:14:44] Awesome. Mike, it was, it was great[01:14:45] to have you. Oh, this was a, yeah. Get full access to Latent Space at www.latent.space/subscribe
01:15:5929/04/2023
AI-powered Search for the Enterprise — with Deedy Das of Glean
The most recent YCombinator W23 batch graduated 59 companies building with Generative AI for everything from sales, support, engineering, data, and more:Many of these B2B startups will be seeking to establish an AI foothold in the enterprise. As they look to recent success, they will find Glean, started in 2019 by a group of ex-Googlers to finally solve AI-enabled enterprise search. In 2022 Sequoia led their Series C at a $1b valuation and Glean have just refreshed their website touting new logos across Databricks, Canva, Confluent, Duolingo, Samsara, and more in the Fortune 50 and announcing Enterprise-ready AI features including AI answers, Expert detection, and In-context recommendations.We talked to Deedy Das, Founding Engineer at Glean and a former Tech Lead on Google Search, on why he thinks many of these startups are solutions looking for problems, and how Glean’s holistic approach to enterprise probllem solving has brought so much success. Deedy is also just a fascinating commentator on AI current events, being both extremely qualified and great at distilling insights, so we also went over his many viral tweets diving into Google’s competitive threats, AI Startup investing, and his exposure of Indian University Exam Fraud!Show Notes* Deedy on LinkedIn and Twitter and Personal Site* Glean* Glean and Google Moma* Golinks.io* Deedy on Google vs ChatGPT* Deedy on Google Ad Revenue* Deedy on How much does it cost to train a state-of-the-art foundational LLM?* Deedy on Google LaMDA cost* Deedy’s Indian Exam Fraud Story* Lightning Round* Favorite Products: (covered in segment)* Favorite AI People: AI Pub* Predictions: Models will get faster for the same quality* Request for Products: Hybrid Email Autoresponder* Parting Takeaway: Read the research!Timestamps* [00:00:21] Introducing Deedy* [00:02:27] Introducing Glean* [00:05:41] From Syntactic to Semantic Search* [00:09:39] Why Employee Portals* [00:12:01] The Requirements of Good Enterprise Search* [00:15:26] Glean Chat?* [00:15:53] Google vs ChatGPT* [00:19:47] Search Issues: Freshness* [00:20:49] Search Issues: Ad Revenue* [00:23:17] Search Issues: Latency* [00:24:42] Search Issues: Accuracy* [00:26:24] Search Issues: Tool Use* [00:28:52] Other AI Search takes: Perplexity and Neeva* [00:30:05] Why Document QA will Struggle* [00:33:18] Investing in AI Startups* [00:35:21] Actually Interesting Ideas in AI* [00:38:13] Harry Potter IRL* [00:39:23] AI Infra Cost Math* [00:43:04] Open Source LLMs* [00:46:45] Other Modalities* [00:48:09] Exam Fraud and Generated Text Detection* [00:58:01] Lightning RoundTranscript[00:00:00] Hey everyone. Welcome to the Latent Space Podcast. This is Alessio, partner and CTO and residence at Decibel Partners. I'm joined by my, cohost swyx, writer and editor of[00:00:19] Latent Space. Yeah. Awesome.[00:00:21] Introducing Deedy[00:00:21] And today we have a special guest. It's Deedy Das from Glean. Uh, do you go by Deedy or Debarghya? I go by Deedy. Okay.[00:00:30] Uh, it's, it's a little bit easier for the rest of us to, uh, to, to spell out. And so what we typically do is I'll introduce you based on your LinkedIn profile, and then you can fill in what's not on your LinkedIn. So, uh, you graduated your bachelor's and masters in CS from Cornell. Then you worked at Facebook and then Google on search, specifically search, uh, and also leading a sports team focusing on cricket.[00:00:50] That's something that we, we can dive into. Um, and then you moved over to Glean, which is now a search unicorn in building intelligent search for the workplace. What's not on your LinkedIn that people should know about you? Firstly,[00:01:01] guys, it's a pleasure. Pleasure to be here. Thank you so much for having me.[00:01:04] What's not on my LinkedIn is probably everything that's non-professional. I think the biggest ones are I'm a huge movie buff and I love reading, so I think I get through, usually I like to get through 10 books ish a year, but I hate people who count books, so I should say the number. And increasingly, I don't like reading non-fiction books.[00:01:26] I actually do prefer reading fiction books purely for pleasure and entertainment. I think that's the biggest omission from my LinkedIn.[00:01:34] What, what's, what's something that, uh, caught your eye for fiction stuff that you would recommend people?[00:01:38] Oh, I recently, we started reading the Three Body Problem and I finished it and it's a three part series.[00:01:45] And, uh, well, my controversial take is I did not really enjoy the second part, and so I just stopped. But the first book was phenomenal. Great concept. I didn't know you could write alien fiction with physics so Well, and Chinese literature in particular has a very different cadence to it than Western literature.[00:02:03] It's very less about the, um, let's describe people and what they're all about and their likes and dislikes. And it's like, here's a person, he's a professor of physics. That's all you need to know about him. Let's continue with the story. Um, and, and I, I, I, I enjoy it. It's a very different style from, from what I'm used.[00:02:21] Yeah, I, I heard it's, uh, very highly recommended. I think it's being adapted to a TV show, so looking forward[00:02:26] to that.[00:02:27] Introducing Glean[00:02:27] Uh, so you spend now almost four years at gle. The company's not unicorn, but you were on the founding team and LMS and tech interfaces are all the reach now. But you were building this before.[00:02:38] It was cool, so to speak. Maybe tell us more about the story, how it became, and some of the technological advances you've seen. Because I think you started, the company started really close to some of the early GPT models. Uh, so you've seen a lot of it from, from day one.[00:02:53] Yeah. Well, the first thing I'll say is Glean was never started to be a.[00:02:58] Technical product looking for a solution. We were always wanted to solve a very critical problem first that we saw, not only in the companies that we'd worked in before, but in all of the companies that a lot of our, uh, a lot of the founding team had been in past their time at Google. So Google has a really neat tool that already kind of does this internally.[00:03:18] It's called MoMA, and MoMA sort of indexes everything that you'd use inside Google because they have first party API accessed who has permissions to what document and what documents exist, and they rank them with their internal search tool. It's one of those things where when you're at Google, you sort of take it for granted, but when you leave and go anywhere else, you're like, oh my God, how do I function without being able to find things that I've worked on?[00:03:42] Like, oh, I remember this guy had a presentation that he made three meetings ago and I don't remember anything about it. I don't know where he shared it. I don't know if he shared it, but I do know the, it was a, something about X and I kind of wanna find that now. So that's the core. Information retrieval problem that we had set out to tackle, and we realized when we started looking at this problem that enterprise search is actually, it's not new.[00:04:08] People have been trying to tackle enterprise search for decades. Again, pre two thousands people have been trying to build these on-prem enterprise search systems. But one thing that has really allowed us to build it well, A, you now have, well, you have distributed elastic, so that really helps you do a lot of the heavy lifting on core infra.[00:04:28] But B, you also now have API support that's really nuanced on all of the SaaS apps that you use. So back in the day, it was really difficult to integrate with a messaging app. They didn't have an api. It didn't have any way to sort of get the permissions information and get the messaging information. But now a lot of SaaS apps have really robust APIs that really let.[00:04:50] Index everything that you'd want though though. That's two. And the third sort of big macro reason why it's happening now and why we're able to do it well is the fact that the SaaS apps have just exploded. Like every company uses, you know, 10 to a hundred apps. And so just the urgent need for information, especially with, you know, remote work and work from home, it's just so critical that people expect this almost as a default that you should have in your company.[00:05:17] And a lot of our customers just say, Hey, I don't, I can't go back to a life without internal search. And I think we think that's just how it should be. So that's kind of the story about how Glean was founded and a lot of the LLM stuff. It's neat that all, a lot of that's happening at the same time that we are trying to solve this problem because it's definitely applicable to the problem we're trying to solve.[00:05:37] And I'm really excited by some of the stuff that we are able to do with it.[00:05:41] From Syntactic to Semantic Search[00:05:41] I was talking with somebody last weekend, they were saying the last couple years we're going from the web used to be syntex driven. You know, you siegal for information retrieval, going into a symantics driven where the syntax is not as important.[00:05:55] It's like the, how you actually explain the question. And uh, we just asked Sarah from Seek.ai on the previous episode and instead of doing natural language and things like that for enterprise knowledge, it's more for business use cases. So I'm curious to see, you know, The enterprise of the future, what that looks like, you know, is there gonna be way less dropdowns and kind of like, uh, SQL queries and stuff like that.[00:06:19] And it's more this virtual, almost like person that embodies the company that is like a, an LLM in a way. But how do you do that without being able to surface all the knowledge that people have in the organization? So something like Lean is, uh, super useful for[00:06:35] that. Yeah, I mean, already today we see these natural language queries as well.[00:06:39] I, I will say at, at this point, it's still a small fraction of the queries. You see a lot of, a lot of the queries are, hey, what is, you know, just a name of a project or an acronym or a name of a person or some someone you're looking for. Yeah, I[00:06:51] think actually the Glean website explains gleans features very well.[00:06:54] When I, can I follow the video? Actually, video wasn't that, that informative video was more like a marketing video, but the, the actual website was showing screenshots of what you see there in my language is an employee portal. That happens to have search because you also surface like collections, which proactively show me things without me searching anything.[00:07:12] Right. Like, uh, you even have Go links, you should copy it, I think from Google, right? Which like, it's basically, uh, you know, in my mind it's like this is ex Googlers missing Google internal stuff. So they just built it for everyone else. So,[00:07:25] well, I can, I can comment on that. So a, I should just plug that we have a new website as of today.[00:07:30] I don't know how, how it's received. So I saw it yesterday, so let, let me know. I think today we just launch, I don't know when we launched a new one, I think today or yesterday. Yeah,[00:07:38] it's[00:07:38] new. I opened it right now it's different than yesterday.[00:07:41] Okay. It's, it's today and yeah. So one thing that we find is that, Search in itself.[00:07:48] This is actually, I think, quite a big insight. Search in itself is not a compelling enough use case to keep people drawn to your product. It's easy to say Google search is like that, but Google Search was also in an era where that was the only website people knew, and now it's not like that. When you are a new tool that's coming into a company, you can't sit on your high horse and say, yeah, of course you're gonna use my tool to search.[00:08:13] No, they're not gonna remember who you are. They're gonna use it once and completely forget to really get that retention. You need to sort of go from being just a search engine to exactly what you said, Sean, to being sort of an employee portal that does much more than that. And yeah, the Go Links thing, I, I mean, yes, it is copied from Google.[00:08:33] I will say there's a complete other startup called Go links.io that has also copied it from Google and, and everyone, everyone misses Go Links. It's very useful to be able to write a document and just be like, go to go slash this. And. That's where the document is. And, and so we have built a big feature set around it.[00:08:50] I think one of the critical ones that I will call out is the feed. Just being able to see, not just, so documents that are trending in your sub-organization documents that you, we think you should see are a limited set of them, as well as now we've launched something called Mentions, which is super useful, which is all of your tags across all of your apps in one place in the last whatever, you know, time.[00:09:14] So it's like all of the hundred Slack pings that you have, plus the Jira pings, plus the, the, the email, all of that in one place is super useful to have. So you did GitHub. Yeah, we do get up to, we do get up to all the mentions.[00:09:28] Oh my God, that's amazing. I didn't know you had it, but, uh, um, this is something I wish for myself.[00:09:33] It's amazing.[00:09:34] It's still a little buggy right now, but I think it's pretty good. And, and we're gonna make it a lot better as as we go.[00:09:39] Why Employee Portals[00:09:39] This[00:09:39] is not in our preset list of questions, but I have one follow up, which is, you know, I've worked in quite a few startups now that don't have employee portals, and I've worked at Amazon, which had an employee portal, but it wasn't as beautiful or as smart as as glean.[00:09:53] Why isn't this a bigger norm in all[00:09:56] companies? Well, there's several reasons. I would say one reason is just the dynamics of how enterprise sales happens is. I wouldn't say broken. It is, it is what it is, but it doesn't always cater to employees being happy with the best tools. What it does cater to is there's different incentive structures, right?[00:10:16] So if I'm an IT buyer, I have a budget and I need to understand that for a hundred of these tools that are pitched to me all the time, which ones really help the company And the way usually those things are evaluated is does it increase revenue and does it cut cost? Those are the two biggest ones. And for a software like Glean or a search portal or employee portal, it's actually quite difficult when you're in, generally bucketed in the space of productivity to say, Hey, here's a compelling use use case for why we will cut your cost or increase your revenue.[00:10:52] It's just a softer argument that you have to make there. It's just a fundamental nature of the problem versus if you say, Hey, we're a customer support tool. Everyone in SaaS knows that customer support tools is just sort of the. The last thing that you go to when you're looking for ideas, because it's easy to sell.[00:11:08] It's like, here's a metric. How many tickets can your customer support agent resolve? We've built a thing that makes it 20% better. That means it's 1,000 thousand dollars cost savings. Pay us 50 k. Call it a deal. That's a good argument. That's a very simple, easy to understand argument. It's very difficult to make that argument with search, which you're like, okay, you're gonna get see about 10 to 20 searches that's gonna save about this much time, uh, a day.[00:11:33] And that results in this much employee productivity. People just don't buy it as easily. So the first reaction is, oh, we work fine without it. Why do we need this now? It's not like the company didn't work without this tool, and uh, and only when they have it do they realize what they were missing out on.[00:11:50] So it's a difficult thing to sell in, in some ways. So even though the product is, in my opinion, fantastic, sometimes the buyer isn't easily convinced because it doesn't increase revenue or cut cost.[00:12:01] The Requirements of Good Enterprise Search[00:12:01] In terms of technology, can you maybe talk about some of the stack and you see a lot of companies coming up now saying, oh, we help you do enterprise search.[00:12:10] And it's usually, you know, embedding to then do context for like a LLM query mostly. I'm guessing you started as like closer to like the vector side of thing maybe. Yeah. Talk a bit about that and some learning siva and as founders try to, to build products like this internally, what should they think[00:12:27] about?[00:12:28] Yeah, so actually leading back from the last answer, one of the ways a lot of companies who are in the enterprise search space are trying to tackle the problem of sales is to lean into how advance the technology is, which is useful. It's useful to say we are AI powered, LLM powered vector search, cutting edge, state-of-the-art, yada, yada, yada.[00:12:47] Put it all your buzzwords. That's nice, but. The question is how often does that translate to better user experience is sort of, a fuzzy area where it, it's really hard for even users to tell, to be honest. Like you can have one or two great queries and one really bad query and be like, I don't know if this thing is smart.[00:13:06] And it takes time to evaluate and understand how a certain engine is doing. So to that, I think one of the things that we learned from Google, a lot of us come from an ex Google search background, and one of the key learnings is often with search, it's not about how advanced or how complex the technology is, it's about the rigor and intellectual honesty that you put into tuning the ranking algorithm.[00:13:30] That's a painstaking long-term and slow process at Google until I would say maybe 20 17, 20 18. Everything was run off of almost no real ai, so to speak. It was just information retrieval at its core, very basic from the seventies, eighties, and a bunch of these ranking components that are put stacked on top of it that do various tasks really, really well.[00:13:57] So one task in search is query understanding what does the query mean? One task is synonymous. What are other synonyms for this thing that we can also match on? One task is document understanding. Is this document itself a high quality document or not? Or is it some sort of SEO spam? And admittedly, Google doesn't do so well on that anymore, but there's so many tough sub problems that it breaks search down into and then just gets each of those problems, right, to create a nice experience.[00:14:24] So to answer your question, also, vector search we do, but it is not the only way we get results. We do a hybrid approach both using, you know, core IR signal synonymy. Query accentuation with things like acronym expansion, as well as stuff like vector search, which is also useful. And then we apply our level of ranking understanding on top of that, which includes personalization, understanding.[00:14:50] If you're an engineer, you're probably not looking for Salesforce documents. You know, you're probably looking for documents that are published or co-authored by people in your team, in your immediate team, and our understanding of all of your interactions with people around you. Our personalization layer, our good work on ranking is what makes us.[00:15:09] Good. It's not sort of, Hey, drop in LLM and embeddings and we become amazing at search. That's not how we think it[00:15:16] works. Yeah. I think there's a lot of polish that mix into quality products, and that's the difference that you see between Hacker News, demos and, uh, glean, which is, uh, actual, you know, search and chat unicorn.[00:15:26] Glean Chat?[00:15:26] But also is there a glean chat coming? Is is, what do you think about the[00:15:30] chat form factor? I can't say anything about it, but I think that we are experi, my, my politically correct answer is we're experimenting with many technologies that use modern AI and LLMs, and we will launch what we think users like best.[00:15:49] Nice. You got some media training[00:15:51] again? Yeah. Very well handed.[00:15:53] Google vs ChatGPT[00:15:53] We can, uh, move off of Glean and just go into Google search. Uh, so you worked on search for four years. I've always wanted to ask what happens when I type something into Google? I feel like you know more than others and you obviously there's the things you cannot say, but I'm sure Google does a lot of the things that Glean does as well.[00:16:08] How do you think about this Google versus ChatGPT debate? Let's, let's maybe start at a high level based on what you see out there, and I think you, you see a lot of[00:16:15] misconceptions. Yeah. So, okay, let me, let me start with Google versus ChatGPT first. I think it's disingenuous, uh, if I don't say my own usage pattern, which is I almost don't go back to Google for a large section of my queries anymore.[00:16:29] I just use ChatGPT I am a paying plus subscriber and it's sort of my go-to for a lot of things. That I ask, and I also have to train my mind to realize that, oh, there's a whole set of questions in your head that you never realize the internet could answer for you, and that now you're like, oh, wait, I could actually ask this, and then you ask it.[00:16:48] So that's my current usage pattern. That being said, I don't think that ChatGPT is the best interface or technology for all sets of queries. I think humans are obviously very easily excited by new technology, but new technology does not always mean the previous technology was worse. The previous technology is actually really good for a lot of things, and for search in particular, if you think about all the queries that come into Google search, they fall into various kinds of query classes, depending on whatever taxonomy you want to use.[00:17:24] But one sort of way of, of of understanding broad, generally, the query classes is something that is information seeking or exploratory. And for information for exploratory queries. I think there are uses where Google does really well. Like for example, let's say you want to just know a list of songs of this artist in this year.[00:17:49] Google will probably be able to add a hundred percent, tell you that pretty accurately all the time. Or if you want to say understand like what showtimes of movies came out today. So fresh queries, another query class, Google will be really good at that chat, not so good at that. But if you look at information seeking queries, you could even argue that if I ask for information about Donald Trump, Maybe ChatGPT will spit out a reasonable sounding paragraph and it makes sense, but it doesn't give me enough stuff to like click on and go to and navigate to in a news article here.[00:18:25] And I just kind wanna see a lot of stuff happening. So if you really break down the problem, I think it's not as easy as saying ChatGPT is a silver bullet for every kind of information need. There's a lot of information needs, especially for tail queries. So for long. Un before seen queries like, Hey, tell me the cheat code in Doom three.[00:18:43] This level, this boss ChatGPTs gonna blow it out the water on those kind of queries cuz it's gonna figure out all of these from these random sparse documents and random Reddit threads and assemble one consistent answer for you where it takes forever to find this kind of stuff on Google. For me personally, coding is the biggest use case for anything technical.[00:19:02] I just go to ChatGPT cuz parsing through Stack Overflow is just too mentally taxing and I don't care about, even if ChatGPT hallucinates a wrong answer, I can verify that. But I like seeing a coherent, nice answer that I can just kind of good starting point for my research on whatever I'm trying to understand.[00:19:20] Did you see the, the statistic that, uh, the Allin guys have been saying, which is, uh, stack overflow traffic is down 15%? Yeah, I did, I did.[00:19:27] See that[00:19:28] makes sense. But I, I, I don't know if it's like only because of ChatGPT, but yeah, sure. I believe[00:19:33] it. No, the second part was just about if some of the enterprise product search moves out of Google, like cannot, that's obviously a big AdWords revenue driver.[00:19:43] What are like some of the implications in terms of the, the business[00:19:46] there?[00:19:47] Search Issues: Freshness[00:19:47] Okay,[00:19:47] so I would split this answer into two parts. My first part is just talking about freshness, cuz the query that you mentioned is, is specifically the, the issue there is being able to access fresh information. Google just blanket calls his freshness.[00:20:01] Today's understanding of large language models is that it cannot do anything that's highly fresh. You just can't train these things fast enough and cost efficiently enough to constantly index new, new. Sources of data and then serve it at the same time in any way that's feasible. That might change in the future, but today it's not possible.[00:20:20] The best thing that you can get that's close to it is what, you know, the fancy term is retrieval, augmented generation, but it's a fancy way of saying just do the search in the background and then use the results to create the actual response. That's what Bing does today. So to answer the question about freshness, I would say it is possible to do with these methods, but those methods all in all involve using search in the backend to, to sort of get the context to generate the answer.[00:20:49] Search Issues: Ad Revenue[00:20:49] The second part of the answer is, okay, talk about ad revenue. A lot of Google's ad revenue just comes from the fact that over the last two decades, it's figured out how to put ad links on top of a search result page that sometimes users click. Now the user behavior on a chat product is not to click on anything.[00:21:10] You don't click on stuff you just read and you move on. And that actually, in my opinion, has severe impacts on the web ecosystem, on all of Google and all of technology and how we use the internet in the future. And, and the reason is one thing we also take for granted is that this ad revenue where everyone likes to say Google is bad, Google makes money off ads, yada, yada, yada, but this ad revenue kind of sponsored the entire internet.[00:21:37] So you have Google Maps and Google search and photos and drive and all of this great free stuff basically because of ads. Now, when you have this new interface, sure it, it comes with some benefits, but if users aren't gonna click on ads and you replace the search interface with just chat, that can actually be pretty dangerous in terms of what it even means.[00:21:59] To have to create a website, like why would I create a website if no one's gonna come to my. If it's just gonna be used to train a model and then someone's gonna spit out whatever my website says, then there's no incentive. And that kind of dwindles the web ecosystem. In the end, it means less ad revenue.[00:22:15] And then the other existential question is, okay, I'm okay with saying the incumbent. Google gets defeated and there's this new hero, which is, I don't know, open AI and Microsoft. Now reinvent the wheel. All of that stuff is great, but how are they gonna make money? They can make money off, I guess, subscriptions.[00:22:31] But subscriptions is not nearly gonna make you enough. To replace what you can make on ad revenue. Even for Bing today. Bing makes it 11 billion off ad revenue. It's not a society product like it's a huge product, and they're not gonna make 11 billion off subscriptions, I'll tell you that. So even they can't really replace search with this with chat.[00:22:51] And then there are some arguments around, okay, what if you start to inject ads in textual form? But you know, in my view, if the natural user inclination is not to click on something or chat, they're clearly not gonna click on something. No matter how much you try to inject, click targets into your result.[00:23:10] So, That's, that's my long answer to the ads question. I don't really know. I just smell danger in the horizon.[00:23:17] Search Issues: Latency[00:23:17] You mentioned the information augmented generation as well. Uh, I presumably that is literally Bing is probably just using the long context of GPT4 and taking the full text of all the links that they find, dumping it in, and then generating some answer.[00:23:34] Do you think like speed is a concern or people are just people willing to wait for smarter?[00:23:40] I think it's a concern. We noticed that every, every single product I've worked on, there's almost a linear, at least for some section of it, a very linear curve. A linear line that says the more the latency, the less the engagement, so there's always gonna be some drop off.[00:23:55] So it is a concern, but with things like latency, I just kind of presume that time solves these things. You optimize stuff, you make things a little better, and the latency will get down with time. And it's a good time to even mention that. Bard, we just came out today. Google's LLM. For Google's equivalent, I haven't tried it, but I've been reading about it, and that's based off a model called LamDA.[00:24:18] And LamDA intrinsically actually does that. So it does query what they call a tool set and they query search or a calculator or a compiler or a translator. Things that are good at factual, deterministic information. And then it keeps changing its response depending on the feedback from the tool set, effectively doing something very similar to what Bing does.[00:24:42] Search Issues: Accuracy[00:24:42] But I like their framing of the problem where it's just not just search, it's any given set of tools. Which is similar to what a Facebook paper called Tool Former, where you can think of language as one aspect of the problem and language interfaces with computation, which is another aspect of the problem.[00:24:58] And if you can separate those two, this one just talks to these things and figures out what to, how to phrase it. Yeah, so it's not really coming up with the answer. Their claim is like GPT4, for example. The reason it's able to do factual accuracy without search is just by memorizing facts. And that doesn't scale.[00:25:18] It's literally somewhere in the whole model. It knows that the CEO of Tesla is Elon Musk. It just knows that. But it doesn't know that this is a competition. It just knows that. Usually I see CEO, Tesla, Elon, that's all it knows. So the abstraction of language model to computational unit or tool set is an interesting one that I think is gonna be more explored by all of these engines.[00:25:40] Um, and the latency, you know, it'll.[00:25:42] I think you're focusing on the right things there. I actually saw another article this morning about the memorization capability. You know how GPT4 is a lot of, uh, marketed on its ability to answer SAT questions and GRE questions and bar exams and, you know, we covered this in our benchmarks podcast Alessio, but like I forgot to mention that all these answers are out there and were probably memorized.[00:26:05] And if you change them just, just a little bit, the model performance will probably drop a lot.[00:26:10] It's true. I think the most compelling, uh, proof of that, of what you just said is the, the code forces one where somebody I think tweeted, tweeted, tweeted about the, yeah, the 2021. Everything before 2021. It solves everything after.[00:26:22] It doesn't, and I thought that was interesting.[00:26:24] Search Issues: Tool Use[00:26:24] It's just, it's just dumb. I'm interested in two former, and I'm interested in react type, uh, patterns. Zapier just launched a natural language integration with LangChain. Are you able to compare contrast, like what approaches you like when it comes to LMS using[00:26:36] tools?[00:26:37] I think it's not boiled down to a science enough for me to say anything that's uh, useful. Like I think everyone is at a point of time where they're just playing with it. There's no way to reason about what LLMs can and can't do. And most people are just throwing things at a wall and seeing what sticks.[00:26:57] And if anyone claims to be doing better, they're probably lying because no one knows how these things behaves. You can't predict what the output is gonna be. You just think, okay, let's see if this works. This is my prompt. And then you measure and you're like, oh, that worked. Versus the stint and things like react and tool, form are really cool.[00:27:16] But those are just examples of things that people have thrown at a wall that stuck. Well, I mean, it's provably, it works. It works pretty, pretty well. I will say that one of the. It's not really of the framing of what kind of ways can you use LLMs to make it do cool things, but people forget when they're looking at cutting edge stuff is a lot of these LLMs can be used to generate synthetic data to bootstrap smaller models, and it's a less sexy space of it all.[00:27:44] But I think that stuff is really, really cool. Where, for example, I want to tag entities in a sentence that's a very simple classical natural language problem of NER. And what I do is I just, before I had to gather training data, train model, tune model, all of this other stuff. Now what I can do is I can throw GPT4 at it to generate a ton of synthetic data, which looks actually really good.[00:28:11] And then I can either just train whatever model I wanted to train before on this data, or I can use something called like low rank adaptation, which is distilling this large model into a much smaller, cost effective, fast model that does that task really well. And in terms of productionable natural language systems, that is amazing that this is stuff you couldn't do before.[00:28:35] You would have teams working for years to solve NER and that's just what that team does. And there's a great red and viral thread about our, all the NLP teams at Big Tech, doomed and yeah, I mean, to an extent now you can do this stuff in weeks, which is[00:28:51] huge.[00:28:52] Other AI Search takes: Perplexity and Neeva[00:28:52] What about some of the other kind of like, uh, AI native search, things like perplexity, elicit, have you played with, with any of them?[00:29:00] Any thoughts on[00:29:01] it? Yeah. I have played with perplexity and, and niva. Everyone. I think both of those products sort of try to do, again, search results, synthesis. Personally, I think Perplexity might be doing something else now, but I don't see the, any of those. Companies or products are disrupting either open AI or ChatGPT or Google being whatever prominent search engines with what they do, because they're all built off basically the Bing API or their own version of an index and their search itself is not good enough and there's not a compelling use case enough, I think, to use those products.[00:29:40] I don't know how they would make money, a lot of Neeva's way of making money as subscriptions. Perplexity I don't think has ever turned on the revenue dial. I just have more existential concerns about those products actually functioning in the long run. So, um, I think I see them as they're, they're nice, they're nice to play with.[00:29:56] It's cool to see the cutting edge innovation, but I don't really understand if they will be long lasting widely used products.[00:30:05] Why Document QA will Struggle[00:30:05] Do you have any idea of what it might take to actually do like a new kind of like, type of company in this space? Like Google's big thing was like page rank, right? That was like one thing that kind of set them apart.[00:30:17] Like people tried doing search before, like. Do you have an intuition for what, like the LM native page rank thing is gonna be to make something like this exist? Or have we kinda, you know, hit the plateau when it comes to search innovation?[00:30:31] So I, I talk to so many of my friends who are obviously excited about this technology as well, and many of them who are starting LLM companies.[00:30:38] You know, how many companies in the YC batch of, you know, winter 23 are LM companies? Crazy half of them. Right? Right. It's, it's ridiculous. But what I always, I think everyone's struggling with this problem is what is your advantage? What is your moat? I don't see it for a lot of these companies, and, uh, it's unclear.[00:30:58] I, I don't have a strong intuition. My sense is that the people who focus on problem first usually get much further than the people who focus solution first. And there's way too many companies that are solutions first. Which makes sense. It's always been the, a big achilles heel of the Silicon Valley.[00:31:16] We're a bunch of nerds that live in a whole different dimension, which nobody else can relate to, but nobody else. The problem is nobody else can relate to them and we can't relate to their problems either. So we look at tech first, not problem first a lot. And I see a lot of companies just, just do that.[00:31:32] Where I'll tell you one, this is quite entertaining to me. A very common theme is, Hey, LMS are cool, that, that's awesome. We should build something. Well, what should we build? And it's like, okay, consumer, consumer is cool, we should build consumer. Then it's like, ah, nah man. Consumers, consumer's pretty hard.[00:31:49] Uh, it's gonna be a clubhouse gonna blow up. I don't wanna blow up, I just wanna build something that's like, you know, pretty easy to be consistent with. We should go enter. Cool. Let's go enterprise. So you go enterprise. It's like, okay, we brought LMS to the enterprise. Now what problem do we tackle? And it's like, okay, well we can do q and A on documents.[00:32:06] People know how to do that, right? We've seen a couple of demos on that. So they build it, they build q and a on documents, and then they struggle with selling, or they're like, or people just ask, Hey, but I don't ask questions to my documents. Like, you realize this is just not a flow that I do, like I, oh no.[00:32:22] I ask questions in general, but I don't ask them to my documents. And also like what documents can you ask questions to? And they'll be like, well, any of them is, they'll say, can I ask them to all of my documents? And they'll be like, well, sure, if you give them, give us all your documents, you can ask anything.[00:32:39] And then they'll say, okay, how will you take all my document? Oh, it seems like we have to build some sort of indexing mechanism and then from one thing to the other, you get to a point where it's like we're building enterprise search and we're building an LM on top of it, and that is our product. Or you go to like ML ops and I'm gonna help you host models, I'm gonna help you train models.[00:33:00] And I don't know, it's, it seems very solution first and not problem first. So the only thing I would recommend is if you think about the actual problems and talk to users and understand what this can be useful for. It doesn't have to be that sexy of how it's used, but if it works and solves the problem, you've done your job.[00:33:18] Investing in AI Startups[00:33:18] I love that whole evolution because I think quite a few companies ha are, independently finding this path and, going down this route to build a glorified, you know, search spot. We actually interviewed a very problem focused builder, Mickey Friedman, who's very, very focused on products placement, image generation.[00:33:34] , and, you know, she's not focused on anything else in terms of image generation, like just focused on product placement and branding. And I think that's probably the right approach, you know, and, and if you think about like Jasper, right? Like they, they're out of all the other GPT3 companies when, when GPT3 first came out, they built focusing on, you know, writers on Facebook, you know, didn't even market on Twitter.[00:33:56] So like most people haven't heard of them. Uh, I think it's a timeless startup lesson, but it's something to remind people when they're building with, uh, language models. I mean, as a, as an investor like you, you know, you are an investor, you're your scout with me. Doesn't that make it hard to invest in anything like, cuz.[00:34:10] Mostly it's just like the incumbents will get to the innovation faster than startups will find traction.[00:34:16] Really. Like, oh, this is gonna be a hot take too. But, okay. My, my in, in investing, uh, with people, especially early, is often for me governed by my intuition of how they approach the problem and their experience with the technology, and pretty much solely that I don.[00:34:37] Really pretend to be an expert in the industry or the space that's their problem. If I think they're smart and they understand the space better than me, then I mostly convinced as if they've thought through enough of the business stuff, if they've thought through the, the market and everything else. I'm convinced I typically stray away from, you know, just what I just said.[00:34:57] Founders who are like LMS are cool and we should build something with them. That's not like usually very convincing to me. That's not a thesis. But I don't concern myself too much with pretending to understand what this space means. I trust them to do that. If I'm convinced that they're smart and they've thought about it, well then I'm pretty convinced that that they're a good person to, to, to[00:35:20] back.[00:35:21] Cool.[00:35:21] Actually Interesting Ideas in AI[00:35:21] Kinda like super novel idea that you wanna shout.[00:35:25] There's a lot of interesting explorations, uh, going on. Um, I, I, okay, I'll, I'll preface this with I, anything in enterprise I just don't think is cool. It's like including, like, it's just, it's, you can't call it cool, man. You're building products for businesses.[00:35:37] Glean is pretty cool. I'm impressed by Glean. This is what I'm saying. It's, it's cool for the Silicon Valley. It's not cool. Like, you're not gonna go to a dinner party with your parents and be like, Hey mom, I work on enterprise search. Isn't that awesome? And they're not all my, all my[00:35:51] notifications in one place.[00:35:52] Whoa.[00:35:55] So I will, I'll, I'll start by saying, for in my head, cool means like, the world finds this amazing and, and it has to be somewhat consumer. And I do think that. The ideas that are being played with, like Quora is playing with Poe. It's kind of strange to think about, and may not stick as is, but I like that they're approaching it with a very different framing, which is, Hey, how about you talk to this, this chat bot, but let's move out of this, this world where everyone's like, it's not WhatsApp or Telegram, it's not a messaging app.[00:36:30] You are actually generating some piece of content that now everybody can make you use of. And is there something there Not clear yet, but it's an interesting idea. I can see that being something where, you know, people just learn. Or see cool things that GPT4 has said or chatbots have said that's interesting in the image space.[00:36:49] Very contrasted to the language space. There's so much like I don't even begin to understand the image space. Everything I see is just like blows my mind. I don't know how mid journey gets from six fingers to five fingers. I don't understand this. It's amazing. I love it. I don't understand what the value is in terms of revenue.[00:37:08] I don't know where the markets are in, in image, but I do think that's way, way cooler because that's a demo where, and I, and I tried this, I showed GPT4 to, to my mom and my mom's like, yeah, this is pretty cool. It does some pretty interesting stuff. And then I showed the image one and she is just like, this is unbelievable.[00:37:28] There's no way a computer could write do this, and she just could not digest it. And I love when you see those interactions. So I do think image world is a whole different beast. Um, and, and in terms of coolness, lot more cool stuff happening in image video multimodal I think is really, really cool. So I haven't seen too many startups that are doing something where I'm like, wow, that's, that's amazing.[00:37:51] Oh, 11 labs. I'll, I'll mention 11 labs is pretty cool. They're the only ones that I know that are doing Oh, the voice synthesis. Have you tried it? I've only played with it. I haven't really tried generating my own voice, but I've seen some examples and it looks really, really awesome. I've heard[00:38:06] that Descript is coming up with some stuff as well to compete, cuz yeah, this is definitely the next frontier in terms of, podcasting.[00:38:13] Harry Potter IRL[00:38:13] One last thing I I will say on the cool front is I think there is something to be said about. A product that brings together all these disparate advancements in ai. And I have a view on what that looks like. I don't know if everyone shares that view, but if you bring together image generation, voice recognition, language modeling, tts, and like all of the other image stuff they can do with like clip and Dream booth and putting someone's actual face in it.[00:38:41] What you can actually make, this is my view of it, is the Harry Potter picture come to life where you actually have just a digital stand where there's a person who's just capable of talking to you in their voice, in, you know, understandable dialogue. That is how they speak. And you could just sort of walk by, they'll look at you, you can say hi, they'll be, they'll say hi back.[00:39:03] They'll start talking to you. You start talking back to it. That's sort of my, that's my my wild science fiction dream. And I think the technology exists to put all of those pieces together and. The implications for people who are older or saving people over time are huge. This could be a really cool thing to productionize.[00:39:23] AI Infra Cost Math[00:39:23] There's one more part of you that also tweets about numbers and math, uh, AI math essentially is how I'm thinking about it. What gets you into talking about costs and math and, and you know, just like first principles of how to think about language models.[00:39:39] One of my biggest beefs with big companies is how they abstract the cost away from all the engineers.[00:39:46] So when you're working on a Google search, I can't tell you a single number that is cost related at all. Like I just don't know the cost numbers. It's so far down the chain that I have no clue how much it actually costs to run search, and how much these various things cost aside from what the public knows.[00:40:03] And I found that very annoying because when you are building a startup, particularly maybe an enterprise startup, you have to be extremely cognizant about the cost because that's your unit economics. Like your primary cost is the money you spend on infrastructure, not your actual labor costs. The whole thesis is the labor doesn't scale, but the inf.[00:40:21] Does scale. So you need to understand how your infra costs scale. So when it comes to language models, given that these things are so compute heavy, but none of the papers talk about cost either. And it's just bothers me. I'm like, why can't you just tell me how much it costs you to, to build this thing?[00:40:39] It's not that hard to say. And it's also not that hard to figure out. They give you everything else, which is, you know, how many TPUs it took and how long they trained it for and all of that other stuff, but they don't tell you the cost. So I've always been curious because ev all everybody ever says is it's expensive and a startup can't do it, and an individual can't do it.[00:41:01] So then the natural question is, okay, how expensive is it? And that's sort of the, the, the background behind. Why I started doing some more AI math and, and one of the tweets that probably the one that you're talking about is where I compare the cost of LlaMA, which is Facebook's LLM, to PaLM with, uh, my best estimates.[00:41:23] And, uh, the only thing I'll add to that is it is quite tricky to even talk about these things publicly because you get rammed in the comments because by people who are like, oh, don't you know that this assumption that you made is completely BS because you should have taken this cost per hour? Because obviously people do bulk deals.[00:41:42] And yeah, I have two 80 characters. This is what I could have said. But I think ballpark, I think I got close. I, I'd like to imagine, I think I was off maybe by, by by two x on the lower side. I think I took an upper bound and I might have been off by, by two x. So my quote was 4 million for LlaMA and 27 for PaLM.[00:42:01] In fact, later today I'm going to do, uh, one on Bard. So. Oh oh one bar. Oh, the exclusive is that It's four, it's 4 million for Bard two.[00:42:10] Nice. Nice. Which is like, do you think that's like, don't you think that's actually not a lot, like it's a drop in the bucket for these[00:42:17] guys. One, and one of the, the valuable things to note when you're talking about this cost is this is the cost of the final training step.[00:42:24] It's not the cost of the entire process. And a common rebuttal is, well, yeah, this is your cost of the final training process, but in total it's about 10 x this amount cost. Because you have to experiment. You have to tune hyper parameters, you have to understand different architectures, you have to experiment with different kinds of training data.[00:42:43] And sometimes you just screw it up and you don't know why. And you have, you're just spend a lot of time figuring out why you screwed it up. And that's where the actual cost buildup happens, not in the one final last step where you actually train the final model. So even assuming like a 10 x on top of this, I think is, is, is fair for how much it would actually cost a startup to build this from scratch?[00:43:03] I would say.[00:43:04] Open Source LLMs[00:43:04] How do you think about open source in this then? I think a lot of people's big 2023 predictions are an LLM, you know, open source LLM, that is comparable performance to the GPT3 model. Who foots the bill for the mistakes? You know, like when when somebody opens support request that it's not good.[00:43:25] It doesn't really cost people much outside of like a GitHub actions run as people try entering these things separately. Like do you think open source is actually bad because you're wasting so much compute by so many people trying to like do their own things and like, do you think it's better to have a centralized team that organizes these experiments or Yeah.[00:43:43] Any thoughts there? I have some thoughts. I. The most easy comparison to make is to image generation world where, you know, you had Mid Journey and Dolly come out first, and then you had Imad come out with stability, which was completely open source. But the difference there is I think stability. You can pretty much run on your machine and it's okay.[00:44:06] It works pretty fast. So it, so the entire concept of, of open sourcing, it worked and people made forks that fine tuned it on a bunch of different random things and it made variance of stability that could. A bunch of things. So I thought the stability thing, agnostic of the general ethical concerns of training on everyone's art.[00:44:25] I thought it was a cool, cool addition to the sort of trade-offs in different models that you can have in image generation for text generation. We're seeing an equivalent effect with LlaMA and alpaca, which LlaMA being, being Facebook's model, which they didn't really open source, but then the weights got leaked and then people clone them and then they tuned them using GPT4 generated synthetic data and made alpaca.[00:44:50] So the version I think that's out there is only the 7,000,000,001 and then this crazy European c plus plus God. Came and said, you know what, I'm gonna write this entire thing in c plus plus so you can actually run it locally and and not have to buy GPUs. And a combination of those. And of course a lot of people have done work in optimizing these things to make it actually function quickly.[00:45:13] And we can get into details there, but a function of all of these things has enabled people to actually. Semi-good models on their computer. I don't have that much, I don't have any comments on, you know, energy usage and all of that. I don't really have an opinion on that. I think the fact that you can run a local version of this is just really, really cool, but also supremely dangerous because with images, conceivably, people can tell what's fake and what's real, even though there, there's some concerns there as well. But for text it's, you know, like you can do a lot of really bad things with your own, you know, text generation algorithm. You know, if I wanted to make somebody's life hell, I could spam them in the most insidious ways with all sorts of different kinds of text generation indefinitely, which I, I can't really do with images.[00:46:02] I don't know. I find it somewhat ethically problematic in terms of the power is too much for an individual to wield. But there are some libertarians who are like, yeah, why should only open AI have this power? I want this power too. So there's merits to both sides of the argument. I think it's generally good for the ecosystem.[00:46:20] Generally, it will get faster and the latency will get better and the models may not ever reach the size of the cutting edge that's possible, but it could be good enough to do. 80% of the things that bigger model could do. And I think that's a really good start for innovation. I mean, you could just have people come up with stuff instead of companies, and that always unlocks a whole vector of innovation that didn't previously exist.[00:46:45] Other Modalities[00:46:45] That was a really good, conclusion. I, I, I want to ask follow up questions, but also, that was a really good place to end it. Was there any other AI topics that you wanted to[00:46:52] touch on? I think Runway ML is the one company I didn't mention and that, that one's, uh, one to look out for.[00:46:58] I think doing really cool stuff in terms of video editing with generative techniques. So people often talk about the open AI and the Googles of the world and philanthropic and clo and cohere and big journey, all the image stuff. But I think the places that people aren't paying enough attention to that will get a lot more love in the next couple of years.[00:47:19] Better whisper, so better streaming voice recognition, better t t s. So some open source version of 11 labs that people can start using. And then the frontier is sort of multi-modality and videos. Can you do anything with videos? Can you edit videos? Can you stitch things together into videos from images, all sorts of different cool stuff.[00:47:40] And then there's sort of the long tail of companies like Luma that are working on like 3D modeling with generative use cases and taking an image and creating a 3D model from nothing. And uh, that's pretty cool too, although the practical use cases to me are a little less clear. Uh, so that's kind of covers the entire space in my head at least.[00:48:00] I[00:48:00] like using the Harry Potter image, like the moving and speaking images as a end goal. I think that's something that consumers can really get behind as well. That's super cool.[00:48:09] Exam Fraud and Generated Text Detection[00:48:09] To double back a little bit before we go into the lining round, I have one more thing, which is, relevant to your personal story, but then also relevant to our debate, which is a nice blend.[00:48:18] You're concerned about the safety of everyone having access to language models and you know, the potential harm that you can do there. My guess is that you're also not that positive on watermarking. Techniques from internal languages, right? Like maybe randomly sprinkling weird characters so that people can see like that this is generated by an AI model, but also like you have some personal experience with this because you found manipulation in the Indian Exam Board, which, uh, maybe you might be a similar story.[00:48:48] I, I don't know if you like, have any thoughts about just watermarking manipulation, like, you know, ethical deployments of, of, uh,[00:48:55] generated data.[00:48:57] Well, I think those two things are a little separate. Okay. One I would say is for watermarking text data. There is a couple of different approaches. I think there is actual value to that because from a pure technical perspective, you don't want models to train on stuff they've generated.[00:49:13] That's kind of bad for models. Yes. And two is obviously you don't want people to keep using Chatt p t for i, I don't know if you want this to use it for all their assignments and never be caught. Maybe you don't. Maybe you don't. But it, it seems like it's valuable to at least understand that this is a machine generated text versus not just ethically that seems, seems like something that should exist.[00:49:33] So I do think watermarking is, is. A good direction of research and it's, and I'm fairly positive on it. I actually do think people should standardize how that water marketing works across language models so that everyone can detect and understand language models and not just, OpenAI does its own models, but not the other ones and, and so on.[00:49:51] So that's my view on that. And then, and sort of transitioning into the exam data, this is really old one, but it's one of my favorite things to talk about is I. In America, as you know. Usually the way it works is you give your, you, you take your s a t exam, uh, you take a couple of aps, you do your school grades, you apply to colleges, you do a bunch of fluff.[00:50:10] You try to prove how you're good at everything. And then you, you apply to colleges and then it's a, a weird decision based on a hundred other factors. And then they decide whether you get in or not. But if you're rich, you're basically gonna get in anyway. And if you're a legacy, you're probably gonna get in and there's a whole bunch of stuff going on.[00:50:23] And I don't think the system is necessarily bad, but it's just really complicated. And some of the things are weird in India and in a lot of the non developed world, people are like, yeah, okay, we can't scale that. There's no way we can have enough people like. Non rigorously evaluate this cuz there's gonna be too much corruption and it's gonna be terrible at the end cuz people are just gonna pay their way in.[00:50:45] So usually it works in a very simple way where you take an exam that is standardized and sometimes you have many exams, sometimes you have an exam for a different subject. Sometimes it's just one for everything. And you get ranked on that exam and depending on your rank you get to choose the quality and the kind of thing you want to study.[00:51:03] Which this, the kind of thing always surprises people in America where it's not like, oh it's glory land, where you walk in and you're like, I think this is interesting and I wanna study this. Like, no, in the most of the world it's like you're not smart enough to study this, so you're probably not gonna study it.[00:51:18] And there's like a rank order of things that you need to be smart enough to do. So it's, it's different. And therefore these exams. Much more critical for the functioning of the system. So when there's fraud, it's not like a small part of your application going wrong, it's your entire application going wrong.[00:51:36] And that's why, that's just me explaining why this is severe. Now, one such exam is the one that you take in school. There's a, it's called a board exam. You take one in the 10th grade, which doesn't really matter for much, but, and then you take one in the 12th grade when you're about to graduate and that.[00:51:53] How you, where you go to college for a large set of colleges, not all, but a large set of colleges, and based on how much you get on your top five average, you're sort of slotted into a different stream in a d in a, in a different college. And over time, because of the competition between two of the boards that are a duopoly, there's no standardization.[00:52:13] So everyone's trying to like, give more marks than the, the, the other person to attract more students into their board because oh, that means that you can then claim, oh, you're gonna get into a better college if you take our exam and don't go to a school that administers the other exam. What? So it's, and that's, that's the, everyone knew that was happening ish, but there was no data to back it.[00:52:34] But when you actually take this exam as I did, you start realizing that the numbers, the marks make no sense because you're looking at. Kid who's also in your class and you're like, dude, this guy's not smart. How did he get a 90 in English? He's not good at English. Like, you can't speak it. You cannot give him a 90.[00:52:54] You gave me a 90. How did this guy get a 90? So everyone has like their anecdotal, this doesn't make any sense me, uh, moments with, with this exam, but no one has access to the data. So way back when, what I did was I realized they have very little security surrounding the data where the only thing that you need to put in to get access is your role number.[00:53:15] And so as long as you predict the right set of role numbers, you can get everybody's results. So unlike America, also exam results aren't treated with a level of privacy. In India, it's very common to sort of the entire class's results on a bulletin board. And you just see how everyone did and you shamed the people who are stupid.[00:53:32] That's just how it works. It's changed over time, but that's fundamentally a cultural difference. And so when I scraped all these results and I published it, and I, and I did some analysis, what I found was, A couple of very insidious things. One is that in, if you plot the distribution of marks, you generally tend to see some sort of skewed, but pseudo normal distribution where it's a big peak and a, and it falls off on both ends, but you see two interesting patterns.[00:54:01] One that is just the most obvious one, which is Grace Marks, which is the pass grade is 33. You don't see nobody got between 29 and 32 because what they did for every single exam is they just made you pass. They just rounded up to 33, which is okay. I'm not that concerned about whether you give Grace Marks.[00:54:21] It's kind of messed up that you do that, but okay, fine. You want to pass a bunch of people who deserve to fail, do it. Then the other more concerning thing was between 33 and 93, right? That's about 60 numbers, 61 numbers, 30 of those numbers were just missing, as in nobody got 91 on this exam. In any subject in any year.[00:54:44] How, how does that happen? You, you don't get a 91, you don't get a 93, 89, 87, 85, 84. Some numbers were just missing. And at first when I saw this, I'm like, this is definitely some bug in my code. There's no way that, like, there's 91 never happened. And so I started, I remember I asked a bunch of my friends, I'm like, dude, did you ever get a 9 81 in anything?[00:55:06] And they're like, no. And it just unraveled that this is obviously problematic cuz that means that they're screwing with your final marks in some way or the other. Yeah. And, and they're not transparent about how they do it. Then I did, I did the same thing for the other board. We found something similar there, but not, not, not the same.[00:55:24] The problem there was, there was a huge spike at 95 and then I realized what they were doing is they'd offer various exams and to standardize, they would blanket add like a, a, a, a raw number. So if you took the harder math exam, everyone would get plus 10. Arbitrarily, no one. This is not revealed or publicized.[00:55:41] It's randomly, that was the harder exam you guys all get plus 10, but it's capped at 95. That's just this stupid way to standardize. It doesn't make any sense. Ah, um, they're not transparent about it. And it affects your entire life because yeah, this is what gets you into college. And yeah, if you add the two exams up, this is 1.1 million kids taking it every year.[00:56:02] So that's a lot of people's lives that you're screwing with by not understanding numbers and, and not being transparent about how you're manipulating them. So that was the thesis in my view, looking back on it, 10 years later, it's been 10 years at this point. I think the media never did justice to it because to be honest, nobody understands statistics.[00:56:23] So over time it became a big issue then. And then there was a big Supreme court or high court ruling, which said, Hey, you guys can't do this, but there's no transparency. So there's no way of actually ensuring that they're not doing it. They just added a, a level of password protection, so now I can't scrape it anymore.[00:56:40] And, uh, they probably do the same thing and it's probably still as bad, but people aren't. Raising an issue about it. It's really hard to make this people understand the significance of it because people are so compelled to just go lean into the narrative of exams are b******t and we should never trust exams, and this is why it's okay to be dumb.[00:56:59] And it's not, that's not the point, like the point. So, I, I think the, the response was lackluster in retrospect, but that's, that's what I unveiled in 2013. That's fascinating.[00:57:09] You know, in my finance background, uh, the similar case happens with the Madoff funds because if you plot the, the statistical distribution of the, the Madoff funds, you could see that they were just not a normal distribution, and therefore they would, they would probably made up numbers.[00:57:25] And, uh, we also did the same thing in my first job as a, as a regulator in Singapore for, for hedge funds returns. Wow. Which is watermarking. It's this, this is a watermark of a human or, uh, some kind of system. Uh, you know, making it up. And statistically, if you look at the distribution, you can see like this, this violates any reasonable assumption.[00:57:41] Therefore, something's.[00:57:42] Wrong. Well, I see, I see what you mean there. Like in that sense. Yes. That's really cool that you worked on a very similar problem, and I agree that it's messed up. It's a good way to catch liars in[00:57:53] Madoff's case. Like they actually made it a big deal, but I don't know, like I don't see how this was a big, wasn't a bigger deal in India.[00:57:58] But anyway, uh, that's a conversation for another, uh, over drinks perhaps.[00:58:01] Lightning Round[00:58:01] But, so now we're gonna go into the lightning round. Just to cut things off with a, uh, overview. What are your favorite AI people and communities? You mentioned Reddits. Let's be specific about which, uh,[00:58:12] I actually don't really use Reddit that much for, uh, AI stuff.[00:58:16] It was just one, a one-off example. Most of my learnings are Twitter, and I think there are the obvious ones, like everyone follows Riley Goodside now and there's a bunch of like the really famous ones. But I think amongst the lesser known ones, there are, let me say just my favorite one is probably AI Pub because it does a roundup of everybody else's stuff regularly.[00:58:40] I know Brian who runs AI Pub as well, and I just think I find it really useful cuz often it's very hard to catch up on stuff and this gives you the entire roundup of last two weeks, here's what happened in ai.[00:58:51] Good, good, good. Uh, and any other communities like Slack communities, the scores? You don't[00:58:55] do that stuff?[00:58:56] I try to, but I, I don't because it's too time consuming. I prefer reading at my own pace.[00:59:02] Yeah, yeah, yeah. Okay. So, so my, my learning is, uh, start a Twitter like, uh, weekly recap of here's what happened in ai. I mean, it makes sense, right? Like it'll do very well. It was you[00:59:11] very well a year from now. What do you think people will be the most surprised[00:59:15] by in ai?[00:59:17] I think they're gonna be surprised at how much cheaper they're able to bring out, down the cost to, and how much faster that these models get. I'm more optimistic about cost and latency more than I am about just quality improvements at this point. I think modalities will change, but I think quality is near about like a, a maxima that we're gonna achieve.[00:59:42] So this is a request for startups or a request for site projects. What's an AI thing that you would pay for? Is somebody else built[00:59:47] it aside from the Harry Potter image one, which I would definitely, I would pay a lot of money to have like a floating, I don't know, bill Clinton in my room, just saying things back to me whenever I talk to it.[00:59:59] That would be cool. But in terms of other products, uh, if somebody built. A product that would smartly, I know many people have tried to build things like this that would smartly auto respond to things that it can auto respond to. And for the things that are actually important, please don't auto respond and just tell me to do it.[01:00:19] And that distinction, I think is really important. So somewhere in between the automate everything and the just suggest everything hybrid that works well, I think that would be really cool. Yeah. I've thought[01:00:30] about this as well. Even if it doesn't respond for you, it can draft an answer for you to edit.[01:00:35] Right. Uh, so that you, you at least get to review.[01:00:37] I actually built that this morning. If you guys want it. Ooh. You just, oh, with Gmail and then it pre-draft every email in your inbox. Really? But, uh, yeah, you have to change the prompt because my prompt says like, you. Software engineer. I'm a venture capitalist, this is where it works, blah, blah, blah, blah.[01:00:55] But you can modify that and then it, it works. It works. Are you[01:00:58] gonna open source it?[01:01:00] I, I probably will, but it sometimes it's like it cares too much about the prompt. So for example, in the prompt, I was like, if the person is asking about scheduling, suggest the time and public like the calendar, my calendar or give this calendar link in every email.[01:01:15] It will respond. And if you ever wanna chat, here's my calendar. Like no matter what the email was, every email, it would tell them to schedule time. So there's still work to[01:01:24] be done. You're just very helpful. You're just very, very helpful. Well, so actually I have a GitHub version of this, which I actually would pay someone to build, which is read somebody who opened a GitHub issue, like, and, and check if they have missed anything for resolutions.[01:01:38] And then generate response to like request for resolution. And then like me, you know, if, if they haven't answered in like 30 days, close the issue.[01:01:45] Absolutely. And, and one thing I'll add to that is also the idea of the ai, just going in and making PRS for you, I think is super compelling that it just says, Hey, I found all these vulnerabilities, uh, patch man.[01:01:58] Yeah, yeah. We, we got a cell company doing it, so Hello. Yeah, I'll let you know more. Deedee, thank you so much for coming on. I think to wrap it up, um, is there any parting thoughts, kind of like one thing that you want everyone to take away about AI and the impact this kind of have?[01:02:14] Yeah, I think my, my parting thought is I have always been a big fan of people of bridging the gap between research and the end consumer.[01:02:24] And I think this is just a great time to be alive where. If you are interested in AI or if you're even remotely interested, of course you can go build stuff. Of course you can read about it. But I think it's so cool that you sh you can just go read the paper and read the raw things that people did to make this happen.[01:02:42] And I really encourage people to go and read research, follow people on YouTube who are explaining this. Andre Kapai has a great channel where he also explains it. It's just a great time to learn in this space and I would really encourage more people to go and and read the actual stuff. It's really cool.[01:03:01] Thank you[01:03:01] so much, Didi, for coming on. It was a great chat. Um, where can people follow you on Twitter? Any other thing you wanna[01:03:08] plug? I think Twitter is fine. And there's a link to my website from my Twitter too. It's my first name, debark underscore das is my Twitter and dego.com is my website. But you can also just Google DB das and you will find both of those links.[01:03:25] Awesome. All right. Thank you so much.[01:03:27] Thank you. Thanks guys. Get full access to Latent Space at www.latent.space/subscribe
01:04:0222/04/2023
Segment Anything Model and the Hard Problems of Computer Vision — with Joseph Nelson of Roboflow
2023 is the year of Multimodal AI, and Latent Space is going multimodal too! * This podcast comes with a video demo at the 1hr mark and it’s a good excuse to launch our YouTube - please subscribe! * We are also holding two events in San Francisco — the first AI | UX meetup next week (already full; we’ll send a recap here on the newsletter) and Latent Space Liftoff Day on May 4th (signup here; but get in touch if you have a high profile launch you’d like to make). * We also joined the Chroma/OpenAI ChatGPT Plugins Hackathon last week where we won the Turing and Replit awards and met some of you in person!This post featured on Hacker News.Out of the five senses of the human body, I’d put sight at the very top. But weirdly when it comes to AI, Computer Vision has felt left out of the recent wave compared to image generation, text reasoning, and even audio transcription. We got our first taste of it with the OCR capabilities demo in the GPT-4 Developer Livestream, but to date GPT-4’s vision capability has not yet been released. Meta AI leapfrogged OpenAI and everyone else by fully open sourcing their Segment Anything Model (SAM) last week, complete with paper, model, weights, data (6x more images and 400x more masks than OpenImages), and a very slick demo website. This is a marked change to their previous LLaMA release, which was not commercially licensed. The response has been ecstatic:SAM was the talk of the town at the ChatGPT Plugins Hackathon and I was fortunate enough to book Joseph Nelson who was frantically integrating SAM into Roboflow this past weekend. As a passionate instructor, hacker, and founder, Joseph is possibly the single best person in the world to bring the rest of us up to speed on the state of Computer Vision and the implications of SAM. I was already a fan of him from his previous pod with (hopefully future guest) Beyang Liu of Sourcegraph, so this served as a personal catchup as well. Enjoy! and let us know what other news/models/guests you’d like to have us discuss! - swyxRecorded in-person at the beautiful StudioPod studios in San Francisco.Full transcript is below the fold.Show Notes* Joseph’s links: Twitter, Linkedin, Personal* Sourcegraph Podcast and Game Theory Story* Represently* Roboflow at Pioneer and YCombinator* Udacity Self Driving Car dataset story* Computer Vision Annotation Formats* SAM recap - top things to know for those living in a cave* https://segment-anything.com/* https://segment-anything.com/demo* https://arxiv.org/pdf/2304.02643.pdf * https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/* https://blog.roboflow.com/segment-anything-breakdown/* https://ai.facebook.com/datasets/segment-anything/* Ask Roboflow https://ask.roboflow.ai/* GPT-4 Multimodal https://blog.roboflow.com/gpt-4-impact-speculation/Cut for time:* WSJ mention* Des Moines Register story* All In Pod: timestamped mention* In Forbes: underrepresented investors in Series A* Roboflow greatest hits* https://blog.roboflow.com/mountain-dew-contest-computer-vision/* https://blog.roboflow.com/self-driving-car-dataset-missing-pedestrians/* https://blog.roboflow.com/nerualhash-collision/ and Apple CSAM issue * https://www.rf100.org/Timestamps* [00:00:19] Introducing Joseph* [00:02:28] Why Iowa* [00:05:52] Origin of Roboflow* [00:16:12] Why Computer Vision* [00:17:50] Computer Vision Use Cases* [00:26:15] The Economics of Annotation/Segmentation* [00:32:17] Computer Vision Annotation Formats* [00:36:41] Intro to Computer Vision & Segmentation* [00:39:08] YOLO* [00:44:44] World Knowledge of Foundation Models* [00:46:21] Segment Anything Model* [00:51:29] SAM: Zero Shot Transfer* [00:51:53] SAM: Promptability* [00:53:24] SAM: Model Assisted Labeling* [00:56:03] SAM doesn't have labels* [00:59:23] Labeling on the Browser* [01:00:28] Roboflow + SAM Video Demo * [01:07:27] Future Predictions* [01:08:04] GPT4 Multimodality* [01:09:27] Remaining Hard Problems* [01:13:57] Ask Roboflow (2019)* [01:15:26] How to keep up in AITranscripts[00:00:00] Hello everyone. It is me swyx and I'm here with Joseph Nelson. Hey, welcome to the studio. It's nice. Thanks so much having me. We, uh, have a professional setup in here.[00:00:19] Introducing Joseph[00:00:19] Joseph, you and I have known each other online for a little bit. I first heard about you on the Source Graph podcast with bian and I highly, highly recommend that there's a really good game theory story that is the best YC application story I've ever heard and I won't tease further cuz they should go listen to that.[00:00:36] What do you think? It's a good story. It's a good story. It's a good story. So you got your Bachelor of Economics from George Washington, by the way. Fun fact. I'm also an econ major as well. You are very politically active, I guess you, you did a lot of, um, interning in political offices and you were responding to, um, the, the, the sheer amount of load that the Congress people have in terms of the, the support.[00:01:00] So you built, representing, which is Zendesk for Congress. And, uh, I liked in your source guide podcast how you talked about how being more responsive to, to constituents is always a good thing no matter what side of the aisle you're on. You also had a sideline as a data science instructor at General Assembly.[00:01:18] As a consultant in your own consultancy, and you also did a bunch of hackathon stuff with Magic Sudoku, which is your transition from N L P into computer vision. And apparently at TechCrunch Disrupt, disrupt in 2019, you tried to add chess and that was your whole villain origin story for, Hey, computer vision's too hard.[00:01:36] That's full, the platform to do that. Uh, and now you're co-founder c e o of RoboFlow. So that's your bio. Um, what's not in there that[00:01:43] people should know about you? One key thing that people realize within maybe five minutes of meeting me, uh, I'm from Iowa. Yes. And it's like a funnily novel thing. I mean, you know, growing up in Iowa, it's like everyone you know is from Iowa.[00:01:56] But then when I left to go to school, there was not that many Iowans at gw and people were like, oh, like you're, you're Iowa Joe. Like, you know, how'd you find out about this school out here? I was like, oh, well the Pony Express was running that day, so I was able to send. So I really like to lean into it.[00:02:11] And so you kind of become a default ambassador for places that. People don't meet a lot of other people from, so I've kind of taken that upon myself to just make it be a, a part of my identity. So, you know, my handle everywhere Joseph of Iowa, like I I, you can probably find my social security number just from knowing that that's my handle.[00:02:25] Cuz I put it plastered everywhere. So that's, that's probably like one thing.[00:02:28] Why Iowa[00:02:28] What's your best pitch for Iowa? Like why is[00:02:30] Iowa awesome? The people Iowa's filled with people that genuinely care. You know, if you're waiting a long line, someone's gonna strike up a conversation, kinda ask how you were Devrel and it's just like a really genuine place.[00:02:40] It was a wonderful place to grow up too at the time, you know, I thought it was like, uh, yeah, I was kind of embarrassed and then be from there. And then I actually kinda looking back it's like, wow, you know, there's good schools, smart people friendly. The, uh, high school that I went to actually Ben Silverman, the CEO and, or I guess former CEO and co-founder of Pinterest and I have the same teachers in high school at different.[00:03:01] The co-founder, or excuse me, the creator of crispr, the gene editing technique, Dr. Jennifer. Doudna. Oh, so that's the patent debate. There's Doudna. Oh, and then there's Fang Zang. Uh, okay. Yeah. Yeah. So Dr. Fang Zang, who I think ultimately won the patent war, uh, but is also from the same high school.[00:03:18] Well, she won the patent, but Jennifer won the[00:03:20] prize.[00:03:21] I think that's probably, I think that's probably, I, I mean I looked into it a little closely. I think it was something like she won the patent for CRISPR first existing and then Feng got it for, uh, first use on humans, which I guess for commercial reasons is the, perhaps more, more interesting one. But I dunno, biolife Sciences, is that my area of expertise?[00:03:38] Yep. Knowing people that came from Iowa that do cool things, certainly is. Yes. So I'll claim it. Um, but yeah, I, I, we, um, at Roble actually, we're, we're bringing the full team to Iowa for the very first time this last week of, of April. And, well, folks from like Scotland all over, that's your company[00:03:54] retreat.[00:03:54] The Iowa,[00:03:55] yeah. Nice. Well, so we do two a year. You know, we've done Miami, we've done. Some of the smaller teams have done like Nashville or Austin or these sorts of places, but we said, you know, let's bring it back to kinda the origin and the roots. Uh, and we'll, we'll bring the full team to, to Des Moines, Iowa.[00:04:13] So, yeah, like I was mentioning, folks from California to Scotland and many places in between are all gonna descend upon Des Moines for a week of, uh, learning and working. So maybe you can check in with those folks. If, what do they, what do they decide and interpret about what's cool. Our state. Well, one thing, are you actually headquartered in Des Moines on paper?[00:04:30] Yes. Yeah.[00:04:30] Isn't that amazing? That's like everyone's Delaware and you're like,[00:04:33] so doing research. Well, we're, we're incorporated in Delaware. Okay. We we're Delaware Sea like, uh, most companies, but our headquarters Yeah. Is in Des Moines. And part of that's a few things. One, it's like, you know, there's this nice Iowa pride.[00:04:43] And second is, uh, Brad and I both grew up in Brad Mc, co-founder and I grew up in, in Des Moines. And we met each other in the year 2000. We looked it up for the, the YC app. So, you know, I think, I guess more of my life I've known Brad than not, uh, which is kind of crazy. Wow. And during yc, we did it during 2020, so it was like the height of Covid.[00:05:01] And so we actually got a house in Des Moines and lived, worked outta there. I mean, more credit to. So I moved back. I was living in DC at the time, I moved back to to Des Moines. Brad was living in Des Moines, but he moved out of a house with his. To move into what we called our hacker house. And then we had one, uh, member of the team as well, Jacob Sorowitz, who moved from Minneapolis down to Des Moines for the summer.[00:05:21] And frankly, uh, code was a great time to, to build a YC company cuz there wasn't much else to do. I mean, it's kinda like wash your groceries and code. It's sort of the, that was the routine[00:05:30] and you can use, uh, computer vision to help with your groceries as well.[00:05:33] That's exactly right. Tell me what to make.[00:05:35] What's in my fridge? What should I cook? Oh, we'll, we'll, we'll cover[00:05:37] that for with the G P T four, uh, stuff. Exactly. Okay. So you have been featured with in a lot of press events. Uh, but maybe we'll just cover the origin story a little bit in a little bit more detail. So we'll, we'll cover robo flow and then we'll cover, we'll go into segment anything.[00:05:52] Origin of Roboflow[00:05:52] But, uh, I think it's important for people to understand. Robo just because it gives people context for what you're about to show us at the end of the podcast. So Magic Sudoku tc, uh, techers Disrupt, and then you go, you join Pioneer, which is Dan Gross's, um, YC before yc.[00:06:07] Yeah. That's how I think about it.[00:06:08] Yeah, that's a good way. That's a good description of it. Yeah. So I mean, robo flow kind of starts as you mentioned with this magic Sudoku thing. So you mentioned one of my prior business was a company called Represent, and you nailed it. I mean, US Congress gets 80 million messages a year. We built tools that auto sorted them.[00:06:23] They didn't use any intelligent auto sorting. And this is somewhat a solved problem in natural language processing of doing topic modeling or grouping together similar sentiment and things like this. And as you mentioned, I'd like, I worked in DC for a bit and been exposed to some of these problems and when I was like, oh, you know, with programming you can build solutions.[00:06:40] And I think the US Congress is, you know, the US kind of United States is a support center, if you will, and the United States is sports center runs on pretty old software, so mm-hmm. We, um, we built a product for that. It was actually at the time when I was working on representing. Brad, his prior business, um, is a social games company called Hatchlings.[00:07:00] Uh, he phoned me in, in 2017, apple had released augmented reality kit AR kit. And Brad and I are both kind of serial hackers, like I like to go to hackathons, don't really understand new technology until he build something with them type folks. And when AR Kit came out, Brad decided he wanted to build a game with it that would solve Sudoku puzzles.[00:07:19] And the idea of the game would be you take your phone, you hover hold it over top of a Sudoku puzzle, it recognizes the state of the board where it is, and then it fills it all in just right before your eyes. And he phoned me and I was like, Brad, this sounds awesome and sounds like you kinda got it figured out.[00:07:34] What, what's, uh, what, what do you think I can do here? It's like, well, the machine learning piece of this is the part that I'm most uncertain about. Uh, doing the digit recognition and, um, filling in some of those results. I was like, well, I mean digit recognition's like the hell of world of, of computer vision.[00:07:48] That's Yeah, yeah, MNIST, right. So I was like, that that part should be the, the easy part. I was like, ah, I'm, he's like, I'm not so super sure, but. You know, the other parts, the mobile ar game mechanics, I've got pretty well figured out. I was like, I, I think you're wrong. I think you're thinking about the hard part is the easy part.[00:08:02] And he is like, no, you're wrong. The hard part is the easy part. And so long story short, we built this thing and released Magic Sudoku and it kind of caught the Internet's attention of what you could do with augmented reality and, and with computer vision. It, you know, made it to the front ofer and some subreddits it run Product Hunt Air app of the year.[00:08:20] And it was really a, a flash in the pan type app, right? Like we were both running separate companies at the time and mostly wanted to toy around with, with new technology. And, um, kind of a fun fact about Magic Sudoku winning product Hunt Air app of the year. That was the same year that I think the model three came out.[00:08:34] And so Elon Musk won a Golden Kitty who we joked that we share an award with, with Elon Musk. Um, the thinking there was that this is gonna set off a, a revolution of if two random engineers can put together something that makes something, makes a game programmable and at interactive, then surely lots of other engineers will.[00:08:53] Do similar of adding programmable layers on top of real world objects around us. Earlier we were joking about objects in your fridge, you know, and automatically generating recipes and these sorts of things. And like I said, that was 2017. Roboflow was actually co-found, or I guess like incorporated in, in 2019.[00:09:09] So we put this out there, nothing really happened. We went back to our day jobs of, of running our respective businesses, I sold Represently and then as you mentioned, kind of did like consulting stuff to figure out the next sort of thing to, to work on, to get exposed to various problems. Brad appointed a new CEO at his prior business and we got together that summer of 2019.[00:09:27] We said, Hey, you know, maybe we should return to that idea that caught a lot of people's attention and shows what's possible. And you know what, what kind of gives, like the future is here. And we have no one's done anything since. No one's done anything. So why is, why are there not these, these apps proliferated everywhere.[00:09:42] Yeah. And so we said, you know, what we'll do is, um, to add this software layer to the real world. Will build, um, kinda like a super app where if you pointed it at anything, it will recognize it and then you can interact with it. We'll release a developer platform and allow people to make their own interfaces, interactivity for whatever object they're looking at.[00:10:04] And we decided to start with board games because one, we had a little bit of history there with, with Sudoku two, there's social by default. So if one person, you know finds it, then they'd probably share it among their friend. Group three. There's actually relatively few barriers to entry aside from like, you know, using someone else's brand name in your, your marketing materials.[00:10:19] Yeah. But other than that, there's no real, uh, inhibitors to getting things going and, and four, it's, it's just fun. It would be something that'd be bring us enjoyment to work on. So we spent that summer making, uh, boggle the four by four word game provable, where, you know, unlike Magic Sudoku, which to be clear, totally ruins the game, uh, you, you have to solve Sudoku puzzle.[00:10:40] You don't need to do anything else. But with Boggle, if you and I are playing, we might not find all of the words that adjacent letter tiles. Unveil. So if we have a, an AI tell us, Hey, here's like the best combination of letters that make high scoring words. And so we, we made boggle and released it and that, and that did okay.[00:10:56] I mean maybe the most interesting story was there's a English as a second language program in, in Canada that picked it up and used it as a part of their curriculum to like build vocabulary, which I thought was kind of inspiring. Example, and what happens just when you put things on the internet and then.[00:11:09] We wanted to build one for chess. So this is where you mentioned we went to 2019. TechCrunch Disrupt TechCrunch. Disrupt holds a Hackathon. And this is actually, you know, when Brad and I say we really became co-founders, because we fly out to San Francisco, we rent a hotel room in the Tenderloin. We, uh, we, we, uh, have one room and there's like one, there's room for one bed, and then we're like, oh, you said there was a cot, you know, on the, on the listing.[00:11:32] So they like give us a little, a little cot, the end of the cot, like bled and over into like the bathroom. So like there I am sleeping on the cot with like my head in the bathroom and the Tenderloin, you know, fortunately we're at a hackathon glamorous. Yeah. There wasn't, there wasn't a ton of sleep to be had.[00:11:46] There is, you know, we're, we're just like making and, and shipping these, these sorts of many[00:11:50] people with this hack. So I've never been to one of these things, but[00:11:52] they're huge. Right? Yeah. The Disrupt Hackathon, um, I don't, I don't know numbers, but few hundreds, you know, classically had been a place where it launched a lot of famous Yeah.[00:12:01] Sort of flare. Yeah. And I think it's, you know, kind of slowed down as a place for true company generation. But for us, Brad and I, who likes just doing hackathons, being, making things in compressed time skills, it seemed like a, a fun thing to do. And like I said, we'd been working on things, but it was only there that like, you're, you're stuck in a maybe not so great glamorous situation together and you're just there to make a, a program and you wanna make it be the best and compete against others.[00:12:26] And so we add support to the app that we were called was called Board Boss. We couldn't call it anything with Boggle cause of IP rights were called. So we called it Board Boss and it supported Boggle and then we were gonna support chess, which, you know, has no IP rights around it. Uh, it's an open game.[00:12:39] And we did so in 48 hours, we built an app that, or added fit capability to. Point your phone at a chess board. It understands the state of the chess board and converts it to um, a known notation. Then it passes that to stock fish, the open source chess engine for making move recommendations and it makes move recommendations to, to players.[00:13:00] So you could either play against like an ammunition to AI or improve your own game. We learn that one of the key ways users like to use this was just to record their games. Cuz it's almost like reviewing game film of what you should have done differently. Game. Yeah, yeah, exactly. And I guess the highlight of, uh, of chess Boss was, you know, we get to the first round of judging, we get to the second round of judging.[00:13:16] And during the second round of judging, that's when like, TechCrunch kind of brings around like some like celebs and stuff. They'll come by. Evan Spiegel drops by Ooh. Oh, and he uh, he comes up to our, our, our booth and um, he's like, oh, so what does, what does this all do? And you know, he takes an interest in it cuz the underpinnings of, of AR interacting with the.[00:13:33] And, uh, he is kinda like, you know, I could use this to like cheat on chess with my friends. And we're like, well, you know, that wasn't exactly the, the thesis of why we made it, but glad that, uh, at least you think it's kind of neat. Um, wait, but he already started Snapchat by then? Oh, yeah. Oh yeah. This, this is 2019, I think.[00:13:49] Oh, okay, okay. Yeah, he was kind of just checking out things that were new and, and judging didn't end up winning any, um, awards within Disrupt, but I think what we won was actually. Maybe more important maybe like the, the quote, like the co-founders medal along the way. Yep. The friends we made along the way there we go to, to play to the meme.[00:14:06] I would've preferred to win, to be clear. Yes. You played a win. So you did win, uh,[00:14:11] $15,000 from some Des Moines, uh, con[00:14:14] contest. Yeah. Yeah. The, uh, that was nice. Yeah. Slightly after that we did, we did win. Um, some, some grants and some other things for some of the work that we've been doing. John Papa John supporting the, uh, the local tech scene.[00:14:24] Yeah. Well, so there's not the one you're thinking of. Okay. Uh, there's a guy whose name is Papa John, like that's his, that's his, that's his last name. His first name is John. So it's not the Papa John's you're thinking of that has some problematic undertones. It's like this guy who's totally different. I feel bad for him.[00:14:38] His press must just be like, oh, uh, all over the place. But yeah, he's this figure in the Iowa entrepreneurial scene who, um, he actually was like doing SPACs before they were cool and these sorts of things, but yeah, he funds like grants that encourage entrepreneurship in the state. And since we'd done YC and in the state, we were eligible for some of the awards that they were providing.[00:14:56] But yeah, it was disrupt that we realized, you know, um, the tools that we made, you know, it took us better part of a summer to add Boggle support and it took us 48 hours to add chest support. So adding the ability for programmable interfaces for any object, we built a lot of those internal tools and our apps were kind of doing like the very famous shark fin where like it picks up really fast, then it kind of like slowly peters off.[00:15:20] Mm-hmm. And so we're like, okay, if we're getting these like shark fin graphs, we gotta try something different. Um, there's something different. I remember like the week before Thanksgiving 2019 sitting down and we wrote this Readme for, actually it's still the Readme at the base repo of Robo Flow today has spent relatively unedited of the manifesto.[00:15:36] Like, we're gonna build tools that enable people to make the world programmable. And there's like six phases and, you know, there's still, uh, many, many, many phases to go into what we wrote even at that time to, to present. But it's largely been, um, right in line with what we thought we would, we would do, which is give engineers the tools to add software to real world objects, which is largely predicated on computer vision. So finding the right images, getting the right sorts of video frames, maybe annotating them, uh, finding the right sort of models to use to do this, monitoring the performance, all these sorts of things. And that from, I mean, we released that in early 2020, and it's kind of, that's what's really started to click.[00:16:12] Why Computer Vision[00:16:12] Awesome. I think we should just kind[00:16:13] of[00:16:14] go right into where you are today and like the, the products that you offer, just just to give people an overview and then we can go into the, the SAM stuff. So what is the clear, concise elevator pitch? I think you mentioned a bunch of things like make the world programmable so you don't ha like computer vision is a means to an end.[00:16:30] Like there's, there's something beyond that. Yeah.[00:16:32] I mean, the, the big picture mission for the business and the company and what we're working on is, is making the world programmable, making it read and write and interactive, kind of more entertaining, more e. More fun and computer vision is the technology by which we can achieve that pretty quickly.[00:16:48] So like the one liner for the, the product in, in the company is providing engineers with the tools for data and models to build programmable interfaces. Um, and that can be workflows, that could be the, uh, data processing, it could be the actual model training. But yeah, Rob helps you use production ready computer vision workflows fast.[00:17:10] And I like that.[00:17:11] In part of your other pitch that I've heard, uh, is that you basically scale from the very smallest scales to the very largest scales, right? Like the sort of microbiology use case all the way to[00:17:20] astronomy. Yeah. Yeah. The, the joke that I like to make is like anything, um, underneath a microscope and, and through a telescope and everything in between needs to, needs to be seen.[00:17:27] I mean, we have people that run models in outer space, uh, underwater remote places under supervision and, and known places. The crazy thing is that like, All parts of, of not just the world, but the universe need to be observed and understood and acted upon. So vision is gonna be, I dunno, I feel like we're in the very, very, very beginnings of all the ways we're gonna see it.[00:17:50] Computer Vision Use Cases[00:17:50] Awesome. Let's go into a lo a few like top use cases, cuz I think that really helps to like highlight the big names that you've, big logos that you've already got. I've got Walmart and Cardinal Health, but I don't, I don't know if you wanna pull out any other names, like, just to illustrate, because the reason by the way, the reason I think that a lot of developers don't get into computer vision is because they think they don't need it.[00:18:11] Um, or they think like, oh, like when I do robotics, I'll do it. But I think if, if you see like the breadth of use cases, then you get a little bit more inspiration as to like, oh, I can use[00:18:19] CVS lfa. Yeah. It's kind of like, um, you know, by giving, by making it be so straightforward to use vision, it becomes almost like a given that it's a set of features that you could power on top of it.[00:18:32] And like you mentioned, there's, yeah, there's Fortune One there over half the Fortune 100. I've used the, the tools that Robel provides just as much as 250,000 developers. And so over a quarter million engineers finding and developing and creating various apps, and I mean, those apps are, are, are far and wide.[00:18:49] Just as you mentioned. I mean everything from say, like, one I like to talk about was like sushi detection of like finding the like right sorts of fish and ingredients that are in a given piece of, of sushi that you're looking at to say like roof estimation of like finding. If there's like, uh, hail damage on, on a given roof, of course, self-driving cars and understanding the scenes around us is sort of the, you know, very early computer vision everywhere.[00:19:13] Use case hardhat detection, like finding out if like a given workplace is, is, is safe, uh, disseminate, have the right p p p on or p p e on, are there the right distance from various machines? A huge place that vision has been used is environmental monitoring. Uh, what's the count of species? Can we verify that the environment's not changing in unexpected ways or like river banks are become, uh, becoming recessed in ways that we anticipate from satellite imagery, plant phenotyping.[00:19:37] I mean, people have used these apps for like understanding their plants and identifying them. And that dataset that's actually largely open, which is what's given a proliferation to the iNaturalist, is, is that whole, uh, hub of, of products. Lots of, um, people that do manufacturing. So, like Rivian for example, is a Rubal customer, and you know, they're trying to scale from 1000 cars to 25,000 cars to a hundred thousand cars in very short order.[00:20:00] And that relies on having the. Ability to visually ensure that every part that they're making is produced correctly and right in time. Medical use cases. You know, there's actually, this morning I was emailing with a user who's accelerating early cancer detection through breaking apart various parts of cells and doing counts of those cells.[00:20:23] And actually a lot of wet lab work that folks that are doing their PhDs or have done their PhDs are deeply familiar with that is often required to do very manually of, of counting, uh, micro plasms or, or things like this. There's. All sorts of, um, like traffic counting and smart cities use cases of understanding curb utilization to which sort of vehicles are, are present.[00:20:44] Uh, ooh. That can be[00:20:46] really good for city planning actually.[00:20:47] Yeah. I mean, one of our customers does exactly this. They, they measure and do they call it like smart curb utilization, where uhhuh, they wanna basically make a curb be almost like a dynamic space where like during these amounts of time, it's zoned for this during these amounts of times.[00:20:59] It's zoned for this based on the flows and e ebbs and flows of traffic throughout the day. So yeah, I mean the, the, the truth is that like, you're right, it's like a developer might be like, oh, how would I use vision? And then all of a sudden it's like, oh man, all these things are at my fingertips. Like I can just, everything you can see.[00:21:13] Yeah. Right. I can just, I can just add functionality for my app to understand and ingest the way, like, and usually the way that someone gets like almost nerd sniped into this is like, they have like a home automation project, so it's like send Yeah. Give us a few. Yeah. So send me a text when, um, a package shows up so I can like prevent package theft so I can like go down and grab it right away or.[00:21:29] We had a, uh, this one's pretty, pretty niche, but it's pretty funny. There was this guy who, during the pandemic wa, wanted to make sure his cat had like the proper, uh, workout. And so I've shared the story where he basically decided that. He'd make a cat workout machine with computer vision, you might be alone.[00:21:43] You're like, what does that look like? Well, what he decided was he would take a robotic arm strap, a laser pointer to it, and then train a machine to recognize his cat and his cat only, and point the laser pointer consistently 10 feet away from the cat. There's actually a video of you if you type an YouTube cat laser turret, you'll find Dave's video.[00:22:01] Uh, and hopefully Dave's cat has, has lost the weight that it needs to, cuz that's just the, that's an intense workout I have to say. But yeah, so like, that's like a, um, you know, these, uh, home automation projects are pretty common places for people to get into smart bird feeders. I've seen people that like are, are logging and understanding what sort of birds are, uh, in their background.[00:22:18] There's a member of our team that was working on actually this as, as a whole company and has open sourced a lot of the data for doing bird species identification. And now there's, I think there's even a company that's, uh, founded to create like a smart bird feeder, like captures photos and tells you which ones you've attracted to your yard.[00:22:32] I met that. Do, you know, get around the, uh, car sharing company that heard it? Them never used them. They did a SPAC last year and they had raised at like, They're unicorn. They raised at like 1.2 billion, I think in the, the prior round and inspected a similar price. I met the CTO of, of Getaround because he was, uh, using Rob Flow to hack into his Tesla cameras to identify other vehicles that are like often nearby him.[00:22:56] So he's basically building his own custom license plate recognition, and he just wanted like, keep, like, keep tabs of like, when he drives by his friends or when he sees like regular sorts of folks. And so he was doing like automated license plate recognition by tapping into his, uh, camera feeds. And by the way, Elliot's like one of the like OG hackers, he was, I think one of the very first people to like, um, she break iPhones and, and these sorts of things.[00:23:14] Mm-hmm. So yeah, the project that I want, uh, that I'm gonna work on right now for my new place in San Francisco is. There's two doors. There's like a gate and then the other door. And sometimes we like forget to close, close the gate. So like, basically if it sees that the gate is open, it'll like send us all a text or something like this to make sure that the gate is, is closed at the front of our house.[00:23:32] That's[00:23:32] really cool. And I'll, I'll call out one thing that readers and listeners can, uh, read out on, on your history. One of your most popular initial, um, viral blog post was about, um, autonomous vehicle data sets and how, uh, the one that Udacity was using was missing like one third of humans. And, uh, it's not, it's pretty problematic for cars to miss humans.[00:23:53] Yeah, yeah, actually, so yeah, the Udacity self-driving car data set, which look to their credit, it was just meant to be used for, for academic use. Um, and like as a part of courses on, on Udacity, right? Yeah. But the, the team that released it, kind of hastily labeled and let it go out there to just start to use and train some models.[00:24:11] I think that likely some, some, uh, maybe commercial use cases maybe may have come and, and used, uh, the dataset, who's to say? But Brad and I discovered this dataset. And when we were working on dataset improvement tools at Rob Flow, we ran through our tools and identified some like pretty, as you mentioned, key issues.[00:24:26] Like for example, a lot of strollers weren't labeled and I hope our self-driving cars do those, these sorts of things. And so we relabeled the whole dataset by hand. I have this very fond memory is February, 2020. Brad and I are in Taiwan. So like Covid is actually just, just getting going. And the reason we were there is we were like, Hey, we can work on this from anywhere for a little bit.[00:24:44] And so we spent like a, uh, let's go closer to Covid. Well, you know, I like to say we uh, we got early indicators of, uh, how bad it was gonna be. I bought a bunch of like N 90 fives before going o I remember going to the, the like buying a bunch of N 95 s and getting this craziest look like this like crazy tin hat guy.[00:25:04] Wow. What is he doing? And then here's how you knew. I, I also got got by how bad it was gonna be. I left all of them in Taiwan cuz it's like, oh, you all need these. We'll be fine over in the us. And then come to find out, of course that Taiwan was a lot better in terms of, um, I think, yeah. Safety. But anyway, we were in Taiwan because we had planned this trip and you know, at the time we weren't super sure about the, uh, covid, these sorts of things.[00:25:22] We always canceled it. We didn't, but I have this, this very specific time. Brad and I were riding on the train from Clay back to Taipei. It's like a four hour ride. And you mentioned Pioneer earlier, we were competing in Pioneer, which is almost like a gamified to-do list. Mm-hmm. Every week you say what you're gonna do and then other people evaluate.[00:25:37] Did you actually do the things you said you were going to do? One of the things we said we were gonna do was like this, I think re-release of this data set. And so it's like late, we'd had a whole week, like, you know, weekend behind us and, uh, we're on this train and it was very unpleasant situation, but we relabeled this, this data set, and one sitting got it submitted before like the Sunday, Sunday countdown clock starts voting for, for.[00:25:57] And, um, once that data got out back out there, just as you mentioned, it kind of picked up and Venture beat, um, noticed and wrote some stories about it. And we really rereleased of course, the data set that we did our best job of labeling. And now if anyone's listening, they can probably go out and like find some errors that we surely still have and maybe call us out and, you know, put us, put us on blast.[00:26:15] The Economics of Annotation (Segmentation)[00:26:15] But,[00:26:16] um, well, well the reason I like this story is because it, it draws attention to the idea that annotation is difficult and basically anyone looking to use computer vision in their business who may not have an off-the-shelf data set is going to have to get involved in annotation. And I don't know what it costs.[00:26:34] And that's probably one of the biggest hurdles for me to estimate how big a task this is. Right? So my question at a higher level is tell the customers, how do you tell customers to estimate the economics of annotation? Like how many images do, do we need? How much, how long is it gonna take? That, that kinda stuff.[00:26:50] How much money and then what are the nuances to doing it well, right? Like, cuz obviously Udacity had a poor quality job, you guys had proved it, and there's errors every everywhere. Like where do[00:26:59] these things go wrong? The really good news about annotation in general is that like annotation of course is a means to an end to have a model be able to recognize a thing.[00:27:08] Increasingly there's models that are coming out that can recognize things zero shot without any annotation, which we're gonna talk about. Yeah. Which, we'll, we'll talk more about that in a moment. But in general, the good news is that like the trend is that annotation is gonna become decreasingly a blocker to starting to use computer vision in meaningful ways.[00:27:24] Now that said, just as you mentioned, there's a lot of places where you still need to do. Annotation. I mean, even with these zero shot models, they might have of blind spots, or maybe you're a business, as you mentioned, that you know, it's proprietary data. Like only Rivian knows what a rivian is supposed to look like, right?[00:27:39] Uh, at the time of, at the time of it being produced, like underneath the hood and, and all these sorts of things. And so, yeah, that's gonna necessarily require annotation. So your question of how long is it gonna take, how do you estimate these sorts of things, it really comes down to the complexity of the problem that you're solving and the amount of variance in the scene.[00:27:57] So let's give some contextual examples. If you're trying to recognize, we'll say a scratch on one specific part and you have very strong lighting. You might need fewer images because you control the lighting, you know the exact part and maybe you're lucky in the scratch. Happens more often than not in similar parts or similar, uh, portions of the given part.[00:28:17] So in that context, you, you, the function of variance, the variance is, is, is lower. So the number of images you need is also lower to start getting up to work. Now the orders of magnitude we're talking about is that like you can have an initial like working model from like 30 to 50 images. Yeah. In this context, which is shockingly low.[00:28:32] Like I feel like there's kind of an open secret in computer vision now, the general heuristic that often. Users, is that like, you know, maybe 200 images per class is when you start to have a model that you can rely[00:28:45] on? Rely meaning like 90, 99, 90, 90%, um,[00:28:50] uh, like what's 85 plus 85? Okay. Um, that's good. Again, these are very, very finger in the wind estimates cuz the variance we're talking about.[00:28:59] But the real question is like, at what point, like the framing is not like at what point do it get to 99, right? The framing is at what point can I use this thing to be better than the alternative, which is humans, which maybe humans or maybe like this problem wasn't possible at all. And so usually the question isn't like, how do I get to 99?[00:29:15] A hundred percent? It's how do I ensure that like the value I am able to get from putting this thing in production is greater than the alternative? In fact, even if you have a model that's less accurate than humans, there might be some circumstances where you can tolerate, uh, a greater amount of inaccuracy.[00:29:32] And if you look at the accuracy relative to the cost, Using a model is extremely cheap. Using a human for the same sort of task can be very expensive. Now, in terms of the actual accuracy of of what you get, there's probably some point at which the cost, but relative accuracy exceeds of a model, exceeds the high cost and hopefully high accuracy of, of a human comparable, like for example, there's like cameras that will track soccer balls or track events happening during sporting matches.[00:30:02] And you can go through and you know, we actually have users that work in sports analytics. You can go through and have a human. Hours and hours of footage. Cuz not just watching their team, they're watching every other team, they're watching scouting teams, they're watching junior teams, they're watching competitors.[00:30:15] And you could have them like, you know, track and follow every single time the ball goes within blank region of the field or every time blank player goes into, uh, this portion of the field. And you could have, you know, exact, like a hundred percent accuracy if that person, maybe, maybe not a hundred, a human may be like 95, 90 7% accuracy of every single time the ball is in this region or this player is on the field.[00:30:36] Truthfully, maybe if you're scouting analytics, you actually don't need 97% accuracy of knowing that that player is on the field. And in fact, if you can just have a model run at a 1000th, a 10000th of the cost and goes through and finds all the times that Messi was present on the field mm-hmm. That the ball was in this region of the.[00:30:54] Then even if that model is slightly less accurate, the cost is just so orders of magnitude different. And the stakes like the stakes of this problem, of knowing like the total number of minutes that Messi played will say are such that we have a higher air tolerance, that it's a no-brainer to start to use Yeah, a computer vision model in this context.[00:31:12] So not every problem requires equivalent or greater human performance. Even when it does, you'd be surprised at how fast models get there. And in the times when you, uh, really look at a problem, the question is, how much accuracy do I need to start to get value from this? This thing, like the package example is a great one, right?[00:31:27] Like I could in theory set up a camera that's constantly watching in front of my porch and I could watch the camera whenever I have a package and then go down. But of course, I'm not gonna do that. I value my time to do other sorts of things instead. And so like there, there's this net new capability of, oh, great, I can have an always on thing that tells me when a package shows up, even if you know the, the thing that's gonna text me.[00:31:46] When a package shows up, let's say a flat pack shows up instead of a box and it doesn't know what a flat pack likes, looks like initially. Doesn't matter. It doesn't matter because I didn't have this capability at all before. And I think that's the true case where a lot of computer vision problems exist is like it.[00:32:00] It's like you didn't even have this capability, this superpower before at all, let alone assigning a given human to do the task. And that's where we see like this explosion of, of value.[00:32:10] Awesome. Awesome. That was a really good overview. I want to leave time for the others, but I, I really want to dive into a couple more things with regards to Robo Flow.[00:32:17] Computer Vision Annotation Formats[00:32:17] So one is, apparently your original pitch for Robo Flow was with regards to conversion tools for computer vision data sets. And I'm sure as, as a result of your job, you have a lot of rants. I've been digging for rants basically on like the best or the worst annotation formats. What do we know? Cause most of us, oh my gosh, we only know, like, you know, I like,[00:32:38] okay, so when we talk about computer vision annotation formats, what we're talking about is if you have an image and you, you picture a boing box around my face on that image.[00:32:46] Yeah. How do you describe where that Monty box is? X, Y, Z X Y coordinates. Okay. X, y coordinates. How, what do you mean from the top lefts.[00:32:52] Okay. You, you, you, you take X and Y and then, and then the. The length and, and the width of the, the[00:32:58] box. Okay. So you got like a top left coordinate and like the bottom right coordinate or like the, the center of the bottom.[00:33:02] Yeah. Yeah. Top, left, bottom right. Yeah. That's one type of format. Okay. But then, um, I come along and I'm like, you know what? I want to do a different format where I wanna just put the center of the box, right. And give the length and width. Right. And by the way, we didn't even talk about what X and Y we're talking about.[00:33:14] Is X a pixel count? Is a relative pixel count? Is it an absolute pixel count? So the point is, the number of ways to describe where a box lives in a freaking image is endless, uh, seemingly and. Everyone decided to kind of create their own different ways of describing the coordinates and positions of where in this context of bounding Box is present.[00:33:39] Uh, so there's some formats, for example, that like use re, so for the x and y, like Y is, uh, like the left, most part of the image is zero. And the right most part of the image is one. So the, the coordinate is like anywhere from zero to one. So 0.6 is, you know, 60% of your way right up the image to describe the coordinate.[00:33:53] I guess that was, that was X instead of Y. But the point is there, of the zero to one is the way that we determined where that was in the position, or we're gonna do an absolute pixel position anyway. We got sick, we got sick of all these different annotation formats. So why do you even have to convert between formats?[00:34:07] Is is another part of this, this story. So different training frameworks, like if you're using TensorFlow, you need like TF Records. If you're using PyTorch, it's probably gonna be, well it depends on like what model you're using, but someone might use Coco JSON with PyTorch. Someone else might use like a, just a YAML file and a text file.[00:34:21] And to describe the cor it's point is everyone that creates a model. Or creates a dataset rather, has created different ways of describing where and how a bounding box is present in the image. And we got sick of all these different formats and doing these in writing all these different converter scripts.[00:34:39] And so we made a tool that just converts from one script, one type of format to another. And the, the key thing is that like if you get that converter script wrong, your model doesn't not work. It just fails silently. Yeah. Because the bounding boxes are now all in the wrong places. And so you need a way to visualize and be sure that your converter script, blah, blah blah.[00:34:54] So that was the very first tool we released of robo. It was just a converter script, you know, like these, like these PDF to word converters that you find. It was basically that for computer vision, like dead simple, really annoying thing. And we put it out there and people found some, some value in, in that.[00:35:08] And you know, to this day that's still like a surprisingly painful[00:35:11] problem. Um, yeah, so you and I met at the Dall-E Hackathon at OpenAI, and we were, I was trying to implement this like face masking thing, and I immediately ran into that problem because, um, you know, the, the parameters that Dall-E expected were different from the one that I got from my face, uh, facial detection thing.[00:35:28] One day it'll go away, but that day is not today. Uh, the worst format that we work with is, is. The mart form, it just makes no sense. And it's like, I think, I think it's a one off annotation format that this university in China started to use to describe where annotations exist in a book mart. I, I don't know, I dunno why that So best[00:35:45] would be TF record or some something similar.[00:35:48] Yeah, I think like, here's your chance to like tell everybody to use one one standard and like, let's, let's, can[00:35:53] I just tell them to use, we have a package that does this for you. I'm just gonna tell you to use the row full package that converts them all, uh, for you. So you don't have to think about this. I mean, Coco JSON is pretty good.[00:36:04] It's like one of the larger industry norms and you know, it's in JS O compared to like V xml, which is an XML format and Coco json is pretty descriptive, but you know, it has, has its own sort of drawbacks and flaws and has random like, attribute, I dunno. Um, yeah, I think the best way to handle this problem is to not have to think about it, which is what we did.[00:36:21] We just created a, uh, library that, that converts and uses things. Uh, for us. We've double checked the heck out of it. There's been hundreds of thousands of people that have used the library and battle tested all these different formats to find those silent errors. So I feel pretty good about no longer having to have a favorite format and instead just rely on.[00:36:38] Dot load in the format that I need. Great[00:36:41] Intro to Computer Vision Segmentation[00:36:41] service to the community. Yeah. Let's go into segmentation because is at the top of everyone's minds, but before we get into segment, anything, I feel like we need a little bit of context on the state-of-the-art prior to Sam, which seems to be YOLO and uh, you are the leading expert as far as I know.[00:36:56] Yeah.[00:36:57] Computer vision, there's various task types. There's classification problems where we just like assign tags to images, like, you know, maybe safe work, not safe work, sort of tagging sort of stuff. Or we have object detection, which are the boing boxes that you see and all the formats I was mentioning in ranting about there's instant segmentation, which is the polygon shapes and produces really, really good looking demos.[00:37:19] So a lot of people like instant segmentation.[00:37:21] This would be like counting pills when you point 'em out on the, on the table. Yeah. So, or[00:37:25] soccer players on the field. So interestingly, um, counting you could do with bounding boxes. Okay. Cause you could just say, you know, a box around a person. Well, I could count, you know, 12 players on the field.[00:37:35] Masks are most useful. Polygons are most useful if you need very precise area measurements. So you have an aerial photo of a home and you want to know, and the home's not a perfect box, and you want to know the rough square footage of that home. Well, if you know the distance between like the drone and, and the ground.[00:37:53] And you have the precise polygon shape of the home, then you can calculate how big that home is from aerial photos. And then insurers can, you know, provide say accurate estimates and that's maybe why this is useful. So polygons and, and instant segmentation are, are those types of tasks? There's a key point detection task and key point is, you know, if you've seen those demos of like all the joints on like a hand kind of, kind of outlined, there's visual question answering tasks, visual q and a.[00:38:21] And that's like, you know, some of the stuff that multi-modality is absolutely crushing for, you know, here's an image, tell me what food is in this image. And then you can pass that and you can make a recipe out of it. But like, um, yeah, the visual question in answering task type is where multi-modality is gonna have and is already having an enormous impact.[00:38:40] So that's not a comprehensive survey, very problem type, but it's enough to, to go into why SAM is significant. So these various task types, you know, which model to use for which given circumstance. Most things is highly dependent on what you're ultimately aiming to do. Like if you need to run a model on the edge, you're gonna need a smaller model, cuz it is gonna run on edge, compute and process in, in, in real time.[00:39:01] If you're gonna run a model on the cloud, then of course you, uh, generally have more compute at your disposal Considerations like this now, uh,[00:39:08] YOLO[00:39:08] just to pause. Yeah. Do you have to explain YOLO first before you go to Sam, or[00:39:11] Yeah, yeah, sure. So, yeah. Yeah, we should. So object detection world. So for a while I talked about various different task types and you can kinda think about a slide scale of like classification, then obvious detection.[00:39:20] And on the right, at most point you have like segmentation tasks. Object detection. The bounding boxes is especially useful for a wide, like it's, it's surprisingly versatile. Whereas like classification is kind of brittle. Like you only have a tag for the whole image. Well, that doesn't, you can't count things with tags.[00:39:35] And on the other hand, like the mask side of things, like drawing masks is painstaking. And so like labeling is just a bit more difficult. Plus like the processing to produce masks requires more compute. And so usually a lot of folks kind of landed for a long time on obvious detection being a really happy medium of affording you with rich capabilities because you can do things like count, track, measure.[00:39:56] In some CAGR context with bounding boxes, you can see how many things are present. You can actually get a sense of how fast something's moving by tracking the object or bounding box across multiple frames and comparing the timestamp of where it was across those frames. So obviously detection is a very common task type that solves lots of things that you want do with a given model.[00:40:15] In obviously detection. There's been various model frameworks over time. So kind of really early on there's like R-CNN uh, then there's faster rc n n and these sorts of family models, which are based on like resnet kind of architectures. And then a big thing happens, and that is single shot detectors. So faster, rc n n despite its name is, is very slow cuz it takes two passes on the image.[00:40:37] Uh, the first pass is, it finds par pixels in the image that are most interesting to, uh, create a bounding box candidate out of. And then it passes that to a, a classifier that then does classification of the bounding box of interest. Right. Yeah. You can see, you can see why that would be slow. Yeah. Cause you have to do two passes.[00:40:53] You know, kind of actually led by, uh, like mobile net was I think the first large, uh, single shot detector. And as its name implies, it was meant to be run on edge devices and mobile devices and Google released mobile net. So it's a popular implementation that you find in TensorFlow. And what single shot detectors did is they said, Hey, instead of looking at the image twice, what if we just kind of have a, a backbone that finds candidate bounding boxes?[00:41:19] And then we, we set loss functions for objectness. We set loss function. That's a real thing. We set loss functions for objectness, like how much obj, how object do this part of the images. We send a loss function for classification, and then we run the image through the model on a single pass. And that saves lots of compute time and you know, it's not necessarily as accurate, but if you have lesser compute, it can be extremely useful.[00:41:42] And then the advances in both modeling techniques in compute and data quality, single shot detectors, SSDs has become, uh, really, really popular. One of the biggest SSDs that has become really popular is the YOLO family models, as you described. And so YOLO stands for you only look once. Yeah, right, of course.[00:42:02] Uh, Drake's, uh, other album, um, so Joseph Redman introduces YOLO at the University of Washington. And Joseph Redman is, uh, kind of a, a fun guy. So for listeners, for an Easter egg, I'm gonna tell you to Google Joseph Redman resume, and you'll find, you'll find My Little Pony. That's all I'll say. And so he introduces the very first YOLO architecture, which is a single shot detector, and he also does it in a framework called Darknet, which is like this, this own framework that compiles the Cs, frankly, kind of tough to work with, but allows you to benefit from the speedups that advance when you operate in a low level language like.[00:42:36] And then he releases, well, what colloquially is known as YOLO V two, but a paper's called YOLO 9,000 cuz Joseph Redmond thought it'd be funny to have something over 9,000. So get a sense for, yeah, some fun. And then he releases, uh, YOLO V three and YOLO V three is kind of like where things really start to click because it goes from being an SSD that's very limited to competitive and, and, and superior to actually mobile That and some of these other single shot detectors, which is awesome because you have this sort of solo, I mean, him and and his advisor, Ali, at University of Washington have these, uh, models that are becoming really, really powerful and capable and competitive with these large research organizations.[00:43:09] Joseph Edmond leaves Computer Vision Research, but there had been Alexia ab, one of the maintainers of Darknet released Yola VI four. And another, uh, researcher, Glenn Yer, uh, jocker had been working on YOLO V three, but in a PyTorch implementation, cuz remember YOLO is in a dark implementation. And so then, you know, YOLO V three and then Glenn continues to make additional improvements to YOLO V three and pretty soon his improvements on Yolov theory, he's like, oh, this is kind of its own things.[00:43:36] Then he releases YOLO V five[00:43:38] with some naming[00:43:39] controversy that we don't have Big naming controversy. The, the too long didn't read on the naming controversy is because Glen was not originally involved with Darknet. How is he allowed to use the YOLO moniker? Roe got in a lot of trouble cuz we wrote a bunch of content about YOLO V five and people were like, ah, why are you naming it that we're not?[00:43:55] Um, but you know,[00:43:56] cool. But anyway, so state-of-the-art goes to v8. Is what I gather.[00:44:00] Yeah, yeah. So yeah. Yeah. You're, you're just like, okay, I got V five. I'll skip to the end. Uh, unless, unless there's something, I mean, I don't want, well, so I mean, there's some interesting things. Um, in the yolo, there's like, there's like a bunch of YOLO variants.[00:44:10] So YOLOs become this, like this, this catchall for various single shot, yeah. For various single shot, basically like runs on the edge, it's quick detection framework. And so there's, um, like YOLO R, there's YOLO S, which is a transformer based, uh, yolo, yet look like you only look at one sequence is what s stands were.[00:44:27] Um, the pp yo, which, uh, is PAT Paddle implementation, which is by, which Chinese Google is, is their implementation of, of TensorFlow, if you will. So basically YOLO has like all these variants. And now, um, yo vii, which is Glen has been working on, is now I think kind of like, uh, one of the choice models to use for single shot detection.[00:44:44] World Knowledge of Foundation Models[00:44:44] Well, I think a lot of those models, you know, Asking the first principal's question, like let's say you wanna find like a bus detector. Do you need to like go find a bunch of photos of buses or maybe like a chair detector? Do you need to go find a bunch of photos of chairs? It's like, oh no. You know, actually those images are present not only in the cocoa data set, but those are objects that exist like kind of broadly on the internet.[00:45:02] And so computer visions kind of been like us included, have been like really pushing for and encouraging models that already possess a lot of context about the world. And so, you know, if GB T's idea and i's idea OpenAI was okay, models can only understand things that are in their corpus. What if we just make their corpus the size of everything on the internet?[00:45:20] The same thing that happened in imagery, what's happening now? And that's kinda what Sam represents, which is kind of a new evolution of, earlier on we were talking about the cost of annotation and I said, well, good news. Annotations then become decreasingly necessary to start to get to value. Now you gotta think about it more, kind of like, you'll probably need to do some annotation because you might want to find a custom object, or Sam might not be perfect, but what's about to happen is a big opportunity where you want the benefits of a yolo, right?[00:45:47] Where it can run really fast, it can run on the edge, it's very cheap. But you want the knowledge of a large foundation model that already knows everything about buses and knows everything about shoes, knows everything about real, if the name is true, anything segment, anything model. And so there's gonna be this novel opportunity to take what these large models know, and I guess it's kind of like a form of distilling, like distill them down into smaller architectures that you can use in versatile ways to run in real time to run on the edge.[00:46:13] And that's now happening. And what we're seeing in actually kind of like pulling that, that future forward with, with, with Robo Flow.[00:46:21] Segment Anything Model[00:46:21] So we could talk a bit about, um, about SAM and what it represents maybe into, in relation to like these, these YOLO models. So Sam is Facebook segment Everything Model. It came out last week, um, the first week of April.[00:46:34] It has 24,000 GitHub stars at the time of, of this recording within its first week. And why, what does it do? Segment? Everything is a zero shot segmentation model. And as we're describing, creating masks is a very arduous task. Creating masks of objects that are not already represented means you have to go label a bunch of masks and then train a model and then hope that it finds those masks in new images.[00:47:00] And the promise of Segment anything is that in fact you just pass at any image and it finds all of the masks of relevant things that you might be curious about finding in a given image. And it works remarkably. Segment anything in credit to Facebook and the fair Facebook research team, they not only released the model permissive license to move things forward, they released the full data set, all 11 million images and 1.1 billion segmentation masks and three model sizes.[00:47:29] The largest ones like 2.5 gigabytes, which is not enormous. Medium ones like 1.2 and the smallest one is like 400, 3 75 megabytes. And for context,[00:47:38] for, for people listening, that's six times more than the previous alternative, which, which is apparently open images, uh, in terms of number images, and then 400 times more masks than open[00:47:47] images as well.[00:47:48] Exactly, yeah. So huge, huge order magnitude gain in terms of dataset accessibility plus like the model and how it works. And so the question becomes, okay, so like segment. What, what do I do with this? Like, what does it allow me to do? And it didn't Rob float well. Yeah, you should. Yeah. Um, it's already there.[00:48:04] You um, that part's done. Uh, but the thing that you can do with segment anything is you can almost, like, I almost think about like this, kinda like this model arbitrage where you can basically like distill down a giant model. So let's say like, like let's return to the package example. Okay. The package problem of, I wanna get a text when a package appears on my front porch before segment anything.[00:48:25] The way that I would go solve this problem is I would go collect some images of packages on my porch and I would label them, uh, with bounding boxes or maybe masks in that part. As you mentioned, it can be a long process and I would train a model. And that model it actually probably worked pretty well cause it's purpose-built.[00:48:44] The camera position, my porch, the packages I'm receiving. But that's gonna take some time, like everything that I just mentioned there is gonna take some time. Now with Segment, anything, what you can do is go take some photos of your porch. So we're, we're still, we're still getting that. And then we're asking segment anything, basically.[00:49:00] Do you see, like segment, everything you see here? And, you know, a limitation of segment anything right now is it gives you masks without labels, like text labels for those masks. So we can talk about the way to address that in a, in a moment. But the point is, it will find the package in, in your photo. And again, there might be some positions where it doesn't find the package, or sometimes thing things look a little bit differently and you're gonna have to like, fine tune or whatever.[00:49:22] But, okay, now you've got a, you've got the intelligence of a package finder. Now you wanna deploy that package. Well, you could either call the Segment Everything model api, which hosted on platforms like RoboFlow, and I'm sure other places as well. Or you could probably distill it down to a smaller model.[00:49:38] You can run on the edge, like you wanna run it maybe on like a raspberry pie that just is looking and finding, well, you can't run segment everything on a raspberry pie, but you can run a single shot detector. So you just take all the data that's been basically automatically labeled for you and then you distill it down and train in much, much more efficient, smaller model.[00:49:57] And then you deploy that model to the edge and this is sort of what's gonna be increasingly possible. By the way, this has already happened in in LLMs, right? Like for example, like GPT4 knows. A lot about a lot and people will distill it down in some ways by seeing all the, uh, like code completion will say, let's say you're building a code completion model.[00:50:16] GPT4 can do any type of completion in addition to code completion. If you want to build your own code completion model, cause that's the only task that you're worried about for the future you're building. You could R H L F on all of GPT4 s code completion examples, and then almost kind of use that as distilling down into your own version of a code completion model and almost, uh, have a cheaper, more readily available, simpler model that yes, it only does one task, but that's the only task you need.[00:50:43] And it's a model that you own and it's a model that you can. Deploy more lightly and get more value from. That's sort of what has been represented as possible with, with Segment anything. But that's just on the dataset prep side, right? Like segment anything means you can make your own background removal, you can make your own sort of video editing software.[00:50:59] You can make like any, this promise of trying to make the world be understood and, uh, viewable and programmable just got so much more accessible. Yeah,[00:51:10] that's an incredible overview. I think we should just get your takes on a couple of like, so this is a massive, massive release. There are a lot of sort of small little features that, uh, they, they spent and elaborated in the blog post and the paper.[00:51:24] So I'm gonna pull out a few things to discuss and obviously feel free to suggest anything that you really want to get off your chest.[00:51:29] SAM: Zero Shot Transfer[00:51:29] So, zero shot transfer is.[00:51:31] No. Okay. But, uh, this level of quality, yes, much better. Yeah. So you could rely on large models previously for doing zero shot, uh, detection. But as you mentioned, the scale and size of the data set and resulting model that was trained is, is so much superior.[00:51:48] And that's, uh,[00:51:49] I guess the benefit of having world, world knowledge, um, yes. And being able to rely on that. Okay.[00:51:53] SAM: Promptability[00:51:53] And then prompt model, this is new. I still don't really understand how they did[00:51:58] it. Okay. So, so Sam basically said, why don't we take these 11 million images, 1.1 billion masks, and we'll train a transformer and an image encoder on all of those images.[00:52:14] And that's basically the pre-training that we'll use for passing any candidate image through. We'll pass that through this image encoder. So that's the, um, backbone, if you will, of the model. Then the much lighter parts become, okay, so if I've got that image encoding. I need to interact and understand what's inside the image en coating.[00:52:31] And that's where the prompting comes into play. And that's where the, the mask decoder comes into play in, in the model architecture. So image comes in, it goes through the imaging coder. The image en coder is what took lots of time and resources to train and get the weights for of, of what is Sam. But at inference time, of course, you don't have to re refine those weights.[00:52:49] So image comes in, goes to the image en coder, then you have the image and bedding. And now to interact with that image and embed, that's where you're gonna be doing prompting and the decoding specifically, what comes out of, out of Sam at the image encoding step is a bunch of candidate masks. And those candidate masks are the ones that you say you want to interact with.[00:53:06] What's really cool is there's both prompts for saying like the thing that you're interested in, but then there's also, you can also say the way that you wanna pass a candidate for which mask you're interested in from Sam, is you can just like point and click and say, this is the part of the image I'm interested in.[00:53:24] SAM: Model Assisted Labeling[00:53:24] Which is exactly what, like a, a labeling interface would be, uh, useful for, as an example,[00:53:30] which they actually use to bootstrap their own annotation, it seems.[00:53:33] Exactly. Isn't that pretty cool? Yes, exactly. So this is, this is why I was mentioning earlier that like the way to solve a computer vision problem, you know, like waterfall development versus agile development.[00:53:41] Sure. The same thing, like in machine learning, uh, it took a, it took a little bit, but folks like, oh, we can do this in, in machine learning too. And the way you do it, machine learning is instead of saying, okay, waterfall, I'm gonna take all my images and label them all. Okay, I'm done with the labeling part, now I'm gonna go to the training part.[00:53:55] Okay, I'm done with that part. Now I'm gonna go to the deployment part. A much more agile look would be like, okay, if I have like 10,000 images, let's label the first like hundred and just see what we get and we'll train a model and now we're gonna use that model that we trained to help us label the next thousand images.[00:54:10] And then we're gonna do this on repeat. That's exactly what the SAM team did. Yeah. They first did assisted man, they call it assisted manual. Manual, yeah.[00:54:15] Yep. Yeah. Where, which is uh, 4.3 million mass from 120,000 images.[00:54:19] Exactly. And then semi-automatic, which[00:54:22] is 5.9 million mass and 180,000[00:54:24] images. And in that step, they were basically having the human annotators point out where Sam may have missed a mask and then they did fully auto, which[00:54:32] is the whole thing.[00:54:33] Yes. 11 million images and 1.1[00:54:35] billion mask. And that's where they said, Sam, do your thing and predict all the mask. We won't[00:54:39] even, we won't even judge. Yeah. We just[00:54:41] close our eyes, which is what people are suspecting is happening for training G P T five. Right. Is that we're creating a bunch of candidate task text from G P T four to use in training the, the next g PT five.[00:54:52] So, but by the way, that process, like, you don't have to be a Facebook to take advantage of that. Like That's exactly what, like people building with Rob Flow. That's what you do.[00:54:59] Exactly. That's, this is your tool. That's the onboarding[00:55:01] that I did. That's exactly it. Is that like, okay, like you've got a bunch of images, but just label a few of them first.[00:55:07] Now you've got a, I almost think about it like a, you know, co-pilot is the term now, but I almost, I used to describe it as like a, an army of interns, otherwise known as AI that works alongside you. To have a first guess at labeling images for you, and then you're just kinda like supervising and improving and doing better.[00:55:23] And that relationship is a lot more efficient, a lot more effective. And by the way, by doing it this way, you don't waste a bunch of time labeling images. Like, again, we label images and pursuit of making sure our model learns something. We don't label images to label images, which means if we can label the right images defined by which images most help our model learn things next we should.[00:55:45] So we should look and see where's our model most likely to fail, and then spend our time labeling those images. And that's, that's sort of the tooling that, that we work on, making that exact loop faster and easier. Yeah. Yeah.[00:55:54] I highly recommend everyone try it. It's takes a few minutes. It's, it's great.[00:55:58] It's great. Is there anything else in, in Sam that, Sam specifically that you wanna go over? Or do you wanna go to Robot[00:56:03] SAM doesn't have labels[00:56:03] Full plus Sam? I mentioned one key thing about Sam that it doesn't do, and that is it doesn't outta the box give you labels for your masks. Now the paper. Alludes to the researchers attempting to get that part figured out.[00:56:18] And I think that they will, I think that they were like, we're just gonna publish this first part of just doing all the masks. Cuz that alone is like incredibly transformative for what's possible in, in computer vision. But in the interim, what is happening is people stitching together different models to name those masks, right?[00:56:35] So imagine that you go to Sam and you say, here's an image, and then Sam makes perfect masks of everything in the image. Now you need to know what are these masks, what objects are in these masks? Isn't it[00:56:45] funny that Sam doesn't know because you, you just said it knows[00:56:48] everything. Yeah, it knows it's weird.[00:56:50] It knows all the candidate masks. And that's, that's because that was the function that it was Yeah. Dream for. Yeah. Right, right. Okay. But again, like this is, this is what's going, like this is exactly what multi-modality is going to have happen anyway. You solved it. Yeah. So, yeah, so, so there's a couple different solutions.[00:57:04] I mean, this is where it's. You're begging the question of like, what are you trying to do with Sam? Like if you wanna do Sam, and then you wanna distill it down to deploy a more purpose-built task-specific, faster, cheaper model that you own. Yeah. That's commonly, I think what's gonna happen. So in that context, you're using SAM to accelerate your labeling.[00:57:21] Another way you might wanna use Sam is just in prod outta the box. Like, Sam is gonna produce good candidate labels and I don't need to fine tune anything and I just wanna like, use that as is. Well, in both of these contexts, we need to know the names of the masks that Sam is finding, right? Because like, if we're using Sam to label our stuff, well, telling us the mask isn't so helpful.[00:57:39] Like, in my image of packages, it's like, did you label the door? Did you label the package? I, I need to know what this mask is. There's an[00:57:45] objects nest there. Yeah. That, uh, that we can tell.[00:57:49] Yeah. And so you can use Sam in combination with other models. And pretty soon this is gonna be a single model. Like this podcast is gonna gonna like, I'll make a bold prediction in 30 days.[00:57:59] Like someone will do it, someone will do it in a single model, but with two models. So there's a model, for example, called Grounding DINO. Mm-hmm. Which is zero. Bounding box prediction. Mm-hmm. And with labels, and you interact with Grounding DINO through text prompts. So you could say like, here's an image.[00:58:14] You know, you and I are seated here in the studio. There's cans in front of us. You could say, give me the left can, and it would label bounding box only around the can on the left, like it understands text in that way. So you could use the masks from Sam and then ask Grounding DINO, what are these things?[00:58:29] Or where is X in between the combination of those two things? Boom, you have an automatic working text description of the things that you have in mind. Now again, this isn't perfect, like there will be places that still require human in loop review, and especially like on the novelty of a data set. These things will be be dependent.[00:58:49] But the point is, yes, there's places to improve and yes, you're gonna need to use tooling to do those improvements. The point is like we're starting so far ahead in our process. We're no longer starting at just like, I've got some images, what do I do? We're starting at, I've got some images and candidate descriptions of what's in those images.[00:59:04] How do I now. Mesh these two things together to understand precisely what I want to know from these images. And then deploy this thing because that's where you ultimately capture the value, is deploying this thing and, and envision a lot of that means on the edge because you have things running out in fields where people aren't.[00:59:21] Um, and that usually means constrained compute,[00:59:23] Labeling on the Browser[00:59:23] part of the demo of segment. Anything runs in the browser as well, which is interesting to some people. I I'm not sure how what percent of it was done.[00:59:30] That's what's fascinating. Um, because, and the reason it can do that, right, is because again, the giant image encoder, so remember the steps?[00:59:36] Yeah. It takes an image, the image encoder, and then you prompt from that image encoder. The image en coder is a large model and you need a spun up GPU to run the ongoing encoding that requires meaningful compute. Yeah. But the prompting can run in the browser. It's that lightweight, which means you can provide really fast feedback.[00:59:54] And that's exactly what we did at Robo Flow is we. Sam, and we made it be the world's best labeling tool. Like you can click on anything and Sam immediately says, this is what you wanted. The thing that you wanted to label is in these, this pixel coordinates area. And to be clear, we already had like this like kind of, we call it smart poly, like this thing that, like you could click and it would make regions of, of guesses of interest.[01:00:18] Sam is just such a stepwise improvement that will show, I mean, things that used to take maybe five or six clicks, you can, Sam immediately understands in one click. In one click.[01:00:28] Roboflow +SAM Video Demo[01:00:28] Cool. I, I think we might search over to the, uh, demo, but yeah, I think this is the, the time that we switch to a multimodal podcast and, uh, have a first screen share.[01:00:38] Amazing. So I'll semi nari what's, uh, what's going on, but, uh, we are checking out Joseph's screen and this is the interface of Robo flow. We have, we have Robo Flow before Sam and we have Robo Post Sam, and we're gonna see what, uh, the quality[01:00:53] difference is. Okay, so here is, uh, an image where we have a given weld that we're interested in segmenting this portion of the weld where these two pipes come together.[01:01:06] Yeah. And the weld is highly[01:01:06] irregular. It's kind of like curved in, in both in three dimensions. So it's just not a typical easily segmentable[01:01:13] thing. Yeah. To the human eye. Like pic eye could figure out, you know, probably where this weld starts and stops. But that's gonna take a lot of clicks. Certainly.[01:01:21] Like we could go through and like, we could, you know, this would be like the really old fashioned way of like creating, apparently[01:01:27] this is how they did, uh, lightsabers, that you had to like, mask out lightsabers and then use of the sub in on the, the lights. And you did it for every. So just really super expensive cuz they didn't have any other options.[01:01:39] Wow. And now it's one click in runway.[01:01:41] Wow. Wow. Okay. So open call for someone to make a light saber simulator using Robo Flow. That's awesome. You haven't had one? Not a, I'm aware. Okay. Oh my God, that's a great idea. Yeah. Yeah. Alright. Okay. So we, so that's, that's the very old fashion way now inside Robo Flow, like, uh, before Sam, we did have this thing called Smart Poly.[01:01:58] Uh, and this will still be, still be available for, for users to use. And so if like, I'm, I'm labeling the weld area, I'd go like this. And you know, the first click I'll, I'll narrate a little bit for, for swyx, I clicked on the welded joint. And it got the welded joint, but also includes lots of irrelevant[01:02:12] area, the rest of the, the bottom pipe and then, and the parts on the right.[01:02:15] What is that picking up? Is it picking up on like just the color or is[01:02:17] it like Yeah, this specific model probably wasn't pre-trained on images of welds and pipes and so it just doesn't have a great concept. Yeah. Of what region starts and stop. Now to be clear, I'm not sol here, like part of, part of the thing with robo, I can go say, I can add positive and negative points, so I can say, no, I didn't, I didn't want this part.[01:02:33] Yeah. And so I said I don't want that bottom part of the pipe little better, and I still don't want the bottom part of the pipe. Okay. That's almost, almost there.[01:02:41] There's a lot of space on either side of the weld. Okay. All right.[01:02:43] That's better. So, so four clicks we got, we got our way to, to, you know, the, the weld here.[01:02:48] Yeah. Um, now with Sam. And so we're gonna do the same thing. I'm going to label the weld portion with a single click. It understands the context of, of that, that, that weld. Uh, I was labeling fish, so I thought I was working on fish. So that's like one Okay, that's, that's great. Of like a, a before and after.[01:03:06] But let's talk about maybe some of the other, Examples of things that I might wanna work on. I came with some fun examples. Let's do, um, so I've got this image of two kids playing when I was holding a balloon in the background. There's like a brick wall. The lighting's not great. Yeah, lighting's not fantastic, but um, you know, we can clearly make out what's going on.[01:03:25] So I'm going to click the, uh, the brick wall in the background. Sam immediately labels both sides of the brick wall, even though there is a pole separating view between the left portion of the brick wall and the right portion of the brick wall. So I can just say like, I dunno, I'll just say thing for ease.[01:03:44] Or let's say I wanna do this guy's shoe, and I'm like, actually, you know what, no, I don't want the shoe, I want the whole, uh, person so I can That's two clicks. Two clicks, and Sam immediately got it. Maybe I wanna be even more really precise and get that portion there and miss face a little bit. So we click the face and that's another thing.[01:04:02] Or let's jump to maybe this one's very[01:04:05] fun. Okay, so there's a blue, a chihuahua with a bunch of[01:04:08] balloons. Yeah. So here, let's say like I wanted to do, uh, maybe I just wanted do like the eyes, right? Uhhuh. So I'll click like the left[01:04:15] eye that makes the whole chihuahua light[01:04:17] up so it gets the whole chihuahua.[01:04:19] Now here's where interactivity with models and kind of like a new UX paradigm for interaction with models make some sense. I'm gonna say, okay, I wanted that left eye. I don't want the, like the rest of the dog. Rest of the dog. So I'm gonna say no on this part of the dog. Then I'm gonna go say I go straight to the eye.[01:04:32] Yeah. Yep. I'm gonna say yes on the other eye. Uhhuh boom. Right now you got both eyes. I got both eyes and nothing else. And I could do the same thing with the ear. So I could say like, I want the ear and I click the right ear and it gets the whole again, the whole dog head. But I could say, no, I don't want the dog head.[01:04:46] And it boom recognizes that I want only the right ear. So can[01:04:49] I[01:04:49] ask about, so obviously this is super impressive. Can I ask like, is there a way to generalize this work? Like, I did this work for one image. Can I take a another image of a, the same chihuahua and just say, do that. The, um,[01:05:02] reapply what I did to some degree.[01:05:04] There's a few ways we could do that. The, probably the simplest way is actually going back to what we were talking about where you label a few examples and then you create your own kind of mini model that understands exactly what you're after. Yeah. And then you have that mini model finish the work for you.[01:05:18] And you just do that within robot flow. You just do that within Rob flow? Of course. Yeah. So like, I've got like, so I label, I label a bunch of my images after I've got, you know, we'll say like 10 of them labeled, then I'll kick off, you know, my own custom model. And the nice thing is that like right, I'm building my own ip.[01:05:34] And that's one of the big things that like I'm pretty excited about with, uh, Motomod modality and especially with GBT and some of these things, is that like I can take what these massive models understand. This is a generalist way of saying distill, but I can distill them down into a different architecture that captures that portion of the world.[01:05:54] And use that model for, let's say in this context, I've got an image up of, uh, men kind of in front of a pier and they've got aprons on. I can build my own apron detector. Again, this is sort of like in some context, like if I wanna build a task specific model and, and Sam knows everything that it knows, I can either go the route of trying to use Sam zero shot plus another model to label the, the, the mask images that might be limiting cuz of just the compute intensity that Sam requires to run and, you know, maybe wanna build some of my own IP and make use of some of my own data.[01:06:24] But these are kinda the two routes that I think we'll see continue to evolve. And I can use text prompting with Grounding DINO plus Sam to get a sense of which portions of the image I care about. And then I'm probably gonna need to do a little bit of QA of, of that. But, Like the dataset prep process and the biggest inhibitor to creating your own value in IP just got so much simpler.[01:06:49] And I think that, um, I think we're the first ones to go live with this, so that's, yeah, I'm, I'm very thrilled about that. We're recording[01:06:54] this earlier, but it's, uh, when, when this podcast drops, it'll be live. Uh, hopefully, you know, if everything goes well, I'll coordinate with you. So, so, so it will be live?[01:07:02] No, it will, it will, it will be live, yes. Yes, yes. Uh, and people can go try it out. Exactly. I guess it'll be just be part of the Rofo platform and I, I, I assume I'll, I'll add a, a blog post to it. Anything else on just, uh, so we're, we're about to zoom out from Sam and computer vision to Easter general AI takes, but, uh, anything else in terms of like future projections of, of the, of what happens next in, in computer vision segmentation or anything in that, in that,[01:07:27] Future Predictions[01:07:27] As you were describing earlier, Sam right now only produces masks.[01:07:30] It can't be text steer to give the context of those masks that's gonna happen in a single architecture without chaining together a couple different architectures. That's, that's for sure. The second thing is, um, multimodality generally will allow us to add more context to the things that we're seeing and doing.[01:07:45] And I'm sure we'll probably talk about this in a moment, but like, that's maybe a good segue into like GPT4 Yeah. And GPT4's capabilities, what we expect, how we're excited about it, the ways that we're already using some of GPT4, and really gonna lean into the capabilities that unlocks from, from imagery and, and a visual prep perspective.[01:08:04] GPT4 Multimodality[01:08:04] Let's go into that. Great. I was watching that keynote on GPT4. I was blown away. What were your reactions as a computer vision company?[01:08:13] Similar. Similar, yeah. Apparently. Um, so Greg Brockman did that demo where he said, make a joke generator website. Apparently that was totally ad hoc, like that. Didn't practiced that at all.[01:08:22] Which, what? Yeah, he just gave it a go. Yeah. I, I think that like the. Generation of code from imagery. I think that like screenshot of a website to rack components within six months. I think stuff like that will be imminently possible, doable and just unlock all kinds of potential.[01:08:38] And then did you see the second one with the Discord screenshot that they posted in?[01:08:42] It was a very quick part of the demo, so a lot of people missed it. But essentially what Logan from opening I did was screenshotted, uh, the Discord screen he was on and then pasted it into the discord that had GPT4 read it and it was able to read every word on it. Yes.[01:08:57] I think OCR is a solved problem[01:08:59] in a large language model as opposed to like a dedicated OCR R model.[01:09:03] Yes. Isn't that that that's, we've[01:09:05] never seen that. That's right. Yeah. And I think OCR like is actually a perfect candidate for like multimodality, right, because it's literally photos of text. Yeah. Yeah. And there's already gonna be like ample training data from all the work that's been done on creating prior OCR models.[01:09:20] Right. But yeah, I think that they probably are about to release the world's best. OCR model. Full stop. Yeah. Well,[01:09:27] Remaining Hard Problems[01:09:27] so I think those were like, kind of what they wanted to show on the demo. I, you know, it's, it's news to me that the, the drawing was impromptu. What's a really hard challenge that you wanna try on GT four once you get access to it, what are you going run[01:09:38] it on?[01:09:39] So, the way I think about like, advances in computer vision and what, uh, capabilities get unlocked, where there's still gonna be problems in ensuring that we're building tooling that really unblocks people. I think that, like if you think about the types of use cases that a model already knows without any training, I think about like a bell curve distribution.[01:09:58] Where in the fat center of the curve you have, uh, what historically has been like the cocoa dataset, common objects and context, a 2014 release from Microsoft, 80 classes, things like chair, silverware, food, car. They say sports ball for all. Sports ball. Did they really? Yeah. In the dataset. Yeah.[01:10:16] That's a, that's hilarious.[01:10:18] Oh[01:10:18] my God. So, yeah. And so you've got like all these, I mean, I, I get why they do that. It's like a capture for all sports. Um, but the point is, like in the fat center, you have these things, these, these objects that are as common as possible. And I think that, and then go to the exact, like long tails of this distribution and the very, very like edge of the tails you have.[01:10:38] Data and problems that are not common or regularly seen, the prevalence of that image may be existing on the web is maybe one way to think about this. And that's where you have like maybe a manufacturer that makes their own good that no one else makes, or a logistics company that knows what their stuff were supposed to look like or maybe your specific house looks like a very notable way or a pattern or, or something like this.[01:10:59] And of course, all these problems depend on like what exactly you want to do, but there will be places where there's just proprietary information that doesn't exist on the web basically. And, um, I think of that like what's happening in vision is that fat middle is steadily expanding outward. The models that are trained on cocoa, you know, do better and better and better on like, making that middle sliver really, really confident.[01:11:23] And then models like clip, which, you know, two years ago, the first kind of multimodality approach, which robos already power like we already have clip powered search and robo and have for over a year. Which, you know, links text and images in a way we haven't seen before it. And that basically increases the generalizability of what models can see.[01:11:45] I think G p D four expands that even further, where like, you get like, even further into like, those, those long, long tails. I don't think that like completely, like, I don't think that like, we'll, like never train again, so to speak. That's kinda like my, my mental model of what's happening, what's gonna continue to happen.[01:11:59] Now that still creates emergent problems for developers. That still creates problems like, like we were talking about earlier. Even if, you know, I have a model that knows everything in the world, that model might be a not mine or it might be a model that I can't run where I need to run it. Uh, maybe a place without internet, maybe a place on the edge, maybe a place that's compute constrained.[01:12:16] So I might need to do like some distilling down. I might have data that's truly proprietary that's like not present on the web. So like I can't rely on this model. I might have a task type that these G B D four and multimodal models are extremely good at visual question answering. And I think they'll be able to describe images in kinda like a freeform text way.[01:12:34] But you're still gonna come, maybe need to massage that text into something useful and, and insightful and, and to be, to be understood. And maybe that's a place where you're like, you know, use like lang chain and things to like, uh, figure out what's going on from, from the candidates descriptions of, of text.[01:12:48] And so there's still gonna be a healthy set of problems to making this stuff be, be usable, but ways that we're thinking about at Roble that I'm very excited about. So we already used GPT4 to do like dataset description with, to be clear, just the text only. Just the text only? Yeah, just the text only.[01:13:02] We're, we're fortunate like Greg and, and Sam back us. Um, uh, but personally, personally,[01:13:06] Sam as in Altman, Sam, not the, yeah, not the model Sam, because the mo the model could be smart enough to[01:13:11] back you. I don't know. That's been a funny confusion this last week. You know? Which, which Sam, which Sam are you talking about?[01:13:15] You were talking a lot about Sam does. So, but, but we don't have, um, visual access to be clear. Text only GPT4 to do dataset description, basically passing it what we already know, like we have, Hey, I have a computer vision model with like these sorts of classes or things like this, and gimme a dataset description that enriches, enriches my dataset.[01:13:31] And then we also of course have like GPT4 powered support, like a lot of folks do of like, uh, we ingested, uh, the 480 blogs and the Ripple blog, the 120 YouTube videos, 280, the you guys, the uh, dozens of open source projects and every page in our. Uh, and our help center. And then we ingested that and now we have a GPT4 powered bot that can generate not only like code snippets, just like GPT4 can do really well, but regurgitate and site and point you to the resources across Robo Flow.[01:13:57] Ask Roboflow (2019)[01:13:57] Shout out to the og uh, robo fans. You are the first to have your own bot, which is Ask Robo Flow. I saw this at Hack News. I was like, wait, this is a harbinger of things to come. And uh,[01:14:06] in 2019, this is where the name road flow comes from. Really? We, we, yes. I was[01:14:10] thinking there's nothing imaging in your, in your, uh, description or your[01:14:13] name.[01:14:14] Yeah. Yeah. Cuz I mean, I think that, um, to build, to build a hundred years end durable company, you can't just be one thing. You gotta, you gotta do everything. You gotta, you gotta be Microsoft anyway, so, yeah, yeah, yeah. One of the first things we were doing with, um, AI in 2019 was we realized Stack Overflow is extremely valuable resource, but it's only in English and programmers come from all around the world.[01:14:33] So logically programmers are gonna be speaking various languages to wanna understand and debug their programs. So we said, with these advances in N L P, don't you think that we could translate Stack Overflow? To every single other language and provide a really useful localized stack overflow. And so we started working on that.[01:14:47] We called it Stack Robo Flow. And then, um, Josh, the founder of, uh, delicious, if you remember that, that site. Mm-hmm. Mm-hmm. He Shawn Pardo, he's like, drop, drop the stack. It's cleaner. Just, just make it be robo Flow. It's a great story.[01:14:59] Oh, love the story behind names. And[01:15:00] from from then on, it's just been, uh, Rob Flow.[01:15:02] Yeah, yeah. Um, which is, you know, been a useful name and it's, and it's stuck. But yeah, like we, I mean actually Stack Rob. Dot com is still up and you can like ask it questions. It's not nearly as good, of course. It's like it's before LLMs. Like it's, uh, but uh, yeah, ask Rob Flow was the very first, you know, programmer completion sort of, sort of guide.[01:15:21] So we've been really excited that, um, others have picked up and done a much better job with that than what we were doing.[01:15:26] How to keep up in AI[01:15:26] Yeah. You have a really sort of hacker mentality, which I love. Uh, obviously you at, at the various hack hackathons in San Francisco. Uh, and maybe we can close out with that. I know we've been running long, so, uh, I'm just gonna zoom out a little bit into the broader sort of personal or meta question about how do you keep up with ai, right?[01:15:41] Like you, you're econ grad, you went into data science, very common path. I I had a similar path as well, and I'm going down this AI journey, um, about six, seven years after you. How do you recommend people keep[01:15:51] up? The way that I do is ingest sources from probably similar places that others do of whether it's the research community is quite active on, on Twitter.[01:15:59] Regularly seen papers linked on, on archived people will be in communities, various discords or even inside the robo flow Slack. People will share papers and things that are, um, meaningful and interesting. But that's just like one part is like ingestion. Yes. Getting ingestion from friends, having like engaged in conversations and just kind of being eyes wide open to various things.[01:16:18] The second part is production. Yeah. And we can kinda like read some tweets and see some demos, but for me when Robo Flow, when Brad and I, uh, were just working on stuff very early, one of the pioneer goals that we had was published three blogs and two YouTube videos per week. And we did that for seven months.[01:16:33] So I was just nonstop producing content and that wasn't just like writing a blog. It'd usually be like, Um, you know, you, you do a blog sometimes, or you do like a, a co-lab notebook, training tutorial, or the point is you're basically like naturally re-implementing the papers and things that you're reading and as you mention you out of[01:16:49] ideas.[01:16:50] Anyway. Yeah. Gotta do something.[01:16:53] I mean, and as you mentioned, I spent some time teaching data science work Yeah. Journal assembly and actually taught a bit about gw and I really became a subscriber to the belief that if you can't describe something simply, then you probably don't understand, don't know it yourself.[01:17:05] Yeah. And so being forced to, to produce things and then Yeah. You mentioned like hackathons, like I still, still have a good hackathon, whether that's internal to our team or inside the outside in the community. And I really look up to folks like, I mean, I'm sure you've probably come across like, uh, you, you recently mentioned that you, you'd spent some time with like the notion founders and you know, they're insanely Yeah.[01:17:22] Curious and you would've. Idea of the stature of, of the business. And I think that that's like an incredibly strong ethos to, to[01:17:30] have, they're billionaires and they're having lunch with me to ask what I think[01:17:34] about I, well, yeah, I mean, I think you have an incredibly good view of what's next and what's coming up and uh, a different purview.[01:17:41] But that's exactly what I mean. Right. Like engage in other folks and legitimately asking them and wanting to glean and, and be curious. Like, I dunno, like I think about someone like Jeff Dean who made map produce and also introduced one of the first versions of TensorFlow. Yeah. Like, he just has to be so innately curious to, I don't even know if it's, if it's called reinventing yourselves at that.[01:18:00] By that time, if you've just like been. Uh, so on the, the cutting edge, but it's not like I think about like someone considering themselves, quote unquote an expert in like TensorFlow or a framework or whatever, and it's like everyone is learning. Some people are just like further ahead on their journey and you can actually catch up pretty quickly with some strong, some strong effort.[01:18:18] So I think that that's a lot of it is like being, is there's just as much the mentality as there is, like the, the resources and then like the, the production. And I mean, you kinda mentioned before we started recording like, oh, you're like the expert on these, these sorts of things. And I don't even think that that's, uh, I spend more time thinking about them than a lot of people, but there's still a ton to ingest and work on and change and improve.[01:18:41] And I think that that's actually a pretty big opportunity for, uh, young companies especially that have a, a habit of being able to move quickly and really focus on like unlocking user value rather than most other things.[01:18:53] Well, that's a perfect way to end things. Uh, thank you for being my and many other people's first introduction to computer vision in the state of the art.[01:19:01] Uh, I'm sure we'll have you back for, you know, whatever else comes, uh, along. But you are literally the perfect guest to talk segment anything, and it was by far the hottest this topic of discussion this past week. So thanks for, uh, taking the[01:19:12] time. I had a ton of fun. Thanks for having me. All right. Thank you. Get full access to Latent Space at www.latent.space/subscribe
01:19:3513/04/2023
AI Fundamentals: Benchmarks 101
We’re trying a new format, inspired by Acquired.fm! No guests, no news, just highly prepared, in-depth conversation on one topic that will level up your understanding. We aren’t experts, we are learning in public. Please let us know what we got wrong and what you think of this new format!When you ask someone to break down the basic ingredients of a Large Language Model, you’ll often hear a few things: You need lots of data. You need lots of compute. You need models with billions of parameters. Trust the Bitter Lesson, more more more, scale is all you need. Right?Nobody ever mentions the subtle influence of great benchmarking.LLM Benchmarks mark our progress in building artificial intelligences, progressing from * knowing what words go with others (1985 WordNet)* recognizing names and entities (2004 Enron Emails) * and image of numbers, letters, and clothes (1998-2017 MNIST)* language translation (2002 BLEU → 2020 XTREME)* more and more images (2009 ImageNet, CIFAR)* reasoning in sentences (2016 LAMBADA) and paragraphs (2019 AI2RC, DROP)* stringing together whole sentences (2018 GLUE and SuperGLUE)* question answering (2019 CoQA)* having common sense (2018 Swag and HellaSwag, 2019 WinoGrande)* knowledge of all human tasks and professional exams (2021 MMLU)* knowing everything (2022 BIG-Bench)People who make benchmarks are the unsung heroes of LLM research, because they dream up ever harder tests that last ever shorter periods of time.In our first AI Fundamentals episode, we take a trek through history to try to explain what we have learned about LLM Benchmarking, and what issues we have discovered with them. There are way, way too many links and references to include in this email. You can follow along the work we did for our show prep in this podcast’s accompanying repo, with all papers and selected tests pulled out.Enjoy and please let us know what other fundamentals topics you’d like us to cover!Timestamps* [00:00:21] Benchmarking Questions* [00:03:08] Why AI Benchmarks matter* [00:06:02] Introducing Benchmark Metrics* [00:08:14] Benchmarking Methodology* [00:09:45] 1985-1989: WordNet and Entailment* [00:12:44] 1998-2004 Enron Emails and MNIST* [00:14:35] 2009-14: ImageNet, CIFAR and the AlexNet Moment for Deep Learning* [00:17:42] 2018-19: GLUE and SuperGLUE - Single Sentence, Similarity and Paraphrase, Inference* [00:23:21] 2018-19: Swag and HellaSwag - Common Sense Inference* [00:26:07] Aside: How to Design Benchmarks* [00:26:51] 2021: MMLU - Human level Professional Knowledge* [00:29:39] 2021: HumanEval - Code Generation* [00:31:51] 2020: XTREME - Multilingual Benchmarks* [00:35:14] 2022: BIG-Bench - The Biggest of the Benches* [00:37:40] EDIT: Why BIG-Bench is missing from GPT4 Results* [00:38:25] Issue: GPT4 vs the mystery of the AMC10/12* [00:40:28] Issue: Data Contamination* [00:42:13] Other Issues: Benchmark Data Quality and the Iris data set* [00:45:44] Tradeoffs of Latency, Inference Cost, Throughput* [00:49:45] ConclusionTranscript[00:00:00] Hey everyone. Welcome to the Latent Space Podcast. This is Alessio, partner and CTO and residence at Decibel Partners, and I'm joined by my co-host, swyx writer and editor of Latent Space.[00:00:21] Benchmarking Questions[00:00:21] Up until today, we never verified that we're actually humans to you guys. So we'd have one good thing to do today would be run ourselves through some AI benchmarks and see if we are humans.[00:00:31] Indeed. So, since I got you here, Sean, I'll start with one of the classic benchmark questions, which is what movie does this emoji describe? The emoji set is little Kid Bluefish yellow, bluefish orange Puffer fish. One movie does that. I think if you added an octopus, it would be slightly easier. But I prepped this question so I know it's finding Nemo.[00:00:57] You are so far a human. Second one of these emoji questions instead, depicts a superhero man, a superwoman, three little kids, one of them, which is a toddler. So you got this one too? Yeah. It's one of my favorite movies ever. It's the Incredibles. Uh, second one was kind of a letdown, but the first is a.[00:01:17] Awesome. Okay, I'm gonna ramp it up a little bit. So let's ask something that involves a little bit of world knowledge. So when you drop a ball from rest, it accelerates downward at 9.8 meters per second if you throw it downward instead, assuming no air resistance, so you're throwing it down instead of dropping it, it's acceleration immediately after leaving your hand is a 9.8 meters per second.[00:01:38] B, more than 9.8 meters per second. C less than 9.8 meters per second. D cannot say unless the speed of the throw is. I would say B, you know, I started as a physics major and then I changed, but I think I, I got enough from my first year. That is B Yeah. Even proven that you're human cuz you got it wrong.[00:01:56] Whereas the AI got it right is 9.8 meters per second. The gravitational constant, uh, because you are no longer accelerating after you leave the hand. The question says if you throw it downward after leaving your hand, what is the. It is, it goes back to the gravitational constant, which is 9.8 meters per, I thought you said you were a physics major.[00:02:17] That's why I changed. So I'm a human. I'm a human. You're human. You're human. But you, you got them all right. So I can't ramp it up. I can't ramp it up. So, Assuming, uh, the AI got all of that right, you would think that AI will get this one wrong. Mm-hmm. Because it's just predicting the next token, right?[00:02:31] Right. In the complex Z plane, the set of points satisfying the equation. Z squared equals modulars. Z squared is A, a pair points B circle, C, a half line D, online D square. The processing is, this is going on in your head. You got minus three. A line. This is hard. Yes, that is. That is a line. Okay. What's funny is that I think if, if an AI was doing this, it would take the same exact amount of time to answer this as it would every single other word.[00:03:05] Cuz it's computationally the same to them. Right.[00:03:08] Why AI Benchmarks matter[00:03:08] Um, so anyway, if you haven't caught on today, we're doing our first, uh, AI fundamentals episode, which just the two of us, no guess because we wanted to go deep on one topic and the topic. AI benchmarks. So why are we focusing on AI benchmarks? So, GPT4 just came out last week and every time a new model comes out, All we hear about is it's so much better than the previous model on benchmark X, on benchmark Y.[00:03:33] It performs better on this, better on that. But most people don't actually know what actually goes on under these benchmarks. So we thought it would be helpful for people to put these things in context. And also benchmarks evolved. Like the more the models improve, the harder the benchmarks get. Like I couldn't even get one of the questions right.[00:03:52] So obviously they're working and you'll see that. From the 1990s where some of the first ones came out to day, the, the difficulty of them is truly skyrocketed. So we wanna give a, a brief history of that and leave you with a mental model on, okay, what does it really mean to do well at X benchmark versus Y benchmark?[00:04:13] Um, so excited to add that in. I would also say when you ask people what are the ingredients going into a large language model, they'll talk to you about the data. They'll talk to you about the neural nets, they'll talk to you about the amount of compute, you know, how many GPUs are getting burned based on this.[00:04:30] They never talk to you about the benchmarks. And it's actually a shame because they're so influential. Like that is the entirety of how we judge whether a language model is better than the other. Cuz a language model can do anything out of. Potentially infinite capabilities. How do you judge one model versus another?[00:04:48] How do you know you're getting better? And so I think it's an area of intense specialization. Also, I think when. Individuals like us, you know, we sort of play with the language models. We are basically doing benchmarks. We're saying, look, it's, it's doing this awesome thing that I found. Guess what? There have been academics studying this for 20 years who have, uh, developed a science to this, and we can actually benefit from studying what they have done.[00:05:10] Yep. And obviously the benchmarks also drive research, you know, in a way whenever you're working on, in a new model. Yeah. The benchmark kind of constraints what you're optimizing for in a way. Because if you've read a paper and it performs worse than all the other models, like you're not gonna publish it.[00:05:27] Yeah. So in a way, there's bias in the benchmark itself. Yeah. Yeah. We'll talk a little bit about that. Right. Are we optimizing for the right things when we over-optimize for a single benchmark over over some others? And also curiously, when GPT4 was released, they emitted some very. Commonplace industry benchmarks.[00:05:44] So the way that you present yourself, it is a form of marketing. It is a form of trying to say you're better than something else. And, and trying to explain where you think you, you do better. But it's very hard to verify as well because there are certain problems with reproducing benchmarks, uh, especially when you come to large language models.[00:06:02] Introducing Benchmark Metrics[00:06:02] So where do we go from here? Should we go over the, the major concept? Yeah. When it comes to benchmark metrics, we get three main measures. Accuracy, precision, recall accuracy is just looking at how many successful prediction the model does. Precision is the ratio of true positives, meaning how many of them are good compared to the overall amount of predictions made Versus recall is what proportion of the positives were identified.[00:06:31] So if you think. Spotify playlist to maybe make it a little more approachable, precision is looking. How many songs in a Spotify playlist did you like versus recall is looking at of all the Spotify songs that you like in the word, how many of them were put in the in the playlist? So it's more looking at how many of the true positives can you actually bring into the model versus like more focusing on just being right.[00:06:57] And the two things are precision and recall are usually in tension.. If you're looking for a higher position, you wanna have a higher percentage of correct results. You're usually bringing recall down because you lead to kind of like lower response sets, you know, so there's always trade offs. And this is a big part of the benchmarking too.[00:07:20] You know, what do you wanna optimize for? And most benchmarks use this, um, F1 score, which is the harmonic mean of precision and recall. Which is, you know, we'll put it in the show notes, but just like two times, like the, you know, precision Times Recall divided by the sum. So that's one. And then you get the Stanford Helm metrics.[00:07:38] Um, yeah, so ultimately I think we have advanced a lot in the, in the past few decades on how we measure language models. And the most interesting one came out January of this year from Percy Lang's research lab at Stanford, and he's got. A few metrics, accuracy, calibration, robustness, fairness, efficiency, general information bias and toxicity, and caring that your language models are not toxic and not biased.[00:08:03] So is is, mm-hmm. Kind of a new thing because we have solved the other stuff, therefore we get to care about the toxic of, uh, the language models yelling at us.[00:08:14] Benchmarking Methodology[00:08:14] But yeah, I mean, maybe we can also talk about the other forms of how their be. Yeah, there's three main modes. You can need a benchmark model in a zero shot fashion, few shot or fine tune models, zero shots.[00:08:27] You do not provide any example and you're just testing how good the model is at generalizing few shots, you have a couple examples that you provide and then. You see from there how good the model is. These are the number of examples usually represented with a K, so you might see few shots, K equal five, it means five examples were passed, and then fine tune is you actually take a bunch of data and fine tune the model for that specific task, and then you test it.[00:08:55] These all go from the least amount of work required to the most amount of work required. If you're doing zero shots benchmarking, you do not need to have any data, so you can just take 'em out and do. If you're fine tuning it, you actually need a lot of data and a lot of compute time. You're expecting to see much better results from there.[00:09:14] Yeah. And sometimes the number of shots can go up to like a hundred, which is pretty surprising for me to see that people are willing to test these language models that far. But why not? You just run the computer a little bit longer. Yeah. Uh, what's next? Should we go into history and then benchmarks? Yeah.[00:09:29] History of Benchmarking since 1985[00:09:29] Okay, so I was up all night yesterday. I was like, this is a fascinating topic. And I was like, all right, I'll just do whatever's in the G PT three paper. And then I read those papers and they all cited previous papers, and I went back and back and back all the way to 1985. The very first benchmark that I can find.[00:09:45] 1985-1989: WordNet and Entailment[00:09:45] Which is WordNet, which is uh, an English benchmark created in at Princeton University by George Miller and Christian Fellbaum. Uh, so fun fact, Chris George Miller also authored the paper, the Magical Number seven plus Minus two, which is the observation that people have a short term memory of about seven for things.[00:10:04] If you have plus or minus two of seven, that's about all you can sort of remember in the short term, and I just wanted. Say like, this was before computers, right? 1985. This was before any of these personal computers were around. I just wanna give people a sense of how much work manual work was being done by these people.[00:10:22] The database, uh, WordNet. Sorry. The WordNet database contains 155,000 words organized in 175,000 sys. These sys are basically just pairings of nouns and verbs and adjectives and adverbs that go together. So in other words, for example, if you have nouns that are hyper names, if every X is a, is a kind of Y.[00:10:44] So a canine is a hyper name of a dog. It's a holo. If X is a part of Y, so a building is a hollow name of a window. The most interesting one for in terms of formal, uh, linguistic logic is entailment, which captures the relationship between two words, where the verb Y is entailed by X. So if by doing X, you must be doing Y.[00:11:02] So in other words, two, sleep is entailed by two snore because you cannot snore without also sleeping and manually mapping 155,000 words like that, the relationships between all of them in a, in a nested tree, which is. Incredible to me. Mm-hmm. And people just did that on faith. They were like, this will be useful somehow.[00:11:21] Right. Uh, and they were interested in cycle linguistics, like understanding how humans thought, but then it turned out that this was a very good dataset for understanding semantic similarity, right? Mm-hmm. Like if you measure the distance between two words by traversing up and down the graph, you can find how similar to two words are, and therefore, Try to figure out like how close they are and trade a model to, to predict that sentiment analysis.[00:11:42] You can, you can see how far something is from something that is considered a good sentiment or a bad sentiment or machine translation from one language to the other. Uh, they're not 200 word languages, which is just amazing. Like people had to do this without computers. Penn Tree Bank, I was in 1989, I went to Penn, so I always give a shout out to my university.[00:12:01] This one expanded to 4.5 million words of text, which is every uh, wall Street Journal. For three years, hand collected, hand labeled by grad students your tuition dollars at work. So I'm gonna skip forward from the eighties to the nineties. Uh, NYS was the most famous data set that came out of this. So this is the, uh, data set of 60,000.[00:12:25] Training images of, uh, of numbers. And this was the first visual dataset where, uh, people were tr tracking like, you know, handwritten numbers and, and mapping them to digital numbers and seeing what the error rate for them was. Uh, these days I think this can be trained in like e every Hello world for machine learning is just train missed in like four lanes of code.[00:12:44] 1998-2004 Enron Emails and MNIST[00:12:44] Then we have the Enron email data set. Enron failed in 2001. Uh, the emails were released in 2004 and they've been upgraded every, uh, every few years since then. That is 600,000 emails by 150 senior employees of Enron, which is really interesting because these are email people emailing each other back and forth in a very natural.[00:13:01] Context not knowing they're being, they're about to be observed, so you can do things like email classification, email summarization, entity recognition and language modeling, which is super cool. Any thoughts about that be before we go into the two thousands? I think like in a way that kind of puts you back to the bias, you know, in some of these benchmarks, in some of these data sets.[00:13:21] You know, like if your main corpus of benchmarking for entity recognition is a public energy company. Mm-hmm. You know, like if you're building something completely different and you're building a model for that, maybe it'll be worse. You know, you start to see how we started. With kind of like, WordNet is just like human linguistics, you know?[00:13:43] Yes. It's not domain related. And then, um, same with, you know, but now we're starting to get into more and more domain-specific benchmarks and you'll see this increase over time. Yeah. NY itself was very biased towards, um, training on handwritten letter. Uh, and handwritten numbers. So, um, in 2017 they actually extended it to Eist, which is an extended to extension to handwritten letters that seems very natural.[00:14:08] And then 2017, they also had fashion ness, which is a very popular data set, which is images of clothing items pulled from Zando. So you can see the capabilities of computer vision growing from single digit, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, to all the letters of the alphabet. To now we can recognize images, uh, of fashion, clothing items.[00:14:28] So it's pretty. So the big one for deep learning, cuz all of that was just, just the appetizers, just getting started.[00:14:35] 2009-2014 : ImageNet, CIFAR and the AlexNet Moment for Deep Learning[00:14:35] The big one for deep learning was ImageNet, which is where Fafa Lee came into the picture and that's why she's super well known. She started working in 2006 and released it in 2009. Fun fact, she actually met with, uh, Christian Feldbaum, who was, uh, one of the co-authors of, uh, war.[00:14:51] To create ImageNet. So there's a direct lineage from Words to Images. Yeah. And uh, they use Amazon Mechanical Turk to help with classification images. No longer grad students. But again, like I think, uh, this goes, kind of goes back to your observation about bias, like when I am a mechanical Turk worker. And I'm being paid by the image to classify an image.[00:15:10] Do you think I'll be very careful at my job? Right? Yeah. Whereas when I'm a, you know, Enron employee, emailing my, my fellow coworker, trying to just communicate something of, of natural language that is a different type of, uh, environment. Mm-hmm. So it's a pretty interesting benchmark. So it was released in 2009 ish and, you know, people were sort of competing to recognize and classify that properly.[00:15:33] The magic moment for ImageNet came in 2012, uh, which is called the AlexNet moment cuz I think that grad student that, um, created this recognition model was, uh, named Alex, I forget his last name, achieved a error rate of 15%, which is, More than 10% lower than the runner up. So it was used just so much better than the second place that everyone else was like, what are you doing?[00:15:54] Uh, and it turned out that he was, he was the first to use, uh, deep learning, uh, c n n 10 percentage points. So like 15 and the other one was 25. Yeah, exactly. So it was just so much, so much better than the others. It was just unbelievable that no one else was, no other approach was even coming close.[00:16:09] Therefore, everyone from there on out for the next, until today we're just learning the lessons of deep learning because, um, it is so much superior to the other approaches. And this was like a big. Images and visual moment because then you had like a sci-fi 10, which is a, another, like a data set that is mostly images.[00:16:27] Mm-hmm. Focused. Mm-hmm. So it took a little bit before we got back to to text. And nowadays it feels like text, you know, text models are kind of eating the word, you know, we're making the text one multi-model. Yeah. So like we're bringing the images to GBT four instead of the opposite. But yeah, in 2009 we had a, another 60,000 images that set.[00:16:46] 32 by 32. Color images with airplanes, automobiles, like, uh, animals, like all kind of stuff. Like I, I think before we had the numbers, then we had the handwritten letters. Then we had clothing, and then we finally made clothing items came after, oh, clothing items. 2009. Yeah, this is 2009. I skipped, I skipped time a little bit.[00:17:08] Yeah, yeah. But yeah, CFR 10 and CFR 100. CFR 10 was for 10 classes. And that that was chosen. And then obviously they optimized that and they were like, all right, we need a new problem now. So in 20 14, 5 years later, they introduced CFAR 100, which was a hundred classes of other items. And I think this is a very general pattern, which is used.[00:17:25] You create a data set for a specific be. You think it's too hard for machines? Mm-hmm. It lasts for five years before it's no longer too hard for machines, and you have to find a new data set and you have to extend it again. So it's Similarly, we are gonna find that in glue, which is another, which is one of more modern data sets.[00:17:42] 2018-19: GLUE and SuperGLUE - Single Sentence, Similarity and Paraphrase, Inference[00:17:42] This one came out in 2018. Glue stands for general Language Understanding Evaluation. This is one of the most influential, I think, early. Earlier, um, language model benchmarks, and it has nine tasks. Um, so it has single sentence tasks, similarity and paraphrase tasks and inference tasks. So a single sentence task, uh, would be something like, uh, the Stanford Sentiment Tree Bank, which is a.[00:18:05] Uh, sentences from movie reviews and human annotations of the sentiment, whether it's positive or negative, in a sort of like a four point scale. And your job is to predict the task of a single sentence. This similarity task would involve corpuses, like the Microsoft research paraphrase corpus. So it's a corpus of sentence pairs automatically extracted from online news sources with human annotations for whether or not the sentence is in the para semantically equivalent.[00:18:28] So you just predict true or false and again, Just to call back to the math that we did earlier in this episode, the classes here are imbalance. This data set, for example, is 68% positive. So we report both accuracy and F1 scores. F1 is a more balanced approach because it, it adjusts for, uh, imbalanced, um, data sets.[00:18:48] Mm-hmm. Yeah. And then finally, inference. Inference is the one where we really start to have some kind of logic. So for example, the M N L I. Um, actually I'm, I'm gonna focus on squad, the Stanford questioning question answering dataset. It's another data set of pairs, uh, questions, uh, uh, p question paragraphs, pairs.[00:19:04] So where one of the sentences of the paragraph drawn from Wikipedia contains the answer to the corresponding question, we convert the task into a sentence, para classification by forming a pair between each question in each sentence into corresponding context and filtering out pairs of low overlap. So basically annotating whether or not.[00:19:20] Is the answer to the question inside of this paragraph that I pulled. Can you identify that? And again, like Entailment is kind of included inside of each of these inference tasks because it starts to force the language model to understand whether or not one thing implies the other thing. Mm-hmm. Yeah.[00:19:37] And the, the models evolving. This came out in 2018, lasted one year exactly. One year later, people were like, that's too easy. That's too easy. So in 2019, they actually came out with super. I love how you'll see later with like swag and hella swag. It's like they come up with very good names for these things.[00:19:55] Basically what's super glue dead is stick glue and try and move outside of the single sentence evaluation. So most of the tasks that. Sean was talking about focus on one sentence. Yeah, one sentence, one question. It's pretty straightforward in that way. Superglue kind of at the, so one, it went from single sentence to having some multi sentence and kind of like a context driven thing.[00:20:21] So you might have questions where, The answer is not in the last paragraph that you've read. So it starts to test the, the context window on this model. Some of them are more, in order to know the answer, you need to know what's not in the question kind of thing. So like you may say, Hey, this drink is owned by the Coca-Cola company.[00:20:43] Is this a Pepsi product? You know, so you need to make the connection false. Exactly, yeah. Then you have also like, um, embedded clauses. So you have things that are not exactly said, have to be inferred, and like a lot of this stack is very conversational. So some of the example contain a lot of the, um, um, you know, or this question's very hard to read out.[00:21:07] Yeah, I know. It's like, it sounds like you are saying, um, but no, you're actually, you're actually. And yet I hope to see employer base, you know, helping out child, um, care centers at the place of employment, things like that, that will help out. It's kind of hard to even read it. And then the hypothesis is like they're setting a trend.[00:21:27] It's going from something very simple like a big p d extract to something that is more similar to how humans communicate. Transcripts, like audio transcripts. Exactly. Of how people talk. Yeah. And some of them are also, Plausibility. You know, like most of these models have started to get good at understanding like a clear cause, kind of like a.[00:21:48] You know, cause effect things. But some of the plausible ones are like, for example, this one is a copa. They're called choice of plausible alternatives. The premises, my body cast a shadow over the grass. What's the cost for this alternative? One, the sun was rising. Alternative to the grass was cut.[00:22:07] Obviously it's the sun was rising, but nowhere. In the question we're actually mentioning the sun, uh, we are mentioning the grass. So some models, some of the older models might see the grass and make the connection that the grass is part of the reason, but the models start to get better and better and go from simply looking at the single sentence context to a more of a, a word new, uh, word knowledge.[00:22:27] It's just really impressive, like the fact that. We can expect that out of a model. It still blows my mind. I think we should not take it for granted that when we're evaluating models, we're asking questions like this that is not obvious from just the given text itself. Mm-hmm. So it, it is just coming with a memorized view of the world, uh, or, or world knowledge. And it understands the premise on, on some form. It is not just random noise. Yeah, I know. It's really impressive. This one, I actually wanted multi rc I actually wanted to spring on you as a, as a test, but it's just too long to read. It's just like a very long logic question.[00:23:03] And then it'll ask you to do, uh, comprehension. But uh, yeah, we'll just, we'll just kinda skip that. We'll put it, we'll put it in the show notes, and then you have to prove us that you're a human. Send us the answer exactly. Exactly and subscribe to the podcast. So superglue was a lot harder, and I think also was superseded eventually, pretty soon.[00:23:21] 2018-2019: Swag and HellaSwag - Common Sense Inference[00:23:21] And, uh, yeah, then we started coming onto the more recent cohort of tests. I don't know how to introduce the rest. Uh, there, there are just so many tests here that I, I struggle a little bit picking from these. Uh, but perhaps we can talk about swag and heli swyx since you mentioned it. Yeah. So SWAG stands for situations with Adversarial Generations.[00:23:39] Uh, also came out in 2018, but this guy, zes Etal, likes to name his data sets and his benchmarks in a very memorable way. And if you look at the PDF of the paper, he also has a little icon, uh, image icon for swag. And he doesn't just go by, uh, regular language. So he definitely has a little bit of branding to this and it's.[00:24:00] Part. So I'll give you an example of the kind of problems that swyx poses. Uh, it it is focused on common sense inference. So what's common sense inference? So, for example, given a partial description, like she opened the hood of the car, humans can reason about the situation and anticipate what might come next.[00:24:16] Then she examined the engine. So you're supposed to pick based on what happened in the first part. What is most likely to happen in the second part based on the, uh, multiple choice question, right? Another example would be on stage, a woman takes a seat at the piano. She a, sits on a bench as her sister plays at the doll.[00:24:33] B. Smiles with someone as the music play. C is in the crowd watching the dancers. D nervously set her fingers on the keys, so A, B, C, or D. It's not all of them are plausible. When you look at the rules of English, we're we've, we're not even checking for whether or not produces or predicts grammatical English.[00:24:54] We're checking for whether the language model can correctly pick what is most likely given the context. The only information that you're given is on stage. A woman takes a seat at the piano, what is she most likely to do next? And D makes sense. It's arguable obviously. Sometimes it could be a. In common sense, it's D.[00:25:11] Mm-hmm. So we're training these models to have common. Yeah, which most humans don't have. So it's a, it's already a step up. Obviously that only lasted a year. Uh, and hello, SWAG was no longer, was no longer challenging in 2019, and they started extending it quite a lot more, a lot more questions. I, I forget what, how many questions?[00:25:33] Um, so Swag was a, swag was a data set. A hundred thousand multiple choice questions. Um, and, and part of the innovation of swag was really that you're generating these questions rather than manually coming up with them. Mm-hmm. And we're starting to get into not just big data, but big questions and big benchmarks of the, of the questions.[00:25:51] That's where the adversarial generations come in, but how that swag. Starts pulling in from real world questions and, and data sets like, uh, wikiHow and activity net. And it's just really, you know, an extension of that. I couldn't even add examples just cuz there's so many. But just to give you an idea of, uh, the progress over time.[00:26:07] Aside: How to Design Benchmarks[00:26:07] Most of these benchmarks are, when they're released, they set. Benchmark at a level where if you just randomly guessed all of the questions, you'll get a 25%. That's sort of the, the baseline. And then you can run each of the language models on them, and then you can run, uh, human evaluations on them. You can have median evaluations, and then you have, um, expert evaluations of humans.[00:26:28] So the randoms level was, uh, for halla. swyx was 20. GT one, uh, which is the, uh, 2019 version that got a 41 on the, on the Hello Sue X score. Bert from Google, got 47. Grover, also from Google, got 57 to 75. Roberta from Facebook, got 85 G P T, 3.5, got 85, and then GPT4 got 95 essentially solving hello swag. So this is useless too.[00:26:51] 2021 - MMLU - Human level Professional Knowledge[00:26:51] We need, we need super Hell now's use this. Super hell swyx. I think the most challenging one came from 2021. 2021 was a very, very good year in benchmarking. So it's, we had two major benchmarks that came out. Human eval and M M L U, uh, we'll talk about mm. M L U first, cuz that, that's probably the more, more relevant one.[00:27:08] So M M L U. Stands for measuring mul massive multitask language understanding, just by far the biggest and most comprehensive and most human-like, uh, benchmark that we've had for until 2021. We had a better one in 2022, but we'll talk about that. So it is a test that covers 57 tasks, including elementary, math, US history, computer science law, and more.[00:27:29] So to attain high accuracy on this task, models must possess extensive world knowledge and prop problem solving. Its. Includes practice questions for the GRE test and the U United States, um, m l e, the medical exam as. It also includes questions from the undergrad courses from Oxford, from all the way from elementary high school to college and professional.[00:27:49] So actually the opening question that I gave you for this podcast came from the math test from M M L U, which is when you drop a ball from rest, uh, what happens? And then also the question about the Complex Z plane, uh, but it equally is also asking professional medicine question. So asking a question about thyroid cancer and, uh, asking you to diagnose.[00:28:10] Which of these four options is most likely? And asking a question about microeconomics, again, giving you a, a situation about regulation and monopolies and asking you to choose from a list of four questions. Mm-hmm. Again, random baseline is 25 out of 100 G P T two scores, 32, which is actually pretty impressive.[00:28:26] GT three scores between 43 to 60, depending on the the size. Go. Scores 60, chinchilla scores 67.5, GT 3.5 scores, 70 GPT4 jumps, one in 16 points to 86.4. The author of M M L U, Dan Hendrix, uh, was commenting on GPT4 saying this is essentially solved. He's basically says like, GT 4.5, the, the next incremental improvement on GPT4 should be able to reach expert level human perform.[00:28:53] At which point it is passing simultaneously, passing all the law exams, all the medical exams, all the graduate student exams, every single test from AP history to computer science to. Math to physics, to economics. It's very impressive. Yeah. And now you're seeing, I mean, it's probably unrelated, but Ivy League universities starting to drop the a t as a requirement for getting in.[00:29:16] So yeah. That might be unrelated as well, because, uh, there's a little bit of a culture war there with regards to, uh, the, the inherent bias of the SATs. Yeah. Yeah. But I mean, that's kinda, I mean exactly. That's kinda like what we were talking about before, right? It's. If a model can solve all of these, then like how good is it really?[00:29:33] How good is it as a Exactly. Telling us if a person should get in. It captures it. Captures with just the beginning. Yeah. Right.[00:29:39] 2021: HumanEval - Code Generation[00:29:39] Well, so I think another significant. Benchmark in 2021 was human eval, which is, uh, the first like very notable benchmark for code code generation. Obviously there's a, there's a bunch of research preceding this, but this was the one that really caught my eye because it was simultaneously introduced with Open Eyes Codex, which is the code generation model, the version of G P T that was fine tuned for generating code.[00:30:02] Uh, and that is, Premise of, well, there is the origin or the the language model powering GitHub co-pilot and yeah, now we can write code with language models, just with that, with that benchmark. And it's good too. That's the other thing, I think like this is one where the jump from GT 3.5 to GPT4 was probably the biggest, like GT 3.4 is like 48% on. On this benchmark, GPT4 is 67%. So it's pretty big. Yeah. I think coders should rest a little bit. You know, it's not 90 something, it's, it's still at 67, but just wait two years. You know, if you're a lawyer, if you're a lawyer, you're done. If you're a software engineer, you got, you got a couple more years, so save your money.[00:30:41] Yeah. But the way they test it is also super creative, right? Like, I think maybe people don't understand that actually all of the tests that are given here are very intuitive. Like you. 90% of a function, and then you ask the language model to complete it. And if it completes it like any software engineer would, then you give it a win.[00:31:00] If not, you give it a loss, run that model 164 times, and that is human eval. Yeah. Yeah. And since a lot of our listeners are engineers too, I think the big thing here is, and there was a, a link that we had that I missed, but some of, for example, some of. Coding test questions like it can answer older ones very, very well.[00:31:21] Like it doesn't not answer recent ones at all. So like you see some of like the data leakage from the training, like since it's been trained on the issues, massive data, some of it leaks. So if you're a software engineer, You don't have to worry too much. And hopefully, especially if you're not like in the JavaScript board, like a lot of these frameworks are brand new every year.[00:31:41] You get a lot of new technologies. So there's Oh, there's, oh yeah. Job security. Yes, exactly. Of course. Yeah. You got a new, you have new framework every year so that you have job security. Yeah, exactly. I'll sample, uh, data sets.[00:31:51] 2020 - XTREME - Multilingual Benchmarks[00:31:51] So before we get to big bench, I'll mention a couple more things, which is basically multilingual benchmarks.[00:31:57] Uh, those are basically simple extensions of monolingual benchmarks. I feel like basical. If you can. Accurately predicts the conversion of one word or one part of the word to another part of the word. Uh, you get a score. And, and I think it's, it's fairly intuitive over there. Uh, but I think the, the main benchmarks to know are, um, extreme, which is the, uh, x the x lingual transfer evaluation, the multilingual encoders, and much prefer extreme.[00:32:26] I know, right? Uh, that's why, that's why they have all these, uh, honestly, I think they just wanted the acronym and then they just kinda worked backwards. And then the other one, I can't find it in my notes for, uh, what the other multilingual ones are, but I, I just think it's interesting to always keep in mind like what the other.[00:32:43] Language capabilities are like, one language is basically completely equivalent to another. And I think a lot of AI ethicists or armchair AI ethicists are very angry that, you know, most of the time we optimize for English because obviously that has, there's the most, uh, training corpuses. I really like extreme the work that's being done here, because they took a, a huge amount of effort to make sure they cover, uh, sparse languages like the, the less popular ones.[00:33:06] So they had a lot of, uh, the, the, obviously the, the popular. Uh, the world's top languages. But then they also selected to maximize language diversity in terms of the complete diversity in, uh, human languages like Tamil Telugu, maam, and Sohi and Yoruba from Africa. Mm-hmm. So I just thought like that kind of effort is really commendable cuz uh, that means that the rest of the world can keep up in, in this air race.[00:33:28] Right. And especially on a lot of the more human based things. So I think we talked about this before, where. A lot of Israel movies are more[00:33:36] focused on culture and history and like are said in the past versus a lot of like the Western, did we talk about this on the podcast? No, not on the podcast. We talked and some of the Western one are more focused on the future and kind of like what's to come.[00:33:48] So I feel like when you're, some of the benchmarks that we mentioned before, you know, they have movie reviews as like, uh, one of the. One of the testing things. Yeah. But there's obviously a big cultural difference that it's not always captured when you're just looking at English data. Yeah. So if you ask the a motto, it's like, you know, are people gonna like this movie that I'm writing about the future?[00:34:10] Maybe it's gonna say, yeah, that's a really good idea. Or if I wanna do a movie about the past, it's gonna be like maybe people want to hear about robots. But that wouldn't be the case in, in every country. Well, since you and I speak different languages, I speak Chinese, you speak Italian, I'm sure you've tested the Italian capabilities.[00:34:29] What do you think? I think like as. Italy, it's so much more, um, dialect driven. So it can be, it can be really hard. So what kind of Italian does g PT three speak? Actually Italian, but the reality is most people have like their own, their own like dialect. So it would be really hard for a model to fool. An Italian that it's like somebody from where they are, you know?[00:34:49] Yeah. Like you can actually tell if you're speaking to AI bot in Chinese because they would not use any of the things that human with humans would use because, uh, Chinese humans would use all sorts of replacements for regular Chinese words. Also, I tried one of those like language tutor things mm-hmm.[00:35:06] That people are making and they're just not good Chinese. Not colloquial Chinese, not anything that anyone would say. They would understand you, but they were from, right, right.[00:35:14] 2022: BIG-Bench - The Biggest of the Benches[00:35:14] So, 2022, big bench. This was the biggest of the biggest, of the biggest benchmarks. I think the, the main pattern is really just, Bigger benchmarks rising in opposition to bigger and bigger models.[00:35:27] In order to evaluate these things, we just need to combine more and more and way more tasks, right? Like swag had nine tasks, hello swag had nine more tasks, and then you're, you're just adding and adding and adding and, and just running a battery of tasks all over. Every single model and, uh, trying to evaluate how good they are at each of them.[00:35:43] Big bench was 204 tasks contributed by 442 authors across 132 institutions. The task topics are diverse, drawing from linguistics, childhood development, math, common sense reasoning, biology, physics, social bias, software development, and beyond. I also like the fact that these authors also selected tasks that are not solved by current language models, but also not solvable by memorizing the internet, which is mm-hmm.[00:36:07] Tracking back to a little bit of the issues that we're, we're gonna cover later. Right. Yeah. I think that's, that's super interesting. Like one of, some of the examples would include in the following chess position, find a checkmate, which is, some humans cannot do that. What is the name of the element within a topic number of six?[00:36:22] Uh, that one you can look up, right? By consulting a periodic table. We just expect language models to memorize that. I really like this one cuz it's, uh, it's inherent. It's, uh, something that you can solve.[00:36:32] Identify whether this sentence has an anachronism. So, option one. During the Allied bombardment of the beaches of Iwojima, Ralph spoke loudly into his radio.[00:36:41] And in option two, during the allied bombardment of the beaches of Iwojima, Ralph spoke loudly into his iPhone. And you have to use context of like when iPhone, when Ally bombarding. Mm-hmm. And then sort of do math to like compare one versus the other and realize that okay, this one is the one that's out of place.[00:36:57] And that's asking more and more and more of the language model to do in implicitly, which is actually modeling what we do when we listen to language, which is such a big. Gap. It's such a big advancement from 1985 when we were comparing synonyms. Mm-hmm. Yeah, I know. And it's not that long in the grand scheme of like humanity, you know, like it's 40 years.[00:37:17] It's crazy. It's crazy. So this is a big missing gap in terms of research. Big benches seems like the most comprehensive, uh, set of benchmarks that we have. But it is curiously missing from Gypsy four. Mm-hmm. I don't know. On paper, for code, I only see Gopher two 80. Yeah. On it. Yeah. Yeah. It could be a curious emission because it maybe looks.[00:37:39] Like it didn't do so well.[00:37:40] EDIT: Why BIG-Bench is missing from GPT4 Results[00:37:40] Hello, this is Swyx from the editing room sometime in the future. I just wanted to interject that. Uh, we now know why the GPT for benchmark results did not include the big bench. Benchmark, even though that was the state-of-the-art benchmark at the time. And that's because the. Uh, GPC four new the Canary G U I D of the big bench.[00:38:02] Benchmark. Uh, so Canary UID is a random string, two, six[00:38:08] eight six B eight, uh, blah, blah, blah. It's a UID. UID, and it should not be knowable by the language model. And in this case it was therefore they had to exclude big bench and that's. And the issue of data contamination, which we're about to go into right now.[00:38:25] Issue: GPT4 vs the mystery of the AMC10/12[00:38:25] And there's some interesting, if you dive into details of GPT4, there's some interesting results in GPT4, which starts to get into the results with benchmarking, right? Like so for example, there was a test that GPT4 published that is very, very bizarre to everyone who is even somewhat knowledgeable.[00:38:41] And this concerns the Ammc 10 and AMC 12. So the mc. Is a measure of the American math 10th grade student and the AMC12 is a, uh, is a measure of the American 12th grade student. So 12 is supposed to be harder than 10. Because the students are supposed to be older, it's, it's covering topics in algebra, geometry number, theory and combinatorics.[00:39:04] GPT4 scored a 30 on AMC10 and scored a 60 on AMC12. So the harder test, it got twice as good, and 30 was really, really bad. So the scoring format of AMC10. It is 25 questions. Each correct answer is worth six points. Each incorrect answer is worth 1.5 points and unanswered questions receive zero points.[00:39:25] So if you answer every single question wrong, you will get more than GPT4 got on AMC10. You just got everything wrong. Yeah, it's definitely better in art medics, you know, but it's clearly still a, a long way from, uh, from being even a high school student. Yeah. There's a little bit of volatility in these results and it, it shows that we, it's not quite like machine intelligence is not the same, or not linearly scaling and not intuitive as human intelligence.[00:39:54] And it's something that I think we should be. Aware of. And when it freaks out in certain ways, we should not be that surprised because Yeah, we're seeing that. Yeah. I feel like part of it is also human learning is so structured, you know, like you learn the new test, you learn the new test, you learn the new test.[00:40:10] But these models, we kind of throw everything at them all at once, you know, when we train them. So when, when the model is strained, are you excusing the model? No, no, no. I'm just saying like, you know, and you see it in everything. It's like some stuff. I wonder what the percentage of. AMC 10 versus AMC 12.[00:40:28] Issue: Data Contamination[00:40:28] Content online is, yes. This comes in a topic of contamination and memorization. Right. Which we can get into if we, if we, if we want. Yeah. Yeah, yeah. So, uh, we're getting into benchmarking issues, right? Like there's all this advancements in benchmarks, uh, language models. Very good. Awesome. Awesome, awesome. Uh, what are the problems?[00:40:44] Uh, the problem is that in order to train these language models, we are scraping the vast majority of the internet. And as time passes, the. Of previous runs of our tests will be pasted on the internet, and they will go into the corpus and the leg model will be memorizing them rather than reasoning them from first principles.[00:41:02] So in, in the machine, classic machine learning parlance, this would be overfitting mm-hmm. Uh, to the test rather than to the generalizing to the, uh, the results that we really want. And so there's an example of, uh, code forces as well also discovered on GPT4. So Code Forces has annual vintages and there was this guy, uh, C H H Halle on Twitter who ran GPT4 on pre 2021 problems, solved all of them and then ran it on 2022 plus problems and solved zero of them.[00:41:31] And we know that the cutoff for GPT4 was 2021. Mm-hmm. So it just memorized the code forces problems as far as we can tell. And it's just really bad at math cuz it also failed the mc 10 stuff. Mm-hmm. It's actually. For some subset of its capabilities. I bet if you tested it with GPT3, it might do better, right?[00:41:50] Yeah. I mean, this is the, you know, when you think about models and benchmarks, you can never take the benchmarks for what the number says, you know, because say, you know, you're focusing on code, like the benchmark might only include the pre 2021 problems and it scores great, but it's actually bad at generalizing and coming up with new solutions.[00:42:10] So, yeah, that, that's a. Big problem.[00:42:13] Other Issues: Benchmark Data Quality and the Iris data set[00:42:13] Yeah. Yeah. So bias, data quality, task specificity, reproducibility, resource requirements, and then calibrating confidence. So bias is, is, is what you might think it is. Basically, there's inherent bias in the data. So for example, when you think about doctor, do you think about a male doctor, a female doctor, in specifically an image net?[00:42:31] Businessmen, white people will be labeled businessmen, whereas Asian businessmen will be labeled Asian businessmen and that can reinforce harmful serotypes. That's the bias issue. Data quality issue. I really love this one. Okay, so there's a famous image data set we haven't talked about called the pedals or iris.[00:42:47] Iris dataset mm-hmm. Contains measurements of, uh, of, uh, length with petal length and petal with, uh, three different species of iris, iris flowers, and they have labeling issues in. So there's a mini, there's a lowest level possible error rate because the error rate exists in the data itself. And if you have a machine learning model that comes out with better error rate than the data, you have a problem cuz your machine learning model is lying to you.[00:43:12] Mm-hmm. Specifically, there's, we know this for a fact because especially for Iris flowers, the length should be longer than the, than the width. Um, but there. Number of instances in the data set where the length was shorter than the, than the width, and that's obviously impossible. So there was, so somebody made an error in the recording process.[00:43:27] Therefore if your machine learning model fits that, then it's doing something wrong cuz it's biologically impossible. Mm-hmm. Task specificity basically if you're overfitting to, to one type of task, for example, answering questions based on a single sentence or you're not, you know, facing something real world reproducibility.[00:43:43] This one is actually, I guess, the fine details of machine learning, which people don't really like to talk about. There's a lot. Pre-processing and post-processing done in I Python notebooks. That is completely un versions untested, ad hoc, sticky, yucky, and everyone does it differently. Therefore, your test results might not be the same as my test results.[00:44:04] Therefore, we don't agree that your scores are. The right scores for your benchmark, whereas you're self reporting it every single time you publish it on a, on a paper. The last two resource requirements, these are, these are more to do with GPTs. The larger and larger these models get, the harder, the more, more expensive it is to run some.[00:44:22] And some of them are not open models. In other words, they're not, uh, readily available, so you cannot tell unless they run it themselves on, on your benchmark. So for example, you can't run your GPT3, you have to kind of run it through the api. If you don't have access to the API like GPT4, then you can't run it at all.[00:44:39] The last one is a new one from GPT4's Paper itself. So you can actually ask the language models to expose their log probabilities and show you how confident they think they are in their answer, which is very important for calibrating whether the language model has the right amount of confidence in itself and in the GPT4 people. It. They were actually very responsible in disclosing that They used to have about linear correspondence between the amount of confidence and the amount of times it was right, but then adding R L H F onto GPT4 actually skewed this prediction such that it was more confident than it should be. It was confidently incorrect as as people say.[00:45:18] In other words, hallucinating. And that is a problem. So yeah, those are the main issues with benchmarking that we have to deal with. Mm-hmm. Yeah, and a lot of our friends, our founders, we work with a lot of founders. If you look at all these benchmarks, all of them just focus on how good of a score they can get.[00:45:38] They don't focus on what's actually feasible to use for my product, you know? So I think.[00:45:44] Tradeoffs of Latency, Inference Cost, Throughput[00:45:44] Production benchmarking is something that doesn't really exist today, but I think we'll see the, the rise off. And I think the main three drivers are one latency. You know, how quickly can I infer the answer cost? You know, if I'm using this model, how much does each call cost me?[00:46:01] Like is that in line with my business model I, and then throughput? I just need to scale these models to a lot of questions on the ones. Again, I just do a benchmark run and you kind of come up. For quadrants. So if on the left side you have model size going from smallest to biggest, and on the X axis you have latency tolerance, which is from, I do not want any delay to, I'll wait as long as I can to get the right answer.[00:46:27] You start to see different type of use cases, for example, I might wanna use a small model that can get me an answer very quickly in a short amount of time, even though the answer is narrow. Because me as a human, maybe I'm in a very iterative flow. And we have Varun before on the podcast, and we were talking about a kind of like a acceleration versus iteration use cases.[00:46:50] Like this is more for acceleration. If I'm using co-pilot, you know, the code doesn't have to be a hundred percent correct, but it needs to happen kind of in my flow of writing. So that's where a model like that would be. But instead, other times I might be willing, like if I'm asking it to create a whole application, I'm willing to wait one hour, you know, for the model to get me a response.[00:47:11] But you don't have, you don't have a way to choose that today with most models. They kind of do just one type of work. So I think we're gonna see more and more of these benchmark. Focus on not only on the research side of it, which is what they really are today when you're developing a new model, like does it meet the usual standard research benchmarks to having more of a performance benchmark for production use cases?[00:47:36] And I wonder who's gonna be the first company that comes up with, with something like this, but I think we're seeing more and more of these models go from a research thing to like a production thing. And especially going from companies like. Google and Facebook that have kinda unlimited budget for a lot of these things to startups, starting to integrate them in the products.[00:48:00] And when you're on a tight budget paying, you know, 1 cent per thousand tokens or 0.10 cent for a thousand tokens, like it's really important. So I think that's, um, that's what's missing to get a lot of these things to productions. But hopefully we, we see them.[00:48:16] Yeah, the software development lifecycle I'm thinking about really is that most people will start with large models and then they will prototype with that because that is the most capable ones.[00:48:25] But then as they put more and more of those things in production, people always want them to run faster and faster and faster and cheaper. So you will distill towards a more domain specific model, and every single company that puts this into production, we'll, we'll want something like that, but I, I think it's, it's a reasonable bet because.[00:48:41] There's another branch of the AI builders that I see out there who are build, who are just banking on large models only. Mm-hmm. And seeing how far they can stretch them. Right. With building on AI agents that can take arbitrarily long amounts of time because they're saving you lots of, lots of time with, uh, searching the web for you and doing research for you.[00:48:59] And I think. I'm happy to wait for Bing for like 10 seconds if it does a bunch of searches for median. Mm-hmm. Just ends with, ends with the right, right result. You know, I was, I was tweeting the other day that I wanted an AI enabled browser because I was seeing this table, uh, there was an image and I just needed to screenshot an image and say, plot this on a chart for me.[00:49:17] And I just wanted to do that, but it would have to take so many steps and I would be willing to wait for a large model to do that for me. Mm-hmm. Yeah. I mean, web development so far has been, Reduce, reduce, reduce the loading times. You know, it's like first we had the, I don't know about that. There, there are people who disagree.[00:49:34] Oh. But I, I think, like if you think about, you know, the CDN and you think about deploying things at the edge, like the focus recently has been on lowering the latency time versus increasing it.[00:49:45] Conclusion[00:49:45] Yeah. So, well that's the, that's Benchmark 1 0 1. Um. Let us know how we, how you think we did. This is something we're trying for the first time.[00:49:52] We're very inspired by other podcasts that we like where we do a bunch of upfront prep, but then it becomes a single topical episode that is hopefully a little bit more timeless. We don't have to keep keeping up with the news. I think there's a lot of history that we can go back on and. Deepen our understanding of the context of all these evolutions in, uh, language models.[00:50:12] Yeah. And if you have ideas for the next, you know, 1 0 1 fundamentals episode, yeah, let us know in the, in the comments and we'll see you all soon. Bye. Get full access to Latent Space at www.latent.space/subscribe
50:3807/04/2023
Grounded Research: From Google Brain to MLOps to LLMOps — with Shreya Shankar of UC Berkeley
We are excited to feature our first academic on the pod! I first came across Shreya when her tweetstorm of MLOps principles went viral:Shreya’s holistic approach to production grade machine learning has taken her from Stanford to Facebook and Google Brain, being the first ML Engineer at Viaduct, and now a PhD in Databases (trust us, its relevant) at UC Berkeley with the new EPIC Data Lab. If you know Berkeley’s history in turning cutting edge research into gamechanging startups, you should be as excited as we are!Recorded in-person at the beautiful StudioPod studios in San Francisco.Full transcript is below the fold.Edit from the future: Shreya obliged us with another round of LLMOps hot takes after the pod!Other Links* Shreya’s About: https://www.shreya-shankar.com/about/* Berkeley Sky Computing Lab - Utility Computing for the Cloud* Berkeley Epic Data Lab - low-code and no-code interfaces for data work, powered by next-generation predictive programming techniques* Shreya’s ML Principles * Grounded Theory* Lightning Round:* Favorite AI Product: Stability Dreamstudio* 1 Year Prediction: Data management platforms* Request for startup: Design system generator* Takeaway: It’s not a fad!Timestamps* [00:00:27] Introducing Shreya (poorly)* [00:03:38] The 3 V's of ML development* [00:05:45] Bridging Development and Production* [00:08:40] Preventing Data Leakage* [00:10:31] Berkeley's Unique Research Lab Culture* [00:11:53] From Static to Dynamically Updated Data* [00:12:55] Models as views on Data* [00:15:03] Principle: Version everything you do* [00:16:30] Principle: Always validate your data* [00:18:33] Heuristics for Model Architecture Selection* [00:20:36] The LLMOps Stack* [00:22:50] Shadow Models* [00:23:53] Keeping Up With Research* [00:26:10] Grounded Theory Research* [00:27:59] Google Brain vs Academia* [00:31:41] Advice for New Grads* [00:32:59] Helping Minorities in CS* [00:35:06] Lightning RoundTranscript[00:00:00] Hey everyone. Welcome to the Latent Space podcast. This is Alessio partner and CTM residence at Decibel Partners. I'm joined by my co-host, swyx writer and editor of Latent Space. Yeah,[00:00:21] it's awesome to have another awesome guest Shankar. Welcome .[00:00:25] Thanks for having me. I'm super excited.[00:00:27] Introducing Shreya (poorly)[00:00:27] So I'll intro your formal background and then you can fill in the blanks.[00:00:31] You are a bsms and then PhD at, in, in Computer Science at Stanford. So[00:00:36] I'm, I'm a PhD at Berkeley. Ah, Berkeley. I'm sorry. Oops. . No, it's okay. Everything's the bay shouldn't say that. Everybody, somebody is gonna get mad, but . Lived here for eight years now. So[00:00:50] and then intern at, Google Machine learning learning engineer at Viaduct, an OEM manufacturer, uh, or via OEM analytics platform.[00:00:59] Yes. And now you're an e I R entrepreneur in residence at Amplify.[00:01:02] I think that's on hold a little bit as I'm doing my PhD. It's a very unofficial title, but it sounds fancy on paper when you say[00:01:09] it out loud. Yeah, it is fancy. Well, so that is what people see on your LinkedIn. What's, what should, what should people know about you that's not on your LinkedIn?[00:01:16] Yeah, I don't think I updated my LinkedIn since I started the PhD, so, I'm doing my PhD in databases. It is not AI machine learning, but I work on data management for building AI and ML powered software. I guess like all of my personal interests, I'm super into going for walks, hiking, love, trying coffee in the Bay area.[00:01:42] I recently, I've been getting into cooking a lot. Mm-hmm. , so what kind of cooking? Ooh. I feel like I really like pastas. But that's because I love carbs. So , I don't know if it's the pasta as much as it's the carb. Do you ever cook for[00:01:56] like large[00:01:57] dinners? Large groups? Yeah. We just hosted about like 25 people a couple weeks ago, and I was super ambitious.[00:02:04] I was like, I'm gonna cook for everyone, like a full dinner. But then kids were coming. and I was like, I know they're not gonna eat tofu. The other thing with hosting in the Bay Area is there's gonna be someone vegan. There's gonna be someone gluten-free. Mm-hmm. . There's gonna be someone who's keto. Yeah.[00:02:20] Good luck, .[00:02:21] Oh, you forgot the seeds. That's the sea disrespects.[00:02:25] I know. . So I was like, oh my God, I don't know how I'm gonna do this. Yeah. The dessert too. I was like, I don't know how I'm gonna make everything like a vegan, keto nut free dessert, just water. It was a fun challenge. We ordered pizza for the children and a lot of people ate the pizza.[00:02:43] So I think , that's what happens when you try to cook, cook for everyone.[00:02:48] Yeah. The reason I dug a bit on the cooking is I always find like if you do cook for large groups, it's a little bit like of an ops situation. Yeah. Like a lot of engineering. A lot of like trying to figure out like what you need to deliver and then like what the pipeline[00:02:59] is and Oh, for sure.[00:03:01] You write that Gantt chart like a day in advance. , did you actually have a ga? Oh, I did. My gosh. Of course I had a Gantt chart. I, I dunno how people, did[00:03:08] you orchestrate it with airflow or ?[00:03:12] I orchestrated it myself. .[00:03:15] That's awesome. But yeah, we're so excited to have you, and you've been a pretty prolific writer, researcher, and thank you.[00:03:20] You have a lot of great content out there. I think your website now says, I'm currently learning how to make machine learning work in the real world, which is a challenge that mm-hmm. , everybody is steaming right now from the Microsoft and Googles of the word that have rogue eyes flirting with people, querying them to people, deploy models to production.[00:03:38] The 3 V's of ML development[00:03:38] Maybe let's run through some of the research you've done, especially on lops. Sure. And how to get these things in production. The first thing I really liked from one of your paper was the, the three VS of ML development. Mm-hmm. , which is velocity validation and versioning. And one point that you were making is that the development workflow of software engineering is kind of very different from ML because ML is very experiment driven.[00:04:00] Correct. There's a lot of changes that you need to make, you need to kill things very quickly if they're not working. So maybe run us through why you decided as kind of those three vs. Being some of the, the core things to think about. and some of the other takeaways from their research. Yeah,[00:04:15] so this paper was conducted as a loosely structured interview study.[00:04:18] So the idea is you interview like three or four people and then you go and annotate all the transcripts, tag them, kind of put the word clouds out there, whatever. There's a bunch of like cool software to do this. Then we keep seeing these, themes of velocity wasn't the word, but it was like experiment quickly or high experimentation rate.[00:04:38] Sometimes it was velocity. And we found that that was like the number one thing for people who were talking about their work in this kind of development phase. We also categorized it into phases of the work. So the life cycle like really just fell into place when we annotated the transcripts. And so did the variables.[00:04:55] And after three or four interviews you iterate on them. You kind of iterate on the questions, and you iterate on the codes or the tags that you give to the transcripts and then you do it again. And we repeated this process like three or four times up to that many people, and the story kind of told itself in a way that[00:05:11] makes sense.[00:05:12] I think, like I was trying to figure out why you picked those, but it's interesting to see that everybody kinda has the same challenges.[00:05:18] It fell out. I think a big thing, like even talking to the people who are at the Microsofts and the Googles, they have models in production. They're frequently training these models in production, yet their Devrel work is so experimental.[00:05:31] Mm-hmm. . And we were like, so it doesn't change. Even when you become a mature organization, you still throw 100 darts at the wall for five of them to stick and. That's super interesting and I think that's a little bit unique to data science and machine learning work.[00:05:45] Bridging Development and Production[00:05:45] Yeah. And one one point you had is kind of how do we bridge the gap between the development environments and the production environments?[00:05:51] Obviously you're still doing work in this space. What are some of the top of mind areas of focus for you in[00:05:57] this area? Yeah, I think it. Right now, people separate these environments because the production environment doesn't allow people to move at the rate that they need to for experimentation. A lot of the times as you're doing like deep learning, you wanna have GPUs and you don't wanna be like launching your job on a Kubernetes cluster and waiting for the results to come.[00:06:17] And so that's just the hardware side of things. And then there is the. Execution stack. Um, you wanna be able to query and create features real time as you're kind of training your model. But in production things are different because these features are kind of scheduled, maybe generated every week.[00:06:33] There's a little bit of lag. These assumptions are not accounted for. In development and training time. Mm-hmm. . So of course we're gonna see that gap. And then finally, like the top level, the interface level. People wanna experiment in notebooks, in environments that like allow them to visualize and inspect their state.[00:06:50] But production jobs don't typically run in notebooks. Yeah, yeah, yeah. I mean there, there are tools like paper mill and et cetera. But it's not the same, right? So when you just look at every single layer of the kind of data technical stack, there's a develop. Side of things and there's a production side of things and they're completely different.[00:07:07] It makes sense why. Way, but I think that's why you get a bunch of bugs that come when you put things in production.[00:07:14] I'm always interested in the elimination of those differences. Mm-hmm. And I don't know if it's realistic, but you know, what would it take for people to, to deploy straight to production and then iterate on production?[00:07:27] Because that's ultimately what you're[00:07:29] aim for. This is exactly what I'm thinking about right now in my PhD for kind of like my PhD. But you said it was database. I think databases is a very, very large field. , pretty much they do everything in databases . But the idea is like, how do we get like a unified development and production experience, Uhhuh, for people who are building these ML models, I think one of the hardest research challenges sits at that execution layer of kind of how do.[00:07:59] Make sure that people are incorporating the same assumptions at development time. Production time. So feature stores have kind of come up in the last, I don't know, couple of years, three years, but there's still that online offline separation. At training time, people assume that their features are generated like just completely, perfectly.[00:08:19] Like there's no lag, nothing is stale. Mm-hmm. , that's the case when trading time, but those assumptions aren't really baked. In production time. Right. Your features are generated, I don't know, like every week or some Every day. Every hour. That's one thing. How do, like, what does that execution model look like to bridge the two and still give developers the interactive latencies with features?[00:08:40] Preventing Data Leakage[00:08:40] Mm-hmm. . I think another thing also, I don't know if this is an interface problem, but how do we give developers the guardrails to not look at data that they're not supposed to? This is a really hard problem. For privacy or for training? Oh, no, just for like training. Yeah. Okay. also for privacy. Okay. But when it comes to developing ML models in production, like you can't see, you don't see future data.[00:09:06] Mm-hmm. . Yeah. You don't see your labels, but at development time it's really easy to. to leak. To leak and even like the seeming most seemingly like innocuous of ways, like I load my data from Snowflake and I run a query on it just to get a sense for, what are the columns in my data set? Mm-hmm. or like do a DF dot summary.[00:09:27] Mm-hmm. and I use that to create my features. Mm-hmm. and I run that query before I do train test. , there's leakage in that process. Right? And there's just at the fun, most fundamental level, like I think at some point at my previous company, I just on a whim looked through like everyone's code. I shouldn't have done that , but I found that like everyone's got some leakage assumptions somewhere.[00:09:49] Oh, mm-hmm. . And it's, it's not like people are bad developers, it's just that. When you have no guard the systems. Yeah, do that. Yeah, you do this. And of course like there's varying consequences that come from this. Like if I use my label as a feature, that's a terrible consequence. , if I just look at DF dot summary, that's bad.[00:10:09] I think there's like a bunch of like unanswered interesting research questions in kind of creating. Unified experience. I was[00:10:15] gonna say, are you about to ban exploratory data analysis ?[00:10:19] Definitely not. But how do we do PDA in like a safe , data safe way? Mm-hmm. , like no leakage whatsoever.[00:10:27] Right. I wanna ask a little small follow up about doing this at Berkeley.[00:10:31] Berkeley's Uniquely Research Lab Culture[00:10:31] Mm-hmm. , it seems that Berkeley does a lot of this stuff. For some reason there's some DNA in Berkeley that just, that just goes, hey, just always tackle this sort of hard data challenges. And Homestate Databricks came out of that. I hear that there's like some kind of system that every five years there's a new lab that comes up,[00:10:46] But what's going on[00:10:47] there? So I think last year, rise Lab which Ray and any scale came out of. Kind of forked into two labs. Yeah. Sky Lab, I have a water bottle from Sky Lab. Ooh. And Epic Lab, which my advisor is a co-PI for founding pi, I don't know what the term is. And Skylabs focus, I think their cider paper was a multi-cloud programming environment and Epic Lab is, Their focus is more like low-code, no-code, better data management tools for this like next generation of Interfa.[00:11:21] I don't even know. These are like all NSF gra uh, grants.[00:11:24] Yeah. And it's five years, so[00:11:26] it could, it could involve, yeah. Who knows what's gonna be, and it's like super vague. Yeah. So I think we're seeing like two different kinds of projects come out of this, like the sky projects of kind of how do I run my job on any cloud?[00:11:39] Whichever one is cheapest and has the most resources for me, my work is kind of more an epic lab, but thinking about these like interfaces, mm-hmm. , better execution models, how do we allow people to reason about the kind of systems they're building much more effectively. Yeah,[00:11:53] From Static Data to Dynamically Updated Data[00:11:53] yeah. How do you think about the impact of the academia mindset when then going into.[00:11:58] Industry, you know, I know one of the points in your papers was a lot of people in academia used with to static data sets. Mm-hmm. , like the data's not updating, the data's not changing. So they work a certain way and then they go to work and like they should think about bringing in dynamic data into Yeah.[00:12:15] Earlier in the, in the workflow, like, , how do you think we can get people to change that mindset? I think[00:12:21] actually people are beginning to change that mindset. We're seeing a lot of kind of dynamic data benchmarks or people looking into kind of streaming datasets, largely image based. Some of them are language based, but I do think it's somewhat changing, which is good.[00:12:35] But what I don't think is changing is the fact that model researchers and Devrel developers want. to create a model that learns the world. Mm-hmm. . And that model is now a static artifact. I don't think that's the way to go. I want people, at least in my research, the system I'm building, models are not a one time thing.[00:12:55] Models as views on Data[00:12:55] Models are views that are frequently recomputed over your data to use database speak, and I don't see people kind of adopting that mindset when it comes to. Kind of research or the data science techniques that people are learning in school. And it's not just like retrain G P T every single day or whatever, but it, it is like, how do I make sure that I don't know, my system is evolving over time.[00:13:19] Mm-hmm. that whatever predictions or re query results that are being generated are. Like that process is changing. Can you give[00:13:27] a, an overview of your research project? I know you mentioned a couple snippets here and there,[00:13:32] but that would be helpful. . I don't have a great pitch yet. I haven't submitted anything, still working on it, but the idea is like I want to create a system for people to develop their ML pipelines, and I want it to be like, Like unifying the development production experience.[00:13:50] And the key differences about this is one, you think of models as like data transformations that are recomputed regularly. So when you write your kind of train or fit functions, like the execution engine understands that this is a process that runs repeatedly. It monitors the data under the hood to refit the computation whenever it's detected.[00:14:12] That kind of like the data distributions have changed. So that way whenever you. Test your pipelines before you deploy them. Retraining is baked in, monitoring is baked in. You see that? And the gold star, the gold standard for me is the number that you get at development time. That should be the number that you get when you deploy[00:14:33] There shouldn't be this expected 10% drop. That's what I know I will have. Made something. But yeah, definitely working on that.[00:14:41] Yeah. Cool. So a year ago you tweeted a list of principles that you thought people should know and you split it very hopefully. I, I thought into beginner, intermediate, advanced, and sometimes the beginner is not so beginner, you know what I mean?[00:14:52] Yeah, definitely. .[00:14:53] The first one I write is like,[00:14:57] so we don't have to go through the whole thing. I, I do recommend people check it out, but also maybe you can pick your favorites and then maybe something you changed your mind.[00:15:03] Principle: Version Everything You Do[00:15:03] I think several of them actually are about versioning , which like maybe that bias the interview studying a little bit.[00:15:12] Yeah. But I, I really think version everything you do, because in experimentation time, because when you do an experiment, you need some version there because if you wanna pr like publish those. , you need something to go back to. And the number of people who like don't version things, it is just a lot. It's also a lot to expect for someone to commit their code every time they like.[00:15:33] Mm-hmm. train their model. But I think like having those practices is definitely worth it. When you say versioning,[00:15:39] you mean versioning code.[00:15:40] versioning code versioning data, like everything around a single like trial run.[00:15:45] So version code get fine. Mm-hmm. versioning data not[00:15:48] as settled. Yeah. I think that part, like you can start with something super hacky, which is every time you run your script, like just save a copy of your training set.[00:16:00] Well, most training sets are not that big. Yeah. Like at least when people are like developing on their computer, it. Whatever. It's not that big. Just save a copy somewhere. Put it ass three, like it's fine. It's worth it. Uhhuh, . I think there's also like tools like dvc like data versioning kind of tools. I think also like weights and biases and these experiment track like ML flow, the experiment tracking tools have these hooks to version your data for you.[00:16:23] I don't know how well they work these days, but . Yeah, just something around like versioning. I think I definitely agree with[00:16:30] Principle: Always validate your Data[00:16:30] I'm. Super, super big into data validation. People call it monitoring. I used to think it was like monitoring. I realize now like how little at my previous company, we just like validated the input data going into these pipelines and even talking to people in the interview study people are not doing.[00:16:48] Data validation, they see that their ML performance is dropping and they're like, I don't know why. What's going on ? And when you dig into it, it's a really fascinating, interesting, like a really interesting research problem. A lot of data validation techniques for machine learning result in too many false positive alerts.[00:17:04] And I have a paper got rejected and we're resubmitting on this. But yeah, like there, it's active research problem. How do you create meaningful alerts, especially when you have tons of features or you have large data sets, that's a really hard problem, but having some basic data validation check, like check that your data is complete.[00:17:23] Check that your schema matches up. Check that your most frequent, like your. Most frequently occurring value is the same. Your vocabulary isn't changing if it's a large language model. These are things that I definitely think I could have. I should have said that I did say data validation, but I didn't like, like spell it out.[00:17:39] Have you, have you looked into any of the current data observability platforms like Montecarlo or Big I I think you, I think you have some experience with that as[00:17:47] well. Yeah. I looked at a Monte car. Couple of years back, I haven't looked into big eye. I think that designing data validation for ML is a different problem because in the machine learning setting, you can allow, there's like a tolerance for how corrupted your data is and you can still get meaningful prediction.[00:18:05] Like that's the whole point of machine learning. Yeah, so like. A lot of the times, like by definition, your data observability platform is gonna give you false positives if you just care about the ML outputs. So the solution really, at least our paper, has this scheme where we learn from performance drops to kind of iterate on the precision of the data validation, but it's a hybrid of like very old databases techniques as well as kind of adapting it to the ML setting.[00:18:33] Heuristics for Model Architecture Selection[00:18:33] So you're an expert in the whole stack. I think I, I talk with a lot of founders, CTOs right now that are saying, how can I get more ML capabilities in, in my application? Especially when it comes to LLMs. Mm-hmm. , which are kind of the, the talk of the town. Yeah. How should people think about which models to use, especially when it comes to size and how much data they need to actually make them useful, for example, PT three is 175 billion parameters co-pilot use as a 12 billion model.[00:19:02] Yeah. So it's much smaller, but it's very good for what it does. Do you have any heuristics or mental models that you use when teams should think about what models to use and how big they need it to be?[00:19:12] Yeah I think that the. Precursor to this is the operational capabilities that these teams have. Do they have the capability to like literally host their own model, serve their own model, or would they rather use an api?[00:19:25] Mm-hmm. , a lot of teams like don't have the capability to maintain the actual model artifact. So even like the process of kind of. Fine tuning A G P T or distilling that, doing something like it's not feasible because they're not gonna have someone to maintain it over time. I see this with like some of the labs, like the people that we work with or like the low-code, no-code.[00:19:47] Or you have to have like really strong ML engineers right over time to like be able to have your own model. So that's one thing. The other thing is these G P T, these, these large language models, they're really good. , like giving you useful outputs. Mm-hmm. compared to like creating your own thing. Mm-hmm.[00:20:02] even if it's smaller, but you have to be okay with the latency. Mm-hmm. and the cost that comes out of it. In the interview study, we talk to people who are keeping their own, like in memory stores to like cash frequently. I, I don't know, like whatever it takes to like avoid calling the Uhhuh API multiple types, but people are creative.[00:20:22] People will do this. I don't think. That it's bad to rely on like a large language model or an api. I think it like in the long term, is honestly better for certain teams than trying to do their own thing on[00:20:36] house.[00:20:36] The LLMOps Stack[00:20:36] How's the L l M ops stack look like then? If people are consuming this APIs, like is there a lot of difference in under They manage the, the data, the.[00:20:46] Well,[00:20:46] I'll tell you the things that I've seen that are unified people need like a state management tool because the experience of working with a L L M provi, like A G P T is, mm-hmm. . I'm gonna try start out with these prompts and as I learn how to do this, I'm gonna iterate on these prompts. These prompts are gonna end up being this like dynamic.[00:21:07] Over time. And also they might be a function of like the most recent queries Tonight database or something. So the prompts are always changing. They need some way to manage that. Mm-hmm. , like I think that's a stateful experience and I don't see the like, like the open AI API or whatever, like really baking that assumption in into their model.[00:21:26] They do keep a history of your[00:21:27] prompts that help history. I'm not so sure. , a lot of times prompts are like, fetch the most recent similar data in my database, Uhhuh, , and then inject that into the pump prompt. Mm-hmm. . So I don't know how, Okay. Like you wanna somehow unify that and like make sure that's the same all the time.[00:21:44] You want prompt compiler. Yeah, . I think there's some startup probably doing that. That's definitely one thing. And then another thing that we found very interesting is that when people put these. LLMs in production, a lot of the bugs that they observe are corrected by a filter. Don't output something like this.[00:22:05] Yes. Or don't do this like, so there's, or please output G on, yeah. . So these pipelines end up becoming a hybrid of like the API uhhuh, they're. Service that like pings their database for the most recent things to put in their prompt. And then a bunch of filters, they add their own filters. So like what is the system that allows people to build, build such a pipeline, this like hybrid kind of filter and ML model and dynamic thing.[00:22:30] So, so I think like, The l l m stack, like is looking like the ML ops thing right in this way of like hacking together different solutions, managing state all across the pipeline monitoring, quick feedback loop.[00:22:44] Yeah. You had one, uh, just to close out the, the tweet thread thing as well, but this is all also relevant.[00:22:50] Shadow Models[00:22:50] You have an opinion about shadowing a less complicated model in production to fall back on. Yeah. Is that a good summary?[00:22:55] The shadowing thing only works in situations where you don. Need direct feedback from. The user because then you can like very reasonably serve it like Yeah, as as long, like you can benchmark that against the one that's currently in production, if that makes sense.[00:23:15] Right. Otherwise it's too path dependent or whatever to.[00:23:18] evaluate. Um, and a lot of services can benefit from shadowing. Like any, like I used to work a lot on predictive analytics, predictive maintenance, like stuff like that, that didn't have, um, immediate outputs. Mm-hmm. or like immediate human feedback. So that was great and okay, and a great way to like test the model.[00:23:36] Got it. But I think as. Increasingly trying to generate predictions that consumers immediately interact with. It might not be I, I'm sure there's an equivalent or a way to adapt it. Mm-hmm. AV testing, stage deployment, that's in the paper.[00:23:53] Keeping Up With Research[00:23:53] Especially with keeping up with all the new thing. That's one thing that I struggle with and I think preparing for this. I read a lot of your papers and I'm always like, how do you keep up with, with all of this stuff?[00:24:02] How should people do it? You know? Like, now, l l M is like the hot thing, right? There's like the, there's like the chinchilla study. There's like a lot of cool stuff coming out. Like what's. U O for like staying on top of this research, reading it. Yeah. How do you figure out which ones are worth reading?[00:24:16] Which ones are kind of like just skim through? I read all of yours really firmly. , but I mean other ones that get skimmed through, how should people figure it out?[00:24:24] Yeah, so I think. I'm not the best person to ask for this because I am in a university and every week get to go to amazing talks. Mm-hmm. and like engage with the author by the authors.[00:24:35] Yeah. Right. Yeah. Yeah. So it's like, I don't know, I feel like all the opportunities are in my lap and still I'm struggling to keep up, if that makes sense. Mm-hmm. . I used to keep like running like a bookmark list of papers or things that I want to read. But I think every new researcher does that and they realize it's not you worth their time.[00:24:52] Right? Like they will eventually get to reading the paper if it's absolutely critical. No, it's, it's true, it's true. So like we've, I've adopted this mindset and like somehow, like I do end up reading things and the things that I miss, like I don't have the fo. Around. So I highly encourage people to take that mentality.[00:25:10] I also, I think this is like my personal taste, but I love looking into the GitHub repos that people are actually using, and that usually gives me a sense for like, what are the actual problems that people have? I find that people on Twitter, like sometimes myself included, will say things, but you, it's not how big of a problem is it?[00:25:29] Mm-hmm. , it's not. Yeah, like , I find that like just looking at the repos, looking at the issues, looking at how it's evolved over time, that really, really helps. So you're,[00:25:40] to be specific, you're not talking about paper repos?[00:25:43] No, no, no, no. I'm talking about tools, but tools also come with papers a lot in, um, databases.[00:25:49] Yeah. Yeah. I think ML specifically, I think there's way too much ML research out there and yeah, like so many papers out there, archive is like, kind of flooded. Yeah.[00:26:00] It's like 16% of old papers produced.[00:26:02] It's, it's crazy. . I don't know if it's a good use of time to try to read all of them, to be completely honest.[00:26:10] Grounded Theory for Problem Discovery[00:26:10] You have a very ethnographic approach, like you do interviews and I, I assume like you just kinda observe and don't Yeah. Uh, prescribe anything. And then you look at those GitHub issues and you try to dig through from like production, like what is this orientation? Is there like a research methodology that you're super influenced by that guides you like this?[00:26:28] I wish that I had. Like awareness and language to be able to talk about this. Uhhuh, , . I[00:26:37] don't know. I, I think it's, I think it's a bit different than others who just have a technology they wanna play with and then they, they just ignore, like they don't do as much, uh, like people research[00:26:47] as[00:26:47] you do. So the HCI I researchers like, Have done this forever and ever and ever.[00:26:53] Yeah. But grounded theory is a very common methodology when it comes to trying to understand more about a topic. Yeah. Which is you go in, you observe a little bit, and then you update your assumptions and you keep doing this process until you have stopped updating your assumptions. . And I really like that approach when it comes to.[00:27:13] Just kind of understanding the state of the world when it comes to like a cer, like LLMs or whatever, until I feel like, like there was like a point in time for like lops on like tabular data prior to these large language models. I feel like I, I'd gotten the space and like now that these like large language models have come out and people are really trying to use them.[00:27:35] They're tabular kind of predictions that they used to in the past. Like they're incorporating language data, they're incorporating stuff like customer feedback from the users or whatever it is to make better predictions. I feel like that's totally changing the game now, and I'm still like, Why, why is this the case?[00:27:52] Was were the models not good enough? Do people feel like they're behind? Mm-hmm. ? I don't know. I try to talk to people and like, yeah, I have no answers.[00:27:59] Google Brain vs Academia[00:27:59] So[00:27:59] how does the industry buzz and focus influence what stuff the research teams work on? Obviously arch language models, everybody wants to build on them.[00:28:08] When you're looking at, you know, other peers in the, in the PhD space, are they saying, oh, I'm gonna move my research towards this area? Or are they just kind of focused on the idea of the[00:28:18] first. . This is a good question. I think that we're at an interesting time where the kind of research a PhD student in an academic institution at CS can do is very different from the research that a large company, because there aren't like, There just aren't the resources.[00:28:39] Mm-hmm. that large companies compute resources. There isn't the data. And so now PhD students I think are like, if they want to do something better than industry could do it, like there's like a different class of problems that we have to work on because we'll never be able to compete. So I think that's, yeah, I think that's really hard.[00:28:56] I think a lot of PhD students, like myself included, are trying to figure out like, what is it that we can do? Like we see the, the state of the field progressing and we see. , why are we here? If we wanna train language model, I don't, but if somebody wants to train language models, they should not be at uc.[00:29:11] Berkeley, , they shouldn't .[00:29:15] I think it's, there's a sort of big, gets bigger mentality when it comes to training because obviously the big companies have all the data, all the money. But I was kind of inspired by Luther ai. Mm-hmm. , um, which like basically did independent reproductions Yeah. Of G P T three.[00:29:30] Don't you think like that is a proof of, of existence that it is possible to do independently?[00:29:34] Totally. I think that kind of reproducing research is interesting because it doesn't lead to a paper. Like PhD students are still like, you can only graduate when you have papers. Yeah. So to have a whole lab set.[00:29:46] I think Stanford is interesting cuz they did do this like reproducing some of the language models. I think it should be a write[00:29:50] a passage for like every year, year one PhD. You[00:29:53] must reproduce everything. I won't say that no one's done it, but I do understand that there's an incentive to do new work because that's what will give you the paper.[00:30:00] Yeah. So will you put 20 of your students to. I feel like only a Stanford or somebody who like really has a plan to make that like a five plus year. Mm-hmm. research agenda. And that's just the first step sort of thing. Like, I can't imagine every PhD student wants to do that. Well, I'm just[00:30:17] saying, I, I, I feel like that there will be clouds, uh, the, the, you know, the big three clouds.[00:30:21] Mm-hmm. Probably the Microsoft will give you credits to do whatever you want. And then it's on you to sort of collect the data but like there of existence that it is possible to[00:30:30] It's definitely possible. Yeah. I think it's significantly harder. Like collecting the data is kind of hard. Like just like because you have the cloud credits doesn't mean like you have a cluster that has SREs backing it.[00:30:42] Mm-hmm. who helped you run your experiments. Right, right. Like if you are at Google Rain. Yeah. I was there what, like five, six years ago. God, like I read an experiment and I didn. Problems. Like it was just there. Problems . It's not like I'm like running on a tiny slur cluster, like watching everything fail every five.[00:31:01] It's like, this is why I don't train models now, because I know that's not a good use of my time. Like I'll be in so many like SRE issues. Yeah. If I do it now, even if I have cloud credits. Right. So, Yeah, I think it's, it can feel disheartening. , your PhD student training models,[00:31:18] well, you're working on better paradigms for everyone else.[00:31:21] You know? That's[00:31:22] the goal. I don't know if that's like forced, because I'm in a PhD program, , like maybe if I were someone else, I'd be training models somewhere else. I don't know. Who knows? Yeah. Yeah.[00:31:30] You've read a whole post on this, right? Choosing between a PhD and going into. Obviously open ai. Mm-hmm. is kinda like the place where if you're a researcher you want to go go work in three models.[00:31:41] Advice for New Grads[00:31:41] Mm-hmm. , how should people think about it? What are like maybe areas of research that are underappreciated in industry that you're really excited about at a PhD level? Hmm.[00:31:52] I think I wrote that post for new grads. . So it might not be as applicable like as a new grad. Like every new grad is governed by, oh, not every, a good number of new grads are governed by, like, I wanna do work on something that's impactful and I want to become very known for this.[00:32:06] Mm-hmm. , like, that's like , like a lot of, but like they don't really, they're walking outta the world for the first time almost. So for that reason, I think that like it's worth working on problems. We'll like work on any data management research or platform in an industry that's like working on Providence or working on making it more efficient to train model or something like.[00:32:29] You know, that will get used in the future. Mm-hmm. . So it might be worth just going and working on that in terms of, I guess like going to work at a place like OpenAI or something. I do think that they're doing very interesting work. I think that it's like not a fad. These models are really interesting.[00:32:44] Mm-hmm. and like, they will only get more interesting if you throw more compute Right. And more data at them. So it, it seems like these industry companies. Doing something interesting. I don't know much more than that. .[00:32:59] Helping Minorities in CS[00:32:59] Cool. What are other groups, organizations, I know you, you're involved with, uh, you were involved with She Plus Plus Helping with the great name.[00:33:07] Yeah, I just[00:33:08] got it.[00:33:10] when you say it[00:33:10] out loud, didn't name Start in 2012. Long time ago. Yeah.[00:33:15] What are some of the organizations you wanna highlight? Anything that that comes to?[00:33:20] Yeah. Well, I mean, shva Plus is great. They work on kind of getting more underrepresented minorities in like high school, interested, kind of encoding, like I remember like organizing this when I was in college, like for high schoolers, inviting them to Stanford and just showing them Silicon Valley.[00:33:38] Mm-hmm. and the number of students who went from like, I don't know what I wanna do to, like, I am going to major or minor in c. Almost all of them, I think. I think like people are just not aware of the opportunities in, like, I didn't really know what a programmer was like. I remember in Texas, , like in a small town, like it's, it's not like one of the students I've mentored, their dad was a vc, so they knew that VC is a career path.[00:34:04] Uhhuh, . And it's like, I didn't even know, like I see like, like stuff like this, right? It's like just raising your a. Yeah. Or just exposure. Mm-hmm. , like people who, kids who grow up in Silicon Valley, I think like they're just in a different world and they see different things than people who are outside of Silicon Valley.[00:34:20] So, yeah, I think Chiles West does a great job of like really trying to like, Expose people who would never have had that opportunity. I think there's like also a couple of interesting programs at Berkeley that I'm somewhat involved in. Mm-hmm. , there's dare, which is like mentoring underrepresented students, like giving research opportunities and whatnot to them and Cs.[00:34:41] That's very interesting. And I'm involved with like a summer program that's like an r u also for underrepresented minorities who are undergrads. , find that that's cool and fun. I don't know. There aren't that many women in databases. So compared to all the people out there. ? Yeah.[00:35:00] My wife, she graduated and applied physics.[00:35:02] Mm-hmm. . And she had a similar, similar feeling when she was in, in school.[00:35:06] Lightning Round[00:35:06] All right. Let's jump into the lining ground. So your favorite AI product.[00:35:12] I really like. Stable diffusion, like managed offerings or whatever. I use them now to generate all of my figures for any talks that I give. I think it's incredible.[00:35:25] I'm able to do this or all of my like pictures, not like graphs or whatever, .[00:35:31] It'd be great if they could do that. Really looking[00:35:34] forward to it. But I, I love, like, I'll put things like bridging the gap between development and production or whatever. I'll do like a bridge between a sandbox and a city. Like, and it'll make it, yeah.[00:35:46] like, I think that's super cool. Yeah. Like you can be a little, I, I enjoy making talks a lot more because of , these like dream studio, I, I don't even know what they're called, what organization they're behind. I think that is from Stability. Stability,[00:35:58] okay. Yeah. But then there's, there's like Lexi there. We interviewed one that's focused on products that's Flare ai, the beauty of stable diffusion being open sources.[00:36:07] Yeah. There's 10[00:36:07] of these. Totally, totally. I'll just use whichever ones. I have credits on .[00:36:13] A lot of people focus on, like have different focuses, like Sure. Mid Journey will have an art style as a focus. Mm-hmm. and then some people have people as the focus for scenes. I, I feel like just raw, stable diffusion two probably is the[00:36:24] best.[00:36:24] Yeah. Yeah. But I don't do, I don't have images of people in my slides . Yeah, yeah. Yeah. That'd be a little bit weird.[00:36:31] So a year from now, what do you think people will be most surprised by in ai? What's on the horizon and about to come, but people don't realize. .[00:36:39] I don't know if this will be, this is related to the AI part of things or like an AI advancement, but I consistently think people underestimate the data management challenges.[00:36:50] Ooh. In putting these things in production. Uhhuh, . And I think people get frustrated that they really try, they see these like amazing prototypes, but they cannot for the life of them, figure out how to leverage them in their organization. And I think. That frustration will be collectively felt by people as it's like it's happened in the past, not for LLMs, but for other machine learning models.[00:37:15] I think people will turn to whatever it, it's just gonna be really hard, but we're gonna feel that collective frustration like next year is what I think.[00:37:22] And we talked a little bit before the show about data management platforms. Yeah. Do you have a spec for what that[00:37:27] is? The broad definition is a system that handles kind of execution.[00:37:33] or orchestration of different like data transformations, data related transformation in your pipeline. It's super broad. So like feature stores, part of it, monitoring is part of it. Like things that are not like your post request to open AI's, p i, , .[00:37:51] What's one AI thing you would pay for if someone built.[00:37:54] So whenever I do like web development or front end projects or like build dashboards, like often I want to manage my styles in a nice way.[00:38:02] Like I wanna generate a color palette, uhhuh, and I wanna manage it, and I wanna inject it throughout the application. And I also wanna be able to change it over time. Yeah. I don't know how to do this. Well, ? Yeah, in like large or E even like, I don't know, just like not even that large of projects. Like recently I was building my own like Jupyter Notebook cuz you can do it now.[00:38:23] I'm super excited by this. I think web assembly is like really changed a lot of stuff. So I was like building my own Jupyter Notebook just for fun. And I used some website to generate a color palette that I liked and then I was like, how do I. Inject this style like consist because I was learning next for the first time.[00:38:39] Yeah. And I was using next ui. Yeah. And then I was like, okay, like I could just use css but then like, is that the way to do it for this? Like co-pilot's not gonna tell me how to do this. There's too many options. Yeah. So just like, let me like just read my code and read and give me a color palette and allow me to change it over time and have this I opera.[00:38:58] With different frameworks, I would pay like $5 a month for this.[00:39:01] Yeah, yeah, yeah. It's, it's a, you know, the classic approach to this is have a design system and then maintain it. Yeah. I'm not designing Exactly. Do this. Yeah, yeah, yeah, yeah. This is where sort of the front end world eats its own tail because there's like, 10 different options.[00:39:15] They're all awesome. Yeah, you would know . I'm like, I have to apologize on behalf of all those people. Cuz like I, I know like all the individual solutions individually, but I also don't know what to recommend to you .[00:39:28] So like that's therein lies is the thing, right? Like, ai, solve this for me please. ,[00:39:35] what's one thing you want everyone to take away about?[00:39:39] I think it's really exciting to me in a time like this where we're getting to see like major technological advances like in front of our eyes. Maybe the last time that we saw something of this scale was probably like, I don't know, like I was young, but still like Google and YouTube and those. It's like they came out and it was like, wow, like the internet is so cool , and I think we're getting to see something like that again.[00:40:05] Yeah. Yeah. I think that's just so exciting. To be a part of it somehow, and maybe I'm like surrounded by a bunch of like people who are like, oh, like it's just a fad or it's just a phase. But I don't think so. Mm-hmm. , I think I'm like fairly grounded. So yeah. That's the one takeaway I have. It's, it's not a fad.[00:40:24] My grandma asked me about chat, g p t, she doesn't know what a database is, but she knows about chat. G p t I think that's really crazy. , what does she, what does she use it for? No, she just like saw a video about it. Ah, yeah. On like Instagram or not, she's not like on like something YouTube. She watches YouTube.[00:40:41] She's sorry. She saw like a video on ChatGPT and she was like, what do you think? Is it a fad? And I was like, oh my god. , she like watched after me with this and I was like, do you wanna try it out? She was like, what ? Yeah,[00:40:55] she should.[00:40:55] Yeah, I did. I did. I don't know if she did. So yeah, I sent it to her though.[00:40:59] Well[00:40:59] thank you so much for your time, Sreya. Where should people find you online? Twitter.[00:41:04] Twitter, I mean, email me if you wanna directly contact me. I close my dms cuz I got too many, like being online, exposing yourself to strangers gives you a lot of dms. . Yeah. Yeah. But yeah, you can contact me via email.[00:41:17] I'll respond if I can. Yeah, if there's something I could actually be helpful with, so, oh,[00:41:22] awesome.[00:41:23] Thank you. Yeah, thanks for, thanks for. Get full access to Latent Space at www.latent.space/subscribe
41:4529/03/2023
Emergency Pod: ChatGPT's App Store Moment (w/ OpenAI's Logan Kilpatrick, LindyAI's Florent Crivello and Nader Dabit)
This blogpost has been updated since original release to add more links and references.The ChatGPT Plugins announcement today could be viewed as the launch of ChatGPT’s “App Store”, a moment as significant as when Apple opened its App Store for the iPhone in 2008 or when Facebook let developers loose on its Open Graph in 2010. With a dozen lines of simple JSON and a mostly-english prompt to help ChatGPT understand what the plugin does, developers will be able to add extensions to ChatGPT to get information and trigger actions in the real world. OpenAI itself launched with some killer first party plugins for: * Browsing the web, * writing AND executing Python code (in an effortlessly multimodal way), * retrieving embedded documents from external datastores,* as well as 11 launch partner plugins from Expedia to Milo to Zapier.My recap thread was well received:But the thing that broke my brain was that ChatGPT’s Python Interpreter plugin can run nontrivial code - users can upload video files and ask ChatGPT to edit it, meaning it now has gone beyond mere chat to offer a substantial compute platform with storage, memory and file upload/download. I immediately started my first AI Twitter Space to process this historical moment with Alessio and friends of the pod live. OpenAI’s Logan (see Episode 1 from *last month*…) suggested that you might be able to link ChatGPT up with Zapier triggers to do arbitrary tasks! and then Flo Crivello, who just launched his AI Assistant startup Lindy, joined us to discuss the builder perspective.Tune in on this EMERGENCY EPISODE of Latent Space to hear developers ask and debate all the issues spilling out from the ChatGPT Plugins launch - and let us know in the comments if you want more/have further questions!SPECIAL NOTE: I was caught up in the hype and was far more negative on Replit than I initially intended as I tried to figure out this new ChatGPT programming paradigm. I regret this. Replit is extremely innovative and well positioned to help you develop and host ChatGPT plugins, and of course Amjad is already on top of it:Mea culpa.Timestamps* [00:00:38] First Reactions to ChatGPT Plugins* [00:07:53] Q&A: Keeping up with AI* [00:10:39] Q&A: ChatGPT Intepreter changes Programming* [00:12:27] Q&A: ChatGPT for Education* [00:15:21] Q&A: GPT4 Sketch to Website Demo* [00:16:32] Q&A: AI Competition and Human Jobs* [00:18:44] ChatGPT Plugins as App Store* [00:34:40] Google vs ChatGPT* [00:36:04] Nader Dabit on Selling His GPT App* [00:43:16] Q&A: ChatGPT Waitlist and Voice* [00:45:26] LangChain with Human in the Loop* [00:46:58] Google vs Microsoft vs Apple* [00:51:43] ChatGPT Plugin Ideas* [00:53:49] Not an app store?* [00:55:24] LangChain and the Future of AI* [01:00:48] Q&A: ChatGPT Bots and Cronjobs* [01:04:43] Logan Joins Us!* [01:07:14] Q&A: Plugins Rollout* [01:08:26] Q&A: Plugins Discovery* [01:10:00] Q&A: OpenAI vs BingChat* [01:11:03] Q&A: App Store Monetization* [01:14:45] Q&A: ChatGPT Plugins API* [01:17:17] Q&A: Python Interpreter* [01:19:58] The History of App Stores and Marketplaces* [01:22:40] LindyAI's Flo Crivello Joins Us* [01:29:42] AI Safety* [01:31:07] Multimodal GPT4* [01:32:10] Designing AI-safe APIs* [01:34:39] Flo's Closing CommentsTranscript[00:00:00] Hello and welcome to the Latent Space Emergency episode. This is our first ever where chatty PT just dropped a plugin ecosystem today, or at least they demoed their plugins. It's still on the wait list, but it is the app store moment for ai. And we did an emergency two hour space with Logan from OpenAI and Flo Coveo from Lin AI and a bunch of our friends.[00:00:28] And if you ever wanted to listen to what it's like to hear developers process in real time when a new launch happens, this is it. Enjoy,[00:00:38] First Reactions to ChatGPT Plugins[00:00:38] I assume everyone has read the blog post. For me the, the big s**t was do you see Greg Brockman's tweet about FFMPEG? I did not. I should check it out. It is amazing. Okay, so. So ChatGPT can generate Python code. We knew this, this is not new, and they can now run the code that it generates.[00:00:58] This is not new. I mean this is like, this is good. It's not like surprising. It's, it's fine. It can run FFMPEG code. You can upload a file, ask it to edit the video file, and it can process the video file and then it can give you the link to download the video file. So it's a general purpose compute platform.[00:01:22] Wow. Did they show how to do this? Agents? I just, I just, I just pinned it. I just, it did I, did I turn into this space? I dunno how to use it. Yeah, it's, it's showing up there. Okay. It can run like is. Is, is, is my And by, by the way hi to people. I, I don't know how to run spaces. I, I not something I normally do.[00:01:42] But You wanna say something? Please request. But yeah, reactions have a look at this video because it run, it generates and runs video editing code. You can upload any arbitrary file. It seems to have good enough compute and memory and file storage. This is not chat anymore, man. I don't know what the hell this is.[00:02:01] What, what is this?[00:02:02] Well, progress has been all faster than I expected. . That's all I can, I, I, I don't know how to respond. . Yeah. It's pretty wild. I wonder, I wonder, I'm wondering how, how this will affect, like opening up the app store different from, let's say Apple App Store when it opened up. Because there are a lot of, of big companies just building stuff already and how like a small developer will be able to, to build something that's not already there.[00:02:31] I dunno. It will be interesting. So one thing that's really nice, have you seen the installation process for the plugins? It's right at the bottom of the blog post and you have to play the video to kind of see it, but literally anybody can write your own plugin. It's a small little json file. It's, it's literally like 10 lines of code.[00:02:49] It's 10 nights of, you described what your plugin does in English, you given an open API spec. That's it. That, that's, that's the plugin. It's amazing. You can distribute your plugin. This is, this is, this is easier than extensions manifest v3, which nobody knows how to use. This is English.[00:03:15] You write English . So, so, yeah. I mean I think, I think I think there'll be a lot of people trying to develop for this if they can get access, which you know, everybody's on a wait list. I, I've, I've signed up to 200 wait lists this week. . I wonder if, if it'll be different if you, if you sign up as a, as a developer or as the chat user.[00:03:35] Hopefully it doesn't matter, right? Use different emails and sign up to both. Let's, let's just see, in fact, use t to generate like, plausible sounding reasons for why you want to build whatever. Cause they don.[00:03:47] But yeah, I mean, how do you compete? I, I don't know, man. You know, it, it's really OpenAI is definitely a partnership strategy to do what they do here which means they're essentially picking favorites. So if you're a competitor of Expedia Kayak Open Table Wolf from Zapier, you're a s**t out of luck, kind of, you know?[00:04:06] Cause these are presumptive winners of their spaces. Right. And it'll happen in too many industries, probably. Right. I was thinking about maybe summarization or, or I don't know, YouTube video summarization, but there seems to be some application of that already on the examples that you shared. Yeah, yeah, yeah.[00:04:26] They have shared that, but I think there's always room to improve the experience. It's just, you know It's interesting which platform, like sort of platform strategy, right? Like if you write an OpenAI chat plugin, you instantly gain access to a hundred million users, right? All of them can instantly use your thing.[00:04:47] Whereas if you are a standalone app or company, good luck trying to able to use OpenAI through you. There's just no point. So you much rather just be on OpenAI platform and promote there. The the fortunate thing is they don't have some kind of like popularity ranking yet. Actually, someone should go open, someone should do register, like OpenAI plugins list.com or something where like everyone can like submit their own opening app plugins and like upload them, review them cuz this like, this is not a complete app store without reviews and a rating system and a reputation system and probably monetization opening app probably doesn't care about that.[00:05:26] But I mean, I can go start that right now. F**k. I can go start it right now.[00:05:34] Yeah, it'll, it'll take a while, right? Like this is the, like the basic version of the, of the app evolving. But this is a pretty basic version. Yeah. The basic version can browse the web, it can write, write an execute code. It can retrieve you know, we can retrieve data from documents, right? So all the documents search just died.[00:06:02] There's like five of these in Y Combinator right now. Oh.[00:06:08] Examples. Pretty crazy how, how they use the FFMPEG library or, I dunno if I'm saying that correctly, but right in there. You don't need to, to write code to,[00:06:27] it's crazy. Dunno. Yeah. Any reactions? Please, please, you know, open space. Anyone can request a speaker. Oh, Ash, come on in. Ash. I have to add you a speaker. Yeah, we're, we're just reacting here. I just, I, I needed a place to talk and I'm in Japan and I don't have anyone else to talk to, so I need, I, I I just want to share this moment.[00:06:46] I think it's a special moment in history. This is the biggest new app source since ever. Yeah. Hey, Shawn. I think plugin is already taken. . Oh man. Someone, someone bought it already. Yep. , of course. Right? Of course. , what are your reactions? What how are you feeling? What's what are you seeing out there?[00:07:07] Just crowdsource all the tweeting. Yeah, man, it's, it's been wild. I mean, I get out of there to like five minutes and then anything drops, you know, , I think productivity today will be like zero. If I, if I still, like, I quit my job you know, a few weeks ago but I would not be working today. There, there's no point.[00:07:26] There's nothing else. There's nothing else that's important, like, nothing's going on. Like this is the only story. Yep. . I wonder if you have any, any frameworks or anyone that's listening any frameworks on, on how you're handling all of this new, new stuff. Like every single day if something new comes up and, or you can like get the, the wait list invitations to, to use the new products.[00:07:52] Q&A: Keeping up with AI[00:07:52] Like, for example, today I just got the, the one from GIK cli and I was just playing around with that. And then suddenly I started to see all of the, these Twitter threads with announcements. It's getting crazy just to follow up with, with the stuff. And every day something new comes up and started. I was starting to feel a lot of formal, you know, like, h how do you keep up with all of these?[00:08:12] Or how do you focus? Does anyone have any, any good frameworks for that? Well, feel free to respond. Also, we, we have some more room if anyone wants to share your feelings. This is a, this is a safe space to share your feelings because. We all dunno how to react right now. I don't know. I just, I, I, I have a few notifications on for OpenAI employees and people that I do that I think do good recaps.[00:08:37] So in other words, find the people who are high signal and who do a lot of gathering of other people's stuff for, and then just subscribe to those people and trust that that is 90% of it and forget the 10%[00:08:57] Alright. And Sean probably, I have, I have another question. So I can't really figure out like what's left for us to do, you know, without AI tools. Like what, what is we learn next? You know, there's no learning some coding stuff, because you can only do that. You know, we can't do arts, we can't do poetry.[00:09:17] Farming[00:09:17] bakery, probably making things with your hands. Enjoying the sun.[00:09:23] Do you guys think this should be regulated? Like you don't go more than like the speed is going? I don't know. I dunno. There's, there's no point. Like if, like, if you regulate OpenAI, then someone else will come along. The secret is out now that you can't do this, and at most you'll slow things down by 10 years.[00:09:44] You called the secret. This is the end. . Yeah. Yeah. I, I don't know. Secret is out. China's trying to do it right, so I don't know if people have seen, but like China was, was fairly strict on crypto, which is probably good for them. And now they're, they're also trying to clamp down on AI stuff, which is funny because oa like they're, you know, the m i t of of China Ihu, I was actually doing like producing like really good bilingual models.[00:10:10] But yeah, they, they seem to be locking this down, so we'll see. We'll see. Right? Like you know, in, in, in sort of the, the free world there, there's open innovation that may be unsafe. OpenAI, try to be safe. You know, there, there's a big part of the blog post that was talk, talking about red team meeting and all that.[00:10:24] I'm sure every one of us skipped it. I skipped it. And then and then we just care about capabilities and now that, you know, every time people have their minds opened, like, I did not know Ron. EG in chat.[00:10:38] Q&A: ChatGPT Intepreter changes Programming[00:10:38] Now that I know my conception of what a REPL is, or literate programming or what a notebook is, is completely blown outta the water, right?[00:10:44] Like there's no like this, this is a new form factor for me. So not now that I know that I won't be innovating on that or trying to, to shape this into something that I can use because I want to use this, and this is, this is clearly better. Does, does this ha have to do with, with the, like AI as backend?[00:11:00] Yeah. Ideas that have been, yeah. You know, GP as backend. So, so apparently I had a few friends reach out to those guys and they're not doing that because it's not mature enough. Like it works for a simple demo. So, so for, for those who don't know ScaleAI did a hackathon I think two months ago just before I did mine.[00:11:18] And the winner on the hackathon was, was something called GPT is all you need for backend. And they actually what in register? DBC is backend.com. But as far as I can tell, they're not gonna start a company based on that because if you even push a little bit, it falls apart, right? So GPT3 wasn't good enough for that.[00:11:36] Maybe GPT4 is maybe GPT5, but then it'll still be super slow and super expensive. Like you don't want to run, you know, a large language model on every API request. So I don't know. I think it'll be good for scaffolding. I think it'll be good for re type use cases. Like, Hey, I need to edit this video on an ad hoc basis.[00:11:53] I don't, I don't want to learn FFMPEG. I don't need to now, because I can just talk to ChatGPT. That makes sense. But if you want a reliable, scalable backend you probably don't want to use it on a large language model, but that's okay because language model can probably help you write it rather than run it.[00:12:13] Hey, Lessio. Hey guys. Oh yeah. Hey guys. What's up? Hey, yeah, we're, we're just, there's no structure. Just drop your reactions. Let's go. Awesome. Awesome, awesome guys.[00:12:26] Q&A: ChatGPT for Education[00:12:26] What do you think what if Shawn, what do you think if you could use you know AI and the education field, like, you know, like personal attribution system for students?[00:12:35] What's the thought automation education or attribution edu edu education. Yeah. That is the holy grail. This is called the Blooms two Sigma problem. Like the, the, the, one of the big issues of education is we have to teach to the slowest person in the class. And, and, you know, I'm a beneficiary of, of a gifted education system where they take out you know, nominally high IQ people and put them in a separate class.[00:12:56] And, and yeah, we did, we did do better. What if we can personalize every student's experience there's, there's some educational theory. This is called Bloom's two Sigma problem. Where the results will be better. I think that we are closer, but like, I still hope that we're pretty far , which sounds like a negative, like why do I want to deny education to students?[00:13:18] Because if we are there, then we will have achieved theory of mind for ai. The AI has a very good model, is able to develop a representation of who you are, is able to develop theories that the test who you are in, in a short amount of time. And I, it's a very dangerous path to, to go down. So I want, I want us to go slowly rather than fast on, on the education front.[00:13:41] Does that make sense? Yeah, definitely. It makes a lot sense and yeah, definitely. I think personally the education for each student and making it turn the best way would be great. And what do you think how about like, first of all, I'm, I'm having very curious, curious question, you know, like we are having, this week was full of launches, so how you guys are keeping up with if we're not, this is, I created the space though cuz I cannot handle it.[00:14:05] Today, today was my breaking point. I was like I don't know what's happening anymore. Yeah, like every single day I'm just in constant anxiety that like everything I assumed about the world is gonna be thrown up. Like I don't know how to handle it. This is a therapy session, so feel free to express.[00:14:21] Definitely. It's, it's been a very overwhelming feeling for everyone of us like that. I think, you know, like past two weeks and like the industry was definitely a lot, lot of ones we are definitely open for, you know, to discuss more about it. Thanks a lot for this space. Sean. Yeah. Appreciate. Yeah. Va one more thing.[00:14:39] So I think that the most constrained version of education use cases is language teaching. So there are a few language teachers out there speak I think is one of them that is an OpenAI partner. And they're also part of the chat GPT plugin release. , but there are also other language tutor platforms.[00:14:57] You can certainly have your news. There was one that was released maybe like four or five months ago that you can try to see what the experience is like. And you can, you can tell when the teacher has no idea who you are and it breaks the illusion that you're speaking to another human. So I, I just, you can experience that today and, and decipher yourself if we're ready for that.[00:15:14] I hope that we're not ready and it seems like we're not ready. Yeah, definitely, definitely. Thanks a lot for sharing. And guys, what do you think?[00:15:19] Q&A: GPT4 Sketch to Website Demo[00:15:19] Like I, in the launch of four we have show that we could, you know, generate apps and web apps just from you know, like a single simple sketch, you know different tent.[00:15:30] Just start from sketch. So what do you think like how, how it would be impacting the industry? It's all because it's not just like that, that sketch was very, was a very shitty sketch. Right. It was just like drawn on a piece of paper. But if you combine that with the multimodal, like it was that they had another part of that demo where they had a screenshot of the discord the opening eye discord and you're mm-hmm.[00:15:57] and they put it in and it, it like read the entire screen to you and if you can read the entire screen, you can code the entire . Screen. So it's over like[00:16:12] It's definitely, I think interaction, interaction designers, you know, like people who like, think design function still have some time. Yeah. I, I just, I just, I just tried the same thing, you know on bar today and it was like much more better than GPT3 so definitely it's you know, things are really changing.[00:16:30] Q&A: AI Competition and Human Jobs[00:16:30] Great forward. I'm, I'm really worried what we wanna do, you know? Do you think the competition will like stable everything? Like what competition? Anthropic. Well, like Google, Google won't race, I don't think. Google Race, like Google the fight. The one that, the one that launched the W links list of blog posts.[00:16:50] That, that Google.[00:16:55] Well, no, not, not the list. Not the list. Competitions will come. . I have a question. I mean I mean my fear is many of the jobs that are going away, whether it is developer and designers, because I mean, I think GPT four is very capable. So how to deal with it. I mean, it's going to replace, I mean, many of the jobs, that's for sure.[00:17:16] Yeah. It's okay. We'll find new jobs or we'll, we'll not need jobs anymore. We should, we should also, Start universal basic income. That's, that, that is something I, I do believe, yeah, I think the, the main change is going from the web of like, syntax to like the web of Symantec. So if your job is valuable because, you know, a unique syntax or like, you know, how to transform things from like words to syntax, I think that will be a lot less useful going forward.[00:17:45] But the Symantec piece is still important. So a lot of product work, it's not just writing CSS and HTML and like the backend for it. It's a lot more than that. So I just thinking about how do you change your skills to do that. But yeah, even the sketch, you know, you gotta like, you gotta draw the sketch and to draw the sketch, you gotta know where the button should go.[00:18:06] You know, you have, you know, incorrect with it. Yeah. I'm just processing this as I, I just read the whole thing as well. And Yeah, I mean, it's been a wild wild couple of weeks and it's gotten me thinking that maybe all our role was over the past couple years was we were just middlemen to talk to computers, right?[00:18:27] So we're sitting in between, it's over man PMs or business folks or whoever wanna build a product. And then as a software developer, you're just a middle manish talking to the machine and it seems like. N LP is the way forward and, oh, yeah. Yeah. It's, it's been it's been, it's been a while.[00:18:42] ChatGPT Plugins as App Store[00:18:42] Couple of weeks. It's, I feel like we all just have to move either move upstream or, or find other jobs. You just gotta move upstream, either toward product directly. Cuz right now the plugin is yeah, is, is just you know, it's still a very sanitized UI that is controlled by OpenAI. But imagine them opening up the ui portion as well.[00:19:03] So you no longer need to have a siloed product that needs to integrate. ChatGPT instead you can bring your product directly into into ChatGPT, I don't think exactly. I think that would be probably the next next logical move after this, and I'm sure they're already thinking about that.[00:19:22] So that's a great, I don't know if this is, it's wild. What are you guys think? Yeah. Yeah. Like, so before you came up, right, I was, I was talking about this like ChatGPT has at least a hundred million users. Why would you bring people to your platform rather than write a plugin for ChatGPT and use their platform?[00:19:39] It's an open question now. Zapier just launched their integration. OpenAI and OpenAI just launched their integration of Zapier. Which one is gonna be more interesting? Probably OpenAI.[00:19:50] Totally a hundred percent . this is the app store of wow, our century of our decade. Like, I don't know, maybe century. I, I think the thing with ster though, if you think about it, like how many native apps do you download every week, every month versus like how many web things you use. So I think it's all about whether or not long-term opening eyes incentivize to keep broadening the things you can do within the plugin space.[00:20:17] And I think the lab, you know, as this technology gets more widespread, they're gonna have a lot more pressure from regulators, safety, blah, blah, blah. So I'm really curious to see you know, all, all the, all the government stuff that they'll, they'll have a congressional on this in six months and by then it will be completely irrelevant.[00:20:34] It's like that beside that time, they, they, they called it the GameStop guy after he made like 20 million on GameStop. And he just, you know, he was like, yeah, you know, followed the rules, made a bunch of money for those who don't know, unless you're our co-host. On the, we were supposed to drop an episode today, which I was supposed to work on, and then Chatty Phi dropped this thing, and now I, I can't think about anything else.[00:20:59] So this, this is my excuse for not, for for not working on the podcast today. . I know it's funny, we have like three, four recorded ones and spend last week, like GP four came out and we're like, okay, everybody's talking about this is irrelevant. What else? Anything else? Like, but I'm really excited about the, I, I feel like the first, the first use case for this, and I think he tweeted it about it too, is like, before if you had to do like data reformatting and stuff like that, it was really hard to do programmatically.[00:21:32] You know, like you didn't have an natural language interface and now you have it. And before if you had to integrate things together, like you could explain it very easily, but you couldn't like, put the APIs together and now they kind of remove all that part. So I'm excited to see what this looks like.[00:21:48] For commercial use cases, you know, you could see like, is there gonna be like a collaborative ChatGPT where like you're gonna have two, three people in the same conversation working on things. I think there's a lot of ui things that will improve. And so as we have lining from OpenAI for a second, almost pulled them up, but I'm sure you cannot talk about it[00:22:07] But yeah, it'll be interesting to see. Yes, sir. We're extremely excited. Extremely excited. I, I don't, if you, I don't know what else I'm, I'm like, so as far as I can tell there's the, there's hacker and Twitter. I haven't looked at Reddit yet, but I'm sure there's a bunch of reactions on Reddit.[00:22:23] I'm sure there's the OpenAI discord that we can also check out. I got locked out of the discord at some point, but yeah, anyone, anyone else like see news, demos, tweets the whole point of this is that it's live, so please feel free to share on comments or anything like that. But yeah. Yeah, the, the craziest thing I saw was the Mitchell from Hash.[00:22:44] We tweeted about Yes. How the integrations actually work and you just write a open APIs back and then just use natural language to describe what it's supposed to do. And then their model does everything. I wonder if they're using the off-the-shelf model or they have like a fine tune model to actually run integrations.[00:23:02] I wonder, I don't think they'll ever say it. Knowing them, probably they would just use the base one cuz they want, like, I think opening eyes kind of wants a God model, right? There's no point. It's not intellectually interesting to do small models, but like, like it's trivial. Yeah. Yeah. It's, this is a minor optimization problem as far as the, the long arc of history and the, the point is to build a gi safe agi and I, I do think this is kind of safe, right?[00:23:33] Like, . One of the criticisms that people were saying on hacks was that this is very closed. Like it's, it is an app store. At any point opening, I can randomly decide to close this like they did for Codex, and then they change their minds. Whereas if you use something like Alan Chain, it is more open and something that at the same time, like clearly this is a better integration path than long-chain.[00:23:56] Like, I much rather write this kind of plugin than a long-chain plugin. So they, they've managed to, I mean, they know how to ship man, like they're an AI research lab, but they also know how to ship product. Mm-hmm. . Yeah. I, I'm curious to see what the pricing models gonna look like. Also, I mean, if I'm writing the plugin, this is great because I don't even have to take care of the compute, you know, like, I just plug it in, then they actually run everything for me.[00:24:26] Yeah, but how, how it'll be monetized. I mean if the is giving their plugin know Expedia, I mean, people will not go to their website. Yeah. I don't, I mean, yeah. I have no idea that they, I don't think they said also don't super care . Yeah. It's because in the, in the app store, it's transaction driven.[00:24:46] But on Channel G, you're just paying a flat fee every month. So like, you can't really do revenue share on a flat fee. And I don't think that we use like, the Spotify model, but it's like a why not the amount of times? No, wait, wait, wait, wait, wait. Why not , you have Spotify. I just, Spotify model works. Cause swyx has power, right?[00:25:05] Opening has power. Same thing. They have all the audience. Yeah. But every, every every song is like the same value. Like if you listen to song actor to song y. , like, you're gonna make the same money. Like if I'm calling the API to, for like the meme generator or if I'm calling the API for the, you know, business summary thing, they're probably gonna cost the firm things, you know, so it's kind of hard to model up for OpenAI to say, Hey, okay, we're charging, we're going from 20 to 35 bucks a month.[00:25:36] But then like, how do you actually do royalties on a per model basis? Like how do people decide what royalties to negotiate? This probably needs to be a flat fee, but I dunno. Or put your credit card it OpenAI and then every time you wanna use a plugin, you pay for it separately. Uvp, usage based pricing all the way, and then you just get at the end of every month.[00:25:58] Exactly the, the only question mark is like, how much does OpenAI value the training they on and like how much they wanna subsidize the usage. Canada they have, they have promised to not use any of our usage data for training. So, oh, but the, I think like the plugins, it's a, it's a different thing.[00:26:16] It's like, like how you could, you could easily see how are like requests usually structure for like these things, you know, like, are people searching? So how are people searching for flights and stuff like that. I don't know. I haven't read the terms for like the actual plugin, you know, so. Well if anyone has please come up to speak cuz we're all processing this live.[00:26:37] This is the therapy session. Yeah, go ahead. One thing I see is basically you have to change the plugin I mean, to ask anything or even if you did browsing, right? I mean I see. I mean, they are becoming directly competitor to Microsoft also, I think, because now a user can actually just see, I mean, instead of being chat or Google, I mean they, they just.[00:27:04] Basically select the browsing plugin and basically get all the updated data. And other thing I see is basically you have to change the plugins. Like if you want to use the Expedia data, I don't know how it'll fit with the browsing plugin or you can select multiple plugins. But yeah, it is interesting.[00:27:23] I mean, if we get access, yeah, there is no actual browsing plugin. The browsing is a new model. So just like you can select GT three, GT 3 45, GT four, there's a new model now that says browsing alpha. So you, you can use CHATT in browsing mode and then you can use it in plugins mode, which which is a different model again.[00:27:45] So the, the plug browsing don't cross over.[00:27:51] Oh, that's interesting. And how do you see, I mean, in this whole descending, they are becoming competitive to Microsoft or how they're playing it out. I mean, Bing is just by the way, like, yeah, this, this killed the bing wait list. Cuz you don't need to wait for Bing. You can just use the browser mode open of Chatt.[00:28:11] How does it compete? It competes for sure. I don't think Microsoft cares. I don't think OpenAI cares. This is one of those things where like, you know, they are the two, two friends, you know, and they're clearly winning, so who cares? I don't like, I don't imagine it takes any of their mental bandwidth at all.[00:28:29] Yeah. The main thing is Google is Yeah, the main, like how is Google competing? Well let's see. Right. Bard is out there. I haven't got us yet, but could be interesting. Again, like it doesn't seem like they have the shipping capacity or velocity of Open I Microsoft and. That is probably going to bite them eventually because there's already been a big brain drain.[00:28:53] Something like four researchers, four, the top Google Brain researchers left Google Brain for OpenAI in January. And you know, those are the ones that I know about. And I, I imagine there's, there's quite a bit of brain, brain drain and firing going on at Google, so who knows.[00:29:08] All right, well, any other topics, concerns? Hyperventilation, if you just wanna scream I can turn down the volume and you can just, ah, for like five minutes. , that was literally, I was like, I, I need to like scream and just, ah, because what is going on?[00:29:29] I said that I'm filling out the form right now for the Oh, yeah. Okay. So wait list. So use use chat t to fill out that form. Right. And then, and then use a fake, use a different email and fill out the form a different way. This maximizes . I'm going to ask GT for what plugin do I want to build or, right, right.[00:29:51] Exactly. Yeah. Yeah. I, we can brainstorm. My plugins can live. Yeah. I think that will be a fun exercise. Like the, the main thing that breaks my brain is just this, this whole ability to run code, right? Like this is a new notebook, a new ripple. Mm-hmm. It, it looks like it has storage and it has memory.[00:30:08] Probably it has GPUs. That, I mean, can we run Lama inside GP?[00:30:19] I don't know if that's a, a model within a model. I think for me, most of the things come to like, you know, if I have my own personal assistant, what I want the assistant to do. I think like travel is like the first thing that comes to mind. Like, if I could use pt Yeah. Expedia, plug in with my calendar.[00:30:39] Yeah, yeah, yeah, yeah. But it needs to like know where I, where I'm supposed to be going to, you know, like if I just add a calendar that's like I'm going to, you know, room this week. Yeah. And then like can automatically both send my calendar and say, okay, these are like, or like the times that you like to travel, I know that you don't like ops and yada yada, yada.[00:31:00] That's one thing that I've always, we had this thesis at my peers firm about personalized consumer. There's so many website like, . I go to a lot of basketball games and every time I open Ticketmaster or whatever, it always shows me that she's a seat. And like, I'm not gonna see, that's not what I, that's not the tickets I wanna buy, you know?[00:31:18] But doesn't matter how many tickets I buy, never remembers that. So I think a way to say, to see, take all the information in and suggest, Hey, I saw that there's actually a price drop for the specific seats that you want, not for like any seats. You know, I think that would be a, a very good use case. So I've been a personal entertainment assistant for like, travel like going to shows, going to games.[00:31:41] That would be cool. That's what I'll submit on the wait list. Then we'll see if anybody cares. Right. Did you see get Lindy? Yeah. Yeah. At the, maybe you wanna recap, get Lindy for people. I'm gonna pin it up on the. . Yeah. So basically and this is like the kind of like a assistant lend the ai, right?[00:32:03] Yeah. Lend the ai it's on the board right now. Yeah. For those who can see it through the space. Yeah. Yeah. Actually at the AI Thinkers meet up the, the other day, you can basically like create all kind of like personal workflows and you, it kind of looks like integrations like zier, but it's actually just natural language.[00:32:24] So you can pop this thing up on your desktop and say, trying to hire 10 software engineers. So go on LinkedIn and plan 10 software engineers. The next step, draft a, an email that says, I'm the CEO of this company and I'm trying to hire for my team. If you wanna talk. Then the next step is like, send emails to all these people and it's gonna use people data labs or something else that they use on the backend to get the emails.[00:32:50] Then it actually sends the emails and. This is just gonna run in the background as if it was like you actually doing it. It's pretty neat that you don't have to write the actual integrations. Like it just uses natural language so you're not bound by what they build. Like theoretically anything you wanna integrate with, you can just explain to it how it works and it's gonna figure out how to do it.[00:33:12] So there's a wait list now. Flow didn't give us any papers just because we were at the meetup, so I'm also waiting to get access to it, but it looks really, really good. Yeah, so generative AI's top use case is generating wait lists, right? Like we we're, we are, so we have never had such an easy way to generate a lot of wait lists.[00:33:30] A lot of signup for witness. Oh my God. So much interest. So much product market fit. But also you know, one thing that you, you raising this point? I think, I think, I think by the way, I also pin this up. Mindy can support complex roles like no meetings on Fridays, all one-on-ones on Monday. , I like my meetings back to back within five minutes.[00:33:47] Five minutes in between. So it's just arbitrary rules that you could not program in a normal assistant type environment without a large language model. Which is kind of exactly what you want when you're booking your travel, right? Like, hey, I only like aisle seats unless it's it's a flight that is less than one hour that I don't care, right?[00:34:02] Mm-hmm. . So stuff like that I think is, is super interesting. And but also like not a common use case. Like how many times do you travel a year? Like, you know, five, right? Like more than that, but yes, I think for, yeah, a lot of times it's not a, it's not like a super widespread thing, especially if you don't do it or work.[00:34:21] If it's infrequent, you want high value and then if it's, if it's frequents, you can do low value, right? Like that, that's the sort of binary tradeoff, like the Uber is sort of frequent and low value. Airbnb is high value in frequent there's something of that nature. . So like, you want, you want sort of inspections of that sort.[00:34:37] Google vs ChatGPT[00:34:37] But the other thing that you brought to my attention was, and, and has room for Google to do something is do you notice that OpenAI plugins, none of them are Google because they're not friends. So Open BT will probably never have first party access to Google Calendar, probably never your Gmail and probably whatever, you know, Google copies, OpenAI again.[00:35:04] They will do, Hey, we have all your docs.[00:35:10] Yeah, I, I, I'm interested in that because I don't know if you remember, but like in the first iPhone, like YouTube came, like pre-installed on the homepage and then I forgot when, but one of the early ioss, they removed it. So now obviously Google's not a friend. Who's gonna be a friend in the future, who's not gonna be like, do we all have to hail our AI overlords?[00:35:33] Yeah. To get access to the, the only plugin system. Yeah. The only winners are brown CEOs. Think you're fine. Alright. But yeah, yeah. I just invited nada. C my old boss. Hi. You can't lurk. I, I want, I want to hear from you. And but, but also, you know, yeah, I, I think the Google point is actually novel.[00:35:50] I'll probably write something about that. Yeah. I mean, I'll have to write something about this today. So please feed me things to write.[00:36:01] Nader Dabit on Selling His GPT App[00:36:01] Oh, there we go. Hey, what's up man? What are you think. I know it's like, not entirely your space, but like you're, you're all about the future, right? I mean I did build and sell an AI company about a month ago, . I did the wait, what travel app was built on GP T three Tweeted about You sold it? Yeah.[00:36:21] It was getting like a hundred thousand visitors a day, like 60 to 80,000 unique a day. And then I, whoa. Yeah, I sold it like within about 24 hours. I tweeted out that it was for sale. I had like 30 or 40 people in my inbox. Whoa, whoa, whoa, whoa. Okay. I need, so like, but you're right. This isn't my, my man like domain of expertise.[00:36:41] It's fine. You make, you may just a thousand dollars on the side. It's, it's cool. Wait, wait. So I saw you tweet your original thing, which was, Hey you know, GP three can plan your travel. I don't know what happened since then. Can you, can you fill the rest of. Yeah. Yeah. So I mean I was basically, you know, I travel a lot for work.[00:36:55] I, I do travel like once a month and, you know, but I'm also very resource constrained on my time. So I usually like to spend like one day sightseeing. So what I typically do is I go a trip advisor and then I kind of like, you know, Google around and like look at all these things and it usually takes me about an hour to figure out like what I wanna do on my day or two off to go, like sighting.[00:37:14] And then I realized GPT3, you know, you can just literally ask and, and say, okay, within X number of. Like, I'm gonna be in this city, I want to have an iter itinerary. You know, you can give all these different parameters and it gives back a really good response. This was before GPT, even three and a half or four was out.[00:37:30] So I just built like a nice UI on top. Then, like I mapped over the results and, and was linking to, you know, the the Google searches for these different items and, and kind of made it into a nice user interface and, you know, just built it out and tweeted it out. And it, it just got a lot of traction and attention.[00:37:48] Like I said, I had around a hundred thousand visitors a day, like right off the bat, 60,000 uniques like per day. So it was getting a shitload of of traction and. I don't have a lot of free time to kind of like, maintain or build something like that out. So it was costing me money, but I wasn't monetizing it.[00:38:06] So the way that I was thinking to monetize it would be to use affiliate links and stuff like that. So I could either, you know, spend time figuring out a way to monetize it or just try to make, flip it and just make some money. So I decided to sell it and that was kind of it. I just sent a tweet out and kind of said, this is for sale, who wants it?[00:38:25] And I had I had so much inbound from that that I had to delete the tweet within about two hours cuz I was just unable to keep up with all the people that were coming in. And I filled it out a couple of offers and I, I found the person with the most money that could close within the shortest amount of time and just took it.[00:38:44] Well done. Well done. Nice. Awesome. I need a, I need a, I need an applause button right here. . Okay. So with that context your thoughts on today, what you seeing? There's Expedia there, but. Comment on travel or not travel, whatever you want. . Yeah, I'm still reading up on the, the chat plugins actually.[00:39:01] And I was hoping to kind of chime into this to learn a little more about how they work. I'm here on the the page. I've had API access from fairly early on. I signed up and I've been you using it a lot. I'm trying to find some different ways to integrate AI and machine learning into the blockchain space.[00:39:20] There's a lot of stuff around civil resistance that I think are gonna be, you know, pretty interesting use cases for us. It's obviously not like a, a a type of use case that is gonna be useful to, to the general public maybe, but yeah, I'm still, actually still trying to understand how these plugins work.[00:39:35] So what have you seen the developer documentation, which developer documentation at the bottom? Yes. That's where I'm, I'm check, I'm reading through as of now, I see the examples, which are pretty cool. Yeah. Yeah. So my, my quote the, the quote I put on Hacker News was, this is OpenAI leveraging chat, GPT to write OpenAI op open API to extend OpenAI chat.[00:39:58] GPT. I'm confused, but it sounds sick, but yeah, I mean, so open api, you know, not to be confused is OpenAI is randomly the perfect spec for OpenAI to navigate because it, you know, is somewhat plain English. And then you just supply a description for model. You described a off method. So they actually provided a link to a repo where you can see some examples.[00:40:20] The examples are not very, not very flesh out. But you can do, like, bear off, I assume you can do whatever, whatever kind of off you like then you just provide like logo url, legal info url. It's not, it's not, it's not that much. This is 10 times better than Chrome manifest.[00:40:37] Like manifest v3. Yeah, I mean, I'm reading through some of these examples and a lot of them are in Python. I wish they would've more JavaScript stuff, but I would say 10 times would be kind of an understatement if I'm understanding how some of this stuff is gonna work. English is all you need, man.[00:40:53] English is all you need.[00:40:57] Well, so, so, and then I think in buried in the video is sort of the Ethan experience, right? Which is where you specify. So if you're, if you're first party congrats, you know, you're, you're inside of the the chatt ui, but if you're third party, you can just host your Js o file anywhere. It's literally a JSON file on an API spec, right?[00:41:15] You hosted Jason file anywhere. And then you just like plug it into their their, their text field here and then they, they validate a little bit and it's installed. So there is a third party app store on day one. Yeah, that open table plugin example is pretty sick. Yeah. So like yeah, I I What would you want as a developer that's missing?[00:41:41] I think that we're like in the golden age of of being a developer and I don't know if it's gonna go downhill quickly or if it's gonna go like, get better quickly or this is like the, the end of all of it. like, is OpenAI just gonna be where like we do everything like nothing else is like gonna exist.[00:42:00] I think that Okay. You know what I, I know that's not the answer for sure. I'm just kind of joking, but I think it will, this is obviously shut down a lot of companies. This is the app store moment, right? For like, just like, I mean, you and I remember the iPhone app store moment. Some people dropped everything to write apps and they made it big and some, a lot of people did not.[00:42:20] But the people who were earlier rather than later probably benefited from understanding the platform. Like imagine, imagine you, like, you know, you, you are a big React native person for a long while. Like imagine if you had the chance to drop everything and be one of the first developers on a new app store.[00:42:35] Like that's pretty huge. Yeah, a hundred percent. But I'm wondering like the, the type of mode that you'll be able to build with some of this stuff, because it seems like that OpenAI AI will just continue adding more and more features directly into the platform. But I think like for very like, Proprietary type of stuff.[00:42:54] It might make more sense, but like if you, if you want to build like an app for the general public it just seems like they'll end up integrating something like directly within their platform for a lot of different ideas like, such as this travel app that I sold. I have a feeling like they'll have a way better version of that built directly into their platform sometime soon.[00:43:13] Q&A: ChatGPT Waitlist and Voice[00:43:13] Hey, hey guys. Can I ask just to get a quick update does anyone here have access to it yet? Like is it, is it open? Cause I signed up for the wait list, but I haven't seen anything yet. Yeah, no, it's just, it's just wait list where just like 90% of the stuff that people launch, you know, she has a few, she has a few videos and demos, but yeah, it's just a wait list.[00:43:31] Who knows? I mean, thanks. Opening OpenAI Pretty has been pretty good about getting people off wait list, right? Like a lot of people got off the GT four API wait list, like the day after they launched. Mm-hmm. . This one, I feel like they're quite fully baked, like it's. I wouldn't be surprised if they started dropping tomorrow.[00:43:50] So we'll see. But like you can start developing your, your third party plugins today, because there's examples. The docs are like two paragraphs, but that's all I need really . So, so I've been, I've been working and, and I've been following a lot of projects where people are, the one thing I don't see with ChatGPT is like, why are they have, we have Whisper, we have the APIs for ChatGPT.[00:44:13] It's like, why are we not at the point where we're talking to this thing and it's talking back to us? Like, I don't know how we haven't, nobody's wrapped their head around that yet, but it's like, it seems to me like, don't you wanna be like, Hey computer build me an app that does X and it says okay and builds it for you and talks back to you.[00:44:29] Like, I just, it's like, I don't know. That'll be the first probably plugin that I try to work on, but it's just driving me a little nuts. That's all interesting. I like the voice interfaces because sometimes it gets really long, like some of the prompts get really long. They're like, I don't wanna talk that long.[00:44:46] Yeah, yeah. Yeah. I was so, so I was doing, I was messing with the system prompt, basically get it to be like, Hey look, I'm gonna be talking to you. So keep it condensed. I think like the ideal interface would be like, for like, talking to, it would be like putting that at like the system level, but also, you know, being able to type as well as speak to it is just something that I'm, I'm trying to work on.[00:45:08] And I think with Plug, you know, if we could do that with plugins, I'd be huge. Cuz I know there's already a, like a Chrome extension that allows you to talk to it. Or, or I guess you could do it natively as well, but, you know, native stuff on like iPhone and Android is not too good.[00:45:24] LangChain with Human in the Loop[00:45:24] Hey, you, you mentioned that. Hi, by the way. You mentioned the hey way of, of talking to or having way the AI talking to you as a user. So just today there was a new release to of LangChain. I know it's kind of, not really the plugin, but this is the closest thing probably. And they edit a Ask Human tool.[00:45:46] So now the model can ask you a question if it's not sure. About something[00:45:55] to share. Share what? Go ahead. So, so the ask you if it's during its chain of thought, when it's not sure. To an example. Right, right. Oh, I would love that. Yeah. Probably not gonna do that. It's too confident. Yeah, I, I've seen a little bit about. LangChain, but I haven't used it yet. Has anyone here it?[00:46:15] Oh, it's all about it.[00:46:19] I did, I did. I built the LangChain on UI too. It's pretty nice. I mean, especially when it first came out, the, the trolling, it was like so rudimentary. But it's nice to be able to change things together. I think the agent part is pretty interesting. I haven't used it myself because I didn't need it.[00:46:34] But yeah, there's a, a very big community. See, see, light chain was very smart, right? Like they picked out the open source angle first, and then the others like dust or did the closed source angle. Now they have indirect competition with ChatGPT, but Langchain still has that. It's open source, extensible, like you own your agent.[00:46:55] Google vs Microsoft vs Apple[00:46:55] Them doing business deals with OpenAI in, in closed doors, right? Like, so pretty smart, like strategic position. All things considered.[00:47:05] It's a little, isn't it? It's like a little funny to me. That, you know, it's like goo because Google just came out with Bard, right. And I don't know if you guys have messed with Bard at all, but it's at least to me another wait list. Oh, okay. Yeah. I mean, to me it was a little underwhelming. I mean, I'm, I don't know if you've seen like the same, yeah, if you've seen like the screenshots going around, like it seems like, you know, someone tweeted it was like in, in guys in a boardroom or whoever's in a boardroom just being like, s**t.[00:47:30] Like, we need to you know, we lost our first mover advantage here. But it's just kind of funny to me that like, I guess now Microsoft's gonna have like an app store, right? Like just after everything, you know, Microsoft dominated in the nineties and stuff, and then it was Apple, apple, apple. But it's just kind of funny to me that it's gonna be, I guess Microsoft now, right?[00:47:49] Bard feels like Bing does to Google. Totally. Yeah. A hundred percent. I agree with you a hundred percent. All the turntables, right?[00:47:57] Yeah. So for, for those of you who might have missed the earlier discussion the one thing that OpenAI or Microsoft will not do is integrate with your Google calendar. So, the one saving grace that Google probably has it, it probably owns your workspace, right? Like most of us have Google accounts, Gmail accounts.[00:48:14] When we work, we log into Gmail and Google, again, use Google Docs spreadsheets. So if Bard is smart, they will take advantage of that. And then slowly watch as everyone moves to Microsoft Office.[00:48:31] I think Apple should do a partnership with the OpenAI and basically Microsoft. Cause Google has huge advantage of Android. So basically having OpenAI on the, I, I mean, it would I mean having the partnership with OpenAI would make, I mean, very useful on I devices if they, I mean, Siri is really bad and if they integrate with I, I mean they've win the world I think.[00:49:00] So it would be huge, beneficial to Apple and basically the Microsoft also if they integrate together because Microsoft doesn't have any of the devices and most people, I, most ordinary people use the devices iPhone or phone and . So it would be huge advantage. And for the 10, basically Apple I, I'm very curious to see what Apple ships next.[00:49:24] You know, everyone's shipping AI stuff and then Apple was like, Hey, look at our AR glasses. . Yeah, but I mean, ar ar with, with the, with the 3D models that are, that are coming out cuz isn't it mid journeys working on like a three, like their lab, I know is, is building a 3d generative model. And I think that sort of stuff with, with AR is very, oh, is that, is that public?[00:49:45] How did, how did you know that? I don't know if it's public. I, I saw a tweet about it I don't know, like a week ago. It is a semi, semi open secret in San Francisco, but I, I don't know if it's public. Yeah, I think I, I saw them, it was some context of they were talking about text to video and they were like, well we're, we're doing our like 3D modeling first.[00:50:02] So, I mean, my assumption is, and I, I don't work in the space yet, unless anyone's hiring please, I'm looking for work. But it seems to me like Apple. Seems to have their head on straight and like it might be that if they're gonna release these ar like mixed reality ar vr glasses, like, you know, the mo the thing that makes the most sense to me is like getting with generative AI graffiti modeling.[00:50:24] It's like, you know, it would be cool to go to like a coffee house or a bar. And then, you know, when you see like the graffiti in the bathroom when people write sometimes funny stuff, sometimes, like the worst stuff you've ever read in your life and you're like, what is going on when this person's going to the bathroom where they have this much hate?[00:50:38] But it's like, it would be cool to have a component of that, you know, like in the metaverse, so to speak, right? Like, so you put on your AR glasses and it's like, oh cool, I can see like a bulletin board here that exists in the fizzled. But it's also in the, you know, it's like augmented, right? That's just, to me it seems to be like the logical next step.[00:50:57] Interesting. Well, we'll, we'll see that when that happens. I recently got a Quest Pro quest to my, and yeah, my parents love it. And any tech, any type that my parents like, I think has a real crossover appeal. You know, the thing that you, your conversation had gimme an idea for winners of every app store in the early days, like Facebook has an app store, apple had an app store, you know, the winners of an app, store games like what we need Yep.[00:51:24] Is a multi-player. Like everyone logging into chat, BT and then playing a multiplayer game line. Mpc. MPCs are gonna text you on your. , that would be kind of cool.[00:51:40] ChatGPT Plugin Ideas[00:51:40] Actually. I was thinking, I don't, I don't know if it's gonna be game games at first though. Like, it seems like games always push the envelope with tech.[00:51:47] Well, it's like pornography and games, right? But like, I don't know, I was talking to like, you, you mentioned your parents and like you know, I was talking to my mom about this stuff and I was like, you know, I'm seeing stuff that are just demos of just like, Hey, take a picture of your fridge and it'll tell you like, here's what you can make.[00:52:01] Or you know, even like talking to it and just being like, Hey, here's what I ate today. You know, what's my, how many calories I ate today? Or, you know, what's my diet plan? Just things like that. And that's why I brought up the talking to it just with na using natural language and then having it, being able to talk back to you.[00:52:17] I'm surpri I'm like really surprised that they haven't implemented that yet. Cuz it seems to me like that's a use case that a lot of people would use it for, you know? Or if you could just like, you know, call it on a phone if you built like a Twilio back in, into it or something. Like I just don't, it, it boggles my mind why they haven't.[00:52:35] Put that feature in yet? . Yeah. Yeah. I really don't think it's gonna be too long before you're, you're sitting there at work and you get a text or call on your phone from an nbc, Hey, our village is burning down. You need to come over here and help . Do, do you guys think there's gonna be different silos?[00:52:55] Like you know, with Bard coming out and you know, people implementing GP T three and four now, I guess, into all their apps, but do you think they'll be like, chat GP p chat, GP, PT will have their store and then Google will have their store? Do you think it'll be like, there's gonna be a clear Victor here and then, you know, it'll be like, okay, Google's apps or, you know, Google Docs or whatever is like part of chat GP t's plugins, right.[00:53:20] Yeah, it is gonna be like crypto. Everybody's just gonna be fighting for the top. You're gonna have the couple of dominant people, but then you're gonna have all the, the small guys who go up and down and Yeah, I I, I feel like it's gonna be pretty similar to, to how crypto was. So we're gonna have some slur juices is what you're telling me.[00:53:41] Yeah, boy. Nice, nice. I dig it.[00:53:46] Not an app store?[00:53:46] So may maybe we aren't, tell me what you guys think about this, cuz maybe we aren't thinking about this right? Because maybe this is not an app store. Cuz typically in an app store you'll go ahead and choose which plugins you want installed, like on a phone or whatever have you.[00:54:02] But the path forward seems like all the plugins are like omnipresent. I, I don't know why Google isn't shitting their, shitting their pants right now. Cuz basically you check like openly I could just force all. The big companies to write plugins and then just be a single search box for everything. So imagine if you wanna like fly somewhere or you wanna book a hotel you, we have the Expedia and booking.com.[00:54:29] Both of those plugins summoned up and it shows you both the results. And then you can click through on whichever ones you want. And then, yeah, you charge 'em based on click throughs. Like I, I think like we're, maybe we're just getting tripped over by the fact that you have to choose a plugin right now and only interact with that single plugin.[00:54:49] But I think I think the smart move forward would probably be just to have all of them omnipresent and then have this like n l p higher layer up there to summon the right plugin when need be. What, what do you guys think about that? Yeah, so, so that's like the LangChain thing. That's what I haven't used LangChain yet, but it sounds like that's, from what I was reading with LangChain, it sounds like that's kind of is how I thought that worked.[00:55:12] But I don't know, can someone here like enlighten me? I, I don't know if it, how, how LangChain works.[00:55:21] LangChain and the Future of AI[00:55:21] Yeah. I don't know how LangChain works either, but I think it's gonna be a two-way street. Everybody's gonna be making plug-ins with chat GP p t and everybody's gonna be making chat GP plug-ins for other services as well. I think there's gonna be a whole bunch of people about to make a bunch of Jira plugins and stuff like that, so I think it's kind of gonna be a, a two-way street.[00:55:45] I dunno, is anyone else, like, this is super exciting to me. I haven't been this excited about like, the internet since like, probably like the, like the web 1.0 days. Like I, I, I hate, I'm so . Yeah. Like, I hate web two. Like, this is cool. I'm glad that like spaces exist, but I hate Web 2.0, like Web 3.0. I'm about, and like, I, I consider this part of Web 3.0.[00:56:04] But it's exciting, right? Like, this is cool. Like I, I'm really, you know, I'm stoked about, about the progress that's being, like, the joke is like, you know, every day in, in AI is like, it's like way longer, right? It's like we're telescoping very quickly. Yeah, I mean, one of the things, telescope and updating.[00:56:23] Yeah. You know, I, I would say I noticed towards, maybe like three years ago when I was working at aws, it just seemed like for, for about five or or so years, everything was very stagnant and there just wasn't a lot of exciting things that were happening. Everyone was like, if you remember, all the Devrel advocates were like all creating like tutorials around creating your own CMS and your blog, and you saw like that exact same tutorial given by like hundreds of people over the course of a few years because there just wasn't any cool s**t that was happening.[00:56:52] And then I think when crypto and, and blockchain stuff like that kind of caught my attention. Caught my attention, and I'm still excited by that, that stuff. And then this seems to be just almost like when, if you were like around when the iPhone was coming out and actually realized how important it was, I think everyone now is, is seeing this and they're all like realizing how important it is.[00:57:13] And it's cool to be like part of this moment as a software engineer. Yeah, I'm, yeah, go ahead. Oh, sorry. I was gonna say, like, I'm, I'm excited for you, I'm sure you guys saw the alpaca stuff, right? And I know that they're doing D D M C A stuff, but essentially someone's gonna train one of these models and it's gonna, you know, you're gonna be able to run this stuff offline.[00:57:35] And just like the way to, if, if you have access to like I forget which one of the EAC accelerate people was talking about it, but it was like wharf in the flask. It's like you've gotten the machine offline. So if you don't need internet access to access, like, the entirety of human knowledge, whatever's in the data set up until 2021 or whatever, and you don't need internet access, like that's gonna revolutionize everything.[00:57:57] Like, that's insane to think about[00:57:59] Yeah. Oh, well we won't speculating You can run in Inside Chat runs Python. Oh, really? Is that, is that happening? I mean, it has a file system and it has file storage and CPU at memory. Yeah.[00:58:20] is turtles all the way down. Turtles all the way down, man.[00:58:23] The, I, I think the plugin system, if people can get to run their own models like the LAMA ones and the same structure for plugins, you can see like going back to the Metaverse thing like a and snow crash where people built their own like demons. You know, it's like I got the demonn that like kicks people out of the club, the, the black sun.[00:58:43] But you can see in real life it's like I have a bunch of plugins that only I have, you know, and I use them to make myself more productive, use them to make myself, you know, look like I'm working when I'm not working and I'm like responding to my emails and stuff like that. But I think like, The OpenAI releasing this today makes it so much easier to start it because you don't have to worry about any of the infrastructure.[00:59:07] You just build the plugin and then they run everything and you get the best model possible. But I think none line, you know, I would love to walk around with my own, you know, raspberry pie or whatever of my wrist, kind of like I'm fall out and say, Hey, I wanna do this, I wanna do that. I don't know, I don't think we're that far away, so I'm excited to, to keep building.[00:59:28] Shoot, the, the technology exists where you could make that now, but it'd be a little awkward to have a raspberry pie on your wrist at the moment. . Well, well, well, that's kind of what I'm saying with the, with the al alpaca thing, right? It's like if you don't need internet access to, to use the model, I mean, we're, we're still pretty far off floor.[00:59:48] I don't know if Moore's Law even applies anymore. You know, we're not that far off from being able to run this stuff on, you know, consumer hardware that's cheap and that's gonna be huge for, for, you know, the majority of the world, right? Like, that's gonna be very big. Like e e even bigger than this. Like, it's great that we can do it with the internet, but as soon as we don't need the internet to access it, like it's, it's over, but we're back.[01:00:12] Whatever, whichever one you believe. It's just, this is crazy to think about that. Yeah, you could, you could if that happens, you can go and hook it up to a coding compiler and have it sped out human readable errors, but at that point it's probably just gonna be brighten on the cup for us anyways.[01:00:30] So we have a Hey guys. Hey. Hey, Alex. Go ahead. I, one more question, but yeah. Oh, go ahead. No, no, no. I have a right in question from someone who's trying to join but was unable to Stefani. , who I met, by the way, at the LangChain Hackathon, LangChain meetup in San Francisco. She has a lot of cool insights.[01:00:45] Q&A: ChatGPT Bots and Cronjobs[01:00:45] Follow it. Yeah, go ahead, Alex. I'll, I'll cue the question up. Oh yeah, for sure. Uh, One thing that really got my mind out this stuff and, you know, high vision mode is the fact that you can kind of externalize memories now. So the main use case I was thinking about is you could basically set up crime jobs, for lack of a better word.[01:01:04] So suppose you're, I don't know, building a trading bot, right? And you can say, Hey, Chad, GPT, look at the price of wheat every day at midnight. And you can just cue that up in the background and then have that send the response back to back to the LLM at a certain time. , and, you know, that's just like one use case.[01:01:21] But here comes like the play where like there's time sensitive things that break the one by one synchronous nature of ChatGPT and adds a little more, you can say from one level more humanness to it rather than like direct response and reply with latency. So there's that level, but also you can like schedule tasks and I think that's gonna be the killer plugin, whoever creates like the, the cal.com or the you know, theron integrations for just like, Hey, look at this point in time, and they give me the response.[01:01:48] I don't know if anybody's been thinking about that. Yeah, I, I was thinking about that a lot. Like how you said the expand, it's like an expandable, it's like a portable brain. Like, it's like, Hey, here's my secondary brain and it does, it's like my secretary, or it's like my assistant, right? Like somebody had a prompt where it was, you know, you're a form of, you know, one person's wisdom, one person's, you know, thinking about.[01:02:11] things x X way. Someone's thinking about it y way and like being able to have that just on demand with the like expandable component where you're able to basically Yeah. Delegate tasks to it and be like, Hey, you know keep what's, what's like the way to think about it? Like, not like a crime job, well, sort of like Aron job, but like like, you know, like news alerts, like Google news alerts, like things like that.[01:02:33] Just being able to be like, Hey, like keep, keep an eye on this thread for me while I do other things. And then if something comes up, you know, whether, you know, you just do some NLP or whatever, search for keywords you know, alert me or do whatever. And being able to do that without having to go through, you know, setting a reminder or doing all that painful, like, pain in the ass calendar stuff.[01:02:52] Cuz I think there's so many, there's so much software for that because people just hate doing it so much. Like, that's gonna be so big. Yeah, no, I was thinking that's probably a better way to put it, right? Like asynchronous alerts or I guess you could do timed alerts also. Because the one thing I was thinking about is the Instacart api, which is what they're demoing.[01:03:10] I don't know if anybody uses Instacart, but it's pretty slow on the lookups. So that's like, you know, that's a blocking process in the current integration of chat GPT. But if they could figure out a way to make it like asynchronous and then actually interact when it's done getting the, the fetch, and then you can do stuff in between that, that's gonna really change the interface.[01:03:27] And that's like, that's really the step closer to having like a real personal assistant in your pocket, man being able to just give chat d p t all of your Ps that you cook that week, and then just have it, order all the stuff from Instacart, from you. I can't wait for that man. Oh my God, that's great.[01:03:49] Oh, okay. Okay. You know, you can ship a boat Logan, like a cook, a cookbook with like actual recipe, but yeah. Yeah. Let's introduce Logan. So does this, like physical companies that integrate with software are gonna be coming like more of a moat as opposed to just software specific companies? Every software is a software company.[01:04:10] I know[01:04:13] Yeah. But if you're just a software company, OpenAI or, or, or some, one of these companies can just build that feature in now a lot easier than they could maybe in the past. Yes. For instance, you know I don't know, like we were talking about travel and, and stuff like that. But, but let's say you have a physical, you know, product that that, that maybe you can just separate yourself from other products by building, you know a better quality user experience.[01:04:40] Logan Joins Us![01:04:40] And so we got Logan here was our first podcast guest and the first Devrel person at OpenAI actually. . So Logan, welcome. Obviously, a lot of people here are excited to talk about this. One thing I noticed from the plugins is that a lot of them are more mundane things. You know, you got travel, you got grocery.[01:04:58] Can you tell us a bit more about how you picked those and like maybe give us a sneak peek of other use cases that you all are excited about? Yeah, I, I think first of all, I think going back to the conversation about the ability to like queue up tasks for you in the background, I'm, my understanding is that Zapier actually already does this by default.[01:05:20] And I'll, I'll go play around with it after this and see, but my, I, I think Zapier has the ability to schedule things and I think this is the part. Yeah, people are sleeping on this the most is that basically Zapier is already connected.[01:05:36] Zapier's already connected to 5,000 different plugins, and now you can just integrate directly with all of those through Zapier, which is incredible. So you don't even need to wait for like the plugin or whatever to come. Zapier will already do that for you. Which is, which is super cool. And it already has the ability, I'm 90% sure to like schedule certain actions to happen which is awesome.[01:05:57] So I, I think going back to the point of like how these folks were, were specifically chosen, I think the reality was when it was initially scoped out for doing this work, there was just, we needed people who were willing to sort of deal with the idea of of sort of, we were still building this entire platform and infrastructure from the ground up.[01:06:15] And I think those. Those folks who were featured today during the blog post, did a lot of work of iterating on these things with us as we figured out a lot of the challenges. So huge shout out to all those, the engineering teams of those companies for, for working with us so closely to make it happen.[01:06:32] I just gotta say too shameless, shameless plug here. It's my birthday today and this is a super cool birthday gift. So thanks for, for doing this and the blog post. It's really awesome. happy birthday. Yeah. Thank you. Thank you. I think we all just got a, a huge gift like look like. Yeah, Logan, you don't have to speak on opening as we have here.[01:06:50] Like, we're all just like, you know, large model and Enjoyers here. I think. And this is a, this is a app store moment for like all of us. Like it's, I I'm just processing this and, and just trying to. Do therapy in public[01:07:06] There's a lot of wait list fo here, so we're all excited. Oh, yes.[01:07:11] Q&A: Plugins Rollout[01:07:11] What do we have to do to get the wait list? Yes, . I, I think the reality is yeah, it, it's, they're rolling people out really slowly and I think the intent is part of this is to understand, and I think it was one of the big highlights of the blog post about what are the new sort of accesses for, for harm here.[01:07:30] And I think we know some of those things, but there's a lot of known unknowns, so it'll be intentionally small for the time being. But hopefully we'll, we'll expand that access in.[01:07:44] bottom line, get on, get on the wait list and, and keep your, keep your fingers crossed. Come up, like come up with a cool use case. I think there's something, there's part of the wait list is like submitting what you would be interested in working on and actually in, they actually will, we will actually read that to make sure that, you know, we're bringing people in who are gonna build cool things, not stuff that's uninteresting or potentially harmful.[01:08:06] Okay. Are you using Tri GB two to analyze the wait list? ? Yeah, that was my question. It'll probably be humans to analyze the wait list would be my guess, but maybe, maybe not. I'm not sure. Very old. What's the difference? Old, like, yeah, yeah, we have a question from write in who couldn't join for technical issues.[01:08:23] Q&A: Plugins Discovery[01:08:23] Stefania, who is a researcher at Microsoft right now. and her question is about search. How what is the future of search for plugins? How do we discover new plug-ins? Do we need a schema for plug-ins with complex queries or, or complex behaviors? And does it limit the context window as well?[01:08:41] Like, do we install like a hundred different plug-ins and like, does that, does that hurt help? I don't know. . Yeah, it does. So there's a limited, and I, it's all in the developer documentation right now if you wanna read through it. But there's a bunch of limits on like your open API spec and the descriptions you use.[01:08:58] But we actually take all that information. We take a sample request, we take a sample response, we take the description of it, and it's actually all inside of the, the context window to begin with. So it is limited right now. And I think that's where some of those larger models like GPT four with 32 K contacts in the future, when that's the available will be super helpful and you'll be able to bring a lot of plug-ins in.[01:09:20] But at the current moment, the more plug-ins you add, the less, the less context to you you actually have in the conversation. Yeah, yeah, that makes sense. Makes sense. I mean with like 50 pages worth of context, that that's a lot. And you know, I was very impressed at the latency as well that that at least the demo was able to pull off, which is awesome.[01:09:39] Yeah. Any, any other like, reactions, thoughts, questions to plugin? I have a couple new people joining. Hey ar Yeah, I had a couple of them. If I can chime in. First of all, just blown away. I mean, it's a fairly interesting approach to deal with, like live data with this data that you guys train on. Couple of quick questions for you.[01:09:57] Q&A: OpenAI vs BingChat[01:09:57] How do you see this? Maybe it's too early to ask, but how do you see this starting out to something like a Bing Chat? The, the reason why I ask this is, I mean currently Bing is more of the UI that you're dealing with and chat GP t's being launched on the side. But do you see it more being like a platform or do you see it more consumer facing?[01:10:20] I mean, I dunno if this question was to me or not. Yeah, you don't, you don't have to answer that. You know, obviously Logan cannot comment on Microsoft.[01:10:31] I do think though, that the, the interesting differentiator is that the, the work, and I think this was in their public blog post, is that a lot of the stuff that Bing is doing is optimized for search specifically. So it's, it's just a fundamentally different experience. I still think that like if you're, if you want like that search first experience, I think something like B makes a ton of sense.[01:10:54] Yeah, it's just, it, it feels like a different experience to me, so, thanks.[01:11:00] Q&A: App Store Monetization[01:11:00] So I think it's been mentioned a few times that this is like the new app store or ai. What, I guess I'd, I'd like to hear thoughts of other people as well, but like, what's the, so the app store is monetized, right? So that's a big incentive for people to put their apps on there.[01:11:14] So how does in, in this case, you put a manifest and it hits, hits the API for your app maybe. So what side of the monetization strategy here? I mean, this is not a question for OpenAI, it's just like a general sort of direction for things. Yeah. I don't know if they care. , this is like trivial to OpenAIr.[01:11:34] Yeah, we were talking, you're paying for the api, right? So you're you mean like on top of, of paying for API access, like you're using your credentials, you supply your credentials when you, when you sign up to plug in. Right. So I guess you do building off platform.[01:11:50] Yeah, I guess so. So not from an OpenAI point of view. So Open of course, makes money on wins anyway. What I mean is like for an app developer to go on there. So I guess you have an app outside of OpenAIr, which is useful. And this is kind of distribution for your app. Is that, is that kind of the, the sale for the app?[01:12:07] I mean, we're three hours into it, so it's hard to say , definitely. But I think that's, I'm just waiting for someone to write a mega threat on how to make money with the app store here. Seven ways. I'm sure. I'm sure there's gonna be people on YouTube making videos with themselves streaming, and that's how they all saying, I just figure figured how, how to make millions.[01:12:27] But yeah, one model we were talking about was maybe you can do kind like Spotify or like a, you have Achen GD subscription and then people each plug in gets royalty. Or a lot of things. So like Instacart, like the Chan GD thing is more like a UI alternative rather than like an app itself.[01:12:46] So it makes a lot of sense. Do I have things like that? But yeah, it would be. . Yeah, I guess what I mean I think Dylan or somebody else said earlier that this might not be the, the app store might be like something different. I think App Store is like the closest we, we have to think about. Like that's the closest analogy, but it might be just something completely new.[01:13:06] And that's very interesting. I think that's that's a pretty, pretty exciting place to be. Well, well, I don't know how much overlap with like the web three stuff, but it seems to me, I know there's like a couple projects out there that are, I think there's one called Bit Tenser, where it's like people are you know, basically selling their you know, their, their GPU usage, right?[01:13:24] Like, you know, there's tons of gamers out there that just have, their cards are just sitting idly by, and I don't know, it seems to me like a monetization model for OpenAIr might be to, you know, they own the model, right? So it's like, I don't know if they can like, lease out the model if you could like write a smart contract that like, uses their model somehow, or, I dunno, maybe plugins could be like written into a smart contract where it's like if you, if you're using this plugin, like, I don't know how that would work specifically, but thinking ahead, like, I don't know, do you think it's gonna just be centralized this, this whole time or like, surely there's gonna be a way for this to, to spread.[01:13:58] And you know, obviously like there's a. What's the, what's the word? It's, it's kind of like you're trying to hold all this water back with like this one stone, and it's like eventually it's gonna break. So like, there's gonna be some decentralization in this at some point. So I don't know if that makes sense.[01:14:12] I'm just trying to think about like, how, how there's a monetization you know, pathway for, for this. For, for the, for these plugins.[01:14:24] Yeah. We're not gonna get the answer today.[01:14:34] Let's, it's Farmville. We're gonna, we're Farmville on ChatGPT. Let's do it. Yeah.[01:14:42] Q&A: ChatGPT Plugins API[01:14:42] . Yeah. I was interested in like if there's already an API for this or if there's like an planned, so like when chat was just a weapon interface and then we got the API later, or is this like a web only?[01:15:02] There is a API available today, but you have to have access to actually create plugins. So you won't have the interface to install a plugin or do anything like that. You can basically build all the stuff on the backend right now if you want to, and then when you get access you'll be able to actually install the plugin through the ChatGPT UI test it out and all that stuff.[01:15:23] But as of the present moment, no one beyond a very small group of people are able to actually install those developer unverified plugins. Yeah, I was I don't know if if that's what you meant, but I was thinking about like, do we have a programmatic way of calling the ChatGPT API with these plug-ins enabled and get like adjacent response back opposed to like using the weapon interface with the plug-ins enabled?[01:15:47] Yeah, so that, that doesn't exist yet today either. I think it's, it's unclear when and if that will come, but it's definitely something that folks are, are thinking about. I think there's just a little bit more a bunch more security and other challenges like that when you give the plugin access through the api, but it's, it's definitely something the team has talked and thought about internally.[01:16:09] Alright. Thanks for your insight, Leo, follow up question. Did, did you have a specific use case in mind for that that specific need that that can help to motivate things sometimes? No, not right now. It's just a general question exploring. Yeah. Well, okay. You know, you can sort of hack it together with the stuff that Diane Gross was doing in the early days of chat.[01:16:27] Bt. But then also, like, I, I feel like we could make like a mock validator for plugins such that we are ready to go when it's live. I don't think it'll be too hard. Yeah. Any clones, 20 clones out there for like chat ui, so you can sort to kind of hack it in. Maybe it's like not, not the highest fidelity, but the, the schema is out there, so there's nothing really stopping us apart from, you know, waking up tomorrow and, and seeing that Chad opening, I have done it already.[01:16:54] So , I, I think the only, the only, you could definitely do some of that today. I think some part of the challenge will be that it's a different model that's powering some of these things, which isn't available. Yeah. Yeah. I think that would be, but I still think even with probably base Sahara and just injecting some of this in there you could probably get most of the way there.[01:17:14] Q&A: Python Interpreter[01:17:14] Yeah. By the way, that, that was a misconception that I had to correct a bit early on in the space before you came on. You dropped three models today. Like there was a browsing model and then there's a separate plugins model. And the plugins model doesn't talk to the browsing model. And then there's a, you know, there's.[01:17:28] Python running, which is still going my mind by the way. . Yeah. The Python running also goes back to the piece around, if you wanna basically have things like set things up to dispatch, you can essentially have it write the code and just like plug into any third party library and like set up crime jobs and all that stuff for you.[01:17:47] So going back to sort of having chat b t do your bidding, you could, you could do all that with the code interpreter, which is super cool. And I think Greg tweeted like 20 minutes ago or an hour ago, something like that. An example of it, yes. Like doing video compression and like editing and stuff like that, which was super cool.[01:18:05] That that is the one. Like are we gonna have that or is that Greg's special box? Like No, I think that he's just running straight up interpreter is my understanding. I don't think there's anything special going on there because like that is insane, that like you have storage, you have compute you are a compute platform now.[01:18:22] Like CHATT is not a chat app. It's crazy. Like this is what made me start this space because I was like, wait, like this is not chat. This is a new thing. I don't know what this is. So yeah, I have to drop, but this was, this was awesome. Thanks for hosting this, and thanks for, thanks for having me on again.[01:18:41] Appreciate you. Happy birthday, Dylan. Hopefully this was a, a worthwhile present. , it was great. Thanks for coming on. Yeah, yeah, yeah, yeah. All right. Bye, Logan. Okay. A couple more questions. If anyone has them. These things tend to drag on a little bit, so I always like to end on a well-defined note. Anyone else have reactions, questions, see anything out there that might be interesting?[01:19:01] I did see you know, the, the, the chat partners are starting to tweet out some stuff, so Ane Patel tweeted up about the Milo plugin that they developed with OpenAI, so we can see a little bit of that. Oh, particularly, I haven't particularly like dived in. . But yeah, you know, I, I'm collecting all, all sorts of information and, and reactions.[01:19:18] I'm gonna write out something today because I think this is one of the biggest days again, in tech since, I dunno, Tuesday since last week.[01:19:30] it's hard, but I mean, does anyone agree that things were like, really boring for a while? And this is like the first exciting thing that I've seen. The, the reacts people are still talking about use effects. Like, f**k that. Like ? Yes, exactly. Like we were stuck and reacting like CMS land for like 10 years, just.[01:19:52] Thank God. Thank God. Hey Peter. Hey. Thanks for having me on.[01:19:55] The History of App Stores and Marketplaces[01:19:55] I just wanted to say something real quick to the person that was asking earlier about monetization models and, and plug-ins and touch and I just, I thought one, one thing that occurred listening was that you know, a lot of these, I've done a lot of these plug-in marketplaces over my career and I think there's obviously an opportunity to like, offer different levels of validation and sort of test compatibility kit pass.[01:20:16] And you know, there's also an ongoing component of it cuz there's, you know, potentially data streaming through and, you know, You know, concerns around, you know, the quality of that data does it, you know, circumvent or inter interfere with OpenAI safety systems. So, you know, one obvious way that they could, you know, potentially monetize, you know, any marketplace really, you know, app store, whatever, JetBrains, you know intelligent idea marketplace, right?[01:20:38] Is to have that concept of different levels of validation and, and compliance, you know, to a certain specification. And, you know, you get a little logo or something like that and, you know, so anyway, just a quick thought as I was listening. Fascinating. And thanks for having me on. Hey Peter, since I want you, you to, since you had felt like you have a bunch of experience could you list like the, the, the marketplaces that you've been a part of?[01:20:59] And like, maybe like one thing they did well, one thing they recorded. Sure. I, I'd love to get a top down view. Sure. Yeah. I, I, I don't know that I've seen all of them, but I mean, you know, obviously I'm an iPhone and Android user, so I've, I've seen the marketplace like the rest of us. But JetBrains marketplace I think was particularly good.[01:21:13] Postman has a really good API marketplace rapid. I didn't know that. Rapid ap. Yeah. You know, I think, I think a lot of platform companies have gotten the message and, and they think about marketplaces, obviously the, the hyperscalers, right? You know, you've got the, you know, the, the cloud marketplaces from Amazon.[01:21:28] From Amazon and Google and, and Azure and such. But you know, it's some of the, sometimes it's these smaller ones that are also surprisingly good, like the intelligent idea, you know you know, you go to their website and it's like, you can buy an ad banner if you're in marketing, but, you know. Yeah. Anyway, so this concept of like validated plugins, right?[01:21:44] Especi. when there's this aspect of the data that's flowing through them I think presents an interesting opportunity not only for, for developers to, to make non-st plugins, pardon my frank for you know, for for OpenAI to, to, you know, say, Hey, we looked at this and not just with chat, GPT, no offense[01:22:01] you know, we, we we're giving it a th seal of approval. Right. You know, and that'll, that'll carry weight and carry meeting and people will pay for that is my guess. Yeah. Yeah, yeah. Yeah. Awesome. Well, if I think there's an appetite for like, understanding how to do well in the marketplace right now, if you write a post about that, I think you'll be very well received.[01:22:18] Sweet. Cool. I'll try to find you on Twitter. I, I just kind of dropped in. This was sort of an instinct and then I saw like, NARS here and all these other people here, so it was just kinda like, wow, this is awesome. I know, I know, I know, I know. Well, we're all just like reacting and we need a, we need a space to, to yell because this was huge.[01:22:34] So thanks Peter. No problem. And yeah, let's, let's connect offline.[01:22:37] LindyAI's Flo Crivello Joins Us[01:22:37] Flow is here. I'm trying to invite you, Flo. Because we were talking about Lindy earlier. We're talking about what this, what judge plugins means for Lindy. I don't think it'll, it will, I, I think actually like it will help highlight the differences.[01:22:49] But Oh, you're speaker. Okay. Congrats on your launch, by the way. Very, very, very well done. Thanks. Yeah. One hell of a day. . Hi everyone. Hell of a day. Did you know this was coming by the way? We didn't know it was coming today, but yes, we knew, we knew about this and we knew it would come in the, in the viewing of future.[01:23:05] Yeah. So I'll, I'll intro, I'll reintroduce cuz like the space is like, like four x since the time I talked about. But, you know, AI, virtual assistant is able to arbitrarily respond emails and step meetings and use natural language to do all of that. I think the, the user interface also was very, very well.[01:23:22] Which you know, I, I can't, I can't imagine how long you took to, to do that, but like that is the polish that you need for personal use stuff, right? Like it, this is the, this is the table six. Thank you. I'll, I'll pass your compliments to the designers who hate me now,[01:23:38] it did take a long time to reach this point. I mean, my take is that I think like the button is being passed from the folks like the, the, the, the lab coat researchers working on the models, they're passing the button over to like the, the product teams, basically. And I think we're gonna see a new wave of aed, not just about, Hey, we have a model that is X billion parameters, but we're gonna see a new wave of startups that own a business of building great products around these models.[01:24:07] And with a very simple interface, which is well, sorry, sorry. Yeah. Well, I'll tell you about plugins, but you're talking about over the foundation model APIs. . That's, that's correct. Yeah. Yeah. So I mean, are, are you worried about competition from like, you know, chatt, like, let's, let's talk, let's talk this out, right?[01:24:22] Like what do you see sort of the products gaps that, that PTs have versus whistle? Yeah. My understanding is that chat PT is really like chatt plugin by understanding, so up on the announcement, it's like, it's really more of like a developer product. So OpenAI is remaining true to the DNA of like, you know, we're building models and we're building stuff for, for developers to build product on.[01:24:42] So the impact on companies like Lin is that it's lowering the barrier to entry, which I think you're not targeting developers. Yeah, well, it's not just, it's like, it's become easier to buildy, like a whole lot of stuff that we've built, like over AI just released for free and we're like, well, f**k, like, I guess we build that.[01:25:01] So it's, it's lowering the barrier to entry, but you know, you, you're still left with your expertise. . Yeah, that's true. That's true. Yeah. And also also commenting before you came on that open, I probably will never have Google Calendar on their list of preferred, you know, plugins. They'll never have Gmail on.[01:25:20] And, and your, your integration is already super tight like this, this plugs in exactly to where, what people use today instead of having difficulty Microsoft and Google. Yeah. I wouldn't say never. I think the, but certainly their incentives are not secure aligned. And so I think there is going to be merit in being Switzerland here.[01:25:37] Right? It's like, look, our incentives are aligned with you as the user. Like we're not embed with, with Microsoft or Google or whatever. We're not protecting an existing ecosystem. We're just like, send AI assistant and we are gonna play as well as we can with all of your product. Yeah. Yeah. Does anyone have like I'll open up, you know, obviously we have the founder of Lindy here.[01:25:55] Like, does anyone have questions about Lindy? Did you see the launch? Did you have a follow up? Like this is a very nice place to. Ask it. Unless you wanna , you wanna start? I just wanna get, I'm gonna pay you just wanna get access. yesterday. It would be cool for you to maybe talk a little about how the integrations work.[01:26:16] And I know you're using natural language for it. I think like when tools like it, they think, oh, is my tool gonna be supportive? So yeah, maybe you wanna talk about it. Yeah, definitely. So the, and so I actually tweeted about that separately. Like, the way we build integration is we literally just give the documentation of the API to Lindy and then she out how to use the APIs on her own.[01:26:36] And so it's trivial for us to build a new integration. Like it actually takes 15 minutes to build a new integration. And so the answer to will my product, like, will my thing be supported will be yes. Like in 15 minutes. Like, it'll be like, Hey, you asked us do something. And literally it's like, we couldn't do it yesterday and today we can.[01:26:52] And it's gonna be as simple as that. So, yeah. . Yeah. I, I think to me the most interesting thing is that a lot of companies, I mean, even if you think about Airbyte and Fivetran, like when it comes to connectors, there was like the whole closed source versus open source. Like the open source usually at an advantage because the community can help you build more connectors.[01:27:12] But now using natural language, like the barrier is so much lower and just, it's just super exciting to, to, to use everything right away instead of waiting like four months because I'm the only person using that one tool. So excited to, to . Yeah, 100%. Well, even considering a world in which the user creates their own integration by themselves in like 10 minutes, it's like, hey, like give us, really the only thing we need is like, we need a, a documentation and then we need like an API token.[01:27:39] Like that's the only part that right now requires like an engineer's involvement. But you know, perhaps some power users would be fine generating some developer API token and building their own integration in like 10 minutes. I mean, I, the, the sort of app store model between Google and, and Apple and it's like the bar for quality that they held, you know what I mean?[01:27:57] That, that, I don't know. It's, it's, I don't know. It makes me think of that whole race again and it's like, do you lower the bar for quality and, and go the Android route or do you keep the qual, do you keep the bar high? And especially if, if there's, you know, issues with circumventing or interfering with safety systems and, and data quality and you know, things that are inappropriate, like, I dunno, I wonder, it makes me think, well the thing is that there is a ceiling to quality here when it comes to this integration.[01:28:23] Like, how good can you make a Gmail integration? There's like, there's like 20 endpoint or something, and then the question is like, can you call this endpoint and can you support their parameters? And it's not even the user who would actually like write the endpoint and the parameters. They would literally just like point us to the right API documentation.[01:28:39] Good point. . Yeah, I do think it's a little scary when I give my, you know, if I give like my, my Gmail integration and then you have Brad access, like actually just open source, the GBD four, like email drafter. And I didn't put any auto send or anything like that because I was so scared of it. But I wrote all the code, so it's I trust it.[01:29:02] But it'll be interesting to see how people are gonna trust these systems. Yeah. So we've built some, like hard guardrails in place where certain actions especially any endpoint that is a post endpoint we, we, we flag these actions as like, we call them like a right action. So it's like read action versus right actions.[01:29:18] And if it's a right action, we require user of information in a way that I mean this is like technical details, but like, it, it is physically impossible for the model to actually take a right action without user of inform. So the user, it asks for his information and like the user through the confirmation actually issues a token that is required for the model to be able to call that, that thing.[01:29:39] AI Safety[01:29:39] How worried about you about AI safety is, is this like coming from a place of UX or AI safety? , I'm, I'm super worried about very long term AI safety, right? Yeah. I am, I am, I am moderately worried about like medium term AI safety, like the whole like misinformation thing and like, yeah, like I'm sure there are ways in which Lindy may go wrong, but like, that's not the top of my concerns, and especially because I've built this kind of system.[01:30:04] Like I see the ways in which you can build guardrails and like, this is just like an engineering challenge. Like it's, it's very solvable now. The very long term AI safety thing, like Yeah, I mean there's like an existential race and this is, this is a whole different beast. Yeah. Part, part of me, like trying to do B2B stuff, you know, in the, in the face of AI safety issues, it feels like, you know, you're just kind of rearranging textures on the Titanic.[01:30:25] Or like, you, you know, you're the four piece string quartet playing music to entertain people while the strip is thinking like . Yeah, yeah. It is discouraging a little bit because you, you don't really have a take on the problem, do you? Right. You're like, all right, I guess this is coming. And I, like, I, I, I'm my head and I'm like, I don't really see what I can do about it.[01:30:46] Sam Altman seems to think he can turn it off. Like he has his blue bag, which presume presumably has the off the off button. That that's why he, that's why he always has it with him. Dunno. Yeah. I dunno. Yeah, yeah, yeah. So, . Yeah.[01:31:04] Multimodal GPT4[01:31:04] Well, can I get your reactions just generally on like potential of like maybe multimodal GT four, like just anything that your, your, you know, US builder are looking to really take advantage of as it, as it comes down the line?[01:31:14] Yeah, I think multimodality and you know, audio and, and image especially, I think is like the next big zero to one thing, but otherwise, I think like, just language gets so far, man. So I was just having this conversation. To me it's the same thing as like the cpu, right? Where it's like Fairchild Semiconductor and like Intel, like they gave us the CPU and I think again, like the lab coat researchers passed the button to the hackers and Z garage, like the Steve Jobs and, and, and Steve Snak who now owns the business of building the pc.[01:31:42] And so that doesn't mean that like innovation in the CPU is over, like the CPU still has like four decades of ahead of it. But yeah, like we've got the cpu and now I think that the product and engineering and hacker teams have to, to take it from there. I mean, Intel did pretty well. Totally. Yeah.[01:31:59] I'm not, I'm not saying like OpenAI is going anywhere, for sure. Yeah. Cool, cool, cool. Uh, Any other yeah, does anyone else have questions? No, I see you unmuted.[01:32:07] Designing AI-safe APIs[01:32:07] Yeah. Just upon the on the like safety, AI safety side, I mean, as much as I Sure. Hit the complexity of Im I mean like permissions in AWS and GCP and so on, the server purpose, and I think like maybe in this page, like if you can hit any endpoint on the internet like how do you control which endpoint?[01:32:24] Yeah. So maybe this is, this is like a connection for flow, like one new generation of Im, which is, you know, you have a proxy sitting in front of, in front of the internet and you're only allowed to see certain parts of the internet. You said you have like, you have like right access on the post request already, but yeah, maybe there's something around.[01:32:40] Yeah. So we're looking into this kind of catchall guardrails right now. The way our must, for example, the Gmail API is, so it actually writes code, but at no point does it use a library to make rest API and, and phone calls, right? Like it actually we give it a function that's like Gmail, send email with like primaries for like two and subject and bugging and all of that stuff, right?[01:33:01] And certain of these actions, again, require an authorization token that is specific for like that one action, and these authorization tokens actually expire. So yes, in theory the model could circumnavigate that by writing code to like call the, the API endpoints directly. We've not seen it do that yet.[01:33:17] And, and that's just not the way we train the model to behave. That's pretty response. That's like general platform question for maybe you in the future, maybe OpenAI. That if you hook it up to the, how do you prevent it from, I, I'm not saying that the AI will do something malicious, but like a developer who gets it to write some code and hidden endpoint that you didn't give it permission for.[01:33:40] So for example, you can, in Deno, I I love the permission system in Deno. You can give it access to your file system or the n or you know, like the internet, but like how do you specify only a part of the internet or only a part of a domain or so on?[01:33:56] Yeah, so open by the way, I, I, I'm a little bit bearish on the Deno permissioning because it's permissioning on the whole executable. And and that's, you know, it's basically you're going to try to relax it the moment you run into errors and people just kind of relax it all the way, you know, it's kind of.[01:34:12] True. Very. I was actually I, the way I got around it, I, I was starting a new a new process subprocess and only giving it access. Really? So instead of making Yeah, it was, it was really done. Really annoying. Well done. They should go get it only. Exactly. Yeah. It's kind of overselling the security if like everybody just runs like, you know, pseudo whatever the pseudo is in, in, in Deno.[01:34:34] But yeah. Okay. Cool. Any other reactions?[01:34:36] Flo's Closing Comments[01:34:36] Flo, before I'll give you the, the last word here, just reactions to Chatt, PT and open the eye shipping velocity in general. You're, you're always a good speaker, so leaving to you for soundbites. Soundbites. No, it's great. You know, I, I, I'm excited to see this kind of product, see the light, and I, I, I don't use them as like direct competitors just yet.[01:34:51] And even if they. Look, I think the market, this is going to be the model of our market, so I think it's gonna be, it's gonna be more than fine, but maybe room full. Mini here. Blue ocean. That's right. Time to build. Let's go. What do you think swyx? What do I think? I, I, I, I don't know what to think. That's, this is why I started this space because I saw that CHE BT can run f fm Peg, which means it is a compute platform, right?[01:35:16] Like it generates Python code, it runs the Python code. It can receive files, it can store files, it has memory and then it can let you download the files. Give it some GPUs, and you can run Lama inside of chat, gbc, for whatever reason you want. It is a new compute platform now, and I want to build for it, but I don't know what I, what I can.[01:35:38] Yeah, I, I agree. I think it's, it's, these large models are like the next operating system. I'm, I'm very convinced that that's the way people are gonna interact with the computers. Like, you're no longer gonna do work at your computer, you're gonna have a conversation with your computer and the computers gonna work for you.[01:35:55] Well, you're, you're certainly building the platform for that. So everyone go check out Lindy. I think this is a great conversation. I always want spaces to end on a high note. But thanks for joining in. I know it's like zero notice. I was just DMing you. But thanks for coming on, man. Yeah, thanks everyone.[01:36:09] Yeah, all. Go out there. Bye. Thanks. Get full access to Latent Space at www.latent.space/subscribe
01:36:1624/03/2023
From Astrophysics to AI: Building the future AI Data Stack — with Sarah Nagy of Seek.ai
If Text is the Universal Interface, then Text to SQL is perhaps the killer B2B business usecase for Generative AI. You may have seen incredible demos from Perplexity AI, OSS Insights, and CensusGPT where the barrier of learning SQL and schemas goes away and you can intuitively converse with your data in natural language.But in the multi-billion dollar data engineering industry, Seek.ai has emerged as the forerunner in building a conversational engine and knowledge base that truly democratizes data insights. We’re proud to present our first remote interview with Sarah Nagy to learn how AI can help you “seek what matters”!Timestamps* 00:00: Intro to Sarah* 03:40: Seek.ai origin* 05:45: Data driven vs Data backfit* 09:15: How Enterprises adopt AI* 12:55: Patents and IP Law* 14:05: The Semantic Layer* 16:35: Interfaces - Dashboards vs Chat?* 21:05: LLM performance and selection* 26:05: LLMOps and LangChain* 30:55: Lightning roundShow notes* Sarah Nagy Linkedin* Seek.ai* Sarah on the dbt podcastLightning Rounds* Favorite AI Product: Stable Diffusion* Favorite AI Community: Eleuther* One year prediction: Things will move fast!* Request for Startup: Scheduling/Emails (shoutout Ipso.ai from our hackathon!)* Takeaway: Automate everything! Get full access to Latent Space at www.latent.space/subscribe
37:3110/03/2023
97% Cheaper, Faster, Better, Correct AI — with Varun Mohan of Codeium
OpenAI just rollicked the AI world yet again yesterday — while releasing the long awaited ChatGPT API, they also priced it at $2 per million tokens generated, which is 90% cheaper than the text-davinci-003 pricing of the “GPT3.5” family. Their blogpost on how they did it is vague: Through a series of system-wide optimizations, we’ve achieved 90% cost reduction for ChatGPT since December; we’re now passing through those savings to API users.We were fortunate enough to record Episode 2 of our podcast with someone who routinely creates 90%+ improvements for their customers, and in fact have started productizing their own infra skills with Codeium, the rapidly growing free-forever Copilot alternative (see What Building “Copilot for X” Really Takes). Varun Mohan is CEO of Exafunction/Codeium, and he indulged us in diving deep into AI infrastructure, compute-optimal training vs inference tradeoffs, and why he loves suffering.Recorded in-person at the beautiful StudioPod studios in San Francisco.Full transcript is below the fold. Timestamps* 00:00: Intro to Varun and Exafunction* 03:06: GPU Efficiency, Model Flop Utilization, Dynamic Multiplexing* 05:30: Should companies own their ML infrastructure?* 07:00: The two kinds of LLM Applications* 08:30: Codeium* 14:50: “Our growth is 4-5% day over day”* 16:30: Latency, Quality, and Correctability* 20:30: Acceleration mode vs Exploration mode* 22:00: Copilot for X - Harvey AI’s deal with Allen & Overy* 25:00: Scaling Laws (Chinchilla)* 28:45: “The compute-optimal model might not be easy to serve”* 30:00: Smaller models* 32:30: Deepmind Retro can retrieve external infromation* 34:30: Implications for embedding databases* 37:10: LLMOps - Eval, Data Cleaning* 39:45: Testing/User feedback* 41:00: “Users Is All You Need”* 42:45: General Intelligence + Domain Specific Dataset* 43:15: The God Nvidia computer* 46:00: Lightning roundShow notes* Varun Mohan Linkedin* Exafunction* Blogpost: Are GPUs Worth it for ML* Codeium* Copilot statistics* Eleuther’s The Pile and The Stack* What Building “Copilot for X” Really Takes* Copilot for X* Harvey, Copilot for Law - deal with Allen & Overy* Scaling Laws* Training Compute-Optimal Large Language Models - arXiv (Chinchilla paper)* chinchilla's wild implications (LessWrong)* UL2 20B: An Open Source Unified Language Learner (20B)* Paper - Deepmind Retro* “Does it make your beer taste better”* HumanEval benchmark/dataset* Reverse Engineering Copilot internals* Quora Poe* Prasanna Sankar notes on FLOPs and Bandwidth* NVIDIA H100 specs - 3TB/s GPU memory, 900GB/s NVLink Interconnect* Optimizer state is 14x size of model - 175B params => 2.5TB to store state → needs at least 30 H100 machines with 80GB each* Connor Leahy on The Gradient PodcastLightning Rounds* Favorite AI Product: Midjourney* Favorite AI Community: Eleuther and GPT-J* One year prediction: Better models, more creative usecases* Request for Startup: Superathlete Fitness Assistant* Takeaway: Continue to tinker!Transcript[00:00:00] Alessio Fanelli: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my cohost, swyx, writer, editor of L Space Diaries.[00:00:20] swyx: Hey, and today we have Varun Mohan from Codeium / Exafunction on. I should introduce you a little bit because I like to get the LinkedIn background out of the way.[00:00:30] So you did CS at MIT and then you spent a few years at Nuro where you were ultimately tech lead manager for autonomy. And that's an interesting dive. Self-driving cars in AI and then you went straight into Exafunction with a few of your coworkers and that's where I met some of them and started knowing about Exafunction.[00:00:51] And then from out of nowhere you cloned GitHub Copilot. That's a lot of progress in a very short amount of time. So anyway, welcome .[00:00:59] Varun Mohan: That's high praise.[00:01:00] swyx: What's one thing about you that doesn't appear on LinkedIn that is a big part of what people should know?[00:01:05] Varun Mohan: I actually really like endurance sports actually.[00:01:09] Like I, I've done multiple triathlons. I've actually biked from San Francisco to LA. I like things that are like suffering. I like to suffer while I, while I do sports. Yeah.[00:01:19] swyx: Do you think a lot about like code and tech while you're doing those endurance sports or are you just,[00:01:24] Varun Mohan: your mind is just focused?[00:01:26] I think it's maybe a little bit of both. One of the nice things about, I guess, endurance athletics, It's one of the few things you can do where you're not thinking about, you can't really think about much beyond suffering. Like you're climbing up a hill on a bike and you see like, uh, you see how many more feet you need to climb, and at that point you're just struggling.[00:01:45] That's your only job. Mm-hmm. . Yeah. The only thing you can think of is, uh, pedaling one more pedal. So it's actually like a nice, a nice way to not think about work. Yeah,[00:01:53] Alessio Fanelli: yeah, yeah. Maybe for the audience, you wanna tell a bit about exa function, how that came to be and how coding came out[00:01:59] Varun Mohan: of that. So a little bit about exo function.[00:02:02] Before working at exa function, I worked at Neuro as Sean was just saying, and at neuro, I sort of managed large scale offline deep learning infrastructure. Realized that deep learning infrastructure is really hard to build and really hard to maintain for even the most sophisticated companies, and started exa function to basically solve that gap, to make it so that it was much easier for companies.[00:02:24] To serve deep learning workloads at scale. One of the key issues that we noticed is GPUs are extremely hard to manage fundamentally because they work differently than CPUs. And once a company has heterogeneous hardware requirements, it's hard to make sure that you get the most outta the hardware. It's hard to make sure you can get, get great GPU utilization and exa function was specifically built to make it so that you could get the most outta the hardware.[00:02:50] Make sure. Your GP was effectively virtualized and decoupled from your workload to make it so that you could be confident that you were running at whatever scale you wanted without burning the bank.[00:03:00] swyx: Yeah. You gave me this metric about inefficiency,[00:03:03] Varun Mohan: right? Oh, okay. Like flop efficiency. Yeah. Yeah. So basically, I think it comes down to, for most people, one of the things about CPUs that's really nice is with containers, right?[00:03:13] You can end up having a single. You can place many containers on them and all the containers will slowly start eating the compute. It's not really the same with GPUs. Like let's say you have a single. For the most part, only have one container using that gpu. And because of that, people heavily underestimate what a single container can sort of do.[00:03:33] And the GPU is left like heavily idle. And I guess the common term now with a lot of LM workloads is like the flop efficiency of these workloads. M F U, yeah. Yeah. Model flop utilization. The model flop utilization, which is basically like what fraction of the flops or compute on the hardware is actually getting used.[00:03:49] And sort of what we did at exa function. Not only make it so that the model was always running, we also built compiler technology to make it so that the model was also running more efficiently. And some of these things are with tricks like operator fusion, like basically you could imagine fusing two operations together such that the time it takes to compute.[00:04:07] the fused operation is lower than the time it takes for each individual operation. Oh my God. Yeah. .[00:04:13] Alessio Fanelli: Yeah. And you have this technique called dynamic multiplexing, which is basically, instead of having a one-to-one relationship, you have one GP for multiple clients. And I saw one of your customers, they went from three clients to just one single GPU and the cost by 97%.[00:04:29] What were some of those learning, seeing hardware usage and efficiencies and how that then played into what, what[00:04:34] Varun Mohan: you're building? Yeah, I think it basically showed that there was probably a gap with even very sophisticated teams. Making good use of the hardware is just not an easy problem. I think that was the main I, it's not that these teams were like not good at what they were doing, it's just that they were trying to solve a completely separate problem.[00:04:50] They had a model that was trained in-house and their goal was to just run it and it, that should be an easy. Easy thing to do, but surprisingly still, it's not that easy. And that problem compounds in complexity with the fact that there are more accelerators now in the cloud. There's like TPUs, inferential and there's a lot of decisions, uh, that users need to make even in terms of GPU types.[00:05:10] And I guess sort of what we had was we had internal expertise on what the right way to run the workload was, and we were basically able to build infrastructure and make it so that companies could do that without thinking. So most[00:05:21] Alessio Fanelli: teams. Under utilizing their hardware, how should they think about what to own?[00:05:26] You know, like should they own the appearance architecture? Like should they use Xlo to get it to production? How do you think[00:05:32] Varun Mohan: about it? So I think one thing that has proven to be true over the last year and a half is companies, for the most part, should not be trying to figure out what the optimal ML architecture is or training architecture is.[00:05:45] Especially with a lot of these large language models. We have generic models and transformer architecture that are solving a lot of distinct problems. I'll caveat that with most companies. Some of our customers, which are autonomous vehicle companies, have extremely strict requirements like they need to be able to run a model at very low latency, extremely high precision recall.[00:06:05] You know, GBT three is great, but the Precision Recall, you wouldn't trust someone's life with that, right? So because of that, they need to innovate new kinds of model architectures. For a vast majority of enterprises, they should probably be using something off the shelf, fine tuning Bert models. If it's vision, they should be fine tuning, resonant or using something like clip like the less work they can do, the better.[00:06:25] And I guess that was a key turning point for us, which is like we start to build more and more infrastructure for the architectures that. The most popular and the most popular architecture was the transformer architecture. We had a lot of L L M companies explicitly reach out to us and ask us, wow, our GT three bill is high.[00:06:44] Is there a way to serve G P T three or some open source model much more cheaply? And that's sort of what we viewed as why we were maybe prepared for when we internally needed to deploy transform models our.[00:06:58] Alessio Fanelli: And so the next step was, Hey, we have this amazing infrastructure. We can build kind of consumer facing products, so to speak, at with much better unit economics, much better performance.[00:07:08] And that's how code kind[00:07:10] Varun Mohan: of came to be. Yeah. I think maybe the, the play is not maybe for us to be just, we make a lot of consumer products. We want to make products with like clear ROI in the long term in the enterprise. Like we view code as maybe one of those things. Uh, and maybe we can, we can talk about code maybe after this.[00:07:27] We. Products like co-pilot as being extremely valuable and something that is generating a lot of value to professionals. We saw that there was a gap there where a lot of people probably weren't developing high intensive L L M applications because of cost, because of the inability to train models the way they want to.[00:07:44] And we thought we could do that with our own infrastructure really quickly.[00:07:48] swyx: I wanna highlight when you say high intensive, you mean basically generate models every key, uh, generate inferences on every keystroke? That's[00:07:55] Varun Mohan: right. Yeah. So I would say like, there's probably two kinds of L l M applications here.[00:07:59] There's an L L M application where, you know, it rips through a bunch of data and maybe you wait a couple minutes and then you see something, and then there's an application where the quality is not exactly what you want, but it's able to generate enough, sorry, low enough latency. It's still providing a ton of value.[00:08:16] And I will say there's like a gap there where the number of products that have hit that co-pilot spot is actually not that high. Mm. A lot of them are, are kind of like weight and, you know, just generate a lot of stuff and see what happens because one is clearly more compute intensive than the other Basically.[00:08:31] swyx: Well co uh, I don't know if we told the whole story yet, you were going to[00:08:35] Varun Mohan: dive into it. . Yeah, so I guess, I guess the story was I guess four or five months ago we sort of decided internally as a team we were like very early adopters of co-pilot. I'm not gonna sit here and say co-pilot, it's not a great tool.[00:08:45] We love co-pilot. It's like a fantastic tool. We all got on the beta. The moment it came out we're like a fairly small T, but we, like we all got in, we were showing each other completions. We end up writing like a lot of cuda and c plus plus inside the company. And I think there was probably a thought process within us that was like, Hey, the code we write is like very high aq.[00:09:04] You know? So like there's no way it can help. And one of the things in c plus plus that's like the most annoying is writing templates. Writing template programming is maybe one of those things. No one, maybe there's like some people in the C plus O standards community that can do it without looking at the, looking at anything online.[00:09:19] But we struggle. We struggle writing bariatric templates and COPA just like ripped through. Like we had a 500 line file and it was just like writing templates like, and we didn't really even test it while we were running it. We then just compiled it and it just, We're like, wow. Like this is actually something that's not just like it's completing four loops, it's completing code for us.[00:09:38] That is like hard in our brains to reach, but fundamentally and logically is not that complicated. The only reason why it's complicated is there's just a lot of rules, right. And from then we were just like, wow, this is, that was maybe the first l l m application for us internally, because we're not like marketers that would use, uh, Jasper, where we were like, wow, this is like extremely valuable.[00:09:58] This is not a toy anymore. So we wanted to take our technology to build maybe apps where these apps were not gonna be toys, right? They were not gonna be like a demo where you post it on Twitter and then you know there's hype and then maybe like a month later, no one's using.[00:10:11] swyx: There's a report this morning, um, from co-pilot where they, they were estimating the key tabs on amount of code generated by a co-pilot that is then left in code repos and checked in, and it's something like 60 to 70%[00:10:24] Varun Mohan: That's, that's nuts, but I totally believe it given, given the stats we have too. There's this flips in your head once you start using products like this, where in the beginning there's like, there's like skepticism, like how, how valuable can it be? And suddenly now like user behavior fundamentally changes so that now when I need to write a function, I'm like documenting my code more because I think it's prompting the model better, right?[00:10:43] So there's like this crazy thing where it's a self-fulfilling prophecy where when you get more value from it, more of your code is generated. From co-pilot[00:10:50] swyx: just to walk through the creation process, I actually assumed that you would have grabbed your data from the pile, which is the Luther ai, uh, open source, uh, code information.[00:11:00] But apparently you scraped your own[00:11:01] Varun Mohan: stuff. Yeah. We ended up basically using a lot of open, I guess, permissively licensed code, uh, in the public internet, mainly because I think also the pile is, is fairly a small subset. Uh, I think maybe after we started there was the, that was also came to be, but for us, we had a model for ourselves even before that, uh, was the point.[00:11:21] Ah, okay. So the timing was just a little bit off. Yeah, exactly. Exactly. But it's awesome work. It's, it seems like there's a good amount of work that's getting done Decentrally. Yeah. Which is a little bit surprising to me because I'm like more bullish on everyone needs to get together in a room and make stuff happen.[00:11:35] Like we're all in person in Mountain View. But yeah, no, it's pretty impressive. Yeah. Luther in general, like everything they've done, I'm pretty impressed with it. Yeah, and we're[00:11:42] swyx: gonna talk about that. Cause I, I didn't know you were that involved in the community[00:11:45] Varun Mohan: that early on I wasn't involved. It was more of like a, I was watching and maybe commenting from time to time.[00:11:50] So they're a very special community for sure. Yeah,[00:11:52] swyx: yeah, yeah. That's true. That's true. My impression is a bunch of you are geniuses. You sit down together in a room and you. , get all your data, you train your model, like everything's very smooth sailing. Um, what's wrong with that[00:12:02] Varun Mohan: image? Yeah, so probably a lot of it just in that a lot of our serving infrastructure was already in place, Uhhuh before then.[00:12:09] So like, hey, we were able to knock off one of these boxes that I think a lot of other people maybe struggle with. The open source serving offerings are just, I will say, not great in that. That they aren't customized to transformers and these kind of workloads where I have high latency and I wanna like batch requests, and I wanna batch requests while keeping latency low.[00:12:29] Mm-hmm. , right? One of the weird things about generation models is they're like auto regressive, at least for the time being. They're auto aggressive. So the latency for a generation is a function of the amount of tokens that you actually end up generating. Like that's like the math. And you could imagine while you're generating the tokens though, unless you batch a.[00:12:46] It's gonna end up being the case that you're not gonna get great flop utilization on the hardware. So there's like a bunch of trade offs here where if you end up using something completely off the shelf, like one of these serving thing, uh, serving frameworks, you're gonna end up leaving a lot of performance on the table.[00:13:00] But for us, we were already kind of prepared. To sort of do that because of our infrastructure that we had already built up. And probably the other thing to sort of note is early on we were able to leverage open source models, sort of bootstrap it internally within our company, but then to ship, we finally had some requirements like, Hey, we want this model to have fill in the middle capabilities and a bunch of other things.[00:13:20] And we were able to ship a model ourselves. So we were able to time it so that over the course of multiple months, different pieces were like working out properly for us. So it wasn't. . You know, we started out and we were just planning the launch materials. The moment we started there was like maybe some stuff that was already there, some stuff that we had already figured out how to train models at scale internally.[00:13:38] So we were able to just leverage that muscle very quickly. I think the one[00:13:41] swyx: thing that you had figured out from the beginning was that it was gonna be free forever. Yeah. Yeah, co-pilot costs $10[00:13:47] Varun Mohan: a month. Co-pilot costs $10 a month. I would argue significantly more value than $10 a month. The important thing for us though, was we are gonna continue to build more great products on top of code completion.[00:13:58] We think code completion is maybe day one of what the future looks like. And for that, clearly we can't be a product that's like we're $10 a month and we're adding more products. We want a user base that loves using us. And we'll continue to stay with us as we continue to layer on more products. And I'm sure we're gonna get more users from the other products that we have, but we needed some sort of a differentiator.[00:14:17] And along the way we realized, hey, we're pretty efficient at running these workloads. We could probably do this. Oh, so it wasn't,[00:14:23] swyx: it was a plan to be free from the start. You just[00:14:25] Varun Mohan: realized we, yeah. We realized we could probably, if we cut and optimized heavily, we could probably do this properly. Part of the reasoning here was we were confident we could probably build a pro tier and go to the enter.[00:14:35] But for now, originally when we, when we started, we weren't like, we're just gonna go and give every, all pieces of software away for free. That wasn't like sort of the goal there. And[00:14:43] swyx: since you mentioned, uh, adoption and, you know, traction and all that, uh, what can you disclose about user growth? Yeah, user adoption.[00:14:50] Varun Mohan: Yeah. So right now we have. We probably have over 10,000 users and thousands of daily actives, and people come back day over day. Our growth is like around, you know, four to 5% day over day right now. So all of our growth right now is sort of like word of mouth, and that's fundamentally because like the product is actually one of those products where.[00:15:08] Even use COT and use us, it's, it's hard to tell the difference actually. And a lot of our users have actually churned off of cot isn't Yeah. I,[00:15:14] swyx: I swept Yeah. Yeah. To support you guys, but also also to try[00:15:17] Varun Mohan: it out. Yeah, exactly. So the, the crazy thing is it wasn't like, Hey, we're gonna figure out a marketing motion of like, Going to the people that have never heard of co-pilot and we're gonna like get a bunch of users.[00:15:27] We wanted to just get users so that in our own right we're like a really great product. Uh, and sort of we've spent a lot of engineering time and obviously we co-wrote a blog post with you, Sean, on this in terms of like, there's a lot of engineering work, even beyond the latency, making sure that you can get your cost down to make a product like this actually work.[00:15:44] swyx: Yeah. That's a long tail of, of stuff that you referenced,[00:15:47] Varun Mohan: right? Yes. Yeah, exactly.[00:15:48] swyx: And you, you said something to the order of, um, and this maybe gets into co-pilot for X uh, which is something that everybody is keen about cuz they, they see the success of co-pilot. They're like, okay, well first of all, developer tools, there's more to do here.[00:16:00] And second of all, let's say the co-pilot idea and apply for other disciplines. I don't know if you wanna Yeah.[00:16:06] Varun Mohan: There's[00:16:06] Alessio Fanelli: gonna some. Key points that, that you touched on. Um, how to estimate, inference a scale, you know, and the latency versus quality trade-offs. Building on first party. So this is free forever because you run your own models, right?[00:16:19] That's right. If you were building on open ai, you wouldn't be able to offer it for free real-time. You know, when I first use coding, It was literally the same speed as Copi is a little bit[00:16:29] swyx: faster. I don't know how to quantify it,[00:16:31] Varun Mohan: but we are faster. But it's one of those things that we're not gonna like market as that's the reason because it's not in and of itself a right for you to like, I'm just gonna be open with you.[00:16:39] It's not a reason for you to like suddenly turn off a copilot where if our answers were trash, uh, but we were faster. You know what I mean? But your focus[00:16:46] Alessio Fanelli: was there. We used the alpha, I think prem on our discord came to us and say, you guys should try this out. So it was really fast. Even then, prompt optimization is another big thing, and model outputs and UX kind of how you bring them together.[00:17:00] Which ones of these things are maybe like the one or two that new founders should really think about first?[00:17:07] Varun Mohan: Yeah, I think, I think my feeling on this is unless you are ex, you probably should always bootstrap on top of an existing a. Because like even if you were to, the only reason why we didn't is because we knew that this product was actually buildable.[00:17:22] Probably if we worked hard enough to train a model, we would actually be able to build a great product already. But if you're actually going out and trying to build something from scratch, unless you genuinely believe, I need to fine tune on top of, you know, terabytes of data terabyte is a very large amount of data, but like tens of gigabytes of data.[00:17:37] Probably go out and build on top of an API and spend most of your time to make it so that you can hit that quality latency trade off properly. And if I were to go out and think about like the three categories of like an LM product, it's probably like latency, quality, and correct ability. The reality is, you know, if I were to take a product like co-pilot or Coum, the latency is very low.[00:17:58] The quality I think, is good enough for the task, but the correct ability is, is very easy. Credibility. What, what is correct ability? Correct ability means, let's say the quality is not there. Like you consider the the case where, The answer is wrong. How easy is it for your user to actually go and leverage parts of the generation?[00:18:16] Maybe a, a concrete example. There's a lot of things people are excited about right now where I write a comment and it generates a PR for me, and that's like, that's like really awesome in theory. I think that's like a really cool thing and I'm sure at some point we will be able to get there. That will probably require an entirely new model for what it's worth that's trained on diffs and commits and all these other things that looks at like improvements and code and stuff.[00:18:37] It's probably not gonna be just trained on generic code. But the problem with those, those sort of, I would say, applications are that, let's suppose something does change many files, makes large amounts of changes. First of all, it's guaranteed not gonna be. Because even the idea of like reviewing the change takes a long time.[00:18:54] So if the quality and the correct ability is just not there, let's say you had 10 file, a 10 file change and you modified like, you know, file two and four, and those two modifications were consistent, but the other eight files were not consistent. Then suddenly the correct ability is like really hard.[00:19:10] It's hard to correct the output of the model. And so the user interface is 100% really important. But maybe until you get the latency down or the correct ability, like correct ability, like a lot better, it's probably not gonna be shippable. And I think that's what you gotta spend your time focusing on.[00:19:26] Can you deliver a product that is actually something users want to use? And I think this is why I was talking about like demo. It's like very easy to hand to handpick something that like works, that works for a demo, exceedingly hard for something that has large scope, like a PR to work consistently. It will take a lot of engineering effort to make it work on small enough chunks so that a user is like, wow, this is value generative to me.[00:19:49] Because eroding user trust or consumer trust is very easy. Like that is, it is is much, much, it's very easy to erode user trust versus enterprise. So just be mindful of that, and I think that's probably like the mantra that most of these companies need to operate under. Have you done any[00:20:05] Alessio Fanelli: analysis on. What the ratio between code generated and latency is.[00:20:11] So you can generate one line, but you could also generate the whole block. You can generate Yeah. A whole class and Yeah. You know, the more you generate the, the more time it takes. Like what's the sweet spot that, that you[00:20:21] Varun Mohan: found? Yeah, so I think there was a great study and, and I'm not sure if it's possible to link it, but there was a great study about co-pilot actually that came out.[00:20:28] Basically what they said was there were two ways that developers usually develop with a code assistant technology. They're either in what's called like acceleration mode or exploration mode. And exploration mode is basically you're in the case where you don't even know what the solution space for the function is.[00:20:43] and you just wanna generate a lot of code because you don't even know what that looks like. Like it might use some API that you've never heard of. And what you're actually doing at that point is like you're writing a clean comment, just wishing and praying that you know, the generation is long enough and gets you, gets you far enough, right?[00:20:57] acceleration mode is basically you are doing things where you are very confident in what you're doing and effectively. Code gives you that muscle so that you can basically stay in flow state and you're not thinking about like exactly what the APIs look like, but push comes to shove. You will figure out what the APIs look like, but actually like mentally, it takes off like a load in your head where you're like, oh wow.[00:21:18] Like I can just do this. The intent to execution is just a lot, a lot lower there. And I think effectively you want a tool that captures that a little bit. And we have heuristics in terms of captur. Whether or not you're in acceleration versus exploration mode. And a good heuristic is, let's say you're inside like a basic block of a piece of code.[00:21:37] Let's say you're inside a a block of code or an IF statement, you're probably already in acceleration mode and you would feel really bad if I started generating the ELs clause. Because what happens if that else causes really wrong? That's gonna cause like mental load for you because you are the way programmers think.[00:21:51] They only want to complete the if statement first, if that makes sense. So there are things where we are mindful of like how many lines we generate if you use the product, like multi-line generations happen and we are happy to do them, but we don't want to do them when we think it's gonna increase load on developers, if that makes sense.[00:22:07] That[00:22:07] Alessio Fanelli: makes sense. So co-pilot for x. , what are access that you think are interesting for people to build[00:22:13] Varun Mohan: in? Didn't we see some, some tweet recently about Harvey ai, uh, company that, that is trying to sell legal? It's like a legal, legal assistance. That's, that's pretty impressive, honestly. That's very impressive.[00:22:23] So it seems like I would really love to see what the product looks like there, because there's a lot of text there. You know, looking at bing, bing, ai, like, I mean, it's, it's pretty cool. But it seems like groundedness is something a lot of these products struggle with, and I assume legal, if there's one thing you want them to.[00:22:39] To get right. It's like the groundedness. Yeah.[00:22:42] swyx: Yeah. I've made the analogy before that law and legal language is basically just another form of programming language. You have to be that precise. Yes. Definitions must be made, and you can scroll to find the definition. It's the same thing. Yes. ,[00:22:55] Varun Mohan: yes. Yeah. But like, I guess there's a question of like comprehensiveness.[00:22:59] So like, let's say, let's say the only way it generates a suggestion is it provides like, you know, citations to other legal. You don't want it to be the case that it misses things, so you somehow need the comprehensiveness, but also at the same time, you also don't want it to make conclusions that are not from the site, the things at sites.[00:23:15] So, I don't know, like that's, that's very impressive. It's clear that they've demonstrated some amount of value because they've been able to close a fairly sizable enterprise contract. It was like a firm with 3,500 lawyers, something nuts, honestly. Very cool. So it's clear this is gonna happen, uh, and I think people are gonna need to be clever about how they actually make it work.[00:23:34] Within the constraints of whatever workload they're operating in. Also, you, you guys[00:23:37] swyx: are so good at trading stuff, why don't you, you try[00:23:39] Varun Mohan: cloning it. Yeah. So I think, I think that's, that's, uh, preview the roadmap. Yeah, yeah, yeah, yeah. No, no, no, but I'm just kidding. I think one of the things that we genuinely believe as a startup is most startups can't really even do one thing properly.[00:23:52] Mm-hmm. Focus. Yeah. Yeah. Usually doing one thing is really hard. Most companies that go public have like maybe a couple big products. They don't really have like 10, so we're under no illusions. Give the best product experience, the amount of engineering and attention to detail, to build one good product as hard.[00:24:08] So it's probably gonna be a while before we even consider leaving code. Like that's gonna be a big step because the amount of learning we need to do is gonna be high. We need to get users right. We've learned so much from our users already, so, yeah, I don't think we'd go into law anytime soon.[00:24:22] swyx: 3,500 lawyers with Ellen and Ry, uh, is, is is apparently the, the new[00:24:27] Varun Mohan: That's actually really big.[00:24:28] Yeah. Yeah. I can congrat.[00:24:29] swyx: Yeah, it's funny cuz like, it seems like these guys are moving faster than co-pilot. You know, co-pilot just launched, just announced enterprise, uh, like co-pilot for teams or co-pilot for Enterprise. Yeah. After like two years of testing.[00:24:40] Varun Mohan: Yeah, it does seem like the co-pilot team has built a very, very good product.[00:24:44] Um, so I don't wanna like say anything, but I think it is the case to startups will be able to move faster. I feel like that is true, but hey, like GitHub has great distribution. Whatever product they do have, they will be able to sell it really. Shall[00:24:56] swyx: we go into model numbers and infra estimates? our favorite[00:25:01] Varun Mohan: topics.[00:25:02] Nice small models. Nice.[00:25:04] swyx: So this is, um, relevant to basically I'm researching a lot of skilling law stuff. You have a lot of thoughts. You, you host paper discussions[00:25:12] Varun Mohan: in your team. Yeah, we, we try to like read papers that we think are really interesting and relevant to us. Recently that's been, there's just a fire hose of papers.[00:25:21] You know, someone even just curating what papers we should read internally as a company. Yeah, I think, I think there's, there's so much good content[00:25:28] swyx: out there. You should, you guys should have a podcast. I mean, I told you this before. Should have a podcast. Just, just put a mic near where, where you guys are[00:25:33] Varun Mohan: talking.[00:25:34] We gotta, we gotta keep developing coding though, . No, but you're doing this discussion[00:25:38] swyx: anyway. You[00:25:38] Varun Mohan: might as well just, oh, put the discussion on a podcast. I feel like some of the, some of the thoughts are raw, right? Like, they're not gonna be as, as nuanced. Like we'll just say something completely stupid during our discussions.[00:25:48] I don't know, , maybe that's exciting. Maybe that's, it's kinda like a justin.tv, but for ML papers, Okay, cool. I watched that.[00:25:55] swyx: Okay, so co-pilot is 12 billion parameters. Salesforce cogen is up to 16. G P t three is 175. GP four is gonna be 100 trillion billion. Yeah. So what, what we landed on with you is with, uh, with Cilla, is that we now have an idea of what compute optimal data scaling is.[00:26:14] Yeah. Which is about 20 times parameters. Is that intuitive to you? Like what, what did that[00:26:18] Varun Mohan: unlock? I think basically what this shows is that bigger models are like more data efficient, like given the same number of tokens, a big model like trained on the same number of tokens. A bigger model is like, is gonna learn more basically.[00:26:32] But also at the same time, the way you have to look at it is there are more flops to train a bigger model on the same number of tokens. So like let's say I had a 10 billion parameter model and I trained it on on 1 million tokens, but then I had a 20 billion parameter model at the end of it will be a better.[00:26:47] It will have better perplexity numbers, which means like the probability of like a prediction is gonna be better for like the next token is gonna be better. But at the end of it, you did burn twice the amount of compute on it. Right? So Shinto is an interesting observation, which says if you have a fixed compute budget, And you want the best model that came out of it because there's like a difference here where a model that is, that is smaller, trained on the same number of tokens as fewer flops.[00:27:12] There's a a sweet spot of like number of tokens and size a model. I will say like people probably like. Are talking about it more than they should, and, and I'll, I'll explain why, but it's a useful result, which is like, let's say I have, you know, some compute budget and I want the best model. It tells you what that, what you should generate.[00:27:31] The problem I think here is there is a real trade off of like, you do need to run this model somewhere. You need to run it on a piece of hardware. So then it comes down to how much memory does that piece of hardware have. Let's say for a fixed compute budget, you could train a 70 billion parameter. What are you gonna put that on?[00:27:47] Yeah, maybe you could, could you put that on an 80 gig, A 100? It would be a stretch. You could do things like f, you know, in eight F p a, to reduce the amount of memory that's on the box and do all these other things. But you have to think about that first, right? When you want to go out and train that model.[00:27:59] The worst case is you ended up training that mo, that model, and you cannot serve it. So actually what you end up finding is for a lot of these code completion models, they are actually what you would consider over-trained . So by that I mean like, let's look at a model like Cogen. It's actually trained on, I believe, and, and I could be wrong by, you know, a hundred billion here or there.[00:28:18] I got some data. Oh, okay. Let's look at the 3 billion parameter model. It's a 2.7. I think it's actually a 2.7 billion barometer model. It's weird because they also trained on natural language on top of code, but it's trained on hundreds of billions of tokens. If you applied that chinchilla, Optimization to it, you'd be like, wow, this is, this is a stupid use of compute.[00:28:36] Right? Because three, they should be going to 60, any anything more than 60. And they're like, they should have just increased the model size. But the reality is if they had like the compute optimal one might not be one that's easy to serve, right? It could just have more parameters. And for our case, our models that we train internally, they might not be the most compute.[00:28:56] In other words, we probably could have had a better model by making it larger, but the trade off would've been latency. We know what the impact of having higher latency is, and on top of that, being able to fit properly on our hardware constraints would've also been a concern.[00:29:08] swyx: Isn't the classic stopping point when you, you see like loss kind of levels off.[00:29:12] Right now you're just letting chinchilla tell you,[00:29:16] Varun Mohan: but like you should just look at loss. The problem is the loss will like continue to go down. It'll just continue to go down like, like in a, in a way that's like not that pleasing. It's gonna take longer and longer. It's gonna be painful, but it's like one of those things where if you look at the perplexity number of difference between.[00:29:31] Let's say a model that's like 70 billion versus 10 billion. It's not massive. It's not like tens of percentage points. It's like very small, right? Mm. The reality is here, like, I mean this comes down to like IQ of like these models in some sense, like small wins at the margins are massive wins in terms of iq.[00:29:47] Like it's harder to get those and they don't look as big, but they're like massive wins in terms of reasoning. They can now do chain of thought, all these other things. Yeah, yeah, yeah.[00:29:55] swyx: It's, and, and so apparently unlocked around the[00:29:57] Varun Mohan: 20 billion. Yes. That's right. Some kind of magic. Yeah. I think that was from the UL two or maybe one of those land papers.[00:30:03] Any thoughts on why? Like is there is? I don't know. I mean, emergence of intelligence, I think. I think maybe one of the things is like we don't even know, maybe like five years from now of what we're gonna be running are transformers. But I think it's like, we don't, we don't 100% know that that's true. I mean, there's like a lot of maybe issues with the current version of the transformers, which is like the way attention works, the attention layers work, the amount of computers quadratic in the context sense, because you're like doing like an n squared operation on the attention blocks basically.[00:30:30] And obviously, you know, one of the things that everyone wants right now is infinite context. They wanna shove as much prop as possible in here. And the current version of what a transformer looks like is maybe not ideal. You might just end up burning a lot of flops on this when there are probably more efficient ways of doing it.[00:30:45] So I'm, I'm sure in the future there's gonna be tweaks to this. Yeah. Uh, but it is interesting that we found out interesting things of like, hey, bigger is pretty much always better. There are probably ways of making smaller models significantly better through better data. That is like definitely true. Um, And I think one of the cool things that the stack showed actually was they did a, like a, I think they did some ablation studies where they were like, Hey, what happens if we do, if we do decontamination of our data, what happens if we do de-duplication?[00:31:14] What happens if we do near dup of our data and how does the model get better? And they have like some compelling results that showcase data quality really matters here, but ultimately, Yeah, I think it is an interesting result that at 20 billion there's something happening. But I also think like some of these things in the future may look materially different than what they look like right now.[00:31:30] Hmm. Do you think[00:31:31] Alessio Fanelli: the token limitation is actually a real architectural limitation? Like if you think about the tokens need as kind of like atic, right? Like once you have. 50,000 tokens context, like 50,000 or infinite. For most use cases, it's like the same. Where do you think that number is, especially as you think about code, like some people have very large code bases, there's a lot.[00:31:53] Have you done any work there to figure out where the sweet[00:31:55] Varun Mohan: spot is? Yeah, look, I think what's gonna really end up happening is if people come up with a clever way and, and it, there was some result research that I believe came out of Stanford. I think the team from the Helm group, I think came out with some architecture that looks a little bit different than Transformers, and I'm sure something like this will work in the future.[00:32:13] What I think is always gonna happen is if you find a cheap way to embed context, people are gonna figure out a way to, to put as much as possible in because L LM so far have been like virtually stateless. So the only thing that they have beyond fine tuning is like just shoveling everything you can inside.[00:32:28] And there are some interesting papers, like retro, actually there are maybe some interesting pieces of thought like ideas that have come out recently. Yeah, let's go through them. So one of the really interesting ideas, I think is retro. It's this paper that came out of DeepMind and the idea is actually, let's say you send out, you send out, uh, a prompt.[00:32:44] Okay? Send out a prompt. You compute the burt embedding of that. And then you have this massive embedding database. And by massive, I'm not talking about like gigabytes, I'm talking about terabytes. Like you have, geez, you actually have 10 times the number of tokens as what was used to train the model. So like, let's say you had a model that was trained on a trillion tokens, you have a 10 trillion embed, uh, like embedding database.[00:33:04] And obviously Google has this because they have all content that ever existed in humanity and they have like the best data set and sort of, they were able to make one of these, uh, embedding databases. But the idea here, which is really cool, is you end. Taking your prompt, computing, the bird, embedding you find out the things that were nearby.[00:33:20] So you do roughly like a semantic search or an embedding search within that. And then you take those, you take the documents that were from those embeddings and you shove those in the model too, in what are called like cross chunked attention. So you like shove them in the model with it as well.[00:33:34] Suddenly now the model is able to take in external. Which is really exciting actually, because suddenly now you're able to get dynamic context in, and the model in some sense is deciding what that context is. It's not deciding it completely. In this case, because the Bert model in this case was actually frozen.[00:33:50] It wasn't trained with the retro model as well, but. The idea is you're somehow adding or augmenting context, which I think is like quite exciting. There's probably two futures. Either context becomes really cheap. Right now it's quadratic. Maybe there's a future where it becomes linear in the, in the size of the context, but the future might actually be the model itself dictates, Hey, I have this context.[00:34:10] You have this data source. Give me this. The model itself is going out into your database and like being like, I want this information, and this is kind of like. What Bing search is looking like. Right? Or bing chat is sort of looking like where it's like I, the model is probably, there's probably some model that's saying I want this information.[00:34:27] And that is getting augmented into the context. Now the model itself knows what context it sort of has and it can sort of like build a state machine of sort of what it needs. And that's probably what the future of this looks like. So you, you[00:34:37] swyx: predict monster embedding database[00:34:39] Varun Mohan: companies? Probably Monster embedding database companies or, yeah.[00:34:43] The model in some sense will need to talk to, Talk to these embedding databases. I'm actually not convinced that the current breed of embedding database companies are like ready for what the future sort of looks like. I think I'm just looking at their pricing, how much it costs per gigabyte and it's prohibitive at the scale we're talking about, like let's say you actually did want to host a 10 terabyte embedding database.[00:35:03] A lot of them were created, let's say two years ago, two, three years ago, where people were like, you know, embedding databases are small and they need to make the cost economics work. But maybe, yeah, there's probably gonna be a big workload there. I will just say for us, we will probably just build this in-house to start with, and that's because I think the technology probably isn't there.[00:35:20] And I think that the technology isn't there yet. Like waiting on point solutions to come up is a lot harder, um, than probably building it up. The way I, I like to think about this is probably the world looks on the LM space. Looks like how the early internet days were, where I think the value was accrued to probably like Google and Google needed to figure out all the crazy things to make their workload work.[00:35:41] And the reason why they weren't able to outsource is, is no one else was feeling the pain. ,[00:35:46] swyx: they're just solving their own pain points. They're just solving their own pain points. They're so far ahead of everyone else. Yes, yes. And just wait[00:35:50] Varun Mohan: for people to catch up. Yes. Yes. And that's maybe different than how things like Snowflake look where the interface has been decided for what SQL looks like 50 years ago.[00:35:58] And because of that, you can go out and build the best database and Yeah, like everyone's gonna be like, this doesn't make my beer taste better. And buy your database basically. That's[00:36:08] swyx: a great reference, by the way. Yeah. We have some friends of the, the pod that are working on embedding database, so we'll try to connect you Toroma[00:36:14] Varun Mohan: and see.[00:36:14] Yeah. Oh, I actually know Anton. I worked with him at Neuro. Oh. Although, there you go. Yeah. Uh, what do you, well, what do you think about, I mean,[00:36:20] swyx: so chromas pivoting towards an embedding[00:36:22] Varun Mohan: database. I think it's an interesting idea. I think it's an interesting idea. I wonder what the early set of workloads that.[00:36:27] They will hit our, and you know what the scaling requirements are. This is maybe the classic thing where like, the teams are great, but you need to pick a workload here that you care about the most. You could build anything. You could build anything. When you're an infrastructure company, you can go in, if I was selling, serving in for, I could build, serving for like linear aggression.[00:36:44] I could build this, but like, unless you hit the right niche for the end user, it's gonna be. . So I think it, I'm excited to see what comes out and if they're great, then we'll use it. Yeah.[00:36:54] swyx: I also like how you slowly equated yourself to Google there. Oh, we're not, we're not Google. You're, you're gonna be the Google of ai.[00:37:00] Varun Mohan: We're definitely, we're definitely not Google. But I was just saying in terms of like, if you look at like the style of companies that came out. Yeah. You know? Absolutely. Or maybe we should live in the cutting edge in[00:37:08] swyx: the future. Yeah. I think that's the pitch.[00:37:10] Varun Mohan: Okay, thanks for b***h us.[00:37:13] Alessio Fanelli: So you just mentioned the older vector embedding source are kind of not made for the L l M generation of compute size.[00:37:21] what does l LM ops look like? You know, which pieces need to be drastically different? Which ones can we recycle?[00:37:27] Varun Mohan: Yeah. One of the things that we've found, like in our own thing of building code that's been just shows how much is missing, and this is the thing where like, I don't know how much of this you can really outsource, which is like we needed to build eval infrastructure.[00:37:40] That means how do you build a great code? And there are things online like human eval, right? And uh, I was telling, which is the benchmark telling Sean about this, the idea of human eval is really neat for code. The idea is you provide a bunch of functions with Docstrings and the eval instead of being, did you predict next token?[00:37:56] It's like, did you generate the entire function and does the function run correctly against a bunch of unit tests? Right. And we've built more sophisticated evals to work on many languages, to work on more variety of code bases. One of the issues that ends up coming up with things like human eval is contam.[00:38:12] Because a lot of these, uh, things that train models end up training on all of GitHub GitHub itself has human eva, so they end up training on that. And then the numbers are tiny, though. It's gonna be tiny, right? But it doesn't matter if it's tiny because it'll just remember it. It'll remember that it's, it's not that it's that precise, but it will, it's like, it's basically like mixing your, your training and validation set.[00:38:32] It's like, oh, yeah, yeah, yeah, yeah. But we've seen cases where like online where someone is like, we have a code model that's like, they we're like, we did this one thing, and HU and human eval jumped a ton and we were just like, huh, did human eval get into your data set? Is that really what happened there?[00:38:46] But we've needed to build all this eval. And what is shown is data cleaning is massive, but data cleaning looks different by. Like code data cleaning is different than what is a high quality piece of code is probably different than what's a high quality legal document. Yeah. And then on top of that, how do you eval this?[00:39:01] How do you also train it at scale at whatever cost you really want to get? But those are things that the end user is either gonna need to solve or someone else is gonna need to solve for them. And I guess maybe one of the things I'm a little bearish on is if another company comes out and solves eval properly for a bunch of different verticals, what was the company that they were selling to really?[00:39:21] What were they really doing at that point? If they themselves were not eval for their own workload and all these other things? I think there are cases where, let's say for code where we probably couldn't outsource our eval, like we wouldn't be able to ship models internally if we didn't know how to eval, but it's clear that there's a lot of different things that people need to take.[00:39:38] Like, Hey, maybe there's an embedding piece. How large is this embedding database actually need to be? But hey, this does look very different than what classic ML ops probably did. Mm-hmm. . How[00:39:47] Alessio Fanelli: do you compare some of these models? Like when you're thinking about model upgrading and making changes, like what does the testing piece of it internally?[00:39:56] Yeah. For us look like.[00:39:56] Varun Mohan: For us, it's like old school AB testing. We've built like infrastructure to be able to say, ramp up users from one to 10 to. 50% and slowly roll things out. This is all classic software, uh, which[00:40:09] swyx: you do in-house. You don't, you don't buy any[00:40:10] Varun Mohan: services. We don't buy services for that.[00:40:13] There are good services, open source services that help you just don't need them. Uh, yeah, I think that's just like not the most complicated thing for us. Sure. Basically. Yeah. Uh, but I think in the future, maybe, we'll, obviously we use things like Google Analytics and all this other stuff, but Yeah. For things of ramping our models, finding out if they're actually better because the eval also doesn't tell the whole story because also for us, Even before generating the prompt, we do a lot of work.[00:40:36] And the only way to know that it's really good across all the languages that our users need to tell us that it's actually good. And, and they tell us by accepting completions. So, so GitHub[00:40:44] swyx: co-pilot, uh, the extension does this thing where they, they like, they'll set a timer and then within like five minutes, 10 minutes, 20 minutes, they'll check in to see if the code is still there.[00:40:54] I thought it was a[00:40:54] Varun Mohan: pretty creative way. It's, it's a very, it's honestly a very creative way. We do do things to see, like in the long term, if people did. Accept or write things that are roughly so because they could accept and then change their minds. They could accept and then change their minds. So we, we are mindful of, of things like that.[00:41:09] But for the most part, the most important metric is at the time, did they actually, did we generate value? And we want to know if that's true. And it's, it's kind of, it's honestly really hard to get signal unless you have like a non-trivial amount of usage, non-trivial, meaning you're getting, you're doing hundreds of thousands of completions, if not millions of completions.[00:41:25] That sounds like, oh wow. Like, that's like a very small amount. But like it's classic. Maybe like if you look at like when I used to be an intern at Quora, like, you know, now more than seven, eight years ago. When I was there, I like shipped a change and then Cora had like millions of daily actives and then it looked like it was good, and then a week later it was just like way worse.[00:41:43] And how is this possible? Like in a given hour we get like hundreds of thousands of interaction, just like, no, you just need way more data. So this is like one of those things where I think having users is like genuinely very valuable to us, basically. Users is all you need. . Yeah.[00:41:59] swyx: Um, by the way, since you brought out Quora, have you tried po any, any thoughts[00:42:03] Varun Mohan: on po I have not actually tried po I've not actually tried.[00:42:05] I[00:42:05] swyx: mean, it seems like a question answering website that's been around for 20 years or something. Would be very, would be very good at question answering. Yeah.[00:42:12] Varun Mohan: Also Adam, the ceo, is like incredibly brilliant. That guy is like insanely smart, so I'm sure they're gonna do,[00:42:18] swyx: they have accidentally built the perfect like data collection company for For qa.[00:42:22] Varun Mohan: Yeah. . It takes a certain kind of person to go and like cannibalize your original company like the in, I mean, it was kinda stagnant for like a few years. Yeah, that's probably true. That's[00:42:31] swyx: probably true. The observation is I feel like you have a bias to its domain specific. , whereas most research is skewed towards, uh, general models, general purpose models.[00:42:40] I don't know if there's like a, a deeper insight here that you wanna go into or, or not, but like, train on all the things, get all the data and you're like, no, no, no. Everyone needs like customized per task,[00:42:49] Varun Mohan: uh, data set. Yeah. I think I'm not gonna. Say that general intelligence is not good. You want a base model that's still really good and that's probably trained on normal text, like a lot of different content.[00:43:00] But I think probably one thing that old school machine learning, even though I'm like the kind of person that says a lot of old school machine learning is just gonna die, is that training on a high quality data set for your workload is, is always gonna yield better results and more, more predictable results.[00:43:15] And I think we are under no illusions that that's not the case. Basical. And[00:43:19] swyx: then the other observation is bandwidth and connectivity, uh, which is not something that people usually think about, but apparently is a, is a big deal. Apparently training agreed in the synchronous needs, high GPU coordination.[00:43:29] These are deleted notes from Sam Altman talking about how they think about training and I was like, oh yeah, that's an insight. And[00:43:34] Varun Mohan: you guys have the same thing. Yeah. So I guess for, for training, you're right in that it is actually nuts to think about how insane the networks are for NVIDIA's most recent hardware, it's.[00:43:46] For the H 100 boxes, you shove eight of these H 100 s on a. Between two nodes. The bandwidth is 3,200 gigabits a second, so 400 gigabytes a second between machines. That's like nuts when you just sit and think about it. That's like double the memory bandwidth of what a CPU has, but it's like between two machines.[00:44:04] On top of that, within the machine, they've created this, this fabric called envy link that allows you to communicate at ultra low latency. That's even lower than P C I E. If you're familiar, that's like the communication protocol. . Yeah, between like the CPU and the other devices or other P C I E devices.[00:44:21] All of this is to make sure that reductions are fast, low latency, and you don't need to think about it. And that's because like a lot of deep learning has sort of evolved. Uh, training has evolved to be synchronous in the OG days. There is a lot of analysis in terms of how good is asynchronous training, which is like, Hey, I have a node, it has a current state of the model.[00:44:39] It's gonna update that itself locally, and it'll like every once in a while, go to another machine and update the weights. But I think like everyone has converged to synchronous. I'm not exactly sure. There's not a lot of good research on asynchronous training right now. Or maybe there is an, I haven't read it.[00:44:52] It's just that there isn't as much research because people are just like, oh, synchronous works. Uh, and the hardware is continually upleveled to handle[00:44:59] swyx: that. Yeah. It was just un unintuitive to me cuz like the whole purpose of GPUs could train things. A lot of things in parallel. Yes.[00:45:05] Varun Mohan: But the crazy thing is also, maybe I can, I can give some dumb math here.[00:45:09] Sure. Here, which is that, uh, let's go with uh, G B T three, which is like 170 billion per. The optimizer state, so while you're training is 14 times the size of the model, so in this case, if it's like 170 billion parameters, it's probably, I'm not great at mental math here, but that's probably around 2.5 terabytes to just store the optimizer state.[00:45:30] That has gotta be sharded across a lot of machines. Like that is not a single gpu. Even if you take an H 100 with 80 gigs to just shard that much, that's like 40, at least 30 machines. So there's like something there where these things need to communicate with each other too.[00:45:44] swyx: You need to vertically scale horizontally.[00:45:46] Varun Mohan: Yeah. You gotta co-located, you gotta somehow feel like you have this massive, the, the ideal programming paradigm is you feel like you have this massive computer. That has no communication, you know, overhead at all, but it has like infinite computer and infinite memory bandwidth.[00:45:59] swyx: That's the AI cluster. Um, okay, well, uh, we want to head to the questions.[00:46:05] Alessio Fanelli: So favorite AI product that you are not[00:46:08] Varun Mohan: building? Yeah, I'm friends with some of the folks at Mid Journey and I really think the Mid Journey product is super cool, especially seeing how the team is iterating and the quality of generations. It consistently gets upleveled. I think it's like quite neat and I think internally at at exa functional, we've been trying out mid Journey for like random content to like generate images and stuff.[00:46:26] Does it bother[00:46:26] swyx: you that they have like a style. I don't know. It, it seems like they're hedging themselves into a particular, like you want mid journey art, you go there.[00:46:33] Varun Mohan: Yeah. It's a brand of art. Yeah, you're right. I think they do have a style, but it seems more predictably good for that style. Okay. So maybe that's too, so just get good at, uh, domain specific thing.[00:46:41] Yeah. Yeah. maybe. Maybe I, maybe I'm just selling, talking to a booker right now. . Yeah. Uh, okay.[00:46:46] swyx: Uh, next question. Uh, favorite AI people and[00:46:48] Varun Mohan: communities? Yeah, so I think I mentioned this before, but I think obviously the open. The opening eye folks are, are insane. Like we, we only have respect for them. But beyond that, I think Elu is a pretty special group.[00:46:59] Especially it's been now probably more than a year and a half since they released like G P T J, which was like back when open source G PT three Curri, which was comparable. And it wasn't like a model where like, It wasn't good. It was like comparable in terms of perplexity to GT three curity and it was trained by a university student actually, and it just showed that, you know, in the end, like I would say pedigree is great, but in if you have people that are motivated know how computers work and they're willing to just get their hands dirty, you can do crazy things and that was a crazy project that gave me more hope.[00:47:34] Decentral training being potentially pretty massive. But I think that was like a very cool thing where a bunch of people just got on Discord and were chatting and they were able to just turn this out. Yeah. I did[00:47:42] swyx: not know this until I looked in further into Luther, but it was not a formal organization.[00:47:45] Was a company was a startup. It's not, yeah. Bunch of guys on Discord.[00:47:48] Varun Mohan: They gotta you, they gotta keep you research grant and they somehow just wrote some codes. .[00:47:52] Alessio Fanelli: Yeah. Yeah. Listen to APAC with Connor, who's the person, and basically Open Eye at the time was like, we cannot release G P T because it's like too good and so bad.[00:48:01] And he was like, He actually said he was sick, so he couldn't leave home for like a, a few weeks. So it was like, what else am I gonna do? And ended up getting through the Google like research programs through his university and they were like, oh, we'll give you TPUs. And he was like, cool. And that's how, that's,[00:48:17] Varun Mohan: that's amazing.[00:48:18] So I came to you. I love the story. Yeah, it's a great story. .[00:48:21] Alessio Fanelli: So a year from now, what do you think people will be most surprised by[00:48:25] Varun Mohan: In ai? Yeah. I think the thing people will be most surprised by is, I think they, the models are gonna, More good at SP special tasks for sure, but even the existing models, I think people will come up with more creative ways of leveraging them to build like world class products.[00:48:39] I think that's just like human creativity is gonna go wild. It seems like Cha GBT has already kind of unleashed that. I think I'm just excited to see what the future of these products look like. I guess law was not something I expected in such a short, well,[00:48:51] swyx: totally expected. I, I, I was actually watching a different company that I thought was gonna be the winner, and then Harvey just came outta nowhere,[00:48:56] Oh, wow. Okay. Okay. Well that's, that's awesome. But yeah. So my, my takeaway from what you're saying is like, foundation models have kind of shot way too far ahead of the apps and people need to build[00:49:05] Varun Mohan: apps. Yes. I think people should be building apps, but I. The reality is the model is like probably at a state right now where it can do crazy enough things.[00:49:12] Uh, and I think great apps will, will come out of this. Yeah.[00:49:16] swyx: AI thing you would pay for if someone else built it personal or work.[00:49:20] Varun Mohan: I think if, if someone else built like a proper assistant, like a proper like fitness assistant, I would probably pay for that actually. I know that, that sounds weird, but someone that actually tells me like, how should I end up, like, you know, doing fitness today, I ended up injuring my knee from over biking.[00:49:35] I ended up biking like 150 miles a week and I ended up just injuring my knee outta nowhere. So, so you need, you need an app to tell you to exercise less. Exercise less, but tell me what my training regimen is. Uh, tell me what I should do to prepare for things. I know that this is like a big niche, but I think the fact that Strava is such a big group of people and like swyx is a big group of people, seems to suggest that I think a lot of people would be willing to pay for something like this.[00:49:57] Alessio Fanelli: what's one thing you want everyone to take away about AI and our[00:50:01] Varun Mohan: conversation? Probably the most important thing to take away is there's probably a lot out there if people continue to tinker. I think that's probably like the biggest takeaway I've had. Uh, and it's, you know, being a pure infrastructure company, I think like, uh, six to eight months ago, I think it was like very hard to watch everyone tinkering and us just, you know, building, building infrastructure.[00:50:22] But I think there's gonna be some crazy things that come out over the next year or. Um, excited to just see what that looks like. Awesome. Yeah, man. That's it. This was fantastic. Thanks so much. Thanks for coming. Get full access to Latent Space at www.latent.space/subscribe
50:5202/03/2023
ChatGPT, GPT4 hype, and Building LLM-native products — with Logan Kilpatrick of OpenAI
We’re so glad to launch our first podcast episode with Logan Kilpatrick! This also happens to be his first public interview since joining OpenAI as their first Developer Advocate. Thanks Logan!Recorded in-person at the beautiful StudioPod studios in San Francisco. Full transcript is below the fold.Timestamps* 00:29: Logan’s path to OpenAI* 07:06: On ChatGPT and GPT3 API* 16:16: On Prompt Engineering* 20:30: Usecases and LLM-Native Products* 25:38: Risks and benefits of building on OpenAI* 35:22: OpenAI Codex* 42:40: Apple's Neural Engine* 44:21: Lightning RoundShow notes* Sam Altman’s interview with Connie Loizos* OpenAI Cookbook* OpenAI’s new Embedding Model* Cohere on Word and Sentence Embeddings* (referenced) What is AGI-hard?Lightning Rounds* Favorite AI Product: https://www.synthesia.io/* Favorite AI Community: MLOps * One year prediction: Personalized AI, https://civitai.com/* Takeaway: AI Revolution is here!Transcript[00:00:00] Alessio Fanelli: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my cohost, swyx writer editor of L Space Diaries. Hey.[00:00:20] swyx: Hey . Our guest today is Logan Kilpatrick. What I'm gonna try to do is I'm gonna try to introduce you based on what people know about you, and then you can fill in the blanks.[00:00:28] Introducing Logan[00:00:28] swyx: So you are the first. Developer advocate at OpenAI, which is a humongous achievement. Congrats. You're also the lead developer community advocate of the Julia language. I'm interested in a little bit of that and apparently as I've did a bit of research on you, you got into Julia through NASA where you interned and worked on stuff that's gonna land on the moon apparently.[00:00:50] And you are also working on computer vision at Apple. And had to sit at path, the eye as you fell down the machine learning rabbit hole. What should people know about you that's kind of not on your LinkedIn that like sort of ties together your interest[00:01:02] Logan Kilpatrick: in story? It's a good question. I think so one of the things that is on my LinkedIn that wasn't mentioned that's super near and dear to my heart and what I spend a lot of time in sort of wraps a lot of my open source machine learning developer advocacy experience together is supporting NumFOCUS.[00:01:17] And NumFOCUS is the nonprofit that helps enable a bunch of the open source scientific projects like Julia, Jupyter, Pandas, NumPy, all of those open source projects are. Facilitated legal and fiscally through NumFOCUS. So it's a very critical, important part of the ecosystem and something that I, I spend a bunch of my now more limited free time helping support.[00:01:37] So yeah, something that's, It's on my LinkedIn, but it's, it's something that's important to me. Well,[00:01:42] swyx: it's not as well known of a name, so maybe people kind of skip over it cuz they were like, I don't know what[00:01:45] Logan Kilpatrick: to do with this. Yeah. It's super interesting to see that too. Just one point of context for that is we tried at one point to get a Wikipedia page for non focus and it's, it's providing, again, the infrastructure for, it's like a hundred plus open source scientific projects and they're like, it's not notable enough.[00:01:59] I'm like, well, you know, there's something like 30 plus million developers around the world who use all these open source tools. It's like the foundation. All open source like science that happens. Every breakthrough in science is they discovered the black hole, the first picture of the black hole, all that stuff using numb focus tools, the Mars Rovers, NumFOCUS tools, and it's interesting to see like the disconnect between the nonprofit that supports those projects and the actual success of the projects themselves.[00:02:26] swyx: Well, we'll, we'll get a bunch of people focused on NumFOCUS and we'll get it on Wikipedia. That that is our goal. . That is the goal. , that is our shot. Is this something that you do often, which is you? You seem to always do a lot of community stuff. When you get into something, you're also, I don't know where this, where you find time for this.[00:02:42] You're also a conference chair for DjangoCon, which was last year as well. Do you fall down the rabbit hole of a language and then you look for community opportunities? Is that how you get into.[00:02:51] Logan Kilpatrick: Yeah, so the context for Django stuff was I'd actually been teaching and still am through Harvard's division of continuing education as a teaching fellow for a Django class, and had spent like two and a half years actually teaching students every semester, had a program in Django and realized that like it was kind of the one ecosystem or technical tool that I was using regularly that I wasn't actually contributing to that community.[00:03:13] So, I think sometime in 2021 like applied to be on the board of directors of the Django Events Foundation, north America, who helps run DjangoCon and was fortunate enough to join a support to be the chair of DjangoCon us and then just actually rolled off the board because of all the, all the craziness and have a lot less free time now.[00:03:32] And actually at PATH ai. Sort of core product was also using, was using Django, so it also had a lot of connections to work, so it was a little bit easier to justify that time versus now open ai. We're not doing any Django stuff unfortunately, so, or[00:03:44] swyx: Julia, I mean, should we talk about this? Like, are you defecting from Julia?[00:03:48] What's going on? ,[00:03:50] Logan Kilpatrick: it's actually felt a little bit strange recently because I, for the longest time, and, and happy to talk about this in the context of Apple as well, the Julie ecosystem was my outlet to do a lot of the developer advocacy, developer relations community work that I wanted to do. because again, at Apple I was just like training machine learning models.[00:04:07] Before that, doing software engineering at Apple, and even at Path ai, we didn't really have a developer product, so it wasn't, I was doing like advocacy work, but it wasn't like developer relations in the traditional sense. So now that I'm so deeply doing developer relations work at Open OpenAI, it's really difficult to.[00:04:26] Continue to have the energy after I just spent nine hours doing developer relations stuff to like go and after work do a bunch more developer relations stuff. So I'll be interested to see for myself like how I'm able to continue to do that work and I. The challenge is that it's, it's such critical, important work to happen.[00:04:43] Like I think the Julie ecosystem is so important. I think the language is super important. It's gonna continue to grow in, in popularity, and it's helping scientists and engineers solve problems they wouldn't otherwise be able to. So it's, yeah, the burden is on me to continue to do that work, even though I don't have a lot of time now.[00:04:58] And I[00:04:58] Alessio Fanelli: think when it comes to communities, the machine learning technical community, I think in the last six to nine months has exploded. You know, you're the first developer advocate at open ai, so I don't think anybody has a frame of reference on what that means. What is that? ? So , what do you, how did, how the[00:05:13] swyx: job, yeah.[00:05:13] How do you define the job? Yeah, let's talk about that. Your role.[00:05:16] Logan Kilpatrick: Yeah, it's a good question and I think there's a lot of those questions that actually still exist at OpenAI today. Like I think a lot of traditional developed by advocacy, at least like what you see on Twitter, which I think is what a lot of people's perception of developer advocacy and developer relations is, is like, Just putting out external content, going to events, speaking at conferences.[00:05:35] And I think OpenAI is very unique in the sense that, at least at the present moment, we have so much inbound interest that there's, there is no desire for us to like do that type of developer advocacy work. So it's like more from a developer experience point of view actually. Like how can we enable developers to be successful?[00:05:53] And that at the present moment is like building a strong foundation of documentation and things like that. And we had a bunch of amazing folks internally who were. Who were doing some of this work, but it really wasn't their full-time job. Like they were focused on other things and just helping out here and there.[00:06:05] And for me, my full-time job right now is how can we improve the documentation so that people can build the next generation of, of products and services on top of our api. And it's. Yeah. There's so much work that has to happen, but it's, it's, it's been a ton of fun so far. I find[00:06:20] swyx: being in developer relations myself, like, it's kind of like a fill in the blanks type of thing.[00:06:24] Like you go to where you, you're needed the most open. AI has no problem getting attention. It is more that people are not familiar with the APIs and, and the best practices around programming for large language models, which is a thing that did not exist three years ago, two years ago, maybe one year ago.[00:06:40] I don't know. When she launched your api, I think you launched Dall-E. As an API or I, I don't[00:06:45] Logan Kilpatrick: know. I dunno. The history, I think Dall-E was, was second. I think it was some of the, like GPT3 launched and then GPT3 launched and the API I think like two years ago or something like that. And then Dali was, I think a little more than a year ago.[00:06:58] And then now all the, the Chachi Beast ChatGPT stuff has, has blown it all outta the water. Which you have[00:07:04] swyx: a a wait list for. Should we get into that?[00:07:06] Logan Kilpatrick: Yeah. .[00:07:07] ChatGPT[00:07:07] Alessio Fanelli: Yeah. We would love to hear more about that. We were looking at some of the numbers you went. Zero to like a million users in five days and everybody, I, I think there's like dozens of ChatGPT API wrappers on GitHub that are unofficial and clearly people want the product.[00:07:21] Like how do you think about that and how developers can interact with it.[00:07:24] Logan Kilpatrick: It. It's absolutely, I think one of the most exciting things that I can possibly imagine to think about, like how much excitement there was around ChatGPT and now getting to hopefully at some point soon, put that in the hands of developers and see what they're able to unlock.[00:07:38] Like I, I think ChatGPT has been a tremendous success, hands down without a question, but I'm actually more excited to see what developers do with the API and like being able to build those chat first experiences. And it's really fascinating to see. Five years ago or 10 years ago, there was like, you know, all this like chatbot sort of mm-hmm.[00:07:57] explosion. And then that all basically went away recently, and the hype went to other places. And I think now we're going to be closer to that sort of chat layer and all these different AI chat products and services. And it'll be super interesting to see if that sticks or not. I, I'm not. , like I think people have a lot of excitement for ChatGPT right now, but it's not clear to me that that that's like the, the UI or the ux, even though people really like it in the moment, whether that will stand the test of time, I, I just don't know.[00:08:23] And I think we'll have to do a podcast in five years. Right. And check in and see whether or not people are still really enjoying that sort of conversational experience. I think it does make sense though cause like that's how we all interact and it's kind of weird that you wouldn't do that with AI products.[00:08:37] So we. and I think like[00:08:40] Alessio Fanelli: the conversational interface has made a lot of people, first, the AI to hallucinate, you know, kind of come up with things that are not true and really find all the edge cases. I think we're on the optimism camp, you know, like we see the potential. I think a lot of people like to be negative.[00:08:56] In your role, kind of, how do you think about evangelizing that and kind of the patience that sometimes it takes for these models to become.[00:09:03] Logan Kilpatrick: Yeah, I think what, what I've done is just continue to scream from the, the mountains that like ChatGPT has, current form is definitely a research preview. The model that underlies ChatGPT GPT 3.5 is not a research preview.[00:09:15] I think there's things that folks can do to definitely reduce the amount of hall hallucinations and hopefully that's something that over time I, I, again have full confidence that it'll, it'll solve. Yeah, there's a bunch of like interesting engineering challenges. you have to solve in order to like really fix that problem.[00:09:33] And I think again, people are, are very fixated on the fact that like in, you know, a few percentage points of the conversations, things don't sound really good. Mm-hmm. , I'm really more excited to see, like, again when the APIs and the Han developers like what are the interesting solutions that people come up with, I think there's a lot that can be explored and obviously, OpenAI can explore all them because we have this like one product that's using the api.[00:09:56] And once you get 10,000, a hundred thousand developers building on top of that, like, we'll see what are the different ways that people handle this. And I imagine there's a lot of low-hanging fruit solutions that'll significantly improve the, the amount of halluc hallucinations that are showing up. Talk about[00:10:11] swyx: building on top of your APIs.[00:10:13] Chat GPTs API is not out yet, but let's assume it is. Should I be, let's say I'm, I'm building. A choice between GP 3.5 and chat GPT APIs. As far as I understand, they are kind of comparable. What should people know about deciding between either of them? Like it's not clear to me what the difference is.[00:10:33] Logan Kilpatrick: It's a great question.[00:10:35] I don't know if there's any, if we've made any like public statements about like what the difference will be. I think, I think the point is that the interface for the Chachi B API will be like conversational first, and that's not the case now. If you look at text da Vinci oh oh three, like you, you just put in any sort of prompt.[00:10:52] It's not really built from the ground up to like keep the context of a conversation and things like that. And so it's really. Put in some sort of prompt, get a response. It's not always designed to be in that sort of conversational manner, so it's not tuned in that way. I think that's the biggest difference.[00:11:05] I think, again, the point that Sam made in a, a strictly the strictly VC talk mm-hmm. , which was incredible and I, I think that that talk got me excited and my, which, which part? The whole thing. And I think, I haven't been at open AI that long, so like I didn't have like a s I obviously knew who Sam was and had seen a bunch of stuff, but like obviously before, a lot of the present craziness with Elon Musk, like I used to think Elon Musk seemed like a really great guy and he was solving all these really important problems before all the stuff that happened.[00:11:33] That's a hot topic. Yeah. The stuff that happened now, yeah, now it's much more questionable and I regret having a Tesla, but I, I think Sam is actually. Similar in the sense that like he's solving and thinking about a lot of the same problems that, that Elon, that Elon is still today. But my take is that he seems like a much more aligned version of Elon.[00:11:52] Like he's, he's truly like, I, I really think he cares deeply about people and I think he cares about like solving the problems that people have and wants to enable people. And you can see this in the way that he's talked about how we deploy models at OpenAI. And I think you almost see Tesla in like the completely opposite end of the spectrum, where they're like, whoa, we.[00:12:11] Put these 5,000 pound machines out there. Yeah. And maybe they'll run somebody over, maybe they won't. But like it's all in the interest of like advancement and innovation. I think that's really on the opposite end of the spectrum of, of what open AI is doing, I think under Sam's leadership. So it's, it's interesting to see that, and I think Sam said[00:12:30] Alessio Fanelli: that people could have built Chen g p t with what you offered like six, nine months ago.[00:12:35] I[00:12:35] swyx: don't understand. Can we talk about this? Do you know what, you know what we're talking about, right? I do know what you're talking about. da Vinci oh three was not in the a p six months before ChatGPT. What was he talking about? Yeah.[00:12:45] Logan Kilpatrick: I think it's a little bit of a stretch, but I do think that it's, I, I think the underlying principle is that.[00:12:52] The way that it, it comes back to prompt engineering. The way that you could have engineered, like the, the prompts that you were put again to oh oh three or oh oh two. You would be able to basically get that sort of conversational interface and you can do that now. And, and I, you know, I've seen tutorials.[00:13:05] We have tutorials out. Yep. No, we, I mean, we, nineties, we have tutorials in the cookbook right now in on GitHub. We're like, you can do this same sort of thing. And you just, it's, it's all about how you, how you ask for responses and the way you format data and things like that. It. The, the models are currently only limited by what people are willing to ask them to do.[00:13:24] Like I really do think that, yeah, that you can do a lot of these things and you don't need the chat CBT API to, to build that conversational layer. That is actually where I[00:13:33] swyx: feel a little bit dumb because I feel like I don't, I'm not smart enough to think of new things to ask the models. I have to see an example and go, oh, you can do that.[00:13:43] All right, I'm gonna do that for now. You know, and, and that's why I think the, the cookbook is so important cuz it's kind of like a compendium of things we know about the model that you can ask it to do. I totally[00:13:52] Logan Kilpatrick: agree and I think huge shout out to the, the two folks who I work super closely with now on the cookbook, Ted and Boris, who have done a lot of that work and, and putting that out there and it's, yeah, you see number one trending repo on, on GitHub and it was super, like when my first couple of weeks at Open ai, super unknown, like really, we were only sort of directing our customers to that repo.[00:14:13] Not because we were trying to hide it or anything, but just because. It was just the way that we were doing things and then all of a sudden it got picked up on GitHub trending and a bunch of tweets went viral, showing the repo. So now I think people are actually being able to leverage the tools that are in there.[00:14:26] And, and Ted's written a bunch of amazing tutorials, Boris, as well. So I think it's awesome that more people are seeing those. And from my perspective, it's how can we take those, make them more accessible, give them more visibility, put them into the documentation, and I don't think that that connection right now doesn't exist, which I'm, I'm hopeful we'll be able to bridge those two things.[00:14:44] swyx: Cookbook is kind of a different set of documentation than API docs, and I think there's, you know, sort of existing literature about how you document these things and guide developers the right way. What, what I, what I really like about the cookbook is that it actually cites academic research. So it's like a nice way to not read the paper, but just read the conclusions of the paper ,[00:15:03] Logan Kilpatrick: and, and I think that's, that's a shout out to Ted and Boris cuz I, I think they're, they're really smart in that way and they've done a great job of finding the balance and understanding like who's actually using these different tools.[00:15:13] So, . Yeah.[00:15:15] swyx: You give other people credit, but you should take credit for yourself. So I read your last week you launched some kind of documentation about rate limiting. Yeah. And one of my favorite things about reading that doc was seeing examples of, you know, you were, you're telling people to do exponential back off and, and retry, but you gave code examples with three popular libraries.[00:15:32] You didn't have to do that. You could have just told people, just figure it out. Right. But you like, I assume that was you. It wasn't.[00:15:38] Logan Kilpatrick: So I think that's the, that's, I mean, I'm, I'm helping sort of. I think there's a lot of great stuff that people have done in open ai, but it was, we have the challenge of like, how can we make that accessible, get it into the documentation and still have that high bar for what goes into the doc.[00:15:51] So my role as of recently has been like helping support the team, building that documentation first culture, and supporting like the other folks who actually are, who wrote that information. The information was actually already in. Help center but it out. Yeah, it wasn't in the docs and like wasn't really focused on, on developers in that sense.[00:16:10] So yeah. I can't take the, the credit for the rate limit stuff either. , no, this[00:16:13] swyx: is all, it's part of the A team, that team effort[00:16:16] On Prompt Engineering[00:16:16] Alessio Fanelli: I was reading on Twitter, I think somebody was saying in the future will be kind of like in the hair potter word. People have like the spell book, they pull it out, they do all the stuff in chat.[00:16:24] GP z. When you talk with customers, like are they excited about doing prompt engineering and kind of getting a starting point or do they, do they wish there was like a better interface? ?[00:16:34] Logan Kilpatrick: Yeah, that's a good question. I think prompt engineering is so much more of an art than a science right now. Like I think there are like really.[00:16:42] Systematic things that you can do and like different like approaches and designs that you can take, but really it's a lot of like, you kind of just have to try it and figure it out. And I actually think that this remains to be one of the challenges with large language models in general, and not just head open ai, but for everyone doing it is that it's really actually difficult to understand what are the capabilities of the model and how do I get it to do the things that I wanted to do.[00:17:05] And I think that's probably where a lot of folks need to do like academic research and companies need to invest in understanding the capabilities of these models and the limitations because it's really difficult to articulate the capabilities of a model without those types of things. So I'm hopeful that, and we're shipping hopefully some new updated prompt engineering stuff.[00:17:24] Cause I think the stuff we have on the website is old, and I think the cookbook actually has a little bit more up-to-date stuff. And so hopefully we'll ship some new prompt engineering stuff in the, in the short term. I think dispel some of the myths and rumors, but like I, it's gonna continue to be like a, a little bit of a pseudoscience, I would imagine.[00:17:41] And I also think that the whole prompt engineering being like a job in the future meme, I think is, I think it's slightly overblown. Like I think at, you see this now actually with like, there's tools that are showing up and I forgot what the, I just saw went on Twitter. The[00:17:57] swyx: next guest that we are having on this podcast, Lang.[00:17:59] Yeah. Yeah.[00:18:00] Logan Kilpatrick: Lang Chain and Harrison on, yeah, there's a bunch of repos too that like categorize and like collect all the best prompts that you can put into chat. For example, and like, that's like the people who are, I saw the advertisement for someone to be like a prompt engineer and it was like a $350,000 a year.[00:18:17] Mm-hmm. . Yeah, that was, that was philanthropic. Yeah, so it, it's just unclear to me like how, how sustainable stuff like that is. Cuz like, once you figure out the interesting prompts and like right now it's kind of like the, the Wild West, but like in a year you'll be able to sort of categorize all those and then people will be able to find all the good ones that are relevant for what they want to do.[00:18:35] And I think this goes back to like, having the examples is super important and I'm, I'm with you as well. Like every time I use Dall-E the little. While it's rendering the image, it gives you like a suggestion of like how you should ask for the art to be generated. Like do it in like a cyberpunk format. Do it in a pixel art format.[00:18:53] Et cetera, et cetera, and like, I really need that. I'm like, I would never come up with asking for those things had it not prompted me to like ask it that way. And now I always ask for pixel art stuff or cyberpunk stuff and it looks so cool. That's what I, I think,[00:19:06] swyx: is the innovation of ChatGPT as a format.[00:19:09] It reduces. The need for getting everything into your prompt in the first try. Mm-hmm. , it takes it from zero shot to a few shot. If, if, if that, if prompting as, as, as shots can be concerned.[00:19:21] Logan Kilpatrick: Yeah. , I think that's a great perspective and, and again, this goes back to the ux UI piece of it really being sort of the differentiating layer from some of the other stuff that was already out there.[00:19:31] Because you could kind of like do this before with oh oh three or something like that if you just made the right interface and like built some sort of like prompt retry interface. But I don't think people were really, were really doing that. And I actually think that you really need that right now. And this is the, again, going back to the difference between like how you can use generative models versus like large scale.[00:19:53] Computer vision systems for self-driving cars, like the, the answer doesn't actually need to be right all the time. That's the beauty of, of large language models. It can be wrong 50% of the time and like it doesn't really cost you anything to like regenerate a new response. And there's no like, critical safety issue with that, so you don't need those.[00:20:09] I, I keep seeing these tweets about like, you need those like 99.99% reliability and like the three nines or whatever it is. Mm-hmm. , but like you really don't need that because the cost of regenerating the prop is again, almost, almost. I think you tweeted a[00:20:23] Alessio Fanelli: couple weeks ago that the average person doesn't yet fully grasp how GBT is gonna impact human life in the next four, five years.[00:20:30] Usecases and LLM-Native Products[00:20:30] Alessio Fanelli: I think you had an example in education. Yeah. Maybe touch on some of these. Example of non-tech related use cases that are enabling, enabled by C G B[00:20:38] T.[00:20:39] Logan Kilpatrick: I'm so excited and, and there's a bunch of other like random threads that come to my mind now. I saw a thread and, and our VP of product was, Peter, was, was involved in that thread as well, talking about like how the use of systems like ChatGPT will unlock like pretty almost low to zero cost access to like mental health services.[00:20:59] You know, you can imagine like the same use case for education, like really personalized tutors and like, it's so crazy to think about, but. The technology is not actually , like it's, it's truly like an engineering problem at this point of like somebody using one of these APIs to like build something like that and then hopefully the models get a little bit better and make it, make it better as well.[00:21:20] But like it, I have no doubt in my mind that three years from now that technology will exist for every single student in the world to like have that personalized education experience, have a pr, have a chat based experience where like they'll be able. Ask questions and then the curriculum will just evolve and be constructed for them in a way that keeps, I think the cool part is in a way that keeps them engaged, like it doesn't have to be sort of like the same delivery of curriculum that you've always seen, and this now supplements.[00:21:49] The sort of traditional education experience in the sense of, you know, you don't need teachers to do all of this work. They can really sort of do the thing that they're amazing at and not spend time like grading assignments and all that type of stuff. Like, I really do think that all those could be part of the, the system.[00:22:04] And same thing, I don't know if you all saw the the do not pay, uh, lawyer situation, say, I just saw that Twitter thread, I think yesterday around they were going to use ChatGPT in the courtroom and basically I think it was. California Bar or the Bar Institute said that they were gonna send this guy to prison if he brought, if he put AirPods in and started reading what ChatGPT was saying to him.[00:22:26] Yeah.[00:22:26] swyx: To give people the context, I think, like Josh Browder, the CEO of Do Not Pay, was like, we will pay you money to put this AirPod into your ear and only say what we tell you to say fr from the large language model. And of course the judge was gonna throw that out. I mean, I, I don't see how. You could allow that in your court,[00:22:42] Logan Kilpatrick: Yeah, but I, I really do think that, like, the, the reality is, is that like, again, it's the same situation where the legal spaces even more so than education and, and mental health services, is like not an accessible space. Like every, especially with how like overly legalized the United States is, it's impossible to get representation from a lawyer, especially if you're low income or some of those things.[00:23:04] So I'm, I'm optimistic. Those types of services will exist in the future. And you'll be able to like actually have a, a quality defense representative or just like some sort of legal counsel. Yeah. Like just answer these questions, what should I do in this situation? Yeah. And I like, I have like some legal training and I still have those same questions.[00:23:22] Like I don't know what I would do in that situation. I would have to go and get a lawyer and figure that out. And it's, . It's tough. So I'm excited about that as well. Yeah.[00:23:29] Alessio Fanelli: And when you think about all these vertical use cases, do you see the existing products implementing language models in what they have?[00:23:35] Or do you think we're just gonna see L L M native products kind of come to market and build brand[00:23:40] Logan Kilpatrick: new experiences? I think there'll be a lot of people who build the L l M first experience, and I think that. At least in the short term, those are the folks who will have the advantage. I do think that like the medium to long term is again, thinking about like what is your moat for and like again, and everyone has access to, you know, ChatGPT and to the different models that we have available.[00:24:05] So how can you build a differentiated business? And I think a lot of it actually will come down to, and this is just the true and the machine learning world in general, but having. Unique access to data. So I think if you're some company that has some really, really great data about the legal space or about the education space, you can use that and be better than your competition by fine tuning these models or building your own specific LLMs.[00:24:28] So it'll, it'll be interesting to see how that plays out, but I do think that. from a product experience, it's gonna be better in the short term for people who build the, the generative AI first experience versus people who are sort of bolting it onto their mm-hmm. existing product, which is why, like, again, the, the Google situation, like they can't just put in like the prompt into like right below the search bar.[00:24:50] Like, it just, it would be a weird experience and, and they have to sort of defend that experience that they have. So it, it'll be interesting to see what happens. Yeah. Perplexity[00:24:58] swyx: is, is kind of doing that. So you're saying perplexity will go Google ?[00:25:04] Logan Kilpatrick: I, I think that perplexity has a, has a chance in the short term to actually get more people to try the product because it's, it's something different I think, whether they can, I haven't actually used, so I can't comment on like that experience, but like I think the long term is like, How can they continue to differentiate?[00:25:21] And, and that's really the focus for like, if you're somebody building on these models, like you have to be, your first thought should be, how do I build a differentiated business? And if you can't come up with 10 reasons that you can build a differentiated business, you're probably not gonna succeed in, in building something that that stands the test of time.[00:25:37] Yeah.[00:25:37] Risks and benefits of building on OpenAI[00:25:37] swyx: I think what's. As a potential founder or something myself, like what's scary about that is I would be building on top of open ai. I would be sending all my stuff to you for fine tuning and embedding and what have you. By the way, fine tuning, embedding is their, is there a third one? Those are the main two that I know of.[00:25:55] Okay. And yeah, that's the risk. I would be a open AI API reseller.[00:26:00] Logan Kilpatrick: Yeah. And, and again, this, this comes back down to like having a clear sense of like how what you're building is different. Like the people who are just open AI API resellers, like, you're not gonna, you're not gonna have a successful business doing that because everybody has access to the Yeah.[00:26:15] Jasper's pretty great. Yeah, Jasper's pretty great because I, I think they've done a, they've, they've been smart about how they've positioned the product and I was actually a, a Jasper customer before I joined OpenAI and was using it to do a bunch of stuff. because the interface was simple because they had all the sort of customized, like if you want for like a response for this sort of thing, they'd, they'd pre-done that prompt engineering work for us.[00:26:39] I mean, you could really just like put in some exactly what you wanted and then it would make that Amazon product description or whatever it is. So I think like that. The interface is the, the differentiator for, for Jasper. And again, whether that send test time, hopefully, cuz I know they've raised a bunch of money and have a bunch of employees, so I'm, I'm optimistic for them.[00:26:58] I think that there's enough room as well for a lot of these companies to succeed. Like it's not gonna, the space is gonna get so big so quickly that like, Jasper will be able to have a super successful business. And I think they are. I just saw some, some tweets from the CEO the other day that I, I think they're doing, I think they're doing well.[00:27:13] Alessio Fanelli: So I'm the founder of A L L M native. I log into open ai, there's 6 million things that I can do. I'm on the playground. There's a lot of different models. How should people think about exploring the surface area? You know, where should they start? Kind of like hugging the go deeper into certain areas.[00:27:30] Logan Kilpatrick: I think six months ago, I think it would've been a much different conversation because people hadn't experienced ChatGPT before.[00:27:38] Now that people have experienced ChatGPT, I think there's a lot more. Technical things that you should start looking into and, and thinking about like the differentiators that you can bring. I still think that the playground that we have today is incredible cause it does sort of similar to what Jasper does, which is like we have these very focused like, you know, put in a topic and we'll generate you a summary, but in the context of like explaining something to a second grader.[00:28:03] So I think all of those things like give a sense, but we only have like 30 on the website or something like that. So really doing a lot of exploration around. What is out there? What are the different prompts that you can use? What are the different things that you can build on? And I'm super bullish on embeddings, like embed everything and that's how you can build cool stuff.[00:28:20] And I keep seeing all these Boris who, who I talked about before, who did a bunch of the cookbook stuff, tweeted the other day that his like back of the hand, back of the napkin math, was that 50 million bucks you can embed the whole internet. I'm like, Some companies gonna spend the 50 million and embed the whole internet and like, we're gonna find out what that product looks like.[00:28:40] But like, there's so many cool things that you could do if you did have the whole internet embedded. Yeah, and I, I mean, I wouldn't be surprised if Google did that cuz 50 million is a drop in the bucket and they already have the whole internet, so why not embed it?[00:28:52] swyx: Can can I ask a follow up question on that?[00:28:54] Cuz I am just learning about embeddings myself. What makes open eyes embeddings different from other embeddings? If, if there's like, It's okay if you don't have the, the numbers at hand, but I'm just like, why should I use open AI emitting versus others? I[00:29:06] Logan Kilpatrick: don't understand. Yeah, that's a really good question.[00:29:08] So I'm still ramping up on my understanding of embeddings as well. So the two things that come to my mind, one, going back to the 50 million to embed the whole internet example, it's actually just super cheap. I, I don't know the comparisons of like other prices, but at least from what I've seen people talking about on Twitter, like the embeddings that that we have in the API is just like significantly cheaper than a lot of other c.[00:29:30] Embeddings. Also the accuracy of some of the benchmarks that are like, Sort of academic benchmarks to use in embeddings. I know at least I was just looking back through the blog post from when we announced the new text embedding model, which is what Powers embeddings and it's, yeah, the, on those metrics, our API is just better.[00:29:50] So those are the those. I'll go read it up. Yeah, those are the two things. It's a good. It's a good blog post to read. I think the most recent one that came out, but, and also the original one from when we first announced the Embeddings api, I think also was a, it had, that one has a little bit more like context around if you're trying to wrap your head around embeddings, how they work.[00:30:06] That one has the context, the new one just has like the fancy new stuff and the metrics and all that kind of stuff.[00:30:11] swyx: I would shout a hugging face for having really good content around what these things like foundational concepts are. Because I was familiar with, so, you know, in Python you have like text tove, my first embedding as as a, as someone getting into nlp.[00:30:24] But then developing the concept of sentence embeddings is, is as opposed to words I think is, is super important. But yeah, it's an interesting form of lock in as a business because yes, I'm gonna embed all my source data, but then every inference needs an embedding as. . And I think that is a risk to some people, because I've seen some builders should try and build on open ai, call that out as, as a cost, as as like, you know, it starts to add a cost to every single query that you, that you[00:30:48] Logan Kilpatrick: make.[00:30:49] Yeah. It'll be interesting to see how it all plays out, but like, my hope is that that cost isn't the barrier for people to build because it's, it's really not like the cost for doing the incremental like prompts and having them embedded is, is. Cent less than cents, but[00:31:06] swyx: cost I, I mean money and also latency.[00:31:08] Yeah. Which is you're calling the different api. Yeah. Anyway, we don't have to get into that.[00:31:13] Alessio Fanelli: No, but I think embeds are a good example. You had, I think, 17 versions of your first generation, what api? Yeah. And then you released the second generation. It's much cheaper, much better. I think like the word on the street is like when GPT4 comes out, everything else is like trash that came out before it.[00:31:29] It's got[00:31:30] Logan Kilpatrick: 100 trillion billion. Exactly. Parameters you don't understand. I think Sam has already confirmed that those are, those are not true . The graphics are not real. Whatever you're seeing on Twitter about GPT4, you're, I think the direct quote was, you're begging to be disappointed by continuing to, to put that hype out.[00:31:47] So[00:31:48] Alessio Fanelli: if you're a developer building on these, What's kind of the upgrade path? You know, I've been building on Model X, now this new model comes out. What should I do to be ready to move on?[00:31:58] Logan Kilpatrick: Yeah. I think all of these types of models folks have to think about, like there will be trade offs and they'll also be.[00:32:05] Breaking changes like any other sort of software improvement, like things like the, the prompts that you were previously expecting might not be the prompts that you're seeing now. And you can actually, you, you see this in the case of the embeddings example that you just gave when we released Tex embeddings, ADA oh oh two, ada, ada, whichever it is oh oh two, and it's sort of replaced the previous.[00:32:26] 16 first generation models, people went through this exact experience where like, okay, I need to test out this new thing, see how it works in my environment. And I think that the really fascinating thing is that there aren't, like the tools around doing this type of comparison don't exist yet today. Like if you're some company that's building on lms, you sort of just have to figure it out yourself of like, is this better in my use case?[00:32:49] Is this not better? In my use case, it's, it's really difficult to tell because the like, Possibilities using generative models are endless. So I think folks really need to focus on, again, that goes back to how to build a differentiated business. And I think it's understanding like what is the way that people are using your product and how can you sort of automate that in as much way and codify that in a way that makes it clear when these different models come up, whether it's open AI or other companies.[00:33:15] Like what is the actual difference between these and which is better for my use case because the academic be. It'll be saturated and people won't be able to use them as a point of comparison in the future. So it'll be important to think about. For your specific use case, how does it differentiate?[00:33:30] swyx: I was thinking about the value of frameworks or like Lang Chain and Dust and what have you out there.[00:33:36] I feel like there is some value to building those frameworks on top of Open Eyes, APIs. It kind of is building what's missing, essentially what, what you guys don't have. But it's kind of important in the software engineering sense, like you have this. Unpredictable, highly volatile thing, and you kind of need to build a stable foundation on top of it to make it more predictable, to build real software on top of it.[00:33:59] That's a super interesting kind of engineering problem. .[00:34:03] Logan Kilpatrick: Yeah, it, it is interesting. It's also the, the added layer of this is that the large language models. Are inherently not deterministic. So I just, we just shipped a small documentation update today, which, which calls this out. And you think about APIs as like a traditional developer experience.[00:34:20] I send some response. If the response is the same, I should get the same thing back every time. Unless like the data's updating and like a, from like a time perspective. But that's not the, that's not the case with the large language models, even with temperature zero. Mm-hmm. even with temperature zero. Yep.[00:34:34] And that's, Counterintuitive part, and I think someone was trying to explain to me that it has to do with like Nvidia. Yeah. Floating points. Yes. GPU stuff. and like apparently the GPUs are just inherently non-deterministic. So like, yes, there's nothing we can do unless this high Torch[00:34:48] swyx: relies on this as well.[00:34:49] If you want to. Fix this. You're gonna have to tear it all down. ,[00:34:53] Logan Kilpatrick: maybe Nvidia, we'll fix it. I, I don't know, but I, I think it's a, it's a very like, unintuitive thing and I don't think that developers like really get that until it happens to you. And then you're sort of scratching your head and you're like, why is this happening?[00:35:05] And then you have to look it up and then you see all the NVIDIA stuff. Or hopefully our documentation makes it more clear now. But hopefully people, I also think that's, it's kinda the cool part as well. I don't know, it's like, You're not gonna get the same stuff even if you try to.[00:35:17] swyx: It's a little spark of originality in there.[00:35:19] Yeah, yeah, yeah, yeah. The random seed .[00:35:22] OpenAI Codex[00:35:22] swyx: Should we ask about[00:35:23] Logan Kilpatrick: Codex?[00:35:23] Alessio Fanelli: Yeah. I mean, I love Codex. I use it every day. I think like one thing, sometimes the code is like it, it's kinda like the ChatGPT hallucination. Like one time I asked it to write up. A Twitter function, they will pull the bayou of this thing and it wrote the whole thing and then the endpoint didn't exist once I went to the Twitter, Twitter docs, and I think like one, I, I think there was one research that said a lot of people using Co Palace, sometimes they just auto complete code that is wrong and then they commit it and it's a, it's a big[00:35:51] Logan Kilpatrick: thing.[00:35:51] swyx: Do you secure code as well? Yeah, yeah, yeah, yeah. I saw that study.[00:35:54] Logan Kilpatrick: How do[00:35:54] Alessio Fanelli: you kind of see. Use case evolving. You know, you think, like, you obviously have a very strong partnership with, with Microsoft. Like do you think Codex and VS code will just keep improving there? Do you think there's kind of like a. A whole better layer on top of it, which is from the scale AI hackathon where the, the project that one was basically telling the l l m, you're not the back end of a product[00:36:16] And they didn't even have to write the code and it's like, it just understood. Yeah. How do you see the engineer, I, I think Sean, you said copilot is everybody gets their own junior engineer to like write some of the code and then you fix it For me, a lot of it is the junior engineer gets a senior engineer to actually help them write better code.[00:36:32] How do you see that tension working between the model and the. It'll[00:36:36] Logan Kilpatrick: be really interesting to see if there's other, if there's other interfaces to this. And I think I've actually seen a lot of people asking, like, it'd be really great if I had ChatGPT and VS code because in, in some sense, like it can, it's just a better, it's a better interface in a lot of ways to like the, the auto complete version cuz you can reprompt and do, and I know Via, I know co-pilot actually has that, where you can like click and then give it, it'll like pop up like 10 suggested.[00:36:59] Different options instead of brushes. Yeah, copilot labs, yeah. Instead of the one that it's providing. And I really like that interface, but again, this goes back to. I, I do inherently think it'll get better. I think it'll be able to do a lot, a lot more of the stuff as the models get bigger, as they have longer context as they, there's a lot of really cool things that will end up coming out and yeah, I don't think it's actually very far away from being like, much, much better.[00:37:24] It'll go from the junior engineer to like the, the principal engineer probably pretty quickly. Like I, I don't think the gap is, is really that large between where things are right now. I think like getting it to the point. 60% of the stuff really well to get it to do like 90% of the stuff really well is like that's within reach in the next, in the next couple of years.[00:37:45] So I'll be really excited to see, and hopefully again, this goes back to like engineers and developers and people who aren't thinking about how to integrate. These tools, whether it's ChatGPT or co-pilot or something else into their workflows to be more efficient. Those are the people who I think will end up getting disrupted by these tools.[00:38:02] So figuring out how to make yourself more valuable than you are today using these tools, I think will be super important for people. Yeah.[00:38:09] Alessio Fanelli: Actually use ChatGPT to debug, like a react hook the other day. And then I posted in our disc and I was like, Hey guys, like look, look at this thing. It really helped me solve this.[00:38:18] And they. That's like the ugliest code I've ever seen. It's like, why are you doing that now? It's like, I don't know. I'm just trying to get[00:38:24] Logan Kilpatrick: this thing to work and I don't know, react. So I'm like, that's the perfect, exactly, that's the perfect solution. I, I did this the other day where I was looking at React code and like I have very briefly seen React and run it like one time and I was like, explain how this is working.[00:38:38] So, and like change it in this way that I want to, and like it was able to do that flawlessly and then I just popped it in. It worked exactly like I. I'll give a[00:38:45] swyx: little bit more context cause I was, I was the guy giving you feedback on your code and I think this is a illustrative of how large language models can sort of be more confident than they should be because you asked it a question which is very specific on how to improve your code or fix your code.[00:39:00] Whereas a real engineer would've said, we've looked at your code and go, why are you doing it at at all? Right? So there's a sort of sycophantic property of martial language. Accepts the basis of your question, whereas a real human might question your question. Mm-hmm. , and it was just not able to do that. I mean, I, I don't see how he could do that.[00:39:17] Logan Kilpatrick: Yeah. It's, it's interesting. I, I saw another example of this the other day as well with some chatty b t prompt and I, I agree. It'll be interesting to see if, and again, I think not to, not to go back to Sam's, to Sam's talk again, but like, he, he talked real about this, and I think this makes a ton of sense, which is like you should be able to have, and this isn't something that that exists right now, but you should be able to have the model.[00:39:39] Tuned in the way that you wanna interact with. Like if you want a model that sort of questions what you're asking it to do, like you should be able to have that. And I actually don't think that that's as far away as like some of the other stuff. Um, It, it's a very possible engineering problem to like have the, to tune the models in that way and, and ask clarifying questions, which is even something that it doesn't do right now.[00:39:59] It'll either give you the response or it won't give you the response, but it'll never say like, Hey, what do you mean by this? Which is super interesting cuz that's like we spend as humans, like 50% of our conversational time being like, what do you mean by that? Like, can you explain more? Can you say it in a different way?[00:40:14] And it's, it's fascinating that the model doesn't do that right now. It's, it's interesting.[00:40:20] swyx: I have written a piece on sort of what AGI hard might be, which is the term that is being thrown around as like a layer of boundary for what is, what requires an A real AGI to do and what, where you might sort of asymptotically approach.[00:40:33] So, What people talk about is essentially a theory of mind, developing a con conception of who I'm talking to and persisting that across sessions, which essentially ChatGPT or you know, any, any interface that you build on top of GPT3 right now would not be able to do. Right? Like, you're not persisting you, you are persisting that history, but you don't, you're not building up a conception of what you know and what.[00:40:54] I should fill in the blanks for you or where I should question you. And I think that's like the hard thing to understand, which is what will it take to get there? Because I think that to me is the, going back to your education thing, that is the biggest barrier, which is I, the language model doesn't have a memory or understanding of what I know.[00:41:11] and like, it's, it's too much to tell them what I don't know. Mm-hmm. , there's more that I don't know than I, than I do know . I think the cool[00:41:16] Logan Kilpatrick: part will be when, when you're able to, like, imagine you could upload all of the, the stuff that you've ever done, all the texts, the work that you've ever done before, and.[00:41:27] The model can start to understand, hey, what are the, what are the conceptual gaps that this person has based on what you've said, based on what you've done? I think that would be really interesting. Like if you can, like I have good notes on my phone and I can still go back to see all of the calculus classes that I took and I could put in all my calculus notebooks and all the assignments and stuff that I did in, in undergrad and grad school, and.[00:41:50] basically be like, Hey, here are the gaps in your understanding of calculus. Go and do this right now. And I think that that's in the education space. That's exactly what will end up happening. You'll be able to put in all this, all the work that you've done. It can understand those ask and then come up with custom made questions and prompts and be like, Hey, how, you know, explain this concept to me and if it.[00:42:09] If you can't do that, then it can sort of put that into your curriculum. I think like Khan Academy as an example, already does some of this, like personalized learning. You like take assessments at the beginning of every Khan Academy model module, and it'll basically only have you watch the videos and do the assignments for the things that like you didn't test well into.[00:42:27] So that's, it's, it's sort of close to already being there in some sense, but it doesn't have the, the language model interface on top of it before we[00:42:34] swyx: get into our lightning round, which is like, Quick response questions. Was there any other topics that you think you wanted to cover? We didn't touch on, whisper.[00:42:40] We didn't touch on Apple. Anything you wanted to[00:42:42] Logan Kilpatrick: talk?[00:42:43] Apple's Neural Engine[00:42:43] Logan Kilpatrick: Yeah, I think the question around Apple stuff and, and the neural engine, I think will be really interesting to see how it all plays out. I think, I don't know if you wanna like ask just to give the context around the neural engine Apple question. Well, well, the[00:42:54] swyx: only thing I know it's because I've seen Apple keynotes.[00:42:57] Everyone has, you know, I, I have a m M one MacBook Cure. They have some kind of neuro chip. , but like, I don't see it in my day-to-day life, so when is this gonna affect me, essentially? And you worked at Apple, so I I was just gonna throw the question over to you, like, what should we[00:43:11] Logan Kilpatrick: expect out of this? Yeah.[00:43:12] The, the problem that I've seen so far with the neural engine and all the, the Mac, and it's also in the phones as well, is that the actual like, API to sort of talk to the neural engine isn't something that's like a common you like, I'm pretty sure it's either not exposed at all, like it only like Apple basically decides in the software layer Yeah.[00:43:34] When, when it should kick in and when it should be used, which I think doesn't really like help developers and it doesn't, that's why no one is using it. I saw a bunch of, and of course I don't have any good insight on this, but I saw a bunch of rumors that we're talking about, like a lot of. Main use cases for the neural engine stuff.[00:43:50] It's, it's basically just in like phantom mode. Now, I'm sure it's doing some processing, but like the main use cases will be a lot of the ar vr stuff that ends up coming out and like when it gets much heavier processing on like. Graphic stuff and doing all that computation, that's where it'll be. It'll be super important.[00:44:06] And they've basically been able to trial this for the last, like six years and have it part of everything and make sure that they can do it cheaply in a cost effective way. And so it'll be cool to see when that I'm, I hope it comes out. That'll be awesome.[00:44:17] swyx: Classic Apple, right? They, they're not gonna be first, but when they do it, they'll make a lot of noise about it.[00:44:21] Yeah. . It'll be[00:44:22] Logan Kilpatrick: awesome. Sure.[00:44:22] Lightning Round[00:44:22] Logan Kilpatrick: So, so are we going to light. Let's[00:44:24] Alessio Fanelli: do it. All right. Favorite AI products not[00:44:28] Logan Kilpatrick: open AI. Build . I think synthesis. Is synthesis.io is the, yeah, you can basically put in like a text prompt and they have like a human avatar that will like speak and you can basically make content in like educational videos.[00:44:44] And I think that's so cool because maybe as people who are making content, like it's, it's super hard to like record video. It just takes a long time. Like you have to edit all the stuff, make sure you sound right, and then when you edit yourself talking it's super weird cuz your mouth is there and things.[00:44:57] So having that and just being able to ChatGPT A script. Put it in. Hopefully I saw another demo of like somebody generating like slides automatically using some open AI stuff. Like I think that type of stuff. Chat, BCG, ,[00:45:10] swyx: a fantastic name, best name of all time .[00:45:14] Logan Kilpatrick: I think that'll be cool. So I'm super excited,[00:45:16] swyx: but Okay.[00:45:16] Well, so just a follow up question on, on that, because we're both in that sort of Devrel business, would you put AI Logan on your video, on your videos and a hundred[00:45:23] Logan Kilpatrick: percent, explain that . A hundred percent. I would, because again, if it reduces the time for me, like. I am already busy doing a bunch of other stuff,[00:45:31] And if I could, if I could take, like, I think the real use case is like I've made, and this is in the sense of like creators wanting to be on every platform. If I could take, you know, the blog posts that I wrote and then have AI break it up into a bunch of things, have ai Logan. Make a TikTok, make a YouTube video.[00:45:48] I cannot wait for that. That's gonna be so nice. And I think there's probably companies who are already thinking about doing that. I'm just[00:45:53] swyx: worried cuz like people have this uncanny valley reaction to like, oh, you didn't tell me what I just watched was a AI generated thing. I hate you. Now you know there, there's a little bit of ethics there and I'm at the disclaimer,[00:46:04] Logan Kilpatrick: at the top.[00:46:04] Navigating. Yeah. I also think people will, people will build brands where like their whole thing is like AI content. I really do think there are AI influencers out there. Like[00:46:12] swyx: there are entire Instagram, like million plus follower accounts who don't exist.[00:46:16] Logan Kilpatrick: I, I've seen that with the, the woman who's a Twitch streamer who like has some, like, she's using like some, I don't know, that technology from like movies where you're like wearing like a mask and it like changes your facial appearance and all that stuff.[00:46:27] So I think there's, there's people who find their niche plus it'll become more common. So, cool. My[00:46:32] swyx: question would be, favorite AI people in communities that you wanna shout up?[00:46:37] Logan Kilpatrick: I think there's a bunch of people in the ML ops community where like that seemed to have been like the most exciting. There was a lot of innovation, a lot of cool things happening in the ML op space, and then all the generative AI stuff happened and then all the ML Ops two people got overlooked.[00:46:51] They're like, what's going on here? So hopefully I still think that ML ops and things like that are gonna be super important for like getting machine learning to be where it needs to be for us to. AGI and all that stuff. So a year from[00:47:05] Alessio Fanelli: now, what will people be the most[00:47:06] Logan Kilpatrick: surprised by? N. I think the AI is gonna get very, very personalized very quickly, and I don't think that people have that feeling yet with chat, BT, but I, I think that that's gonna, that's gonna happen and they'll be surprised in like the, the amount of surface areas in which AI is present.[00:47:23] Like right now it's like, it's really exciting cuz Chat BT is like the one place that you can sort of get that cool experience. But I think that, The people at Facebook aren't dumb. The people at Google aren't dumb. Like they're gonna have, they're gonna have those experiences in a lot of different places and I think that'll be super fascinating to see.[00:47:40] swyx: This is for the builders out there. What's an AI thing you would pay for if someone built it with their personal[00:47:45] Logan Kilpatrick: work? I think more stuff around like transfer learning for, like making transfer, learning easier. Like I think that's truly the way to. Build really cool things is transfer learning, fine tuning, and I, I don't think that there's enough.[00:48:04] Jeremy Howard who created Fasted AI talks a lot about this. I mean, it's something that really resonates with me and, and for context, like at Apple, all the machine learning stuff that we did was transfer learning because it was so powerful. And I think people have this perception that they need to.[00:48:18] Build things from scratch and that's not the case. And I think especially as large language models become more accessible, people need to build layers and products on top of this to make transfer learning more accessible to more people. So hopefully somebody builds something like that and we can all train our own models.[00:48:33] I think that's how you get like that personalized AI experiences you put in your stuff. Make transfer learning easy. Everyone wins. Just just to vector in[00:48:40] swyx: a little bit on this. So in the stable diffusion community, there's a lot of practice of like, I'll fine tune a custom dis of stable diffusion and share it.[00:48:48] And then there also, there's also this concept of, well, first it was textual inversion and then dream booth where you essentially train a concept that you can sort of add on. Is that what you're thinking about when you talk about transfer learning or is that something[00:48:59] Logan Kilpatrick: completely. I feel like I'm not as in tune with the generative like image model community as I probably should be.[00:49:07] I, I think that that makes a lot of sense. I think there'll be like whole ecosystems and marketplaces that are sort of built around exactly what you just said, where you can sort of fine tune some of these models in like very specific ways and you can use other people's fine tunes. That'll be interesting to see.[00:49:21] But, c.ai is,[00:49:23] swyx: what's it called? C C I V I Ts. Yeah. It's where people share their stable diffusion checkpoints in concepts and yeah, it's[00:49:30] Logan Kilpatrick: pretty nice. Do you buy them or is it just like free? Like open. Open source? It's, yeah. Cool. Even better.[00:49:34] swyx: I think people might want to sell them. There's a, there's a prompt marketplace.[00:49:38] Prompt base, yeah. Yeah. People hate it. Yeah. They're like, this should be free. It's just text. Come on, .[00:49:45] Alessio Fanelli: Hey, it's knowledge. All right. Last question. If there's one thing you want everyone to take away about ai, what would.[00:49:51] Logan Kilpatrick: I think the AI revolution is gonna, you know, it's been this like story that people have been talking about for the longest time, and I don't think that it's happened.[00:50:01] It was really like, oh, AI's gonna take your job, AI's gonna take your job, et cetera, et cetera. And I think people have sort of like laughed that off for a really long time, which was fair because it wasn't happening. And I think now, Things are going to accelerate very, very quickly. And if you don't have your eyes wide open about what's happening, like there's a good chance that something that you might get left behind.[00:50:21] So I'm, I'm really thinking deeply these days about like how that is going to impact a lot of people. And I, I'm hopeful that the more widespread this technology becomes, the more mainstream this technology becomes, the more people will benefit from it and hopefully not be affected in that, in that negative way.[00:50:35] So use these tools, put them into your workflow, and, and hopefully that will, and that will acceler. Well,[00:50:41] swyx: we're super happy that you're at OpenAI getting this message out there, and I'm sure we'll see a lot more from you in the coming months[00:50:46] Logan Kilpatrick: and years. I'm excited that this was awesome to be on. This is actually the first, my first in-person podcast.[00:50:52] I've done so many Yeah. Virtual podcasts over the, the covid years and it's, it's super fun to be in person and where the headphones in . Yeah.[00:51:00] swyx: We gotta shout out this studio. I mean, let's, let's get them a shout out Pod on[00:51:03] Alessio Fanelli: in San Francisco, California. Where should people find you? Social media.[00:51:08] Logan Kilpatrick: Twitter. It'll be interesting to see how that, the migration or not migration.[00:51:12] I was, I was pretty sold. I'm like everyone was getting off Twitter and then that seemed like that. It sort of was a network. Network effects are hard too. Yeah, it is hard. So Twitter, I'll see you on Twitter. Thanks so much coming. Thanks. Thanks for having me. This was awesome. Thank you, Logan. Get full access to Latent Space at www.latent.space/subscribe
51:3723/02/2023