🎙️When Will We Give AI True Memory?
An Inference with Edo Liberty / CEO and co-founder @ Pinecone
Hi everyone – hope the weekend's treating you well. Turing Post now has a proper YouTube channel – ‘Inference’, and you can listen to our podcast on all major platforms: YouTube, Spotify or Apple Podcasts.
We’re also working on more video content – both serious and fun. A few interesting things will be coming from Microsoft Build next week – we are visiting their Quantum Computing Lab and Applied Sciences Lab. Stay tuned ——>
What happens when one of the architects of the modern vector database asks whether AI can remember like a seasoned engineer, not a goldfish savant? In this episode, Edo Liberty – founder & CEO of Pinecone and one‑time Amazon scientist – joins me to discuss memory, knowledge, and the intelligence of models. We unpack the gap between raw cognitive skill and workable knowledge, why RAG still feels pre‑ChatGPT, and the breakthroughs needed to move from demo‑ware to dependable memory stacks.
Edo explains why a vector database needs to be built from the ground up (and then rebuilt many times), argues that storage – not compute – has become the next hardware frontier, and predicts a near‑term future where ingesting a million documents is table stakes for any serious agent. We also touch on the thorny issues of truth, contested data, and whether knowledgeable AI is an inevitable waypoint on the road to AGI.
Whether you wrangle embeddings for a living, scout the next infrastructure wave, or simply wonder how machines will keep their facts straight, this conversation will sharpen your view of “memory” in the age of autonomous agents.
Let’s find out when tomorrow’s AI will finally remember what matters.
This is a free edition. Upgrade if you want to receive our deep dives directly in your inbox. If you want to support us without getting a subscription – do it here.
The transcript (edited for clarity, brevity, and sanity) ⬇️
Ksenia: Hi, Edo, thank you for joining! Let me start from the big question: When will we give AI true memory?
Edo Liberty: It's a fantastic question. Let me slightly zoom out and spend a minute explaining how knowledge and memory differ and what roles they play in true intelligence.
Foundation models today really specialize in cognitive skills: reading, writing, summarizing, reasoning, problem solving, math, and so on – those are computational problems. Reading all the Boeing technical manuals and then being able to go and replace some part in an engine, for example, requires a completely different kind of machinery.
Okay, something has to be able to read, consume, understand, organize, and index all of that information in some way, to make it available in real time for decision making. So if you're an airplane mechanic, there is a wealth of information at your fingertips to make those decisions. That's knowledge. That's memory. That's information. To make all of this work, you have to do all of these steps.
It's not enough to digest it correctly – you also have to understand it correctly, organize it correctly, access it correctly in real time, post-process it, and so on. It's a very complex system. Today RAG (retrieval-augmented generation) has become the standard approach to having at least a very, very crude version of this. People have many different variations of RAG and search, combined with models, with MCPs, and other things. Based on what I can see – in terms of quality, reasoning, and capabilities – on the knowledge front we are now roughly where models were, you know, maybe pre-ChatGPT.
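For readers who have never wired one up, here is a minimal, self-contained sketch of the RAG pattern Edo is describing. It is deliberately a toy: the bag-of-words "embedding" and brute-force cosine search stand in for real embedding models and a vector database such as Pinecone, and the assembled prompt is printed where a real pipeline would call an LLM.

```python
# Toy RAG loop: retrieve the most relevant documents, then assemble a prompt.
# Bag-of-words vectors and cosine similarity stand in for learned embeddings
# and a real vector database; the LLM call is left as a plain print.
import math
from collections import Counter

DOCS = [
    "Replace the fuel filter every 400 flight hours.",
    "The hydraulic pump sits behind access panel 12.",
    "Torque the mounting bolts to 35 newton meters.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': a term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    # A real pipeline would send this prompt to an LLM for grounded generation.
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("Where is the hydraulic pump?"))
```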
There are good heuristics in the industry that already start to do something qualitatively better than we could before. But we're still very far away from being truly good at this.
We work with thousands of customers who use our vector database as the basis for RAG. And almost everybody does something very basic first. Then there are like 7,000 things they want to do on top of it. That's where the industry is right now. We've broken the first barriers, we have amazing ideas, we have infrastructure in place, we have a vector database, we have models, we have all the components. In terms of truly unlocking this, we're on our way there. I think we'll get there fully – an end-to-end automated system that truly understands memory and has all the information available to it – probably within a few years.
Ksenia: How many years?
Edo: It's hard to say. These things tend to move faster than I predict. I think within two years, a lot of people will consider this problem semi-solved. And by semi-solved, I mean they'll rely on something like Pinecone to ingest a million documents — and on agents that can pull the right context from Pinecone at the right time to make good decisions. In the same way they’ve come to expect language models to behave reasonably.
Ksenia: So when you say we're far away… Let's address your scientist-founder side: What breakthroughs do we need to achieve that?
Edo: There are multiple components that need to improve independently – and then be brought together to really work. Pinecone is well known for leading the vector database space, and a vector database is essentially a search engine – infrastructure that stores data and performs the raw search.
If you want to search over hundreds of millions or even billions of documents, with filters, complex embeddings, and all that – you need seriously strong infrastructure. It’s like training large models: you can’t do it without GPUs. Funny enough, in both cases, the key unlock turned out to be hardware.
We had to develop TensorFlow and PyTorch to be a lot better at GPU acceleration and distribution – a lot of software investment just to be able to train large models. Pretty much the whole vector database space, and our journey into it, is about being able to throw an effectively infinite amount of data into the same knowledge machine, right? Because that's how you unlock it. The scaling just looks different: it's not a compute thing, it's a storage thing.
But that’s not nearly enough.
We have our own research team developing embeddings trained specifically for retrieval, so your data is organized in a way that lets you fetch the right information based on context.
We're about to release a new version of our contextual token-based model – because the same word in different documents can mean slightly different things, carry more or less importance depending on the context. That nuance really matters for search, and the only way to capture it is with models. So we train those too.
We combine sparse and dense search – roughly speaking, searching by words and concepts versus by meaning, context, and overall relevance. You need both. That’s what humans do, too.
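One common way to combine the two kinds of search he describes is to run a sparse (keyword) query and a dense (embedding) query separately and then merge the ranked results, for example with reciprocal rank fusion. The sketch below is illustrative only, not a description of Pinecone's internals; the document ids are made up.

```python
# Reciprocal rank fusion (RRF): merge a sparse (keyword) result list and a
# dense (embedding) result list into a single ranking. Documents that rank
# well in both lists float to the top.
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document ids; better ranks contribute more."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

sparse_hits = ["doc_manual_7", "doc_faq_2", "doc_spec_9"]   # keyword-match order
dense_hits  = ["doc_spec_9", "doc_manual_7", "doc_blog_4"]  # embedding-match order
print(rrf([sparse_hits, dense_hits]))
```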
But even that’s not enough. In Assistant – and we’re now breaking this off into its own component – the query itself has to be trained.
When you send a query, it’s not like a traditional search where you just match words. If the query is tied to a task, then something – an LLM or another model – needs to interpret the situation you're in and figure out what information it needs to fetch in order to make a good decision. That process becomes a kind of knowledge agent or search agent. I’m not even sure what to call it yet. But it's iterative – you keep pulling in information, and at some point, you go, okay, I’ve got enough context to decide.
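Here is a rough sketch of that iterative loop, under the assumption that an LLM (or another model) plays the roles of judging sufficiency and rewriting the query. The three helpers are deliberately trivial stand-ins so the example runs on its own; none of them is a real Pinecone or LLM API.

```python
# Sketch of the iterative "search agent" loop: keep retrieving until the
# model judges it has enough context to make a decision.
def retrieve(query: str) -> list[str]:
    return [f"passage about: {query}"]            # stand-in for a vector-DB query

def enough_context(task: str, context: list[str]) -> bool:
    return len(context) >= 3                      # stand-in for an LLM judgment

def next_query(task: str, context: list[str]) -> str:
    return f"{task} (follow-up #{len(context)})"  # stand-in for LLM query rewriting

def gather_context(task: str, max_steps: int = 5) -> list[str]:
    context: list[str] = []
    query = task
    for _ in range(max_steps):
        context += retrieve(query)
        if enough_context(task, context):
            break
        query = next_query(task, context)
    return context

print(gather_context("diagnose intermittent hydraulic pressure drop"))
```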
And we’re not even touching the full stack – data connectors, PDF parsing – there’s a whole universe of supporting software that has to be built to make this work.
We’re also actively improving all of the following:
We host existing models because people like them.
We ship our own models too.
We’ve built and open-sourced a re-ranker, a coherency ranker, and others for post-processing.
Our Assistant product handles everything end-to-end.
And we’re launching our own context agent, which I mentioned earlier.
And the core DB itself – that's our ability to work at massive, massive, massive scale and make it extremely cheap, so that you can operate at those scales and not lose sleep over the fact that you're burning through your cloud budget. It makes a big difference for companies.
Ksenia: When you first pitched the idea of a vector database – back when you were just out of Amazon – you said almost no one understood the concept. After the ChatGPT boom, vector databases became a hot category with a lot of competition. Now some people say it's becoming commoditized – just another layer of infrastructure. You mentioned something similar with GPUs. Do you agree with that? And where do you think the future of vector databases is headed?
Edo: I am incredibly lucky and fortunate to be one of the very few founders who actually started a category. This is a huge privilege. Some of it is insight, and a lot of it is just dumb luck and serendipity and good timing, which again was a version of luck. I didn't know ChatGPT was going to happen. Almost everybody was surprised by the timing of that. And you know, almost the definition of a category is that you have competition. Otherwise it's not a category.
We're being pushed by our customers to be more efficient. We're being pushed to innovate on our technology and our architecture. That’s a journey, and we’re going to keep going: making our vector database faster, bigger, more cost effective, more performant, more feature rich, more secure and so on, more stable.
At the same time, there are also small open-source solutions, and incumbent databases like OpenSearch and others are adding vector search to their offerings. But the truth is that those work fine at small scale. What we see at Pinecone is that people come to us because they say: yep, I did the proof-of-concept (POC), it worked great. Now I'm ready to go to production. Now I'm ready to scale. And I see performance degrade, costs go up, stability start looking a little bit shaky. Just managing this thing becomes a pain in the butt. As an engineer, I'm suddenly on the hook to maintain this thing in production. I'm like: I didn't sign up to be a database admin! So when people need a large vector database, when people need production grade, when they need scale and they need this to be cheap and fast and reliable and managed, they come to us. So I have no doubt that vector databases are a category – and that a vector database has to be built from the ground up to do this well. Like I told you, we've already evolved our architecture multiple times – each time to unlock a 10x improvement in scale. Even though we designed it specifically as a vector database, we've still had to revisit and evolve the architecture repeatedly. At this point, we're operating at a level that's far beyond what almost any node-based solution can handle.
Ksenia: I think you told me at HumanX that you had to rewrite the whole architecture. Is it right and why did you need to do that?
Edo: To explain that to you, I need to take you on a little historical trip through the kinds of workloads that vector databases have seen. Circa 2020, people didn’t have a lot of data, but they were very aspirational in terms of high throughput. So there were very computationally intense workloads. You’d have one or two or 10 million vectors, but you’d need to query them 1,000 times a second. That requires super advanced algorithms, high-performance computing, and efficient data structures. These are core internals of the DB. We spent several years just optimizing that and becoming extremely good at it.
Then indices started becoming larger. People got used to the idea of vector databases. They got more comfortable with vector embeddings. They said, OK, now we want to run search at large scale. So people started vector searching over a billion vectors – multiple billions. Now distribution becomes incredibly difficult.
By the way, it’s not only that the amount of data became huge – the ratio between compute and storage changed meaningfully, because now those systems are memory-bound, storage-bound, network-bound. They're very rarely CPU-bound. So if you just try to take the high-performance computing thing and replicate it, it will be nauseatingly expensive.
So we built our own serverless solution to be able to fan out and cohabitate thousands of users, so that when you search, everybody can use the same CPUs and share the same storage. Then you can actually get high performance on a massive amount of data even though specific use cases don’t saturate the CPU – that’s the only way you can actually do this cost-effectively.
Interestingly enough, the reason why we had to redo our architecture – or at least change it meaningfully – in the last six months is because a third pattern is now becoming more common. And that third pattern is vector databases – or at least datasets – that are massive, even bigger than the ones I mentioned before. They could be tens or hundreds of billions of vectors. Some customers even talk to us about trillions. We're like, OK, fine, this is starting to push the boundary here. But they’re not searching everything at the same time.
Those 100 billion vectors, say, are actually siloed into maybe a few tens of thousands of shards. As an example, we work very closely with Rubrik. They provide agents on top of folders. So now every folder, or every set of folders, or every user is a different index, right? But they have millions of those. So now you have millions of small indices. By small, I mean anything from 100,000 to a few million vectors.
Now, this is a big departure from the original setup. Before, we said: every index is massive. The object you call an index can be very heavy. You can have all sorts of bookkeeping and indices and substructures and folder structures. If you spend 100 megabytes of memory on each index – of course, much more than that on disk – just on the optimization for the index, that’s fine. It’s going to be great. But when you have 10 million of those, that’s not going to work. It just doesn’t work at all. So you have to rewrite everything. You really have to make sure that you organize data a lot more effectively.
The last thing I’d say is: because we are a fully serverless system, we have no idea, to begin with, if an index like that is going to have 100 vectors or 10,000 or 10 million. We have no idea.
We have no idea if it’s going to be queried once a month or a hundred times a second. We just don’t know. So we also had to rebuild our own LSM structure so that every level has its own indexing – based on how much usage, and how deep you are in the hierarchy.
Because we do have a lot of very small indices. And for those, you really don’t want to index anything. You want to be super scrappy and have almost no structure. The bigger it is, the longer you have the data, the more it's queried – the more you can justify spending the energy to index it better.
And I think a lot of the discussion in the database industry is like: What algorithm is best? What should I use? How should I configure things? And so on.
Our experience is that there is no answer. You have to use all of them.
If you're an email provider and you have a million users – the top 20% never search at all. Then there's maybe 60% in your torso who have some normal behavior. And then you have heavy users who do something altogether different. And they might have 10x as much data. That behavior – you're going to need your database to choose the right indexing at the right level, and all that stuff, dynamically. Otherwise, you're not going to be able to manage it.
That's another big, big, big differentiator. And a completely new thing – we have to be very dynamic with our own data structures too, to be able to operate across all those operating points.
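To make the idea concrete, here is a toy illustration of choosing an indexing strategy per index based on its size and query rate, rather than using one algorithm everywhere. The thresholds and strategy names are invented for the example; they are not Pinecone's actual tiers.

```python
# Pick how much indexing work an index deserves from its size and query rate.
# Small, rarely-queried indices get almost no structure; large, hot indices
# justify building and maintaining a heavier index.
from dataclasses import dataclass

@dataclass
class IndexStats:
    num_vectors: int
    queries_per_day: float

def choose_strategy(stats: IndexStats) -> str:
    if stats.num_vectors < 10_000 and stats.queries_per_day < 10:
        return "no index: brute-force scan on demand"
    if stats.num_vectors < 1_000_000:
        return "lightweight index: coarse clustering, built lazily"
    return "full index: graph/IVF structure, maintained continuously"

for stats in [IndexStats(500, 1), IndexStats(200_000, 50), IndexStats(50_000_000, 5_000)]:
    print(stats.num_vectors, "->", choose_strategy(stats))
```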
Ksenia: That's super interesting. As part of the infrastructure for, let's say, agentic workflows – with so many dynamic parts, constantly rethinking and rebuilding the architecture, and a basically limitless amount of data and queries – what questions about memory or infrastructure are we not asking enough right now, on a bigger scale?
Edo: There again, I have my own bias because I occupy some part of the stack. So take that as a disclaimer on my answer, right?
I am both horrified and delighted that people assume LLMs, knowledge agents, search, RAG, and things like Pinecone and Assistant just work and are perfect.
It's horrifying because they're not perfect yet.
But it's also amazing, because it means that the people working on it – our scientists, our engineers, and everybody at our level of the stack – have a lot of work and a lot of figuring out to do. The questions we don't ask enough as a technology community are: what does knowledge really mean? What do we expect from these systems? How accurate do they need to be, and in what setting? What does accuracy even mean?
For example, we always provide references and so on. We have to make sure that all the information you get back is grounded in information you gave us, because you never want it to hallucinate. But is that too high a bar? I don't know. Maybe it is for some applications. How do you break out of that? Do you need to break out of that? There are also really deep questions about what to do with contested information. Sometimes one source has a point of view and another has the opposite point of view, both in the data. What do you do with that?
The same questions about truth that we struggle with as a society come up with data as well. The fact that something is much more common doesn't make it more true. And models are by nature Bayesian: if you see something more often, you learn it more strongly, and you tend to give that answer more.
So the direct correlation in AI between how common or frequent something is and how truthful the model thinks it is – that's a bug. That's a problem. We have to fix that. There are deep questions around how we process data, understand it, make it coherent, and make it accessible in a way that society can start to trust. I love it, because there's easily 20 years of research ahead of us – and we're nowhere near done. If I had 50 PhDs working on this, they'd all be very busy!
Ksenia: Can you give me a short description of what you mean by knowledgeable AI?
Edo: Yeah, we touched on this a bit, but let me try to give you a more concise answer.
You need a system that can take large amounts of unstructured, unorganized information, digest it, organize it, and then – in real time – surface the insights and data needed to complete a task, answer a question, or solve a problem.
That entire pipeline – from ingestion to insight – is what I call knowledge. What you want is an agent that carries historical context and can bring it to bear in the right moment, in the right setting, to get the job done. That’s it.
Ksenia: Do you see memory and knowledgeable AI as stepping stones toward AGI? And more broadly, what's your take on AGI?
Edo: 100%. I don’t think you can be intelligent without being knowledgeable. It’s just not possible.
Take your primary care doctor – it’s not enough that they have a high IQ. You want them to have gone to medical school, to have actually understood what they read, to remember it well enough to apply it intelligently. IQ alone doesn’t cut it. In fact, they were probably quicker on their feet at 25 than at 50 – but you trust them more now because they’ve accumulated knowledge.
And if you look at the parts of the economy that benefit most from AI – it's all knowledge work. Lawyers, accountants, patent editors, musicians, artists. If AI is going to help those people, it has to retain information and be truly knowledgeable. Otherwise, it's just a spell check on steroids – tweak this, write that – and that's a miss.
Ksenia: I have two last questions. One is about books. I think books form people. What book formed you and is there a particular book or idea you keep returning to as you build Pinecone's future?
Edo: Wow. That's a great question – partly because I don't get to read as much as I would want to as a CEO.
Ksenia: That's a problem!
Edo: There's a book I read many years ago that I still come back to, often just thinking about it. I reread it again recently, and actually bought it for some people on the team.
It's called Endurance. It's about the South Pole expedition of Ernest Shackleton and his team, which starts with absolute disaster when their ship gets stranded. It's an unbelievable journey. But for me, as a CEO, there are so many lessons in it – on leadership, on hope, on resourcefulness, on the limits of the human spirit.
Just imagining people going through what they went through is awe-inspiring. There's something about the human spirit there that is, for me, incredibly inspiring.
Ksenia: That's a great suggestion. I'm making a list of such books. It's always good to have different perspectives on what forms you and what keeps you going as a leader, as a scientist. So thank you. That's great.
Getting back to the AI field: when you think ahead – let's say five years – what excites you or concerns you the most about the future, about the world you're helping build?
Edo: I’ll say two things. One is that I’ll reiterate what I said before about truth and knowledge.
Those are, to some extent, technology challenges – but a lot of them are just human challenges. In our world, agreeing on what’s true is not easy, even if you have access to all the information. So it’s not even a question of sources or intelligence. It’s a political problem. It’s a people problem. It’s a human problem. It’s something we have to deal with as a society.
And we are grappling with them all day long. For Pinecone specifically, it's significantly easier because we work with companies and their data – so they define the desired outcome of the product they're building. We sort of get a free pass on that, a little bit. But even then, we still have to grapple with some of it. I think in AI more broadly, this will become a big issue.
Just like search engines had to deal with these issues in the early 2000s, and social networks had to deal with bias, filter bubbles, information silos – all of that will become an AI problem too. These are human problems, not just technology problems.
The second thing I’d say – and this doesn’t worry me as much, but it’s still something we’ll have to deal with – is that there’s a bit of a race between how solid, well-understood, and trustworthy the technology is, versus what you choose to use it for. Ideally, those evolve together.
I want to see us get better – as a society – at curbing unethical or irresponsible uses of AI when we spot them, rather than letting them take off just because they’re profitable and no one wants to shut them down.
We’ll need to get ahead of that. And for me – I’m not in politics or law enforcement – the only way I can fight that is by making the technology better, faster, more trustworthy, more understood, more… you know, more manageable.
Ksenia: Well, these are two concerns. Is there anything that excites you?
Edo: A ton excites me. The whole thing is incredibly exciting. We’re seeing an absolute sea change in how pretty much every profession is practiced. I’ve never seen anything like this in my lifetime – maybe the internet had a similar effect, but that’s about it.
I have young kids in primary school. They already see AI the way 30-year-olds think about the internet. They can’t even imagine a world where you couldn’t talk to a machine and have it respond – talk back, do things, whatever.
It sounds insane to them. The world has already changed – and it’s going to keep changing. The value we’ll unlock from this is massive. Companies will be smaller. People will be doing more. A lot of the annoying, menial cognitive tasks – summarizing, taking notes, entering stuff into systems after meetings – all of that is going to disappear. It’ll be done better, faster, and cheaper. That’s just the societal and economic impact.
For me, as a scientist and engineer, it’s the privilege of actually building this – influencing how it works, and getting to have those “wait, this actually works” moments. It’s like, oh my god. Those eureka moments are unbelievable.
And the fact that we have a platform that serves tens of thousands of developers – it means that just weeks or a couple of months after a breakthrough, we can ship it. People actually use it. That’s the icing on the cake.
Ksenia: Thank you very much for the conversation today.