Subscribe to our YouTube channel, or listen to the interviews on Spotify / Apple
What limits AI today isn't imagination; it's the cost of running it at scale.
In this episode of Inference, I sat down with Lin Qiao, co-founder & CEO of Fireworks AI, an inference-first company, and former head of PyTorch at Meta, where she led the rebuild of Meta's entire AI infrastructure stack.
We talk about:
Why product-market fit can be the beginning of bankruptcy in GenAI
The iceberg problem of hidden GPU costs
Why inference scales with people, not researchers
2025 as the year of AI agents (coding, hiring, SRE, customer service, medical, marketing)
Open vs closed models, and why Chinese labs are setting new precedents
The coming wave of 100× more efficient AI infrastructure
Watch to hear Lin's vision for inference, alignment, and the future of AI infrastructure. And, at the end, Lin shares her very personal journey to overcome fears. Watch it now ⬇️
This is a free edition. Upgrade if you want to receive our deep dives directly in your inbox. If you want to support us without getting a subscription, do it here.
This transcript was edited by GPT-5. Let me know what you think. And it's always better to watch the full video ⬇️
Ksenia Se:
Hello everyone, and welcome back to Inference, the interview series on Turing Post. Today I'm thrilled to talk with Lin Qiao, co-founder and CEO of Fireworks AI, and former head of PyTorch at Meta, where she led the rebuild of Meta's entire AI infrastructure stack. Welcome, Lin. Let's start with the big question: when will inference become a solved problem? What would it take for inference to feel like electricity: reliable, cheap, and invisible? And what still stands in the way?
Lin Qiao:
Thanks for having me. That's an interesting question. I think we're just at the starting point of optimizing inference, and there are many dimensions to look at. I want to start from the perspective of application developers, the cohort we care most about.
There's no question that GenAI is a revolutionary technology. It can generate content on par with, or beyond, what humans produce, and that's its biggest value. Because of this, it's safe to predict we'll see many generational companies emerge, defining new user experiences that never existed before. They'll disrupt industries and change how we interact with software day to day.
Through that lens, we're already seeing a lot of innovation. But there's an interesting phenomenon: in traditional startups, once you hit product-market fit, you scale, and that's how you build a viable business. With GenAI applications, hitting product-market fit and having a viable business are two different problems.
You can create a new user experience that delivers tremendous value to consumers or developers, but that doesn't mean you can quickly scale to a viable business. The cost structure is so much higher. Everything around GPUs (the infrastructure, the operations) is orders of magnitude more expensive than building traditional apps on CPUs.
We often hear stories from companies that say: we're confident in our product, the signals are great, we even have a waiting list of millions of users... but we can't open the floodgates, because if we do, we'll run out of money. In other words, in GenAI, hitting product-market fit can actually be the beginning of bankruptcy.
Ksenia: That's very interesting. It really is something new.
Lin:
Yes, and it's fundamental. You can visualize it like an iceberg. Right now, a huge iceberg of GenAI applications is being built, but most of it is still submerged under the waterline because infrastructure costs are so high. If those costs shrink by even 10x, the number of applications emerging above the waterline will be enormous. That's where the future is.
When it comes to how infrastructure costs shrink, there are many approaches. I'll share our observations from working with leading application providers.
Ksenia: So what's your approach? How do you make this iceberg smaller?
Lin:
It boils down to a fundamental misalignment between two sets of data. On one side, you have the data used to train foundation models in research labs, whether open or closed. Those labs define objectives, design problem statements, and curate datasets to produce the outcomes they want.
On the other side, you have application developers. Their goal is product design that maximizes user engagement. They constantly experiment with features and collect product data. That data distribution is built for a completely different purpose.
So when app developers use these foundation models to power their products, they inherit this misalignment. And that's the root cause of the gaps we see in accuracy, latency, and efficiency.
Some young companies have figured out how to close this gap, aligning their product data with the models to build systems that are faster, cheaper, and more accurate. That allows them to scale beyond product-market fit into viable businesses. But the majority still treat models as utilities: they send requests to the API without addressing the underlying misalignment.
That's the dynamic space we see right now. And it's where we're trying to help: enabling application developers to close that alignment gap.
Ksenia: So you're working with enterprises to align those two data streams. Tell me a little about your journey. You founded Fireworks in October 2022, before ChatGPT. Why then? And how did your vision and approach change once generative AI really boomed?
Lin:
Our founding team is large, and many of us worked at Meta for seven to ten years, essentially bootstrapping Meta's AI infrastructure across both training and inference. When we started Fireworks in September 2022, we had the option to focus on either side, since PyTorch is used for both. At that time, most people were focused on training: building models, calling GPUs for training, or creating training infrastructure. We made a strategic decision to go all in on inference.
Why? Because inference scales fundamentally differently. Training scales with a small pool of researchers. Inference scales with consumers and developers, with the entire world population as the upper bound. The production requirements are higher, the complexity greater, and those are the kinds of problems we wanted to solve.
Looking back, that choice set us apart. It let us build an inference toolchain sophisticated enough to make us the best provider on that side of the stack. Our approach ties back to the data alignment problem I mentioned earlier. We don't believe in "one size fits all." Instead, we believe in "one size fits one." Every application workload is different, and we optimize for each one.
The analogy we use is a database. A database doesn't treat every query the same; it runs a query optimizer that figures out the most efficient execution plan. We apply the same idea to inference, but it's even more complex. We built what we call our 3D optimizer, which optimizes across three dimensions simultaneously: quality, speed, and cost.
The challenge is the search space: there are many underlying components, each with dozens of options, leading to hundreds of thousands of possible combinations. We're finding the one needle in that haystack. But the good news is, we're very good at solving these kinds of problems. Today, nearly all Fireworks customers use our 3D optimizer.
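To make that search concrete, here is a minimal sketch of what optimizing across quality, speed, and cost could look like. Everything in it (the dimensions, the toy scoring functions, the weights) is a hypothetical illustration, not Fireworks' actual optimizer:

```python
# Minimal sketch of a multi-objective search over inference configurations.
# Hypothetical illustration only; not Fireworks' actual 3D optimizer.
from itertools import product

# Each dimension has a handful of options; their cross product is the search space.
SEARCH_SPACE = {
    "quantization": ["fp16", "fp8", "int8"],
    "batch_size": [1, 8, 32],
    "tensor_parallel": [1, 2, 4],
}

def quality(cfg):  # higher is better; aggressive quantization costs accuracy
    return {"fp16": 1.00, "fp8": 0.98, "int8": 0.95}[cfg["quantization"]]

def speed(cfg):    # higher is better; batching and parallelism help throughput
    return cfg["batch_size"] * cfg["tensor_parallel"]

def cost(cfg):     # lower is better; more GPUs per request costs more
    return cfg["tensor_parallel"] / cfg["batch_size"]

def search(min_quality=0.97):
    best, best_score = None, float("-inf")
    keys = list(SEARCH_SPACE)
    for values in product(*SEARCH_SPACE.values()):
        cfg = dict(zip(keys, values))
        if quality(cfg) < min_quality:       # hard constraint on accuracy
            continue
        score = speed(cfg) - 10 * cost(cfg)  # trade speed against cost
        if score > best_score:
            best, best_score = cfg, score
    return best

print(search())
```

A production optimizer would measure these scores on the customer's real workload rather than using closed-form stand-ins, and the combinatorics would be far larger, but the shape of the search is the same.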
Ksenia: It must be hard to explain that level of complexity to enterprises.
Lin:
That's what happens under the hood. With enterprises, we map it to business value and use cases. And right now, in 2025, the big theme is agents. Startups and enterprises alike are building them.
There are coding agents, of many types, that dramatically increase developer productivity. Hiring agents that take a job profile, source candidates, run interviews, and assess performance. SRE agents that debug and triage production issues during incidents. And customer service agents, which are hugely popular: some enterprises have 20,000+ human agents today, and making them more productive translates to massive cost savings.
We also see marketing agents that can automatically design outbound campaigns targeted to specific audiences. And adoption is spreading across verticals: medical, retail, education, finance, and more.
When we talk to enterprises, we frame the impact of our 3D optimizer through these case studies. That lands better than talking only in technical terms.
Ksenia: And how do you talk to them about models? What's your approach? Do you lean toward general-purpose models, or smaller, narrower ones?
Lin:
We believe strongly in developing in the open. Our business model is mainly focused on open models because they give enterprises transparency and control, something they care about deeply.
That said, we also look at it from the user's perspective. Their goal isn't to make open models successful; their goal is to solve business problems and deliver impact. They'll use whatever tool helps them do that. So in enterprise engagements, we provide a "cookbook" for building an AI gateway. It connects to whatever model providers they want, and we're one of those providers.
We help standardize that stack. We also give them private evaluation benchmarks so they can objectively compare models for different use cases. If a closed model works better, we'll simply show them the report and let them decide. And if they want to tune an open model for the best quality, we provide the tools. Our principle is simple: meet customers where they are, rather than forcing them into a vendor's frame.
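As a rough sketch of the gateway pattern Lin describes: one thin routing layer in front of several providers, with a routing table chosen by private benchmarks. The provider names, models, and routing rule below are placeholders, not the actual cookbook:

```python
# Minimal sketch of an AI gateway: one interface, many model providers.
# All names and the routing table are hypothetical placeholders.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Provider:
    name: str
    send: Callable[[str, str], str]  # (model, prompt) -> completion

def fake_send(model: str, prompt: str) -> str:
    # Stub standing in for a real HTTP call to the provider's API.
    return f"[{model}] response to: {prompt}"

PROVIDERS = {
    "fireworks": Provider("fireworks", fake_send),
    "closed_lab": Provider("closed_lab", fake_send),
}

# Routing table: which provider/model won the private benchmark per use case.
ROUTES = {
    "customer_service": ("fireworks", "open-model-a"),
    "contract_review": ("closed_lab", "closed-model-b"),
}

def complete(use_case: str, prompt: str) -> str:
    provider_name, model = ROUTES[use_case]
    return PROVIDERS[provider_name].send(model, prompt)

print(complete("customer_service", "Where is my order?"))
```

The point of the pattern is that swapping a model in or out is a one-line change to the routing table, driven by evaluation results rather than vendor lock-in.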
Ksenia: Speaking of open models: Chinese labs have been on fire lately. DeepSeek R1, Kimi K2, GLM from Zhipu... they've set new precedents for everyone. Why do you think they've been so successful?
Lin:
It's fascinating, especially since they often have more resource constraints (fewer powerful GPUs), yet they've achieved remarkable results. I see this as a sign of convergence. Closed and open models are starting to look similar in quality.
At the end of the day, two factors matter: training techniques and data. On techniques, talent flows globally now. People move, research is shared. There's less "secret sauce" than before. On data, the distributions that determine model quality are also converging. Everyone draws from similar public datasets and works with the same labeling companies. That levels the playing field.
So the real race becomes: how do you generate more and better data? Large models generate synthetic data to train smaller ones. That's capital intensive, but synthetic data quality is improving. Everyone's experimenting.
Another frontier is training with inference in mind. Instead of just optimizing training-time quality, labs are tweaking architectures to improve inference-time performance, making models faster and cheaper to run without losing accuracy. We've seen a lot of creative work here, especially from recent releases. Expect more small but important architectural tweaks rather than massive leaps.
Overall, I think base models in text are converging. But in multimodality (voice, vision, video), closed models are ahead. They've invested heavily there, while open models are more focused on reasoning, coding, and tool use for agents. Multimodal will catch up, but text-based models will converge faster.
Ksenia: Yes, multimodal is much more expensive, so it makes sense for closed labs to protect their inventions. But I agree, open will catch up. I'm curious about something else. Chinese companies like Zhipu (now ZAI) tie open source to their AGI vision. In the U.S., it's different. Your former employer Meta recently suggested they might not open source everything; Mark Zuckerberg said they'd be more cautious. Why the difference?
Lin:
I don't think Meta has made a final call. There are still healthy internal debates about priorities: whether to focus on product to drive revenue, or on ecosystem to consolidate around LLaMA. That's just business strategy.
At the same time, Google and others in the U.S. are also driving open models. And contributing to the open community is no small task; the bar is high. Anyone releasing a new model has to show strong benchmarks against the latest open and closed models. That competition raises the quality bar for everyone.
So I remain excited. The open ecosystem will only get stronger because no one can afford to lower the bar. Each new release becomes a demonstration of research depth, talent density, and technical output: good for the company's reputation, and good for the broader community.
Ksenia: If we talk about superintelligence and AGI, what's your take? Is it about solving intelligence, or more about building better tools? Or do you not think about it that way at all?
Lin:
We have a strong opinion on this. Fireworks wants to provide value by making application developers shine. Our role is to build the best tools and infrastructure for them, so they can easily create a data flywheel from their product.
Product data aligns with the model, making the model better tuned to the application. A better model drives better user engagement. More engagement produces more data. More data improves the model again. That's the virtuous cycle we want developers to build with Fireworks.
So our position is clear: we add value as tools and infrastructure. We want to empower developers to create social value that shows up in everyday life, things my mom might use and say, "this is really cool." And I'll be able to tell her, "that app runs on Fireworks." That's the impact we're aiming for.
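That flywheel can be written down as a loop. In this toy sketch every function is a hypothetical stand-in for real product, training, and serving systems; the only point is how the cycle compounds:

```python
# Toy sketch of the data flywheel: product data -> tuned model -> more
# engagement -> more product data. All functions are hypothetical stand-ins.
def tune(model: str, data: list) -> str:
    # Fine-tuning step: align the model with product data.
    return f"{model}>tuned_on_{len(data)}"

def measure_engagement(logs: list) -> int:
    # Engagement proxy: a better-tuned model attracts more interactions.
    return len(logs) * 2

model = "base-model"
logs = ["first user interaction"]
for cycle in range(3):
    model = tune(model, logs)                    # product data aligns the model
    new_interactions = measure_engagement(logs)  # better model, more engagement
    logs += [f"interaction_{cycle}_{i}" for i in range(new_interactions)]
    print(f"cycle {cycle}: {len(logs)} examples for the next tuning round")
```

Each pass through the loop leaves more data for the next tuning round, which is why the cycle is a flywheel rather than a one-off optimization.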
Ksenia: When do you think AI infrastructure will become much easier and lighter, really everywhere?
Lin:
We've already shown what's possible. With our 3D optimizer we've accelerated inference speed and reduced cost by 4x to 10x, sometimes even improving quality at the same time. That's the power we've demonstrated, and we'll keep pushing it.
But in the bigger picture, I believe we'll see 100x more efficient infrastructure. Look at CPUs: from single core to dual core to many-core, every generation brought better price-performance and lower manufacturing cost with scale. The same thing will happen with GPUs, ASICs, and accelerators. Hardware will get more efficient, and the infrastructure around it will too.
Ksenia: Your approach is pragmatic. You're building the AI world as it comes. What excites you most, and what concerns you most, about this world you're building?
Lin:
What excites me is the pace. This is a generational technology shift, bigger than cloud-first or mobile-first. Every day we wake up and go to sleep thinking about how to solve problems in this new paradigm. It sparks intellectual curiosity and creativity, not just in our team but across the whole community.
What keeps me up at night is balance. Our principles are "customer first" and "high velocity of innovation." But moving fast has a flip side: infrastructure must also be stable and reliable. Striking that balance is critical. We want to innovate quickly, scale features quickly, and support customization quickly, but always on a foundation that enterprises can trust. Managing that tension is the challenge.
Ksenia: Thank you. My last question: what is a book or idea that shaped how you think about leadership and the future?
Lin:
I can't point to one book. It's more my life experience: the people I've worked with, the internal journey of learning to think differently. Seventeen years ago I was a completely different person. I had passion, but also an inner voice saying, "others can do this better, not you." It took me years to quiet that voice.
The change came because people pushed me: they challenged me, forced me to embrace the idea that I could accomplish things, and gave me a different imagination of what was possible. That journey shaped me deeply.
It also gave me an insight I share with others: often the real limitation is your inner voice, not the external world. That's why I believe finding people who challenge you, who make you uncomfortable, is a blessing. It's painful, but if you face it, even if you fail sometimes, you come out a different person on the other side.
Ksenia: So your book is the people around you.
Lin:
Definitely. The people who challenge you and push you through the tunnel of discomfort are the ones who shape who you become.
Ksenia: That's very inspirational. Thank you so much for this interview.
Lin:
Thank you for having me. I had a lot of fun.
