Subscribe to our YouTube channel, or listen to the interviews on Spotify / Apple.
What limits AI today isn't imagination. It's the cost of running it at scale.
In this episode of Inference, I sat down with Lin Qiao, co-founder & CEO of Fireworks AI (an inference-first company) and former head of PyTorch at Meta, where she led the rebuild of Meta's entire AI infrastructure stack.
We talk about:
Why product-market fit can be the beginning of bankruptcy in GenAI
The iceberg problem of hidden GPU costs
Why inference scales with people, not researchers
2025 as the year of AI agents (coding, hiring, SRE, customer service, medical, marketing)
Open vs closed models, and why Chinese labs are setting new precedents
The coming wave of 100× more efficient AI infrastructure
Why Product-Market Fit Can Bankrupt a GenAI Startup
Watch to hear Lin's vision for inference, alignment, and the future of AI infrastructure. And at the end, Lin shares her very personal journey of overcoming her fears. Watch it now.
This is a free edition. Upgrade if you want to receive our deep dives directly in your inbox. If you want to support us without getting a subscription, do it here.
This transcript is edited by GPT-5. Let me know what you think. And it's always better to watch the full video ⬇️
Ksenia Se:
Hello everyone, and welcome back to Inference, the interview series on Turing Post. Today I'm thrilled to talk with Lin Qiao, co-founder and CEO of Fireworks AI, and former head of PyTorch at Meta, where she led the rebuild of Meta's entire AI infrastructure stack. Welcome, Lin. Let's start with the big question: when will inference become a solved problem? What would it take for inference to feel like electricity: reliable, cheap, and invisible? And what still stands in the way?
Lin Qiao:
Thanks for having me. That's an interesting question. I think we're just at the starting point of optimizing inference, and there are many dimensions to look at. I want to start from the perspective of active operators, the cohort we care most about.
There's no question that GenAI is a revolutionary technology. It can generate content on par with or beyond human interaction with the real world, and that's its biggest value. Because of this, it's safe to predict we'll see many generational companies emerge, defining new user experiences that never existed before. They'll disrupt industries and change how we interact with software day to day.
Through that lens, we're already seeing a lot of innovation. But there's an interesting phenomenon: in traditional startups, once you hit product-market fit, you scale, and that's how you build a viable business. With GenAI applications, hitting product-market fit and having a viable business are two different problems.
You can create a new user experience that delivers tremendous value to consumers or developers, but that doesn't mean you can quickly scale to a viable business. The cost structure is so much higher. Everything around GPUs (the infrastructure, the operations) is orders of magnitude more expensive than building traditional apps on CPUs.
We often hear stories from companies that say: we're confident in our product, the signals are great, we even have a waiting list of millions of users… but we can't open the floodgates, because if we do, we'll run out of money. In other words, in GenAI, hitting product-market fit can actually be the beginning of bankruptcy.
Ksenia: That's very interesting. It really is something new.
Lin:
Yes, and it's fundamental. You can visualize it like an iceberg. Right now, a huge iceberg of GenAI applications is being built, but most of it is still submerged under the waterline because infrastructure costs are so high. If those costs shrink by even 10x, the number of applications emerging above the waterline will be enormous. That's where the future is.
As for how infrastructure costs can shrink, there are many approaches. I'll share our observations from working with leading application providers.
The Alignment Gap: Why Foundation Models Don't Fit Your Product
Ksenia: So what's your approach? How do you make this iceberg smaller?
Lin:
It boils down to a fundamental misalignment between two sets of data. On one side, you have the data used to train foundation models in research labs, whether open or closed models. Those labs define objectives, design problem statements, and curate datasets to produce the outcomes they want.
On the other side, you have application developers. Their goal is product design that maximizes user engagement. They constantly experiment with features and collect product data. That data distribution is built for a completely different purpose.
So when app developers use these foundation models to power their products, they inherit this misalignment. And that's the root cause of the gaps we see in accuracy, latency, and efficiency.
Some young companies have figured out how to close this gap, aligning their product data with the models to build systems that are faster, cheaper, and more accurate. That allows them to scale beyond product-market fit into viable businesses. But the majority still treat models as utilities: they send requests to the API without addressing the underlying misalignment.
That's the dynamic we see right now. And it's where we're trying to help, enabling application developers to close that alignment gap.
Ksenia: So you're working with enterprises to align those two data streams. Tell me a little about your journey. You founded Fireworks in October 2022, before ChatGPT. Why then? And how did your vision and approach change once generative AI really boomed?
Lin:
Our founding team is large, and many of us worked at Meta for seven to ten years, essentially bootstrapping Meta's AI infrastructure across both training and inference. When we started Fireworks in September 2022, we had the option to focus on either side, since PyTorch is used for both. At that time, most people were focused on training: building models, calling GPUs for training, or creating training infrastructure. We made a strategic decision to go all in on inference.
Why? Because inference scales fundamentally differently. Training scales with a small pool of researchers. Inference scales with consumers and developers, with the entire world population as the upper bound. The production requirements are higher, the complexity greater, and those are the kinds of problems we wanted to solve.
Looking back, that choice set us apart. It let us build an inference toolchain sophisticated enough to make us the best provider on that side of the stack. Our approach ties back to the data alignment problem I mentioned earlier. We don't believe in "one size fits all." Instead, we believe in "one size fits one." Every application workload is different, and we optimize for each one.
The analogy we use is a database. A database doesn't treat every query the same; it runs a query optimizer that figures out the most efficient execution plan. We apply the same idea to inference, but it's even more complex. We built what we call our 3D optimizer, which optimizes across three dimensions simultaneously: quality, speed, and cost.
The challenge is the search space: there are many underlying components, each with dozens of options, leading to hundreds of thousands of possible combinations. We're finding the one needle in that haystack. But the good news is, we're very good at solving these kinds of problems. Today, nearly all Fireworks customers use our 3D optimizer.
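Fireworks has not described the internals of its 3D optimizer here, so the following is only a toy sketch of the general idea Lin describes: treat serving as a search over configurations and pick the one that best balances quality, speed, and cost. Everything in it is a hypothetical assumption: the knobs (`precision`, `batch_size`, `speculative_decoding`), their values, and the made-up `estimate` cost model, which a real system would replace with measurements on the customer's workload. Here the three dimensions are handled as minimum-quality and maximum-latency constraints plus cost minimization, which is one of several reasonable formulations.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical knobs and values, chosen only for illustration.
SEARCH_SPACE = {
    "precision": ["fp16", "fp8", "int4"],
    "batch_size": [1, 8, 32],
    "speculative_decoding": [False, True],
}

# Toy per-precision effects: quality penalty and relative compute time.
QUALITY_PENALTY = {"fp16": 0.0, "fp8": 0.01, "int4": 0.04}
RELATIVE_COMPUTE = {"fp16": 1.0, "fp8": 0.7, "int4": 0.5}


@dataclass(frozen=True)
class Estimate:
    quality: float     # higher is better (toy eval score)
    latency_ms: float  # lower is better
    cost: float        # lower is better (relative units per request)


def estimate(precision: str, batch_size: int, speculative_decoding: bool) -> Estimate:
    """Toy cost model (an assumption); a real system would measure these."""
    compute = RELATIVE_COMPUTE[precision]
    quality = 0.90 - QUALITY_PENALTY[precision]
    # Larger batches raise per-request latency but amortize GPU cost.
    latency = 100 * compute * (1 + batch_size / 32)
    cost = compute / batch_size
    if speculative_decoding:
        latency *= 0.6  # toy assumption: fewer sequential decoding steps
        cost *= 1.1     # toy assumption: extra draft-model work
    return Estimate(quality, latency, cost)


def optimize(min_quality: float, max_latency_ms: float):
    """Return (config, estimate) for the cheapest config meeting both targets, or None."""
    best = None
    keys = list(SEARCH_SPACE)
    for values in product(*SEARCH_SPACE.values()):
        config = dict(zip(keys, values))
        est = estimate(**config)
        if est.quality < min_quality or est.latency_ms > max_latency_ms:
            continue  # violates the quality or speed constraint
        if best is None or est.cost < best[1].cost:
            best = (config, est)
    return best


if __name__ == "__main__":
    config, est = optimize(min_quality=0.88, max_latency_ms=100)
    print(config, est)
```

The toy space has only 3 × 3 × 2 = 18 configurations, so exhaustive enumeration is fine. With, say, four components at 20 options each, there are already 20^4 = 160,000 combinations, which is why a production optimizer would need pruning or smarter search rather than brute force.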
How Fireworks AI Optimizes Inference: Quality, Speed, and Cost
Ksenia: It must be hard to explain that level of complexity to enterprises.
Lin:
That's what happens under the hood. With enterprises, we map it to business value and use cases. And right now, in 2025, the big theme is agents. Startups and enterprises alike are building them.
There are coding agents, of many types, that dramatically increase developer productivity. Hiring agents that take a job profile, source candidates, run interviews, and assess performance. SRE agents that debug and triage production issues during incidents. Customer service agents, which are hugely popular, since some enterprises have 20,000+ human agents today. Making them more productive translates to massive cost savings.
We also see marketing agents that can automatically design outbound campaigns targeted to specific audiences. And adoption is spreading across verticals: medical, retail, education, finance, and more.
When we talk to enterprises, we frame the impact of our 3D optimizer through these case studies. That lands better than talking only in technical terms.
AI Agents in 2025: Coding, Hiring, SRE, and Beyond
Ksenia: And how do you talk to them about models? What's your approach? Do you lean toward general-purpose models, or smaller, narrower ones?
Lin:
We believe strongly in developing in the open. Our business model is mainly focused on open models because they give enterprises transparency and control, something they care about deeply.
That said, we also look at it from the user's perspective. Their goal isn't to make open models successful; their goal is to solve business problems and deliver impact. They'll use whatever tool helps them do that. So in enterprise engagements, we provide a "cookbook" for building an AI gateway. It connects to whatever model providers they want, and we're one of those providers.
We help standardize that stack. We also give them private evaluation benchmarks so they can objectively compare models for different use cases. If a closed model works better, we'll simply show them the report and let them decide. And if they want to tune an open model for the best quality, we provide the tools. Our principle is simple: meet customers where they are, rather than forcing them into a vendor's frame.
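The gateway pattern Lin describes can be sketched as a single entry point that fronts several model providers and routes each use case according to private evaluation results. This is a minimal, hypothetical illustration, not Fireworks' cookbook: the `Gateway` class, its `evaluate`/`route`/`complete` methods, the provider names, and exact-match accuracy as the scoring metric are all assumptions, and providers are stubbed as plain Python functions instead of real API clients.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A provider maps a prompt to a completion. In practice this would wrap an
# API client for a hosted open or closed model; here it is a plain function.
Provider = Callable[[str], str]
Benchmark = List[Tuple[str, str]]  # (prompt, expected answer) pairs


@dataclass
class Gateway:
    """Hypothetical AI gateway: one entry point, many model providers."""
    providers: Dict[str, Provider]
    # use_case -> provider name -> score on that use case's private benchmark
    reports: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def evaluate(self, use_case: str, benchmark: Benchmark) -> Dict[str, float]:
        """Score every provider on a private benchmark (exact-match accuracy)."""
        report = {}
        for name, provider in self.providers.items():
            hits = sum(provider(prompt) == expected for prompt, expected in benchmark)
            report[name] = hits / len(benchmark)
        self.reports[use_case] = report
        return report

    def route(self, use_case: str) -> str:
        """Pick the best-scoring provider for a use case, open or closed."""
        report = self.reports[use_case]
        return max(report, key=report.get)

    def complete(self, use_case: str, prompt: str) -> str:
        """Send the prompt to whichever provider currently wins for this use case."""
        return self.providers[self.route(use_case)](prompt)


if __name__ == "__main__":
    answers = {"2+2": "4", "capital of France": "Paris"}
    gw = Gateway({
        "tuned-open-model": lambda p: answers.get(p, ""),            # stub
        "closed-api-model": lambda p: "4" if p == "2+2" else "?",    # stub
    })
    print(gw.evaluate("qa", list(answers.items())))
    # {'tuned-open-model': 1.0, 'closed-api-model': 0.5}
    print(gw.complete("qa", "capital of France"))  # Paris
```

Keeping the per-use-case report around mirrors the "show them the report and let them decide" step: a team could inspect `gw.reports` and override the automatic choice rather than always routing to the top score.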
Ksenia: Speaking of open models, Chinese labs have been on fire lately. DeepSeek R1, Kimi K2, GLM from Zhipu… they've set new precedents for everyone. Why do you think they've been so successful?
Lin:
It's fascinating, especially since they often have more resource constraints (fewer powerful GPUs), yet they've achieved remarkable results. I see this as a sign of convergence. Closed and open models are starting to look similar in quality.
At the end of the day, two factors matter: training techniques and data. On techniques, talent flows globally now. People move, research is shared. There's less "secret sauce" than before. On data, the distributions that determine model quality are also converging. Everyone draws from similar public datasets and works with the same labeling companies. That levels the playing field.
So the real race becomes: how do you generate more and better data? Large models generate synthetic data to train smaller ones. That's capital intensive, but synthetic data quality is improving. Everyone's experimenting.
Another frontier is training with inference in mind. Instead of just optimizing training-time quality, labs are tweaking architectures to improve inference-time performance, making models faster and cheaper to run without losing accuracy. We've seen a lot of creative work here, especially from recent releases. Expect more small but important architectural tweaks rather than massive leaps.
Overall, I think base models in text are converging. But in multimodality (voice, vision, video), closed models are ahead. They've invested heavily there, while open models are more focused on reasoning, coding, and tool use for agents. Multimodal will catch up, but text-based models will converge faster.
Why Chinese AI Labs Are Setting New Benchmarks
Ksenia: Yes, multimodal is much more expensive, so it makes sense for closed labs to protect their inventions. But I agree: open will catch up. I'm curious about something else. Chinese companies like Zhipu (now Z.ai) tie open source to their AGI vision. In the U.S., it's different. Your former employer Meta recently suggested they might not open source everything; Mark Zuckerberg said they'd be more cautious. Why the difference?
Lin:
I don't think Meta has made a final call. There are still healthy internal debates about priorities: whether to focus on product to drive revenue, or on ecosystem to consolidate around Llama. That's just business strategy.
At the same time, Google and others in the U.S. are also driving open models. And contributing to the open community is no small task: the bar is high. Anyone releasing a new model has to show strong benchmarks against the latest open and closed models. That competition raises the quality bar for everyone.
So I remain excited. The open ecosystem will only get stronger because no one can afford to lower the bar. Each new release becomes a demonstration of research depth, talent density, and technical output: good for the company's reputation, and good for the broader community.
Ksenia: If we talk about superintelligence and AGI, what's your take? Is it about solving intelligence, or more about building better tools? Or do you not think about it that way at all?
Lin:
We have a strong opinion on this. Fireworks wants to provide value by making application developers shine. Our role is to build the best tools and infrastructure for them, so they can easily create a data flywheel from their product.
Product data aligns with the model, making the model better tuned to the application. A better model drives better user engagement. More engagement produces more data. More data improves the model again. That's the virtuous cycle we want developers to build with Fireworks.
So our position is clear: we add value as tools and infrastructure. We want to empower developers to create social value that shows up in everyday life, things my mom might use and say, "this is really cool." And I'll be able to tell her, "that app runs on Fireworks." That's the impact we're aiming for.
Ksenia: When do you think AI infrastructure will become much easier and lighter, really everywhere?
Lin:
We've already shown what's possible. With our 3D optimizer we've accelerated inference speed and reduced cost by 4x to 10x, sometimes even improving quality at the same time. That's the power we've demonstrated, and we'll keep pushing it.
But in the bigger picture, I believe we'll see 100x more efficient infrastructure. Look at CPUs: from single core to dual core to many-core, every generation brought better price-performance and lower manufacturing cost with scale. The same thing will happen with GPUs, ASICs, and accelerators. Hardware will get more efficient, and infrastructure around it will too.
Ksenia: Your approach is pragmatic. You're building the AI world as it comes. What excites you most, and what concerns you most, about this world you're building?
Lin:
What excites me is the pace. This is a generational technology shift, bigger than cloud-first or mobile-first. Every day we wake up and go to sleep thinking about how to solve problems in this new paradigm. It sparks intellectual curiosity and creativity, not just in our team but across the whole community.
What keeps me up at night is balance. Our principles are "customer first" and "high velocity of innovation." But moving fast has a flip side: infrastructure must also be stable and reliable. Striking that balance is critical. We want to innovate quickly, scale features quickly, and support customization quickly, but always on a foundation that enterprises can trust. Managing that tension is the challenge.
Lin Qiao on Leadership: Overcoming the Inner Voice
Ksenia: Thank you. My last question: what is a book or idea that shaped how you think about leadership and the future?
Lin:
I can't point to one book. It's more my life experience: the people I've worked with, the internal journey of learning to think differently. Seventeen years ago I was a completely different person. I had passion, but also an inner voice saying, "others can do this better, not you." It took me years to quiet that voice.
The change came because people pushed me: they challenged me, forced me to embrace the idea that I could accomplish things, and gave me a different imagination for what was possible. That journey shaped me deeply.
It also gave me insight I share with others: often the real limitation is your inner voice, not the external world. That's why I believe finding people who challenge you, who make you uncomfortable, is a blessing. It's painful, but if you face it, even if you fail sometimes, you come out a different person on the other side.
Ksenia: So your book is the people around you.
Lin:
Definitely. The people who challenge you and push you through the tunnel of discomfort are the ones who shape who you become.
Ksenia: That's very inspirational. Thank you so much for this interview.
Lin:
Thank you for having me. I had a lot of fun.

