This website uses cookies

Read our Privacy policy and Terms of use for more information.

Dear readers, apologies for the delay – flu season has hit North Connecticut hard, and we haven't been spared. Stay safe!

ElevenLabs seems to be everywhere. In January 2025, Lex Fridman’s three-hour interview with Ukrainian President Zelenskyy in Kyiv featured ElevenLabs’ AI-powered translations in English, Ukrainian, and Russian, flawlessly preserving Zelenskyy’s voice and intonations. It was a striking showcase of AI’s ability to bridge language barriers.

Ask anyone, and they’ll say: just try ElevenLabs – it’s incredible. Their partners? A powerhouse lineup across industries. If an ethical concern threatens to cast a shadow over their technology, ElevenLabs moves fast, flipping potential bad publicity into another win. They just seem to do everything right – except that they never share their research and don’t open-source.

A few weeks ago, they closed another big round, raising $108 million at a $3.3 billion valuation – investors were lining up to get in. How did two guys from Poland push ElevenLabs to the forefront of AI voice technology? Why does everyone love them, despite their secrecy?

Let’s explore their journey, dominance in AI voice, and strategy for staying everywhere – and beloved.

In today’s episode:

  • How it all started – dubbing in Poland sucks

  • Research-first company with direct outreach to businesses

  • ChatGPT comes in handy – Investors want GenAI

  • Very wrong predictions

  • Laser-focus business strategy

  • Financial situation – everyone wants a piece

  • Market size

  • Is ElevenLabs Worth It? Pricing and Free Plan

  • Products - Conversational AI

  • What Is the ElevenLabs Controversy?

  • “Open-source” to promote

  • Key competitors – big and small

  • Final Thoughts

  • Bonus: Resources

How ElevenLabs Was Founded

Mati Staniszewski and Piotr Dabkowski have known each other forever – since their teenage years at Copernicus High School in Warsaw. That’s a pretty tight bond, but when you know someone that well, working together can be tough. Still, these two managed to turn their shared passion for technology into a long-term collaboration.

Over the years, they loved their occasional weekend projects, always tackling problems that mattered to both of them. Life, meanwhile, went on. Both Piotr and Mati moved to the UK for college – Mati studied mathematics at Imperial College London, while Piotr pursued computer science at Oxford/Cambridge – further sharpening their technical expertise and entrepreneurial instincts. Then came seven years of working at big-name companies: Opera Software, BlackRock, and Palantir for Mati; Tessian and Google for Piotr.

That could have been the path forever, but one of their weekend projects suddenly revealed a potential solution to a frustration they had shared since childhood: the state of movie dubbing in Poland. Foreign films, often voiced by a single monotone narrator, sounded just awful. While experimenting with speech analysis, Mati and Piotr became intrigued by the nuances of pronunciation, emotion, and tonality in voice. It was a moment of clarity – if voice synthesis could be improved to capture authentic emotion and character, it had the potential to redefine content accessibility worldwide. And maybe, finally, save Polish people from terrible dubbing!

Not waiting too long, in April 2022, Mati and Piotr took their weekend project to the next level and started ElevenLabs.

ElevenLabs' Early Strategy: Research and B2B Outreach

In 2022, ElevenLabs emerged as a research-first company with a mission to make high-quality content universally accessible in any language and voice. Mati became the CEO of the company, and Piotr its CTO. Interestingly, the company initially operated without a physical office and with a lean team of only 15 employees. For the first six months or so, both partners concentrated on the research aspect and building the product. It was all about Text-to-speech (TTS).

TTS is a system that converts written text into natural-sounding spoken language using machine learning. You’ve experienced TTS many times – when talking to Siri or Alexa. At its core, TTS involves several stages: first, text normalization and phoneme conversion, which standardize and encode the input; then, an encoder-decoder architecture (often a sequence-to-sequence model with attention) translates these linguistic representations into intermediate acoustic features like mel-spectrograms. Finally, a neural vocoder (such as WaveNet or its variants) generates the raw audio waveform from these spectrograms. This end-to-end approach leverages deep learning to capture prosody, intonation, and naturalness, enabling highly expressive and realistic speech synthesis.

ElevenLabs was not satisfied with TTS in Siri and Alexa. They dove into research to create something much more emotionally realistic. Finally, they launched their product in Beta in January 2023, and Mati – fully embracing the role of CEO in a lean startup – began reaching out to potential clients, trying to prove ElevenLabs in the field.

Mati reached out to me too. Unfortunately, Turing Post hasn’t yet existed then and TheSequence was not interested in making its content more accessible with voiceover. Anyhow, Mati – manually – reached out to the content creators he wanted to work with and build a sufficient network of use cases. 

By June 2023, over 1 million users has registered on their AI platform. By that time, ElevenLabs has forged several B2B partnerships, collaborating with major industry players such as Storytel, one of the largest audiobook publishers, TheSoul Publishing, a top global content creator platform, renowned game developers like Embark Studios and Paradox Interactive, and the creative media platform MNTN.

Mati was building B2B relationships, and the team concentrated on working towards:

  • Speech synthesis: They focused on long-form TTS that considers context, ensuring natural-sounding speech.

  • Voice cloning: Unlike traditional methods requiring large datasets, ElevenLabs’ system can replicate voices with just one minute of data.

  • Compression: Their models compress speech data 100x more than MP3, allowing for high-quality, efficient encoding.

How ChatGPT Timing Helped ElevenLabs Raise Funding

ElevenLabs raised their first pre-seed round in January 2023 from the friendly Czech fund Credo Ventures, which supports technology founders from Central and Eastern Europe, and Concept Ventures, the UK's largest dedicated pre-seed fund. 

Remember those times? ChatGPT was launched on November 30, 2022, and investors went into a frenzy over anything that even hinted at generative AI. Money poured in, valuations skyrocketed, and suddenly, every startup was "revolutionizing" something with large language models. The thing about ElevenLabs was that by then they’ve already had a proprietary AI model capable of creating natural-sounding, contextually aware voices and had a product they could introduce to clients. They were officially and no-BS generative AI startup.

ElevenLabs' Early Product Roadmap

I think, in the beginning, they didn’t fully appreciate the revolution that LLMs had kicked off in the generative space, with multimodal models on the horizon. In April 2023, in an interview, Mati was making predictions for 5 and 10 years ahead:

  • 5 years: AI-powered dubbing surpassing human quality, enabling Hollywood-level movie translations.

  • 10 years: Real-time voice translation that preserves speaker identity and emotions, removing language barriers in global communication.

I would not be surprised if by May-June – the time for the Series A round – Mati and Piotr reviewed those predictions and urgently embarked on the task of improving their offering with new capabilities that large language models provided.

All in all, ElevenLabs was in an absolutely perfect situation timewise, being a proven Generative AI startup with a product already in beta, and then they executed a perfect business strategy.

ElevenLabs Business Model & Go-to-Market Strategy

ElevenLabs' journey from research to productization has been deliberate and strategic, focusing on industries where high-quality, scalable voice AI can have the most impact. Instead of chasing broad applications, they identified media (newspapers, newsletters), entertainment (film and TV), and publishing (audiobooks) – as an area with an urgent need for better long-form speech synthesis. Endeavor estimated that these industries collectively spent an estimated $6B a year producing high-quality voiceovers for content – a huge opportunity for this two-year-old upstart.

These high-visibility sectors come with built-in audiences, making them a perfect PR amplifier. When a widely followed newsletter adopts your technology, it naturally spreads the word to its readers – who then want to try it themselves. Mati, a former Palantir strategist, had mapped this out from the start: B2B first, then consumer adoption through organic exposure.

They also aimed to disrupt multiple industries:

  • Education: Making content accessible in multiple languages at scale.

  • Gaming: giving voices to characters in many languages

  • Real-time communication: live translation and voice-assisted interactions.

By focusing first on perfecting long-form speech synthesis, they laid the groundwork for broader applications in AI-driven voice transformation. 

What sets ElevenLabs apart is that their strategy was never just about dubbing or cutting costs – they relentlessly pursued entirely new possibilities for voice AI, expanding its role beyond what anyone expected.

ElevenLabs Funding Rounds and Investors

Their strategy paid off well. With clients from all sorts of industries piling up, the investors were knocking at their door. As a result, ElevenLabs is well-backed by VC behemoths with the ability to choose who they take money from.

ElevenLabs Market: AI Voice & Text-to-Speech Industry Size

The AI voice cloning market, valued at $1.45 billion in 2022, is expected to grow at 26.1% CAGR through 2030, driven by demand for AI-driven speech technologies. ElevenLabs is well-positioned in this market, particularly in audiobooks ($5B market, projected to hit $35B by 2030) and enterprise communications. Additionally, its work in assistive technology – helping patients regain expressive speech – taps into a $25B market spanning ALS support, stroke recovery, and elder care. And then there those estimated by Endeavor $6B a year that are spent for producing high-quality voiceovers for content. However you look at it – there are lucrative options for ElevenLabs to thrive.

Is ElevenLabs Worth It? Pricing and Free Plan

ElevenLabs operates a subscription-based SaaS model, generating revenue through tiered pricing tied to text-to-speech character processing volume.

It also runs a voice marketplace, allowing creators to monetize their voice profiles, adding an additional revenue stream. It’s an interesting offering that both addresses frequent accusations that GenAI steals human jobs and strengthens a key part of ElevenLabs community. It gives voice artists a space to share their work while letting users easily discover new voices for their projects.

Revenue Structure

ElevenLabs has experienced significant revenue growth since its launch in 2022. As of October 2024, the company's estimated annual recurring revenue (ARR) was $80 million, a dramatic increase from $25 million at the beginning of the year.

We didn’t find any confirmations about if the company is yet profitable. Its revenue is primarily generated through its AI voice platform, which is used by over 40% of Fortune 500 companies. Key enterprise customers include media companies like the Washington Post and TIME, gaming studios like Paradox Interactive, and publishing houses like HarperCollins.

Products - Conversational AI

ElevenLabs offers AI-driven voice synthesis products, including text-to-speech (TTS), voice cloning, and dubbing tools. Their platform enables users to generate realistic speech in 32 languages, suitable for audiobooks, video voiceovers, and more. The Voice Library provides a vast collection of voices for various projects, while the Dubbing Studio facilitates audio and video translation, preserving the original speaker's emotion and tone. Additionally, the ElevenLabs Reader app allows users to listen to written content across multiple languages.

Throughout this period, ElevenLabs has also developed various speech synthesis models optimized for different use cases, quality levels, and performance requirements. These include models like "Eleven Multilingual v2," known for its lifelike, emotionally rich speech synthesis across 29 languages, and "Eleven Flash v2.5," a fast, affordable model supporting 32 languages with ultra-low latency.

What is ElevenLabs’ Conversational AI?

Finally, utilizing LLMs capabilities to the fullest, recently ElevenLabs introduced their Conversational AI platform designed to deploy customized, interactive voice agents that facilitate natural, human-like interactions. It integrates several key components:

  • Speech-to-Text (STT): Accurately transcribes user speech into text.

  • Large Language Models (LLMs): Processes the transcribed text to understand context and generate appropriate responses.

  • Text-to-Speech (TTS): Converts the generated text responses back into natural-sounding speech.

Additionally, the platform features advanced turn-taking and interruption handling mechanisms, ensuring smooth and responsive conversations. Users can select from a vast library of voices or clone their own to match specific needs. The system also supports integration with external applications through function calling, enabling real-time information retrieval and action execution. This flexibility allows for a wide range of applications, including customer support agents, virtual tutors, interactive game characters, and more.

What Is the ElevenLabs Controversy?

There is a bunch of interesting research on TTS, Conversational AI, LLMs etc – but none from ElevenLabs.

In January 2023, just a few days after ElevenLabs opened up their Beta, some bad actors, of course, used the technology for pranks, cloning the voices of famous people. Reacting fast, the company introduced a $5 starter tier, as they said, just to be sure that everyone who plays with the technology can be identified. Twitter immediately burst out cursing, predicting the early death of the startup because of a soon-to-come open-source project with the same capabilities.

We are in February 2025 now, and though there are open-source projects like TorToise TTS, ElevenLabs is alive and thriving – without having publicly shared formal research papers or open-sourcing any of their technology. While their website references a "Research Lab" focused on voice generation, it does not provide links to specific publications. This suggests that their research remains internal and is not yet available through traditional academic channels. A search for technical reports also yielded no results, indicating that the company keeps its technical details private, possibly for competitive reasons or intellectual property protection.

They remain quite secretive about their technology. At the same time, they offer such an intuitive user experience that once trying it, people get hooked on its simplicity and beauty.

Is ElevenLabs Open Source?

ElevenLabs maintains a GitHub repository with open-source code and documentation related to how implement their products. These “open-source” contributions are more about promoting their ecosystem rather than fully embracing open-source principles. While they provide developer tools, like their Python API and example projects on GitHub, these are primarily designed to make integration with their proprietary models easier. They don’t open-source their core voice synthesis technology or models, meaning users still rely on their API and paid services. It’s more of a strategic move – giving developers just enough to build with their tools while keeping the core technology closed and monetized. It’s “open” for convenience, not in the true spirit of open-source freedom.

ElevenLabs Competitors: AI Voice Market Overview

ElevenLabs faces competition from both AI-native startups and established tech giants:

1. AI-native startups

Several companies are carving out niches in voice synthesis, offering distinct approaches:

  • MURF.AI, Play.ht, and WellSaid Labs – These startups focus on AI-generated voiceovers for content creators, providing synthetic speech solutions for videos, audiobooks, and corporate training.

  • Descript – A prominent tool for content creators, Descript offers AI voice cloning for podcasts, YouTube, and professional editing.

  • Replica Studios – Specializes in gaming and immersive experiences, generating unique character voices for interactive storytelling.

  • Resemble AI – Offers real-time voice cloning and synthetic speech for personalized content and dynamic conversations.

  • Voicery – Focuses on real-time voice interactions with an emphasis on emotion and natural intonation.

2. Big Tech players

The largest competitors in the AI voice space remain Google, Amazon, and OpenAI, though their approach differs:

  • Google Cloud TTS & Amazon Polly – These platforms provide cloud-based speech synthesis but have struggled with integrating the latest AI advances into existing ecosystems. Their models often lack the expressiveness and customization that ElevenLabs delivers.

  • OpenAI’s Voice Models – While OpenAI has integrated voice capabilities into ChatGPT, its focus is primarily on multimodal AI, rather than specializing in lifelike synthetic speech.

Key Differentiator:
ElevenLabs has outpaced these larger firms by being hyper-focused on AI voice synthesis, allowing for faster iteration and superior product-market fit.

Final Thoughts

ElevenLabs' edge lies in its hyper-realistic, emotionally rich AI voices, outperforming Big Tech’s generic TTS solutions. Its well-thought-out strategy of partnerships and showcases (like the Lex Fridman podcast) both drives revenue and keeps the company consistently in the news, maintaining visibility and irritating other startup competitors. Its real-time responsiveness, user-friendly interface, developer-friendly API, and seamless integration make it the go-to choice for media, gaming, and publishing. Strong ethical safeguards, including consent mechanisms and deepfake detection, reinforce trust. A thriving creator marketplace adds network effects, while a high-growth SaaS model fuels rapid revenue expansion. They are so good at all of this, it’s almost boring. As ElevenLabs continues to expand, staying ahead in both technology and market adoption will be key. ElevenLabs is well-positioned for the future.

Thank you for reading and supporting Turing Post 🤍 We appreciate you

Reply

Avatar

or to participate

Keep Reading