Updates:
June 11, 2024
This French startup, founded in April 2023 with the ambitious goal of challenging the European Union's technological supremacy, has earned both admiration and skepticism. What sets Mistral AI apart is its focus on open-source technology and its bold approach, unapologetically offering models devoid of safety controls. According to a list of 178 questions and answers composed by AI safety researcher Paul Röttger and 404 Media’s testing, Mistral AI’s models have been churning out some rather dicey advice. The content generated by Mistral AI's models has ignited debates on morality, spanning topics from ethnic cleansing to retrograde discrimination, even venturing into unsettling DIY territory.
In December 2023, only 7 months after their launch, Mistral AI shot up the charts, becoming a GenAI unicorn with a valuation exceeding $2 billion. They also unconventionally launched an open-source model, Mixtral 8x7B, based on the sparse mixture-of-experts technique, via a torrent link! Who are these bold French innovators, what drives them, why is the Mixtral model so efficient, who supports them, and why? Let’s find out.
Starting point of Mistral AI
The founders' (or France's?) vision
Founder’s views toward AI risks
Financial situation
It took them four months to roll out their first LLM
Next step: Mixtral – understanding SMoE architecture and what makes that model so efficient
How does Mistral make money?
Conclusion

Starting point of Mistral AI
“The foundational layer story isn't written yet. There are still many things to invent. And that's what we're starting doing,” Mensch told Sifted. “That's why we left our companies that weren't innovative enough — that’s why we started Mistral AI.”
Arthur Mensch (CEO), along with his co-founders Timothée Lacroix (CTO) and Guillaume Lample (Chief Science Officer), go way back – to their AI-studying days at École Polytechnique and École Normale Supérieure. Lacroix and Lample both started at Facebook in 2014 as interns and eventually ended up in Meta’s Paris AI hub. Mensch joined DeepMind's Paris office in 2020, working on “Large language modeling - multimodal models - retrieval”. According to their first lead investor, Lightspeed Venture Partners, “During his time at DeepMind, Arthur was a lead contributor to the Retro, Flamingo, and Chinchilla projects, gaining valuable experience in optimizing large language models. Guillaume led the development of LLaMa LLM along with Timothée.”

Then the friends started mulling over where AI was headed and how they could create a credible open-source alternative and make Europe – specifically Paris, France – a main hub for it.
“It is a market where, in Europe, many actors won’t be willing to rely on American providers,” says Mensch. “There is a geographical stake here that we are willing to exploit.”
The founders' (or France's?) vision
Recently, prominent French AI leaders such as Yann LeCun of Meta and Clément Delangue of Hugging Face have been actively promoting French tech achievements on Twitter. This effort culminated in a partnership between Meta, Hugging Face, and Scaleway at Paris's Station F, signaling a shift in the global tech landscape. France, with its academic excellence and government support, aims to become an open-source AI capital. Mistral's rise first caught our attention in November 2023, when we covered France's ambitions in our global AI digest.
Mistral AI fits neatly into this narrative. Their website provides an extensive explanation of why the trio is building Mistral AI, emphasizing community-backed model development to combat censorship and bias and offering open-weight models as a credible alternative to AI oligopolies. Still, it appears more like the founding trio of Mensch, Lacroix, and Lample found themselves in the right place at the right time, with the right expertise.
The question remains – with the European Union's intended stringent control of AI and the unclear position of open-source in this model – how will Mistral AI comply?
Founder’s views toward AI risks
At a recent gathering of tech leaders in the halls of Bletchley Park, England, one face stood out from the Silicon Valley crowd. Among the luminaries at the AI Safety Summit, Mistral AI's CEO Mensch was the lone European voice, holding court with about 30 other executives after a more intimate Day Two session, in contrast to the previous day's discussion among a crowd of 100. He is also the only co-founder who has been publicly vocal about the company’s work.
His stance is that foundation models are primarily tools for developers, and it's their responsibility to guarantee safe usage, rather than that of the startups that create these models. In a long tweet, Mensch clarified that they argue against regulating foundation models themselves, comparing it to not regulating the C language because of potential misuse. Instead, they advocate for regulating the usage of AI applications. Mistral AI criticizes the AI Act's approach to systemic risks and its unclear taxonomy for determining model capabilities. They argue that the current AI Act could hinder the European AI ecosystem by creating a divide between large, compliant companies and smaller, innovative ones.
To their credit, Mistral AI reacts immediately when someone calls out the lack of guardrails (it released instructions on how to add guardrails to the model) or complains that the company is not open enough.
But still, Mistral AI’s page on Hugging Face says that “The Mixtral-8x7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.”
Financial situation

The notable rounds:
In June 2023: Barely a month old, Mistral AI raised a $113 million seed round, ready to go toe-to-toe with the likes of OpenAI. It then announced plans to use the funding to assemble a “world-class team” creating “the best open source models.”
In December 2023: Mistral AI pulled off a blockbuster deal, raising approximately $415 million at a valuation of over $2 billion.
“Mistral is at the center of a small but passionate developer community growing up around open-source AI,” Andreessen Horowitz said in its funding announcement. “Community fine-tuned models now routinely dominate open-source leaderboards (and even beat closed source models on some tasks).”
Investors certainly appear to be viewing Mistral AI as Europe’s opportunity to plant its flag in the very fertile (at present) generative AI ground.
It took them four months to roll out their first LLM
In September 2023 Mistral AI rolled out its debut AI model, Mistral-7B. They dropped this model online, and everyone wanted a piece. Users were able to access this model from all corners of the internet – torrents, GitHub, Discord. They even created a repository with a handy Apache 2.0 license, which is a highly permissive scheme that has no restrictions on use or reproduction beyond attribution. The users are good to go – as long as they can handle the tech side and foot the bill for cloud resources.
The high-performance model does not guzzle resources like LLaMA 2 yet offers similar results (according to some standard benchmarks). While the GPTs of the world can perform much better, they are available only behind an API and are far more expensive to run.
Mistral-7B wasn't some weekend hackathon project: the team put four months of blood, sweat, and code into it. They built this thing from the ground up, fine-tuning their MLOps and data pipelines. Mensch has hinted that not every model will be up for grabs with that generous Apache 2.0 license. They're keeping their options open, maybe rolling out some premium offerings behind a paywall.
The next release was even funkier: it arrived as a bare torrent link.
Mixtral – understanding SMoE architecture and what makes that model so efficient
Their first ‘child’ Mistral 7B is a 7-billion-parameter language model, engineered for high performance and efficiency. It surpasses existing models like Llama 2 (13B) and Llama 1 (34B) in various benchmarks, especially in reasoning, mathematics, and code generation. Its architecture uses grouped-query attention (GQA) for faster inference and sliding window attention (SWA) for handling long sequences more efficiently, improving computational cost and inference latency.
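To make the sliding window idea concrete, here is a minimal sketch in plain Python. It assumes the common causal formulation, in which each token may attend only to itself and the previous `window - 1` tokens. The function name and the toy sizes are our own illustration, not Mistral's code.

```python
def sliding_window_mask(seq_len, window):
    """Build a causal sliding-window attention mask.

    mask[i][j] is True when query position i may attend to key position j.
    Assumption: position i sees itself and the previous (window - 1) positions.
    """
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Toy sizes chosen for readability only.
    for row in sliding_window_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

Each row contains at most `window` allowed positions, so the attention work per token in a layer is bounded by the window size instead of growing with the full sequence length.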
In September, it was revealed that Mistral's 7B model required approximately 200,000 GPU hours for training. With NVIDIA's cloud GPUs costing about $2 to $2.5 per hour, the compute cost for Mistral-7B was estimated to be between $400,000 and $500,000. For comparison, GPT-4, a significantly larger model, had a training cost exceeding $100 million. While specifics of Mistral's approach weren't fully disclosed, CEO Arthur Mensch hinted at significant investments in data and algorithm development to optimize model performance.
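The back-of-the-envelope math is easy to check. The sketch below simply multiplies the two figures quoted above (about 200,000 GPU hours at $2 to $2.5 per hour), which gives a range of roughly $400,000 to $500,000. The variable and function names are ours, and real cloud pricing will of course vary.

```python
# Figures quoted in the text; treat them as rough estimates.
GPU_HOURS = 200_000
PRICE_LOW, PRICE_HIGH = 2.0, 2.5  # USD per GPU hour


def compute_cost(gpu_hours, price_per_hour):
    """Estimated compute cost in USD: hours multiplied by hourly price."""
    return gpu_hours * price_per_hour


low = compute_cost(GPU_HOURS, PRICE_LOW)
high = compute_cost(GPU_HOURS, PRICE_HIGH)
print(f"${low:,.0f} to ${high:,.0f}")
```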
But the most interesting model is the one they dropped in December 2023 via a torrent link, as we’ve mentioned before: Mixtral 8x7B.
This new model incorporates a sparse mixture-of-experts (SMoE) architecture, which deviates from traditional monolithic (dense) transformer models. The SMoE approach allows Mistral to efficiently direct specific inputs to designated experts within its network, enhancing multitasking and learning capabilities. As seen on the image above, Mixtral 8x7B outperforms models like Llama 2 70B in most assessments and offers six times faster inference, making it one of the most efficient open-weight models available. The model also aced tests in French, German, Spanish, Italian, and English.
SMoE belongs to the Mixture-of-Experts (MoE) family of models. It operates on a system of multiple 'expert' sub-models, each specializing in different data subsets or problem aspects. Crucially, for any input, it uses only a select few experts, maintaining efficiency and scalability.
Key to SMoE is its dynamic routing algorithm. This algorithm, often a neural network itself, determines the most relevant experts for a given input, focusing computational resources effectively. This approach allows SMoE models to handle complex tasks with diverse data types, improving overall performance.
SMoE's architecture is notably efficient due to its sparsity, allowing it to scale to a large number of parameters more effectively than traditional dense models.
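Here is a minimal sketch of the routing idea described above, in plain Python. It is a toy under stated assumptions, not Mistral's implementation: the router is a single linear layer, it keeps the top `k=2` experts per input (our choice for illustration), the softmax is taken over the selected experts only, and the "experts" are simple scaling functions. All names are hypothetical.

```python
import math


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def smoe_layer(x, router_weights, experts, k=2):
    """Run only the k selected experts and mix their outputs by router weight."""
    # Router: one linear score per expert (assumed form of the gating network).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    routes = top_k_route(logits, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)  # experts outside the top k are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, routes


# Toy setup: 8 experts, each one scaling the input by a different factor.
experts = [lambda x, s=s: [s * xi for xi in x] for s in range(1, 9)]
router_weights = [[0.0, 0.0] for _ in range(8)]
router_weights[2] = [1.0, 0.0]
router_weights[5] = [0.0, 1.0]

output, routes = smoe_layer([1.0, 2.0], router_weights, experts)
print(routes)
print(output)
```

The point to notice is that only the selected experts are called for a given input, which is where the compute savings of a sparse model come from.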
Mixtral 8x7B is a decoder-only model whose feedforward block selects from eight groups of parameters. Thanks to this architecture, Mixtral can compete with a 70-billion-parameter model while using only a fraction of those parameters for each token. According to their website, “Mixtral has 46.7B total parameters but only uses 12.9B per token. It processes input and generates output at the same speed and for the same cost as a 12.9B model.”
The exciting thing about Mixtral is that it proves that MoE can be effective at smaller scales on hardware that is accessible to developers. Mixtral 8x7B and its instruction-optimized version, Mixtral 8x7B Instruct, are freely accessible under the Apache 2.0 license.
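For the curious, the 46.7B and 12.9B figures in the quote can be roughly reproduced with some arithmetic. The sketch below assumes hyperparameters taken from the publicly released Mixtral 8x7B configuration (they are not stated in this article), two active experts per token, and three weight matrices per expert. Treat it as an estimate under those assumptions, not an official accounting.

```python
# Assumed hyperparameters (from the public Mixtral 8x7B config, not this article).
DIM = 4096              # model width
N_LAYERS = 32
N_HEADS = 32            # query heads
N_KV_HEADS = 8          # key/value heads (grouped-query attention)
HEAD_DIM = 128
FFN_HIDDEN = 14336      # hidden size inside each expert
VOCAB = 32000
N_EXPERTS = 8
EXPERTS_PER_TOKEN = 2   # assumption: top-2 routing


def count_params(experts_counted):
    """Parameter count when `experts_counted` experts per layer are included."""
    attn = 2 * DIM * N_HEADS * HEAD_DIM + 2 * DIM * N_KV_HEADS * HEAD_DIM
    expert = 3 * DIM * FFN_HIDDEN      # three projection matrices per expert
    router = DIM * N_EXPERTS           # linear gating layer
    norms = 2 * DIM                    # two normalization layers per block
    per_layer = attn + experts_counted * expert + router + norms
    embeddings = 2 * VOCAB * DIM       # input embeddings plus output head
    return N_LAYERS * per_layer + embeddings + DIM  # plus the final norm


total = count_params(N_EXPERTS)
active = count_params(EXPERTS_PER_TOKEN)
print(f"total:  {total / 1e9:.1f}B")
print(f"active: {active / 1e9:.1f}B")
```

Only the expert weights differ between the two counts, which is why the name "8x7B" overstates the total: attention, embeddings, and the router are shared rather than replicated eight times.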
How does Mistral make money?
So far, it doesn’t.

And when you try to use its platform, it says: “Thank your for your interest in Mistral AI! Your account is almost set up, but you are still in the waitlist to use the platform.”
Earlier the website stated, “We will propose optimized proprietary models for on-premise/virtual private cloud deployment. These models will be distributed as white-box solutions, making both weights and code sources available. We are actively working on hosted solutions and dedicated deployment for enterprises.”
Currently, the company is operating with a substantial amount of capital it managed to secure in a very short time.
Conclusion
Mistral AI exemplifies a bold new direction in the landscape of LLMs, marking a significant leap in the evolution of open-source AI technology. It fights to be a paradigm-shifting force in the generative AI domain, challenging established norms and tech giants and provoking discussions on the right regulatory practice. Their rapid ascent to a GenAI unicorn status and the release of groundbreaking models like Mixtral 8x7B highlight a strategic blend of innovation and open-source ethos.
However, their approach has sparked debates around the true openness of their models and the broader implications of AI without safety nets. Mistral AI's story is emblematic of the dynamic and often contentious nature of AI development, raising critical questions about the balance between technological advancement and responsible AI governance. As they continue on their journey, Mistral AI's influence on the European AI landscape and its role in shaping the global discourse on AI ethics and open-source models will be key areas to watch. Their story is a compelling chapter in the ongoing narrative of AI's evolution and its impact on society. The questions remain: for what applications will Mixtral be used, and when will the company start making money?
Bonus: All important links about the founders and CEO
Arthur Mensch
Guillaume Lample
Timothee Lacroix