It's a compelling story about how a successful exit can fuel the itch to innovate. Months of brainstorming led to the ambitious decision to tackle a seemingly insurmountable challenge, ultimately resulting in a groundbreaking product. Cerebras Systems not only dreams big but also acts big – its AI chip is so large it has been compared to a dinner plate or a pizza box, and it is the largest single piece of silicon ever produced. And it works.

Cerebras Systems, an eight-year-old company, recently introduced the third version of their wafer-scale engine (WSE-3), a massive 5nm-based chip boasting 4 trillion transistors and 900,000 AI-optimized compute cores, which powers the CS-3 AI supercomputer. Last week, they also announced a collaboration with Dell Technologies to address the growing AI workload demands.

We will discuss what it all means, why they are less known than NVIDIA despite continuously claiming to outpace NVIDIA's chips, and how their valuation recently reached over $4 billion (with the next stop being an IPO?) in our AI Infra Unicorn series.

In today’s episode:

  1. Starting point of Cerebras Systems: daunting challenge and metaphors from Andrew Feldman

  2. Becoming a unicorn - financial situation

  3. But what exactly does Cerebras offer?

  4. Mission

  5. Training capabilities and inference challenges

  6. Cerebras vs. NVIDIA: another analogy and key differences

  7. Can Cerebras’s chips replace NVIDIA GPUs?

  8. How does the company make money?

  9. Conclusion

Starting point of Cerebras: daunting challenge and metaphors from Andrew Feldman

It was a tremendous success. In five years, Andrew Feldman and Gary Lauterbach built SeaMicro, a maker of novel power-efficient servers for data processing, and sold it to AMD for $334 million. After serving on the board for another two years, they finally decided in 2014 to quit and get some rest. But once an entrepreneur, always an entrepreneur, especially when tremendous talent still floats around you, ready to follow. Feldman and Lauterbach stayed in touch with three other colleagues from SeaMicro: Michael James, J.P. Fricker, and Sean Lie. Gradually the group started to brainstorm, each member bringing unique expertise in software, hardware, and systems architecture. All five of them shared the ambition to create something big, not just another incremental improvement in the tech world.

The idea of building a new type of server, optimized for Intel's groundbreaking 3D XPoint memory, initially captivated them. This technology promised to transform computing with its unprecedented speed and durability. However, the team quickly realized the limitations imposed by Intel's dominance over the technology. They shifted their focus to an even bolder vision: creating a computer optimized for artificial intelligence.

Feldman envisioned a machine solely dedicated to AI tasks, eschewing all other functionality. The concept involved constructing a wafer-scale chip: a colossal piece of silicon roughly 60 times larger than any existing chip, with unparalleled compute power.

“All we put on our chip is stuff for A.I. For now, progress will come through specialization.”

Andrew Feldman, in The New Yorker

So when they said they wanted something big, they meant it literally. But it was a daunting challenge, reminiscent of the failed efforts of Trilogy Systems, which had tried to build wafer-scale systems decades earlier. And precisely because the challenge was so daunting, it was inspiring: solving such a complex problem would grant them a unique market advantage with few competitors.

Thanks to their phenomenal exit and the network Feldman had built, they could raise money on an idea alone. How? Andrew Feldman loves analogies and metaphors. In an interview with Mark Leslie, he compared specialists and generalists to cheetahs and hyenas (the WSE and the GPU, respectively). In IEEE Spectrum, he compares GPUs to tailors who can’t make one suit together. In The New Yorker, he says, “We invented a technique such that you could communicate across that little bit of cookie dough between the two cookies.” In TechCrunch, he likens the challenge to climbing Mount Everest: “It’s like the first set of guys failed to climb Mount Everest, they said, ‘Shit, that first part is really hard.’ And then the next set came along and said ‘That shit was nothing. That last hundred yards, that’s a problem.’” He is the closest thing we have encountered to ChatGPT in terms of producing analogies and metaphors on the fly, and we bet this style works phenomenally on investors. His founding team was very solid, his metaphors were engaging, and investors fell for it, not fully realizing what a complicated story they were signing up for.

Within less than a week of conversations to gauge interest from potential investors, Feldman had received over $100 million worth of commitments. In March 2016, Cerebras was launched, with Andrew Feldman, Gary Lauterbach, Michael James, J.P. Fricker, and Sean Lie as co-founders.

Becoming a unicorn – financial situation

Their first round, though, was only around $30 million, followed the next year by a $25 million Series B. It took them two more years (from 2017 to 2019) and two more rounds to come out of stealth with their first WSE. That same year – 2019 – they raised a Series E, and it was massive: $272 million, which immediately put them in the unicorn club.

What is interesting here is the pattern of press coverage. In 2019 there is an avalanche around the WSE launch, and then silence. There is another avalanche with the updated chip and the next huge round in 2021, and then silence again. Considering that the idea was to build a chip and a supercomputer specifically for AI, it’s odd that in 2022 – when ChatGPT brought generative AI into the spotlight and NVIDIA started to dominate every conversation about compute – Cerebras wasn’t mentioned much.

Among the investors are a number of interesting individuals:

Image Credit: Cerebras

But what exactly does Cerebras offer?

Wafer-scale engine (WSE)

While other chip developers focus on making chips ever smaller, Cerebras has taken a completely different approach.

They focused on creating a massive processor from a single giant piece of silicon measuring 46,225 mm². To give you an idea, that is roughly the size of an A4 sheet of paper with one of its longer sides trimmed to make a square. This dramatically exceeds the size of conventional chips.
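To make the size comparison concrete, here is a quick arithmetic sketch in Python. It assumes the die is a perfect square (which the trimmed-A4 comparison implies but the article does not state) and uses the standard A4 dimensions of 210 × 297 mm; the variable names are illustrative.

```python
import math

WSE_AREA_MM2 = 46_225                 # die area quoted in the article
A4_SHORT_MM, A4_LONG_MM = 210, 297    # standard A4 sheet (not from the article)

# Assumption: the die is a perfect square.
side_mm = math.sqrt(WSE_AREA_MM2)

# An A4 sheet trimmed along its long side into a 210 x 210 mm square.
a4_square_area_mm2 = A4_SHORT_MM ** 2
area_ratio = WSE_AREA_MM2 / a4_square_area_mm2

print(f"WSE side length:   {side_mm:.0f} mm")
print(f"Trimmed-A4 square: {a4_square_area_mm2} mm^2")
print(f"WSE / trimmed A4:  {area_ratio:.2f}x")
```

Under that assumption the die is 215 mm on a side and about 5% larger in area than the trimmed sheet, so the comparison is close but approximate.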

WSE-3, the third generation of the chip, is built to perform a massive number of operations simultaneously. Specifically, Cerebras cites a peak of 125 petaFLOPS, made possible by 4 trillion transistors, 900,000 cores, and 44 GB of on-chip memory. If these numbers don’t mean much to you, don’t worry. Here is a breakdown of the WSE-3's technical specifications.

One petaFLOPS is one quadrillion (a 1 followed by 15 zeroes) floating-point operations per second. With a peak of 125 petaFLOPS, the WSE-3 is designed to handle highly complex computational tasks.

Key factors contributing to its high speed and efficiency:

  • 4 trillion transistors: Transistors are tiny switches that can stop or allow the flow of electricity. With 4 trillion transistors, the WSE-3 can process a multitude of operations in parallel.

  • 900,000 cores: Cores are individual processing units that can read and execute program instructions. Having 900,000 of these means the WSE-3 can handle vast amounts of operations simultaneously.

  • 44 GB On-Chip Memory: This type of memory is integrated directly into the chip, making it faster than off-chip memory. 44 GB of on-chip memory enables quick storage and access to large amounts of data.
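As a rough sanity check on how these headline numbers relate, the sketch below divides the quoted peak throughput and on-chip memory evenly across the cores. The even split and the use of decimal gigabytes are assumptions made for illustration; the results are averages of peak figures, not measured per-core performance.

```python
PEAK_FLOPS = 125e15            # 125 petaFLOPS, the peak figure quoted by Cerebras
CORES = 900_000                # AI-optimized cores on the WSE-3
ON_CHIP_MEMORY_BYTES = 44e9    # 44 GB, assuming decimal gigabytes

# Assumption: throughput and memory are spread evenly over all cores.
flops_per_core = PEAK_FLOPS / CORES
memory_per_core_kb = ON_CHIP_MEMORY_BYTES / CORES / 1e3

print(f"Peak throughput per core: {flops_per_core / 1e9:.0f} GFLOPS")
print(f"On-chip memory per core:  {memory_per_core_kb:.1f} KB")
```

The point of the exercise: each core is individually modest, with tens of kilobytes of local memory, and the chip's headline performance comes from running hundreds of thousands of them in parallel.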

According to Cerebras, that’s how WSE-3 outperforms NVIDIA H100:

It stands out significantly from other AI-optimized processors for its size and single-chip performance capabilities. Forbes highlights these distinguishing features:

“As a result, there really is no fair comparison to any other semiconductor solution in terms of size or single chip performance.”

Image Credit: Cerebras

Cerebras claims these are the advantages over traditional chip architectures:

  • Latency: The WSE-3 uses a single silicon wafer, eliminating interconnect and memory latencies typical in multi-chip configurations, resulting in faster data processing.

  • Power consumption: By integrating all components on one wafer, the WSE-3 drastically reduces the power requirements and operational latency compared to multi-server systems.

  • Scalability: The unified silicon design allows for seamless integration and synchronization of cores, simplifying the scaling of complex computations without the usual overhead.

  • Programming: The WSE-3 can be programmed like a single processor system using standard AI frameworks like PyTorch, easing the deployment and management of AI models.

  • AI-optimized: Designed specifically for AI tasks, the WSE-3 features high memory bandwidth and extensive on-chip memory, crucial for efficiently handling deep learning computations.

Image Credit: Cerebras

AI supercomputers

In July 2023, Cerebras announced the launch of its first AI supercomputer, the Condor Galaxy 1, in Santa Clara, California. The supercomputer reportedly cost over $100 million and was built in partnership with Group 42 (G42), an Abu Dhabi-based technology firm that has strong ties with giants like OpenAI, Dell, IBM, Microsoft, NVIDIA, AstraZeneca, and Illumina. Amid concerns about G42's relations with China, Microsoft recently invested $1.5 billion in G42, and Microsoft's vice chair and president, Brad Smith, joined the G42 board as part of the deal.

Cerebras has two supercomputers running in California and has announced a third, Condor Galaxy 3, to be available in Q2 2024, which ends in just half a month. Condor Galaxy 3 is powered by the newly announced CS-3 infrastructure built around Cerebras’ new WSE-3 chip. The WSE-3 is designed to double the performance of its predecessor while keeping the same power consumption and cost.

CS-3: A Revolution in AI Infrastructure

Cerebras sells complete server solutions, not individual chips. The CS-3, their latest server, includes a new chassis design housing the WSE-3 chip. This setup integrates the processor with essential support infrastructure such as power supplies, cooling systems, and connectivity options.

Made for Generative AI

Up to 2,048 CS-3 units can be clustered together. The system's modular design lets deployments scale from small to hyperscale, and Cerebras says it offers linear scaling with large language models such as Falcon and Llama. At the same time, with small and efficient models gaining traction, the giant chips inside the CS-3 are also a strong option for efficiency.
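To give a sense of what a full cluster means on paper, here is a minimal sketch that multiplies the quoted per-system peak by the number of units. The function name and the `efficiency` parameter are hypothetical, added for illustration; `efficiency=1.0` corresponds to the ideal linear scaling described above, which real workloads may not reach.

```python
CS3_PEAK_PETAFLOPS = 125     # peak per CS-3 system, as quoted for the WSE-3
MAX_CLUSTER_UNITS = 2048     # maximum cluster size mentioned in the article

def cluster_peak_exaflops(units: int, efficiency: float = 1.0) -> float:
    """Theoretical peak of a CS-3 cluster in exaFLOPS.

    `efficiency` is a hypothetical scaling factor: 1.0 means perfectly
    linear scaling, lower values model communication overhead.
    """
    if not 1 <= units <= MAX_CLUSTER_UNITS:
        raise ValueError(f"units must be between 1 and {MAX_CLUSTER_UNITS}")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return units * CS3_PEAK_PETAFLOPS * efficiency / 1000  # peta -> exa

for n in (1, 64, MAX_CLUSTER_UNITS):
    print(f"{n:>4} units: {cluster_peak_exaflops(n):.3f} exaFLOPS peak")
```

With ideal scaling, a full 2,048-unit cluster works out to 256 exaFLOPS of theoretical peak compute.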

Mission

According to Feldman, it’s “to show the world that this work can be done faster, it can be done with less energy, it can be done for lower cost.”

Training Capabilities and Performance

Since the first WSE, Cerebras has been known mostly for training deep learning models. To prove that its systems work, Cerebras has trained its own large language models (LLMs). In 2023, it introduced the Cerebras-GPT models, which range from 111 million to 13 billion parameters. These models were designed to showcase the capabilities of Cerebras' CS-2 system, powered by the WSE, and they are open source, demonstrating the performance and efficiency of the hardware (to play with them, check Hugging Face). According to Cerebras, training usually takes a fraction of the time that tech giants like Meta and OpenAI spend training their models.

Challenges in Inference and partnership with Qualcomm

Inference is where Cerebras used to lag behind. Although its systems could be used for inference tasks, there were two main challenges:

  1. Cost: As high-end solutions, Cerebras' systems may be more cost-effective for large-scale training than for smaller-scale inference tasks.

  2. Specialization: The hardware is highly specialized, and while it can be used for inference, other hardware solutions like GPUs might be more commonly used and supported for a wider range of inference tasks.

In March 2024, Cerebras Systems partnered with Qualcomm to enhance AI inference performance. According to the companies, the Qualcomm Cloud AI 100 Ultra, combined with Cerebras’ CS-3 AI accelerators, offers up to 10x more tokens per dollar, reducing AI deployment costs. The collaboration leverages advanced techniques like unstructured sparsity, speculative decoding, and efficient MX6 inference to deliver high-performance, cost-effective AI solutions, particularly beneficial for industries like pharmaceuticals where operating costs are crucial.

Cerebras vs. NVIDIA: another analogy

True to form, Cerebras CEO Feldman offers another analogy: “It’s equivalent to a shipper only wanting to move stuff on pallets because they don’t want to examine each box. Memory bandwidth is the ability to examine each box to make sure it’s not empty. If it’s empty, set it aside and then not move it.” He told Forbes that, in terms of the number of transistors, GPUs will not match the WSE-3 for another six years. "It's 57 times larger. It's got 52 times more cores. It's got 800 times more memory on chip. It's got 7,000 times more memory bandwidth and more than 3,700 times more fabric bandwidth. These are the underpinnings of performance."
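The "57 times larger" figure can be roughly reproduced from die areas. The sketch below assumes Feldman is comparing against the NVIDIA H100 and uses a die area of about 814 mm² for that GPU; that number comes from NVIDIA's public specifications, not from this article, and the choice of comparison chip is our assumption.

```python
WSE3_AREA_MM2 = 46_225   # WSE-3 die area quoted in this article
H100_AREA_MM2 = 814      # assumption: approximate H100 die area (not from the article)

# Ratio of silicon area, the most likely basis for "57 times larger".
size_ratio = WSE3_AREA_MM2 / H100_AREA_MM2

print(f"WSE-3 is about {size_ratio:.1f}x the area of an H100 die")
```

The result rounds to 57, which matches the quote. The other multiples (cores, memory, bandwidth) depend on which GPU specifications are used as the baseline, so they are not reproduced here.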

Comparing Cerebras and NVIDIA: Key Differences

1. Architecture

  • Cerebras: Uses a single large chip to minimize communication overhead between cores, leading to lower latency and higher throughput for certain AI tasks.

  • NVIDIA: Utilizes GPU clusters which involve multiple smaller chips working together, potentially leading to higher communication overhead.

2. Performance

  • Cerebras: Excels in workloads that require high memory bandwidth and low latency, such as large transformer models. Their approach simplifies the model training process by reducing the need for distributed computing across multiple devices.

  • NVIDIA: Offers high flexibility and scalability with their GPUs, which can be easily integrated into existing data centers. Their GPUs are versatile and handle a wide range of AI tasks effectively.

3. Software Ecosystem

  • Cerebras: Has a more specialized software stack optimized for their hardware. While effective for specific tasks, it lacks the broader ecosystem and community support that NVIDIA enjoys.

  • NVIDIA: CUDA and other software tools provide extensive support for a wide range of AI frameworks and applications, making it easier for developers to adopt and scale their solutions.

4. Adoption and Market Position

  • Cerebras: Gaining traction in niche markets that require extreme performance for specific AI tasks. Its hardware has been adopted by institutions such as Mayo Clinic, Aleph Alpha, the National Center for Supercomputing Applications, AstraZeneca, and G42, though only 15 customers are listed on Cerebras’s website.

  • NVIDIA: Dominates the AI hardware market with widespread adoption across various industries, including AI research. Their GPUs are considered the standard for AI and machine learning tasks.

Can Cerebras Replace NVIDIA GPUs?

In specific scenarios, yes. For tasks that benefit from Cerebras’ unique architecture, such as training large language models or other compute-heavy AI applications, Cerebras can outperform traditional GPU clusters by reducing communication overhead and improving efficiency. However, replacing NVIDIA GPUs entirely is unlikely in the near term due to the following reasons:

  1. Versatility: NVIDIA GPUs are more versatile and handle a wider range of applications beyond AI, including graphics rendering, scientific computing, and more.

  2. Ecosystem: NVIDIA's extensive software ecosystem and community support make it easier for developers to build and deploy AI solutions.

  3. Scalability and Integration: NVIDIA's solutions are well-integrated into existing data center infrastructures, offering seamless scalability. In addition, people who build AI models are accustomed to using software that works on NVIDIA’s AI chips.

How does the company make money?

Cerebras does not disclose this in detail.

Barron's reports that Cerebras takes a different approach with its chips: instead of selling them individually, the company incorporates them into complete computing systems. This business model generates revenue in two ways:

  • 20% of sales: Selling the entire system directly to customers.

  • 80% of sales: Selling access to the systems, similar to how cloud computing services like Amazon Web Services, Microsoft Azure, and Google Cloud operate.

Feldman says that in 2023 the company built eight times as many systems as it did the prior year, and that it expects the total to increase tenfold in 2024. He also says Cerebras has reached break-even in terms of cash flow.

Conclusion

It's phenomenal that Cerebras actually pulled it off, breaking the curse of Trilogy Systems and building the biggest AI chip ever to exist. Cerebras offers significant advantages for specific AI workloads but lacks the versatility, software ecosystem, and broad market adoption that NVIDIA GPUs enjoy. While Cerebras can outperform NVIDIA in certain high-performance AI applications, it is not a wholesale replacement for NVIDIA GPUs.

Now Cerebras needs to channel the creative power that goes into its metaphors into envisioning an even bigger picture, one that can compete with the vision of Jensen Huang, NVIDIA's CEO. For now, we still live in the world he imagined and keeps building. Cerebras can capture a chunk of the market, but until it offers a vision on that scale, it will remain one of several chip makers trying to rival NVIDIA.

Cerebras is reportedly planning a public offering at a $4 billion valuation. NVIDIA's market capitalization is currently 810 times higher (as of Jun 15, 2024, it’s $3.24 trillion).
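The 810x figure follows directly from the two numbers in the paragraph above, as this short check shows (the constant names are illustrative):

```python
CEREBRAS_VALUATION_USD = 4_000_000_000        # reported target valuation
NVIDIA_MARKET_CAP_USD = 3_240_000_000_000     # market cap as of Jun 15, 2024

# How many times larger NVIDIA's market cap is than Cerebras's valuation.
cap_ratio = NVIDIA_MARKET_CAP_USD / CEREBRAS_VALUATION_USD

print(f"NVIDIA is {cap_ratio:.0f}x Cerebras by valuation")
```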

Thank you for reading, please feel free to share with your friends and colleagues. 🤍
