Turing Post
Breakdown: Kimi K2, DeepSeek-R1, Qwen3 (+Coder), and GLM-4.5

Most talked-about reasoning and agentic models from Moonshot AI, DeepSeek, Qwen and Z.ai that are just crushing it

Alyona Vert. & Ksenia Se
July 30, 2025

This month is a treasure trove of hot new model releases. Chinese companies – MoonshotAI, Qwen, and Z.ai – have turned it into a battleground of the strongest agentic models ever. Not only do they mark the beginning of a new era in which reasoning alone isn’t enough, they also keep this field open and accessible to everyone. Once again, we witness a huge moment where open technologies perform on par with, or even surpass, the closed models we’ve grown used to.

And, honestly, shame on Meta for backing off open source. In 2024, it was the “Linux of AI”. In 2025, it’s suddenly “be careful what we open”? Not a good vibe.

Anyway, it’s time for agentic innovation, and Chinese models won’t let you get bored.

We’ll start with Kimi K2 – the most talked-about model, then revisit DeepSeek-R1 – the solid baseline for reasoning models, and finally explore the freshest Qwen3, Qwen3-Coder, and the latest GLM-4.5. Join us for this fascinating breakdown.

In today’s episode, we will cover:

Kimi K2 - the Agentic Intelligence ambassador
- Exclusive innovations (MuonClip optimizer, Synthetic data and rephrasing are the key, Self-critic – a special shift in reward modeling for open-ended tasks)
- Kimi K2’s great achievement
DeepSeek-R1: The reasoning baseline
Qwen3 - the model with controllable thinking modes
- Architecture and training strategies
- Results of Qwen3-235B
What is Qwen3-Coder?
GLM-4.5 - Z.ai’s hottest release
- How does GLM-4.5 work?
- What GLM-4.5 can really do
Conclusion: Comparison of the models
Sources and further reading

Kimi K2 – the Agentic Intelligence ambassador

Kimi K2, released on July 12, is now the most talked-about massive MoE model, and it marks a shift towards Agentic Intelligence. It is the result of Moonshot AI researchers’ proficiency and determination to build advanced AI technologies that prioritize lossless long context and personalization. Lossless long context means perfect, high-fidelity recall, giving the model full memory of entire conversations. AI-native products built on these principles can deliver highly customized user experiences without the need for traditional model fine-tuning.

And what about the innovations that Kimi K2 brings to the AI world? It’s one of the most important models of the year because it marks an agentic moment, similar to DeepSeek-R1’s reasoning moment. It quickly became a new baseline for agentic behavior, thanks to the following technical decisions:

A specially built MuonClip optimizer that keeps training stable and makes it possible to train on a huge amount of data: 15.5 trillion tokens.
A large-scale synthetic data pipeline that focuses on building agentic capabilities in Kimi K2.
The ability to learn from its own outputs on open-ended questions via a Self-Critique Rubric Reward.

Let’s break it all down in order.

Exclusive innovations

First, a little about Kimi K2’s architecture. It’s a MoE model scaled up to a massive 1.04 trillion total parameters, with only 32 billion of them active at any one time. It raises the sparsity level (total experts divided by active experts) to 48, using 384 total experts and activating 8 per token, so total capacity grows without increasing the compute cost per token.
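To make these numbers concrete, here is a minimal sketch of the arithmetic behind them. The constants are the figures quoted above; the function names are our own illustrative choices, not anything from Moonshot AI’s code.

```python
# Figures quoted in the article; all names here are illustrative.
TOTAL_PARAMS = 1.04e12   # 1.04 trillion total parameters
ACTIVE_PARAMS = 32e9     # 32 billion active parameters
TOTAL_EXPERTS = 384
ACTIVE_EXPERTS = 8


def sparsity(total_experts: int, active_experts: int) -> float:
    """Sparsity level: total experts divided by active experts."""
    return total_experts / active_experts


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's parameters that are active at any one time."""
    return active_params / total_params


print(sparsity(TOTAL_EXPERTS, ACTIVE_EXPERTS))                  # 48.0
print(f"{active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")    # 3.1%
```

In other words, only about 3% of the parameters are active at any one time, which is why a trillion-parameter model can run at roughly the compute cost of a 32-billion-parameter one.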

Moonshot AI’s model uses Multi-head Latent Attention (MLA) and has a hidden size of 7168, with expert layers using 2048-dimensional hidden states. It also makes a smart trade-off in attention head count to keep long-context inference practical (remember the lossless long context concept). Kimi K2 uses 64 attention heads, compared to 128 in DeepSeek-V3, for example. Fewer attention heads mean less attention computation, which makes inference faster.
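As a rough illustration of why head count matters at long context, the sketch below counts only the core attention matrix multiplications (scores and weighted values) for a single layer. It is a back-of-the-envelope estimate under stated assumptions, not Kimi K2’s actual cost model: the per-head dimension of 128 and the 128K-token sequence length are hypothetical values chosen for the example, and the estimate ignores projections, MLA compression, causal masking and the MoE layers.

```python
def attention_core_flops(seq_len: int, n_heads: int, head_dim: int) -> int:
    """Approximate FLOPs for one layer's attention core over a full sequence.

    Per head, Q @ K^T costs about 2 * seq_len^2 * head_dim FLOPs, and
    multiplying the attention weights by V costs about the same again.
    """
    return 4 * n_heads * seq_len ** 2 * head_dim


SEQ_LEN = 128_000   # hypothetical long-context length
HEAD_DIM = 128      # hypothetical per-head dimension

flops_64 = attention_core_flops(SEQ_LEN, n_heads=64, head_dim=HEAD_DIM)
flops_128 = attention_core_flops(SEQ_LEN, n_heads=128, head_dim=HEAD_DIM)

print(flops_128 / flops_64)  # 2.0
```

Under these assumptions the attention core scales linearly with the number of heads and quadratically with sequence length, so halving the head count halves this part of the cost, and the saving matters most at long context.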

Now let’s explore the innovations we mentioned earlier.

MuonClip optimizer

We’ll start with the custom optimizer, MuonClip, that is used to train Kimi K2 from the very start.
