LLM Post-Training: GRPO, DPO, RLHF & Fine-Tuning

How LLMs are trained and improved: RL methods like GRPO and DPO, distillation, fine-tuning, quantization, and retrieval — explained for practitioners.

AI 101

11 min read

Jul 5, 2026

GRPO Explained: Group Relative Policy Optimization

GRPO (Group Relative Policy Optimization) is a critic-free RL algorithm for LLM training. Learn how it works, how it differs from PPO, and its role in DeepSeek-R1.

Alyona Vert.

AI Concepts & Techniques

12 min read

Jul 1, 2026

The Human Touch: How HITL is Saving AI from Itself with Synthetic Data

How human-in-the-loop keeps synthetic data safe from model collapse, with real examples from GPT-4.5, Phi-4, Walmart, and NVIDIA Cosmos.

Alyona Vert., +1

AI 101

10 min read

Mar 25, 2026

AI 101: Transformer Depth Is an Addressable Dimension

MoDA and Attention Residuals make Transformer depth queryable — not just a fixed pipeline. Learn how both approaches work and why it matters for deep LLMs.

Alyona Vert., +1

AI 101

14 min read

Mar 11, 2026

AI 101: Beyond RL: The New Fine-Tuning Stack for LLMs

How Doc-to-LoRA, Text-to-LoRA, LoRA-Squeeze, Kron-LoRA, MoA & Evolution Strategies replace expensive RL loops with a modular, cheaper post-training stack for LLMs.

Alyona Vert., +1

AI Concepts & Techniques

11 min read

Mar 4, 2026

AI 101: From Vibe Coding to Spec-Driven Development

Why vibe coding breaks at scale — and how spec-driven development (SDD) fixes it. Covers Kiro by AWS, GitHub Spec Kit, Tessl, and when to use each approach.

Alyona Vert., +1

AI 101

14 min read

Feb 11, 2026

AI 101: "On-Policy Distillation Zeitgeist"

Three papers on on-policy self-distillation — OPSD, SDFT, SDPO — replacing sparse rewards with dense token-level feedback. Benchmarks and limitations covered.

Alyona Vert.

AI 101

10 min read

Feb 4, 2026

AI 101: Conditional Memory and the Rise of Selective Intelligence

Engram by DeepSeek gives LLMs conditional memory: models learn when to access knowledge. Covers architecture, U-shaped allocation law, and reasoning gains.

Ksenia Se

AI Concepts & Techniques

15 min read

Dec 10, 2025

AI 101: The State of Reinforcement Learning in 2025

Reinforcement learning in 2025: surprising RLVR findings, GRPO vs. PPO, RLHF vs. RLAIF, agentic RL, robotics advances, and Karpathy's critique explained.

Ksenia Se, +1

AI 101

10 min read

Nov 5, 2025

What matters for RL? Precision! Switching BF16 → FP16

BF16 vs FP16: how switching precision during RL fine-tuning fixes training-inference mismatch, stabilizes GRPO, and why Karpathy applied it to nanochat.

Ksenia Se, +1

AI 101

10 min read

Oct 15, 2025

AI 101: What are Modular Manifolds?

Modular manifolds treat neural network layers as geometric modules for stable, scalable optimization. A deep dive into Thinking Machines Lab's approach.

Alyona Vert.

AI 101

9 min read

Sep 3, 2025

What is XQuant?

XQuant and XQuant-CL cut LLM KV cache memory up to 12x by storing input activations instead of keys and values. How the method works and when to use it.

Alyona Vert.

AI 101

13 min read

Aug 13, 2025

What's New in Test-Time Scaling?

How Chain-of-Layers, MindJourney, and Google's TTD-DR push test-time scaling further — and where inverse scaling shows its limits.

Alyona Vert.

AI Concepts & Techniques

11 min read

Jun 25, 2025

RLHF variations: DPO, RRHF, RLAIF

DPO, RRHF, and RLAIF explained: three RLHF alternatives that skip reward models, use ranking loss, or replace human annotators with AI feedback.

Alyona Vert.

AI 101

12 min read

Apr 23, 2025

What is MoE 2.0? Update Your Knowledge about Mixture-of-Experts

A fresh angle on current Mixture-of-Experts research. We discuss what new MoE techniques like S'MoRE, Symbolic-MoE, and others mean for next-generation AI.

Alyona Vert.

AI Concepts & Techniques

10 min read

Apr 2, 2025

LLM Inference Explained: Latency, Throughput & How It Works

How to optimize LLM inference latency and throughput: quantization, batching, KV cache, speculative decoding, GPU vs TPU, and hardware accelerators.

Alyona Vert.

AI 101

12 min read

Mar 26, 2025

Slim Attention, XAttention & KArAt: Three New Attention Mechanisms Explained

We explore three advanced attention mechanisms that improve how models handle long sequences, cut memory use, and make attention learnable.

Alyona Vert.

AI 101

10 min read

Mar 12, 2025

How to Reduce Memory Use in Reasoning Models

How LightThinker and Multi-Head Latent Attention (MLA) reduce memory use and speed up inference in reasoning models like DeepSeek-R1.

Alyona Vert.

AI 101

12 min read

Mar 5, 2025

Everything You Need to Know about Knowledge Distillation

Knowledge distillation is one of the hottest topics thanks to DeepSeek. Learn with us: the core idea, its types, scaling laws, real-world cases, and useful resources to dive deeper.

Alyona Vert.

AI 101

10 min read

Feb 12, 2025

What are Chain-of-Agents and Chain-of-RAG?

CoRAG and Chain-of-Agents are two upgrades to standard RAG: one masters multi-hop reasoning, the other handles extremely long contexts. We compare both.

Alyona Vert.

AI Concepts & Techniques

13 min read

Feb 5, 2025

What is test-time compute and how to scale it?

We dive into test-time compute and discuss more than five open-source methods for scaling it effectively to support deep, step-by-step reasoning.

Alyona Vert.

AI 101

5 min read

Jan 29, 2025

The Keys to Prompt Optimization

Prompt optimization improves LLM outputs by refining query structure. Covers expansion, decomposition, disambiguation, abstraction, and combining strategies effectively.

Isabel González

AI 101

9 min read

Jan 8, 2025

What is HtmlRAG, Multimodal RAG and Agentic RAG?

Three RAG upgrades explained: HtmlRAG preserves HTML structure, Multimodal RAG retrieves images, and Agentic RAG reformulates queries for better results.

Alyona Vert.

AI Concepts & Techniques

8 min read

Dec 18, 2024

What is Natural Language Reinforcement Learning (NLRL)?

NLRL redefines reinforcement learning using natural language instead of numeric rewards. Learn how it works, how LLMs fit in, and where it outperforms PPO.

Alyona Vert.

AI 101

11 min read

Dec 4, 2024

Flow Matching for Generative Modeling: How It Works and Why It Matters

Flow Matching explained: how it trains generative models faster than diffusion, what conditional flow matching adds, and why Flux, F5-TTS and MovieGen use it.

Alyona Vert.

AI 101

6 min read

Nov 13, 2024

What is Mixture-of-Depths?

Mixture-of-Depths lets transformers skip layers for low-priority tokens, cutting FLOPs by up to 50%. Learn how MoD routing works and how it compares to MoE.

Alyona Vert.