How AI models are trained, tuned, and improved – RL methods like GRPO and DPO, distillation, fine-tuning, retrieval, quantization – the techniques behind working systems
AI 101
+1

10 min read
Mar 25, 2026
Deep transformers used to accumulate layer history. Now they are starting to retrieve from it.


AI 101
+1

14 min read
Mar 11, 2026
Discussing the rise of generated adapters such as Text-to-LoRA and Doc-to-LoRA, Evolution Strategies (ES) optimization, and the future of dynamic fine-tuning.


Concepts
+3

11 min read
Mar 4, 2026
Why vibe coding breaks at scale — and how spec-driven development (SDD) fixes it. Covers Kiro by AWS, GitHub Spec Kit, Tessl, and when to use each approach.


AI 101
+1

13 min read
Feb 11, 2026
Can a model teach itself well in 2026? Checking out some banger papers on self-distillation that point to a new phase.

AI 101
+1

10 min read
Feb 4, 2026
Let's explore this new memory paradigm and why it matters as an architectural principle.

Concepts
+2

15 min read
Dec 10, 2025
State of RL in 2025: RLVR surprising findings, GRPO, RLHF vs RLAIF, policy optimization, agentic RL, robotics advances, and key trends for 2026.


AI 101
+1

8 min read
Nov 5, 2025
BF16 vs FP16: how switching precision during RL fine-tuning fixes training-inference mismatch, stabilizes GRPO, and why Karpathy applied it to nanochat.


AI 101
+1

10 min read
Oct 15, 2025
Modular manifolds treat neural network layers as geometric modules for stable, scalable optimization. A deep dive into Thinking Machines Lab's approach.

AI 101
+1

9 min read
Sep 3, 2025
Compute is not a big deal for LLMs now, but memory is. Explore how the new XQuant method and its XQuant-CL variant can cut memory use by up to 12 times.

AI 101
+1

13 min read
Aug 13, 2025
How Chain-of-Layers, MindJourney, and Google's TTD-DR push test-time scaling further — and where inverse scaling shows its limits.

Concepts
+2

11 min read
Jun 25, 2025
DPO, RRHF, and RLAIF explained: three RLHF alternatives that skip reward models, use ranking loss, or replace human annotators with AI feedback.

Concepts
+2

10 min read
Jun 11, 2025
We explore how human-in-the-loop systems are keeping synthetic data grounded, useful, and safe in the age of AI self-training.


AI 101
+1

11 min read
May 14, 2025
GRPO explained: DeepSeek's critic-free RL algorithm for LLMs. Covers GRPO vs PPO, how Flow-GRPO works for images, and DeepSeek-R1's training stages.

AI 101
+1

12 min read
Apr 23, 2025
A fresh angle on current Mixture-of-Experts. We discuss what new MoE techniques like S'MoRE, Symbolic-MoE, and others mean for next-generation AI.

Concepts
+3

10 min read
Apr 2, 2025
How to optimize LLM inference latency and throughput: quantization, batching, KV cache, speculative decoding, GPU vs TPU, and hardware accelerators.

AI 101
+1

12 min read
Mar 26, 2025
We explore three advanced attention mechanisms that improve how models handle long sequences, cut memory use, and make attention learnable.

AI 101
+1

10 min read
Mar 12, 2025
We explore how combining LightThinker and Multi-Head Latent Attention cuts memory and boosts performance.

AI 101
+1

12 min read
Mar 5, 2025
This is one of the hottest topics thanks to DeepSeek. Learn with us: the core idea, its types, scaling laws, real-world cases, and useful resources to dive deeper.

AI 101
+1

10 min read
Feb 12, 2025
We explore Google's and Microsoft's advancements that implement "chain" approaches for long context and multi-hop reasoning.

Concepts
+2

12 min read
Feb 5, 2025
We dive into test-time compute and discuss five-plus open-source methods for scaling it effectively to support deep, step-by-step reasoning in models.

AI 101
+1

5 min read
Jan 29, 2025
Practical Insights for Large Language Models

AI 101
+2

9 min read
Jan 8, 2025
Three RAG upgrades explained: HtmlRAG preserves HTML structure, Multimodal RAG retrieves images, and Agentic RAG reformulates queries for better results.

Concepts
+2

8 min read
Dec 18, 2024
NLRL redefines reinforcement learning using natural language instead of numeric rewards. Learn how it works, how LLMs fit in, and where it outperforms PPO.

Turing Post is an AI newsletter for engineers, researchers, founders, and technical managers who want to understand how machine learning and AI actually work.
Built on more than two decades in tech and seven years focused on AI, we track the research that matters, the systems being built, and the ideas shaping the field, from LLMs and AI agents to JEPA, world models, retrieval, inference, evaluation, AI infrastructure, and agentic workflows.
Join 110,000+ professionals who rely on Turing Post for precise, grounded analysis of AI’s past, present, and future.