The structural blueprints behind the models – transformers, mixture-of-experts, state-space models, diffusion, JEPA – and how each design choice shapes what a system can learn
AI 101

8 min read
May 11, 2026
xLSTM revives recurrent networks with exponential gating and matrix memory. We compare xLSTM with Transformers and the classic LSTM, and explain when to use each.

Concepts

8 min read
Jan 14, 2026
Princeton’s new recipe for building better world models to support AI agents


AI 101

9 min read
Jan 8, 2026
DeepSeek's mHC (Manifold-Constrained Hyper-Connections) fixes hyper-connection instability with geometric constraints. How it works and why it matters for scaling LLMs.


AI 101

10 min read
Dec 3, 2025
We discuss how models fuse different data types, why this matters now, and what is special about Meta and KAUST's new fusion method, Mixture of States.

AI 101

11 min read
Nov 19, 2025
LeJEPA by Yann LeCun: provably stable self-supervised learning without heuristics. SIGReg, isotropic Gaussian embeddings, and world models explained.


AI 101

11 min read
May 28, 2025
BERT explained: how bidirectional pre-training works, MLM vs NSP, fine-tuning, RoBERTa, DistilBERT, ModernBERT, NeoBERT, and ConstBERT for retrieval.

AI 101

11 min read
Apr 30, 2025
We discuss a new wave of architectures from Liquid AI: built from first principles, optimized for real hardware, and challenging the Transformer playbook with smarter, leaner models.

AI 101

6 min read
Feb 19, 2025
We discuss how to enable the Mamba Selective State Space Model (SSM) to handle multimodal data using the Mixture-of-Transformers concept and modality-aware sparsity.

AI 101

4 min read
Jul 10, 2024
LongRAG uses 4K-token retrieval units instead of 100-word chunks, reducing corpus size 30×. How LongRAG architecture works and how it compares to standard RAG.


AI 101

8 min read
Jul 3, 2024
KAN (Kolmogorov-Arnold Networks) replaces fixed activation functions with learnable splines. How KAN works, how it compares to MLP, and where it falls short.

AI 101

11 min read
Jun 12, 2024
JEPA explained: Yann LeCun's Joint Embedding Predictive Architecture for world modeling. Covers I-JEPA, V-JEPA, MC-JEPA, architecture & key concepts.

Turing Post is an AI newsletter for engineers, researchers, founders, and technical managers who want to understand how machine learning and AI actually work.
Drawing on more than two decades in tech and seven years focused on AI, we track the research that matters, the systems being built, and the ideas shaping the field, from LLMs and AI agents to JEPA, world models, retrieval, inference, evaluation, AI infrastructure, and agentic workflows.
Join 110,000+ professionals who rely on Turing Post for precise, grounded analysis of AI’s past, present, and future.