Traditional AI, especially machine learning, is mostly focused on finding patterns in data. It learns correlations between inputs and outputs, which makes it powerful for prediction but not always for explanation or decision-making. It doesn’t know why things happen, only that they tend to happen together.
To explore why something happens, we need systems that focus on cause-and-effect relationships. This is what Causal AI does. It answers harder practical questions such as “What will happen if we change the treatment?” and “Would the patient still have recovered if they hadn’t taken the medication?” The main idea, and the main benefit, is to understand how different things influence each other and what consequences a change could have. Causal AI helps with decision-making, planning, and "what-if" questions – areas where regular AI falls short. In fields that require thorough investigation and creativity, Causal AI, with its ability to construct and analyze "what if" scenarios, would be an excellent assistant! But we don’t talk much about Causal AI. It is still largely academic or niche-industry focused, yet it might be crucial for achieving human-like reasoning and AGI. So today, let's explore this fascinating topic from the basics, see how it’s applied in the real world, and consider what it could mean for the future of AI.
In today’s episode, we will cover:
The main idea behind Causal AI
Will Causal AI help us reach AGI?
The basics of Causal AI
Causal inference
Directed Acyclic Graphs (DAGs) and Structural Causal Models (SCMs)
The do-operator
Do-calculus rules
Counterfactuals: The "What Ifs"
Causal discovery process
Real world influence
Conclusion: Why is Causal AI important for the future of AI?
Sources and further reading
What Is the Main Idea Behind Causal AI?
Traditional AI/ML models operate in an associational mode – they infer patterns from observational data and can predict outcomes from inputs, but they cannot reliably say what would happen under an intervention not seen in the data. Causal AI goes beyond prediction to enable explanation and intervention. It seeks to identify causal relationships that indicate how changes in one factor will influence others. This enables models to not only predict but also to explain and act.
This also makes AI more human-like in reasoning. Think of the word “cause,” and you will see that the idea of Causal AI is to answer questions such as “What will cause improvements?”, “Why does something happen?”, and “What if we did something different?”
Understanding the "why" behind actions and outcomes and being able to work with "what if" scenarios are the main features of Causal AI compared to the models we are used to working with. We’d like to call it the foundation of critical thinking for models: it allows a model not just to follow rules and collect rewards while learning patterns, but to truly "understand" and analyze why something works or doesn't work in a particular way.
The Turing Award winner Judea Pearl, often referred to as the father of Causal AI, laid the theoretical groundwork for how machines can reason about cause and effect using tools like do-calculus and causal graphs (we’ll cover these terms in the next section).
In his book "The Book of Why," co-authored with Dana Mackenzie, he proposed the concept of the ladder of causation with three levels of causal reasoning:

Image Credit: The Book of Why
Level 1: Association/Seeing involves finding patterns in observational data.
Level 2: Intervention/Doing refers to predicting future effects of deliberate actions (the do-operator).
Level 3: Counterfactuals/Imagining involves reasoning about hypothetical scenarios (what would have happened if something had been different).
Traditional machine learning mostly lives on the first level. Causal AI opens up the upper levels, helping answer why something happened and what might happen. Causal AI builds upon the formal language of causal inference. But first – why did we decide to cover Causal AI at all?
Will Causal AI help us reach AGI?
As we wrote in our previous article about World Models, the integration of Causal AI could make these models far more powerful. And when it comes to AGI, Causal AI may not just be useful – it could be essential.
In his book The Path to AGI, John Thompson – we are honored to have John as one of our longest-standing subscribers – frames AGI as a trifecta: Foundational AI (traditional machine learning), Generative AI (GenAI) (today’s headline-grabber), and Causal AI – each playing a critical role in building true intelligence.
Thompson argues that the future lies in Composite AI – the convergence of all three domains – gradually evolving into AGI through integrated, pragmatic development. We find this theory fascinating (do read John’s book!) and want everyone to be more aware of Causal AI. For now, it remains mostly in the academic shadows, but as you’ll see from the article below, its importance may be greater than most assume. Let’s learn →
Core Concepts in Causal AI
Some of the main ideas include explicit causal models and graphs, the notion of interventions (the do-operator), reasoning about counterfactuals, and the use of tools like structural causal models (SCMs) and do-calculus to derive causal conclusions. Yes, that is a lot of new information, so we will break it down step by step, with a little math along the way.
We will focus on these two types of questions we can ask when working with data:
Observation-based: "What usually happens when I see X = x?"
Intervention-based: "What would happen if I set X = x?"
In regular machine learning, models almost always answer the first one. But sometimes, the second one is what we really care about. So what forms Causal AI?
Causal inference
Causal inference is about answering questions like “What will happen if I do X?” rather than just observing what tends to happen when X is seen or occurs naturally. It focuses on intervention – what changes if we act. For example, just think of the difference between:
Observing that people who take a drug tend to recover (correlation)
Versus asking if the drug actually causes recovery (causation)
This shift, from passive observation to active intervention, is the heart of causal thinking.
There are a few different ways of thinking about cause and effect:
Potential outcomes (thinking in terms of “what would’ve happened if...”)
Directed Acyclic Graphs (DAGs) (drawing arrows to show cause and effect)
Structural Causal Models (SCMs) (more formal models of how things are connected)
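For the potential-outcomes view, the quantity of interest is often written as an average treatment effect. The notation below is the conventional one and is an addition to this article: T is the treatment indicator, and Y(1) and Y(0) are the outcomes a unit would have with and without treatment.

```latex
\mathrm{ATE} = \mathbb{E}\big[\,Y(1) - Y(0)\,\big]
\qquad\text{versus}\qquad
\mathbb{E}[Y \mid T = 1] - \mathbb{E}[Y \mid T = 0]
```

The left-hand quantity is causal; the right-hand one is what observational data gives directly. The two are guaranteed to agree when treatment is assigned at random, which is why confounding matters so much outside of experiments.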
Directed Acyclic Graphs (DAGs) and Structural Causal Models (SCMs)
Directed Acyclic Graphs (DAGs) play a crucial role in representing causal relationships visually and mathematically.
Like any graph, a DAG has nodes and edges. Each node in a DAG represents a variable (for example, “smoking” or “lung cancer”), and each edge represents a direct causal influence from one variable to another.
Directed means the edges between variables go one way, from cause to effect.
Acyclic means you can’t loop back to the same variable — there are no circular dependencies.

Image Credit: “Applied Causal Inference Powered by ML and AI” paper
So DAGs are causal maps that let us visualize how variables are causally connected, identify confounders (variables that influence both a cause and an effect), and figure out which variables to control for when estimating a causal effect. They make confounding problems visible and solvable.
Structural Causal Models (SCMs) go further – they combine these graphs with mathematical functions that define how variables interact and affect each other. SCMs let us compute outcomes under interventions, answer complex questions like “What would have happened if...?”, and even explore alternate scenarios (counterfactuals).
What exactly are these math functions and counterfactuals?
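To make the idea of a causal map concrete, here is a minimal sketch of a DAG stored as a plain Python dictionary. The three-variable graph (age, smoking, lung_cancer) and the helper names are hypothetical and chosen only for illustration. The sketch checks that the graph has no cycles and lists the common causes of a chosen cause-effect pair.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical graph: each key is a cause, each value lists its direct effects.
edges = {
    "age": ["smoking", "lung_cancer"],
    "smoking": ["lung_cancer"],
    "lung_cancer": [],
}

def parents_of(edges):
    """Invert the cause -> effects mapping into effect -> set of direct causes."""
    parents = {node: set() for node in edges}
    for cause, effects in edges.items():
        for effect in effects:
            parents.setdefault(effect, set()).add(cause)
    return parents

def is_acyclic(edges):
    """A graph is a DAG exactly when a topological order exists."""
    try:
        tuple(TopologicalSorter(parents_of(edges)).static_order())
        return True
    except CycleError:
        return False

def ancestors(node, parents, blocked=frozenset()):
    """All direct and indirect causes of `node`, never walking through `blocked`."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen and parent not in blocked:
                seen.add(parent)
                stack.append(parent)
    return seen

def confounders(cause, effect, edges):
    """Variables that influence `cause` and also reach `effect` without passing through `cause`."""
    parents = parents_of(edges)
    return ancestors(cause, parents) & ancestors(effect, parents, blocked={cause})

print(is_acyclic(edges))
print(confounders("smoking", "lung_cancer", edges))
```

In this toy graph, age is flagged as a confounder of the smoking and lung cancer relationship, so it is a variable to control for. Real tooling applies the full backdoor criterion rather than this simple common-cause check.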
The do-operator
While SCMs allow us to distinguish between observing and intervening, the do-operator, written as do(X), formalizes that distinction mathematically.
Judea Pearl introduced the do-operator to express setting X directly, as in a randomized experiment, rather than merely seeing X happen. So here we have:
P(Y | X) = What’s the probability of Y when we observe X?
P(Y | do(X)) = What’s the probability of Y when we intervene to set X?
This distinction is crucial. For example, observing that someone takes a drug is different from assigning it to them. The first tells us about association, while the second can reveal causation.
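The gap between P(Y | X) and P(Y | do(X)) can be seen in a small simulation. This is a sketch under invented assumptions: a hidden "healthy lifestyle" variable Z makes people both more likely to take the drug X and more likely to recover Y, and all probabilities are made up for illustration.

```python
import numpy as np

def simulate(n, rng, do_x=None):
    """Sample from a toy SCM; if do_x is given, X is set by intervention instead of by Z."""
    z = rng.random(n) < 0.5                          # confounder: healthy lifestyle
    if do_x is None:
        x = rng.random(n) < np.where(z, 0.8, 0.2)    # observed: Z influences drug uptake
    else:
        x = np.full(n, bool(do_x))                   # intervened: the arrow Z -> X is cut
    y = rng.random(n) < (0.3 + 0.1 * x + 0.5 * z)    # recovery depends on X and Z
    return z, x, y

rng = np.random.default_rng(0)
n = 200_000

# Observational contrast: P(Y | X=1) - P(Y | X=0)
_, x, y = simulate(n, rng)
observed_diff = y[x].mean() - y[~x].mean()

# Interventional contrast: P(Y | do(X=1)) - P(Y | do(X=0))
_, _, y_treated = simulate(n, rng, do_x=1)
_, _, y_untreated = simulate(n, rng, do_x=0)
causal_diff = y_treated.mean() - y_untreated.mean()

print(f"observed difference: {observed_diff:.3f}")
print(f"causal difference:   {causal_diff:.3f}")
```

By construction the drug adds only 0.1 to the recovery probability, yet the observed difference comes out near 0.4, because people who take the drug are also more likely to have a healthy lifestyle.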
To manipulate and simplify expressions involving the do-operator, Judea Pearl also introduced do-calculus as a mathematical tool to reason about interventions using causal diagrams (DAGs).
Do-calculus rules
Do-calculus consists of three rules that allow us to transform expressions involving the do-operator into ordinary probability expressions under certain assumptions. It’s especially useful when we can't perform interventions directly and only have observational data. Here are the three rules:
Rule 1 (adding or removing observations): if, in the graph where X is intervened on (all arrows into X are removed), Y is conditionally independent of Z given X and W, then conditioning on Z changes nothing, because Z tells us nothing more about Y once X and W are known.
Rule 2 (exchanging an action for an observation): if Y is conditionally independent of Z given X and W in the graph where arrows into X and arrows out of Z are removed, then the intervention do(Z) can be replaced with a regular observation of Z.
Rule 3 (adding or removing actions): if Y is conditionally independent of Z given X and W in the graph where arrows into X, and into those Z nodes that are not ancestors of W, are removed, then do(Z) can be dropped entirely, because it has no impact on Y.
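For reference, the three rules are usually stated as follows. The notation is Pearl's standard one and is an addition to this article: X, Y, Z, and W are disjoint sets of variables in a causal graph G, an overline means "remove all arrows pointing into these nodes", and an underline means "remove all arrows pointing out of these nodes".

```latex
\begin{align*}
\text{Rule 1:}\quad & P(y \mid do(x), z, w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}} \\
\text{Rule 2:}\quad & P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\underline{Z}} \\
\text{Rule 3:}\quad & P(y \mid do(x), do(z), w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\overline{Z(W)}}
\end{align*}
```

Here Z(W) denotes the Z nodes that are not ancestors of any W node in the graph with arrows into X removed. Applying these rules repeatedly is how an expression with do(·) gets rewritten into one that contains only ordinary conditional probabilities.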
Do-calculus gives Causal AI a toolbox for causal reasoning. It can determine which variables to condition on or ignore to correctly infer causality. Together with DAGs, do-calculus provides the logic for making cause-and-effect conclusions without always needing randomized experiments.
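One well-known result that these tools justify is the backdoor adjustment: if an observed variable Z blocks every backdoor path from X to Y, then P(y | do(x)) = Σ_z P(y | x, z) P(z). The sketch below applies that formula to purely observational samples from a toy model. The variables and probabilities are invented for illustration, and no intervention is ever simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Observational data only: Z confounds the effect of X on Y.
z = rng.random(n) < 0.5
x = rng.random(n) < np.where(z, 0.8, 0.2)
y = rng.random(n) < (0.3 + 0.1 * x + 0.5 * z)   # true effect of X on Y is +0.1

def backdoor_adjusted(x_value):
    """Estimate P(Y=1 | do(X=x_value)) as sum over z of P(Y=1 | x, z) * P(z)."""
    total = 0.0
    for z_value in (False, True):
        stratum = (x == x_value) & (z == z_value)
        total += y[stratum].mean() * (z == z_value).mean()
    return total

naive_effect = y[x].mean() - y[~x].mean()
adjusted_effect = backdoor_adjusted(True) - backdoor_adjusted(False)

print(f"naive estimate:    {naive_effect:.3f}")
print(f"adjusted estimate: {adjusted_effect:.3f}")
```

The adjusted estimate lands close to the true effect of 0.1 built into the toy model, while the naive comparison is inflated by confounding. The adjustment is only valid because Z is observed and is the only confounder here.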
In short, the do-operator matters because, given the right conditions, it lets us simulate alternative worlds (what would happen if we acted differently) even when we only have observational data.
Counterfactuals: The "What Ifs"
The most advanced form of causal reasoning asks counterfactual questions, like: "Would a person have survived if they hadn’t taken the drug?" These questions explore alternate realities. To answer them, SCMs are used to simulate alternate realities where the intervention didn’t happen but everything else stayed the same.
These questions go beyond data. They require a model of the world, which is why classical statistics struggles with these "what if" scenarios, whereas causal inference can address them.
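With an SCM, a counterfactual is typically computed in three steps that Pearl calls abduction, action, and prediction. Here is a minimal sketch with a single, made-up structural equation (outcome = 2 × dose + individual noise). The equation and the numbers are assumptions for illustration only.

```python
def outcome(dose, noise):
    """Hypothetical structural equation: recovery score caused by dose plus individual noise."""
    return 2.0 * dose + noise

# What we actually observed for one patient.
observed_dose = 1.0
observed_outcome = 2.5

# Step 1, abduction: infer this patient's unobserved noise term from the evidence.
noise = observed_outcome - 2.0 * observed_dose

# Step 2, action: change the dose in the model, keeping the patient's noise fixed.
counterfactual_dose = 0.0

# Step 3, prediction: recompute the outcome in the modified model.
counterfactual_outcome = outcome(counterfactual_dose, noise)

print(counterfactual_outcome)  # 0.5
```

The key point is that the noise term stays the same between the real and the imagined world. That is what "everything else stayed the same" means formally, and it is why a model of the mechanism is needed, not just data.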
If we return to Pearl’s ladder of causation, which organizes causal thinking, counterfactuals sit at the highest level:
Association – What we observe: P(Y | X).
Intervention – What happens when we act: P(Y | do(X)). Here we can see the do-operator.
Counterfactuals – What would have happened under different scenarios.
With these pieces in place, we can look at the process of causal discovery.
How Causal Discovery Works
Causal discovery is the process of figuring out what causes what just by looking at data. But here’s the issue: data usually shows only correlations, not causation. So causal discovery goes a step further and infers the likely cause-and-effect relationships between variables. Its goal is to find the right causal graph (a DAG).
Causal discovery uses patterns in the data, especially patterns of independence and dependence between variables, to figure out what the structure of the graph might look like. This works only if there are no hidden confounding variables (nothing sneaky causing both X and Y), and if the independencies in the data faithfully reflect the graph, an assumption called causal faithfulness.
Here are several algorithms used in causal discovery:
Constraint-based algorithms
They use conditional independence relationships as constraints, and construct causal structures that respect those constraints.
A basic example is the PC (Peter-Clark) algorithm. It starts with all variables connected, then removes connections between variables that are independent, possibly after controlling for others. It then applies orientation rules to figure out which way the remaining arrows likely point. The result is a causal graph, or often a set of possible graphs.

Image Credit: causaLens blog
Its extension, the FCI (Fast Causal Inference) algorithm, does the same but also accounts for hidden confounders, meaning unobserved variables influencing the results.
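The edge-removal phase of PC can be sketched in a few lines. This is a simplified illustration, not the full algorithm: it thresholds partial correlations instead of running a proper statistical test, conditions on any other variables rather than only on current neighbors, and skips the arrow-orientation step. The data comes from a made-up chain X → Z → Y.

```python
import itertools
import numpy as np

def partial_corr(data, i, j, cond):
    """Correlation between columns i and j after linearly regressing out the columns in cond."""
    def residual(k):
        if not cond:
            return data[:, k] - data[:, k].mean()
        design = np.column_stack([data[:, list(cond)], np.ones(len(data))])
        coef, *_ = np.linalg.lstsq(design, data[:, k], rcond=None)
        return data[:, k] - design @ coef
    return np.corrcoef(residual(i), residual(j))[0, 1]

def pc_skeleton(data, threshold=0.05):
    """Start fully connected; drop an edge once some conditioning set makes the pair independent."""
    n_vars = data.shape[1]
    edges = {frozenset(pair) for pair in itertools.combinations(range(n_vars), 2)}
    for size in range(n_vars - 1):                      # grow the conditioning set step by step
        for i, j in itertools.combinations(range(n_vars), 2):
            if frozenset((i, j)) not in edges:
                continue
            others = [k for k in range(n_vars) if k not in (i, j)]
            for cond in itertools.combinations(others, size):
                if abs(partial_corr(data, i, j, cond)) < threshold:
                    edges.discard(frozenset((i, j)))    # treated as conditionally independent
                    break
    return edges

# Toy data from the chain X -> Z -> Y (columns 0, 1, 2).
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)
y = 0.8 * z + rng.normal(size=n)
data = np.column_stack([x, z, y])

skeleton = pc_skeleton(data)
print(sorted(tuple(sorted(edge)) for edge in skeleton))
```

On this data the direct X–Y connection is removed, because X and Y become independent once Z is controlled for, leaving the skeleton X–Z–Y. Direction is not determined at this stage, which is one reason PC often returns a set of possible graphs.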
Score-based algorithms
These algorithms try different graphs, score how well each one fits the data, and pick the best.
For example, GES (Greedy Equivalence Search) adds and removes edges to maximize a “fit score”. It tries to balance how accurate vs how complex the graph is. It can be slow for large datasets, but works well for smaller ones.
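The "fit score" idea can be illustrated with the Bayesian Information Criterion (BIC), a common choice that rewards fit and penalizes the number of parameters. The sketch below is not GES itself: it only scores three hand-picked candidate graphs under a linear-Gaussian assumption, on made-up data generated from the chain X → Z → Y.

```python
import numpy as np

def bic_score(data, parents):
    """Linear-Gaussian BIC of a DAG (higher is better); parents maps column -> list of parent columns."""
    n = len(data)
    score = 0.0
    for child, parent_list in parents.items():
        columns = [data[:, p] for p in parent_list] + [np.ones(n)]
        design = np.column_stack(columns)
        coef, *_ = np.linalg.lstsq(design, data[:, child], rcond=None)
        residual_var = np.mean((data[:, child] - design @ coef) ** 2)
        log_likelihood = -0.5 * n * (np.log(2 * np.pi * residual_var) + 1)
        n_params = len(parent_list) + 2            # slopes, intercept, noise variance
        score += log_likelihood - 0.5 * n_params * np.log(n)
    return score

# Toy data from the chain X -> Z -> Y (columns 0, 1, 2).
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)
y = 0.8 * z + rng.normal(size=n)
data = np.column_stack([x, z, y])

candidates = {
    "empty": {0: [], 1: [], 2: []},        # no edges: too simple, poor fit
    "chain": {0: [], 1: [0], 2: [1]},      # X -> Z -> Y
    "full":  {0: [], 1: [0], 2: [0, 1]},   # extra edge X -> Y: more complex, no real gain in fit
}
scores = {name: bic_score(data, graph) for name, graph in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

The chain wins because the extra edge in the full graph barely improves the fit but still pays the complexity penalty. Note that graphs such as Y → Z → X would receive the same score as the chain, which is why GES searches over equivalence classes of graphs rather than individual DAGs.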
Other special algorithms
Algorithms such as NOTEARS from Carnegie Mellon University reframe the search for the correct DAG as an optimization problem.
NOTEARS, in particular, turns the problem into a smooth (continuous) optimization problem. It starts with a matrix that represents possible relationships between variables (edges in the graph). A score function defines how well the graph explains the data, and a smooth mathematical constraint is added to make sure the graph has no cycles. Then optimization tools, like a standard numerical solver, find the best matrix. This method is easy to implement and avoids the need for complex, custom algorithms.
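The smooth acyclicity constraint in NOTEARS is h(W) = tr(exp(W ∘ W)) − d, where W is the d × d weighted adjacency matrix and ∘ is the element-wise product; h(W) equals zero exactly when the graph has no cycles. The sketch below evaluates it with a truncated power series so that it needs only NumPy (a library routine such as scipy.linalg.expm would be the more usual choice). The example matrices are made up.

```python
import numpy as np

def acyclicity(weights, terms=20):
    """Approximate h(W) = tr(exp(W * W)) - d using the first `terms` terms of the power series."""
    d = weights.shape[0]
    squared = weights * weights          # element-wise square keeps all entries non-negative
    power = np.eye(d)
    total = 0.0
    for k in range(1, terms + 1):
        power = power @ squared / k      # now equals (W * W)^k / k!
        total += np.trace(power)         # the trace picks up weighted cycles of length k
    return total

# Weighted adjacency matrices: entry [i, j] is the strength of the edge i -> j.
dag = np.array([
    [0.0, 1.5,  0.0],
    [0.0, 0.0, -2.0],
    [0.0, 0.0,  0.0],
])
cyclic = dag.copy()
cyclic[2, 0] = 0.5                       # adds the edge 2 -> 0, closing a cycle

print(acyclicity(dag))      # 0.0 for an acyclic graph
print(acyclicity(cyclic))   # positive when a cycle exists
```

Because h is differentiable, a standard solver can minimize the score function while pushing h(W) toward zero, instead of searching over discrete graph structures.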
Now that we know how Causal AI works in general, it’s time to see how it can be used in real-world scenarios to advance reasoning and help with various difficult tasks.
Causal AI Applications: Real-World Examples
Causal AI is not a topic that everyone is constantly discussing, unlike, for example, reinforcement learning. However, its implementation demonstrates that models that can find cause-and-effect relationships can strongly reshape our world. Here are some examples of use cases from different domains.
Healthcare
In 2022 researchers at Elevance Health (formerly Anthem) applied a causal deep learning model called BCAUS to real-world health records from over 1 million diabetic patients. Since they studied observational data, people on different treatments might differ in other important ways. BCAUS, which is a neural network-based method, balanced these differences. As a result, researchers compared the effectiveness of 80+ antihyperglycemic treatment strategies and identified which drug combinations best reduced blood sugar levels (HbA1c) for different patient cohorts. The causal model revealed that top-ranked therapies achieved an average 0.69% greater HbA1c reduction than lower-ranked treatments, a significant improvement in outcomes.

Image Credit: “Causal deep learning reveals the comparative effectiveness of antihyperglycemic treatments in poorly controlled diabetes” paper
Another example is a study from the University of Edinburgh and Canon Medical Research Europe that demonstrates how Causal AI improves medical decision-making in Alzheimer’s disease diagnosis using brain MRI scans. Researchers used a causal graph to model the relationships between age, Alzheimer’s status, and brain structure, identifying age as a confounder.
Then, they applied a causal generative model to create synthetic brain images by altering either age or Alzheimer’s status while keeping the other fixed. This produced counterfactual examples, helping the model learn to distinguish between the effects of normal aging and disease. For instance, in the 80–90 age group, diagnostic precision increased from 75.5% to 84.2%.

Image Credit: “Causal Machine Learning for Healthcare and Precision Medicine” paper
This shows that incorporating causal knowledge can reduce bias and improve generalization, making machine learning more reliable for personalized healthcare.
Finance
The Bank of England’s supervisors explored Causal AI to explain anomalous financial risk metrics in banks. Using a DAG built from regulatory data and the DoWhy Python library to run analyses and validate the insights, they performed root-cause analysis on sudden changes in indicators like liquidity. In a case study, this approach attributed an abnormal spike in a bank’s Liquidity Coverage Ratio to a shortfall in the liquidity buffer.

Image Credit: Bank Underground blog
Meta’s Instagram tech
Meta’s Instagram team deployed causal inference to improve the notification experience for users. In 2022, they ran a randomized experiment for uplift modeling and used causal machine learning to identify users who would see certain content organically without a notification. By causally targeting notification send/drop decisions, they reduced the number of notifications sent by Instagram to these active users while improving overall user engagement and experience. As a result, causal AI helped Instagram send fewer, but more impactful, notifications.
Causal AI and Reinforcement Learning
Google DeepMind researchers proved a fundamental result linking causality and reinforcement learning (RL). In a 2024 paper, they showed that no AI agent can rely on correlations alone: it needs to grasp causal structure to maintain low regret when an environment’s dynamics change. They found that:
If an agent can adapt well, it must have learned the causal structure.
If an agent has a good causal model, it can make good decisions.
Even when things aren't perfect, approximate learning still works.
Causal discovery is hidden inside transfer learning problems.
This bridges decision-making, transfer learning, and causal inference, showing they’re all connected at a deep level. The result suggests that as AI agents are deployed in open-ended or changing real-world settings, causal understanding will be key to their robustness.
Conclusion: Why is Causal AI important for the future of AI?
Causal inference, a method for understanding why things happen rather than just what happened, can help make investigations faster, more accurate, and more explainable.
Causal AI is powerful, but to truly have an impact, it needs to go beyond just experts. The more people use causal tools – especially scientists, decision-makers, and data analysts – the more we can uncover where current methods fall short. That, in turn, will help steer future research in useful directions.
By incorporating causation, AI can overcome many limitations of current purely statistical learners. Here is why Causal AI is important for the future of AI:
Better generalization and robustness: Unlike traditional ML models, causal models are more stable under changing conditions because they capture true cause-and-effect relationships, not just surface correlations.
Explainability and transparency: As Causal AI can say why something happened, not just what, it identifies the true drivers of decisions and helps developers trace issues back to their source.
Decision-making and ‘What-If’ reasoning: Causal AI enables simulations to predict outcomes of potential actions, which is essential for policy, healthcare, business strategy and analysis in any other domain.
Towards more human-like AI: Understanding patterns alone is not enough for human-level reasoning. If AI can dig into the real causes behind its predictions, this strengthens its common sense and critical “thinking.”
The more people ask "Why?" and analyze nontrivial cause-and-effect relationships, the smarter they become. The same goes for AI. To achieve advanced systems with human-like reasoning, developing Causal AI is one of the essential steps. If combined with ML, GenAI and Physical AI, it could lead to powerful systems that can see, imagine, learn, act, and identify the cause of everything.
Sources and further reading
The Book of Why: The New Science of Cause and Effect (book by Judea Pearl, Dana Mackenzie)
The Path To AGI (book by John Thompson)
What is a directed acyclic graph (DAG)? (IBM’s blog)
Resources from the Turing Post







