A longer note on our team's first experiment with an agentic workflow for causal inference.
While preparing causal inference workshops for two recent conferences - SIOP 2026 and Machine Learning Prague 2026 - our team started experimenting with adding AI automation to our usual causal analysis workflow.
The motivation was practical. Causal analysis involves many decisions before any estimator is run: defining the causal question, specifying the treatment and outcome, understanding the assignment mechanism, choosing adjustment variables, drawing a DAG, checking identification, and deciding which diagnostics and robustness checks are necessary.
These steps are easy to compress or skip when working with a general-purpose AI assistant. We wanted a workflow that makes the design stage explicit, forces important decisions to be documented, and prevents the assistant from moving to implementation before the causal setup is clear.
The current version is a small, opinionated repo that wires together three layers.
The first layer is a reusable causal-inference skill: a SKILL.md file plus reference materials for study design, DAGs and identification, estimator choice, diagnostics, refutation and sensitivity analysis, code recipes, and reporting.
The skill encodes a 10-step workflow:
Formulate → Design → DAG → Identify → Choose estimator → Estimate → Diagnose → Refute → Interpret → Report
Steps 1-5 are design-focused. The agent should not write analysis code until the causal question, DAG, identification strategy, and estimator choice have been made explicit.
The workflow is adapted from the causal inference skill by Alexandre Andorra from Learning Bayesian Statistics. We adjusted it to be more broadly accessible, moving beyond a primarily Bayesian analytical framing while keeping the emphasis on causal design, identification, DAG-based reasoning, estimation strategy, diagnostics, and robustness checks.
The skill follows the same SKILL.md plus references/ structure with progressive disclosure, so the agent only loads the reference file relevant to the current step.
The repo also includes a document-generation skill, based on Anthropic’s docx skill, used near the end of the workflow to produce Word reports. These reports cover the causal question, assumptions, analysis plan, diagnostics, robustness checks, and interpretation.
The second layer is an AGENTS.md file with rules that apply across agents. This file contains the main workflow constraints and guardrails.
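As an illustration of the kind of guardrails such a file holds (hypothetical wording, not a verbatim excerpt from the repo):

```markdown
## Invariants (illustrative)

- Do not write or run estimation code until the design steps (Formulate
  through Choose estimator) are documented and approved.
- Justify every adjustment set against the DAG; never condition on
  post-treatment variables.
- Report overlap/positivity diagnostics before interpreting any estimate.
- Every reported effect must be traceable to the project.yaml state it
  was generated under.
```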
The goal is to make the workflow more disciplined and easier to audit. Many causal analysis problems originate before estimation: vague treatment definitions, unclear assignment mechanisms, inappropriate controls, post-treatment covariates, positivity problems, or unsupported identification claims. The invariants file is meant to catch some of these issues before they are buried in code.
The third layer is a set of four agent roles with explicit handoffs.
A design.yaml file records the key answered design questions. In practice, this creates an iterative workflow rather than a single long prompt: the process usually involves several rounds between the Planner, Implementer, Reviewer, and the human analyst.
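To make the handoff concrete, a design.yaml artifact of this kind might record something like the following; all field names here are hypothetical, and the repo's actual schema may differ:

```yaml
# Illustrative sketch of a design.yaml handoff artifact (hypothetical fields).
causal_question: "Does program participation improve 6-month retention?"
treatment: program_participation        # binary, self-selected
outcome: retained_6m
assignment_mechanism: voluntary_enrollment
adjustment_set: [tenure, department, baseline_engagement]
identification: backdoor                # justified against the DAG
estimator: doubly_robust
status: approved                        # the gate before implementation begins
```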

The repo also enforces a few project-hygiene rules.
project.yaml is the main source of truth for what is already known about the project: treatment, assignment mechanism, outcomes, covariates, measurement timing, and file paths. This makes it easier to see which design decisions were in place when a particular result, diagnostic, or report was generated.
This template is implemented for GitHub Copilot custom agents. The workflow itself is platform-neutral: adapt AGENTS.md, .github/agents/*.agent.md, and causal-inference to your preferred agentic coding tool’s instruction format. Most modern coding agents can do this translation in a few minutes.
You can find the repo here.
A typical workflow starts by forking the repo. Then you fill in project.yaml with the information you already know about your project.
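For instance, using the bundled leadership-program dummy data as a reference point, a minimal project.yaml might look like this (field names are illustrative; the forked template defines the actual schema):

```yaml
# Hypothetical project.yaml sketch; check the repo template for real fields.
treatment: leadership_program           # who received the intervention
assignment_mechanism: voluntary participation, uneven uptake across departments
outcomes: [engagement_followup, retained_3m, retained_6m, retained_9m, retained_12m]
covariates: [tenure, department, baseline_engagement]
measurement_timing:
  baseline: pre-program survey
  followup: post-program survey
paths:
  raw_data: raw/program_data.csv
```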
Then you place the raw data files into raw/.
After that, open your AI assistant of choice and ask the Planner to read project.yaml and prepare a causal estimation plan.
The Planner performs structural EDA on the raw data, emulates a target trial where appropriate, draws a DAG, asks design-critical clarification questions, and writes the plan to:
docs/plans/<slug>.md
Once the design questions are answered and the identification block is approved, the work moves to the Implementer.
The Implementer writes and runs the analysis code, produces diagnostics and robustness checks, and prepares outputs.
The Reviewer then audits the design and implementation. If the analysis is ready, it returns a terminal approval. If not, it routes findings back to the Planner or Implementer, depending on whether the problem is design-level or code-level.
So far, we have tried the workflow on two projects. That is still a small sample, but the initial experience has been encouraging.
The workflow feels fairly natural because it matches the way causal analysis usually unfolds: not as one clean linear sequence, but as repeated movement between design clarification, implementation, diagnostics, review, and revision.
A typical run has taken around 10-15 iterations across the Planner, Implementer, Reviewer, and the human analyst.
The most useful aspect so far has been the enforced separation between design and implementation. The agent is less likely to rush into estimation, and the human analyst gets clearer artifacts to review: a design plan, a DAG, an identification block, diagnostics, robustness checks, and a final report.
This is not meant to replace domain knowledge or causal judgment. The workflow can help document assumptions, enforce process constraints, and surface missing design decisions. It cannot determine whether the assumptions are substantively correct. That still requires human expertise and, usually, discussion with people who understand the domain, intervention, data-generating process, and measurement context.
The workflow is also still early. We need to test it across more designs, data structures, and failure cases to learn where problems show up first.
The repo ships with dummy data from a manager leadership development program evaluation - the same dataset we used at the workshops. It’s a realistic observational setup (voluntary participation, uneven program uptake across departments, baseline and follow-up survey outcomes, retention flags at 3/6/9/12 months), so you can run the workflow end-to-end without your own project.
Give it a try and let me know how it works for you. I would especially appreciate feedback on what works and what breaks down in practice.
I would also be very interested to see related efforts. If you are building agentic workflows for causal inference, statistical analysis, scientific workflows, or research automation more broadly, please share what you have tried and what has or has not worked.
For attribution, please cite this work as
Stehlík (2026, May 16). Ludek's Blog About People Analytics: Agentic workflow for causal inference. Retrieved from https://blog-about-people-analytics.netlify.app/posts/2026-05-16-agentic-workflow-for-causal-inference/
BibTeX citation
```bibtex
@misc{stehlík2026agentic,
  author = {Stehlík, Luděk},
  title = {Ludek's Blog About People Analytics: Agentic workflow for causal inference},
  url = {https://blog-about-people-analytics.netlify.app/posts/2026-05-16-agentic-workflow-for-causal-inference/},
  year = {2026}
}
```