In psychology, few things are drilled into you harder than: "Binary outcome? Use logistic regression. Never OLS. Never ever."
I carried this rule for years like a sacred commandment. Then I started working on causal inference - evaluating program effects, matching, treatment estimates - and discovered that the entire applied causal inference field happily runs OLS on binary outcomes under the fancy name of the Linear Probability Model.
At first it felt like watching someone pour red wine into a coffee mug. Technically functional, deeply unsettling. But here's why it may actually make sense in this context:
- You're not predicting, you're estimating. When you care about one number - "how much did this program change the probability of outcome X" - OLS gives you that directly as a coefficient. Logit gives you a log-odds coefficient that you then need to transform into a probability difference via post-estimation (average marginal effects). In many practical settings, both give you comparable answers - but OLS skips the extra step.
- It doesn't blow up so easily. With small samples, many categorical controls, and sparse cells, logit loves to produce infinite coefficients, NaNs, and complete-separation drama.
- The coefficient is human-readable. "Participants were 3 percentage points more likely to move internally" vs "the log-odds ratio was 0.47." One of these gets a nod in a stakeholder meeting. The other gets a blank stare.
- It's not just some hack. Angrist & Pischke basically made it the default in Mostly Harmless Econometrics. Imbens & Rubin cover it. The entire economics causal inference tradition uses it routinely. You add robust standard errors (HC1) for simple designs - with matching or clustering, you may need to adjust the variance estimator further.
- The limits are real - and worth knowing. LPM can predict probabilities below 0 or above 1. If you're building a risk scoring model, that matters. If you're estimating an average effect from a matched sample, it usually doesn't - but one shouldn't stop there. When baseline rates are very high or very low, when you're estimating interaction effects, or when covariate relationships are strongly nonlinear, LPM and logit can diverge meaningfully. Out-of-bounds predictions are your canary: if you're seeing them, check whether the linear approximation is actually holding up. The point is to know when it breaks.

The deeper lesson for me: the real question isn't "OLS vs logit." It's "what estimand do you want?" Risk difference → LPM is often the natural fit. Odds ratio → logistic regression. Risk ratio → log-binomial or modified Poisson. The tool follows the question, not the other way around.
Seems that sometimes sin is the way.
Citation
For attribution, please cite this work as
Stehlík (2026, March 19). Ludek's Blog About People Analytics: The statistical "sin" as best / common practice?. Retrieved from https://blog-about-people-analytics.netlify.app/posts/2026-03-19-ols-vs-logistic-regression/
BibTeX citation
@misc{stehlík2026the,
author = {Stehlík, Luděk},
title = {Ludek's Blog About People Analytics: The statistical "sin" as best / common practice?},
url = {https://blog-about-people-analytics.netlify.app/posts/2026-03-19-ols-vs-logistic-regression/},
year = {2026}
}