A demonstration of how the outputs of Bayesian analysis can be used to simulate business processes while preserving inherent uncertainties.
One of the advantages of doing statistical analysis in a Bayesian framework is that its generative nature makes it a natural fit for business process simulation that incorporates all the uncertainties inherent in the statistical models we use to capture the patterns of interest.
Once the parameters of the models have been estimated, their corresponding posterior distributions can be easily sampled and used in combination with a range of input values to simulate the expected outcomes, including the associated uncertainty that needs to be taken into account when making decisions.
Imagine, for example, that as a CHRO you want to estimate the number of potential new employees brought in by employees who choose to participate in a new referral program. Fortunately, you have data from a three-month pilot of this program in which you offered participation to a small random sample of employees. Using a combination of simple Binomial and Poisson models, you can easily arrive at a reasonable estimate of the outcome of interest when you introduce the program to a larger portion of the company.
Let's implement this simple illustrative example with PyMC, a probabilistic programming library for Python that allows users to build Bayesian models with a simple Python API and fit them using Markov chain Monte Carlo (MCMC) methods.
First, let's load the data from the pilot program. The first table includes 150 employees who were randomly selected and offered participation in the pilot program. The second table then shows the 42 employees who chose to participate in the program and the number of potential new employees they brought in.
import pandas as pd

# table with all pilot nominees
nominees = pd.read_excel("./dataBayesSim.xlsx", sheet_name="nominees")

# table with all pilot participants
participants = pd.read_excel("./dataBayesSim.xlsx", sheet_name="participants")

# showing first few rows of the tables
nominees.head(5)
employeeID participation
0 e1 0
1 e2 0
2 e3 0
3 e4 0
4 e5 0
participants.head(5)
employeeID referrals
0 e7 2
1 e9 4
2 e10 2
3 e11 2
4 e25 1
To estimate the expected number of potential new employees after introducing the referral program to a larger part of the company, we want to model two things: the probability that nominees actually participate in the program, and the expected number of potential candidates that each participant brings in. To do this, we can fit Binomial and Poisson models to the data, respectively. As mentioned above, we will do this in a Bayesian framework, which will make it easier to deal with uncertainty later in our simulation. A side note: for the sake of brevity, I omit the usual sanity checks that should be performed before drawing any conclusions from fitted models, e.g., checking the convergence of the Markov chains or running posterior predictive checks of how well the fitted models predict the observed data.
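For reference, here is a plain sketch of the generative structure assumed by the two models and reused in the simulation further below (N is the number of employees offered participation in the pilot, and N_rollout denotes the size of the target population; the priors are the ones used in the code):

p ~ Beta(1, 1)
participated ~ Binomial(N, p)
lambda ~ Exponential(1/3)
referrals_i ~ Poisson(lambda), for each participant i
expected referrals after rollout ≈ N_rollout * p * lambda

The Beta(1, 1) prior is flat over the participation probability, while the Exponential(1/3) prior weakly concentrates around a mean of 3 referrals per participant.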
Let's start with the first model. From the summary below, we see that the estimated probability of participating in the program lies between 0.22 and 0.35 (94% HDI).
import pymc as pm

# estimating the participation rate
nominated = nominees.shape[0]
participated = nominees['participation'].sum()

# setting up the model
with pm.Model() as participationModel:

    # assigning a flat Beta prior for p
    p = pm.Beta("p", alpha=1, beta=1)

    # defining likelihood
    obs = pm.Binomial("obs", p=p, n=nominated, observed=participated)

    # running mcmc
    idata = pm.sample(3000, tune=500, chains=3, cores=1)

    # generating posterior predictive sample
    participationModelPosterior = pm.sample_posterior_predictive(idata, extend_inferencedata=True)
import arviz as az
# trace plot showing the evolution of parameter vector over the iterations of Markov chain(s)
#az.plot_trace(idata, kind="trace", divergences="bottom", show=True)
# posterior predictive check
#az.plot_ppc(participationModelPosterior, num_pp_samples=500, random_seed=7, alpha=0.3, textsize=14, kind='kde', show=True)
# tabular and visual summary of the posterior probability distribution of the p parameter value
az.summary(idata).round(2)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
p 0.28 0.04 0.22 0.35 0.0 0.0 4338.0 6229.0 1.0
az.plot_posterior(idata, hdi_prob=.95, show=True)
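As a side note, if you want the credible interval as numbers rather than reading it off the summary table or the plot, ArviZ can compute the highest density interval directly from the posterior samples; here is a minimal example using the same 95% probability mass as the plot above:

# 95% highest density interval for the participation probability p
az.hdi(idata, hdi_prob=0.95)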
The second model then suggests that program participants brought in an average of 1.7 to 2.5 referrals.
# setting up the Poisson model
with pm.Model() as referralModel:

    # weakly informative exponential prior for lambda parameter with mean 3
    lambda_ = pm.Exponential('lambda', 1/3)
    # alternative flat prior for lambda parameter
    #lambda_ = pm.Uniform('lambda', lower=0, upper=25)

    # Poisson likelihood
    y_obs = pm.Poisson('y_obs', mu=lambda_, observed=participants['referrals'])

    # running mcmc
    trace = pm.sample(3000, tune=500, chains=3, cores=1)

    # generating posterior predictive sample
    referralModelPosterior = pm.sample_posterior_predictive(trace, extend_inferencedata=True)
# trace plot showing the evolution of parameter vector over the iterations of Markov chain(s)
#az.plot_trace(trace, kind="trace", divergences="bottom", show=True)
# posterior predictive check
#az.plot_ppc(referralModelPosterior, num_pp_samples=500, random_seed=7, alpha=0.3, textsize=14, kind='kde', show=True)
# tabular and visual summary of the posterior probability distribution of the lambda parameter value
az.summary(trace).round(2)
mean sd hdi_3% hdi_97% ... mcse_sd ess_bulk ess_tail r_hat
lambda 2.1 0.22 1.69 2.51 ... 0.0 3676.0 6056.0 1.0
[1 rows x 9 columns]
az.plot_posterior(trace, hdi_prob=.95, show=True)
We can now sample the posterior distributions of the parameters p and lambda, insert them into a dataframe, and for each row (i.e., each sampled combination of parameter values) calculate the expected number of referrals brought in by participating employees when the program is rolled out to the entire population of 1500 employees. As you can see below, using the IQR, our CHRO can expect recruiters to reach roughly 790 to 990 potential candidates once the new company-wide referral program is in place.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="white")

# sampling the parameters p and lambda from their posterior distributions
posterior = pd.DataFrame({
    'p': idata['posterior']['p'].values.flatten(),
    'lambda': trace['posterior']['lambda'].values.flatten()
})

# computing expected number of referrals with 1500 nominees
posterior['expectedReferrals'] = 1500*posterior['p']*posterior['lambda']

# computing summary statistics
m = posterior['expectedReferrals'].mean().round(1)
Q1 = np.percentile(posterior['expectedReferrals'], 25).round(1)
Q2 = np.percentile(posterior['expectedReferrals'], 50).round(1)
Q3 = np.percentile(posterior['expectedReferrals'], 75).round(1)

# visualizing results
sns.histplot(posterior['expectedReferrals'], bins=30, kde=True, color='#5b7db6').set(xlabel="Number of new referrals", ylabel="Count")
plt.gcf().suptitle('Expected referrals for program rollout to all 1500 employees', fontsize=13)
plt.gca().set_title(f'Mean={m}, Q1={Q1}, Median={Q2}, Q3={Q3}', fontsize=10)
plt.show()
And it doesn't have to end there. This estimate can be combined with other inputs, such as the cost of the new referral program or the cost of an alternative sourcing solution, to make a better-informed decision that takes the existing uncertainty into account.
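As a minimal sketch of what this could look like, the snippet below combines the simulated posterior of expected referrals with purely hypothetical cost figures (the per-candidate costs and the fixed rollout cost are illustrative placeholders, not values from the pilot) and asks how likely the program is to pay off compared to an alternative sourcing channel:

# hypothetical cost assumptions (illustrative placeholders, not estimated from the data)
referralCostPerCandidate = 300   # assumed cost of sourcing one candidate via the referral program
agencyCostPerCandidate = 800     # assumed cost of sourcing a comparable candidate externally
programFixedCost = 100000        # assumed one-off cost of the company-wide rollout

# propagating the posterior uncertainty of expected referrals into the cost comparison
posterior['expectedSavings'] = (
    posterior['expectedReferrals'] * (agencyCostPerCandidate - referralCostPerCandidate)
    - programFixedCost
)

# probability that the program pays off under these assumptions
probPositive = (posterior['expectedSavings'] > 0).mean().round(2)
print(f"P(program saves money) = {probPositive}")
print(posterior['expectedSavings'].describe().round(1))

Because the comparison is computed on every posterior draw, the resulting probability and the spread of the savings distribution carry the same parameter uncertainty as the referral estimate itself.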