Using the CausalPy package to test the plausibility of the hypothesis that Stack Overflow may not be the only “victim” of ChatGPT and similar GenAI tools.
I wanted to try out the new CausalPy package for causal inference and was brainstorming interesting research questions to apply it to. My inspiration came from a recent wave of LinkedIn posts in my feed, all sharing a chart showing a drop in Stack Overflow traffic after ChatGPT’s release in November 2022.
Given that many studies - including one we conducted internally at Sanofi - show that one of the most frequent use cases for GenAI in the business world is helping with email writing (we’ve probably all seen the hilarious meme on this topic shown below 😁), I got curious about whether a similar pattern might show up in Google Trends for searches like “How to write an email.”
For the analysis, I used an interrupted time series design, which estimates the effect of an intervention that occurred at a known point in time by comparing the time series before and after it, allowing us to assess any shift in level or trend.
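To make the idea concrete, here is a minimal sketch of the interrupted time series logic using plain NumPy on synthetic data. It is not CausalPy's implementation, and the series, the size of the drop (`true_drop`), and the simple linear trend are all made-up assumptions for illustration: a model is fitted on the pre-intervention period only, extrapolated into the post-intervention period as a counterfactual, and the impact is the gap between observed and counterfactual values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly series: 36 pre-intervention and 24 post-intervention points.
n_pre, n_post = 36, 24
t = np.arange(n_pre + n_post)
true_drop = -10.0  # assumed level shift after the intervention (illustrative only)
y = 50.0 + 0.2 * t + rng.normal(0.0, 1.0, size=t.size)
y[n_pre:] += true_drop

# Fit a linear trend (intercept + slope) on the pre-intervention period only.
X = np.column_stack([np.ones(t.size), t.astype(float)])
beta, *_ = np.linalg.lstsq(X[:n_pre], y[:n_pre], rcond=None)

# Extrapolate the pre-period fit into the post-period: the counterfactual.
counterfactual = X[n_pre:] @ beta

# Estimated impact = observed minus counterfactual at each post-period point.
impact = y[n_pre:] - counterfactual
mean_impact = impact.mean()
cumulative_impact = impact.cumsum()[-1]

print(f"mean impact: {mean_impact:.2f}")
print(f"cumulative impact: {cumulative_impact:.2f}")
```

With real data the pre-period model would typically also include seasonality terms, and a Bayesian fit would give a full posterior for the impact rather than a single point estimate.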
As the resulting charts below show, there does seem to be a noticeable drop in searches on how to write emails following ChatGPT’s official release. In fact, the shift is so pronounced that it’s visible even at a quick glance. Sure, we can speculate that other factors are involved - maybe we’re sending fewer emails because we’re relying more on platforms like Slack or Teams - but as other available stats suggest (see, for example, this one), we can only wish that were the case 😉
Either way, CausalPy turned out to be a very easy-to-use tool. I’m looking forward to using it in my future projects. Kudos to the entire CausalPy team! 👏
P.S. To replicate the analysis, you can use the Python code below.
# loading the reticulate package for running Python in an .Rmd file
library(reticulate)
# importing libraries
import pandas as pd
import numpy as np
from pytrends.request import TrendReq
import matplotlib.pyplot as plt
import arviz as az
import causalpy as cp
# getting global Google Trends data for a key phrase of interest
# initializing the TrendReq object
pytrends = TrendReq(hl='en-US', tz=360, requests_args={'verify': False})
# specifying keyword and time range
keywords = ['How to write email']
timeframe = '2020-01-01 2024-10-31'
# creating payload and fetching data
pytrends.build_payload(keywords, cat=0, timeframe=timeframe, geo='', gprop='')
data = pytrends.interest_over_time()
# data overview
# data.head()
# data.info()
# creating month variable for capturing seasonality in the data
data['month'] = data.index.month
# creating time variable
data['t'] = np.arange(len(data))
# renaming target variable
data.rename(columns={'How to write email': 'y'}, inplace=True)
# saving the data for later usage
# data.to_csv('google_trends_data.csv', index=True)
# specifying the date of intervention
treatment_time = pd.to_datetime("2022-10-30")
# specifying and fitting the model
seed = 2024
result = cp.InterruptedTimeSeries(
    data,
    treatment_time,
    formula="y ~ 1 + t + C(month)",
    model=cp.pymc_models.LinearRegression(
        sample_kwargs={
            "random_seed": seed,
            "draws": 5000,
            "tune": 1000,
            "chains": 4
        }
    ),
)
# summary of the fitted model
result.summary()
# plotting the results
fig, ax = result.plot()
plt.show()
# summary statistics of the causal impact over the entire post-intervention period
az.summary(result.post_impact.mean("obs_ind"))
# summary statistics of the cumulative causal impact
# getting index of the final time point
index = result.post_impact_cumulative.obs_ind.max()
# grabbing the posterior distribution of the cumulative impact at this final time point
last_cumulative_estimate = result.post_impact_cumulative.sel({"obs_ind": index})
# getting summary stats
az.summary(last_cumulative_estimate)
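For readers wondering what the last two summaries represent, here is a rough NumPy sketch on made-up posterior draws (not the actual results, and not how CausalPy or ArviZ compute things internally). The array `impact_draws` and the helper `summarize` are hypothetical names used only for illustration, and the equal-tailed interval is a simple stand-in for the highest-density interval that `az.summary` reports.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior draws of the impact: rows are draws, columns are
# post-intervention time points (made-up numbers, not the real results).
n_draws, n_points = 2000, 24
impact_draws = rng.normal(loc=-8.0, scale=2.0, size=(n_draws, n_points))

# Average impact over the post-intervention period, one value per draw.
mean_impact_per_draw = impact_draws.mean(axis=1)

# Cumulative impact: running sum over time; the last column is the total.
cumulative_per_draw = impact_draws.cumsum(axis=1)
final_cumulative = cumulative_per_draw[:, -1]


def summarize(samples, prob=0.94):
    """Return the mean and an equal-tailed interval of the samples."""
    tail = (1 - prob) / 2
    lower, upper = np.quantile(samples, [tail, 1 - tail])
    return samples.mean(), lower, upper


print("mean impact:", summarize(mean_impact_per_draw))
print("cumulative impact:", summarize(final_cumulative))
```

The point is that both quantities are computed per posterior draw first and summarized afterwards, so the reported intervals reflect the uncertainty in the whole post-intervention trajectory.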
For attribution, please cite this work as
Stehlík (2024, Nov. 13). Ludek's Blog About People Analytics: ChatGPT as a new email writing coach?. Retrieved from https://blog-about-people-analytics.netlify.app/posts/2024-11-13-chatgpt-emails-and-causalpy/
BibTeX citation
@misc{stehlík2024chatgpt, author = {Stehlík, Luděk}, title = {Ludek's Blog About People Analytics: ChatGPT as a new email writing coach?}, url = {https://blog-about-people-analytics.netlify.app/posts/2024-11-13-chatgpt-emails-and-causalpy/}, year = {2024} }