Validating the Causal Impact of the Synthetic Control Method | by Ryan O'Sullivan | Jun, 2024


Causal AI, exploring the integration of causal reasoning into machine learning

Photo by Irina Inga on Unsplash

Welcome to my series on Causal AI, where we will explore the integration of causal reasoning into machine learning models. Expect to explore a variety of practical applications across different business contexts.

In the last article we covered measuring the intrinsic causal influence of your marketing campaigns. In this article we'll move on to validating the causal impact of synthetic controls.

If you missed the last article on intrinsic causal influence, check it out here:

In this article we'll focus on understanding the synthetic control method and exploring how we can validate the estimated causal impact.

The following aspects will be covered:

  • What is the synthetic control method?
  • What challenge does it try to overcome?
  • How can we validate the estimated causal impact?
  • A Python case study using realistic Google Trends data, demonstrating how we can validate the estimated causal impact of synthetic controls.

The full notebook can be found here:

What’s it?

The synthetic control method is a causal technique which can be used to assess the causal impact of an intervention or treatment when a randomised control trial (RCT) or A/B test was not possible. It was originally proposed in 2003 by Abadie and Gardeazabal. The following paper includes a great case study to help you understand the proposed method:

https://web.stanford.edu/~jhain/Paper/JASA2010.pdf

User generated image

Let's cover some of the basics ourselves… The synthetic control method creates a counterfactual version of the treatment unit by creating a weighted combination of control units that did not receive the intervention or treatment.

  • Treated unit: The unit which receives the intervention.
  • Control units: A set of similar units which did not receive the intervention.
  • Counterfactual: Created as a weighted combination of the control units. The aim is to find weights for each control unit that result in a counterfactual which closely matches the treated unit in the pre-intervention period.
  • Causal impact: The difference between the post-intervention treatment unit and the counterfactual.

If we wanted to really simplify things, we could think of it as linear regression where each control unit is a feature and the treatment unit is the target. The pre-intervention period is our train set, and we use the model to score our post-intervention period. The difference between actual and predicted is the causal impact.
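To make the regression framing concrete, here is a minimal self-contained sketch. Everything in it is invented for illustration (three control series, true weights of 0.5/0.3/0.2, and a lift of 5 per week in the last 10 weeks); it is not the article's dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy data: 60 weeks, 3 control units; the treated unit is a weighted
# combination of the controls, plus a lift of 5 in the last 10 weeks
controls = rng.normal(100, 10, size=(60, 3))
treated = 0.5 * controls[:, 0] + 0.3 * controls[:, 1] + 0.2 * controls[:, 2]
treated[50:] += 5  # the "intervention" effect

# Train on the pre-intervention period only (controls = features, treated = target)
model = LinearRegression().fit(controls[:50], treated[:50])

# Score the post-intervention period to get the counterfactual
counterfactual = model.predict(controls[50:])
impact = (treated[50:] - counterfactual).sum()
print(round(impact, 2))  # close to 10 weeks * 5 = 50
```

Because the treated series is an exact weighted combination of the controls in the pre-period, the fitted model recovers the counterfactual almost perfectly, and the summed post-period residual lands on the true total lift of 50.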

Below are a couple of examples to bring it to life, where we might consider using it:

  • When running a TV marketing campaign, we are unable to randomly assign the audience into those that can and can't see the campaign. However, we could carefully select a region to trial the campaign and use the remaining regions as control units. Once we have measured the effect, the campaign could be rolled out to other regions. This is often called a geo-lift test.
  • Policy changes which are brought into some regions but not others — For example, a local council may bring a policy into force to reduce unemployment. Other regions where the policy wasn't in place could be used as control units.

What challenge does it try to overcome?

When we combine high-dimensionality (lots of features) with limited observations, we can get a model which overfits.

Let's take the geo-lift example to illustrate. If we use weekly data from the last year as our pre-intervention period, this gives us 52 observations. If we then decide to test our intervention across countries in Europe, that would give us an observation-to-feature ratio of 1:1!

Earlier we talked about how the synthetic control method could be implemented using linear regression. However, the observation-to-feature ratio means it is very likely linear regression will overfit, resulting in a poor causal impact estimate in the post-intervention period.

In linear regression the weights (coefficients) for each feature (control unit) could be negative or positive, and they may sum to a number greater than 1. However, the synthetic control method learns the weights whilst applying the below constraints:

  • Constraining weights to sum to 1
  • Constraining weights to be ≥ 0
User generated image
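Putting the pieces together, the weights can be written as the solution to a constrained optimisation problem. This is a standard way of writing it (my notation, not taken from the original paper): $y_t$ is the treated unit and $x_{jt}$ is control unit $j$ at time $t$, summed over the pre-intervention period:

```latex
\min_{w} \; \sqrt{\sum_{t \in \text{pre}} \Big( y_t - \sum_{j=1}^{J} w_j \, x_{jt} \Big)^2}
\quad \text{subject to} \quad \sum_{j=1}^{J} w_j = 1, \qquad w_j \ge 0 \;\; \forall j
```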

These constraints help with regularisation and avoid extrapolation beyond the range of the observed data.

It's worth noting that in terms of regularisation, Ridge and Lasso regression can achieve this, and in some cases are reasonable alternatives. But we'll test this out in the case study!

How can we validate the estimated causal impact?

An arguably bigger challenge is the fact that we are unable to validate the estimated causal impact in the post-intervention period.

How long should my pre-intervention period be? Are we sure we haven't overfit our pre-intervention period? How can we know whether our model generalises well in the post-intervention period? What if I want to try out different implementations of the synthetic control method?

We could randomly select a few observations from the pre-intervention period and hold them back for validation — but we have already highlighted the challenge that comes with limited observations, so we might make things even worse!

What if we could run some kind of pre-intervention simulation? Could that help us answer some of the questions highlighted above and gain confidence in our model's estimated causal impact? All will be explained in the case study!

Background

After convincing Finance that brand marketing is driving some serious value, the marketing team approach you to ask about geo-lift testing. Someone from Facebook has told them it's the next big thing (although it was the same person who told them Prophet was a good forecasting model) and they want to know whether they could use it to measure their upcoming TV campaign.

You're a little concerned, as the last time you ran a geo-lift test the marketing analytics team thought it was a good idea to play around with the pre-intervention period used until they had a nice big causal impact.

This time round, you suggest that they run a "pre-intervention simulation", and you propose that the pre-intervention period is agreed before the test starts.

So let's explore what a "pre-intervention simulation" looks like!

Creating the data

To make this as realistic as possible, I extracted some Google Trends data for the majority of countries in Europe. What the search term was isn't relevant, just pretend it's the sales for your company (and that you operate across Europe).

However, if you are interested in how I got the Google Trends data, check out my notebook:

Below we can see the dataframe. We have sales for the past 3 years across 50 European countries. The marketing team plan to run their TV campaign in Great Britain.

User generated image

Now here comes the clever bit. We will simulate an intervention in the last 7 weeks of the time series.

np.random.seed(1234)

# Create intervention flag
mask = (df['date'] >= "2024-04-14") & (df['date'] <= "2024-06-02")
df['intervention'] = mask.astype(int)

row_count = len(df)

# Create intervention uplift
df['uplift_perc'] = np.random.uniform(0.10, 0.20, size=row_count)
df['uplift_abs'] = round(df['uplift_perc'] * df['GB'])
df['y'] = df['GB']
df.loc[df['intervention'] == 1, 'y'] = df['GB'] + df['uplift_abs']
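If you want to convince yourself that the flag and the uplift behave as intended, the same mechanics can be run on a tiny invented frame (the column names mirror the article; the sales numbers are made up):

```python
import numpy as np
import pandas as pd

np.random.seed(1234)

# Tiny invented frame standing in for the real dataset
toy = pd.DataFrame({
    "date": pd.date_range("2024-04-07", periods=9, freq="W"),
    "GB": [100, 110, 105, 120, 115, 108, 112, 118, 109],
})

# Same intervention flag and uplift logic as above
mask = (toy["date"] >= "2024-04-14") & (toy["date"] <= "2024-06-02")
toy["intervention"] = mask.astype(int)
toy["uplift_perc"] = np.random.uniform(0.10, 0.20, size=len(toy))
toy["uplift_abs"] = round(toy["uplift_perc"] * toy["GB"])
toy["y"] = toy["GB"]
toy.loc[toy["intervention"] == 1, "y"] = toy["GB"] + toy["uplift_abs"]

# The uplift should only appear in the 8 flagged weeks
treated = toy[toy["intervention"] == 1]
untreated = toy[toy["intervention"] == 0]
print(len(treated))                                  # 8
print((untreated["y"] == untreated["GB"]).all())     # True
print(((treated["y"] - treated["GB"]) >= 10).all())  # True: at least 10% of sales
```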

Now let's plot the actual and counterfactual sales across GB to bring what we have done to life:

def synth_plot(df, counterfactual):

    plt.figure(figsize=(14, 8))
    sns.set_style("white")

    # Create plot
    sns.lineplot(data=df, x='date', y='y', label='Actual', color='b', linewidth=2.5)
    sns.lineplot(data=df, x='date', y=counterfactual, label='Counterfactual', color='r', linestyle='--', linewidth=2.5)
    plt.title('Synthetic Control Method: Actual vs. Counterfactual', fontsize=24)
    plt.xlabel('Date', fontsize=20)
    plt.ylabel('Metric Value', fontsize=20)
    plt.legend(fontsize=16)
    plt.gca().xaxis.set_major_formatter(plt.matplotlib.dates.DateFormatter('%Y-%m-%d'))
    plt.xticks(rotation=90)
    plt.grid(True, linestyle='--', alpha=0.5)

    # Highlight the intervention point
    intervention_date = '2024-04-07'
    plt.axvline(pd.to_datetime(intervention_date), color='k', linestyle='--', linewidth=1)
    plt.text(pd.to_datetime(intervention_date), plt.ylim()[1]*0.95, 'Intervention', color='k', fontsize=18, ha='right')

    plt.tight_layout()
    plt.show()

synth_plot(df, 'GB')
User generated image

Now that we have simulated an intervention, we can explore how well the synthetic control method works.

Pre-processing

All the European countries apart from GB are set as control units (features). The treatment unit (target) is the sales in GB with the intervention applied.

# Delete the original target column so we don't use it as a feature by accident
del df['GB']

# Set features & target
X = df.columns[1:50]
y = 'y'

Regression

Below I've set up a function which we can re-use with different pre-intervention periods and different regression models (e.g. Ridge, Lasso):

def train_reg(df, start_index, reg_class):

    df_temp = df.iloc[start_index:].copy().reset_index()

    X_pre = df_temp[df_temp['intervention'] == 0][X]
    y_pre = df_temp[df_temp['intervention'] == 0][y]

    X_train, X_test, y_train, y_test = train_test_split(X_pre, y_pre, test_size=0.10, random_state=42)

    model = reg_class
    model.fit(X_train, y_train)

    yhat_train = model.predict(X_train)
    yhat_test = model.predict(X_test)

    mse_train = mean_squared_error(y_train, yhat_train)
    mse_test = mean_squared_error(y_test, yhat_test)
    print(f"Mean Squared Error train: {round(mse_train, 2)}")
    print(f"Mean Squared Error test: {round(mse_test, 2)}")

    r2_train = r2_score(y_train, yhat_train)
    r2_test = r2_score(y_test, yhat_test)
    print(f"R2 train: {round(r2_train, 2)}")
    print(f"R2 test: {round(r2_test, 2)}")

    df_temp['pred'] = model.predict(df_temp.loc[:, X])
    df_temp['delta'] = df_temp['y'] - df_temp['pred']

    pred_lift = df_temp[df_temp['intervention'] == 1]['delta'].sum()
    actual_lift = df_temp[df_temp['intervention'] == 1]['uplift_abs'].sum()
    abs_error_perc = abs(pred_lift - actual_lift) / actual_lift
    print(f"Predicted lift: {round(pred_lift, 2)}")
    print(f"Actual lift: {round(actual_lift, 2)}")
    print(f"Absolute error percentage: {round(abs_error_perc, 2)}")

    return df_temp, abs_error_perc

To start us off we keep things simple and use linear regression to estimate the causal impact, using a small pre-intervention period:

df_lin_reg_100, pred_lift_lin_reg_100 = train_reg(df, 100, LinearRegression())
User generated image

Looking at the results, linear regression doesn't do great. But this isn't surprising given the observation-to-feature ratio.

synth_plot(df_lin_reg_100, 'pred')
User generated image

Synthetic control method

Let's jump right in and see how it compares to the synthetic control method. Below I've set up a similar function to before, but this time applying the synthetic control method using SciPy:

def synthetic_control(weights, control_units, treated_unit):

    # Synthetic counterfactual: weighted combination of the control units
    synthetic = np.dot(control_units.values, weights)

    # Root of the sum of squared errors vs the treated unit
    return np.sqrt(np.sum((treated_unit - synthetic)**2))


def train_synth(df, start_index):

    df_temp = df.iloc[start_index:].copy().reset_index()

    X_pre = df_temp[df_temp['intervention'] == 0][X]
    y_pre = df_temp[df_temp['intervention'] == 0][y]

    X_train, X_test, y_train, y_test = train_test_split(X_pre, y_pre, test_size=0.10, random_state=42)

    initial_weights = np.ones(len(X)) / len(X)

    # Weights must sum to 1 and stay between 0 and 1
    constraints = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1})
    bounds = [(0, 1) for _ in range(len(X))]

    result = minimize(synthetic_control,
                      initial_weights,
                      args=(X_train, y_train),
                      method='SLSQP',
                      bounds=bounds,
                      constraints=constraints,
                      options={'disp': False, 'maxiter': 1000, 'ftol': 1e-9},
                      )

    optimal_weights = result.x

    yhat_train = np.dot(X_train.values, optimal_weights)
    yhat_test = np.dot(X_test.values, optimal_weights)

    mse_train = mean_squared_error(y_train, yhat_train)
    mse_test = mean_squared_error(y_test, yhat_test)
    print(f"Mean Squared Error train: {round(mse_train, 2)}")
    print(f"Mean Squared Error test: {round(mse_test, 2)}")

    r2_train = r2_score(y_train, yhat_train)
    r2_test = r2_score(y_test, yhat_test)
    print(f"R2 train: {round(r2_train, 2)}")
    print(f"R2 test: {round(r2_test, 2)}")

    df_temp['pred'] = np.dot(df_temp.loc[:, X].values, optimal_weights)
    df_temp['delta'] = df_temp['y'] - df_temp['pred']

    pred_lift = df_temp[df_temp['intervention'] == 1]['delta'].sum()
    actual_lift = df_temp[df_temp['intervention'] == 1]['uplift_abs'].sum()
    abs_error_perc = abs(pred_lift - actual_lift) / actual_lift
    print(f"Predicted lift: {round(pred_lift, 2)}")
    print(f"Actual lift: {round(actual_lift, 2)}")
    print(f"Absolute error percentage: {round(abs_error_perc, 2)}")

    return df_temp, abs_error_perc

I keep the pre-intervention period the same to create a fair comparison with linear regression:

df_synth_100, pred_lift_synth_100 = train_synth(df, 100)
User generated image

Wow! I'll be the first to admit I wasn't expecting such a significant improvement!

synth_plot(df_synth_100, 'pred')
User generated image

Comparison of results

Let's not get too carried away yet. Below we run a few more experiments, exploring model types and pre-intervention periods:

# Run regression experiments
df_lin_reg_00, pred_lift_lin_reg_00 = train_reg(df, 0, LinearRegression())
df_lin_reg_100, pred_lift_lin_reg_100 = train_reg(df, 100, LinearRegression())
df_ridge_00, pred_lift_ridge_00 = train_reg(df, 0, RidgeCV())
df_ridge_100, pred_lift_ridge_100 = train_reg(df, 100, RidgeCV())
df_lasso_00, pred_lift_lasso_00 = train_reg(df, 0, LassoCV())
df_lasso_100, pred_lift_lasso_100 = train_reg(df, 100, LassoCV())

# Run synthetic control experiments
df_synth_00, pred_lift_synth_00 = train_synth(df, 0)
df_synth_100, pred_lift_synth_100 = train_synth(df, 100)

experiment_data = {
    "Method": ["Linear", "Linear", "Ridge", "Ridge", "Lasso", "Lasso", "Synthetic Control", "Synthetic Control"],
    "Data Size": ["Large", "Small", "Large", "Small", "Large", "Small", "Large", "Small"],
    "Value": [pred_lift_lin_reg_00, pred_lift_lin_reg_100, pred_lift_ridge_00, pred_lift_ridge_100, pred_lift_lasso_00, pred_lift_lasso_100, pred_lift_synth_00, pred_lift_synth_100]
}

df_experiments = pd.DataFrame(experiment_data)

We'll use the code below to visualise the results:

# Set the style
sns.set_style("whitegrid")

# Create the bar plot
plt.figure(figsize=(10, 6))
bar_plot = sns.barplot(x="Method", y="Value", hue="Data Size", data=df_experiments, palette="muted")

# Add labels and title
plt.xlabel("Method")
plt.ylabel("Absolute error percentage")
plt.title("Synthetic Controls - Comparison of Methods Across Different Data Sizes")
plt.legend(title="Data Size")

# Show the plot
plt.show()

User generated image

The results for the small dataset are really interesting! As expected, regularisation helped improve the causal impact estimates, and the synthetic control method took it one step further!

The results for the large dataset suggest that longer pre-intervention periods aren't always better.

However, the thing I want you to take away is how useful carrying out a pre-intervention simulation is. There are so many avenues you could explore with your own dataset!

Today we explored the synthetic control method and how you can validate the causal impact. I'll leave you with a few final thoughts:

  • The simplicity of the synthetic control method makes it one of the most widely used techniques from the causal AI toolbox.
  • Unfortunately it is also the most widely abused — let's run the R CausalImpact package, changing the pre-intervention period until we see an uplift we like. 😭
  • This is where I highly recommend running pre-intervention simulations to agree the test design upfront.
  • The synthetic control method is a heavily researched area. It's worth checking out the proposed adaptations: Augmented SC, Robust SC and Penalized SC.

Alberto Abadie, Alexis Diamond & Jens Hainmueller (2010) Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program, Journal of the American Statistical Association, 105:490, 493–505, DOI: 10.1198/jasa.2009.ap08746
