ITS Plot Examples: Univariate vs Multivariate, Two vs Three Periods#

This notebook illustrates the plot options for multiple-outcome Interrupted Time Series (ITS) designs in CausalPy:

  • Two-period design (permanent intervention: one treatment start) vs three-period design (temporary intervention: treatment start and end)

  • For multivariate: overlay (all outcomes on the same axes) vs per_outcome (one figure per outcome)

All examples use the Bayesian (PyMC) workflow. Sampling uses reduced chains/draws so the notebook runs quickly.

Summary: What each part does#

  • 1. Multivariate, two-period : Several outcomes at once, each with its own formula, using MultivarLinearReg. Demonstrates two plot layouts: overlay (all outcomes on the same axes) and per_outcome (one figure per outcome, each with three panels).

  • 2. Multivariate, three-period : Same multivariate setup as above, but with a temporary intervention (start and end). Again shows overlay and per_outcome options.

  • 3. Get plot data (DataFrame export) : How to obtain the data behind the plots via get_plot_data(): for univariate ITS you get one DataFrame with time index, prediction, impact, and HDI columns; for multivariate you get a long-format DataFrame with an outcome column.

  • 4. Proving we recover the covariance matrix : Validation of MultivarLinearReg: we generate data from a known residual covariance (three outcomes), fit the model, and compare the posterior mean covariance to the true one (max absolute difference and mean squared error).

  • 5. Summary : How summary() works: it prints the formula, model coefficients, and the effect summary table (one per outcome in the multivariate case).

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import causalpy as cp

seed = 42
# Fast sampling for demo (increase for real analyses)
sample_kwargs = {
    "chains": 2,
    "draws": 200,
    "tune": 100,
    "progressbar": False,
    "random_seed": seed,
}
import os

# Workaround for the Intel/LLVM OpenMP clash warned about in the sampling output below
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

1. Multivariate, two-period#

Two outcomes, each with its own formula, passed as a list. We use MultivarLinearReg, which supports multiple outcomes. Plot API:

  • Default: call plot() (or plot(layout="overlay")) — one figure, 3 panels, all outcomes on the same axes.

  • Per outcome: call plot(layout="per_outcome") — one figure per outcome, each with 3 panels.

  • Optionally pass outcomes_to_plot=["y1", ...] to show a subset of outcomes.

rng = np.random.default_rng(seed)
n = 72
dates = pd.date_range(start="2015-01-01", periods=n, freq="MS")
t = np.arange(n, dtype=float)
y1 = 10 + 0.3 * t + 2 * np.sin(2 * np.pi * np.arange(n) / 12) + rng.normal(0, 1, n)
y2 = 20 + 0.1 * t + rng.normal(0, 1.5, n)
df_multi = pd.DataFrame(
    {"y1": y1, "y2": y2, "t": t},
    index=dates,
)
df_multi.index.name = "obs_ind"
treatment_time_m = pd.to_datetime("2018-01-01")

result_multi_2p = cp.InterruptedTimeSeries(
    df_multi,
    treatment_time_m,
    formula=["y1 ~ 1 + t", "y2 ~ 1 + t"],
    model=cp.pymc_models.MultivarLinearReg(sample_kwargs=sample_kwargs),
)
c:\Users\jeanv\miniconda3\envs\CausalPy\Lib\site-packages\threadpoolctl.py:1226: RuntimeWarning: 
Found Intel OpenMP ('libiomp') and LLVM OpenMP ('libomp') loaded at
the same time. Both libraries are known to be incompatible and this
can cause random crashes or deadlocks on Linux when loaded in the
same Python program.
Using threadpoolctl may cause crashes or deadlocks. For more
information and possible workarounds, please see
    https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md

  warnings.warn(msg, RuntimeWarning)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [beta, chol_cov]
Sampling 2 chains for 100 tune and 200 draw iterations (200 + 400 draws total) took 35 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [beta, chol_cov, y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
# Default: overlay (all outcomes on the same axes). Same as plot(layout="overlay")
fig, ax = result_multi_2p.plot()
fig.suptitle("Multivariate, two-period (overlay)", y=1.02)
fig.tight_layout()
plt.show()
C:\Users\jeanv\AppData\Local\Temp\ipykernel_12588\2607712260.py:4: UserWarning: The figure layout has changed to tight
  fig.tight_layout()
../_images/17304be5bd843ceb7a368c29bd0f3b42677b7aa6457a8fc15f92adc128504d34.png

Inferred covariance matrix#

MultivarLinearReg estimates a covariance matrix between outcomes (an LKJ prior on the correlation matrix plus a scale per outcome). The covariance-recovery section below inspects the posterior mean covariance and the derived correlation matrix to show that the model correctly infers the residual dependence between the outcomes.
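The correlation matrix can be derived from any covariance matrix by dividing out the per-outcome standard deviations; a minimal numpy sketch with an illustrative 2×2 matrix (values hypothetical, not taken from this model):

```python
import numpy as np

# Hypothetical posterior-mean covariance matrix (illustrative values only)
cov = np.array(
    [
        [1.44, 0.65],
        [0.65, 2.25],
    ]
)

# corr_ij = cov_ij / (sigma_i * sigma_j)
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)

print(np.round(corr, 3))  # unit diagonal, off-diagonal 0.65 / (1.2 * 1.5) ≈ 0.361
```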

# Layout per_outcome: one figure per outcome, each with 3 panels (returns a list of (fig, ax))
figs = result_multi_2p.plot(layout="per_outcome")
for i, (fig, _ax) in enumerate(figs):
    fig.suptitle(
        f"Multivariate, two-period — outcome: {result_multi_2p.outcome_variable_names[i]}",
        y=1.02,
    )
    fig.tight_layout()
plt.show()
../_images/dff7a46873b0b9152d0ffdfc6ab83c176248c133c0263dfc2bd9f7e81b2c613f.png ../_images/a516517c7d686a1166ce6bc5997ed9ca59fffae02b5c305cda329b821bff6710.png
# Optional: plot only a subset of outcomes (default overlay layout + outcomes_to_plot)
fig, ax = result_multi_2p.plot(outcomes_to_plot=["y1"])
fig.suptitle("Multivariate, two-period (overlay, y1 only)", y=1.02)
fig.tight_layout()
plt.show()
../_images/a704bf4086ebb3f39a2319d2259afd711d0d563a76ec09a2ec09377993e5d2d5.png

2. Multivariate, three-period#

Multivariate ITS with temporary intervention: same overlay and per_outcome options, with treatment start and end lines on the plot.

treatment_end_time_m = pd.to_datetime("2019-06-01")

result_multi_3p = cp.InterruptedTimeSeries(
    df_multi,
    treatment_time_m,
    formula=["y1 ~ 1 + t", "y2 ~ 1 + t"],
    treatment_end_time=treatment_end_time_m,
    model=cp.pymc_models.MultivarLinearReg(sample_kwargs=sample_kwargs),
)
fig, ax = result_multi_3p.plot()  # default overlay
fig.suptitle("Multivariate, three-period (overlay)", y=1.02)
fig.tight_layout()
plt.show()
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [beta, chol_cov]
Sampling 2 chains for 100 tune and 200 draw iterations (200 + 400 draws total) took 18 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [beta, chol_cov, y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
../_images/831cca89b1d7f07c8133ae71c8106e6426b4e716f43d7334a7166155007ee4ac.png

3. Get plot data (DataFrame export)#

For univariate ITS, get_plot_data() returns a DataFrame with the time index, prediction, impact, and HDI columns. For multivariate, the DataFrame is in long format with an outcome column.

# Multivariate: long format with outcome column
plot_data_multi = result_multi_2p.get_plot_data()
print("Multivariate columns:", list(plot_data_multi.columns))
print("Outcomes in data:", plot_data_multi["outcome"].unique().tolist())
plot_data_multi.head(10)
Multivariate columns: ['y1', 'y2', 't', 'outcome', 'prediction', 'pred_hdi_lower_94', 'pred_hdi_upper_94', 'impact', 'impact_hdi_lower_94', 'impact_hdi_upper_94']
Outcomes in data: ['y1', 'y2']
y1 y2 t outcome prediction pred_hdi_lower_94 pred_hdi_upper_94 impact impact_hdi_lower_94 impact_hdi_upper_94
obs_ind
2015-01-01 10.304717 18.620822 0.0 y1 10.298209 9.191484 11.490822 0.006508 -1.065639 1.347100
2015-02-01 10.260016 20.845741 1.0 y1 10.584169 9.539022 11.750634 -0.324153 -1.354346 0.943285
2015-03-01 13.082502 20.413639 2.0 y1 10.870129 9.836503 11.938577 2.212373 1.224135 3.403732
2015-04-01 13.840565 21.335728 3.0 y1 11.156089 10.167732 12.176982 2.684475 1.738196 3.799754
2015-05-01 10.981016 19.759121 4.0 y1 11.442050 10.477895 12.392716 -0.461034 -1.364177 0.578165
2015-06-01 11.197820 20.737810 5.0 y1 11.728010 10.809258 12.628730 -0.530189 -1.391565 0.473865
2015-07-01 11.927840 21.538386 6.0 y1 12.013970 11.068768 12.802749 -0.086130 -0.905423 0.843753
2015-08-01 10.783757 20.235980 7.0 y1 12.299930 11.457413 13.099959 -1.516173 -2.284106 -0.637511
2015-09-01 10.651148 21.485163 8.0 y1 12.585890 11.794811 13.352568 -1.934742 -2.670125 -1.089803
2015-10-01 9.846956 19.907111 9.0 y1 12.871851 12.189518 13.664122 -3.024894 -3.736210 -2.232080
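Because the multivariate frame is long-format, downstream analysis often starts by splitting it per outcome or pivoting it wide. A minimal pandas sketch on a mock frame with the same column layout (values made up, not from the fit above):

```python
import pandas as pd

# Mock long-format frame mimicking the multivariate get_plot_data() output
plot_data = pd.DataFrame(
    {
        "outcome": ["y1", "y1", "y2", "y2"],
        "prediction": [10.3, 10.6, 20.1, 20.2],
        "impact": [0.01, -0.32, 0.40, -0.10],
    },
    index=pd.to_datetime(["2015-01-01", "2015-02-01"] * 2),
)
plot_data.index.name = "obs_ind"

# Split into one DataFrame per outcome ...
per_outcome = {name: g.drop(columns="outcome") for name, g in plot_data.groupby("outcome")}

# ... or pivot to wide format: one prediction column per outcome
wide = plot_data.pivot_table(index="obs_ind", columns="outcome", values="prediction")
```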

4. Proving we recover the covariance matrix#

To show that MultivarLinearReg correctly infers the residual covariance between outcomes, we generate a dataset with three outcomes whose residuals follow a known covariance matrix. We then fit the model and compare the posterior mean covariance to the true one. For this section we use heavier sampling (more chains, draws, and tuning steps) and target_accept=0.98 so that NUTS takes smaller steps.
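Before fitting, it is worth sanity-checking that residuals drawn from the chosen covariance actually carry it empirically. A standalone numpy check mirroring the generation below (large sample size so the empirical covariance converges):

```python
import numpy as np

rng = np.random.default_rng(0)

# Same covariance structure as true_cov below: per-outcome scales times a correlation matrix
true_std = np.array([1.2, 0.9, 1.0])
true_corr = np.array(
    [
        [1.0, 0.6, 0.2],
        [0.6, 1.0, 0.5],
        [0.2, 0.5, 1.0],
    ]
)
true_cov = np.outer(true_std, true_std) * true_corr

# Draw correlated residuals and compare the empirical covariance to the target
residuals = rng.multivariate_normal(np.zeros(3), true_cov, size=100_000)
emp_cov = np.cov(residuals, rowvar=False)
max_abs_diff = np.abs(emp_cov - true_cov).max()
```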

# True covariance matrix (3x3, positive definite) used to generate correlated residuals
# We choose a known structure: correlation + scale per outcome
true_corr = np.array(
    [
        [1.0, 0.6, 0.2],
        [0.6, 1.0, 0.5],
        [0.2, 0.5, 1.0],
    ]
)
true_std = np.array([1.2, 0.9, 1.0])  # standard deviations per outcome
true_cov = np.outer(true_std, true_std) * true_corr
outcome_names = ["y1", "y2", "y3"]

# Generate data: trend + correlated noise
n = 100
rng_cov = np.random.default_rng(seed)
t = np.linspace(0, 10, n)
# Mean structure: different intercept and slope per outcome
beta_true = np.array(
    [
        [1.0, 0.3],  # y1: intercept, slope
        [10.0, 0.1],  # y2
        [15.5, 0.2],  # y3
    ]
)
X = np.column_stack([np.ones(n), t])
mean_3d = X @ beta_true.T  # (n, 3)
# Correlated residuals from the known covariance; rescale so the empirical
# standard deviation per outcome exactly matches true_std, making the inferred
# covariance directly comparable to true_cov
residuals = rng_cov.multivariate_normal(np.zeros(3), true_cov, size=n)
residuals_std = residuals / residuals.std(axis=0) * true_std
y1 = mean_3d[:, 0] + residuals_std[:, 0]
y2 = mean_3d[:, 1] + residuals_std[:, 1]
y3 = mean_3d[:, 2] + residuals_std[:, 2]
# Treatment effect from t=50 onward
treatment_idx = 50
y1[treatment_idx:] += 20.0
y2[treatment_idx:] += 1.0
y3[treatment_idx:] -= 10

df_cov = pd.DataFrame(
    {"y1": y1, "y2": y2, "y3": y3, "t": t},
    index=np.arange(n),
)
df_cov.index.name = "obs_ind"
treatment_time_cov = 50.0  # corresponds to integer index 50

# Heavier sampling + target_accept to avoid divergences (LKJ covariance has difficult geometry)
sample_kwargs_cov = {
    "chains": 4,
    "draws": 4000,
    "tune": 2000,
    "target_accept": 0.98,
    "progressbar": True,
    "random_seed": seed,
}

result_cov = cp.InterruptedTimeSeries(
    df_cov,
    treatment_time=treatment_time_cov,
    formula=["y1 ~ 1 + t", "y2 ~ 1 + t", "y3 ~ 1 + t"],
    model=cp.pymc_models.MultivarLinearReg(sample_kwargs=sample_kwargs_cov),
)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta, chol_cov]

Sampling 4 chains for 2_000 tune and 4_000 draw iterations (8_000 + 16_000 draws total) took 135 seconds.
Sampling: [beta, chol_cov, y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
result_cov.plot()  # default overlay
plt.show()
../_images/345cdc025d0ddeb0c46bb3efcd8b0d989217d4909cb184b0524c77dd19baf2fa.png
# Compare true vs inferred covariance matrix
idata_cov = result_cov.model.idata
cov_post = idata_cov.posterior["cov"].mean(dim=["chain", "draw"]).values
if cov_post.ndim > 2:
    cov_post = cov_post.reshape(cov_post.shape[-2], cov_post.shape[-1])

# True residual covariance is true_cov (residuals were rescaled to true_std per outcome)
true_cov_df = pd.DataFrame(true_cov, index=outcome_names, columns=outcome_names)
inferred_cov_df = pd.DataFrame(cov_post, index=outcome_names, columns=outcome_names)
diff_cov = inferred_cov_df - true_cov_df

print("True residual covariance matrix:")
display(true_cov_df)
print("Inferred posterior mean covariance matrix:")
display(inferred_cov_df)
print("Difference (inferred - true). Should be close to zero:")
display(diff_cov.round(4))
print("Max absolute difference: {:.4f}".format(np.abs(diff_cov.values).max()))
print("Mean squared error: {:.6f}".format(np.mean(diff_cov.values**2)))
True residual covariance matrix:
y1 y2 y3
y1 1.440 0.648 0.24
y2 0.648 0.810 0.45
y3 0.240 0.450 1.00
Inferred posterior mean covariance matrix:
y1 y2 y3
y1 1.551499 0.514865 0.086018
y2 0.514865 0.697577 0.539261
y3 0.086018 0.539261 1.016035
Difference (inferred - true). Should be close to zero:
y1 y2 y3
y1 0.1115 -0.1331 -0.1540
y2 -0.1331 -0.1124 0.0893
y3 -0.1540 0.0893 0.0160
Max absolute difference: 0.1540
Mean squared error: 0.013793
import seaborn as sns

# 1. Extract all raw posterior samples for the 'cov' matrix
# The initial shape is usually (chains, draws, N, N)
cov_samples = idata_cov.posterior["cov"].values

# Flatten the chain and draw dimensions to pool all samples together
# New shape: (total_samples, N, N)
cov_samples_flat = cov_samples.reshape(-1, cov_samples.shape[-2], cov_samples.shape[-1])

n_outcomes = len(outcome_names)

# 2. Set up a grid of subplots (N x N)
fig, axes = plt.subplots(
    n_outcomes, n_outcomes, figsize=(3.5 * n_outcomes, 3.5 * n_outcomes)
)

# Handle the 1x1 case gracefully if there's only one outcome
if n_outcomes == 1:
    axes = np.array([[axes]])

# 3. Loop through each cell in the covariance matrix
for i in range(n_outcomes):
    for j in range(n_outcomes):
        ax = axes[i, j]

        # Extract the 1D array of samples for this specific (i, j) component
        cell_samples = cov_samples_flat[:, i, j]

        # Plot the posterior distribution
        sns.kdeplot(cell_samples, ax=ax, fill=True, color="C0", alpha=0.6)

        # Plot the true value as a vertical red dashed line
        true_val = true_cov[i, j]
        ax.axvline(
            true_val,
            color="red",
            linestyle="--",
            linewidth=2,
            label=f"True: {true_val:.2f}",
        )

        # Plot the inferred mean as a vertical black dotted line
        inferred_mean = cov_post[i, j]
        ax.axvline(
            inferred_mean,
            color="black",
            linestyle=":",
            linewidth=2,
            label=f"Mean: {inferred_mean:.2f}",
        )

        # Formatting
        ax.set_title(f"Cov({outcome_names[i]}, {outcome_names[j]})")
        ax.set_yticks([])  # Hide y-axis ticks for cleaner look

        # Only show the legend on the very first subplot to avoid clutter
        if i == 0 and j == 0:
            ax.legend()

plt.tight_layout()
plt.show()
../_images/6ff84a636b9d61b2c6eaeac383cadf7ae70d4a0020570f989793a7d2bd0a3f1b.png

5. Summary: coefficients and effect summary#

The summary() method prints the experiment type, formula, model coefficients, and an effect summary table (mean impact, HDI, cumulative effect, etc.). For multivariate ITS, the effect summary has one row per outcome and per statistic (average / cumulative), so you can read the causal effect for each outcome.
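The average/cumulative rows and the p_gt_0 column follow simple arithmetic on the posterior impact draws. A hedged sketch on tiny mock draws (this is the general recipe, not CausalPy's internal code):

```python
import numpy as np

# Mock posterior impact draws: shape (posterior draws, post-period timepoints)
impact_draws = np.array(
    [
        [1.0, 2.0, 3.0],
        [2.0, 2.0, 2.0],
    ]
)

# Average effect: mean over the post-period per draw, then summarized over draws
average_effect = impact_draws.mean(axis=1).mean()
# Cumulative effect: sum over the post-period per draw, then summarized over draws
cumulative_effect = impact_draws.sum(axis=1).mean()
# p_gt_0: posterior probability that the average effect is positive
p_gt_0 = (impact_draws.mean(axis=1) > 0).mean()
```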

# Uses the three-outcome result (result_cov) from the covariance-recovery section above
result_cov.summary()
==================================Pre-Post Fit==================================
Formula: ['y1 ~ 1 + t', 'y2 ~ 1 + t', 'y3 ~ 1 + t']
Model coefficients:

  Outcome: y1
    Intercept  0.86, 94% HDI [0.21, 1.5]
    t          0.33, 94% HDI [0.1, 0.55]

  Outcome: y2
    Intercept  9.9, 94% HDI [9.4, 10]
    t          0.16, 94% HDI [0.0099, 0.31]

  Outcome: y3
    Intercept  16, 94% HDI [15, 16]
    t          0.2, 94% HDI [0.017, 0.38]

  Residual covariance (upper triangle), 94% HDI:
  cov(y1, y1)  1.6, 94% HDI [1, 2.3]
  cov(y1, y2)  0.51, 94% HDI [0.24, 0.87]
  cov(y1, y3)  0.086, 94% HDI [-0.26, 0.43]
  cov(y2, y2)  0.7, 94% HDI [0.48, 1]
  cov(y2, y3)  0.54, 94% HDI [0.31, 0.85]
  cov(y3, y3)  1, 94% HDI [0.69, 1.5]

Effect summary:
  outcome   statistic         mean       median   hdi_lower    hdi_upper    p_gt_0  relative_mean  relative_hdi_lower  relative_hdi_upper
0      y1     average    20.038780    20.036233   18.789462    21.273351  1.000000     634.875454          369.626710          936.991426
1      y1  cumulative  1001.938982  1001.811646  939.473116  1063.667528  1.000000     634.875456          369.626711          936.991430
2      y2     average     0.750040     0.748689   -0.075078     1.576197  0.962562       6.912498           -0.923358           14.976382
3      y2  cumulative    37.502004    37.434429   -3.753891    78.809834  0.962562       6.912498           -0.923358           14.976382
4      y3     average   -10.165110   -10.166098  -11.158164    -9.155131  0.000000     -59.417583          -61.792658          -57.032796
5      y3  cumulative  -508.255516  -508.304923 -557.908203  -457.756544  0.000000     -59.417583          -61.792658          -57.032796
[y1] Post-period (50 to 99), the average effect was 20.04 (95% HDI [18.79, 21.27]), with a posterior probability of an increase of 1.000. The cumulative effect was 1001.94 (95% HDI [939.47, 1063.67]); probability of an increase 1.000. Relative to the counterfactual, this equals 634.88% on average (95% HDI [369.63%, 936.99%]).

[y2] Post-period (50 to 99), the average effect was 0.75 (95% HDI [-0.08, 1.58]), with a posterior probability of an increase of 0.963. The cumulative effect was 37.50 (95% HDI [-3.75, 78.81]); probability of an increase 0.963. Relative to the counterfactual, this equals 6.91% on average (95% HDI [-0.92%, 14.98%]).

[y3] Post-period (50 to 99), the average effect was -10.17 (95% HDI [-11.16, -9.16]), with a posterior probability of an increase of 0.000. The cumulative effect was -508.26 (95% HDI [-557.91, -457.76]); probability of an increase 0.000. Relative to the counterfactual, this equals -59.42% on average (95% HDI [-61.79%, -57.03%]).