etwfe

Extended Two-Way Fixed Effects for Python — implementing Wooldridge's modern approach to difference-in-differences with staggered treatment adoption.

Two-way fixed effects (TWFE) regressions are widely used for causal inference with panel data, but naive TWFE specifications produce misleading estimates when treatment adoption is staggered and effects vary across cohorts or time. Wooldridge (2023, 2025) proposed a clean solution: fully interact treatment status with cohort and time indicators. This saturated specification—extended TWFE (ETWFE)—recovers cohort-time-specific effects that can be aggregated into interpretable summaries like overall average effects or dynamic event studies.

This Python package development was inspired by the excellent etwfe R package developed by Grant McDermott. It handles the tedious work of building the saturated interaction specification, performs estimation through high-dimensional fixed effects or quasi-MLE, and extracts interpretable treatment effect estimates with proper standard errors.

Installation

shell

pip install etwfe

Requires Python 3.8+. Dependencies install automatically.

Sample Data

The examples in this guide use mpdta, a panel dataset tracking teen employment in 500 U.S. counties from 2003 through 2007. Some counties implemented minimum wage increases in 2004, 2006, or 2007, while others never raised their minimum wage—creating a textbook staggered adoption design for difference-in-differences analysis.

python

from etwfe import etwfe
import pyreadr

# Retrieve the dataset
url = "https://github.com/bcallaway11/did/raw/master/data/mpdta.rda"
pyreadr.download_file(url, "mpdta.rda")
mpdta = pyreadr.read_r("mpdta.rda")["mpdta"]

# Set up the variables
mpdta["first_treat"] = mpdta["first.treat"].astype(int)
mpdta["year"] = mpdta["year"].astype(int)
mpdta["countyreal"] = mpdta["countyreal"].astype(int)

mpdta.head()

Output

    year  countyreal      lpop      lemp  first.treat  treat  first_treat
866 2003        8001  5.896761  8.461469         2007      1         2007
841 2004        8001  5.896761  8.336870         2007      1         2007
842 2005        8001  5.896761  8.340217         2007      1         2007
819 2006        8001  5.896761  8.378161         2007      1         2007
827 2007        8001  5.896761  8.487352         2007      1         2007

The outcome of interest is lemp, which records log teen employment. The first_treat column indicates the year a county first saw a minimum wage increase (with 0 denoting counties that never experienced an increase). We also have lpop (log population) available as a potential covariate.

Core Workflow

Analysis with etwfe proceeds in two stages. First, etwfe() fits the saturated interaction specification. Second, emfx() transforms the many coefficient estimates into meaningful treatment effect summaries.

Fitting the Model

The etwfe() function requires a formula identifying the outcome and any covariates, plus arguments specifying the time period variable, treatment cohort variable, and the data. Clustering standard errors by unit is generally advisable for panel data.

python

mod = etwfe(
    fml="lemp ~ lpop",          # outcome ~ covariates
    tvar="year",                # time period
    gvar="first_treat",         # treatment cohort
    data=mpdta,
    vcov={"CRV1": "countyreal"}  # cluster by county
)

Behind the scenes, the function constructs treatment indicators and fully interacts them with cohort and time dummies. The fitted object wraps a pyfixest regression, so methods like .summary() work as expected. However, the raw interaction coefficients lack direct interpretation—that's where emfx() becomes essential.

Extracting Treatment Effects

The emfx() method aggregates the interaction coefficients into interpretable treatment effect estimates. A basic call produces the overall average treatment effect on the treated:

python

# Overall average effect
mod.emfx()

Average Treatment Effect

   _Dtreat  estimate  std.error  conf.low  conf.high
0      1.0 -0.050627   0.012497 -0.075122  -0.026132

The estimate of −0.051 implies that minimum wage increases lowered teen employment by roughly 5% on average across treated counties and post-treatment periods. The confidence interval excludes zero, indicating statistical significance.

The type parameter controls how effects are broken down. Setting type="event" generates an event study that tracks effects relative to treatment timing:

python

# Dynamic event study
mod.emfx(type="event")

Event Study

   event  estimate  std.error  conf.low  conf.high
0      0 -0.033212   0.013366 -0.059409  -0.007015
1      1 -0.057346   0.017150 -0.090959  -0.023732
2      2 -0.137870   0.030788 -0.198216  -0.077525
3      3 -0.109539   0.032315 -0.172877  -0.046202

The event study uncovers interesting dynamics: effects begin modestly (−3.3% at the treatment date) but amplify substantially over subsequent years, peaking at −13.8% two years after adoption before moderating slightly.

Creating Visualizations

Event Study Plots

The plot() method generates ready-to-use visualizations. The default style shows point estimates with error bars:

python

mod.plot(type="event")

Event study showing treatment effects relative to adoption timing

When using the default cgroup="notyet" control group, pre-treatment coefficients are mechanically zero because not-yet-treated observations serve as controls during those periods. To assess whether parallel trends held before treatment—a crucial diagnostic—restrict the comparison group to never-treated units:

python

mod_never = etwfe(
    fml="lemp ~ lpop",
    tvar="year",
    gvar="first_treat",
    data=mpdta,
    cgroup="never",              # only never-treated as controls
    vcov={"CRV1": "countyreal"}
)

# Include pre-treatment periods
mod_never.emfx(type="event", post_only=False)

Event Study with Pre-Treatment

   event  estimate  std.error  conf.low  conf.high
0     -4  0.006896   0.024689 -0.041495   0.055287
1     -3  0.027595   0.018148 -0.007976   0.063166
2     -2  0.023465   0.014531 -0.005017   0.051947
3     -1  0.000000   0.000000  0.000000   0.000000
4      0 -0.021147   0.011394 -0.043478   0.001185
5      1 -0.053356   0.015774 -0.084274  -0.022438
6      2 -0.141080   0.032289 -0.204367  -0.077793
7      3 -0.107544   0.032923 -0.172074  -0.043015

The pre-treatment estimates (events −4 through −2) cluster near zero with confidence intervals spanning it, supporting the parallel trends assumption. For a smoother visual, use ribbon-style confidence bands:

python

mod_never.plot(type="event", post_only=False, style="ribbon")

Event study with ribbon-style confidence bands

Comparing Control Group Specifications

You can compare estimates across control group specifications using a regression table format:

python

import pandas as pd
import numpy as np

# Get event study results
es_never = mod_never.emfx(type="event", post_only=False)[["event", "estimate", "std.error"]]
es_notyet = mod.emfx(type="event")[["event", "estimate", "std.error"]]

# Rename columns
es_never.columns = ["event", "est_1", "se_1"]
es_notyet.columns = ["event", "est_2", "se_2"]

# Merge
table = es_never.merge(es_notyet, on="event", how="outer")
table = table.sort_values("event").reset_index(drop=True)

# Get model stats
nobs_1 = mod_never.model_._N
nobs_2 = mod.model_._N
r2_1 = mod_never.model_._r2
r2_2 = mod.model_._r2
rmse_1 = mod_never.model_._rmse
rmse_2 = mod.model_._rmse

# Function to add significance stars
def add_stars(est, se):
    if pd.isna(est) or pd.isna(se) or se == 0:
        return f"{est:.3f}" if pd.notna(est) else ""
    t = abs(est / se)
    if t > 3.291:
        return f"{est:.3f}***"
    elif t > 2.576:
        return f"{est:.3f}**"
    elif t > 1.96:
        return f"{est:.3f}*"
    elif t > 1.645:
        return f"{est:.3f}+"
    else:
        return f"{est:.3f}"

# Build formatted output
print("\n" + "="*50)
print("                    Event Study")
print("="*50)
print(f"{'':30} {'(1)':>10} {'(2)':>10}")
print("-"*50)

for _, r in table.iterrows():
    evt = int(r["event"])
    label = f"Years post treatment = {evt}"
    
    col1 = add_stars(r["est_1"], r["se_1"]) if pd.notna(r["est_1"]) else ""
    col2 = add_stars(r["est_2"], r["se_2"]) if pd.notna(r["est_2"]) else ""
    
    print(f"{label:30} {col1:>10} {col2:>10}")
    
    se1 = f"({r['se_1']:.3f})" if pd.notna(r['se_1']) and r['se_1'] != 0 else ""
    se2 = f"({r['se_2']:.3f})" if pd.notna(r['se_2']) else ""
    
    print(f"{'':30} {se1:>10} {se2:>10}")

print("-"*50)
print(f"{'Num.Obs.':30} {nobs_1:>10} {nobs_2:>10}")
print(f"{'R2':30} {r2_1:>10.3f} {r2_2:>10.3f}")
print(f"{'RMSE':30} {rmse_1:>10.3f} {rmse_2:>10.3f}")
print(f"{'FE..first.treat':30} {'X':>10} {'X':>10}")
print(f"{'FE..year':30} {'X':>10} {'X':>10}")
print("-"*50)
print("+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001")
print("Std. errors are clustered at the county level")
print("="*50)

Regression Comparison Table

==================================================
                    Event Study
==================================================
                                      (1)        (2)
--------------------------------------------------
Years post treatment = -4           0.007           
                                  (0.025)           
Years post treatment = -3           0.028           
                                  (0.018)           
Years post treatment = -2           0.023           
                                  (0.015)           
Years post treatment = -1           0.000           
                                                    
Years post treatment = 0          -0.021+    -0.033*
                                  (0.011)    (0.013)
Years post treatment = 1        -0.053***  -0.057***
                                  (0.016)    (0.017)
Years post treatment = 2        -0.141***  -0.138***
                                  (0.032)    (0.031)
Years post treatment = 3         -0.108**  -0.110***
                                  (0.033)    (0.032)
--------------------------------------------------
Num.Obs.                             2500       2500
R2                                  0.873      0.873
RMSE                                0.537      0.537
FE..first.treat                         X          X
FE..year                                X          X
--------------------------------------------------
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
Std. errors are clustered at the county level
==================================================

Column (1) uses never-treated controls (cgroup="never") while column (2) uses not-yet-treated controls (cgroup="notyet"). The pre-treatment coefficients in column (1) support the parallel trends assumption. Post-treatment estimates are similar across specifications, with the largest effects occurring at event time 2.

python

mod_never.plot(type="event", post_only=False)

Event study with pre-treatment periods showing parallel trends validation

Effect Heterogeneity

Treatment effects frequently differ across subpopulations. The xvar parameter enables estimation of heterogeneous effects by interacting treatment with a categorical moderator.

As an illustration, we can test whether minimum wage impacts differ between Great Lakes states and other regions:

python

# Flag Great Lakes states by FIPS code
great_lakes = [17, 18, 26, 27, 36, 39, 42, 55]
mpdta["gls"] = mpdta["countyreal"].astype(str).str[:2].astype(int).isin(great_lakes)

With xvar specified, emfx(by_xvar=True) produces separate estimates for each subgroup:

python

hmod = etwfe(
    fml="lemp ~ lpop",
    tvar="year",
    gvar="first_treat",
    data=mpdta,
    cgroup="never",
    xvar="gls",                  # heterogeneity dimension
    vcov={"CRV1": "countyreal"}
)

# Subgroup-specific effects
hmod.emfx(by_xvar=True)

Effects by Region

   estimate  std.error  conf.low  conf.high  _Dtreat    gls
0 -0.051920   0.034637 -0.119809   0.015969      1.0  False
1 -0.039396   0.024296 -0.087016   0.008223      1.0   True

Both regions exhibit negative point estimates, though the confidence intervals overlap considerably—the regional difference doesn't appear statistically meaningful. Visualize the heterogeneous event study with:

python

hmod.plot(type="event", by_xvar=True, post_only=False, style="ribbon")

Heterogeneous treatment effects by region (Great Lakes vs. Other)

Nonlinear Specifications

Standard ETWFE presumes that treated and control groups follow parallel trajectories in outcome levels. This "linear parallel trends" assumption becomes questionable when outcomes have bounded ranges or skewed distributions.

When Linear Assumptions Strain

Consider binary outcomes where baseline rates differ markedly between groups—say 15% versus 40%. Expecting parallel evolution in probability levels may be implausible; perhaps trends should be parallel in log-odds instead. Analogous concerns arise with count data clustering near zero, or fractional responses constrained between 0 and 1.

Wooldridge (2023) developed nonlinear ETWFE estimators that assume parallel trends in an index (the linear predictor) rather than the outcome itself. For binary outcomes, this suggests logit; for counts, Poisson. These estimators preserve the same equivalence results as linear ETWFE—pooled quasi-MLE and imputation yield identical estimates.

Poisson Implementation

Here we apply Poisson regression to employment counts rather than log employment:

python

import numpy as np

# Convert to levels
mpdta["emp"] = np.exp(mpdta["lemp"])

# Poisson ETWFE
mod_pois = etwfe(
    fml="emp ~ lpop",
    tvar="year",
    gvar="first_treat",
    data=mpdta,
    vcov={"CRV1": "countyreal"},
    family="poisson"
)

mod_pois.emfx(type="event")

Poisson Event Study

   event   estimate  std.error    conf.low   conf.high
0      0 -25.349748  15.886320  -56.486936    5.787439
1      1   1.091751  40.295264  -77.886967   80.070469
2      2 -75.124632  23.154168 -120.506802  -29.742462
3      3 -101.823979  27.087814 -154.916095  -48.731864

Robustness Assessment

Even when linear parallel trends seems reasonable, running both linear and nonlinear models provides a valuable robustness check. When estimates align across specifications, confidence in the findings grows. When they diverge, it signals sensitivity to the functional form assumption—useful diagnostic information for interpreting results.

Computational Efficiency

Model fitting via etwfe() runs quickly. The emfx() marginal effects calculation can slow down with large datasets due to the high dimensionality of interaction terms. Two strategies help manage computation time.

Omitting Standard Errors

Setting vcov=False skips variance-covariance estimation, which often dominates runtime. This returns point estimates only—useful for checking specifications before investing in full computation:

python

# Point estimates without standard errors
mod.emfx(type="event", vcov=False)

Data Compression

The compress=True option implements the data compression approach from Wong et al. (2021). The core insight is that for linear models, data can be reduced to sufficient statistics—aggregated summaries preserving all information needed for estimation.

Compression can shrink datasets with millions of observations to much smaller structures, enabling interactive analysis that would otherwise demand specialized infrastructure. Point estimates match uncompressed results exactly, while standard errors are approximate but typically accurate to 2-3 decimal places.

python

# Compressed estimation
mod.emfx(type="event", compress=True)

Practical Approach

For large datasets, a staged approach works well:

Run emfx(vcov=False) to verify point estimates look sensible
Run emfx(vcov=False, compress=True) to confirm compression preserves estimates
If results align, run emfx(compress=True) for approximate standard errors

API Reference

etwfe()

Fit an Extended Two-Way Fixed Effects model.

Argument	Type	Description
fml	`str`	Model formula: `"outcome ~ covariates"` or `"outcome ~ 0"` for no covariates
tvar	`str`	Name of the time period variable
gvar	`str`	Name of the treatment cohort variable (period of first treatment; 0 = never treated)
data	`DataFrame`	Panel dataset containing all required variables
ivar	`str`	Unit identifier for unit-level fixed effects (optional)
xvar	`str`	Variable for heterogeneous treatment effects (optional)
tref	`int`	Reference time period (optional; defaults to earliest)
gref	`int`	Reference cohort (optional; defaults to 0 / never-treated)
cgroup	`str`	Control group: `"notyet"` (default) or `"never"`
fe	`str`	Fixed effects: `"vs"`, `"feo"` (default), or `"none"`
family	`str`	GLM family: `"poisson"`, `"logit"`, `"probit"`, etc.
vcov	`str \| dict`	Variance estimator, e.g., `{"CRV1": "cluster_var"}`

model.emfx()

Calculate aggregated marginal effects (treatment effect estimates).

Argument	Type	Description
type	`str`	Aggregation: `"simple"` (overall ATT), `"event"`, `"group"`, or `"calendar"`
by_xvar	`bool`	Report separate effects by xvar levels (default False)
post_only	`bool`	Restrict to post-treatment periods only (default True)
compress	`bool`	Apply data compression for speed (default False)
vcov	`bool`	Calculate standard errors; set False to skip (default True)
predict	`str`	Prediction scale: `"response"` (default) or `"link"`

model.plot()

Generate visualizations of treatment effects.

Argument	Type	Description
type	`str`	Plot type: `"event"`, `"group"`, or `"calendar"`
style	`str`	Visual style: `"errorbar"` (default) or `"ribbon"`
post_only	`bool`	Display only post-treatment periods
by_xvar	`bool`	Draw separate series for each xvar group
figsize	`tuple`	Figure dimensions as (width, height)
color	`str`	Color for single-group plots
colors	`list`	Color sequence for multi-group plots

Requirements

Python >= 3.9
numpy >= 1.21.0
pandas >= 1.3.0
matplotlib >= 3.4.0
pyfixest >= 0.18.0

References

Wooldridge, J. M. (2025). Two-way fixed effects, the two-way Mundlak regression, and difference-in-differences estimators. Empirical Economics, 69(5), 2545-2587.
Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data. The Econometrics Journal, 26(3), C31-C66.
Callaway, B. and Sant'Anna, P. H. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200-230.
Wong, J., Forsell, E., Lewis, R., Mao, T., and Wardrop, M. (2021). You only compress once: Optimal data compression for estimating linear models. arXiv:2102.11297.

Acknowledgments

This package relies on several high-quality open-source projects. Estimation uses pyfixest for high-dimensional fixed effects regression. The API and methodology draw heavily from Grant McDermott's etwfe package for R. We thank the developers of these projects for their contributions to accessible econometric software.