Sunbelt Computer Software

Test Data Fixtures

hrs_edid_validation.csv

Source: Dobkin, C., Finkelstein, A., Kluender, R., & Notowidigdo, M. J. (2018). "The Economic Consequences of Hospital Admissions." American Economic Review, 108(2), 308-352. Replication kit: https://www.openicpsr.org/openicpsr/project/116186/version/V1/view

License / redistribution: This fixture is a derived four-column subset (de-identified HRS public-use person id, wave, out-of-pocket spending, and first-hospitalization wave) of the publicly available Dobkin et al. (2018) replication package deposited in the AEA's openICPSR repository (project 116186, distributed by ICPSR for replication of the published article). Only the derived subset is committed; the full source HRS_long.dta is not redistributed here (replication_data/ is .gitignored; regenerate from the openICPSR deposit via the snippet under Regeneration). It is included solely as a regression-test fixture for replicating Table 6 of Chen, Sant'Anna & Xie (2025). Consult the openICPSR deposit page for the deposit's exact Terms of Use.

Sample selection: Follows Sun & Abraham (2021), as used by Chen, Sant'Anna & Xie (2025) Section 6:

1. Read HRS_long.dta from the Dobkin et al. replication kit
2. Keep waves 7-11, retain only individuals present in all 5 waves
3. Filter to ever-hospitalized individuals with first_hosp >= 8
4. Filter to ages 50-59 at hospitalization (age_hosp)
5. Drop wave 11 (no valid comparison group)
6. Recode first_hosp == 11 as never-treated (inf)

Expected counts:

Columns: unit (hhidpn), time (wave), outcome (oop_spend, 2005 dollars), first_treat (first_hosp)

Note on sample size (656 vs 652): Chen, Sant'Anna & Xie (2025) Table 6 reports 652 individuals; this fixture yields 656. The four-individual difference reflects a minor sample-selection nuance (e.g. exact age-window or first-hospitalization tie handling) not fully pinned down by the paper text. It is immaterial to the validation: every EDiD point estimate in tests/test_efficient_did_validation.py::TestHRSReplication matches the published Table 6 value to within 0.03 of the published standard error.
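The tolerance in the note can be read as an absolute difference scaled by the published standard error. The sketch below shows that reading only; the helper name `within_se_tolerance` and its signature are hypothetical, and the actual test in tests/test_efficient_did_validation.py may implement the comparison differently.

```python
def within_se_tolerance(estimate, published, published_se, tol=0.03):
    """Return True if |estimate - published| <= tol * published_se.

    Hypothetical helper: assumes "within 0.03 of the published standard
    error" means the gap is at most 3% of one published SE.
    """
    if published_se <= 0:
        raise ValueError("published_se must be positive")
    return abs(estimate - published) <= tol * published_se


# Placeholder numbers for illustration only; they are NOT values from Table 6.
print(within_se_tolerance(estimate=101.0, published=100.0, published_se=50.0))
print(within_se_tolerance(estimate=102.0, published=100.0, published_se=50.0))
```

Scaling the tolerance by the standard error keeps the check meaningful across estimates of very different magnitudes, which a fixed absolute tolerance would not.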

Regeneration: Requires the Dobkin et al. replication kit unpacked under replication_data/ (.gitignored). The paths in the snippet are relative to the repository root.

```python
import numpy as np
import pandas as pd

df = pd.read_stata("replication_data/116186-V1/Replication-Kit/HRS/Data/HRS_long.dta")

# Keep waves 7-11 and retain only individuals observed in all 5 waves.
sub = df[df["wave"].isin([7, 8, 9, 10, 11])]
balanced = sub.groupby("hhidpn")["wave"].nunique()
sub = sub[sub["hhidpn"].isin(balanced[balanced == 5].index)]

# Keep ever-hospitalized individuals first hospitalized in wave 8 or later.
sub = sub[sub["hhidpn"].isin(sub[sub["first_hosp"].notna()]["hhidpn"].unique())]
fh = sub.groupby("hhidpn")["first_hosp"].first()
sub = sub[sub["hhidpn"].isin(fh[fh >= 8].index)]

# Keep individuals aged 50-59 at hospitalization.
ages = sub.groupby("hhidpn")["age_hosp"].first()
sub = sub[sub["hhidpn"].isin(ages[(ages >= 50) & (ages <= 59)].index)]

# Drop wave 11, then recode first_hosp == 11 as never-treated (inf).
sub = sub[sub["wave"] <= 10].copy()  # copy so the assignment below is not on a view
sub["first_treat"] = sub["first_hosp"].apply(lambda x: np.inf if x == 11 else int(x))

# Rename to the fixture's column names and write the sorted CSV.
out = sub[["hhidpn", "wave", "oop_spend", "first_treat"]].copy()
out.columns = ["unit", "time", "outcome", "first_treat"]
out["unit"] = out["unit"].astype(int)
out["time"] = out["time"].astype(int)
out.sort_values(["unit", "time"]).reset_index(drop=True).to_csv(
    "tests/data/hrs_edid_validation.csv", index=False
)
```


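Expected counts for the fixture (G is the first_treat cohort):

| Quantity | Value |
| --- | --- |
| Total individuals | 656 |
| Waves | 7, 8, 9, 10 |
| Rows | 2,624 |
| G=8 | 252 |
| G=9 | 176 |
| G=10 | 163 |
| G=inf | 65 |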
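These counts can be verified mechanically after regenerating the CSV. The following is a minimal sketch using only the standard library; `summarize_fixture`, `check_fixture`, and `EXPECTED` are hypothetical names, and the sketch assumes never-treated units are stored as the text `inf` in the first_treat column.

```python
import csv
import math
from collections import Counter


def summarize_fixture(rows):
    """Summarize a long-format panel given as an iterable of dict rows."""
    rows = list(rows)
    # One first_treat value per unit (assumed constant within a unit).
    first_treat = {r["unit"]: float(r["first_treat"]) for r in rows}
    return {
        "n_units": len(first_treat),
        "waves": sorted({int(r["time"]) for r in rows}),
        "n_rows": len(rows),
        "cohort_sizes": dict(Counter(first_treat.values())),
    }


# Counts as stated in this README.
EXPECTED = {
    "n_units": 656,
    "waves": [7, 8, 9, 10],
    "n_rows": 2624,
    "cohort_sizes": {8.0: 252, 9.0: 176, 10.0: 163, math.inf: 65},
}


def check_fixture(path="tests/data/hrs_edid_validation.csv"):
    """Raise AssertionError if the CSV at `path` does not match EXPECTED."""
    with open(path, newline="") as fh:
        summary = summarize_fixture(csv.DictReader(fh))
    assert summary == EXPECTED, summary
    return summary
```

The stated counts are internally consistent: the four cohort sizes sum to 656, and 656 individuals observed in 4 waves give 2,624 rows.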

PL/B Language Development and Support
