diff --git a/CHANGELOG.md b/CHANGELOG.md
index 92f06d56..9b6ec40c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - **`SpilloverDiD(vcov_type="conley", survey_design=...)` integration via stratified-Conley sandwich on PSU totals (Wave E.2).** Lifts the Wave E.1 `NotImplementedError` (`spillover.py:2201` upfront, `two_stage.py:217` helper-level) and adds spatial-HAC + design-based variance for the previously deferred composition. **Documented synthesis** of Conley (1999) spatial-HAC × Gerber (2026, arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation) × Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all three ingredients on a two-stage influence function. **Mechanical composition (panel-aware):** preserves the library's existing `conley_lag_cutoff = 0` semantic at `diff_diff.conley._compute_conley_meat` ("within-period spatial only — exclude cross-period spatial pairs") by looping over periods. For each period `t`, SpilloverDiD's per-obs Hájek-weighted Wave D IF `psi_i` is aggregated to per-period PSU totals `S_psu_t[g] = sum_{i in PSU g, time t} psi_i` (via `np.add.at`); per-PSU spatial centroids are panel-constant (mean of per-observation `conley_coords` within each PSU, vectorized `np.add.at` sums / `np.bincount` counts); for each stratum the within-stratum sandwich is `M_h_t = (1 - f_h) * n_h/(n_h-1) * sum_{j,k in PSUs_h} K(d(centroid_j, centroid_k) / conley_cutoff_km) * (S_psu_t[j] - S_bar_h_t)(S_psu_t[k] - S_bar_h_t)'`, where K is the Bartlett kernel (SpilloverDiD currently exposes Bartlett only and hardcodes it; the survey helper accepts `"uniform"` too but exposing that on the SpilloverDiD constructor is a separate follow-up) and `d` is haversine / euclidean / callable per `ConleyMetric`. Cross-stratum kernel weights are exactly zero by sampling design (strata are independence partitions). Total meat is `sum_t sum_h M_h_t`. Cross-period spatial pairs are excluded by construction — the per-period loop matches the library's panel Conley contract exactly. **Reduction semantics (load-bearing for tests):** the orchestrator's panel-aware meat equals `sum_t` of per-period within-stratum stratified-Conley sandwiches on per-period PSU totals (pinned at `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant`); single stratum (H = 1, FPC = inf) reduces to `sum_t` plain Conley sandwich on per-period PSU totals (NOT on time-collapsed totals). **Implementation:** new `_compute_stratified_conley_meat_from_psu_scores` helper in `diff_diff/survey.py` (parallel to existing `_compute_stratified_meat_from_psu_scores` 3-tuple `(meat, variance_computed, legitimate_zero_count)` contract; per-stratum loop replaces the inner `centered.T @ centered` with `_compute_conley_meat(scores=centered, coords=psu_coords_h, ...)` in cross-sectional mode); new dispatch wrapper `_compute_stratified_conley_meat` in `diff_diff/two_stage.py` (parallel to existing `_compute_binder_tsl_meat`, performs per-obs Psi → PSU aggregation + centroid derivation + dispatch to survey helper, intentionally drops `cluster_ids` at the dispatch boundary — see Restrictions). `_compute_gmm_corrected_meat` conley branch extended with `if resolved_survey is not None` routing to the new wrapper; the `resolved_survey is None` branch is bit-identical to Wave D. **Singleton-stratum `lonely_psu="adjust"` parity:** the survey helper mirrors the Binder helper's `continue` to skip the FPC scale on singleton strata (with `n_h = 1` the scale `n_h / (n_h - 1)` would divide by zero); the degenerate one-PSU kernel `K = [[K(0)]] = [[1.0]]` reduces to `centered.T @ centered`, matching Binder's singleton-adjust output. **Saturated `df_survey = 0` NaN-fail:** mirrors Wave E.1 (`_compute_stratified_conley_meat` returns NaN meat with `UserWarning` template "Wave E.2 stratified-Conley sandwich: df_survey = 0..." so callers can `pytest.warns(UserWarning, match="Wave E.2 stratified-Conley")`). **Public surface restrictions:** replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` (inherits Wave E.1 gate; per-replicate full refit is separate follow-up scope); `cluster=<col> + survey_design.psu + vcov_type="conley"` coerces `cluster=<col>` to PSU per Wave E.1's warn-and-use-PSU pattern (the Conley cluster product kernel becomes a no-op after PSU aggregation, so `cluster_ids` is intentionally not threaded into the inner Conley kernel call — every PSU is its own cluster post-aggregation, which would zero all cross-PSU pairs); LinearRegression-side `vcov_type="conley" + survey_design=` gate at `diff_diff/linalg.py:2853` remains (separate Bertanha-Imbens 2014 weighted-Conley "Phase 5" roadmap, not Wave E); DiagnosticReport routing for `SpilloverDiDResults(vcov_type="conley", survey_design=)` requires `_APPLICABILITY` / `_PT_METHOD` registration (separate Wave F PR). **Tests:** new `TestSpilloverDiDWaveE2ConleySurveyDesign` and `TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy` classes in `tests/test_spillover.py` (bit-identical no-survey fallback; panel-aware per-period sum invariant on the orchestrator + helper composition; hand-computation methodology anchor; single-stratum ≡ plain Conley on PSU totals; cross-stratum independence as a unit test on the survey helper with interleaved cross-stratum centroids; Binder vs Conley singleton-adjust FPC skip parity; lonely-PSU sensitivity across three modes; FPC large ≡ no-FPC and FPC = n_h zeros stratum; saturated NaN-fail with `pytest.warns(match="Wave E.2 stratified-Conley")`; replicate-weight + non-pweight rejections; cluster warn-and-use-PSU; fit idempotency; `finite_mask` survey-array subsetting; no-PSU coverage — weights-only `SurveyDesign(weights=...)`, strata-only `SurveyDesign(weights=..., strata=...)`, and a per-period re-index unit invariant pinning that no cross-period spatial pairs leak into the meat on implicit-PSU layouts; event-study path on both `is_staggered=True`/`False` branches per `feedback_cohort_loop_trigger_cache_both_branches`; drift goldens at `rtol=1e-12 / atol=1e-14`). The pre-existing `tests/test_spillover.py::test_fit_conley_plus_survey_design_not_implemented` Wave E.1-era gate-assertion test is removed (replaced by the positive-path tests above). Wave E.1 entry's "Public surface restrictions" bullet updated to past-tense the conley+survey gate reference.
 - **HeterogeneousAdoptionDiD methodology-review-tracker promotion.** New `tests/test_methodology_had.py` (6 classes, 36 tests) with paper-equation-numbered Verified Components walk-through against de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) arXiv:2405.04465v6 (Equations 3 / 7 / 11 / 18 / 29 and Theorems 1 / 3 / 4 / 7): Design 1' MC recovery on both the zero-boundary DGP AND a nonzero-boundary-intercept DGP (`ΔY = c + β·D + ε` with `c != 0`) so the `att = (mean(ΔY) − τ_bc) / mean(D)` subtraction term is verified explicitly, N(0,1) coverage at `n_replicates=200`, mass-point Wald-IV closed-form equivalence at `atol=1e-9`, QUG limit-law distributional match at KS-stat ≤ 0.05 (n_draws=5000), Yatchew-HR paper-literal `σ²_diff = 1/(2G)` normalization lock, joint Stute pre-trends + homogeneity H0 fail-to-reject on both surfaces and H1 reject for joint homogeneity under a nonlinear DGP, and library-deviation locks (equal-weighting via selective low-dose-region replication, sup-t bootstrap gating, staggered-timing fail-closed `ValueError`). Added "Non-testable assumptions (paper Section 3.1.2)" Notes block to `HeterogeneousAdoptionDiD` class docstring + "Scope (what this test does NOT cover)" clauses to `qug_test` / `stute_test` / `yatchew_hr_test` / `did_had_pretest_workflow` Notes sections explicitly stating that the pre-tests verify ADJACENT assumptions (Assumption 4 / 7 / 8) and CANNOT test Assumptions 5 or 6. Phase-4 validation-harness items (Pierce-Schott 2016 Figure 2 replication, Table 1 coverage-rate reproduction across 3 DGPs × G ∈ {100, 500, 2500}) waived with documented rationale: R parity at `atol=1e-8` in `tests/test_did_had_parity.py` (3 DGPs × 5 method combos, bit-exact via `rtol=0`) is a strictly stronger anchor than coverage-rate Monte Carlo, and the paper itself self-acknowledges (Section 5.2) that NP estimators are too noisy to be informative on the LBD-restricted PNTR panel. REGISTRY HAD section gains a consolidated Deviations block (5 entries with framing header) and closes 2 of 3 unchecked Implementation Checklist items — the staggered-timing fail-closed `ValueError` and the Assumption 5/6 non-testability documentation; the `covariates=` Theorem 6 follow-up and the extensive-margin / "consider running standard DiD" warning both remain explicitly tracked in `TODO.md` as Low-priority follow-ups rather than claimed-closed. `dechaisemartin-2026-review.md:182-194` requirements checklist boxes the Phase 1a/1b/1c implementation-status closures + the Assumption 5/6 documentation + the staggered-timing closures; the extensive-margin item is acknowledged as partial (zero-dose `UserWarning` exists in `qug_test`; main-`fit()` "consider standard DiD" recommendation is the TODO follow-up). `METHODOLOGY_REVIEW.md` HAD row promoted **In Progress** → **Complete**.
+- **WLS-CR2 Bell-McCaffrey gates lifted (clubSandwich port).** `vcov_type="hc2_bm" + weights` (both one-way and cluster-robust) is now supported via `compute_robust_vcov` / `solve_ols` / `LinearRegression`, matching `clubSandwich::vcovCR(..., type="CR2")` + `coef_test(test="Satterthwaite")$df_Satt` and `Wald_test(test="HTZ")$df_denom` at `atol=1e-10` on six new weighted scenarios in `benchmarks/data/clubsandwich_cr2_golden.json` (vcov + non-noise-floor per-coefficient DOF + compound-contrast DOF match; high-leverage FE-dummy coefficients fall at the noise floor and are suppressed to NaN per the precision-limit note below) (`tests/test_methodology_wls_cr2.py`). `LinearRegression` now populates per-coefficient `_bm_dof` on the weighted-cluster path (previously skipped), so `get_inference()` reports the correct Satterthwaite DOF instead of falling back to `n-k`. The lift applies to the **analytical** vcov surface; estimator-level `survey_design=` paths continue to use the Taylor-series linearization survey variance (which takes precedence over the analytical sandwich). Weight-type restriction: the port matches the `pweight` (sampling-weight) convention only — `aweight` and `fweight` raise `NotImplementedError` (separate methodology task); CR1 (`vcov_type="hc1"`) supports all three weight types. Critical implementation note: the diff-diff form matches **clubSandwich's specific algebra** (R source: `CR-adjustments.R::CR2`, `clubSandwich.R::vcov_CR`, `coef_test.R::Satterthwaite_df`), NOT a textbook Pustejovsky-Tipton (2018) §3.3 transform-once derivation — clubSandwich uses W (not √W) in the hat matrix, W² in the bias term, and unweighted residuals in the score construction. The Satterthwaite DOF uses the full `get_arrays.R::get_GH` H1/H2/H3 array construction; the simpler `(tr B)² / tr(B²)` form (which is exact for unweighted) diverges from clubSandwich by 0.5-30% on weighted designs. Unweighted CR2-BM behavior is bit-equal to prior at `atol=1e-14` (regression-safe via `TestUnweightedRegressionStillBitEqual` + `TestDOFFormulaDualPathEquivalence`). **Known precision limit**: the Satterthwaite DOF formula for high-leverage FE-dummy contrasts (where the contrast vector projects to near-zero on the design) is at the float64 noise floor; the helper detects this regime via two criteria applied union-wise — **(a) batch-relative** (per-contrast max|P| below `1e-10 ×` the largest contrast's max|P|; catches FE-dummies in a per-coefficient sweep), and **(b) absolute single-contrast safe** (per-contrast max|P| below `(EPS × n × k × max(bread_inv_scale, 1))²`; catches single-contrast calls like MPD avg_att where no batch reference exists) — and returns NaN with a `UserWarning` rather than ship BLAS-implementation-dependent DOF (15-30% disagreement vs R's clubSandwich on those specific contrasts). VCOV/SE remain valid; only the affected coefficients' DOF (and any t-test or CI that depends on it) is suppressed. The weighted clustered CR2 path also physically filters `weights > 0` rows before per-cluster computations to preserve subpopulation invariance (zero-weight rows inside otherwise positive-weight clusters were previously entering the CR2 adjustment matrices on the row side; fixed in PR #475 round 2). clubSandwich version pin: ≥ 0.7.0. Closes TODO.md rows 104-105 (Gates 4 and 5).
 - **SunAbraham `vcov_type` parameter (Phase 1b PR 1/8).** `SunAbraham(vcov_type=...)` now accepts `{"classical","hc1","hc2","hc2_bm"}` (defaults to `"hc1"`, which preserves prior behavior bit-equally - SA historically hard-coded HC1). Auto-cluster-at-unit dropped when the user opts into explicit `vcov_type="hc2"` or `vcov_type="classical"` (one-way only); preserved for `"hc1"` and `"hc2_bm"`. When `vcov_type in {"classical","hc2","hc2_bm"}`, `_fit_saturated_regression` auto-routes to a full-dummy saturated design (mirrors TWFE Gate 1 from PR #469): FWL preserves cohort coefficients but not the hat matrix, so HC2 leverage and Bell-McCaffrey Satterthwaite DOF must be computed on the full FE projection. Empirically matches R `lm()` summary classical SE, `sandwich::vcovHC(type="HC2")`, and `clubSandwich::vcovCR(..., type="CR2")` + `coef_test()$df_Satt` at atol=1e-10 (cohort SE and BM DOF pinned in `tests/test_methodology_sun_abraham.py`). For `vcov_type="hc2_bm"`, the user-facing aggregated inference (`event_study_effects[e]['p_value']`/`['conf_int']`, `overall_p_value`/`overall_conf_int`) uses CR2 Bell-McCaffrey contrast DOF — matches `clubSandwich::Wald_test(test="HTZ")$df_denom` at atol=1e-10 (mirrors PR #465's `_compute_cr2_bm_contrast_dof` pattern for MultiPeriodDiD's post-period-average ATT). `vcov_type` is now propagated to `SunAbrahamResults.vcov_type` for downstream introspection. `SurveyDesign` (any kind — analytical weights, stratified, PSU, or replicate-weight) combined with `vcov_type in {"classical","hc2","hc2_bm"}` raises `NotImplementedError`: the survey-design TSL (or replicate-weight refit) variance overrides the analytical sandwich family, and the auto-cluster guard for one-way families would silently downgrade unit-level PSUs to per-observation PSUs. Use `vcov_type="hc1"` (default) for survey designs. `conley` rejected at `__init__` with a deferral message (would require threading 6+ `conley_*` params through the saturated regression call). **Deviation from R:** SA's within-transform HC1 SE differs from `fixest::sunab()` by ~1-2% (~2e-3 absolute) on typical panel sizes due to a different `(n-k)` finite-sample correction (fixest counts absorbed FE in k_total; SA's `solve_ols` counts only within-transformed columns); the IW aggregation step is otherwise identical (pinned at atol=5e-3, tracked in TODO.md). First PR of the Phase 1b standalone-estimator threading initiative (7 PRs to follow: StackedDiD, WooldridgeDiD-OLS, CallawaySantAnna, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD).
 - **PreTrendsPower R `pretrends` parity goldens (PR-C closes PR-B's deferred R-parity row).** JSON goldens at `benchmarks/data/r_pretrends_golden.json` generated from the committed `benchmarks/R/generate_pretrends_golden.R` script against `jonathandroth/pretrends` commit `122731d082` (package version 0.1.0, R 4.5.2). 4 fixtures cover regular K=3 grid (`uniform_3_pre_periods_no_anticipation`), irregular K=3 grid `[-5,-3,-1]` (`irregular_pre_periods` — locks the PR-B Step 4 γ-unit linear-weight fix), anticipation-shifted K=4 grid (`anticipation_shifted`), and K=1 closed form (`single_pre_period_closed_form` — Roth Proposition 2 univariate truncated-normal). `TestPretrendsParityR` in `tests/test_methodology_pretrends.py` now active (4 tests): NIS power vs R `pretrends::pretrends()` at `atol=1e-4` across all 4 fixtures × 4 γ values; γ_p MDV vs R `slope_for_power()` at `atol=1e-4` across all 4 fixtures × 2 target_power values; end-to-end `fit()` on irregular grid vs R γ_p at `atol=1e-4` (locks the full `fit() → _extract_pre_period_params → _get_violation_weights → _compute_mdv_nis` chain through the public API); K=1 three-way cross-check (Python ≡ analytical truncated-normal closed form `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)` at `atol=1e-7`; both within `atol=1e-4` of R). Tolerance rationale: R hardcodes `thresholdTstat.Pretest=1.96` while Python uses `scipy.stats.norm.ppf(0.975) = 1.959963984540054` (`dz ≈ 3.6e-5`); R `slope_for_power` uses `uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)` versus Python `brentq(xtol=2e-12)`; the inverse-solver tolerance gap dominates γ_p, and `mvtnorm::pmvnorm` (R) vs `scipy.stats.multivariate_normal.cdf` (Python) Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5. `METHODOLOGY_REVIEW.md` PreTrendsPower row promoted `**Complete** (R parity pending)` → `**Complete**`. Roth (2022) paper review's `R \`pretrends\` package version pin (provisional)` Gaps bullet struck. Closes the PR-C TODO row.
 - **`SpilloverDiD(survey_design=...)` integration on HC1 / CR1 paths via Binder TSL (Wave E.1).** Lifts the Wave B/C/D upfront `NotImplementedError` and adds design-based variance for `vcov_type ∈ {"hc1"}` plus `cluster=<col>` (CR1). **Documented synthesis** of Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — composed with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all ingredients. **Mechanical composition:** SpilloverDiD's per-obs Wave D IF `psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}` (with survey weights threaded through `gamma_hat` solve, eps construction, and bread inversion via Hájek normalization) is aggregated to PSU totals and passed to the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. Stage-1 FE estimation extends `_iterative_fe_subset` with a `weights=` kwarg implementing WLS-FE via weighted bincount (numerator `bincount(w*resid)` / denominator `bincount(w)`); the `weights is None` path is bit-identical to the Wave B / C / D unweighted bincount. **Degrees of freedom:** t-distribution lookup uses `ResolvedSurveyDesign.df_survey` (4-way branch: PSU+strata → `n_PSU - n_strata`; PSU only → `n_PSU - 1`; strata only → `n_obs - n_strata`; neither → `n_obs - 1`), threaded through all four `safe_inference` call sites (aggregate `tau_total`, per-ring `delta_j`, event-study per-event-time `tau_k` / `delta_jk`, scalar `att` lincom). **Survey-array subsetting:** when `finite_mask` drops baseline-treated rows, `survey_weights` and `ResolvedSurveyDesign.{weights, strata, psu, fpc, replicate_weights}` are subsetted in parallel; `n_psu`, `n_strata`, and `survey_metadata` are recomputed (mirrors `TwoStageDiD.fit:567-601`). **Cluster + survey resolution:** when `cluster=<col>` and `survey_design.psu` are both supplied with different groupings, a `UserWarning` fires and PSU wins (mirrors `_resolve_effective_cluster` at `survey.py:1253-1275`; TwoStageDiD parity). When `cluster=<col>` is supplied without `survey_design.psu`, the cluster column is injected as the effective PSU via `_inject_cluster_as_psu`, which now honors `SurveyDesign.nest`: under `nest=False`, cluster labels must be globally unique across strata (raises if they repeat, matching the explicit-PSU resolver's contract). **Saturated `df_survey = 0` NaN-fail:** when `lonely_psu="remove"` removes all strata (singleton PSUs), the meat helper returns `(_, var_computed=False, legit_zero=0)` and SpilloverDiD's Wave E.1 path returns NaN meat with a `UserWarning` matching `"df_survey"` so callers can `pytest.warns(UserWarning, match="df_survey")`. This is a **departure from TwoStageDiD** (`two_stage.py:2003-2005`) which currently NaN-fails SILENTLY; Wave E.1 surfaces the diagnostic per `feedback_no_silent_failures`. **Subpopulation limitation (Wave E.3 follow-up):** `SurveyDesign.subpopulation()`-derived designs with zero-weight padding rows that lose stage-1 FE support have those rows physically removed by `finite_mask`, so `n_psu` / `df_survey` / Binder centering reflect the reduced fit sample rather than the full domain design (documented in REGISTRY; Wave E.3 will preserve full-design bookkeeping). **Public surface restrictions:** `vcov_type="conley" + survey_design=` originally raised `NotImplementedError` pointing at planned Wave E.2; lifted in the Wave E.2 entry above (stratified-Conley sandwich on PSU totals). Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — per Gerber (2026) Appendix A, the IF-reweighting shortcut does not apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up; non-pweight (`weight_type ∈ {"fweight", "aweight"}`) raises `ValueError` (the Binder TSL assumes probability weights). **Implementation:** `_compute_gmm_corrected_meat` extended with `survey_weights` + `resolved_survey` kwargs at `diff_diff/two_stage.py:56` (TYPE_CHECKING forward reference for `ResolvedSurveyDesign` to avoid circular import); new module-level helper `_compute_binder_tsl_meat` at `diff_diff/two_stage.py` wraps `_compute_stratified_meat_from_psu_scores` with implicit per-obs PSU synthesis for no-PSU survey designs + the Wave E.1 NaN-fail + warning; `_iterative_fe_subset` weighted path at `diff_diff/spillover.py:1382` (in-place extension, bit-identical fallback, positive-weight identification gate); `_inject_cluster_as_psu` honors `nest` (shared survey-helper fix that also benefits TwoStageDiD); `ResolvedSurveyDesign` gains a `nest` field propagated through all 5 construction sites. `SpilloverDiDResults` extended with `survey_metadata`, `n_psu`, `n_strata` fields at `diff_diff/results.py`. **Tests:** new `TestSpilloverDiDWaveE1SurveyDesignHc1` (17 tests: bit-identity fallback, Binder TSL hand-check uniform + non-uniform weights, lonely_psu modes, FPC degenerate limits ×3, saturated NaN-fail with `pytest.warns(match="df_survey")`, cluster+survey warn-and-use-PSU, no-PSU regressions (weights-only, weights+strata, cluster-without-PSU, cluster overlap with nest=False/True), zero-weight Omega_0 exclusion + all-zero raises, replicate-weight + non-pweight + Conley+survey rejections, fit idempotency, finite_mask subsetting) and `TestSpilloverDiDWaveE1SurveyDesignEventStudy` (7 tests: event-study + survey on both `is_staggered` branches with `df_survey` lincom verification, distinguishability between survey-share and sample-share lincom rules via manual reconstruction with cohort-correlated weights + non-constant tau_k, aggregate-vs-event-study parity, drift goldens, subset-path invariant). Wave B/C/D bullets below are unchanged; this entry replaces the pre-Wave-E.1 `survey_design=` rejection.
diff --git a/TODO.md b/TODO.md
index eedc57df..0371202d 100644
--- a/TODO.md
+++ b/TODO.md
@@ -101,8 +101,14 @@ Deferred items from PR reviews that were not addressed before merge.
 | Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 7 standalone estimators that expose `cluster=` but not yet `vcov_type=`: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham` (this row tracks the remaining 7). | multiple | Phase 1b | Medium |
 | Extend `SunAbraham` with `vcov_type="conley"` (Conley spatial-HAC) as a first-class feature: thread `conley_coords` / `conley_cutoff_km` / `conley_metric` / `conley_kernel` / `conley_time` / `conley_unit` / `conley_lag_cutoff` through `_fit_saturated_regression`. Phase 1b PR 1/8 deferred this; SA currently rejects `vcov_type="conley"` at `__init__` with a deferral message. | `diff_diff/sun_abraham.py` | follow-up | Medium |
 | Harmonize SunAbraham's HC1 within-transform finite-sample correction with `fixest::sunab()`. SA's `solve_ols` applies `n / (n - k_dm)` (within-transform columns only); fixest applies `n / (n - k_total)` (counts absorbed FE). SE values differ by ~1-2% on typical panel sizes (documented in REGISTRY.md "Deviation from R"; pinned at `atol=5e-3` in `tests/test_methodology_sun_abraham.py`). Either thread `df_adjustment` into the vcov scaling or document as an intentional difference. | `diff_diff/sun_abraham.py`, `diff_diff/linalg.py::compute_robust_vcov` | follow-up | Low |
-| Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |
-| Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`) currently raises `NotImplementedError`. Weighted hat matrix and residual rebalancing need threading per clubSandwich WLS handling. | `linalg.py::_compute_cr2_bm` | Phase 1a | Medium |
+<!-- Rows 104-105 LIFTED 2026-05-20 via the clubSandwich WLS-CR2 port. The diff-diff
+     form matches clubSandwich's specific algebra (W not sqrt(W), W^2 in bias term,
+     unweighted residuals) rather than the textbook PT2018 transform-once derivation
+     which historically diverged by 0.5-30%. Pinned at atol=1e-10 on 6 weighted
+     scenarios in benchmarks/data/clubsandwich_cr2_golden.json. -->
+| ~~Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster)~~ — LIFTED via clubSandwich WLS-CR2 port. Routes through `_compute_cr2_bm_contrast_dof` with singleton-cluster reduction (each obs is its own cluster). See REGISTRY.md Phase 1a row for the algebra. | `linalg.py` | LIFTED | — |
+| ~~Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`)~~ — LIFTED via clubSandwich WLS-CR2 port. `_compute_cr2_bm` now accepts `weights=` and threads through clubSandwich's specific WLS-CR2 algebra. | `linalg.py::_compute_cr2_bm` | LIFTED | — |
+| `LinearRegression.fit()` pays the CR2 cost twice on the weighted `hc2_bm` path: once inside `solve_ols(..., return_vcov=True)` and again via `compute_robust_vcov(..., return_dof=True)` to populate `_bm_dof`. Correct but redundant. Fix: thread `return_dof` through `solve_ols` so the same CR2 computation produces both vcov + DOF, or cache the per-cluster `A_g` / `MUWTWUM` precomputes between calls. CI codex P3 on PR #475. | `linalg.py::LinearRegression.fit`, `linalg.py::solve_ols` | PR #475 follow-up | Low |
 | `TwoWayFixedEffects(vcov_type in {"hc2","hc2_bm"})` with replicate-weight survey designs raises `NotImplementedError` (`twfe.py:~233`). The replicate path re-demeans per replicate (re-demeaning depends on the per-replicate weight vector), which doesn't compose with the full-dummy HC2/HC2-BM build — a correct implementation would need per-replicate full-dummy refit. Workaround: use `vcov_type="hc1"` for replicate-weight CR1. | `twfe.py::fit` | follow-up | Low |
 | TWFE's HC2/HC2-BM inline full-dummy build (`twfe.py:280-315`) duplicates the dummy-construction logic in `DifferenceInDifferences(fixed_effects=...)` (`estimators.py:478-486`). Extract a shared helper (or delegate TWFE's HC2/HC2-BM path to DiD's `fixed_effects=` branch, with TWFE-specific cluster default threading) to reduce drift risk on FE naming, survey behavior, and result-surface conventions. Substantive refactor — touches both estimators. | `twfe.py::fit`, `estimators.py::DifferenceInDifferences.fit` | follow-up | Low |
 | Unify Rust local-method `estimate_model` solver path to `solve_wls_svd` (the same SVD helper used by the global-method since PR #348) for sub-1e-14 bootstrap SE parity. Current local-method bootstrap parity test (`tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility_local`) passes at `atol=1e-5` — the residual ~1e-7 gap is roundoff between Rust's `estimate_model` matrix factorization and numpy's `lstsq`, which accumulates differently across per-replicate bootstrap fits. Main-fit ATT parity is regime-dependent (`atol=1e-14` for `lambda_nn=inf`, `atol=1e-10` for finite `lambda_nn` — see `test_local_method_main_fit_parity`); the bootstrap gap is a same-solver-path roundoff concern and not a user-visible correctness bug. | `rust/src/trop.rs::estimate_model`, `rust/src/linalg.rs::solve_wls_svd` | follow-up | Low |
@@ -201,7 +207,7 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
 #### Tier C — Heavy / derivation required
 
 - HonestDiD Δ^RM ARP conditional/hybrid confidence sets (`honest_did.py`)
-- Weighted one-way Bell-McCaffrey + weighted CR2 Bell-McCaffrey (linalg derivations + R parity harness) (`linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_compute_cr2_bm`)
+- ~~Weighted one-way Bell-McCaffrey + weighted CR2 Bell-McCaffrey~~ — LIFTED via the clubSandwich WLS-CR2 port (PR #475); see rows 109-110 above for the closed rows.
 - Multi-absorb weighted demeaning: alternating-projection iteration for N>1 absorb + weights (`estimators.py`)
 - ImputationDiD dense `(A0'A0).toarray()` OOM: alternative dense fallback or richer sparse strategy (`imputation.py:1531`)
 - HAD mass-point `vcov_type ∈ {hc2, hc2_bm}`: 2SLS-specific leverage derivation (`had.py::_fit_mass_point_2sls`)
diff --git a/benchmarks/R/generate_clubsandwich_golden.R b/benchmarks/R/generate_clubsandwich_golden.R
index 77976fa3..7c466c36 100644
--- a/benchmarks/R/generate_clubsandwich_golden.R
+++ b/benchmarks/R/generate_clubsandwich_golden.R
@@ -23,6 +23,7 @@ suppressPackageStartupMessages({
   library(jsonlite)
 })
 
+stopifnot(packageVersion("clubSandwich") >= "0.7.0")
 set.seed(20260420)
 
 # --- Three deterministic datasets ---------------------------------------------
@@ -468,12 +469,242 @@ output$sun_abraham_two_cohort <- list(
   dof_bm_contrast_overall_unit = unname(sa_dof_bm_overall_unit)
 )
 
+# =============================================================================
+# Weighted scenarios (clubSandwich WLS-CR2 port)
+# =============================================================================
+# Pin diff-diff's weighted CR2-BM port against clubSandwich's specific WLS-CR2
+# algebra (R/CR-adjustments.R::CR2 + R/get_arrays.R::get_GH + coef_test.R).
+# Verified algorithm uses W (not sqrt(W)) in the hat matrix, W^2 in the bias
+# correction term, and unweighted residuals in the score construction.
+
+# ---- Scenario: weighted_one_way_no_cluster ----------------------------------
+# 12 observations, single "cluster" via vcovCR(cluster=1:n, type="CR2") for the
+# one-way HC2-BM reduction. Heteroskedastic weights.
+set.seed(50100)
+d_w_oneway <- data.frame(
+  x = c(-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, -0.8, 0.3, 1.2, -1.2),
+  z = c(0.2, -0.3, 0.5, -0.1, 0.4, -0.5, 0.6, -0.2, 0.1, -0.4, 0.3, 0.5)
+)
+d_w_oneway$y <- 1 + 0.5 * d_w_oneway$x + 0.3 * d_w_oneway$z +
+  rnorm(nrow(d_w_oneway), sd = 0.5)
+w_oneway <- c(0.5, 0.5, 1, 1, 1.5, 1.5, 2, 2, 2.5, 2.5, 3, 3)
+fit_w_oneway <- lm(y ~ x + z, data = d_w_oneway, weights = w_oneway)
+vcov_w_oneway <- vcovCR(fit_w_oneway,
+                        cluster = seq_len(nrow(d_w_oneway)), type = "CR2")
+ct_w_oneway <- coef_test(fit_w_oneway, vcov = vcov_w_oneway)
+output$weighted_one_way_no_cluster <- list(
+  x = d_w_oneway$x,
+  z = d_w_oneway$z,
+  y = d_w_oneway$y,
+  weights = w_oneway,
+  coef = as.numeric(coef(fit_w_oneway)),
+  coef_names = names(coef(fit_w_oneway)),
+  vcov_hc2_bm = as.numeric(vcov_w_oneway),
+  vcov_hc2_bm_shape = dim(vcov_w_oneway),
+  dof_bm_one_way = as.numeric(ct_w_oneway$df_Satt),
+  se_hc2_bm = as.numeric(ct_w_oneway$SE)
+)
+
+# ---- Scenario: weighted_balanced_clusters -----------------------------------
+# 20 observations, 4 clusters of 5; weights vary within and across clusters.
+set.seed(50200)
+n_w_bal <- 20
+cluster_w_bal <- rep(1:4, each = 5)
+d_w_bal <- data.frame(
+  cluster = cluster_w_bal,
+  x = rnorm(n_w_bal),
+  z = rnorm(n_w_bal)
+)
+d_w_bal$y <- 1 + 0.5 * d_w_bal$x + 0.3 * d_w_bal$z + rnorm(n_w_bal, sd = 0.5)
+w_bal <- runif(n_w_bal, min = 0.5, max = 3.0)
+fit_w_bal <- lm(y ~ x + z, data = d_w_bal, weights = w_bal)
+vcov_w_bal <- vcovCR(fit_w_bal, cluster = d_w_bal$cluster, type = "CR2")
+ct_w_bal <- coef_test(fit_w_bal, vcov = vcov_w_bal)
+output$weighted_balanced_clusters <- list(
+  x = d_w_bal$x,
+  z = d_w_bal$z,
+  y = d_w_bal$y,
+  weights = w_bal,
+  cluster = d_w_bal$cluster,
+  coef = as.numeric(coef(fit_w_bal)),
+  coef_names = names(coef(fit_w_bal)),
+  vcov_cr2 = as.numeric(vcov_w_bal),
+  vcov_cr2_shape = dim(vcov_w_bal),
+  dof_bm = as.numeric(ct_w_bal$df_Satt),
+  se_cr2 = as.numeric(ct_w_bal$SE)
+)
+
+# ---- Scenario: weighted_unbalanced_clusters ---------------------------------
+# 52 observations, 8 clusters of sizes 3-10. Heteroskedastic weights.
+set.seed(50300)
+cluster_sizes_unbal <- c(3, 4, 5, 6, 7, 8, 9, 10)
+cluster_w_unbal <- rep(1:8, times = cluster_sizes_unbal)
+n_w_unbal <- length(cluster_w_unbal)
+d_w_unbal <- data.frame(
+  cluster = cluster_w_unbal,
+  x = rnorm(n_w_unbal),
+  z = rnorm(n_w_unbal)
+)
+shock_unbal <- rnorm(8, sd = 0.3)
+d_w_unbal$y <- 1 + 0.5 * d_w_unbal$x + 0.3 * d_w_unbal$z +
+  shock_unbal[d_w_unbal$cluster] + rnorm(n_w_unbal, sd = 0.4)
+w_unbal <- runif(n_w_unbal, min = 0.3, max = 3.0)
+fit_w_unbal <- lm(y ~ x + z, data = d_w_unbal, weights = w_unbal)
+vcov_w_unbal <- vcovCR(fit_w_unbal, cluster = d_w_unbal$cluster, type = "CR2")
+ct_w_unbal <- coef_test(fit_w_unbal, vcov = vcov_w_unbal)
+output$weighted_unbalanced_clusters <- list(
+  x = d_w_unbal$x,
+  z = d_w_unbal$z,
+  y = d_w_unbal$y,
+  weights = w_unbal,
+  cluster = d_w_unbal$cluster,
+  coef = as.numeric(coef(fit_w_unbal)),
+  coef_names = names(coef(fit_w_unbal)),
+  vcov_cr2 = as.numeric(vcov_w_unbal),
+  vcov_cr2_shape = dim(vcov_w_unbal),
+  dof_bm = as.numeric(ct_w_unbal$df_Satt),
+  se_cr2 = as.numeric(ct_w_unbal$SE),
+  cluster_sizes = as.numeric(table(d_w_unbal$cluster))
+)
+
+# ---- Scenario: weighted_singletons_present ----------------------------------
+# Adversarial: prior PT2018 transform-once derivation hit ~30% gap on
+# singleton-cluster scenarios. Verifies the clubSandwich port handles this.
+set.seed(50400)
+cluster_sizes_sing <- c(1, 1, 2, 3, 4, 5, 6, 6, 4, 3)
+cluster_w_sing <- rep(1:10, times = cluster_sizes_sing)
+n_w_sing <- length(cluster_w_sing)
+d_w_sing <- data.frame(
+  cluster = cluster_w_sing,
+  x = rnorm(n_w_sing),
+  z = rnorm(n_w_sing)
+)
+shock_sing <- rnorm(10, sd = 0.3)
+d_w_sing$y <- 1 + 0.5 * d_w_sing$x + 0.3 * d_w_sing$z +
+  shock_sing[d_w_sing$cluster] + rnorm(n_w_sing, sd = 0.4)
+w_sing <- runif(n_w_sing, min = 0.3, max = 3.0)
+fit_w_sing <- lm(y ~ x + z, data = d_w_sing, weights = w_sing)
+vcov_w_sing <- vcovCR(fit_w_sing, cluster = d_w_sing$cluster, type = "CR2")
+ct_w_sing <- coef_test(fit_w_sing, vcov = vcov_w_sing)
+output$weighted_singletons_present <- list(
+  x = d_w_sing$x,
+  z = d_w_sing$z,
+  y = d_w_sing$y,
+  weights = w_sing,
+  cluster = d_w_sing$cluster,
+  coef = as.numeric(coef(fit_w_sing)),
+  coef_names = names(coef(fit_w_sing)),
+  vcov_cr2 = as.numeric(vcov_w_sing),
+  vcov_cr2_shape = dim(vcov_w_sing),
+  dof_bm = as.numeric(ct_w_sing$df_Satt),
+  dof_per_coef = as.numeric(ct_w_sing$df_Satt),
+  se_cr2 = as.numeric(ct_w_sing$SE),
+  cluster_sizes = as.numeric(table(d_w_sing$cluster))
+)
+
+# ---- Scenario: weighted_did_absorbed_fe -------------------------------------
+# DiD-style analytical CR2 design-matrix parity fixture: 8 units x 4 periods,
+# treat_post + unit + period FE, analytics weights varying by unit. Pins the
+# clubSandwich WLS-CR2 analytical surface (compute_robust_vcov / solve_ols /
+# LinearRegression direct callers) on a DiD-shaped design. NOTE: this is NOT
+# a public-estimator `DifferenceInDifferences(survey_design=...)` parity
+# fixture — estimator-level survey designs route through the TSL survey
+# variance path, which takes precedence over the analytical CR2 sandwich.
+set.seed(50500)
+d_did_w <- make_did_panel(n_units = 8, n_periods = 4, treatment_period = 2,
+                          seed = 50501)
+# Unit-level weight (stratum-like): vary by unit, constant within unit-period.
+unit_w_did <- runif(8, min = 0.5, max = 2.5)
+d_did_w$weights <- unit_w_did[d_did_w$unit_int]
+fit_did_w <- lm(y ~ treat_post + unit + period, data = d_did_w,
+                weights = d_did_w$weights)
+vcov_did_w_cr2 <- vcovCR(fit_did_w, cluster = d_did_w$unit_int, type = "CR2")
+ct_did_w_cr2 <- coef_test(fit_did_w, vcov = vcov_did_w_cr2)
+output$weighted_did_absorbed_fe <- list(
+  unit = d_did_w$unit_int,
+  period = d_did_w$period_int,
+  treated = d_did_w$treated,
+  post = d_did_w$post,
+  treat_post = d_did_w$treat_post,
+  y = d_did_w$y,
+  weights = d_did_w$weights,
+  coef = as.numeric(coef(fit_did_w)),
+  coef_names = names(coef(fit_did_w)),
+  vcov_cr2 = as.numeric(vcov_did_w_cr2),
+  vcov_cr2_shape = dim(vcov_did_w_cr2),
+  dof_cr2 = as.numeric(ct_did_w_cr2$df_Satt)
+)
+
+# ---- Scenario: weighted_mpd_avg_att_dof -------------------------------------
+# MPD-style analytical CR2 contrast-DOF parity fixture: 15 units x 4 periods,
+# MPD parameterization with analytics weights + cluster=unit. Compound
+# contrast = post-period-average ATT. Pins the clubSandwich `Wald_test(test=
+# "HTZ")$df_denom` against `_compute_cr2_bm_contrast_dof(weights=)` on an
+# MPD-shaped design. NOTE: like `weighted_did_absorbed_fe`, this fixture
+# targets the analytical surface only — estimator-level
+# `MultiPeriodDiD(survey_design=...)` paths use TSL.
+set.seed(50600)
+d_mpd_w <- make_mpd_panel(n_total = 15, units_per_cohort = 5, n_periods = 4,
+                          seed = 50601)
+d_mpd_w$period_f <- relevel(factor(d_mpd_w$period), ref = "1")
+for (p in 2:4) {
+  d_mpd_w[[paste0("treated_period_", p)]] <-
+    d_mpd_w$treated * (d_mpd_w$period == p)
+}
+unit_w_mpd <- runif(15, min = 0.5, max = 2.5)
+d_mpd_w$weights <- unit_w_mpd[d_mpd_w$unit]
+fit_mpd_w <- lm(y ~ treated + period_f +
+                    treated_period_2 + treated_period_3 + treated_period_4 +
+                    factor(unit),
+                data = d_mpd_w, weights = d_mpd_w$weights)
+vcov_mpd_w_cr2 <- vcovCR(fit_mpd_w, cluster = d_mpd_w$unit, type = "CR2")
+ct_mpd_w_cr2 <- coef_test(fit_mpd_w, vcov = vcov_mpd_w_cr2)
+# Compound contrast: post-period-average over treated_period_{2,3,4}.
+all_coef_names_w <- names(coef(fit_mpd_w))
+n_coef_w <- length(all_coef_names_w)
+c_avg_vec_w <- setNames(rep(0, n_coef_w), all_coef_names_w)
+post_names_w <- c("treated_period_2", "treated_period_3", "treated_period_4")
+c_avg_vec_w[post_names_w] <- 1 / length(post_names_w)
+finite_mask_w <- !is.na(coef(fit_mpd_w))
+c_avg_kept_w <- c_avg_vec_w[finite_mask_w]
+dof_avg_w <- Wald_test(
+  fit_mpd_w,
+  constraints = matrix(c_avg_kept_w, 1),
+  vcov = vcov_mpd_w_cr2,
+  test = "HTZ"
+)$df_denom
+output$weighted_mpd_avg_att_dof <- list(
+  unit = d_mpd_w$unit,
+  period = d_mpd_w$period,
+  treated = d_mpd_w$treated,
+  y = d_mpd_w$y,
+  weights = d_mpd_w$weights,
+  cluster = d_mpd_w$unit,
+  coef = as.numeric(coef(fit_mpd_w)),
+  coef_names = all_coef_names_w,
+  finite_coef_names = all_coef_names_w[finite_mask_w],
+  vcov_cr2 = as.numeric(vcov_mpd_w_cr2),
+  vcov_cr2_shape = dim(vcov_mpd_w_cr2),
+  dof_per_coef = as.numeric(ct_mpd_w_cr2$df_Satt),
+  c_avg = as.numeric(c_avg_kept_w),
+  dof_avg = unname(dof_avg_w),
+  post_interaction_names = post_names_w,
+  reference_period = 1L,
+  n_post_periods = length(post_names_w)
+)
+
 output$meta <- list(
   source = "clubSandwich",
   clubSandwich_version = as.character(packageVersion("clubSandwich")),
   R_version = R.version.string,
   generated_at = format(Sys.time(), tz = "UTC", usetz = TRUE),
-  note = "CR2 Bell-McCaffrey cluster-robust parity target for diff_diff._compute_cr2_bm"
+  note = paste0(
+    "CR2 Bell-McCaffrey cluster-robust parity target for ",
+    "diff_diff._compute_cr2_bm. Unweighted scenarios pin against ",
+    "_compute_cr2_bm / _compute_bm_dof_oneway; weighted scenarios pin ",
+    "the clubSandwich WLS-CR2 port (W not sqrt(W), W^2 bias term, ",
+    "unweighted residuals)."
+  )
 )
 
 out_path <- file.path("benchmarks", "data", "clubsandwich_cr2_golden.json")
diff --git a/benchmarks/data/clubsandwich_cr2_golden.json b/benchmarks/data/clubsandwich_cr2_golden.json
index 34eb0b2c..5f8bd117 100644
--- a/benchmarks/data/clubsandwich_cr2_golden.json
+++ b/benchmarks/data/clubsandwich_cr2_golden.json
@@ -117,11 +117,98 @@
     "dof_bm_contrast_es0_unit": 34.999999999999865,
     "dof_bm_contrast_overall_unit": 21.211009174311531
   },
+  "weighted_one_way_no_cluster": {
+    "x": [-1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, -0.8, 0.3, 1.2, -1.2],
+    "z": [0.2, -0.3, 0.5, -0.1, 0.4, -0.5, 0.6, -0.2, 0.1, -0.4, 0.3, 0.5],
+    "y": [0.2545059828412936, -0.1144214115246267, 1.146998198838752, 1.147491940047105, 0.9038865036585249, 1.664890628059604, 1.411754645520116, 2.074412325055902, 0.8402019123424249, 1.459015848932287, 2.7274111498635, 0.5134875755019385],
+    "weights": [0.5, 0.5, 1, 1, 1.5, 1.5, 2, 2, 2.5, 2.5, 3, 3],
+    "coef": [1.200447520280967, 0.5567076424492209, -0.01058416930242507],
+    "coef_names": ["(Intercept)", "x", "z"],
+    "vcov_hc2_bm": [0.009572486374129946, 0.01171461935495017, 0.02202523381491585, 0.01171461935495016, 0.03149562956602776, 0.07288352542952749, 0.02202523381491584, 0.07288352542952749, 0.20631070113179],
+    "vcov_hc2_bm_shape": [3, 3],
+    "dof_bm_one_way": [3.658586186556956, 3.561535732786126, 3.928984857847434],
+    "se_hc2_bm": [0.09783908408264023, 0.1774700807630057, 0.4542143779448093]
+  },
+  "weighted_balanced_clusters": {
+    "x": [0.497895572432374, 3.057554937847414, 0.09630395091262728, -0.07778318241031754, 0.5816582621663493, -0.3081382740520117, 2.203136894152622, 0.8596865967942998, 0.2174670944191681, 1.890925141897248, 0.1344365697571656, -0.2921761276470969, -0.8216067859357986, 0.5112934456463226, 0.458616529105074, 0.5233128561706929, 0.1393071599926687, -0.5734864092098354, 1.578061262433531, 0.2114937713637269],
+    "z": [-1.359626052778771, 0.5490960321585909, 0.5110071736099511, -2.864598979682063, 1.794791481107389, -0.8651280993421233, 1.448882473708593, -0.3803030228705687, -0.9326874819641782, -1.085533200150278, -2.113648483781554, 0.5871210183367245, 0.3746077099297441, 0.8034274985900017, 0.4828186801609228, 0.8165790679562114, -1.557754016772785, -0.5242799147266215, -2.241211750510999, -0.5650840273052196],
+    "y": [1.295300424563947, 2.066123563746261, 1.168329348504406, 0.1926345471429525, 2.194972700001895, -0.3623503987260962, 1.880593242273413, 1.943056744348266, 0.2096603116088021, 2.393988591298549, 0.5523416291655595, 0.7891371689743394, 0.9166692744371431, 1.40538687248714, 2.642127100647723, 1.44267784237531, 1.583671308145638, 0.6456033271397608, 0.8525389070327165, 1.084058989209185],
+    "weights": [0.9752403678139672, 1.453358756378293, 1.994948330218904, 2.848997579538263, 2.518879582174122, 1.228916482185014, 1.902456686482765, 2.709635726176202, 2.496082805562764, 2.087121451739222, 0.5264923868235201, 2.929599115857854, 2.0987546404358, 2.444028527941555, 1.999268490588292, 1.648351823212579, 1.685008906177245, 1.581653047003783, 1.442831613938324, 2.990091201034375],
+    "cluster": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
+    "coef": [1.112995565816867, 0.4520613463328544, 0.2642771754021827],
+    "coef_names": ["(Intercept)", "x", "z"],
+    "vcov_cr2": [0.01036410509311339, -0.01085435894028598, 0.003903827762049024, -0.01085435894028598, 0.01437949999803382, -0.004956415639175168, 0.003903827762049024, -0.004956415639175169, 0.002854251375055785],
+    "vcov_cr2_shape": [3, 3],
+    "dof_bm": [2.314748612075773, 2.614469013815373, 1.872252930587907],
+    "se_cr2": [0.101804248895188, 0.119914552903448, 0.05342519419764222]
+  },
+  "weighted_unbalanced_clusters": {
+    "x": [0.1700257637101301, 0.8471605793290365, 0.7607362189483639, 0.8991792602988701, -1.819641765807002, 0.7138531378263915, -0.363167215454481, -1.9319620979415, -0.787024189701091, 0.5302488596829736, -0.5066394985906996, -0.5512592471101819, -0.5953507187909981, -0.3696967789062436, 0.4141094751845556, -0.7865571685291729, 0.4472485572197755, 1.696664934996508, 1.514885018690522, -0.1468751822667536, -0.2588690712859881, -0.5329060342529905, -0.8881213731610088, 1.15224077871137, -0.4891684378504669, 0.1055836602004347, -1.070624707969419, -0.501473030587183, -1.461613510277211, -0.3851219415128875, 0.5139696748263425, -0.2084231193768033, 0.6239667157075947, 1.089340520259811, -0.441298707938111, -0.3144779319312493, -1.201864940048664, -0.6038748246273411, -0.4338055907163749, 1.272812449965721, 1.645106327539352, -0.5326414909357686, -0.4492438978597619, -1.069480835988015, -0.1597700187564996, 0.7842571197351411, -0.1375358216917391, -0.61464564096129, 1.134378801883864, -1.203975350662919, -1.84129716372263, -1.213610765484462],
+    "z": [-0.1091850747573143, -0.8591967936980275, -0.8250710914370268, -0.1801479399661773, -1.913234187363314, 0.3265761942570716, -0.4695875535270576, 0.2828249045835142, -0.2637814218132422, -0.521166013857016, 2.325752636111909, -3.223507645410792, 1.294283511479585, -1.086468134041922, 0.5696767094019248, 1.538174078426102, -0.1730445073245152, 0.16357159794326, -1.059200634051567, -1.182877105852174, -0.6322788997272122, 1.633036792485955, -0.6408734025178767, -0.543790344342109, -1.515209319990585, 0.2395441740225159, -1.067827435871503, 0.07579642452497221, 0.2682916988451204, 1.06595746147165, 0.9687373777397362, -0.8147854056407897, -2.570837564260244, 0.6248236443426528, 0.9060310859195424, 0.02323126848759456, -0.7962952641646476, 0.5399111020217939, 0.9488247029931438, 0.7644633881262044, -0.4250613659624701, 0.6591694709313157, 0.8154380042348803, 2.629729012597462, -0.9871602522976277, 0.8354812212757736, -1.601897079063906, -0.6336935138921488, -1.551400892307474, 0.9810887261315703, -1.199636605252803, 0.3674161488612564],
+    "y": [0.4630621179411795, 0.9192132028442144, 0.8537994884716151, 1.100204725665069, 0.3537214552456592, 1.364943689987401, 0.9545514829191797, 0.9709061374942889, 1.417612530876569, 1.576148970538916, 2.652838505287043, 0.6300337941901875, 1.553242843036968, 0.8142654327090031, 1.693396355464565, 1.147035822402205, 1.751910947626995, 1.827284019136156, 1.573915696827097, 0.5077994935337435, 0.6482945814745228, 1.249874039879542, 0.633202655880128, 1.121324348299585, 0.7251038588365031, 1.243329651084485, 0.1144735412805133, 0.8123828895748136, 0.1716022553715115, 0.9255162277435098, 2.102484044185116, 0.4494995616250322, -0.2642525955352465, 1.521680473889361, 0.502314471249461, 0.3348838862242045, -0.5723017912855416, 0.09929403182597019, 0.89129736563392, 1.860532717182939, 1.413766888400529, 0.7866944862687557, 0.7917192314157128, 0.3316390730217313, -0.5972960836146139, 1.390567925433349, 0.7875676337919002, 0.006301286997087527, 0.1070562784619726, 0.1065047762532, -0.5947116540389716, 0.004132997472964722],
+    "weights": [0.9031896912027151, 1.873537396453321, 2.729299383074977, 2.517922308412381, 2.554755570203997, 0.3819767832756042, 1.19937636544928, 1.406278725899756, 0.5915587811730803, 0.4882745106238872, 1.33074354857672, 0.9725192229961976, 2.383067906973884, 2.32964567549061, 1.831404032651335, 0.5105581857729703, 1.919225583435036, 0.7080367370741442, 2.515758992219344, 2.867132637673058, 1.34958449320402, 1.09255025004968, 1.94399518542923, 1.638967646844685, 0.6610008525894955, 1.362667562579736, 0.8535449544433504, 1.409185838978738, 1.37667216185946, 1.772799369134009, 1.25428713618312, 2.784489484922961, 0.3590846309904009, 1.939538310212083, 2.244168590591289, 0.5101799157215282, 1.303672137879767, 1.881783301988617, 2.030243344651535, 2.459536990919151, 2.16933219996281, 1.853940955898725, 2.093805230199359, 0.8066784407710657, 0.7310235396958887, 2.525525042018853, 0.8900950277689844, 0.5643322949996218, 0.8150937611237169, 0.974766900530085, 0.8633906545350328, 2.532768704323098],
+    "cluster": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8],
+    "coef": [0.9460723269689574, 0.409443850090639, 0.2522796102420196],
+    "coef_names": ["(Intercept)", "x", "z"],
+    "vcov_cr2": [0.01544375785108722, -0.008878961106364092, 0.004155134409409824, -0.008878961106364092, 0.01099403590939687, 2.5660273165996e-05, 0.004155134409409824, 2.566027316599605e-05, 0.004775737925894957],
+    "vcov_cr2_shape": [3, 3],
+    "dof_bm": [5.433322579854079, 4.639148851013618, 6.20486806011078],
+    "se_cr2": [0.1242729168044559, 0.1048524482756453, 0.06910671404353529],
+    "cluster_sizes": [3, 4, 5, 6, 7, 8, 9, 10]
+  },
+  "weighted_singletons_present": {
+    "x": [0.4067767593383133, -0.7109222810358509, -0.8668035246604626, -0.5703710634686002, -0.3030755337955155, -0.5283976491150604, -0.09096892878882246, -0.3628800521834397, 1.054455289774895, -1.203025183896919, 1.985902239511619, 0.6505090454824404, 1.406484265428918, 1.230396085537867, -1.215488287370151, -0.009733872981889312, 0.1675571807367171, -1.204710061967087, -0.05757657014103108, -0.1295491284921035, -1.138731644058949, 0.04325336518351799, -0.362552211625902, -0.2448367156683839, 0.6268229420311034, -0.4555368322000388, -0.247622575428871, 1.786869527952035, 1.357488072670255, -0.03696579182414967, 0.8516449891133896, 0.511211316099193, 2.033211276638259, -1.190292644742976, -0.7230722331927855],
+    "z": [1.003038859366986, -2.550039441605369, -0.2076595522685525, 0.2499084706915529, 1.137962901973954, 0.2083064542831441, 0.7476470705736329, -0.03625564662628415, 0.5157729911812325, -0.7610710228036714, -1.481038792753671, -0.7984198561901753, -0.2260229729168584, 0.5617272465895701, -0.02575503564694858, 1.256586715038484, -1.772564645423327, 1.291702549796663, -1.717555861571738, 0.9582305358270773, -0.5940888419227472, -1.777376739775493, -0.2144668712146, 0.423946490940611, -0.153615128640419, 1.479871505080308, 0.5238286911024496, -0.5035239483086945, 0.9417984529308548, -0.3349751079408045, 0.3288136462061103, 0.1872849061830458, 2.609865833930387, 0.08753371273693382, 2.834695738180907],
+    "y": [1.226921217157507, -0.09393131417825959, 0.8411437856728794, 0.6208123792054755, 0.9858547913155151, 1.52530337011539, 1.854105927304284, 1.084594718944251, 1.582729000457745, -1.087909707573491, 1.117691189546059, 0.6150008668107773, 1.640443104343987, 1.094850459166496, 0.2017571025135735, 0.9560990186790926, -0.2079028825688845, -0.3731402418575339, 0.3680355714336011, 0.2398258117389408, -0.05402341915184727, -0.4501570939831154, 0.5894333761149737, 0.7768044197349441, 0.8216465289104175, 1.929245498849222, 1.12957244220005, 1.430096055875213, 2.586549814466467, 0.7497787081120627, 1.125340141100099, 0.8404366383511823, 2.89672020628496, 0.364485576071561, 1.092463608783902],
+    "weights": [1.182571023213677, 2.913374680909328, 2.667768498254009, 1.212338169640861, 2.121006679418497, 0.4489697153912857, 2.804210721119307, 2.897144587640651, 2.951976609556004, 0.5251405492192134, 1.592602305393666, 1.369845829694532, 2.515343518462032, 2.048994413483888, 2.156659548170865, 0.3023814621381462, 2.641614361549728, 1.053614864149131, 1.181866449234076, 0.8170217901701108, 2.828353272099048, 2.348975717369467, 0.9643143250839785, 1.385686740302481, 1.793250523670577, 0.4886338327545673, 1.597924707876518, 2.248105251998641, 2.594786335271783, 0.9103751726215705, 0.5885890521341934, 1.884197746496648, 1.638390199514106, 0.5767186880344525, 1.419133383594453],
+    "cluster": [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10],
+    "coef": [0.8362061617840696, 0.4909272685051819, 0.3953476774903695],
+    "coef_names": ["(Intercept)", "x", "z"],
+    "vcov_cr2": [0.01427824301756761, -0.006349066362093786, -0.001910959103363912, -0.006349066362093786, 0.005977023360000337, 0.001737192630125399, -0.001910959103363913, 0.001737192630125399, 0.004828328539389793],
+    "vcov_cr2_shape": [3, 3],
+    "dof_bm": [6.452850736468327, 5.700532144696306, 3.441915076346584],
+    "dof_per_coef": [6.452850736468327, 5.700532144696306, 3.441915076346584],
+    "se_cr2": [0.1194916022888956, 0.07731121108869228, 0.06948617516736544],
+    "cluster_sizes": [1, 1, 2, 3, 4, 5, 6, 6, 4, 3]
+  },
+  "weighted_did_absorbed_fe": {
+    "unit": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8],
+    "period": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
+    "treated": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
+    "post": [0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1],
+    "treat_post": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1],
+    "y": [2.245279639812404, 2.442175127769796, 2.867752159874622, 3.619434127689569, -1.823559344308326, -2.727741027578404, -1.582833463976761, -0.9911700545412858, 2.12902821265476, 1.264117287650781, 2.841544540813218, 3.31143937894424, 2.304030161487548, 0.987094837589835, 2.367591101983983, 3.009438142803223, -0.2005497188901713, 1.574281646445233, 3.245244933635386, 3.186348981556681, -0.4147282038989865, 0.3995519437268384, 2.003183253592468, 2.992467308475275, -1.166995207509165, 0.07740370198218015, 1.267215687377219, 1.860239034011572, 0.5144659340411173, 2.077183296893155, 2.484077938367208, 3.150274932387417],
+    "weights": [1.583369129803032, 1.583369129803032, 1.583369129803032, 1.583369129803032, 1.000942167360336, 1.000942167360336, 1.000942167360336, 1.000942167360336, 0.8374841092154384, 0.8374841092154384, 0.8374841092154384, 0.8374841092154384, 1.871459642890841, 1.871459642890841, 1.871459642890841, 1.871459642890841, 1.012972505297512, 1.012972505297512, 1.012972505297512, 1.012972505297512, 0.6240475559607148, 0.6240475559607148, 0.6240475559607148, 0.6240475559607148, 0.9292489611543715, 0.9292489611543715, 0.9292489611543715, 0.9292489611543715, 1.916603899095207, 1.916603899095207, 1.916603899095207, 1.916603899095207],
+    "coef": [2.629329718329883, 2.079697886972744, -4.574986236387794, -0.4071279087708489, -0.6266217028204513, -2.402102218329374, -3.108315103542257, -3.843967875050704, -2.296933153593933, -0.6799998598431396, 0.3739260505483298, 0.9633959911216661],
+    "coef_names": ["(Intercept)", "treat_post", "unit2", "unit3", "unit4", "unit5", "unit6", "unit7", "unit8", "period2", "period3", "period4"],
+    "vcov_cr2": [0.04029069397538304, 0.05372092530051082, 8.554436534042884e-17, 9.992433363647783e-17, 3.403132305244789e-17, -0.04029069397538301, -0.04029069397538297, -0.04029069397538302, -0.04029069397538302, -0.06988862506383763, -0.04450427364310035, -0.04676987719459453, 0.05372092530051082, 0.1160594712815107, 1.196956605008603e-16, 1.343858455779316e-16, 4.438900309305662e-17, -0.08704460346113287, -0.08704460346113281, -0.08704460346113287, -0.08704460346113281, -0.1099650001911151, -0.04456966530602798, -0.06034903570490068, 8.554436534042891e-17, 1.196956605008604e-16, 4.280487735476482e-31, 4.28576441500879e-31, 3.992675358147652e-31, -8.977174537564487e-17, -8.977174537564482e-17, -8.977174537564489e-17, -8.977174537564493e-17, -1.558184767192196e-16, -8.281198304066131e-17, -1.035470016018363e-16, 9.992433363647788e-17, 1.343858455779316e-16, 4.285764415008789e-31, 4.564692272723701e-31, 4.086765806203453e-31, -1.007893841834483e-16, -1.007893841834482e-16, -1.007893841834483e-16, -1.007893841834484e-16, -1.782022064945133e-16, -1.006522966392193e-16, -1.208428314121805e-16, 3.403132305244794e-17, 4.438900309305661e-17, 3.992675358147651e-31, 4.086765806203456e-31, 5.379058241684956e-31, -3.329175231979208e-17, -3.329175231979208e-17, -3.329175231979208e-17, -3.329175231979214e-17, -6.562283443841389e-17, -2.385067066094372e-17, -4.665178711043567e-17, -0.04029069397538301, -0.08704460346113289, -8.97717453756448e-17, -1.007893841834483e-16, -3.329175231979206e-17, 0.06528345259584954, 0.0652834525958495, 0.06528345259584956, 0.06528345259584951, 0.08247375014333612, 0.03342724897952088, 0.0452617767786754, -0.04029069397538299, -0.08704460346113284, -8.977174537564476e-17, -1.007893841834482e-16, -3.329175231979204e-17, 0.06528345259584951, 0.06528345259584947, 0.0652834525958495, 0.06528345259584949, 0.08247375014333606, 0.03342724897952086, 0.04526177677867536, -0.04029069397538301, -0.08704460346113289, -8.977174537564482e-17, -1.007893841834483e-16, -3.329175231979206e-17, 0.06528345259584954, 0.0652834525958495, 0.06528345259584956, 0.0652834525958495, 0.08247375014333612, 0.03342724897952089, 0.04526177677867541, -0.04029069397538303, -0.08704460346113287, -8.977174537564485e-17, -1.007893841834483e-16, -3.329175231979208e-17, 0.06528345259584956, 0.06528345259584951, 0.06528345259584954, 0.06528345259584951, 0.08247375014333615, 0.03342724897952092, 0.04526177677867543, -0.06988862506383765, -0.1099650001911151, -1.558184767192195e-16, -1.782022064945133e-16, -6.562283443841386e-17, 0.08247375014333613, 0.08247375014333604, 0.08247375014333613, 0.08247375014333609, 0.1323825893320879, 0.07016541117719213, 0.07700649974607114, -0.04450427364310037, -0.044569665306028, -8.281198304066126e-17, -1.006522966392193e-16, -2.385067066094369e-17, 0.0334272489795209, 0.03342724897952086, 0.03342724897952091, 0.03342724897952092, 0.07016541117719216, 0.05508283641475486, 0.0527688469804548, -0.04676987719459454, -0.06034903570490069, -1.035470016018362e-16, -1.208428314121805e-16, -4.665178711043565e-17, 0.04526177677867541, 0.04526177677867536, 0.04526177677867541, 0.04526177677867541, 0.07700649974607114, 0.05276884698045479, 0.05730416205185264],
+    "vcov_cr2_shape": [12, 12],
+    "dof_cr2": [2.079708364811179, 3.754252421708284, 1.867009422347841, 1.442536044092711, 1.434423388184993, 3.754252421708287, 3.754252421708284, 3.754252421708285, 3.754252421708288, 2.583991941622942, 2.583991941622944, 2.583991941622943]
+  },
+  "weighted_mpd_avg_att_dof": {
+    "unit": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
+    "period": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
+    "treated": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
+    "y": [1.450236116866374, 1.384186418694614, 1.810200304063252, 1.362325464097078, 1.714832616965797, 1.073655602641084, 2.036944639614525, 2.504973584559241, 1.85193279679849, 2.868405248820625, 2.313414588712066, 2.385847183049069, 2.712233007996873, 1.935785990136764, 2.888006529186367, 1.597464840557834, 1.817696391704754, 1.660745689021554, 1.705135352095145, 2.094615790972424, 1.569284826502699, 2.216917076802389, 1.887784249350644, 2.910139753186967, 2.094903287308483, 2.699981726497862, 2.12126113649166, 2.856097835427411, 2.374294742104558, 2.25134046798101, 3.522157960637041, 2.000619503645162, 3.243770682446319, 2.668745267618224, 2.10736092223072, 2.314219154619426, 3.105713625781752, 2.848666202751697, 3.048771923002261, 3.713274199823152, 2.757597113069262, 2.715143523294024, 3.257090841581132, 2.722960383858503, 3.862848526906458, 1.603328812547071, 2.532256441603572, 3.18585981630333, 2.593118489138658, 2.389624843947402, 3.646966461136928, 2.749723507458603, 3.249380465489038, 3.4315186923903, 3.301398110312618, 3.040050041202797, 3.283230604072161, 3.222319933655847, 4.341300857542413, 3.494045376879352],
+    "weights": [1.605493367183954, 1.005099594127387, 1.974008734337986, 1.528855750802904, 1.382769018411636, 0.8168281498365104, 1.116725771687925, 0.7146185920573771, 0.6638457239605486, 0.9160327934660017, 1.408900015987456, 1.494089051149786, 1.121897284407169, 2.47896139184013, 0.9660030328668654, 1.605493367183954, 1.005099594127387, 1.974008734337986, 1.528855750802904, 1.382769018411636, 0.8168281498365104, 1.116725771687925, 0.7146185920573771, 0.6638457239605486, 0.9160327934660017, 1.408900015987456, 1.494089051149786, 1.121897284407169, 2.47896139184013, 0.9660030328668654, 1.605493367183954, 1.005099594127387, 1.974008734337986, 1.528855750802904, 1.382769018411636, 0.8168281498365104, 1.116725771687925, 0.7146185920573771, 0.6638457239605486, 0.9160327934660017, 1.408900015987456, 1.494089051149786, 1.121897284407169, 2.47896139184013, 0.9660030328668654, 1.605493367183954, 1.005099594127387, 1.974008734337986, 1.528855750802904, 1.382769018411636, 0.8168281498365104, 1.116725771687925, 0.7146185920573771, 0.6638457239605486, 0.9160327934660017, 1.408900015987456, 1.494089051149786, 1.121897284407169, 2.47896139184013, 0.9660030328668654],
+    "cluster": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
+    "coef": [2.625713294025229, -1.152326777667365, 0.1047873423559094, 0.6187749583750757, 1.26982542412129, 0.03040493582558063, 0.5187722995270357, -0.2629232950280215, -0.1096072437400547, 0.4318471903065332, 0.03903421058519574, 0.03331161087700512, 0.1077345785729542, 0.4840277797622372, 0.5794041928855744, 0.7672938586924244, 0.9511982789141399, -0.4212993578678005, -0.497689613511568, -0.112124820572981, -0.2804747318277372, "NA"],
+    "coef_names": ["(Intercept)", "treated", "period_f2", "period_f3", "period_f4", "treated_period_2", "treated_period_3", "treated_period_4", "factor(unit)2", "factor(unit)3", "factor(unit)4", "factor(unit)5", "factor(unit)6", "factor(unit)7", "factor(unit)8", "factor(unit)9", "factor(unit)10", "factor(unit)11", "factor(unit)12", "factor(unit)13", "factor(unit)14", "factor(unit)15"],
+    "finite_coef_names": ["(Intercept)", "treated", "period_f2", "period_f3", "period_f4", "treated_period_2", "treated_period_3", "treated_period_4", "factor(unit)2", "factor(unit)3", "factor(unit)4", "factor(unit)5", "factor(unit)6", "factor(unit)7", "factor(unit)8", "factor(unit)9", "factor(unit)10", "factor(unit)11", "factor(unit)12", "factor(unit)13", "factor(unit)14"],
+    "vcov_cr2": [0.03877206090896604, -0.03877206090896605, -0.03331790924004146, -0.01668959399268778, -0.1050807404031349, 0.03331790924004153, 0.01668959399268781, 0.1050807404031349, 1.235166961706397e-17, 1.580840474375051e-17, 1.785220253129e-17, 1.790804070331918e-17, 1.716594453556469e-17, 7.344328781082032e-18, 2.500210109634643e-17, 1.129661962742403e-17, 2.229789708277196e-17, -1.01520589886476e-17, -4.591108916704839e-18, -4.291640564560931e-18, -1.496065622586256e-16, -0.03877206090896604, 0.04411143325189374, 0.03331790924004149, 0.01668959399268779, 0.1050807404031349, -0.03798232527847893, -0.023816445148991, -0.114646962580105, -2.389767432905781e-17, -2.871704776245365e-17, -3.161869951688231e-17, -2.967154924526644e-17, -3.652423083491364e-17, -1.812574536639178e-17, -3.818972254279221e-17, -2.525378630235397e-17, -3.760791895833968e-17, 1.486173352290406e-17, 8.866637612966353e-18, 8.889497475730824e-18, 1.548012768462699e-16, -0.0333179092400415, 0.03331790924004151, 0.0413526151615022, 0.009508855236688842, 0.08241016656197483, -0.04135261516150229, -0.009508855236688879, -0.08241016656197483, -1.057236212269788e-17, -1.45190316959588e-17, -1.588003786479355e-17, -1.630871440227382e-17, -1.558856096853653e-17, -6.806304027090383e-18, -2.125334816227217e-17, -9.312877126049096e-18, -1.965820223312272e-17, 2.092411428091354e-17, 1.951583915206511e-17, 1.372935228950203e-17, 1.342704512274321e-16, -0.01668959399268779, 0.01668959399268779, 0.009508855236688788, 0.01323478000222856, 0.04401474073183374, -0.009508855236688877, -0.01323478000222852, -0.04401474073183375, -4.818714157705903e-18, -6.124763184642996e-18, -6.956649872839472e-18, -7.212877962031036e-18, -6.530443440101033e-18, -3.317418989422836e-18, -1.035674541518275e-17, -4.792922577296129e-18, -9.189119729740084e-18, -6.29412247139263e-18, -4.050967720580605e-18, -8.663841771505713e-18, 5.147976511440467e-17, -0.1050807404031348, 0.1050807404031349, 0.08241016656197477, 0.04401474073183376, 0.2938980543187307, -0.08241016656197489, -0.04401474073183382, -0.2938980543187307, -3.401560218785202e-17, -4.258982409440017e-17, -4.857212238752693e-17, -4.811057044897176e-17, -4.654477373362109e-17, -1.925359210781479e-17, -6.839831080793064e-17, -3.108067880635081e-17, -6.034426636822491e-17, 2.597824414506937e-17, 2.899564235334748e-18, 1.210105174024731e-17, 4.126760326926655e-16, 0.03331790924004151, -0.0379823252784789, -0.0413526151615022, -0.009508855236688851, -0.08241016656197488, 0.05762030108252313, 0.007543394897072548, 0.08676560513431984, 1.62653121858232e-17, 1.102035404126904e-17, 2.300068925740764e-17, 2.172852085959686e-17, 2.792829589735232e-17, 1.134919768297453e-17, 2.953504384474163e-17, 2.349788991873195e-17, 3.419367938878152e-17, -2.560436908006503e-17, -2.455167775992318e-17, -1.871666509611811e-17, -1.398982144080852e-16, 0.01668959399268783, -0.02381644514899101, -0.00950885523668884, -0.01323478000222856, -0.04401474073183385, 0.007543394897072588, 0.05048554445738759, 0.03723684124150355, 8.252220436044849e-17, 9.243921090447958e-17, 9.222885227176348e-17, 8.474359441557387e-17, 8.345168592275608e-17, 8.515403431190532e-17, 9.281211667522906e-17, 8.08605261912235e-17, 8.823130836987948e-17, 3.374098230274628e-18, 2.603668677723448e-18, 6.246749175952447e-18, -5.55895615822805e-17, 0.1050807404031349, -0.1146469625801051, -0.08241016656197479, -0.04401474073183377, -0.2938980543187308, 0.08676560513431994, 0.0372368412415036, 0.3345854039445968, -3.19681923004133e-18, 1.140862610406506e-17, 1.12452565383573e-17, 1.221408170589418e-17, 3.471694151954527e-17, -2.400025052931364e-17, 3.041172965119725e-17, -3.343270900540417e-18, 2.800668807469692e-17, -3.721666324182575e-17, -1.351854136966555e-17, -2.308807398275752e-17, -4.237173313947137e-16, 1.2351669617064e-17, -2.389767432905784e-17, -1.05723621226979e-17, -4.818714157705943e-18, -3.401560218785211e-17, 1.626531218582327e-17, 8.252220436044858e-17, -3.196819230041314e-18, 2.239193094714048e-31, 2.351431130340321e-31, 2.412630109743412e-31, 2.242481873169718e-31, 2.398078200282154e-31, 2.34231403965325e-31, 2.336296671869482e-31, 2.265614234704675e-31, 2.316950554609178e-31, -5.940426390538525e-33, -1.166237187460746e-33, -3.556054223841715e-33, -5.421033653033469e-32, 1.580840474375057e-17, -2.871704776245373e-17, -1.451903169595885e-17, -6.124763184643061e-18, -4.258982409440029e-17, 1.102035404126915e-17, 9.243921090447966e-17, 1.140862610406519e-17, 2.351431130340322e-31, 2.631586935333294e-31, 2.541063135652704e-31, 2.364447576733195e-31, 2.48608545255808e-31, 2.454526951520984e-31, 2.460954570516459e-31, 2.352934503182668e-31, 2.418666219948441e-31, -8.256013055338425e-33, -2.776721002119976e-33, -4.781097048474545e-33, -6.791977022828346e-32, 1.785220253128999e-17, -3.161869951688231e-17, -1.588003786479351e-17, -6.95664987283941e-18, -4.857212238752699e-17, 2.300068925740766e-17, 9.222885227176347e-17, 1.124525653835724e-17, 2.41263010974341e-31, 2.541063135652703e-31, 2.702480525608586e-31, 2.424109380451132e-31, 2.556668839604363e-31, 2.515670722978988e-31, 2.528950162085977e-31, 2.422170440655363e-31, 2.497033787678356e-31, -9.313433004615063e-33, -3.603470689416668e-33, -5.921347005911199e-33, -7.743241651676204e-32, 1.790804070331919e-17, -2.967154924526647e-17, -1.630871440227382e-17, -7.212877962031075e-18, -4.811057044897181e-17, 2.172852085959695e-17, 8.474359441557392e-17, 1.221408170589417e-17, 2.242481873169718e-31, 2.364447576733193e-31, 2.424109380451134e-31, 2.256486859100999e-31, 2.413684396564978e-31, 2.338552849935726e-31, 2.356954598586318e-31, 2.266288872181801e-31, 2.33458598932679e-31, -8.065790635872877e-33, -3.234024177631039e-33, -4.6031004650253e-33, -7.538695100149888e-32, 1.71659445355647e-17, -3.652423083491368e-17, -1.558856096853654e-17, -6.530443440101053e-18, -4.654477373362117e-17, 2.792829589735238e-17, 8.345168592275614e-17, 3.471694151954527e-17, 2.398078200282154e-31, 2.486085452558079e-31, 2.556668839604364e-31, 2.413684396564977e-31, 3.155825474233936e-31, 2.459156458042098e-31, 2.490130100689166e-31, 2.396056715397363e-31, 2.480097313791876e-31, -1.648637735812342e-32, -1.077490932936038e-32, -1.310150241828421e-32, -8.189979525256463e-32, 7.344328781082076e-18, -1.812574536639181e-17, -6.806304027090419e-18, -3.317418989422927e-18, -1.925359210781489e-17, 1.134919768297459e-17, 8.515403431190538e-17, -2.400025052931363e-17, 2.342314039653251e-31, 2.454526951520984e-31, 2.51567072297899e-31, 2.338552849935726e-31, 2.4591564580421e-31, 2.466438684811666e-31, 2.42807649552639e-31, 2.362881195973991e-31, 2.400416639234534e-31, -2.571200782823255e-33, 1.914651770064149e-34, -6.989915924213099e-34, -3.169981023144877e-32, 2.500210109634638e-17, -3.818972254279215e-17, -2.12533481622721e-17, -1.035674541518271e-17, -6.83983108079306e-17, 2.953504384474161e-17, 9.281211667522899e-17, 3.041172965119709e-17, 2.33629667186948e-31, 2.460954570516457e-31, 2.528950162085976e-31, 2.356954598586317e-31, 2.490130100689166e-31, 2.428076495526388e-31, 2.549313041435374e-31, 2.344184839059218e-31, 2.435493785106449e-31, -1.009496757425997e-32, -3.360948220199006e-33, -5.819136978283817e-33, -1.042725451157439e-31, 1.129661962742403e-17, -2.525378630235397e-17, -9.312877126049095e-18, -4.792922577296144e-18, -3.108067880635084e-17, 2.349788991873199e-17, 8.086052619122353e-17, -3.343270900540485e-18, 2.265614234704674e-31, 2.352934503182666e-31, 2.422170440655364e-31, 2.2662888721818e-31, 2.396056715397363e-31, 2.36288119597399e-31, 2.344184839059219e-31, 2.601016387499351e-31, 2.330975606992207e-31, -7.403727134888907e-33, -3.299531925923137e-33, -5.355663515028411e-33, -5.219989385272334e-32, 2.229789708277195e-17, -3.76079189583397e-17, -1.965820223312269e-17, -9.189119729740094e-18, -6.034426636822495e-17, 3.419367938878156e-17, 8.823130836987949e-17, 2.800668807469688e-17, 2.316950554609178e-31, 2.41866621994844e-31, 2.497033787678355e-31, 2.334585989326789e-31, 2.480097313791876e-31, 2.400416639234532e-31, 2.43549378510645e-31, 2.330975606992206e-31, 2.557034420971738e-31, -1.221301270656962e-32, -6.595220147552603e-33, -8.292724453952795e-33, -9.623583276683192e-32, -1.01520589886477e-17, 1.486173352290405e-17, 2.092411428091362e-17, -6.294122471392508e-18, 2.597824414506983e-17, -2.560436908006519e-17, 3.374098230274474e-18, -3.721666324182588e-17, -5.940426390538698e-33, -8.256013055338457e-33, -9.313433004615232e-33, -8.065790635872951e-33, -1.648637735812373e-32, -2.571200782823215e-33, -1.009496757426003e-32, -7.403727134889149e-33, -1.221301270656994e-32, 3.478397996836671e-32, 3.18904518135145e-32, 3.264135717190936e-32, 5.159003222320179e-32, -4.591108916704885e-18, 8.866637612966348e-18, 1.951583915206516e-17, -4.050967720580556e-18, 2.899564235335142e-18, -2.455167775992314e-17, 2.603668677723342e-18, -1.351854136966551e-17, -1.16623718746087e-33, -2.776721002119952e-33, -3.60347068941673e-33, -3.234024177631068e-33, -1.077490932936067e-32, 1.914651770064466e-34, -3.360948220198865e-33, -3.29953192592333e-33, -6.59522014755286e-33, 3.189045181351349e-32, 4.18701071079314e-32, 3.048691422932938e-32, 5.591773473618826e-33, -4.291640564560958e-18, 8.889497475730826e-18, 1.372935228950199e-17, -8.663841771505596e-18, 1.210105174024744e-17, -1.871666509611806e-17, 6.246749175952318e-18, -2.30880739827574e-17, -3.556054223841856e-33, -4.781097048474557e-33, -5.921347005911343e-33, -4.603100465025324e-33, -1.310150241828457e-32, -6.989915924212668e-34, -5.819136978283798e-33, -5.355663515028621e-33, -8.292724453953024e-33, 3.264135717190897e-32, 3.048691422933002e-32, 3.424383784493886e-32, 2.173699014669866e-32, -1.496065622586257e-16, 1.5480127684627e-16, 1.342704512274323e-16, 5.147976511440482e-17, 4.126760326926659e-16, -1.398982144080852e-16, -5.558956158228065e-17, -4.237173313947138e-16, -5.421033653033485e-32, -6.791977022828344e-32, -7.743241651676213e-32, -7.538695100149896e-32, -8.189979525256494e-32, -3.169981023144874e-32, -1.042725451157439e-31, -5.219989385272356e-32, -9.623583276683227e-32, 5.159003222320038e-32, 5.591773473618162e-33, 2.173699014669707e-32, 6.574246225997392e-31],
+    "vcov_cr2_shape": [21, 21],
+    "dof_per_coef": [2.502183633460334, 5.084769819665309, 2.50218363346033, 2.502183633460329, 2.50218363346033, 5.08476981966529, 5.084769819665291, 5.084769819665295, 4.578580048081343, 3.636592706738188, 2.679689564722192, 4.634792290867446, 4.665289907174915, 4.251900922599552, 5.109173170485254, 4.301901447322372, 5.310177731969367, 1.440076743016052, 1.3032384322084, 1.17179324479538, 1.915047493011446],
+    "c_avg": [0, 0, 0, 0, 0, 0.3333333333333333, 0.3333333333333333, 0.3333333333333333, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+    "dof_avg": 5.084769819665292,
+    "post_interaction_names": ["treated_period_2", "treated_period_3", "treated_period_4"],
+    "reference_period": 1,
+    "n_post_periods": 3
+  },
   "meta": {
     "source": "clubSandwich",
     "clubSandwich_version": "0.7.0",
     "R_version": "R version 4.5.2 (2025-10-31)",
-    "generated_at": "2026-05-20 00:53:14 UTC",
-    "note": "CR2 Bell-McCaffrey cluster-robust parity target for diff_diff._compute_cr2_bm"
+    "generated_at": "2026-05-20 14:55:31 UTC",
+    "note": "CR2 Bell-McCaffrey cluster-robust parity target for diff_diff._compute_cr2_bm. Unweighted scenarios pin against _compute_cr2_bm / _compute_bm_dof_oneway; weighted scenarios pin the clubSandwich WLS-CR2 port (W not sqrt(W), W^2 bias term, unweighted residuals)."
   }
 }
diff --git a/diff_diff/estimators.py b/diff_diff/estimators.py
index e764d3a8..aa8a3ef5 100644
--- a/diff_diff/estimators.py
+++ b/diff_diff/estimators.py
@@ -1891,8 +1891,13 @@ def _refit_mp_absorb(w_r):
                 else:
                     # Cluster-aware CR2 BM Satterthwaite DOF for per-coefficient
                     # AND post-period-average compound contrast (Gate 6 lift).
-                    # Weighted CR2-BM is a separate gate; survey paths never
-                    # reach this block (outer `not _use_survey_vcov` guard).
+                    # This branch is guarded above by `not _use_survey_vcov`,
+                    # so when reached, `survey_weights` is None (survey designs
+                    # always route through the TSL path). The clubSandwich
+                    # WLS-CR2 port lifted `_compute_cr2_bm_contrast_dof` to
+                    # accept `weights=`, but no MPD entry point currently
+                    # passes non-None weights here — `weights=None` is the
+                    # de facto contract on this code path today.
                     _dof_all = _compute_cr2_bm_contrast_dof(
                         X_kept,
                         effective_cluster_ids,
diff --git a/diff_diff/linalg.py b/diff_diff/linalg.py
index d9d9f652..63e763f2 100644
--- a/diff_diff/linalg.py
+++ b/diff_diff/linalg.py
@@ -538,10 +538,11 @@ def solve_ols(
           ``cluster_ids`` (use ``"hc2_bm"`` for clustered Bell-McCaffrey).
         - ``"hc2_bm"``: HC2 + Imbens-Kolesar (2016) Satterthwaite DOF one-way;
           Pustejovsky-Tipton (2018) CR2 Bell-McCaffrey with ``cluster_ids``.
-          **Not supported with weights** (either one-way or clustered):
-          raises ``NotImplementedError`` because the BM DOF helper is
-          inconsistent with ``solve_ols``'s WLS transform. Tracked in
-          ``TODO.md``.
+          With ``weights``, dispatches to the clubSandwich WLS-CR2 port —
+          supported for ``weight_type="pweight"`` only. ``aweight`` and
+          ``fweight`` raise ``NotImplementedError`` (port matches the
+          ``pweight`` convention only; aweight/fweight derivations are a
+          separate methodology task).
         - ``"conley"``: Conley (1999) spatial-HAC sandwich. Requires
           ``conley_coords`` (n × 2 array) and ``conley_cutoff_km`` (positive
           bandwidth, no default per Conley 1999 Section 5's sensitivity-grid
@@ -1118,13 +1119,13 @@ def _validate_vcov_args(
         combined with a ``vcov_type`` that is one-way only (``classical``,
         ``hc2``).
     NotImplementedError
-        If ``vcov_type == "hc2_bm"`` is combined with ``weights`` (with OR
-        without ``cluster_ids``). The weighted Bell-McCaffrey DOF requires
-        re-deriving the Satterthwaite ratio on the WLS-transformed design
-        ``X* = sqrt(w) X`` to match ``solve_ols``'s residual convention;
-        until that derivation is in place, the path raises rather than
-        shipping silently-inconsistent small-sample p-values. Tracked in
-        ``TODO.md``.
+        If ``vcov_type == "conley"`` is combined with ``weights`` (the
+        Bertanha-Imbens 2014 weighted-Conley methodology is a separate
+        follow-up). NOT raised here for ``hc2_bm + weights``: that
+        weight-type contract is enforced downstream in
+        ``_compute_robust_vcov_numpy`` (which has access to ``weight_type``
+        and rejects ``aweight`` / ``fweight`` while routing ``pweight``
+        through the clubSandwich WLS-CR2 port).
     """
     if vcov_type not in _VALID_VCOV_TYPES:
         raise ValueError(
@@ -1141,36 +1142,14 @@ def _validate_vcov_args(
             ),
         }[vcov_type]
         raise ValueError(msg)
-    if vcov_type == "hc2_bm" and weights is not None:
-        # Weighted Bell-McCaffrey (both one-way and cluster) is deferred to
-        # Phase 2+. The current `_compute_bm_dof_from_contrasts` builds the
-        # hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but
-        # `solve_ols` solves the weighted problem by transforming to
-        # `X* = sqrt(w) X`, `y* = sqrt(w) y` and computing residuals on that
-        # transformed system. The transformed hat matrix
-        # `H* = sqrt(W) X (X'WX)^{-1} X' sqrt(W)` is symmetric idempotent and
-        # is the correct residual-maker for the Satterthwaite ratio
-        # `(tr G)^2 / tr(G^2)`; the asymmetric `H = X (X'WX)^{-1} X' W`
-        # currently produced here is not, so the BM DOF it yields is
-        # internally inconsistent with `solve_ols`'s WLS convention. Rather
-        # than ship silently-wrong small-sample p-values, we fail fast.
-        # Tracked in TODO.md under Methodology/Correctness (rederive the BM
-        # DOF on the transformed WLS design + add weighted parity tests).
-        if cluster_ids is not None:
-            raise NotImplementedError(
-                "vcov_type='hc2_bm' with both cluster_ids and weights is a "
-                "Phase 2+ follow-up. Use vcov_type='hc1' for weighted cluster-"
-                "robust, or drop weights for CR2 Bell-McCaffrey."
-            )
-        raise NotImplementedError(
-            "vcov_type='hc2_bm' with weights is a Phase 2+ follow-up: the "
-            "Bell-McCaffrey DOF helper builds the hat matrix on the "
-            "unscaled design, which is inconsistent with solve_ols's WLS "
-            "transformation (X* = sqrt(w) X). Shipping in this state would "
-            "produce silently-wrong small-sample p-values. Use vcov_type="
-            "'hc1' for weighted HC1, or drop weights for one-way HC2 + "
-            "Bell-McCaffrey. Tracked in TODO.md."
-        )
+    # Weighted Bell-McCaffrey (both one-way and cluster) is now supported via
+    # the clubSandwich WLS-CR2 port. See `_compute_cr2_bm` and
+    # `_compute_bm_dof_from_contrasts` for the algorithm: clubSandwich uses an
+    # asymmetric weighted hat `H_gg = X_g M_U X_g' W_g` (W, not sqrt(W)) with
+    # a W² bias-correction term `S_W = sum_g X_g' W_g² X_g` in the
+    # Satterthwaite numerator. Matches `clubSandwich::vcovCR(lm(weights=w),
+    # type="CR2") + coef_test(test="Satterthwaite")$df_Satt` at atol=1e-10.
+    # Unweighted CR2-BM behavior is unchanged (regression-safe).
     if vcov_type == "conley":
         # Conley + cluster_ids is now supported (combined spatial + cluster
         # product kernel; see ``docs/methodology/REGISTRY.md`` § ConleySpatialHAC
@@ -1283,8 +1262,9 @@ def compute_robust_vcov(
       Pustejovsky-Tipton (2018) CR2 Bell-McCaffrey cluster-robust estimator
       (matches R ``clubSandwich::vcovCR(..., type="CR2")``). Required by the
       Pierce-Schott (2016) TWFE application in de Chaisemartin et al. (2026)
-      with ``G=103``. Weighted clustered CR2 is the Phase 2+ follow-up and
-      raises ``NotImplementedError``.
+      with ``G=103``. **Weighted hc2_bm** (both one-way and clustered) is
+      supported for ``weight_type="pweight"`` only via the clubSandwich
+      WLS-CR2 port; ``aweight`` and ``fweight`` raise ``NotImplementedError``.
     - ``"conley"``: spatial HAC sandwich (Conley 1999 Eq 4.2). Requires
       ``conley_coords`` (n×2 array) and ``conley_cutoff_km`` (positive
       bandwidth). Two operating modes: cross-sectional (default) and panel
@@ -1306,9 +1286,12 @@ def compute_robust_vcov(
         OLS residuals.
     cluster_ids : ndarray of shape (n,), optional
         Cluster identifiers. Valid with ``vcov_type="hc1"`` (dispatches to CR1)
-        and ``vcov_type="hc2_bm"`` (dispatches to CR2 Bell-McCaffrey).
+        and ``vcov_type="hc2_bm"`` (dispatches to CR2 Bell-McCaffrey, including
+        the weighted-CR2 clubSandwich port for ``weight_type="pweight"``).
         Combining with ``classical`` or ``hc2`` raises ``ValueError``.
-        Combining with ``hc2_bm`` AND ``weights`` raises ``NotImplementedError``.
+        Combining ``hc2_bm`` with ``weights`` and ``weight_type ∈ {"aweight",
+        "fweight"}`` raises ``NotImplementedError`` — the port matches the
+        ``pweight`` convention only.
     weights : ndarray of shape (n,), optional
         Observation weights. If provided, computes weighted sandwich estimator.
     weight_type : str, default "pweight"
@@ -1486,10 +1469,16 @@ def _compute_hat_diagonals(
     return np.asarray(h_diag, dtype=np.float64)
 
 
-def _cr2_adjustment_matrix(I_minus_H_gg: np.ndarray, tol: float = 1e-10) -> np.ndarray:
-    """Symmetric matrix square root of ``(I - H_gg)^{-1}`` via eigendecomposition.
+def _cr2_adjustment_matrix(G_g: np.ndarray, tol: float = 1e-10) -> np.ndarray:
+    """Symmetric matrix square root of ``G_g^{-1}`` via eigendecomposition.
+
+    For the unweighted case, ``G_g = I - H_gg``. For the WLS case (clubSandwich
+    convention), ``G_g = I - H_gg - H_gg' + X_g M_U S_W M_U X_g'`` where
+    ``H_gg = X_g M_U X_g' W_g`` (asymmetric weighted hat) and
+    ``S_W = sum_g X_g' W_g² X_g``. The algebra (symmetric eigendecomp +
+    pseudoinverse) is identical in both cases.
 
-    For a real symmetric positive-semidefinite ``I - H_gg``, eigendecompose as
+    For real symmetric positive-semidefinite ``G_g``, eigendecompose as
     ``U diag(s) U'`` and return ``U diag(s^{-1/2}) U'`` with pseudoinverse
     handling: eigenvalues below ``tol`` are treated as zero (Moore-Penrose).
     Handles singleton clusters, absorbed cluster FEs (``H_gg`` has eigenvalue
@@ -1497,7 +1486,7 @@ def _cr2_adjustment_matrix(I_minus_H_gg: np.ndarray, tol: float = 1e-10) -> np.n
     R ``clubSandwich::vcovCR(..., type="CR2")``.
     """
     # Ensure symmetric — the bread_inv arithmetic can leave tiny asymmetry.
-    sym = 0.5 * (I_minus_H_gg + I_minus_H_gg.T)
+    sym = 0.5 * (G_g + G_g.T)
     eigvals, eigvecs = np.linalg.eigh(sym)
     inv_sqrt = np.where(eigvals > tol, 1.0 / np.sqrt(np.maximum(eigvals, tol)), 0.0)
     return (eigvecs * inv_sqrt) @ eigvecs.T
@@ -1508,43 +1497,90 @@ def _compute_cr2_bm(
     residuals: np.ndarray,
     cluster_ids: np.ndarray,
     bread_matrix: np.ndarray,
+    weights: Optional[np.ndarray] = None,
 ) -> Tuple[np.ndarray, np.ndarray]:
     """CR2 Bell-McCaffrey cluster-robust variance with per-coefficient DOF.
 
-    Implements the formula of Bell-McCaffrey (2002) as refined by
-    Pustejovsky-Tipton (2018) / `clubSandwich::vcovCR(..., type="CR2")`.
-
-    For each cluster ``g``:
-      - ``H_gg = X_g bread_inv X_g'`` (n_g x n_g).
-      - ``A_g = (I - H_gg)^{-1/2}`` via symmetric eigendecomposition with
+    Implements ``clubSandwich::vcovCR(..., type="CR2") + coef_test(test=
+    "Satterthwaite")`` for both unweighted and weighted ``lm`` fits. The
+    weighted form uses clubSandwich's specific WLS-CR2 algebra (which is
+    NOT a textbook PT2018 §3.3 transform-once derivation; see
+    docs/methodology/REGISTRY.md for the algorithm details).
+
+    For each cluster ``g`` (with normalized weights ``W_norm = w / mean(w)``):
+      - ``H_gg = X_g M_U X_g' W_g`` (asymmetric weighted hat; W not sqrt(W))
+      - ``S_W = sum_g X_g' W_g² X_g`` (W² in the bias-correction term)
+      - ``G_g = I - H_gg - H_gg' + X_g M_U S_W M_U X_g'``
+      - ``A_g = G_g^{-1/2}`` via symmetric eigendecomposition with
         pseudoinverse handling (see :func:`_cr2_adjustment_matrix`).
-      - Per-cluster score ``s_g = X_g' A_g u_g``.
-    Meat = ``sum_g s_g s_g'``; VCOV = ``bread_inv meat bread_inv``.
+      - Per-cluster score ``s_g = X_g' W_g A_g u_g`` (u_g = raw residual)
+
+    Unweighted special case (``weights=None``): ``W_norm=1``, ``S_W=X'X``,
+    ``M_U @ S_W @ M_U = M_U``, so ``G_g`` collapses to ``I - H_gg`` (the
+    symmetric form). Bit-equal to the prior unweighted behavior at machine
+    precision (atol=1e-14 regression-safety).
 
-    Per-coefficient Satterthwaite DOF for contrast ``c_j = e_j``:
+    Meat = ``sum_g s_g s_g'``; VCOV = ``M_U meat M_U`` (where ``M_U`` is the
+    normalized bread inverse; ``w_scale`` cancels in the final vcov).
 
-        omega_g = A_g X_g bread_inv c_j  (length n_g)
-        trace(B) = sum_i (X_i' bread_inv c_j)^2
-        trace(B^2) = sum_{g, h} (omega_g' M_{g, h} omega_h)^2
+    Per-coefficient Satterthwaite DOF: see :func:`_cr2_bm_dof_inner` for the
+    unweighted simple formula and the weighted full P_array construction.
 
-    where ``M = I - X bread_inv X'``. DOF_j = trace(B)^2 / trace(B^2).
+    Parameters
+    ----------
+    X : ndarray of shape (n, k)
+    residuals : ndarray of shape (n,)
+        Raw residuals ``y - X beta_hat`` from the (weighted) fit.
+    cluster_ids : ndarray of shape (n,)
+    bread_matrix : ndarray of shape (k, k)
+        ``X'WX`` if weighted, ``X'X`` if unweighted.
+    weights : ndarray of shape (n,), optional
+        Original (un-normalized) weights. ``None`` for unweighted.
 
     Returns
     -------
     vcov : ndarray of shape (k, k)
     dof_vec : ndarray of shape (k,)
-
-    Notes
-    -----
-    Unweighted only. Weighted CR2 is a Phase 2+ follow-up; the signature would
-    need to thread ``weights`` through the hat-matrix and residual rebalancing
-    (per clubSandwich's WLS handling). The call site in
-    :func:`_compute_robust_vcov_numpy` raises before dispatching when weights
-    are present alongside ``vcov_type="hc2_bm"`` + cluster.
     """
     n, k = X.shape
     cluster_ids_arr = np.asarray(cluster_ids)
     unique_clusters = np.unique(cluster_ids_arr)
+    # When weights are provided, enforce subpopulation invariance: zero-weight
+    # rows must contribute nothing to the sandwich. The earlier "drop zero-
+    # total-weight clusters only" guard handled all-zero clusters but missed
+    # mixed-zero clusters (positive total weight, some zero-weight rows
+    # inside). In mixed-zero clusters, the zero-weight rows still entered the
+    # CR2 adjustment matrices (H_gg, G_g, A_g, bias_term) on the row side,
+    # silently changing SE/DOF — contradicting the linalg contract that
+    # zero-weight rows are inert. Fix: physically filter to `weights > 0`
+    # rows before all per-cluster computations. CI codex flagged this as P0
+    # on PR #475 round 2.
+    if weights is not None:
+        weights_arr = np.asarray(weights, dtype=np.float64)
+        positive_mask = weights_arr > 0
+        if not np.all(positive_mask):
+            X = X[positive_mask]
+            residuals = residuals[positive_mask]
+            cluster_ids_arr = cluster_ids_arr[positive_mask]
+            weights_arr = weights_arr[positive_mask]
+            weights = weights_arr  # Rebind for downstream w_scale/W_norm logic
+            n = X.shape[0]
+            # bread_matrix is invariant to zero-weight row removal: the caller
+            # computes `X.T @ (X * w[:, None])`, and zero-weight rows contribute
+            # zero to that sum. So bread_matrix passed in is already equivalent
+            # to bread_matrix on the filtered design. No rebuild needed.
+            # Recount unique clusters after filtering.
+            unique_clusters = np.unique(cluster_ids_arr)
+        eff_clusters = np.array(
+            [g for g in unique_clusters if float(np.sum(weights_arr[cluster_ids_arr == g])) > 0]
+        )
+        if len(eff_clusters) < 2:
+            raise ValueError(
+                f"Need at least 2 clusters with positive total weight for "
+                f"cluster-robust SEs, got {len(eff_clusters)} effective "
+                f"clusters out of {len(unique_clusters)} unique."
+            )
+        unique_clusters = eff_clusters
     G = len(unique_clusters)
     if G < 2:
         raise ValueError(f"Need at least 2 clusters for cluster-robust SEs, got {G}")
@@ -1559,43 +1595,79 @@ def _compute_cr2_bm(
             ) from e
         raise
 
-    # Precompute the full residual-maker M = I - H (n x n). O(n^2 k) build.
-    # For CR2 BM DOF, we need M_{g, h} blocks across cluster pairs, so the
-    # full matrix is the cleanest representation. Note: n should be small
-    # to modest for cluster-robust DiD use cases.
-    H = X @ bread_inv @ X.T
-    M = np.eye(n) - H
+    # Normalize weights: w_scale = mean(w[w>0]), W_norm = w / w_scale. Following
+    # clubSandwich convention, all internal algebra uses M_U = (X' W_norm X)^{-1}
+    # = w_scale * bread_inv. For unweighted (w_scale=1), M_U == bread_inv and the
+    # algorithm reduces bit-equally to the prior unweighted form.
+    if weights is not None:
+        weights_arr = np.asarray(weights, dtype=np.float64)
+        pos = weights_arr > 0
+        w_scale = float(np.mean(weights_arr[pos])) if np.any(pos) else 1.0
+        W_norm = weights_arr / w_scale
+        M_U = w_scale * bread_inv  # = (X' W_norm X)^{-1}
+    else:
+        w_scale = 1.0
+        W_norm = np.ones(n, dtype=np.float64)
+        M_U = bread_inv
 
-    # Per-cluster indices and adjustment matrices. Compute once, reuse for
-    # meat assembly and DOF loop.
+    # Per-cluster indices
     cluster_idx = {g: np.where(cluster_ids_arr == g)[0] for g in unique_clusters}
+
+    # S_W = sum_g X_g' diag(W_norm_g²) X_g (used for both vcov and DOF).
+    # For unweighted (W_norm=1): S_W = X'X = bread_matrix.
+    S_W = np.zeros((k, k))
+    for g in unique_clusters:
+        idx_g = cluster_idx[g]
+        X_g = X[idx_g]
+        S_W += X_g.T @ (X_g * (W_norm[idx_g] ** 2)[:, np.newaxis])
+    MUWTWUM = M_U @ S_W @ M_U
+
+    # Per-cluster adjustment matrices A_g via G_g (which collapses to
+    # `I - H_gg` in the unweighted special case).
     A_g_matrices: Dict[Any, np.ndarray] = {}
     for g in unique_clusters:
         idx_g = cluster_idx[g]
-        H_gg = H[np.ix_(idx_g, idx_g)]
+        X_g = X[idx_g]
+        # Asymmetric weighted hat block. For unweighted (W_norm=1): H_gg is the
+        # standard symmetric `X_g @ M_U @ X_g.T`.
+        H_gg = (X_g @ M_U @ X_g.T) * W_norm[idx_g][np.newaxis, :]
         I_g = np.eye(len(idx_g))
-        A_g_matrices[g] = _cr2_adjustment_matrix(I_g - H_gg)
+        bias_term = X_g @ MUWTWUM @ X_g.T
+        G_g = I_g - H_gg - H_gg.T + bias_term
+        A_g_matrices[g] = _cr2_adjustment_matrix(G_g)
 
     # --- VCOV (meat) ---
-    # Adjusted per-cluster scores, stacked as a (G, k) matrix so the meat is
-    # cluster_scores.T @ cluster_scores.
+    # Per-cluster score: s_g = X_g' diag(W_norm_g) A_g u_g.
     cluster_scores = np.zeros((G, k))
     for gi, g in enumerate(unique_clusters):
         idx_g = cluster_idx[g]
         u_g = residuals[idx_g]
         A_g = A_g_matrices[g]
-        # X_g' @ (A_g @ u_g) = score of shape (k,)
-        cluster_scores[gi] = X[idx_g].T @ (A_g @ u_g)
+        adjusted = A_g @ u_g
+        cluster_scores[gi] = X[idx_g].T @ (W_norm[idx_g] * adjusted)
     meat = cluster_scores.T @ cluster_scores
-    temp = np.linalg.solve(bread_matrix, meat)
-    vcov = np.linalg.solve(bread_matrix, temp.T).T
+    vcov = M_U @ meat @ M_U
 
     # --- Per-coefficient Bell-McCaffrey cluster DOF ---
-    # Delegate to the contrast-aware helper with `contrasts=I_k` so the
-    # per-coefficient case is `c = e_j` (the j-th basis vector). Bit-identity
-    # vs the prior inline loop holds at machine precision because the same
-    # X_bi / A_g_Xbi precomputes are reused under the same matmul ordering.
-    dof_vec = _cr2_bm_dof_inner(X, M, A_g_matrices, cluster_idx, bread_inv, np.eye(k))
+    # Delegate to the contrast-aware helper. The helper branches on `weights`:
+    # unweighted uses the simple `(tr B)² / tr(B²)` form (bit-equal to prior);
+    # weighted uses the full clubSandwich P_array construction.
+    if weights is None:
+        # Build the symmetric residual-maker M = I - H for the simple formula.
+        H = X @ M_U @ X.T
+        M = np.eye(n) - H
+        dof_vec = _cr2_bm_dof_inner(X, M, A_g_matrices, cluster_idx, M_U, np.eye(k))
+    else:
+        dof_vec = _cr2_bm_dof_inner_weighted(
+            X,
+            A_g_matrices,
+            cluster_idx,
+            M_U,
+            MUWTWUM,
+            W_norm,
+            np.eye(k),
+            w_scale=w_scale,
+        )
 
     return vcov, dof_vec
 
@@ -1665,11 +1737,196 @@ def _cr2_bm_dof_inner(
     return dof_vec
 
 
+def _cr2_bm_dof_inner_weighted(
+    X: np.ndarray,
+    A_g_matrices: Dict[Any, np.ndarray],
+    cluster_idx: Dict[Any, np.ndarray],
+    bread_inv: np.ndarray,
+    MUWTWUM: np.ndarray,
+    W_norm: np.ndarray,
+    contrasts: np.ndarray,
+    w_scale: float = 1.0,
+) -> np.ndarray:
+    """Per-contrast Satterthwaite DOF for WLS-CR2 via clubSandwich's P_array form.
+
+    Implements ``clubSandwich::vcovCR(..., type="CR2") + coef_test(test=
+    "Satterthwaite")`` for the weighted case. Source: jepusto/clubSandwich
+    `R/get_arrays.R::get_GH` (inverse_var=FALSE branch) +
+    `R/coef_test.R::Satterthwaite_df`.
+
+    For each cluster ``g``:
+      ``E_g  = X_g' diag(W_norm_g) A_g``                  (k × n_g)
+      ``ME_g = bread_inv @ E_g``                          (k × n_g)
+      ``MEU_g = ME_g @ X_g``                              (k × k)
+      ``MEF_g = ME_g @ diag(W_norm_g) @ X_g``             (k × k)
+      ``H1_g = MEU_g @ M_U_ct``,
+      ``H2_g = MEF_g @ M_U_ct``,
+      ``H3_g = MEU_g @ Omega_ct``
+    where ``M_U_ct = chol(bread_inv).T``, ``Omega = MUWTWUM``,
+    ``Omega_ct = chol(Omega).T``.
+
+    For each contrast ``c`` (column of ``contrasts``), build a ``(J × J)``
+    matrix ``P`` indexed by cluster pairs:
+      ``P[g, h] = (c' H3_g)(H3_h' c) - (c' H1_g)(H2_h' c) - (c' H2_g)(H1_h' c)``
+      ``P[g, g] += sum_n (c' ME_g[:, n])²``      (diagonal correction)
+
+    ``df_satt(c) = (tr P)² / sum(P²)``.
+
+    Note: the ``w_scale`` normalization cancels in this ratio (verified Step 0).
+    All intermediate matrices use ``bread_inv = (X' W_norm X)^{-1}`` and
+    ``W_norm = w / mean(w)``.
+    """
+    X.shape[1]
+    m = contrasts.shape[1]
+    unique_clusters = list(cluster_idx.keys())
+    len(unique_clusters)
+
+    # "Square-root" factors L such that L @ L.T equals the target matrix.
+    # Try Cholesky first (matches clubSandwich's `t(chol(...))` exactly when
+    # the matrix is PD — important for full-dummy FE designs where eigh-based
+    # symmetric square roots disagree with chol on off-diagonal H-array terms
+    # by enough to materially shift Satterthwaite DOF on high-leverage
+    # coefficients). Fall back to a symmetric-eigendecomposition pseudo-
+    # square-root only on rank-deficient designs (where chol raises), so the
+    # MultiPeriodDiD-style full-dummy designs that exposed the original
+    # singular-bread case still proceed.
+    _TOL = 1e-10
+
+    def _factor_psd(M):
+        sym = 0.5 * (M + M.T)
+        try:
+            return np.linalg.cholesky(sym)
+        except np.linalg.LinAlgError:
+            eigvals, eigvecs = np.linalg.eigh(sym)
+            sqrt_eig = np.where(eigvals > _TOL, np.sqrt(np.maximum(eigvals, _TOL)), 0.0)
+            return eigvecs * sqrt_eig[np.newaxis, :]
+
+    M_U_ct = _factor_psd(bread_inv)
+    Omega_ct = _factor_psd(MUWTWUM)
+
+    # Per-cluster (k × k) H-array slices and (k × n_g) G slices.
+    # Use clubSandwich's exact operation ordering: ME_g = M @ E_g where M is
+    # the UN-normalized bread inverse `(X' W_orig X)^{-1} = bread_inv / w_scale`,
+    # NOT `M_U_norm = bread_inv` (the normalized form). w_scale cancels in
+    # the final DOF ratio mathematically, but using the wrong M-convention
+    # shifts the float64 roundoff floor on high-leverage contrasts (e.g.,
+    # full-dummy FE coefficients) enough to produce 15-30% DOF discrepancies
+    # vs `clubSandwich::vcovCR + coef_test()$df_Satt`. Match R's exact
+    # operation ordering to reproduce R's roundoff structure.
+    M_unnorm = bread_inv / w_scale  # R's "M" = (X' W_orig X)^{-1}
+    H1_list: List[np.ndarray] = []
+    H2_list: List[np.ndarray] = []
+    H3_list: List[np.ndarray] = []
+    G_list: List[np.ndarray] = []
+    for g in unique_clusters:
+        idx_g = cluster_idx[g]
+        X_g = X[idx_g]
+        W_g_diag = W_norm[idx_g]  # length n_g
+        A_g = A_g_matrices[g]
+
+        # E_g = X_g.T @ diag(W_g) @ A_g  (k × n_g)
+        E_g = (X_g.T * W_g_diag[np.newaxis, :]) @ A_g
+        # R's ME_g uses M = bread_inv_unnormalized = bread_inv / w_scale.
+        ME_g = M_unnorm @ E_g  # (k × n_g)
+
+        MEU_g = ME_g @ X_g  # (k × k)
+        MEF_g = ME_g @ (W_g_diag[:, np.newaxis] * X_g)  # (k × k)
+
+        H1_list.append(MEU_g @ M_U_ct)
+        H2_list.append(MEF_g @ M_U_ct)
+        H3_list.append(MEU_g @ Omega_ct)
+        G_list.append(ME_g)  # (k × n_g)
+
+    # For each contrast c, build P (J × J) and compute df_satt.
+    # Also retain (tr P, ||P||²_F, max(|P|)) per contrast so we can detect
+    # noise-floor degeneracies and NaN-guard them in a second pass.
+    dof_vec = np.empty(m)
+    tr_P_arr = np.empty(m)
+    sum_P_sq_arr = np.empty(m)
+    max_abs_P_arr = np.empty(m)
+    for j in range(m):
+        c = contrasts[:, j]  # (k,)
+        # Precompute c-projections: (J,)-indexed length-k vectors
+        c_H1 = np.array([c @ H1 for H1 in H1_list])  # (J, k)
+        c_H2 = np.array([c @ H2 for H2 in H2_list])  # (J, k)
+        c_H3 = np.array([c @ H3 for H3 in H3_list])  # (J, k)
+
+        # P[g, h] = c_H3[g] · c_H3[h] - c_H1[g] · c_H2[h] - c_H2[g] · c_H1[h]
+        P = c_H3 @ c_H3.T - c_H1 @ c_H2.T - c_H2 @ c_H1.T  # (J, J)
+
+        # Diagonal correction: P[g, g] += sum_n (c' ME_g[:, n])²
+        for gi, G_g in enumerate(G_list):
+            c_ME_g = c @ G_g  # (n_g,)
+            P[gi, gi] += float(c_ME_g @ c_ME_g)
+
+        tr_P = float(np.trace(P))
+        sum_P_sq = float(np.sum(P * P))
+        max_abs_P = float(np.max(np.abs(P))) if P.size else 0.0
+        tr_P_arr[j] = tr_P
+        sum_P_sq_arr[j] = sum_P_sq
+        max_abs_P_arr[j] = max_abs_P
+        dof_vec[j] = (tr_P * tr_P) / sum_P_sq if sum_P_sq > 0 else np.nan
+
+    # Noise-floor NaN-guard. The Satterthwaite DOF formula `(tr P)² / sum(P²)`
+    # is scale-invariant, so if a coefficient's P matrix is dominated by
+    # float64 accumulation noise (`max(|P|)` is many orders of magnitude
+    # smaller than the largest contrast's `max(|P|)`), the DOF computed from
+    # noise/noise is unreliable and varies across BLAS implementations by
+    # 15-30%. Detection: a contrast's max(|P|) below `1e-10 ×` the largest
+    # contrast's max(|P|) signals the noise regime. R's clubSandwich gives
+    # different specific values in this regime due to its own BLAS reduction
+    # order; we return NaN with a warning rather than ship arbitrarily-
+    # different small-sample DOF on the user-facing surface.
+    # The noise-floor detector has two criteria, applied union-wise:
+    #   1. Batch-relative: a contrast's max(|P|) < 1e-10 × the largest contrast's
+    #      max(|P|). Useful when computing per-coefficient DOF for an entire design
+    #      (`contrasts=np.eye(k)`) — a single noise-floor coefficient stands out.
+    #   2. Absolute (single-contrast safe): max(|P|) < eps_floor based on the
+    #      bread matrix scale. Necessary because batch-relative reduces to
+    #      max|P| < 1e-10 × max|P| (i.e. False) for a single contrast, leaving
+    #      direct `_compute_cr2_bm_contrast_dof(...)` callers (e.g., MPD avg_att)
+    #      unprotected. Threshold: `(EPS × n × k × bread_inv_scale)²` covers the
+    #      worst-case dgemm accumulation roundoff floor for `H1/H2/H3 @ contrast`
+    #      products. CI codex flagged this as P2 (R8 round); regression test in
+    #      tests/test_methodology_wls_cr2.py::TestWLSCR2FEDoFNoiseGuard.
+    _EPS = np.finfo(np.float64).eps
+    n_obs, k_X = X.shape
+    bread_inv_scale = float(np.max(np.abs(bread_inv))) if bread_inv.size else 1.0
+    abs_noise_floor = (_EPS * n_obs * k_X * max(bread_inv_scale, 1.0)) ** 2
+
+    if m > 1:
+        max_P_overall = float(np.max(max_abs_P_arr))
+        relative_floor = 1e-10 * max_P_overall if max_P_overall > 0 else 0.0
+    else:
+        relative_floor = 0.0
+    noise_floor = max(relative_floor, abs_noise_floor)
+    degenerate = max_abs_P_arr < noise_floor
+    n_degenerate = int(np.sum(degenerate))
+    if n_degenerate > 0:
+        dof_vec[degenerate] = np.nan
+        warnings.warn(
+            f"Satterthwaite DOF for {n_degenerate} of {m} contrast(s) "
+            f"is at the float64 noise floor (max|P| < noise_floor = "
+            f"max({relative_floor:.3e}, {abs_noise_floor:.3e})); reporting "
+            f"NaN. This typically affects high-leverage FE-dummy "
+            f"coefficients whose contrast vector projects to near-zero on "
+            f"the design — the resulting DOF varies across BLAS "
+            f"implementations and is unreliable. The coefficient SEs "
+            f"remain valid; only the Satterthwaite DOF (and any t-test or "
+            f"CI that depends on it) is suppressed.",
+            UserWarning,
+            stacklevel=3,
+        )
+
+    return dof_vec
+
+
 def _compute_cr2_bm_contrast_dof(
     X: np.ndarray,
     cluster_ids: np.ndarray,
     bread_matrix: np.ndarray,
     contrasts: np.ndarray,
+    weights: Optional[np.ndarray] = None,
 ) -> np.ndarray:
     """Per-contrast CR2 Bell-McCaffrey Satterthwaite DOF.
 
@@ -1686,12 +1943,15 @@ def _compute_cr2_bm_contrast_dof(
         Per-observation cluster identifiers. NOT subscripted by any
         column mask — cluster IDs are unchanged by column drops.
     bread_matrix : ndarray of shape (k, k)
-        ``X.T @ X`` (or ``X.T @ W @ X`` for survey-weighted — though the
-        weighted CR2 path is deferred to a follow-up PR).
+        ``X.T @ X`` (unweighted) or ``X.T @ W @ X`` (weighted).
     contrasts : ndarray of shape (k, m)
         Each column is a contrast vector ``c`` for the linear combination
         ``c' beta``. The per-coefficient case is recovered with
         ``contrasts=np.eye(k)``.
+    weights : ndarray of shape (n,), optional
+        Original (un-normalized) weights. ``None`` for unweighted; routes
+        through the bit-equal simple ``(tr B)² / tr(B²)`` formula. When
+        provided, routes through the clubSandwich WLS-CR2 P_array form.
 
     Returns
     -------
@@ -1706,6 +1966,31 @@ def _compute_cr2_bm_contrast_dof(
     n, k = X.shape
     cluster_ids_arr = np.asarray(cluster_ids)
     unique_clusters = np.unique(cluster_ids_arr)
+    # Subpopulation invariance: physically filter `weights > 0` rows before
+    # building per-cluster matrices. Mirrors the same fix in `_compute_cr2_bm`
+    # (CI codex P0 on PR #475 round 2). Zero-weight rows must be inert; the
+    # caller's bread_matrix = X.T @ (X * w[:, None]) is invariant to their
+    # removal, so no bread rebuild is needed.
+    if weights is not None:
+        weights_arr_eff = np.asarray(weights, dtype=np.float64)
+        positive_mask = weights_arr_eff > 0
+        if not np.all(positive_mask):
+            X = X[positive_mask]
+            cluster_ids_arr = cluster_ids_arr[positive_mask]
+            weights_arr_eff = weights_arr_eff[positive_mask]
+            weights = weights_arr_eff
+            n = X.shape[0]
+            unique_clusters = np.unique(cluster_ids_arr)
+        eff_clusters = np.array(
+            [g for g in unique_clusters if float(np.sum(weights_arr_eff[cluster_ids_arr == g])) > 0]
+        )
+        if len(eff_clusters) < 2:
+            raise ValueError(
+                f"Need at least 2 clusters with positive total weight for "
+                f"cluster-robust SEs, got {len(eff_clusters)} effective "
+                f"clusters out of {len(unique_clusters)} unique."
+            )
+        unique_clusters = eff_clusters
     if len(unique_clusters) < 2:
         raise ValueError(
             f"Need at least 2 clusters for cluster-robust SEs, got " f"{len(unique_clusters)}"
@@ -1723,17 +2008,48 @@ def _compute_cr2_bm_contrast_dof(
             ) from e
         raise
 
-    H = X @ bread_inv @ X.T
-    M = np.eye(n) - H
+    # Normalize weights (clubSandwich convention; w_scale cancels in DOF ratio).
+    # Use M_U = (X' W_norm X)^{-1} = w_scale * bread_inv throughout, matching
+    # clubSandwich's internal matrix used in the P_array construction.
+    if weights is not None:
+        weights_arr = np.asarray(weights, dtype=np.float64)
+        pos = weights_arr > 0
+        w_scale = float(np.mean(weights_arr[pos])) if np.any(pos) else 1.0
+        W_norm = weights_arr / w_scale
+        M_U = w_scale * bread_inv
+    else:
+        W_norm = np.ones(n, dtype=np.float64)
+        M_U = bread_inv
+
     cluster_idx = {g: np.where(cluster_ids_arr == g)[0] for g in unique_clusters}
+
+    # S_W = sum_g X_g' diag(W_norm_g²) X_g. For unweighted: S_W = X'X = bread_matrix.
+    S_W = np.zeros((k, k))
+    for g in unique_clusters:
+        idx_g = cluster_idx[g]
+        X_g = X[idx_g]
+        S_W += X_g.T @ (X_g * (W_norm[idx_g] ** 2)[:, np.newaxis])
+    MUWTWUM = M_U @ S_W @ M_U
+
     A_g_matrices: Dict[Any, np.ndarray] = {}
     for g in unique_clusters:
         idx_g = cluster_idx[g]
-        H_gg = H[np.ix_(idx_g, idx_g)]
+        X_g = X[idx_g]
+        # Asymmetric weighted hat (collapses to symmetric when W_norm=1).
+        H_gg = (X_g @ M_U @ X_g.T) * W_norm[idx_g][np.newaxis, :]
         I_g = np.eye(len(idx_g))
-        A_g_matrices[g] = _cr2_adjustment_matrix(I_g - H_gg)
-
-    return _cr2_bm_dof_inner(X, M, A_g_matrices, cluster_idx, bread_inv, contrasts)
+        bias_term = X_g @ MUWTWUM @ X_g.T
+        G_g = I_g - H_gg - H_gg.T + bias_term
+        A_g_matrices[g] = _cr2_adjustment_matrix(G_g)
+
+    if weights is None:
+        # Bit-equal to prior unweighted path: simple (tr B)² / tr(B²) form.
+        H = X @ M_U @ X.T
+        M = np.eye(n) - H
+        return _cr2_bm_dof_inner(X, M, A_g_matrices, cluster_idx, M_U, contrasts)
+    return _cr2_bm_dof_inner_weighted(
+        X, A_g_matrices, cluster_idx, M_U, MUWTWUM, W_norm, contrasts, w_scale=w_scale
+    )
 
 
 def _compute_bm_dof_from_contrasts(
@@ -1745,29 +2061,42 @@ def _compute_bm_dof_from_contrasts(
 ) -> np.ndarray:
     """Per-contrast Bell-McCaffrey (Imbens-Kolesar 2016) Satterthwaite DOF.
 
-    For each column ``c`` of ``contrasts`` (shape ``(k, m)``), define
-    ``q = X (X'WX)^{-1} c`` (length ``n``). Under a homoskedastic null, the
-    HC2 variance estimator for ``c' beta`` has a weighted-chi-squared
-    distribution; matching mean and variance via Satterthwaite gives
+    Two code paths depending on ``weights``:
 
-        DOF(c) = (sum_i q(i)^2)^2 / sum_{i, k} a(i) a(k) M_{ik}^2
+    **Unweighted** (``weights is None``): uses the simple Pustejovsky-Tipton
+    (2018) Theorem 1 form
 
-    where ``M = I - H`` and ``a(i) = q(i)^2 / (1 - h_ii)``. Using the idempotent
-    identity ``M^2 = M``, ``trace(B) = sum_i q(i)^2`` matches the numerator.
+        DOF(c) = (sum_i q(i)^2)^2 / sum_{i, k} a(i) a(k) M_{ik}^2
 
-    Allocates an ``(n, n)`` temporary for ``M`` so the cost is ``O(n^2 k)`` for
-    the hat build plus ``O(n^2 m)`` for the per-contrast sums. Practical for
-    ``n < 10_000``; larger designs should switch to a scores-based formulation
-    (tracked in TODO.md).
+    where ``q = X (X'X)^{-1} c``, ``M = I - H``, ``a(i) = q(i)^2 / (1 - h_ii)``.
+    Using the idempotent identity ``M^2 = M``, ``trace(B) = sum_i q(i)^2``
+    matches the numerator. Allocates an ``(n, n)`` temporary for ``M`` so the
+    cost is ``O(n^2 k)`` for the hat build plus ``O(n^2 m)`` for the per-
+    contrast sums. Practical for ``n < 10_000``; larger designs should switch
+    to a scores-based formulation (tracked in TODO.md).
+
+    **Weighted** (``weights is not None``): dispatches to the clubSandwich
+    singleton-cluster CR2 reduction (each observation is its own cluster)
+    via :func:`_compute_cr2_bm_contrast_dof`. The simple formula above is
+    only correct in the unweighted case — empirically it diverges from
+    ``clubSandwich::vcovCR(cluster=1:n, type="CR2") + coef_test(test=
+    "Satterthwaite")$df_Satt`` by ~6% on heteroskedastic weights. The
+    P_array form matches clubSandwich at ``atol=1e-10`` (see the WLS-CR2
+    section in ``docs/methodology/REGISTRY.md``). Only ``weight_type=
+    "pweight"`` is supported; ``aweight`` / ``fweight`` are rejected by
+    ``_compute_robust_vcov_numpy`` upstream of this helper.
 
     Parameters
     ----------
     X : ndarray of shape (n, k)
     bread_matrix : ndarray of shape (k, k) == (X'WX) or (X'X)
-    h_diag : ndarray of shape (n,), hat-matrix diagonals (already weighted)
+    h_diag : ndarray of shape (n,), hat-matrix diagonals (already weighted).
+        Used only on the unweighted path; the weighted dispatch builds its
+        own per-cluster `H_gg` blocks via :func:`_compute_cr2_bm`.
     contrasts : ndarray of shape (k, m). Pass ``np.eye(k)`` for per-coefficient DOF.
-    weights : optional weights (shape ``(n,)``) used to build the weighted hat
-        matrix. When ``None``, unweighted.
+    weights : optional weights (shape ``(n,)``). When ``None``, uses the
+        simple ``(tr B)^2 / tr(B^2)`` formula. When provided, dispatches to
+        the clubSandwich singleton-cluster CR2 P_array form.
 
     Returns
     -------
@@ -1777,6 +2106,21 @@ def _compute_bm_dof_from_contrasts(
     n, k = X.shape
     if contrasts.ndim != 2 or contrasts.shape[0] != k:
         raise ValueError(f"contrasts must have shape (k={k}, m); got {contrasts.shape}")
+    if weights is not None:
+        # Weighted one-way HC2-BM uses clubSandwich's singleton-cluster CR2
+        # reduction (each obs is its own cluster). The simple (tr B)² / tr(B²)
+        # formula is only correct in the unweighted case — empirically it
+        # diverges from `clubSandwich::vcovCR(cluster=1:n, type="CR2") +
+        # coef_test(test="Satterthwaite")$df_Satt` by ~6% on heteroskedastic
+        # weights. The P_array form (via _cr2_bm_dof_inner_weighted) matches
+        # at atol=1e-10.
+        cluster_ids_singleton = np.arange(n)
+        return _compute_cr2_bm_contrast_dof(
+            X, cluster_ids_singleton, bread_matrix, contrasts, weights=weights
+        )
+
+    # Unweighted: keep the simple (tr B)² / tr(B²) formula for bit-equal
+    # backward compatibility with prior unweighted Bell-McCaffrey output.
     try:
         bread_inv_c = np.linalg.solve(bread_matrix, contrasts)
     except np.linalg.LinAlgError as e:
@@ -1788,11 +2132,7 @@ def _compute_bm_dof_from_contrasts(
         raise
     # q has shape (n, m); column j is X @ (bread_inv @ contrasts[:, j]).
     q = X @ bread_inv_c
-    # Build the weighted residual-maker M = I - H once.
-    if weights is not None:
-        H = X @ np.linalg.solve(bread_matrix, (X * weights[:, np.newaxis]).T)
-    else:
-        H = X @ np.linalg.solve(bread_matrix, X.T)
+    H = X @ np.linalg.solve(bread_matrix, X.T)
     M = np.eye(n) - H
     M_sq = M * M  # elementwise square
     one_minus_h = np.maximum(1.0 - h_diag, 1e-10)
@@ -1949,8 +2289,25 @@ def _compute_robust_vcov_numpy(
     # CR2 Bell-McCaffrey cluster-robust (vcov_type="hc2_bm" + cluster).
     # ------------------------------------------------------------------
     if vcov_type == "hc2_bm" and cluster_ids is not None:
-        # Weighted CR2 is Phase 2+; the public wrapper guards against it.
-        vcov_cr2, dof_cr2 = _compute_cr2_bm(X, residuals, cluster_ids, bread_matrix)
+        # The clubSandwich WLS-CR2 port matches the `pweight` (sampling-weight)
+        # convention only. clubSandwich's algebra puts `w` directly into the
+        # score (s_g = X_g' diag(W) A_g u_g), producing meat = sum w_g² ...
+        # — exactly diff-diff's `pweight` convention. `aweight` (analytical /
+        # inverse-variance) and `fweight` (frequency-expanded) require
+        # different CR2 derivations that are not in this port. Reject loudly.
+        if weights is not None and weight_type != "pweight":
+            raise NotImplementedError(
+                f"vcov_type='hc2_bm' with weight_type={weight_type!r} is not "
+                "supported. The clubSandwich WLS-CR2 port matches the "
+                "'pweight' (sampling-weight) convention only. For analytical "
+                "('aweight') or frequency ('fweight') weights with CR2 "
+                "Bell-McCaffrey, the derivation is a separate methodology "
+                "task. Use weight_type='pweight' or vcov_type='hc1' "
+                "(CR1 supports all three weight types)."
+            )
+        vcov_cr2, dof_cr2 = _compute_cr2_bm(
+            X, residuals, cluster_ids, bread_matrix, weights=weights
+        )
         if return_dof:
             return vcov_cr2, dof_cr2
         return vcov_cr2
@@ -1960,6 +2317,32 @@ def _compute_robust_vcov_numpy(
     # ------------------------------------------------------------------
     if vcov_type in ("hc2", "hc2_bm"):
         # cluster path handled above; here cluster_ids is None by construction.
+        # **Weighted hc2_bm one-way**: clubSandwich's CR2 with singleton clusters
+        # uses the bias-corrected adjustment `A_i = 1 / sqrt(G_i)` where
+        # `G_i = 1 - 2*h_ii + X_i' M_U S_W M_U X_i`, NOT the simple HC2
+        # leverage `1/sqrt(1-h_ii)`. The two agree unweighted (S_W = X'X, so
+        # the bias term collapses to h_ii) but diverge on weighted designs
+        # because S_W = sum_j w_j² X_j X_j' ≠ X'WX. To match
+        # `clubSandwich::vcovCR(cluster=1:n, type="CR2")`, route weighted
+        # hc2_bm through the singleton-cluster CR2 path so vcov and DOF are
+        # consistent (both use the clubSandwich algebra).
+        if vcov_type == "hc2_bm" and weights is not None:
+            # Same `weight_type` gating as the cluster-CR2 branch above:
+            # only pweight matches clubSandwich's WLS-CR2 algebra.
+            if weight_type != "pweight":
+                raise NotImplementedError(
+                    f"vcov_type='hc2_bm' with weight_type={weight_type!r} is "
+                    "not supported. The clubSandwich WLS-CR2 port matches "
+                    "the 'pweight' (sampling-weight) convention only. Use "
+                    "weight_type='pweight' or vcov_type='hc1' (CR1 supports "
+                    "all three weight types)."
+                )
+            vcov_cr2, dof_cr2 = _compute_cr2_bm(
+                X, residuals, np.arange(n), bread_matrix, weights=weights
+            )
+            if return_dof:
+                return vcov_cr2, dof_cr2
+            return vcov_cr2
         h_diag = _compute_hat_diagonals(X, bread_matrix, weights=weights)
         if np.any(h_diag > 1.0 + 1e-6):
             warnings.warn(
@@ -2592,11 +2975,13 @@ class LinearRegression:
         two conflict (e.g. ``robust=False, vcov_type="hc2"``), in which
         case ``__init__`` raises. See :func:`solve_ols` for the per-family
         semantics and unsupported combinations. For ``"hc2_bm"``: when
-        ``cluster_ids`` is provided, dispatches to CR2 Bell-McCaffrey; with
-        ``weights``, raises ``NotImplementedError`` (the BM DOF path is
-        currently inconsistent with the WLS transform). On top of the
-        sandwich, the class stores per-coefficient BM Satterthwaite DOF
-        (``self._bm_dof``) and threads it into ``get_inference``.
+        ``cluster_ids`` is provided, dispatches to CR2 Bell-McCaffrey;
+        with ``weights`` (one-way or clustered) dispatches to the
+        clubSandwich WLS-CR2 port — supported for
+        ``weight_type="pweight"`` only (``aweight`` / ``fweight`` raise
+        ``NotImplementedError``). On top of the sandwich, the class
+        stores per-coefficient BM Satterthwaite DOF (``self._bm_dof``)
+        and threads it into ``get_inference``.
 
         For ``"conley"`` (Conley 1999 spatial-HAC) two operating modes are
         supported on the `LinearRegression` / `compute_robust_vcov` surface:
@@ -2913,15 +3298,16 @@ def fit(
                 conley_lag_cutoff=self.conley_lag_cutoff,
             )
             # For hc2_bm, compute per-coefficient Bell-McCaffrey DOF. Both
-            # the one-way HC2+BM case and the cluster CR2 case are supported;
-            # the weighted cluster path (guarded in compute_robust_vcov) is
-            # Phase 2+ and is skipped here (falls through to self._bm_dof = None).
+            # the one-way HC2+BM case and the cluster CR2 case are supported,
+            # including the weighted clustered CR2 path via the clubSandwich
+            # WLS-CR2 port. The dispatcher already rejects non-pweight weight
+            # types for hc2_bm + weights, so reaching this block guarantees
+            # `_compute_cr2_bm` returns a finite per-coefficient DOF.
             if (
                 _fit_vcov_type == "hc2_bm"
                 and not _use_survey_vcov
                 and vcov is not None
                 and not np.all(np.isnan(coefficients))
-                and not (effective_cluster_ids is not None and _fit_weights is not None)
             ):
                 # Identified columns for DOF (rank-deficient case sets NaN coefs).
                 nan_mask = np.isnan(coefficients)
@@ -3302,7 +3688,22 @@ def get_inference(
             effective_df = self.survey_df_
         elif self._bm_dof is not None and 0 <= index < len(self._bm_dof):
             bm_val = self._bm_dof[index]
-            effective_df = None if (not np.isfinite(bm_val)) else float(bm_val)
+            if not np.isfinite(bm_val):
+                # NaN BM DOF means the noise-floor guard fired (typically a
+                # high-leverage FE-dummy contrast on weighted CR2-BM). Falling
+                # through to df=None would silently use the normal distribution
+                # and produce misleading p≈0 / zero-width CIs. Instead, return
+                # NaN inference fields for the affected coefficient.
+                return InferenceResult(
+                    coefficient=coef,
+                    se=se,
+                    t_stat=float("nan"),
+                    p_value=float("nan"),
+                    conf_int=(float("nan"), float("nan")),
+                    df=None,
+                    alpha=effective_alpha,
+                )
+            effective_df = float(bm_val)
         elif (
             hasattr(self, "survey_design")
             and self.survey_design is not None
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 14065416..00f74287 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -2648,12 +2648,12 @@ Shipped in `diff_diff/had_pretests.py` as `stute_joint_pretest()` (residuals-in
 **Requirements checklist (tracks implementation phase completion):**
 - [x] Phase 1a: Epanechnikov / triangular / uniform kernels with closed-form `κ_k` constants (`diff_diff/local_linear.py`).
 - [x] Phase 1a: Univariate local-linear regression at a boundary (`local_linear_fit` in `diff_diff/local_linear.py`).
-- [x] Phase 1a: HC2 + Bell-McCaffrey DOF correction in `diff_diff/linalg.py` via `vcov_type="hc2_bm"` enum (both one-way and CR2 cluster-robust with Imbens-Kolesar / Pustejovsky-Tipton Satterthwaite DOF). Weighted cluster CR2 raises `NotImplementedError` and is tracked as Phase 2+ in `TODO.md`.
+- [x] Phase 1a: HC2 + Bell-McCaffrey DOF correction in `diff_diff/linalg.py` via `vcov_type="hc2_bm"` enum (both one-way and CR2 cluster-robust with Imbens-Kolesar / Pustejovsky-Tipton Satterthwaite DOF). **Weighted Bell-McCaffrey (`hc2_bm + weights`, both one-way and cluster) is now supported on the analytical linalg surface (`compute_robust_vcov` / `solve_ols` / `LinearRegression`) via the clubSandwich WLS-CR2 port.** The port matches clubSandwich's `pweight` (sampling-weight) convention only — `weight_type="aweight"` and `weight_type="fweight"` raise `NotImplementedError` on `hc2_bm + weights` (separate methodology task; CR1 / `vcov_type="hc1"` continues to support all three weight types). Estimator-level `survey_design=` paths continue to use the Taylor-series linearization (TSL) survey variance, which takes precedence over the analytical sandwich (no change in this PR). **Known precision limit on high-leverage FE-dummy coefficients**: the Satterthwaite DOF formula `(tr P)² / sum(P²)` is at the float64 noise floor for contrasts whose projection onto the design is essentially zero (typically FE-dummy coefficients where the dummy fires in a single cluster). In this regime, BLAS reduction-order differences between NumPy and R's BLAS produce 15-30% DOF discrepancies despite vcov matching at machine precision. `_cr2_bm_dof_inner_weighted` detects this regime via two criteria applied union-wise — **(a) batch-relative**: per-contrast max|P| below `1e-10 ×` the largest contrast's max|P| (catches noise-floor coefficients in a per-coefficient sweep where a non-noise reference is available); **(b) absolute single-contrast safe**: per-contrast max|P| below `(EPS × n × k × max(bread_inv_scale, 1))²` (catches single-contrast calls like MPD avg_att where no batch reference exists) — and returns NaN with a `UserWarning` rather than ship arbitrarily-different DOF on the user-facing surface. The coefficient SEs remain valid; only the DOF (and any t-test / CI that depends on it) is suppressed. The diff-diff implementation matches clubSandwich's specific algebra (R source: `CR-adjustments.R::CR2`, `clubSandwich.R::vcov_CR`, `coef_test.R::Satterthwaite_df`), **not** a textbook Pustejovsky-Tipton (2018) §3.3 transform-once derivation. clubSandwich uses W (not √W) in the hat matrix `H_gg = X_g M_U X_g' W_g`, W² in the bias-correction term `S_W = Σ_g X_g' W_g² X_g`, and unweighted residuals in the score construction `s_g = X_g' W_g A_g e_g`. The Satterthwaite DOF uses the full H1/H2/H3 array construction (`get_arrays.R::get_GH`), not the simplified `(tr B)² / tr(B²)` form (which works for the unweighted case but diverges from clubSandwich by 0.5-30% on weighted designs — see `feedback_wls_cr2_clubsandwich_parity`). Pinned at atol=1e-10 against `clubSandwich::vcovCR(..., type="CR2") + coef_test(test="Satterthwaite")$df_Satt` and `Wald_test(test="HTZ")$df_denom` on six weighted scenarios in `benchmarks/data/clubsandwich_cr2_golden.json` (vcov + non-noise-floor per-coefficient DOF + compound-contrast DOF; high-leverage FE-dummy coefficients are suppressed to NaN per the precision-limit note below) (`tests/test_methodology_wls_cr2.py`). Unweighted CR2-BM behavior is bit-equal to prior (regression-safe via `TestUnweightedRegressionStillBitEqual` + `TestDOFFormulaDualPathEquivalence`). The one-way weighted HC2-BM path uses clubSandwich's singleton-cluster CR2 reduction (each obs is its own cluster), routed via `_compute_bm_dof_from_contrasts` when `weights is not None`. clubSandwich pin: `≥ 0.7.0`.
     - **Note (scope limitation on absorbed FE):** HC2 and HC2 + Bell-McCaffrey on within-transformed designs still depend on the FULL FE hat matrix because FWL preserves coefficients and residuals but NOT the hat matrix: `h_ii = x_i' (X'X)^{-1} x_i` on the reduced design is not the diagonal of the full FE projection, and CR2's block adjustment `A_g = (I - H_gg)^{-1/2}` likewise depends on the full cluster-block hat matrix. The status across the three estimators that previously rejected this combination:
         - **`DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"})` — SUPPORTED (auto-route).** When the user pairs `absorb=` with HC2 / HC2-BM, `DiD.fit()` internally promotes the absorb columns to `fixed_effects=` so the existing full-dummy code path computes the algebraically correct vcov from the full FE projection. Verified at ~1e-10 vs `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=..., type="CR2")` (singleton-cluster CR2 trick for one-way HC2-BM Satterthwaite DOF; PT2018 §3.3 unweighted CR2 algebra). **User-visible surface change**: under the auto-route, the entire `DiDResults` (coefficients, vcov, residuals, fitted_values, r_squared) reflect the full-dummy fit rather than the within-transformed fit — the FE-dummy entries are included in `result.coefficients` / `result.vcov`, `r_squared` is computed on the un-demeaned outcome, and `residuals` / `fitted_values` are on the original scale. `result.att` is unaffected (FWL-equivalent). HC1/CR1 paths on `absorb=` are unchanged (no leverage term). **Survey-design scope**: when `survey_design=` is supplied, the existing survey variance path (Taylor-series linearization / replicate weights) takes precedence over the analytical HC2/HC2-BM sandwich; the auto-route only changes the FE handling (removing the prior reject) and does not redirect to the analytical small-sample sandwich on survey fits.
-        - **`TwoWayFixedEffects(vcov_type in {"hc2","hc2_bm"})` — SUPPORTED (inline full-dummy build).** TWFE has no `absorb=` / `fixed_effects=` parameter (the unit + time FE are baked into the estimator's identity), so the same parameter-swap auto-route used for DiD-absorb / MPD-absorb is not directly applicable. Instead, `TwoWayFixedEffects.fit()` bypasses the within-transform when `vcov_type in {"hc2","hc2_bm"}` and builds the full-dummy design `[intercept, treated×post, covariates, factor(unit), factor(time)]` explicitly, then runs OLS through the standard `solve_ols` path so the leverage correction and BM DOF compute on the full FE projection. Verified at atol=1e-10 vs `lm(y ~ treat_post + factor(unit) + factor(post)) + sandwich::vcovHC(type="HC2")` for HC2 and vs `clubSandwich::vcovCR(cluster=seq_len(n), type="CR2")` for the singleton-cluster one-way HC2-BM Satterthwaite DOF; vs `vcovCR(cluster=unit, type="CR2")` for the auto-cluster CR2-BM path. **Auto-cluster default (non-survey analytical path):** TWFE's unit auto-cluster is preserved on `hc2_bm` (routes to CR2-BM at unit) and on `hc2 + wild_bootstrap` (the bootstrap consumes the cluster structure for resampling regardless of the analytical sandwich choice); dropped on explicit `hc2 + analytical` to match the one-way contract (the linalg validator rejects `hc2 + cluster_ids`). `hc2_bm + analytical` with no explicit cluster yields the auto-cluster CR2-BM path. **Survey-design exception:** under `survey_design=` with no explicit `cluster=`, TWFE intentionally keeps the documented implicit-PSU path (the auto-cluster is NOT injected into the survey PSU structure); users who want unit-level PSU injection under a survey design must pass explicit `cluster="unit"` or set `survey_design.psu` directly. **User-visible surface change** (matches the DiD-absorb / MPD-absorb disclosure above): under HC2 / HC2-BM, `result.coefficients`, `result.vcov`, `result.residuals`, `result.fitted_values`, and `result.r_squared` reflect the full-dummy fit rather than the within-transformed reduced fit (FE-dummy entries are included, `r_squared` is computed on the un-demeaned outcome, residuals/fitted are on the original scale). `result.att`, its SE, and analytical inference are unchanged (FWL-equivalent). HC1 / CR1 / Conley / classical paths remain on the within-transform (no leverage term in those vcov families). **Survey-design scope** (mirrors the DiD-absorb auto-route contract above): when `survey_design=` is supplied, the existing survey variance path (Taylor-series linearization for analytical-weight designs, or replicate-weight variance for BRR/Fay/JK1/JKn/SDR) takes precedence over the analytical HC2/HC2-BM sandwich; the full-dummy build only changes the FE handling (removing the prior reject) and does not redirect to the analytical small-sample sandwich on survey fits. **Replicate-weight survey designs** are blocked at the estimator level: `vcov_type in {"hc2","hc2_bm"}` + replicate weights raises `NotImplementedError` because the replicate refit path re-demeans per replicate, which doesn't compose with the full-dummy build (would require per-replicate full-dummy refit); workaround: use `vcov_type="hc1"` for replicate-weight CR1. `hc2_bm + weights` remains rejected upstream by the linalg validator (same gate as Gates 4-5 — weighted CR2 variants).
+        - **`TwoWayFixedEffects(vcov_type in {"hc2","hc2_bm"})` — SUPPORTED (inline full-dummy build).** TWFE has no `absorb=` / `fixed_effects=` parameter (the unit + time FE are baked into the estimator's identity), so the same parameter-swap auto-route used for DiD-absorb / MPD-absorb is not directly applicable. Instead, `TwoWayFixedEffects.fit()` bypasses the within-transform when `vcov_type in {"hc2","hc2_bm"}` and builds the full-dummy design `[intercept, treated×post, covariates, factor(unit), factor(time)]` explicitly, then runs OLS through the standard `solve_ols` path so the leverage correction and BM DOF compute on the full FE projection. Verified at atol=1e-10 vs `lm(y ~ treat_post + factor(unit) + factor(post)) + sandwich::vcovHC(type="HC2")` for HC2 and vs `clubSandwich::vcovCR(cluster=seq_len(n), type="CR2")` for the singleton-cluster one-way HC2-BM Satterthwaite DOF; vs `vcovCR(cluster=unit, type="CR2")` for the auto-cluster CR2-BM path. **Auto-cluster default (non-survey analytical path):** TWFE's unit auto-cluster is preserved on `hc2_bm` (routes to CR2-BM at unit) and on `hc2 + wild_bootstrap` (the bootstrap consumes the cluster structure for resampling regardless of the analytical sandwich choice); dropped on explicit `hc2 + analytical` to match the one-way contract (the linalg validator rejects `hc2 + cluster_ids`). `hc2_bm + analytical` with no explicit cluster yields the auto-cluster CR2-BM path. **Survey-design exception:** under `survey_design=` with no explicit `cluster=`, TWFE intentionally keeps the documented implicit-PSU path (the auto-cluster is NOT injected into the survey PSU structure); users who want unit-level PSU injection under a survey design must pass explicit `cluster="unit"` or set `survey_design.psu` directly. **User-visible surface change** (matches the DiD-absorb / MPD-absorb disclosure above): under HC2 / HC2-BM, `result.coefficients`, `result.vcov`, `result.residuals`, `result.fitted_values`, and `result.r_squared` reflect the full-dummy fit rather than the within-transformed reduced fit (FE-dummy entries are included, `r_squared` is computed on the un-demeaned outcome, residuals/fitted are on the original scale). `result.att`, its SE, and analytical inference are unchanged (FWL-equivalent). HC1 / CR1 / Conley / classical paths remain on the within-transform (no leverage term in those vcov families). **Survey-design scope** (mirrors the DiD-absorb auto-route contract above): when `survey_design=` is supplied, the existing survey variance path (Taylor-series linearization for analytical-weight designs, or replicate-weight variance for BRR/Fay/JK1/JKn/SDR) takes precedence over the analytical HC2/HC2-BM sandwich; the full-dummy build only changes the FE handling (removing the prior reject) and does not redirect to the analytical small-sample sandwich on survey fits. **Replicate-weight survey designs** are blocked at the estimator level: `vcov_type in {"hc2","hc2_bm"}` + replicate weights raises `NotImplementedError` because the replicate refit path re-demeans per replicate, which doesn't compose with the full-dummy build (would require per-replicate full-dummy refit); workaround: use `vcov_type="hc1"` for replicate-weight CR1. `hc2_bm + weights` (Gates 4-5) is now lifted via the clubSandwich WLS-CR2 port; the survey-design rejection here is a separate estimator-level gate (analytical sandwich vs survey TSL precedence), independent of the linalg validator.
         - **`MultiPeriodDiD(absorb=..., vcov_type in {"hc2","hc2_bm"})` — SUPPORTED (auto-route).** Same auto-route pattern as `DifferenceInDifferences`: `MultiPeriodDiD.fit()` internally promotes the absorb columns to `fixed_effects=` for HC2 / HC2-BM callers, so the existing full-dummy code path computes the algebraically correct vcov from the full FE projection on the event-study design (`treated + period_X dummies + treated:period_X interactions + factor(unit)`). Verified at ~1e-10 vs `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=1:n, type="CR2")` on a 5-cohort × 5-period event-study fixture; the parity target is a per-period interaction `treated:period_X` because MPD requires the `treated` column to be a time-invariant ever-treated indicator, which lies in the span of the intercept and the post-auto-route unit FE dummies (under `pd.get_dummies(..., drop_first=True)` the dropped reference unit is implicit in the intercept, so the exact alias relation depends on the omitted FE category — it is NOT simply "the sum of treated-cohort unit dummies"). `solve_ols` drops one column from the collinear set under R-style rank-deficiency handling; in the shipped parity fixture (4 ever-treated cohorts of 5 units + 1 never-treated cohort of 5 units) it drops a unit dummy from the never-treated cohort (`unit_25`) and the `treated` main effect remains finite, but the specific column that gets NaN'd is pivot-order and dummy-coding dependent. Either way, the slope coefficients (`treated:period_X`) and the post-period-average `avg_att` are identified and invariant to which column was dropped. Same `MultiPeriodDiDResults` surface change as DiD: `vcov`, `residuals`, `fitted_values`, `r_squared`, and `coefficients` reflect the full-dummy fit, with `period_effects[t].effect` / `.se` / `.p_value` / `.conf_int` invariant by FWL. HC1/CR1 paths on `absorb=` are unchanged (no leverage term). Same survey-design scope as DiD: replicate-weight variance routes through the standard `compute_replicate_vcov` path on the fixed full-dummy design rather than the per-replicate refit branch (which targets the demeaning path); since the auto-routed design does not depend on replicate weights, no refit is needed. **Redundant time-FE skip:** when the routed (or directly-supplied) `fixed_effects` list contains the `time` column, MPD silently skips emitting `<time>_<X>` dummies for that entry because the design already absorbs time via the non-reference period dummies. Without the skip, those blocks collide on dummy names and `MultiPeriodDiDResults.coefficients` (built as `{name: coef for name, coef in zip(var_names, coefficients)}`) would silently drop duplicates, breaking the coefficients-vs-vcov alignment that downstream consumers (HonestDiD sub-VCV extraction, BusinessReport, etc.) rely on. The skip applies to BOTH the new `absorb=` auto-route AND the pre-existing `fixed_effects=[<time_col>]` invocation (pre-PR, `fixed_effects=["unit", time]` produced a dict with `len < vcov.shape[0]` and NaN values overwriting the real event-study period coefficients).
-        - All three previously-rejecting absorbed-FE paths are now SUPPORTED. Weighted-CR2 variants (Gates 4-5: `vcov_type="hc2_bm" + weights`; weighted one-way HC2-BM) remain blocked at the `linalg.py::_validate_vcov_args` level pending the clubSandwich WLS algebra derivation.
+        - All three previously-rejecting absorbed-FE paths are now SUPPORTED. **Weighted-CR2 variants (Gates 4-5: `vcov_type="hc2_bm" + weights`; weighted one-way HC2-BM) are now LIFTED via the clubSandwich WLS-CR2 port** documented above on the Phase 1a row. The lift applies to the **analytical** surface: `compute_robust_vcov`, `solve_ols`, and `LinearRegression` direct callers. The helper `_compute_cr2_bm_contrast_dof(weights=...)` is also helper-ready for the contrast-DOF case (e.g., MPD `avg_att`), but no public estimator API currently passes non-None weights to that helper — `MultiPeriodDiD.fit()` has no non-survey weighted entry point, and `survey_design=` paths route through TSL (which takes precedence over the analytical CR2 sandwich). Estimator-level weighted CR2-BM via the public estimator surface would require a separate non-survey weights entry point and is out of scope here.
 - [x] Phase 1a: `vcov_type` enum threaded through `DifferenceInDifferences` (`MultiPeriodDiD`, `TwoWayFixedEffects` inherit); `robust=True` <=> `vcov_type="hc1"`, `robust=False` <=> `vcov_type="classical"`. Conflict detection at `__init__`. Results summary prints the variance-family label.
     - **Note (`MultiPeriodDiD(cluster=..., vcov_type="hc2_bm")` — SUPPORTED via cluster-aware contrast DOF):** the scalar-coefficient `DifferenceInDifferences` path uses `_compute_cr2_bm`'s per-coefficient Satterthwaite DOF directly for the single-ATT contrast, but `MultiPeriodDiD` also reports a post-period-average ATT constructed as a *contrast* of the event-study coefficients (`avg_att = (1/n_post) Σ_{t ≥ t_treat} β_t`). Pre-PR the combination raised `NotImplementedError` because the cluster-aware CR2 Bell-McCaffrey Satterthwaite DOF for an arbitrary linear combination was not implemented — only the per-coefficient case existed. The new `_compute_cr2_bm_contrast_dof` helper in `diff_diff/linalg.py` generalizes the per-coefficient loop to arbitrary `(k, m)` contrast matrices using the identical Pustejovsky-Tipton 2018 Section 4 algebra (`q = X bread_inv c`, `omega_g = A_g X_g bread_inv c`, `DOF = trace(B)² / trace(B²)`), and `_compute_cr2_bm` is refactored to call it with `contrasts=eye(k)` so the per-coefficient case is recovered at machine precision (atol=1e-10, see refactor regression in `tests/test_linalg_hc2_bm.py::TestCR2BMContrastDOF`). `MultiPeriodDiD.fit()` extends its existing avg_att DOF block to branch on cluster presence: one-way `_compute_bm_dof_from_contrasts` for `effective_cluster_ids is None`, cluster-aware `_compute_cr2_bm_contrast_dof` otherwise. R parity verified against clubSandwich's `Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom` at atol=1e-10 on the `mpd_clustered_avg_att_dof` fixture in `benchmarks/data/clubsandwich_cr2_golden.json` (Wald_test's HTZ on a 1-row constraint matrix yields the Satterthwaite t-test DOF). Cluster IDs are per-observation length `n` and are NOT subscripted by the rank-deficient column-drop mask `_kept` — the helper accepts the full `effective_cluster_ids` array. Weighted CR2-BM (`survey_design=` paths) remains a separate gate.
 - [x] Phase 1a: `clubSandwich::vcovCR(..., type="CR2")` parity harness committed: R script at `benchmarks/R/generate_clubsandwich_golden.R` plus the authoritative R-generated JSON at `benchmarks/data/clubsandwich_cr2_golden.json` (`"source": "clubSandwich"`, with `clubSandwich_version`, `R_version`, and `generated_at` captured in `meta` for forensic traceability). The parity test at `tests/test_linalg_hc2_bm.py::TestCR2BMCluster::test_cr2_parity_with_golden` runs at 1e-6 tolerance and passes at ≤ 7.1e-15 across all three datasets — Python's `_compute_cr2_bm` matches clubSandwich at machine precision.
diff --git a/tests/test_linalg_hc2_bm.py b/tests/test_linalg_hc2_bm.py
index b9619c61..14a019e4 100644
--- a/tests/test_linalg_hc2_bm.py
+++ b/tests/test_linalg_hc2_bm.py
@@ -12,9 +12,11 @@
   backend for ``MultiPeriodDiD``'s post-period-average ATT inference under the
   cluster+hc2_bm combination.
 
-Weighted CR2 Bell-McCaffrey (``hc2_bm`` + ``cluster=`` + ``weights=``) remains
-deferred to a follow-up; the corresponding ``NotImplementedError`` gate in
-``_validate_vcov_args`` is exercised by ``TestInvalidInputs``.
+Weighted CR2 Bell-McCaffrey (``hc2_bm`` + ``weights=``, both one-way and
+clustered) is now supported via the clubSandwich WLS-CR2 port. Parity against
+``clubSandwich::vcovCR(lm(weights=w), type="CR2") + coef_test(test=
+"Satterthwaite")$df_Satt`` is locked in ``tests/test_methodology_wls_cr2.py``;
+this file's tests cover backward-compat (unweighted is bit-equal to prior).
 """
 
 from __future__ import annotations
@@ -297,59 +299,63 @@ def test_hc1_cluster_unchanged(self, small_ols_dataset):
         explicit = compute_robust_vcov(X, resid, cluster_ids=cluster_ids, vcov_type="hc1")
         np.testing.assert_allclose(default, explicit, atol=1e-14, rtol=1e-14)
 
-    def test_hc2_bm_weighted_cluster_not_implemented(self, small_ols_dataset):
-        """Weighted CR2 Bell-McCaffrey is deferred to Phase 2+."""
+    def test_hc2_bm_weighted_cluster_gate_lifted(self, small_ols_dataset):
+        """Weighted CR2 Bell-McCaffrey (cluster + weights) is now supported via
+        the clubSandwich WLS-CR2 port. Smoke test: produces finite vcov + DOF.
+
+        Numerical parity against ``clubSandwich::vcovCR(lm(weights=w),
+        type="CR2") + coef_test()$df_Satt`` is locked in
+        ``tests/test_methodology_wls_cr2.py``.
+        """
         X, y = small_ols_dataset
-        _, resid, _ = _fit_unweighted(X, y)
         cluster_ids = np.arange(X.shape[0]) % 5
-        w = np.ones(X.shape[0])
-        with pytest.raises(NotImplementedError, match="weights"):
-            compute_robust_vcov(
-                X,
-                resid,
-                cluster_ids=cluster_ids,
-                vcov_type="hc2_bm",
-                weights=w,
-                weight_type="pweight",
-            )
-
-    def test_hc2_bm_weighted_one_way_not_implemented(self, small_ols_dataset):
-        """Weighted one-way Bell-McCaffrey is also deferred.
-
-        The BM DOF helper (`_compute_bm_dof_from_contrasts`) builds its hat
-        matrix from the unscaled design as `X (X'WX)^{-1} X' W`, but
-        `solve_ols` solves weighted regression by transforming to
-        `X* = sqrt(w) X`, `y* = sqrt(w) y`. The symmetric-idempotent residual
-        maker `M* = I - H*` with `H* = sqrt(W) X (X'WX)^{-1} X' sqrt(W)` is
-        the correct one for the Satterthwaite ratio; the asymmetric
-        `X (X'WX)^{-1} X' W` currently produced is neither the transformed
-        nor the original-scale formula. Rather than ship silently-wrong
-        small-sample p-values, `_validate_vcov_args` fails fast.
+        rng = np.random.default_rng(7)
+        w = rng.uniform(0.5, 2.0, size=X.shape[0])
+        # Refit WLS to get correct residuals for the weighted CR2 path.
+        coef = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
+        resid_w = y - X @ coef
+        vcov, dof = compute_robust_vcov(
+            X,
+            resid_w,
+            cluster_ids=cluster_ids,
+            vcov_type="hc2_bm",
+            weights=w,
+            weight_type="pweight",
+            return_dof=True,
+        )
+        assert np.all(np.isfinite(vcov)), "weighted CR2-BM vcov must be finite"
+        assert np.all(np.diag(vcov) > 0), "weighted CR2-BM vcov diag must be positive"
+        assert np.all(np.isfinite(dof)) and np.all(
+            dof > 0
+        ), "weighted CR2-BM DOF must be finite and positive"
+
+    def test_hc2_bm_weighted_one_way_gate_lifted(self, small_ols_dataset):
+        """Weighted one-way HC2-BM (no cluster) is now supported.
+
+        Uses clubSandwich's singleton-cluster CR2 reduction (each obs is its
+        own cluster) for the DOF computation. The simple ``(tr B)² / tr(B²)``
+        unweighted formula DIVERGES from clubSandwich by ~6% on weighted
+        designs, so the weighted branch routes through the P_array form
+        instead.
         """
         X, y = small_ols_dataset
-        _, resid, _ = _fit_unweighted(X, y)
-        w = np.ones(X.shape[0])
-        with pytest.raises(NotImplementedError, match="weights"):
-            compute_robust_vcov(
-                X,
-                resid,
-                vcov_type="hc2_bm",
-                weights=w,
-                weight_type="pweight",
-            )
-        # The failure must also hit the internal entry point so callers that
-        # reach the numpy backend directly via `solve_ols` do not bypass it.
-        from diff_diff.linalg import _compute_robust_vcov_numpy
-
-        with pytest.raises(NotImplementedError, match="weights"):
-            _compute_robust_vcov_numpy(
-                X,
-                resid,
-                None,
-                weights=w,
-                weight_type="pweight",
-                vcov_type="hc2_bm",
-            )
+        rng = np.random.default_rng(8)
+        w = rng.uniform(0.5, 2.0, size=X.shape[0])
+        coef = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
+        resid_w = y - X @ coef
+        vcov, dof = compute_robust_vcov(
+            X,
+            resid_w,
+            vcov_type="hc2_bm",
+            weights=w,
+            weight_type="pweight",
+            return_dof=True,
+        )
+        assert np.all(np.isfinite(vcov)), "weighted HC2-BM vcov must be finite"
+        assert np.all(np.diag(vcov) > 0), "weighted HC2-BM vcov diag must be positive"
+        assert np.all(np.isfinite(dof)) and np.all(
+            dof > 0
+        ), "weighted HC2-BM DOF must be finite and positive"
 
 
 # =============================================================================
@@ -398,22 +404,29 @@ def test_solve_ols_rejects_cluster_plus_hc2(self):
         with pytest.raises(ValueError, match="hc2 is one-way only"):
             solve_ols(X, y, cluster_ids=cluster_ids, vcov_type="hc2")
 
-    def test_solve_ols_rejects_cluster_weights_hc2_bm(self):
+    def test_solve_ols_accepts_cluster_weights_hc2_bm(self):
+        """Weighted CR2 BM via solve_ols is supported post-clubSandwich port.
+
+        Smoke test only — numerical parity locked in
+        ``tests/test_methodology_wls_cr2.py``.
+        """
         rng = np.random.default_rng(3)
         n = 20
         X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
         y = X @ np.array([1.0, 0.5]) + rng.normal(0, 0.1, n)
         cluster_ids = np.arange(n) % 4
         weights = rng.uniform(0.5, 2.0, size=n)
-        with pytest.raises(NotImplementedError, match="weights"):
-            solve_ols(
-                X,
-                y,
-                cluster_ids=cluster_ids,
-                vcov_type="hc2_bm",
-                weights=weights,
-                weight_type="pweight",
-            )
+        coef, _resid, vcov = solve_ols(
+            X,
+            y,
+            cluster_ids=cluster_ids,
+            vcov_type="hc2_bm",
+            weights=weights,
+            weight_type="pweight",
+            return_vcov=True,
+        )
+        assert vcov is not None and np.all(np.isfinite(vcov))
+        assert np.all(np.diag(vcov) > 0)
 
     def test_linear_regression_rejects_cluster_plus_hc2(self):
         """LinearRegression is an estimator-level entry; it must also raise."""
@@ -554,10 +567,16 @@ def test_cr2_parity_with_golden(self):
             if name == "meta":
                 continue
             # Skip scenarios that don't fit this test's `y ~ x` two-column
-            # contract (e.g. `absorbed_fe_did` is a multi-column DiD design
-            # tested separately via the DiD estimator path in
-            # `test_estimators_vcov_type.py::TestDiDAbsorbedFERParity`).
-            if "x" not in d:
+            # unweighted-cluster contract. Absorbed-FE / MPD / TWFE / SA / WLS
+            # scenarios are tested separately via their own parity tests
+            # (e.g. `test_estimators_vcov_type.py::TestDiDAbsorbedFERParity`
+            # for absorbed_fe_did; `tests/test_methodology_wls_cr2.py` for
+            # the `weighted_*` scenarios).
+            if "x" not in d or "cluster" not in d or "vcov_shape" not in d:
+                continue
+            # The weighted-cluster fixtures use the same `cluster + x` keys
+            # but with `weights` present and a 3-column `[1, x, z]` design.
+            if "weights" in d:
                 continue
             x = np.array(d["x"])
             y = np.array(d["y"])
diff --git a/tests/test_methodology_wls_cr2.py b/tests/test_methodology_wls_cr2.py
new file mode 100644
index 00000000..ab19bc3d
--- /dev/null
+++ b/tests/test_methodology_wls_cr2.py
@@ -0,0 +1,884 @@
+"""Parity tests for WLS-CR2 Bell-McCaffrey against clubSandwich R goldens.
+
+This file pins `diff_diff/linalg.py`'s weighted CR2-BM port against
+`clubSandwich::vcovCR(lm(weights=w), type="CR2") + coef_test(test=
+"Satterthwaite")$df_Satt` plus the compound-contrast DOF from
+`Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom` (which on a 1-row
+constraint matrix reduces to a Satterthwaite t-test).
+
+Goldens are generated by `benchmarks/R/generate_clubsandwich_golden.R` (six
+new weighted scenarios appended in this PR; existing six unweighted scenarios
+unchanged for regression safety).
+
+Algorithm note: clubSandwich's WLS-CR2 algebra is NOT a textbook
+Pustejovsky-Tipton (2018) §3.3 transform-once derivation — the two diverge by
+0.5-30% on weighted designs (see `feedback_wls_cr2_clubsandwich_parity`).
+clubSandwich uses W (not sqrt(W)) in the hat matrix, W^2 in the bias-correction
+term, and unweighted residuals in the score construction. The Satterthwaite
+DOF uses the full H1/H2/H3 array construction (`get_arrays.R::get_GH`), NOT
+the simplified `(tr B)^2 / tr(B^2)` formula that works for the unweighted case.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+
+import numpy as np
+import pytest
+
+from diff_diff.linalg import (
+    _compute_bm_dof_oneway,
+    _compute_cr2_bm,
+    _compute_cr2_bm_contrast_dof,
+    _compute_hat_diagonals,
+)
+
+_FIXTURE_PATH = os.path.join(
+    os.path.dirname(__file__),
+    "..",
+    "benchmarks",
+    "data",
+    "clubsandwich_cr2_golden.json",
+)
+_R_FIXTURE_AVAILABLE = os.path.exists(_FIXTURE_PATH)
+
+
+@pytest.fixture(scope="module")
+def goldens():
+    if not _R_FIXTURE_AVAILABLE:
+        pytest.skip(
+            "clubsandwich_cr2_golden.json not generated; run "
+            "Rscript benchmarks/R/generate_clubsandwich_golden.R"
+        )
+    with open(_FIXTURE_PATH) as f:
+        return json.load(f)
+
+
+def _design_x_z(d):
+    """Build [1, x, z] design matrix from a golden scenario dict."""
+    return np.column_stack([np.ones(len(d["x"])), np.asarray(d["x"]), np.asarray(d["z"])])
+
+
+def _refit_wls(X, y, w):
+    """Refit WLS to obtain coef and residuals consistent with R's lm()."""
+    bread = X.T @ (X * w[:, None])
+    coef = np.linalg.solve(bread, X.T @ (w * y))
+    return coef, y - X @ coef, bread
+
+
+class TestWLSCR2ParityClubSandwich:
+    """Parity for weighted CR2 Bell-McCaffrey against clubSandwich's WLS-CR2."""
+
+    def test_one_way_weighted_hc2_bm_se_matches_clubsandwich(self, goldens):
+        d = goldens["weighted_one_way_no_cluster"]
+        X = _design_x_z(d)
+        y = np.asarray(d["y"])
+        w = np.asarray(d["weights"])
+        coef, resid, bread = _refit_wls(X, y, w)
+        _compute_hat_diagonals(X, bread, weights=w)
+        # One-way (no cluster) HC2-BM SE via the diff-diff dispatcher.
+        # Per the lifted gate, _compute_bm_dof_oneway returns the DOF; the SE
+        # is sqrt(diag(vcov)) where vcov is computed in `_compute_robust_vcov_numpy`.
+        # We pin both the vcov (via R's vcov_hc2_bm) and the DOF here.
+        r_vcov = np.asarray(d["vcov_hc2_bm"]).reshape(d["vcov_hc2_bm_shape"])
+        r_se = np.asarray(d["se_hc2_bm"])
+        r_dof = np.asarray(d["dof_bm_one_way"])
+
+        # Use the public dispatcher path
+        from diff_diff.linalg import compute_robust_vcov
+
+        vcov, dof = compute_robust_vcov(
+            X,
+            resid,
+            vcov_type="hc2_bm",
+            weights=w,
+            weight_type="pweight",
+            return_dof=True,
+        )
+        np.testing.assert_allclose(
+            vcov,
+            r_vcov,
+            atol=1e-10,
+            rtol=1e-10,
+            err_msg="weighted one-way HC2-BM vcov should match clubSandwich",
+        )
+        np.testing.assert_allclose(
+            np.sqrt(np.diag(vcov)),
+            r_se,
+            atol=1e-10,
+            err_msg="weighted one-way HC2-BM SE should match clubSandwich",
+        )
+        np.testing.assert_allclose(
+            dof, r_dof, atol=1e-10, err_msg="weighted one-way HC2-BM DOF should match clubSandwich"
+        )
+
+    def test_balanced_clusters_weighted_cr2_se(self, goldens):
+        d = goldens["weighted_balanced_clusters"]
+        X = _design_x_z(d)
+        y = np.asarray(d["y"])
+        w = np.asarray(d["weights"])
+        cluster_ids = np.asarray(d["cluster"])
+        coef, resid, bread = _refit_wls(X, y, w)
+        r_vcov = np.asarray(d["vcov_cr2"]).reshape(d["vcov_cr2_shape"])
+        r_dof = np.asarray(d["dof_bm"])
+
+        vcov, dof = _compute_cr2_bm(X, resid, cluster_ids, bread, weights=w)
+        np.testing.assert_allclose(vcov, r_vcov, atol=1e-10)
+        np.testing.assert_allclose(dof, r_dof, atol=1e-10)
+
+    def test_unbalanced_clusters_weighted_cr2_dof(self, goldens):
+        d = goldens["weighted_unbalanced_clusters"]
+        X = _design_x_z(d)
+        y = np.asarray(d["y"])
+        w = np.asarray(d["weights"])
+        cluster_ids = np.asarray(d["cluster"])
+        coef, resid, bread = _refit_wls(X, y, w)
+        r_vcov = np.asarray(d["vcov_cr2"]).reshape(d["vcov_cr2_shape"])
+        r_dof = np.asarray(d["dof_bm"])
+
+        vcov, dof = _compute_cr2_bm(X, resid, cluster_ids, bread, weights=w)
+        np.testing.assert_allclose(vcov, r_vcov, atol=1e-10)
+        np.testing.assert_allclose(dof, r_dof, atol=1e-10)
+
+    def test_singletons_present_adversarial(self, goldens):
+        """Adversarial case: PT2018 transform-once had ~30% gap here. The
+        clubSandwich algebra must match clubSandwich at 1e-10."""
+        d = goldens["weighted_singletons_present"]
+        X = _design_x_z(d)
+        y = np.asarray(d["y"])
+        w = np.asarray(d["weights"])
+        cluster_ids = np.asarray(d["cluster"])
+        coef, resid, bread = _refit_wls(X, y, w)
+        r_vcov = np.asarray(d["vcov_cr2"]).reshape(d["vcov_cr2_shape"])
+        r_dof = np.asarray(d["dof_bm"])
+
+        vcov, dof = _compute_cr2_bm(X, resid, cluster_ids, bread, weights=w)
+        np.testing.assert_allclose(
+            vcov,
+            r_vcov,
+            atol=1e-10,
+            err_msg="adversarial singleton-cluster case must match clubSandwich",
+        )
+        np.testing.assert_allclose(dof, r_dof, atol=1e-10)
+
+
+class TestWLSCR2ContrastDOF:
+    """Compound-contrast Satterthwaite DOF parity (MPD avg_att integration)."""
+
+    def test_avg_att_compound_contrast_dof_matches_wald_test_htz(self, goldens):
+        d = goldens["weighted_mpd_avg_att_dof"]
+        # The MPD scenario has a large design matrix (28+ columns) with a
+        # rank-deficient column R drops via NA (treated lies in the span of
+        # the intercept + non-reference unit dummies for treated cohorts).
+        # Build the rank-reduced design matching `finite_coef_names` from R.
+        n = len(d["y"])
+        period_arr = np.asarray(d["period"])
+        unit_arr = np.asarray(d["unit"])
+        treated_arr = np.asarray(d["treated"])
+        finite_names = d["finite_coef_names"]
+        sorted(np.unique(period_arr).tolist())
+        sorted(np.unique(unit_arr).tolist())
+
+        def _make_column(name):
+            if name == "(Intercept)":
+                return np.ones(n)
+            if name == "treated":
+                return treated_arr.astype(float)
+            if name.startswith("period_f"):
+                p = int(name[len("period_f") :])
+                return (period_arr == p).astype(float)
+            if name.startswith("treated_period_"):
+                p = int(name[len("treated_period_") :])
+                return (treated_arr * (period_arr == p)).astype(float)
+            if name.startswith("factor(unit)"):
+                u = int(name[len("factor(unit)") :])
+                return (unit_arr == u).astype(float)
+            raise ValueError(f"Unknown column name: {name}")
+
+        X = np.column_stack([_make_column(name) for name in finite_names])
+        np.asarray(d["y"])
+        w = np.asarray(d["weights"])
+        cluster_ids = np.asarray(d["cluster"])
+
+        bread = X.T @ (X * w[:, None])
+        # R's `c_avg` is already subsetted to finite (non-NA) columns.
+        contrasts = np.asarray(d["c_avg"])[:, None]
+
+        r_dof_avg = float(d["dof_avg"])
+        dof = _compute_cr2_bm_contrast_dof(X, cluster_ids, bread, contrasts, weights=w)
+        np.testing.assert_allclose(
+            dof[0],
+            r_dof_avg,
+            atol=1e-10,
+            err_msg="weighted avg_att compound DOF should match Wald_test HTZ",
+        )
+
+
+class TestWLSCR2WeightTypeRejection:
+    """The clubSandwich WLS-CR2 port matches `pweight` only. Other weight types
+    must raise NotImplementedError rather than silently producing wrong inference.
+    Addresses Codex P0 finding (weight_type contract gap)."""
+
+    def test_aweight_clustered_hc2_bm_rejects(self):
+        rng = np.random.default_rng(101)
+        n, k = 30, 3
+        X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
+        y = X @ np.array([1.0, 0.5, 0.3]) + rng.normal(0, 0.5, n)
+        w = rng.uniform(0.5, 2.0, n)
+        cluster_ids = np.arange(n) % 5
+        from diff_diff.linalg import compute_robust_vcov
+
+        with pytest.raises(NotImplementedError, match="pweight"):
+            compute_robust_vcov(
+                X,
+                y - X @ np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y)),
+                cluster_ids=cluster_ids,
+                vcov_type="hc2_bm",
+                weights=w,
+                weight_type="aweight",
+            )
+
+    def test_fweight_clustered_hc2_bm_rejects(self):
+        rng = np.random.default_rng(102)
+        n, k = 30, 3
+        X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
+        y = X @ np.array([1.0, 0.5, 0.3]) + rng.normal(0, 0.5, n)
+        w = rng.integers(1, 4, size=n).astype(np.float64)
+        cluster_ids = np.arange(n) % 5
+        from diff_diff.linalg import compute_robust_vcov
+
+        with pytest.raises(NotImplementedError, match="pweight"):
+            compute_robust_vcov(
+                X,
+                y - X @ np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y)),
+                cluster_ids=cluster_ids,
+                vcov_type="hc2_bm",
+                weights=w,
+                weight_type="fweight",
+            )
+
+    def test_aweight_one_way_hc2_bm_rejects(self):
+        rng = np.random.default_rng(103)
+        n, k = 30, 3
+        X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
+        y = X @ np.array([1.0, 0.5, 0.3]) + rng.normal(0, 0.5, n)
+        w = rng.uniform(0.5, 2.0, n)
+        from diff_diff.linalg import compute_robust_vcov
+
+        with pytest.raises(NotImplementedError, match="pweight"):
+            compute_robust_vcov(
+                X,
+                y - X @ np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y)),
+                vcov_type="hc2_bm",
+                weights=w,
+                weight_type="aweight",
+            )
+
+    def test_pweight_clustered_hc2_bm_accepted(self):
+        """pweight (the matching convention) must NOT raise."""
+        rng = np.random.default_rng(104)
+        n, k = 30, 3
+        X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
+        y = X @ np.array([1.0, 0.5, 0.3]) + rng.normal(0, 0.5, n)
+        w = rng.uniform(0.5, 2.0, n)
+        cluster_ids = np.arange(n) % 5
+        from diff_diff.linalg import compute_robust_vcov
+
+        vcov, dof = compute_robust_vcov(
+            X,
+            y - X @ np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y)),
+            cluster_ids=cluster_ids,
+            vcov_type="hc2_bm",
+            weights=w,
+            weight_type="pweight",
+            return_dof=True,
+        )
+        assert np.all(np.isfinite(vcov)) and np.all(np.diag(vcov) > 0)
+        assert np.all(np.isfinite(dof)) and np.all(dof > 0)
+
+
+class TestWLSCR2ZeroWeightClusterRejection:
+    """Zero-total-weight clusters (subpopulation-style padding) must NOT pass
+    the `G>=2` check when only one effective cluster remains. Addresses Codex
+    P1 finding."""
+
+    def test_clustered_with_one_zero_weight_cluster_raises(self):
+        """A weighted CR2 fit where one cluster has all-zero weights and only
+        one other cluster has positive weight must raise."""
+        rng = np.random.default_rng(201)
+        n = 20
+        X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
+        y = X @ np.array([1.0, 0.5, 0.3]) + rng.normal(0, 0.5, n)
+        # Two clusters: 0 (positive weight) and 1 (all-zero weight).
+        cluster_ids = np.array([0] * 10 + [1] * 10)
+        w = np.concatenate([np.ones(10), np.zeros(10)])  # only cluster 0 has weight
+        bread = X.T @ (X * w[:, None])
+        # `bread_inv` would be rank-deficient if all-zero rows reduce design,
+        # but we can ensure positive bread by adding a third tiny-weight cluster.
+        cluster_ids = np.array([0] * 10 + [1] * 10)
+        # Reformulate: 2 unique clusters but only 1 has positive total weight
+        # (cluster 1 has w=0 for all its members).
+        coef = np.linalg.solve(bread, X.T @ (w * y))
+        resid = y - X @ coef
+        with pytest.raises(ValueError, match="positive total weight"):
+            _compute_cr2_bm(X, resid, cluster_ids, bread, weights=w)
+
+    def test_contrast_dof_clustered_with_one_zero_weight_cluster_raises(self):
+        """Same check for the contrast-DOF helper."""
+        rng = np.random.default_rng(202)
+        n = 20
+        X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
+        X @ np.array([1.0, 0.5, 0.3]) + rng.normal(0, 0.5, n)
+        cluster_ids = np.array([0] * 10 + [1] * 10)
+        w = np.concatenate([np.ones(10), np.zeros(10)])
+        bread = X.T @ (X * w[:, None])
+        contrasts = np.eye(X.shape[1])
+        with pytest.raises(ValueError, match="positive total weight"):
+            _compute_cr2_bm_contrast_dof(X, cluster_ids, bread, contrasts, weights=w)
+
+    def test_clustered_zero_weight_cluster_drops_silently_when_eff_clusters_remain(self):
+        """A zero-weight cluster amongst multiple positive-weight clusters
+        should be silently dropped (its scores contribute zero anyway). No raise."""
+        rng = np.random.default_rng(203)
+        # 4 clusters, last one zero-weight; should silently drop and use 3.
+        cluster_ids = np.array([0] * 5 + [1] * 5 + [2] * 5 + [3] * 5)
+        n = len(cluster_ids)
+        X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
+        y = X @ np.array([1.0, 0.5, 0.3]) + rng.normal(0, 0.5, n)
+        w = np.concatenate([np.ones(5), np.ones(5) * 1.5, np.ones(5) * 0.8, np.zeros(5)])
+        bread = X.T @ (X * w[:, None])
+        coef = np.linalg.solve(bread, X.T @ (w * y))
+        resid = y - X @ coef
+        vcov, dof = _compute_cr2_bm(X, resid, cluster_ids, bread, weights=w)
+        assert np.all(np.isfinite(vcov)) and np.all(np.diag(vcov) > 0)
+        assert np.all(np.isfinite(dof)) and np.all(dof > 0)
+
+
+class TestLinearRegressionWeightedClusterHC2BM:
+    """Regression for the R2 P0: `LinearRegression(vcov_type="hc2_bm")` on a
+    weighted clustered fit must report Satterthwaite DOF via `_bm_dof`, not
+    fall back to `n - k`. The previous skip in `fit()` left `_bm_dof = None`
+    on the weighted-cluster branch, so `get_inference()` silently used the
+    generic DOF and produced anti-conservative p-values / CIs."""
+
+    def test_lr_weighted_cluster_hc2_bm_df_matches_helper(self):
+        """`LinearRegression.get_inference()` df must match
+        `compute_robust_vcov(..., return_dof=True)` DOF, not `n - k`."""
+        from diff_diff.linalg import LinearRegression, compute_robust_vcov
+
+        rng = np.random.default_rng(301)
+        n = 40
+        X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
+        cluster_ids = np.arange(n) % 8  # 8 clusters
+        y = X @ np.array([1.0, 0.5, 0.3]) + rng.normal(0, 0.5, n)
+        w = rng.uniform(0.5, 2.0, n)
+
+        # Helper-level: directly compute the Satterthwaite DOF
+        coef = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
+        resid = y - X @ coef
+        _, helper_dof = compute_robust_vcov(
+            X,
+            resid,
+            cluster_ids=cluster_ids,
+            vcov_type="hc2_bm",
+            weights=w,
+            weight_type="pweight",
+            return_dof=True,
+        )
+
+        # Public surface: fit through LinearRegression. weights/cluster live
+        # on the constructor (not fit()), and the constructor needs
+        # include_intercept=False because X already has an intercept column.
+        lr = LinearRegression(
+            include_intercept=False,
+            vcov_type="hc2_bm",
+            cluster_ids=cluster_ids,
+            weights=w,
+            weight_type="pweight",
+        )
+        lr.fit(X, y)
+        # The Satterthwaite DOF must be threaded into get_inference(), NOT the
+        # n-k fallback. Helper DOF should be small (cluster-driven); fallback
+        # would be ~37 (n - k = 40 - 3).
+        # _bm_dof is the internal attribute populated by fit() and consumed by
+        # get_inference(); also check per-coefficient inference for thread.
+        assert lr._bm_dof is not None, (
+            "LinearRegression must populate _bm_dof on weighted-cluster path "
+            "(previous code skipped this branch)"
+        )
+        np.testing.assert_allclose(
+            lr._bm_dof,
+            helper_dof,
+            atol=1e-10,
+            err_msg="_bm_dof must match helper Satterthwaite DOF",
+        )
+        # Sanity: cluster-driven DOF should be much smaller than n - k.
+        assert np.all(np.asarray(lr._bm_dof) < n - X.shape[1]), (
+            f"Expected cluster-driven Satterthwaite DOF << n-k={n - X.shape[1]}, "
+            f"got {lr._bm_dof}"
+        )
+        # End-to-end: get_inference for each coefficient yields the right df.
+        for i in range(X.shape[1]):
+            inf_i = lr.get_inference(index=i)
+            assert (
+                inf_i.df == lr._bm_dof[i]
+            ), f"get_inference(index={i}).df should equal _bm_dof[{i}]"
+
+    def test_lr_weighted_oneway_hc2_bm_df_matches_helper(self):
+        """One-way weighted HC2-BM via LinearRegression must also thread DOF."""
+        from diff_diff.linalg import LinearRegression, compute_robust_vcov
+
+        rng = np.random.default_rng(302)
+        n = 25
+        X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
+        y = X @ np.array([1.0, 0.5, 0.3]) + rng.normal(0, 0.5, n)
+        w = rng.uniform(0.5, 2.0, n)
+
+        coef = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
+        resid = y - X @ coef
+        _, helper_dof = compute_robust_vcov(
+            X,
+            resid,
+            vcov_type="hc2_bm",
+            weights=w,
+            weight_type="pweight",
+            return_dof=True,
+        )
+
+        lr = LinearRegression(
+            include_intercept=False,
+            vcov_type="hc2_bm",
+            weights=w,
+            weight_type="pweight",
+        )
+        lr.fit(X, y)
+        assert lr._bm_dof is not None
+        np.testing.assert_allclose(
+            lr._bm_dof,
+            helper_dof,
+            atol=1e-10,
+            err_msg="One-way LinearRegression _bm_dof must match helper Satterthwaite DOF",
+        )
+
+
+class TestWLSCR2FEDoFNoiseGuard:
+    """Regression for R6 P1: weighted CR2-BM Satterthwaite DOF for high-
+    leverage FE-dummy contrasts (where the contrast vector projects to
+    near-zero on the design) is at the float64 noise floor and disagrees
+    with R's clubSandwich by 15-30% due to BLAS reduction-order differences.
+    The mitigation is to detect the noise floor and return NaN with a
+    UserWarning rather than ship arbitrarily-different small-sample DOF.
+
+    Other (non-noise-floor) coefficients still match clubSandwich exactly.
+    """
+
+    def test_did_absorbed_fe_nan_guards_treated_unit_dummies(self, goldens):
+        """The weighted_did_absorbed_fe scenario: treated-unit dummies are at
+        the noise floor (vcov is fine, but DOF formula is noise/noise). NaN-
+        guard fires; non-noise-floor coefs still match R."""
+        import warnings
+
+        d = goldens["weighted_did_absorbed_fe"]
+        n = len(d["y"])
+        unit_int = np.asarray(d["unit"])
+        period_int = np.asarray(d["period"])
+        treat_post = np.asarray(d["treat_post"])
+        y = np.asarray(d["y"])
+        w = np.asarray(d["weights"])
+        cluster = unit_int
+
+        cols = [np.ones(n), treat_post.astype(float)]
+        col_names = ["(Intercept)", "treat_post"]
+        for u in sorted(np.unique(unit_int))[1:]:
+            cols.append((unit_int == u).astype(float))
+            col_names.append(f"unit{u}")
+        for p in sorted(np.unique(period_int))[1:]:
+            cols.append((period_int == p).astype(float))
+            col_names.append(f"period{p}")
+        X = np.column_stack(cols)
+
+        bread = X.T @ (X * w[:, None])
+        coef = np.linalg.solve(bread, X.T @ (w * y))
+        resid = y - X @ coef
+
+        with warnings.catch_warnings(record=True) as ws:
+            warnings.simplefilter("always")
+            vcov, dof = _compute_cr2_bm(X, resid, cluster, bread, weights=w)
+
+        # VCOV matches R at machine precision.
+        r_vcov = np.asarray(d["vcov_cr2"]).reshape(d["vcov_cr2_shape"])
+        np.testing.assert_allclose(vcov, r_vcov, atol=1e-10)
+
+        # DOF: non-noise-floor coefs match R at 1e-10; treated-unit dummies
+        # (unit2/3/4 — indices 2, 3, 4 in column ordering) are NaN.
+        r_dof = np.asarray(d["dof_cr2"])
+        nan_mask = np.isnan(dof)
+        # The 3 treated-unit dummies should be flagged as noise-floor.
+        treated_unit_idx = [col_names.index(f"unit{u}") for u in [2, 3, 4]]
+        for idx in treated_unit_idx:
+            assert nan_mask[idx], f"Expected NaN at noise-floor index {idx} ({col_names[idx]})"
+        # All other coefs match R exactly.
+        finite_idx = ~nan_mask
+        np.testing.assert_allclose(
+            dof[finite_idx],
+            r_dof[finite_idx],
+            atol=1e-10,
+            err_msg="Non-noise-floor coefficients must match clubSandwich at atol=1e-10",
+        )
+        # Warning fired.
+        noise_warnings = [w for w in ws if "noise floor" in str(w.message)]
+        assert (
+            len(noise_warnings) == 1
+        ), f"Expected 1 noise-floor UserWarning, got {len(noise_warnings)}"
+
+
+class TestWLSCR2SubpopulationInvariance:
+    """Regression for CI codex R8 P0 (round 2): weighted clustered CR2 must
+    be subpopulation-invariant on mixed-zero clusters. Previously, zero-weight
+    rows inside positive-weight clusters still entered the CR2 adjustment
+    matrices (H_gg, G_g, A_g, bias_term), silently changing vcov/DOF. The fix
+    physically filters `weights > 0` rows before building any per-cluster
+    matrix. This test pins parity between (a) computing on the full design
+    with zero-weight padding and (b) computing on the physically dropped
+    positive-weight subset."""
+
+    def test_per_coefficient_dof_invariant_to_zero_weight_padding(self, goldens):
+        """`_compute_cr2_bm` on (X, w) with mixed-zero padding rows must yield
+        identical vcov + per-coefficient DOF to running on (X_pos, w_pos)
+        with zero-weight rows physically removed."""
+        # Use a simple weighted clustered fixture.
+        rng = np.random.default_rng(401)
+        n_clusters, cluster_size = 5, 6
+        n_pos = n_clusters * cluster_size
+        X_pos = np.column_stack([np.ones(n_pos), rng.normal(size=n_pos), rng.normal(size=n_pos)])
+        y_pos = X_pos @ np.array([1.0, 0.5, 0.3]) + rng.normal(0, 0.5, n_pos)
+        w_pos = rng.uniform(0.5, 2.5, n_pos)
+        cluster_pos = np.repeat(np.arange(n_clusters), cluster_size)
+
+        # Pad with 8 zero-weight rows, sprinkled across existing clusters.
+        n_pad = 8
+        X_pad = rng.normal(size=(n_pad, 3))
+        X_pad[:, 0] = 1.0  # intercept
+        y_pad = rng.normal(size=n_pad)
+        w_pad = np.zeros(n_pad)
+        cluster_pad = rng.integers(0, n_clusters, size=n_pad)
+        # Interleave (shuffle order to verify it's not order-dependent).
+        order = rng.permutation(n_pos + n_pad)
+        X_full = np.vstack([X_pos, X_pad])[order]
+        y_full = np.concatenate([y_pos, y_pad])[order]
+        w_full = np.concatenate([w_pos, w_pad])[order]
+        cluster_full = np.concatenate([cluster_pos, cluster_pad])[order]
+
+        # Refit WLS on both designs. Bread is invariant to zero-weight rows.
+        bread = X_full.T @ (X_full * w_full[:, None])
+        coef = np.linalg.solve(bread, X_full.T @ (w_full * y_full))
+        resid_full = y_full - X_full @ coef
+        resid_pos = y_pos - X_pos @ coef
+        bread_pos = X_pos.T @ (X_pos * w_pos[:, None])
+        np.testing.assert_allclose(bread, bread_pos, atol=1e-12)
+
+        vcov_full, dof_full = _compute_cr2_bm(
+            X_full, resid_full, cluster_full, bread, weights=w_full
+        )
+        vcov_pos, dof_pos = _compute_cr2_bm(X_pos, resid_pos, cluster_pos, bread_pos, weights=w_pos)
+        np.testing.assert_allclose(
+            vcov_full,
+            vcov_pos,
+            atol=1e-12,
+            err_msg="vcov must be invariant to zero-weight padding rows",
+        )
+        np.testing.assert_allclose(
+            dof_full,
+            dof_pos,
+            atol=1e-12,
+            err_msg="DOF must be invariant to zero-weight padding rows",
+        )
+
+    def test_contrast_dof_invariant_to_zero_weight_padding(self, goldens):
+        """Same invariance for `_compute_cr2_bm_contrast_dof`."""
+        rng = np.random.default_rng(402)
+        n_clusters, cluster_size = 5, 6
+        n_pos = n_clusters * cluster_size
+        X_pos = np.column_stack([np.ones(n_pos), rng.normal(size=n_pos), rng.normal(size=n_pos)])
+        w_pos = rng.uniform(0.5, 2.5, n_pos)
+        cluster_pos = np.repeat(np.arange(n_clusters), cluster_size)
+
+        n_pad = 6
+        X_pad = rng.normal(size=(n_pad, 3))
+        X_pad[:, 0] = 1.0
+        w_pad = np.zeros(n_pad)
+        cluster_pad = rng.integers(0, n_clusters, size=n_pad)
+        order = rng.permutation(n_pos + n_pad)
+        X_full = np.vstack([X_pos, X_pad])[order]
+        w_full = np.concatenate([w_pos, w_pad])[order]
+        cluster_full = np.concatenate([cluster_pos, cluster_pad])[order]
+
+        bread = X_full.T @ (X_full * w_full[:, None])
+        bread_pos = X_pos.T @ (X_pos * w_pos[:, None])
+
+        # Compound contrast: average of x and z coefficients (indices 1 and 2).
+        contrasts = np.array([[0.0], [0.5], [0.5]])
+        dof_full = _compute_cr2_bm_contrast_dof(
+            X_full, cluster_full, bread, contrasts, weights=w_full
+        )
+        dof_pos = _compute_cr2_bm_contrast_dof(
+            X_pos, cluster_pos, bread_pos, contrasts, weights=w_pos
+        )
+        np.testing.assert_allclose(
+            dof_full,
+            dof_pos,
+            atol=1e-12,
+            err_msg="Contrast DOF must be invariant to zero-weight padding rows",
+        )
+
+
+class TestWLSCR2SingleContrastNoiseFloor:
+    """Regression for CI codex R8 P2: the noise-floor NaN-guard must trigger
+    on single-contrast calls to `_compute_cr2_bm_contrast_dof` too, not just
+    batched per-coefficient calls. The earlier batch-relative-only criterion
+    couldn't catch single contrasts because `max|P|_overall == contrast's own
+    scale` for m=1. The absolute (eps × n × k × bread_scale)² floor closes
+    the gap."""
+
+    def test_single_fe_dummy_contrast_triggers_nan_guard(self, goldens):
+        """A direct single-contrast call for a high-leverage FE-dummy unit
+        dummy must return NaN, not a BLAS-dependent finite DOF."""
+        import warnings
+
+        d = goldens["weighted_did_absorbed_fe"]
+        n = len(d["y"])
+        unit_int = np.asarray(d["unit"])
+        period_int = np.asarray(d["period"])
+        treat_post = np.asarray(d["treat_post"])
+        np.asarray(d["y"])
+        w = np.asarray(d["weights"])
+        cluster = unit_int
+
+        cols = [np.ones(n), treat_post.astype(float)]
+        col_names = ["(Intercept)", "treat_post"]
+        for u in sorted(np.unique(unit_int))[1:]:
+            cols.append((unit_int == u).astype(float))
+            col_names.append(f"unit{u}")
+        for p in sorted(np.unique(period_int))[1:]:
+            cols.append((period_int == p).astype(float))
+            col_names.append(f"period{p}")
+        X = np.column_stack(cols)
+        bread = X.T @ (X * w[:, None])
+
+        # Single contrast = unit2 dummy (the noise-floor case from the
+        # weighted_did_absorbed_fe fixture).
+        unit2_idx = col_names.index("unit2")
+        c_unit2 = np.zeros(X.shape[1])
+        c_unit2[unit2_idx] = 1.0
+        contrasts = c_unit2[:, None]  # (k, 1) — single contrast
+
+        with warnings.catch_warnings(record=True) as ws:
+            warnings.simplefilter("always")
+            dof = _compute_cr2_bm_contrast_dof(X, cluster, bread, contrasts, weights=w)
+        assert np.isnan(dof[0]), (
+            "Single weighted FE-dummy contrast must trigger absolute noise-"
+            "floor guard; got finite DOF instead."
+        )
+        noise_warnings = [w for w in ws if "noise floor" in str(w.message)]
+        assert (
+            len(noise_warnings) == 1
+        ), f"Expected 1 noise-floor UserWarning, got {len(noise_warnings)}"
+
+    def test_single_non_noise_contrast_returns_finite_dof(self, goldens):
+        """Sanity check: a single contrast that's NOT at the noise floor
+        (e.g., treat_post) must return finite DOF, not NaN."""
+        d = goldens["weighted_did_absorbed_fe"]
+        n = len(d["y"])
+        unit_int = np.asarray(d["unit"])
+        period_int = np.asarray(d["period"])
+        treat_post = np.asarray(d["treat_post"])
+        np.asarray(d["y"])
+        w = np.asarray(d["weights"])
+        cluster = unit_int
+
+        cols = [np.ones(n), treat_post.astype(float)]
+        col_names = ["(Intercept)", "treat_post"]
+        for u in sorted(np.unique(unit_int))[1:]:
+            cols.append((unit_int == u).astype(float))
+            col_names.append(f"unit{u}")
+        for p in sorted(np.unique(period_int))[1:]:
+            cols.append((period_int == p).astype(float))
+            col_names.append(f"period{p}")
+        X = np.column_stack(cols)
+        bread = X.T @ (X * w[:, None])
+
+        # Single contrast = treat_post (NOT a noise-floor contrast).
+        treat_post_idx = col_names.index("treat_post")
+        c_treat = np.zeros(X.shape[1])
+        c_treat[treat_post_idx] = 1.0
+        contrasts = c_treat[:, None]
+
+        dof = _compute_cr2_bm_contrast_dof(X, cluster, bread, contrasts, weights=w)
+        assert np.isfinite(dof[0]) and dof[0] > 0, (
+            f"Single non-noise contrast should return finite positive DOF; " f"got {dof[0]}"
+        )
+        # Sanity: should match R's df_satt for treat_post (3.754) at atol=1e-10.
+        r_dof_treat_post = float(np.asarray(d["dof_cr2"])[treat_post_idx])
+        np.testing.assert_allclose(dof[0], r_dof_treat_post, atol=1e-10)
+
+
+class TestLinearRegressionFENanGuardEndToEnd:
+    """Regression for R7 P0: NaN BM DOF from the noise-floor guard must
+    propagate to NaN inference fields in `LinearRegression.get_inference()`,
+    not fall through to normal-theory inference (which would silently emit
+    huge t-stats and zero-width CIs for the affected coefficients)."""
+
+    def test_did_absorbed_fe_lr_inference_nan_for_guarded_coefs(self, goldens):
+        from diff_diff.linalg import LinearRegression
+
+        d = goldens["weighted_did_absorbed_fe"]
+        n = len(d["y"])
+        unit_int = np.asarray(d["unit"])
+        period_int = np.asarray(d["period"])
+        treat_post = np.asarray(d["treat_post"])
+        y = np.asarray(d["y"])
+        w = np.asarray(d["weights"])
+        cluster = unit_int
+
+        cols = [np.ones(n), treat_post.astype(float)]
+        col_names = ["(Intercept)", "treat_post"]
+        for u in sorted(np.unique(unit_int))[1:]:
+            cols.append((unit_int == u).astype(float))
+            col_names.append(f"unit{u}")
+        for p in sorted(np.unique(period_int))[1:]:
+            cols.append((period_int == p).astype(float))
+            col_names.append(f"period{p}")
+        X = np.column_stack(cols)
+
+        lr = LinearRegression(
+            include_intercept=False,
+            vcov_type="hc2_bm",
+            cluster_ids=cluster,
+            weights=w,
+            weight_type="pweight",
+        )
+        lr.fit(X, y)
+
+        # Guarded coefficients: treated-unit dummies (unit2/3/4).
+        treated_unit_idx = [col_names.index(f"unit{u}") for u in [2, 3, 4]]
+        for i in treated_unit_idx:
+            inf_i = lr.get_inference(index=i)
+            assert np.isnan(inf_i.t_stat), (
+                f"NaN guard must produce NaN t_stat at index {i} ({col_names[i]}); "
+                f"got {inf_i.t_stat}"
+            )
+            assert np.isnan(
+                inf_i.p_value
+            ), f"NaN guard must produce NaN p_value at index {i}; got {inf_i.p_value}"
+            assert all(
+                np.isnan(b) for b in inf_i.conf_int
+            ), f"NaN guard must produce NaN conf_int at index {i}; got {inf_i.conf_int}"
+            # SE and coefficient remain valid (vcov matches at machine precision).
+            assert np.isfinite(inf_i.se) and inf_i.se > 0
+            assert np.isfinite(inf_i.coefficient)
+
+        # Non-guarded coefficients still emit finite inference.
+        non_guarded_idx = [i for i in range(X.shape[1]) if i not in treated_unit_idx]
+        for i in non_guarded_idx:
+            inf_i = lr.get_inference(index=i)
+            assert np.isfinite(inf_i.t_stat), (
+                f"Non-guarded coef index {i} ({col_names[i]}) should have finite t_stat; "
+                f"got {inf_i.t_stat}"
+            )
+            assert np.isfinite(inf_i.p_value)
+            assert all(np.isfinite(b) for b in inf_i.conf_int)
+
+
+class TestUnweightedRegressionStillBitEqual:
+    """Regression safety: existing unweighted goldens must still match bit-equal."""
+
+    def test_unweighted_balanced_small_unchanged(self, goldens):
+        d = goldens["balanced_small"]
+        X = np.column_stack([np.ones(len(d["x"])), np.asarray(d["x"])])
+        y = np.asarray(d["y"])
+        cluster_ids = np.asarray(d["cluster"])
+        bread = X.T @ X
+        coef = np.linalg.solve(bread, X.T @ y)
+        resid = y - X @ coef
+        r_vcov = np.asarray(d["vcov_cr2"]).reshape(d["vcov_shape"])
+        r_dof = np.asarray(d["dof_bm"])
+
+        # Explicitly pass weights=None (the regression-safety path).
+        vcov, dof = _compute_cr2_bm(X, resid, cluster_ids, bread, weights=None)
+        # Pin at atol=1e-12 (existing goldens at atol=1e-10; bit-equal at 1e-14).
+        np.testing.assert_allclose(
+            vcov,
+            r_vcov,
+            atol=1e-12,
+            err_msg="unweighted CR2-BM vcov must remain bit-equal to prior",
+        )
+        np.testing.assert_allclose(
+            dof, r_dof, atol=1e-12, err_msg="unweighted CR2-BM DOF must remain bit-equal to prior"
+        )
+
+
+class TestDOFFormulaDualPathEquivalence:
+    """Dual-path equivalence: the existing simple (tr B)^2 / tr(B^2) formula
+    must agree with the new full P_array form at the unweighted limit.
+
+    This addresses the reviewer's MEDIUM #2 concern about parallel DOF formulas
+    drifting over time. The new P_array form is only used when weights is
+    not None; the unweighted path keeps the simple formula. At the unweighted
+    limit (weights=ones), both formulas mathematically reduce to the same
+    Pustejovsky-Tipton (2018) Theorem 1 value, and this test pins their
+    numerical agreement.
+    """
+
+    def test_unweighted_dof_path_equivalence(self, goldens):
+        d = goldens["unbalanced_medium"]
+        X = np.column_stack([np.ones(len(d["x"])), np.asarray(d["x"])])
+        y = np.asarray(d["y"])
+        cluster_ids = np.asarray(d["cluster"])
+        bread = X.T @ X
+        coef = np.linalg.solve(bread, X.T @ y)
+        resid = y - X @ coef
+
+        # Path A: unweighted simple formula (weights=None).
+        _, dof_simple = _compute_cr2_bm(X, resid, cluster_ids, bread, weights=None)
+        # Path B: P_array formula via weights=ones (forces weighted branch).
+        n = X.shape[0]
+        w_ones = np.ones(n)
+        _, dof_p_array = _compute_cr2_bm(X, resid, cluster_ids, bread, weights=w_ones)
+        np.testing.assert_allclose(
+            dof_simple,
+            dof_p_array,
+            atol=1e-12,
+            err_msg=(
+                "At the unweighted limit, the simple (tr B)^2/tr(B^2) and the "
+                "full P_array formulas must agree. Drift indicates the P_array "
+                "implementation has a bug (likely chol direction or M_U "
+                "normalization)."
+            ),
+        )
+
+    def test_unweighted_one_way_dof_path_equivalence(self, goldens):
+        d = goldens["singletons_present"]
+        X = np.column_stack([np.ones(len(d["x"])), np.asarray(d["x"])])
+        y = np.asarray(d["y"])
+        bread = X.T @ X
+        coef = np.linalg.solve(bread, X.T @ y)
+        y - X @ coef
+        h_diag = _compute_hat_diagonals(X, bread, weights=None)
+
+        # Path A: unweighted simple formula.
+        dof_simple = _compute_bm_dof_oneway(X, bread, h_diag, weights=None)
+        # Path B: P_array via singleton-cluster reduction with weights=ones.
+        n = X.shape[0]
+        w_ones = np.ones(n)
+        h_diag_ones = _compute_hat_diagonals(X, bread, weights=w_ones)
+        dof_p_array = _compute_bm_dof_oneway(X, bread, h_diag_ones, weights=w_ones)
+        np.testing.assert_allclose(
+            dof_simple,
+            dof_p_array,
+            atol=1e-12,
+            err_msg=(
+                "One-way unweighted DOF: simple formula must agree with the "
+                "P_array singleton-cluster reduction at the unweighted limit."
+            ),
+        )