docs: HAD ecosystem completion (RTD audit Batch A) #389

Merged
igerber merged 9 commits into main from docs-rtd-audit
Apr 26, 2026

Conversation

@igerber
Owner

@igerber igerber commented Apr 26, 2026

Summary

Closes the gaps left after PR #372 added HeterogeneousAdoptionDiD to the canonical surfaces. The narrative pages did not yet mention HAD, and the 12-symbol HAD pretest suite shipped in had_pretests.py was absent from the API page. Also refreshes the inference-contract block to use the survey_design= canonical kwarg consolidated in PR #376.

  • docs/api/had.rst — new HAD Pretests section covering all 12 public symbols (4 single-period tests + 4 result classes + 3 joint tests + 1 joint result), split into aggregate="overall" and aggregate="event_study" subsections matching the workflow's dispatch. Refreshes the inference-contract block to reference survey_design=make_pweight_design(weights) (pweight shortcut) and survey_design=SurveyDesign(...) (full TSL); notes survey= / weights= are deprecated aliases.
  • docs/choosing_estimator.rst — HAD entries in all 3 tables (Quick Reference, Standard Error Methods, Survey Design Support) plus a new "Universal Rollout / No Untreated Control" subsection in Detailed Guidance. SE Methods row uses survey_design= canonical naming.
  • docs/r_comparison.rst — HAD row in Feature Comparison Table, new "No-Untreated Designs (no R parallel)" subsection, Migration Tips bullet.
  • docs/troubleshooting.rst — new HAD Issues section with 4 subsections (estimand resolution / mass-point fallback / classical SE under survey_design= / panel-only event-study).
  • docs/practitioner_decision_tree.rst — Start Here option 7, At a Glance row, new "Universal Rollout" section with _section-no-untreated anchor.
  • docs/references.rst — Binder citation references the canonical survey_design= paths.
  • docs/doc-deps.yaml — extends had_pretests.py entry with llms.txt user-guide dep; adds new top-level local_linear.py entry.

Methodology references (required if estimator / math changes)

  • N/A - no methodology changes (docs-only PR; no source files modified).

Validation

  • Tests added/updated: No test changes (docs-only).
  • Sphinx build: succeeds (python -m sphinx -b html -q docs docs/_build/html); 0 new warnings from this PR (71 pre-existing in bacon.py / staggered_bootstrap.py autosummary unaffected).
  • Public API regression: all 12 HAD pretest symbols importable (HADPretestReport, qug_test, stute_test, yatchew_hr_test, did_had_pretest_workflow, QUGTestResults, StuteTestResults, YatchewTestResults, stute_joint_pretest, joint_pretrends_test, joint_homogeneity_test, StuteJointResult); make_pweight_design + SurveyDesign importable.
  • Rendered HTML coverage: api/had.html 276 hits, narrative pages 4-8 hits each.
  • Cross-reference resolution: _section-no-untreated anchor (definition + cross-ref).
  • Style: 0 em dashes in diff.
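The rendered-HTML coverage and anchor checks above can be sketched as a small scan over built pages. This is an illustrative stand-in only: the HTML fragment, helper names, and the exact anchor id Sphinx emits are assumptions, and the real check runs against `docs/_build/html` after the Sphinx build.

```python
# Hypothetical sketch of the rendered-HTML coverage check; the page
# fragment below stands in for a built Sphinx page.
import re

def coverage_hits(html: str, term: str) -> int:
    """Count case-insensitive occurrences of a term in rendered HTML."""
    return len(re.findall(re.escape(term), html, flags=re.IGNORECASE))

def anchor_defined(html: str, anchor_id: str) -> bool:
    """Check that a section anchor was emitted as an id attribute."""
    return f'id="{anchor_id}"' in html

page = ('<section id="section-no-untreated"><h2>HAD</h2>'
        '<p>HAD pretests and the HAD workflow.</p></section>')
print(coverage_hits(page, "HAD"))                      # 3
print(anchor_defined(page, "section-no-untreated"))    # True
```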

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

@github-actions

Overall Assessment

⚠️ Needs changes — I found two unmitigated P1 issues on the newly documented HAD public surface.

Executive Summary

  • The new HAD Pretests API section itself looks aligned with the exported 12-symbol source surface.
  • P1: the PR rewrites the HAD estimator’s pweight/SE contract around survey_design=make_pweight_design(weights), but HeterogeneousAdoptionDiD.fit() explicitly rejects that object and still uses deprecated weights= for the pweight shortcut.
  • P1: several new HAD snippets are not runnable because they use nonexistent kwargs/results attributes (outcome, unit, time, dose, results.coef, results.fit_path) instead of the actual HAD API.
  • P2: the new R-comparison text is stale as of April 26, 2026; R now has HAD-oriented package entry points, so “no R parallel” is too strong. (cran.r-universe.dev)

Methodology

  1. Severity: P1. Impact: affected method is HeterogeneousAdoptionDiD’s weighting/variance contract. docs/api/had.rst:L49-L60, docs/api/had.rst:L67-L70, docs/choosing_estimator.rst:L665-L667, and docs/troubleshooting.rst:L535-L540 now document survey_design=make_pweight_design(weights) as the estimator-side pweight route. That does not match the implementation or the Methodology Registry: data-in HAD surfaces take survey_design=SurveyDesign(...), the deprecated weights= path is still the pweight shortcut, and make_pweight_design(...) is reserved for array-in pretests. Worse, the in-code deprecation text says weights= and survey_design=SurveyDesign(...) currently produce different SE families on HeterogeneousAdoptionDiD.fit(), so the rewritten docs also blur a real variance-family distinction. Cross-check: docs/methodology/REGISTRY.md:L2352-L2353, diff_diff/had.py:L2853-L2891, diff_diff/had.py:L2946-L2967, diff_diff/survey.py:L681-L697, diff_diff/survey.py:L798-L808. Concrete fix: restore the estimator docs to the implemented contract: weights= remains the deprecated pweight shortcut on fit(), survey_design=SurveyDesign(weights='col_name', ...) is the data-in survey route, and make_pweight_design() should only be documented on array-in pretest helpers.

Code Quality

No findings in the changed files.

Performance

No findings in the changed files.

Maintainability

No findings in the changed files.

Tech Debt

No findings. I also did not see any changed-file issue that is already mitigated by TODO.md.

Security

No findings.

Documentation/Tests

  1. Severity: P1. Impact: the new HAD examples are not runnable as written. The added snippets call did_had_pretest_workflow(..., outcome=..., unit=..., time=..., dose=...) and HeterogeneousAdoptionDiD.fit(..., outcome=..., unit=..., time=..., dose=...), but the actual signatures are outcome_col, dose_col, time_col, unit_col (diff_diff/had_pretests.py:L4015-L4030, diff_diff/had.py:L2786-L2804). They also print results.coef and results.fit_path, while the HAD result objects expose att and design (diff_diff/had.py:L315-L339, diff_diff/had.py:L667-L690). Affected docs: docs/choosing_estimator.rst:L393-L405, docs/practitioner_decision_tree.rst:L285-L299, docs/r_comparison.rst:L234-L238, docs/troubleshooting.rst:L500-L526. Concrete fix: update all new HAD snippets to use the real kwarg names and result attributes (outcome_col, dose_col, time_col, unit_col, results.att, results.design).

  2. Severity: P2. Impact: docs/r_comparison.rst:L216-L230, docs/r_comparison.rst:L399-L415, and docs/r_comparison.rst:L433-L436 say HAD has “no R parallel” and that the paper has not been packaged in R. As of April 26, 2026, the current package manuals already show DIDHAD and DIDmultiplegt(mode="had"); the R surface appears narrower/QUG-oriented, but it is not accurate to say there is no R counterpart at all. Concrete fix: narrow the claim to the actual gap, e.g. that diff-diff covers a broader HAD surface than the current R packages rather than “no R parallel.” (cran.r-universe.dev)

Not run here: Sphinx build/tests, due to the read-only sandbox.

Path to Approval

  1. Revert the HAD estimator SE-contract docs to the implemented split between deprecated weights= and survey_design=SurveyDesign(...), and remove survey_design=make_pweight_design(weights) from estimator-surface guidance.
  2. Fix every newly added HAD snippet to use the real call signature and result fields (*_col, att, design).
  3. Update docs/r_comparison.rst to reflect the current R ecosystem and describe the remaining gap precisely instead of saying there is no R parallel.

P1 (methodology contract): the inference-contract block, choosing-estimator
SE Methods row, and troubleshooting "classical SE under survey" subsection
incorrectly described `survey_design=make_pweight_design(weights)` as the
estimator-side pweight shortcut. Per `survey.py:681-697` and
`had.py:2853-2891` that helper is reserved for ARRAY-IN HAD pretest helpers
(`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `qug_test`); on
the data-in `HeterogeneousAdoptionDiD.fit` surface the deprecated
`weights=np.ndarray` shortcut is the actual pweight route, and it currently
yields a different SE family than `survey_design=SurveyDesign(...)`:
`weights=` -> `variance_formula="pweight"` / `"pweight_2sls"` (CCT-2014 /
2SLS pweight-sandwich); `survey_design=SurveyDesign(...)` ->
`"survey_binder_tsl"` / `"survey_binder_tsl_2sls"` (Binder-TSL). The
unification onto a single SE contract is queued for the next minor release.

- `docs/api/had.rst` inference-contract block: restore `weights=` shortcut
  (deprecated) and `survey_design=SurveyDesign(weights="col", ...)` as the
  two distinct weighted regimes; spell out the SE-family difference and
  the next-minor unification; add a separate paragraph that documents
  `make_pweight_design()` correctly as the pweight-only convenience for
  the array-in pretest helpers.
- `docs/api/had.rst` mass-point classical deviation: cband+event_study
  `NotImplementedError` fires on the deprecated `weights=` shortcut, not
  on `survey_design=make_pweight_design(...)`.
- `docs/choosing_estimator.rst` SE Methods row: same restoration; spells
  out the variance_formula values and notes the next-minor unification.
- `docs/troubleshooting.rst` "classical SE under survey": subsection
  `NotImplementedError` description corrected to `survey_design=SurveyDesign(...)`
  + deprecated `weights=` shortcut.
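The two weighted SE regimes described above can be restated as a tiny dispatch table. This is an illustration only, not the real logic inside `HeterogeneousAdoptionDiD.fit()`; the function name is a stand-in, while the `variance_formula` labels come from the contract quoted above.

```python
# Illustrative-only restatement of the documented weighted-SE split:
# weights= (deprecated pweight shortcut) vs survey_design=SurveyDesign(...).

def weighted_variance_formula(route, two_stage=False):
    """Map a weighting route to its documented variance_formula label."""
    if route == "weights":  # deprecated pweight shortcut (CCT-2014 family)
        return "pweight_2sls" if two_stage else "pweight"
    if route == "survey_design":  # canonical Binder-TSL route
        return "survey_binder_tsl_2sls" if two_stage else "survey_binder_tsl"
    raise ValueError(f"unknown weighting route: {route!r}")

print(weighted_variance_formula("weights"))                        # pweight
print(weighted_variance_formula("survey_design", two_stage=True))  # survey_binder_tsl_2sls
```

The point of the split, per the commit message, is that the two routes currently yield different SE families; unification is queued for the next minor release.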

P1 (snippet correctness): the new HAD code snippets used `outcome=`,
`unit=`, `time=`, `dose=`, `results.coef`, `results.fit_path` - the actual
HAD signature uses `outcome_col`, `unit_col`, `time_col`, `dose_col`
(`had.py:2786-2804`, `had_pretests.py:3741+`), and the result objects
expose `att` and `design` (`had.py:316, 324, 669, 679`). Fixed across:

- `docs/practitioner_decision_tree.rst` Universal Rollout snippet
  (workflow + fit + result-attr).
- `docs/choosing_estimator.rst` Universal Rollout snippet (workflow + fit
  + result-attr).
- `docs/r_comparison.rst` Heterogeneous Adoption snippet (fit kwargs).
- `docs/troubleshooting.rst` HAD Issues snippets (fit kwargs +
  `results.fit_path` -> `results.design`).
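Why the original snippets failed can be shown with a stand-in that mimics the documented keyword-only signature (`outcome_col`, `dose_col`, `time_col`, `unit_col`). This is not the real estimator; it only demonstrates that the old `outcome=` / `unit=` / `time=` / `dose=` spelling cannot bind.

```python
# Stand-in with the documented HAD kwarg names; NOT the real
# HeterogeneousAdoptionDiD.fit, just a signature demonstration.

def fit(data, *, outcome_col, dose_col, time_col, unit_col):
    return {"outcome_col": outcome_col, "dose_col": dose_col,
            "time_col": time_col, "unit_col": unit_col}

# Corrected call shape binds cleanly.
print(fit(None, outcome_col="y", dose_col="d", time_col="t", unit_col="id"))

# Old (wrong) call shape raises TypeError on the unexpected kwargs.
try:
    fit(None, outcome="y", dose="d", time="t", unit="id")
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```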

P2 (R-comparison overstatement): the dedicated R package `DIDHAD`
(de Chaisemartin et al., August 2025) covers the QUG case (Design 1') of
the same arXiv paper, so claiming "no R parallel" was too strong. Reframed
to acknowledge `DIDHAD` and position diff-diff's broader surface (Design 1
no-QUG, multi-period event-study extension Appendix B.2, survey-design
integration via Binder TSL):

- `docs/r_comparison.rst`: section heading "No-Untreated Designs (no R
  parallel)" -> "Heterogeneous Adoption (HAD)"; body acknowledges `DIDHAD`
  and frames diff-diff's broader surface; Feature Comparison Table row
  label simplified; note about R equivalents updated; Migration Tips
  bullet narrowed and cross-reference target updated.

Sphinx build clean (0 warnings in edited files); also fixes a
title-underline-too-short warning on the "Resolved estimand..." subsection
that was below threshold by one character.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 9a62e922816c659dc795b08517642b4d4b3a7426


Overall Assessment

⚠️ Needs changes

The prior P1s from the last review look resolved, but two new unmitigated P1 methodology/doc mismatches remain on the changed HAD public surface.

Executive Summary

  • The earlier blockers appear fixed: the new HAD examples now use the real *_col kwargs / result fields, and the estimator-side weighting docs now distinguish deprecated weights= from canonical survey_design=SurveyDesign(...).
  • P1 [Newly identified]: docs/api/had.rst now tells users that qug_test uses survey_design=make_pweight_design(weights), but qug_test still permanently rejects all survey/weighted inputs.
  • P1 [Newly identified]: the new troubleshooting text says “no exact dose == 0 implies Design 1 / WAS_d_lower”, but design="auto" also resolves to Design 1' when d.min() < 0.01 * median(|d|).
  • P2: the R-comparison refresh is directionally better, but one claim in docs/r_comparison.rst still overstates the remaining gap versus DIDHAD.
  • The newly documented HAD SE-family split / covariance limitations are already tracked in TODO.md, so those remain P3 informational only.

Methodology

  • Severity: P1 [Newly identified]. Impact: docs/api/had.rst:L67-L72 says the array-in HAD pretest helpers, including qug_test, use survey_design=make_pweight_design(weights). That contradicts the actual QUG contract: qug_test documents survey_design= / survey= / weights= as permanently unsupported and raises NotImplementedError, and the Methodology Registry explicitly says make_pweight_design(arr) is not a valid QUG migration target (diff_diff/had_pretests.py:L1236-L1255, diff_diff/had_pretests.py:L1361-L1387, docs/methodology/REGISTRY.md:L2352-L2353). Concrete fix: remove qug_test from that list and say make_pweight_design() applies only to stute_test, yatchew_hr_test, and stute_joint_pretest; for weighted data, direct QUG remains unsupported and did_had_pretest_workflow(..., survey_design=...) intentionally skips it.
  • Severity: P1 [Newly identified]. Impact: docs/troubleshooting.rst:L486-L490 says HAD routes to Design 1 whenever there are no exact dose = 0 rows. That is not the implemented default behavior: _detect_design() still resolves to continuous_at_zero / WAS when d.min() == 0 or when d.min() < 0.01 * median(|d|) (diff_diff/had.py:L1932-L1987, docs/methodology/REGISTRY.md:L2524-L2525). This misstates the public estimand-resolution contract and can make users misread results.target_parameter. Concrete fix: rewrite the troubleshooting cause to match the actual dispatch rule, including the small-positive-d_min Design 1' case.

Code Quality

No findings in the changed files.

Performance

No findings in the changed files.

Maintainability

No findings in the changed files.

Tech Debt

  • Severity: P3 informational. Impact: docs/api/had.rst:L61-L77 documents the weighted SE-family split, next-minor API consolidation, and missing joint cross-horizon analytical covariance, and those limitations are already tracked in TODO.md:L93-L102. Concrete fix: none required in this PR.

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: docs/r_comparison.rst:L227-L233, docs/r_comparison.rst:L420-L446 say diff-diff additionally covers the HAD multi-period event-study extension beyond DIDHAD. The current DIDHAD docs already expose dynamic effects, placebo, and event-study output in the QUG case, so the remaining gap is narrower than written. Concrete fix: narrow the comparison to the actual documented differences (e.g. Design 1 / no-QUG support and survey-design integration), or qualify the claim to the specific Appendix B.2 staggered/last-cohort surface if that is what you mean. (cran.r-universe.dev)

Not run here: local import/Sphinx smoke checks. This sandbox is missing numpy, and the workspace is read-only.

Path to Approval

  1. In docs/api/had.rst, remove qug_test from the make_pweight_design() guidance and explicitly state that weighted/survey-aware QUG is unsupported.
  2. In docs/troubleshooting.rst, replace the “no exact zero dose => Design 1” explanation with the implemented _detect_design() rule (d.min()==0 or d.min()<0.01*median(|d|) can still resolve to Design 1' / WAS).
  3. In docs/r_comparison.rst, narrow the DIDHAD comparison so it does not imply that R lacks HAD event-study support altogether.

…le, DIDHAD claim

P1 (qug_test in array-in pretest helper list): `docs/api/had.rst:67-72`
listed `qug_test` alongside `stute_test` / `yatchew_hr_test` /
`stute_joint_pretest` as accepting `survey_design=make_pweight_design(weights)`.
Per `had_pretests.py:1236-1255` and the methodology REGISTRY (Phase 4.5 C0
decision gate), `qug_test` permanently raises `NotImplementedError` on any
of `survey_design=` / `survey=` / `weights=` - there is no migration target
for survey-aware QUG, and `make_pweight_design()` is explicitly NOT a valid
QUG migration target. The composite workflow `did_had_pretest_workflow`
handles weighted dispatch by skipping QUG with a `UserWarning`. Removed
`qug_test` from the array-in helper list and added an explicit
permanent-rejection note pointing to the workflow's skip behavior.
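The weighted-dispatch skip described above can be sketched as follows. The function name and the exact warning text are stand-ins; only the behavior (run the survey-capable pretests, skip QUG with a `UserWarning` when inputs are weighted) is taken from the commit message.

```python
# Hypothetical sketch of the workflow's weighted dispatch: QUG has no
# survey-aware migration target, so it is skipped with a UserWarning.
import warnings

def select_pretests(weighted):
    """Return the pretest battery, skipping QUG when inputs are weighted."""
    battery = ["stute_test", "yatchew_hr_test", "stute_joint_pretest"]
    if weighted:
        warnings.warn("qug_test does not support survey/weighted inputs; "
                      "skipping it in the weighted battery.", UserWarning)
        return battery
    return ["qug_test"] + battery

print(select_pretests(weighted=False))
```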

P1 (estimand-resolution rule misstatement): `docs/troubleshooting.rst`
"Resolved estimand" subsection said "no exact `dose == 0` => Design 1".
Per `had.py:1932-1987` `_detect_design()` resolves to Design 1' when EITHER
`d.min() == 0` OR `d.min() < 0.01 * median(|d|)` (small-share-of-treated
escape clause). Rewrote the cause to spell out both sub-cases and clarify
that Design 1 only fires when `d.min()` is meaningfully positive relative
to the dose scale. Updated the inspection snippet to compute and print the
`0.01 * median(|d|)` threshold instead of just counting `dose == 0` rows.
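The corrected dispatch rule is concrete enough to restate as a stand-alone sketch (not the real `_detect_design()` in `diff_diff/had.py`): Design 1' fires when `d.min() == 0` or `d.min() < 0.01 * median(|d|)`, and Design 1 only when the minimum dose is meaningfully positive relative to the dose scale.

```python
# Stand-alone restatement of the documented estimand-resolution rule;
# labels are illustrative, the threshold is the one stated in the docs.
from statistics import median

def resolve_design(dose):
    d_min = min(dose)
    threshold = 0.01 * median(abs(d) for d in dose)
    if d_min == 0 or d_min < threshold:
        return "design_1_prime"   # continuous-at-zero / WAS path
    return "design_1"             # meaningfully positive d_min: WAS_{d_lower}

print(resolve_design([0.0, 0.5, 1.0]))    # design_1_prime (exact zero)
print(resolve_design([0.001, 0.5, 1.0]))  # design_1_prime (below 0.01 * median)
print(resolve_design([0.2, 0.5, 1.0]))    # design_1
```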

P2 (DIDHAD event-study overstatement): `docs/r_comparison.rst` Heterogeneous
Adoption section, R-equivalents note, and Migration Tips bullet claimed
diff-diff additionally covers "the multi-period event-study extension
(paper Appendix B.2)" beyond `DIDHAD`. The `DIDHAD` package already
exposes dynamic effects / placebo / event-study output in the QUG case, so
this overstates the gap. Narrowed all three locations to the documented
differences: Design 1 (no QUG, `WAS_{d_lower}`) and survey-design
integration via Binder TSL.

Sphinx build clean (0 warnings in edited files; the unrelated
`tutorials/18_geo_experiments.ipynb:61` "File not found:
practitioner_decision_tree.html#few-test-markets" warning is pre-existing
on origin/main and not introduced here).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: f63f40be6c80e458cc7398c3b83effb988ae7fd4


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope: the prior P1s appear resolved. docs/api/had.rst now correctly treats qug_test as the exception to survey-aware migration, and the estimand-resolution troubleshooting text now matches _detect_design().
  • The refreshed DIDHAD comparison is narrower than before and no longer makes the earlier overclaim about diff-diff being the only HAD event-study surface.
  • One non-blocking docs mismatch remains in the new HAD troubleshooting section: it overstates when staggered multi-cohort event-study inputs raise versus when the estimator auto-filters.
  • No code quality, performance, maintainability, or security findings in the changed files.

Methodology

  • Severity: P2. Impact: docs/troubleshooting.rst:L572-L597 says HeterogeneousAdoptionDiD.fit(..., aggregate="event_study") raises when staggered timing violates the last-cohort restriction and recommends manual subsetting. The actual contract is narrower: with first_treat_col provided, the estimator auto-filters to the last-treatment cohort plus never-treated and emits a warning; it raises only when the panel is staggered and first_treat_col is missing. That behavior is implemented in diff_diff/had.py:L1230-L1366 and diff_diff/had.py:L1470-L1499, and documented in docs/methodology/REGISTRY.md:L2408-L2408 and docs/methodology/REGISTRY.md:L2533-L2533. Concrete fix: rewrite this troubleshooting entry to split the two cases explicitly and make first_treat_col the primary remedy; keep manual subsetting as optional equivalent guidance.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3 informational. Impact: the weighted SE-family split and missing joint cross-horizon analytical covariance are now clearly documented in docs/api/had.rst:L61-L95 and already tracked in TODO.md:L93-L102, so they are non-blocking per the review policy. Concrete fix: none required in this PR.

Security

  • No findings.

Documentation/Tests

  • No additional findings beyond the methodology/doc-contract issue above.
  • Validation note: I could not run the Sphinx build here because sphinx is not installed in this environment.

The HAD "Panel-only event-study restriction" subsection in
`docs/troubleshooting.rst` overstated when staggered multi-cohort
event-study inputs raise. Per `had.py:1230-1366` and `had.py:1470-1499`
(also documented in `docs/methodology/REGISTRY.md:2408, 2533`):

- Common-adoption panel (single first-treat period): `first_treat_col`
  optional; the period is auto-inferred from the dose invariant.
- Staggered panel WITH `first_treat_col`: estimator auto-filters to the
  last-treatment cohort + never-treated and emits a UserWarning.
- Staggered panel WITHOUT `first_treat_col`: estimator raises (the only
  actual failure mode for this restriction).

Rewrote the cause to spell out the dispatch and made `first_treat_col`
the primary remedy; kept manual cohort subsetting as an equivalent that
skips the UserWarning.
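The three-way dispatch above can be summarized in a pure-Python stand-in. The function and the exception type are assumptions for illustration; the case split (auto-infer / filter-with-warning / raise) is the one documented in the commit message.

```python
# Illustrative restatement of the event-study input dispatch; NOT the
# real logic inside HeterogeneousAdoptionDiD.
import warnings

def event_study_dispatch(n_adoption_cohorts, has_first_treat_col):
    if n_adoption_cohorts <= 1:
        return "auto-infer"  # common adoption: first_treat_col optional
    if has_first_treat_col:
        warnings.warn("staggered panel: filtering to last-treatment "
                      "cohort + never-treated.", UserWarning)
        return "filter"
    # The only actual failure mode for this restriction.
    raise ValueError("staggered panel requires first_treat_col")

print(event_study_dispatch(1, False))  # auto-infer
```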

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 987bdcd42118bef761c202796cf0089671d706f3


Overall Assessment

⚠️ Needs changes

Executive Summary

  • Re-review scope: the prior HAD event-study troubleshooting mismatch is resolved. docs/troubleshooting.rst:L572-L602 now matches the last-cohort auto-filter / raise split in diff_diff/had.py:L1233-L1360 and diff_diff/had.py:L1470-L1499.
  • The new HAD API/pretest coverage is otherwise aligned with the exported public surface in diff_diff/__init__.py:L65-L81 and the registry notes in docs/methodology/REGISTRY.md:L2352-L2353 and docs/methodology/REGISTRY.md:L2488-L2505.
  • One P1 remains in the new troubleshooting guidance: it says HeterogeneousAdoptionDiD() defaults to robust=True, but the constructor default is robust=False, so the suggested workaround would reproduce the same mass-point + survey vcov_type='classical' failure.
  • One minor P3 docs mismatch remains: the weighted mass-point inference summaries omit supported CR1 clustered SEs.
  • No code-quality, performance, maintainability, or security issues in the changed files.
  • I could not rerun the Sphinx build here because python -m sphinx fails with No module named sphinx.

Methodology

  • Severity: P1. Impact: docs/troubleshooting.rst:L559-L563 recommends HeterogeneousAdoptionDiD() as the fix for the mass-point + survey + classical-VCOV NotImplementedError and says the default is robust=True. The implementation default is robust=False at diff_diff/had.py:L2643-L2649, and the weighted mass-point survey guard explicitly treats that default mapping as classical and raises at diff_diff/had.py:L3418-L3447. A user following the docs would hit the same error instead of the advertised workaround. Concrete fix: replace the snippet with a working override such as HeterogeneousAdoptionDiD(vcov_type='hc1') or HeterogeneousAdoptionDiD(robust=True), and remove the incorrect “robust=True by default” comment.
  • Severity: P3. Impact: docs/api/had.rst:L49-L54 and docs/choosing_estimator.rst:L665-L667 describe the deprecated weighted shortcut mass-point path as classical / hc1 only. The implementation and registry also support CR1 when cluster= is supplied on the weighted mass-point path, per diff_diff/had.py:L2276-L2284, diff_diff/had.py:L2433-L2448, and docs/methodology/REGISTRY.md:L2356-L2358 / docs/methodology/REGISTRY.md:L2527. This understates the supported inference surface. Concrete fix: clarify those summaries to say classical / hc1, plus CR1 when cluster= is supplied, or separate clustering from the vcov_type discussion.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3. Impact: the weighted-family split and missing joint cross-horizon analytical covariance called out in docs/api/had.rst:L61-L82 are already tracked in TODO.md:L93-L102, so they are non-blocking under the review policy. Concrete fix: none required in this PR.

Security

  • No findings.

Documentation/Tests

  • No additional findings beyond the methodology/doc-contract issues above.
  • Validation note: I could not rerun the docs build because sphinx is not installed in this environment.

Path to Approval

  1. Fix docs/troubleshooting.rst:L559-L563 so the recommended workaround actually selects HC1 on the mass-point survey path. The simplest correction is to show HeterogeneousAdoptionDiD(vcov_type='hc1') or HeterogeneousAdoptionDiD(robust=True), and state explicitly that plain HeterogeneousAdoptionDiD() keeps the default robust=False -> vcov_type='classical' mapping.

…lusion

P1 (default robust value): the troubleshooting "classical SE under survey"
solution and the api/had.rst inference-contract block both said the
default `robust=True` maps to `vcov_type='hc1'`. Per `had.py:2643-2649`
the constructor default is `robust=False`, which the survey + mass-point
guard at `had.py:3418-3447` treats as classical and raises - so the
suggested workaround `HeterogeneousAdoptionDiD()` reproduces the same
NotImplementedError. Replaced both with explicit working overrides
(`HeterogeneousAdoptionDiD(vcov_type='hc1')` or
`HeterogeneousAdoptionDiD(robust=True)`) and called out the wrong-default
trap explicitly.

P3 (CR1 omitted from weighted mass-point inference summary): both
`docs/api/had.rst:L49-L54` and `docs/choosing_estimator.rst:L665-L667`
described the deprecated `weights=` shortcut mass-point path as
"`classical` / `hc1` only". Per `had.py:2276-2284, 2433-2448` and
`docs/methodology/REGISTRY.md:2356-2358, 2527`, CR1 (Liang-Zeger) is also
supported when `cluster=` is supplied on the weighted mass-point path.
Added "CR1 when `cluster=` is supplied" to both summaries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: ff6974104d2a25e43f0ca15445df440bca453d08


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope: the prior P1 on the HAD mass-point vcov_type="classical" workaround is resolved. docs/troubleshooting.rst:L541-L569 now matches the actual default mapping and guard in diff_diff/had.py:L2577-L2580 and diff_diff/had.py:L3415-L3447.
  • The prior P3 on weighted mass-point CR1 support is also resolved. docs/api/had.rst:L49-L55 and docs/choosing_estimator.rst:L665-L667 now match the weighted 2SLS contract documented in docs/methodology/REGISTRY.md:L2355-L2360.
  • The new HAD API/pretest coverage aligns with the exported public surface in diff_diff/__init__.py:L64-L82 and the registry notes for survey_design=, QUG-under-survey deferral, and variance-formula labels.
  • One P2 doc-contract issue remains: the new prose overstates did_had_pretest_workflow as adjudicating the HAD design / dispatching by panel structure, but the actual API requires explicit aggregate= selection and the estimator, not the workflow, resolves continuous_at_zero vs continuous_near_d_lower vs mass_point.
  • No P0/P1 methodology issues, no code-quality/performance/maintainability/security blockers in the changed files.
  • I could not rerun the Sphinx build here because python -m sphinx fails with ModuleNotFoundError: No module named 'sphinx'.

Methodology

  • Severity: P2. Impact: docs/api/had.rst:L130-L136, docs/practitioner_decision_tree.rst:L283-L301, and docs/r_comparison.rst:L227-L235 describe did_had_pretest_workflow as if it chooses the workflow shape from panel structure and/or adjudicates which HAD design path applies. In the implementation, the workflow has two explicit modes via aggregate= and requires callers to pass aggregate="event_study" on multi-period panels (diff_diff/had_pretests.py:L4033-L4045), while the HAD design itself is auto-resolved separately by _detect_design() from the dose support (diff_diff/had.py:L1932-L1987). A reader can reasonably expect automatic event-study dispatch or a pretest report that decides between continuous_at_zero, continuous_near_d_lower, and mass_point, but the API does neither. Concrete fix: rephrase the workflow as a diagnostic battery only; state explicitly that callers choose aggregate="overall" vs aggregate="event_study", and that HeterogeneousAdoptionDiD.fit() auto-detects the HAD design from the dose support.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked tech-debt findings. The remaining HAD limitations newly documented in docs/api/had.rst:L62-L83 are already tracked in TODO.md:L93-L102, so they are non-blocking under the review policy.

Security

  • No findings.

Documentation/Tests

  • No additional findings beyond the P2 documentation-contract issue above.
  • Validation note: I could not rerun the docs build in this environment because sphinx is not installed.

…y, not adjudicator)

The new HAD prose overstated `did_had_pretest_workflow` as adjudicating the
HAD design path or auto-dispatching by panel structure. Per
`had_pretests.py:4033-4045` the workflow has two explicit modes via the
`aggregate=` kwarg ("overall" vs "event_study") that the caller picks; the
HAD design (`continuous_at_zero` / `continuous_near_d_lower` / `mass_point`)
is resolved separately inside `HeterogeneousAdoptionDiD.fit` by
`_detect_design()` from the dose support (`had.py:1932-1987`).

- `docs/api/had.rst` HAD Pretests intro: rephrased as a diagnostic battery;
  spells out the two `aggregate=` modes selected by the caller and notes
  the design path is auto-detected inside the estimator.
- `docs/practitioner_decision_tree.rst` Universal Rollout snippet: comment
  no longer claims the workflow "adjudicates which design path"; clarifies
  that the estimator picks the design from the dose support.
- `docs/r_comparison.rst` Heterogeneous Adoption section: dropped
  "adjudicates the design path" claim; describes the workflow as a
  diagnostic battery and points the design-path resolution at the estimator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 26, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: db00262f7189f941f346c77021b534bb5f0c0f47


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope: the prior doc-contract issue is resolved. The updated prose now correctly separates pretest workflow dispatch from HAD design resolution in docs/api/had.rst:L130-L140, docs/practitioner_decision_tree.rst:L287-L310, and docs/r_comparison.rst:L233-L236, matching diff_diff/had_pretests.py:L4033-L4050 and diff_diff/had.py:L1932-L1987.
  • The new HAD pretest/API coverage matches the exported public surface in docs/api/had.rst:L127-L183 and diff_diff/__init__.py:L69-L81, diff_diff/__init__.py:L468-L488.
  • No unmitigated P0/P1 issues found. The refreshed survey/weighted, QUG-under-survey, mass-point-classical, and last-cohort event-study language aligns with docs/methodology/REGISTRY.md:L2352-L2382, diff_diff/had.py:L3399-L3447, diff_diff/had.py:L4147-L4181, diff_diff/had_pretests.py:L4081-L4160, and diff_diff/had.py:L1116-L1499.
  • One P3 informational item remains: the new SE summaries still read as if weighted mass-point CR1 is unconditional when cluster= is supplied, but design="mass_point" + weights= + aggregate="event_study" + cband=True is still a documented exception (docs/api/had.rst:L49-L55, docs/choosing_estimator.rst:L665-L667, docs/methodology/REGISTRY.md:L2380-L2382, diff_diff/had.py:L4065-L4181).
  • I could not rerun the docs build or runtime import smoke tests here because the environment is missing sphinx and numpy.

Methodology

  • Severity: P3 (informational; deviation already documented in REGISTRY.md). Impact: docs/api/had.rst:L49-L55 and docs/choosing_estimator.rst:L665-L667 summarize the deprecated weights= mass-point path as supporting CR1 whenever cluster= is supplied. In implementation, the default weighted event-study path still rejects design="mass_point" + weights= + cluster= + cband=True (diff_diff/had.py:L4065-L4181), and that exception is explicitly documented in docs/methodology/REGISTRY.md:L2380-L2382. Concrete fix: add a short qualifier to the CR1 wording, e.g. “except aggregate="event_study", cband=True on the mass-point path,” or link that sentence to the existing exception note.
  • No other findings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked tech-debt findings. The remaining HAD limitations referenced by these docs are already tracked in TODO.md:L93-L102.

Security

  • No findings.

Documentation/Tests

  • No additional findings.
  • Validation note: static cross-check only. I could not rerun Sphinx or import-based validation in this environment because sphinx and numpy are unavailable.

…eption

The R4 CR1 addition wrote "CR1 when `cluster=` is supplied" without noting
the documented mass-point + `aggregate="event_study"` + `cband=True` carve-out:
that sub-path rejects an effective `classical` vcov (per `had.py:4147-4181`
and REGISTRY:2380-2382), so plain `cluster=` with the default `robust=False`
hits the classical-default trap. Both `docs/api/had.rst` and
`docs/choosing_estimator.rst` now spell out the carve-out and point to the
existing classical-default deviation note for the working override
(`vcov_type="hc1"` or `robust=True`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 26, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: f43d737cb44dd090b2e6a1553b2cc26ba3ed4cdb


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope: the prior workflow/design-resolution documentation issue looks resolved. The new prose now correctly separates did_had_pretest_workflow() dispatch from HAD design auto-detection in docs/api/had.rst:L134-L144, matching diff_diff/had_pretests.py:L4033-L4050 and diff_diff/had.py:L1932-L1987.
  • The new HAD API page covers the expected 12 public pretest symbols, and that list matches the top-level exports in diff_diff/__init__.py:L69-L81 and diff_diff/__init__.py:L474-L488.
  • One residual informational issue remains in the SE contract docs: docs/api/had.rst:L49-L58 and docs/choosing_estimator.rst:L665-L667 still imply weights= + cluster= + aggregate="event_study" + cband=True can be made to work via vcov_type="hc1" / robust=True, but the implementation rejects that clustered combination outright, and also rejects mass-point survey_design + cluster= on the weighted path diff_diff/had.py:L3399-L3447, diff_diff/had.py:L4059-L4095, docs/methodology/REGISTRY.md:L2380-L2382.
  • No unmitigated P0/P1 findings.
  • Validation here was static only; I could not rerun the Sphinx build or import smoke tests because sphinx and numpy are not installed in this environment.

Methodology

  • P3 Impact: docs/api/had.rst:L49-L58, docs/api/had.rst:L89-L100, and docs/choosing_estimator.rst:L665-L667 still overstate clustered weighted mass-point support. They frame the remaining blocker as the classical-default path, and suggest vcov_type="hc1" / robust=True as a workaround. In code, design="mass_point" + weights= + cluster= + aggregate="event_study" + cband=True is rejected regardless of vcov_type, and design="mass_point" + survey_design + cluster= is also rejected on the weighted path diff_diff/had.py:L3399-L3447, diff_diff/had.py:L4059-L4095. This deviation is already documented in docs/methodology/REGISTRY.md:L2380-L2382, so it is informational rather than blocking. Concrete fix: revise the prose to say those clustered combinations are unsupported outright, and remove the implication that hc1/robust=True enables the clustered cband=True path.
  • No other findings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked tech-debt findings. The remaining HAD limitations referenced by these docs are already tracked in TODO.md:L93-L102.

Security

  • No findings.

Documentation/Tests

  • No additional findings.
  • Validation note: static cross-check only. I could not rerun the docs build or import-based validation because sphinx and numpy are unavailable in this environment.

…ected outright

The R6 wording said `cluster=` + `aggregate="event_study"` + `cband=True`
on mass-point could be made to work via `vcov_type="hc1"` / `robust=True`.
Per `had.py:4059-4095` and `had.py:3399-3447` (also documented in
REGISTRY:2380-2382) that path is rejected outright regardless of `vcov_type`,
and `survey_design=` + `cluster=` on weighted mass-point is similarly
rejected. The error is about variance-family mixing in the sup-t
bootstrap / Binder-TSL composition, not about the classical-default trap.

- `docs/api/had.rst` inline weights= shortcut summary: narrowed the CR1
  qualifier to "rejected outright regardless of `vcov_type`" and added a
  cross-link to a new sibling deviation note.
- `docs/api/had.rst` new "Mass-point cluster-combination deviation" note
  beside the existing classical-default note: enumerates the two rejected
  combinations (survey_design= + cluster= static and event-study;
  weights= + cluster= + cband=True event-study) with the implementation's
  own workaround language (cband=False / drop cluster= / cluster= alone /
  weights= + cluster=).
- `docs/choosing_estimator.rst` SE Methods row: dropped the misleading
  "requires explicit hc1/robust=True" implication; says rejected outright
  for both the weighted shortcut + cluster= + event_study cband and the
  survey_design= + cluster= mass-point combination.
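The final wording amounts to a simple guard. The sketch below is illustrative of the two rejected combinations as described in this thread, not the library's actual validation code; argument names mirror the kwargs discussed above.

```python
# Illustrative guard for the two documented rejections (not diff_diff's
# actual validator). Both combinations fail regardless of vcov_type.

def check_mass_point_combo(weights=None, survey_design=None, cluster=None,
                           aggregate="overall", cband=False):
    if survey_design is not None and cluster is not None:
        # Rejected on both the static and the event-study weighted paths.
        raise ValueError("mass_point: survey_design= + cluster= is unsupported")
    if (weights is not None and cluster is not None
            and aggregate == "event_study" and cband):
        # Rejected outright; no hc1/robust=True override exists.
        raise ValueError(
            "mass_point: weights= + cluster= + cband=True event study is "
            "unsupported; set cband=False or drop cluster="
        )
    return "ok"
```

The working combinations (`cluster=` alone, `weights=` + `cluster=` without the clustered `cband=True` event study) pass through unchanged.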

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 26, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 96eb711ca6e59de4a17f596da2ece239893d88b5


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope: the prior weighted clustered-mass-point documentation gap is resolved. docs/api/had.rst:L49-L117 and docs/choosing_estimator.rst:L665-L667 now match the documented deviations in docs/methodology/REGISTRY.md:L2380-L2382 and the guards in diff_diff/had.py:L3399-L3447 and diff_diff/had.py:L4059-L4182.
  • The new HAD pretests API page covers the expected public surface and matches the top-level exports in diff_diff/__init__.py:L69-L81 and diff_diff/__init__.py:L474-L488.
  • The new narrative docs correctly separate pretest-workflow dispatch from estimator design auto-detection, consistent with docs/api/had.rst:L150-L160, docs/practitioner_decision_tree.rst:L287-L310, diff_diff/had_pretests.py:L4015-L4050, and diff_diff/had.py:L1932-L1987.
  • One residual informational issue remains in troubleshooting: the mass-point paragraph describes the path as a fallback that “just changes the SE regime,” but the registry and implementation treat mass_point as a distinct Wald-IV / 2SLS estimator path as well as a different SE family.
  • Validation here was static only; I could not rerun the Sphinx build or import smoke tests because sphinx and numpy are not installed in this environment.

Methodology

Affected methods reviewed: HeterogeneousAdoptionDiD and the HAD pretest surfaces did_had_pretest_workflow, qug_test, stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, and joint_homogeneity_test.

  • P3 Impact: docs/troubleshooting.rst:L524-L539 frames mass-point resolution as a fallback to a “2SLS sandwich” that “just changes the SE regime.” In the registry and implementation, mass_point is a distinct Design 1 estimator path: the point estimate itself switches to the Wald-IV / 2SLS formula with instrument 1{D_2 > d_lower}, and the structural-residual sandwich is the corresponding inference path. The same paragraph also over-attributes that SE path to a paper appendix rather than to the library’s documented 2SLS implementation around the paper’s Section 3.2.4 design. Concrete fix: rephrase this subsection to say HAD auto-resolves to the mass-point design when the modal fraction at d_lower exceeds 2%; that design uses the mass-point Wald-IV / 2SLS estimator plus structural-residual sandwich inference. References: docs/troubleshooting.rst:L524-L539, docs/methodology/REGISTRY.md:L2400-L2402, docs/methodology/REGISTRY.md:L2523-L2527, diff_diff/had.py:L33-L39, diff_diff/had.py:L2260-L2272.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked tech-debt findings. The remaining HAD follow-ups referenced by the new docs are already tracked in TODO.md:L93-L102.

Security

  • No findings.

Documentation/Tests

  • No blocking documentation/test issues in the diff.
  • Validation note: static cross-check only. I could not rerun the docs build or import-based smoke tests because sphinx and numpy are unavailable in this environment.

… fallback

The "Mass-point fit fallback" troubleshooting subsection framed
`mass_point` as a fallback that "just changes the SE regime." Per
`had.py:33-39, 2260-2272` and REGISTRY:2400-2402, 2523-2527, mass-point is
a distinct Design 1 estimator path from dCDH 2026 paper Section 3.2.4
(not the Appendix): both the point estimate (Wald-IV sample-average ratio
with binary instrument `Z_g = 1{D_{g,2} > d_lower}`) AND the SE
(structural-residual 2SLS sandwich) differ from the continuous local-linear
/ CCT-2014 paths.

Renamed the subsection to "Mass-point design selected", rewrote the cause
to spell out the 2%-modal-fraction trigger, the Wald-IV ratio formula, and
the structural-residual sandwich; corrected the paper-section attribution
(Section 3.2.4, not Appendix); refined the solution snippet to point at
re-binning to drop modal fraction below 2% if the user prefers
continuous_near_d_lower.
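The Wald-IV sample-average ratio described above can be written out in a few lines. This is a toy two-period version for intuition only: `dy` stands for the pre/post outcome change, `d` for the dose, and the linear check DGP is my own, not the library's.

```python
# Toy Wald-IV ratio with binary instrument Z_g = 1{D_g > d_lower}
# (illustrative two-period version of the estimator described above).

def wald_iv(dy, d, d_lower):
    z = [di > d_lower for di in d]
    mean = lambda xs: sum(xs) / len(xs)
    # Difference in mean outcome changes between instrument groups...
    num = (mean([y for y, zi in zip(dy, z) if zi])
           - mean([y for y, zi in zip(dy, z) if not zi]))
    # ...scaled by the difference in mean dose between the same groups.
    den = (mean([di for di, zi in zip(d, z) if zi])
           - mean([di for di, zi in zip(d, z) if not zi]))
    return num / den  # slope per unit of dose above the boundary

# Linear toy DGP with slope 2.0: the ratio recovers the slope exactly.
d = [1.0, 1.0, 2.0, 3.0]             # d_lower = 1.0, two boundary units
dy = [2.0 * (di - 1.0) for di in d]  # [0.0, 0.0, 2.0, 4.0]
```

With these inputs `wald_iv(dy, d, 1.0)` returns 2.0, the DGP slope, which is the sense in which the point estimate itself (not just the SE regime) changes on the mass-point path.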

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 26, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: b0399ee390a45370a1b9a671bbff048758d32e19


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope: the prior informational mass-point wording issue is resolved. docs/troubleshooting.rst:L518-L546 now describes mass_point as a distinct Design 1 Wald-IV / 2SLS path, matching docs/methodology/REGISTRY.md:L2400-L2402, docs/methodology/REGISTRY.md:L2523-L2528, and diff_diff/had.py:L33-L39, diff_diff/had.py:L2250-L2272.
  • The refreshed inference-contract block in docs/api/had.rst:L43-L117 matches the documented HAD survey/weights deviations, including the weighted mass-point vcov_type="classical" guard and the cluster-combination rejections in docs/methodology/REGISTRY.md:L2352-L2382, diff_diff/had.py:L3399-L3447, and diff_diff/had.py:L4059-L4182.
  • The new HAD pretests API section covers the exported public surface and correctly separates aggregate="overall" vs aggregate="event_study" workflow dispatch from estimator design auto-detection: docs/api/had.rst:L147-L205, diff_diff/__init__.py:L69-L81, diff_diff/__init__.py:L475-L488, diff_diff/had_pretests.py:L4033-L4050, diff_diff/had.py:L1932-L1987.
  • The narrative additions in docs/choosing_estimator.rst:L364-L405 and docs/practitioner_decision_tree.rst:L266-L311 are consistent with the registry on Design 1' vs Design 1 targets and the non-testability of Assumptions 5/6 beyond the shipped pretests.
  • Validation here was static only; I could not rerun Sphinx in this environment because python -m sphinx --version fails with No module named sphinx.

Methodology

No findings. The changed HAD docs align with the registry and implementation, and the only prior review issue was informational and is now resolved (docs/troubleshooting.rst:L518-L546, docs/methodology/REGISTRY.md:L2400-L2402, docs/methodology/REGISTRY.md:L2523-L2528, diff_diff/had.py:L2250-L2272).

Code Quality

No findings. The new API documentation covers the 12 exported HAD pretest symbols without inventing non-public surfaces (docs/api/had.rst:L147-L205, diff_diff/__init__.py:L69-L81, diff_diff/__init__.py:L475-L488).

Performance

No findings. This is a docs-only diff and it does not change runtime code paths.

Maintainability

No findings. The new doc-deps mappings improve drift coverage for HAD documentation rather than adding untracked coupling (docs/doc-deps.yaml:L386-L406).

Tech Debt

No findings. The limitations referenced by these docs remain tracked in TODO.md and/or documented as registry deviations, including weighted-event-study joint covariance and pending alias removal (TODO.md:L93-L104, docs/methodology/REGISTRY.md:L2380-L2382).

Security

No findings.

Documentation/Tests

No blocking documentation issues found in the diff. Static review only: Sphinx is not installed in this environment, so I could not rerun the docs build.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 26, 2026
@igerber igerber merged commit a7b8454 into main Apr 26, 2026
4 of 5 checks passed
@igerber igerber deleted the docs-rtd-audit branch April 26, 2026 14:15
igerber added a commit that referenced this pull request Apr 26, 2026
…Intensity

Practitioner walkthrough for HeterogeneousAdoptionDiD on the
no-untreated-controls case: every market got the campaign at varying
intensity and there is no clean comparison group. Fills the structural
gap T14 (ContinuousDiD) cannot address.

Notebook scope (23 cells, 13 markdown / 10 code, mirrors T19's
structure):
- Sections 1-3: framing the no-untreated-controls measurement problem,
  setup imports, synthetic 60-DMA / 8-week panel with Uniform[$5K, $50K]
  regional add-on spend (every DMA participates, no DMA at $0). DGP
  is internally consistent: outcomes are generated from the dose
  values HAD then sees, no post-hoc relabeling.
- Section 4: overall WAS_d_lower fit on a 2-period (pre/post mean)
  collapse - HAD's overall mode requires exactly 2 periods
  (had.py:952-959). Locked headline: per-$1K marginal effect of 100
  weekly visits per DMA above the boundary spend (95% CI [98.6,
  101.4]) with design auto-detection landing on
  `continuous_near_d_lower` (Design 1) and target `WAS_d_lower`.
  Surfaces the Assumption 5/6 advisory the library fires for Design 1
  and explains why it holds in this DGP (linear by construction).
- Section 5: multi-week event-study fit on the 8-week panel,
  per-week WAS_d_lower for e=0..3 (~100 each, CIs cover truth) and
  pre-launch placebos at e=-2..-4 sitting on zero.
- Section 6: stakeholder communication template (T18/T19 markdown
  blockquote pattern), per-DMA dollar-lift interpretation
  `(actual_dose - d_lower) * WAS_d_lower`, Assumption 6 caveat.
- Section 7: extensions (population-weighted/survey path, composite
  pretest workflow described accurately as QUG support-infimum test
  + linearity tests, mass-point design path), related-tutorials
  cross-links (T01, T02, T14, T17, T18, T19), summary checklist.
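The Section 6 per-DMA interpretation is plain arithmetic. The numbers below are illustrative, taken from the locked headline in this commit message; variable names are mine.

```python
# Per-DMA dollar lift via the Section 6 formula
# (actual_dose - d_lower) * WAS_d_lower. Illustrative values:
# spend in $K, WAS_d_lower = 100 weekly visits per $1K above the boundary.
WAS_d_lower = 100.0   # locked headline slope from Section 4
d_lower = 5.0         # boundary spend: the Uniform[$5K, $50K] lower bound
actual_dose = 20.0    # a DMA that spent $20K

lift = (actual_dose - d_lower) * WAS_d_lower
# (20 - 5) * 100 = 1500 extra weekly visits attributed to spend above d_lower
```

The Assumption 6 caveat applies: the lift is only as linear as the dose-response is, which holds in this DGP by construction.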

Drift detection: companion tests/test_t20_had_brand_campaign_drift.py
(13 tests, 0.06s; mirrors T19's test-file-only pattern, as T19's
notebook itself has zero in-notebook asserts). Pins panel composition
including sample median, design auto-detection / target / d_lower,
overall WAS_d_lower / SE / CI endpoints to one-decimal display, dose
mean, n_units, full event-study horizon presence (e=-4..-2, 0..3),
per-week post-launch coverage of TRUE_SLOPE=100, and zero coverage at
every placebo horizon (|placebo_att| < 0.1). Tight `round(_, 1) == X.X`
pins throughout: HAD's analytical SE path is bit-identical regardless
of backend env (no Rust kernel involved). Locked DGP seed: MAIN_SEED=87.
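The pin pattern described above amounts to display-precision equality checks plus coverage assertions. A minimal sketch, with names and values illustrative rather than copied from the actual T20 test file:

```python
# Minimal sketch of the drift-pin pattern (illustrative values only,
# not the shipped test file).
TRUE_SLOPE = 100.0

def check_headline_pin(was_d_lower, ci_low, ci_high):
    # Pin the one-decimal display, not the full float.
    assert round(was_d_lower, 1) == 100.0
    # CI endpoints pinned at their displayed precision.
    assert (round(ci_low, 1), round(ci_high, 1)) == (98.6, 101.4)
    # Coverage: the interval must contain the DGP truth.
    assert ci_low <= TRUE_SLOPE <= ci_high

def check_placebo(placebo_att):
    # Pre-launch placebos must sit on zero.
    assert abs(placebo_att) < 0.1
```

Display-precision pins like these stay stable across environments precisely because the analytical SE path is deterministic, which is what makes the tight equality viable.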

Documentation integration:
- docs/tutorials/README.md: new T20 entry following T18/T19's
  5-bullet pattern.
- docs/doc-deps.yaml: T20 added to the existing diff_diff/had.py
  entry; cross-link to docs/practitioner_decision_tree.rst added.
- docs/practitioner_decision_tree.rst: `.. tip::` block at the end
  of `section-no-untreated` (Universal Rollout - landed on main via
  PR #389) cross-links to T20 for the full walkthrough.
- CHANGELOG.md: new ### Added bullet under [Unreleased].

Out of scope (queued in project_had_followups.md memory):
- _handle_had in practitioner.py:_HANDLERS map.
- HAD entries in llms-full.txt / choosing_estimator.rst.
- Pretest workflow tutorial, weighted/survey HAD tutorial,
  mass-point design demo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 26, 2026
The HeterogeneousAdoptionDiD example snippet in choosing_estimator.rst
failed the Pure Python Fallback CI job (test_doc_snippets.py
::test_doc_snippet[choosing_estimator:block7]) due to three latent
drift bugs from PR #389 (docs-rtd-audit):

1. Missing aggregate='event_study' on both did_had_pretest_workflow
   and HAD.fit calls — default aggregate='overall' requires exactly 2
   periods, but the doc-snippet test framework's namespace `data`
   (built via generate_staggered_data) has 10 periods.

2. Used the namespace's generic `data` variable, which has nonzero
   dose in every period (rng.choice from {0.0, 0.5, 1.0, 2.0}). HAD
   requires D=0 for all units in at least one pre-period.

3. `print(f"Estimate: {results.att:.3f}")` formatted att as a scalar,
   but under aggregate='event_study' results.att is a numpy array.

Fix: rewrite the snippet to construct its own HAD-shape panel inline
(mirrors how block6 handles ContinuousDiD with its own data
generator); thread aggregate='event_study' through both calls;
iterate the per-horizon att array for output.
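The inline-construction pattern can be sketched as follows. Column names, sizes, and the DGP here are illustrative, not the shipped snippet; the point is the shape requirement stated above (dose exactly 0 for every unit in at least one pre-period, positive heterogeneous dose afterward).

```python
# Illustrative HAD-shape panel: dose is 0 for every unit in all
# pre-periods and positive (heterogeneous) from launch onward.
import random

def make_had_panel(n_units=30, n_periods=6, launch=3, seed=0):
    rng = random.Random(seed)
    rows = []
    for u in range(n_units):
        dose = rng.uniform(5.0, 50.0)          # every unit treated post-launch
        for t in range(n_periods):
            d = dose if t >= launch else 0.0   # D = 0 in every pre-period
            y = u + 0.5 * t + 2.0 * d + rng.gauss(0.0, 0.1)
            rows.append({"unit": u, "period": t, "dose": d, "outcome": y})
    return rows

panel = make_had_panel()
```

With a multi-period panel of this shape, `aggregate='event_study'` has to be threaded through both the pretest and fit calls, and the resulting per-horizon att is an array to iterate rather than a scalar to format.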

Pre-existing on origin/main; surfaced on this PR's CI re-run after
the rebase. Other failing snippets (troubleshooting:block18, :block20,
r_comparison:block6, :block7) are also pre-existing on main but are
out of scope for this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 26, 2026
Fix latent doc-snippet bugs from PR #389 (HAD ecosystem)
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request Apr 27, 2026
PR igerber#389 added HAD code snippets to choosing_estimator.rst,
troubleshooting.rst, and r_comparison.rst. None of those edits
triggered rust-test.yml (which only runs on rust/, diff_diff/, tests/,
pyproject.toml, and the workflow file), so tests/test_doc_snippets.py
never executed and the snippets shipped with five latent bugs that now
surface on every code PR via the Pure Python Fallback job.

Bugs addressed:

- r_comparison:block6 — bare HAD.fit(data, ...) with the
  generate_staggered_data fixture failed because the default
  aggregate='overall' requires exactly 2 periods and the namespace
  data has 10. Replaced with an inline HAD-shape panel construction
  (mirrors the upstream choosing_estimator:block7 fix in 55d7a27)
  plus aggregate='event_study'.

- troubleshooting:block20 — the snippet demonstrates
  first_treat_col= auto-filtering on a staggered panel. The fixture's
  first_treat values disagree with the dose path (random per-row dose
  on never-treated units), tripping HAD's first_treat / dose-path
  consistency validator. Inlined a 120-unit / 10-period staggered
  HAD-shape panel (30 never + 30 cohort 5 + 60 cohort 8) so the
  validator passes and the boundary local-linear estimator has
  enough distinct dose values to fit.

- troubleshooting:block17 / block18 / r_comparison:block7 — these are
  legitimately context-dependent snippets that reference est /
  results from prior text-flow context (inspection / output-format
  examples). Added them to _CONTEXT_DEPENDENT_SNIPPETS so the
  expected NameError is suppressed, matching the pattern already
  used for block8, the api_bacon blocks, and the existing
  r_comparison context-dependent set.

choosing_estimator:block7 was the sixth failing snippet but was
already fixed upstream in 55d7a27 with the inline-construction
pattern; this branch rebases onto that.

Verification: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest
tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed
on this branch (was 6 failed on origin/main before 55d7a27 and 5
failed after).

Follow-up (separate PR queued): carve test_doc_snippets.py out into a
dedicated docs-tests.yml workflow triggered on docs/** + diff_diff/**
+ the test file itself, and exclude it from rust-test.yml's pytest
invocations so doc bugs are caught on doc PRs (not silently inherited
by code PRs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request Apr 27, 2026
…est.yml

PR igerber#389 (Batch A) shipped six broken HAD doc snippets in 2026-04 that
no CI run caught because rust-test.yml only triggers on
rust/, diff_diff/, tests/, pyproject.toml, and the workflow file —
none of which include docs/. PR igerber#396 patched the snippets but did not
address the structural gap. This PR addresses it.

Two changes:

1. New .github/workflows/docs-tests.yml — separate workflow that
   runs `pytest tests/test_doc_snippets.py -v` on a single
   ubuntu-latest / py3.14 / pure-Python runner. Triggers on
   docs/, diff_diff/, tests/test_doc_snippets.py, pyproject.toml,
   and the workflow file itself; same ready-for-ci label gate as
   rust-test.yml / notebooks.yml. Mirrors notebooks.yml's shape
   (the existing precedent for `pytest`-validated docs assets) so
   the two doc-validation workflows look like siblings.

2. .github/workflows/rust-test.yml: add
   --ignore=tests/test_doc_snippets.py to all three pytest
   invocations so doc snippets stop riding the code workflow.

   The Pure Python Fallback edit (line 193) is the only one that
   changes CI signal: that job runs from the repo root and was the
   ONLY place where test_doc_snippets.py actually executed. The two
   Rust-matrix edits (lines 158, 165) are defensive consistency: the
   matrix copies tests/ to /tmp/tests (rust-test.yml:138, 142)
   without docs/, so DOCS_DIR resolves to /tmp/docs/ which doesn't
   exist; the test collector silently skips every RST file via the
   guard at tests/test_doc_snippets.py:129. Adding --ignore there
   prevents the no-op from becoming a real run if anyone later adds
   `cp -r docs ...` to the copy steps. Each invocation now carries
   an in-YAML comment documenting whether it is the defensive or the
   behavior-changing edit.
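The trigger shape described in the two points above looks roughly like this. This is a sketch of the described path filters only, not the merged workflow file; the ready-for-ci label gate and setup steps are omitted, and the install command is assumed.

```yaml
# Sketch only: path filters per the description above (not the real file).
name: docs-tests
on:
  pull_request:
    paths:
      - "docs/**"
      - "diff_diff/**"
      - "tests/test_doc_snippets.py"
      - "pyproject.toml"
      - ".github/workflows/docs-tests.yml"
jobs:
  doc-snippets:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -e . pytest && pytest tests/test_doc_snippets.py -v
```

Because the workflow file lists itself in `paths`, the PR that adds it triggers its own first run, which is the end-to-end validation noted below.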

Verification:
- python -c "import yaml; yaml.safe_load(open('.github/workflows/
  docs-tests.yml')); yaml.safe_load(open('.github/workflows/
  rust-test.yml'))" — both files well-formed.
- pytest tests/ --ignore=tests/test_doc_snippets.py
  --ignore=tests/test_rust_backend.py --collect-only — 0 occurrences
  of test_doc_snippets in the collected set (was 115 cases collected
  when not ignored), confirming pytest accepts repeated --ignore
  flags as the existing line-193 pattern with --ignore=tests/
  test_rust_backend.py already showed.

After this PR opens, the workflow file itself triggers docs-tests.yml
on its own change, providing the first end-to-end CI validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>