Conversation
Self-contained reproduction of Weco's fraud-detection case study. Downloads the Kaggle dataset, builds a leakage-safe 100K/25K time-based parquet split, and exposes train.py as the optimization target (feature engineering + LightGBM config both modifiable). evaluate.py prints auc_roc for Weco.

instructions.md is the full EDA + techniques prompt from the case study: column semantics for each feature group (TransactionAmt, C/D/M/V), 10 well-known IEEE-CIS techniques (UID construction, target encoding with OOF, velocity features, frequency encoding), and a target-leakage guardrail pointing out the isFraud-in-df aggregation trap.

README walks through Kaggle API setup, the prepare_data step, a baseline sanity check (~0.914 AUC), and the canonical weco run command (gemini-3.1-pro-preview, 50 steps, expected trajectory into 0.928-0.933). Also adds "things to try" (no-instructions variance blow-up, EDA-only ablation, scope restriction) and a silent-target-leakage watch-out pointing to the published case study.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
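A minimal sketch of the leakage-safe time-based split described above, assuming the raw IEEE-CIS table with its real `TransactionDT` column and the 100K/25K sizes from the commit message; the function name and default sizes are illustrative, not the repo's actual prepare_data code:

```python
import pandas as pd

def time_based_split(df: pd.DataFrame, n_train: int = 100_000, n_val: int = 25_000):
    """Sort by transaction time and take contiguous blocks, so every
    validation row is strictly later than every training row and no
    future information can leak into the training split."""
    df = df.sort_values("TransactionDT", kind="mergesort").reset_index(drop=True)
    train_df = df.iloc[:n_train]
    val_df = df.iloc[n_train:n_train + n_val]
    return train_df, val_df
```

The contiguous-in-time blocks are what make the split leakage-safe: a random split would scatter future rows into training, which is exactly the failure mode the case study guards against.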
…er test

Two fresh-agent test rounds surfaced three issues; all fixed:

- kaggle CLI: the `kaggle` package has no `__main__`, so `python -m kaggle` crashes with ModuleNotFoundError. Correct entry point is `kaggle.cli`.
- venv instruction used `python -m venv`, which fails on Debian/Ubuntu systems where only `python3` exists (no python-is-python3). Changed to `python3 -m venv`. After activation, `python` resolves correctly.
- pip install fails on modern PEP 668 systems without a venv. README now leads with the venv setup before the install step, with a note on why.

Also: prepare_data.py now catches Kaggle CalledProcessError and prints the two most common root causes (rules not accepted / kaggle.json perms) with the exact URL for accepting the competition rules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
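The CLI entry point and error handling described above can be sketched as follows. This mirrors the behavior attributed to prepare_data.py but is not its actual code; the function name, message wording, and competition slug are assumptions:

```python
import subprocess
import sys

def download_competition(slug: str = "ieee-fraud-detection") -> None:
    """Invoke the Kaggle CLI via its module entry point (kaggle.cli, since
    the kaggle package itself has no __main__) and translate the most
    common failure modes into actionable messages."""
    cmd = [sys.executable, "-m", "kaggle.cli", "competitions", "download", "-c", slug]
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError:
        # The two most common root causes seen in fresh-environment testing.
        print(
            "Kaggle download failed. Common causes:\n"
            f"  1. Competition rules not accepted: https://www.kaggle.com/c/{slug}/rules\n"
            "  2. ~/.kaggle/kaggle.json missing or with wrong permissions (chmod 600)."
        )
        raise
```

Re-raising after printing keeps the non-zero exit status, so scripted callers still see the failure.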
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 92cb31d6a4
```python
y_val = val_df["isFraud"].values.astype(np.int32)

n_train = len(train_df)
df = pd.concat([train_df, val_df], axis=0, ignore_index=True)
```
Fit feature aggregations on training data only
build_features concatenates train_df and val_df before creating grouped amount statistics and frequency encodings, so validation rows (future data in this time-based split) directly shape the engineered features used for evaluation. That leaks validation distribution into the pipeline and can systematically inflate the reported AUC that Weco optimizes against. Compute these encodings/aggregations from train_df only, then map them onto val_df with defaults for unseen keys.
Codex flagged that the baseline concatenates train + val before computing groupby aggregations and frequency encodings, letting the val-period distribution shape train features and letting each val row influence its own encoded values. Even with isFraud dropped first, this is time leakage that inflates val AUC relative to what would be seen at serving time.

Fix: compute all encoders (card1/addr1 amount stats, frequency encoding) on train_df only; .join/.map onto both splits; fill unseen val keys with train-global defaults. Refactored per-row features (time, amount) into a small helper so both splits share that code path without the concat.

Baseline AUC drops from the previously reported 0.914 to 0.910 — the right number, not artificially inflated. The expected Weco trajectory (0.928-0.933 at 200 steps with full instructions) is unchanged in shape; the case study's absolute numbers used the leaky baseline, so they shift slightly here.

Also expanded instructions.md and README to distinguish target leakage (isFraud in the dataframe during aggregation) from time leakage (val distribution in the encoder fit), with the fit-on-train / apply-to-both pattern spelled out for future encoders Weco proposes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
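The fit-on-train / apply-to-both pattern described in the commit above can be sketched like this. Column names (card1, TransactionAmt) match the dataset, but the helper name and output column names are illustrative:

```python
import pandas as pd

def fit_apply_encoders(train_df: pd.DataFrame, val_df: pd.DataFrame):
    """Compute aggregations from train_df only, then map them onto both
    splits. Unseen validation keys fall back to train-global defaults,
    so no validation-period statistics ever enter the features."""
    out_tr, out_va = train_df.copy(), val_df.copy()

    # Grouped amount statistics, fit on train only.
    amt_mean = train_df.groupby("card1")["TransactionAmt"].mean()
    global_mean = train_df["TransactionAmt"].mean()
    for out in (out_tr, out_va):
        out["card1_amt_mean"] = out["card1"].map(amt_mean).fillna(global_mean)

    # Frequency encoding, fit on train only; keys never seen in train get 0.
    freq = train_df["card1"].value_counts()
    for out in (out_tr, out_va):
        out["card1_freq"] = out["card1"].map(freq).fillna(0).astype(int)

    return out_tr, out_va
```

Contrast with the flagged baseline: concatenating the splits before the groupby would let each validation row contribute to the very statistic it is then encoded with.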
Summary
What's in the example
Verification
Two rounds of fresh-agent testing caught and fixed: venv prereq on modern Python installs; `python3` vs `python` on Ubuntu; the `kaggle` package has no `__main__`, so the entry point needed to be `kaggle.cli`. The final sanity check blocked on a `403 Forbidden` from the Kaggle API (rules acceptance is a per-user prereq, called out in the README).
Test plan
🤖 Generated with Claude Code