Conversation
Self-contained reproduction of Weco's fraud-detection case study. Downloads the Kaggle dataset, builds a leakage-safe 100K/25K time-based parquet split, and exposes train.py as the optimization target (feature engineering + LightGBM config both modifiable). evaluate.py prints auc_roc for Weco.

instructions.md is the full EDA + techniques prompt from the case study: column semantics for each feature group (TransactionAmt, C/D/M/V), 10 well-known IEEE-CIS techniques (UID construction, target encoding with OOF, velocity features, frequency encoding), and a target-leakage guardrail pointing out the isFraud-in-df aggregation trap.

README walks through Kaggle API setup, the prepare_data step, a baseline sanity check (~0.914 AUC), and the canonical weco run command (gemini-3.1-pro-preview, 50 steps, expected trajectory into 0.928-0.933). Also adds "things to try" (no-instructions variance blow-up, EDA-only ablation, scope restriction) and a silent-target-leakage watch-out pointing to the published case study.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
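A minimal sketch of the leakage-safe time-based split described above, assuming the raw IEEE-CIS table with its real `TransactionDT` column and the 100K/25K sizes from the commit message; the function name and default sizes are illustrative, not the repo's actual prepare_data code:

```python
import pandas as pd

def time_based_split(df: pd.DataFrame, n_train: int = 100_000, n_val: int = 25_000):
    """Sort by transaction time and take contiguous blocks, so every
    validation row is strictly later than every training row and no
    future information can leak into the training split."""
    df = df.sort_values("TransactionDT", kind="mergesort").reset_index(drop=True)
    train_df = df.iloc[:n_train]
    val_df = df.iloc[n_train:n_train + n_val]
    return train_df, val_df
```

The contiguous-in-time blocks are what make the split leakage-safe: a random split would scatter future rows into training, which is exactly the failure mode the case study guards against.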
…er test

Two fresh-agent test rounds surfaced three issues; all fixed:

- kaggle CLI: the `kaggle` package has no `__main__`, so `python -m kaggle` crashes with ModuleNotFoundError. Correct entry point is `kaggle.cli`.
- venv instruction used `python -m venv`, which fails on Debian/Ubuntu systems where only `python3` exists (no python-is-python3). Changed to `python3 -m venv`. After activation, `python` resolves correctly.
- pip install fails on modern PEP 668 systems without a venv. README now leads with the venv setup before the install step, with a note on why.

Also: prepare_data.py now catches Kaggle CalledProcessError and prints the two most common root causes (rules not accepted / kaggle.json perms) with the exact URL for accepting the competition rules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
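The CLI entry point and error handling described above can be sketched as follows. This mirrors the behavior attributed to prepare_data.py but is not its actual code; the function name, message wording, and competition slug are assumptions:

```python
import subprocess
import sys

def download_competition(slug: str = "ieee-fraud-detection") -> None:
    """Invoke the Kaggle CLI via its module entry point (kaggle.cli, since
    the kaggle package itself has no __main__) and translate the most
    common failure modes into actionable messages."""
    cmd = [sys.executable, "-m", "kaggle.cli", "competitions", "download", "-c", slug]
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError:
        # The two most common root causes seen in fresh-environment testing.
        print(
            "Kaggle download failed. Common causes:\n"
            f"  1. Competition rules not accepted: https://www.kaggle.com/c/{slug}/rules\n"
            "  2. ~/.kaggle/kaggle.json missing or with wrong permissions (chmod 600)."
        )
        raise
```

Re-raising after printing keeps the non-zero exit status, so scripted callers still see the failure.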
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 92cb31d6a4
```python
y_val = val_df["isFraud"].values.astype(np.int32)

n_train = len(train_df)
df = pd.concat([train_df, val_df], axis=0, ignore_index=True)
```
Fit feature aggregations on training data only
build_features concatenates train_df and val_df before creating grouped amount statistics and frequency encodings, so validation rows (future data in this time-based split) directly shape the engineered features used for evaluation. That leaks validation distribution into the pipeline and can systematically inflate the reported AUC that Weco optimizes against. Compute these encodings/aggregations from train_df only, then map them onto val_df with defaults for unseen keys.
Codex flagged that the baseline concatenates train + val before computing groupby aggregations and frequency encodings, letting the val-period distribution shape train features and letting each val row influence its own encoded values. Even with isFraud dropped first, this is time leakage that inflates val AUC relative to what would be seen at serving time.

Fix: compute all encoders (card1/addr1 amount stats, frequency encoding) on train_df only; .join/.map onto both splits; fill unseen val keys with train-global defaults. Refactored per-row features (time, amount) into a small helper so both splits share that code path without the concat.

Baseline AUC drops from the previously reported 0.914 to 0.910 — the right number, not artificially inflated. The expected Weco trajectory (0.928-0.933 at 200 steps with full instructions) is unchanged in shape; the case study's absolute numbers used the leaky baseline, so they shift slightly here.

Also expanded instructions.md and README to distinguish target leakage (isFraud in the dataframe during aggregation) from time leakage (val distribution in the encoder fit), with the fit-on-train / apply-to-both pattern spelled out for future encoders Weco proposes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
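The fit-on-train / apply-to-both pattern described in the commit above can be sketched like this. Column names (card1, TransactionAmt) match the dataset, but the helper name and output column names are illustrative:

```python
import pandas as pd

def fit_apply_encoders(train_df: pd.DataFrame, val_df: pd.DataFrame):
    """Compute aggregations from train_df only, then map them onto both
    splits. Unseen validation keys fall back to train-global defaults,
    so no validation-period statistics ever enter the features."""
    out_tr, out_va = train_df.copy(), val_df.copy()

    # Grouped amount statistics, fit on train only.
    amt_mean = train_df.groupby("card1")["TransactionAmt"].mean()
    global_mean = train_df["TransactionAmt"].mean()
    for out in (out_tr, out_va):
        out["card1_amt_mean"] = out["card1"].map(amt_mean).fillna(global_mean)

    # Frequency encoding, fit on train only; keys never seen in train get 0.
    freq = train_df["card1"].value_counts()
    for out in (out_tr, out_va):
        out["card1_freq"] = out["card1"].map(freq).fillna(0).astype(int)

    return out_tr, out_va
```

Contrast with the flagged baseline: concatenating the splits before the groupby would let each validation row contribute to the very statistic it is then encoded with.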
Summary
What's in the example
Verification
Two rounds of fresh-agent testing caught and fixed: venv prereq on modern Python installs; `python3` vs `python` on Ubuntu; the `kaggle` package has no `__main__`, so the entry point needed to be `kaggle.cli`. The final sanity check blocked on a `403 Forbidden` from the Kaggle API (rules acceptance is a per-user prereq, called out in the README).
Test plan
🤖 Generated with Claude Code