Commit 202203e

update execution doc
1 parent 768177e commit 202203e

1 file changed

Lines changed: 5 additions & 6 deletions

ADVANCED_USAGE.md

@@ -69,7 +69,6 @@ Below are all the arguments for `bigcodebench.evaluate` for the remote evaluatio
 - `--tokenizer_legacy`: Whether to use the legacy tokenizer, default to `False`
 - `--samples`: The path to the generated samples file, default to `None`
 - `--no_execute`: Whether to not execute the samples, default to `False`
-- `--local_execute`: Whether to execute the samples locally, default to `False`
 - `--remote_execute_api`: The API endpoint for remote execution, default to `https://bigcode-bigcodebench-evaluator.hf.space/`, you can also use your own Gradio API endpoint by cloning the [bigcodebench-evaluator](https://huggingface.co/spaces/bigcode/bigcodebench-evaluator) repo and check `Use via API` at the bottom of the HF space page
 - `--pass_k`: The `k` in `Pass@k`, default to `[1, 5, 10]`, e.g. `--pass_k 1,5,10` will evaluate `Pass@1`, `Pass@5` and `Pass@10`
 - `--calibrated`: Whether to use the calibrated samples, default to `True`
@@ -208,10 +207,10 @@ You are strongly recommended to use a sandbox such as [docker](https://docs.dock
 # If you want to change the RAM data segment limit (in MB, 30 GB by default): `--max-data-limit`
 # If you want to change the RAM stack limit (in MB, 10 MB by default): `--max-stack-limit`
 # If you want to increase the execution time limit (in seconds, 240 seconds by default): `--min-time-limit`
-docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --local_execute --split [complete|instruct] --subset [full|hard] --samples samples-sanitized-calibrated.jsonl
+docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --execution local --split [complete|instruct] --subset [full|hard] --samples samples-sanitized-calibrated.jsonl
 
 # If you only want to check the ground truths
-docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --local_execute --split [complete|instruct] --subset [full|hard] --samples samples-sanitized-calibrated.jsonl --check-gt-only
+docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --execution local --split [complete|instruct] --subset [full|hard] --samples samples-sanitized-calibrated.jsonl --check-gt-only
 ```
 
 ...Or if you want to try it locally regardless of the risks ⚠️:
@@ -226,11 +225,11 @@ Then, run the evaluation:
 
 ```bash
 # ...Or locally ⚠️
-bigcodebench.evaluate --local_execute --split [complete|instruct] --subset [full|hard] --samples samples-sanitized-calibrated.jsonl
+bigcodebench.evaluate --execution local --split [complete|instruct] --subset [full|hard] --samples samples-sanitized-calibrated.jsonl
 # ...If you really don't want to check the ground truths
-bigcodebench.evaluate --local_execute --split [complete|instruct] --subset [full|hard] --samples samples-sanitized-calibrated.jsonl --no-gt
+bigcodebench.evaluate --execution local --split [complete|instruct] --subset [full|hard] --samples samples-sanitized-calibrated.jsonl --no-gt
 # If you want to save the pass rate to a file
-bigcodebench.evaluate --local_execute --split [complete|instruct] --subset [full|hard] --samples samples-sanitized-calibrated.jsonl --save_pass_rate
+bigcodebench.evaluate --execution local --split [complete|instruct] --subset [full|hard] --samples samples-sanitized-calibrated.jsonl --save_pass_rate
 
 # You are strongly recommended to use the following command to clean up the environment after evaluation:
 pids=$(ps -u $(id -u) -o pid,comm | grep 'bigcodebench' | awk '{print $1}'); if [ -n \"$pids\" ]; then echo $pids | xargs -r kill; fi;
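
Every changed command in this diff makes the same substitution: the boolean flag `--local_execute` becomes `--execution local`. For existing wrapper scripts that still pass the old flag, the rewrite could be sketched as below. The helper name `migrate_flag` is hypothetical, and the sketch assumes the old flag appears verbatim in the command string. It only mirrors the documentation change shown here and does not check what the CLI itself accepts.

```shell
#!/bin/sh
# Hypothetical helper (not part of bigcodebench): rewrite the old boolean
# flag to the form used in the updated documentation.
migrate_flag() {
  # Replace every occurrence of --local_execute with --execution local.
  printf '%s\n' "$1" | sed 's/--local_execute/--execution local/g'
}

# Example command taken from the old docs, with concrete split/subset values.
old_cmd='bigcodebench.evaluate --local_execute --split complete --subset hard --samples samples-sanitized-calibrated.jsonl'
new_cmd=$(migrate_flag "$old_cmd")
echo "$new_cmd"
```

Other arguments such as `--check-gt-only`, `--no-gt`, and `--save_pass_rate` pass through unchanged, which matches the diff above.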
