Summary
I now believe this specific incident was most likely caused by the account running out of credits during or right after step submission.
The problem is that Weco did not make that clear. The run ended up looking like a generic failure instead of a billing interruption.
What happened
Affected run: 9aa626ae-da47-45c5-a8ce-d25b0258cfd5
Observed behavior:
- step 14 finished locally and wrote a normal eval output with a metric
- the backend later showed the run as status=error at current_step=14
- the step 14 node ended up status=interrupted, with execution_output=null and metric=null
- the task still appeared active
- current account balance is now negative
That makes it look very likely that credits ran out around submission time.
Why this is confusing
From the CLI side, the underlying reason is not surfaced clearly:
- weco/api.py swallows submit exceptions and returns None
- weco/optimizer.py converts that into a generic submit_failed run error
So even if the real cause is "out of credits", the user sees a vague run failure instead of a clear billing-related interruption.
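To illustrate why the cause gets lost, here is a minimal sketch of the swallow-and-return-None pattern described above. It is not the actual Weco code: SubmitError, send_result, submit_swallowing, and run_step are hypothetical names, and the use of a 402 status for exhausted credits is an assumption.

```python
from typing import Optional


class SubmitError(Exception):
    """Hypothetical error carrying an HTTP-like status code."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(message)
        self.status_code = status_code


def send_result(balance: float) -> dict:
    """Stand-in for the backend call; fails once the balance is exhausted."""
    if balance <= 0:
        # Assumption: exhausted credits are reported as a 402-style error.
        raise SubmitError(402, "insufficient credits")
    return {"status": "ok"}


def submit_swallowing(balance: float) -> Optional[dict]:
    """Mimics the described behavior: every exception collapses to None."""
    try:
        return send_result(balance)
    except Exception:
        return None


def run_step(balance: float) -> str:
    """Mimics the caller: None carries no cause, so only a generic error is possible."""
    response = submit_swallowing(balance)
    if response is None:
        return "submit_failed"
    return "ok"
```

Once the exception has been reduced to None, the caller cannot tell a billing stop from a network or server failure, which matches what the run state showed.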
Repro outline
A likely way to reproduce this:
- Start a queue-based run with weco run ... --save-logs.
- Let the account balance get very low.
- Let a step finish local evaluation successfully.
- Have credits run out during or immediately after submission to /suggest.
- Compare local logs with the run state in Weco.
Expected
If a run stops because credits ran out, Weco should say that explicitly.
Concretely, I would expect:
- a specific CLI error message like "Run interrupted: out of credits"
- preserved termination details in the run state
- a clear dashboard indicator that the run stopped because of credits, not because of a code or system issue
- ideally a direct prompt or link to top up credits / enable auto top-up
Actual
- local eval finished
- the run looked like a generic error/interruption
- the real reason was not visible in the CLI or dashboard
Relevant code
- weco/api.py -> submit_execution_result(...)
- weco/optimizer.py -> _run_optimization_loop(...)
Suggested fix
- detect and preserve credit-related submit failures explicitly
- surface a specific out-of-credits message in the CLI
- show a billing/credits-specific stop reason in the dashboard
- avoid collapsing this into a generic submit_failed error
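One possible shape for this, as a rough sketch only: the submit helper returns a small result object that keeps the failure reason, and the CLI maps that reason to a message. All names here (StopReason, SubmitResult, classify, submit, cli_message) are hypothetical, and treating a 402 status as the out-of-credits signal is an assumption about the backend, not something I have confirmed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class StopReason(Enum):
    OUT_OF_CREDITS = "out_of_credits"
    SUBMIT_FAILED = "submit_failed"


class SubmitError(Exception):
    """Hypothetical error carrying an HTTP-like status code."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(message)
        self.status_code = status_code


@dataclass
class SubmitResult:
    response: Optional[dict] = None
    stop_reason: Optional[StopReason] = None
    detail: str = ""


def classify(exc: Exception) -> StopReason:
    """Map an exception to a stop reason instead of discarding it."""
    # Assumption: the backend signals exhausted credits with a 402 status.
    if isinstance(exc, SubmitError) and exc.status_code == 402:
        return StopReason.OUT_OF_CREDITS
    return StopReason.SUBMIT_FAILED


def submit(send: Callable[[], dict]) -> SubmitResult:
    """Call the backend and keep the failure reason when it raises."""
    try:
        return SubmitResult(response=send())
    except Exception as exc:
        return SubmitResult(stop_reason=classify(exc), detail=str(exc))


def cli_message(result: SubmitResult) -> str:
    """Turn the preserved reason into a user-facing message."""
    if result.stop_reason is None:
        return "Step submitted"
    if result.stop_reason is StopReason.OUT_OF_CREDITS:
        return "Run interrupted: out of credits"
    return f"Run error: submit failed ({result.detail})"
```

The same stop_reason value could be stored in the run state so the dashboard can show a credits-specific indicator instead of a generic error.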