Running out of credits during submit is surfaced as a generic run error #131

@SebastianBoehler

Summary

I now believe this specific incident was most likely caused by the account running out of credits during or right after step submission.

The problem is that Weco did not make that clear. The run ended up looking like a generic failure instead of a billing interruption.

What happened

Affected run: 9aa626ae-da47-45c5-a8ce-d25b0258cfd5

Observed behavior:

  • step 14 finished locally and wrote a normal eval output with a metric
  • the backend later showed the run as status=error at current_step=14
  • the step 14 node ended up status=interrupted, with execution_output=null and metric=null
  • the task still appeared active
  • current account balance is now negative

Taken together, the interrupted step 14 node and the negative balance make it very likely that credits ran out around submission time.

Why this is confusing

From the CLI side, the underlying reason is not surfaced clearly:

  • weco/api.py swallows submit exceptions and returns None
  • weco/optimizer.py converts that into a generic submit_failed run error

So even if the real cause is "out of credits", the user sees a vague run failure instead of a clear billing-related interruption.
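To illustrate why the cause gets lost, here is a minimal, self-contained sketch of the pattern described above. It is not the actual Weco code: `OutOfCreditsError`, `submit_swallowing`, and `run_step` are hypothetical names, and the two failing callables stand in for whatever the real submit call does.

```python
class OutOfCreditsError(Exception):
    """Hypothetical stand-in for a credit-related backend failure."""


def submit_swallowing(send):
    # Mirrors the described behavior: any exception is turned into None.
    try:
        return send()
    except Exception:
        return None


def run_step(send):
    result = submit_swallowing(send)
    if result is None:
        # At this point the caller cannot tell a billing stop
        # from a network problem, so both collapse into one label.
        return "submit_failed"
    return "ok"


def out_of_credits():
    raise OutOfCreditsError("insufficient credits")


def network_down():
    raise ConnectionError("connection reset")


print(run_step(out_of_credits))
print(run_step(network_down))
```

Both calls produce the same `submit_failed` label, which is the ambiguity this issue is about.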

Repro outline

A likely way to reproduce this:

  1. Start a queue-based run with weco run ... --save-logs.
  2. Let the account balance get very low.
  3. Let a step finish local evaluation successfully.
  4. Have credits run out during or immediately after submission to /suggest.
  5. Compare local logs with the run state in Weco.

Expected

If a run stops because credits ran out, Weco should say that explicitly.

Concretely, I would expect:

  • a specific CLI error message like "Run interrupted: out of credits"
  • preserved termination details in the run state
  • a clear dashboard indicator that the run stopped because of credits, not because of a code or system issue
  • ideally a direct prompt or link to top up credits / enable auto top-up

Actual

  • local eval finished
  • the run looked like a generic error/interruption
  • the real reason was not visible in the CLI or dashboard

Relevant code

  • weco/api.py -> submit_execution_result(...)
  • weco/optimizer.py -> _run_optimization_loop(...)

Suggested fix

  • detect and preserve credit-related submit failures explicitly
  • surface a specific out-of-credits message in the CLI
  • show a billing/credits-specific stop reason in the dashboard
  • avoid collapsing this into a generic submit_failed error
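One possible shape for this, as a rough sketch only: keep the failure reason as a value instead of returning `None`. Everything here is an assumption for illustration, including the `StopReason` enum, the `SubmitError` class, and the guess that the backend would signal missing credits with HTTP 402 or a message mentioning credits. The real backend response may look different.

```python
from enum import Enum


class StopReason(Enum):
    OUT_OF_CREDITS = "out_of_credits"
    SUBMIT_FAILED = "submit_failed"


class SubmitError(Exception):
    """Hypothetical error carrying what the backend reported."""

    def __init__(self, status_code=None, detail=""):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def classify(err):
    # Assumption: 402 (Payment Required) or a "credit" mention means billing.
    if err.status_code == 402 or "credit" in err.detail.lower():
        return StopReason.OUT_OF_CREDITS
    return StopReason.SUBMIT_FAILED


def submit(send):
    # Return (result, failure) so the reason survives instead of becoming None.
    try:
        return send(), None
    except SubmitError as err:
        return None, (classify(err), err.detail)


def cli_message(reason, detail=""):
    if reason is StopReason.OUT_OF_CREDITS:
        return "Run interrupted: out of credits. Top up or enable auto top-up to continue."
    return f"Run error: submit_failed ({detail or 'no details'})"


def failing_send():
    raise SubmitError(status_code=402, detail="Insufficient credits")


result, failure = submit(failing_send)
if failure is not None:
    reason, detail = failure
    print(cli_message(reason, detail))
```

The same `StopReason` value could be stored with the run state so the dashboard can show a credits-specific stop reason.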
