
fix: clear error state on disabled-transitively cells when ancestor recovers#8784

Merged
dmadisetti merged 24 commits intomarimo-team:mainfrom
VishakBaddur:fix/disabled-cell-error-state-not-cleared
Apr 17, 2026


@VishakBaddur
Contributor

Fixes #8072

Root Cause

When a disabled-transitively cell's ancestor had an error and then recovered, the disabled cell permanently showed the ancestor's error state.

run_stale_cells() in runtime.py only re-queues non-disabled cells:

if cell_impl.stale and not self.graph.is_disabled(cid):
    cells_to_run.add(cid)

So disabled-transitively cells never got re-queued and never had a chance to reset their run_result_status from "exception" to "disabled".
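To make the root cause concrete, here is a toy model of that filter. `ToyCell` and `stale_cells_to_run` are hypothetical stand-ins, not marimo classes, and the `disabled` flag stands in for the result of `graph.is_disabled(cid)`:

```python
from dataclasses import dataclass


@dataclass
class ToyCell:
    """Hypothetical stand-in for a cell's runtime bookkeeping."""

    stale: bool
    disabled: bool  # stands in for graph.is_disabled(cid)
    run_result_status: str


def stale_cells_to_run(cells: dict[str, ToyCell]) -> set[str]:
    # Same condition as the filter above: disabled cells are never queued.
    return {cid for cid, c in cells.items() if c.stale and not c.disabled}


cells = {
    "ancestor": ToyCell(stale=True, disabled=False, run_result_status="exception"),
    "downstream": ToyCell(stale=True, disabled=True, run_result_status="exception"),
}
queued = stale_cells_to_run(cells)
```

Only `ancestor` is queued, so nothing in this path ever revisits the `"exception"` status stored on `downstream`, however the ancestor's own run turns out.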

Fix

  • Added is_any_ancestor_errored() to DirectedGraph
  • In run_stale_cells(), after building cells_to_run, reset run_result_status to "disabled" for any disabled-transitively cell whose ancestor no longer has an error
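A minimal sketch of these two steps on a toy graph follows. `ToyGraph`, `clear_recovered`, and the parent-map representation are assumptions for illustration only; marimo's `DirectedGraph` stores its topology differently, and this mirrors just the conditions listed above:

```python
from collections import deque


class ToyGraph:
    """Hypothetical minimal cell DAG (not marimo's DirectedGraph)."""

    def __init__(self) -> None:
        self.parents: dict[str, set[str]] = {}
        self.status: dict[str, str] = {}  # stands in for run_result_status
        self.config_disabled: set[str] = set()

    def add_cell(self, cid: str, parents=(), status="success", disabled=False) -> None:
        self.parents[cid] = set(parents)
        self.status[cid] = status
        if disabled:
            self.config_disabled.add(cid)

    def ancestors(self, cid: str) -> set[str]:
        # Breadth-first walk over parent edges.
        seen: set[str] = set()
        queue = deque(self.parents[cid])
        while queue:
            parent = queue.popleft()
            if parent not in seen:
                seen.add(parent)
                queue.extend(self.parents[parent])
        return seen

    def is_disabled(self, cid: str) -> bool:
        # Disabled in its own config, or downstream of a disabled cell.
        return cid in self.config_disabled or bool(
            self.ancestors(cid) & self.config_disabled
        )

    def is_any_ancestor_errored(self, cid: str) -> bool:
        return any(self.status[a] == "exception" for a in self.ancestors(cid))


def clear_recovered(graph: ToyGraph) -> list[str]:
    """Reset disabled-transitively cells whose ancestors no longer error."""
    cleared = []
    for cid in graph.parents:
        if (
            graph.is_disabled(cid)
            and cid not in graph.config_disabled  # transitively disabled only
            and graph.status[cid] == "exception"
            and not graph.is_any_ancestor_errored(cid)
        ):
            graph.status[cid] = "disabled"
            cleared.append(cid)
    return cleared


# Chain 0 -> 1 -> 2; cell 1 is disabled, so cell 2 is disabled transitively.
g = ToyGraph()
g.add_cell("0", status="exception")
g.add_cell("1", parents=["0"], status="disabled", disabled=True)
g.add_cell("2", parents=["1"], status="exception")  # shows cell 0's error

still_broken = clear_recovered(g)  # cell 0 still errors, nothing cleared
g.status["0"] = "success"  # the user fixes cell 0
recovered = clear_recovered(g)  # cell 2 is reset to "disabled"
```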

Testing

Added test_is_any_ancestor_errored to tests/_runtime/test_dataflow.py, verifying that the new graph method detects an ancestor's error state and stops reporting it once the ancestor recovers.

@vercel

vercel Bot commented Mar 20, 2026

The latest updates on your projects.

Project       Deployment  Actions           Updated (UTC)
marimo-docs   Ready       Preview, Comment  Apr 17, 2026 8:33pm


Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment thread marimo/_runtime/dataflow/graph.py Outdated
def is_any_ancestor_errored(self, cell_id: CellId_t) -> bool:
    """Check if any ancestor of a cell has an error."""
    return any(
        self.topology.cells[cid].run_result_status == "exception"

Copilot AI Mar 30, 2026


is_any_ancestor_errored() only treats run_result_status == "exception" as an error, but the runtime also uses other error-like statuses (e.g. "marimo-error" is set for semantic/registration errors). The method name/docstring says “has an error”, so this narrow check is likely to be reused incorrectly and can cause false negatives when an ancestor is still in an error state.

Consider either (a) broadening the predicate to include all statuses that should be treated as “errored” (at least "exception" and "marimo-error", possibly "interrupted" depending on intent), or (b) renaming/docstring to make it explicit that it only checks for raised exceptions.

Suggested change
-        self.topology.cells[cid].run_result_status == "exception"
+        self.topology.cells[cid].run_result_status in ("exception", "marimo-error")
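Option (a) could be sketched by pulling the error-like statuses into one named set so every caller shares the same definition. `ERRORED_STATUSES` and `any_ancestor_errored` are hypothetical names; whether `"interrupted"` belongs in the set is the open question raised above, and it is deliberately left out here:

```python
from typing import Iterable

# Hypothetical shared definition of "errored". "interrupted" is omitted on
# purpose; include it only if interrupted ancestors should keep the error UI.
ERRORED_STATUSES = frozenset({"exception", "marimo-error"})


def any_ancestor_errored(ancestor_statuses: Iterable[str]) -> bool:
    """True if any ancestor's run_result_status counts as errored."""
    return any(status in ERRORED_STATUSES for status in ancestor_statuses)


semantic_error = any_ancestor_errored(["success", "marimo-error"])  # True
interrupted_only = any_ancestor_errored(["success", "interrupted"])  # False
```

Keeping the set in one place also stops the graph helper and any runtime-side checks from drifting apart as new statuses are added.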

Comment thread marimo/_runtime/runtime.py Outdated
Comment on lines +1788 to +1797
# Clear stale error state from disabled-transitively cells whose
# ancestor has recovered from an error. Without this, the disabled
# cell permanently shows the ancestor error even after it is fixed.
for cid, cell_impl in self.graph.cells.items():
    if (
        self.graph.is_disabled(cid)
        and not cell_impl.config.disabled
        and cell_impl.run_result_status == "exception"
        and not self.graph.is_any_ancestor_errored(cid)
    ):

Copilot AI Mar 30, 2026


This loop calls self.graph.is_disabled(cid) for every cell on every run_stale_cells() invocation. DirectedGraph.is_disabled() walks parents (BFS) and can become a noticeable hot path for large notebooks.

Since this block only targets disabled-transitively cells, consider using the already-tracked runtime state (cell_impl.runtime_state == "disabled-transitively" / cell_impl.disabled_transitively) instead of recomputing is_disabled() each time, or precomputing a disabled set once and reusing it in both loops.
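The "precompute once" alternative might look like the following: a single pass in topological order marks every cell that is disabled directly or through an ancestor, and both loops then use set membership instead of one ancestor walk per cell. The parent-map input and the `compute_disabled` name are assumptions; `graphlib.TopologicalSorter` is from the Python standard library (3.9+):

```python
from graphlib import TopologicalSorter


def compute_disabled(
    parents: dict[str, set[str]], config_disabled: set[str]
) -> set[str]:
    """Return all cells disabled directly or via a disabled ancestor.

    static_order() yields each cell after its parents, so by the time a
    cell is visited, its parents' disabled status is already known.
    """
    disabled: set[str] = set()
    for cid in TopologicalSorter(parents).static_order():
        if cid in config_disabled or any(
            p in disabled for p in parents.get(cid, ())
        ):
            disabled.add(cid)
    return disabled


parents = {"0": set(), "1": {"0"}, "2": {"1"}, "3": {"0"}}
disabled = compute_disabled(parents, config_disabled={"1"})
transitively_disabled = disabled - {"1"}
```

This visits each cell and edge once per call, which is the point of the suggestion: the cost no longer grows with one BFS per cell in the loop.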

Comment thread marimo/_runtime/runtime.py Outdated
Comment on lines +1788 to +1799
# Clear stale error state from disabled-transitively cells whose
# ancestor has recovered from an error. Without this, the disabled
# cell permanently shows the ancestor error even after it is fixed.
for cid, cell_impl in self.graph.cells.items():
    if (
        self.graph.is_disabled(cid)
        and not cell_impl.config.disabled
        and cell_impl.run_result_status == "exception"
        and not self.graph.is_any_ancestor_errored(cid)
    ):
        cell_impl.set_run_result_status("disabled")


Copilot AI Mar 30, 2026


This block updates cell_impl.run_result_status but does not emit any CellNotification to the frontend. The frontend’s “errored”/error UI is driven by received cell-op messages (especially error outputs), and it doesn’t observe backend run_result_status directly.

If the goal is to clear the user-visible error state for disabled-transitively cells, this likely also needs an explicit UI update (e.g., clearing/replacing the error output and/or sending a status transition that resets the frontend’s errored flag). An alternative is to include these cells in the normal _run_cells queue so they go through the runner’s standard status transitions, plus explicitly clearing their error output when they’re skipped as disabled.
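The gap can be modeled with a toy backend and frontend that share nothing but a message list. Everything here is hypothetical (the message fields, `clear_stale_error`, `render`); it is not marimo's `CellNotification` API, only a sketch of why a backend status change must also travel as a message:

```python
from typing import Any, Callable

Message = dict[str, Any]  # hypothetical wire format


def clear_stale_error(
    statuses: dict[str, str], cid: str, send: Callable[[Message], None]
) -> None:
    # Backend bookkeeping alone is invisible to the UI...
    statuses[cid] = "disabled"
    # ...so also send a message that replaces the stale error output.
    send({"cell_id": cid, "status": "disabled-transitively", "output": None})


def render(messages: list[Message]) -> dict[str, Message]:
    """Frontend view: the latest message per cell is all it knows."""
    view: dict[str, Message] = {}
    for msg in messages:
        view[msg["cell_id"]] = msg
    return view


inbox: list[Message] = [
    {"cell_id": "2", "status": "idle", "output": "NameError: name 'x' is not defined"}
]
statuses = {"2": "exception"}
clear_stale_error(statuses, "2", inbox.append)
latest = render(inbox)["2"]
```

Without the `send` call, `render(inbox)` would keep showing the `NameError` output even though the backend status had already changed.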

Comment on lines +1594 to +1623
def test_is_any_ancestor_errored() -> None:
    """Test that is_any_ancestor_errored correctly detects ancestor errors."""
    graph = dataflow.DirectedGraph()
    # Create a chain: 0 -> 1 -> 2
    code = "x = 0"
    first_cell = parse_cell(code)
    graph.register_cell("0", first_cell)
    code = "y = x"
    second_cell = parse_cell(code)
    graph.register_cell("1", second_cell)
    code = "z = y"
    third_cell = parse_cell(code)
    graph.register_cell("2", third_cell)

    # No errors initially
    assert not graph.is_any_ancestor_errored("0")
    assert not graph.is_any_ancestor_errored("1")
    assert not graph.is_any_ancestor_errored("2")

    # Set cell 0 to exception state
    graph.cells["0"].set_run_result_status("exception")
    assert not graph.is_any_ancestor_errored("0")  # no ancestors
    assert graph.is_any_ancestor_errored("1")  # parent 0 has error
    assert graph.is_any_ancestor_errored("2")  # grandparent 0 has error

    # Fix cell 0 - clear the error
    graph.cells["0"].set_run_result_status("success")
    assert not graph.is_any_ancestor_errored("0")
    assert not graph.is_any_ancestor_errored("1")
    assert not graph.is_any_ancestor_errored("2")

Copilot AI Mar 30, 2026


This test validates the new DirectedGraph.is_any_ancestor_errored() helper, but the PR’s user-facing behavior change is in Kernel.run_stale_cells() (clearing disabled-transitively cells’ stale error state when an ancestor recovers). Consider adding an integration-style runtime test that reproduces #8072 end-to-end (ancestor errors → downstream disabled-transitively cell shows error → ancestor fixed + run_stale_cells() → downstream cell no longer shows error/exception state). This would help ensure the run_stale_cells() logic stays correct as execution/notification behavior evolves.

@dmadisetti
Collaborator

Hi @VishakBaddur, are you still interested in contributing this? I'm not actually seeing this change reflected in the smoke test I just pushed.

[image]

If you want to update with a screenshot once you get this working that would be great.

Collaborator

@dmadisetti dmadisetti left a comment


Moving over to draft until we do a bit of iteration. Thanks!

@dmadisetti dmadisetti marked this pull request as draft April 1, 2026 21:06
@VishakBaddur
Contributor Author

Hi @dmadisetti, thanks for the smoke test and for pushing this forward! I'll investigate why the error state isn't clearing in the UI. It looks like I also need to emit a frontend notification after resetting run_result_status. Will update the PR shortly.

@VishakBaddur
Contributor Author

Hi @dmadisetti, thanks for the smoke test; it helped pinpoint the exact gap. I've pushed a fix that addresses both the backend and the frontend:

Root cause (two parts):
Backend: run_stale_cells() was resetting run_result_status but never emitting any frontend notifications, so the UI had no signal to update.
Frontend: transitionCell() in cell.ts did not reset the errored flag when receiving "disabled-transitively" status, leaving the red has-error border even after the output was cleared.

Fix:
Call cell_impl.set_runtime_state("disabled-transitively") to keep backend state consistent and broadcast status to the frontend
Call CellNotificationUtils.broadcast_empty_output(cell_id, status="disabled-transitively") to replace the stale error output in the UI
Reset nextCell.errored = false in the "disabled-transitively" case in transitionCell(). This is safe because every other code path that sends "disabled-transitively" already has errored set to false, and if an error message follows (as in mutate_graph), it correctly sets errored = true again
Broadened is_any_ancestor_errored to include "marimo-error" in addition to "exception"
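The `errored` handling in the third item can be modeled as a small state transition. This is a Python sketch of the rule as described in this comment, not the actual TypeScript in `cell.ts`; `CellFlags`, `transition`, and the `error_output` flag are hypothetical:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CellFlags:
    """Hypothetical subset of the frontend cell state."""

    status: str = "idle"
    errored: bool = False


def transition(cell: CellFlags, status: str, error_output: bool = False) -> CellFlags:
    nxt = replace(cell, status=status)
    if status == "disabled-transitively":
        # Reset so a stale ancestor error no longer draws the red border.
        nxt = replace(nxt, errored=False)
    if error_output:
        # A message carrying an error output sets the flag again.
        nxt = replace(nxt, errored=True)
    return nxt


stale = CellFlags(status="idle", errored=True)
cleared = transition(stale, "disabled-transitively")
# As in mutate_graph: the status message is followed by an error message.
re_errored = transition(cleared, "disabled-transitively", error_output=True)
```

The reset is safe under this model because a later error message always wins, so a cell that really is in error ends up with `errored=True`.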

@VishakBaddur VishakBaddur marked this pull request as ready for review April 2, 2026 04:29
@VishakBaddur VishakBaddur requested a review from dmadisetti April 2, 2026 04:29
@dmadisetti
Collaborator

@VishakBaddur this still doesn't work as expected

[image]

Can you update with a screenshot when you're ready?

@VishakBaddur
Contributor Author

Hi @dmadisetti, the fix now works end-to-end. Here's what changed:
The previous approach only handled the backend state but missed two critical issues:

Wrong code path: The clear-stale logic was in run_stale_cells(), but the UI element toggle triggers set_ui_element_value() → _run_cells() directly. So the fix never fired during the actual bug reproduction.

Frontend never received a clear signal: Even when the backend reset run_result_status, no frontend notification was sent to clear the stale error output and errored/stopped flags.

The actual fix:

Backend: Snapshot disabled cells in error/cancelled state before running. After _run_cells() completes, broadcast empty output + correct status for any cell whose ancestor has now recovered. This lives in _run_cells() so it covers all code paths.
Frontend: Clear the stopped flag in the disabled-transitively case; clear stopped/errored in the idle case when non-error output arrives.
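The backend half can be sketched as snapshot, run, then clean up. All names here (`run_cells_with_cleanup`, the callbacks, the message dict, the `ERRORED` set) are hypothetical. In particular, the ancestor check that ignores disabled ancestors is an assumption made so a chain of disabled cells recovers in one pass; it may not match marimo's `is_any_ancestor_errored`:

```python
from typing import Any, Callable

# Assumed set of statuses that leave a stale error/cancelled display.
ERRORED = frozenset({"exception", "marimo-error", "cancelled"})


def run_cells_with_cleanup(
    cell_ids: list[str],
    statuses: dict[str, str],
    disabled: set[str],  # disabled directly or transitively
    config_disabled: set[str],  # disabled in the cell's own config
    run: Callable[[list[str]], None],
    any_ancestor_errored: Callable[[str], bool],
    send: Callable[[dict[str, Any]], None],
) -> None:
    # 1. Snapshot first: running cells overwrites their statuses.
    errored_disabled = sorted(c for c in disabled if statuses.get(c) in ERRORED)
    # 2. Run the requested cells as usual.
    run(cell_ids)
    # 3. Clear snapshotted cells whose ancestors have now recovered.
    for cid in errored_disabled:
        if any_ancestor_errored(cid):
            continue
        status = "idle" if cid in config_disabled else "disabled-transitively"
        send({"cell_id": cid, "status": status, "output": None})


# Usage: chain 0 -> 1 -> 2, where cell 1 is disabled in its config.
parents = {"0": [], "1": ["0"], "2": ["1"]}
disabled, config_disabled = {"1", "2"}, {"1"}
statuses = {"0": "exception", "1": "cancelled", "2": "cancelled"}


def ancestors(cid: str) -> set[str]:
    found, stack = set(), list(parents[cid])
    while stack:
        p = stack.pop()
        if p not in found:
            found.add(p)
            stack.extend(parents[p])
    return found


def fake_run(ids: list[str]) -> None:
    for cid in ids:
        statuses[cid] = "success"  # the user fixed cell 0


def errored_check(cid: str) -> bool:
    # Assumption: only ancestors that actually run can still be in error.
    return any(statuses[a] in ERRORED for a in ancestors(cid) if a not in disabled)


sent: list[dict[str, Any]] = []
run_cells_with_cleanup(
    ["0"], statuses, disabled, config_disabled, fake_run, errored_check, sent.append
)
```

Taking the snapshot before `run(...)` matters because `run_result_status` is updated during the run, so a check made afterwards could no longer tell which disabled cells had been showing a stale error.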

[Screenshots: 2026-04-04 at 10:24:31 AM and 10:25:04 AM]

@codecov

codecov Bot commented Apr 4, 2026

Bundle Report

Changes will increase total bundle size by 1.88kB (0.01%) ⬆️. This is within the configured threshold ✅

Detailed changes
Bundle name   Size      Change
marimo-esm    24.88MB   1.88kB (0.01%) ⬆️

Affected Assets, Files, and Routes:

view changes for bundle: marimo-esm

Assets Changed:

Asset Name               Size Change   Total Size   Change (%)
assets/cells-*.js        59 bytes      704.0kB      0.01%
assets/index-*.js        89 bytes      602.39kB     0.01%
assets/index-*.css       48 bytes      362.33kB     0.01%
assets/JsonOutput-*.js   2.14kB        342.26kB     0.63%
assets/edit-*.js         1 bytes       329.61kB     0.0%
assets/add-*.js          7 bytes       192.76kB     0.0%
assets/layout-*.js       -2 bytes      185.91kB     -0.0%
assets/cell-*.js         3 bytes       183.15kB     0.0%
assets/file-*.js         50 bytes      49.4kB       0.1%
assets/panels-*.js       6 bytes       45.36kB      0.01%
assets/session-*.js      -9 bytes      24.99kB      -0.04%
assets/home-*.js         132 bytes     21.86kB      0.61%
assets/purify.es-*.js    16 bytes      21.05kB      0.08%
assets/column-*.js       -659 bytes    6.53kB       -9.16%

Comment thread marimo/_runtime/runtime.py Outdated
# Clear stale error state from disabled cells whose ancestor
# recovered. Uses pre-run snapshot since run_result_status is
# updated during the run.
for _cid in _pre_run_errored_disabled:
Collaborator


nit: these shouldn't be underscore prefixed (implies they are unused)

@dmadisetti
Collaborator

@VishakBaddur great job! Much cleaner implementation. We still get the red highlights:

[image]

But I don't think that's blocking. I think this is a great starting point if you'd like to get it in as is. There's one tiny code cleanup comment, but let me know if you're happy with this.

VishakBaddur and others added 11 commits April 7, 2026 18:05
…ecovers

Fixes marimo-team#8072

When a disabled-transitively cell's ancestor had an error and then
recovered, the disabled cell permanently showed the ancestor's error
state. This happened because run_stale_cells() only re-queues non-disabled
cells, so disabled-transitively cells never got a chance to reset their
run_result_status from 'exception' to 'disabled'.

Fix:
- Add is_any_ancestor_errored() to DirectedGraph
- In run_stale_cells(), after building cells_to_run, reset run_result_status
  to 'disabled' for any disabled-transitively cell whose ancestor no longer
  has an error
…ecovers

Root cause had two parts:
1. Backend: run_stale_cells() only reset run_result_status but never
   emitted frontend notifications, so the UI never saw the change.
2. Frontend: transitionCell() did not reset errored flag on
   'disabled-transitively' status, leaving the red error border.

Fix:
- Call set_runtime_state('disabled-transitively') after resetting
  run_result_status so the backend object state stays consistent
- Call CellNotificationUtils.broadcast_empty_output to replace the
  stale error output in the UI
- Reset nextCell.errored = false in the 'disabled-transitively' case
  in transitionCell() so the has-error CSS class is cleared
- Broaden is_any_ancestor_errored to include 'marimo-error' status
  in addition to 'exception'
- Add test for marimo-error case
Fixes issue marimo-team#8072: disabled cells permanently show ancestor error
after ancestor recovers.

Backend (runtime.py):
- Snapshot disabled cells in error/cancelled state before running
- After _run_cells completes, broadcast empty output + correct status
  for any snapshotted cell whose ancestor has now recovered
- Moved to _run_cells() to cover set_ui_element_value() path too
- config.disabled cells get idle + empty output
- transitively-disabled cells get disabled-transitively + empty output

Frontend (cell.ts):
- disabled-transitively case: also clear stopped flag
- idle case: clear stopped/errored when non-error output arrives
@VishakBaddur
Contributor Author

Hi @dmadisetti, I addressed the nit and removed the underscore prefixes. Happy to merge as-is if you are!

@manzt manzt removed their request for review April 16, 2026 19:54
@mscolnick mscolnick requested a review from dmadisetti April 17, 2026 20:22
Collaborator

@dmadisetti dmadisetti left a comment


Unsure what's happening with CI; runtime tests pass locally. Thanks @VishakBaddur, and sorry for the long delay!

@dmadisetti dmadisetti merged commit ba12764 into marimo-team:main Apr 17, 2026
41 of 111 checks passed
@github-actions

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.2-dev51


Labels

bash-focus (Area to focus on during release bug bash), bug (Something isn't working), dependencies, documentation (Improvements or additions to documentation)


Development

Successfully merging this pull request may close these issues.

Disabled cells' ancestor errors cannot be cleared

4 participants