fix: clear error state on disabled-transitively cells when ancestor recovers#8784
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
| def is_any_ancestor_errored(self, cell_id: CellId_t) -> bool: | ||
| """Check if any ancestor of a cell has an error.""" | ||
| return any( | ||
| self.topology.cells[cid].run_result_status == "exception" |
There was a problem hiding this comment.
is_any_ancestor_errored() only treats run_result_status == "exception" as an error, but the runtime also uses other error-like statuses (e.g. "marimo-error" is set for semantic/registration errors). The method name/docstring says “has an error”, so this narrow check is likely to be reused incorrectly and can cause false negatives when an ancestor is still in an error state.
Consider either (a) broadening the predicate to include all statuses that should be treated as “errored” (at least "exception" and "marimo-error", possibly "interrupted" depending on intent), or (b) renaming/docstring to make it explicit that it only checks for raised exceptions.
| self.topology.cells[cid].run_result_status == "exception" | |
| self.topology.cells[cid].run_result_status in ("exception", "marimo-error") |
| # Clear stale error state from disabled-transitively cells whose | ||
| # ancestor has recovered from an error. Without this, the disabled | ||
| # cell permanently shows the ancestor error even after it is fixed. | ||
| for cid, cell_impl in self.graph.cells.items(): | ||
| if ( | ||
| self.graph.is_disabled(cid) | ||
| and not cell_impl.config.disabled | ||
| and cell_impl.run_result_status == "exception" | ||
| and not self.graph.is_any_ancestor_errored(cid) | ||
| ): |
There was a problem hiding this comment.
This loop calls self.graph.is_disabled(cid) for every cell on every run_stale_cells() invocation. DirectedGraph.is_disabled() walks parents (BFS) and can become a noticeable hot path for large notebooks.
Since this block only targets disabled-transitively cells, consider using the already-tracked runtime state (cell_impl.runtime_state == "disabled-transitively" / cell_impl.disabled_transitively) instead of recomputing is_disabled() each time, or precomputing a disabled set once and reusing it in both loops.
| # Clear stale error state from disabled-transitively cells whose | ||
| # ancestor has recovered from an error. Without this, the disabled | ||
| # cell permanently shows the ancestor error even after it is fixed. | ||
| for cid, cell_impl in self.graph.cells.items(): | ||
| if ( | ||
| self.graph.is_disabled(cid) | ||
| and not cell_impl.config.disabled | ||
| and cell_impl.run_result_status == "exception" | ||
| and not self.graph.is_any_ancestor_errored(cid) | ||
| ): | ||
| cell_impl.set_run_result_status("disabled") | ||
|
|
There was a problem hiding this comment.
This block updates cell_impl.run_result_status but does not emit any CellNotification to the frontend. The frontend’s “errored”/error UI is driven by received cell-op messages (especially error outputs), and it doesn’t observe backend run_result_status directly.
If the goal is to clear the user-visible error state for disabled-transitively cells, this likely also needs an explicit UI update (e.g., clearing/replacing the error output and/or sending a status transition that resets the frontend’s errored flag). An alternative is to include these cells in the normal _run_cells queue so they go through the runner’s standard status transitions, plus explicitly clearing their error output when they’re skipped as disabled.
| def test_is_any_ancestor_errored() -> None: | ||
| """Test that is_any_ancestor_errored correctly detects ancestor errors.""" | ||
| graph = dataflow.DirectedGraph() | ||
| # Create a chain: 0 -> 1 -> 2 | ||
| code = "x = 0" | ||
| first_cell = parse_cell(code) | ||
| graph.register_cell("0", first_cell) | ||
| code = "y = x" | ||
| second_cell = parse_cell(code) | ||
| graph.register_cell("1", second_cell) | ||
| code = "z = y" | ||
| third_cell = parse_cell(code) | ||
| graph.register_cell("2", third_cell) | ||
|
|
||
| # No errors initially | ||
| assert not graph.is_any_ancestor_errored("0") | ||
| assert not graph.is_any_ancestor_errored("1") | ||
| assert not graph.is_any_ancestor_errored("2") | ||
|
|
||
| # Set cell 0 to exception state | ||
| graph.cells["0"].set_run_result_status("exception") | ||
| assert not graph.is_any_ancestor_errored("0") # no ancestors | ||
| assert graph.is_any_ancestor_errored("1") # parent 0 has error | ||
| assert graph.is_any_ancestor_errored("2") # grandparent 0 has error | ||
|
|
||
| # Fix cell 0 - clear the error | ||
| graph.cells["0"].set_run_result_status("success") | ||
| assert not graph.is_any_ancestor_errored("0") | ||
| assert not graph.is_any_ancestor_errored("1") | ||
| assert not graph.is_any_ancestor_errored("2") |
There was a problem hiding this comment.
This test validates the new DirectedGraph.is_any_ancestor_errored() helper, but the PR’s user-facing behavior change is in Kernel.run_stale_cells() (clearing disabled-transitively cells’ stale error state when an ancestor recovers). Consider adding an integration-style runtime test that reproduces #8072 end-to-end (ancestor errors → downstream disabled-transitively cell shows error → ancestor fixed + run_stale_cells() → downstream cell no longer shows error/exception state). This would help ensure the run_stale_cells() logic stays correct as execution/notification behavior evolves.
|
Hi @VishakBaddur are you still interested in contributing this? I'm not actually seeing this change reflected in the smoke test I just pushed.
If you want to update with a screenshot once you get this working that would be great. |
dmadisetti
left a comment
There was a problem hiding this comment.
Moving over to draft until we do a bit of iteration. Thanks!
|
Hi @dmadisetti , thanks for the smoke test and for pushing this forward! I'll investigate why the error state isn't clearing in the UI. Looks like I need to also emit a frontend notification after resetting the run_result_status. Will update the PR shortly. |
|
Hi @dmadisetti , thanks for the smoke test, it helped pinpoint the exact gap. I've pushed a fix that addresses both the backend and frontend sides: Root cause (two parts): Fix: |
|
@VishakBaddur this still doesn't work as expected
Can you update with a screenshot when you're ready? |
|
Hi @dmadisetti , the fix is now working end-to-end. Here's what changed: Wrong code path: The clear-stale logic was in run_stale_cells(), but the UI element toggle triggers set_ui_element_value() → _run_cells() directly. So the fix never fired during the actual bug reproduction. Frontend never received a clear signal: Even when the backend reset run_result_status, no frontend notification was sent to clear the stale error output and errored/stopped flags. The actual fix: Backend: Snapshot disabled cells in error/cancelled state before running. After _run_cells() completes, broadcast empty output + correct status for any cell whose ancestor has now recovered. This lives in _run_cells() so it covers all code paths.
|
Bundle ReportChanges will increase total bundle size by 1.88kB (0.01%) ⬆️. This is within the configured threshold ✅ Detailed changes
Affected Assets, Files, and Routes:view changes for bundle: marimo-esmAssets Changed:
|
f562651 to
efacbc6
Compare
efacbc6 to
13db6dd
Compare
| # Clear stale error state from disabled cells whose ancestor | ||
| # recovered. Uses pre-run snapshot since run_result_status is | ||
| # updated during the run. | ||
| for _cid in _pre_run_errored_disabled: |
There was a problem hiding this comment.
nit: these shouldn't be underscore prefixed (implies they are unused)
|
@VishakBaddur great job! Much cleaner implementation. We still get the red highlights:
But I don't think that's blocking. I think this is a great starting point if you'd like to get it in as is. Tiny code clean up comment, but let me know if you're happy with this |
…ecovers Fixes marimo-team#8072 When a disabled-transitively cell's ancestor had an error and then recovered, the disabled cell permanently showed the ancestor's error state. This happened because run_stale_cells() only re-queues non-disabled cells, so disabled-transitively cells never got a chance to reset their run_result_status from 'exception' to 'disabled'. Fix: - Add is_any_ancestor_errored() to DirectedGraph - In run_stale_cells(), after building cells_to_run, reset run_result_status to 'disabled' for any disabled-transitively cell whose ancestor no longer has an error
for more information, see https://pre-commit.ci
…ecovers
Root cause had two parts:
1. Backend: run_stale_cells() only reset run_result_status but never
emitted frontend notifications, so the UI never saw the change.
2. Frontend: transitionCell() did not reset errored flag on
'disabled-transitively' status, leaving the red error border.
Fix:
- Call set_runtime_state('disabled-transitively') after resetting
run_result_status so the backend object state stays consistent
- Call CellNotificationUtils.broadcast_empty_output to replace the
stale error output in the UI
- Reset nextCell.errored = false in the 'disabled-transitively' case
in transitionCell() so the has-error CSS class is cleared
- Broaden is_any_ancestor_errored to include 'marimo-error' status
in addition to 'exception'
- Add test for marimo-error case
for more information, see https://pre-commit.ci
Fixes issue marimo-team#8072: disabled cells permanently show ancestor error after ancestor recovers. Backend (runtime.py): - Snapshot disabled cells in error/cancelled state before running - After _run_cells completes, broadcast empty output + correct status for any snapshotted cell whose ancestor has now recovered - Moved to _run_cells() to cover set_ui_element_value() path too - config.disabled cells get idle + empty output - transitively-disabled cells get disabled-transitively + empty output Frontend (cell.ts): - disabled-transitively case: also clear stopped flag - idle case: clear stopped/errored when non-error output arrives
for more information, see https://pre-commit.ci
1d2ee2d to
25e8a48
Compare
|
Hi @dmadisetti , addressed the nit, removed the underscore prefixes. Happy to merge as-is if you are! |
…ll-error-state-not-cleared
dmadisetti
left a comment
There was a problem hiding this comment.
Unsure what's happening with ci. runtime tests pass locally. Thanks @VishakBaddur sorry for the long delay!
|
🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.2-dev51 |





Fixes #8072
Root Cause
When a disabled-transitively cell's ancestor had an error and then recovered, the disabled cell permanently showed the ancestor's error state.
run_stale_cells()inruntime.pyonly re-queues non-disabled cells:So disabled-transitively cells never got re-queued and never had a chance to reset their
run_result_statusfrom"exception"to"disabled".Fix
is_any_ancestor_errored()toDirectedGraphrun_stale_cells(), after buildingcells_to_run, resetrun_result_statusto"disabled"for any disabled-transitively cell whose ancestor no longer has an errorTesting
Added
test_is_any_ancestor_erroredtotests/_runtime/test_dataflow.pyverifying the new graph method correctly detects and clears ancestor error states.