Add rerun policy for rollups when an upstream partition is re-run #68778

Open
FrankYang0529 wants to merge 1 commit into apache:main from FrankYang0529:airflow-65923
@FrankYang0529

Member

When an upstream partition that a rollup's downstream window already consumed is cleared and re-run, the framework had no defined behavior. The de-facto outcome silently depended on the rollup's wait policy: WaitForAll left a provisional run stuck waiting for keys that never re-arrive, while MinimumCount re-fired the downstream run on partial data. Neither is something a Dag author can rely on.

Give RollupMapper an explicit rerun_policy so the author chooses what happens. The default preserves the historical behavior, so existing Dags are unchanged; re-firing with the corrected data is opt-in.
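
To make the choice concrete, here is a minimal toy model of a rollup window that waits for all of its upstream keys and then sees one of them re-run. It is a sketch under stated assumptions, not Airflow code: `ToyRollupWindow`, `RerunPolicy`, and the values `IGNORE` and `REFIRE` are hypothetical names invented for illustration, and the description above does not list the actual values that `rerun_policy` accepts. In particular, `IGNORE` is only a stand-in for "do not re-fire"; it does not reproduce the historical wait-policy-dependent behavior that the real default preserves.

```python
from dataclasses import dataclass, field
from enum import Enum


class RerunPolicy(Enum):
    # Hypothetical values; the PR text does not name the real ones.
    IGNORE = "ignore"  # stand-in: leave an already-fired window alone
    REFIRE = "refire"  # stand-in: opt in to re-firing with corrected data


@dataclass
class ToyRollupWindow:
    """Toy wait-for-all window; not the RollupMapper from the PR."""

    expected_keys: frozenset
    rerun_policy: RerunPolicy = RerunPolicy.IGNORE
    arrived: set = field(default_factory=set)
    fire_count: int = 0

    def upstream_succeeded(self, key: str) -> bool:
        """Record a finished upstream partition; return True if downstream fires."""
        is_rerun = key in self.arrived  # this key was already consumed once
        self.arrived.add(key)
        if not self.arrived >= self.expected_keys:
            return False  # still waiting for other keys
        first_fire = self.fire_count == 0
        refire = is_rerun and self.rerun_policy is RerunPolicy.REFIRE
        if first_fire or refire:
            self.fire_count += 1
            return True
        return False


def simulate(policy: RerunPolicy) -> int:
    """Feed keys a, b, then a re-run of a; return how often downstream fired."""
    window = ToyRollupWindow(frozenset({"a", "b"}), rerun_policy=policy)
    for key in ("a", "b", "a"):
        window.upstream_succeeded(key)
    return window.fire_count
```

In this model the author's choice is the only thing that decides whether the re-run of `a` triggers a second downstream run, which is the property the PR is after: the outcome no longer falls out of the wait policy by accident.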

closes: #65923

Was generative AI tooling used to co-author this PR?
  • Yes - Claude Code with Opus 4.8

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding a dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes, create a newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

Signed-off-by: PoAn Yang <payang@apache.org>

Successfully merging this pull request may close these issues.

Idempotency: downstream rollup behavior when an upstream partition is cleared and re-run
