fix: concurrency-safe & idempotent release commit-push to main#188
Draft
PaulNewling wants to merge 3 commits into
Draft
fix: concurrency-safe & idempotent release commit-push to main#188PaulNewling wants to merge 3 commits into
PaulNewling wants to merge 3 commits into
Conversation
The release job's "Commit changed files to main" step ran `changeset version` then a bare `git push`, with no serialization and no retry. When `main` advanced between checkout and push — a concurrent run producing the identical "Auto-generated changes" commit, another merge, or a queued run — the push failed with "cannot lock ref 'refs/heads/main': is at X but expected Y", the step exited 1, and the release was left half-done: versions bumped on main but npm publish + tag (gated on the follow-up run) never happened. Re-running could not recover, since a re-run checks out the frozen run head, now behind main. - Add per-ref `concurrency` (cancel-in-progress: false) to serialize releases so two runs cannot race the push. - Replace the bare push with a rebase-and-retry loop that treats an identical commit already on main as success (idempotent). Applied to both node-simple-pnpm.yaml and node-matrix-pnpm.yaml.
Correct the concurrency comment: with queue:single (default) a pending publish run is superseded, not queued, when a newer same-ref run arrives. Document why this is safe (two-pass release self-heals; only an intermediate version/tag is skipped under burst merges) and note queue:max as the per-bump-tag escape hatch. Reconcile the push-loop comment with the concurrency block.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Make the block-release flow's "Commit changed files to main" step concurrency-safe and idempotent, in both reusable release workflows (
node-simple-pnpm.yamlandnode-matrix-pnpm.yaml).Why — a half-published release
Motivating failure: antibody-tcr-lead-selection — run 28113341538, job 83255658745 (merge of PR #158).
Build, tests, and the security scan passed. The job failed at "Commit changed files to main":
Root cause
The release runs in two passes. A merge carrying changesets runs
changeset version, commits the bump as "Auto-generated changes", and pushes it tomain. Publish and tag are gated onhas-changes == '0', so they run on the follow-up run that this push triggers.The commit step pushed bare — no serialization, no retry:
Two defects compound:
No
concurrencycontrol. Two near-simultaneous main releases each ranchangeset versionand produced the byte-identical "Auto-generated changes" commit — same parent, same bot author, same second-resolution timestamp, so the same SHA. One push won the compare-and-swap; the other lost.A failed bump-push strands the release. The bump landed on
main, but the step exited 1, so the publish/tag pass never ran for that content.mainthen declared versions npm never received: model/ui/workflow at4.3.0onmain, still4.2.xon npm, and nov3.2.0tag. Re-running can't recover — a re-run checks out the frozen run head, now behindmain, so every push fails non-fast-forward.What this PR changes
For the release job in both workflows (
build-test-publish/build-publish):Serialize releases per ref so two runs cannot race the push:
cancel-in-progress: falselets an in-flight release finish (publish + tag) rather than being cut mid-publish.Push with rebase-and-retry — replace
git pushwith a loop that rebases onto the latestmainand treats an identical commit already onmainas success:This survives
mainadvancing between checkout and push and succeeds when the bump is already onmain.Concurrency semantics — what this does and does not guarantee
GitHub keeps one running and one pending run per group (
queue: single, the default).cancel-in-progress: falseprotects the in-progress run, not a pending one: a newer same-ref run supersedes the pending run and takes its place.This is safe because the release is two-pass and self-heals:
has-changes == '0'.package.jsonversions — the full accumulated bump — so the latest code always reaches npm once merges quiesce.The supersession costs only intermediate releases, and only under burst merges. If a second merge lands while an earlier release is still running, the earlier publish run can be cancelled; npm then skips that intermediate version and its
v…tag and folds those changes into the next published version. On a normal cadence — one release finishing before the next merge — nothing is superseded and every version publishes.To give every bump its own version and tag, set
queue: max(compatible withcancel-in-progress: false; the rejected combination isqueue: maxwithcancel-in-progress: true). I left it at the default; flag if you want per-bump tags.Scope and trade-offs
cancel-in-progress: falsealso touches PR runs sharing a ref. Rapid re-pushes to a PR queue the release job instead of cancelling the in-progress one. Acceptable and safer, but flagging it in case you prefer scoping concurrency tomain.Testing
Static. Both files validated with
yqandruby; theconcurrencyblocks and the new push script sit under the correct jobs and steps.Concurrency semantics confirmed against GitHub docs.
queue: singlesupersedes the pending run regardless ofcancel-in-progress;queue: maxis the escape hatch and is incompatible only withcancel-in-progress: true.Behavioral — deterministic local reproduction. Built a git harness that recreates the failure window without GitHub or timing luck: a bare "origin", a "runner" clone left pointing at the old tip, and a competitor that advances
origin/mainbefore the runner pushes. Fixed commit dates and local-only repos keep it reproducible. Each scenario runs the old push (baregit push) and the new push (rebase-retry) against the same setup:git pushmainadvanced under the runner[rejected]— step exits 1, release strandedmainmain(idempotent re-entry)concurrencyprevents it in practice)The harness reproduces the failure class (push rejected → exit 1 → release stranded). The literal
cannot lock ref … but expected …string is HTTPS-transport-specific and collapses to "up-to-date" against a local remote — the rebase-retry covers it regardless, andconcurrencyremoves the concurrent-twin trigger.Not yet run on a live release. Canary against a low-traffic block PR (flip its
build.yamlto@v4-beta) and confirm the bump, npm publish, and tag all complete before promoting tov4.