mod_http2: reset truncated HTTP/2 responses; spare header-only ones#676
Draft
mmontalbo wants to merge 2 commits into
Draft
mod_http2: reset truncated HTTP/2 responses; spare header-only ones#676mmontalbo wants to merge 2 commits into
mmontalbo wants to merge 2 commits into
Conversation
A mod_cache revalidation 304 (EOR + Flush, no body EOS) reaches s_c2_done() with an output beam that is neither closed nor EOS'd, so h2_beam_is_complete() reports it incomplete and s_c2_done() aborts it, resetting the stream (the client sees "stream not closed cleanly"). But a header-only response (204/304, and HEAD) is complete without a body EOS. Track response completeness in conn_ctx->response_eos_seen, set when an EOS passes out (h2_c2_filter_out) or the response is header-only, and spare such responses from the s_c2_done() abort. stream_data_cb() already carries the same header-only exemption; this brings s_c2_done() in line. Adds test_h2_105_21: a mod_cache revalidation 304 must close cleanly.
On builds without AP_HAS_RESPONSE_BUCKETS (httpd 2.4.x), the h2 worker runs the c2 connection through h2_c2_process()/c2_process() rather than the generic ap_process_connection(). c2_process() closes the output beam unconditionally when the handler returns, so a response that started but never finished (no EOS, e.g. a CGI that hits the server Timeout mid-body) leaves a closed beam that h2_beam_is_complete() reports as complete. s_c2_done() then sends no reset, and stream_data_cb()'s reset fires only on a c1 re-dispatch that is lost under load, so the stream is parked and a client without a stall timeout hangs. When c2_process() finishes with a final response started but no EOS seen (response_eos_seen, from the preceding commit), abort the output beam instead of closing it; an aborted beam returns APR_ECONNABORTED from h2_beam_receive() on the next pull, so the reset fires regardless of the wakeup. Header-only responses (204/304, and HEAD) are marked complete on this path too (h2_c2_filter.c) so they are not aborted. Adds test_h2_105_20: a CGI that goes silent past Timeout, run as concurrent streams, must be reset.
11acde1 to
094f580
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This series addresses two ways mod_http2 mis-terminates an HTTP/2 stream when
a c2 (request) handler finishes without sending an EOS. Patch 1 completes the
PR 69580 fix on trunk; patch 2 fixes a separate hang (PR 70131) on
the 2.4.x build path. Both come from one gap: h2_beam_is_complete() cannot
tell a header-only response (204/304, and HEAD -- complete without a body
EOS) from a truncated one, and mod_http2 compiles one of two c2 output paths
depending on the server's module magic (AP_HAS_RESPONSE_BUCKETS is 1 for
MMN >= 20211221, 0 for 2.4.x), so the two paths fail in opposite directions:
the response-bucket path resets a valid 304, and the legacy path leaves a
truncated response hanging.
r1924267 fixed PR 69580 by exempting header-only responses from the
missing-EOS reset in stream_data_cb(), and that exemption was backported to
2.4.x (r1928581), where it is enough: the legacy c2_process() closes the
output beam, so the 304's beam is complete by the time s_c2_done() runs and
stream_data_cb() is the only reset path. On trunk the response-bucket path
leaves the beam incomplete, and s_c2_done()'s own abort (added in the 2023
HTTP/2 WebSockets work) still resets the 304 -- a path
r1924267 did not cover, and which is not exercised on 2.4.x. PR 69580
therefore still reproduces on trunk. Patch 1 gives s_c2_done() the same
header-only exemption that stream_data_cb() already has.
A new conn_ctx flag, response_eos_seen ("the response is complete; an EOS was
seen, or it is header-only"), is set once where the response is finalized and
read by whichever output path is compiled, so both fixes are driven from the
same signal.
With these patches:
A header-only response (204/304, and HEAD) is no longer RST_STREAM'd on
trunk. A mod_cache revalidation 304 (EOR + Flush, no body EOS) now closes
cleanly rather than resetting the stream (PR 69580).
A response that started but never sent EOS (e.g. a CGI that hits the
server Timeout mid-body) is reset instead of left parked, on the legacy
c2 path. Previously a client without a stall timeout waited indefinitely,
while the same handler over HTTP/1.1 closed the connection and failed
fast (PR 70131).
The completeness decision is layered at the c2 level rather than pushed into
h2_beam_is_complete(): the beam has no view of the response status, so it
cannot know that a body-less 304 is complete, whereas the c2 filters and
conn_ctx do. h2_beam_is_complete() therefore stays a pure beam-state query,
and patch 1's exemption sits next to the one stream_data_cb() already applies.
Patch 2 changes only the legacy (!AP_HAS_RESPONSE_BUCKETS) path, which is not
compiled on trunk: trunk runs c2 through the generic ap_process_connection()
and never force-closes the beam, so it does not have the hang. The change is
inert on trunk and becomes active when the module is built for, or synced to,
2.4.x. Patch 1's response-bucket fix is the live fix on trunk.
Patch 1: complete r1924267 (PR 69580) on trunk. Introduce
conn_ctx->response_eos_seen, set when an EOS passes out (h2_c2_filter_out)
or the response bucket carries a header-only status
(AP_STATUS_IS_HEADER_ONLY). In s_c2_done(), spare a response that is
complete or header-only from the "incomplete output" abort, so a body-less
mod_cache revalidation 304 is no longer reset. Adds test_h2_105_21 (a 304
revalidation must close cleanly).
Patch 2: reset an incomplete response on the legacy c2 path (PR 70131).
There the worker runs c2_process(), which closes the output beam
unconditionally when the handler returns, so a response that began but
never sent EOS leaves a closed beam that h2_beam_is_complete() reports
complete; s_c2_done() then sends no reset, and stream_data_cb()'s reset
fires only on a c1 re-dispatch that is lost under load, parking the
stream. Abort the beam instead of closing it when a final response was
started but response_eos_seen is unset; an aborted beam returns
APR_ECONNABORTED from h2_beam_receive() on the next pull (checked before
the transfer loop), so the reset fires regardless of how the wakeup is
scheduled. Header-only responses are marked complete on this path too
(h2_c2_filter.c) so they are spared. Adds test_h2_105_20 (a CGI that goes
silent past Timeout, run as concurrent streams, must be reset).
[1] PR 69580: https://bz.apache.org/bugzilla/show_bug.cgi?id=69580
[2] PR 70131: https://bz.apache.org/bugzilla/show_bug.cgi?id=70131