Skip to content

fix: hash request kwargs and headers correctly#255

Merged
D4Vinci merged 4 commits intoD4Vinci:devfrom
yetval:fix/request-fingerprint
Apr 30, 2026
Merged

fix: hash request kwargs and headers correctly#255
D4Vinci merged 4 commits intoD4Vinci:devfrom
yetval:fix/request-fingerprint

Conversation

@yetval
Copy link
Copy Markdown
Contributor

@yetval yetval commented Apr 26, 2026

Summary
Request fingerprint collisions: Request.update_fingerprint() previously hashed only kwarg names and lowercased header values. Distinct requests could collapse to the same fingerprint when fp_include_kwargs or fp_include_headers were enabled, which can silently break scheduler deduplication, cache replay, and checkpoint restore. Fix: hash kwarg names together with their values and preserve header values as-is.

Repro
from scrapling.spiders.request import Request

r1 = Request("https://example.com", timeout=1)
r2 = Request("https://example.com", timeout=2)

assert r1.update_fingerprint(include_kwargs=True) != r2.update_fingerprint(include_kwargs=True)

Files changed
scrapling/spiders/request.py — fingerprint kwargs/header handling

@samrusani
Copy link
Copy Markdown

I did a quick local check against this patch and the behavior matches the PR description:

from scrapling.spiders.request import Request

assert Request("https://example.com", timeout=1).update_fingerprint(include_kwargs=True) != Request("https://example.com", timeout=2).update_fingerprint(include_kwargs=True)
assert Request("https://example.com", headers={"X-Test": "A"}).update_fingerprint(include_headers=True) != Request("https://example.com", headers={"X-Test": "a"}).update_fingerprint(include_headers=True)

tests/spiders/test_request.py also passed locally for me after applying the patch (28 passed).

One small suggestion: it would be worth adding these as regression tests in tests/spiders/test_request.py, since the contribution guide asks bug fixes to include code that reproduces the bug. That would lock in both the kwargs-value collision and the header-value case handling.

@yetval
Copy link
Copy Markdown
Contributor Author

yetval commented Apr 26, 2026

Hey @samrusani, I have went ahead and added tests. Thanks for the suggestions!

Comment thread scrapling/spiders/request.py Outdated
kwargs = (key.lower() for key in self._session_kwargs.keys() if key.lower() not in ("data", "json"))
data["kwargs"] = "".join(set(_convert_to_bytes(key).hex() for key in kwargs))
filtered_kwargs = {
key.lower(): str(value)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using str(value) for kwarg values is fragile if someone passes a non-primitive.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just fixed this.

@yetval yetval requested a review from D4Vinci April 30, 2026 13:59
Copy link
Copy Markdown
Owner

@D4Vinci D4Vinci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@D4Vinci
Copy link
Copy Markdown
Owner

D4Vinci commented Apr 30, 2026

Good catch as always @yetval
We will merge once the checks pass again.

@D4Vinci D4Vinci added bug Something isn't working Ready-to-Merge This PR is approved and ready to merge once the author addresses the raised points. labels Apr 30, 2026
@D4Vinci D4Vinci merged commit 90a853d into D4Vinci:dev Apr 30, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working Ready-to-Merge This PR is approved and ready to merge once the author addresses the raised points.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants