feat: add curl_cffi fallback to bypass WAF/TLS-fingerprint blocks by marcusosterberg · Pull Request #1471 · Webperf-se/webperf_core

marcusosterberg · 2026-05-14T13:19:08Z

Resolves #1470

Depends on #1469 — please merge that one first.

What this changes

New file: helpers/http_helper.py exposing http_get_with_fallback()
tests/utils.py: two-line edit in get_http_content (one import + one call-site swap)
requirements.txt: adds curl-cffi>=0.13.0

How it works

requests.get is tried first — fast, identical to existing behaviour
On ConnectionError, retry once via curl_cffi with impersonate="chrome131"
If curl_cffi is unavailable or also fails, the original requests.exceptions.ConnectionError is re-raised so the calling except block in get_http_content behaves exactly as before

Existing exception handling for SSL errors, redirects, timeouts, etc. is untouched because the helper only handles ConnectionError.
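The control flow described above can be sketched roughly as follows. This is not the code in helpers/http_helper.py, whose exact signature is not shown in this PR text. The injected `primary_get` and `fallback_get` callables and the `retry_on` parameter are illustrative assumptions, chosen so the logic can be exercised without network access or either HTTP library installed.

```python
from typing import Any, Callable, Optional, Tuple, Type


def http_get_with_fallback(
    url: str,
    primary_get: Callable[[str], Any],
    fallback_get: Optional[Callable[[str], Any]],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> Any:
    """Try primary_get; on a retry_on error, retry once via fallback_get.

    If no fallback is available, or the fallback also fails, the ORIGINAL
    error from primary_get is re-raised so callers see unchanged behaviour.
    Parameter names are hypothetical, not taken from the real helper.
    """
    try:
        return primary_get(url)
    except retry_on as primary_error:
        if fallback_get is None:
            # Fallback library not installed: behave exactly as before.
            raise
        try:
            return fallback_get(url)
        except Exception:
            # Hide the fallback failure and surface the original error.
            raise primary_error from None
```

With the real libraries, the wiring would presumably be `primary_get=lambda u: requests.get(u)`, `fallback_get=lambda u: curl_requests.get(u, impersonate="chrome131")` (after `from curl_cffi import requests as curl_requests`), and `retry_on=(requests.exceptions.ConnectionError,)`. The last argument matters because the requests exception class does not derive from Python's built-in `ConnectionError`.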

Testing

Verified locally against https://bolagsverket.se:

Before: four Connection error! messages, empty result file
After: robots.txt and security.txt fetched successfully via fallback; standard-files test produces a real rating

Set WEBPERF_HTTP_HELPER_DEBUG=1 to see fallback activity:

[http_helper] primary requests.get failed for https://bolagsverket.se/robots.txt: ConnectionError
[http_helper] falling back to curl_cffi with impersonate=chrome131
[http_helper] curl_cffi succeeded for https://bolagsverket.se/robots.txt: status=200
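
A minimal sketch of how such environment-gated logging could be implemented is shown here. The function names are hypothetical, and whether the real helper accepts only the exact value `1` is an assumption. The `[http_helper]` prefix and the use of stderr follow the output and commit message in this PR.

```python
import os
import sys

ENV_VAR = "WEBPERF_HTTP_HELPER_DEBUG"


def debug_enabled() -> bool:
    """Return True only when the variable is set to exactly "1" (assumed)."""
    return os.environ.get(ENV_VAR) == "1"


def debug_log(message: str, stream=None) -> None:
    """Write a prefixed message to stderr (or the given stream) if enabled."""
    if not debug_enabled():
        return
    if stream is None:
        stream = sys.stderr
    print(f"[http_helper] {message}", file=stream)
```

Reading the variable on every call, rather than once at import time, lets the flag be toggled without reloading the module.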

Backward compatibility

curl-cffi is wrapped in try/except ImportError — if it is not installed, the helper degrades to plain requests.get
The public API of get_http_content is unchanged
curl_cffi's response object is drop-in compatible with requests.Response for the attributes actually used by webperf_core (.text, .content, .status_code, .headers)
CI does not need any changes; curl-cffi ships pre-built wheels for all platforms the project supports
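
The optional-import guard and the drop-in compatibility claim above could be checked along these lines. This is a sketch only: `looks_like_response` and `REQUIRED_ATTRIBUTES` are hypothetical names, and the attribute list is taken from the description above rather than from an audit of webperf_core.

```python
try:
    # Optional dependency: present only if curl-cffi is installed.
    from curl_cffi import requests as curl_requests
except ImportError:
    # Without it, a helper would degrade to plain requests.get.
    curl_requests = None

# Attributes the PR description says webperf_core actually reads.
REQUIRED_ATTRIBUTES = ("text", "content", "status_code", "headers")


def looks_like_response(obj) -> bool:
    """Duck-type check: does obj expose every attribute callers rely on?"""
    return all(hasattr(obj, name) for name in REQUIRED_ATTRIBUTES)
```

A check like this only confirms that the attributes exist. It does not prove the two libraries return identical values or types for them.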

Out of scope

The requests.get call in get_url_headers() (line 650 of tests/utils.py): that one is for HEAD requests and was not observed to fail on bolagsverket.se. It can be addressed in a follow-up if needed.
Configurable impersonation profile — chrome131 default is sufficient for currently observed blocks. A future PR could expose it via settings if multiple profiles are needed.

Standards note

The files this helper is most often used to fetch — /robots.txt (RFC 9309) and /.well-known/security.txt (RFC 9116) — are by definition intended to be machine-readable by automated tools. Sites that WAF-block them are arguably in violation of the relevant RFCs. This PR is a pragmatic workaround on the consumer side.

…path addIssue() was defined with three positional arguments but called with four in the failure branch of run_test(). The extra argument was a text string carrying site-unavailable context that was never actually stored on the sub-issue.

This commit adds `text` as an optional fourth parameter on addIssue and stores it on the sub-issue when provided. The call site is updated to pass arguments in the correct order: (result_dict, rule_id, url, text). All ~25 existing 3-argument call sites continue to work unchanged.

Triggered by any site where the initial HTTP request raises ConnectionError, e.g. WAF-protected sites that drop python-requests at the TLS handshake (bolagsverket.se and other Swedish government sites confirmed).
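
The signature change in this commit can be illustrated with a small sketch. The function name `add_issue_sketch` and the dictionary layout are hypothetical and do not reflect webperf_core's real data structures. Only the argument order (result_dict, rule_id, url, text) and the optional fourth parameter come from the commit message.

```python
def add_issue_sketch(result_dict, rule_id, url, text=None):
    """Record a sub-issue for rule_id; text is optional for old call sites.

    Assumed layout: {rule_id: {"subIssues": [{"url": ..., "text": ...}]}}.
    """
    sub_issue = {"url": url}
    if text is not None:
        # Store the extra context only when the caller provides it.
        sub_issue["text"] = text
    entry = result_dict.setdefault(rule_id, {"subIssues": []})
    entry["subIssues"].append(sub_issue)
    return sub_issue
```

Because `text` defaults to `None`, a three-argument call keeps working and produces a sub-issue without a `text` key.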

Some enterprise WAF appliances (Akamai, Imperva, F5 ASM) drop the TCP connection from python-requests at the TLS ClientHello stage. The user gets requests.exceptions.ConnectionError instead of an HTTP response, even though a real browser to the same URL succeeds. This blocks webperf_core from fetching standard files (robots.txt, security.txt, etc.) on WAF-protected sites including several Swedish government agencies.

This commit:

- Adds helpers/http_helper.py exposing http_get_with_fallback(), which tries plain requests.get() first and falls back to curl_cffi (with impersonate="chrome131") on ConnectionError. If curl_cffi is not installed or also fails, the original error is re-raised so existing exception handling continues unchanged.
- Updates tests/utils.py:get_http_content() to route through the helper (one import + one call-site swap; surrounding code unchanged).
- Adds curl-cffi>=0.13.0 to requirements.txt.

Debug logging available via WEBPERF_HTTP_HELPER_DEBUG=1 (prints fallback activity to stderr).

Verified against https://bolagsverket.se: previously failed with four ConnectionError messages and an empty result; now fetches robots.txt and security.txt successfully via fallback.

7h3Rabbit · 2026-05-18T15:38:18Z

@marcusosterberg I agree with the problem description, but I highly recommend a different solution.
We should use sitespeed for everything instead, if the normal Python way is causing problems here as well.
We started using sitespeed for everything else precisely to solve problems like this, where a normal browser works better.

Using sitespeed instead of the approach suggested in this PR has the benefit of supporting more browsers (read: Firefox).

7h3Rabbit

This solution is not recommended. Please use sitespeed for this instead (read: use a real browser instead of again trying to lie about the fact that we are not a real browser).

Using sitespeed has the benefit of using the same logic across webperf-core.

marcusosterberg added 2 commits May 14, 2026 15:12

marcusosterberg requested review from 7h3Rabbit and cockroacher May 14, 2026 13:19

7h3Rabbit requested changes May 18, 2026

marcusosterberg wants to merge 2 commits into main from feat/curl-cffi-waf-fallback
