
Add regression tests for special characters in URLs #149

Merged: marevol merged 2 commits into master from test/url-special-char-regression-tests on Mar 12, 2026

@marevol marevol commented Mar 12, 2026

Summary

  • Add regression tests for topic/2732 and topic/2733 covering special character handling in URL parsing across multiple client and transformer components
  • Tests document current behavior for brackets, percent literals, curly braces, pipes, Unicode, and already-encoded characters

Changes Made

  • FileSystemClientTest: Tests for preprocessUri with special chars (brackets, percent, curly braces, pipes, Unicode) and doGet with bracket/percent file paths
  • FtpAuthenticationTest: Tests for matches() with special characters and percent-encoded paths
  • FtpClientTest: Tests for FtpInfo parsing with brackets, literal percent, already-encoded percent, and mixed space/percent
  • Hc4HttpClientTest: Tests for constructRedirectLocation with brackets, percent, Unicode, and HTML entity characters
  • Hc5HttpClientTest: Same redirect location tests as Hc4 for the HttpComponents 5.x implementation
  • HostIntervalControllerTest: Tests for host extraction from URLs containing brackets, percent, and spaces
  • HtmlTransformerTest: Tests for URL resolution with brackets, percent, fragment-only links, and parent traversal edge cases
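
As a rough illustration of the kind of behavior the HtmlTransformerTest and HostIntervalControllerTest cases pin down, the sketch below resolves relative links and extracts a host using plain `java.net.URL`. The class name `UrlResolutionSketch`, its helper methods, and the example.com URLs are hypothetical and are not taken from the project's test code.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical sketch; not the project's HtmlTransformer or its tests.
public class UrlResolutionSketch {

    // Resolve a relative link against a base URL. java.net.URL does not
    // validate path characters, so brackets pass through unchanged.
    @SuppressWarnings("deprecation") // URL constructors are deprecated since Java 20
    public static String resolve(String base, String relative) {
        try {
            return new URL(new URL(base), relative).toExternalForm();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Extract the host from a URL whose path may hold special characters.
    @SuppressWarnings("deprecation")
    public static String host(String url) {
        try {
            return new URL(url).getHost();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://example.com/dir/page.html", "file[1].html"));
        System.out.println(resolve("http://example.com/a/b/page.html", "../x.html"));
        System.out.println(host("http://example.com/path[1]/100%.html"));
    }
}
```

Because `java.net.URL` performs no character validation, a sketch like this only shows that resolution succeeds; whether the resulting string is acceptable to downstream components is a separate question that the real tests address.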

Testing

  • All tests are self-contained and use temporary files where needed
  • Tests document current behavior (some special chars cause exceptions) to detect regressions if URI handling changes
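
A minimal sketch of the temporary-file approach, assuming only the JDK: it creates a file whose name contains special characters, converts the path to a `file:` URI, and reads the content back through that URI. `TempFileUriSketch` and `roundTrip` are hypothetical names chosen for illustration; the real tests exercise FileSystemClient rather than `java.nio` directly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; not the project's FileSystemClientTest.
public class TempFileUriSketch {

    // Write content to a temp file with the given name, convert the path to a
    // file: URI, then read the content back through that URI.
    public static String roundTrip(String fileName, String content) {
        try {
            Path dir = Files.createTempDirectory("special-char-test");
            Path file = dir.resolve(fileName);
            try {
                Files.write(file, content.getBytes(StandardCharsets.UTF_8));
                // Path.toUri() percent-encodes characters that are illegal in a URI.
                URI uri = file.toUri();
                return new String(Files.readAllBytes(Paths.get(uri)), StandardCharsets.UTF_8);
            } finally {
                // Clean up so the sketch leaves nothing behind.
                Files.deleteIfExists(file);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("file[1].txt", "brackets"));
        System.out.println(roundTrip("100%.txt", "percent"));
    }
}
```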

Additional Notes

  • Tests reference topic/2732 (special characters causing URISyntaxException) and topic/2733 (Unicode/HTML entities in redirects)
  • Some tests expect and catch exceptions to document known limitations
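
The URISyntaxException limitation mentioned above can be reproduced with `java.net.URI` alone. The sketch below is an assumption-laden illustration, not code from this pull request: `UriParseSketch` and `parses` are hypothetical names, and the sample URLs are invented.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch; not the project's client code or its tests.
public class UriParseSketch {

    // Returns true when java.net.URI accepts the string, false when it
    // rejects it with URISyntaxException (the known limitation).
    public static boolean parses(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.com/a%5B1%5D.html", // already percent-encoded brackets
            "http://example.com/a[1].html",     // raw brackets in the path
            "http://example.com/{id}",          // curly braces
            "http://example.com/a|b",           // pipe
            "http://example.com/100%.html"      // percent not followed by two hex digits
        };
        for (String sample : samples) {
            System.out.println(sample + " parses: " + parses(sample));
        }
    }
}
```

A test that documents a known limitation can assert on the `false` result (or on the caught exception) so that a later change in URI handling shows up as a test failure rather than passing silently.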

🤖 Generated with Claude Code

marevol and others added 2 commits March 11, 2026 23:55
Replace java.net.URI with java.net.URL throughout HtmlTransformer to
simplify URL resolution logic. Remove deprecated URI-based overloads and
eliminate the complex fallback path that manually reconstructed absolute
URLs when URISyntaxException was thrown. The URL class handles relative
URL resolution natively, making the code cleaner and more straightforward.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add regression tests for topic/2732 and topic/2733 covering special
character handling (brackets, percent, curly braces, pipes, Unicode)
across FileSystemClient, FtpClient, FtpAuthentication, Hc4HttpClient,
Hc5HttpClient, HostIntervalController, and HtmlTransformer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@marevol marevol merged commit 8bc38b8 into master Mar 12, 2026
1 check passed