fix userinfo in case of some special characters by kastakhov · Pull Request #144 · libwww-perl/URI

kastakhov · 2024-08-17T22:28:27Z

Fix for #143

add some UT for this case

oalders · 2024-08-20T01:03:58Z

-	}
+    if ($str =~ m,^((?:$URI::scheme_re:)?)//(.*:.*@)?([^/?\#]*)(.*)$,os) {
+        my $scheme = $1;
+        my $ui = $2 || '';


Could we have a more descriptive name for this variable?

I realize this variable was not your creation. :)

Sure thing, I have renamed to $userinfo.
I decided to rename in whole file as it might be a little confusing when we're using $ui or $userinfo.

oalders · 2024-08-20T15:50:02Z

  }

-  if ($_[0] =~ m{^((?:$URI::scheme_re:)?)//([^/?\#]+)(.*)$}os) {
+  if ($_[0] =~ m{^((?:$URI::scheme_re:)?)//((.*:.*@)?[^/?\#]+)(.*)$}os) {


From a discussion with @genio on IRC:

It would be nice to avoid (.*@) or (.*:.*@) as it's greedy and could be prone to bugs.

We could take inspiration from https://metacpan.org/dist/Mojolicious/source/lib/Mojo/URL.pm#L52-64

After gathering the host and port part, ^([^@]+@) should be fine.

Hm, but what if user wanna use '@' in password, using [^@]+@ will cause that the everything before first '@' symbol will be matched and password parsed incorrectly.

like the similar approach is already used in LWP/Protocol/http.pm.
And I already mentioned what issue it might cause in my PR.

For my standpoint (.*:.*@) is pretty lazy, but okay, I agree that it could be better.

@oalders, what you think about ([^:]+:.*@) for matching user info?

This will match everything except : as username cannot contain colon, then match : as separator, match everything after first colon as password or nothing as according to rfc3986#section-3.2.1 it might be empty, and finally match @ as second separator.

By the way, the regex which Mojo/URL.pm#L52-64 is used also cannot parse username/password if it contains any of /#? symbols :)
I know, it's really rarely usecase, but unfortunately I faced with it already two times :d

For instance, this regex with username and password which contain all special characters in unescaped form.

Hm, but what if user wanna use '@' in password, using [^@]+@ will cause that the everything before first '@' symbol will be matched and password parsed incorrectly.

Then it should be URL-encoded as %40. I'm not sure if there's any way around that. Something has to serve as a stopping point. If the RE is too greedy, it might end up stopping with a @ inside of the path, like http://host/path/directory@foobar/file.

If the RE is too greedy, it might end up stopping with a @ inside of the path, like http://host/path/directory@foobar/file.

Hm.
Now I see, it'll match this path incorrectly.

Well, actually I can revert this change as the most important part is in lib/URI/_server.pm#14.

There are another issue, if password contain multiple '@' symbols before any of '/#?' symbol URI parsing fails again :d

There are another issue, if password contain multiple '@' symbols before any of '/#?' symbol URI parsing fails again :d

If the password was inserted directly into the URL that way without percent-encoding, that's not really the URI module's fault. The password has to be encoded properly first. That's the point of escaping characters.

oalders · 2024-08-20T15:50:56Z

Thanks very much for this @kastakhov. If we could tweak the regex in both places as suggested, that would be helpful.

kastakhov · 2024-08-22T11:01:00Z

Hi, @oalders, @genio, any another suggestions for these changes or now it looks good for you?

SineSwiper · 2024-08-26T19:13:55Z

This is very adjacent to this issue, and feel free to tell me to open a different issue/PR, but...

The user/password methods allow bad data to pollute the URL (lines 45 and the similar line 22), because it is only doing a very light set of manual URL-escaping. It should, at a minimum escape potential @ characters in the password ($new =~ s/@/%40/g), and at a maximum, just call uri_escape for both user/password sets.

oalders · 2024-08-26T19:15:16Z

Is there a good reason not to just call uri_escape?

genio · 2024-08-26T19:27:39Z

I still don't think this is a good idea, as you're supposed to URI-escape things. Here's an example with both Mojo and URI:

#!/usr/bin/env perl

use v5.36;
use strict;
use warnings;

use Mojo::URL ();
use Mojo::Util ();
use URI ();
use URI::Escape ();

my $unescaped_u = 'u1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';
my $unescaped_p = 'p1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';

{
    # mojolicious way:
    my $username = Mojo::Util::url_escape($unescaped_u);
    my $password = Mojo::Util::url_escape($unescaped_p);
    my $foo = "http://${username}:${password}\@example.com/pub/a/2001/08/27/bjornstad.html";
    # say "Mojo: ", $foo;
    my $url = Mojo::URL->new($foo);
    # say "Mojo: ", $url->to_string();
    say "Mojo: ", $url->userinfo();
}

{
    # URI way
    my $username = URI::Escape::uri_escape($unescaped_u);
    my $password = URI::Escape::uri_escape($unescaped_p);
    my $foo = "http://${username}:${password}\@example.com/pub/a/2001/08/27/bjornstad.html";
    # say "URI:  ", $foo;
    my $url = URI->new($foo);
    # say "URI:  ", $url->as_string();
    say "URI:  ", URI::Escape::uri_unescape($url->userinfo());
}

Yielding:

% perl url.pl
Mojo: u1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~:p1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~
URI:  u1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~:p1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~

SineSwiper · 2024-08-26T21:06:57Z

While I agree everybody should URI-escape the UN/PWs, it should still try to safely parse as much as it can. IMO, [^:]+:[^@]+@ should work to capture userinfo without escaping out to parts that it shouldn't.

Also, I'll just create a different issue for the user/password set problem.

SineSwiper · 2024-08-26T21:30:18Z

Well, disregard my user/password set problem. I realized that userinfo is running through URI::Escape::escape_char($1) and everything works correctly, as long as it didn't start off confused (ie: URI->new with reserved characters in the password).

kastakhov · 2024-08-26T21:48:37Z

While I agree everybody should URI-escape the UN/PWs, it should still try to safely parse as much as it can. IMO, [^:]+:[^@]+@ should work to capture userinfo without escaping out to parts that it shouldn't.

Also, I'll just create a different issue for the user/password set problem.

Just in case, originally I faced with this issue here..

make _uric_escape regex more specific for userinfo part update UT

kastakhov · 2024-08-26T22:04:49Z

Now the only one issue still here.... when password contain '@' unescaped symbol before '/#?' symbols, user info part matches incorrectly. 🙃

genio · 2024-08-26T22:22:21Z

The seemingly relevant RFC pieces are linked here:
https://datatracker.ietf.org/doc/html/rfc3986#section-3.2.1
https://datatracker.ietf.org/doc/html/rfc3986#section-7.3
https://datatracker.ietf.org/doc/html/rfc3986#section-7.5

I think my cursory reading of these and understanding thus far leads me to believe we should leave things as-is and maybe provide documentation for how to provide complex authentication using URI::Escape::uri_escape. I believe making changes to the behavior here could result in unexpected double percent encoding for some folks who are already doing it the way in the sample script above.

SineSwiper · 2024-08-27T14:08:40Z

I found Section 2.2 earlier, but didn't see where those declarations were used. Good find.

So, this means we should stay away from unnecessarily parsing gen-delims in the userinfo section. At the very least, .* is way too broad.

Now the only one issue still here.... when password contain '@' unescaped symbol before '/#?' symbols, user info part matches incorrectly.

Again, a delimiter has to exist unescaped. There's no such thing as a parser algorithm that can take in every ASCII or Unicode unescaped character as a sub-field of the whole string. At some point, we have to force the user to escape their data before it gets dropped into a URL.

buralien · 2025-03-11T21:53:10Z

-	if (_host_escape($host)) {
-	    $str = "$scheme//$ui$host$port$rest";
-	}
+    if ($str =~ m,^((?:$URI::scheme_re:)?)//([^:]+:[^@]*@)?([^/?\#]*)(.*)$,os) {


This will not match correctly valid string where the optional password is omitted (such as ftp://user@host) and will also match what I think is invalid string ftp://user:@host

Wouldn't it be better written as something like: m,^((?:$URI::scheme_re:)?)[/][/]([^:]+(?:[:][^@]+)?[@])?([^/?\#@]*)(.*)$,os

fix userinfo in case of some special characters

631eb5f

add some UT for this case

oalders reviewed Aug 20, 2024

View reviewed changes

rename $ui variable to $userinfo

ab3ff48

oalders reviewed Aug 20, 2024

View reviewed changes

make regex in _uric_escape more specific

6113394

revert changes in URI.pm

837315c

make _uric_escape regex more specific for userinfo part update UT

buralien reviewed Mar 11, 2025

View reviewed changes

oalders assigned simbabque May 1, 2025

Conversation

kastakhov commented Aug 17, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kastakhov Aug 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kastakhov Aug 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kastakhov Aug 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

oalders commented Aug 20, 2024

Uh oh!

kastakhov commented Aug 22, 2024

Uh oh!

SineSwiper commented Aug 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oalders commented Aug 26, 2024

Uh oh!

genio commented Aug 26, 2024

Uh oh!

SineSwiper commented Aug 26, 2024

Uh oh!

SineSwiper commented Aug 26, 2024

Uh oh!

kastakhov commented Aug 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kastakhov commented Aug 26, 2024

Uh oh!

genio commented Aug 26, 2024

Uh oh!

SineSwiper commented Aug 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

kastakhov Aug 20, 2024 •

edited

Loading

kastakhov Aug 20, 2024 •

edited

Loading

kastakhov Aug 26, 2024 •

edited

Loading

SineSwiper commented Aug 26, 2024 •

edited

Loading

kastakhov commented Aug 26, 2024 •

edited

Loading

SineSwiper commented Aug 27, 2024 •

edited

Loading