-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
url: refactor truncating long hostname #9292
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The moving-part-of-the-hostname-to-the-path bit is really surprising and, I would argue, buggy behavior. Is that documented anywhere? If not, I wonder if the correct thing to do is a (hopefully semver-patch) change that treats such URLs as invalid. (A hostname longer than 63 chars is invalid per RFC 1035.)
EDIT: It doesn't appear to be documented anywhere.
/cc @nodejs/http-parser I guess, although I suppose this isn't |
@Trott OK. I agree. I'll look into the document again. |
Looks like this behavior was introduced (or at least refactored?) in db912813 by @mscdex. I'm not sure if it was unintentional and the intended behavior was something else, or if this is totally intended behavior on the theory that "garbage input results in garbage output". So maybe @mscdex can let us know? |
ping @mscdex |
I honestly don't remember, that was awhile ago. All I could go by is whatever tests we had at the time. |
Looking at what I would say the path forward is:
What do others think? |
I should add that, to me at least, the real objectionable thing here is not the truncation but the moving of the remainder of the hostname into the path. That just makes no sense to me. I cannot think of a situation where that is helpful. And I can think of situations where that can introduce bugs, for sure. |
@Trott Thanks.
|
re: #9521 (comment), I think truncation is bizarre. I don't believe that the URL RFCs say that host names have to be _DNS_ host names, there are lots of ways of resolving names, so I think truncation is a bug. Here's an example:
host names clearly are not the same as DNS names, they can be longer than DNS allows and still be resolveable. |
de9563d
to
c186062
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if new CI is ✅
/cc @sam-github |
@cjihrig @jasnell @sam-github It would be nice if you can check it! |
const code = hostname.charCodeAt(i); | ||
const isVc = code === 46/*.*/ || | ||
code >= 48/*0*/ && code <= 57/*9*/ || | ||
code >= 97/*a*/ && code <= 122/*\z*/ || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a backslash before z
here that needs to be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Thank you for finding.
for (var i = 0; i <= hostname.length; ++i) { | ||
const code = hostname.charCodeAt(i); | ||
const isVc = code === 46/*.*/ || | ||
code >= 48/*0*/ && code <= 57/*9*/ || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep the parens around the groups of checks, IMHO it makes it easier to reason about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think so too.
@Trott I'm just being more cautious is all, especially since this behavior has existed for a long time (even before my rewrite). If everyone else feels differently, then go ahead and remove the semver-major label. |
@mscdex |
c186062
to
1a0f8f5
Compare
There's a couple of changes that still need to be made, |
(code >= 65/*A*/ && code <= 90/*Z*/) || | ||
code === 43/*+*/ || | ||
code === 95/*_*/|| | ||
code > 127; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're changing things in here, I think it might be better to re-prioritize the conditionals here to optimize for the common cases. Here is what I propose:
const isVc = (code >= 97/*a*/ && code <= 122/*z*/) ||
code === 46/*.*/ ||
(code >= 65/*A*/ && code <= 90/*Z*/) ||
(code >= 48/*0*/ && code <= 57/*9*/) ||
code === 45/*-*/ ||
code === 43/*+*/ ||
code === 95/*_*/||
code > 127;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mscdex In my opinion, that makes more sense. It looks more common in this case. Thanks a lot.
Currently, around line 417 lib/url.js is truncating hostname and put the rest of hostname to the path if hostname length after `.` is equal or more than 63. This behavior is different from browser behavior. I changed the code so that it doesn’t truncate. I also added the test example which has more than 63 length in after `.` in hostname in test url.
1a0f8f5
to
ac7c6be
Compare
@mscdex I have applied your review which you proposed. |
(code >= 48/*0*/ && code <= 57/*9*/) || | ||
code === 45/*-*/ || | ||
code === 43/*+*/ || | ||
code === 95/*_*/|| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: there is a missing space before the ||
. I missed that in my proposed example earlier.
@mscdex Thank you! I corrected it. I will be careful. |
LGTM |
(arm-fanned failure on CI is infra related and unrelated to this change.) |
Currently, around line 417 lib/url.js is truncating hostname and put the rest of hostname to the path if hostname length after `.` is equal or more than 63. This behavior is different from browser behavior. I changed the code so that it doesn’t truncate. I also added the test example which has more than 63 length in after `.` in hostname in test url. PR-URL: nodejs#9292 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Brian White <mscdex@mscdex.net>
Landed in c65d55f. Two months in the making. 🎉 |
Currently, around line 417 lib/url.js is truncating hostname and put the rest of hostname to the path if hostname length after `.` is equal or more than 63. This behavior is different from browser behavior. I changed the code so that it doesn’t truncate. I also added the test example which has more than 63 length in after `.` in hostname in test url. PR-URL: nodejs#9292 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Brian White <mscdex@mscdex.net>
Checklist
make -j8 test
(UNIX), orvcbuild test nosign
(Windows) passesAffected core subsystem(s)
lib url
test url
Description of change
I deleted and refactored the line around 417 in url.js which was truncating hostname if hostname length after
.
is more than 63. Thus url parse behavior for hostname should be same as browser's behavior. Now it doesn’t truncate hostname.