Relative URLs in WHATWG URL API #12682
I do not suspect that we will be able to actually deprecate |
I vote for "do nothing", at least until it's clear what the use cases for these base-less relative URLs are and how people use them. If we find out people use them a lot, I think a better solution would be (preferably in user-land) a "RelativeURL" class, not "TolerantURL", which only has pathname/search/searchParams/hash/toString(). Someone would have to specify how this works, but maybe as a first-pass it could have an internal real-URL and parse against |
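A minimal user-land sketch of the "RelativeURL" idea described above, assuming an internal real URL parsed against a placeholder base. The class name comes from the comment; `PLACEHOLDER_BASE`, the origin check, and the choice of an http-like base are illustrative assumptions, not part of Node.js or the WHATWG URL Standard.

```javascript
// Hypothetical sketch: not a Node.js or WHATWG API.
// Assumption: relative inputs should follow http-like (special-scheme) parsing rules.
const PLACEHOLDER_BASE = 'http://placeholder.invalid/';

class RelativeURL {
  #url; // internal absolute URL that does the actual parsing

  constructor(input) {
    this.#url = new URL(input, PLACEHOLDER_BASE);
    // Coarse check: reject inputs that resolved to a different origin,
    // i.e. inputs that carried their own scheme or host.
    if (this.#url.origin !== new URL(PLACEHOLDER_BASE).origin) {
      throw new TypeError(`Not a base-less relative URL: ${input}`);
    }
  }

  // Expose only the components that make sense without a base.
  get pathname() { return this.#url.pathname; }
  set pathname(value) { this.#url.pathname = value; }
  get search() { return this.#url.search; }
  set search(value) { this.#url.search = value; }
  get searchParams() { return this.#url.searchParams; }
  get hash() { return this.#url.hash; }
  set hash(value) { this.#url.hash = value; }

  toString() {
    return this.#url.pathname + this.#url.search + this.#url.hash;
  }
}

const rel = new RelativeURL('/a/b/../c?x=1#top');
console.log(rel.pathname);   // '/a/c'
console.log(rel.toString()); // '/a/c?x=1#top'
```

Because the placeholder base fixes the scheme, this sketch bakes in exactly the kind of assumption the rest of the thread worries about; a different base scheme could parse the same input differently.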
@domenic it's worth pointing out that working with relative URLs is extremely common in Node - the most common use case I can think of is when an incoming HTTP request arrives - you get a relative URL under I think it would be a shame to keep two URL APIs just for relative URLs. It would be really nice if |
It just doesn't make sense to have a single API for both relative and absolute URLs---the components and parsing rules are far too different depending on the base used for the rest of the API to make any sense. So I'm pretty sure the spec isn't going to change, just because there's no underlying model that makes sense. The best you can do if you want to use one API is to make up a base URL. |
@domenic I realize that this is a hard problem, and one I do not understand very well - but I think having a base URL in Node but not in the browser could cause a lot of incompatibility when people expect code using the same spec to run the same way on both platforms. I think we can only change |
Which can be done, of course, using the information in an HTTP request (in the case of request.url) so I'm not overly concerned with that particular case. The key challenge, of course, is that without a base, it's impossible to say for sure which rules to apply to the relative bits. We either must provide a base or we must provide an equivalent context in order to properly handle the URL. Otherwise the parsing will be best guess at best. |
A compromise might be adding |
How? An HTTP server is not aware of its host name, may have several dns host names or none. In fact, I'd argue that the end server should not be concerned with what hostname it's using. |
It can make a best guess using the protocol and host header, both of which may be modified of course, but it provides enough context to provide a base URL when parsing the request URI. |
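A rough sketch of that best guess, assuming a Node.js `http`/`https` request object. The helper name `requestURL` and the `fallbackHost` parameter are hypothetical; the `Host` header is client-controlled, so the result should be treated as untrusted.

```javascript
// Hypothetical helper: builds a best-guess absolute URL for an incoming request.
// `fallbackHost` is an assumption for requests that arrive without a Host header.
function requestURL(req, fallbackHost = 'localhost') {
  // socket.encrypted is set on TLS sockets terminated by Node itself;
  // behind a reverse proxy the client-facing protocol may differ.
  const protocol = req.socket && req.socket.encrypted ? 'https:' : 'http:';
  const host = req.headers.host || fallbackHost;
  // Throws a TypeError if the Host header does not form a valid base URL.
  // Note: a request target starting with '//' is resolved as scheme-relative,
  // so its first segment would be taken as the host.
  return new URL(req.url, `${protocol}//${host}`);
}

// Usage with a stand-in object shaped like http.IncomingMessage:
const fakeReq = {
  url: '/path?query=1',
  headers: { host: 'example.com:8080' },
  socket: {},
};
const parsed = requestURL(fakeReq);
console.log(parsed.pathname);                  // '/path'
console.log(parsed.searchParams.get('query')); // '1'
```

Only the path, query, and hash of the result come from the request target itself; the origin part is a guess and is best ignored unless the deployment guarantees it.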
I'll not claim to know which, if any, of the 3 suggested solutions are best, but I just wanted to add a few things to the discussion: I've seen incoming HTTP requests to Node servers that don't have a The 2nd issue is knowing the protocol. This can be inferred by looking at the Bottom line is that we can only safely get the path (via |
So what I'm seeing is that what people really want a partial URL parser for is to parse the origin form of the request target in an HTTP request, which is defined as:
We can introduce a |
hmm... I'm hesitant to introduce a new class. This could be approximated in userland fairly easily using something like:
const url = new URL(`https://localhost${absolutePath}?${query}`);
Then look at the bits of |
@jasnell that doesn't look very usable on user input 😕 |
A userland module can make it more usable. I'd rather avoid adding a convenience class that is not part of the standard |
const relateURL = require('relateurl');
const base = new URL('http://fake/');
const url = new URL('/path?query', base);
url.searchParams.append('query2', 'value');
relateURL(url, base, { output: relateURL.ROOT_PATH_RELATIVE });
//-> /path?query&query2=value
v1.0 will be released in the near future: https://github.com/stevenvachon/relateurl
Perhaps I'll write a
new RelativeURL('/path?query');
//-> RelativeURL { protocol: '', hostname: '', pathname: '/path/' ... } |
This issue also applies to absolute URLs without a hostname. For example, |
I made url-path to solve this problem for me. It supports absolute paths but it would be easy to add relative paths. |
I stumbled into this problem, too. I do not like to use a new class or module for this problem, so I resolved it by using a new protocol as the base.
> new URL('/', 'relative:///');
URL {
href: 'relative:///',
origin: 'null',
protocol: 'relative:',
username: '',
password: '',
host: '',
hostname: '',
port: '',
pathname: '/',
search: '',
searchParams: URLSearchParams {},
hash: '' }
> new URL('/folder/subfolder/../file.name', 'relative:///');
URL {
href: 'relative:///folder/file.name',
origin: 'null',
protocol: 'relative:',
username: '',
password: '',
host: '',
hostname: '',
port: '',
pathname: '/folder/file.name',
search: '',
searchParams: URLSearchParams {},
hash: '' }
Therefore I suggest internal support for this relative protocol. Registering the protocol with IANA might be a good idea: https://tools.ietf.org/html/rfc7595 & https://www.iana.org/protocols. The WHATWG has made it clear several times that they do not intend to support relative URLs at all: whatwg/url#136. The only downside of the approach is the strange value of the |
@cosycode That works for most cases, but there can be subtle differences between such a
> new URL('/folder\\subfolder/../file.name', 'relative:///').pathname
'/file.name'
> new URL('/folder\\subfolder/../file.name', 'http://a/').pathname
'/folder/file.name'
While a non-special scheme treats the backslash as part of the file name, a special scheme treats it as if it were a forward slash. After spending some more time thinking about this issue, I don't think a one-size-fits-all solution exists, without us running into the same issues as What we could do is have specialized classes for specific tasks -- like the request target of an HTTP request (aka |
@TimothyGu An interesting behavior that I did not know existed. Maybe this special case can be mentioned in the IANA registration, so that the relative scheme behaves the same as http. Personally I do not like the idea of specialized classes, because it is against the U concept of URL. On the other hand I do not see how someone can push the WHATWG to change their position regarding protocol/host relative URL objects. If you really introduce specialized classes, please provide at least a convenient way to transform it into a regular URL like |
I think the real crux of the problem is that Although Node.js currently does no validation on the RFC 7230 does define a way to generate an "effective request URI", but doing so requires other information bundled within the HTTP request (such as I actually wrote a spec-compliant |
Note that on WebKit-based browsers, running
URL {
hash: "#hash"
host: ""
hostname: ""
href: "about:blank#hash"
origin: "null"
password: ""
pathname: "blank"
port: ""
protocol: "about:"
search: ""
searchParams: URLSearchParams {}
username: ""
__proto__: URL
}
Maybe Node.js could adopt a similar behavior? |
@aduh95 about:blank doesn't seem applicable to anything but web browsers. |
I will add an agenda item in our meeting to discuss this. That said, I am not sure what utility APIs we would want as only a part of the http interfaces and not via the I did intend to bring up I don't mean this to say I don't see value in supporting relative |
@aduh95 that's a bug in Safari, not a feature. See whatwg/url#539. |
@styfle want to move that comment over in nodejs/web-server-frameworks#71? It would be a good starter to the conversation I wanted to have there. And I have comments but don't want to hijack this thread to make them. |
The Say I want to just grab the hash portion of a relative URL If I want to strictly parse full URLs, fine, |
2.0.9 introduced a new URL parser based on WHATWG URL API. However, if the request.url is relative (which is the case), parsing fails: nodejs/node#12682
Check this one To parse the URL into its parts:
Once URL object is created this way we can use all its methods and properties. |
Given that we've (a) added documentation illustrating how to better handle relative URLs with the WHATWG API, and (b) backed off the deprecation of the legacy API, I'm going to close this issue for now. There's still an argument that could be made on the standards level for more ergonomic handling of relative URLs, but those discussions are better directed to the whatwg/url repository. |
For the sake of completeness here is the corresponding issue in the whatwg/url repository: whatwg/url#531 |
I think this is very important. The WHATWG API has been designed to standardise existing browser behaviour, not to be the general URL API for platforms such as NodeJS. This thread shows that this causes issues, but it is a problem not as much with NodeJS as with limitations of the standard. I can predict that this will cause more problems down the road (and not just in Node) as the WHATWG API is becoming more widespread and people will necessarily hack around it to make it meet their needs. I recently completed my research on the technical part of the problem by releasing this somewhat low level library. My hope is that the community can use it as a basis for building a number of more polished URL APIs that do support relative URLs whilst maintaining compatibility with URLs as defined in the WHATWG standard. I have one attempt at such an API here (but please, come up with alternatives). |
Here is a lengthy discussion about the problem nodejs/node#12682
* fix(app-vite): Fix SSR publicPath check * refactor(app-vite): Add JSDoc types for #appOptions * fix(app-vite): Call SSR injectMiddlewares at the right time to enable publicPath middleware * fix(app-vite): Correctly use WHATWG URL constructor Here is a lengthy discussion about the problem nodejs/node#12682
@jasnell has that been formally declared anywhere? It may have been and I've just missed it. Should I PR the typings to remove the deprecation notice? |
Yep, if you look here https://nodejs.org/dist/latest-v18.x/docs/api/url.html#legacy-url-api, you'll see that the old API is now explicitly marked "Legacy" rather than "Deprecated" as of Node.js 15.13.0 |
As of Node.js 19, |
as it is actually dangerous (https://hackerone.com/reports/678487), and its status is not likely to be resolved (nodejs/node#42232, nodejs/node#12682)
We are on track to slowly deprecate the non-standard url.parse() (#12168 (comment)) in favor of the new WHATWG standard-based URL API. One use case that currently cannot be migrated over from url.parse() is the handling of relative URLs.
Background
url.parse() accepts incomplete, relative URLs by filling unavailable components of a URL with null.
On the other hand, the URL constructor guarantees that all URL objects are fully complete and valid URLs, which means that it throws an exception in case of relative URLs:
The WHATWG URL API does have the algorithms necessary to parse relative URLs, however, and that is activated if a base argument is provided:
It is not always the case that a base URL is available, though.
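The three behaviors described in this Background section can be reproduced with a short script. This is an illustrative sketch; the input `/path?query` and the base `https://example.org/` are arbitrary values chosen for the example.

```javascript
const { parse } = require('url');

// Legacy API: components missing from a relative URL come back as null.
const legacy = parse('/path?query');
console.log(legacy.protocol, legacy.host, legacy.pathname); // null null '/path'

// WHATWG API: a relative input without a base throws a TypeError.
let threw = false;
try {
  new URL('/path?query');
} catch (err) {
  threw = err instanceof TypeError;
}
console.log(threw); // true

// WHATWG API: the same input parses once a base argument is supplied.
const resolved = new URL('/path?query', 'https://example.org/');
console.log(resolved.href); // 'https://example.org/path?query'
```

Recent Node.js releases may print a deprecation warning for `url.parse()`; the call still returns the object shown.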
Possible solutions
Do nothing
What this entails is that the currently supported ability to parse relative URLs will die as url.parse() becomes deprecated.
Do not deprecate url.parse(); otherwise do nothing
This is the most obvious actual solution, but from tickets like #12168, I don't see this as a good idea.
Add a non-standard TolerantURL class
This could work if we trick the parser into believing we have a legitimate URL, except there are many conditionals in the URL parser algorithm that provide ad-hoc compatibility fixes with legacy implementations. We would have to make a set of opinionated assumptions about the nature of the URL, such as the URL's scheme.
In addition to parsing, the setters will have awkward semantics. Consider the following:
Something else that's better than what I thought of above...