[tangentially related to CVE-2023-24329] urlparse does not correctly handle schemes that begin with ASCII digits, '+', '-', and '.' characters #99418
… with an alphabetical ASCII character. (#99421) Prevent urllib.parse.urlparse from accepting schemes that don't begin with an alphabetical ASCII character. RFC 3986 defines a scheme like this: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )` RFC 2234 defines an ALPHA like this: `ALPHA = %x41-5A / %x61-7A` The WHATWG URL spec defines a scheme like this: `"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."`
… begin with an alphabetical ASCII character. (pythonGH-99421) Prevent urllib.parse.urlparse from accepting schemes that don't begin with an alphabetical ASCII character. RFC 3986 defines a scheme like this: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )` RFC 2234 defines an ALPHA like this: `ALPHA = %x41-5A / %x61-7A` The WHATWG URL spec defines a scheme like this: `"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."` (cherry picked from commit 439b9cf) Co-authored-by: Ben Kallus <49924171+kenballus@users.noreply.github.com>
… with an alphabetical ASCII character. (GH-99421) Prevent urllib.parse.urlparse from accepting schemes that don't begin with an alphabetical ASCII character. RFC 3986 defines a scheme like this: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )` RFC 2234 defines an ALPHA like this: `ALPHA = %x41-5A / %x61-7A` The WHATWG URL spec defines a scheme like this: `"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."` (cherry picked from commit 439b9cf) Co-authored-by: Ben Kallus <49924171+kenballus@users.noreply.github.com>
Thanks, looks like this has been fixed.
CVE-2023-24329 was assigned to this issue.
Python 3.7, 3.8 and 3.9 are affected by this issue and still get security fixes. @gpshead: Should this fix be backported to Python 3.7-3.9?
Ah, I don't see a fix for Python 3.10 either, even though the issue was reported on Python 3.10.
I created https://python-security.readthedocs.io/vuln/urlparse-scheme.html to track fixes for this issue.
Please see #102153.
(i.e., that python-security urlparse-scheme blog text is currently wrong: this is not fixed, and the first report was in July, not November.)
I'm maintaining this page manually and it's quite a lot of work to maintain it. I tried to automate as many things as possible. The source can be found in the YAML file: https://github.com/vstinner/python-security/blob/main/vulnerabilities.yaml#L2134 Feel free to propose a PR to fix the entry ;-)
The fix for this CVE should really be backported if applicable.
This issue does not contain the fix. See #102153.
@gpshead: thank you so very much for the pointer! I'll do some poking around next week to see whether other OS distributions have addressed this, and if there aren't any available fixes, try crafting (an) appropriate patch(es) and link it/them to the appropriate issue. I work on a project that uses 3.8; if it's too much work for 3.7, I'll just look into making the 3.8 patch work.
Background
RFC 3986 defines a scheme like this:
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
RFC 2234 defines an ALPHA like this:
ALPHA = %x41-5A / %x61-7A
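Translated into a regular expression, the two rules above require exactly one ASCII letter followed by any number of ASCII letters, digits, `+`, `-`, or `.`. The following is a minimal sketch, assuming the scheme has already been split off the URL; the helper name `is_valid_scheme` is hypothetical and not part of `urllib.parse`.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# RFC 2234: ALPHA  = %x41-5A / %x61-7A, i.e. only A-Z and a-z.
# Explicit ASCII ranges are used instead of str.isalpha() or \w,
# because those also accept non-ASCII letters and digits.
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme: str) -> bool:
    """Return True if `scheme` matches the RFC 3986 scheme production.

    Hypothetical helper for illustration only.
    """
    # fullmatch() anchors at both ends, so trailing characters
    # (including a trailing newline, which "$" would tolerate) are rejected.
    return _SCHEME_RE.fullmatch(scheme) is not None


for candidate in ("http", "svn+ssh", "a1.b-c", "1http", "+x", "-x", ".x", ""):
    print(f"{candidate!r:10} {is_valid_scheme(candidate)}")
```

The first character class is what rejects a leading digit, `+`, `-`, or `.`; the rest of the pattern mirrors the repeated group in the ABNF rule.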
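The WHATWG URL spec defines a scheme like this:
"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."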
The bug
This is the scheme string parsing code from Lib/urllib/parse.py:462-468:
This is the definition of scheme_chars from Lib/urllib/parse.py:77-80:
This will erroneously validate schemes that begin with any of ('.', '-', '+', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'). This behavior is in violation of both specifications.
This bug is reproducible with the following snippet:
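The difference between the check described above and the one both specifications require can be modeled in a few lines. This is a simplified sketch, not the actual Lib/urllib/parse.py code, which does considerably more: `split_scheme_lenient` and `split_scheme_strict` are hypothetical names, and the sketch assumes, as the description above does, that the candidate scheme is everything before the first `:` and is accepted when every character is in the allowed set.

```python
from __future__ import annotations

# Characters RFC 3986 allows in a scheme (letters, digits, "+", "-", ".").
# Hypothetical constant for this sketch; it is not urllib.parse.scheme_chars.
SCHEME_CHARS = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "+-."
)


def split_scheme_lenient(url: str) -> tuple[str, str]:
    """Model of the buggy check: every character before the first ':'
    must be in SCHEME_CHARS, but the first character is not checked
    separately."""
    i = url.find(":")
    if i > 0 and all(c in SCHEME_CHARS for c in url[:i]):
        return url[:i].lower(), url[i + 1:]
    return "", url


def split_scheme_strict(url: str) -> tuple[str, str]:
    """Same as above, but additionally require the first character to be
    an ASCII letter, as RFC 3986 and the WHATWG URL spec both do."""
    i = url.find(":")
    if (
        i > 0
        and url[0].isascii()
        and url[0].isalpha()
        and all(c in SCHEME_CHARS for c in url[:i])
    ):
        return url[:i].lower(), url[i + 1:]
    return "", url


for url in ("HTTP://example.com", "1abc://example.com", "+x:y", ".x:y"):
    print(url, split_scheme_lenient(url), split_scheme_strict(url))
```

Only the first-character test differs between the two functions, so every leading character listed above ('.', '-', '+' and the digits) yields a scheme under the lenient model and no scheme under the strict one.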
My environment