Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For output parameters passed by pointer, set them on error: avoid uninitialized values #32

Open
vstinner opened this issue Oct 20, 2023 · 6 comments
Labels
guideline To be included in guidelines PEP

Comments

@vstinner
Copy link
Contributor

Python has a bunch of C API functions which take parameters are pointers, such as PyUnicode_AsUTF8AndSize(obj, &size). Some of these functions don't set output parameters on error. Problem: usually, these functions are called with variables allocated on the stack without being initialized, and so the variable value is undefined :-(

I propose to set a new guideline: always initialize all output parameters to known values.

For example, python/cpython#111106 sets the size of -1 on error. Previously, the size argument was left unchanged. Extract of the unit test fix:

    Py_ssize_t size = UNINITIALIZED_SIZE;
    const char *s = PyUnicode_AsUTF8AndSize(unicode, &size);
    if (s == NULL) {
-        assert(size == UNINITIALIZED_SIZE);
+        assert(size == -1);
        return NULL;
    }

Now size value is determistic and there is no undefined behavior anymore.

Previously, using size in the error code could lead to undefined behavior. Example:

Py_ssize_t size;
const char *str = PyUnicode_AsUTF8AndSize(unicode, &size);
return PyUnicode_FromString(str, size);

What happens if PyUnicode_AsUTF8AndSize() fails and size is undefined?

cc @erlend-aasland @serhiy-storchaka

@vstinner vstinner added the guideline To be included in guidelines PEP label Oct 20, 2023
@serhiy-storchaka
Copy link

What happens if PyUnicode_AsUTF8AndSize() fails and size is undefined?

SystemError, unless size is 0. Empty string if size is 0.

See also python/cpython#110865. *end is set in case of some errors and is not set in case of other errors.

I am not sure that such guideline should be applied in all cases. In some cases it may be handy to left the value unchanged. For example, PyArg_ParseTuple() leaves values for optional arguments unchanged, it allows to specify default values which can be different in every case and even depend on the state of the object.

@zooba
Copy link
Contributor

zooba commented Oct 23, 2023

Previously, using size in the error code could lead to undefined behavior. Example:

Py_ssize_t size;
const char *str = PyUnicode_AsUTF8AndSize(unicode, &size);
return PyUnicode_FromString(str, size);

This construct should produce a compiler warning already.

In some cases it may be handy to left the value unchanged.

I agree. We should definitely have a rule to document the value left in a reference parameter in all situations (e.g. " on success; unchanged on error" or "set to 0 on error") and be correct within that function. Being more strict about it shouldn't be necessary.

We should be able to assume that callers do their error handling properly. Otherwise, we're going to end up with more checks in every function than are worth it (e.g. people want to skip type checks, which we can only do if we trust the caller).

@vstinner
Copy link
Contributor Author

This construct should produce a compiler warning already.

Not all compilers are able to inspect PyUnicode_AsUTF8AndSize() implementation to see if size is always initialized or not. While it should be the case in CPython with LTO, it's way more complicated for C extensions which are not even linked to libpython anymore.

@zooba
Copy link
Contributor

zooba commented Oct 30, 2023

Yeah, looks like the major ones all seem to skip their uninitialised local warnings when you've passed its address somewhere. Guess it's a common enough initialisation pattern that it's worth the heuristic.

I still think we can assume that callers do their own error handling.

@vstinner
Copy link
Contributor Author

vstinner commented Sep 6, 2024

Functions added to the C API now initialize parameters passed as pointers on error to avoid uninitialized state. I consider that the issue has been fixed.

On the other side, I don't think that it's worth it to go through existing functions.

@vstinner vstinner closed this as completed Sep 6, 2024
@encukou
Copy link
Contributor

encukou commented Sep 6, 2024

This is not yet in the guideline PEP draft.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
guideline To be included in guidelines PEP
Projects
None yet
Development

No branches or pull requests

4 participants