Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sh: No such file or directory in before_all for linux builds with cibuildwheel v2.10.0 #1271

Closed
vxgmichel opened this issue Sep 16, 2022 · 10 comments · Fixed by #1273
Closed

Comments

@vxgmichel
Copy link

vxgmichel commented Sep 16, 2022

Description

After updating from 2.9.0 to v2.10.0, I get the following error for linux builds using a simple before-all command like before-all = "ls":

Running before_all...
      + /opt/python/cp38-cp38/bin/python -c 'import sys, json, os; json.dump(os.environ.copy(), sys.stdout)'
    + sh -c ls
env: sh: No such file or directory

This seems to be related to the fact that we're updating the PATH environment variable:

[tool.cibuildwheel]
before-all = "ls"
environment = { PATH="$PATH:/usr/local/bin" }
@henryiii
Copy link
Contributor

henryiii commented Sep 16, 2022

The host $PATH isn't valid inside the container. So it must have been expanded inside the container. Hmm, what I expect is happening is shlex.quote is protecting the envvars there; we added that to have better string quoting in #1244, didn't realize it would change envvar usage

@henryiii
Copy link
Contributor

henryiii commented Sep 16, 2022

@joerick two easy options:

We could call os.path.expandvars inside the container, which would expand these usages, but unfortunately, would ignore quoting protection choices (which is also fortunate, actually, since shlex.quote is adding single quotes - it just means there would be no way to pass a string with a valid environment variable as a string. Though, thinking about it, you might be able to use PASSENV in that case).

We could remove the quoting, and just trust users not to produce malformed input. It wasn't supported before 2.10 anyway.

A final "option" that I'm not calling an option would be to restructure this so it is not passed through as strings, but is passed through as a dict (making the toml version the "true" method and strings just a way to build that dict). But this is a larger architectural change.

Quick example:

>>> shlex.quote("$HOME")
"'$HOME'"

This is quoted on the host, but needs to be expanded on the target.

@henryiii
Copy link
Contributor

henryiii commented Sep 16, 2022

I wonder if this is related to why scikit-build/cmake-python-distributions#282 is breaking. I don't see an expansion, but maybe the quoting is not working as expected: /~https://github.com/scikit-build/cmake-python-distributions/blob/d8404ad6bcb294f2b95b0ceba26c9b35d248706f/pyproject.toml#L42

The only thing unique to that pair of jobs is the part: -DRUN_CMAKE_TEST_EXCLUDE:STRING='BootstrapTest|ExportImport|RunCMake.install|RunCMake.file-GET_RUNTIME_DEPENDENCIES'

@joerick
Copy link
Contributor

joerick commented Sep 16, 2022

Ah. Hmm. So it seems to be that the problem is that unlike the config-settings option, we're not calling shlex.split before these values are processed further. shlex.split would remove the quotes that shlex.quote added. But we use bashlex to parse these options, when the bashlex parser sees a single-quote, it's getting more from that than just tokenisation. It changes how env vars are processed.

@henryiii
Copy link
Contributor

Could we call shlex.split?

@joerick
Copy link
Contributor

joerick commented Sep 16, 2022

I think we can't - we can't call shlex.split on these values because they will sometimes contain single-quotes that really do mean something.

@joerick
Copy link
Contributor

joerick commented Sep 16, 2022

So, let's think what we want...

  1. environment = { PATH="$PATH:/usr/local/bin" }

    should translate to PATH="$PATH:/usr/local/bin" (or PATH=$PATH:/usr/local/bin)

  2. environment = { SOME_STRING="a string with spaces" }

    should translate to SOME_STRING="a string with spaces" (or SOME_STRING='a string with spaces')

  3. environment = { SOME_STRING="an evil string with\" some strange punctuation" }

    should translate to SOME_STRING="an evil string with\" some strange punctuation"

  4. environment = { LITERAL_PASSTHROUGH="'this is not expanded: $PATH'" }

    should translate to LITERAL_PASSTHROUGH='this is not expanded: $PATH'

Though 3 might not be a huge priority... I'm not even sure I care about 4. much, tbh.

@joerick
Copy link
Contributor

joerick commented Sep 16, 2022

One idea I had was, when parsing the option from TOML, we could run shlex.split on it at option parse time, and see if it is more than one token. If it is, we run shlex.quote on it, to ensure that it's a valid env var assignment. Otherwise we don't quote these values at all.

@joerick
Copy link
Contributor

joerick commented Sep 16, 2022

☝️ I think this might be a good approach - it gives users the most control - if they specify one token (i.e. get their bash quoting right) then cibuildwheel doesn't touch it. Only if the option value would form an invalid bash assignment do things get quoted. We could raise a warning in this case, too?

Or, the other option would be to just raise an error if the value is more than one bash token, and get the user to fix their config Nah, based on how you're using it above this would be pretty user-hostile I feel!

@joerick
Copy link
Contributor

joerick commented Sep 16, 2022

@henryiii and I have just been chatting on Discord. There is some concern that my proposed fix above would make the quoting behaviour even more confusing than it was before (which I kinda agree with!)

Based on @henryiii's build logs above, it also seems like there's might be an incompatibility between shlex and bashlex somewhere-

>>> import shlex
>>> shlex.quote("a string with 'single quotes'")
'\'a string with \'"\'"\'single quotes\'"\'"\'\''
>>> shlex.split(shlex.quote("a string with 'single quotes'"))
["a string with 'single quotes'"]
>>> import bashlex
>>> list(bashlex.split(shlex.quote("a string with 'single quotes'")))
['a string with \'"\'"\'single quotes\'"\'"\'']

We don't actually use bashlex.split anywhere though, so maybe it's not actually a problem. Needs more investigation!

We could call os.path.expandvars inside the container, which would expand these usages, but unfortunately, would ignore quoting protection choices (which is also fortunate, actually, since shlex.quote is adding single quotes - it just means there would be no way to pass a string with a valid environment variable as a string. Though, thinking about it, you might be able to use PASSENV in that case).

I think this is only papering over the problem... correct quoting is also required for the bash parser to do command substitution with $() or backticks. It could also make it impossible to pass a literal $ in an option.

We could remove the quoting, and just trust users not to produce malformed input. It wasn't supported before 2.10 anyway.

Yeah, this is what I'm leaning towards, at least temporarily.

A final "option" that I'm not calling an option would be to restructure this so it is not passed through as strings, but is passed through as a dict (making the toml version the "true" method and strings just a way to build that dict). But this is a larger architectural change.

Perhaps. Though we'd still have to figure out the tricky problem of how to translate between a dict and a bash string of assignments somewhere e.g. how do we reconcile this:

@henryiii: I'm just not fond of this working:

PATH=$PATH:/local/bin
PATH=/windows loves spaces/bin

But not this:

PATH=$PATH:/windows loves spaces/bin

So I'm kinda leaning towards reverting to the previous behaviour for now, until we can figure out what the right policy is here.

joerick added a commit that referenced this issue Sep 3, 2024
Fix for #1803, where the method used for merging the config-settings option - string concatenation - was of little use.

I broke out the TableFmt object into a protocol-like object called OptionFormat, where the formatting and merge rules can be customised more deeply. This is more explicit, and lets us be a bit more precise with how TOML object get converted into the options.

I intend to use the flexibility of the OptionsFormat structure to tackle the remaining abiguity around environment variable quoting, as seen in the xfails in the options_test.py unit test and discussed in #1271.


Co-authored-by: Matthieu Darbois <mayeut@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants