Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

full support for duplicate dependencies with overlapping markers #7257

Merged
merged 1 commit into from
Jul 24, 2023

Conversation

radoering
Copy link
Member

@radoering radoering commented Dec 27, 2022

Resolves: #5066

Requires: python-poetry/poetry-core#530, python-poetry/poetry-core#546, python-poetry/poetry-core#547

  • replace _merge_dependencies_by_marker and _handle_any_marker_dependencies by the more general _resolve_overlapping_markers
  • raise IncompatibleConstraintsError if there are conflicts (due to incompatible version constraints or source types) in the constraints of a package

Tested with opencv, which has the most duplicate dependencies with overlapping markers I know about:

numpy (>=1.13.3) ; python_version < "3.7"
numpy (>=1.21.2) ; python_version >= "3.10"
numpy (>=1.21.2) ; python_version >= "3.6" and platform_system == "Darwin" and platform_machine == "arm64"
numpy (>=1.19.3) ; python_version >= "3.6" and platform_system == "Linux" and platform_machine == "aarch64"
numpy (>=1.14.5) ; python_version >= "3.7"
numpy (>=1.17.3) ; python_version >= "3.8"
numpy (>=1.19.3) ; python_version >= "3.9"

Especially the markers with platform_system and platform_machine, which overlap with most other markers make this example quite complex.

TODO: Investigation of the influence on the performance for (more) real-world examples

  • Added tests for changed code.
  • Updated documentation for changed code.

@radoering radoering changed the title full support for duplicate depenencies with overlapping markers full support for duplicate dependencies with overlapping markers Feb 25, 2023
@radoering radoering force-pushed the overlapping-markers branch from c31c559 to 15c234d Compare June 3, 2023 15:54
@radoering radoering marked this pull request as ready for review June 3, 2023 16:12
@radoering
Copy link
Member Author

Times for poetry lock with a warm cache:

pyproject.toml from ... time without PR time with PR
#4670 1 s 1 s
#4870 58 s 58 s
shootout example 16 s 10 s
large internal project 210 s 53 s

Number of overrides:

pyproject.toml from ... number of overrides without PR number of overrides with PR
#4670 24 24
#4870 52 52
shootout example 28 18
large internal project 93 15

Considering the examples, the performance of our marker arithmetic seems to be fast enough to handle this PR: There is no measurable performance regression even in cases where the number of overrides does not change.

@radoering radoering force-pushed the overlapping-markers branch from 15c234d to 84b52eb Compare June 11, 2023 15:07
@radoering radoering requested a review from a team June 11, 2023 15:18
@neersighted
Copy link
Member

PTAL @dimbleby

neersighted
neersighted previously approved these changes Jun 11, 2023
@radoering radoering force-pushed the overlapping-markers branch from 84b52eb to e706f3f Compare June 18, 2023 11:33
Copy link
Contributor

@dimbleby dimbleby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good I think, though my head now hurts a little...

Feels as though there's some danger of this being horribly inefficient if there are lots of overlapping markers to resolve. But probably folk mostly don't do that and certainly the examples showing that this is typically a win by reducing overrides is encouraging. So I guess we hope that this continues to hold when exposed to more examples from the outside world...

Made a couple of more or less pedantic code-improving suggestions.

src/poetry/puzzle/provider.py Outdated Show resolved Hide resolved
src/poetry/puzzle/provider.py Outdated Show resolved Hide resolved
@dimbleby
Copy link
Contributor

dimbleby commented Jun 18, 2023

Here's an example in which _resolve_overlapping_markers() has to do too much work:

[tool.poetry.dependencies]
python = "^3.5"
oldest-supported-numpy = "^2022.11.19"

see /~https://github.com/scipy/oldest-supported-numpy/blob/main/setup.cfg

Edit: to be fair, the existing code also doesn't do well with this one!

@radoering
Copy link
Member Author

I can't get a solution (within a reasonable amount of time) with or without this PR. Thus, probably not a blocker?

@dimbleby
Copy link
Contributor

I'm not sure how bad I think this is. I agree that with that example I am not getting a solution either before or after, or anyway not in an amount of time that I am willing to wait.

You can use the python requirement as a knob to adjust "how many combinations of markers are there to work through?", and therefore how slow this is. eg python = "^3.8" is just about tolerable.

However it is all made a bit muddier by finding that the overlapping markers produce some incompatible requirements, which the new code discovers and the old code does not. (I've opened an issue at oldest-supported-numpy.)

So that's an example where this change is "breaking" (defensible as "fixing" really, but expect bug reports anyway!) - and because it's breaking I'm not sure what the performance impact really is for that case.

Not sure this is helping much: I think this is an improvement, but I sure would be happier if it could be made faster!

@radoering
Copy link
Member Author

After fixing the requirements of oldest-supported-numpy (cf scipy/oldest-supported-numpy#71) and with python-poetry/poetry-core#608, which is required to get sensible results for this package anyway, I tried some ranges with and without the PR:

Python constraint without PR with PR
^3.10 1 s 1 s
^3.9 44 s 13 s
>=3.8,<3.10 > 1.000 s 69 s
>=3.5,<3.8 858 s 318 s

Larger ranges still took too long. Maybe, I was just lucky but the results are pretty clear...

@dimbleby
Copy link
Contributor

Interesting, do the changes you have made to oldest-supported-numpy have a negative effect on poetry performance?

without the PR and also without those changes I see:

Python constraint Time
>=3.5,<3.8 5s
>=3.8,<3.10 3s
^3.9 3s
^3.10 1s

where the first of those fails to find a solution and some of the others ought to (because the code without the PR is not correct); but nevertheless these times are much faster than you are reporting.

I'm all in favour of concluding that the new code is both more correct and also faster than the old code, that would be lovely. I'm just a bit confused about why we are seeing such different results.

@radoering
Copy link
Member Author

The changes to oldest-supported-numpy and python-poetry/poetry-core#608 both have a negative impact on performance.

@dimbleby
Copy link
Contributor

Well I still wish it was faster; but if we're not finding any reason to believe that this change makes things worse then I guess it should be merged

@dimbleby
Copy link
Contributor

I haven't profiled, so this might be a bad or irrelevant idea...

I wonder whether _is_relevant_marker() is slow: this bit looks like quite a lot of work:

            and self._python_constraint.allows_any(
                get_python_constraint_from_marker(marker)
            )

thought: the project python constraint could be transformed into an equivalent marker (once, up-front), and then intersected with used_marker_intersection. If that intersection is (not) empty then used_marker_intersection is (not) irrelevant to the current project.

As I say, I haven't tested whether that's definitely a slow bit that is worth optimising: and anyway it isn't completely obvious that this would even be an improvement. So don't block the merge on this idea!

@dimbleby
Copy link
Contributor

oh, but a better idea: the calculation of the intersection is currently rather slow. Much faster is to arrange it as something like this:

              markers = (
                  dep.marker if use else dep.marker.invert()
                  for use, dep in zip(uses, dependencies)
              )
              used_marker_intersection: BaseMarker = AnyMarker()
              for m in markers:
                  used_marker_intersection = used_marker_intersection.intersect(m)

presumably because most of the time the intersection collapses to the empty marker pretty quickly; and once that has happened we can avoid the work of DNF-ing and suchlike on the remaining markers.

@dimbleby
Copy link
Contributor

dimbleby commented Jun 20, 2023

and in fact that observation can be encouraged by putting the positive markers early on in the intersection, like this (could do with prettying up!):

              markers = (
                  dep.marker if use else dep.marker.invert()
                  for use, dep in sorted(
                      zip(uses, dependencies), key=lambda ud: ud[0], reverse=True
                  )
              )

That's relying on the heuristic that if there are a lot of markers present, then surely they most don't intersect one another. But that seems quite plausible, it would be quite hard to write them any other way

@radoering
Copy link
Member Author

_is_relevant_marker() seems to be irrelevant. 😉

The other suggestion looks very promising. (I have to run some additional tests later.)

@dimbleby
Copy link
Contributor

An easy win in _merge_dependencies_by_constraint():

             for deps in by_constraint.values():
-                new_markers = [dep.marker for dep in deps]
                 dep = deps[0]
-                dep.marker = marker_union(*new_markers)
+                if len(deps) > 1:
+                    new_markers = (dep.marker for dep in deps)
+                    dep.marker = marker_union(*new_markers)

no need to perform all the dnf-and-cnf complexity when trying to merge a list that only contained a single marker in the first place

@radoering radoering force-pushed the overlapping-markers branch from a268daf to 596157c Compare June 23, 2023 15:56
@radoering
Copy link
Member Author

I applied both suggestions and checked that there is no negative impact on the examples in #7257 (comment).

Measuring #7257 (comment) again:

Python constraint before optimization after optimization 1 after optimization 2
^3.9 13 s 10 s 4 s
>=3.8,<3.10 69 s 22 s 3 s
>=3.5,<3.8 318 s 15 s 9 s

The only drawback of the second optimization is that the locked markers become more verbose. See test_solver_duplicate_dependencies_with_overlapping_markers_complex. (Of course, the performance improvement is more important than consise markers in the lock file.)

@dimbleby
Copy link
Contributor

Seems like what we're seeing here is that the performance of marker operations is still expensive, when they get sufficiently complicated. Nearly all of the time in the solving is now taken up with performing the single intersection of the final combination of markers (the case where all uses are False). Improvements recorded above are great: but with broader python constraints this package is still too slow to solve.

oldest-supported-numpy is certainly more complicated than most, but it is a real found-in-the-wild example. I don't think that it's necessary to completely solve that to merge this! Especially given the evidence that before this MR poetry was also nowhere near to finding a solution, so this isn't making things worse.

However, if you have any good ideas then this still would be worth addressing....

I suspect that we are doing a lot of repeated work to normalize the same markers again and again. I don't think I'm going to get to trying anything out soon, but I wonder if there are wins to be had either by caching answers, or having markers remember that they are already in DNF (therefore another dnf() can be made a no-op).

@radoering
Copy link
Member Author

but I wonder if there are wins to be had either by caching answers, or having markers remember that they are already in DNF

On their own, both appear to have a significant effect on performance. However, when caching answers, remembering that a marker is already DNF seems to only have a very small effect not worth the more complex code. There might be some cases where it's still relevant but I think I'll wait until we find one in the wild.

Updated measurements:

Python constraint without python-poetry/poetry-core#609 with python-poetry/poetry-core#609
^3.9 4 s 1 s
>=3.8,<3.10 3 s 1 s
>=3.5,<3.8 9 s 2 s

Broader Python constraints are still too slow...

Then, I switched from scipy/oldest-supported-numpy#71 to scipy/oldest-supported-numpy#74 (with scipy/oldest-supported-numpy#74 (comment)) because that's more likely to be merged:

Python constraint without python-poetry/poetry-core#609 with python-poetry/poetry-core#609
^3.8 40 s 2 s
^3.7 too long 32 s

@radoering radoering force-pushed the overlapping-markers branch from 596157c to e9c40e6 Compare July 23, 2023 16:11
Co-authored-by: David Hotham <david.hotham@blueyonder.co.uk>
@radoering radoering force-pushed the overlapping-markers branch from e9c40e6 to 116a232 Compare July 24, 2023 15:52
@radoering radoering merged commit 9ed457b into python-poetry:master Jul 24, 2023
mwalbeck pushed a commit to mwalbeck/docker-python-poetry that referenced this pull request Aug 27, 2023
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [poetry](https://python-poetry.org/) ([source](/~https://github.com/python-poetry/poetry), [changelog](https://python-poetry.org/history/)) | minor | `1.5.1` -> `1.6.1` |

---

### Release Notes

<details>
<summary>python-poetry/poetry (poetry)</summary>

### [`v1.6.1`](/~https://github.com/python-poetry/poetry/blob/HEAD/CHANGELOG.md#161---2023-08-21)

[Compare Source](python-poetry/poetry@1.6.0...1.6.1)

##### Fixed

-   Update the minimum required version of `requests` ([#&#8203;8336](python-poetry/poetry#8336)).

### [`v1.6.0`](/~https://github.com/python-poetry/poetry/blob/HEAD/CHANGELOG.md#160---2023-08-20)

[Compare Source](python-poetry/poetry@1.5.1...1.6.0)

##### Added

-   **Add support for repositories that do not provide a supported hash algorithm** ([#&#8203;8118](python-poetry/poetry#8118)).
-   **Add full support for duplicate dependencies with overlapping markers** ([#&#8203;7257](python-poetry/poetry#7257)).
-   **Improve performance of `poetry lock` for certain edge cases** ([#&#8203;8256](python-poetry/poetry#8256)).
-   Improve performance of `poetry install` ([#&#8203;8031](python-poetry/poetry#8031)).
-   `poetry check` validates that specified `readme` files do exist ([#&#8203;7444](python-poetry/poetry#7444)).
-   Add a downgrading note when updating to an older version ([#&#8203;8176](python-poetry/poetry#8176)).
-   Add support for `vox` in the `xonsh` shell ([#&#8203;8203](python-poetry/poetry#8203)).
-   Add support for `pre-commit` hooks for projects where the pyproject.toml file is located in a subfolder ([#&#8203;8204](python-poetry/poetry#8204)).
-   Add support for the `git+http://` scheme ([#&#8203;6619](python-poetry/poetry#6619)).

##### Changed

-   **Drop support for Python 3.7** ([#&#8203;7674](python-poetry/poetry#7674)).
-   Move `poetry lock --check` to `poetry check --lock` and deprecate the former ([#&#8203;8015](python-poetry/poetry#8015)).
-   Change future warning that PyPI will only be disabled automatically if there are no primary sources ([#&#8203;8151](python-poetry/poetry#8151)).

##### Fixed

-   Fix an issue where `build-system.requires` were not respected for projects with build scripts ([#&#8203;7975](python-poetry/poetry#7975)).
-   Fix an issue where the encoding was not handled correctly when calling a subprocess ([#&#8203;8060](python-poetry/poetry#8060)).
-   Fix an issue where `poetry show --top-level` did not show top level dependencies with extras ([#&#8203;8076](python-poetry/poetry#8076)).
-   Fix an issue where `poetry init` handled projects with `src` layout incorrectly ([#&#8203;8218](python-poetry/poetry#8218)).
-   Fix an issue where Poetry wrote `.pth` files with the wrong encoding ([#&#8203;8041](python-poetry/poetry#8041)).
-   Fix an issue where `poetry install` did not respect the source if the same version of a package has been locked from different sources ([#&#8203;8304](python-poetry/poetry#8304)).

##### Docs

-   Document **official Poetry badge** ([#&#8203;8066](python-poetry/poetry#8066)).
-   Update configuration folder path for macOS ([#&#8203;8062](python-poetry/poetry#8062)).
-   Add a warning about pip ignoring lock files ([#&#8203;8117](python-poetry/poetry#8117)).
-   Clarify the use of the `virtualenvs.in-project` setting. ([#&#8203;8126](python-poetry/poetry#8126)).
-   Change `pre-commit` YAML style to be consistent with pre-commit's own examples ([#&#8203;8146](python-poetry/poetry#8146)).
-   Fix command for listing installed plugins ([#&#8203;8200](python-poetry/poetry#8200)).
-   Mention the `nox-poetry` package ([#&#8203;8173](python-poetry/poetry#8173)).
-   Add an example with a PyPI source in the pyproject.toml file ([#&#8203;8171](python-poetry/poetry#8171)).
-   Use `reference` instead of deprecated `callable` in the scripts example ([#&#8203;8211](python-poetry/poetry#8211)).

##### poetry-core ([`1.7.0`](/~https://github.com/python-poetry/poetry-core/releases/tag/1.7.0))

-   Improve performance of marker handling ([#&#8203;609](python-poetry/poetry-core#609)).
-   Allow `|` as a value separator in markers with the operators `in` and `not in` ([#&#8203;608](python-poetry/poetry-core#608)).
-   Put pretty name (instead of normalized name) in metadata ([#&#8203;620](python-poetry/poetry-core#620)).
-   Update list of supported licenses ([#&#8203;623](python-poetry/poetry-core#623)).
-   Fix an issue where PEP 508 dependency specifications with names starting with a digit could not be parsed ([#&#8203;607](python-poetry/poetry-core#607)).
-   Fix an issue where Poetry considered an unrelated `.gitignore` file resulting in an empty wheel ([#&#8203;611](python-poetry/poetry-core#611)).

##### poetry-plugin-export ([`^1.5.0`](/~https://github.com/python-poetry/poetry-plugin-export/releases/tag/1.5.0))

-   Fix an issue where markers for dependencies required by an extra were not generated correctly ([#&#8203;209](python-poetry/poetry-plugin-export#209)).

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](/~https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNi40Mi40IiwidXBkYXRlZEluVmVyIjoiMzYuNTIuMiIsInRhcmdldEJyYW5jaCI6Im1hc3RlciJ9-->

Reviewed-on: https://git.walbeck.it/walbeck-it/docker-python-poetry/pulls/846
Co-authored-by: renovate-bot <bot@walbeck.it>
Co-committed-by: renovate-bot <bot@walbeck.it>
Copy link

github-actions bot commented Mar 3, 2024

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 3, 2024
@radoering radoering deleted the overlapping-markers branch November 24, 2024 12:47
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Surprising behaviour (and missing warning?) with overlaping python versions
3 participants