codemetapy output is not static across identical runs #26

Closed
matthewfeickert opened this issue Sep 12, 2022 · 3 comments
@matthewfeickert

It seems that in codemetapy v2.0+ the output of subsequent runs is not static. This makes it impossible to diff the output between runs without manually sorting it first.

Example:

> docker run --rm -ti python:3.10 /bin/bash
root@68d255b7e087:/# python -m venv venv && . venv/bin/activate
(venv) root@68d255b7e087:/# python -m pip --quiet install --upgrade pip setuptools wheel
(venv) root@68d255b7e087:/# python -m pip --quiet install --pre 'pyhf==0.7.0rc4'
(venv) root@68d255b7e087:/# python -m pip --quiet install 'codemetapy==2.2.1'
(venv) root@68d255b7e087:/# codemetapy --inputtype python --no-extras pyhf > codemeta_run1.json
...
(venv) root@68d255b7e087:/# codemetapy --inputtype python --no-extras pyhf > codemeta_run2.json
...
(venv) root@68d255b7e087:/# apt update && apt install -y jq
(venv) root@68d255b7e087:/# diff <(jq -S .softwareRequirements codemeta_run1.json) <(jq -S .softwareRequirements codemeta_run2.json)
11,18d10
<     "@id": "/dependency/pyyaml-ge-5.1",
<     "@type": "SoftwareApplication",
<     "identifier": "pyyaml",
<     "name": "pyyaml",
<     "runtimePlatform": "Python 3",
<     "version": ">=5.1"
<   },
<   {
26a19,26
>     "@id": "/dependency/importlib-resources-ge-1.4.0",
>     "@type": "SoftwareApplication",
>     "identifier": "importlib-resources",
>     "name": "importlib-resources",
>     "runtimePlatform": "Python 3",
>     "version": ">=1.4.0"
>   },
>   {
35c35
<     "@id": "/dependency/scipy-ge-1.1.0",
---
>     "@id": "/dependency/tqdm-ge-4.56.0",
37,38c37,38
<     "identifier": "scipy",
<     "name": "scipy",
---
>     "identifier": "tqdm",
>     "name": "tqdm",
40c40
<     "version": ">=1.1.0"
---
>     "version": ">=4.56.0"
43c43
<     "@id": "/dependency/jsonschema-ge-4.15.0",
---
>     "@id": "/dependency/pyyaml-ge-5.1",
45,46c45,46
<     "identifier": "jsonschema",
<     "name": "jsonschema",
---
>     "identifier": "pyyaml",
>     "name": "pyyaml",
48c48
<     "version": ">=4.15.0"
---
>     "version": ">=5.1"
51c51
<     "@id": "/dependency/importlib-resources-ge-1.4.0",
---
>     "@id": "/dependency/scipy-ge-1.1.0",
53,54c53,54
<     "identifier": "importlib-resources",
<     "name": "importlib-resources",
---
>     "identifier": "scipy",
>     "name": "scipy",
56c56
<     "version": ">=1.4.0"
---
>     "version": ">=1.1.0"
59c59
<     "@id": "/dependency/tqdm-ge-4.56.0",
---
>     "@id": "/dependency/jsonschema-ge-4.15.0",
61,62c61,62
<     "identifier": "tqdm",
<     "name": "tqdm",
---
>     "identifier": "jsonschema",
>     "name": "jsonschema",
64c64
<     "version": ">=4.56.0"
---
>     "version": ">=4.15.0"
(venv) root@68d255b7e087:/#

In codemetapy v0.3.5 the output was statically reproducible across runs. Is this something that could be supported again? Or should users sort the JSON manually if they want it?

For an example of how this affects workflows c.f. scikit-hep/pyhf#2002
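
The manual sorting mentioned above could be sketched roughly as follows. This is a minimal illustration, not part of codemetapy: the helper names `normalize` and `dump_sorted` are hypothetical, and the sketch assumes that the order of every list in the document is irrelevant, which may not hold for fields where order carries meaning.

```python
import json


def normalize(node):
    """Recursively sort every list so the result does not depend on input order."""
    if isinstance(node, dict):
        return {key: normalize(value) for key, value in node.items()}
    if isinstance(node, list):
        items = [normalize(item) for item in node]
        # Sort by each item's canonical JSON text, which also handles nested objects.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return node


def dump_sorted(document):
    """Serialize with sorted keys and sorted lists to get stable, diffable text."""
    return json.dumps(normalize(document), indent=2, sort_keys=True)
```

Writing `dump_sorted(json.load(f))` for each run's file would give two texts that can be compared with a plain `diff`.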

@proycon
Owner

proycon commented Sep 12, 2022

Good point, I agree that we want the output to be deterministic wherever possible, even though formally it makes no difference of course (it describes the same RDF graph).
I'll look into this and probably implement alphabetical sorting for the dependencies to solve this, though the same problem may still show up in other places.
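
To illustrate the point that two differently ordered outputs describe the same content, here is a small sketch of an order-insensitive comparison. The function name `same_requirements` is hypothetical, and it only compares plain JSON objects rather than a real RDF graph.

```python
import json
from collections import Counter


def same_requirements(first, second):
    """Return True if both lists hold the same objects, ignoring their order."""

    def canonical(items):
        # Count each object's canonical JSON text so duplicates are respected.
        return Counter(json.dumps(item, sort_keys=True) for item in items)

    return canonical(first) == canonical(second)
```

A check like this passes for the two runs shown in the report, while a textual `diff` does not, which is why sorted output is still worth having.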

@proycon proycon self-assigned this Sep 12, 2022
@proycon proycon added the enhancement New feature or request label Sep 12, 2022
proycon added a commit that referenced this issue Sep 12, 2022
…/schema:identifier if schema:position is not used (#26)

It still needs to be investigated whether this may conflict with actual RDF lists (rdf:first, rdf:rest), for which support is currently not implemented (#22).
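
The idea in this commit message, ordering by an explicit position when one is given and otherwise falling back to an identifier, might look something like the sketch below. It is not codemetapy's actual implementation: the commit title is truncated, so the full set of fallback properties is unknown, and the function name and the plain `position` and `identifier` keys are assumptions made for illustration.

```python
def requirement_sort_key(entry):
    """Sort key: explicit position first, otherwise alphabetical by identifier."""
    position = entry.get("position")
    if position is not None:
        # Entries with an explicit position come first, in numeric order.
        return (0, int(position), "")
    # Entries without a position are ordered case-insensitively by identifier.
    return (1, 0, str(entry.get("identifier", "")).lower())
```

Passing this as `key=` to `sorted()` gives the same order on every run, regardless of the order in which the entries were collected.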
@proycon
Owner

proycon commented Sep 12, 2022

I just fixed this and released v2.2.2 (with some other fixes too). I can't guarantee yet that it's always deterministic with all input, but at least lists should end up sorted now, which fixes the problem you mentioned.

PS: One thing I notice with pyhf is that the version number doesn't land correctly in the codemeta.json anymore (I see only 0.0.0). But this bug seems to be outside codemetapy because this 0.0.0 already literally appears in the generated pyhf.egg-info/PKG-INFO which codemetapy uses as source.

@proycon proycon closed this as completed Sep 12, 2022
Repository owner moved this from Todo (backlog) to Done in CLARIAH+ Shared Service: FAIR Tool Discovery Sep 12, 2022
@proycon
Owner

proycon commented Sep 12, 2022

Another unrelated PS: by default some extensions on top of codemeta are enabled; if you don't want those, you can specify --strict. You may also want to specify --released to get an accurate codemeta:developmentStatus.

matthewfeickert added a commit to scikit-hep/pyhf that referenced this issue Sep 13, 2022
* Update codemetapy to v2.2.2+ in 'current release' workflow to have access to
  the `--no-extras` CLI API in v2.0+ and reproducible runs.
   - c.f. proycon/codemetapy#24
   - c.f. proycon/codemetapy#26
   - Amends PR #1995
* Use the codemetapy v2.0 API which requires `--inputtype python` to be added.
* Update codemeta.json to follow codemetapy v2.0+ general spec.
matthewfeickert added a commit to scikit-hep/pyhf that referenced this issue Sep 20, 2022
Use fstrings for rtol warning

Use level arg over hardcoded 0.05

add in lru_cache import

fix: Pin codemetapy to v0.3.5 for `--no-extras` functionality (#1995)

* Pin codemetapy to v0.3.5 in the 'current release' test workflow to keep the
  `--no-extras` CLI API option.
   - c.f. proycon/codemetapy#24
* Update lower bounds for scipy and click in codemeta.json and add lower bounds
  for importlib-resources and typing-extensions.

Rename and add to public API

Add in hypotest_kwargs

Add FIXME notice for later. FIX BEFORE MERGE

Update public API repr

ci: Install release candidates for 'current release' test workflow (#1996)

* Use release candidates that are on PyPI for verifying that the public API
  passes tests. This verifies that the release candidates that users are being
  asked to test reflect the release API.
* Use the latest version of pytest.

refactor: Use urllib.parse.urlsplit over urlparse (#1997)

* Use urllib.parse.urlsplit over urllib.parse.urlparse to avoid having to deal with
  urlparse's 'params' argument which incurs a performance cost.
   - c.f. https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit
   - c.f. https://youtu.be/ABJvdsIANds

Indent docstrings correctly

Use uncorrelated_background API for docstring example

Use rtol in docstring example to avoid warning

drop lru_cache

fix kwargs for autoscan

Bump version: 0.7.0rc3 → 0.7.0rc4

docs: Add Binder Python runtime environment specification (#1998)

* Add binder/runtime.txt to specify the version of Python that is used for the Binder
  environment. Use Python 3.10 as this is the latest version that all pyhf backends will
  work with.
   - c.f. https://mybinder.readthedocs.io/en/latest/howto/languages.html#python

fix: Update codemeta lower bounds for jsonschema, importlib-resources (#2000)

* Update jsonschema lower bound to v4.15.0 and importlib-resources lower bound to v1.4.0
  to match their versions in setup.cfg.
   - Amends PR #1979

docs: Add milestone for 2000 project GitHub items (#2001)

* Add milestone to README for 2000 project GitHub issues and pull requests.

fix: Use codemetapy v2.2.2+ API (#2002)

* Update codemetapy to v2.2.2+ in 'current release' workflow to have access to
  the `--no-extras` CLI API in v2.0+ and reproducible runs.
   - c.f. proycon/codemetapy#24
   - c.f. proycon/codemetapy#26
   - Amends PR #1995
* Use the codemetapy v2.0 API which requires `--inputtype python` to be added.
* Update codemeta.json to follow codemetapy v2.0+ general spec.

fix: Add filterwarnings ignore for protobuf DeprecationWarning (#2005)

* Add an ignore to filterwarnings to avoid a protobuf DeprecationWarning

> DeprecationWarning: Call to deprecated create function FileDescriptor().
> Note: Create unlinked descriptors is going to go away. Please use get/find
> descriptors from generated code or query the descriptor_pool.

from TensorFlow's use of protobuf.

fix: Specify encoding as utf-8 to enforce PEP 597 (#2007)

* Explicitly specify the encoding as utf-8 while opening a file to enforce PEP 597.
  This is future-proofing work to some degree as Python 3.15+ will make utf-8 the
  default.
   - c.f. https://peps.python.org/pep-0597/
* Add the flake8-encodings pre-commit hook to enforce PEP 597.

docs: Add FAQ on reasons for need to downgrade dependencies (#1529)

* Add FAQ explaining reasons why users might have to manually downgrade dependencies.
   - c.f. PR #1979 for context

docs: Separate docstrings semantically

Apply sourcery suggestion for simplification

Rename to cached for clarity

Add test for auto through upperlimit API

Use None instead of auto to simplify API

Avoid function level globals

Use np.asarray to avoid copy

Use lower and upper to match scipy terms

Split warning for readability

Add test for rtol warning

Remove tmpdir fixture as not needed for these tests given no writing of output

Add check for return_results

More verbose

fix: Correctly concatenate lists instead of adding float to all list elements

Test bounds expansion

test: Update test_plot_results_no_axis baseline image (#2009)

* matplotlib v3.6.0 results in a slightly different baseline image than
  matplotlib v3.5.x, so regenerate the baseline image using matplotlib v3.6.0
  with `pytest --mpl-generate-path=tests/contrib/baseline tests/contrib/test_viz.py`.
* Mark the test_plot_results_no_axis test as xfail for Python 3.7 as matplotlib v3.6.0
  is Python 3.8+ and so the image is guaranteed to be different as Python 3.7 runtimes
  will install matplotlib v3.5.x.

Add upperlimit_fixed_scan to API docs

Add return_results test

move to test_upperlimit_with_kwargs

Move the pop out before evaluation to make everything very clean and clear

Note what scan

Rename to auto_scan

docs: fix link

Provide better coverage and use np.allclose

docs: Add Beojan Stanislaus to contributor list

change auto_scan to toms748_scan

rename fixed_scan to linear_grid_scan

Make intervals module and change API to upper_limit

Rename to pyhf.infer.intervals.upper_limits

get upper_limits.upper_limit working

Also bring along old API

limit to just upper_limit by default

Rearrange

feat: Add internal API to warn of deprecation and future removal

* Add internal API pyhf.exceptions._deprecated_api_warning to alert users to API deprecation
  by raising a subclass of DeprecationWarning and future removal.
* Add test for pyhf.exceptions._deprecated_api_warning to ensure it gets picked up as
  DeprecationWarning.

Note deprecated API

Separate into confidence intervals section

fix: Use function scope to avoid altering hypotest_args fixture

Make test name explicit

Use deprecated Sphinx note

Add versionadded directives

feat: Add internal API to warn of deprecation and future removal (#2012)

* Add internal API pyhf.exceptions._deprecated_api_warning to alert users to API deprecation
  by raising a subclass of DeprecationWarning and future removal.
* Add test for pyhf.exceptions._deprecated_api_warning to ensure it gets picked up as
  DeprecationWarning.

Update lower bound on scipy as toms748 added in scipy v1.2.0

fixup from autoscan test changes