codemetapy output is not static across identical runs #26

matthewfeickert · 2022-09-12T15:14:41Z

It seems that in codemetapy v2.0+ the output of subsequent runs is not static. This makes it impossible to diff the output between runs without manually sorting it first.

Example:

> docker run --rm -ti python:3.10 /bin/bash
root@68d255b7e087:/# python -m venv venv && . venv/bin/activate
(venv) root@68d255b7e087:/# python -m pip --quiet install --upgrade pip setuptools wheel
(venv) root@68d255b7e087:/# python -m pip --quiet install --pre 'pyhf==0.7.0rc4'
(venv) root@68d255b7e087:/# python -m pip --quiet install 'codemetapy==2.2.1'
(venv) root@68d255b7e087:/# codemetapy --inputtype python --no-extras pyhf > codemeta_run1.json
...
(venv) root@68d255b7e087:/# codemetapy --inputtype python --no-extras pyhf > codemeta_run2.json
...
(venv) root@68d255b7e087:/# apt update && apt install -y jq
(venv) root@68d255b7e087:/# diff <(jq -S .softwareRequirements codemeta_run1.json) <(jq -S .softwareRequirements codemeta_run2.json)
11,18d10
<     "@id": "/dependency/pyyaml-ge-5.1",
<     "@type": "SoftwareApplication",
<     "identifier": "pyyaml",
<     "name": "pyyaml",
<     "runtimePlatform": "Python 3",
<     "version": ">=5.1"
<   },
<   {
26a19,26
>     "@id": "/dependency/importlib-resources-ge-1.4.0",
>     "@type": "SoftwareApplication",
>     "identifier": "importlib-resources",
>     "name": "importlib-resources",
>     "runtimePlatform": "Python 3",
>     "version": ">=1.4.0"
>   },
>   {
35c35
<     "@id": "/dependency/scipy-ge-1.1.0",
---
>     "@id": "/dependency/tqdm-ge-4.56.0",
37,38c37,38
<     "identifier": "scipy",
<     "name": "scipy",
---
>     "identifier": "tqdm",
>     "name": "tqdm",
40c40
<     "version": ">=1.1.0"
---
>     "version": ">=4.56.0"
43c43
<     "@id": "/dependency/jsonschema-ge-4.15.0",
---
>     "@id": "/dependency/pyyaml-ge-5.1",
45,46c45,46
<     "identifier": "jsonschema",
<     "name": "jsonschema",
---
>     "identifier": "pyyaml",
>     "name": "pyyaml",
48c48
<     "version": ">=4.15.0"
---
>     "version": ">=5.1"
51c51
<     "@id": "/dependency/importlib-resources-ge-1.4.0",
---
>     "@id": "/dependency/scipy-ge-1.1.0",
53,54c53,54
<     "identifier": "importlib-resources",
<     "name": "importlib-resources",
---
>     "identifier": "scipy",
>     "name": "scipy",
56c56
<     "version": ">=1.4.0"
---
>     "version": ">=1.1.0"
59c59
<     "@id": "/dependency/tqdm-ge-4.56.0",
---
>     "@id": "/dependency/jsonschema-ge-4.15.0",
61,62c61,62
<     "identifier": "tqdm",
<     "name": "tqdm",
---
>     "identifier": "jsonschema",
>     "name": "jsonschema",
64c64
<     "version": ">=4.56.0"
---
>     "version": ">=4.15.0"
(venv) root@68d255b7e087:/#

In codemetapy v0.3.5 the output was statically reproducible across runs. Is this something that could be supported again? Or should users sort the JSON manually if they want it?
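For anyone who needs a workaround in the meantime, manual sorting could look roughly like the sketch below. It is not part of codemetapy; the helper names `normalize_requirements` and `canonical_json` are made up for illustration, and it assumes `softwareRequirements` is a plain JSON list as in the diff above.

```python
import json


def normalize_requirements(doc: dict) -> dict:
    """Return a shallow copy of a codemeta document with softwareRequirements sorted.

    Hypothetical helper, not a codemetapy API. Entries are ordered by their
    key-sorted JSON serialization, which for dict entries starts with "@id".
    """
    result = dict(doc)
    reqs = result.get("softwareRequirements")
    if isinstance(reqs, list):
        result["softwareRequirements"] = sorted(
            reqs, key=lambda entry: json.dumps(entry, sort_keys=True)
        )
    return result


def canonical_json(doc: dict) -> str:
    """Serialize with sorted keys and sorted requirements so two runs can be diffed."""
    return json.dumps(normalize_requirements(doc), indent=2, sort_keys=True)


if __name__ == "__main__":
    run1 = {"softwareRequirements": [
        {"@id": "/dependency/scipy-ge-1.1.0", "name": "scipy"},
        {"@id": "/dependency/pyyaml-ge-5.1", "name": "pyyaml"},
    ]}
    run2 = {"softwareRequirements": list(reversed(run1["softwareRequirements"]))}
    print(canonical_json(run1) == canonical_json(run2))
```

Writing `canonical_json(...)` for each run to a file and diffing those files would then only show real content changes, not reordering.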

For an example of how this affects workflows, see scikit-hep/pyhf#2002


proycon · 2022-09-12T16:38:43Z

Good point, I agree that we want the output to be deterministic wherever possible, even though formally it makes no difference of course (it describes the same RDF graph).
I'll look into this and probably implement alphabetical sorting for the dependencies to solve this, though it might pop up in other places still too.
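To illustrate the "same graph, different order" point, here is a minimal sketch of an order-insensitive comparison of two JSON outputs. It is an assumption-laden illustration, not how codemetapy compares anything: the names `canonicalize` and `same_content` are hypothetical, and it treats every JSON list as unordered, which would be wrong for lists whose order carries meaning.

```python
import json


def canonicalize(value):
    """Recursively sort every list so that ordering differences disappear.

    Hypothetical helper: assumes all lists are unordered collections.
    """
    if isinstance(value, dict):
        return {key: canonicalize(item) for key, item in value.items()}
    if isinstance(value, list):
        items = [canonicalize(item) for item in value]
        # Sort on a key-sorted serialization so dicts and scalars are comparable.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return value


def same_content(a, b) -> bool:
    """True if two JSON documents differ only in list and key order."""
    return canonicalize(a) == canonicalize(b)


if __name__ == "__main__":
    run1 = {"softwareRequirements": [{"name": "scipy"}, {"name": "tqdm"}]}
    run2 = {"softwareRequirements": [{"name": "tqdm"}, {"name": "scipy"}]}
    print(same_content(run1, run2))
```

A check like this confirms that two runs describe the same data, but it does not make the files themselves diffable, which is why sorting at output time is still useful.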

…/schema:identifier if schema:position is not used (#26). It still needs to be investigated whether this may conflict with actual RDF lists (rdf:first, rdf:rest), for which support is currently not implemented (#22)
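The idea in that commit message, using an explicit position when there is one and falling back to the identifier otherwise, can be sketched as a sort key. This is only an illustration under assumptions, not the code from the commit: the plain `position` and `identifier` keys, the numeric position values, and the function name `dependency_sort_key` are all hypothetical.

```python
def dependency_sort_key(item: dict) -> tuple:
    """Sort key: explicit position first, otherwise case-insensitive identifier.

    Hypothetical sketch; assumes 'position' (numeric) and 'identifier' are
    plain keys on each entry.
    """
    position = item.get("position")
    if position is not None:
        # Entries with an explicit position come first, in position order.
        return (0, int(position), "")
    # Entries without a position follow, ordered alphabetically.
    return (1, 0, str(item.get("identifier", "")).lower())


if __name__ == "__main__":
    deps = [
        {"identifier": "tqdm"},
        {"identifier": "scipy"},
        {"identifier": "second", "position": 2},
        {"identifier": "first", "position": 1},
    ]
    for dep in sorted(deps, key=dependency_sort_key):
        print(dep["identifier"])
```

Returning a tuple of the same shape in both branches keeps positioned and unpositioned entries comparable, so the sort never has to compare an integer with a string.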

proycon · 2022-09-12T20:18:23Z

I just fixed this and released v2.2.2 (with some other fixes too). I can't guarantee yet that it's always deterministic with all input, but at least lists should end up sorted now, which fixes the problem you mentioned.

PS: One thing I notice with pyhf is that the version number doesn't land correctly in the codemeta.json anymore (I see only 0.0.0). But this bug seems to be outside codemetapy because this 0.0.0 already literally appears in the generated pyhf.egg-info/PKG-INFO which codemetapy uses as source.

proycon · 2022-09-12T20:25:34Z

Another unrelated PS: by default some extensions on top of codemeta are enabled; if you don't want those, you can specify --strict. You may also want to specify --released to get an accurate codemeta:developmentStatus.

* Update codemetapy to v2.2.2+ in 'current release' workflow to have access to the `--no-extras` CLI API in v2.0+ and reproducible runs.
  - c.f. proycon/codemetapy#24
  - c.f. proycon/codemetapy#26
  - Amends PR #1995
* Use the codemetapy v2.0 API which requires `--inputtype python` to be added.
* Update codemeta.json to follow codemetapy v2.0+ general spec.


matthewfeickert mentioned this issue Sep 12, 2022

fix: Use codemetapy v2.2.2+ API scikit-hep/pyhf#2002

Merged

4 tasks

proycon self-assigned this Sep 12, 2022

proycon added the enhancement New feature or request label Sep 12, 2022

proycon moved this to Todo (backlog) in CLARIAH+ Shared Service: FAIR Tool Discovery Sep 12, 2022

proycon added this to CLARIAH+ Shared Service: FAIR Tool Discovery Sep 12, 2022

proycon closed this as completed Sep 12, 2022

Repository owner moved this from Todo (backlog) to Done in CLARIAH+ Shared Service: FAIR Tool Discovery Sep 12, 2022
