Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dep5 Duplicate field "Copyright": Fails to parse many files with REUSE v2.1.0 Release #803

Closed
cordlandwehr opened this issue Jul 18, 2023 · 13 comments · Fixed by #841
Closed
Labels
bug Something isn't working documentation Missing or wrong documentation regression Regression, something worked before, but not any more upstream It's someone else's problem

Comments

@cordlandwehr
Copy link

Hi, in the KDE repos, our CI just picked up the new REUSE release and apparently this introduces a severe regression. Eg. in this job we suddenly have all files not being able to be read: https://invent.kde.org/system/partitionmanager/-/jobs/1064900
Any ideas what might going wrong here?

@cordlandwehr
Copy link
Author

We are using the FSFE Docker container on our CI, maybe it is a rebuild issue of the container?

@mxmehl
Copy link
Member

mxmehl commented Jul 18, 2023

I cannot reproduce this using my employer's GitLab instance in which also fsfe/reuse:latest is used. I made sure it's using the latest REUSE version.

This is the slightly redacted .gitlab-ci.yml:

---
reuse:
  image:
    name: fsfe/reuse:latest
    entrypoint: [""]
  script:
    - reuse lint --json

@cordlandwehr
Copy link
Author

We have a theory: could this be do to our mechanism to rm -rf the PO folder and reuse changed behavior to now error when uncommitted file removals exist in the repository?

@mxmehl
Copy link
Member

mxmehl commented Jul 18, 2023

OK, now I tried to reproduce it locally and all files are unreadable: docker run --rm -it -v /tmp/partitionmanager:/data fsfe/reuse:latest:

# SUMMARY

* Bad licenses: 0
* Deprecated licenses: 0
* Licenses without file extension: 0
* Missing licenses: 0
* Unused licenses: GPL-3.0-or-later, CC-BY-4.0, MIT, LGPL-3.0-or-later, GFDL-1.2-or-later, CC0-1.0
* Used licenses: 0
* Read errors: 386
* files with copyright information: 0 / 0
* files with license information: 0 / 0

So it seems to be that on specific environments, there occur read errors that didn't happen in 1.1.2. I cannot tell why because Git can read them perfectly fine and I can also cat them. I'm not sure which code changes could cause this, but perhaps @linozen or @carmenbianca do.

The Docker base images don't seem to have an effect. I tried both the latest Alpine and Debian image.

@nicorikken
Copy link
Member

Parts of that callstack:

The dep5 file also defines copyright information, but seems to be scoped properly https://invent.kde.org/system/partitionmanager/-/blob/master/.reuse/dep5

Added the raw log linked from the top post to ensure it doesn't get lost with retention times: kde-partitionmanager-job-1064900.log

@carmenbianca
Copy link
Member

carmenbianca commented Jul 19, 2023

@cordlandwehr I'm a little short on time, but here's a response that will hopefully help you just enough to get everything working again.

There is a parsing error for the DEP5 file, linked here. Copying a section from it:

# .desktop file
FILES: src/org.kde.partitionmanager.desktop
License: GPL-3.0-or-later
Copyright: 2008 Volker Lanz <vl@fidra.de>
Copyright: 2017 Andrius Štikonas <andrius@stikonas.eu>
Copyright: 2020 KDE translators

DEP5 doesn't allow multiple tags (or 'fields') in a section (or 'paragraph'). So it should be something like:

# .desktop file
FILES: src/org.kde.partitionmanager.desktop
License: GPL-3.0-or-later
Copyright: 2008 Volker Lanz <vl@fidra.de>
    2017 Andrius Štikonas <andrius@stikonas.eu>
    2020 KDE translators

Or analogous. In any case, indentation instead of multiple tags.


That doesn't entirely solve this issue, though. There are a few unanswered questions:

  • Why did this problem only start happening in the new release? -> My only explanation is that the upstream dependency python-debian changed something.
  • Why does the tool not explain/handle the error better? -> I think there's an issue open about this by @nicorikken, but I don't have the time to find it. It concerns python-debian, multi-processing, and debian-inspector.
  • Why does a parsing error in the DEP5 file cause everything to fail? -> Not sure, to be researched, but low-priority. The output of the lint with a broken DEP5 file is invalid in any case.
  • Can this class of errors be prevented by better documentation? -> To be researched.

I hope that helps for now.

@carmenbianca carmenbianca added bug Something isn't working upstream It's someone else's problem regression Regression, something worked before, but not any more documentation Missing or wrong documentation labels Jul 19, 2023
@XVilka
Copy link

XVilka commented Jul 19, 2023

Met the same error on Rizin repository: /~https://github.com/rizinorg/rizin/blob/dev/.reuse/dep5

reuse.report - ERROR - Unexpected error occurred while parsing 'travis-script'
ValueError: Duplicate field "Copyright" in paragraph number [4](/~https://github.com/rizinorg/rizin/actions/runs/5597235070/jobs/10235410696?pr=3671#step:4:5)6
reuse.report - ERROR - Unexpected error occurred while parsing '.clang-format'
ValueError: Duplicate field "Copyright" in paragraph number 46
reuse.report - ERROR - Unexpected error occurred while parsing 'snapcraft.yaml'
ValueError: Duplicate field "Copyright" in paragraph number 4[6](/~https://github.com/rizinorg/rizin/actions/runs/5597235070/jobs/10235410696?pr=3671#step:4:7)
reuse.report - ERROR - Unexpected error occurred while parsing 'CODEOWNERS'
ValueError: Duplicate field "Copyright" in paragraph number 46
reuse.report - ERROR - Unexpected error occurred while parsing 'Doxyfile'
ValueError: Duplicate field "Copyright" in paragraph number 46
reuse.report - ERROR - Unexpected error occurred while parsing '.appveyor.yml'
ValueError: Duplicate field "Copyright" in paragraph number 46
...

@nicorikken
Copy link
Member

Debian documentation mentions both methods, but isn't really clear on it:

In this way I checked previous versions:

reuse v2.1.0 with python-debian 0.1.49 error occurs ⚠
reuse v2.0.0 with python-debian 0.1.49 error occurs ⚠
reuse v1.1.2 with python-debian 0.1.44 no error ✅

So I tried:

reuse v2.1.0 with python-debian downgraded to 0.1.44 no error occurs ✅

Turns out python-debian went strict by default in version 0.1.47 reading the changelog: https://salsa.debian.org/python-debian-team/python-debian/-/blob/1ef7243a2b6214e52b68ca606a5407ee42a60bd7/debian/changelog#L33

I don't feel like we gain much by enforcing strictness to a file we are already not happy with. It would force reuse-tool users to go through the effort of changing their copyright file twice.

How about we call python-debian with strict=False?

@carmenbianca
Copy link
Member

Example with multiple Copyright attributes https://dep-team.pages.debian.net/deps/dep5/#copyright-field

It's not 100% clear, but it's not referring to the tags in DEP5. It's referring to the copyright lines in comment headers.

How about we call python-debian with strict=False?

Fine by me.

nicorikken added a commit that referenced this issue Jul 19, 2023
Since version 0.1.47 of python-debian the dep5 parsing is strict. This is
causing issues for certain users that have multiple Copyright statements in
their Dep5 file. Disablng strictness removes the issue for those users.

Closes #803

Signed-off-by: Nico Rikken <nico.rikken@fsfe.org>
nicorikken added a commit that referenced this issue Jul 20, 2023
Tests to ensure that all Copyright information from the .reuse/dep5 file is taken
into account. It contains 2 testcases, for both strictly and non-strictly
formatted dep5 files. The latter being present in the wild, based on comments in #803

Signed-off-by: Nico Rikken <nico.rikken@fsfe.org>
@nicorikken
Copy link
Member

nicorikken commented Jul 20, 2023

Turns out that in non-strict mode the latter copyright entries are simply ignored. So yes indentation is needed to ensure that all copyright holders become part of the report output. Testing with python-debian 0.1.44 gives me the last entry as outcome and version 0.1.48 gives me the first entry.
I pushed the test to PR #805 but this fails.
Did we ever properly handle multiple Copyright fields? Seems like we were not, and that we are right to error out.

Knowing this I'm more in favor of throwing an error. Perhaps we can improve the message and communication to be clear on how to resolve the issue.

Rather than using the built-in copyright attribute, we could consider parsing the file information ourselves, as the library is able to hold all statements:

>>> with dep5.open(encoding="utf-8") as fp:
...     Copyright(fp, strict=False).find_files_paragraph("doc/index.rst").dump()
... 
'Files: doc/*\nCopyright: 2017 Jane Doe\nCopyright: 2018 John Doe\nLicense: CC0-1.0\n'

It just returns one when the .copyright attribute is used.

>>> with dep5.open(encoding="utf-8") as fp:
...     Copyright(fp, strict=False).find_files_paragraph("doc/index.rst").copyright
... 
'2017 Jane Doe'

Our code:

map(str.strip, result.copyright.splitlines()) # type: ignore

Still, we are using the official Debian library to parse this file. If that library isn't setup up to deal with multiple Copyright, why should we be more relaxed at the risk of ambiguity?

@nicorikken nicorikken pinned this issue Jul 20, 2023
@nicorikken nicorikken changed the title Fails to parse many files with REUSE v2.1.0 Release Dep5 Duplicate field "Copyright": Fails to parse many files with REUSE v2.1.0 Release Jul 20, 2023
@cordlandwehr
Copy link
Author

Thanks a lot for the fast reply and the dep5 syntax fix. I think the main problem is the REUSE spec users learn from the "default" statement way that multiple Copyright lines are OK but this difference was missed to be checked with the linter. For KDE repos I think it is absolute fine that we fix this (now that that we know the syntax problem). But for the tool it is interesting if error-out is rather a behavior change than a fix. Maybe first using a warning and only later changing this to an error is the better approach?

@carmenbianca
Copy link
Member

As a small update, this is still a high-priority issue, but many of the solutions we've discussed among ourselves as maintainers have a lot of caveats. It is still being worked on, but needs more attention than anticipated.

Whatever the solution, we want to improve the error/warning messaging here.

@cordlandwehr
Copy link
Author

Just FYI on the KDE side everything is fine now after we fixed the syntax. But I agree that a better warning in these situations is important.

reposynch bot pushed a commit to hexagon-geo-surv/supereight2 that referenced this issue Nov 13, 2023
Multiple Copyright fields per paragraph break the reuse tool. See this
issue for details:

fsfe/reuse-tool#803 (comment)
@carmenbianca carmenbianca unpinned this issue Nov 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working documentation Missing or wrong documentation regression Regression, something worked before, but not any more upstream It's someone else's problem
Projects
None yet
5 participants