-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dep5 Duplicate field "Copyright": Fails to parse many files with REUSE v2.1.0 Release #803
Comments
We are using the FSFE Docker container on our CI, maybe it is a rebuild issue of the container? |
I cannot reproduce this using my employer's GitLab instance in which also fsfe/reuse:latest is used. I made sure it's using the latest REUSE version. This is the slightly redacted ---
reuse:
image:
name: fsfe/reuse:latest
entrypoint: [""]
script:
- reuse lint --json |
We have a theory: could this be do to our mechanism to rm -rf the PO folder and reuse changed behavior to now error when uncommitted file removals exist in the repository? |
OK, now I tried to reproduce it locally and all files are unreadable:
So it seems to be that on specific environments, there occur read errors that didn't happen in 1.1.2. I cannot tell why because Git can read them perfectly fine and I can also The Docker base images don't seem to have an effect. I tried both the latest Alpine and Debian image. |
Parts of that callstack:
The dep5 file also defines copyright information, but seems to be scoped properly https://invent.kde.org/system/partitionmanager/-/blob/master/.reuse/dep5 Added the raw log linked from the top post to ensure it doesn't get lost with retention times: kde-partitionmanager-job-1064900.log |
@cordlandwehr I'm a little short on time, but here's a response that will hopefully help you just enough to get everything working again. There is a parsing error for the DEP5 file, linked here. Copying a section from it:
DEP5 doesn't allow multiple tags (or 'fields') in a section (or 'paragraph'). So it should be something like:
Or analogous. In any case, indentation instead of multiple tags. That doesn't entirely solve this issue, though. There are a few unanswered questions:
I hope that helps for now. |
Met the same error on Rizin repository: /~https://github.com/rizinorg/rizin/blob/dev/.reuse/dep5
|
Debian documentation mentions both methods, but isn't really clear on it:
In this way I checked previous versions: reuse So I tried: reuse Turns out python-debian went strict by default in version I don't feel like we gain much by enforcing strictness to a file we are already not happy with. It would force reuse-tool users to go through the effort of changing their copyright file twice. How about we call |
It's not 100% clear, but it's not referring to the tags in DEP5. It's referring to the copyright lines in comment headers.
Fine by me. |
Since version 0.1.47 of python-debian the dep5 parsing is strict. This is causing issues for certain users that have multiple Copyright statements in their Dep5 file. Disablng strictness removes the issue for those users. Closes #803 Signed-off-by: Nico Rikken <nico.rikken@fsfe.org>
Tests to ensure that all Copyright information from the .reuse/dep5 file is taken into account. It contains 2 testcases, for both strictly and non-strictly formatted dep5 files. The latter being present in the wild, based on comments in #803 Signed-off-by: Nico Rikken <nico.rikken@fsfe.org>
Turns out that in non-strict mode the latter copyright entries are simply ignored. So yes indentation is needed to ensure that all copyright holders become part of the report output. Testing with python-debian 0.1.44 gives me the last entry as outcome and version 0.1.48 gives me the first entry. Knowing this I'm more in favor of throwing an error. Perhaps we can improve the message and communication to be clear on how to resolve the issue. Rather than using the built-in copyright attribute, we could consider parsing the file information ourselves, as the library is able to hold all statements: >>> with dep5.open(encoding="utf-8") as fp:
... Copyright(fp, strict=False).find_files_paragraph("doc/index.rst").dump()
...
'Files: doc/*\nCopyright: 2017 Jane Doe\nCopyright: 2018 John Doe\nLicense: CC0-1.0\n' It just returns one when the
Our code: Line 239 in a86a96a
Still, we are using the official Debian library to parse this file. If that library isn't setup up to deal with multiple Copyright, why should we be more relaxed at the risk of ambiguity? |
Thanks a lot for the fast reply and the dep5 syntax fix. I think the main problem is the REUSE spec users learn from the "default" statement way that multiple Copyright lines are OK but this difference was missed to be checked with the linter. For KDE repos I think it is absolute fine that we fix this (now that that we know the syntax problem). But for the tool it is interesting if error-out is rather a behavior change than a fix. Maybe first using a warning and only later changing this to an error is the better approach? |
As a small update, this is still a high-priority issue, but many of the solutions we've discussed among ourselves as maintainers have a lot of caveats. It is still being worked on, but needs more attention than anticipated. Whatever the solution, we want to improve the error/warning messaging here. |
Just FYI on the KDE side everything is fine now after we fixed the syntax. But I agree that a better warning in these situations is important. |
Multiple Copyright fields per paragraph break the reuse tool. See this issue for details: fsfe/reuse-tool#803 (comment)
Hi, in the KDE repos, our CI just picked up the new REUSE release and apparently this introduces a severe regression. Eg. in this job we suddenly have all files not being able to be read: https://invent.kde.org/system/partitionmanager/-/jobs/1064900
Any ideas what might going wrong here?
The text was updated successfully, but these errors were encountered: