add Cleanup for copying over physical review article id as the page number #7025

tmrd993 · 2020-10-18T13:36:00Z

Added a cleanup to copy over the article id as the page number for APS journals. This only happens if the page number doesn't exist already.

Change in CHANGELOG.md described (if applicable)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (for UI changes)
Checked documentation: Is the information available and up to date? If not, created an issue at /~https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

tobiasdiez

Thanks a lot for your PR. Your idea seems to work nicely and the code looks good as well. I only have one remark about the location of the code, plus a bit of fine-tuning. Please also add a test, and have a look at the fetcher tests that are currently failing (because the fetcher now also returns page information).

tobiasdiez · 2020-10-18T14:25:34Z

src/main/java/org/jabref/logic/cleanup/PageFieldCleanup.java

+/**
+ * adds the article ID of a journal as the page count, but only if the page field is empty
+ */
+public class PageFieldCleanup implements CleanupJob {


Since this is only used by the DOI fetcher, I would suggest adding this functionality as a private method in DoiFetcher instead of a new class.

tobiasdiez · 2020-10-18T14:27:23Z

src/main/java/org/jabref/logic/cleanup/PageFieldCleanup.java

+        if (doiAsString.isPresent() && !entry.hasField(StandardField.PAGES)) {
+            String articleId = new String();
+            int index = doiAsString.get().length() - 1;
+            while (Character.isDigit(doiAsString.get().charAt(index))) {


Since this issue only concerns articles from Physical Review, I would suggest using a regex based on the format outlined in #7019 (comment). In particular, make sure it only applies to DOIs from Physical Review, and not to every DOI that ends in a number.

Siedlerchr · 2020-10-18T14:22:53Z

src/main/java/org/jabref/logic/cleanup/PageFieldCleanup.java

+
+        if (doiAsString.isPresent() && !entry.hasField(StandardField.PAGES)) {
+            String articleId = new String();
+            int index = doiAsString.get().length() - 1;


I would simply use substring together with lastIndexOf(".") to get the part after the last dot.
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/lang/String.html#lastIndexOf(java.lang.String)
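A minimal sketch of the substring/lastIndexOf suggestion, assuming the DOI is stored in its bare form (e.g. 10.1103/PRXQuantum.1.010001, the APS DOI cited later in this thread). The class and method names are hypothetical. It searches only the suffix after the '/' so that the dot inside the "10.1103" prefix is never picked up.

```java
import java.util.Optional;

public class ArticleIdFromDoi {

    /**
     * Returns the text after the last dot of the DOI suffix (the part after '/'),
     * e.g. "010001" for "10.1103/PRXQuantum.1.010001".
     */
    static Optional<String> lastSegment(String doi) {
        int slash = doi.indexOf('/');
        if (slash < 0) {
            return Optional.empty(); // not a prefix/suffix DOI
        }
        String suffix = doi.substring(slash + 1);
        int lastDot = suffix.lastIndexOf('.');
        // No dot in the suffix, or a trailing dot: nothing usable after it.
        if (lastDot < 0 || lastDot == suffix.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(suffix.substring(lastDot + 1));
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("10.1103/PRXQuantum.1.010001").orElse("<none>"));
    }
}
```

Note that this extraction on its own does not check the publisher: for the IEEE DOI 10.1109/ICWS.2007.59 it would return "59", which is exactly why the extraction must be restricted to APS DOIs.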

Siedlerchr · 2020-10-18T14:31:45Z

src/main/java/org/jabref/logic/importer/fetcher/DoiFetcher.java

@@ -89,6 +90,7 @@ public String getName() {
    }

    private void doPostCleanup(BibEntry entry) {
+        new PageFieldCleanup().cleanup(entry);


It is not a good idea to put it here, because it would be called for every type of DOI, while APS DOIs are only a very specific subset.
I would suggest adding it as a CleanupPreset step:
/~https://github.com/JabRef/jabref/blob/1b35f8cb0040fdfb515974e78532598f07e11af2/src/main/java/org/jabref/logic/cleanup/CleanupPreset.java

and also adding it to the cleanup dialog, perhaps as "Move article id to pages (APS)"?

I do think it's the right place: the DOI fetcher is not returning the right/full information, so we improve it here. Of course, you are right that the extraction should only be applied to DOIs from APS (see my comment above).

tmrd993 · 2020-10-19T08:06:58Z

> Thanks a lot for your PR. Your idea seems to work nicely and the code looks good as well. I only have one remark about the location of the code, plus a bit of fine-tuning. Please also add a test, and have a look at the fetcher tests that are currently failing (because the fetcher now also returns page information).

Thanks for the review @Siedlerchr @tobiasdiez
I think I am almost finished, but some of the fetcher tests still fail intermittently. I don't think this is related to my code, since it's just a connection failure. Here is the exception:
Caused by: java.io.IOException: Server returned HTTP response code: 504 for URL: https://data.crossref.org/10.1109%2FICWS.2007.59

I am guessing it's an issue on the Crossref side.

Siedlerchr · 2020-10-19T09:31:43Z

src/main/java/org/jabref/logic/importer/fetcher/DoiFetcher.java

+        if (!entry.getType().equals(StandardEntryType.Article)) {
+            return false;
+        }
+        Pattern apsJournalSuffixPattern = Pattern.compile("([\\w]+\\.)([\\w]+\\.)([\\w]+)");


I wonder whether it would make sense to check for the string "phys" as well.

It's better to extract the Pattern into a static field; otherwise you lose the benefit of compiling it only once.
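A minimal sketch of what extracting the Pattern buys, assuming the suffix regex from the diff above. `Pattern.compile` does the parsing work, so a `private static final Pattern` is compiled once when the class is loaded, and each call then only creates a cheap `Matcher`. Pattern instances are immutable and safe to share between threads. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class ApsSuffixMatcher {

    // Compiled once at class load and reused by every call.
    private static final Pattern APS_SUFFIX_PATTERN =
            Pattern.compile("([\\w]+\\.)([\\w]+\\.)([\\w]+)");

    // For comparison only: recompiles the same regex on every invocation.
    static boolean matchesRecompiling(String suffix) {
        return Pattern.compile("([\\w]+\\.)([\\w]+\\.)([\\w]+)").matcher(suffix).matches();
    }

    // Preferred: reuses the precompiled pattern; only a new Matcher is created per call.
    static boolean matches(String suffix) {
        return APS_SUFFIX_PATTERN.matcher(suffix).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("PRXQuantum.1.010001"));
    }
}
```

Both methods return the same results; the difference is only how often the regex is compiled.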

Can we be sure that every Physical Review journal contains that string? For example, this APS DOI doesn't contain the string "phys": https://doi.org/10.1103/PRXQuantum.1.010001
Here is a list of all journals
https://journals.aps.org/browse

I think the best approach is to check for the prefix 10.1103/, since all DOIs from APS seem to start with it.
Otherwise, the regex is too broad: it would match many unrelated DOIs and create invalid data.
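A small sketch of why the prefix check matters, assuming the suffix pattern from the diff is applied with `matches()` to the part of the DOI after the '/'. The IEEE DOI from the failing fetcher test quoted above, 10.1109/ICWS.2007.59, also consists of three dot-separated parts, so the format check alone accepts it; only the 10.1103/ prefix check rejects it. Class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class SuffixPatternBreadth {

    // Same three-part pattern as in the PR diff.
    private static final Pattern THREE_PART_SUFFIX =
            Pattern.compile("([\\w]+\\.)([\\w]+\\.)([\\w]+)");

    // Part after the first '/'; the whole string if there is no '/'.
    static String suffix(String doi) {
        return doi.substring(doi.indexOf('/') + 1);
    }

    // Format-only check: says nothing about who registered the DOI.
    static boolean looksLikeApsByFormatOnly(String doi) {
        return THREE_PART_SUFFIX.matcher(suffix(doi)).matches();
    }

    // Registrant check: APS DOIs share the 10.1103 prefix.
    static boolean hasApsPrefix(String doi) {
        return doi.startsWith("10.1103/");
    }

    public static void main(String[] args) {
        String aps = "10.1103/PRXQuantum.1.010001";
        String ieee = "10.1109/ICWS.2007.59";
        System.out.println(looksLikeApsByFormatOnly(aps));  // true
        System.out.println(looksLikeApsByFormatOnly(ieee)); // true: a false positive
        System.out.println(hasApsPrefix(ieee));             // false
    }
}
```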

In that case, a regex isn't really needed because in the string 10.1103, 1103 denotes the organization (APS in this case). That's really all that has to be checked if I am not mistaken.

I would still use the regex, since it lets you verify that the DOI has the right format, so you really know that the last number is the page number. It's also a bit easier to understand than manual parsing with isDigit.

So we:

1. use the organization ID to check whether the entry is from an APS journal,
2. use the regex to check that the DOI has the right format (https://doi.org/10.1103/[journal].[volume].[articleID]),
3. set the page field if 1 and 2 are true.
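The three steps above could be sketched roughly as follows. This is not the merged DoiFetcher code: to stay self-contained it uses a `Map<String, String>` as a hypothetical stand-in for a BibEntry's fields, assumes the DOI is stored in bare form (without the https://doi.org/ resolver), and reuses the suffix regex from the diff, whose third group is the article ID. The entry-type check from the diff is omitted for brevity. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ApsPageFromDoi {

    // Step 1: APS DOIs appear to share this registrant prefix.
    private static final String APS_DOI_PREFIX = "10.1103/";

    // Step 2: [journal].[volume].[articleID]; group 3 captures the article ID.
    private static final Pattern APS_SUFFIX_PATTERN =
            Pattern.compile("([\\w]+\\.)([\\w]+\\.)([\\w]+)");

    /** Returns the article ID if the DOI is an APS DOI in the expected format. */
    static Optional<String> apsArticleId(String doi) {
        if (!doi.startsWith(APS_DOI_PREFIX)) {
            return Optional.empty();
        }
        Matcher matcher = APS_SUFFIX_PATTERN.matcher(doi.substring(APS_DOI_PREFIX.length()));
        return matcher.matches() ? Optional.of(matcher.group(3)) : Optional.empty();
    }

    /** Step 3: sets "pages" from the article ID, only if the entry has no "pages" field yet. */
    static void fillPagesFromDoi(Map<String, String> fields) {
        String doi = fields.get("doi");
        if (doi == null || fields.containsKey("pages")) {
            return;
        }
        apsArticleId(doi).ifPresent(articleId -> fields.put("pages", articleId));
    }

    public static void main(String[] args) {
        Map<String, String> entry = new HashMap<>();
        entry.put("doi", "10.1103/PRXQuantum.1.010001");
        fillPagesFromDoi(entry);
        System.out.println(entry.get("pages")); // 010001
    }
}
```

Checking the prefix first keeps the regex from ever seeing non-APS DOIs, and checking for an existing "pages" field means data already returned by the fetcher is never overwritten.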

Siedlerchr · 2020-10-20T09:13:03Z

src/main/java/org/jabref/logic/importer/fetcher/DoiFetcher.java

    private boolean isAPSJournal(BibEntry entry, String doiAsString) {
        if (!entry.getType().equals(StandardEntryType.Article)) {
            return false;
        }
+        Pattern apsSuffixPattern = Pattern.compile(APS_SUFFIX);


Move that line to a private static final field as well, then it's good!

Siedlerchr · 2020-10-20T09:53:31Z

Thanks, looks good to me now! Don't forget to add a changelog entry for the new feature.

tobiasdiez · 2020-10-20T10:54:55Z

Thanks! I hope you enjoyed the process, although it may have been a bit confusing at times. Sorry for that. Looking forward to your next PR.

And now MERGE 🚀

add Cleanup for copying over physcial review article id as the page number

c9b5284

tobiasdiez requested changes Oct 18, 2020

tobiasdiez added the status: changes required label Oct 18, 2020

Siedlerchr requested changes Oct 18, 2020

tmrd993 added 2 commits October 19, 2020 10:27

remove PageFieldCleanup, add private methods to DoiFetcher

3856f25

remove comment from DoiFetcherTest

342cacb

Siedlerchr reviewed Oct 19, 2020

tmrd993 added 4 commits October 19, 2020 16:02

replace regex with substring and string comparison

e69989a

fix checkstyle issues

1779e42

fix checkstyle issues

3704269

add regex to check aps doi format

1853df7

Siedlerchr reviewed Oct 20, 2020

move suffix pattern to private static field

983962f

Siedlerchr approved these changes Oct 20, 2020

add changelog entry

a1fe707

Siedlerchr added the status: ready-for-review label and removed the status: changes required label Oct 20, 2020

tmrd993 added 2 commits October 20, 2020 12:15

add issue link to changelog entry

593c86d

Merge branch 'master' into add-cleanup-for-replacing-page-number-with-article-id

a1fdf3e

tobiasdiez merged commit a7b05d0 into JabRef:master Oct 20, 2020
