add Cleanup for copying over physcial review article id as the page n… #7025

Merged
merged 11 commits into from
Oct 20, 2020

Conversation

tmrd993
Contributor

@tmrd993 tmrd993 commented Oct 18, 2020

Fixes #7019

Added a cleanup to copy over the article id as the page number for APS journals. This only happens if the page number doesn't exist already.
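A minimal sketch of the "only if the page number doesn't exist already" rule described above. It uses a plain `Map` as a hypothetical stand-in for a bibliography entry; the class and method names are assumptions for illustration and are not JabRef's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a bibliography entry; JabRef's real BibEntry is not used here.
final class PagesIfAbsentSketch {

    // Copies the article id into "pages" unless a value is already stored there.
    static void copyArticleIdToPages(Map<String, String> fields, String articleId) {
        // putIfAbsent leaves an existing "pages" value untouched
        fields.putIfAbsent("pages", articleId);
    }

    public static void main(String[] args) {
        Map<String, String> withoutPages = new HashMap<>();
        copyArticleIdToPages(withoutPages, "010001");
        System.out.println(withoutPages.get("pages"));

        Map<String, String> withPages = new HashMap<>();
        withPages.put("pages", "1--10");
        copyArticleIdToPages(withPages, "010001");
        System.out.println(withPages.get("pages"));
    }
}
```

The second call is a no-op, which is the behaviour the description asks for: an entry that already has page information keeps it.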

  • Change in CHANGELOG.md described (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

Member

@tobiasdiez tobiasdiez left a comment

Thanks a lot for your PR. Your idea seems to work nicely and the code looks good as well. I have only one remark about the location of the code, and a bit of fine-tuning. Please also add a test, and have a look at the fetcher tests that are currently failing (because the fetcher now also returns page information).

/**
* adds the article ID of a journal as the page count, but only if the page field is empty
*/
public class PageFieldCleanup implements CleanupJob {
Member

Since this is only used for the DOI fetcher, I would suggest adding this functionality as a private method in DoiFetcher instead of a new class.

if (doiAsString.isPresent() && !entry.hasField(StandardField.PAGES)) {
    String articleId = new String();
    int index = doiAsString.get().length() - 1;
    while (Character.isDigit(doiAsString.get().charAt(index))) {
Member

Since this issue only concerns articles from Physical Review, I would suggest using a regex based on the format outlined in #7019 (comment). In particular, make sure it only applies to Physical Review DOIs, and not to all DOIs ending in some number.

@tobiasdiez tobiasdiez added the status: changes required Pull requests that are not yet complete label Oct 18, 2020

if (doiAsString.isPresent() && !entry.hasField(StandardField.PAGES)) {
    String articleId = new String();
    int index = doiAsString.get().length() - 1;
Member

I would simply use substring with lastIndexOf('.') to get the part after the last dot.
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/lang/String.html#lastIndexOf(java.lang.String)
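As a rough sketch of that suggestion, assuming the DOI is available as a plain `String` (the class and method names below are hypothetical, not taken from the PR):

```java
import java.util.Optional;

// Hypothetical helper illustrating substring + lastIndexOf; not code from the PR.
final class LastDotSketch {

    // Returns the text after the last '.', or empty if there is no usable suffix.
    static Optional<String> afterLastDot(String doi) {
        int lastDot = doi.lastIndexOf('.');
        // -1 means there is no dot; a trailing dot leaves nothing to extract
        if (lastDot < 0 || lastDot == doi.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(doi.substring(lastDot + 1));
    }

    public static void main(String[] args) {
        System.out.println(afterLastDot("10.1103/PRXQuantum.1.010001").orElse("(none)"));
        System.out.println(afterLastDot("no-dots-here").orElse("(none)"));
    }
}
```

This avoids the manual index loop, but on its own it does not check that the DOI belongs to APS or that the suffix is really an article id.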

@@ -89,6 +90,7 @@ public String getName() {
    }

    private void doPostCleanup(BibEntry entry) {
        new PageFieldCleanup().cleanup(entry);
Member

It is not a good idea to put it here, because it would be called for every type of DOI, and APS DOIs are only a very specific subset.
I would suggest adding it as a CleanupPreset step:
https://github.com/JabRef/jabref/blob/1b35f8cb0040fdfb515974e78532598f07e11af2/src/main/java/org/jabref/logic/cleanup/CleanupPreset.java

and also adding it to the Cleanup dialog, maybe as "Move article id to pages (APS)"?

Member

I do think it's the right place: the DOI fetcher is not returning the right/full information, so we improve this here. Of course, you are right that the extraction should only be applied to DOIs from APS (see my comment above).

@tmrd993
Contributor Author

tmrd993 commented Oct 19, 2020

> Thanks a lot for your PR. Your idea seems to work nicely and the code looks good as well. I have only one remark about the location of the code, and a bit of fine-tuning. Please also add a test, and have a look at the fetcher tests that are currently failing (because the fetcher now also returns page information).

Thanks for the review @Siedlerchr @tobiasdiez
I think I am almost finished, but some of the fetcher tests still fail intermittently. I don't think this has anything to do with my code, though, as it's just a connection failure. Here is the exception:
Caused by: java.io.IOException: Server returned HTTP response code: 504 for URL: https://data.crossref.org/10.1109%2FICWS.2007.59

I am guessing it's an issue with the Crossref server.

if (!entry.getType().equals(StandardEntryType.Article)) {
    return false;
}
Pattern apsJournalSuffixPattern = Pattern.compile("([\\w]+\\.)([\\w]+\\.)([\\w]+)");
Member

I wonder whether it would make sense to check for the string "phys" as well?

It's better to extract the Pattern into a static constant, otherwise you won't get the advantages of a precompiled pattern.
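A small sketch of what extracting the pattern could look like. The regular expression is the one from the diff line under review; the class, constant, and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical class; only the regular expression is taken from the diff under review.
final class PrecompiledPatternSketch {

    // Compiled once when the class is initialized and reused by every call,
    // instead of calling Pattern.compile on each invocation.
    private static final Pattern APS_JOURNAL_SUFFIX =
            Pattern.compile("([\\w]+\\.)([\\w]+\\.)([\\w]+)");

    // True if the text contains a three-part, dot-separated suffix.
    static boolean hasThreePartSuffix(String text) {
        return APS_JOURNAL_SUFFIX.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasThreePartSuffix("PRXQuantum.1.010001"));
        System.out.println(hasThreePartSuffix("nodots"));
    }
}
```

`Pattern` instances are immutable and safe to share, so a `private static final` field is the usual home for them.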

Contributor Author

Can we be sure that every Physical Review journal contains that string? For example, this APS DOI doesn't contain the string "phys": https://doi.org/10.1103/PRXQuantum.1.010001
Here is a list of all journals:
https://journals.aps.org/browse

Member

I think the best option is to check for the prefix 10.1103/. It seems all DOIs from APS start with it.
Otherwise the regex is too broad: it would capture a lot of unrelated strings and create invalid data.
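To illustrate the prefix check, here is a minimal sketch. The names are hypothetical; the two DOIs are the ones that appear in this conversation (the APS example and the one from the failing Crossref request).

```java
// Hypothetical helper illustrating the prefix check; not code from the PR.
final class ApsPrefixSketch {

    // Registrant prefix that, per the comment above, all APS DOIs appear to share.
    private static final String APS_DOI_PREFIX = "10.1103/";

    static boolean isApsDoi(String doi) {
        return doi.startsWith(APS_DOI_PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(isApsDoi("10.1103/PRXQuantum.1.010001")); // APS DOI
        System.out.println(isApsDoi("10.1109/ICWS.2007.59"));        // different registrant
    }
}
```

The second DOI also ends in a dot-separated number, so a suffix-only regex would have accepted it. The prefix check rules it out.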

Contributor Author

In that case, a regex isn't really needed because in the string 10.1103, 1103 denotes the organization (APS in this case). That's really all that has to be checked if I am not mistaken.

Member

I would still use the regex, since it lets you test that the DOI has the right format, so that you really know that the last number is the page number. It's also a bit easier to understand than the manual parsing using isDigit.

Contributor Author

So we

  1. use the organization id to check whether the entry is from an APS journal
  2. use the regex to check whether the DOI has the right format (https://doi.org/10.1103/[journal].[volume].[articleID])
  3. set the page field if 1 and 2 are true
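The three steps above could be sketched roughly as follows. This is an illustration under assumptions, not the merged implementation: the entry is modelled as a plain `Map`, the DOI is assumed to be stored without the https://doi.org/ part, and all names except the `APS_SUFFIX` constant name (which appears in the diff) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; a Map stands in for JabRef's entry type.
final class ApsPagesSketch {

    private static final String APS_PREFIX = "10.1103/";
    // Assumed suffix format: [journal].[volume].[articleID]
    private static final Pattern APS_SUFFIX = Pattern.compile("(\\w+)\\.(\\w+)\\.(\\w+)");

    static Optional<String> articleId(String doi) {
        // Step 1: the organization id must be the APS one
        if (!doi.startsWith(APS_PREFIX)) {
            return Optional.empty();
        }
        // Step 2: the rest must match the expected three-part format exactly
        Matcher matcher = APS_SUFFIX.matcher(doi.substring(APS_PREFIX.length()));
        return matcher.matches() ? Optional.of(matcher.group(3)) : Optional.empty();
    }

    static void fillPages(Map<String, String> fields) {
        String doi = fields.get("doi");
        if (doi == null || fields.containsKey("pages")) {
            return; // nothing to derive from, or pages already present
        }
        // Step 3: set the page field only if steps 1 and 2 succeeded
        articleId(doi).ifPresent(id -> fields.put("pages", id));
    }

    public static void main(String[] args) {
        Map<String, String> aps = new HashMap<>();
        aps.put("doi", "10.1103/PRXQuantum.1.010001");
        fillPages(aps);
        System.out.println(aps.get("pages"));

        Map<String, String> other = new HashMap<>();
        other.put("doi", "10.1109/ICWS.2007.59");
        fillPages(other);
        System.out.println(other.containsKey("pages"));
    }
}
```

Using `matches()` rather than `find()` means the whole suffix has to fit the format, which addresses the concern that a loose regex would pick up unrelated DOIs.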

private boolean isAPSJournal(BibEntry entry, String doiAsString) {
    if (!entry.getType().equals(StandardEntryType.Article)) {
        return false;
    }
    Pattern apsSuffixPattern = Pattern.compile(APS_SUFFIX);
Member

Move that line to a private static final field as well, then it's good!

@Siedlerchr
Member

Thanks, looks good to me now! Don't forget to add a changelog entry for the new feature.

@Siedlerchr Siedlerchr added status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers and removed status: changes required Pull requests that are not yet complete labels Oct 20, 2020
@tobiasdiez
Member

Thanks! I hope you enjoyed the process, although it may have been a bit confusing at times. Sorry for that. Looking forward to your next PR.

And now MERGE 🚀

@tobiasdiez tobiasdiez merged commit a7b05d0 into JabRef:master Oct 20, 2020