Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate max_seq_length in SquadProcessor #2740

Merged
merged 4 commits into from
Jul 4, 2022

Conversation

francescocastelli
Copy link
Contributor

@francescocastelli francescocastelli commented Jun 28, 2022

Related Issue(s):
#2319

Proposed changes:

  • What we need to validate max_seq_len is processor.tokenizer.model_max_length. I've put the assert in the SquadProcessor class because in FARMReader we would have to place it in multiple different methods. Let me know if I'm missing something!

  • Should we validate the max length also on other processors? for instance TextSimilarityProcessor uses two tokenizers and has both max_seq_len_query and max_seq_len_passage.

Pre-flight checklist

@julian-risch
Copy link
Member

Hi @francescocastelli thank you for making this pull request! We will probably need a few days until we can have a look at it and give feedback. Stay tuned! 🙂

Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @francescocastelli! I have a small style comment on this PR. I see you've been using the .format() method of strings, while we generally prefer to use f-strings, mostly for readability and consistency. Could you modify the code to use f-strings instead of .format()? Thanks 🙂

In addition, it would be great if you could add a small test for the new asserts. Reach out if you need help testing, unfortunately our test suite is not the most intuitive to deal with 😄

@ZanSara ZanSara added the type:feature New feature or request label Jul 1, 2022
@francescocastelli
Copy link
Contributor Author

@ZanSara Sure! I will fix everything next week :)

@francescocastelli
Copy link
Contributor Author

@ZanSara done! I've also added a test for invalid parameters at the construction of a FARMReader

Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice! I will merge it soon. Thank you for the contribution! 😊

@ZanSara ZanSara merged commit 31dcd55 into deepset-ai:master Jul 4, 2022
Krak91 pushed a commit to Krak91/haystack that referenced this pull request Jul 26, 2022
* added max_len_seq validation in SquadProcessor

* fixed string formatting

* added tests for invalid max_seq_len

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic:reader type:feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants