Support BM25 parameters customization #163
Comments
Hi @rayhsieh, see the relevant discussion, where @rolftimmermans correctly notes:
That said, there is no hard technical reason why this cannot be done. It would just need to be properly documented, including an explanation that these kinds of tweaks are normally discouraged. I tend to prefer keeping the API surface as small as it can be, but if there is a use case for this, it can be done.
Hi @lucaong,
That sounds like an interesting case. I personally would like to know more about how MiniSearch performs on non-Latin scripts, and I am very interested in hearing feedback on what could be added or changed to improve the experience in those cases. MiniSearch follows the principle of not including any language-specific utility (such as stemming or normalization), but making it easy to plug those in whenever necessary. I think I can soon prepare an update that makes BM25 parameters configurable, possibly initially as a beta feature.
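For readers unfamiliar with the parameters under discussion, here is a minimal sketch of the textbook Okapi BM25 score for a single term in a single document. It is not MiniSearch's implementation: the names `Bm25Params`, `idf`, and `bm25TermScore` are hypothetical, and the default values shown are common textbook choices, not necessarily the library's.

```typescript
// Textbook Okapi BM25 for one term in one document (illustrative only).
interface Bm25Params {
  k1: number; // term-frequency saturation: higher values let repeated terms count for more
  b: number;  // length normalization strength: 0 disables it, 1 applies it fully
}

// Common textbook defaults (an assumption, not MiniSearch's values).
const DEFAULT_PARAMS: Bm25Params = { k1: 1.2, b: 0.75 };

// Smoothed inverse document frequency; never negative.
function idf(totalDocs: number, docsWithTerm: number): number {
  return Math.log(1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
}

function bm25TermScore(
  termFreq: number,      // occurrences of the term in this document
  docLength: number,     // number of tokens in this document
  avgDocLength: number,  // average number of tokens per document
  totalDocs: number,     // number of documents in the index
  docsWithTerm: number,  // number of documents containing the term
  params: Bm25Params = DEFAULT_PARAMS
): number {
  const k1 = params.k1;
  const b = params.b;
  // Documents longer than average are penalized in proportion to b.
  const lengthNorm = 1 - b + b * (docLength / avgDocLength);
  // Term frequency saturates as it grows; k1 controls how quickly.
  const tfPart = (termFreq * (k1 + 1)) / (termFreq + k1 * lengthNorm);
  return idf(totalDocs, docsWithTerm) * tfPart;
}
```

Setting `b` to 0 makes the score independent of document length, and setting `k1` to 0 reduces it to the IDF alone, which is why changing either value can noticeably alter the ranking.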
Here are some of my notes that may help in documenting the parameters, if they're exposed. May need a bit of a rewrite :) This article is also helpful for understanding
@lucaong Thank you for planning this request. While working on the search function for my dataset, I found it very easy to add a language-specific tokenizer. I don't have any other recommendations at this point, since it already meets my needs.
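As an illustration of the kind of language-specific tokenizer mentioned above, here is a minimal sketch that splits text into overlapping character bigrams, a common fallback for scripts that do not separate words with whitespace. The helper name `bigramTokenize` is hypothetical and is not the tokenizer used in this thread; the assumption is that a function of this shape (text in, array of tokens out) could be plugged in through MiniSearch's `tokenize` option.

```typescript
// Hypothetical helper: overlapping character bigrams (illustrative only).
// Splits by UTF-16 code unit, which is adequate for characters in the
// Basic Multilingual Plane but would split surrogate pairs.
function bigramTokenize(text: string): string[] {
  // Drop all whitespace so bigrams never span a gap.
  const chars = text.replace(/\s+/g, "").split("");
  if (chars.length < 2) {
    // Zero or one character: return it unchanged as the only token, if any.
    return chars;
  }
  const tokens: string[] = [];
  for (let i = 0; i < chars.length - 1; i++) {
    tokens.push(chars[i] + chars[i + 1]);
  }
  return tokens;
}
```

Bigrams trade index size for recall: every adjacent character pair becomes searchable, so the same function should be applied to both indexed text and queries.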
@rayhsieh this feature is now released, along with other features, as part of
Would you consider supporting customization of the BM25 parameters? It would be very helpful for optimizing search relevance.