Add Mt/St/Ft normalizing #767
more accurate since it's smarter about where to replace periods
@@ -6,6 +6,7 @@ var sanitizeAll = require('../sanitizer/sanitizeAll'),
quattroshapes_deprecation: require('../sanitizer/_deprecate_quattroshapes'),
text: require('../sanitizer/_text'),
iso2_to_iso3: require('../sanitizer/_iso2_to_iso3'),
mount_saint_fort_standardizer: require('../sanitizer/_mount_saint_fort_standardizer'),
might we want to expand this in the future so that it's less specific and more of an "abbreviation_standardizer"? It feels a bit too specific of a name
const transliterated = periods_removed.replace(mountSaintFort, transliterate);

// 3. whitespace-normalize by replacing much whitespace with a space and trimming
// duplicate whitespace can be introduced when removing periods
I think there's a missing word in here. should it be "and trimming duplicate whitespace that can be introduced when removing periods" (or something similar)
const messages = { errors: [], warnings: [] };

// only try to transliterate if there is a city in parsed_text
if (!_.isEmpty(_.get(clean, 'parsed_text.city'))) {
woah `_.get` is kinda nice
WOF names spell out Mount and Fort, whereas Saint and Sainte are abbreviated to St and Ste, respectively. This PR adds a sanitizer that transliterates the city in the parsed text to match: Mt to Mount, Ft to Fort, Saint to St, and Sainte to Ste.
This will ensure that, for example, searching for `Saint Louis, MO` and `St. Louis` returns the same results.

Fixes pelias/schema#157
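
For readers skimming the thread, here is a rough sketch of the normalization described above. It is not the code from this PR: the names `replacements`, `standardizeCity`, and `sanitizeSketch` are made up for illustration, the regexes are one guess at "being smart about where to replace periods", and the guard uses plain property access rather than lodash so the snippet runs on its own.

```javascript
// Hypothetical sketch only; names and regexes are assumptions, not the PR's code.

// Lowercased input token -> spelling used in WOF names.
const replacements = {
  mt: 'Mount',
  ft: 'Fort',
  saint: 'St',
  sainte: 'Ste'
};

// Whole-word match (case-insensitive) with an optional trailing period,
// so "Mt." and "Mt" are both handled while "Saintfield" is left alone.
const mountSaintFort = /\b(mt|ft|sainte|saint)\b\.?/gi;

function standardizeCity(city) {
  return city
    // 1. transliterate Mt/Ft/Saint/Sainte, dropping an attached period
    .replace(mountSaintFort, (match, token) => replacements[token.toLowerCase()])
    // 2. drop the period after an already-abbreviated St/Ste
    .replace(/\b(st|ste)\./gi, '$1')
    // 3. collapse runs of whitespace into one space, then trim
    .replace(/\s+/g, ' ')
    .trim();
}

function sanitizeSketch(clean) {
  const messages = { errors: [], warnings: [] };

  // only touch the city when parsed_text has a non-empty one
  const city = clean.parsed_text && clean.parsed_text.city;
  if (typeof city === 'string' && city.length > 0) {
    clean.parsed_text.city = standardizeCity(city);
  }

  return messages;
}
```

Under these assumptions, `standardizeCity('Saint Louis')` and `standardizeCity('St. Louis')` both produce the same string, which is the behavior the description is after.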