Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On identifier not found, detect swapped words #66968

Closed
estebank opened this issue Dec 2, 2019 · 6 comments · Fixed by #67849
Closed

On identifier not found, detect swapped words #66968

estebank opened this issue Dec 2, 2019 · 6 comments · Fixed by #67849
Assignees
Labels
A-diagnostics Area: Messages for errors, warnings, and lints A-lints Area: Lints (warnings about flaws in source code) such as unused_mut. A-suggestion-diagnostics Area: Suggestions generated by the compiler applied by `cargo fix` C-enhancement Category: An issue proposing an enhancement or a PR with one. D-papercut Diagnostics: An error or lint that needs small tweaks. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@estebank
Copy link
Contributor

estebank commented Dec 2, 2019

On identifiers that have more than one word, it is relatively common to write them in the wrong order (foo_bar_bazfoo_baz_bar). These are normally not found by Levenshtein distance checks, but we could do a basic "split on _, sort and join before comparison" so that we could suggest the right identifier.

This issue has been assigned to @cjkenn via this comment.

@estebank estebank added A-lints Area: Lints (warnings about flaws in source code) such as unused_mut. A-diagnostics Area: Messages for errors, warnings, and lints P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-suggestion-diagnostics Area: Suggestions generated by the compiler applied by `cargo fix` labels Dec 2, 2019
@Centril Centril added the D-papercut Diagnostics: An error or lint that needs small tweaks. label Dec 3, 2019
@JohnTitor JohnTitor added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Dec 8, 2019
@LeSeulArtichaut
Copy link
Contributor

LeSeulArtichaut commented Dec 19, 2019

I'd like to try to work on this.

As foo_bar_baz is short enough to be caught by the Levenstein distance, I wrote this example for testing:

fn main() {
    let long_variable_name = true;
    println!("{}", long_name_variable);
}

@rustbot claim

@rustbot rustbot assigned rustbot and unassigned rustbot Dec 19, 2019
@LeSeulArtichaut
Copy link
Contributor

@estebank I'm thinking about the priority of the different possible matches. I tend to think that a match found by levenstein distance should have a greater priority, but what do you think?

@estebank
Copy link
Contributor Author

@LeSeulArtichaut it's tricky. I feel that your instinct might be right but we'd only know when looking at real world usage. The nice thing is that the lev distance check by its very nature it is limited, so doing the swapped words check as a fallback should work out ok in practice.

@estebank estebank changed the title On identifier not found, detect swaped words On identifier not found, detect swapped words Dec 20, 2019
@LeSeulArtichaut
Copy link
Contributor

Releasing my assignment, as I am not sure to be able to commit.
In case someone want to take on this, here is the function that should be modified.

pub fn find_best_match_for_name<'a, T>(iter_names: T,
lookup: &str,
dist: Option<usize>) -> Option<Symbol>
where T: Iterator<Item = &'a Symbol> {
let max_dist = dist.map_or_else(|| cmp::max(lookup.len(), 3) / 3, |d| d);
let (case_insensitive_match, levenstein_match) = iter_names
.filter_map(|&name| {
let dist = lev_distance(lookup, &name.as_str());
if dist <= max_dist {
Some((name, dist))
} else {
None
}
})
// Here we are collecting the next structure:
// (case_insensitive_match, (levenstein_match, levenstein_distance))
.fold((None, None), |result, (candidate, dist)| {
(
if candidate.as_str().to_uppercase() == lookup.to_uppercase() {
Some(candidate)
} else {
result.0
},
match result.1 {
None => Some((candidate, dist)),
Some((c, d)) => Some(if dist < d { (candidate, dist) } else { (c, d) })
}
)
});
if let Some(candidate) = case_insensitive_match {
Some(candidate) // exact case insensitive match has a higher priority
} else {
levenstein_match.map(|(candidate, _)| candidate)
}
}

@rustbot release-assignment

@rustbot rustbot removed their assignment Dec 22, 2019
@LeSeulArtichaut
Copy link
Contributor

(I think this is appropriate, please tell me if it isn't)
@rustbot modify labels to +E-Easy

@rustbot rustbot added the E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. label Dec 23, 2019
@cjkenn
Copy link
Contributor

cjkenn commented Jan 3, 2020

I can take a stab at this, seems reasonable with previous comments.

@rustbot claim

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-diagnostics Area: Messages for errors, warnings, and lints A-lints Area: Lints (warnings about flaws in source code) such as unused_mut. A-suggestion-diagnostics Area: Suggestions generated by the compiler applied by `cargo fix` C-enhancement Category: An issue proposing an enhancement or a PR with one. D-papercut Diagnostics: An error or lint that needs small tweaks. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants