ugrep doesn't match a disjunction when a wildcard is present in one component, but not the other in some circumstances #451
Comments
Thank you for your feedback. This is strange and should never happen, given our extensive unit and regression test set. Looking at the regex pattern DFA internals using RE/flex tools, there is something odd with the graph for the regex pattern However, it is a problem that happens under very specific circumstances. If we add spaces to the disjunctions like so I'll get to the bottom of this and fix it asap. PS. Edited for clarity.
OK, found the problem in the internal DFA merging of "string-like" regex patterns (e.g.
Looks like I goofed up. For case-insensitive patterns, in some rare special cases when regex patterns are combined with "string-like" patterns that overlap at or from the start with the regex pattern, we may encounter this problem. It's right here in ugrep/lib/patterns.cpp in the
*** 1756,1773 ****
{
// combine the tree DFA transitions with the regex DFA transition moves
Chars chars;
- for (DFA::State::Edges::iterator t = state->tnode->edges.begin(); t != state->tnode->edges.end(); ++t)
- chars.add(t->first);
if (opt_.i)
{
! for (DFA::State::Edges::iterator t = state->tnode->edges.find('a'); t != state->tnode->edges.end(); ++t)
{
Char c = t->first;
! if (c > 'z')
! break;
! chars.add(uppercase(c));
}
}
Moves::iterator i = moves.begin();
Positions pos;
while (i != moves.end())
The loop starts with
The sanitized patch is:
--- 1759,1779 ----
{
// combine the tree DFA transitions with the regex DFA transition moves
Chars chars;
if (opt_.i)
{
! for (DFA::State::Edges::iterator t = state->tnode->edges.begin(); t != state->tnode->edges.end(); ++t)
{
Char c = t->first;
! chars.add(c);
! if (c >= 'a' && c <= 'z')
! chars.add(uppercase(c));
}
}
+ else
+ {
+ for (DFA::State::Edges::iterator t = state->tnode->edges.begin(); t != state->tnode->edges.end(); ++t)
+ chars.add(t->first);
+ }
Moves::iterator i = moves.begin();
Positions pos;
while (i != moves.end())
***************
A lot of automated and also manual testing goes into verifying ugrep. We actually have unit tests to test this combination (and more) in RE/flex (the ugrep regex engine), but these unit tests use sub-patterns that start with
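The difference between the two versions of the loop can be sketched in isolation. This is a minimal illustration, not the ugrep code: it assumes the edge container behaves like `std::map` keyed by character, and `Edges`, `collect_old`, and `collect_fixed` are hypothetical names, with `std::toupper` standing in for `uppercase()`. Under that assumption, `find('a')` returns `end()` unless `'a'` itself is a key, so the uppercase pass in the old version is skipped for most inputs.

```cpp
#include <cctype>
#include <map>
#include <set>

// Hypothetical stand-in for the tree DFA edges: a map keyed by character.
using Edges = std::map<char, int>;

// Uppercase helper, standing in for uppercase() in the patch.
static char upper(char c)
{
  return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Shape of the old code: add all edge chars, then start the uppercase pass at find('a').
std::set<char> collect_old(const Edges& edges)
{
  std::set<char> chars;
  for (Edges::const_iterator t = edges.begin(); t != edges.end(); ++t)
    chars.insert(t->first);
  // With std::map semantics, find('a') is end() when 'a' is not a key,
  // so this loop body never runs and no uppercase variants are added.
  for (Edges::const_iterator t = edges.find('a'); t != edges.end(); ++t)
  {
    char c = t->first;
    if (c > 'z')
      break;
    chars.insert(upper(c));
  }
  return chars;
}

// Shape of the patched code: visit every edge and add the uppercase variant of a-z.
std::set<char> collect_fixed(const Edges& edges)
{
  std::set<char> chars;
  for (Edges::const_iterator t = edges.begin(); t != edges.end(); ++t)
  {
    char c = t->first;
    chars.insert(c);
    if (c >= 'a' && c <= 'z')
      chars.insert(upper(c));
  }
  return chars;
}
```

With edges on `'q'` and `'w'` only, the old shape yields just the lowercase characters, while the patched shape also yields `'Q'` and `'W'`. When `'a'` happens to be a key, both shapes agree, which may be why the problem only shows up in specific cases.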
I've committed patch 5be38e8 and will release an update soon after our battery of tests has completed.
minor update for #451 and year 2025
Great, thanks Doctor
echo "The Quick Brown Fox asdf\nasdf TQBF asdf\nasdf TWBF" | ug -i 'the.quick.brown|TWBF|TQBF'
It seems to happen when the alternatives share the same first character and -i is used.
echo "The Quick Brown Fox asdf\nasdf TQBF asdf\nasdf TWBF" | ug -i 'the.quick.brown|WBF|QBF'
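As a cross-check of the expected behavior, the same alternations can be run through a different regex engine. The sketch below uses `std::regex` with the `icase` flag rather than ugrep's RE/flex engine, and `matching_lines` is a hypothetical helper name. It only shows which lines a case-insensitive search should select, not how ugrep builds its DFA.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the input lines that contain a case-insensitive match of `pattern`.
// std::regex is used here as an independent reference, not as ugrep's engine.
std::vector<std::string> matching_lines(const std::vector<std::string>& lines,
                                        const std::string& pattern)
{
  const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
  std::vector<std::string> hits;
  for (const std::string& line : lines)
  {
    if (std::regex_search(line, re))
      hits.push_back(line);
  }
  return hits;
}
```

Applied to the three lines from the commands above, both `the.quick.brown|TWBF|TQBF` and `the.quick.brown|WBF|QBF` should select every line, so any line missing from the `ug -i` output points at the engine and not at the pattern.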