Carriage return handling on Windows #416
Comments
This doesn't seem right. Could you please provide a minimal reproducible example? e.g., This works fine for me:
I think GNU grep handles this by stripping The semantics of the regex engine aren't going to change, sorry. |
Hm, you're right, I'm having trouble reproducing the Regarding Why not just let the anchor match when a character other than I'm guessing that this behavior is due to the current behavior of rust-lang/regex ? If so, I can open an issue there, which seems like a more appropriate place (even though you're the primary author/contributor on that project as well). |
The issue already exists: rust-lang/regex#244
If you know how to implement this efficiently in a DFA, then please teach me. :-)
Just to be clear, I think the state of affairs is, indeed, unfortunate. It's just hard to fix. I can't say that it will never get fixed, but I wouldn't expect something any time soon. In the meantime, there are non-ideal workarounds. e.g.,
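A minimal sketch of one workaround of this kind, assuming the pattern is edited to tolerate an optional `\r` before the end-of-line anchor. It uses Python's `re` module purely for illustration, not ripgrep's regex engine, and the sample text is made up. In multi-line mode Python's `$` also matches only just before `\n`, so it reproduces the same symptom on CRLF input.

```python
import re

# Sample text with Windows (CRLF) line endings.
text = "foo\r\nbar foo\r\n"

# With re.MULTILINE, `$` matches only just before "\n" (or at the very end),
# so the "\r" sitting between "foo" and "\n" prevents a match.
plain = re.findall(r"foo$", text, flags=re.MULTILINE)

# Workaround: allow an optional "\r" before the end-of-line anchor.
tolerant = re.findall(r"foo\r?$", text, flags=re.MULTILINE)

print(plain)
print(tolerant)
```

The cost of this approach is that the `\r` becomes part of the match, which matters if the matched text is printed or replaced.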
I was thinking about that--how opposed would you be to introducing an option (possibly via config-file) to automatically replace |
ripgrep doesn't have a config file. I'd rather folks work around it for now. I'm not strongly against your heuristic, but we'd need to be careful that it doesn't introduce any additional weirdness.
Makes sense. Thanks for considering this.
Ah, I've figured out my issue with word-boundaries--for some reason I was expecting /shame |
@BatmanAoD No worries, word boundaries are hard. See also #389.
I'm looking into modifying the regex crate to handle Would you consider this a bug in |
Somewhat related: I'm not sure I understand exactly what the "reverse" matching logic is for or how it works. Is it just what's described in rust-lang/regex#190? Does the engine construct a DFA designed (guaranteed?) to find the same matches when applied backwards that the "normal" DFA would find when applied forwards? If so, then with regard to this particular issue, it might actually make my proposed fix infeasible (or at least more complicated than I'd like), unless I'm misunderstanding something. When running the DFA forward, Tangentially: I haven't taken even a cursory look at the code to answer this for myself, but how is ripgrep processing input before feeding it to the regex crate? I've previously assumed that the reason |
It sounds like you are understanding why this is hard to fix in the regex engine. I'd prefer if we discussed the regex engine on the regex repo.
Please motivate this with use cases and examples. Reasoning about this in the abstract will probably never convince me that anything needs to be changed. (This also seems orthogonal to |
@BatmanAoD I created #441
@BatmanAoD I think I see how this might be fixed for most use cases, at least in ripgrep. It won't hit everything, but:
I think I might be able to roll this into libripgrep. In particular, I am also going to try and take a crack at #389, and the solutions have a similarish feel.
Note, of course, that in order to be entirely correct, |
@BatmanAoD In line oriented mode, a Getting this right in multiline mode (again, will be a new thing part of libripgrep, it's already implemented) is trickier since matching |
Is libripgrep being developed in a separate repository? Perhaps I could try implementing that solution?
@BatmanAoD That's probably not a good idea. As far as my open source work goes, I basically don't like collaborating when code is in a nascent state. There is just waaaaay too much context in my brain. It would take days to unload, and everything could switch at a moment's notice as I'm developing. I think this blog packages my thoughts in a neat little bow: http://habitatchronicles.com/2004/04/you-cant-tell-people-anything/ With that said, I do occasionally push my progress to the ag/libripgrep branch: https://github.com/BurntSushi/ripgrep/tree/ag/libripgrep --- Beware though, I sometimes force push!
And I have made sure to incorporate CRLF stripping as well: ripgrep/grep-regex/src/strip.rs Lines 8 to 10 in 4d6fba4
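One way to picture CRLF handling in a line-oriented search is to trim a single trailing `\r` from each line before the pattern is applied, so that `$` sees the logical end of the line. The sketch below is only an illustration in Python and is not the code in `strip.rs`; `search_lines` is a hypothetical helper and the sample input is made up.

```python
import re


def search_lines(pattern, data):
    # Hypothetical helper: yields (line_number, line) for each matching line.
    # `data` is split on LF, then one trailing CR is trimmed from each line,
    # so a CRLF-terminated line is matched as if it ended at its last
    # visible character.
    regex = re.compile(pattern)
    for number, line in enumerate(data.split(b"\n"), start=1):
        if line.endswith(b"\r"):
            line = line[:-1]
        if regex.search(line):
            yield number, line


matches = list(search_lines(rb"foo$", b"foo\r\nbar\r\nbaz foo\r\n"))
print(matches)
```

Trimming on the input side keeps the pattern untouched, but it only works when the search is line by line; it does not help when a single match may span several lines.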
I believe I understand!
This commit updates the CHANGELOG to reflect all the work done to make libripgrep a reality.

* Closes #162 (libripgrep)
* Closes #176 (multiline search)
* Closes #188 (opt-in PCRE2 support)
* Closes #244 (JSON output)
* Closes #416 (Windows CRLF support)
* Closes #917 (trim prefix whitespace)
* Closes #993 (add --null-data flag)
* Closes #997 (--passthru works with --replace)
* Fixes #2 (memory maps and context handling work)
* Fixes #200 (ripgrep stops when pipe is closed)
* Fixes #389 (more intuitive `-w/--word-regexp`)
* Fixes #643 (detection of stdin on Windows is better)
* Fixes #441, Fixes #690, Fixes #980 (empty matching lines are weird)
* Fixes #764 (coalesce color escapes)
* Fixes #922 (memory maps failing is no big deal)
* Fixes #937 (color escapes no longer used for empty matches)
* Fixes #940 (--passthru does not impact exit status)
* Fixes #1013 (show runtime CPU features in --version output)
Being a Windows user with ripgrep 11.0.2 on board, I’m a bit confused about this CRLF thing.
I cannot reproduce on Linux:
OK, so I'm assuming that
It's quite likely here that
You can see the
Because
Adding the
Confirming with
Notice that there is only one |
@BurntSushi, I guess, Linux
It will have to wait until I have a chance to debug this on Windows. Or else someone else should feel free to debug it.
Also not sure why you're reporting bugs on closed tickets. This makes it impossible to track. I created #1500 for you. Please just do that next time.
On Windows, I noticed two surprising things about carriage returns (`\r`):

* `\b` anchor to match). This means that `rg -w foo` fails to find `foo` at the end of a line in a file with Windows line-endings.
* `$`, which makes using the anchor more difficult.

I realize that this could be by design, since RipGrep may be using some standard definition of `\b` that explicitly does not include `\r`. That seems unintuitive, though, and I don't think I've seen that behavior in other regex engines running on Windows (e.g. in Python on Windows).