-
Notifications
You must be signed in to change notification settings - Fork 451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RegexSet references cannot be passed across unwind boundaries #576
Comments
Note that this is an issue for Regex types as well, not just RegexSet. |
Did this use to work? (I don't think it did.) So are you reported this as a feature request? It's not immediately obvious to me that this should work. But it's probably okay for regex internals to assert that it is unwind safe since interior mutability is never used for correctness, only performance. |
I'm not sure if this was supported in the past; it was a bit of a sharp edge that bit me, as I'd assume normally that immutable objects would be unwind safe. If the interior mutability is only used for caching, perhaps the safest approach would be to drop the cache if a panic occurs inside the regex engine? |
This makes the thread_local (and by consequence, lazy_static) crates optional by providing a naive caching mechanism when perf-cache is disabled. This is achieved by defining a common API and implementing it via both approaches. The one tricky bit here is to ensure our naive version implements the same auto-traits as the fast version. Since we just use a plain mutex, it impls RefUnwindSafe, but thread_local does not. So we forcefully remove the RefUnwindSafe impl from our safe variant. We should be able to implement RefUnwindSafe in both cases, but this likely requires some mechanism for clearing the regex cache automatically if a panic occurs anywhere during search. But that's a more invasive change and is part of #576.
This makes the thread_local (and by consequence, lazy_static) crates optional by providing a naive caching mechanism when perf-cache is disabled. This is achieved by defining a common API and implementing it via both approaches. The one tricky bit here is to ensure our naive version implements the same auto-traits as the fast version. Since we just use a plain mutex, it impls RefUnwindSafe, but thread_local does not. So we forcefully remove the RefUnwindSafe impl from our safe variant. We should be able to implement RefUnwindSafe in both cases, but this likely requires some mechanism for clearing the regex cache automatically if a panic occurs anywhere during search. But that's a more invasive change and is part of #576.
This makes the thread_local (and by consequence, lazy_static) crates optional by providing a naive caching mechanism when perf-cache is disabled. This is achieved by defining a common API and implementing it via both approaches. The one tricky bit here is to ensure our naive version implements the same auto-traits as the fast version. Since we just use a plain mutex, it impls RefUnwindSafe, but thread_local does not. So we forcefully remove the RefUnwindSafe impl from our safe variant. We should be able to implement RefUnwindSafe in both cases, but this likely requires some mechanism for clearing the regex cache automatically if a panic occurs anywhere during search. But that's a more invasive change and is part of #576.
Just wanted to chime in that when using libfuzz or afl.rs this is also an issue because you need std::panic::RefUnwindSafe implemented for objects within the closure. If you're trying to fuzz something that involves regex matching, you just wont be able to (at least from my rust newbie perspective). |
The situation may have changed here because of changes upstream in thread_local. The statement in the (I noticed this because the build is broken due to a warning. CachedThreadLocal is deprecated and now just passes through to ThreadLocal. That relates to this issue because the fix (just use ThreadLocal) should probably also adjust the comment, but just replacing "CachedThreadLocal" with "ThreadLocal" in the comment doesn't seem to make it right.) |
@dpathakj Yeah, I'm working on addressing this. I have a newborn at home though, so it's tough. Basically, my plan here is to actually remove the The main problem with removing the dependency is the reason why |
This commit removes the thread_local dependency (even as an optional dependency) and replaces it with a more purpose driven memory pool. The comments in src/pool.rs explain this in more detail, but the short story is that thread_local seems to be at the root of some memory leaks happening in certain usage scenarios. The great thing about thread_local though is how fast it is. Using a simple Mutex<Vec<T>> is easily at least twice as slow. We work around that a bit by coding a simplistic fast path for the "owner" of a pool. This does require one new use of `unsafe`, of which we extensively document. This now makes the 'perf-cache' feature a no-op. We of course retain it for compatibility purposes (and perhaps it will be used again in the future), but for now, we always use the same pool. As for benchmarks, it is likely that *some* cases will get a hair slower. But there shouldn't be any dramatic difference. A careful review of micro-benchmarks in addition to more holistic (albeit ad hoc) benchmarks via ripgrep seems to confirm this. Now that we have more explicit control over the memory pool, we also clean stuff up with repsect to RefUnwindSafe. Fixes #362, Fixes #576 Ref BurntSushi/rure-go#3
This commit removes the thread_local dependency (even as an optional dependency) and replaces it with a more purpose driven memory pool. The comments in src/pool.rs explain this in more detail, but the short story is that thread_local seems to be at the root of some memory leaks happening in certain usage scenarios. The great thing about thread_local though is how fast it is. Using a simple Mutex<Vec<T>> is easily at least twice as slow. We work around that a bit by coding a simplistic fast path for the "owner" of a pool. This does require one new use of `unsafe`, of which we extensively document. This now makes the 'perf-cache' feature a no-op. We of course retain it for compatibility purposes (and perhaps it will be used again in the future), but for now, we always use the same pool. As for benchmarks, it is likely that *some* cases will get a hair slower. But there shouldn't be any dramatic difference. A careful review of micro-benchmarks in addition to more holistic (albeit ad hoc) benchmarks via ripgrep seems to confirm this. Now that we have more explicit control over the memory pool, we also clean stuff up with repsect to RefUnwindSafe. Fixes #362, Fixes #576 Ref BurntSushi/rure-go#3
This should be fixed in |
The following minimal test case fails on regex 1.1.6:
with the following error output:
The text was updated successfully, but these errors were encountered: