-
-
Notifications
You must be signed in to change notification settings - Fork 904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(cruby): reuse XPathContext objects #3378
Conversation
@tenderlove if you get some time to review this, I'd really appreciate it. You can just focus on the final commit (everything else is nonfunctional refactoring). ❤️ |
f92c678
to
334ad08
Compare
b760c62
to
1729fd5
Compare
**What problem is this PR intended to solve?** Some cleanup and refactoring that was originally in #3378 but is generally valuable even if that PR doesn't land, so I split it out.
1729fd5
to
beecfa4
Compare
Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize). This PR makes the following changes: - the `XPathContext` object ... - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable - tracks namespaces and variables registered through those two methods - has a new `#reset` method that deregisters all namespaces and variables registered - has a new `#node=` method to set the current node being searched - all the `Searchable` methods ... - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context` - and that context object is `#reset` when the block returns There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution. Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file: ``` $ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: normal 3.790k i/100ms Calculating ------------------------------------- large: normal 37.556k (± 1.7%) i/s (26.63 μs/i) - 189.500k in 5.047390s ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: normal 11.726k i/100ms Calculating ------------------------------------- small: normal 113.719k (± 2.5%) i/s (8.79 μs/i) - 574.574k in 5.055648s $ ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: optimized 4.609k i/100ms Calculating ------------------------------------- large: optimized 48.107k (± 1.6%) i/s (20.79 μs/i) - 244.277k in 5.079041s Comparison: large: optimized: 48107.3 i/s large: normal: 37555.7 i/s - 1.28x slower ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: optimized 32.014k i/100ms Calculating ------------------------------------- small: optimized 319.760k (± 0.6%) i/s (3.13 μs/i) - 1.601M in 5.006140s Comparison: small: optimized: 319759.6 i/s small: normal: 113719.0 i/s - 2.81x slower ``` I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile. The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes.
beecfa4
to
969bea9
Compare
Going to merge this so I can ship a 1.18 release candidate today. But if you have any feedback, the final release is still a couple of weeks away. |
Note that if you use a single XPath context and support custom XPath extension functions, a custom function could evaluate XPath expressions recursively which will lead to corruption of context variables. This is mostly due to some design mistakes in libxml2. So you should backup and restore the following members of
Other settings that could change like namespaces should be backed and restored up as well. Or you could simply disallow recursive evaluation in some way. |
@nwellnhof thank you for the advice! |
@nwellnhof Would it be OK to set |
If we're re-using the xmlXPathContext object, there's a chance that the context variables will be trashed by recursive custom functions. In #3378 (comment), Nick advised: > Note that if you use a single XPath context and support custom XPath > extension functions, a custom function could evaluate XPath > expressions recursively which will lead to corruption of context > variables. This is mostly due to some design mistakes in libxml2. So let's set these context variables back to their default.
See #3386 |
That's not enough. Consider:
I'd suggest to set both values to 1 once after creating an XPath context, but it doesn't matter much outside of XSLT. |
@nwellnhof Oh! That makes sense, I misunderstood your original message. It hadn't even occurred to me that people would want to do that. Alternatively, I could create a new xpath context object if there's a recursive call being made. |
Yes, it seems unlikely that Nokogiri users would want to do that. But in libxslt, this happens all the time. I think the underlying issue is that context creation is too slow in libxml2 because the XPath functions are registered each time. If this was fixed, it shouldn't be too costly to recreate a context for each evaluation. |
@nwellnhof Yes, in my benchmark, |
@nwellnhof If it's interesting, I implemented a branch in #3388 where I copy an initialized |
I reworked XPath function lookup here: https://gitlab.gnome.org/GNOME/libxml2/-/commit/bf5fcf6e646bb51a0f6a3655a1d64bea97274867
|
@nwellnhof Thanks for that! Here's how the above benchmark looks for small ~6kb files:
or, dropping the "re-use the object" comparison:
where:
That's a really good win! |
See extended discussion at #3378 Benchmark comparing this commit against v1.17.x ("main"): Comparison: large: main: 3910.6 i/s large: patched: 3759.6 i/s - same-ish: difference falls within error Comparison: small: patched: 242901.7 i/s small: main: 127486.0 i/s - 1.91x slower I think we could get greater performance gains by re-using XPathContext objects, but only at the cost of a significant amount of additional complexity, since in order to properly support recursive XPath evaluation, Nokogiri would have to push and pop "stack frames" containing: - internal state contextSize and proximityPosition - registered namespaces - registered variables - function lookup handler That feels like a lot of code for a small win. Comparatively, pulling in this upstream patch is still a 2x speedup for zero additional complexity.
See extended discussion at #3378 Benchmark comparing this commit against v1.17.x ("main"): Comparison: large: main: 3910.6 i/s large: patched: 3759.6 i/s - same-ish: difference falls within error Comparison: small: patched: 242901.7 i/s small: main: 127486.0 i/s - 1.91x slower I think we could get greater performance gains by re-using XPathContext objects, but only at the cost of a significant amount of additional complexity, since in order to properly support recursive XPath evaluation, Nokogiri would have to push and pop "stack frames" containing: - internal state contextSize and proximityPosition - registered namespaces - registered variables - function lookup handler That feels like a lot of code for a small win. Comparatively, pulling in this upstream patch is still a 2x speedup for zero additional complexity.
See extended discussion at #3378 Benchmark comparing this commit against v1.17.x ("main"): Comparison: large: main: 3910.6 i/s large: patched: 3759.6 i/s - same-ish: difference falls within error Comparison: small: patched: 242901.7 i/s small: main: 127486.0 i/s - 1.91x slower I think we could get greater performance gains by re-using XPathContext objects, but only at the cost of a significant amount of additional complexity, since in order to properly support recursive XPath evaluation, Nokogiri would have to push and pop "stack frames" containing: - internal state contextSize and proximityPosition - registered namespaces - registered variables - function lookup handler That feels like a lot of code for a small win. Comparatively, pulling in this upstream patch is still a 2x speedup for zero additional complexity.
Maybe I'm missing something, but looking at commit cef057f, I can't see where namespaces (or variables) are unregistered from the global context after evaluation. On another note, if you really want to speed up XPath processing, you should think about supporting compiled expressions. |
**What problem is this PR intended to solve?** After chatting with @nwellnhof in #3378, he went ahead and made an performance-focused change upstream in libxml2/gnome@bf5fcf6e that I'm pulling in here to the vendored library. Benchmark comparing this PR against v1.17.x ("main"): Comparison: large: main: 3910.6 i/s large: patched: 3759.6 i/s - same-ish: difference falls within error Comparison: small: patched: 242901.7 i/s small: main: 127486.0 i/s - 1.91x slower We could get greater performance gains by re-using `XPathContext` objects, but only at the cost of a significant amount of additional complexity, since in order to properly support recursive XPath evaluation, Nokogiri would have to push and pop "stack frames" containing: - internal state contextSize and proximityPosition - registered namespaces - registered variables - function lookup handler That feels like a lot of code for a small win. Comparatively, pulling in this upstream patch is still a ~2x speedup with zero additional complexity. That's a win. **Have you included adequate test coverage?** N/A **Does this change affect the behavior of either the C or the Java implementations?** Performance improvement impacting CRuby only.
@nwellnhof I reverted most of this PR in #3389, which I just merged. So what's on But if it's interesting, the ruby method |
@nwellnhof re: compiled expressions, I did implement it in #3380 but I was not able to benchmark as large of a performance difference as I expected. Any advice you'd have on when to compile expressions might be helpful for me, both to benchmark and to help users understand when to do it. Similarly: I have a branch that enables the XPath cache, and it did show some pretty good speedup on some blunt benchmarks, but again I don't know enough about how it works to give people advice on when to enable it. |
It should be most noticeable with expressions that evaluate quickly like
The cache speeds up allocation of XPath objects. In general, it's a good idea to enable it. Another useful optimization is |
See #3403 which is a placeholder issue to circle back on Nick's suggested performance improvements. |
What problem is this PR intended to solve?
Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize).
This PR makes the following changes:
XPathContext
object ...nil
as the value arg to#register_ns
and#register_variable
to deregister the namespace or variable#reset
method that deregisters all namespaces and variables registered#node=
method to set the current node being searchedSearchable
methods ...#get_xpath_context
#reset
when the block returnsThere is an escape hatch that I will leave undocumented, which is to set the env var
NOKOGIRI_DEOPTIMIZE_XPATH
, out of an abundance of caution.Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file:
What alternatives were considered?
I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling
xmlXPathRegisterAllFunctions
to re-register them took us back to the original performance profile.The win here is to avoid re-calling
xmlXPathRegisterAllFunctions
, and for that we track registered variables and namespaces, and de-register them after the query eval completes.Have you included adequate test coverage?
Yes.
Does this change affect the behavior of either the C or the Java implementations?
CRuby only.