Go ReadFrom performance issues on large rule files #209
The |
Similar rule, but not quite as long, so I could compile it with … You can recreate the situation by copying the Python script posted in a comment on #207 and modifying … |
Try this:
And paste the logs you get here. They should look like:
|
It appears as though my …
RUST_LOG=info ./target/release/yr compile big_rule.yar -o big_rule.yarc
[2024-09-25T21:20:23Z INFO yara_x::compiler] WASM module build time: 206.755527271s
[2024-09-25T21:20:27Z INFO yara_x::compiler::rules] Aho-Corasick automaton build time: 3.89521057s
[2024-09-25T21:20:27Z INFO yara_x::compiler::rules] Number of rules: 1
[2024-09-25T21:20:27Z INFO yara_x::compiler::rules] Number of patterns: 120000
[2024-09-25T21:20:27Z INFO yara_x::compiler::rules] Number of anchored sub-patterns: 0
[2024-09-25T21:20:27Z INFO yara_x::compiler::rules] Number of atoms: 2400000
[2024-09-25T21:20:27Z INFO yara_x::compiler::rules] Atoms with len = 0: 0
[2024-09-25T21:20:27Z INFO yara_x::compiler::rules] Atoms with len = 1: 0
[2024-09-25T21:20:27Z INFO yara_x::compiler::rules] Atoms with len = 2: 0
[2024-09-25T21:20:27Z INFO yara_x::compiler::rules] Atoms with len = 3: 0
[2024-09-25T21:20:27Z INFO yara_x::compiler::rules] Atoms with len = 4: 2400000
[2024-09-25T21:20:27Z INFO yara_x::compiler::rules] Atoms with len > 4: 0
RUST_LOG=info ./target/release/yr scan -C big_rule.yarc README.md
[2024-09-25T21:25:31Z INFO yara_x::compiler::rules] Deserialization time: 255.158134944s
[2024-09-25T21:25:35Z INFO yara_x::compiler::rules] Aho-Corasick automaton build time: 3.992856189s
[2024-09-25T21:25:35Z INFO yara_x::compiler::rules] Number of rules: 1
[2024-09-25T21:25:35Z INFO yara_x::compiler::rules] Number of patterns: 120000
[2024-09-25T21:25:35Z INFO yara_x::compiler::rules] Number of anchored sub-patterns: 0
[2024-09-25T21:25:35Z INFO yara_x::compiler::rules] Number of atoms: 2400000
[2024-09-25T21:25:35Z INFO yara_x::compiler::rules] Atoms with len = 0: 0
[2024-09-25T21:25:35Z INFO yara_x::compiler::rules] Atoms with len = 1: 0
[2024-09-25T21:25:35Z INFO yara_x::compiler::rules] Atoms with len = 2: 0
[2024-09-25T21:25:35Z INFO yara_x::compiler::rules] Atoms with len = 3: 0
[2024-09-25T21:25:35Z INFO yara_x::compiler::rules] Atoms with len = 4: 2400000
[2024-09-25T21:25:35Z INFO yara_x::compiler::rules] Atoms with len > 4: 0
[2024-09-25T21:25:35Z INFO yara_x::scanner::context] Started rule evaluation: default:big_rule
[2024-09-25T21:25:35Z INFO yara_x::scanner::context] Scan time: 359.784µs
[2024-09-25T21:25:35Z INFO yara_x::scanner::context] Atom matches: 260 |
That's strange. Can you provide details about your operating system and CPU type? |
I'm using WSL (Ubuntu 22.04 LTS). I allocated 12 GB of RAM and 4 cores of my i7-9750H to WSL. |
Can you try the same on Windows without WSL? |
I also tried it on a system running Arch Linux natively with 8 GB of RAM and an i5-4670K (I know it's old) and had similar results to my WSL machine:
RUST_LOG=info ./target/release/yr compile big_rule.yar -o big_rule.yarc
[2024-09-25T22:23:40Z INFO yara_x::compiler] WASM module build time: 199.419351874s
[2024-09-25T22:23:43Z INFO yara_x::compiler::rules] Aho-Corasick automaton build time: 2.951699981s
[2024-09-25T22:23:43Z INFO yara_x::compiler::rules] Number of rules: 1
[2024-09-25T22:23:43Z INFO yara_x::compiler::rules] Number of patterns: 120000
[2024-09-25T22:23:43Z INFO yara_x::compiler::rules] Number of anchored sub-patterns: 0
[2024-09-25T22:23:43Z INFO yara_x::compiler::rules] Number of atoms: 2400000
[2024-09-25T22:23:43Z INFO yara_x::compiler::rules] Atoms with len = 0: 0
[2024-09-25T22:23:43Z INFO yara_x::compiler::rules] Atoms with len = 1: 0
[2024-09-25T22:23:43Z INFO yara_x::compiler::rules] Atoms with len = 2: 0
[2024-09-25T22:23:43Z INFO yara_x::compiler::rules] Atoms with len = 3: 0
[2024-09-25T22:23:43Z INFO yara_x::compiler::rules] Atoms with len = 4: 2400000
[2024-09-25T22:23:43Z INFO yara_x::compiler::rules] Atoms with len > 4: 0
RUST_LOG=info ./target/release/yr scan -C big_rule.yarc README.md
[2024-09-25T22:27:42Z INFO yara_x::compiler::rules] Deserialization time: 199.372617541s
[2024-09-25T22:27:45Z INFO yara_x::compiler::rules] Aho-Corasick automaton build time: 2.950987995s
[2024-09-25T22:27:45Z INFO yara_x::compiler::rules] Number of rules: 1
[2024-09-25T22:27:45Z INFO yara_x::compiler::rules] Number of patterns: 120000
[2024-09-25T22:27:45Z INFO yara_x::compiler::rules] Number of anchored sub-patterns: 0
[2024-09-25T22:27:45Z INFO yara_x::compiler::rules] Number of atoms: 2400000
[2024-09-25T22:27:45Z INFO yara_x::compiler::rules] Atoms with len = 0: 0
[2024-09-25T22:27:45Z INFO yara_x::compiler::rules] Atoms with len = 1: 0
[2024-09-25T22:27:45Z INFO yara_x::compiler::rules] Atoms with len = 2: 0
[2024-09-25T22:27:45Z INFO yara_x::compiler::rules] Atoms with len = 3: 0
[2024-09-25T22:27:45Z INFO yara_x::compiler::rules] Atoms with len = 4: 2400000
[2024-09-25T22:27:45Z INFO yara_x::compiler::rules] Atoms with len > 4: 0
[2024-09-25T22:27:45Z INFO yara_x::scanner::context] Started rule evaluation: default:big_rule
[2024-09-25T22:27:45Z INFO yara_x::scanner::context] Scan time: 233.65µs
[2024-09-25T22:27:45Z INFO yara_x::scanner::context] Atom matches: 248 |
Something strange is going on here. Can you tell me more about your setup? Like … Also, if you have another machine you can test on, that would be helpful too. |
I've upgraded from rustc 1.80 to 1.81 and there's a huge performance drop; my numbers are now similar to yours:
I'll try to figure out why. |
This is the flamegraph when compiled with rustc 1.80:
[Flamegraph: rustc 1.80]
And this one with rustc 1.81:
[Flamegraph: rustc 1.81]
There's a huge performance difference in the merge_bundle function. The |
There were some changes to the sorting algorithms in the Rust standard library between versions 1.80 and 1.81. They were also mentioned in the release notes: https://releases.rs/docs/1.81.0/#libraries
At least in this case, the changes affected performance negatively. |
I built YARA-X with Rust 1.80 and I can confirm that my scan times have drastically decreased. Should I edit this issue to focus on the compile and scan times between …?
Also, the Rust 1.80 scan times are still much longer than with YARA v4:
YARA-X
hyperfine "./target/release/yr scan -C big_rule.yarc README.md"
Benchmark 1: ./target/release/yr scan -C big_rule.yarc README.md
Time (mean ± σ): 28.954 s ± 2.663 s [User: 28.794 s, System: 0.208 s]
Range (min … max): 25.671 s … 32.781 s 10 runs
YARA
hyperfine "yara --stack-size=1048576 -C big_rule.yarc README.md"
Benchmark 1: yara --stack-size=1048576 -C big_rule.yarc README.md
Time (mean ± σ): 705.0 ms ± 36.7 ms [User: 552.0 ms, System: 151.8 ms]
Range (min … max): 672.6 ms … 799.7 ms 10 runs
Should I make a separate issue for that? |
I will open an issue in the wasmtime project about this regression between Rust 1.80 and 1.81 and see what they say. I tried with the latest version of wasmtime and the issue is still there.

Regarding the deserialization time still being higher in YARA-X, there's one thing you can do to decrease it. You can enable the …

In YARA-X, the condition of each rule is compiled into WASM code, which in turn is converted into native code for the current platform. By default, when you serialize a set of rules, only the WASM code is stored in the resulting file. When you deserialize the rules, YARA-X converts the WASM code into native code for the current platform, and that's the step that takes most of the deserialization time. By enabling the …

Still, deserialization may be slower in YARA-X; the aim of YARA-X is to be faster at scanning. Also notice that the WASM-to-native code conversion is very slow in this case due to the nature of your rule. With this large number of patterns, the condition for your rule gets translated into a single huge function that takes a long time to compile and optimize. You would greatly benefit from splitting your huge rule into multiple smaller ones: when you have multiple rules, YARA-X can compile them in parallel and reduce the compilation time. |
I saw a large decrease in scan times when building with …
RUST_LOG=info ./target/release/yr scan -C big_rule.yarc README.md
[2024-09-27T13:09:05Z INFO yara_x::compiler::rules] Deserialization time: 325.41223ms
[2024-09-27T13:09:08Z INFO yara_x::compiler::rules] Aho-Corasick automaton build time: 3.605753886s
[2024-09-27T13:09:08Z INFO yara_x::compiler::rules] Number of rules: 1
[2024-09-27T13:09:08Z INFO yara_x::compiler::rules] Number of patterns: 120000
[2024-09-27T13:09:08Z INFO yara_x::compiler::rules] Number of anchored sub-patterns: 0
[2024-09-27T13:09:08Z INFO yara_x::compiler::rules] Number of atoms: 2400000
[2024-09-27T13:09:08Z INFO yara_x::compiler::rules] Atoms with len = 0: 0
[2024-09-27T13:09:08Z INFO yara_x::compiler::rules] Atoms with len = 1: 0
[2024-09-27T13:09:08Z INFO yara_x::compiler::rules] Atoms with len = 2: 0
[2024-09-27T13:09:08Z INFO yara_x::compiler::rules] Atoms with len = 3: 0
[2024-09-27T13:09:08Z INFO yara_x::compiler::rules] Atoms with len = 4: 2400000
[2024-09-27T13:09:08Z INFO yara_x::compiler::rules] Atoms with len > 4: 0
[2024-09-27T13:09:08Z INFO yara_x::scanner::context] Started rule evaluation: default:big_rule
[2024-09-27T13:09:08Z INFO yara_x::scanner::context] Scan time: 296.509µs
[2024-09-27T13:09:08Z INFO yara_x::scanner::context] Atom matches: 230
hyperfine "./target/release/yr scan -C big_rule.yarc README.md"
Benchmark 1: ./target/release/yr scan -C big_rule.yarc README.md
Time (mean ± σ): 5.636 s ± 0.870 s [User: 5.492 s, System: 0.156 s]
Range (min … max): 4.074 s … 6.985 s 10 runs
It's still a few seconds slower than the YARA v4 results in my earlier comment on #209, but all of my issues have been addressed. Thank you for your help and for walking me through this. |
The issues with rustc 1.81 should be solved once this fix is merged and propagated to |
Using ReadFrom is much slower than a similar function in go-yara on large rule files:
YARA-X ReadFrom
./go-yara-x-test
Source
go-yara ReadRules
./go-yara-test