Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go ReadFrom performance issues on large rule files #209

Closed
zdiff opened this issue Sep 25, 2024 · 16 comments
Closed

Go ReadFrom performance issues on large rule files #209

zdiff opened this issue Sep 25, 2024 · 16 comments

Comments

@zdiff
Copy link

zdiff commented Sep 25, 2024

using ReadFrom is much slower than a similar function in go-yara on large rule files:

YARA-X ReadFrom

Command Mean [s] Min [s] Max [s] Relative
./go-yara-x-test 319.087 ± 36.415 268.215 391.082 1.00
hyperfine ./go-yara-test --export-markdown yara-x.md
Benchmark 1: ./go-yara-test
  Time (mean ± σ):     319.087 s ± 36.415 s    [User: 318.592 s, System: 0.478 s]
  Range (min … max):   268.215 s … 391.082 s    10 runs

Source

package main

import (
        "bytes"
        _ "embed"
        "log"

        yara "github.com/VirusTotal/yara-x/go"
)

//go:embed rule.yarc
var rule []byte

func main() {

        _, err := yara.ReadFrom(bytes.NewReader(rule))
        if err != nil {
                log.Fatal(err)
        }

}

go-yara ReadRules

Command Mean [s] Min [s] Max [s] Relative
./go-yara-test 1.992 ± 0.313 1.683 2.700 1.00
hyperfine ./go-yara-test --export-markdown go-yara.md
Benchmark 1: ./go-yara-test
  Time (mean ± σ):      1.992 s ±  0.313 s    [User: 1.849 s, System: 0.164 s]
  Range (min … max):    1.683 s …  2.700 s    10 runs
package main

import (
	"bytes"
	_ "embed"
	"log"

	"github.com/hillu/go-yara/v4"
)

//go:embed rule.yarc
var rule []byte
func main() {

	_, err := yara.ReadRules(bytes.NewReader(rule))
	if err != nil {
		log.Fatal(err)
	}

}
@plusvic
Copy link
Member

plusvic commented Sep 25, 2024

The ReadFrom not only loads the rules from the given reader, it also creates the Aho-Corasick automaton and produces the native code for the current platform. The time is probably spent in one of those two steps. Can you share information about the rule? It's the same gigantic rule that you mentioned in #207?

@zdiff
Copy link
Author

zdiff commented Sep 25, 2024

The ReadFrom not only loads the rules from the given reader, it also creates the Aho-Corasick automaton and produces the native code for the current platform. The time is probably spent in one of those two steps. Can you share information about the rule? It's the same gigantic rule that you mentioned in #207?

Similar rule but not quite as long so I could compile it with yr compile

You can recreate the situation by copying the python script posted in #207 (comment) and modifying for count in range(131533): to for count in range(120000): to avoid triggering the yr compile bug.

@plusvic
Copy link
Member

plusvic commented Sep 25, 2024

Try this:

cargo build --release --features=logging
RUST_LOG=info ./target/release/yr compile big_rule.yar -o big_rule.yarc
RUST_LOG=info ./target/release/yr scan -C big_rule.yarc README.md

And paste here the logs you get. They should look like:

RUST_LOG=info ./target/release/yr compile big_rule.yar                                                 
[2024-09-25T18:56:43Z INFO  yara_x::compiler] WASM module build time: 14.509398672s
[2024-09-25T18:56:46Z INFO  yara_x::compiler::rules] Aho-Corasick automaton build time: 3.007314934s
[2024-09-25T18:56:46Z INFO  yara_x::compiler::rules] Number of rules: 1
[2024-09-25T18:56:46Z INFO  yara_x::compiler::rules] Number of patterns: 120000
[2024-09-25T18:56:46Z INFO  yara_x::compiler::rules] Number of anchored sub-patterns: 0
[2024-09-25T18:56:46Z INFO  yara_x::compiler::rules] Number of atoms: 2400000
[2024-09-25T18:56:46Z INFO  yara_x::compiler::rules] Atoms with len = 0: 0
[2024-09-25T18:56:46Z INFO  yara_x::compiler::rules] Atoms with len = 1: 0
[2024-09-25T18:56:46Z INFO  yara_x::compiler::rules] Atoms with len = 2: 0
[2024-09-25T18:56:46Z INFO  yara_x::compiler::rules] Atoms with len = 3: 0
[2024-09-25T18:56:46Z INFO  yara_x::compiler::rules] Atoms with len = 4: 2400000
[2024-09-25T18:56:46Z INFO  yara_x::compiler::rules] Atoms with len > 4: 0

RUST_LOG=info ./target/release/yr scan -C output.yarc README.md                                     
[2024-09-25T18:57:36Z INFO  yara_x::compiler::rules] Deserialization time: 15.063360268s
[2024-09-25T18:57:39Z INFO  yara_x::compiler::rules] Aho-Corasick automaton build time: 3.256815566s
[2024-09-25T18:57:39Z INFO  yara_x::compiler::rules] Number of rules: 1
[2024-09-25T18:57:39Z INFO  yara_x::compiler::rules] Number of patterns: 120000
[2024-09-25T18:57:39Z INFO  yara_x::compiler::rules] Number of anchored sub-patterns: 0
[2024-09-25T18:57:39Z INFO  yara_x::compiler::rules] Number of atoms: 2400000
[2024-09-25T18:57:39Z INFO  yara_x::compiler::rules] Atoms with len = 0: 0
[2024-09-25T18:57:39Z INFO  yara_x::compiler::rules] Atoms with len = 1: 0
[2024-09-25T18:57:39Z INFO  yara_x::compiler::rules] Atoms with len = 2: 0
[2024-09-25T18:57:39Z INFO  yara_x::compiler::rules] Atoms with len = 3: 0
[2024-09-25T18:57:39Z INFO  yara_x::compiler::rules] Atoms with len = 4: 2400000
[2024-09-25T18:57:39Z INFO  yara_x::compiler::rules] Atoms with len > 4: 0
[2024-09-25T18:57:39Z INFO  yara_x::scanner::context] Started rule evaluation: default:big_rule
[2024-09-25T18:57:39Z INFO  yara_x::scanner::context] Scan time: 595.413µs
[2024-09-25T18:57:39Z INFO  yara_x::scanner::context] Atom matches: 203

@zdiff
Copy link
Author

zdiff commented Sep 25, 2024

It appears as though my WASM module build time is much larger than yours for both commands

RUST_LOG=info ./target/release/yr compile big_rule.yar -o big_rule.yarc
[2024-09-25T21:20:23Z INFO  yara_x::compiler] WASM module build time: 206.755527271s
[2024-09-25T21:20:27Z INFO  yara_x::compiler::rules] Aho-Corasick automaton build time: 3.89521057s
[2024-09-25T21:20:27Z INFO  yara_x::compiler::rules] Number of rules: 1
[2024-09-25T21:20:27Z INFO  yara_x::compiler::rules] Number of patterns: 120000
[2024-09-25T21:20:27Z INFO  yara_x::compiler::rules] Number of anchored sub-patterns: 0
[2024-09-25T21:20:27Z INFO  yara_x::compiler::rules] Number of atoms: 2400000
[2024-09-25T21:20:27Z INFO  yara_x::compiler::rules] Atoms with len = 0: 0
[2024-09-25T21:20:27Z INFO  yara_x::compiler::rules] Atoms with len = 1: 0
[2024-09-25T21:20:27Z INFO  yara_x::compiler::rules] Atoms with len = 2: 0
[2024-09-25T21:20:27Z INFO  yara_x::compiler::rules] Atoms with len = 3: 0
[2024-09-25T21:20:27Z INFO  yara_x::compiler::rules] Atoms with len = 4: 2400000
[2024-09-25T21:20:27Z INFO  yara_x::compiler::rules] Atoms with len > 4: 0

RUST_LOG=info ./target/release/yr scan -C big_rule.yarc README.md
[2024-09-25T21:25:31Z INFO  yara_x::compiler::rules] Deserialization time: 255.158134944s
[2024-09-25T21:25:35Z INFO  yara_x::compiler::rules] Aho-Corasick automaton build time: 3.992856189s
[2024-09-25T21:25:35Z INFO  yara_x::compiler::rules] Number of rules: 1
[2024-09-25T21:25:35Z INFO  yara_x::compiler::rules] Number of patterns: 120000
[2024-09-25T21:25:35Z INFO  yara_x::compiler::rules] Number of anchored sub-patterns: 0
[2024-09-25T21:25:35Z INFO  yara_x::compiler::rules] Number of atoms: 2400000
[2024-09-25T21:25:35Z INFO  yara_x::compiler::rules] Atoms with len = 0: 0
[2024-09-25T21:25:35Z INFO  yara_x::compiler::rules] Atoms with len = 1: 0
[2024-09-25T21:25:35Z INFO  yara_x::compiler::rules] Atoms with len = 2: 0
[2024-09-25T21:25:35Z INFO  yara_x::compiler::rules] Atoms with len = 3: 0
[2024-09-25T21:25:35Z INFO  yara_x::compiler::rules] Atoms with len = 4: 2400000
[2024-09-25T21:25:35Z INFO  yara_x::compiler::rules] Atoms with len > 4: 0
[2024-09-25T21:25:35Z INFO  yara_x::scanner::context] Started rule evaluation: default:big_rule
[2024-09-25T21:25:35Z INFO  yara_x::scanner::context] Scan time: 359.784µs
[2024-09-25T21:25:35Z INFO  yara_x::scanner::context] Atom matches: 260

@plusvic
Copy link
Member

plusvic commented Sep 25, 2024

That's strange, can you provide details about operating system, and CPU type?

@zdiff
Copy link
Author

zdiff commented Sep 25, 2024

I'm using WSL (Ubuntu 22.04 LTS). I allocated 12 GB of RAM and 4 cores of my i7-9750H to WSL

@plusvic
Copy link
Member

plusvic commented Sep 25, 2024

Can you try the same on Windows without WSL?

@zdiff
Copy link
Author

zdiff commented Sep 25, 2024

I also tried it on a system running Arch Linux natively with 8 GB of RAM and an i5-4670K (I know it's old) and had similar results as my WSL machine:

RUST_LOG=info ./target/release/yr compile big_rule.yar -o big_rule.yarc
[2024-09-25T22:23:40Z INFO  yara_x::compiler] WASM module build time: 199.419351874s
[2024-09-25T22:23:43Z INFO  yara_x::compiler::rules] Aho-Corasick automaton build time: 2.951699981s
[2024-09-25T22:23:43Z INFO  yara_x::compiler::rules] Number of rules: 1
[2024-09-25T22:23:43Z INFO  yara_x::compiler::rules] Number of patterns: 120000
[2024-09-25T22:23:43Z INFO  yara_x::compiler::rules] Number of anchored sub-patterns: 0
[2024-09-25T22:23:43Z INFO  yara_x::compiler::rules] Number of atoms: 2400000
[2024-09-25T22:23:43Z INFO  yara_x::compiler::rules] Atoms with len = 0: 0
[2024-09-25T22:23:43Z INFO  yara_x::compiler::rules] Atoms with len = 1: 0
[2024-09-25T22:23:43Z INFO  yara_x::compiler::rules] Atoms with len = 2: 0
[2024-09-25T22:23:43Z INFO  yara_x::compiler::rules] Atoms with len = 3: 0
[2024-09-25T22:23:43Z INFO  yara_x::compiler::rules] Atoms with len = 4: 2400000
[2024-09-25T22:23:43Z INFO  yara_x::compiler::rules] Atoms with len > 4: 0

RUST_LOG=info ./target/release/yr scan -C big_rule.yarc README.md
[2024-09-25T22:27:42Z INFO  yara_x::compiler::rules] Deserialization time: 199.372617541s
[2024-09-25T22:27:45Z INFO  yara_x::compiler::rules] Aho-Corasick automaton build time: 2.950987995s
[2024-09-25T22:27:45Z INFO  yara_x::compiler::rules] Number of rules: 1
[2024-09-25T22:27:45Z INFO  yara_x::compiler::rules] Number of patterns: 120000
[2024-09-25T22:27:45Z INFO  yara_x::compiler::rules] Number of anchored sub-patterns: 0
[2024-09-25T22:27:45Z INFO  yara_x::compiler::rules] Number of atoms: 2400000
[2024-09-25T22:27:45Z INFO  yara_x::compiler::rules] Atoms with len = 0: 0
[2024-09-25T22:27:45Z INFO  yara_x::compiler::rules] Atoms with len = 1: 0
[2024-09-25T22:27:45Z INFO  yara_x::compiler::rules] Atoms with len = 2: 0
[2024-09-25T22:27:45Z INFO  yara_x::compiler::rules] Atoms with len = 3: 0
[2024-09-25T22:27:45Z INFO  yara_x::compiler::rules] Atoms with len = 4: 2400000
[2024-09-25T22:27:45Z INFO  yara_x::compiler::rules] Atoms with len > 4: 0
[2024-09-25T22:27:45Z INFO  yara_x::scanner::context] Started rule evaluation: default:big_rule
[2024-09-25T22:27:45Z INFO  yara_x::scanner::context] Scan time: 233.65µs
[2024-09-25T22:27:45Z INFO  yara_x::scanner::context] Atom matches: 248

@plusvic
Copy link
Member

plusvic commented Sep 26, 2024

Something strange is going on here, can you tell me more about your setup? Like rustc version and which exact commit of YARA-X you are compiling?

Also, if you have some other machine that you can test, that will be helpful too.

@plusvic
Copy link
Member

plusvic commented Sep 26, 2024

I've upgraded from rustc 1.80 to 1.81 and there's a huge performance drop, my numbers are now similar to yours:

[2024-09-26T08:21:16Z INFO  yara_x::compiler] WASM module build time: 159.544566313s
[2024-09-26T08:21:20Z INFO  yara_x::compiler::rules] Aho-Corasick automaton build time: 3.312271385s
[2024-09-26T08:21:20Z INFO  yara_x::compiler::rules] Number of rules: 1
[2024-09-26T08:21:20Z INFO  yara_x::compiler::rules] Number of patterns: 120000
[2024-09-26T08:21:20Z INFO  yara_x::compiler::rules] Number of anchored sub-patterns: 0
[2024-09-26T08:21:20Z INFO  yara_x::compiler::rules] Number of atoms: 2400000
[2024-09-26T08:21:20Z INFO  yara_x::compiler::rules] Atoms with len = 0: 0
[2024-09-26T08:21:20Z INFO  yara_x::compiler::rules] Atoms with len = 1: 0
[2024-09-26T08:21:20Z INFO  yara_x::compiler::rules] Atoms with len = 2: 0
[2024-09-26T08:21:20Z INFO  yara_x::compiler::rules] Atoms with len = 3: 0
[2024-09-26T08:21:20Z INFO  yara_x::compiler::rules] Atoms with len = 4: 2400000
[2024-09-26T08:21:20Z INFO  yara_x::compiler::rules] Atoms with len > 4: 0

I'll try to figure out why.

@plusvic
Copy link
Member

plusvic commented Sep 26, 2024

This is the flamegraph for when compiled with rustc 1.80:

image

And this other with rustc 1.81:

image

There's a huge performance difference in the merge_bundle function.

The merge_bundle function calls sort_unstable_by_key, which on Rust 1.81 seems to be relying on a quicksort implementation that is behaving worse than the one in Rust 1.80.

@plusvic
Copy link
Member

plusvic commented Sep 26, 2024

There were some changes in sorting algorithms in the Rust standard library between versions 1.80 and 1.81.

rust-lang/rust#124032

It was also mentioned in the release notes:

https://releases.rs/docs/1.81.0/#libraries

At least in this case the changes affected performance negatively.

@zdiff
Copy link
Author

zdiff commented Sep 26, 2024

I built YARA-X with Rust 1.80 and I can confirm that my scan times have drastically decreased. Should I edit this issue to focus on the compile and scan times between yr built with 1.80 and 1.81?

Also, the rust 1.80 scan times are still much longer than YARA v4:

YARA-X

hyperfine "./target/release/yr scan -C big_rule.yarc README.md"
Benchmark 1: ./target/release/yr scan -C big_rule.yarc README.md
  Time (mean ± σ):     28.954 s ±  2.663 s    [User: 28.794 s, System: 0.208 s]
  Range (min … max):   25.671 s … 32.781 s    10 runs

YARA

hyperfine "yara --stack-size=1048576 -C big_rule.yarc README.md"
Benchmark 1: yara --stack-size=1048576 -C big_rule.yarc README.md
  Time (mean ± σ):     705.0 ms ±  36.7 ms    [User: 552.0 ms, System: 151.8 ms]
  Range (min … max):   672.6 ms … 799.7 ms    10 runs

Should I make a separate issue for that?

@plusvic
Copy link
Member

plusvic commented Sep 27, 2024

I will open an issue in the wasmtime project about this regression between rust 1.80 and 1.81 and see what they say. I tried with the latest version of wasmtime and the issue is still there.

Regarding the deserialization time still being higher in YARA-X, there's one thing you can do to decrease that time. You can enable the native-code-serialization feature. Let me explain what it does.

In YARA-X the condition of each rule is compiled into WASM code, which in turn is converted into native code for the current platform. By default, when you serialize a set of rules, only the WASM code is stored in the resulting file. When you deserialize the rules, YARA-X converts the WASM code into native code for the current platform, and that's the step that is taking most of the deserialization time.

By enabling the native-code-serialization feature you tell YARA-X to include not only the WASM code, but also the native code in the serialized rules. This way the WASM->native code conversion is done once (when the rules are compiled), it doesn't need to be done again when the rules are deserialized. The drawback is that the produced file is larger. This approach works when you are deserializing the rules in the same platform where they where compiled.

Still, the deserialization step may be slower in YARA-X, the aim of YARA-X is being faster at scanning. Also notice that the WASM->native code conversion is very slow in this case due to the nature of your rule. With this large number of patterns, the condition for your rule gets translated into a single huge function that takes time to compile and optimize. If would greatly benefit from splitting your huge rule into multiple smaller ones. When you have multiple rules YARA-X can compile them in parallel and reduce the compilation time.

@zdiff
Copy link
Author

zdiff commented Sep 27, 2024

I saw a large decrease in scan times when building with cargo build --release --features="logging yara-x/native-code-serialization:

RUST_LOG=info ./target/release/yr scan -C big_rule.yarc README.md
[2024-09-27T13:09:05Z INFO  yara_x::compiler::rules] Deserialization time: 325.41223ms
[2024-09-27T13:09:08Z INFO  yara_x::compiler::rules] Aho-Corasick automaton build time: 3.605753886s
[2024-09-27T13:09:08Z INFO  yara_x::compiler::rules] Number of rules: 1
[2024-09-27T13:09:08Z INFO  yara_x::compiler::rules] Number of patterns: 120000
[2024-09-27T13:09:08Z INFO  yara_x::compiler::rules] Number of anchored sub-patterns: 0
[2024-09-27T13:09:08Z INFO  yara_x::compiler::rules] Number of atoms: 2400000
[2024-09-27T13:09:08Z INFO  yara_x::compiler::rules] Atoms with len = 0: 0
[2024-09-27T13:09:08Z INFO  yara_x::compiler::rules] Atoms with len = 1: 0
[2024-09-27T13:09:08Z INFO  yara_x::compiler::rules] Atoms with len = 2: 0
[2024-09-27T13:09:08Z INFO  yara_x::compiler::rules] Atoms with len = 3: 0
[2024-09-27T13:09:08Z INFO  yara_x::compiler::rules] Atoms with len = 4: 2400000
[2024-09-27T13:09:08Z INFO  yara_x::compiler::rules] Atoms with len > 4: 0
[2024-09-27T13:09:08Z INFO  yara_x::scanner::context] Started rule evaluation: default:big_rule
[2024-09-27T13:09:08Z INFO  yara_x::scanner::context] Scan time: 296.509µs
[2024-09-27T13:09:08Z INFO  yara_x::scanner::context] Atom matches: 230

hyperfine "./target/release/yr scan -C big_rule.yarc README.md"
Benchmark 1: ./target/release/yr scan -C big_rule.yarc README.md
  Time (mean ± σ):      5.636 s ±  0.870 s    [User: 5.492 s, System: 0.156 s]
  Range (min … max):    4.074 s …  6.985 s    10 runs

It's still a few seconds slower than my YARA v4 results #209 (comment) but all of my issues have been addressed. Thank you for your help and walking me through this

@plusvic
Copy link
Member

plusvic commented Dec 2, 2024

The issues with rustc 1.81 should be solved once this fix is merged and propagated to wasmtime and yara-x: bytecodealliance/regalloc2#204

@plusvic plusvic closed this as completed Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants