Disputes raised due to RuntimeConstruction error in pvf execution #2863

Closed
sandreim opened this issue Jan 5, 2024 · 5 comments · Fixed by #2895
Labels
I2-bug The node fails to follow expected behavior. T0-node This PR/Issue is related to the topic “node”.

Comments

@sandreim
Contributor

sandreim commented Jan 5, 2024

The issue is happening on Rococo, logs: https://grafana.teleport.parity.io/goto/BQfD20FSg?orgId=1

WARN tokio-runtime-worker parachain::dispute-coordinator: Candidate 0x0008ce7fe58231c871fda90dd6fbf140fc5961bf0fad7301139340406a64bc63 considered invalid: ExecutionError("execute: execute error: RuntimeConstruction(Other(\"cannot deserialize module: compilation settings of module incompatible with native host: compilation setting \\\"has_avx512bitalg\\\" is enabled, but not available on the host\"))")

WARN tokio-runtime-worker parachain::pvf: execution worker concluded, error occurred: Invalid(WorkerReportedInvalid("execute: execute error: RuntimeConstruction(Other(\"cannot deserialize module: compilation settings of module incompatible with native host: compilation setting \\\"has_avx512bitalg\\\" is enabled, but not available on the host\"))")) artifact_id=ArtifactId { code_hash: 0x90efa66f83c794ea1ce93f4a7f019a389a6ba151493326bd790e49077515f369, executor_params_hash: 0x03170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c111314 } worker=Worker(2v1) worker_rip=false

WARN tokio-runtime-worker parachain::approval-voting: Detected invalid candidate as an approval checker. reason=ExecutionError("execute: execute error: RuntimeConstruction(Other(\"cannot deserialize module: compilation settings of module incompatible with native host: compilation setting \\\"has_avx512bitalg\\\" is enabled, but not available on the host\"))") candidate_hash=0xa9a745a082c2aadb0077a421b5902be4ef54f4c1f5d0da4871edd2ca65e0ca09 para_id=Id(2030) traceID=225508057066263142710214066364837014500

Answering why this is happening requires more digging. One possible explanation is that the host VM type was changed. In any case, we shouldn't raise a dispute for candidates that we can't check due to such errors.

CC @s0me0ne-unkn0wn @mrcnski

@sandreim sandreim added T0-node This PR/Issue is related to the topic “node”. I2-bug The node fails to follow expected behavior. labels Jan 5, 2024
@bkchr
Member

bkchr commented Jan 5, 2024

The reason is probably that the VM was moved, and we no longer clean up the cache since #685.

The solution should be to include the host architecture in the artifact name as well.
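
As a rough illustration of that idea (a sketch only, not the actual polkadot-sdk code): the helper `artifact_file_name` and the `wasmtime_` prefix below are hypothetical, and the hashes are passed as plain strings for simplicity. Note that the architecture string alone would likely still read `x86_64` on both hosts in the `has_avx512bitalg` case from the logs, so CPU feature information might also be needed.

```rust
use std::env::consts::{ARCH, OS};

/// Hypothetical helper: builds an artifact file name that embeds the host
/// target (architecture and OS) next to the code hash and the executor
/// params hash, so an artifact compiled on a different kind of host is
/// never picked up under the same name.
fn artifact_file_name(code_hash: &str, executor_params_hash: &str) -> String {
    // ARCH and OS are compile-time constants of the running binary,
    // e.g. "x86_64" and "linux".
    format!(
        "wasmtime_{}_{}_{}-{}",
        code_hash, executor_params_hash, ARCH, OS
    )
}
```

With this naming, a node moved to a host with a different target would miss the cache and prepare the artifact again instead of trying to deserialize an incompatible module.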

@s0me0ne-unkn0wn
Contributor

@mrcnski probably it's perfect timing for fixing it in the scope of #2604?

@mrcnski
Contributor

mrcnski commented Jan 6, 2024

Ah no!

  1. Maybe we should actually try constructing the runtime for each artifact on node startup, and prune any that fail.
  2. And yeah, we should also include the host architecture in the artifact name, in case a mismatch could lead to errors other than runtime construction (i.e. runtime construction succeeds but execution fails due to an architecture mismatch; not sure if that's possible?). Any other info that would be good to include?
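
The first option above could look roughly like the following. This is a minimal sketch under stated assumptions: `prune_unusable_artifacts` is a hypothetical name, and the `try_construct` closure stands in for whatever the node would use to attempt runtime construction from a cached artifact.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical startup check: try to construct a runtime from every cached
/// artifact and delete the ones that fail. Returns the paths that were pruned.
///
/// `try_construct` is an assumption of this sketch; it represents the real
/// "instantiate the runtime from this artifact" step.
fn prune_unusable_artifacts<F>(cache_dir: &Path, try_construct: F) -> io::Result<Vec<PathBuf>>
where
    F: Fn(&Path) -> Result<(), String>,
{
    let mut pruned = Vec::new();
    for entry in fs::read_dir(cache_dir)? {
        let path = entry?.path();
        // Only regular files are treated as artifacts.
        if !path.is_file() {
            continue;
        }
        if try_construct(&path).is_err() {
            // The artifact cannot be used on this host; remove it so that it
            // gets prepared again on demand.
            fs::remove_file(&path)?;
            pruned.push(path);
        }
    }
    Ok(pruned)
}
```

The trade-off is a slower startup in exchange for never handing a stale artifact to an execution worker.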

Also related: #661. The tl;dr is that I didn't have confidence to treat RuntimeConstruction as an internal error, though it should be safe to do so. It is rather overloaded and needs some untangling.

Good that this only happened on Rococo so far. We should make this a high-priority fix!

@mrcnski
Contributor

mrcnski commented Jan 6, 2024

> @mrcnski probably it's perfect timing for fixing it in the scope of #2604?

You mean, do the refactor of preparation errors that was being considered? This issue's title is inaccurate because the logs indicate the error is in execution, not preparation. Indeed, I was confused at first because preparation errors should never raise a dispute. Will update the title. Or did you mean something else?

@mrcnski mrcnski changed the title Disputes raised due to error in pvf preparation Disputes raised due to RuntimeConstruction error in pvf execution Jan 6, 2024
@mrcnski
Contributor

mrcnski commented Jan 6, 2024

> The tl;dr is that I didn't have confidence to treat RuntimeConstruction as an internal error, though it should be safe to do so. It is rather overloaded and needs some untangling.

Actually, we can do this now; I forgot about this: #661 (comment). For the record, the concern was that the Other variant specifically is overloaded. Ideally, for errors that we treat as internal, we would have enumerated every error case.
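
To make the distinction concrete, here is a minimal sketch of treating runtime construction failures as internal errors rather than as proof of an invalid candidate. All types and names (`ExecuteError`, `Verdict`, `classify`, `should_raise_dispute`) are hypothetical and simplified; they are not the real polkadot-sdk definitions.

```rust
/// Hypothetical error type reported by an execution worker.
#[derive(Debug)]
enum ExecuteError {
    /// The host could not instantiate the runtime from the cached artifact,
    /// e.g. because it was compiled for CPU features the host lacks.
    RuntimeConstruction(String),
    /// The candidate itself misbehaved during execution.
    CandidateFailed(String),
}

/// Hypothetical outcome as seen by the validation subsystems.
#[derive(Debug, PartialEq)]
enum Verdict {
    Valid,
    /// Deterministic failure attributable to the candidate.
    Invalid(String),
    /// Local problem on this node; says nothing about the candidate.
    InternalError(String),
}

/// Maps worker results to verdicts, with runtime construction failures
/// classified as internal errors.
fn classify(result: Result<(), ExecuteError>) -> Verdict {
    match result {
        Ok(()) => Verdict::Valid,
        Err(ExecuteError::RuntimeConstruction(msg)) => Verdict::InternalError(msg),
        Err(ExecuteError::CandidateFailed(msg)) => Verdict::Invalid(msg),
    }
}

/// Only verdicts that blame the candidate should lead to a dispute.
fn should_raise_dispute(verdict: &Verdict) -> bool {
    matches!(verdict, Verdict::Invalid(_))
}
```

Matching exhaustively on enumerated variants, with no catch-all `Other`, is what makes it safe to decide per case whether an error is internal.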

mrcnski added a commit that referenced this issue Jan 10, 2024
Considering the complexity of
#2871 and the discussion
therein, as well as the further complexity introduced by the hardening
in #2742, as well as the
eventual replacement of wasmtime by PolkaVM, it seems best to remove
this persistence as it is creating more problems than it solves.

## Related

Closes #2863
@github-project-automation github-project-automation bot moved this from Backlog to Completed in parachains team board Jan 10, 2024
bgallois pushed a commit to duniter/duniter-polkadot-sdk that referenced this issue Mar 25, 2024
bkchr pushed a commit that referenced this issue Apr 10, 2024
* cargo update -p mio

* bump relayer version: v1.2.0

* Cargo.lock