
[DO NOT MERGE] Test issue on Linux due to update. #540

Merged
merged 39 commits into from
Jul 26, 2024

Conversation

gianfra-t
Contributor

@gianfra-t gianfra-t commented Jun 25, 2024

This PR is meant to discuss the issue found in some tests on Linux. It is not intended to be merged, unless of course a fix can be found.

The affected tests are test_register_vault and test_subxt_processing_events_after_dispatch_error.
With the extra log level, it is evident that the transactions get stuck in the node.

@gianfra-t
Contributor Author

After adding logs to the relevant crates for transaction handling, sc-transaction-pool and sc-basic-authorship, we can finally see what is going on with the transactions that get stuck and are never finalized.

The key piece of logging is the following:

[2024-06-25T16:33:55Z DEBUG sc_basic_authorship::basic_authorship] Attempting to push transactions from the pool.
[2024-06-25T16:33:55Z DEBUG sc_basic_authorship::basic_authorship] Pool status: PoolStatus { ready: 2, ready_bytes: 312, future: 0, future_bytes: 0 }
[2024-06-25T16:33:55Z DEBUG sc_basic_authorship::basic_authorship] Consensus deadline reached when pushing block transactions, proceeding with proposing.

This happens in the authorship module, where extrinsics are applied and added to the block.
In the logs we see Consensus deadline reached when pushing block transactions... from here,
sometimes even before a single transaction has been included!
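The shape of the problem can be sketched as follows. The proposer checks the remaining time before pushing each ready transaction, so if it starts late relative to the deadline it bails out immediately, leaving ready transactions in the pool. This is a minimal, deterministic sketch with hypothetical names (Tx, apply_extrinsics) and a simulated millisecond clock, not the actual sc-basic-authorship code:

```rust
// Sketch of a deadline-checked apply loop (hypothetical types; not the
// actual sc-basic-authorship implementation). A plain millisecond
// counter stands in for the wall clock so the behaviour is deterministic.
struct Tx {
    cost_ms: u64, // simulated execution time of the extrinsic
}

/// Push ready transactions until the deadline is hit; returns how many
/// made it into the block.
fn apply_extrinsics(pool: &[Tx], mut now_ms: u64, deadline_ms: u64) -> usize {
    let mut included = 0;
    for tx in pool {
        // The deadline is checked *before* executing each transaction,
        // so a late start can exit the loop with zero inclusions.
        if now_ms >= deadline_ms {
            break;
        }
        now_ms += tx.cost_ms; // "execute" the transaction
        included += 1;
    }
    included
}

fn main() {
    let pool = vec![Tx { cost_ms: 2_000 }, Tx { cost_ms: 2_000 }];
    // Proposer starts well before the deadline: both transactions fit.
    assert_eq!(apply_extrinsics(&pool, 0, 10_000), 2);
    // Proposer starts at (or past) the deadline: nothing is included,
    // matching the "Consensus deadline reached" log with 2 ready txs.
    assert_eq!(apply_extrinsics(&pool, 10_000, 10_000), 0);
    println!("ok");
}
```

This reproduces exactly the two log patterns above: a pool with ready transactions and either a normal inclusion or an immediate "deadline reached".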

For reference, when transactions are included we see:

[2024-06-25T16:37:06Z DEBUG sc_basic_authorship::basic_authorship] Attempting to push transactions from the pool.
[2024-06-25T16:37:06Z DEBUG sc_basic_authorship::basic_authorship] Pool status: PoolStatus { ready: 2, ready_bytes: 312, future: 0, future_bytes: 0 }
[2024-06-25T16:37:06Z TRACE sc_basic_authorship::basic_authorship] [0xbe9527d4bd813a710dcff0516263960217fb74af40ced2a7d70d58e8875eaa29] Pushing to the block.
[2024-06-25T16:37:08Z DEBUG sc_basic_authorship::basic_authorship] [0xbe9527d4bd813a710dcff0516263960217fb74af40ced2a7d70d58e8875eaa29] Pushed to the block.
[2024-06-25T16:37:08Z DEBUG sc_basic_authorship::basic_authorship] Consensus deadline reached when pushing block transactions, proceeding with proposing.

In that second stream, one transaction got included while the other did not.

How long is the deadline?

We define the authorship for the manual seal here and the proposer factory here. Both are relevant for proposing via the manual_seal functionality.

The run_manual_seal function will attempt to seal a new block, which calls propose on the proposer factory. The important bit is that the deadline is a constant set to 10 seconds.

Internally, propose calls propose_with with a deadline derived from the following formula, which should leave several seconds for transactions to be applied.
Further down the line, propose_with calls apply_extrinsics, which should apply as many ready transactions from the pool as possible. That is where we see the deadline being hit.
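The linked formula reserves part of the overall budget for finalization and propagation. Assuming it has the now + max_duration - max_duration / 3 shape used in sc-basic-authorship (an assumption worth double-checking against the linked code), the 10-second manual seal constant leaves roughly 6.7 seconds for applying extrinsics:

```rust
use std::time::Duration;

fn main() {
    // The 10 s constant used by run_manual_seal.
    let max_duration = Duration::from_secs(10);
    // Paraphrase of the proposer's deadline formula: a third of the
    // budget is kept back for block finalization and propagation.
    let apply_budget = max_duration - max_duration / 3;
    assert_eq!(apply_budget.as_secs(), 6);
    println!("{:?}", apply_budget); // roughly 6.67 s
}
```

That matches the "apparent 6 second window" observed below: the applying phase never sees the full 10 seconds.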

The question is why this deadline is being reached so quickly. In theory this would also explain why it can fail "sometimes", since it depends on the execution time between those processes, which is system dependent.

@TorstenStueber
Contributor

@gianfra-t What is the purpose of proposing a transaction?

@gianfra-t
Contributor Author

What do you mean, @TorstenStueber? The functions propose and propose_with? As I understand it, the term proposal refers to a proposed block, not a transaction. Internally, it attempts to apply (execute) as many transactions as possible, limited by this apparent 6-second window.

@gianfra-t gianfra-t marked this pull request as ready for review June 26, 2024 15:21
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 128aef1 to f0e1bdc Compare June 26, 2024 15:23
@TorstenStueber
Contributor

I am not familiar with the code, but this seems to use manual seal on a chain that runs locally for testing purposes? Is it important to put multiple transactions into a block? Can we not just submit transactions the classical way and avoid running into this problem with the Proposer? (I say this with very superficial knowledge.)

@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 18fe89a to 3b8a104 Compare June 26, 2024 15:30
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 3b8a104 to 37c6f18 Compare June 26, 2024 15:37
@ebma
Member

ebma commented Jun 27, 2024

Can we not just submit transactions the classical way

What's the classical way for you? Authoring blocks every 6 to 12 seconds without manual seal?

The question is why is this deadline being met so quickly? This would also explain in theory why it can fail "sometimes", since it will depend on the execution time between those processes, which is system dependent.

But we do know that it never fails for the integration tests, and it also fails if you run just one of the two tests mentioned in the description in isolation. So I doubt that it's due to resource consumption or the processing power of the node. We also know that it doesn't happen on macOS. What if something goes wrong when comparing the current time to the deadline? This wouldn't explain why it works for the integration tests, though. Since we can't touch the code of the transaction-pool and authorship crates (without forking), @gianfra-t have you tried comparing our configuration for the manual seal with e.g. the one Interlay uses? Maybe there is a discrepancy and we can solve the issue by changing some of the parameters.

@gianfra-t
Contributor Author

I do not know why this doesn't also happen in the integration tests. If it has the exact same manual seal configuration, then it may be that, since the chain is generating blocks all the time, the transactions have a chance to be re-included in a block. In these failing tests, there is no new block creation after 1 or 2 blocks. I did try creating new blocks as a potential solution, though, and that didn't work either...

I will try to compare some manual seal configurations, because there is a good chance that changing them can solve this. Ideally, I would like to delay the block-proposing process.

Because even with what @TorstenStueber said about sending only one transaction, sometimes the deadline is hit even after the inclusion of that single one!

In the latest CI logs of this PR we can see that the first log of the relevant times (now and deadline), which corresponds to the start of the sealing process, happens way too early, before the applying process takes place.

@ebma
Member

ebma commented Jun 27, 2024

If it has the exact same manual seal configuration, then it may be that since the chain is generating blocks all the time, the transactions have a chance to be re-included in a block.

But in the integration tests we are using manual seal as well, so the blocks are not generated all the time either. For every test, we launch a separate chain so that the tests don't influence each other and each of them has a 'clean environment'.

Correct me if I'm wrong or misunderstanding the problem, but the authorship logic rejects including the transaction in the block because it probably thinks that the transaction is too large and would require more processing time than is available in the block(?)

@gianfra-t
Contributor Author

gianfra-t commented Jun 27, 2024

Yes, but in the integration tests transactions are being sent asynchronously all the time, and every time a transaction gets into the pool, the node (with manual seal) will start another block; otherwise it won't. Or are you saying that every integration test also waits for the transaction to be finalized? I do not know enough about those tests.

And regarding the rejection: what I understand from the logs and the corresponding line where the deadline is checked is that the transaction gets fetched from the pool but is not even attempted to be executed. Even the estimation happens later.

And Interlay, for instance, seems to have the same config :/ (that would be create_empty and finalize set to true).
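For reference, the two flags being compared here are the fields of the manual-seal block command. This is a hypothetical mirror of just those flags (the real type in sc-consensus-manual-seal also carries a parent hash and a result channel), included only to make the configuration being compared explicit:

```rust
// Hypothetical mirror of the manual-seal block command flags discussed
// above; not the real sc-consensus-manual-seal type.
#[derive(Debug, PartialEq)]
struct SealFlags {
    create_empty: bool, // seal a block even if the pool is empty
    finalize: bool,     // immediately finalize the sealed block
}

fn main() {
    // The configuration both we and Interlay appear to use:
    let ours = SealFlags { create_empty: true, finalize: true };
    assert!(ours.create_empty && ours.finalize);
    println!("{:?}", ours);
}
```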

@ebma
Member

ebma commented Jun 28, 2024

Hmm, you are right, the size of the transaction couldn't be it. Thanks for sharing the links.

This configuration is different though, maybe it makes a difference?

@gianfra-t
Contributor Author

Mmmm, interesting. What changes is the inherent data provider. I think those are applied before the extrinsics, but if ours takes a long time, it could make the difference. Worth a shot!

@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch 2 times, most recently from 89677c7 to 04b57ed Compare July 9, 2024 07:45
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 04b57ed to 60925d8 Compare July 9, 2024 07:58
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 7cb4fbc to 0422c1f Compare July 9, 2024 09:16
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 0422c1f to dc97520 Compare July 9, 2024 09:24
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch 2 times, most recently from 28ee579 to 95b9145 Compare July 24, 2024 15:01
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 95b9145 to f29aa46 Compare July 25, 2024 08:10
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 73f96ea to 4f49a1f Compare July 26, 2024 13:20
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 4f49a1f to 50ff2b9 Compare July 26, 2024 13:21
Comment on lines 2 to 6
#![allow(dead_code)]
#![allow(unused_imports)]

#![allow(dead_code)]
#![allow(unused_imports)]
Member


I don't think we need these declarations twice.

Comment on lines 1 to 2


Member


Nit: remove

@gianfra-t gianfra-t merged commit a2f1957 into polkadot-v1.1.0 Jul 26, 2024
2 of 6 checks passed
@gianfra-t gianfra-t deleted the test_issue_update_to_polkadot branch July 26, 2024 14:33
gianfra-t added a commit that referenced this pull request Jul 31, 2024
* update dependencies, fixing compilation errors (WIP)

* update pallets to 1.1.0

* fix subxt feature flags

* fix client configuration new fields

* fix clippy issues and missing std flag

* update fixes and missing configs for runtimes

* fixes to node configuration (wip)

* node client fixes

* bump subxt to minimum viable version

* use static wrapper for account_id and currency_id metadata replacement

* use account_id struct from subxt implementing Encode and Decode as type to avoid Static wrapper

* fix account type conversion using new subxt type

* missing flag, cleanup unused types

* temporary remove conditional metadata

* use foucoco metadata only

* warnings and type conversion

* re-implement hashable trait

* replace account_id on tests

* use subxt version 0.29 across crates

* modification of service config values

* testing increase in timeout for tests

* prints for testing

* bump to subxt 0.30.0

* temp remove unrelated jobs

* work with ubuntu latest

* bump to subxt 0.33

seal block periodically

seal blocks more often

seal blocks more often

block finalization force hack

* local linux tests

* testing finalizing first block solution

* setup working locally

* test only integration on ci

* cleaning up temporary changes

* small code improvements

* more comment removal

* use ubuntu latest os for main test

* Use different function to set exchange rate

* `lookup_bytes` is only used here. Removing `.clone()`.
[`fn fetch_raw(...)` accepts `Vec<u8>`](/~https://github.com/paritytech/subxt/blob/f06a95d687605bf826db9d83b2932a73a57b169f/subxt/src/storage/storage_type.rs#L51).

`fn build()` already [accepts Url](/~https://github.com/paritytech/jsonrpsee/blob/v0.20.0/client/transport/src/ws/mod.rs#L279)

* cleanup

* handle all metadata instances

* temporary disable unused code on test file

* add all-features flag to vault

* addresses the confusing `all-features` feature of `runtime` package;

Reuse the `UrlParseError` by changing it to `url::ParseError`, instead of jsonrpsee's `InvalidUri`

* comment cleanups

* use macos image for main workflow testing

* revert

* print current directory

* remove toolchain

* ran cargo +nightly-2024-02-09 fmt --all

* revert, to use ubuntu

* Revert to mac-os

This reverts commit 7f25b8f.

* uncomment failing test

* force install rustup

* use macos-13

* ignore subxt issues, and print out proof error

* Revert "ignore subxt issues, and print out proof error"

This reverts commit 08da9d6.

* Revert "Merge branch 'main' into polkadot-v1.1.0"

This reverts commit 255845c, reversing
changes made to 08da9d6.

* ignore failing subxt tests and print out proof error

* update cargo lock /~https://github.com/pendulum-chain/spacewalk/actions/runs/10079792308/job/27899770357?pr=536#step:14:24

* fix cargo.lock; update only dia-oracle

* update network

* add back iowa

* [DO NOT MERGE] Test issue on Linux due to update. (#540)

* add extra cargo.toml specially for CI testing.

* stellar relay config sdftest2 ip change

* fixing some regressions from incomplete merge

* fixing more merge issues

* missing changes from #538

* fix AccountId PrettyPrint impl

* fix compare spec version name

* fix benchmark changes merge issue

* remove when possible explicit declaration of pallets in construct_runtime macro

* cargo fmt

* remove all-features feature

* testing inherent modification

* Revert "remove when possible explicit declaration of pallets in construct_runtime macro"

This reverts commit 33a6067.

* Revert "Revert "remove when possible explicit declaration of pallets in construct_runtime macro""

This reverts commit 17f8100.

* regenerate metadata after removal of explicit pallet declaration in macro

* Revert "testing inherent modification"

This reverts commit 3345c90.

* use pendulum's polkadot-sdk fork with modified constant

* modification of stellar sdftest3 ip

* modify also vault resources sdftest3 ip

---------

Co-authored-by: Marcel Ebert <mail@marcel-ebert.de>
Co-authored-by: b-yap <2826165+b-yap@users.noreply.github.com>