
[DO NOT MERGE] Test issue on Linux due to update. #540

Merged
merged 39 commits into from
Jul 26, 2024

Conversation

gianfra-t
Contributor

@gianfra-t gianfra-t commented Jun 25, 2024

This PR is meant to discuss the issue found in some tests on Linux. It is not intended to be merged, unless of course a fix can be found.

The affected tests are test_register_vault and test_subxt_processing_events_after_dispatch_error.
With the extra log level, it is evident that the transactions get stuck in the node.

@gianfra-t
Contributor Author

After adding logs to the relevant crates for transaction handling, sc-transaction-pool and sc-basic-authorship, we can finally see what is going on with the transactions that get stuck and are never finalized.

The key piece of logging is the following:

[2024-06-25T16:33:55Z DEBUG sc_basic_authorship::basic_authorship] Attempting to push transactions from the pool.
[2024-06-25T16:33:55Z DEBUG sc_basic_authorship::basic_authorship] Pool status: PoolStatus { ready: 2, ready_bytes: 312, future: 0, future_bytes: 0 }
[2024-06-25T16:33:55Z DEBUG sc_basic_authorship::basic_authorship] Consensus deadline reached when pushing block transactions, proceeding with proposing.

This happens in the authorship module, where extrinsics are applied and added to the block.
In the logs we see Consensus deadline reached when pushing block transactions... from here,
sometimes even before a single transaction has been included!
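The shape of the problem can be sketched as follows. The proposer checks the remaining time before pushing each ready transaction, so if it starts late relative to the deadline it bails out immediately, leaving ready transactions in the pool. This is a minimal, deterministic sketch with hypothetical names (Tx, apply_extrinsics) and a simulated millisecond clock, not the actual sc-basic-authorship code:

```rust
// Sketch of a deadline-checked apply loop (hypothetical types; not the
// actual sc-basic-authorship implementation). A plain millisecond
// counter stands in for the wall clock so the behaviour is deterministic.
struct Tx {
    cost_ms: u64, // simulated execution time of the extrinsic
}

/// Push ready transactions until the deadline is hit; returns how many
/// made it into the block.
fn apply_extrinsics(pool: &[Tx], mut now_ms: u64, deadline_ms: u64) -> usize {
    let mut included = 0;
    for tx in pool {
        // The deadline is checked *before* executing each transaction,
        // so a late start can exit the loop with zero inclusions.
        if now_ms >= deadline_ms {
            break;
        }
        now_ms += tx.cost_ms; // "execute" the transaction
        included += 1;
    }
    included
}

fn main() {
    let pool = vec![Tx { cost_ms: 2_000 }, Tx { cost_ms: 2_000 }];
    // Proposer starts well before the deadline: both transactions fit.
    assert_eq!(apply_extrinsics(&pool, 0, 10_000), 2);
    // Proposer starts at (or past) the deadline: nothing is included,
    // matching the "Consensus deadline reached" log with 2 ready txs.
    assert_eq!(apply_extrinsics(&pool, 10_000, 10_000), 0);
    println!("ok");
}
```

This reproduces exactly the two log patterns above: a pool with ready transactions and either a normal inclusion or an immediate "deadline reached".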

For reference, when transactions are included we see:

[2024-06-25T16:37:06Z DEBUG sc_basic_authorship::basic_authorship] Attempting to push transactions from the pool.
[2024-06-25T16:37:06Z DEBUG sc_basic_authorship::basic_authorship] Pool status: PoolStatus { ready: 2, ready_bytes: 312, future: 0, future_bytes: 0 }
[2024-06-25T16:37:06Z TRACE sc_basic_authorship::basic_authorship] [0xbe9527d4bd813a710dcff0516263960217fb74af40ced2a7d70d58e8875eaa29] Pushing to the block.
[2024-06-25T16:37:08Z DEBUG sc_basic_authorship::basic_authorship] [0xbe9527d4bd813a710dcff0516263960217fb74af40ced2a7d70d58e8875eaa29] Pushed to the block.
[2024-06-25T16:37:08Z DEBUG sc_basic_authorship::basic_authorship] Consensus deadline reached when pushing block transactions, proceeding with proposing.

In that second stream, one transaction got included while the other did not.

How long is the deadline?

We define the authorship for the manual seal here and the proposer factory here. Both are relevant for proposing via the manual_seal functionality.

The run_manual_seal function will attempt to seal a new block, which calls propose on the proposer factory. The important bit is that the deadline is a constant set to 10 seconds.

Internally, propose calls propose_with with a deadline derived from the following formula, which should leave several seconds for transactions to be applied.
Further down the line, propose_with calls apply_extrinsics, which should apply as many ready transactions from the pool as possible. That is where we see the deadline being hit.
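The linked formula reserves part of the overall budget for finalization and propagation. Assuming it has the now + max_duration - max_duration / 3 shape used in sc-basic-authorship (an assumption worth double-checking against the linked code), the 10-second manual seal constant leaves roughly 6.7 seconds for applying extrinsics:

```rust
use std::time::Duration;

fn main() {
    // The 10 s constant used by run_manual_seal.
    let max_duration = Duration::from_secs(10);
    // Paraphrase of the proposer's deadline formula: a third of the
    // budget is kept back for block finalization and propagation.
    let apply_budget = max_duration - max_duration / 3;
    assert_eq!(apply_budget.as_secs(), 6);
    println!("{:?}", apply_budget); // roughly 6.67 s
}
```

That matches the "apparent 6 second window" observed below: the applying phase never sees the full 10 seconds.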

The question is why this deadline is being reached so quickly. In theory this would also explain why it can fail "sometimes", since it depends on the execution time between those processes, which is system dependent.

@TorstenStueber
Contributor

@gianfra-t What is the purpose of proposing a transaction?

@gianfra-t
Contributor Author

What do you mean, @TorstenStueber? The functions propose and propose_with? As I understand it, the term proposal refers to a proposed block, not a transaction. Internally, it attempts to apply (execute) as many transactions as possible, limited by this apparent 6-second window.

@gianfra-t gianfra-t marked this pull request as ready for review June 26, 2024 15:21
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 128aef1 to f0e1bdc Compare June 26, 2024 15:23
@TorstenStueber
Contributor

I am not familiar with the code, but this seems to use manual seal on a chain that runs locally for testing purposes? Is it important to put multiple transactions into a block? Can we not just submit transactions the classical way and avoid running into this problem with the Proposer? (I say this with very superficial knowledge.)

@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 18fe89a to 3b8a104 Compare June 26, 2024 15:30
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 3b8a104 to 37c6f18 Compare June 26, 2024 15:37
@ebma
Member

ebma commented Jun 27, 2024

Can we not just submit transactions the classical way

What's the classical way for you? Authoring blocks every 6 to 12 seconds without manual seal?

The question is why is this deadline being met so quickly? This would also explain in theory why it can fail "sometimes", since it will depend on the execution time between those processes, which is system dependent.

But we do know that it never fails for the integration tests, and it also fails if you run just one of the two tests mentioned in the description in isolation. So I doubt that it's due to resource consumption or the processing power of the node. We also know that it doesn't happen on macOS. What if something goes wrong when comparing the current time to the deadline? This wouldn't explain why it works for the integration tests, though. Since we can't touch the code of the transaction-pool and authorship crates (without forking), @gianfra-t have you tried comparing our configuration for the manual seal with e.g. the one Interlay uses? Maybe there is a discrepancy and we can solve the issue by changing some of the parameters.

@gianfra-t
Contributor Author

I do not know why this doesn't also happen in the integration tests. If it has the exact same manual seal configuration, then it may be that, since the chain is generating blocks all the time, the transactions have a chance to be re-included in a block. In these failing tests, there is no new block creation after 1 or 2 blocks. I did try creating new blocks as a potential solution, though, and that didn't work either...

I will try to compare some manual seal configurations, because there is a good chance that changing them can solve this. Ideally, I would like to delay the block-proposing process.

Because even with what @TorstenStueber said about sending only one transaction, sometimes the deadline is hit even after the inclusion of that single one!

In the latest CI logs of this PR we can see that the first log of the relevant times (now and deadline), which corresponds to the start of the sealing process, happens way too early, before the applying process takes place.

@ebma
Member

ebma commented Jun 27, 2024

If it has the exact same manual seal configuration, then it may be that since the chain is generating blocks all the time, the transactions have a chance to be re-included in a block.

But in the integration tests we are using manual seal as well, so the blocks are not generated all the time either. For every test, we launch a separate chain so that the tests don't influence each other and each of them has a 'clean environment'.

Correct me if I'm wrong or misunderstanding the problem, but the authorship logic rejects including the transaction in the block because it probably thinks that the transaction is too large and would require more processing time than is available in the block(?)

@gianfra-t
Contributor Author

gianfra-t commented Jun 27, 2024

Yes, but in the integration tests transactions are being sent asynchronously all the time, and every time a transaction gets into the pool, the node (with manual seal) will start another block; otherwise it won't. Or are you saying that every integration test also waits for the transaction to be finalized? I do not know enough about those tests.

And regarding the rejection: what I understand from the logs and the corresponding line where the deadline is checked is that the transaction gets fetched from the pool but is not even attempted to be executed. Even the estimation happens later.

And Interlay, for instance, seems to have the same config :/ (that would be create_empty and finalize set to true).
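For reference, the two flags being compared here are the fields of the manual-seal block command. This is a hypothetical mirror of just those flags (the real type in sc-consensus-manual-seal also carries a parent hash and a result channel), included only to make the configuration being compared explicit:

```rust
// Hypothetical mirror of the manual-seal block command flags discussed
// above; not the real sc-consensus-manual-seal type.
#[derive(Debug, PartialEq)]
struct SealFlags {
    create_empty: bool, // seal a block even if the pool is empty
    finalize: bool,     // immediately finalize the sealed block
}

fn main() {
    // The configuration both we and Interlay appear to use:
    let ours = SealFlags { create_empty: true, finalize: true };
    assert!(ours.create_empty && ours.finalize);
    println!("{:?}", ours);
}
```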

@ebma
Member

ebma commented Jun 28, 2024

Hmm, you are right, the size of the transaction couldn't be it. Thanks for sharing the links.

This configuration is different though, maybe it makes a difference?

@gianfra-t
Contributor Author

Mmmm, interesting. What changes is the inherent data provider. I think those are applied before the extrinsics, but if ours takes a long time, it could make the difference. Worth a shot!

@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch 2 times, most recently from 89677c7 to 04b57ed Compare July 9, 2024 07:45
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 04b57ed to 60925d8 Compare July 9, 2024 07:58
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 7cb4fbc to 0422c1f Compare July 9, 2024 09:16
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 0422c1f to dc97520 Compare July 9, 2024 09:24
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch 2 times, most recently from 28ee579 to 95b9145 Compare July 24, 2024 15:01
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 95b9145 to f29aa46 Compare July 25, 2024 08:10
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 73f96ea to 4f49a1f Compare July 26, 2024 13:20
@gianfra-t gianfra-t force-pushed the test_issue_update_to_polkadot branch from 4f49a1f to 50ff2b9 Compare July 26, 2024 13:21
Comment on lines 2 to 6
#![allow(dead_code)]
#![allow(unused_imports)]

#![allow(dead_code)]
#![allow(unused_imports)]
Member


I don't think we need these declarations twice.

Comment on lines 1 to 2


Member


Nit: remove

@gianfra-t gianfra-t merged commit a2f1957 into polkadot-v1.1.0 Jul 26, 2024
2 of 6 checks passed
@gianfra-t gianfra-t deleted the test_issue_update_to_polkadot branch July 26, 2024 14:33
gianfra-t added a commit that referenced this pull request Jul 31, 2024
* update dependencies, fixing compilation errors (WIP)

* update pallets to 1.1.0

* fix subxt feature flags

* fix client configuration new fields

* fix clippy issues and missing std flag

* update fixes and missing configs for runtimes

* fixes to node configuration (wip)

* node client fixes

* bump subxt to minimum viable version

* use static wrapper for account_id and currency_id metadata replacement

* use account_id struct from subxt implementing Encode and Decode as type to avoid Static wrapper

* fix account type conversion using new subxt type

* missing flag, cleanup unused types

* temporary remove conditional metadata

* use foucoco metadata only

* warnings and type conversion

* re-implement hashable trait

* replace account_id on tests

* use subxt version 0.29 across crates

* modification of service config values

* testing increase in timeout for tests

* prints for testing

* bump to subxt 0.30.0

* temp remove unrelated jobs

* work with ubuntu latest

* bump to subxt 0.33

seal block periodically

seal blocks more often

seal blocks more often

block finalization force hack

* local linux tests

* testing finalizing first block solution

* setup working locally

* test only integration on ci

* cleaning up temporary changes

* small code improvements

* more comment removal

* use ubuntu latest os for main test

* Use different function to set exchange rate

* `lookup_bytes` is only used here. Removing `.clone()`.
[`fn fetch_raw(...)` accepts `Vec<u8>`](/~https://github.com/paritytech/subxt/blob/f06a95d687605bf826db9d83b2932a73a57b169f/subxt/src/storage/storage_type.rs#L51).

`fn build()` already [accepts Url](/~https://github.com/paritytech/jsonrpsee/blob/v0.20.0/client/transport/src/ws/mod.rs#L279)

* cleanup

* handle all metadata instances

* temporary disable unused code on test file

* add all-features flag to vault

* addresses the confusing `all-features` feature of `runtime` package;

Reuse the `UrlParseError` by changing it to `url::ParseError`, instead of jsonrpsee's `InvalidUri`

* comment cleanups

* use macos image for main workflow testing

* revert

* print current directory

* remove toolchain

* ran cargo +nightly-2024-02-09 fmt --all

* revert, to use ubuntu

* Revert to mac-os

This reverts commit 7f25b8f.

* uncomment failing test

* force install rustup

* use macos-13

* ignore subxt issues, and print out proof error

* Revert "ignore subxt issues, and print out proof error"

This reverts commit 08da9d6.

* Revert "Merge branch 'main' into polkadot-v1.1.0"

This reverts commit 255845c, reversing
changes made to 08da9d6.

* ignore failing subxt tests and print out proof error

* update cargo lock /~https://github.com/pendulum-chain/spacewalk/actions/runs/10079792308/job/27899770357?pr=536#step:14:24

* fix cargo.lock; update only dia-oracle

* update network

* add back iowa

* [DO NOT MERGE] Test issue on Linux due to update. (#540)

* add extra cargo.toml specially for CI testing.

* stellar relay config sdftest2 ip change

* fixing some regressions from incomplete merge

* fixing more merge issues

* missing changes from #538

* fix AccountId PrettyPrint impl

* fix compare spec version name

* fix benchmark changes merge issue

* remove when possible explicit declaration of pallets in construct_runtime macro

* cargo fmt

* remove all-features feature

* testing inherent modification

* Revert "remove when possible explicit declaration of pallets in construct_runtime macro"

This reverts commit 33a6067.

* Revert "Revert "remove when possible explicit declaration of pallets in construct_runtime macro""

This reverts commit 17f8100.

* regenerate metadata after removal of explicit pallet declaration in macro

* Revert "testing inherent modification"

This reverts commit 3345c90.

* use pendulum's polkadot-sdk fork with modified constant

* modification of stellar sdftest3 ip

* modify also vault resources sdftest3 ip

---------

Co-authored-by: Marcel Ebert <mail@marcel-ebert.de>
Co-authored-by: b-yap <2826165+b-yap@users.noreply.github.com>