Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with subscriptions on parachain balances. #61

Open
stepanLav opened this issue Jan 27, 2022 · 32 comments
Open

Problem with subscriptions on parachain balances. #61

stepanLav opened this issue Jan 27, 2022 · 32 comments
Assignees
Labels
I2-bug The node fails to follow expected behavior.

Comments

@stepanLav
Copy link

Hello!

We faced with a problem when we made a transfer and immediately reconnect and subscribe for a balance update - the last event does not come on that subscription.

Environment:
Acala Mandala network

Case looks like:

1. Connected
2. Made transfer
3. Disconnected
4. Connected again
5. Subscribe on balance update by api.query.tokens.accounts

Actual result:
The balance stay on old value and new event doesn't come.
Expected result:
Receive event when balance was changed

It's reproducing not by 100%, on my view around 25% of all tries.
For reproducing it I prepared script:

/~https://github.com/stepanLav/simple_script/blob/master/src/script.ts
Example output of this script:

Curent balance is 14800000000
Balance from subscripttion is 14800000000
Seconds after subscription: 1
Seconds after subscription: 2
Seconds after subscription: 3
...
Seconds after subscription: 58
Seconds after subscription: 59
I can't fetch balance by subscription, let's reconnect and do it again
Curent balance is 14700000000
@bkchr
Copy link
Member

bkchr commented Jan 27, 2022

But your extrinsic is always included on chain?

@stepanLav
Copy link
Author

But your extrinsic is always included on chain?

Yes, right, In the example after 60 second I create a new connection and get the balance right.

@bkchr
Copy link
Member

bkchr commented Jan 27, 2022

But sometimes the subscription works?

@stepanLav
Copy link
Author

But sometimes the subscription works?

Yes, it happens in about 25% of all cases or slightly more.

@ggwpez
Copy link
Member

ggwpez commented Jan 27, 2022

Stupid question: But should you not subscribe before any change happens?
Or is the subscription also supposed to work retroactively?

@stepanLav
Copy link
Author

api.query.tokens.accounts

Yeah, I agree that this case looks strange, but we were faced with that when adding chains that use multiasset in our app. We had no problem switching connections before when we used subscription with "api.query.system.account".
And we do reconnection when a user is changing his accounts in app. Now we have an issue when the balance for parachains with multiasset can't update and after research found this strange thing, that exactly "api.query.tokens.accounts" may "lost" some events.

@bkchr
Copy link
Member

bkchr commented Jan 27, 2022

@jacogr any idea if that is a bug in js or some bug in the rpc?

@jacogr
Copy link

jacogr commented Jan 27, 2022

If it was a JS API bug, I would be flooded across all chains. I've had one report this week around subscriptions on parachains (which works correctly on relay chains and stand-alone chains) not updating. See polkadot-js/apps#6874

So there is really something weird going on, the results reminds me of the runtime version subscription issues we had a while back. What it feels like and looks like is that when the state update happens in a block that is not finalized, it doesn't update. This would more-or-less align with the "in 25% of cases this happens" observation.

@stepanLav
Copy link
Author

@bkchr @ggwpez Guys, Can I help with something else looking for a problem? Looks like Jaco faced it also.

@jacogr
Copy link

jacogr commented Feb 3, 2022

Does the same happen for you when you don't disconnect in the middle, i.e.

//make a transfer
    await api.tx.currencies.transfer(Receiver, {'token': 'DOT'}, 100000000).signAndSend(newPair)

    /*
    //recreate connection
    await api.disconnect()
    api = await connectionCreator(url)
    */

    //subscribe on balance
    api.query.tokens.accounts(newPair.address, {'token': 'DOT'}, ({ free }) => {
        console.log(`Balance from subscripttion is ${free}`)
        if (balance_before_transfer.toString() != free.toString()){
            throw new Error("Try again, it's happening 1/5 tymes")
        }
    });

aka 1 in 5 times it doesn't work? Trying to simplify to the bare-minimum that show the behavior.

Second question - as you 100% sure that the extrinsic is accepted in the timeout you set? (i.e. state should be updated)

@stepanLav
Copy link
Author

@jacogr Thank you for your attention!

Does the same happen for you when you don't disconnect in the middle, i.e.

No, it doesn't happen. It works well and I getting messages by subscriptions.

Second question - as you 100% sure that the extrinsic is accepted in the timeout you set? (i.e. state should be updated)

Yes, I'm pretty sure the balance was changed because after 60 seconds I re-created the subscription and got the updated balance. As I said it's reproduced time to time.

await delay(60000)

console.log(`I can't fetch balance by subscription, let's reconnect and do it again...`)
clearInterval(interval)

//recreate connection and try fetch balance again
await api.disconnect()
api = await connectionCreator(url)
await print_current_balance(api, newPair)

I recorded my case, on the left is our application that subscribed to the balance, on the right is running the script:

Screen.Recording.2022-02-03.at.18.26.06.mp4

@jacogr
Copy link

jacogr commented Feb 3, 2022

Ok, if you don't disconnect and it doesn't happen, it def. doesn't point to an issue with the state_subscribeStorage RPC. Cannot see how it would in that case, it should just be consistent behavior between them.

So a valid test may be to to -

  1. subscribe with 1 API
  2. create another instance and subscribe (this is your general disconnect/reconnect step)

They should match all the way through.

At this point, not sure how it can be Substrate related if it only happens on a new connection. (Also not sure how exactly that ends up being an issue, since you re-created the provider/API, so in the case of something global stuck in the VM, then it would be an explanation, but not otherwise)

@stepanLav
Copy link
Author

stepanLav commented Feb 10, 2022

@jacogr
I made script which is reproducing problem with 100%:

/~https://github.com/stepanLav/simple_script/blob/master/src/script.ts

firstApi.query.tokens.accounts(newPair.address, { 'token': 'DOT' }, async ({ free }) => {
        console.log(`Balance from subscripttion is ${free}`)

        //make a transfer only one times
        if (++count === 1) {
            await firstApi.tx.currencies.transfer(Receiver, { 'token': 'DOT' }, 100000000).signAndSend(newPair)
        }

        //if subscribe to the balance with second connection immediately after have got updated event it lead to the problem.
        if (count === 2) {
            secondApi.query.tokens.accounts(newPair.address, { 'token': 'DOT' }, async ({ free }) => {
                console.log(`Balance from NEW subscripttion is ${free}`)
            })
        }
    });

If we subscribe to the balance using a second api instance immediately after receiving an event on the first instance, we will get the old value through the second instance, which won't be updated.

@bkchr
Copy link
Member

bkchr commented Feb 10, 2022

@stepanLav and that fails for the Acala Parachain? Did you may also try Polkadot?

@stepanLav
Copy link
Author

stepanLav commented Feb 15, 2022

@bkchr
It's reproducing for Acala Parachain on Polkadot in 100% cases.

Looks like it depends of the pallet "tokens" because I tried reproduce it on Polkadot (main network) with subscription on balance through system.account() and it doesn't reproduce.

UPD:

This is also reproduced for Statemine Parachain in Kusama for assets.account() subscription.
Example with Remark (id=8)

firstApi.query.assets.account(8, newPair.address, async ({ value }) => {
        console.log(`Balance from subscripttion is ${value}`)

        //make a transfer only one times
        if (++count === 1) {
            await firstApi.tx.assets.transfer(8, { 'Id': Receiver }, 100000000).signAndSend(newPair)
        }

        //if subscribe to the balance with second connection immediately after have got updated event it lead to the problem.
        if (count === 2) {
            secondApi.query.assets.account(8, newPair.address, async ({ value }) => {
                console.log(`Balance from NEW subscripttion is ${value}`)
            })
        }
    });

@antonkhvorov
Copy link

@jacogr @bkchr due to our observation that it can be reproduced on the parachain networks (Statemine on Kusama, Acala on Polkadot, etc), we are assuming that it is related to the block finalization on the parachains. What do you think?

@bkchr
Copy link
Member

bkchr commented Mar 1, 2022

@koute could you take a look at this?

@koute
Copy link
Contributor

koute commented Mar 2, 2022

Sure thing; I'm on it.

@koute koute self-assigned this Mar 2, 2022
@koute
Copy link
Contributor

koute commented Mar 2, 2022

I can successfully reproduce this problem on a local testnet. I'll investigate it in more detail and fix it.

@koute
Copy link
Contributor

koute commented Mar 4, 2022

Okay, I know what the issue is. This is what happens:

  1. Best block is AAA. Current balance is 1.
  2. Block BBB is imported, but is not marked as best; transfer extrinsic is executed in this block, changing the balance from 1 to 2.
  3. New subscription is created; balance of 1 is fetched from the best block (which is still AAA) and sent to the client.
  4. Block CCC (which is built on top of block BBB) becomes the best block/is finalized.

Basically the issue here is that we send the storage change notifications only when a block is imported, but the initial value we send to the subscriber when they first connect is what is in the storage for the best block, which might not be the same as the most recently imported one. And once another block becomes the best block we don't send any new storage notifications to bring the client up to speed on the current state of the storage for the current best block.

So essentially the storage change notifications are currently fundamentally broken as they neither guarantee that you'll see all of the storage changes nor guarantee that the storage changes you see will actually be finalized and stay on chain.

So I guess we could fix it in one of the following ways:

  1. Eagerly send out storage change notifications just as we do right now but send a bunch of fixup notifications when a new best block is picked in cases where that would otherwise result in an inconsistent state.
  2. Same as (1), but send fixups on block finalization.
  3. Only send out storage change notifications for finalized blocks.

I'm guessing (1) would be ideal?

And, just an idea, as an extra feature after fixing this maybe we could also add an extra flag to the storage change notifications telling the subscriber whether the change is final or not and also emit events on finalization. So, e.g. the subscriber would potentially see the following:

  1. Block A: "X changed to 1, not final" (the initial change)
  2. Block B: "X changed to 2, not final" (another block is picked as best, but still not finalized)
  3. Block B->C: "X changed to 3, not final" (new best block built on top of the previous one, still not final)
  4. Block B: "X changed to 2, final" (the previously picked block was finalized)
  5. Block B->C: "X changed to 3, not final" (repeat the 3rd notification, since the 4th overwrote it)

So that way the clients could choose whether they want to observe the most recent potentially unfinalized state, only the finalized state, or both, and could do that with a single notification stream. (Alternatively have the client tell us which kind of a stream they want: the best block one, or the finalized block one, and if they need both have them just subscribe twice. Not sure which approach would be better.)

@bkchr Before I get to fixing this just so that I don't waste my time going for the wrong approach - does this sound good to you? Anything I'm missing here?

@bkchr
Copy link
Member

bkchr commented Mar 17, 2022

We had some internal discussion about the problem as described by @koute above.

  1. Eagerly send out storage change notifications just as we do right now but send a bunch of fixup notifications when a new best block is picked in cases where that would otherwise result in an inconsistent state.

I think I would vote for this solution as you already have proposed. I would say we do it the following way:

  1. We track of which block a peer has seen the storage changes.
  2. When we change the best block to something that a peer has not yet seen, we read all the tracked keys.

Regarding supporting finalized vs best block. I don't really know, that mainly depends on how the UI is handling this. I mean the UI can already handle this by being subscribed to the storage changes and also to finalized heads. The moment we have a new finalized head, it could load the observed storage changes.

@jacogr how is polkadot js handling these storage changes? Do you have some in memory fork aware tree structure that keeps track of all the storage changes of each block?

@jacogr
Copy link

jacogr commented Mar 18, 2022

Each subscription only updates when the RPC sends it new values and then it will return those values to the subscriber. So the JS layer doesn't do anything specific to keep track, it just feeds the values through as the PRCs provides them. Basically all state changes are only done on the Node RPC layer.

So for instance, assuming some UI out there that displays a balance for an account -

  1. UI side makes a sub to the key for system account
  2. On initial connect the RPC returns the as-is value
  3. if a change is detected this is passed town the sub
  4. UI displays this new value

My guess would be that if the initial value is provided as a finalized value, it also helps.

@bkchr
Copy link
Member

bkchr commented Mar 18, 2022

Ahh! I thought you would track this based on forks and what is the best block etc.

This also means that UI currently shows anything, could be any fork or just what came later. This sounds like we should redesign this entire thing here...

The UI should probably on listen for finalized changes, but others are may still interested in all the changes, because they can use that to track every key change.

@stepanLav
Copy link
Author

@koute
Hello!
Any news about this problem? We are still encountering it.

@koute
Copy link
Contributor

koute commented Apr 11, 2022

@stepanLav Yes, this is unfortunately not that trivial to fix as there are fundamental issues with the way the storage notifications subscriptions are currently implemented, which might necessitate some sort of redesign of the API.


So I had thought about this problem some more, and basically from what I can see we have three potential requirements here:

a) Never re-execute blocks. Do not store a list of changed keys with each block.
b) Allow the user to subscribe to a stream of storage notifications without specifying all of the keys.
c) Make sure the subscribers see don't miss out on any changes and all the changes they'll (eventually) see will be finalized on the chain.

and we can only pick 2 out of 3. Right now we've essentially picked A and B, so if we want to fulfill C it leaves us either going with:

  1. get rid of requirement A: store a list of changed keys with each block (in the database?) so that we'll be able to emit fixup notifications on chain reorganization while not having to re-execute the blocks,
  2. or get rid of requirement B: essentially remove the ability from the API to subscribe to notifications without specifying the exact keys we want, so we won't need to know what changed in a block and will be able to just resend everything that the client is subscribed for when a chain reorganization happens,
  3. or resend everything that the client is subscribed for if it's a subscription for specific keys (so same as number 2), but instead of removing support for wildcard subscriptions just start sending them starting from the next finalized block, after which the client will keep track of all of the changes that are made from that point on and can handle the reorganizations itself (because it'll get notified about every block that'll be built on top of the already finalized block).

So @bkchr @jacogr which option do we go for? (Unless, of course, someone has a better idea.)

Personally I'd either go for 1 (which would keep the client simple since it doesn't have to do anything extra, and keep the API the same, but make substrate more complex), or 3 (which would keep substrate simple since we don't have to store any extra data, keep the API the same, but will make the client more complex, and will change the semantics a little). I'm guessing number 2 is not acceptable since someone somewhere surely needs this functionality?

Either way I'd be happy to push this and do the legwork as long as we can decide on something.

(I'm not getting into whether the client should follow the finalized block, or the best block, or just simply the most recent block, since in all of those cases it will still be necessary to somehow keep a consistent view of the storage. We can decide on that later.)


Appendix: why can't we keep the wildcard notifications and not track what changed in the storage for each block?

For the sake of argument consider the following simple scenario:

A──> B
└──> C

A is the finalized block. B is the best block.

The subscriber connects and subscribes to any new changes. New block D is imported on top of C, and that new D block gets finalized:

A──> B
└──> C ──> D

The client gets sent all of the changes from the D block, and that's all fine, however since there was a reorganization it's missing notifications about all of the changes from C.

And we can't get those changes because the client subscribed to any changes so we don't have a specific list of keys whose values we could fetch from the storage and send to the client, and we can't get a list of all of the changes from C since we don't store them and can't re-execute the block.

@bkchr
Copy link
Member

bkchr commented Apr 12, 2022

3. or resend everything that the client is subscribed for if it's a subscription for specific keys (so same as number 2), but instead of removing support for wildcard subscriptions just start sending them starting from the next finalized block, after which the client will keep track of all of the changes that are made from that point on and can handle the reorganizations itself (because it'll get notified about every block that'll be built on top of the already finalized block).

I think we should do this.

While I don't really get why you want to send wildcards from the next finalized block? If you have a wildcard subscription you probably do some kind of indexing, aka are interested in all the blocks. However, that is no problem. I would just ignore the wildcard subscriptions when it is about to resend missed keys. Clients that are interested in all the changes of a block can then use state_traceBlock to re-request the missed blocks. AFAIK @xlc is already doing it exactly in this way.

@koute
Copy link
Contributor

koute commented Apr 12, 2022

While I don't really get why you want to send wildcards from the next finalized block?

Mostly to discourage and lower the chance of misuse, and I think it's somewhat cleaner since it guarantees that you can reconstruct a consistent view of the storage without having to fetch any historical blocks (state_trackBlock is RPC-unsafe, while a wildcard subscription is not) and the clients don't have to special-case this. If you're only interested in what ends up finalized (as opposed to every change in every block) you must either ignore all of the changes before the next finalized block, or you need to trace all of the historical blocks back to the last finalized block. It removes a potential footgun in an RPC-safe method when you don't have access to the RPC-unsafe methods, and when you do have access to the RPC-unsafe methods you can fetch those blocks anyway.

But it is of course a tradeoff.

If you prefer to not to do this then I guess that's fine too; we'll have to then do this in the client(s) anyway though. (For those clients that use/support wildcard subscriptions; not sure how many are there.)

@jacogr
Copy link

jacogr commented Apr 12, 2022

IMHO, the wildcard subs where it means “any key changes in this block” should be unsafe.

It really has very specific use, once again, imho - it certainly is very valuable, but it is not “general use”, aka the number we actually get should be on the low side. As an aside, the light client explicitly doesn’t support this mode as I can see.

The majority of the subs are for specific keys in general dapps - “tell me when x changes”. I believe this is what we should aim to get right.

Would like to confirm with the request or behind “gimme all”, but in this case I believe we may be ok with finalized?

@koute
Copy link
Contributor

koute commented Apr 13, 2022

Considering wildcard subscriptions will require extra work on the client to get right (unless it only wants to dump all changes and doesn't care about consistency of the final state) I like the idea of making those RPC-unsafe. We can then just document the caveats and leave it as-is.

Are we sure that won't break someone's usecase though? Is everyone who's using wildcard subscriptions already using their own node which is already running in allow-unsafe-RPC mode or can be switched into it? And @jacogr does polkadot.js even support wildcard subscriptions in the first place?

@jacogr
Copy link

jacogr commented Apr 13, 2022

Since the RPC exposes it, yes, you can make wildcard subs with polkadot-js. Not on the general api.query.* interfaces there it is pin-point to a specific item (or items), but you can call the RPC directly.

I'm not quite sure it can be made unsafe on the RPC layer, well, probably not via the same interface we already have. (Granted, it has literally been a very long time since I delved into that). And yes, doing that will break it for somebody.

@bkchr
Copy link
Member

bkchr commented Apr 13, 2022

I'm also fine with making wildcard support unsafe and fix the normal use case :)

I still don't understand how wildcard support would work for finalized blocks because that would again mean we would need to store the changes that happened in a block. But as we don't want to do it, I don't really need to understand it :P

@koute
Copy link
Contributor

koute commented Apr 13, 2022

I'm also fine with making wildcard support unsafe and fix the normal use case :)

Okay, so if there are no further objections I'll start working on a PR with a fix soon-ish. (:

I still don't understand how wildcard support would work for finalized blocks because that would again mean we would need to store the changes that happened in a block. But as we don't want to do it, I don't really need to understand it :P

Yes, you'd need to store the changes, so if we don't want to store those then this cannot be done in substrate. But it can be done by the clients themselves, they just need to temporarily buffer all of the changes for the non-finalized blocks since the last finalized block, and then once a new block gets finalized retrace the path from the last finalized block through all of the the newly finalized blocks and collect all of the storage changes along the way.

If polkadot.js had some convenient API for wildcard subscriptions we'd do it there, but you have to call the RPC directly to do this anyway, so we don't have to and can just leave it up to the users'. (Especially if we'd make the wildcards unsafe and properly document that "here be dragons" so people are aware of how it works.)

@the-right-joyce the-right-joyce transferred this issue from paritytech/substrate Aug 24, 2023
@the-right-joyce the-right-joyce added I2-bug The node fails to follow expected behavior. and removed I3-bug labels Aug 25, 2023
lexnv pushed a commit that referenced this issue Apr 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I2-bug The node fails to follow expected behavior.
Projects
Status: in progress
Development

Successfully merging a pull request may close this issue.

7 participants