Fix an instance of failed to decrypt error when an in flight `/keys/query` fails. #3486
Specifically, when checking that the event sender matches who sent us the session keys, we skip waiting for pending device list updates if we already know who owns the session key.
try {
    await this.crypto.deviceList.downloadKeys([event.getSender()!], false);
} catch (e) {
    throw new DecryptionError("OLM_BAD_SENDER_CHECK_FAILED", "Could not verify sender identity", {
        sender: deviceKey,
        err: e as Error,
    });
}

senderKeyUser = this.crypto.deviceList.getUserByIdentityKey(olmlib.OLM_ALGORITHM, deviceKey);
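For readers following along, the behaviour described in this PR can be sketched in isolation roughly as follows. This is a minimal illustration under assumptions, not the actual matrix-js-sdk code: `DeviceListLike` and `resolveSenderKeyOwner` are hypothetical names, and the simplified `DecryptionError` here only carries the error code.

```typescript
// Hypothetical stand-in for the two device list calls that appear in the diff.
interface DeviceListLike {
    getUserByIdentityKey(algorithm: string, senderKey: string): string | null;
    downloadKeys(userIds: string[], forceDownload: boolean): Promise<void>;
}

// Simplified error type for this sketch only.
class DecryptionError extends Error {
    code: string;
    constructor(code: string, message: string) {
        super(message);
        this.code = code;
    }
}

const OLM_ALGORITHM = "m.olm.v1.curve25519-aes-sha2";

async function resolveSenderKeyOwner(
    deviceList: DeviceListLike,
    sender: string,
    deviceKey: string,
): Promise<string | null> {
    // Fast path: if we already know who owns the key, don't wait for any
    // pending device list updates.
    let owner = deviceList.getUserByIdentityKey(OLM_ALGORITHM, deviceKey);
    if (owner === null) {
        try {
            await deviceList.downloadKeys([sender], false);
        } catch (e) {
            // Surface the failed download under a dedicated error code.
            throw new DecryptionError("OLM_BAD_SENDER_CHECK_FAILED", "Could not verify sender identity");
        }
        // Look again now that the sender's device list has been refreshed.
        owner = deviceList.getUserByIdentityKey(OLM_ALGORITHM, deviceKey);
    }
    return owner;
}
```

With this shape, a failing in-flight download only affects senders whose key owner is not yet known locally.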
It rather feels like the value of this is pretty limited, given that we're happy to accept messages from unknown devices.
The only time this might do something different is if a key-fetch for another user happens to complete before that for the sender, and it turns out that other user is the rightful owner of the device. That seems a pretty remote chance to me.
@BillCarsonFr wdyt?
(I think that logic is correct, though I'm hesitant to remove it without understanding why we added it in the first place given this is a security thing)
Regarding the logic: here we are receiving a megolm key by successfully decrypting an olm message.
So from that we have a megolm session that is:
- owned by a curveKey
- claimed to be owned by a mxid
- claimed to be owned by an ed key
=> What we want is to validate the things that are claimed (and also sanity-check the homeserver-controlled fields that are in the original event).
The only way to do that is to download the device (`/keys/query`); we only accept devices that are correctly signed (mxid|curve|ed|device_id are signed by the ed_key). This signature "binds" the curve/ed/mxid together.
So once we have the device we can check whether the claimed values match what's in the `/keys/query` response.
And typically, the first time someone adds you and sends you an encrypted to-device message, you won't yet have downloaded their keys. => That makes it impossible to link the key to an mxid at the time the key is received.
Regarding the solution:
For m.room.keys we need to delay these checks, so that we don't download anything at all, and instead do the check at decryption time, or when the keys are finally downloaded.
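As an illustration of the validation described in this comment, here is a small sketch. It rests on assumptions: the `SignedDevice` and `ClaimedSession` shapes and the `checkClaims` function are hypothetical, not the SDK's types, and the device's self-signature is assumed to have been verified when the device was stored.

```typescript
// Hypothetical shape of a device whose self-signature has already been verified.
interface SignedDevice {
    userId: string;
    deviceId: string;
    curve25519Key: string;
    ed25519Key: string;
}

// What we learn when a megolm key arrives inside a decrypted olm message.
interface ClaimedSession {
    senderCurveKey: string; // proven by the olm decryption itself
    claimedUserId: string; // only claimed
    claimedEd25519Key: string; // only claimed
}

type ClaimCheck = "verified" | "mismatch" | "unknown_device";

function checkClaims(session: ClaimedSession, devicesByCurveKey: Map<string, SignedDevice>): ClaimCheck {
    const device = devicesByCurveKey.get(session.senderCurveKey);
    if (device === undefined) {
        // No signed device is known for this curve key, so the claims
        // cannot be confirmed or refuted yet.
        return "unknown_device";
    }
    // The signed device binds curve key, ed key and mxid together, so the
    // claimed values must agree with it.
    const matches =
        device.userId === session.claimedUserId && device.ed25519Key === session.claimedEd25519Key;
    return matches ? "verified" : "mismatch";
}
```

Because the function only needs a locally stored device, a check of this kind could run at decryption time or whenever the device list is next updated.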
Also notice that currently we can't tell the difference between a not-yet-known device, a deleted device, or a very short-lived device (hence the red warning for messages sent from a deleted device).
> The only way to do that is to download the device (/keys/query),
If we have previously stored a signed copy of the device, surely there is no need to download it again?
> Regarding the solution: We need for m.room.keys to delay these checks, so not download anything at all.
Yes, that's the ideal solution, but that's really not the point here: this PR is trying to make an incremental improvement to the existing implementation.
There are two proposed improvements here:
- Erik's change as it currently stands: if we already have a signed copy of the device, there is no need to wait for any ongoing `/keys/query` requests to complete. Can we agree this is uncontroversial?
- My later comment: given that - in this code - we treat an unknown device identically to a correctly-signed device, I don't really see the value in doing a `/keys/query` request anyway. Supposing the incoming message is from a bogus device, we're not going to get any extra information by requesting the device list from the claimed user: the bogus device isn't going to be in the list, so it is treated as "unknown".
This code has been observed to cause significant numbers of UISIs. I don't believe the current implementation offers any value, and Erik is proposing a quick win here. I think it would be a mistake to refuse it because it's not the ideal solution.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
I have spoken to @BillCarsonFr and we think it's fine to merge this as-is. Per my earlier comments, I think we could go even further, but in the interests of expediency maybe we should just land it.
Specifically, when checking that the event sender matches who sent us the session keys, we skip waiting for pending device list updates if we already know who owns the session key.
This mitigates some of element-hq/element-web#24682.
I've also stuck a try/catch across the `downloadKeys(..)` call, on the assumption that it would make it easier to track this failure mode down. I've completely made up the new `OLM_BAD_SENDER_CHECK_FAILED` error code.
Ideally, I think instead of doing it like this we'd specifically track who sent us the session keys (or at least, who we believe session keys should be from), and match against those when we do the decryption. But that's a bigger job and I think this should help a bunch while this gets fixed properly.
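To make the "ideal" approach mentioned above more concrete, here is a rough sketch of tracking key senders and checking them at decryption time. It is purely illustrative: `SessionSenderTracker`, its methods, and the `SenderCheck` result type are hypothetical names and do not exist in the SDK.

```typescript
// Hypothetical sketch: remember who sent each session key, and compare the
// event sender against that record when decrypting.
type SenderCheck = "match" | "mismatch" | "unknown_session";

class SessionSenderTracker {
    // Maps a session ID to the user ID that sent us the key for that session.
    private keySenders = new Map<string, string>();

    // Called when a room key arrives; needs no device list lookup.
    recordKeySender(sessionId: string, userId: string): void {
        this.keySenders.set(sessionId, userId);
    }

    // Called at decryption time with the sender of the encrypted event.
    checkEventSender(sessionId: string, eventSender: string): SenderCheck {
        const keySender = this.keySenders.get(sessionId);
        if (keySender === undefined) {
            return "unknown_session";
        }
        return keySender === eventSender ? "match" : "mismatch";
    }
}
```

The point of the design is that nothing in the decryption path has to wait on a `/keys/query` request; the comparison uses only data recorded when the key was received.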
Here's what your changelog entry will look like:
🐛 Bug Fixes
- Fix an instance of failed to decrypt error when an in flight `/keys/query` fails. (#3486).