-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MSC3911: Linking media to events #3911
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,281 @@ | ||||||||||||||||||||||
# MSC3911: Linking media to events | ||||||||||||||||||||||
|
||||||||||||||||||||||
(An alternative to [MSC3910](/~https://github.com/matrix-org/matrix-spec-proposals/pull/3910).) | ||||||||||||||||||||||
|
||||||||||||||||||||||
Currently, access to media in Matrix has the following problems: | ||||||||||||||||||||||
|
||||||||||||||||||||||
* The only protection for media is the obscurity of the URL, and URLs are | ||||||||||||||||||||||
easily leaked (eg accidental sharing, access | ||||||||||||||||||||||
logs). [synapse#2150](/~https://github.com/matrix-org/synapse/issues/2150) | ||||||||||||||||||||||
* Anybody (including non-matrix users) can cause a homeserver to copy media | ||||||||||||||||||||||
into its local | ||||||||||||||||||||||
store. [synapse#2133](/~https://github.com/matrix-org/synapse/issues/2133) | ||||||||||||||||||||||
* When a media event is redacted, the media it used remains visible to all. | ||||||||||||||||||||||
[synapse#1263](/~https://github.com/matrix-org/synapse/issues/1263) | ||||||||||||||||||||||
* There is currently no way to delete | ||||||||||||||||||||||
media. [matrix-spec#226](/~https://github.com/matrix-org/matrix-spec/issues/226) | ||||||||||||||||||||||
* If a user requests GDPR erasure, their media remains visible to all. | ||||||||||||||||||||||
* When all users leave a room, their media is not deleted from the server. | ||||||||||||||||||||||
|
||||||||||||||||||||||
This proposal builds on | ||||||||||||||||||||||
[MSC3916](/~https://github.com/matrix-org/matrix-spec-proposals/pull/3916) (which | ||||||||||||||||||||||
adds authentication to media download), to require that the authenticated | ||||||||||||||||||||||
user is *authorised* to access the requested media. | ||||||||||||||||||||||
|
||||||||||||||||||||||
## Proposal | ||||||||||||||||||||||
|
||||||||||||||||||||||
### Overview | ||||||||||||||||||||||
|
||||||||||||||||||||||
After an item of media is uploaded, it must be linked to an event (via | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For MLS, we may need to reference media within a to-device event, but we might be able to link it to an in-room event. I'll have to look into this more, but flagging this as something that will need to be considered. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. there's also ephemeral-ish things like user profiles which aren't directly related to events, though are duplicated/copied to membership events in most cases. |
||||||||||||||||||||||
parameters to the `/send` api). A given piece of media is only visible | ||||||||||||||||||||||
to a user if the user can see the corresponding event. | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since reference counting is required anyways, I feel like it's better if the server does it automatically. I wrote a short doc on that here: #4086 |
||||||||||||||||||||||
|
||||||||||||||||||||||
### Detailed spec changes | ||||||||||||||||||||||
|
||||||||||||||||||||||
1. A new "media upload" endpoint is defined, `POST | ||||||||||||||||||||||
/_matrix/client/v1/media/upload`. It is based on the existing | ||||||||||||||||||||||
[`/_matrix/media/v3/upload`](https://spec.matrix.org/v1.4/client-server-api/#post_matrixmediav3upload) | ||||||||||||||||||||||
endpoint, but media uploaded this way is not initially viewable (except to | ||||||||||||||||||||||
the user that uploaded it). This is referred to as a "restricted" media item. | ||||||||||||||||||||||
Comment on lines
+35
to
+39
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TODO: extend this to |
||||||||||||||||||||||
|
||||||||||||||||||||||
The existing endpoint is deprecated. Media uploaded via the existing endpoint | ||||||||||||||||||||||
is "unrestricted". | ||||||||||||||||||||||
|
||||||||||||||||||||||
2. Attaching media | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the proposal is called "Linking media to events" and "linking" terminology is used outside of this list item. is this intentional? can we agree on a single term otherwise? "attaching" seems better suited. |
||||||||||||||||||||||
|
||||||||||||||||||||||
* The methods for sending events | ||||||||||||||||||||||
([`PUT /_matrix/client/v3/rooms/{roomId}/state/{eventType}/{stateKey}`](https://spec.matrix.org/v1.4/client-server-api/#put_matrixclientv3roomsroomidstateeventtypestatekey) | ||||||||||||||||||||||
and [`PUT /_matrix/client/v3/rooms/{roomId}/send/{eventType}/{txnId}`](https://spec.matrix.org/v1.4/client-server-api/#put_matrixclientv3roomsroomidsendeventtypetxnid)) | ||||||||||||||||||||||
are extended to take a query parameter `attach_media`, whose value must be a complete `mxc:` URI. | ||||||||||||||||||||||
|
||||||||||||||||||||||
The `attach_media` parameter may be used several times to attach several | ||||||||||||||||||||||
pieces of media to the same event. The maximum number of pieces of media | ||||||||||||||||||||||
that can be attached to a single event is implementation-defined by servers. | ||||||||||||||||||||||
|
||||||||||||||||||||||
If any of the `attach_media` parameters do not correspond to known | ||||||||||||||||||||||
restricted media items, or they refer to restricted media items that have | ||||||||||||||||||||||
already been attached, the server responds with a 400 error with | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. reading this proposal top down, it is not entirely clear whether "already attached" means a different event or a duplicate query parameter. |
||||||||||||||||||||||
`M_INVALID_PARAM`. | ||||||||||||||||||||||
Comment on lines
+55
to
+58
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should there also be a limitation that one can only attach media that they uploaded? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. erm... good question. I think the answer is "yes, there should", to guard against the possibility that someone gets hold of the |
||||||||||||||||||||||
|
||||||||||||||||||||||
Sending an event in this manner associates the media with the sent | ||||||||||||||||||||||
event. From then on, the media can be seen by any user who can see the event | ||||||||||||||||||||||
itself. | ||||||||||||||||||||||
|
||||||||||||||||||||||
Servers should ensure that sending an event remains an idempotent operation: in | ||||||||||||||||||||||
particular, if a client sends an event with a media attachment, and then | ||||||||||||||||||||||
repeats the operation with identical parameters, a 200 response must be returned | ||||||||||||||||||||||
(with the original event ID) even though the media has already been attached. | ||||||||||||||||||||||
|
||||||||||||||||||||||
* Alternatively, if a restricted media item is referenced in a call to | ||||||||||||||||||||||
[`PUT /_matrix/client/v3/profile/{userId}/avatar_url`](https://spec.matrix.org/v1.4/client-server-api/#put_matrixclientv3profileuseridavatar_url) | ||||||||||||||||||||||
it is instead attached to the user's profile. | ||||||||||||||||||||||
|
||||||||||||||||||||||
Again, if the media is already attached, the server responds with a 400 error with | ||||||||||||||||||||||
`M_INVALID_PARAM`. | ||||||||||||||||||||||
|
||||||||||||||||||||||
If the media is not attached to either an event or a profile within a reasonable period | ||||||||||||||||||||||
(say, ten minutes), then the server is free to assume that the user has changed their | ||||||||||||||||||||||
Comment on lines
+76
to
+77
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. for clarity: the period starts as the media upload has finished, which is particularly relevant with async uploads? |
||||||||||||||||||||||
mind (or the client has gone offline), and may clean up the uploaded media. | ||||||||||||||||||||||
|
||||||||||||||||||||||
3. Additional checks on `/download` and `/thumbnail` endpoints | ||||||||||||||||||||||
|
||||||||||||||||||||||
The new `/download` and `/thumbnail` endpoints added in | ||||||||||||||||||||||
[MSC3916](/~https://github.com/matrix-org/matrix-spec-proposals/pull/3916) are | ||||||||||||||||||||||
updated the server must check if the requesting user or server has | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||||||
permission to see the corresponding event. If not, the server responds with | ||||||||||||||||||||||
a 403 error and `M_UNAUTHORIZED`. | ||||||||||||||||||||||
|
||||||||||||||||||||||
4. Federation API returns a `restrictions` property | ||||||||||||||||||||||
|
||||||||||||||||||||||
The `/_matrix/federation/v1/media/download` and `/_matrix/federation/v1/media/thumbnail` | ||||||||||||||||||||||
endpoints specified by [MSC3916](/~https://github.com/matrix-org/matrix-spec-proposals/pull/3916) | ||||||||||||||||||||||
are extended: the returned json object may have a property `restrictions`. | ||||||||||||||||||||||
Comment on lines
+90
to
+92
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. there are several updates required to this MSC - the priority has been to get 3916 into working order, then the team will move onto this MSC afterwards. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. MSC3916 was merged 🥳 |
||||||||||||||||||||||
|
||||||||||||||||||||||
If there is no `restrictions` property, the media is a legacy "unrestricted" | ||||||||||||||||||||||
media. Otherwise, `restrictions` should be a JSON object with one | ||||||||||||||||||||||
of the following properties: | ||||||||||||||||||||||
|
||||||||||||||||||||||
* `event_id`: the event id of the event to which the media is attached. | ||||||||||||||||||||||
* `profile_user_id`: the user id of the user to whose profile the media is attached. | ||||||||||||||||||||||
|
||||||||||||||||||||||
It is invalid for both `event_id` and `profile_user_id` to be set. | ||||||||||||||||||||||
|
||||||||||||||||||||||
The requesting server must check the restrictions list, and only return | ||||||||||||||||||||||
the requested media to users who have permission to view the relevant | ||||||||||||||||||||||
event or profile. If the requesting server caches the media, it must also | ||||||||||||||||||||||
cache the restrictions list. | ||||||||||||||||||||||
|
||||||||||||||||||||||
If neither `event_id` nor `profile_user_id` are present, the requesting | ||||||||||||||||||||||
user should assume that an unknown restriction is present, and not allow access | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||||||
to any user. | ||||||||||||||||||||||
|
||||||||||||||||||||||
An example response: | ||||||||||||||||||||||
|
||||||||||||||||||||||
``` | ||||||||||||||||||||||
Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p | ||||||||||||||||||||||
|
||||||||||||||||||||||
--gc0p4Jq0M2Yt08jU534c0p | ||||||||||||||||||||||
Content-Type: application/json | ||||||||||||||||||||||
|
||||||||||||||||||||||
{ "restrictions": { | ||||||||||||||||||||||
"event_id": "$Rqnc-F-dvnEYJTyHq_iKxU2bZ1CI92-kuZq3a5lr5Zg" | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This means that the association between (encrypted) media and event ids will be plainly accessible to the homeserver, and federated servers. I'd imagine that isn't something we actually care about but something that others may complain about. Given that /copy presumably can't actually decrypt the media, that also means that the homeserver can associate any copied encrypted media as well |
||||||||||||||||||||||
}} | ||||||||||||||||||||||
|
||||||||||||||||||||||
--gc0p4Jq0M2Yt08jU534c0p | ||||||||||||||||||||||
Content-Type: text/plain | ||||||||||||||||||||||
|
||||||||||||||||||||||
This media is plain text. Maybe somebody used it as a paste bin. | ||||||||||||||||||||||
|
||||||||||||||||||||||
--gc0p4Jq0M2Yt08jU534c0p | ||||||||||||||||||||||
``` | ||||||||||||||||||||||
|
||||||||||||||||||||||
5. New "media copy" API | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A downside of the media copy API is clients would no longer be able to reliably cache media. For the example of stickers or emoji, popular/busy rooms could have tens of references to the same media object, which causes tens of downloads because the client doesn't know different. Possible alternatives (for future MSCs, imo) would be:
I'm more preferential to the second. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. going down the rabbit hole of custom emoji, there's an interaction with MSC4027 we will have to figure out. |
||||||||||||||||||||||
|
||||||||||||||||||||||
A new endpoint is defined: `POST | ||||||||||||||||||||||
/_matrix/client/v1/media/copy/{serverName}/{mediaId}`. The body of the | ||||||||||||||||||||||
request must be a JSON object, but there are no required parameters. | ||||||||||||||||||||||
|
||||||||||||||||||||||
Conceptually, the API makes a new copy of a media item. (In practice, the | ||||||||||||||||||||||
server will probably make a new reference to an existing media item, but | ||||||||||||||||||||||
that is an implementation detail). | ||||||||||||||||||||||
|
||||||||||||||||||||||
The response is a json object with a required `content_uri` property, | ||||||||||||||||||||||
giving a new MXC URI referring to the media. | ||||||||||||||||||||||
|
||||||||||||||||||||||
The new media item can be attached to a new event, and generally functions | ||||||||||||||||||||||
in every way the same as uploading a brand new media item. | ||||||||||||||||||||||
Comment on lines
+145
to
+146
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can the user copy restricted media they don't have access to? (I assume not, but what is the appropriate error code?) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd say no, it's the same as |
||||||||||||||||||||||
|
||||||||||||||||||||||
This "copy" api is to be used by clients when forwarding events with media | ||||||||||||||||||||||
attachments. | ||||||||||||||||||||||
richvdh marked this conversation as resolved.
Show resolved
Hide resolved
Comment on lines
+148
to
+149
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We'd need to use this for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (and custom emoji, when that exists) |
||||||||||||||||||||||
|
||||||||||||||||||||||
(This mechanism, rather than just allowing clients to attach media to multiple | ||||||||||||||||||||||
events, is necessary to ensure that the list of events attached a given piece of | ||||||||||||||||||||||
media does not grow over time, which would make it difficult for servers to | ||||||||||||||||||||||
reliably cache media and impose the correct access restrictions.) | ||||||||||||||||||||||
|
||||||||||||||||||||||
5. Autogenerated `m.room.member` events | ||||||||||||||||||||||
|
||||||||||||||||||||||
Servers will generate `m.room.member` events with an `avatar_url` whenever | ||||||||||||||||||||||
one of their users joins a room, or changes their profile picture. | ||||||||||||||||||||||
richvdh marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||
|
||||||||||||||||||||||
Such events must each use a different copy of the media item, in the same | ||||||||||||||||||||||
way as the "media copy" API described above. | ||||||||||||||||||||||
|
||||||||||||||||||||||
6. Backwards compatibility mechanisms | ||||||||||||||||||||||
|
||||||||||||||||||||||
For backwards compatibility with older clients and requesting servers: | ||||||||||||||||||||||
servers may for a short time choose to allow unauthenticated access via the | ||||||||||||||||||||||
deprecated `/_matrix/media/v3` endpoints, even for restricted media. | ||||||||||||||||||||||
|
||||||||||||||||||||||
7. URL preview | ||||||||||||||||||||||
|
||||||||||||||||||||||
The | ||||||||||||||||||||||
[`/preview_url`](https://spec.matrix.org/v1.4/client-server-api/#get_matrixmediav3preview_url) | ||||||||||||||||||||||
endpoint returns an object that references an image for the previewed | ||||||||||||||||||||||
site. | ||||||||||||||||||||||
|
||||||||||||||||||||||
It is expected that servers will continue to treat such media as unrestricted | ||||||||||||||||||||||
(at least for local users), but it would be legitimate for them to, for example, | ||||||||||||||||||||||
return a different `mxc:` URI for each requesting user, and only allow each user | ||||||||||||||||||||||
access to the corresponding `mxc:` URI. | ||||||||||||||||||||||
Comment on lines
+177
to
+180
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure I see the advantage of a server issuing distinct URIs for previews. Is there a motivation for putting this here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I was trying to reason about how we might apply access restrictions to media that is part of a URL preview (if we decided that was a sensible thing to do). It may well be that the extra text confuses more than it clarifies. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. leaving url previews as a way to keep "uploading" unrestricted media would leave open the attack vector this MSC is trying to close. |
||||||||||||||||||||||
|
||||||||||||||||||||||
### Applications | ||||||||||||||||||||||
|
||||||||||||||||||||||
The following discusses the impact of this proposal on various parts of | ||||||||||||||||||||||
ecosystem: we consider the changes that will be required for existing | ||||||||||||||||||||||
implementations, and how the proposal will facilitate future extensions. | ||||||||||||||||||||||
|
||||||||||||||||||||||
#### IRC/XMPP bridges | ||||||||||||||||||||||
|
||||||||||||||||||||||
These bridges have previously been discussed in [MSC3916](/~https://github.com/matrix-org/matrix-spec-proposals/pull/3916), however this proposal adds a new problem. | ||||||||||||||||||||||
|
||||||||||||||||||||||
These bridges currently use the content repository as a paste-bin: large text | ||||||||||||||||||||||
messages are uploaded as plain-text media, and a link is then sent to the | ||||||||||||||||||||||
remote network. This will become problematic, because servers are entitled | ||||||||||||||||||||||
to remove any media that does not get linked to an event. | ||||||||||||||||||||||
|
||||||||||||||||||||||
Solutions might include: | ||||||||||||||||||||||
* the bridge hosting its own content repository for this usecase | ||||||||||||||||||||||
* using an external service | ||||||||||||||||||||||
* special-casing the bridge AS user to permit it to upload media without | ||||||||||||||||||||||
expiry. | ||||||||||||||||||||||
Comment on lines
+200
to
+201
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
other ideas:
|
||||||||||||||||||||||
|
||||||||||||||||||||||
#### Icons for "social login" flows | ||||||||||||||||||||||
|
||||||||||||||||||||||
These likewise were previously discussed in | ||||||||||||||||||||||
[MSC3916](/~https://github.com/matrix-org/matrix-spec-proposals/pull/3916). | ||||||||||||||||||||||
|
||||||||||||||||||||||
We need to ensure that the icons are not deleted from the content | ||||||||||||||||||||||
repository even though they have not been attached to any event or profile. It | ||||||||||||||||||||||
would be wise for servers to provide administrators with a mechanism to upload | ||||||||||||||||||||||
media without attaching it to anything. | ||||||||||||||||||||||
|
||||||||||||||||||||||
#### Redacting events | ||||||||||||||||||||||
|
||||||||||||||||||||||
Under this proposal, servers can determine which media is referenced by an | ||||||||||||||||||||||
event that is redacted, and add that media to a list to be cleaned up. | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem with this is that if the user who sent the media leaves the room, the homeserver may no longer get the reaction events. That means there will be no mechanism for those events to be removed. |
||||||||||||||||||||||
|
||||||||||||||||||||||
This would also apply if all users in a room are deactivated (either via a GDPR | ||||||||||||||||||||||
section 17 request or by a self-requested "Deactivate account" request). In | ||||||||||||||||||||||
this case, all events in the room, and all media referenced by them, should be | ||||||||||||||||||||||
removed. Currently, Synapse does not support removing the events (see also | ||||||||||||||||||||||
[synapse#4720](/~https://github.com/matrix-org/synapse/issues/4720)); but if at | ||||||||||||||||||||||
Comment on lines
+221
to
+222
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Synapse was fixed a few days ago, apparently :) |
||||||||||||||||||||||
some point in the future this is added, then this proposal makes it easy to | ||||||||||||||||||||||
extend to deleting the media. | ||||||||||||||||||||||
|
||||||||||||||||||||||
Fixes [synapse#1263](/~https://github.com/matrix-org/synapse/issues/1263). | ||||||||||||||||||||||
|
||||||||||||||||||||||
## Potential issues | ||||||||||||||||||||||
richvdh marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @AndrewRyanChama says:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please consider that widgets also use media, i.e. #4039 to summarize, linking media is not a big issue for events of a well-known structure referencing them - with regards to widgets, that means that clients can parse the payload and handle it automatically. however especially for widgets it is often attractive to send nonstandard events (let alone ones with some custom encoding), that prohibit this. there can be use cases (with widgets or without), where multiple media are attached to an event that represents a certain state at a point in time and which is edited collaboratively. however editing would usually involve changing only one of the referenced media items at most, resulting in an immense overhead when having to |
||||||||||||||||||||||
|
||||||||||||||||||||||
* Since each `m.room.member` references the avatar separately, changing your | ||||||||||||||||||||||
avatar will cause an even bigger traffic storm if you're in a lot of rooms. | ||||||||||||||||||||||
|
||||||||||||||||||||||
In addition, this could cause duplication of media in the remote media cache, | ||||||||||||||||||||||
if the implementation does not take steps to deduplicate (eg, storing media | ||||||||||||||||||||||
by content hash rather than media id). | ||||||||||||||||||||||
Comment on lines
+233
to
+235
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we spec a response header for the media download that provides a suitable hash of the object? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. De-duplication is already possible at an implementation level, and would probably be handled by a different MSC. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think that reivilibre wasn't talking about the way clients/homeservers are deduplicating files after download, but rather before they're downloaded. I agree that former is an implementation detail that doesn't require uniformity across software, and as such doesn't need to be covered in protocol spec. But if the intention here is to avoid the very process of downloading a file multiple times (from a different linked MXC each time), can this be done without protocol declaring one well known way to discover e.g. that hash of underlying file by HEAD request? I'm specifically thinking of a situation where you've got a user looking at sticker selector and not-so-maliciously spamming them by clicking them 100s of times. Some people do that, it isn't really a bad behavior in some rooms and contexts (slightly distant example would be custom emote spam on Twitch chats). Since the stickers sent in each event are no longer sharing the same MXC, AFAIK client (and homeserver, if sender is remote) will have to download the sticker 100s of times, and thus generate a lot of unnecessary traffic. Only then it will be able to deduplicate file for local storage/cache. So my (and I assume reivilibre's) concern here is that with the proposal as it is right now, there's no way for client/homeserver to avoid that "traffic duplication", so to speak. Or is there? 😅 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hm, on the other hand, that choice of hash type used for deduplication before download, might eventually have an effect on future implementations of dedupe after download. For example, iirc currently in Synapse there's a mapping of MXC URI to local filename which is a random string. With this MSC, each server would have to store a mapping from the new generated IDs to the local file (in Synapse's case, again, a random name). If protocol adds a way to map MXC IDs to file hash early, is there a reason for servers to not store files by naming them after their hashes (as received over e.g. federation)? This would save you one more lookup: MXC IDs -> file hash -> randomized local name. Oh, and I haven't thought how/if this "early dedupe" could realistically work with files attached to E2EE events, and whether it would add any additional metadata leak other than what's caused by this MSC as is. |
||||||||||||||||||||||
|
||||||||||||||||||||||
## Alternatives | ||||||||||||||||||||||
|
||||||||||||||||||||||
* Have the "upload" endpoint return a nonce, which can then be used in the | ||||||||||||||||||||||
"send" endpoint in place of the `mxc` uri. It's hard to see what advantage | ||||||||||||||||||||||
this gives, beyond the fact a nonce could be smaller so marginally fewer | ||||||||||||||||||||||
bytes to send. | ||||||||||||||||||||||
|
||||||||||||||||||||||
* Use some sort of "content token" for each piece of media, and require clients to | ||||||||||||||||||||||
provide it, per [MSC3910](/~https://github.com/matrix-org/matrix-spec-proposals/pull/3910). | ||||||||||||||||||||||
|
||||||||||||||||||||||
## Security considerations | ||||||||||||||||||||||
|
||||||||||||||||||||||
* Letting servers track the relationship between events and media is a metadata | ||||||||||||||||||||||
leak, especially for e2ee rooms. | ||||||||||||||||||||||
|
||||||||||||||||||||||
## Unstable prefix | ||||||||||||||||||||||
|
||||||||||||||||||||||
TODO | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (this is what MMR will be using) |
||||||||||||||||||||||
|
||||||||||||||||||||||
## Dependencies | ||||||||||||||||||||||
|
||||||||||||||||||||||
[MSC3916](/~https://github.com/matrix-org/matrix-spec-proposals/issues/3916) "Authentication for media access, and new endpoint names". | ||||||||||||||||||||||
|
||||||||||||||||||||||
## Prior art | ||||||||||||||||||||||
|
||||||||||||||||||||||
* Credit: this proposal is based on ideas from @jcgruenhage and @anoadragon453 at | ||||||||||||||||||||||
https://cryptpad.fr/code/#/2/code/view/oWjZciD9N1aWTr1IL6GRZ0k1i+dm7wJQ7juLf4tJRoo/ | ||||||||||||||||||||||
|
||||||||||||||||||||||
* [MSC~~701~~3796](/~https://github.com/matrix-org/matrix-spec-proposals/issues/3796): | ||||||||||||||||||||||
a predecessor of this proposal | ||||||||||||||||||||||
|
||||||||||||||||||||||
* [MSC2461](/~https://github.com/matrix-org/matrix-spec-proposals/pull/2461): | ||||||||||||||||||||||
adds per-user authentication but does not attempt to restrict access to | ||||||||||||||||||||||
individual items of media. | ||||||||||||||||||||||
|
||||||||||||||||||||||
* [MSC2278](/~https://github.com/matrix-org/matrix-spec-proposals/pull/2278): | ||||||||||||||||||||||
Deleting attachments for expired and redacted messages | ||||||||||||||||||||||
|
||||||||||||||||||||||
* [MSC1902](/~https://github.com/matrix-org/matrix-spec-proposals/pull/1902): | ||||||||||||||||||||||
Split the media repo into s2s and c2s parts | ||||||||||||||||||||||
|
||||||||||||||||||||||
* [MSC2846](/~https://github.com/matrix-org/matrix-spec-proposals/pull/2846): | ||||||||||||||||||||||
Decentralizing media through CIDs | ||||||||||||||||||||||
|
||||||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since authenticated media landed, I think