-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Internal error when posting an event with \u0000 #9341
Comments
For the record I got this when trying to send logs from the docker api to matrix. Docker adds an header on each line to determine if it comes from stdout or stderr, and this header contains NULL bytes, which got encoded by my code to |
We're also seeing this on matrix.org occassionally: https://sentry.matrix.org/sentry/synapse-matrixorg/issues/166204/ Not quite sure what the solution is though. |
I guess you could check at the validation step (if there is) if the string contains a null byte, and return an error then. |
Shouldn't we store JSON data as bytes, anyways? It's untrusted data. |
The problem seems to be related to room message search: this is not related to the JSON encoding. This occurs when Synapse inserts the plaintext of a message into the We could either omit search index entries for messages with null bytes or perhaps replace them with e.g. a space character (to preserve it being a word boundary character — I think replacing the message text with spaces for the |
Some relevant bits of postgres' documentation:
and refers to this page as a reference on character sets. Aside: I think if we wanted to go full pedant on this:
But that's going by the letter of the law. Replacing U+0000 with good old U+0020 space is probably the best use of our time! |
Another thought: has anyone tried registering a user with a null code point? |
The docs for what we ask people to do for postgres are at /~https://github.com/matrix-org/synapse/blob/251cfc4e09c36c87227771f1a3aa305c4b0a2121/docs/postgres.md#set-up-database They seem to suggest an encoding of UTF-8 and C-locale. Not using those is known to cause issues. Does postgres have an additional charset configuration beyond that? |
The docs recommend: I think 0x00 → 0x20 is the sensible choice.
I guess you're talking about the user directory? (Equally, the room directory will be potentially affected too) |
I think this assumes that the data going in is readable text? I don't think this is valid. It could make more sense to escape it, but than you're dealing with needing to have escape logic which is always a pain. 😢 |
How would you escape it, FWIW? Treating NUL as a delimiter (word boundary) seems like the pragmatic approach (that's all that converting it to a space before feeding it to (Though since the message is unlikely to be readable text, I can see the option of just exempting it from message search altogether to be sensible, too.) |
This internal error just happened to sbd. when joining #conduit:fachschaften.org (the main Conduit room) with Synapse:
|
That error looks to be coming from a different transaction to the one discussed above: synapse/synapse/storage/databases/main/events.py Lines 1666 to 1681 in 51e2db3
|
@DMRobertson … which probably means this issue being closed by #10820 does not mean that the underlying issue for that Conduit <> Synapse issue was fixed, right? |
I'd recommend filing a new issue, it seems unrelated enough to the above conversation that we should discuss it separately! |
Agreed!
Yes please! |
Description
Although
"\u0000"
is a perfectly valid JSON string, it seems to fail when storing in the database.Steps to reproduce
Error seen in the log file:
Version information
If not matrix.org:
Version: 1.26.0
Install method: apt
The text was updated successfully, but these errors were encountered: