Initial Sync loop #19

Closed
redsky17 opened this issue Feb 11, 2019 · 14 comments
Labels
bug Something isn't working

Comments

@redsky17
Member

[2019-02-11 10:43:16.041] [db] [error] failed to save state after initial sync: mdb_dbi_open: MDB_DBS_FULL: Environment maxdbs limit reached
[2019-02-11 10:43:16.041] [net] [info] trying initial sync

@redsky17 redsky17 added the bug Something isn't working label Feb 11, 2019
AndrewJDR added a commit to AndrewJDR/nheko that referenced this issue Feb 17, 2019
AndrewJDR added a commit to AndrewJDR/nheko that referenced this issue Feb 17, 2019
redsky17 added a commit that referenced this issue Feb 18, 2019
Attempt to fix issue #19 by increasing the lmdb max_dbs setting.
@AndrewJDR
Contributor

AndrewJDR commented Feb 19, 2019

I was able to successfully log in with this patch in place! I think you can safely merge this to master.

@redsky17
Member Author

I'm glad it worked for you and I'll merge the PR, but I want to leave this issue open. I have a feeling that there's some greater design issue that needs to be addressed.

@rnhmjoj
Contributor

rnhmjoj commented Mar 19, 2019

I made the mistake of deleting the nheko cache and I can't seem to log in anymore.
I have a different error, though:

[2019-03-19 20:33:47.052] [crypto] [info] creating new olm account
[2019-03-19 20:33:47.105] [crypto] [info] ed25519   : <deleted>
[2019-03-19 20:33:47.105] [crypto] [info] curve25519: <deleted>
[2019-03-19 20:33:47.105] [crypto] [info] generating one time keys
[2019-03-19 20:33:48.324] [net] [info] uploaded 50 signed_curve25519 one-time keys
[2019-03-19 20:33:48.324] [net] [info] trying initial sync
[2019-03-19 20:33:59.237] [net] [error] initial sync error: 2067873536 
[2019-03-19 20:33:59.237] [net] [info] trying initial sync
[2019-03-19 20:34:10.264] [net] [error] initial sync error: 2067873536 M_UNRECOGNIZED
[2019-03-19 20:34:10.264] [net] [info] trying initial sync
[2019-03-19 20:34:21.630] [net] [error] initial sync error: 2067873536 
[2019-03-19 20:34:21.630] [net] [info] trying initial sync
[2019-03-19 20:34:32.990] [net] [error] initial sync error: 2067873536 

@deepbluev7
Member

This issue could also be related to matrix-org/synapse#4898. Matrix.org (and probably other servers using federation workers) sends malformed m.read events over federation, which leads to malformed events being sent to clients. Basically, you get a json::type_error at https://github.com/Nheko-Reborn/mtxclient/blob/5422d281bd8d1ae5db14b5a8e0c4093decbedeb9/lib/structs/responses/sync.cpp#L72 while parsing the timestamp field.

Because there is no handling of parse errors in initialSync, Nheko just loops forever trying to sync.
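
To illustrate the failure mode described above, here is a minimal sketch of a retry policy that stops on parse errors instead of retrying them indefinitely. It is not Nheko's or mtxclient's code: `SyncResult`, `SyncOutcome`, and `runInitialSync` are hypothetical names, and the split between "transient" and "permanent" errors is an assumption made for the example.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical result of one sync attempt (not an mtxclient type).
enum class SyncResult { Ok, NetworkError, ParseError };

// Hypothetical overall outcome reported to the caller.
enum class SyncOutcome { Synced, GaveUpAfterRetries, AbortedOnParseError };

// Calls `attempt` until it succeeds, reports a parse error,
// or `maxAttempts` tries have been used up.
SyncOutcome runInitialSync(const std::function<SyncResult()> &attempt,
                           std::size_t maxAttempts)
{
    for (std::size_t i = 0; i < maxAttempts; ++i) {
        switch (attempt()) {
        case SyncResult::Ok:
            return SyncOutcome::Synced;
        case SyncResult::ParseError:
            // Assumption: the server would likely send the same
            // unparsable response again, so retrying cannot help.
            return SyncOutcome::AbortedOnParseError;
        case SyncResult::NetworkError:
            break; // treated as transient: fall through to the next try
        }
    }
    return SyncOutcome::GaveUpAfterRetries;
}
```

With a policy like this, a response that cannot be parsed would surface as a visible error after one attempt rather than as an endless "trying initial sync" loop in the log.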

@rnhmjoj
Contributor

rnhmjoj commented Mar 24, 2019

I also noted I can't receive any new messages until I restart nheko, so it's not limited to the initial sync.

@deepbluev7
Member

Well, nheko can't parse some ephemeral events (read receipts) properly, so I wouldn't be surprised if that breaks all of the sync code, i.e. also receiving normal messages. I get quite a few messages about wrong timestamp fields when I'm already logged in with my workaround (see Nheko-Reborn/mtxclient#10 for my changes).
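
One general way to keep a single bad receipt from breaking a whole sync response is to skip the malformed entries and keep the rest. The sketch below shows that idea with the standard library only. It is not the change made in Nheko-Reborn/mtxclient#10: `RawReceipt`, `Receipt`, `parseTimestamp`, and `parseReceipts` are hypothetical, and the timestamp is modeled as raw text rather than JSON to keep the example self-contained.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical, simplified stand-in for a receipt as received.
struct RawReceipt {
    std::string event_id;
    std::string ts; // raw timestamp text, possibly malformed
};

// Hypothetical validated receipt.
struct Receipt {
    std::string event_id;
    std::uint64_t ts;
};

// Returns std::nullopt unless the whole string is a non-negative integer.
std::optional<std::uint64_t> parseTimestamp(const std::string &text)
{
    std::uint64_t value = 0;
    const char *begin = text.data();
    const char *end = begin + text.size();
    auto [ptr, ec] = std::from_chars(begin, end, value);
    if (ec != std::errc{} || ptr != end)
        return std::nullopt;
    return value;
}

// Keeps well-formed receipts and counts malformed ones instead of throwing.
std::vector<Receipt> parseReceipts(const std::vector<RawReceipt> &raw,
                                   std::size_t &skipped)
{
    std::vector<Receipt> parsed;
    skipped = 0;
    for (const auto &r : raw) {
        if (auto ts = parseTimestamp(r.ts))
            parsed.push_back({r.event_id, *ts});
        else
            ++skipped; // a real client would probably log this
    }
    return parsed;
}
```

Counting the skipped entries, rather than dropping them silently, keeps the problem visible in logs while normal messages continue to arrive.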

@rnhmjoj
Contributor

rnhmjoj commented Mar 25, 2019

I confirm it solved the issue completely. Thank you!

@redsky17
Member Author

redsky17 commented May 6, 2019

I think this has been sufficiently resolved at this point.

@redsky17 redsky17 closed this as completed May 6, 2019
@bjesus

bjesus commented Sep 24, 2020

I'm getting the same (?) error with the most recent version from git. I can't log in at all using Nheko. It used to work fine for me with previous builds.

[2020-09-24 12:42:32.394] [net] [info] initial sync completed
[2020-09-24 12:42:32.867] [db] [info] mark room !XXX:matrix.org as encrypted
[2020-09-24 12:42:33.591] [db] [error] failed to save state after initial sync: mdb_dbi_open: MDB_DBS_FULL: Environment maxdbs limit reached
[2020-09-24 12:42:33.591] [net] [info] trying initial sync
[json.exception.out_of_range.403] key 'device_id' not found
{
  "content": {
    "algorithm": "m.megolm.v1.aes-sha2",
    "ciphertext": "XXX",
    "sender_key": "XXX",
    "session_id": "XXX"
  },
  "event_id": "$XXX",
  "origin_server_ts": 1598530409679,
  "sender": "@chat:XXX.com",
  "type": "m.room.encrypted",
  "unsigned": {
    "age": 2413780228
  }
}

@deepbluev7
Member

@bjesus: Are you in a lot of rooms, like 1000 or so? Or is that message about device_id not being found repeating itself a lot while you are stuck in the initial sync loop? In the former case, bumping MAX_DBS should help (we need to increase that anyway, I think). In the latter case, that should be easily fixable, but I need to dig into why this is actually still failing and not caught immediately.

@bjesus

bjesus commented Sep 24, 2020

I have about 250 rooms and 750 direct chats, yes. How do I bump MAX_DBS? Is it an environment variable? And what should I set it to? Thank you @deepbluev7!

@deepbluev7
Member

It's currently a hardcoded variable here: https://github.com/Nheko-Reborn/nheko/blob/master/src/Cache.cpp#L51

Nheko uses multiple dbs per room (direct chats are also rooms). Arguably this should be much higher, since we recently added a few additional dbs per room and Nheko should (at least at some point) scale to 10k rooms or more. I'll investigate what a reasonable maximum is and whether this should be configurable by the user.
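
As a rough way to reason about the limit, the number of named databases grows linearly with the room count. The sketch below does that arithmetic. The per-room and global counts passed in are assumptions for illustration, not Nheko's real figures, and `requiredNamedDbs` and `fitsWithin` are hypothetical helpers. In LMDB itself the limit is set with `mdb_env_set_maxdbs`, which has to be called after `mdb_env_create` and before `mdb_env_open`.

```cpp
#include <cstddef>

// Estimated number of named databases for an account.
// `dbsPerRoom` and `globalDbs` are caller-supplied assumptions.
constexpr std::size_t requiredNamedDbs(std::size_t rooms,
                                       std::size_t dbsPerRoom,
                                       std::size_t globalDbs)
{
    return rooms * dbsPerRoom + globalDbs;
}

// True if the estimate stays within the configured maxdbs limit.
constexpr bool fitsWithin(std::size_t maxDbs,
                          std::size_t rooms,
                          std::size_t dbsPerRoom,
                          std::size_t globalDbs)
{
    return requiredNamedDbs(rooms, dbsPerRoom, globalDbs) <= maxDbs;
}
```

For example, an account with 1000 rooms (250 rooms plus 750 direct chats), an assumed 8 databases per room, and an assumed 10 global databases would need `requiredNamedDbs(1000, 8, 10)`, i.e. 8010 slots. If the limit were lower than that estimate, `mdb_dbi_open` would fail with `MDB_DBS_FULL` as in the logs above.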

@bjesus

bjesus commented Sep 24, 2020

Thanks! I've changed it to 38092UL (just guessing) and now Nheko works perfectly.

@deepbluev7
Member

@bjesus: I'll try to fix that soon. I'm pretty sure you have one of the largest accounts that ever used Nheko. I'll definitely need to figure out a proper solution to the max db thingy.

deepbluev7 added a commit that referenced this issue Dec 17, 2022
Backtrace:

Thread 1 "nheko" received signal SIGSEGV, Segmentation fault.
containerWidget (w=w@entry=0x0) at /usr/src/debug/dev-qt/qtwidgets-5.15.7/qtbase-everywhere-src-5.15.7/src/widgets/styles/qstylesheetstyle.cpp:2467
2467        if (const QAbstractScrollArea *sa = qobject_cast<const QAbstractScrollArea *>(w->parentWidget())) {
(gdb) bt
 #0  containerWidget(QWidget const*) (w=w@entry=0x0) at /usr/src/debug/dev-qt/qtwidgets-5.15.7/qtbase-everywhere-src-5.15.7/src/widgets/styles/qstylesheetstyle.cpp:2467
 #1  0x00007ffff4aa0ad6 in QStyleSheetStyle::drawPrimitive(QStyle::PrimitiveElement, QStyleOption const*, QPainter*, QWidget const*) const (this=0x555559917900, pe=<optimized out>, opt=0x55555ea4b5c0, p=0x7fffffffcfd0, w=0x0) at /usr/src/debug/dev-qt/qtwidgets-5.15.7/qtbase-everywhere-src-5.15.7/src/widgets/styles/qstylesheetstyle.cpp:4452
 #2  0x00007fff61d4a86b in KQuickStyleItem::paint(QPainter*) (this=this@entry=0x55555ea4a1e0, painter=painter@entry=0x7fffffffcfd0) at /usr/src/debug/kde-frameworks/qqc2-desktop-style-5.101.0/qqc2-desktop-style-5.101.0/plugin/kquickstyleitem.cpp:1667
 #3  0x00007fff61d4b22a in KQuickStyleItem::updatePolish() (this=0x55555ea4a1e0) at /usr/src/debug/kde-frameworks/qqc2-desktop-style-5.101.0/qqc2-desktop-style-5.101.0/plugin/kquickstyleitem.cpp:1928
 #4  0x00007ffff57717c2 in QQuickWindowPrivate::polishItems() (this=0x55555ea2e760) at /usr/src/debug/dev-qt/qtdeclarative-5.15.7-r1/qtdeclarative-everywhere-src-5.15.7/src/quick/items/qquickwindow.cpp:393
 #5  0x00007ffff570f4ef in QSGThreadedRenderLoop::polishAndSync(QSGThreadedRenderLoop::Window*, bool) (this=this@entry=0x5555598eb770, w=w@entry=0x7fffe000aef0, inExpose=inExpose@entry=true) at /usr/src/debug/dev-qt/qtdeclarative-5.15.7-r1/qtdeclarative-everywhere-src-5.15.7/src/quick/scenegraph/qsgthreadedrenderloop.cpp:1576
 #6  0x00007ffff5710a8e in QSGThreadedRenderLoop::handleExposure(QQuickWindow*) (this=0x5555598eb770, window=<optimized out>) at /usr/src/debug/dev-qt/qtdeclarative-5.15.7-r1/qtdeclarative-everywhere-src-5.15.7/src/quick/scenegraph/qsgthreadedrenderloop.cpp:1374
 #7  0x00007ffff43d2b45 in QWindow::event(QEvent*) (this=0x7fffe0006eb0, ev=<optimized out>) at /usr/src/debug/dev-qt/qtgui-5.15.7-r1/qtbase-everywhere-src-5.15.7/src/gui/kernel/qwindow.cpp:2450
 #8  0x00007ffff49ee481 in QApplicationPrivate::notify_helper(QObject*, QEvent*) (this=<optimized out>, receiver=0x7fffe0006eb0, e=0x7fffffffd460) at /usr/src/debug/dev-qt/qtwidgets-5.15.7/qtbase-everywhere-src-5.15.7/src/widgets/kernel/qapplication.cpp:3637
 #9  0x00007ffff3e2d618 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (receiver=0x7fffe0006eb0, event=0x7fffffffd460) at /usr/src/debug/dev-qt/qtcore-5.15.7/qtbase-everywhere-src-5.15.7/src/corelib/kernel/qcoreapplication.cpp:1064
 #10 0x00007ffff43c8368 in QGuiApplicationPrivate::processExposeEvent(QWindowSystemInterfacePrivate::ExposeEvent*) (e=0x55555f648b30) at /usr/src/debug/dev-qt/qtgui-5.15.7-r1/qtbase-everywhere-src-5.15.7/src/gui/kernel/qguiapplication.cpp:3261
 #11 0x00007ffff43a55ab in QWindowSystemInterface::sendWindowSystemEvents(QFlags<QEventLoop::ProcessEventsFlag>) (flags=flags@entry=...) at /usr/src/debug/dev-qt/qtgui-5.15.7-r1/qtbase-everywhere-src-5.15.7/src/gui/kernel/qwindowsysteminterface.cpp:1169
 #12 0x00007fffef102622 in xcbSourceDispatch(GSource*, GSourceFunc, gpointer) (source=<optimized out>) at /usr/src/debug/dev-qt/qtgui-5.15.7-r1/qtbase-everywhere-src-5.15.7/src/plugins/platforms/xcb/qxcbeventdispatcher.cpp:105
 #13 0x00007ffff386d030 in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0
 #14 0x00007ffff386d2d8 in  () at /usr/lib64/libglib-2.0.so.0
 #15 0x00007ffff386d36f in g_main_context_iteration () at /usr/lib64/libglib-2.0.so.0
 #16 0x00007ffff3e80e55 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (this=0x5555596a4770, flags=...) at /usr/src/debug/dev-qt/qtcore-5.15.7/qtbase-everywhere-src-5.15.7/src/corelib/kernel/qeventdispatcher_glib.cpp:423
 #17 0x00007ffff3e2c00b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (this=this@entry=0x7fffffffd700, flags=..., flags@entry=...) at /usr/src/debug/dev-qt/qtcore-5.15.7/qtbase-everywhere-src-5.15.7/include/QtCore/../../src/corelib/global/qflags.h:69
 #18 0x00007ffff3e344ea in QCoreApplication::exec() () at /usr/src/debug/dev-qt/qtcore-5.15.7/qtbase-everywhere-src-5.15.7/include/QtCore/../../src/corelib/global/qflags.h:121
 #19 0x00005555594a5c43 in main(int, char**) (argc=2, argv=0x7fffffffdab8) at /home/nicolas/Dokumente/devel/open-source/nheko/src/main.cpp:401
(gdb) p w
$1 = (const QWidget *) 0x0
Lymkwi pushed a commit to Lymkwi/nheko that referenced this issue May 1, 2024

5 participants