Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(feature): use sqlite as backing database #200

Merged
merged 10 commits into from
Feb 29, 2020
Merged

(feature): use sqlite as backing database #200

merged 10 commits into from
Feb 29, 2020

Conversation

jethrokuan
Copy link
Member

@jethrokuan jethrokuan commented Feb 27, 2020

Addresses #199. We move the link storage to a SQL database, to aid maintainability. Here are some pros and cons.

Pros

  1. Full cache rebuilds only happen once (at first use). All further cache updates are incremental. How?

The cache computes the sha1 of the buffer contents. If buffer contents have changed, the relevant parts of the caches are updated. If the files are no longer there, they are also removed from the cache. That is, the files table allows us to maintain a consistent view of the cache.

The cache is also persisted to disk, which means no startup time.

Since full cache rebuilds happen only once, we may also remove the dependency on async, and all the potential bugs that come with it.

  1. Since the cache is stored outside of Emacs, it can be used across multiple Emacs instances.

  2. The code is greatly simplified, and a layer of data abstraction is helpful for maintainability.

  3. We can use the sqlite database for other fancy things (full text search)

Cons

  1. Likely to be slower than the current approach of having them in-memory in Emacs. From my testing so far, I have not noticed any lag. PKMs generally don't grow very big, at least not big enough for SQL to become too slow.

  2. Additional dependency on sqlite and emacsql. Not a big deal.

  3. The org-roam databases are NOT machine-independent yet. The absolute file paths are still stored in the cache. However, org-roam-build-cache will ensure that it still works when changing machines, although that means a full cache rebuild. Ideally, we'd store relative paths. That at least makes the database work on the same platforms (Unix, Windows).

Call for testers

I've been running on this the whole day, and it's been nice and fine so far. I'll likely leave this up for a while. I think this is going to break most PRs, and I want to have this merged in first.

@jethrokuan jethrokuan changed the title SQLite Org-roam DB WIP: SQLite Org-roam DB Feb 27, 2020
@jethrokuan jethrokuan mentioned this pull request Feb 27, 2020
1 task
@jethrokuan jethrokuan changed the title WIP: SQLite Org-roam DB (feature): use sqlite as backing database Feb 28, 2020
@alanz
Copy link
Contributor

alanz commented Feb 28, 2020

Likely to be slower than the current approach of having them in-memory in Emacs.

Hopefully this is a one-time degradation, which then remains mostly constant as the size of the roam db grows. And memory usage gets capped too.

@jethrokuan
Copy link
Member Author

Hopefully this is a one-time degradation, which then remains mostly constant as the size of the roam db grows. And memory usage gets capped too.

The size of the Roam DB grows really slowly (all text, from one person). RAM usage is basically nothing. As for disk memory, it's now at 400kb for my database.

@technovangelist
Copy link

with this setup, on quit I am now asked about quiting active processes. I assume its ok to say yes to that, right?

@jethrokuan
Copy link
Member Author

with this setup, on quit I am now asked about quiting active processes. I assume its ok to say yes to that, right?

Yes, but it's not a graceful kill of the database process. Maybe a hook should be added to kill-emacs to call for a DB connection shutdown.

@jethrokuan
Copy link
Member Author

So the check for active processes happens before hooks for kill-emacs are run, which means the popup will still show, but now the shutdown is graceful.

@jethrokuan jethrokuan merged commit 0c2aaad into develop Feb 29, 2020
@jethrokuan jethrokuan deleted the feat/sqlite branch February 29, 2020 07:56
@alphapapa
Copy link

@jethrokuan Not sure if this would necessarily be appropriate for this case, but I've used this before, and if it works here, it would be simpler. So FYI:

38.11 Querying Before Exit

When Emacs exits, it terminates all its subprocesses. For subprocesses
that run a program, it sends them the ‘SIGHUP’ signal; connections are
simply closed. Because subprocesses may be doing valuable work, Emacs
normally asks the user to confirm that it is ok to terminate them. Each
process has a query flag, which, if non-‘nil’, says that Emacs should
ask for confirmation before exiting and thus killing that process. The
default for the query flag is ‘t’, meaning do query.

-- Function: process-query-on-exit-flag process
This returns the query flag of PROCESS.

-- Function: set-process-query-on-exit-flag process flag
This function sets the query flag of PROCESS to FLAG. It returns
FLAG.

 Here is an example of using ‘set-process-query-on-exit-flag’ on a
 shell process to avoid querying:

      (set-process-query-on-exit-flag (get-process "shell") nil)
           ⇒ nil

-- User Option: confirm-kill-processes
If this user option is set to ‘t’ (the default), then Emacs asks
for confirmation before killing processes on exit. If it is ‘nil’,
Emacs kills processes without confirmation, i.e., the query flag of
all processes is ignored.

@jethrokuan
Copy link
Member Author

For subprocesses that run a program, it sends them the ‘SIGHUP’ signal; connections are simply closed.

SIGHUP actually sounds alright. So all that extra machinery is redundant.

@jackbaty
Copy link

How long should the db take to build, first time? An org-roam.db file is there, but after 20 minutes it remains empty and I see...

Error in post-command-hook (org-roam--maybe-update-buffer): (error "[Org-roam] your cache isn’t built yet! Please wait.")

Seems like it's trying, but I'm guessing 20 minutes isn't normal :). I have eventually restarted Emacs but still nothing.

@jackbaty
Copy link

I should perhaps mention that my roam directory has fewer than 200 files.

@jethrokuan
Copy link
Member Author

jethrokuan commented Feb 29, 2020

@jackbaty 5 seconds maybe? Mine has more than that, and takes 5 seconds. Probably a bug somewhere. Do the following steps in order:

(org-roam--db-close-all)
;; delete the org-roam.db file
(org-roam-build-cache)

Edit: you probably didn't manually run org-roam-build-cache once

@jackbaty
Copy link

That did it, thanks! I'm not sure what I might have done to confuse things first launch so I may not be much help with debugging, sorry.

One thing I did notice after my last comment is that today's file was in the db, but it was the only one.

@jethrokuan
Copy link
Member Author

Since org-roam-build-cache runs pretty quickly when the cache is already built, maybe I should run it automatically.

@dbjg
Copy link

dbjg commented Feb 29, 2020

It alls seems to work quite well. I haven't had any problems with upgrading. Is there anything else that I could test?

@jethrokuan
Copy link
Member Author

awesome, that's good to hear! I guess just keep on the lookout any performance degradations or bugs. I'm going to make a couple more breaking changes and simplifications, and then probably release it as 1.0.0.

@alanz
Copy link
Contributor

alanz commented Feb 29, 2020

I also had to rebuild the graph the first time, I presume as a "first time" step.

When I run org-roam-show-graph it only shows links that were added after the sqlite db came into action, not for the prior ones. Is this expected?

@jethrokuan
Copy link
Member Author

the graph produced is a materialized view of what the SQL database contains, so I suppose so.

@alanz
Copy link
Contributor

alanz commented Feb 29, 2020

Is there a way to scan the directory and pick up the links? My mental model of the system assumed this would be the case, or was the case with the prior version. I admit I did not use the graph before, so I could be off base here.

@jethrokuan
Copy link
Member Author

Sorry, I don't really understand. Is the graph not showing you the same as what you're seeing before? Because it should be exactly the same.

@alanz
Copy link
Contributor

alanz commented Feb 29, 2020

I did not use the graph before today.

But the graph only shows links between items that I added today, after the sqlite db update came into use, nothing from my earlier notes that I added.

@jethrokuan
Copy link
Member Author

that's not right. Could you run org-roam-build-cache to see if that changes?

@alanz
Copy link
Contributor

alanz commented Feb 29, 2020

I had already done org-roam-build-cache a few times.

I just explored the db with sqlitebrowser, and it does show up with 368 entries, 285 file_links betweeen them. All four tables are populated, except the refs one, which I presume I am not needing.

So I guess sqlite is not the problem.

@alanz
Copy link
Contributor

alanz commented Feb 29, 2020

Digging further, the graph.dot file is fully populated, the svg does not include all of it. I probably have a char needing escaping somewhere, will dig in, and create a separate bug report when I find something.

@jethrokuan
Copy link
Member Author

seems like a bug there, could be a double quote char messing up the dot output.

@alanz
Copy link
Contributor

alanz commented Feb 29, 2020

Yes, that is exactly what I found

@technovangelist
Copy link

i tried the graph before using the sqlite db and then after. before all the docs were there and after only some were. I have no docs with quotes in the titles. where do i look for the dot file?

@jethrokuan
Copy link
Member Author

where do i look for the dot file?

It's at /tmp/graph.dot, but the issue that @alanz reported probably existed before the switch to sqlite.

juergenhoetzel added a commit to juergenhoetzel/org-roam that referenced this pull request Mar 8, 2020
Graceful shutdown is already guaranteed via `kill-emacs-hook`.

Refs org-roam#200
juergenhoetzel added a commit to juergenhoetzel/org-roam that referenced this pull request Mar 8, 2020
Graceful shutdown is already guaranteed via `kill-emacs-hook`.

Refs org-roam#200
juergenhoetzel added a commit to juergenhoetzel/org-roam that referenced this pull request Mar 8, 2020
Graceful shutdown is already guaranteed via `kill-emacs-hook`.

Refs org-roam#200
jethrokuan added a commit that referenced this pull request Mar 8, 2020
Graceful shutdown is already guaranteed via `kill-emacs-hook`.

Refs #200

Co-authored-by: Jethro Kuan <jethrokuan95@gmail.com>
@skizye
Copy link

skizye commented Mar 17, 2020

The documentation hasn't been updated on how to install, so when using current instructions with latest master there are errors on startup about emacsql not found. Any ideas, or can the documentation be updated?

@jethrokuan
Copy link
Member Author

If you're using something like straight to install, the dependencies are pulled automatically for you. If not, you'll have to install them manually, or wait for the MELPA release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants