Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(redb-store): Optimization for small file import in redb store #2062

Merged
merged 176 commits into from
Mar 19, 2024

Conversation

rklaehn
Copy link
Contributor

@rklaehn rklaehn commented Mar 7, 2024

Description

Optimization for small file import in redb store.

This batches writes and reads up to some delay and vastly improves performance especially for writes. Linux kernel import goes from minutes to seconds.

Notes & open questions

Currently delete is just a normal operation. I think it should be changed to a toplevel operation because crashing with a delete transaction being not committed would lead to inconsistent state: files would be gone on disk and there in the db.

File deletions of any of the "owned" files are now handled outside the transaction. That requires keeping track of deletes and undeletes within a transaction. E.g. if within a single transaction it is determined that hash 0xabcd is to be deleted, it gets marked for deletion, then it gets created again. We must not delete the file at the end of the transaction.

Change checklist

  • Self-review.
  • Documentation updates if relevant.
  • Tests if relevant.

Use Self::POSTCARD_MAX_SIZE for fixed size
...to allow for stores that use io even for the lookup, such as a redb store.

Also use iroh_base::Hash instead of blake3::Hash, and make gc delete use batches.
# Conflicts:
#	iroh-bytes/src/get/db.rs
#	iroh/src/downloader/get.rs
# Conflicts:
#	iroh/src/client.rs
#	iroh/src/node.rs
#	iroh/src/rpc_protocol.rs
Also align existing comments with reality
The split is now an impl detail of the batch writer
rklaehn added 4 commits March 14, 2024 11:45
## Description

refactor(iroh-bytes): Weak entry map

Get rid of some of the drop complexity by storing a weak handle map. The
handles themselves have a strong reference, so they are still easy to
use.

## Notes & open questions

<!-- Any notes, remarks or open questions you have to make about the PR.
-->

## Change checklist

- [ ] Self-review.
- [ ] Documentation updates if relevant.
- [ ] Tests if relevant.
# Conflicts:
#	iroh-bytes/src/store/bao_file.rs
#	iroh-bytes/src/store/file.rs
#	iroh-bytes/src/store/file/tests.rs
#	iroh-bytes/src/store/traits.rs
@rklaehn
Copy link
Contributor Author

rklaehn commented Mar 14, 2024

The overall design is very neat, with the categorizing and batched iteration. Some comments inline. I think we can copy this design for the docs store as well, there we also have many very small transactions currently sometimes.

So of course there is a downside. What you report as true is always up to 500ms or whatever ahead of what is actually persisted. I feel that this is acceptable in this case, but you need to be aware of it.

Especially if we use the same approach for the doc store, you could get any combination of the doc store being ahead of the blob store or vice versa. So the doc store must not assume any entry state. E.g. doc store inserts some data, reports entry complete, crash after 10ms, entry is no longer complete.

@rklaehn rklaehn requested a review from Frando March 14, 2024 12:43
rklaehn added 5 commits March 14, 2024 16:56
# Conflicts:
#	iroh-cli/src/commands/blob.rs
#	iroh-cli/src/commands/doctor.rs
#	iroh-cli/src/commands/start.rs
#	iroh-cli/tests/cli.rs
#	iroh/Cargo.toml
# Conflicts:
#	iroh-bytes/src/store/file.rs
@rklaehn rklaehn marked this pull request as ready for review March 15, 2024 09:31
rklaehn added 2 commits March 19, 2024 10:11
# Conflicts:
#	iroh-cli/src/commands/start.rs
#	iroh/examples/rpc.rs
#	iroh/src/node.rs
Copy link
Member

@Frando Frando left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are all addressed, so approving this!
If there are corner cases or unexpected situations, we hopefully catch them during dogfooding, so better to get this in now than later IMO.

Edit: Needs rebase

Base automatically changed from bao-file to main March 19, 2024 09:47
# Conflicts:
#	iroh-bytes/src/store/file.rs
#	iroh-bytes/src/store/file/import_flat_store.rs
#	iroh-bytes/src/store/file/tables.rs
#	iroh-bytes/src/store/file/test_support.rs
#	iroh-bytes/src/store/file/tests.rs
#	iroh-bytes/src/store/file/util.rs
#	iroh-bytes/src/store/file/validate.rs
#	iroh/tests/gc.rs
@rklaehn rklaehn added this pull request to the merge queue Mar 19, 2024
Merged via the queue into main with commit 8dd2c8c Mar 19, 2024
20 checks passed
@dignifiedquire dignifiedquire deleted the fast-redb-store branch March 19, 2024 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants