
fix(iroh-blobs): do not hit the network when downloading blobs which are complete #2586

Merged: 15 commits merged into main from refactor-download-check-local on Aug 5, 2024

Conversation

@Frando (Member) commented Aug 5, 2024

Description

Two changes to the downloader:

  • Never try to download from ourselves. If the only provider node added is our own node, fail with a "no providers" error.
  • The actual download request flow is turned into a generator (while keeping API compatibility for the existing get_to_db public function). A new get_to_db_in_steps function either runs to completion if the requested data is fully available locally, or yields a NeedsConn struct at the point where it needs a network connection to proceed. NeedsConn has an async proceed(self, conn: Connection) method, which must be called with a connection for the actual download to start.

This two-step process lets the downloader check whether we need to dial any nodes at all, or whether we are already done without doing anything, while emitting the exact same flow of events to the client (because we run the same loop). A caller-side sketch of the pattern follows below.
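Below is a minimal, self-contained sketch of this two-step pattern. The names (DownloadStep, NeedsConn, Connection, Stats) and signatures here are illustrative placeholders, not the actual iroh-blobs API; only the shape of the flow (check locally first, dial and proceed only if something is missing) is what this PR implements.

// Minimal sketch of the two-step download pattern (placeholder types, not iroh-blobs).
struct Connection; // stand-in for an established network connection
struct Stats { bytes_read: u64 }

// What the first step can return: either everything was already local,
// or a suspended request that still needs a connection.
enum DownloadStep {
    Complete(Stats),
    NeedsConn(NeedsConn),
}

struct NeedsConn { /* suspended request state */ }

impl NeedsConn {
    // Resume the suspended request with a connection and run the actual download.
    async fn proceed(self, _conn: Connection) -> Stats {
        Stats { bytes_read: 0 } // placeholder
    }
}

// First step: check the local store; do not touch the network here.
async fn start_download() -> DownloadStep {
    let fully_available_locally = true; // placeholder for the local-store check
    if fully_available_locally {
        DownloadStep::Complete(Stats { bytes_read: 0 })
    } else {
        DownloadStep::NeedsConn(NeedsConn {})
    }
}

async fn download() -> Stats {
    match start_download().await {
        // Nothing to do: no node is dialed at all.
        DownloadStep::Complete(stats) => stats,
        // Only now do we dial a provider and hand the connection to the request.
        DownloadStep::NeedsConn(needs_conn) => {
            let conn = Connection; // placeholder for dialing a provider
            needs_conn.proceed(conn).await
        }
    }
}

fn main() {
    // Drive the sketch with a blocking executor (futures crate) just to run it.
    let stats = futures::executor::block_on(download());
    println!("downloaded {} bytes", stats.bytes_read);
}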

To achieve this, get_to_db now uses a genawaiter generator internally. This means that the big loop implementing the iroh-blobs protocol request flow does not have to change at all; instead of calling a closure, we yield and resume, which makes it much easier to integrate into an external state machine like the downloader.
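For illustration, here is a small self-contained example of the genawaiter yield/resume pattern in question; the logic inside the generator is a placeholder, not the actual get_to_db loop.

use genawaiter::{rc::Gen, GeneratorState};

struct Conn; // placeholder for an actual connection type

fn main() {
    // The generator encapsulates the whole request flow. It yields once when it
    // needs a connection and receives one as the resume value.
    let mut gen = Gen::new(|co| async move {
        let fully_available_locally = false; // placeholder for the local-store check
        if fully_available_locally {
            return "complete without network";
        }
        // Suspend here; the caller resumes us with a connection.
        let _conn: Conn = co.yield_(()).await;
        // ... run the actual download protocol over `_conn` ...
        "downloaded"
    });

    // genawaiter discards the argument of the very first resume, since no yield
    // is pending yet; this call just runs the generator up to its first yield point.
    match gen.resume_with(Conn) {
        GeneratorState::Complete(msg) => println!("{msg}"),
        GeneratorState::Yielded(()) => {
            // Dial a provider only now, then resume the generator with the connection.
            match gen.resume_with(Conn) {
                GeneratorState::Complete(msg) => println!("{msg}"),
                GeneratorState::Yielded(_) => unreachable!(),
            }
        }
    }
}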

The changes needed in the downloader are a bit verbose because the downloader is generic over a Getter, with one impl for the actual networked getter and one test impl that does not use networking; the new NeedsConn state therefore has to be modeled with an additional associated type and trait (a rough sketch follows below).
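A rough sketch of what this shape can look like; the trait and method names below are hypothetical and chosen for illustration, not taken from the downloader code.

// Illustrative only: a getter that is generic over how the "needs a connection"
// state is represented, so a test impl can avoid networking entirely.
trait Getter {
    // Connection type used by this getter (a real connection in production,
    // a dummy type in the networking-free test impl).
    type Connection;
    // The suspended request state returned when local data is not sufficient.
    type NeedsConn: NeedsConnection<Self::Connection>;
}

trait NeedsConnection<C> {
    // Resume the suspended request with a connection and drive it to completion.
    fn proceed(self, conn: C);
}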

This PR adds three tests:

  • Downloading a missing blob from the local node fails without trying to connect to ourselves
  • Downloading an existing blob succeeds without trying to download
  • Downloading an existing collection succeeds without trying to download

Closes #2575
Replaced #2576

Notes and open questions

Breaking changes

None; only an addition to the public API of iroh_blobs: iroh_blobs::get::check_local_with_progress_if_complete

github-actions bot commented Aug 5, 2024

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/2586/docs/iroh/

Last updated: 2024-08-05T11:37:34Z

@Frando changed the title from "refactor(iroh-blobs): use generator to perform checking the local store and downloading from network in two steps" to "refactor(iroh-blobs): use generator to make checking locally and downloading a two-step process" on Aug 5, 2024
@Frando marked this pull request as ready for review on August 5, 2024 at 10:03
@Frando changed the title from "refactor(iroh-blobs): use generator to make checking locally and downloading a two-step process" to "refactor(iroh-blobs): do not hit the network when downloading blobs which are complete" on Aug 5, 2024
@dignifiedquire (Contributor)

Can you also add a test with a collection that needs some blobs to be downloaded?

@Frando (Member, Author) commented Aug 5, 2024

Wait, I am not sure this is correct. We can only resume the generator once, so we have to recreate it if a node fails. I do this in the direct_download case, but not in the queued downloader. Hmm, I need to think about this some more.

@Frando (Member, Author) commented Aug 5, 2024

OK, I pushed another commit that hopefully fixes things.

Here's the constraint of the current design in this PR: we create a generator which encapsulates the loop-over-hashseq flow for getting collections. We start the generator before connecting to a node, so it can yield Complete in case everything is available locally. The generator is then resumed with a connection to perform the actual download protocol, continuing the loop as needed. However, we can only ever resume this generator once. So if the node fails during the download and we want to try the next node, we have to recreate the generator, which is what the PR does now. This means, though, that in this case the FoundLocal events would be emitted again.

When using the iroh-blobs client, this is not necessarily an issue, because the TransferState that we reduce the events into by default handles this fine.

I see three options now:

  • a) Keep it as-is in the PR and document that events may be emitted multiple times when the download is auto-retried, and have to be deduplicated or ignored client-side if acting on each event. This is the easiest option.
  • b) Deduplicate the events when sending them out, by keeping a TransferState in the progress sender (see the sketch after this list). This is straightforward to add to the queued downloader, because there we keep a TransferState anyway to emit an initial state when a new intent is added to an already-queued download request. If we want it for direct downloads too, the "keep a transfer state" logic would have to be added to the get_to_db flow as well. That might be rather straightforward too, but it would increase the state being kept.
  • c) Do not use a generator, but rewrite the get_to_db loop so that it returns a cloneable struct of its state at the point where it needs a connection, and can be resumed with that struct. This likely means writing a manual state machine instead of looping dynamically. I'm not super keen on doing this, but it wouldn't be the end of the world either; the loop is not that complex.
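Roughly, the idea behind b), sketched with a toy seen-set instead of the real TransferState (whose internals aren't spelled out here):

use std::collections::HashSet;

// Toy stand-ins; the real code would use iroh-blobs' hash and progress event types.
type Hash = [u8; 32];

enum Event {
    FoundLocal(Hash),
    Other,
}

// Wraps an event sink and drops FoundLocal events that were already emitted,
// so restarting the generator for a retry does not duplicate them downstream.
struct DedupSender {
    seen_local: HashSet<Hash>,
}

impl DedupSender {
    fn new() -> Self {
        Self { seen_local: HashSet::new() }
    }

    fn send(&mut self, event: Event) -> Option<Event> {
        match event {
            Event::FoundLocal(hash) => {
                // Only pass the event through the first time we see this hash.
                if self.seen_local.insert(hash) {
                    Some(Event::FoundLocal(hash))
                } else {
                    None
                }
            }
            other => Some(other),
        }
    }
}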

WDYT? Also, cc @rklaehn

@dignifiedquire (Contributor) commented Aug 5, 2024

The problem I see with the current approach is that you would get Local events for blobs that were not local before. E.g. peer 1 has blob1 but not blob2, and peer 2 has blob2, so if I understand correctly you get the following events:

  • Remote: Blob1
  • < Error, blob2 not found with peer1 >
  • < Restart generator >
  • Local: Blob1
  • Remote: Blob2

@Frando (Member, Author) commented Aug 5, 2024

> The problem I see with the current approach is that you would get Local events for blobs that were not local before

True. In any case, we should write a test for the scenario you described.

My current thinking is that we could use the TransferState that already exists to check whether an event should be emitted or not, at least for FoundLocal. So that's option b).

And for Found (which should really be renamed to FoundRemote) we should IMO add the node id of the remote peer and emit them multiple times. Edit: Ugh, we can't do that, because iroh-blobs doesn't know about node ids; it works with regular QUIC connections too. Unfortunate. So it's either a new generic parameter around the connection, or mapping the node id in at the higher layers.

This is turning out a bit bigger than envisioned. I'd say let's keep discussing a bit and then I can implement whatever we arrive at. (I did this a bit "on the go" so far, without having run into these difficulties before. It is good to tackle, though: on current main we a) double-emit events for each node we try, and b) have no tests for that.)

@dignifiedquire (Contributor)

I think it is good if we have the lower levels of iroh-blobs emit events on a per-transfer basis.
For the events emitted from the downloader, I am thinking we could lean into the duplicate events and wrap them in an event like this:

enum DownloaderEvent {
  BlobTransferEvent {
    event: BlobEvent,
    target_node: NodeId,
  }
}

and allow the user full insight into what is actually happening.

@dignifiedquire changed the title from "refactor(iroh-blobs): do not hit the network when downloading blobs which are complete" to "fix(iroh-blobs): do not hit the network when downloading blobs which are complete" on Aug 5, 2024
@dignifiedquire added this pull request to the merge queue on Aug 5, 2024
Merged via the queue into main with commit 0784403 on Aug 5, 2024
29 checks passed
@dignifiedquire deleted the refactor-download-check-local branch on August 5, 2024 at 17:29
Successfully merging this pull request may close this issue: Queued blob download from oneself fails