Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(iroh-net): Upnp port mapping #1117

Merged
merged 72 commits into from
Jul 4, 2023

Conversation

divagant-martian
Copy link
Contributor

@divagant-martian divagant-martian commented Jun 20, 2023

This is the skateboard version of upnp mappings, which I think should cover a wide range of use cases to begin with.

Random notes on this (that depending on process will end up in one/several issues)

Tailscale differences (behaviour)

  • for probing, they first send a ssdp:all request (simple service discovery) this seems unnecessary to me since the type of device we want is an igd one. An ssdp:all request will be answered from many different devices, let's say for example speakers connected to the local network. I don't see this as useful since afterwards, the responses are filtered to gather the igd devices anyway. We don't do that, and instead send the igd request only.
  • before sending the igd request, which is technically sent to a multicast address, ts sends also a unicast request to the likely router address. This is to deal with firewalls that reject the response to the multicast address since it's coming to the unicast address. Including this sounds something that we should do, but would mean to contribute or fork igd or move the code to our codebase. The igd code is fairly simple so I don't think any of these options is out of scope.
  • the local address is for now set to ipv4 unspecified address. ts uses the interfaces address.
  • renewing and creating a mapping is done as soon as it's identified (awaiting) instead of "spawning a task" (goroutine)
  • no on_change when the mapping changes.
  • "closing" the client in tailscale mean invalidating the mappings and marking the client as closed. However the closed value is only ever used to .. prevent double closing. I think we can simply invalidate the mappings, there is no real use for the closing.

Other notes

This so far follows the og template but I'd like us to consider a couple points:

  • No need to register a on_change callback. We probably can instead add a tokio::sync::watch to keep track of the last changed value.
  • The client is clone, which is surprising to me. Do we really want multiple port mapping clients to exist and operate concurrently? Depending on this moving this to be more of a service/actor would make sense.
  • Updating the mapping is done in a very synchronous, reactive fashion. If we need a mapping and don't have one, we start creating it there. If the mapping is due for renewal, the renewal process starts in that instant (when we realize it's late ). We could instead add the necessary futures to renew the mapping. For creation, we would create the mapping as soon as the local port is known and not when the mapping is requested.
    These are all changes that make sense to me, but it would be good to have the opinion of someone with better understanding of how port mapping fits within the code as a whole, in concept.

Further notes

  • I noticed that the netcheck code that calls the upnp mapping does not wait for it to complete. While this was simple to solve, I'd like to first check out the git history of the code to understand what the current code is supposed to do and adjust accordingly.
  • What's the standard for logging?
  • What kind of metrics do we want to check? How do we do metrics in general
  • Tests will be interesting. I've tested the relevant parts of the code in a toy project and get a mapping, but in practice this probably depends on different kinds of routers and lan setups in general

@dignifiedquire
Copy link
Contributor

  • What's the standard for logging?

We use tracing crate for logging, some nice examples can be found in the netcheck.rs code.

@dignifiedquire
Copy link
Contributor

  • What kind of metrics do we want to check? How do we do metrics in general

This is how we define metrics:

for a detailed walkthrough, ping @Arqu our metrics guy :D

@dignifiedquire
Copy link
Contributor

  • What kind of metrics do we want to check?

I would start with the metrics that tailscale collects, other than that my general approach is to collect

  • success & failure rates
  • unexpected events

@dignifiedquire
Copy link
Contributor

  • Tests will be interesting.

Yes, maybe we could add some functionality to the doctor command that tries to do a port mapping and reports how well it worked, so it is easy for folks to run manual tests.

@Arqu
Copy link
Collaborator

Arqu commented Jun 21, 2023

Dig mostly covered the bare basics of metrics. To have metrics collected run with iroh --metrics-addr 127.0.0.1:9090 provide ... then you can access the current snapshot of prometheus metrics on localhost:9090. There's a separate exporter you can run to push things up to our infra.

As for what metrics, there is no hard rule, metrics are there to serve us, the important ones will be surfaced more in grafana but most of them are useful. I usually go by add more as you need. Typically you want to cover major events, and output counters along with instrumenting main loops and interesting branches/error cases.

@dignifiedquire
Copy link
Contributor

  • No need to register a on_change callback. We probably can instead add a tokio::sync::watch to keep track of the last changed value.

Sounds good, though should probably check for what it is actually used where and if we need it.

@dignifiedquire
Copy link
Contributor

  • The client is clone, which is surprising to me.

This is purely because of the structure of netcheck.rs at the moment. If we can make it work without, cool.

@dignifiedquire
Copy link
Contributor

  • Updating the mapping is done in a very synchronous, reactive fashion. If we need a mapping and don't have one, we start creating it there. If the mapping is due for renewal, the renewal process starts in that instant (when we realize it's late ). We could instead add the necessary futures to renew the mapping. For creation, we would create the mapping as soon as the local port is known and not when the mapping is requested.
    These are all changes that make sense to me, but it would be good to have the opinion of someone with better understanding of how port mapping fits within the code as a whole, in concept.

This sounds good to me in general, but I think it would be good to check with @flub how to best integrate this with his refactor plans for netcheck.

ramfox
ramfox previously requested changes Jun 21, 2023
iroh-net/src/hp/portmapper.rs Outdated Show resolved Hide resolved
iroh-net/src/hp/portmapper/upnp.rs Outdated Show resolved Hide resolved
tracing::debug!(
"upnp gateway changed from {old_gateway_addr} to {gateway_addr}"
);
// TODO(@divagant-martian): tailscale does not invalidate the mappings here. Why?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we've searched for the router and the router has changed, maybe this implies that we don't have access to the old one anymore, so release-ing the mapping is maybe not possible, or just not our responsibility?

If our local address has changed, but our router is the same, I understand we want to release the mapping because we want them to not direct the traffic to the old local address.

Anyways, I think it's fine to not invalidate the mappings in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right about releasing the mappings, sorry for the lack of precision: I mean they don't clear the mappings. So we still have self.current_mapping as Some even if, as you pointed, the router is not there anymore

iroh-net/src/hp/portmapper.rs Outdated Show resolved Hide resolved
iroh-net/src/hp/portmapper.rs Outdated Show resolved Hide resolved
/// Updates the portmap related values based on a [`portmapper::ProbeResult`].
pub fn update_portmap_probe(&mut self, probe_result: Option<portmapper::ProbeOutput>) {
match probe_result {
Some(portmapper::ProbeOutput { upnp, pmp, pcp }) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion, lest just just store the ProbeOutput instead of these three individual fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 0846437

PortMap {
/// Local port to get a mapping.
local_port: NonZeroU16,
/// How long to wait for an external port to be ready.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: in seconds or sth like that is missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 552e1c7

fn poll(&mut self, cx: &mut std::task::Context<'_>) -> Poll<Event> {
// grab the waker if needed
if let Some(waker) = &self.waker {
if waker.will_wake(cx.waker()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactly does this check do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an optimization to prevent switching wakers when the task has not moved threads

@dignifiedquire
Copy link
Contributor

When I try and run the doctor command on a network where upnp is disabled it just times out, but looking at the log it should actually error. Any chance that error could be properly reported?

Screenshot 2023-07-03 at 10 41 11

@dignifiedquire
Copy link
Contributor

Okay, tried to run this locally and got into the following issue. The local_addr ends up being set to 127.0.0.1 when requesting a mapping, resulting in an error, as can be seen here (using the rust-igd examples)

Screenshot 2023-07-03 at 11 08 10

When I use my local lan IP 192.168.x.x things work as expected in all the examples for rust-igd.

@divagant-martian
Copy link
Contributor Author

When I try and run the doctor command on a network where upnp is disabled it just times out, but looking at the log it should actually error. Any chance that error could be properly reported?

Not sure what you mean, in your screenshot there is both the log and the error.

Can you elaborate?

@dignifiedquire
Copy link
Contributor

Can you elaborate?

The error is only reported in the logs. If I run without logs only this shows up

Screenshot 2023-07-03 at 12 50 09

@divagant-martian divagant-martian self-assigned this Jul 3, 2023
@divagant-martian
Copy link
Contributor Author

@dignifiedquire ok I see. This is somewhat intentional and I see both approaches making sense. Right now I report whether there is an external address and if it changed on the watcher. In the doctor command, a failure would get a None report to a None report (so intentionally not really reported". This is the approach tailscale has.

If you think it makes sense, I could change this option to a result and propagate it up. The only place where this would make a difference tho is in the doctor command

@dignifiedquire
Copy link
Contributor

I see, that makes sense. Given that we can get the errors via logs, I think we can leave it.

@divagant-martian
Copy link
Contributor Author

divagant-martian commented Jul 3, 2023

This should be ready in any case, since I moved to using the same dep as #1157 for the local ip.
Home router is not necessary yet, since the addr in which an upnp gateway answers is a multicast addr following the spec, not its unicast addr

@divagant-martian divagant-martian changed the title feat: Upnp port mapping feat(iroh-net): Upnp port mapping Jul 4, 2023
Copy link
Contributor

@dignifiedquire dignifiedquire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets go :)

@divagant-martian divagant-martian dismissed ramfox’s stale review July 4, 2023 16:16

changes have been addressed and pr has been approved

@divagant-martian divagant-martian merged commit 701e9b7 into n0-computer:main Jul 4, 2023
@divagant-martian divagant-martian deleted the upnp-port-mapping branch September 17, 2023 14:55
matheus23 pushed a commit that referenced this pull request Nov 14, 2024
* add igd

* add log, use types for readability

* request mapping

* fix code placement + comments

* filling in the gaps

* some cleanup and refactor

* debugging

* debug cleanning

* merge cleanup

* easy doctor appointment

* debug cleanning

* use non zero to ensure port to map is positive

* spelling

* follow iroh code style

* log when probe fails

* update comment

* more logs, use localhost

* add portmap metrics to iroh-metrics

* use portmap metrics

* add client

* wip: closing for the day

* wip

* more wip

* connecting stuff

* coding with the heart lol

* fixes

* customize timeout of portmapping doctor command

* on_change for portmap

* improve docs and logs

* add mapping struct that handles maintenance

* add waker

* improve debug logs

* connect with mapping thing

* partially connect current mapping stream

* simplify mapping handling and add a sanity test

* remove hardcoded mapping half life

* remove unused field

* ensure external port is non zero

* simplify test types, reduce test timeout

* add external addr fn to mapping

* rework getting a new upnp mapping to reuse a known gateway and or known external port

* connect expiry and renew events

* split client functions

* update doctor command

* add service channel capacity

* remove unnecesary change

* leverage derive_more and the utils mod

* remove yet one more unnecesary clone

* Revert "remove yet one more unnecesary clone" might be controversial since technically the probe starts before the report

This reverts commit f558246.

* misc spelling

* misc docs

* cleanup

* more cleanup

* simplify logic updating a probe

* appease clippy

* expand comment

* expand metrics

* self review

* improve doctor docs

* unflatten portmap probe

* get interfaces

* get interfaces using dep already here
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants