gnrc_border_router stops routing after a while #16398

Open
benpicco opened this issue Apr 26, 2021 · 7 comments
Labels: Area: network, Type: bug

@benpicco
Contributor

benpicco commented Apr 26, 2021

Description

After a while the border router will stop routing any traffic from or to the WPAN.
The border router can still reach hosts outside the WPAN so the uplink connection is still intact.
Nodes can still reach the border router on its 6lo interface, so the wireless connection is also still intact.

However, routing between the two interfaces ceases to work.
When the border router is rebooted, routing between WPAN and WAN is working again.

Steps to reproduce the issue

  • Flash a node with the examples/gnrc_border_router firmware (I used a same54-xpro with Ethernet uplink and an at86rf215 extension module, but this has also been observed with SLIP on a custom board).
  • Flash several nodes with the examples/gnrc_networking firmware, with USEMODULE += gnrc_ipv6_router_default changed to USEMODULE += gnrc_ipv6_default so that the nodes do not act as routers.
  • ??? (exact trigger unknown), possibly:
    • disconnect nodes for a couple of minutes
    • bring new nodes into the WPAN

Expected results

  • The border router can reach hosts outside the WPAN
  • 6lo nodes get a global prefix
  • 6lo nodes can reach hosts outside the WPAN

Actual results

  • The border router can reach hosts outside the WPAN
2021-04-26 21:32:12,759 # Iface  6  HWaddr: FC:C2:3D:0D:2D:1F 
2021-04-26 21:32:12,764 #           L2-PDU:1500  MTU:1492  HL:255  RTR  
2021-04-26 21:32:12,767 #           Source address length: 6
2021-04-26 21:32:12,769 #           Link type: wired
2021-04-26 21:32:12,775 #           inet6 addr: fe80::fec2:3dff:fe0d:2d1f  scope: link  VAL
2021-04-26 21:32:12,782 #           inet6 addr: 2001:16b8:457f:b900:fec2:3dff:fe0d:2d1f  scope: global  VAL
2021-04-26 21:32:12,785 #           inet6 group: ff02::2
2021-04-26 21:32:12,787 #           inet6 group: ff02::1
2021-04-26 21:32:12,791 #           inet6 group: ff02::1:ff0d:2d1f
2021-04-26 21:32:12,792 #           
2021-04-26 21:32:12,798 # Iface  5  HWaddr: 6A:E4  Channel: 26  Page: 0  NID: 0x23  PHY: O-QPSK 
2021-04-26 21:32:12,799 #           
2021-04-26 21:32:12,804 #           Long HWaddr: E6:EA:AF:F8:AF:5D:EA:E4 
2021-04-26 21:32:12,810 #            TX-Power: 0dBm  State: IDLE  max. Retrans.: 3  CSMA Retries: 4 
2021-04-26 21:32:12,816 #           AUTOACK  ACK_REQ  CSMA  L2-PDU:102  MTU:1280  HL:64  RTR  
2021-04-26 21:32:12,819 #           RTR_ADV  6LO  IPHC  
2021-04-26 21:32:12,822 #           Source address length: 8
2021-04-26 21:32:12,825 #           Link type: wireless
2021-04-26 21:32:12,831 #           inet6 addr: fe80::e4ea:aff8:af5d:eae4  scope: link  VAL
2021-04-26 21:32:12,838 #           inet6 addr: 2001:16b8:457f:b9f4:e4ea:aff8:af5d:eae4  scope: global  VAL
2021-04-26 21:32:12,840 #           inet6 group: ff02::2
2021-04-26 21:32:12,843 #           inet6 group: ff02::1
2021-04-26 21:32:12,847 #           inet6 group: ff02::1:ff5d:eae4
2021-04-26 21:32:12,848 #           
2021-04-26 21:34:36,491 #  nib neigh
2021-04-26 21:34:36,499 # 2001:16b8:457f:b9f4:fc32:9114:cac:8f97 dev #5 lladdr FE:32:91:14:0C:AC:8F:97  STALE REGISTERED
2021-04-26 21:34:36,505 # fe80::de39:6fff:fe6a:6980 dev #6 lladdr DC:39:6F:6A:69:80 router STALE GC
2021-04-26 21:34:36,514 # 2001:16b8:457f:b9f4:ac8d:fee1:6041:91f1 dev #5 lladdr AE:8D:FE:E1:60:41:91:F1  STALE REGISTERED
2021-04-26 21:34:36,522 # 2001:16b8:457f:b9f4:fec2:3d00:0:bb1 dev #5 lladdr FC:C2:3D:00:00:00:0B:B1  STALE REGISTERED
2021-04-26 21:34:36,530 # 2001:16b8:457f:b9f4:204:2519:1801:c905 dev #5 lladdr 00:04:25:19:18:01:C9:05  STALE REGISTERED
2021-04-26 21:34:36,538 # 2001:16b8:457f:b9f4:a83c:23b3:10e6:9c05 dev #5 lladdr AA:3C:23:B3:10:E6:9C:05  STALE REGISTERED
2021-04-26 21:34:36,546 # 2001:16b8:457f:b900:de39:6fff:fe6a:6980 dev #6 lladdr DC:39:6F:6A:69:80 router STALE GC
2021-04-26 21:34:41,867 #  ping 2600::
2021-04-26 21:34:42,003 # 12 bytes from 2600::: icmp_seq=0 ttl=49 time=131.040 ms
2021-04-26 21:34:43,002 # 12 bytes from 2600::: icmp_seq=1 ttl=49 time=130.810 ms
2021-04-26 21:34:44,002 # 12 bytes from 2600::: icmp_seq=2 ttl=49 time=130.576 ms
2021-04-26 21:34:44,002 # 
2021-04-26 21:34:44,005 # --- 2600:: PING statistics ---
2021-04-26 21:34:44,010 # 3 packets transmitted, 3 packets received, 0% packet loss
2021-04-26 21:34:44,014 # round-trip min/avg/max = 130.576/130.808/131.040 ms
2021-04-26 21:39:24,198 #  nib prefix
2021-04-26 21:39:24,204 # 2001:16b8:457f:b9f4::/62 dev #5  expires 4214 sec deprecates 614 sec
2021-04-26 21:39:24,210 # 2001:16b8:457f:b900::/64 dev #6  expires 6811 sec deprecates 3211 sec
2021-04-26 21:40:00,245 #  nib route
2021-04-26 21:40:00,248 # 2001:16b8:457f:b9f4::/62 dev #5
2021-04-26 21:40:00,251 # 2001:16b8:457f:b900::/64 dev #6
2021-04-26 21:40:00,255 # default* via fe80::de39:6fff:fe6a:6980 dev #6
  • 6lo nodes get a global prefix
  • 6lo nodes can't reach nodes outside the WPAN
2021-04-26 21:35:35,122 # Iface  7  HWaddr: 1C:05  Channel: 26  NID: 0x23  PHY: O-QPSK 
2021-04-26 21:35:35,123 #           Long HWaddr: AA:3C:23:B3:10:E6:9C:05 
2021-04-26 21:35:35,124 #            State: IDLE 
2021-04-26 21:35:35,125 #           ACK_REQ  L2-PDU:102  MTU:1280  HL:64  6LO  
2021-04-26 21:35:35,126 #           IPHC  
2021-04-26 21:35:35,127 #           Source address length: 8
2021-04-26 21:35:35,128 #           Link type: wireless
2021-04-26 21:35:35,129 #           inet6 addr: fe80::a83c:23b3:10e6:9c05  scope: link  VAL
2021-04-26 21:35:35,131 #           inet6 addr: 2001:16b8:457f:b9f4:a83c:23b3:10e6:9c05  scope: global  VAL
2021-04-26 21:35:35,132 #           inet6 group: ff02::1
2021-04-26 21:35:35,132 #           
2021-04-26 21:35:35,133 #           Statistics for Layer 2
2021-04-26 21:35:35,134 #             RX packets 140  bytes 14511
2021-04-26 21:35:35,134 #             TX packets 9 (Multicast: 1)  bytes 0
2021-04-26 21:35:35,135 #             TX succeeded 9 errors 0
2021-04-26 21:35:35,135 #           Statistics for IPv6
2021-04-26 21:35:35,136 #             RX packets 33  bytes 11547
2021-04-26 21:35:35,136 #             TX packets 9 (Multicast: 1)  bytes 536
2021-04-26 21:35:35,137 #             TX succeeded 9 errors 0
2021-04-26 21:35:35,137 # 
2021-04-26 21:35:37,329 #  ping 2600::
2021-04-26 21:35:40,329 # 
2021-04-26 21:35:40,330 # --- 2600:: PING statistics ---
2021-04-26 21:35:40,331 # 3 packets transmitted, 0 packets received, 100% packet loss
2021-04-26 21:35:55,064 #  nib route
2021-04-26 21:35:55,066 # default* via fe80::e4ea:aff8:af5d:eae4 dev #7

From the 6lo node, the border router is still reachable via the global address of its 6lo interface:

2021-04-26 21:41:56,524 #  ping 2001:16b8:457f:b9f4:e4ea:aff8:af5d:eae4
2021-04-26 21:41:56,532 # 12 bytes from 2001:16b8:457f:b9f4:e4ea:aff8:af5d:eae4: icmp_seq=0 ttl=64 rssi=68 dBm time=5.984 ms
2021-04-26 21:41:57,533 # 12 bytes from 2001:16b8:457f:b9f4:e4ea:aff8:af5d:eae4: icmp_seq=1 ttl=64 rssi=68 dBm time=5.984 ms
2021-04-26 21:41:58,533 # 12 bytes from 2001:16b8:457f:b9f4:e4ea:aff8:af5d:eae4: icmp_seq=2 ttl=64 rssi=68 dBm time=6.624 ms
2021-04-26 21:41:58,533 # 
2021-04-26 21:41:58,534 # --- 2001:16b8:457f:b9f4:e4ea:aff8:af5d:eae4 PING statistics ---
2021-04-26 21:41:58,534 # 3 packets transmitted, 3 packets received, 0% packet loss
2021-04-26 21:41:58,535 # round-trip min/avg/max = 5.984/6.197/6.624 ms

The global address of the border router's uplink interface is not reachable from the 6lo node:

2021-04-26 21:42:09,260 #  ping 2001:16b8:457f:b900:fec2:3dff:fe0d:2d1f
2021-04-26 21:42:12,260 # 
2021-04-26 21:42:12,262 # --- 2001:16b8:457f:b900:fec2:3dff:fe0d:2d1f PING statistics ---
2021-04-26 21:42:12,263 # 3 packets transmitted, 0 packets received, 100% packet loss
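
For reference, the forwarding decision one would expect from the border router's `nib route` output can be replayed offline. The following is a minimal sketch using Python's standard `ipaddress` module. It is a plain longest-prefix match over the three routes copied from the log above, not GNRC's actual lookup code, and the tuple layout and the `lookup` helper are assumptions made for illustration.

```python
import ipaddress

# Routes copied from the border router's `nib route` output.
# The (prefix, interface, next hop) layout is hypothetical, chosen for this sketch.
ROUTES = [
    ("2001:16b8:457f:b9f4::/62", 5, None),
    ("2001:16b8:457f:b900::/64", 6, None),
    ("::/0", 6, "fe80::de39:6fff:fe6a:6980"),  # the "default*" entry
]


def lookup(dst):
    """Return (network, iface, next_hop) for the longest prefix containing dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, iface, next_hop in ROUTES:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface, next_hop)
    return best


if __name__ == "__main__":
    for dst in (
        "2600::",                                    # host outside the WPAN
        "2001:16b8:457f:b9f4:a83c:23b3:10e6:9c05",   # 6lo node
        "2001:16b8:457f:b900:fec2:3dff:fe0d:2d1f",   # border router uplink address
    ):
        net, iface, next_hop = lookup(dst)
        print(dst, "->", net, "dev", iface, "via", next_hop)
```

With these entries, traffic for the 6lo node maps to interface 5 and everything else to interface 6, so the table contents by themselves do not explain the failure.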

packet capture

From Linux, the border router is reachable via its uplink interface, but not via its 6lo interface:

% ping 2001:16b8:457f:b900:fec2:3dff:fe0d:2d1f
PING 2001:16b8:457f:b900:fec2:3dff:fe0d:2d1f(2001:16b8:457f:b900:fec2:3dff:fe0d:2d1f) 56 data bytes
64 bytes from 2001:16b8:457f:b900:fec2:3dff:fe0d:2d1f: icmp_seq=1 ttl=255 time=630 ms
64 bytes from 2001:16b8:457f:b900:fec2:3dff:fe0d:2d1f: icmp_seq=2 ttl=255 time=129 ms
64 bytes from 2001:16b8:457f:b900:fec2:3dff:fe0d:2d1f: icmp_seq=3 ttl=255 time=32.8 ms
64 bytes from 2001:16b8:457f:b900:fec2:3dff:fe0d:2d1f: icmp_seq=4 ttl=255 time=71.4 ms
64 bytes from 2001:16b8:457f:b900:fec2:3dff:fe0d:2d1f: icmp_seq=5 ttl=255 time=319 ms
^C
--- 2001:16b8:457f:b900:fec2:3dff:fe0d:2d1f ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 32.825/236.519/629.671/219.849 ms
% ping 2001:16b8:457f:b9f4:e4ea:aff8:af5d:eae4
PING 2001:16b8:457f:b9f4:e4ea:aff8:af5d:eae4(2001:16b8:457f:b9f4:e4ea:aff8:af5d:eae4) 56 data bytes
From 2001:16b8:457f:b900:de39:6fff:fe6a:6980 icmp_seq=1 Destination unreachable: No route
From 2001:16b8:457f:b900:de39:6fff:fe6a:6980 icmp_seq=2 Destination unreachable: No route
From 2001:16b8:457f:b900:de39:6fff:fe6a:6980 icmp_seq=3 Destination unreachable: No route
From 2001:16b8:457f:b900:de39:6fff:fe6a:6980 icmp_seq=4 Destination unreachable: No route
^C
--- 2001:16b8:457f:b9f4:e4ea:aff8:af5d:eae4 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3058ms
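
The "No route" errors above are sent by 2001:16b8:457f:b900:de39:6fff:fe6a:6980, whose interface identifier matches the `router` neighbor entry (lladdr DC:39:6F:6A:69:80) in the border router's `nib neigh` output. To make this kind of correlation between link-layer addresses and IPv6 addresses easier when reading the logs, here is a small sketch of the modified EUI-64 mapping. It assumes 48-bit or 64-bit link-layer addresses only (IEEE 802.15.4 short addresses are not handled), and the function names are made up for illustration.

```python
import ipaddress


def iid_from_lladdr(lladdr):
    """Derive the 64-bit interface identifier from a 6- or 8-byte link-layer address."""
    raw = bytearray(int(part, 16) for part in lladdr.split(":"))
    if len(raw) == 6:
        # 48-bit MAC: insert ff:fe in the middle to get an EUI-64
        raw = raw[:3] + b"\xff\xfe" + raw[3:]
    elif len(raw) != 8:
        raise ValueError("expected a 48-bit or 64-bit link-layer address")
    raw[0] ^= 0x02  # flip the universal/local bit
    return bytes(raw)


def link_local(lladdr):
    """Build the fe80::/64 address that corresponds to a link-layer address."""
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + iid_from_lladdr(lladdr))


if __name__ == "__main__":
    for lladdr in (
        "DC:39:6F:6A:69:80",        # uplink router from `nib neigh`
        "FC:C2:3D:0D:2D:1F",        # border router, Iface 6
        "E6:EA:AF:F8:AF:5D:EA:E4",  # border router, Iface 5 (long HWaddr)
    ):
        print(lladdr, "->", link_local(lladdr))
```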

Versions

RIOT master (8a7f3ab)

I just realized that this is a duplicate of #14676 with a less elaborate setup process.

@benpicco added the Area: network and Type: bug labels on Apr 26, 2021
@benpicco
Contributor Author

benpicco commented Apr 26, 2021

Hm, I don't see this with ethos / uhcp on samr21-xpro as the border router.

Edit: just to rule out a driver issue, I connected the at86rf215 extension module to the samr21-xpro and used it instead of the at86rf2xx.

Here I also don't see the issue. I suspect a race condition that is not exposed on the slower CPU.
This would also be in line with the observation that the issue seemingly goes away when enabling debug output in nib.c 😕

Edit: now I can't even reproduce the issue on the same54-xpro with the same setup as before 😩

@fjmolinas
Contributor

> flash several nodes with examples/gnrc_networking firmware with USEMODULE += gnrc_ipv6_router_default changed to USEMODULE += gnrc_ipv6_default to avoid having nodes act as routers.

Is this a requirement to reproduce the issue?

@benpicco
Contributor Author

It's a requirement if you want to prevent another issue from arising, namely that some nodes select each other as their router instead of the border router.

@MrKevinWeiss MrKevinWeiss added this to the Release 2021.07 milestone Jun 22, 2021
@MrKevinWeiss MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021
@miri64
Member

miri64 commented Jul 20, 2021

Have you tried running it with ENABLE_DEBUG, e.g. within the NIB?

@benpicco
Contributor Author

Yes, this makes the problem occur much less frequently.

@miri64
Member

miri64 commented Jul 20, 2021

Damn

@ansocket

ansocket commented May 5, 2023

Hello, @benpicco

I think I have a similar problem with gnrc_border_router. After some time, the border router can't receive anything over ethos.

Hardware:

  • 1 RAK3172 (STM32WLE5CC with LoRa), connected via ethos (uhcp) to Linux (WSL)
  • 1 remote node with the same chip

What I do:

start_network.sh:
hocok@hocok:~/RIOT/dist/tools/ethos$ sudo ./start_network.sh /dev/ttyUSB0 tap0 bbbb::/48
net.ipv6.conf.tap0.forwarding = 1
net.ipv6.conf.tap0.accept_ra = 0
----> ethos: sending hello.
----> ethos: activating serial pass through.
----> ethos: hello received
NETOPT_TX_END_IRQ not implemented by driver
NETOPT_TX_END_IRQ not implemented by driver
gnrc_uhcpc: Using 6 as border interface and 7 as wireless interface.
main(): This is RIOT! (Version: b388f-lora_work)
RIOT border router example application
All up, running the shell now
Ifconfig:
ifconfig
Iface  6  HWaddr: A6:63:C3:8E:5D:0F
          L2-PDU:1500  MTU:1500  HL:64  RTR
          Source address length: 6
          Link type: wired
          inet6 addr: fe80::a463:c3ff:fe8e:5d0f  scope: link  VAL
          inet6 addr: fe80::2  scope: link  VAL
          inet6 group: ff02::2
          inet6 group: ff02::1
          inet6 group: ff02::1:ff8e:5d0f
          inet6 group: ff02::1:ff00:2

Iface  7  HWaddr: 24:2B  Channel: 0  NID: 0x23  PHY: BPSK
          Long HWaddr: A6:DE:12:64:DA:C5:A4:2B
           State: IDLE
          L2-PDU:102  MTU:1280  HL:64  RTR
          RTR_ADV  6LO  IPHC
          Source address length: 8
          Link type: wireless
          inet6 addr: fe80::a4de:1264:dac5:a42b  scope: link  VAL
          inet6 addr: bbbb::a4de:1264:dac5:a42b  scope: global  VAL
          inet6 group: ff02::2
          inet6 group: ff02::1
          inet6 group: ff02::1:ffc5:a42b
          inet6 group: ff02::1a

Then I switch on the remote node and connect to it with netcat. After that, I try to send 10000 test packets over TCP (Sock API) from the remote node to the Linux host.

TCP connection:
hocok@hocok:~$ nc -6 -v bbbb::fc9e:48d0:a8de:14f5 12345
Connection to bbbb::fc9e:48d0:a8de:14f5 12345 port [tcp/*] succeeded!

>test -c 10000
 LoraLAB_TEST_SENDING = 0
 LoraLAB_TEST_SENDING = 1
 LoraLAB_TEST_SENDING = 2
 LoraLAB_TEST_SENDING = 3
 LoraLAB_TEST_SENDING = 4
 LoraLAB_TEST_SENDING = 5
...

After a random number of packets, TCP reception stops (the connection is not closed) and the BR ethos console starts printing
uhcp_client(): no reply received
The console also stops accepting input:

ethos console:
> help
uhcp_client(): no reply received
uhcp_client(): no reply received
uhcp_client(): no reply received
help
help
ifconfig
uhcp_client(): no reply received

Meanwhile, I can still ping the BR from the remote node, but not from the Linux host.
