agentx && 15 seconds back-off window #8990

Closed
triple-it opened this issue Jul 7, 2021 · 8 comments
Labels
triage Needs further investigation

Comments

@triple-it

triple-it commented Jul 7, 2021

  • FRR VERSION
    frr 7.5.1-0~ubuntu20.04 amd64 FRRouting suite of internet protocols (BGP, OSPF, IS-IS, ...)
    ~~
    Copyright 1996-2005 Kunihiro Ishiguro, et al.
    configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--enable-exampledir=/usr/share/doc/frr/examples/' '--localstatedir=/var/run/frr' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-systemd=yes' '--enable-rpki' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'
    ~~

  • OPERATING SYSTEM VERSION
    Distributor ID: Ubuntu
    Description: Ubuntu 20.04.1 LTS
    Release: 20.04
    Codename: focal

  • KERNEL VERSION
    Linux app2 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


Describe the bug

[x] Did you check if this is a duplicate issue? (I see some agentx socket and path issues, but they are unrelated, and this SNMP disconnect behavior is not mentioned.)
[ ] Did you test it on the latest FRRouting/frr master branch?

To Reproduce
Enable agentx
[root@app2 log]# grep snmp /etc/frr/daemons
zebra_options=" -A 127.0.0.1 -M snmp -s 90000000 --log file:/var/log/frr/zebra.log"
bgpd_options=" -A 127.0.0.1 -M snmp --log file:/var/log/frr/bgpd.log"
[root@app2 log ]#
[root@app2 log ]# grep agentx /etc/frr/frr.conf
agentx
[root@app2 log ]#

Expected behavior
Stable AgentX SNMP behavior, without the 15-second downtime.

Seen behavior
The SNMP AgentX session disconnects repeatedly:

2021/07/07 08:59:56 BGP: snmp[info]: AgentX master disconnected us, reconnecting in 15
2021/07/07 08:59:56 BGP: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
2021/07/07 09:00:11 BGP: [EC 100663310] snmp[err]: unknown snmp version 193
2021/07/07 09:00:11 BGP: snmp[info]: NET-SNMP version 5.8 AgentX subagent connected
2021/07/07 09:16:56 BGP: [EC 100663310] snmp[err]: unknown snmp version 193
2021/07/07 09:16:56 BGP: snmp[info]: AgentX master disconnected us, reconnecting in 15
2021/07/07 09:16:56 BGP: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
2021/07/07 09:17:11 BGP: [EC 100663310] snmp[err]: unknown snmp version 193
2021/07/07 09:17:11 BGP: snmp[info]: NET-SNMP version 5.8 AgentX subagent connected
2021/07/07 09:19:56 BGP: [EC 100663310] snmp[err]: unknown snmp version 193
2021/07/07 09:19:56 BGP: snmp[info]: AgentX master disconnected us, reconnecting in 15
2021/07/07 09:19:56 BGP: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
2021/07/07 09:20:11 BGP: [EC 100663310] snmp[err]: unknown snmp version 193
2021/07/07 09:20:11 BGP: snmp[info]: NET-SNMP version 5.8 AgentX subagent connected
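
To quantify the outage windows in a log like the one above, the disconnect and reconnect messages can be paired up. The following is a minimal sketch, assuming the timestamp format and message texts shown in the excerpt; the function name `outage_windows` is hypothetical and not part of FRR or Net-SNMP.

```python
from datetime import datetime

# Message fragments taken from the bgpd log excerpt above.
DISCONNECT = "AgentX master disconnected us"
CONNECTED = "AgentX subagent connected"


def outage_windows(lines):
    """Return (start, end, seconds) for each disconnect/reconnect pair."""
    windows = []
    start = None
    for line in lines:
        # Timestamps look like "2021/07/07 08:59:56" (first 19 characters).
        try:
            stamp = datetime.strptime(line[:19], "%Y/%m/%d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        if DISCONNECT in line:
            start = stamp
        elif CONNECTED in line and start is not None:
            windows.append((start, stamp, (stamp - start).total_seconds()))
            start = None
    return windows


sample = [
    "2021/07/07 08:59:56 BGP: snmp[info]: AgentX master disconnected us, reconnecting in 15",
    "2021/07/07 09:00:11 BGP: [EC 100663310] snmp[err]: unknown snmp version 193",
    "2021/07/07 09:00:11 BGP: snmp[info]: NET-SNMP version 5.8 AgentX subagent connected",
    "2021/07/07 09:16:56 BGP: snmp[info]: AgentX master disconnected us, reconnecting in 15",
    "2021/07/07 09:17:11 BGP: snmp[info]: NET-SNMP version 5.8 AgentX subagent connected",
]

for start, end, secs in outage_windows(sample):
    print(start.time(), "->", end.time(), f"{secs:.0f}s")
# 08:59:56 -> 09:00:11 15s
# 09:16:56 -> 09:17:11 15s
```

Feeding it the full /var/log/frr/bgpd.log (for example via `open(path)`) would show whether every outage lasts exactly the reconnect period and how far apart the disconnects are.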

Running snmpd with AgentX debugging (-Dagentx) reveals only a 'close':
#ExecStart=/usr/sbin/snmpd -u Debian-snmp -g Debian-snmp -I -smux,mteTrigger,mteTriggerConf -f -p /run/snmpd.pid -Dagentx -Lf /var/log/snmpd.log

agentx/master: got response errstat=0, (req=0x386,trans=0x385,sess=0x9)
agentx/master: agentx_got_response() beginning...
agentx/master:   handle_agentx_response: processing: iso.3.6.1.2.1.15.3.1.2.100.64.4.98
agentx/master: handle_agentx_response() finishing...
agentx/master: transport disconnect on session 0x562fbe817d80
agentx/master: close 0x562fbe817d80, -1
agentx/master: handle pdu (req=0x46a10ec3,trans=0x0,sess=0x7)
agentx/master: send response, stat 0 (req=0x46a10ec3,trans=0x0,sess=0x7)
agentx_build: packet built okay

Questions:

  • How can I troubleshoot this further?
  • Why is the socket towards bgpd closed? (Is it also closed towards zebra? Is this normal behavior? How can I verify it?)
  • Is this perhaps a known net-snmp bug?
  • Is disconnect/reconnect normal behavior? bgpd has a 15-second back-off implemented, so any monitoring SNMP queries that arrive within that 15-second window will fail.
  • So everything works, except during that 15-second back-off window.

ii  snmpd          5.8+dfsg-2ubuntu2.3 amd64        SNMP (Simple Network Management Protocol) agents
NET-SNMP version:  5.8
Web:               http://www.net-snmp.org/
Email:             net-snmp-coders@lists.sourceforge.net
@triple-it triple-it added the triage Needs further investigation label Jul 7, 2021
@triple-it
Author

It definitely looks like something related to the bgp_snmp AgentX library.
I found the 'reconnecting in' code in the NET-SNMP subagent.c:

[root@changeme net-snmp ]# grep -Hrin "AgentX master disconnected us" *
agent/mibgroup/agentx/subagent.c:322:            snmp_log(LOG_INFO, "AgentX master disconnected us, reconnecting in %d\n", period);
agent/mibgroup/agentx/subagent.c:324:            snmp_log(LOG_INFO, "AgentX master disconnected us, not reconnecting\n");
[root@changeme net-snmp ]#

The period seems to be configurable with
agentxPingInterval 60
http://www.net-snmp.org/docs/README.agentx.html

@triple-it
Author

triple-it commented Jul 7, 2021

So it seems the reconnect period is defined by NETSNMP_DS_AGENT_AGENTX_PING_INTERVAL,
which sets the pace at which the AgentX session is reopened (or pinged!).

int period = netsnmp_ds_get_int(NETSNMP_DS_APPLICATION_ID, NETSNMP_DS_AGENT_AGENTX_PING_INTERVAL);
...
snmp_alarm_register(period, SA_REPEAT, agentx_reopen_session, NULL);
snmp_log(LOG_INFO, "AgentX master disconnected us, reconnecting in %d\n", period);

This DefaultStore value, NETSNMP_DS_AGENT_AGENTX_PING_INTERVAL, defaults to 15.
But it should be configurable through agentxPingInterval, shouldn't it?

[root@changeme net-snmp ]# grep -Hin -A10 -B10 NETSNMP_DS_AGENT_AGENTX_PING_INTERVAL agent/mibgroup/agentx/agentx_config.c    
agent/mibgroup/agentx/agentx_config.c-145-agentx_parse_agentx_ping_interval(const char *token, char *cptr)
agent/mibgroup/agentx/agentx_config.c-146-{
agent/mibgroup/agentx/agentx_config.c-147-    int x = atoi(cptr);
agent/mibgroup/agentx/agentx_config.c-148-
agent/mibgroup/agentx/agentx_config.c-149-    DEBUGMSGTL(("agentx/config/ping", "%s\n", cptr));
agent/mibgroup/agentx/agentx_config.c-150-    if (x < 1) {
agent/mibgroup/agentx/agentx_config.c-151-        config_perror("Invalid ping interval value");
agent/mibgroup/agentx/agentx_config.c-152-        return;
agent/mibgroup/agentx/agentx_config.c-153-    }
agent/mibgroup/agentx/agentx_config.c-154-    netsnmp_ds_set_int(NETSNMP_DS_APPLICATION_ID,
agent/mibgroup/agentx/agentx_config.c:155:                       NETSNMP_DS_AGENT_AGENTX_PING_INTERVAL, x);
agent/mibgroup/agentx/agentx_config.c-156-}
agent/mibgroup/agentx/agentx_config.c-157-#endif                          /* USING_AGENTX_SUBAGENT_MODULE */
agent/mibgroup/agentx/agentx_config.c-158-
agent/mibgroup/agentx/agentx_config.c-159-/* ---------------------------------------------------------------------
agent/mibgroup/agentx/agentx_config.c-160- *
agent/mibgroup/agentx/agentx_config.c-161- * Sub-agent
agent/mibgroup/agentx/agentx_config.c-162- */
agent/mibgroup/agentx/agentx_config.c-163-
agent/mibgroup/agentx/agentx_config.c-164-
agent/mibgroup/agentx/agentx_config.c-165-/* ---------------------------------------------------------------------
--
agent/mibgroup/agentx/agentx_config.c-235-#ifdef USING_AGENTX_SUBAGENT_MODULE
agent/mibgroup/agentx/agentx_config.c-236-    /*
agent/mibgroup/agentx/agentx_config.c-237-     * tokens for subagent
agent/mibgroup/agentx/agentx_config.c-238-     */
agent/mibgroup/agentx/agentx_config.c-239-    if (SUB_AGENT == agent_role) {
agent/mibgroup/agentx/agentx_config.c-240-      agentx_register_config_handler("agentxPingInterval",
agent/mibgroup/agentx/agentx_config.c-241-                                     agentx_parse_agentx_ping_interval, NULL,
agent/mibgroup/agentx/agentx_config.c-242-                                     "AgentX ping interval");
agent/mibgroup/agentx/agentx_config.c-243-      /* ping and/or reconnect by default every 15 seconds */
agent/mibgroup/agentx/agentx_config.c-244-      netsnmp_ds_set_int(NETSNMP_DS_APPLICATION_ID,
agent/mibgroup/agentx/agentx_config.c:245:                         NETSNMP_DS_AGENT_AGENTX_PING_INTERVAL, 15);
agent/mibgroup/agentx/agentx_config.c-246-    }
agent/mibgroup/agentx/agentx_config.c-247-#endif /* USING_AGENTX_SUBAGENT_MODULE */
agent/mibgroup/agentx/agentx_config.c-248-}
[root@changeme net-snmp ]# 

Would that be a valid workaround: setting 'agentxPingInterval' to 1, so that the SNMP outage from bgpd lasts roughly 1 second at most?

Because it does not seem to work on my end:

2021/07/07 17:56:56 BGP: snmp[info]: AgentX master disconnected us, reconnecting in 15
2021/07/07 17:56:56 BGP: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
2021/07/07 17:57:11 BGP: [EC 100663310] snmp[err]: unknown snmp version 193
2021/07/07 17:57:11 BGP: snmp[info]: NET-SNMP version 5.8 AgentX subagent connected
[root@app2 ~ ]# grep -Hin agentx /etc/snmp/snmpd.conf 
/etc/snmp/snmpd.conf:24:master agentx
/etc/snmp/snmpd.conf:25:agentxPingInterval 1
[root@app2 ~ ]# 

Is this something we can configure as a parameter in FRR, in the AgentX subagent config?
What options are there for configuring AgentX in FRR?

@pguibert6WIND
Member

I think it should be possible to add the line agentxPingInterval 1 to the /etc/snmp/frr.conf file. Did you try it?

@triple-it
Author

Yes, I am running it that way on one site. But an SNMP monitoring probe could still falsely report that OID as down, couldn't it?
(The probe receives no_such_oid.)
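
One way a probe could avoid such false alarms is to retry until slightly past the reconnect window before declaring the OID down. This is only a sketch under stated assumptions: `probe_with_retry`, its parameters, and the injected `query` callable are hypothetical, and the error text is the "No Such Instance" message quoted elsewhere in this thread.

```python
import time

# Error text returned while the subagent is disconnected (as quoted in this thread).
NO_SUCH_INSTANCE = "No Such Instance currently exists at this OID"


def probe_with_retry(query, oid, window=15.0, step=5.0, sleep=time.sleep):
    """Call query(oid) until it returns a real value or the window has passed.

    query  -- hypothetical callable that performs the SNMP GET and returns text
    window -- expected reconnect period in seconds (15 by default here)
    step   -- pause between retries in seconds
    Returns the last answer; the caller treats NO_SUCH_INSTANCE as "down".
    """
    waited = 0.0
    answer = query(oid)
    # Keep retrying until we have waited longer than the reconnect window.
    while NO_SUCH_INSTANCE in answer and waited <= window:
        sleep(step)
        waited += step
        answer = query(oid)
    return answer


# Demo with a fake query that fails twice, then succeeds (no real SNMP traffic).
replies = iter([NO_SUCH_INSTANCE, NO_SUCH_INSTANCE, "INTEGER: 6"])
print(probe_with_retry(lambda oid: next(replies), "example-oid", sleep=lambda s: None))
# INTEGER: 6
```

The trade-off is detection latency: a genuinely missing OID is only reported after the full window plus one step has elapsed.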

@triple-it
Author

Restarting only SNMP does not seem to be enough.
Maybe bgpd also needs to be restarted, but that will have to wait a while on this node, as we are also testing the durability of the ESTABLISHED time towards an IPsec host.
We should do some development on a DEV host first, where we can restart bgpd whenever necessary...

2021/07/15 05:41:03 BGP: [EC 100663310] snmp[err]: unknown snmp version 193
2021/07/15 05:41:03 BGP: snmp[info]: AgentX master disconnected us, reconnecting in 15
2021/07/15 05:41:03 BGP: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
2021/07/15 05:41:18 BGP: [EC 100663310] snmp[err]: unknown snmp version 193
2021/07/15 05:41:18 BGP: snmp[info]: NET-SNMP version 5.8 AgentX subagent connected

A kludge I came up with:

  • Create an SNMP extend Bash script that checks for "No Such Instance currently exists at this OID".
  • If you are really sure the OID exists, you can enable a cache file that stays valid for a couple of minutes.
  • This way monitoring probes don't immediately get "No Such Instance" errors back from the SNMP AgentX.
[root@app2 snmpproxy ]# ./snmpproxy.sh 
Usage: ./snmpproxy.sh <community> <oid>
Example: ./snmpproxy.sh public NET-SNMP-EXTEND-MIB::nsExtendOutput1Line."mon828": To proxy query that OID
Prepare a cache: echo 'NET-SNMP-EXTEND-MIB::nsExtendOutput1Line.mon828 = STRING: x' >   /tmp/'snmpproxy.sh.NET-SNMP-EXTEND-MIB::nsExtendOutput1Line.mon828.cache'
[root@app2 snmpproxy ]# 

The window in which monitoring may show stale data of course grows by CACHETIME...
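
The caching idea can be sketched as follows. This is not the snmpproxy.sh script itself: it is a simplified in-memory illustration, assuming a hypothetical `query` callable that performs the SNMP GET, and a `cache_time` of 120 seconds standing in for "a couple of minutes".

```python
import time

NO_SUCH_INSTANCE = "No Such Instance currently exists at this OID"


class CachedProbe:
    """Serve the last good answer while the OID is temporarily missing."""

    def __init__(self, query, cache_time=120.0, clock=time.monotonic):
        self.query = query            # hypothetical callable: oid -> answer text
        self.cache_time = cache_time  # how long a cached answer may be reused
        self.clock = clock
        self.cache = {}               # oid -> (timestamp, last good answer)

    def get(self, oid):
        answer = self.query(oid)
        now = self.clock()
        if NO_SUCH_INSTANCE not in answer:
            self.cache[oid] = (now, answer)  # remember the good answer
            return answer
        cached = self.cache.get(oid)
        if cached and now - cached[0] <= self.cache_time:
            return cached[1]  # possibly stale, but hides the short outage
        return answer  # cache empty or expired: report the error


# Demo with a fake clock and fake answers (no real SNMP traffic).
now = [0.0]
state = ["INTEGER: 6"]
probe = CachedProbe(lambda oid: state[0], clock=lambda: now[0])
print(probe.get("example-oid"))   # INTEGER: 6
state[0] = NO_SUCH_INSTANCE       # subagent disconnects
now[0] = 10.0
print(probe.get("example-oid"))   # INTEGER: 6 (served from cache)
```

Because the cache is not refreshed while errors are being served, a real outage still surfaces once `cache_time` has elapsed since the last good answer.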

@github-actions

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

@frrbot

frrbot bot commented Dec 16, 2022

This issue will be automatically closed in the specified period unless there is further activity.

@frrbot frrbot bot closed this as completed Dec 23, 2022
@frrbot frrbot bot removed the autoclose label Dec 23, 2022
@ayush1804027

ayush1804027 commented Jun 24, 2023

I am getting "Unknown command" for agentx in the vtysh shell; agentx does not seem to be correctly configured with FRR.
How can I solve this problem?

frr-test-a# conf t
frr-test-a(config)# agentx
% [ZEBRA] Unknown command: agentx
frr-test-a(config)#

I have already configured /etc/frr/daemons with the following parameters, including -M snmp in the daemon options:
zebra=yes
bgpd=yes
ospfd=yes
ospf6d=yes
ripd=yes
ripngd=yes
isisd=yes
pimd=yes
ldpd=yes
nhrpd=yes
eigrpd=yes
babeld=yes
sharpd=yes
staticd=yes
pbrd=yes
bfdd=yes
fabricd=yes

vtysh_enable=yes
zebra_options=" -s 90000000 --daemon -A 127.0.0.1 -M snmp"
bgpd_options=" --daemon -A 127.0.0.1 -M snmp"
ospfd_options=" --daemon -A 127.0.0.1 -M snmp"
ospf6d_options=" --daemon -A ::1 -M snmp"
ripd_options=" --daemon -A 127.0.0.1 -M snmp"
ripngd_options=" --daemon -A ::1 -M snmp"
isisd_options=" --daemon -A 127.0.0.1 -M snmp"
pimd_options=" --daemon -A 127.0.0.1 -M snmp"
ldpd_options=" --daemon -A 127.0.0.1 -M snmp"
nhrpd_options=" --daemon -A 127.0.0.1 -M snmp"
eigrpd_options=" --daemon -A 127.0.0.1 -M snmp"
babeld_options=" --daemon -A 127.0.0.1 -M snmp"
sharpd_options=" --daemon -A 127.0.0.1 -M snmp"
staticd_options=" --daemon -A 127.0.0.1 -M snmp"
pbrd_options=" --daemon -A 127.0.0.1 -M snmp"
bfdd_options=" --daemon -A 127.0.0.1 -M snmp"
fabricd_options=" --daemon -A 127.0.0.1 -M snmp"
