Keepalive events stop processing after the first occurrence #2135

Closed
barryorourke opened this issue Sep 28, 2018 · 3 comments · Fixed by #2502

@barryorourke

I've been experimenting with alerting on keepalive events over the past few days and have managed to get them to alert via a handler. However, I've not managed to figure out how to get them to alert more than once or to get past the warning state.

@palourde suggested I raise this bug after I discussed it with him on Slack.

Expected Behavior

Keepalive events should continue to run and be handled after the agent has stopped running.

Current Behavior

I see "event filtered" logged every 20 seconds whilst the agent is running. When I stop the agent, I see "sending event to handler" after 120 seconds, and then no more events are processed until I start the agent back up.
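To make the gap concrete, here is a rough sketch of the timeline described above. It uses the 20-second interval and 120-second delay observed in the logs, and it assumes (this is the expected behavior being asked for, not confirmed backend behavior) that a failing keepalive would be re-handled once per interval after the first failure. The function name and the `horizon` parameter are hypothetical.

```python
def expected_keepalive_failures(timeout=120, interval=20, horizon=200):
    """Seconds after the agent stops at which a failing keepalive
    event would be expected to reach the handler.

    Assumption: one failing event per interval once the timeout has
    elapsed. The values 120 and 20 come from the logs in this report.
    """
    return list(range(timeout, horizon + 1, interval))


# What was actually seen: a single handled event at 120 seconds.
observed = [120]
expected = expected_keepalive_failures()
missing = [t for t in expected if t not in observed]

print("expected:", expected)
print("missing: ", missing)
```

Under these assumptions, every entry in `missing` is an occurrence that never reached the handler while the agent was down.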

Steps to Reproduce (for bugs)

  1. set up keepalive handler
[root@phbtest ~]# sensuctl handler info keepalive
{
  "type": "Handler",
  "spec": {
    "name": "keepalive",
    "type": "set",
    "timeout": 60,
    "handlers": [
      "cst",
      "cstv2"
    ],
    "filters": null,
    "env_vars": null,
    "environment": "default",
    "organization": "default"
  }
}
[root@phbtest ~]# sensuctl handler info cstv2
{
  "type": "Handler",
  "spec": {
    "name": "cstv2",
    "type": "pipe",
    "command": "/usr/bin/handler-cst-sensu2",
    "timeout": 0,
    "handlers": null,
    "filters": [
      "is_incident"
    ],
    "env_vars": null,
    "environment": "default",
    "organization": "default"
  }
}
[root@phbtest ~]# cat /usr/bin/handler-cst-sensu2
#!/usr/bin/env python3
import datetime
import sys

if __name__ == "__main__":
    event = sys.stdin.read()
    with open('/tmp/sensu-v2handler-dump', 'w') as f:
        f.write("{}: {}".format(str(datetime.datetime.now()), event))
  2. turn off the agent
  3. watch the logs; after the initial occurrence of the failing state, the keepalive is not handled again until the agent is restarted.
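Note that the handler above opens its dump file in write mode, so each handled event overwrites the previous one. When watching for repeated occurrences, it may be easier to append one line per event instead. The following is a minimal sketch of such a variant, not a tested replacement. The `check.status` and `check.occurrences` field names are assumptions about the event JSON and may differ in this nightly build, and the helper names `format_line` and `append_event` are hypothetical.

```python
import datetime
import json

DUMP_PATH = "/tmp/sensu-v2handler-dump"  # same path as the handler above


def format_line(raw_event, now):
    """Build one log line: timestamp, status, occurrences, raw event.

    Assumption: the event is a JSON object with a "check" object that
    carries "status" and "occurrences". Missing fields show as None.
    """
    try:
        event = json.loads(raw_event)
    except ValueError:
        event = None
    check = event.get("check") if isinstance(event, dict) else None
    if not isinstance(check, dict):
        check = {}
    return "{} status={} occurrences={} {}\n".format(
        now.isoformat(),
        check.get("status"),
        check.get("occurrences"),
        raw_event.strip(),
    )


def append_event(raw_event, path=DUMP_PATH, now=None):
    """Append the formatted line so earlier events are kept."""
    if now is None:
        now = datetime.datetime.now()
    with open(path, "a") as f:
        f.write(format_line(raw_event, now))
```

In the handler script itself, this would be called as `append_event(sys.stdin.read())` under the `if __name__ == "__main__":` guard, after importing `sys`. With one line per handled event, a missing second occurrence shows up as a dump file that stops growing after the agent is turned off.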

Context

Keepalives do not behave in the same way as they did under Sensu 1.x.

Your Environment

  • Sensu version used (sensuctl, sensu-backend, and/or sensu-agent):
  • Installation method (packages, binaries, docker etc.):
[root@phbtest ~]# rpm -qa sensu-*
sensu-cli-2.0.0~nightly+20180926-1.x86_64
sensu-backend-2.0.0~nightly+20180924-1.x86_64
sensu-agent-2.0.0~nightly+20180926-1.x86_64
  • Operating System and version (e.g. Ubuntu 14.04):
    Scientific Linux 7.5
@palourde added the bug label Sep 28, 2018
@barryorourke
Author

Through further experimentation I have discovered that the occurrence count will increment if you restart sensu-backend whilst the agent is down.

@nikkictl

nikkictl commented Oct 30, 2018

Hey @barryorourke, thanks for bringing this to our attention! I just submitted a PR that addresses a few keepalive bugs. Could you try reproducing after #2245 lands and let us know if you're still experiencing this bug? I've confirmed that the PR does not address this issue.

@barryorourke
Author

This should probably be labeled "1.x parity"?
