-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Instead of erroring, clean up after dangling IDs in DB #14321
Instead of erroring, clean up after dangling IDs in DB #14321
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mheon, vrothberg The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
// remove things from the table during | ||
// it. | ||
logrus.Errorf("Database issue: dangling ID %s found (not a pod or container) - removing", string(id)) | ||
toRemoveIDs = append(toRemoveIDs, string(id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you have to add a continue here. Right now the logic will go to the next step which is podBkt.Get(stateKey)
and will result in a nil pointer panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very good catch! I agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh sry this is not an actual for loop, just a function so you have to return
37803d9
to
68c1035
Compare
For various (mostly legacy) reasons, Podman presently maintains a unified namespace for pods and containers - IE, we cannot have both a pod and a container named "test" at the same time. To implement this, we use a global database table of every pod and container ID (and another of every pod and container name). These entries should be added when containers/pods are added, and removed when containers/pods are removed, with the database's transactional integrity providing a guarantee that this is batched with the overall removal and that the DB should remain sane and consistent no matter what. As such, we treat a dangling ID as a hard error that stops the use of Podman. Unfortunately, we have someone run into this last Friday. I'm still not certain how exactly their DB got into this state, but without further clarification there, we can consider removing the error and making Podman instead clean up and remove any dangling IDs, which should restore Podman to a serviceable state. Drop an error message if we do this, though, because people should know that the DB is in a bad state. [NO NEW TESTS NEEDED] it is deliberately impossible to produce a configuration that would test this without hex-editing the DB file. Signed-off-by: Matthew Heon <mheon@redhat.com>
68c1035
to
b7dbc50
Compare
/lgtm |
/hold cancel |
/cherry-pick v3.0.1-rhel |
Bot's broken, doing it manual-like |
[v3.0.1-rhel] Backport #14321
For various (mostly legacy) reasons, Podman presently maintains a unified namespace for pods and containers - IE, we cannot have both a pod and a container named "test" at the same time. To implement this, we use a global database table of every pod and container ID (and another of every pod and container name).
These entries should be added when containers/pods are added, and removed when containers/pods are removed, with the database's transactional integrity providing a guarantee that this is batched with the overall removal and that the DB should remain sane and consistent no matter what. As such, we treat a dangling ID as a hard error that stops the use of Podman.
Unfortunately, we have someone run into this last Friday. I'm still not certain how exactly their DB got into this state, but without further clarification there, we can consider removing the error and making Podman instead clean up and remove any dangling
IDs, which should restore Podman to a serviceable state. Drop an error message if we do this, though, because people should know that the DB is in a bad state.
[NO NEW TESTS NEEDED] it is deliberately impossible to produce a configuration that would test this without hex-editing the DB file.