Add etcd snapshot and restore #2118
Conversation
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
🚢 🇮🇹
💰
Will the snapshots be stored forever?
@galal-hussein I tried to test this after the learner change was introduced. If I do a cluster reset, I get a panic like the one below, which was not seen before. What is going wrong, or is it simply not ready yet? How I test:

This is 100% reproducible for me.
@ElisaMeng I will check that out, thanks a lot. If you don't mind, can you open a separate issue for the problem?
@galal-hussein I will do that. I posted here because I thought this was a WIP and didn't want to sound the alarm too early. :) See #2131
LGTM
Destination: &ServerConfig.DisableSnapshots,
},
&cli.StringFlag{
	Name: "snapshot-dir",
It's too bad we didn't decide earlier whether things are -path or -dir. I wanted to complain that we're adding two paths, but one is called Path and one is called Dir; we're already inconsistent on this for other config settings. Should the snapshot-restore-path point at a single file?
@@ -89,6 +97,25 @@ func nameFile(config *config.Control) string {
	return filepath.Join(dataDir(config), "name")
}

func snapshotDir(config *config.Control) (string, error) {
	if config.SnapshotDir == "" {
		// we have to create the snapshot dir if we are using
So we create the default dir if it doesn't exist? What happens if the user specifies a nonexistent path? Do we want to create that also, or just fail when trying to save the snapshot?
I'd prefer we fail in that case. I wouldn't want the software creating n+1 nested dir structures.
Yeah same here, maybe check here that it's a writable directory?
Use #2154
Proposed changes
Add the etcd snapshot and restoration feature; this includes adding 5 new flags:
Types of changes
Verification
Snapshots should be enabled by default, and the snapshot dir defaults to /server/db/snapshots; you can verify by checking that directory for saved snapshots.
Restoration works the same way as cluster-reset: when the flag is specified, it moves the old data dir to /server/db/etcd-old, attempts to restore the snapshot by creating a new data dir, and then starts etcd forcing a new cluster so that it starts as a 1-member cluster.
To verify the restoration, you should see the cluster start with only one etcd member, with the data from the specified snapshot restored correctly.
The PR also adds etcd snapshot retention, as well as a flag to disable snapshots altogether.
Testing
### 1. Testing disabling snapshots
You should see that no snapshot has been created in /server/db/snapshots.
### 2. Testing snapshot interval and retention
You should see snapshots created every 5 seconds in /server/db/snapshots, with no more than 10 snapshots kept in the directory.
### 3. Testing snapshot restore
To test snapshot restore, use --snapshot-restore-path and point it at any snapshot file you have on the system.
The cluster should restore, etcd will have only one member, and you should see that the /server/db/etcd-old directory has been created.
Linked Issues
rancher/rke2#45