Galal hussein etcd backup restore #2154
Conversation
The sub command was discussed in a conversation last week or Monday, and I think it was expressed as a desire, but as a step after this was accomplished.
Makes sense. I'll wait to see what Darren says, only to make sure his opinion on the UX doesn't clash with what we have here.
Ok, synced with @ibuildthecloud and here are the changes. SPOILER: he wants to do less than what I proposed.
Two things unrelated to this PR:
To track the unrelated items we'd like to pick up later, I have created the following for tracking purposes:
For the cron, do we need to support cronspec?
@cjellick In regards to
So should there always be a snapshot needed to restore a cluster if it loses quorum?
If you issue
Squashed commits (subjects in order; identical repeats collapsed with a count):

* Add etcd snapshot and restore
* fix error logs
* goimports
* fix flag describtion
* Add disable snapshot and retention
* use creation time for snapshot retention

(each Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>)

* unexport method, update var name
* adjust snapshot flags
* update var name, string concat
* revert previous change, create constants
* update
* updates
* type assertion error checking
* update (×3)
* pr remediation (×5)
* updates (×2)
* simplify logic, remove unneeded function
* update flags (×2)
* add comment
* exit on restore completion, update flag names, move retention check (×3)
* update disable snapshots flag and field names
* move function
* update field names
* update var and field names (×2)
* update defaultSnapshotIntervalMinutes to 12 like rke
* update directory perms
* update etc-snapshot-dir usage
* update interval to 12 hours
* fix usage typo
* add cron (×3)
* wire in cron (×7)
* update deps target to work, add build/data target for creation, and generate
* remove dead make targets
* error handling, cluster reset functionality (×2)
* update
* remove intermediate dapper file

(each Signed-off-by: Brian Downs <brian.downs@gmail.com>)

Co-authored-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Proposed changes
Add an etcd snapshot and restoration feature. This includes adding 5 new flags:
Types of changes
Verification
Snapshots are enabled by default, and the snapshot directory defaults to /server/db/snapshots; you can verify by checking that directory for saved snapshots.
Restoration works the same way as cluster-reset: when the flag is specified, k3s moves the old data directory to /server/db/etcd-old, then attempts to restore the snapshot by creating a new data directory, and then starts etcd forcing a new cluster so that it starts as a 1-member cluster. Upon completion of the restore, k3s exits.
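The data-directory shuffle described above can be sketched in shell. This is only an illustration of the documented behavior against a throwaway temp directory, not the actual k3s code (which does this internally in Go):

```shell
# Illustration of the restore steps described above, using a temp dir
# as a stand-in for the k3s data dir.
DATA_ROOT=$(mktemp -d)
mkdir -p "$DATA_ROOT/server/db/etcd"
echo "old member data" > "$DATA_ROOT/server/db/etcd/member.db"

# 1. Move the old data dir aside (preserved as etcd-old, not deleted).
mv "$DATA_ROOT/server/db/etcd" "$DATA_ROOT/server/db/etcd-old"

# 2. Create a fresh data dir; the real code restores the snapshot here
#    and then starts etcd forcing a new, single-member cluster.
mkdir -p "$DATA_ROOT/server/db/etcd"

ls "$DATA_ROOT/server/db"   # prints: etcd  etcd-old
```

The old data is kept rather than deleted, so a failed restore can still be recovered by hand from etcd-old.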
To verify the restoration, you should see the cluster start with only one etcd member and the data from the specified snapshot restored correctly.
The PR also adds etcd snapshot retention, as well as a flag to disable snapshots altogether.
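For orientation, a server invocation exercising the new snapshot options might look like the following. The flag spellings here are assembled from the commit messages in this PR (snapshot dir, retention, cron schedule) and may not match the merged CLI exactly; check `k3s server --help`.

```shell
# Hypothetical invocation; flag spellings taken from this PR's commit
# messages and may differ in the merged CLI.
k3s server \
  --etcd-snapshot-schedule-cron '0 */12 * * *' \
  --etcd-snapshot-retention 10 \
  --etcd-snapshot-dir /server/db/snapshots
```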
Testing
### 1. Testing disabling snapshots
You should see that no snapshots have been created in /server/db/snapshots.
### 2. Testing snapshot interval and retention
You should see snapshots created every 5 seconds in /server/db/snapshots, and no more than 10 snapshots in the directory.
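The retention behavior can be illustrated with a small shell sketch: keep only the N newest files by creation time (the commits above note retention was switched to use creation time). The pruning loop below is a hypothetical stand-in for what k3s does internally, run against a temp directory:

```shell
# Hypothetical illustration of retention: with retention set to 10 and
# 15 snapshots on disk, the 5 oldest are pruned so at most 10 remain.
RETENTION=10
SNAP_DIR=$(mktemp -d)   # stand-in for /server/db/snapshots
for i in $(seq 1 15); do
  touch "$SNAP_DIR/etcd-snapshot-$i"
done

# Keep the $RETENTION most recent files, delete the rest.
ls -1t "$SNAP_DIR" | tail -n +$((RETENTION + 1)) | while read -r f; do
  rm -- "$SNAP_DIR/$f"
done

ls "$SNAP_DIR" | wc -l   # prints 10
```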
Make sure that if the `--cluster-reset-restore-path` flag is given, the `--cluster-reset` flag is also present. Verify that there's a failure when an invalid path is provided.
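The two checks above can be sketched as a small validation function. This is a hypothetical illustration of the documented behavior; the real validation lives in the k3s Go code:

```shell
# Hypothetical sketch of the validation described above: a restore path
# requires --cluster-reset, and the snapshot file must exist.
validate_restore_flags() {
  cluster_reset=$1
  restore_path=$2
  if [ -n "$restore_path" ] && [ "$cluster_reset" != "true" ]; then
    echo "error: --cluster-reset-restore-path requires --cluster-reset" >&2
    return 1
  fi
  if [ -n "$restore_path" ] && [ ! -f "$restore_path" ]; then
    echo "error: snapshot not found: $restore_path" >&2
    return 1
  fi
  return 0
}

# Restore path without --cluster-reset: rejected.
validate_restore_flags "false" "/tmp/whatever.db" || echo "rejected"
# Restore path that does not exist: rejected.
validate_restore_flags "true" "/no/such/snapshot.db" || echo "rejected"
```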
### 3. Testing snapshot restore
To test snapshot restore, use --etcd-snapshot-restore-path and point it at any snapshot file that you have on the system.
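For example (note the restore flag appears as both `--etcd-snapshot-restore-path` and `--cluster-reset-restore-path` in this PR's discussion, so treat the spelling below as an assumption and check `k3s server --help`):

```shell
# Hypothetical restore invocation per the description above; the
# snapshot file name placeholder is intentionally left generic.
k3s server \
  --cluster-reset \
  --cluster-reset-restore-path /server/db/snapshots/<snapshot-file>
```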
The cluster should restore, etcd should have only one member, and you should see that the /server/db/etcd-old directory has been created.
Linked Issues
rancher/rke2#45