[BUG] - Deploying Latest 'develop' branch destroys all JupyterLab home directories #2669

Closed
kenafoster opened this issue Aug 30, 2024 · 2 comments · Fixed by #2673
Labels
type: bug 🐛 Something isn't working

Comments

@kenafoster
Contributor

Describe the bug

Running nebari deploy on top of a previously created cluster, with a Nebari install that includes commit ed170cb73f11df42d4d6b6536f7bea92ae1fe934 (which adds a 'count' to the efs module), makes Nebari destroy the entire efs module, and with it all of the JupyterLab data, and then recreate it.

Expected behavior

New changes are deployed and existing data persists.

OS and architecture in which you are running Nebari

Ubuntu 22.04 amd64

How to Reproduce the problem?

Deploy Nebari into AWS with any previous version up to 2024.7.1. Then install a Nebari version that includes this commit, i.e. this line:

count = var.efs_enabled ? 1 : 0

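This happens because adding count = var.efs_enabled ? 1 : 0 changes the resource address from module.efs.aws_efs_file_system.main to module.efs[0].aws_efs_file_system.main. Terraform then treats the old address as no longer in the configuration, so it plans to destroy it and create the new one.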

The fix is to add a Terraform moved block that maps the old address to the new one.
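A minimal sketch of such a moved block, assuming it is placed in the same root module that declares module "efs" (moved blocks require Terraform 1.1 or newer). It tells Terraform that the existing count-less instance module.efs is now module.efs[0], so the EFS file system is re-addressed in state instead of being destroyed and recreated:

```hcl
# Assumption: this lives next to the module "efs" block that now uses
# count = var.efs_enabled ? 1 : 0.
# Re-address the pre-existing module instance (and every resource in it,
# including aws_efs_file_system.main) to instance [0] instead of
# destroying it and creating a new one.
moved {
  from = module.efs
  to   = module.efs[0]
}
```

With this in place, terraform plan should report module.efs.aws_efs_file_system.main as having moved to module.efs[0].aws_efs_file_system.main rather than planning a destroy and a create.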

I just noticed that some other changes included since the 2024.7.1 release already implement this correctly.

Command output

The relevant part of the Terraform is:

[terraform]:   # module.efs.aws_efs_file_system.main will be destroyed
[terraform]:   # (because module.efs is not in configuration)
[terraform]:   - resource "aws_efs_file_system" "main" {
[terraform]:       - arn                             = "arn:aws-us-gov:elasticfilesystem:us-gov-west-1:xxxxxxxxxxxx:file-system/fs-xxxxxxxxxxx" -> null
...
...
...
[terraform]:   # module.efs[0].aws_efs_file_system.main will be created
[terraform]:   + resource "aws_efs_file_system" "main" {
[terraform]:       + arn                     = (known after apply)
[terraform]:       + availability_zone_id    = (known after apply)

Versions and dependencies used.

Nebari version: previously deployed tag 2024.7.1; now deployed from nebari-dev/nebari:develop HEAD at 498e569.

Compute environment

None

Integrations

No response

Anything else?

No response

@kenafoster kenafoster added type: bug 🐛 Something isn't working needs: triage 🚦 Someone needs to have a look at this issue and triage labels Aug 30, 2024
@kenafoster
Contributor Author

I'm not sure whether to report this as a follow-up issue, but while trying to recover the rest of the destroyed instance, I hit another blocker.

The change to the EFS system forced replacement of module.jupyterhub-nfs-mount[0].kubernetes_persistent_volume.main.

When Terraform tried to delete that PV, it timed out because a PVC is bound to it, yet the relevant code declares no dependency between the two (https://github.com/nebari-dev/nebari/blob/ed170cb73f11df42d4d6b6536f7bea92ae1fe934/src/_nebari/stages/kubernetes_services/template/modules/kubernetes/nfs-mount/main.tf).

So Terraform sees a change to the kubernetes_persistent_volume that forces replacement, but the underlying call to destroy the PV fails because the PV is bound to the PVC created earlier in the same file. The PVC doesn't reference the PV by name (only by its storage class), so I guess Terraform can't determine that the PVC must be destroyed or replaced along with the PV, if that makes sense.
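One possible way to give Terraform the missing dependency, sketched under assumptions: the PVC references the PV by name through spec.volume_name, which creates an explicit dependency edge so Terraform destroys the PVC before the PV, and lifecycle.replace_triggered_by (Terraform 1.2 or newer) makes the PVC get replaced whenever the PV is. The resource label "main" for the PVC, the metadata, the storage class and the size are hypothetical placeholders, not the values used in Nebari's nfs-mount module:

```hcl
# Hypothetical sketch: names, storage class and size are placeholders,
# not the values from Nebari's nfs-mount module.
resource "kubernetes_persistent_volume_claim" "main" {
  metadata {
    name      = "example-nfs-claim" # placeholder
    namespace = "example-namespace" # placeholder
  }

  spec {
    access_modes       = ["ReadWriteMany"]
    storage_class_name = "example-nfs" # placeholder; must match the PV's class

    resources {
      requests = {
        storage = "10Gi" # placeholder
      }
    }

    # Referencing the PV's name creates an explicit dependency, so when both
    # are replaced Terraform destroys this PVC before the PV it is bound to.
    volume_name = kubernetes_persistent_volume.main.metadata[0].name
  }

  lifecycle {
    # Plan a replacement of this PVC whenever the PV is updated or replaced.
    replace_triggered_by = [kubernetes_persistent_volume.main]
  }
}
```

The volume_name reference alone fixes the ordering but would not force a replacement if the PV keeps the same name, which is why the sketch pairs it with replace_triggered_by.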

@Adam-D-Lewis
Member

Adam-D-Lewis commented Aug 30, 2024

@kenafoster I believe this issue is the same as #2638 and should be fixed by #2639 which is already merged. Can you try out the latest develop branch and see if it's still an issue?

Update: I think I was mistaken; one more moved block is needed, for AWS only.

@Adam-D-Lewis Adam-D-Lewis added this to the Next Release milestone Aug 30, 2024
@github-project-automation github-project-automation bot moved this from New 🚦 to Done 💪🏾 in 🪴 Nebari Project Management Sep 2, 2024
@Adam-D-Lewis Adam-D-Lewis removed the needs: triage 🚦 Someone needs to have a look at this issue and triage label Sep 3, 2024
2 participants