Create Deleted DatasetVersion "Clean up" Job #7286

Open
nayib-jose-gloria opened this issue Jul 17, 2024 · 1 comment

@nayib-jose-gloria
Contributor

Create a "clean-up" batch job that deletes the DatasetArtifacts and DatasetVersions for an input list of DatasetVersionIds.

See the data model in orm.py and the business logic entity objects in entities.py.

Leverage existing functions for artifact/DB clean-up, such as:
business layer function to delete dataset version artifacts from S3
persistence layer function to delete dataset version + associated dataset artifact rows from the DB
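
A minimal sketch of how the job's core loop could tie those two functions together. The `BusinessLogic` and `Database` protocols, their method names, and `CleanupReport` are hypothetical stand-ins, not the actual data-portal signatures; the real functions should be substituted in. Deleting S3 artifacts before the DB rows is also an assumption: the idea is that a failed run can be retried while the rows still exist.

```python
import logging
from dataclasses import dataclass, field
from typing import Iterable, List, Protocol

logger = logging.getLogger(__name__)


class BusinessLogic(Protocol):
    """Hypothetical stand-in for the business layer (not the real interface)."""

    def delete_dataset_version_artifacts(self, dataset_version_id: str) -> None: ...


class Database(Protocol):
    """Hypothetical stand-in for the persistence layer (not the real interface)."""

    def delete_dataset_version_rows(self, dataset_version_id: str) -> None: ...


@dataclass
class CleanupReport:
    deleted: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def clean_up_dataset_versions(
    dataset_version_ids: Iterable[str],
    business_logic: BusinessLogic,
    database: Database,
) -> CleanupReport:
    """Delete artifacts and DB rows for each ID, continuing past failures."""
    report = CleanupReport()
    for dv_id in dataset_version_ids:
        try:
            # Assumed ordering: remove S3 artifacts first, then the DB rows,
            # so a failure partway through leaves rows to retry against.
            business_logic.delete_dataset_version_artifacts(dv_id)
            database.delete_dataset_version_rows(dv_id)
        except Exception:
            logger.exception("Failed to clean up DatasetVersion %s", dv_id)
            report.failed.append(dv_id)
        else:
            report.deleted.append(dv_id)
    return report
```

Collecting failures instead of aborting on the first one lets a single run clean up as much as possible; the job can then exit non-zero if `report.failed` is non-empty.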

Other batch jobs created in data-portal before:
Publish revisions: code and infra
Dataset metadata update: code and infra

Memory and vCPU requirements should be fine-tuned during testing in an rdev environment.
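
One way to try different memory and vCPU values in rdev without editing the job definition each time is to override them per submission. The sketch below only builds the keyword arguments for boto3's Batch `submit_job` call. The job name, queue, job definition, and the `DATASET_VERSION_IDS` environment variable are hypothetical placeholders, and the resource values are arbitrary starting points, not recommendations.

```python
from typing import Any, Dict, List


def build_submit_job_kwargs(
    job_name: str,
    job_queue: str,
    job_definition: str,
    dataset_version_ids: List[str],
    vcpus: int,
    memory_mib: int,
) -> Dict[str, Any]:
    """Build kwargs for boto3's batch submit_job with per-run resource overrides."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            # Hypothetical variable name; the job would read the IDs from it.
            "environment": [
                {"name": "DATASET_VERSION_IDS", "value": ",".join(dataset_version_ids)}
            ],
            # Batch expects resource values as strings; MEMORY is in MiB.
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }
```

The result could then be passed as `boto3.client("batch").submit_job(**kwargs)`, rerunning with different `vcpus` and `memory_mib` values until the job completes comfortably.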

@nayib-jose-gloria changed the title from Create Deleted Dataset "Clean up" Job to Create Deleted DatasetVersion "Clean up" Job on Jul 17, 2024
@lvreynoso self-assigned this on Jul 17, 2024
@nayib-jose-gloria
Contributor Author

Note to @lvreynoso: @Bento007 has suggested looking into using an AWS Lambda for this instead of a batch job; I think it's worth considering.
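
For comparison, a rough sketch of what the Lambda variant's entry point might look like. The event shape (a `dataset_version_ids` key) and the injected `clean_up` callable are assumptions made for illustration; nothing here reflects an existing data-portal handler.

```python
from typing import Any, Callable, Dict, List


def parse_dataset_version_ids(event: Dict[str, Any]) -> List[str]:
    """Validate the (assumed) event shape and return de-duplicated IDs in order."""
    ids = event.get("dataset_version_ids")
    if not isinstance(ids, list) or not all(isinstance(i, str) and i for i in ids):
        raise ValueError(
            "event must contain a list of non-empty strings under 'dataset_version_ids'"
        )
    return list(dict.fromkeys(ids))


def make_handler(
    clean_up: Callable[[List[str]], List[str]],
) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Build a Lambda-style handler around a clean-up function.

    `clean_up` is a hypothetical callable that deletes the given
    DatasetVersions and returns the IDs it failed to delete.
    """

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        ids = parse_dataset_version_ids(event)
        failed = clean_up(ids)
        return {"requested": len(ids), "failed": failed}

    return handler
```

Lambda invocations are capped at 15 minutes, so a very large list of IDs may need to be split across several invocations; that limit is one of the trade-offs to weigh against a batch job.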
