
cluster: reject writes only when data disk is degraded #24436

Merged
nvartolomei merged 1 commit into redpanda-data:dev from nvartolomei:nv/CORE-8349 on Dec 7, 2024

Conversation

@nvartolomei (Contributor) commented Dec 4, 2024

Background: https://redpandadata.atlassian.net/browse/CORE-8349

The health monitor now tracks only the data disk, since the sole consumer of that state is the write-rejection path. The cache disk's state is irrelevant at the cluster level.

This was tested manually by creating a cluster with a custom cache disk mountpoint and trying to produce to it.

Before this commit, producing would have failed with a full cache disk. After this commit, producing fails only if the data disk is full.
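
As a rough illustration of the change (a minimal C++ sketch with hypothetical names; the actual Redpanda types differ), the health monitor now carries only the data disk's space alert, and the write path keys off that single value:

```cpp
// Minimal sketch, assuming hypothetical names; not the actual
// Redpanda API.
enum class disk_space_alert { ok, low_space, degraded };

struct node_health_report {
    // Previously this report also carried the cloud storage cache
    // disk's state. Only the data disk is tracked now, since the
    // sole consumer of this state is the write-rejection path.
    disk_space_alert data_disk_alert;
};

bool should_reject_writes(const node_health_report& report) {
    return report.data_disk_alert == disk_space_alert::degraded;
}
```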

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Bug Fixes

  • If a separate (discrete) disk is used for the cloud storage cache, Redpanda previously rejected writes when that disk (the cache disk) was full (in a degraded state). This was incorrect, since the cache disk is not on the write path. Writes are now rejected only when the data disk is full (in a degraded state); see the sketch below.
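
To make the before/after concrete, here is a hedged sketch (illustrative names only, reusing the `disk_space_alert` enum from the sketch above; not the real code). Previously the rejection predicate effectively considered every local disk, so a full cache disk alone could block produces; now only the data disk is consulted:

```cpp
// Old behavior (sketch): a degraded cache disk alone triggered
// write rejection.
bool should_reject_writes_old(disk_space_alert data_disk,
                              disk_space_alert cache_disk) {
    return data_disk == disk_space_alert::degraded
           || cache_disk == disk_space_alert::degraded;
}

// New behavior (sketch): only the data disk matters, because the
// cache disk is not on the write path.
bool should_reject_writes_new(disk_space_alert data_disk,
                              disk_space_alert /*cache_disk*/) {
    return data_disk == disk_space_alert::degraded;
}
```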

@vbotbuildovich (Collaborator) commented Dec 4, 2024

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59241#01939393-73db-4935-87e8-9b193bc30f60:

"rptest.tests.full_disk_test.WriteRejectTest.test_refresh_disk_health"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59251#01939471-a183-4f65-8f55-0d0b88888849:

"rptest.tests.archive_retention_test.CloudArchiveRetentionTest.test_delete.cloud_storage_type=CloudStorageType.ABS.retention_type=retention.ms"

@vbotbuildovich (Collaborator) commented

Retry command for Build#59241

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/full_disk_test.py::WriteRejectTest.test_refresh_disk_health

@vbotbuildovich (Collaborator) commented

the below tests from https://buildkite.com/redpanda/redpanda/builds/59251#01939417-d322-4b6a-b8f3-30c4e837a917 have failed and will be retried

gtest_raft_rpunit

@vbotbuildovich (Collaborator) commented

Retry command for Build#59251

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/archive_retention_test.py::CloudArchiveRetentionTest.test_delete@{"cloud_storage_type":2,"retention_type":"retention.ms"}

@dotnwat (Member) left a comment

makes sense to me!

@nvartolomei (Contributor, Author) commented

/ci-repeat 1
tests/rptest/tests/archive_retention_test.py::CloudArchiveRetentionTest.test_delete@{"cloud_storage_type":2,"retention_type":"retention.ms"}

@nvartolomei changed the title from "cluster: reject writes only if data disk is degraded" to "cluster: reject writes only when data disk is degraded" on Dec 6, 2024
@nvartolomei (Contributor, Author) commented

/ci-repeat 1
skip-redpanda-builds

@nvartolomei merged commit 252f5be into redpanda-data:dev on Dec 7, 2024
17 checks passed
@nvartolomei deleted the nv/CORE-8349 branch on December 7, 2024 at 03:35
@nvartolomei restored the nv/CORE-8349 branch on December 7, 2024 at 03:35
@vbotbuildovich (Collaborator) commented

/backport v24.3.x

@vbotbuildovich (Collaborator) commented

/backport v24.2.x

@vbotbuildovich (Collaborator) commented

/backport v24.1.x
