Remove log store truncation from resource mg #620

Merged
merged 1 commit into from
Jan 3, 2025

Besroy
Contributor

@Besroy Besroy commented Dec 24, 2024

Currently both resource_mgr and raft can call the log store's truncate, but resource_mgr only truncates up to an LSN that is no larger than the compact LSN. That means resource_mgr merely re-truncates logs that will be, or already have been, truncated during compaction. If resource_mgr and raft call truncate concurrently, however, a crash can happen.
For example, raft truncates logs up to compact_lsn and executes

m_records.truncate(upto_lsn);
m_start_lsn.store(upto_lsn + 1);

Then resource_mgr truncates up to truncate_lsn (which is no larger than compact_lsn) and executes m_trunc_ld_key = m_records.at(upto_lsn).m_trunc_key; Since the record at truncate_lsn has already been truncated, an exception is thrown.
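
The interleaving described above can be reproduced sequentially with a minimal sketch. ToyLogStore, its std::map-backed m_records, and the helper second_truncate_throws are hypothetical stand-ins that only borrow member names from the snippet; this is not the HomeStore log store, and the real container may fail in a different way than std::map::at does.

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <stdexcept>

// Hypothetical toy log store (assumption: records are keyed by LSN in a std::map).
struct ToyLogStore {
    struct Record {
        int64_t m_trunc_key;
    };

    std::map<int64_t, Record> m_records;
    std::atomic<int64_t> m_start_lsn{0};
    int64_t m_trunc_ld_key{-1};

    void append(int64_t lsn) { m_records.emplace(lsn, Record{lsn * 10}); }

    void truncate(int64_t upto_lsn) {
        // Throws std::out_of_range if the record at upto_lsn is already gone.
        m_trunc_ld_key = m_records.at(upto_lsn).m_trunc_key;
        // Drop every record with lsn <= upto_lsn, then advance the start LSN.
        m_records.erase(m_records.begin(), m_records.upper_bound(upto_lsn));
        m_start_lsn.store(upto_lsn + 1);
    }
};

// Replays the two calls in the problematic order on a single thread.
bool second_truncate_throws() {
    ToyLogStore store;
    for (int64_t lsn = 0; lsn < 10; ++lsn) { store.append(lsn); }

    store.truncate(7); // raft: truncate up to compact_lsn
    try {
        store.truncate(5); // resource_mgr: truncate_lsn <= compact_lsn
    } catch (const std::out_of_range&) {
        return true; // the record at lsn 5 was already removed
    }
    return false;
}
```

With two real threads the same ordering can occur without any deliberate sequencing, which is why the fix leaves raft as the only caller of truncate rather than trying to guard the lookup.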

This PR removes truncation from resource_mgr so that the two callers can no longer race.

@Besroy Besroy requested review from xiaoxichen and yuwmao December 24, 2024 10:20
@codecov-commenter

codecov-commenter commented Dec 24, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 66.08%. Comparing base (1a0cef8) to head (d51d532).
Report is 117 commits behind head on master.

Files with missing lines | Patch % | Lines
src/lib/replication/repl_dev/raft_repl_dev.cpp | 0.00% | 1 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #620      +/-   ##
==========================================
+ Coverage   56.51%   66.08%   +9.56%     
==========================================
  Files         108      109       +1     
  Lines       10300    10982     +682     
  Branches     1402     1509     +107     
==========================================
+ Hits         5821     7257    +1436     
+ Misses       3894     3004     -890     
- Partials      585      721     +136     


@Besroy Besroy force-pushed the fix_truncation branch 4 times, most recently from ae7436a to 348fa3a Compare December 25, 2024 10:45
@Besroy Besroy force-pushed the fix_truncation branch 5 times, most recently from e6b2adb to 5bddd0d Compare December 26, 2024 09:45
@JacksonYao287
Contributor

@yamingk @sanebay this is another concurrency issue at the logstore level, caused by two concurrent truncations (one from nuraft, the other from the resource manager). Do we really need truncation in the resource manager? Please check whether this will affect the nublox case.

@xiaoxichen xiaoxichen changed the title avoid truncation when no logs Remove log store truncation from resource mg Dec 31, 2024
xiaoxichen
xiaoxichen previously approved these changes Dec 31, 2024
Collaborator

@xiaoxichen xiaoxichen left a comment


lgtm

Currently both resource_mgr and raft can call the log store's truncate, but resource_mgr only truncates up to an LSN that is no larger than the compact LSN.
That means resource_mgr merely re-truncates logs that will be, or already have been, truncated during compaction.
If resource_mgr and raft call truncate concurrently, however, a crash can happen, so this commit removes the call from resource_mgr.
lsn, prev_lsn);
RD_DBG_ASSERT(m_commit_upto_lsn.compare_exchange_strong(prev_lsn, lsn),
"Raft Channel: unexpected log {} commited before config {} committed", prev_lsn, lsn);
if (prev_lsn >= lsn || !m_commit_upto_lsn.compare_exchange_strong(prev_lsn, lsn)) {
Contributor Author

@Besroy Besroy Jan 2, 2025


Thanks @JacksonYao287, I removed the debug assert and used an error log here instead to improve code readability. PTAL
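
A rough sketch of the check-and-log pattern follows, under stated assumptions: advance_commit_lsn is a hypothetical free function, std::cerr stands in for the project's error-logging macro, and only the condition is taken from the diff line above. One general reason to prefer this shape is that a compare_exchange_strong placed inside a debug-only assert would not run at all if the assert macro compiles to nothing in release builds; whether RD_DBG_ASSERT behaves that way is not stated in this thread.

```cpp
#include <atomic>
#include <cstdint>
#include <iostream>

// Hypothetical helper: advance the committed LSN and log, rather than assert,
// when the LSN would not move forward or another writer got there first.
bool advance_commit_lsn(std::atomic<int64_t>& commit_upto_lsn, int64_t lsn) {
    int64_t prev_lsn = commit_upto_lsn.load();
    // On CAS failure, prev_lsn is overwritten with the value actually stored.
    if (prev_lsn >= lsn || !commit_upto_lsn.compare_exchange_strong(prev_lsn, lsn)) {
        std::cerr << "unexpected log " << prev_lsn << " committed before config " << lsn
                  << " committed\n";
        return false;
    }
    return true;
}
```

The function returns false without modifying the counter when the requested LSN is not strictly greater than the current one, so callers can decide how to react instead of aborting.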

@xiaoxichen xiaoxichen merged commit 348e05d into eBay:master Jan 3, 2025
21 checks passed

5 participants