-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: crash for delconsumer during stream reading #4513
Conversation
if (sitem.consumer->pel->numnodes == 0) { | ||
LOG(DFATAL) << "Internal error when accessing consumer data, seen_time " | ||
<< sitem.consumer->seen_time; | ||
result = OpStatus::CANCELLED; | ||
return OpStatus::OK; | ||
} | ||
} | ||
|
||
range_opts.noack = opts->noack; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this? does the test cover this fix as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was 2340 line I haven't changed it.
} | ||
sitem.consumer = range_opts.consumer; | ||
if (!sitem.consumer) { | ||
return OpStatus::OUT_OF_MEMORY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly misleading in this context, because FindOrAddConsumer
did not fail because we are out of memory but because the consumer group (alice
in the test) is deleted and the pointer is nullptr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is something else that is happening, I did not understand the crash until I actually used the debugger and stepped through but a few comments:
if sitem.consumer
== nullptr we would return OUT_OF_MEMORY
which would make the XREAD
command return return rb->SendNullArray();
(see how we handle the transaction result below).
So that means that we never triggered that path in the first place! The actual crash came from below:
if (sitem.consumer->pel->numnodes == 0) {
LOG(DFATAL) << "Internal error when accessing consumer data, seen_time "
<< sitem.consumer->seen_time;
result = OpStatus::CANCELLED;
return OpStatus::OK;
}
So my question here is:
How does the introduced flow which is never triggered change the value of the check below ?
if (sitem.consumer->pel->numnodes == 0)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BorysTheDev the first comment is worth responding.
@kostasrim I believe the fix to the crash is this line:
sitem.consumer = range_opts.consumer;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romange You are absolutely right sitem.consumer = range_opts.consumer;
is the fix, because sitem.consumer contains already invalidated pointer
I've added
if (!sitem.consumer) {
return OpStatus::OUT_OF_MEMORY;
for the case if the memory for the consumer can not be allocated, but it isn't a fix of the crash
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, for (1) and from my understanding FindOrAddConsumer
shall return null if cg
is empty or if we try to insert the consumer into the rax data structure and we fail. Not sure if the second can fail only
under out of memory scenario but either way it would worth a comment.
For (2), yes I got it and checked it myself. I thought the fix was the early return but that didn't make sense and that's why I left the comment above yesterday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BorysTheDev but we don't return NULL
within FindOrAddConsumer
only when OUT_OF_MEMORY
right ?
From my understanding, cg
can be null
, if the consumer does not exist then it must be interted with raxInsert
which can return NULL as well (not sure if this is only for OUT OF MEMORY though)
The rest make sense :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep we got checks for cg
+ raxInsert will return null only if it fails to insert. all good 👍
fix: crash for delconsumer during reading stream
fixes: /~https://github.com/dragonflydb/dataplane-private/issues/27