-
Notifications
You must be signed in to change notification settings - Fork 40.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
apimachinery & apiserver: use stacktrace in the stdlib #75853
apimachinery & apiserver: use stacktrace in the stdlib #75853
Conversation
I think the size limit was to prevent excessively large logs (stdlib http server panic handling does the same thing) I don't know why the custom caller code was added initially. @smarterclayton, do you know? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the size limit was to prevent excessively large logs (stdlib http server panic handling does the same thing)
That makes sense. Maybe we should add a comment if we revert to that behavior?
Personally I like this change. I find the callers thing quite odd.
@@ -92,23 +92,22 @@ func (t *timeoutHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { | |||
return | |||
} | |||
|
|||
errCh := make(chan interface{}) | |||
// resultCh is used as both errCh and stopCh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like the rename. It makes more sense.
@@ -62,27 +63,14 @@ func HandleCrash(additionalHandlers ...func(interface{})) { | |||
|
|||
// logPanic logs the caller tree when a panic occurs. | |||
func logPanic(r interface{}) { | |||
callers := getCallers(r) | |||
callers := debug.Stack() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably better to rename this to stacktrace?
@@ -155,10 +143,10 @@ func GetCaller() string { | |||
// handlers to handle errors and panics the same way. | |||
func RecoverFromPanic(err *error) { | |||
if r := recover(); r != nil { | |||
callers := getCallers(r) | |||
callers := debug.Stack() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same.
I think this code was added before Go had an easy way to get a stack dump. There may have been a security concern too (avoiding showing some portions of the stack). But it’s been too long. |
2012: I can go with manually allocating buf size if excessively large log is a concern. Will update the PR soon |
943a02b
to
0e61b77
Compare
updated. I'm working on adding a test to capture what's printed in klog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the test!
/lgtm
staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime_test.go
Outdated
Show resolved
Hide resolved
callers := getCallers(r) | ||
// Same as stdlib http server code. Manually allocate stack trace buffer size | ||
// to prevent excessively large logs | ||
const size = 64 << 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be worth extracting this logic somewhere, since we do the same thing in handlers/rest.go. Probably not though..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I'm not sure on this
on one hand, I'd imagine extracting this logic would result in one more function call in the call stack
on the other hand, I guess there are more places in our code base need this logic (I was about to add the logic here until I saw #73137 (comment))
739be66
to
826cb90
Compare
826cb90
to
d5ca73a
Compare
Looks good. Can you post the difference in output from before and after this PR? |
staging/src/k8s.io/apimachinery/pkg/util/runtime/runtime_test.go
Outdated
Show resolved
Hide resolved
d5ca73a
to
a2c59f5
Compare
@sttts https://gist.github.com/roycaihw/b0977279000fb3f3048c3b77df89bec5 to reproduce: include this commit, bring up apiserver and look for the last occurrence of "Observed a panic" in apiserver log |
/retest |
@roycaihw do I read the gist correctly that we print all 3249 go routine stack traces now, while only one before? |
@sttts we print stacktraces for 3 goroutines in both cases after:
before:
/retest |
@liggitt could you approve this PR? /retest |
if match, _ := regexp.MatchString(`logPanic(.*)`, lines[2]); !match { | ||
t.Errorf("mismatch symbolized function name: %s", lines[2]) | ||
} | ||
if match, _ := regexp.MatchString(`runtime\.go:69 \+0x`, lines[3]); !match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a fragile test... is this format and line number stable across releases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question for the [running]
format, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line #: it's fragile. I meant to have people update the number when changing our runtime.go
, but it's not necessary
[running]
: it comes from one of the statuses. Should be stable in this test
updated both to use regexp. The rest of the test should be stable enough
also added documentation about if golang stdlib changes its format
|
||
// captureStderr redirects stderr to result string, and then restore stderr from backup | ||
func captureStderr(f func()) (string, error) { | ||
bak := os.Stderr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer func(){ os.Stderr = bak }()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
tried to defer closing the writable file before, and it behaved weirdly (stuck on flushing the buffer?)
a2c59f5
to
ca8fcfc
Compare
ca8fcfc
to
999a02c
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: liggitt, roycaihw The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Yeah, I'm pretty sure I added this code, for this reason. |
use
runtime.Stack()
in the stdlib to log stack trace instead of using customized code, so that third party lib can parse the panic (like panicparse)e.g. if we have a panic here, with the change, the tool is able to parse all three goroutines
without the change, only the first goroutine (
finishRequest
) gets parsed. The second goroutine (timeoutHandler
) is missed because we drop a trailing newline in the one goroutine before. The third goroutine (HandleCrash
) is missed becausegetCaller()
is non-standard which only contains filename + line number.Does this PR introduce a user-facing change?:
/cc @logicalhan @sttts