-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix zookeeper event watching, subscription, reconnection #159
Conversation
Hi Serejja, First of all thank you. I am still getting a panic when applied. (I will attach the panic to the back end). Lets give you a quick background on where we originally encountered the problem.
Our app is a Go-based Microservice that acts as a proxy to several internal services. We are using Kafka to basically notify our service when some of the data we cache has been invalidated and needs to be evicted. To reproduce the problem I have been putting our client against a single zookeeper node in our cluster and then taking the zookeeper node up and down. I can repeatedly reproduce the panic everytime I try to restart the zookeeper node in question. Note: The client never goes down while the zookeeper node is down. It happily processes messages and then writes errors that it cant commit the offset. I have managed to write a workaround. (Please forgive me as it probably crude. I am new to Go and Zookeeper.) Basically my workaround was to add a channel to the ZooKeeperConfig struct in zk_coordinator. Then in the trySubscribeChange, I replace the panic with a push of a value to a channel. if strings.HasPrefix(e.Path, newZKGroupDirs(this.config.Root, Groupid).ConsumerRegistryDir) { I also do the same thing in the in the listenConnectionEvents: for groupId, watch := range this.watches { Finally when I create the consumer, I listen on this channel and then if there is a problem I close the consumer and the config.Coordinator.Disconnect() This closes the old connection and re-initializes the consumer group allowing the application to keep going. Like, I said I am not sure if the right solution. The challenge have is that when panics are used inside of spun go routines there is no way for the main application to "catch" them. (I could be wrong .... I am new to go). Anyways, thanks again for responding to this. Appreciate the help and please let me know if there is anything I can supply to you to help debug this problem.
Here is the top of the panic. panic: zk: could not connect to a server goroutine 115 [running]: |
Hi John, given your steps I can't reproduce this on local environment. I tried all sorts of ZK start/stop patterns (e.g. quickly start/stop ZK; stop, wait 1-5 minutes, start again etc) and all I see is "connection refused" messages in stderr which are normal. No panics at all. Can you please share your Kafka and ZK versions? (I doubt the problem is in Kafka version but it definitely can be ZK version). I was testing this against Kafka 0.8.2.1 and Zookeeper 3.4.6 One more thing worth to mention here: |
Hi Serhy, Appreciate the efforts. Here is the info you request. kafka = 0.8.2.1 Were you running messages through the platform when you shutdown a I would agree with you about the Panics, however the one thing that I don't I will take a look a the ZookeeperConfig and Thanks again for the help, On Thu, Sep 17, 2015 at 5:01 AM, Serhey Novachenko <notifications@github.com
|
Hey John, I've tried both quiet cluster and running traffic (approx. 5k dummy msg/sec) and still can't reproduce that. Not sure what I'm doing wrong :) Regarding intercepting panics I think we could add something like a PanicHandler (or whatever would we call it) to the ConsumerConfig with the default implementation that would panic as it is now. This way we wouldn't break any existing clients AND would give the ability to handle abnormal situations like this more or less gracefully. I think I'll create a separate PR with this behavior and let you know. Thanks! |
Uhm, and yes, could you please try testing this locally with ZK 3.4.6? Maybe the issue is hidden somewhere there? |
Hey John, can you please take a look at #162 and reply what you think about this approach? Thanks! |
Hi Serhey, Thanks for the effort. Part of the challenge with asynchronous solutions I really like your PanicHandler solution. It would fit nicely because what Thanks again and please let me know when the Pull Request is merged and I
On Sep 18, 2015 3:28 AM, "Serhey Novachenko" notifications@github.com
|
Hi Serhey, Never mind I see you have already committed the changes. I will pull them Thanks, On Fri, Sep 18, 2015 at 4:28 AM, Serhey Novachenko <notifications@github.com
|
Fix zookeeper event watching, subscription, reconnection
No description provided.