Unsure if PostgreSQL config is working as expected #27
Give /~https://github.com/taskrabbit/makara/releases/tag/v0.2.0.beta7 a try. Might be as simple as not handling that error (string) correctly. Let me know, thanks for the help in testing. |
Thanks, but unfortunately it didn't help. It doesn't look like this code (connection_message?) actually gets called.
Oh, interesting. It's instantiating a new connection rather than invoking a reconnect on the current connection. I'll see if I can figure out a solution, but the short story is that Makara does not handle connection issues during adapter initialization, only after the adapter has connected.
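To illustrate the distinction being drawn here, using standard ActiveRecord calls (shown without Makara in the picture):

```ruby
# Reconnecting reuses the existing adapter object, a path Makara's
# error handling wraps:
ActiveRecord::Base.connection.reconnect!

# Establishing a connection builds a brand-new adapter, re-running the
# adapter's initialize -- the path described above as unhandled:
ActiveRecord::Base.establish_connection
```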
@AaronRustad, are you able to see what the original exception is in your country_lookup.rb? I just need the error#message value. It doesn't look like the error is being caught, so it surfaces in the query_cache middleware, which then attempts to reconnect. As I see it, there are two issues: 1) Makara is not handling your original error properly, and 2) a failed reconnection on a slave node should not blow up (I'll likely fix this via a Makara config value).
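One hedged way to capture that value; CountryLookup.perform below is a hypothetical stand-in for whatever country_lookup.rb actually runs:

```ruby
begin
  CountryLookup.perform  # hypothetical trigger for the failing query
rescue => e
  # Log the raw class and message before anything re-wraps it.
  Rails.logger.error "original exception: #{e.class} -- #{e.message}"
  raise
end
```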
I believe this may be the 'original' error:
Then it is followed by
Closing due to inactivity; potential fixes have been merged.
I was pretty stoked when I came across this gem, but I'm running into similar problems with PostgreSQL. On v0.3.0.rc3, I can't seem to get failover working properly in either direction (losing the primary or losing the replica). If I roll back to the commit right before caeb8dc and set the

I was setting up my primary/replica with Docker and the latest PostgreSQL. I have a simple, newly generated Rails app that I was using to reproduce this. Will that help? Is there somewhere else I can dig in and investigate? I'd love to help get this fixed for PostgreSQL however I can!
When you say failover isn't working properly, are you talking about the initial connection or later in the lifecycle?
I've been trying this a ton, so I might need to confirm which combination of things I was trying. I've seen it both on initialization and after the app is booted, running, and working: when I kill the primary Docker container, I don't see reads continuing from the replica. I'll get some concrete steps down for you today.
The first thing to try is a running app with one master and more than one slave. Then bring one of the slaves down: reads should continue from the other slave. If not, try to get the exact error message that Makara saw and ignored. Makara no longer attempts to handle initial connection failures, as that turns into a can of worms.
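For reference, the topology described above (one master, more than one slave) maps onto a database.yml of roughly this shape in the Makara 0.3 series; the hostnames and blacklist duration below are placeholders, not settings taken from this issue:

```yaml
production:
  adapter: 'postgresql_makara'
  database: app_production
  makara:
    blacklist_duration: 5        # seconds a failed node stays blacklisted
    connections:
      - role: master
        host: master.db.example
      - role: slave
        host: slave1.db.example
      - role: slave
        host: slave2.db.example
```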
@mnelson I've been working with @bmorton on this. Here's a repo that contains steps to reproduce the issue he described above: /~https://github.com/moneill/makara-repro. In short, we start the app with the master and replicas running and working, then shut one down and see the following:
Is this happening with Rails.env == production? I'm basically wondering why pg_adapter#initialize is called post-startup, on the initial read. IIRC the pg adapter calls connect! within its initializer, which is a problem I ran into in previous versions of Makara.
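A quick way to observe why connecting inside the initializer matters, as a sketch assuming only the pg gem (the host below is a placeholder): PG.connect raises as soon as the host is unreachable, so an adapter that connects during initialize fails while being instantiated, before any of Makara's error handling wraps it.

```ruby
require "pg"

begin
  # An unreachable host fails here, during connection setup,
  # not on the first query.
  PG.connect(host: "db-down.example", dbname: "app_production",
             connect_timeout: 2)
rescue PG::ConnectionBad => e
  puts "raised during instantiation: #{e.message}"
end
```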
Yeah, we tried it in both development and production modes with the same result.
I've been doing some more digging, and it looks like when Rails loses any of its database connections, it goes and tries to instantiate a new

I'm playing with a couple of potential solutions, but I'm currently getting bit by
Appreciate the digging. There's potentially a Rails version issue here, as some people are using Makara + PG with success.
I just followed that path to see if I could identify some ActiveRecord differences that might point to something, but I can reproduce the same issue on Rails 3.2.19 and Rails 4.0.10 using the same boilerplate that was added to the app @moneill linked. To be clear, everything appears to work fine when all the Postgres instances are up, but if one of them is down, requests fail no matter which instance is taken down. I spot-checked a couple of different versions of the

You mentioned that handling initial connection failures was a can of worms. From what I can tell from the history, it looks like that was supported at one point and then removed; I think that's when Postgres failover support stopped working. Would you mind talking a bit more about the initial connection failure handling? Given the way the ActiveRecord adapter for Postgres works, I'm not sure there's a way around handling initial connection failures. The potential solution I mentioned above is still pretty hacky. It basically makes

The part that isn't solved with this approach (because of another hack I had to do to the

Ideas or thoughts? I'd love to help get this fixed up.
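One shape that handling could take, purely as a sketch (LazyNode and its methods are hypothetical names, not Makara's actual API): defer the real connection until first use, and blacklist the node on failure instead of raising out of the initializer.

```ruby
require "pg"

# Hypothetical sketch, not Makara internals.
class LazyNode
  BLACKLIST_SECONDS = 5

  def initialize(conn_params)
    @conn_params = conn_params  # e.g. { host: "slave1", dbname: "app" }
    @connection = nil
    @blacklisted_until = nil
  end

  # Connect lazily; a failed initial connection blacklists the node
  # rather than raising during adapter construction.
  def connection
    return nil if blacklisted?
    @connection ||= PG.connect(@conn_params)
  rescue PG::ConnectionBad
    @blacklisted_until = Time.now + BLACKLIST_SECONDS
    nil
  end

  def blacklisted?
    !@blacklisted_until.nil? && Time.now < @blacklisted_until
  end
end
```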
I've largely got Makara configured and working as expected, but while testing the failover capabilities of the slaves, I'm not seeing master take over. It's my understanding that if all the slaves become blacklisted, master should take over.

I'm able to successfully run the application using a single master and a single slave. I can see that when I modify records, the master database is used, and when I read, the slave is used. I'm able to bring down master and continue reading from the slave; when I try to write, those commands fail.

However, if I leave master running and bring down my single slave, all reads fail. I believe all requests should be issued against master in this case, correct? (The expectation is restated in code below.)

EDIT: When I say "bring down," I mean I'm asking Postgres to shut down gracefully, if that matters at all.
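Restating that expectation in code (illustrative only; Post is a stand-in model, and the comments describe expected versus observed behavior):

```ruby
Post.create!(title: "hello")  # writes always route to master
Post.count                    # reads route to the slave while it is healthy

# After gracefully shutting down the only slave, the expectation is that
# Makara blacklists it and serves reads from master:
Post.count  # expected: served by master; observed: the read raises instead
```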
My config is as follows:
The error I'm seeing:
Makara gem: v0.2.0.beta6
Postgres: 9.2
Rails: 4.0.1
Ruby: 2.0.0
Thanks for your help!