Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add grpc_resolver using external discovery service #1498

Merged
merged 11 commits into from
May 13, 2019
Merged

Add grpc_resolver using external discovery service #1498

merged 11 commits into from
May 13, 2019

Conversation

guanw
Copy link
Contributor

@guanw guanw commented Apr 25, 2019

Signed-off-by: Jude Wang judew@uber.com

Which problem is this PR solving?

Short description of the changes

  • Add grpc resolver which listens to available host/port from discovery.notifier and updates list of collector servers for agent to connect.

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partial review, but as I'm in meetings for the next N hours, wanted to give preliminary feedback

cmd/agent/app/reporter/grpc/builder.go Outdated Show resolved Hide resolved
cmd/agent/app/reporter/grpc/builder.go Outdated Show resolved Hide resolved
cmd/agent/app/reporter/grpc/builder.go Outdated Show resolved Hide resolved
cmd/agent/app/reporter/grpc/flags.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
@guanw
Copy link
Contributor Author

guanw commented Apr 26, 2019

saw this error twice with travis
image

Local build "make test-ci" works fine. Is there a way to configure memory limit on travis?

pkg/discovery/grpcresolver/grpc_resolver.go Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
cmd/agent/app/reporter/grpc/flags.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
@yurishkuro
Copy link
Member

btw, why do you need to change Gopkg.lock?

@guanw
Copy link
Contributor Author

guanw commented Apr 30, 2019

btw, why do you need to change Gopkg.lock?

hmmm. There seems to be discrepency error in travis yesterday but now that I checked again it was gone somehow. Will revert it back.

@codecov
Copy link

codecov bot commented Apr 30, 2019

Codecov Report

Merging #1498 into master will decrease coverage by <.01%.
The diff coverage is 98.86%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1498      +/-   ##
==========================================
- Coverage   98.79%   98.79%   -0.01%     
==========================================
  Files         189      190       +1     
  Lines        8986     9058      +72     
==========================================
+ Hits         8878     8949      +71     
- Misses         84       85       +1     
  Partials       24       24
Impacted Files Coverage Δ
cmd/agent/app/reporter/grpc/flags.go 100% <100%> (ø) ⬆️
cmd/agent/app/reporter/grpc/builder.go 100% <100%> (ø) ⬆️
pkg/discovery/grpcresolver/grpc_resolver.go 98.57% <98.57%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2aacfaf...0f8b179. Read the comment docs.

Copy link
Contributor Author

@guanw guanw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the waitGroup. I think it makes sure we exit watcher goroutine before we exit the main program, so generally a good cleanup order? Otherwise the the main program could potentially exit before watcher finishes.

@guanw
Copy link
Contributor Author

guanw commented May 7, 2019

@yurishkuro Any further comments?

pkg/discovery/discoverer.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
connections = backendCount
}

makeSureConnectionsUp(t, connections, testc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be called with backendCount, not connections. Or even better - it should be outside of the loop, since you're not restarting servers (which is why I suggested making it part of startTestServers, since the expectation of that function is that "servers are ready"). And you don't need to use testc with the load balancer for this health check.

@guanw
Copy link
Contributor Author

guanw commented May 8, 2019

@yurishkuro I'm not clear why we don't need to use testc for the health check. If we don't do a initial load balancing and make sure every server has been reached at least once, what's the point of having makeSureConnectionsUp function?

@yurishkuro
Copy link
Member

I'm not clear why we don't need to use testc for the health check.

we do need to use a connection, but it can be a different connection constructed without load balancer, since you are giving it explicit peer.

Copy link
Contributor Author

@guanw guanw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear why we don't need to use testc for the health check.

we do need to use a connection, but it can be a different connection constructed without load balancer, since you are giving it explicit peer.

Are you referring to testc.EmptyCall? I think we are populating peer instead of "giving". The peers will only be decided by resolver and minPeers we passed to it.

pkg/discovery/grpcresolver/grpc_resolver.go Outdated Show resolved Hide resolved
@guanw
Copy link
Contributor Author

guanw commented May 10, 2019

I'm not clear why we don't need to use testc for the health check.

we do need to use a connection, but it can be a different connection constructed without load balancer, since you are giving it explicit peer.

Are you referring to testc.EmptyCall? I think we are populating peer instead of "giving". The peers will only be decided by resolver and minPeers we passed to it.

@yurishkuro Any comment on my last reply?

@yurishkuro
Copy link
Member

testc is created with your custom load balancer. A plain connection without load balancer is sufficient for server liveness checks. Connection with LB is only needed in the actual unit test.

@guanw
Copy link
Contributor Author

guanw commented May 10, 2019

testc is created with your custom load balancer. A plain connection without load balancer is sufficient for server liveness checks. Connection with LB is only needed in the actual unit test.

I'm not certain that's going to work. I tested the resolver without roundrobin balancer(i.e without grpc.WithBalancerName(roundrobin.Name)) the testc will just call the same peer everytime(confirm by populating). A plain connection without load balancer is sufficient to check if server is live, but without every peer being called at least once, calling roundrobin at second step could still miss some peers(tested and I could see for example {minPeers 5, connections 5} it could end up calling 4 out of 5 peers instead of 5 out of 5).

Also taking a look at TestBackendsRoundRobin in /~https://github.com/grpc/grpc-go/blob/master/balancer/roundrobin/roundrobin_test.go, I still think building connection for all server peers is necessary everytime we build a new grpc client connection.

@yurishkuro
Copy link
Member

yurishkuro commented May 10, 2019

You have two rounds of network calls:

  • in the first, you ping each server directly (by giving the connection a Peer) to check it's up. You don't need load balancing on the connection in this case. Therefore, this whole logic can be encapsulated in the startServers() function
  • in the second round you're actually unit testing the LB, so you start a new connection that uses the LB and run unit test with it

@guanw
Copy link
Contributor Author

guanw commented May 10, 2019

You have two rounds of network calls:

  • in the first, you ping each server directly (by giving the connection a Peer) to check it's up. You don't need load balancing on the connection in this case. Therefore, this whole logic can be encapsulated in the startServers() function
  • in the second round you're actually unit testing the LB, so you start a new connection that uses the LB and run unit test with it

But still, load balancing at step 2 does not necessarily work even if the first step works. Cause now they are from two different connections? I tested the idea here but it some times still hits only 4 out of 5.

@yurishkuro
Copy link
Member

Step 1 has nothing to do with step 2, it's preparation of servers and a check that they are alive. If step 2 fails after that (using a different connection) then it's either a bug in the LB or in the test, and makes it even more sense to use different connection vs. step 1.

@guanw
Copy link
Contributor Author

guanw commented May 10, 2019

Step 1 has nothing to do with step 2, it's preparation of servers and a check that they are alive. If step 2 fails after that (using a different connection) then it's either a bug in the LB or in the test, and makes it even more sense to use different connection vs. step 1.

I tested against the original grpc-go roundrobin test they make by swapping the "making connection up" part with directly calling peers, it would fail. I posted the question on their github page and will continue on this once they replied.

@yurishkuro
Copy link
Member

can you link that issue here?

@guanw
Copy link
Contributor Author

guanw commented May 12, 2019

grpc/grpc-go#2808

@yurishkuro
Copy link
Member

Ah, the explanation in that ticket makes sense - if you create a new connection for the LB test, you'd still need to ensure that it is connected to each peer before you can execute the test. We should add this to the comments in the test.

@guanw
Copy link
Contributor Author

guanw commented May 12, 2019

sg. Will add some comment in.

Jude Wang added 10 commits May 11, 2019 22:02
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
pkg/discovery/grpcresolver/grpc_resolver_test.go Outdated Show resolved Hide resolved
@yurishkuro yurishkuro changed the title add grpc_resolver which gets notified from external discovery service Add grpc_resolver using external discovery service May 12, 2019
Signed-off-by: Jude Wang <judew@uber.com>
@yurishkuro yurishkuro merged commit f6cfb78 into jaegertracing:master May 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants