Teaching gRPC some path-awareness
Author: Dominik Roos
Last updated: 2020-09-02
Discussion at: -
The SCION control plane uses gRPC as its reliable RPC mechanism. gRPC is very flexible and employs powerful concepts that we can use to make our RPC stack path-aware.
The following image from the gRPC Blog summarizes the semantics around connections in gRPC quite well.
The terminology in the Go library is slightly different:
A client dials a ClientConn to a target service (instead of a Channel).
A ClientConn has one or multiple HTTP/2 connections, each called a SubConn (instead of a Conn).
The Go gRPC stack consists of multiple components that interact with each other to make an RPC happen. When a ClientConn is dialed, gRPC creates a resolver and a balancer instance specific to this ClientConn.
The resolver is responsible for resolving a string target that is passed to
grpc.Dial to one or more addresses that can be used for establishing a
ClientConn. The syntax for the target is described here: gRPC Name Resolution.
The balancer has two functions. It manages which SubConns should be opened or closed by the ClientConn. When RPCs are scheduled on a ClientConn, the balancer picks which SubConn each RPC should use.
gRPC handles connection establishment and monitoring for each SubConn. HTTP/2 has a mechanism in the protocol to health check the connection. This shows connection health at L4, but not at the application layer. (L7 health checks can optionally be enabled.) The SubConns are dialed with the targets provided by the balancer, which were originally resolved by the resolver. The dialer can be customized.
The Dialer simply takes a string as its input.
By default, gRPC uses the following combination:
Resolver: DNS resolver that resolves host names to IP addresses.
Balancer: Pick first balancer, which tries the addresses provided by the resolver in order and uses the first one it can connect to.
Dialer: net.Dialer with tcp from stdlib.
Teaching an old dog new tricks
The way gRPC splits the responsibilities among these components is very powerful, as it allows us to plug path awareness into the gRPC stack.
At a high-level, we can do the following:
Plug a resolver that can take a SCION address and resolve paths. The resolver attaches information about the path to resolver.Address.
Plug a balancer that picks healthy SubConns for each RPC.
Plug a dialer that can dial QUIC/SCION to specific targets provided by the resolver.
In order to profit from this connection management, we need to have long-lived connections between control plane entities. This can be abstracted in some kind of connection manager.
There are two things that need to be resolved when considering SCION control plane interactions. First, we need paths to be resolved in order to contact remote ASes. Second, we need to resolve the QUIC address if we are handed an svc address.
For the first iteration, it makes sense to restrict ourselves to resolving paths. Redirecting from svc to an actual address can still be done by the dialer. In fact, it will be more reliable for remote ASes with multiple control servers until we have a service lookup RPC.
The resolver will resolve paths for addresses with the
Since the dialer only takes a string as the target, we need to share state between the resolver and the dialer in the form of a path registry. The path registry is a mapping from path identifier to the actual path object. The resolver encodes the path choice in the target address, and the dialer uses it to look up the path.
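A minimal sketch of such a path registry could look as follows. The Path type, the numeric identifiers, and all method names are assumptions made for illustration; they are not taken from the SCION code base.

```go
package main

import (
	"fmt"
	"sync"
)

// Path is a stand-in for the real SCION path object (hypothetical type).
type Path struct {
	Hops []string
}

// PathRegistry maps path identifiers to path objects. It is the state
// shared between the resolver (which registers) and the dialer (which looks up).
type PathRegistry struct {
	mu     sync.RWMutex
	nextID uint64
	paths  map[uint64]Path
}

// NewPathRegistry returns an empty registry.
func NewPathRegistry() *PathRegistry {
	return &PathRegistry{paths: make(map[uint64]Path)}
}

// Register stores the path and returns a fresh identifier for it.
func (r *PathRegistry) Register(p Path) uint64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.nextID++
	r.paths[r.nextID] = p
	return r.nextID
}

// Lookup returns the path for id, if it is still registered.
func (r *PathRegistry) Lookup(id uint64) (Path, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.paths[id]
	return p, ok
}

// Unregister drops a path, for example once it is no longer usable.
func (r *PathRegistry) Unregister(id uint64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.paths, id)
}

func main() {
	reg := NewPathRegistry()
	// The ISD-AS values are illustrative only.
	id := reg.Register(Path{Hops: []string{"1-ff00:0:110", "1-ff00:0:111"}})
	p, ok := reg.Lookup(id)
	fmt.Println(id, p.Hops, ok)
}
```

The registry is guarded by a read-write mutex because the resolver and the dialer would typically run in different goroutines.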
For example, the client wants to dial
2 indicates the control service. The resolver will resolve paths, create IDs
for each path it resolves, and register them with the path registry. The
resolver then returns resolved addresses
(format up for discussion).
The dialer will establish a QUIC/SCION connection to the target. The path is retrieved from the shared state between the dialer and the resolver.
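To illustrate the dialer side, the sketch below shows how a path identifier could travel inside the target string and be turned back into a path. The "<path id>@<address>" layout, the function names, and the plain map standing in for the shared state are all hypothetical; the actual format is still open, and establishing the QUIC/SCION connection itself is left out.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// EncodeTarget is what the resolver could put into the resolved address.
// The "<path id>@<address>" layout is an assumption for this sketch.
func EncodeTarget(pathID uint64, addr string) string {
	return strconv.FormatUint(pathID, 10) + "@" + addr
}

// DecodeTarget splits a target produced by EncodeTarget into the path
// identifier and the remaining address.
func DecodeTarget(target string) (uint64, string, error) {
	i := strings.IndexByte(target, '@')
	if i < 0 {
		return 0, "", errors.New("target carries no path identifier")
	}
	id, err := strconv.ParseUint(target[:i], 10, 64)
	if err != nil {
		return 0, "", fmt.Errorf("parsing path identifier: %w", err)
	}
	return id, target[i+1:], nil
}

// resolveDialInfo mimics the first step of the dialer: decode the target
// and fetch the path from the state shared with the resolver. The map is
// a stand-in for that shared state.
func resolveDialInfo(paths map[uint64][]string, target string) ([]string, string, error) {
	id, addr, err := DecodeTarget(target)
	if err != nil {
		return nil, "", err
	}
	path, ok := paths[id]
	if !ok {
		return nil, "", fmt.Errorf("unknown path identifier %d", id)
	}
	return path, addr, nil
}

func main() {
	// Addresses and hops are illustrative only.
	paths := map[uint64][]string{3: {"1-ff00:0:110", "1-ff00:0:111"}}
	target := EncodeTarget(3, "1-ff00:0:111,10.0.0.1:31000")
	path, addr, err := resolveDialInfo(paths, target)
	fmt.Println(target, path, addr, err)
}
```

Keeping the identifier in the string means the dialer needs no extra parameters beyond what gRPC already passes to it.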
For the first iteration, the balancer does not need to be SCION path aware. gRPC's built-in round robin balancer should suffice. It picks healthy connections in a round robin fashion.
At a later stage, we can plug in our own balancer that takes path properties into account, or a balancer that prefers to stick with the same SubConn until it is no longer healthy.
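The sticky behavior can be sketched independently of gRPC. In the example below, SubConn is a simplified stand-in with a mutable health flag, and StickyPicker is a hypothetical name; a real gRPC balancer would instead produce a new picker whenever the connectivity state of a SubConn changes.

```go
package main

import (
	"fmt"
	"sync"
)

// SubConn is a simplified stand-in for a gRPC SubConn together with the
// health state a balancer learns about it (hypothetical type).
type SubConn struct {
	Name    string
	Healthy bool
}

// StickyPicker keeps returning the same SubConn while it is healthy and
// moves on to the next healthy one only when it is not.
type StickyPicker struct {
	mu      sync.Mutex
	conns   []*SubConn
	current int
}

// NewStickyPicker creates a picker over the given SubConns.
func NewStickyPicker(conns []*SubConn) *StickyPicker {
	return &StickyPicker{conns: conns}
}

// Pick returns the current SubConn if it is healthy. Otherwise it scans
// the remaining SubConns in order and sticks to the first healthy one.
// The boolean is false if no SubConn is healthy.
func (p *StickyPicker) Pick() (*SubConn, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for i := 0; i < len(p.conns); i++ {
		idx := (p.current + i) % len(p.conns)
		if p.conns[idx].Healthy {
			p.current = idx
			return p.conns[idx], true
		}
	}
	return nil, false
}

func main() {
	a := &SubConn{Name: "path-a", Healthy: true}
	b := &SubConn{Name: "path-b", Healthy: true}
	p := NewStickyPicker([]*SubConn{a, b})

	c, _ := p.Pick()
	fmt.Println("first pick:", c.Name)

	a.Healthy = false // simulate the first path going down
	c, _ = p.Pick()
	fmt.Println("after failure:", c.Name)
}
```

Compared to round robin, this keeps all RPCs on one path while it works, at the cost of not spreading load across paths.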
To profit from gRPC monitoring the connection health, connections must be long-lived. The ConnManager will be invoked to establish ClientConns. It will take care of ClientConn management. If there is already a ClientConn for a requested target, the ConnManager simply returns a reference to it instead of establishing a new one. It will also need to run garbage collection on ClientConns that have not been actively used for some amount of time.
In the code base, we already plug a Dialer interface everywhere. The ConnManager can be hidden behind this interface. We also need to abstract the ClientConn. Then, we can wrap the grpc.ClientConn and use the Close method for reference tracking.
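The following is a rough sketch of the reference tracking and garbage collection described above. It avoids the gRPC dependency: Conn stands in for grpc.ClientConn, and ConnManager, Handle, CollectIdle, and fakeConn are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Conn stands in for grpc.ClientConn; only Close matters for this sketch.
type Conn interface {
	Close() error
}

type entry struct {
	conn     Conn
	refs     int
	lastUsed time.Time
}

// ConnManager shares one connection per target and tracks its users.
type ConnManager struct {
	mu    sync.Mutex
	dial  func(target string) (Conn, error)
	conns map[string]*entry
}

// NewConnManager creates a manager that uses dial to establish connections.
func NewConnManager(dial func(target string) (Conn, error)) *ConnManager {
	return &ConnManager{dial: dial, conns: make(map[string]*entry)}
}

// Dial returns a handle to the connection for target and dials only if
// there is none yet. For simplicity, the lock is held while dialing.
func (m *ConnManager) Dial(target string) (*Handle, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	e, ok := m.conns[target]
	if !ok {
		c, err := m.dial(target)
		if err != nil {
			return nil, err
		}
		e = &entry{conn: c}
		m.conns[target] = e
	}
	e.refs++
	e.lastUsed = time.Now()
	return &Handle{m: m, e: e}, nil
}

// Handle wraps the shared connection. Close only drops the reference;
// the underlying connection stays open until it is garbage collected.
type Handle struct {
	m    *ConnManager
	e    *entry
	once sync.Once
}

// Close releases this handle's reference. Calling it twice is harmless.
func (h *Handle) Close() error {
	h.once.Do(func() {
		h.m.mu.Lock()
		defer h.m.mu.Unlock()
		h.e.refs--
		h.e.lastUsed = time.Now()
	})
	return nil
}

// CollectIdle closes connections without references that have been idle
// for at least maxIdle, and returns how many were closed.
func (m *ConnManager) CollectIdle(maxIdle time.Duration) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	n := 0
	for target, e := range m.conns {
		if e.refs == 0 && time.Since(e.lastUsed) >= maxIdle {
			e.conn.Close()
			delete(m.conns, target)
			n++
		}
	}
	return n
}

// fakeConn is a test double that records whether it was closed.
type fakeConn struct{ closed bool }

func (c *fakeConn) Close() error { c.closed = true; return nil }

func main() {
	m := NewConnManager(func(target string) (Conn, error) {
		fmt.Println("dialing", target)
		return &fakeConn{}, nil
	})
	h1, _ := m.Dial("cs.example")
	h2, _ := m.Dial("cs.example") // reuses the existing connection
	h1.Close()
	h2.Close()
	fmt.Println("collected:", m.CollectIdle(0))
}
```

In a real deployment, CollectIdle would run periodically with a non-zero idle threshold so that briefly unused connections keep benefiting from gRPC's health monitoring.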
Things to investigate
How often does the resolution trigger? Does it ever trigger if everything is fine?