One of the major limitations of libp2p currently is that it is difficult to control the use of limited resources such as memory, file descriptors, and bandwidth. A resource management mechanism would help ensure resources are allocated sensibly between the subsystems used by a single application and between applications.
The core concept is that a ResourceManager decides which new resources may be allocated by which subsystems and applications and, when necessary, forcefully revokes resource access.
Here are some potential concrete resource classes (ResourceClass) to manage:
RAM
File descriptors
CPU time
Bandwidth
Connections provided outside libp2p itself, e.g. TCP, WebSocket, WebRTC
NAT table entries in an external box
These resources are ultimately used by some top-level owner. Individual applications are the most obvious owners, but other parts of libp2p that are independent of applications would own resources as well:
DHT server, which supports the network as a whole
Circuit relay
Temporary pool (for handling incoming connections before the target application is found)
Resource graph
When an owner owns a resource, that is represented by an edge from the owner to the resource. To make this more powerful though, we introduce the notion of interior nodes that are both owners and resources: they are abstractions within libp2p that both provide something to a higher level like an application, and depend on lower level resources. For example, we might have an application, which owns a libp2p connection, which owns a muxer, which owns an upgraded connection, which owns a raw TCP connection. Each of these relationships is represented by a graph edge from the owner to the resource, and the complete ownership relation is represented by the transitive closure of the graph. In other words, the application itself "transitively-owns" the connection, the muxer,... all the way down to the TCP connection.
Note that resources can also have multiple owners; e.g. a muxer can be owned by multiple streams running on top of it. To simplify the interfaces slightly, the ResourceManager can also be thought of as the owner of the "top-level" owners, and becomes the root of the graph. Objects which represent or directly use real resources (like TCP sockets) are the leaves.
Leaves should implement LeafResource. Objects which own subresources should implement Owner. Inner nodes should implement both Resource and Owner.
The actual behavior of the ResourceManager depends entirely on its implementation and isn't specified by this graph structure itself. The following is a proposal for such behavior.
Limiting resource use
For a resource to be allocated (and not revoked), it must be attached to a part of the graph that can support its "weight." Likewise, owners can be limited in how much weight they are willing to carry (own). If the weight of a subtree exceeds a limit, priorities determine which resources should be shed. Each resource class (e.g. file descriptors, RAM, ...) is tracked and limited independently, so classes do not affect each other.
Two mechanisms allow control over resource use: (1) limits, and (2) priorities.
Limits
Limits are straightforward: any internal node in the resource graph can have a limit imposed on the total weight of a given resource that can hang below that node.
When a resource has multiple parents, it can be counted against any one of its parents (its "effective parent"). In practice, it should be counted against whichever parent allows limits not to be exceeded.
Priorities
When no configuration of effective parents avoids exceeding resource limits, priorities determine which resources are allowed in the tree and which are not. Each edge from an owner to a resource (child) has a numerical priority between 0 (lowest priority) and 1 (highest). Any leaf resource has an effective priority determined by the product of the priorities of every edge from itself back to the root owner.
Proposed interfaces
type Resource interface {
    // Makes a context that lasts as long as the resource (is cancelled if
    // the resource is revoked)
    Context() context.Context

    // Returns a new context derived from the context passed in, but also
    // limited to last no longer than the resource
    ContextFromContext(ctx context.Context) context.Context

    // Indicate that the resource is no longer needed
    Destroy()

    // Add a handler that runs when the resource manager decides this resource
    // can no longer be used. Should be used to clean up the underlying real
    // resource
    SetRevocationHandler(func(string) bool)

    // Provides a list of owners for this resource. Guaranteed not to be empty
    Owners() []Owner

    // Indicates if this is a leaf resource (is a LeafResource).
    // Otherwise this is an internal node (is an Owner).
    IsLeaf() bool
}

type LeafResource interface {
    Resource

    // Indicates the class of this resource
    ResourceClass() ResourceClass

    // Indicates how many units this resource instance uses toward limits
    Units() float32
}

type Owner interface {
    // Lists child resources
    Resources() []Resource

    // Makes this object own a resource. If that resource already had another
    // owner, it will have multiple owners
    AddResource(resource Resource) bool

    // Makes this object own a resource. If limits would be exceeded, wait
    // to see if it will succeed up until the deadline
    AddDeadline(resource Resource, deadline time.Time) bool

    // Makes this object own a resource, removing other owners that it may
    // already have
    MoveResource(resource Resource)

    // Tries to make this object own a resource, succeeding only if moving the
    // entire resource (including all subresources) doesn't exceed any
    // resource limits
    TryMoveResource(resource Resource) bool

    // List resource limits enforced on this node
    ResourceLimits() []ResourceLimit

    // Indicates the priorities for each child. Defaults to 1 for all children
    ResourcePriorities() map[Resource]float32
}

type ResourceLimit struct {
    // What type of resource is being limited
    class ResourceClass

    // Maximum number of resource units that can be owned
    limit float32
}

type ResourceManager interface {
    Owner

    // Adds a class of resource to be managed
    AddResourceClass(ResourceClass)

    // Returns the placeholder owner, which is a special child (see below)
    PlaceholderOwner() Owner
}

// A type of resource; e.g. RAM, fd, etc.
type ResourceClass interface {
    // Name of the resource class (for debugging)
    Name() string
}

// The inner nodes may have whatever interface to set their resource limits,
// but this is a minimal interface for setting simple limits
type LimitableResource interface {
    Resource

    // Set a limit of the given number of units
    SetLimit(class ResourceClass, units float32)

    // Remove a limit
    RemoveLimit(class ResourceClass)
}
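For concreteness, here is a minimal sketch of how these interfaces might be used: a trivial ResourceClass for file descriptors is registered with the manager, and a cap is placed on one subsystem's subtree. The names fdClass and setupFDLimits and the limit of 512 are invented for this sketch, not part of the proposal.

// Sketch only: a trivial ResourceClass implementation for file descriptors,
// plus a setup step that registers it and caps a subsystem's subtree.
type fdClass struct{}

func (fdClass) Name() string { return "file-descriptors" }

func setupFDLimits(mgr ResourceManager, dhtNode LimitableResource) {
    fds := fdClass{}
    mgr.AddResourceClass(fds)
    // Cap the DHT subtree at 512 descriptors (arbitrary number for the sketch).
    dhtNode.SetLimit(fds, 512)
}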
Conceptual resource allocation algorithm
There aren't any interesting choices to be made unless some resource limit is exceeded. The first interesting set of choices involves resources with multiple parents.
When adding or moving a resource to a new position in the graph, it is necessary to try to find a set of paths from the root to each leaf that do not exceed any resource limits. To find such a solution, different choices of effective parents can be made to avoid exceeding any one limit. If any configuration that avoids exceeding any limits is possible, the new configuration is valid.
Otherwise, the lowest priority (as computed by the product of the priorities on each edge from root to leaf) resource that is below the limiting node is revoked, and the new configuration is tested to see if it is valid. If not, repeat the process of removing the lowest priority resource.
In practice, this algorithm is not going to be efficient (I think finding a valid configuration of effective parents is equivalent to SAT). Some reasonable approximation should be fine, however. Any such approximation will be far better than the current state in libp2p.
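As a rough illustration of the shedding step (deliberately not an efficient implementation), here is a sketch assuming hypothetical helpers configurationValid, leavesUnder, and effectiveParentsOf for the bookkeeping an implementation would keep; only the overall shape is meant literally.

// Sketch of the greedy shedding loop described above.
func shedUntilValid(limiting Owner) {
    for !configurationValid(limiting) {
        var lowest LeafResource
        lowestPriority := float32(2) // priorities are in [0, 1], so any product is lower
        for _, leaf := range leavesUnder(limiting) {
            // Effective priority: product of edge priorities from the root
            // down to this leaf along its chain of effective parents.
            p := float32(1)
            var child Resource = leaf
            for _, owner := range effectiveParentsOf(leaf) { // leaf's parent first, root last
                edge, ok := owner.ResourcePriorities()[child]
                if !ok {
                    edge = 1 // priorities default to 1
                }
                p *= edge
                child, _ = owner.(Resource) // interior nodes are also Resources; the root may not be
            }
            if p < lowestPriority {
                lowestPriority = p
                lowest = leaf
            }
        }
        // Revoke the lowest-priority leaf (its revocation handler would run)
        // and re-test the configuration on the next loop iteration.
        lowest.Destroy()
    }
}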
Other considerations
Pre-allocation
It is often useful to allocate a collection of resources together, since it is wasteful to allocate some resources only to find that others needed at the same time aren't available. That situation causes unnecessary resource contention and CPU use ("thrashing").
To avoid this, a subtree of "inactive" resource objects can be allocated under the ResourceManager's PlaceholderOwner(), which has no limits on resource use. "Inactive" indicates that these resource objects would not actually own the resources they represent; e.g. a TCP socket resource wouldn't actually hold a real socket.
TODO: this isn't entirely thought through. There may be problems here.
Once the subtree is constructed, it would be activated by a two-step process:
Move the subtree to where it actually belongs in the resource tree, using TryMoveResource(). This atomically succeeds or fails for the entire subtree.
Assuming step 1 succeeds, do whatever additional setup needs to be done on the subtree to actually use the underlying resources.
This mechanism will probably need some additional interfaces.
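A sketch of the two-step activation described above, assuming hypothetical constructors newInactiveConn and newInactiveStream that build resource objects (implementing both Owner and Resource) without acquiring the real underlying resources, and a hypothetical activate step:

// Sketch of pre-allocation: build an inactive subtree under the placeholder
// owner, then atomically move it to where it belongs before doing real work.
func openStreamWithPreallocation(mgr ResourceManager, app Owner) bool {
    placeholder := mgr.PlaceholderOwner() // no limits, see above

    // Build the inactive subtree: stream -> conn, parked under the placeholder.
    conn := newInactiveConn()     // hypothetical: no socket is opened yet
    stream := newInactiveStream() // hypothetical interior node
    stream.AddResource(conn)
    placeholder.AddResource(stream)

    // Step 1: atomically move the whole subtree under the application.
    if !app.TryMoveResource(stream) {
        stream.Destroy() // over limit: give up before any real work is done
        return false
    }

    // Step 2: only now do the real setup (dial, upgrade, open the stream) on
    // the subtree that has already been accounted for.
    return activate(stream) // hypothetical helper
}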
Temporary allocation
Sometimes it is necessary to allocate resources without knowing what really ought to own them, for example when handling incoming protocols. This case can be handled by creating special owners (with limits) for these resources, and then moving them to the appropriate place in the tree once the correct owners are identified.
Examples
Consider a libp2p node that supports an application and also serves as a DHT node (i.e. not just a DHT client). This example is highly simplified, but designed to illustrate the concepts described here.
The resource manager would have four children ("top-level owners"):
The application
The DHT "server" component
A temporary owner for incoming connections/messages
Placeholder owner
The first three of these nodes would have resource limits set on them. In particular, the application would have fairly high resource limits, while the other two would have much lower limits.
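A sketch of that setup, assuming a hypothetical newOwner constructor that returns an interior node (implementing Owner, Resource, and LimitableResource) and a RAM resource class; the specific limit values are arbitrary:

// Sketch: configure the four top-level owners under the resource manager.
func setupTopLevelOwners(mgr ResourceManager, ram ResourceClass) {
    app := newOwner("application")   // hypothetical constructor
    dht := newOwner("dht-server")
    temp := newOwner("incoming-temp")

    mgr.AddResource(app)
    mgr.AddResource(dht)
    mgr.AddResource(temp)
    // The fourth child, the placeholder owner, is built into the manager.
    _ = mgr.PlaceholderOwner()

    // The application gets generous limits; the DHT server and the temporary
    // owner get much smaller ones. Units here are arbitrary (bytes of RAM).
    app.SetLimit(ram, 1<<30)
    dht.SetLimit(ram, 64<<20)
    temp.SetLimit(ram, 16<<20)
}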
Outgoing connections
The application, in turn, would own a set of streams that it uses to communicate with other libp2p nodes. Each of these streams would own a mux, which would own an upgraded connection, which would own an underlying connection, which would own a TCP stream. In a more sophisticated example, multiple applications would own streams that share muxes. So far things are fairly straightforward; the application can open streams (and connections) until doing so fails.
Each object in this tree that uses a nontrivial amount of RAM (e.g. significant buffers) should implement the Resource interface on the buffer and treat it as a child.
TODO: This isn't quite right yet!
To make this more efficient, when opening a new stream, the application should initially allocate this stream and all of the resources under it to the placeholder owner. Then, these resources should be moved under the application. Only if this succeeds should the real TCP connection be opened.
The application will also likely use the DHT for peer routing or content routing. When it does, the application will also own the individual DHT queries it makes, which will in turn own the streams the query uses. Limiting what an individual DHT query can use is done by simply setting resource limits on the query node. Internally, the DHT query logic would then be rate-limited by AddDeadline() calls blocking when it tries to dial new connections.
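A sketch of that pattern, with a hypothetical newStreamResource constructor and an arbitrary 10-second deadline:

// Sketch: a DHT query dialing a new peer under its own resource limits.
func dialForQuery(queryNode Owner, peer string) bool {
    stream := newStreamResource(peer) // hypothetical: the stream's resource subtree
    // Block for up to 10 seconds waiting for room under the query's limits;
    // if the query node is still over limit after that, give up on this dial.
    if !queryNode.AddDeadline(stream, time.Now().Add(10*time.Second)) {
        stream.Destroy()
        return false
    }
    // Safe to actually open the stream now that it has been accounted for.
    return true
}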
Incoming connections
When a new connection comes in, it would initially be owned by the temporary owner. As that connection is handled by the switch, an upgraded connection would be created. This upgraded connection would then be added to the temporary owner, and the upgraded connection would own the underlying connection. The same process would happen when a mux is created.
Once the mux opens a stream to the application or DHT, it will be moved to the corresponding top level owner. This ensures that the correct resource limit will be applied as soon as possible.
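A sketch of that hand-off, with hypothetical upgradeConn and newMux helpers standing in for the switch's real upgrade and muxing steps:

// Sketch: ownership during connection upgrade and stream dispatch.
func handleIncoming(temp Owner, app Owner, rawConn Resource) {
    // The raw connection is initially owned by the temporary owner.
    temp.AddResource(rawConn)

    // The switch upgrades the connection; the upgraded connection is added to
    // the temporary owner and takes ownership of the raw connection.
    upgraded := upgradeConn(rawConn) // hypothetical interior node
    temp.AddResource(upgraded)
    upgraded.AddResource(rawConn)

    // The same pattern repeats when the mux is created.
    mux := newMux(upgraded) // hypothetical interior node
    temp.AddResource(mux)
    mux.AddResource(upgraded)

    // Once the mux opens a stream to the application, move the subtree to the
    // corresponding top-level owner so the correct limits apply early.
    app.MoveResource(mux)
}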
Routing table
The Kademlia DHT routing table is an interesting case to consider.
Traditional Kademlia uses a routing table composed of buckets that can hold up to K entries ("k buckets"). However, Kademlia is designed as a fairly standalone system, not as part of a larger, modular system like libp2p. The function of the DHT routing table in libp2p can better be described as a way of querying the libp2p peer store, together with a system to ensure that peer store entries that are useful to the DHT are retained (not allowed to be garbage collected) and verified to be still accurate.
In that context, the K bucket size parameter actually specifies the maximum number of peer store entries that the DHT wants to own for its own use (per bucket/peer id range). If other subsystems want to own more entries for their own use, the DHT can still use them, and that benefits the DHT with no cost to the system.
In terms of resource ownership, this means that peer store entries, or at least entries that are periodically pinged to verify connectivity, would be treated as resources as described in this proposal. Each DHT bucket would be an owner, limited to K entries. The bucket objects themselves would then be owned by the DHT. This ensures that the DHT tries to keep around K peer store entries per bucket. Other subsystems could also keep entries around as needed. Peer store entries with no owners would then be subject to garbage collection, either immediately or after a delay (e.g. least recently used policy).
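A sketch of per-bucket ownership, assuming a hypothetical newBucketOwner constructor, a hypothetical peerEntryClass resource class where each peer store entry is a one-unit LeafResource, and the usual Kademlia K of 20:

const K = 20 // typical Kademlia bucket size, illustrative

// Sketch: set up one routing-table bucket as an owner capped at K peer store
// entries. newBucketOwner returns an interior node (Owner, Resource,
// LimitableResource).
func newBucket(dht Owner, index int) Owner {
    bucket := newBucketOwner(index)
    bucket.SetLimit(peerEntryClass, K)
    dht.AddResource(bucket)
    return bucket
}

// Claiming an entry beyond K would exceed the bucket's limit; depending on
// priorities, the add either fails or a lower-priority entry is shed.
func claimEntry(bucket Owner, entry LeafResource) bool {
    return bucket.AddResource(entry)
}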
Very interesting! I like your idea of the DHT owning a set of entries in the peer store, that makes a lot of sense. And with the query capability, the DHT could periodically re-evaluate which are the most useful known peers to claim an interest in and replace peers that drop out of a bucket with the next best match.
About the pre-allocation, it seems like we need interfaces for "activating" a resource (opening the socket, etc.) after we move it onto the ownership graph.
And maybe one for the resource to inform the manager when we've been cut off at the knees by the OS (socket closed or whatever). Would closing our own Context be enough in that case?