Resource Manager proposal #635

Closed

jhiesey opened this issue May 17, 2019 · 2 comments
jhiesey commented May 17, 2019

Intro

One of the major limitations of libp2p today is that it is difficult to control the use of limited resources, such as memory, file descriptors, and bandwidth. Explicit resource management would ensure resources are allocated sensibly between the subsystems used by a single application, and between applications.

The core concept is that a ResourceManager decides which new resources can be allocated by which subsystems and applications and, when necessary, forcefully revokes resource access.

Here are some potential concrete resource classes (ResourceClass) to manage:

  • RAM
  • File descriptors
  • CPU time
  • Bandwidth
  • Connections provided outside libp2p itself, e.g. TCP, WebSocket, WebRTC
  • NAT table entries in an external box

These resources are ultimately used by some top-level owner. Individual applications are the most obvious owners, but other parts of libp2p that are independent of applications would own resources as well:

  • DHT server, which supports the network as a whole
  • Circuit relay
  • Temporary pool (for handling incoming connections before the target application is found)

Resource graph

When an owner owns a resource, that is represented by an edge from the owner to the resource. To make this more powerful though, we introduce the notion of interior nodes that are both owners and resources: they are abstractions within libp2p that both provide something to a higher level like an application, and depend on lower level resources. For example, we might have an application, which owns a libp2p connection, which owns a muxer, which owns an upgraded connection, which owns a raw TCP connection. Each of these relationships is represented by a graph edge from the owner to the resource, and the complete ownership relation is represented by the transitive closure of the graph. In other words, the application itself "transitively-owns" the connection, the muxer,... all the way down to the TCP connection.

Note that resources can also have multiple owners; e.g. a muxer can be owned by multiple streams running on top of it. To simplify the interfaces slightly, the ResourceManager can also be thought of as the owner of the "top-level" owners, and becomes the root of the graph. Objects which represent or directly use real resources (like TCP sockets) are the leaves.

Leaves should implement LeafResource. Objects which own subresources should implement Owner. Inner nodes should implement both Resource and Owner.

The actual behavior of the ResourceManager depends entirely on its implementation and isn't specified by this graph structure itself. The following is a proposal for such behavior.

Limiting resource use

For a resource to be allocated (and not revoked), it must be attached to a part of the graph that can support its "weight." Likewise, owners can be limited in how much weight they are willing to carry (own). If the weight of a subtree exceeds a limit, priorities determine which resources should be shed. Each resource class (e.g. file descriptors, RAM,...) is tracked and limited independently, so classes do not affect each other.

Two mechanisms allow control over resource use: (1) limits, and (2) priorities.

Limits

Limits are straightforward: any internal node in the resource graph can have a limit imposed on the total weight of a given resource that can hang below that node.

When a resource has multiple parents, it can be counted against any one of its parents (its "effective parent"). In practice, it should be counted against whichever parent allows all limits to be respected.
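
To illustrate, here is a rough sketch of a naive effective-parent search, written against the interfaces proposed below. The helpers pickEffectiveParent, hasHeadroom, and subtreeWeight are illustrative only, not part of the proposal, and subtreeWeight deliberately ignores the multiple-ownership subtlety it is meant to help resolve (it counts a resource against every owner).

// Sketch: pick the first candidate parent with enough headroom for the
// new resource.
func pickEffectiveParent(parents []Owner, class ResourceClass, units float32) (Owner, bool) {
	for _, p := range parents {
		if hasHeadroom(p, class, units) {
			return p, true
		}
	}
	return nil, false
}

// hasHeadroom reports whether adding `units` of `class` under p would
// stay within all of p's limits for that class.
func hasHeadroom(p Owner, class ResourceClass, units float32) bool {
	for _, lim := range p.ResourceLimits() {
		if lim.class == class && subtreeWeight(p, class)+units > lim.limit {
			return false
		}
	}
	return true
}

// subtreeWeight sums the units of all leaf resources of the given class
// below o. For simplicity this ignores effective parents, so a resource
// with multiple owners is counted against every one of them.
func subtreeWeight(o Owner, class ResourceClass) float32 {
	var total float32
	for _, r := range o.Resources() {
		if r.IsLeaf() {
			if leaf, ok := r.(LeafResource); ok && leaf.ResourceClass() == class {
				total += leaf.Units()
			}
		} else if child, ok := r.(Owner); ok {
			total += subtreeWeight(child, class)
		}
	}
	return total
}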

Priorities

When no configuration of effective parents avoids exceeding resource limits, priorities determine which resources are allowed in the tree and which are not. Each edge from an owner to a resource (child) has a numerical priority between 0 (lowest priority) and 1 (highest). Any leaf resource has an effective priority determined by the product of the priority of every edge from itself back to the root owner.
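
For example, a leaf two edges below the root with edge priorities 0.8 and 0.5 has an effective priority of 0.8 × 0.5 = 0.4. A minimal sketch of the computation:

// Sketch: the effective priority of a leaf is the product of the edge
// priorities along the path from the root down to that leaf.
func effectivePriority(edgePriorities []float32) float32 {
	p := float32(1)
	for _, e := range edgePriorities {
		p *= e
	}
	return p
}

// effectivePriority([]float32{0.8, 0.5}) == 0.4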

Proposed interfaces

type Resource interface {
	// Makes a context that lasts as long as the resource (is cancelled if
	// the resource is revoked)
	Context() context.Context

	// Returns a new context derived from the context passed in, but also
	// limited to last no longer than the resource
	ContextFromContext(ctx context.Context) context.Context

	// Indicate that the resource is no longer needed
	Destroy()

	// Add a handler that runs when the resource manager decides this resource
	// can no longer be used. Should be used to clean up the underlying real
	// resource
	SetRevocationHandler(func(string) bool)

	// Provides a list of owners for this resource. Guaranteed not to be empty
	Owners() []Owner

	// Indicates if this is a leaf resource (is a LeafResource).
	// Otherwise this is an internal node (is an Owner).
	IsLeaf() bool
}

type LeafResource interface {
	Resource

	// Indicates the class of this resource
	ResourceClass() ResourceClass

	// Indicates how many units this resource instance uses toward limits
	Units() float32
}

type Owner interface {
	// Lists child resources
	Resources() []Resource

	// Makes this object own a resource. If that resource already had another
	// owner, it will have multiple owners
	AddResource(resource Resource) bool

	// Makes this object own a resource. If limits would be exceeded, wait
	// to see if it will succeed up until the deadline
	AddDeadline(resource Resource, deadline time.Time) bool

	// Makes this object own a resource, removing other owners that it may
	// already have
	MoveResource(resource Resource)

	// Tries to make this object own a resource, succeeding only if moving the
	// entire resource (including all subresources) doesn't exceed any
	// resource limits
	TryMoveResource(resource Resource) bool

	// List resource limits enforced on this node
	ResourceLimits() []ResourceLimit

	// Indicates the priorities for each child. Defaults to 1 for all children
	ResourcePriorities() map[Resource]float32
}

type ResourceLimit struct {
	// What type of resource is being limited
	class ResourceClass

	// Maximum number of resource units that can be owned
	limit float32
}

type ResourceManager interface {
	Owner

	// Adds a class of resource to be managed
	AddResourceClass(ResourceClass)

	// Returns the placeholder owner, which is a special child (see below)
	PlaceholderOwner() Owner
}

// A type of resource; e.g. RAM, fd, etc.
type ResourceClass interface {
	// Name of the resource class (for debugging)
	Name() string
}


// The inner nodes may have whatever interface to set their resource limits,
// but this is a minimal interface for setting simple limits
type LimitableResource interface {
	Resource

	// Set a limit of the given number of units
	SetLimit(class ResourceClass, units float32)

	// Remove a limit
	RemoveLimit(class ResourceClass)
}
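
To make these interfaces concrete, here is a minimal usage sketch. The simpleResourceClass type, the FDClass value, and the attachConn function (including its arguments) are all hypothetical, and time is assumed to be imported.

// Sketch: a name-only ResourceClass can be as simple as a string type.
type simpleResourceClass string

func (c simpleResourceClass) Name() string { return string(c) }

var FDClass = simpleResourceClass("fd")

// attachConn attaches a TCP connection leaf under the application,
// waiting up to five seconds for headroom before giving up.
func attachConn(mgr ResourceManager, app Owner, tcpConn LeafResource) bool {
	mgr.AddResourceClass(FDClass)
	if !app.AddDeadline(tcpConn, time.Now().Add(5*time.Second)) {
		tcpConn.Destroy()
		return false
	}
	return true
}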

Conceptual resource allocation algorithm

There aren't any interesting choices to be made unless some resource limit is exceeded. The first interesting set of choices involves resources with multiple parents.

When adding or moving a resource to a new position in the graph, it is necessary to try to find a set of paths from the root to each leaf that do not exceed any resource limits. To find such a solution, different choices of effective parents can be made to avoid exceeding any one limit. If any configuration that avoids exceeding any limits is possible, the new configuration is valid.

Otherwise, the lowest-priority resource below the limiting node (as computed by the product of the priorities on each edge from root to leaf) is revoked, and the new configuration is tested for validity. If it is still invalid, the process of removing the lowest-priority resource repeats.

In practice, this algorithm is not going to be efficient (I think finding a valid configuration of effective parents is equivalent to SAT). Some reasonable approximation should be fine, however. Any such approximation will be far better than the current state in libp2p.
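
The shedding loop itself is simple once those searches exist. A rough sketch, with the two hard searches passed in as hypothetical function arguments:

// Sketch of the shedding loop: while no configuration of effective
// parents is valid, revoke the lowest-priority leaf below the limiting
// node. configurationValid and lowestPriorityLeafBelow stand in for the
// nontrivial searches described above.
func shedUntilValid(
	limiting Owner,
	class ResourceClass,
	configurationValid func(Owner, ResourceClass) bool,
	lowestPriorityLeafBelow func(Owner, ResourceClass) LeafResource,
) bool {
	for !configurationValid(limiting, class) {
		victim := lowestPriorityLeafBelow(limiting, class)
		if victim == nil {
			return false // nothing left to shed
		}
		victim.Destroy()
	}
	return true
}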

Other considerations

Pre-allocation

It is often useful to allocate a collection of resources together, since it is not ideal to allocate some resources and then find others needed at the same time aren't available. This situation causes unnecessary resource contention and CPU utilization ("thrashing").

To avoid this, a subtree of "inactive" resource objects can be allocated under the ResourceManager's PlaceholderOwner(), which has no limits on resource use. "Inactive" indicates that these resource objects would not actually own the resources they represent; e.g. a TCP socket resource wouldn't actually hold a real socket.

TODO: this isn't entirely thought through. There may be problems here.
Once the subtree is constructed, it would be activated by a two-step process:

  1. Move the subtree to where it actually belongs in the resource tree, using TryMoveResource(). This atomically succeeds or fails for the entire subtree.
  2. Assuming step 1 succeeds, do whatever additional setup needs to be done on the subtree to actually use the underlying resources.

This mechanism will probably need some additional interfaces.
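
One possible shape for such an interface is sketched below; the Activatable type is hypothetical and nothing here is settled (errors is assumed imported).

// Hypothetical: an inactive resource object that can acquire its real
// underlying resource once accounting has succeeded.
type Activatable interface {
	Resource

	// Acquire the real resource (open the socket, etc.)
	Activate() error
}

// Sketch of the two-step activation described above.
func activateSubtree(target Owner, subtree Resource) error {
	// Step 1: atomically move the inactive subtree into place.
	if !target.TryMoveResource(subtree) {
		return errors.New("resource limits exceeded for subtree")
	}
	// Step 2: acquire the real resources now that accounting succeeded.
	if a, ok := subtree.(Activatable); ok {
		return a.Activate()
	}
	return nil
}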

Temporary allocation

Sometimes it is necessary to allocate resources without knowing what really ought to own them, for example when handling incoming protocols. This case can be handled by creating special owners (with limits) for these resources, and then moving them to the appropriate place in the tree once the correct owners are identified.

Examples

Consider a libp2p node that supports an application and also serves as a DHT node (i.e. not just a DHT client). This example is highly simplified, but designed to illustrate the concepts described here.

The resource manager would have four children ("top-level owners"):

  1. The application
  2. The DHT "server" component
  3. A temporary owner for incoming connections/messages
  4. Placeholder owner

The first three of these nodes would have resource limits set on them. In particular, the application would have fairly high resource limits, while the other two would have much lower limits.
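
A sketch of that setup, assuming a hypothetical newNode constructor, the simpleResourceClass sketched earlier, and arbitrary limit values:

// Hypothetical composite of what an internal node would implement.
type InnerNode interface {
	Owner
	LimitableResource
}

// Sketch: set up the three limited top-level owners.
func setupTopLevel(mgr ResourceManager, newNode func(name string) InnerNode) {
	app := newNode("application")
	dht := newNode("dht-server")
	temp := newNode("incoming-temp")

	var RAMClass ResourceClass = simpleResourceClass("ram")
	mgr.AddResourceClass(RAMClass)

	// Generous limits for the application, much lower ones elsewhere.
	app.SetLimit(RAMClass, 256<<20) // 256 MiB
	dht.SetLimit(RAMClass, 16<<20)  // 16 MiB
	temp.SetLimit(RAMClass, 4<<20)  // 4 MiB

	mgr.AddResource(app)
	mgr.AddResource(dht)
	mgr.AddResource(temp)
}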

Outgoing connections

The application, in turn, would own a set of streams that it uses to communicate with other libp2p nodes. Each of these streams would own a mux, which would own an upgraded connection, which would own an underlying connection, which would own a TCP stream. In a more sophisticated example, multiple applications would own streams that share muxes. So far things are fairly straightforward; the application can open streams (and connections) until doing so fails.

Each object in this tree that uses a nontrivial amount of RAM (e.g. significant buffers) should wrap that memory in an object implementing the Resource interface and treat it as a child.

TODO: This isn't quite right yet!
To make this more efficient, when opening a new stream, the application should initially allocate this stream and all of the resources under it to the placeholder owner. Then, these resources should be moved under the application. Only if this succeeds should the real TCP connection be opened.

The application will also likely use the DHT for peer routing or content routing. When it does, the application will also own the individual DHT queries it makes, which in turn own the streams each query uses. Limiting what an individual DHT query can use is done simply by setting resource limits on the query node. Internally, the DHT query logic would then rate-limit itself, since calls to AddDeadline() block when dialing new connections would exceed those limits.
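
Sketched concretely (newStream is a hypothetical constructor producing an inactive stream resource; fmt and time are assumed imported):

// Sketch: a DHT query admits each new stream under its own query node,
// blocking up to a deadline when the query's limits are saturated.
func dialForQuery(query Owner, newStream func(peer string) Resource, peer string) error {
	stream := newStream(peer)
	if !query.AddDeadline(stream, time.Now().Add(10*time.Second)) {
		return fmt.Errorf("query over its resource limits; not dialing %s", peer)
	}
	return nil
}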

Incoming connections

When a new connection comes in, it would initially be owned by the temporary owner. As that connection is handled by the switch, an upgraded connection would be created. This upgraded connection would then be added to the temporary owner, and the upgraded connection would own the underlying connection. The same process would happen when a mux is created.

Once the mux opens a stream to the application or the DHT, the connection will be moved to the corresponding top-level owner. This ensures that the correct resource limits apply as soon as possible.
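
A sketch of that hand-off:

// Sketch: promote a connection from the temporary owner to the top-level
// owner (application or DHT) that the newly opened stream belongs to.
func promoteConnection(topLevel Owner, conn Resource) {
	// MoveResource removes any existing owners (here, the temporary
	// owner), so the top-level owner's limits apply from now on.
	topLevel.MoveResource(conn)
}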

Routing table

The Kademlia DHT routing table is an interesting case to consider.

Traditional Kademlia uses a routing table composed of buckets that can each hold up to K entries ("k-buckets"). However, Kademlia is designed as a fairly standalone system, not as part of a larger, modular system like libp2p. The function of the DHT routing table in libp2p is better described as a way of querying the libp2p peer store, together with a system that ensures peer store entries useful to the DHT are retained (not allowed to be garbage collected) and verified to still be accurate.

In that context, the K bucket size parameter actually specifies the maximum number of peer store entries that the DHT wants to own for its own use (per bucket/peer id range). If other subsystems want to own more entries for their own use, the DHT can still use them, and that benefits the DHT with no cost to the system.

In terms of resource ownership, this means that peer store entries, or at least entries that are periodically pinged to verify connectivity, would be treated as resources as described in this proposal. Each DHT bucket would be an owner, limited to K entries. The bucket objects themselves would then be owned by the DHT. This ensures that the DHT tries to keep around K peer store entries per bucket. Other subsystems could also keep entries around as needed. Peer store entries with no owners would then be subject to garbage collection, either immediately or after a delay (e.g. least recently used policy).
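
A sketch of the bucket setup, reusing the hypothetical newNode and simpleResourceClass from earlier sketches (fmt assumed imported; K = 20 is just Kademlia's usual default):

// Sketch: each k-bucket is an Owner limited to K peer store entries.
const K = 20

func setupBuckets(dht Owner, newNode func(name string) InnerNode, numBuckets int) {
	var PeerEntryClass ResourceClass = simpleResourceClass("peer-entry")
	for i := 0; i < numBuckets; i++ {
		bucket := newNode(fmt.Sprintf("bucket-%d", i))
		bucket.SetLimit(PeerEntryClass, K)
		dht.AddResource(bucket)
	}
}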

@yusefnapora (Contributor)

Very interesting! I like your idea of the DHT owning a set of entries in the peer store, that makes a lot of sense. And with the query capability, the DHT could periodically re-evaluate which are the most useful known peers to claim an interest in and replace peers that drop out of a bucket with the next best match.

About the pre-allocation, it seems like we need interfaces for "activating" a resource (open the socket, etc.) after we move it onto the ownership graph.

And maybe one for the resource to inform the manager when we've been cut off at the knees by the OS (socket closed or whatever). Would closing our own Context be enough in that case?

@marten-seemann (Contributor)

As of v0.18.0, libp2p now has a resource manager.
