On the Design of an Internetwork Layer

or Replacing the Internet Protocol

Introduction

This paper examines the design of an internetwork layer with sufficient scope and flexibility to handle the needs of the growing global Internet. In particular, we design an internetwork layer that could replace the current Internet Protocol and would remain viable as the Internet's internetwork layer for the foreseeable future.

We start with the service model we expect from an internetwork layer, then raise some high-level architectural issues for building a network that lasts, discuss some results that seem inevitable, and finally propose the sub-systems needed to make it all work.

Service Model

The basic service provided by an internetwork protocol layer is that of a datagram service between any and all end-points that may wish to communicate. Obviously it must have naming and addressing mechanisms suitable for the number of end-points envisioned.

More than that, we are now seeing requirements that the net do more than just deliver packets; end-points want more control over how their packets are delivered. These requirements are usually grouped together under the term Quality of Service and include policy routing and resource reservation. The internetwork layer needs to provide these capabilities. Ideally, policy decisions combine the policies of both the source and the destination.

Security is the biggest open question in the design of an internetwork layer. Radia Perlman's thesis provides guidance as to how we can protect, at the network layer, against denial of service and theft of service attacks. The question is: which security services should the internetworking layer supply?

Multicast is also a requirement. I'm not going to say much more than that, not because I think multicast is unimportant or want to slight it, but because I haven't thought about it much and will leave that work to others who have. Nonetheless, keep multicast in mind for everything that follows.

Finally, the internetwork layer has to be able to forward the packets. Networks are getting faster and will continue to do so. The internetwork layer must keep up while also providing the quality of service and security capabilities discussed above. For example, if providing the requested quality of service requires packet classification, the internetwork layer should provide the means to implement classification efficiently.

Architectural Issues

In this section we cover a few architectural issues for an internetwork layer. In part these are goodness and light sorts of things, but they apply specifically to the internetwork layer we're building.

Names, Addresses, and Routes

In routing we repeatedly work with three entities: names, addresses, and routes. In short, a name specifies the object it names, an address tells you where the object is, and a route tells you how to get there. An IP address is, properly, an address. It is frequently used, however, as a name. Also, after years of usage, people have come to expect anything called an address to act like an IP address. Therefore, we've abandoned the term address in favor of locator, leaving address to mean just IP addresses.

names

Names give a label for something. Generally they are unique within some context. A name for me is "David Bridgham", unique within my group of friends. To guarantee uniqueness we invent computer-friendly names such as social security numbers. We sometimes call these identifiers; they are still names. Names tell you nothing about where the object is or how to get to it. They do have some internal structure; my name contains a family part and my social security number tells you what region I lived in when I was registered. This structure is useful primarily to whoever is handing out the names.

locators

My locator is Reading, Massachusetts, USA Terra. Actually, that is my home. Right now I'm mobile with a locator of Whistler, British Columbia, Canada Terra. These locators don't tell you who I am. Next week, the Whistler locator will get you to someone else. They also do not tell you how to get to me. That is, the locator neither names me nor is a route to me. The locator does give you the information to go find the route, though, either by direct lookup in a routing table system or by naming points on a map from which you can compute a route in a map-based system. My locator changes as I move, though my name remains the same.

You may note that a locator is a name too. It names the location of the object instead of the object itself.

routes

A route is how to get from somewhere to somewhere. It depends on where you are and where you want to go. A whole host of other issues must also be considered such as policy and resource availability. Given source and destination locators you need to be able to discover some number of routes between here and there amongst which you select according to your policy or other considerations.

selectors

Another new term is selector. The selector is the set of packet fields a forwarder uses to decide how to forward that packet. The key requirement is that the selector provide enough granularity to distinguish packets that need different handling.
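To make granularity concrete, here is a minimal Python sketch comparing two hypothetical selectors: one built from the destination alone, roughly what IP forwarders have traditionally used, and one that adds a flow identifier. The `Packet` fields, including `flow_id`, are assumptions for illustration, not a proposed header format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical header fields, for illustration only.
    src: str
    dst: str
    flow_id: int


def dst_selector(pkt: Packet) -> tuple:
    """Coarse selector: the destination alone."""
    return (pkt.dst,)


def flow_selector(pkt: Packet) -> tuple:
    """Finer selector: destination plus flow identifier."""
    return (pkt.dst, pkt.flow_id)


def distinguishable(selector, a: Packet, b: Packet) -> bool:
    """A forwarder can handle a and b differently only if their selectors differ."""
    return selector(a) != selector(b)


video = Packet(src="A", dst="B", flow_id=1)  # needs low latency
bulk = Packet(src="A", dst="B", flow_id=2)   # can tolerate delay

print(distinguishable(dst_selector, video, bulk))   # False
print(distinguishable(flow_selector, video, bulk))  # True
```

With the destination-only selector, the two conversations look identical to every forwarder, so no amount of per-hop cleverness can give them different service.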

do we need all these?

It is not necessarily the case that an internetwork layer must have all of names, locators, routes, and selectors explicitly (though I believe it should). However, any network layer protocol needs each of these functions. Look at how a protocol performs each one; if a function is not explicit, then either some header field is doing double duty or the function is derived from fields that are there. Then ask how that will cause problems.

IP uses its addresses as locators, identifiers, and selectors. The primary use of the address is as a locator. The address belongs to an interface and names where that interface is connected to the network. When you move, your address changes. Except the address is used as an identifier as well; it is wired into TCP's connection de-multiplexing and checksum calculation, so if the address changes mid-connection, the TCP connection is lost. The DNS name of a machine doesn't change as it moves around the network but its identifier (as embodied in IP by its IP address) does.
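The following sketch, assuming a simplified TCP-like connection table, shows why this double duty hurts: connections are found by the (local address, local port, remote address, remote port) tuple, so a segment from the same peer at a new address finds nothing. Real TCP has the same dependency in its checksum, whose pseudo-header covers both IP addresses. The addresses are documentation examples.

```python
from typing import Dict, Optional, Tuple

# Connection table keyed like TCP's: (local addr, local port, remote addr, remote port).
Key = Tuple[str, int, str, int]
connections: Dict[Key, str] = {}

connections[("192.0.2.1", 80, "198.51.100.7", 5000)] = "ESTABLISHED"


def demux(dst: str, dst_port: int, src: str, src_port: int) -> Optional[str]:
    """Find the connection an arriving segment belongs to."""
    return connections.get((dst, dst_port, src, src_port))


# Same peer, same ports, before and after its address changes.
print(demux("192.0.2.1", 80, "198.51.100.7", 5000))  # ESTABLISHED
print(demux("192.0.2.1", 80, "203.0.113.9", 5000))   # None: connection lost
```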

NOTE: Of course, some people argue that the DNS name should also change when a host moves. They're really confused.

IP uses the destination IP address as its primary selector. Some routers also use the TOS field to provide a limited quality of service control over routing but since the TOS field has never come into widespread use in the Internet some routers also look for well known TCP or UDP ports for that information. With a requirement for per-connection quality of service, IPng will need better granularity in its selectors.

For destinations with multiple IP addresses the source is left to choose which destination address to use. Those different addresses may very well select different routes (while addresses are not routes they are used to find routes) but no routing information is available to the source that would enable it to make a reasonable decision.

Can't Change it All at Once

The future Internet will be too large to change anything, everywhere at once. This is a simple statement but with many implications.

NOTE: One of the key points of special relativity is that across any significant distance there is no such thing as simultaneity. The Internet has significant distances.

Overall, it implies that we should remove everything possible from the core internetwork layer, leaving it to local implementation rather than to distributed mechanisms. Anything that is left to local implementation may be replaced locally as better technology comes along. More importantly, it can be replaced incrementally across the Internet.

Routing and Scaling

A major challenge to a replacement for IP is that of solving the routing problem in a way that scales to the size of the net we want to be able to build. The issue of scaling a routing system is that of information abstraction.

It is obvious that the future Internet will be too large for any one router, much less all routers, to know how to route to everywhere. Routers need a way to work when they know less than the entire routing state of the network. The question is how.

One answer is what we do today in IP; information is pruned strictly by the addressing hierarchy. CIDR increases the number of levels in the addressing hierarchy to decrease the size of each level (especially the top one). Since this scheme only routes packets along the addressing hierarchy it constrains the network topology to also follow the addressing hierarchy. This is a limitation we should not place on the future Internet just yet.
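A small sketch of hierarchical pruning via longest-prefix matching, using Python's standard `ipaddress` module; the prefixes and next-hop labels are hypothetical. One aggregate entry such as `10.0.0.0/8` covers every site numbered under it, so a distant router can reach all of them with one line of state, but only by following the addressing hierarchy. A site that connects to the network somewhere else without renumbering needs its own more-specific entry, like `10.1.0.0/16` here, which is exactly the information aggregation was meant to prune.

```python
import ipaddress

# Hypothetical aggregated routing table: prefix -> next hop.
table = {
    ipaddress.ip_network("0.0.0.0/0"): "default",
    ipaddress.ip_network("10.0.0.0/8"): "provider-A",
    ipaddress.ip_network("10.1.0.0/16"): "site-X",
}


def lookup(addr: str) -> str:
    """Longest-prefix match: the most specific prefix containing addr wins."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in table if ip in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]


print(lookup("10.1.2.3"))   # site-X
print(lookup("10.9.9.9"))   # provider-A
print(lookup("192.0.2.1"))  # default
```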

Another important characteristic of the current IP approach to information abstraction in routing is that it is all precomputed in a distributed manner. The routing system figures out which routes are possible before it has need of any of those routes. With policy, these pre-computed routes include policy variations. Obviously this limits the range of possible policies to what the network as a whole knows how to compute.

The obvious alternative to precomputed routes is one where a router acquires routing information on demand. In this way the information it acquires is relevant to the router's needs and the router can process the information with respect to real policy requirements rather than imagined ones.
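A minimal sketch of the on-demand alternative, assuming a hypothetical `fetch` callback that asks the network for routing information about one destination under one policy. Nothing is computed until a destination is actually requested, and the policy is one a source really asked for rather than one of a precomputed set. Invalidating cached routes when the network changes is omitted.

```python
from typing import Callable, Dict, List, Tuple

Fetch = Callable[[str, str], List[str]]


class OnDemandRouteCache:
    """Acquires routing information only when a destination is actually needed."""

    def __init__(self, fetch: Fetch) -> None:
        self.fetch = fetch  # hypothetical query to the network for (destination, policy)
        self.cache: Dict[Tuple[str, str], List[str]] = {}
        self.queries = 0

    def route(self, dst: str, policy: str) -> List[str]:
        key = (dst, policy)
        if key not in self.cache:
            self.queries += 1
            self.cache[key] = self.fetch(dst, policy)
        return self.cache[key]


def fake_fetch(dst: str, policy: str) -> List[str]:
    # Stand-in for asking the network; a real query would return map data.
    via = "B" if policy == "avoid-A" else "A"
    return ["S", via, dst]


r = OnDemandRouteCache(fake_fetch)
r.route("D", "cheapest")
r.route("D", "cheapest")  # answered from the cache
r.route("D", "avoid-A")   # a new policy triggers a new query
print(r.queries)                # 2
print(r.route("D", "avoid-A"))  # ['S', 'B', 'D']
```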

Results

We take the service model along with the architectural principles listed above and we get the results of this section. I certainly can't claim these are the only possible answers that satisfy the constraints, but in some cases the conclusions do seem inevitable.

Source Selected Routes

Quality of service, scaling, and a preference for local instead of distributed systems imply that source selected routes are preferable to hop-by-hop routing. It doesn't necessarily have to be the source that picks the route. The important thing is that it is done in one place rather than distributed across the network. But we'll continue calling it source routing for now.

With the route chosen in one place, the routing algorithms do not have to be consistent networkwide to prevent routing loops. New routing algorithms can be fielded experimentally or incrementally replaced as better technology is created and the range of possible policy considerations can change with time. Actually, the route selection can be distributed recursively (consider the routing between specified nodes in IP loose source routing) and still gain these benefits.
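The forwarding side of source routing can be sketched as follows, assuming a packet that carries its whole route as a list of hypothetical node names. Each forwarder reads the next hop from the packet and checks only that it has a link to it; no routing table is consulted, so two sources can choose different routes to the same destination without the forwarders agreeing on anything. A missing link raises an error, standing in for the failure report that must travel back to the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Packet:
    route: List[str]     # full route chosen once by the route calculator
    next_index: int = 1  # position of the next hop within route


def forward_step(here: str, pkt: Packet, links: Dict[str, Set[str]]) -> Optional[str]:
    """Return the next node for pkt, or None if here is the destination."""
    if pkt.next_index >= len(pkt.route):
        return None
    nxt = pkt.route[pkt.next_index]
    if nxt not in links[here]:
        # Stand-in for the failure report that must return to the source.
        raise LookupError(f"{here} has no link to {nxt}")
    pkt.next_index += 1
    return nxt


def send(pkt: Packet, links: Dict[str, Set[str]]) -> List[str]:
    """Carry pkt hop by hop and return the nodes it visited."""
    here = pkt.route[0]
    visited = [here]
    while (nxt := forward_step(here, pkt, links)) is not None:
        here = nxt
        visited.append(here)
    return visited


links = {"S": {"A", "B"}, "A": {"S", "D"}, "B": {"S", "D"}, "D": {"A", "B"}}
print(send(Packet(["S", "A", "D"]), links))  # ['S', 'A', 'D']
print(send(Packet(["S", "B", "D"]), links))  # ['S', 'B', 'D']
```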

It is unrealistic to expect a network wide routing system to compute all the possible routes taking all the possible policies into account. This includes issues like provider selection. It is easy to envision how to do provider selection through such hacks as multiple addresses if there is only one layer of provider above each end site. It is even conceivable that the routing protocol might compute those different routes. Once you envision a larger network, with three or four levels of provider above each site, it's quickly apparent that this approach simply does not scale.

The traditional reason that the Internet prefers hop-by-hop routing is that it more readily allows healing of network failures. Once quality of service enters the picture, however (either policy or resource reservation), whoever is re-routing around the failure must understand the quality of service requirements of the connection it is re-routing. With distributed routing we have to distribute quality of service requirements to any router that might have to re-route, whereas with source routing we have to send network failure information back to the source. Of the two, network failure information is less likely to need extensions every few months, so it is the better candidate for standardizing into a protocol that must be globally consistent.

Source routing appears to be the choice that allows the flexibility the net needs.

Flows

Source routes provide the necessary flexibility, but with a very large network and locators that are variable length and long because of the size of the network, a source route could be enormous. Including the full source route with every packet would be grossly inefficient. If an end-point is going to use this route for several packets then an obvious optimization is to send the source route through the network once to blaze a path and let the rest of the packets follow the path.
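A sketch of that optimization, assuming hypothetical per-forwarder flow tables and a small integer flow identifier: a setup pass carries the full source route once and leaves a next-hop entry at each forwarder, after which data packets carry only the flow identifier.

```python
from typing import Dict, List, Optional

# Per-forwarder flow tables: node -> {flow_id: next hop, or None at the destination}.
FlowTables = Dict[str, Dict[int, Optional[str]]]


def setup_flow(tables: FlowTables, flow_id: int, route: List[str]) -> None:
    """Carry the full source route once, leaving a next-hop entry at every node."""
    next_hops: List[Optional[str]] = list(route[1:]) + [None]
    for here, nxt in zip(route, next_hops):
        tables.setdefault(here, {})[flow_id] = nxt


def forward_data(tables: FlowTables, start: str, flow_id: int) -> List[str]:
    """Data packets carry only flow_id; each node looks it up in its own table."""
    path = [start]
    nxt = tables[start][flow_id]
    while nxt is not None:
        path.append(nxt)
        nxt = tables[nxt][flow_id]
    return path


tables: FlowTables = {}
setup_flow(tables, 7, ["S", "A", "C", "D"])
print(tables["C"])                   # {7: 'D'}
print(forward_data(tables, "S", 7))  # ['S', 'A', 'C', 'D']
```

The trade is header size for network state: the long route crosses the network once, and the per-hop entries it leaves behind are exactly the kind of state discussed under Network State.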

Maps

For the source route calculator to do its job, it must be told the source and destination of the route, it needs the quality of service requirements to apply (for both source and destination), and it needs information about the network's topology and quality of service capabilities. Since it is not feasible for any router to know the entire network's topology and quality of service capabilities, it must have some way of acquiring only the information it needs. As discussed above in the section on routing and scaling, there is either the "precomputed" approach or the "go out and get it" approach. Since quality of service requirements will create a vast number of possible answers, precomputing is not feasible. Therefore, we need a way for the route calculator to go out and ask the network for its topology and quality of service information. In other words, it will get maps of the network.
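As one possible route calculator, here is Dijkstra's shortest-path algorithm run over a map fragment, minimizing latency while pruning links that cannot meet a bandwidth requirement. The map layout and the attribute names (`bandwidth_mbps`, `latency_ms`, `aup`) are hypothetical. The calculator reads only the attributes it understands, so a tag it does not know, like `aup` below, travels in the map untouched. Because route calculation is local, this algorithm could be swapped out at one site without affecting any other.

```python
import heapq
from typing import Dict, List, Optional

# Map fragment: node -> {neighbor: link attributes}. Attribute names are hypothetical.
Map = Dict[str, Dict[str, dict]]

net: Map = {
    "S": {"A": {"bandwidth_mbps": 10, "latency_ms": 5},
          "B": {"bandwidth_mbps": 100, "latency_ms": 20, "aup": "research-only"}},
    "A": {"D": {"bandwidth_mbps": 10, "latency_ms": 5}},
    "B": {"D": {"bandwidth_mbps": 100, "latency_ms": 20}},
}


def compute_route(m: Map, src: str, dst: str, min_bw: float) -> Optional[List[str]]:
    """Lowest-latency route using only links with at least min_bw of bandwidth."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, attrs in m.get(u, {}).items():
            if attrs.get("bandwidth_mbps", 0) < min_bw:
                continue  # link cannot meet the requirement; prune it
            nd = d + attrs["latency_ms"]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


print(compute_route(net, "S", "D", min_bw=1))    # ['S', 'A', 'D']
print(compute_route(net, "S", "D", min_bw=50))   # ['S', 'B', 'D']
print(compute_route(net, "S", "D", min_bw=500))  # None
```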

Since these maps contain quality of service information ranging from concrete characteristics, like the bandwidth and latency of a network, to very nebulous concepts, like the NSFNet Acceptable Use Policy, the map specification must be extensible and flexible. Because the map specification only transfers the information, it does not have to understand it. With the usual care in specification, we can design a map format able to name any characteristics we later care to add.

Renumbering

As the network grows, it seems quite likely that end-points will need to be renumbered. As the network changes and grows, the locator hierarchy will become less optimal. That is, to get good routes, routing information will have to be propagated further up the locator tree. To reduce this spread of routing information you find a new locator hierarchy; you renumber some part of the network.

Since the new locator hierarchy could affect large portions of the net, even the entire net, and it will not be possible to notify the entire network simultaneously of this change, the network must continue to work while the renumbering is in progress and only some of the nodes to be renumbered have been notified. I'd suggest that this implies that any such renumbering must be automatic and hopefully invisible but I'm not completely sure of that.

One implication that I am sure of is that this means you really do not want to use locators as end-point identifiers. The end-point to which you are speaking is still there even if its locator changes.
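A sketch of the alternative, assuming a hypothetical stable end-point identifier carried in each packet: connections are keyed by that identifier, and the peer's locator is just where replies go, so it can change mid-connection. The identifier and locator strings are illustrative only. A real design would authenticate a locator change before accepting it; this sketch does not.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Connection:
    state: str
    peer_locator: str  # where replies go; free to change mid-connection


class Stack:
    """Demultiplexes arriving packets by a stable end-point identifier."""

    def __init__(self) -> None:
        self.conns: Dict[str, Connection] = {}

    def open(self, peer_id: str, peer_locator: str) -> None:
        self.conns[peer_id] = Connection("ESTABLISHED", peer_locator)

    def receive(self, peer_id: str, src_locator: str) -> Optional[Connection]:
        conn = self.conns.get(peer_id)
        if conn is not None and conn.peer_locator != src_locator:
            # Peer was renumbered or moved; a real design must authenticate this.
            conn.peer_locator = src_locator
        return conn


stack = Stack()
stack.open("host-42", "Terra/USA/Massachusetts/Reading")
conn = stack.receive("host-42", "Terra/Canada/British Columbia/Whistler")
print(conn.state)         # ESTABLISHED
print(conn.peer_locator)  # Terra/Canada/British Columbia/Whistler
```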

Note that this sounds very similar to some mobile host requirements. A proper solution here may help both.

Network State

Some engineering decisions that are still very much open have to do with network state. Obviously this internetwork layer has a fair amount of state spread around the net. Resource reservation puts state in the forwarders; if flows are used as an optimization of source routes, the flow data is network state; multicast systems have their distribution trees. The maps with their network attributes, and the copies of those maps in the routers around the net, are a large amount of state.

Some state, service state, by its nature cannot go in the packets. Resource reservations and packet charging data are examples. Other state, user state, could go in either the packets or the network.

The task for any internetwork layer is to identify which state goes in the packets and which in the network, and for state that goes in the network, how does it get there and how is it maintained? Should the routers infer state from traffic they see or should state be installed explicitly? A lot of these will be efficiency tradeoffs with the decisions based on assumed network traffic patterns. In some cases we may be able to allow for either option and find out after the net is running which way the bulk of the traffic goes.

Sub-Systems

The internetwork layer is a large system that must deliver the service described in the Service Model section. In proper top-down design we now break the internetwork system into sub-systems to carry out all the tasks necessary. The important point to understand is what services each sub-system needs from the other sub-systems and what services it provides. These sub-systems are: forwarding, route calculation, route distribution, resource reservation, flow setup, multicast, and security.

Forwarding and Route Calculation

In IP, the forwarding and route calculation sub-systems are intimately tied together. High performance routers put a cache in front of the route calculation so it does not really happen each time, but they are trying to emulate doing a routing calculation for each packet forwarded. In a world of policy routing and resource reservation, it no longer works as well to tie forwarding and route calculation together in this way. By separating the route calculation and the forwarding sub-systems, we free the forwarding to run with less overhead and we allow the possibility that the route calculation could run elsewhere. Forwarding, by its nature, is distributed. Route calculation has been distributed but distributing enough policy information to make policy routing decisions is quite an interesting problem. By removing route calculation from the forwarders, we have put it under local control. It no longer has to be standardized.

Route Distribution

IP left the route distribution mechanism out of the internetwork layer. Routing protocols, to be developed later, would inform the forwarding/routing sub-system about what they learned. Since one of the driving forces behind developing a replacement for IP is the need for an internetwork layer that will scale beyond IP, we need to make sure that the route distribution sub-system is up to the task. The main client of the route distribution sub-system is the route calculation sub-system. It wants network maps with quality of service information, so that is the offered service of the route distribution sub-system. This part needs to be standardized across the network.

How the route distribution sub-system learns its map to distribute, how it knows what tags to put on which nodes, how it abstracts information from nodes lower down in the tree, these are all left to local decision. The protocols and algorithms that implement these parts of the network mapping system need not be standardized networkwide and so can be incrementally enhanced as we learn better ways to do them.

NOTE: Another piece here is policy distribution. In Nimrod we've put the route calculator at the source. But if we're going to allow destination policy too, the calculator needs a way to learn about the destination's policy requirements. More tags in the map? Maybe just a tag that says "here be destination policies", and you invoke the destination policy protocol to learn about them? I like the idea of just putting destination policies in the map.

Flow Setup and Resource Reservation

The flow setup sub-system takes the route from the route calculation sub-system and talks to each of the forwarding sub-systems, informing them of the flow to come, how to identify which packets belong to the flow, and how to handle the packets. Flow setup needs a back path to route calculation to ask for a new route if the flow setup fails at some point. The resource reservation sub-system needs the same back path.
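The back path can be sketched as a retry loop, assuming two hypothetical callbacks: `calculate`, which returns a route avoiding a given set of nodes, and `accepts`, which reports whether a forwarder agrees to carry the flow. The sketch does not tear down state left at hops before the failure, which a real setup protocol would have to do.

```python
from typing import Callable, List, Optional, Set


def setup_with_retry(
    calculate: Callable[[Set[str]], Optional[List[str]]],
    accepts: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[List[str]]:
    """Set up a flow, asking route calculation for a new route after each failure."""
    avoid: Set[str] = set()
    for _ in range(max_attempts):
        route = calculate(avoid)
        if route is None:
            return None  # route calculation has nothing left to offer
        failed = next((node for node in route if not accepts(node)), None)
        if failed is None:
            return route  # every forwarder accepted the flow
        avoid.add(failed)  # the back path: report which node refused
    return None


candidates = [["S", "A", "D"], ["S", "B", "D"]]


def calculate(avoid: Set[str]) -> Optional[List[str]]:
    # Hypothetical calculator: first candidate that avoids every failed node.
    return next((r for r in candidates if not avoid & set(r)), None)


def accepts(node: str) -> bool:
    return node != "A"  # pretend A has no resources left to reserve


print(setup_with_retry(calculate, accepts))  # ['S', 'B', 'D']
```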

NOTE: Flow setup and resource reservation seem awfully similar. Should they just be combined? I know RSVP is trying to do one without the other but they might work better combined.

Multicast

The multicast sub-system takes network information from the route distribution sub-system, calculates its multicast trees, and uses the flow setup and resource reservation sub-systems to set up the multicast trees. The one additional requirement of multicast on the other sub-systems is that flows need the ability to have forks or multiple destinations.
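A sketch of a forked flow, assuming hypothetical flow tables that map a flow identifier to a set of next hops rather than a single one; the forwarder copies the packet at each fork. For simplicity, only leaves are treated as receivers.

```python
from typing import Dict, List, Set

# node -> {flow_id: set of next hops}; an empty set marks a leaf (a receiver).
Tables = Dict[str, Dict[int, Set[str]]]


def deliver(tables: Tables, node: str, flow_id: int) -> List[str]:
    """Return the receivers reached, copying the packet at every fork."""
    next_hops = tables[node][flow_id]
    if not next_hops:
        return [node]
    reached: List[str] = []
    for nxt in sorted(next_hops):  # sorted only to make the output order deterministic
        reached.extend(deliver(tables, nxt, flow_id))
    return reached


tables: Tables = {
    "S": {9: {"A"}},
    "A": {9: {"R1", "R2"}},  # the fork: one packet in, two copies out
    "R1": {9: set()},
    "R2": {9: set()},
}
print(deliver(tables, "S", 9))  # ['R1', 'R2']
```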

Like route calculation, computing the multicast tree is left to local implementation. With better technology, you can replace the code in one place without disturbing the rest of the net.

The multicast sub-system needs a way to learn of the end-points. This certainly requires a widespread protocol on which everyone involved has to agree, but only within a single multicast communication. Each multicast system could use a different protocol here, again allowing for incremental upgrading of the technology.

Security

All of this communication between sub-systems should at least have the capability of being secure with authentication being the primary concern.

At the forwarding level, protection against internetwork level denial of service attacks requires authenticated packet classification so each packet needs authentication information.
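One way to sketch authenticated classification, assuming each flow's source shares a symmetric key with the forwarder at flow setup, is a per-packet HMAC computed with Python's standard `hmac` module. This is a swapped-in mechanism for illustration, not necessarily the one Perlman proposes. A packet that cannot prove membership in a flow is not classified into it and so cannot consume that flow's reserved resources.

```python
import hashlib
import hmac
from typing import Dict, Optional

# Keys a forwarder shares with each flow's source, installed at flow setup (assumption).
flow_keys: Dict[int, bytes] = {5: b"key-installed-at-setup"}


def tag(key: bytes, flow_id: int, payload: bytes) -> bytes:
    """Authentication tag covering the flow id and the payload."""
    return hmac.new(key, flow_id.to_bytes(4, "big") + payload, hashlib.sha256).digest()


def classify(flow_id: int, payload: bytes, mac: bytes) -> Optional[int]:
    """Return the flow a packet may be treated as, or None if it cannot prove membership."""
    key = flow_keys.get(flow_id)
    if key is None or not hmac.compare_digest(tag(key, flow_id, payload), mac):
        return None  # unknown flow, forged, or damaged: no reserved resources
    return flow_id


good = tag(flow_keys[5], 5, b"data")
print(classify(5, b"data", good))       # 5
print(classify(5, b"data", bytes(32)))  # None
```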

The security sub-system is concerned with the maintenance and operation of these security pieces.

Further Work

This paper points out a direction of the internetwork layer; it is not a detailed specification. A huge amount of engineering work remains. How is the locator hierarchy laid out so it can expand with the network? We need the details of the protocols whereby the various sub-systems communicate. Where do we draw the line between state that goes into the net and state we keep in the packets? Given the ability to acquire network maps how do we generate routes from those maps and what algorithms are good for figuring out just which maps we should get? How do we best build a multicast tree? How do we measure routing inefficiency so we know when it is time to renumber some part of the net?

We need to answer some of these questions up front. Fortunately, some of them are left to local implementation, so we can field something that works now and replace parts as we learn better.

Conclusion

This paper lays out an architecture for an internetworking layer. It is capable of meeting the requirements of the service model, and it addresses the architectural issues in a way that should scale to whatever size internet we build. It may not be the only architecture that meets the requirements, but any other possibility must answer the same questions.