On the Design of an Internetwork Layer
or Replacing the Internet Protocol
Introduction
This paper examines the design of an internetwork layer with
sufficient scope and flexibility to handle the needs of the growing
global Internet. In particular, we design an internetwork layer that
could replace the current Internet Protocol and would remain viable as
the Internet's internetwork layer for the foreseeable future.
We start with the service model we expect from an internetwork layer,
add some high level architectural issues for achieving a network that
lasts, discuss some results that seem inevitable, and finally propose
the sub-systems needed to make it all work.
Service Model
The basic service provided by an internetwork protocol layer is that
of a datagram service between any and all end-points that may wish to
communicate. Obviously it must have naming and addressing mechanisms
suitable for the number of end-points envisioned.
More than that, we are now seeing requirements on the net that it be
capable of providing more than just delivery of the packets;
end-points wish more control over how their packets are
delivered. These requirements are usually grouped together under the
term Quality of Service and include policy routing and
resource reservation. The internetwork layer needs to provide these
capabilities. Ideally the policy decisions are made by a combination
of policies from the source and destination.
Security is the biggest open question in the design of an internetwork
layer. Radia Perlman's thesis provides guidance as to how we can
protect, at the network layer, against denial of service and theft of
service attacks. The question is: which security services should the
internetworking layer supply?
Multicast is also a requirement. I'm not going to say much more than
that, not because I think multicast is unimportant or want to slight
it, but because I haven't thought about it much and will leave that
work to others who have. Nonetheless, keep multicast in mind for
everything that follows.
Finally, the internetwork layer has to be able to forward the packets.
Networks are getting faster and will continue to do so. The
internetwork layer must keep up while also providing the quality of
service and security capabilities discussed above. For example, if
providing the requested quality of service requires packet
classification, the internetwork layer should provide the means for
an efficient implementation of classification.
Architectural Issues
In this section we cover a few architectural issues for an
internetwork layer. In part these are goodness and light sorts of
things, but they apply specifically to the internetwork layer we're
building.
Names, Addresses, and Routes
In routing we repeatedly work with three entities: names,
addresses, and routes. In short, a name specifies
the object it names, an address tells you where the object is, and a
route tells you how to get there. An IP address is, properly, an
address. It is frequently used, however, as a name. Also, with years
of usage people have come to expect anything that is called an
address to act like an IP address. Therefore, we've abandoned the
term address in favor of locator, leaving
address to just mean IP addresses.
names
Names give a label for something. Generally they are unique within
some context. A name for me is "David Bridgham", unique within my
group of friends. To guarantee unique names for computers we invent
computer friendly names like social security numbers. We sometimes
refer to these as identifiers, but an identifier is still a name. Names
tell you nothing about where the object is or how to get to it. They
do have some internal structure; my name contains a family part and my
social security number tells you what region I lived in when I was
registered. This structure is useful primarily to whoever is handing
out the names.
locators
My locator is Reading, Massachusetts, USA, Terra. Actually, that is my
home. Right now I'm mobile with a locator of Whistler, British
Columbia, Canada, Terra. These locators don't tell you who I am. Next
week, the Whistler locator will get you to someone else. They also do
not tell you how to get to me. That is, the locator neither names me
nor is a route to me. The locator does give you the information to go
find the route though, by direct lookup in a routing table system or
by naming points on a map from which you can compute a route in a map
based system. My locator changes as I move though my name remains the
same.
You may note that a locator is a name too. It names the location of
the object instead of the object itself.
routes
A route is how to get from somewhere to somewhere. It depends on
where you are and where you want to go. A whole host of other issues
must also be considered such as policy and resource availability.
Given source and destination locators you need to be able to discover
some number of routes between here and there amongst which you select
according to your policy or other considerations.
selectors
Another new term is selector. The selector consists of those parts of
the packet a packet forwarder uses to figure out how it is going to
forward this packet. The key aspect here is that the selector must
provide enough granularity to distinguish packets that need different
handling.
do we need all these?
It is not necessarily the case that an internetwork layer must have
all of names, locators, routes, and selectors explicitly (though I
believe it should). However, any network layer protocol needs each of
these functions. Look to see how a given protocol performs each one;
if it is not explicit, then either some header field is doing double
duty or it is derived from fields that are there. Ask how that will
cause problems.
IP uses its addresses as locators, identifiers, and selectors. The
primary use of the address is as a locator. The address belongs to an
interface and names where that interface is connected to the network.
When you move, your address changes. But the address is used as an
identifier as well; it's wired into TCP's de-multiplexing and checksum
calculation, so if the address changes mid-connection, the TCP
connection is lost. The DNS name of a machine doesn't change as it
moves around the network but its identifier (as embodied in IP by its
IP address) does.
NOTE: Of course, some people argue that the
DNS name should also change when a host moves. They're
really confused.
IP uses the destination IP address as its primary selector. Some
routers also use the TOS field to provide a limited quality of service
control over routing but since the TOS field has never come into
widespread use in the Internet some routers also look for well known
TCP or UDP ports for that information. With a requirement for
per-connection quality of service, IPng will need better granularity
in its selectors.
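
To make the granularity point concrete, here is a minimal sketch in
Python. The Packet fields and the two selector functions are my own
illustrative assumptions, not part of any protocol; the point is only
that a destination-only selector cannot tell apart two connections
that need different handling, while a finer selector can.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical header fields, chosen only for illustration.
    src: str
    dst: str
    protocol: str
    src_port: int
    dst_port: int


def coarse_selector(p: Packet) -> tuple:
    """Destination-only selection, as in plain IP forwarding."""
    return (p.dst,)


def fine_selector(p: Packet) -> tuple:
    """Per-connection selection: enough fields to separate connections."""
    return (p.src, p.dst, p.protocol, p.src_port, p.dst_port)


# Two connections between the same pair of hosts with different needs.
bulk = Packet("10.0.0.1", "10.9.9.9", "tcp", 40000, 20)
interactive = Packet("10.0.0.1", "10.9.9.9", "tcp", 40001, 23)

same_under_coarse = coarse_selector(bulk) == coarse_selector(interactive)
same_under_fine = fine_selector(bulk) == fine_selector(interactive)
```

Under the coarse selector both packets fall into the same class, so a
forwarder has no basis for treating them differently.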
For destinations with multiple IP addresses the source is left to
choose which destination address to use. Those different addresses
may very well select different routes (while addresses are not routes
they are used to find routes) but no routing information is available
to the source that would enable it to make a reasonable decision.
Can't Change it All at Once
The future Internet will be too large to change anything, everywhere
at once. This is a simple statement but with many
implications.
NOTE: One of the key points of special relativity is that
across any significant distance there is no such thing as
simultaneous. The Internet has significant distances.
Overall it implies that we should remove anything possible from the
core internetwork layer in such a way that it is left to local
implementation rather than distributed. Anything that is left to
local implementation may be replaced locally as better technology
comes along. More importantly, it can be replaced incrementally
throughout the Internet.
Routing and Scaling
A major challenge to a replacement for IP is that of solving the
routing problem in a way that scales to the size of the net we want to
be able to build. The issue of scaling a routing system is that of
information abstraction.
It is obvious that the future Internet will be too large for any one
router, much less all routers, to know how to route to everywhere.
They need a way to work when they know less than the entire routing
state of the network. The question is how to do this.
One answer is what we do today in IP; information is pruned strictly
by the addressing hierarchy. CIDR increases the number of levels in
the addressing hierarchy to decrease the size of each level
(especially the top one). Since this scheme only routes packets along
the addressing hierarchy it constrains the network topology to also
follow the addressing hierarchy. This is a limitation we should not
place on the future Internet just yet.
Another important characteristic of the current IP approach to
information abstraction in routing is that it is all precomputed in a
distributed manner. The routing system figures out which routes are
possible before it has need of any of those routes. With policy,
these pre-computed routes include policy variations. Obviously this
limits the range of possible policies to what the network as a whole
knows how to compute.
The obvious alternative to precomputed routes is one where a router
acquires routing information on demand. In this way the information
it acquires is relevant to the router's needs and the router can
process the information with respect to real policy requirements
rather than imagined ones.
Results
We take the service model along with the architectural principles
listed above and we get the results of this section. I certainly
can't claim these are the only possible answers that satisfy the
constraints, but in some cases the conclusions do seem inevitable.
Source Selected Routes
Quality of service, scaling, and a preference for local instead of
distributed systems imply that source selected routes are preferable
to hop-by-hop routing. It doesn't necessarily have to be the source
that picks the route. The important thing is that it is done in one
place rather than distributed across the network. But we'll continue
calling it source routing for now.
With the route chosen in one place, the routing algorithms do not have
to be consistent networkwide to prevent routing loops. New routing
algorithms can be fielded experimentally or incrementally replaced as
better technology is created and the range of possible policy
considerations can change with time. Actually, the route selection
can be distributed recursively (consider the routing between specified
nodes in IP loose source routing) and still gain these benefits.
It is unrealistic to expect a network wide routing system to compute
all the possible routes taking all the possible policies into account.
This includes issues like provider selection. It is easy to envision
how to do provider selection through such hacks as multiple addresses
if there is only one layer of provider above each end site. It is
even conceivable that the routing protocol might compute those
different routes. Once you envision a larger network, with three or
four levels of provider above each site, it's quickly apparent that
this approach simply does not scale.
The traditional reason that the Internet prefers hop-by-hop routing is
that it more readily allows healing of network failures. Once quality
of service enters the picture however (either policy or resource
reservation), whoever is re-routing around the failure must understand
the quality of service requirements of the connection it is
re-routing. With distributed routing we have to distribute quality of
service requirements to any router that might have to re-route whereas
with source routing we have to send network failure information back
to the source. Of the two pieces of information, network failure is
not as likely to need extensions every few months so that is the
preferable information to standardize into a protocol that must be
globally consistent.
Source routing appears to be the choice that allows the flexibility
the net needs.
Flows
Source routes provide the necessary flexibility, but with a very large
network and locators that are variable length and long because of the
size of the network, a source route could be enormous. Including the
full source route with every packet would be grossly inefficient. If
an end-point is going to use this route for several packets then an
obvious optimization is to send the source route through the network
once to blaze a path and let the rest of the packets follow the path.
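
The optimization can be sketched as follows. This is only an
illustration under my own assumptions: the Forwarder class, the flow
identifier, and the node names are hypothetical, and a real flow setup
would also carry handling instructions and deal with failures.

```python
class Forwarder:
    """A forwarder that keeps per-flow state: flow id -> next hop."""

    def __init__(self, name):
        self.name = name
        self.flow_table = {}

    def install(self, flow_id, next_hop):
        self.flow_table[flow_id] = next_hop

    def next_hop(self, flow_id):
        return self.flow_table[flow_id]


def blaze_path(forwarders, flow_id, source_route):
    """Send the full source route through the network once."""
    for here, there in zip(source_route, source_route[1:]):
        forwarders[here].install(flow_id, there)


def follow_path(forwarders, flow_id, start, destination):
    """Later packets carry only the flow id, not the route."""
    path = [start]
    node = start
    while node != destination:
        node = forwarders[node].next_hop(flow_id)
        path.append(node)
    return path


forwarders = {name: Forwarder(name) for name in "ABCD"}
blaze_path(forwarders, flow_id=7, source_route=["A", "B", "C", "D"])
taken = follow_path(forwarders, flow_id=7, start="A", destination="D")
```

The cost of the long route is paid once at setup; each later packet
needs only a short flow identifier, at the price of state in the
forwarders.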
Maps
For the source route calculator to do its job, it must be told the
source and destination of the route, it needs the quality of service
requirements to apply (for both source and destination), and it needs
information about the network's topology and quality of service
capabilities. Since it is not feasible for any router to know the
entire network's topology and quality of service capabilities, it must
have some way of acquiring only the information it needs. As
discussed above in the section on routing and scaling, there is either
the ``precomputed'' approach or the ``go out and get it'' approach.
Since quality of service requirements will create a vast number of
possible answers, precomputing is not feasible. Therefore, we need a
way for the route calculator to go out and ask the network for its
topology and quality of service information. In other words, it will
get maps of the network.
Since these maps contain quality of service information which ranges
from concrete information like the bandwidth and latency
characteristics of a network to very nebulous concepts like the NSFNet
Appropriate Use Policy, the map specification must be extensible and
flexible. Since the map specification only transfers the information,
it doesn't have to understand it. The usual specification care can
give us a map specification that will name any future characteristics
we care to add.
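
As a rough sketch of how a route calculator might use such a map, the
Python below keeps the map as links tagged with arbitrary attributes
and lets the caller supply its own acceptability test. Everything here
is an assumption for illustration: the attribute names, treating links
as bidirectional, and choosing the lowest-latency route with a
shortest-path search. The paper does not prescribe any of these.

```python
import heapq

# A map is just links tagged with open-ended attributes. The map format
# carries the tags; only the route calculator has to understand them.
network_map = {
    ("A", "B"): {"latency_ms": 10, "bandwidth_mbps": 100},
    ("B", "D"): {"latency_ms": 10, "bandwidth_mbps": 10,
                 "usage_policy": "research-only"},
    ("A", "C"): {"latency_ms": 15, "bandwidth_mbps": 100},
    ("C", "D"): {"latency_ms": 15, "bandwidth_mbps": 100},
}


def calculate_route(net_map, src, dst, acceptable):
    """Lowest-latency route using only links the local policy accepts."""
    neighbors = {}
    for (u, v), attrs in net_map.items():
        if acceptable(attrs):
            neighbors.setdefault(u, []).append((v, attrs["latency_ms"]))
            neighbors.setdefault(v, []).append((u, attrs["latency_ms"]))
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, latency in neighbors.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + latency, nxt, path + [nxt]))
    return None  # no route satisfies the policy


any_route = calculate_route(network_map, "A", "D", lambda attrs: True)
fast_route = calculate_route(
    network_map, "A", "D", lambda attrs: attrs["bandwidth_mbps"] >= 50)
```

Because the policy is just a local function over the tags, a new kind
of tag or a new policy needs no change to the map format or to any
other router.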
Renumbering
As the network grows, it seems quite likely that end-points will need
to be renumbered. As the network changes and grows, the locator
hierarchy will become less optimal. That is, to get good routes,
routing information will have to be propagated further up the locator
tree. To reduce this spread of routing information you find a new
locator hierarchy; you renumber some part of the network.
Since the new locator hierarchy could affect large portions of the
net, even the entire net, and it will not be possible to notify the
entire network simultaneously of this change, the network must
continue to work while the renumbering is in progress and only some of
the nodes to be renumbered have been notified. I'd suggest that this
implies that any such renumbering must be automatic and hopefully
invisible but I'm not completely sure of that.
One implication that I am sure of is that this means you really do not
want to use locators as end-point identifiers. The end-point to which
you are speaking is still there even if its locator changes.
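
A small sketch of why this matters, with hypothetical names
throughout (LocatorDirectory, Connection, and the locator strings are
all made up for illustration): connection state is keyed by the
end-point identifier, and the locator is looked up only when a packet
is sent, so renumbering does not disturb the connection.

```python
class LocatorDirectory:
    """Maps end-point identifiers to their current locators."""

    def __init__(self):
        self._locators = {}

    def assign(self, endpoint_id, locator):
        self._locators[endpoint_id] = locator

    def lookup(self, endpoint_id):
        return self._locators[endpoint_id]


class Connection:
    """Connection state bound to the peer's identifier, not its locator."""

    def __init__(self, peer_id, directory):
        self.peer_id = peer_id
        self.directory = directory
        self.bytes_sent = 0

    def send(self, data):
        # The locator is resolved per packet, so it may change freely.
        locator = self.directory.lookup(self.peer_id)
        self.bytes_sent += len(data)
        return {"locator": locator, "endpoint_id": self.peer_id,
                "data": data}


directory = LocatorDirectory()
directory.assign("endpoint-42", "terra.usa.ma.reading")
conn = Connection("endpoint-42", directory)
before = conn.send(b"hello")

# The network is renumbered; the end-point itself has not changed.
directory.assign("endpoint-42", "terra.usa.newengland.ma.reading")
after = conn.send(b"again")
```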
Note that this sounds very similar to some mobile host requirements.
A proper solution here may help both.
Network State
Some engineering decisions that are still very much open have to do
with network state. Obviously this internetwork layer has a fair
amount of state spread around the net. Resource reservation puts
state in the forwarders; if flows are used as an optimization to
source routes, the flow data is network state; multicast systems have
their distribution trees. The maps with their network attributes and
the copies of those maps in the routers around the net are a large
amount of state.
Some state, service state, by its nature can not go in the packets.
Resource reservations and packet charging data are examples. Other
information, user state, could go in either.
The task for any internetwork layer is to identify which state goes in
the packets and which in the network, and for state that goes in the
network, how does it get there and how is it maintained? Should the
routers infer state from traffic they see or should state be installed
explicitly? A lot of these will be efficiency tradeoffs with the
decisions based on assumed network traffic patterns. In some cases we
may be able to allow for either option and find out after the net is
running which way the bulk of the traffic goes.
Sub-Systems
The internetwork layer is a large system that is to deliver the
service described in the Service Model section. In proper top-down
design we now break the internetwork system into sub-systems to carry
out all
the tasks necessary. The important point to understand is what
services each sub-system needs from the other sub-systems and what
services they provide. These sub-systems are: forwarding, route
calculation, route distribution, resource reservation, flow setup,
multicast, and security.
Forwarding and Route Calculation
In IP, the forwarding and route calculation sub-systems are intimately
tied together. High performance routers put a cache in front of the
route calculation so it does not really happen each time, but they are
trying to emulate doing a routing calculation for each packet
forwarded. In a world of policy routing and resource reservation, it
no longer works as well to tie forwarding and route calculation
together in this way. By separating the route calculation and the
forwarding sub-systems, we free the forwarding to run with less
overhead and we allow the possibility that the route calculation could
run elsewhere. Forwarding, by its nature, is distributed. Route
calculation has been distributed but distributing enough policy
information to make policy routing decisions is quite an interesting
problem. By removing route calculation from the forwarders, we have
put it under local control. It no longer has to be standardized.
Route Distribution
IP left the route distribution mechanism out of the internetwork
layer. Routing protocols, to be developed later, would inform the
forwarding/routing sub-system about what they learned. Since one of
the driving forces behind developing a replacement for IP is the need
for an internetwork layer that will scale beyond IP, we need to make
sure that the route distribution sub-system is up to the task. The
main client of the route distribution sub-system is the route
calculation sub-system. It wants network maps with quality of service
information, so that is the offered service of the route distribution
sub-system. This part needs to be standardized across the network.
How the route distribution sub-system learns its map to distribute,
how it knows what tags to put on which nodes, how it abstracts
information from nodes lower down in the tree, these are all left to
local decision. The protocols and algorithms that implement these
parts of the network mapping system need not be standardized
networkwide and so can be incrementally enhanced as we learn better
ways to do them.
NOTE: Another piece here is policy distribution.
In Nimrod we've put the route calculator at the source. But if we're
going to allow destination policy too it needs a way to learn about
the destination's policy requirements. More tags in the map? Maybe
just a tag that says ``here be destination policies'' and you can
invoke the destination policy protocol to learn about them? I like
the idea of just putting destination policies in the map.
Flow Setup and Resource Reservation
The flow setup sub-system takes the route from the route calculation
sub-system and talks to each of the forwarding sub-systems, informing
them of the flow to come, how to identify which packets belong to the
flow, and how to handle the packets. Flow setup needs a back path to
route calculation to ask for a new route if the flow setup fails at
some point. The same holds for the resource reservation sub-system.
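
The back path might look something like the sketch below. It is an
assumption-laden illustration, not a protocol: reserve_along,
establish_flow, the per-node capacity table, and the stand-in route
calculator with its fixed candidate list are all hypothetical.

```python
def reserve_along(route, capacity, demand):
    """Try to set up the flow hop by hop.

    Returns None on success, or the hop that refused. On failure any
    reservations already made for this attempt are released.
    """
    reserved = []
    for hop in route:
        if capacity[hop] < demand:
            for done in reserved:
                capacity[done] += demand
            return hop
        capacity[hop] -= demand
        reserved.append(hop)
    return None


def establish_flow(route_calculator, capacity, demand):
    """Ask route calculation for new routes until setup succeeds."""
    excluded = set()
    while True:
        route = route_calculator(excluded)
        if route is None:
            return None  # route calculation has nothing left to offer
        failed_hop = reserve_along(route, capacity, demand)
        if failed_hop is None:
            return route
        excluded.add(failed_hop)  # the back path: report the failure


def route_calculator(excluded):
    """Stand-in for the real route calculation sub-system."""
    candidates = [["A", "B", "D"], ["A", "C", "D"]]
    for route in candidates:
        if not excluded & set(route):
            return route
    return None


capacity = {"A": 10, "B": 1, "C": 10, "D": 10}
route = establish_flow(route_calculator, capacity, demand=5)
```

Here the first route fails at B, the partial reservation at A is
released, and route calculation is asked again with B excluded.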
NOTE: Flow setup and resource reservation seem
awfully similar. Should they just be combined? I know RSVP is trying
to do one without the other but they might work better combined.
Multicast
The multicast sub-system takes network information from the route
distribution sub-system, calculates its multicast trees, and uses the
flow setup and resource reservation sub-systems to set up the
multicast trees. The one additional requirement of multicast on the
other sub-systems is that flows need the ability to have forks or
multiple destinations.
Like route calculation, computing the multicast tree is a local
implementation. With better technology, you can replace the code in
one place without disturbing the rest of the net.
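
One simple way to picture a flow with forks is sketched below. The
tree is built by merging unicast routes that share a prefix, which is
my own simplifying assumption (it presumes the merged routes really do
form a tree); the function names and node names are hypothetical.

```python
def build_tree(routes):
    """Merge routes from one source into a tree: node -> set of next hops."""
    tree = {}
    for route in routes:
        for here, there in zip(route, route[1:]):
            tree.setdefault(here, set()).add(there)
    return tree


def deliver(tree, node):
    """Follow every fork from node and return the end points reached."""
    children = tree.get(node, ())
    if not children:
        return {node}
    reached = set()
    for child in children:
        reached |= deliver(tree, child)
    return reached


routes = [
    ["S", "R1", "D1"],
    ["S", "R1", "R2", "D2"],
    ["S", "R1", "R2", "D3"],
]
tree = build_tree(routes)
destinations = deliver(tree, "S")
```

A forwarder whose entry holds more than one next hop (R1 and R2 here)
is exactly the fork that the flow setup sub-system has to be able to
install.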
The multicast sub-system needs a way to learn of the end points. This
is certainly a widespread protocol on which everyone involved has to
agree, but only within a single multicast communication. Each
multicast system could use a different protocol here, again allowing
for incremental upgrading of the technology.
Security
All of this communication between sub-systems should at least have the
capability of being secure with authentication being the primary
concern.
At the forwarding level, protection against internetwork level denial
of service attacks requires authenticated packet classification so
each packet needs authentication information.
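
As one possible shape for this, the sketch below has each packet carry
a keyed hash over its flow identifier and payload, which the forwarder
checks before giving the packet reserved treatment. The use of
HMAC-SHA256, a per-flow shared key installed at flow setup, and the
three outcome labels are all assumptions of mine; the paper does not
choose a mechanism.

```python
import hashlib
import hmac


def make_tag(key: bytes, flow_id: bytes, payload: bytes) -> bytes:
    """Authentication information the sender attaches to a packet."""
    # Length-prefix the flow id so (flow_id, payload) is unambiguous.
    message = len(flow_id).to_bytes(2, "big") + flow_id + payload
    return hmac.new(key, message, hashlib.sha256).digest()


def classify(flow_keys, flow_id: bytes, payload: bytes, tag: bytes) -> str:
    """Classify a packet, trusting the flow id only if the tag checks."""
    key = flow_keys.get(flow_id)
    if key is None:
        return "best-effort"  # no state for this flow
    if hmac.compare_digest(make_tag(key, flow_id, payload), tag):
        return "reserved"  # genuine member of the flow
    return "drop"  # claims a flow it cannot authenticate for


# Hypothetical per-flow key, assumed to be installed at flow setup.
flow_keys = {b"flow-7": b"key installed at flow setup"}
good_tag = make_tag(flow_keys[b"flow-7"], b"flow-7", b"hello")

genuine = classify(flow_keys, b"flow-7", b"hello", good_tag)
forged = classify(flow_keys, b"flow-7", b"hello", bytes(32))
unknown = classify(flow_keys, b"flow-9", b"hello", good_tag)
```

Without the check, anyone who can guess a flow identifier can consume
that flow's reserved resources, which is the theft or denial of
service the forwarding level has to guard against.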
The security sub-system is concerned with the maintenance and
operation of these security pieces.
Further Work
This paper points out a direction for the internetwork layer; it is not
a detailed specification. A huge amount of engineering work remains.
How is the locator hierarchy laid out so it can expand with the
network? We need the details of the protocols whereby the various
sub-systems communicate. Where do we draw the line between state that
goes into the net and state we keep in the packets? Given the ability
to acquire network maps how do we generate routes from those maps and
what algorithms are good for figuring out just which maps we should
get? How do we best build a multicast tree? How do we measure
routing inefficiency so we know when it is time to renumber some part
of the net?
We need to answer some of these questions up front. Some of them,
fortunately, are left to local implementation so we can field
something that works now and replace parts as we learn better.
Conclusion
This paper lays out an architecture for an internetworking layer. It
is capable of meeting the requirements of the service model and it has
aspects for dealing with the architectural issues such that it should
scale to whatever size internet we build. It may not be the only
architecture that answers the requirements but any other possibility
must answer the same questions.