Neither network state – like what TCP gives you – nor stateful middle-boxes – like advanced firewalls or load balancers – are necessary when you have IPv6 networks. This is a significant departure from the IPv4 world where both are a necessary, even fundamental, component of designing a safe, scalable, fast, and resilient network.
I acknowledge that both of those statements are provocative.
First, why is state fundamental to IPv4 networks? It amounts to two things: perimeter security and address scarcity. And perimeter security, in practice, boils down to address scarcity too:
- Because you don’t have enough addresses for every host on the LAN, you must NAT.
- Because you must NAT, response traffic must be delivered back to the host that originated the transport layer session, which means the NAT device has to keep a translation table: it must track transport layer state.
- Because you are tracking transport layer state, you have created an artificial transport layer choke-point at the LAN perimeter that offers an opportunity to centralize control of the transport layer port space using stateful packet filters, thus you have a firewall.
- Because you are controlling the transport layer port space centrally and you are NAT-ing the LAN, you are presenting a virtualized destination address for any Internet-accessible services in the data center or DMZ, which makes them address-bound instead of name-bound, so you need a horizontal-scaling method, thus you have a layer-4 load balancer.
- Because you are NAT-ing and load-balancing layer-4, you care about host failures behind the load balancer and how they affect user experience, so you add application layer host monitoring and re-load balancing on failure, thus you have an application delivery controller.
If you don’t NAT, you don’t have a strategic point of control for the transport layer port space; you don’t have a common destination address for Internet accessible services; you don’t have to scale that destination address horizontally; you don’t have to manage failures when forwarding traffic to a host with a private IP address. And if you still try to do all of those things, you create completely unnecessary and unjustifiable complication in your network.
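To see why the state is unavoidable once you NAT, here is a toy model (purely illustrative, using an RFC 5737 documentation address, not any real implementation) of the translation table a NAT device must keep alive for replies to be deliverable:

```python
# Toy model of the per-flow state a NAT device is forced to keep.
# Illustrative only: real NATs also track TCP state machines, timers, etc.
from typing import Dict, Tuple

PUBLIC_IP = "203.0.113.1"   # the one scarce public address (RFC 5737 range)

Flow = Tuple[str, int]      # (ip, port)

class ToyNat:
    def __init__(self) -> None:
        self._next_port = 40000
        self._out: Dict[Flow, Flow] = {}  # (private ip, port) -> (public ip, port)
        self._in: Dict[Flow, Flow] = {}   # reverse mapping for response traffic

    def outbound(self, src: Flow) -> Flow:
        """Translate a private source to the shared public address."""
        if src not in self._out:
            public = (PUBLIC_IP, self._next_port)
            self._next_port += 1
            self._out[src] = public
            self._in[public] = src        # state that MUST survive for the reply
        return self._out[src]

    def inbound(self, dst: Flow) -> Flow:
        """A reply is undeliverable unless this state still exists."""
        return self._in[dst]              # KeyError == dropped packet

nat = ToyNat()
pub = nat.outbound(("10.0.0.7", 51514))
print(pub, "->", nat.inbound(pub))   # replies only work via the stored state
```

Everything else in the chain above (the firewall, the load balancer, the ADC) attaches itself to that table.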
How does IPv6 change this?
The most obvious way is by eliminating address scarcity from the problem space. A /64 network prefix has 18 quintillion addresses (2^64 of them); you are not going to run out. The second way is that with 18 quintillion hexadecimal strings, you will be using DNS to manage them; this means you have an established traffic-management tool that sits earlier in the sequence of events than a load-balancer or firewall and can scale better than any load-balancer, firewall, or application delivery controller ever will. The third way is what happens when you give your elastic compute and connectivity infrastructure functionally infinite addresses: combine that with DNS and identity-based access controls (i.e., logins) and you can dynamically create an IP destination per user.
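As a rough illustration of both the scale and the per-user allocation, here is a stdlib-only Python sketch; the prefix is the RFC 3849 documentation range and the hostname scheme is entirely made up:

```python
# Sketch: carving one address per user out of a /64 (names are hypothetical).
import hashlib
import ipaddress

prefix = ipaddress.ip_network("2001:db8:42:1::/64")   # documentation prefix
print(prefix.num_addresses)   # 18446744073709551616 -- the "18 quintillion"

def address_for(user: str) -> ipaddress.IPv6Address:
    """Derive a stable, unique interface identifier from a user name."""
    iid = int.from_bytes(hashlib.sha256(user.encode()).digest()[:8], "big")
    return prefix[iid]        # index into the /64; collision odds are ~2^-64

addr = address_for("alice@example.com")
print(addr)
# The AAAA record you would publish (e.g., via an RFC 2136 dynamic update):
print(f"alice.service.example.com. 30 IN AAAA {addr}")
```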
When you know that only one user is going to be using the service, you eliminate a whole segment of the architectural and security concerns inherent in the design of Internet accessible services. You no longer need to make decisions about where to send a packet that involve the transport layer (or higher). You have removed state from the network by abandoning the transport layer and tracking state at the session layer in the background, away from the network forwarding plane, by equating user login with network session. That is what giving elastic infrastructure a functionally infinite global address space gives you: freedom from NAT, from cookies, from packet inspection, from virtual servers, from middle boxes, from stateful devices, and from failure domains for services that affect more than one user.
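A minimal sketch of what that login-equals-network-session control loop could look like; the `SessionController`, hostnames, and addresses are all hypothetical, and it prints the DNS and filter changes it would push rather than calling any real API:

```python
# Sketch of "user login == network session" (all names hypothetical).
# On login: allocate a per-user destination, publish it in DNS, open the filter.
# On logout: withdraw the record and the rule. No forwarding-plane state needed.
import ipaddress

prefix = ipaddress.ip_network("2001:db8:42:1::/64")

class SessionController:
    def __init__(self) -> None:
        self.sessions = {}   # user -> (client source addr, service dest addr)
        self.counter = 1

    def login(self, user: str, client_src: str) -> str:
        dest = str(prefix[self.counter]); self.counter += 1
        self.sessions[user] = (client_src, dest)
        # In a real deployment these would be a dynamic DNS update and a
        # firewall API call; here we only show the intent.
        print(f"DNS:    {user}.svc.example.com AAAA {dest} (TTL 30)")
        print(f"FILTER: allow {client_src} -> {dest} ; default deny")
        return dest

    def logout(self, user: str) -> None:
        src, dest = self.sessions.pop(user)
        print(f"DNS:    delete {user}.svc.example.com")
        print(f"FILTER: remove {src} -> {dest}")

ctl = SessionController()
ctl.login("alice", "2001:db8:f00d::17")
ctl.logout("alice")
```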
When you move the complexity of delivering services from the forwarding plane to the configuration plane, lots of things are possible – the SDN hype is built upon that – but without removing state and increasing the address space, there isn’t any actual complexity savings (in fact, the complexity probably goes up). Vendors are counting on making money from that complexity.
But what about Security? (think of the children!)
[Full disclosure: I think that most security products are snake oil and that security vendors are selling quack medicine to scared but ignorant and/or powerless people who are afraid for their jobs. It is the worst kind of greedy, abusive vendor FUD to create and perpetuate a market that exists mostly because of incompetence, sacred cows, and wizard worship (or the acts of the very people who are selling the protection; White Hats who used to be Black Hats have a special place in Hell). That doesn’t mean that I don’t think about security, or that there aren’t cyber bad guys, or that being smart and prudent isn’t necessary, just that I have no respect for those who take advantage of the weak or stupid.]
Do you need a well-configured peer edge router that blocks traffic, in both directions, that isn’t supposed to be there? Absolutely. Should you expect your ISP and your peers to implement BCP38 and BCP84? Yes, but you should be prepared to deal with them not doing it. And as for IP address spoofing attacks against your infrastructure, you’ve already done the best thing you can to de-fang them: you have disaggregated your vulnerabilities.
When all the traffic goes to a single IP address, an attacker can focus their attack on that one address, and if they succeed, everyone suffers. But when you hand out a different IP address to every user and your filters adjust dynamically to per-user sessions, it becomes very difficult to conduct a brute force attack against your application. You know that the DNS request came from a real client (because you are using TCP, not UDP), and you know that user authentication succeeded from a particular source address, so only that source address is permitted past the filter, and only to the expected destination. The result is a perimeter defense that is simultaneously more aggressive and more nuanced.
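The admission check itself collapses to almost nothing; a sketch, with a made-up session table (real enforcement would live in the packet filter, not in Python):

```python
# Minimal sketch of the per-session admission check described above.
sessions = {("2001:db8:f00d::17", "2001:db8:42:1::9")}   # (src, dst) pairs

def permit(src: str, dst: str) -> bool:
    """Default deny: only an authenticated (src, dst) pair gets through."""
    return (src, dst) in sessions

print(permit("2001:db8:f00d::17", "2001:db8:42:1::9"))   # True: the real user
print(permit("2001:db8:bad::1", "2001:db8:42:1::9"))     # False: wrong source
```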
Likewise, advanced attacks against your services now must begin with identity theft, and we are actually pretty good at keeping those kinds of attacks at bay through the use of time-based one-time passwords and multi-factor, multi-vector authentication. And when the compromise of a password only results in the compromise of a single shard of an application, one that has been personalized for a user but is compartmentalized away from everything else, the totality of the effect is much, much smaller. Tracking the behavior of a client against an application when there is only one client being tracked at a time is so much easier than doing it for millions of concurrent users (as we do now) that Web application firewalls can become even better at transactional policy enforcement and the prevention of sophisticated attacks against the application infrastructure, without the expense of purpose-built security appliances or special-purpose software. WAF becomes a module in the Web server or the framework, one that can use developer-written unit tests and data-driven fuzzy logic and pattern matching for “always on” security, not a product on a vendor price list.
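For the curious, time-based one-time passwords are small enough to sketch in stdlib Python; this follows RFC 6238 with the usual SHA-1, 30-second, 6-digit defaults, and the secret is just an arbitrary base32 example:

```python
# Minimal RFC 6238 TOTP sketch (stdlib only): the "time based one time
# password" that makes identity theft a hard opening move for an attacker.
import base64, hmac, struct, time

def totp(secret_b32: str, period: int = 30, digits: int = 6) -> str:
    key = base64.b32decode(secret_b32)
    counter = int(time.time()) // period           # moving factor: time step
    mac = hmac.new(key, struct.pack(">Q", counter), "sha1").digest()
    offset = mac[-1] & 0x0F                        # dynamic truncation
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF)
    return str(code % 10 ** digits).zfill(digits)

print(totp("JBSWY3DPEHPK3PXP"))   # example secret; matches standard TOTP apps
```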
Disaggregating traffic by user is more secure (yay!) and it makes it easier to track and monetize the user (boo!), if that is your thing, but it is both of these things by design, meaning there are no post-design modifications trying to patch up a network or transport layer that was built to fail. That is a good thing (even if I wish that you would stop tracking and monetizing me).
What is keeping this from happening? (I want the world. I want the whole world.)
IPv6 adoption barriers are well documented. Beyond those, the things keeping this particular vision from fruition:
- DNS clients and caches that hold onto answers after they have expired.
- DNS servers that can’t be updated fast enough.
- Configuration plane infrastructure and services that don’t focus on rapid, dynamic change management and network-user-session-centric traffic management.
- Dynamic network services that don’t know about anything but the tiny sliver of the picture that matters to them. Given the way it is usually deployed, BGP is an example. Simply merging BGP with DNS to create a cache that weights responses in conformance with the BGP routing table would be a primitive step in the right direction (a sketch follows this list) that is nonetheless monumentally better than what we have today.
- Elastic Infrastructure stacks (like OpenStack and VMware) that have really bad networking – in fact the entire virtual networking world (SDN, NFV, cloud networking) is filled with abysmally bad ideas that just. won’t. die.
- Web servers and application frameworks that treat application layer security as a second (or third) class citizen.
- Vendors who continue to ship products where destination routing outperforms and outscales source routing.
- Data center architectures that use layer-2 broadcast domains instead of layer-3 routing for path finding. (It is time for STP to die.)
- Network end-points that aggregate traffic into absurdly large failure domains rather than breaking cores down to single-serving-size compute architectures. (HP Moonshot, Intel RackScale, and Facebook GroupHug are moving along this trajectory, but not far enough. Aereo is a better example.)
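And the primitive BGP-plus-DNS step mentioned above can be sketched in a few lines: given a set of AAAA answers and a routing table (both invented here, with made-up preference values standing in for BGP attributes), prefer the answers covered by the best routes:

```python
# Primitive sketch of "merge BGP with DNS": order AAAA answers by the
# preference of the longest-prefix route that covers each one.
import ipaddress

# prefix -> preference (lower is better); a stand-in for real BGP attributes
routes = {
    ipaddress.ip_network("2001:db8:1::/48"): 10,
    ipaddress.ip_network("2001:db8:2::/48"): 20,
    ipaddress.ip_network("::/0"): 100,       # default route: least preferred
}

def route_pref(addr_str: str) -> int:
    addr = ipaddress.ip_address(addr_str)
    covering = [n for n in routes if addr in n]
    best = max(covering, key=lambda n: n.prefixlen)   # longest-prefix match
    return routes[best]

answers = ["2001:db8:2::10", "2001:db8:1::10", "2001:db8:9::10"]
print(sorted(answers, key=route_pref))
# -> ['2001:db8:1::10', '2001:db8:2::10', '2001:db8:9::10']
```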
That is my view. I don’t get to say it enough because I’m paid not to, and not very many people who do hear it are actually listening.