Some recent coverage of OpenFlow and software-defined networking (SDN) in general has gotten me thinking about the relationship between SDN and server/desktop virtualization technologies like Xen and VMware. I'm far from the first person to think about this. Martin Casado, as close to the father of OpenFlow as anyone, has written about how networking doesn't and does need a VMware, but that's a somewhat different story from address or topology virtualization.
One of the core benefits being heralded for OpenFlow/SDN is that it can reduce the amount of code running on switches and routers, which is good for security, cost, performance and just about everything else. It came up in Urs Hoelzle's keynote on OpenFlow at Google at the Open Networking Summit, and it features heavily in other coverage, including this recent webinar on how OpenFlow/SDN changes forwarding.
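To make that claim concrete, here's a minimal sketch of the division of labor OpenFlow proposes. The classes and names below are invented for illustration, not taken from any real controller or switch: the switch keeps only a match/action table and punts unknown packets, while all of the decision-making code lives in the controller.

```python
# Hypothetical sketch of the OpenFlow split: dumb switch, smart controller.

class Controller:
    """All the smarts: policy, topology and path computation live here."""
    def __init__(self, routes):
        self.routes = routes  # dst -> output port, precomputed elsewhere

    def packet_in(self, switch, pkt):
        port = self.routes.get(pkt["dst"], "drop")
        # Install a flow entry so future packets are handled on the switch.
        # (A real controller would also send the original packet back out.)
        switch.flow_table[pkt["dst"]] = port
        print(f"controller installed flow: {pkt['dst']} -> {port}")

class Switch:
    """Dumb forwarding element: just a flow table and a lookup loop."""
    def __init__(self, controller):
        self.flow_table = {}
        self.controller = controller

    def handle_packet(self, pkt):
        action = self.flow_table.get(pkt["dst"])
        if action is None:
            # Table miss: punt the packet to the controller ("packet-in").
            self.controller.packet_in(self, pkt)
        else:
            print(f"switch forwards {pkt['dst']} via {action}")

ctrl = Controller({"10.0.0.2": "port-3"})
sw = Switch(ctrl)
sw.handle_packet({"dst": "10.0.0.2"})   # miss: goes to the controller
sw.handle_packet({"dst": "10.0.0.2"})   # hit: handled locally
```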
The same arguments came up when desktop/server virtualization first started. Hypervisors were going to be small things which you could actually reason about, and they wouldn't have nearly as many bugs or vulnerabilities as regular OSes. Finally, we were going to get the microkernels we always wanted.
That's not how it ended up though. Today, Xen and VMware ESX are both in the realm of 400,000 lines of code, and Microsoft's Hyper-V isn't far behind. (The numbers are borrowed from the NOVA paper at EuroSys 2010; I've stuck the relevant figure here.)
While a few hundred thousand lines of code is a far cry from the roughly 15 million in the Linux kernel today or the 50-plus million reportedly in Windows, it's probably not where people who argued that hypervisors were the final coming of microkernels wanted them to be. With new features being released all the time, the evidence suggests that hypervisors will follow the same course as OSes and grow nearly without bound in code size, just with a delayed start.
Similarly, I think the first take at implementing pure OpenFlow/SDN switches may well reduce the amount of code running on switches as functionality moves to controllers. However, this isn't quite what it looks like, for two reasons. First, unlike a hypervisor, which provides a clean layer that protects hardware resources from the OSes running on top of it, OpenFlow has no such advantage: it exposes the switch's resources completely to the controller, with no protection against a misbehaving controller.
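To illustrate the contrast, here's a rough, purely hypothetical sketch (none of these classes correspond to a real hypervisor or switch implementation): the hypervisor sits between guests and hardware and checks each request against what the guest owns, whereas an OpenFlow switch simply applies whatever flow modifications its controller sends.

```python
# Invented classes for illustration only: mediated hypervisor vs. trusting switch.

class Guest:
    def __init__(self, name, allowed_pages):
        self.name = name
        self.allowed_pages = allowed_pages

class Hypervisor:
    """Sits between guests and hardware, checking every request."""
    def map_memory(self, guest, page):
        if page not in guest.allowed_pages:
            raise PermissionError(f"{guest.name} may not touch page {hex(page)}")
        return f"mapped page {hex(page)} for {guest.name}"

class OpenFlowSwitch:
    """Trusts its controller entirely; there is no equivalent check."""
    def __init__(self):
        self.flow_table = {}

    def handle_flow_mod(self, flow_mod):
        # A buggy or malicious controller can rewrite or erase the whole
        # table, and the switch has no basis for refusing.
        if flow_mod["command"] == "delete-all":
            self.flow_table.clear()
        else:
            self.flow_table[flow_mod["match"]] = flow_mod["action"]

hv = Hypervisor()
print(hv.map_memory(Guest("vm1", {0x1000}), 0x1000))   # allowed; other pages raise

sw = OpenFlowSwitch()
sw.handle_flow_mod({"command": "add", "match": "10.0.0.0/8", "action": "drop"})
sw.handle_flow_mod({"command": "delete-all", "match": None, "action": None})
print(sw.flow_table)   # empty again, no questions asked
```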
Second, it seems likely to me that functionality is going to drift back to switches, assuming it ever really leaves them. My bet is that people will realize that some functionality, perhaps even most of it, makes sense to push to the controller, but there will always be things that are cheaper, faster and better to implement in the switches themselves. Latency-sensitive tasks that can't wait to hear from a controller, like quickly routing around failures, come to mind.
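As a rough illustration (the classes and latency numbers here are assumptions, not measurements), this is roughly what OpenFlow's own fast-failover groups, added in the 1.1 spec, are for: the switch keeps an ordered list of backup ports and flips to the next live one on its own, rather than paying a controller round trip for every failure.

```python
# Sketch of switch-local failover; latency figures are assumed, not measured.

CONTROLLER_RTT_MS = 10.0    # assumed: time to reach the controller and back
LOCAL_FAILOVER_MS = 0.05    # assumed: time to select the next live bucket

class FailoverGroup:
    """Switch-local, ordered list of (port, is_live) buckets."""
    def __init__(self, buckets):
        self.buckets = buckets

    def pick_port(self):
        # Walk the buckets in priority order and use the first live one;
        # this happens entirely in the data plane, no controller involved.
        for port, live in self.buckets:
            if live:
                return port
        return None  # nothing live: now we really do need the controller

group = FailoverGroup([("port-1", False), ("port-2", True)])  # primary just died
print("rerouting via", group.pick_port())
print(f"roughly {CONTROLLER_RTT_MS - LOCAL_FAILOVER_MS:.2f} ms saved per failover")
```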