MAC Address Aggregation and Translation as an Alternative to L2 Overlays

Tuesday, November 18, 2014 at 2:05 a.m. UTC by stretch

Not so long ago, if you wanted to build a data center network, it was perfectly feasible to place your layer three edge on the top-of-rack switches and address each rack as its own subnet. You could leverage ECMP for simple load-sharing across uplinks to the aggregation layer. This made for an extremely efficient, easily managed data center network.
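As a rough illustration of the ECMP load-sharing mentioned above, here's a minimal Python sketch of hash-based next-hop selection. The uplink names, the five-tuple layout, and the choice of CRC32 as the hash are all assumptions made for illustration; real switches do this in hardware with their own hash functions.

```python
import zlib

def pick_uplink(flow, uplinks):
    """Choose an uplink for a flow by hashing its five-tuple.

    Hashing keeps every packet of one flow on the same path (avoiding
    reordering) while spreading different flows across all uplinks.
    """
    key = "|".join(str(field) for field in flow).encode()
    return uplinks[zlib.crc32(key) % len(uplinks)]

# Hypothetical ToR switch with four equal-cost uplinks to the aggregation layer.
uplinks = ["uplink1", "uplink2", "uplink3", "uplink4"]

# (source IP, destination IP, protocol, source port, destination port)
flow = ("10.0.1.10", "10.0.2.20", "tcp", 49152, 443)
print(pick_uplink(flow, uplinks))
```

The same flow always maps to the same uplink, so the distribution is per flow rather than per packet.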

Then, server virtualization took off. Which was great, except now we had this requirement that a virtual machine might need to move from one rack to another. With our L3 edge resting at the top of the rack, this meant we'd need to re-address each VM as it was moved (which is apparently a big problem on the application side). So, now we have two options: We can either retract the L3 edge up a layer and have a giant L2 network spanning dozens of racks, or we could build a layer two overlay on top of our existing layer three infrastructure.

Most people opt for some form of the L2 overlay approach, because no one wants to maintain a flat L2 network with dozens or hundreds of thousands of end hosts, right? But why is that?

Continue reading 11 comments

PDU-12C

Friday, October 31, 2014 at 2:20 a.m. UTC by stretch

"After all, what's the best part of Halloween?" Jimmy pleaded over the phone. He was trying yet again to convince Tom to skip work for the night and head over to the party he was throwing. Tom and Jimmy were good friends, but Tom already knew how the conversation was going to end.

"I dunno, the candy?" Tom played dumb.

"No, the eye candy! I'm telling you bro, you don't want to miss it. Rachel will be there." Jimmy sang the last bit tauntingly.

"I told you," Tom countered. "I've got work." It was around 6pm now, and he was just pulling into the parking lot outside the data center where he planned to spend the night recabling several racks of equipment. The scariest part of his Halloween would be picking through years' worth of undressed patch cabling.

"I don't get why you have to do that shit at night anyway. Why can't you do it during the day when you're stuck at work anyway?" Jimmy prodded.

Tom parked across from the building's entrance and turned off his car. Other than a couple of vehicles belonging to the operations staff, the parking lot was deserted. He grabbed his tool bag from the passenger seat and headed toward the building's entrance.

Continue reading 11 comments

Cumulus Linux: First Impressions

Wednesday, October 1, 2014 at 2:05 a.m. UTC by stretch

Typically, when you buy a network router or switch, it comes bundled with some version of the manufacturer's operating system. Cisco routers come with IOS (or some derivative), Juniper routers come with Junos, and so on. But with the recent proliferation of merchant silicon, there seem to be fewer and fewer differences between competing devices under the hood. For instance, the Juniper QFX3500, the Cisco Nexus 3064, and the Arista 7050S are all powered by an off-the-shelf Broadcom chipset rather than custom ASICs developed in-house. Among such similar hardware platforms, the remaining differentiator is the software.

One company looking to benefit from this trend is Cumulus Networks. Cumulus does not produce or sell hardware, only a network operating system: Cumulus Linux. The Debian-based OS is built to run on whitebox hardware you can purchase from a number of partner original design manufacturers (ODMs). (Their hardware compatibility list includes a number of 10GE and 40GE switch models from different vendors.)

Cumulus Linux is, as the name implies, Linux. There is no "front end" CLI as on, for example, Arista platforms. Upon login you are presented with a Bash terminal and all the standard Linux utilities (plus a number of not-so-standard bits). Like any OS, Cumulus handles interactions with the underlying hardware and among processes.

cl_architecture.png

Continue reading 8 comments

Preliminary Book Topics

Wednesday, August 13, 2014 at 11:46 p.m. UTC by stretch

As I announced earlier this summer, I'm working on writing a book targeted to people entering the field of computer networking. I've got a fair amount of content fleshed out already, but figured it might help to get some feedback on the tentative structure. The book is being written in a question-and-answer style, organized into chapters by subject.

Below is the preliminary table of contents. It's still very much a work in progress, but I'm curious what people think of this approach. Constructive criticism and suggestions for additional content are welcome!

Continue reading 51 comments

Replacing an MPLS WAN with an Internet VPN Overlay

Monday, July 14, 2014 at 1:03 p.m. UTC by stretch

I received an email last week from a reader seeking advice on a fairly common predicament:

Our CIO has recently told us that he wants to get rid of MPLS because it is too costly and is leaning towards big internet lines running IPSEC VPNs to connect the whole of Africa.

As you can imagine, this has caused a huge debate between the networks team and management, we run high priority services such as Lync enterprise, SAP, video conferencing etc. and networks feel we need MPLS for guaranteed quality for these services but management feels the Internet is today stable enough to run just as good as MPLS.

What is your take on the MPLS vs Internet debate from a network engineer's point of view? And more so, would running those services over Internet work?

This is something I struggled with pretty frequently in a prior job working for a managed services provider. MPLS WANs are great because they provide flexible, private connectivity with guaranteed throughput. Most MPLS providers also allow you to choose from a menu of QoS schemes and classify your traffic so that real-time voice and video services are given higher preference during periods of congestion.

Unfortunately, MPLS WANs tend to be considerably more expensive than Internet circuits. A dedicated 3 Mbps MPLS circuit might cost three or four times as much as a 50 Mbps business class broadband Internet circuit. These numbers are hard to justify to management who may not appreciate the contexts of reliability and QoS controls. Since private connectivity can be achieved using a VPN overlay on top of plain Internet circuits, can we still justify the cost of MPLS WANs? Should we?
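To put those figures in per-megabit terms, here's a quick back-of-the-envelope calculation in Python. Only the ratios come from the example above; the baseline price of 100 for the Internet circuit is a made-up placeholder and cancels out of the result.

```python
def cost_per_mbps(monthly_cost, mbps):
    """Return the monthly cost of a circuit per Mbps of bandwidth."""
    return monthly_cost / mbps

# Placeholder price (assumption): only the ratios matter here.
internet_cost = 100.0   # 50 Mbps business class broadband
internet_mbps = 50
mpls_mbps = 3

for multiple in (3, 4):  # MPLS costs "three or four times as much"
    mpls_cost = internet_cost * multiple
    ratio = (cost_per_mbps(mpls_cost, mpls_mbps)
             / cost_per_mbps(internet_cost, internet_mbps))
    print(f"{multiple}x the price: MPLS costs about {ratio:.0f}x more per Mbps")
```

Under these assumptions the MPLS circuit works out to roughly 50 to 67 times the price per megabit, which is the gap management tends to fixate on.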

My advice would be to stick with the MPLS WAN if you can afford it. A VPN overlaid on top of Internet circuits might work most of the time, but when it doesn't perform adequately, you'll have little immediate recourse. Should you decide on moving to a VPN overlay, do so in phases: Keep the MPLS WAN around for a few months in case the overlay strategy doesn't work out. But if you find that your Internet circuits provide sufficient throughput so that congestion of real-time services never becomes a problem, maybe that's an acceptable solution.

28 comments

Beyond the Blog

Wednesday, June 25, 2014 at 2:00 a.m. UTC by stretch

I'm thinking about writing a book.

Obviously, there are a lot of networking books on the market today. Search for any mainstream certification on Amazon and you'll find titles from half a dozen publishers. The majority of these are oriented toward specific vendors (most commonly Cisco) and many parallel a given certification exam. These books are overall pretty great. Most of them.

There also exists a minority of books which cover topics outside of the vendor-driven mainstream, like Gary A. Donahue's Network Warrior published by O'Reilly, now in its second edition. I love this kind of independent title because its content isn't constrained to a particular mold. The author finds stuff he thinks is relevant and interesting, and he writes about it. This is the correct way to write a book.

But over the past few years it has become painfully evident to me that there are many areas of this field we simply don't talk about in print, at least not at the entry level where perhaps it would be most helpful. If you want a thirty-page lecture on subnetting or a terrible mnemonic for the OSI model, pick any CCNA book from the pile and you're good to go. But what if you've never set foot inside a data center and want to know what it's like? What if you're trying to decide between Cisco and Juniper for your first ever network deployment? What if you think change management means you're getting a new boss?

Continue reading 45 comments

PSA: Global IPv4 Routing Table Hits 500k Routes

Tuesday, May 6, 2014 at 10:01 p.m. UTC by stretch

Last week, the global IPv4 routing table surpassed the 500 thousand route benchmark, according to the CIDR Report. The graph below shows its progression since the early nineties:

plot.png

I last wrote about global IPv4 growth in August of 2009, when the table size was at a mere 300 thousand routes. While that benchmark was largely ceremonial, this one crosses a threshold which may be of grave concern for many.

As has been pointed out on the NANOG mailing list, we are quickly approaching the hard forwarding plane capacity limits which exist on several very popular platforms, namely the Cisco 7600/6500 and RSP720/Sup720. The default TCAM partitioning scheme of these platforms allows for a maximum of 512 thousand IPv4 routes.

If you accept full Internet routes anywhere on your network, you'll want to verify the maximum table sizes for those platforms. On the 6500/7600 platform, the current partitioning scheme can be inspected with show mls cef maximum-routes:

Router# show mls cef maximum-routes
FIB TCAM maximum routes :
=======================
Current :
---------
 IPv4 + MPLS         - 512k (default)
 IPv6 + IP Multicast - 256k (default)

The good news is that it's easy to repartition the default scheme (e.g. mls cef maximum-routes ip 768) to allow for more IPv4 space. Unfortunately, the change requires a reboot, which means taking the device out of production for a time.
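If you collect this output from many devices, a short script can flag the ones running close to the limit. The Python sketch below parses the sample output shown above; the regular expression is written against that sample only, the 500,000-route figure is illustrative, and it assumes "k" means 1,024 entries, which may not match how your platform counts.

```python
import re

SAMPLE_OUTPUT = """\
FIB TCAM maximum routes :
=======================
Current :
---------
 IPv4 + MPLS         - 512k (default)
 IPv6 + IP Multicast - 256k (default)
"""

def parse_max_routes(output):
    """Map each TCAM partition name to its maximum route count.

    Assumes "k" means 1,024 entries (an assumption, not confirmed here).
    """
    limits = {}
    for match in re.finditer(r"^\s*(.+?)\s+-\s+(\d+)k\b", output, re.MULTILINE):
        limits[match.group(1)] = int(match.group(2)) * 1024
    return limits

def headroom(limit, current_routes):
    """Return the fraction of the partition that is still free."""
    return (limit - current_routes) / limit

limits = parse_max_routes(SAMPLE_OUTPUT)
ipv4_limit = limits["IPv4 + MPLS"]
current = 500_000  # illustrative table size, roughly the milestone above
print(f"IPv4 limit: {ipv4_limit}, headroom: {headroom(ipv4_limit, current):.1%}")
```

A device reporting only a few percent of headroom is a candidate for repartitioning during the next maintenance window.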

Thanks to @nixgeek and the NANOG folks for inspiring this post!

7 comments

Deploying Datacenter MPLS/VPN on Junos

Tuesday, April 15, 2014 at 1:17 a.m. UTC by stretch

One of my recent projects has been deploying an MPLS/VPN architecture across a pair of smallish datacenters composed entirely of Juniper gear. While I'm no stranger to MPLS/VPN, I am still a bit green to Junos, so it was a good learning exercise. My previous articles covering MPLS/VPN on Cisco IOS have been fairly popular, so I figured it would be worthwhile to cover a similar implementation in the Juniper world.

For our datacenters, we decided to implement a simple spine and leaf topology with a pair of core routers functioning as IBGP route reflectors and a pair of layer three ToR switches in each server rack. The spine consists of four layer three switches which run only MPLS and OSPF; they do not participate in BGP.

mpls-ospf.png

This article assumes some basic familiarity with MPLS/VPN, so if you're new to the game, consider reading through these previous articles for some background before continuing:

Continue reading 12 comments

The Value of a Microsecond

Wednesday, April 2, 2014 at 12:36 a.m. UTC by stretch

While perusing vendor datasheets, have you ever questioned the inclusion of seemingly insignificant latency specifications? Take a look at Arista's line-up, for instance. Their 7500 series chassis lists a port-to-port latency of up to 13 microseconds (that's thirteen thousandths of a millisecond) whereas their "ultra-low latency" 7150 series switches provide sub-microsecond latency.

Arista_7150_series.png

But who cares? Both values can be roughly translated as "zero" for us wetware-powered humans. (For reference, 8,333 microseconds pass in the time it takes your shiny new 120 Hz HDTV to complete one screen refresh.) So, does anyone really care about such obscenely low latency?
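The unit conversions above are easy to check. Here's a quick sketch in Python; the 13-microsecond and 120 Hz figures come from the text, and the helper names are just for illustration.

```python
US_PER_SECOND = 1_000_000
US_PER_MS = 1_000

def refresh_interval_us(refresh_hz):
    """Return the duration of one screen refresh in microseconds."""
    return US_PER_SECOND / refresh_hz

def us_to_ms(microseconds):
    """Convert microseconds to milliseconds."""
    return microseconds / US_PER_MS

frame_us = refresh_interval_us(120)  # one refresh of a 120 Hz display
latency_us = 13                      # worst-case 7500 series port-to-port latency

print(f"One 120 Hz refresh: {frame_us:,.0f} microseconds")
print(f"13 microseconds = {us_to_ms(latency_us)} milliseconds")
print(f"Switch traversals per refresh: {frame_us / latency_us:.0f}")
```

Even at the slower 13-microsecond figure, a packet could cross the switch several hundred times in the span of a single screen refresh.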

For a certain few organizations involved in high-frequency stock trading, those shaved microseconds can add up to billions of dollars in profit. The New York Times recently published an article titled The Wolf Hunters of Wall Street by Michael Lewis, which reveals how banks have leveraged low network latency to manipulate stock prices in open markets. (Thanks to @priscillaoppy for the tip!)

The increments of time involved were absurdly small: In theory, the fastest travel time, from Katsuyama’s desk in Manhattan to the BATS exchange in Weehawken, N.J., was about two milliseconds, and the slowest, from Katsuyama’s desk to the Nasdaq exchange in Carteret, N.J., was around four milliseconds. In practice, the times could vary much more than that, depending on network traffic, static and glitches in the equipment between any two points. It takes 100 milliseconds to blink quickly — it was hard to believe that a fraction of a blink of an eye could have any real market consequences.

Continue reading 10 comments

Learn Python

Thursday, March 27, 2014 at 1:10 a.m. UTC by stretch

Around six years ago, I decided to start a website called packetlife.net. Maybe you've heard of it. Most people turn to a purpose-built content management system like WordPress or Drupal for such an endeavor, but I needed greater flexibility to achieve some of the projects I had in mind. This meant I needed to learn a programming language and write a good amount of the site's logic myself.

I already had some experience dabbling in PHP, but wasn't thrilled with it. I figured if I was going to learn a new language, it should be useful as a general purpose language and not just for building a web site. After a bit of research and deliberation, I chose Python (and the Django web framework).

The purpose of this post is to convince networkers with little to no experience writing code to learn Python. In the past I've encouraged fellow networkers to pick up any programming language, as it's more important to think like a programmer than it is to gain proficiency in a particular language. However, I've realized that many people get stuck on which language they want to learn, lose motivation, and end up not growing proficient in anything. So, I've started telling people to skip that first step and just learn Python.

Continue reading 15 comments

About Packet Life

PacketLife.net is the work of a network engineer named Jeremy Stretch. It began as a repository for Cisco certification study notes in 2008, but quickly grew into a popular community web site.

The site's goal is to offer free, quality technical education to networkers all over the world, regardless of skill level or background.

Blog Spotlight

FryGuy's Blog

Jeff Fry