Convergence Delays: SVI vs Routed Interface
By stretch | Monday, January 24, 2011 at 2:12 a.m. UTC
There are two general ways in which routed links can be implemented between Cisco multilayer switches. The first option is to designate a VLAN for the routed traffic and create a switched virtual interface (SVI) on either end of the link for that VLAN. An example configuration for this approach might look like this:
interface Vlan100
 ip address 10.0.0.1 255.255.255.0
!
interface FastEthernet0/1
 switchport trunk encapsulation dot1q
 switchport mode trunk
The other option is to configure the physical interfaces as routed interfaces, assigning each an IP address directly:
interface FastEthernet0/1
 no switchport
 ip address 10.0.0.1 255.255.255.0
Both are valid solutions. The latter provides a simpler configuration, while the former allows for a more flexible design. However, a recent discussion on networking-forum.com got me thinking about differences between the two approaches with respect to convergence time.
Slide 24 of the Multilayer Campus Architecture and Design Principles presentation by Cisco referenced in the discussion shows a direct routed link with a routing convergence delay of just ~8 msec. In contrast, the delay of a layer two link with an SVI is estimated to be between 150 and 200 msec. That's a pretty substantial difference, particularly for real-time traffic (e.g. VoIP and video). Let's see if we can replicate their results.
Testing will be performed on a Catalyst 3560-24, connected to a Catalyst 3550-24 at the far end. All but the testing interface will be administratively shut down. A link fault will be simulated by shutting the remote interface on the 3550.
debug ip routing will be enabled to record modifications to the routing table. As in the reference slide, delay will be calculated using log timestamps. To achieve the desired granularity, the msec argument has been appended to the timestamp service commands.
S1(config)# service timestamps log datetime msec
S1(config)# service timestamps debug datetime msec
We'll implement both configurations in turn and record the delay that occurs between a link failure and update of the routing table. We'll repeat each test three times and take the average delay for each.
SVI Testing
Below is the log output of the first run of the SVI test. The timestamps we're interested in are the line protocol state transition and the interface being removed from the routing table.
Jan 23 18:22:51.315: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/16, changed state to down
Jan 23 18:22:51.324: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan100, changed state to down
Jan 23 18:22:51.324: is_up: 0 state: 4 sub state: 1 line: 0
Jan 23 18:22:51.324: RT: interface Vlan100 removed from routing table
Jan 23 18:22:51.324: RT: del 10.0.0.0/24 via 0.0.0.0, connected metric [0/0]
Jan 23 18:22:51.324: RT: delete subnet route to 10.0.0.0/24
Jan 23 18:22:51.324: RT: delete network route to 10.0.0.0
Jan 23 18:22:52.330: %LINK-3-UPDOWN: Interface FastEthernet0/16, changed state to down
- 9 msec
- 8 msec
- 9 msec
Average delay: 9 msec
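Each per-run figure is simply the difference between two log timestamps. Here is a minimal Python sketch of that arithmetic, using two lines from the first SVI run. The function names, the marker strings, and the assumed year (2011, since IOS omits it here) are illustrative choices of mine, not part of the test procedure itself.

```python
from datetime import datetime
from statistics import mean

# Two lines copied from the first SVI run above.
SVI_LOG = """\
Jan 23 18:22:51.315: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/16, changed state to down
Jan 23 18:22:51.324: RT: interface Vlan100 removed from routing table
"""

def parse_timestamp(line, year=2011):
    """Parse the 'Jan 23 18:22:51.315' prefix produced by 'datetime msec' timestamps.

    The log carries no year, so one is assumed (hypothetical default: 2011).
    """
    stamp = line.split(": ", 1)[0]
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")

def delay_msec(log, start_marker, end_marker):
    """Milliseconds from the first line containing start_marker
    to the first line containing end_marker."""
    lines = log.splitlines()
    start = next(parse_timestamp(l) for l in lines if start_marker in l)
    end = next(parse_timestamp(l) for l in lines if end_marker in l)
    return round((end - start).total_seconds() * 1000)

if __name__ == "__main__":
    first_run = delay_msec(SVI_LOG, "%LINEPROTO-5-UPDOWN", "removed from routing table")
    print(first_run)                  # delay for the first run, in msec
    print(round(mean([9, 8, 9])))     # average of the three runs listed above
```

Keep in mind this only measures the gap between two log entries; it says nothing about when traffic actually stopped or resumed flowing.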
Routed Interface Testing
Again, below is the log output from the first run of the test.
Jan 23 18:28:57.671: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/17, changed state to down
Jan 23 18:28:58.669: %LINK-3-UPDOWN: Interface FastEthernet0/17, changed state to down
Jan 23 18:28:58.669: is_up: 0 state: 0 sub state: 1 line: 0
Jan 23 18:28:58.669: RT: interface FastEthernet0/17 removed from routing table
Jan 23 18:28:58.669: RT: del 10.0.1.0/24 via 0.0.0.0, connected metric [0/0]
Jan 23 18:28:58.669: RT: delete subnet route to 10.0.1.0/24
Jan 23 18:28:58.669: RT: delete network route to 10.0.0.0
- 998 msec
- 998 msec
- 998 msec
Average delay: 998 msec
What Does it Mean?
Well, that was unexpected. Our results are quite different from what the slide suggests. We see a mere ~9 msec delay for the switchport/SVI configuration and almost a full second of delay for the routed interface.
It turns out that the default carrier-delay for physical interfaces (2 seconds) is to blame. Slide 23, the prior page of the presentation PDF, suggests that their test was performed with carrier-delay set to 0 msec. We can configure our 3560 to match using the command carrier-delay under interface configuration.
S1(config-if)# carrier-delay msec 0
Running the routed interface test again, we now see very different results:
Jan 23 18:45:06.316: %LINK-3-UPDOWN: Interface FastEthernet0/17, changed state to down
Jan 23 18:45:06.316: is_up: 0 state: 0 sub state: 1 line: 0
Jan 23 18:45:06.316: RT: interface FastEthernet0/17 removed from routing table
Jan 23 18:45:06.316: RT: del 10.0.1.0/24 via 0.0.0.0, connected metric [0/0]
Jan 23 18:45:06.316: RT: delete subnet route to 10.0.1.0/24
Jan 23 18:45:06.316: RT: delete network route to 10.0.0.0
Jan 23 18:45:07.314: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/17, changed state to down
Jan 23 18:45:07.314: is_up: 0 state: 0 sub state: 1 line: 0
After eliminating the carrier delay, the physical interface is brought down immediately upon detecting a link failure, and the routing table is updated within the same millisecond. The line protocol transitions to down as well, roughly one second (998 msec) later.
Modifying the carrier-delay appears to have no effect on the SVI convergence delay.
I want to avoid reading too far into these results, for two reasons. First, this lab provides very limited visibility into exactly when a link is considered down. Second, I suspect much of this behavior is platform-dependent.
However, I would like to close with a few points to take away:
- The tests above were limited to simulating direct failures only.
- As we've seen, the switchport/SVI configuration does not necessarily impose a long convergence delay.
- Carrier-delay is a major factor for routed interfaces. It's a good idea to consider lowering it from its default value.
- Don't trust that statistics from slides (or blog posts) will apply to your network. Verify the numbers yourself.
About the Author
Jeremy Stretch is a network engineer living in the Raleigh-Durham, North Carolina area. He is known for his blog and cheat sheets here at Packet Life. You can reach him by email or follow him on Twitter.
January 24, 2011 at 5:19 a.m. UTC
Thanks for the article. SVI vs routed has always been debated and often without real arguments based on facts. SVI was a lot faster than I expected. Would setting carrier delay to 0 msec be a risk in any way, like setting link to down too fast, leading to flapping interfaces?
January 24, 2011 at 9:23 a.m. UTC
I believe matgar @ networking-forum.com answered your question about the low response time on the SVI interface. I'll quote him - "This [is] because the switch needs to check all the switchports in the chassi to see if any of them has the affected VLAN in them. First when that check has been performed will the interface go down. And then the routing protocol [c]an start its job."
Try turning on some ports on the 3560 to see if that introduces any delay, then add some of those ports to the vlan of the SVI. I think that will clear up some questions you might have.
January 24, 2011 at 10:33 a.m. UTC
Hello, generally when I use routed interfaces I set the carrier-delay to zero along with the following command: "ip routing protocol purge interface". With it, a down link impacts the RIB by purging it rather than waiting for the routing process to do the job (helpful when you have a big RIB, as on a 6500 platform).
January 24, 2011 at 1:10 p.m. UTC
One thing to be careful of with SVIs is making sure the VLAN does not exist on any other interface. Otherwise, the SVI will stay up even though the physical interface is down, and it will not be removed from the routing table. Removing unnecessary VLANs from trunks is a good practice in general, but particularly when using SVIs.
January 24, 2011 at 3:00 p.m. UTC
@Brad: That would defeat the purpose of using an SVI for a point-to-point link; the SVI would stay up regardless of whether the link being tested was in a failed state.
January 24, 2011 at 10:34 p.m. UTC
Great post - this is not something which I would have thought about until you brought it up in your blog. I will be looking to implement some new core switches in the near future and will definitely look to test out both forms of interface for performance while considering my design options.
January 27, 2011 at 4:00 a.m. UTC
How about using single-hop BFD? Does this have any impact on this, at least in case of unidirectional link faults? Thanks anyways for the tips again..
February 5, 2011 at 5:56 p.m. UTC
Great post! I remembered the article and asked an engineer about this at Cisco Live in London. He said the ports are polled in sequence before determining whether to bring the SVI down. For some hardware this can be 20ms per port (I think he mentioned 3560), so for a 48 port device it could take as long as 48*20ms = 960ms. Your results suggest it may only be active ports that are polled, or you may have just been extremely lucky to get a result in 9ms.
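The arithmetic behind that estimate can be sketched in a few lines of Python. This is only the model described in the comment above (sequential polling at a fixed cost per port); the function name and the 20 ms default are assumptions taken from that recollection, not documented platform behavior.

```python
def worst_case_svi_delay_ms(ports_polled, per_port_ms=20):
    """Worst-case delay before an SVI goes down if every port is polled
    in sequence at a fixed per-port cost (hypothetical model)."""
    return ports_polled * per_port_ms

if __name__ == "__main__":
    print(worst_case_svi_delay_ms(48))  # the 48-port case from the comment
    print(worst_case_svi_delay_ms(1))   # only a single active port polled
```

Under this model even a single poll costs 20 ms, which is still longer than the ~9 msec measured in the article, so the per-port figure or the polling model itself may not apply to that test.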
August 21, 2011 at 7:31 a.m. UTC
I've done a lot of research into this lately, and ended up with Cisco engineers going through the source code of 12.2(53) to try to find the answer.
For starters, the log entries don't necessarily tell when the actual changes were made in the routing table. I thought they would represent a worst-case scenario but they don't.
I tried this exact test with a pair of 3550s and saw delays of around 950ms every time. Even after dropping the carrier-delay, it was unchanged. The logs seemed to show the detection of the link updown and then a second later, after lineproto down, then the action happened. Tried again with 3560s and the same results.
Anyway, long story short, if you're on any earlier IOS than 12.2(55), you won't see any benefit to the carrier-delay because while the code is in there to make the changes, it doesn't actually seem to do anything.
It would be interesting to see this test repeated but with ping tests running to time the real performance, because I suspect that the response time indicated in the logs is somewhat smaller than the actual interruption to successful traffic flow. With three ports active we were seeing an actual outage of 80ms.
August 28, 2014 at 10:25 a.m. UTC
Convergence of SVI vs. Routed link on a Cisco 3548 Nexus running A1.1c
Link failures were simulated by “shut” on the remote interface.
SVI – L2 VLAN
switch# 2014 Aug 28 10:04:06.348524 switch %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/9 is down (Link failure)
2014 Aug 28 10:04:06.478105 urib: "direct": 184.108.40.206/24 no more next hops
2014 Aug 28 10:04:06.478419 urib: 220.127.116.11/24 Deleting & Freeing
2014 Aug 28 10:04:06.479344 urib: "local": 18.104.22.168/32 no more next hops
2014 Aug 28 10:04:06.479618 urib: 22.214.171.124/32 Deleting & Freeing
2014 Aug 28 10:04:06.479954 urib: "broadcast": 126.96.36.199/32 no more next hops
2014 Aug 28 10:04:06.480215 urib: 188.8.131.52/32 Deleting & Freeing
2014 Aug 28 10:04:06.480542 urib: "broadcast": 184.108.40.206/32 no more next hops
2014 Aug 28 10:04:06.480951 urib: 220.127.116.11/32 Deleting & Freeing
2014 Aug 28 10:04:06.696703 urib: "am": 18.104.22.168/32 no more next hops
2014 Aug 28 10:04:06.697117 urib: 22.214.171.124/32 Deleting & Freeing
Route deletion = 478419 – 348524 = 129895us
switch# 2014 Aug 28 10:07:32.508936 urib: "direct": 126.96.36.199/24 no more next hops
2014 Aug 28 10:07:32.509188 urib: "local": 188.8.131.52/32 no more next hops
2014 Aug 28 10:07:32.509561 urib: 184.108.40.206/24 Deleting & Freeing
2014 Aug 28 10:07:32.509732 urib: 220.127.116.11/32 Deleting & Freeing
2014 Aug 28 10:07:32.512789 urib: "broadcast": 18.104.22.168/32 no more next hops
2014 Aug 28 10:07:32.513464 urib: 22.214.171.124/32 Deleting & Freeing
2014 Aug 28 10:07:32.514677 urib: "broadcast": 126.96.36.199/32 no more next hops
2014 Aug 28 10:07:32.515005 urib: 188.8.131.52/32 Deleting & Freeing
2014 Aug 28 10:07:32.523539 switch %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/9 is down (Link failure)
2014 Aug 28 10:07:32.628070 urib: "am": 184.108.40.206/32 no more next hops
2014 Aug 28 10:07:32.628392 urib: 220.127.116.11/32 Deleting & Freeing
Route deletion = 509561 – 523539 = -13978us
I ran multiple tests and they were always the same. It looks like the Nexus removes the route from the routing table before it even logs that the link is down, and reacts to link changes faster with a routed link than with an SVI.
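For anyone repeating this on NX-OS, the two "Route deletion" figures can be reproduced from the timestamps with a short Python sketch. The function names are made up for illustration; the timestamp strings are copied from the two logs above (link-failure message and the first "Deleting & Freeing" entry in each).

```python
from datetime import datetime, timedelta

def parse_nxos(stamp):
    """Parse an NX-OS style timestamp such as '2014 Aug 28 10:04:06.348524'."""
    return datetime.strptime(stamp, "%Y %b %d %H:%M:%S.%f")

def route_deletion_us(link_down_stamp, route_deleted_stamp):
    """Microseconds from the link-failure log entry to the route deletion.

    A negative result means the route deletion was logged first.
    """
    delta = parse_nxos(route_deleted_stamp) - parse_nxos(link_down_stamp)
    return delta // timedelta(microseconds=1)

if __name__ == "__main__":
    svi = route_deletion_us("2014 Aug 28 10:04:06.348524",
                            "2014 Aug 28 10:04:06.478419")
    routed = route_deletion_us("2014 Aug 28 10:07:32.523539",
                               "2014 Aug 28 10:07:32.509561")
    print(svi, routed)
```

As with the IOS tests, this compares log entries only, so the ordering of messages may reflect logging latency as much as the actual sequence of events.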