Fun with IPsec stateful failover
By stretch | Monday, August 17, 2009 at 2:00 a.m. UTC
One way to provide failover for IPsec tunnels is to simply configure two independent tunnels between two sites. While simple, this approach means maintaining twice the configuration and consuming twice the address space. Cisco IOS offers an alternative approach using a feature known as stateful IPsec failover to terminate an IPsec tunnel on multiple devices at one or both ends for failover.
Consider the following topology of a branch site connected to a corporate headquarters:
The branch pictured is just one of dozens which are to be configured similarly. We can use IOS's stateful IPsec failover feature to dual-home a single IPsec tunnel from the branch router (R4) to the two distribution routers (R1 and R2) using HSRP and SSO.
First, an HSRP group must be configured on the two distribution routers:
R1(config)# interface f0/0
R1(config-if)# standby 1 name BRANCH-5-TUNNEL
R1(config-if)# standby 1 ip 10.0.0.15

R2(config)# interface f0/0
R2(config-if)# standby 1 name BRANCH-5-TUNNEL
R2(config-if)# standby 1 ip 10.0.0.15
Further HSRP configuration tweaks, such as setting custom timers or adding interface tracking, can be made as you would expect (and are recommended for a real-world deployment).
Verify that HSRP is functioning before proceeding:
R2# show standby
FastEthernet0/0 - Group 1
  State is Active
    2 state changes, last state change 00:00:11
  Virtual IP address is 10.0.0.15
  Active virtual MAC address is 0000.0c07.ac01
    Local virtual MAC address is 0000.0c07.ac01 (v1 default)
  Hello time 3 sec, hold time 10 sec
    Next hello sent in 1.936 secs
  Preemption disabled
  Active router is local
  Standby router is unknown
  Priority 100 (default 100)
  Group name is "BRANCH-5-TUNNEL" (cfgd)
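If you are checking more than a handful of branch pairs, this verification can be scripted. Below is a minimal Python sketch. It assumes you have already captured the `show standby` output as a string, and it leaves out how that output is collected (over SSH, for example). The helper names are hypothetical. The script extracts the group, state, virtual IP and virtual MAC. It also derives the expected HSRP version 1 virtual MAC, which is 0000.0c07.ac followed by the group number in hexadecimal (hence 0000.0c07.ac01 for group 1 above).

```python
import re

def hsrp_v1_virtual_mac(group: int) -> str:
    """HSRPv1 virtual MACs run from 0000.0c07.ac00 to 0000.0c07.acff; the last octet is the group."""
    if not 0 <= group <= 255:
        raise ValueError("HSRPv1 group numbers range from 0 to 255")
    return f"0000.0c07.ac{group:02x}"

def parse_show_standby(output: str) -> dict:
    """Extract a few fields from 'show standby' output; missing fields become None."""
    patterns = {
        "group": r"Group (\d+)",
        "state": r"State is (\w+)",
        "virtual_ip": r"Virtual IP address is (\S+)",
        "virtual_mac": r"Active virtual MAC address is (\S+)",
        "name": r'Group name is "([^"]+)"',
    }
    fields = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, output)
        fields[key] = match.group(1) if match else None
    return fields

def hsrp_ready(output: str, expected_vip: str) -> bool:
    """True if this router is Active or Standby for the expected virtual IP."""
    fields = parse_show_standby(output)
    return fields["state"] in ("Active", "Standby") and fields["virtual_ip"] == expected_vip

sample = ("FastEthernet0/0 - Group 1\n  State is Active\n  Virtual IP address is 10.0.0.15\n"
          "  Active virtual MAC address is 0000.0c07.ac01\n")
fields = parse_show_standby(sample)
print(hsrp_ready(sample, "10.0.0.15"))  # True
print(fields["virtual_mac"] == hsrp_v1_virtual_mac(int(fields["group"])))  # True
```

Run against both distribution routers, you would expect one to report Active and the other Standby before moving on to SSO.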
Stateful switchover (SSO) is an IOS feature which can provide inter-device service synchronization and stateful failover. Here we'll be using it to provide stateful failover for our IPsec tunnel terminated on the two distribution routers.
First we enable inter-device redundancy for our HSRP (standby) group:
R1(config)# redundancy inter-device
R1(config-red-interdevice)# scheme standby BRANCH-5-TUNNEL
Upon configuring inter-device redundancy, you may receive this notice on one of the routers:
% Standby scheme configuration cannot be processed now group BRANCH-5-TUNNEL is not in active state
This simply indicates that this is the standby HSRP router. The router will need to be reloaded before the redundancy scheme configuration can take effect.
Second, we define an Inter-process Communication (IPC) association. IPC configuration can look a bit odd until you become familiar with its hierarchy, but it's actually pretty simple in concept. We'll start by creating a new association to define the redundancy relationship between R1 and R2:
R1(config)# ipc zone default
R1(config-ipczone)# association 1
R1(config-ipczone-assoc)# protocol ?
  sctp  SCTP transport configuration

R1(config-ipczone-assoc)# protocol sctp
R1(config-ipc-protocol-sctp)#
Stream Control Transmission Protocol (SCTP) is used to synchronize state between the routers. We complete the SSO configuration by defining the local and remote endpoints of the SCTP connection. Each router uses the physical address of its HSRP interface. The port number is arbitrary, as long as R1's local port matches R2's remote port and vice versa.
R1(config-ipc-protocol-sctp)# local-port 5005
R1(config-ipc-local-sctp)# local-ip 10.0.0.1
R1(config-ipc-local-sctp)# exit
R1(config-ipc-protocol-sctp)# remote-port 5005
R1(config-ipc-remote-sctp)# remote-ip 10.0.0.2
Repeat this configuration on R2, swapping the local and remote IP addresses. The completed configurations (R1's first, followed by R2's) look like this:
redundancy inter-device
 scheme standby BRANCH-5-TUNNEL
!
ipc zone default
 association 1
  no shutdown
  protocol sctp
   local-port 5005
    local-ip 10.0.0.1
   remote-port 5005
    remote-ip 10.0.0.2

redundancy inter-device
 scheme standby BRANCH-5-TUNNEL
!
ipc zone default
 association 1
  no shutdown
  protocol sctp
   local-port 5005
    local-ip 10.0.0.2
   remote-port 5005
    remote-ip 10.0.0.1
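When the two stanzas are typed by hand, a mismatched port or address is an easy mistake. The following Python sketch parses the SCTP endpoints from each router's `ipc zone default` stanza. It then confirms that each router's local endpoint is the other router's remote endpoint. It assumes the configuration text has already been pulled from both routers. `parse_sctp` and `endpoints_mirror` are hypothetical helpers, not part of any Cisco tooling.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class SctpEndpoints:
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int

def parse_sctp(config: str) -> SctpEndpoints:
    """Read the SCTP local/remote endpoints from an 'ipc zone default' stanza."""
    values = {}
    for keyword in ("local-ip", "local-port", "remote-ip", "remote-port"):
        match = re.search(rf"\b{keyword} (\S+)", config)
        if match is None:
            raise ValueError(f"'{keyword}' not found in configuration")
        values[keyword] = match.group(1)
    return SctpEndpoints(values["local-ip"], int(values["local-port"]),
                         values["remote-ip"], int(values["remote-port"]))

def endpoints_mirror(a: SctpEndpoints, b: SctpEndpoints) -> bool:
    """Each router's local endpoint must be the other router's remote endpoint."""
    return (a.local_ip == b.remote_ip and a.local_port == b.remote_port
            and a.remote_ip == b.local_ip and a.remote_port == b.local_port)

r1 = parse_sctp("local-port 5005\n local-ip 10.0.0.1\nremote-port 5005\n remote-ip 10.0.0.2")
r2 = parse_sctp("local-port 5005\n local-ip 10.0.0.2\nremote-port 5005\n remote-ip 10.0.0.1")
print(endpoints_mirror(r1, r2))  # True
```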
R1 and R2 need to be rebooted for the redundancy configuration to take effect.
Upon rebooting, the redundancy configuration on the standby router will trigger a second reload, as indicated by the following log message:
%RF_INTERDEV-4-RELOAD: % RF induced self-reload. my state = NEGOTIATION peer state = STANDBY COLD
After the standby router has rebooted a second time, the commands show redundancy states and show redundancy inter-device can be used to verify the redundant operation:
R1# show redundancy states
my state = 8 -STANDBY HOT
peer state = 13 -ACTIVE
Mode = Duplex
Unit ID = 0
Maintenance Mode = Disabled
Manual Swact = Enabled
Communications = Up
client count = 12
client_notification_TMR = 30000 milliseconds
RF debug mask = 0x0

R1# show redundancy inter-device
Redundancy inter-device state: RF_INTERDEV_STATE_STDBY
  Scheme: Standby
    Groupname: BRANCH-5-TUNNEL Group State: Standby
  Peer present: RF_INTERDEV_PEER_COMM
  Security: Not configured
R2# show redundancy states
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Unit ID = 0
Maintenance Mode = Disabled
Manual Swact = Enabled
Communications = Up
client count = 12
client_notification_TMR = 30000 milliseconds
RF debug mask = 0x0
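This cross-check can also be expressed as a short Python sketch. It assumes the `show redundancy states` output from both routers is available as text. It reads the `my state` and `peer state` names and treats the pair as healthy only when three things hold: one router is ACTIVE, the other is STANDBY HOT, and each router's view of its peer agrees with what that peer reports about itself. The function names are hypothetical.

```python
import re

# Matches e.g. "state = 8 -STANDBY HOT"; the \b keeps following words such as "Mode" out.
_STATE = r"state = (\d+)\s*-([A-Z]+(?: [A-Z]+\b)*)"

def redundancy_states(output: str) -> dict:
    """Return the 'my' and 'peer' state names from 'show redundancy states' output."""
    states = {}
    for side in ("my", "peer"):
        match = re.search(rf"\b{side} {_STATE}", output)
        states[side] = match.group(2) if match else None
    return states

def pair_is_redundant(output_a: str, output_b: str) -> bool:
    """True if one router is ACTIVE, the other STANDBY HOT, and their views agree."""
    a, b = redundancy_states(output_a), redundancy_states(output_b)
    return ({a["my"], b["my"]} == {"ACTIVE", "STANDBY HOT"}
            and a["peer"] == b["my"] and b["peer"] == a["my"])

r1_out = "my state = 8 -STANDBY HOT\npeer state = 13 -ACTIVE\nMode = Duplex\n"
r2_out = "my state = 13 -ACTIVE\npeer state = 8 -STANDBY HOT\nMode = Duplex\n"
print(pair_is_redundant(r1_out, r2_out))  # True
```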
IKE and ISAKMP
To keep things simple, a minimal IKE/ISAKMP configuration is provided here, using a pre-shared key. Such an implementation is not acceptable for real-world deployments; use asymmetric RSA keys instead.
The following cryptographic configuration will be identical on both R1 and R2:
crypto isakmp policy 1
 authentication pre-share
crypto isakmp key DontUsePresharedKeys address 172.16.0.18
!
crypto ipsec transform-set MyTransformSet esp-aes esp-sha-hmac
It is important that R1 and R2 be configured with identical cryptographic policies and keys in order for the IPsec tunnel hand-off to succeed in the event of a failure.
A similar configuration is performed on the branch router (R4):
crypto isakmp policy 1
 authentication pre-share
crypto isakmp key DontUsePresharedKeys address 10.0.0.15
!
crypto ipsec transform-set MyTransformSet esp-aes esp-sha-hmac
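A typo in a key or transform set on only one distribution router would surface only during a failover. For that reason, it can be worth diffing the crypto configuration of R1 and R2 ahead of time. The Python sketch below is one way to do that. It assumes both running configurations have been saved as text, using IOS's usual one-space indentation for sub-commands. It collects top-level `crypto` commands together with their indented sub-commands and passes them to `difflib`. The helper names are hypothetical.

```python
import difflib

def crypto_section(config: str) -> list:
    """Collect top-level 'crypto' commands and the indented lines beneath them."""
    section, in_crypto = [], False
    for line in config.splitlines():
        stripped = line.strip()
        if not stripped or stripped == "!":
            in_crypto = False
            continue
        if not line[0].isspace():
            # A new top-level command starts; keep it only if it is a crypto command.
            in_crypto = stripped.startswith("crypto ")
        if in_crypto:
            section.append(line.rstrip())
    return section

def crypto_diff(r1_config: str, r2_config: str) -> list:
    """Unified diff of the two crypto sections; an empty list means they match."""
    return list(difflib.unified_diff(
        crypto_section(r1_config), crypto_section(r2_config),
        fromfile="R1", tofile="R2", lineterm=""))

r1_config = """crypto isakmp policy 1
 authentication pre-share
crypto isakmp key DontUsePresharedKeys address 172.16.0.18
!
crypto ipsec transform-set MyTransformSet esp-aes esp-sha-hmac
"""
r2_config = r1_config.replace("DontUsePresharedKeys", "DontUsePresharedKey")  # simulated typo
print("\n".join(crypto_diff(r1_config, r2_config)))
```

An empty diff does not prove the tunnel will hand off cleanly. It only rules out one easy-to-miss class of mismatch.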
Unfortunately, IPsec stateful failover does not yet support virtual tunnel interfaces (VTIs), so we'll have to make do with crypto maps. For the sake of simplicity, we'll limit encrypted traffic to that between 10.99.0.0/16 and 172.16.5.0/24. The following configuration is applied to both R1 and R2:
ip access-list extended BRANCH-5-ACL
 permit ip 10.99.0.0 0.0.255.255 172.16.5.0 0.0.0.255
!
crypto map BRANCH-5-MAP 10 ipsec-isakmp
 set peer 172.16.0.18
 set transform-set MyTransformSet
 match address BRANCH-5-ACL
 reverse-route
The reverse-route parameter triggers the creation of a static route for local traffic headed toward the far end of the tunnel. If you are running a dynamic routing protocol in your lab, redistribute the static routes on R1 and R2 into that protocol so that they are preferred over the dynamic routes. Otherwise, local traffic may be routed via the standby router, only to be dropped there. Traffic can only be encrypted by the active router in the pair, because it holds the active IPsec security associations.
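The ACL entry in BRANCH-5-ACL uses IOS wildcard masks rather than prefix lengths. The Python sketch below shows how those masks map to the 10.99.0.0/16 and 172.16.5.0/24 networks mentioned above, using the standard `ipaddress` module. It handles only simple `permit ip SRC WILDCARD DST WILDCARD` entries. The `host` and `any` keywords and non-contiguous wildcards are deliberately not handled, and the function names are hypothetical.

```python
import ipaddress

def wildcard_to_network(address: str, wildcard: str) -> ipaddress.IPv4Network:
    """Convert an IOS address/wildcard pair (e.g. 10.99.0.0 0.0.255.255) to a network."""
    wc = int(ipaddress.IPv4Address(wildcard))
    host_bits = wc.bit_length()
    # A contiguous wildcard is all ones in its low-order bits (0.0.0.255, 0.0.255.255, ...).
    if wc != (1 << host_bits) - 1:
        raise ValueError(f"{wildcard} is not a contiguous wildcard mask")
    # strict=True (the default) rejects addresses with host bits set.
    return ipaddress.IPv4Network((address, 32 - host_bits))

def acl_entry_networks(entry: str) -> tuple:
    """Split 'permit ip SRC SRC-WC DST DST-WC' into (source, destination) networks."""
    action, protocol, src, src_wc, dst, dst_wc = entry.split()
    if (action, protocol) != ("permit", "ip"):
        raise ValueError("only 'permit ip' entries are handled in this sketch")
    return wildcard_to_network(src, src_wc), wildcard_to_network(dst, dst_wc)

src, dst = acl_entry_networks("permit ip 10.99.0.0 0.0.255.255 172.16.5.0 0.0.0.255")
print(src, dst)  # 10.99.0.0/16 172.16.5.0/24
```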
The final bit of configuration on our distribution routers is to apply the crypto map with stateful failover capability:
R1(config)# interface f0/0
R1(config-if)# crypto map BRANCH-5-MAP redundancy BRANCH-5-TUNNEL stateful

R2(config)# interface f0/0
R2(config-if)# crypto map BRANCH-5-MAP redundancy BRANCH-5-TUNNEL stateful
Because encryption works best with something decrypting at the other end, let's add a mirror crypto map to our branch router (R4):
ip access-list extended CORPORATE-ACL
 permit ip 172.16.5.0 0.0.0.255 10.99.0.0 0.0.255.255
!
crypto map CORPORATE-MAP 10 ipsec-isakmp
 set peer 10.0.0.15
 set transform-set MyTransformSet
 match address CORPORATE-ACL
This crypto map is then applied to the appropriate interface:
R4(config)# interface f0/0
R4(config-if)# crypto map CORPORATE-MAP
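The branch ACL must be the exact mirror of the distribution routers' ACL, with source and destination swapped. A minimal Python sketch of that check follows. It assumes single-line `permit ip` entries written as address/wildcard pairs. It compares the entries textually, so two entries that are equivalent but written differently (for example, using `host`) would be reported as a mismatch. The helper names are hypothetical.

```python
def parse_permit_ip(entry: str) -> tuple:
    """Split 'permit ip SRC SRC-WC DST DST-WC' into ((src, wc), (dst, wc))."""
    action, protocol, src, src_wc, dst, dst_wc = entry.split()
    if (action, protocol) != ("permit", "ip"):
        raise ValueError("only 'permit ip' entries are handled in this sketch")
    return (src, src_wc), (dst, dst_wc)

def is_mirror(local_entry: str, remote_entry: str) -> bool:
    """True if remote_entry matches local_entry with source and destination swapped."""
    local_src, local_dst = parse_permit_ip(local_entry)
    remote_src, remote_dst = parse_permit_ip(remote_entry)
    return local_src == remote_dst and local_dst == remote_src

hq = "permit ip 10.99.0.0 0.0.255.255 172.16.5.0 0.0.0.255"      # BRANCH-5-ACL on R1/R2
branch = "permit ip 172.16.5.0 0.0.0.255 10.99.0.0 0.0.255.255"  # CORPORATE-ACL on R4
print(is_mirror(hq, branch))  # True
```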
At this point, traffic between 10.99.0.0/16 and 172.16.5.0/24 should be flowing properly through the encrypted tunnel.
First, determine which router is active. In this lab, it is currently R2:
R2# show redundancy states
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Unit ID = 0
...
We can observe the failover behavior by shutting down R2's external interface to simulate an outage:
R2(config)# int f0/0
R2(config-if)# shutdown
%HSRP-5-STATECHANGE: FastEthernet0/0 Grp 1 state Active -> Init
%RF_INTERDEV-4-RELOAD: % RF induced self-reload. my state = ACTIVE peer state = STANDBY HOT
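IOS system messages follow the `%FACILITY-SEVERITY-MNEMONIC: text` format, so the RF-induced reload can also be picked out of collected logs automatically. The Python sketch below assumes the log has been captured as plain text, with one message per line. It parses each message and returns the text of any `RF_INTERDEV` `RELOAD` events. The function names are hypothetical.

```python
import re

# IOS system messages look like: %FACILITY-SEVERITY-MNEMONIC: message text
MESSAGE_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)")

def parse_ios_messages(log: str) -> list:
    """Return a dict per recognizable system message; other lines are skipped."""
    messages = []
    for line in log.splitlines():
        match = MESSAGE_RE.search(line)
        if match:
            entry = match.groupdict()
            entry["severity"] = int(entry["severity"])
            messages.append(entry)
    return messages

def rf_induced_reloads(log: str) -> list:
    """Return the text of every RF_INTERDEV RELOAD message in the log."""
    return [m["text"] for m in parse_ios_messages(log)
            if m["facility"] == "RF_INTERDEV" and m["mnemonic"] == "RELOAD"]

log = """R2(config-if)# shutdown
%HSRP-5-STATECHANGE: FastEthernet0/0 Grp 1 state Active -> Init
%RF_INTERDEV-4-RELOAD: % RF induced self-reload. my state = ACTIVE peer state = STANDBY HOT"""
for text in rf_induced_reloads(log):
    print(text)
```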
While IPsec stateful failover works as advertised, resulting in only minimal traffic disruption during the IPsec association hand-off, it has one little side-effect: the entire router is reloaded. R1 immediately assumes the active role, and R2 eventually reloads to become the hot standby, completing the exchange:
R1# show redundancy states
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Unit ID = 0
While this works, scuttling the current running IOS and reloading the entire router can hardly be considered an elegant response to a simple interface failure. This is particularly limiting in hub-and-spoke topologies like the one examined here; with a dozen branch tunnels terminated on a pair of distribution routers, either could easily be the active router for any number of tunnels.
Given the kamikaze nature of state transitions, coupled with the lack of VTI support, stateful IPsec failover seems unready for real-world deployment.
About the Author
Jeremy Stretch is a network engineer living in the Raleigh-Durham, North Carolina area. He is known for his blog and cheat sheets here at Packet Life. You can reach him by email or follow him on Twitter.
Posted in Security
August 17, 2009 at 6:46 a.m. UTC
Great article! I tried to implement it before on a 2811, but no luck... AFAIK it's only supported on 7xxx routers?
August 17, 2009 at 5:36 p.m. UTC
You had me all the way until 'no VTI support'...bummer
August 17, 2009 at 5:56 p.m. UTC
August 17, 2009 at 10:08 p.m. UTC
I'm about to roll this out using 3825s (the lowest-end device supported when we first looked at this feature), but it has been difficult to find code that supports all the features we need. I had to open a TAC case to find code that didn't have broken RRI (12.4.15T9).
Anyhow, once it's up, it seems to work as advertised. Often only 1-3 pings are dropped during failover.
I had no idea about VTI, but that's good to know, as I would've expected it to be supported (a common problem with Cisco in my experience).
Anybody work with this technology and dynamic routing? (on the inside) What's the best approach to get routes from this pair to adjacent routers, since the IPSEC is tied to the same device as the active HSRP?
August 18, 2009 at 5:17 p.m. UTC
While I'm sure there are technical reasons to use a router as a stateful IPSec termination point, routers have never really provided good "stateful" anything to this day...
Also, while I understand this blog focuses primarily on router/switch technologies, for this type of application you really have to consider the ASA/PIX platform in HA mode. It provides seamless failover, maintains stateful connections and makes a good IPSec VPN termination point. It just seems to fit the bill much better than a traditional router platform (of course there are downsides to this: limited dynamic routing functionality, etc.).
Maybe all this boils down to having a good design/architectural blueprint. I guess picking the correct platform to do the job at hand is just as important as understanding how to configure a service on a device.
February 22, 2011 at 9:00 a.m. UTC
As this was an entry from August 2009, does anyone know if SSO with VTI is supported now?
February 23, 2011 at 5:13 a.m. UTC
I shut down R2's external interface, and then R2 reloaded. After R2 booted, it became the active router again, and the VPN client cannot connect unless I reconnect it. Why?
April 2, 2011 at 3:27 p.m. UTC
I've run into the same issue when running a quick test on a pair of 2821s as my hubs. Admittedly, I didn't investigate further as to why that happened. Thinking about it now, perhaps the HSRP preemption on the primary router that I typically use when configuring HSRP was the cause.
May 4, 2012 at 6:08 p.m. UTC
According to the document I am reading at the moment, this feature only supports PSK for IPSec, not RSA:
"Public key infrastructure (PKI) is not supported when used with stateful failover. (Only preshared keys for IKE are supported.)"
August 30, 2012 at 8:46 a.m. UTC
Many thanks for this article. It inspired me to create a simpler IPSEC failover setup using smaller Cisco 881 routers, which do not support SSO.
I have a question regarding routing when using IPSEC failover.
I am using pairs of Cisco 881s at branches in a failover setup, applying the crypto map to the HSRP process of the outside interface. This works fine and failover is fast. Routing is static: the default route points to the internet interface Fa4/WAN. The VPN is IPSEC, and the VPN hub is an ASA-5520. All local LAN traffic is matched by the crypto ACL (source: LOCAL, destination: ANY). LAN routing is static.
Config of the primary:
description Fa4 outside (x.y.z.148/26)
ip address x.y.z.148 255.255.255.192
ip access-group outside-in in
standby 4 ip x.y.z.147
standby 4 timers msec 500 msec 1500
standby 4 preempt delay reload 120
standby 4 name hsrp-outside
standby 4 track 1 decrement 20
standby 4 track 2 decrement 20
ip tcp adjust-mss 1452
crypto map IPSEC redundancy hsrp-outside
no cdp enable
!# track1 tracks lo0 as the manual failover switch
!# track2 tracks the inside interface (line-protocol)
crypto isakmp policy 10
encr aes 256
crypto isakmp key very-secret-psk address ASA.hub.outside.135 no-xauth
crypto isakmp keepalive 20 periodic
crypto ipsec security-association replay window-size 256
crypto ipsec transform-set ESP-AES-256-SHA esp-aes 256 esp-sha-hmac
crypto ipsec df-bit clear
crypto map IPSEC 10 ipsec-isakmp
set peer ASA.hub.outside.135
set security-association lifetime seconds 28800
set transform-set ESP-AES-256-SHA
match address 101
access-list 101 permit ip branch.local.lan.0 0.0.3.255 any
access-list 101 permit ip branch.local.transfer.176 0.0.0.15 any
access-list 101 permit ip host branch.local.loopback.5 any
access-list 101 permit ip host branch.local.loopback.6 any
ip route 0.0.0.0 0.0.0.0 x.y.z.129 name "Internet"
ip route branch.local.loopback.6 255.255.255.255 branch.local.transfer.179 name "VPN02 Loopback1"
ip route branch.local.lan.0 255.255.252.0 branch.local.transfer.180 name "LAN Networks"
I did not find a solution for the following:
I want to manage both 881s via the active IPSEC tunnel, e.g. SSH to the inside or loopback interface from the central site. TACACS and syslog are sourced from the loopback interface, and both are inside the tunnel. While the primary is crypto-active and fully reachable via the VPN tunnel, the secondary has its static default route pointing to the ISP's router connected to its own outside interface. With this routing, the secondary is not reachable via the IPSEC tunnel active on the primary; it is reachable only via its non-tunneled outside IP.
-> What is needed is some kind of dynamic or HSRP-state-dependent temporary routing on the passive router, pointing to the inside of the active router. Or a totally different VPN setup? Any ideas?
Btw: should I also tweak the MTU on any interface in addition to the TCP MSS adjustment?
Thanks in advance.
February 19, 2015 at 6:04 p.m. UTC
How would it be possible to create a hub-and-spoke model with static IPSEC HA, i.e. having two routers on both ends? This article assumes the hub is a single router.
December 27, 2015 at 3:41 p.m. UTC
Has anything changed regarding VTI support?