When we discuss simple network failover solutions, we typically are contrasting two fundamental options: active/active and active/standby. These describe the manner in which two redundant devices operate under normal conditions.
In active/standby implementations, only the primary device in a pair passes traffic. The standby device sits idle, ready to assume the active role should the primary device fail. The standby device may receive routing and state information from the primary device in order to facilitate stateful failover, but it doesn't actually pass traffic until the primary device fails.
In active/active implementations, both devices are online and pass traffic under normal conditions. When one device fails, all traffic flows through the remaining device.
Customers often have a difficult time coming to terms with the perceived expense of an active/standby failover solution. The notion that one is paying double for an extra device "that doesn't do anything most of the time" clashes with the nature of a responsible consumer. In an active/active implementation, the second device is at least passing traffic, which is easier to justify psychologically. However, active/active implementations present a hidden danger: artificial scalability.
Let's examine a scenario in which an organization has implemented two Cisco ASA 5510 firewalls configured for high availability at its network perimeter. Although the initial throughput load on these firewalls is low, it is growing steadily over time. Eventually the average load creeps up to around 80%, and the network begins to experience sporadic spikes in throughput beyond the maximum load a single ASA 5510 can handle, which we'll refer to as the 100% threshold.
Consider how the two failover methods we've described will react to the increasing demand for throughput. In an active/standby implementation, the primary firewall will choke during the periods of heavy load and drop traffic. (This is not likely to trigger a failover, although even it does, the standby firewall will experience the same issue.) This provides early warning to network administrators that the network has begun outgrow the current platform and that a redesign is needed.
In an active/active implementation, however, the combined pair of ASA 5510s are able to handle more traffic than a single device (the 200% threshold). The active/active pair has no issue with traffic exceeding the 100% threshold and demand continues to grow so much that the average throughput is now beyond the maximum capacity of a single device. What happens when a failure occurs? The maximum capacity drops to the 100% threshold and the remaining device is unable to accommodate the full traffic load. A substantial amount of traffic is dropped until the primary device is brought back online to handle its share of the load.
Of course, like so many potential network issues, this risk can be mitigated by diligent monitoring and reporting. But I believe it does help strengthen the case for active/standby failover when one needs to defend that idle second firewall to a particularly budget-conscious customer.