From an outside perspective, redundancy in IT can seem like just a way to double costs. Whether it’s a second ISP connection or multiple servers, you have to weigh the cost against the benefits.
Used correctly, redundancy can keep a business running and protect the organization from costly service outages. To make the case for removing your single points of failure, it can help to have some everyday examples of the same functionality.
Failover
Example: ice cube tray
Ice cube trays are not as common as they once were, thanks to ice makers, but they still work well as an example of failover. Let’s say you have two ice cube trays. You fill them with water and put them in the freezer, where the water freezes.
Failed implementation
As a person who likes symmetry, you take ice cubes from both trays each time you fill your glass throughout the day. Halfway through the day, you use the last ice cubes, refill both trays with water, and put them back in the freezer. A short time later, you return for another refill and try to put some ice cubes in your glass but find that they are still mostly liquid.
Correct implementation
Each time you fill your glass throughout the day, you take the ice cubes from a single tray. Once that tray is empty, you fill it with water again and return it to the freezer. A short time later, you return for another refill and use the ice cubes from the second tray. Upon finishing off the second tray, you fill it with water and return it to the freezer. By that time, the ice cubes in tray 1 are frozen and perfect for your next drink.
Understanding
The failed implementation fails because it is actually set up for load balancing: multiple people could get ice cubes at the same time without impacting each other. Given the environment in the example, load balancing is not needed. A failover implementation allows the service to continue operating even when one of the providers is not available.
The same thought goes for multiple-roll toilet paper dispensers, refillable refrigerated water bottles, and many other things that are properly treated as failover.
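The correct ice cube tray pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Provider` and `FailoverPool` classes, the `healthy` flag, and the tray names are all hypothetical, and a real setup would rely on actual health checks rather than a manually flipped flag.

```python
class Provider:
    """A hypothetical backend: an ice cube tray, an ISP link, a server."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def serve(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{self.name} handled {request}"


class FailoverPool:
    """Send each request to the first available provider, in priority order."""

    def __init__(self, providers):
        self.providers = list(providers)

    def serve(self, request):
        for provider in self.providers:
            try:
                return provider.serve(request)
            except ConnectionError:
                continue  # this provider is down, try the next one
        raise RuntimeError("all providers are unavailable")


tray_1 = Provider("tray-1")
tray_2 = Provider("tray-2")
pool = FailoverPool([tray_1, tray_2])

first = pool.serve("glass-1")   # tray 1 is frozen, so it is used
tray_1.healthy = False          # tray 1 is empty and refreezing
second = pool.serve("glass-2")  # the pool fails over to tray 2
tray_1.healthy = True           # tray 1 has frozen again
third = pool.serve("glass-3")   # requests return to tray 1
```

Only one provider does the work at any moment, and the second exists purely to cover for the first. This sketch also fails back to the first tray as soon as it recovers, which is a design choice; the ice cube example instead stays on the second tray until it is empty.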
Load balancing
Example: multiple lane road
Roads cost more to build and maintain the more lanes they have. Your average residential street has only two lanes, one for each direction of traffic, while a higher-traffic road might have two or more lanes in each direction. You also have turn lanes to help keep traffic flowing and traffic lights to regulate it.
Failed implementation
A small town recently had its two-lane road widened to four lanes by the state department of transportation. This road connects two larger cities, and the small town happens to be in the middle. To keep the road in good condition for as long as possible (the citizens think this might reduce their taxes), they agree to use only a single lane in each direction.
For 10 years, the town uses only the inside lanes, until the potholes get bad enough that cars start needing repairs. At this point, the people in the town begin to use the outside lanes, which until then have been used only by outsiders and have aged from the weather. In 6 years, the outside lanes are worn out and the state needs to repair the entire road.
Correct implementation
The drivers on this road use the new four-lane road normally. Slower traffic keeps to the right lanes while passing and turning traffic uses the left lanes. Traffic is much more predictable and easier to regulate. Cars in a single lane are no longer backed up through multiple intersections when a traffic light turns red. Turning vehicles are less likely to slow down traffic behind them, and vehicles don’t have to cross a worn-out, potholed lane in order to complete their turn.
The road lasts about the same amount of time, since traffic is distributed across the lanes and they are being used instead of just being weathered. Drivers on the road spend less on vehicle repairs, so the taxes to resurface the road do not seem so bad.
Understanding
Sometimes a proper implementation can cost more but it can often be seen as a cost of doing business. Downtime can reduce revenue or customer satisfaction, depending on the business.
The failed implementation treats this road with a failover approach. Given the long life of a road and the traffic it has to carry, it does not make sense to use a single lane at a time. The misguided citizens concerned about their taxes are misusing a public good and possibly causing travel delays and traffic accidents. Because the goal is to get as many cars to their destinations as quickly as possible, it makes more sense to see this wider road as a load balancing implementation.
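A load balancing setup can be sketched the same way. The example below uses round-robin, which is only one of several common distribution strategies, and the `RoundRobinBalancer` class and lane names are hypothetical. It assumes every lane is always available, so it deliberately does no failure handling.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Spread incoming work evenly by rotating through all lanes."""

    def __init__(self, lanes):
        if not lanes:
            raise ValueError("at least one lane is required")
        self._lanes = cycle(lanes)  # endless rotation over the lanes

    def route(self, car):
        lane = next(self._lanes)
        return lane, car


balancer = RoundRobinBalancer(["left-lane", "right-lane"])

# Route ten cars and count how many each lane carried.
usage = Counter(balancer.route(f"car-{i}")[0] for i in range(10))
```

Here every lane carries traffic all the time, so `usage` ends up with five cars per lane. The extra capacity exists to handle more load, not to sit idle until something breaks.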
Conclusion
There are always ‘what if’ scenarios or goals that could invalidate an example or discussion about setting up an environment for failover or load balancing. For example, you could buy two more ice cube trays for the first example to have both load balancing and failover. At a dollar apiece, this would not be a big investment, but that is not always practical for much more expensive IT infrastructure equipment.
I’ve heard many stories of individuals coming into environments that are set up to handle the wrong problem. It doesn’t always make sense to have tons of spares sitting on the shelf when you need more systems in production to handle the current load. While those computers, servers, hardware components, or other spares sit there, prices may drop or better technology may become available that you could have phased in as replacements instead. It can also be a waste of money to set up an environment for load balancing to handle more traffic than actually exists, only to have a single point of failure further up the line cripple your organization at a critical time.
Do you have other examples, or do you disagree with my conclusions? Share them on Twitter, Facebook, or Reddit.