High Availability in Virtualization Strategies
By now you know that one of the upsides of virtualization is the ability to put more applications on fewer servers. When virtualization became mainstream in the early to mid 2000s, we steered many of our asset refresh projects toward a virtualized platform.
We always caution that the redesign must include planning for high availability. When you consolidate applications this way, you have to realize that all of your applications now become critical, not just the ones you traditionally define as critical. In the days of old, when client/server was the model, the workload was distributed, so we could often afford to treat only the most critical application servers as requiring fault tolerance. There were many ways to create fault tolerance, but the expense was usually too high to justify against the risk. Now I see the move to thinner clients, Web applications and mobile devices all accessing the applications, so high availability and fault tolerance can’t be ignored.
It’s important to protect against the cost of downtime, as IT is increasingly the resource with the greatest impact on business performance. We are partners with VMware, and it is the hypervisor we recommend to our clients. The capacity for high availability starts with a multiple-host cluster design.
The administrative toolset monitors the health and availability of the virtual machines and detects potential problems. Should a guest OS fail, HA restarts that virtual machine. If a physical machine in the cluster should fail, HA knows to restart its virtual machines, and the applications on them, on another server. So in your virtualization planning, you need to go at least N+1: enough hosts that the cluster still has sufficient resources to carry the full load after one of them fails.
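To make the N+1 idea concrete, here is a minimal sketch of a capacity check. It is not how any hypervisor's admission control actually works; the `Host` class, the function name and the figures in the usage example are all hypothetical, and the check is deliberately conservative because it assumes the largest host is the one that fails.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """A physical host in the cluster (hypothetical model for illustration)."""
    name: str
    cpu_ghz: float
    mem_gb: float


def survives_host_failures(hosts, vm_cpu_ghz, vm_mem_gb, failures=1):
    """Return True if the total VM demand still fits after losing hosts.

    Worst case is assumed for each resource separately: the `failures`
    largest hosts by CPU, and the `failures` largest hosts by memory,
    are treated as lost. This is a simplification, not a vendor algorithm.
    """
    if failures >= len(hosts):
        return False  # nothing would be left to fail over to
    keep = len(hosts) - failures
    # Sort ascending and keep the smallest hosts, i.e. drop the largest.
    cpu_left = sum(sorted(h.cpu_ghz for h in hosts)[:keep])
    mem_left = sum(sorted(h.mem_gb for h in hosts)[:keep])
    return vm_cpu_ghz <= cpu_left and vm_mem_gb <= mem_left


# Usage with made-up numbers: three identical hosts, N+1 planning.
cluster = [Host(f"host{i}", cpu_ghz=40.0, mem_gb=256.0) for i in range(1, 4)]
fits = survives_host_failures(cluster, vm_cpu_ghz=70.0, vm_mem_gb=450.0)
overcommitted = survives_host_failures(cluster, vm_cpu_ghz=90.0, vm_mem_gb=450.0)
```

With these assumed figures, two surviving hosts offer 80 GHz and 512 GB, so the first workload fits and the second does not. A real plan would also need to account for per-VM placement and hypervisor overhead, which this sketch ignores.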
This design is also very good for planned maintenance. At the basic level of HA, know that you will have a little downtime, because the virtual machines that shift to another host have to go through their restart processes. In most cases this recovery time is fast enough. More aggressive uptime requirements can be met with a bit more planning and the correct tools.