Google is so serious about its disaster recovery planning that it tests for most scenarios imaginable, no matter how unlikely, including an alien invasion of its offices. You see, nothing puts fear into a CIO like a system outage — it gives new meaning to having a “bad day” at the office.
But quite surprisingly, unlike Google, very few companies have a disaster recovery plan. A recent IT survey indicated that only 26 percent of small to medium-sized businesses do. With New Year’s resolutions top of mind, here are five steps that you and your organization should take to dramatically reduce the fallout of your next outage.
#1 There will be a next time
Note, I said “next outage,” which implies there will be another outage or disaster that takes out your critical business systems. Assuming that it will not happen again leaves organizations more susceptible to outages and probably facing even longer recovery times.
The best time to start your disaster planning or to update your existing plan is right after you’ve had an outage. Every company should do a thorough postmortem after an outage, including a forensic analysis of the events leading up to and following it. Understanding what happened and how it can be avoided in the future is critical, but so is asking, “How could we have recovered faster?”
Based on the information gathered, disaster preparedness test cases should be updated or even expanded. The more thorough your test cases, the less likely you are to be surprised when that scenario presents itself and it is not a drill.
#2 Focus on preventing and recovering from outages
The first requirement in any disaster recovery plan is to have a plan. There are many disaster recovery plan documents available online that you can use as a guide, but most importantly, start by identifying your business-critical systems. Then, focus on building redundancy into those systems and understanding what’s required to put Humpty Dumpty back together if systems need to be restored.
#3 Have a disaster recovery team and leader in place
When your company is fighting the storm of an outage, it’s not the time to decide what to do or who does what. Sure, there will be some “heat of the battle” decisions to be made, but for the most part, responses should be choreographed and rehearsed (many times) by a dedicated team of engineers.
You also need an effective leader who can direct the diagnosis of the problem and the repair of systems. A great leader is key during an outage because the company’s reputation ultimately rests on that leader’s decisions.
#4 Your infrastructure is only as resilient as your providers
Companies typically rely on service providers such as ISPs, colocation providers or cloud providers like Amazon Web Services. Although rare, even a world-class cloud provider like Amazon has outages that can leave both systems and CIOs in the dark about what the problem is or when it will be fixed.
Netflix, a high-profile AWS customer, has been collateral damage in several AWS outages over the last few years. But Netflix doesn’t assume AWS will provide 100 percent uptime. In fact, Netflix has created what it calls the “Simian Army,” which randomly disables production instances under close supervision to see what happens and to ensure that its engineers can react before the issue impacts customers. Much like law enforcement and first responders who continuously drill and conduct mock disasters, Netflix engineers hone their skills and reaction times through continuous testing.
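To make the idea concrete, here is a minimal sketch of a random-failure drill in Python. It is not Netflix's tooling; the fleet dictionary, the instance names and the functions `pick_victim` and `run_drill` are all hypothetical, and the "disable" step only flips a flag in memory rather than touching real infrastructure. It simply shows the shape of the exercise: pick a running instance at random, take it out, and check whether enough capacity is left.

```python
import random


def pick_victim(instances, rng):
    """Choose one running instance at random, or None if none are running."""
    running = [name for name, up in instances.items() if up]
    return rng.choice(running) if running else None


def run_drill(instances, min_healthy, rng):
    """Disable one random instance and report whether enough capacity remains.

    `instances` maps a (hypothetical) instance name to True if it is up.
    Returns a (victim, survived) tuple.
    """
    victim = pick_victim(instances, rng)
    if victim is None:
        return None, False
    instances[victim] = False  # simulated outage; no real system is touched
    healthy = sum(1 for up in instances.values() if up)
    return victim, healthy >= min_healthy


if __name__ == "__main__":
    # Hypothetical three-node fleet that needs at least two healthy nodes.
    fleet = {"web-1": True, "web-2": True, "web-3": True}
    victim, survived = run_drill(fleet, min_healthy=2, rng=random.Random())
    print(f"disabled {victim}; service survived: {survived}")
```

In a real drill, the interesting part is what happens after the instance goes down: whether monitoring notices, whether traffic fails over, and how quickly the team responds.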
#5 Communicate with your customers quickly
At Zoho, we have millions of users who depend on our cloud services to keep their businesses running. When we are down, they are down. Not being able to access applications or services that are critical to your business is one of the most helpless feelings. It is at these times that your customers need to hear from you fast.
In the case of a Zoho outage, communications to our customers come straight from our CEO, because we want them to know that there isn’t anything more important than solving the outage. Since normal communication lines with our customers may be lost during an IT infrastructure outage, we use social networks like Twitter and Facebook to provide status updates.
Being open, clear and timely with your customers can save you from dealing with a social media disaster as well — remember, customers tweet when they’re upset!
All you CIOs reading this, the bottom line is it’s as good a time as any to face your biggest fear. Start the new year by putting a disaster recovery plan in place. And remember to plan and test for everything — even aliens.
The post 5 Tips for CIOs: Plan for Hurricanes, Hackers and Aliens? appeared first on ManageEngine Blogs.