Since most businesses don’t even write a disaster recovery (DR) plan, you’re ahead of the game if you’ve created written documentation for yours. The problem is, there’s a good chance the plan you wrote doesn’t work: it’s missing applications or steps, it assumes systems come back up faster than they really can, or it has some other flaw. The best way to find those problems is to schedule a test of your DR plan, so you don’t run into them for the first time while recovering from a real disaster.
There are several ways to validate your recovery plan. The least disruptive is a desk walkthrough, where everyone who would be involved in the recovery meets to read through the plan together. A walkthrough catches obvious errors, but it won’t expose problems like unrealistic timing estimates.
The most effective way to test the DR plan is also the most disruptive: shut down your primary systems and actually carry out the steps to bring them back up according to your recovery strategy. This gives you accurate timing measurements and exposes communication and control problems, but it requires the biggest commitment of staff time (you’ll want both end users and technical staff to participate) and introduces a real risk that some application won’t recover successfully.
To minimize the risks of a live DR test, write a DR test plan that extends your DR plan to cover the test scenario. The DR test plan should include the following:
- a list of everyone who will be involved and how to contact them. One person should be identified as responsible for coordinating the test activities.
- prep work, including checking that your backup server configurations are consistent with your production server configurations. Patches, applications, and libraries should all be at matching versions. Allow time to bring backup systems up to date if necessary.
- instructions for creating backups and shutting down production systems cleanly.
- a way to track time required to bring up the DR servers and applications, along with any issues encountered and the steps required to fix them.
- instructions for bringing production systems back online cleanly and checking them out to make sure they’re ready for the next business day.
- a post-mortem to evaluate the DR test and update the documentation to reflect any new details and timing.
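Two items in that list, the version check during prep work and the log of recovery timings and issues, are easy to sketch in code. The Python below is a minimal illustration only. It assumes you can already export each server’s installed components as a plain {component: version} mapping (collecting that inventory depends on your platform and isn’t shown). The names `find_version_mismatches` and `RecoveryLog` are hypothetical and don’t belong to any particular DR tool.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable


def find_version_mismatches(production: dict[str, str],
                            backup: dict[str, str]) -> dict[str, tuple]:
    """Hypothetical prep-work check. Assumes both inventories are plain
    {component: version} dicts. Returns {component: (prod_version, dr_version)}
    for every component that differs; None marks a component that is missing."""
    mismatches = {}
    for name in sorted(production.keys() | backup.keys()):
        prod_v, dr_v = production.get(name), backup.get(name)
        if prod_v != dr_v:
            mismatches[name] = (prod_v, dr_v)
    return mismatches


@dataclass
class RecoveryStep:
    name: str
    started: float
    finished: float | None = None
    issues: list[str] = field(default_factory=list)

    def elapsed(self) -> float | None:
        # Seconds from start to finish, or None if the step never finished.
        if self.finished is None:
            return None
        return self.finished - self.started


class RecoveryLog:
    """Hypothetical test-day log: records when each recovery step starts and
    finishes, plus any issues hit, so the post-mortem has real timings."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock  # injectable so the log can be tested with fake times
        self.steps: dict[str, RecoveryStep] = {}

    def start(self, name: str) -> None:
        self.steps[name] = RecoveryStep(name, started=self._clock())

    def issue(self, name: str, description: str) -> None:
        self.steps[name].issues.append(description)

    def finish(self, name: str) -> None:
        self.steps[name].finished = self._clock()

    def report(self) -> list[str]:
        # One line per step in the order started, followed by its issues.
        lines = []
        for step in self.steps.values():
            elapsed = step.elapsed()
            status = "NOT FINISHED" if elapsed is None else f"{elapsed / 60:.1f} min"
            lines.append(f"{step.name}: {status}")
            lines.extend(f"  issue: {text}" for text in step.issues)
        return lines


if __name__ == "__main__":
    prod = {"openssl": "3.0.13", "postgresql": "15.6", "app": "2.4.1"}
    dr = {"openssl": "3.0.11", "postgresql": "15.6"}
    print(find_version_mismatches(prod, dr))
    # {'app': ('2.4.1', None), 'openssl': ('3.0.13', '3.0.11')}
```

On test day, the coordinator would call `start`, `issue`, and `finish` as each step progresses. The output of `report()` then becomes raw material for the post-mortem and for the updated timing estimates in the DR plan. A shared spreadsheet does the same job. The point is to capture timings as they happen rather than reconstructing them afterward.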
Make sure the DR test date won’t conflict with any critical system maintenance or upgrade activity, and notify everyone who will be affected by the simulated outage, even if they won’t be participating. You don’t want users trying to get real business done while the test is going on.
After you complete your DR test and update the documentation with any corrections, make a note to schedule your next test. Because your IT infrastructure changes constantly, testing your DR plan once isn’t enough. Conduct a test at least once a year, updating your plan to reflect any changes in servers, applications, and personnel.