
Preparing for a Bad Day – How to write Disaster Recovery documentation

June 1, 2015

IT disasters are unpleasant and can take many forms. However overwhelming the idea of a possible disaster may be, it is crucial to have a well-formed plan in place. Many IT professionals don’t have a good grasp of how to write disaster recovery documentation, which can lead to confusion and problems when disaster strikes.

To give an idea of what good disaster recovery documentation can do, I’d like to share a story of how it saved not only the day, but also my vacation.

A few years ago, I was riding the airport shuttle on my way to a cruise ship vacation. While en route, I got a call from my workplace. The caller said that they had a power outage at our main datacenter. I was gone and my alternate was stuck in traffic, so what should they do? I said, “Can you find the Mac server rack?” Yes, they found it. “Do you see the packet marked Emergency Server Startup and Shutdown Procedures?” Yes, they did. “OK, open that and start reading. It’ll walk you through the process.” I talked with them for a few more minutes to make sure that they were OK, then I said goodbye, ended the call and prepared to board my plane.

That packet was attached to the front of the server rack, and I had made sure it was updated the day before with the latest information. Without it, I might have been trying to talk someone through the shutdown procedure for about fifteen servers and twelve RAID arrays over the phone, right up until the moment the flight attendant yanked my phone out of my hand because the plane needed to take off.

There are a few lessons to take from this story.

Q: Where was I when the disaster occurred?
A: Off-site and without access to either a computer or a way to connect back to the work network.

Q: Where was the other person who had been trained on our disaster recovery process?
A: Off-site and unavailable.

Q: Who was the person on the phone?
A: Someone who wasn’t trained in our disaster recovery process.

Q: What allowed the person on the phone to successfully bring down my servers?
A: Accurate and easily understood documentation that was placed for ready access.

Q: What was not affected by this disaster?
A: My vacation.

With these lessons in mind, see below the jump for my advice on how to write disaster recovery documentation.

