T-SQL Tuesday #38–Resolution

This is my first time joining the T-SQL Tuesday blog party, but on the off chance that anyone reading this doesn’t already know about it, I’ll give a quick summary. It was started in 2009 by Adam Machanic, and every month someone is chosen to host the party. On the Wednesday prior to the second Tuesday of the month the host picks a topic, and anyone who wishes to participate blogs about it the following Tuesday. This month is hosted by Jason Brimhall, and he chose “Standing Firm” as the theme. We were asked to blog a story related to the following terms:

  • resolve
  • resolution
  • resolute

You can read some of his ideas for topics in his invitation post. What I’m going to talk about is a disaster situation I was part of and how I stood firm and refused to do a repetitive task manually.

When I first started my current job I was moving from a development position with a vendor to a production DBA position. I was looking forward to the new challenges and the different experiences that the change would give me, and is still giving me. I was also moving from a position where I handled the work handed to me, with very, very little room for self-direction, so I was looking forward to being able to pick out the pieces of the environment that I thought I could improve and work on those. The first day was a full day of orientation for everyone who works at the hospital. We learned about patient privacy, why forwarding emails with pictures of kittens is bad, and how to wash our hands. Which, for the record, is a golf-ball-sized dollop of the antibacterial gel unless you’re going into surgery, in which case it’s something more. And the gel is better than washing with soap unless your hands have dirt on them or are otherwise soiled. But I digress.

Tuesday morning, my first day actually working, I went around the office where I’d be working and met people. I learned that they were wrapping up after a major virus infection that had kept IS working around the clock for a couple of weeks, and they were still dealing with the final pieces of the cleanup. The other DBA and I went out to lunch. After coming back we learned that one of the VM hosts had failed over and, as a result, a bunch of the VMs had been corrupted. We later found out this was due to a misconfiguration combined with a bug (essentially, the hosts were communicating with the SAN using different block sizes; the host that failed over renegotiated with the SAN, but the host that didn’t fail over was never told and started corrupting VMs). Not exactly what I wanted to deal with as my first issue at the new job, but I was open to the challenge. Especially since the VM environment was managed by another team and they were responsible for getting the VMs back up. The DBA team would just be responsible for restoring the databases to the most recent point we could once each VM was restored.
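
For each database, that restore sequence looked roughly like the sketch below. The database name, share paths, and file names here are made up for illustration; the real ones varied by server:

    -- Restore the most recent full backup, leaving the database
    -- in a restoring state so log backups can still be applied
    RESTORE DATABASE SalesDB
        FROM DISK = N'\\backupshare\SQLPROD01\SalesDB_full.bak'
        WITH NORECOVERY;

    -- Apply each transaction log backup in order
    RESTORE LOG SalesDB
        FROM DISK = N'\\logshare\SQLPROD01\SalesDB_log_01.trn'
        WITH NORECOVERY;

    -- The last log backup brings the database back online
    RESTORE LOG SalesDB
        FROM DISK = N'\\logshare\SQLPROD01\SalesDB_log_02.trn'
        WITH RECOVERY;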

To make sure we were ready when each VM became available, my counterpart started prepping the restore scripts the way he normally had: a manual process that worked fine for a small number of databases. Looking at a large number of servers, some with multiple databases, I decided I didn’t want to spend my first day doing what was essentially busy work. I’ve often been glad that the DBAs here have been good and got things set up well, and that day was the first time. The full database backups went to fairly standard, though not absolutely standard, locations. Fortunately we only had to do one of those for each database. Even more fortunately, all the transaction logs went to the same place, a very standard location, with just a few exceptions (which weren’t affected by this). Instead of the manual process I reached for a trick I had used before. I opened Excel and wrote a concatenate function to combine all the data we needed to generate the log restore commands given the server, database, and file name. Since I had it done before the first server was ready for us, we didn’t lose any time and were able to focus on other things while we waited for servers to come our way.
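
I no longer have the original spreadsheet, but the formula was something along these lines, assuming the server name in column A, the database name in column B, and the log backup file name in column C (the log share path is invented for the example):

    =CONCATENATE("RESTORE LOG [", B2, "] FROM DISK = N'\\logshare\", A2, "\", C2, "' WITH NORECOVERY;")

which spits out rows like:

    RESTORE LOG [SalesDB] FROM DISK = N'\\logshare\SQLPROD01\SalesDB_log_01.trn' WITH NORECOVERY;

Fill the formula down the whole list, paste the result column into a query window, and the log restore script for an entire server is done in one shot.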

After we had the scripts generated we saved them in case we needed to run them again. Which, unfortunately, we did a couple of times until the cause of the failover was discovered. Having the spreadsheet to generate the scripts saved a lot of time, and keeping the generated scripts around in case we needed them again saved even more and let us work on servers that weren’t affected by this outage. Thinking of ways to automate repetitive tasks is something every DBA should work on. Restoring about one hundred databases would have been a lot more work had the commands needed to be put together by hand.