T-SQL Tuesday #62 – HealthySQL – Restores


This month’s T-SQL Tuesday is about keeping a healthy SQL environment.  You can read more at Pearl Knows’ post on it.  One of the things I’ve done in our environment is automate restores to a test machine to validate that we can successfully restore.  I’ve blogged about pieces of that script and will likely do more of that as it evolves, but what I want to write about today is the statistics that have come out of that routine.

The Restore Setup

To start with, our standard backup job drops a backup on a local disk and then Tivoli, on a different schedule, runs and pulls it to its storage.  The next morning the restore process picks five random databases, finds the latest backup for each by querying msdb, pulls it from Tivoli to a test machine, does a SQL restore, and runs an integrity check.  Any time it failed to do a restore I noted the reason, resolved it, and finished the restore to make sure there were no other issues.  I also noted what the failure was so I could go back and run metrics on why we couldn’t restore a backup.
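Sketched roughly in Python, the daily selection step looks something like this.  This is a minimal sketch only: the database list and backup paths below are hypothetical stand-ins, and the real routine queries msdb for each database’s latest backup.

```python
import random

def pick_restore_candidates(databases, count=5):
    """Pick `count` random databases to restore-test, without repeats."""
    return random.sample(databases, min(count, len(databases)))

# Hypothetical stand-in for the msdb query that finds each
# database's most recent backup file.
latest_backup = {
    "Sales": r"S:\Backup\Sales.bak",
    "HR": r"S:\Backup\HR.bak",
    "EMR": r"S:\Backup\EMR.bak",
}

for db in pick_restore_candidates(list(latest_backup)):
    print(db, latest_backup[db])
```

From there the routine pulls each file back from Tivoli, restores it, runs the integrity check, and records any failure along the way.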


There was a 5.3% failure rate, excluding cases where the routine picked a database that was intentionally not backed up.  This includes cases where the backup was sitting on disk but hadn’t been picked up by Tivoli before the restore routine started.  I decided to count those as failures since, if we had a disk failure between when the backup happened (typically 8pm) and when the restore happened (at 5am), that backup would have been lost.  This is a higher failure rate than we would like to see, so let’s look into what the failures were.

Tivoli Failures

Breaking down the failures, 89.2% were due to Tivoli not having the backup file.  By far the bulk of those were caused by scheduling differences between when the SQL backup ran and when the Tivoli backup ran.  Some were missed entirely because the SQL backup and the Tivoli backup were running at the same time, but others had a large gap between the two, sometimes almost a day.  These scheduling issues accounted for half of all the failures.  Fortunately, they were easy to fix by adjusting one schedule or the other.

21% of the Tivoli failures were due to transient Tivoli issues that caused backups to not happen (either system-wide or on that machine on a specific night), and 15% were caused by persistent configuration failures.

SQL Failures

The remaining 10.8% of the failures were due to issues with the SQL backup job.  One was on a system that wasn’t yet in production but should have had our maintenance job installed already.  One was caused by a permissions issue, and two were because the vendor had responsibility for the system and hadn’t set up a maintenance job.  Yes, I check backups on systems I’m not responsible for.  Mainly this is because I want to protect the hospital I work for.  It is also in my best interest to bring the issue up ahead of time instead of being pulled into a disaster situation and being asked to perform a miracle.


We didn’t have a single backup fail to restore due to backup corruption.  If we could get to the backup file we were able to restore it.

Lessons Learned

Being able to restore a backup is the most important part of having one.  Without knowing whether your backups are restorable, you won’t be able to resolve any issues preventing a restore until it’s too late.  The biggest lessons for us weren’t a surprise, but it is good to have hard numbers around them.

  1. Our biggest point of failure is the disk backup.  If I grabbed the SQL backup off disk instead of from long-term storage I wouldn’t have caught the majority of our failures.
  2. Scheduling is the single biggest issue.  If you have a similar setup, pay careful attention to your schedules, or see if you can add a step to your backup job that has the disk backup pick up the SQL backup right after it happens.
  3. Verify vendor work.  There are vendors out there that insist on supporting their own systems and are good at it, but the bulk of the ones we work with aren’t.  It may be documented that the vendor has responsibility, but at the end of the day, if it’s in your environment you will probably be called on for support.

Being able to track numbers about failures will also help get the support you need from management to improve your infrastructure.  In our case, we’re moving most of our backups from Tivoli to a DataDomain device so we can bypass those scheduling and agent failures.

SSIS Scripts and Connection Managers

One of the projects I’m working on uses an SSIS package to migrate data from one database that’s internal use only to one that drives a public website.  As happens so often, the source data isn’t formatted the way it should be presented on the website, so some manipulation is required en route.  The issue I’m dealing with here is fixing the capitalization.  The source data is stored in all caps and needs to become proper case.  Except not all words should be proper case.  Some are abbreviations and should stay all caps (ex: <pull out of AdventureWorks database>), some (in, and) should be all lower case, and some they want totally translated.  Since this list is somewhat dynamic, I don’t want to have to change the SSIS package each time they identify another word that needs to be handled differently, so I am storing the list in a database table for the SSIS package to reference.
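The shape of that cleanup is easy to show outside SSIS.  Here’s a minimal Python sketch of the same two-pass idea, with a hypothetical translations dict standing in for the custom.Translations table (the actual package does this in a script component, covered below):

```python
# Hypothetical stand-in for the custom.Translations lookup table.
translations = {
    "Ct.": "CT.", "Ave.": "AVE.", "Pl.": "PL.",
    "Blv.": "BLVD.", "Blvd.": "BLVD.", "St.": "ST.", "Rd.": "RD.",
}

def proper_case(value: str) -> str:
    # First pass: proper-case everything, which also mangles abbreviations.
    result = value.title()
    # Second pass: restore each abbreviation from the translation table.
    for original, translated in translations.items():
        result = result.replace(original, translated)
    return result

print(proper_case("1226 SHOE ST."))  # 1226 Shoe ST.
```

Because the list lives in a table (or here, a dict), adding a new word later is a data change, not a code change.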

Since these translations need to be applied to several different columns in several different scripts, we shouldn’t hard code the server reference in each script.  That makes it much harder to maintain and to deploy from test to production (since development isn’t done on production, right?) because each script needs to be updated.  It would be much easier if we could reference the connection managers we define for the whole package.  Fortunately, that’s possible.


First off I’m going to modify the AdventureWorks data so that we have some data we can clean up and create and populate a translation table:

UPDATE [Person].[Address] SET [AddressLine1] = UPPER([AddressLine1]);

--Assumes the custom schema already exists (CREATE SCHEMA custom;)
CREATE TABLE custom.Translations (
    Original VARCHAR(255) PRIMARY KEY,
    Translated VARCHAR(255)
);

INSERT INTO custom.Translations (Original, Translated) VALUES
    ('Ct.', 'CT.'),
    ('Ave.', 'AVE.'),
    ('Pl.', 'PL.'),
    ('Blv.', 'BLVD.'),
    ('Blvd.', 'BLVD.'),
    ('St.', 'ST.'),
    ('Rd.', 'RD.');


Next, create an SSIS package and a connection to AdventureWorks.  I’m using an ADO.NET connection, but OLE DB should work as well.

Connection Manager ADONET

Create an ADO.NET Source to pull the Person.Address table and use the connection manager you defined earlier.

Create Script Component

Now for what we’re here for: the script component.  There are a few things in setting up this component that need to be right for the script to work, so be careful in this section.  Of course, the worst that happens is you’ll need to start again.


Create the script component and select transformation:

select_script_component Select Script Component Type

Edit the Script Component to make sure the ScriptLanguage is correct since you can’t edit that after saving the component.  I’m using VB so I made that change before moving on.

Script Transformation Editor - VB


In the “Input Columns” section select the two columns we want to modify.  Leave them as ReadOnly since we’re going to add new outputs so we can keep them separate from the source data should that be useful in the future.

Script Transformation Editor - Input


Then add the two output columns making sure to change the data type to string.

Script Transformation Editor - Output

And, almost finally, add the connection manager you created earlier.

Script Transformation Editor - Connection Manager

And, finally, go back to the Script section and click Edit Script.  This is where we finally get to work.

Use the Connection Manager

The script has a few sections and most have already been set up for you.  We’re going to be working in the “Input0_ProcessInputRow” section.  Visual Studio makes this pretty clear in its comments:

Edit Script

Here’s the script for my project.

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        'Set up the connection.
        'Here is where we reference the connection manager we initially defined
        Dim connmgr As IDTSConnectionManager100 = Me.Connections.AdventureWorks2012
        'Then get the connection and create the adapter that makes the call
        Dim cn As SqlClient.SqlConnection = DirectCast(connmgr.AcquireConnection(Nothing), SqlClient.SqlConnection)
        Dim sql As String = "SELECT Original, Translated FROM custom.Translations"
        Dim adapter As New SqlClient.SqlDataAdapter(sql, cn)
        'Now we're creating the data table and filling it from the query above.
        Dim translations As New DataTable
        adapter.Fill(translations)
        'Do the initial conversion to proper case, mucking with our abbreviations, putting the results into placeholders
        Dim tmpAddress1 As String = StrConv(Row.AddressLine1, VbStrConv.ProperCase)
        Dim tmpAddress2 As String = StrConv(Row.AddressLine2, VbStrConv.ProperCase)
        'Set up the variables we'll be using to set the abbreviations back to what we want them to be
        Dim i As Integer
        Dim original As String
        Dim translated As String
        Dim current As DataRow
        'And loop through all the rows in the Translations table until we've set everything back
        For i = 0 To translations.Rows.Count - 1
            current = translations.Rows.Item(i)
            'Populate the conversion for the current row
            original = current.Item("Original").ToString
            translated = current.Item("Translated").ToString
            'Update the placeholders
            tmpAddress1 = tmpAddress1.Replace(original, translated)
            tmpAddress2 = tmpAddress2.Replace(original, translated)
        Next
        'Lastly, set the output columns to the placeholders
        Row.properAddress1 = tmpAddress1
        Row.properAddress2 = tmpAddress2
    End Sub

This will get everything in proper case with the exceptions of the abbreviations we defined earlier and make it easy to add translations in the future.

Confirm Results

Finish it up by setting up a destination, then run the package to populate it and compare the results:


Here you can see things translated as desired, including the “BLV” abbreviation getting translated to “BLVD”.  There are some issues with the script, though.  You can see “DR” (no period) got translated to “Dr”, which probably isn’t exactly what’s desired.  That requires a little more work that I’ll get into in an upcoming post.

Script enabling mail notifications

One of the things that we try to do where I work is automate as much of the process of setting up a new server as possible.  This reduces the possibility for user error and speeds up the setup process.  We recently found a little twist in setting up database mail where it partially worked.  Calling the mail procedures went fine but notifications (either on an event like a read error or for a job completion) didn’t work.


Here’s the line of the setup script that should enable mail notifications:

EXEC master.dbo.xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent', N'DatabaseMailProfile', N'REG_SZ', @defaultProfile

Where @defaultProfile is the name of the profile to use.

The fix for this was as follows:

  1. Right click on “SQL Server Agent”
  2. Go to the “Alert System” screen
  3. Uncheck “Enable mail profile”
  4. Click okay then go back in and re-enable the mail profile.

So to find out why the scripted version was going wrong, the first thing I tried was scripting that change out of SSMS.  Since this is supposed to be what SQL runs, that should tell me what I was doing wrong.  Working in the healthcare industry, we’re still deploying mostly SQL 2008 R2 instances and that’s what this script is for.  SQL 2012 makes a different call but, from what I can see, still has the same problem.

EXEC master.dbo.xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent', N'DatabaseMailProfile', N'REG_SZ', N'MailProfileName'

So that didn’t work.  The next step was to take the SQLServerAgent branch of the registry from a server that was fully working and a server that had been set up with the script and compare them.  Doing that I found another registry key that was different:


That key, UseDatabaseMail, needs to be set to 1.  Adding the following to my setup script got notifications working:

EXEC master.dbo.xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent', N'UseDatabaseMail', N'REG_DWORD', 1


I’m not sure why SQL doesn’t script that second registry key setting out but it’s crucial for SQL notifications to work.

T-SQL Tuesday #42! The Long and Winding Road

For those not already familiar, T-SQL Tuesday is a blog party that happens the second Tuesday of every month.  The host rotates each month and this month Wendy Pastrick is hosting.  The topic she has chosen is:

…we all experience change in our work lives. Maybe you have a new job, or a new role at your company. Maybe you’re just getting started and you have a road map to success in mind. Whatever it is, please share it next week, Tuesday May 14th. Make sure you note what technologies you find are key to your interests or successes, and maybe you will inspire someone to look down a road less traveled.

The biggest change that’s happened for me recently is moving to a new job which also had me switching from working for a vendor (an Electronic Health Record vendor) to working for a client (a hospital.)  This also had me changing from being a Database Developer to being a Production DBA.  Both sides of that meant a lot of changes in my work environment.

Sometimes you don’t know how bad it could be

I used to rant about the significant mistakes I saw in the product I worked on.  Some things just shouldn’t happen.  Now, I still think they shouldn’t happen, but I realize how much worse things could be.  I’m realizing how many companies miss the basic stuff that my former employer got right: avoiding hard coded passwords, following clients’ change control processes, and knowing what an integrity check is and why it’s needed.  And compared to one product we have that uses a GUID stripped of the dashes for row identity, stores both that and the actual value in each table, and has implicit conversions all over the place preventing index seeks, my former product looks almost spotless.  If you’re working as a developer some place and are concerned about the quality of your database, keep up the good fight but realize things could be much, much worse.

Different Problems

The problems I’m working on now are very different from the ones I worked on when I worked for the vendor.  If an application is returning data that it shouldn’t the problem doesn’t even make it to me in most cases.  Performance problems may either go straight to the vendor or make a stop at my desk after which my findings typically go to the vendor to validate.  But while I’m less involved in those I have a new set of problems to deal with.  How do I ensure we can recover any database if we’re called to?  Who needs to be contacted to do what testing before we apply a SQL patch?  Why did that SQL instance crash?  I don’t have fewer problems now.  If anything, I have more because instead of being responsible for the DB code for a large product I’m responsible for 150+ servers and all the maintenance, security, performance, and reliability issues that come with those.

Different Methods

In my previous job if something was broken in the database I fixed it.  If there was a bug in the logic I changed it.  If code was inefficient I optimized it.  Performance could be fixed by adding an index or changing the code.  There were also times when I helped clients track down performance issues specific to their configuration.  Now what I deal with is much more limited.  I deal with performance issues caused by regular DB maintenance by trying to schedule it during low-use times (not always 10pm since we’re a hospital), blocking due to a long-running report or load, and recommending index and code changes to vendors.  Whereas before I would occasionally restructure entire stored procs to fix an issue, now I don’t delve into things that far.  And in some cases I don’t have access to much code at all because it’s all ad hoc SQL.

Different Tools

Before I was focused almost 100% on issues with the code.  Now I’m focused much more on the system than the code.  I still use Profiler and query plans to troubleshoot but I find myself relying much more on sp_WhoIsActive, sp_blitz, and Idera’s SQLdm which we use for monitoring.  These tools tend to be much more useful with the problems I’m trying to solve now.

In The End

I’ve been very happy with the change.  A big part of why I left my vendor position was because there was a whole world of experience that I didn’t have access to.  One that all of my clients dealt with every day.  Now I’ve experienced that world for a little over two years.  I’ve learned more about optimal disk layout, how to automate deployments, how to configure security, and a lot more.  I’ve also been working with more SQL components (my SSIS knowledge is still basic but it would have been almost nil before) and even some stuff outside SQL (Powershell is my friend).  There are things I miss.  I miss stepping through a trace to find out why an SP didn’t execute as expected.  I miss reworking code to cut the execution time.  Most of the problems I work on now don’t have the depth that the coding problems did and I do miss that.  However, there are some that do (such as automating restoring databases all the way from pulling out of TSM to doing an integrity check and sending a report) and that makes me happy.  It’s been a good change and I’m looking forward to all that I still have to learn as a production DBA.

Estimating Remaining Query Time

Some queries report a percent complete value in the sys.dm_exec_requests DMV.  While this is useful for getting a ballpark of how long the query will take to finish, doing this kind of math in your head isn’t always easy.  Fortunately it’s possible to do all sorts of fun math in SQL, so you can put it there to get the estimated query time remaining.

Get that Query Time

This can’t be used for all queries; only a subset of queries track progress like this.  The full list can be found on the BOL page for sys.dm_exec_requests, but the ones I use it for most are backups, restores, and integrity checks, so I know when I need to check back in.

Now, this won’t be a perfect estimate.  Obviously variations in server, network, or disk load can change how quickly the operation completes.  Other things may not be quite so obvious.  For example, when restoring a file the amount of time needed to allocate the space on disk isn’t counted.  This can cause the estimated time remaining to be greater than it will actually take.
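The underlying arithmetic is just a ratio.  Here’s a quick Python sketch of the same estimate: elapsed time divided by the fraction complete gives an estimated total, and subtracting the elapsed time gives the remainder.

```python
def estimate_times(elapsed_seconds: float, percent_complete: float):
    """Return (estimated_total, estimated_remaining) in seconds,
    given elapsed time and a percent_complete value from 0-100."""
    total = elapsed_seconds / (percent_complete / 100.0)
    return total, total - elapsed_seconds

# 30 minutes in at 40% done: roughly 75 minutes total, 45 minutes left.
total, remaining = estimate_times(1800, 40.0)
print(total, remaining)  # 4500.0 2700.0
```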

Since I use Adam Machanic’s sp_WhoIsActive I based the code off that.  And since I’m only tracking one query at a time I didn’t worry about handling multiple rows.

create table #t (
    runtime varchar(max),
    percent_complete numeric(8, 5),
    sql_text xml,
    wait_info varchar(max)
)
exec FAHC_DBA.dbo.usp_WhoIsActive @output_column_list = '[dd%][percent_complete][sql_text][wait_info]',
    @destination_table = '#t'
declare @runtime time
select @runtime = convert(time, right(runtime, 12)) from #t where percent_complete is not null
select total_estimated =
        (datepart(ss, @runtime) + (datepart(mi, @runtime) * 60) + (datepart(hh, @runtime) * 60 * 60))
        / (percent_complete / 100),
    remaining_estimated =
        (datepart(ss, @runtime) + (datepart(mi, @runtime) * 60) + (datepart(hh, @runtime) * 60 * 60))
        / (percent_complete / 100)
        - (datepart(ss, @runtime) + (datepart(mi, @runtime) * 60) + (datepart(hh, @runtime) * 60 * 60))
from #t
where percent_complete is not null
drop table #t

Writing Checksums by Reindexing

I recently came across a couple databases that didn’t have any page verify option set.  Gotta love vendors.  Anyway, I know that turning on the CHECKSUM page verify option doesn’t immediately put a checksum on each page.  Rather, the checksum is written the next time the page is written to disk.  This can lead to a gap where some pages never get a checksum written to them.  Since there’s no “DBCC CREATE_CHECKSUM” command, we need to look for another way to write the checksum for tables that aren’t modified often.

Verifying Checksum

Before we do anything else we need to know how to verify whether or not there’s a checksum on the page.  To do that I used the method Colleen Morrow used when she discussed switching from torn page to checksum page verification.  As she shows, the checksum is stored in the m_tornBits field of the page header instead of a separate field.  We’ll be using that to confirm when the checksum is written.


We’re going to create a database with page verification turned off and create a clustered table.  Finally, we’re going to insert some data and do a checkpoint to make sure all the data is written to disk.

CREATE DATABASE ChecksumTest;
ALTER DATABASE ChecksumTest SET PAGE_VERIFY NONE;

USE ChecksumTest;

CREATE TABLE dbo.clusteredTable (
    id INT IDENTITY(1,1) NOT NULL, --assumed identity key to match the PK_ClusteredTable index used later
    name VARCHAR(255) NOT NULL,
    CONSTRAINT PK_ClusteredTable PRIMARY KEY CLUSTERED (id)
);

INSERT INTO dbo.clusteredTable (name)
SELECT TOP 1000 b.name
FROM sys.objects a
CROSS APPLY sys.objects b;

CHECKPOINT;

Now that the data is there, we should confirm that there isn’t a checksum on the page already.  First we’ll set trace flag 3604 so we can see the output of the DBCC commands in the query window, then we’ll use DBCC IND to see what pages are in the table.  Lastly, we’ll use DBCC PAGE to look at a page itself.

DBCC TRACEON (3604);
DBCC IND ('ChecksumTest', 'clusteredTable', 1);
/* DBCC IND Results:
PageFID PagePID     IAMFID IAMPID      ObjectID    IndexID     PartitionNumber PartitionID          iam_chain_type       PageType IndexLevel NextPageFID NextPagePID PrevPageFID PrevPagePID
------- ----------- ------ ----------- ----------- ----------- --------------- -------------------- -------------------- -------- ---------- ----------- ----------- ----------- -----------
1       154         NULL   NULL        2105058535  1           1               72057594038779904    In-row data          10       NULL       0           0           0           0
1       153         1      154         2105058535  1           1               72057594038779904    In-row data          1        0          1           156         0           0
1       155         1      154         2105058535  1           1               72057594038779904    In-row data          2        1          0           0           0           0
1       156         1      154         2105058535  1           1               72057594038779904    In-row data          1        0          1           157         1           153
1       157         1      154         2105058535  1           1               72057594038779904    In-row data          1        0          1           158         1           156
1       158         1      154         2105058535  1           1               72057594038779904    In-row data          1        0          0           0           1           157
(6 row(s) affected)
--Picking a random page type 1 row from above
DBCC PAGE (ChecksumTest, 1, 156, 1);

And here is the relevant part of the results from the DBCC PAGE call:

PAGE: (1:156)


Page @0x0000000085C2C000

m_pageId = (1:156)                   m_headerVersion = 1                  m_type = 1
m_typeFlagBits = 0x4                 m_level = 0                          m_flagBits = 0x0
m_objId (AllocUnitId.idObj) = 27     m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594039697408
Metadata: PartitionId = 72057594038779904                                 Metadata: IndexId = 1
Metadata: ObjectId = 2105058535      m_prevPage = (1:153)                 m_nextPage = (1:157)
pminlen = 8                          m_slotCnt = 313                      m_freeCnt = 13
m_freeData = 7553                    m_reservedCnt = 0                    m_lsn = (21:184:538)
m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0
m_tornBits = 0

Notice that the m_tornBits value is 0, meaning there’s no checksum written to this page.

Write that Checksum

Now we’re going to enable the CHECKSUM page verify option and see what we can do to get a checksum written to the page.  First we’re going to try an index reorg.  We’re also going to do a checkpoint after the reorg just to make absolutely sure that there’s nothing just floating around in memory.

ALTER DATABASE ChecksumTest SET PAGE_VERIFY CHECKSUM;

ALTER INDEX PK_ClusteredTable ON clusteredTable REORGANIZE;
CHECKPOINT;

DBCC IND ('ChecksumTest', 'clusteredTable', 1);
--The output is the same as above so we check the same page.
DBCC PAGE (ChecksumTest, 1, 156, 1);

And that same section of DBCC PAGE from above:

PAGE: (1:156)


Page @0x0000000085C2C000

m_pageId = (1:156)                   m_headerVersion = 1                  m_type = 1
m_typeFlagBits = 0x4                 m_level = 0                          m_flagBits = 0x0
m_objId (AllocUnitId.idObj) = 27     m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594039697408
Metadata: PartitionId = 72057594038779904                                 Metadata: IndexId = 1
Metadata: ObjectId = 2105058535      m_prevPage = (1:153)                 m_nextPage = (1:157)
pminlen = 8                          m_slotCnt = 313                      m_freeCnt = 13
m_freeData = 7553                    m_reservedCnt = 0                    m_lsn = (21:184:538)
m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0
m_tornBits = 0

m_tornBits is still 0, so that didn’t do it.  There may be cases where a reorg does write checksums, such as when the pages are actually out of order, but this confirms it won’t always happen.  Let’s see what happens when we do a rebuild instead of a reorg.  Again, we do a checkpoint just to make sure everything gets to disk and verify the page numbers with DBCC IND.

ALTER INDEX PK_ClusteredTable ON clusteredTable REBUILD;
CHECKPOINT;
DBCC IND ('ChecksumTest', 'clusteredTable', 1);

If we look at the output of DBCC IND this time we see a different output:

PageFID PagePID     IAMFID IAMPID      ObjectID    IndexID     PartitionNumber PartitionID          iam_chain_type       PageType IndexLevel NextPageFID NextPagePID PrevPageFID PrevPagePID
------- ----------- ------ ----------- ----------- ----------- --------------- -------------------- -------------------- -------- ---------- ----------- ----------- ----------- -----------
1       168         NULL   NULL        2105058535  1           1               72057594038845440    In-row data          10       NULL       0           0           0           0
1       159         1      168         2105058535  1           1               72057594038845440    In-row data          1        0          1           169         0           0
1       169         1      168         2105058535  1           1               72057594038845440    In-row data          1        0          1           171         1           159
1       170         1      168         2105058535  1           1               72057594038845440    In-row data          2        1          0           0           0           0
1       171         1      168         2105058535  1           1               72057594038845440    In-row data          1        0          1           172         1           169
1       172         1      168         2105058535  1           1               72057594038845440    In-row data          1        0          0           0           1           171

All the page numbers are different.  This is an indication we’re probably going to see a checksum.  Again, picking a random page with a page type of 1:

DBCC PAGE (ChecksumTest, 1, 171, 1);

We now see that m_tornBits has a value indicating that the checksum has been written.

PAGE: (1:171)


Page @0x0000000085AA0000

m_pageId = (1:171)                   m_headerVersion = 1                  m_type = 1
m_typeFlagBits = 0x0                 m_level = 0                          m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 28     m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594039762944
Metadata: PartitionId = 72057594038845440                                 Metadata: IndexId = 1
Metadata: ObjectId = 2105058535      m_prevPage = (1:169)                 m_nextPage = (1:172)
pminlen = 8                          m_slotCnt = 303                      m_freeCnt = 23
m_freeData = 7563                    m_reservedCnt = 0                    m_lsn = (22:16:48)
m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0
m_tornBits = 1602034681


This isn’t a low-intensity way to get checksums written to each page, but it is something that can be done one index at a time and can have side benefits as well.  Care should be taken when using it, such as rebuilding online where available or scheduling offline rebuilds appropriately, but it’s a useful tool to make sure that, after switching to CHECKSUM page verification, the checksum actually gets written to rarely modified pages.

Powershell – Validating Parameters

To test the full length of the restore process, I’ve been working on a Powershell script to automate restoring a number of our databases to a test box.  To make the script more flexible I’ve used parameters, so if our environment changes over time it’s easy to change what the script does to accommodate.  Those parameters need to be validated before the script runs to make sure the input can be used, and that’s what I’ll be covering here.

The Parameters

The parameters that I’m using are:
$numberDBs #This is the number of databases to restore.  They'll be restored in sequence so we don't want to be too restrictive but if someone wants to automatically restore more than 20 databases I'm going to assume they typoed something
$supportLevel #We mark all of our databases with a support level they get and there's a discrete list of possible values.  This will be used to restrict what servers we use to test.
$targetServerInstance #This can be any valid instance but for the purposes of this post we're just going to validate that there's a string that's not too long to be a server name
$targetDataPath #We want to confirm that this is a valid, local Windows path
$targetLogPath #Same as above
We’re going to start out by defining our parameter block and getting all the parameters defined.  For the demo script I’m just going to output the parameters:
param([Parameter(Position=0)] [string]$numberDBs,
    [Parameter(Position=1)] [string]$supportLevel,
    [Parameter(Position=2)] [string]$targetServerInstance,
    [Parameter(Position=3)] [string]$targetDataPath,
    [Parameter(Position=4)] [string]$targetLogPath
)
Write-Output "Number of databases: $numberDBs"
Write-Output "Support Level: $supportLevel"
Write-Output "Target Instance: $targetServerInstance"
Write-Output "Data Path: $targetDataPath"
Write-Output "Log Path: $targetLogPath"
I saved this script into my “C:\Scratch” folder as “ParameterDemo.ps1” so I can run it with the following command:
C:\Scratch\ParameterDemo.ps1 -numberDBs 5 -supportLevel "BackupsOnly" -targetServerInstance "servername" -targetDataPath "T:\Data" -targetLogPath "T:\Data"
And here’s the output:
Since I specified positions for each parameter I can also call them without the name and just use the position.  But what if I put things in the wrong order?

I feel Validated

Using Powershell’s parameter validation we can check all of this with very little code.  Let’s start with the number of databases we want to restore.  We’re dealing with a number here, so we’re looking for the ValidateRange attribute.  We don’t want the script to run if we’re restoring fewer than 1 database, and we don’t want to do more than 20, so here’s what we modify that line to:
[Parameter(Position=0)][ValidateRange(1,20)] [string]$numberDBs,
Making that change and rerunning the script (still with the parameters in the wrong order) gives us an error saying it can’t validate the string:
Rearranging the parameters so we’re putting the number of databases where it belongs (but typoing the number) we get an error more along the lines of what we expect:
Cannot validate argument on parameter 'numberDBs'. The 25 argument is greater than the maximum allowed range of 20. Supply an argument  that is less than or equal to 20 and then try the command again.
So let’s fix that and put in the validation for the support level.
We know we’re looking at a discrete set of values here so we’re going to use the ValidateSet attribute.  There may be a way to query these out of the database but we’re keeping things simple for now.
[Parameter(Position=1)][ValidateSet("NoAccess","NoResponsibilities","BackupsOnly","Monitoring","Troubleshooting")] [string]$supportLevel,
Rerunning the fixed script we get a new error:
Cannot validate argument on parameter 'supportLevel'. The argument "servername" does not belong to the set  "NoAccess,NoResponsibilities,BackupsOnly,Monitoring,Troubleshooting" specified by the ValidateSet attribute. Supply an argument that is in the set and then try the  command again.
So let’s fix that and put in the rest of the validation.
param([Parameter(Position=0)][ValidateRange(1,20)] [string]$numberDBs,
[Parameter(Position=1)][ValidateSet("NoAccess","NoResponsibilities","BackupsOnly","Monitoring","Troubleshooting")] [string]$supportLevel,
[Parameter(Position=2)][ValidateLength(1,255)] [string]$targetServerInstance,
[Parameter(Position=3)][ValidatePattern('[A-Za-z]:\\[^/:*?"<>|]*')] [string]$targetDataPath,
[Parameter(Position=4)][ValidatePattern('[A-Za-z]:\\[^/:*?"<>|]*')] [string]$targetLogPath
)
Write-Output "Number of databases: $numberDBs"
Write-Output "Support Level: $supportLevel"
Write-Output "Target Instance: $targetServerInstance"
Write-Output "Data Path: $targetDataPath"
Write-Output "Log Path: $targetLogPath"
There’s just one more validation I want to test; I’ll leave testing the length of the server name up to you if you want to see what that does.  What I want to test is the regex for the data and log paths.  What if someone tries to put a UNC path in for one of those:
 Cannot validate argument on parameter 'targetDataPath'. The argument "\\fileserver\dbaspace\" does not match the  "[A-Za-z]:\\[^/:*?"<>|]*" pattern. Supply an argument that matches "[A-Za-z]:\\[^/:*?"<>|]*" and try the command again.
As someone else pointed out, that’s not the most intuitive error if you’re an end user.  In my case, since this will be a DBA-only script, I’m not too worried.  However, if you’re coding something that end users will be running you may want to consider implementing more clear error messages.
There could very well be more validation done on the input.  For example, making sure the $targetServerInstance exists and that the paths exist on that target.  It may be possible to put that logic in the framework shown in the “more clear error messages” link, but you could also do it elsewhere in the script.  For many applications the validate attributes will do the bulk of the validation for very little effort and save you time.
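One way to go beyond the built-in attributes is ValidateScript, which runs an arbitrary check against the argument.  Here’s a sketch, using the same parameter name as the script above, that verifies the data path exists.  One caveat: Test-Path runs where the script runs, not on the target server, so this is only a partial answer for a remote restore.

```powershell
param(
    # Reject the argument unless it points at an existing folder on the local machine
    [Parameter(Position=3)]
    [ValidateScript({Test-Path $_ -PathType Container})]
    [string]$targetDataPath
)
Write-Output "Data Path: $targetDataPath"
```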

CheckDB Exiting with Error State 6

A firetruck crashing just like checkdb

Last week I talked about how I automated DB restores and integrity checks to validate our restore process.  One of the errors I ran into was this one:

Executed as user: DOMAIN\user. A job step received an error at line 104 in a PowerShell script. The corresponding line is '$database.CheckTables("None", "AllErrorMessages,NoInformationMessages,TableLock","DataPurity")'. Correct the script and reschedule the job. The error information returned by PowerShell is: 'Exception calling "CheckTables" with "3" argument(s): "Check tables failed for Database 'restoredDatabase'. " '.  Process Exit Code -1.  The step failed.

Looking into the error log I found this:

DBCC CHECKDB (restoredDatabase) WITH all_errormsgs, no_infomsgs, tablock, data_purity executed by DOMAIN\user terminated abnormally due to error state 6. Elapsed time: 0 hours 10 minutes 1 seconds.


The first thing I did was run to my good friend SQL Books Online.  And came up short.  If you look for “Understanding DBCC Error Messages” on that page you’ll see that there isn’t an error state 6 listed, so I had no official documentation to go off of.  I knew the database wasn’t corrupt and that the process worked on a smaller database.  Taking a look at the error message again I noticed it had been failing right around the 10 minute mark each time, which points at a timeout.  I ran a CheckDB on it manually and, after I cancelled the command, I saw the same message in the error log.  So I had discovered that error state 6 means the client got bored and cancelled the integrity check.
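If you want to see where the 10 minutes comes from, the SMO connection exposes the timeout directly; as I understand it, the default StatementTimeout is 600 seconds:

```powershell
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMO") | Out-Null
$server = New-Object Microsoft.SqlServer.Management.Smo.Server "localhost"
# The default StatementTimeout (in seconds) lines up with CheckTables dying at the 10-minute mark
$server.ConnectionContext.StatementTimeout
```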

The Fix

Now that I knew I was dealing with a timeout I just had to rework the Powershell script to bypass it.  Here is what my code looked like when I started:

$server = new-object ("Microsoft.SqlServer.Management.Smo.Server") $targetServerInstance
$database = $server.Databases | Where-Object -Property name -EQ -Value $targetDBName
$database.CheckTables("None", "AllErrorMessages,NoInformationMessages,TableLock","DataPurity")

There isn’t a way to change the timeout on the smo.server object so we need to add another step and create a server connection object where we can change the timeout.  We do that and here’s what we end up with:

$serverConn = new-object Microsoft.SqlServer.Management.Common.ServerConnection($targetServerInstance)
$serverConn.StatementTimeout = 0
$server = new-object ("Microsoft.SqlServer.Management.Smo.Server") $serverConn
$database = $server.Databases | Where-Object {$_.name -EQ $targetDBName}
$database.CheckTables("None", "AllErrorMessages,NoInformationMessages,TableLock","DataPurity")


As with a lot of errors, the trick to troubleshooting is looking into each detail until it makes sense.  Even where documentation is lacking there’s usually enough information to at least start covering the gap.

T-SQL Tuesday #040: File and Filegroup Wisdom


It’s time for another T-SQL Tuesday.  This was a hard theme for me to come up with a topic for.  Midnight DBA is hosting this month and the theme is File and Filegroup Wisdom.  The first thing I could think of I had already written about, so that would be cheating.  One of the things I do mess around with a fair amount is Powershell, so I thought it would be interesting to dig into how Powershell treats filegroups and files.  I’m only going to scratch the surface here, but what I do cover will give you the base to explore on your own.

Pulling the groups

The first thing to do is to pull the data about the filegroups.  To start off I’m just going to output all the data about the filegroups to the screen and format it in a way that’s a little easier to read:

[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMO") | Out-Null
$serverName = 'localhost'
$server = New-Object("Microsoft.SqlServer.Management.Smo.Server") $serverName
$db = $server.Databases | Where-Object {$_.name -eq 'AdventureWorks'}
$db.FileGroups | Format-Table

This will list all the file groups in the database and some of the basic information about them.  A couple of the fields, namely Files and Properties, can hold multiple pieces of information, which makes them harder to read, but the name and basic settings, such as whether or not the group is read-only, come through clearly.
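If the default table is too busy, you can also pick out just the properties you care about (assuming the same $db from above):

```powershell
$db.FileGroups | Format-Table Name, IsDefault, ReadOnly -AutoSize
```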

Getting more properties

Most of the properties of the file group are visible here, but if we dig into the Properties collection we do see one more.  We’re going to reference the first file group in the array, at index 0, and list its properties the same way we listed the file groups above.

$db.FileGroups[0].Properties | Format-Table

The last property is PolicyHealthState so if you use policies to manage your system here is one place to get that info.

Digging into Files

Heading back up let’s take a look at the files in the group.  When working with collections like this Powershell makes it easy to iterate through each item to get information.  Here’s how we can view all the files in each file group:

$i = $db.FileGroups.Count - 1
$db.FileGroups[0..$i].Files | Format-Table

The Parent property even has the name of the file group each file belongs to.  If you’re looking for the count of files in each file group, we need to use a more traditional foreach loop.

foreach ($fg in $db.FileGroups) {
    $fileCount = $fg.Files.Count
    Write-Output "$($fg.Name) $fileCount"
}

That’s all for now

That’s all I’m going to cover here, but there’s a lot more to dig into.  For example, you can use the same Properties trick above to get more information on any individual file.  There are also a number of methods you can call on both filegroups and files to work with them.
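As one example of those methods, here’s a sketch of creating a new filegroup and adding a file to it through SMO; the names and path are made up for illustration:

```powershell
# Create a new filegroup on the database, then add a data file to it
$fg = New-Object Microsoft.SqlServer.Management.Smo.FileGroup($db, "Archive")
$fg.Create()
$file = New-Object Microsoft.SqlServer.Management.Smo.DataFile($fg, "Archive_01")
$file.FileName = "T:\Data\Archive_01.ndf"
$file.Create()
```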

SQLPS has Trust Issues – AuthorizationManager check failed

Have you ever known anyone that would refuse to trust a perfectly reputable source?  Well, that’s the experience I’ve been having with SQLPS lately.  For anyone that isn’t aware, SQLPS is a Powershell utility made for SQL Server.  It’s rather useful and we’ve had nightly environment checks using Powershell scripts running for a while.  I’ve been working on new scripts to automate restoring backups from tape and running an integrity check to confirm everything restores with no issues.  What’s different about these is that instead of having the entire script locally on the monitoring server I want to keep them in a network share so we can call the functions as desired as well.

And that’s where I ran into the issue.  We’re going to pick up after I upgraded to SQL 2012 (which wasn’t necessary, but I didn’t need much of a reason to upgrade).  The first issue I ran into is that the Powershell execution policy had reverted to RemoteSigned.  To handle that I had to edit the registry, which seems a bit extreme.  Under SQL 2008 R2 I had been able to run the set-executionpolicy command within a script.  But as long as it works, who am I to complain?
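For reference, SQLPS keeps its execution policy under its own ShellId in the registry.  The exact ShellId name varies by SQL version, so treat this as a sketch and verify what actually sits under your own ShellIds key:

```powershell
# ShellId name is version-dependent -- check HKLM:\SOFTWARE\Microsoft\PowerShell\1\ShellIds on your box
$shellId = "Microsoft.SqlServer.Management.PowerShell.sqlps"
Set-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\PowerShell\1\ShellIds\$shellId" `
    -Name ExecutionPolicy -Value RemoteSigned
```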
Once I handled that I started getting the following error:
AuthorizationManager check failed
The line that was erroring was “dot sourcing” a script file from a UNC path.  Dot sourcing lets you use functions in that script file later in the script.  Most of the advice I read said to change the execution policy, which I had already done.  After much digging I came across a thread that referenced files downloaded from the Internet.  That got me thinking: what if there was still an issue based on the location?  I RDP’ed into the box as the SQL Agent service account and got this:
The error behind the error
Powershell was allowing me to run the file (so the execution policy was in place) but was prompting me to approve it.  This is likely what was causing the error in SQL Agent.  Following this I opened IE and went to Internet Options -> Security and added the root of the UNC path to the trusted sites.
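The IE dialog is just writing to the ZoneMap keys in the registry, so the same change can be scripted.  Here’s a sketch, assuming the UNC host is “fileserver” and using zone 2 (Trusted sites):

```powershell
# Map file://fileserver into the Trusted sites zone (2) for the current user
$key = "HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ZoneMap\Domains\fileserver"
New-Item -Path $key -Force | Out-Null
Set-ItemProperty -Path $key -Name "file" -Value 2 -Type DWord
```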
Running the script again it ran without an issue and, much to my enjoyment, when I ran it via Powershell it ran perfectly.  Okay, so it started running into other errors I didn’t get when running in the normal Powershell environment but I’ll cover those a different day.
This isn’t something I would do right off the bat because it does depend on your security settings.  On another test machine I have, a desktop instead of a server, I didn’t have to make this change.  However, if you’re getting the error message above this will likely do the trick.