Script enabling mail notifications

One of the things that we try to do where I work is automate as much of the process of setting up a new server as possible.  This reduces the possibility of user error and speeds up the setup process.  We recently found a little twist in setting up Database Mail where it only partially worked.  Calling the mail procedures went fine, but notifications (either on an event like a read error or for a job completion) didn't work.

SSMS Lies

Here’s the line of the setup script that should enable mail notifications:

EXEC master.dbo.xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent', N'DatabaseMailProfile', N'REG_SZ', @defaultProfile

Where @defaultProfile is the name of the profile to use.

The fix for this was as follows:

  1. Right click on “SQL Server Agent” and select “Properties”
  2. Go to the “Alert System” page
  3. Uncheck “Enable mail profile”
  4. Click OK, then go back in and re-enable the mail profile.

So to find out what the scripted version was doing wrong, the first thing I tried was scripting that change from SSMS.  Since that's supposed to be what SQL runs, it should tell me what I'm doing wrong.  Working in the healthcare industry we're still deploying mostly SQL 2008 R2 instances and that's what this script is for.  SQL 2012 makes a different call but, from what I can see, still has the same problem.

EXEC master.dbo.xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent', N'DatabaseMailProfile', N'REG_SZ', N'MailProfileName'

So that didn't work.  The next step was to take the SQLServerAgent branch of the registry from a server that was fully working and a server that had been set up with the script and compare them.  Doing that I found another registry value that was different:

HKLM\SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent\UseDatabaseMail

That value needs to be set to 1.  Adding the following to my setup script got notifications working:

EXEC master.dbo.xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent', N'UseDatabaseMail', N'REG_DWORD', 1

 

I'm not sure why SSMS doesn't script out that second registry value, but it's crucial for SQL Agent notifications to work.
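
For reference, here's how the two registry writes might sit together in a setup script; this is just a sketch based on the calls above, with the profile name as a placeholder.

-- Sketch: both registry values SQL Agent needs for mail notifications.
DECLARE @defaultProfile sysname = N'MailProfileName';  -- placeholder profile name

-- Point SQL Agent at the Database Mail profile (this is the part SSMS does script).
EXEC master.dbo.xp_instance_regwrite N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent',
    N'DatabaseMailProfile', N'REG_SZ', @defaultProfile;

-- The value SSMS doesn't script out: tell SQL Agent to actually use Database Mail.
EXEC master.dbo.xp_instance_regwrite N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent',
    N'UseDatabaseMail', N'REG_DWORD', 1;

Keep in mind SQL Agent reads these settings at startup, so a restart of the Agent service may be needed before notifications start flowing.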

T-SQL Tuesday #42! The Long and Winding Road

For those not already familiar, T-SQL Tuesday is a blog party that happens the second Tuesday of every month.  The host rotates each month, and this month Wendy Pastrick is hosting.  The topic she has chosen is:

…we all experience change in our work lives. Maybe you have a new job, or a new role at your company. Maybe you’re just getting started and you have a road map to success in mind. Whatever it is, please share it next week, Tuesday May 14th. Make sure you note what technologies you find are key to your interests or successes, and maybe you will inspire someone to look down a road less traveled.

The biggest change that's happened for me recently is moving to a new job, which also had me switching from working for a vendor (an Electronic Health Record vendor) to working for a client (a hospital).  This also had me changing from being a Database Developer to being a Production DBA.  Both sides of that meant a lot of changes in my work environment.

Sometimes you don’t know how bad it could be

I used to rant about the significant mistakes I saw in the product I worked on.  Some things just shouldn't happen.  I still think they're things that shouldn't happen, but now I realize how much worse it could be.  I'm realizing how many companies miss the basics that my old product got right: not hard coding passwords, following clients' change control processes, and knowing what an integrity check is and why it's needed.  And compared to one product we have that uses a GUID stripped of its dashes for row identity, stores both that and the actual value in each table, and has implicit conversions all over the place preventing index seeks, my former product looks almost spotless.  If you're working as a developer somewhere and are concerned about the quality of your database, keep up the good fight but realize things could be much, much worse.

Different Problems

The problems I'm working on now are very different from the ones I worked on at the vendor.  If an application is returning data that it shouldn't, the problem doesn't even make it to me in most cases.  Performance problems may either go straight to the vendor or make a stop at my desk, after which my findings typically go to the vendor to validate.  But while I'm less involved in those, I have a new set of problems to deal with.  How do I ensure we can recover any database if we're called to?  Who needs to be contacted to do what testing before we apply a SQL patch?  Why did that SQL instance crash?  I don't have fewer problems now.  If anything, I have more, because instead of being responsible for the DB code for a large product I'm responsible for 150+ servers and all the maintenance, security, performance, and reliability issues that come with them.

Different Methods

In my previous job, if something was broken in the database I fixed it.  If there was a bug in the logic I changed it.  If code was inefficient I optimized it.  Performance could be fixed by adding an index or changing the code.  There were also times when I helped clients track down performance issues specific to their configuration.  Now what I deal with is much more limited.  I handle performance issues caused by regular DB maintenance by scheduling it during low-use times (not always 10pm, since we're a hospital), deal with blocking due to a long-running report or load, and recommend index and code changes to vendors.  Whereas before I would occasionally restructure entire stored procs to fix an issue, now I don't delve into things that far.  And in some cases I don't have access to much code at all because it's all ad hoc SQL.

Different Tools

Before, I was focused almost 100% on issues with the code.  Now I'm focused much more on the system than the code.  I still use Profiler and query plans to troubleshoot, but I find myself relying much more on sp_WhoIsActive, sp_blitz, and Idera's SQLdm, which we use for monitoring.  These tools tend to be much more useful for the problems I'm trying to solve now.

In The End

I’ve been very happy with the change.  A big part of why I left my vendor position was because there was a whole world of experience that I didn’t have access to.  One that all of my clients dealt with every day.  Now I’ve experienced that world for a little over two years.  I’ve learned more about optimal disk layout, how to automate deployments, how to configure security, and a lot more.  I’ve also been working with more SQL components (my SSIS knowledge is still basic but it would have been almost nil before) and even some stuff outside SQL (Powershell is my friend).  There are things I miss.  I miss stepping through a trace to find out why an SP didn’t execute as expected.  I miss reworking code to cut the execution time.  Most of the problems I work on now don’t have the depth that the coding problems did and I do miss that.  However, there are some that do (such as automating restoring databases all the way from pulling out of TSM to doing an integrity check and sending a report) and that makes me happy.  It’s been a good change and I’m looking forward to all that I still have to learn as a production DBA.

Estimating Remaining Query Time

There are some queries that report a percent complete value in the sys.dm_exec_requests DMV.  While this is useful for getting a ballpark idea of how long the query will take to finish, doing this kind of math in your head isn't always easy.  Fortunately it's possible to do all sorts of fun math in SQL, so you can put it in there and get the estimated query time remaining.

Get that Query Time

This can't be used for all queries; only a subset of commands report their progress like this.  The full list can be found on the BOL page for sys.dm_exec_requests, but the ones I use it for the most are backup, restore, and CHECKDB so I know when I need to check back in.

Now, this won't be a perfect estimate.  Obviously variations in server, network, or disk load can change how quickly the operation completes.  Other things may not be quite so obvious.  For example, when restoring a file the amount of time needed to allocate the space on disk isn't counted.  This can cause the estimated time remaining to be greater than the time it will actually take.
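
If you don't have sp_WhoIsActive handy, the same idea works straight off the DMV.  Here's a minimal sketch, using percent_complete and total_elapsed_time (which is in milliseconds):

-- Sketch: estimated minutes remaining for commands that report progress.
SELECT r.session_id,
       r.command,
       r.percent_complete,
       r.total_elapsed_time / 60000.0 AS elapsed_minutes,
       -- scale the elapsed time up to 100% and subtract what has already passed
       (r.total_elapsed_time / NULLIF(r.percent_complete, 0) * 100.0
            - r.total_elapsed_time) / 60000.0 AS estimated_minutes_remaining
FROM sys.dm_exec_requests r
WHERE r.percent_complete > 0;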

Since I use Adam Machanic’s sp_WhoIsActive I based the code off that.  And since I’m only tracking one query at a time I didn’t worry about handling multiple rows.

create table #t (
    runtime varchar(max),
    percent_complete numeric(8, 5),
    sql_text xml,
    wait_info varchar(max)
)

exec FAHC_DBA.dbo.usp_WhoIsActive @output_column_list = '[dd%][percent_complete][sql_text][wait_info]',
    @destination_table = '#t'

declare @runtime time
select @runtime = convert(time, right(runtime, 12)) from #t where percent_complete is not null

-- elapsed time in seconds
declare @elapsed int = datepart(ss, @runtime) + (datepart(mi, @runtime) * 60) + (datepart(hh, @runtime) * 60 * 60)

select total_estimated = (@elapsed / (percent_complete / 100)) / 60,
       remaining_estimated = ((@elapsed / (percent_complete / 100)) - @elapsed) / 60,
       *
from #t
where percent_complete is not null

drop table #t

Writing Checksums by Reindexing

I recently came across a couple databases that didn't have any page verification option set.  Gotta love vendors.  Anyway, I know that turning on the CHECKSUM page verify option doesn't put a checksum on each page right away.  Rather, the checksum is written the next time the page is written to disk.  This can lead to a gap where some pages never get a checksum written to them.  Since there's no "DBCC CREATE_CHECKSUM" command, we need to look for another way to write the checksum for tables that aren't modified often.
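
As a quick aside, an easy way to spot databases in this state in the first place is to check sys.databases; something along these lines works on SQL 2005 and up:

-- Databases whose pages aren't being protected with checksums.
SELECT name, page_verify_option_desc
FROM sys.databases
WHERE page_verify_option_desc <> 'CHECKSUM';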

Verifying Checksum

Before we do anything else we need to know how to verify whether or not there's a checksum on the page.  To do that I used the method Colleen Morrow used when she discussed switching from torn page detection to checksum page verification.  As she shows, the checksum is stored in the m_tornBits field of the page header rather than in a separate field.  We'll be using that to confirm when the checksum is written.

Setup

We're going to create a database, set its page verify option to NONE, and create a table with a clustered index.  Finally, we're going to insert some data and issue a checkpoint to make sure all the data is written to disk.

CREATE DATABASE ChecksumTest;
GO

ALTER DATABASE ChecksumTest
SET PAGE_VERIFY NONE;
GO

USE ChecksumTest

CREATE TABLE dbo.clusteredTable (
    id INT IDENTITY (1,1) NOT NULL,
    name VARCHAR(255) NOT NULL,
    CONSTRAINT PK_ClusteredTable PRIMARY KEY (id)
);

INSERT INTO dbo.clusteredTable
SELECT TOP 1000 b.name FROM sys.objects a
CROSS APPLY sys.objects b;
GO
CHECKPOINT
GO

Now that we have the data there, we should confirm that there isn't a checksum on the page already.  First we'll turn on trace flag 3604 so we can see the output of the DBCC commands in the query window, then we'll use DBCC IND to see what pages belong to the table.  Lastly, we'll use DBCC PAGE to look at a page itself.

DBCC TRACEON (3604);
GO

DBCC IND ('ChecksumTest', 'clusteredTable', 1);
GO
/* DBCC IND Results:
PageFID PagePID     IAMFID IAMPID      ObjectID    IndexID     PartitionNumber PartitionID          iam_chain_type       PageType IndexLevel NextPageFID NextPagePID PrevPageFID PrevPagePID
------- ----------- ------ ----------- ----------- ----------- --------------- -------------------- -------------------- -------- ---------- ----------- ----------- ----------- -----------
1       154         NULL   NULL        2105058535  1           1               72057594038779904    In-row data          10       NULL       0           0           0           0
1       153         1      154         2105058535  1           1               72057594038779904    In-row data          1        0          1           156         0           0
1       155         1      154         2105058535  1           1               72057594038779904    In-row data          2        1          0           0           0           0
1       156         1      154         2105058535  1           1               72057594038779904    In-row data          1        0          1           157         1           153
1       157         1      154         2105058535  1           1               72057594038779904    In-row data          1        0          1           158         1           156
1       158         1      154         2105058535  1           1               72057594038779904    In-row data          1        0          0           0           1           157
 
(6 row(s) affected)
*/
 
--Picking a random page type 1 row from above
DBCC PAGE (ChecksumTest, 1, 156, 1)
GO

And here is the relevant part of the results from the DBCC PAGE call:

PAGE: (1:156)
<snip>

PAGE HEADER:

Page @0x0000000085C2C000

m_pageId = (1:156)                   m_headerVersion = 1                  m_type = 1
m_typeFlagBits = 0x4                 m_level = 0                          m_flagBits = 0x0
m_objId (AllocUnitId.idObj) = 27     m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594039697408
Metadata: PartitionId = 72057594038779904                                 Metadata: IndexId = 1
Metadata: ObjectId = 2105058535      m_prevPage = (1:153)                 m_nextPage = (1:157)
pminlen = 8                          m_slotCnt = 313                      m_freeCnt = 13
m_freeData = 7553                    m_reservedCnt = 0                    m_lsn = (21:184:538)
m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0
m_tornBits = 0

Notice that the m_tornBits value is 0, meaning there's no checksum written to this page.

Write that Checksum

We're going to enable the CHECKSUM page verify option and see what we can do to get a checksum written to the page.  First we're going to try an index reorg.  We're also going to do a checkpoint after the reorg just to make absolutely sure that there's nothing still floating around in memory.

ALTER DATABASE ChecksumTest SET PAGE_VERIFY CHECKSUM
GO
 
ALTER INDEX PK_ClusteredTable ON clusteredTable REORGANIZE
GO
CHECKPOINT
GO
 
DBCC IND ('ChecksumTest', 'clusteredTable', 1);
GO
--The output is the same as above so we check the same page.
 
DBCC PAGE (ChecksumTest, 1, 156, 1)
GO

And that same section of DBCC PAGE from above:

PAGE: (1:156)

PAGE HEADER:

Page @0x0000000085C2C000

m_pageId = (1:156)                   m_headerVersion = 1                  m_type = 1
m_typeFlagBits = 0x4                 m_level = 0                          m_flagBits = 0x0
m_objId (AllocUnitId.idObj) = 27     m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594039697408
Metadata: PartitionId = 72057594038779904                                 Metadata: IndexId = 1
Metadata: ObjectId = 2105058535      m_prevPage = (1:153)                 m_nextPage = (1:157)
pminlen = 8                          m_slotCnt = 313                      m_freeCnt = 13
m_freeData = 7553                    m_reservedCnt = 0                    m_lsn = (21:184:538)
m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0
m_tornBits = 0

m_tornBits is still 0, so that didn't do it.  There may be cases where it does, such as if the pages are actually out of order, but this does confirm it won't always happen.  Let's see what happens when we do a rebuild instead of a reorg.  Again, we're doing a checkpoint just to make sure everything gets to disk, and verifying the page numbers with DBCC IND.

ALTER INDEX PK_ClusteredTable ON clusteredTable REBUILD
GO
CHECKPOINT
GO
 
DBCC IND ('ChecksumTest', 'clusteredTable', 1);
GO

If we look at the output of DBCC IND this time we see a different output:

PageFID PagePID     IAMFID IAMPID      ObjectID    IndexID     PartitionNumber PartitionID          iam_chain_type       PageType IndexLevel NextPageFID NextPagePID PrevPageFID PrevPagePID
------- ----------- ------ ----------- ----------- ----------- --------------- -------------------- -------------------- -------- ---------- ----------- ----------- ----------- -----------
1       168         NULL   NULL        2105058535  1           1               72057594038845440    In-row data          10       NULL       0           0           0           0
1       159         1      168         2105058535  1           1               72057594038845440    In-row data          1        0          1           169         0           0
1       169         1      168         2105058535  1           1               72057594038845440    In-row data          1        0          1           171         1           159
1       170         1      168         2105058535  1           1               72057594038845440    In-row data          2        1          0           0           0           0
1       171         1      168         2105058535  1           1               72057594038845440    In-row data          1        0          1           172         1           169
1       172         1      168         2105058535  1           1               72057594038845440    In-row data          1        0          0           0           1           171

All the page numbers are different.  This is an indication we’re probably going to see a checksum.  Again, picking a random page with a page type of 1:

DBCC PAGE (ChecksumTest, 1, 171, 1)
GO

We now see that m_tornBits has a value indicating that the checksum has been written.

PAGE: (1:171)
<snip>

PAGE HEADER:

Page @0x0000000085AA0000

m_pageId = (1:171)                   m_headerVersion = 1                  m_type = 1
m_typeFlagBits = 0x0                 m_level = 0                          m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 28     m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594039762944
Metadata: PartitionId = 72057594038845440                                 Metadata: IndexId = 1
Metadata: ObjectId = 2105058535      m_prevPage = (1:169)                 m_nextPage = (1:172)
pminlen = 8                          m_slotCnt = 303                      m_freeCnt = 23
m_freeData = 7563                    m_reservedCnt = 0                    m_lsn = (22:16:48)
m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0
m_tornBits = 1602034681

Conclusion

This still isn't a low-intensity way to get checksums written for each page, but it is something that can be done an index at a time and can have side benefits as well.  Care should be taken when using it, such as rebuilding online where available or scheduling offline rebuilds appropriately, but it's a useful tool to make sure that after switching to CHECKSUM page verification the checksum actually gets written to rarely modified pages.
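
If you do go this route, one way to work through it an index at a time is to generate the rebuild statements and run them as your maintenance windows allow.  A rough sketch (review the output before running any of it, and add ONLINE = ON where your edition supports it):

-- Sketch: generate ALTER INDEX REBUILD statements for every table in the current database.
SELECT 'ALTER INDEX ALL ON ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) + ' REBUILD;'
FROM sys.tables t
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
ORDER BY s.name, t.name;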

Powershell – Validating Parameters

To test the full length of the restore process I've been working on a Powershell script to automate restoring a number of our databases to a test box.  To keep the script flexible I've parameterized it, so if our environment changes over time it's easy to change what the script does to accommodate.  Those parameters need to be validated before the script runs to make sure the input can actually be used, and that's what I'll be covering here.

The Parameters

The parameters that I'm using are:

$numberDBs #This is the number of databases to restore.  They'll be restored in sequence so we don't want to be too restrictive, but if someone wants to automatically restore more than 20 databases I'm going to assume they typoed something
$supportLevel #We mark all of our databases with the support level they get and there's a discrete list of possible values.  This will be used to restrict what servers we use to test.
$targetServerInstance #This can be any valid instance, but for the purposes of this post we're just going to validate that there's a string that's not too long to be a server name
$targetDataPath #We want to confirm that this is a valid, local Windows path
$targetLogPath #Same as above

We're going to start out by defining our parameter block and getting all the parameters defined.  For the demo script I'm just going to output the parameters:

param([Parameter(Position=0)] [string]$numberDBs,
    [Parameter(Position=1)] [string]$supportLevel,
    [Parameter(Position=2)] [string]$targetServerInstance,
    [Parameter(Position=3)] [string]$targetDataPath,
    [Parameter(Position=4)] [string]$targetLogPath
)
 
Write-Output"Number of databases: $numberDBs"
Write-Output"Support Level: $supportLevel"
Write-Output"Target Instance: $targetServerInstance"
Write-Output"Data Path: $targetDataPath"
Write-Output"Log Path: $targetLogPath"
I saved this script into my “C:\Scratch” folder as “ParameterDemo.ps1″ so I can run it with the following command:
C:\Scratch\ParameterDemo.ps1 -numberDBs 5 -supportLevel "BackupsOnly" -targetServerInstance "servername" -targetDataPath "T:\Data" -targetLogPath "T:\Data"

The output simply echoes back each of the five parameters.  Since I specified positions for each parameter I can also call them without the name and just use the position.  But what if I put things in the wrong order?

I feel Validated

Using Powershell's parameter validation we can check all of this with very little code.  Let's start with the number of databases we want to restore.  We're dealing with a number here, so we're looking for the ValidateRange attribute.  We don't want the script to run if we're restoring fewer than 1 database, and we don't want to do more than 20, so here's what we modify that line to:

[Parameter(Position=0)][ValidateRange(1,20)][string]$numberDBs

Making that change and rerunning the script gives us an error saying it can't validate the string.  Rearranging the parameters so we're putting the number of databases where it belongs (but typoing the number), we get an error more along the lines of what we expect:

Cannot validate argument on parameter 'numberDBs'. The 25 argument is greater than the maximum allowed range of 20. Supply an argument that is less than or equal to 20 and then try the command again.

So let's fix that and put in the validation for the support level.

We know we're looking at a discrete set of values here, so we're going to use the ValidateSet attribute.  There may be a way to query these out of the database but we're keeping things simple for now.

[Parameter(Position=1)][ValidateSet("NoAccess","NoResponsibilities","BackupsOnly","Monitoring","Troubleshooting")] [string]$supportLevel,

Rerunning the fixed script we get a new error:

Cannot validate argument on parameter 'supportLevel'. The argument "servername" does not belong to the set "NoAccess,NoResponsibilities,BackupsOnly,Monitoring,Troubleshooting" specified by the ValidateSet attribute. Supply an argument that is in the set and then try the command again.

So let's fix that and put in the rest of the validation.

param([Parameter(Position=0)][ValidateRange(1,20)] [string]$numberDBs,
[Parameter(Position=1)][ValidateSet("NoAccess","NoResponsibilities","BackupsOnly","Monitoring","Troubleshooting")] [string]$supportLevel,
[Parameter(Position=2)][ValidateLength(1,255)] [string]$targetServerInstance,
[Parameter(Position=3)][ValidatePattern('[A-Za-z]:\\[^/:*?"<>|]*')] [string]$targetDataPath,
[Parameter(Position=4)][ValidatePattern('[A-Za-z]:\\[^/:*?"<>|]*')] [string]$targetLogPath
)
 
Write-Output "Number of databases: $numberDBs"
Write-Output "Support Level: $supportLevel"
Write-Output "Target Instance: $targetServerInstance"
Write-Output "Data Path: $targetDataPath"
Write-Output "Log Path: $targetLogPath"

There's just one more validation I want to test; I'll leave testing the length of the server name up to you if you want to see what that does.  What I want to test is the regex for the data and log paths.  What if someone tries to put a UNC path in for one of those?

Cannot validate argument on parameter 'targetDataPath'. The argument "\\fileserver\dbaspace\" does not match the "[A-Za-z]:\\[^/:*?"<>|]*" pattern. Supply an argument that matches "[A-Za-z]:\\[^/:*?"<>|]*" and try the command again.

As someone else pointed out, that's not the most intuitive error if you're an end user.  In my case, since this will be a DBA-only script, I'm not too worried.  However, if you're coding something that end users will be running, you may want to consider implementing clearer error messages.

There could very well be more validation done on the input.  For example, making sure the $targetServerInstance exists and that the paths exist on that target.  It may be possible to put that logic in the framework shown in the "more clear error messages" link, but you could also do it elsewhere in the script.  For many applications the validation attributes will do the bulk of the work for very little code and save you time.
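
As one possibility for that extra checking, ValidateScript lets you run arbitrary code against the value.  Here's a hedged sketch that rejects a data path that doesn't exist; note it checks the machine running the script, not the target instance, so it only helps when they're the same box:

# Sketch: reject the parameter if the folder doesn't exist on the machine running the script.
param(
    [ValidateScript({ Test-Path -Path $_ -PathType Container })]
    [string]$targetDataPath
)

Write-Output "Data Path: $targetDataPath"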

CheckDB Exiting with Error State 6

Last week I talked about how I've been automating DB restores and integrity checks to validate our restore process.  One of the errors I ran into was this one:

Executed as user: DOMAIN\user. A job step received an error at line 104 in a PowerShell script. The corresponding line is ‘$database.CheckTables(“None”, “AllErrorMessages,NoInformationMessages,TableLock”,”DataPurity”)’. Correct the script and reschedule the job. The error information returned by PowerShell is: ‘Exception calling “CheckTables” with “3” argument(s): “Check tables failed for Database ‘restoredDatabase’. ”  ‘.  Process Exit Code -1.  The step failed.

Looking into the error log I found this:

DBCC CHECKDB (restoredDatabase) WITH all_errormsgs, no_infomsgs, tablock, data_purity executed by DOMAIN\user terminated abnormally due to error state 6. Elapsed time: 0 hours 10 minutes 1 seconds.

DBCC CHECKDB Error Codes

The first thing I did was run to my good friend SQL Books Online.  And I came up short.  If you look for "Understanding DBCC Error Messages" on the DBCC CHECKDB page, you'll see that there isn't an error state 6 listed, so I had no official documentation to go off of.  I knew the database wasn't corrupt and that the process worked on a smaller database.  Taking a look at the error message again, I noticed that it had been failing right around the 10 minute mark each time, which points at a timeout.  I started running a CHECKDB on it manually, and after I cancelled the command I saw the same message in the error log.  So I had discovered that error state 6 means the client got bored and cancelled the integrity check.
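
For reference, a quick way to confirm what I was seeing was to search the error log for that termination message.  A sketch using sp_readerrorlog (undocumented, but widely used):

-- Sketch: search the current SQL Server error log for CHECKDB terminations.
-- Parameters: log number (0 = current), log type (1 = SQL Server), search string.
EXEC sp_readerrorlog 0, 1, N'terminated abnormally';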

The Fix

Now that I knew I was dealing with a timeout, I just had to rework the Powershell script to bypass it.  When I started, here is what my code looked like:

$server = new-object ("Microsoft.SqlServer.Management.Smo.Server") $targetServerInstance
 $database = $server.Databases | Where-Object -Property name -EQ -Value $targetDBName
 $database.CheckTables("None", "AllErrorMessages,NoInformationMessages,TableLock","DataPurity")

There isn't a direct way to change the timeout on the smo.server object, so we need to add another step: create a server connection object where we can change the timeout, and then build the server object from that connection.  We do that and here's what we end up with:

$serverConn = new-object Microsoft.SqlServer.Management.Common.ServerConnection($targetServerInstance)
$serverConn.StatementTimeout = 0   # 0 = no timeout
$server = new-object Microsoft.SqlServer.Management.Smo.Server($serverConn)
$database = $server.Databases | Where-Object {$_.name -EQ $targetDBName}
$database.CheckTables("None", "AllErrorMessages,NoInformationMessages,TableLock","DataPurity")

 

As with troubleshooting a lot of errors, the trick was digging into each detail until it made sense.  Even where documentation is lacking, there's usually enough information to at least start covering the gap.

T-SQL Tuesday #040: File and Filegroup Wisdom

It's time for another T-SQL Tuesday.  This was a hard theme for me to come up with a topic for.  Midnight DBA is hosting this month and the theme is File and Filegroup Wisdom.  The first thing I could think of I had already written about, so that would be cheating.  One of the things I do mess around with a fair amount is Powershell, so I thought it would be interesting to dig into how Powershell treats filegroups and files.  I'm only going to scratch the surface here, but what I do cover will give you the base to explore on your own.

Pulling the groups

The first thing to do is to pull the data about the filegroups.  To start off with, I'm just going to output all the data about the filegroups to the screen and format it in a way that's a little easier to read:

[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMO") | Out-Null
 
$serverName = 'localhost'
 
$server = New-Object("Microsoft.SqlServer.Management.Smo.Server") $serverName
 
$db = $server.Databases | Where-Object {$_.name -eq 'AdventureWorks'}
 
$db.FileGroups | Format-Table

This will list all the filegroups in the database and some of the basic information about them.  A couple of the fields, namely Files and Properties, can hold multiple pieces of information, which can make them harder to read.  You can still see the name and some of the basic information, such as whether or not the filegroup is read-only.

Getting more properties

Most of the properties of the filegroup are visible here, but if we dig into the Properties collection we do see one more.  We're going to reference the first filegroup in the collection, at index 0, and list its properties the same way we listed the filegroups above.

$db.FileGroups[0].Properties | Format-Table

The last property is PolicyHealthState so if you use policies to manage your system here is one place to get that info.

Digging into Files

Heading back up, let's take a look at the files in the group.  When working with collections like this, Powershell makes it easy to iterate through each item to get information.  Here's how we can view all the files in each filegroup:

# Count - 1 is the index of the last filegroup in the collection
$i = $db.FileGroups.Count - 1

$db.FileGroups[0..$i].Files | Format-Table

The Parent property even has the name of the filegroup each file belongs to.  If you're looking for the count of files in each filegroup, we need to use a more traditional foreach loop.

foreach ($fg in $db.FileGroups) {
     $fileCount = $fg.Files.Count
     write-output "$fg $fileCount"
}

That’s all for now

That's all I'm going to cover here, but there's a lot more to dig into.  For example, if you want more information on a particular file you can use the same trick from the Properties section above on any of the files.  There are also a number of methods you can call on both filegroups and files to work with them.
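
For example, a small sketch that sums up data file sizes per filegroup might look like this (SMO reports DataFile.Size in KB):

# Sketch: file count and total size per filegroup, using the same $db object as above.
foreach ($fg in $db.FileGroups) {
    $sizeKB = ($fg.Files | Measure-Object -Property Size -Sum).Sum
    Write-Output ("{0}: {1} file(s), {2} MB" -f $fg.Name, $fg.Files.Count, [math]::Round($sizeKB / 1024, 2))
}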

SQLPS has Trust Issues – AuthorizationManager check failed

Have you ever known anyone who refuses to trust a perfectly reputable source?  Well, that's the experience I've been having with SQLPS lately.  For anyone that isn't aware, SQLPS is the Powershell utility made for SQL Server.  It's rather useful, and we've had nightly environment checks using Powershell scripts running for a while.  I've been working on new scripts to automate restoring backups from tape and running an integrity check to confirm everything restores with no issues.  What's different about these is that instead of having the entire script locally on the monitoring server, I want to keep the scripts in a network share so we can call the functions on demand as well.

And that's where I ran into the issue.  We're going to pick up after I upgraded to SQL 2012 (which wasn't necessary, but I didn't need much of a reason to upgrade).  The first issue I ran into is that the Powershell execution policy had reverted to RemoteSigned.  To handle that issue I had to edit the registry, which seems a bit extreme.  Under SQL 2008 R2 I had been able to run the set-executionpolicy command within a script.  But as long as it works, who am I to complain?

Once I handled that I started getting the following error:

AuthorizationManager check failed

The line that was erroring was "dot sourcing" a script file from a UNC path.  Dot sourcing lets you use the functions defined in that script file later in the script.  Most of the advice that I read said to change the execution policy, which I had already done.  After much digging I came across a thread that referenced files downloaded from the Internet.  That got me thinking: what if there was still an issue based on the location of the file?  I RDP'ed into the box as the SQL Agent service account and ran the script by hand.

Powershell was allowing me to run the file (so the execution policy was in place) but was prompting me to approve it first.  This is likely what was causing the error in SQL Agent.  Following this, I opened IE, went to Internet Options -> Security, and added the root of the UNC path to the trusted sites.

Running the script again by hand it ran without an issue and, much to my enjoyment, when I ran it via the SQL Agent Powershell step it ran perfectly.  Okay, so it started running into other errors I didn't get when running in the normal Powershell environment, but I'll cover those a different day.

This isn't something I would do right off the bat because it does depend on your security settings.  On another test machine I have (a desktop instead of a server) I didn't have to make this change.  However, if you're getting the error message above, this will likely do the trick.
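
For anyone who hasn't used dot sourcing before, the line in question looked something along these lines; the share path and function name here are made up for illustration:

# Sketch: dot source a script of functions from a network share, then call one of them.
. "\\fileserver\dbascripts\RestoreFunctions.ps1"

# Functions defined in the dot-sourced file are now available in this script.
Restore-TestDatabase -targetServerInstance "servername" -targetDBName "restoredDatabase"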

Duration of log restores

One of the decisions every DBA has to make is how frequently to back up transaction logs.  The top end of this interval is set by how much data the business can afford to lose, but what's the bottom end?  When I took "IE3: High Availability & Disaster Recovery", Kimberly Tripp from SQL Skills recommended backing up transaction logs every five minutes.  Not being able to think of a downside at the time, I changed that to our new standard and started deploying all new machines with that policy.  Recently it occurred to me that the number of extra files might noticeably affect log restore time.

Even though that wasn't likely the case, I wanted to test it to have some hard data.  Our old standard for tlog backups was 30 minutes, so that's what I'm going to compare against.  In addition to the recovery time I'm also going to compare the disk space used because, as long as I'm doing this, why the heck not.

First off, disk space, because it's easier.  The minimum transaction log backup size that I've seen is 7k.  This means a completely unused database getting a transaction log backup every 30 minutes requires 336k of space over one day, while one getting backed up every 5 minutes uses 2016k, which is about a 1.6MB difference.  This may seem small, but if you have 500 databases in full recovery mode that gets you up to a whopping 820MB of space savings.  Okay, so maybe that actually isn't much space for most shops with 500 databases.  It doesn't seem like you should let disk space stop you from making that change.
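
If you want real numbers for your own environment rather than my 7k floor, msdb already has the history.  A sketch along these lines adds up a day's worth of log backups per database:

-- Sketch: count and total size of log backups per database over the last day.
SELECT bs.database_name,
       COUNT(*) AS log_backups,
       SUM(bs.backup_size) / 1024.0 / 1024.0 AS total_backup_mb
FROM msdb.dbo.backupset bs
WHERE bs.type = 'L'
  AND bs.backup_finish_date >= DATEADD(DAY, -1, GETDATE())
GROUP BY bs.database_name
ORDER BY total_backup_mb DESC;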

Now, time, the more interesting of the two.  In order to test both what happens with activity and without, I created a scheduled job that would call a proc every quarter of a second, ran it for two hours, and then let the databases sit inactive for another two hours.  I did three runs and took the average over the three to determine how long the restores took.  The 5-minute logs took an average of 43.8 seconds to restore and the 30-minute logs took 11.6 seconds.  Now, this shouldn't be thought of as a 4x increase but instead as a 30 second increase over a four hour time span.  Since logs that have activity take significantly longer than empty logs, over more time you're going to be adding less time than it would seem.  If we extend this out to restoring 24 hours worth of logs we're looking at about a three minute difference in restore time.

I would recommend doing testing in your environment to determine how big of a difference you would see.  Based on this testing you may have a decision to make depending on how frequently you do full backups.  If those three minutes would be critical to you, it's worth considering a longer log backup interval at the expense of risking more data loss.  However, a better option may be to invest in an HA solution so you don't need to restore all the logs, or the full backup for that matter.

Identifying used databases

We have a number of DB servers used by our development team that have grown to hold a number of different databases.  At some point, one has to ask the question, "How many of these databases are actually used?"  We recently went through that exercise on a SQL 2000 server.  Because the server was running SQL 2000, and because of its age, I didn't do much pre-work; I just identified which databases had active connections at a specific point in time and passed the whole list to the team to look through.  Out of the thirty user databases only five ended up being used.  It's time to do that same check on another server, but this one is newer and running SQL 2008 R2.  As a result I decided to do more pre-work to save manual work for the dev team.

In SQL 2000 my only option would have been to start a trace and let it run for a while to see what databases were hit. Fortunately, SQL 2008 has the sys.dm_db_index_usage_stats DMV. This shows the most recent user scan, seek, lookup, or update for any index including heaps. On this server, if the database is actively being queried it’s a pretty safe bet that it’s still needed and this DMV will help me quickly identify what databases are being queried.

select d.name,
 [lastSeek] = max(last_user_seek),
 [lastScan] = max(last_user_scan),
 [lastLookup] = max(last_user_lookup),
 [lastUpdate] = max(last_user_update)
 from sys.databases d
 inner join sys.dm_db_index_usage_stats ius on d.database_id = ius.database_id
 where d.database_id > 4
 group by d.name

From here you can simply eliminate anything with a recent access date.  To do that I pasted the results of that query into Excel and took the max of the four date columns.  This doesn't eliminate the possibility that a database is only being written to and never read from, or is only used by a scheduled job but not actually needed.  However, it's a good starting point and can help eliminate a lot of work.
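
One caveat: this DMV is cleared when the instance restarts, and the inner join hides databases with no recorded activity at all, which are arguably the most interesting ones.  A variation with a left join surfaces those too:

-- Sketch: include databases with no recorded index usage since the last restart.
select d.name,
 [lastSeek] = max(ius.last_user_seek),
 [lastScan] = max(ius.last_user_scan),
 [lastLookup] = max(ius.last_user_lookup),
 [lastUpdate] = max(ius.last_user_update)
 from sys.databases d
 left join sys.dm_db_index_usage_stats ius on d.database_id = ius.database_id
 where d.database_id > 4
 group by d.name
 order by d.name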