Use cases for devnodeclean.exe

Use cases for devnodeclean.exe

So what are use cases for devnodeclean.exe, and what is it? Windows creates an entry in the registry for every new device that is connected. That also goes for storage devices, including VSS shadow copies.

When you create a lot of VSS snapshots, both software (windows) or hardware (SAN) based ones that get mounted and unmounted this creates a lot of the registry entries. Normally these should get cleaned up by the process creating them. Microsoft can take care of their use cases but they cannot do this for 3rd party software as Windows cannot know the intent of that software. Hence you might end up with a registry system hive that starts to bloat. When that hive gets big enough you will notice slowdowns in shutdown and restarts. These slowdowns can become very long and even lead to failure to boot Windows.

This can happen with SAN hardware VSS provider backup software or with a backup solution that integrates with SAN hardware VSS providers. Mind you it is not limited to hardware VSS providers. It can also happen with software VSS providers. Microsoft actually had this as a bug with Hyper-V backups a long time ago. A hotfix fixed the issue by removing the registry entries the backups created.

But not all software will do this. Not even today. The better software does, but even Veeam only provided this option in VBR 9.5 Update 4. Mind you, Veeam is only responsible for what they control via storage integrations. When you leverage an off-host proxy with Hyper-V Veeam collaborates with the hardware VSS provider but does not orchestrate the transportable snapshots itself. So in that case the clean up is the responsibility of the SAN vendor’s software.

Another use case I have is file servers on a SAN being backup and protected via hardware VSS snapshots with the SAN vendors software. That also leads to registry bloat.

I never had any issues as I clean up the phantom registry entries preemptively. Veeam actually published a KB article as well on this before they fixed it in their code.

Still, if you need to clean up existing phantom registry entries you will need to use a tool call devnoceclean.exe.

Preventing registry bloat

When the software responsible doesn’t prevent registry bloat you will have to clean up the phantom registry entries in another way. Doing this manually is tedious and not practical. So, let’s forget about that option.

You can write your own code or script to take care of this issue. Cool if you can but realize you need to be very careful what you delete in the registry. Unless you really know your way around the depths of storage-related entries in the registry I suggest using a different approach, which I’ll discuss next.

Another solution is to use the Microsoft provided tool devnodeclean.exe. This tool is Microsft’s version of its example code you can find here How to remove registry information for devices that will never be used again

You can download that tool here. Extract it and grab the .exe that fits your OS architecture, x86, or x64 bit. I usually put in the subfolder Bin under C:\SysAdmin\Tools\DevNodeClean\ where I also create a subfolder named Logs. Remember you need to run this with elevated permissions. devnodeclean.exe /n list the entries it will remove while just running it without a switch actually removes the entries. It will work with Windows Server 2012(R2), 2016, and 2019. It can take a while if you have many thousands of entries.

Use cases for devnodeclean.exe
Use cases for devnodeclean.exe

While you can run the tool manually in “one-off” situations normally you’ll want to run it automatically and regularly. For that, I use a PowerShell script that logs the actions and I use Task Scheduler to run it every day or week. that depends on the workload on that host.

Sample Code

Below is some sample code to get you started.

$TimeStamp = $(((get-date)).ToString("yyyyMMddTHHmmss"))
$PathToDevNodeClean = 'C:\SysAdmin\Tools\DevNodeClean'
Start-Transcript -Path "$PathToDevNodeClean\Logs\DevNodeCleanLog-$TimeStamp.txt"
Write-Output get-date ': Starting registry cleanup of phantom VSS entries'
Invoke-Expression "$PathToDevNodeClean\Bin\DevNodeClean.exe"
Write-Output get-date 'Cleaning up old log files'

$DaysToRetain = 0
$DateTime = ((Get-Date).AddDays(-$DaysToRetain))
$AllLogFilesInDevNodeClean = Get-ChildItem -Path "$PathToDevNodeClean\Logs" -Filter "DevNodeCleanLog-*.txt" -Force -File | Where-Object { $_.CreationTime -lt $DateTime }

foreach ($File in $AllLogFilesInDevNodeClean) {
    $FileName = $File.FullName
    $TimeStamp = Get-Date
    try {
        Remove-Item -Path $FileName -ErrorAction SilentlyContinue
        Write-Output "$TimeStamp > Deleting file $FileName because it was created before $DateTime"
    } 
    catch {
        Write-Output "$TimeStamp > Failed to delete $FileName It is probably in use" 
        Write-Output $TimeStamp $_.Exception.Message
    }
    finally {      
    }
}
Stop-Transcript 

Good luck with your devnodeclean.exe adventures. As with any sample code, big boy rules apply, use it at your own risk and test before letting this lose on your production systems.

This is just one example of how my long time experience with Windows storage and backups prevents problems in environments I manage or design. If you need help or have a question, reach out and we’ll try to help.

Production Checkpoints in Windows Server 2016

We’ve  had snapshots, or better checkpoints as we call them now for consistency amongst products, for a longest time in Hyper-V. I have always liked and used them to my benefit. That’s what they are intended for. But you have to use them correctly, in a supported and smart manner. Some (or perhaps not an insignificant number of) people did not read the manual and/or do not test their assumptions before trying something in production. Some times that leads to a lesson, sometimes it leads to tears.

We now have the choice between two type of checkpoints: Production Checkpoints and Standard Checkpoints.

A standard virtual machine checkpoint all the memory state of running applications get stored and when you apply the checkpoint it’s back magically. Doing this to a production SQL or Exchange Server for example causes (huge) problems. With some applications these problems are minor or transient but it’s not a healthy consistent state to be in, and recovery has to happen. Which could  happen automatically or require disaster recovery depending on the situation at hand.

Production checkpoints are made in application consistent manner. For this the leverage Volume Shadow Copy Services (or File System Freeze on Linux) which puts the virtual machines into a safe state to create a checkpoint that can be restored like a VSS based, application consistent backup or SAN snapshot. This does mean that applying a production checkpoint requires the restored virtual machine to boot from an off line state, just like with a restored backup.

The choice for what type of checkpoint can be made on a per virtual machine basis which make it’s flexible as you can pick the best option for a particular virtual machine for a specific purpose. As you might have guessed, that still requires some insight, reading the manual and testing your assumptions. But you now can have the behavior you want and way to many assumed to have.

image

We also have the option of allowing or disallowing for a standard checkpoint to me made if for any reason (which results in the VSS snapshot in Windows or file systems freeze in Linux  in the guest might not work or are not available) a production checkpoint cannot be made. Here’s the table of what type of checkpoint can be used when from MSDN. I conclude that the chosen default is the best fitting one for most scenarios.

image

You also have the option of choosing the standard checkpoints for a virtual machine. That gives you the exact behavior as with all previous versions of Hyper-V.

I love the GUI for ad hoc work but when I need to do this on dozens or hundreds of virtual machines or potentially tens of thousands when running a larger private cloud this is not the way to go. PowerShell is you long time trusted friend here!

image

As you can see just the “-CheckpointType” parameter you control the check pointing behavior. And as it is very easy to grab all virtual machines on a host or cluster setting this for all or a selection of your virtual machines is easy and fast. Let’s set it to “ProductionOnly” and grab the setting for that VM via PowerShell

image

When you create a checkpoint of a Windows Server 2016 Hyper-V host you’ll even get a nice notification (you can turn it off) by default that the Production checkpoint used backup technology in the guest operating system.

image

It’s also important to realize that this capability is the basis of the new checkpoint based way of making backups in Windows Server 2016 as well. But that’s a subject for another blog post. Thank you for reading!

Optimizing Backups: PowerShell Script To Move All Virtual Machines On A Cluster Shared Volume To The Node Owing That CSV

When you are optimizing the number of snapshots to be taken for backups or are dealing with storage vendors software that leveraged their hardware VSS provider you some times encounter some requirements that are at odds with virtual machine mobility and dynamic optimization.

For example when backing up multiple virtual machines leveraging a single CSV snapshot you’ll find that:

  • Some SAN vendor software requires that the virtual machines in that job are owned by the same host or the backup will fail.
  •  
  • Backup software can also require that all virtual machines are running on the same node when you want them to be be protected using a single CSV snap shot. The better ones don’t let the backup job fail, they just create multiple snapshots when needed but that’s less efficient and potentially makes you run into issues with your hardware VSS provider.
image

VEEAM B&R v8 in action … 8 SQL Server VMs with multiple disks on the same CSV being backed up by a single hardware VSS writer snapshot (DELL Compellent 6.5.20 / Replay Manager 7.5) and an off host proxy Organizing & orchestrating backups requires some effort, but can lead to great results.

Normally when designing your cluster you balance things out a well as you can. That helps out to reduce the needs for constant dynamic optimizations. You also make sure that if at all possible you keep all files related to a single VM together on the same CSV.

Naturally you’ll have drift. If not you have a very stable environment of are not leveraging the capabilities of your Hyper-V cluster. Mobility, dynamic optimization, high to continuous availability are what we want and we don’t block that to serve the backups. We try to help out to backups as much a possible however. A good design does this.

If you’re not running a backup every 15 minutes in a very dynamic environment you can deal with this by live migrating resources to where they need to be in order to optimize backups.

Here’s a little PowerShell snippet that will live migrate all virtual machines on the same CSV to the owner node of that CSV. You can run this as a script prior to the backups starting or you can run it as a weekly scheduled task to prevent the drift from the ideal situation for your backups becoming to huge requiring more VSS snapshots or even failing backups. The exact approach depends on the storage vendors and/or backup software you use in combination with the needs and capabilities of your environment.

cls
 
$Cluster = Get-Cluster
$AllCSV = Get-ClusterSharedVolume -Cluster $Cluster
 
ForEach ($CSV in $AllCSV)
{
    write-output "$($CSV.Name) is owned by $($CSV.OWnernode.Name)"
     
    #We grab the friendly name of the CSV
    $CSVVolumeInfo = $CSV | Select -Expand SharedVolumeInfo
    $CSVPath = ($CSVVolumeInfo).FriendlyVolumeName
 
    #We deal with the \ being and escape character for string parsing.
    $FixedCSVPath = $CSVPath -replace '\\', '\\'
 
    #We grab all VMs that who's owner node is different from the CSV we're working with
    #From those we grab the ones that are located on the CSV we're working with
      $VMsToMove = Get-ClusterGroup | ? {($_.GroupType –eq 'VirtualMachine') -and ( $_.OwnerNode -ne $CSV.OWnernode.Name)} | Get-VM | Where-object {($_.path -match $FixedCSVPath)}
    
    ForEach ($VM in $VMsToMove)
     {
        write-output "`tThe VM $($VM.Name) located on $CSVPath is not running on host $($CSV.OwnerNode.Name) who owns that CSV"
        write-output "`tbut on $($VM.Computername). It will be live migrated."
        #Live migrate that VM off to the Node that owns the CSV it resides on
        Move-ClusterVirtualMachineRole -Name $VM.Name -MigrationType Live -Node $CSV.OWnernode.Name
    }
}
Winking smile

Now there is a lot more to discuss, i.e. what and how to optimize for virtual machines that are clustered. For optimal redundancy you’ll have those running on different nodes and CSVs. But even beyond that, you might have the clustered VMs running on different cluster, which is the failure domain.  But I get the remark my blogs are wordy and verbose so … that’s for another time

Trouble Shooting Windows Server 2012 host based CommVault Backups with DELL Compellent hardware VSS provider of Hyper-V guests: ‘Microsoft Hyper-V VSS Writer’ State: [5] Waiting for completion

We have been running CommVault Simpana 9.0 R2 SP7 in combination with the DELL Compellent Hardware VSS provider to do host based backups of the virtual machines on our Windows Server 2012 Hyper-V clusters host with great success and speed.

We’ve run into two issues so far. One, I blogged about in DELL Compellent Hardware VSS Provider & Commvault on Windows Server 2012 Hyper-V nodes – Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface. hr = 0×80070005, Access is denied was an due to some missing permissions for the domain account we configured the Compellent Replay manager Service to run with. The solution for that issue can be found in that same blog post.

The other one was that sometimes during the backup of a Hyper-V host we got an error from CommVault that put the job in a “pending” status, kept trying and failing. The error is:

Error Code: [91:9], Description: Volume Shadow Copy Service (VSS) error. VSS service or writers may be in a bad state. Please check vsbkp.log and Windows Event Viewer for VSS related messages. Or run vssadmin list writers from command prompt to check state of the VSS writers.

clip_image001

When we look at the Compellent controller we see the following things happen:

  • The snapshots get made
  • They are mounted briefly and then dismounted.
  • They are deleted

The result at the CommVault end is that the job goes into a pending state with the above error. When we look at the state of the Microsoft Hyper-V VSS Writer by running “vssadmin list writer” …

image

… from an elevated command prompt we see:

Writer name: ‘Microsoft Hyper-V VSS Writer’
…Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
…Writer Instance Id: {2fa6f9ba-b613-4740-9bf3-e01eb4320a01}
…State: [5] Waiting for completion
…Last error: Retryable error

Note at this stage:

  1. Resuming the job doesn’t help (it actually keep trying by itself but no joy).
  2. Killing the job and restarting brings no joy. On top of that our friendly error “Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface. hr = 0×80070005, Access is denied.“ is back, but this time related to the error state of the ‘Microsoft Hyper-V VSS Writer’. The error now has changed a little and has become:

clip_image002

 

 

Writer name: ‘Microsoft Hyper-V VSS Writer’
…Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
…Writer Instance Id: {2fa6f9ba-b613-4740-9bf3-e01eb4320a01}
…State: [5] Waiting for completion
…Last error: Unexpected error

To get rid of this one we can restart the host or, less drastic, restart the Hyper-V Virtual Machine management Service (VMMS.exe) which will do the trick as well.  Before you do this , drain the node when you pause it, then resume it with the option failing back the roles. Windows 2012 makes it a breeze to do this without service interruption Smile

image

clip_image003

image

The Cause: Almost or completely full partitions inside the virtual machines

Looking for solutions when CommVault is involved can be tedious as their consultancy driven sales model isn’t focused on making information widely available. Trouble shooting VSS issues can also be considered a form of black art at times. Since this is Windows 2012 RTM an the date is September 20th 2012 as the moment of writing, there are not yet any hotfixes related to host level backups of Virtual machines and such. CommVault Simpana 9.0 R2 SP7 is also fully patched.

This,combined with the fact that we did not see anything like this during testing (and we did a fair amount) makes us look at the guests. That’s the big difference on a large production cluster. All those unique guests with their own history. We also know from the past years with VSS snapshots in Windows 2008(R2) that these tend to fail due to issues in the guests. Take a peak at Troubleshoot VSS issues that occur with Windows Server Backup (WBADMIN) in Windows Server 2008 and Windows Server 2008 R2 just for starters  As an example we already had seen one guest (dev/test server) that had 5 user logged in doing all kinds of reconfigurations and installs go into save mode during a backup, so it could be due to something rotten in certain guests. There is very much to consider when doing these kinds of backups.

By doing some comparing of successful & failed backups it really looks as if it was related to certain virtual machines. A lot of issues are caused by the VSS service, not running or not being able to do snapshots because of lack of space so perhaps this was the case here as well?

We poked around a bit. First let’s see what we can find in the Hyper-V specific logs like the Microsoft-Windows-Hyper-V-VMMS-Admin event log. Ah lot’s of errors relating to a number of guests!

image

Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          19/09/2012 22:14:37
Event ID:      10102
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      undisclosed server
Description:
Failed to create the volume shadow copy inside of virtual machine ‘undisclosedserver’. (Virtual machine ID 84521EG0G-8B7A-54ED-2F24-392A1761ED11)

Well people, that is called a clue Winking smile. So we did some Live Migration to isolate suspect VMs to a single node, run backups, see them fail, do the the same with a new and clean VM an it all works. and indeed … looking at the guest involved when the CommVault backup fails we that the VSS service is running and healthy but we do see all kind of badness related to disk space:

  • Large SQL Server backup files put aside on the system partition or or other disks
  • Application & service pack installers left behind,
  • Log and tempdb volumes running out of space.
  • Application Logs running out of control

That later one left 0MB of disk space on the system (Test Controller TFS shitting itself), but we managed to clear just enough to get to just over 1GB of free space which was enough to make the backup succeed.

clip_image001[8]

image

Servers, virtual or physical ones, should to be locked down to prevent such abuse. I know, I know. Did I already tell you I do not reside in a perfect world? We cannot protect against dev and test server admins who act without much care on their servers. We’ll just keep hammering at it to raise their awareness I guess. For end users and production servers we monitor those well enough to proactively avoid issues. With dev & test servers we don’t do so, or the response team would have a day’s work reacting to all alerts that daily dev & test usage on those servers generate.

The fix

  • Clear at least 1GB or a bit more inside each partition in the guest running on the host that has a failing backup. I prefer to have at least a couple of GB free  (10% to 15% => give yourself some head room people).
  • Then you can resume the backup job manually or let CommVault do that for you if it’s still in a pending state.
  • If you’ve killed the job make sure you restore the
  • Microsoft Hyper-V VSS Writer  to a healthy state as described above. Thanks to Live Migration this can be achieved without any down time.

Conclusion

There is experimenting, testing, production testing, production and finally real life environments where not all is done as it should be. Yes, really the world isn’t perfect. Managers sometimes think it’s click, click, Next, click and voila we’ve got a complex multisite system running. Well it isn’t like that and you need some time and skills to make it all work. Yes even in todays “cheap, fast, easy to run your business form your smartphone”  ecosystem of the private, hybrid and public cloud, where all is bliss and world peace reigns.

The DELL Compellent Hardware VSS provider & replay manager service handle all this without missing a beat, which is very comforting. As previous experiences with hardware VSS provides of other vendors make us think that these would probably have blown up by now.