Use cases for devnodeclean.exe

Use cases for devnodeclean.exe

So what are use cases for devnodeclean.exe, and what is it? Windows creates an entry in the registry for every new device that is connected. That also goes for storage devices, including VSS shadow copies.

When you create a lot of VSS snapshots, both software (windows) or hardware (SAN) based ones that get mounted and unmounted this creates a lot of the registry entries. Normally these should get cleaned up by the process creating them. Microsoft can take care of their use cases but they cannot do this for 3rd party software as Windows cannot know the intent of that software. Hence you might end up with a registry system hive that starts to bloat. When that hive gets big enough you will notice slowdowns in shutdown and restarts. These slowdowns can become very long and even lead to failure to boot Windows.

This can happen with SAN hardware VSS provider backup software or with a backup solution that integrates with SAN hardware VSS providers. Mind you it is not limited to hardware VSS providers. It can also happen with software VSS providers. Microsoft actually had this as a bug with Hyper-V backups a long time ago. A hotfix fixed the issue by removing the registry entries the backups created.

But not all software will do this. Not even today. The better software does, but even Veeam only provided this option in VBR 9.5 Update 4. Mind you, Veeam is only responsible for what they control via storage integrations. When you leverage an off-host proxy with Hyper-V Veeam collaborates with the hardware VSS provider but does not orchestrate the transportable snapshots itself. So in that case the clean up is the responsibility of the SAN vendor’s software.

Another use case I have is file servers on a SAN being backup and protected via hardware VSS snapshots with the SAN vendors software. That also leads to registry bloat.

I never had any issues as I clean up the phantom registry entries preemptively. Veeam actually published a KB article as well on this before they fixed it in their code.

Still, if you need to clean up existing phantom registry entries you will need to use a tool call devnoceclean.exe.

Preventing registry bloat

When the software responsible doesn’t prevent registry bloat you will have to clean up the phantom registry entries in another way. Doing this manually is tedious and not practical. So, let’s forget about that option.

You can write your own code or script to take care of this issue. Cool if you can but realize you need to be very careful what you delete in the registry. Unless you really know your way around the depths of storage-related entries in the registry I suggest using a different approach, which I’ll discuss next.

Another solution is to use the Microsoft provided tool devnodeclean.exe. This tool is Microsft’s version of its example code you can find here How to remove registry information for devices that will never be used again

You can download that tool here. Extract it and grab the .exe that fits your OS architecture, x86, or x64 bit. I usually put in the subfolder Bin under C:\SysAdmin\Tools\DevNodeClean\ where I also create a subfolder named Logs. Remember you need to run this with elevated permissions. devnodeclean.exe /n list the entries it will remove while just running it without a switch actually removes the entries. It will work with Windows Server 2012(R2), 2016, and 2019. It can take a while if you have many thousands of entries.

Use cases for devnodeclean.exe
Use cases for devnodeclean.exe

While you can run the tool manually in “one-off” situations normally you’ll want to run it automatically and regularly. For that, I use a PowerShell script that logs the actions and I use Task Scheduler to run it every day or week. that depends on the workload on that host.

Sample Code

Below is some sample code to get you started.

$TimeStamp = $(((get-date)).ToString("yyyyMMddTHHmmss"))
$PathToDevNodeClean = 'C:\SysAdmin\Tools\DevNodeClean'
Start-Transcript -Path "$PathToDevNodeClean\Logs\DevNodeCleanLog-$TimeStamp.txt"
Write-Output get-date ': Starting registry cleanup of phantom VSS entries'
Invoke-Expression "$PathToDevNodeClean\Bin\DevNodeClean.exe"
Write-Output get-date 'Cleaning up old log files'

$DaysToRetain = 0
$DateTime = ((Get-Date).AddDays(-$DaysToRetain))
$AllLogFilesInDevNodeClean = Get-ChildItem -Path "$PathToDevNodeClean\Logs" -Filter "DevNodeCleanLog-*.txt" -Force -File | Where-Object { $_.CreationTime -lt $DateTime }

foreach ($File in $AllLogFilesInDevNodeClean) {
    $FileName = $File.FullName
    $TimeStamp = Get-Date
    try {
        Remove-Item -Path $FileName -ErrorAction SilentlyContinue
        Write-Output "$TimeStamp > Deleting file $FileName because it was created before $DateTime"
    } 
    catch {
        Write-Output "$TimeStamp > Failed to delete $FileName It is probably in use" 
        Write-Output $TimeStamp $_.Exception.Message
    }
    finally {      
    }
}
Stop-Transcript 

Good luck with your devnodeclean.exe adventures. As with any sample code, big boy rules apply, use it at your own risk and test before letting this lose on your production systems.

This is just one example of how my long time experience with Windows storage and backups prevents problems in environments I manage or design. If you need help or have a question, reach out and we’ll try to help.

Troubleshooting Veeam B&R Error code: ‘32768’. Failed to create VM recovery snapshot

I recently had to move a Windows Server 2016 VM over to another cluster (2012R2 to 2016 cluster)  and to do so I uses shared nothing live migration. After the VM was happily running on the new cluster I kicked of a Veeam backup job to get a first restore point for that VM. Better safe than sorry right?

image

But the job and the retries failed for that VM. The error details are:

Failed to create snapshot Compellent Replay Manager VSS Provider on repository01.domain.com (mode: Veeam application-aware processing) Details: Job failed (‘Checkpoint operation for ‘FailedVM’ failed. (Virtual machine ID 459C3068-9ED4-427B-AAEF-32A329B953AD). ‘FailedVM’ could not initiate a checkpoint operation: %%2147754996 (0x800423F4). (Virtual machine ID 459C3068-9ED4-427B-AAEF-32A329B953AD)’). Error code: ‘32768’.
Failed to create VM recovery snapshot, VM ID ‘3459c3068-9ed4-427b-aaef-32a329b953ad’.

Also when the job fails over to the native Windows VSS approach when the HW VSS provider fails it still does not work. At first that made me think of a bug that sued to exist in Windows Server 2016 Hyper-V where a storage live migration of any kind would break RCT and new full was needed to fix it. That bug has long since been fixed and no a new full backup did not solve anything here. Now there are various reasons why creating a checkpoint will not succeed so we need to dive in deeper. As always the event viewer is your friend. What do we see? 3 events during a backup and they are SQL Server related.
image

image

image

On top of that the SQLServerWriter  is in a non retryable error when checking with vssadmin list writers.

image

It’s very clear there is an issue with the SQL Server VSS Writer in this VM and that cause the checkpoint to fail. You can search for manual fixes but in the case of an otherwise functional SQL Server I chose to go for a repair install of SQL Server. The tooling for hat is pretty good and it’s probably the fastest way to resolve the issues and any underlying ones we might otherwise still encounter.

After running a successful repair install of SQL Server we get greeted by an all green result screen.

image

So now we check vssadmin list writers again to make sure they are all healthy if not restart the SQL s or other relevant service if possible. Sometime you can fix it by restarting a service, in that case reboot the server. We did not need to do that. We just ran a new retry in Veeam Backup & Replication and were successful.

There you go. The storage live migration before the backup of that VM made me think we were dealing with an early Windows Server 2016 Hyper-V bug but that was not the case. Trouble shooting is also about avoiding tunnel vision.

Optimizing Backups: PowerShell Script To Move All Virtual Machines On A Cluster Shared Volume To The Node Owing That CSV

When you are optimizing the number of snapshots to be taken for backups or are dealing with storage vendors software that leveraged their hardware VSS provider you some times encounter some requirements that are at odds with virtual machine mobility and dynamic optimization.

For example when backing up multiple virtual machines leveraging a single CSV snapshot you’ll find that:

  • Some SAN vendor software requires that the virtual machines in that job are owned by the same host or the backup will fail.
  •  
  • Backup software can also require that all virtual machines are running on the same node when you want them to be be protected using a single CSV snap shot. The better ones don’t let the backup job fail, they just create multiple snapshots when needed but that’s less efficient and potentially makes you run into issues with your hardware VSS provider.
image

VEEAM B&R v8 in action … 8 SQL Server VMs with multiple disks on the same CSV being backed up by a single hardware VSS writer snapshot (DELL Compellent 6.5.20 / Replay Manager 7.5) and an off host proxy Organizing & orchestrating backups requires some effort, but can lead to great results.

Normally when designing your cluster you balance things out a well as you can. That helps out to reduce the needs for constant dynamic optimizations. You also make sure that if at all possible you keep all files related to a single VM together on the same CSV.

Naturally you’ll have drift. If not you have a very stable environment of are not leveraging the capabilities of your Hyper-V cluster. Mobility, dynamic optimization, high to continuous availability are what we want and we don’t block that to serve the backups. We try to help out to backups as much a possible however. A good design does this.

If you’re not running a backup every 15 minutes in a very dynamic environment you can deal with this by live migrating resources to where they need to be in order to optimize backups.

Here’s a little PowerShell snippet that will live migrate all virtual machines on the same CSV to the owner node of that CSV. You can run this as a script prior to the backups starting or you can run it as a weekly scheduled task to prevent the drift from the ideal situation for your backups becoming to huge requiring more VSS snapshots or even failing backups. The exact approach depends on the storage vendors and/or backup software you use in combination with the needs and capabilities of your environment.

cls
 
$Cluster = Get-Cluster
$AllCSV = Get-ClusterSharedVolume -Cluster $Cluster
 
ForEach ($CSV in $AllCSV)
{
    write-output "$($CSV.Name) is owned by $($CSV.OWnernode.Name)"
     
    #We grab the friendly name of the CSV
    $CSVVolumeInfo = $CSV | Select -Expand SharedVolumeInfo
    $CSVPath = ($CSVVolumeInfo).FriendlyVolumeName
 
    #We deal with the \ being and escape character for string parsing.
    $FixedCSVPath = $CSVPath -replace '\\', '\\'
 
    #We grab all VMs that who's owner node is different from the CSV we're working with
    #From those we grab the ones that are located on the CSV we're working with
      $VMsToMove = Get-ClusterGroup | ? {($_.GroupType –eq 'VirtualMachine') -and ( $_.OwnerNode -ne $CSV.OWnernode.Name)} | Get-VM | Where-object {($_.path -match $FixedCSVPath)}
    
    ForEach ($VM in $VMsToMove)
     {
        write-output "`tThe VM $($VM.Name) located on $CSVPath is not running on host $($CSV.OwnerNode.Name) who owns that CSV"
        write-output "`tbut on $($VM.Computername). It will be live migrated."
        #Live migrate that VM off to the Node that owns the CSV it resides on
        Move-ClusterVirtualMachineRole -Name $VM.Name -MigrationType Live -Node $CSV.OWnernode.Name
    }
}
Winking smile

Now there is a lot more to discuss, i.e. what and how to optimize for virtual machines that are clustered. For optimal redundancy you’ll have those running on different nodes and CSVs. But even beyond that, you might have the clustered VMs running on different cluster, which is the failure domain.  But I get the remark my blogs are wordy and verbose so … that’s for another time

Fixing A Little Quirk In Dell Compellent Replay Manager

If you’re running a DELL Compellent SAN you’re probably familiar with Replay Manager. It’s Compellent’s solution to take VSS based (and as such application consistent) snapshots.image

When you’re running Replay Manager you might run into the following issue when trying to access a host.image

Every time, you access a host for the first time after opening Replay Manager you’ll be prompted for your password, even if you select Remember my password. You don’t need to retype it so that’s fine, but you do need to click it.

In the system log you’ll see the below error logged.image

Log Name:      System
Source:        Microsoft-Windows-Security-Kerberos
Date:          7/08/2013 9:55:43
Event ID:      4
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      replayserver.test.lab
Description:
The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server replaymanagerservice. The target name used was HTTP/myhost.test.lab. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Ensure that the target SPN is only registered on the account used by the server. This error can also happen if the target service account password is different than what is configured on the Kerberos Key Distribution Center for that target service. Ensure that the service on the server and the KDC are both configured to use the same password. If the server name is not fully qualified, and the target domain (TEST.LAB) is different from the client domain (TEST.LAB), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.

Well, this a rather well know issue in the Microsoft world. Take a look here IIS 7+ Kerberos authentication failure: KRB_AP_ERR_MODIFIED. Browse to the possible causes & solutions. You’ll find this situation right in there. So what we do is execute the following command to register the correct SPN for the host or hosts on the Replay Manager service account:

SetSPN -a HTTP/myhost.test.lab TESTreplaymanagerservice

Do note to run this from an elevated command prompt using a account with sufficient AD permissions in AD. You’ll now no longer have to click on the username/password prompt and get rid of that error.

You can verify if the SPB for your hosts exists on your Replay Manager Service account by running:

SetSPN -l  TESTreplaymanagerservice

If this is the biggest issue you’ll ever have with a hardware snapshot service & hardware provider you know you’ve got a good solution.