Live Migration Fails due to non-existent SharedStoragePath or ConfigStoreRootPath

Introduction

I was tasked to troubleshoot a cluster where cluster aware updating (CAU) failed due to the nodes never succeeding going into maintenance mode. It seemed that none of the obvious or well know issues and mistakes that might break live migrations were present. Looking at the cluster and testing live migration not a single VM on any node would live migrate to any other node.
So, I take a peek the event id and description and it hits me. I have seen this particular event id before.

Live Migration Fails due to non-existent SharedStoragePath or ConfigStoreRootPath

Log Name:      System
Source:        Microsoft-Windows-Hyper-V-High-Availability
Date:          9/27/2018 15:36:44
Event ID:      21502
Task Category: None
Level:         Error
Keywords:
User:          SYSTEM
Computer:      NODE-B.datawisetech.corp
Description:
Live migration of ‘Virtual Machine ADFS1’ failed.
Virtual machine migration operation for ‘ADFS1’ failed at migration source ‘NODE-B’. (Virtual machine ID 4B5F2F6C-AEA3-4C7B-8342-E255D1D112D7)
Failed to verify collection registry for virtual machine ‘ADFS1’: The system cannot find the file specified. (0x80070002). (Virtual Machine ID 4B5F2F6C-AEA3-4C7B-8342-E255D1D112D7).
The live migration fails due to non-existent SharedStoragePath or ConfigStoreRootPath which is where collections metadata lives.

More errors are logged

There usually are more related tell-tale events. They however are clear in pin pointing the root cause.

On the destination host

On the destination host you’ll find event id 21066:

Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          9/27/2018 15:36:45
Event ID:      21066
Task Category: None
Level:         Error
Keywords:
User:          SYSTEM
Computer:      NODE-A.datawisetech.corp
Description:
Failed to verify collection registry for virtual machine ‘ADFS1’: The system cannot find the file specified. (0x80070002). (Virtual Machine ID 4B5F2F6C-AEA3-4C7B-8342-E255D1D112D7).

A bunch of 1106 events per failed live migration per VM in like below:

Log Name:      Microsoft-Windows-Hyper-V-VMMS-Operational
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          9/27/2018 15:36:45
Event ID:      1106
Task Category: None
Level:         Error
Keywords:
User:          SYSTEM
Computer:      NODE-A.datawisetech.corp
Description:
vm\service\migration\vmmsvmmigrationdestinationtask.cpp(5617)\vmms.exe!00007FF77D2171A4: (caller: 00007FF77D214A5D) Exception(998) tid(1fa0) 80070002 The system cannot find the file specified.

On the source host

On the source host you’ll find event id 1840 logged
Log Name:      Microsoft-Windows-Hyper-V-Worker-Operational
Source:        Microsoft-Windows-Hyper-V-Worker
Date:          9/27/2018 15:36:44
Event ID:      1840
Task Category: None
Level:         Error
Keywords:
User:          NT VIRTUAL MACHINE\4B5F2F6C-AEA3-4C7B-8342-E255D1D112D7
Computer:      NODE-B.datawisetech.corp
Description:
[Virtual machine 4B5F2F6C-AEA3-4C7B-8342-E255D1D112D7] onecore\vm\worker\migration\workertaskmigrationsource.cpp(281)\vmwp.exe!00007FF6E7C46141: (caller: 00007FF6E7B8957D) Exception(2) tid(ff4) 80042001     CallContext:[\SourceMigrationTask]

As well as event id 21111:
Log Name:      Microsoft-Windows-Hyper-V-High-Availability-Admin
Source:        Microsoft-Windows-Hyper-V-High-Availability
Date:          9/27/2018 15:36:44
Event ID:      21111
Task Category: None
Level:         Error
Keywords:
User:          SYSTEM
Computer:      NODE-B.datawisetech.corp
Description:
Live migration of ‘Virtual Machine ADFS1’ failed.

… event id 21066:
Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          9/27/2018 15:36:44
Event ID:      21066
Task Category: None
Level:         Error
Keywords:
User:          SYSTEM
Computer:      NODE-B.datawisetech.corp
Description:
Failed to verify collection registry for virtual machine ‘ADFS1’: The system cannot find the file specified. (0x80070002). (Virtual Machine ID 4B5F2F6C-AEA3-4C7B-8342-E255D1D112D7).

… and event id 21024:
Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          9/27/2018 15:36:44
Event ID:      21024
Task Category: None
Level:         Error
Keywords:
User:          SYSTEM
Computer:      NODE-B.datawisetech.corp
Description:
Virtual machine migration operation for ‘ADFS1’ failed at migration source ‘NODE-B’. (Virtual machine ID 4B5F2F6C-AEA3-4C7B-8342-E255D1D112D7)

Live migration fails due to non-existent SharedStoragePath or ConfigStoreRootPath explained

If you have worked with guest clusters and the ConfigStoreRootPath you know about issues with collections/ groups & checkpoints. This is related to those. If you haven’t heard anything yet read https://blog.workinghardinit.work/2018/09/10/correcting-the-permissions-on-the-folder-with-vhds-files-checkpoints-for-host-level-hyper-v-guest-cluster-backups/.

This is what a Windows Server 2016/2019 cluster that has not been configured with a looks like.

Get-VMHostCluster  -ClusterName “W2K19-LAB”

image

HKLM\Cluster\Resources\GUIDofWMIResource\Parameters there is a value called ConfigStoreRootPath which in PowerShell is know as the SharedStoragePath property.  You can also query it via

And this is what it looks like in the registry (0.Cluster and Cluster keys.) The resource ID we are looking at is the one of the Virtual Machine Cluster WMI resource.

image

If it returns a path you must verify that it exists, if not you’re in trouble with live migrations. You will also be in trouble with host level guest cluster backups or Hyper-V replicas of them. Maybe you don’t have guest cluster or use in guest backups and this is just a remnant of trying them out.

When I run it on the problematic cluster I get a path points to a folder on a CSV that doesn’t exist.

Get-VMHostCluster -ClusterName “W2K19-LAB
ClusterName SharedStoragePath
———– —————–
W2K19-LAB   C:\ClusterStorage\ReFS-01\SharedStoragePath

What happend?

Did they rename the CSV? Replace the storage array? Well as it turned out they reorganized and resized the CSVs. As they can’t shrink SAN LUNs the created new ones. They then leveraged storage live migration to move the VMs.

The old CSV’s where left in place for about 6 weeks before they were cleaned up. As this was the first time they ran Cluster Aware Updating after removing them this is the first time they hit this problem. Bingo! You probably think you’ll just change it to an existing CSV folder path or delete it. Well as it turns out, you cannot do that. You can try …

PS C:\Users\administrator1> Set-VMHostCluster -ClusterName “W2K19-LAB” -SharedStoragePath “C:\ClusterStorage\Volume1\SharedStoragePath”

Set-VMHostCluster : The operation on computer ‘W2K19-LAB’ failed: The WS-Management service cannot process the request. The WMI service or the WMI provider returned an unknown error: HRESULT 0x80070032
At line:1 char:1
+ Set-VMHostCluster -ClusterName
“W2K19-LAB” -SharedStoragePath “C:\Clu …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : NotSpecified: (:) [Set-VMHostCluster], VirtualizationException
+ FullyQualifiedErrorId : OperationFailed,Microsoft.HyperV.PowerShell.Commands.SetVMHostCluster

Or try …
$path = “C:\ClusterStorage\Volume1\Hyper-V\Shared”
Get-ClusterResource “Virtual Machine Cluster WMI” | Set-ClusterParameter -Name ConfigStoreRootPath -Value $path -Create

Whatever you try, deleting, overwriting, … no joy. As it turns out you cannot change it and this is by design. A shaky design I would say. I understand the reasons because if it changes or is deleted and you have guest clusters with collection depending on what’s in there you have backup and live migration issues with the guest clusters. But if you can’t change it you also run into issues if storage changes. You dammed if you do, dammed if you don’t.

Workaround 1

What

Create a CSV with the old name and folder(s) to which the current path is pointing. That works. It could even be a very small one. As test I use done of 1GB. Not sure of that’s enough over time but if you can easily extend your CSV that’s should not pose a problem. It might actually be a good idea to have this as a best practice. Have a dedicated CSV for the SharedStoragePath. I’ll need to ask Microsoft.

How

You know how to create a CSV and a folder I guess, that’s about it.  I’ll leave it at that.

Workaround 2

What

Set the path to a new one in the registry. This could be a new path (mind you this won’t fix any problems you might already have now with existing guest clusters).

Delete the value for that current path and leave it empty. This one is only a good idea if you don’t have a need for VHD Set Guest clusters anymore. Basically, this is resetting it to the default value.

How

There are 2 ways to do this. Both cost down time. You need to bring the cluster service down on all nodes and then you don’t have your CSV’s. That means your VMs must be shut down on all nodes of the cluster

The Microsoft Support way

Well that’s what they make you do (which doesn’t mean you should just do it even without them instructing you to do so)

  1. Export your HKLM\Cluster\Resources\GUIDofWMIResource\Parameters for save keeping and restore if needed.
  2. Shut down all VMs in the cluster or even the ones residing on a CSV even if not clusterd.
  3. Stop the cluster service on all nodes (the cluster is shutdown if you do that), leave the one you are working on for last.
  4. From one node, open up the registry key
  5. Click on HKEY_LOCAL_MACHINE and then click on file, then select load hive
  6. Browse to c:\windows\cluster, and select CLUSDB
  7. Click ok, and then name it DB
  8. Expand DB, then expand Resources
  9. Select the GUID of Virtual Machine WMI
  10. Click on parameters, on (configStoreRootPath) you will find the value
  11. Double click on it, and delete it or set it to a new path on a CSV that you created already
  12. Start the cluster service
  13. Then start the cluster service from all nodes, node by node

My way

Not supported, at your own risk, big boy rules apply. I have tried and tested this a dozen times in the lab on multiple clusters and this also works.

  1. In the registry key Cluster (HKLM\Cluster\Resources\GUIDofWMIResource\Parameters) of ever cluster node delete the content of the REGZ value for configStoreRootPath, so it is empty or change it to a new path on a CSV that you created already for this purpose.
  2. If you have a cluster with a disk witness, the node who owns the disk witness also has a 0.Cluster key (HKLM\0.Cluster\Resources\GUIDofWMIResource\Parameters). Make sure you also to change the value there.
  3. When you have done this. You have to shut down all the virtual machines. You then stop the cluster service on every node. I try to work on the node owning the disk witness and shut down the cluster on that one as the final step. This is also the one where I start again the cluster again first so I can easily check that the value remains empty in both the Cluster and the 0.Cluster keys. Do note that with a file share / cloud share witness, knowing what node was shut down last can be important. See https://blog.workinghardinit.work/2017/12/11/cluster-shared-volumes-without-active-directory/. That’s why I always remember what node I’m working on and shut down last.
  4. Start up the cluster service on the other nodes one by one.
  5. This avoids having to load the registry hive but editing the registry on every node in large clusters is tedious. Sure, this can be scripted in combination with shutting down the VMs, stopping the cluster service on all nodes, changing the value and then starting the cluster services again as well as the VMs. You can control the order in which you go through the nodes in a script as well. I actually did script this but I used my method. you can find it at the bottom of this blog post.

Both methods will work and live migrations will work again. Any existing problematic guest cluster VMs in backup or live migration is food for another blog post perhaps. But you’ll have things like driving your crazy.

Some considerations

Workaround 1 is a bit of a “you got to be kidding me” solution but at least it leaves some freedom replace, rename, reorganize the other CSVs as you see fit. So perhaps having a dedicated CSV just for this purpose is not that silly. Another benefit is that this does not involve messing around in the cluster database via the registry. This is something we advise against all the time but now has become a way to get out of a pickle.

Workaround 2 speaks for its self. There is two ways to achieve this which I have shown. But a word of warning. The moment the path changes and you have some already existing VHD Set guests clusters that somehow depend on that you’ll see that backups start having issues and possibly even live migrations. But you’re toast for all your Live migrations anyway already so … well yeah, what can I do.

So, this is by design. Maybe it is but it isn’t very realistic that your stuck to a path and name that hard and that it causes this much grief or allows for people to shoot themselves in the foot. It’s not like all this documented somewhere.

Conclusion

This needs to be fixed. While I can get you out of this pickle it is a tedious operation with some risk in a production environment. It also requires down time, which is bad. On top of that it will only have a satisfying result if you don’t have any VHD Set guest clusters that rely on the old path. The mechanism behind the SharedStoragePath isn’t as robust and flexible yet as it should be when it comes to changes & dealing with failed host level guest cluster backups.

I have tested this in Windows 2019 insider preview. The issue is still there. No progress on that front. Maybe in some of the future cumulative updates, things will be fixed to make guest clustering with VHD Set a more robust and reliable solution. The fact that Microsoft relies on guest clustering to support some deployment scenarios with S2D makes this even more disappointing. It is also a reason I still run physical shared storage-based file clusters.

The problematic host level backups I can work around by leveraging in guest backups. But the path issue is unavoidable if changes are needed.

After 2 years of trouble with the framework around guest cluster backups / VHD Set, it’s time this “just works”. No one will use it when it remains this troublesome and you won’t fix this if no one uses this. The perfect catch 22 situation.

The Script

$ClusterName = "W2K19-LAB"
$OwnerNodeWitnessDisk = $Null
$RemberLastNodeThatWasShutdown = $Null
$LogFileName = "ConfigStoreRootPathChange"

$RegistryPathCluster = "HKLM:\Cluster\Resources\$WMIClusterResourceID\Parameters"
$RegistryPathClusterDotZero = "HKLM:\0.Cluster\Resources\$WMIClusterResourceID\Parameters"
$REGZValueName = "ConfigStoreRootPath" 
$REGZValue = $Null #We need to empty the value
#$REGZValue = "C:\ClusterStorage\ReFS-01\SharedPath" #We need to set a new path.

#Region SupportingFunctionsAndWorkFlows
Workflow ShutDownVMs {
    param ($AllVMs)
    
    Foreach -parallel ($VM in $AllVMs) {
        InlineScript {
            try {
                If ($using:VM.State -eq "Running") {
                    Stop-VM -Name $using:VM.Name -ComputerName $using:VM.ComputerName -force 
                } 
            }
            catch {
                $ErrorMessage = $_.Exception.Message
                $ErrorLine = $_.InvocationInfo.Line
                $ExceptionInner = $_.Exception.InnerException
                Write-2-Log -Message "!Error occured!:" -Severity Error
                Write-2-Log -Message $ErrorMessage -Severity Error
                Write-2-Log -Message $ExceptionInner -Severity Error
                Write-2-Log -Message $ErrorLine -Severity Error
                Write-2-Log -Message "Bailing out - Script execution stopped" -Severity Error
            }
        }
    }
}

#Code to shut down all VMs on all Hyper-V cluster nodes
Workflow StartVMs {
    param ($AllVMs)
    Foreach -parallel ($VM in $AllVMs) {
        InlineScript {
            try {
                if ($using:VM.State -eq "Off") {
                    Start-VM -Name $using:VM.Name -ComputerName $using:VM.ComputerName 
                }
            }
            catch {
                $ErrorMessage = $_.Exception.Message
                $ErrorLine = $_.InvocationInfo.Line
                $ExceptionInner = $_.Exception.InnerException
                Write-2-Log -Message "!Error occured!:" -Severity Error
                Write-2-Log -Message $ErrorMessage -Severity Error
                Write-2-Log -Message $ExceptionInner -Severity Error
                Write-2-Log -Message $ErrorLine -Severity Error
                Write-2-Log -Message "Bailing out - Script execution stopped" -Severity Error
            }
        }
    }
}
function Write-2-Log {
    [CmdletBinding()]
    param(
        [Parameter()]
        [ValidateNotNullOrEmpty()]
        [string]$Message,
        [Parameter()]
        [ValidateNotNullOrEmpty()]
        [ValidateSet('Information', 'Warning', 'Error')]
        [string]$Severity = 'Information'
    )
 
    $Date = get-date -format "yyyyMMdd"
    [pscustomobject]@{
        Time     = (Get-Date -f g)
        Message  = $Message
        Severity = $Severity
        
    } | Export-Csv -Path "$PSScriptRoot\$LogFileName$Date.log" -Append -NoTypeInformation
}


#endregion

Try {
    Write-2-Log -Message "Connecting to cluster $ClusterName" -Severity Information
    $MyCluster = Get-Cluster -Name $ClusterName
    $WMIClusterResource = Get-ClusterResource "Virtual Machine Cluster WMI" -Cluster $MyCluster
    Write-2-Log -Message "Grabbing Cluster Resource: Virtual Machine Cluster WMI" -Severity Information
    $WMIClusterResourceID = $WMIClusterResource.Id
    Write-2-Log -Message "The Cluster Resource Virtual Machine Cluster WMI ID is $WMIClusterResourceID " -Severity Information
    Write-2-Log -Message "Checking for quorum config (disk, file share / cloud witness) on $ClusterName" -Severity Information

    If ((Get-ClusterQuorum -Cluster $MyCluster).QuorumResource -eq "Witness") {
        Write-2-Log -Message "Disk witness in use. Lookin up for owner node of witness disk as that holds the 0.Cluster registry key" -Severity Information
        #Store the current owner node of the witness disk.
        $OwnerNodeWitnessDisk = (Get-ClusterGroup -Name "Cluster Group").OwnerNode 
        Write-2-Log -Message "Owner node of witness disk is $OwnerNodeWitnessDisk" -Severity Information
    }
}
Catch {
    $ErrorMessage = $_.Exception.Message
    $ErrorLine = $_.InvocationInfo.Line
    $ExceptionInner = $_.Exception.InnerException
    Write-2-Log -Message "!Error occured!:" -Severity Error
    Write-2-Log -Message $ErrorMessage -Severity Error
    Write-2-Log -Message $ExceptionInner -Severity Error
    Write-2-Log -Message $ErrorLine -Severity Error
    Write-2-Log -Message "Bailing out - Script execution stopped" -Severity Error
    Break
}

try {
    $ClusterNodes = $MyCluster | Get-ClusterNode
    Write-2-Log -Message "We have grabbed the cluster nodes $ClusterNodes from $MyCluster" -Severity Information

    Foreach ($ClusterNode in $ClusterNodes) {
        #If we have a disk witness we also need to change the in te 0.Cluster registry key on the current witness disk owner node.
        If ($ClusterNode.Name -eq $OwnerNodeWitnessDisk) {
            if (Test-Path -Path $RegistryPathClusterDotZero) {
                Write-2-Log -Message "Changing $REGZValueName in 0.Cluster key on $OwnerNodeWitnessDisk who owns the witnessdisk to $REGZvalue" -Severity Information
                Invoke-command -computername $ClusterNode.Name -ArgumentList $RegistryPathClusterDotZero, $REGZValueName, $REGZValue {
                    param($RegistryPathClusterDotZero, $REGZValueName, $REGZValue)
                    Set-ItemProperty -Path $RegistryPathClusterDotZero -Name $REGZValueName -Value $REGZValue -Force | Out-Null}
            }
        }
        if (Test-Path -Path $RegistryPathCluster) {
            Write-2-Log -Message "Changing $REGZValueName in Cluster key on $ClusterNode.Name to $REGZvalue" -Severity Information
            Invoke-command -computername $ClusterNode.Name -ArgumentList $RegistryPathCluster, $REGZValueName, $REGZValue {
                param($RegistryPathCluster, $REGZValueName, $REGZValue)
                Set-ItemProperty -Path $RegistryPathCluster -Name $REGZValueName -Value $REGZValue -Force | Out-Null}
        }
    }

    Write-2-Log -Message "Grabbing all VMs on all clusternodes to shut down" -Severity Information
    $AllVMs = Get-VM –ComputerName ($ClusterNodes)
    Write-2-Log -Message "We are shutting down all running VMs" -Severity Information
    ShutdownVMs $AllVMs
}

catch {
    $ErrorMessage = $_.Exception.Message
    $ErrorLine = $_.InvocationInfo.Line
    $ExceptionInner = $_.Exception.InnerException
    Write-2-Log -Message "!Error occured!:" -Severity Error
    Write-2-Log -Message $ErrorMessage -Severity Error
    Write-2-Log -Message $ExceptionInner -Severity Error
    Write-2-Log -Message $ErrorLine -Severity Error
    Write-2-Log -Message "Bailing out - Script execution stopped" -Severity Error
    Break
}

try {
    #Code to stop the cluster service on all cluster nodes
    #ending with the witness owner if there is one
    Write-2-Log -Message "Shutting down cluster service on all nodes in $MyCluster that are not the owner of the witness disk" -Severity Information
    Foreach ($ClusterNode in $ClusterNodes) {
        #First we shut down all nodes that do NOT own the witness disk
    
        If ($ClusterNode.Name -ne $OwnerNodeWitnessDisk) {
            Write-2-Log -Message "Stop cluster service on node $ClusterNode.Name" -Severity Information
            if ((Get-ClusterNode -Cluster W2K19-LAB | where-object {$_.State -eq "Up"}).count -ne 1) {
                Stop-ClusterNode -Name $ClusterNode.Name -Cluster $MyCluster | Out-Null
            }
            Else {
                Stop-Cluster -Cluster $MyCluster -Force | Out-Null
                $RemberLastNodeThatWasShutdown = $ClusterNode.Name
            }
        }
    }
    #Whe then shut down the nodes that owns the witness disk
    #If we have a fileshare etc,  this won't do anything.
    Foreach ($ClusterNode in $ClusterNodes) {
        If ($ClusterNode.Name -eq $OwnerNodeWitnessDisk) {
            Write-2-Log -Message "Stopping cluster and as such last node $ClusterNode.Name" -Severity Information
            Stop-Cluster -Cluster $MyCluster -Force | Out-Null
            $RemberLastNodeThatWasShutdown = $OwnerNodeWitnessDisk
        }
    }  
    #Code to start the cluster service on all cluster nodes,
    #starting with the original owner of the witness disk
    #or the one that was shut down last


    Foreach ($ClusterNode in $ClusterNodes) {
        #First we start the node that was shut down last. This is either the one that owned the witness disk
        #or just the last node that was shut down in case of a fileshare
        If ($ClusterNode.Name -eq $RemberLastNodeThatWasShutdown) {
            Write-2-Log -Message "Starting the clusternode $ClusterNode.Name that was the last to shut down" -Severity Information
            Start-ClusterNode -Name $ClusterNode.Name -Cluster $MyCluster | Out-Null
        }           
    }

    Write-2-Log -Message "Starting the all other clusternodes in $MyCluster" -Severity Information
    Foreach ($ClusterNode in $ClusterNodes) {
        #We then start all the other nodes in the cluster.     
        If ($ClusterNode.Name -ne $RemberLastNodeThatWasShutdown) {
            Write-2-Log -Message "Starting the clusternode $ClusterNode.Name" -Severity Information
            Start-ClusterNode -Name $ClusterNode.Name -Cluster $MyCluster | Out-Null
        }
    }
}

catch {
    $ErrorMessage = $_.Exception.Message
    $ErrorLine = $_.InvocationInfo.Line
    $ExceptionInner = $_.Exception.InnerException
    Write-2-Log -Message "!Error occured!:" -Severity Error
    Write-2-Log -Message $ErrorMessage -Severity Error
    Write-2-Log -Message $ExceptionInner -Severity Error
    Write-2-Log -Message $ErrorLine -Severity Error
    Write-2-Log -Message "Bailing out - Script execution stopped" -Severity Error
    Break
}

Start-sleep -Seconds 15
Write-2-Log -Message "Grabbing all VMs on all clusternodes to start them up" -Severity Information
$AllVMs = Get-VM –ComputerName ($ClusterNodes)
Write-2-Log -Message "We are starting all stopped VMs" -Severity Information
StartVMs $AllVMs
#Hit it again ...
$AllVMs = Get-VM –ComputerName ($ClusterNodes)
StartVMs $AllVMs

The script is below as promised. If you use this without testing in a production environment and it blows up in your face you are going to get fired and it is your fault. You can use it both to introduce as fix the issue. The action are logged in the directory where the script is run from.

High performance live migration done right means using SMB Direct

I  saw people team two 10GBps NICs for live migration and use TCP/IP. They leveraged LACP for this as per my blog Teamed NIC Live Migrations Between Two Hosts In Windows Server 2012 Do Use All Members . That was a nice post but not a commercial to use it. It was to prove a point that LACP/Static switch dependent teaming did allow for multiple VMs to be live migrated in the same direction between two node. But for speed, max throughput & low CPU usage teaming is not the way to go. This is not needed as you can achieve bandwidth aggregation and redundancy with SMB via Multichannel. This doesn’t require any LACP configuration at all and allows for switch independent aggregation and redundancy. Which is great, as it avoids stacking with switches that don’t do  VLT, MLAG,  …

Even when your team your NICs your better off using SMB. The bandwidth aggregation is often better. But again, you can have that without LACP NIC teaming so why bother? Perhaps one reason, with LACP failover is faster, but that’s of no big concern with live migration.

We’ll do some simple examples to show you why these choices matter. We’ll also demonstrate the importance of an optimize RSS configuration. Do not that the configuration we use here is not a production environment, it’s just a demo to show case results.

But there is yet another benefit to SMB.  SMB Direct.  That provides for maximum throughput, low latency and low CPU usage.

LACP NIC TEAM with 2*10Gbps with TCP

With RSS setting on the inbox default we have problems reaching the best possible throughput (17Gbps). But that’s not all. Look at the CPU at the time of live migration. As you can see it’s pretty taxing on the system at 22%.

image

If we optimize RSS with 8 RSS queues assigned to 8 physical cores per NIC on a different CPU (dual socket, 8 core system) we sometimes get better CPU overhead at +/- 12% but the throughput does not improve much and it’s not very consistent. It can get worse and look more like the above.

image

LACP NIC TEAM with 2*10Gbps with SMB (Multichannel)

With the default RSS Settings we still have problems reaching the best possible throughput but it’s better (19Gbps). CPU wise, it’s pretty taxing on the system at 24%.

image

If we optimize RSS with 8 RSS queues assigned to 8 physical cores per NIC on a different CPU (dual socket, 8 core system) we get better over CPU overhead at +/- 8% but the throughput actually declined (17.5 %). When we run the test again we were back to the results we saw with default RSS settings.

image

Is there any value in using SMB over TCP with LACP for live migration?

Yes there is. Below you see two VMs live migrate, RSS is optimized. One core per VM is used and the throughput isn’t great, is it. Depending on the speed of your CPU you get at best 4.5 to 5Gbps throughput per VM as that 1 core per VM is the limiting factor. Hence see about 9Gbps here, as there’s 2 VMs, each leveraging 1 core.

image

Now look at only one VM with RSS is optimized with SMB over an LACP NIC team. Even 1 large memory VM leverages 8 cores and achieves 19Gbps.

image

What about Switch Independent Teaming?

Ah well that consumes a lot less CPU cycles but it comes at the price of speed. It has less CPU overhead to deal with in regards to LACP. It can only receive on one team member. The good news is that even a single VM can achieve 10Gbps (better than LACP) at lower CPU overhead. With SMB you get better CPU distribution results but as the one member is a bottle neck, not faster. But … why bother when we have …better options!? Read on Smile!

No Teaming – 2*10Gbps with SMB Multichannel, RSS Optimized

We are reaching very good throughput but it’s better (20Gbps) with 8 RSS queues assigned to 8 physical cores. The CPU at the time of live migration is pretty good at 6%-7%.

image

Important: This is what you want to use if you don’t have 10Gbps but you do have 4* 1Gbps NICs for live migration. You can test with compression and LACP teaming if you want/can to see if you get better results. Your mirage may vary Smile. If you have only one 1Gbps NIC => Compression is your sole & only savior.

2*10Gbps with SMB Direct

We’re using perfmon here to see the used bandwidth as RDMA traffic does not show up in Task Manager.

image

We have no problems reaching the best possible throughput but it’s better (20Gbps, line speed). But now look at the CPU during live migration. How do you like them numbers?

Do not buy non RDMA capable NICs or Switches without DCB support!

These are real numbers, the only thing is that the type and quality of the NICs, firmware and drivers used also play a role an can skew the results a bit. The onboard LOM run of the mill NICs aren’t always the best choice. Do note that configuration matters as you have seen. But SMB Direct eats them all for breakfast, no matter what.

Convinced yet? People, one of my core highly valuable skillsets is getting commodity hardware to perform and I tend to give solid advice. You can read all my tips for fast live migrations here in Live Migration Speed Check List – Take It Easy To Speed It Up

Does all of this matter to you? I say yes , it does. It depends on your environment and usage patterns. Maybe you’re totally over provisioned and run only very small workloads in your virtual machines. But it’s save to say that if you want to use your hardware to its full potential under most circumstances you really want to leverage SMB Direct for live migrations. What about that Hyper-V cluster with compute and storage heavy applications, what about SQL Server virtualization? Would you not like to see this picture with SMB RDMA? The Mellanox  RDMA cards are very good value for money. Great 10Gbps switches that support DCB (for PFC/ETS) can be bought a decent prices. You’re missing out and potentially making a huge mistake not leveraging SMB Direct for live migrations and many other workloads. Invest and design your solutions wisely!

Live Migration Speed Check List – Take It Easy To Speed It Up

When configuring live migrations it’s easy to go scrounge on all the features and capabilities we have in Windows Server 2012 R2.

There is no one stopping you configuring 50 simultaneous live migrations. When you have only one, two or even four 1Gbps NICs at your disposal,  you might stick to 1 or 2 VMs per available 1Gbps. But why limit yourself if you have one or multiple 10Gbps pipes or bigger ready to roll? Well let’s discuss a little what happens when you do a live migration on a Hyper-V cluster with CSV storage. Initiating a live migrations kicks of a slew of activities.

  1. First it is establish form where (aka the source host) to where we are migrating (aka the target host).
  2. Permissions are checked, are we allowed to do this?
  3. Do we have enough memory on the target to do this? If so allocate that memory.
  4. Set up a skeleton VM on the target host that is a perfect copy of the source VM’s  specifications and configure dependencies on the target host.
  5. Let’s see if we can get a network connection set up and running. If that works, we’re cool and can now transfer the memory.
  6. A bitmap is created to track the changes to the memory pages of the source VM’s pages. Each memory page is copied from the source host to the target host VM during which the memory page is marked clean.
  7. As long as the source VM is running memory is changing, which continues to be tracked in the bitmap and as such that page is mapped as dirty over there. In an iterative process this dirty memory is copied over again and so on. This continues until the remaining dirty memory is minimal. This will take longer if the VM is very memory intensive.
  8. The tiniest amount of not yet copied dirty memory is that part of a VMs state that is copied during “black out”. For this to happen the VM on the source host is paused, the remaining state is copied.
  9. A final check is done to confirm all is well and then the virtual machine is resumed on the target host.
  10. Any remains of the VM on the source host are cleaned up.

That’s actually a lot of work and as you can see copying the state is just part of the process. The more bandwidth & the lower the latency we throw at this part of the process becomes less of the total time spent during live migration.

If you can’t fill of just fill the bandwidth of your 10/40/46Gbps pipe or pipes & you operate at line speed, what’s left as overhead? Everything that’s not actual the copy of VM state. The trick is to keep the host busy so you minimize idle time of the network copies. I.e we want to fill up that bandwidth just right but  not go overboard otherwise  the work to manage a large number of multiple live migrations might actually slow you down. Compare it to juggling with balls. You might be very good and fast at it but when you have to many balls to attend to you’ll get into trouble because you have to spread you attention to wide, i.e. you’re doing more context switching that is optimal.

So tweaking the number of simultaneous live migrations to your environment is the last step in making sure a node is drained as fast as possible. Slowing things down can actually speed things up.  So when you get your 10Gbps or better pipes in production it pays of to test a bit and find the best settings for your environment.

Let’s recap all of the live migration optimization tips I have given over the years and add a final word of advice.  Those who have been reading my blog for a while know I enjoy testing to find what works best and I do tweak settings to get best performance and results. However you have to learn and accept that it makes no sense in real life to hunt for 1% or 2% reduction in live migration speeds. You’ll get one off  hiccups that slow you down more than that.

So what you need to do is tweak the things that matter the most and will get you 99% results?

  • Get the biggest pipe you need & can afford. Bigger pipes are always better than lots of aggregated smaller pipes when it come to low latency & high throughput.
  • Choose the best performance settings Hyper-V offers you. You can choose from TCP/IP,Compression, SMB. Ben Armstrong has a blog post on this Faster Live Migration–Which Option Should You Choose? I’d like to add that you can use NIC teaming for live migration as well and prior to Windows Server 2012 R2 that was the only way to aggregate bandwidth. Now you have more options. I prefer SMB but when I don’t have 10Gbps at my disposal I have found that compression really makes a difference. In my home  lab where I have only 1Gbps, the horror, it stopped me from going crazy Smile (being addicted to 10Gbps).

image

  • Optimize the power settings for your server BIOS if you want an extra speed & smoothness with 10Gbps (less so with 1Gbps). Look here An Early Look At Live Migration Over TCP/IP & Multichannel In Windows Server 2012 R2 Preview, the network traffic is a lot more stable, i.e. a flat line!  In Windows 2008 R2 this was a real need for 10Gbps or you’d be stuck at 16% max.
  • Enable Jumbo Frames for another 15-20%. Thanks to Multi Channel I can visualize this now. See also this blog post Live Migration Can Benefit From Jumbo Frames. The pictures say it all!
  • Figure out the best number of simultaneous live migrations in your environments. Well you just read this blog, so now you know.  Start at 4 and experiment upwards. Tune it back down if the speed deteriorates. The “best” number depends on your environment.

If you do these 5 things you’ll have really gotten the best performance out of your infrastructure that’s possible for live migration. Bar compression, which is not magic either but reducing the GB you need to transport at the cost of CPU cycles, you just cannot push more than 1.25GB/s trough a single 10Gbps pipe and so on. You might keep looking to grab another 1% or 2% improvement left and right  but might I suggest you have more pressing issues to attend to that, when fixed are a lot more rewarding? Knocking 1 or 2 seconds of a 100 second host evacuation is not going to matter, it’s a glitch. Stop, don’t over engineer it, don’t IBM it, just move on. If you don’t get top performance after tweaking these 5 settings you should look at all the moving parts involved between the host as the issue is there (drivers, firmware, cables, switch configurations, …) as you have a mistake or problem somewhere along the way.

The Hyper V Amigos Showcast Episode 3: Live Migration

Here’s the 3rd episode of the Hyper-V Amigos show cast. As Carsten was overwhelmed with work (running your own business is very hard work) and had some issues with his storage spaces lab due to testing we’re discussing live migration optimizations in this installment.

 Carsten Rachfahl and I had a lot of fun again, even during the second take, yes we needed one. Apparently these software thingies require me to click on “record” Smile as there is no intelligent agent yet to act on my intention.

Carsten & I discussing & showing some live migration optimizations

 

I have written many blog posts on this subject already and I’m sure I’ll write more. Optimizing the use of the hypervisor (Hyper-V) across the entire storage, compute/memory & networking stack is one of my specialties and I enjoy this part of my job very much. I also like to share this information as real.

I’m sure you’ll agree that Hyper-V has come a long way in short period of time and I’m pretty sure we’re going to see Microsoft continue this pace for quite a while.

I have a blog post coming out (it’s in the queue) on my 4 top recommendations for optimal live migrations but here’s a search of relevant blog posts on this topic, and we referred to some of them during our show cast:

https://blog.workinghardinit.work/?s=Live+Migration&submit=Search

When you’re done reading al these posts on live migration you’ll have earned a nice refreshing beverage of your choice Mug.

One more thing, if you like these show casts let us know! Last but not least, I’m doing a demo heavy (only) session at ITProceed on June 12th 2014. Many local experts, community members  and I will be around afterwards to discuss these technologies.