My favorite deployment for VMs with Discrete Device Assignment for GPU

Introduction

Recently I had an interesting discussion on how to leverage Discrete Device Assignment (DDA) for GPU needs when it’s only needed for a certain number of virtual machines. Someone had read my blogs on leveraging DDA, which made her optimistic and enthusiastic. But she noticed in the lab that she could not leverage DDA on a VM running on a cluster, and that she could not use storage QoS policies on a stand-alone Hyper-V host with local storage. So, what could she do?

Well, for one, her findings are correct. Microsoft did not enable DDA on clustered virtual machines. It doesn’t make sense, as the GPU hardware is tied to the virtual machine, so any high availability, whether planned (live migration) or unplanned (failover), isn’t possible anyway. It just cannot be done. I hear you when you say “but they pulled it off for SR-IOV for networking”. Sure, but please keep in mind that network cards with Ethernet and TCP/IP allow for different approaches than high-end video.

My favorite deployment for VMs with Discrete Device Assignment for GPU

My favorite deployment for VMs with Discrete Device Assignment (DDA) for GPU leverages SMB3 SOFS shares for the virtual hard disks and stand-alone Hyper-V hosts that are member servers in the domain. Let me explain why.

Based on what we discussed above, we have some options. One workaround is running the DDA virtual machines, not highly available, on local storage on a cluster node. But that would mean you would have a few VMs on all the nodes and that all those nodes must have a DDA capable GPU. Or, if you limit the number of nodes that have such a GPU, you’ll have a few odd balls in your cluster. You’ll need to manage some extra complexity and must safeguard against assigning a GPU via DDA that is already in use for RemoteFX (see the sketch below). That causes all kinds of unpleasantness, nothing too deadly, but not something you want to do on your production VDI clusters for fun. It’s a bit like not running a domain controller on a CSV and not making it highly available. If that’s the only option you have, you can do that, and I do when needed, as Microsoft has improved a lot of things to make this a better and less risky experience. But I prefer to have either a physical one or to host it on a separate, non-clustered Hyper-V host if that’s an option, because not all storage solutions and environments have all the capabilities needed to make that foolproof.
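
A quick check of which GPUs are currently enabled for RemoteFX helps avoid that conflict. A minimal sketch of such a check on the host; the name filter is just an example based on the GRID K1 mentioned later, so adjust it to your hardware:

#On the Hyper-V host: list the physical GPUs and their RemoteFX status
Get-VMRemoteFXPhysicalVideoAdapter | Format-List *

#Disable RemoteFX on the GPU you plan to pass through via DDA (the name filter is an example)
Get-VMRemoteFXPhysicalVideoAdapter | Where-Object Name -like '*GRID K1*' | Disable-VMRemoteFXPhysicalVideoAdapter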

Also note that running other storage on an S2D node isn’t supported. You have your OS on the boot disks and the disks used in Storage Spaces Direct; odd ones out aren’t supposed to be there, as S2D will try to recruit them. You can do this when using traditional shared storage.

What I also don’t like about that is that if that storage is not an SMB3 SOFS share you don’t get the benefit of storage QoS policies in Windows Server 2016, as those only work with CSV or SOFS. So, optionally, you could leave the non-clustered VM on a CSV. But that’s perhaps a bit confusing, and some people might think someone forgot to make the machine highly available.

My preferred setup to get highly available storage for virtual machines with DDA needs that benefits from what storage QoS policies have to offer for VDI is to use stand-alone Hyper-V hosts that have DDA capable GPUs and leverage SMB3 SOFS shares for the virtual machines.


The virtual machines cannot be highly available anyway, so you lose nothing there. The beauty is that in this case, as you leverage a Windows Server 2016 SOFS cluster for Hyper-V storage over SMB3 shares, you do get storage QoS policies.
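
As a sketch of how such a policy ends up on the VM’s disks: the policy name and VM name match the queries below, but the IOPS values are purely illustrative and $DedicatedTier1PolicyId is a placeholder for the policy’s GUID that you look up on the SOFS cluster:

#On a SOFS node: create a dedicated policy (the IOPS values are purely illustrative)
New-StorageQosPolicy -Name DedicatedTier1Policy -PolicyType Dedicated -MinimumIops 250 -MaximumIops 1000

#On the Hyper-V host: apply the policy ID (a GUID you look up on the SOFS cluster) to the VM's virtual hard disks
Get-VM -Name DDAVMSOFSStorage | Get-VMHardDiskDrive | Set-VMHardDiskDrive -QoSPolicyID $DedicatedTier1PolicyId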

#On a SOFS node

Get-StorageQosPolicy -Name DedicatedTier1Policy | Get-StorageQosFlow | ft InitiatorName, *IOPS, Status, PolicyID, filePath -AutoSize

#Query for the VM disks on the Hyper-V node

Get-VM -Name DDAVMSOFSStorage -ComputerName RemoteFXHost | Get-VMHardDiskDrive | fl *


#We generate some IO and get some stats on a SOFS node

Get-StorageQosFlow

Get-StorageQosVolume -Mountpoint C:\ClusterStorage\SOFSDEMO\

Get-StorageQosVolume -Mountpoint C:\ClusterStorage\SOFSDEMO\ | fl


You can start out with one Hyper-V node and add more when needed; that’s your scale-out. Depending on the needs of the virtual machines, the specs of the servers (memory, CPU cores) and the capability and number of GPUs in the video cards, you get some scale-up as well.

To learn more about DDA go here:  https://blog.workinghardinit.work/?s=DDA&submit=Search

To learn more about storage QoS policies go here:

Some more considerations

By going disaggregated you can leverage a SOFS share both for virtual machines running on a Hyper-V cluster and for those on stand-alone (non-clustered) Hyper-V hosts that are domain members. The SOFS cluster can be leveraging S2D, traditional Storage Spaces with shared SAS (JBODs) or even an FC, iSCSI or shared SAS SAN if that’s the only option you have. That’s all OK as long as it’s SOFS running on Windows Server 2016 and the Hyper-V hosts (stand-alone or clustered) are running Windows Server 2016 as well (needed for storage QoS policies and DDA). There is no need for the Hyper-V host to be part of a cluster to get the results you need. If I use SOFS for both scenarios I can use the same storage array, but I don’t need to; I could also use separate storage arrays. If the Hyper-V cluster is leveraging CSV instead of SOFS, I will need a separate one for SOFS, as it’s ill-advised to mix Hyper-V workloads with the SOFS role. Keep things easy, clear and supportable. I’ll borrow a picture I got from a Microsoft PM recently that calls out the bad ideas.


Hyper-V Amigos Showcast Episode 14: RemoteFX & DDA

Carsten and I dove into our labs and played around with RemoteFX and Discrete Device Assignment in Windows Server 2016 Hyper-V and RDS. This resulted in the Hyper-V Amigos Showcast Episode 14: RemoteFX & DDA.

Some background on RemoteFX & DDA

I’ve discussed the new capabilities in previous blog posts such as https://blog.workinghardinit.work/?s=DDA&submit=Search and RemoteFX and vGPU Improvements in Windows Server 2016 Hyper-V. But here the Hyper-V Amigos talk about it for your benefit and enjoyment. I for one know we had a ton of fun. Microsoft-only VDI solutions are really taking off, both on-premises and in Azure, in cost-conscious environments that still need good performance. I think we’ll see an uptake of such deployments, as Microsoft has made some decisions and added some features to make this more feasible.

Hyper-V Amigos Showcast Episode 14: RemoteFX & DDA.

Click this link to watch Hyper-V Amigos Showcast Episode 14: RemoteFX & DDA.


There’s a bit of a learning curve associated with using DDA in Windows Server 2016. You’ll have to get acquainted with how to do it and put it to the test in labs and POCs. Do this before you even start thinking about designing production-ready solutions. Having a good understanding of how it works and behaves is paramount to success.

Enjoy!

Discrete Device Assignment with Storage Controllers

Discrete Device Assignment with storage controllers is the second type of DDA we’ll look at. I have written before on Discrete Device Assignment in Windows Server 2016. At the time of writing, the two officially supported use cases are GPU and NVMe disk pass-through. I have demonstrated the configuration of DDA with an NVIDIA GRID K1 GPU here.

Meanwhile, I have also successfully configured DDA with an NVMe disk. I’ll demonstrate how to do this later, but in this blog post I want to address a consequence of this experiment. So let’s take a preliminary look at Discrete Device Assignment with storage controllers.

With an Intel NVMe disk you do not assign the actual disk to the virtual machine; it’s the controller. By disabling and dismounting the standard NVM Express controller from the host and assigning it to the guest, you make the NVMe disk available in the guest.
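
As a rough sketch of that generic DDA flow in PowerShell: the $InstanceId variable and the VM name DDAStorageVM are placeholders, and in production you’d verify the device supports DDA before dismounting it:

#On the host: get the location path of the controller, disable it and dismount it ($InstanceId and the VM name are placeholders)
$LocationPath = (Get-PnpDeviceProperty -KeyName 'DEVPKEY_Device_LocationPaths' -InstanceId $InstanceId).Data[0]
Disable-PnpDevice -InstanceId $InstanceId -Confirm:$false
Dismount-VMHostAssignableDevice -LocationPath $LocationPath -Force

#Assign the dismounted controller to the guest
Add-VMAssignableDevice -LocationPath $LocationPath -VMName DDAStorageVM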


This is supported. It makes me wonder if Microsoft would consider officially supporting other storage controllers. What if you need 8 TB of high-performance storage dedicated to a single VM? You could assign an extra controller in the host with a RAID 10 SSD virtual disk to the virtual machine. How different would that be from NVMe? Not too much, I guess.


By the way, this assigning and un-assigning keeps the data intact. This means it is also a roundabout sort of way to get data in and out of a virtual machine. One of the funky, crazy ideas I already have is to use this to export and import data. Maybe.

I really do wonder how things will evolve here. Perhaps these are too much “niche” use case scenarios, but it’s interesting nonetheless. Then again, perhaps the advances in NVMe fabrics and the added performance available via VHDX will outpace the need for DDA storage solutions.

Anyway, enough musing, as we’ll be taking a more hands-on look at assigning an NVMe disk to a VM and Discrete Device Assignment with storage controllers via PowerShell in a later blog post and video.

Issues to watch out for when configuring Discrete Device Assignment

When you’re discovering how to get Discrete Device Assignment to work, there are some potential bumps that might trip you up. So what are the issues to watch out for when configuring Discrete Device Assignment? We’ll share some here, but note that this is from testing with Windows Server 2016 Technical Preview 4. Changes can and probably will happen before RTM.

Make sure your VMs are running on the latest configuration version. That means 7.x at the time of writing. Many of the new features require this, as discussed in Windows Server 2016 TPv4 Hyper-V brings virtual machine configuration version 7.

Check the configuration version of the VM

When you try to add a GPU to a VM via Discrete Device Assignment you’ll get an error when the VM has version 5.0 instead of 7.x. This can easily happen when you move VMs from older versions to a shiny new Windows Server 2016 environment, as in the example below:


Naturally, all of this is logged in the Hyper-V-VMMS Admin logs as well:

‘W2K12R2’ cannot add device ‘Virtual Pci Express Port’ until the virtual machine is upgraded. (Virtual machine ID 592A920F-B0E9-480C-9052-A397B377BCC9)
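A quick way to verify and fix this, sketched with the VM name from the error above; the VM must be shut down first, and upgrading the configuration version is a one-way operation:

#Check the configuration version of the VM on the Hyper-V host
Get-VM -Name W2K12R2 | Format-Table Name, Version -AutoSize

#Shut the VM down and upgrade it to the host's latest configuration version (one-way operation)
Stop-VM -Name W2K12R2
Get-VM -Name W2K12R2 | Update-VMVersion -Confirm:$false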

Mind your Dynamic Memory settings

Another thing you need to watch out for is that when you use Dynamic Memory, the startup memory and the minimum memory values have to match. So minimum memory cannot be lower than the startup memory. Do note that this is TPv4 and things might change.


Cannot add the device to ‘W2K12R2’ as that virtual machine has Dynamic Memory configured with different startup memory and minimum memory values. When adding a device, the virtual machine must be configured with equal startup memory and minimum memory values.(Virtual machine ID 592A920F-B0E9-480C-9052-A397B377BCC9)

If you try to change this on a VM with Discrete Device Assignment enabled, you’ll also find that this isn’t allowed.


Cannot perform the operation for ‘W2K12R2’ as the specified memory settings are not compatible for device assignment. The startup memory size and minimum memory size must be equal when Dynamic Memory is enabled and devices are also assigned.(Virtual machine ID 592A920F-B0E9-480C-9052-A397B377BCC9)
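To set this up front, a minimal sketch; the VM name and memory sizes are just examples:

#Give the VM equal startup and minimum memory before assigning a device
Set-VMMemory -VMName W2K12R2 -DynamicMemoryEnabled $true -StartupBytes 4GB -MinimumBytes 4GB -MaximumBytes 8GB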

Set the automatic stop action to “Turn off the virtual machine”

I already mentioned this in the blog, but you need to make sure that the automatic stop action for the virtual machine is set to “Turn off the virtual machine” and not to the default of “Save the virtual machine state”. You cannot use DDA unless you do so.


Cannot add the device to ‘W2K12R2’ as that virtual machine is configured to go to saved state on host shutdown. (Virtual machine ID 592A920F-B0E9-480C-9052-A397B377BCC9)
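In PowerShell that’s a one-liner; the VM name is just the one from the error above:

#Set the automatic stop action to turn off the VM instead of saving its state
Set-VM -Name W2K12R2 -AutomaticStopAction TurnOff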

Again, changing this on a VM that has DDA assigned will not work.


Discrete means one on one

Remember that you cannot assign a device to more than one VM. The thing here is that it won’t block you from doing so when both VMs are shut down, at least not in TPv4. But the device is dedicated, and when you do it anyway and then try to start those VMs, it won’t work.


An error occurred while attempting to start the selected virtual machine(s).

‘RFX-WIN10ENT’ failed to start.

Virtual Pci Express Port (Instance ID 9B15DD32-5F94-46EF-8524-501007830322): Failed to Power on with Error ‘The device is in use by an active process and cannot be disconnected.’.

When you try to assign a GPU that is already assigned to a running VM to another VM, it will block you!

The error clearly identifies the VM the device is already assigned to.

Add-VMAssignableDevice -LocationPath $LocationPathOfDismountedDA -VMName RFX-WIN10ENT
Add-VMAssignableDevice : ‘RFX-WIN10ENT’ failed to add resources to ‘RFX-WIN10ENT’.
Virtual Pci Express Port (Instance ID EA7CB907-C38A-4396-97E0-A9A8F3C2D1B0): Failed to Power on with Error ‘The device
is in use by an active process and cannot be disconnected.’.
‘RFX-WIN10ENT’ failed to add resources. (Virtual machine ID 425A366E-E380-4D8C-AADE-DE16EAC0A104)
‘RFX-WIN10ENT’ Virtual Pci Express Port (Instance ID EA7CB907-C38A-4396-97E0-A9A8F3C2D1B0): Failed to Power on with
Error ‘The device is in use by an active process and cannot be disconnected.’ (0x80070964). (Virtual machine ID
425A366E-E380-4D8C-AADE-DE16EAC0A104)
Could not allocate the PCI Express device with the Plug and Play Device Instance path
‘PCIP\VEN_10DE&DEV_0FF2&SUBSYS_101210DE&REV_A1\6&17F903&0&00400010’ because it is already in use by another VM.
At line:1 char:1
+ Add-VMAssignableDevice -LocationPath $LocationPathOfDismountedDA -VMN …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : NotSpecified: (:) [Add-VMAssignableDevice], VirtualizationException
+ FullyQualifiedErrorId : OperationFailed,Microsoft.HyperV.PowerShell.Commands.AddVmAssignableDevice
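If you run into this, you can check what is assigned where and free the device up again, sketched here with the names used above; do this while the VM that holds the device is shut down, and only then assign it to the VM that should get it:

#List the devices currently assigned to a VM
Get-VMAssignableDevice -VMName RFX-WIN10ENT

#Remove the device from the VM that holds it
Remove-VMAssignableDevice -LocationPath $LocationPathOfDismountedDA -VMName RFX-WIN10ENT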

Shut down your VM to make changes to DDA

Last but not least, to use DDA (assign, configure) with a VM you have to shut it down. Removing devices whilst the VM is running isn’t blocked, but the results can be quite “harsh”. This is me removing a DDA GPU from a Windows Server 2012 R2 VM whilst it’s running.


The fun part is that you can add it again while the VM is running and it will work, but it’s not a healthy thing to do.
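
The safer sequence, sketched with the names used earlier, is to stop the VM first, remove the device and only then hand it back to the host (skip the mount step if you intend to assign it to another VM instead):

#Stop the VM, remove the assigned device and remount it on the host
Stop-VM -Name RFX-WIN10ENT
Remove-VMAssignableDevice -LocationPath $LocationPathOfDismountedDA -VMName RFX-WIN10ENT
Mount-VMHostAssignableDevice -LocationPath $LocationPathOfDismountedDA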

As stated above, these notes are from testing with Windows Server 2016 Technical Preview 4, so things can still change. Happy testing!