My favorite deployment for VMs with Discrete Device Assignment for GPU

Introduction

Recently I had an interesting discussion on how to leverage Discrete Device Assignment (DDA) for GPU needs when it’s only needed for a certain number of virtual machines. Someone had read my blogs on leveraging DDA, which made her optimistic and enthusiastic. But she noticed in the lab that she could not leverage DDA on a VM running on a cluster, and that she could not use storage QoS policies on a stand-alone Hyper-V host with local storage. So, what could she do?

Well, for one, her findings are correct. Microsoft did not enable DDA on clustered virtual machines. It wouldn’t make sense: the GPU hardware is tied to the virtual machine, so high availability, whether planned (live migration) or unplanned (failover), isn’t possible anyway. It just cannot be done. I hear you when you say “but they pulled it off for SR-IOV for networking”. Sure, but please keep in mind that network cards with Ethernet and TCP/IP allow for different approaches than high-end video.

My favorite deployment for VMs with Discrete Device Assignment for GPU

My favorite deployment for VMs with Discrete Device Assignment (DDA) for GPU leverages SMB3 SOFS shares for the virtual hard disks and stand-alone Hyper-V hosts that are member servers in the domain. Let me explain why.

Based on what we discussed above we have some options. One workaround is running the DDA virtual machines, not highly available, on local storage on a cluster node. But that would mean you would have a few VMs on all the nodes, and that all those nodes must have a DDA capable GPU. Or, if you limit the number of nodes that have such a GPU, you’ll have a few oddballs in your cluster. You’ll need to manage some extra complexity and must safeguard against assigning a GPU via DDA that is already in use for RemoteFX. That causes all kinds of unpleasantness, nothing too deadly, but not something you want to do on your production VDI clusters for fun. It’s a bit like running a domain controller on a cluster node without putting it on a CSV and without making it highly available. If that’s the only option you have, you can do that, and I do when needed, as Microsoft has improved a lot of things to make this a better and less risky experience. But I prefer to have either a physical one or host it on a separate non-clustered Hyper-V host if that’s an option, because not all storage solutions and environments have all the capabilities needed to make that foolproof. If you do go down that road, a quick check like the one below helps avoid grabbing a GPU that RemoteFX is already using.
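On the host, something along these lines can serve as that check (a sketch; I’m assuming the Enabled and CompatibleForVirtualization properties the Hyper-V module exposes on its RemoteFX adapter objects):

#On the Hyper-V host: make sure the GPU you want to hand over to DDA is not enabled for RemoteFX
Get-VMRemoteFXPhysicalVideoAdapter | ft Name, Enabled, CompatibleForVirtualization -AutoSize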

Also note that running other storage on an S2D node isn’t supported. You have your OS on the boot disks and the disks used in Storage Spaces; odd ones out aren’t supposed to be there, as S2D will try to recruit them. You can get away with it when using traditional shared storage.

What I also don’t like about that approach is that a VM on local storage doesn’t get the benefit of storage QoS policies in Windows Server 2016; those only work against CSV or SMB3 SOFS storage. So optionally you could leave the non-clustered VM on a CSV. But that’s perhaps a bit confusing, and some people might think someone forgot to make the machine highly available.

My preferred setup to get highly available storage for virtual machines with DDA needs, one that benefits from what storage QoS policies have to offer for VDI, is to use stand-alone Hyper-V hosts that have DDA capable GPUs and leverage SMB3 SOFS shares for the virtual machines.
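As a rough sketch of what that looks like on such a stand-alone host (the VM name, share path and GPU filter below are placeholders, and the exact DDA steps depend on your hardware; see the DDA link further down):

#On the stand-alone Hyper-V host: the VM and its disk live on the SOFS share
New-VM -Name DDAVMSOFSStorage -MemoryStartupBytes 8GB -Generation 2 -Path \\SOFS\VMStore -NewVHDPath \\SOFS\VMStore\DDAVMSOFSStorage.vhdx -NewVHDSizeBytes 127GB

#A VM with DDA cannot be saved, so force it to turn off instead
Set-VM -Name DDAVMSOFSStorage -AutomaticStopAction TurnOff

#Find the GPU's location path, dismount it from the host and hand it to the VM
$gpu = Get-PnpDevice -Class Display | Where-Object {$_.FriendlyName -like "*NVIDIA*"} #placeholder filter
$locationPath = ($gpu | Get-PnpDeviceProperty DEVPKEY_Device_LocationPaths).Data[0]
Disable-PnpDevice -InstanceId $gpu.InstanceId -Confirm:$false
Dismount-VMHostAssignableDevice -LocationPath $locationPath -Force
Add-VMAssignableDevice -LocationPath $locationPath -VMName DDAVMSOFSStorage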


The virtual machines cannot be highly available anyway, so you lose nothing there. The beauty is that in this case, as you leverage a Windows Server 2016 SOFS cluster for Hyper-V storage over SMB3 shares, you do get storage QoS policies.
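For completeness: a policy such as the DedicatedTier1Policy queried below could be created on the SOFS cluster along these lines (the IOPS numbers are just examples):

#On a SOFS node: create a dedicated storage QoS policy
New-StorageQosPolicy -Name DedicatedTier1Policy -PolicyType Dedicated -MinimumIops 250 -MaximumIops 1000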

#On a SOFS node

Get-StorageQosPolicy -Name DedicatedTier1Policy | Get-StorageQosFlow | ft InitiatorName, *IOPS, Status, PolicyId, FilePath -AutoSize

#Query for the VM disks on the Hyper-V node

Get-VM -Name DDAVMSOFSStorage -ComputerName RemoteFXHost | Get-VMHardDiskDrive | fl *
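Once you know the policy ID, stamping it on the disks you just listed is a one-liner from the Hyper-V side (a sketch reusing the demo names; SOFSNODE is a placeholder for one of the SOFS cluster nodes):

#On the Hyper-V node: assign the policy to the VM's virtual hard disks
$policyId = (Get-StorageQosPolicy -Name DedicatedTier1Policy -CimSession SOFSNODE).PolicyId
Get-VM -Name DDAVMSOFSStorage -ComputerName RemoteFXHost | Get-VMHardDiskDrive | Set-VMHardDiskDrive -QoSPolicyID $policyId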


#We generate some IO and get some stats on a SOFS node

Get-StorageQosFlow

Get-StorageQosVolume -Mountpoint C:\ClusterStorage\SOFSDEMO\

Get-StorageQosVolume -Mountpoint C:\ClusterStorage\SOFSDEMO\ | fl
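When you want to see at a glance which flows aren’t getting what their policy promises, the Status property comes in handy (a small sketch):

#On a SOFS node: flag flows that are not meeting their policy
Get-StorageQosFlow | Where-Object {$_.Status -ne 'Ok'} | ft InitiatorName, Status, PolicyId -AutoSize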


You can start out with one Hyper-V node and add more when needed; that’s your scale-out. Depending on the needs of the virtual machines, the specs of the servers (memory, CPU cores) and the capability and number of GPUs in the video cards, you get some scale-up as well.

To learn more about DDA go here:  https://blog.workinghardinit.work/?s=DDA&submit=Search

To learn more about storage QoS policies go here:

Some more considerations

By going disaggregated you can leverage a SOFS share both for virtual machines running on a Hyper-V cluster and for those on stand-alone (non-clustered) Hyper-V hosts that are domain members. The SOFS cluster can be leveraging S2D, traditional Storage Spaces with shared SAS (JBODs), or even an FC, iSCSI or shared SAS SAN if that’s the only option you have. That’s all OK as long as it’s SOFS running on Windows Server 2016 and the Hyper-V hosts (stand-alone or clustered) are running Windows Server 2016 as well (needed for storage QoS policies and DDA). There is no need for the Hyper-V host to be part of a cluster to get the results you need. If I use SOFS for both scenarios I can use the same storage array, but I don’t need to; I could also use separate storage arrays. If the Hyper-V cluster is leveraging CSV instead of SOFS, I will need to use a separate one for SOFS, as it’s ill-advised to mix Hyper-V workloads with the SOFS role. Keep things easy, clear and supportable. I’ll borrow a picture I got from a Microsoft PM recently; it points out the bad ideas.
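One practical note if you set up the SOFS shares yourself: the Hyper-V hosts’ computer accounts need full control on both the share and the file system. A minimal sketch (the domain, share and host names are placeholders):

#On the SOFS cluster: create a share for the Hyper-V hosts
New-SmbShare -Name VMStore -Path C:\ClusterStorage\SOFSDEMO\Shares\VMStore -FullAccess "DOMAIN\RemoteFXHost$", "DOMAIN\HVAdmins"
#Mirror the share permissions to the NTFS ACL
Set-SmbPathAcl -ShareName VMStore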


RemoteFX and vGPU Improvements in Windows Server 2016 Hyper-V

UPDATE 2015/11/23: RemoteFX works in Windows Server 2016 TPv4 and I’m successfully running OpenGL in a server VM with W2K16 TPv4 on a W2K16 TPv4 host!


Let’s take a look at some of the RemoteFX and vGPU improvements in Windows Server 2016 Hyper-V. For me, the abilities they are adding in this release are significant and a breakthrough. Why? They are taking away many of the last showstoppers for a number of scenarios that are important to the ecosystems I roam around in, when the CxOs have a clue, that is.

What are we looking at that’s new for Windows Server 2016?

The things that are breaking down the biggest showstoppers are:

  • OpenGL & OpenCL API Support (FINALLY!)
  • 1GB dedicated VRAM
  • 4K Resolution
  • Server VM support (very important in our GIS environment actually)
  • Generation 2 VM support (YES!)
  • Improved performance
  • H.264/AVC codec investment

Now, I missed this initially, but it was announced at Microsoft Ignite 2015 that RemoteFX will support generation 2 virtual machines, which allows us to benefit from the future of virtual machines without losing RemoteFX. Until now, generation 2 virtual machines were not compatible with RemoteFX. This was due to the generation 2 virtual machines not having an emulated PCI bus, which RemoteFX needed up to Windows Server 2012 R2 and Windows 8.1.
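Assuming the final bits keep the cmdlet surface we see in the technical previews, adding a RemoteFX adapter to a generation 2 server VM and dialing up the new VRAM and resolution options would look something like this (the GPU and VM names are placeholders):

#Make the host GPU available for RemoteFX
Enable-VMRemoteFXPhysicalVideoAdapter -Name "NVIDIA GRID K2"
#Add the adapter to a generation 2 VM and configure the new options
Add-VMRemoteFx3dVideoAdapter -VMName W2K16GISVM
Set-VMRemoteFx3dVideoAdapter -VMName W2K16GISVM -MaximumResolution 3840x2160 -VRAMSizeBytes 1GB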

Generation 2 support combined with server support in the virtual machine and OpenGL (up to 4.4) / OpenCL (up to 1.1) is a breakthrough; let’s hope the versions supported don’t spoil the party. I wonder if they can come up with a mechanism to upgrade OpenGL support when newer versions are released. Until now, application compatibility was very limiting.

This is really great news and will make Hyper-V a far better candidate for many more scenarios than ever before.

Get your test rig set up

So it’s time to upgrade the lab server with a RemoteFX capable GPU to Windows Server 2016 TPv3 and test this.
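The rough order of business on the lab host, as I understand it (RDS-Virtualization is the Remote Desktop Virtualization Host role service RemoteFX relies on; the GPU property name is an assumption on my part):

#On the upgraded lab host: add the role service RemoteFX needs
Install-WindowsFeature -Name RDS-Virtualization -Restart
#Then check that the GPU is considered usable for RemoteFX
Get-VMRemoteFXPhysicalVideoAdapter | fl Name, CompatibleForVirtualization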

I think some of our GIS engineers will be very happy with these new capabilities for ESRI ArcGIS, Adobe, AutoCAD, … and many more less well-known specialty software packages they need.

If you want to test it out, here’s the Experience guide for Enabling OpenGL Support for vGPU in Server 2016.

So we set it all up, but unfortunately there is still an issue being worked out at the moment of writing.


But I will help you get started for when it’s fixed, which I hope will be soon! To me it looks like they “just forgot” to activate RemoteFX for server: it looks a lot like a Windows Server 2012 R2 VM where one tries to add a RemoteFX card, it just doesn’t work. The same host with a Windows 10 Enterprise VM does not have this issue …


 

So why not test with Windows 10? Well, the OpenGL/CL capabilities are server only. And those are important to us!

E2EVC 2014 Brussels

Ladies & gentlemen, on May 30 – June 1, 2014 the E2EVC 2014 Brussels Virtualization Conference is taking place. This is a non-marketing event by experts in virtualization; these people design, implement and support virtualization solutions for a living. E2EVC Virtualization Conference is non-commercial: it does not turn a profit for the organizers or speakers. Everybody volunteers. The attendance fee covers the costs of the conference rooms, coffee breaks and such. The value is in the knowledge sharing and the networking.

This community event strives to bring the best virtualisation experts together to exchange knowledge and to establish new connections. It’s a weekend event (so people can attend without interrupting their work or customer services). Filled with presentations, Master Classes and discussions, you have 3 days to network and learn from your peers.

So the next event will take place in Brussels, Belgium, May 30 – June 1, 2014 in Hotel Novotel Brussels Centre Tour Noire. So my Belgian colleagues, this is your chance to be a little Dutch, as there is a SPECIAL PRICE FOR BELGIAN RESIDENTS – 199 EUR!

If you’re not Belgian you are also very welcome, so do register for E2EVC 2014 Brussels. If you have knowledge to share, please volunteer to speak. This community event has knowledge sharing as its goal and stimulates professionals to present on their subject matter.

A big thank you to Alex Juschin & team for their never-ending efforts to help organize this conference!

The Experts Conference – TEC 2011–After Action Report

I enjoyed my time at the TEC 2011 conference. The networking with fellow IT pros was excellent and the discussions were rich in content. It’s fun to see that a lot of people are already looking at Windows 8 Server. In that respect, the session by Hans Vredevoort, “Hyper-V Storage Deep Dive”, was a very good one, offering a look at things to come. Information is not yet flowing in on Windows 8, or at least not in the quantities we’d like. We all expect that situation to improve before the end of the year, when we’ll (hopefully) find a first beta under the x-mas tree to play with during the holiday season. Jaap Wesselius was haunted by the demo gods. Well, it was either the demo gods or all those “always on” IT pros bringing the wireless down. But he recovered strongly in his session “Virtualizing Exchange 2010”, which was actually called “Exchange on Hyper-V: do’s and don’ts”. Not even the demo gods can keep Jaap down. We also enjoyed seeing Carsten Rachfahl in action in his session on “Hyper-V Networking Best Practices”, which he brought well and with a sense of humor about his newest best practice, established the evening before when a bunch of us were testing things out on Hyper-V clusters all over Europe. Maarten Wijsman (in Holland) & Rick Slagers (on the scene at TEC 2011) from Wortell.nl were both assisting in this endeavor.

I had a good time and met a lot of people from the community, like Joachim Nässlander (@Nasslander). The Belgian ProExchange community was well represented and Ilse Van Criekinge was there as well. I learned a lot and I’m happy to have attended. To conclude our conference on Wednesday, Carsten Rachfahl (@hypervserver), Hans Vredevoort (@hvredevoort) and I did a video panel interview on Hyper-V, Windows 8 Server and some 10Gbps cluster networking. He’ll put it online on his website http://www.hyper-v-server.de/videos/ so you can find it there when it’s released. Jaap, Hans and I went to dinner and concluded the conference with a beer in the bar. It’s back home tomorrow and then to work.