A First look at Cloud Witness

Introduction

In Windows Server 2012 R2 Failover Clustering we have 2 types of witness:

  1. Disk witness: a shared disk that can be seen by all cluster nodes
  2. File Share Witness (FSW): An SMB 3 file share that is accessible by all cluster nodes

Since Windows Server 2012 R2 the recommendation is to always configure a witness. The reason for this is that thanks to dynamic quorum and dynamic witness. These two capabilities offer the best possible resiliency without administrator intervention and are enabled by default. The cluster dynamically assigns a quorum vote to node when it’s up and removes it when it’s down. Likewise, the witness is given a vote when it’s better to have a witness, if you’re better off without the witness it won’t get a vote. That’s why Microsoft now advises to always set a witness, it will be managed automatically. The result of this is that you’ll get the best possible uptime for a cluster under any given circumstance.

This is still the case in Windows Server 2016 but Failover clustering does introduce a new option witness option: cloud witness.

Why do we need a cloud witness?

For certain scenarios such a cluster without shared storage and especially when a stretched cluster is involved you’ll have to use a FSW. It’s a great solution that works as well as a disk witness in most cases. Why do I say most? Well there is a scenario where a disk witness will provide better resiliency, but let’s not go there now.

Now the caveat here is that you’ll need to place the FSW in a 3rd independent site. That’s a hard order for many to fulfill. You can put in on the desktop of the receptionist at a branch office or on a virtual machine on the cluster itself but it’s “suboptimal”. Ideally the FSW is independent and high available not dependent on what it’s supposed to support in achieving quorum.

One of the other workarounds was to extend AD to Azure, deploy a SOFS Cluster with an non CA file share on a cluster of VMs in Azure and have both other sites have access to it over VPN or express route. That works but in a time of easy, fast, cheap and good solutions it’s still serious effort, just for a file share.

As Microsoft has more and more use cases that require a FSW (site aware stretched clusters, Storage Spaces Direct, Exchange DAG, SQL Availability Groups, workgroup or multi domain clusters) they had to find a solution for the growing number of customers that do not have a 3rd site but do need a FSW. The cloud idea above is great but the implementation isn’t the best as it’s rather complex and expensive. Bar using virtual machines you can’t use Azure file services in the cloud as those are primarily for consumption by applications and the security is done via not via ACLing but access keys. That means the security for the Cluster Name Object (CNO) can’t’ be set. So even when you can expose a cloud file to on premises to Windows 2016 (any OS that supports SMB 3 actually) by mapping it via NET USE the cluster GUI can’t set the required security for the cluster nodes so it will fail. And no you can’t set it manually either. Just to prove this I tried it for you to save you the trouble. Do NOT even go there!

clip_image002

So what is possible? Well come Windows Server 2016 Failover Clustering now has a 3rd type of witness. The cloud witness. Functionally wise it’s like a FSW. The big difference it’s a dedicated, cloud based solution that mitigates the need and costs for a 3rd data center and avoids the cost of the workarounds people came up with.

Implementing the cloud witness

In your Azure subscription you create a storage account, for this purpose I’ve create one named democloudwitness in my resource group RG-Demo. I’m using a separate storage account to keep thing tidy and separated from my other demo storage accounts.

A storage account gets two Access keys and two connection strings. The reason for this is that we you need to regenerate the keys you can have your workloads use the other one this can be done without down time.

clip_image004

In Azure the work is actually already done. The rest will happen on premises on the cluster. We’ll configure the cluster with a witness. In PowerShell this is a one liner.

clip_image006

If you get an error, make sure the information is a correct and you can reach Azure of HTTPS over the internet, VPN or Express Route. You normally do not to use the endpoint parameter, just in the rare case you need to specify a different Azure service endpoint.

The above access key is a fake one by the way, just so you know. Once you’re done Get-ClusterQuorum returns Cloud Witness as QuorumResource.

clip_image008

In the GUI you’ll see

clip_image010

When you open up the Blobs services in your storage account you’ll see that a blob service has been created with a name of msft-cloud-witness. When you select it you’ll see a file with a GUID as the name.

clip_image012

That guid is actually the same as your cluster instance ID that you can find in the registry of your cluster nodes under the HKLM\Cluster key in the string value ClusterInstanceID.

Your storage account can be used for multiple clusters. You’ll just see extra entries each with their own guid.

clip_image014

All this consumes so few resources it’s quite possibly the cheapest ever way of getting a cluster witness. Time will tell.

Things to consider

• Cloud Witness uses the HTTPS REST (NOT SMB 3) interface of the Azure Storage Account service. This means it requires the HTTPS port to be open on all cluster nodes to allow access over the internet. Alternatively an Azure Site-2-Site VPN or Express Route can be used. You’ll need one of those.

• REST means no ACLing for the CNO like on a SMB 3 FSW to be done. Security is handled automatically by the Failover Cluster which doesn’t store the actual access key, but generates a shared access security (SAS) token using the access key and stores it securely.

• The generated SAS Token is valid as long as the access key remains valid. When rotating the primary access key, it is important to first update the cloud witness (on all your clusters that are using that storage account) with the secondary access key before regenerating the primary access key.

• Plan your governance between cluster & Azure admins if these are not the same. I see Azure resources governance being neglected and as a cluster admin it’s nice to have some degree of control or say in the Azure part of the equation.

For completeness I’ll mention that the entire setup of a cloud witness is also very nicely integrated in to the Failover Cluster GUI.

Right click on the desired cluster and select “Configure Cluster Quorum Settings” from menu under “More Actions”

clip_image015

Click through the startup form (unless you’ve never ever done this, then you might want to read it).

clip_image017

Select either “Select the quorum witness” or “Advanced quorum configuration”

clip_image019

We keep the default selection of all nodes.

clip_image021

We select to “Configure a cloud witness”

clip_image023

Type in your Azure storage account name, your primary access key for the “Azure storage account key” and leave the endpoint at its default. You’ll normally won’t need this unless you need to use a different Azure Service Endpoint.

clip_image025

Click “Next”to review what you’re about to do

clip_image027

Click Next again and let the wizard run.

clip_image029

You’ll get a report when it’s done. If you get an error, make sure the information is a correct and you can reach Azure of HTTPS over the internet, VPN or Express Route.

Conclusion

I was pleasantly surprised by how it easy it was to set up a cloud witness. The biggest hurdle for some might be access to Azure in secured environments. The file itself contains no sensitive information at all and while a VPN or Express Route are secured connectivity options this might not be allowed or viable in certain environments. Other than this I have found it to be very reliable, effective cheap and easy. I really encourage you to test it and see what it can do for you.

SOFS / SMB 3 Offers Best VM Resiliency Experience

I have blogged about Virtual Machine Resiliency in Windows 2016 Failover Clustering before in Testing Virtual Machine Compute Resiliency in Windows Server 2016 

Those test and demos were done with block lever storage, CSV on Fibre Channel, iSCSI or shared SAS. Today we’ll look at the experience when you’re running your VMs on a continually available file share on a Scale Out File Server (SOFS). This configuration offers the best possible experience.

Why well, when the cluster node is in Isolated mode this has no impact on the SOFS share as this is a resource external to the Hyper-V cluster. In other words it remains on line. This means that the VMs, even if they have lost their high availability during the time the node is Isolated, they keep running. After all there is nothing wrong with Hyper-V itself. With block level CSV storage you lose access to the storage as that a cluster resource and the node got isolated. That’s why the VMs go into a paused critical state during a transient failure with block level storage but they don’t when you’re using SOFS.

image

The virtual machine compute resiliency feature in action shows you that the VMs service a transient failure without issues. Your services need never know something was up. Even when the transient failure is reoccurring that doesn’t mean it will cause down time. The node will be quarantined and if it come backup the workload will be live migrated away.

image

You can watch a video of this in action here on Vimeo:

The quarantine threshold and duration as well as the resiliency period and can be tweaked to your environment to get the best possible results.

image

SMB 3 for the win! This is yet one more convincing argument to start looking into SOFS and leveraging the capabilities of SMB3. Remember that you can run as SOFS cluster against your existing shared storage to get started if you can get the IOPS/latency you require. But also look into storage spaces, especially storage spaces direct which avoids some of the drawback SANs have in such a scenario. High time for storage vendors to really scale out, implement SMB 3 well and complete and keep the great added value features they already have in their offering. It’s this or becoming yet a bit more irrelevant in todays storage scene in the Microsoft ecosystem.

Setting up Discrete Device Assignment with a GPU

Introduction

Let’s take a look at setting up Discrete Device Assignment with a GPU. Windows Server 2016 introduces Discrete Device Assignment (DDA). This allows a PCI Express connected device, that supports this, to be connected directly through to a virtual machine.

The idea behind this is to gain extra performance. In our case we’ll use one of the four display adapters in our NVIDIA GROD K1 to assign to a VM via DDA. The 3 other can remain for use with RemoteFX. Perhaps we could even leverage DDA for GPUs that do not support RemoteFX to be used directly by a VM, we’ll see.

As we directly assign the hardware to VM we need to install the drivers for that hardware inside of that VM just like you need to do with real hardware.

I refer you to the starting blog of a series on DDA in Windows 2016:

Here you can get a wealth of extra information. My experimentations with this feature relied heavily on these blogs and MSFT provide GitHub script to query a host for DDA capable devices. That was very educational in regards to finding out the PowerShell we needed to get DDA to work! Please see A 1st look at Discrete Device Assignment in Hyper-V to see the output of this script and how we identified that our NVIDIA GRID K1 card was a DDA capable candidate.

Requirements

There are some conditions the host system needs to meet to even be able to use DDA. The host needs to support Access Control services which enables pass through of PCI Express devices in a secure manner. The host also need to support SLAT and Intel VT-d2 or AMD I/O MMU. This is dependent on UEFI, which is not a big issue. All my W2K12R2 cluster nodes & member servers run UEFI already anyway. All in all, these requirements are covered by modern hardware. The hardware you buy today for Windows Server 2012 R2 meets those requirements when you buy decent enterprise grade hardware such as the DELL PowerEdge R730 series. That’s the model I had available to test with. Nothing about these requirements is shocking or unusual.

A PCI express device that is used for DDA cannot be used by the host in any way. You’ll see we actually dismount it form the host. It also cannot be shared amongst VMs. It’s used exclusively by the VM it’s assigned to. As you can imagine this is not a scenario for live migration and VM mobility. This is a major difference between DDA and SR-IOV or virtual fibre channel where live migration is supported in very creative, different ways. Now I’m not saying Microsoft will never be able to combine DDA with live migration, but to the best of my knowledge it’s not available today.

The host requirements are also listed here: https://technet.microsoft.com/en-us/library/mt608570.aspx

  • The processor must have either Intel’s Extended Page Table (EPT) or AMD’s Nested Page Table (NPT).

The chipset must have:

  • Interrupt remapping: Intel’s VT-d with the Interrupt Remapping capability (VT-d2) or any version of AMD I/O Memory Management Unit (I/O MMU).
  • DMA remapping: Intel’s VT-d with Queued Invalidations or any AMD I/O MMU.
  • Access control services (ACS) on PCI Express root ports.
  • The firmware tables must expose the I/O MMU to the Windows hypervisor. Note that this feature might be turned off in the UEFI or BIOS. For instructions, see the hardware documentation or contact your hardware manufacturer.

You get this technology both on premises with Windows Server 2016 as and with virtual machines running Windows Server 2016; Windows 10 (1511 or higher) and Linux distros that support it. It’s also an offering on high end Azure VMs (IAAS). It supports both Generation 1 and generation 2 virtual machines. All be it that generation 2 is X64 bit only, this might be important for certain client VMs. We’ve dumped 32 bit Operating systems over decade ago so to me this is a non-issue.

For this article I used a DELL PowerEdge R730, a NVIIA GRID K1 GPU. Windows Server 2016 TPv4 with CU of March 2016 and Windows 10 Insider Build 14295.

Microsoft supports 2 devices at the moment:

  • GPUs and coprocessors
  • NVMe (Non-Volatile Memory express) SSD controllers

Other devices might work but you’re dependent on the hardware vendor for support. Maybe that’s OK for you, maybe it’s not.

Below I describe the steps to get DDA working. There’s also a rough video out on my Vimeo channel: Discrete Device Assignment with a GPU in Windows 2016 TPv4.

Setting up Discrete Device Assignment with a GPU

Preparing a Hyper-V host with a GPU for Discrete Device Assignment

First of all, you need a Windows Server 2016 Host running Hyper-V. It needs to meet the hardware specifications discussed above, boot form EUFI with VT-d enabled and you need a PCI Express GPU to work with that can be used for discrete device assignment.

It pays to get the most recent GPU driver installed and for our NVIDIA GRID K1 which was 362.13 at the time of writing.

clip_image001

On the host when your installation of the GPU and drivers is OK you’ll see 4 NIVIDIA GRID K1 Display Adapters in device manager.

clip_image002

We create a generation 2 VM for this demo. In case you recuperate a VM that already has a RemoteFX adapter in use, remove it. You want a VM that only has a Microsoft Hyper-V Video Adapter.

clip_image003

In Hyper-V manager I also exclude the NVDIA GRID K1 GPU I’ll configure for DDA from being used by RemoteFX. In this show case that we’ll use the first one.

clip_image005

OK, we’re all set to start with our DDA setup for an NVIDIA GRID K1 GPU!

Assign the PCI Express GPU to the VM

Prepping the GPU and host

As stated above to have a GPU assigned to a VM we must make sure that the host no longer has use of it. We do this by dismounting the display adapter which renders it unavailable to the host. Once that is don we can assign that device to a VM.

Let’s walk through this. Tip: run PoSh or the ISE as an administrator.

We run Get-VMHostAssignableDevice. This return nothing as no devices yet have been made available for DDA.

I now want to find my display adapters

#Grab all the GPUs in the Hyper-V Host

$MyDisplays = Get-PnpDevice | Where-Object {$_.Class -eq “Display”}

$MyDisplays | ft -AutoSize

This returns

clip_image007

As you can see it list all adapters. Let’s limit this to the NVIDIA ones alone.

#We can get all NVIDIA cards in the host by querying for the nvlddmkm

#service which is a NVIDIA kernel mode driver

$MyNVIDIA = Get-PnpDevice | Where-Object {$_.Class -eq “Display”} |

Where-Object {$_.Service -eq “nvlddmkm”}

$MyNVIDIA | ft -AutoSize

clip_image009

If you have multiple type of NVIDIA cared you might also want to filter those out based on the friendly name. In our case with only one GPU this doesn’t filter anything. What we really want to do is excluded any display adapter that has already been dismounted. For that we use the -PresentOnly parameter.

#We actually only need the NVIDIA GRID K1 cards, let’s filter some #more,there might be other NVDIA GPUs.We might already have dismounted #some of those GPU before. For this exercise we want to work with the #ones that are mountedt he paramater -PresentOnly will do just that.

$MyNVidiaGRIDK1 = Get-PnpDevice -PresentOnly| Where-Object {$_.Class -eq “Display”} |

Where-Object {$_.Service -eq “nvlddmkm”} |

Where-Object {$_.FriendlyName -eq “NVIDIA Grid K1”}

$MyNVidiaGRIDK1 | ft -AutoSize

Extra info: When you have already used one of the display adapters for DDA (Status “UnKnown”). Like in the screenshot below.

clip_image011

We can filter out any already unmounted device by using the -PresentOnly parameter. As we could have more NVIDIA adaptors in the host, potentially different models, we’ll filter that out with the FriendlyName so we only get the NVIDIA GRID K1 display adapters.

clip_image013

In the example above you see 3 display adapters as 1 of the 4 on the GPU is already dismounted. The “Unkown” one isn’t returned anymore.

Anyway, when we run

$MyNVidiaGRIDK1 = Get-PnpDevice -PresentOnly| Where-Object {$_.Class -eq “Display”} |

Where-Object {$_.Service -eq “nvlddmkm”} |

Where-Object {$_.FriendlyName -eq “NVIDIA Grid K1”}

$MyNVidiaGRIDK1 | ft -AutoSize

We get an array with the display adapters relevant to us. I’ll use the first (which I excluded form use with RemoteFX). In a zero based array this means I disable that display adapter as follows:

Disable-PnpDevice -InstanceId $MyNVidiaGRIDK1[0].InstanceId -Confirm:$false

Your screen might flicker when you do this. This is actually like disabling it in device manager as you can see when you take a peek at it.clip_image015

When you now run

$MyNVidiaGRIDK1 = Get-PnpDevice -PresentOnly| Where-Object {$_.Class -eq “Display”} |

Where-Object {$_.Service -eq “nvlddmkm”} |

Where-Object {$_.FriendlyName -eq “NVIDIA Grid K1”}

$MyNVidiaGRIDK1 | ft -AutoSize

Again you’ll see

clip_image017

The disabled adapter has error as a status. This is the one we will dismount so that the host no longer has access to it. The array is zero based we grab the data about that display adapter.

#Grab the data (multi string value) for the display adapater

$DataOfGPUToDDismount = Get-PnpDeviceProperty DEVPKEY_Device_LocationPaths -InstanceId $MyNVidiaGRIDK1[0].InstanceId

$DataOfGPUToDDismount | ft -AutoSize

clip_image019

We grab the location path out of that data (it’s the first value, zero based, in the multi string value).

#Grab the location path out of the data (it’s the first value, zero based)

#How do I know: read the MSFT blogs and read the script by MSFT I mentioned earlier.

$locationpath = ($DataOfGPUToDDismount).data[0]

$locationpath | ft -AutoSize

clip_image021

This locationpath is what we need to dismount the display adapter.

#Use this location path to dismount the display adapter

Dismount-VmHostAssignableDevice -locationpath $locationpath -force

Once you dismount a display adapter it becomes available for DDA. When we now run

$MyNVidiaGRIDK1 = Get-PnpDevice -PresentOnly| Where-Object {$_.Class -eq “Display”} |

Where-Object {$_.Service -eq “nvlddmkm”} |

Where-Object {$_.FriendlyName -eq “NVIDIA Grid K1”}

$MyNVidiaGRIDK1 | ft -AutoSize

We get:

clip_image023

As you can see the dismounted display adapter is no longer present in display adapters when filtering with -presentonly. It’s also gone in device manager.

clip_image024

Yes, it’s gone in device manager. There’s only 3 NVIDIA GRID K1 adaptors left. Do note that the device is unmounted and as such unavailable to the host but it is still functional and can be assigned to a VM.That device is still fully functional. The remaining NVIDIA GRID K1 adapters can still be used with RemoteFX for VMs.

It’s not “lost” however. When we adapt our query to find the system devices that have dismounted I the Friendly name we can still get to it (needed to restore the GPU to the host when needed). This means that -PresentOnly for system has a different outcome depending on the class. It’s no longer available in the display class, but it is in the system class.

clip_image026

And we can also see it in System devices node in Device Manager where is labeled as “PCI Express Graphics Processing Unit – Dismounted”.

We now run Get-VMHostAssignableDevice again see that our dismounted adapter has become available to be assigned via DDA.

clip_image029

This means we are ready to assign the display adapter exclusively to our Windows 10 VM.

Assigning a GPU to a VM via DDA

You need to shut down the VM

Change the automatic stop action for the VM to “turn off”

clip_image031

This is mandatory our you can’t assign hardware via DDA. It will throw an error if you forget this.

I also set my VM configuration as described in https://blogs.technet.microsoft.com/virtualization/2015/11/23/discrete-device-assignment-gpus/

I give it up to 4GB of memory as that’s what this NVIDIA model seems to support. According to the blog the GPUs work better (or only work) if you set -GuestControlledCacheTypes to true.

“GPUs tend to work a lot faster if the processor can run in a mode where bits in video memory can be held in the processor’s cache for a while before they are written to memory, waiting for other writes to the same memory. This is called “write-combining.” In general, this isn’t enabled in Hyper-V VMs. If you want your GPU to work, you’ll probably need to enable it”

#Let’s set the memory resources on our generation 2 VM for the GPU

Set-VM RFX-WIN10ENT -GuestControlledCacheTypes $True -LowMemoryMappedIoSpace 2000MB -HighMemoryMappedIoSpace 4000MB

You can query these values with Get-VM RFX-WIN10ENT | fl *

We now assign the display adapter to the VM using that same $locationpath

Add-VMAssignableDevice -LocationPath $locationpath -VMName RFX-WIN10ENT

Boot the VM, login and go to device manager.

clip_image033

We now need to install the device driver for our NVIDIA GRID K1 GPU, basically the one we used on the host.

clip_image035

Once that’s done we can see our NVIDIA GRID K1 in the guest VM. Cool!

clip_image037

You’ll need a restart of the VM in relation to the hardware change. And the result after all that hard work is very nice graphical experience compared to RemoteFX

clip_image039

What you don’t believe it’s using an NVIDIA GPU inside of a VM? Open up perfmon in the guest VM and add counters, you’ll find the NVIDIA GPU one and see you have a GRID K1 in there.

clip_image041

Start some GP intensive process and see those counters rise.

image

Remove a GPU from the VM & return it to the host.

When you no longer need a GPU for DDA to a VM you can reverse the process to remove it from the VM and return it to the host.

Shut down the VM guest OS that’s currently using the NVIDIA GPU graphics adapter.

In an elevated PowerShell prompt or ISE we grab the locationpath for the dismounted display adapter as follows

$DisMountedDevice = Get-PnpDevice -PresentOnly |

Where-Object {$_.Class -eq “System” -AND $_.FriendlyName -like “PCI Express Graphics Processing Unit – Dismounted”}

$DisMountedDevice | ft -AutoSize

clip_image045

We only have one GPU that’s dismounted so that’s easy. When there are more display adapters unmounted this can be a bit more confusing. Some documentation might be in order to make sure you use the correct one.

We then grab the locationpath for this device, which is at location 0 as is an array with one entry (zero based). So in this case we could even leave out the index.

$LocationPathOfDismountedDA = ($DisMountedDevice[0] | Get-PnpDeviceProperty DEVPKEY_Device_LocationPaths).data[0]

$LocationPathOfDismountedDA

clip_image047

Using that locationpath we remove the DDA GPU from the VM

#Remove the display adapter from the VM.

Remove-VMAssignableDevice -LocationPath $LocationPathOfDismountedDA -VMName RFX-WIN10ENT

We now mount the display adapter on the host using that same locationpath

#Mount the display adapter again.

Mount-VmHostAssignableDevice -locationpath $LocationPathOfDismountedDA

We grab the display adapter that’s now back as disabled under device manager of in an “error” status in the display class of the pnpdevices.

#It will now show up in our query for -presentonly NVIDIA GRIDK1 display adapters

#It status will be “Error” (not “Unknown”)

$MyNVidiaGRIDK1 = Get-PnpDevice -PresentOnly| Where-Object {$_.Class -eq “Display”} |

Where-Object {$_.Service -eq “nvlddmkm”} |

Where-Object {$_.FriendlyName -eq “NVIDIA Grid K1”}

$MyNVidiaGRIDK1 | ft -AutoSize

clip_image049

clip_image051

We grab that first entry to enable the display adapter (or do it in device manager)

#Enable the display adapater

Enable-PnpDevice -InstanceId $MyNVidiaGRIDK1[0].InstanceId -Confirm:$false

The GPU is now back and available to the host. When your run you Get-VMHostAssignableDevice it won’t return this display adapter anymore.

We’ve enabled the display adapter and it’s ready for use by the host or RemoteFX again. Finally, we set the memory resources & configuration for the VM back to its defaults before I start it again (PS: these defaults are what the values are on standard VM without ever having any DDA GPU installed. That’s where I got ‘m)

#Let’s set the memory resources on our VM for the GPU to the defaults

Set-VM RFX-WIN10ENT -GuestControlledCacheTypes $False -LowMemoryMappedIoSpace 256MB -HighMemoryMappedIoSpace 512MB

clip_image053

Now tell me all this wasn’t pure fun!

The Hyper-V Amigos Episode 10

It’s with great pride that the Hyper-V Amigos ride again and for The Hyper-V Amigos Episode 10 they dive into what’s new and improved in Windows Server 2016 Failover Clustering.

image

Well OK we only discuss a few subjects in this web cast as there is only a limited amount of time. I’ll present an overview of during my session at the German Cloud and Datacenter conference on May 12th in Germany. An hour is not enough for a deep dive into everything but we will build on our session we did at the Technical Summit (November 2014) in Germany on Improvements in Failover Cluster 2012 R2 ad get you up to speed so you can select what to investigate further.

Until then, enjoy the webcast and I hope it helps prepare you for what’s coming and entices you to join us at the Cloud and Datacenter Summit in Germany on May 12th! And if clustering alone is not enough to bring you over check out the agenda and you might realize what great gathering of experts is happing at the conference. Just look at the content, the breath and depth of the cloud and datacenter technologies being discussed is vast!