The Hyper-V Processor Virtual Machine Reserve

Introduction

Hyper-V offers three ways of managing or tweaking the CPU scheduler to provide the best possible configuration for certain scenarios and use cases. The defaults normally work fine, but under certain conditions you might want to tweak them for the best possible outcome. The CPU resource controls at your disposal are (a quick PowerShell sketch of where they live follows the list):

  • Virtual machine reserve  – Think of this as the minimum CPU “QoS”
  • Virtual machine limit – Think of this as the maximum CPU “QoS”
  • Relative Weight – Think of this as the scale defining which VM is more important.
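
As a quick sketch of where these settings live (the VM name DemoVM is just a placeholder, and depending on the Hyper-V version the VM may need to be turned off to change them), all three controls sit on the virtual machine's processor configuration:

# Configure minimum (Reserve), maximum (Maximum) and relative weight for a VM's vCPUs.
# 'DemoVM' is a placeholder; pick values that fit your own scenario.
Set-VMProcessor -VMName 'DemoVM' -Reserve 10 -Maximum 100 -RelativeWeight 100

# Read the current settings back.
Get-VMProcessor -VMName 'DemoVM' |
    Select-Object VMName, Count, Reserve, Maximum, RelativeWeight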

Note that you should understand what these settings are and what they can do. Treat them like spices: select the ones you need and don't overdo it. They're there to help you, and if needed you can leverage all three, but it's highly unlikely you'll need to do so. Using one or two will serve you best if and when you need them.

In this blog post we’ll look at the virtual machine reserve.

Virtual Machine Reserve

The virtual machine reserve is a minimum CPU guarantee. It is configured at the virtual machine level for all vCPUs of that virtual machine. This means a couple of things. The reserve is guaranteed for the virtual machine that has it configured, so when there is contention for CPU resources the reserve is enforced. Virtual machines without a reserve configured are left to compete for the available resources; they don't have any "SLA" of minimum guaranteed CPU cycles.


The virtual machine reserve is configured as the minimum percentage of CPU usage we'll guarantee a virtual machine gets when there is contention. A 0% reserve means that the virtual machine reserve is disabled. This is the default.

So how is this reserve accomplished? When a virtual machine is started, there is a check whether the virtual machine reserve can be met. If not, the virtual machine will not be started. This is also the case when you want to quick or live migrate a virtual machine: you cannot migrate a virtual machine to a host that cannot deliver the reserve.

The reserve is guaranteed. When you have started enough virtual machines with a reserve to consume 100% of the host CPU resources, you cannot start another virtual machine that has a reserve configured. This is even the case if all the running virtual machines that together have reserved 100% of the host CPU resources are completely idle. That's the cost of reserving CPU resources.
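
To get a feel for how much of a host you have already promised away, here is a small sketch (my own back-of-the-envelope math, not an official formula): each VM's reserve is a percentage of its own vCPUs, so converting it to a host-wide percentage means multiplying by the vCPU count and dividing by the host's logical processor count.

# Sum up, per VM, Reserve * vCPU count / host logical processors to get the
# host-wide percentage of CPU capacity that is currently reserved.
# Note: this counts all VMs on the host, including those that are off;
# filter on State if you only care about running ones.
$hostLPs = (Get-VMHost).LogicalProcessorCount
$reservedPct = (Get-VM | Get-VMProcessor |
    ForEach-Object { $_.Reserve * $_.Count / $hostLPs } |
    Measure-Object -Sum).Sum
"{0:N1}% of this host's CPU capacity is reserved" -f $reservedPct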

Also note you can mix virtual machines with and without a reserve configured. That’s perfectly fine.

Below is an extreme example. This single virtual machine has 56 vCPUs configured with a 100% reserve. This means that this virtual machine will only start on a host that has no other virtual machines with a reserve running. Virtual machines without a reserve will run, but when this virtual machine requires its minimum to be met, they'll lose their CPU cycles.


So how does this work?

When the virtual machines are running and contention for CPU resources materializes on the host, the reserve kicks in and guarantees the virtual machines get their guaranteed minimum. If there is no contention, this isn't enforced and all virtual machines can grab CPU resources as long as they are available. Only when a reserve can't be met due to lack of resources does the CPU scheduler intervene. This means there is no waste caused by not allowing other virtual machines to use what others don't need at any given moment. Yes, virtual machines can use more than their reserve if it's available and other virtual machines don't need their reserved CPU cycles. But the moment there is contention, the reserve is guaranteed. That's why you cannot reserve more than 100% in total.

This does make things a bit more complex in real life. A reserve is not enforced when not needed, but it does have to be there. This means that if you reserve 30% for a virtual machine with 56 vCPUs, you can't also reserve 80% for a virtual machine with 8 vCPUs and run those together.


Even though you have 56 cores, each with 70% CPU unreserved, you cannot find a single CPU with 80% free, let alone 8 of them. This means you cannot start such a virtual machine while the first one is running, and vice versa. So that's another caveat you need to consider.

Use Cases

The primary use case of the virtual machine reserve is to enforce an SLA for a virtual machine's compute needs. When a DBA demands 100% of the 8 vCPUs his SQL Server virtual machine has configured, you can set the reserve to 100%, which is about 14% of all CPU resources on the host. On our 56-core host that means at least 8 CPUs have to have 100% of their resources free or that SQL Server virtual machine won't even start, as the reserve can't be guaranteed. When it's running and there is no contention for resources, other virtual machines can grab whatever they need. The CPU scheduler will only take resources away from the other virtual machines if that's needed to guarantee the reserve configured for the SQL Server virtual machine.
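
As a sketch (SQL01 is a hypothetical VM name, and the VM may need to be off to change the setting), this is what that looks like, including the back-of-the-envelope host math:

# Give the hypothetical 8 vCPU SQL Server VM a 100% reserve.
Set-VMProcessor -VMName 'SQL01' -Reserve 100

# Express that reserve as a share of the whole host: 100% * 8 vCPU / 56 LPs = ~14.3%.
$vp = Get-VMProcessor -VMName 'SQL01'
$hostShare = $vp.Reserve * $vp.Count / (Get-VMHost).LogicalProcessorCount
"SQL01 reserves {0:N1}% of the host's CPU capacity" -f $hostShare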

Some people get creative and use the virtual machine reserve to allocate 100% of all host CPU resources to all virtual machines running on a host. This means all resources are taken and they'll need to rebalance the reservations if they want to add more virtual machines with a certain reserve. This only works if you can enforce that all virtual machines are created with a reserve. If not, some of the new virtual machines might not get any CPU cycles when there is a lot of contention.

Limitations

It can become high maintenance. Say people set up CPU reserves for all virtual machines on a host with 2*14 = 28 cores: for example, 14 virtual machines, each with 2 vCPUs and a 100% reserve. Each virtual machine then effectively owns 2 of the 28 cores and the host is fully reserved.

When you now add a virtual machine with a reserve to that host, you'll need to recalculate and reset the reserves on all of them to accommodate the extra virtual machine. As said, this can be used to make sure no VM gets deployed and started without you noticing (someone will complain when a VM doesn't start), but it is tedious to manage.
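
If you do go down that road, a rough rebalancing sketch (my own approach, not a product feature) could spread 100% of the host evenly over all vCPUs; note that changing the reserve may require the VMs to be turned off, depending on the Hyper-V version:

# Give every VM the same per-vCPU share so the reserves together add up to (at most)
# 100% of the host; the per-VM reserve is capped at 100%.
$hostLPs   = (Get-VMHost).LogicalProcessorCount
$totalVCpu = (Get-VM | Get-VMProcessor | Measure-Object -Property Count -Sum).Sum
$reserve   = [int][Math]::Min(100, [Math]::Floor(100 * $hostLPs / $totalVCpu))
Get-VM | ForEach-Object { Set-VMProcessor -VM $_ -Reserve $reserve }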

Also note that this game of percentages plays out per host, but those hosts can be part of a cluster. So when you do this on all nodes of a cluster, things will go wrong if you have reserved 100% of the CPU resources on all nodes: live migration and failover will run into issues as extra virtual machines can't start on nodes that haven't got sufficient CPU resources left. You need to account and plan for this.

Conclusion

The virtual machine reserve is a powerful tool to have when needed. You do need to understand its behavior very well and take it into account for startup, quick migration and live migration.

If you do that and use it wisely, it can help you achieve SLAs and the best possible guaranteed performance when needed.

Live Export a Running Virtual Machine or a Checkpoint

A remarkably little known feature in Windows Server 2012 R2 (and Windows 8.1) is the ability to export one or more running virtual machines.


You just right-click the virtual machine in Hyper-V Manager, select Export from the context menu and follow the wizard to select an export location. Easy. This is also possible via PowerShell, so you can automate it. The result is a VM you can import, which gives you a copy of the original virtual machine in a saved state, at the point in time that you exported it.
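
For the PowerShell route, a minimal example (the VM name and export path are placeholders):

# Export a running virtual machine to the target folder; the VM keeps running.
Export-VM -Name 'DemoVM' -Path 'D:\Exports'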

More people seem to know about the capability to export a checkpoint of a running virtual machine than about the capability to export a running VM itself. I noticed this because some people figured the latter was a new feature in Windows Server 2016. No, it's not. We've had this option since Windows 8.1 and Windows Server 2012 R2.


So why even have the option of exporting a checkpoint of a running VM? Because this enables you to have exports from various points in time, which is pretty cool and handy during test and development, troubleshooting or lab work. As a standard checkpoint has state in Windows Server 2012 R2, I prefer to shut down the VM, create a checkpoint and start the VM again. When I then export that checkpoint, I don't have to worry about the state in the VM at that point in time, as it was shut down.
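
That workflow looks roughly like this in PowerShell (all names and paths are placeholders):

# Shut the VM down, take a checkpoint of the clean state, start the VM again
# and export that checkpoint while the VM keeps running.
Stop-VM -Name 'DemoVM'
Checkpoint-VM -Name 'DemoVM' -SnapshotName 'CleanShutdown'
Start-VM -Name 'DemoVM'
Export-VMSnapshot -VMName 'DemoVM' -Name 'CleanShutdown' -Path 'D:\Exports'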

For some workloads this isn't a big deal, but for others it is not a great experience, hence the fact that standard checkpoints are not supported in production but only for test and dev.

In Windows Server 2016 we now have production checkpoints. That means that when we apply such a checkpoint we have a consistent state, just like when we restore a VM from a backup. You'll have to boot it up after applying the checkpoint; it does not appear running with the state at the time the checkpoint was taken. Well, not unless you opt to create standard checkpoints. This reduces the need for me to shut down a VM before I create a checkpoint to export in many cases.
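
Switching a VM to production checkpoints is a per-VM setting; a short sketch with a placeholder VM name:

# Use production checkpoints, falling back to a standard checkpoint if one can't be taken.
Set-VM -Name 'DemoVM' -CheckpointType Production
# Or refuse the fallback and only ever take production checkpoints.
Set-VM -Name 'DemoVM' -CheckpointType ProductionOnly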

When you export a running VM in Windows Server 2016 you’ll have a copy of it in saved state. Just like you did in Windows Server 2012 R2, no change there. When you import that you’ll have a VM in saved state that you need to start up. If you want an application consistent copy, create a production checkpoint first and export that one.

So there you go. The feature to live export a running virtual machine was here before and it’s still here. The real extra capability with live exports comes from leveraging the live export of a checkpoint of a running virtual machine and the fact that we now have production checkpoints.

E2EVC 2016 Dublin

So after 4 days in London with the Veeam Vanguards I'm hopping over the Irish Sea tonight to Dublin to attend and present at the Experts 2 Experts Virtualization Conference (E2EVC) 2016 Dublin.


I’ll be presenting on High Availability with Windows Server 2016 Failover Clustering and Hyper-V. These are supported and enhanced by other new or improved capabilities in Windows Server 2016. So there will be a lot to take in. I’ll be around to talk shop before and after the session so if you want to chat, find me around the conference.

E2EVC is a boutique, community-driven conference. Its small scale and independence make it a very direct experience, full of realism and hard-core real-world experience in the virtualization ecosystem and related technologies. We encourage anyone to attend and present. No "prima donnas": you can speak if you want to, as long as you're willing to share knowledge and insights. Give it a shot! I hope to see you there.

Client Access & Windows Server 2016 Site Aware Stretched Clusters

Introduction

There's more to business continuity than having multiple locations. When it comes to high availability, or perhaps more accurately disaster recovery and business continuity, people tend to focus on the good news. Some managers don't want to be bothered by the details of our incompetency (i.e. reality and the laws of physics) and vendors only like to focus on what they can sell with the biggest profit margin. Anything raining on that party falls under annoying details. When such a manager and such a salesman find each other, it's a match made in heaven. You're the one who's bringing the rain. It comes in the form of a simple question: how are we going to expose the failed-over services internally and externally to the users and customers? What do you mean that million-dollar investment in multiple SANs, clusters and consultants isn't sufficient? Nope!


One piece of very good news is that in Windows Server 2016 Failover Clustering we can now leverage a cloud witness as well, next to a file share witness. This has the benefit that we do not need a 3rd site for the file share witness, which was not always feasible, sometimes a bit convoluted to achieve in the cloud via IaaS, or depended on a rather less dependable server or PC somewhere in a branch office.
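
Configuring a cloud witness is a one-liner; the storage account name and access key below are placeholders for your own Azure storage account:

# Point the cluster quorum at a cloud witness in an Azure storage account.
Set-ClusterQuorum -CloudWitness -AccountName 'mystorageaccount' -AccessKey '<storage account access key>'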

What’s the problem?

The problem is that failing over the workload with the services (VMs, SQL, file servers, …) in a healthy, consistent state is only part of the challenge. The other part is to make sure that your clients (human or machine) can actually access those failed-over services, if required or possible without noticing, or with the smallest interruption possible. Even when you can achieve failover with only seconds of service interruption, some applications just can't handle this gracefully, or not at all.

The thing is, when you have multiple sites that often means distinct, separate subnets / networks. So when that VM with IP address 10.10.100.124 on default untagged VLAN 100 fails over to the other site, how will the clients in the various branch offices or on the internet access its services? DNS points to 10.10.100.124 under normal conditions.

Well, when the IP address can be updated in the DNS record thanks to "Multi-Subnet Resource Configuration" (SQL Server, file shares), things will work again, eventually, given enough time.


Multi-Subnet Resource Configuration works as follows. We have a single network name resource which we make dependent on multiple IP address resources. In cluster terms that's an "OR" dependency when looking at the validation report. The secret sauce is that only one of the IP address resources of the network name resource is online at any given time. That one gets registered in DNS and that's what the clients use to access the service.
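
In PowerShell that "OR" dependency looks something like the line below; the network name resource MySQLServer reappears later in this post, but the two IP address resource names are made-up placeholders for your own environment:

# Make the network name resource depend on either of its two IP address resources.
Set-ClusterResourceDependency -Resource 'MySQLServer' -Dependency '[SQL IP Address Site A] or [SQL IP Address Site B]'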

This works, but the DNS record needs to be updated, DNS replication needs to happen, the clients' DNS caches need to expire and refresh, etc. You can actually be looking at half an hour of downtime.

But what if Multi-Subnet Resource Configuration isn't an option or we're in a hurry? What are the options and how well and fast do they work? That's the point at which the storage vendor is already counting the profits, the PM states the job's done, the boss has already decided the project is a success and the network guys have some questions about YOUR problems. Let's discuss some of the options to deal with accessing services after a site activation.

Note: Hyper-V Replica has the ability to inject an alternate IP address on failover, but we're talking about a stretched cluster here, where replication happens at the storage level, not at the application level (Hyper-V) for the virtual machines.

Software Defined Networking, aka Network Virtualization

Hyper-V Network Virtualization (HNV) abstracts the VMs' logical subnet boundaries. This gives each virtual network the illusion that it is running as a physical network. The typical example for this is multiple tenants that have the same IP space. The fact that it overlays the physical network is also very handy when it comes to one and the same tenant in multi-site scenarios. Virtual networks allow VMs to move across different physical networks without reconfiguring the IP address in the guest OS.


This totally abstracts the networks and it works great for virtual machines (Hyper-V). It doesn't have to be limited to a single data center or site. Do note that there are things to discuss around CSVs and Live Migration cluster-wise, and around routing, gateways, DNS and geo load balancing access-wise, but you get the idea. When it comes to different subnets and different sites, clustering is not as easy as it seems. For this discussion we're limiting ourselves to client connectivity to resources that move to another site and won't dive into the details of network virtualization either.

Network Name Properties

There are two cluster network name resource property settings you can configure to help reduce downtime after a failover.

RegisterAllProvidersIP cluster network name resource property

Remember our first story of "Multi-Subnet Resource Configuration" with the DNS updates and cache that has to expire? Well, this can be enhanced as long as the applications can handle it. We can configure the DNS registration behavior via the RegisterAllProvidersIP property of a cluster network name resource.

Get-ClusterResource MySQLServer |
Set-ClusterParameter RegisterAllProvidersIP 1

By setting this to TRUE, all the IP address resources, online and offline, are registered in DNS. If you have an "enlightened" application that can check for and handle multiple IP addresses and determine which one to use, this allows for faster client reconnects. This works great with SQL Server.
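
You can check what's configured with Get-ClusterParameter; note that after changing RegisterAllProvidersIP the network name resource has to be taken offline and brought online again before the new DNS registration behavior applies:

# Verify the current value on the (example) network name resource.
Get-ClusterResource MySQLServer | Get-ClusterParameter -Name RegisterAllProvidersIP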

HostRecordTTL cluster network name resource property

This is great but has limited scope, as the application has to have the logic to handle multiple registered IP addresses for the same resource and figure out when to use which one. SQL Server can do this, and so can Exchange. What about a file server? RegisterAllProvidersIP won't work, but we can reduce the time to live of the DNS record for a cluster network name resource IP address on the client from 20 minutes to 5 minutes or lower.

Get-ClusterResource MyFileShare |
Set-ClusterParameter HostRecordTTL 300

This is not an option for Hyper-V; there, network virtualization works better, or we use other options. Read on!

Stretch your VLANs

Here the VLAN(s) stretch across the sites. This means that the IP address of the service (VMs, SQL Servers, File Shares, …) never changes making it very easy to have the clients reconnect very fast.


Easy for the apps and the system administrators. Well, sort of: chances are that the network admins will chip in and put a kill contract out on you with some assassins. Just saying. In a perfect world this would be a good idea. In reality, layer 2 and spanning tree will make sure you'll sort of regret it, or at least have to deal with the drawbacks and fallout. Choose wisely.

Abstract the network devices

This is a network vendor provided solution and I don't see it very much in the wild. In this approach the network devices use a 3rd IP address that gets registered in DNS for use by the clients. The fact that the workload switches between subnets when failing over between sites is irrelevant to the clients.


Cisco has this in a couple of solutions where NAT or a VIP is used to achieve it. As this is network appliance/hardware based, it works with any workload.

SLA your way out

Some people "mitigate" the prolonged downtime by having a separate SLA for local failover versus site failover. Cool, but if I were cynical I could state that this is just lawyer behavior. You create fine print and "cover your ass" for that scenario. It's not really solving anything, but accepting longer downtime and having all involved parties recognize and accept that fact is a valid approach.

Be creative & drive towards maximum portability

In an ideal world you can provision apps & services so fast that you only need to protect and fail over the persistent data: a world of microservices and containers where servers and virtual machines are cattle. But many of us will have to deal with servers being holy cows for now.

The above approaches are the most common options. There are more variations on these. One of those could be based around the use of a dedicated management domain on both sites. It's a concept I've used a couple of times, where and when allowed.

It has some drawbacks, or at least some complexities, to deal with; one example is configuring host-based backups that need access to the guest VMs, which requires some extra firewall configuration. Nothing that would prevent you from doing so with good backup products like Veeam, and it's something you're probably used to doing already for monitoring and backups across domains anyway.

But it also has serious benefits, as the actual business domains are completely separate from the management domain and potentially 100% virtualized. That's not a hard requirement as long as you keep the remaining physical servers in their own site-dependent, routed subnet (these don't move anyway) and they run workloads that are distributed anyway, like AD, Exchange DAGs, etc. The big benefit compared to a stretched cluster is that you can have the same subnet(s) on both sides of the stretched cluster for your virtual machines, and you change the routing and endpoints for your public and private access to the services. Instead of making the changes to the cluster resources, you do so higher up the stack. It's a bit like moving your data center to a new location "as is" and directing the clients to the new location. This removes the need for stretched VLANs, or for implementing network virtualization, at the cost of a bit more downtime & work to "switch". It's worth considering.

It helps to leverage DNS and geo load balancing technologies for this, but the core infrastructure (the site-aware stretched cluster) can run in a fully routed / layer 3 fashion.

Sure, you'll still need to make sure the traffic from the offices goes to the correct data center, and it really rocks if you have your internet presence geo-load balanced in some way. But let's face it: you needed to have that in place for any approach anyway.

Closing thoughts

There is a lot more detail and complexity to all of this than I covered in this short article. This is meant as an eye opener, a point from where to start the discussion with the business demanding 24/7 and 99.999% at zero cost and effort. Like Amazon or Azure, but then better, cheaper and on premises. Ouch! As you might expect, this can't be dealt with in just a few pages. Getting a solid, working disaster avoidance, recovery and business continuity plan & process is going to take some effort to create and maintain.

Fully failing over without any work or a second of downtime is a very expensive illusion, and you might be better off with 15 to 20 minutes of downtime for 90% of the workload and 30 to 60 minutes for the remaining 10% than trying to chase the ultimate perfection of 100% zero downtime, ever, for all services. Chances are you'll go broke trying and pretending, which means failing. Remember that when your primary data center has just been taken down, or worse, burnt down, dealing with a couple of hours of downtime to get your secondary site up and running 100% isn't actually as bad as it seems when discussing 2 or 3 hours of downtime in a management meeting. Somehow it always seems a bigger deal when not faced with the alternative of the business being wiped out.

One final note: don't forget to tell your bosses you're going to have to practice this a couple of times per year. Doing it for real counts as practice only if it's the 3rd time you do it. Good luck!