In place upgrade of RD Gateway farm nodes to Windows Server 2016 removes the Loopback adapter for UDP load balancing

Here’s a quick heads up to anyone who’s involved in upgrading existing Windows Server 2012 (R2) RD Gateway farms to Windows Server 2016.

In my recent experiences the in place upgrade (VMs) works rather well. Just make sure the netlogon service is set to automatic (a know issue and a fix is coming) after you upgrade and install all updates. Also make sure that you don’t have this issue

Windows Time Service settings are not preserved during an in-place upgrade to Windows Server 2016 or Windows 10 Version 1607

There is however one networks specific issue specific you’ll need to deal with when leveraging UDP with a load balancer via Direct Server Return.

When you have a RD Gateway farm you load balance it with a (preferably high available) load balancer like a Kemp Loadmaster. I have described this in these blogs/videos Load balancing Hyper-V Workloads With High To Continuous Availability With a KEMP Loadmaster and Quick Demo Video Of Site Failover With KEMP Loadmaster Global Balancing

What you also do is load balance both HTTPS (TCP, port 443) and UDP (port 3391). For UDP we use Direct Server Return ((DSR) as described in my blog post Load balancing UDP for a RD Gateway farm with a KEMP Loadmaster. This requires a properly configured loopback adapter.

image

During the in place upgrade to Windows Server 2016 this loopback adapter is removed form the nodes. So you need to add it back just a described in my original blog post. Normally it will find the settings for it in the registry but it’s bets you check it all out as I’ve found that the loopback adapter did have “Register this connection”s address in DNS” enabled as well as NETBIOS over TCP/IP. So, per my blog post, check it all to make sure. Other than that, after installing all the Windows Server 2016 updates all works smoothly after an in place upgrade.

Hope this helps someone out there!

High Availability has a price

We’ll go back to basics today. Some times the obvious, no matter how evident it is to us technologists, is challenged. Recently we got the remark that we were wasting CPU cycles by assigning to many vCPU to certain virtual machines on our Hyper-V cluster. So we had to explain that high availability has a price. On top of that we had to explain that things are not as wasteful as they seem in a virtual environment.

The case

Here’s one of the “offending” virtual machines. They assumed that we are wasting at least 50% of 12 CPUs.

image

This is one node in a dual node  load balancing (active-active) and highly available solution. This provides for zero down time during scheduled maintenance and very little downtime during system failures.

And here’s the second node (yes the 1st node has been down for scheduled maintenance more recently that node 2).

image

In a 2 node HA solution you need to make sure that one node can handle the entire workload. This is the absolute border line of an N+1 solution.  This means you can lose 1 node. N determines the number of nodes needed to guarantee an agreed upon service level and the number defines how many nodes failures can be tolerated before affecting the service.

In the above example there’s a need to have the CPU resources on each node to run the entire workload on one node without having an effect on the service. Therefore, when both nodes are up this might seem like a waste to the uninitiated. It is however a required to achieve the high availability goal. A constant CPU usage over 75 % will lead to a reduction in service quality in this case and even compromise the usability of the that service.

I did not even dive into the dangers of designing purely based on averages during this “explanation”. That was one step to much for the level of the discussion.

It’s also important to note that Hyper-V CPU scheduling is highly intelligent and is far less susceptible to the waste of CPU cycles via over provisioning of vCPU than some other solutions are or used to be. Knowing the capabilities and inner working of the technology used is also important in all this. More nodes generally also make “over provisioning” less of an issue. When you have 10 nodes and you lose 1, you only have lost 10% of the capabilities, not 33% like in a 3 node cluster.

Ideally you have 3 node so that even during an issue with one node you still maintain redundancy. However if you want acceptable services during a 2 node failure you’ll need to go to N+2, meaning that you need 2 nodes to provide the services and handle losing 2 nodes gracefully. In that case you’ll need 4 node and so on.  The larger the node count  the wiser it is to go to a N+2 model and ideally you’ll provide separate failure domains over which the nodes are distributed. An example of this is having a redundant geo-load balanced web farm of 32 virtual machine nodes spread over 2 locations and running on separate hardware failover clusters in each location. As you can see the higher the stakes and demands the faster the cost and potential complexity rises. You can offload some of the complexity by leveraging a public cloud like Azure, but the costs will still be there. There is no such thing as a free lunch, some are quite easy and affordable for what you get.

Conclusion

High Availability has a price. I did mention that already, right? To be able to keep your services running at a level that is both workable and acceptable to your customers and stake holders you will need to over provision to a degree. There is no magic here. When your solutions are being scrutinized by people with no real background, experience and context in high availability you might need to explain this.

Load balancing Hyper-V Workloads With High To Continuous Availability With a KEMP Loadmaster

I’m working on some labs and projects with KEMP Loadmaster load balancing appliances (LM 2400, LM-R320) That will lead to some blog post on  load balancing several workloads, which are all on Windows Server 2012 R2  Hyper-V or integrate in to Azure. The load balancers used in the labs are the virtual appliances, depending on the needs and environment these are a very good, cost effective option for production as well and depending on the version you get they scale very well. Hence their use in cloud environments, they will not hold you back at all!

To stimulate your interest in load balancing and high availability I’ve put up a video on load balancing RD Gateway services. Consider it a teaser or introduction to more about the subject.

Why use an appliance (hardware/virtual)? Well let’s look at the 2 alternatives:

  • Round robin DNS, which is also sometimes used is just to low tech for most real life scenarios and sometimes can’t be used or is less efficient which impacts scalability and performance. On top of that it doesn’t provide health checking for failover purposes.
  • I’ve also said  before that while Windows NLB  provides layer 4 load balancing out of the box it’s pretty basic. It also often causes a lot of network grief and the implementation can be tedious. This has not improved in an ever more virtualized & cloud based world. On top of that, when network virtualization comes into play you might paint yourself into a corner as those two don’t mix. But if that’s not a concern and you’re on a budget, I’ve used it with success in the past as well.