NIC Firmware/Driver Updates Reset Your RSS/VMQ optimizations

When optimizing your RSS/VMQ settings for maximum performance you’ll normally (I hope) do this in PowerShell. Save that script with some comments on why you configure it that way and make it part of your Hyper-V host deployment scripts

Why? Automation is king but you’ll need it again for sure. Why? Well there is this “tendency” that NIC firmware/driver updates reset your RSS/VMQ optimizations back to their defaults.That’s a bit of a bummer if you have to redo all the work instead of having a script ready to go. I have seen many a deployment where the configuration was missing after firmware/driver upgrades so please, check!

image

Figure: Where has my optimized configuration gone after a driver/firmware upgrade?

The good news is this isn’t a show stopper issue as things will keep working, but without your optimizations and with VMQ, depending on your NIC team setup for the vSwitch issues might occur. When doing NIC teaming for your virtual switch it’s important to get it right.  With switch dependent teaming (LACP/Static) the NICs in the team need to use overlapping processor sets (Min Queues). When doing switch independent teaming the NICs in the team need to use non-overlapping processor sets. So you need to configure each NIC in your team to use the different processors (Sum of Queues).

On top of that you might want to / should separate RSS/VMQ cores from each other. SMB Direct for CSV/LM will also help achieve this as there we leverage CPU offloading to the NIC.

Know What Receive Side Scaling (RSS) Is For Better Decisions With Windows 8

Introduction

As I mentioned in an introduction post Thinking About Windows 8 Server & Hyper-V 3.0 Network Performance there will be a lot of options and design decisions to be made in the networking area, especially with Hyper-V 3.0. When we’ll be discussing DVMQ (see DMVQ In Windows 8 Hyper-V), SR-IOV in Windows 8 (or VMQ/VMDq in Windows 2008 R2) and other network features with their benefits, drawbacks and requirements it helps to know what Receive Side Scaling (RSS) is. Chances are you know it better than the other mentioned optimizations. After all it’s been around longer than VMQ or SR-IOV and it’s beneficial to other workloads than virtualization. So even if you’re a “hardware only for my servers” die hard kind of person you can already be familiar with it. Perhaps you even "dislike” it because when the Scalable Networking Pack came out for Windows  2003 it wasn’t such a trouble free & happy experience. This was due to incompatibilities with a lot of the NIC drivers and it wasn’t fixed very fast. This means the internet is loaded with posts on how to disable RSS & the offload settings on which it depends. This was done to get stability or performance back for application servers like Exchange and others applications or services.

The Case for RSS

But since Windows 2008 these days are over. RSS is a great technology that gets you a lot better usage of out of your network bandwidth and your server. Not using RSS means that you’ll buy extra servers to handle the same workload. That wastes both energy and money. So how does RSS achieve this? Well without RSS all the interrupt from a NIC go to the same CPU/Core in multicore processors (Core 0).  In Task Manager that looks not unlike the picture below:

image

Now for a while the increase in CPU power kept the negative effects at bay for a lot of us in the 1Gbps era. But now, with 10Gbps becoming more common every day, that’s no longer the case. That core will become the bottle neck as that poor logical CPU will be running at 100%, processing as much network interrupts in can handle, while the other logical CPU only have to deal with the other workloads. You might never see more than 3.5Gbps of bandwidth being used if you don’t use RSS. The CPU core just can’t keep up. When you use RSS the processing of those interrupts is distributed across al cores.

With Windows 2008 and Windows 2008 R2 and Windows 8 RSS is enabled by default in the operating system. Your NIC needs to support it and in that case you’ll be able to disable or enable it. Often you’ll get some advanced features (illustrated below) with the better NICs on the market. You’ll be able to set the base processor, the number of processors to use, the number of queues etc. That way you can divide the cores up amongst multiple NICs and/or tie NICs to specific cores.

image

image

So you can get fancy if needed and tweak the settings if needed for multi NIC systems. You can experiment with the best setting for your needs, follow the vendors defaults (Intel for example has different workload profiles for their NICs) or read up on what particular applications require for best performance.

Information On How To Make It Work

For more information on tweaking RSS you take a look at the following document http://msdn.microsoft.com/en-us/windows/hardware/gg463392. It holds a lot more information than just RSS in various scenarios so it’s a useful document for more than just this.

Another good guide is the "Networking Deployment Guide: Deploying High-Speed Networking Features". Those docs are about Windows 2008 R2 but they do have good information on RSS.

If you notice that RSS is correctly configured but it doesn’t seem to work for you it’ might be time to check up on the other adaptor offloads like TCP Checksum Offload, Large Send Offload etc. These also get turned of a lot when trouble shooting performance or reliability issues but RSS depends on them to work. If turned off, this could be the reason RSS is not working for you..

Thinking About Windows 8 Server & Hyper-V 3.0 Network Performance

Introduction

The main purpose of this post is, as mentioned in the title, to think. This is not a design or a technical reference. When it comes to designing a virtualization solution for your private cloud their are a lot of components to consider. Storage, networking, CPU, memory all come in to play and there is no one size fits all. It all depends on your needs, budget in combination with how good your  insight into your future plans & requirements are. This is not and easy task. Virtualizing 40 webservers is very different from virtualizing SQL Server, Exchange or SharePoint.  Server virtualization is different from VDI and VDI itself comes in many different flavors.

  • So what workloads are you hosting? Is it a homogeneous or a heterogeneous environment?
  • What kind of applications are you supporting? Client-Server, SOA/Web Services, Cloud apps?
  • What storage performance & features do you need to support all that?
  • What kind of network traffic does this require?
  • What does your business demand? Do you know, do they even know? Can you even know in a private cloud environment?
  • Do you have one customers or many (multi tenancy) and how are they alike or different in both IT needs and business requirements.

The needs of true public cloud builders are different from those running their own private clouds in their own data centers or in a mix of those with infrastructure at a hosting provider. On top of that an SMB environment is different from large enterprises and companies of the same size will differ in their requirements enormously due to the nature of their business.

I’ve written about virtualization and CPU considerations before (NUMA, Power Save settings for both OS & 10Gbps network performance) before. I’ve also discussed a number of posts about 10Gbps networking and different approaches on how to introduce it with out breaking the bank. In 2012 I intend to blog some more on networking and storage options with Windows 8 and Hyper-V 3.0. But I still need to get my hands on the betas and release candidates of Windows 8 to do so. You’ll notice I don’t talk about Infiniband. Well I just don’t circulate in the ecosystems where absolute top notch performance is so important that they can justify and get that kind of budget to throw at those needs.

To set the scene for these blog posts I’ll introduce some considerations around networking options with Hyper-V. There are many features and options both in hardware, technologies, protocols, file systems. Even when everything is intended to make live simpler people might get lost in all the options and choices available.

Windows Server 8 NIC features – The Alphabet Soup

  • Data Center Bridging (DCB)
  • Receive Segment Coalescing (RSC)
  • Receive Side Scaling (RSS)
  • Remote Direct Memory Access (RDMA)
  • Single Root I/O Virtualization (SR-IOV)
  • Virtual Machine Queue (VMQ)
  • IPsec offload (IPsecTO)

A lot of this stuff has to do with converged networks. These offer a lot of flexibility and the potential for cost savings along the way. But convergence & cost savings are not a goal. They are means to an end. Perhaps you can have better, cheaper and more effective solutions leveraging you existing network infrastructure by adding some 10Gbps switches & NICs where they provide the best bang for the buck. Chances are you don’t need to throw it all out and do a fork lift replacement. Use what you need from the options and features. Be smart about it. Remember my post on A Fool With A Tool Is Still A Fool, don’t be that guy!

Now let’s focus on couple of the features here that have to do with network I/O performance and not as much convergence or QOS. As an example of this I like to use Live migration of virtual machines with 10Gbps. Right now with one 10Gbps NIC I can use 75% of the bandwidth of a dedicated NIC for live migration. When running 20 or more virtual machines per host with 4Gbps to 8Gbps of memory and with Windows 8 giving me multiple concurrent Live Migrations I can really use that bandwidth. Why would I want to cut it up to 2 or 3Gbps in that case. Remember the goal. All the features and concepts are just tools, means to and end. Think about what you need.

But wait, in Windows 8 we have some new tricks up our sleeve. Let’s team two 10Gbps NICs put all traffic over that team and than divide the bandwidth up and use QOS to assure Live Migration gets 10Gbps when needed but without taking it away from others network I/O when it’s not needed. That’s nice! Sounds rather cool doesn’t it and I certainly see a use for it. It might not be right if you can’t afford to loose that bandwidth when Live Migration kicks in but if you can … more power & cost savings to you. But there are other reasons not to put everything on one NIC or team.

RSS, VMQ, SR-IOV

One thing all these have in common is that they are used to reduce the CPU load / bottleneck on the host and allow to optimize the network I/O and bandwidth usage of your expensive 10Gbps NICs. Both avoiding having a CPU bottleneck and optimizing the use of the available bandwidth mean you get more out of your servers. That translates in avoiding buying more of them to get the same workload done.

RSS is targeted at the host network traffic. VMQ and SR-IOV are targeted at the virtual machine network traffic but in the end the both result in the same benefits as stated above. RSS & VMQ integrate well with other advanced windows features. VMQ for examples can be used with the extensible Hyper-V switch while RSS can be combined with QOS & DCB in storage & cluster host networking scenarios. So these give you a lot of options and flexibility. SR-IOV or RDMA is more focused on raw performance and doesn’t integrate so well with the more advanced features for flexibility & scalability. I’ll talk some more on this in future blog posts.

Now with all these features that have there own requirements and compatibilities you might want to reconsider putting all traffic over one pair of teamed NICs. You can’t optimize them all in such a scenario and that might hurt you. Perhaps you’ll be fine, perhaps you won’t.

image

So what to use where and when depends on how many NICs you’ll use in your servers and for what purpose. For example even in a private cloud for lightweight virtual machines running web services you might want to separate the host management & cluster traffic from the virtual machine network traffic. You see RSS & VMQ are mutually exclusive. That way you can use RSS for the host/cluster traffic and DVMQ for the virtual machine network. Now if you need redundancy you might see that you’ll already use 2*2NIC with Windows 8 NLB in combination with two switches to avoid a single point of failure. Do your really need that bandwidth for the guest servers? Perhaps not but, you might find that it helps improve density because of better better host & NIC performance helping you avoiding the cost of buying extra servers. If you virtualize SQL servers you’d be even more interested in all this. The picture below is just an illustration, just to get you to think, it’s not a design.

image

I’m sure a lot of matrices will be produced showing what features are compatible under what conditions, perhaps even with some decision charts to help you decide what to use where and when.