SMB 3, ODX, Windows Server 2012 R2 & Windows 8.1 perform magic in file sharing for both corporate & branch offices

SMB 3 for Transparent Failover File Shares

SMB 3 gives us lots of goodies, and one of them is Transparent Failover, which allows us to make file shares continuously available on a cluster. I have talked about this before in Transparent Failover & Node Fault Tolerance With SMB 2.2 Tested (yes, that was with the developer preview bits after BUILD 2011; I was hooked fast and early) and in Continuously Available File Shares Don’t Support Short File Names – “The request is not supported” & “CA failure – Failed to set continuously available property on a new or existing file share as Resume Key filter is not started.”

image

This is an awesome capability to have. It also made me decide to deploy Windows 8 and now 8.1 as the default client OS. The fact that maintenance can now happen during the day (it’s the Resume Key filter that makes this possible) and that patching can be done via Cluster Aware Updating is such a win-win for everyone that it’s a no-brainer. Just do it. Even better, it’s continuous availability thanks to the Witness service!

When the node running the file share crashes, the clients will experience a somewhat long delay in responsiveness, but after 10 seconds they continue where they left off once the role has resumed on the other node. Awesome! Learn more about this in Continuously Available File Server: Under the Hood and SMB Transparent Failover – making file shares continuously available.

Windows Clients also benefit from ODX

But there is more: SMB 3 & ODX together bring us even more goodness. Reads & writes are offloaded to the SAN, saving CPU cycles and bandwidth. Especially in the case of branch offices this rocks. SMB 3 clients that copy data between file shares on Windows Server 2012 (R2) servers that have their storage on an ODX-capable SAN get the benefit that the transfer request is translated to ODX by the server, which gets a token that represents the data. Windows uses this token to do the copying: it is delivered to the storage array, which does all the heavy lifting internally and then tells the client the job is done. No more reading data from disk, packaging it into TCP/IP packets, moving them across the wire, reassembling them on the other side and writing the data to disk.
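To make the token idea concrete, here is a toy model in Python. It is only a sketch of the flow described above, not a real storage API: the class ToyArray, its methods and the share paths are all made up for illustration, and the token length is arbitrary.

```python
import secrets


class ToyArray:
    """Hypothetical stand-in for an ODX-capable SAN: it owns data and tokens."""

    def __init__(self):
        self.luns = {}    # path -> bytes stored "on the array"
        self.tokens = {}  # token -> path the token represents

    def offload_read(self, path):
        # Hand back a small opaque token instead of the data itself.
        token = secrets.token_hex(16)
        self.tokens[token] = path
        return token

    def offload_write(self, token, dest_path):
        # The array copies internally; no payload leaves the array.
        source_path = self.tokens.pop(token)
        self.luns[dest_path] = self.luns[source_path]
        return len(self.luns[dest_path])


def offloaded_copy(array, source_path, dest_path):
    """Roughly what the file server does for an SMB 3 client copy request."""
    token = array.offload_read(source_path)
    bytes_copied = array.offload_write(token, dest_path)
    bytes_over_the_wire = len(token)  # only the token travels, not the file
    return bytes_copied, bytes_over_the_wire


array = ToyArray()
array.luns["shareA/big.iso"] = b"x" * 1_000_000
copied, on_wire = offloaded_copy(array, "shareA/big.iso", "shareB/big.iso")
print(f"copied {copied} bytes, moved {on_wire} bytes of token")
```

The point of the model is the ratio: the amount of data that has to cross the network stays tiny and constant no matter how big the file is.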

image

To make ODX happen we need a decent SAN that supports this well. A DELL Compellent shines here. Next to that, you can’t have any filter drivers on the volumes that don’t support offloaded read and write. This means that we need to make sure that features like data deduplication support this, but also that 3rd party vendors for anti-virus and backup don’t ruin the party.

image

In the screenshot above you can see that Windows data deduplication supports ODX. And if you run antivirus on the host you have to make sure that the filter driver supports ODX. In our case McAfee Enterprise does. So we’re good. Do make sure to exclude the cluster related folders & subfolders from on-access scans and scheduled scans.

Do not run DFS Namespace servers on the cluster nodes. The DfsDriver does not support ODX!

image

The solution is easy: run your DFS Namespace servers separately from your cluster hosts, somewhere else. That’s not a show stopper.
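If you want to script the filter driver check, here is a minimal sketch in Python. It assumes that each filter driver advertises ODX support through a SupportedFeatures value under HKLM\SYSTEM\CurrentControlSet\Services\<filter name>, with bit 0x1 meaning offloaded read and bit 0x2 meaning offloaded write, so a value of 3 means both. The function names are mine, and the registry read only works on Windows.

```python
# Assumed bit meanings for the SupportedFeatures registry value.
OFFLOAD_READ = 0x1
OFFLOAD_WRITE = 0x2


def describe_odx_support(supported_features):
    """Return which ODX operations a filter driver declares support for."""
    return {
        "offload_read": bool(supported_features & OFFLOAD_READ),
        "offload_write": bool(supported_features & OFFLOAD_WRITE),
    }


def read_supported_features(filter_name):
    """Read SupportedFeatures for one filter driver (Windows only)."""
    import winreg  # imported here so the decoder above works on any OS

    key_path = "SYSTEM\\CurrentControlSet\\Services\\" + filter_name
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "SupportedFeatures")
    return value


print(describe_odx_support(3))  # both bits set
print(describe_odx_support(0))  # neither bit set
```

On a cluster node you could call describe_odx_support(read_supported_features("DfsDriver")) and repeat that for every filter attached to the volume. Any filter that does not report both bits will keep ODX from kicking in.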

The user experience

What does it look like to a user? Totally normal except for the speed at which the file copies happen.

Here’s me copying an ISO file from a file share on server A to a file share on server B from my Windows 8.1 workstation at the branch office in another city, 65 km away from our data center and connected via a 200 Mbps pipe (MPLS).

image

On average we get about 300 MB/s or 2.4 Gbps, which “over” a 200 Mbps WAN is a kind of magic. I assure you that the users are not complaining, and they get used to this quite (too) fast ;-).
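For those who like to see the arithmetic, here is a small Python sketch. The 300 MB/s and 200 Mbps figures come from the copy above; the 4,000 MB ISO size is a made-up number for illustration, and I use decimal units (1 MB = 8 megabits).

```python
def megabytes_to_megabits(megabytes):
    """Convert megabytes to megabits (decimal units, 8 bits per byte)."""
    return megabytes * 8


observed_mb_per_s = 300  # average shown in the copy dialog
wan_mbps = 200           # MPLS link between branch office and data center
iso_size_mb = 4000       # hypothetical ISO size, for illustration only

observed_mbps = megabytes_to_megabits(observed_mb_per_s)
observed_gbps = observed_mbps / 1000
speedup = observed_mbps / wan_mbps

seconds_offloaded = iso_size_mb / observed_mb_per_s
seconds_over_wan = megabytes_to_megabits(iso_size_mb) / wan_mbps

print(f"observed rate: {observed_gbps} Gbps")
print(f"that is {speedup:.0f}x the WAN line rate")
print(f"offloaded copy: {seconds_offloaded:.0f} s, through the WAN: {seconds_over_wan:.0f} s")
```

The copy can exceed the line rate because the file data never crosses the WAN; only the copy request and the result do.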

The IT Pro experience

Leveraging SMB 3 and ODX means we keep people from consuming tons of bandwidth over the WAN and make copying large data sets a lot faster. On top of that, the CPU cycles and bandwidth on the server are conserved for other needs as well. All this while we can fail over the cluster nodes without our business users being impacted. Continuous to high availability, speed, less bandwidth & fewer CPU cycles needed. What’s not to like?

Pretty cool huh! These improvements help out a lot and we’ve paid for them via software assurance so why not leverage them? Light up your IT infrastructure and make it shine.

What’s stopping you?

So what are your plans to leverage your software assurance benefits? What’s stopping you? When I asked that I got a couple of answers:

  • I don’t have money for new hardware. Well, my SAN is also pre-Windows 2012 (DELL Compellent SC40 controllers). I just chose based on my own research, not on what VARs like to sell to get maximal kickbacks ;-). The servers I used are almost 4 years old but fully up to date DELL PowerEdge R710s, recuperated from their duty as Hyper-V hosts. These servers easily last us 6 years, and over time we collected some spare servers for parts or replacement after the support expires. DELL doesn’t take away your access to firmware & drivers like some do, and their servers aren’t artificially crippled in feature set.
  • Skills? Study, learn, test! I mean it, no excuse!
  • Bad support from ISVs and OEMs for recent Windows versions is holding you back? Buy other brands, vote with your money and do not accept their excuses. You pay them to deliver.

As IT professionals we must and we can deliver. This is only possible as the result of sustained effort & planning. All the labs, testing and studying help out when I’m designing and deploying solutions. As I take the entire stack into account in designs and we do our due diligence, I know it will work. Being active in the community also helps me know early on which vendors & products have issues, so we can avoid the “marchitecture” solutions that don’t deliver when deployed. You can achieve this as well; you just have to make it happen. That’s not too expensive or time consuming, at least a lot less so than being stuck after you’ve spent your money.

Unable to retrieve all data needed to run the wizard. Error details: “Cannot retrieve information from server “Node A”. Error occurred during enumeration of SMB shares: The WinRM protocol operation failed due to the following error: The WinRM client sent a request to an HTTP server and got a response saying the requested HTTP URL was not available. This is usually returned by a HTTP server that does not support the WS-Management protocol.

I was recently configuring a Windows Server 2012 file server cluster to provide SMB transparent failover with continuously available file shares for end users. So, we’re not talking about a Scale Out File Server here.

All seemed to go pretty smoothly until we hit a problem. When the role is running on Node A and you are using the GUI on Node A, this is what you see:

image

When you try to add a share you get this:

"Unable to retrieve all data needed to run the wizard. Error details: "Cannot retrieve information from server "Node A". Error occurred during enumeration of SMB shares: The WinRM protocol operation failed due to the following error: The WinRM client sent a request to an HTTP server and got a response saying the requested HTTP URL was not available. This is usually returned by a HTTP server that does not support the WS-Management protocol."

image

When you failover the file server role to the other node, things seem to work just fine. So this is where you run the GUI from Node A while the file server role resides on Node B.

image

You can add a share, it all works. You notice the exact same behavior on the other node. So as long as the role is running on another node than the one on which you use Failover Cluster Manager you’re fine. Once you’re on the same node you run into this issue. So what’s going on?

So what to do? It’s related to WinRM so let’s investigate that.

image

So the WinRM config comes via a GPO. The local GPO for this is not configured. So that’s not the one; it must come from the domain. The IP addresses listed are the node IP and the two cluster networks. What’s not there is localhost (127.0.0.1), the cluster IP address or any of the IPv6 addresses.

I experimented with a lot of settings. First we ended up creating a sub-OU, on which we blocked inheritance, in the OU where the cluster nodes reside. We then ran gpupdate /target:computer /force on both nodes to make sure WinRM was no longer configured by the domain GPO. As the local GPO was not configured, WinRM reverted to the defaults. The listener now shows up as listening on all IPv4 and IPv6 addresses. Nice, but the GPO was now disabled.

image

This is interesting, but things still don’t work. For that we needed to disable/enable WinRM:

Configure-SMRemoting -disable
Configure-SMRemoting -enable

or via server manager

image

That fixed it, and it seems to be a necessary step. Do note that to disable/enable remote management it must not be configured via a GPO, or it throws an error like:

image

or

image

Some more testing

We experimented by adding 127.0.0.0-172.0.0.1 and enabling the GPO again. We then saw that the listener did show the localhost, cluster & file server role IP addresses, but the issue was back. Using * in just IPv4 did not do the trick either.

image

What did the trick was to use * in the filter for IPv6 and keep our original filters on IPv4. The good news is that after removing the GPO and disabling/enabling WinRM, the cluster IP address & file server role IP address are now in the list. That could be good for other use cases.

This is not ideal, but it all works now.

What we settled for

So we ended up still restricting the GPO settings for IPv4 to subnet ranges while allowing * for IPv6. This made sure that even when we run the Failover Cluster Manager GUI from the node that owns the file server role, everything still works.
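To reason about which addresses a given filter covers, here is a minimal Python sketch. It only models the filter syntax of the WinRM GPO setting as I understand it (a comma separated list of start-end ranges, * for everything, an empty filter for nothing). It is not how WinRM itself decides anything, and the subnet ranges below are hypothetical, not the ones from our environment.

```python
import ipaddress


def filter_allows(filter_text, address):
    """Return True if address falls inside a '*' or 'start-end' filter list."""
    address = ipaddress.ip_address(address)
    for entry in filter_text.split(","):
        entry = entry.strip()
        if not entry:
            continue  # an empty filter matches nothing
        if entry == "*":
            return True
        start_text, _, end_text = entry.partition("-")
        start = ipaddress.ip_address(start_text.strip())
        end = ipaddress.ip_address((end_text or start_text).strip())
        # Only compare addresses of the same IP version.
        if start.version == address.version and start <= address <= end:
            return True
    return False


# Hypothetical settings: IPv4 restricted to two subnets, IPv6 wide open.
ipv4_filter = "10.10.100.1-10.10.100.254, 192.168.1.1-192.168.1.254"
ipv6_filter = "*"

print(filter_allows(ipv4_filter, "10.10.100.21"))  # a node address
print(filter_allows(ipv4_filter, "127.0.0.1"))     # IPv4 loopback
print(filter_allows(ipv6_filter, "::1"))           # IPv6 loopback
print(filter_allows("", "::1"))                    # IPv6 filter left empty
```

With settings like these, the IPv4 loopback stays outside the filter while every IPv6 address, loopback included, is covered. That mirrors the combination we settled for.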

One workaround is to work from a remote host, not from a cluster member, which is a good practice anyway.

The key takeaway is that when Microsoft says they test with IPv6 enabled they literally mean for everything.

Note

There is a TechNet article on WinRM GPO settings for SCVMM 2012 RC where they advise setting both IPv4 and IPv6 to * to avoid issues with SCVMM operations: How to Add Trusted Hyper-V Hosts and Host Clusters in VMM

However, we found that IPv6 is the key requirement here; * for just IPv4 alone did not work.

Transparent Failover & Node Fault Tolerance With SMB 2.2 Tested

Transparent Failover and node fault tolerance with SMB 2.2 in Windows 8 Server is something that caught my attention immediately. The entire effort in infrastructure has been to keep the plumbing as invisible & unnoticed as possible. In some areas we had great success, in others not so much. Planned & unplanned down time of file servers has always been an issue, as there was always a shorter or longer outage and any failover meant disconnecting & reconnecting, leading to all kinds of end user problems and confusion. To them the network is down. But the same issues exist on the server side with apps depending on file shares, or servers like SQL Server that write backups to a remote share or read data from such a share. Often it needs some kind of human intervention to correct the situation. No, not even 3rd party clustered file systems and active-active clustering software could achieve this. The SMB protocol prior to 2.2 did not allow for it.

So when one hears it is a possibility now, we want to test this! So we throw some virtual machines on the test cluster and build a file cluster with Windows 8 Server, and we also have a 3rd server to act as a client with SMB 2.2. Open the Failover Cluster Manager, right-click Roles and choose to configure a role.

image

You’ll see the familiar wizard; click Next.

image

And choose the file server role

image

Give the Client Access Point a name and add an IP address.

image

Add some storage

image

And voila … after the confirmation we’re asked to configure high availability

image

This opens the New Share Wizard

image

…this is all pretty straightforward so I’ll leave out the screenshots except for the most important one, where we explicitly uncheck “Enable continuous availability” as we want to first run a test without it :-)

image

Continue through the wizard & voila, you have a clustered file server with a Client Access Point as a single namespace. Please note that you can connect to this using that single namespace. No need for \\ServerA\BizzShare & \\ServerB\BizzShare and going fancy with redundant DFS namespaces and the like.

Remember we still need to make this share highly available but let’s do some file copies and fail over the node to see what this looks like without transparent failover. Select the role transparent, right click, choose “Move” and “Select  Node”.

image

Choose an available node and click OK

image

As you can see this looks rather familiar.

image

Let’s make that share continuously available. Go to and double click on the share you want to configure.

image

You’ll see a progress dialog whilst information is retrieved …

image

… and then the share properties are presented, most is familiar stuff but we need the bottom one “Settings”. Select the check box to make the share continuously available.

image

Now let’s try that file copy again whilst failing over the file server role to another node.

image

So there is no loss of data, no need for the client to reconnect, and you don’t have to retry, but you do have a freeze that lasts for about 20 seconds in my test lab. I hope this will still improve before RTM.
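If you want to put a number on that freeze yourself, here is a minimal Python sketch. It assumes you run it on the client while you move the role: it appends a line to a file on the share at a fixed interval and reports the longest pause between two successful writes. The function names, the interval and the UNC path in the note below are all made up for illustration.

```python
import os
import time


def longest_gap(timestamps):
    """Return the largest pause, in seconds, between consecutive timestamps."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return max(gaps, default=0.0)


def probe_share(path, duration_s=60.0, interval_s=0.5):
    """Append a line to path every interval_s seconds and time each write."""
    timestamps = []
    deadline = time.monotonic() + duration_s
    with open(path, "a", encoding="utf-8") as handle:
        while time.monotonic() < deadline:
            handle.write(f"{time.time()}\n")
            # Flush and fsync so each write has to reach the file server;
            # during a failover this is where the probe should stall.
            handle.flush()
            os.fsync(handle.fileno())
            timestamps.append(time.monotonic())
            time.sleep(interval_s)
    return longest_gap(timestamps)


# Synthetic timestamps standing in for a run with one long stall.
print(longest_gap([0.0, 0.5, 1.0, 21.0, 21.5]))
```

Start something like probe_share(r"\\MyOldFileServer\BizzShare\probe.txt") on the client, fail over the role, and compare the reported gap with and without continuous availability enabled on the share. Without it I would expect the write to fail with an error instead of stalling.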

What we learned here is that we can have Transparent (File Share) failover with SMB 2.2 in a virtualized environment, and we can give it a “Client Access Point” name like “MyOldFileServer” so that users are not confused or need to learn another UNC path. There are many options to keep old namespaces around for end user ease of use, but this is an extra ace up our sleeve. For now, planned (patching, server maintenance) or unplanned (crash) failover means a 20 second freeze as the file share fails over. This freeze is probably due to active-passive clustering. For now active-active is not recommended/supported for file sharing in an end user scenario. I think they are “worried” about huge file shares with a zillion metadata updates to sync. But this is supported for apps like Hyper-V, SQL Server backups or apps needing file data etc. I’m going to try it next, and for user data as well. Things might change before RTM, and with multichannel, RDMA, 10Gbps and NIC teaming in the OS, perhaps that active-active scenario might be feasible for user file data? PLEASE? Otherwise here’s another request for “Windows 8 Server R2” ;-)

The secret sauce is in:

  • SMB 2.2 on both client & server
  • Resume Key
  • SMB 2.2 Witness service, which is set to running when you make a share continuously available.

image

Go watch the sessions from the Build conference to hear more on all this. The work they’ve put into this and some of the complexities are quite amazing. http://www.buildwindows.com/Sessions

Things to find out: how to rename a Client Access Point or how to delete it. Adding a new one is easy.

Warning: It’s September 23rd 2011 and the Developer Preview is a little rough around the edges, so don’t run this on anything you need to get your bills paid yet ;-)