Windows Server 2016 Data Deduplication Scales and Performs Better

I’ve been leveraging Windows Server Data Deduplication since it became available with great results.

One of the enhanced features in Windows Server 2016 is Data Deduplication and it’s one I welcome very much. The improvements we’re getting mostly have to do with scale and performance. I’m quite pleased that Microsoft listened to our previous feedback on this.

You cannot imagine how much money on backup target storage we have saved by using this. So we’re very happy that Windows Server 2016 Data Deduplication scales and performs better. The fact that we can now get even better scale and performance is music to our ears. The backup target servers are the first in line for an upgrade, that’s for sure! That’s the reason I mentioned it as a subject to look into in the Hyper-V Amigos interview at Ignite!

Scale improvement: supported LUN sizes up to 64TB

Actually, I was already pushing this to 50TB in some cases for testing, but overall I used 6 to 10TB volumes. Still, the support for bigger volumes is very welcome. Please note that you should NOT go any higher than 64TB (I actually stay below that), otherwise deduplication doesn’t work due to its dependency on VSS. Please read my blog post Windows 2012 R2 Data Deduplication Leverages Shadow Copies: “LastOptimizationResultMessage : A volume shadow copy could not be created or was unexpectedly deleted” on this subject.

In Windows 2012 R2 we were limited because data deduplication used a single-threaded job and a single I/O queue for each volume. That made it wiser to have 10 target LUNs of 6TB than one huge 60TB LUN. The big issue otherwise is that on large volumes the dedup processing could fail to keep up with the rate of data changes (“churn”). Your mileage will vary depending on the type of data and the delta. There is more info on this in the blog post Sizing Volumes for Data Deduplication in Windows Server. It will help you size the volumes, but note that in Windows Server 2016 the rules have changed.

The dedup optimization processing now runs multiple threads in parallel, using multiple I/O queues on a single volume. This gives you better performance without incurring the overhead of having to use more, smaller LUNs.
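
If you want to see this in action on your own target, here’s a quick sketch (the volume letter is just an example) to kick off an optimization job and check on it:

```powershell
# Kick off an optimization job on a dedup-enabled volume.
Start-DedupJob -Volume "D:" -Type Optimization

# Watch running jobs (with progress) and check the resulting savings.
Get-DedupJob
Get-DedupStatus -Volume "D:"
```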

File sizes up to 1TB are good for dedup

Windows Server 2012 R2 Data Deduplication supports files up to 1TB in size, but they are considered “not good candidates” for dedup. So the DPM workaround of backing up to a truckload of virtual machines with 1TB virtual disks that are deduplicated is borderline. You can see one improvement in CPS v2 coming already (also see the next header). In Windows Server 2016, 1TB files are fully supported and good candidates. I’ll be pushing it higher … in my opinion this is where the most work will need to be done for future improvements, as it would allow for more scenarios (I have VMs that hold VHDX virtual disks of 2TB or more). Scale is something that helps keep things simple, and simplicity avoids the costs & issues that come with complexity. That’s always a good thing if possible.

In Windows Server 2012 R2 the algorithms don’t scale as well and performance suffers: operations like scanning for and inserting changes slow down as the total data set grows. These processes have been redesigned in Windows Server 2016, which now uses new stream map structures and improved partial file optimization. As a result, 1TB files have become good candidates.

Virtualized backup is a new usage type

DPM is already leveraging deduplication of virtual machines (CPS drove that I think, see Deduplicating DPM Storage).

In Windows Server 2016 all the dedup configuration settings needed for the DPM backup scenario have been combined into a new usage type called “Backup”. This simplifies deployment and helps “future proof” your setup, as future changes can automatically be applied through this usage type.
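
Enabling it is a one-liner; a minimal sketch, assuming T: is your backup target volume:

```powershell
# Enable deduplication on a backup target volume with the new
# "Backup" usage type (Windows Server 2016); T: is an example.
Enable-DedupVolume -Volume "T:" -UsageType Backup

# Verify the usage type that got applied.
Get-DedupVolume -Volume "T:" | Format-List Volume, UsageType, SavingsRate
```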

Nano Server support

Data deduplication is (or will be) fully supported in Nano Server (new in TPv3). It’s not completely done yet, so deduplication support in Nano Server still has a few restrictions:

  • Support has only been validated in non-clustered configurations
  • Deduplication job cancellation must be done manually (using the Stop-DedupJob PowerShell cmdlet, as sketched below)
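
So on Nano Server you’d do something like this to cancel a job yourself (the volume letter is an example):

```powershell
# List any running deduplication jobs, then cancel the one(s) on a volume.
Get-DedupJob
Stop-DedupJob -Volume "E:"
```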

Microsoft welcomes any feedback on the deduplication feature via an email sent to [email protected]. For me the standing order is to break through that 1TB barrier!

My take & Magic Ball

In combination with the right backup product it saves a ton of money. I have leveraged VEEAM and, in the past, the inbox Windows Server Backup with great results. The benefit of these two is that you can back up to physical storage and leverage deduplication. Virtualized backup as a new usage type makes life easier for the supported “workaround” around the limitations of DPM, where normally only VDI is supported with deduplication. What I’m really curious about is another possible future usage type: “Virtual Servers” … I guess for that one, deduplication support for the OS disk would be very beneficial for “cloud” providers. We’ll see.

Windows Server Backup Benefits from Improvements in Windows Server 2012

Introduction

In certain environments we back up VMs and any remaining physical hosts using Windows Server Backup. Before you all think this is ridiculous, I advise you to think again. With some automation you can build a very reliable agentless backup solution with the built-in functionality. Windows Server 2012 brings good news for the smaller & perhaps low-budget environments: Windows Server Backup is now capable of doing host-level backups of the Hyper-V guests stored on Cluster Shared Volumes. This was not the case in Windows 2008 R2, so it is a vast improvement. This change is due to the fact that CSV no longer requires a specialized API capable of dealing with its intricacies. All backup products can now back up CSVs without specialized APIs.

This is linked to huge improvements in how a CSV behaves during a backup. In the past, when you started a backup, the CSV ownership would be moved to the node running the backup, and all access by other nodes was in redirected mode for the duration of the backup, unless you used a hardware VSS provider, which was not trouble-free either. If your backup software did not understand CSVs and use the CSV APIs, you were out of luck. From Windows Server 2012 on, you are only in redirected I/O mode for the time it takes to create the VSS snapshot; for the rest of the backup the nodes access the CSV disk in direct mode. So, back to Windows Server Backup. You cannot back up the CSV as a disk volume, but you can select Hyper-V from the items to include in the backup.

That will show all VMs running on that host, meaning you cannot back up VMs running on another host. Compared to Windows Server 2008 R2, where using the native Windows backups with VMs on a CSV LUN meant using in-guest backups, this is a major improvement.
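
For the scripting fans, this is roughly what a host-level VM backup looks like with the Windows Server Backup PowerShell module; the VM names and target volume are examples on my side:

```powershell
# Build a one-off backup policy for two VMs running on this host.
$policy = New-WBPolicy
Get-WBVirtualMachine |
    Where-Object { $_.VMName -in "DC01", "FS01" } |
    ForEach-Object { Add-WBVirtualMachine -Policy $policy -VirtualMachine $_ }

# Point the policy at a local target volume and run it.
Add-WBBackupTarget -Policy $policy -Target (New-WBBackupTarget -VolumePath "E:")
Start-WBBackup -Policy $policy
```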

Some Approaches to Using Windows Server Backup

Sometimes we run the backups to local disk and regularly copy those off to a file share. This has the benefit of providing the backup versioning you get from using a local disk. The drawback is that the backups can be rather big.

In VMs, that is with backups in the guest, we run those backups to a file share over a 1Gbps management network. Performance is good, but it leaves us with the issue that there is no versioning.

For that reason our backup script copies the entire backup folder for a server to an archive folder on that same JBOD. Depending on how much space you have and need, you can configure the retention time of these older backups. This way you can keep a large number of backups over time. A script runs every day that deletes the older backups based on the chosen retention time, so you don’t run out of space.
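
That cleanup script is nothing fancy; a simplified sketch of the idea, with an example path and retention period:

```powershell
# Delete archived backup folders older than the chosen retention time.
$ArchiveRoot   = "E:\Archive"   # example archive location on the JBOD
$RetentionDays = 30             # example retention period

Get-ChildItem -Path $ArchiveRoot -Directory |
    Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-$RetentionDays) } |
    Remove-Item -Recurse -Force
```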

There is one way around the lack of versioning when writing backups to a share: mount a VHD residing on a file share locally on the host you are backing up, or use a pass-through disk inside the VM. While you can get away with this, it becomes rather messy due to the management & flexibility drawbacks of pass-through disks. Mounting a VHD on a file share inside a VM is also a performance issue. So while possible and viable in certain scenarios, I don’t use this for more than a few hosts, and those are physical ones.
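
For completeness, the mounting part of that workaround boils down to something like this (the path is an example, and I’m assuming your setup allows attaching a VHD over SMB):

```powershell
# Attach a VHD that lives on a file share so it shows up as a local disk
# on the host; backing up to that disk gets you local-disk style versioning.
Mount-DiskImage -ImagePath "\\fileserver\backups\host1-target.vhd"
```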

We had hoped that Windows Server 2012, with its support for VSS snapshots on SMB 3.0 shares, would have enabled backup versioning in Windows Server Backup, just like it can do for backups to local disk. Unfortunately this is not the case. You’ll still get the same warning when backing up from a Windows Server 2012 host to an SMB 3.0 share as you used to get with previous versions.

How fast are backups and restores?

The largest environment is a couple of physical servers, a two-node Hyper-V cluster, and about 22 virtual machines, some with a larger amount of data; the biggest is 400GB. Full backups are run weekly at night on all servers over 1Gbps and this works just fine.

We can back up a VM with a 50GB VHD (about 50% to 75% in use) and copy that backup to the archive folder in 20 minutes. We back up AND copy to archive a VM’s C: drive (20GB of data) and D: drive (190GB of data), two separate VHDs, in 2.5 hours.

For some statistics: a bare-metal restore of a VM or physical host over 1Gbps, with a single VHD or volume holding the OS, applications and some data, takes us 30 to 35 minutes in real life due to the overhead of setting a new VM up. If you just want to restore individual files you can do that as well. You can even mount the backup VHD and recover them via Windows Explorer.
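
A sketch of what such a file-level restore looks like with wbadmin; the version identifier and paths here are examples:

```powershell
# List the available backup versions on this machine.
wbadmin get versions

# Restore a single folder from one of those versions to an alternate location.
wbadmin start recovery -version:10/28/2012-22:00 -itemType:File `
    -items:D:\Data\Reports -recursive -recoveryTarget:D:\Restore
```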

We log every run and e-mail the results to the sysadmins.

Where else do the Windows Server 2012 improvements help?

Data Deduplication

Now there is some good news. We ran ddpeval.exe against the JBOD LUNs where we store these archived backups and got some great results. We also copied such an archive folder to a Windows Server 2012 host and ran data deduplication against it. In that test we achieved an 84%-85% deduplication rate, depending on how many versions of the backups we archive and what the delta is during that time frame. The latter is important: if we run dedupe only against domain controller backup archives, we get up to 94%. Deduplication should not impact restore performance too much, because in 99% of the cases you revert to the last backup, which sits in the WindowsBackup folder. Only if you need older backups will you work against the deduped files, unless the archive folder is on the same LUN as the original backups. I don’t have real-life info about restores yet, just a small lab test.
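
Running the evaluation tool is trivial; a sketch, with an example path:

```powershell
# DDPEval.exe ships with Windows Server 2012 and estimates the dedup
# savings for a folder or volume without changing anything on disk.
ddpeval.exe E:\Archive
```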

SMB 3.0 & Multichannel

In Windows Server 2012 you also get all the benefits of SMB 3.0 and Multichannel for your backup traffic.
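
You can quickly verify that your backup traffic is actually using Multichannel:

```powershell
# Show the multichannel connections (and the SMB connections they serve).
Get-SmbMultichannelConnection
Get-SmbConnection
```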

Not your grandfather’s ChkDsk

The vastly improved ChkDsk is a comfort for worried minds when it comes to fixing potential corruption on a large LUN. Last but not least, the FLUSH command in NTFS makes using cheaper data disks safer.
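
The new split between online scanning and a brief offline fix looks like this (the drive letter is an example):

```powershell
# Scan online: the volume stays available while corruption is detected.
chkdsk D: /scan

# Spotfix: a brief offline pass that only repairs what /scan flagged.
chkdsk D: /spotfix
```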

VHDX

The VHDX format allows for virtual disks up to 64TB. That means your Windows Server backups can now handle more than 2TB LUNs. This should be adequate.
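
Creating such a large target is straightforward with the Hyper-V module; the size and path here are just examples:

```powershell
# Create a large, dynamically expanding VHDX to use as a backup target.
New-VHD -Path "E:\Backups\target.vhdx" -SizeBytes 10TB -Dynamic
```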

Conclusion

With some creativity and automation via scripting, you can leverage Windows Server Backup into a nice and flexible solution. Although I feel that backup versioning to a file share is an improvement that is still missing, the new features help out a lot and, all in all, it’s not bad at all! So you see, even smaller organizations can benefit seriously from Windows Server 2012 and get more bang for their buck.