Quick Fix Publish : VM won’t boot after October 2017 Updates for Windows Server 2016 and Windows 10 (KB4041691)

If you had WSUS (or SCCM) running last night with auto approval on, you might have woken up this morning to virtual machines that can’t boot anymore.

Great, another update gone wrong. Time to restore from backup, as that can be the fastest way to restore services when you’re in a pickle, provided you have a good solution for that in place. For the others, what I did is described below. Actually, a couple of us MVPs were on this issue at a number of sites as our first task this morning. But first, the root cause.

Well, read this link, Express update delivery ISV support, and you have all you need. Basically, both the delta and the full cumulative update for October (KB4041691 – https://support.microsoft.com/en-us/help/4041691) ended up in WSUS without you explicitly putting them there. That should not happen: normally the delta is not published to WSUS, so it cannot be downloaded and, heaven forbid, auto approved. You could also have manually approved everything without really knowing what and why. Not a great idea at all.

So your VM gets offered both of them and that is BAD!

Normally you only get into this pickle if you somehow managed to install both of these yourself or via other tools (see the link above), which you shouldn’t do.

Now, if you don’t have decent restore capabilities from backups or snapshots, there is another way out: removing the updates.

Boot the problematic VM and, on the recovery screen, select Troubleshoot.

Choose to open the Command Prompt and stay away from any of the automatic repair options.

Microsoft advises getting rid of the SessionsPending registry key. To do so, load the software registry hive as follows:

reg load hklm\temp c:\windows\system32\config\software

Delete the SessionsPending registry key (its Exclusive value), if it exists, by running:

reg delete "HKLM\temp\Microsoft\Windows\CurrentVersion\Component Based Servicing\SessionsPending" /v Exclusive

Unload the software registry hive:

reg unload HKLM\temp
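
The recovery environment does not always mount the Windows volume as C:, so it can help to spell out the three registry commands for whatever drive letter you actually see. The following is only a small sketch in Python that prints the command lines to type; the function name sessions_pending_commands and the drive parameter are made up for illustration, and nothing here touches a registry by itself.

```python
# Sketch: build the three "reg" command lines from the steps above for a
# given drive letter. Hypothetical helper; it only returns strings to type
# or paste into the recovery command prompt.

SESSIONS_PENDING = (
    r"HKLM\temp\Microsoft\Windows\CurrentVersion"
    r"\Component Based Servicing\SessionsPending"
)


def sessions_pending_commands(drive="c:"):
    """Return the load / delete / unload commands for the offline hive."""
    hive = rf"{drive}\windows\system32\config\software"
    return [
        rf"reg load HKLM\temp {hive}",
        # Straight double quotes are required because the key path has spaces.
        rf'reg delete "{SESSIONS_PENDING}" /v Exclusive',
        r"reg unload HKLM\temp",
    ]


if __name__ == "__main__":
    # Assumption for the example: the offline Windows volume shows up as D:
    for command in sessions_pending_commands("d:"):
        print(command)
```

With the default drive of c: the output matches the commands shown above.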

Run dism /image:c:\ /get-packages to find the installed updates that caused the issue.

[Screenshot: DISM package listing with the suspect packages highlighted in yellow]

The ones highlighted in yellow are the ones of interest, and you can see that the first one never even got an install time.
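
If you have many VMs to go through, picking the suspects out of the listing by eye gets tedious. Below is a minimal Python sketch that parses saved /get-packages output and flags packages that are in a pending state or have no install time. The flagging rule is my assumption based on what we saw, the sample package names are invented, and the result should always be checked by hand before removing anything.

```python
# Sketch: find suspect packages in saved "dism /get-packages" output.
# Assumes the usual "Key : Value" layout of that listing. The heuristic
# (pending state or empty install time) is an assumption, not a rule.


def parse_dism_packages(text):
    """Return one dict per package with identity, state and install_time."""
    packages = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Package Identity":
            packages.append({"identity": value, "state": "", "install_time": ""})
        elif packages and key == "State":
            packages[-1]["state"] = value
        elif packages and key == "Install Time":
            packages[-1]["install_time"] = value
    return packages


def suspect_packages(packages):
    """Identities of packages that are pending or never got an install time."""
    return [
        p["identity"]
        for p in packages
        if "pending" in p["state"].lower() or not p["install_time"]
    ]


# Invented sample data, for illustration only.
SAMPLE = """\
Package Identity : Package_for_Example_A~31bf3856ad364e35~amd64~~1.0.0.0
State : Install Pending
Release Type : Security Update
Install Time :

Package Identity : Package_for_Example_B~31bf3856ad364e35~amd64~~1.0.0.0
State : Installed
Release Type : Update
Install Time : 9/13/2017 2:13 AM
"""

if __name__ == "__main__":
    for identity in suspect_packages(parse_dism_packages(SAMPLE)):
        print(identity)
```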

We now use DISM to remove these updates. First create the C:\Temp folder with md c:\temp if it doesn’t exist yet!

dism /image:c:\ /remove-package /packagename:myproblematicpackagetoremove /scratchdir:c:\temp
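
Since the package identities are long and easy to mistype, you may prefer to generate one command line per package. This is a rough Python sketch that follows the command above; the function name remove_package_commands and its parameters are hypothetical, and the package name in the example is invented. It only produces text, you still run the commands yourself.

```python
# Sketch: turn a list of package identities into DISM removal command lines.
# Hypothetical helper; defaults mirror the command shown above.


def remove_package_commands(package_names, image="c:\\", scratch_dir=r"c:\temp"):
    """Return one 'dism /remove-package' command line per non-empty name."""
    commands = []
    for name in package_names:
        name = name.strip()
        if not name:
            continue  # ignore blank entries, e.g. from a pasted list
        commands.append(
            f"dism /image:{image} /remove-package "
            f"/packagename:{name} /scratchdir:{scratch_dir}"
        )
    return commands


if __name__ == "__main__":
    # Invented package identity, for illustration only.
    names = ["Package_for_Example_A~31bf3856ad364e35~amd64~~1.0.0.0"]
    for command in remove_package_commands(names):
        print(command)
```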

When done, close the command prompt, shut down the VM and then start it.

It will take a while, but it will succeed and you’ll be greeted by a logon screen. Good luck!

Important: Do not try any other repair options first, or removing the updates with DISM might fail. We chose to remove all 3 updates from last night to make sure. It might suffice to remove the delta one alone, but we wanted to have a VM back as it was last night so more testing can be done before the update is deployed again.

So, basically, don’t auto approve updates blindly, but test, validate & roll out in phases. Have great backups and TESTED restores. All in all we were only bitten in the lab, on a couple of test/dev VMs and on some of our infra VMs. Most of these are redundant and are patched in a staggered fashion, so our services were never badly affected. That gave us time to troubleshoot, investigate and warn our colleagues. As you can see here, the issue was a delta update that made it into WSUS and was installed together with the full CU. Just manually downloading the CU and testing it would not have given you the heads up about an issue. This is a reminder that you need to test your real-life situation and processes as realistically as possible. When you’re done with testing and cleaning up any fallout of this issue, make sure to patch your systems again!

Update: this also goes for Windows 10 Updates

Also see fellow MVP Mikael Nystrom’s blog post: https://deploymentbunny.com/2017/10/11/the-october-2017-update-inaccessible-boot-device/

Update: we now also have the official MSFT response & fix for each and every scenario right here https://support.microsoft.com/en-us/help/4049094/windows-devices-may-fail-to-boot-after-installing-october-10-version-o

The Zombie ISV®

The Zombie ISV® is the type that should have been extinct based on the current state of technology. Let me give you an idea of what that current state of technology means in our neck of the woods. Last week our team started deploying some DELL PowerEdge R720 servers to replace the last W2K8R2 Hyper-V cluster in the company with a Windows Server 2012 one. The older hardware will be recycled. Some will live on as test servers or backup media servers, all running Windows Server 2012 of course. One of them will become our physical (SAN) LUN to VHDX converter server so we can move our large LUNs (2TB-15TB) to vhdx. Later this year 10Gbps networking, RDMA Mellanox cards and ODX will provide for fast vhdx movement to their new virtual hosts. Work in progress, but it should give you an idea about what we’re working with.

It may surprise you, but even we have 2 Windows Server 2003 physical servers left. One is a DELL NX1950 Storage server that has been serving local workspace (12TB) to a team that does image parsing. That one is more than 6 years old and is slated for retirement. We don’t need this concept anymore. We can build anything we want for such purposes using Windows Server 2012 Storage Spaces and, if required, leverage the in-box iSCSI target. To build it we can just draw disk bays, disks and servers from the retired hardware shelf, no sweat. We have plenty of spare parts and it works just fine. If it’s a cost-efficient and effective solution, we roll that way.

The other one is a server for the financial software sold by a company (the Zombie ISV®) that does not believe in virtualization. It’s running code that’s over 12 years old (legacy Java runtimes, and even that was a success because it used to be JInitiator until a few years ago). There is no life cycle planning whatsoever, and when the hardware needed replacing after 5 years we got nothing but silence from the vendor. After months of asking for a meeting on the what and how (OS upgrade, x64, virtualization) and being ignored, we just took a decommissioned server that had two years of warranty left and transplanted the disks. Even if the warranty runs out on that one, we have some of the same model in the spare parts cabinet.

The workload itself runs just fine virtualized, but they don’t support that. Luckily for the people that have to do it in their environment, they run zero chance of that Zombie ISV® ever noticing that a server is virtualized anyway. They also don’t get the concept of a dedicated service account in Windows. So they end up with the database or BI services running under their remote support credentials, which expire and get disabled by the helpdesk. Sigh. They don’t see the need to proactively support operating systems above Windows XP or browsers after IE 6.0. We did a lot of hacks to keep that system working and came to despise the total lack of technological expertise and professionalism of the vendor. Their “consultants” don’t grasp x64, or they download installers for 4 hours during a paid day of consulting … sickening to the stomach. Meetings with the account managers (they seem to travel in packs) are a lot of vacant, blank stares and apathy. They don’t have answers, they don’t look for answers, they simply don’t care. The idea was to replace the package, but it was not to be. In the end we settled for handing off all responsibility for it, so they’ll find a place to host it and our bookkeepers can access it over a secure remote connection. At least we have gotten rid of this security risk in our environment.

That, people, is the miserable state of some ISVs in the 21st century. But it’s not just them. It’s a testimonial to the degree to which companies get tied up and locked in to mediocre solutions and technology debt. In the infrastructure world (storage, networking, servers, virtualization) people who know what they’re doing do not allow this to happen. As more and more decisions on software and applications are made by business & analyst types, we are seeing an increase in technology debt and a lack of any life cycle management. So where we have seen infrastructure deliver more and more bang for the buck, we’ve also seen software & services costs explode and, on top of that, incur technology debt, expenses and risks for the business. That’s pretty bad. I see a growing divide in a lot of companies between ever more efficient and cost-effective infrastructure (combined with cloud solutions) and the slowness of getting custom software into production, combined with issues concerning supportability and upgradeability. All this at ever increasing costs and FTEs. That’s not supposed to happen but it is, despite the high investments in * analysts, business consultants, architects, * coaches, project managers, IT managers etc. in the era of the cloud. This is regression. It all sounds like the result of the feel-good EQ approach to business without results, but hey, no one feels left behind. I believe a mate of mine calls this the race to the bottom. No wonder some companies that I know have done away with all this and just let business units organize themselves organically. They either fail and disappear or thrive and prosper, but at no time do they fall into the trap of an over-organized pseudo-flat structure (i.e. pass the hot potato and no responsibility) that still manages to create ever more managerial positions (flat?) whilst realizing ever fewer results. We’ve seen the financial and housing market charades collapse. Guess what’s next?
There won’t be a bail out for you or me, beware of that.