After our monthly OS patch and reboot, most of the Exchange services would not restart on one of our Exchange 2007 servers. The server in question is a special-purpose machine, holding the CAS, HUB, and Mailbox roles, but it’s for internal use only, and thus has no internet access. The event log showed only that the services did “not start in a timely fashion” and that each timed out after 30000 ms.
Since the services worked just fine prior to patching, our first thought was to remove the patches. They uninstalled with no problem, but the services still wouldn’t start. Rebooted again. Same problem, same errors in the event log.
Since this was affecting a production system, I opened a support incident with Microsoft, then set about researching the problem myself while waiting for the callback from an Exchange engineer. Since all I had to go on were the rather generic-sounding event log entries, I tried googling a few phrases. I quickly found several links that proved useful.
KB944752 – this article describes almost exactly what we were encountering, except that it concerns service failures that occur after installing a hotfix rollup, which we had not done.
This article describes a problem with services not starting in a secure Exchange environment. It also mentions post hotfix rollup installation.
This MS Exchange Team blog entry is very similar to the two above, and concerns service failures related to an inability to verify the certificate that Microsoft used to sign the code in the .NET Framework common runtime assembly.
The basic problem seems to be that whenever a module in the .NET assembly is updated, the first time the service runs it attempts to validate the code signature by checking the signing certificate against the certificate revocation list that Microsoft maintains at crl.microsoft.com. Since these secure servers can’t get to that site, the call times out and the services refuse to start.
Microsoft hadn’t called back in three hours, so I decided to try the suggestion from the articles, and it worked. I added the following line to the config file for each of the affected service executables:
<generatePublisherEvidence enabled="false"/>
Once entered, the services started without delay, and the problem was solved. For a detailed explanation of the fix, see the articles above, as they explain it better than I ever could.
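For reference, the element goes inside the runtime section of each service’s .config file. This is only a minimal sketch, assuming a config file with no other settings; the real Exchange config files contain more than this, so add the one line to the existing runtime section rather than replacing the file:

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <!-- Skip the Authenticode publisher check at startup, so the CLR
         does not try to reach crl.microsoft.com when the service loads. -->
    <generatePublisherEvidence enabled="false"/>
    <!-- Any existing runtime settings stay as they are. -->
  </runtime>
</configuration>
```

Make a backup copy of each config file before editing it, and restart the service afterward so it picks up the change.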
With that problem solved, I could get back to the whole reason I was on site that day: installing the latest hotfix rollup on the Exchange servers. But even that took extraordinarily long on that same server, because the installer seemed to hang while updating the .NET binaries. I did some more research and discovered a known problem with installing hotfix rollups on secure Exchange servers, and the cause was the same: the installer was attempting to verify the new modules against the certificate revocation list at Microsoft. The install will still work, but it takes a very long time since it tries to verify every module as it is being installed.
After 45 minutes, I started looking for ways to speed up the process, and once I found one, I was a bit embarrassed that I hadn’t thought of it myself. I opened the hosts file on the secure Exchange server and added an entry pointing crl.microsoft.com at 127.0.0.1. The installer won’t find the certificate revocation list there, but the check will fail immediately instead of waiting for the site lookup to time out. Within 5 minutes, the hotfix rollup installation completed with no errors, and everything is running smoothly now.
Oh, and by the time Microsoft finally called back, I had solved the original problem and was installing the hotfix rollup, so I just thanked him for calling and closed the case. The engineer congratulated me on solving the problem myself, which was nice of him. :-)