Learning “AWS Backup” Restorations for On-Prem VMWare VMs

CF Webtools has maintained VMWare ESXi guest OS instances, managed by vCenter, for about 7 years. They are a mix of Linux and Windows Server OSs and are maintained at a secure and redundant co-location data center. While an expensive up-front investment, it has paid for itself over those years, and we have a plan to continue that solution for about another 5 years. A recent upgrade to the next major version proved that virtual machines take a fraction of time for maintenance compared to bare metal instances. Granted, there’s some spin-up time when things work for so long, and you must remember, research, and troubleshoot procedures. Managed cloud takes almost all that time out of the equation, making it my favorite. Though I do miss hands-on hardware here and there.

Some of our on-prem VMs are critical, and some are not. The critical ones have always been backed up with different solutions, depending upon what they are and what the recovery needs look like. However, almost all have come with challenges. So I wanted to look for a VM snapshot-based cloud backup solution that I could trust and would be budget-friendly.

My first direction was to research Veeam. Their solution is very well known. However, it was a struggle to get the attention of Veeam and CDW as a small business without an existing account. I was able to lean on one of our hardware vendors, xByte, who hooked us up with one of their Veeam partners. But it was determined that it was fairly costly with a per-instance license model compared to our existing solutions. So I continued my search.

I then found AWS Backup has an on-prem VMWare solution. AWS Backup is relatively new to the backup game, but its implementations are continually growing. We currently use that service for all our AWS EC2 backups. That service was a “God send” after numerous awful implementations of custom Lambda/CloudWatch scripts and an EBS Automation method. Finally, a solution for what should have been around since the start of EC2.

As of November 2021, AWS Backup offers backup for on-prem VMWare vCenter servers. You must install their Storage Gateway virtual appliance as the “middleman” agent. I was hoping for an “agentless” solution; however, we only pay $0.05/GB-Mo warm storage and $0.01/GB-Mo Cold Storage. That’s a considerable saving, considering we do not have to pay for a license per instance, and there are no incoming bandwidth fees! We will have to pay bandwidth for on-prem restores, but considering that is very rarely done, and bandwidth is relatively cheap, it’s a non-issue. We’d have to pay for storage anyway, so there’s no change.

Another significant advantage is we get a single backup solution for both on-prem and AWS Cloud. It’s one less piece of software we must be familiar with, document, troubleshoot, and keep updated. Outside of an office domain controller, we also anticipate a complete cutover to AWS in 5 years.

Continue reading

#aws-backup, #backup, #vm, #vmware, #vsphere

New AWS Windows and DR Services

49137469 - a word cloud of disaster recovery related items

On January 7th, 2019, AWS released the AMI for Windows Server 2019. This comes a few months after the October 2nd, 2018 release of the new Microsoft OS.

Some highlights include smaller and more efficient Windows containers, support for Linux containers for application modernization and App Compatibility Feature on Demand. It comes under the standard AWS Windows pricing model.

On that note, Windows Server 2008 R2 (SP1) is at end-of-life in about one year, while Windows Server 2012 R2 End of Mainstream support was back in October of 2018. Don’t let technical debt put you into a predicament. CF Webtools operations team can assist you with upgrading your operating system to a more recent version, whether it be the weather tested Windows Server 2016 or the most modern version of Windows Server 2019.

CF Webtools uses CloudEndure to provide disaster recovery (DR) services to our on-premise and data center clients. The service is low impact and gives you the security your organization demands. We investigated multiple DR services and chose CloudEndure to be our primary solution.

Amazon has recently acquired CloudEndure. The company was backed by Dell, Infosys and others. They have been an AWS Advanced Technology Partner since 2016. More details have yet to surface.

If you are interested in Disaster Recovery Services for your organization, please contact us and we’d love to help you.

SpamAssassin Automatically Restarts

At CF Webtools we run SpamAssassin on our Windows email server via means of “SpamAssassin in a Box” by JAM Software. (We also use their TreeSize software which I love to use to track down disk space issues in Windows.) SpamAssassin is a local service that our SmarterMail server talks to, filtering out SPAM based on a scoring system.

SA-eventThis morning I was alerted to the service not responding via our alerting software Nagios. When I checked on the service I saw it was stopped. From there I went down the freak’n rabbit hole.

I opened the even log and saw “SpamAssassin daemon (spamd.exe) restart successful. Looking through the list I see it restarts every hour. I hate when this happens because that means that some other process is likely running on a scheduled basis and causing it to crash. I did note in the “Recovery” tab of the services window that the service is set to restart upon failure. So my thought is it’s crashing and then automatically restarting.

On to further investigation. I found a “SpamD.log” file in “C:\ProgramData\JAM Software\spamdService\sa-logs\”. Upon inspection around the time of the alleged crash I found this entry:

[4980] info: spamd: server killed by SIGINT, shutting down

Okay, so what is “SIGINT” and what is it doing to my service? Googling really didn’t help. Any suggestions were Linux based and didn’t seem to be relevant.

I then turn to “Service.config” under  “C:\ProgramData\JAM Software\spamdService\”. Here I find “<SpamdRestartIntervalInMinutes>”. Wait what?! Looking at the manual I find this bit “Specifies the interval for restarting ‘SpamD’ (in minutes).”. Well this is interesting. I do some further Googling and find a changelog note for version 1.0 back from 2010. It reads:

To consider possible changes in SpamAssassin configuration and to prevent potential memory leaks,SpamAssassin in a Box periodically restarts the daemon.

Okay, now this sounds like some hacks I’ve done in my younger years.

But at least I found my answer. It’s supposed to do that!

Why didn’t the service automatically restart? Just a one-off issue I suppose. I found this in the log:

SpamAssassin daemon (spamd.exe) was started, but didn’t respond to any signals from localhost.
Please restart the service. Should the problem persist, please contact support

Resolving VMWare Converter File I/O Error

Here at CF Webtools we’ve been shifting towards Virtual Machines to replace our dedicated “iron” as Mark Kruger likes to say. Let me say for starters that I’m very impressed with the Dell PowerEdge VRTX Shared Infrastructure Platform. There is so much back-end power in that thing we’ve been able to move our entire set of staging platforms onto one M620 Server Node along with a RAID 6 shared PERC disk array. It handles it like a champ and each virtual server is extremely fast. After about a year of testing and no real issues we’ve been able to move some of our production servers onto another server node. We’re looking forward to adding a couple more server nodes as well. Each node runs VMware ESXi HyperVisor 6.0 via RAID 1 SD cards.

In addition we’re also migrating some workstation VM’s away from Hyper-V and onto a separate Dell Server that we’ve reclaimed.

vcenterconverter61But here’s the real reason I’m writing today:

FAILED: A file I/O error occurred while accessing ”.

I get this error when using “WMware vcenter Converter Standalone 6.0.0” to convert any powered-on Windows machines onto one of the ESXi instances. I don’t get this issue when converting power-on Linux machines. Very odd and Google results of forums haven’t been very helpful. Mostly just a lot of run chkdsk and check for fully qualified domain resolutions.

I’m not going to cover Linux conversions here since they work. But basically what a powered-on Windows conversion does is it installs a helper VM on the machine to be converted. It’s run as a service and you have the option to manually uninstall when finished or let it automatically uninstall.

Something, probably this helper service, then takes a snapshot of the source system. Then the helper VM does a block-level clone for each volume it finds.

Mine always failed after the snapshot and before the clone.

What I did was used the “Export logs…” link in the converter. The line I found interesting, reading the file vmware-converter-server-1.log, was:

error vmware-converter-server[01288] [Originator@6876 sub=Ufa.HTTPService] Failed to read request; stream: <io_obj p:0x03dc40ac, h:-1, <pipe ‘\\.\pipe\vmware-converter-server-soap’>, <pipe ‘\\.\pipe\vmware-converter-server-soap’>>, error: class Vmacore::TimeoutException(Operation timed out)

After some Google searching it dawned on me that I am using two IP subnets. One for the general network and one for management. My machine runs 10.0.0.* (general) and 10.1.1.* (management) subnets. The source system has 10.0.0.* assigned to it while the destination ESXi server has 10.1.1.* assigned to it.

Because my system can communicate with both networks, everything could communicate just fine with both the source and destination machines.

However once things get rolling, the process moves from my machine to communicating between the source and destination. My machine merely monitors the progress. Which makes sense. Keep out the middle man and you have efficient network data transfer.

So the fix here was to bind a temporary management subnet address (10.1.1.*) to the source machine’s NIC. Now the helper VM is able to communicate with the destination server over that management subnet. Continue reading

#convert, #esxi, #file-io-error, #vmware, #windows

Avoid Windows Server Data Storage Copy Pitfalls Using a Subfolder

Our normal web server consists of a OS and Program File drive (C:) and a data drive to hold website files (E:). This provides an extra layer of security, speed and helpful structure. Sometimes we will also add another data drive (F:) for clients with really large storage needs. For example all user uploaded photos goes onto a 2TB drive array.

So let’s say you have user upload photos dedicated to one drive. You may want to just place the data onto the root of the drive. Simple right?

Well here’s what you may run into: When migrating/copying that drive to a new drive/machine using Robocopy you’ll find a few issues: (robocopy \\OLD-SERVER\UserPhotos F:\Data\UserPhotos /e /copy:DT /MT:8)

  1. If you’re putting the data into a subfolder this time, that root subfolder will become a system-hidden folder. The reason is you are copying the root of a drive. Pretty annoying.
    1. You can fix this by running this after the copy starts: “attrib -H -S F:\Data”
  2. It will try copy “System Volume Information” and “Recycle Bin”. But you’ll find out that your process will just get stuck because it doesn’t have permissions to do so.
    1. You can fix this by not copying any system or hidden files/folders:
      “robocopy \\OLD-SERVER\UserPhotos F:\Data\UserPhotos /e /copy:DT /MT:8 /xd $Recycle.bin “System Volume Information”” FYI: I tried using “/xa:HS” instead of the /xd, but that didn’t work as expected.
    2. If you’ve already gone 8 hours into your copy operation just to find this out, speed things up by syncing things instead using: “robocopy \\OLD-SERVER\UserPhotos F:\Data\UserPhotos /mir /copy:DT /MT:8 /xd $Recycle.bin “System Volume Information” /xo /fft”

So my point is, don’t put your data folder/file structure in the drive root. It’ll get mixed up with hidden-system files and folders and one day throw you for a loop. Instead put that all in a subfolder such as “F:\data”. Another example might be “E:\websites”.

Side-note: There are other copy methods to avoid this situation, however Robocopy is going to be one of your fastest options.

#folder, #server, #windows

IIS URL Multiple Specific Character Find/Replace

Today’s challenge at CF Webtools for myself was to find and replace any “_” (underscore) characters in a URL .htm file name and replace it with “-” (dash). The list I was given had file names with up to 7 underscores in any position. Example: my_file_name.htm

While I figured this would be a straight-forward task with IIS URL Rewrite, I was wrong.

In the end I found that I either had to create one rule for each possible underscore count or write a custom rewrite rule. I went the one rule per count route. I read in one blog you can only use up to 9 variables ({R:x}).

The other part of the rule was they had to be only in the “/articles/” directory.

My first challenge was just to get the right regular expression in place. What I found out was that the IIS (7.5) UI’s “Test Pattern” utility doesn’t accurately test. In the test this worked:

Input: http://www.test.com/articles/my_test.htm
Pattern: ^.*\/articles\/(.*)_(.*).htm$
Capture Groups: {R:1} : "my", {R:2} : "test"

However, this does not match in real-world testing. #1, don’t escape “/” (forward-slash) (really??). #2 the pattern is only matched against everything after the domain and first slash (http://www.test.com/).

So really, only this works:

Input: http://www.test.com/articles/my_test.htm
Pattern: ^articles/(.*)_(.*).htm$
Capture Groups: {R:1} : "my", {R:2} : "test"

In order to match against up to 8 underscores, you need 8 rules, each one looking for more underscores. So the next one would be:

Input: http://www.test.com/articles/my_test_file.htm
Pattern: ^articles/(.*)_(.*)_(.*).htm$
Capture Groups: {R:1} : "my", {R:2} : "test", {R:3} : "file"

To do this efficiently you just edit the web.config in the web root for that site. The end result ended up being:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <system.webServer>
        <rewrite>
            <rules>
                <rule name="AUSx1" stopProcessing="true">
                    <match url="^articles/(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}.htm" />
                </rule>
                <rule name="AUSx2" stopProcessing="true">
                    <match url="^articles/(.*)_(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}-{R:3}.htm" />
                </rule>
                <rule name="AUSx3" stopProcessing="true">
                    <match url="^articles/(.*)_(.*)_(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}-{R:3}-{R:4}.htm" />
                </rule>
                <rule name="AUSx4" stopProcessing="true">
                    <match url="^articles/(.*)_(.*)_(.*)_(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}-{R:3}-{R:4}-{R:5}.htm" />
                </rule>
                <rule name="AUSx5" stopProcessing="true">
                    <match url="^articles/(.*)_(.*)_(.*)_(.*)_(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}-{R:3}-{R:4}-{R:5}-{R:6}.htm" />
                </rule>
                <rule name="AUSx6" stopProcessing="true">
                    <match url="^articles/(.*)_(.*)_(.*)_(.*)_(.*)_(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}-{R:3}-{R:4}-{R:5}-{R:6}-{R:7}.htm" />
                </rule>
                <rule name="AUSx7" stopProcessing="true">
                    <match url="^articles/(.*)_(.*)_(.*)_(.*)_(.*)_(.*)_(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}-{R:3}-{R:4}-{R:5}-{R:6}-{R:7}-{R:8}.htm" />
                </rule>
                <rule name="AUSx8" stopProcessing="true">
                    <match url="^articles/(.*)_(.*)_(.*)_(.*)_(.*)_(.*)_(.*)_(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}-{R:3}-{R:4}-{R:5}-{R:6}-{R:7}-{R:8}-{R:9}.htm" />
                </rule>
            </rules>
        </rewrite>
    </system.webServer>
</configuration>

In the end this URL:

http://www.domain.com/articles/my_file_foo_bar.htm

becomes:

http://www.domain.com/articles/my-file-foo-bar.htm

#iis, #replace, #url, #url-rewrite

Manual Windows 2008 Registry Restore

After a Windows Update the lovely “Blue Screen of Death” appeared on one of our servers. Frantic to find a solution, “Boot to the last known working configuration” wasn’t working. A system restore was a last resort option.

Here’s what the error consisted of:

STOP: c0000218 {Registry File Failure}
The registry cannot load the hive (file):
\Systemroot\System32\Config\SOFTWARE
or its log or alternate.
It is corrupt, absent, or not writable.

To resolve the issue I:

  1. Boot to the Windows 2008 Server Install DVD
  2. Click “Repair Computer” on the second screen
  3. Open a command prompt on the second or third prompt
  4. Change directory to C:\Windows\System32\Config\
  5. Rename “SOFTWARE” to “SOFTWARE.BAK”
  6. Copy “RegBack\SOFTWARE” to that directory
  7. Reboot

This restored the SOFTWARE registry to its previous state before the Windows Update. I then had a pending list of Windows Updates to install again. But I’ll leave that for another day for now to see if anyone else is having issues.

#blue-screen-of-death, #microsoft, #registry, #update, #windows, #windows-2008-server