Is it Cheaper to run Windows Servers on Azure than AWS?

Spoiler alert: the answer is yes, and by quite a bit. But with reservations; you have to look at the whole picture.

I took an example linked AWS account that had two Windows EC2 compute instances currently running. Both are web servers, one for production and one for staging. The following monthly cost comparisons use both on-demand pricing and savings plans with 1-year commitments and no upfront payment, as that is typical for me.

AWS Instance       On Demand   Savings Plan   |   Azure Instance   On Demand   Savings Plan
t3a.2xlarge        $327        $266           |   B8ms             $247        $171
m7i-flex.2xlarge   $535        $461           |   D8d v5           $330        $226

As you can see, I could potentially save up to $330 per month on just these two instances by moving from AWS to Azure: ($266 - $171) + ($461 - $226) = $330. That $3,960/yr is a decent vacation or maybe a paycheck. So why not switch cloud providers? Well, there is the cost of learning a new infrastructure and dealing with all the nuances that come with it. But in this specific case, the real blocker is managed database services.

The following compares single-AZ Microsoft SQL Server Web Edition on Amazon RDS with provisioned vCore Azure SQL Database. Note that Azure SQL Database pricing is edition agnostic. There is also Azure SQL Managed Instance, but that's even more expensive. We won't compare AWS reserved instances to Azure reserved capacity because we may never buy a reserved RDS instance on AWS for this account. Once you buy it, you are very locked in: no instance type or size exchanges, and no selling it on a marketplace. But as you can see, even reserved capacity on Azure is more expensive than AWS on demand.

AWS Instance       On Demand   |   Azure Instance   On Demand   Reserved Capacity
db.m6i.xlarge      $430        |   D4s v5           $737        $581
db.m6i.2xlarge     $890        |   D8s v5           $1,105      $1,162

Find that odd? You actually get better value on Azure when comparing against Standard or Enterprise edition. However, I deal primarily with Web and Express editions; the majority of applications I handle don't require the functionality or redundancy built into Standard and Enterprise. If you do require Standard or Enterprise, I would strongly suggest looking at Azure to save quite a bit of money, or moving to Linux and an alternative database technology such as Aurora on AWS.

When we factor in the managed database, the value of moving from AWS to Azure disappears, as the compute savings are lost. Yes, there are options to spin up the database on a VM and manage it ourselves, but then you are adding monthly labor costs and additional license costs for backups, not to mention losing the performance and management gains RDS or Azure SQL Database bring.

In conclusion, hosting Windows servers on Azure instead of AWS can save you money, but you need to factor in all the services, even beyond the database, including system administration labor and third-party utility integrations. Why can Azure offer potentially significant savings on Windows servers? Because Microsoft owns both the infrastructure and the licenses, and it does not extend the same license discounts to other cloud providers. For example, you used to be able to bring your own SQL Server license to AWS RDS, but that was discontinued some time ago.

Note: Turn off the “Azure Hybrid Benefit” slider when viewing Azure pricing. This option requires bringing your own license and does not facilitate an accurate pricing comparison.

If you are interested, Microsoft has an official AWS-to-Azure compute mapping guide; however, I used Microsoft Copilot to help me find the equivalent instance types and sizes. There are also web utilities you can search for that map them.

#aws, #azure, #comparison, #compute, #cost

Always Encrypt AWS EC2 Instances

When running a business with the goal of becoming successful, you will inevitably fill out a risk assessment questionnaire that asks, "Is all your data encrypted at rest?". And when you start looking at your list of EBS volumes created over the years, you might be surprised to learn that the "encrypted" property of your volumes very well might equal "false". NVMe instance store volumes are encrypted, but many EC2 solutions rely on EBS volumes, and those are not encrypted by default.
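
If you want to check your own account, here is a minimal AWS CLI sketch (current region only) that lists any volumes whose encrypted property is false:

#list any EBS volumes in the current region that are not encrypted
aws ec2 describe-volumes --filters Name=encrypted,Values=false --query "Volumes[].VolumeId" --output table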

Encryption at rest for Amazon Elastic Block Store (EBS) volumes protects data when it’s not in use. It ensures that data stored on the physical storage media in an AWS data center is unreadable to anyone without the appropriate decryption key. Many argue this is a fundamental component of a strong security strategy, and they are not wrong.

AWS leverages its Key Management Service (KMS) to manage the encryption keys. You can use the default AWS-managed keys or create and manage your own customer-managed keys (CMKs) for more control over key policies, access permissions, and rotation schedules.
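
For those opting for a customer-managed key, a minimal sketch with the AWS CLI looks like the following; the alias name is just an example, and the key ID placeholder comes from the create-key output:

#create a symmetric customer-managed key for EBS encryption
aws kms create-key --description "EBS encryption key"
#give the key a friendly alias (substitute the KeyId returned by create-key)
aws kms create-alias --alias-name alias/ebs-cmk --target-key-id <key-id-from-create-key>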

Does encryption at rest really matter while the data sits in an AWS data center, specifically? Unless you are dealing with classified information, probably not. AWS has an impressive level of security separating your data from bad actors, both physically and virtually. The chances of someone walking off with your storage medium (hard drive) are slim to none, and when a medium goes bad, the destruction and audit process is equally impressive. But at the end of the day, it matters to someone's boss, their investors, policies, perception, and potentially your career. Many industry regulations and security standards, such as HIPAA, PCI DSS, and GDPR, require that sensitive data be encrypted at rest.

Security is like an onion with many layers. While AWS has robust physical and logical security controls in place, encryption at rest adds another vital layer of protection. This “defense-in-depth” approach ensures that even if one security control fails, others are there to prevent a breach.

Encryption isn’t limited to just the EBS volume itself. Per AWS documentation, when you encrypt an EBS volume, any snapshots you create from it and any subsequent volumes created from those snapshots will also be automatically encrypted. This provides an end-to-end security envelope for your data’s entire lifecycle.

So why not do it? The list of reasons is small, but you need to be aware of them, especially when dealing with AMIs and cross-region copies:

  • Very minimal increase in latency (typically negligible)
  • If a key becomes unusable (disabled or deleted), you cannot access the encrypted volume data until the key is restored
  • Additional key management overhead may be necessary if you opt to manage your own keys
  • AMIs with encrypted snapshots cannot be shared publicly – only with specific AWS accounts
  • Copying an encrypted snapshot to another region requires re-encrypting it with a KMS key in the destination region (see the sketch after this list)
  • Minimal additional cost for AWS KMS key usage
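
For reference, the cross-region copy mentioned above looks roughly like this with the AWS CLI; the snapshot ID, regions, and key alias are placeholders:

#copy an encrypted snapshot from us-east-1 to us-west-2,
#re-encrypting it with a key that lives in the destination region
aws ec2 copy-snapshot --region us-west-2 --source-region us-east-1 --source-snapshot-id snap-0123456789abcdef0 --encrypted --kms-key-id alias/ebs-cmk-west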

When creating an EC2 instance, the default view does not encrypt the volume. You will need to click the "Advanced" link in the "Configure storage" section first.

Then, expand your volume details and change the "Encrypted" property from "Not encrypted" to "Encrypted". Optionally, select your KMS key, or leave it alone and it will use the default. While you are here, you may also wish to change "Delete on termination" from "Yes" to "No". Keeping the volume after termination helps prevent accidental data loss in edge cases, but be aware it can leave orphaned EBS volumes behind if you don't clean them up when you delete EC2 instances.
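
The same settings can be supplied at launch time from the AWS CLI. This is just a sketch; the AMI ID, key pair name, and key alias below are placeholders for your own values:

aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type t3.medium \
    --key-name my-keypair \
    --block-device-mappings '[{
        "DeviceName": "/dev/xvda",
        "Ebs": {
            "VolumeSize": 30,
            "Encrypted": true,
            "KmsKeyId": "alias/ebs-cmk",
            "DeleteOnTermination": false
        }
    }]'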

If you forget to turn this on, or you have an existing instance, you can still convert the volume to an encrypted one. It is a bit of a process, though: stop the instance, take a snapshot of the volume, copy the snapshot with encryption enabled, detach the original volume from the instance (jot down the device name, such as /dev/xvda), create a volume from the encrypted snapshot copy, and attach that volume to the instance using the same device mapping.
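
Sketched out with the AWS CLI, the conversion looks roughly like this; all IDs, the availability zone, and the key alias are placeholders, and each step should finish before the next one starts:

#stop the instance so the volume is consistent
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
#snapshot the existing unencrypted volume
aws ec2 create-snapshot --volume-id vol-0aaaaaaaaaaaaaaaa --description "pre-encryption copy"
#copy that snapshot with encryption turned on
aws ec2 copy-snapshot --source-region us-east-1 --source-snapshot-id snap-0bbbbbbbbbbbbbbbb --encrypted --kms-key-id alias/ebs-cmk
#create a new volume from the encrypted snapshot, in the instance's AZ
aws ec2 create-volume --snapshot-id snap-0ccccccccccccccc --availability-zone us-east-1a
#swap the volumes, reusing the original device name
aws ec2 detach-volume --volume-id vol-0aaaaaaaaaaaaaaaa
aws ec2 attach-volume --volume-id vol-0ddddddddddddddd --instance-id i-0123456789abcdef0 --device /dev/xvda
#start the instance back up
aws ec2 start-instances --instance-ids i-0123456789abcdef0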

#aws, #ebs, #ec2, #encryption

Installing Docker w/ Compose on Amazon Linux 2023

Installing Docker with the Compose v2 plugin on Amazon Linux 2023 does not follow the recommended path outlined on the Docker website. The docs suggest adding either the CentOS or Fedora repo and then running sudo dnf install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin ("ce" stands for Community Edition and "ee" for Enterprise Edition), stating that sudo dnf install docker is the obsolete way of doing it. Unfortunately, that does not work on Amazon Linux 2023. Another thing to note is that Amazon's repo ships a more stable (older) version of Docker rather than the most current one. While this is typically a good thing, you may be missing out on a security patch you need.

Here is a script that I use for reference:

#ensure all the installed packages on our system are up to date
sudo dnf update
#install Docker from the default Amazon Linux repository
sudo dnf install docker
#start Docker
sudo systemctl start docker
#start automatically with system boot
sudo systemctl enable docker
#confirm the service is running
sudo systemctl status docker
#add our current user to the Docker group
sudo usermod -aG docker $USER
#apply the changes we have done to Docker Group
newgrp docker
#create the plugin directory for all users
sudo mkdir -p /usr/local/lib/docker/cli-plugins
#download the current version of the docker compose plugin
sudo curl -SL "https://github.com/docker/compose/releases/latest/download/docker-compose-linux-$(uname -m)" -o /usr/local/lib/docker/cli-plugins/docker-compose
#make the plugin executable
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
# restart docker service
sudo systemctl restart docker
#verify docker compose works
docker compose version
Note: Because the docker compose plugin was manually installed, it will not be updated by future yum/dnf updates. You will need to repeat the download and chmod steps to update it.
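
If you want to sanity-check the install end to end, a throwaway Compose file works well. This is just a sketch; the nginx image and port mapping are arbitrary example choices:

#write a minimal compose file
cat > docker-compose.yml <<'EOF'
services:
  web:
    image: nginx:alpine
    ports:
      - "8080:80"
EOF
#bring the stack up in the background, check it, then tear it down
docker compose up -d
docker compose ps
docker compose down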

If you have a better way of doing this, I’d love to hear your feedback!

#docker, #docker-compose

Learning “AWS Backup” Restorations for On-Prem VMWare VMs

CF Webtools has maintained VMware ESXi guest OS instances, managed by vCenter, for about 7 years. They are a mix of Linux and Windows Server OSs and are hosted at a secure and redundant co-location data center. While an expensive up-front investment, it has paid for itself over those years, and we have a plan to continue that solution for about another 5 years. A recent upgrade to the next major version proved that virtual machines take a fraction of the maintenance time that bare metal instances do. Granted, when things run untouched for that long there is some spin-up time to remember, research, and troubleshoot procedures. Managed cloud takes almost all of that time out of the equation, which makes it my favorite, though I do miss hands-on hardware here and there.

Some of our on-prem VMs are critical, and some are not. The critical ones have always been backed up with different solutions, depending upon what they are and what the recovery needs look like. However, almost all have come with challenges. So I wanted to look for a VM snapshot-based cloud backup solution that I could trust and would be budget-friendly.

My first direction was to research Veeam. Their solution is very well known. However, it was a struggle to get the attention of Veeam and CDW as a small business without an existing account. I was able to lean on one of our hardware vendors, xByte, who connected us with one of their Veeam partners. But the per-instance license model turned out to be fairly costly compared to our existing solutions, so I continued my search.

I then found that AWS Backup has an on-prem VMware solution. AWS Backup is relatively new to the backup game, but its capabilities are continually growing. We already use the service for all our AWS EC2 backups; it was a godsend after numerous awful implementations of custom Lambda/CloudWatch scripts and an EBS Automation method. Finally, a solution for what should have existed since the start of EC2.

As of November 2021, AWS Backup supports backing up on-prem VMware vCenter servers. You must install their Storage Gateway virtual appliance as the "middleman" agent. I was hoping for an agentless solution; however, we only pay $0.05/GB-month for warm storage and $0.01/GB-month for cold storage. That's a considerable saving, given we do not have to pay a per-instance license and there are no incoming bandwidth fees! We will have to pay bandwidth for on-prem restores, but considering that is very rarely done, and bandwidth is relatively cheap, it's a non-issue. We'd have to pay for storage anyway, so there's no change there.

Another significant advantage is we get a single backup solution for both on-prem and AWS Cloud. It’s one less piece of software we must be familiar with, document, troubleshoot, and keep updated. Outside of an office domain controller, we also anticipate a complete cutover to AWS in 5 years.


#aws-backup, #backup, #vm, #vmware, #vsphere

New AWS Windows and DR Services


On January 7th, 2019, AWS released the AMI for Windows Server 2019. This comes a few months after the October 2nd, 2018 release of the new Microsoft OS.

Some highlights include smaller and more efficient Windows containers, support for Linux containers for application modernization, and the App Compatibility Feature on Demand. It comes under the standard AWS Windows pricing model.

On that note, Windows Server 2008 R2 (SP1) reaches end of life in about one year, while mainstream support for Windows Server 2012 R2 ended back in October 2018. Don't let technical debt put you in a predicament. The CF Webtools operations team can assist you with upgrading your operating system to a more recent version, whether it be the well-tested Windows Server 2016 or the brand-new Windows Server 2019.

CF Webtools uses CloudEndure to provide disaster recovery (DR) services to our on-premises and data center clients. The service is low impact and gives you the security your organization demands. We investigated multiple DR services and chose CloudEndure as our primary solution.

Amazon has recently acquired CloudEndure. The company was backed by Dell, Infosys, and others, and has been an AWS Advanced Technology Partner since 2016. More details have yet to surface.

If you are interested in Disaster Recovery Services for your organization, please contact us and we’d love to help you.

SpamAssassin Automatically Restarts

At CF Webtools we run SpamAssassin on our Windows email server using "SpamAssassin in a Box" by JAM Software. (We also use their TreeSize software, which I love for tracking down disk space issues on Windows.) SpamAssassin is a local service that our SmarterMail server talks to, filtering out spam based on a scoring system.

This morning I was alerted to the service not responding by our alerting software, Nagios. When I checked on the service, I saw it was stopped. From there I went down the freak'n rabbit hole.

I opened the event log and saw "SpamAssassin daemon (spamd.exe) restart successful." Looking through the list, I see it restarts every hour. I hate when this happens, because it usually means some other process is running on a schedule and causing it to crash. I did note in the "Recovery" tab of the Services window that the service is set to restart upon failure, so my thought is it's crashing and then automatically restarting.

On to further investigation. I found a "SpamD.log" file in "C:\ProgramData\JAM Software\spamdService\sa-logs\". Inspecting it around the time of the alleged crash, I found this entry:

[4980] info: spamd: server killed by SIGINT, shutting down

Okay, so what is "SIGINT" and what is it doing to my service? SIGINT is the interrupt signal, the Unix equivalent of pressing Ctrl+C. Googling really didn't help; the suggestions were all Linux-based and didn't seem relevant.

I then turned to "Service.config" under "C:\ProgramData\JAM Software\spamdService\". There I found "<SpamdRestartIntervalInMinutes>". Wait, what?! Looking at the manual, I found this bit: "Specifies the interval for restarting 'SpamD' (in minutes)." Well, this is interesting. Some further Googling turned up a changelog note for version 1.0 back in 2010. It reads:

To consider possible changes in SpamAssassin configuration and to prevent potential memory leaks, SpamAssassin in a Box periodically restarts the daemon.

Okay, now this sounds like some hacks I’ve done in my younger years.

But at least I found my answer. It’s supposed to do that!

Why didn't the service automatically restart this time? Just a one-off issue, I suppose. I found this in the log:

SpamAssassin daemon (spamd.exe) was started, but didn’t respond to any signals from localhost.
Please restart the service. Should the problem persist, please contact support

Resolving VMWare Converter File I/O Error

Here at CF Webtools we've been shifting towards virtual machines to replace our dedicated "iron," as Mark Kruger likes to say. Let me say for starters that I'm very impressed with the Dell PowerEdge VRTX Shared Infrastructure Platform. There is so much back-end power in that thing that we've been able to move our entire set of staging platforms onto one M620 server node, along with a RAID 6 shared PERC disk array. It handles it like a champ, and each virtual server is extremely fast. After about a year of testing and no real issues, we've been able to move some of our production servers onto another server node, and we're looking forward to adding a couple more. Each node runs VMware ESXi Hypervisor 6.0 from RAID 1 SD cards.

In addition, we're also migrating some workstation VMs away from Hyper-V and onto a separate Dell server that we've reclaimed.

But here's the real reason I'm writing today:

FAILED: A file I/O error occurred while accessing ”.

I get this error when using "VMware vCenter Converter Standalone 6.0.0" to convert any powered-on Windows machines onto one of the ESXi instances. I don't get this issue when converting powered-on Linux machines. Very odd, and the Google and forum results haven't been very helpful; mostly just "run chkdsk" and "check for fully qualified domain name resolution."

I'm not going to cover Linux conversions here, since they work. But basically, what a powered-on Windows conversion does is install a helper agent on the machine to be converted. It runs as a service, and you have the option to manually uninstall it when finished or let it uninstall automatically.

Something, probably this helper service, then takes a snapshot of the source system and performs a block-level clone of each volume it finds.

Mine always failed after the snapshot and before the clone.

What I did was use the "Export logs…" link in the converter. The line I found interesting, in the file vmware-converter-server-1.log, was:

error vmware-converter-server[01288] [Originator@6876 sub=Ufa.HTTPService] Failed to read request; stream: <io_obj p:0x03dc40ac, h:-1, <pipe ‘\\.\pipe\vmware-converter-server-soap’>, <pipe ‘\\.\pipe\vmware-converter-server-soap’>>, error: class Vmacore::TimeoutException(Operation timed out)

After some Google searching, it dawned on me that I am using two IP subnets: one for the general network and one for management. My machine is on both the 10.0.0.* (general) and 10.1.1.* (management) subnets. The source system has a 10.0.0.* address, while the destination ESXi server has a 10.1.1.* address.

Because my system can communicate with both networks, everything could communicate just fine with both the source and destination machines.

However, once things get rolling, the process moves from my machine to direct communication between the source and destination; my machine merely monitors the progress. Which makes sense: cut out the middleman and you get efficient network data transfer.

So the fix here was to bind a temporary management subnet address (10.1.1.*) to the source machine's NIC. Now the helper agent is able to communicate with the destination server over that management subnet.
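
For reference, adding (and later removing) a temporary secondary address on the source machine's NIC can be done from an elevated command prompt; the interface name and IP below are placeholders for your environment:

rem find the exact name of the NIC
netsh interface ipv4 show interfaces
rem bind a temporary address on the management subnet
netsh interface ipv4 add address name="Ethernet" address=10.1.1.50 mask=255.255.255.0
rem remove the temporary address once the conversion completes
netsh interface ipv4 delete address name="Ethernet" address=10.1.1.50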

#convert, #esxi, #file-io-error, #vmware, #windows

Avoid Windows Server Data Storage Copy Pitfalls Using a Subfolder

Our typical web server consists of an OS and Program Files drive (C:) and a data drive to hold website files (E:). This provides an extra layer of security, speed, and helpful structure. Sometimes we will also add another data drive (F:) for clients with really large storage needs; for example, all user-uploaded photos go onto a 2 TB drive array.

So let's say you have user-uploaded photos dedicated to one drive. You may be tempted to just place the data at the root of the drive. Simple, right?

Well, here's what you may run into when migrating/copying that drive to a new drive/machine using Robocopy (robocopy \\OLD-SERVER\UserPhotos F:\Data\UserPhotos /e /copy:DT /MT:8):

  1. If you're putting the data into a subfolder this time, that destination subfolder will become a hidden system folder, because you are copying from the root of a drive. Pretty annoying.
    1. You can fix this by running this after the copy starts: “attrib -H -S F:\Data”
  2. It will try to copy "System Volume Information" and the Recycle Bin, and you'll find that your process just gets stuck because it doesn't have permission to do so.
    1. You can fix this by excluding those system/hidden folders from the copy:
      robocopy \\OLD-SERVER\UserPhotos F:\Data\UserPhotos /e /copy:DT /MT:8 /xd $Recycle.bin "System Volume Information" (FYI: I tried using "/xa:HS" instead of the /xd, but that didn't work as expected.)
    2. If you've already gone 8 hours into your copy operation just to find this out, speed things up by syncing instead: robocopy \\OLD-SERVER\UserPhotos F:\Data\UserPhotos /mir /copy:DT /MT:8 /xd $Recycle.bin "System Volume Information" /xo /fft

So my point is: don't put your data folder/file structure in the drive root. It'll get mixed up with hidden system files and folders and one day throw you for a loop. Instead, put it all in a subfolder such as "F:\Data". Another example might be "E:\websites".

Side note: there are other copy methods that avoid this situation; however, Robocopy is going to be one of your fastest options.

#folder, #server, #windows

IIS URL Multiple Specific Character Find/Replace

Today's challenge for me at CF Webtools was to find any "_" (underscore) characters in a URL's .htm file name and replace them with "-" (dash). The list I was given had file names with up to 7 underscores in any position. Example: my_file_name.htm

While I figured this would be a straightforward task with IIS URL Rewrite, I was wrong.

In the end, I found that I either had to create one rule for each possible underscore count or write something custom. I went the one-rule-per-count route. I read in one blog that you can only use up to 9 back-references ({R:x}).

The other part of the rule was that it had to apply only to the "/articles/" directory.

My first challenge was just to get the right regular expression in place. What I found out was that the IIS (7.5) UI’s “Test Pattern” utility doesn’t accurately test. In the test this worked:

Input: http://www.test.com/articles/my_test.htm
Pattern: ^.*\/articles\/(.*)_(.*).htm$
Capture Groups: {R:1} : "my", {R:2} : "test"

However, this does not match in real-world testing. #1: don't escape "/" (forward slash) (really??). #2: the pattern is only matched against everything after the domain and first slash (http://www.test.com/).

So really, only this works:

Input: http://www.test.com/articles/my_test.htm
Pattern: ^articles/(.*)_(.*).htm$
Capture Groups: {R:1} : "my", {R:2} : "test"

In order to match against up to 8 underscores, you need 8 rules, each one looking for more underscores. So the next one would be:

Input: http://www.test.com/articles/my_test_file.htm
Pattern: ^articles/(.*)_(.*)_(.*).htm$
Capture Groups: {R:1} : "my", {R:2} : "test", {R:3} : "file"

To do this efficiently, you just edit the web.config in the web root for that site. One detail that matters: rules are evaluated top-down and the first match wins (stopProcessing="true"), and because the patterns are greedy, the single-underscore rule would match every article URL, so list the rules with the most underscores first. The end result ended up being:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <system.webServer>
        <rewrite>
            <rules>
                <rule name="AUSx8" stopProcessing="true">
                    <match url="^articles/(.*)_(.*)_(.*)_(.*)_(.*)_(.*)_(.*)_(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}-{R:3}-{R:4}-{R:5}-{R:6}-{R:7}-{R:8}-{R:9}.htm" />
                </rule>
                <rule name="AUSx7" stopProcessing="true">
                    <match url="^articles/(.*)_(.*)_(.*)_(.*)_(.*)_(.*)_(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}-{R:3}-{R:4}-{R:5}-{R:6}-{R:7}-{R:8}.htm" />
                </rule>
                <rule name="AUSx6" stopProcessing="true">
                    <match url="^articles/(.*)_(.*)_(.*)_(.*)_(.*)_(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}-{R:3}-{R:4}-{R:5}-{R:6}-{R:7}.htm" />
                </rule>
                <rule name="AUSx5" stopProcessing="true">
                    <match url="^articles/(.*)_(.*)_(.*)_(.*)_(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}-{R:3}-{R:4}-{R:5}-{R:6}.htm" />
                </rule>
                <rule name="AUSx4" stopProcessing="true">
                    <match url="^articles/(.*)_(.*)_(.*)_(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}-{R:3}-{R:4}-{R:5}.htm" />
                </rule>
                <rule name="AUSx3" stopProcessing="true">
                    <match url="^articles/(.*)_(.*)_(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}-{R:3}-{R:4}.htm" />
                </rule>
                <rule name="AUSx2" stopProcessing="true">
                    <match url="^articles/(.*)_(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}-{R:3}.htm" />
                </rule>
                <rule name="AUSx1" stopProcessing="true">
                    <match url="^articles/(.*)_(.*).htm$" />
                    <action type="Redirect" url="articles/{R:1}-{R:2}.htm" />
                </rule>
            </rules>
        </rewrite>
    </system.webServer>
</configuration>

In the end this URL:

http://www.domain.com/articles/my_file_foo_bar.htm

becomes:

http://www.domain.com/articles/my-file-foo-bar.htm

#iis, #replace, #url, #url-rewrite

Manual Windows 2008 Registry Restore

After a Windows Update, the lovely "Blue Screen of Death" appeared on one of our servers. Frantic to find a solution, I found that booting to the "Last known good configuration" wasn't working, and a system restore was a last-resort option.

Here’s what the error consisted of:

STOP: c0000218 {Registry File Failure}
The registry cannot load the hive (file):
\Systemroot\System32\Config\SOFTWARE
or its log or alternate.
It is corrupt, absent, or not writable.

To resolve the issue, I did the following (the command-prompt steps are sketched after the list):

  1. Boot to the Windows 2008 Server Install DVD
  2. Click “Repair Computer” on the second screen
  3. Open a command prompt (it is offered on the second or third screen)
  4. Change directory to C:\Windows\System32\Config\
  5. Rename “SOFTWARE” to “SOFTWARE.BAK”
  6. Copy “RegBack\SOFTWARE” to that directory
  7. Reboot
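
From that recovery command prompt, steps 4 through 6 look roughly like this; note that the Windows drive may be mapped to a different letter inside the recovery environment:

rem change to the registry hive directory (drive letter may differ)
cd /d C:\Windows\System32\Config
rem keep the corrupt hive around, just in case
ren SOFTWARE SOFTWARE.BAK
rem restore the automatically backed-up copy of the hive
copy RegBack\SOFTWARE SOFTWARE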

This restored the SOFTWARE hive to the state it was in before the Windows Update. I then had a list of pending Windows Updates to install again, but I'll leave that for another day to see if anyone else is having issues.

#blue-screen-of-death, #microsoft, #registry, #update, #windows, #windows-2008-server