Lessons learned from migrating 42 servers to AWS

Earlier this year I worked on a project where we had to migrate 42 servers from a data center to AWS as part of an AWS Migration Acceleration Program (MAP) deal. There was some time pressure, since the contract with the data center would end about two months later and the customer didn’t want to renew or extend it.

During this project I learned some things I think are valuable and would like to share. This might save you some time and/or frustration when you’re working on a similar project.

Disclaimer: this is not meant to be a full migration guide, but rather things I encountered and want to share. Take from this what you can use, and create a migration plan for your specific situation.

The tools used

For this project, the following tools were used:

AWS Migration Hub
AWS Application Migration Service
Terraform (or any IaC tool of your choice)
A scripting language of your choice (I used PowerShell)
Bash scripting (the servers were running Linux)

Step one: Inventory

To know exactly what you’re dealing with, it’s important to take an inventory of the environment you’ll be migrating, and of everything it uses or is used by. Having a clear overview makes it easier to spot potential issues, plan ahead, and set up a realistic time frame for the migration.

Network / subnets

Gather all information about the networks and subnets the servers currently use. Do the servers use static or dynamic IP addresses? Are there any VPN connections? Are there allow-listings (both inbound and outbound) to take into account? Which public IP addresses are in use, if any? The customer might even have their own public IP range which they want to (partially) move to AWS.
Firewall rules

Try to get a complete overview of all firewall rules that are in place. This helps in determining which servers should be reachable from where and which destinations they should be able to reach. It might also help you spot issues with the current setup, so you can mitigate those in the new setup.
Traffic flows between servers

The firewall rules can help here, but traffic between private subnets is often unrestricted, so they rarely tell the whole story. Knowing the traffic flows between servers will help you set up more restrictive (and safer) security groups within AWS.
DNS domains

Does the customer want to move DNS domains to AWS? Are there DNS records that point to the servers and need to be changed? If changes have to be made to DNS records and/or domains, plan them ahead so that during the actual migration you don’t have to wait hours or days for a change to propagate throughout the internet.
Certificates

Do the servers or any applications running on them use certificates, and how are those managed? Knowing this helps determine how to expose an application to the internet: can you use an Application Load Balancer, or is the certificate managed through an automated system that runs on the server itself? In the latter case, you either have to use a Network Load Balancer, connect the server ‘directly’ to the internet (bad practice!), or change how the certificate is managed, which might have a big impact on the work that needs to be done.
Backup RTO and RPO

For setting up the new environment, it’s important to know which recovery point objective (RPO) and recovery time objective (RTO) the customer requires for each service, application, or server. Setting up AWS Backup in advance makes it easier to enable it during or just after the migration.
Software license requirements

Some software vendors bind their licenses to the MAC address of a server. If that’s the case, some additional actions need to be taken to ensure you don’t have to keep changing your license registration with such vendors.
OS versions being used and patch level

Before you start preparing for the migration, it’s wise to know which OSes are being used, and in which versions. When you encounter older OS versions, installing the necessary agents might involve more work, or might even be impossible. The current patch level of the OSes and the way patching is managed are also important input for the new environment in AWS.
Software used

Specific products and/or versions, such as commercial databases and SAP, are eligible for additional discounts in AWS MAP.
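
As a rough illustration of how the traffic-flow part of the inventory can be condensed into something usable for security groups later on, here is a minimal Python sketch. It assumes the observed connections have already been exported as (source, destination, protocol, port) records; the `Flow` record format, the server names, and the `summarise_flows` helper are all hypothetical and not part of any AWS tool.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """One observed connection between two servers (hypothetical record format)."""
    source: str
    destination: str
    protocol: str
    port: int


def summarise_flows(flows: Iterable[Flow]) -> dict:
    """Group observed flows by destination, then by (protocol, port).

    The result lists, per destination server, which sources need inbound
    access on which port. Duplicate observations collapse into one entry.
    """
    summary = defaultdict(lambda: defaultdict(set))
    for flow in flows:
        summary[flow.destination][(flow.protocol, flow.port)].add(flow.source)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {dest: dict(rules) for dest, rules in summary.items()}


if __name__ == "__main__":
    # Made-up example data, not taken from the actual migration.
    observed = [
        Flow("web01", "db01", "tcp", 5432),
        Flow("web02", "db01", "tcp", 5432),
        Flow("web01", "db01", "tcp", 5432),  # duplicate observation
        Flow("bastion", "web01", "tcp", 22),
    ]
    for destination, rules in summarise_flows(observed).items():
        for (protocol, port), sources in rules.items():
            print(f"{destination}: allow {protocol}/{port} from {sorted(sources)}")
```

Each line of output is a candidate inbound rule. Treat it as a starting point for discussion with the customer rather than a final rule set, since flows that did not occur during the observation window will be missing.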

Step two: Make a plan

Once you’ve got at least most of the information, it’s time to start making a plan.

Make an IP plan for the new environment

You won’t always be able to keep the current IP addresses, and a new IP plan is important for setting up the new network and subnets with proper sizing.
Determine the security groups to create

Work out which security groups are needed to allow traffic between servers and to allow the required inbound and outbound traffic. Determine which security groups to create and which servers to attach them to.
Make sure you have your MAP tag number (MPE ID)

This is needed when deploying resources to get the discount. Determine how you will apply the tags and what resources might need alternate tags.
Determine the use of a launch template (highly recommended!)

AWS Application Migration Service makes use of launch templates. Deciding whether you’re going to use one, and which settings you want to specify in it, helps surface any additional information you still need to gather.
Determine the order of migration

Make an initial order of migration, and keep validating that order until the actual migration.
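
To make the IP plan concrete, the sketch below carves a VPC range into equally sized subnets with Python's standard `ipaddress` module. The CIDR range, the subnet names, and the `plan_subnets` and `usable_hosts` helpers are assumptions for illustration only. The one AWS-specific detail it relies on is that AWS reserves five addresses in every subnet, which matters when sizing small subnets.

```python
import ipaddress

# AWS reserves five addresses in every subnet (the first four and the last one).
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr: str, names: list, new_prefix: int) -> dict:
    """Assign consecutive, equally sized subnets from the VPC range to the given names."""
    vpc = ipaddress.IPv4Network(vpc_cidr)
    candidates = vpc.subnets(new_prefix=new_prefix)
    plan = {}
    for name in names:
        try:
            plan[name] = next(candidates)
        except StopIteration:
            raise ValueError(
                f"{vpc_cidr} has no room for subnet {name!r} at /{new_prefix}"
            ) from None
    return plan


def usable_hosts(subnet: ipaddress.IPv4Network) -> int:
    """Number of addresses left for instances after the AWS reservation."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    # Hypothetical range and subnet names.
    plan = plan_subnets("10.20.0.0/16", ["public-a", "private-a", "private-b"], 24)
    for name, subnet in plan.items():
        print(f"{name}: {subnet} ({usable_hosts(subnet)} usable addresses)")
```

Comparing the usable address count per subnet with the number of servers (plus load balancers, endpoints, and room for growth) from the inventory is a quick sanity check on the sizing.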

Step three: Preparations

Once you’ve created an initial plan to work from, it’s time to start the actual preparations.

Set up the management account listed in the Migration Plan (the management account that’s part of the MAP deal) and activate the Cost Allocation Tag required for MAP 2.0. This info should be made available to you by your AWS representative for the MAP deal.
Set up AWS Migration Hub and either install the Discovery Agent on the servers you’re migrating, or, if you have access to the hypervisor layer, install the appliance. More info on these can be found in the AWS Documentation for AWS Migration Hub.
Next, deploy the infrastructure you’re migrating the servers to, including an initial security group to assign to the launch template, and an Instance Profile with the appropriate permissions. Then create VPC Endpoints for the services SSM uses; this way you should be able to access the servers even when they cannot connect to the internet (which is probably the case during the test phases).

Also make sure that you tag everything with the map-migrated tag, including the infrastructure. With Terraform, you can set this using the AWS provider parameter default_tags. More info on the exact tag value should be available through the AWS MAP channel.
Once you’ve got the basic infrastructure set up, initialise the AWS Application Migration Service. During the initialisation, also make sure to set up the default launch template. Make sure it includes the MAP tag with the appropriate value, as well as the security group, instance profile, subnet, et cetera.
After AWS Application Migration Service has been initialised, the Replication Agent can be installed on the servers to be migrated. The replication agent uses TCP port 1500 to connect to the AWS Application Migration Service, so make sure any firewall allows TCP port 1500 outbound for the source servers.

NB: The Replication Agent requires the Linux headers to install. For older Linux versions this could mean you have to locate the Linux headers for the specific release, since they might no longer be available through the distribution’s package manager.
Install the AWS SSM Agent on the source servers. This is helpful to be able to connect to the server through the AWS Console using Session Manager, or even using the Session Manager Plugin for the AWS CLI. During testing and the actual migration, this can prove useful when you’re running into issues with any server.
Once the servers have started replicating, we have to play the waiting game.

Until the servers have finished their initial replication, there’s not much to be done. How long the initial replication takes depends on the speed of the internet connection, the total amount of data to be replicated, and the number of filesystem changes on the source systems. The AWS Application Migration Service console gives an estimate of the time required to complete replication, which is constantly updated.
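
For planning purposes before any agent is installed, a back-of-the-envelope calculation based on the three factors above can be sketched as follows. This is a deliberately simple model and not how AWS Application Migration Service computes its own estimate: the `estimate_initial_sync_hours` function, the default link efficiency of 0.7, and the assumption that changed data competes linearly with the initial sync for bandwidth are all mine.

```python
import math

GIB = 1024 ** 3  # bytes per GiB


def estimate_initial_sync_hours(
    data_gib: float,
    uplink_mbit_s: float,
    change_gib_per_day: float = 0.0,
    efficiency: float = 0.7,  # assumed share of the link usable for replication
) -> float:
    """Rough duration of the initial replication in hours.

    Returns math.inf when the data changes faster than the link can
    replicate it, because the sync would then never catch up.
    """
    if data_gib < 0 or change_gib_per_day < 0:
        raise ValueError("data sizes must not be negative")
    if uplink_mbit_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("uplink must be positive and efficiency in (0, 1]")

    # Convert the link speed from Mbit/s to GiB per hour.
    throughput_gib_per_hour = uplink_mbit_s * 1_000_000 * efficiency / 8 * 3600 / GIB
    change_gib_per_hour = change_gib_per_day / 24

    net_gib_per_hour = throughput_gib_per_hour - change_gib_per_hour
    if net_gib_per_hour <= 0:
        return math.inf
    return data_gib / net_gib_per_hour


if __name__ == "__main__":
    # Hypothetical numbers: 5 TiB of data, 500 Mbit/s uplink, 50 GiB of changes per day.
    hours = estimate_initial_sync_hours(5 * 1024, 500, change_gib_per_day=50)
    print(f"Estimated initial replication: {hours:.1f} hours")
```

If the result comes close to the time left before the deadline, that is a signal to start replication earlier or to arrange more bandwidth, well before the console estimate confirms it.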

Step four: Test, test, test!

When all servers (or at least the ones you want to start with) have replicated, you can start testing.

In the AWS Application Migration Service console, select one or more servers to test, and launch test instances.

Make sure that the test servers are unable to contact live servers, so they do not contaminate any production environment.
Create a migration script per server. Do multiple test runs to check and improve the migration scripts.
During testing, you might encounter software that can throw a wrench in the migration, like corosync and pacemaker. When you encounter such software, determine if you still need it and take action accordingly to mitigate any possible issues that might arise by keeping those configurations as they are.
Evaluate if your intended order of migration is valid. During testing you might find a different order is needed.
Create waves based on the order of migration for a simpler orchestration during the actual migration.
Do at least one full test migration. This helps determine how much time the full migration needs, and therefore how much downtime to expect. That downtime needs to be communicated to the customer and any users of the application(s), and it also helps in deciding when to do the actual migration, how many people need to work on it, when the testers can start testing the application(s) after the migration, et cetera.
If you’re moving any server from being directly exposed to the internet, to being fronted by a load balancer, test the load balancer configuration as best you can.
Once you’re done testing a source server, mark it as ‘Ready for cut-over’ in AWS Application Migration Service.
If there are DNS changes to be made, prepare for them: lower the TTL values of records that need to be changed, and prepare any domain that needs to be moved to Route 53, or even move it in advance if possible.
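
The step of turning the order of migration into waves can be sketched with Python's standard `graphlib` module. The sketch assumes the order is driven by dependencies between servers (for example, a database before the applications that use it), which will not cover every reason for ordering. The server names and the `plan_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def plan_waves(depends_on: dict) -> list:
    """Group servers into waves.

    `depends_on` maps each server to the set of servers that must be
    migrated before it. Every server lands in the earliest wave in which
    all of its dependencies have already been migrated.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable, readable plan
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    # Made-up dependencies: each server lists what has to be migrated first.
    dependencies = {
        "db01": set(),
        "app01": {"db01"},
        "app02": {"db01", "cache01"},
        "web01": {"app01", "app02"},
    }
    for number, wave in enumerate(plan_waves(dependencies), start=1):
        print(f"Wave {number}: {', '.join(wave)}")
```

A circular dependency surfaces as an error instead of a silently wrong plan, which is a useful hint that two servers have to move together in the same wave.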

Step five: The real deal

This is what you’ve been testing for!

Shut down any running services on the live servers, especially databases, and wait for the last changes to be replicated to AWS.
Start migrating in waves.
Make sure your security groups allow the proper access (the servers should at least be reachable for the group of test users).
Have your test-group test as early as possible and have a select group of people report on any findings. Triage what needs to be fixed right away, and what can wait. Have product owners participate in this where possible.
Mark servers that have been given the green light as ‘Finalize cut-over’ in AWS Application Migration Service to indicate they’re finished.
Turn on VPC Flow Logs to help troubleshoot any network-issues during the migration.

Step six: The aftermath

Once the migration has been finished successfully, there are a few more things that need to be done.

Turn off the old servers, or at the very least make sure that the applications will not be enabled again.
Make sure the servers and services are being backed up in AWS.
Use ‘Mark as archived’ on the migrated servers in AWS Application Migration Service.
Remove any software from the servers that was specifically needed for the data center architecture (e.g. VMware Tools, Azure tools).

Points of attention

During both testing and the actual migration, when launching multiple (bigger) instances at the same time, one or more instances might respond badly or show weird issues. In that case, stop the instance(s) in the AWS Console, wait a minute or two, and start them again. The likely reason is that the underlying host has issues allocating the proper resources to the instance. Stopping the instance and starting it again relocates it to a host that has sufficient resources available.
If you’re using user_data in your launch template(s), this will only be run when the server has a working network connection. If a server has no working network connection, user_data cannot be retrieved from the instance metadata and cannot be run.
Make sure that the customer tests the application(s) during migration and signs off on them. Ultimately, it’s the customer’s responsibility to determine if an application is working as intended and if all data is correctly transferred.
