Automating patching with AWS Systems Manager
The code that accompanies this blogpost can be found here
Recently I’ve been looking into patching Windows servers that have dependencies between them, using AWS Systems Manager.
The use case was an application that consists of web servers, middleware servers and a database server.
The web servers have connections open to the database server, and the middleware servers run processes that get information from the database server.
The servers were patched manually, by stopping the services on the web servers and middleware servers first and checking that all middleware services were stopped, before stopping the databases. Once that was done, the servers were updated. After patching, the databases were first brought back online, before starting the middleware services and the web services again.
To automate this, I created some PowerShell scripts (with a little bit of SSM variable flavour) that run on the instances to stop and start the services, and to check the state of the services before continuing to the next step. These scripts are wrapped in SSM Command documents, which are then called from an Automation document.
Example script (Start-Components.ps1):
try {
    $_serverRole = "{{ServerRole}}" # This is an SSM variable reference
    $_fqdn = "$((Get-WmiObject Win32_ComputerSystem).DNSHostName).$((Get-WmiObject Win32_ComputerSystem).Domain)"
    Write-Output "[INF] Starting Components on $($_fqdn) with server role '$_serverRole'"
    switch ($_serverRole) {
        Web {
            Write-Output "[INF] Setting Startup Type for web services where the current StartType is Manual to Automatic and starting them."
            Get-Service iisadmin | Where-Object StartType -eq "Manual" | Set-Service -StartupType Automatic -Status Running
            Get-Service w3svc | Where-Object StartType -eq "Manual" | Set-Service -StartupType Automatic -Status Running
        }
        Middleware {
            Write-Output "[INF] Doing stuff to enable the middleware services to start."
            # Your code here
        }
        Database {
            Write-Output "[INF] Setting Startup Type for all database services where the current StartType is Manual to Automatic and starting them."
            Get-Date -Format "yyyy-MM-dd HH:mm:ss"
            Get-Service *sql* | Where-Object StartType -eq "Manual" | Set-Service -StartupType Automatic -Status Running
            Write-Output "[INF] Making sure all database services are started before continuing."
            # When there are no services that match the name, the while loop will not be entered.
            while (Get-Service *sql* | Where-Object Status -ne Running) {
                Write-Output "[DEB] [$(Get-Date -Format "yyyy-MM-dd HH:mm:ss")] Not all database services have started yet. Waiting a little longer."
                Start-Sleep -Seconds 60
            }
        }
        Default { }
    }
}
catch {
    Write-Output "[ERR] Failed to start components!"
    Write-Error $Error[0] -ErrorAction Continue
    exit 1
}
This script is then used to create an SSM document using Terraform:
resource "aws_ssm_document" "patching_start_components" {
  name          = "Patching-StartComponents"
  document_type = "Command"
  target_type   = "/AWS::EC2::Instance"
  content = jsonencode({
    schemaVersion = "2.2"
    description   = "Patching Post-install Start Components Document"
    parameters = {
      ServerRole = {
        type        = "String"
        description = "Role of the server (Web, Middleware, Database, None)"
        default     = "None"
        allowedValues = [
          "Web",
          "Middleware",
          "Database",
          "None",
        ]
      }
    }
    mainSteps = [
      {
        action = "aws:runPowerShellScript"
        name   = "StartComponents"
        precondition = {
          StringEquals = [
            "platformType",
            "Windows"
          ]
        }
        inputs = {
          runCommand = split("\n", file("${path.cwd}/powershell_scripts/Start-Components.ps1"))
        }
      }
    ]
  })
}
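The automation document further down also references a matching stop document, aws_ssm_document.patching_stop_components, which is not shown in this post. A minimal sketch of what it might look like follows. It assumes a Stop-Components.ps1 script sits next to Start-Components.ps1 and takes the same ServerRole parameter. Only the Terraform resource name comes from the automation document; the document name, description, step name and script file name are assumptions.

```hcl
# Sketch only: the resource name matches the reference in the automation document.
# Document name, description, step name and script file name are assumptions.
resource "aws_ssm_document" "patching_stop_components" {
  name          = "Patching-StopComponents"
  document_type = "Command"
  target_type   = "/AWS::EC2::Instance"

  content = jsonencode({
    schemaVersion = "2.2"
    description   = "Patching Pre-install Stop Components Document"
    parameters = {
      ServerRole = {
        type          = "String"
        description   = "Role of the server (Web, Middleware, Database, None)"
        default       = "None"
        allowedValues = ["Web", "Middleware", "Database", "None"]
      }
    }
    mainSteps = [
      {
        action = "aws:runPowerShellScript"
        name   = "StopComponents"
        precondition = {
          StringEquals = ["platformType", "Windows"]
        }
        inputs = {
          # Assumed location: same folder as Start-Components.ps1
          runCommand = split("\n", file("${path.cwd}/powershell_scripts/Stop-Components.ps1"))
        }
      }
    ]
  })
}
```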
Using an automation document, we can orchestrate the flow of patching. In the example code, I’ve also included a way to patch servers with the same function at different times. For this, the parameter PatchWindow has been added, with allowed values Monday and Wednesday. The output of each step is sent to an encrypted CloudWatch log group.
resource "aws_ssm_document" "patching_automation" {
  name            = "Patching-Automation"
  document_type   = "Automation"
  document_format = "YAML"
  content         = <<EOT
description: |-
  # Patching Automation
  This script provides a staged patching experience. Services are stopped in a specific order on specific instances after which patching is run, and services are started again on servers in reverse order.
schemaVersion: '0.3'
parameters:
  PatchWindow:
    type: String
    allowedValues:
      - Monday
      - Wednesday
    description: Patch-window to run for. Determines which servers are affected.
mainSteps:
  - name: StopWebServerServices
    action: 'aws:runCommand'
    inputs:
      DocumentName: ${aws_ssm_document.patching_stop_components.name}
      Targets:
        - Key: 'tag:ServerRole'
          Values:
            - Web
        - Key: 'tag:PatchWindow'
          Values:
            - '{{PatchWindow}}'
      Parameters:
        ServerRole: Web
      CloudWatchOutputConfig:
        CloudWatchLogGroupName: ${aws_cloudwatch_log_group.automated_patching.name}
        CloudWatchOutputEnabled: true
    description: Stop the services on the web servers
    nextStep: StopMiddlewareServices
    onFailure: 'step:StartWebServerServices'
  - name: StopMiddlewareServices
    action: 'aws:runCommand'
    inputs:
      DocumentName: ${aws_ssm_document.patching_stop_components.name}
      Targets:
        - Key: 'tag:ServerRole'
          Values:
            - Middleware
        - Key: 'tag:PatchWindow'
          Values:
            - '{{PatchWindow}}'
      Parameters:
        ServerRole: Middleware
      CloudWatchOutputConfig:
        CloudWatchLogGroupName: ${aws_cloudwatch_log_group.automated_patching.name}
        CloudWatchOutputEnabled: true
    description: Stop the services on the middleware servers
    nextStep: StopDatabaseServices
    onFailure: 'step:StartMiddlewareServices'
  - name: StopDatabaseServices
    action: 'aws:runCommand'
    inputs:
      DocumentName: ${aws_ssm_document.patching_stop_components.name}
      Targets:
        - Key: 'tag:ServerRole'
          Values:
            - Database
        - Key: 'tag:PatchWindow'
          Values:
            - '{{PatchWindow}}'
      Parameters:
        ServerRole: Database
      CloudWatchOutputConfig:
        CloudWatchLogGroupName: ${aws_cloudwatch_log_group.automated_patching.name}
        CloudWatchOutputEnabled: true
    description: Stop the services on the database servers
    nextStep: PatchServers
    onFailure: 'step:StartDatabaseServices'
  - name: PatchServers
    action: 'aws:runCommand'
    inputs:
      DocumentName: AWS-RunPatchBaseline
      Targets:
        # Uncomment the following lines to only patch specific server-roles
        # - Key: 'tag:ServerRole'
        #   Values:
        #     - Web
        #     - Middleware
        #     - Database
        - Key: 'tag:PatchWindow'
          Values:
            - '{{PatchWindow}}'
      Parameters:
        Operation: Install
        RebootOption: RebootIfNeeded
      CloudWatchOutputConfig:
        CloudWatchLogGroupName: ${aws_cloudwatch_log_group.automated_patching.name}
        CloudWatchOutputEnabled: true
    description: Patch the servers
    nextStep: StartDatabaseServices
    onFailure: Abort
  - name: StartDatabaseServices
    action: 'aws:runCommand'
    inputs:
      DocumentName: ${aws_ssm_document.patching_start_components.name}
      Targets:
        - Key: 'tag:ServerRole'
          Values:
            - Database
        - Key: 'tag:PatchWindow'
          Values:
            - '{{PatchWindow}}'
      Parameters:
        ServerRole: Database
      CloudWatchOutputConfig:
        CloudWatchLogGroupName: ${aws_cloudwatch_log_group.automated_patching.name}
        CloudWatchOutputEnabled: true
    description: Start the services on the database servers
    nextStep: StartMiddlewareServices
    onFailure: Abort
  - name: StartMiddlewareServices
    action: 'aws:runCommand'
    inputs:
      DocumentName: ${aws_ssm_document.patching_start_components.name}
      Targets:
        - Key: 'tag:ServerRole'
          Values:
            - Middleware
        - Key: 'tag:PatchWindow'
          Values:
            - '{{PatchWindow}}'
      Parameters:
        ServerRole: Middleware
      CloudWatchOutputConfig:
        CloudWatchLogGroupName: ${aws_cloudwatch_log_group.automated_patching.name}
        CloudWatchOutputEnabled: true
    description: Start the services on the middleware servers
    nextStep: StartWebServerServices
    onFailure: Abort
  - name: StartWebServerServices
    action: 'aws:runCommand'
    inputs:
      DocumentName: ${aws_ssm_document.patching_start_components.name}
      Targets:
        - Key: 'tag:ServerRole'
          Values:
            - Web
        - Key: 'tag:PatchWindow'
          Values:
            - '{{PatchWindow}}'
      Parameters:
        ServerRole: Web
      CloudWatchOutputConfig:
        CloudWatchLogGroupName: ${aws_cloudwatch_log_group.automated_patching.name}
        CloudWatchOutputEnabled: true
    description: Start the services on the web servers
    isEnd: true
EOT
}
The automation document allows for some error handling as well. As you can see in the example, when the step StopMiddlewareServices fails, execution jumps to the step StartMiddlewareServices (defined by the line onFailure: 'step:StartMiddlewareServices') and proceeds from there, so the middleware and web services that were already stopped are started again.
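Each step in the automation document writes its output to aws_cloudwatch_log_group.automated_patching, which is not shown in this post. A minimal sketch of such a log group follows. It assumes a KMS key already exists and is passed in as a variable. The variable name, log group name and retention period are assumptions; only the Terraform resource name comes from the automation document.

```hcl
# Hypothetical input: ARN of an existing KMS key. Its key policy must allow
# the CloudWatch Logs service to use the key, otherwise creating the log group fails.
variable "patching_logs_kms_key_arn" {
  type        = string
  description = "ARN of the KMS key used to encrypt the patching log group"
}

resource "aws_cloudwatch_log_group" "automated_patching" {
  name              = "/ssm/automated-patching" # assumed name
  retention_in_days = 30                        # assumed retention
  kms_key_id        = var.patching_logs_kms_key_arn
}
```

The instances running the commands also need permission, through their instance profile, to create log streams and put log events in this log group.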
Once we have the automation document in place, we can create maintenance windows, each with an associated task that automatically executes the automation document.
resource "aws_ssm_maintenance_window" "install_window_monday" {
  enabled  = true
  name     = "patch-window-monday"
  schedule = local.patching.cron_patching_monday
  duration = 4
  cutoff   = 2
}
resource "aws_ssm_maintenance_window_task" "task_install_patches_monday" {
  window_id = aws_ssm_maintenance_window.install_window_monday.id
  name      = "install-patches-monday"
  task_type = "AUTOMATION"
  task_arn  = aws_ssm_document.patching_automation.name
  priority  = 5
  task_invocation_parameters {
    automation_parameters {
      document_version = "$LATEST"
      parameter {
        name   = "PatchWindow"
        values = ["Monday"]
      }
    }
  }
}
In this example, instances that have the tag PatchWindow with a value of Monday will be targeted by the maintenance task.
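The Monday window reads its schedule from local.patching.cron_patching_monday, which is not shown here, and the Wednesday window follows the same pattern. A rough sketch of both follows. The cron expressions and all Wednesday resource names are assumptions, chosen to mirror the Monday example.

```hcl
locals {
  patching = {
    # Assumed schedules: 02:00 every Monday and every Wednesday
    cron_patching_monday    = "cron(0 2 ? * MON *)"
    cron_patching_wednesday = "cron(0 2 ? * WED *)"
  }
}

# Hypothetical Wednesday counterpart of the Monday window
resource "aws_ssm_maintenance_window" "install_window_wednesday" {
  enabled  = true
  name     = "patch-window-wednesday"
  schedule = local.patching.cron_patching_wednesday
  duration = 4
  cutoff   = 2
}

resource "aws_ssm_maintenance_window_task" "task_install_patches_wednesday" {
  window_id = aws_ssm_maintenance_window.install_window_wednesday.id
  name      = "install-patches-wednesday"
  task_type = "AUTOMATION"
  task_arn  = aws_ssm_document.patching_automation.name
  priority  = 5

  task_invocation_parameters {
    automation_parameters {
      document_version = "$LATEST"
      parameter {
        name   = "PatchWindow"
        values = ["Wednesday"] # must be one of the allowed values of the automation document
      }
    }
  }
}
```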
After applying the code to your environment, instances can be included by setting two tags on them.
PatchWindow determines the maintenance window the instance will be included in. In this example, valid values are Monday and Wednesday.
ServerRole determines which actions in the PowerShell scripts will be taken. In this example, valid values are Web, Middleware, Database or None.
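If the instances are not managed by the same Terraform code, the two tags can be attached separately. The following is a sketch using the aws_ec2_tag resource. The variable and resource names are hypothetical, and it assumes the instance's tags are not already managed through a tags argument elsewhere, since the two approaches would conflict.

```hcl
# Hypothetical input: ID of an existing web server instance
variable "web_instance_id" {
  type        = string
  description = "ID of the EC2 instance to include in the Monday patch window"
}

# Puts the instance in the Monday maintenance window
resource "aws_ec2_tag" "web_patch_window" {
  resource_id = var.web_instance_id
  key         = "PatchWindow"
  value       = "Monday"
}

# Selects the Web branch in the stop and start scripts
resource "aws_ec2_tag" "web_server_role" {
  resource_id = var.web_instance_id
  key         = "ServerRole"
  value       = "Web"
}
```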