
Automating patching with AWS Systems Manager

Sebastiaan Brozius

The code that accompanies this blogpost can be found here

Recently I’ve been looking into patching Windows servers that have dependencies between them, using AWS Systems Manager.

The use case was an application that consists of web servers, middleware servers and a database server.

Application diagram

The web servers have connections open to the database server, and the middleware servers run processes that get information from the database server.

The servers were patched manually. The services on the web servers and middleware servers were stopped first, and once all middleware services were confirmed to be stopped, the databases were stopped as well. Then the servers were updated. After patching, the databases were brought back online first, followed by the middleware services and finally the web services.

To automate this, I created some PowerShell scripts (with a little bit of SSM variable flavour) that run on the instances to stop and start the services, and to check the state of the services before continuing to the next step. These scripts are wrapped in SSM Command documents, which are called from an automation document.

Example script (Start-Components.ps1):

try {
  $_serverRole = "{{ServerRole}}" # This is an SSM variable reference
  $_fqdn = "$((Get-WmiObject Win32_ComputerSystem).DNSHostName).$((Get-WmiObject Win32_ComputerSystem).Domain)"
  Write-Output "[INF] Starting Components on $($_fqdn) with server role '$_serverRole'"

  switch ($_serverRole) {
    Web { 
      Write-Output "[INF] Setting Startup Type for web services where the current StartType is Manual to Automatic and starting them."

      Get-Service iisadmin | Where-Object StartType -eq "Manual" | Set-Service -StartupType Automatic -Status Running
      Get-Service w3svc | Where-Object StartType -eq "Manual" | Set-Service -StartupType Automatic -Status Running
    }
    Middleware {  
      Write-Output "[INF] Doing stuff to enable the middleware services to start."

      # Your code here
    }
    Database {  
      Write-Output "[INF] Setting Startup Type for all database services where the current StartType is Manual to Automatic and starting them."
      Get-Date -Format "yyyy-MM-dd HH:mm:ss"
      
      Get-Service *sql* | Where-Object StartType -eq "Manual" | Set-Service -StartupType Automatic -Status Running

      Write-Output "[INF] Making sure all database services are started before continuing."
      # When there are no services that match the name, the while loop will not be entered.
      while (Get-Service *sql* | Where-Object Status -ne Running) {
        Write-Output "[DEB] [$(Get-Date -Format "yyyy-MM-dd HH:mm:ss")] Not all database services have started yet. Waiting a little longer."
        Start-Sleep -Seconds 60
      }
    }
    Default { }
  }
}
catch {
  Write-Output "[ERR] Failed to start components!"
  Write-Error $Error[0] -ErrorAction Continue
  exit 1
}

This script is then loaded into an SSM document using Terraform:

resource "aws_ssm_document" "patching_start_components" {
  name          = "Patching-StartComponents"
  document_type = "Command"
  target_type   = "/AWS::EC2::Instance"

  content = jsonencode({
    schemaVersion = "2.2"
    description   = "Patching Post-install Start Components Document"
    parameters = {
      ServerRole = {
        type        = "String"
        description = "Role of the server (Web, Middleware, Database, None)"
        default     = "None"
        allowedValues = [
          "Web",
          "Middleware",
          "Database",
          "None",
        ]
      }
    }
    mainSteps = [
      {
        action = "aws:runPowerShellScript"
        name   = "StartComponents"
        precondition = {
          StringEquals = [
            "platformType",
            "Windows"
          ]
        }
        inputs = {
          runCommand = split("\n", file("${path.cwd}/powershell_scripts/Start-Components.ps1"))
        }
      }
    ]
  })
}
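
The automation document further down also references a stop counterpart, aws_ssm_document.patching_stop_components, which is not shown in this post. A minimal sketch of what it could look like, mirroring the start document: the document name Patching-StopComponents, the step name and the script file name Stop-Components.ps1 are assumptions, so check the accompanying code for the real ones.

```hcl
# Sketch of the stop counterpart of the start document.
# Assumptions: the document name, the step name and the script file name
# (Stop-Components.ps1) are illustrative and mirrored from the start document.
resource "aws_ssm_document" "patching_stop_components" {
  name          = "Patching-StopComponents"
  document_type = "Command"
  target_type   = "/AWS::EC2::Instance"

  content = jsonencode({
    schemaVersion = "2.2"
    description   = "Patching Pre-install Stop Components Document"
    parameters = {
      ServerRole = {
        type          = "String"
        description   = "Role of the server (Web, Middleware, Database, None)"
        default       = "None"
        allowedValues = ["Web", "Middleware", "Database", "None"]
      }
    }
    mainSteps = [
      {
        action = "aws:runPowerShellScript"
        name   = "StopComponents"
        # Only run on Windows instances, same as the start document
        precondition = {
          StringEquals = ["platformType", "Windows"]
        }
        inputs = {
          # One array element per script line
          runCommand = split("\n", file("${path.cwd}/powershell_scripts/Stop-Components.ps1"))
        }
      }
    ]
  })
}
```

As in the start document, Systems Manager replaces the {{ServerRole}} placeholder in the script with the parameter value before the script runs on the instance.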

Using an automation document, we can orchestrate the flow of patching. The example code also includes a way to patch servers with the same role at different times: the automation document takes a PatchWindow parameter, with allowed values Monday and Wednesday, and only targets instances whose PatchWindow tag matches that value. The output of each step is sent to an encrypted CloudWatch log group.

resource "aws_ssm_document" "patching_automation" {
  name            = "Patching-Automation"
  document_type   = "Automation"
  document_format = "YAML"

  content = <<EOT
description: |-
  # Patching Automation

  This script provides a staged patching experience. Services are stopped in a specific order on specific instances after which patching is run, and services are started again on servers in reverse order.
schemaVersion: '0.3'
parameters:
  PatchWindow:
    type: String
    allowedValues:
      - Monday
      - Wednesday
    description: Patch-window to run for. Determines which servers are affected.
mainSteps:
  - name: StopWebServerServices
    action: 'aws:runCommand'
    inputs:
      DocumentName: ${aws_ssm_document.patching_stop_components.name}
      Targets:
        - Key: 'tag:ServerRole'
          Values:
            - Web
        - Key: 'tag:PatchWindow'
          Values:
            - '{{PatchWindow}}'
      Parameters:
        ServerRole: Web
      CloudWatchOutputConfig:
        CloudWatchLogGroupName: ${aws_cloudwatch_log_group.automated_patching.name}
        CloudWatchOutputEnabled: true
    description: Stop the services on the web servers
    nextStep: StopMiddlewareServices
    onFailure: 'step:StartWebServerServices'
  - name: StopMiddlewareServices
    action: 'aws:runCommand'
    inputs:
      DocumentName: ${aws_ssm_document.patching_stop_components.name}
      Targets:
        - Key: 'tag:ServerRole'
          Values:
            - Middleware
        - Key: 'tag:PatchWindow'
          Values:
            - '{{PatchWindow}}'
      Parameters:
        ServerRole: Middleware
      CloudWatchOutputConfig:
        CloudWatchLogGroupName: ${aws_cloudwatch_log_group.automated_patching.name}
        CloudWatchOutputEnabled: true
    description: Stop the services on the middleware servers
    nextStep: StopDatabaseServices
    onFailure: 'step:StartMiddlewareServices'
  - name: StopDatabaseServices
    action: 'aws:runCommand'
    inputs:
      DocumentName: ${aws_ssm_document.patching_stop_components.name}
      Targets:
        - Key: 'tag:ServerRole'
          Values:
            - Database
        - Key: 'tag:PatchWindow'
          Values:
            - '{{PatchWindow}}'
      Parameters:
        ServerRole: Database
      CloudWatchOutputConfig:
        CloudWatchLogGroupName: ${aws_cloudwatch_log_group.automated_patching.name}
        CloudWatchOutputEnabled: true
    description: Stop the services on the database servers
    nextStep: PatchServers
    onFailure: 'step:StartDatabaseServices'
  - name: PatchServers
    action: 'aws:runCommand'
    inputs:
      DocumentName: AWS-RunPatchBaseline
      Targets:
        # Uncomment the following lines to only patch specific server-roles
        # - Key: 'tag:ServerRole'
        #   Values:
        #     - Web
        #     - Middleware
        #     - Database
        - Key: 'tag:PatchWindow'
          Values:
            - '{{PatchWindow}}'
      Parameters:
        Operation: Install
        RebootOption: RebootIfNeeded
      CloudWatchOutputConfig:
        CloudWatchLogGroupName: ${aws_cloudwatch_log_group.automated_patching.name}
        CloudWatchOutputEnabled: true
    description: Patch the servers
    nextStep: StartDatabaseServices
    onFailure: Abort
  - name: StartDatabaseServices
    action: 'aws:runCommand'
    inputs:
      DocumentName: ${aws_ssm_document.patching_start_components.name}
      Targets:
        - Key: 'tag:ServerRole'
          Values:
            - Database
        - Key: 'tag:PatchWindow'
          Values:
            - '{{PatchWindow}}'
      Parameters:
        ServerRole: Database
      CloudWatchOutputConfig:
        CloudWatchLogGroupName: ${aws_cloudwatch_log_group.automated_patching.name}
        CloudWatchOutputEnabled: true
    description: Start the services on the database servers
    nextStep: StartMiddlewareServices
    onFailure: Abort
  - name: StartMiddlewareServices
    action: 'aws:runCommand'
    inputs:
      DocumentName: ${aws_ssm_document.patching_start_components.name}
      Targets:
        - Key: 'tag:ServerRole'
          Values:
            - Middleware
        - Key: 'tag:PatchWindow'
          Values:
            - '{{PatchWindow}}'
      Parameters:
        ServerRole: Middleware
      CloudWatchOutputConfig:
        CloudWatchLogGroupName: ${aws_cloudwatch_log_group.automated_patching.name}
        CloudWatchOutputEnabled: true
    description: Start the services on the middleware servers
    nextStep: StartWebServerServices
    onFailure: Abort
  - name: StartWebServerServices
    action: 'aws:runCommand'
    inputs:
      DocumentName: ${aws_ssm_document.patching_start_components.name}
      Targets:
        - Key: 'tag:ServerRole'
          Values:
            - Web
        - Key: 'tag:PatchWindow'
          Values:
            - '{{PatchWindow}}'
      Parameters:
        ServerRole: Web
      CloudWatchOutputConfig:
        CloudWatchLogGroupName: ${aws_cloudwatch_log_group.automated_patching.name}
        CloudWatchOutputEnabled: true
    description: Start the services on the web servers
    isEnd: true
EOT
}

The automation document allows for some error handling as well. As you can see in the example, when the step StopMiddlewareServices fails, the automation jumps to the step StartMiddlewareServices (defined with the line onFailure: 'step:StartMiddlewareServices') and proceeds from there. The middleware and web services are started again, and the patch step is skipped.
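
Every step in the automation document sends its output to aws_cloudwatch_log_group.automated_patching, which is defined elsewhere in the accompanying code. A minimal sketch of such a log group is shown below. The log group name, the retention period and the variable patching_log_kms_key_arn are assumptions for illustration.

```hcl
# Assumption: an existing KMS key is passed in by ARN. Its key policy has to
# allow the CloudWatch Logs service to use the key, otherwise creating the
# log group fails.
variable "patching_log_kms_key_arn" {
  type        = string
  description = "ARN of the KMS key used to encrypt the patching log group"
}

resource "aws_cloudwatch_log_group" "automated_patching" {
  name              = "/ssm/automated-patching" # hypothetical name
  retention_in_days = 30                        # hypothetical retention
  kms_key_id        = var.patching_log_kms_key_arn
}
```

The instance profile of the targeted instances also needs permission to write to this log group, since Run Command output is delivered by the SSM Agent on the instance.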

Once we have the automation document in place, we can create maintenance windows, each with an associated task that automatically executes the automation document on schedule.

resource "aws_ssm_maintenance_window" "install_window_monday" {
  enabled  = true
  name     = "patch-window-monday"
  schedule = local.patching.cron_patching_monday
  duration = 4
  cutoff   = 2
}

resource "aws_ssm_maintenance_window_task" "task_install_patches_monday" {
  window_id = aws_ssm_maintenance_window.install_window_monday.id
  name      = "install-patches-monday"
  task_type = "AUTOMATION"
  task_arn  = aws_ssm_document.patching_automation.name
  priority  = 5

  task_invocation_parameters {
    automation_parameters {
      document_version = "$LATEST"

      parameter {
        name   = "PatchWindow"
        values = ["Monday"]
      }
    }
  }
}

In this example, instances with a tag PatchWindow with a value of Monday will be targeted for the maintenance task.
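
The schedule above comes from local.patching.cron_patching_monday, and the Wednesday window follows the same pattern. The sketch below shows how both could fit together. The cron expressions (02:00 on the given day) and the Wednesday resource names are assumptions, not the values from the accompanying code.

```hcl
# Assumption: schedules are illustrative. Maintenance window schedules are
# evaluated in UTC unless schedule_timezone is set on the window.
locals {
  patching = {
    cron_patching_monday    = "cron(0 2 ? * MON *)"
    cron_patching_wednesday = "cron(0 2 ? * WED *)"
  }
}

resource "aws_ssm_maintenance_window" "install_window_wednesday" {
  enabled  = true
  name     = "patch-window-wednesday"
  schedule = local.patching.cron_patching_wednesday
  duration = 4
  cutoff   = 2
}

resource "aws_ssm_maintenance_window_task" "task_install_patches_wednesday" {
  window_id = aws_ssm_maintenance_window.install_window_wednesday.id
  name      = "install-patches-wednesday"
  task_type = "AUTOMATION"
  task_arn  = aws_ssm_document.patching_automation.name
  priority  = 5

  task_invocation_parameters {
    automation_parameters {
      document_version = "$LATEST"

      # Selects the instances tagged PatchWindow = Wednesday
      parameter {
        name   = "PatchWindow"
        values = ["Wednesday"]
      }
    }
  }
}
```

Because the automation document selects its own targets by tag, the only difference between the two tasks is the value passed to PatchWindow.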

After applying the code to your environment, instances can be included by setting two tags on them. PatchWindow determines the maintenance window the instance will be included in. In this example, valid values are Monday and Wednesday. ServerRole determines which actions in the PowerShell scripts will be taken. In this example, valid values are Web, Middleware, Database or None.
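
If the instances are not managed in the same Terraform configuration, the two tags could be set with aws_ec2_tag resources. This is only a sketch: the variable web_instance_id and the resource names are hypothetical, and the tag values are examples.

```hcl
# Assumption: the instance already exists and its ID is passed in.
variable "web_instance_id" {
  type        = string
  description = "ID of an existing web server instance"
}

# Puts the instance in the Monday maintenance window
resource "aws_ec2_tag" "web_patch_window" {
  resource_id = var.web_instance_id
  key         = "PatchWindow"
  value       = "Monday"
}

# Tells the stop and start documents which branch of the scripts applies
resource "aws_ec2_tag" "web_server_role" {
  resource_id = var.web_instance_id
  key         = "ServerRole"
  value       = "Web"
}
```

When the instance itself is defined as an aws_instance resource in the same configuration, setting the tags in its tags argument is the simpler option. Managing the same tag in both places makes the two resources fight over it.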