GKE Node Maintenance

Node maintenance

GKE can automatically upgrade your Kubernetes control plane and nodes. See Cluster upgrades and Auto-upgrading nodes for more details. Auto-upgrade is enabled for nodes by default. You can specify maintenance windows and exclusions to control when an auto-upgrade runs. See Configuring maintenance windows and exclusions.

Change Kubernetes Version

No release channel is selected, so the Kubernetes version is set statically. To change it, update the cluster_version parameter in your gke-terraform repository.
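As a minimal sketch, assuming the repository uses the same key: value notation as the maintenance window example in this document, pinning the version might look like the following. The version string is only an illustrative placeholder in the usual GKE version format and may not be offered in your location.

```yaml
# Placeholder value (assumption): replace with a version GKE currently offers for your cluster location
cluster_version: "1.27.3-gke.100"
```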

Verify Auto-Upgrade Settings

Node auto-upgrade is controlled by the following variables (derived from the Terraform Kubernetes Engine Module):

  • maintenance_start_time: Start time of the daily or recurring maintenance window in RFC3339 format (default: 05:00)

  • maintenance_end_time: End time of the recurring maintenance window in RFC3339 format (default: "")

  • maintenance_recurrence: Frequency of the recurring maintenance window in RFC5545 format. (default: "")

  • worker_max_surge: The number of additional worker nodes that can be added to the node pool during an upgrade

  • worker_max_unavailable: The number of worker nodes that can be simultaneously unavailable during an upgrade

  • builder_max_surge: The number of additional builder nodes that can be added to the node pool during an upgrade

  • builder_max_unavailable: The number of builder nodes that can be simultaneously unavailable during an upgrade
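The surge variables above could be combined as in the following sketch. The values are illustrative assumptions, not the defaults of this setup, and the key: value notation is borrowed from the maintenance window example in this document.

```yaml
# Illustrative values only (assumptions): upgrade one node at a time without losing capacity
worker_max_surge: 1          # add at most one extra worker node during an upgrade
worker_max_unavailable: 0    # do not reduce the number of available worker nodes
builder_max_surge: 1         # add at most one extra builder node during an upgrade
builder_max_unavailable: 0   # do not reduce the number of available builder nodes
```

Higher surge values generally shorten an upgrade at the cost of temporarily running more nodes, while a non-zero unavailable value trades capacity for speed.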

GKE supports either a simple daily maintenance window or a recurring maintenance window.

When maintenance_recurrence is not set, a daily maintenance window is used. See the Terraform module’s source code for details on this distinction. The daily maintenance window requires only maintenance_start_time. GKE requires a minimum window duration of 4h; see Configuring maintenance windows and exclusions. The duration of the daily maintenance window cannot be set with Terraform and therefore defaults to 4h.
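A daily maintenance window could be expressed as in this sketch, which assumes the key: value notation used elsewhere in this document and simply restates the documented defaults.

```yaml
# Daily maintenance window: only the start time is set
maintenance_start_time: "05:00"   # documented default start time
maintenance_end_time: ""          # left empty; not used for the daily window
maintenance_recurrence: ""        # left empty, which selects the daily window
```

With these values and the fixed 4h duration, the window would run from 05:00 to 09:00 every day.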

By setting maintenance_start_time, maintenance_end_time and maintenance_recurrence you define a recurring maintenance window. Example for a recurring maintenance window:

# a 9-5 UTC-4 window every weekday
maintenance_start_time: "2019-01-01T09:00:00-04:00"
maintenance_end_time: "2019-01-01T17:00:00-04:00"
maintenance_recurrence: "FREQ=WEEKLY;BYDAY=MO,TU,WE,TH,FR"

maintenance_start_time and maintenance_end_time must be in RFC3339 format, and maintenance_recurrence must be in RFC5545 format. See the Google GKE API Documentation for more details and examples.

The default values result in a daily 4h maintenance window.