Backups are a task every internet company's technical team has to handle, and we are no exception. Today I'll share my own strategies for backing up production Kubernetes clusters.

My primary goals for Kubernetes backups are to prevent:

  • Accidental deletion of a namespace in the cluster
  • Accidental misconfiguration that breaks resources (e.g., Deployments, ConfigMaps)
  • Accidental deletion of some of the cluster's resources
  • Loss of etcd data

Backing Up etcd

An etcd backup guards against cluster-level catastrophes and loss of etcd data, either of which can render the entire cluster unusable. In such cases, only a full cluster recovery from a snapshot can restore services.

Backup script for etcd:

#!/bin/bash
# ENDPOINTS="https://192.168.1.207:2379,https://192.168.1.208:2379,https://192.168.1.209:2379"
ENDPOINTS="127.0.0.1:2379"
CACERT="/etc/kubernetes/pki/etcd/ca.crt"
CERT="/etc/kubernetes/pki/etcd/server.crt"
KEY="/etc/kubernetes/pki/etcd/server.key"
DATE=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR="/home/centos/hostpath/backups/k8s/etcd"
mkdir -p "${BACKUP_DIR}"
# Take a snapshot of the local etcd member
ETCDCTL_API=3 /usr/local/bin/etcdctl --cacert=${CACERT} --cert=${CERT} --key=${KEY} --endpoints="${ENDPOINTS}" snapshot save ${BACKUP_DIR}/k8s-snapshot-${DATE}.db
# Remove snapshots older than 20 days
find "${BACKUP_DIR}/" -type f -mtime +20 -exec rm -f {} \;
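
The retention step at the end of the script relies on find's -mtime +20, which matches files last modified more than 20 days ago. A quick local sanity check of that rule (a sketch assuming GNU touch/find; the snapshot filenames are made up):

```shell
# Simulate the retention rule: old snapshots are deleted, recent ones kept.
tmp=$(mktemp -d)
touch -d "30 days ago" "$tmp/k8s-snapshot-old.db"   # older than 20 days: removed
touch "$tmp/k8s-snapshot-new.db"                    # recent: kept
find "$tmp" -type f -mtime +20 -exec rm -f {} \;
ls "$tmp"
```

After the run, only k8s-snapshot-new.db remains in the directory.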

Cron job schedule:

50 21 * * * /bin/bash /home/centos/hostpath/backups/k8s/etcdv3-bak.sh

Setting Up MinIO Object Storage

Since our storage cluster uses GlusterFS, we use MinIO to set up object storage with GlusterFS as the underlying filesystem. If you’re using Alibaba Cloud OSS to back up your cluster resources, skip this step and refer to: https://github.com/AliyunContainerService/velero-plugin

To deploy MinIO in Kubernetes, persistent volumes (PV) and persistent volume claims (PVC) are required. For simplicity, we run it with Docker Compose instead:

version: '2.0'
services:
  minio:
    image: minio/minio:latest
    container_name: minio
    ports:
      - "39000:9000"
      - "39001:9001"
    restart: always
    command: server --console-address ':9001' /data
    environment:
      # Newer MinIO releases prefer MINIO_ROOT_USER / MINIO_ROOT_PASSWORD;
      # these legacy variable names still work but are deprecated.
      MINIO_ACCESS_KEY: admin
      MINIO_SECRET_KEY: adminSD#123
    logging:
      options:
        max-size: "1000M" # Maximum size of a single log file
        max-file: "100"   # Number of rotated log files to keep
      driver: json-file
    volumes:
      - /home/centos/hostpath/backups/k8s/velero:/data # Mount path
    networks:
      - minio

networks:
  minio:
    ipam:
      config:
      - subnet: 10.210.1.0/24
        gateway: 10.210.1.1

Open your browser and access the following URL with the provided credentials to manage MinIO via the web console:

MinIO Web: http://192.168.1.214:39001
MinIO Admin: admin
MinIO Admin Password: adminSD#123

Installing the Velero Client

Install the Velero CLI (on macOS via Homebrew; on Linux, download a release binary from the Velero GitHub releases page):

brew install velero

Create a file credentials-velero for later use when setting up the Velero server to connect to object storage:

[default]
aws_access_key_id = admin
aws_secret_access_key = adminSD#123
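
If you prefer to script this step, the same file can be generated with a heredoc (a sketch using the demo credentials above; substitute your own keys):

```shell
# Write the Velero credentials file in AWS shared-credentials format.
cat > credentials-velero <<'EOF'
[default]
aws_access_key_id = admin
aws_secret_access_key = adminSD#123
EOF
cat credentials-velero
```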

Deploying Velero in Kubernetes

# velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.2.0 \
    --bucket k8s-jf \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.1.214:39000

Velero Commands

Backup, View, Delete Operations

# Backup resources in the ingress-nginx namespace:
velero backup create ingress-nginx-backup --include-namespaces ingress-nginx

# View backup details:
velero backup describe ingress-nginx-backup
velero backup logs ingress-nginx-backup

# Delete a backup:
velero delete backup ingress-nginx-backup

# Backup all resources except those in ingress-nginx and test namespaces:
velero backup create k8s-full-test-backup --exclude-namespaces ingress-nginx,test

# Backup specific resource types:
velero backup create kube-system-backup --include-resources pod,secret

# Delete backup without confirmation:
velero backup delete kube-system-backup --confirm

# Backup pods with PVCs:
velero backup create pvc-backup --snapshot-volumes --include-namespaces test-velero

Note:

  • Use --include-resources to specify resource types to include.
  • Use --exclude-resources to exclude certain resource types.

Scheduled Backups

# Backup every 6 hours:
velero create schedule ${SCHEDULE_NAME} --schedule="0 */6 * * *"

# Backup every 6 hours using @every syntax:
velero create schedule ${SCHEDULE_NAME} --schedule="@every 6h"

# Daily backup for the web namespace:
velero create schedule ${SCHEDULE_NAME} --schedule="@every 24h" --include-namespaces web

# Weekly backup with TTL of 90 days:
velero create schedule ${SCHEDULE_NAME} --schedule="@every 168h" --ttl 2160h0m0s

Note: --ttl sets the backup retention period. After expiration, backups are automatically cleaned up. Default is 30 days.
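
Because --ttl takes a Go-style duration rather than a number of days, it is easy to get the arithmetic wrong. A small sketch for converting a retention period in days into the value Velero expects:

```shell
# Convert retention in days to the hour-based duration that --ttl expects.
days=90
printf -- '--ttl %dh0m0s\n' $((days * 24))
```

This prints --ttl 2160h0m0s, the 90-day value used in the weekly example above.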

Restoring Backups

# Create a restore from a backup:
velero restore create ${RESTORE_NAME} --from-backup ${BACKUP_NAME}

# Create a restore from a backup (default name: ${BACKUP_NAME}-<timestamp>):
velero restore create --from-backup ${BACKUP_NAME}

# Restore from the latest backup of a schedule:
velero restore create --from-schedule ${SCHEDULE_NAME}

# Restore specific resources from a backup:
velero restore create --from-backup backup-2 --include-resources pod,secret

# Restore everything from a full-cluster backup (existing resources are not overwritten):
velero restore create --from-backup all-ns-backup

# Restore only specific namespaces (e.g., default and nginx-example):
velero restore create --from-backup all-ns-backup --include-namespaces default,nginx-example

# Restore test-velero namespace to test-velero-1:
velero restore create restore-for-test --from-backup everyday-1-20210203131802 --namespace-mappings test-velero:test-velero-1

View backups:

velero get backup   # List backups
velero get schedule # List scheduled backups
velero get restore  # List restores
velero get plugins  # List installed plugins

Note: Velero allows restoring resources to different namespaces than their original source. Use the --namespace-mappings flag:

velero restore create RESTORE_NAME --from-backup BACKUP_NAME --namespace-mappings old-ns-1:new-ns-1,old-ns-2:new-ns-2

Velero Backup Practical Examples

Perform a full cluster backup:

velero backup create k8s-jf-test-all

Set up a 4-hourly backup (excluding the test and tt namespaces) with a retention of two months (1440h):

# velero create schedule k8s-jf-cron-4h --exclude-namespaces test,tt --schedule="@every 4h" --ttl 1440h
Schedule "k8s-jf-cron-4h" created successfully.

Restore a single namespace from a full backup within the same cluster:

# velero restore create k8s-jf-test-all-restore --from-backup k8s-jf-test-all --include-namespaces test --namespace-mappings test:test10000
# velero restore describe k8s-jf-test-all-restore
Name:         k8s-jf-test-all-restore
Namespace:    velero
Labels:       <none>
Annotations:  <none>
Phase:                                 InProgress
Estimated total items to be restored:  141
Items restored so far:                 123
Started:    2021-09-07 10:47:44 +0800 CST
Completed:  <n/a>
Backup:  k8s-jf-test-all
Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>
Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto
Namespace mappings:  test=test10000
Label selector:  <none>
Restore PVs:  auto
Preserve Service NodePorts:  auto
# velero restore get
NAME                      BACKUP            STATUS       STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
k8s-jf-test-all-restore   k8s-jf-test-all   InProgress   2021-09-07 10:47:44 +0800 CST   <nil>       0        0          2021-09-07 10:47:44 +0800 CST   <none>

Cross-Cluster Restore Using Periodic Backups

For cross-cluster restores, both clusters must share the same backup storage and, if volume snapshots are used, the same cloud provider's persistent volume solution. Here both clusters point at the same MinIO bucket k8s-jf, so install Velero on the target cluster with the same backup location:

velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.2.0 \
    --bucket k8s-jf \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.1.214:39000
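
When two clusters share one bucket, it is safer to mark the backup location on the restore-side cluster as read-only so it cannot modify or expire the source cluster's backups. A sketch of the patched BackupStorageLocation (the fields mirror the one Velero creates, shown later in this article; verify accessMode support against your Velero version):

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  accessMode: ReadOnly   # restore-only: backups cannot be created or deleted here
  provider: aws
  objectStorage:
    bucket: k8s-jf
  config:
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://192.168.1.214:39000
```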

Check existing backups:

# velero backup get
NAME                                STATUS            ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
k8s-jf-all-cron-4h-20210907061716   Completed         0        0          2021-09-07 14:17:16 +0800 CST   59d       default            <none>
k8s-jf-all-cron-4h-20210907021627   Completed         0        0          2021-09-07 10:16:27 +0800 CST   59d       default            <none>
k8s-jf-test-all                     Completed         0        0          2021-09-07 10:19:45 +0800 CST   29d       default            <none>

Restoring Specific Namespace Data

Restore the argocd namespace from backup k8s-jf-all-cron-4h-20210907061716 into the argocd-dev namespace in the current cluster:

# velero restore create --from-backup k8s-jf-all-cron-4h-20210907061716 --include-namespaces argocd --namespace-mappings argocd:argocd-dev
Restore request "k8s-jf-all-cron-4h-20210907061716-20210907155450" submitted successfully.
Run `velero restore describe k8s-jf-all-cron-4h-20210907061716-20210907155450` or `velero restore logs k8s-jf-all-cron-4h-20210907061716-20210907155450` for more details.
# velero restore get
NAME                                               BACKUP                              STATUS       STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
k8s-jf-all-cron-4h-20210907061716-20210907155450   k8s-jf-all-cron-4h-20210907061716   InProgress   2021-09-07 15:54:51 +0800 CST   <nil>       0        0          2021-09-07 15:54:51 +0800 CST   <none>
# velero restore logs k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:23Z" level=info msg="Attempting to restore Secret: argocd-application-controller-token-wv62v" logSource="pkg/restore/restore.go:1238" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
...
time="2021-09-07T08:11:30Z" level=info msg="Restored 61 items out of an estimated total of 61 (estimate will change throughout the restore)" logSource="pkg/restore/restore.go:664" name=argocd-server namespace=argocd-dev progress= resource=services restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
time="2021-09-07T08:11:30Z" level=info msg="Waiting for all restic restores to complete" logSource="pkg/restore/restore.go:546" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119
...
time="2021-09-07T08:11:30Z" level=info msg="restore completed" logSource="pkg/controller/restore_controller.go:480" restore=velero/k8s-jf-all-cron-4h-20210907061716-20210907161119

The logs confirm that data from the original argocd namespace has been successfully restored to the current cluster’s argocd-dev namespace.

Uninstalling Velero

Uninstall Velero. Note that the uninstall command leaves the velero namespace and the Velero CRDs behind, so delete them manually afterward:

# velero uninstall
You are about to uninstall Velero.
Are you sure you want to continue (Y/N)? y
Velero uninstalled ⛵
# kubectl delete namespace/velero clusterrolebinding/velero
# kubectl delete crds -l component=velero

Troubleshooting Backup Issues

time="2021-09-07T07:22:35Z" level=info msg="Validating backup storage location" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:114"
time="2021-09-07T07:22:36Z" level=info msg="Backup storage location is invalid, marking as unavailable" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:117"
time="2021-09-07T07:22:36Z" level=error msg="Error listing backups in backup store" backupLocation=default controller=backup-sync error="rpc error: code = Unknown desc = RequestError: send request failed\ncaused by: Get http://minio.velero.svc:9000/velero?delimiter=%2F&list-type=2&prefix=backups%2F: dial tcp: lookup minio.velero.svc on 10.96.0.10:53: no such host" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/object_store.go:361" error.function="main.(*ObjectStore).ListCommonPrefixes" logSource="pkg/controller/backup_sync_controller.go:182"
...

This indicates that the BackupStorageLocation resource still points at an object storage endpoint from a previous installation (here the in-cluster address minio.velero.svc, which no longer resolves). It is typically caused by leftover resources after a reinstallation.

Fix by reinstalling cleanly:

# velero uninstall
You are about to uninstall Velero.
Are you sure you want to continue (Y/N)? y
Velero uninstalled ⛵
# kubectl delete namespace/velero clusterrolebinding/velero
# kubectl delete crds -l component=velero
# velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.2.0 \
    --bucket k8s-jf \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.1.214:39000
# kubectl -n velero get backupstoragelocation default -o yaml # Verify correct configuration
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  creationTimestamp: "2021-09-07T07:47:44Z"
  generation: 1
  labels:
    component: velero
  name: default
  namespace: velero
  resourceVersion: "1184696"
  selfLink: /apis/velero.io/v1/namespaces/velero/backupstoragelocations/default
  uid: 39502e43-272e-461f-a114-a9ec955f0510
spec:
  config:
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://192.168.1.214:39000
  default: true
  objectStorage:
    bucket: k8s-jf
  provider: aws
status:
  lastSyncedTime: "2021-09-07T07:50:00Z"
  lastValidationTime: "2021-09-07T07:50:00Z"
  phase: Available