Every internet company's technical team has to deal with backups, and ours is no exception. In this post, I'll share my own strategy for backing up production Kubernetes clusters.
My primary goals for Kubernetes backups are to prevent:
Accidental deletion of a namespace within the cluster
Accidental deletion of individual resources in the cluster
Loss of etcd data
Backing Up etcd
An etcd backup protects against cluster-level failures and against loss of etcd data, either of which can render the entire cluster unusable. In such cases, only a full cluster restore can bring services back.
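As a minimal sketch of what such an etcd backup could look like, the script below saves a timestamped snapshot with etcdctl. The certificate paths (kubeadm-style), the local endpoint, the target directory, and the helper names snapshot_path and etcd_snapshot are assumptions for illustration, not taken from our setup. By default it only prints the command; set DRY_RUN=0 to actually run it.

```shell
#!/bin/sh
# Sketch: take a timestamped etcd snapshot with etcdctl (v3 API).
# Assumed, not from the article: kubeadm-style certificate paths,
# a local endpoint, and /var/backups/etcd as the target directory.
set -eu

ENDPOINT="${ENDPOINT:-https://127.0.0.1:2379}"
PKI_DIR="${PKI_DIR:-/etc/kubernetes/pki/etcd}"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/etcd}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command instead of running it

# Build the snapshot file name for a given timestamp.
snapshot_path() {
  printf '%s/etcd-snapshot-%s.db\n' "$BACKUP_DIR" "$1"
}

# Save a snapshot (or, in dry-run mode, print the command that would run).
etcd_snapshot() {
  target="$(snapshot_path "$1")"
  set -- etcdctl --endpoints="$ENDPOINT" \
    --cacert="$PKI_DIR/ca.crt" \
    --cert="$PKI_DIR/server.crt" \
    --key="$PKI_DIR/server.key" \
    snapshot save "$target"
  if [ "$DRY_RUN" = "1" ]; then
    echo "ETCDCTL_API=3 $*"
  else
    mkdir -p "$BACKUP_DIR"
    ETCDCTL_API=3 "$@"
  fi
}

etcd_snapshot "$(date +%Y%m%d-%H%M%S)"
```

Run from cron on a control-plane node, this yields one snapshot file per run; copying the files off the node is a separate step that the sketch does not cover.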
Velero stores its backups in object storage. Since our storage cluster uses GlusterFS, we run MinIO as the object store, with GlusterFS as the underlying filesystem. If you are using Alibaba Cloud OSS to back up your cluster resources, skip this step and refer to: https://github.com/AliyunContainerService/velero-plugin
To deploy MinIO in Kubernetes, persistent volumes (PV) and persistent volume claims (PVC) are required. For simplicity, we run it via Docker:
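A minimal sketch of such a Docker-based MinIO deployment follows. The GlusterFS mount point, container name, port, and credentials are placeholders assumed for illustration; the helper names run and start_minio are hypothetical. The script prints the command by default; set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
# Sketch: run a single MinIO container whose data directory lives on a
# GlusterFS mount. Assumed, not from the article: the mount point, the
# container name, the port, and the placeholder credentials.
set -eu

DATA_DIR="${DATA_DIR:-/mnt/glusterfs/minio}"   # hypothetical GlusterFS mount
MINIO_USER="${MINIO_USER:-minio}"              # placeholder, change it
MINIO_PASS="${MINIO_PASS:-change-me-please}"   # placeholder, change it
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print instead of run

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

start_minio() {
  # Recent MinIO images read MINIO_ROOT_USER / MINIO_ROOT_PASSWORD;
  # older releases used MINIO_ACCESS_KEY / MINIO_SECRET_KEY instead.
  run docker run -d --name minio --restart=always \
    -p 9000:9000 \
    -v "$DATA_DIR:/data" \
    -e "MINIO_ROOT_USER=$MINIO_USER" \
    -e "MINIO_ROOT_PASSWORD=$MINIO_PASS" \
    minio/minio server /data
}

start_minio
```

The bucket that Velero writes to has to exist before the first backup; it can be created from the MinIO web console or with the mc client.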
# Create a restore from a backup:
velero restore create ${RESTORE_NAME} --from-backup ${BACKUP_NAME}
# Create a restore from a backup (default name: ${BACKUP_NAME}-<timestamp>):
velero restore create --from-backup ${BACKUP_NAME}
# Restore from the latest backup of a schedule:
velero restore create --from-schedule ${SCHEDULE_NAME}
# Restore specific resources from a backup:
velero restore create --from-backup backup-2 --include-resources pod,secret
# Restore everything in the all-ns-backup backup (existing resources are not overwritten):
velero restore create --from-backup all-ns-backup
# Restore only specific namespaces (e.g., default and nginx-example):
velero restore create --from-backup all-ns-backup --include-namespaces default,nginx-example
# Restore the test-velero namespace into test-velero-1:
velero restore create restore-for-test --from-backup everyday-1-20210203131802 --namespace-mappings test-velero:test-velero-1
velero get backup    # List backups
velero get schedule  # List scheduled backups
velero get restore   # List restores
velero get plugins   # List installed plugins
Note:
Velero allows restoring resources to different namespaces than their original source. Use the --namespace-mappings flag:
# velero restore create k8s-jf-test-all-restore --from-backup k8s-jf-test-all --include-namespaces test --namespace-mappings test:test10000
# velero restore describe k8s-jf-test-all-restore
Name: k8s-jf-test-all-restore
Namespace: velero
Labels: <none>
Annotations: <none>
Phase: InProgress
Estimated total items to be restored: 141
Items restored so far: 123
Started: 2021-09-07 10:47:44 +0800 CST
Completed: <n/a>
Backup: k8s-jf-test-all
Namespaces:
Included: all namespaces found in the backup
Excluded: <none>
Resources:
Included: *
Excluded: nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
Cluster-scoped: auto
Namespace mappings: test=test10000
Label selector: <none>
Restore PVs: auto
Preserve Service NodePorts: auto
# velero restore get
NAME                      BACKUP            STATUS       STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
k8s-jf-test-all-restore   k8s-jf-test-all   InProgress   2021-09-07 10:47:44 +0800 CST   <nil>       0        0          2021-09-07 10:47:44 +0800 CST   <none>
Cross-cluster restore using periodic backups
For cross-cluster restores, both clusters must use the same cloud provider’s persistent volume solution. Here, we use MinIO with bucket k8s-jf on both sides:
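A minimal sketch of preparing the target cluster is shown below: install Velero against the same bucket, then mark the backup storage location read-only so the restore side cannot modify the shared backups. The bucket name k8s-jf comes from the text; the MinIO URL, the plugin version, the credentials file path, and the helper names run, install_velero and set_read_only are assumptions for illustration. Commands are printed by default; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: install Velero on the target cluster against the same MinIO
# bucket, then mark the backup location read-only before restoring.
# Assumed, not from the article: the MinIO URL, the plugin version,
# and the credentials file path. The bucket name k8s-jf is from the text.
set -eu

BUCKET="${BUCKET:-k8s-jf}"
S3_URL="${S3_URL:-http://minio.example.internal:9000}"   # hypothetical address
SECRET_FILE="${SECRET_FILE:-./credentials-velero}"       # AWS-style credentials file
PLUGIN="${PLUGIN:-velero/velero-plugin-for-aws:v1.2.0}"  # pick one matching your Velero
DRY_RUN="${DRY_RUN:-1}"                                  # 1 = print instead of run

# Print the command in dry-run mode, otherwise execute it.
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

install_velero() {
  run velero install \
    --provider aws \
    --plugins "$PLUGIN" \
    --bucket "$BUCKET" \
    --secret-file "$SECRET_FILE" \
    --use-volume-snapshots=false \
    --backup-location-config "region=minio,s3ForcePathStyle=true,s3Url=$S3_URL"
}

# On the target cluster only: keep the restore side from writing to
# or deleting from the shared bucket.
set_read_only() {
  run kubectl patch backupstoragelocation default --namespace velero \
    --type merge --patch '{"spec":{"accessMode":"ReadOnly"}}'
}

install_velero
set_read_only
```

Once Velero on the target cluster has synced the bucket, the backups taken on the source cluster should appear in velero get backup and can be restored with velero restore create --from-backup.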
# velero uninstall
You are about to uninstall Velero.
Are you sure you want to continue(Y/N)? y
Velero uninstalled ⛵
# kubectl delete namespace/velero clusterrolebinding/velero
# kubectl delete crds -l component=velero
time="2021-09-07T07:22:35Z" level=info msg="Validating backup storage location" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:114"
time="2021-09-07T07:22:36Z" level=info msg="Backup storage location is invalid, marking as unavailable" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:117"
time="2021-09-07T07:22:36Z" level=error msg="Error listing backups in backup store" backupLocation=default controller=backup-sync error="rpc error: code = Unknown desc = RequestError: send request failed\ncaused by: Get http://minio.velero.svc:9000/velero?delimiter=%2F&list-type=2&prefix=backups%2F: dial tcp: lookup minio.velero.svc on 10.96.0.10:53: no such host" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/object_store.go:361" error.function="main.(*ObjectStore).ListCommonPrefixes" logSource="pkg/controller/backup_sync_controller.go:182"
...
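This error indicates that the BackupStorageLocation custom resource is inconsistent with the object storage settings and credentials that were configured: the location still points at minio.velero.svc, which does not resolve. The most likely cause is resources left over from a previous installation.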
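One way to confirm such a mismatch is to compare what the BackupStorageLocation actually points at with what you expect. The sketch below is an assumption-laden illustration: the expected URL is a placeholder, the helper names current_bsl, bsl_matches and check_bsl are hypothetical, and the live check only runs when RUN_LIVE=1 is set and kubectl can reach the cluster.

```shell
#!/bin/sh
# Sketch: compare what the BackupStorageLocation points at with what you
# expect. Assumed, not from the article: the expected URL below; the
# bucket k8s-jf comes from the text.
set -eu

EXPECTED_BUCKET="${EXPECTED_BUCKET:-k8s-jf}"
EXPECTED_URL="${EXPECTED_URL:-http://minio.example.internal:9000}"  # hypothetical

# Print "<bucket> <s3Url>" for the default location (needs cluster access).
current_bsl() {
  kubectl -n velero get backupstoragelocation default \
    -o jsonpath='{.spec.objectStorage.bucket} {.spec.config.s3Url}'
}

# Pure check: does "<bucket> <s3Url>" match the expected values?
bsl_matches() {
  [ "$1" = "$EXPECTED_BUCKET $EXPECTED_URL" ]
}

check_bsl() {
  actual="$(current_bsl)"
  if bsl_matches "$actual"; then
    echo "backup location OK: $actual"
  else
    echo "backup location mismatch: got '$actual'" >&2
    return 1
  fi
}

# Query the cluster only when explicitly requested.
if [ "${RUN_LIVE:-0}" = "1" ]; then
  check_bsl || echo "fix or recreate the BackupStorageLocation, then run: velero backup-location get"
fi
```

With the log above, the check would report a mismatch, because the resource still carries the bucket velero and the URL http://minio.velero.svc:9000 rather than the intended values.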