Backup & Restore

Backup Architecture and Design

It is needed to implement a backup strategy for the K3S cluster. This backup strategy should, at least, contains a backup infrastructure, and backup and restore procedures for OS basic configuration files, K3S cluster configuration and PODs Persistent Volumes.

The backup architecture is the following:

picluster-backup-architecture

  • OS filesystem backup

    Some OS configuration files should be backed up in order to being able to restore configuration at OS level. For doing so, Restic can be used. Restic provides a fast and secure backup program that can be intregrated with different storage backends, including Cloud Service Provider Storage services (AWS S3, Google Cloud Storage, Microsoft Azure Blob Storage, etc). It also supports opensource S3 Minio.

  • K3S cluster configuration backup and restore.

    This could be achieve backing up and restoring the etcd Kubernetes cluster database as official documentation states. The supported backup procedure is only supported in case etcd database is deployed (by default K3S use a sqlite databse)

    As an alternative Velero, a CNCF project, can be used to backup and restore kubernetes cluster configuration. Velero is kubernetes-distribution agnostic since it uses Kubernetes API for extracting and restoring the configuration, instead relay on backups/restores of etcd database.

    Since for the backup and restore is using standard Kubernetes API, Velero can be used as a tool for migrating the configuration from one kubernetes cluster to another having a differnet kubernetes flavor. From K3S to K8S for example.

    Velero can be intregrated with different storage backends, including Cloud Service Provider Storage services (AWS S3, Google Cloud Storage, Microsoft Azure Blob Storage, etc). It also supports opensource S3 Minio.

    Since Velero is a most generic way to backup any Kuberentes cluster (not just K3S) it will be used to implement my cluster K3S backup.

  • PODs Persistent Volumes backup and restore.

    Applications running in Kubernetes needs to be backed up in a consistent state. It means that before copying the filesystem is it required to freeze the application and make it flush all the pending changes to disk before making the copy. Once the backup is finished, the application can be unfreeze. 1) Application Freeze and flush to disk 2) Filesystem level backup 3) Application unfreeze.

    Longhorn provides its own mechanisms for doing the backups and to take snapshots of the persistent volumes. See Longhorn documentation.

    Longhorn does not currently support application consistent volumes snapshots/backups, see longhorn open issue #2128.

    Longhorn does support, from release 1.2.4, Kubernetes CSI snapshot API to take snapshots/backups programmatically. See Longhorn documentation: CSI Snapshot Support.

    Wiht this functionality application-consistent backups can be orchestrated:

    # Freeze POD's filesystem
    kubectl exec pod -- app_feeze_commandç
    # Take snapshot using CSI Snapshot CRDs
    kubectl apply -f volume_snapshot.yml
    # wait till snapshot finish
    # Unfreeze POD's filesystem
    kubectl exec pod -- app_unfreeze_command
    

    Velero also support CSI snapshot API to take Persistent Volumes snapshots, through CSI provider, Longorn, when backing-up the PODs. See Velero CSI snapshot support documentation.

    Integrating Container Storage Interface (CSI) snapshot support into Velero and Longhorn enables Velero to backup and restore CSI-backed volumes using the Kubernetes CSI Snapshot feature.

    For orchestrating application-consistent backups, Velero supports the definition of backup hooks, commands to be executed before and after the backup, that can be configured at POD level through annotations.

    So Velero, with its buil-in functionality, CSI snapshot support and backup hooks, is able to perform the orchestration of application-consistent backups. Velero delegates the actual backup/restore of PV to the CSI provider, Longhorn.

  • Minio as backup backend

    All the above mechanisms supports as backup backend, a S3-compliant storage infrastructure. For this reason, open-source project Minio has been deployed for the Pi Cluster.

OS Filesystem backup with Restic

OS filesystems from different nodes will be backed up using restic. As backend S3 Minio server will be used.

Restic installation and backup scheduling tasks have been automated with Ansible developing a role: ricsanfre.backup. This role installs restic and configure a systemd service and timer to schedule the backup execution.

Restic installation and backup scheduling configuration

Ubuntu has as part of its distribution a restic package that can be installed with apt command. restic version is an old one, so it is better to install the last version binary (0.16.5) from github repository

For doing the installation execute the following commands as root user

cd /tmp
wget https://github.com/restic/restic/releases/download/v0.16.5/restic_0.16.5_linux_[amd64/arm64].bz2
bzip2 -d /tmp/restic_0.16.5_linux_[amd64/arm64].bz2
cp /tmp/restic_0.16.5_linux_[amd64/arm64] /usr/local/bin/restic
chmod 755 /usr/local/bin/restic 

Create restic environment variables files

restic repository info can be passed to restic command through environment variables instead of typing in as parameters with every command execution

  • Step 1: Create a restic config directory

    sudo mkdir /etc/restic
    
  • Step 2: Create restic.conf file containing repository information:

    RESTIC_REPOSITORY=s3:https://<minio_server>:9091/<restic_bucket>
    RESTIC_PASSWORD=<restic_repository_password>
    AWS_ACCESS_KEY_ID=<minio_restic_user>
    AWS_SECRET_ACCESS_KEY=<minio_restic_password>
    
  • Step 3: Export as enviroment variables content of the file

    export $(grep -v '^#' /etc/restic/restic.conf | xargs -d '\n')
    

Copy CA SSL certificates

In case Minio S3 server is using secure communications using a not valid certificate (self-signed or signed with custom CA), restic command must be used with --cacert <path_to_CA.pem_file option to let restic validate the server certificate.

Copy CA.pem, used to sign Minio SSL certificate into /etc/restic/ssl/CA.pem

Restic repository initialization

restic repository (stored within Minio’s S3 bucket) need to be initialized before being used. It need to be done just once.

For initilizing the repo execute:

restic init

For checking whether the repo is initialized or not execute:

restic init cat config

That command shows the information about the repository (file config stored within the S3 bucket)

Execute restic backup

For manually launch backup process, execute

restic backup <path_to_backup>

Backups snapshots can be displayed executing

restic snapshots

Restic repository maintenance tasks

For checking repository inconsistencies and fixing them

restic check

For applying data retention policy (i.e.: maintain 30 days old snapshots)

restic forget --keep-within 30d

For purging repository old data:

restic prune

Restic backup schedule and concurrent backups

  • Scheduling backup processes

    A systemd service and timer or cron can be used to execute and schedule the backups.

    ricsanfre.backup ansible role uses a systemd service and timer to automatically execute the backups. List of directories to be backed up, the scheduling of the backup and the retention policy are passed as role parameters.

  • Allowing concurrent backup processes

    A unique repository will be used (unique S3 bucket) to backing up configuration from all cluster servers. Restic maintenace tasks (restic check, restic forget and restic prune operations) acquires an exclusive-lock in the repository, so concurrent backup processes including those operations are mutually lock.

    To avoid this situation, retic repo maintenance tasks are scheduled separatedly from the backup process and executed just from one of the nodes: gateway

Backups policies

The folling directories are backed-up from the cluster nodes

Path Exclude patterns
/etc/  
/home/oss .cache
/root .cache
/home/ansible .cache .ansible

Backup policies scheduling

  • Daily backup at 03:00 (executed from all nodes)
  • Daily restic repo maintenance at 06:00 (executed from gateway node)

Enable CSI snapshots support in K3S

K3S distribution currently does not come with a preintegrated Snapshot Controller that is needed to enable CSI Snapshot feature. An external snapshot controller need to be deployed. K3S can be configured to use kubernetes-csi/external-snapshotter.

To enable this feature, follow instructions in Longhorn documentation - Enable CSI Snapshot Support.

  • Step 1. Prepare kustomization yaml file to install external csi snaphotter (setting namespace to kube-system)

    tmp/kustomization.yaml

    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    namespace: kube-system
    resources:
    - https://github.com/kubernetes-csi/external-snapshotter/client/config/crd/?ref=v6.3.2
    - https://github.com/kubernetes-csi/external-snapshotter/deploy/kubernetes/snapshot-controller/?ref=v6.3.2
    
  • Step Deploy Snapshot-Controller

    kubectl apply -k ./tmp
    

Longhorn backup configuration

For configuring the backup in Longhorn is needed to define a backup target, external storage system where longhorn volumes are backed to and restore from. Longhorn support NFS and S3 based backup targets. Minio can be used as backend.

Minio end-point credentials

Create kuberentes secret resource containing Minio end-point access information and credentials

  • Create manifest file longhorn-minio-secret.yml

    apiVersion: v1
    kind: Secret
    metadata:
    name: minio-secret
    namespace: longhorn-system
    type: Opaque
    data:
    AWS_ACCESS_KEY_ID: <base64_encoded_longhorn-minio-access-key> # longhorn
    AWS_SECRET_ACCESS_KEY: <base64_encoded_longhorn-minio-secret-key> # longhornpass
    AWS_ENDPOINTS: <base64_encoded_mino-end-point> # https://minio-service.default:9000
    AWS_CERT: <base64_encoded_minio_ssl_pem> # minio_ssl_certificate, containing complete chain, including CA
    

    For encoding the different access paramenters the following commands can be used:

    echo -n minio_url | base64
    echo -n minio_access_key_id | base64
    echo -n minio_secret_access_key | base64
    cat minio-ssl.pem ca.pem | base64 | tr -d "\n"
    
  • Apply manifest file

    kubectl apply -f longhorn-s3-secret.yml
    

Configure Longhorn backup target

Go to the Longhorn UI. In the top navigation bar, click Settings. In the Backup section, set Backup Target to:

s3://<bucket-name>@<minio-s3-region>/

In the Backup section set Backup Target Credential Secret to the secret resource created before

minio-secret

longhorn-backup-settings

Target can be automatically configured when deploying helm chart

Additional overriden values can be provided to helm chart deployment to configure S3 target.

defaultSettings:
  backupTarget: s3://longhorn@eu-west-1/
  backupTargetCredentialSecret: minio-secret

Scheduling longhorn volumes backup

A Longhorn recurring job can be created for scheduling periodic backups/snapshots of volumes. See details in Longhorn - Scheduling backups and snapshots.

  • Create RecurringJob manifest resource

    ---
    apiVersion: longhorn.io/v1beta1
    kind: RecurringJob
    metadata:
      name: backup
      namespace: longhorn-system
    spec:
      cron: "0 5 * * *"
      task: "backup"
      groups:
      - default
      retain: 2
      concurrency: 2
      labels:
        type: 'full'
        schedule: 'daily'
    

    This will create recurring backup job for default. Longhorn will automatically add a volume to the default group when the volume has no recurring job.

  • Apply manifest file

    kubectl apply -f recurring_job.yml
    

Configure Longhorn CSI Snapshots

VolumeSnapshotClass objects from CSI Snapshot API need to be configured

  • Create VolumeSnapshotClass to create Longhorn snapshots (in-cluster snapshots, not backed up to S3 backend), volume_snapshotclass_snap.yml

    # CSI VolumeSnapshot Associated With Longhorn Snapshot
    kind: VolumeSnapshotClass
    apiVersion: snapshot.storage.k8s.io/v1
    metadata:
      name: longhorn-snapshot-vsc
    driver: driver.longhorn.io
    deletionPolicy: Delete
    parameters:
      type: snap
    
  • Create VolumeSnapshotClass to create Longhorn backups (backed up to S3 backend), volume_snapshotclass_bak.yml

    # CSI VolumeSnapshot Associated With Longhorn Backup
    kind: VolumeSnapshotClass
    apiVersion: snapshot.storage.k8s.io/v1
    metadata:
      name: longhorn-backup-vsc
    driver: driver.longhorn.io
    deletionPolicy: Delete
    parameters:
      type: bak
    
  • Apply manifest file

    kubectl apply -f volume_snapshotclass_snap.yml volume_snapshotclass_bak.yml
    

Kubernetes Backup with Velero

Velero installation and configuration

Velero defines a set of Kuberentes’ CRDs (Custom Resource Definition) and Controllers that process those CRDs to perform backups and restores.

Velero as well provides a CLI to execute backup/restore commands using Kuberentes API. More details in official documentation, How Velero works

The complete backup workflow is the following:

velero-backup-process

As storage provider, Minio will be used. See Velero’s installation documentation using Minio as backend.

Configuring Minio bucket and user for Velero

Velero requires an object storage bucket to store backups in. In Minio a dedicated S3 bucket is created for Velero (name: k3s-velero)

A specific Minio user velero is configured with specic access policy to grant the user access to the bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:ListMultipartUploadParts",
                "s3:PutObject",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::k3s-velero/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::k3s-velero"
            ]
        }
    ]
}

See more details in Velero plugin for aws.

Installing Velero CLI

Velero CLI need to be installed joinly with kubectl. velero uses kubectl config file (~/.kube/config) to connect to Kuberentes API.

This will be installed in pimaster

  • Step 1: Download latest stable velero release from https://github.com/vmware-tanzu/velero/releases

  • Step 2: Download tar file corresponding to the latest stable version and the host architecture

    velero-<release>-linux-<arch>.tar.gz
    
  • Step 3: unarchive

    tar -xvf <RELEASE-TARBALL-NAME>.tar.gz -C /dir/to/extract/to
    
  • Step 4: Copy velero binary to /usr/local/bin

  • Step 5: Customize namespace for operational commands

    CLI commands expects velero to be running in default namespace (velero). If namespace is different it must be specified with any execution of the command (–namespace option)

    Velero CLI can be configured to use a different namespace and avoid passing –namespace option with each execution

    velero client config set namespace=<velero_namespace>
    

Installing Velero Kubernetes Service

Installation using Helm (Release 3):

  • Step 1: Add the vmware-tanzu Helm repository:

    helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
    
  • Step2: Fetch the latest charts from the repository:

    helm repo update
    
  • Step 3: Create namespace

    kubectl create namespace velero
    
  • Step 4: Create values.yml for Velero helm chart deployment

    # AWS backend plugin configuration
    initContainers:
      - name: velero-plugin-for-aws
        image: velero/velero-plugin-for-aws:v1.10.0
        imagePullPolicy: IfNotPresent
        volumeMounts:
          - mountPath: /target
            name: plugins
    
    # Minio storage configuration
    configuration:
      backupStorageLocation:
        - provider: aws
          bucket: <velero_bucket>
          caCert: <ca.pem_base64> # cat CA.pem | base64 | tr -d "\n"
          config:
            region: eu-west-1
            s3ForcePathStyle: true
            s3Url: https://minio.example.com:9091
            insecureSkipTLSVerify: true
      # Enable CSI snapshot support
      features: EnableCSI
    credentials:
      secretContents:
        cloud: |
          [default]
          aws_access_key_id: <minio_velero_user> # Not encoded
          aws_secret_access_key: <minio_velero_pass> # Not encoded
    
    # Disable VolumeSnapshotLocation CRD. It is not needed for CSI integration
    snapshotsEnabled: false
    
  • Step 5: Install Velero in the velero namespace with the overriden values

    helm install velero vmware-tanzu/velero --namespace velero -f values.yml
    
  • Step 6: Confirm that the deployment succeeded, run:

    kubectl -n velero get pod
    
  • Step 7: Configure VolumeSnapshotClass

    Create manifest file volume_snapshotclass_velero.yml

    # CSI VolumeSnapshot Associated With Longhorn Backup
    kind: VolumeSnapshotClass
    apiVersion: snapshot.storage.k8s.io/v1
    metadata:
      name: velero-longhorn-backup-vsc
      labels:
        velero.io/csi-volumesnapshot-class: "true"
    driver: driver.longhorn.io
    deletionPolicy: Retain
    parameters:
      type: bak
    

    This VolumeSnapshotClass will be used by Velero to create VolumeSnapshot objects when orchestrating PV backups. The VolumeSnapshotClass to be used, from all the configured in the system, is the one with the label “velero.io/csi-volumesnapshot-class”.

    Setting a DeletionPolicy of Retain on the VolumeSnapshotClass will preserve the volume snapshot in the storage system for the lifetime of the Velero backup and will prevent the deletion of the volume snapshot, in the storage system, in the event of a disaster where the namespace with the VolumeSnapshot object may be lost.

    Apply manifest file

    kubectl apply -f volume_snapshotclass_velero.yml
    

Velero chart configuration details

  • Velero plugins installation

    The chart configuration deploys the following velero plugin as initContainers:

    • velero-plugin-for-aws to enable S3 Minio as backup backend.
    # AWS backend and CSI plugins configuration
    initContainers:
      - name: velero-plugin-for-aws
        image: velero/velero-plugin-for-aws:v1.10.0
        imagePullPolicy: IfNotPresent
        volumeMounts:
          - mountPath: /target
            name: plugins
    
  • Enable Velero CSI Snapshots

    configuration:
       # Enable CSI snapshot support
      features: EnableCSI
    
    # Disable VolumeSnapshotLocation CRD. It is not needed for CSI integration
    snapshotsEnabled: false
    
  • Configure Minio S3 server as backup backend

    # Minio storage configuration
    configuration:
      # Cloud provider being used
      provider: aws
      backupStorageLocation:
        provider: aws
        bucket: <velero_bucket>
        caCert: <ca.pem_base64> # cat CA.pem | base64 | tr -d "\n"
        config:
          region: eu-west-1
          s3ForcePathStyle: true
          s3Url: https://minio.example.com:9091
          insecureSkipTLSVerify: true
    credentials:
      secretContents:
        cloud: |
          [default]
          aws_access_key_id: <minio_velero_user> # Not encoded
          aws_secret_access_key: <minio_velero_pass> # Not encoded
    
    

    Minio server connection data (configuration.backupStorageLocation.config) ,minio credentials (credentials.secretContents), and bucket(configuration.backupStorageLocation.bucket) to be used.

GitOps installation (ArgoCD)

As alternative, for GitOps deployment (ArgoCD), instead of putting minio credentials into helm values in plain text, a Secret can be used to store the credentials.

apiVersion: v1
kind: Secret
metadata:
  name: velero-secret
  namespace: velero
type: Opaque
data:
  cloud: <velero_secret_content | b64encode>

Where is:

[default]
aws_access_key_id: <minio_velero_user> # Not encoded
aws_secret_access_key: <minio_velero_pass> # Not encoded

And the following helm values need to be provided, instead of credentias.secretContent

credentials:
  existingSecret: velero-secret

Testing Velero installation

  • Step 1: Deploy a testing application (nginx), which uses a Longhorn’s Volume for storing its logs (/var/logs/nginx)

    1) Create manifest file: nginx-example.yml

    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: nginx-example
      labels:
        app: nginx
    ---
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: nginx-logs
      namespace: nginx-example
      labels:
        app: nginx
    spec:
      storageClassName: longhorn
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 50Mi
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
      namespace: nginx-example
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
              app: nginx
          annotations:
              pre.hook.backup.velero.io/container: fsfreeze
              pre.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--freeze", "/var/log/nginx"]'
              post.hook.backup.velero.io/container: fsfreeze
              post.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--unfreeze", "/var/log/nginx"]'
        spec:
          volumes:
            - name: nginx-logs
              persistentVolumeClaim:
                claimName: nginx-logs
          containers:
            - image: nginx:1.17.6
              name: nginx
              ports:
                - containerPort: 80
              volumeMounts:
                - mountPath: "/var/log/nginx"
                  name: nginx-logs
                  readOnly: false
            - image: ubuntu:bionic
              name: fsfreeze
              securityContext:
                privileged: true
              volumeMounts:
                - mountPath: "/var/log/nginx"
                  name: nginx-logs
                  readOnly: false
              command:
                - "/bin/bash"
                - "-c"
                - "sleep infinity"
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
          app: nginx
      name: my-nginx
      namespace: nginx-example
    spec:
      ports:
        - port: 80
          targetPort: 80
      selector:
        app: nginx
      type: LoadBalancer
    

    2) Apply manifest file nginx-example.yml

    kubectl apply -f nginx-example.yml
    

    3) Connect to nginx pod and create manually a file within /var/log/nginx

    kubectl exec <nginx-pod> -n nginx-example -it -- /bin/sh
    # touch /var/log/nginx/testing
    

    4) Create a backup for any object included in nginx-example namespace:

    velero backup create nginx-backup --include-namespaces nginx-example --wait  
    

    5) Simulate a disaster:

    kubectl delete namespace nginx-example
    

    6) To check that the nginx deployment and service are gone, run:

    kubectl get deployments --namespace=nginx-example
    kubectl get services --namespace=nginx-example
    kubectl get namespace/nginx-example
    

    7) Run the restore

    velero restore create --from-backup nginx-backup
    

    8) Check the status of the restore:

    velero restore get
    

    After the restore finishes, the output looks like the following:

    NAME                          BACKUP         STATUS      STARTED                         COMPLETED                       ERRORS   WARNINGS   CREATED                         SELECTOR
    nginx-backup-20211220180613   nginx-backup   Completed   2021-12-20 18:06:13 +0100 CET   2021-12-20 18:06:50 +0100 CET   0        0          2021-12-20 18:06:13 +0100 CET   <none>
    

    9) Check nginx deployment and services are back

    kubectl get deployments --namespace=nginx-example
    kubectl get services --namespace=nginx-example
    kubectl get namespace/nginx-example
    

    10) Connect to the restored pod and check that testing file is in /var/log/nginx

Schedule a periodic full backup

Set up daily full backup can be on with velero CLI

velero schedule create full --schedule "0 4 * * *"

Or creating a ‘Schedule’ kubernetes resource:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: full
  namespace: velero
spec:
  schedule: 0 4 * * *
  template:
    hooks: {}
    includedNamespaces:
    - '*'
    includedResources:
    - '*'
    includeClusterResources: true
    metadata:
      labels:
        type: 'full'
        schedule: 'daily'
    ttl: 720h0m0s

References


Last Update: Jul 24, 2024

Comments: