What is this project about?
Scope
The main goal of this project is to create a Kubernetes cluster at home using ARM/x86 bare-metal nodes (Raspberry Pis and low-cost refurbished mini PCs). It also aims to automate the cluster's deployment and configuration by applying Infrastructure as Code (IaC) and GitOps methodologies, using tools like Ansible, cloud-init and Flux CD.
The project scope includes the automatic installation and configuration of a lightweight Kubernetes flavor based on K3S. It also includes the deployment of basic cluster services such as:
- Distributed block storage for pods' persistent volumes, Longhorn.
- S3 object storage, Minio.
- Backup/restore solution for the cluster, Velero and Restic.
- Certificate management, Cert-Manager.
- Secrets management solution, Vault and External Secrets.
- Identity and Access Management (IAM) providing Single Sign-On (SSO), Keycloak.
- Observability platform based on:
  - Metrics monitoring solution, Prometheus.
  - Logging and analytics solution, combined EFK+LG stacks (Elasticsearch-Fluentd/Fluentbit-Kibana + Loki-Grafana).
  - Distributed tracing solution, Tempo.
The scope also includes the deployment of services for building a cloud-native microservices architecture:
- Service mesh architecture, Istio
- API security with OAuth 2.0 and OpenID Connect, using the IAM solution, Keycloak
- Streaming platform, Kafka
Design Principles
- Use hybrid x86/ARM bare-metal nodes, combining Raspberry Pi nodes (ARM) and x86 mini PCs (HP EliteDesk 800 G3) in the same cluster.
- Use a lightweight Kubernetes distribution (K3S), whose smaller memory footprint makes it ideal for running on Raspberry Pis.
- Use distributed block storage technology, instead of a centralized NFS system, for pod persistent storage. Kubernetes distributed block storage solutions, like Rook/Ceph or Longhorn, have added 64-bit ARM support in their latest versions.
- Use open-source projects under the CNCF (Cloud Native Computing Foundation) umbrella.
- Use the latest version of each open-source project to be able to test the latest Kubernetes capabilities.
- Automate the cluster deployment using Infrastructure as Code (IaC) and GitOps methodologies with tools like:
  - cloud-init to automate the initial OS installation of the cluster nodes.
  - Ansible to automate the configuration of the cluster nodes, the installation of Kubernetes and external services, and the triggering of the cluster bootstrap (Flux CD bootstrap).
  - Flux CD to automatically provision Kubernetes applications from a Git repository.
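Mixing Raspberry Pis and x86 mini PCs in one cluster means that every container image the cluster runs must be published for both architectures, unless the workload is pinned to one architecture with a nodeSelector. Kubernetes records each node's architecture in the well-known kubernetes.io/arch label, using Go-style names (arm64, amd64) rather than the kernel's names (aarch64, x86_64). The minimal Python sketch below illustrates that mapping and a simple check of whether an image's published platforms cover every node. The helper names and the node inventory are hypothetical and are not part of the project's code.

```python
import platform
from typing import Iterable, Optional, Set

# Linux `uname -m` machine names -> GOARCH-style values that Kubernetes
# uses in the well-known kubernetes.io/arch node label.
ARCH_MAP = {
    "x86_64": "amd64",   # x86 mini PCs (e.g. HP EliteDesk 800 G3)
    "amd64": "amd64",
    "aarch64": "arm64",  # Raspberry Pi running a 64-bit OS
    "arm64": "arm64",
}


def k8s_arch(machine: Optional[str] = None) -> str:
    """Return the kubernetes.io/arch value for a machine name.

    Defaults to the architecture of the host running this script.
    """
    name = (machine or platform.machine()).lower()
    if name not in ARCH_MAP:
        raise ValueError(f"unsupported architecture: {name!r}")
    return ARCH_MAP[name]


def image_covers_cluster(image_platforms: Set[str], node_machines: Iterable[str]) -> bool:
    """True if an image is published for every node architecture in the cluster."""
    needed = {k8s_arch(m) for m in node_machines}
    return needed <= image_platforms


if __name__ == "__main__":
    cluster = ["aarch64", "aarch64", "x86_64"]  # hypothetical node inventory
    print(image_covers_cluster({"amd64", "arm64"}, cluster))  # True
    print(image_covers_cluster({"amd64"}, cluster))           # False
```

A single-architecture image (the second case) would only be schedulable on part of a hybrid cluster, which is why multi-arch images matter in this design.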
Technology Stack
The following table lists the open-source solutions used to build this cluster:
Name | Description
---|---
Ansible | Automate OS configuration, external services installation, and K3S installation and bootstrapping
FluxCD | GitOps tool for deploying applications to Kubernetes
Cloud-init | Automate the initial OS installation
Ubuntu | Cluster nodes OS
K3S | Lightweight distribution of Kubernetes
containerd | Container runtime integrated with K3S
Cilium | Kubernetes networking (CNI) and load balancer
CoreDNS | Kubernetes DNS
HA Proxy | Kubernetes API load balancer
Metal LB | Load balancer implementation for bare-metal Kubernetes clusters (Cilium LB alternative)
Ingress NGINX | Kubernetes Ingress controller
Istio | Kubernetes service mesh
Longhorn | Kubernetes distributed block storage
Minio | S3 object storage solution
Cert-manager | TLS certificate management
Hashicorp Vault | Secrets management solution
External Secrets Operator | Sync Kubernetes Secrets from Hashicorp Vault
Keycloak | Identity and Access Management
OAuth2.0 Proxy | OAuth 2.0 proxy
Velero | Kubernetes backup and restore solution
Restic | OS backup and restore solution
Prometheus | Metrics monitoring and alerting
Fluentd | Log forwarding and distribution
Fluentbit | Log collection
Loki | Log aggregation
Elasticsearch | Log analytics
Kibana | Log analytics dashboards
Tempo | Distributed tracing monitoring
Grafana | Monitoring dashboards
External Resources and Services
Even though the premise is to deploy all services in the Kubernetes cluster, a few external services/resources are still needed. Below is a list of these external resources/services and the reasons they are needed.
Cloud external services
Note: These resources are optional. The homelab still works without them, but it won’t have trusted certificates.
Provider | Resource | Purpose
---|---|---
Let's Encrypt | TLS Certificate Authority (CA) | Issue valid, signed TLS certificates
IONOS | DNS | DNS and DNS-01 challenge for certificates
Alternatives:
- Use a private PKI (a custom CA to sign certificates).
  This is currently supported and requires only minor changes. See details in Doc: Quick Start instructions.
- Use another DNS provider.
  Cert-manager and Certbot, used to automatically obtain certificates from Let’s Encrypt, also work with other DNS providers. This requires further changes to the way the cert-manager application is deployed (new providers and/or webhooks/plugins might be required).
  Currently, only an ACME issuer (Let’s Encrypt) using IONOS as the DNS-01 challenge provider is configured. Check the list of supported DNS-01 providers.
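In DNS-01 validation, the ACME server (Let's Encrypt) checks a TXT record named _acme-challenge.<domain>. The record's value is derived from the challenge token and the ACME account key, as described in RFC 8555, section 8.4. In this cluster, cert-manager computes this value and asks the DNS provider (IONOS) to publish the record. The Python sketch below reproduces only that derivation, using the standard library. It is not cert-manager's code, and the account key, token and domain in the example are hypothetical placeholders.

```python
import base64
import hashlib
import json
from typing import Dict, Tuple

# Required JWK members per key type for an RFC 7638 thumbprint.
REQUIRED_MEMBERS = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n")}


def b64url(data: bytes) -> str:
    """Base64url encoding without '=' padding, as ACME (RFC 8555) requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: Dict[str, str]) -> str:
    """RFC 7638 thumbprint: SHA-256 over the required members only,
    serialized as JSON with sorted keys and no whitespace."""
    members = {k: jwk[k] for k in REQUIRED_MEMBERS[jwk["kty"]]}
    canonical = json.dumps(members, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def dns01_txt_record(domain: str, token: str, account_jwk: Dict[str, str]) -> Tuple[str, str]:
    """Return (record name, TXT value) for an ACME DNS-01 challenge."""
    # For a wildcard name, the challenge is published on the base domain.
    if domain.startswith("*."):
        domain = domain[2:]
    key_authorization = f"{token}.{jwk_thumbprint(account_jwk)}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{domain}", b64url(digest)


if __name__ == "__main__":
    # Hypothetical account key and token, for illustration only.
    account_jwk = {"kty": "EC", "crv": "P-256", "x": "example-x", "y": "example-y"}
    name, value = dns01_txt_record("*.homelab.example.com", "example-token", account_jwk)
    print(name)   # _acme-challenge.homelab.example.com
    print(value)  # 43-character base64url string
```

The record only proves control of the DNS zone, so the cluster never needs to be reachable from the Internet. This is what makes DNS-01 a good fit for a homelab.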
Self-hosted external services
There is another set of services that I have decided to self-host outside the Kubernetes cluster.
External Service | Resource | Purpose
---|---|---
Minio | S3 Object Store | Cluster Backup
Hashicorp Vault | Secrets Management | Cluster secrets management
The Minio backup service is hosted in a VM running in a public cloud, using the Oracle Cloud Infrastructure (OCI) free tier.
The Vault service runs on one of the cluster nodes, node1. Because the Vault Kubernetes authentication method needs access to the Kubernetes API, I decided not to host the Vault service in the public cloud.
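With the Vault Kubernetes auth method, a workload (for example, the External Secrets Operator) sends its service account JWT to Vault. Vault then validates that JWT by calling the Kubernetes TokenReview API, and this is where its dependency on the Kubernetes API comes from. The minimal Python sketch below shows the client side of that exchange against Vault's HTTP API. It assumes the auth method is enabled at the default kubernetes mount path. The Vault address and role name are hypothetical, and TLS settings (e.g. trusting a private CA) are left out.

```python
import json
import urllib.request

# Default path where Kubernetes mounts a pod's service account token.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def vault_k8s_login_request(vault_addr: str, role: str, jwt: str,
                            mount: str = "kubernetes") -> urllib.request.Request:
    """Build the Vault Kubernetes auth login call: POST /v1/auth/<mount>/login."""
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def vault_k8s_login(vault_addr: str, role: str, token_path: str = SA_TOKEN_PATH) -> str:
    """Exchange the pod's service account JWT for a Vault client token.

    Vault checks the JWT against the Kubernetes TokenReview API before
    answering, so Vault must be able to reach the Kubernetes API server.
    """
    with open(token_path, encoding="utf-8") as f:
        jwt = f.read().strip()
    req = vault_k8s_login_request(vault_addr, role, jwt)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["auth"]["client_token"]
```

Because the validation call goes from Vault to the Kubernetes API, running Vault on node1 keeps that call inside the home network.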
What I have built so far
From a hardware perspective, I have built three different versions of the cluster:
- Cluster 1.0: Basic version using a dedicated USB flash drive for each node and a centralized SAN as additional storage.
- Cluster 2.0: Adding a dedicated SSD disk to each node of the cluster, which greatly improved overall cluster performance.
- Cluster 3.0: Creating a hybrid ARM/x86 Kubernetes cluster, combining Raspberry Pi nodes with x86 mini PCs.
What I have developed so far
From software perspective, I have developed the following:
- Cloud-init template files for the initial OS installation on Raspberry Pi nodes.
  Source code can be found in the Pi Cluster Git repository under the metal/rpi/cloud-init directory.
- Ansible playbook and roles for configuring the cluster nodes and for installing and bootstrapping the K3S cluster.
  Source code can be found in the Pi Cluster Git repository under the /ansible directory.
  Additionally, several Ansible roles have been developed to automate different configuration tasks on Ubuntu-based servers. These roles can be reused in other projects, and the Pi Cluster Ansible playbooks use them.
  The source code of each Ansible role can be found in its dedicated GitHub repository. Each role is also published in Ansible Galaxy to facilitate its installation with the ansible-galaxy command.

  Ansible role | Description
  ---|---
  ricsanfre.security | Automate SSH hardening configuration tasks
  ricsanfre.ntp | Chrony NTP service configuration
  ricsanfre.firewall | NFtables firewall configuration
  ricsanfre.dnsmasq | Dnsmasq configuration
  ricsanfre.storage | Configure LVM
  ricsanfre.iscsi_target | Configure iSCSI Target
  ricsanfre.iscsi_initiator | Configure iSCSI Initiator
  ricsanfre.k8s_cli | Install kubectl and Helm utilities
  ricsanfre.fluentbit | Configure fluentbit
  ricsanfre.minio | Configure Minio S3 server
  ricsanfre.backup | Configure Restic
  ricsanfre.vault | Configure Hashicorp Vault

- Packaged Kubernetes applications (Helm, Kustomize, manifest files) to be deployed using Flux CD.
  Source code can be found in the Pi Cluster Git repository under the /kubernetes directory.
- This documentation website, picluster.ricsanfre.com, hosted on GitHub Pages.
  It is a static website generated with Jekyll. Source code can be found in the Pi Cluster Git repository under the /docs directory.
Software used and latest version tested
The following table lists the software used and the latest version tested for each component:
Type | Software | Latest Version tested | Notes |
---|---|---|---|
OS | Ubuntu | 22.04.2 | |
Control | Ansible | 2.17.2 | |
Control | cloud-init | 23.1.2 | version pre-integrated into Ubuntu 22.04.2 |
Kubernetes | K3S | v1.30.2 | K3S version |
Kubernetes | Helm | v3.15.3 | |
Kubernetes | etcd | v3.5.13-k3s1 | version pre-integrated into K3S |
Computing | containerd | v1.7.17-k3s1 | version pre-integrated into K3S |
Networking | Cilium | 1.15.7 | |
Networking | CoreDNS | v1.10.1 | Helm chart version: 1.31.0 |
Metrics Server | Kubernetes Metrics Server | v0.7.2 | Helm chart version: 3.12.1 |
Service Mesh | Istio | v1.22.3 | Helm chart version: 1.22.3 |
Service Proxy | Ingress NGINX | v1.11.1 | Helm chart version: 4.11.1 |
Storage | Longhorn | v1.6.2 | Helm chart version: 1.6.2 |
Storage | Minio | RELEASE.2024-04-18T19-09-19Z | Helm chart version: 5.2.0 |
TLS Certificates | Cert-manager | v1.15.1 | Helm chart version: v1.15.1 |
Logging | ECK Operator | 2.13.0 | Helm chart version: 2.13.0 |
Logging | Elastic Search | 8.13.0 | Deployed with ECK Operator |
Logging | Kibana | 8.13.0 | Deployed with ECK Operator |
Logging | Fluentbit | 3.0.7 | Helm chart version: 0.46.11 |
Logging | Fluentd | 1.15.3 | Helm chart version: 0.5.2 Custom docker image from official v1.17.1 |
Logging | Loki | 3.1.0 | Helm chart grafana/loki version: 6.7.1 |
Monitoring | Kube Prometheus Stack | v0.75.0 | Helm chart version: 61.2.0 |
Monitoring | Prometheus Operator | v0.75.0 | Installed by Kube Prometheus Stack. Helm chart version: 61.2.0 |
Monitoring | Prometheus | v2.53.0 | Installed by Kube Prometheus Stack. Helm chart version: 61.2.0 |
Monitoring | AlertManager | v0.27.0 | Installed by Kube Prometheus Stack. Helm chart version: 61.2.0 |
Monitoring | Prometheus Node Exporter | v1.8.1 | Installed as dependency of Kube Prometheus Stack chart. Helm chart version: 61.2.0 |
Monitoring | Prometheus Elasticsearch Exporter | 1.7.0 | Helm chart version: prometheus-elasticsearch-exporter-6.0.0 |
Monitoring | Grafana | 11.1.0 | Helm chart version: 8.3.2 |
Tracing | Grafana Tempo | 2.5.0 | Helm chart: tempo-distributed (1.15.1) |
Backup | Minio External (self-hosted) | RELEASE.2024-03-07T00:43:48Z | |
Backup | Restic | 0.16.5 | |
Backup | Velero | 1.13.2 | Helm chart version: 6.7.0 |
Secrets | Hashicorp Vault | 1.16.1 | |
Secrets | External Secret Operator | 0.9.20 | Helm chart version: 0.9.20 |
SSO | Keycloak | 24.0.5 | Bitnami Helm chart version: 21.7.0 |
SSO | Oauth2.0 Proxy | 7.6.0 | Helm chart version: 7.7.9 |
GitOps | Flux CD | v2.3.0 |