Pi Cluster Documentation

Monitoring (Prometheus)

Prometheus stack installation on Kubernetes using the Prometheus Operator can be streamlined with the kube-prometheus project, maintained by the community.

That project collects Kubernetes manifests, Grafana dashboards, and Prometheus rules, combined with documentation and scripts, to provide easy-to-operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.

Components included in the kube-prom-stack package are:

  • Prometheus Operator
  • Prometheus
  • Alertmanager
  • Prometheus node-exporter
  • kube-state-metrics
  • Grafana

This stack is meant for cluster monitoring, so it is pre-configured to collect metrics from all Kubernetes components.

The architecture of the deployed components is shown in the following image.

kube-prometheus-stack

About Prometheus Operator

The Prometheus Operator manages Prometheus and AlertManager deployments and their configuration through Kubernetes CRDs (Custom Resource Definitions):

  • Prometheus and AlertManager CRDs: declaratively define a desired Prometheus/AlertManager setup to run in a Kubernetes cluster. They provide options to configure the number of replicas and persistent storage.
  • ServiceMonitor/PodMonitor/Probe/ScrapeConfig CRDs: manage Prometheus service discovery configuration, defining how a dynamic set of services/pods/static targets should be monitored.
  • PrometheusRule CRD: defines Prometheus' alerting and recording rules. Alerting rules define alert conditions to be notified (via AlertManager); recording rules allow Prometheus to precompute frequently needed or computationally expensive expressions and save their results as a new set of time series.
  • AlertmanagerConfig CRD: defines Alertmanager configuration, allowing routing of alerts to custom receivers and setting inhibition rules.
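
For example, a minimal ServiceMonitor (an illustrative sketch; names and labels are hypothetical) instructs Prometheus to scrape every Service labelled app: my-app on its metrics port:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-namespace
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: http-metrics   # name of the Service port exposing /metrics
      path: /metrics
      interval: 30s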

Kube-Prometheus Stack installation

Installation

The kube-prometheus stack can be installed using the kube-prometheus-stack Helm chart, maintained by the community.

  • Step 1: Add the Prometheus repository

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    
  • Step 2: Fetch the latest charts from the repository

    helm repo update
    
  • Step 3: Create kube-prom-stack-values.yaml providing basic configuration

    # Produce cleaner resource names
    cleanPrometheusOperatorObjectNames: true
        
    # AlertManager configuration
    alertmanager:
      alertmanagerSpec:
        ##
        ## Configure access to AlertManager via sub-path
        externalUrl: http://monitoring.${DOMAIN}/alertmanager/
        routePrefix: /alertmanager
        ##
        ## HA configuration: Replicas
        ## Number of Alertmanager POD replicas
        replicas: 1
        ##
        ## POD Storage Spec
        storage:
          volumeClaimTemplate:
            spec:
              storageClassName: ${STORAGE_CLASS}
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 5Gi
        ##
      ## Configure Ingress
      ingress:
        enabled: true
        ingressClassName: nginx
        annotations:
          # Enable cert-manager to create automatically the SSL certificate and store in Secret
          cert-manager.io/cluster-issuer: ca-issuer
          cert-manager.io/common-name: monitoring.${DOMAIN}
        path: /alertmanager
        pathType: Prefix
        hosts:
          - monitoring.${DOMAIN}
        tls:
          - hosts:
            - monitoring.${DOMAIN}
            secretName: monitoring-tls
        
    # Prometheus configuration
    prometheus:
      prometheusSpec:
        ##
        ## Removing default filter Prometheus selectors
        ## Default selector filters defined by default in helm chart.
        ## matchLabels:
        ##   release: {{ $.Release.Name | quote }}
        ## ServiceMonitor, PodMonitor, Probe and Rules need to have label 'release' equals to kube-prom-stack helm release (kube-prom-stack)
        podMonitorSelectorNilUsesHelmValues: false
        probeSelectorNilUsesHelmValues: false
        ruleSelectorNilUsesHelmValues: false
        scrapeConfigSelectorNilUsesHelmValues: false
        serviceMonitorSelectorNilUsesHelmValues: false
        ##
        ## enableAdminAPI enables the Prometheus administrative HTTP API, which includes functionality such as deleting time series.
        ## This is disabled by default. --web.enable-admin-api command line
        ## ref: https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-admin-apis
        enableAdminAPI: true
        ##
        ## Configure access to Prometheus via sub-path
        ## --web.external-url and --web.route-prefix Prometheus command line parameters
        externalUrl: http://monitoring.${DOMAIN}/prometheus/
        routePrefix: /prometheus
        ##
        ## HA configuration: Replicas & Shards
        ## Number of replicas of each shard to deploy for a Prometheus deployment.
        ## Number of replicas multiplied by shards is the total number of Pods created.
        replicas: 1
        shards: 1
        ##
        ## TSDB Configuration
        ## ref: https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects
        # Enable WAL compression
        walCompression: true
        # Retention data configuration
        retention: 14d
        retentionSize: 50GB
        ## Enable Experimental Features
        # ref: https://prometheus.io/docs/prometheus/latest/feature_flags/
        enableFeatures:
          # Enable Memory snapshot on shutdown.
          - memory-snapshot-on-shutdown
        ##
        ## Limit POD Resources
        resources:
          requests:
            cpu: 100m
          limits:
            memory: 2000Mi
        ##
        ## POD Storage Spec
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: ${STORAGE_CLASS}
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 5Gi
        ##
      ## Configuring Ingress
      ingress:
        enabled: true
        ingressClassName: nginx
        annotations:
          # Enable cert-manager to create automatically the SSL certificate and store in Secret
          cert-manager.io/cluster-issuer: ca-issuer
          cert-manager.io/common-name: monitoring.${DOMAIN}
        path: /prometheus
        pathType: Prefix
        hosts:
          - monitoring.${DOMAIN}
        tls:
          - hosts:
            - monitoring.${DOMAIN}
            secretName: monitoring-tls
        
    # Prometheus Node Exporter Configuration
    prometheus-node-exporter:
      fullnameOverride: node-exporter
        
    # Kube-State-Metrics Configuration
    kube-state-metrics:
      fullnameOverride: kube-state-metrics
        
    # Grafana Configuration
    grafana:
      fullnameOverride: grafana
      # Admin user password
      adminPassword: "s1cret0"
      # grafana configuration
      grafana.ini:
        server:
          domain: monitoring.local.test
          root_url: "%(protocol)s://%(domain)s:%(http_port)s/grafana/"
          # When serve_from_subpath is enabled, internal requests from e.g. prometheus get redirected to the defined root_url.
          # This is causing prometheus to not be able to scrape metrics because it accesses grafana via the kubernetes service name and is then redirected to the public url
          # To make Prometheus work, disable server_from_sub_path and add rewrite rule in NGINX proxy
          # ref: https://github.com/grafana/grafana/issues/72577#issuecomment-1682277779
          serve_from_sub_path: false
      ##
      ## Provisioning sidecars
      ##
      sidecar:
        dashboards:
          # Enable dashboard sidecar
          enabled: true
          # Enable discovery in all namespaces
          searchNamespace: ALL
          # Search for ConfigMaps containing `grafana_dashboard` label
          label: grafana_dashboard
          # Annotation containing the folder where sidecar will place the dashboard.
          folderAnnotation: grafana_folder
          provider:
            # disableDelete to activate a import-only behaviour
            disableDelete: true
            # allow Grafana to replicate dashboard structure from filesystem
            foldersFromFilesStructure: true
        datasources:
          # Enable datasource sidecar
          enabled: true
          # Enable discovery in all namespaces
          searchNamespace: ALL
          # Search for ConfigMaps containing `grafana_datasource` label
          label: grafana_datasource
          labelValue: "1"
          ## Grafana Ingress configuration
      ingress:
        enabled: true
        ingressClassName: nginx
        # Values can be templated
        annotations:
          # Enable cert-manager to create automatically the SSL certificate and store in Secret
          cert-manager.io/cluster-issuer: ca-issuer
          cert-manager.io/common-name: monitoring.${DOMAIN}
          # Nginx rewrite rule
          nginx.ingress.kubernetes.io/rewrite-target: /$1
        path: /grafana/?(.*)
        pathType: ImplementationSpecific
        hosts:
          - monitoring.${DOMAIN}
        tls:
          - hosts:
            - monitoring.${DOMAIN}
            secretName: monitoring-tls
    
    # Kubernetes Monitoring
    ## Kubelet
    ##
    # Enable kubelet service
    kubeletService:
      ## Prometheus Operator creates Kubelet service
      ## Prometheus Operator started with options
      ## `--kubelet-service=kube-system/kube-prometheus-stack-kubelet`
      ## `--kubelet-endpoints=true`
      enabled: true
      namespace: kube-system
        
    ## Configuring Kubelet Monitoring
    kubelet:
      enabled: true
      serviceMonitor:
        enabled: true
        
    ## Kube API
    ## Configuring Kube API monitoring
    kubeApiServer:
      enabled: true
      serviceMonitor:
        # Enable Service Monitor
        enabled: true
        
    ## Kube Controller Manager
    kubeControllerManager:
      ## K3s controller manager is not running as a POD, so a
      ## ServiceMonitor and a headless service are generated.
      ## A headless service is needed so Prometheus can discover each of the endpoints behind the service.
      ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
      ## Since the metrics are not exposed by a POD, the service must be defined without a selector
      ## and the endpoints must be defined explicitly.
      ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#services-without-selectors
        
      # Enable Kube Controller Manager monitoring
      enabled: true
      # endpoints : IP addresses of K3s control plane nodes
      endpoints: &cp
        - ${K8S_CP_NODE_1}
        - ${K8S_CP_NODE_2}
        - ${K8S_CP_NODE_3}
      service:
        # Enable creation of service
        enabled: true
      serviceMonitor:
        # Enable and configure Service Monitor
        enabled: true
        
    ## Etcd monitoring
    kubeEtcd:
      enabled: true
      # K3s etcd not running as a POD, so endpoints need to be configured
      endpoints: *cp
      service:
        enabled: true
        port: 2381
        targetPort: 2381
        
    ## Kube Scheduler
    kubeScheduler:
      ## K3s kube-scheduler is not running as a POD, so a
      ## ServiceMonitor and a headless service are generated.
      ## A headless service is needed so Prometheus can discover each of the endpoints behind the service.
      ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
      ## Since the metrics are not exposed by a POD, the service must be defined without a selector
      ## and the endpoints must be defined explicitly.
      ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#services-without-selectors
      enabled: true
      # endpoints: IP addresses of K3s control plane nodes
      endpoints: *cp
      serviceMonitor:
        enabled: true
        
    kubeProxy:
      ## K3s kube-proxy is not running as a POD, so a
      ## ServiceMonitor and a headless service are generated.
      ## A headless service is needed so Prometheus can discover each of the endpoints behind the service.
      ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
      ## Since the metrics are not exposed by a POD, the service must be defined without a selector
      ## and the endpoints must be defined explicitly.
      enabled: true
      # endpoints: IP addresses of all cluster nodes (control plane and workers)
      endpoints:
        - ${K8S_CP_NODE_1}
        - ${K8S_CP_NODE_2}
        - ${K8S_CP_NODE_3}
        - ${K8S_WK_NODE_1}
        - ${K8S_WK_NODE_2}
        - ${K8S_WK_NODE_3}
      serviceMonitor:
        enabled: true
        
    ## Core DNS monitoring
    ##
    coreDns:
      enabled: true
      # Creates headless service to get access to all coreDNS Pods
      service:
        enabled: true
        port: 9153
      # Enable service monitor
      serviceMonitor:
        enabled: true
    
  • Step 4: Install kube-prometheus-stack in the kube-prom-stack namespace

    helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack -f kube-prom-stack-values.yaml --namespace kube-prom-stack --create-namespace
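
    After the installation completes, a quick sanity check is to list the deployed PODs and wait until they are all in Running state:

    kubectl get pods -n kube-prom-stack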
    

Helm Chart Base Configuration

Cleaner resource Names

The following options in the values.yaml file produce cleaner resource names, removing the kube-prom-stack prefix from all resources generated by the deployed subcharts: Grafana, Node Exporter and Kube-State-Metrics.

# Produce cleaner resource names
cleanPrometheusOperatorObjectNames: true
# Prometheus Node Exporter Configuration
prometheus-node-exporter:
  # remove kube-prom-stack prefix
  fullnameOverride: node-exporter
# Kube-State-Metrics Configuration
kube-state-metrics:
  # remove kube-prom-stack prefix
  fullnameOverride: kube-state-metrics
# Grafana configuration
grafana:
  # remove kube-prom-stack prefix
  fullnameOverride: grafana
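
With these overrides in place, the subchart workloads get short names. A quick check after installation (expected names are illustrative):

kubectl get deployments,daemonsets -n kube-prom-stack
# Expected entries include: grafana, kube-state-metrics (Deployments) and node-exporter (DaemonSet)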

Prometheus Configuration

# Prometheus configuration
prometheus:
  prometheusSpec:
    ##
    ## Removing default filter Prometheus selectors
    ## Default selector filters defined by default in helm chart.
    ## matchLabels:
    ##   release: {{ $.Release.Name | quote }}
    ## ServiceMonitor, PodMonitor, Probe and Rules need to have label 'release' equals to kube-prom-stack helm release (kube-prom-stack)
    podMonitorSelectorNilUsesHelmValues: false
    probeSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false
    scrapeConfigSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    ##
    ## enableAdminAPI enables the Prometheus administrative HTTP API, which includes functionality such as deleting time series.
    ## This is disabled by default. --web.enable-admin-api command line
    ## ref: https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-admin-apis
    enableAdminAPI: true
    ##
    ## Configure access to Prometheus via sub-path
    ## --web.external-url and --web.route-prefix Prometheus command line parameters
    externalUrl: http://monitoring.${DOMAIN}/prometheus/
    routePrefix: /prometheus
    ##
    ## HA configuration: Replicas & Shards
    ## Number of replicas of each shard to deploy for a Prometheus deployment.
    ## Number of replicas multiplied by shards is the total number of Pods created.
    replicas: 1
    shards: 1
    ##
    ## TSDB Configuration
    ## ref: https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects
    # Enable WAL compression
    walCompression: true
    # Retention data configuration
    retention: 14d
    retentionSize: 50GB
    ## Enable Experimental Features
    # ref: https://prometheus.io/docs/prometheus/latest/feature_flags/
    enableFeatures:
      # Enable Memory snapshot on shutdown.
      - memory-snapshot-on-shutdown

The following options are used to configure the Prometheus server:

  • Admin API is enabled (prometheus.prometheusSpec.enableAdminAPI)
  • Prometheus server is configured to run behind a proxy under a subpath: prometheus.prometheusSpec.externalUrl and prometheus.prometheusSpec.routePrefix
  • HA configuration: the number of Prometheus replicas and shards is set to 1, so Prometheus Operator deploys a single Prometheus instance.
  • Prometheus TSDB configuration:
    • Enable WAL compression (prometheus.prometheusSpec.walCompression)
    • Data retention configuration: set by prometheus.prometheusSpec.retention and prometheus.prometheusSpec.retentionSize
  • Experimental features enabled:
    • Enable memory-snapshot-on-shutdown
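
Since the Admin API is enabled and Prometheus is served under the /prometheus prefix, time series can be deleted through the TSDB admin endpoints. A hedged example, using the ingress configured in the Ingress Configuration section and a hypothetical job label:

# Delete all series of a (hypothetical) job and free the space afterwards
curl -k -X POST 'https://monitoring.${DOMAIN}/prometheus/api/v1/admin/tsdb/delete_series?match[]={job="my-old-job"}'
curl -k -X POST 'https://monitoring.${DOMAIN}/prometheus/api/v1/admin/tsdb/clean_tombstones'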

Grafana configuration

grafana:
  fullnameOverride: grafana
  # Admin user password
  adminPassword: "s1cret0"
  # grafana configuration
  grafana.ini:
    server:
      domain: monitoring.local.test
      root_url: "%(protocol)s://%(domain)s:%(http_port)s/grafana/"
      # Disabled to avoid Prometheus scrape redirects to the public URL (see Ingress Configuration below)
      serve_from_sub_path: false
  ##
  ## Provisioning sidecars
  sidecar:
    dashboards:
      # Enable dashboard sidecar
      enabled: true
      # Enable discovery in all namespaces
      searchNamespace: ALL
      # Search for ConfigMaps containing `grafana_dashboard` label
      label: grafana_dashboard
      # Annotation containing the folder where sidecar will place the dashboard.
      folderAnnotation: grafana_folder
      provider:
        # disableDelete to activate a import-only behaviour
        disableDelete: true
        # allow Grafana to replicate dashboard structure from filesystem
        foldersFromFilesStructure: true
    datasources:
      # Enable datasource sidecar
      enabled: true
      # Enable discovery in all namespaces
      searchNamespace: ALL
      # Search for ConfigMaps containing `grafana_datasource` label
      label: grafana_datasource
      labelValue: "1"

The following options are used to configure Grafana:

  • Admin user password is set: grafana.adminPassword
  • Grafana server configured to run behind a proxy under a subpath: server configuration under grafana.grafana.ini
  • Dynamic provisioning of dashboards: configure Grafana's dashboard sidecar to discover, from all namespaces (grafana.sidecar.dashboards.searchNamespace), ConfigMaps containing dashboard definitions and labelled with grafana_dashboard. The grafana_folder annotation can be used to select the folder where the dashboard is placed.
  • Dynamic provisioning of datasources: configure Grafana's datasource sidecar to discover, from all namespaces (grafana.sidecar.datasources.searchNamespace), ConfigMaps containing datasource definitions and labelled with grafana_datasource.
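
As an illustration, a dashboard packaged as a ConfigMap only needs the label, and optionally the folder annotation, to be picked up by the sidecar (dashboard name and content are hypothetical):

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-dashboard
  labels:
    grafana_dashboard: "1"        # discovered by the dashboards sidecar
  annotations:
    grafana_folder: "My Folder"   # folder where Grafana places the dashboard
data:
  my-dashboard.json: |
    { "title": "My Dashboard", "panels": [] }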

Ingress Configuration

To make the endpoints available under the same FQDN on different paths, as specified in the following table:

| UI | Endpoint | Path Prefix |
|----|----------|-------------|
| Grafana | monitoring.${DOMAIN} | /grafana |
| Prometheus | monitoring.${DOMAIN} | /prometheus |
| AlertManager | monitoring.${DOMAIN} | /alertmanager |

The following values.yaml needs to be specified to generate the Ingress resources and to configure the Prometheus, AlertManager and Grafana servers to run behind an HTTP proxy under a subpath.

alertmanager:
  alertmanagerSpec:
    externalUrl: http://monitoring.${DOMAIN}/alertmanager/
    routePrefix: /alertmanager
  ingress:
    enabled: true
    ingressClassName: nginx
    annotations:
      # Enable cert-manager to create automatically the SSL certificate and store in Secret
      cert-manager.io/cluster-issuer: ca-issuer
      cert-manager.io/common-name: monitoring.${DOMAIN}
    path: /alertmanager
    pathType: Prefix
    hosts:
      - monitoring.${DOMAIN}
    tls:
      - hosts:
        - monitoring.${DOMAIN}
        secretName: monitoring-tls
prometheus:
  prometheusSpec:
    name: prometheus
    externalUrl: http://monitoring.${DOMAIN}/prometheus/
    routePrefix: /prometheus
  ingress:
    enabled: true
    ingressClassName: nginx
    annotations:
      # Enable cert-manager to create automatically the SSL certificate and store in Secret
      cert-manager.io/cluster-issuer: ca-issuer
      cert-manager.io/common-name: monitoring.${DOMAIN}
    path: /prometheus
    pathType: Prefix
    hosts:
      - monitoring.${DOMAIN}
    tls:
      - hosts:
        - monitoring.${DOMAIN}
        secretName: monitoring-tls
grafana:
  # Configure
  grafana.ini:
    server:
      # Run Grafana behind HTTP reverse proxy using a subpath
      domain: monitoring.local.test
      root_url: "%(protocol)s://%(domain)s:%(http_port)s/grafana/"
      # When serve_from_subpath is enabled, internal requests from e.g. prometheus get redirected to the defined root_url.
      # This is causing prometheus to not be able to scrape metrics because it accesses grafana via the kubernetes service name and is then redirected to the public url
      # To make Prometheus work, disable server_from_sub_path and add rewrite rule in NGINX proxy
      # ref: https://github.com/grafana/grafana/issues/72577#issuecomment-1682277779
      serve_from_sub_path: false
  # Grafana Ingress configuration
  ingress:
    enabled: true
    ingressClassName: nginx
    # Values can be templated
    annotations:
      # Enable cert-manager to create automatically the SSL certificate and store in Secret
      cert-manager.io/cluster-issuer: ca-issuer
      cert-manager.io/common-name: monitoring.${DOMAIN}
      # Nginx rewrite rule. Needed since serve_from_sub_path has been disabled
      nginx.ingress.kubernetes.io/rewrite-target: /$1
    path: /grafana/?(.*)
    pathType: ImplementationSpecific
    hosts:
      - monitoring.${DOMAIN}
    tls:
      - hosts:
        - monitoring.${DOMAIN}
        secretName: monitoring-tls
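
Once the chart is deployed, the three UIs can be smoke-tested through the Ingress. A minimal check, assuming monitoring.${DOMAIN} resolves to the ingress controller (use -k because the certificate is issued by a private CA):

curl -k https://monitoring.${DOMAIN}/prometheus/-/healthy
curl -k https://monitoring.${DOMAIN}/alertmanager/-/healthy
curl -k https://monitoring.${DOMAIN}/grafana/api/health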

POD Configuration: CPU and Memory Resource Limits and Storage

Configures AlertManager's and Prometheus' POD persistent volumes to use the storage class defined by ${STORAGE_CLASS} (longhorn in this cluster), defines the volume sizes, and limits the resources used by the Prometheus POD.

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: ${STORAGE_CLASS}
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi
prometheus:
  prometheusSpec:
    ##
    ## Limit POD Resources
    resources:
      requests:
        cpu: 100m
      limits:
        memory: 2000Mi
    ##
    ## POD Storage Spec
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: ${STORAGE_CLASS}
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi

Kubernetes Monitoring

Kubernetes system metrics

Kubernetes Documentation - System Metrics details the Kubernetes components exposing metrics in Prometheus format:

  • kube-controller-manager (exposing metrics endpoint at TCP 10257)
  • kube-proxy (exposing /metrics endpoint at TCP 10249)
  • kube-apiserver (exposing /metrics at Kubernetes API port TCP 6443)
  • kube-scheduler (exposing /metrics endpoint at TCP 10259)
  • kubelet (exposing /metrics, /metrics/cadvisor, /metrics/resource and /metrics/probes endpoints at TCP 10250)

Additional services monitoring

Additionally, coreDNS and the etcd database can be monitored. Both expose metrics in Prometheus format (coreDNS at TCP 9153 and etcd at TCP 2381, as configured below).

kube-prom-stack configuration

Configure Kubernetes control plane metrics endpoints (etcd, controllerManager, scheduler), providing IP addresses of the different nodes of the cluster.

Also, if kube-proxy is used, the list of IP addresses of all cluster nodes needs to be provided to extract kube-proxy metrics. If Cilium CNI is used, kube-proxy monitoring must be disabled by setting kubeProxy.enabled: false.

# Kubernetes Monitoring
## Kubelet
##
# Enable kubelet service
kubeletService:
  ## Prometheus Operator creates Kubelet service
  ## Prometheus Operator started with options
  ## `--kubelet-service=kube-system/kube-prometheus-stack-kubelet`
  ## `--kubelet-endpoints=true`
  enabled: true
  namespace: kube-system

## Configuring Kubelet Monitoring
kubelet:
  enabled: true
  serviceMonitor:
    enabled: true

## Kube API
## Configuring Kube API monitoring
kubeApiServer:
  enabled: true
  serviceMonitor:
    # Enable Service Monitor
    enabled: true

## Kube Controller Manager
kubeControllerManager:
  ## K3s controller manager is not running as a POD, so a
  ## ServiceMonitor and a headless service are generated.
  ## A headless service is needed so Prometheus can discover each of the endpoints behind the service.
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
  ## Since the metrics are not exposed by a POD, the service must be defined without a selector
  ## and the endpoints must be defined explicitly.
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#services-without-selectors

  # Enable Kube Controller Manager monitoring
  enabled: true
  # endpoints : IP addresses of K3s control plane nodes
  endpoints: &cp
    - ${K8S_CP_NODE_1}
    - ${K8S_CP_NODE_2}
    - ${K8S_CP_NODE_3}
  service:
    # Enable creation of service
    enabled: true
  serviceMonitor:
    # Enable and configure Service Monitor
    enabled: true

## Etcd monitoring
kubeEtcd:
  enabled: true
  # K3s etcd not running as a POD, so endpoints need to be configured
  endpoints: *cp
  service:
    enabled: true
    port: 2381
    targetPort: 2381

## Kube Scheduler
kubeScheduler:
  ## K3s kube-scheduler is not running as a POD, so a
  ## ServiceMonitor and a headless service are generated.
  ## A headless service is needed so Prometheus can discover each of the endpoints behind the service.
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
  ## Since the metrics are not exposed by a POD, the service must be defined without a selector
  ## and the endpoints must be defined explicitly.
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#services-without-selectors
  enabled: true
  # endpoints: IP addresses of K3s control plane nodes
  endpoints: *cp
  serviceMonitor:
    enabled: true

kubeProxy:
  ## K3s kube-proxy is not running as a POD, so a
  ## ServiceMonitor and a headless service are generated.
  ## A headless service is needed so Prometheus can discover each of the endpoints behind the service.
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
  ## Since the metrics are not exposed by a POD, the service must be defined without a selector
  ## and the endpoints must be defined explicitly.
  enabled: true
  # endpoints: IP addresses of all cluster nodes (control plane and workers)
  endpoints:
    - ${K8S_CP_NODE_1}
    - ${K8S_CP_NODE_2}
    - ${K8S_CP_NODE_3}
    - ${K8S_WK_NODE_1}
    - ${K8S_WK_NODE_2}
    - ${K8S_WK_NODE_3}
  serviceMonitor:
    enabled: true

## Core DNS monitoring
##
coreDns:
  enabled: true
  # Creates headless service to get access to all coreDNS Pods
  service:
    enabled: true
    port: 9153
  # Enable service monitor
  serviceMonitor:
    enabled: true

What has been deployed by kube-prom-stack?

Applications

Prometheus Operator

The above installation procedure deploys the Prometheus Operator and creates the Prometheus and AlertManager resources, which make the operator deploy the corresponding Prometheus and AlertManager PODs (as StatefulSets).

Note that the final specification can be changed in the helm chart values (prometheus.prometheusSpec and alertmanager.alertmanagerSpec).

Prometheus Node Exporter

Node Exporter is a Prometheus exporter for hardware and OS metrics exposed by UNIX kernels, written in Go with pluggable metric collectors.

The Prometheus Node Exporter helm chart is deployed as a subchart of the kube-prometheus-stack helm chart. This chart deploys Prometheus Node Exporter on all cluster nodes as a DaemonSet.

The default kube-prometheus-stack Helm chart values.yaml file contains the default configuration for the Node Exporter Helm chart under the prometheus-node-exporter variable.

The default configuration just excludes several mount points and file system types from monitoring (extraArgs), and it creates the corresponding Prometheus Operator ServiceMonitor object to start scraping metrics from this exporter.

Prometheus-node-exporter's metrics are exposed on TCP port 9100 (/metrics endpoint) of each of the DaemonSet PODs.
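
To spot-check the exporter, port-forward to the DaemonSet and fetch the endpoint (a quick sketch; the DaemonSet is named node-exporter because of the fullnameOverride above):

kubectl port-forward -n kube-prom-stack ds/node-exporter 9100:9100 &
curl -s http://localhost:9100/metrics | head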

Kube State Metrics

kube-state-metrics (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects. KSM can be used to view metrics on deployments, nodes, pods, and more. KSM holds an entire snapshot of Kubernetes state in memory and continuously generates new metrics based off of it.

kube-state-metrics gathers data using the standard Kubernetes Go client and the Kubernetes API. This raw data is used to create a snapshot of the state of the objects in the Kubernetes cluster. From it, KSM generates Prometheus-compliant metrics that are exposed at the /metrics endpoint on port 8080.

kube-state-metrics-pipeline

The Kube State Metrics helm chart is deployed as a subchart of the kube-prometheus-stack helm chart. This chart deploys the kube-state-metrics agent. In kube-prometheus-stack's helm chart, the kube-state-metrics value is used to pass the configuration to kube-state-metrics' chart.
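
A similar spot-check works for kube-state-metrics, which serves its metrics on port 8080 of its Service (assuming the short service name configured earlier):

kubectl port-forward -n kube-prom-stack svc/kube-state-metrics 8080:8080 &
curl -s http://localhost:8080/metrics | grep kube_pod_status_phase | head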

Grafana

The Grafana helm chart is deployed by default as a subchart of the kube-prometheus-stack helm chart.

In kube-prometheus-stack's helm chart, the grafana value is used to pass the configuration to Grafana's chart.

By default, kube-prom-stack configures the following Grafana features:

  • Admin user password (grafana.adminPassword)
  • Grafana server running behind a proxy under a subpath (server settings in grafana.grafana.ini)
  • Provisioning sidecars for dynamic discovery of dashboards and datasources (grafana.sidecar)

Prometheus Operator Configuration

Prometheus Server

kube-prom-stack generates a Prometheus object, so the Prometheus Operator can deploy a Prometheus server declaratively, using prometheus.prometheusSpec defined in the Helm chart.

The generated resource can be obtained, after deploying the kube-prom-stack helm chart, with the following command:

kubectl get Prometheus kube-prometheus-stack -o yaml -n kube-prom-stack

The following is a sample file the command could generate:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: kube-prometheus-stack
  namespace: kube-prom-stack
spec:
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: kube-prometheus-stack-alertmanager
      namespace: kube-prom-stack
      pathPrefix: /alertmanager
      port: http-web
  enableAdminAPI: true
  enableFeatures:
  - memory-snapshot-on-shutdown
  evaluationInterval: 30s
  externalUrl: http://monitoring.${DOMAIN}/prometheus/
  image: quay.io/prometheus/prometheus:v${PROM_VERSION}
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  portName: http-web
  probeNamespaceSelector: {}
  probeSelector: {}
  replicas: 1
  resources:
    limits:
      memory: 2000Mi
    requests:
      cpu: 100m
  retention: 14d
  retentionSize: 50GB
  routePrefix: /prometheus
  ruleNamespaceSelector: {}
  ruleSelector: {}
  scrapeConfigNamespaceSelector: {}
  scrapeConfigSelector: {}
  scrapeInterval: 30s
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  serviceAccountName: kube-prometheus-stack-prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  shards: 1
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: longhorn
  version: ${PROM_VERSION}

This Prometheus object specifies the following Prometheus configuration:

  • Prometheus version and image installed (spec.version and spec.image). Prometheus version, ${PROM_VERSION} in the previous sample resource manifest file, depends on the kube-prom-stack release version.

  • HA Configuration. Number of shards and replicas per shard (spec.shards and spec.replicas).

    Prometheus' basic HA mechanism is implemented through replication: two (or more) instances (replicas) run with the same configuration, except that each has one external label with a different value to identify it. The Prometheus instances scrape the same targets and evaluate the same rules.

    There is an additional HA mechanism, Prometheus sharding, which splits the targets to be scraped into shards, each of which is assigned to a Prometheus server instance (or to a set of replicas).

    The main drawback of this sharding solution is that, to query all the data, a query federation layer (e.g. Thanos Query) and a distributed rule evaluation engine (e.g. Thanos Ruler) should be deployed.

    The number of shards matches the number of StatefulSet objects to be deployed, and the number of replicas is the number of PODs of each StatefulSet.

  • AlertManager server connected to this instance of Prometheus for performing the alerting (spec.alerting.alertmanagers). The connection parameters specified by default match the AlertManager object created by kube-prometheus-stack.

  • Default scrape interval, how often Prometheus scrapes targets (spec.scrapeInterval: 30s). It can be overridden in the particular configuration of each PodMonitor/ServiceMonitor/Probe.

  • Rules evaluation period, how often Prometheus evaluates rules (evaluationInterval: 30s)

  • Data retention policy (retention: 14d and retentionSize: 50GB)

  • Persistent volume specification (storage): volumeClaimTemplate used by the StatefulSet objects deployed. In my case, a volume claim from Longhorn.

  • Rules for filtering the Prometheus Operator resources (PodMonitor, ServiceMonitor, Probe and PrometheusRule) that apply to this particular instance of Prometheus server. Filtering rules include both <entity>NamespaceSelector and <entity>Selector, to filter, by matching namespaces and selectors, the resources this Prometheus server will take care of.

    | Resource | Namespace Selector | Selector Filter |
    |----------|--------------------|-----------------|
    | PodMonitor | spec.podMonitorNamespaceSelector | spec.podMonitorSelector |
    | ServiceMonitor | spec.serviceMonitorNamespaceSelector | spec.serviceMonitorSelector |
    | Probe | spec.probeNamespaceSelector | spec.probeSelector |
    | Rule | spec.ruleNamespaceSelector | spec.ruleSelector |
    | ScrapeConfig | spec.scrapeConfigNamespaceSelector | spec.scrapeConfigSelector |

    The following diagram, from official prometheus operator documentation, shows an example of how the filtering rules are applied. A Deployment and Service called my-app is being monitored by Prometheus based on a ServiceMonitor named my-service-monitor:

    prometheus-operator-filtering
    Source: Prometheus Operator Documentation

    By default, kube-prometheus-stack values.yaml includes a default filter rule for these objects (Namespace Selector filters are all null by default):

    <entity>Selector:
      matchLabels:
        release: <kube-prometheus-stack helm release name>
    

    With this rule, all PodMonitor/ServiceMonitor/Probe/PrometheusRule resources must have the label release: kube-prometheus-stack to be managed by the Prometheus server.
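
    For example, a custom ServiceMonitor defined outside the chart would only be discovered under these default filters if it carries that label (an illustrative snippet; the name is hypothetical):

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: my-app
      labels:
        release: kube-prometheus-stack   # required by the default selector filter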

    These default filters can be removed by providing the following values to the helm chart:

    prometheusSpec:
      ruleSelectorNilUsesHelmValues: false
      serviceMonitorSelectorNilUsesHelmValues: false
      podMonitorSelectorNilUsesHelmValues: false
      probeSelectorNilUsesHelmValues: false
      scrapeConfigSelectorNilUsesHelmValues: false
    
AlertManager Server

kube-prom-stack generates an Alertmanager object, so the Prometheus Operator can deploy an AlertManager server declaratively, using alertmanager.alertmanagerSpec defined in the Helm chart.

The generated resource can be obtained, after deploying the kube-prom-stack helm chart, with the following command:

kubectl get AlertManager kube-prometheus-stack -o yaml -n kube-prom-stack

The following is a sample file the command could generate:

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: kube-prometheus-stack
  namespace: kube-prom-stack
spec:
  alertmanagerConfigNamespaceSelector: {}
  alertmanagerConfigSelector: {}
  externalUrl: http://monitoring.${DOMAIN}/alertmanager/
  image: quay.io/prometheus/alertmanager:${ALERTMANAGER_VERSION}
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  portName: http-web
  replicas: 1
  retention: 120h
  routePrefix: /alertmanager
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  serviceAccountName: kube-prometheus-stack-alertmanager
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: longhorn
  version: ${ALERTMANAGER_VERSION}

This Alertmanager object specifies the following Alert Manager configuration:

  • AlertManager version and image (spec.version and spec.image). The AlertManager version, ${ALERTMANAGER_VERSION} in the previous sample resource manifest file, depends on the kube-prom-stack release version installed.

  • HA Configuration. Number of replicas (spec.replicas).

  • Data retention policy (retention: 120h)

  • Persistent volume specification (storage: volumeClaimTemplate) used by the StatefulSet objects deployed. In my case, a volume claim from Longhorn.

ServiceMonitor

kube-prometheus-stack creates several ServiceMonitor objects to start scraping metrics from all the applications deployed:

  • Node Exporter
  • Grafana
  • Kube-State-Metrics
  • Prometheus
  • AlertManager
  • Prometheus Operator

and, depending on the configuration of the helm chart, the following Kubernetes services and processes:

  • coreDNS
  • Kube API server
  • kubelet
  • Kube Controller Manager
  • Kubernetes Scheduler
  • Kubernetes etcd
  • Kube Proxy

The list can be obtained with the following command:

kubectl get ServiceMonitor -A
NAMESPACE         NAME                                            AGE
kube-prom-stack   grafana                                         91m
kube-prom-stack   kube-prometheus-stack-alertmanager              91m
kube-prom-stack   kube-prometheus-stack-apiserver                 91m
kube-prom-stack   kube-prometheus-stack-coredns                   91m
kube-prom-stack   kube-prometheus-stack-kube-controller-manager   91m
kube-prom-stack   kube-prometheus-stack-kube-etcd                 91m
kube-prom-stack   kube-prometheus-stack-kube-proxy                91m
kube-prom-stack   kube-prometheus-stack-kube-scheduler            91m
kube-prom-stack   kube-prometheus-stack-kubelet                   91m
kube-prom-stack   kube-prometheus-stack-operator                  91m
kube-prom-stack   kube-prometheus-stack-prometheus                91m
kube-prom-stack   kube-state-metrics                              91m
kube-prom-stack   node-exporter                                   91m
Headless Services

For monitoring the Kubernetes metrics endpoints exposed by the different nodes of the cluster, kube-prometheus-stack creates a set of Kubernetes headless services.

These services have spec.clusterIP: None, allowing Prometheus to discover each of the endpoints behind the service. Since the metrics are exposed not by a pod but by a Kubernetes process, each service is defined without a selector and its endpoints are defined explicitly.
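
The pattern looks like the following sketch (names and IPs are hypothetical), pairing a selector-less headless Service with an explicitly defined Endpoints object:

apiVersion: v1
kind: Service
metadata:
  name: example-kube-scheduler
  namespace: kube-system
spec:
  clusterIP: None    # headless service
  ports:
    - name: http-metrics
      port: 10259
      targetPort: 10259
---
apiVersion: v1
kind: Endpoints
metadata:
  # must have the same name as the Service
  name: example-kube-scheduler
  namespace: kube-system
subsets:
  - addresses:
      - ip: 192.168.1.11   # control plane node IPs
      - ip: 192.168.1.12
    ports:
      - name: http-metrics
        port: 10259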

kubectl get svc --field-selector spec.clusterIP=None -A
NAMESPACE         NAME                                            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                        AGE
kube-prom-stack   alertmanager-operated                           ClusterIP   None         <none>        9093/TCP,9094/TCP,9094/UDP     125m
kube-prom-stack   prometheus-operated                             ClusterIP   None         <none>        9090/TCP                       125m
kube-system       kube-prometheus-stack-coredns                   ClusterIP   None         <none>        9153/TCP                       125m
kube-system       kube-prometheus-stack-kube-controller-manager   ClusterIP   None         <none>        10257/TCP                      125m
kube-system       kube-prometheus-stack-kube-etcd                 ClusterIP   None         <none>        2381/TCP                       125m
kube-system       kube-prometheus-stack-kube-proxy                ClusterIP   None         <none>        10249/TCP                      125m
kube-system       kube-prometheus-stack-kube-scheduler            ClusterIP   None         <none>        10259/TCP                      125m
kube-system       kube-prometheus-stack-kubelet                   ClusterIP   None         <none>        10250/TCP,10255/TCP,4194/TCP   125m
Prometheus Rules

kube-prometheus-stack creates several PrometheusRule resources to specify the alerts and the metrics that Prometheus generates based on the scraped metrics (alerting and recording rules).

The rules provisioned can be found here: Prometheus rules created by kube-prometheus-stack chart.

kubectl get PrometheusRules -A
NAMESPACE         NAME                                                              AGE
kube-prom-stack   kube-prometheus-stack-alertmanager.rules                          95m
kube-prom-stack   kube-prometheus-stack-config-reloaders                            95m
kube-prom-stack   kube-prometheus-stack-etcd                                        95m
kube-prom-stack   kube-prometheus-stack-general.rules                               95m
kube-prom-stack   kube-prometheus-stack-k8s.rules.container-cpu-usage-seconds-tot   95m
kube-prom-stack   kube-prometheus-stack-k8s.rules.container-memory-cache            95m
kube-prom-stack   kube-prometheus-stack-k8s.rules.container-memory-rss              95m
kube-prom-stack   kube-prometheus-stack-k8s.rules.container-memory-swap             95m
kube-prom-stack   kube-prometheus-stack-k8s.rules.container-memory-working-set-by   95m
kube-prom-stack   kube-prometheus-stack-k8s.rules.container-resource                95m
kube-prom-stack   kube-prometheus-stack-k8s.rules.pod-owner                         95m
kube-prom-stack   kube-prometheus-stack-kube-apiserver-availability.rules           95m
kube-prom-stack   kube-prometheus-stack-kube-apiserver-burnrate.rules               95m
kube-prom-stack   kube-prometheus-stack-kube-apiserver-histogram.rules              95m
kube-prom-stack   kube-prometheus-stack-kube-apiserver-slos                         95m
kube-prom-stack   kube-prometheus-stack-kube-prometheus-general.rules               95m
kube-prom-stack   kube-prometheus-stack-kube-prometheus-node-recording.rules        95m
kube-prom-stack   kube-prometheus-stack-kube-scheduler.rules                        95m
kube-prom-stack   kube-prometheus-stack-kube-state-metrics                          95m
kube-prom-stack   kube-prometheus-stack-kubelet.rules                               95m
kube-prom-stack   kube-prometheus-stack-kubernetes-apps                             95m
kube-prom-stack   kube-prometheus-stack-kubernetes-resources                        95m
kube-prom-stack   kube-prometheus-stack-kubernetes-storage                          95m
kube-prom-stack   kube-prometheus-stack-kubernetes-system                           95m
kube-prom-stack   kube-prometheus-stack-kubernetes-system-apiserver                 95m
kube-prom-stack   kube-prometheus-stack-kubernetes-system-controller-manager        95m
kube-prom-stack   kube-prometheus-stack-kubernetes-system-kube-proxy                95m
kube-prom-stack   kube-prometheus-stack-kubernetes-system-kubelet                   95m
kube-prom-stack   kube-prometheus-stack-kubernetes-system-scheduler                 95m
kube-prom-stack   kube-prometheus-stack-node-exporter                               95m
kube-prom-stack   kube-prometheus-stack-node-exporter.rules                         95m
kube-prom-stack   kube-prometheus-stack-node-network                                95m
kube-prom-stack   kube-prometheus-stack-node.rules                                  95m
kube-prom-stack   kube-prometheus-stack-prometheus                                  95m
kube-prom-stack   kube-prometheus-stack-prometheus-operator                         95m

Grafana Configuration

DataSources

kube-prom-stack generates a ConfigMap containing Grafana's Prometheus and AlertManager datasources, so Grafana can dynamically import them using the provisioning sidecar.

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-prometheus-stack-grafana-datasource
  namespace: kube-prom-stack
  labels:
    grafana_datasource: "1"
data:
  datasource.yaml: |-
    apiVersion: 1
    datasources:
    - name: "Prometheus"
      type: prometheus
      uid: prometheus
      url: http://kube-prometheus-stack-prometheus.kube-prom-stack:9090/prometheus
      access: proxy
      isDefault: true
      jsonData:
        httpMethod: POST
        timeInterval: 30s
    - name: "Alertmanager"
      type: alertmanager
      uid: alertmanager
      url: http://kube-prometheus-stack-alertmanager.kube-prom-stack:9093/alertmanager
      access: proxy
      jsonData:
        handleGrafanaManagedAlerts: false
        implementation: prometheus
Dashboards

kube-prom-stack generates ConfigMaps containing Grafana dashboards for displaying metrics of the monitored services (Kubernetes, coreDNS, Node Exporter, Prometheus, Kube-State-Metrics, etc.).

The list of dashboards can be queried with the following command:

kubectl get cm -l grafana_dashboard  -n kube-prom-stack

For example:

kubectl get cm -l grafana_dashboard  -n kube-prom-stack 
NAME                                                      DATA   AGE
kube-prometheus-stack-alertmanager-overview               1      8m15s
kube-prometheus-stack-apiserver                           1      8m15s
kube-prometheus-stack-cluster-total                       1      8m15s
kube-prometheus-stack-controller-manager                  1      8m15s
kube-prometheus-stack-etcd                                1      8m15s
kube-prometheus-stack-grafana-overview                    1      8m15s
kube-prometheus-stack-k8s-coredns                         1      8m15s
kube-prometheus-stack-k8s-resources-cluster               1      8m15s
kube-prometheus-stack-k8s-resources-multicluster          1      8m15s
kube-prometheus-stack-k8s-resources-namespace             1      8m15s
kube-prometheus-stack-k8s-resources-node                  1      8m15s
kube-prometheus-stack-k8s-resources-pod                   1      8m15s
kube-prometheus-stack-k8s-resources-workload              1      8m15s
kube-prometheus-stack-k8s-resources-workloads-namespace   1      8m15s
kube-prometheus-stack-kubelet                             1      8m15s
kube-prometheus-stack-namespace-by-pod                    1      8m15s
kube-prometheus-stack-namespace-by-workload               1      8m15s
kube-prometheus-stack-node-cluster-rsrc-use               1      8m15s
kube-prometheus-stack-node-rsrc-use                       1      8m15s
kube-prometheus-stack-nodes                               1      8m15s
kube-prometheus-stack-nodes-aix                           1      8m15s
kube-prometheus-stack-nodes-darwin                        1      8m15s
kube-prometheus-stack-persistentvolumesusage              1      8m15s
kube-prometheus-stack-pod-total                           1      8m15s
kube-prometheus-stack-prometheus                          1      8m15s
kube-prometheus-stack-proxy                               1      8m15s
kube-prometheus-stack-scheduler                           1      8m15s
kube-prometheus-stack-workload-total                      1      8m15s

Additional Configuration

Installing Grafana separately

Grafana helm chart by default is deployed as a sub-chart of the kube-prometheus-stack helm chart.

Grafana can be installed outside Kube-Prom-Stack to have better control of the installation (version and configuration).

The following kube-prom-stack helm chart values.yaml disables the Grafana subchart installation (grafana.enabled: false). The creation of kube-prometheus-stack dashboards can be forced (grafana.forceDeployDashboards), so the ConfigMaps containing kube-prom-stack's dashboards are still deployed.

Also, an annotation can be added to all Grafana dashboard ConfigMaps, so Grafana can deploy them into a specific folder (grafana_folder annotation).

# kube-prometheus-stack helm values (disable-grafana)
# Disabling installation of Grafana sub-chart
grafana:
  enabled: false
  # Enable deployment of kube-prometheus-stack grafana dashboards
  forceDeployDashboards: true
  # Adding grafana folder annotation
  sidecar:
    dashboards:
      annotations:
        grafana_folder: Kubernetes

See “Grafana Kubernetes Installation” for installing Grafana separately and how to further configure it (integration with Keycloak for single sign-on, automating dashboard downloads from Grafana Labs, etc.).

K3S Monitoring configuration

K3s configuration

Enabling remote access to /metrics endpoints

By default, K3S components (Scheduler, Controller Manager and Proxy) do not expose their endpoints so metrics can be collected. Their /metrics endpoints are bound to 127.0.0.1, exposing them only to localhost and not allowing remote queries.

The following K3S installation arguments need to be provided to change this behavior:

--kube-controller-manager-arg 'bind-address=0.0.0.0'
--kube-proxy-arg 'metrics-bind-address=0.0.0.0'
--kube-scheduler-arg 'bind-address=0.0.0.0'
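
The same settings can also be provided through the K3s configuration file instead of command-line flags; a sketch assuming /etc/rancher/k3s/config.yaml is used on the server nodes:

# /etc/rancher/k3s/config.yaml (server nodes)
kube-controller-manager-arg:
  - bind-address=0.0.0.0
kube-proxy-arg:
  - metrics-bind-address=0.0.0.0
kube-scheduler-arg:
  - bind-address=0.0.0.0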

Enabling etcd metrics

In case etcd is used as the cluster database, the following argument has to be provided to the k3s control plane nodes:

--etcd-expose-metrics=true

K3S duplicate metrics issue

The K3S distribution has a special behavior related to metrics exposure.

K3s deploys a single process on each cluster node: k3s-server running on master nodes or k3s-agent running on worker nodes. All Kubernetes components running on the node share the same memory, and so K3s emits the same metrics on all /metrics endpoints available on a node: api-server, kubelet (TCP 10250), kube-proxy (TCP 10249), kube-scheduler (TCP 10259) and kube-controller-manager (TCP 10257). When polling one of the Kubernetes components' metrics endpoints, the metrics belonging to the other Kubernetes components are not filtered out.

A k3s master, running all Kubernetes components, emits the same metrics on all those ports. K3s workers, running only the kubelet and kube-proxy components, emit the same metrics on both TCP ports 10250 and 10249. On the other hand, the kubelet's additional metrics endpoints (/metrics/cadvisor, /metrics/resource and /metrics/probes) are only available on TCP 10250.

By default, kube-prometheus-stack enables the scraping of all Kubernetes metrics endpoints (TCP ports 10249, 10250, 10259, 10257 and the apiserver port), and that causes the ingestion of duplicated metrics. Duplicated metrics in Prometheus should be avoided, so memory and CPU consumption can be reduced.

Two possible solutions:

  1. Remove duplicate metrics in the Prometheus scraping configuration, discarding duplicated series
    • This solution avoids the ingestion of duplicates, but it does not avoid the overlapping scraping
    • The lack of documentation about the metrics exposed by each endpoint makes it difficult to configure the discarding metric rules.
  2. Disable scraping of most Kubernetes endpoints, keeping only kubelet port scraping (TCP 10250): /metrics, /metrics/cadvisor, /metrics/resource and /metrics/probes
    • This solution avoids both duplicated data ingestion and overlapping scraping
    • As a drawback, the default kube-prometheus-stack dashboards and Prometheus rules are not valid, since they use different job labels to identify metrics coming from the different endpoints. Dashboards and Prometheus rules need to be regenerated so the kubelet job name is used.

Solution: monitor only kubelet endpoints and re-build K3s-compliant dashboards and Prometheus rules.

Disabling kube-prom-stack K8s monitoring
grafana:
  # The default dashboards are not working for `k3s`, so we disable them.
  defaultDashboardsEnabled: false
defaultRules:
  # The default rules are not working for `k3s`, so we disable them.
  create: false
# Source for issues/solutions: https://github.com/k3s-io/k3s/issues/3619#issuecomment-1425852034
# `k3s` exposes all metrics combined for each component, so we don't need to scrape them separately
# We'll only scrape kubelet, otherwise we'd get duplicate metrics.
kubelet:
  enabled: true
# Kubernetes API server collects data from master nodes, while kubelet collects data from master and worker nodes
# To not duplicate metrics we'll only scrape Kubelet
kubeApiServer:
  enabled: false
kubeControllerManager:
  enabled: false
kubeProxy:
  enabled: false
kubeScheduler:
  enabled: false

With this configuration, kube-prom-stack does not install any Grafana dashboard (grafana.defaultDashboardsEnabled: false) or any Prometheus rule (defaultRules.create: false). Only kubelet endpoint monitoring is kept, disabling the monitoring of the rest of the Kubernetes components.

Creating Grafana and Prometheus rules from available mixins

The following process describes how to generate K3s-compliant Prometheus Monitoring Mixins1, replicating the building process of kube-prom-stack.

The kube-prometheus project uses monitoring mixins to generate alerts and dashboards. Monitoring mixins are a collection of Jsonnet libraries that generate dashboards and alerts for Kubernetes. The kubernetes-mixin is a mixin that generates dashboards and alerts for Kubernetes. The node-exporter, coredns, grafana, prometheus and prometheus-operator mixins are also used to generate dashboards and alerts for the Kubernetes cluster.

Using jsonnet, the Kubernetes dashboards and Prometheus rules can be generated from the mixins.

Instead of installing Go locally as described in Adin’s blog, we will generate a jsonnet development environment using Docker to build everything and extract the required YAML files.

The following steps create the directory structure and files shown below:

k3s-mixins
├── build
│   ├── Dockerfile
│   ├── Makefile
│   ├── out
│   └── src
│       ├── generate.sh
│       └── main.jsonnet
└── kustomization.yaml
  • Create the k3s-mixins building directories
    mkdir -p k3s-mixins/build
    mkdir -p k3s-mixins/build/out
    mkdir -p k3s-mixins/build/src
    
  • Create k3s-mixins/build/src/main.jsonnet

    # We use helper functions from kube-prometheus to generate dashboards and alerts for Kubernetes.
    local addMixin = (import 'kube-prometheus/lib/mixin.libsonnet');
        
    local kubernetesMixin = addMixin({
      name: 'kubernetes',
      dashboardFolder: 'Kubernetes',
      mixin: (import 'kubernetes-mixin/mixin.libsonnet') + {
        _config+:: {
          cadvisorSelector: 'job="kubelet"',
          kubeletSelector: 'job="kubelet"',
          kubeSchedulerSelector: 'job="kubelet"',
          kubeControllerManagerSelector: 'job="kubelet"',
          kubeApiserverSelector: 'job="kubelet"',
          kubeProxySelector: 'job="kubelet"',
          showMultiCluster: false,
        },
      },
    });
        
    local nodeExporterMixin = addMixin({
      name: 'node-exporter',
      dashboardFolder: 'General',
      mixin: (import 'node-mixin/mixin.libsonnet') + {
        _config+:: {
          nodeExporterSelector: 'job="node-exporter"',
          showMultiCluster: false,
        },
      },
    });
        
    local corednsMixin = addMixin({
      name: 'coredns',
      dashboardFolder: 'DNS',
      mixin: (import 'coredns-mixin/mixin.libsonnet') + {
        _config+:: {
          corednsSelector: 'job="coredns"',
        },
      },
    });
        
    local etcdMixin = addMixin({
      name: 'etcd',
      dashboardFolder: 'Kubernetes',
      mixin: (import 'github.com/etcd-io/etcd/contrib/mixin/mixin.libsonnet') + {
        _config+:: {
          clusterLabel: 'cluster',
        },
      },
    });
        
    local grafanaMixin = addMixin({
      name: 'grafana',
      dashboardFolder: 'Grafana',
      mixin: (import 'grafana-mixin/mixin.libsonnet') + {
        _config+:: {},
      },
    });
        
    local prometheusMixin = addMixin({
      name: 'prometheus',
      dashboardFolder: 'Prometheus',
      mixin: (import 'prometheus/mixin.libsonnet') + {
        _config+:: {
          showMultiCluster: false,
        },
      },
    });
        
    local prometheusOperatorMixin = addMixin({
      name: 'prometheus-operator',
      dashboardFolder: 'Prometheus Operator',
      mixin: (import 'prometheus-operator-mixin/mixin.libsonnet') + {
        _config+:: {},
      },
    });
        
    local stripJsonExtension(name) =
      local extensionIndex = std.findSubstr('.json', name);
      local n = if std.length(extensionIndex) < 1 then name else std.substr(name, 0, extensionIndex[0]);
      n;
        
    local grafanaDashboardConfigMap(folder, name, json) = {
      apiVersion: 'v1',
      kind: 'ConfigMap',
      metadata: {
        name: 'grafana-dashboard-%s' % stripJsonExtension(name),
        namespace: 'kube-prom-stack',
        labels: {
          grafana_dashboard: '1',
        },
      },
      data: {
        [name]: std.manifestJsonEx(json, '    '),
      },
    };
        
    local generateGrafanaDashboardConfigMaps(mixin) = if std.objectHas(mixin, 'grafanaDashboards') && mixin.grafanaDashboards != null then {
      ['grafana-dashboard-' + stripJsonExtension(name)]: grafanaDashboardConfigMap(folder, name, mixin.grafanaDashboards[folder][name])
      for folder in std.objectFields(mixin.grafanaDashboards)
      for name in std.objectFields(mixin.grafanaDashboards[folder])
    } else {};
        
    local nodeExporterMixinHelmGrafanaDashboards = generateGrafanaDashboardConfigMaps(nodeExporterMixin);
    local kubernetesMixinHelmGrafanaDashboards = generateGrafanaDashboardConfigMaps(kubernetesMixin);
    local corednsMixinHelmGrafanaDashboards = generateGrafanaDashboardConfigMaps(corednsMixin);
    local etcdMixinHelmGrafanaDashboards = generateGrafanaDashboardConfigMaps(etcdMixin);
    local grafanaMixinHelmGrafanaDashboards = generateGrafanaDashboardConfigMaps(grafanaMixin);
    local prometheusMixinHelmGrafanaDashboards = generateGrafanaDashboardConfigMaps(prometheusMixin);
    local prometheusOperatorMixinHelmGrafanaDashboards = generateGrafanaDashboardConfigMaps(prometheusOperatorMixin);
        
    local grafanaDashboards =
      kubernetesMixinHelmGrafanaDashboards +
      nodeExporterMixinHelmGrafanaDashboards +
      corednsMixinHelmGrafanaDashboards +
      etcdMixinHelmGrafanaDashboards +
      grafanaMixinHelmGrafanaDashboards +
      prometheusMixinHelmGrafanaDashboards +
      prometheusOperatorMixinHelmGrafanaDashboards;
        
        
    local prometheusAlerts = {
      'kubernetes-mixin-rules': kubernetesMixin.prometheusRules,
      'node-exporter-mixin-rules': nodeExporterMixin.prometheusRules,
      'coredns-mixin-rules': corednsMixin.prometheusRules,
      'etcd-mixin-rules': etcdMixin.prometheusRules,
      'grafana-mixin-rules': grafanaMixin.prometheusRules,
      'prometheus-mixin-rules': prometheusMixin.prometheusRules,
      'prometheus-operator-mixin-rules': prometheusOperatorMixin.prometheusRules,
    };
        
    grafanaDashboards + prometheusAlerts
    
  • Create script (k3s-mixins/build/src/generate.sh) to automate the generation of the yaml files from the mixins

    #!/bin/sh
        
    set -e # Exit on any error
    set -u # Treat unset variables as an error
        
    # Define paths
    MIXINS_DIR="./templates"
        
    # Function to escape YAML content
    escape_yaml() {
      local file_path="$1"
      echo "Escaping $file_path..."
      # Read the file content, process, and overwrite it
      sed -i \
        -e 's/{{/{{`{{/g' \
        -e 's/}}/}}`}}/g' \
        -e 's/{{`{{/{{`{{`}}/g' \
        -e 's/}}`}}/{{`}}`}}/g' \
        "$file_path"
      echo "Escaped $file_path."
    }
        
    # Clean the templates directory
    echo "Cleaning templates directory..."
    rm -rf ${MIXINS_DIR}/*
    echo "Templates directory cleaned."
        
    # Convert Jsonnet to YAML
    echo "Converting Jsonnet to YAML..."
    jsonnet main.jsonnet -J vendor -m ${MIXINS_DIR} | xargs -I{} sh -c 'cat {} | gojsontoyaml > {}.yaml' -- {}
    echo "Jsonnet conversion completed."
        
    # Remove all non-YAML files
    echo "Removing non-YAML files..."
    find ${MIXINS_DIR} -type f ! -name "*.yaml" -exec rm {} +
    echo "Non-YAML files removed."
        
    # Escape brackets in the rules yaml files similar to how the kube-prometheus-stack Helm chart does.
    # https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/hack/sync_prometheus_rules.py#L259-L260
    echo "Escaping YAML files..."
    find ${MIXINS_DIR} -name '*-rules.yaml' | while read -r file; do
      escape_yaml "$file"
    done
    echo "YAML files escaped."
        
    echo "Processing completed successfully!"
    
  • Create Dockerfile (k3s-mixins/build/Dockerfile) to build and extract the generated yaml files

    FROM golang:1.24.2-alpine AS build
    LABEL stage=builder
        
    WORKDIR /k3s-mixins
        
    COPY src/ .
        
    # Install required packages
    RUN apk add git
        
    # Install jsonnet and the jsonnet-bundler
    RUN go install github.com/google/go-jsonnet/cmd/jsonnet@latest
    RUN go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@latest
        
    # Install gojsontoyaml
    RUN go install github.com/brancz/gojsontoyaml@latest
        
    # Init Jsonnet project
    RUN jb init
        
    # Install mixins
    RUN jb install github.com/kubernetes-monitoring/kubernetes-mixin@master
    RUN jb install github.com/prometheus-operator/kube-prometheus/jsonnet/kube-prometheus@main
    RUN jb install github.com/povilasv/coredns-mixin@master
           
    # Create output directory for the manifest files
    RUN mkdir templates
        
    # Execute command to generate
    RUN chmod +x generate.sh
    RUN ./generate.sh
        
    FROM scratch AS mixins
    COPY --from=build /k3s-mixins/templates /    
    
  • Execute docker build command within k3s-mixins/build directory to extract dashboards and rule files to out directory
    cd k3s-mixins/build
    
    docker build --no-cache --target mixins --output out/ .
    
  • Go to the build/out directory and apply all manifest files

    kubectl apply -f .
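
    To verify the import, list the generated objects (the ConfigMaps' namespace follows the main.jsonnet above; rules may land in other namespaces, hence -A):

    kubectl get cm -l grafana_dashboard -n kube-prom-stack
    kubectl get prometheusrules -A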
    

  1. A monitoring mixin is a set of Grafana dashboards and Prometheus rules and alerts, packaged together in a reusable and extensible bundle. Mixins are written in jsonnet, and are typically installed and updated with jsonnet-bundler.

    For more information about mixins, see the Prometheus Monitoring Mixins documentation.


Last Update: Jun 23, 2025
