Monitoring (Prometheus)

Installing the Prometheus stack on Kubernetes with the Prometheus Operator can be streamlined by using the kube-prometheus project, maintained by the community.

That project collects Kubernetes manifests, Grafana dashboards, and Prometheus rules, combined with documentation and scripts, to provide easy-to-operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.

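Components included in the kube-prometheus-stack package, each described in the sections below, are the Prometheus Operator, Prometheus, AlertManager, Grafana, Prometheus Node Exporter and kube-state-metrics.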

This stack is meant for cluster monitoring, so it is pre-configured to collect metrics from all Kubernetes components.

The architecture of the deployed components is shown in the following image.

kube-prometheus-stack

About Prometheus Operator

The Prometheus Operator manages Prometheus and AlertManager deployments and their configuration through Kubernetes CRDs (Custom Resource Definitions):

  • Prometheus and AlertManager CRDs: declaratively define a desired Prometheus/AlertManager setup to run in a Kubernetes cluster. They provide options to configure the number of replicas and persistent storage.
  • ServiceMonitor/PodMonitor/Probe CRDs: manage Prometheus service discovery configuration, defining how a dynamic set of services/pods/static targets should be monitored.
  • PrometheusRule CRD: defines Prometheus’ alerting and recording rules. Alerting rules define the alert conditions to be notified (via AlertManager); recording rules allow Prometheus to precompute frequently needed or computationally expensive expressions and save their result as a new set of time series.
  • AlertmanagerConfig CRD: defines Alertmanager configuration, allowing routing of alerts to custom receivers and setting inhibition rules.

Kube-Prometheus Stack installation

Helm chart installation

The kube-prometheus stack can be installed using the kube-prometheus-stack Helm chart, maintained by the community.

  • Step 1: Add the Prometheus repository

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    
  • Step 2: Fetch the latest charts from the repository

    helm repo update
    
  • Step 3: Create namespace

    kubectl create namespace k3s-monitoring
    
  • Step 4: Create values.yml

    prometheusOperator:
      # Relabeling job name for operator metrics
      serviceMonitor:
        relabelings:
        # Replace job value
        - sourceLabels:
          - __address__
          action: replace
          targetLabel: job
          replacement: prometheus-operator
      # Disable creation of kubelet service
      kubeletService:
        enabled: false
    alertmanager:
      alertmanagerSpec:
        # Subpath /alertmanager configuration
        externalUrl: http://monitoring.picluster.ricsanfre.com/alertmanager/
        routePrefix: /
        # PVC configuration
        storage:
          volumeClaimTemplate:
            spec:
              storageClassName: longhorn
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 50Gi
      # ServiceMonitor job relabel
      serviceMonitor:
        relabelings:
          # Replace job value
          - sourceLabels:
            - __address__
            action: replace
            targetLabel: job
            replacement: alertmanager
    prometheus:
      prometheusSpec:
        # Subpath /prometheus configuration
        externalUrl: http://monitoring.picluster.ricsanfre.com/prometheus/
        routePrefix: /
        # Resources request and limits
        resources:
          requests:
            memory: 1Gi
          limits:
            memory: 1Gi
        # PVC configuration
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: longhorn
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 50Gi
      # ServiceMonitor job relabel
      serviceMonitor:
        relabelings:
          # Replace job value
          - sourceLabels:
            - __address__
            action: replace
            targetLabel: job
            replacement: prometheus
    grafana:
      # Configuring /grafana subpath
      grafana.ini:
        server:
          domain: monitoring.picluster.ricsanfre.com
          root_url: "%(protocol)s://%(domain)s:%(http_port)s/grafana/"
          serve_from_sub_path: true
      # Admin user password
      adminPassword: "admin_password"
      # List of grafana plugins to be installed
      plugins:
        - grafana-piechart-panel
      # ServiceMonitor label and job relabel
      serviceMonitor:
        labels:
          release: kube-prometheus-stack
        relabelings:
          # Replace job value
          - sourceLabels:
            - __address__
            action: replace
            targetLabel: job
            replacement: grafana
    # Disabling monitoring of K8s services.
    # Monitoring of K3S components will be configured out of kube-prometheus-stack
    kubelet:
      enabled: false
    kubeApiServer:
      enabled: false
    kubeControllerManager:
      enabled: false
    kubeScheduler:
      enabled: false
    kubeProxy:
      enabled: false
    kubeEtcd:
      enabled: false
    # Disable K8S Prometheus Rules
    # Rules for K3S components will be configured out of kube-prometheus-stack
    defaultRules:
      create: true
      rules:
        etcd: false
        k8s: false
        kubeApiserverAvailability: false
        kubeApiserverBurnrate: false
        kubeApiserverHistogram: false
        kubeApiserverSlos: false
        kubeControllerManager: false
        kubelet: false
        kubeProxy: false
        kubernetesApps: false
        kubernetesResources: false
        kubernetesStorage: false
        kubernetesSystem: false
        kubeScheduler: false
    

    The above chart values.yml:

    • Configures the persistent volumes of the AlertManager and Prometheus PODs to use Longhorn (alertmanager.alertmanagerSpec.storage.volumeClaimTemplate and prometheus.prometheusSpec.storageSpec.volumeClaimTemplate).

    • Configures Prometheus and AlertManager to run behind an HTTP proxy under the subpaths /prometheus and /alertmanager (prometheus.prometheusSpec.externalUrl/alertmanager.alertmanagerSpec.externalUrl and prometheus.prometheusSpec.routePrefix/alertmanager.alertmanagerSpec.routePrefix).

    • Sets memory resource requests and limits for the Prometheus POD (prometheus.prometheusSpec.resources).

    • Sets Grafana-specific configuration (admin password, grafana.adminPassword, and list of plugins to be installed, grafana.plugins).

    • Configures Grafana to run behind an HTTP proxy under the subpath /grafana (grafana.grafana.ini.server).

    • Disables monitoring of Kubernetes components (apiserver, etcd, kube-scheduler, kube-controller-manager, kube-proxy and kubelet): kubeApiServer.enabled, kubeControllerManager.enabled, kubeScheduler.enabled, kubeProxy.enabled, kubelet.enabled and kubeEtcd.enabled.

      Monitoring of K3s components will be configured outside kube-prometheus-stack. See the explanation in the section “K3S components monitoring” below.

    • Sets specific configuration for the ServiceMonitor objects associated with Prometheus, AlertManager, Prometheus Operator and Grafana monitoring.

      It relabels the job name (grafana.serviceMonitor.relabelings, prometheus.serviceMonitor.relabelings, alertmanager.serviceMonitor.relabelings and prometheusOperator.serviceMonitor.relabelings) and sets the proper label on Grafana’s ServiceMonitor (grafana.serviceMonitor.labels.release) so that it matches the selector used by the Prometheus Operator (otherwise Grafana is not monitored).

  • Step 5: Install kube-prometheus-stack in the k3s-monitoring namespace with the overridden values

    helm install -f values.yml kube-prometheus-stack prometheus-community/kube-prometheus-stack --namespace k3s-monitoring
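
The relabelings entries in values.yml above all use the replace action to overwrite the job label with a fixed value. As a rough illustration of what that action does, the following Python sketch models it in a simplified form. The function name apply_replace and the sample target labels are made up for this example, and the code is only an approximation of the documented behaviour, not the Prometheus implementation.

```python
import re


def apply_replace(labels, source_labels, target_label, replacement,
                  regex="(.*)", separator=";"):
    """Simplified model of a Prometheus `replace` relabeling step."""
    # Concatenate the values of the source labels with the separator.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # The regex is anchored: it has to match the whole concatenated value.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels are returned unchanged
    # Expand $1, $2, ... references to capture groups in the replacement.
    expanded = re.sub(r"\$(\d+)",
                      lambda m: match.group(int(m.group(1))),
                      replacement)
    result = dict(labels)
    result[target_label] = expanded
    return result


# Hypothetical target discovered through Grafana's ServiceMonitor.
target = {"__address__": "10.42.0.15:3000",
          "job": "kube-prometheus-stack-grafana"}

relabeled = apply_replace(
    target,
    source_labels=["__address__"],
    target_label="job",
    replacement="grafana",
)
print(relabeled["job"])
```

Because the default regex matches any address and the replacement contains no capture-group reference, every target of the ServiceMonitor ends up with the same fixed job value.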
    

Ingress resources configuration

Enable external access to Prometheus, Grafana and AlertManager through Ingress Controller.

Instead of using separate DNS domains to access the three components, Prometheus, Alertmanager and Grafana are configured to run behind the Traefik HTTP proxy using a single domain, monitoring.picluster.ricsanfre.com, with a different subpath for each component:

  • Grafana: https://monitoring.picluster.ricsanfre.com/grafana
  • Prometheus: https://monitoring.picluster.ricsanfre.com/prometheus
  • Alertmanager: https://monitoring.picluster.ricsanfre.com/alertmanager

The DNS domain monitoring.picluster.ricsanfre.com must be mapped, in the cluster DNS server configuration, to the external IP of the Traefik Load Balancer service.

The Prometheus, Grafana and AlertManager backends only serve plain HTTP traffic, so the Ingress resources will be configured to enable HTTPS (Traefik TLS end-point) and to redirect all HTTP traffic to HTTPS.

Since the Prometheus and AlertManager frontends do not provide any authentication mechanism, Traefik HTTP basic authentication will be configured.

  • Step 1. Create TLS secret resource for monitoring.picluster.ricsanfre.com

    Cert-manager will be used to automatically generate the required TLS secret.

    Create the manifest file monitor-cert.yml

    ---
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: monitoring-cert
      namespace: k3s-monitoring
    spec:
      secretName: monitoring-secret
      issuerRef:
        name: ca-issuer
        kind: ClusterIssuer
      commonName: monitoring.picluster.ricsanfre.com
      dnsNames:
      - monitoring.picluster.ricsanfre.com
      privateKey:
        algorithm: ECDSA
    

    The monitoring-cert Certificate and the TLS Secret monitoring-secret will be created for the monitoring.picluster.ricsanfre.com domain.

  • Step 2. Create Stripping Prefix Middleware

    The Traefik ingress route needs to be configured with a stripPrefix middleware, which removes the path prefix before sending the request to the backend service.

    ---
    # Strip prefix middleware
    
    apiVersion: traefik.containo.us/v1alpha1
    kind: Middleware
    metadata:
      name: stripprefix
      namespace: k3s-monitoring
    spec:
      stripPrefix:
        prefixes:
          - "/prometheus"
          - "/alertmanager"
          - "/grafana"
        forceSlash: false
    
  • Step 3. Create IngressRoute resource to redirect all HTTP traffic to HTTPS

    ---
    # IngressRoute http redirect
    apiVersion: traefik.containo.us/v1alpha1
    kind: IngressRoute
    metadata:
      name: monitoring-http
      namespace: k3s-monitoring
    spec:
      entryPoints:
        - web
      routes:
      - kind: Rule
        match: Host(`monitoring.picluster.ricsanfre.com`)
        priority: 1
        middlewares:
          - name: redirect
            namespace: traefik-system
        services:
          - kind: TraefikService
            name: noop@internal
    

    This resource uses the redirect Middleware created during Traefik installation. This middleware redirects all HTTP traffic to HTTPS.

  • Step 4. Create IngressRoute listening to HTTPS traffic

    ---
    # IngressRoute https
    apiVersion: traefik.containo.us/v1alpha1
    kind: IngressRoute
    metadata:
      name: monitoring-https
      namespace: k3s-monitoring
    spec:
      entryPoints:
        - websecure
      routes:
      - kind: Rule
        match: Host(`monitoring.picluster.ricsanfre.com`) && PathPrefix(`/prometheus`)
        services:
        - name: kube-prometheus-stack-prometheus
          port: 9090
          namespace: k3s-monitoring
        middlewares:
          - name: basic-auth
            namespace: traefik-system
          - name: stripprefix
            namespace: k3s-monitoring
      - kind: Rule
        match: Host(`monitoring.picluster.ricsanfre.com`) && PathPrefix(`/alertmanager`)
        services:
        - name: kube-prometheus-stack-alertmanager
          port: 9093
          namespace: k3s-monitoring
        middlewares:
          - name: basic-auth
            namespace: traefik-system
          - name: stripprefix
            namespace: k3s-monitoring
      - kind: Rule
        match: Host(`monitoring.picluster.ricsanfre.com`) && PathPrefix(`/grafana`)
        services:
        - name: kube-prometheus-stack-grafana
          port: 80
          namespace: k3s-monitoring
        middlewares:
          - name: stripprefix
            namespace: k3s-monitoring
      tls:
        secretName: monitoring-secret
    

    This resource uses the following Middlewares:

    • basic-auth Middleware created during Traefik installation. This middleware provides HTTP basic authentication.

    • stripprefix Middleware defined in Step 2.

    It uses the TLS secret, monitoring-secret defined in Step 1.

  • Step 5. Create a manifest file monitoring_ingress.yml with the previous resources and apply the manifest file

    kubectl apply -f monitoring_ingress.yml
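
To make the effect of the stripprefix Middleware from Step 2 more concrete, the following Python sketch models how the request path is rewritten before it reaches the backend. It is a simplified approximation of Traefik's documented stripPrefix behaviour with forceSlash disabled, not Traefik's code, and the function name strip_prefix and the sample paths are illustrative only.

```python
PREFIXES = ["/prometheus", "/alertmanager", "/grafana"]


def strip_prefix(path, prefixes, force_slash=False):
    """Return (new_path, stripped_prefix); stripped_prefix is None if nothing matched."""
    for prefix in prefixes:
        if path.startswith(prefix):
            rest = path[len(prefix):]
            if force_slash:
                # forceSlash: the result always starts with a single "/".
                rest = "/" + (rest[1:] if rest.startswith("/") else rest)
            elif rest and not rest.startswith("/"):
                # Without forceSlash, an empty remainder stays empty.
                rest = "/" + rest
            # Traefik passes the removed prefix on in the
            # X-Forwarded-Prefix request header.
            return rest, prefix
    return path, None


for request_path in ["/prometheus/graph",
                     "/alertmanager/api/v2/status",
                     "/grafana",
                     "/other"]:
    print(request_path, "->", strip_prefix(request_path, PREFIXES))
```

With this model, a request for /prometheus/graph reaches the Prometheus service as /graph, which is consistent with routePrefix: / in the chart values, while paths that match none of the prefixes are passed through unchanged.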
    

What has been deployed by kube-prometheus-stack?

Prometheus Operator

The above installation procedure deploys the Prometheus Operator and creates the required Prometheus and AlertManager objects, which cause the operator to deploy the corresponding Prometheus and AlertManager PODs (as StatefulSets).

Note that the final specification can be changed in the helm chart values (prometheus.prometheusSpec and alertmanager.alertmanagerSpec).

Prometheus Object

This object contains the desired configuration of the Prometheus server:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  annotations:
    meta.helm.sh/release-name: kube-prometheus-stack
    meta.helm.sh/release-namespace: k3s-monitoring
  labels:
    app: kube-prometheus-stack-prometheus
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 39.2.1
    chart: kube-prometheus-stack-39.2.1
    heritage: Helm
    release: kube-prometheus-stack
  name: kube-prometheus-stack-prometheus
  namespace: k3s-monitoring
spec:
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: kube-prometheus-stack-alertmanager
      namespace: k3s-monitoring
      pathPrefix: /
      port: http-web
  enableAdminAPI: false
  evaluationInterval: 30s
  externalUrl: http://kube-prometheus-stack-prometheus.k3s-monitoring:9090
  image: quay.io/prometheus/prometheus:v2.37.0
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      release: kube-prometheus-stack
  portName: http-web
  probeNamespaceSelector: {}
  probeSelector:
    matchLabels:
      release: kube-prometheus-stack
  replicas: 1
  retention: 10d
  routePrefix: /
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      release: kube-prometheus-stack
  scrapeInterval: 30s
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: kube-prometheus-stack-prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: kube-prometheus-stack
  shards: 1
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: longhorn
  version: v2.37.0

This Prometheus object specifies the following Prometheus configuration:

  • Prometheus version and image installed (v2.37.0) (spec.version and spec.image).

  • HA Configuration. Number of shards and replicas per shard (spec.shards and spec.replicas).

    Prometheus basic HA mechanism is implemented through replication. Two (or more) instances (replicas) need to be running with the same configuration except that they will have one external label with a different value to identify them. The Prometheus instances scrape the same targets and evaluate the same rules.

    There is an additional HA mechanism, Prometheus sharding, which splits the targets to be scraped into shards; each shard is assigned to a Prometheus server instance (or to a set of replicas).

    The main drawback of sharding is that, to query all the data, query federation (e.g. Thanos Query) and a distributed rule evaluation engine (e.g. Thanos Ruler) should be deployed.

    The number of shards matches the number of StatefulSet objects to be deployed, and the number of replicas is the number of PODs in each StatefulSet.

  • AlertManager server connected to this instance of Prometheus for performing the alerting (spec.alerting.alertmanagers). The connection parameters specified by default match the AlertManager object created by kube-prometheus-stack.

  • Default scrape interval, how often Prometheus scrapes targets (spec.scrapeInterval: 30s). It can be overridden in the configuration of a particular PodMonitor/ServiceMonitor/Probe.

  • Rules evaluation period, how often Prometheus evaluates rules (evaluationInterval: 30s)

  • Data retention policy (retention: 10d)

  • Persistent volume specification (storage: volumeClaimTemplate:) used by the Statefulset objects deployed. In my case volume claim from Longhorn.

  • Rules for filtering the objects (PodMonitor, ServiceMonitor, Probe and PrometheusRule) that apply to this particular Prometheus instance: spec.podMonitorSelector, spec.serviceMonitorSelector, spec.probeSelector and spec.ruleSelector each define a filtering rule (objects must include the label release: kube-prometheus-stack).

    The following diagram, from official prometheus operator documentation, shows an example of how the filtering rules are applied. A Deployment and Service called my-app is being monitored by Prometheus based on a ServiceMonitor named my-service-monitor:

    prometheus-operator-crds
    Source: Prometheus Operator Documentation
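
The label-based filtering described above can be illustrated with a small Python sketch. It models only the matchLabels part of a Kubernetes label selector (matchExpressions is left out), and the ServiceMonitor names and labels are made-up examples rather than objects taken from the cluster.

```python
def matches(selector, labels):
    """True if `labels` contain every key/value pair listed under matchLabels.

    A selector without matchLabels (for example {}) matches every object.
    """
    wanted = selector.get("matchLabels", {})
    return all(labels.get(key) == value for key, value in wanted.items())


# Selector taken from spec.serviceMonitorSelector of the Prometheus object.
service_monitor_selector = {"matchLabels": {"release": "kube-prometheus-stack"}}

# Hypothetical ServiceMonitor objects, reduced to their name and labels.
service_monitors = [
    {"name": "kube-prometheus-stack-grafana",
     "labels": {"release": "kube-prometheus-stack"}},
    {"name": "my-service-monitor",
     "labels": {"app": "my-app"}},
    {"name": "my-labelled-service-monitor",
     "labels": {"app": "my-app", "release": "kube-prometheus-stack"}},
]

selected = [sm["name"] for sm in service_monitors
            if matches(service_monitor_selector, sm["labels"])]
print(selected)
```

In this model a ServiceMonitor without the release: kube-prometheus-stack label is ignored by this Prometheus instance, which is why the chart values set that label explicitly on Grafana's ServiceMonitor.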

AlertManager Object

This object contains the desired configuration of the AlertManager server:

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  annotations:
    meta.helm.sh/release-name: kube-prometheus-stack
    meta.helm.sh/release-namespace: k3s-monitoring
  labels:
    app: kube-prometheus-stack-alertmanager
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 39.4.0
    chart: kube-prometheus-stack-39.4.0
    heritage: Helm
    release: kube-prometheus-stack
  name: kube-prometheus-stack-alertmanager
  namespace: k3s-monitoring
spec:
  alertmanagerConfigNamespaceSelector: {}
  alertmanagerConfigSelector: {}
  externalUrl: http://kube-prometheus-stack-alertmanager.k3s-monitoring:9093
  image: quay.io/prometheus/alertmanager:v0.24.0
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  portName: http-web
  replicas: 1
  retention: 120h
  routePrefix: /
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: kube-prometheus-stack-alertmanager
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: longhorn
  version: v0.24.0

This AlertManager object specifies the following AlertManager configuration:

  • A version and image: v0.24.0 (spec.version and spec.image)

  • HA Configuration. Number of replicas (spec.replicas).

  • Data retention policy (retention: 120h)

  • Persistent volume specification (storage: volumeClaimTemplate:) used by the Statefulset objects deployed. In my case volume claim from Longhorn.

ServiceMonitor Objects

kube-prometheus-stack creates several ServiceMonitor objects to start scraping metrics from all the components deployed:

  • Node Exporter
  • Grafana
  • Kube-State-Metrics
  • Prometheus
  • AlertManager
  • Prometheus Operator

and the following Kubernetes services and processes depending on the configuration of the helm chart.

  • coreDNS
  • Kube Api server
  • kubelet
  • Kube Controller Manager
  • Kubernetes Scheduler
  • Kubernetes etcd
  • Kube Proxy

In the chart configuration, monitoring of the kube-apiserver, etcd, kube-controller-manager, kube-scheduler, kube-proxy and kubelet components has been disabled. Only the monitoring of the coreDNS component remains enabled.

See the section “K3S components monitoring” below for the reason why monitoring of Kubernetes components has been disabled in kube-prometheus-stack and for how to configure the monitoring of K3s manually.

PrometheusRule Objects

kube-prometheus-stack creates several PrometheusRule objects that specify the alerts and the new metrics that Prometheus generates from the scraped metrics (alerting and recording rules).

The rules provisioned can be found here: Prometheus rules created by kube-prometheus-stack chart.

Since monitoring of K8S components (kube-controller-manager, kube-scheduler, kube-proxy, kubelet) has been disabled in the chart configuration, the corresponding PrometheusRule objects are not created.

See below section, “K3S components monitoring”, to know how to configure manually those rules.

Grafana

Grafana helm chart is deployed as a subchart of the kube-prometheus-stack helm chart.

Kube-prometheus-stack’s helm chart grafana value is used to pass the configuration to grafana’s chart.

In my case, on top of the default values.yml, only the admin password and a specific plugin have been specified. The grafana-piechart-panel plugin is needed by Traefik’s dashboard, which will be deployed later.

grafana:
  # Admin user password
  adminPassword: "admin_password"
  # List of grafana plugins to be installed
  plugins:
    - grafana-piechart-panel

Provisioning Dashboards automatically

Grafana dashboards can be configured through provider definitions (yaml files) located in a provisioning directory (/etc/grafana/provisioning/dashboards). This yaml file contains the directory from where dashboards in json format can be loaded. See Grafana Tutorial: Provision dashboards and data sources

When Grafana is deployed in Kubernetes using the helm chart, dashboards can be automatically provisioned enabling a sidecar container provisioner.

The Grafana helm chart creates the following /etc/grafana/provisioning/dashboards/provider.yml file, which makes Grafana load all json dashboards from /tmp/dashboards:

apiVersion: 1
providers:
- name: 'sidecarProvider'
  orgId: 1
  folder: ''
  type: file
  disableDeletion: false
  allowUiUpdates: false
  updateIntervalSeconds: 30
  options:
    foldersFromFilesStructure: false
    path: /tmp/dashboards

With this sidecar provider enabled, Grafana dashboards can be provisioned automatically creating ConfigMap resources containing the dashboard json definition. A provisioning sidecar container must be enabled in order to look for those ConfigMaps in real time and automatically copy them to the provisioning directory (/tmp/dashboards).

Check out “Grafana chart documentation: Sidecar for Dashboards” explaining how to enable/use dashboard provisioning side-car.

By default, kube-prometheus-stack configures the Grafana provisioning sidecar to check only for new ConfigMaps containing the label grafana_dashboard.

These are the default helm chart values configuring the sidecar:

grafana:
  sidecar:
    dashboards:
      SCProvider: true
      annotations: {}
      defaultFolderName: null
      enabled: true
      folder: /tmp/dashboards
      folderAnnotation: null
      label: grafana_dashboard
      labelValue: null

To provision a new dashboard automatically, a new ConfigMap resource must be created, labeled with grafana_dashboard: "1" and containing the json file content as data.

apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-grafana-dashboard
  labels:
    grafana_dashboard: "1"
data:
  dashboard.json: |-
    [json_file_content]
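
When there are many dashboards, such ConfigMaps can also be generated from the exported JSON files instead of being written by hand. The following Python sketch shows one possible way to do it, under the assumption that the sidecar keeps its default label (grafana_dashboard); the helper name dashboard_configmap and the placeholder dashboard are hypothetical.

```python
import json


def dashboard_configmap(name, namespace, dashboard,
                        label="grafana_dashboard", label_value="1"):
    """Wrap a Grafana dashboard (a dict) in a ConfigMap manifest for the sidecar."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Label the dashboard sidecar looks for.
            "labels": {label: label_value},
        },
        # The dashboard definition is stored as a JSON string under one key.
        "data": {"dashboard.json": json.dumps(dashboard, indent=2)},
    }


# Placeholder dashboard; a real one would be exported from Grafana as JSON.
sample_dashboard = {"title": "Sample dashboard", "panels": []}

manifest = dashboard_configmap("sample-grafana-dashboard",
                               "k3s-monitoring",
                               sample_dashboard)
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts manifests in JSON as well as YAML, the printed output could be piped to kubectl apply -f - to create the ConfigMap.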

Following this procedure, the kube-prometheus-stack helm chart automatically deploys a set of dashboards for monitoring metrics coming from Kubernetes processes and from Node Exporter. See the list of kube-prometheus-stack Grafana dashboards.

For each dashboard a ConfigMap containing the json definition is created.

For the disabled K8s components, kube-prometheus-stack does not deploy the corresponding dashboards, so they need to be added manually. See the section “K3S components monitoring” below for how to add those dashboards.

All the dashboard ConfigMaps can be listed by running the following command:

kubectl get cm -l "grafana_dashboard=1" -n k3s-monitoring

Provisioning DataSources automatically

Grafana datasources can be configured through yml files located in a provisioning directory (/etc/grafana/provisioning/datasources). See Grafana Tutorial: Provision dashboards and data sources

When deploying Grafana in Kubernetes, datasources config files can be imported from ConfigMaps. This is implemented by a sidecar container that copies these ConfigMaps to its provisioning directory.

Check out “Grafana chart documentation: Sidecar for Datasources” explaining how to enable/use this sidecar container.

kube-prometheus-stack enables by default grafana datasource sidecar to check for new ConfigMaps containing label grafana_datasource.

sidecar:
  datasources:
    enabled: true
    defaultDatasourceEnabled: true
    uid: prometheus
    annotations: {}
    createPrometheusReplicasDatasources: false
    label: grafana_datasource
    labelValue: "1"
    exemplarTraceIdDestinations: {}

This is the ConfigMap, automatically created by kube-prometheus-stack, including the datasource definition for connecting Grafana to the Prometheus server: (Datasource name Prometheus)

apiVersion: v1
data:
  datasource.yaml: |-
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      uid: prometheus
      url: http://kube-prometheus-stack-prometheus.k3s-monitoring:9090/
      access: proxy
      isDefault: true
      jsonData:
        timeInterval: 30s
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: kube-prometheus-stack
    meta.helm.sh/release-namespace: k3s-monitoring
  labels:
    app: kube-prometheus-stack-grafana
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 39.4.0
    chart: kube-prometheus-stack-39.4.0
    grafana_datasource: "1"
    heritage: Helm
    release: kube-prometheus-stack
  name: kube-prometheus-stack-grafana-datasource
  namespace: k3s-monitoring

The ConfigMap includes the grafana_datasource label, so it is loaded by the sidecar container into Grafana’s provisioning directory.

Prometheus Node Exporter

The Prometheus Node Exporter helm chart is deployed as a subchart of the kube-prometheus-stack helm chart. This chart deploys Prometheus Node Exporter on all cluster nodes as a DaemonSet.

Kube-prometheus-stack’s helm chart prometheus-node-exporter value is used to pass the configuration to node exporter’s chart.

Default kube-prometheus-stack’s values.yml file contains the following configuration which is not changed in the installation procedure defined above

prometheus-node-exporter:
  namespaceOverride: ""
  podLabels:
    ## Add the 'node-exporter' label to be used by serviceMonitor to match standard common usage in rules and grafana dashboards
    ##
    jobLabel: node-exporter
  extraArgs:
    - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
    - --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
  service:
    portName: http-metrics
  prometheus:
    monitor:
      enabled: true

      jobLabel: jobLabel

      ## Scrape interval. If not set, the Prometheus default scrape interval is used.
      ##
      interval: ""

      ## How long until a scrape request times out. If not set, the Prometheus default scape timeout is used.
      ##
      scrapeTimeout: ""

      ## proxyUrl: URL of a proxy that should be used for scraping.
      ##
      proxyUrl: ""

      ## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
      ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
      ##
      metricRelabelings: []
      # - sourceLabels: [__name__]
      #   separator: ;
      #   regex: ^node_mountstats_nfs_(event|operations|transport)_.+
      #   replacement: $1
      #   action: drop

      ## RelabelConfigs to apply to samples before scraping
      ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
      ##
      relabelings: []
      # - sourceLabels: [__meta_kubernetes_pod_node_name]
      #   separator: ;
      #   regex: ^(.*)$
      #   targetLabel: nodename
      #   replacement: $1
      #   action: replace
  rbac:
    ## If true, create PSPs for node-exporter
    ##
    pspEnabled: false

The default configuration just excludes several mount points and file system types from the monitoring (extraArgs) and creates the corresponding ServiceMonitor object to start scraping metrics from this exporter.

Prometheus-node-exporter’s metrics are exposed on TCP port 9100 (/metrics endpoint) of each POD of the DaemonSet.
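
What Prometheus receives when it scrapes such a /metrics endpoint is plain text in the Prometheus exposition format. As a minimal sketch, the following Python code parses a few sample lines of that format. The sample text and its values are invented for illustration, and the parser is deliberately simplified (it does not unescape label values, for instance).

```python
import re

# metric_name{label="value",...} value [timestamp]
SAMPLE_LINE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\d+)?$')
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition-format text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything this simple parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Invented sample of what a node exporter endpoint could return.
SAMPLE_TEXT = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# TYPE node_filesystem_avail_bytes gauge
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 2.5e+10
"""

for sample in parse_metrics(SAMPLE_TEXT):
    print(sample)
```

In a real cluster the text would come from an HTTP GET against port 9100 of a node; the ServiceMonitor created by the chart tells Prometheus to perform that request periodically.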

Kube State Metrics

Prometheus Kube State Metrics helm chart is deployed as a subchart of the kube-prometheus-stack helm chart.

This chart deploys kube-state-metrics agent. kube-state-metrics (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects.

Kube-prometheus-stack’s helm chart kube-state-metrics value is used to pass the configuration to the kube-state-metrics chart.

Kube-state-metrics’ metrics are exposed in TCP port 8080 (/metrics endpoint).

K3S and Cluster Services Monitoring

This section details the procedures to activate Prometheus monitoring for K3S components and the cluster services deployed.

The procedures cover the Kubernetes resources (Services/Endpoints and ServiceMonitor/PodMonitor/Probe) that need to be created to configure Prometheus’ service discovery and monitoring. They also cover the dashboards, in json format, that need to be imported into Grafana to visualize the metrics of each particular service.

K3S components monitoring

Kubernetes Documentation - System Metrics details the Kubernetes components exposing metrics in Prometheus format:

  • kube-controller-manager (exposing /metrics endpoint at TCP 10257)
  • kube-proxy (exposing /metrics endpoint at TCP 10249)
  • kube-apiserver (exposing /metrics at Kubernetes API port TCP 6443)
  • kube-scheduler (exposing /metrics endpoint at TCP 10259)
  • kubelet (exposing /metrics, /metrics/cadvisor, /metrics/resource and /metrics/probes endpoints at TCP 10250)

kube-prometheus-stack creates the kubernetes resources needed to scrape the metrics from all K8S components in a standard distribution of Kubernetes, but these objects are not valid for a K3S cluster.

The K3S distribution has a special behavior related to metrics exposure. K3s deploys one process on each cluster node: k3s-server running on master nodes or k3s-agent running on worker nodes. All Kubernetes components running on the node share the same memory, so K3s emits the same metrics on all the /metrics endpoints available on a node: api-server, kubelet (TCP 10250), kube-proxy (TCP 10249), kube-scheduler (TCP 10259) and kube-controller-manager (TCP 10257). When polling the metrics endpoint of one Kubernetes component, the metrics belonging to the other Kubernetes components are not filtered out.

node1, k3s master, running all kubernetes components, is emitting the same metrics in all the ports. node2-node4, k3s workers, only running kubelet and kube-proxy components, emit the same metrics in both TCP 10250 and 10249 ports.

Enabling the scraping of all different metrics TCP ports (10249,10250,10251, 10257 and apiserver) causes the ingestion of duplicated metrics. Duplicated metrics in Prometheus need to be avoided so memory and CPU consumption can be reduced.

On the other hand, the additional kubelet metrics endpoints (/metrics/cadvisor, /metrics/resource and /metrics/probes) are only available at TCP 10250.

Thus, the solution is to scrape only the metrics endpoints available on the kubelet port (TCP 10250): /metrics, /metrics/cadvisor, /metrics/resource and /metrics/probes.

This is the reason why monitoring of the Kubernetes components has been disabled in the kube-prometheus-stack chart configuration:

# Disable creation of kubelet service
prometheusOperator:
  kubeletService:
    enabled: false
# Disabling monitoring of K8s services.
# Monitoring of K3S components will be configured out of kube-prometheus-stack
kubelet:
  enabled: false
kubeApiServer:
  enabled: false
kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
kubeProxy:
  enabled: false
kubeEtcd:
  enabled: false
# Disable K8S Prometheus Rules
# Rules for K3S components will be configured out of kube-prometheus-stack
defaultRules:
  create: true
  rules:
    etcd: false
    k8s: false
    kubeApiserverAvailability: false
    kubeApiserverBurnrate: false
    kubeApiserverHistogram: false
    kubeApiserverSlos: false
    kubeControllerManager: false
    kubelet: false
    kubeProxy: true
    kubernetesApps: false
    kubernetesResources: false
    kubernetesStorage: false
    kubernetesSystem: true
    kubeScheduler: false

With this configuration, the Kubernetes resources (headless Service, ServiceMonitor and PrometheusRules) that activate K8S components monitoring are not created, and the corresponding Grafana dashboards are not deployed.

To manually configure all the Kubernetes resources needed to scrape the available metrics from the kubelet metrics endpoints, follow this procedure:

  • Create a manifest file k3s-metrics-service.yml for creating the Kubernetes service used by Prometheus to scrape all K3S metrics.

    This service must be a headless service (spec.clusterIP=None), allowing Prometheus to discover each of the pods behind the service. Since the metrics are exposed not by a pod but by a k3s process, the service needs to be defined without a selector and the endpoints must be defined explicitly.

    The service will use the kubelet endpoint (TCP port 10250) for scraping all the K3S metrics available in each node.

    ---
    # Headless service for K3S metrics. No selector
    apiVersion: v1
    kind: Service
    metadata:
      name: k3s-metrics-service
      labels:
        app.kubernetes.io/name: kubelet
      namespace: kube-system
    spec:
      clusterIP: None
      ports:
      - name: https-metrics
        port: 10250
        protocol: TCP
        targetPort: 10250
      type: ClusterIP
    ---
    # Endpoint for the headless service without selector
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: k3s-metrics-service
      namespace: kube-system
    subsets:
    - addresses:
      - ip: 10.0.0.11
      - ip: 10.0.0.12
      - ip: 10.0.0.13
      - ip: 10.0.0.14
      ports:
      - name: https-metrics
        port: 10250
        protocol: TCP
    
  • Create a manifest file k3s-servicemonitor.yml defining the ServiceMonitor resource that lets Prometheus discover these targets.

    The Prometheus Operator custom resource definition (CRD) ServiceMonitor will be used to automatically discover the K3S metrics endpoints as Prometheus targets.

    A single ServiceMonitor resource enables the collection of the metrics of all K8s components from a single port, TCP 10250.

    This ServiceMonitor includes all the Prometheus relabeling/dropping rules defined by the ServiceMonitor resources that the kube-prometheus-stack chart would have created if monitoring of all K8s components were activated.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        release: kube-prometheus-stack
      name: k3s-monitoring
      namespace: k3s-monitoring
    spec:
      endpoints:
      # /metrics endpoint
      - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
        honorLabels: true
        metricRelabelings:
        # apiserver
        - action: drop
          regex: apiserver_request_duration_seconds_bucket;(0.15|0.2|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2|3|3.5|4|4.5|6|7|8|9|15|25|40|50)
          sourceLabels:
          - __name__
          - le
        port: https-metrics
        relabelings:
        - action: replace
          sourceLabels:
          - __metrics_path__
          targetLabel: metrics_path
        scheme: https
        tlsConfig:
          caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecureSkipVerify: true
      # /metrics/cadvisor
      - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
        honorLabels: true
        metricRelabelings:
        - action: drop
          regex: container_cpu_(cfs_throttled_seconds_total|load_average_10s|system_seconds_total|user_seconds_total)
          sourceLabels:
          - __name__
        - action: drop
          regex: container_fs_(io_current|io_time_seconds_total|io_time_weighted_seconds_total|reads_merged_total|sector_reads_total|sector_writes_total|writes_merged_total)
          sourceLabels:
          - __name__
        - action: drop
          regex: container_memory_(mapped_file|swap)
          sourceLabels:
          - __name__
        - action: drop
          regex: container_(file_descriptors|tasks_state|threads_max)
          sourceLabels:
          - __name__
        - action: drop
          regex: container_spec.*
          sourceLabels:
          - __name__
        path: /metrics/cadvisor
        port: https-metrics
        relabelings:
        - action: replace
          sourceLabels:
          - __metrics_path__
          targetLabel: metrics_path
        scheme: https
        tlsConfig:
          caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecureSkipVerify: true
      # /metrics/probes
      - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
        honorLabels: true
        path: /metrics/probes
        port: https-metrics
        relabelings:
        - action: replace
          sourceLabels:
          - __metrics_path__
          targetLabel: metrics_path
        scheme: https
        tlsConfig:
          caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecureSkipVerify: true
      jobLabel: app.kubernetes.io/name
      namespaceSelector:
        matchNames:
        - kube-system
      selector:
        matchLabels:
          app.kubernetes.io/name: kubelet
    
  • kube-prometheus-stack’s Prometheus rules associated with K8s components are not installed when their monitoring is disabled. In any case, those rules are not valid for K3S, since they contain promQL queries filtering metrics by job labels such as “apiserver”, “kubelet”, etc.

    kube-prometheus-stack creates by default several PrometheusRules resources, but all of them are included in a single manifest file in the prometheus-operator source repository: kubernetesControlPlane-prometheusRule.yaml

    Modify the YAML file as follows:

    • Replace the job label names

      Replace the following strings:

      • job="apiserver"
      • job="kube-proxy"
      • job="kube-scheduler"
      • job="kube-controller-manager"

      by:

      job="kubelet"

    • Add the following label so that it matches the Prometheus Operator selector for rules

      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        labels:
          release: kube-prometheus-stack
      
  • Apply the manifest files

    kubectl apply -f k3s-metrics-service.yml -f k3s-servicemonitor.yml -f kubernetesControlPlane-prometheusRule.yaml
    
  • Check targets are automatically discovered in Prometheus UI:

    http://prometheus/targets
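The apiserver metricRelabelings rule in the k3s-monitoring ServiceMonitor of this section can be read as follows: Prometheus joins the values of the sourceLabels with ";" (the default separator) and, for action drop, discards the sample when the regex, anchored at both ends, matches the joined string. The Python sketch below only mimics that behavior for illustration. should_drop is a hypothetical helper, and the regex is shortened to a few of the le buckets.

```python
import re

# Shortened version of the regex used in the ServiceMonitor (only some buckets).
DROP_REGEX = r"apiserver_request_duration_seconds_bucket;(0.15|0.2|0.3|2|3)"
SOURCE_LABELS = ["__name__", "le"]


def should_drop(labels, regex=DROP_REGEX, source_labels=SOURCE_LABELS, separator=";"):
    """Mimic a relabeling rule with action "drop" on the labels of one sample."""
    # A missing label contributes an empty string to the joined value.
    joined = separator.join(labels.get(name, "") for name in source_labels)
    # Relabeling regexes are anchored at both ends, hence fullmatch.
    return re.fullmatch(regex, joined) is not None


sample = {"__name__": "apiserver_request_duration_seconds_bucket", "le": "0.15"}
print(should_drop(sample))                   # bucket listed in the regex
print(should_drop({**sample, "le": "0.1"}))  # bucket not listed in the regex
```

Because the match is anchored, only the listed histogram buckets are discarded; the remaining buckets and every other metric pass through unchanged.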

coreDNS monitoring

CoreDNS monitoring is enabled by default in kube-prometheus-stack:

coreDns:
  enabled: true
  service:
    port: 9153
    targetPort: 9153
    ...

It creates the kube-prometheus-stack-coredns service in the kube-system namespace, pointing to the CoreDNS pod.

---
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: kube-prometheus-stack
    meta.helm.sh/release-namespace: k3s-monitoring
  creationTimestamp: "2022-08-18T16:22:12Z"
  labels:
    app: kube-prometheus-stack-coredns
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 39.8.0
    chart: kube-prometheus-stack-39.8.0
    heritage: Helm
    jobLabel: coredns
    release: kube-prometheus-stack
  name: kube-prometheus-stack-coredns
  namespace: kube-system
  resourceVersion: "6653"
  uid: 5c0e9f38-2851-450a-b28f-b4baef76e5bb
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-metrics
    port: 9153
    protocol: TCP
    targetPort: 9153
  selector:
    k8s-app: kube-dns
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

It also creates the ServiceMonitor kube-prometheus-stack-coredns:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: kube-prometheus-stack
    meta.helm.sh/release-namespace: k3s-monitoring
  creationTimestamp: "2022-08-18T16:22:15Z"
  generation: 1
  labels:
    app: kube-prometheus-stack-coredns
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 39.8.0
    chart: kube-prometheus-stack-39.8.0
    heritage: Helm
    release: kube-prometheus-stack
  name: kube-prometheus-stack-coredns
  namespace: k3s-monitoring
  resourceVersion: "6777"
  uid: 065442b6-6ead-447b-86cd-775a673ad071
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    port: http-metrics
  jobLabel: jobLabel
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: kube-prometheus-stack-coredns
      release: kube-prometheus-stack

K3S Grafana dashboards

kube-prometheus-stack would normally install the Grafana dashboards corresponding to the K8S components, but since their monitoring is disabled in the helm chart configuration, they need to be installed manually.

Kubernetes components dashboards can be downloaded from grafana.com.

These Grafana dashboards need to be modified because their promQL queries use job name labels (kube-scheduler, kube-proxy, apiserver, etc.) that are not used in our configuration. In our configuration only one scraping job (“kubelet”) is configured to scrape metrics from all K3S components.

The following changes need to be applied to the JSON files:

Replace the following strings:

  • job=\"apiserver\"
  • job=\"kube-proxy\"
  • job=\"kube-scheduler\"
  • job=\"kube-controller-manager\"

by:

job=\"kubelet\"
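The replacement above can be scripted. The following is a minimal Python sketch, assuming the dashboard has been downloaded as a JSON file and is handled as raw text, so that the escaped quotes are kept exactly as they appear in the file. rewrite_job_labels and the sample expression are hypothetical.

```python
# Job labels expected by the stock dashboards, per the list above.
OLD_JOBS = ["apiserver", "kube-proxy", "kube-scheduler", "kube-controller-manager"]


def rewrite_job_labels(dashboard_text, new_job="kubelet"):
    """Replace job=\\"<old>\\" by job=\\"kubelet\\" in raw dashboard JSON text."""
    for job in OLD_JOBS:
        # '\\"' is a literal backslash followed by a double quote,
        # which is how quotes appear inside JSON string values.
        old = 'job=\\"' + job + '\\"'
        new = 'job=\\"' + new_job + '\\"'
        dashboard_text = dashboard_text.replace(old, new)
    return dashboard_text


# Hypothetical one-panel fragment of a dashboard file.
sample = r'{"expr": "sum(rate(apiserver_request_total{job=\"apiserver\"}[5m]))"}'
print(rewrite_job_labels(sample))
```

In practice the function would be applied to the full content of each downloaded dashboard file before importing it into Grafana. Other job labels, such as node-exporter, are left untouched.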

Traefik Monitoring

The Prometheus Operator custom resource definition (CRD) ServiceMonitor will be used to automatically discover the Traefik metrics endpoint as a Prometheus target.

  • Create a manifest file traefik-servicemonitor.yml

    ---
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app: traefik
        release: kube-prometheus-stack
      name: traefik
      namespace: k3s-monitoring
    spec:
      jobLabel: app.kubernetes.io/name
      endpoints:
        - port: traefik
          path: /metrics
      namespaceSelector:
        matchNames:
          - traefik-system
      selector:
        matchLabels:
          app.kubernetes.io/instance: traefik
          app.kubernetes.io/name: traefik
          app.kubernetes.io/component: traefik-metrics

  • Apply manifest file

    kubectl apply -f traefik-servicemonitor.yml
    
  • Check target is automatically discovered in Prometheus UI: http://prometheus/targets

Traefik Grafana dashboard

Traefik dashboard can be downloaded from grafana.com: dashboard id: 11462. This dashboard requires the grafana-piechart-panel plugin to be installed. The list of plugins to be installed can be specified as values (grafana.plugins variable) during the kube-prometheus-stack helm deployment.

Longhorn Monitoring

As stated by official documentation, Longhorn Backend service is a service pointing to the set of Longhorn manager pods. Longhorn’s metrics are exposed in Longhorn manager pods at the endpoint http://LONGHORN_MANAGER_IP:PORT/metrics

Backend endpoint is already exposing Prometheus metrics.

The Prometheus Operator custom resource definition (CRD) ServiceMonitor will be used to automatically discover the Longhorn metrics endpoint as a Prometheus target.

  • Create a manifest file longhorn-servicemonitor.yml

    ---
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app: longhorn
        release: kube-prometheus-stack
      name: longhorn-prometheus-servicemonitor
      namespace: k3s-monitoring
    spec:
      jobLabel: app.kubernetes.io/name
      selector:
        matchLabels:
          app: longhorn-manager
      namespaceSelector:
        matchNames:
        - longhorn-system
      endpoints:
      - port: manager
    
  • Apply manifest file

    kubectl apply -f longhorn-servicemonitor.yml
    
  • Check target is automatically discovered in Prometheus UI: http://prometheus/targets

Longhorn Grafana dashboard

Longhorn dashboard sample can be downloaded from grafana.com: dashboard id: 13032.

Velero Monitoring

By default, the Velero helm chart is configured to expose Prometheus metrics on port 8085, so the backend endpoint is already exposing Prometheus metrics.

This can be confirmed by checking the velero service:

kubectl get svc velero -n velero-system -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: velero
    meta.helm.sh/release-namespace: velero-system
  creationTimestamp: "2021-12-31T11:36:39Z"
  labels:
    app.kubernetes.io/instance: velero
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: velero
    helm.sh/chart: velero-2.27.1
  name: velero
  namespace: velero-system
  resourceVersion: "9811"
  uid: 3a6707ba-0e0f-49c3-83fe-4f61645f6fd0
spec:
  clusterIP: 10.43.3.141
  clusterIPs:
  - 10.43.3.141
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-monitoring
    port: 8085
    protocol: TCP
    targetPort: http-monitoring
  selector:
    app.kubernetes.io/instance: velero
    app.kubernetes.io/name: velero
    name: velero
  sessionAffinity: None
  type: ClusterIP

It can also be confirmed by executing a curl command to obtain the Velero metrics:

curl 10.43.3.141:8085/metrics

The Prometheus Operator custom resource definition (CRD) ServiceMonitor will be used to automatically discover the Velero metrics endpoint as a Prometheus target.

  • Create a manifest file velero-servicemonitor.yml

    ---
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app: velero
        release: kube-prometheus-stack
      name: velero-prometheus-servicemonitor
      namespace: k3s-monitoring
    spec:
      jobLabel: app.kubernetes.io/name
      endpoints:
        - port: http-monitoring
          path: /metrics
      namespaceSelector:
        matchNames:
          - velero-system
      selector:
        matchLabels:
          app.kubernetes.io/instance: velero
          app.kubernetes.io/name: velero
    
  • Apply manifest file
    kubectl apply -f velero-servicemonitor.yml
    
  • Check target is automatically discovered in Prometheus UI

    http://prometheus.picluster.ricsanfre/targets

Velero Grafana dashboard

Velero dashboard sample can be downloaded from grafana.com: dashboard id: 11055.

Minio Monitoring

For details see Minio’s documentation: “Collect MinIO Metrics Using Prometheus”.

  • Generate a bearer token to be able to access Minio metrics

    mc admin prometheus generate <alias>
    

    Output is something like this:

    scrape_configs:
    - job_name: minio-job
      bearer_token: eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJleHAiOjQ3OTQ4Mjg4MTcsImlzcyI6InByb21ldGhldXMiLCJzdWIiOiJtaW5pb2FkbWluIn0.mPFKnj3p-sPflnvdrtrWawSZn3jTQUVw7VGxdBoEseZ3UvuAcbEKcT7tMtfAAqTjZ-dMzQEe1z2iBdbdqufgrA
      metrics_path: /minio/v2/metrics/cluster
      scheme: https
      static_configs:
      - targets: ['127.0.0.1:9091']
    

    Where:

    • bearer_token is the token to be used by Prometheus for authentication purposes
    • metrics_path is the path to scrape the metrics on the Minio server (TCP port 9091)
  • Create a manifest file minio-metrics-service.yml for creating the Kubernetes service, pointing to an external server, used by Prometheus to scrape Minio metrics.

    This service, as with k3s-metrics, must be a headless service without a selector, and the endpoints must be defined explicitly.

    The service will use the Minio endpoint (TCP port 9091) for scraping all metrics.

    ---
    # Headless service for Minio metrics. No Selector
    apiVersion: v1
    kind: Service
    metadata:
      name: minio-metrics-service
      labels:
        app.kubernetes.io/name: minio
      namespace: kube-system
    spec:
      clusterIP: None
      ports:
      - name: http-metrics
        port: 9091
        protocol: TCP
        targetPort: 9091
      type: ClusterIP
    ---
    # Endpoint for the headless service without selector
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: minio-metrics-service
      namespace: kube-system
    subsets:
    - addresses:
      - ip: 10.0.0.11
      ports:
      - name: http-metrics
        port: 9091
        protocol: TCP
    
  • Create a manifest file minio-servicemonitor.yml defining a Secret containing the bearer token and the ServiceMonitor resource that lets Prometheus discover this target

    The Prometheus Operator custom resource definition (CRD) ServiceMonitor will be used to automatically discover the Minio metrics endpoint as a Prometheus target. The bearer token needs to be base64-encoded within the Secret resource.

    ---
    apiVersion: v1
    kind: Secret
    type: Opaque
    metadata:
      name: minio-monitor-token
      namespace: k3s-monitoring
    data:
      token: < minio_bearer_token | b64encode >
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app: minio
        release: kube-prometheus-stack
      name: minio-prometheus-servicemonitor
      namespace: k3s-monitoring
    spec:
      jobLabel: app.kubernetes.io/name
      endpoints:
        - port: http-metrics
          path: /minio/v2/metrics/cluster
          scheme: https
          tlsConfig:
            insecureSkipVerify: true 
          bearerTokenSecret:
            name: minio-monitor-token
            key: token
      namespaceSelector:
        matchNames:
        - kube-system
      selector:
        matchLabels:
          app.kubernetes.io/name: minio
    
  • Apply manifest files
    kubectl apply -f minio-metrics-service.yml -f minio-servicemonitor.yml
    
  • Check target is automatically discovered in Prometheus UI: http://prometheus/targets
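The base64 encoding required for the token field of the Secret can be produced with a few lines of Python. This is a minimal sketch: encode_secret_value is a hypothetical helper, and the token shown is a placeholder, not a real Minio token.

```python
import base64


def encode_secret_value(token):
    """Base64-encode a token as expected under the data field of a Secret."""
    return base64.b64encode(token.encode("utf-8")).decode("ascii")


# Placeholder value; use the bearer_token printed by "mc admin prometheus generate".
minio_bearer_token = "example-token"
print(encode_secret_value(minio_bearer_token))
```

The printed value is what goes into data.token of the minio-monitor-token Secret. Alternatively, Kubernetes accepts the unencoded value under the stringData field of a Secret and performs the encoding itself.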

Minio Grafana dashboard

Minio dashboard sample can be downloaded from grafana.com: dashboard id: 13502.

Elasticsearch Monitoring

prometheus-elasticsearch-exporter needs to be installed in order to have Elasticsearch metrics in Prometheus format. See documentation “Prometheus elasticsearch exporter installation”.

This exporter exposes the /metrics endpoint on port 9108.

The Prometheus Operator custom resource definition (CRD) ServiceMonitor will be used to automatically discover the Elasticsearch exporter metrics endpoint as a Prometheus target.

  • Create a manifest file elasticsearch-servicemonitor.yml

    ---
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app: prometheus-elasticsearch-exporter
        release: kube-prometheus-stack
      name: elasticsearch-prometheus-servicemonitor
      namespace: k3s-monitoring
    spec:
      endpoints:
        - port: http
          path: /metrics
      namespaceSelector:
        matchNames:
          - k3s-logging
      selector:
        matchLabels:
          app: prometheus-elasticsearch-exporter
    

Elasticsearch Grafana dashboard

Elasticsearch exporter dashboard sample can be downloaded from prometheus-elasticsearch-grafana.

Fluentbit/Fluentd Monitoring

Fluentbit Monitoring

When its HTTP server is enabled, Fluentbit exposes several endpoints to perform monitoring tasks. See details in Fluentbit monitoring doc.

One of the endpoints (/api/v1/metrics/prometheus) provides Fluentbit metrics in Prometheus format.

The Prometheus Operator custom resource definition (CRD) ServiceMonitor will be used to automatically discover the Fluentbit metrics endpoint as a Prometheus target.

  • Create a manifest file fluentbit-servicemonitor.yml

    ---
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app: fluent-bit
        release: kube-prometheus-stack
      name: fluentbit-prometheus-servicemonitor
      namespace: k3s-monitoring
    spec:
      jobLabel: app.kubernetes.io/name
      endpoints:
        - path: /api/v1/metrics/prometheus
          targetPort: 2020
        - params:
            target:
            - http://127.0.0.1:2020/api/v1/storage
          path: /probe
          targetPort: 7979
      namespaceSelector:
        matchNames:
          - k3s-logging
      selector:
        matchLabels:
          app.kubernetes.io/instance: fluent-bit
          app.kubernetes.io/name: fluent-bit
    

The ServiceMonitor includes two endpoints: the Fluentbit metrics endpoint (/api/v1/metrics/prometheus, TCP port 2020) and the json-exporter sidecar endpoint (/probe, port 7979), which receives the Fluentbit storage endpoint (/api/v1/storage) as its target parameter.

Fluentd Monitoring

In order to monitor Fluentd with Prometheus, the fluent-plugin-prometheus plugin needs to be installed and configured. The custom docker image fluentd-aggregator, which I have developed for this project, has this plugin installed.

The fluentd.conf file must include the configuration of this plugin. It provides a ‘/metrics’ endpoint on port 24231.

# Prometheus metric exposed on 0.0.0.0:24231/metrics
<source>
  @type prometheus
  @id in_prometheus
  bind "#{ENV['FLUENTD_PROMETHEUS_BIND'] || '0.0.0.0'}"
  port "#{ENV['FLUENTD_PROMETHEUS_PORT'] || '24231'}"
  metrics_path "#{ENV['FLUENTD_PROMETHEUS_PATH'] || '/metrics'}"
</source>

<source>
  @type prometheus_output_monitor
  @id in_prometheus_output_monitor
</source>

Check out further details in Fluentd Documentation: Monitoring by Prometheus (https://docs.fluentd.org/monitoring-fluentd/monitoring-prometheus).

The Prometheus Operator custom resource definition (CRD) ServiceMonitor will be used to automatically discover the Fluentd metrics endpoint as a Prometheus target.

  • Create a manifest file fluentd-servicemonitor.yml

    ---
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app: fluentd
        release: kube-prometheus-stack
      name: fluentd-prometheus-servicemonitor
      namespace: k3s-monitoring
    spec:
      jobLabel: app.kubernetes.io/name
      endpoints:
        - port: metrics
          path: /metrics
      namespaceSelector:
        matchNames:
          - k3s-logging
      selector:
        matchLabels:
          app.kubernetes.io/instance: fluentd
          app.kubernetes.io/name: fluentd
    

Fluentbit/Fluentd Grafana dashboard

Fluentbit dashboard sample can be downloaded from grafana.com: dashboard id: 7752.

This dashboard has been modified to include fluentbit’s storage metrics (chunks up and down) and to solve some issues with fluentd metrics.

External Nodes Monitoring

  • Install Node metrics exporter

    Instead of installing Prometheus Node Exporter, similar functionality built into Fluentbit can be used.

    Fluentbit’s node-exporter-metric and prometheus-exporter plugins can be configured to expose gateway metrics that can be scraped by Prometheus.

    Add the following configuration to the node’s fluent.conf file:

    [INPUT]
        name node_exporter_metrics
        tag node_metrics
        scrape_interval 30
    

    It configures the node exporter input plugin to get node metrics.

    [OUTPUT]
        name prometheus_exporter
        match node_metrics
        host 0.0.0.0
        port 9100
    

    It configures the prometheus exporter output plugin to expose the metrics endpoint /metrics on port 9100.

  • Create a manifest file external-node-metrics-service.yml for creating the Kubernetes service, pointing to an external server, used by Prometheus to scrape external nodes metrics.

    This service, as with k3s-metrics and Minio, must be a headless service without a selector, and the endpoints must be defined explicitly.

    The service will use the Fluentbit metrics endpoint (TCP port 9100) for scraping all metrics.

    ---
    # Headless service for External Node metrics. No Selector
    apiVersion: v1
    kind: Service
    metadata:
      name: external-node-metrics-service
      labels:
        app: prometheus-node-exporter
        release: kube-prometheus-stack
        jobLabel: node-exporter
      namespace: k3s-monitoring
    spec:
      clusterIP: None
      ports:
      - name: http-metrics
        port: 9100
        protocol: TCP
        targetPort: 9100
      type: ClusterIP
    ---
    # Endpoint for the headless service without selector
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: external-node-metrics-service
      namespace: k3s-monitoring
    subsets:
    - addresses:
      - ip: 10.0.0.1
      ports:
      - name: http-metrics
        port: 9100
        protocol: TCP
    

    The service has been configured with specific labels so that it matches the discovery rules configured in the Node-Exporter ServiceMonitor object (part of the kube-prometheus installation). No new ServiceMonitor needs to be configured, and the new nodes will appear in the corresponding Grafana dashboards.

    app: prometheus-node-exporter
    release: kube-prometheus-stack
    jobLabel: node-exporter
    

    Prometheus-Node-Exporter Service Monitor is the following:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      annotations:
        meta.helm.sh/release-name: kube-prometheus-stack
        meta.helm.sh/release-namespace: k3s-monitoring
      generation: 1
      labels:
        app: prometheus-node-exporter
        app.kubernetes.io/managed-by: Helm
        chart: prometheus-node-exporter-3.3.1
        heritage: Helm
        jobLabel: node-exporter
        release: kube-prometheus-stack
      name: kube-prometheus-stack-prometheus-node-exporter
      namespace: k3s-monitoring
      resourceVersion: "6369"
    spec:
      endpoints:
      - port: http-metrics
        scheme: http
      jobLabel: jobLabel
      selector:
        matchLabels:
          app: prometheus-node-exporter
          release: kube-prometheus-stack
    

    The spec.selector.matchLabels configuration specifies which label values a service must have in order to be discovered by this ServiceMonitor object.

    app: prometheus-node-exporter
    release: kube-prometheus-stack
    

    The jobLabel configuration specifies the name of the service label whose value is assigned as the job label of all the metrics. That is why the jobLabel label is added to the new service with the corresponding value (node-exporter). This job label is used in all the configured Grafana dashboards, so it needs to be set in order to reuse them for the external nodes.

    jobLabel: node-exporter
    
  • Apply manifest file
    kubectl apply -f external-node-metrics-service.yml
    
  • Check target is automatically discovered in Prometheus UI: http://prometheus/targets
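The label matching described in this procedure can be illustrated with a short Python sketch. It only mimics how a matchLabels selector and the jobLabel field are evaluated; matches_selector and job_name are hypothetical helpers, not part of any Prometheus Operator API.

```python
def matches_selector(service_labels, match_labels):
    """True when every matchLabels key/value pair is present on the service."""
    return all(service_labels.get(key) == value for key, value in match_labels.items())


def job_name(service_labels, job_label_key, service_name):
    """Job assigned to the metrics: the value of the service label named by
    jobLabel, falling back to the service name when that label is missing."""
    return service_labels.get(job_label_key, service_name)


# Labels of external-node-metrics-service and the selector of the ServiceMonitor.
service_labels = {
    "app": "prometheus-node-exporter",
    "release": "kube-prometheus-stack",
    "jobLabel": "node-exporter",
}
match_labels = {"app": "prometheus-node-exporter", "release": "kube-prometheus-stack"}

print(matches_selector(service_labels, match_labels))
print(job_name(service_labels, "jobLabel", "external-node-metrics-service"))
```

If the jobLabel label were omitted from the service, the metrics would carry the service name as job instead of node-exporter, and the existing dashboards would not pick them up.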

Grafana dashboards

There is no need to install additional dashboards. The Node-exporter dashboards pre-integrated by kube-stack show the external nodes’ metrics.


Last Update: Sep 09, 2022
