Logging (EFK and Loki)

EFK vs PLG Stacks

Two different stacks can be deployed as a centralized logging solution for the Kubernetes cluster:

  • EFK stack (Elasticsearch-Fluentd/Fluentbit-Kibana), where:
    • Elasticsearch is used as the log storage and search engine
    • Fluentd/Fluentbit is used to collect, aggregate and distribute logs
    • Kibana is used as the visualization layer.

    This is a mature open-source stack for implementing centralized log management and log analytics capabilities. Since Elasticsearch indexes the whole content of the logs, the solution's storage and memory requirements are high.

  • PLG stack (Promtail - Loki - Grafana), where:
    • Promtail is used as the log collector
    • Loki as the log storage/aggregator
    • Grafana as the visualization layer.

    Loki is a lightweight alternative to Elasticsearch, providing a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus and designed for Kubernetes environments.

    Loki's resource consumption is lower than Elasticsearch's because it does not index the contents of the logs; it only indexes a set of labels for each log stream.

Both stacks can be deployed in the cluster to deliver complementary log-based monitoring (observability) and advanced log analytics capabilities.

The logging architecture will have the following components:

  1. Loki as the key component of the observability platform. Because Loki manages logs in the same way Prometheus manages metrics, with the same labels, metrics (Prometheus), logs (Loki) and traces (Jaeger) belonging to the same context (pod, application, container) can be joined in the same Grafana dashboards (see the query sketch after this list). This way Grafana can be used as a single pane of glass for monitoring cluster services.

  2. Elasticsearch/Kibana providing advanced log analytics capabilities. Loki's indexing is limited to log labels, while Elasticsearch indexes the whole content of the logs. Kibana provides many visualization tools for analyzing Elasticsearch-indexed data, such as location maps, machine learning for anomaly detection, and graphs to discover relationships in data.

  3. Common architecture for log collection, aggregation and distribution based on Fluentbit/Fluentd. Fluentbit/Fluentd can distribute logs to both log storage platforms (Elasticsearch and Loki) instead of deploying two separate log collectors (Fluentbit and Promtail).

    Fluentbit/Fluentd is selected over Promtail because it is a general-purpose log collector/distributor that can ingest logs from different sources (not only Kubernetes), parse and filter them, and route them to different destinations. Promtail is tailored to work only with Loki.
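
As an illustration of the shared-label approach, the following queries show a minimal sketch of how the same labels select both metrics and logs, so they can be placed side by side in a single Grafana dashboard (the "myapp" namespace is hypothetical and the label names depend on how the log collector enriches the log streams):

PromQL (Prometheus metrics): CPU usage per pod in the namespace

sum(rate(container_cpu_usage_seconds_total{namespace="myapp"}[5m])) by (pod)

LogQL (Loki logs): error lines from the same pods, selected by the same label

{namespace="myapp"} |= "error"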

The architecture is shown in the following picture:

(Figure: K3S-EFK-LOKI-Architecture)

This solution not only processes logs from the Kubernetes cluster but also collects logs from external nodes (e.g., the gateway node).

Collecting cluster logs

Container logs

In Kubernetes, containerized applications that log to stdout and stderr have their log streams captured and redirected to log files on the nodes (/var/log/containers). To tail these log files, filter log events, transform the log data, and ship them off to the logging backends, an agent such as fluentd/fluentbit can be used.
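
A minimal Fluent Bit sketch of this node-level collection (the tag is illustrative; the cri parser and the kubernetes filter are bundled with Fluent Bit, and K3S containers use the containerd/CRI log format):

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Parser            cri
    Tag               kube.*
    Refresh_Interval  10

[FILTER]
    Name       kubernetes
    Match      kube.*
    Merge_Log  On

The kubernetes filter enriches each record with pod, namespace and container metadata, which can later be mapped to the labels used by Loki.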

To learn more about Kubernetes logging architecture, check out “Cluster-level logging architectures” from the official Kubernetes docs. The logging architecture using node-level log agents is the one implemented with the fluentbit/fluentd log collectors. Fluentbit/fluentd processes run on each node as a Kubernetes daemonset with enough privileges to access the host file system where container logs are stored (/var/log/containers in the K3S implementation).

Fluentbit's and fluentd's official helm charts deploy the fluentbit/fluentd pods as a privileged daemonset with access to the host's /var/log directory. In addition to container logs, the same Fluentd/Fluentbit agents deployed as a daemonset can collect and parse logs from systemd-based services and OS filesystem-level logs (syslog, kern.log, etc., all of them located in /var/log).

Kubernetes logs

In K3S, all Kubernetes components (API server, scheduler, controller manager, kubelet, kube-proxy, etc.) run within a single process (k3s). When running under systemd, this process writes all its logs to the /var/log/syslog file. This file needs to be parsed in order to collect logs from the Kubernetes (K3S) processes.
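
As a sketch, the same Fluent Bit agent could pick up these entries with a tail input (the tag is illustrative; syslog-rfc3164-local is one of Fluent Bit's bundled parsers and, assuming it extracts the program name into an ident field, a grep filter keeps only the k3s records):

[INPUT]
    Name    tail
    Path    /var/log/syslog
    Parser  syslog-rfc3164-local
    Tag     node.syslog

[FILTER]
    Name    grep
    Match   node.syslog
    Regex   ident ^k3s$

Alternatively, Fluent Bit's systemd input plugin can read the k3s unit directly from the journal, equivalent to the journalctl commands shown below.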

K3S logs can also be viewed with the journalctl command.

On the master node:

sudo journalctl -u k3s

On worker nodes:

sudo journalctl -u k3s-agent

Host logs

OS-level logs (/var/log) can be collected with the same agent (daemonset) deployed to collect container logs.

Log collection, aggregation and distribution architectures

Two different architectures can be implemented with Fluentbit and Fluentd:

(Figures: logging-forwarder-only and logging-forwarder-aggregator)

Forwarder-only architecture

This pattern consists of a logging agent, based on fluentbit or fluentd, deployed on the edge (forwarder), generally where data is created, such as Kubernetes nodes, virtual machines or bare-metal servers. These forwarder agents collect, parse and filter logs from the edge nodes and send data directly to a backend service (a configuration sketch is shown after the list of pros and cons below).

Advantages

  • Simpler architecture: no dedicated aggregation tier to deploy and operate.

Disadvantages

  • Hard to change configuration across a fleet of agents (e.g., adding another backend or processing step)
  • Hard to add more end destinations if needed
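
A minimal Fluent Bit sketch of this pattern (the service hostnames are illustrative): every forwarder is configured with both backends, which is exactly the configuration that has to be kept in sync across the whole fleet:

[OUTPUT]
    Name   es
    Match  kube.*
    Host   elasticsearch.logging.svc.cluster.local
    Port   9200

[OUTPUT]
    Name   loki
    Match  kube.*
    Host   loki.logging.svc.cluster.local
    Port   3100
    Labels job=fluentbit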

Forwarder/Aggregator Architecture

Similar to the forwarder-only deployment, a lightweight logging agent instance is deployed on the edge (forwarder), close to the data sources (Kubernetes nodes, virtual machines or bare-metal servers). In this case, these forwarders do minimal processing and then use the forward protocol to send data to a much heavier instance of Fluentd or Fluent Bit (aggregator). This heavier instance may perform more filtering and processing before routing to the appropriate backend(s).

Advantages

  • Less resource utilization on the edge devices (maximize throughput)

  • Allows processing to scale independently in the aggregator tier.

  • Easy to add more backends (configuration change in aggregator vs. all forwarders).

Disadvantages

  • Dedicated resources required for an aggregation instance.

With this architecture, logs can be filtered in the aggregation layer and routed to different log backends: Elasticsearch and Loki. In the future, additional backends can be added for further online processing. For example, Kafka can be deployed as a backend to build a data streaming analytics architecture (Kafka, Apache Spark, Flink, etc.), routing only the logs from a specific application.
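
A sketch of both sides of this pattern (hostnames are illustrative; the aggregator example assumes a Fluentd instance with the fluent-plugin-elasticsearch and fluent-plugin-grafana-loki plugins installed).

Forwarder side (Fluent Bit), shipping everything to the aggregator over the forward protocol:

[OUTPUT]
    Name   forward
    Match  *
    Host   fluentd.logging.svc.cluster.local
    Port   24224

Aggregator side (Fluentd), receiving the forwarded records and copying each one to both backends:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match **>
  @type copy
  <store>
    @type elasticsearch
    host elasticsearch.logging.svc.cluster.local
    port 9200
  </store>
  <store>
    @type loki
    url "http://loki.logging.svc.cluster.local:3100"
    extract_kubernetes_labels true
  </store>
</match>

Adding another backend (e.g., Kafka) only requires a new <store> section here, instead of touching every forwarder.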

Logging solution installation procedure

The procedure for deploying the logging stack is described in the following pages:

  1. Elasticsearch and Kibana installation

  2. Loki installation

  3. Fluentbit/Fluentd forwarder/aggregator architecture installation.
