Telemetry Controller

Telemetry Controller collects, routes, and forwards telemetry data (logs, metrics, and traces) from Kubernetes clusters, with multi-tenancy supported out of the box.

The Telemetry Controller provides isolation and access control for telemetry data, similar to what Kubernetes provides for pods, secrets, and other resources. It provides an opinionated, convenient, and robust multi-tenant API on top of OpenTelemetry, and introduces new resources that give granular control over the shared data, while hiding the complexity of manually setting up and maintaining an OpenTelemetry Collector.

Description

Telemetry Controller can be configured using Custom Resources to set up an opinionated OpenTelemetry Collector configuration that routes log messages based on rules defined as a Tenant -> Subscription relation map. That way:

  • Administrators can define a collector and tenants to provide isolation and access control for telemetry data. These are cluster-scoped resources.
  • Users can create subscriptions to select telemetry data streams that only their tenant can access.
  • Users can create outputs, or refer to the available ones, in their subscriptions to route and transport data. This lets users configure what they want to collect and where they want to send it, within their tenant's scope.

Telemetry Controller flow diagram

Telemetry Controller can collect container logs that come from stdout/stderr and are written to the host filesystem by the container runtime.

Collector

Collectors specify global settings for the OTEL Collector DaemonSet, and a tenantSelector that selects the Tenants the collector should pick up. The collector also attaches metadata to the telemetry data sources: for Kubernetes logs, it fetches additional metadata, such as pod labels, and adds it as attributes to log records.
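As a sketch of what this looks like, a minimal Collector could be declared as below. It assumes the `telemetry.kube-logging.dev/v1alpha1` API group used by the Output examples in this README and a `controlNamespace` field naming the namespace the collector is deployed into; the `collector` namespace and the `collectorLabel: example` label are hypothetical values. Verify the field names against the CRDs installed in your cluster, for example with `kubectl explain collectors.spec`.

```yaml
# Sketch only: field names assumed from the v1alpha1 CRDs, values are hypothetical.
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Collector
metadata:
  # Cluster-scoped resource, so there is no metadata.namespace
  name: example-collector
spec:
  # Assumed field: namespace where the collector DaemonSet and its resources are created
  controlNamespace: collector
  # Standard label selector: every Tenant carrying this label is picked up
  tenantSelector:
    matchLabels:
      collectorLabel: example
```

Because the tenantSelector is an ordinary label selector, an administrator can attach or detach a Tenant from this collector just by changing the Tenant's labels.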

Tenants

Typically, a tenant is a set of Kubernetes namespaces; grouping namespaces this way is a best practice for managing multi-tenant workloads inside a single cluster. Tenant resources specify:

  • subscriptionNamespaceSelectors, which select the namespaces where tenant users can create Subscriptions, and
  • logSourceNamespaceSelectors, which select the namespaces where the logs are produced (the logs the tenant users are concerned with).

In simple use cases, these two label selectors are the same.

In effect, a Tenant is a routing rule that ensures telemetry data is only accessible to a given Subscription if it matches the policies set by the administrator.
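A minimal Tenant for the simple case, where the subscription and log source namespaces are the same, might look like the sketch below. We assume both fields are lists of standard Kubernetes label selectors; `example-tenant` and `example-tenant-ns` are hypothetical names, and the `collectorLabel: example` label only matters if a Collector's tenantSelector matches on it. Kubernetes sets the `kubernetes.io/metadata.name` label on every namespace automatically, which makes it a convenient way to select namespaces by name.

```yaml
# Sketch only: field shapes assumed from the v1alpha1 CRDs, names are hypothetical.
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Tenant
metadata:
  # Cluster-scoped resource, so there is no metadata.namespace
  name: example-tenant
  labels:
    # Lets a Collector with a matching tenantSelector pick up this Tenant
    collectorLabel: example
spec:
  # Namespaces where tenant users may create Subscriptions
  subscriptionNamespaceSelectors:
    - matchLabels:
        kubernetes.io/metadata.name: example-tenant-ns
  # Namespaces whose container logs belong to this tenant
  logSourceNamespaceSelectors:
    - matchLabels:
        kubernetes.io/metadata.name: example-tenant-ns
```

To give a tenant several log source namespaces, you could add more entries to logSourceNamespaceSelectors, or use a matchExpressions selector instead of matchLabels.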

Subscriptions

Tenant users can define their Subscriptions in the namespace(s) of their Tenants. Subscriptions can select from the telemetry data (that is already filtered as part of the Tenant definition) and set Output endpoints where the data is forwarded. Such an endpoint can be:

  • an aggregator, for example, the Logging Operator,
  • a remote telemetry backend, for example, Loki, Jaeger, or Prometheus, or
  • a managed service provider, for example, Splunk or Sumo Logic.
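The sketch below pairs a Subscription with an OTLP Output. It assumes a Tenant whose subscriptionNamespaceSelectors select `example-tenant-ns`. The `condition` field (assumed here to take an OTTL boolean expression, where `"true"` forwards everything the tenant can see), the shape of the `outputs` reference list, the `tls.insecure` setting, and the endpoint address are our assumptions about the v1alpha1 schema, so check them against the installed CRDs before use.

```yaml
# Sketch only: field names assumed, names and endpoint are hypothetical.
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Subscription
metadata:
  name: example-subscription
  # Must be a namespace selected by the Tenant's subscriptionNamespaceSelectors
  namespace: example-tenant-ns
spec:
  # Assumed: OTTL condition evaluated per log record; "true" matches every record
  condition: "true"
  # References to Output resources by name and namespace
  outputs:
    - name: example-output
      namespace: example-tenant-ns
---
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Output
metadata:
  name: example-output
  namespace: example-tenant-ns
spec:
  otlp:
    # Hypothetical OTLP/gRPC endpoint of an aggregator or backend
    endpoint: otlp-receiver.example-tenant-ns.svc.cluster.local:4317
    # Assumed setting: plaintext connection, only suitable for in-cluster test endpoints
    tls:
      insecure: true
```

Keeping the Output separate from the Subscription means several Subscriptions can reuse the same endpoint definition.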

Telemetry Controller CR flow

Getting Started

To get started with the Telemetry Controller, complete the following steps. Alternatively, see our Telemetry Controller overview and quickstart blog post.

Prerequisites

  • go version v1.22+
  • docker version 24+
  • kubectl version v1.26+
  • kubernetes v1.26+ with containerd as the container runtime

Optional: create a cluster locally

We recommend using kind or minikube for local experimentation and development.

Kind uses containerd by default, but for minikube you have to start the cluster using the --container-runtime=containerd flag.

kind create cluster
# or
minikube start --container-runtime=containerd

Deployment steps for users

Deploy the latest telemetry-controller:

# Install telemetry-controller, and opentelemetry-operator as a sub-chart
helm upgrade --install --wait --create-namespace --namespace telemetry-controller-system telemetry-controller oci://ghcr.io/kube-logging/helm-charts/telemetry-controller

Deployment steps for devs

Install deps, CRDs and RBAC

# Install dependencies (opentelemetry-operator):
make install-deps

# Install the CRDs and RBAC into the cluster:
make install

Run

# Option 1 (faster): Run the operator from your local machine (uses cluster-admin rights)
make run

# Option 2 (safer): Build and run the operator inside the cluster (uses proper RBAC)
make docker-build IMG=telemetry-controller:latest

kind load docker-image telemetry-controller:latest
# or
minikube image load telemetry-controller:latest

make deploy IMG=telemetry-controller:latest

Example setup

You can deploy the example configuration provided as part of the docs. This deploys a demo pipeline with one tenant, two subscriptions, and an OpenObserve instance. Deploying OpenObserve is optional but recommended: logs can be forwarded to any OTLP endpoint, but OpenObserve provides a UI to visualize the ingested log stream.

# Deploy OpenObserve
kubectl apply -f docs/examples/simple-demo/openobserve.yaml

# Set up port forwarding for the OpenObserve UI (runs in the background)
kubectl -n openobserve port-forward svc/openobserve 5080:5080 &

Open the UI at localhost:5080, navigate to the Ingestion/OTEL Collector tab, and copy the authorization token as seen on the screenshot.

OpenObserve auth

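Paste this token into the example manifest:

# macOS (BSD sed)
sed -i '' -e "s/\<TOKEN\>/INSERT YOUR COPIED TOKEN HERE/" docs/examples/simple-demo/one_tenant_two_subscriptions.yaml
# Linux (GNU sed): no empty argument after -i, and match the literal <TOKEN> placeholder
sed -i -e "s/<TOKEN>/INSERT YOUR COPIED TOKEN HERE/" docs/examples/simple-demo/one_tenant_two_subscriptions.yaml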

Note: Telemetry Controller supports batching; you can enable it by adding a batch section to your Output definition.

We recommend the following settings:

Low-Latency settings

This configuration prioritizes sending small, frequent batches over achieving efficiency through larger batch sizes. It is useful in scenarios where minimal delay in data transmission is critical:

  • send_batch_size: 8192 (Minimum viable batch size for fast processing.)
  • timeout: 200ms (Low timeout ensures small batches are sent quickly.)

Example config:

apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Output
metadata:
  name: ll-output  # Kubernetes resource names must be lowercase
  namespace: example
spec:
  batch:
    send_batch_size: 8192
    timeout: 200ms
  otlp:
    endpoint: example

Archival settings

This configuration maximizes throughput and resource efficiency, making it ideal for batch processing and data archival. It is useful in scenarios where efficiency and throughput are prioritized over immediate transmission:

  • send_batch_size: 1048576 (Large batch size for optimal throughput.)
  • timeout: 60s (Wait for more data to maximize batch size.)

Example config:

apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Output
metadata:
  name: archival-output  # Kubernetes resource names must be lowercase
  namespace: example
spec:
  batch:
    send_batch_size: 1048576
    timeout: 60s
  otlp:
    endpoint: example

# Deploy the pipeline definition
kubectl apply -f docs/examples/simple-demo/one_tenant_two_subscriptions.yaml

Create a workload, which will generate logs for the pipeline:

helm install --wait --create-namespace --namespace example-tenant-ns --generate-name oci://ghcr.io/kube-logging/helm-charts/log-generator

NOTE: To exclude logs from a specific pod, add the telemetry.kube-logging.dev/exclude: "true" annotation to the pod.
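For example, the collector would skip the logs of the pod below. Only the `telemetry.kube-logging.dev/exclude` annotation comes from this README; the pod name, namespace, and busybox log loop are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: noisy-debug-pod            # hypothetical name
  namespace: example-tenant-ns     # hypothetical namespace
  annotations:
    # Annotation values must be strings, so "true" is quoted
    telemetry.kube-logging.dev/exclude: "true"
spec:
  containers:
    - name: app
      image: busybox:1.36
      # Writes a line to stdout every 5 seconds; these lines should not be forwarded
      command: ["sh", "-c", "while true; do echo excluded; sleep 5; done"]
```

For pods managed by a Deployment, StatefulSet, or DaemonSet, put the annotation under `spec.template.metadata.annotations`; annotations on the workload object itself are not copied to the pods it creates.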

Open the OpenObserve UI and inspect the generated log messages:

If the port forward started earlier is no longer running, set it up again:

kubectl -n openobserve port-forward svc/openobserve 5080:5080

OpenObserve logs

Sending logs to logging-operator (example)

(For a more detailed description see our Sending data to the Logging Operator blog post.)

Install dependencies (cert-manager and opentelemetry-operator):

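kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml
# Wait for cert-manager to be ready, since the opentelemetry-operator manifest creates cert-manager resources
kubectl wait --namespace cert-manager --for=condition=available deployment --all --timeout=300s
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/download/v0.112.0/opentelemetry-operator.yaml
# Wait for the opentelemetry-operator to be running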
kubectl wait --namespace opentelemetry-operator-system --for=condition=available deployment/opentelemetry-operator-controller-manager --timeout=300s

Deploy the latest telemetry-controller:

kubectl apply -k github.com/kube-logging/telemetry-controller/config/default --server-side

Install logging-operator

helm upgrade --install logging-operator oci://ghcr.io/kube-logging/helm-charts/logging-operator --version=4.6.0 -n logging-operator --create-namespace

Install log-generator

helm upgrade --install --wait log-generator oci://ghcr.io/kube-logging/helm-charts/log-generator -n log-generator --create-namespace

Apply the provided example resource for logging-operator: logging-operator.yaml

kubectl apply -f logging-operator.yaml

Apply the provided example resource for telemetry-controller: telemetry-controller.yaml

kubectl apply -f telemetry-controller.yaml

Under the hood

Telemetry Controller uses a custom OpenTelemetry Collector distribution as its agent. This distribution is, and will remain, compatible with the upstream OpenTelemetry Collector distribution in its core features, but:

  • We reduce the footprint of the final image by removing unnecessary components. This reduces not just the size, but also the vulnerability surface of the collector.
  • We include additional components with features not available in the upstream OpenTelemetry Collector, for example, to provide a richer set of metrics.
  • We use the OpenTelemetry Operator as the primary controller to implicitly manage the collector.

The OpenTelemetry Collector runs as a DaemonSet, mounting and reading the container log files present on the node. During the initial parsing of the log entries, we extract the pod name, pod namespace, and some other metadata. This allows us to associate each log entry with its source pod through the Kubernetes API, and to fetch metadata that cannot be extracted from the message alone.

Support

If you encounter problems while using the Telemetry Controller, open an issue or talk to us on the #logging-operator Discord channel.

Further info

For further information, use cases, and tutorials, read our blog posts about Telemetry Controller, for example:

We also give talks about Telemetry Controller at various open source conferences, for example:

Contributing

If you find this project useful, help us:

  • Support the development of this project and star this repo! ⭐
  • Help new users with issues they may encounter 💪
  • Send a pull request with your new features and bug fixes 🚀

Please read the organization's Code of Conduct!

For more information, read our organization's contribution guidelines.

License

The project is licensed under the Apache 2.0 License.