feat: add support for OpenTelemetry (#205)
Co-authored-by: Kurtis Van Gent <31518063+kurtisvg@users.noreply.github.com>
Co-authored-by: Wenxin Du <117315983+duwenxin99@users.noreply.github.com>
3 people authored Jan 13, 2025
1 parent 141cae7 commit 1fcc20a
Showing 21 changed files with 906 additions and 58 deletions.
19 changes: 19 additions & 0 deletions cmd/root.go
@@ -27,6 +27,7 @@ import (

	"github.com/googleapis/genai-toolbox/internal/log"
	"github.com/googleapis/genai-toolbox/internal/server"
	"github.com/googleapis/genai-toolbox/internal/telemetry"
	"github.com/spf13/cobra"
	"gopkg.in/yaml.v3"
)
@@ -106,6 +107,9 @@ func NewCommand(opts ...Option) *Command {
	flags.StringVar(&cmd.tools_file, "tools_file", "tools.yaml", "File path specifying the tool configuration.")
	flags.Var(&cmd.cfg.LogLevel, "log-level", "Specify the minimum level logged. Allowed: 'DEBUG', 'INFO', 'WARN', 'ERROR'.")
	flags.Var(&cmd.cfg.LoggingFormat, "logging-format", "Specify logging format to use. Allowed: 'standard' or 'JSON'.")
	flags.BoolVar(&cmd.cfg.TelemetryGCP, "telemetry-gcp", false, "Enable exporting directly to Google Cloud Monitoring.")
	flags.StringVar(&cmd.cfg.TelemetryOTLP, "telemetry-otlp", "", "Enable exporting using OpenTelemetry Protocol (OTLP) to the specified endpoint (e.g. 'http://127.0.0.1:4318')")
	flags.StringVar(&cmd.cfg.TelemetryServiceName, "telemetry-service-name", "toolbox", "Sets the value of the service.name resource attribute for telemetry data.")

	// wrap RunE command so that we have access to original Command object
	cmd.RunE = func(*cobra.Command, []string) error { return run(cmd) }
@@ -173,6 +177,21 @@ func run(cmd *Command) error {
		return fmt.Errorf("logging format invalid.")
	}

	// Set up OpenTelemetry
	otelShutdown, err := telemetry.SetupOTel(ctx, cmd.Command.Version, cmd.cfg.TelemetryOTLP, cmd.cfg.TelemetryGCP, cmd.cfg.TelemetryServiceName)
	if err != nil {
		errMsg := fmt.Errorf("error setting up OpenTelemetry: %w", err)
		cmd.logger.ErrorContext(ctx, errMsg.Error())
		return errMsg
	}
	defer func() {
		err := otelShutdown(ctx)
		if err != nil {
			errMsg := fmt.Errorf("error shutting down OpenTelemetry: %w", err)
			cmd.logger.ErrorContext(ctx, errMsg.Error())
		}
	}()

	// Read tool file contents
	buf, err := os.ReadFile(cmd.tools_file)
	if err != nil {
24 changes: 24 additions & 0 deletions cmd/root_test.go
@@ -41,6 +41,9 @@ func withDefaults(c server.ServerConfig) server.ServerConfig {
	if c.Port == 0 {
		c.Port = 5000
	}
	if c.TelemetryServiceName == "" {
		c.TelemetryServiceName = "toolbox"
	}
	return c
}

@@ -137,6 +140,27 @@ func TestServerConfigFlags(t *testing.T) {
				LogLevel: "WARN",
			}),
		},
		{
			desc: "telemetry gcp",
			args: []string{"--telemetry-gcp"},
			want: withDefaults(server.ServerConfig{
				TelemetryGCP: true,
			}),
		},
		{
			desc: "telemetry otlp",
			args: []string{"--telemetry-otlp", "http://127.0.0.1:4553"},
			want: withDefaults(server.ServerConfig{
				TelemetryOTLP: "http://127.0.0.1:4553",
			}),
		},
		{
			desc: "telemetry service name",
			args: []string{"--telemetry-service-name", "toolbox-custom"},
			want: withDefaults(server.ServerConfig{
				TelemetryServiceName: "toolbox-custom",
			}),
		},
	}
	for _, tc := range tcs {
		t.Run(tc.desc, func(t *testing.T) {
96 changes: 96 additions & 0 deletions docs/telemetry/guide_collector.md
@@ -0,0 +1,96 @@
# Use collector to export telemetry (trace and metric) data
A Collector receives telemetry data, processes it, and exports it to a wide variety of observability backends using its components.

## Collector
The OpenTelemetry Collector removes the need to run, operate, and maintain
multiple agents or collectors. It scales well and supports open source
observability data formats, sending data to one or more open source or
commercial backends. In addition, a Collector provides other benefits, such as
allowing your service to offload data quickly while the Collector takes care of
additional handling like retries, batching, encryption, or even sensitive data
filtering.

To run a Collector, you have to provide a configuration file. The
configuration file consists of four classes of pipeline components that access
telemetry data.
- `Receivers`
- `Processors`
- `Exporters`
- `Connectors`

Example of setting up the classes of pipeline components (in this example, we
don't use connectors):

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: "127.0.0.1:4553"

exporters:
  googlecloud:
    project: <YOUR_GOOGLE_CLOUD_PROJECT>

processors:
  batch:
    send_batch_size: 200
```
After each pipeline component is configured, you will enable it within the
`service` section of the configuration file.

```yaml
service:
  pipelines:
    traces:
      receivers: ["otlp"]
      processors: ["batch"]
      exporters: ["googlecloud"]
```
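The relationship between the two fragments above (components are configured at the top level, then referenced by name inside `service.pipelines`) can be sketched in Go. This is only an illustration under stated assumptions, not the Collector's implementation: the `Config` and `Pipeline` types and the `validate`, `check`, and `exampleConfig` functions are hypothetical names introduced here, and the sketch uses only the standard library.

```go
package main

import "fmt"

// Pipeline lists the component names enabled for one signal (hypothetical type).
type Pipeline struct {
	Receivers  []string
	Processors []string
	Exporters  []string
}

// Config mirrors the shape of the YAML above: configured components plus pipelines.
type Config struct {
	Receivers  map[string]bool
	Processors map[string]bool
	Exporters  map[string]bool
	Pipelines  map[string]Pipeline
}

// check returns an error if any used name is missing from the configured set.
func check(pipeline, kind string, used []string, configured map[string]bool) error {
	for _, name := range used {
		if !configured[name] {
			return fmt.Errorf("pipeline %q references unconfigured %s %q", pipeline, kind, name)
		}
	}
	return nil
}

// validate verifies that every pipeline only references configured components.
func validate(c Config) error {
	for name, p := range c.Pipelines {
		if err := check(name, "receiver", p.Receivers, c.Receivers); err != nil {
			return err
		}
		if err := check(name, "processor", p.Processors, c.Processors); err != nil {
			return err
		}
		if err := check(name, "exporter", p.Exporters, c.Exporters); err != nil {
			return err
		}
	}
	return nil
}

// exampleConfig reproduces the component names used in the YAML example.
func exampleConfig() Config {
	return Config{
		Receivers:  map[string]bool{"otlp": true},
		Processors: map[string]bool{"batch": true},
		Exporters:  map[string]bool{"googlecloud": true},
		Pipelines: map[string]Pipeline{
			"traces": {
				Receivers:  []string{"otlp"},
				Processors: []string{"batch"},
				Exporters:  []string{"googlecloud"},
			},
		},
	}
}

func main() {
	if err := validate(exampleConfig()); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("every pipeline references configured components")
}
```

The point of the sketch is the two-step structure: declaring a component does nothing by itself, and a pipeline can only use names that were declared.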

For a conceptual overview of the Collector, see [Collector][collector].

[collector]: https://opentelemetry.io/docs/collector/

## Using a Collector
There are a couple of steps to run and use a Collector.

1. Obtain a Collector binary. Pull a binary or Docker image for the
OpenTelemetry contrib collector.

1. Set up credentials for your telemetry backend.

1. Set up the Collector config.
Below are some examples for setting up the Collector config:
- [Google Cloud Exporter][google-cloud-exporter]
- [Google Managed Service for Prometheus Exporter][google-prometheus-exporter]

1. Run the Collector with the configuration file.

```bash
./otelcol-contrib --config=collector-config.yaml
```

1. Run toolbox with the `--telemetry-otlp` flag. Configure it to send telemetry
to `http://127.0.0.1:4553` (for HTTP) or the Collector's URL.

```bash
./toolbox --telemetry-otlp=http://127.0.0.1:4553
```

1. Once telemetry data is collected, you can view it in your telemetry
backend. If you are using GCP exporters, telemetry will be visible in the GCP
dashboard at [Metrics Explorer][metrics-explorer] and [Trace
Explorer][trace-explorer].

> [!NOTE]
> If you are exporting to Google Cloud Monitoring, we recommend that you use
> the Google Cloud Exporter for traces and the Google Managed Service for
> Prometheus Exporter for metrics.

[google-cloud-exporter]:
https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/googlecloudexporter
[google-prometheus-exporter]:
https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/googlemanagedprometheusexporter#example-configuration
[metrics-explorer]: https://console.cloud.google.com/monitoring/metrics-explorer
[trace-explorer]: https://console.cloud.google.com/traces
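
When pointing Toolbox at a Collector, it can help to know which URLs an OTLP/HTTP receiver listens on. The sketch below derives them from a base endpoint using the default request paths from the OTLP specification (`/v1/traces` and `/v1/metrics`). It is a hedged illustration only: `signalURL` and `defaultPaths` are hypothetical names, and whether Toolbox builds its request URLs exactly this way is an assumption, not something this guide states.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// defaultPaths holds the default OTLP/HTTP request paths per signal.
var defaultPaths = map[string]string{
	"traces":  "/v1/traces",
	"metrics": "/v1/metrics",
}

// signalURL appends the default path for a signal to a base endpoint such as
// "http://127.0.0.1:4553". It rejects endpoints without an http(s) scheme or host.
func signalURL(endpoint, signal string) (string, error) {
	path, ok := defaultPaths[signal]
	if !ok {
		return "", fmt.Errorf("unknown signal %q", signal)
	}
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", fmt.Errorf("invalid endpoint %q: %w", endpoint, err)
	}
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return "", fmt.Errorf("endpoint %q must be an http(s) URL with a host", endpoint)
	}
	u.Path = strings.TrimSuffix(u.Path, "/") + path
	return u.String(), nil
}

func main() {
	for _, signal := range []string{"traces", "metrics"} {
		target, err := signalURL("http://127.0.0.1:4553", signal)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(signal, "->", target)
	}
}
```

A check like this also catches a common mistake: passing `127.0.0.1:4553` without the `http://` scheme.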
183 changes: 183 additions & 0 deletions docs/telemetry/telemetry.md
@@ -0,0 +1,183 @@
# Telemetry for Toolbox

Telemetry data such as logs, metrics, and traces helps developers understand
the internal state of the system.

Toolbox exports logs via standard output and standard error, and exports traces
and metrics through OpenTelemetry. Additional flags can be passed to Toolbox to
enable different logging behavior, or to export metrics and traces through a
specific [exporter](#exporter).


## Logging

### Logging format
Toolbox supports both text and structured logging formats.

Text logging (the default logging format) outputs each log entry as a string:
```
2024-11-12T15:08:11.451377-08:00 INFO "Initialized 0 sources.\n"
```

Structured logging outputs each log entry as JSON:
```
{
  "timestamp":"2024-11-04T16:45:11.987299-08:00",
  "severity":"ERROR",
  "logging.googleapis.com/sourceLocation":{...},
  "message":"unable to parse tool file at \"tools.yaml\": \"cloud-sql-postgres1\" is not a valid kind of data source"
}
```
> [!NOTE]
> `logging.googleapis.com/sourceLocation` shows the source code location
> information associated with the log entry, if any.

### Log level
Toolbox supports four log levels: `Debug`, `Info`, `Warn`, and `Error`. Toolbox
only outputs logs at or above the configured level. The table below lists the
supported levels in increasing order of severity.

| **Log level** | **Description** |
|---------------|-----------------|
| Debug | Debug logs typically contain information that is only useful during the debugging phase and may be of little value during production. |
| Info | Info logs include information about successful operations within the application, such as a successful start, pause, or exit of the application. |
| Warn | Warning logs are slightly less severe than error conditions. While they do not report an error, they indicate that an operation might fail in the future if action is not taken now. |
| Error | Error logs are assigned to events that contain an application error message. |

### Logging Configurations
The following flags can be used to customize Toolbox logging:

| **Flag** | **Description** |
|----------|-----------------|
| `--log-level` | Preferred log level, allowed values: `debug`, `info`, `warn`, `error`. Default: `info`. |
| `--logging-format` | Preferred logging format, allowed values: `standard`, `json`. Default: `standard`. |

#### Example:

```bash
./toolbox --tools_file "tools.yaml" --log-level warn --logging-format json
```

## Telemetry
### Metrics
A metric is a measurement of a service captured at runtime. The collected data
can be used to provide important insights into the service.
Toolbox provides the following custom metrics:

| **Metric Name** | **Description** |
|-----------------|-----------------|
| `toolbox.server.toolset.get.count` | Counts the number of toolset manifest requests served |
| `toolbox.server.tool.get.count` | Counts the number of tool manifest requests served |
| `toolbox.server.tool.get.invoke` | Counts the number of tool invocation requests served |

All custom metrics have the following attributes/labels:

| **Metric Attributes** | **Description** |
|-----------------|-----------------|
| `toolbox.name` | Name of the toolset or tool, if applicable. |
| `toolbox.status` | Operation status code, for example: `success`, `failure`. |
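
To make the relationship between a metric and its attributes concrete, here is a minimal in-memory sketch in Go. It does not use the OpenTelemetry API and is not how Toolbox records metrics; the `series` and `counters` types are hypothetical, and the tool name `search-hotels` is made up for the example. It only shows that each distinct combination of metric name, `toolbox.name`, and `toolbox.status` is counted separately.

```go
package main

import (
	"fmt"
	"sync"
)

// series identifies one counter: a metric name plus its two attributes.
type series struct {
	Metric string
	Name   string // value of the toolbox.name attribute
	Status string // value of the toolbox.status attribute
}

// counters is a concurrency-safe set of monotonically increasing counts.
type counters struct {
	mu     sync.Mutex
	values map[series]int64
}

func newCounters() *counters {
	return &counters{values: make(map[series]int64)}
}

// add increments the counter for the given metric and attribute values.
func (c *counters) add(metric, name, status string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[series{Metric: metric, Name: name, Status: status}]++
}

// get returns the current count for the given metric and attribute values.
func (c *counters) get(metric, name, status string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.values[series{Metric: metric, Name: name, Status: status}]
}

func main() {
	c := newCounters()
	// "search-hotels" is a made-up tool name used only for this example.
	c.add("toolbox.server.tool.get.count", "search-hotels", "success")
	c.add("toolbox.server.tool.get.count", "search-hotels", "success")
	c.add("toolbox.server.tool.get.count", "search-hotels", "failure")

	fmt.Println("success:", c.get("toolbox.server.tool.get.count", "search-hotels", "success"))
	fmt.Println("failure:", c.get("toolbox.server.tool.get.count", "search-hotels", "failure"))
}
```

In a telemetry backend, the same idea lets you filter or group a single metric by tool name or by status.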

### Traces
A trace is a tree of spans that shows the path a request takes through an
application.

Spans generated by the Toolbox server are prefixed with `toolbox/server/`. For
example, when a user runs Toolbox, it generates spans for the following, with
`toolbox/server/init` as the root span:

![traces](traces.png)

### Exporter
An exporter is responsible for processing and exporting telemetry data. Toolbox
generates telemetry data in the OpenTelemetry Protocol (OTLP) format, and users
can choose any exporter designed to support OTLP. Toolbox provides two exporter
implementations to choose from: the Google Cloud Exporter, which sends data
directly to the backend, or the OTLP Exporter together with a Collector, which
acts as a proxy to collect and export data to the telemetry backend of the
user's choice.

![telemetry_flow](telemetry_flow.png)

#### Google Cloud Exporter
The Google Cloud Exporter directly exports telemetry to Google Cloud Monitoring.
It utilizes the [GCP Metric Exporter][gcp-metric-exporter] and [GCP Trace
Exporter][gcp-trace-exporter].

[gcp-metric-exporter]:
https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/tree/main/exporter/metric
[gcp-trace-exporter]:
https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/tree/main/exporter/trace

> [!NOTE]
> If you're using Google Cloud Monitoring, the following APIs will need to be
> enabled. For instructions on how to enable APIs, see [this
> guide](https://cloud.google.com/endpoints/docs/openapi/enable-api):
>
> - logging.googleapis.com
> - monitoring.googleapis.com
> - cloudtrace.googleapis.com

#### OTLP Exporter
This implementation uses the default OTLP Exporter over HTTP for
[metrics][otlp-metric-exporter] and [traces][otlp-trace-exporter]. You can use
this exporter if you choose to export your telemetry data to a Collector.

[otlp-metric-exporter]: https://opentelemetry.io/docs/languages/go/exporters/#otlp-metrics-over-http
[otlp-trace-exporter]: https://opentelemetry.io/docs/languages/go/exporters/#otlp-traces-over-http

### Collector
A collector acts as a proxy between the application and the telemetry backend.
It receives telemetry data, transforms it, and then exports it to backends that
can store it permanently. Toolbox provides an option to export telemetry data
to the user's choice of backend(s) that are compatible with the OpenTelemetry
Protocol (OTLP). If you would like to use a collector, please refer to this
[guide](./guide_collector.md).

### Telemetry Configurations
The following flags are used to determine Toolbox's telemetry configuration:

| **flag** | **type** | **description** |
|-------------------------------|----------|-----------------|
| `--telemetry-gcp` | bool | Enable exporting directly to Google Cloud Monitoring. Default is `false`. |
| `--telemetry-otlp` | string | Enable exporting using OpenTelemetry Protocol (OTLP) to the specified endpoint (e.g. 'http://127.0.0.1:4318'). |
| `--telemetry-service-name` | string | Sets the value of the `service.name` resource attribute. Default is `toolbox`. |

In addition to the flags noted above, you can further configure OpenTelemetry
through the environment variables described in
[General SDK Configuration][sdk-configuration].

[sdk-configuration]:
https://opentelemetry.io/docs/languages/sdk-configuration/general/
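
As one example of that environment-based configuration, the General SDK Configuration page defines `OTEL_RESOURCE_ATTRIBUTES`, a comma-separated list of `key=value` pairs. The Go sketch below shows how such a value is structured by parsing it with the standard library. It is a simplified illustration: `parseResourceAttributes` is a hypothetical helper, it ignores the percent-encoding rules of the specification, and how Toolbox itself combines these variables with its flags is not covered here.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseResourceAttributes parses the "key1=value1,key2=value2" format used by
// the OTEL_RESOURCE_ATTRIBUTES environment variable. Malformed entries are
// skipped; percent-decoding is intentionally left out of this sketch.
func parseResourceAttributes(raw string) map[string]string {
	attrs := make(map[string]string)
	for _, pair := range strings.Split(raw, ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(pair), "=")
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			continue
		}
		attrs[key] = strings.TrimSpace(value)
	}
	return attrs
}

func main() {
	attrs := parseResourceAttributes(os.Getenv("OTEL_RESOURCE_ATTRIBUTES"))
	fmt.Println(len(attrs), "resource attributes set via the environment")
	for key, value := range attrs {
		fmt.Printf("%s=%s\n", key, value)
	}
}
```

For instance, running the sketch with `OTEL_RESOURCE_ATTRIBUTES="deployment.environment=staging,team=tools"` yields two attributes.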

#### Example usage

To enable the Google Cloud Exporter:
```bash
./toolbox --telemetry-gcp
```

To enable the OTLP Exporter, provide the Collector endpoint:
```bash
./toolbox --telemetry-otlp=http://127.0.0.1:4553
```
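
The three telemetry flags and their defaults can be sketched with Go's standard `flag` package. Toolbox itself registers these flags through cobra in `cmd/root.go`, so this is a stand-in for illustration: the `telemetryConfig` type and `parseTelemetryFlags` function are hypothetical, while the flag names and defaults are taken from the table above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// telemetryConfig holds the values of the three telemetry flags (hypothetical type).
type telemetryConfig struct {
	GCP         bool
	OTLP        string
	ServiceName string
}

// parseTelemetryFlags parses command-line arguments into a telemetryConfig,
// applying the documented defaults when a flag is absent.
func parseTelemetryFlags(args []string) (telemetryConfig, error) {
	var cfg telemetryConfig
	fs := flag.NewFlagSet("toolbox", flag.ContinueOnError)
	fs.BoolVar(&cfg.GCP, "telemetry-gcp", false, "Enable exporting directly to Google Cloud Monitoring.")
	fs.StringVar(&cfg.OTLP, "telemetry-otlp", "", "Enable exporting using OTLP to the specified endpoint.")
	fs.StringVar(&cfg.ServiceName, "telemetry-service-name", "toolbox", "Value of the service.name resource attribute.")
	if err := fs.Parse(args); err != nil {
		return telemetryConfig{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseTelemetryFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("gcp=%t otlp=%q service.name=%q\n", cfg.GCP, cfg.OTLP, cfg.ServiceName)
}
```

Both `--telemetry-otlp=http://127.0.0.1:4553` and `--telemetry-otlp http://127.0.0.1:4553` are accepted by this sketch, and the service name falls back to `toolbox` when the flag is omitted.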

#### Resource Attribute
All metrics and traces generated within Toolbox are associated with a
unified [resource][resource]. The resource includes the following attributes:

| **Resource Name** | **Description** |
|-------------------|-----------------|
| [TelemetrySDK](https://pkg.go.dev/go.opentelemetry.io/otel/sdk/resource#WithTelemetrySDK) | TelemetrySDK version info. |
| [OS](https://pkg.go.dev/go.opentelemetry.io/otel/sdk/resource#WithOS) | OS attributes including OS description and OS type. |
| [Container](https://pkg.go.dev/go.opentelemetry.io/otel/sdk/resource#WithContainer) | Container attributes including container ID, if applicable. |
| [Host](https://pkg.go.dev/go.opentelemetry.io/otel/sdk/resource#WithHost) | Host attributes including host name. |
| [SchemaURL](https://pkg.go.dev/go.opentelemetry.io/otel/sdk/resource#WithSchemaURL) | Sets the schema URL for the configured resource. |
| `service.name` | OpenTelemetry service name. Defaults to `toolbox`. Users can set the service name via the `--telemetry-service-name` flag to distinguish between different Toolbox services. |
| `service.version` | The version of Toolbox used. |


[resource]: https://opentelemetry.io/docs/languages/go/resources/



Binary file added docs/telemetry/telemetry_flow.png
Binary file added docs/telemetry/traces.png