briancaffey/hermes-otel
OTel Plugin for Hermes Agent
hermes-otel is an observability plugin for the Hermes Agent that enables comprehensive monitoring of agentic workflows. It automatically captures and exports LLM tool calls, model invocations, and API requests as OpenTelemetry spans to any OTLP-compatible backend. The plugin provides deep visibility into session lifecycles, including parent-child span nesting and token usage aggregation. It supports a wide range of observability platforms including Phoenix, Langfuse, SigNoz, and Grafana for tracking traces, metrics, and logs.
- Exports LLM tool calls and model invocations as OTel spans
- Supports multiple backends including Phoenix, Langfuse, and SigNoz
- Aggregates token usage and session I/O metrics automatically
full readme from github
hermes-otel
OpenTelemetry plugin for Hermes Agent. Automatically exports LLM tool calls, model invocations, and API requests as OTel spans to any OTLP-compatible backend.
Backends
Tested with:
- Phoenix (local or cloud) — traces + metrics
- Langfuse (cloud or self-hosted) — traces only
- LangSmith (LangChain's tracing platform) — traces only
- SigNoz (cloud or self-hosted) — traces + metrics + logs
- Jaeger (local) — traces only
- Grafana Tempo (local or Grafana Cloud) — traces only
- Grafana LGTM (local) — traces + metrics + logs
- Uptrace (self-hosted) — traces + metrics + logs
- OpenObserve (self-hosted) — traces + metrics + logs
Any OTLP HTTP endpoint should work.
- For Phoenix see docker-compose/phoenix.yaml
- For Langfuse see https://langfuse.com/self-hosting/deployment/docker-compose
- For Langsmith see https://smith.langchain.com/
- For SigNoz see docker-compose/signoz/ (includes the upstream stack + port-remap notes)
- For Grafana LGTM see docker-compose/lgtm.yaml and docker-compose/lgtm/README.md
- For Uptrace see docker-compose/uptrace.yaml and docker-compose/uptrace/README.md
- For OpenObserve see docker-compose/openobserve.yaml and docker-compose/openobserve/README.md
Installation
hermes plugins install briancaffey/hermes-otel
The plugin lives in ~/.hermes/plugins/hermes_otel/ and Hermes auto-discovers it via plugin.yaml. However, the OTel dependencies must be installed into the hermes-agent virtual environment (where hermes itself runs):
# Install OTel runtime dependencies into the hermes-agent venv
~/git/hermes-agent/venv/bin/pip install \
opentelemetry-api \
opentelemetry-sdk \
opentelemetry-exporter-otlp-proto-http
# Optional: for LangSmith time-ordered run IDs
~/git/hermes-agent/venv/bin/pip install langsmith
You can also install the plugin package itself in editable mode (this pulls in the same OTel deps automatically):
~/git/hermes-agent/venv/bin/pip install -e ~/.hermes/plugins/hermes_otel
Running tests
The test suite uses its own isolated environment via uv and does not require the hermes-agent venv:
cd ~/.hermes/plugins/hermes_otel
# Unit + integration tests (no Docker needed, <1s)
uv run --extra dev pytest
# All E2E tests (requires Docker)
uv run --extra dev --extra e2e pytest -m e2e
# Phoenix E2E only (starts a single container)
uv run --extra dev --extra e2e pytest -m phoenix
# Langfuse E2E only (starts full stack via docker compose)
uv run --extra dev --extra e2e pytest -m langfuse
# Smoke tests — full pipeline: hermes API server -> plugin -> Langfuse
uv run --extra dev --extra e2e pytest -m smoke
The default pytest run excludes E2E and smoke tests and completes in under a second.
Test tiers
The test suite is organized into four tiers, from fastest/simplest to slowest/most comprehensive:
| Tier | Marker | Tests | What it tests | Requirements |
|---|---|---|---|---|
| Unit | (default) | 109 | Hook logic, tracer init, helpers, SpanTracker | None |
| Integration | (default) | 19 | Full span export pipeline with InMemorySpanExporter, parent-child hierarchy, token roll-up, metrics | None |
| E2E | -m e2e |
6 | OTLP export to real Phoenix/Langfuse, queried via GraphQL/REST API | Docker |
| Smoke | -m smoke |
6 | Send real chats to hermes via OpenAI SDK, verify traces in Langfuse | hermes gateway + Langfuse |
Unit tests (tests/unit/) cover:
_safe_str,_to_int,_detect_session_kindhelper functionsSpanTrackerclass: span lifecycle, parent stack, end_allHermesOTelPlugin.init()environment detection (Phoenix vs Langfuse vs LangSmith priority)NoopSpangraceful degradation when OTel is unavailable- All 8 hook callbacks with mocked tracer (span names, attributes, metric recording, module-state management)
Integration tests (tests/integration/) use a real OTel SDK with InMemorySpanExporter — no network needed:
- Individual hook pairs produce correctly attributed spans
- Parent-child nesting: Session > LLM > API > Tool (verified via span context)
- Full session lifecycle with token aggregation and session I/O roll-up
- Metric counters and histograms via
InMemoryMetricReader
E2E tests (tests/e2e/) invoke hooks directly against real backends and query their APIs:
- Phoenix: fires hooks, queries Phoenix GraphQL API at
/graphqlto verify spans - Langfuse: fires hooks, queries Langfuse REST API at
GET /api/public/observationsto verify observations
Smoke tests (tests/smoke/) exercise the complete production pipeline:
- test_hermes_api: verifies the hermes API server is functional (health, models, chat completion)
- test_hermes_langfuse: sends real chats via OpenAI SDK to hermes, then queries Langfuse to confirm traces arrived with correct span names, tool spans, and token data
E2E backends
Phoenix — single container, starts in seconds:
docker compose -f docker-compose/phoenix.yaml up -d
# or let the test fixture start it automatically
Langfuse — full stack (Langfuse + Postgres + Redis + ClickHouse + MinIO), starts in ~60s:
docker compose -f docker-compose/langfuse.yaml up -d
# Pre-seeded API keys: lf_pk_test_hermes_otel / lf_sk_test_hermes_otel
# UI at http://localhost:3000, OTEL endpoint at http://localhost:3000/api/public/otel
The E2E fixtures will start/stop Docker services automatically if they aren't already running. If a service is already running on the expected port, it is reused.
Smoke tests
Smoke tests exercise the full pipeline end-to-end:
OpenAI SDK --> hermes API server --> LLM --> OTEL plugin --> Langfuse
(port 8642) (hooks.py) (port 3000)
\ /
`--- pytest sends chat here pytest queries here ---`
They require:
- hermes-agent API server running with the OTEL plugin loaded. Add to
~/.hermes/.env:
Then start the gateway:API_SERVER_ENABLED=truehermes gateway - Langfuse running with credentials configured in
~/.hermes/.env(OTEL_LANGFUSE_*variables)
Tests skip automatically with a helpful message if either service is not reachable. The smoke tests poll the Langfuse observations API (up to 60-90s) to account for async trace ingestion.
Configuration
You can either pick one backend via environment variables (legacy mode,
shown below), or fan multiple backends out in parallel via
config.yaml. The two are mutually exclusive — when backends: is set in
the yaml file, env-var detection is skipped.
Multi-backend (config.yaml)
A fully annotated template lives at config.yaml.example
in the plugin root. Copy it to config.yaml and edit in place:
cp ~/.hermes/plugins/hermes_otel/config.yaml.example \
~/.hermes/plugins/hermes_otel/config.yaml
config.yaml is gitignored so local endpoints and (avoidable) secrets
never get committed. Only config.yaml.example is tracked. A minimal
multi-backend config looks like:
backends:
- type: phoenix
endpoint: http://localhost:6006/v1/traces
- type: jaeger
endpoint: http://localhost:4318/v1/traces
- type: tempo
endpoint: http://localhost:3200/v1/traces
- type: signoz
endpoint: http://localhost:4328/v1/traces
ingestion_key_env: OTEL_SIGNOZ_INGESTION_KEY # secret from env
- type: langfuse
public_key_env: LANGFUSE_PUBLIC_KEY
secret_key_env: LANGFUSE_SECRET_KEY
base_url: https://cloud.langfuse.com
- type: otlp # any other OTLP/HTTP collector
name: my-collector
endpoint: http://collector:4318/v1/traces
headers:
X-Auth: secret
Every entry gets its own BatchSpanProcessor and (where supported) its own
PeriodicExportingMetricReader. Each processor owns a background worker
thread, so a slow or unreachable collector cannot block the agent's hot
path or starve the others — span end is just a non-blocking enqueue. Both
trace and metrics export run in parallel across all configured backends.
Supported type values: phoenix, langfuse, signoz, jaeger, tempo,
otlp, lgtm, uptrace, openobserve. Use otlp for any collector
that doesn't have a dedicated type. Backends marked
traces-only (langfuse, jaeger, tempo) are auto-detected and skip
the metrics reader. Override with metrics: true|false per entry if
needed. See config.yaml.example for the full list of fields each type
accepts — Uptrace takes a dsn: for the uptrace-dsn header, OpenObserve
takes user: / password: for HTTP Basic auth, and so on.
Full-conversation capture
By default the llm.* span's input.value is just the latest user turn.
The underlying api.* spans don't expose per-message detail. To see the
entire message list the model actually saw, flip on
capture_conversation_history:
capture_conversation_history: true
conversation_history_max_chars: 40000 # safety cap; JSON is clipped with "..."
Or via env: HERMES_OTEL_CAPTURE_CONVERSATION_HISTORY=true. When enabled
the LLM span gets input.value = JSON-serialized history, input.mime_type = application/json, and hermes.conversation.message_count. Phoenix
pretty-prints the JSON in its Input panel; Langfuse / Jaeger / SigNoz show
it as a large string. Respects the global capture_previews kill switch.
Secrets should live in env vars (use the *_env: keys to reference them
by name) rather than inline in yaml. LangSmith remains an env-var-only
single-backend path; setting LANGSMITH_TRACING=true short-circuits the
yaml backend list.
Single backend (env vars)
Pick one backend:
Phoenix
export OTEL_PHOENIX_ENDPOINT="http://localhost:6006/v1/traces"
export OTEL_PROJECT_NAME=hermes-agent
Langfuse
# Option A (plugin-specific vars):
export OTEL_LANGFUSE_PUBLIC_API_KEY="pk-lf-..."
export OTEL_LANGFUSE_SECRET_API_KEY="sk-lf-..."
# Optional — defaults to EU cloud endpoint
export OTEL_LANGFUSE_ENDPOINT="https://cloud.langfuse.com/api/public/otel"
# For US region:
# export OTEL_LANGFUSE_ENDPOINT="https://us.cloud.langfuse.com/api/public/otel"
# Option B (Langfuse-standard vars from docs):
# export LANGFUSE_PUBLIC_KEY="pk-lf-..."
# export LANGFUSE_SECRET_KEY="sk-lf-..."
# export LANGFUSE_BASE_URL="https://cloud.langfuse.com" # or us.cloud/langfuse/self-hosted base URL
LangSmith
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
# Optional — defaults to LangChain Cloud
export LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
# Optional — project name for organizing traces
export LANGSMITH_PROJECT="hermes-langsmith-otel"
Note: Install
langsmithfor better time-ordered run IDs:pip install langsmith. The plugin useslangsmith.uuid7()for run IDs when available, otherwise falls back touuid.uuid4().
SigNoz
# Self-hosted (see docker-compose/signoz/ — OTLP HTTP is remapped to 4328
# to avoid colliding with Phoenix on 4318)
export OTEL_SIGNOZ_ENDPOINT="http://localhost:4328/v1/traces"
export OTEL_PROJECT_NAME=hermes-agent
# SigNoz Cloud — use the regional ingest URL + your ingestion key
# export OTEL_SIGNOZ_ENDPOINT="https://ingest.us.signoz.cloud:443/v1/traces"
# export OTEL_SIGNOZ_INGESTION_KEY="sz-..."
The plugin sends both traces and metrics over OTLP/HTTP. When
OTEL_SIGNOZ_INGESTION_KEY is set, the signoz-ingestion-key header is
attached to both exporters.
Jaeger
# Jaeger ≥ 1.35 accepts OTLP/HTTP natively on port 4318
export OTEL_JAEGER_ENDPOINT="http://localhost:4318/v1/traces"
export OTEL_PROJECT_NAME=hermes-otel-jaeger
Jaeger is traces-only — the plugin skips metric export when this backend is selected. If you need token/tool/cost metrics alongside Jaeger traces, pair it with a Prometheus-compatible metrics sink or use a unified backend (Phoenix, SigNoz).
Grafana Tempo
# Tempo accepts OTLP/HTTP natively on port 4318
export OTEL_TEMPO_ENDPOINT="http://localhost:4318/v1/traces"
export OTEL_PROJECT_NAME=hermes-otel-tempo
Run the upstream single-binary example (Tempo + MinIO + Grafana + Prometheus):
cd ~/git/grafana/tempo/example/docker-compose/single-binary
docker compose up -d
# UI: http://localhost:3000 (Grafana, anonymous admin)
# OTLP: http://localhost:4318 (HTTP) / localhost:4317 (gRPC)
Tempo is traces-only — the plugin skips metric export when this backend is selected. The upstream example already bundles Prometheus + Grafana, so token/tool/cost metrics can be routed there via a separate Prometheus remote-write or OTel collector if needed.
Optional
export OTEL_PROJECT_NAME="hermes-agent" # Shown in Phoenix
export HERMES_OTEL_DEBUG=true # Enable debug logging (see below)
Debug logging
The plugin prints only essential startup messages (backend connected/failed, hook count) to stdout. For detailed per-span logging (span start/end, parent nesting, token counts, HTTP payloads), enable debug mode:
export HERMES_OTEL_DEBUG=true
Debug output is written to ~/.hermes/plugins/hermes_otel/debug.log and does not clutter hermes stdout.
Priority order: LangSmith (if LANGSMITH_TRACING=true) > Langfuse (if credentials set) > SigNoz (OTEL_SIGNOZ_ENDPOINT) > Jaeger (OTEL_JAEGER_ENDPOINT) > Tempo (OTEL_TEMPO_ENDPOINT) > Phoenix (OTEL_PHOENIX_ENDPOINT).
Shaping knobs — config.yaml and HERMES_OTEL_* env vars
Backend selection stays env-var-driven (above). For telemetry shaping — sampling, preview size, resource attributes, TTL, extra headers — you can also use a YAML file at ~/.hermes/plugins/hermes_otel/config.yaml.
Precedence (per-field): HERMES_OTEL_* env var > config.yaml value > default.
Example config.yaml:
enabled: true
sample_rate: 0.25 # ParentBased(TraceIdRatioBased) — null/omit = sample everything
root_span_ttl_ms: 600000 # orphan-sweep threshold (10 min default)
flush_interval_ms: 60000 # metrics export cadence
preview_max_chars: 1200 # clip_preview truncation limit
capture_previews: true # false = suppress all input.value / output.value
project_name: hermes-prod # supersedes OTEL_PROJECT_NAME
global_tags:
team: platform
resource_attributes: # merged into Resource; overrides global_tags on key conflict
env: prod
region: us-east-1
headers: # merged onto outgoing OTLP requests
X-Scope-OrgID: tenant-a
Every field can be overridden by env var with prefix HERMES_OTEL_ (scalars only):
| Field | Env var |
|---|---|
enabled |
HERMES_OTEL_ENABLED (true/false) |
sample_rate |
HERMES_OTEL_SAMPLE_RATE (float 0..1, or 0 to disable) |
root_span_ttl_ms |
HERMES_OTEL_ROOT_SPAN_TTL_MS |
flush_interval_ms |
HERMES_OTEL_FLUSH_INTERVAL_MS |
preview_max_chars |
HERMES_OTEL_PREVIEW_MAX_CHARS |
capture_previews |
HERMES_OTEL_CAPTURE_PREVIEWS |
project_name |
HERMES_OTEL_PROJECT_NAME |
span_batch_max_queue_size |
HERMES_OTEL_SPAN_BATCH_MAX_QUEUE_SIZE |
span_batch_schedule_delay_ms |
HERMES_OTEL_SPAN_BATCH_SCHEDULE_DELAY_MS |
span_batch_max_export_batch_size |
HERMES_OTEL_SPAN_BATCH_MAX_EXPORT_BATCH_SIZE |
span_batch_export_timeout_ms |
HERMES_OTEL_SPAN_BATCH_EXPORT_TIMEOUT_MS |
force_flush_on_session_end |
HERMES_OTEL_FORCE_FLUSH_ON_SESSION_END |
pyyaml is optional — if not installed, the YAML file is silently skipped and only env vars + defaults apply. Malformed YAML logs a single warning and falls back to defaults.
Privacy mode
Set capture_previews: false (or HERMES_OTEL_CAPTURE_PREVIEWS=false) to suppress every input.value / output.value attribute. Useful for shared deployments where message content can't leave the process. A one-line startup banner confirms the mode is active.
Per-turn summary attributes
On on_session_end, the root session/agent span is enriched with a summary of what happened in the turn — so dashboards don't need to JOIN across spans.
| Attribute | Type | Meaning |
|---|---|---|
hermes.turn.tool_count |
int | distinct tool names invoked |
hermes.turn.tools |
string | sorted CSV of distinct tool names (≤500 chars) |
hermes.turn.tool_targets |
string | |-joined distinct file paths / URLs |
hermes.turn.tool_commands |
string | |-joined distinct shell commands |
hermes.turn.tool_outcomes |
string | sorted CSV of distinct outcome statuses |
hermes.turn.skill_count |
int | distinct skill names inferred |
hermes.turn.skills |
string | sorted CSV of distinct skill names |
hermes.turn.api_call_count |
int | pre_api_request hook invocations |
hermes.turn.final_status |
string | completed | interrupted | incomplete | timed_out |
Zero/empty aggregators are omitted rather than emitted as empty strings.
Tool identity, outcome, skill inference
Each tool.* span now also carries:
hermes.tool.target— first non-empty value under args.path/file_path/target/url/uri.hermes.tool.command— first non-empty value under args.command/cmd.hermes.tool.outcome— one ofcompleted·error·timeout·blocked· (explicitstatusfield from the result, lowercased). Onlyerrormaps the spanStatusCodetoERROR; timeouts/blocked stayOKso dashboards don't count them as failures.hermes.skill.name— inferred from args paths matching/skills/<name>/. Does not match/optional-skills/<name>/references/. Also increments ahermes.skill.inferred{skill_name, source}counter so ops can audit hit rates.
Orphan-span sweep
If a session never fires on_session_end (e.g. host crash mid-turn), it would otherwise leak active-span state. A TTL-based sweeper (default 10 min, configurable via root_span_ttl_ms) runs at the top of every pre_* hook; sessions older than the TTL are finalized with hermes.turn.final_status=timed_out and span status OK (not ERROR — timeouts should not pollute error rates).
Non-blocking span export
Spans are exported via OpenTelemetry's BatchSpanProcessor: span.end() enqueues the span to a bounded in-memory queue, and a background worker drains that queue in batches on a timer. This means a slow or unreachable OTLP backend no longer adds latency to every tool call / API request.
Export cadence:
- Background worker flushes every
span_batch_schedule_delay_ms(default 1s). - At the end of each session (
on_session_end), the plugin force-flushes so traces appear in the UI immediately rather than after the worker's next cycle. Disable withforce_flush_on_session_end: falseif you prefer to let the worker handle it. - On graceful process shutdown, an
atexithandler flushes the queue once so nothing is lost.
Backpressure: the queue is bounded by span_batch_max_queue_size (default 2048). If the agent outruns the exporter, the oldest enqueued spans are dropped — hermes keeps running rather than stalling.
Crash vs. graceful exit: up to schedule_delay_millis worth of spans may be lost on a hard crash (SIGKILL, OOM). This is the standard OTel trade-off and mirrors every production tracing stack. Graceful shutdown (hermes gateway stop, SIGTERM) triggers the atexit flush.
How it works
Hermes fires lifecycle hooks. This plugin maps them to OTel spans:
Turn 1:
session.{platform} / cron (root, when session hooks are available)
└── LLM span
└── API span (first call → stop or tool_calls)
└── Tool span(s) (if tools called)
└── API span (second call → final response)
Span hierarchy
| Span | Kind | Contains |
|---|---|---|
session.{platform} / cron |
GENERAL | Session metadata, completion/interruption status |
llm.{model} |
LLM | Model name, provider, user message (input), assistant response (output) |
api.{model} |
LLM | Token counts (prompt + completion), duration, finish reason, cache tokens |
tool.{name} |
TOOL | Tool name, arguments (input), result (output), error status |
Attribute conventions
The plugin emits dual-convention attributes so both backends work:
| Metric | Langfuse (gen_ai) | Phoenix (OpenInference) |
|---|---|---|
| Prompt tokens | gen_ai.usage.input_tokens |
llm.token_count.prompt |
| Completion tokens | gen_ai.usage.output_tokens |
llm.token_count.completion |
| Total tokens | — | llm.token_count.total |
| Cache read | gen_ai.usage.cache_read_input_tokens |
llm.token_count.cache_read |
| Cache write | gen_ai.usage.cache_creation_input_tokens |
llm.token_count.cache_write |
Langfuse uses gen_ai.content.prompt and gen_ai.content.completion for text. Phoenix uses input.value and output.value. Both are set on LLM spans.
File structure
| File | Role |
|---|---|
plugin.yaml |
Plugin manifest — declares hooks to Hermes |
__init__.py |
Entry point — initializes tracer, registers core hooks (+ session hooks when supported) |
tracer.py |
OTel TracerProvider setup, span lifecycle management, parent/child tracking |
hooks.py |
Hook implementations — maps Hermes events to OTel spans with attributes |
debug_utils.py |
Optional debug logging and secret masking |
docker-compose/ |
Docker Compose files for Phoenix and Langfuse backends |
tests/unit/ |
Unit tests — helpers, SpanTracker, tracer init, hook callbacks |
tests/integration/ |
Integration tests — InMemorySpanExporter, span hierarchy, metrics |
tests/e2e/ |
E2E tests — real Phoenix/Langfuse via Docker |
tests/smoke/ |
Smoke tests — full pipeline through hermes API server to Langfuse |
Roadmap: additional backends
This plugin speaks plain OTLP/HTTP, so any OTLP-compatible backend should work today with no code changes — just point OTEL_EXPORTER_OTLP_ENDPOINT at it. The list below tracks backends I plan to formally test, add a docker-compose/ file for, and (where applicable) cover with a smoke test.
Status legend: ✅ supported & tested · 🟡 should work, not yet tested/documented · 🔲 planned
| Backend | Signals | Deployment | Account / cost | Status |
|---|---|---|---|---|
| Phoenix | traces | Local (docker) · Arize AX cloud | OSS, no account · commercial cloud | ✅ |
| Langfuse | traces | Local (docker compose) · Cloud | OSS, no account · free tier + paid | ✅ |
| LangSmith | traces | Cloud only (self-host = enterprise) | Free personal tier · paid tiers | ✅ |
| Jaeger | traces | Local (single container) | OSS, no account needed | ✅ |
| SigNoz | traces + metrics + logs | Local (docker compose) · Cloud | OSS, no account · free tier + paid cloud | ✅ |
| Grafana Tempo | traces | Local (docker compose) · Grafana Cloud | OSS, no account · free tier + paid cloud | ✅ |
| Grafana LGTM | traces + metrics + logs | Local (single container) | OSS, no account | ✅ |
| OpenObserve | traces + metrics + logs | Local (single binary / docker) · Cloud | OSS, no account · free tier + paid cloud | ✅ |
| Uptrace | traces + metrics + logs | Local (docker compose) · Cloud | OSS, no account · free tier + paid cloud | ✅ |
| Honeycomb | traces + metrics | Cloud only | Free tier + paid | 🔲 |
| New Relic | traces + metrics + logs | Cloud only | Free tier (100 GB/mo) + paid | 🔲 |
| Elastic APM | traces + metrics + logs | Local (docker) · Elastic Cloud | OSS self-host · trial + paid cloud | 🔲 |
| Datadog | traces + metrics + logs | Cloud only | Trial only, paid thereafter | 🔲 |
Quick picks
- Fully offline / no account ever: Phoenix, Langfuse (self-hosted), Jaeger, SigNoz, Grafana Tempo+Mimir, OpenObserve, Uptrace, Elastic APM self-host. All runnable via
docker compose up. - Free SaaS (personal / hobby tier, no credit card): Langfuse Cloud, LangSmith, SigNoz Cloud, Grafana Cloud, Honeycomb, New Relic. Best if you don't want to run infrastructure.
- Paid only (credit card required after trial): Datadog, Dynatrace, LangSmith self-hosted (enterprise plan).
Free-tier limits change frequently — check each vendor's pricing page before committing. The table reflects what's advertised as of this writing.
Signals note
Jaeger and Tempo are both traces only. If you want both spans and the token/tool/cost metrics this plugin emits (via PeriodicExportingMetricReader), pair them with Prometheus, or pick one of the traces+metrics backends above.
Current limitations
- No full prompt capture — Hermes hooks don't expose the fully-formed prompt (system message + conversation history + tool results) to plugins. API spans only receive metadata (token counts, model, duration). The raw user message and assistant response appear on the parent LLM span.
- Langfuse auth — Requires both public and secret keys; Basic Auth is constructed automatically. If only one key is set, Langfuse mode won't activate.
- No gRPC — Only OTLP over HTTP/JSON is used. gRPC exporters are not included.
- Single session per run — Span tracking is in-memory; if Hermes restarts mid-session, active spans are lost. A TTL-based sweeper finalizes abandoned sessions (see "Orphan-span sweep" above), but the orphaned process's buffered spans still need a graceful
atexitto flush.