Observability Guide

CloudBase provides built-in observability capabilities based on OpenTelemetry and OpenInference standards, helping developers track and monitor the complete execution chain of AI Agents.

Prerequisites

  • Agent application created (LangChain / LangGraph / CrewAI)
  • Corresponding SDK installed (cloudbase-agent-server / @cloudbase/agent-server)
  • Understanding of OpenTelemetry basics (optional)

Install Dependencies

# Basic dependencies
pip install cloudbase-agent-server cloudbase-agent-observability

# If you need to export to OTLP backend (e.g., Langfuse)
pip install opentelemetry-exporter-otlp

Overview

What is Observability

Observability is the ability to understand a system's internal state through signals output by the system (logs, metrics, traces). For AI Agent applications, observability helps you:

  • Track Execution Chains: View the complete call chain from the Agent receiving a request to returning a response
  • Identify Performance Bottlenecks: Pinpoint time-consuming LLM calls or tool executions
  • Debug Issues: Analyze the Agent's decision process and tool call parameters
  • Optimize Costs: Calculate token usage and analyze model invocation frequency
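For cost analysis, the token counts recorded on each LLM span can be turned into a dollar estimate directly. A minimal sketch, assuming hypothetical per-1K-token prices (check your provider's actual pricing):

```python
# Hypothetical per-1K-token prices; substitute your provider's real rates.
PROMPT_PRICE_PER_1K = 0.03
COMPLETION_PRICE_PER_1K = 0.06

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one LLM call from its token counts."""
    return (prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
            + completion_tokens / 1000 * COMPLETION_PRICE_PER_1K)

# Token counts come from span attributes such as
# llm.token_count.prompt / llm.token_count.completion.
print(estimate_cost(5, 7))
```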

Observability Features

  • Out-of-the-Box: Enable with one line of code or one environment variable, no complex configuration
  • Full-Chain Tracing: Automatically links Server layer → Adapter layer → Agent SDK layer call chains
  • Standardized: Follows OpenTelemetry and OpenInference semantic conventions
  • Multiple Export Targets: Supports console output (debugging) and OTLP export (Langfuse, Jaeger, etc.)

Architecture Principles

Span Hierarchy Example

Using a LangGraph workflow as an example, a typical Span hierarchy looks like:

AG-UI.Server (Request entry point)
└─ Adapter.LangGraph (Agent adapter layer)
   └─ LangGraph
      ├─ node_a (LangGraph node)
      │  └─ ChatOpenAI (LLM call)
      ├─ node_b (LangGraph node)
      │  ├─ ChatOpenAI (LLM call)
      │  └─ calculator (Tool call)
      └─ synthesizer (LangGraph node)
         └─ ChatOpenAI (LLM call)

Span Type Description

| Type | Icon | Description | Examples |
| --- | --- | --- | --- |
| CHAIN | ⛓️ | Chained calls | Adapter.LangGraph, LangGraph nodes |
| LLM | 💬 | LLM calls | ChatOpenAI, ChatAnthropic |
| TOOL | 🔧 | Tool calls | calculator, get_weather |
| AGENT | 🤖 | Agent calls | Multi-Agent orchestration scenarios |

Standards Followed

  • OpenTelemetry: Standard framework for distributed tracing, providing concepts like Span, Trace, Context
  • OpenInference: Semantic conventions for AI applications, defining attribute specifications for Span types like LLM, TOOL, CHAIN

Key attributes include:

  • input.value / output.value: Input/output content
  • llm.model_name: Model identifier
  • llm.token_count.prompt / llm.token_count.completion: Token usage
  • tool.name: Tool function name
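Put together, these conventions mean a single LLM span carries a flat attribute map. A minimal sketch of what that map looks like (the values are illustrative, mirroring the console output shown later, not taken from a real trace):

```python
# Illustrative OpenInference attributes for one LLM span.
llm_span_attributes = {
    "openinference.span.kind": "LLM",
    "llm.model_name": "gpt-4",
    "input.value": "Hello, how are you?",
    "output.value": "I'm doing well, thank you!",
    "llm.token_count.prompt": 5,
    "llm.token_count.completion": 7,
}

# Backends typically aggregate token usage from these two attributes.
total_tokens = (llm_span_attributes["llm.token_count.prompt"]
                + llm_span_attributes["llm.token_count.completion"])
print(total_tokens)
```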

Quick Start

Method 1: Enable via Environment Variables

This is the simplest approach: no code changes are required, just set an environment variable.

# Enable console output (for local development debugging)
AUTO_TRACES_STDOUT=true

# Disable observability
AUTO_TRACES_STDOUT=false

Example:

# app.py - No code changes needed
from cloudbase_agent.server import AgentServiceApp
from cloudbase_agent.langgraph import LangGraphAgent

app = AgentServiceApp() # Automatically reads AUTO_TRACES_STDOUT environment variable
app.run(lambda: {"agent": agent})

Method 2: Enable via Code Configuration

For finer control (e.g., OTLP export configuration), you can explicitly configure via code.

from cloudbase_agent.server import AgentServiceApp
from cloudbase_agent.observability.server import ConsoleTraceConfig, OTLPTraceConfig

# Option A: Console output (local debugging)
app = AgentServiceApp(observability=ConsoleTraceConfig())

# Option B: Export to Langfuse
app = AgentServiceApp(
    observability=OTLPTraceConfig(
        endpoint="https://your-langfuse.com/api/public/otel/v1/traces",
        headers={"Authorization": "Basic your-credentials"}
    )
)

app.run(lambda: {"agent": agent})

Exporter Configuration

Console Export (Local Debugging)

The console exporter outputs Span information in JSON format to the console, suitable for local development and debugging.

Output Example:

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "span_id": "a1b2c3d4e5f67890",
  "parent_span_id": "0987654321fedcba",
  "name": "ChatOpenAI",
  "kind": "SPAN_KIND_INTERNAL",
  "start_time": "2025-01-15T08:30:00.123456Z",
  "end_time": "2025-01-15T08:30:01.234567Z",
  "attributes": {
    "openinference.span.kind": "LLM",
    "llm.model_name": "gpt-4",
    "input.value": "Hello, how are you?",
    "output.value": "I'm doing well, thank you!",
    "llm.token_count.prompt": 5,
    "llm.token_count.completion": 7
  }
}

Viewing Tips:

# Save to file (assumes one JSON object per line)
python app.py 2>&1 | grep '^{' > spans.jsonl

# Parse and pretty-print each line
python app.py 2>&1 | grep '^{' | python -m json.tool --json-lines
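Once spans are saved to a file, a few lines of Python can summarize latency per span. A sketch, assuming one JSON object per line with the start_time/end_time fields shown in the output example above:

```python
import json
from datetime import datetime

def parse_time(ts: str) -> datetime:
    # fromisoformat() before Python 3.11 doesn't accept a trailing "Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def span_durations(lines):
    """Map span name -> duration in seconds, one JSON span per line."""
    durations = {}
    for line in lines:
        span = json.loads(line)
        delta = parse_time(span["end_time"]) - parse_time(span["start_time"])
        durations[span["name"]] = delta.total_seconds()
    return durations

# Example with a single record shaped like the console output above:
record = json.dumps({
    "name": "ChatOpenAI",
    "start_time": "2025-01-15T08:30:00.123456Z",
    "end_time": "2025-01-15T08:30:01.234567Z",
})
print(span_durations([record]))
```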

OTLP Export (Production)

OTLP (OpenTelemetry Protocol) is the standard transmission protocol for OpenTelemetry and can export trace data to various backends like Langfuse, Jaeger, Zipkin, etc.

Langfuse Configuration Example

Langfuse is an open-source LLM observability platform with native OpenTelemetry support.

import base64

from cloudbase_agent.server import AgentServiceApp
from cloudbase_agent.observability.server import OTLPTraceConfig

app = AgentServiceApp(
    observability=OTLPTraceConfig(
        endpoint="https://cloud.langfuse.com/api/public/otel/v1/traces",
        headers={
            "Authorization": "Basic " + base64.b64encode(
                f"{public_key}:{secret_key}".encode()
            ).decode()
        }
    )
)

Jaeger Configuration Example

Jaeger is an open-source distributed tracing system by Uber.

from cloudbase_agent.server import AgentServiceApp
from cloudbase_agent.observability.server import OTLPTraceConfig

app = AgentServiceApp(
    observability=OTLPTraceConfig(
        endpoint="http://localhost:4318/v1/traces"
    )
)
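To try this locally, you can run Jaeger's all-in-one image with Docker. A sketch assuming Docker is installed and a recent Jaeger version (which accepts OTLP on port 4318 by default):

```shell
# Start a local Jaeger all-in-one instance
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
# UI at http://localhost:16686, OTLP/HTTP ingest on port 4318
```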

Complete Examples

Python + LangGraph + Langfuse

import os
import base64

from cloudbase_agent.server import AgentServiceApp
from cloudbase_agent.langgraph import LangGraphAgent
from cloudbase_agent.observability.server import OTLPTraceConfig
from langgraph.graph import StateGraph, MessagesState
from langchain_openai import ChatOpenAI

# Create Agent
def create_agent():
    model = ChatOpenAI(
        api_key=os.getenv("OPENAI_API_KEY"),
        base_url=os.getenv("OPENAI_BASE_URL"),
        model="gpt-4"
    )

    async def chat_node(state):
        response = await model.ainvoke(state["messages"])
        return {"messages": [response]}

    workflow = StateGraph(MessagesState)
    workflow.add_node("chat", chat_node)
    workflow.add_edge("__start__", "chat")
    workflow.add_edge("chat", "__end__")

    return {
        "agent": LangGraphAgent(
            name="chatbot",
            graph=workflow.compile()
        )
    }

# Configure Langfuse
credentials = base64.b64encode(
    f"{os.getenv('LANGFUSE_PUBLIC_KEY')}:{os.getenv('LANGFUSE_SECRET_KEY')}".encode()
).decode()

app = AgentServiceApp(
    observability=OTLPTraceConfig(
        endpoint=f"{os.getenv('LANGFUSE_HOST')}/api/public/otel/v1/traces",
        headers={"Authorization": f"Basic {credentials}"}
    )
)

app.run(create_agent)

TypeScript + LangGraph + Console

import { createExpressRoutes } from "@cloudbase/agent-server";
import { LanggraphAgent } from "@cloudbase/agent-adapter-langgraph";
import { StateGraph, Annotation } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import express from "express";
import { ExporterType } from "@cloudbase/agent-observability/server";

const app = express();

const createAgent = () => {
  const model = new ChatOpenAI({
    modelName: "gpt-4",
    openAIApiKey: process.env.OPENAI_API_KEY,
  });

  const StateAnnotation = Annotation.Root({
    messages: Annotation<string[]>({
      reducer: (x, y) => x.concat(y),
      default: () => [],
    }),
  });

  const graph = new StateGraph(StateAnnotation)
    .addNode("chat", async (state) => {
      const response = await model.invoke(state.messages);
      return { messages: [response.content] };
    })
    .addEdge("__start__", "chat")
    .addEdge("chat", "__end__");

  return {
    agent: new LanggraphAgent({
      name: "chatbot",
      compiledWorkflow: graph.compile(),
    }),
  };
};

createExpressRoutes({
  createAgent,
  express: app,
  observability: { type: ExporterType.Console },
});

app.listen(9000);

Best Practices

1. Use Console Export in Development Environments

During local development, using ConsoleTraceConfig or ExporterType.Console allows you to view trace data in real-time without configuring external services.

2. Use OTLP Export in Production Environments

For production environments, it's recommended to export trace data to professional platforms like Langfuse or Jaeger for long-term storage and analysis.
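One way to combine both practices is to choose the exporter from the environment at startup. A minimal sketch of the selection logic (the returned dict stands in for ConsoleTraceConfig / OTLPTraceConfig, and OTLP_ENDPOINT is a hypothetical variable name, unlike AUTO_TRACES_STDOUT, which the SDK reads):

```python
def observability_config(env: dict) -> dict:
    """Pick exporter settings from environment variables.

    Pass os.environ in real code; a plain dict is used here so the
    logic is easy to test in isolation.
    """
    if env.get("OTLP_ENDPOINT"):
        # Production: export to an OTLP backend such as Langfuse or Jaeger.
        return {"type": "otlp", "endpoint": env["OTLP_ENDPOINT"]}
    if env.get("AUTO_TRACES_STDOUT", "false").lower() == "true":
        # Development: print spans to the console.
        return {"type": "console"}
    return {"type": "disabled"}

print(observability_config({"AUTO_TRACES_STDOUT": "true"}))
```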