
Observability

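This chapter demonstrates two observability capabilities built into OnePath: end-to-end distributed tracing (three processes, no code changes, switched on purely through environment variables) and a network topology map (a per-node local view plus a distributed, aggregated global graph). For a full description of the tracing mechanism, see the Tracing Guide; for the topology mechanism, see Topology Scenarios.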

End-to-end tracing

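A three-process tracing demo: sensor → router → storage. The sensor publishes messages, the router forwards them to storage with a single onepath_forward call, and the same trace_id is preserved across all three hops. This verifies both the automatic instrumentation and the pass-through of the TLV attachment.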

Tracing is enabled without any code changes; every switch is controlled by an environment variable:

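| Environment variable | Example value | Purpose |
| --- | --- | --- |
| ONEPATH_TRACE_ENABLE | 1 | Master switch for tracing; zero overhead when unset |
| ONEPATH_TRACE_SAMPLE_RATIO | 1.0 | Head-based sampling ratio in [0.0, 1.0]; 1.0 samples every trace |
| ONEPATH_TRACE_NDJSON_PATH | /tmp/sensor.ndjson | NDJSON output path; one span JSON object per line |
| ONEPATH_TRACE_SERVICE_NAME | sensor | The service.name resource attribute, used to tell processes apart |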
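To make the semantics of these four variables concrete, the following is a minimal sketch of how a process might read them and make a head-based sampling decision. It is not OnePath's implementation: the type `trace_config_t`, the helpers `parse_enable`, `parse_sample_ratio`, `trace_config_from_env`, and `trace_should_sample`, the rule that only the value `1` enables tracing, and the default ratio of 1.0 when the variable is unset or unparsable are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical configuration record; not an OnePath type. */
typedef struct {
    bool        enabled;       /* from ONEPATH_TRACE_ENABLE */
    double      sample_ratio;  /* from ONEPATH_TRACE_SAMPLE_RATIO, clamped to [0.0, 1.0] */
    const char *ndjson_path;   /* from ONEPATH_TRACE_NDJSON_PATH, NULL if unset */
    const char *service_name;  /* from ONEPATH_TRACE_SERVICE_NAME, NULL if unset */
} trace_config_t;

/* Assumption: only the exact value "1" turns tracing on. */
static bool parse_enable(const char *text) {
    return text != NULL && strcmp(text, "1") == 0;
}

/* Parse a ratio and clamp it to [0.0, 1.0].
 * Assumption: an unset or unparsable value falls back to 1.0. */
static double parse_sample_ratio(const char *text) {
    if (text == NULL) return 1.0;
    char *end = NULL;
    double v = strtod(text, &end);
    if (end == text) return 1.0;      /* no digits were consumed */
    if (!(v >= 0.0)) return 0.0;      /* negative values and NaN */
    if (v > 1.0) return 1.0;
    return v;
}

/* Read all four variables from the process environment. */
static trace_config_t trace_config_from_env(void) {
    trace_config_t cfg;
    cfg.enabled      = parse_enable(getenv("ONEPATH_TRACE_ENABLE"));
    cfg.sample_ratio = parse_sample_ratio(getenv("ONEPATH_TRACE_SAMPLE_RATIO"));
    cfg.ndjson_path  = getenv("ONEPATH_TRACE_NDJSON_PATH");
    cfg.service_name = getenv("ONEPATH_TRACE_SERVICE_NAME");
    return cfg;
}

/* Head-based sampling: the decision is made once, when the trace starts,
 * and is derived from the trace id so that it is deterministic. */
static bool trace_should_sample(const trace_config_t *cfg, uint64_t trace_id_low) {
    assert(cfg != NULL);
    if (!cfg->enabled) return false;
    if (cfg->sample_ratio >= 1.0) return true;
    if (cfg->sample_ratio <= 0.0) return false;
    /* Map the top 53 bits of the id to a uniform value in [0, 1). */
    double u = (double)(trace_id_low >> 11) / 9007199254740992.0; /* 2^53 */
    return u < cfg->sample_ratio;
}
```

A caller would invoke `trace_config_from_env()` once at start-up and consult `trace_should_sample()` only when a new trace begins. Later hops would then follow the decision carried in the propagated context rather than sampling again.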

Key OnePath APIs

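  • onepath_forward(sample, pub): the core tracing helper. It takes the parent span from the top of the current thread's call stack, starts a PRODUCER child span, injects the new trace context into the TLV attachment, and forwards the sample. The trace_id stays the same across hops, while the span_id is updated at every hop.
  • onepath_declare_publisher / onepath_publisher_put: publish messages (automatic instrumentation produces the spans).
  • onepath_subscribe / onepath_sample_release: subscribe, and release a received sample.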
```c
static onepath_publisher_t g_router_pub;

/* Router side: forward every received sample, then release it. */
static void router_handler(onepath_sample_t *sample, void *userdata) {
    (void)userdata;
    onepath_forward(sample, g_router_pub);   /* starts a PRODUCER child span and injects the TLV; trace_id is unchanged */
    EP_INFO("[router] forward %zu bytes", sample->data_len);
    onepath_sample_release(sample);
}

/* Sensor side: */
onepath_declare_publisher(s, &pub, SENSOR_KEY, NULL);
onepath_publisher_put(pub, buf, (size_t)n);
```
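The context propagation that onepath_forward is described as performing can be pictured with a small, self-contained model. The sketch below is illustrative only and does not reflect OnePath's actual wire format: the TLV type tag `TLV_TYPE_TRACE_CTX`, the 16-byte trace id and 8-byte span id sizes, the one-byte length field, and the functions `trace_ctx_encode`, `trace_ctx_decode`, and `trace_ctx_next_hop` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLV_TYPE_TRACE_CTX 0x01u   /* hypothetical type tag */
#define TRACE_ID_LEN 16            /* assumed id sizes */
#define SPAN_ID_LEN  8
#define TRACE_CTX_VALUE_LEN (TRACE_ID_LEN + SPAN_ID_LEN)
#define TRACE_CTX_TLV_LEN   (2 + TRACE_CTX_VALUE_LEN)

typedef struct {
    uint8_t trace_id[TRACE_ID_LEN];
    uint8_t span_id[SPAN_ID_LEN];
} trace_ctx_t;

/* Encode as [type][length][trace_id][span_id].
 * Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t trace_ctx_encode(const trace_ctx_t *ctx, uint8_t *buf, size_t cap) {
    assert(ctx != NULL && buf != NULL);
    if (cap < TRACE_CTX_TLV_LEN) return 0;
    buf[0] = TLV_TYPE_TRACE_CTX;
    buf[1] = TRACE_CTX_VALUE_LEN;
    memcpy(buf + 2, ctx->trace_id, TRACE_ID_LEN);
    memcpy(buf + 2 + TRACE_ID_LEN, ctx->span_id, SPAN_ID_LEN);
    return TRACE_CTX_TLV_LEN;
}

/* Walk a TLV sequence, skipping unknown entries, and extract the trace context.
 * Returns 1 on success, 0 if the entry is missing or the data is truncated. */
static int trace_ctx_decode(const uint8_t *buf, size_t len, trace_ctx_t *out) {
    size_t off = 0;
    while (off + 2 <= len) {
        uint8_t type = buf[off];
        size_t  vlen = buf[off + 1];
        if (off + 2 + vlen > len) return 0;            /* truncated value */
        if (type == TLV_TYPE_TRACE_CTX && vlen == TRACE_CTX_VALUE_LEN) {
            memcpy(out->trace_id, buf + off + 2, TRACE_ID_LEN);
            memcpy(out->span_id, buf + off + 2 + TRACE_ID_LEN, SPAN_ID_LEN);
            return 1;
        }
        off += 2 + vlen;                               /* skip unrelated TLV */
    }
    return 0;
}

/* One forwarding hop: the trace id is copied unchanged,
 * and the span id is replaced by the id of the new child span. */
static trace_ctx_t trace_ctx_next_hop(const trace_ctx_t *parent,
                                      const uint8_t new_span_id[SPAN_ID_LEN]) {
    trace_ctx_t child = *parent;
    memcpy(child.span_id, new_span_id, SPAN_ID_LEN);
    return child;
}
```

In this model a forwarding node would decode the incoming attachment, call `trace_ctx_next_hop` with a freshly generated span id, and encode the result into the outgoing attachment, so that the receiver sees the same trace id with a new parent span id.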

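Start the three processes in the order storage → router → sensor, each with the ONEPATH_TRACE_* environment variables set. The default peer mode uses multicast P2P, so no dedicated routing node is needed: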

```bash
# Terminal 1: storage (subscribes to demo/storage)
ONEPATH_TRACE_ENABLE=1 ONEPATH_TRACE_SAMPLE_RATIO=1.0 \
ONEPATH_TRACE_NDJSON_PATH=/tmp/storage.ndjson ONEPATH_TRACE_SERVICE_NAME=storage \
./examples/build/release/full/onepath_tracing_demo storage

# Terminal 2: router (subscribes to demo/sensor, forwards to demo/storage with onepath_forward)
ONEPATH_TRACE_ENABLE=1 ONEPATH_TRACE_SAMPLE_RATIO=1.0 \
ONEPATH_TRACE_NDJSON_PATH=/tmp/router.ndjson ONEPATH_TRACE_SERVICE_NAME=router \
./examples/build/release/full/onepath_tracing_demo router

# Terminal 3: sensor (publishes to demo/sensor)
ONEPATH_TRACE_ENABLE=1 ONEPATH_TRACE_SAMPLE_RATIO=1.0 \
ONEPATH_TRACE_NDJSON_PATH=/tmp/sensor.ndjson ONEPATH_TRACE_SERVICE_NAME=sensor \
./examples/build/release/full/onepath_tracing_demo sensor
```

After the run, inspect the trace with jq (the same trace_id should appear in all three NDJSON files):

```bash
jq -r '.trace_id + " " + .name + " svc=" + .resource["service.name"]' \
   /tmp/{sensor,router,storage}.ndjson
```

Console log output from the three processes looks like this:

```text
[2026-06-21-17-20-56:466] [INFO] [sensor] publishing to demo/sensor (rounds=10)
[2026-06-21-17-20-56:466] [INFO] [sensor] -> sensor-msg-0
[2026-06-21-17-20-56:467] [INFO] [router] forward 12 bytes
[2026-06-21-17-20-56:467] [INFO] [storage] <- sensor-msg-0 (e2e via TLV)
[2026-06-21-17-20-56:967] [INFO] [storage] <- sensor-msg-1 (e2e via TLV)
```

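Each message produces 4 spans (sensor PRODUCER → router CONSUMER → router PRODUCER → storage CONSUMER): the trace_id is unchanged across hops, the span_id changes at every hop, and the parent_id chain is complete. Unsetting ONEPATH_TRACE_ENABLE restores zero overhead.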
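The three properties of the four-span chain (shared trace_id, a fresh span_id per hop, unbroken parent_id links) can be expressed as a short check. The sketch below is an illustration under stated assumptions rather than part of OnePath: the record type `span_rec_t`, the enum `span_kind_t`, and the function `span_chain_is_valid` are hypothetical, ids are treated as plain strings, and the spans are assumed to be supplied in causal order, each one the child of the span before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { SPAN_PRODUCER, SPAN_CONSUMER } span_kind_t;

/* Hypothetical in-memory form of one span line from the NDJSON output. */
typedef struct {
    const char *trace_id;
    const char *span_id;
    const char *parent_id;   /* NULL when the span has no parent */
    span_kind_t kind;
    const char *service;     /* value of service.name */
} span_rec_t;

/* Check a chain of spans given in causal order:
 *  - every span carries the trace_id of the first span,
 *  - every span after the first names the previous span as its parent,
 *  - span ids differ from hop to hop,
 *  - kinds alternate (PRODUCER, CONSUMER, PRODUCER, ...). */
static bool span_chain_is_valid(const span_rec_t *spans, size_t n) {
    if (spans == NULL || n == 0) return false;
    for (size_t i = 1; i < n; i++) {
        const span_rec_t *prev = &spans[i - 1];
        const span_rec_t *cur  = &spans[i];
        if (strcmp(cur->trace_id, spans[0].trace_id) != 0) return false;
        if (cur->parent_id == NULL) return false;
        if (strcmp(cur->parent_id, prev->span_id) != 0) return false;
        if (strcmp(cur->span_id, prev->span_id) == 0) return false;
        if (cur->kind == prev->kind) return false;
    }
    return true;
}
```

Loading the spans for one message from the three NDJSON files into such records and passing them to `span_chain_is_valid` would give a quick pass or fail signal for the end-to-end chain.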

Variant: dual backends. Both backends support tracing and produce the same output format.

Network topology map

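Topology awareness has two layers: a local view of the current node (directly connected neighbors, links, and whether a link is zero-copy) and a global topology graph aggregated by distributed agents (nodes, edges, transport labels, and service grouping). Peer mode uses multicast P2P, so no dedicated routing node is needed.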

Key OnePath APIs

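  • onepath_topology_local(s, &local) / onepath_topology_local_free(&local): query and release the local view of the current node (self id, declared services, directly connected neighbors, and link markers).
  • onepath_topology_agent_start(s) / onepath_topology_agent_stop(s): start and stop the topology agent, which announces the node as online and serves its local view for aggregation.
  • onepath_topology_snapshot(s, &g, timeout_ms) / onepath_topology_graph_free(&g): aggregate and release the global topology graph (nodes + edges).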
```c
onepath_open_peer(&s);
onepath_declare_publisher(s, &pub, key, NULL);
onepath_topology_agent_start(s);
while (g_running) {
    /* Local view of this node */
    onepath_topo_local_t local;
    if (onepath_topology_local(s, &local) == ONEPATH_OK) {
        printf("self : %s (%s)\n", local.self_zid, whatami_str(local.self_whatami));
        onepath_topology_local_free(&local);
    }
    /* Aggregated global graph, waiting up to 2000 ms */
    onepath_topo_graph_t g;
    if (onepath_topology_snapshot(s, &g, 2000) == ONEPATH_OK) {
        onepath_topology_graph_free(&g);
    }
    sleep(3);
}
onepath_topology_agent_stop(s);
```
```bash
./examples/build/release/full/onepath_topology_map                 # single node: show the local view
./examples/build/release/full/onepath_topology_map nodeA &         # multiple terminals: aggregate the global graph
./examples/build/release/full/onepath_topology_map nodeB &
./examples/build/release/full/onepath_topology_map nodeC
```
```text
[2026-06-21-17-19-33:379] [ OK ] [agent] started, role=node
self  : fb56d32763f82fbcf104ee84855ea5b1 (Peer)
nbr   : 3
  [Peer] 6abbdc8ccf808671dcb924062b1ae132 (SHM)
        link tcp  dst=tcp/127.0.0.1:7804
[2026-06-21-17-19-33:379] [INFO] --- global topology graph (4 nodes, 3 edges) ---
  EDGE 6abbdc8ccf808671dcb924062b1ae132 <--tcp/shm--> fb56d32763f82fbcf104ee84855ea5b1
```
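Conceptually, building the global graph amounts to merging each node's local view (its own id plus its directly connected neighbors) into a single set of nodes and undirected edges. The sketch below shows one way to do that merge. It is a simplified model, not the OnePath agent protocol: the type `topo_graph_t`, the fixed capacities, and the functions `graph_add_node`, `graph_add_edge`, and `graph_merge_local` are hypothetical, and transport labels and service grouping are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 16   /* illustrative capacities */
#define MAX_EDGES 64
#define ID_LEN    33   /* 32 hex characters plus the terminating NUL */

typedef struct { char a[ID_LEN]; char b[ID_LEN]; } topo_edge_t;

typedef struct {
    char        nodes[MAX_NODES][ID_LEN];
    size_t      node_count;
    topo_edge_t edges[MAX_EDGES];
    size_t      edge_count;
} topo_graph_t;

static void copy_id(char dst[ID_LEN], const char *src) {
    strncpy(dst, src, ID_LEN - 1);
    dst[ID_LEN - 1] = '\0';
}

/* Add a node unless it is already present. */
static void graph_add_node(topo_graph_t *g, const char *id) {
    for (size_t i = 0; i < g->node_count; i++)
        if (strcmp(g->nodes[i], id) == 0) return;
    if (g->node_count < MAX_NODES)
        copy_id(g->nodes[g->node_count++], id);
}

/* Add an undirected edge; endpoints are ordered so that
 * x-y and y-x are stored, and deduplicated, as the same edge. */
static void graph_add_edge(topo_graph_t *g, const char *x, const char *y) {
    const char *lo = strcmp(x, y) <= 0 ? x : y;
    const char *hi = (lo == x) ? y : x;
    for (size_t i = 0; i < g->edge_count; i++)
        if (strcmp(g->edges[i].a, lo) == 0 && strcmp(g->edges[i].b, hi) == 0) return;
    if (g->edge_count < MAX_EDGES) {
        copy_id(g->edges[g->edge_count].a, lo);
        copy_id(g->edges[g->edge_count].b, hi);
        g->edge_count++;
    }
}

/* Merge one node's local view into the global graph. */
static void graph_merge_local(topo_graph_t *g, const char *self_id,
                              const char *const neighbors[], size_t n) {
    assert(g != NULL && self_id != NULL);
    graph_add_node(g, self_id);
    for (size_t i = 0; i < n; i++) {
        graph_add_node(g, neighbors[i]);
        graph_add_edge(g, self_id, neighbors[i]);
    }
}
```

Because edges are stored with ordered endpoints, a link reported by both of its endpoints is counted once. For example, a node with three neighbors whose neighbors each report only that node merges into a graph of 4 nodes and 3 edges.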

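Variant: dual backends. Both backends support topology awareness and agent aggregation, with the same output format. The same-host zero-copy markers ((SHM) / <--tcp/shm-->) appear only in the full edition; this is a boundary of the SHM capability, and the lite edition does not show them. See Topology Scenarios for details.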

OnePath™ is delivered as a prebuilt library with zero external runtime dependencies.