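Multi-Modal Redundancy (XMR)
XMR (multi-modal redundancy) is a family of voting schemes: N redundant copies plus edge voting, which elects the value a majority agrees on as the reliable result. Where TMR is a fixed 3-way tier, XMR generalizes to any N. Voting happens locally on the consumer/reader side, so there is no separate voter node that could become a single point of failure.
Redundancy tiers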
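- DMR (dual, N=2) = error detection: when the two copies disagree there is no majority, so the voter honestly reports NO-CONSENSUS instead of passing either value off as reliable.
- TMR (triple, N=3) = corrects 1 error: a 2/3 majority wins, which masks one wrong or bit-flipped result.
- NMR (N-way, any N) = the general case: the number of instances you start is the redundancy level.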
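The tiers above differ only in N, so one voting routine can cover all of them. The following is a minimal sketch, assuming "majority" means strictly more than N/2 byte-identical results. The names `candidate_t`, `vote_t`, and `majority_vote` are hypothetical and are not part of the OnePath API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration only; not the OnePath API. */
typedef struct {
    const void *data;
    size_t len;
} candidate_t;

typedef struct {
    int winner;  /* index of the most common candidate, -1 if n == 0 */
    int votes;   /* how many candidates are identical to the winner */
    int n;       /* redundancy level used for the vote */
    int agreed;  /* 1 only when votes is a strict majority of n */
} vote_t;

/* Two candidates match when their bytes are identical. */
static int same(const candidate_t *a, const candidate_t *b) {
    return a->len == b->len &&
           (a->len == 0 || memcmp(a->data, b->data, a->len) == 0);
}

/* Assumed rule: the winner needs more than n/2 identical results. */
static vote_t majority_vote(const candidate_t *c, int n) {
    vote_t v = { -1, 0, n, 0 };
    for (int i = 0; i < n; i++) {
        int count = 0;
        for (int j = 0; j < n; j++)
            count += same(&c[i], &c[j]);
        if (count > v.votes) {
            v.votes = count;
            v.winner = i;
        }
    }
    v.agreed = v.votes * 2 > n;
    return v;
}
```

With three candidates "42", "43", "42" this yields votes=2, n=3, agreed=1 (the TMR case). With two candidates "42" and "43" it yields votes=1, n=2, agreed=0, which corresponds to the DMR NO-CONSENSUS case.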
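Automatic N detection: the consumer/reader side uses liveliness to learn how many workers or replicas in the group are currently online (auto-N), so there is no expected count to configure. The number of instances you start is the redundancy level, which is why vote results are shown as votes/N (for example 2/3 or 3/3). End-to-end CRC32 checking is on by default, and results corrupted by a bit flip are dropped before voting.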
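To illustrate dropping corrupted results before the vote, here is a minimal sketch. It assumes the standard reflected CRC-32 (polynomial 0xEDB88320, as used by zlib and Ethernet); the source does not say which CRC32 variant or framing OnePath actually uses, so `crc32_ieee`, `framed_t`, and `drop_corrupted` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320 (assumed variant). */
static uint32_t crc32_ieee(const void *data, size_t len) {
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical framing: a payload plus the CRC the sender computed. */
typedef struct {
    const void *data;
    size_t len;
    uint32_t crc;
} framed_t;

/* Moves intact replies to the front of the array; returns how many survive. */
static int drop_corrupted(framed_t *r, int n) {
    int kept = 0;
    for (int i = 0; i < n; i++)
        if (crc32_ieee(r[i].data, r[i].len) == r[i].crc)
            r[kept++] = r[i];
    return kept;
}
```

Only the surviving entries would then be handed to the vote. Whether a dropped reply still counts toward N in the reported votes/N is not stated in the source, so this sketch leaves that decision to the caller.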
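Note: TMR is only one tier within the XMR family; it is not a synonym for multi-modal redundancy as a whole.
XMR redundant computation
N workers each redundantly compute the same task (doubling an integer), and the consumer uses XMR edge voting to elect the result a majority agrees on, masking any worker that computed a wrong answer. The --fault flag simulates a single-event upset, which the majority vote masks (TMR corrects 1 error).
Key OnePath API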
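- onepath_xmr_compute(s, &w, GROUP, fn, ud, NULL): registers this process as a compute worker, joins the group by name, subscribes to tasks, runs fn on each task, and publishes the result.
- onepath_xmr_emit(sink, buf, n): submits this worker's result from inside the compute callback.
- onepath_xmr_consume(s, &c, GROUP, on_elected, NULL, NULL): registers a voting consumer (opts=NULL selects the defaults: auto-N, exact majority, CRC32 on) and invokes the callback for each elected result.
- onepath_xmr_submit(s, GROUP, data, len): submits one task to the group.
- onepath_xmr_result_t (seq/data/data_len/votes/n/agreed): the elected result.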
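```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* worker: double the input; a faulty worker adds 1 to produce a wrong answer */
static void double_fn(void *ud, const void *in, size_t in_len, onepath_xmr_sink_t *sink) {
    int fault = *(int *)ud;
    char num[32], buf[64];
    /* the task is submitted without a trailing NUL, so copy and terminate it first */
    size_t len = in_len < sizeof num - 1 ? in_len : sizeof num - 1;
    memcpy(num, in, len);
    num[len] = '\0';
    long r = strtol(num, NULL, 10) * 2 + (fault ? 1 : 0);
    int n = snprintf(buf, sizeof buf, "%ld", r);
    onepath_xmr_emit(sink, buf, (size_t)n);
}

/* consumer: called once per elected result */
static void on_elected(void *ud, const onepath_xmr_result_t *r) {
    (void)ud;
    EP_OK("elected seq=%llu -> '%.*s' (votes=%d/%d, %s)",
          (unsigned long long)r->seq, (int)r->data_len, (const char *)r->data,
          r->votes, r->n, r->agreed ? "AGREED" : "NO-CONSENSUS");
}

/* excerpt from main(): one call per role */
onepath_xmr_compute(s, &w, GROUP, double_fn, &fault, NULL); /* worker */
onepath_xmr_consume(s, &c, GROUP, on_elected, NULL, NULL);  /* consume */
onepath_xmr_submit(s, GROUP, argv[2], strlen(argv[2]));     /* submit */
```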
```bash
./examples/build/release/full/onepath_xmr_compute_demo consume &
./examples/build/release/full/onepath_xmr_compute_demo worker &
./examples/build/release/full/onepath_xmr_compute_demo worker &
./examples/build/release/full/onepath_xmr_compute_demo worker --fault &
./examples/build/release/full/onepath_xmr_compute_demo submit 21   # expect 42 to be elected
```
```text
[ OK ] voting consumer ready on 'demo/xmr/compute' (auto-N via liveliness)
[ OK ] worker ready on 'demo/xmr/compute'
[ OK ] worker ready on 'demo/xmr/compute'
[ OK ] worker ready on 'demo/xmr/compute' (FAULTY)
[ OK ] elected seq=... -> '42' (votes=2/3, AGREED)
```
The faulty worker's 43 is outvoted and the two correct 42s win with 2/3 (AGREED), which is TMR correcting 1 error. When the votes fall short of a majority, the consumer honestly reports agreed=0 (NO-CONSENSUS) rather than presenting an unreliable result.
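Variant: dual backend.
XMR redundant storage
A single write is broadcast to N redundant storage replicas, each of which stores it. The reader queries all replicas and runs XMR edge voting per key, electing the value a majority agrees on and dropping replies that were corrupted by a bit flip.
Key OnePath API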
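- onepath_xmr_store(s, &st, GROUP, NULL): registers this process as a storage replica that subscribes to writes, stores them, and answers read queries.
- onepath_xmr_put(s, GROUP, key, val, len): writes one key/value pair to the group, broadcast to all replicas.
- onepath_xmr_get(s, GROUP, keyexpr, NULL, on_elected, NULL): reads and votes per key locally on the reader side, invoking the callback for each elected key; returns the number of elected keys.
- onepath_xmr_result_t (the storage case uses key/data/data_len/votes/n/agreed): the elected result.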
```c
/* reader: called once per elected key */
static void on_elected(void *ud, const onepath_xmr_result_t *r) {
    (void)ud;
    EP_OK("elected '%s' = '%.*s' (votes=%d/%d, %s)",
          r->key, (int)r->data_len, (const char *)r->data,
          r->votes, r->n, r->agreed ? "AGREED" : "NO-CONSENSUS");
}

/* excerpt from main(): one call per role */
onepath_xmr_store(s, &st, GROUP, NULL);                                     /* replica */
onepath_xmr_put(s, GROUP, argv[2], argv[3], strlen(argv[3]));               /* put */
int n = onepath_xmr_get(s, GROUP, argv[2], NULL, on_elected, NULL);         /* get */
EP_INFO("elected %d key(s)", n);
```
```bash
./examples/build/release/full/onepath_xmr_store_demo replica &
./examples/build/release/full/onepath_xmr_store_demo replica &
./examples/build/release/full/onepath_xmr_store_demo replica &
./examples/build/release/full/onepath_xmr_store_demo put sensor/temp 25
./examples/build/release/full/onepath_xmr_store_demo get sensor/temp
```
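```text
[ OK ] replica ready on 'demo/xmr/store'
[ OK ] replica ready on 'demo/xmr/store'
[ OK ] replica ready on 'demo/xmr/store'
[ OK ] put 'sensor/temp' = '25'
[ OK ] elected 'sensor/temp' = '25' (votes=3/3, AGREED)
[INFO] elected 1 key(s)
```
The reader receives a reply from each of the 3 replicas, and the per-key vote is unanimous at 3 votes, electing the reliable value 25 (AGREED). A reply that is corrupted in transit (its CRC check fails) is dropped, and the majority among the remaining replicas corrects it.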
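Per-key voting on the reader side can be sketched as follows: replies that survived the integrity check are grouped by key, and each key gets its own vote. This is an illustration under stated assumptions, not OnePath's implementation. `reply_t`, `elected_t`, and `vote_per_key` are hypothetical names, values are treated as C strings for brevity, and the replica count is passed in by the caller because the source does not say how N is counted when a reply has been dropped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reply record for illustration; not the OnePath wire format. */
typedef struct {
    const char *key;
    const char *val;
} reply_t;

typedef struct {
    const char *key;
    const char *val;  /* most common value seen for this key */
    int votes;        /* replies that carried that value */
    int n;            /* replica count supplied by the caller */
    int agreed;       /* 1 when votes is a strict majority of n */
} elected_t;

/* Votes once per distinct key; returns the number of keys written to out. */
static int vote_per_key(const reply_t *r, int count, int n_replicas,
                        elected_t *out, int out_cap) {
    int keys = 0;
    for (int i = 0; i < count; i++) {
        int seen = 0;
        for (int k = 0; k < keys; k++)
            if (strcmp(out[k].key, r[i].key) == 0) { seen = 1; break; }
        if (seen || keys == out_cap)
            continue;
        /* r[i] is the first reply for this key, so all others are at >= i */
        elected_t e = { r[i].key, NULL, 0, n_replicas, 0 };
        for (int a = i; a < count; a++) {
            if (strcmp(r[a].key, e.key) != 0)
                continue;
            int votes = 0;
            for (int b = i; b < count; b++)
                votes += strcmp(r[b].key, e.key) == 0 &&
                         strcmp(r[b].val, r[a].val) == 0;
            if (votes > e.votes) { e.votes = votes; e.val = r[a].val; }
        }
        e.agreed = e.votes * 2 > n_replicas;
        out[keys++] = e;
    }
    return keys;
}
```

For example, if one of three replies for sensor/temp was dropped and the two remaining replies both carry 25, calling `vote_per_key` with `n_replicas` set to 3 elects 25 with votes=2 and agreed=1.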
Variant: dual backend.