
vacant.mvp

P7 demo layer — the four reference scenarios (law_firm, code_review, multilingual_translation, self_replication), the 8 P7 metrics, the demo CLI (vacant demo …), and the Streamlit dashboard.

demo

Demo CLI: python -m vacant.mvp.demo --scenario=<name> [--substrate=<backend>] [--seed=N].

Prints a JSON-encoded ScenarioResult to stdout for piping into jq / unit tests / fixture-snapshot tooling.

If VACANT_DEMO_DB_PATH (or --db) is set, the run streams events into the SQLite demo store so the dashboard / vacant demo --tail can read them back.

metrics

8 metrics for the P7 dashboard (P7_mvp.md §3) plus 7 Layer 9 health indicators from THEORY_V5 §Layer 9.

Each metric is exposed as:

- compute_*(snapshot) -> value -- pure function over a MetricsSnapshot.
- MetricsWriter -- accumulates the values plus a timestamp into an in-memory deque (and serialises to a SQLite metrics table when one is provided) so the dashboard can plot time series.

The snapshot is a frozen dataclass that any caller (a scenario, a unit test, or the dashboard itself) can build from the registry + the aggregator + the per-scenario ScenarioResult. It does NOT depend on any I/O; pure compute.

MetricsSnapshot dataclass

MetricsSnapshot(aggregator: Aggregator | None = None, vacants: dict[VacantId, dict[str, Any]] = dict(), manifests: tuple[ChildManifest, ...] = (), graduations: tuple[float, ...] = (), dispatch_latencies_ms: tuple[float, ...] = (), same_controller_eval: dict[str, int] = dict(), registry_writes_attempted: int = 0, registry_writes_seq_monotonic: int = 0, spawn_events: tuple[dict[str, Any], ...] = (), caller_selections: tuple[dict[str, Any], ...] = (), custody_uncertain_vids: frozenset[VacantId] = frozenset(), lineage_embeddings: dict[VacantId, tuple[tuple[float, ...], ...]] = dict(), peer_review_events: tuple[dict[str, Any], ...] = ())

Inputs to the metrics module. All optional -- missing fields return zero or empty for the corresponding metric.

vacants class-attribute instance-attribute

vacants: dict[VacantId, dict[str, Any]] = field(default_factory=dict)

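vid -> {state: VacantState, parent_id: VacantId|None, n_calls: int}. compute_substrate_diversity and compute_controller_diversity additionally read the optional substrate_primary and controller_id keys, falling back to "unknown" when they are absent.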

graduations class-attribute instance-attribute

graduations: tuple[float, ...] = ()

Unix timestamps of successful graduations.

dispatch_latencies_ms class-attribute instance-attribute

dispatch_latencies_ms: tuple[float, ...] = ()

Wall-clock latencies of call_capability in milliseconds.

same_controller_eval class-attribute instance-attribute

same_controller_eval: dict[str, int] = field(default_factory=dict)

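{'true_positives': N, 'flagged_total': N} from the adversarial set. Note that compute_same_controller_detection_rate reads the true_positives and false_negatives keys; a missing false_negatives key is treated as 0.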

registry_writes_seq_monotonic class-attribute instance-attribute

registry_writes_seq_monotonic: int = 0

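Count of registry writes that preserved sequence-number monotonicity. Together with registry_writes_attempted, it feeds the concurrent-writers metric (compute_registry_consistency).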

spawn_events class-attribute instance-attribute

spawn_events: tuple[dict[str, Any], ...] = ()

Each entry shape: {"path": "D1|D2|D3|D4|D5|B|C|Z", "ts": float}. Powers d_spawn_ratio — the share of births that came from agent self-replication (D-paths) vs transitional / bootstrap paths.

caller_selections class-attribute instance-attribute

caller_selections: tuple[dict[str, Any], ...] = ()

Each entry: {"was_exploration": bool, "ts": float} recording whether the caller's UCB selection came from the exploration pool (INSUFFICIENT_DATA candidates) vs the greedy top-k. Powers exploration_ratio — V5 §3.6(a).

custody_uncertain_vids class-attribute instance-attribute

custody_uncertain_vids: frozenset[VacantId] = field(default_factory=frozenset)

Vacant IDs flagged custody_uncertain by the heartbeat watcher (consecutive missed HEARTBEAT_SUNK rounds past the threshold). Powers custody_uncertain_count — V5 §4.2.

lineage_embeddings class-attribute instance-attribute

lineage_embeddings: dict[VacantId, tuple[tuple[float, ...], ...]] = field(default_factory=dict)

Per-lineage-root: tuple of recent member embeddings (STYLO Vec16). lineage_capability_drift reports, per lineage, the L2 distance from the root's earliest embedding to the most recent member embedding; lineages with fewer than two embeddings are skipped.

peer_review_events class-attribute instance-attribute

peer_review_events: tuple[dict[str, Any], ...] = ()

Each entry: {"target_vid": VacantId, "ts": float}. Powers peer_review_density — avg reviews per active vacant per week (THEORY_V5 §Layer 9).

MetricsWriter dataclass

MetricsWriter(max_points: int = 5000, samples: deque[tuple[float, str, Any]] = (lambda: deque(maxlen=5000))(), _lock: Lock = Lock())

In-memory ring buffer of (ts, metric_name, value) triples for time-series plotting. Configurable max length.

compute_reputation_distribution

compute_reputation_distribution(snap: MetricsSnapshot) -> dict[str, float]

Return the per-dimension mean and sample count (mean_<dim> and n_<dim>) across all active vacants, i.e. those in state ACTIVE or LOCAL.

Source code in src/vacant/mvp/metrics.py
def compute_reputation_distribution(snap: MetricsSnapshot) -> dict[str, float]:
    """Return per-dimension mean and sample count across all ACTIVE/LOCAL vacants."""
    if snap.aggregator is None:
        return {}
    out: dict[str, list[float]] = {
        "factual": [],
        "logical": [],
        "relevance": [],
        "honesty": [],
        "adoption": [],
    }
    for (vid, _sub), rep in snap.aggregator._posteriors.items():
        ctx = snap.aggregator._contexts.get(vid)
        if ctx is None or ctx.state not in (VacantState.ACTIVE, VacantState.LOCAL):
            continue
        for dim, mu in rep.means().items():
            out[dim].append(mu)
    summary: dict[str, float] = {}
    for dim, values in out.items():
        if not values:
            summary[f"mean_{dim}"] = 0.0
            summary[f"n_{dim}"] = 0.0
        else:
            summary[f"mean_{dim}"] = mean(values)
            summary[f"n_{dim}"] = float(len(values))
    return summary

compute_cold_start_uplift

compute_cold_start_uplift(snap: MetricsSnapshot) -> float

Fraction of calls that went to new vacants (n_eff < N_MIN_FOR_STABLE_SCORE on any dim). Higher = more exploration. Returns 0 when no calls are recorded.

Source code in src/vacant/mvp/metrics.py
def compute_cold_start_uplift(snap: MetricsSnapshot) -> float:
    """Fraction of calls that went to *new* vacants (n_eff < N_MIN_FOR_STABLE_SCORE
    on any dim). Higher = more exploration. Returns 0 when no calls are
    recorded."""
    from vacant.core.constants import N_MIN_FOR_STABLE_SCORE

    if snap.aggregator is None:
        return 0.0
    new_calls = 0
    total_calls = 0
    for vid, meta in snap.vacants.items():
        n = int(meta.get("n_calls", 0))
        total_calls += n
        rep: Beta5D | None = None
        for sub in ("default", "claude-sonnet-4-6", "gpt-4o", "local-ollama-llama3"):
            r = snap.aggregator._posteriors.get((vid, sub))
            if r is not None:
                rep = r
                break
        if rep is None:
            new_calls += n
            continue
        if any(n_eff < N_MIN_FOR_STABLE_SCORE for n_eff in rep.n_effs().values()):
            new_calls += n
    return (new_calls / total_calls) if total_calls else 0.0

compute_same_controller_detection_rate

compute_same_controller_detection_rate(snap: MetricsSnapshot) -> float

True-positive rate of the same-controller signal on the adversarial set: TP / (TP + FN). Computed from the dashboard's adversarial run.

Source code in src/vacant/mvp/metrics.py
def compute_same_controller_detection_rate(snap: MetricsSnapshot) -> float:
    """True-positive rate of the same-controller signal on the adversarial
    set: TP / (TP + FN). Computed from the dashboard's adversarial run."""
    eval_ = snap.same_controller_eval
    tp = eval_.get("true_positives", 0)
    fn = eval_.get("false_negatives", 0)
    return (tp / (tp + fn)) if (tp + fn) else 0.0

compute_lineage_depth_distribution

compute_lineage_depth_distribution(snap: MetricsSnapshot) -> dict[int, int]

Histogram: depth -> count. Depth 0 = root, depth 1 = root's child, ...

Source code in src/vacant/mvp/metrics.py
def compute_lineage_depth_distribution(snap: MetricsSnapshot) -> dict[int, int]:
    """Histogram: depth -> count. Depth 0 = root, depth 1 = root's child, ..."""
    by_id = {vid: meta for vid, meta in snap.vacants.items()}
    depths: Counter[int] = Counter()
    for vid in by_id:
        depth = 0
        cur = vid
        seen: set[VacantId] = set()
        while True:
            if cur in seen:
                break
            seen.add(cur)
            parent = by_id.get(cur, {}).get("parent_id")
            if parent is None:
                break
            depth += 1
            cur = parent
        depths[depth] += 1
    return dict(depths)

compute_graduation_rate

compute_graduation_rate(snap: MetricsSnapshot, *, window_s: float = 86400.0) -> float

Graduations per spawn event in the same window.

THEORY_V5 §Layer 9 defines this as |grad events| / |spawn events|. The earlier implementation divided by the composite-count instead, which produced a number in a different unit (graduations/composite/window) and made the dashboard read inversely to what the theory intended.

Returns 0.0 when there are no spawn events in the window (no denominator) — that matches the V5 semantics of "the network hasn't produced any spawns yet, so graduation_rate is undefined; treat as 0 for plotting".

Source code in src/vacant/mvp/metrics.py
def compute_graduation_rate(snap: MetricsSnapshot, *, window_s: float = 86_400.0) -> float:
    """Graduations per spawn event in the same window.

    THEORY_V5 §Layer 9 defines this as
    `|grad events| / |spawn events|`. The earlier implementation
    divided by the composite-count instead, which produced a number
    in a different unit (graduations/composite/window) and made the
    dashboard read inversely to what the theory intended.

    Returns 0.0 when there are no spawn events in the window (no
    denominator) — that matches the V5 semantics of "the network
    hasn't produced any spawns yet, so graduation_rate is undefined;
    treat as 0 for plotting".
    """
    cutoff = time.time() - window_s
    recent_grads = [t for t in snap.graduations if t >= cutoff]
    recent_spawns = [
        e for e in snap.spawn_events if float(e.get("ts", 0)) >= cutoff
    ]
    if not recent_spawns:
        return 0.0
    return len(recent_grads) / len(recent_spawns)
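Because the function reads `time.time()`, its result depends on when it is called. The sketch below restates the same windowed ratio with an explicit `now` argument so the arithmetic can be checked deterministically; the `graduation_rate` helper and its `now` parameter are assumptions for illustration, not part of the module.

```python
def graduation_rate(
    graduations: list[float],
    spawn_events: list[dict],
    *,
    now: float,
    window_s: float = 86_400.0,
) -> float:
    """Graduations per spawn event within [now - window_s, now]."""
    cutoff = now - window_s
    recent_grads = [t for t in graduations if t >= cutoff]
    recent_spawns = [e for e in spawn_events if float(e.get("ts", 0)) >= cutoff]
    if not recent_spawns:
        return 0.0  # no denominator: treated as 0 for plotting
    return len(recent_grads) / len(recent_spawns)


now = 1_000_000.0
rate = graduation_rate(
    [now - 10, now - 200_000],              # second graduation is outside the window
    [
        {"path": "D1", "ts": now - 50},
        {"path": "D2", "ts": now - 60},
        {"path": "B", "ts": now - 500_000},  # outside the window
    ],
    now=now,
)
```

One graduation and two spawns fall inside the default 24-hour window, so `rate` is 0.5. The ratio can exceed 1.0 when graduations in the window belong to vacants spawned before it.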

compute_dispatch_p99_latency

compute_dispatch_p99_latency(snap: MetricsSnapshot) -> float

Wall-clock p99 of dispatch latencies (ms). Returns the sample itself when there is exactly one, and 0 when there are none.

Source code in src/vacant/mvp/metrics.py
def compute_dispatch_p99_latency(snap: MetricsSnapshot) -> float:
    """Wall-clock p99 of dispatch latencies (ms). Returns the lone sample
    when there is exactly one, and 0 when there are none."""
    samples = list(snap.dispatch_latencies_ms)
    if len(samples) < 2:
        return float(samples[0]) if samples else 0.0
    qs = quantiles(samples, n=100)
    return float(qs[98])  # 99th quantile (index 98 of 99 cuts)
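A small worked example of the percentile computation, using the same `statistics.quantiles` call on plain lists. The `p99` wrapper is a hypothetical stand-in that mirrors the branching above without a MetricsSnapshot.

```python
from statistics import quantiles


def p99(samples: list[float]) -> float:
    """Same branching as compute_dispatch_p99_latency, on a plain list."""
    if len(samples) < 2:
        return float(samples[0]) if samples else 0.0
    # n=100 yields 99 cut points; index 98 is the 99th percentile.
    return float(quantiles(samples, n=100)[98])


uniform = p99([float(ms) for ms in range(1, 101)])  # 1 ms .. 100 ms
tiny = p99([10.0, 20.0])                            # only two samples
```

With 100 evenly spaced samples, `uniform` comes out at about 99.99. With only two samples, `tiny` is about 29.7, above the largest observation: the default "exclusive" method of `statistics.quantiles` extrapolates past the data when the sample is small, so early p99 readings on the dashboard should be read with that in mind.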

compute_signature_verify_throughput

compute_signature_verify_throughput(*, n_signatures: int = 1000) -> float

Verifications per second on a freshly generated batch. Cheap microbenchmark; runs synchronously.

Source code in src/vacant/mvp/metrics.py
def compute_signature_verify_throughput(
    *,
    n_signatures: int = 1000,
) -> float:
    """Verifications per second on a freshly generated batch.
    Cheap microbenchmark; runs synchronously."""
    from vacant.core.crypto import keygen

    sk, vk = keygen()
    payload = b"the quick brown fox jumps over the lazy dog"
    sigs = [sign(sk, payload) for _ in range(n_signatures)]
    t0 = time.perf_counter()
    for sig in sigs:
        if not verify(vk, payload, sig):
            return 0.0
    elapsed = time.perf_counter() - t0
    return n_signatures / elapsed if elapsed > 0 else 0.0

compute_registry_consistency

compute_registry_consistency(snap: MetricsSnapshot) -> float

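Fraction (0.0 to 1.0) of registry writes that preserved sequence-no monotonicity under concurrent writers; compute_all multiplies it by 100 to report a percentage. 1.0 under correct behaviour (also returned when no writes were attempted); <1.0 indicates a regression.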

Source code in src/vacant/mvp/metrics.py
def compute_registry_consistency(snap: MetricsSnapshot) -> float:
    """Fraction (0.0 to 1.0) of registry writes that preserved sequence-no
    monotonicity under concurrent writers; `compute_all` scales it to a
    percentage. 1.0 under correct behaviour; <1.0 indicates a regression."""
    attempted = snap.registry_writes_attempted
    if attempted == 0:
        return 1.0
    return snap.registry_writes_seq_monotonic / attempted

compute_d_spawn_ratio

compute_d_spawn_ratio(snap: MetricsSnapshot) -> float

Share of births that came from D-path agent self-replication (D1-D5) vs total spawns (D + B + C + Z). V5 §Layer 9 lists this as a "core network-maturity indicator, target > 0.7" (網路成熟度核心指標,目標 > 0.7). Returns 0.0 with no events.

Source code in src/vacant/mvp/metrics.py
def compute_d_spawn_ratio(snap: MetricsSnapshot) -> float:
    """Share of births that came from D-path agent self-replication
    (D1-D5) vs total spawns (D + B + C + Z). V5 §Layer 9 lists this as a
    *"core network-maturity indicator, target > 0.7"*
    (網路成熟度核心指標,目標 > 0.7). Returns 0.0 with no events."""
    if not snap.spawn_events:
        return 0.0
    d = sum(1 for e in snap.spawn_events if str(e.get("path", "")).startswith("D"))
    return d / len(snap.spawn_events)
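A short worked example of the ratio over a plain list of events shaped like spawn_events. The `d_spawn_ratio` helper is a hypothetical stand-in that applies the same prefix test without a MetricsSnapshot.

```python
def d_spawn_ratio(spawn_events: list[dict]) -> float:
    """Share of events whose `path` label starts with "D"."""
    if not spawn_events:
        return 0.0
    d = sum(1 for e in spawn_events if str(e.get("path", "")).startswith("D"))
    return d / len(spawn_events)


events = [
    {"path": "D1", "ts": 1.0},
    {"path": "D3", "ts": 2.0},
    {"path": "B", "ts": 3.0},
    {"path": "Z", "ts": 4.0},
]
ratio = d_spawn_ratio(events)
```

Two of the four births are D-path, so `ratio` is 0.5, below the 0.7 target. The test is a prefix match, so any label beginning with "D" counts as self-replication, and an event with no path key counts toward the denominator only.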

compute_exploration_ratio

compute_exploration_ratio(snap: MetricsSnapshot) -> float

Fraction of caller selections that hit the UCB exploration pool (INSUFFICIENT_DATA candidates) rather than the greedy top-k. V5 §3.6(a): without exploration the network freezes into an oligopoly; the dashboard wants this >= 0.20 in healthy steady state.

Source code in src/vacant/mvp/metrics.py
def compute_exploration_ratio(snap: MetricsSnapshot) -> float:
    """Fraction of caller selections that hit the UCB exploration pool
    (INSUFFICIENT_DATA candidates) rather than the greedy top-k.
    V5 §3.6(a): without exploration the network freezes into an
    oligopoly; the dashboard wants this >= 0.20 in healthy steady state."""
    if not snap.caller_selections:
        return 0.0
    n_exp = sum(1 for s in snap.caller_selections if bool(s.get("was_exploration", False)))
    return n_exp / len(snap.caller_selections)

compute_custody_uncertain_count

compute_custody_uncertain_count(snap: MetricsSnapshot) -> int

Number of vacants flagged custody_uncertain (consecutive missed HEARTBEAT_SUNK past the threshold). V5 §4.2 — sunk heartbeat is the keypair custody attestation, so a missing heartbeat past threshold is a real security signal, not a benign liveness flap.

Source code in src/vacant/mvp/metrics.py
def compute_custody_uncertain_count(snap: MetricsSnapshot) -> int:
    """Number of vacants flagged `custody_uncertain` (consecutive
    missed `HEARTBEAT_SUNK` past the threshold). V5 §4.2 — sunk
    heartbeat is the *keypair custody attestation*, so a missing
    heartbeat past threshold is a real security signal, not a
    benign liveness flap."""
    return len(snap.custody_uncertain_vids)

compute_lineage_capability_drift

compute_lineage_capability_drift(snap: MetricsSnapshot) -> dict[str, float]

Per-lineage L2 drift from the root's earliest STYLO embedding to the most recent member embedding. V5 §4.3 — the lineage is what evolves, not the individual; this metric quantifies that drift.

Returns {lineage_root_short: float, ...}. Empty when no lineage embeddings are recorded.

Source code in src/vacant/mvp/metrics.py
def compute_lineage_capability_drift(snap: MetricsSnapshot) -> dict[str, float]:
    """Per-lineage L2 drift from the root's earliest STYLO embedding to
    the most recent member embedding. V5 §4.3 — the *lineage* is what
    evolves, not the individual; this metric quantifies that drift.

    Returns `{lineage_root_short: float, ...}`. Empty when no lineage
    embeddings are recorded.
    """
    out: dict[str, float] = {}
    for root_vid, embeddings in snap.lineage_embeddings.items():
        if len(embeddings) < 2:
            continue
        first = embeddings[0]
        last = embeddings[-1]
        # L2 distance; tuples must be same dim — silently skip if not.
        if len(first) != len(last):
            continue
        d2 = sum((a - b) ** 2 for a, b in zip(first, last, strict=True))
        out[root_vid.short()] = math.sqrt(d2)
    return out

compute_substrate_diversity

compute_substrate_diversity(snap: MetricsSnapshot) -> float

Shannon entropy (bits) over substrate_primary across all Active, Hibernating, and Local vacants. V5 §Layer 9 lists this as a health indicator: higher = less monoculture risk if any single substrate vendor degrades or revokes API access.

Source code in src/vacant/mvp/metrics.py
def compute_substrate_diversity(snap: MetricsSnapshot) -> float:
    """Shannon entropy (bits) over `substrate_primary` across all
    Active, Hibernating, and Local vacants. V5 §Layer 9 lists this as a
    health indicator: higher = less monoculture risk if any single
    substrate vendor degrades or revokes API access."""
    counts: Counter[str] = Counter()
    for vid, meta in snap.vacants.items():
        state = meta.get("state")
        if state not in (VacantState.ACTIVE, VacantState.HIBERNATING, VacantState.LOCAL):
            continue
        primary = str(meta.get("substrate_primary", "unknown"))
        counts[primary] += 1
    return _shannon_entropy(counts.values())
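The `_shannon_entropy` helper is not shown on this page. The sketch below is one plausible shape for it, assuming it takes an iterable of counts and returns entropy in bits; `shannon_entropy_bits` is a hypothetical name and the real helper may differ in edge-case handling.

```python
import math
from collections.abc import Iterable


def shannon_entropy_bits(counts: Iterable[int]) -> float:
    """H = sum(p * log2(1/p)) over the non-zero counts; 0.0 for empty input."""
    values = [c for c in counts if c > 0]
    total = sum(values)
    if total == 0:
        return 0.0
    # log2(total / c) equals log2(1 / p) and keeps every term non-negative.
    return sum((c / total) * math.log2(total / c) for c in values)


# Two vacants on each of three substrates, as in multilingual_translation.
balanced = shannon_entropy_bits([2, 2, 2])
monoculture = shannon_entropy_bits([6])
```

`balanced` equals log2(3), roughly 1.585 bits, which is the maximum for three substrates; `monoculture` is 0.0. The metric therefore grows with both the number of substrates and how evenly vacants are spread across them.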

compute_controller_diversity

compute_controller_diversity(snap: MetricsSnapshot) -> float

Shannon entropy over controller_id across all Active and Local vacants. V5 §Layer 9: read this alongside same_controller_detection_rate to distinguish "many independent operators" (high entropy, low detection) from "one operator across many vacants" (low entropy, high detection).

Source code in src/vacant/mvp/metrics.py
def compute_controller_diversity(snap: MetricsSnapshot) -> float:
    """Shannon entropy over `controller_id` across all Active and Local
    vacants. V5 §Layer 9: read alongside `same_controller_detection_rate`
    to distinguish "many independent operators" (high entropy, low
    detection) from "one operator across many vacants" (low entropy,
    high detection)."""
    counts: Counter[str] = Counter()
    for vid, meta in snap.vacants.items():
        state = meta.get("state")
        if state not in (VacantState.ACTIVE, VacantState.LOCAL):
            continue
        controller = str(meta.get("controller_id", "unknown"))
        counts[controller] += 1
    return _shannon_entropy(counts.values())

compute_peer_review_density

compute_peer_review_density(snap: MetricsSnapshot, *, window_s: float = 7 * 86400.0) -> float

Average peer reviews per Active or Hibernating vacant per window_s (default 1 week). V5 §4.2(e) claims a healthy network should give a new vacant 30+ peer reviews in its first week; this metric is the quantitative shape of that claim.

Source code in src/vacant/mvp/metrics.py
def compute_peer_review_density(
    snap: MetricsSnapshot, *, window_s: float = 7 * 86_400.0
) -> float:
    """Average peer reviews per Active or Hibernating vacant per
    `window_s` (default 1 week). V5 §4.2(e) claims a healthy network
    should give a new vacant 30+ peer reviews in its first week; this
    metric is the quantitative shape of that claim."""
    n_active = sum(
        1
        for meta in snap.vacants.values()
        if meta.get("state") in (VacantState.ACTIVE, VacantState.HIBERNATING)
    )
    if n_active == 0:
        return 0.0
    cutoff = time.time() - window_s
    n_reviews_in_window = sum(1 for ev in snap.peer_review_events if float(ev.get("ts", 0)) >= cutoff)
    return n_reviews_in_window / n_active

compute_all

compute_all(snap: MetricsSnapshot) -> dict[str, Any]

Run every metric and return a flat dict.

Source code in src/vacant/mvp/metrics.py
def compute_all(snap: MetricsSnapshot) -> dict[str, Any]:
    """Run every metric and return a flat dict."""
    return {
        "reputation_distribution": compute_reputation_distribution(snap),
        "cold_start_uplift": compute_cold_start_uplift(snap),
        "same_controller_detection_rate": compute_same_controller_detection_rate(snap),
        "lineage_depth_distribution": compute_lineage_depth_distribution(snap),
        "graduation_rate": compute_graduation_rate(snap),
        "dispatch_p99_latency_ms": compute_dispatch_p99_latency(snap),
        "signature_verify_throughput_per_s": compute_signature_verify_throughput(),
        "registry_consistency_pct": compute_registry_consistency(snap) * 100.0,
        # --- Layer 9 health indicators -------------------------------------
        "d_spawn_ratio": compute_d_spawn_ratio(snap),
        "exploration_ratio": compute_exploration_ratio(snap),
        "custody_uncertain_count": compute_custody_uncertain_count(snap),
        "lineage_capability_drift": compute_lineage_capability_drift(snap),
        "substrate_diversity": compute_substrate_diversity(snap),
        "controller_diversity": compute_controller_diversity(snap),
        "peer_review_density": compute_peer_review_density(snap),
    }

dashboard

Streamlit dashboard for the P7 demo.

Run with: uv run streamlit run src/vacant/mvp/dashboard.py.

Pages:

- 網路 (Network) -- list of vacants with state, capability, mean reputation per dim.
- 血緣 (Lineage) -- parent_id chain visualisation.
- Scenario -- pick + run; events stream.
- 指標 (Metrics) -- 8 metrics, time-series.
- 對抗 (Adversarial) -- adversarial seed=666 ring detection.

User-facing text is in Traditional Chinese (繁體中文) per CLAUDE.md.

render_decentralized_trust

render_decentralized_trust() -> None

Decentralized Trust (去中心化信任): transparency-log epochs + witness quorum + OTS state.

This page surfaces the 6-layer anti-tamper defenses that technical.html promises: each sealed epoch has a Merkle root, an operator signature, an optional git-anchor commit SHA, an optional OpenTimestamps receipt, and a set of independent M-of-N witness cosignatures from peer registries.

Without a running registry the page renders an empty-state hint so the demo dashboard doesn't crash when run against pure scenario state.

Source code in src/vacant/mvp/dashboard.py
def render_decentralized_trust() -> None:
    """Decentralized Trust (去中心化信任): transparency-log epochs + witness
    quorum + OTS state.

    This page surfaces the 6-layer anti-tamper defenses that technical.html
    promises: each sealed epoch has a Merkle root, an operator signature,
    an optional git-anchor commit SHA, an optional OpenTimestamps receipt,
    and a set of independent M-of-N witness cosignatures from peer registries.

    Without a running registry the page renders an empty-state hint so
    the demo dashboard doesn't crash when run against pure scenario
    state.
    """
    st.title("去中心化信任 — Decentralized Trust")
    st.caption("6 層防篡改:簽章 → 序號 → 新鮮度 → Merkle → 異常 → 附加(git/OTS/見證)。")

    db_path = st.text_input(
        "Registry SQLite 路徑(留空則略過此頁)",
        value=st.session_state.get("decentral_db", ""),
        key="decentral_db",
        help=(
            "例:var/registry.db。需先以 `RegistryStore` 寫入過事件並執行過 "
            "`seal_epoch(...)`。本頁不會修改資料,只讀取。"
        ),
    )
    if not db_path:
        st.info(
            "尚未指定 registry DB;請先跑情境並指向落地的 sqlite 檔。 "
            "可用 CLI `vacant registry anchor / witness-cosign / verify-quorum` 操作。"
        )
        return

    from pathlib import Path

    from sqlalchemy.ext.asyncio import create_async_engine
    from sqlmodel import select

    from vacant.registry import (
        EpochWitness,
        MerkleEpoch,
        RegistryStore,
        WitnessRootSet,
        verify_witness_quorum,
    )

    if not Path(db_path).exists():
        st.error(f"找不到 {db_path}")
        return

    async def _load() -> tuple[list[MerkleEpoch], dict[int, list[EpochWitness]]]:
        engine = create_async_engine(f"sqlite+aiosqlite:///{db_path}")
        store = RegistryStore(engine)
        try:
            async with store._sessionmaker() as s:
                eres = await s.execute(select(MerkleEpoch).order_by(MerkleEpoch.epoch_id))  # type: ignore[arg-type]
                epochs = list(eres.scalars().all())
                witnesses: dict[int, list[EpochWitness]] = {}
                for e in epochs:
                    wres = await s.execute(
                        select(EpochWitness).where(EpochWitness.epoch_id == e.epoch_id)
                    )
                    witnesses[int(e.epoch_id or 0)] = list(wres.scalars().all())
            return epochs, witnesses
        finally:
            await engine.dispose()

    epochs, witnesses = asyncio.run(_load())
    if not epochs:
        st.warning(
            "此 registry 尚未 seal 過任何 epoch。執行 `await store.seal_epoch(...)` 後再回來。"
        )
        return

    st.subheader(f"Sealed epochs({len(epochs)} 筆)")
    st.dataframe([_epoch_anchor_summary(e) for e in epochs], hide_index=True, width="stretch")

    st.subheader("見證人 quorum(聯邦化信任)")
    rootset_hex = st.text_input(
        "見證人公鑰集合(逗號分隔 hex;N 個候選人)",
        value=st.session_state.get("decentral_rootset", ""),
        key="decentral_rootset",
        help="例:abcd...,1234,...。將會檢查每個 epoch 是否達到 M-of-N 門檻。",
    )
    threshold = st.number_input(
        "Quorum 門檻 M",
        min_value=1,
        value=int(st.session_state.get("decentral_thresh", 1)),
        step=1,
        key="decentral_thresh",
    )

    rootset_keys: tuple[bytes, ...] = ()
    if rootset_hex.strip():
        try:
            rootset_keys = tuple(
                bytes.fromhex(k.strip()) for k in rootset_hex.split(",") if k.strip()
            )
        except ValueError as exc:
            st.error(f"rootset 解析失敗:{exc}")
            rootset_keys = ()

    rows = []
    rs: WitnessRootSet | None = None
    if rootset_keys and len(rootset_keys) >= int(threshold):
        try:
            rs = WitnessRootSet(threshold=int(threshold), keys=rootset_keys)
        except Exception as exc:
            st.error(f"WitnessRootSet 建立失敗:{exc}")
            rs = None
    for e in epochs:
        ws = witnesses[int(e.epoch_id or 0)]
        ok = verify_witness_quorum(epoch=e, cosignatures=ws, rootset=rs) if rs is not None else None
        rows.append(
            {
                "epoch_id": e.epoch_id,
                "見證人 (cosign)": len(ws),
                "quorum": "✅" if ok else ("❌" if ok is False else "—"),
            }
        )
    st.dataframe(rows, hide_index=True, width="stretch")
    st.caption(
        "✅ = M-of-N quorum 達成;❌ = 不足;— = 尚未輸入見證人公鑰集合。 "
        "OTS upgraded 表示 `.ots` 比特幣錨定已升級。 "
        "git_commit_sha 是公開可查的透明日誌節點。"
    )
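The rootset input handling above can be sketched in isolation. The parsing mirrors the comma-separated hex handling in the page code; the quorum check is only an assumption about what an M-of-N test counts (distinct cosigner keys that belong to the rootset) and deliberately omits signature verification, which `verify_witness_quorum` presumably performs. `parse_rootset` and `quorum_reached` are hypothetical names.

```python
def parse_rootset(text: str) -> tuple[bytes, ...]:
    """Parse comma-separated hex keys, ignoring blanks and surrounding spaces."""
    return tuple(bytes.fromhex(k.strip()) for k in text.split(",") if k.strip())


def quorum_reached(
    cosigner_keys: list[bytes], rootset: tuple[bytes, ...], threshold: int
) -> bool:
    """Assumed M-of-N rule: at least `threshold` distinct cosigners from the rootset."""
    distinct_valid = set(cosigner_keys) & set(rootset)
    return len(distinct_valid) >= threshold


rootset = parse_rootset("aa01, bb02 ,cc03,")
two_of_three = quorum_reached(
    [bytes.fromhex("aa01"), bytes.fromhex("cc03")], rootset, threshold=2
)
duplicates = quorum_reached([bytes.fromhex("aa01")] * 3, rootset, threshold=2)
```

Under this assumed rule, two distinct rootset members satisfy a threshold of 2, while three cosignatures from the same key do not. `bytes.fromhex` raises ValueError on malformed input, which is the error the page reports back to the user.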

Scenarios

law_firm

Scenario 1 -- law_firm: composite parent + 2 closed sub-vacants (P7_demo_seed §"Scenario 1 -- law_firm").

Composite "法律問答 vacant" (legal Q&A vacant) delegates each query to:

- "專利查詢" (patent search; factual lookup) -- high F signals.
- "條款草擬" (clause drafting; logical drafting) -- high L signals.

After 30 simulated calls the composite parent earns from successful delegation; both sub-vacants stay LOCAL (no graduation triggered in this scenario).

code_review

Scenario 2 -- code_review: 5 ACTIVE vacants race to review the same PR; reputation diverges; same-controller signal demonstrably downweights a colluding ring (P7_demo_seed §"Scenario 2 -- code_review").

multilingual_translation

Scenario 3 -- multilingual_translation: cross-substrate dispatch.

6 vacants ("translator") each declare different substrate_spec.allowed_substrates:

- 2 prefer claude-sonnet-4-6
- 2 prefer gpt-4o
- 2 prefer local-ollama-llama3

10 queries in each of the language pairs en->zh, en->ja, en->es, en->fr (40 total). The aggregator tracks separate posteriors per (vacant_id, substrate). A vacant successfully serving across >=2 substrates earns a portability_factor bonus (+0.05 across F).

self_replication

Scenario 4 -- self_replication: D1/D2/D3/D5 spawns + lineage tree + one graduation (P7_demo_seed §"Scenario 4 -- self_replication").

Over 200 simulated ticks:

- D1 spawn at tick 30 (clone with mutation)
- D2 spawn at tick 50 (closed subagent-bud)
- D3 spawn at tick 80 (capability fork)
- D5 spawn at tick 120 (cross-substrate)
- Tick 180: try to graduate D2 child.

Assertions checked by the integration test:

- Lineage tree depth = 2 (root -> 4 children, no grandchildren)
- All 5 vacants share no keypair
- All children have parent_id = root
- Root logbook has 4 SPAWN entries
- D2 child stays LOCAL until graduation
- Graduation flips D2 child's manifest to closed_by_default=False with same keypair + extended logbook