vacant.mvp
P7 demo layer — the four reference scenarios (law_firm,
code_review, multilingual_translation, self_replication), the 8
P7 metrics, the demo CLI (vacant demo …), and the Streamlit
dashboard.
demo
Demo CLI: python -m vacant.mvp.demo --scenario=<name> [--substrate=<backend>] [--seed=N].
Prints a JSON-encoded ScenarioResult to stdout for piping into
jq / unit tests / fixture-snapshot tooling.
If VACANT_DEMO_DB_PATH (or --db) is set, the run streams events
into the SQLite demo store so the dashboard / vacant demo --tail can
read them back.
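A minimal sketch of driving the CLI from Python, using only the flags documented above. The helper names `build_demo_argv` and `parse_result` are hypothetical and not part of `vacant.mvp`; the shape of the decoded `ScenarioResult` is not specified here, so it is treated as an opaque JSON object.

```python
import json
import sys


def build_demo_argv(scenario, substrate=None, seed=None):
    """Build the argv list for `python -m vacant.mvp.demo` (hypothetical helper)."""
    argv = [sys.executable, "-m", "vacant.mvp.demo", f"--scenario={scenario}"]
    if substrate is not None:
        argv.append(f"--substrate={substrate}")
    if seed is not None:
        argv.append(f"--seed={seed}")
    return argv


def parse_result(stdout_text):
    """Decode the JSON-encoded ScenarioResult that the CLI prints to stdout."""
    return json.loads(stdout_text)
```

With the package installed, something like `subprocess.run(build_demo_argv("law_firm", seed=7), capture_output=True, text=True, check=True)` followed by `parse_result(proc.stdout)` should yield the result as a Python dict.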
metrics
8 metrics for the P7 dashboard (P7_mvp.md §3) plus 7 Layer 9 health indicators from THEORY_V5 §Layer 9.
Each metric is exposed as:
- compute_*(snapshot) -> value -- pure function over a MetricsSnapshot.
- MetricsWriter -- accumulates the values plus a timestamp into an
in-memory deque (and serialises to a SQLite metrics table when one
is provided) so the dashboard can plot time series.
The snapshot is a frozen dataclass that any caller (a scenario, a unit
test, or the dashboard itself) can build from the registry + the
aggregator + the per-scenario ScenarioResult. It does NOT depend on
any I/O; computing metrics from it is pure computation.
MetricsSnapshot
dataclass
MetricsSnapshot(aggregator: Aggregator | None = None, vacants: dict[VacantId, dict[str, Any]] = dict(), manifests: tuple[ChildManifest, ...] = (), graduations: tuple[float, ...] = (), dispatch_latencies_ms: tuple[float, ...] = (), same_controller_eval: dict[str, int] = dict(), registry_writes_attempted: int = 0, registry_writes_seq_monotonic: int = 0, spawn_events: tuple[dict[str, Any], ...] = (), caller_selections: tuple[dict[str, Any], ...] = (), custody_uncertain_vids: frozenset[VacantId] = frozenset(), lineage_embeddings: dict[VacantId, tuple[tuple[float, ...], ...]] = dict(), peer_review_events: tuple[dict[str, Any], ...] = ())
Inputs to the metrics module. All fields are optional; when a field is missing, the corresponding metric returns zero or an empty result.
vacants
class-attribute
instance-attribute
vacants: dict[VacantId, dict[str, Any]] = field(default_factory=dict)
vid -> {state: VacantState, parent_id: VacantId|None, n_calls: int}.
graduations
class-attribute
instance-attribute
Unix timestamps of successful graduations.
dispatch_latencies_ms
class-attribute
instance-attribute
Wall-clock latencies of call_capability in milliseconds.
same_controller_eval
class-attribute
instance-attribute
{'true_positives': N, 'flagged_total': N} from the adversarial set.
registry_writes_seq_monotonic
class-attribute
instance-attribute
Counters for the concurrent-writers metric.
spawn_events
class-attribute
instance-attribute
Each entry shape: {"path": "D1|D2|D3|D4|D5|B|C|Z", "ts": float}.
Powers d_spawn_ratio — the share of births that came from agent
self-replication (D-paths) vs transitional / bootstrap paths.
caller_selections
class-attribute
instance-attribute
Each entry: {"was_exploration": bool, "ts": float} recording
whether the caller's UCB selection came from the exploration pool
(INSUFFICIENT_DATA candidates) vs the greedy top-k. Powers
exploration_ratio — V5 §3.6(a).
custody_uncertain_vids
class-attribute
instance-attribute
custody_uncertain_vids: frozenset[VacantId] = field(default_factory=frozenset)
Vacant IDs flagged custody_uncertain by the heartbeat watcher
(consecutive missed HEARTBEAT_SUNK rounds past the threshold).
Powers custody_uncertain_count — V5 §4.2.
lineage_embeddings
class-attribute
instance-attribute
lineage_embeddings: dict[VacantId, tuple[tuple[float, ...], ...]] = field(default_factory=dict)
Per-lineage-root: tuple of recent member embeddings (STYLO Vec16).
lineage_capability_drift averages the L2 distance from the root's
earliest embedding to the most recent member embedding, per lineage.
peer_review_events
class-attribute
instance-attribute
Each entry: {"target_vid": VacantId, "ts": float}. Powers
peer_review_density — avg reviews per active vacant per week
(THEORY_V5 §Layer 9).
MetricsWriter
dataclass
MetricsWriter(max_points: int = 5000, samples: deque[tuple[float, str, Any]] = (lambda: deque(maxlen=5000))(), _lock: Lock = Lock())
In-memory ring buffer of (ts, metric_name, value) triples for time-series plotting. Configurable max length.
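The ring-buffer behaviour can be illustrated with a small stand-in. This is a sketch only: `RingWriter`, `record`, and `series` are hypothetical names, and the real `MetricsWriter` methods and its SQLite serialisation are not shown in this reference.

```python
import threading
import time
from collections import deque


class RingWriter:
    """Hypothetical stand-in for the in-memory part of MetricsWriter."""

    def __init__(self, max_points=5000):
        # deque(maxlen=...) silently drops the oldest sample once it is full.
        self.samples = deque(maxlen=max_points)
        self._lock = threading.Lock()

    def record(self, name, value, ts=None):
        """Append one (ts, metric_name, value) triple."""
        ts = time.time() if ts is None else ts
        with self._lock:
            self.samples.append((ts, name, value))

    def series(self, name):
        """Return the (ts, value) points for one metric, oldest first."""
        with self._lock:
            return [(ts, v) for ts, n, v in self.samples if n == name]


writer = RingWriter(max_points=3)
for i in range(5):
    writer.record("exploration_ratio", i / 10, ts=float(i))
```

After the loop only the three most recent samples remain, which is the property that keeps a long-running dashboard session bounded in memory.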
compute_reputation_distribution
compute_reputation_distribution(snap: MetricsSnapshot) -> dict[str, float]
Return per-dimension mean and stdev across all active vacants.
Source code in src/vacant/mvp/metrics.py
compute_cold_start_uplift
compute_cold_start_uplift(snap: MetricsSnapshot) -> float
Fraction of calls that went to new vacants (n_eff < N_MIN_FOR_STABLE_SCORE on any dim). Higher = more exploration. Returns 0 when no calls are recorded.
Source code in src/vacant/mvp/metrics.py
compute_same_controller_detection_rate
compute_same_controller_detection_rate(snap: MetricsSnapshot) -> float
True-positive rate of the same-controller signal on the adversarial set: TP / (TP + FN). Computed from the dashboard's adversarial run.
Source code in src/vacant/mvp/metrics.py
compute_lineage_depth_distribution
compute_lineage_depth_distribution(snap: MetricsSnapshot) -> dict[int, int]
Histogram: depth -> count. Depth 0 = root, depth 1 = root's child, ...
Source code in src/vacant/mvp/metrics.py
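A depth histogram of this kind can be sketched by walking each vacant's `parent_id` chain. The function below is illustrative, not the code in `metrics.py`; it assumes the `vid -> {"parent_id": ...}` mapping described for `MetricsSnapshot.vacants`, treats a parent that is missing from the mapping as the end of the chain, and guards against cycles.

```python
from collections import Counter


def lineage_depth_histogram(vacants):
    """Return {depth: count} for a vid -> {"parent_id": vid | None} mapping."""

    def depth(vid):
        d = 0
        seen = {vid}
        parent = vacants[vid].get("parent_id")
        # Stop at a root, at an unknown parent, or if a cycle is detected.
        while parent is not None and parent in vacants and parent not in seen:
            seen.add(parent)
            d += 1
            parent = vacants[parent].get("parent_id")
        return d

    return dict(Counter(depth(vid) for vid in vacants))
```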
compute_graduation_rate
compute_graduation_rate(snap: MetricsSnapshot, *, window_s: float = 86400.0) -> float
Graduations per spawn event in the same window.
THEORY_V5 §Layer 9 defines this as
|grad events| / |spawn events|. The earlier implementation
divided by the composite-count instead, which produced a number
in a different unit (graduations/composite/window) and made the
dashboard read inversely to what the theory intended.
Returns 0.0 when there are no spawn events in the window (no denominator) — that matches the V5 semantics of "the network hasn't produced any spawns yet, so graduation_rate is undefined; treat as 0 for plotting".
Source code in src/vacant/mvp/metrics.py
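The ratio described above can be sketched as follows. This is an illustration under assumptions: the explicit `now` parameter is hypothetical (the documented signature takes only the snapshot and `window_s`), and the window is taken to be the closed interval `[now - window_s, now]`.

```python
def graduation_rate(graduations, spawn_events, *, now, window_s=86400.0):
    """|grad events| / |spawn events| inside the window; 0.0 with no spawns."""
    cutoff = now - window_s
    grads = sum(1 for ts in graduations if cutoff <= ts <= now)
    spawns = sum(1 for event in spawn_events if cutoff <= event["ts"] <= now)
    # No spawns means no denominator; report 0.0 so the series stays plottable.
    return grads / spawns if spawns else 0.0
```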
compute_dispatch_p99_latency
compute_dispatch_p99_latency(snap: MetricsSnapshot) -> float
Wall-clock p99 of dispatch latencies (ms). Returns 0 when fewer than 2 samples are available.
Source code in src/vacant/mvp/metrics.py
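One way to compute such a percentile is the nearest-rank method, sketched below. Which percentile definition `metrics.py` actually uses is not stated here, so nearest-rank is an assumption, and `p99_latency_ms` is a hypothetical name.

```python
def p99_latency_ms(latencies_ms):
    """Nearest-rank p99 in milliseconds; 0.0 with fewer than 2 samples."""
    n = len(latencies_ms)
    if n < 2:
        return 0.0
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.99 * n), which avoids floating-point rounding.
    rank = (99 * n + 99) // 100
    return float(ordered[rank - 1])
```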
compute_signature_verify_throughput
Verifications per second on a freshly generated batch. Cheap microbenchmark; runs synchronously.
Source code in src/vacant/mvp/metrics.py
compute_registry_consistency
compute_registry_consistency(snap: MetricsSnapshot) -> float
Percentage of registry writes that preserved sequence-number monotonicity under concurrent writers. 100% under correct behaviour; <100% indicates a regression.
Source code in src/vacant/mvp/metrics.py
compute_d_spawn_ratio
compute_d_spawn_ratio(snap: MetricsSnapshot) -> float
Share of births that came from D-path agent self-replication (D1-D5) vs total spawns (D + B + C + Z). V5 §Layer 9 lists this as the "core network-maturity indicator, target > 0.7". Returns 0.0 with no events.
Source code in src/vacant/mvp/metrics.py
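As a sketch of the ratio, assuming the `{"path": ..., "ts": ...}` entry shape documented for `spawn_events` (the function and constant names are illustrative, not taken from `metrics.py`):

```python
D_PATHS = frozenset({"D1", "D2", "D3", "D4", "D5"})


def d_spawn_ratio(spawn_events):
    """D-path spawns divided by all spawns; 0.0 when there are no events."""
    if not spawn_events:
        return 0.0
    d_count = sum(1 for event in spawn_events if event["path"] in D_PATHS)
    return d_count / len(spawn_events)
```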
compute_exploration_ratio
compute_exploration_ratio(snap: MetricsSnapshot) -> float
Fraction of caller selections that hit the UCB exploration pool (INSUFFICIENT_DATA candidates) rather than the greedy top-k. V5 §3.6(a): without exploration the network freezes into an oligopoly; the dashboard wants this >= 0.20 in healthy steady state.
Source code in src/vacant/mvp/metrics.py
compute_custody_uncertain_count
compute_custody_uncertain_count(snap: MetricsSnapshot) -> int
Number of vacants flagged custody_uncertain (consecutive
missed HEARTBEAT_SUNK past the threshold). V5 §4.2 — sunk
heartbeat is the keypair custody attestation, so a missing
heartbeat past threshold is a real security signal, not a
benign liveness flap.
Source code in src/vacant/mvp/metrics.py
compute_lineage_capability_drift
compute_lineage_capability_drift(snap: MetricsSnapshot) -> dict[str, float]
Per-lineage L2 drift from the root's earliest STYLO embedding to the most recent member embedding. V5 §4.3 — the lineage is what evolves, not the individual; this metric quantifies that drift.
Returns {lineage_root_short: float, ...}. Empty when no lineage
embeddings are recorded.
Source code in src/vacant/mvp/metrics.py
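The drift computation can be sketched with `math.dist`. The sketch assumes each lineage's tuple is ordered oldest first, so the first element is the root's earliest embedding and the last is the most recent member. It keys the result by the root ID as given; the shortening to `lineage_root_short` is omitted because its rule is not documented here.

```python
import math


def lineage_drift(lineage_embeddings):
    """Return {lineage_root: L2 distance from first to last embedding}."""
    drift = {}
    for root, members in lineage_embeddings.items():
        if not members:
            continue  # Assumption: lineages without embeddings are skipped.
        drift[root] = math.dist(members[0], members[-1])
    return drift
```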
compute_substrate_diversity
compute_substrate_diversity(snap: MetricsSnapshot) -> float
Shannon entropy (bits) over substrate_primary across all
Active/Hibernating vacants. V5 §Layer 9 lists this as a health
indicator — higher = less monoculture risk if any single substrate
vendor degrades or revokes API access.
Source code in src/vacant/mvp/metrics.py
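Shannon entropy in bits over a list of labels can be sketched as below. The helper is generic and illustrative; how `substrate_primary` is read from each vacant is not shown in this reference, so the input is simply a list of labels.

```python
import math
from collections import Counter


def shannon_entropy_bits(labels):
    """H = -sum(p * log2(p)) over the label frequencies; 0.0 for no labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Four vacants spread evenly over four substrates give 2 bits, while a network running entirely on one substrate gives 0 bits, which is the monoculture case the indicator is meant to expose.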
compute_controller_diversity
compute_controller_diversity(snap: MetricsSnapshot) -> float
Shannon entropy over controller_id across all Active vacants.
V5 §Layer 9 — pair-bar against same_controller_detection_rate to
distinguish "many independent operators" (high entropy, low
detection) from "one operator across many vacants" (low entropy,
high detection).
Source code in src/vacant/mvp/metrics.py
compute_peer_review_density
compute_peer_review_density(snap: MetricsSnapshot, *, window_s: float = 7 * 86400.0) -> float
Average peer reviews per Active vacant per window_s (default
1 week). V5 §4.2(e) claims a healthy network should give a new
vacant 30+ peer reviews in its first week — this is the
quantitative shape of that claim.
Source code in src/vacant/mvp/metrics.py
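A sketch of the density calculation, assuming the `{"target_vid": ..., "ts": ...}` entry shape documented for `peer_review_events`. The explicit `now` and `active_vids` parameters are hypothetical, and counting only reviews that target Active vacants is an assumption rather than documented behaviour.

```python
def peer_review_density(peer_review_events, active_vids, *, now,
                        window_s=7 * 86400.0):
    """Average reviews per active vacant inside the window; 0.0 if none active."""
    active = set(active_vids)
    if not active:
        return 0.0
    cutoff = now - window_s
    in_window = sum(
        1
        for event in peer_review_events
        if event["ts"] >= cutoff and event["target_vid"] in active
    )
    return in_window / len(active)
```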
compute_all
compute_all(snap: MetricsSnapshot) -> dict[str, Any]
Run every metric and return a flat dict.
Source code in src/vacant/mvp/metrics.py
dashboard
Streamlit dashboard for the P7 demo.
Run with: uv run streamlit run src/vacant/mvp/dashboard.py.
Pages:
- 網路 (Network) -- list of vacants with state, capability, mean reputation per dim.
- 血緣 (Lineage) -- parent_id chain visualisation.
- Scenario -- pick + run; events stream.
- 指標 (Metrics) -- 8 metrics, time-series.
- 對抗 (Adversarial) -- adversarial seed=666 ring detection.
User-facing text is in Traditional Chinese (繁體中文) per CLAUDE.md.
render_decentralized_trust
Decentralized trust (去中心化信任): transparency-log epochs + witness quorum + OTS state.
This page surfaces the 6-layer anti-tamper defenses that technical.html promises: each sealed epoch has a Merkle root, an operator signature, optional git-anchor commit SHA, optional OpenTimestamps receipt, and a set of independent N-of-M witness cosignatures from peer registries.
Without a running registry the page renders an empty-state hint so the demo dashboard doesn't crash when run against pure scenario state.
Source code in src/vacant/mvp/dashboard.py
Scenarios
law_firm
Scenario 1 -- law_firm: composite parent + 2 closed sub-vacants (P7_demo_seed §"Scenario 1 -- law_firm").
Composite "法律問答 vacant" (legal Q&A) delegates each query to:
- "專利查詢" (patent lookup; factual lookup) -- high F signals.
- "條款草擬" (clause drafting; logical drafting) -- high L signals.
After 30 simulated calls the composite parent earns from successful delegation; both sub-vacants stay LOCAL (no graduation triggered in this scenario).
code_review
Scenario 2 -- code_review: 5 ACTIVE vacants race to review the same PR; reputation diverges; same-controller signal demonstrably downweights a colluding ring (P7_demo_seed §"Scenario 2 -- code_review").
multilingual_translation
Scenario 3 -- multilingual_translation: cross-substrate dispatch.
6 vacants ("translator") each declare different substrate_spec.allowed_substrates:
- 2 prefer claude-sonnet-4-6
- 2 prefer gpt-4o
- 2 prefer local-ollama-llama3
10 queries each in en->zh, en->ja, en->es, en->fr (40 total).
The aggregator tracks separate posteriors per (vacant_id, substrate).
A vacant successfully serving across >=2 substrates earns a
portability_factor bonus (+0.05 across F).
self_replication
Scenario 4 -- self_replication: D1/D2/D3/D5 spawns + lineage tree + one graduation (P7_demo_seed §"Scenario 4 -- self_replication").
Over 200 simulated ticks:
- D1 spawn at tick 30 (clone with mutation)
- D2 spawn at tick 50 (closed subagent-bud)
- D3 spawn at tick 80 (capability fork)
- D5 spawn at tick 120 (cross-substrate)
- Tick 180: try to graduate D2 child.
Assertions checked by the integration test:
- Lineage tree depth = 2 (root -> 4 children, no grandchildren)
- All 5 vacants share no keypair
- All children have parent_id = root
- Root logbook has 4 SPAWN entries
- D2 child stays LOCAL until graduation
- Graduation flips D2 child's manifest to closed_by_default=False with same keypair + extended logbook