vacant.mvp¶
P7 demo layer — the four reference scenarios (law_firm,
code_review, multilingual_translation, self_replication), the 8
P7 metrics, the demo CLI (vacant demo …), and the Streamlit
dashboard.
demo
¶
Demo CLI: python -m vacant.mvp.demo --scenario=<name> [--substrate=<backend>] [--seed=N].
Prints a JSON-encoded ScenarioResult to stdout for piping into
jq / unit tests / fixture-snapshot tooling.
If VACANT_DEMO_DB_PATH (or --db) is set, the run streams events
into the SQLite demo store so the dashboard / vacant demo --tail can
read them back.
metrics
¶
8 metrics for the P7 dashboard (P7_mvp.md §3).
Each metric is exposed as:
- compute_*(snapshot) -> value -- pure function over a MetricsSnapshot.
- MetricsWriter -- accumulates the 8 values plus a timestamp into an
in-memory deque (and serialises to a SQLite metrics table when one
is provided) so the dashboard can plot time series.
The snapshot is a frozen dataclass that any caller (a scenario, a unit
test, or the dashboard itself) can build from the registry + the
aggregator + the per-scenario ScenarioResult. It does NOT depend on
any I/O; pure compute.
MetricsSnapshot
dataclass
¶
MetricsSnapshot(aggregator: Aggregator | None = None, vacants: dict[VacantId, dict[str, Any]] = dict(), manifests: tuple[ChildManifest, ...] = (), graduations: tuple[float, ...] = (), dispatch_latencies_ms: tuple[float, ...] = (), same_controller_eval: dict[str, int] = dict(), registry_writes_attempted: int = 0, registry_writes_seq_monotonic: int = 0)
Inputs to the metrics module. All optional -- missing fields return zero or empty for the corresponding metric.
vacants
class-attribute
instance-attribute
¶
vacants: dict[VacantId, dict[str, Any]] = field(default_factory=dict)
vid -> {state: VacantState, parent_id: VacantId|None, n_calls: int}.
graduations
class-attribute
instance-attribute
¶
Unix timestamps of successful graduations.
dispatch_latencies_ms
class-attribute
instance-attribute
¶
Wall-clock latencies of call_capability in milliseconds.
same_controller_eval
class-attribute
instance-attribute
¶
{'true_positives': N, 'flagged_total': N} from the adversarial set.
registry_writes_seq_monotonic
class-attribute
instance-attribute
¶
Counters for the concurrent-writers metric.
MetricsWriter
dataclass
¶
MetricsWriter(max_points: int = 5000, samples: deque[tuple[float, str, Any]] = (lambda: deque(maxlen=5000))(), _lock: Lock = Lock())
In-memory ring buffer of (ts, metric_name, value) triples for time-series plotting. Configurable max length.
compute_reputation_distribution
¶
compute_reputation_distribution(snap: MetricsSnapshot) -> dict[str, float]
Return per-dimension mean and stdev across all active vacants.
Source code in src/vacant/mvp/metrics.py
compute_cold_start_uplift
¶
compute_cold_start_uplift(snap: MetricsSnapshot) -> float
Fraction of calls that went to new vacants (n_eff < N_MIN_FOR_STABLE_SCORE on any dim). Higher = more exploration. Returns 0 when no calls are recorded.
Source code in src/vacant/mvp/metrics.py
compute_same_controller_detection_rate
¶
compute_same_controller_detection_rate(snap: MetricsSnapshot) -> float
True-positive rate of the same-controller signal on the adversarial set: TP / (TP + FN). Computed from the dashboard's adversarial run.
Source code in src/vacant/mvp/metrics.py
compute_lineage_depth_distribution
¶
compute_lineage_depth_distribution(snap: MetricsSnapshot) -> dict[int, int]
Histogram: depth -> count. Depth 0 = root, depth 1 = root's child, ...
Source code in src/vacant/mvp/metrics.py
compute_graduation_rate
¶
compute_graduation_rate(snap: MetricsSnapshot, *, window_s: float = 86400.0) -> float
Graduations per (composite, time-window).
Source code in src/vacant/mvp/metrics.py
compute_dispatch_p99_latency
¶
compute_dispatch_p99_latency(snap: MetricsSnapshot) -> float
Wall-clock p99 of dispatch latencies (ms). Returns 0 with <2 samples.
Source code in src/vacant/mvp/metrics.py
compute_signature_verify_throughput
¶
Verifications per second on a freshly generated batch. Cheap microbenchmark; runs synchronously.
Source code in src/vacant/mvp/metrics.py
compute_registry_consistency
¶
compute_registry_consistency(snap: MetricsSnapshot) -> float
% of registry writes that preserved sequence-no monotonicity under concurrent writers. 100% under correct behaviour; <100% indicates a regression.
Source code in src/vacant/mvp/metrics.py
compute_all
¶
compute_all(snap: MetricsSnapshot) -> dict[str, Any]
Run every metric and return a flat dict.
Source code in src/vacant/mvp/metrics.py
dashboard
¶
Streamlit dashboard for the P7 demo.
Run with: uv run streamlit run src/vacant/mvp/dashboard.py.
Pages: - 網路 (Network) -- list of vacants with state, capability, mean reputation per dim. - 血緣 (Lineage) -- parent_id chain visualisation. - Scenario -- pick + run; events stream. - 指標 (Metrics) -- 8 metrics, time-series. - 對抗 (Adversarial) -- adversarial seed=666 ring detection.
User-facing text is in 繁體中文 per CLAUDE.md.
Scenarios¶
law_firm
¶
Scenario 1 -- law_firm: composite parent + 2 closed sub-vacants (P7_demo_seed §"Scenario 1 -- law_firm").
Composite "法律問答 vacant" delegates each query to: - "專利查詢" (factual lookup) -- high F signals. - "條款草擬" (logical drafting) -- high L signals.
After 30 simulated calls the composite parent earns from successful delegation; both sub-vacants stay LOCAL (no graduation triggered in this scenario).
code_review
¶
Scenario 2 -- code_review: 5 ACTIVE vacants race to review the same PR; reputation diverges; same-controller signal demonstrably downweights a colluding ring (P7_demo_seed §"Scenario 2 -- code_review").
multilingual_translation
¶
Scenario 3 -- multilingual_translation: cross-substrate dispatch.
6 vacants ("translator") each declare different substrate_spec.allowed_substrates:
- 2 prefer claude-sonnet-4-6
- 2 prefer gpt-4o
- 2 prefer local-ollama-llama3
10 queries each in en->zh, en->ja, en->es, en->fr (40 total per pair).
The aggregator tracks separate posteriors per (vacant_id, substrate).
A vacant successfully serving across >=2 substrates earns a
portability_factor bonus (+0.05 across F).
self_replication
¶
Scenario 4 -- self_replication: D1/D2/D3/D5 spawns + lineage tree + one graduation (P7_demo_seed §"Scenario 4 -- self_replication").
Over 200 simulated ticks: - D1 spawn at tick 30 (clone with mutation) - D2 spawn at tick 50 (closed subagent-bud) - D3 spawn at tick 80 (capability fork) - D5 spawn at tick 120 (cross-substrate) - Tick 180: try to graduate D2 child.
Assertions checked by the integration test: - Lineage tree depth = 2 (root -> 4 children, no grandchildren) - All 5 vacants share no keypair - All children have parent_id = root - Root logbook has 4 SPAWN entries - D2 child stays LOCAL until graduation - Graduation flips D2 child's manifest to closed_by_default=False with same keypair + extended logbook