Lifecycle design
This page explains the design behind the lifecycle vocabulary. For the authoritative reference see the lifecycle reference; for the exhaustive design record see ADR 0005.
The xs. namespace
Topics that start with xs. are owned by the runtime; everything else is
app data. This separation makes “find all actor lifecycle events” a fast
prefix query (xs.actor.) against the existing topic index, rather than a
full-stream scan. It also gives a one-glance rule for anyone reading a
topic: starts with xs., runtime managed; else, your data.
The choice cascades:
- Lifecycle events fit one shape: xs.<kind>.<name>.<event>.
- Module registrations join the namespace: xs.module.<name>.
- Per-call data (an action’s .call or .response, a service’s .recv or .send) stays in the app’s namespace. That data is the user’s; the runtime is only routing.
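The one-glance rule and the single lifecycle shape can be sketched in a few lines. This is an illustration only, not the runtime's parser: the helper names (is_runtime_topic, parse_lifecycle_topic, LifecycleTopic) are hypothetical, and "action" as the kind segment is an assumption, since the page only shows xs.actor. and xs.service. topics.

```python
from typing import NamedTuple, Optional

RUNTIME_PREFIX = "xs."

# Assumption: the three kinds appear in topics under these segment names.
KINDS = ("actor", "service", "action")

# Event suffixes from the unified vocabulary. The fin.* entries come first so
# that "api.fin.term" is read as name "api" + event "fin.term", not as
# name "api.fin" + event "term".
EVENTS = ("fin.ok", "fin.error", "fin.term", "create", "term", "active",
          "invalid", "replaced", "stopped")


class LifecycleTopic(NamedTuple):
    kind: str
    name: str
    event: str


def is_runtime_topic(topic: str) -> bool:
    """Starts with xs.: runtime managed. Anything else: app data."""
    return topic.startswith(RUNTIME_PREFIX)


def parse_lifecycle_topic(topic: str) -> Optional[LifecycleTopic]:
    """Split xs.<kind>.<name>.<event>; return None for any other topic."""
    if not is_runtime_topic(topic):
        return None
    kind, sep, rest = topic[len(RUNTIME_PREFIX):].partition(".")
    if not sep or kind not in KINDS:
        return None
    for event in EVENTS:
        suffix = "." + event
        if rest.endswith(suffix) and len(rest) > len(suffix):
            return LifecycleTopic(kind, rest[: -len(suffix)], event)
    return None
```

Under these assumptions, a module registration such as xs.module.util is runtime-owned but is not a lifecycle event, and a per-call topic such as api.call is not runtime-owned at all.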
Shared vocabulary across the three kinds
Actors, services, and actions used to each have their own ad-hoc set of
lifecycle topics: .register/.unregistered (actor), .spawn/.stopped
with a meta reason (service), .define/.ready/.error (action). Three
sets, three separate implementations of historical compaction, three
different ways for users to ask “did this thing stop?”.
The unified set is nine events:
- Inputs: create, term (user appends these).
- Run marker: active (runtime ack on successful start).
- Failure to init: invalid.
- Terminal stops: fin.ok, fin.error, fin.term.
- Transient stops: replaced (a successor is coming) and stopped (xs is coming back).
Actions use a subset (no fin.ok, no fin.error, no stopped, no
replaced) because actions don’t run long-lived tasks: a re-create
rebuilds the definition and re-emits active rather than stepping a
running instance aside. The shape is otherwise the same.
The dispatcher logic is the same for all three. Only the per-kind prefix in the topic string differs.
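As a rough illustration of "one vocabulary, one per-kind prefix", the event sets can be written down as data. The names below (FULL_VOCABULARY, EVENTS_BY_KIND, lifecycle_topic) are hypothetical, and "action" as the topic segment is an assumption; the lifecycle reference remains the authoritative list.

```python
# The nine events of the unified vocabulary, as listed above.
FULL_VOCABULARY = frozenset({
    "create", "term",                  # inputs (user appends these)
    "active",                          # run marker
    "invalid",                         # failure to init
    "fin.ok", "fin.error", "fin.term", # terminal stops
    "replaced", "stopped",             # transient stops
})

# Actions don't run long-lived tasks, so they omit these four.
ACTION_OMITS = frozenset({"fin.ok", "fin.error", "stopped", "replaced"})

EVENTS_BY_KIND = {
    "actor": FULL_VOCABULARY,
    "service": FULL_VOCABULARY,
    "action": FULL_VOCABULARY - ACTION_OMITS,
}


def lifecycle_topic(kind: str, name: str, event: str) -> str:
    """Build xs.<kind>.<name>.<event>, rejecting events the kind doesn't use."""
    if event not in EVENTS_BY_KIND[kind]:
        raise ValueError(f"{kind} does not use the {event!r} event")
    return f"xs.{kind}.{name}.{event}"
```

Only the kind segment changes between the three; the event names and the topic shape are shared.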
The two-slot compaction algorithm
For every <kind>.<name>, the dispatcher keeps two slots: confirmed
(the last create that emitted an active) and pending (the latest
create with no terminal ack yet).
The reason for two slots is a single race: what happens when a user
appends a new create to hot-replace a working version, and the new
version is broken? Three things can happen in order:
- The old, running version reads the new create from its own stream subscription and exits.
- The dispatcher tries to parse the new create’s script. If it fails, the dispatcher emits invalid.
- The user, or the dispatcher on next restart, has to decide what runs now.
A latest-wins scheme remembers only the broken create. The system goes
down because of a typo. The two-slot scheme remembers the last
known-good version separately. When invalid arrives, pending is
cleared but confirmed survives. At threshold the dispatcher prefers
pending and falls back to confirmed on parse failure. The system
stays up.
The same two slots cover three related cases:
- xs died mid-spawn (the new create has no ack): try it; on parse fail, fall back.
- Server crash mid-run (no terminal ack at all): start confirmed.
- Server shutdown (a stopped ack arrived): leave both slots untouched; resume on next start.
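The two-slot idea can be sketched as a fold over the historical events for one <kind>.<name>. This is a minimal sketch built from the prose on this page, not the dispatcher's code. Several points are assumptions: that active acknowledges whichever create is currently pending, that any fin.* event clears both slots, that term changes nothing until the runtime acks it, and that a create can be stood in for by its script string. The names Slots, compact, and choose_start are hypothetical. The lifecycle reference holds the real state-transition table.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class Slots:
    confirmed: Optional[str] = None  # last create that was acked by `active`
    pending: Optional[str] = None    # latest create with no ack yet


def compact(events: Iterable[Tuple[str, Optional[str]]]) -> Slots:
    """Fold (event, script) pairs for one <kind>.<name> into two slots."""
    slots = Slots()
    for event, script in events:
        if event == "create":
            slots.pending = script
        elif event == "active":
            # Assumption: `active` acks the create currently in `pending`.
            if slots.pending is not None:
                slots.confirmed, slots.pending = slots.pending, None
        elif event == "invalid":
            slots.pending = None  # the broken create goes; confirmed survives
        elif event in ("fin.ok", "fin.error", "fin.term"):
            slots = Slots()       # assumption: a terminal stop clears both
        # `term`, `replaced`, and `stopped` leave both slots unchanged.
    return slots


def choose_start(slots: Slots, parses: Callable[[str], bool]) -> Optional[str]:
    """Prefer pending; fall back to confirmed if pending fails to parse."""
    if slots.pending is not None and parses(slots.pending):
        return slots.pending
    return slots.confirmed
```

With a stand-in parser such as `parses = lambda s: "syntax error" not in s`, the history create(v1), active, create(broken), replaced, invalid compacts to confirmed = v1 with an empty pending slot, so v1 is what starts. A latest-wins fold over the same history would be left holding only the broken create.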
Splitting fin.* into three topics
fin.ok, fin.error, and fin.term could be one topic with a reason
field in meta. They aren’t, because then “did this service stop because
of an error or because the user asked?” requires reading meta on every
frame. Putting the distinction in the topic shape lets observers
subscribe to just one stream (xs.service.api.fin.error) without
filtering, and lets the compaction algorithm key off the topic suffix
directly.
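The difference shows up in what an observer has to inspect. The sketch below models frames as plain dicts with "topic" and "meta" keys, which is an assumption made for illustration. The single-topic variant (a .fin topic with a reason field) is the hypothetical design that was rejected, not something xs emits.

```python
from typing import Dict, List

Frame = Dict[str, object]


def errors_by_topic(frames: List[Frame], kind: str, name: str) -> List[Frame]:
    """Split-topic design: an exact topic match is enough; meta is never read."""
    wanted = f"xs.{kind}.{name}.fin.error"
    return [f for f in frames if f["topic"] == wanted]


def errors_by_meta(frames: List[Frame], kind: str, name: str) -> List[Frame]:
    """Rejected single-topic design: every fin frame's meta must be opened."""
    wanted = f"xs.{kind}.{name}.fin"
    return [
        f for f in frames
        if f["topic"] == wanted and (f.get("meta") or {}).get("reason") == "error"
    ]
```

The first form maps directly onto a topic subscription or a topic-index lookup; the second needs a filter step after the frames have already been delivered.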
The same reasoning explains why replaced and stopped sit outside the
fin.* family. They look like ends but aren’t terminal:
- replaced indicates a successor is coming. It deliberately doesn’t clear confirmed so the fallback survives the brief window before the new instance’s active lands.
- stopped indicates xs is going down but coming back. It must be invisible to compaction so the next restart resumes the service.
A scheme that compacted on any stop would silently take services down on every xs restart, which was a real bug in the pre-rename codebase.
Breaking change
xs is pre-1.0. Rather than carry an in-binary conversion path for pre-rename stores, the new code only knows the new vocabulary. Existing stores written by pre-rename xs binaries are not readable. Users either start fresh, stay on an older release, or write their own conversion using the topic mapping documented in the ADR.
Further reading
- Lifecycle reference: the authoritative event list, state-transition table, threshold rule, and invariants.
- Topics reference: naming rules, hierarchy, the xs. namespace, system topics.
- ADR 0005: the full design record, including the eight deficiencies that drove the rewrite and the coverage check.