Lifecycle design

This page explains the design behind the lifecycle vocabulary. For the authoritative reference see the lifecycle reference; for the exhaustive design record see ADR 0005.

The xs. namespace

Topics that start with xs. are owned by the runtime; everything else is app data. This separation makes “find all actor lifecycle events” a fast prefix query (xs.actor.) against the existing topic index, rather than a full-stream scan. It also gives a one-glance rule for anyone reading a topic: if it starts with xs., the runtime manages it; otherwise it is your data.

The choice cascades:

  • Lifecycle events fit one shape: xs.<kind>.<name>.<event>.
  • Module registrations join the namespace: xs.module.<name>.
  • Per-call data (an action’s .call or .response, a service’s .recv or .send) stays in the app’s namespace. That data is the user’s; the runtime is only routing.
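The naming rule above can be sketched as a small classifier. This is an illustration only, not the runtime's code: the function names, the LifecycleTopic type, and the choice to match the event suffix from the end of the topic are assumptions made for the example. The event suffixes are the ones listed later on this page.

```python
from typing import NamedTuple, Optional

# The lifecycle event suffixes named on this page.
LIFECYCLE_EVENTS = (
    "create", "term", "active", "invalid",
    "fin.ok", "fin.error", "fin.term",
    "replaced", "stopped",
)
KINDS = ("actor", "service", "action")


class LifecycleTopic(NamedTuple):
    """Hypothetical parsed form of xs.<kind>.<name>.<event>."""
    kind: str
    name: str
    event: str


def is_runtime_topic(topic: str) -> bool:
    """One-glance rule: the runtime owns anything under the xs. prefix."""
    return topic.startswith("xs.")


def parse_lifecycle_topic(topic: str) -> Optional[LifecycleTopic]:
    """Split a lifecycle topic into kind, name, and event, or return None."""
    if not is_runtime_topic(topic):
        return None  # app data, e.g. an action's .call or .response
    kind, sep, tail = topic[len("xs."):].partition(".")
    if not sep or kind not in KINDS:
        return None  # runtime-owned, but not a lifecycle topic (e.g. xs.module.<name>)
    # Try the longest suffix first so "fin.term" is not mistaken for "term".
    for event in sorted(LIFECYCLE_EVENTS, key=len, reverse=True):
        suffix = "." + event
        if tail.endswith(suffix) and len(tail) > len(suffix):
            return LifecycleTopic(kind, tail[: -len(suffix)], event)
    return None
```

For example, parse_lifecycle_topic("xs.service.api.fin.error") yields kind "service", name "api", and event "fin.error", while a per-call topic such as "api.recv" is not a runtime topic at all.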

Shared vocabulary across the three kinds

Actors, services, and actions used to each have their own ad-hoc set of lifecycle topics: .register/.unregistered (actor), .spawn/.stopped with a meta reason (service), .define/.ready/.error (action). Three sets, three separate implementations of historical compaction, three different ways for users to ask “did this thing stop?”.

The unified set is nine events:

  • Inputs: create, term (user appends these).
  • Run marker: active (runtime ack on successful start).
  • Failure to init: invalid.
  • Terminal stops: fin.ok, fin.error, fin.term.
  • Transient stops: replaced (a successor is coming) and stopped (xs is going down but coming back).

Actions use a subset (no fin.ok, no fin.error, no stopped, no replaced) because actions don’t run long-lived tasks: a re-create rebuilds the definition and re-emits active rather than stepping a running instance aside. The shape is otherwise the same.

The dispatcher logic is the same for all three. Only the per-kind prefix in the topic string differs.
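The vocabulary and the action subset can be written down as plain sets. This is a sketch for orientation, not the runtime's data model: the constant names and the topic_for helper are hypothetical, and the lifecycle reference remains the authoritative event list.

```python
# The nine events, grouped as on this page.
INPUTS = {"create", "term"}                      # appended by the user
RUN_MARKER = {"active"}                          # runtime ack on successful start
INIT_FAILURE = {"invalid"}
TERMINAL_STOPS = {"fin.ok", "fin.error", "fin.term"}
TRANSIENT_STOPS = {"replaced", "stopped"}

ALL_EVENTS = INPUTS | RUN_MARKER | INIT_FAILURE | TERMINAL_STOPS | TRANSIENT_STOPS

# Actions have no long-lived task, so they drop four of the nine events.
ACTION_EVENTS = ALL_EVENTS - {"fin.ok", "fin.error", "stopped", "replaced"}

EVENTS_BY_KIND = {
    "actor": ALL_EVENTS,
    "service": ALL_EVENTS,
    "action": ACTION_EVENTS,
}


def topic_for(kind: str, name: str, event: str) -> str:
    """Build a lifecycle topic; only the per-kind prefix differs between kinds."""
    if event not in EVENTS_BY_KIND[kind]:
        raise ValueError(f"{kind} does not use the {event} event")
    return f"xs.{kind}.{name}.{event}"
```

With these sets, topic_for("service", "api", "fin.error") produces xs.service.api.fin.error, and asking for an action's fin.ok raises an error because actions do not use that event.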

The two-slot compaction algorithm

For every <kind>.<name>, the dispatcher keeps two slots: confirmed (the last create that emitted an active) and pending (the latest create with no terminal ack yet).

The reason for two slots is a single race: a user appends a new create to hot-replace a working version, and the new version is broken. Three things then happen in order:

  1. The old, running version reads the new create from its own stream subscription and exits.
  2. The dispatcher tries to parse the new create’s script. If it fails, the dispatcher emits invalid.
  3. The user, or the dispatcher on next restart, has to decide what runs now.

A latest-wins scheme remembers only the broken create, so the system goes down because of a typo. The two-slot scheme remembers the last known-good version separately: when invalid arrives, pending is cleared but confirmed survives. At threshold (the threshold rule is defined in the lifecycle reference) the dispatcher prefers pending and falls back to confirmed on parse failure. The system stays up.

The same two slots cover three related cases:

  • xs died mid-spawn (the new create has no ack): try it; on parse failure, fall back to confirmed.
  • Server crash mid-run (no terminal ack at all): start confirmed.
  • Server shutdown (a stopped ack arrived): leave both slots untouched; resume on next start.
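The two slots and the cases above can be sketched as a fold over one <kind>.<name>'s history. This is a minimal model under stated assumptions, not the dispatcher's implementation: the Slots type, compact, and choose_at_threshold are hypothetical names, scripts are plain strings, acks are not correlated to a specific create, and a fin.* event is assumed to clear both slots. The state-transition table in the lifecycle reference is authoritative.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class Slots:
    confirmed: Optional[str] = None  # last create that was followed by active
    pending: Optional[str] = None    # latest create with no ack yet


def compact(events: Iterable[Tuple[str, Optional[str]]]) -> Slots:
    """Fold (event, script) pairs into two slots; script is set only for create."""
    slots = Slots()
    for event, script in events:
        if event == "create":
            slots.pending = script
        elif event == "active":
            if slots.pending is not None:
                # The pending create started successfully: promote it.
                slots.confirmed = slots.pending
                slots.pending = None
        elif event == "invalid":
            slots.pending = None  # confirmed survives as the fallback
        elif event.startswith("fin."):
            slots = Slots()  # assumption: a terminal stop clears both slots
        # term, replaced, and stopped leave the slots untouched.
    return slots


def choose_at_threshold(slots: Slots, parses: Callable[[str], bool]) -> Optional[str]:
    """Prefer pending; fall back to confirmed when pending fails to parse."""
    if slots.pending is not None and parses(slots.pending):
        return slots.pending
    return slots.confirmed
```

Replaying create(v1), active, create(v2), replaced, invalid leaves confirmed holding v1 and pending empty, so the typo in v2 does not take the system down. Replaying create(v1), active, stopped leaves v1 in confirmed, so the next start resumes it.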

Splitting fin.* into three topics

fin.ok, fin.error, and fin.term could be one topic with a reason field in meta. They aren’t, because answering “did this service stop because of an error or because the user asked?” would then require reading meta on every frame. Putting the distinction in the topic shape lets observers subscribe to just one stream (xs.service.api.fin.error) without filtering, and lets the compaction algorithm key off the topic suffix directly.

The same reasoning explains why replaced and stopped sit outside the fin.* family. They look like ends but aren’t terminal:

  • replaced indicates a successor is coming. It deliberately doesn’t clear confirmed so the fallback survives the brief window before the new instance’s active lands.
  • stopped indicates xs is going down but coming back. It must be invisible to compaction so the next restart resumes the service.

A scheme that compacted on any stop would silently take services down on every xs restart, which was a real bug in the pre-rename codebase.
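The difference between the two schemes shows up on a plain restart history. The sketch below is a deliberately simplified one-slot model written for this comparison; the function names are hypothetical and it does not reproduce the pre-rename code.

```python
from typing import Callable, Iterable

TERMINAL = {"fin.ok", "fin.error", "fin.term"}
TRANSIENT = {"replaced", "stopped"}


def should_resume(history: Iterable[str], clears: Callable[[str], bool]) -> bool:
    """Replay event suffixes; True if a started definition survives compaction."""
    running = False
    for event in history:
        if event == "active":
            running = True
        elif clears(event):
            running = False
    return running


def compacts_on_any_stop(event: str) -> bool:
    """The faulty rule: every stop-like event wipes the definition."""
    return event in TERMINAL or event in TRANSIENT


def compacts_on_terminal_only(event: str) -> bool:
    """The rule described above: only the fin.* family is terminal."""
    return event in TERMINAL


restart = ["create", "active", "stopped"]   # xs shut down and came back
user_term = ["create", "active", "term", "fin.term"]
```

Under compacts_on_any_stop the restart history does not resume, which is the silent outage described above; under compacts_on_terminal_only it does. Both rules agree that the user_term history stays stopped.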

Breaking change

xs is pre-1.0. Rather than carry an in-binary conversion path for pre-rename stores, the new code only knows the new vocabulary. Existing stores written by pre-rename xs binaries are not readable. Users either start fresh, stay on an older release, or write their own conversion using the topic mapping documented in the ADR.

Further reading

  • Lifecycle reference: the authoritative event list, state-transition table, threshold rule, and invariants.
  • Topics reference: naming rules, hierarchy, the xs. namespace, system topics.
  • ADR 0005: the full design record, including the eight deficiencies that drove the rewrite and the coverage check.