Architecture
cross.stream is built on a decoupled architecture that separates event metadata from content storage, optimizing for both stream processing and content retrieval.
Core Components
Event Stream
The event stream uses fjall (a log-structured merge-tree store) as an append-only index for frame metadata. Each frame contains:
- Unique ID (using SCRU128)
- Content hash (if content exists)
- Custom metadata
- TTL (Time-To-Live) settings
Content Storage (CAS)
Content is stored separately in a Content-Addressable Storage (CAS) system implemented using cacache. Key features:
- Content is immutable
- Content is referenced by its cryptographic hash
- Multiple frames can reference the same content
- Content is stored once, regardless of how many frames reference it
When to Use Meta vs CAS
Prefer metadata for structured data (records, numbers, small strings). Metadata
is inline on the frame, so reading 50 frames gives you 50 values with no extra
lookups. Actor output uses this by default: the out record is stored as frame
metadata.
Use CAS for blobs: images, large text, binary content. CAS gives you
deduplication and streaming reads, but each value requires a separate lookup.
Actors can opt into CAS with return_options: { target: "cas" }.
Rule of thumb: if the value is something you’d put in a JSON field, it belongs in meta. If it’s something you’d serve as a file, it belongs in CAS.
Server Process
A server process sits between clients and the store, enabling:
- Concurrent access from multiple clients
- Real-time subscriptions to new events
- Background maintenance tasks
Data Flow
Writing Events
When appending an event:
- If content is provided, it’s written to CAS first, generating a hash
- A frame is created with metadata and the content hash
- The frame is appended to the
fjallindex - The frame is broadcast to any active subscribers
- A topic index is updated for quick head retrieval
Reading Events
When reading the stream (cat):
- Only frame metadata is retrieved from
fjall - Content remains in CAS until specifically requested
- The stream can be read efficiently without pulling content
Content retrieval (cas):
- Content is fetched from CAS using the hash from a frame
- This is a separate operation from stream reading
- Content is retrieved only when needed