Summary Bus contract

Contract for deterministic summaries with strict provenance.

Purpose

The Summary Bus carries summary objects derived from either Event Bus events or Sessions Bus sessions.

This page defines:

Summary object schemas for events and sessions
On-disk endpoints and manifest rules
Provenance requirements and invariants
Determinism boundaries and what must be reproducible
Smoke test expectations and failure modes

This contract prioritizes traceability and replay over subjective summary quality.

Scope

In scope:

event_summary objects: summaries over a single event or a deterministic group of events
session_summary objects: summaries over a single session object
Manifests for both summary streams
Provenance fields and invariants
Error and accounting rules for skipped items

Out of scope:

How to parse raw ChatGPT exports
How sessions are computed
How embeddings are computed
Quality evaluation of model outputs
Any publishing or markdown rendering

What this is

A service bus output that can be consumed by packagers (Digest Engine) and other downstream processes.
A traceable bridge between raw structured objects and human-meaningful synthesis fields.

What this is not

A replacement for the Event Bus or Sessions Bus
A place to do bagging, publishing, or UI export
A place to hide missing coverage by silently dropping inputs

Endpoints

Event summaries

Daily file pattern:

summaries/events/YYYY-MM-DD.events.summary.jsonl

Manifest pattern:

summaries/manifest/YYYY-MM-DD.events.summary.manifest.json

Session summaries

Daily file pattern:

summaries/sessions/YYYY-MM-DD.sessions.summary.jsonl

Manifest pattern:

summaries/manifest/YYYY-MM-DD.sessions.summary.manifest.json

Rules:

JSONL: one object per line, UTF-8
Daily file must exist even if empty
Manifest must exist even if daily file is empty
Consumers must read only these endpoints and their manifests
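As an illustration of the path patterns above, the following sketch builds both endpoint paths for a given day and stream. The function name summary_paths and the root parameter are assumptions made for this example, not part of the contract.

```python
from datetime import date
from pathlib import PurePosixPath


def summary_paths(day: date, stream: str, root: str = "summaries") -> dict:
    """Return the daily file and manifest paths for one stream.

    `stream` must be "events" or "sessions"; `root` is a hypothetical
    base directory matching the patterns in this section.
    """
    if stream not in ("events", "sessions"):
        raise ValueError(f"unknown stream: {stream!r}")
    stamp = day.isoformat()  # YYYY-MM-DD
    base = PurePosixPath(root)
    return {
        "summaries_path": str(base / stream / f"{stamp}.{stream}.summary.jsonl"),
        "manifest_path": str(base / "manifest" / f"{stamp}.{stream}.summary.manifest.json"),
    }


paths = summary_paths(date(2024, 1, 2), "events")
```

A consumer built this way resolves both files from the day alone, which keeps it from reaching into any other location.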

Canonical summary schemas

The Summary Bus defines two related schemas:

event_summary.v1
session_summary.v1

Both share a common structure: identity, provenance, selection, model metadata, outputs, and accounting.

Event summary schema

Required fields

schema_version
String. Example: event_summary.v1
summary_id
String. Stable id for this summary object.
day
String. YYYY-MM-DD partition.
source_type
String. Must be event.
source_ids
Array of strings. Must include at least one Event Bus event_id.
selection
Object describing exactly what text was summarized.

Required keys inside selection:

selection_type
String. Example: single_event or event_slice
source_text_hash
String. Hash of the exact text payload that was summarized, after deterministic normalization.
normalization
Object describing normalization rules applied before hashing.

Minimum keys inside normalization:

name string
version string
model
Object describing the model invocation identity.

Required keys inside model:

provider string
model_name string
model_version string or empty string if unavailable
temperature number or null if unknown
max_tokens integer or null if unknown
prompt
Object describing the prompt identity.

Required keys inside prompt:

prompt_hash string
template_id string (template id or template name)
prompt_version string
producer
Object describing the summarizer implementation.

Required keys inside producer:

summarizer_version string
run_id string
outputs
Object holding the summary content.

Required keys inside outputs:

summary_text string

Optional fields

outputs.tags array
outputs.topics array
outputs.category string
outputs.actions array
outputs.confidence number
outputs.format_type string
outputs.notes string

Any optional field that is model generated must be treated as model output and must not be interpreted as deterministic truth.
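For orientation, here is a sketch of one event_summary.v1 object serialized as a single JSONL line. Every value is a placeholder invented for this example (ids, provider, model name, versions, text); only the field names and nesting come from the schema above.

```python
import hashlib
import json

# Hypothetical payload that was summarized (already normalized).
sample_text = "User asked how to rotate API keys."

example_event_summary = {
    "schema_version": "event_summary.v1",
    "summary_id": "example-summary-id",
    "day": "2024-01-02",
    "source_type": "event",
    "source_ids": ["example-event-id"],
    "selection": {
        "selection_type": "single_event",
        "source_text_hash": hashlib.sha256(sample_text.encode("utf-8")).hexdigest(),
        "normalization": {"name": "example_norm", "version": "1"},
    },
    "model": {
        "provider": "example-provider",
        "model_name": "example-model",
        "model_version": "",   # empty string when unavailable
        "temperature": None,   # null when unknown
        "max_tokens": None,    # null when unknown
    },
    "prompt": {
        "prompt_hash": hashlib.sha256(b"example prompt template").hexdigest(),
        "template_id": "example-template",
        "prompt_version": "1",
    },
    "producer": {"summarizer_version": "0.0.0-example", "run_id": "example-run"},
    "outputs": {
        "summary_text": "Example summary text.",
        "tags": ["example"],  # optional, model generated
    },
}

# One object per line, UTF-8, no embedded newlines.
jsonl_line = json.dumps(example_event_summary, ensure_ascii=False)
```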

Session summary schema

Required fields

schema_version
String. Example: session_summary.v1
summary_id
String. Stable id for this summary object.
day
String. YYYY-MM-DD partition.
source_type
String. Must be session.
source_ids
Array of strings. Must include exactly one session_id as the primary id.
event_ids
Array of strings. The session evidence pointer list. Must match the referenced session object or be declared as a derived slice.
selection
Object describing what text was summarized.

Required keys inside selection:

selection_type
String. Example: session_full or session_slice
source_text_hash
String. Hash of the exact text payload summarized.
normalization
Object describing deterministic normalization before hashing.
model
Same required fields as event summaries.
prompt
Same required fields as event summaries.
producer
Same required fields as event summaries.
outputs.summary_text
String.

Optional fields

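Same optional outputs fields as event summaries. The same rule applies: model-generated values are model output, not deterministic truth.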

Provenance rules

Provenance is not optional. These requirements exist to prevent silent drift.

Required provenance fields

Every summary object must contain:

source_ids list
selection.source_text_hash
selection.normalization marker
model metadata fields under model
prompt identity under prompt including prompt_hash
producer identity under producer including summarizer_version and run_id

Source text hash definition

The source_text_hash must be computed over a deterministic serialization of the selected content.

Minimum requirements:

Normalize line endings
Remove or normalize known non-semantic whitespace
Apply a documented normalization rule set with a version marker

Consumers must treat the hash as the canonical anchor for "what was summarized".
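A minimal sketch of the hashing steps above, under stated assumptions: SHA-256 as the hash function, trailing whitespace treated as non-semantic, and a rule set named example_ws_norm at version 1. None of these choices is fixed by this contract; a real producer must document its own rule set.

```python
import hashlib

# Hypothetical rule-set marker; recorded under selection.normalization.
NORMALIZATION = {"name": "example_ws_norm", "version": "1"}


def normalize_text(text: str) -> str:
    """Apply the example normalization rules before hashing."""
    # 1. Normalize line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # 2. Strip trailing whitespace on each line (assumed non-semantic).
    lines = [line.rstrip() for line in text.split("\n")]
    # 3. Drop leading and trailing blank lines.
    return "\n".join(lines).strip("\n")


def source_text_hash(text: str) -> str:
    """Hash the normalized text as UTF-8 bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(normalize_text(text).encode("utf-8")).hexdigest()
```

Two payloads that differ only in line endings or trailing spaces hash to the same value, so the hash stays stable across platforms. Changing any rule must change the version marker, because old and new hashes are no longer comparable.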

Accounting and coverage rules

Summaries must not silently drop items.

If an item is skipped, the system must:

record skip counts in the manifest
include a breakdown of skip reasons
optionally emit a skip report artifact if the run is large
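The accounting rule can be sketched as a single pass that tallies every eligible item, so nothing leaves the run uncounted. The outcome tuple shape and the reason code empty_text are hypothetical and chosen for this example only.

```python
from collections import Counter


def build_accounting(outcomes):
    """Tally per-item outcomes into manifest counts and skip reasons.

    `outcomes` is an iterable of (status, reason) pairs, where status is
    "produced", "skipped", or "failed" and reason is a reason code for
    skipped items (None otherwise).
    """
    counts = {"eligible": 0, "produced": 0, "skipped": 0, "failed": 0}
    skip_reasons = Counter()
    for status, reason in outcomes:
        if status not in ("produced", "skipped", "failed"):
            raise ValueError(f"unknown status: {status!r}")
        counts["eligible"] += 1
        counts[status] += 1
        if status == "skipped":
            if not reason:
                # A skip without a reason would be a silent drop.
                raise ValueError("skipped item needs a reason code")
            skip_reasons[reason] += 1
    return counts, dict(skip_reasons)


counts, skip_reasons = build_accounting([
    ("produced", None),
    ("skipped", "empty_text"),
    ("skipped", "empty_text"),
    ("failed", None),
])
```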

Determinism boundaries

The Summary Bus must be deterministic where it can be deterministic, and explicit where it cannot.

Deterministic components

These must be deterministic and replayable:

Selection logic for which items are summarized for a day
Ordering of inputs
Text normalization rules before hashing
Prompt template choice and prompt hash derivation
Summary id derivation
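One possible way to make prompt hash and summary id derivation replayable is to hash a canonical serialization of their inputs. This contract does not specify the derivation; the inputs chosen below, the sum_ prefix, and the use of SHA-256 are all assumptions for illustration.

```python
import hashlib
import json


def prompt_hash(template_text: str) -> str:
    """Hash the exact prompt template text (UTF-8, SHA-256 assumed)."""
    return hashlib.sha256(template_text.encode("utf-8")).hexdigest()


def derive_summary_id(source_type, source_ids, source_text_hash, prompt_hash_value):
    """Derive a stable id from the inputs that define a summary.

    The input order of source_ids is preserved, on the assumption that
    input ordering is itself deterministic.
    """
    payload = json.dumps(
        {
            "source_type": source_type,
            "source_ids": list(source_ids),
            "source_text_hash": source_text_hash,
            "prompt_hash": prompt_hash_value,
        },
        sort_keys=True,          # stable key order
        separators=(",", ":"),   # no incidental whitespace
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return "sum_" + digest[:32]
```

Because no timestamps or random values enter the payload, re-running the same selection with the same prompt yields the same id. Whether model identity should also feed the id is a design decision left to the implementation.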

Non-deterministic components

Model output is not deterministic in general.

Rules:

Model output fields must be labeled as model generated output.
The exact prompt hash, model identity, and selection hash must be recorded so a re-run is explainable.
If you enable a deterministic mode for the model provider, record the setting but do not assume perfect determinism.

Manifest contract

Event summary manifest

Minimum required fields for YYYY-MM-DD.events.summary.manifest.json:

schema_version string, example: events_summary_manifest.v1
bus_schema_version string, example: event_summary.v1
day string
input object describing what was consumed:
- eventbus_manifest_day string
- eventbus_manifest_sha256 string
- optional eventbus_range if you consume more than one day
paths object:
- summaries_path string
counts object:
- eligible integer
- produced integer
- skipped integer
- failed integer
skip_reasons object mapping reason code to integer count
integrity object:
- sha256 string
- bytes integer
producer object:
- summarizer_version string
- run_id string
- model_name string
- prompt_hash string

Session summary manifest

Same structure, but input references Sessions Bus manifest(s):

sessions_manifest_day
sessions_manifest_sha256
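The sketch below assembles an event summary manifest with the minimum fields listed above, using an empty day as the example. It assumes that integrity.sha256 and integrity.bytes describe the daily JSONL file and that eventbus_manifest_sha256 is the SHA-256 of the upstream manifest bytes; the function name and all placeholder values are hypothetical.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_events_manifest(day, summaries_bytes, upstream_manifest_bytes,
                          counts, skip_reasons, producer):
    """Build the manifest dict for one day of event summaries."""
    return {
        "schema_version": "events_summary_manifest.v1",
        "bus_schema_version": "event_summary.v1",
        "day": day,
        "input": {
            "eventbus_manifest_day": day,
            "eventbus_manifest_sha256": sha256_hex(upstream_manifest_bytes),
        },
        "paths": {"summaries_path": f"summaries/events/{day}.events.summary.jsonl"},
        "counts": counts,
        "skip_reasons": skip_reasons,
        # Assumption: integrity covers the daily JSONL file.
        "integrity": {"sha256": sha256_hex(summaries_bytes), "bytes": len(summaries_bytes)},
        "producer": producer,
    }


# An empty day still gets a manifest, with zero counts and a zero-byte file.
manifest = build_events_manifest(
    day="2024-01-02",
    summaries_bytes=b"",
    upstream_manifest_bytes=b"{}",
    counts={"eligible": 0, "produced": 0, "skipped": 0, "failed": 0},
    skip_reasons={},
    producer={
        "summarizer_version": "0.0.0-example",
        "run_id": "example-run",
        "model_name": "example-model",
        "prompt_hash": sha256_hex(b"example prompt template"),
    },
)
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
```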

Invariants

Mandatory contract invariants:

schema_version present on every summary object
source_ids always present and non-empty
selection.source_text_hash always present
Manifest always present even if the day output is empty
Daily summary file exists even if empty
Counts must be consistent:
- eligible = produced + skipped + failed
Items must not be silently dropped:
- any drop must be reflected in counts and skip reasons
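The count invariants can be checked mechanically. In this sketch the requirement that skip_reasons sums exactly to skipped is a stricter reading adopted for the example; the contract itself only requires that drops are reflected in counts and skip reasons.

```python
def check_manifest_invariants(manifest: dict) -> list:
    """Return a list of invariant violations (empty when the manifest is consistent)."""
    errors = []
    counts = manifest.get("counts", {})
    required = ("eligible", "produced", "skipped", "failed")
    missing = [key for key in required if key not in counts]
    if missing:
        errors.append(f"counts missing keys: {missing}")
        return errors

    if counts["eligible"] != counts["produced"] + counts["skipped"] + counts["failed"]:
        errors.append("eligible != produced + skipped + failed")

    reasons = manifest.get("skip_reasons", {})
    if counts["skipped"] > 0 and not reasons:
        errors.append("skipped > 0 but skip_reasons is empty")
    # Assumption: every skipped item carries exactly one reason code.
    if sum(reasons.values()) != counts["skipped"]:
        errors.append("skip_reasons do not sum to skipped")
    return errors
```

For example, a manifest with eligible 3, produced 2, skipped 0, failed 0 fails the first check: one item is unaccounted for, which is exactly the silent drop this contract forbids.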

Compatibility rules

Consumers must ignore unknown optional fields.
Adding optional fields does not require a schema bump if semantics do not change.
Any change that affects provenance interpretation, summary id derivation, or required fields requires:
- ADR
- Schema version bump
- Minimal migration note

Backward compatibility window:

Consumers must support at least the latest and the previous summary schema versions unless an ADR states otherwise.
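A consumer that follows these rules might look like the sketch below: it rejects unsupported schema versions explicitly and reads only the fields it knows, so unknown optional fields pass through harmlessly. The function name read_summary and the returned view shape are assumptions; the supported set lists only the versions named in this contract and would gain entries as new versions appear.

```python
import json

# Versions this hypothetical consumer accepts (latest and previous, once both exist).
SUPPORTED_VERSIONS = {"event_summary.v1", "session_summary.v1"}

# Optional outputs fields defined by this contract.
KNOWN_OPTIONAL_OUTPUTS = (
    "tags", "topics", "category", "actions", "confidence", "format_type", "notes",
)


def read_summary(line: str) -> dict:
    """Parse one JSONL line into a small view, ignoring unknown fields."""
    obj = json.loads(line)
    version = obj.get("schema_version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
    outputs = obj.get("outputs", {})
    view = {"summary_id": obj["summary_id"], "summary_text": outputs["summary_text"]}
    for key in KNOWN_OPTIONAL_OUTPUTS:
        if key in outputs:
            view[key] = outputs[key]
    return view
```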

Smoke test

Purpose: validate schema and provenance, not model quality.

A smoke test must validate:

The relevant upstream manifest(s) can be read
Empty-but-valid outputs are produced when upstream is empty
Output JSONL parses line by line
Every summary object contains:
- schema_version
- summary_id
- source_ids
- selection.source_text_hash
- model and prompt objects with required keys
- producer.run_id and producer.summarizer_version
Manifest exists and counts reconcile
Skip reasons are present when skipped > 0

Acceptance criteria:

No schema violations
No missing provenance fields
No count reconciliation errors
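The per-object part of the smoke test can be sketched as a scan over the daily JSONL text that reports unparseable lines and missing required paths. The dotted-path notation and the function names are conventions of this example only; manifest checks are out of scope for the sketch.

```python
import json

# Required paths taken from the smoke test list and the model/prompt required keys.
REQUIRED_PATHS = [
    "schema_version", "summary_id", "source_ids", "selection.source_text_hash",
    "model.provider", "model.model_name", "model.model_version",
    "model.temperature", "model.max_tokens",
    "prompt.prompt_hash", "prompt.template_id", "prompt.prompt_version",
    "producer.run_id", "producer.summarizer_version",
]


def missing_paths(obj) -> list:
    """Return the required dotted paths that are absent from obj."""
    missing = []
    for path in REQUIRED_PATHS:
        node = obj
        for part in path.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(path)
                break
            node = node[part]
    return missing


def smoke_check_jsonl(text: str) -> list:
    """Return a list of problems; an empty file is valid and yields none."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        for path in missing_paths(obj):
            problems.append(f"line {lineno}: missing {path}")
        if isinstance(obj, dict) and obj.get("source_ids") == []:
            problems.append(f"line {lineno}: source_ids is empty")
    return problems
```

Key presence is checked rather than truthiness, so the allowed null and empty-string values under model do not count as missing.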

Failure modes and required behavior

Stop-the-line behavior is mandatory: fail fast, record failure in run records, do not silently degrade.

Common failure modes:

Missing upstream manifest
Schema mismatch in upstream inputs
Malformed JSONL output
Missing provenance fields
Manifest mismatch or unreconciled counts
Selection hash computation failure
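Stop-the-line behavior can be sketched as a wrapper that records the failure and then re-raises, so the run halts instead of continuing with partial output. The run record shape and the names run_stage and run_records are hypothetical; this contract does not define the run record format.

```python
def run_stage(run_id, stage, run_records):
    """Run one stage; on failure, append a run record and re-raise."""
    try:
        stage()
    except Exception as exc:
        run_records.append({
            "run_id": run_id,
            "status": "failed",
            "error_type": type(exc).__name__,
            "error": str(exc),
        })
        raise  # fail fast: never swallow the error
    run_records.append({"run_id": run_id, "status": "ok"})
```

The bare raise preserves the original exception and traceback, which keeps the failure visible to whatever orchestrates the run.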
