Summary Bus contract
Contract for deterministic summaries with strict provenance.
Purpose
The Summary Bus carries summary objects derived from canonical upstream structured sources, including Event Bus events, Sessions Bus sessions, and Chunk Bus document or chunk selections.
This page defines:
- Summary object schemas for events, sessions, documents, and chunk sets
- On-disk endpoints and manifest rules
- Provenance requirements and invariants
- Determinism boundaries and what must be reproducible
- Smoke test expectations and failure modes
This contract prioritizes traceability and replay over subjective summary quality.
Scope
In scope:
- event_summary objects: summaries over a single event or a deterministic group of events
- session_summary objects: summaries over a single session object
- document_summary objects: summaries over one document or a deterministic document slice
- chunk_set_summary objects: summaries over a deterministic set of chunk ids
- manifests for all sanctioned summary streams
- provenance fields and invariants
- determinism rules for document and chunk-set selections
- optional structured synthesis fields for hierarchical summaries
- error and accounting rules for skipped items
Out of scope:
- How to parse raw ChatGPT exports
- How sessions are computed
- How embeddings are computed
- Quality evaluation of model outputs
- Any publishing or markdown rendering
What this is
- A service bus output that can be consumed by packagers (Digest Engine) and other downstream processes.
- A traceable bridge between raw structured objects and human-meaningful synthesis fields.
- The sanctioned storage layer for both compatibility-safe summary text and richer structured synthesis payloads when available.
Consumers do not write summaries. Consumers either read existing summaries by manifest, or create a request to the summarizer service (now or scheduled). This prevents teams from “helpfully” generating summaries inside their own repos and breaking determinism guarantees.
What this is not
- A replacement for the Event Bus, Sessions Bus, or Chunk Bus
- A place to do bagging, publishing, or UI export
- A place to hide missing coverage by silently dropping inputs
Requests are out of bus
Summary Bus is a storage and provenance contract for completed summary_item artifacts. It is not an orchestration interface.
Rule
Consumers and producers must never write summary items directly into Summary Bus.
How summaries are requested
Repos request summaries through the Summary Request seam, by appending a summary_request.v1 JSON object to the Summarizer Service queue.
See: Summary Request Seam for:
- summary_request.v1 schema
- queue storage projection and invariants
- idempotency and retry semantics
- formal definition of now vs scheduled
Endpoints
Event summaries
Daily file pattern:
summaries/events/YYYY-MM-DD.events.summary.jsonl
Manifest pattern:
summaries/manifest/YYYY-MM-DD.events.summary.manifest.json
Session summaries
Daily file pattern:
summaries/sessions/YYYY-MM-DD.sessions.summary.jsonl
Manifest pattern:
summaries/manifest/YYYY-MM-DD.sessions.summary.manifest.json
Document summaries
Daily file pattern:
summaries/documents/YYYY-MM-DD.documents.summary.jsonl
Manifest pattern:
summaries/manifest/YYYY-MM-DD.documents.summary.manifest.json
Chunk-set summaries
Daily file pattern:
summaries/chunk_sets/YYYY-MM-DD.chunk_sets.summary.jsonl
Manifest pattern:
summaries/manifest/YYYY-MM-DD.chunk_sets.summary.manifest.json
Rules:
- JSONL: one object per line, UTF-8
- Daily file must exist even if empty
- Manifest must exist even if daily file is empty
- Consumers must read only these endpoints and their manifests
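The endpoint patterns above can be expressed as a small path helper. This is a minimal sketch: the function name summary_paths and the idea that paths are relative to a bus root directory are assumptions, not part of the contract.

```python
from datetime import date
from pathlib import PurePosixPath

# Stream names taken from the endpoint patterns in this contract.
STREAMS = ("events", "sessions", "documents", "chunk_sets")


def summary_paths(stream: str, day: date) -> tuple[PurePosixPath, PurePosixPath]:
    """Return (daily_file, manifest) paths for one stream and one day.

    Hypothetical helper: paths are relative to an assumed bus root.
    """
    if stream not in STREAMS:
        raise ValueError(f"unknown summary stream: {stream!r}")
    stamp = day.isoformat()  # YYYY-MM-DD
    root = PurePosixPath("summaries")
    daily = root / stream / f"{stamp}.{stream}.summary.jsonl"
    manifest = root / "manifest" / f"{stamp}.{stream}.summary.manifest.json"
    return daily, manifest
```

A consumer would resolve both paths for a day and read the manifest first, since the manifest must exist even when the daily file is empty.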
Canonical summary schemas
The Summary Bus defines four related schemas:
- event_summary.v1
- session_summary.v1
- document_summary.v1
- chunk_set_summary.v1
All share a common structure: identity, provenance, selection, model metadata, outputs, and accounting.
Event summary schema
Required fields
- schema_version: String. Example: event_summary.v1
- summary_id: String. Stable id for this summary object.
- day: String. YYYY-MM-DD partition.
- source_type: String. Must be event.
- source_ids: Array of strings. Must include at least one Event Bus event_id.
- selection: Object describing exactly what text was summarized.
  Required keys inside selection:
  - selection_type: String. Example: single_event or event_slice
  - source_text_hash: String. Hash of the exact text payload that was summarized, after deterministic normalization.
  - normalization: Object describing normalization rules applied before hashing.
    Minimum keys inside normalization:
    - name: string
    - version: string
- model: Object describing the model invocation identity.
  Required keys inside model:
  - provider: string
  - model_name: string
  - model_version: string, or empty string if unavailable
  - temperature: number, or null if unknown
  - max_tokens: integer, or null if unknown
- prompt: Object describing the prompt identity.
  Required keys inside prompt:
  - prompt_hash: string
  - template_id: string or name
  - prompt_version: string
- producer: Object describing the summarizer implementation.
  Required keys inside producer:
  - summarizer_version: string
  - run_id: string
- outputs: Object holding the summary content.
  Required keys inside outputs:
  - summary_text: string
Optional fields
- outputs.tags: array
- outputs.topics: array
- outputs.category: string
- outputs.actions: array
- outputs.confidence: number
- outputs.format_type: string
- outputs.notes: string
Any optional field that is model generated must be treated as model output and must not be interpreted as deterministic truth.
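To make the event summary shape concrete, here is a minimal sketch of one event_summary.v1 object serialized as a JSONL line. Every value is a placeholder, and the helper names example_event_summary and to_jsonl_line are hypothetical; only the field names come from this contract.

```python
import json


def example_event_summary() -> dict:
    """Return a placeholder event_summary.v1 object with every required field."""
    return {
        "schema_version": "event_summary.v1",
        "summary_id": "example-summary-id",      # placeholder value
        "day": "2024-01-05",
        "source_type": "event",
        "source_ids": ["example-event-id"],      # at least one Event Bus event_id
        "selection": {
            "selection_type": "single_event",
            "source_text_hash": "example-hash",  # placeholder, not a real digest
            "normalization": {"name": "example-rules", "version": "1"},
        },
        "model": {
            "provider": "example-provider",
            "model_name": "example-model",
            "model_version": "",   # empty string when unavailable
            "temperature": None,   # null when unknown
            "max_tokens": None,    # null when unknown
        },
        "prompt": {
            "prompt_hash": "example-prompt-hash",
            "template_id": "example-template",
            "prompt_version": "1",
        },
        "producer": {"summarizer_version": "0.0.0", "run_id": "example-run"},
        "outputs": {"summary_text": "Placeholder summary text."},
    }


def to_jsonl_line(obj: dict) -> str:
    """Serialize one object as a single JSONL line (no embedded newlines)."""
    return json.dumps(obj, ensure_ascii=False, sort_keys=True, separators=(",", ":"))
```

Sorting keys is a choice made here to keep the serialized line stable across runs; the contract itself only requires one UTF-8 JSON object per line.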
Session summary schema
Required fields
- schema_version: String. Example: session_summary.v1
- summary_id: String. Stable id for this summary object.
- day: String. YYYY-MM-DD partition.
- source_type: String. Must be session.
- source_ids: Array of strings. Must include exactly one session_id as the primary id.
- event_ids: Array of strings. The session evidence pointer list. Must match the referenced session object or be declared as a derived slice.
- selection: Object describing what text was summarized.
  Required keys inside selection:
  - selection_type: String. Example: session_full or session_slice
  - source_text_hash: String. Hash of the exact text payload summarized.
  - normalization: Object describing deterministic normalization before hashing.
- model: Same required fields as event summaries.
- prompt: Same required fields as event summaries.
- producer: Same required fields as event summaries.
- outputs.summary_text: String.
Optional fields
Same pattern as event summaries.
Document summary schema
Required fields
- schema_version: String. Example: document_summary.v1
- summary_id: String. Stable id for this summary object.
- day: String. YYYY-MM-DD partition.
- source_type: String. Must be document.
- source_ids: Array of strings. Must include at least one document_id.
- selection: Object describing exactly which document content was summarized.
  Required keys inside selection:
  - selection_type: String. Must be one of document_full or document_slice.
  - source_text_hash: String. Hash of the exact selected document text after deterministic normalization.
  - normalization: Object describing normalization rules applied before hashing.
- model: Same required fields as event summaries.
- prompt: Same required fields as event summaries.
- producer: Same required fields as event summaries.
- outputs.summary_text: String.
Recommended fields
- document_id: string, as a convenience alias to the primary source id
- chunk_ids: array, when the document summary was assembled from explicit chunk references
Optional fields
Same pattern as event summaries, plus any structured synthesis fields defined below.
Chunk-set summary schema
Required fields
- schema_version: String. Example: chunk_set_summary.v1
- summary_id: String. Stable id for this summary object.
- day: String. YYYY-MM-DD partition.
- source_type: String. Must be chunk_set.
- source_ids: Array of strings. Must contain the selected chunk_id values.
- selection: Object describing exactly which chunk selection was summarized.
  Required keys inside selection:
  - selection_type: String. Must be one of chunk_set, document_subset, or selection_manifest.
  - source_text_hash: String. Hash of the exact normalized chunk-set text payload that was summarized.
  - normalization: Object describing normalization rules applied before hashing.
- model: Same required fields as event summaries.
- prompt: Same required fields as event summaries.
- producer: Same required fields as event summaries.
- outputs.summary_text: String.
Recommended fields
- document_ids: array, when chunk ids span known documents
Optional fields
Same pattern as event summaries, plus any structured synthesis fields defined below.
Structured synthesis outputs
For document-like sources, the canonical summary artifact is a traceable hierarchical synthesis derived from Chunk Bus inputs.
Rules:
- outputs.summary_text remains required for compatibility with simple consumers.
- Structured hierarchy fields are optional, but when present they are the canonical additional meaning surface for document-oriented summaries.
- Consumers must ignore unknown optional fields.
Optional structured fields include:
- outputs.hierarchy
- outputs.hierarchy_version
- outputs.node_count
- outputs.leaf_count
- outputs.coverage
- outputs.quality
- outputs.evidence_refs
- outputs.uncertainties
- outputs.rendered_short
- outputs.rendered_medium
- outputs.rendered_long
Provenance rules
Provenance is not optional. These requirements exist to prevent silent drift.
Required provenance fields
Every summary object must contain:
- source_ids list
- selection.source_text_hash
- selection.normalization marker
- model metadata fields under model
- prompt identity under prompt, including prompt_hash
- producer identity under producer, including summarizer_version and run_id
Document and chunk-set summaries must additionally allow a reviewer to reconcile the selected source ids against canonical document_id and chunk_id anchors from Chunk Bus.
Source text hash definition
The source_text_hash must be computed over a deterministic serialization of the selected content.
Minimum requirements:
- Normalize line endings
- Remove or normalize known non-semantic whitespace
- Apply a documented normalization rule set with a version marker
Consumers must treat the hash as the canonical anchor for "what was summarized".
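The minimum requirements above can be sketched as follows. This is an illustration under stated assumptions: SHA-256 as the hash function, the rule set name lf_trim_trailing_ws, and the specific whitespace rules are all hypothetical choices; the contract only requires that the rule set is documented and versioned.

```python
import hashlib
import re

# Hypothetical rule set marker; it would be recorded under selection.normalization.
NORMALIZATION = {"name": "lf_trim_trailing_ws", "version": "1"}


def normalize_text(text: str) -> str:
    """Apply the assumed rule set: LF line endings, no trailing spaces or tabs,
    no leading or trailing blank lines, exactly one final newline."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [re.sub(r"[ \t]+$", "", line) for line in text.split("\n")]
    return "\n".join(lines).strip("\n") + "\n"


def source_text_hash(text: str) -> str:
    """Hash the normalized text (SHA-256 over UTF-8 bytes is an assumption)."""
    return hashlib.sha256(normalize_text(text).encode("utf-8")).hexdigest()
```

Under these rules, the same selection exported with CRLF or LF line endings yields the same hash, which is what makes the hash usable as the anchor for "what was summarized".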
Accounting and coverage rules
Summaries must not silently drop items.
If an item is skipped, the system must:
- record skip counts in the manifest
- include a breakdown of skip reasons
- optionally emit a skip report artifact if the run is large
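Skip accounting can be sketched as a tally over (item id, reason code) pairs. The function name skip_accounting and the reason codes used in the usage example are hypothetical; the contract does not define a reason code vocabulary.

```python
from collections import Counter


def skip_accounting(skipped: list[tuple[str, str]]) -> dict:
    """Summarize skipped items for the manifest.

    skipped holds (item_id, reason_code) pairs collected during a run.
    Returns the skipped count and a reason-code breakdown with sorted keys.
    """
    reasons = Counter(reason for _item_id, reason in skipped)
    return {
        "skipped": len(skipped),
        "skip_reasons": dict(sorted(reasons.items())),
    }
```

For example, skip_accounting([("e1", "empty_text"), ("e2", "too_long"), ("e3", "empty_text")]) reports three skipped items, two under empty_text and one under too_long.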
Determinism boundaries
The Summary Bus must be deterministic where it can be deterministic, and explicit where it cannot.
Deterministic components
These must be deterministic and replayable:
- Selection logic for which items are summarized for a day
- Ordering of inputs
- Text normalization rules before hashing
- Prompt template choice and prompt hash derivation
- Summary id derivation
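This contract requires summary id derivation to be deterministic but does not define the derivation itself. One possible scheme, shown purely as an assumption, hashes the identity inputs that are already recorded on every summary object. The function name, the choice of inputs, the id prefix, and the decision to sort source ids are all hypothetical.

```python
import hashlib
import json


def derive_summary_id(
    source_type: str,
    source_ids: list[str],
    source_text_hash: str,
    prompt_hash: str,
) -> str:
    """Derive a stable id from identity inputs (hypothetical scheme).

    Source ids are sorted so the id does not depend on list order; a producer
    that treats order as meaningful would skip the sort.
    """
    payload = json.dumps(
        {
            "source_type": source_type,
            "source_ids": sorted(source_ids),
            "source_text_hash": source_text_hash,
            "prompt_hash": prompt_hash,
        },
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{source_type}_summary:{digest[:32]}"
```

Because no model output enters the payload, re-running the same selection with the same prompt yields the same id even when the generated text differs.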
Non-deterministic components
Model output is not deterministic in general.
Rules:
- Model output fields must be labeled as model generated output.
- The exact prompt hash, model identity, and selection hash must be recorded so a re-run is explainable.
- If you enable a deterministic mode for the model provider, record the setting but do not assume perfect determinism.
Manifest contract
Event summary manifest
Minimum required fields for YYYY-MM-DD.events.summary.manifest.json:
- schema_version: string, example: events_summary_manifest.v1
- bus_schema_version: string, example: event_summary.v1
- day: string
- input: object describing what was consumed
  - eventbus_manifest_day: string
  - eventbus_manifest_sha256: string
  - optional eventbus_range if you consume more than one day
- paths: object
  - summaries_path: string
- counts: object
  - eligible: integer
  - produced: integer
  - skipped: integer
  - failed: integer
- skip_reasons: object mapping reason code to integer count
- integrity: object
  - sha256: string
  - bytes: integer
- producer: object
  - summarizer_version: string
  - run_id: string
  - model_name: string
  - prompt_hash: string
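A minimal sketch of assembling an event summary manifest follows. It assumes that integrity.sha256 and integrity.bytes describe the daily JSONL file, which this page does not state explicitly; the function name build_events_manifest is hypothetical.

```python
import hashlib


def build_events_manifest(
    day: str,
    summaries_path: str,
    jsonl_bytes: bytes,
    counts: dict,
    skip_reasons: dict,
    input_ref: dict,
    producer: dict,
) -> dict:
    """Assemble the minimum manifest fields for one day of event summaries.

    Assumption: the integrity block covers the exact bytes of the daily file.
    """
    return {
        "schema_version": "events_summary_manifest.v1",
        "bus_schema_version": "event_summary.v1",
        "day": day,
        "input": input_ref,
        "paths": {"summaries_path": summaries_path},
        "counts": counts,
        "skip_reasons": skip_reasons,
        "integrity": {
            "sha256": hashlib.sha256(jsonl_bytes).hexdigest(),
            "bytes": len(jsonl_bytes),
        },
        "producer": producer,
    }
```

An empty day still produces a manifest: jsonl_bytes is empty, all counts are zero, and skip_reasons is an empty object.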
Session summary manifest
Same structure, but input references Sessions Bus manifest(s):
- sessions_manifest_day
- sessions_manifest_sha256
Document summary manifest
Same structure, but input references Chunk Bus or selection manifests:
- chunk_manifest_day or selection_manifest_path
- chunk_manifest_sha256 or selection_manifest_sha256
- optional document_ids summary fields in the manifest metadata
Chunk-set summary manifest
Same structure, but input references the chunk selection source:
- chunk_manifest_day or selection_manifest_path
- chunk_manifest_sha256 or selection_manifest_sha256
- optional document_ids or selection_hash metadata
Invariants
Mandatory contract invariants:
- schema_version present on every summary object
- source_ids always present and non-empty
- selection.source_text_hash always present
- Manifest always present even if the day output is empty
- Daily summary file exists even if empty
- Counts must be consistent:
eligible = produced + skipped + failed
- Items must not be silently dropped:
- any drop must be reflected in counts and skip reasons
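The count invariants can be checked mechanically against a manifest. The sketch below assumes the manifest has already been parsed into a dict; the function name count_errors is hypothetical, and the check that skip reason counts sum to skipped is a stricter reading than the contract text states.

```python
def count_errors(manifest: dict) -> list[str]:
    """Return a list of count invariant violations (empty when all hold)."""
    counts = manifest["counts"]
    reasons = manifest.get("skip_reasons", {})
    errors = []

    # Contract invariant: eligible = produced + skipped + failed
    total = counts["produced"] + counts["skipped"] + counts["failed"]
    if counts["eligible"] != total:
        errors.append(f"eligible={counts['eligible']} but produced+skipped+failed={total}")

    if counts["skipped"] > 0 and not reasons:
        errors.append("skipped > 0 but skip_reasons is empty")
    elif sum(reasons.values()) != counts["skipped"]:
        # Assumption: the reason breakdown accounts for every skipped item.
        errors.append("skip_reasons do not sum to skipped")

    return errors
```

A producer following the stop-the-line rule would treat any non-empty result as a failed run rather than publishing the manifest.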
Compatibility rules
- Consumers must ignore unknown optional fields.
- Adding optional fields does not require a schema bump if semantics do not change.
- Any change that affects provenance interpretation, summary id derivation, or required fields requires:
- ADR
- Schema version bump
- Minimal migration note
Migration note for this expansion:
- Existing consumers of event and session summaries remain unchanged.
- Consumers that enumerate Summary Bus endpoints must add summaries/documents/ and summaries/chunk_sets/ if they intend to read document-oriented summaries.
- Consumers must continue to treat outputs.summary_text as the required compatibility field and ignore unknown optional structured fields.
Backward compatibility window:
- Consumers must support at least the latest and the previous summary schema versions unless an ADR states otherwise.
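A tolerant consumer might look like the sketch below: it accepts only the schema versions it knows, reads the required compatibility field, and never inspects fields it does not recognize. The function name read_summary_text is hypothetical, and the supported set lists only the v1 schemas named on this page.

```python
# Schema versions this hypothetical consumer understands. When a newer version
# ships, the previous one stays in the set for the compatibility window.
SUPPORTED_VERSIONS = {
    "event_summary.v1",
    "session_summary.v1",
    "document_summary.v1",
    "chunk_set_summary.v1",
}


def read_summary_text(obj: dict) -> str:
    """Return outputs.summary_text, ignoring any unknown optional fields."""
    version = obj.get("schema_version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
    # Only the required field is read; extra keys are left untouched.
    return obj["outputs"]["summary_text"]
```

Because the reader never enumerates outputs, a producer can add structured synthesis fields without breaking it.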
Smoke test
Purpose: validate schema and provenance, not model quality.
A smoke test must validate:
- Can read the relevant upstream manifest(s)
- Can produce empty-but-valid outputs when upstream is empty
- Output JSONL parses
- Every summary object contains:
  - schema_version
  - summary_id
  - source_ids
  - selection.source_text_hash
  - model and prompt objects with required keys
  - producer.run_id and producer.summarizer_version
- Manifest exists and counts reconcile
- Skip reasons are present when skipped > 0
- If outputs.hierarchy exists, it satisfies the minimum structure defined by the producer's documented hierarchy version
Acceptance criteria:
- No schema violations
- No missing provenance fields
- No count reconciliation errors
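The per-object part of the smoke test can be sketched as a JSONL checker. The required key paths below follow the event summary schema on this page; the function name smoke_check_jsonl and the decision to flag blank lines as errors are assumptions. Manifest and count checks are out of scope for this sketch.

```python
import json

# Required key paths, following the event summary schema in this contract.
REQUIRED_PATHS = [
    ("schema_version",), ("summary_id",), ("source_ids",),
    ("selection", "source_text_hash"),
    ("model", "provider"), ("model", "model_name"), ("model", "model_version"),
    ("model", "temperature"), ("model", "max_tokens"),
    ("prompt", "prompt_hash"), ("prompt", "template_id"), ("prompt", "prompt_version"),
    ("producer", "run_id"), ("producer", "summarizer_version"),
]


def _has_path(obj, path) -> bool:
    """True when every key along path exists in nested dicts."""
    cur = obj
    for key in path:
        if not isinstance(cur, dict) or key not in cur:
            return False
        cur = cur[key]
    return True


def smoke_check_jsonl(text: str) -> list[str]:
    """Return schema and provenance errors for a daily file; empty input is valid."""
    errors = []
    for n, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            errors.append(f"line {n}: blank line")  # assumption: one object per line, no gaps
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {n}: invalid JSON ({exc.msg})")
            continue
        for path in REQUIRED_PATHS:
            if not _has_path(obj, path):
                errors.append(f"line {n}: missing {'.'.join(path)}")
        if _has_path(obj, ("source_ids",)) and not obj["source_ids"]:
            errors.append(f"line {n}: source_ids is empty")
    return errors
```

An empty list means the file meets the schema and provenance acceptance criteria; the check says nothing about the quality of the summary text.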
Failure modes and required behavior
Stop-the-line behavior is mandatory: fail fast, record failure in run records, do not silently degrade.
Common failure modes:
- Missing upstream manifest
- Schema mismatch in upstream inputs
- Malformed JSONL output
- Missing provenance fields
- Manifest mismatch or unreconciled counts
- Selection hash computation failure