Event Bus contract
Authoritative contract for atomic message events.
Purpose
The Event Bus is the canonical, append-only stream of message events. It is the primary seam that downstream consumers rely on.
This page defines the event object schema, on-disk layout, manifest rules, and invariants. It does not define how to parse ChatGPT exports or any other upstream raw source format.
Scope
This contract defines:
- Canonical event object schema and schema version marker
- Stable ID rules and timestamp normalization rules
- Provenance fields required for replay and traceability
- File layout and endpoint patterns
- Manifest format requirements and integrity checks
- Compatibility and evolution rules for schema changes
- Smoke test expectations
- Failure modes and required responses
Non-scope
This contract does not define:
- How raw sources are discovered, downloaded, or authenticated
- Parsing logic for ChatGPT export formats
- Enrichment logic such as tagging, categorization, or summaries
- Any vector store or database indexing strategy
Bus endpoints
Daily event files
Path pattern:
eventbus/daily/YYYY-MM-DD.jsonl
Rules:
- One JSON object per line
- UTF-8
- Append-only for a given day
- Must exist even if empty
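The rules above can be sketched as a small writer. This is a minimal illustration, not the reference implementation: the helper names `daily_path` and `append_events` and the `root` directory argument are assumptions introduced here.

```python
import json
from pathlib import Path


def daily_path(root: Path, day: str) -> Path:
    """Return eventbus/daily/YYYY-MM-DD.jsonl under a (hypothetical) root directory."""
    return root / "eventbus" / "daily" / f"{day}.jsonl"


def append_events(root: Path, day: str, events: list[dict]) -> Path:
    """Append events as one JSON object per line; creates the file even when events is empty."""
    path = daily_path(root, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" never truncates, so lines written by earlier runs are preserved.
    with path.open("a", encoding="utf-8", newline="\n") as fh:
        for event in events:
            # json.dumps without indent never emits a raw newline inside a record.
            fh.write(json.dumps(event, ensure_ascii=False, sort_keys=True) + "\n")
    return path
```

Opening in append mode with an empty event list still creates the file, which covers the "must exist even if empty" rule without a separate code path.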
Daily manifest files
Path pattern:
eventbus/manifest/YYYY-MM-DD.manifest.json
Rules:
- Must exist even if the day file is empty
- Must reference the corresponding daily file path
- Must include schema version, counts, and integrity fields
Canonical event schema
Minimum required fields
- schema_version: String. Manifest schema identifier. Example: event_manifest.v2
- bus_schema_version: String. Event record schema identifier. Example: event.v1
- day: String. YYYY-MM-DD in the bus timezone.
- daily_path: String. Relative or absolute path to the daily JSONL file this manifest describes.
- counts: Object with:
  - events_total: Integer. Total events in the daily file.
  - events_by_kind: Object mapping event_kind to integer count.
  - events_by_domain: Object mapping domain_family to integer count.
- integrity: Object with:
  - sha256: String. SHA-256 of the daily JSONL file content.
  - bytes: Integer. File size in bytes.
  - lines: Integer. Number of JSONL lines.
- kind_registry: Object that fixes the allowed taxonomy for this file (acts as a compatibility fence):
  - allowed_kinds: Array of strings (see list below).
  - allowed_subkinds: Object mapping each kind to an array of allowed subkinds. Must include "other" as an allowed subkind for every kind.
Optional fields
- producer: Object describing the generator:
  - repo: string
  - version: string
  - git_commit: string (short or full hash)
  - host: string (optional)
  - run_id: string (optional)
- generated_at: Timestamp string (ISO 8601).
- warnings: Array of strings. Non-fatal issues detected at generation time.
- notes: Freeform string for operator notes.
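As an illustration of how the required fields fit together, the sketch below derives them from the raw bytes of one daily file. The function name `build_manifest` is hypothetical, and it assumes each event carries `event_kind` and `domain_family` keys and that blank lines are not counted as JSONL lines; none of this is mandated by the contract.

```python
import hashlib
import json
from collections import Counter


def build_manifest(day: str, daily_path: str, content: bytes, kind_registry: dict) -> dict:
    """Build a manifest dict from the raw bytes of one daily JSONL file."""
    # Assumption: blank lines (including the one after a trailing newline) are not JSONL lines.
    lines = [ln for ln in content.decode("utf-8").split("\n") if ln]
    events = [json.loads(ln) for ln in lines]
    return {
        "schema_version": "event_manifest.v2",
        "bus_schema_version": "event.v1",
        "day": day,
        "daily_path": daily_path,
        "counts": {
            "events_total": len(events),
            # Assumption: every event has event_kind and domain_family keys.
            "events_by_kind": dict(Counter(e["event_kind"] for e in events)),
            "events_by_domain": dict(Counter(e["domain_family"] for e in events)),
        },
        "integrity": {
            # Hash and size are taken over the exact bytes on disk, not the parsed events.
            "sha256": hashlib.sha256(content).hexdigest(),
            "bytes": len(content),
            "lines": len(lines),
        },
        "kind_registry": kind_registry,
    }
```

Computing the integrity fields from the bytes rather than from re-serialized events keeps the checksum reproducible by any consumer that only has the file.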
Allowed event_kind and event_subkind (v1)
This is the recommended default taxonomy for event.v1. It is intentionally small and stable.
- chat_turn. Subkinds: user_message, assistant_message, tool_call, tool_result, system_note, other
- outreach_action. Subkinds: planned, sent, reply_received, followup_due, other
- external_observation. Subkinds: norm_published, parliament_update, tweet_posted, job_posted, opportunity_posted, price_tick, other
- external_update. Subkinds: object_changed, deadline_changed, status_changed, other
- external_deadline. Subkinds: deadline_upcoming, deadline_missed, other
- workflow_triggered. Subkinds: schedule_trigger, manual_trigger, dependency_trigger, other
- workflow_completed. Subkinds: success, partial, other
- workflow_failed. Subkinds: exception, validation_failed, rate_limited, auth_failed, other
- health_signal. Subkinds: heartbeat_ok, lag_detected, queue_backlog, other
- decision_record. Subkinds: policy_decision, architecture_decision, priority_decision, other
- work_session_logged. Subkinds: focus_block, meeting, review, other
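The taxonomy above can be encoded as a lookup table and checked at write time. The following is a sketch only: the names `ALLOWED_SUBKINDS`, `check_registry`, and `validate_taxonomy` are hypothetical, and it assumes events expose `event_kind` and `event_subkind` keys.

```python
ALLOWED_SUBKINDS: dict[str, list[str]] = {
    "chat_turn": ["user_message", "assistant_message", "tool_call", "tool_result", "system_note", "other"],
    "outreach_action": ["planned", "sent", "reply_received", "followup_due", "other"],
    "external_observation": ["norm_published", "parliament_update", "tweet_posted", "job_posted",
                             "opportunity_posted", "price_tick", "other"],
    "external_update": ["object_changed", "deadline_changed", "status_changed", "other"],
    "external_deadline": ["deadline_upcoming", "deadline_missed", "other"],
    "workflow_triggered": ["schedule_trigger", "manual_trigger", "dependency_trigger", "other"],
    "workflow_completed": ["success", "partial", "other"],
    "workflow_failed": ["exception", "validation_failed", "rate_limited", "auth_failed", "other"],
    "health_signal": ["heartbeat_ok", "lag_detected", "queue_backlog", "other"],
    "decision_record": ["policy_decision", "architecture_decision", "priority_decision", "other"],
    "work_session_logged": ["focus_block", "meeting", "review", "other"],
}


def check_registry(registry: dict[str, list[str]]) -> None:
    """Raise ValueError if any kind lacks the mandatory "other" subkind."""
    missing = sorted(kind for kind, subs in registry.items() if "other" not in subs)
    if missing:
        raise ValueError(f'kinds without "other" subkind: {missing}')


def validate_taxonomy(event: dict, registry: dict[str, list[str]] = ALLOWED_SUBKINDS) -> None:
    """Raise ValueError if the event's kind/subkind pair is outside the registry."""
    kind = event.get("event_kind")
    subkind = event.get("event_subkind")
    if kind not in registry:
        raise ValueError(f"unknown event_kind: {kind!r}")
    if subkind not in registry[kind]:
        raise ValueError(f"subkind {subkind!r} not allowed for kind {kind!r}")
```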
Boundary rule: event_bus vs chunk/session/summary
The event bus is an append-only log of observations, actions, and triggers. Event payloads must be thin and primarily carry pointers:
- If the content is substantial text, store it in chunk_bus and reference it from the event.
- If it is a time-boxed human work unit, store it in session_bus and reference it from the event.
- If it is interpreted meaning, classification, or synthesis, store it in summary_bus and reference it from the event.
This prevents the event log from becoming a document store or a meaning landfill.
Producer permissions (public, non-project-specific)
To prevent bus drift, producers should be constrained by role:
- Signal/ingest producers may emit: chat_turn, external_observation, external_update, external_deadline.
- Workflow/orchestration producers may emit: workflow_triggered, workflow_completed, workflow_failed, health_signal. They should emit detailed execution traces to run_records (not the event bus), using the event bus only for routing and alerting signals.
- Human-work producers may emit: work_session_logged, decision_record, and may reference linked session_bus items.
- Transformers and publishers should generally not invent new events as “meaning outputs”. Their outputs belong in summary_bus, digest_bus, or snapshot_bus, linked back to the upstream events that justified them.
The only escape valve is event_subkind="other", which must be used sparingly and reviewed periodically. If it becomes frequent, promote a new named subkind.
Stable ID rules
Goal: event_id must not change across re-runs when the same upstream content is reprocessed.
Requirements:
- Deterministic: derived from stable upstream identifiers when available
- If upstream identifiers are missing, derive from a stable tuple such as:
  - source_system
  - source_uri
  - conversation_id if available
  - timestamp_ms
  - role
  - content hash
- Collision resistant: must be cryptographic hash based
Prohibited:
- Random UUID generation at ingest time
- IDs that depend on run timestamp or file ordering
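One way to satisfy these rules when upstream identifiers are missing is to hash the fallback tuple. The sketch below is illustrative: the function name `stable_event_id`, the choice of SHA-256, and the JSON-array serialization of the tuple are assumptions, not requirements of this contract.

```python
import hashlib
import json
from typing import Optional


def stable_event_id(
    source_system: str,
    source_uri: str,
    timestamp_ms: int,
    role: str,
    content: str,
    conversation_id: Optional[str] = None,
) -> str:
    """Derive a deterministic event_id from the fallback tuple."""
    content_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    # Serializing as a JSON array keeps field boundaries unambiguous,
    # so ("ab", "c") and ("a", "bc") cannot collide the way plain concatenation would.
    key = json.dumps(
        [source_system, source_uri, conversation_id, timestamp_ms, role, content_hash],
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Nothing in the input depends on the run time, the host, or the position of the record in a file, so re-ingesting the same raw content yields the same ID.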
Timestamp normalization rules
- timestamp_ms must be epoch milliseconds as an integer
- If upstream provides seconds or ISO timestamps, convert to milliseconds
- Allowed range:
  - Must be within a plausible operational window
  - Must not be negative
- If the timestamp cannot be recovered:
  - The event must be rejected
  - The run record must include an error and the source pointer
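The normalization rules can be sketched as a single conversion function. The contract does not fix the "plausible operational window", so the bounds `MIN_MS` and `MAX_MS` below are placeholder assumptions, as are the names `to_timestamp_ms` and `TimestampRejected`, the explicit `unit` argument, and the choice to treat naive ISO timestamps as UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MIN_MS = 946_684_800_000    # 2000-01-01T00:00:00Z, hypothetical lower bound
MAX_MS = 4_102_444_800_000  # 2100-01-01T00:00:00Z, hypothetical upper bound


class TimestampRejected(ValueError):
    """Raised when a timestamp is unrecoverable or outside the allowed window."""


def to_timestamp_ms(value, unit: str = "ms") -> int:
    """Convert an upstream timestamp ("ms", "s", or "iso") to integer epoch milliseconds."""
    try:
        if unit == "iso":
            text = str(value)
            if text.endswith("Z"):
                # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
                text = text[:-1] + "+00:00"
            dt = datetime.fromisoformat(text)
            if dt.tzinfo is None:
                dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
            # Integer division of timedeltas avoids float rounding.
            ms = (dt - EPOCH) // timedelta(milliseconds=1)
        elif unit == "s":
            ms = int(round(float(value) * 1000))
        elif unit == "ms":
            ms = int(value)
        else:
            raise ValueError(f"unknown unit: {unit!r}")
    except (TypeError, ValueError, OverflowError) as exc:
        raise TimestampRejected(f"unrecoverable timestamp {value!r}: {exc}") from exc
    if not MIN_MS <= ms <= MAX_MS:
        raise TimestampRejected(f"TIMESTAMP_OUT_OF_RANGE: {ms}")
    return ms
```

The caller is expected to catch `TimestampRejected`, drop the event, and copy the message and the source pointer into the run record.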
Provenance rules
- source must allow replay
- source_uri must point to a retained raw input or a canonical ingest snapshot
- If the same raw file is re-ingested, the resulting events must have the same event_id values
Invariants
These rules are mandatory. Violations must trigger stop-the-line behavior.
- Daily files are append-only
- Stable IDs must not change across re-runs
- Manifest exists for every day, even for empty day files
- schema_version exists on every event
- Each daily file must be valid JSONL
- event_id must be unique within a day file
Manifest contract
Minimum required fields for YYYY-MM-DD.manifest.json:
- schema_version: String. Example: event_manifest.v1
- bus_schema_version: String. Example: event.v1
- day: String. YYYY-MM-DD
- daily_path: String. Path to the daily JSONL file
- counts: Object with:
  - events_total: integer
  - events_by_role: object mapping role to integer
- integrity: Object with:
  - sha256: string for the daily file content
  - bytes: integer for daily file size
Optional fields:
- producer: object with repo, version, git commit
- generated_at: timestamp
Compatibility rules
Schema evolution must be controlled and predictable.
- Consumers must ignore unknown fields
- Producers may add optional fields without a version bump if:
  - Existing required fields are unchanged
  - Semantics of existing fields are unchanged
- Any change to required fields, meaning, or ID rules requires:
  - ADR
  - Schema version bump
  - Minimal migration note
Backward compatibility window:
- Consumers must support at least the latest and previous schema versions, unless an ADR states otherwise.
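On the consumer side, these rules amount to checking the version marker and reading only the fields the consumer knows about. The sketch below is an assumption-laden illustration: `read_event`, `KNOWN_FIELDS`, and the default `supported` set are hypothetical, and the field list is limited to names that appear elsewhere in this contract.

```python
KNOWN_FIELDS = ("schema_version", "event_id", "timestamp_ms", "event_kind", "event_subkind")


def read_event(raw: dict, supported: frozenset = frozenset({"event.v1"})) -> dict:
    """Return only the fields this consumer understands; reject unsupported versions."""
    version = raw.get("schema_version")
    if version not in supported:
        raise ValueError(f"unsupported schema_version: {version!r}")
    # Unknown keys are dropped silently, so producers can add optional fields
    # without breaking this consumer.
    return {name: raw[name] for name in KNOWN_FIELDS if name in raw}
```

When a new schema version ships, a consumer following the compatibility window would pass a `supported` set containing both the latest and the previous identifier.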
Smoke test
Purpose: prove that a minimal run can produce contract-compliant outputs.
Minimal command:
- A repo-specific smoke command is acceptable, but it must validate:
  - At least one day file path is produced, even if empty
  - A manifest is produced for that day
  - JSONL parses
  - No duplicate event_id values in the file
  - Manifest counts match parsed counts
Expected validations:
- JSONL parse for the daily file
- Schema presence checks for required fields
- Manifest presence and checksum match
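The validations listed above could be combined into a single check over one day's output. This is a sketch under assumptions: `smoke_check` is a hypothetical name, it takes the daily file bytes and the parsed manifest as arguments rather than reading paths, and `MISSING_REQUIRED_FIELD` is an invented code for the schema presence check (the other codes come from the failure modes in this contract).

```python
import hashlib
import json


def smoke_check(daily: bytes, manifest: dict) -> list[str]:
    """Return contract error codes for one day; an empty list means the check passed."""
    events = []
    try:
        for line in daily.decode("utf-8").split("\n"):
            if line:
                obj = json.loads(line)
                if not isinstance(obj, dict):
                    raise ValueError("line is not a JSON object")
                events.append(obj)
    except ValueError:  # also covers UnicodeDecodeError and json.JSONDecodeError
        return ["MALFORMED_JSONL"]

    errors: list[str] = []
    ids = [e["event_id"] for e in events if "event_id" in e]
    if len(ids) != len(set(ids)):
        errors.append("DUPLICATE_EVENT_ID")
    if any("event_id" not in e or "schema_version" not in e for e in events):
        errors.append("MISSING_REQUIRED_FIELD")  # hypothetical code, not defined by the contract
    counts_ok = manifest.get("counts", {}).get("events_total") == len(events)
    sha_ok = manifest.get("integrity", {}).get("sha256") == hashlib.sha256(daily).hexdigest()
    if not (counts_ok and sha_ok):
        errors.append("MANIFEST_MISMATCH")
    return errors
```

A repo-specific smoke command could call this once per produced day and exit non-zero when the returned list is not empty.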
Failure modes and required behavior
Stop-the-line means:
- Fail fast
- Write a run record with error taxonomy
- Do not silently degrade
- Do not emit partial outputs without manifest integrity
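A minimal sketch of stop-the-line handling follows. The run record shape is not defined in this contract, so the dictionary layout, the `StopTheLine` exception, and the `run_stage` wrapper are all assumptions made for illustration.

```python
from typing import Callable, Optional


class StopTheLine(Exception):
    """Carries an error code from the taxonomy plus an optional source pointer."""

    def __init__(self, code: str, detail: str = "", source_pointer: Optional[str] = None):
        super().__init__(f"{code}: {detail}")
        self.code = code
        self.detail = detail
        self.source_pointer = source_pointer


def run_stage(stage: Callable[[], object], run_id: str) -> dict:
    """Run one stage and return a run record; a failure is recorded, never swallowed."""
    record = {"run_id": run_id, "status": "ok", "errors": [], "outputs": None}
    try:
        record["outputs"] = stage()
    except StopTheLine as exc:
        # Fail fast: the first violation ends the stage and no partial output is kept.
        record["status"] = "failed"
        record["errors"].append(
            {"code": exc.code, "detail": exc.detail, "source_pointer": exc.source_pointer}
        )
    return record
```

The caller would persist the record and exit non-zero whenever `status` is `"failed"`, so that a failed run is visible to whatever scheduled it.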
Missing day file
Symptoms:
- eventbus/daily/YYYY-MM-DD.jsonl missing
Required response:
- Fail the run
- Record error: MISSING_DAILY_FILE
Malformed JSONL
Symptoms:
- A line is not valid JSON
- Encoding errors
Required response:
- Fail the run
- Record error: MALFORMED_JSONL
- Include source pointer to offending line number if possible
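The line-number pointer can be recovered for both symptoms, as the sketch below suggests. The names `parse_jsonl` and `MalformedJsonl` are hypothetical, and skipping empty lines is an assumption made here; a stricter reader could reject them instead.

```python
import json


class MalformedJsonl(ValueError):
    """MALFORMED_JSONL with a 1-based pointer to the offending line."""

    def __init__(self, line_number: int, reason: str):
        super().__init__(f"MALFORMED_JSONL at line {line_number}: {reason}")
        self.line_number = line_number
        self.reason = reason


def parse_jsonl(content: bytes) -> list[dict]:
    """Parse a daily file, raising MalformedJsonl with the offending line number."""
    try:
        text = content.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset of the first invalid byte; count newlines before it.
        line_number = content.count(b"\n", 0, exc.start) + 1
        raise MalformedJsonl(line_number, "encoding error") from exc
    events = []
    for line_number, line in enumerate(text.split("\n"), start=1):
        if not line:
            continue  # assumption: empty lines (e.g. after the final newline) are ignored
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            raise MalformedJsonl(line_number, exc.msg) from exc
        if not isinstance(obj, dict):
            raise MalformedJsonl(line_number, "line is not a JSON object")
        events.append(obj)
    return events
```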
Duplicate event IDs
Symptoms:
- Two lines share the same event_id
Required response:
- Fail the run
- Record error: DUPLICATE_EVENT_ID
Timestamp out of range
Symptoms:
- Negative timestamp or implausible date
Required response:
- Reject event
- Fail the run if any rejected events occur
- Record error: TIMESTAMP_OUT_OF_RANGE
Manifest mismatch
Symptoms:
- Parsed count differs from manifest count
- SHA256 differs
Required response:
- Fail the run
- Record error: MANIFEST_MISMATCH
- Recommend rebuild of that day output as remediation