Digest Bus contract
Contract for memory bags and published digest artifacts.
Purpose
The Digest Bus carries compiled memory artifacts, called bags, plus the published outputs that downstream consumers can render or serve.
This page defines:
- Bag representation and bag id stability rules
- On-disk layout for digests and memos
- Authoritative index files and how consumers must read them
- Traceability rules back to Summary Bus inputs and selection rules
- Atomic publishing rules
- Smoke test expectations and failure modes
This contract treats the Digest Bus as a publish boundary with strict integrity guarantees.
Scope
In scope:
- Bag model and required metadata
- Index files and their semantics
- Publish directory layout for L2 and similar levels
- Naming conventions for artifacts and memo files
- Atomic update rules for published outputs
- Traceability invariants and validation gates
Out of scope:
- How summaries are produced
- How sessions are computed
- How selectors are authored or tuned
- Human editorial judgement inside memo text
- UI rendering or site generation
Key rule
Consumers must read index files.
Consumers must not discover digests by walking directory trees and guessing conventions.
If an output is not referenced by an index entry, it does not exist as far as the contract is concerned.
Concepts
Bag
A bag is a deterministic compilation of units derived from upstream summaries plus selection rules.
A bag has:
- a stable id
- a type
- a window or scope
- a manifest of what it contains
- published artifacts (memos, optional data files)
- authoritative index references
Digest
A digest is the published representation of one bag or a set of related bags.
A digest may include:
- memo markdown files
- metadata sidecars
- optionally snapshot-ready payloads
Endpoints
The contract defines a small set of canonical endpoints. You may add internal run folders, but these endpoints are the stable interface.
Published digests
Base directory:
digests/
Level 2 published output layout:
digests/L2/<bag_type>/<bag_id>/
Required subpaths inside a bag directory:
meta/bag.jsonmeta/trace.jsonmemo/memo/index.json
Optional subpaths:
data/for structured exportsassets/for images or attachments if needed
Memo file placement:
digests/L2/<bag_type>/<bag_id>/memo/<memo_slug>.md
Memo sidecar metadata:
digests/L2/<bag_type>/<bag_id>/memo/<memo_slug>.meta.json
Authoritative indexes
Index directory:
index/
Required authoritative index files:
index/l2_by_window.jsonindex/digest_registry.json
Optional indexes:
index/l2_latest.jsonindex/tags_index.jsonindex/pairs_index.json
Integrity anchors:
index/index.sha256or equivalent integrity file- optional per-index sha files if preferred
Run records
Digest builds must write run records:
artifacts/run_records/<run_id>.run_record.json
Bag model
Bag id stability
A bag id must be stable across reruns when inputs and selection rules are identical.
A bag id must be derived from:
- bag type
- window definition
- selector or rule set identity
- the ordered list of upstream summary ids, or a deterministic hash of that list
If any of these change, bag id may change. If bag id changes, it must be explainable via trace data.
Required bag metadata file
meta/bag.json is required and must be schema versioned.
Minimum required fields:
schema_versionstring, example:digest_bag.v1bag_idstringbag_typestring, example:tagbagorpairbagorcohortlevelstring or integer, example:L2windowobject describing time or scope:window_typestringstart_daystring or nullend_daystring or nulllabelstring or null
selectorobject describing selection rules:selector_idstringselector_versionstringselector_hashstring
inputsobject describing upstream dependencies:summary_bus_manifest_refsarray of objects withdayandsha256summary_schema_versionsarray
countsobject:candidateintegerselectedintegerpublished_memosinteger
producerobject:digest_engine_versionstringrun_idstring
created_atstring, ISO timestamp
Required trace file
meta/trace.json is required and exists to support audit and downstream confidence.
Minimum required fields:
schema_versionstring, example:digest_trace.v1bag_idstringupstreamobject:summary_idsarray of stringssource_ids_unionarray of strings or omitted if too large, but then includesource_ids_union_hashsource_ids_union_hashstring
rulesobject:selector_idstringselector_hashstringregistry_hashstring if a registry influences selection
coverageobject:selected_summary_idsintegerdropped_summary_idsintegerdrop_reasonsobject mapping reason code to counts
integrityobject:bag_dir_sha256string or an equivalent deterministic integrity representation
Memo model
Each memo must have:
- markdown content file
- metadata sidecar json
- traceability fields that connect it to bag, rules, and upstream summaries
Required memo metadata sidecar
memo/<memo_slug>.meta.json is required.
Minimum required fields:
schema_versionstring, example:digest_memo_meta.v1bag_idstringmemo_slugstringtitlestringbag_typestringlevelstring or integersummary_idsarray of strings included in this memoselector_idstringselector_hashstringcreated_atISO timestampintegrityobject:md_sha256string- optional
meta_sha256string
Optional fields:
tagsarraytopicsarraycategorystringaudiencearraynotesstring
Index files
Index files are the only allowed interface for consumers.
index/digest_registry.json
Purpose: enumerate all published bags and where to find them.
Minimum fields:
schema_versionstring, example:digest_registry.v1updated_atISO timestampentriesarray of objects with:levelbag_typebag_idwindow_labelor window fieldspathrelative path to bag rootbag_meta_sha256trace_sha256published_memosinteger
index/l2_by_window.json
Purpose: primary query surface for L2.
Minimum fields:
schema_versionstringupdated_atISO timestampwindowsmapping from window label to:bag_refsarray of bag reference objects:bag_typebag_idpathpublished_memoscreated_at
Memo index inside each bag
memo/index.json must exist.
Minimum fields:
schema_versionstring, example:bag_memo_index.v1bag_idstringupdated_atISO timestampmemosarray of:memo_slugtitlepathrelative path to markdownmeta_pathmd_sha256- optional
tags
Publishing rules
Publishing must be atomic.
Rules:
- Never partially update a published bag directory.
- Never partially update index files.
- Use stage directories and promote only after validation passes.
Recommended pattern:
- Write outputs to a staging directory
- Validate:
- schemas
- traceability
- integrity hashes
- index references
- Promote staging into final location via atomic rename
- Update indexes via atomic write:
- write new index to temp file
- fsync
- atomic rename
If atomic rename is not possible across devices, the pipeline must enforce same filesystem staging or record a stop-the-line failure.
Invariants
Mandatory invariants:
- Every published memo must be traceable to:
- bag id
- selector identity
- upstream summary ids
- Index files must be authoritative:
- any published bag must be referenced by
digest_registry.json - any bag referenced by indexes must exist and validate
- any published bag must be referenced by
- Index updates must be atomic:
- consumers must never observe partially updated indexes
- Published output never partially updated:
- either the full bag build is present and valid, or it is absent
- Schema versions always present in:
meta/bag.jsonmeta/trace.jsonmemo/index.json- memo sidecars
Compatibility rules
- Consumers must ignore unknown fields.
- Adding optional fields is allowed without bump if semantics do not change.
- Changes to:
- bag id derivation
- index file semantics
- required metadata fields
- publishing atomicity behavior require:
- ADR
- schema version bump for affected schemas
- minimal migration note
Backward compatibility window should be defined in ADRs. Default expectation: support latest and previous versions for index files.
Smoke test
Purpose: validate index consistency, atomic update behavior, and traceability.
A smoke test must validate:
- A bag build produces:
meta/bag.jsonmeta/trace.jsonmemo/index.json- at least zero memos, but valid structure always
- All referenced paths in
digest_registry.jsonandl2_by_window.jsonexist - Hashes reconcile:
- bag meta sha matches referenced sha
- memo md sha matches computed sha
- Traceability:
- every memo sidecar lists summary ids
- summary ids referenced are unique and well-formed
- selector hash present and non-empty
- Atomicity behavior:
- the build uses stage then promote, evidenced by run record fields or a publish log artifact
Acceptance criteria:
- No missing required files
- No index references to missing paths
- No memo without meta sidecar
- No trace file missing upstream linkage
- No partial publish detectable
Failure modes and required behavior
Stop-the-line behavior is mandatory.
Common failures and required error recording:
- Index references missing bag directory
- Missing
meta/bag.jsonor missing schema version markers - Memo missing sidecar or integrity mismatch
- Bag id drift without selector or inputs changes
- Non-atomic index update detected
- Inconsistent counts between bag metadata and memo index
On any failure:
- do not publish partial outputs
- record the failure in the run record
- include enough details to reproduce the validation step