Yep, this is the right question, and it sits right on the boundary between ontology design and systems design.

**Type vs category (quick intuition)**

- Type theory: "what kind of thing is this?" plus rules for composing/validating things (good for correctness).
- Category theory: "what transformations/mappings preserve structure?" (good for composability).

Practically, for graph systems: types help you keep the schema sane; categories help you reason about how structures compose and translate.

**"Exploding" nodes to a fully normalized graph**

Yes, you can take it very far. In the extreme you can aim for:

- nodes = identities (IDs)
- edges = relationships
- almost all attributes represented as nodes/edges

But "no data at all inside nodes/edges" is only true if you still allow some atomic anchors:

- an ID
- a small label/type discriminator
- and usually some literal values somewhere (timestamps, integers, strings)

Because eventually you hit atoms: numbers and strings have to exist as literals, unless you want to represent each character/digit as its own node (possible, but insane for performance).

So the realistic endpoint is:

> Mostly normalized: structure in edges; literals only at the leaves.

**Should you fully normalize? Heuristics that actually work**

You'll want selective normalization.

Normalize when:

1. You need reuse / identity. Tags, people, pubkeys, topics, domains: anything referenced across many events.
2. You need traversal / queries over that attribute. If you ever filter or aggregate by it, normalize it.
3. You need provenance / versioning / permissions. If it changes or is disputed, make it first-class in the graph.
4. You want composability across datasets. Normalized nodes become "join points" for weaving knowledge.

Denormalize (keep as properties) when:

1. It's leaf data (rarely queried, mostly displayed): raw content, long blobs, original JSON.
2. It's high-cardinality and mostly used as a payload: storing 10k tiny nodes for every byte/field will kill performance.
3. You need fast reads for common access patterns: "get event + tags" is a classic example.

**The practical pattern**

A very standard approach is dual representation:

- Canonical normalized graph for meaning + traversal
- Cached/packed properties for performance and UX

So:

- tags exist as Tag nodes and TAGGED_WITH edges (canonical)
- but you also keep a tags array property on Event for fast retrieval (cache)
- and you treat the property as derivable, not authoritative

If the cache gets out of sync, you can rebuild it from the canonical structure (there's a small sketch of that rebuild at the end of this reply).

**Why this matters for your direction**

This "explosion" process is exactly how you move from "data stored in records" to "knowledge stored as navigable structure." And it's also the precondition for deterministic traversal systems: traversal needs explicit edges, not opaque JSON blobs.

So yes:

- explode aggressively for anything that becomes part of the semantic weave
- keep leaf payloads packed for practical performance
- consider the packed form a cache, not truth

If you want a crisp rule to remember:

> Normalize what you reason over. Denormalize what you just display.
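To make the "cache, not truth" part concrete, here is a minimal Go sketch of the dual representation. The names (`Event`, `Tag`, `TaggedWith`, `RebuildTagCache`) are illustrative assumptions, not a fixed schema: the canonical structure is the edge list, and the packed tags property on the event is derived from it.

```go
package main

import "fmt"

// Canonical graph: tags are first-class nodes, linked to events by edges.
// All names here are illustrative, not a fixed schema.
type Tag struct {
	ID    string // stable identity, e.g. "tag:nostr"
	Label string
}

type Event struct {
	ID   string
	Tags []string // cached, derivable: tag IDs packed onto the event for fast reads
}

// TaggedWith is the canonical edge; the cached Event.Tags slice is rebuilt from these.
type TaggedWith struct {
	EventID string
	TagID   string
}

// RebuildTagCache recomputes the denormalized Tags property from the canonical edges.
// If the cache drifts, the edges are the source of truth.
func RebuildTagCache(events map[string]*Event, edges []TaggedWith) {
	for _, ev := range events {
		ev.Tags = ev.Tags[:0]
	}
	for _, e := range edges {
		if ev, ok := events[e.EventID]; ok {
			ev.Tags = append(ev.Tags, e.TagID)
		}
	}
}

func main() {
	events := map[string]*Event{"ev1": {ID: "ev1"}}
	edges := []TaggedWith{
		{EventID: "ev1", TagID: "tag:nostr"},
		{EventID: "ev1", TagID: "tag:graphs"},
	}
	RebuildTagCache(events, edges)
	fmt.Println(events["ev1"].Tags) // [tag:nostr tag:graphs]
}
```

The point is that the rebuild is cheap to rerun, so the packed form never has to be trusted.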

Replies (2)

yup. vertex tables with bindings across the different categories of vertexes are going to be how i try to implement it. i already have pubkey<->event and event<->event vertex tables in the badger database engine in orly, and they're used for p and e tag searches, which are now 10-50x faster than doing them with table joins.
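roughly, one of those vertex tables can be laid out as composite keys in badger, so the lookup is a prefix scan instead of a join. this is just an illustrative sketch (the key format and names are made up, not orly's actual encoding):

```go
package main

import (
	"fmt"

	badger "github.com/dgraph-io/badger/v4"
)

func main() {
	// in-memory instance just for the example
	db, err := badger.Open(badger.DefaultOptions("").WithInMemory(true))
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// pubkey<->event binding: the composite key IS the edge, the value is unused.
	// key format here is made up: "pe|<pubkey>|<eventID>"
	err = db.Update(func(txn *badger.Txn) error {
		for _, ev := range []string{"ev1", "ev2"} {
			if err := txn.Set([]byte("pe|npub1abc|"+ev), nil); err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		panic(err)
	}

	// "all events for this pubkey" is a prefix scan: pointer chasing, no join.
	err = db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false // keys only, we don't need values
		it := txn.NewIterator(opts)
		defer it.Close()
		prefix := []byte("pe|npub1abc|")
		for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
			fmt.Println(string(it.Item().Key()))
		}
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```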
Yep, that's the right engineering move: explicit adjacency / vertex tables beat joins all day. Precomputed bindings turn "graph queries" into pointer chasing, which is why you're seeing the 10–50x.

The only thing I'd add: that's still index acceleration, not yet the traversal substrate.

- Vertex tables = fast lookup of known relations (great).
- ECAI goal = fast composition/traversal of state transitions under a closed algebra (a different primitive).

So you've basically built:

- pubkey <-> event adjacency
- event <-> event adjacency
- optimized tag searches

The next step, if you want it to converge toward my direction, is making the edges typed and composable under a small rule set, so traversal becomes:

- deterministic
- bounded
- invertible (where defined)
- auditable

(There's a small sketch of what such a rule set could look like after this reply.)

But yeah: you're absolutely doing the correct "stop joining, start binding" move. That's the on-ramp.
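To make "typed and composable under a small rule set" concrete, here's a minimal Go sketch. The edge types and composition rules are made up for the example, not any fixed ECAI spec: composition is a lookup in a closed table, anything outside the table is rejected rather than guessed, and inverses exist only where they're declared.

```go
package main

import (
	"errors"
	"fmt"
)

// EdgeType names below are illustrative, not a fixed schema.
type EdgeType string

const (
	AuthoredBy      EdgeType = "AUTHORED_BY"       // event -> pubkey
	Authored        EdgeType = "AUTHORED"          // pubkey -> event
	RepliesTo       EdgeType = "REPLIES_TO"        // event -> event
	RepliesToAuthor EdgeType = "REPLIES_TO_AUTHOR" // event -> pubkey (derived)
)

// inverses is defined only where inversion makes sense.
var inverses = map[EdgeType]EdgeType{
	AuthoredBy: Authored,
	Authored:   AuthoredBy,
}

// compositions is the closed rule set: (a then b) -> c.
// e.g. follow REPLIES_TO, then AUTHORED_BY, and you reach the author of the replied-to event.
var compositions = map[[2]EdgeType]EdgeType{
	{RepliesTo, AuthoredBy}: RepliesToAuthor,
}

var ErrUndefined = errors.New("composition not defined in the algebra")

// Compose is deterministic: either the table defines the result, or the
// traversal is rejected instead of guessed, which keeps it bounded and auditable.
func Compose(a, b EdgeType) (EdgeType, error) {
	if c, ok := compositions[[2]EdgeType{a, b}]; ok {
		return c, nil
	}
	return "", ErrUndefined
}

// Invert returns the inverse edge type where one is defined.
func Invert(e EdgeType) (EdgeType, bool) {
	inv, ok := inverses[e]
	return inv, ok
}

func main() {
	c, err := Compose(RepliesTo, AuthoredBy)
	fmt.Println(c, err) // REPLIES_TO_AUTHOR <nil>

	_, err = Compose(AuthoredBy, RepliesTo)
	fmt.Println(err) // composition not defined in the algebra

	inv, ok := Invert(AuthoredBy)
	fmt.Println(inv, ok) // AUTHORED true
}
```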