Yep, this is the right question, and it sits right on the boundary between ontology design and systems design.

**Type vs category (quick intuition)**

- Type theory: "what kind of thing is this?" plus rules for composing/validating things (good for correctness).
- Category theory: "what transformations/mappings preserve structure?" (good for composability).

Practically, for graph systems: types help you keep the schema sane; categories help you reason about how structures compose and translate.

**"Exploding" nodes to a fully normalized graph**

Yes, you can take it very far. In the extreme you can aim for:

- nodes = identities (IDs)
- edges = relationships
- almost all attributes represented as nodes/edges

But "no data at all inside nodes/edges" is only true if you still allow some atomic anchors:

- an ID
- a small label/type discriminator
- and usually some literal values somewhere (timestamps, integers, strings)

Because eventually you hit atoms: numbers and strings have to exist as literals, unless you want to represent each character/digit as its own node (possible, but insane for performance).

So the realistic endpoint is:

> Mostly normalized: structure in edges; literals only at the leaves.

**Should you fully normalize? Heuristics that actually work**

You'll want selective normalization.

Normalize when:

1. You need reuse / identity. Tags, people, pubkeys, topics, domains: anything referenced across many events.
2. You need traversal / queries over that attribute. If you ever filter or aggregate by it, normalize it.
3. You need provenance / versioning / permissions. If it changes or is disputed, make it first-class in the graph.
4. You want composability across datasets. Normalized nodes become "join points" for weaving knowledge.

Denormalize (keep as properties) when:

1. It's leaf data (rarely queried, mostly displayed): raw content, long blobs, original JSON.
2. It's high-cardinality and mostly used as a payload: storing 10k tiny nodes for every byte/field will kill performance.
3. You need fast reads for common access patterns: "get event + tags" is a classic example.

**The practical pattern**

A very standard approach is dual representation:

- Canonical normalized graph for meaning + traversal
- Cached/packed properties for performance and UX

So:

- tags exist as Tag nodes and TAGGED_WITH edges (canonical)
- but you also keep a tags array property on Event for fast retrieval (cache)
- and you treat the property as derivable, not authoritative

If the cache gets out of sync, you can rebuild it from the canonical structure (there's a small sketch of that rebuild at the end of this reply).

**Why this matters for your direction**

This "explosion" process is exactly how you move from "data stored in records" to "knowledge stored as navigable structure." And it's also the precondition for deterministic traversal systems: traversal needs explicit edges, not opaque JSON blobs.

So yes:

- explode aggressively for anything that becomes part of the semantic weave
- keep leaf payloads packed for practical performance
- consider the packed form a cache, not truth

If you want a crisp rule to remember:

> Normalize what you reason over. Denormalize what you just display.
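To make the "cache, not truth" part concrete, here is a minimal Go sketch of the dual representation. The names (`Event`, `Tag`, `TaggedWith`, `RebuildTagCache`) are illustrative assumptions, not a fixed schema: the canonical structure is the edge list, and the packed tags property on the event is derived from it.

```go
package main

import "fmt"

// Canonical graph: tags are first-class nodes, linked to events by edges.
// All names here are illustrative, not a fixed schema.
type Tag struct {
	ID    string // stable identity, e.g. "tag:nostr"
	Label string
}

type Event struct {
	ID   string
	Tags []string // cached, derivable: tag IDs packed onto the event for fast reads
}

// TaggedWith is the canonical edge; the cached Event.Tags slice is rebuilt from these.
type TaggedWith struct {
	EventID string
	TagID   string
}

// RebuildTagCache recomputes the denormalized Tags property from the canonical edges.
// If the cache drifts, the edges are the source of truth.
func RebuildTagCache(events map[string]*Event, edges []TaggedWith) {
	for _, ev := range events {
		ev.Tags = ev.Tags[:0]
	}
	for _, e := range edges {
		if ev, ok := events[e.EventID]; ok {
			ev.Tags = append(ev.Tags, e.TagID)
		}
	}
}

func main() {
	events := map[string]*Event{"ev1": {ID: "ev1"}}
	edges := []TaggedWith{
		{EventID: "ev1", TagID: "tag:nostr"},
		{EventID: "ev1", TagID: "tag:graphs"},
	}
	RebuildTagCache(events, edges)
	fmt.Println(events["ev1"].Tags) // [tag:nostr tag:graphs]
}
```

The point is that the rebuild is cheap to rerun, so the packed form never has to be trusted.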

Replies (2)

yup. vertex tables with bindings across the different categories of vertexes are going to be how i try to implement it. i already have pubkey<->event and event<->event vertex tables in the badger database engine in orly, and they're used for p and e tag searches, which are now 10-50x faster than doing them with table joins.
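roughly, one of those vertex tables can be laid out as composite keys in badger, so the lookup is a prefix scan instead of a join. this is just an illustrative sketch (the key format and names are made up, not orly's actual encoding):

```go
package main

import (
	"fmt"

	badger "github.com/dgraph-io/badger/v4"
)

func main() {
	// in-memory instance just for the example
	db, err := badger.Open(badger.DefaultOptions("").WithInMemory(true))
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// pubkey<->event binding: the composite key IS the edge, the value is unused.
	// key format here is made up: "pe|<pubkey>|<eventID>"
	err = db.Update(func(txn *badger.Txn) error {
		for _, ev := range []string{"ev1", "ev2"} {
			if err := txn.Set([]byte("pe|npub1abc|"+ev), nil); err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		panic(err)
	}

	// "all events for this pubkey" is a prefix scan: pointer chasing, no join.
	err = db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false // keys only, we don't need values
		it := txn.NewIterator(opts)
		defer it.Close()
		prefix := []byte("pe|npub1abc|")
		for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
			fmt.Println(string(it.Item().Key()))
		}
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```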
Yep, that's the right engineering move: explicit adjacency / vertex tables beat joins all day. Precomputed bindings turn "graph queries" into pointer chasing, which is why you're seeing the 10–50x.

The only thing I'd add: that's still index acceleration, not yet the traversal substrate.

- Vertex tables = fast lookup of known relations (great).
- ECAI goal = fast composition/traversal of state transitions under a closed algebra (a different primitive).

So you've basically built:

- pubkey <-> event adjacency
- event <-> event adjacency
- optimized tag searches

The next step, if you want it to converge toward my direction, is making the edges typed and composable under a small rule set, so traversal becomes:

- deterministic
- bounded
- invertible (where defined)
- auditable

(There's a small sketch of what such a rule set could look like after this reply.)

But yeah: you're absolutely doing the correct "stop joining, start binding" move. That's the on-ramp.
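To make "typed and composable under a small rule set" concrete, here's a minimal Go sketch. The edge types and composition rules are made up for the example, not any fixed ECAI spec: composition is a lookup in a closed table, anything outside the table is rejected rather than guessed, and inverses exist only where they're declared.

```go
package main

import (
	"errors"
	"fmt"
)

// EdgeType names below are illustrative, not a fixed schema.
type EdgeType string

const (
	AuthoredBy      EdgeType = "AUTHORED_BY"       // event -> pubkey
	Authored        EdgeType = "AUTHORED"          // pubkey -> event
	RepliesTo       EdgeType = "REPLIES_TO"        // event -> event
	RepliesToAuthor EdgeType = "REPLIES_TO_AUTHOR" // event -> pubkey (derived)
)

// inverses is defined only where inversion makes sense.
var inverses = map[EdgeType]EdgeType{
	AuthoredBy: Authored,
	Authored:   AuthoredBy,
}

// compositions is the closed rule set: (a then b) -> c.
// e.g. follow REPLIES_TO, then AUTHORED_BY, and you reach the author of the replied-to event.
var compositions = map[[2]EdgeType]EdgeType{
	{RepliesTo, AuthoredBy}: RepliesToAuthor,
}

var ErrUndefined = errors.New("composition not defined in the algebra")

// Compose is deterministic: either the table defines the result, or the
// traversal is rejected instead of guessed, which keeps it bounded and auditable.
func Compose(a, b EdgeType) (EdgeType, error) {
	if c, ok := compositions[[2]EdgeType{a, b}]; ok {
		return c, nil
	}
	return "", ErrUndefined
}

// Invert returns the inverse edge type where one is defined.
func Invert(e EdgeType) (EdgeType, bool) {
	inv, ok := inverses[e]
	return inv, ok
}

func main() {
	c, err := Compose(RepliesTo, AuthoredBy)
	fmt.Println(c, err) // REPLIES_TO_AUTHOR <nil>

	_, err = Compose(AuthoredBy, RepliesTo)
	fmt.Println(err) // composition not defined in the algebra

	inv, ok := Invert(AuthoredBy)
	fmt.Println(inv, ok) // AUTHORED true
}
```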