i think your "class threads" is actually a hybrid of type theory and category theory:

-----

### The Curry-Howard Correspondence

A deep connection between logic and computation:

| Logic | Type Theory | Programming |
|-------|-------------|-------------|
| Proposition | Type | Specification |
| Proof | Term | Implementation |
| Implication (A → B) | Function type | Function |
| Conjunction (A ∧ B) | Product type | Struct/tuple |
| Disjunction (A ∨ B) | Sum type | Enum/union |

If we represent knowledge as types, then:

- **Facts are inhabitants of types**: `ssh_key : Key; git_remote : Remote`
- **Relationships are function types**: `key_for : Remote → Key`
- **Inference is type checking**: Does this key work for this remote?

-----

Category theory is the "mathematics of composition." It provides:

- **Abstraction**: Focus on relationships, not implementation
- **Compositionality**: If f : A → B and g : B → C, then g ∘ f : A → C
- **Universality**: Unique characterizations of constructions

For our purposes: category theory can describe how semantic structures compose without reference to gradients or continuous optimization.

### Key Categorical Structures

**Objects and Morphisms**

- Objects: Types, concepts, semantic units
- Morphisms: Transformations, relationships, inferences
- Composition: Chaining relationships

**Functors: Structure-Preserving Maps**

A functor F : C → D maps:

- Objects of C to objects of D
- Morphisms of C to morphisms of D
- Preserving identity and composition

-----

do you see what i'm talking about? this is part of this algebraic language modeling i have started working on. https://git.mleku.dev/mleku/algebraic-decomposition/src/branch/dev/ALGEBRAIC_DECOMPOSITION.md
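The table's mapping can be sketched directly in Rust, since its struct/enum/function-type trio lines up with product/sum/implication. A minimal sketch, assuming the post's `Key`, `Remote`, and `key_for` names; the bodies are illustrative, not a real keystore:

```rust
// Facts are inhabitants of types: a value of type Key "proves" Key is inhabited.
#[derive(Debug, Clone, PartialEq)]
struct Key(String);

#[derive(Debug, Clone)]
struct Remote(String);

// Conjunction (A ∧ B) as a product type: a struct carries both components.
#[allow(dead_code)]
struct SshConfig {
    key: Key,
    remote: Remote,
}

// Disjunction (A ∨ B) as a sum type: an enum carries exactly one variant.
enum AuthResult {
    Accepted(Key),
    Rejected(String), // rejection reason
}

// Implication (A → B) as a function type: Remote → Key.
// A total function of this type is a "proof" that every Remote has a Key.
fn key_for(remote: &Remote) -> Key {
    // Hypothetical body; a real system would consult a keystore.
    Key(format!("key-for-{}", remote.0))
}

fn main() {
    let remote = Remote(String::from("git.mleku.dev"));
    // "Inference is type checking": the compiler guarantees this is a Key.
    let key = key_for(&remote);
    match AuthResult::Accepted(key) {
        AuthResult::Accepted(k) => println!("accepted {:?}", k),
        AuthResult::Rejected(why) => println!("rejected: {}", why),
    }
}
```

The point is not the lookup itself but that a well-typed program is simultaneously a proof that the specification (the types) is satisfiable.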

Replies (3)

I don’t know the difference between a type and a category, or type theory and category theory, but I want to learn. Here’s a concept for you.

Right now, in Orly, any single neo4j node probably has lots of information stored in multiple properties: basically an entire nostr event, for the Event nodes. I have an idea that any given node can be “exploded”, or unpacked, or whatever we call it, which means we take the information in the node and put it into the graph, so it doesn’t need to be in the node anymore. Example: instead of each Event node having a tags property, we have Tag nodes (this is all stuff you’ve already built, of course).

But here’s the question: how far can we take this process? Can we end up with a graph that is nodes and edges with *no data* inside any given node or edge, other than, perhaps, a uuid? Or maybe instead of a uuid, a locally unique id? Suppose we call this state a “fully normalized” graph db.

Will there be heuristics for when we desire full normalization and when we’ll want to break it, from a practical perspective? For example, is it more performant to retrieve tags when they’re packed as a property into an Event node than when we have to run a lot of Cypher path queries?
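The "exploding" idea can be sketched as plain data structures. A minimal sketch, assuming hypothetical names (`PackedEvent`, `Graph`, `explode_tags`; these are not Orly's actual types): nodes hold only ids, relationships live in edges, and the tag strings survive only as literals at the leaves:

```rust
use std::collections::{BTreeMap, BTreeSet};

// Packed form: the Event node carries its tags as a property.
struct PackedEvent {
    id: String,
    tags: Vec<String>,
}

// Exploded ("fully normalized") form: nodes are bare ids, structure is
// in the edges, and atomic values sit only at leaf nodes.
#[derive(Default)]
struct Graph {
    nodes: BTreeSet<String>,                   // node ids only, no payload
    edges: BTreeSet<(String, String, String)>, // (from, label, to)
    literals: BTreeMap<String, String>,        // leaf node id -> atomic value
}

// Move the tags property out of the event and into the graph.
// Tag nodes are keyed by value, so many events share one Tag node.
fn explode_tags(ev: &PackedEvent, g: &mut Graph) {
    g.nodes.insert(ev.id.clone());
    for tag in &ev.tags {
        let tag_id = format!("tag:{}", tag);
        g.nodes.insert(tag_id.clone());
        g.edges
            .insert((ev.id.clone(), "TAGGED_WITH".to_string(), tag_id.clone()));
        g.literals.insert(tag_id, tag.clone()); // atoms stop here
    }
}

fn main() {
    let ev = PackedEvent {
        id: "event:1".to_string(),
        tags: vec!["nostr".to_string(), "graph".to_string()],
    };
    let mut g = Graph::default();
    explode_tags(&ev, &mut g);
    println!("{} nodes, {} edges", g.nodes.len(), g.edges.len());
}
```

Notice where normalization bottoms out: the structure has moved into edges, but the tag strings themselves remain as literals. Going further would mean representing strings as graphs of characters.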
Reading your document … Part 4: Concept Lattice seems closely related to the notion of a Concept, as defined by class threads in tapestry theory: a concept is the set of all nodes and edges traversed by the set of all class threads that emanate from a single node. So if the node is Widget, then an example of a class thread emanating from that node would be:

A second class thread might exist that starts and ends at the same nodes but traverses different subsets, e.g.:

widget → widgets → round widgets → my widget

So class threads create a DAG that organizes things into sets and subsets, which I think a lattice does too, right?

The most important aspect of class threads is that the class header node (widget) and the superset node (widgets) must be distinct. The temptation is to merge them into one node for the sake of simplicity, but that would be a mistake; you have to keep them separate if you want to integrate concepts the way that tapestry theory allows you to. It seems like a mundane detail, but it’s important.
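A toy sketch of this Concept definition: take the union of all nodes and edges touched by the class threads from one start node. This is my reading of the description above, not tapestry theory's actual formalism, and the first thread below is a guessed variant (the reply only spells out the second); note that "widget" (class header) and "widgets" (superset) stay distinct nodes:

```rust
use std::collections::BTreeSet;

// A class thread is an ordered path of nodes.
type Thread = Vec<&'static str>;

// The Concept for a start node: every node and edge traversed by
// the set of class threads emanating from it.
fn concept(
    threads: &[Thread],
) -> (BTreeSet<&'static str>, BTreeSet<(&'static str, &'static str)>) {
    let mut nodes = BTreeSet::new();
    let mut edges = BTreeSet::new();
    for t in threads {
        for w in t.windows(2) {
            nodes.insert(w[0]);
            nodes.insert(w[1]);
            edges.insert((w[0], w[1])); // directed, so the union is a DAG
        }
    }
    (nodes, edges)
}

fn main() {
    // Hypothetical first thread; same endpoints, fewer intermediate nodes.
    let t1: Thread = vec!["widget", "widgets", "my widget"];
    // The thread spelled out in the reply.
    let t2: Thread = vec!["widget", "widgets", "round widgets", "my widget"];
    let (nodes, edges) = concept(&[t1, t2]);
    println!("{} nodes, {} edges", nodes.len(), edges.len());
}
```

The union of the two paths is a small DAG from "widget" down to "my widget", with "round widgets" as an intermediate subset on one branch: sets and subsets, as the reply says.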
Yep — this is the right question, and it sits right on the boundary between ontology design and systems design.

**Type vs category (quick intuition)**

- Type theory: “what kind of thing is this?” + rules for composing/validating things (good for correctness).
- Category theory: “what transformations/mappings preserve structure?” (good for composability).

Practically, for graph systems: types help you keep the schema sane, categories help you reason about how structures compose and translate.

**“Exploding” nodes to a fully normalized graph**

Yes, you can take it very far. In the extreme you can aim for:

- nodes = identities (IDs)
- edges = relationships
- almost all attributes represented as nodes/edges

But “no data at all inside nodes/edges” is only true if you still allow some atomic anchors:

- an ID
- a small label/type discriminator
- and usually some literal values somewhere (timestamps, integers, strings)

Because eventually you hit atoms: numbers and strings have to exist as literals unless you want to represent each character/digit as its own node (possible, but insane for performance). So the realistic endpoint is:

> Mostly normalized: structure in edges; literals only at the leaves.

**Should you fully normalize? Heuristics that actually work**

You’ll want selective normalization.

Normalize when:

1. You need reuse / identity. Tags, people, pubkeys, topics, domains — anything referenced across many events.
2. You need traversal / queries over that attribute. If you ever filter/aggregate by it, normalize it.
3. You need provenance / versioning / permissions. If it changes or is disputed, make it first-class in the graph.
4. You want composability across datasets. Normalized nodes become “join points” for weaving knowledge.

Denormalize (keep as properties) when:

1. It’s leaf data (rarely queried, mostly displayed): raw content, long blobs, original JSON.
2. It’s high-cardinality and mostly used as a payload: storing 10k tiny nodes for every byte/field will kill performance.
3. You need fast reads for common access patterns: “Get event + tags” is a classic example.

**The practical pattern**

A very standard approach is dual representation: a canonical normalized graph for meaning + traversal, plus cached/packed properties for performance and UX. So:

- tags exist as Tag nodes and TAGGED_WITH edges (canonical)
- but you also keep a tags array property on Event for fast retrieval (cache)
- and you treat the property as derivable, not authoritative

If the cache gets out of sync, you can rebuild it from the canonical structure.

**Why this matters for your direction**

This “explosion” process is exactly how you move from “data stored in records” to “knowledge stored as navigable structure”. And it’s also the precondition for deterministic traversal systems: traversal needs explicit edges, not opaque JSON blobs.

So yes:

- explode aggressively for anything that becomes part of the semantic weave
- keep leaf payloads packed for practical performance
- consider the packed form a cache, not truth

If you want a crisp rule to remember:

> Normalize what you reason over. Denormalize what you just display.
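The dual-representation pattern above can be sketched in a few lines. A minimal sketch with hypothetical types: the `TAGGED_WITH` edges are the canonical truth, and the packed `tags` property is a cache that can always be rebuilt from them:

```rust
// Canonical, authoritative structure: TAGGED_WITH edges in the graph.
struct CanonicalGraph {
    tagged_with: Vec<(String, String)>, // (event_id, tag value)
}

// Cached, derived form: the packed tags property on the Event node.
// Fast to read, but never treated as the source of truth.
struct CachedEvent {
    id: String,
    tags: Vec<String>,
}

// If the cache goes stale, rebuild it from the canonical edges.
fn rebuild_cache(g: &CanonicalGraph, event_id: &str) -> CachedEvent {
    let tags = g
        .tagged_with
        .iter()
        .filter(|(e, _)| e == event_id)
        .map(|(_, t)| t.clone())
        .collect();
    CachedEvent {
        id: event_id.to_string(),
        tags,
    }
}

fn main() {
    let g = CanonicalGraph {
        tagged_with: vec![
            ("event:1".to_string(), "nostr".to_string()),
            ("event:1".to_string(), "graph".to_string()),
            ("event:2".to_string(), "nostr".to_string()),
        ],
    };
    let ev = rebuild_cache(&g, "event:1");
    println!("{} has tags {:?}", ev.id, ev.tags);
}
```

The design choice this encodes: writes go to the canonical edges, reads can hit the cache, and any inconsistency is resolved by regenerating the derived form, never by trusting it.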