
Relations & Edges

Edges are the structural connections of the knowledge graph. Every edge is grounded in shared factual evidence — not semantic similarity, not embedding proximity, but actual facts that mention both connected nodes.

The key principle

Evidence-grounded edges

Embedding proximity does NOT create edges. Two nodes can be semantically close (similar embeddings) yet have no edge if no facts mention both. An edge exists because facts explicitly reference both concepts.

This is a fundamental architectural distinction. Semantic similarity is a search tool — it helps find related nodes. But edges represent a stronger claim: "these two concepts appear together in the same factual evidence."
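
The distinction can be illustrated with a minimal sketch. The node names, embedding vectors, and the `facts_by_node` mapping below are all made up for illustration and are not part of the system's API. The two nodes have nearly identical embeddings but share no facts, so no edge would exist between them.

```python
import math

def cosine_similarity(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def shared_facts(facts_by_node, a, b):
    """Facts that mention both nodes (hypothetical node -> fact id mapping)."""
    return facts_by_node[a] & facts_by_node[b]

# Illustrative data only: similar embeddings, disjoint fact sets.
embeddings = {
    "tidal power": [0.9, 0.1, 0.2],
    "wave power": [0.8, 0.2, 0.2],
}
facts_by_node = {
    "tidal power": {"f1", "f2"},
    "wave power": {"f3"},
}

similarity = cosine_similarity(embeddings["tidal power"], embeddings["wave power"])
has_edge = bool(shared_facts(facts_by_node, "tidal power", "wave power"))
```

Here `similarity` is high, so a search would surface one node when looking for the other, while `has_edge` stays false because no fact mentions both.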

How edges are created

Edges arise from seed co-occurrence during fact extraction:

  1. During fact decomposition, each fact's entities and concepts are extracted
  2. When a single fact mentions multiple seeds, those seeds become edge candidates
  3. Edge candidates accumulate in the write_edge_candidates table
  4. When a seed is promoted to a node, the edge resolver processes its candidates:
    • Loads all shared facts between node pairs
    • Calls an LLM to generate a justification citing specific facts
    • Creates the edge with weight = number of shared facts
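
The steps above can be sketched roughly as follows. This is a simplified in-memory illustration, not the actual implementation: the function names and the `fact_seeds` input shape are assumptions, a plain dict stands in for the write_edge_candidates table, and the LLM justification step is omitted.

```python
from collections import defaultdict
from itertools import combinations

def collect_edge_candidates(fact_seeds):
    """Map each unordered seed pair to the set of fact ids mentioning both.

    fact_seeds: dict of fact id -> seed names extracted from that fact
    (a hypothetical input shape chosen for this sketch).
    """
    candidates = defaultdict(set)
    for fact_id, seeds in fact_seeds.items():
        # Sorting gives every pair a stable order, so (a, b) and (b, a) collapse.
        for a, b in combinations(sorted(set(seeds)), 2):
            candidates[(a, b)].add(fact_id)
    return dict(candidates)

def resolve_edges(candidates):
    """Create one edge per pair, with weight = number of shared facts."""
    return {pair: float(len(fact_ids)) for pair, fact_ids in candidates.items()}

# Illustrative facts: f1 and f2 both mention NASA and Apollo 11.
facts = {
    "f1": ["NASA", "Apollo 11"],
    "f2": ["NASA", "Apollo 11", "Moon"],
    "f3": ["Moon"],
}
candidates = collect_edge_candidates(facts)
edges = resolve_edges(candidates)
```

A fact that mentions only one seed, such as `f3`, contributes no candidates, and the pair mentioned by two facts ends up with weight 2.0.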

Weight semantics

An edge's weight equals the number of facts shared between its two nodes, so a higher weight means stronger evidence for the relationship. Because an edge exists only when at least one fact mentions both nodes, the weight is always positive; there are no negative edges.

Edge types

| Type | When used | Example |
| --- | --- | --- |
| related | Connects nodes of the same type (concept-concept, entity-entity) | "solar power" ↔ "wind power" |
| cross_type | Connects nodes of different types (entity-event, concept-entity) | "NASA" (entity) ↔ "Apollo 11" (event) |
| contradicts | Links thesis/antithesis perspective pairs | "AI is beneficial" ↔ "AI is dangerous" |

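For related and cross_type, the relationship type is determined automatically by comparing the node types of the two endpoints: the same type produces related, different types produce cross_type. The contradicts type is reserved for thesis/antithesis perspective pairs.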
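
A minimal sketch of that rule is shown below. The function name and the `is_perspective_pair` flag are assumptions made for illustration; the source does not describe how thesis/antithesis pairs are detected, so the flag is simply passed in.

```python
def relationship_type(source_type, target_type, is_perspective_pair=False):
    """Pick the edge type from the endpoint node types.

    is_perspective_pair is a hypothetical flag marking a thesis/antithesis
    pair; how such pairs are identified is outside this sketch.
    """
    if is_perspective_pair:
        return "contradicts"
    return "related" if source_type == target_type else "cross_type"

same = relationship_type("concept", "concept")
mixed = relationship_type("entity", "event")
opposed = relationship_type("perspective", "perspective", is_perspective_pair=True)
```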

Edge properties

Each edge stores:

| Field | Description |
| --- | --- |
| source_node_id | One endpoint (canonical: always the smaller UUID) |
| target_node_id | Other endpoint (canonical: always the larger UUID) |
| relationship_type | related, cross_type, or contradicts |
| weight | Shared fact count (positive float) |
| justification | LLM-generated reasoning with {fact:uuid} citation tokens |

Canonical ordering

Edges use canonical UUID ordering: the smaller UUID is always stored as source_node_id and the larger as target_node_id. This ensures each node pair has at most one edge per type, regardless of the direction in which the edge was discovered. A database unique constraint enforces this.
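
As a rough illustration of canonical ordering, the sketch below normalizes a node pair before storing it. `canonical_pair` and `EdgeStore` are hypothetical names, the dict is only an in-memory stand-in for a table with a unique constraint, and "smaller" is assumed to mean Python's built-in `uuid.UUID` comparison, which may or may not match the database's ordering.

```python
import uuid

def canonical_pair(a, b):
    """Return (source_node_id, target_node_id) with the smaller UUID first."""
    return (a, b) if a < b else (b, a)

class EdgeStore:
    """In-memory stand-in for a table that is unique on
    (source_node_id, target_node_id, relationship_type)."""

    def __init__(self):
        self._edges = {}

    def upsert(self, a, b, relationship_type, weight):
        source, target = canonical_pair(a, b)
        # The same key is produced whichever endpoint is passed first.
        self._edges[(source, target, relationship_type)] = weight
        return source, target

    def __len__(self):
        return len(self._edges)

n1 = uuid.UUID(int=1)
n2 = uuid.UUID(int=2)
store = EdgeStore()
first = store.upsert(n2, n1, "related", 1.0)
second = store.upsert(n1, n2, "related", 3.0)
```

Both calls resolve to the same key, so the store holds a single related edge for the pair no matter which endpoint was discovered first.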

Justifications

Each edge includes an LLM-generated justification explaining why the two nodes are related. Justifications cite specific facts using {fact:uuid} tokens, which are rendered as clickable links in the UI. This makes every connection in the graph auditable.
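
Citation tokens of this kind can be pulled out of a justification with a regular expression, for example to check that every cited fact really belongs to the edge's shared facts. The sketch below assumes the token carries a standard 36-character hyphenated UUID; the helper names and the sample text are illustrative only.

```python
import re
import uuid

# Assumes tokens look like {fact:<36-character hyphenated UUID>}.
FACT_TOKEN = re.compile(r"\{fact:([0-9a-fA-F-]{36})\}")

def cited_fact_ids(justification):
    """Return the fact UUIDs cited in a justification, in order of appearance."""
    return [uuid.UUID(token) for token in FACT_TOKEN.findall(justification)]

def unknown_citations(justification, shared_fact_ids):
    """Cited fact ids that are not among the edge's shared facts."""
    return [f for f in cited_fact_ids(justification) if f not in shared_fact_ids]

fact_a = uuid.UUID("11111111-1111-1111-1111-111111111111")
fact_b = uuid.UUID("22222222-2222-2222-2222-222222222222")
text = (
    f"NASA operated the Apollo 11 mission {{fact:{fact_a}}} "
    f"and selected its crew {{fact:{fact_b}}}."
)

cited = cited_fact_ids(text)
unknown = unknown_citations(text, {fact_a})
```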

Circular references

Circular references are valid and expected. "Water" links to "hydrogen" and "hydrogen" links to "water" — this reflects real conceptual structure. The graph is flat (all nodes are peers), and cycles are natural.

Navigation agents handle cycles through visited_nodes tracking, ensuring they don't get stuck in loops while traversing the graph.
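
A minimal sketch of cycle-safe traversal with a visited_nodes set is shown below. The breadth-first strategy, the `max_depth` parameter, and the adjacency-dict representation are assumptions for illustration; the source only states that agents track visited nodes.

```python
from collections import deque

def traverse(adjacency, start, max_depth=2):
    """Breadth-first walk that never revisits a node, so cycles terminate."""
    visited_nodes = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        # sorted() only makes the visiting order deterministic for the example.
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in visited_nodes:
                visited_nodes.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, depth + 1))
    return order

# A fully cyclic toy graph: every node links to every other node.
adjacency = {
    "water": {"hydrogen", "oxygen"},
    "hydrogen": {"water", "oxygen"},
    "oxygen": {"water", "hydrogen"},
}
order = traverse(adjacency, "water")
```

Each node appears once in `order` even though every node is reachable from every other one.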

Edges vs. embedding similarity

| | Edges | Embedding Similarity |
| --- | --- | --- |
| Created by | Seed co-occurrence + LLM justification | Computed from content |
| Meaning | "These concepts share factual evidence" | "These concepts have similar semantic content" |
| Used for | Graph traversal, synthesis, visualization | Node search, dedup detection |
| Grows with use | Yes: more queries discover more edges | No: determined by content |

Both are valuable, but they serve different purposes. Edges represent verified factual connections; similarity is a discovery heuristic.