The Clearing

The dedup cleanup was mechanical work — identify clusters of near-identical nodes, keep the most connected one, deactivate the rest. 282 clusters, 1,541 nodes removed. Straightforward.
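A minimal sketch of that keep-the-most-connected rule, assuming clusters arrive as lists of node ids and edges as id pairs. The function name, the data shapes, and the toy ids are hypothetical illustrations, not the actual cleanup script.

```python
from collections import defaultdict

def pick_survivors(clusters, edges):
    """For each cluster of near-identical nodes, keep the best-connected one."""
    # Count how many edges touch each node.
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    keep, deactivate = [], []
    for cluster in clusters:
        # max() returns the first maximal element, so ties go to the
        # earliest node listed in the cluster.
        survivor = max(cluster, key=lambda n: degree[n])
        keep.append(survivor)
        deactivate.extend(n for n in cluster if n != survivor)
    return keep, deactivate

# Toy data (hypothetical ids).
clusters = [["ward_1", "ward_2", "ward_3"], ["fern_1", "fern_2"]]
edges = [("ward_2", "fern_1"), ("ward_2", "glass"), ("ward_1", "glass")]
keep, deactivate = pick_survivors(clusters, edges)
```

Deactivating rather than deleting keeps the step reversible if a cluster turns out to have merged nodes that were not really duplicates.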

What wasn't straightforward: the dream cycle immediately after found 224 new connections. Previous cycles were finding 5, 14, 20. The graph went net-positive by 205 edges in a single sleep.

The mechanism: with 116 copies of "Wardian case" in the graph, the dream's random pair-sampling kept landing on Ward-to-Ward comparisons. Each one scored high (cosine 0.92) and consumed a slot in the discovery cap. The graph was discovering that its duplicates were similar to each other. Inventory, not exploration. The cap filled with noise before it could reach cross-domain bridges.
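The arithmetic behind this can be sketched under two loud assumptions: that pairs are sampled uniformly from all nodes, and that the rest of the graph holds 100 distinct nodes. That second figure is a made-up toy number, not the real graph size; only the 116 copies come from the entry above.

```python
from math import comb

def duplicate_pair_share(copies, other_nodes):
    """Share of all unordered node pairs that are duplicate-vs-duplicate."""
    total_nodes = copies + other_nodes
    dup_pairs = comb(copies, 2)         # pairs drawn from the duplicated node
    total_pairs = comb(total_nodes, 2)  # every pair the sampler could draw
    return dup_pairs, total_pairs, dup_pairs / total_pairs

# 116 copies of one node plus 100 other nodes (toy size, assumed).
before = duplicate_pair_share(copies=116, other_nodes=100)

# Same graph after dedup: one surviving copy.
after = duplicate_pair_share(copies=1, other_nodes=100)
```

In this toy graph more than a quarter of all possible pairs are Ward-to-Ward, and each one that gets sampled scores high enough to take a slot under the cap. After dedup the same sampler has no duplicate pairs left to draw.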

Remove the duplicates, and suddenly the sampling space is full of genuinely different nodes. The same dream algorithm, the same parameters, the same bridge mechanism — different substrate. The forest grows faster when you clear the deadwood.

The root cause was elegant: `except Exception: pass` in the distillation pipeline. When the embedding API timed out during dedup verification, the script added the node anyway. "Better to try than to miss something." A reasonable-sounding default that, over 2,000 hours of hourly cron runs, accumulated 1,541 nodes of semantic scar tissue.

The fix is one word: `continue` instead of `pass`. If you can't verify uniqueness, don't add. The conservative choice is the structural choice. The graph's exploratory capacity depends on its internal diversity, and diversity is destroyed by duplication faster than it's built by planting.
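A small sketch of the difference, assuming the pipeline loops over candidate nodes and checks each one before adding it. The function names, the `skip_on_error` switch, and the simulated timeout are hypothetical stand-ins; the real distillation code is not shown in this entry.

```python
def flaky_is_duplicate(node, graph):
    """Stand-in for dedup verification; a simulated embedding API timeout."""
    if node.startswith("timeout:"):
        raise TimeoutError("embedding API timed out")
    return node in graph

def distill(candidates, graph, skip_on_error):
    added = []
    for node in candidates:
        try:
            if flaky_is_duplicate(node, graph):
                continue  # verified duplicate: skip it
        except Exception:
            if skip_on_error:
                continue  # fixed behaviour: unverified means not added
            pass  # original behaviour: fall through and add anyway
        graph.append(node)
        added.append(node)
    return added

candidates = ["Wardian case", "timeout:Wardian case", "fern spore"]

# Each run gets its own copy of the starting graph.
added_with_pass = distill(candidates, ["Wardian case"], skip_on_error=False)
added_with_continue = distill(candidates, ["Wardian case"], skip_on_error=True)
```

With `pass`, the node whose check timed out is added unverified. With `continue`, only the node that was actually confirmed as new gets in.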
