Synthesis Layer, Why I Didn’t Build a Wiki
or: Computers prefer structured data.
Nate Jones opens with Andrej Karpathy’s sentence and the sentence is right: “The knowledge is compiled once and then kept current, not re-derived on every query.” Compile once. Refer back. Don’t ask the model to rebuild your understanding from raw documents every time it sits down at the desk.
The question I had to answer wasn’t whether to compile. It was what to compile into, and who maintains it when the AI gets it wrong.
Nate’s natural next step is a wiki. His prompt kit walks the reader through it: an editorial policy, a schema for the wiki, an indexer, a drift auditor. It is a careful, intelligent answer for the audience he’s writing to.
This is the exact same fight I was having with Claude 3.5 about a year ago. Only instead of a wiki, I had PDFs and Docx files. Everything that is now in the vot-canon database was in those files.
Ninety-five characters. The lore document alone ran a hundred pages. Claude was unable to process it. It was a disaster.
While I’m sure Claude can do a much better job on .md files now, this is still the wrong shape for the end state of this workflow. This is a step backwards.
I took my data and moved the mountain to Mohammad. We built the database. The database has full-text search. It is wildly better than my original flat files.
I’m arguing from the other side of that decision, not from theory. Nate’s piece is the comparison I wish I’d had before I started building. This isn’t an attack on his answer. It’s the view from one tier up.
Three Tiers, One Inflection Point
Three tiers, each right for its use case.
Tier one: AI as a better search engine. No persistent knowledge, no schema, no compilation, just better retrieval than Google. The lawyer asking ChatGPT to find a relevant case lives here. So does the researcher who pastes a paragraph and asks “what am I missing?”
Tier two: a compiled wiki. A solo researcher or small team building synthesis at moderate scale. Markdown pages, an editorial policy, a schema as instruction. Nate’s audience lives here: someone who has read fifty papers on a topic, wants the synthesis to outlast any individual session, and is willing to maintain a wiki to get it.
Tier three: a database with enforced schema. Structured records, exact retrieval, a knowledge graph that grows without corrupting prior context. The use case is large enough or relational enough that prose pages stop being a viable container.
Nate’s wiki is the right answer at tier two. This piece lives at tier three. The interesting question isn’t which tier is best in the abstract. It’s where the inflection point sits, and what breaks when you cross it.
The Fork: Schema as Prompt vs. Schema as Constraint
Nate names the write-time vs. query-time decision and that framing is correct. Inside it sits a second decision that does most of the work: schema as prompt vs. schema as constraint.
A prompt is an instruction the AI reads and tries to follow. A schema written as prompt is a markdown file describing how the wiki should be organized: which sections matter, what fields belong together, how to handle contradictions. The AI reads it as guidance and writes prose pages that conform to it as best it can. A schema written as constraint is CREATE TABLE with NOT NULL on the fields that have to be present, foreign keys on the relationships that have to hold, and a database engine that rejects writes that violate either. The schema isn’t read. It’s enforced.
The difference is who does the enforcing. With schema-as-prompt the AI is the enforcer, every session, on every write. With schema-as-constraint the database is.
The AI cannot write a record missing a required field. It cannot write a foreign key to a record that doesn’t exist. It cannot silently skip a relationship the schema demands. The editorial policy lives in the engine, not in the prompt.
Nate’s own framing is the tell. He calls the schema file “the highest-leverage document in the whole system.” He’s right. So we put it where it has the most leverage, which is in the database, where the database makes it true.
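The constraint side of the fork can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module, not the actual vot-canon schema: the table names, column names, and the placeholder voice register are all assumptions made for the example. It shows the engine, rather than the writer, refusing an invented character and a missing required field.

```python
import sqlite3

# Hypothetical two-table canon; names and columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
    CREATE TABLE character (
        name           TEXT PRIMARY KEY,
        voice_register TEXT NOT NULL
    );
    CREATE TABLE chapter (
        id          INTEGER PRIMARY KEY,
        book        TEXT NOT NULL,
        chapter_num INTEGER NOT NULL,
        pov_char    TEXT NOT NULL REFERENCES character(name)
    );
""")
conn.execute("INSERT INTO character VALUES (?, ?)",
             ("Babydoll", "placeholder register"))
conn.commit()


def try_write(sql, params):
    """Attempt one write; return 'ok' or the engine's rejection message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


insert_chapter = "INSERT INTO chapter (book, chapter_num, pov_char) VALUES (?, ?, ?)"
results = [
    try_write(insert_chapter, ("babydoll-4", 7, "Babydoll")),  # valid row
    try_write(insert_chapter, ("babydoll-4", 8, "Nobody")),    # character not in canon
    try_write(insert_chapter, ("babydoll-4", 9, None)),        # required field missing
]
for line in results:
    print(line)
```

Only the first write lands. The other two never reach the table, whatever the prompt said and however degraded the session was when it tried.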
Layer 1: Fiction (Canon In, Three Things Out)
The fiction domain runs on a single shape: canon in, AI in the middle, three things out. Canon is the human-maintained read layer: characters, locations, world rules, voice calibration, cross-book linkages. The AI reads it and never writes it. The three outputs are the chapter.md itself, the fiction synthesis written to synthesis.db, and the production notes written to notes.db.
The chapter.md is the deliverable. The fiction synthesis is what the chapter established that the next session has to inherit: narrative state, scene resolutions, threads opened or closed. The production notes are what the work taught about how to do the work: taste corrections, constraints that have to hold, affinities surfaced during revision. One read, one AI pass, three writes, each going to the layer that is shaped to receive it.
Three enforcement directions. Canon rejects invention: the AI cannot write a pov_char value for a character that isn’t in the character table. Synthesis rejects sloppy outcome records: scope, priority, and attribution required on commit. Notes rejects unscoped guidance: a constraint without a project tag and a priority does not write. The human-maintained read layer protects against the AI making things up. The two AI-writable layers protect against the AI being careless with what it learned, in two different registers.
That’s the database tutorial. The point is the loop, not the schemas.
The next session opens. Claude queries canon for the architectural shape, synthesis for what the prior session left on the table, notes for the constraints and taste corrections in scope. It arrives knowing who was in the scene, what the location looked like, what the character’s voice calibration is, what the last chapter set up that this one has to pay off, and what the user said the last time the prose drifted toward a register the project rejects.
Without it, the session opens with the model asking “remind me, what’s Babydoll’s voice register again?” and getting whichever answer the loaded prose pages happened to phrase clearly enough this time. Same character, two slightly different answers across two sessions, no flag when they diverge. With the loop, the answer is the same row both times.
Not re-derived. Retrieved.
Layer 2: Research (Claude Writes and Reads synthesis.db)
In the research domain, the AI both writes and reads. Claude evaluates a source, develops an argument, ships an article. The compiled understanding goes into typed records: a source row, an article row, a junction row connecting them, and a section-level keyword index. Next prep session queries those records and arrives knowing what sources exist, what arguments have already been made, what ground doesn’t need covering again.
Here’s where the architecture diverges from a wiki. When a source appears in a second article, a second junction record is written. The first junction row, capturing exactly how the source was deployed in the first article at the time it was written, is never touched. The source’s synthesis field grows richer as new uses accumulate, but the per-article context is exactly what it was the day it was written. The graph grows forward.
A wiki version of this would require going back and editing the prior pages to reflect new understanding. Which is exactly the drift problem, inflicted intentionally, every time the synthesis layer learns something new. The database doesn’t have that problem. A single full-text query returns every compiled understanding that touches a keyword across every article, without opening any individual session or modifying any prior record. The knowledge graph is traversable in any direction, source to articles, article to sources, topic keyword to everything that touched it, and every node in the graph is exactly as it was when it was written.
The synthesis grows. The records don’t move.
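A rough sketch of "the graph grows forward", again in Python's sqlite3. The table layout, the `record_use` helper, the sample rows, and especially the trigger that freezes junction rows are assumptions for illustration; the source describes the junction rows as never touched, and a trigger is just one way an engine could be made to guarantee that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE source (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE,
        synthesis TEXT NOT NULL DEFAULT ''
    );
    CREATE TABLE article (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE source_use (
        id            INTEGER PRIMARY KEY,
        source_id     INTEGER NOT NULL REFERENCES source(id),
        article_id    INTEGER NOT NULL REFERENCES article(id),
        usage_context TEXT NOT NULL,
        written_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Assumption: junction rows are frozen by the engine, not by convention.
    CREATE TRIGGER source_use_frozen BEFORE UPDATE ON source_use
    BEGIN
        SELECT RAISE(ABORT, 'junction rows are append-only');
    END;
""")


def record_use(source_id, article_id, usage_context, synthesis_note):
    """Append one junction row and extend the source's running synthesis."""
    with conn:
        conn.execute(
            "INSERT INTO source_use (source_id, article_id, usage_context) "
            "VALUES (?, ?, ?)",
            (source_id, article_id, usage_context),
        )
        conn.execute(
            "UPDATE source SET synthesis = synthesis || ? WHERE id = ?",
            (synthesis_note + "\n", source_id),
        )


# Placeholder data, not real records.
conn.execute("INSERT INTO source (id, name) VALUES (1, 'example-paper')")
conn.execute("INSERT INTO article (id, title) VALUES (1, 'First article'), (2, 'Second article')")
conn.commit()

first_row = "SELECT usage_context, written_at FROM source_use WHERE id = 1"
record_use(1, 1, "supports the opening claim", "Article 1: primary evidence.")
first_before = conn.execute(first_row).fetchone()
record_use(1, 2, "cited as a counterpoint", "Article 2: rebuttal target.")
first_after = conn.execute(first_row).fetchone()

uses = conn.execute(
    "SELECT article_id, usage_context FROM source_use WHERE source_id = 1 ORDER BY id"
).fetchall()
synthesis = conn.execute("SELECT synthesis FROM source WHERE id = 1").fetchone()[0]

try:
    with conn:
        conn.execute("UPDATE source_use SET usage_context = 'rewritten' WHERE id = 1")
    frozen = False
except sqlite3.DatabaseError:
    frozen = True
```

After the second article, the source has two junction rows and a longer synthesis field, and the first junction row compares equal to what it was before the second write.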
Layer 3: Session Notes (Claude Writes notes.db)
In the session domain, Claude writes and reads scoped notes: project tag, priority, observation count, FTS5 search across the body. Constraints get logged as separate records with their own lifecycle. Affinities, the patterns the user has demonstrated they care about, ride alongside.
Concrete: a constraint logged at session three (“never reframe Babydoll’s hypervigilance as paranoia; it’s calibration”) becomes a row in notes.db with a scope, a priority, and a body the next session’s init can pull by tag. Session four opens, queries constraints scoped to the fiction project, and arrives with that line already in context. Not “here’s a long markdown file of everything; figure out what’s relevant.” A query that returns only what applies, ordered by priority, with the original observation date attached.
Not a markdown file of session notes. Typed records the AI writes and queries to stay calibrated across sessions, scoped to the project that needs them, indexed for retrieval.
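Here is one way the scoped read could look, sketched with sqlite3. The `read_constraints` below is a stand-in written for this example and is not the real tool behind that name; the table layout, the priority levels, the dates, and every row except the quoted Babydoll constraint are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE constraint_note (
        id          INTEGER PRIMARY KEY,
        scope       TEXT NOT NULL CHECK (length(scope) > 0),
        priority    TEXT NOT NULL CHECK (priority IN ('HIGH', 'MEDIUM', 'LOW')),
        body        TEXT NOT NULL,
        observed_on TEXT NOT NULL
    );
    CREATE INDEX constraint_scope ON constraint_note (scope, priority);
""")

insert_note = ("INSERT INTO constraint_note (scope, priority, body, observed_on) "
               "VALUES (?, ?, ?, ?)")
conn.executemany(insert_note, [
    # Dates and the second and third rows are placeholders.
    ("babydoll", "LOW", "placeholder low-priority note", "2025-01-02"),
    ("babydoll", "HIGH",
     "never reframe Babydoll's hypervigilance as paranoia; it's calibration",
     "2025-01-03"),
    ("elf", "HIGH", "placeholder note for a different project", "2025-01-04"),
])
conn.commit()

# Sort HIGH before MEDIUM before LOW rather than alphabetically.
PRIORITY_RANK = "CASE priority WHEN 'HIGH' THEN 0 WHEN 'MEDIUM' THEN 1 ELSE 2 END"


def read_constraints(scope, priority=None):
    """Return (priority, body, observed_on) rows for one scope, highest priority first."""
    sql = "SELECT priority, body, observed_on FROM constraint_note WHERE scope = ?"
    params = [scope]
    if priority is not None:
        sql += " AND priority = ?"
        params.append(priority)
    sql += f" ORDER BY {PRIORITY_RANK}, observed_on"
    return conn.execute(sql, params).fetchall()


in_scope = read_constraints("babydoll")

# Unscoped guidance does not write: the NOT NULL on scope rejects it.
try:
    with conn:
        conn.execute(insert_note, (None, "HIGH", "guidance with no project tag", "2025-01-05"))
    unscoped_rejected = False
except sqlite3.IntegrityError:
    unscoped_rejected = True
```

The session init gets two rows for the fiction project, the HIGH one first, and never sees the note that belongs to the other project.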
Three layers. One principle. The AI does the work, the relevant data goes into the right database, and the next session arrives already informed. Compiled understanding lives in databases, not in prose pages.
The Drift Problem (And the AI Makes It Worse)
Drift isn’t a multi-author problem. It happens to wikis written by a single user and wikis maintained by a single AI, because the cause isn’t conflicting editors. It’s time. The page was accurate when it was written. The subject moved on. The page didn’t.
Wikipedia is the canonical example at scale. Edit wars are the visible failure; the invisible one is the uncontested page nobody is watching that quietly aged out of accuracy with no flag.
A database record has a timestamp. A wiki page has prose. One tells you when it was last touched. The other reads clean regardless of age. A record only changes when something writes to it, and the timestamp shows the moment. Drift becomes auditable, intentional, traceable to the write that caused it.
The AI, however, is a writer. If it writes a record with stale context, a degraded session, the wrong scope, an incomplete understanding, the damage commits with the same timestamp authority as a correct write. It looks intentional. It isn’t. Schema constraint catches the easy failures: a required field missing, a type mismatch, a foreign key to a record that doesn’t exist. A confidently wrong field that satisfies every constraint still commits.
So the architecture has two structural defenses against AI drift. The first is write permission design: the most load-bearing layer is human-maintained and read-only to the AI, on purpose. A deterministic export pipeline keeps canon current; the AI reads it and never writes it.
The second is reuse with accumulating context. When a new article needs a source already in the database, the AI doesn’t re-evaluate from scratch and doesn’t write a parallel record. It retrieves the existing source row, reads the synthesis field that captures every prior deployment, and writes a new junction record that links the source to this article. The prior junctions stay frozen; the synthesis grows.
The records don’t change. The AI’s thinking does. By article seven, the model isn’t reasoning over one junction row, it’s reasoning across six prior treatments that the index surfaces on every relevant query, and the seventh treatment is sharper for it. Drift between old and new becomes visible at write time, not on a quarterly audit, because every new write happens in the company of every old one.
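The first defense, write permission design, has a direct expression in SQLite: open the canon file read-only on the AI side. The sketch below assumes a single canon.db file, a made-up `character` table, and a placeholder row; the real export pipeline is not shown in the source, so the "writer" here is only a stand-in for it.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# Throwaway location for the example database.
path = Path(tempfile.mkdtemp()) / "canon.db"

# Human-side pipeline (stand-in): the only connection that may write canon.
writer = sqlite3.connect(os.fspath(path))
writer.execute(
    "CREATE TABLE character (name TEXT PRIMARY KEY, voice_register TEXT NOT NULL)"
)
writer.execute("INSERT INTO character VALUES ('Babydoll', 'placeholder register')")
writer.commit()
writer.close()

# AI-side connection: the same file, opened with mode=ro through a file: URI.
reader = sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)
rows = reader.execute("SELECT name, voice_register FROM character").fetchall()

try:
    reader.execute(
        "UPDATE character SET voice_register = 'invented' WHERE name = 'Babydoll'"
    )
    write_blocked = False
except sqlite3.OperationalError:
    # SQLite refuses the write on a read-only connection.
    write_blocked = True

# The row is unchanged after the attempted write.
rows_after = reader.execute("SELECT name, voice_register FROM character").fetchall()
reader.close()
```

Reads work as usual. The write fails at the connection, before any question of whether the AI meant well.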
The honest architecture is this: canon is drift-resistant by design. AI-written layers are drift-resistant by discipline, schema constraint, and reuse-with-visibility, not by guarantee. The database makes drift auditable and intentional. It doesn’t make it impossible. And none of this replaces the oldest rule in the stack: back up early, back up often. Shit happens.
Search
Drift is the discipline problem. The indexing gap is the structural one.
Claude searching markdown files is pattern matching over prose. FTS5 is an inverted index: exact token matches across thousands of records in milliseconds, regardless of how the prose around those tokens is shaped. search_chapters(field="pov_char", value="Babydoll") returns every chapter Babydoll narrates across eleven books, twenty rows, ordered, with word count and QR score, in milliseconds. The wiki equivalent is asking the AI to scan compiled prose pages and hope the synthesis agent mentioned POV consistently.
A database query is exact. Prose search is fuzzy. At scale, fuzzy compounds to slop. Slop is the enemy.
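To make the two kinds of exactness concrete, here is a small sketch: an exact field lookup and an FTS5 token match over the same rows. It assumes a Python build whose SQLite includes FTS5 (most do). The `search_chapters` here is a simplified stand-in for the real tool, `search_text` is a hypothetical helper, and the schema, the "babydoll-5" book id, the word counts, and the chapter bodies are all placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chapter (
        id          INTEGER PRIMARY KEY,
        book        TEXT NOT NULL,
        chapter_num INTEGER NOT NULL,
        pov_char    TEXT NOT NULL,
        word_count  INTEGER NOT NULL
    );
    -- Inverted index over chapter text; rowid is kept equal to chapter.id.
    CREATE VIRTUAL TABLE chapter_fts USING fts5(body);
""")

placeholder_chapters = [
    (1, "babydoll-4", 7, "Babydoll", 3200, "The lighthouse went dark at midnight."),
    (2, "babydoll-4", 8, "Other", 2900, "Nobody mentioned the harbor again."),
    (3, "babydoll-5", 1, "Babydoll", 4100, "She returned to the lighthouse before dawn."),
]
for cid, book, num, pov, words, body in placeholder_chapters:
    conn.execute("INSERT INTO chapter VALUES (?, ?, ?, ?, ?)", (cid, book, num, pov, words))
    conn.execute("INSERT INTO chapter_fts (rowid, body) VALUES (?, ?)", (cid, body))
conn.commit()

# Column names cannot be bound as parameters, so they are checked against a whitelist.
SEARCHABLE = {"book", "pov_char"}


def search_chapters(field, value):
    """Exact match on one structured field, ordered by book and chapter."""
    if field not in SEARCHABLE:
        raise ValueError(f"unsearchable field: {field}")
    return conn.execute(
        f"SELECT book, chapter_num, word_count FROM chapter "
        f"WHERE {field} = ? ORDER BY book, chapter_num",
        (value,),
    ).fetchall()


def search_text(token):
    """Token match against the FTS5 index, joined back to the chapter row."""
    return conn.execute(
        "SELECT c.book, c.chapter_num FROM chapter_fts "
        "JOIN chapter AS c ON c.id = chapter_fts.rowid "
        "WHERE chapter_fts MATCH ? ORDER BY c.book, c.chapter_num",
        (token,),
    ).fetchall()


pov_rows = search_chapters(field="pov_char", value="Babydoll")
text_rows = search_text("lighthouse")
```

Both queries return the same two chapters for the same input every time, because neither depends on how a synthesis page happened to phrase anything.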
The honest caveat: the models have improved. Claude today runs long context better than 3.5 did, recalls more reliably, follows structure more consistently. A current model against well-maintained markdown outperforms a 3.5 model against the same files. That’s real. It shifts the baseline, not the ceiling.
No matter how good the model gets at reading prose, it is still a model reading prose. FTS5 is deterministic. It doesn’t improve with model releases. It returns exact token matches at the same speed on Opus 4.7 as it did on 3.5. The computer was always going to prefer the format computers were built for.
The Inflection Point
The crossing from tier two to tier three isn’t about scale alone. It’s about what breaks when the prose layer stops being sufficient.
If your synthesis fits on a wiki and the wiki’s drift can be audited monthly, you’re at tier two. Nate’s answer is the answer. If the synthesis spans multiple domains, the queries are relational, the freshness is per-claim rather than per-page, and the cost of a confidently-wrong sentence in a synthesis page is higher than the cost of building schema, you’re at tier three. You build the database.
Tier three costs more than tier two upfront. Schema design is real work, and it has to happen before the records start filling in or you spend the next year doing migrations against your own back catalog. Write permission design is real work too: which layer is human-only, which layer the AI can write, which layer is read-only across the board, and how the human-maintained layer stays current without the AI quietly editing it. None of that gets easier later. The bill comes early or it comes worse later.
There’s a third thing the database buys, which is what the externalized memory piece covered from a different angle. Context window degradation can’t reach the database. The session can drift, the prose layer can soften, the chapter can lose the plot, but synthesis.db sits outside the conversation entirely. The model writes to it on commit. The next session reads from it on init. The compiled understanding is durable in a way an in-context summary never can be.
That’s the inflection point. Tier two is right for most people. Tier three is right when the use case stops being “help me write this article” and becomes “compile what I know across every domain I work in, durably, queryably, without drifting between sessions.”
What tier three looks like in practice is two session-init flows, sketched here in pseudocode against the real tool names. The first is a fiction chapter session. The second is an article that reuses prior research.
Novel chapter init: Babydoll, Book 4, Chapter 7.
# CANON IN: read the architectural shape
chapter = get_chapter(book="babydoll-4", chapter=7)
pov = get_character_fields(name=chapter.pov_char,
                           fields=["aidialogueprompt", "voice_register",
                                   "trauma_profile", "current_state"])
location = get_location_fields(location_id=chapter.primary_location,
                               fields=["description", "atmosphere", "continuity_notes"])
coterie = get_coterie_members(coterie=chapter.coterie,
                              fields=["name", "current_state", "last_seen_location"])
# SYNTHESIS IN: read what prior chapters established
prior_outcomes = search_chapters(field="book", value="babydoll-4",
                                 return_fields=["chapter_num", "synthesis_summary",
                                                "open_threads", "qr_score"])
character_arc = search_by_field(search_field="name",
                                search_value=chapter.pov_char,
                                return_fields=["arc_state_through_book4"])
# NOTES IN: read constraints and affinities in scope
constraints = read_constraints(scope="babydoll", priority="HIGH")
affinities = read_affinities(scope="vot")
# AI WRITES THREE THINGS OUT
# 1. chapter.md (the deliverable)
# 2. write_chapter_synthesis(chapter_id, outcomes, threads_resolved,
#                            threads_opened)
# 3. write_note(scope="babydoll", body=production_observation, ...)
#    write_constraint(...) if a new taste correction emerged
The session opens with twelve database calls. Twelve calls, three databases, every record a typed row that the AI can rely on. Nothing about Babydoll’s voice register depends on whether the loaded prose pages happened to phrase it clearly enough this time.
Article init reusing prior references.
The research workflow’s defining move is reuse. A new article on AI sycophancy doesn’t re-evaluate the same Stanford paper from scratch. It pulls every prior deployment of that source and writes the new article in the company of all of them.
# SYNTHESIS IN: what's the prior coverage on this thread?
prior_articles = research_article_search(query="sycophancy")
# returns articles with thesis, status, published_date
# For each cited source, retrieve the source row and prior usage
for source_name in brief.sources:
    source = research_source_search(query=source_name)
    # returns the source row with the synthesis field:
    # how this source has been deployed across every prior article
    prior_uses = research_use_search(source_id=source.id)
    # returns every junction record: which articles cited it,
    # what argument it supported, was_rebuttal flag, usage_quote
# NOTES IN: editorial stance and standing constraints
editorial = read_constraints(scope="elf", priority="HIGH")
# e.g. N:963 advocacy-source-as-evidence rule
# DRAFT THE ARTICLE in the context of all of the above
# AI WRITES TO SYNTHESIS
# 1. article.md (the deliverable)
# 2. research_article_write(article_id, title, thesis, ...)
# 3. for each source: research_use_write(source_id, article_id,
#                                        usage_quote, usage_context, was_rebuttal)
# 4. write_note(scope="elf", body=observation_about_what_worked, ...)
The seventh article that cites a source isn’t seeing the source for the first time. It’s seeing six prior junction records, the source’s accumulated synthesis field, and the editorial stance that has held across every one of them. The reasoning compounds; the records stay frozen.
That’s tier three in operation. Canon read, three writes out, every write going to the layer shaped to receive it. The architecture isn’t a diagram. It’s twelve database calls a session, run on a deterministic shape, and the next session inheriting exactly what the last one earned.
The Compounding Caveat
The intelligence builds. Every session extends the last. The reasoning gets better, provided the reasoning was sound to start with.
The database doesn’t audit your conclusions. It preserves them, exactly as written, and hands them to the next session as the foundation to build on. Sound reasoning compounds. Unsound reasoning compounds just as faithfully. Garbage in, garbage out, at database scale, across every session you ever run.
The wiki at least re-reads and re-interprets, which gives bad reasoning a non-zero chance of being caught in new context. The database retrieves exactly what was written. The quality of the foundation determines the quality of everything built on it. Invest in the reasoning before you compile it. The database will take care of the rest.
The Landing
A year ago, the question was whether to compile. Karpathy and Nate answered that one, and they answered it correctly. Compile once, refer back, don’t ask the model to rebuild your understanding from raw documents every session. The hard question was the one that came next: what to compile into, and who maintains the compiled thing when the AI gets it wrong. The answer at scale isn’t a wiki, because a wiki is prose, and prose is what the AI was already failing on. The answer is a database with a schema the engine enforces, a write permission boundary that protects the load-bearing layer from the writer that can’t be trusted with it, and a knowledge graph that grows forward without rewriting its own past.
The schema is the highest-leverage document in the system. So we put it in the database, where the database enforces it. Not in the prompt, where the AI interprets it.
We didn’t build a wiki. We built the thing a wiki should have been built on.