# Building an LLM Wiki in Obsidian

> [!abstract] TL;DR
> A three-layer system where you curate sources, the LLM compiles and maintains a structured wiki, and your knowledge compounds over time instead of evaporating between sessions. Based on Andrej Karpathy's [llm-wiki.md](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) (April 2026).

---

## What You're Building

Three layers, strictly separated:

- `raw/` — immutable source documents (articles, PDFs, notes). You add to this; nothing edits it.
- `wiki/` — LLM-generated and LLM-maintained markdown pages. You read this; the LLM writes it.
- `CLAUDE.md` — the schema file that turns Claude Code from a generic chatbot into a disciplined wiki maintainer.

The compiler analogy: `raw/` is source code, the LLM is the compiler, `wiki/` is the output. You run the compiler when you add something new.

> [!tip] Why not RAG? (Retrieval-Augmented Generation)
> Standard RAG re-discovers knowledge from scratch on every query. The LLM Wiki compiles it once, up front. Cross-references are already there. Contradictions are already flagged. The synthesis reflects everything you've read — and gets richer with every source you add.

---

## Prerequisites

- [Obsidian](https://obsidian.md/) with your vault ready
- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) — `npm install -g @anthropic-ai/claude-code`
- [Obsidian Web Clipper](https://obsidian.md/clipper) browser extension for capturing articles
- Git initialized in your vault root (`git init`) — version history and recovery for free

---

## Step 1: Folder Structure

Create this inside your vault:

```
llm-wiki/
├── raw/
│   ├── articles/
│   ├── books/
│   ├── podcasts/
│   └── assets/        ← local images
├── wiki/
│   ├── index.md       ← catalog of every wiki page (LLM maintains)
│   └── log.md         ← append-only ingest/query/lint history
└── CLAUDE.md          ← your schema file
```

> [!todo] Obsidian image setup
> Go to **Settings → Files and links** and set the attachment folder path to `llm-wiki/raw/assets/`. Then go to **Settings → Hotkeys**, search "Download attachments for current file," and bind a hotkey (e.g. `Ctrl+Shift+D`). After clipping an article, hit the hotkey to download its images into the vault so the LLM can reference them.

---

## Step 2: Write Your CLAUDE.md

> [!warning] Don't skip this
> CLAUDE.md is the most important file in the system. Without it, Claude Code has no idea how your wiki is structured, what your topics are, or what to do on ingest. A generic chatbot gives you generic results.

Paste this into `CLAUDE.md` and customize the bracketed sections:

```markdown
# Wiki Schema

## Purpose
This wiki covers [your topic area(s)]. It is maintained by an LLM agent.
I read it; the agent writes it.

## Directory layout
- raw/           Source documents. Immutable. Never modify.
- wiki/          LLM-generated pages. One file per concept or entity.
- wiki/index.md  Catalog of all pages. Update after every ingest.
- wiki/log.md    Append-only log. Format: ## [YYYY-MM-DD] operation | title

## Page format
Each wiki page should include:
- YAML frontmatter: tags, source_count, date_updated
- A 2-3 sentence summary at the top
- Sections: Overview, Key Concepts, Connections, Open Questions
- Wikilinks to related pages using [[Page Name]] syntax

## On ingest
1. Read the source document fully
2. Write or update a summary page in wiki/
3. Update any related entity/concept pages (expect 5-15 page touches)
4. Update wiki/index.md with a one-line entry
5. Append to wiki/log.md

## On query
Answer from wiki pages, cite sources. If the answer is useful,
offer to file it as a new wiki page.

## On lint
Check for: orphan pages, broken wikilinks, contradictions between
pages, concepts mentioned but missing their own page.
```

> [!note] CLAUDE.md evolves with you
> Start simple. After a few ingests you'll notice patterns — topics the schema handles poorly, page formats that don't fit your domain, conventions you want to change. Update CLAUDE.md as you go. The LLM reads it fresh every session.
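
The log line format in the schema (`## [YYYY-MM-DD] operation | title`) is regular enough to generate or parse mechanically, for example to count ingests or pull the latest entry without reading the whole file. A minimal Python sketch, assuming entries follow that format exactly; `format_log_entry` and `parse_log` are hypothetical helpers, not part of Claude Code or Obsidian:

```python
import re
from datetime import date
from typing import List, Optional, Tuple

# One log heading as defined in CLAUDE.md: ## [YYYY-MM-DD] operation | title
LOG_LINE = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def format_log_entry(operation: str, title: str, day: Optional[date] = None) -> str:
    """Build one wiki/log.md heading line; defaults to today's date."""
    day = day or date.today()
    return f"## [{day.isoformat()}] {operation} | {title}"


def parse_log(text: str) -> List[Tuple[str, str, str]]:
    """Return (date, operation, title) for each well-formed heading, skipping other lines."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            entries.append(match.groups())
    return entries


if __name__ == "__main__":
    log = "\n".join([
        format_log_entry("ingest", "QLoRA paper", date(2026, 4, 10)),
        "Touched 3 pages.",  # free-form notes under a heading are ignored
        format_log_entry("lint", "weekly pass", date(2026, 4, 17)),
    ])
    entries = parse_log(log)
    ingests = [e for e in entries if e[1] == "ingest"]
    print(len(ingests), "ingest(s); latest entry:", entries[-1])
```

Only the heading lines are parsed, so Claude can still write free-form notes under each entry without breaking anything.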

---

## Step 3: Get Sources In

**Option A — Web Clipper (recommended)**

Install [Obsidian Web Clipper](https://obsidian.md/clipper). When you find an article worth keeping, click the extension and route it to `llm-wiki/raw/articles/`. It converts the page to clean markdown automatically. Hit your image download hotkey after clipping.

**Option B — Claude Code ingest**

Open Claude Code in your `llm-wiki/` folder (where `CLAUDE.md` lives) and run:

```
> I added a new file to raw/articles/. Please ingest it per the schema in CLAUDE.md.
```

Claude reads the file, writes or updates wiki pages, updates `index.md`, and logs the ingest. Keep Obsidian's graph view open on the side — you can watch the connections form in real time.

> [!question] What counts as a good source?
> Articles, book chapters, podcast notes, paper PDFs, your own fleeting notes. One rule: if you wouldn't read it twice, don't add it. Garbage in, garbage wiki.

---

## Step 4: The Three Commands

Open a terminal in your `llm-wiki/` folder, run `claude`, and use these prompts as your daily interface:

**Ingest**

```
I added [filename] to raw/articles/. Please ingest it.
```

**Query**

```
What does my wiki say about [topic]? Synthesize across all relevant pages.
```

**Lint**

```
Please lint the wiki. Find orphan pages, broken links, contradictions, and concepts mentioned but missing their own page.
```

> [!tip] File good answers back in
> When a query produces a useful comparison, analysis, or connection you didn't see before, ask Claude to save it as a new wiki page. Your explorations compound into the knowledge base the same way ingested sources do.

---

## Step 5: Navigation and Search

At small scale (~50–100 pages), `index.md` is sufficient. Keep entries short and scannable:

```markdown
## Entities
- [[Tim Dettmers]] — ML researcher, quantization work, bitsandbytes. 3 sources.
- [[LoRA]] — Low-rank adaptation for fine-tuning. See [[PEFT]], [[QLoRA]].

## Concepts
- [[Quantization]] — Reducing model weight precision. See [[GGUF]], [[AWQ]].
```

Claude reads `index.md` first on every query, then drills into the relevant pages.

> [!warning] The index breaks at scale
> Past ~150 pages, reading all of `index.md` on every query stops scaling: the index grows with every ingest and takes an ever-larger share of Claude's context window before any actual page is loaded. At that point, add [qmd](https://github.com/tobi/qmd) — a local markdown search engine with BM25/vector hybrid search that works as both a CLI and an MCP server. Claude Code can call it as a native tool instead of reading the full index.

---

## Keeping the Wiki Healthy

Run a lint pass every few weeks. Things to catch:

- **Orphan pages** — no inbound wikilinks. Either link them or delete them.
- **Contradictions** — resolve by noting which source is newer or more authoritative.
- **Stale claims** — flag pages where newer sources supersede old ones.
- **Concept gaps** — terms mentioned across pages that lack their own entry.

> [!success] Commit to git after major sessions
> `git add . && git commit -m "ingest: [article title]"` gives you a diff history of how the wiki evolved and a recovery path if an ingest goes sideways.

---

## Resources

| Resource | Link |
|---|---|
| Karpathy's original gist | https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f |
| Claude Code docs | https://docs.anthropic.com/en/docs/claude-code |
| Obsidian Web Clipper | https://obsidian.md/clipper |
| qmd (markdown search, BM25/vector) | https://github.com/tobi/qmd |
| Marp (slide decks from markdown) | https://marp.app |
| Dataview plugin | https://blacksmithgu.github.io/obsidian-dataview/ |

---

## Footnotes: Advanced Enhancements

> [!example]- Obsidian Skills for Claude (click to expand)
> Steph Ango (Obsidian CEO) published a set of agent skills that teach Claude to write native Obsidian syntax — callouts, Canvas, Bases, frontmatter. Adding them to your Claude Code setup alongside CLAUDE.md substantially improves output quality. Search "Obsidian skills Claude Code" for the current set.
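
The orphan-page and broken-link checks from Keeping the Wiki Healthy are purely structural, so they can run as a quick script before you spend tokens on the semantic checks (contradictions, stale claims, concept gaps). A minimal Python sketch, assuming every page is a `.md` file directly under `wiki/` and that a wikilink target must match a file name exactly, minus `.md`. It understands `[[Page|alias]]` and `[[Page#Heading]]`, skips `![[embeds]]`, and ignores links from `index.md` and `log.md` when counting inbound links, since the index lists every page by design. `lint_links` and `load_pages` are hypothetical names:

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Matches [[Target]], [[Target|alias]], [[Target#Heading]]; the lookbehind skips ![[embeds]].
WIKILINK = re.compile(r"(?<!!)\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")

# Bookkeeping pages: never reported as orphans, and their links don't count as inbound.
EXEMPT = {"index", "log"}


def lint_links(pages: Dict[str, str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (orphan pages, broken links as (source page, missing target))."""
    inbound = {name: 0 for name in pages}
    broken = []
    for source, text in pages.items():
        for raw_target in WIKILINK.findall(text):
            target = raw_target.strip()
            if target not in pages:
                broken.append((source, target))
            elif source not in EXEMPT and target != source:
                inbound[target] += 1
    orphans = sorted(
        name for name, count in inbound.items() if count == 0 and name not in EXEMPT
    )
    return orphans, broken


def load_pages(wiki_dir: str = "wiki") -> Dict[str, str]:
    """Read every .md file directly under wiki_dir, keyed by file name without .md."""
    return {p.stem: p.read_text(encoding="utf-8") for p in Path(wiki_dir).glob("*.md")}


if __name__ == "__main__":
    orphans, broken = lint_links(load_pages())
    for name in orphans:
        print(f"orphan: {name}")
    for source, target in broken:
        print(f"broken: [[{target}]] in {source}")
```

Run it from the `llm-wiki/` folder. Anything it reports is a concrete item to hand Claude, and a clean result means the lint prompt can focus on meaning rather than link hygiene.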

> [!example]- Dataview frontmatter queries (click to expand)
> If Claude adds structured YAML frontmatter to every wiki page (tags, dates, source counts), Dataview can generate dynamic tables across your vault automatically. Useful once you have 50+ pages and want cross-cutting views by tag or date.

> [!example]- Contradiction resolution policy (click to expand)
> The pattern doesn't specify which source wins when two contradict. A simple convention: add `confidence: high/medium/low` to page frontmatter. The newer source wins unless the existing page is marked `confidence: high`, in which case Claude flags it for human review rather than auto-overwriting.

> [!example]- MCP bridge for Obsidian (click to expand)
> Claude Code can connect to Obsidian via an MCP server, enabling tighter integration than filesystem access alone — direct vault queries, plugin API access, etc. Experimental as of mid-2026; check the Claude Code docs for current MCP support status.

> [!bug]- Known issue: token scaling (click to expand)
> At large scale (200+ pages), reading the full `index.md` on every query crowds Claude's context window. Adding qmd as an MCP tool is the fix — Claude searches the wiki directly rather than loading the full index. This is an acknowledged open problem in Karpathy's original gist and the most common wall people hit.

> [!example]- Synthetic training data (click to expand)
> Karpathy mentioned a future direction: using the compiled wiki to generate synthetic training data and fine-tune a small model so it "knows" your domain in its weights rather than through context. Expensive and experimental, but the logical end state of the pattern — a model that is your second brain, not just a reader of it.
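
The contradiction resolution policy described in the footnotes reduces to a small decision rule. A minimal Python sketch, assuming `page_date` is the date of the newest source behind the existing page (the schema's `date_updated` records when the page changed, which is not quite the same thing) and `page_confidence` is the `confidence` value from its frontmatter. Treating same-date conflicts as needing review is an assumption, since the policy doesn't cover ties. `resolve` is a hypothetical helper:

```python
from datetime import date

VALID_CONFIDENCE = {"high", "medium", "low"}


def resolve(page_date: date, page_confidence: str, incoming_date: date) -> str:
    """Decide how a new, contradicting source should affect an existing wiki page.

    Returns "keep" (the page stands), "overwrite" (the newer source wins),
    or "review" (flag for a human instead of auto-overwriting).
    """
    if page_confidence not in VALID_CONFIDENCE:
        raise ValueError(f"unknown confidence value: {page_confidence!r}")
    if incoming_date < page_date:
        return "keep"  # the page already reflects a newer source
    if incoming_date == page_date:
        return "review"  # assumption: the policy doesn't cover ties, so ask a human
    if page_confidence == "high":
        return "review"  # newer source vs. a high-confidence page: flag it
    return "overwrite"  # newer source wins


if __name__ == "__main__":
    print(resolve(date(2025, 1, 5), "medium", date(2026, 3, 1)))
    print(resolve(date(2025, 1, 5), "high", date(2026, 3, 1)))
```

In practice you would state the same rule in plain language in CLAUDE.md; writing it out as code just pins down the edge cases (ties, unknown confidence values) that the prose leaves open.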