(2023-09-13) Gwern: Nenex: A Neural Personal Wiki Idea
Gwern: Nenex: A Neural Personal Wiki Idea. If Vannevar Bush or Douglas Engelbart were designing a ‘neural wiki’ from the ground up to be a ‘tool for thought’, taking GPT-4-level LLMs for granted, what would that look like?
It would probably not look like existing tools, which take a hypertext approach of a collection of independent nodes referencing each other and version-controlled as text files.
A more natural approach would be to draw inspiration from DL scaling paradigms in treating ‘everything as a sequence prediction task’: in this LLM-centric wiki paradigm (Nenex), the wiki would not be file/node-centric but edit-centric.
You log all actions to train a local LLM to imitate you, and use the system by alternating between taking actions yourself and approving/disapproving execution of the LLM's predicted actions. As data accumulates, the LLM learns not simply tool usage or generic text prediction, but prediction of your text, with your unique references, preferences, and even personality/values.
The wiki is represented not as a set of static files with implicit history, but, in more of a revision-control or functional-programming style, as a history of edits in a master log; the LLM simply learns to predict the next action in the log.
All user edits, reference additions, spellchecks or new-vocabulary additions, summarizations, updates of now-outdated pages, etc., are just more actions for the LLM to learn to predict on the fly. It can flexibly use embeddings & retrieval, simple external tools (such as downloading research papers), & operate over an API.
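To make the edit-centric idea concrete, here is a minimal sketch of such a master log (the field names and action vocabulary are all hypothetical): the wiki's current state is just a replay of the log, and the log itself doubles as the training corpus for next-action prediction.

    # Minimal sketch of an edit-centric master log (all names hypothetical).
    # Every user action, tool call, or LLM-proposed edit is one log entry;
    # the LLM's training task is simply: predict the next entry.
    from dataclasses import dataclass, asdict
    import json

    @dataclass
    class Action:
        actor: str    # "user" or "llm"
        kind: str     # e.g. "insert", "delete", "spellcheck", "tool-call"
        target: str   # the page or buffer the action applies to
        payload: str  # inserted text, a correction, a fetched URL...

    log = [
        Action("user", "insert", "nenex.md", "A neural wiki would be edit-centric..."),
        Action("user", "spellcheck", "nenex.md", "recieve -> receive"),
        Action("llm", "tool-call", "refs.md", "download https://example.org/paper.pdf"),
    ]

    def replay(log, page):
        """Rebuild a page's current text by folding its actions in order."""
        text = ""
        for a in log:
            if a.target == page and a.kind == "insert":
                text += a.payload
            # ...handle "delete", "spellcheck", etc. analogously
        return text

    # Append-only persistence: the log is both the history and the dataset.
    with open("master.log", "a") as f:
        for a in log:
            f.write(json.dumps(asdict(a)) + "\n")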
As of September 2023, LLMs have revolutionized my code writing, but I would have to admit that they have not revolutionized my natural-language writing.
There is just not that much I can get out of an LLM if all it can do is implement short code requirements I can express in a chat format without too much friction, and I have to do everything else for it.
Writing shares a strange feature with painting. The offsprings of painting stand there as if they are alive, but if anyone asks them anything, they remain most solemnly silent. The same is true of written words. You’d think they were speaking as if they had understanding, but if you question anything that has been said because you want to learn more, it continues to signify just that same thing forever… (Plato, Phaedrus)
Something I find depressing about writing, as compared to programming, is how inert it is.
No matter how many times one corrals notes into an essay, expands an outline, or runs spellcheck, it takes the same amount of work the next time.
No matter how famous and influential a periodical is, it is only as good as its latest issue.
Nothing is more useless than yesterday’s news.
And yet, why is it? Why isn’t the accumulated wisdom of decades of writing more useful?
What I want is to animate my dead corpus so it can learn & think & write.
There have to be more useful writing LLM tools than these, as there are for writing code, where GPT-4 has rapidly become indispensable for quickly writing or reviewing my code.
For example, tag management is a perennial pain point: adding tags to pages or URLs, splitting up overgrown tags, naming the new sub-tags… all extremely tedious chores which cause users to abandon tagging, but also straightforwardly automatable using LLM embeddings+prompts (sketched below).
Naming & titling short pages is a hassle that discourages creating them in the first place. Breaking up a long essay into a useful hierarchy of section headers is not the most fun one could have on one’s laptop in bed.
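As a sketch of the tag-management case, assuming only some off-the-shelf text-embedding model (`embed` below is a placeholder, not any particular library): score a new page against the centroid embedding of each existing tag and propose the top matches.

    # Hypothetical sketch of embedding-based tag suggestion: rank existing
    # tags by cosine similarity between the page's embedding and each tag's
    # centroid embedding (the mean of its pages' embeddings).
    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def suggest_tags(page_text, tag_centroids, embed, k=3):
        v = embed(page_text)  # `embed`: any text -> vector model
        ranked = sorted(tag_centroids,
                        key=lambda t: cosine(v, tag_centroids[t]),
                        reverse=True)
        return ranked[:k]

    # Splitting an overgrown tag is the same trick in reverse: cluster the
    # embeddings of its pages (e.g. k-means) and have the LLM name each cluster.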
But it’s easy to see barriers to a deep integration of LLMs. The first question is: how do you even get your text into an LLM for it to do anything with it?
Most LLM uses can accept only a small amount of text (a small fraction of an essay), and only temporarily: that text will be forgotten as soon as the request finishes. This means that an LLM is ignorant of the rest of your personal wiki (the context/memory problem).
We also have security issues: our LLM assistant is too easily turned into a confused deputy, executing orders that come from attackers via prompt injection rather than from us, because the two tend to look the same.
The biggest problem with LLMs is the processing bottleneck of attention, which limits their ability to flexibly learn from any kind of corpus.
We could wait for better attention & memory mechanisms. It’s a hot area of research, and publicly-accessible models like Claude-2 can, as of September 2023, digest up to around a hundred thousand tokens. Perhaps that is what neural wikis are waiting for? Or some further tweaks to retrieval?
But Claude-2 is still extremely expensive, and its attention seems to routinely fail or result in confabulations.
And a real LLM maximalist wants to program NNs using not code, but data (and pre-existing data at that).
There is an alternative to context windows, if we look back in DL history to the now-forgotten age of RNNs. If we drop the usual unstated assumption that one must use stateless API calls which cannot be personalized beyond the prompt, the idea of finetuning becomes an obvious one.
Large NNs are increasingly sample-efficient, learning more from fewer data points than smaller models do.
And finetuning doesn’t need to be one-and-done, or limited to running offline in a batch process overnight while the user sleeps.
When RNN users wanted the best performance possible on a new corpus of text, what did they do? They used dynamic evaluation (Transformer version, w/retrieval): essentially, just finetuning the model (using ordinary ‘online’ SGD) at each timestep on each input as it arrived.
Dynamic evaluation is particularly good at updating a model to a rather different data distribution, handling repeated or rare/novel (or both) tokens, at a constant-factor cost (which can be dropped as necessary), while in principle being able to improve indefinitely. This makes dynamic evaluation much more suitable than other techniques like vector retrieval.
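Concretely, dynamic evaluation is just the ordinary training loop left running at inference time. A minimal sketch, assuming a HuggingFace-style causal LM interface (the model, learning rate, and chunking are all placeholders):

    # Minimal dynamic-evaluation sketch (assumes a HuggingFace-style causal
    # LM; all hyperparameters are placeholders). At each step: use the model,
    # then immediately take one SGD step on the text that actually arrived,
    # so the weights themselves carry the memory of the session.
    import torch

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)

    def dynamic_eval_step(model, chunk_ids):
        # chunk_ids: token tensor of shape (1, seq_len) for the latest input.
        # 1. Ordinary use: e.g. offer a completion for the current context.
        with torch.no_grad():
            completion = model.generate(chunk_ids, max_new_tokens=20)
        # 2. Online update: one gradient step on what the user produced.
        loss = model(chunk_ids, labels=chunk_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return completion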
If it can predict the user’s next typed word, why not predict the user’s next action, period? Not just Backspace to correct a typo, but all actions: spellcheck, writing commit messages, switching text buffers…
(Our goal here is not ‘superintelligence’, but ‘superknowledge’.)
Because users type slowly on average, if the LLM is fast enough to provide completions in a reasonable timeframe at all, then it can probably be dynamically evaluated as well; if it is not, then one can begin optimizing more heavily.
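A back-of-the-envelope check of that speed claim (every figure below is an assumption, not a measurement):

    # Rough latency budget for dynamic evaluation (all figures assumed).
    words_per_minute = 40            # typical typing speed
    tokens_per_word = 1.3            # ~0.75 English words per token
    user_tps = words_per_minute * tokens_per_word / 60   # ~0.9 tokens/sec
    model_tps = 30                   # a modest local model's throughput
    print(f"user: {user_tps:.1f} tok/s vs model: {model_tps} tok/s")
    # The model generates tokens ~30x faster than the typist produces them,
    # leaving plenty of slack to interleave a gradient step per input chunk.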
We can do this by stepping back from simply recording typed words, to recording all actions. The central object is no longer a static text file sitting on disk, but a sequence of actions. Every text file is built up by a sequence of actions taken by the user: inserting other files, deleting words, spellchecking, typing, running built-in functions to do things like sort lines… Each action can be logged, and the LLM dynamically trained to predict the next action.
This would work particularly well with a Lisp approach, as Lisp systems like Emacs can easily serialize all executed user actions to textual S-expressions.
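For illustration, here is a Python stand-in for what such a Lisp system would log natively, rendering actions as textual S-expressions (the action vocabulary is made up):

    # Hypothetical sketch: render logged actions as textual S-expressions,
    # the format an Emacs-style (Lisp) system would produce natively.
    def to_sexp(action, *args):
        quoted = " ".join('"' + str(a).replace('"', '\\"') + '"' for a in args)
        return f"({action} {quoted})"

    print(to_sexp("insert", "nenex.md", "An edit-centric wiki..."))
    # => (insert "nenex.md" "An edit-centric wiki...")
    print(to_sexp("spellcheck", "nenex.md", "recieve", "receive"))
    # => (spellcheck "nenex.md" "recieve" "receive")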
This gives a new paradigm for a ‘text editor’. Where Emacs is “everything is a buffer or Lisp function”, and vi is “everything is a keystroke”, for a neural assistant, “everything is user imitation”. All user actions and state transitions are stored as a sequence, and predicted by the LLM.
And because it is supervised by the user, its errors are corrected immediately, leading to a DAgger-like bootstrap: if the NN errs frequently in a particular way, it gets feedback on that class of errors and fixes it.
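A sketch of that supervision loop (the API names are hypothetical; the essential point is that every rejection yields a corrected training example at the state the model actually reached, which is the DAgger recipe):

    # Hypothetical sketch of the DAgger-like approve/reject loop.
    def assist_loop(model, log, get_user_verdict, train_step):
        while True:
            proposed = model.predict_next_action(log)   # placeholder API
            verdict = get_user_verdict(proposed)        # approve, or supply a fix
            executed = proposed if verdict.approved else verdict.correction
            log.append(executed)
            # Train on what *should* have happened, in the context the model
            # actually reached: corrections to its own mistakes accumulate,
            # so frequent error classes get the most training signal.
            train_step(model, context=log[:-1], target=executed)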