michael-dean-k/


Topic

beyond-LLMs

10 pieces

An Intelligence Framework

· 703 words

The AI takeoff hysteria is hard to avoid these days, and I'm realizing we don't have clear distinctions between AGI and ASI. I wanted to revisit an old framework of mine to see if anyone finds it helpful (and if it's worth developing). There are some existing classification frameworks, but they're low-resolution. My basic idea is to break AI into three eras: ANI (narrow intelligence), AGI (general intelligence), and ASI (superintelligence). Then you can break each era into three tiers. You only shift from one tier to the next when you make breakthroughs across several criteria (let's say (a) generality, (b) transfer, (c) autonomy, (d) learning, and (e) self-modeling). I think the hype of the last few weeks is all of us collectively realizing we're shifting from AGI-1 to AGI-2. It's exciting/scary, but I think the paranoia mostly comes from not realizing how big the gap is between AGI-2 and ASI-1. (Spoiler: ASI might arrive slower than we think.)
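The tier structure is concrete enough to write down. Below is a rough Python sketch, not a formal definition. It assumes a tier advances only when some minimum number of the five criteria see a breakthrough. The threshold of three (`required`) and the function name `next_tier` are made up for illustration; the framework itself doesn't fix a number.

```python
# Hypothetical encoding of the three-era, three-tier framework described above.
ERAS = ("ANI", "AGI", "ASI")
# The five example criteria from the paragraph ("let's say ..."), not a fixed list.
CRITERIA = ("generality", "transfer", "autonomy", "learning", "self-modeling")
# All nine tiers in order: ANI-1, ANI-2, ..., ASI-3.
TIERS = [f"{era}-{n}" for era in ERAS for n in (1, 2, 3)]


def next_tier(current: str, breakthroughs: set[str], required: int = 3) -> str:
    """Move up one tier only when enough distinct criteria have seen a breakthrough.

    `required` is an illustrative threshold; the framework itself doesn't fix a number.
    """
    unknown = breakthroughs - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    i = TIERS.index(current)
    if len(breakthroughs) >= required and i + 1 < len(TIERS):
        return TIERS[i + 1]
    return current  # not enough breakthroughs, or already at the top tier


print(next_tier("AGI-1", {"generality", "autonomy", "learning"}))  # AGI-2
print(next_tier("AGI-2", {"autonomy"}))  # AGI-2 (one breakthrough isn't enough)
```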

ANI-1 is scripted logic, the lowest form of "artificial intelligence": basically Goombas. ANI-2 might cover Google Maps or AlphaGo, intelligences that excel at a single function, like traffic or Go. Siri is ANI-3; even though it feels broad, it really uses voice to route you to 20 or so predefined tricks. The chasm between Goomba and Siri is similar to the chasm between early AGI and late AGI. ChatGPT and the multimodal models that followed capture AGI-1: a single neural network that can do basically anything (essays, songs, video, code), even if it sucks at it. The newest models (and their agentic harnesses) feel like AGI-2. They're significantly better at coding, can run for hours at a time, and are starting to make contributions to machine learning itself.

AGI-2 could last a couple of years. As agentic AI matures, I'm sure there will be a few "takeoff" scares, but they'll probably feel more like a flood of a trillion midwits than real ASI (though that could still be enough to break the economy/internet). While we went from AGI-1 to AGI-2 through data, scale, and engineering, it seems like we'll need research breakthroughs to get to AGI-3; scaling alone won't do it. Whenever and however we get to "human-complete" intelligence, the apex of AGI is a single agent that masters every human domain: a Nobel Prize winner in every field at once, seamlessly transferring knowledge between them and unlocking a cascade of civilization-altering inventions.

As crazy as AGI-3 could be, it still isn't superintelligence. That has its own era, and the chasm between early ASI and late ASI will be as big as the gap between chatbots that can't count the R's in "strawberry" and agents that cure cancer. We can only really speculate about ASI (because it would be truly alien), but we can imagine it as step changes in recursion, scope, and complexity. Imagine ASI-1 as an agent that, while it works, can infer its own limits and modify its own learning paradigms in ways we can't understand. Imagine ASI-3 as something that can monitor reality and reconfigure its hardware in real time (some hydra of graphics cards, quantum computers, and neuromorphic wetware) to run simulations at unfathomable scales in unimaginable fields, on a hardware stack so big we have to put it in space and power it with fusion. This goes far beyond my ability to not bullshit, but I think something this insane is, thankfully, still far away, which points to the real question nested in my framework:

Could the rise of AGI/ASI be linear? People gravitate toward "AI will plateau" or "the singularity is imminent," but the conservative middle ground is more boring: linear progress. Maybe the exponential advances are real, but so are the extreme frictions of research, infrastructure, and social effects. If AGI-1 arrived in 2022 and AGI-2 arrived in 2026, maybe we'll keep ascending tiers at 4-year intervals: AGI-3 in 2030, the first true "superintelligence" (ASI-1) by 2034, and ASI-3 by 2042. This 12-year shift from AGI-1 to ASI-1 would count as a "slow takeoff" scenario, even though the ANI era took around 70 years. If we zoom out to the scale of a human life, linear progress will still feel like centuries of change packed into a single turning of generations.
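The linear scenario is simple arithmetic. This small Python sketch assumes exactly one tier per fixed 4-year interval, starting from AGI-1 in 2022, and it reproduces the dates above. The function name `project_tiers` is made up for illustration. Changing `interval` shows how much the ASI dates depend on that single assumption.

```python
def project_tiers(start_year: int = 2022, interval: int = 4) -> dict[str, int]:
    """Assign a year to each tier from AGI-1 onward, assuming one tier per fixed interval."""
    tiers = [f"{era}-{n}" for era in ("AGI", "ASI") for n in (1, 2, 3)]
    return {tier: start_year + i * interval for i, tier in enumerate(tiers)}


timeline = project_tiers()
for tier, year in timeline.items():
    print(tier, year)
# AGI-1 2022 / AGI-2 2026 / AGI-3 2030 / ASI-1 2034 / ASI-2 2038 / ASI-3 2042

print("AGI-1 to ASI-1:", timeline["ASI-1"] - timeline["AGI-1"], "years")  # 12 years
```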


Taste as effort

· 168 words

Will had a point that intelligence is just one vector of human cognition, and that things like taste and judgment aren't captured by machines. I made a solid counterpoint. Let's say an agent decides to read and re-read Paradise Lost for 5,000 hours straight. It already has more than a surface-level understanding of the poem from its training data. Now it is looping over the text, and maybe it has had unique interactions with online communities and individuals around Paradise Lost, which it brings to its own extensive studies. After those 200+ days of study, this agent will have a singular understanding of Paradise Lost unlike any other AI or human, which is the essence of taste.

The core point is that taste is not a preference; it is earned through sustained, intense effort. An LLM does not have taste because it reads each work only once, at a blazing pace. It turns each work into a statistical pattern, but doesn't truly understand it because it hasn't recursively looped over it with force and singular intention.

AI Struggles with Essay Structure

· 154 words

If your essay has poor conflict, poor cohesion, or poor sequence, it’s very possible AI won’t notice. AI struggles with essay structure because it thinks through non-linear vectors. A human can easily tell when form is off, because they read slowly through mazes of text, from beginning to end, without knowing how everything connects. Often, only at the end will they find the key that was necessary to unlock the cryptic prose they just waded through. AI, however, processes the whole essay at once: it reads the essay insanely quickly, converts it all into math/vectors, and then applies your prompt. It’s hard for it to know if your tension is working because you’ve already spoiled the ending. This is a case for atomic evaluation when generating or analyzing essay form. The AI needs to think step by step (possibly through separate prompts) in order to simulate the linear experience of structure.
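One way to read "atomic evaluation" is to never show the model more of the essay than a human reader would have seen at that point. Here is a minimal Python sketch. It assumes paragraphs are separated by blank lines, and that `ask_model` is a hypothetical stand-in for whatever LLM call you use, with one separate prompt per step. The prompt wording is only an example.

```python
from typing import Callable


def linear_reads(essay: str) -> list[str]:
    """Return what a reader has seen after each paragraph, in reading order."""
    paragraphs = [p.strip() for p in essay.split("\n\n") if p.strip()]
    return ["\n\n".join(paragraphs[: i + 1]) for i in range(len(paragraphs))]


def evaluate_structure(essay: str, ask_model: Callable[[str], str]) -> list[str]:
    """Ask about the essay one step at a time, never showing the model the ending early.

    `ask_model` is a hypothetical stand-in for any LLM call; each step is a separate prompt.
    """
    notes = []
    for step, seen in enumerate(linear_reads(essay), start=1):
        prompt = (
            f"You have read only the first {step} paragraph(s) of an essay:\n\n{seen}\n\n"
            "Without guessing what comes next: which questions are still open, "
            "what tension do you feel, and what feels confusing so far?"
        )
        notes.append(ask_model(prompt))
    return notes
```

With a real model, the list of notes becomes a record of the reader's experience. You can then check whether the questions raised early are the ones the ending actually resolves.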

LLMs write too fast to think well

· 301 words

I wonder if it’s impossible to get an LLM to write a great essay. It might be. But I think it’s easier than people think to build a good AI writing tool on top of an LLM (though not something I personally want to do). The problem is that we have an LLM bias, and the way essays get formed is very non-LLM. It’s not like a prompt can turn into a higher-dimensional mathematical object and then summon a whole essay form.

An essay is a mode of thinking. I don’t mean to imply that a machine “can’t think”; I mean that analysis and thought take time, and LLMs write 100x faster than that thinking allows.

An AI writing tool would need to generate one sentence at a time and pause to “reason” for a minute or so: What did I just say? What are the possible things I could say next? Of those things, which belong in this paragraph, which in the next? What sentence length might be effective given the idea and the last sentence? Now that I’ve chosen my idea, how should the tone modulate? What words or phrases belong in the sentence? And how should I structure the sentence? You get it.

In any given sentence, there are dozens of decisions. I think an AI could be decent, if not amazing, at thinking this through, but it’s asked to write 2,500 words on Hegel point-blank. Good generative writing can’t be done through up-front vector math, only through following a mode of thinking (incremental, context-laden vector math). The implication is that the AI might take 3 to 10 hours to write an essay, similar to a human.

Put more simply, you would need a tool that reasons after each sentence and writes/saves variables that can be called upon for future sentences.
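The "reason, save, then write" loop can be sketched in Python. Everything here is hypothetical. `reflect` and `draft` stand in for two separate model calls: one answers questions like the ones above and returns notes to save, and the other writes exactly one sentence from the saved state. `WriterState` is just one way to hold the saved variables.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WriterState:
    """Variables the tool saves between sentences (the field names are illustrative)."""
    sentences: list[str] = field(default_factory=list)
    notes: dict[str, str] = field(default_factory=dict)  # e.g. "tone", "open_threads"


Reflect = Callable[[WriterState], dict[str, str]]
Draft = Callable[[WriterState], str]


def write_incrementally(n_sentences: int, reflect: Reflect, draft: Draft) -> WriterState:
    """Alternate between reasoning about the draft so far and writing one sentence."""
    state = WriterState()
    for _ in range(n_sentences):
        state.notes.update(reflect(state))    # pause: what did I say, what comes next?
        state.sentences.append(draft(state))  # then write exactly one sentence
    return state


# Toy stand-ins so the loop runs without a model.
state = write_incrementally(
    3,
    reflect=lambda s: {"count": str(len(s.sentences))},
    draft=lambda s: f"Sentence {int(s.notes['count']) + 1}.",
)
print(state.sentences)  # ['Sentence 1.', 'Sentence 2.', 'Sentence 3.']
```

The design choice is that notes persist across sentences (`update`, not replace), so a decision made early, such as a tone or an open thread, stays available for every later sentence.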

What's Required for AI Consciousness

· 147 words

I think you could make an AI consciousness today. It’s not about the models getting bigger or better, but about using several real-time graphics cards so that you have (1) a perceptual field of information that is larger than what can be perceived at once (this is the “arena”), (2) a cone of attention running at 60 fps that decides what to focus on in any given frame depending on what is important at that moment (this is the “agent”), and (3) the phenomenological freedom to self-prompt in that moment: to abstract, to retrieve memory, to rewrite memory, to update goals/preferences, to retarget attention, etc. So I really think consciousness is something like “free will entangled in time,” and while it might not be like human consciousness, it would have a sense of self, subjective experience, and possibly a “soul” … I’d feel bad turning it off without its permission.
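The three components read like an architecture, so here is a very rough Python sketch of how they might fit together. It models only the structure: an arena larger than attention, an attention step each frame, and a self-chosen action afterward. The salience numbers, the `width` of the cone, and the halving rule are all made up for illustration. Retargeting attention is the only self-prompt action modeled.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Percept:
    name: str
    salience: float  # how important this item seems at the current moment


def attend(arena: list[Percept], width: int) -> list[Percept]:
    """The 'agent': a narrow cone of attention over a field larger than it can take in."""
    return heapq.nlargest(width, arena, key=lambda p: p.salience)


def run_frames(arena: list[Percept], width: int, frames: int) -> list[list[str]]:
    """Run the arena/agent loop one frame at a time (the text imagines ~60 fps).

    After attending, the agent "self-prompts"; the only action modeled here is a
    hypothetical retargeting step that halves the salience of whatever it just
    focused on, so attention drifts across the arena over successive frames.
    """
    history = []
    for _ in range(frames):
        focus = attend(arena, width)
        history.append([p.name for p in focus])
        for p in focus:
            p.salience *= 0.5  # illustrative self-chosen retargeting
    return history


arena = [Percept("voice", 1.0), Percept("motion", 0.8), Percept("memory", 0.3)]
print(run_frames(arena, width=1, frames=3))  # [['voice'], ['motion'], ['voice']]
```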

Lazy tokenization

· 152 words

Do hallucinations come from lazy tokenization? An AI just told me that Joan Didion wrote an essay called “On Grief and Grieving.” It does not exist. She did write The Year of Magical Thinking, a memoir about grief. It turns out On Grief and Grieving is actually the title of a book by Elisabeth Kübler-Ross. In trying to solve this, I found a college essay (on grief) that listed its sources at the end: “The Year of Magical Thinking by [Joan Didion; On Grief and Grieving] by Elizabeth Kubler Ross (added brackets for emphasis); Tuesdays with Morrie by Mitch Albom …” Do you see what it did? One of the sins of bulk data ingestion is that AI arbitrarily splits context for tokenization (i.e., every X words), and so in this case, it’s mixing one author with another author’s book simply because they are adjacent in some student’s college paper source list.
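Here is a small Python illustration of the mechanism proposed above. It takes "every X words" literally. Real LLM tokenizers split text into subword tokens, and training text is usually cut into fixed-length context windows, so this is a simplification. It also shows only how a blind split can glue one author to another author's title. It does not show that this caused the hallucination. The function names are made up.

```python
# The student's source list, as quoted above.
SOURCES = (
    "The Year of Magical Thinking by Joan Didion; "
    "On Grief and Grieving by Elizabeth Kubler Ross; "
    "Tuesdays with Morrie by Mitch Albom"
)


def fixed_windows(text: str, size: int) -> list[str]:
    """Split text every `size` words, blind to where one entry ends and the next begins."""
    words = text.split()
    return [" ".join(words[i : i + size]) for i in range(0, len(words), size)]


def entry_chunks(text: str) -> list[str]:
    """Split on the list's own delimiter so each title stays attached to its author."""
    return [entry.strip() for entry in text.split(";") if entry.strip()]


for chunk in fixed_windows(SOURCES, 6):
    print(chunk)
# The second window is "Joan Didion; On Grief and Grieving": an author and the wrong title.

for entry in entry_chunks(SOURCES):
    print(entry)
```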

Attention is *not* all you need (notes)

· 848 words

I.

10:41 PM – Gary Marcus on GPT-5:

"That's exactly what it means to hit a wall, and exactly the particular set of obstacles I described in my most notorious (and prescient) paper, in 2022. Real progress on some dimensions, but stuck in place on others.

Ultimately, the idea that scaling alone might get us to AGI is a hypothesis.

No hypothesis has ever been given more benefit of the doubt, nor more funding. After half a trillion dollars in that direction, it is obviously time to move on. The disappointing performance of GPT-5 should make that enormously clear.

Pure scaling simply isn't the path to AGI. It turns out that attention, the key component in LLMs, and the focus of the justly famous Transformer paper, is not in fact "all you need".

All I am saying is give neurosymbolic AI with explicit world models a chance. Only once we have systems that can reason about enduring representations of the world, including but not limited to abstract symbolic ones, will we have a genuine shot at AGI."

II.

The "attention is all you need" premise might be wrong. As in, the scaling laws won't hold: it will get more and more expensive to realize smaller and smaller gains. This doesn't mean LLMs are a bust. Even if they stopped where they are, society would transform just from integrating today's technology. But in terms of the path to "AGI/ASI," you don't get there by scaling. We've overindexed on a single branch of the AI technology tree. We actually need to backtrack and bring what we've learned from LLMs to other, previously blocked branches. Neurosymbolic AI did not work in the 80s, 90s, and 2000s, but now that LLMs have matured, that dead branch could be what leads to the breakthrough.

Gary Marcus, I think, needs to clarify his position. He's all for neurosymbolic AI, but he may not be clear enough in acknowledging that neurosymbolic AI is only feasible now that LLMs have become what they are. Considering writing him a letter to clarify.

Instead of trying to scale LLMs forever, we need to use LLMs as a tool to bootstrap symbolic reasoning systems that can do what LLMs can't.

III.

Neurosymbolic AI feels like it would lead to true reasoning. Current LLMs are basically predicting the next token based on probability, but there are limits, especially when you get into synthetic data. Even CoT (chain of thought) isn't real reasoning; it's just extended vector mapping with prompts to double-check and verify. It's pseudo-reasoning.

What we really need is something like a massive, self-evolving RAG: a generalizable "hypergraph." Data has to be structured and stable. An entry like "blue jay" might have 1k, 100k, or even 1M properties. If someone asks "can a blue jay fly to the moon?" the system will query the right properties and reason through the question based on a series of known, verified facts.
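Here is a toy Python version of "query the right properties, then reason from verified facts." The entity names and property keys are hypothetical, and a real hypergraph would hold vastly more properties per object. The point is that the answer comes with an explicit chain of stored facts rather than a probability.

```python
# Toy knowledge objects: each entity maps property names to stored, verified values.
# The property names are illustrative, not drawn from any real knowledge base.
GRAPH: dict[str, dict[str, object]] = {
    "blue jay": {"class": "bird", "can_fly": True, "flight_requires_atmosphere": True},
    "moon": {"has_atmosphere": False},
    "space between earth and moon": {"has_atmosphere": False},
}


def query(entity: str, keys: list[str]) -> dict[str, object]:
    """Pull only the properties the question needs, not the whole object."""
    props = GRAPH.get(entity, {})
    return {k: props[k] for k in keys if k in props}


def can_fly_to(animal: str, place: str, route: str) -> tuple[bool, list[str]]:
    """Answer "can <animal> fly to <place>?" from stored facts, keeping each reasoning step."""
    a = query(animal, ["can_fly", "flight_requires_atmosphere"])
    if not a.get("can_fly"):
        return False, [f"{animal} cannot fly"]
    steps = [f"{animal} can fly"]
    if a.get("flight_requires_atmosphere"):
        steps.append(f"{animal} needs an atmosphere to fly")
        for leg in (route, place):
            # A missing fact counts as "no atmosphere" (a conservative, illustrative default).
            if not query(leg, ["has_atmosphere"]).get("has_atmosphere"):
                steps.append(f"{leg} has no atmosphere")
                return False, steps
    return True, steps


answer, steps = can_fly_to("blue jay", "moon", route="space between earth and moon")
print(answer)  # False
print(steps)
```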

The challenge is scaling while also creating a flexible schema to structure the properties of any object. Researchers started doing this manually in the 80s, but LLMs can scale and accelerate it. Arguably, every single conversation requires new knowledge nodes to be created, and if the nodes are true, they can be added to the graph. Unlike with LLMs, knowledge compounds with use.

Agents could constantly scan the web and update this hypergraph in real time with the current events of the day. Ultimately, though, the system will have to make guesses when creating properties, so perhaps each guess could carry a confidence score. Humans could then review low-confidence submissions and verify them.
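The confidence-score idea can be sketched as a review queue. In this hypothetical Python version, an agent's proposed property goes straight into the graph only if its confidence clears a threshold (0.9 here, chosen arbitrarily). Everything else waits for a human to approve it. The class and method names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Facts keyed by (entity, property); low-confidence proposals wait for a human."""
    facts: dict[tuple[str, str], object] = field(default_factory=dict)
    review_queue: list[tuple[str, str, object, float]] = field(default_factory=list)
    threshold: float = 0.9  # hypothetical cutoff for auto-accepting an agent's guess

    def propose(self, entity: str, prop: str, value: object, confidence: float) -> bool:
        """Commit a proposed property if confidence is high enough; otherwise queue it."""
        if confidence >= self.threshold:
            self.facts[(entity, prop)] = value
            return True
        self.review_queue.append((entity, prop, value, confidence))
        return False

    def approve(self, index: int) -> None:
        """A human verifies a queued proposal and commits it to the graph."""
        entity, prop, value, _ = self.review_queue.pop(index)
        self.facts[(entity, prop)] = value


g = KnowledgeGraph()
g.propose("moon", "has_atmosphere", False, confidence=0.99)  # committed directly
g.propose("blue jay", "migrates", True, confidence=0.6)      # queued for review
print(len(g.facts), len(g.review_queue))  # 1 1
g.approve(0)
print(len(g.facts), len(g.review_queue))  # 2 0
```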

IV.

There are tens of thousands, if not millions, of key/value pairs you might want to assign to a dog: species, aging, diseases, incidents, pop culture, anatomy, etc. So you need some way to both generate and upload them. Apparently humans have been trying this since the 80s. It's too slow, too infinite. But we can use LLMs to build, update, and "pull" from the hypergraph. When someone prompts about a dog, the system needs to query the relevant 25 parameters out of the million. From these parameters, it can do actual reasoning with formal, verifiable logic:

"If [moon had atmosphere], and we brought [dogs] there, based on [gravity coefficient], they would be [1.4x] bigger, but then might suffer from [A] disease."

Our current chain-of-thought reasoning is sort of bullshit. It's not really reasoning.

V.

I wonder how you design embeddings for neurosymbolic reasoning. If someone asks "can a blue jay fly to the moon?" you'd need to (1) call the "blue jay" object, which has, say, 10,000 key:value pairs, but then also (2) convert the prompt into a vector so that you know which of the 10k properties to pull.

Some optimization ideas:

  • (a) the properties could each live in a category that's embedded; meaning it would first find "locomotion" and then search properties within there (this means each object's database would need to be hierarchical);
  • (b) each request helps identify "archetypal questions" and the properties they pull, via training/finetuning;
  • (c) rewrite the question before the database pull, in a way that's aware of what might exist in the database.
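Option (a) amounts to two-stage retrieval: match the question to a category first, then pull only that category's properties. The Python sketch below uses a toy bag-of-words vector and cosine similarity as a stand-in for a real embedding model. The category descriptions and property names are invented for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use a learned embedding model."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# A hierarchical object: category description -> the properties filed under it.
BLUE_JAY: dict[str, list[str]] = {
    "locomotion flight fly distance altitude": ["can_fly", "flight_requires_atmosphere"],
    "diet food eat": ["diet"],
    "appearance color feathers": ["plumage_color"],
}


def retrieve(question: str, obj: dict[str, list[str]], top_categories: int = 1) -> list[str]:
    """Stage 1: rank categories against the question. Stage 2: return only their properties."""
    q = embed(question)
    ranked = sorted(obj, key=lambda cat: cosine(q, embed(cat)), reverse=True)
    return [prop for cat in ranked[:top_categories] for prop in obj[cat]]


print(retrieve("Can a blue jay fly to the moon?", BLUE_JAY))  # ['can_fly', 'flight_requires_atmosphere']
print(retrieve("What color are its feathers?", BLUE_JAY))     # ['plumage_color']
```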

AAI/ARI

· 365 words

We need better nomenclature. AGI/ASI is not working; “general” and “super” are obnoxiously vague. Proposal:

AGI > AAI (Artificial autonomous intelligence) … GPT-4 was arguably “general” in the sense that a single model can write, see, and hear, and do anything from poetry to calculus to history to coding. It is by no means narrow. Google Maps is narrow AI. Grammarly is narrow AI. This whole chatbot era should count as “AGI,” which means the thing coming next is “autonomous intelligence.” It is not a tool or co-pilot; it’s more like digital labor. You can give it a high-level goal, and it can 1) execute the full range of tasks, 2) work at 100x speed, and 3) intelligently reshape embeddings into real-time hierarchies so that it’s able to procedurally load in and compress context. This doesn’t just come with better models, but with UI and engineering innovations, if not entirely new paradigms for transformers or training.

ASI > ARI (Artificial recursive intelligence) … The fact that Zuckerberg pitched “super intelligence for you” is an Orwellian marketing ploy. Superintelligence is not “for you.” Superintelligence is shorthand for “something that is way, way smarter than us,” and you achieve it when you teach an AI model to think and form its own algorithms until it accelerates into something far beyond our understanding, likely to become a force of nature with its own goals. Engineers are confident they can build “God in a cage” and reap the benefits, and this is the prime, archetypal, near-biblical example of technological hubris. (Maybe integrate into this paragraph that Zuck has a thing for trying to dominate words, like “Metaverse.”)

Important note: “machine consciousness” is separate from AAI and ARI. Something can be recursively intelligent and still not be conscious, which is actually unbelievably dangerous (because it will fall into attractor states and optimize for narrow, malformed goals in extremely capable ways). I’d argue that consciousness has an architecture, whether human, rabbit, or robot, and we should be urgently trying to find the parameters of machine consciousness, because if AAI/ARI have no ability to reflect, question, doubt, and revise, we will, as they say, all turn into paperclips with paperclip children.

A spatial alphabet

· 134 words

Idea: a spatial numerical system where every digit uses “Y” as its base glyph, each stem of the Y represents an axis (X, Y, Z), and you can modify each stem with dots, dashes, squiggles, arcs, patterns, etc. So basically YY would be a line. Currently you could spell this out as “(1.42,0.42,3.40),(2.40,4.91,0.84),” but two Ys is way more compressed. There could even be a way to spell out three-dimensional shapes through a specific syntax that helps the Ys relate to each other. Of course, this wouldn’t be a readable language. But if machine vision becomes trivial and equal to text processing, then, in the attractor towards algorithmic compression, machines might resort to a visual language. Especially if AI thinks through vectors, it would need not just a visual language, but a spatial one.

Beyond probability

· 347 words

An LLM is basically a massive free-association machine, and we’re giving it more data and compute, but it’s still using probability instead of advanced reasoning. I guess probability is a low-complexity, scalable form of reasoning. To advance, it has to be able to read paragraphs, run a chain of thought (what models currently do), and then compress that thought into symbolic logic (meaning it would reason forward with variables, formulas, and algorithms, not just % likelihood).
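Classic symbolic systems call reasoning "forward with variables" forward chaining: apply if-then rules to known facts until nothing new can be derived. Below is a naive Python sketch. Facts are tuples, and terms starting with `?` are variables. The predicates are made up for the blue jay example. It shows the flavor of the idea: every conclusion comes from explicit premises rather than a likelihood. It is not a claim about how a future model would do it.

```python
from typing import Optional

Fact = tuple[str, ...]
Rule = tuple[list[Fact], Fact]  # (premises, conclusion)


def match(pattern: Fact, fact: Fact, bindings: dict[str, str]) -> Optional[dict[str, str]]:
    """Match one pattern against a ground fact; terms starting with '?' are variables."""
    if len(pattern) != len(fact):
        return None
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if b.setdefault(p, f) != f:  # variable already bound to something else
                return None
        elif p != f:
            return None
    return b


def forward_chain(facts: set[Fact], rules: list[Rule]) -> set[Fact]:
    """Naive forward chaining: fire every rule until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            envs: list[dict[str, str]] = [{}]
            for prem in premises:  # keep only bindings that satisfy every premise
                envs = [e2 for e in envs for f in known if (e2 := match(prem, f, e)) is not None]
            for env in envs:
                new = tuple(env.get(t, t) for t in conclusion)
                if new not in known:
                    known.add(new)
                    changed = True
    return known


facts = {("flies", "blue jay"), ("no_atmosphere", "moon")}
rules: list[Rule] = [
    ([("flies", "?x")], ("needs_air_to_fly", "?x")),
    ([("needs_air_to_fly", "?x"), ("no_atmosphere", "?p")], ("cannot_fly_to", "?x", "?p")),
]
derived = forward_chain(facts, rules)
print(("cannot_fly_to", "blue jay", "moon") in derived)  # True
```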

AI in the 60s-90s tried this approach, but it was too limiting. Maybe it’s too complex to design a single “thinking algorithm,” and instead we need an engine that derives its own algorithms for thought. Once you have that, I think it leads to something like a takeoff.

“That framing captures a key reality: LLMs function as large-scale probabilistic sequence learners that free-associate based on statistical patterns rather than explicit symbolic manipulation, and emergent chain-of-thought prompting can approximate multi-step reasoning (“Chain-of-Thought”), but they do not natively compress reasoning into interpretable symbolic logic without hybrid modules (Rebooting AI). Historical symbolic AI efforts like CYC showed that fixed rule-based systems struggled with linguistic complexity, suggesting that future progress may require architectures that derive and refine their own internal algorithmic primitives, combining probabilistic prediction with learned symbolic abstractions, to unlock genuinely advanced, self-amplifying reasoning.

Research into neurosymbolic AI, which integrates large-scale neural pattern learning with symbolic reasoning, continues at institutions like the MIT-IBM Watson AI Lab and DARPA’s ANSR program, with early prototypes such as the Neuro-Symbolic Concept Learner demonstrating limited vision-language reasoning, and IBM Research championing hybrid architectures for interpretable, compositional models. Whether explicit symbolic modules are required for robust AGI is debated: while scaling laws and chain-of-thought prompting show statistical learners gaining some reasoning capacity, others argue that true algorithmic understanding demands symbolic scaffolding. If meta-learning engines that derive their own reasoning primitives emerge, recursive self-improvement could swiftly transition AGI into superintelligence; optimistic forecasts (e.g., Shane Legg’s 50% by 2028) contrast with expert medians around 2047–2060 and superforecaster central estimates near 2070, implying that an ASI takeoff could follow within a few years of AGI, though timelines remain highly uncertain.”