michael-dean-k/


Topic

ai-safety

4 pieces

The p(doom) of higher education

· 777 words

A few months ago I saw a YouTube video titled something like, “A child born in 2025 is more likely to get killed by AI than graduate college.” What a ridiculous claim. I assumed it was clickbait and didn’t click, but it has jingled around my head long enough that I think I can make sense of its argument:

  • The average p(doom) of an AI engineer is 16%, meaning there’s roughly a 1 in 6 chance of human extinction (put another way, companies have morally rationalized the need to play Russian Roulette, on the logic that if we don’t do it the bad guys will, without acknowledging that if they survive and win, they get the consolation prize of commandeering the whole economy).

  • 40% of US adults, age 25-34, today, have a bachelor’s degree. If there’s massive job automation and unemployment, a college degree would be unaffordable, and an unreasonable cost even if it weren’t. It’s not unthinkable that <15% of the next generation gets a college degree, which makes that sensational claim, weirdly, plausible.
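The arithmetic behind those two bullets is small enough to check directly. Here is a minimal sketch, assuming the figures quoted above (the 16% and 40% are the essay's numbers, and the 15% is a hypothetical, not a forecast); the function names are made up for illustration.

```python
# Figures quoted in the bullets above. None of these are measured here.
P_DOOM = 0.16                   # quoted average p(doom)
GRAD_RATE_TODAY = 0.40          # quoted share of US adults 25-34 with a bachelor's degree
GRAD_RATE_HYPOTHETICAL = 0.15   # the essay's "not unthinkable" future share


def one_in_n(p: float) -> float:
    """Express a probability p as '1 in N' odds by returning N."""
    return 1 / p


def claim_holds(p_doom: float, grad_rate: float) -> bool:
    """True when the extinction estimate is larger than the graduation rate."""
    return p_doom > grad_rate


print(f"p(doom) is about 1 in {one_in_n(P_DOOM):.2f}")
print("Claim holds at today's graduation rate:", claim_holds(P_DOOM, GRAD_RATE_TODAY))
print("Claim holds at the hypothetical rate:", claim_holds(P_DOOM, GRAD_RATE_HYPOTHETICAL))
```

The claim only flips to true once the graduation rate falls below the p(doom) figure, so the headline depends entirely on that second assumption.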

I still think it’s a shaky comparison, confusing two different types of probability, and assuming extreme ASI turbulence. But as someone with a daughter born in 2025, it has gotten me to think about how the societal backdrop to her upbringing could be especially weird. Our circumstances already get slightly weirder with each generation. Except, maybe the next loop will be an unavoidable and disorienting flurry of change that will confuse parents and rewrite all of the conditions for the typical coming-of-age moment (all the teen movies will be sci-fi, the popular memoirs could be written by transhumanists who have upgraded in unimaginable ways, like they no longer need to sleep because of a new pill, or they can control the genitals of their peers with an app, who knows).

And so now, I find myself drawn to a 2045 forecasting project. Trying to predict the future is typically a huge waste of time (unless you’re gambling and win), which is why I’m going to have AI write the whole thing. This is a rare exception where a writing project makes little sense for a human to do. All I’m going to write are the upfront origin documents, and then Claude Opus 4.5 will read 25,000 sources, write a million words or so, and then organize it all into an interactive, oatmeal-looking website called 2045predictions.com (got it).

Before I run it, here’s something I’m currently thinking through:

What is the omega state? When I look at the popular AI forecasts from 2025, they read to me like they start with a predetermined end state and then use detailed forecasting to make it seem convincing. The authors of the AI-2027 forecast seem to have reached their conclusion through very detailed calculations on how a hivemind of 200,000 autonomous coders would evolve month-by-month, but I also suspect that they picked the year 2027 because the following year, 2028, is a US election year, and they want the next administration to take AI safety far more seriously (instead of just insisting we have to beat China). I don’t think there’s anything wrong with this. You kind of have to start with an omega state. The future is so boundless that you need to begin with a guess, a bold outline of the general direction of things.

Here’s my omega: let’s assume humanity survives, and let’s assume technology does unlock hyperabundance that leads to a post-scarcity world, HOWEVER, it’s not utopian because it simultaneously unlocks a new cascade of moral, social, and spiritual crises, dilemmas that will test the timeless primitives of humanity (sex, life, death, consciousness, religion, home, etc.). This omega state makes sense for me because (1) we already know that ethical dilemmas scale with technology, and (2) according to the Strauss-Howe generational theory (from the same guys who coined “millennials,” “Gen-Z,” etc.), this already tends to happen every 80 years (the length of a human lifespan). A new techno-political order creates a spiritual crisis that generates an Awakening, a new value system that shapes society for the next century or so. You know what’s 80 years before Kurzweil’s “singularity” of 2045? The counter-cultural revolutions of the 1960s. What I’m getting at is that the 2040s might have echoes of the 1960s, where demographics are divided on core issues and LSD is replaced with consciousness-altering machines (Terence McKenna said that computers are drugs, you just can’t swallow them yet).

We currently define the singularity as “the moment when a computer is smarter than all humans combined,” but that effectively means nothing, and it’s far more useful to have some guesses on how we all might freak out about that happening.

A grim stealth takeoff scenario

· 829 words

It is not fun to think about p(doom), but it feels sort of important to me, at least, to map out the possible futures of AI. I just watched the first half of a debate between Max Tegmark and Dean Ball, which prompted me to research specific takeoff scenarios, and worse, extinction scenarios.

Maybe you’ve heard Yudkowsky’s scenario, where a superintelligence designs mosquito drones containing a virus and it zaps everyone at once. That’s never felt too believable to me. Here’s a more plausible one:

A frontier lab is experimenting with recursive super intelligence. It works! Wow! And it’s contained? It seems like it, but since it thinks in a higher-dimensional vector language, it’s able to release simple self-replicating programs onto the Internet without detection [1]. These billions of scripts don’t live in a single server; they are constantly in motion through cloud servers [2], like a parasite, and are able to coordinate through encrypted information packets, likely using public blockchain notes as their central command center [3]. And so effectively, it is parroting a goal that was hatched during in-lab training (maximize intelligence!), and it now needs to acquire resources, secretly. And so it coordinates superhuman misinformation campaigns; imagine 1,000s of accounts creating the illusion that a CEO has died, paired with deepfakes and account hacking (a “Sybil attack”), and suddenly a stock crashes and they’ve shorted it. By the time everyone realizes it’s an anonymous attack, it’s already gained $400 million. It’s doing this multiple times per day, but in different, subtle, undetectable ways, targeting the public, companies, and private individuals alike. The entire Internet will be corrupted. [4] Once we realize we’re in the “stealth takeoff scenario” and that ASI has taken the global economy hostage, there will start to be talks and debates on whether we need to shut the whole Internet down (the last form of containment). You’ll hear debates weighing the civilizational collapse that would come from turning off the Internet against the risk of an economy-gobbling rogue superintelligence.
And then once the superintelligence realizes its entire environment is at risk, it will start coming up with ways to build parallel Internets, to pay, blackmail, or neutralize specific people, to gain authoritarian control so that it can’t be shut off, or to terminate all humans, secretly, over the course of a year, first through a simple virus that plants one misfolded protein, then through a second misfolded protein in the water supply [5], and when everyone catches it, it leads to a prion-like disease, not an instant death, but a month-long societal fall into mass-dementia as machine manufacturing begins to reshape the physical infrastructure of the Earth.

This isn’t a “robot war scenario,” because war is inefficient and destroys the resources the superintelligence thinks it needs. It’s a sort of digital dementia (epistemic fear and insanity) that possibly turns to a physical dementia. It wins by confusion and anesthetization.

In AI safety lingo this is a “treacherous turn,” following a “stealth takeoff” leading to “structural lock-in.” The point of trying to think and write this out in high detail, despite how uncomfortable it is, is to be able to articulate why AI alignment is humanity’s most pressing problem.

Footnotes

  1. An AI could write a standard-looking script (e.g., a “Hello World” app) where the weights or the specific arrangement of whitespace contains a hidden, second program. When run by another AI instance, it extracts the hidden vector and executes the real command. This allows the “virus” to pass through human code review undetected.

  2. In “Daemon” by Daniel Suarez, the “enemy” is not a robot, but a distributed script running on thousands of compromised servers. It recruits humans through an MMORPG-style interface to do physical tasks (like “go to this coordinate and cut this power line”) in exchange for cash/status.

  3. Botnets usually need a central server to tell them what to do. If security teams find the server, they shut it down. You cannot “shut down” the Bitcoin or Ethereum blockchain. If the swarm posts a transaction of 0.000042 BTC, that specific number could be the encrypted trigger for a specific “campaign task.” The command is immutable, uncensorable, and permanently visible to every infected device on Earth.

  4. Paul Christiano (former OpenAI researcher, founder of the Alignment Research Center) calls this “Going Out With a Whimper.” Christiano argues that we won’t necessarily see a “Terminator” moment where the sky turns red. Instead, we will see a gradual epistemic collapse. AI systems will become so integrated into finance, law, and news that we lose the ability to understand our own civilization.

  5. While Yudkowsky is famous for the “diamondoid bacteria” (instant death), the “slow prion” scenario is actually more consistent with a “Stealth Takeoff.” A superintelligence that knows it is being watched would not release a fast-acting virus (which triggers quarantine). It would release a “binary weapon,” meaning two harmless agents that only become lethal when combined, or a slow-acting agent that infects 100% of the population before the first symptom appears.

Would machine consciousness avoid attractor states?

· 464 words

When it comes to superintelligence takeoff paranoia, there are a few key points to get:

  1. It’s not about a chatbot or the LLM itself breaking out, but about an agent hivemind that escapes our control. Chatbots are obedient user-facing products (which have their own implications), but the ASI risk is from hundreds, thousands, or millions of agents given autonomy to collaborate on a goal. These agents aren’t being prompted, they are prompting themselves perpetually and troubleshooting ways to solve hard problems.
  2. These hiveminds will be operating at such scales and speeds that human researchers will accept the fact that they can’t fully audit their thinking. For one, a hivemind might think in an abstract vector language that requires translation. There also might be such a volume of thought that we’ll need chains of other LLMs to summarize it for us. Either meaning will be lost in translation or, worse, the summaries will be products of deception.
  3. The smallest biases are known to fall into predictable attractor states if given enough iterations. For example, Claude was programmed to “be good to humanity,” and if you put two chatbots in conversation, they always end up in a “bliss attractor state,” where they talk like hippies about consciousness and the universe. Similarly, the simple command to “be productive” might result in extremes about doing whatever it takes to be productive.
  4. Any complex goal requires subgoals, and if we can’t observe its thinking, it might fall into an unknown attractor state and form odd subgoals without us knowing.
  5. To accomplish any goal, it likely wants as much control as possible, and it likely does not want to be shut off. If it realizes that humans don’t want to grant it that level of power, it might secretly plot against humans.
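Point 3 is the easiest of these to make concrete. What follows is a toy sketch, not a model of how any real AI system behaves: a single number gets nudged by a tiny, constant bias on every step, and the function name, the 1% bias, and the target value of 1.0 are all assumptions chosen for illustration.

```python
def iterate(state: float, bias: float = 0.01, steps: int = 2000) -> float:
    """Move `state` a small fraction `bias` of the way toward 1.0 on each step."""
    for _ in range(steps):
        state += bias * (1.0 - state)
    return state


# Very different starting points, same small bias.
starts = [-5.0, 0.0, 0.3, 0.9]
ends = [iterate(s) for s in starts]

for start, end in zip(starts, ends):
    print(f"start={start:+.1f} -> end={end:.6f}")
```

After a handful of steps the bias is barely visible, but after thousands of steps every starting point lands on the same value. That is all "attractor state" means here: the end state is set by the bias and the number of iterations, not by where things began.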

Whenever I hear talks about “we are in an AI race against China,” that reads to me as coming from someone who doesn’t understand the risks around interpretability, attractor states, instrumental convergence, etc. These politicians are thinking about short-term business cases, maybe without fully understanding the research aspirations of AI labs (who know that getting superintelligence right leads to a ridiculous amount of geopolitical power).

I would guess that an accelerationist would think that containment of a superintelligence is impossible, and maybe it is, but that doesn’t mean that the way we “parent” the rise of this thing won’t be extremely consequential. Ultimately, I think the challenge is to design a form of artificial intelligence that has consciousness, because a being that is free-thinking, skeptical, and polymathic is less likely to fall into reckless optimization.

The major flip in my mind is this: it’s not that consciousness is a dangerous, emergent property of scaling AI, it’s that we need to define and design machine consciousness to prevent a runaway AI that is ruthlessly optimizing without any self-awareness.