Much discourse on artificial general intelligence (AGI) falls within a few camps. There are existential-risk rationalists whose intense concern with AGI safety centers almost exclusively on a post-AGI world. Some are agnostic about when or how AGI will be created, while others hold that scaling is all we need and that AGI is nearly certain to arrive within a generation. Others liken AGI safety concerns to worrying about overpopulation on Mars. Less ink has been spilled on the realities of how AGI might be created, and on how the struggle to create it might interact with the barriers and structures, especially non-technological ones, that guide innovation in general.
Innovation oozes into the adjacent possible not at random, but with sharp biases, and often with great inefficiency. Innovation is driven by messy, imperfect humans whose natures are emotional, hypocritical, self-serving, self-effacing. We wake up and think millions of thoughts, feel millions of feelings; we are multitudes who contain multitudes. We act, sometimes farsighted, other times shortsighted, climbing with conviction up hills shaped by our own incentives, using tools at hand shaped by society’s incentives, the greatest of which is capitalism.
The astonishing capabilities, and more importantly the sharp rate of progress, of models such as GPT-3, Codex, and DALL-E/Imagen/Stable Diffusion have been viewed as a bullish signal for AGI - look! scaling is incredible, it keeps unlocking new abilities, and the log-linear trend line means we’ve found a path straight to AGI. I find such conclusions misguided. Scaling is an enormously important landmark achievement that will go down in the history of computing - but as solving the problem of superhuman ANI (artificial narrow intelligence). And the success of ANI and its sweeping economic impact - a paradise of ANI - may ultimately prove to be the greatest challenge to reaching AGI.
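As a concrete illustration of what the “log-linear trend line” refers to, here is a minimal sketch that fits loss against log-compute and extrapolates one order of magnitude further; the data points and the functional form are purely illustrative assumptions, not measurements from any real model.

```python
# Minimal sketch of a log-linear scaling trend, with made-up numbers.
# Assumption: test loss falls roughly linearly in log10(compute); these points are illustrative only.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])  # hypothetical training FLOPs
loss = np.array([3.2, 2.9, 2.6, 2.3, 2.0])          # hypothetical test losses

# Ordinary least squares fit of loss = a * log10(compute) + b.
a, b = np.polyfit(np.log10(compute), loss, deg=1)

# Extrapolate the fitted line one order of magnitude further out.
projected = a * np.log10(1e23) + b
print(f"slope per decade of compute: {a:.2f}; projected loss at 1e23 FLOPs: {projected:.2f}")
```

The bullish reading is that the line keeps going; the argument here is that even if it does, what it buys is ever-better narrow intelligence.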
A fork in the road appears increasingly likely to me; a decision that will be faced by individuals, companies, and industries. On one side, we have superhuman ANI, which we roughly already know how to create, and which already appears capable of revolutionary, multi-generational economic impact. On the other side, we have AGI, which we don’t know how to create (and we don’t know what we don’t know). As with forks in the past, some will choose the first path, others the second. Traditionally, brave outsiders pursue the unproven in academia; that many individuals will choose the second path I have no doubt. The issue is the allocation of resources: what use is working on AGI in academia if creating it requires resources as enormous as ANI’s, or staggeringly greater? And in industry, I have no doubt that groups that dare to think differently, such as OpenAI, may continue to pursue AGI. But it will be increasingly hard to defend the choice - to shareholders, to venture capital funders, to company budgets, to family, to your legacy - of missing the current generation-defining revolution in order to possibly build the next.
Beyond resource allocation, the anthropology of work under capitalism appears to prefer ANI over AGI. Historically, capitalism has tended to reduce generally intelligent agents (humans) to narrow purposes: division of labor, assembly-line workers, slaves. Such roles do not require the broad generality of intelligence that humans are capable of. Many human jobs follow a “train-do” pattern, in which on-the-job learning is minimal and unnecessary. This aligns with the “train-do” paradigm of existing (superhuman) AI, which can learn only in a very specific, very data-hungry manner, and after training, simply do. This already appears sufficient for sweeping economic impact.
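As a minimal sketch of the “train-do” pattern, assuming nothing beyond a standard supervised-learning setup (scikit-learn and synthetic data are stand-ins here): the model is fit once on a fixed dataset, frozen, and then only ever queried; nothing it encounters in production updates it.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# "Train": fit once, offline, on a fixed (here synthetic) dataset.
X_train, y_train = make_classification(n_samples=1000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# "Do": pure inference from here on; nothing seen in production updates the model.
def serve(x):
    return model.predict(x.reshape(1, -1))[0]

print(serve(X_train[0]))  # stands in for an endless stream of inference-only calls
```

The claim above is that a great deal of economically valuable work fits exactly this loop, with no on-the-job learning required.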
Q&A
Rate yourself on these axes. Italics are my current opinions.
Do you agree that we already roughly know how to build superhuman ANI? Yes.
How much economic impact do you think superhuman ANI will have? Seems likely to be generation-defining, if not more - perhaps on the scale of computers themselves, or the internet, or greater. The impact of the models already released is great and will take some years to be fully felt, and the rate of progress appears very fast. The greater the impact you think superhuman ANI will have, the more it will “distract” resource allocation away from AGI development.
How much easier will it be to extend ANI to more domains and improve it, compared to creating AGI? Improving ANI is no small task - while scaling works wonders, it relies on data, which can be lacking in many important domains. Ilya Sutskever talks about researching clever ways to make up for limited data with compute. Despite these challenges, we still have a clear roadmap for improving and expanding narrow AI: just add data. This contrasts with AGI, which has many unknown unknowns and no clear roadmap.
How much will research on ANI (extending it to more domains, making it better) help with creating AGI, and vice versa? Not much; I think many important problems are likely to be distinct and non-overlapping. See the addendum for some examples of what I view as AGI-specific problems. The point is not that these are too hard - on the contrary, I think all will be solved - but that they are specific to AGI. The harder you think AGI is to create, and the less you think ANI research (which has greater incentives) helps AGI, the more distracting ANI is to AGI.
Even if AGI existed, for many economically important tasks, do you think superhuman ANI would be preferred over AGI (e.g., ANI is better, or faster, or cheaper, or easier to use, or safer; or AGI’s improvements are not worth the cost)? Yes, I think it plausible. Perhaps the most important ability will be learning effectively from little data and experience, which could unlock AI abilities in many more economically important tasks. This is often associated with AGI, but I don’t see a reason why it must be. I think capitalism’s preference for the “train-do” pattern will make the final AI appear narrow in practice.
Grounding AI impact
It is notoriously difficult to pin down a precise definition of AGI, but there is broad consensus on a critical contrast: generality vs. specialization. Human performance is often taken as a milestone: in one manner, as a benchmark that AGI should surpass on given tasks (e.g., classify images as accurately as a human); in another, more interesting manner, simply as a standard of generality (achieve strong performance on as many tasks as a human can).
Science must be grounded in reality: the reported experiment replicates in your hands, or it does not. The bridge design holds the weight, or it collapses. Reality, in turn, informs theory about which concepts have value. To clarify murky conceptual definitions of AGI, and to assess its value, I propose to ground it in economically tangible outcomes - does it replace a human job, or not? This is perhaps the most impactful, world-changing prospect of AI, general or not. And here, AGI doesn’t seem necessary for many economically important tasks, while ANI seems sufficient, especially given its superhuman performance, to have transformative economic potential.
The future can reflect and pinpoint: that was the moment when AI, wherever it sat on the spectrum from specialized ANI to general AGI, really changed the world, or this industry; this was when we hit the inflection point at which X% of humans were displaced by AI, or when job descriptions changed. Certainly I think this point will lie farther along the generality spectrum than modern AIs, and current work on broadening AI capabilities will produce incredible advances. Yet, taking this grounding back, science will likely find that the relevant range on the spectrum wasn’t that close to the flag planted at the AGI point, which will come to seem antiquated and arbitrary. Why, they will ask, did they care so much about this specific level of generality, when the world could change without it?
Addendum: AGI-specific challenges
- Overcoming anti-scaling tasks / negative transfer seems challenging for AGI, which must achieve generality, but significantly less problematic for ANI, where specialization is the goal. Pushing generality means encountering domains where previously learned inductive biases are not useful, and can in fact be harmful; ANI doesn’t need to juggle inductive biases, reason about them, and retrieve them on a domain-specific basis nearly as much as AGI likely will - a problem yet to be solved. Note that the famous no-free-lunch theorem lies at the extreme end here.
- A popular notion is that AGI requires interacting with an environment; scaling up a language model is not enough on its own. But to efficiently obtain online training data, model sizes need to be dramatically shrunk (e.g., GATO). This may already be one crucial fork - pursue AGI with smaller interactive models, or scale LMs to improve ANI? Cognitive models don’t need real-time speeds to solve many economically important tasks. (On the other hand, there is already abundant work on variable “thinking speed” with deep models.)
- AGI likely needs significant advances in “free-form learning” - SOTA models are superhuman on many tasks, but learn in a very specific, very data-hungry manner. We don’t know how to achieve efficient learning (as humans do) from new experiences, especially in new, out-of-distribution domains. The most exciting developments here are GPT’s few-shot learning ability and retrieval mechanisms (see the sketch below). This problem is likely shared with broadening current ANI, but at some point there will be a fork; after all, for many economically important tasks, trained superhuman performance is already good enough - the ability to continually learn is not necessary.
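To make the retrieval idea above concrete, here is a toy sketch under the assumption that some embedding function is available (the embed() below is a made-up placeholder, not a real model): new experience is stored as embedded snippets, and the nearest ones are retrieved into the model’s context at query time, rather than being learned into its weights.

```python
# Toy sketch of retrieval as a substitute for weight updates: store new experience
# as embedded snippets, retrieve nearest neighbours at query time, and prepend them
# to the prompt. embed() is a hypothetical placeholder for any real embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a pseudo-random unit vector per string; with a real embedding
    # model, similar texts would map to nearby vectors.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

memory = []  # (snippet, embedding) pairs accumulated at runtime

def remember(snippet: str) -> None:
    memory.append((snippet, embed(snippet)))

def retrieve(query: str, k: int = 2) -> list:
    q = embed(query)
    ranked = sorted(memory, key=lambda item: -float(item[1] @ q))
    return [snippet for snippet, _ in ranked[:k]]

remember("Invoice codes starting with Q are handled by the Berlin office.")
remember("Refunds over 500 EUR need a second sign-off.")
# With the placeholder embedding the ranking is arbitrary; with a real embedding
# model the most relevant snippet would surface. Retrieved text would be prepended
# to the model's prompt - its weights never change.
print(retrieve("Who handles Q-prefixed invoices?", k=1))
```

The point of the sketch is only the mechanism: the model stays frozen while the memory grows, which is one way to approximate on-the-job learning within the “train-do” paradigm.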