AI Risk and Engineering Practice
I've been reading a great deal about AGI risk, and about the arguments that AGI poses a potential existential risk. I find the conversation severely lacking. I have not seen any real engagement with, or explanation of, why AGI risk is uniquely different in kind or extent from the risks posed by any number of existing technologies.
I believe Tyler is spot on – there are no properly developed models of AI/AGI risk: https://marginalrevolution.com/marginalrevolution/2023/04/this-gpt-4-answer-speaks-for-itself.html There are plenty of rambling, connect-the-dots arguments, which at best demonstrate some vague level of potential risk. But if arguments like those are the best the AI doomers can come up with, what are they really telling us?
Then there are proposals for AI safety such as these: https://marginalrevolution.com/marginalrevolution/2023/04/ideas-for-regulating-ai-safety.html All of them are essentially proposals for government regulation, or for the use of government power, yet I haven't seen a meaningful elucidation of what precisely the risk or failure mechanism of AI systems is. Number 9 in particular betrays an apparent ignorance of regulatory systems and of commercial engineering practice more generally. Why is a 'safe harbor' needed? Numerous engineering bodies already provide forums for coordination among private companies developing products of many kinds, with clear frameworks and established best practices. These forums exist and function today, and many of them have done so for decades. Why are so many who worry about AI risk so ignorant of how engineering and technical development actually work?
And what of the extensive existing literature on the incidence and avoidance of failure in major engineering systems, including highly complex ones such as nuclear reactors, weapon systems, aircraft, spacecraft, life-support systems, traffic control, and vehicular safety? That body of work is probably the closest thing we already have to AI safety. Is it so obvious that it's irrelevant to the current conversation, and if so, why? I haven't seen much to suggest that the AI safety community has seriously engaged with and digested that work, or even that it's aware of it at all. Instead we have a community proposing world-government-style fixes for a broad set of often fanciful scenarios. A classic case of misalignment.
While we shouldn't assume that the specific lessons of the existing engineering safety literature and practice map directly onto AI safety, I believe they at least demonstrate a framework we can apply more generally to analyze and systematize risk. Some of the core tenets: understand failure modes, broadly construed, and identify interlocks and failsafes that can interrupt them. In practice these often take forms like breaking highly complex functions into smaller parts (chokepoints) that can be monitored; putting humans in the loop, with appropriate visibility into specific parts; literal air gaps between functions, with positive human action required to move between functions or to go beyond a certain extent within a given function; and continuous monitoring and limiting tools feeding into kill switches.
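To make that concrete, here is a minimal sketch of what a monitored chokepoint with a human-in-the-loop interlock and a kill switch might look like in software. It is illustrative only: the class, the thresholds, and the human_approver hook are hypothetical placeholders, not any particular system's API.

```python
# Minimal illustrative sketch of a monitored chokepoint: every externally
# visible action passes through one gate that enforces a rate limit
# (continuous monitoring feeding a kill switch) and requires positive
# human approval for high-impact actions. All names and thresholds are
# hypothetical.

import time


class KillSwitchTripped(Exception):
    """Raised once the monitor has halted the system."""


class Interlock:
    def __init__(self, max_actions_per_minute=10, approval_threshold=0.8):
        self.max_actions_per_minute = max_actions_per_minute
        self.approval_threshold = approval_threshold
        self.killed = False
        self._action_times = []

    def kill(self, reason):
        # Positive, sticky stop: once tripped, nothing passes the gate again.
        self.killed = True
        print(f"KILL SWITCH: {reason}")

    def gate(self, action, impact_score, human_approver):
        """Chokepoint through which every externally visible action must pass.

        `action` is a zero-argument callable; `impact_score` is an estimate
        (0..1) of how consequential the action is; `human_approver` is a
        callback standing in for positive human action.
        """
        if self.killed:
            raise KillSwitchTripped("system is halted")

        # Continuous monitoring: trip the kill switch if the action rate
        # exceeds what the intended scope should ever require.
        now = time.time()
        self._action_times = [t for t in self._action_times if now - t < 60]
        self._action_times.append(now)
        if len(self._action_times) > self.max_actions_per_minute:
            self.kill("action rate exceeded intended scope")
            raise KillSwitchTripped("rate limit exceeded")

        # Human in the loop: high-impact actions require explicit approval.
        if impact_score >= self.approval_threshold and not human_approver(action):
            return None  # blocked at the interlock

        return action()
```

The specific numbers are beside the point; the shape is what the engineering literature emphasizes: a narrow, observable gate between the complex system and the outside world, with the monitor and the stop sitting outside the thing being monitored.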
Another common error is the commingling of two very different categories of risk. The risk profile of a badly designed or badly regulated AI is very different from that of a well-designed but nefariously purposed AI. The former is a lot more like the engineering problems we are (to some extent) used to dealing with. The latter is a different animal entirely, much more like nuclear proliferation than like classical engineering safety problems.
The nefarious-actor AI problem is really just the fear that nefarious actors could become far more intellectually capable. Can a Clipper Chip in every AI really solve that?
The badly behaving AI, whether misaligned or otherwise, is perhaps the more tractable problem. The ability of any system, with or without AI, to inflict damage on the world is limited by its ability to affect the actual physical world. Eliezer's paperclip machine, in order to threaten human prosperity, would need a pretty large physical footprint, and would need to be able to defend itself against our attacks – which we would presumably mount if we believed it was doing something as undesirable as Eliezer fears. Here, surveillance is the key: if an AI-driven system (or indeed any system) appears to be building up capability beyond its intended scope, stop that system.
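For illustration, that kind of surveillance could be as simple as comparing independently observed resource use against the system's declared scope and stopping it on divergence. The metric names, limits, and shutdown hook below are hypothetical; in a real deployment the stop would be a physical disconnect under independent human control, not a function call.

```python
# Toy surveillance loop: compare independently observed resource and
# capability metrics against the system's declared scope, and stop the
# system when they diverge. All names and numbers are hypothetical.

DECLARED_SCOPE = {
    "compute_nodes": 8,               # hardware the system is supposed to occupy
    "network_egress_gb_per_hr": 1.0,  # expected outbound traffic
    "spend_usd_per_day": 100.0,       # expected resource acquisition
}


def observe_metrics():
    # In practice these would come from telemetry independent of the
    # monitored system, not from its own self-reporting.
    return {
        "compute_nodes": 8,
        "network_egress_gb_per_hr": 0.4,
        "spend_usd_per_day": 42.0,
    }


def shutdown(reason):
    # Stand-in for a hard stop (power, network, funding) held by humans.
    print(f"STOP: {reason}")


def surveillance_pass(margin=1.5):
    """Return True if the system stays within its declared scope."""
    observed = observe_metrics()
    for metric, limit in DECLARED_SCOPE.items():
        if observed[metric] > limit * margin:
            shutdown(f"{metric} at {observed[metric]} exceeds declared scope of {limit}")
            return False
    return True
```

The point is simply that scope creep in the physical world is observable, and observation is the precondition for stopping it.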