Progress in AI over the last decade has been accompanied by a lack of robustness, rooted in a predictability crisis: highly accurate image classifiers fail on images with an unusual background, robots surprise us with unanticipated reward-hacking behaviour, and powerful language models display unexpected emergent capabilities and dangers when cunningly instructed. What options do we have for anticipating failures and risks when instructing AI systems? Should we infer a set of capabilities as a cognitive footprint of system performance? Or should we delegate predictive surveillance to an external ML-powered assessor? Ultimately, how can we determine the range of (co-)operating conditions, instructions and tasks under which complex AI-human ecosystems will be robustly good? This predictability crisis also affects the future of AI as a whole. Can we anticipate the capabilities and risks of future AI systems when deployed in a social context? Could we find scaling laws in these complex scenarios and at different granularities? In sum, what are the levels of aggregation and cognitive abstraction that lead to more predictability and better AI control?
“Predictable AI” is a community of AI academics, researchers, policy-makers and stakeholders aiming to convert these questions into a fresh perspective on the complementary ideas of machine behaviour, explainable AI, value alignment, cognitive AI evaluation, scaling laws, AI progress, uncertainty estimation, A(G)I safety and instructable AI.
With the support of: