Should we infer a set of capabilities as a cognitive footprint of system performance?
What are the pros and cons of delegating predictions to independent (ML) assessors based on existing experiments?
Can we anticipate the capabilities and risks of future AI systems when deployed in a social context?
Is human-like AI more predictable for humans, or are human-likeness and predictability orthogonal?
Can we determine causation for AI liability based on which situations were predictable? (Also rephrased as a problem below: what is the relation between causation, predictability and AI liability?)
What kind of harmful capabilities can we anticipate AI systems to have?
Is anthropomorphism a significant hindrance to robust AI evaluation?
Are AI Safety and AI Evaluation orthogonal research streams, or two sides of the same coin?
Can predictability of AI be increased through dynamic evaluation?
What’s the role of predictable AI for future industries?
What kind of qualitative (causal, explanatory) inference makes sense from quantitative performance measurements?
What’s the relationship between explainable AI and predictable AI?
When would/should an AI system request help from a human expert? How frequently? What types of collaborations can be set up?
Can mechanistic interpretability research eventually help us predict which capabilities models will develop next during training?
Are current regulatory frameworks adequate to prevent AI from operating unregulated? If not, what new mechanisms need to be developed?
What are the potential problems of making big models available for everyone?
Does interacting with AI using natural language make it more predictable, or just the other way around?
How do neural network (NN) scaling laws compare to other universal scaling laws across complex-systems research (e.g., Geoff West's superlinear scaling laws identified for cities and social phenomena, and Melanie Moses' observations of scaling in computation)?
In scaling laws, could the scaling exponents increase for more distributed systems (like ant colonies or the human swarm)?
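A minimal sketch of what the scaling-law questions above involve empirically: recovering a scaling exponent from (size, loss) measurements, assuming a pure power law L(N) = a·N^(-b). Real neural scaling laws typically also include an irreducible-loss term; all the numbers below are synthetic, for illustration only.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of log L = log a - b * log N; returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return math.exp(intercept), -slope  # prefactor a, exponent b

# Synthetic data generated from L = 2.0 * N^(-0.5):
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [2.0 * n ** -0.5 for n in sizes]
a, b = fit_power_law(sizes, losses)
print(round(a, 3), round(b, 3))  # → 2.0 0.5
```

Comparing fitted exponents like b across domains (NN training, cities, ant colonies) is one concrete way the cross-system comparison could be framed.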
Are there patterns in the ways that AIs fail to generalise?
Can we predict anything about the progress of AI from looking at biological evolution of intelligence?
How accurate have previous forecasts of AI progress been, and how can we learn from their mistakes?
What role does the predictability of AI play in human trust in, and acceptance of, AI?
How can we determine the range of (co-)operating conditions, instructions and tasks where complex AI-human ecosystems will be robustly good?
What are the predictive levels of aggregation and cognitive abstractions that lead to better AI control?
How can we find scaling laws in social scenarios full of emergent phenomena or situations where the law of large numbers doesn't hold?
What tools do we have to anticipate failures and risks when instructing AI systems through natural language, other than red-teaming?
What are good indicators of predictable performance? How can we robustly assess predictability? Could AI evaluation/behaviour repositories be used to test predictability?
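One simple candidate indicator of predictability, sketched below with made-up numbers, is the Brier score of an assessor that predicts per-instance success of an AI system: 0 is perfect, and 0.25 matches an uninformative 50/50 predictor.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared error between predicted success probability (in [0, 1])
    and the observed binary outcome (1 = success, 0 = failure)."""
    return sum((p - y) ** 2
               for p, y in zip(predicted_probs, outcomes)) / len(outcomes)

# Illustrative (fabricated) per-instance predictions and observed outcomes:
preds   = [0.9, 0.8, 0.3, 0.6, 0.1]
results = [1,   1,   0,   1,   0]
print(round(brier_score(preds, results), 3))  # → 0.062
```

A behaviour repository of (prediction, outcome) pairs across systems and tasks would allow such scores to be tracked and compared over time.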
How to perform capability inference from system performance?
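As one hedged illustration of capability inference, a Rasch-style one-parameter logistic model can map pass/fail results on items of known difficulty to a scalar ability estimate. The item difficulties, outcomes and grid-search MLE below are assumptions for illustration, not an established benchmark or method.

```python
import math

def log_likelihood(theta, difficulties, outcomes):
    """Log-likelihood of pass/fail outcomes under P(pass) = sigmoid(theta - d)."""
    ll = 0.0
    for d, y in zip(difficulties, outcomes):
        p = 1.0 / (1.0 + math.exp(-(theta - d)))
        ll += math.log(p) if y else math.log(1.0 - p)
    return ll

def infer_capability(difficulties, outcomes, lo=-5.0, hi=5.0, steps=2001):
    """Grid-search maximum-likelihood estimate of the ability parameter theta."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, difficulties, outcomes))

# A system that passes the easier items and fails the harder ones;
# the inferred theta falls between the hardest pass (0.0) and easiest fail (1.0).
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
outcomes     = [1,    1,    1,   0,   0]
theta = infer_capability(difficulties, outcomes)
```

The point of such a model is that a single inferred parameter, rather than raw aggregate accuracy, supports prediction on unseen items of known difficulty.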
How to frame and evaluate the predictability of general-purpose AI?
What’s the relation between causation, predictability and AI liability?
How can AI systems recognise "bad" or "potentially bad" things and mark them as such?
How to effectively implement human control of AI?
What are possible ways to control technologies that could lead to an unstable future?
How to study the trade-offs between AI predictability and AI competency (or human-AI efficiency)?
How to extrapolate AI capabilities into the future?
How to improve human-computer interaction (HCI) using predictable AI?
How to organize risk management for artificial intelligence systems?
How to harmonize the answers, decisions and solutions from AI systems with the current ethical and ideological constraints, restrictions and rules of human societies?
What methods can describe and detect unexpected behavior of large language models, and mitigate their risks?
What differing ideologies exist regarding AI *predictability*, and what are their potential wider implications for society?
How can we anticipate future distribution shifts and degeneration caused by generative AI in language, images, etc.?
Are scaling laws derived from engineering results, such as language models (LMs), contaminated by the human/science bias of keeping up with the trend?