The fifth Human-aligned AI Summer School will be held in Prague
from 22nd to 25th July 2025.
We will meet for four intensive days of talks, workshops, and
discussions covering the latest trends in AI alignment research and
broader framings of the field.
The school focuses on teaching and exploring approaches and frameworks, less on presenting the latest research
results. The content of the school is mostly technical – we assume attendees understand current ML
approaches and some of the underlying theoretical frameworks.
This year, we plan to explore AI alignment through three core areas:
Technical alignment research: We'll examine current technical approaches including behavioral evaluations,
mechanistic interpretability, scalable oversight, and model organisms of misalignment. We'll discuss recent
developments in these areas and what they tell us about the potential and limitations of these methods.
AI strategy and systemic alignment: We'll explore topics such as timeline considerations, strategic and governance
challenges around powerful AI, economic models of AI development, and risks of gradual disempowerment in a post-AGI
world. We'll focus on building an overall understanding and on how these considerations can inform technical research.
Foundational frameworks: We'll visit research areas relevant to recent AI developments, such as multi-agent
dynamics and cooperation, theories of agency, bounded rationality, and realistic models of goal-directed behavior.
These frameworks help us understand what alignment means in complex environments containing both AIs and humans, and
how to develop appropriate techniques.
The intended audience of the school is researchers interested in learning more about AI alignment, PhD
students, researchers working in ML/AI outside academia, and talented students. We recommend that
participants have a basic general understanding of the AI
alignment problem, although the school does not assume deep knowledge of the topic.
The school consists of lectures and topical series, focused
smaller-group workshops and discussions, expert panels, and
opportunities for networking, project brainstorming and informal
discussions.
A detailed program will be announced shortly before the event. See
the program of the previous school
for an illustration of its content and structure.
Lecture recordings from the previous school
Recordings of these lectures are available on YouTube.
Fazl Barez – University of Oxford. Fazl works on mechanistic interpretability, investigating how neural networks represent and process concepts, among other topics. His research often focuses on "superposition" and low-level representations, and on building causal maps of networks' internal machinery to better predict their behavior on novel inputs.
Standa Fort – Stealth startup & ex-Anthropic & ex-DeepMind. Standa applies methods from physics and information theory to investigate the theoretical foundations of deep learning. His research often explores the geometry of neural network loss landscapes and the dynamics of training, using tools like the Neural Tangent Kernel to better understand how and why these complex systems learn and generalize. More recently, he has been working on the robustness and security of AI systems.
Lewis Hammond – Cooperative AI Foundation & University of Oxford. Lewis works on the strategic challenges of multi-agent AI systems, investigating how to ensure safe and beneficial interactions, and AI governance. Among other topics, his research includes game-theoretic models of risks like collusion, multi-agent reinforcement learning, and novel frameworks such as causal games to formalize concepts like intention and harm.
Evan Hubinger (online) – Anthropic. Evan works on anticipating and mitigating catastrophic risks from advanced AI, investigating how models might fail in dangerous ways. His research areas include deceptive alignment, where a system strategically appears aligned during training, building "sleeper agent" models that feign helpfulness to pursue hidden goals, and finding and demonstrating other instances of model misalignment and how we can prevent them.
Vojta Kovarik – Czech Technical University. Vojta works on the theoretical and game-theoretic foundations of AI safety, investigating how to design robust evaluation and oversight schemes, how to prevent systems from misbehaving due to issues like Goodhart's law, and questions of multi-agent alignment, among other topics.
Jan Kulveit – Alignment of Complex Systems, Charles University. Jan works on the systemic and macro-strategic risks from AI, investigating how the integration of AI into society could go wrong. His recent research focuses on the risk of "gradual disempowerment," where humanity could lose control not through a sudden takeover, but by ceding authority to AI-driven economic and cultural systems that become misaligned with our core values, and on "LLM psychology," investigating how we can understand and model the high-level behavior of large language models.
Gavin Leech – LCFI, University of Cambridge & Arb Research. Gavin Leech's recent work focuses on projects preparing for short AGI timelines. His other projects include research on systematic problems in ML, the replication crisis and philosophy of science, and, within Arb Research, a wide range of other topics related to AI alignment.
Nathaniel Sauerberg – University of Texas at Austin. Nathaniel works on foundational ideas for cooperative AI, investigating how to design agents that can achieve better outcomes in strategic interactions. His research often uses game theory to develop novel ways of modifying interactions, such as creating "safe Pareto improvements" that may robustly lead to more cooperative results.
Noah Siegel – Google DeepMind. Noah Siegel works on scalable oversight and the faithfulness of model reasoning, investigating how we can understand and control the behavior of advanced AI systems. His research at DeepMind often focuses on ensuring that an LLM's stated explanations for its decisions are faithful to its actual computational process, and on developing methods like debate to allow weaker systems to reliably supervise stronger ones.
Torben Swoboda – KU Leuven & Vlerick Business School. Torben works at the intersection of philosophy, computer science, and economics, investigating how different ethical frameworks can be formally implemented in AI systems. His research at KU Leuven often explores how reinforcement learning can be used to embody moral values in agents and analyzes the philosophical arguments surrounding AI risk, power dynamics, and governance.
We are in the process of confirming more speakers and will
update the list over time. Our past speakers include researchers from Google DeepMind, Anthropic, University of
Cambridge, Future of Humanity Institute, UC Berkeley, MIRI, and others. See e.g.
the list of speakers from the previous school.
We recommend applying early, as applications are evaluated on a
rolling basis and capacity is limited. In particular, we expect to
fill most spaces in the second half of June.
The price of the school is EUR 300, with a discounted price of EUR 150
for students and independent researchers, and includes full catering
and conference events.
We have a limited number of scholarships for participants from
disadvantaged backgrounds and in difficult situations, covering the
price of the school and partially subsidizing travel and
accommodation costs. If you require financial support to attend the
school, please indicate this in your application and we will contact
you with a separate application for support.
Proposing workshop sessions
The school will include several smaller-group workshop sessions,
consisting of presentations of new topics, interactive activities, and
discussions of narrower or more technical topics, offered by both invited
speakers and selected participants of the school.
If you are interested in presenting a workshop, please send us a brief
description of the workshop content (about half a page) along with the
intended format and target audience, noting your expertise in the
area. Note that the expectation is not necessarily to present your own
work, but rather to provide content and activities that enrich the
school for other participants. If we think your workshop is a good fit,
we will get back to you to discuss its content in more detail and help
you refine it for the school.