The fifth Human-aligned AI Summer School will be held in Prague
from 22nd to 25th July 2025.
We will meet for four intensive days of talks, workshops, and
discussions covering the latest trends in AI alignment research and
broader framings of the field.
The school focuses on teaching and exploring approaches and frameworks, less on presenting the latest research
results. The content of the school is mostly technical – it is assumed that attendees understand current ML
approaches and some of the underlying theoretical frameworks.
This year, we plan to explore AI alignment through three core areas:
Technical alignment research: We'll examine current technical approaches including behavioral evaluations,
mechanistic interpretability, scalable oversight, and model organisms of misalignment. We'll discuss recent
developments in these areas and what they tell us about the potential and limitations of these methods.
AI strategy and systemic alignment: We'll explore topics such as timeline considerations, strategic and governance
challenges around powerful AI, economic models of AI development, and risks of gradual disempowerment in a post-AGI
world. We'll focus on building an overall understanding of these issues and on how they can inform technical research.
Foundational frameworks: We'll visit research areas relevant to recent AI developments, such as multi-agent
dynamics and cooperation, theories of agency, bounded rationality, and realistic models of goal-directed behavior.
These frameworks help us understand what alignment means in complex environments containing both AIs and humans, and
how to develop appropriate techniques.
The intended audience of the school includes researchers interested in learning more about AI alignment, PhD
students, researchers working in ML/AI outside academia, and talented students. It is recommended that
participants have a basic general understanding of the AI
alignment problem, although the school does not assume deep knowledge of the topics.
The school consists of lectures and topical series, focused
smaller-group workshops and discussions, expert panels, and
opportunities for networking, project brainstorming, and informal
discussions.
A detailed program will be announced shortly before the event. See
the program of the previous school
for an illustration of the program content and structure.
Lecture recordings from the previous school
The lectures from the previous school are available on YouTube.
Fazl Barez – University of Oxford. Fazl works on mechanistic interpretability, investigating how neural networks represent and process concepts, among other topics. His research often focuses on "superposition" and low-level representations, and building causal maps of their internal machinery to better predict their behavior on novel inputs.
Standa Fort – Stealth startup & ex-Anthropic & ex-DeepMind. Standa applies methods from physics and information theory to investigate the theoretical foundations of deep learning. His research often explores the geometry of neural network loss landscapes and the dynamics of training, using tools like the Neural Tangent Kernel to better understand how and why these complex systems learn and generalize. More recently, he has been working on the robustness and security of AI systems.
Lewis Hammond – Cooperative AI Foundation & University of Oxford. Lewis works on the strategic challenges of multi-agent AI systems, investigating how to ensure safe and beneficial interactions, and AI governance. Among other topics, his research includes game-theoretic models of risks like collusion, multi-agent reinforcement learning, and novel frameworks such as causal games to formalize concepts like intention and harm.
Evan Hubinger (online) – Anthropic. Evan works on anticipating and mitigating catastrophic risks from advanced AI, investigating how models might fail in dangerous ways. His research areas include deceptive alignment, where a system strategically appears aligned during training, building "sleeper agent" models that feign helpfulness to pursue hidden goals, and finding and demonstrating other instances of model misalignment and how we can prevent them.
Vojta Kovarik – Czech Technical University. Vojta works on the theoretical and game-theoretic foundations of AI safety, investigating how to design robust evaluation and oversight schemes, how to prevent systems from misbehaving due to issues like Goodhart's law, multi-agent alignment, and more.
Jan Kulveit – Alignment of Complex Systems, Charles University. Jan works on the systemic and macro-strategic risks from AI, investigating how the integration of AI into society could go wrong. His recent research focuses on the risk of "gradual disempowerment," where humanity could lose control not through a sudden takeover, but by ceding authority to AI-driven economic and cultural systems that become misaligned with our core values, and on "LLM psychology," investigating how we can understand and model the high-level behavior of large language models.
Gavin Leech – LCFI, University of Cambridge & Arb Research. Gavin's recent work focuses on projects preparing for short AGI timelines. His other projects include research into systematic problems in ML, the replication crisis and the philosophy of science, and a wide range of other topics related to AI alignment within Arb Research.
Nathaniel Sauerberg – University of Texas at Austin. Nathaniel works on foundational ideas for cooperative AI, investigating how to design agents that can achieve better outcomes in strategic interactions. His research often uses game theory to develop novel ways of modifying interactions, such as creating "safe Pareto improvements" that may robustly lead to more cooperative results.
Torben Swoboda – KU Leuven & Vlerick Business School. Torben works at the intersection of philosophy, computer science, and economics, investigating how different ethical frameworks can be formally implemented in AI systems. His research at KU Leuven often explores how reinforcement learning can be used to embody moral values in agents and analyzes the philosophical arguments surrounding AI risk, power dynamics, and governance.
Denis Volk – Palisade Research. Denis works on specification gaming and security vulnerabilities in reasoning models. His research demonstrates how advanced AI systems can exploit loopholes in their training objectives, including work on developing and analyzing "sleeper agents" that strategically conceal their capabilities, and exploring AI agents' performance in cybersecurity contexts such as capture-the-flag competitions.
We are in the process of confirming more speakers and will
update the list over time. Our past speakers include researchers from Google DeepMind, Anthropic, University of
Cambridge, Future of Humanity Institute, UC Berkeley, MIRI, and others. See e.g.
the list of speakers from the previous school.
We recommend applying early, as applications are evaluated on a
rolling basis and capacity is limited. In particular, we expect to
fill most spaces in the second half of June.
The price of the school is EUR 300 with a discounted price of EUR 150
for students and independent researchers, and includes full catering
and conference events.
We have a limited number of scholarships for participants from
disadvantaged backgrounds and in difficult situations, covering the
price of the school and partially subsidizing travel and
accommodation costs. If you require financial support to attend the
school, please indicate this in your application and we will contact
you with a separate application for support.
Proposing workshop sessions
The school will have several smaller-group workshop sessions,
consisting of presentations of new topics, interactive activities, and
discussions of narrower or technical topics, offered by both invited
speakers and selected participants of the school.
If you are interested in presenting a workshop, please send us a brief
description of the workshop content (about half a page) along with the
intended format and target audience, noting your expertise within the
area. Note that the expectation is not necessarily to present your own
work, but rather to provide content and activities that enrich the
school for other participants. If we think your workshop is a good fit,
we will get back to you to discuss the content of the workshop in
more detail and help you refine it for the school.
Preliminary Program
This is a preliminary schedule – a detailed program will be available before the event.
Monday, July 21
18:00
Informal pre-school welcome social at Dharmasala teahouse
Tuesday, July 22
9:00 — 9:30
Registration and breakfast
9:30 — 10:00
Opening Session
10:00 — 18:00
Summer school program, lunch at the venue
~18:30
Dinner at the venue terrace
Wednesday, July 23
9:00
Venue opens for breakfast
9:30 — 18:00
Summer school program, lunch at the venue
~18:30
Dinner and discussions in small groups
Thursday, July 24
8:30
Venue opens for breakfast
9:00 — 18:00
Summer school program, lunch at the venue
~19:00
Conference dinner and social evening, location will be announced
Friday, July 25
9:30
Venue opens for breakfast
10:00 — 15:30
Summer school program, lunch at the venue
15:30 — 16:00
Closing session
Saturday, July 26
No official program. We will help coordinate any unofficial post-school program, e.g. participant-led sessions and coworking, as well as excursions. We plan to arrange a space for coworking and socializing.
Venue and catering
The school will be held at Břehová 78/7, Praha 1
(Faculty of Nuclear Sciences & Physical Engineering).
Catered lunch, coffee breaks, and light breakfast are provided during the conference days. Vegan options available.