The fifth Human-aligned AI Summer School will be held in Prague from 22nd
to 25th July 2025.
We will meet for four intensive days of talks, workshops, and discussions
covering the latest trends in AI alignment research as well as broader
framings of the field.
Format of the school
The school is focused on teaching and exploring approaches and frameworks,
less on presenting the latest research results. The content of the school is
mostly technical – attendees are assumed to understand current ML approaches
and some of the underlying theoretical frameworks.
This year, we plan to explore AI alignment through three core areas:
Technical alignment research: We'll examine current technical
approaches including behavioral evaluations, mechanistic interpretability,
scalable oversight, and model organisms of misalignment. We'll discuss
recent developments in these areas and what they tell us about the potential
and limitations of these methods.
AI strategy and systemic alignment: We'll explore topics such as
timeline considerations, strategic and governance challenges around powerful
AI, economic models of AI development, and risks of gradual disempowerment
in a post-AGI world. We'll focus on building overall understanding and how
these considerations can inform technical research.
Foundational frameworks: We'll visit research areas relevant to
recent AI developments, such as multi-agent dynamics and cooperation,
theories of agency, bounded rationality, and realistic models of
goal-directed behavior. These frameworks help us understand what alignment
means in complex environments containing both AIs and humans, and how to
develop appropriate techniques.
The intended audience of the school includes researchers interested in
learning more about AI alignment, PhD students, researchers working in ML/AI
outside academia, and talented students. It is recommended that participants
have a basic general understanding of the AI alignment problem, although the
school does not assume deep knowledge of the topics.
The school consists of lectures and topical series, focused smaller-group
workshops and discussions, expert panels, and opportunities for networking,
project brainstorming and informal discussions.
Lecture recordings from the previous school
The lectures from the previous school are available on YouTube.
Speakers
Ondřej Bajgar – University of Oxford. Ondřej works at
the intersection of technical value learning and AI governance,
investigating how to align AI with human preferences while ensuring robust
safety through principled constraints. His research includes developing more
efficient methods for Bayesian inverse reinforcement learning (IRL) and
other applications of quantified uncertainty in AI alignment, and proposing
novel frameworks for AI regulation, most notably the use of negative human
rights as a foundation for both international policy and technical safety
specifications.
Fazl Barez – University of Oxford. Fazl works on
mechanistic interpretability, investigating how neural networks represent
and process concepts, among other topics. His research often focuses on
"superposition" and low-level representations, building causal maps of
networks' internal machinery to better predict their behavior on novel inputs.
Stanislav Fort –
Stealth startup & ex-Anthropic & ex-DeepMind.
Standa applies methods from physics and information theory to investigate
the theoretical foundations of deep learning. His research often explores
the geometry of neural network loss landscapes and the dynamics of training,
using tools like the Neural Tangent Kernel to better understand how and why
these complex systems learn and generalize. More recently, he has been working
on the robustness and security of AI systems.
Tomáš Gavenčiak –
Alignment of Complex Systems, Charles University.
Tomáš works on the strategic risks emerging from complex multi-agent AI
systems, investigating how large-scale interactions between autonomous
agents can lead to systemic failures. He applies methods from game theory,
active inference and complex systems modeling to analyze the dynamics of AI
ecosystems. He is also interested in building tools for human cognition and
cyborgism.
Lewis Hammond –
Cooperative AI Foundation & University of Oxford.
Lewis works on the strategic challenges of multi-agent AI systems and AI
governance, investigating how to ensure safe and beneficial interactions.
Among other topics, his research includes game-theoretic models of risks like
collusion, multi-agent reinforcement learning, and novel frameworks such as
causal games that formalize concepts like intention and harm.
Evan Hubinger (online) – Anthropic. Evan works on
anticipating and mitigating catastrophic risks from advanced AI,
investigating how models might fail in dangerous ways. His research areas
include deceptive alignment, where a system strategically appears aligned
during training, building "sleeper agent" models that feign helpfulness to
pursue hidden goals, and finding and demonstrating other instances of model
misalignment and how we can prevent them.
Vojta Kovarik – Czech Technical University. Vojta works
on the theoretical and game-theoretic foundations of AI safety,
investigating topics such as the design of robust evaluation and oversight
schemes, preventing systems from misbehaving due to issues like Goodhart's
law, and multi-agent alignment.
Jan Kulveit –
Alignment of Complex Systems, Charles University.
Jan works on the systemic and macro-strategic risks from AI, investigating
how the integration of AI into society could go wrong. His recent research
focuses on the risk of "gradual disempowerment," where humanity could lose
control not through a sudden takeover, but by ceding authority to AI-driven
economic and cultural systems that become misaligned with our core values,
and on "LLM psychology," investigating how we can understand and model the
high-level behavior of large language models.
Gavin Leech –
LCFI, University of Cambridge & Arb Research.
Gavin Leech's recent work is focused on projects preparing for short AGI
timelines. His other projects include research on systematic problems in ML,
on the replication crisis and the philosophy of science, and on a wide range
of other topics related to AI alignment within Arb Research.
Fernando Rosas –
University of Sussex & Imperial College London.
Fernando applies the mathematical tools of complexity science and
information theory to fundamental questions in AI safety and
interpretability. His research investigates the universal principles
governing complex information-processing systems, drawing parallels between
computational neuroscience and AI. His recent work introduces the "AI in a
vat" framework to analyze the fundamental trade-offs between a simulated
world's computational efficiency and its faithfulness and interpretability.
Nathaniel Sauerberg – University of Texas at Austin.
Nathaniel works on foundational ideas for cooperative AI, investigating how
to design agents that can achieve better outcomes in strategic interactions.
His research often uses game theory to develop novel ways of modifying
interactions, such as creating "safe Pareto improvements" that may robustly
lead to more cooperative results.
Torben Swoboda – KU Leuven & Vlerick Business School.
Torben works at the intersection of philosophy, computer science, and
economics, investigating how different ethical frameworks can be formally
implemented in AI systems. His research at KU Leuven often explores how
reinforcement learning can be used to embody moral values in agents and
analyzes the philosophical arguments surrounding AI risk, power dynamics,
and governance.
Chris van Merwijk – University of Oxford. Chris works on
the foundational theory of inner alignment, investigating how learned models
can develop internal objectives that diverge from the goals they were
trained to pursue. He is a primary author of the seminal work introducing
"mesa-optimization," a framework that formalizes how a trained model can
itself become an optimizer with a misaligned "mesa-objective." His research
provides the core concepts, including deceptive alignment, that underpin
much of the modern study of inner alignment failures.
Denis Volk – Palisade Research. Denis works on
specification gaming and security vulnerabilities in reasoning models. His
research demonstrates how advanced AI systems can exploit loopholes in their
training objectives, including work on developing and analyzing "sleeper
agents" that strategically conceal their capabilities, and exploring AI
agents' performance in cybersecurity contexts such as capture-the-flag
competitions.
Registration and fees
Applications for the summer school are closed. Contact us at
haaiss@acsresearch.org for
last-minute updates and questions.
The price of the school is EUR 300 with a discounted price of EUR 150 for
students and independent researchers, and includes full catering and
conference events.
We have a limited number of scholarships for participants from disadvantaged
backgrounds and in difficult situations, covering the price of the school and
partially subsidizing travel and accommodation costs. If you require financial
support to attend the school, please indicate this in your application and we
will contact you with an application for support.
Proposing workshop sessions
The school will have several smaller-group workshop sessions, consisting of
presentations of new topics, interactive activities, and discussions of
narrower or technical topics, offered by both invited speakers and selected
participants of the school.
If you are interested in presenting a workshop, please send us a brief
description of the workshop content (about half a page) along with the
intended format and target audience, noting your expertise in the area.
Note that the expectation is not necessarily to present your own work, but
rather to provide content and activities that enrich the school for other
participants. If we think your workshop is a good fit, we will get back to you
to discuss the content in more detail and help you refine it for the school.
Preliminary Program
This is a preliminary schedule – please check back for updates.
Tuesday, July 22
Nathaniel Sauerberg: Foundations of Cooperative AI
18:30
Dinner on the terrace, topical tables
Wednesday, July 23
8:30 — 9:00
Breakfast
9:00 — 10:20
Evan Hubinger: Auditing Language Models for Hidden Objectives (Remote talk)
10:20 — 10:50
Coffee break
10:50 — 11:20
Denis Volk: Specification gaming by AI agents
11:20 — 11:50
Short talks:
Martin Leitgab: Forecasting X-risk Relevant LLM Preferences & Decisions
Ondrej Krasa: Conceptual Limits of Mechanistic Interpretability
Chris Pang: The Quest for a Human Inductive Bias
11:50 — 12:30
Lightning talks
12:30 — 14:00
Lunch
14:00 — 18:00
David F. Wagner: Flying Machines: Possibility and Peril (LARP, in
parallel)
14:00 — 15:20
Jan Kulveit: Gradual Disempowerment
David Hyland: Bounded Rationality and Human-AI Alignment: Why Limits Matter
15:20 — 15:50
Coffee break
15:50 — 17:00
Workshops (parallel, TBA)
17:00 — 17:10
Break
17:10 — 18:00
Structured discussions
18:00 +
Dinners in small groups
Thursday, July 24
9:00 — 9:30
Breakfast
9:30 — 10:30
Stanislav Fort: Robustness and Security in the Age of Massive
Multi-modal AI Models
10:30 — 11:00
Coffee break
11:00 — 12:30
Torben Swoboda: AI Developer Control Problem
Lewis Hammond: Doing Effective Research, Effectively
12:30 — 14:00
Lunch
14:00 — 18:00
David F. Wagner: Flying Machines: Possibility and Peril (LARP, in
parallel)
Fernando Rosas: Identifying Abstractions
Vojta Kovařík / Ondřej Bajgar: TBA
Friday, July 25
12:30 — 14:00
Lunch
14:00 — 15:00
Panel discussion: Agendas
15:00 — 15:10
Short break
15:10 — 15:50
Closing session
16:00
Schelling point for meetings
Saturday, July 26
No official program. We will help coordinate any unofficial post-school
program, e.g. participant-led sessions and coworking, as well as excursions.
We plan to arrange a space for coworking and socializing.
Venues and catering
The school will be held at
Břehová 78/7, Praha 1
(Faculty of Nuclear Sciences & Physical Engineering, Czech Technical
University).
Catered lunches, coffee breaks, and a light breakfast are provided during the
conference days, with vegan options available. All school events are covered
by the registration fee.