The fifth Human-aligned AI Summer School will be held in Prague from 22nd
to 25th July 2025.
We will meet for four intensive days of talks, workshops, and discussions
covering the latest trends in AI alignment research as well as broader
framings of the field.
Format of the school
The school is focused on teaching and exploring approaches and frameworks,
less on presenting the latest research results. The content of the school is
mostly technical – attendees are assumed to understand current ML approaches
and some of the underlying theoretical frameworks.
This year, we plan to explore AI alignment through three core areas:
Technical alignment research: We'll examine current technical
approaches including behavioral evaluations, mechanistic interpretability,
scalable oversight, and model organisms of misalignment. We'll discuss
recent developments in these areas and what they tell us about the potential
and limitations of these methods.
AI strategy and systemic alignment: We'll explore topics such as
timeline considerations, strategic and governance challenges around powerful
AI, economic models of AI development, and risks of gradual disempowerment
in a post-AGI world. We'll focus on building overall understanding and how
these considerations can inform technical research.
Foundational frameworks: We'll visit research areas relevant to
recent AI developments, such as multi-agent dynamics and cooperation,
theories of agency, bounded rationality, and realistic models of
goal-directed behavior. These frameworks help us understand what alignment
means in complex environments containing both AIs and humans, and how to
develop appropriate techniques.
The intended audience of the school includes researchers interested in
learning more about AI alignment, PhD students, researchers working in ML/AI
outside academia, and talented students. It is recommended that participants
have a basic general understanding of the AI alignment problem, although the
school does not assume deep knowledge of the topics.
The school consists of lectures and topical series, focused smaller-group
workshops and discussions, expert panels, and opportunities for networking,
project brainstorming and informal discussions.
Lecture recordings from the previous school
The lectures from the previous school are available on YouTube.
Speakers
Ondřej Bajgar – University of Oxford. Ondřej works at
the intersection of technical value learning and AI governance,
investigating how to align AI with human preferences while ensuring robust
safety through principled constraints. His research includes developing more
efficient methods for Bayesian inverse reinforcement learning (IRL) and
other applications of quantified uncertainty in AI alignment, and proposing
novel frameworks for AI regulation, most notably the use of negative human
rights as a foundation for both international policy and technical safety
specifications.
Fazl Barez – University of Oxford. Fazl works on
mechanistic interpretability, investigating how neural networks represent
and process concepts, among other topics. His research often focuses on
"superposition" and low-level representations, building causal maps of
networks' internal machinery to better predict their behavior on novel inputs.
Stanislav Fort –
Stealth startup & ex-Anthropic & ex-DeepMind.
Standa applies methods from physics and information theory to investigate
the theoretical foundations of deep learning. His research often explores
the geometry of neural network loss landscapes and the dynamics of training,
using tools like the Neural Tangent Kernel to better understand how and why
these complex systems learn and generalize. More recently, he has been working
on the robustness and security of AI systems.
Tomáš Gavenčiak –
Alignment of Complex Systems, Charles University.
Tomáš works on the strategic risks emerging from complex multi-agent AI
systems, investigating how large-scale interactions between autonomous
agents can lead to systemic failures. He applies methods from game theory,
active inference and complex systems modeling to analyze the dynamics of AI
ecosystems. He is also interested in building tools for human cognition and
cyborgism.
Lewis Hammond –
Cooperative AI Foundation & University of Oxford.
Lewis works on the strategic challenges of multi-agent AI systems and AI
governance, investigating how to ensure safe and beneficial interactions.
Among other topics, his research includes game-theoretic models of risks like
collusion, multi-agent reinforcement learning, and novel frameworks such as
causal games that formalize concepts like intention and harm.
Evan Hubinger (online) – Anthropic. Evan works on
anticipating and mitigating catastrophic risks from advanced AI,
investigating how models might fail in dangerous ways. His research areas
include deceptive alignment, where a system strategically appears aligned
during training, building "sleeper agent" models that feign helpfulness to
pursue hidden goals, and finding and demonstrating other instances of model
misalignment and how we can prevent them.
Vojta Kovarik – Czech Technical University. Vojta works
on the theoretical and game-theoretic foundations of AI safety,
investigating topics such as the design of robust evaluation and oversight
schemes, preventing systems from misbehaving due to issues like Goodhart's
law, and multi-agent alignment.
Jan Kulveit –
Alignment of Complex Systems, Charles University.
Jan works on the systemic and macro-strategic risks from AI, investigating
how the integration of AI into society could go wrong. His recent research
focuses on the risk of "gradual disempowerment," where humanity could lose
control not through a sudden takeover, but by ceding authority to AI-driven
economic and cultural systems that become misaligned with our core values,
and on "LLM psychology," investigating how we can understand and model the
high-level behavior of large language models.
Gavin Leech –
LCFI, University of Cambridge & Arb Research.
Gavin Leech's recent work is focused on projects preparing for short AGI
timelines. His other projects include research on systematic problems in ML,
on the replication crisis and the philosophy of science, and on a wide range
of other topics related to AI alignment within Arb Research.
Fernando Rosas –
University of Sussex & Imperial College London.
Fernando applies the mathematical tools of complexity science and
information theory to fundamental questions in AI safety and
interpretability. His research investigates the universal principles
governing complex information-processing systems, drawing parallels between
computational neuroscience and AI. His recent work introduces the "AI in a
vat" framework to analyze the fundamental trade-offs between a simulated
world's computational efficiency and its faithfulness and interpretability.
Nathaniel Sauerberg – University of Texas at Austin.
Nathaniel works on foundational ideas for cooperative AI, investigating how
to design agents that can achieve better outcomes in strategic interactions.
His research often uses game theory to develop novel ways of modifying
interactions, such as creating "safe Pareto improvements" that may robustly
lead to more cooperative results.
Torben Swoboda – KU Leuven & Vlerick Business School.
Torben works at the intersection of philosophy, computer science, and
economics, investigating how different ethical frameworks can be formally
implemented in AI systems. His research at KU Leuven often explores how
reinforcement learning can be used to embody moral values in agents and
analyzes the philosophical arguments surrounding AI risk, power dynamics,
and governance.
Chris van Merwijk – University of Oxford. Chris works on
the foundational theory of inner alignment, investigating how learned models
can develop internal objectives that diverge from the goals they were
trained to pursue. He is a primary author of the seminal work introducing
"mesa-optimization," a framework that formalizes how a trained model can
itself become an optimizer with a misaligned "mesa-objective." His research
provides the core concepts, including deceptive alignment, that underpin
much of the modern study of inner alignment failures.
Denis Volk – Palisade Research. Denis works on
specification gaming and security vulnerabilities in reasoning models. His
research demonstrates how advanced AI systems can exploit loopholes in their
training objectives, including work on developing and analyzing "sleeper
agents" that strategically conceal their capabilities, and exploring AI
agents' performance in cybersecurity contexts such as capture-the-flag
competitions.
Registration and fees
Applications for the summer school are closed. Contact us at
haaiss@acsresearch.org for
last-minute updates and questions.
The price of the school is EUR 300 with a discounted price of EUR 150 for
students and independent researchers, and includes full catering and
conference events.
We have a limited number of scholarships for participants from disadvantaged
backgrounds and in difficult situations, covering the price of the school and
partially subsidizing travel and accommodation costs. If you require financial
support to attend the school, please indicate this in your application and we
will contact you with an application for support.
Proposing workshop sessions
The school will have several smaller-group workshop sessions, consisting of
presentations of new topics, interactive activities, and discussions of
narrower or technical topics, offered by both invited speakers and selected
participants of the school.
If you are interested in presenting a workshop, please send us a brief
description of the workshop content (about half a page) along with the
intended format and target audience, noting your expertise in the area.
Note that the expectation is not necessarily to present your own work, but
rather to provide content and activities that enrich the school for other
participants. If we think your workshop is a good fit, we will get back to you
to discuss the content in more detail and help you refine it for the school.
Preliminary Program
This is a preliminary schedule – please check back for updates.
Tuesday, July 22
Nathaniel Sauerberg: Foundations of Cooperative AI
18:30
Dinner on the terrace, topical tables
Wednesday, July 23
8:30 — 9:00
Breakfast
9:00 — 10:20
Evan Hubinger: Auditing Language Models for Hidden Objectives (Remote talk)
10:20 — 10:50
Coffee break
10:50 — 11:20
Denis Volk: Specification gaming by AI agents
11:20 — 11:50
Short talks:
Martin Leitgab: Forecasting X-risk Relevant LLM Preferences & Decisions
Ondrej Krasa: Conceptual Limits of Mechanistic Interpretability
Chris Pang: The Quest for a Human Inductive Bias
11:50 — 12:30
Lightning talks
12:30 — 14:00
Lunch
14:00 — 18:00
David F. Wagner: Flying Machines: Possibility and Peril (LARP, in
parallel)
14:00 — 15:20
Jan Kulveit: Gradual Disempowerment
David Hyland: Bounded Rationality and Human-AI Alignment: Why Limits Matter
15:20 — 15:50
Coffee break
15:50 — 17:00
Workshops (parallel, TBA)
17:00 — 17:10
Break
17:10 — 18:00
Structured discussions
18:00 +
Dinners in small groups
Thursday, July 24
9:00 — 9:30
Breakfast
9:30 — 10:30
Stanislav Fort: Robustness and Security in the Age of Massive
Multi-modal AI Models
10:30 — 11:00
Coffee break
11:00 — 12:30
Torben Swoboda: AI Developer Control Problem
Lewis Hammond: Doing Effective Research, Effectively
12:30 — 14:00
Lunch
14:00 — 18:00
David F. Wagner: Flying Machines: Possibility and Peril (LARP, in
parallel)
Fernando Rosas: Identifying Abstractions
Vojta Kovařík / Ondřej Bajgar: TBA
Friday, July 25
12:30 — 14:00
Lunch
14:00 — 15:00
Panel discussion: Agendas
15:00 — 15:10
Short break
15:10 — 15:50
Closing session
16:00
Schelling point for meetings
Saturday, July 26
No official program. We will help coordinate any unofficial post-school
program, e.g. participant-led sessions and coworking, as well as excursions.
We plan to arrange a space for coworking and socializing.
Venues and catering
The school will be held at
Břehová 78/7, Praha 1
(Faculty of Nuclear Sciences & Physical Engineering, Czech Technical
University).
Catered lunches, coffee breaks, and a light breakfast are provided during the
conference days, with vegan options available. All school events are covered
by the registration fee.