
The third Human-aligned AI Summer School will be held in Prague from 4th to 7th August 2022. We will meet for four intensive days of discussions, workshops, and talks covering the latest trends in AI alignment research and broader framings of the field.

Format of the school

The school focuses on teaching and exploring approaches and frameworks rather than on presenting the latest research results. The content is mostly technical: attendees are assumed to understand current ML approaches such as deep learning and reinforcement learning, and some of the underlying theoretical frameworks.

The intended audience of the school includes researchers interested in learning more about AI alignment topics, PhD students, researchers working in ML/AI outside academia, and talented students.
It is also recommended that participants have a basic understanding of the AI alignment problem and of technical concepts such as specification gaming, instrumental goals, and interpretability.
This year we also encourage interdisciplinary researchers interested in applying their domain knowledge to AI alignment to attend, e.g. from evolutionary biology, behavioral economics, or statistical mechanics.

The school consists of lectures and topical series, focused smaller-group workshops and discussions, expert panels, and opportunities for networking, project brainstorming and informal discussions.
See the previous school for an illustration of our planned speakers and general program structure (note that this year, discussions and smaller-group workshops will make up 40–50% of the program).


David Krueger (online)
Assistant Professor at the University of Cambridge, Computational and Biological Learning lab
Rohin Shah (online)
Research scientist on the AGI safety team at DeepMind
Lewis Hammond
Director of Cooperative AI Foundation, DPhil affiliate at FHI Oxford
Stanislav Fořt
Researcher at Anthropic
Jan Kulveit
Researcher at the Alignment of Complex Systems research group, and FHI Oxford
Ondrej Bajgar
DPhil affiliate at FHI Oxford
Nora Ammann
Project lead of the PIBBSS research fellowship


Wednesday, Aug 3

18:00 Informal pre-school welcome party in the CZEA Epistea space

Thursday, Aug 4

9:30 Venue opens for breakfast
10:30 – 11:10 Opening session
11:10 – 11:30 Coffee break
11:30 – 12:30 Ondrej Bajgar: AGI epistemics
12:30 – 14:00 Lunch (catered)
14:00 – 15:00 Jan Kulveit: Mapping alignment agendas
15:00 – 15:20 Coffee break
15:20 – 16:50 David Krueger: Is AI Alignment the solution to AI x-safety?
Technical AI Alignment obstacles and complementary approaches (remotely)
16:50 – 17:00 Break
17:00 – 18:00 Breakout session
19:00 Conference dinner in TowerPark

Friday, Aug 5

8:30 Venue opens for breakfast
9:30 – 10:50 Stanislav Fořt: Deep Learning as a Natural Science
10:50 – 11:10 Coffee break
11:10 – 12:30 Breakout session
12:30 – 14:00 Lunch (catered)
14:00 – 15:30 Lightning talks (early career researchers)
15:30 – 16:10 Extended coffee break (encouraging 1:1s)
16:10 – 17:40 Rohin Shah: Examples of goal misgeneralization (remotely)
18:30 – 20:00 Dinners at pre-booked restaurants (optional)

Saturday, Aug 6

8:30 Venue opens for breakfast
9:30 – 10:40 Lewis Hammond: Introduction to Cooperative AI
10:40 – 11:10 Coffee break
11:10 – 12:30 Breakout session
12:30 – 14:00 Lunch (catered)
14:00 – 15:00 Nora Ammann: Studying Intelligent Behavior in Natural Systems:
new avenues for progress in AI alignment
15:00 – 15:20 Coffee break
15:20 – 16:20 Panel discussion: AI alignment agendas
16:20 – 16:40 Break
16:40 – 17:40 Breakout session
18:30 – 20:00 Dinners at pre-booked restaurants (optional)
20:00 – 00:00 Arts event at Sacre Coeur

Sunday, Aug 7 (updated)

9:30 – 11:10 Extended breakfast (Schelling point for 1:1s: 10:00)
11:10 – 12:30 Breakout session
12:30 – 14:00 Lunch (catered)
14:00 – 15:00 Panel discussion: Careers in AI alignment
15:00 – 15:10 Break
15:10 – 16:10 Closing words, future announcements, feedback (our reward signal)
16:10 – 16:30 Farewell tea

Workshop sessions

Workshops are smaller-group sessions that combine the presentation of new topics with interactive activities and discussion, offered by both speakers and senior participants of the school.

Decision Theory and Embedded Agency (Daniel Herrmann)
We will work through the basics of decision theory in the Jeffrey-Bolker framework. We will then discuss the extent to which Jeffrey-Bolker can capture embedded agency, and identify ways in which it fails to do so. We will end by generating some possible paths forward.
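As a brief refresher (our summary, not part of the workshop materials): in the Jeffrey-Bolker framework, both probability P and desirability V are defined on propositions, the desirability of a proposition is a probability-weighted average of the desirabilities of the ways it can be true, and desirability averages over incompatible disjuncts:

```latex
V(A) = \sum_{w \,\models\, A} V(w)\, P(w \mid A),
\qquad
V(A \vee B) = \frac{V(A)\,P(A) + V(B)\,P(B)}{P(A) + P(B)}
\quad \text{for incompatible } A, B \text{ with } P(A \vee B) > 0.
```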
Challenges and directions in AI interpretability (Anson Ho)
Interpretability research has been rapidly gaining popularity both inside and outside the field of AI alignment, but many key challenges remain: what are the key desiderata for interpretability tools? How can we formalise interpretability and ensure that it scales competitively to more advanced AI? The goal of this structured discussion is to brainstorm such questions, promote critical analysis of existing interpretability work, and stimulate ideas for new research directions.
How Different Paradigms for AGI Intersect With AI Safety (Kai Sandbrink)
Despite the recent successes of deep learning, it is as of now unclear both (1) whether current deep learning models are capable of achieving general intelligence, and (2) whether deep learning represents the best and safest technology for doing so. But what are the alternatives? In this workshop, two other AI paradigms are presented, neurosymbolic AI and probabilistic programming, and their potential for capability and safety is discussed. We find that, although deep learning has the longest track record of scalability, the other approaches are potentially more robust and merit greater exploration.
AGI Epistemics discussion (Ondrej Bajgar and Jan Kulveit)
In his talk before lunch, Ondrej outlined one possible tool for analysing arguments in AI alignment: proxies. In this discussion, we will think of other possible tools that could help clarify thinking and discussion about future AI systems, try applying them to specific examples of arguments and disagreements, and see whether thinking about proxies is useful there as well.
From emergent reciprocity to social dynamics (Ram Rachum)
We can use multi-agent reinforcement learning to explore the first moment where an AI agent realizes that there are other living creatures around it. We're interested in the moment where the AI agent asks itself, "is this a friend or a foe?" In the last few years, MARL researchers have shown partial success in making the answer "friend", or in more precise terms, in showing emergent reciprocity without intrinsic rewards. We hope that better success at emergent reciprocity could shed light on how an AI agent absorbs moral values, and on how an advanced AI will interact with humans. We'll look at a few examples of research in this field and discuss possible next steps.
Steps toward implementing brain-like AGI (Gunnar Zarncke)
How can the human brain reliably bring about comparable values across time and cultures, and how can we use this to implement AI with values? The workshop explores this question with exercises and by relating affective neuroscience to deep-learning multi-agent simulations. No prior experience in either field is required, but existing knowledge can be applied in the breakout groups.
Discussion on Deep Learning as a Natural Science (Stanislav Fořt)
Continuing the discussion on Deep Learning as a Natural Science: optimization of deep neural networks has many surprising and perhaps counter-intuitive properties, and we will explore phenomena, research directions, and intuitions in high-dimensional loss landscapes using empirical methods and tools from statistical physics.
An Interpretable Framework: the Tsetlin Machine and its Limitations for NLP (Ronan Kumar Yadav)
The Tsetlin Machine (TM) is a recent ML model that learns both simple and complex patterns using propositional logic: each clause is a conjunction of features in original or negated form. Since most TM operations are based on bit manipulation, each propositional clause can be extracted, allowing easier interpretation of the model. In this session, we will explore this new paradigm and its core learning process. We will demonstrate a practical case of text classification along with various interpretation techniques used in TM. In addition, we will explore its limitations, which open a new research area in the field of explainable AI.
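To make the clause structure concrete, here is a toy sketch (ours, with hypothetical hand-written clauses, not the actual TM library or its learning algorithm) of how a trained TM classifies: each clause is a conjunction of literals (features in original or negated form), and the class is decided by a vote of positive minus negative clauses.

```python
def clause_fires(clause, x):
    """A clause is a list of literals (feature_index, negated).
    It fires only if every literal is satisfied by input x."""
    return all((not x[i]) if negated else x[i] for i, negated in clause)

def tm_classify(pos_clauses, neg_clauses, x):
    """Vote: positive clauses that fire count +1, negative clauses -1;
    predict class 1 if the vote sum is non-negative."""
    votes = sum(clause_fires(c, x) for c in pos_clauses) \
          - sum(clause_fires(c, x) for c in neg_clauses)
    return 1 if votes >= 0 else 0

# Hypothetical hand-written clauses realizing XOR over two boolean features:
# positive clauses capture (x0 AND NOT x1) and (NOT x0 AND x1).
pos = [[(0, False), (1, True)], [(0, True), (1, False)]]
neg = [[(0, False), (1, False)], [(0, True), (1, True)]]

for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    print(x, "->", tm_classify(pos, neg, x))
```

Real Tsetlin Machines learn such clauses via teams of Tsetlin automata and pack the literals into bit vectors for fast evaluation; the point here is only that each learned clause can be read off directly as a propositional rule.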
Assistance games - a model of Human-AI interactions (Robert Klassert)
Assistance games are formal models of strategic human-AI interaction. They generalize the "standard model", where AI systems optimize fixed, fully specified objectives, by allowing for uncertainty about goals and for belief updates informed by human behavior and preferences. In this session we will learn about the model, canonical examples, and results (like the "off-switch game"), and think about the relevance of assistance in the context of other research agendas.
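Since the off-switch game is mentioned, here is a small numeric sketch (our illustration, with an arbitrary Gaussian belief over the human's utility u) of its central result: a robot uncertain about u that defers to a rational human earns E[max(u, 0)], which weakly dominates both acting directly (E[u]) and switching itself off (0).

```python
import random

# The robot's belief over the human's utility u for its proposed action.
random.seed(0)
samples = [random.gauss(-0.2, 1.0) for _ in range(100_000)]

# Acting directly yields u; deferring lets a rational human allow the
# action when u > 0 (payoff u) and switch the robot off when u <= 0
# (payoff 0), so deferring earns max(u, 0) in each world.
act = sum(samples) / len(samples)                         # E[u]
defer = sum(max(u, 0.0) for u in samples) / len(samples)  # E[max(u, 0)]

print(f"E[act]   = {act:.3f}")
print(f"E[defer] = {defer:.3f}")

# Deferring weakly dominates both acting and switching off,
# since max(u, 0) >= u and max(u, 0) >= 0 pointwise.
assert defer >= max(act, 0.0)
```

The incentive to preserve the off-switch comes precisely from the robot's uncertainty: if the robot were certain of u, deferring and acting would tie, and the robot would gain nothing from keeping the human in the loop.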
AI Safety: from agenda to research paper (Pablo Moreno)
In this workshop I will explain how I usually craft new research projects, an approach I believe can be useful for producing important AI safety research. You will then have the opportunity to carry out the first steps of this procedure and generate your own productive research questions.
Type signature and origin of human values (Jan Kulveit)
Yesterday, some people expressed surprise that I'm satisfied enough with my current guess about 'what human values are' to prioritize work on other topics. This impromptu session will consist of a ~30-minute, loosely structured braindump of my guess, followed by a discussion.
Brainstorming and/or Discussion of research ideas in "intelligence in the wild" (Nora Ammann)
Come to this session if you're interested in brainstorming or discussing ideas for how we might be able to learn about AI alignment from fields that study intelligent behavior in natural systems. At the start of the session, we'll take stock of people's interests, and from there we might split up into smaller thematic groups to discuss ideas, discuss clusters from the research map in more detail, or do something else if that seems more sensible given the prevalent interests.

Registration and fees

The applications are now closed.

The price of the school is EUR 250 for working professionals and EUR 150 for students and independent researchers, and includes full catering and conference events.

Thanks to a generous funder, we can offer subsidized tickets as well as travel and accommodation support to a number of participants. If the associated expenses would otherwise cause you not to attend, we encourage you to apply and flag this in your application.


Program: Tomáš Gavenčiak (main coordinator) and Jan Kulveit,
Center for Theoretical Study, Charles University in Prague

Operations: Hana Kalivodova
Czech Association for Effective Altruism

📧 Contact us at haaiss2022@gmail.com

Opportunities at Center for Theoretical Study

The new Alignment of Complex Systems Research Group is offering positions to researchers, students, research engineers, a research manager, and a technical writer. The group focuses on conceptual and empirical work on agency and the intersection of complex systems and AI alignment. See the research group announcement for more info and register your interest here.


Main venue: Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague
Břehová 78/7, Praha 1
The way from the building entrance will be signposted.

Saturday evening event venue: Sacre Coeur
Holečkova 31, Praha 5

Thursday evening event venue: Tower Park
Mahlerovy sady 1, Praha 3
The entrance to the tower is on the lowest floor under the bridge. It will be signposted.

Wednesday evening party venue: CZEA Epistea space and Dharmasala teahouse
Peckova 15, Praha 8