OMNI:
Open-endedness via Models of human Notions of Interestingness
ICLR 2024

Abstract

Open-ended algorithms aim to learn new, interesting behaviors forever. That requires a vast environment search space, but there are thus infinitely many possible tasks. Even after filtering for tasks the current agent can learn (i.e., learning progress), countless learnable yet uninteresting tasks remain (e.g., minor variations of previously learned tasks). An Achilles Heel of open-endedness research is the inability to quantify (and thus prioritize) tasks that are not just learnable, but also interesting (e.g., worthwhile and novel). We propose solving this problem by Open-endedness via Models of human Notions of Interestingness (OMNI). The insight is that we can utilize large (language) models (LMs) as a model of interestingness (MoI), because they already internalize human concepts of interestingness from training on vast amounts of human-generated data, where humans naturally write about what they find interesting or boring. We show that LM-based MoIs improve open-ended learning by focusing on tasks that are both learnable and interesting, outperforming baselines based on uniform task sampling or learning progress alone. This approach has the potential to dramatically advance the ability to intelligently select which tasks to focus on next (i.e., auto-curricula), and could be seen as AI selecting its own next task to learn, facilitating self-improving AI and AI-Generating Algorithms.


Method

Provided that the real, significant challenges of AI safety and existential risk can be solved, there are tremendous gains to be had by creating more powerful AI, or even AGI. Our approach combines a learning-progress auto-curriculum with a model of interestingness (MoI) to train a Reinforcement Learning (RL) agent in a task-conditioned manner.

Learning Progress Curriculum

The task pool in open-ended environments can be very large and diverse, making it challenging for an agent to learn effectively through uniform sampling: most randomly sampled tasks are likely to be too easy or too difficult for the agent. To automatically identify tasks at the frontier of the agent's capabilities, we extend the learning-progress-based curriculum (LP) of Kanitscheider et al. (2021). The high-level idea is for the curriculum to predominantly sample tasks with high learning progress, defined as an agent's recent change in task success probability.
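Concretely, one common instantiation of this idea, in the spirit of Kanitscheider et al., tracks each task's success rate with a fast and a slow exponential moving average and samples tasks in proportion to the gap between them. The sketch below is illustrative only; the class, parameter names, and coefficients are our assumptions, not the paper's exact implementation.

import numpy as np

class LPCurriculum:
    """Minimal sketch of a learning-progress curriculum (illustrative)."""

    def __init__(self, num_tasks, fast=0.1, slow=0.01, temperature=0.1):
        self.fast_ema = np.zeros(num_tasks)  # recent success probability
        self.slow_ema = np.zeros(num_tasks)  # long-horizon success probability
        self.fast, self.slow, self.temperature = fast, slow, temperature

    def update(self, task_id, success):
        # Track success at two timescales; the gap between them is a
        # simple proxy for learning progress.
        self.fast_ema[task_id] += self.fast * (success - self.fast_ema[task_id])
        self.slow_ema[task_id] += self.slow * (success - self.slow_ema[task_id])

    def sample_task(self, rng=np.random):
        # |fast - slow| is large only while a task is actively being learned
        # (or forgotten), so sampling concentrates on the learning frontier.
        lp = np.abs(self.fast_ema - self.slow_ema)
        probs = np.exp(lp / self.temperature)
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)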

Modeling what Humans Find Interesting

This paper capitalizes on the capabilities of autoregressive LMs, specifically GPT-3 and GPT-4, to emulate human notions of interestingness. LMs are pretrained on vast and diverse text corpora, enabling them to amass a significant amount of world knowledge. We prompt the LM in a few-shot manner, providing it with a few examples of choosing which tasks are interesting. The MoI takes into account the agent's existing proficiency on a given set of tasks and suggests which tasks humans would typically find interesting. The input prompt consists of several components (a minimal sketch of the resulting query follows the list):

  1. Directives encouraging interestingly different behaviors, such as "The ultimate goal that [the agent] would like your help with is to learn as many interestingly different skills as possible ..."
  2. Environment description, including the possible objects in the environment, and how a task in the environment is specified
  3. Tasks that the agent has done well and tasks to predict the interestingness of
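The sketch below assembles these components into a single query, assuming the OpenAI Python client. The prompt wording, helper names, and the simple Yes/No parsing are our assumptions based on the components above, not the paper's exact prompts.

from openai import OpenAI  # assumes the OpenAI Python client (openai>=1.0)

client = OpenAI()

FEW_SHOT_EXAMPLES = "..."  # a few worked examples of labeling tasks (omitted)

def query_moi(env_description, learned_tasks, candidate_tasks):
    prompt = (
        "The ultimate goal that the agent would like your help with is to "
        "learn as many interestingly different skills as possible.\n\n"
        f"Environment: {env_description}\n\n"
        f"{FEW_SHOT_EXAMPLES}\n\n"
        "Tasks the agent already does well:\n"
        + "\n".join(f"- {t}" for t in learned_tasks)
        + "\n\nFor each task below, answer Yes if it would be interestingly "
        "new to learn, or No if it is a boring variation:\n"
        + "\n".join(f"- {t}" for t in candidate_tasks)
    )
    response = client.chat.completions.create(
        model="gpt-4",  # the paper uses GPT-3 and GPT-4
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answers = response.choices[0].message.content.strip().splitlines()
    # One Yes/No per candidate task; robust parsing omitted for brevity.
    return ["yes" in a.lower() for a in answers]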

Experiments in a Finite Task Space

OMNI significantly outperforms baselines based on uniform sampling or learning progress alone. Uniform sampling, the most naive baseline, draws all tasks with equal probability and therefore spends most of its time on tasks that are too easy or too difficult. LP is distracted by the many learnable but boring tasks. OMNI (LP + MoI) focuses on the subset of tasks that have high learning progress and are also interesting.
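One simple way these two signals could be combined, assuming a per-task LP score and a binary interestingness label from the MoI, is to keep LP's sampling weights but zero out tasks the MoI labels uninteresting. The weighting scheme below is our assumption, not the paper's exact method.

import numpy as np

def omni_sampling_probs(lp_scores, interesting_mask, temperature=0.1):
    """lp_scores: per-task learning progress; interesting_mask: MoI's Yes/No."""
    weights = np.exp(np.asarray(lp_scores, dtype=float) / temperature)
    weights *= np.asarray(interesting_mask, dtype=float)
    if weights.sum() == 0:  # MoI rejected everything: fall back to pure LP
        weights = np.exp(np.asarray(lp_scores, dtype=float) / temperature)
    return weights / weights.sum()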


Experiments in an Infinite Task Space

In truly open-ended settings, there are infinitely many possible tasks. We demonstrate OMNI in a boundless task space, where the task set is not finitely predetermined. Essential to training an agent capable of handling any task in such an open-ended learning framework is a universal reward function that can evaluate whether any given task has been completed. We propose an instantiation of OMNI that endlessly generates interesting tasks and creates executable code to assess the completion of each generated task, allowing an RL agent to learn them.
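As a rough illustration of this idea, the snippet below compiles an LM-generated completion checker into a callable reward predicate. The checker's name and state format are hypothetical, and in practice model-written code should be executed in a sandbox.

def compile_success_checker(checker_source: str):
    # Execute LM-generated source, which is expected to define
    # task_completed(state); sandboxing is omitted in this sketch.
    namespace = {}
    exec(checker_source, namespace)
    return namespace["task_completed"]

# Example of what the LM might return for a newly generated task:
checker_source = '''
def task_completed(state):
    # Task: "place a plant on the table" (hypothetical)
    return state.get("plant_location") == "table"
'''

task_completed = compile_success_checker(checker_source)
reward = 1.0 if task_completed({"plant_location": "table"}) else 0.0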

(Figure: results in the AI2-THOR environment.)

Conclusion

In conclusion, our work demonstrates the potential of using an MoI to significantly enhance auto-curricula and the quest for open-ended learning algorithms by intelligently focusing on learnable and interesting tasks. In the long run, it hints at a synergy between LMs and open-endedness that simultaneously addresses looming challenges for both: how will LMs ultimately rise to the level of creativity seen in the best of human innovation, and how will open-endedness overcome the trap of diverging into a vast space of uninspiring mediocrity? By playing off each other’s strengths, LMs can perhaps someday become essential engines of open-ended discovery and begin to participate in the creative dance that has defined civilization since its inception.

Citation

@article{zhang2023omni,
title={OMNI: Open-endedness via Models of human Notions of Interestingness},
author={Jenny Zhang and Joel Lehman and Kenneth Stanley and Jeff Clune},
year={2023},
journal={arXiv preprint arXiv:2306.01711},
}

Acknowledgements

This work was supported by the Vector Institute, the Canada CIFAR AI Chairs program, a grant from Schmidt Futures, an NSERC Discovery Grant, and a generous donation from Rafael Cosman. We also thank Andrew Dai, Cédric Colas, and members of our lab at the University of British Columbia, namely Aaron Dharna, Ben Norman, and Shengran Hu, for insightful discussions and feedback.

The website template was borrowed from Jon Barron.