
Contextual Experience Replay for Continual Learning of Language Agents

Large language model-based agents have shown their potential in decision-making tasks, such as web navigation. However, solving multi-step decision-making tasks in complex environments like websites often requires the acquisition of environment-specific experiences. Without continual learning of environment-specific knowledge, current methods often fail in these complex tasks. To address this, we propose Contextual Experience Replay (CER), a novel training-free framework that enables efficient continual learning for language agents by replaying experiences contextually, i.e., in their context window. CER is loosely inspired by experience replay in reinforcement learning, where the agent is trained on past experiences to learn continually. Specifically, CER accumulates and synthesizes past experiences, represented as natural language summarizations and concrete trajectory examples, into a dynamic memory buffer. These experiences encompass environment dynamics and common decision-making patterns, allowing agents to retrieve relevant knowledge and augment their context in new tasks, enhancing their adaptability in complex environments. We evaluate CER on the challenging WebArena and VisualWebArena benchmarks. While orthogonal to other methods, CER improves the GPT-4o agent baseline by a large margin and achieves competitive results. On VisualWebArena, CER surpasses the tree search method at a much lower token cost and achieves a state-of-the-art success rate of 31.9%. On WebArena, CER also achieves a competitive average success rate of 33.16%, a 36.69% relative improvement over the GPT-4o agent baseline. CER demonstrates that continual learning of environment-specific knowledge is important and can lead to significant improvements in sequential decision-making tasks in complex environments.
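To make the accumulate-retrieve-augment loop described above concrete, here is a minimal sketch of a dynamic memory buffer in the spirit of CER. It is an illustrative assumption, not the paper's implementation: the `Experience` schema, the `CERBuffer` class, the naive keyword-overlap retrieval, and the prompt format are all hypothetical stand-ins (the actual method may use embedding-based or LLM-based retrieval and a different experience representation).

```python
from dataclasses import dataclass


@dataclass
class Experience:
    """One distilled unit of past experience (hypothetical schema)."""

    task: str        # the task the experience was distilled from
    summary: str     # natural-language summary of environment dynamics
    trajectory: str  # a concrete trajectory example (decision-making pattern)


class CERBuffer:
    """A minimal dynamic memory buffer sketch.

    Experiences are accumulated as the agent completes tasks and replayed
    in-context for new tasks. Retrieval here is simple word overlap,
    chosen only to keep the sketch self-contained.
    """

    def __init__(self) -> None:
        self.buffer: list[Experience] = []

    def add(self, exp: Experience) -> None:
        # Accumulate a distilled experience into the buffer.
        self.buffer.append(exp)

    def retrieve(self, task: str, k: int = 2) -> list[Experience]:
        # Rank stored experiences by word overlap with the new task.
        query = set(task.lower().split())
        ranked = sorted(
            self.buffer,
            key=lambda e: len(query & set(e.task.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def augment_prompt(self, task: str, base_prompt: str) -> str:
        # "Contextual replay": prepend relevant summaries and trajectory
        # examples to the agent's prompt instead of updating any weights.
        notes = "\n\n".join(
            f"Past experience ({e.task}):\n{e.summary}\nExample:\n{e.trajectory}"
            for e in self.retrieve(task)
        )
        return f"{notes}\n\n{base_prompt}" if notes else base_prompt


if __name__ == "__main__":
    buffer = CERBuffer()
    buffer.add(
        Experience(
            task="find the cheapest laptop on the shopping site",
            summary="Sorting by price requires opening the 'Sort by' dropdown first.",
            trajectory="click('Sort by') -> select('Price: Low to High') -> click(first result)",
        )
    )
    print(buffer.augment_prompt(
        "find the cheapest monitor",
        "You are a web agent. Complete the task step by step.",
    ))
```

Because the replay happens purely in the context window, a buffer like this keeps the framework training-free: adapting to a new environment means adding and retrieving experiences, never fine-tuning the underlying model.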