Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

Extending the context length of Transformer-based large language models (LLMs) to improve comprehension is often constrained by limited computational resources and memory capacity. This work proposes Recurrent Context Compression (RCC), an approach designed to efficiently expand the context window of LLMs. We also investigate the common issue of degraded performance on downstream tasks when both the instruction and the context are compressed, and propose an instruction reconstruction method to mitigate it. We validate the effectiveness of our approach on multiple tasks at a context compression rate of at least 32x. On text reconstruction, we maintain a BLEU-4 score close to 0.95; on passkey retrieval, we achieve nearly 100% accuracy at a sequence length of 1 million tokens; and on long-text question answering, we obtain F1 and ROUGE scores comparable to a non-compressed LLM while significantly reducing storage requirements.
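To make the idea of recurrent, segment-wise compression concrete, the sketch below shows one plausible realization: a long context is split into fixed-length segments, each segment is compressed into a small set of summary vectors, and those summaries are carried forward to condition the compression of later segments. This is a minimal illustration, not the paper's implementation; names such as `RecurrentCompressor`, `SEGMENT_LEN`, and `COMPRESS_RATIO` are assumptions introduced here, with the 32x ratio chosen to match the compression rate reported above.

```python
# Minimal sketch (assumed design, not the authors' code) of recurrent,
# segment-wise context compression at a 32x ratio.
import torch
import torch.nn as nn

SEGMENT_LEN = 512        # tokens processed per recurrence step (assumed)
COMPRESS_RATIO = 32      # 32x compression, matching the rate reported above
D_MODEL = 256            # hidden size of this toy model


class RecurrentCompressor(nn.Module):
    """Compress each segment into SEGMENT_LEN // COMPRESS_RATIO summary vectors,
    conditioning on the summaries carried over from earlier segments."""

    def __init__(self):
        super().__init__()
        n_summary = SEGMENT_LEN // COMPRESS_RATIO
        # Learnable "summary" queries that attend over the segment plus prior memory.
        self.summary_queries = nn.Parameter(torch.randn(n_summary, D_MODEL))
        self.attn = nn.MultiheadAttention(D_MODEL, num_heads=4, batch_first=True)

    def forward(self, segment, memory):
        # segment: (batch, seg_len, D_MODEL); memory: (batch, M, D_MODEL) or None
        kv = segment if memory is None else torch.cat([memory, segment], dim=1)
        queries = self.summary_queries.unsqueeze(0).expand(segment.size(0), -1, -1)
        summary, _ = self.attn(queries, kv, kv)   # (batch, n_summary, D_MODEL)
        return summary


def compress_long_context(embeddings, compressor):
    """Recurrently fold a long embedding sequence into a short memory sequence."""
    memory = None
    for start in range(0, embeddings.size(1), SEGMENT_LEN):
        segment = embeddings[:, start:start + SEGMENT_LEN]
        summary = compressor(segment, memory)
        # Carry the compressed summaries forward; a downstream LLM would attend
        # to `memory` instead of the full-length context.
        memory = summary if memory is None else torch.cat([memory, summary], dim=1)
    return memory


# Toy usage: an 8-segment "long context" of 4096 embeddings is reduced to 128.
compressor = RecurrentCompressor()
ctx = torch.randn(1, 8 * SEGMENT_LEN, D_MODEL)
print(compress_long_context(ctx, compressor).shape)  # torch.Size([1, 128, 256])
```

The recurrence is what keeps memory bounded: at every step the model attends only to one segment plus the already-compressed summaries, so the full uncompressed context never has to be held at once.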