Generalizing an LLM from 8k to 1M Context using Qwen-Agent
TLDR: We’ve created an agent using Qwen2 models with an 8k context size to understand documents with 1M tokens, surpassing RAG and native long-context models. This agent was also used to generate data for training new long-context Qwen models.
Ideas
- Use an 8k-context chat model to build an agent capable of handling 1M tokens.
- Synthesize SFT data using this agent, with automated quality control via filters.
- Use the synthesized data to fine-tune a pretrained model (see the sketch below).
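A minimal sketch of how these three steps could be wired together. The names `long_context_agent`, `passes_quality_filters`, and `fine_tune` are hypothetical placeholders, not Qwen-Agent APIs; this is an illustration of the pipeline shape, not the authors' implementation.

```python
# Hypothetical sketch of the synthesize-then-fine-tune loop; long_context_agent,
# passes_quality_filters, and fine_tune are placeholder callables, not Qwen-Agent APIs.

def synthesize_sft_data(doc_question_pairs, long_context_agent, passes_quality_filters):
    """Run the 8k-context agent over long documents to produce filtered SFT samples."""
    samples = []
    for doc, question in doc_question_pairs:
        answer = long_context_agent(doc, question)      # agent reads the full long document
        sample = {"context": doc, "question": question, "answer": answer}
        if passes_quality_filters(sample):              # automated quality control by filters
            samples.append(sample)
    return samples

# The surviving samples are then used to fine-tune a pretrained model, e.g.:
#   fine_tune(pretrained_model, synthesize_sft_data(pairs, agent, filters))
```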
Building a Long-Context Agent
RAG
Classic pipeline (sketched below):
- Divide the context into short chunks, each no more than 512 tokens.
- Retain only the most relevant chunks within the context length (e.g., 8k).
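A minimal sketch of this pipeline, assuming a crude whitespace tokenizer and a naive keyword-overlap scorer in place of a real retriever (BM25 or embeddings would be the usual choice):

```python
# Minimal RAG sketch: chunk the long context, rank chunks against the query,
# and keep only as much as fits into the model's 8k-token window.

def split_into_chunks(text: str, max_tokens: int = 512) -> list[str]:
    words = text.split()  # crude whitespace "tokenizer" for illustration
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]

def score_chunk(chunk: str, query: str) -> int:
    # Naive keyword-overlap score; stands in for BM25 or an embedding retriever.
    return len(set(chunk.lower().split()) & set(query.lower().split()))

def retrieve(context: str, query: str, budget_tokens: int = 8000) -> str:
    chunks = split_into_chunks(context)
    ranked = sorted(chunks, key=lambda c: score_chunk(c, query), reverse=True)
    kept, used = [], 0
    for chunk in ranked:
        n = len(chunk.split())
        if used + n > budget_tokens:
            break
        kept.append(chunk)
        used += n
    return "\n\n".join(kept)  # fed to the 8k-context chat model
```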

Chunk-by-Chunk Reading
Problem with RAG: it misses relevant chunks that lack sufficient keyword overlap with the user query.
CbC pipeline (sketched below):
- Ask the model to assess each chunk's relevance to the query.
- Retrieve the most relevant chunks (within the 8k context limit).
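A sketch of the relevance-judging step, assuming `chat_model` is a placeholder callable (prompt string in, completion string out) rather than a real Qwen-Agent API:

```python
# Sketch of chunk-by-chunk reading; chat_model stands in for an 8k-context
# chat model call, and the YES/NO prompt is an illustrative choice.

def judge_relevant_chunks(chunks: list[str], query: str, chat_model) -> list[str]:
    """Step 1: let the model itself judge each chunk's relevance to the query."""
    kept = []
    for chunk in chunks:
        prompt = (
            f"Question: {query}\n\nChunk:\n{chunk}\n\n"
            "Is this chunk relevant to the question? Answer YES or NO."
        )
        if chat_model(prompt).strip().upper().startswith("YES"):
            kept.append(chunk)
    return kept

# Step 2: rank the surviving chunks (e.g., by keyword overlap as in the RAG
# sketch above) and keep as many as fit into the 8k answering context.
```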

Step-by-Step Reasoning
Improve multi-hop reasoning ability via step-by-step, tool-calling reasoning: decompose the question into sub-questions and answer each one with the chunk-by-chunk agent used as a tool (sketched below).
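A hedged sketch of this loop. Here `chat_model` and `answer_with_chunks` (the chunk-by-chunk reader above) are passed in as placeholder callables; the decomposition prompt is illustrative, not the post's actual tool-calling protocol:

```python
# Step-by-step (multi-hop) reasoning sketch: decompose, answer sub-questions
# with the chunk-by-chunk reader as a tool, then compose the final answer.

def multi_hop_answer(question: str, context: str, chat_model, answer_with_chunks) -> str:
    # Step 1: decompose the original question into simpler sub-questions.
    plan = chat_model(
        f"Break this question into simpler sub-questions, one per line:\n{question}"
    )
    sub_questions = [line.strip() for line in plan.splitlines() if line.strip()]

    # Step 2: answer each sub-question by calling the chunk-by-chunk reader.
    facts = [answer_with_chunks(sub_q, context, chat_model) for sub_q in sub_questions]

    # Step 3: compose the final answer from the collected intermediate facts.
    return chat_model(
        f"Question: {question}\nIntermediate answers: {facts}\n"
        "Using only these intermediate answers, answer the original question."
    )
```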

Experiments
Baselines:
- 32k-model: a 7B chat model fine-tuned mostly on 8k-context samples plus a few 32k-context samples; longer contexts are handled with training-free, RoPE-based extension methods (see the illustrative sketch after this list).
- 4k-RAG: the 32k-model combined with the RAG strategy above, reducing the context to 4k tokens.
- 4k-Agent: the 32k-model driving the agent strategies above, processing at most 4k tokens per model call.
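One common training-free RoPE extension is NTK-style base rescaling; the post does not specify which method the 32k-model uses, so treat this as a generic illustration rather than the baseline's actual recipe:

```python
# Illustrative NTK-style RoPE base rescaling (training-free length extension).
# The rescaled base stretches the rotary wavelengths so positions beyond the
# training length remain distinguishable without further fine-tuning.

def rope_inverse_frequencies(head_dim: int, base: float = 10000.0, scale: float = 1.0) -> list[float]:
    adjusted_base = base * scale ** (head_dim / (head_dim - 2))
    return [1.0 / adjusted_base ** (i / head_dim) for i in range(0, head_dim, 2)]

# Example: extending a 32k-trained model to 256k context uses scale = 256 / 32 = 8.
inv_freq = rope_inverse_frequencies(head_dim=128, scale=8.0)
```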
