Generalizing an LLM from 8k to 1M Context using Qwen-Agent
TLDR: We’ve created an agent using Qwen2 models with an 8k context size to understand documents with 1M tokens, surpassing RAG and native long-context models. This agent was also used to generate data for training new long-context Qwen models.
Ideas
- Use an 8k-context chat model to build an agent capable of handling 1M tokens.
- Synthesize SFT data using this agent, with automated quality control via filters.
- Use the synthesized data to fine-tune a pretrained model (see the sketch below).
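A minimal sketch of how these three steps could be wired together. The names `long_context_agent`, `passes_quality_filters`, and `fine_tune` are hypothetical placeholders, not Qwen-Agent APIs; this is an illustration of the pipeline shape, not the authors' implementation.

```python
# Hypothetical sketch of the synthesize-then-fine-tune loop; long_context_agent,
# passes_quality_filters, and fine_tune are placeholder callables, not Qwen-Agent APIs.

def synthesize_sft_data(doc_question_pairs, long_context_agent, passes_quality_filters):
    """Run the 8k-context agent over long documents to produce filtered SFT samples."""
    samples = []
    for doc, question in doc_question_pairs:
        answer = long_context_agent(doc, question)      # agent reads the full long document
        sample = {"context": doc, "question": question, "answer": answer}
        if passes_quality_filters(sample):              # automated quality control by filters
            samples.append(sample)
    return samples

# The surviving samples are then used to fine-tune a pretrained model, e.g.:
#   fine_tune(pretrained_model, synthesize_sft_data(pairs, agent, filters))
```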
Building a Long-Context Agent
RAG
Classic pipeline (sketched below):
- Divide the context into short chunks, each no more than 512 tokens.
- Retain only the most relevant chunks within the context length (e.g., 8k).
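A minimal sketch of this pipeline, assuming a crude whitespace tokenizer and a naive keyword-overlap scorer in place of a real retriever (BM25 or embeddings would be the usual choice):

```python
# Minimal RAG sketch: chunk the long context, rank chunks against the query,
# and keep only as much as fits into the model's 8k-token window.

def split_into_chunks(text: str, max_tokens: int = 512) -> list[str]:
    words = text.split()  # crude whitespace "tokenizer" for illustration
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]

def score_chunk(chunk: str, query: str) -> int:
    # Naive keyword-overlap score; stands in for BM25 or an embedding retriever.
    return len(set(chunk.lower().split()) & set(query.lower().split()))

def retrieve(context: str, query: str, budget_tokens: int = 8000) -> str:
    chunks = split_into_chunks(context)
    ranked = sorted(chunks, key=lambda c: score_chunk(c, query), reverse=True)
    kept, used = [], 0
    for chunk in ranked:
        n = len(chunk.split())
        if used + n > budget_tokens:
            break
        kept.append(chunk)
        used += n
    return "\n\n".join(kept)  # fed to the 8k-context chat model
```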

Chunk-by-Chunk Reading
Problem with RAG: it misses relevant chunks that lack sufficient keyword overlap with the user query.
CbC pipeline (sketched below):
- Ask the model to assess each chunk's relevance to the query.
- Retrieve the most relevant chunks (within the 8k context limit).
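A sketch of the relevance-judging step, assuming `chat_model` is a placeholder callable (prompt string in, completion string out) rather than a real Qwen-Agent API:

```python
# Sketch of chunk-by-chunk reading; chat_model stands in for an 8k-context
# chat model call, and the YES/NO prompt is an illustrative choice.

def judge_relevant_chunks(chunks: list[str], query: str, chat_model) -> list[str]:
    """Step 1: let the model itself judge each chunk's relevance to the query."""
    kept = []
    for chunk in chunks:
        prompt = (
            f"Question: {query}\n\nChunk:\n{chunk}\n\n"
            "Is this chunk relevant to the question? Answer YES or NO."
        )
        if chat_model(prompt).strip().upper().startswith("YES"):
            kept.append(chunk)
    return kept

# Step 2: rank the surviving chunks (e.g., by keyword overlap as in the RAG
# sketch above) and keep as many as fit into the 8k answering context.
```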

Step-by-Step Reasoning
Improve multi-hop reasoning ability via step-by-step, tool-calling reasoning: decompose the question into sub-questions and answer each one with the chunk-by-chunk agent used as a tool (sketched below).
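A hedged sketch of this loop. Here `chat_model` and `answer_with_chunks` (the chunk-by-chunk reader above) are passed in as placeholder callables; the decomposition prompt is illustrative, not the post's actual tool-calling protocol:

```python
# Step-by-step (multi-hop) reasoning sketch: decompose, answer sub-questions
# with the chunk-by-chunk reader as a tool, then compose the final answer.

def multi_hop_answer(question: str, context: str, chat_model, answer_with_chunks) -> str:
    # Step 1: decompose the original question into simpler sub-questions.
    plan = chat_model(
        f"Break this question into simpler sub-questions, one per line:\n{question}"
    )
    sub_questions = [line.strip() for line in plan.splitlines() if line.strip()]

    # Step 2: answer each sub-question by calling the chunk-by-chunk reader.
    facts = [answer_with_chunks(sub_q, context, chat_model) for sub_q in sub_questions]

    # Step 3: compose the final answer from the collected intermediate facts.
    return chat_model(
        f"Question: {question}\nIntermediate answers: {facts}\n"
        "Using only these intermediate answers, answer the original question."
    )
```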

Experiments
Baselines:
- 32k-model: a 7B chat model fine-tuned mostly on 8k-context samples plus a few 32k-context samples; longer contexts are handled with training-free, RoPE-based extension methods (see the illustrative sketch after this list).
- 4k-RAG: the 32k-model combined with the RAG strategy above, reducing the context to 4k tokens.
- 4k-Agent: the 32k-model driving the agent strategies above, processing at most 4k tokens per model call.
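One common training-free RoPE extension is NTK-style base rescaling; the post does not specify which method the 32k-model uses, so treat this as a generic illustration rather than the baseline's actual recipe:

```python
# Illustrative NTK-style RoPE base rescaling (training-free length extension).
# The rescaled base stretches the rotary wavelengths so positions beyond the
# training length remain distinguishable without further fine-tuning.

def rope_inverse_frequencies(head_dim: int, base: float = 10000.0, scale: float = 1.0) -> list[float]:
    adjusted_base = base * scale ** (head_dim / (head_dim - 2))
    return [1.0 / adjusted_base ** (i / head_dim) for i in range(0, head_dim, 2)]

# Example: extending a 32k-trained model to 256k context uses scale = 256 / 32 = 8.
inv_freq = rope_inverse_frequencies(head_dim=128, scale=8.0)
```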
