🪬Alibaba Qwen: Generalizing an LLM from 8k to 1M Context using Qwen-Agent
Jun 10, 2024
 
TLDR: We’ve created an agent using Qwen2 models with an 8k context size to understand documents with 1M tokens, surpassing RAG and native long-context models. This agent was also used to generate data for training new long-context Qwen models.

Ideas

  1. Use an 8k-context chat model to build an agent capable of handling 1M tokens.
  2. Synthesize SFT data using this agent, with automated quality control via filters (sketched below).
  3. Use the synthesized data to fine-tune a pretrained model.
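The data-synthesis step (idea 2) can be pictured roughly as follows. This is a minimal sketch under assumptions, not the actual Qwen pipeline: `make_question`, `agent_answer`, and the contents of `quality_filters` are hypothetical placeholders, since the note only says the data was filtered automatically.

```python
# Minimal sketch of the SFT data-synthesis idea (step 2 above). All function
# names are hypothetical placeholders; the note does not describe the actual
# filters beyond "automated quality control".

def synthesize_sft_data(long_documents, make_question, agent_answer, quality_filters):
    """Turn long documents into (context, question, answer) SFT samples using the 8k agent."""
    sft_samples = []
    for doc in long_documents:
        question = make_question(doc)          # e.g., an LLM-generated query about the doc
        answer = agent_answer(doc, question)   # the 1M-token-capable agent built below
        # Automated quality control: keep the sample only if every filter passes.
        if all(f(doc, question, answer) for f in quality_filters):
            sft_samples.append({"context": doc, "question": question, "answer": answer})
    return sft_samples
```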
 

Building the Long-Context Agent

RAG

Classic pipeline (sketched below):
  1. Divide the context into short chunks (≤ 512 tokens each).
  2. Retain only the most relevant chunks that fit within the model's context length (e.g., 8k).
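A minimal sketch of this pipeline is below. The word-based chunking and keyword-overlap scorer are simplifying assumptions standing in for a real tokenizer and retriever (e.g., BM25 or embeddings); `chunk_text`, `score`, and `build_rag_prompt` are hypothetical names, not Qwen-Agent APIs.

```python
# Minimal sketch of the classic RAG pipeline described above.
# The overlap-based scorer is a stand-in; a real system would use BM25 or embeddings.

def chunk_text(text: str, max_tokens: int = 512) -> list[str]:
    # Split the document into chunks of at most ~max_tokens words (rough proxy for tokens).
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]

def score(chunk: str, query: str) -> int:
    # Keyword-overlap relevance score (placeholder for a proper retriever).
    return len(set(chunk.lower().split()) & set(query.lower().split()))

def build_rag_prompt(document: str, query: str, budget_tokens: int = 8000) -> str:
    chunks = chunk_text(document)
    ranked = sorted(chunks, key=lambda c: score(c, query), reverse=True)
    # Keep the highest-scoring chunks until the ~8k context budget is filled.
    selected, used = [], 0
    for chunk in ranked:
        cost = len(chunk.split())
        if used + cost > budget_tokens:
            break
        selected.append(chunk)
        used += cost
    return "Context:\n" + "\n---\n".join(selected) + f"\n\nQuestion: {query}"
```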

Chunk-by-Chunk Reading

Problem with RAG: retrieval can fail when the relevant chunks do not have sufficient keyword overlap with the user query.
CbC pipeline (sketched below):
  1. Ask the model to assess each chunk's relevance to the query.
  2. Retrieve the most relevant chunks and answer based on them.
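The sketch below illustrates the chunk-by-chunk idea, reusing `chunk_text` from the RAG sketch above. `chat()` is a hypothetical wrapper around the 8k-context chat model, and the YES/NO prompt is an illustrative assumption, not the exact Qwen-Agent prompt.

```python
# Minimal sketch of chunk-by-chunk reading. chat() is a hypothetical wrapper
# around the 8k-context chat model; any OpenAI-style client could fill this role.

def chat(prompt: str) -> str:
    raise NotImplementedError("plug in an 8k-context chat model here")

def chunk_by_chunk_answer(document: str, query: str, budget_tokens: int = 8000) -> str:
    relevant = []
    for chunk in chunk_text(document):  # chunk_text from the RAG sketch above
        # Step 1: let the model itself judge relevance, instead of relying on keyword overlap.
        verdict = chat(
            f"Passage:\n{chunk}\n\nIs this passage relevant to the question "
            f"\"{query}\"? Answer YES or NO."
        )
        if verdict.strip().upper().startswith("YES"):
            relevant.append(chunk)
    # Step 2: pack the model-selected chunks into the 8k budget and answer.
    selected, used = [], 0
    for chunk in relevant:
        cost = len(chunk.split())
        if used + cost > budget_tokens:
            break
        selected.append(chunk)
        used += cost
    return chat("Context:\n" + "\n---\n".join(selected) + f"\n\nQuestion: {query}")
```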

Step-by-Step Reasoning

Improve multi-hop reasoning ability: the agent decomposes a multi-hop question into sub-questions, answers each one with chunk-by-chunk reading, and iterates until it can answer the original question (sketched below).
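A minimal sketch of that loop is below, reusing `chat()` and `chunk_by_chunk_answer()` from the previous sketch. The ANSWER/SUBQUESTION prompt format and the 8-step cap are illustrative assumptions, not the exact Qwen-Agent implementation.

```python
# Minimal sketch of step-by-step (multi-hop) reasoning: the chunk-by-chunk
# reader acts as a tool that resolves one sub-question per iteration.

def step_by_step_answer(document: str, question: str, max_steps: int = 8) -> str:
    memory: list[str] = []  # facts gathered from earlier hops
    for _ in range(max_steps):
        notes = "\n".join(memory) or "(nothing yet)"
        # Ask the model whether it can answer now, or what it still needs to look up.
        plan = chat(
            f"Question: {question}\nKnown facts:\n{notes}\n\n"
            "If you can answer, reply 'ANSWER: <answer>'. "
            "Otherwise reply 'SUBQUESTION: <what to look up next>'."
        )
        if plan.startswith("ANSWER:"):
            return plan[len("ANSWER:"):].strip()
        sub_q = plan.split("SUBQUESTION:", 1)[-1].strip()
        # Resolve the sub-question over the full document with chunk-by-chunk reading.
        memory.append(f"{sub_q} -> {chunk_by_chunk_answer(document, sub_q)}")
    return chat(f"Question: {question}\nKnown facts:\n" + "\n".join(memory))
```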
 

Experiments

Baselines:
  • 32k-Model: a 7B chat model fine-tuned on 8k-context samples plus a few 32k-context samples; longer contexts are handled via training-free extension methods based on RoPE.
  • 4k-RAG: the 32k-Model combined with the RAG pipeline above, under a 4k context budget.
  • 4k-Agent: the 32k-Model combined with the agent strategies above, under a 4k context budget.
 