Abstract

Context engineering is evolving rapidly. This article looks ahead at frontier techniques such as Automatic Context Management, Context Routing, Hierarchical Context, memory-augmented language models, and State Space Models (e.g., Mamba and RWKV), and analyzes where LLM context processing is headed.

Key Terms at a Glance

Term | English | Description
ACM | Automatic Context Management | The system decides automatically how context is assembled
Context Routing | Context Routing | Routing content to different processing paths by type
Hierarchical Context | Hierarchical Context | Layered context (real-time / session / global)
Memory LM | Memory-Augmented LM | Language model augmented with an external memory
SSM | State Space Models | State-space sequence models
Mamba | Mamba | Selective state space model
RWKV | RWKV | Recurrent architecture with Transformer-style training ("recurrent GPT")
Linear Attention | Linear Attention | Attention with linear complexity
Mixture of Experts | Mixture of Experts | Sparse mixture-of-experts architecture
Long Range Arena | Long Range Arena | Benchmark for long-sequence modeling

1. Automatic Context Management

1.1 ACM Overview

Automatic Context Management (ACM) is a paradigm in which the system around the LLM decides on its own how context should be handled:

Traditional approach:
User ──► hand-crafted prompt ──► LLM ──► fixed context

ACM approach:
User ──► ACM system decides ──► LLM
         │
         ├── what to include
         ├── how to organize it
         ├── how much to compress
         └── where to store it

1.2 Core ACM Components

import json


class AutomaticContextManager:
    """Automatic context manager: decides how to assemble context for each request."""
    
    def __init__(
        self,
        llm_client,
        memory_store,
        embedding_model
    ):
        self.llm = llm_client
        self.memory = memory_store
        self.embedding = embedding_model
    
    def manage_context(
        self,
        user_input: str,
        session_id: str
    ) -> dict:
        """
        Manage context automatically.
        """
        # 1. Analyze the task type
        task_analysis = self._analyze_task(user_input)
        
        # 2. Decide on a context strategy
        strategy = self._decide_strategy(task_analysis)
        
        # 3. Collect relevant contexts
        # (_collect_contexts, _needs_compression, _compress_contexts and
        #  _store_in_memory are assumed to be implemented elsewhere)
        contexts = self._collect_contexts(user_input, strategy)
        
        # 4. Decide whether to compress
        compression_needed = self._needs_compression(contexts, strategy)
        if compression_needed:
            contexts = self._compress_contexts(contexts, strategy)
        
        # 5. Decide whether to store
        if strategy.get('should_remember'):
            self._store_in_memory(user_input, contexts, session_id)
        
        return {
            'contexts': contexts,
            'strategy': strategy,
            'task_type': task_analysis['type'],
            'compression_applied': compression_needed
        }
    
    def _analyze_task(self, user_input: str) -> dict:
        """Analyze the task type."""
        prompt = f"""Analyze the following user input and determine its task type and context requirements.

Input: {user_input}

Dimensions:
1. Task type: factual / creative / reasoning / conversation
2. Context dependency: high / medium / low
3. History depth needed: short-term / mid-term / long-term
4. Whether domain expertise is required

Output as JSON:"""
        
        result = self.llm.generate(prompt)
        # Parse the JSON response (assumes the model returns valid JSON)
        return json.loads(result)
    
    def _decide_strategy(self, task_analysis: dict) -> dict:
        """Decide on a handling strategy."""
        strategies = {
            'factual': {
                'context_window': 'focused',  # focus on the relevant facts
                'compression': 'aggressive',
                'should_remember': True,
                'retrieval_weight': 0.8
            },
            'creative': {
                'context_window': 'broad',  # wider context
                'compression': 'light',
                'should_remember': False,
                'retrieval_weight': 0.3
            },
            'reasoning': {
                'context_window': 'complete',  # full context
                'compression': 'minimal',
                'should_remember': True,
                'retrieval_weight': 0.5
            },
            'conversation': {
                'context_window': 'recent',  # recent dialogue
                'compression': 'moderate',
                'should_remember': True,
                'retrieval_weight': 0.2
            }
        }
        
        task_type = task_analysis.get('type', 'conversation')
        return strategies.get(task_type, strategies['conversation'])
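
A quick usage sketch of the decision flow above (only the strategy lookup is exercised; the retrieval and storage helpers are assumed to be wired up separately):

manager = AutomaticContextManager(
    llm_client=None,          # not needed for the strategy lookup shown here
    memory_store=None,
    embedding_model=None
)

# A hand-built analysis result, standing in for the LLM's JSON output
strategy = manager._decide_strategy({'type': 'reasoning'})
print(strategy['context_window'], strategy['compression'])
# complete minimal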

1.3 Adaptive Compression

from typing import List


class AdaptiveCompressor:
    """Adaptive compressor: picks a compression level based on the task."""
    
    def __init__(self, llm_client):
        self.llm = llm_client
        # Concrete compressor implementations are assumed to be defined
        # elsewhere (a minimal sketch follows below)
        self.compression_models = {
            'light': LightCompressor(),
            'moderate': ModerateCompressor(),
            'aggressive': AggressiveCompressor()
        }
    
    def compress_adaptive(
        self,
        contexts: List[str],
        task_type: str,
        available_tokens: int
    ) -> List[str]:
        """Compress adaptively according to the task."""
        # Choose the compression level
        if task_type == 'factual':
            compression_level = 'aggressive'
        elif task_type == 'reasoning':
            compression_level = 'light'
        else:
            compression_level = 'moderate'
        
        compressor = self.compression_models[compression_level]
        
        # Compress each context that does not fit the available budget
        compressed = []
        for ctx in contexts:
            ctx_tokens = self._estimate_tokens(ctx)
            
            if ctx_tokens <= available_tokens:
                compressed.append(ctx)
            else:
                target_ratio = available_tokens / ctx_tokens
                compressed.append(compressor.compress(ctx, target_ratio))
        
        return compressed
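
LightCompressor, ModerateCompressor and AggressiveCompressor (and the _estimate_tokens helper, shown in Section 2.2) are assumed to be defined elsewhere. As one possible sketch, here is a purely extractive compressor that keeps leading sentences up to a target length ratio; it is illustrative, not the author's implementation:

class LightCompressor:
    """Illustrative extractive compressor: keep leading sentences up to a target ratio."""
    
    def compress(self, text: str, target_ratio: float) -> str:
        budget = max(1, int(len(text) * target_ratio))
        
        # Split into sentences on common sentence-ending punctuation
        sentences, current = [], ''
        for ch in text:
            current += ch
            if ch in '.!?。!?':
                sentences.append(current)
                current = ''
        if current:
            sentences.append(current)
        
        # Keep sentences until the character budget is exhausted (at least one)
        kept, used = [], 0
        for s in sentences:
            if used + len(s) > budget and kept:
                break
            kept.append(s)
            used += len(s)
        return ''.join(kept).strip()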

2. Context Routing

2.1 The Context Routing Concept

Context Routing dispatches information to different processing paths according to its content type:

class ContextRouter:
    """Context router."""
    
    def __init__(self):
        self.routing_rules = {
            'factual': {
                'path': 'retrieval',  # knowledge-base retrieval
                'compression': 'high',
                'storage': 'knowledge_base'
            },
            'personal': {
                'path': 'memory',  # personal memory
                'compression': 'low',
                'storage': 'user_profile'
            },
            'session': {
                'path': 'window',  # short-term window
                'compression': 'minimal',
                'storage': 'session_cache'
            },
            'system': {
                'path': 'instruction',  # instruction handling
                'compression': 'none',
                'storage': 'system_prompt'
            }
        }
    
    def route(self, content: str, content_type: str = None) -> dict:
        """Route a piece of content."""
        # Detect the content type automatically if not given
        if content_type is None:
            content_type = self._detect_content_type(content)
        
        routing = self.routing_rules.get(
            content_type,
            self.routing_rules['session']
        )
        
        return {
            'content': content,
            'type': content_type,
            'path': routing['path'],
            'compression': routing['compression'],
            'storage': routing['storage']
        }
    
    def _detect_content_type(self, content: str) -> str:
        """Detect the content type (simplified keyword heuristic)."""
        text = content.lower()
        if any(kw in text for kw in ['remember', 'my name', 'preference']):
            return 'personal'
        elif any(kw in text for kw in ['what is', 'define', 'how to']):
            return 'factual'
        elif 'system' in text or 'you are a' in text:
            return 'system'
        else:
            return 'session'
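
A quick usage sketch of the router; the detected types follow the keyword heuristic above:

router = ContextRouter()

print(router.route("Remember that my name is Alice")['path'])
# memory  (detected as 'personal', stored in the user profile)

print(router.route("What is a state space model?")['path'])
# retrieval  (detected as 'factual', served from the knowledge base)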

2.2 Dynamic Context Selection

from typing import Dict, List


class DynamicContextSelector:
    """Dynamic context selector."""
    
    def __init__(
        self,
        embedding_model,
        attention_scorer
    ):
        self.embedding = embedding_model
        self.attention_scorer = attention_scorer
    
    def select(
        self,
        query: str,
        candidates: List[Dict],
        budget_tokens: int,
        strategy: str = "weighted"
    ) -> List[Dict]:
        """
        Dynamically select contexts.
        
        strategy: 'greedy', 'weighted', 'diverse'
        """
        if strategy == "greedy":
            return self._greedy_select(query, candidates, budget_tokens)
        elif strategy == "weighted":
            return self._weighted_select(query, candidates, budget_tokens)
        else:
            return self._diverse_select(query, candidates, budget_tokens)
    
    def _greedy_select(
        self,
        query: str,
        candidates: List[Dict],
        budget_tokens: int
    ) -> List[Dict]:
        """Greedy selection by relevance."""
        query_emb = self.embedding.encode(query)
        
        # Rank by relevance to the query
        scored = []
        for c in candidates:
            c_emb = self.embedding.encode(c['content'])
            score = self._cosine_similarity(query_emb, c_emb)
            scored.append((score, c))
        
        scored.sort(key=lambda x: x[0], reverse=True)
        
        # Greedily take candidates until the token budget is exhausted
        selected = []
        current_tokens = 0
        
        for score, c in scored:
            c_tokens = c.get('token_count', self._estimate_tokens(c['content']))
            
            if current_tokens + c_tokens <= budget_tokens:
                selected.append(c)
                current_tokens += c_tokens
        
        return selected
    
    def _weighted_select(
        self,
        query: str,
        candidates: List[Dict],
        budget_tokens: int
    ) -> List[Dict]:
        """Weighted selection: combines relevance and importance."""
        query_emb = self.embedding.encode(query)
        
        scored = []
        for c in candidates:
            relevance = self._cosine_similarity(
                query_emb,
                self.embedding.encode(c['content'])
            )
            
            importance = c.get('importance', 0.5)
            
            # Combined score
            combined_score = 0.7 * relevance + 0.3 * importance
            
            scored.append((combined_score, c))
        
        scored.sort(key=lambda x: x[0], reverse=True)
        
        selected = []
        current_tokens = 0
        
        for score, c in scored:
            c_tokens = c.get('token_count', self._estimate_tokens(c['content']))
            
            if current_tokens + c_tokens <= budget_tokens:
                selected.append(c)
                current_tokens += c_tokens
        
        return selected
    
    def _diverse_select(
        self,
        query: str,
        candidates: List[Dict],
        budget_tokens: int
    ) -> List[Dict]:
        """Diversity-aware selection."""
        # First filter to the top-k by relevance
        query_emb = self.embedding.encode(query)
        
        scored = []
        for c in candidates:
            relevance = self._cosine_similarity(
                query_emb,
                self.embedding.encode(c['content'])
            )
            scored.append((relevance, c))
        
        scored.sort(key=lambda x: x[0], reverse=True)
        top_k = scored[:min(20, len(scored))]
        
        # MMR-style diversity selection
        selected = []
        current_tokens = 0
        
        for relevance, c in top_k:
            c_tokens = c.get('token_count', self._estimate_tokens(c['content']))
            
            if current_tokens + c_tokens > budget_tokens:
                continue
            
            # Check diversity against what has already been selected
            is_diverse = True
            for s in selected:
                sim = self._cosine_similarity(
                    self.embedding.encode(c['content']),
                    self.embedding.encode(s['content'])
                )
                if sim > 0.9:  # too similar
                    is_diverse = False
                    break
            
            if is_diverse or len(selected) < 3:  # keep at least 3 items
                selected.append(c)
                current_tokens += c_tokens
        
        return selected
    
    @staticmethod
    def _cosine_similarity(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = sum(x * x for x in a) ** 0.5
        norm_b = sum(x * x for x in b) ** 0.5
        return dot / (norm_a * norm_b + 1e-8)
    
    @staticmethod
    def _estimate_tokens(text: str) -> int:
        # Rough heuristic: a CJK character is roughly one token,
        # a whitespace-separated word is roughly 1.3 tokens.
        cjk = sum(1 for c in text if '\u4e00' <= c <= '\u9fff')
        words = sum(1 for w in text.split() if not any('\u4e00' <= c <= '\u9fff' for c in w))
        return int(cjk + words * 1.3)
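
An illustrative run with a toy bag-of-words embedder standing in for a real embedding model (attention_scorer is not used by the methods shown, so None is passed):

class ToyEmbedder:
    """Hashed bag-of-words vectors; good enough to demonstrate the selector."""
    def encode(self, text: str):
        vec = [0.0] * 64
        for word in text.lower().split():
            vec[hash(word) % 64] += 1.0
        return vec


selector = DynamicContextSelector(embedding_model=ToyEmbedder(), attention_scorer=None)

candidates = [
    {'content': 'Mamba is a selective state space model', 'importance': 0.9},
    {'content': 'The cafeteria menu changes on Fridays', 'importance': 0.1},
]

picked = selector.select(
    "How does Mamba handle long sequences?",
    candidates,
    budget_tokens=50,
    strategy="weighted"
)
print([c['content'] for c in picked])  # ranked by combined relevance and importance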

3. Hierarchical Context

3.1 Layered Context Architecture

┌─────────────────────────────────────────────────────────────────┐
│                Hierarchical Context Architecture                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  L3: Global layer                                               │
│      - Cross-session knowledge                                  │
│      - Long-term user preferences                               │
│      - Domain knowledge bases                                   │
│                                                                 │
│  L2: Session layer                                              │
│      - History of the current session                           │
│      - Task context                                             │
│      - Temporary information                                    │
│                                                                 │
│  L1: Real-time layer                                            │
│      - Current input                                            │
│      - Active focus                                             │
│      - Immediate context                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

3.2 Implementation

class HierarchicalContextManager:
    """Hierarchical context manager."""
    
    def __init__(self):
        # Layer stores; GlobalContextStore / SessionContextStore are assumed
        # to be provided elsewhere (see the sketch below)
        self.global_layer = GlobalContextStore()
        self.session_layer = SessionContextStore()
        self.realtime_layer = {}
    
    def get_context(
        self,
        query: str,
        session_id: str,
        config: dict = None
    ) -> str:
        """Assemble the layered context."""
        if config is None:
            config = self._get_default_config()
        
        context_parts = []
        
        # 1. Real-time layer
        if config.get('include_realtime', True):
            realtime = self.realtime_layer.get(session_id, [])
            context_parts.extend(realtime)
        
        # 2. Session layer
        if config.get('include_session', True):
            session_context = self.session_layer.get(session_id)
            if session_context:
                context_parts.append(f"[Session context]\n{session_context}")
        
        # 3. Global layer
        if config.get('include_global', True):
            global_context = self.global_layer.get_relevant(query)
            if global_context:
                context_parts.append(f"[Global knowledge]\n{global_context}")
        
        return "\n\n".join(context_parts)
    
    def update_context(
        self,
        session_id: str,
        content: str,
        layer: str = 'session'
    ):
        """Update the specified layer."""
        if layer == 'session':
            self.session_layer.add(session_id, content)
        elif layer == 'global':
            self.global_layer.add(content)
        elif layer == 'realtime':
            if session_id not in self.realtime_layer:
                self.realtime_layer[session_id] = []
            self.realtime_layer[session_id].append(content)
    
    def _get_default_config(self) -> dict:
        """Return the default configuration."""
        return {
            'include_realtime': True,
            'include_session': True,
            'include_global': True,
            'max_tokens': {
                'realtime': 2000,
                'session': 8000,
                'global': 5000
            }
        }
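
GlobalContextStore and SessionContextStore are assumed rather than shown. A minimal in-memory sketch, with keyword overlap standing in for real vector retrieval:

class SessionContextStore:
    """Per-session rolling history (illustrative)."""
    def __init__(self, max_items: int = 20):
        self.sessions = {}
        self.max_items = max_items
    
    def add(self, session_id: str, content: str):
        self.sessions.setdefault(session_id, []).append(content)
        self.sessions[session_id] = self.sessions[session_id][-self.max_items:]
    
    def get(self, session_id: str) -> str:
        return "\n".join(self.sessions.get(session_id, []))


class GlobalContextStore:
    """Cross-session knowledge with naive keyword retrieval (illustrative)."""
    def __init__(self):
        self.entries = []
    
    def add(self, content: str):
        self.entries.append(content)
    
    def get_relevant(self, query: str, top_k: int = 3) -> str:
        q = set(query.lower().split())
        scored = sorted(self.entries,
                        key=lambda e: len(q & set(e.lower().split())),
                        reverse=True)
        return "\n".join(scored[:top_k])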

4. Memory-Augmented LM

4.1 Memory-Augmented Language Models

class MemoryAugmentedLM:
    """Memory-augmented language model."""
    
    def __init__(
        self,
        llm_client,
        memory_store,
        controller
    ):
        self.llm = llm_client
        self.memory = memory_store
        self.controller = controller  # memory controller (read/write policy; sketch below)
    
    def generate_with_memory(
        self,
        query: str,
        session_id: str
    ) -> str:
        """Generate with memory in the loop."""
        # 1. Retrieve relevant content from memory
        relevant_memories = self.memory.retrieve(query, session_id)
        
        # 2. Decide which memories to read
        selected_memories = self.controller.select_read(
            query,
            relevant_memories
        )
        
        # 3. Build the prompt (_build_prompt is assumed to be defined elsewhere)
        prompt = self._build_prompt(query, selected_memories)
        
        # 4. Generate
        response = self.llm.generate(prompt)
        
        # 5. Decide whether to write back to memory
        if self.controller.should_write(response, session_id):
            self.memory.write(response, session_id)
        
        return response
    
    def learn_from_feedback(
        self,
        query: str,
        response: str,
        feedback: dict,
        session_id: str
    ):
        """Learn from user feedback."""
        # Update memory weights
        self.memory.update_weights(
            query,
            response,
            feedback,
            session_id
        )
        
        # On negative feedback, revise the related memories
        if feedback.get('rating', 0) < 3:
            self._revise_memory(query, response, feedback, session_id)
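
The controller is only characterized by its interface above (select_read, should_write). A minimal heuristic controller, as one possible sketch:

class SimpleMemoryController:
    """Heuristic read/write policy for the memory-augmented LM (illustrative)."""
    
    def __init__(self, max_read: int = 5, min_write_length: int = 40):
        self.max_read = max_read
        self.min_write_length = min_write_length
    
    def select_read(self, query: str, memories: list) -> list:
        # Keep the top few memories, assuming the store returns them ranked
        return memories[:self.max_read]
    
    def should_write(self, response: str, session_id: str) -> bool:
        # Only persist substantive responses; skip short acknowledgements
        return len(response) >= self.min_write_length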

4.2 Neural Memory Module

from typing import List


class NeuralMemoryModule:
    """Neural memory module."""
    
    def __init__(self, embedding_dim: int = 768):
        self.embedding_dim = embedding_dim
        self.memory_keys = []  # key strings (key vectors in a full version)
        self.memory_values = []  # memory contents
        self.memory_strengths = []  # memory strengths
    
    def write(
        self,
        key: str,
        value: str,
        strength: float = 1.0
    ):
        """Write a memory."""
        # Simplified: store directly
        self.memory_keys.append(key)
        self.memory_values.append(value)
        self.memory_strengths.append(strength)
    
    def read(self, query: str, top_k: int = 5) -> List[dict]:
        """Read memories."""
        # Simplified implementation:
        # a real system would use vector similarity instead of word overlap
        results = []
        
        for i, key in enumerate(self.memory_keys):
            # Simple keyword matching
            overlap = len(set(key.split()) & set(query.split()))
            if overlap > 0:
                results.append({
                    'content': self.memory_values[i],
                    'strength': self.memory_strengths[i],
                    'relevance': overlap / len(key.split())
                })
        
        # Sort and return the top_k
        results.sort(key=lambda x: x['strength'] * x['relevance'], reverse=True)
        return results[:top_k]
    
    def update_strength(
        self,
        index: int,
        delta: float
    ):
        """Update a memory's strength."""
        if 0 <= index < len(self.memory_strengths):
            self.memory_strengths[index] += delta
            # Clamp to a reasonable range
            self.memory_strengths[index] = max(0.1, min(2.0, self.memory_strengths[index]))
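
Using the module as defined above (keyword matching only; a real system would replace read with vector similarity over stored key embeddings):

memory = NeuralMemoryModule()
memory.write(key="mamba selective state space", value="Mamba scales linearly with sequence length")
memory.write(key="transformer attention cost", value="Self-attention costs O(n^2) in sequence length")

for hit in memory.read("how does mamba scale with sequence length", top_k=2):
    print(hit['relevance'], hit['content'])

memory.update_strength(0, +0.2)  # reinforce the first memory after positive feedback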

5. State Space Models

5.1 SSM Fundamentals

State space models (SSMs) are a newer family of sequence-modeling architectures whose cost grows linearly with sequence length, in contrast to the quadratic cost of Transformer attention:

import numpy as np


class StateSpaceModel:
    """
    Basic (discrete-time, linear) state space model.
    
    An SSM models a sequence through its state equations:
    x(t+1) = A x(t) + B u(t)
    y(t)   = C x(t) + D u(t)
    
    where:
    - A: state transition matrix
    - B: input matrix
    - C: output matrix
    - D: feed-through (skip) matrix
    """
    
    def __init__(
        self,
        state_dim: int,
        input_dim: int,
        output_dim: int
    ):
        self.state_dim = state_dim
        self.input_dim = input_dim
        self.output_dim = output_dim
        
        # Initialize parameters
        self.A = np.random.randn(state_dim, state_dim) * 0.1
        self.B = np.random.randn(state_dim, input_dim) * 0.1
        self.C = np.random.randn(output_dim, state_dim) * 0.1
        self.D = np.zeros((output_dim, input_dim))
    
    def forward(self, u: np.ndarray) -> tuple:
        """
        SSM forward pass (sequential recurrence).
        
        Args:
            u: input sequence [seq_len, input_dim]
        
        Returns:
            y: output sequence [seq_len, output_dim]
            state: final state [state_dim]
        """
        seq_len = len(u)
        state = np.zeros(self.state_dim)
        outputs = []
        
        for t in range(seq_len):
            # State update
            state = self.A @ state + self.B @ u[t]
            
            # Output
            y = self.C @ state + self.D @ u[t]
            outputs.append(y)
        
        return np.array(outputs), state
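
A quick run of the recurrence above. Each step costs the same regardless of position, so total work is linear in sequence length, and the final state is a fixed-size summary of everything seen so far:

import numpy as np

np.random.seed(0)
ssm = StateSpaceModel(state_dim=8, input_dim=4, output_dim=2)

u = np.random.randn(100, 4)      # a length-100 input sequence
y, final_state = ssm.forward(u)

print(y.shape)            # (100, 2)
print(final_state.shape)  # (8,)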

5.2 The Mamba Architecture

Mamba is a selective state space model (selective SSM): its SSM parameters are computed from the input, giving it an input-dependent selection mechanism:

import torch
import torch.nn as nn
import torch.nn.functional as F


class MambaBlock(nn.Module):
    """
    Mamba selection mechanism (simplified).
    
    Core improvement over a plain SSM: the parameters dt, B and C are
    functions of the input, which makes the sequence modeling content-aware.
    """
    
    def __init__(
        self,
        d_model: int,
        d_state: int = 16,
        d_conv: int = 4,
        expand: int = 2
    ):
        super().__init__()
        self.d_model = d_model
        self.d_state = d_state
        self.d_conv = d_conv
        self.expand = expand
        self.d_inner = d_model * expand
        
        # Input projection into the SSM branch and the gating branch
        self.in_proj = nn.Linear(d_model, self.d_inner * 2)
        
        # Depthwise causal convolution over time
        self.conv = nn.Conv1d(
            self.d_inner,
            self.d_inner,
            d_conv,
            padding=d_conv - 1,
            groups=self.d_inner
        )
        
        # Input-dependent SSM parameter projections (the selection mechanism)
        self.x_proj = nn.Linear(self.d_inner, self.d_state * 2 + 1)
        self.dt_proj = nn.Linear(1, self.d_inner)
        
        # Output projection
        self.out_proj = nn.Linear(self.d_inner, d_model)
        
        # Log of the state transition matrix A (kept negative via -exp)
        self.A_log = nn.Parameter(torch.zeros(self.d_inner, self.d_state))
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Mamba forward pass. x: (batch, seq_len, d_model)."""
        batch, seq_len, d = x.shape
        
        # Input projection, split into SSM input and gate
        xz = self.in_proj(x)
        x_inner, z = xz.chunk(2, dim=-1)
        
        # Causal convolution (truncate the extra padding on the right)
        x_conv = self.conv(x_inner.transpose(1, 2))[..., :seq_len].transpose(1, 2)
        x_conv = F.silu(x_conv)
        
        # SSM parameters (selection mechanism): dt, B, C depend on the input
        x_dbl = self.x_proj(x_conv)
        dt, B, C = x_dbl.split([1, self.d_state, self.d_state], dim=-1)
        dt = F.softplus(self.dt_proj(dt))  # positive, per-channel step sizes
        
        # Selective scan
        A = -torch.exp(self.A_log)
        y = self.selective_scan(x_conv, dt, A, B, C)
        
        # Gate and project out
        y = y * F.silu(z)
        return self.out_proj(y)
    
    def selective_scan(self, u, dt, A, B, C):
        """
        Selective scan (sequential reference implementation).
        
        This is the heart of Mamba: the SSM parameters vary per time step.
        The real implementation uses a hardware-aware parallel scan.
        """
        batch, seq_len, d_inner = u.shape
        
        # Discretize: A_bar = exp(dt * A), B_bar ~ dt * B (Euler approximation)
        A_bar = torch.exp(dt.unsqueeze(-1) * A)       # (b, l, d_inner, d_state)
        B_bar = dt.unsqueeze(-1) * B.unsqueeze(2)     # (b, l, d_inner, d_state)
        
        # Scan over time with a per-channel hidden state
        h = u.new_zeros(batch, d_inner, self.d_state)
        ys = []
        for t in range(seq_len):
            h = A_bar[:, t] * h + B_bar[:, t] * u[:, t].unsqueeze(-1)
            ys.append((h * C[:, t].unsqueeze(1)).sum(dim=-1))  # (b, d_inner)
        
        return torch.stack(ys, dim=1)
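
A shape check on the simplified block (the sequential scan here is for illustration; the official Mamba kernel implements the same recurrence with a hardware-aware parallel scan):

import torch

block = MambaBlock(d_model=64)
x = torch.randn(2, 128, 64)   # (batch, seq_len, d_model)
y = block(x)
print(y.shape)                # torch.Size([2, 128, 64])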

5.3 The RWKV Architecture

RWKV (Receptance Weighted Key Value) is an architecture that combines the strengths of RNNs and Transformers:

import torch
import torch.nn as nn


class RWKVBlock(nn.Module):
    """
    RWKV block.
    
    Properties:
    - Linear attention (O(n) complexity)
    - Trainable in parallel like a Transformer
    - Efficient recurrent inference
    - Retains long-range dependencies
    """
    
    def __init__(
        self,
        d_model: int,
        d_ffn: int,
        layer_id: int,
        n_layers: int
    ):
        super().__init__()
        self.layer_id = layer_id
        self.n_layers = n_layers
        
        # Time mixing (attention-like)
        self.time_mix = TimeMixing(d_model)
        
        # Channel mixing (FFN-like); a minimal ChannelMixing sketch appears below
        self.channel_mix = ChannelMixing(d_model, d_ffn)
    
    def forward(self, x: torch.Tensor, last_state=None) -> tuple:
        """RWKV forward pass with residual connections."""
        # Time mixing (attention-like)
        mixed, new_time_state = self.time_mix(x, last_state)
        x = x + mixed
        
        # Channel mixing (FFN-like)
        x = x + self.channel_mix(x)
        
        return x, {'time': new_time_state}
 
 
class TimeMixing(nn.Module):
    """RWKV time mixing."""
    
    def __init__(self, d_model: int):
        super().__init__()
        
        self.d_model = d_model
        
        # RWKV parameters: per-channel decay rate and "first token" bonus
        self.time_decay = nn.Parameter(torch.zeros(d_model))
        self.time_first = nn.Parameter(torch.zeros(d_model))
        
        # Projections
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)
        self.receptance = nn.Linear(d_model, d_model)
        
        self.output = nn.Linear(d_model, d_model)
    
    def forward(self, x: torch.Tensor, state=None) -> tuple:
        """
        RWKV time mixing.
        
        Core formula (per channel, with decay w and current-token bonus u):
        
            wkv_t = ( sum_{i<t} exp(-(t-1-i)*w + k_i) * v_i + exp(u + k_t) * v_t )
                    / ( sum_{i<t} exp(-(t-1-i)*w + k_i) + exp(u + k_t) )
        
            output_t = W_out @ ( sigmoid(r_t) * wkv_t )
        """
        B, T, C = x.shape
        
        # Projections
        k = self.key(x)
        v = self.value(x)
        r = torch.sigmoid(self.receptance(x))
        
        # WKV computation (simplified)
        wkv = self._compute_wkv(k, v)
        
        # Output
        rwkv = r * wkv
        output = self.output(rwkv)
        
        return output, None  # recurrent state omitted in this simplified version
    
    def _compute_wkv(self, k, v):
        """
        Compute the WKV term sequentially.
        
        Simplified reference implementation; the real one uses a
        numerically stable, parallelized algorithm.
        """
        B, T, C = k.shape
        
        decay = torch.exp(-torch.exp(self.time_decay))  # per-channel decay factor in (0, 1)
        bonus = torch.exp(self.time_first)               # extra weight for the current token
        
        num = torch.zeros(B, C, device=k.device)  # running weighted sum of values
        den = torch.zeros(B, C, device=k.device)  # running sum of weights
        wkv = torch.zeros_like(k)
        
        for t in range(T):
            kt = torch.exp(k[:, t])
            # Current token contributes with the "first" bonus
            wkv[:, t] = (num + bonus * kt * v[:, t]) / (den + bonus * kt + 1e-8)
            # Decay the running sums and absorb the current token
            num = decay * num + kt * v[:, t]
            den = decay * den + kt
        
        return wkv
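
ChannelMixing is referenced by RWKVBlock but not shown above. A minimal version in the spirit of RWKV's FFN-like channel mix (illustrative; the real implementation also interpolates each input with the previous token, the so-called token shift, which is omitted here):

import torch
import torch.nn as nn


class ChannelMixing(nn.Module):
    """RWKV channel mixing: a gated, position-wise feed-forward layer."""
    
    def __init__(self, d_model: int, d_ffn: int):
        super().__init__()
        self.key = nn.Linear(d_model, d_ffn)
        self.value = nn.Linear(d_ffn, d_model)
        self.receptance = nn.Linear(d_model, d_model)
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k = torch.square(torch.relu(self.key(x)))  # squared ReLU, as used in RWKV
        r = torch.sigmoid(self.receptance(x))
        return r * self.value(k)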

5.4 SSM vs. Transformer Comparison

Feature | Transformer | SSM (Mamba/RWKV)
Time complexity | O(n²) | O(n)
Space complexity | O(n²) | O(n)
Long-range dependencies | Limited (positional encoding) | Maintained naturally
Parallel training | Fully parallel | Nearly fully parallel
Inference speed | Slow (full attention) | Fast (linear)
Interpretability | Attention weights | State vectors
In-context learning | Strong | Weaker (but improving)
Hardware efficiency | Moderate | High

6. Future Outlook

6.1 Near-Term Trends (1-2 Years)

  1. Context windows keep growing: million-token windows become the norm
  2. Smarter compression: driven by semantic understanding rather than rules
  3. Native cache support: caching at both the hardware and model level
  4. Multimodal context: unified handling of text, images, and video

6.2 Mid-Term Trends (3-5 Years)

  1. Adaptive context: models learn the optimal context strategy on their own
  2. External memory integration: persistent, learnable external memory systems
  3. State space fusion: deep hybrids of SSMs and Transformers
  4. Personalized context: context management tailored to each user

6.3 Long-Term Vision

┌─────────────────────────────────────────────────────────────────┐
│                     The Ideal Context System                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  User intent ──► understanding ──► optimal context ──► efficient inference
│                                         │                       │
│                               ┌─────────┴─────────┐             │
│                               │                   │             │
│                        Long-term memory    Short-term window    │
│                               │                   │             │
│                         ┌─────┴─────┐         ┌───┴───┐         │
│                         │           │         │       │         │
│                 Knowledge graph  Personal   Session   Real-time │
│                                  memory     history   info      │
│                                                                 │
│  Goal: context that is as natural, efficient, and personal      │
│        as human memory                                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

7. References

  1. Gu, A., et al. (2021). Efficiently Modeling Long Sequences with Structured State Spaces. NeurIPS.
  2. Dao, T., et al. (2023). Hungry Hungry Hippos: Towards Language Modeling with State Space Models. ICLR.
  3. Peng, B., et al. (2023). RWKV: Reinventing RNNs for the Transformer Era. ACL.
  4. Liu, H., et al. (2024). Mixture-of-Experts Meets Instruction Tuning. arXiv.
  5. Anthropic (2024). Claude 3.5 System Card.