关键词
| 进化策略 | CMA-ES | 协方差矩阵 | 自然进化策略 | 步长控制 |
|---|---|---|---|---|
| (μ+λ)-ES | (μ,λ)-ES | 重组操作 | 累积步长调整 | 成功比率 |
| 权重更新 | 自然梯度 | 信息几何 | 等距采样 | 排名选择 |
| Hansen | Beyer | 演化多样性 | 全局收敛 | 超参数自适应 |
摘要
进化策略(Evolution Strategies, ES)是最早的进化计算范式之一,由Ingo Rechenberg和Hans-Paul Schwefel于1960年代在德国柏林工业大学开创。与遗传算法侧重于离散编码不同,进化策略专门针对连续实数参数的优化问题,采用正态分布采样进行解的搜索。其标志性成果——协方差矩阵自适应进化策略(CMA-ES)——被认为是连续优化领域最成功的算法之一,在各种基准测试和实际应用中展现出卓越的性能。
一、进化策略的历史演进
1.1 起源(1960s)
1964年前后,Ingo Rechenberg在柏林工业大学从事流体力学实验研究时,首次提出进化策略的概念。最初的应用是通过随机变异与选择来优化风洞实验中的部件形状(如折板与喷嘴),以获得更好的流动性能。通过模拟生物进化过程,他开发了第一个进化策略算法。
Rechenberg的贡献:
- 证明了算法在1/5成功规则下能够收敛
- 建立了进化策略的数学收敛性框架
Schwefel的贡献:
- 扩展了进化策略的应用范围
- 提出了多目标优化的雏形
1.2 经典进化策略类型
(μ+λ)-ES(Plus策略)
在每代中,父母代和子代共同竞争存活:从 μ 个父代与 λ 个子代的并集中选出最优的 μ 个个体进入下一代,优秀的父代可以无限期存活。
(μ,λ)-ES(Comma策略)
只有子代参与竞争,父母代完全被替换:仅从 λ 个子代中选出最优的 μ 个个体(要求 λ ≥ μ),父代在一代之后即被丢弃。
Note
(μ,λ)-ES更适合动态环境和非稳态优化,因为其"遗忘机制"能够避免陷入局部最优。(μ+λ)-ES则更适合稳态优化,能够更好地利用优秀解。
1.3 从(1+1)-ES到现代ES
最简单的进化策略是(1+1)-ES:
import numpy as np

class OnePlusOneES:
def __init__(self, objective, bounds, n_iterations=1000):
self.objective = objective
self.bounds = bounds
self.n_iterations = n_iterations
self.dim = len(bounds[0])
def mutate(self, parent, sigma):
"""正态分布变异"""
child = parent + np.random.normal(0, sigma, self.dim)
return np.clip(child, self.bounds[0], self.bounds[1])
def run(self):
# 初始化
parent = np.random.uniform(self.bounds[0], self.bounds[1])
parent_fitness = self.objective(parent)
# 1/5成功规则参数
sigma = 0.1
success_count = 0
target_success_rate = 0.2
for t in range(self.n_iterations):
# 生成子代
child = self.mutate(parent, sigma)
child_fitness = self.objective(child)
if child_fitness < parent_fitness: # 最小化问题
parent = child
parent_fitness = child_fitness
success_count += 1
            # 1/5成功规则:每 10*dim 代统计一次成功率并调整步长
            if (t + 1) % (10 * self.dim) == 0:
                success_rate = success_count / (10 * self.dim)
                if success_rate > target_success_rate:
                    sigma *= 1 / 0.85  # 成功率偏高,增大步长
                else:
                    sigma *= 0.85      # 成功率偏低,减小步长
                success_count = 0
        return parent, parent_fitness
二、协方差矩阵自适应(CMA-ES)
2.1 基本原理
CMA-ES(Covariance Matrix Adaptation Evolution Strategy)由Nikolaus Hansen和Andreas Ostermeier提出,并在2001年的论文中给出了完整的去随机化自适应框架,是现代进化策略的巅峰之作。
核心思想:通过自适应地学习目标函数的局部几何结构,引导搜索向最优方向前进。
关键创新:
- 学习目标函数的局部曲率
- 自适应调整搜索步长
- 利用历史信息加速收敛
2.2 多元正态分布采样
CMA-ES的核心是在 $n$ 维参数空间中进行多元正态分布采样:
$$x_k^{(g+1)} \sim m^{(g)} + \sigma^{(g)} \mathcal{N}\big(0, C^{(g)}\big), \qquad k = 1, \dots, \lambda$$
其中:
- $m^{(g)}$:均值向量(当前搜索中心)
- $\sigma^{(g)}$:全局步长
- $C^{(g)}$:协方差矩阵(描述椭圆等高线的形状和方向)
2.3 协方差矩阵更新
2.3.1 rank-1更新
当发现好解时,沿演化路径方向放大协方差矩阵:
$$C \leftarrow (1 - c_1)\, C + c_1\, p_c p_c^{\top}$$
其中 $c_1$ 为学习率,$p_c$ 为演化路径(evolution path)。
2.3.2 rank-μ更新
使用前 μ 个最优个体的信息更新协方差矩阵:
$$C \leftarrow (1 - c_\mu)\, C + c_\mu \sum_{i=1}^{\mu} w_i\, y_{i:\lambda} y_{i:\lambda}^{\top}, \qquad y_{i:\lambda} = \frac{x_{i:\lambda} - m}{\sigma}$$
其中 $w_i$ 为重组权重,满足 $w_1 \ge w_2 \ge \dots \ge w_\mu > 0$ 且 $\sum_{i=1}^{\mu} w_i = 1$。
2.3.3 完整的协方差更新
$$C^{(g+1)} = (1 - c_1 - c_\mu)\, C^{(g)} + c_1\, p_c p_c^{\top} + c_\mu \sum_{i=1}^{\mu} w_i\, y_{i:\lambda} y_{i:\lambda}^{\top}$$
2.4 CMA-ES完整实现
import numpy as np
class CMAES:
"""
协方差矩阵自适应进化策略
基于 Hansen & Ostermeier (2001) 实现
"""
def __init__(self, objective, dim, x0=None, sigma=0.3,
popsize=None, weight_power=1):
self.objective = objective
self.dim = dim
self.sigma = sigma
# 默认种群规模
self.lambda_ = 4 + int(3 * np.log(dim)) if popsize is None else popsize
# 初始化参数
self.xmean = x0 if x0 is not None else np.random.randn(dim)
# 协方差矩阵初始化为单位阵
self.C = np.eye(dim)
self.pc = np.zeros(dim) # 演化路径
self.ps = np.zeros(dim) # 演化路径(步长)
# 学习率
self.mu_eff = None # 有效个体数
self.set_weights(weight_power)
# 步长控制参数
self.cc = 4 / (dim + 4) # cc ≈ 4/(n+4)
self.c1 = 2 / ((dim + 1.3)**2 + self.mu_eff) # c1 ≈ 2/n²
self.cmu = self.mu_eff / ((dim + 1.3)**2 + 2*self.mu_eff)
self.clambda = 1 / (self.mu_eff + 2)
# 累积步长调整参数
self.cs = (self.mu_eff + 2) / (self.dim + self.mu_eff + 5)
self.damps = 1 + 2*max(0, np.sqrt((self.mu_eff-1)/(dim+1))-1) + self.cs
# 适应度历史
self.fitness_history = []
self.best_fitness = float('inf')
self.best_solution = self.xmean.copy()
    def set_weights(self, power=1):
        """设置重组权重:按排名递减,仅前 mu 个个体参与重组"""
        self.mu = self.lambda_ // 2
        weights = np.array([max(self.mu - i, 1) for i in range(self.mu)], dtype=float)
        weights = weights ** power
        self.weights = weights / weights.sum()
        # 计算有效个体数
        self.mu_eff = 1.0 / np.sum(self.weights ** 2)
def sample_population(self):
"""从多元正态分布采样"""
# 使用Cholesky分解进行采样
B = np.linalg.cholesky(self.C)
self.population = np.array([
self.xmean + self.sigma * B @ np.random.randn(self.dim)
for _ in range(self.lambda_)
])
# 评估适应度
self.pop_fitness = np.array([self.objective(x) for x in self.population])
# 记录最优解
best_idx = np.argmin(self.pop_fitness)
if self.pop_fitness[best_idx] < self.best_fitness:
self.best_fitness = self.pop_fitness[best_idx]
self.best_solution = self.population[best_idx].copy()
    def update_distribution(self):
        """更新均值、协方差矩阵和步长"""
        # 按适应度排序(最小化问题:适应度越小排名越靠前)
        sorted_indices = np.argsort(self.pop_fitness)
        sorted_pop = self.population[sorted_indices]
        # 更新均值:前 mu 个个体的加权重组
        old_mean = self.xmean.copy()
        self.xmean = np.sum(self.weights[:, None] * sorted_pop[:self.mu], axis=0)
        # 更新演化路径(步长控制),用 C^{-1/2} 消除坐标间相关性
        y = (self.xmean - old_mean) / self.sigma
        D, B = np.linalg.eigh(self.C)
        D = np.maximum(D, 1e-20)  # 数值保护
        invsqrtC = B @ np.diag(1.0 / np.sqrt(D)) @ B.T
        self.ps = (1 - self.cs) * self.ps + \
            np.sqrt(self.cs * (2 - self.cs) * self.mu_eff) * (invsqrtC @ y)
        # 累积步长调整(CSA)
        ps_norm = np.linalg.norm(self.ps)
        expected_norm = np.sqrt(self.dim) * (1 - 1/(4*self.dim) + 1/(21*self.dim**2))
        self.sigma *= np.exp((self.cs / self.damps) * (ps_norm / expected_norm - 1))
        # 更新演化路径(协方差;省略 h_sigma 停滞判据的简化版)
        self.pc = (1 - self.cc) * self.pc + \
            np.sqrt(self.cc * (2 - self.cc) * self.mu_eff) * y
        # rank-1 更新
        rank1_update = self.c1 * np.outer(self.pc, self.pc)
        # rank-μ 更新
        rank_mu_update = np.zeros((self.dim, self.dim))
        for i, w in enumerate(self.weights):
            yi = (sorted_pop[i] - old_mean) / self.sigma
            rank_mu_update += w * np.outer(yi, yi)
        self.C = (1 - self.c1 - self.cmu) * self.C + rank1_update + self.cmu * rank_mu_update
        # 保持协方差矩阵的对称性
        self.C = np.triu(self.C) + np.triu(self.C, 1).T
def run(self, max_iterations=1000, tol=1e-8):
"""运行CMA-ES"""
for iteration in range(max_iterations):
self.sample_population()
self.update_distribution()
self.fitness_history.append(self.best_fitness)
if iteration % 100 == 0:
print(f"Iter {iteration}: Best fitness = {self.best_fitness:.6f}")
# 早停条件
if self.sigma < tol:
break
        return self.best_solution, self.best_fitness
2.5 CMA-ES参数设置指南
| 参数 | 典型值 | 说明 |
|---|---|---|
| $\lambda$ | $4 + \lfloor 3 \ln n \rfloor$ | 种群规模,随维度按对数增长 |
| $\mu$ | $\lambda / 2$ | 父代数量,通常取种群的一半 |
| $c_1$ | $\approx 2 / \big((n+1.3)^2 + \mu_{\mathrm{eff}}\big)$ | rank-1更新学习率 |
| $c_\mu$ | $\approx \mu_{\mathrm{eff}} / \big((n+1.3)^2 + 2\mu_{\mathrm{eff}}\big)$ | rank-μ更新学习率 |
| $c_c$ | $\approx 4 / (n+4)$ | 协方差演化路径学习率 |
| $d_\sigma$ | $1 + 2\max\big(0, \sqrt{(\mu_{\mathrm{eff}}-1)/(n+1)} - 1\big) + c_\sigma$ | 步长阻尼参数 |
Tip
对于非凸黑盒优化问题,CMA-ES通常是最可靠的无导数优化器之一。对于凸二次函数,CMA-ES能够学到与Hessian逆矩阵成比例的协方差,从而在无导数方法中达到接近最优的收敛速率。
三、自然进化策略(Natural Evolution Strategies)
3.1 信息几何视角
自然进化策略(Natural Evolution Strategies, NES)由Wierstra等人提出(最早见于2008年,系统性的框架论文发表于2014年的JMLR),基于信息几何学和自然梯度优化。
核心思想:在概率分布的参数空间中,使用自然梯度而非欧几里得梯度进行更新。
3.2 经典NES算法
3.2.1 目标函数
NES不直接优化解本身,而是在搜索分布 $\pi(x \mid \theta)$ 的参数空间中优化期望适应度:
$$J(\theta) = \mathbb{E}_{x \sim \pi(\cdot \mid \theta)}\big[f(x)\big] = \int f(x)\, \pi(x \mid \theta)\, \mathrm{d}x$$
3.2.2 自然梯度更新
自然梯度通过Fisher信息矩阵对普通梯度进行重归一化:
$$\theta \leftarrow \theta + \eta\, F^{-1} \nabla_{\theta} J(\theta)$$
Fisher信息矩阵:
$$F = \mathbb{E}_{x \sim \pi(\cdot \mid \theta)}\Big[\nabla_{\theta} \log \pi(x \mid \theta)\, \nabla_{\theta} \log \pi(x \mid \theta)^{\top}\Big]$$
3.3 梯度估计
利用对数似然技巧,梯度可写成期望形式并用样本近似:
$$\nabla_{\theta} J(\theta) = \mathbb{E}\big[f(x)\, \nabla_{\theta} \log \pi(x \mid \theta)\big] \approx \frac{1}{N} \sum_{k=1}^{N} f(x_k)\, \nabla_{\theta} \log \pi(x_k \mid \theta)$$
对于高斯分布 $\pi(x \mid \mu, \Sigma)$:
$$\nabla_{\mu} \log \pi = \Sigma^{-1}(x - \mu), \qquad \nabla_{\Sigma} \log \pi = \tfrac{1}{2}\Big(\Sigma^{-1}(x-\mu)(x-\mu)^{\top}\Sigma^{-1} - \Sigma^{-1}\Big)$$
import numpy as np

class NaturalEvolutionStrategies:
def __init__(self, objective, dim, learning_rate=0.01):
self.objective = objective
self.dim = dim
self.lr = learning_rate
# 初始化均值和协方差
self.mean = np.zeros(dim)
self.cov = np.eye(dim)
def sample(self, n_samples=100):
"""从高斯分布采样"""
return np.random.multivariate_normal(self.mean, self.cov, n_samples)
    def fitness_shaping(self, fitnesses):
        """适应度塑形:只保留排名信息,对极端适应度值保持稳健"""
        ranks = np.argsort(np.argsort(fitnesses))  # 排名,0 表示适应度最小(最好)
        n = len(fitnesses)
        # 最小化问题:排名越靠前效用越大
        utilities = np.array([max(0, np.log(n / 2 + 1) - np.log(r + 1))
                              for r in ranks])
        utilities /= utilities.sum()  # 归一化
        return utilities
def compute_gradient(self, samples, utilities):
"""计算自然梯度"""
n = len(samples)
# 梯度(均值)
grad_mean = np.zeros(self.dim)
for x, u in zip(samples, utilities):
grad_mean += u * (x - self.mean)
# 梯度(协方差)
grad_cov = np.zeros((self.dim, self.dim))
for x, u in zip(samples, utilities):
diff = (x - self.mean).reshape(-1, 1)
grad_cov += u * (diff @ diff.T - self.cov)
return grad_mean, grad_cov
    def update(self, grad_mean, grad_cov):
        """更新参数"""
        self.mean += self.lr * grad_mean
        self.cov += self.lr * grad_cov
        # 保持协方差矩阵对称,并将特征值截断为正以维持正定性
        self.cov = np.triu(self.cov) + np.triu(self.cov, 1).T
        eigvals, eigvecs = np.linalg.eigh(self.cov)
        eigvals = np.clip(eigvals, 1e-10, None)
        self.cov = eigvecs @ np.diag(eigvals) @ eigvecs.T
def run(self, n_iterations=1000):
for t in range(n_iterations):
# 采样
samples = self.sample()
fitnesses = np.array([self.objective(x) for x in samples])
# 适应度塑形
utilities = self.fitness_shaping(fitnesses)
# 计算梯度
grad_mean, grad_cov = self.compute_gradient(samples, utilities)
# 更新
self.update(grad_mean, grad_cov)
if t % 100 == 0:
print(f"Iter {t}: Best = {min(fitnesses):.6f}")
        return self.mean
3.4 xNES算法
xNES(exponential Natural Evolution Strategies)是NES的一个重要变体,使用指数参数化确保正定性:
from scipy.linalg import expm

def xnes_update(mean, A, grad_delta, grad_M, lr_mean=1.0, lr_cov=None):
    """
    xNES参数更新(示意)
    mean: 均值向量
    A: 协方差的因子矩阵,满足 C = A @ A.T
    grad_delta: 局部坐标下均值方向的自然梯度
    grad_M: 局部坐标下形状矩阵(对称矩阵)的自然梯度
    """
    if lr_cov is None:
        lr_cov = lr_mean / 2
    # 均值沿 A 变换后的方向更新
    mean = mean + lr_mean * A @ grad_delta
    # 指数映射更新:矩阵指数保证 A(从而协方差 C)始终可逆且正定
    A = A @ expm(0.5 * lr_cov * grad_M)
    return mean, A
四、重组操作(Recombination)
4.1 离散重组
从父代中逐维随机选择等位基因:
$$x'_i = x_{k_i,\, i}, \qquad k_i \in \{1, \dots, \mu\}$$
其中 $k_i$ 为第 $i$ 维独立随机抽取的父代编号。
4.2 中介重组
取随机选出的两个父代 $x_a$、$x_b$ 的(加权)中点:
$$x'_i = \alpha\, x_{a,i} + (1-\alpha)\, x_{b,i}, \qquad \alpha \in [0, 1]$$
当 $\alpha = 1/2$ 时即为两个父代的中点。
4.3 重组操作的代码实现
import random
import numpy as np

def intermediate_recombination(parents, n_offspring):
"""中介重组"""
offspring = []
for _ in range(n_offspring):
# 随机选择两个父代
p1, p2 = random.sample(parents, 2)
# 加权混合
alpha = random.random()
child = alpha * p1 + (1 - alpha) * p2
offspring.append(child)
return np.array(offspring)
def global_intermediate_recombination(parents, n_offspring):
    """全局中介重组(每个子代由所有父代共同决定)"""
    parents = np.asarray(parents)
    offspring = []
    for _ in range(n_offspring):
        # 为每个父代生成随机权重并归一化,子代为所有父代的加权平均
        alpha = np.random.rand(len(parents))
        alpha /= alpha.sum()
        child = alpha @ parents
        offspring.append(child)
    return np.array(offspring)
五、步长控制机制
5.1 累积步长调整(CSA)
累积步长调整(Cumulative Step Size Adaptation, CSA)通过跟踪演化路径来判断是否需要调整步长:
def cumulative_step_size_adaptation(ps, sigma, dim, cs, damps, expected_norm):
"""
累积步长调整
ps: 演化路径
sigma: 当前步长
"""
ps_norm = np.linalg.norm(ps)
# 更新步长
sigma *= np.exp((cs / damps) * (ps_norm / expected_norm - 1))
    return sigma
5.2 基于成对比较的步长控制
另一种做法是将子代适应度与父代适应度逐一比较,按"成功"比例调节步长,思想与1/5成功规则一致:
def pairwise_step_control(fitness_parents, fitness_offspring, sigma):
    """基于父代-子代成对比较的步长控制(最小化问题)"""
    n_better = sum(1 for fo in fitness_offspring
                   if any(fo < fp for fp in fitness_parents))
    success_rate = n_better / len(fitness_offspring)
    # 成功率高于目标值则增大步长,低于则减小
    if success_rate > 0.2:
        sigma *= 1.1
    elif success_rate < 0.2:
        sigma *= 0.9
    return sigma
5.3 成功比率自适应
class SuccessRatioAdaptation:
    def __init__(self, dim, target_ratio=0.25):
        self.target = target_ratio
        self.success_history = []
        self.window_size = 10 * dim
def update(self, parent_fitness, offspring_fitness):
n_success = sum(1 for fo in offspring_fitness if fo < parent_fitness)
self.success_history.append(n_success / len(offspring_fitness))
if len(self.success_history) > self.window_size:
self.success_history.pop(0)
def get_step_factor(self):
avg_success = np.mean(self.success_history)
if avg_success > self.target:
return 1.05
else:
            return 0.95
六、CMA-ES的工程实践
6.1 典型应用场景
CMA-ES特别适用于以下问题类型:
Tip
- 非凸优化:山谷、盆地、平台等复杂地形
- 多模态优化:需要发现多个局部最优
- 病态条件数:Hessian矩阵条件数很大的问题
- 黑盒优化:梯度不可用或计算代价高昂
6.2 基准测试表现
在COCO(BBOB)基准测试中,CMA-ES在以下函数类型上表现优异:
| 问题类型 | CMA-ES表现 | 备注 |
|---|---|---|
| 球函数 | ★★★★★ | 接近理论最优 |
| Ellipsoidal | ★★★★☆ | 收敛稍慢 |
| Rastrigin | ★★★★☆ | 需要较大种群 |
| Rosenbrock | ★★★★☆ | 整体良好 |
| Sharp Ridge | ★★★☆☆ | 挑战性较大 |
6.3 实践建议
# CMA-ES使用最佳实践
def practical_cma_es_example():
    # 问题定义
    dim = 20
    lower, upper = np.zeros(dim), np.ones(dim)
    # 边界约束:最简单的做法是在目标函数内部截断
    # (更严谨的处理可以使用边界变换或罚函数)
    def objective(x):
        x = np.clip(x, lower, upper)
        return np.sum((x - 0.5) ** 2)
    # 参数设置
    x0 = np.random.rand(dim)
    sigma = 0.3
    # 直接复用 2.4 节实现的 CMAES 类
    optimizer = CMAES(objective, dim, x0=x0, sigma=sigma)
    result = optimizer.run(max_iterations=2000)
    return result
6.4 与其他算法的比较
| 特性 | CMA-ES | 随机搜索 | 梯度下降 |
|---|---|---|---|
| 梯度需求 | 否 | 否 | 是 |
| 收敛速度 | 中等 | 慢 | 快(凸问题) |
| 全局搜索 | 强 | 中 | 弱 |
| 参数敏感性 | 低 | 低 | 中 |
七、Python实现示例
7.1 使用pycma库
import cma
import numpy as np

# 示例目标函数(球函数)
objective = lambda x: float(np.sum(np.asarray(x) ** 2))

def pycma_example():
    # 标准用法
    es = cma.CMAEvolutionStrategy(
        x0=np.random.randn(10),      # 初始解
        sigma0=0.5,                  # 初始步长
        inopts={'popsize': 20}       # 种群规模
    )
    # 优化循环:ask-tell 接口
    while not es.stop():
        solutions = es.ask()                      # 获取候选解
        fitness = [objective(x) for x in solutions]
        es.tell(solutions, fitness)               # 更新分布
        print(f"Generation {es.countiter}: Best = {min(fitness):.6f}")
    return es.result                              # 含最优解、最优适应度等信息

# 边界约束版本
def bounded_cma():
    es = cma.CMAEvolutionStrategy(
        x0=np.random.rand(10),
        sigma0=0.3,
        inopts={
            'bounds': [0, 1],        # 每个维度的上下界,pycma 内部通过变换处理
            'popsize': 20
        }
    )
    while not es.stop():
        solutions = es.ask()
        fitness = [objective(x) for x in solutions]
        es.tell(solutions, fitness)
    return es.result
7.2 多目标进化策略
class MultiObjectiveES:
def __init__(self, objectives, dim, popsize=100):
self.objectives = objectives # f1, f2, ...
self.dim = dim
self.popsize = popsize
    def dominates(self, f1, f2):
        """检查目标向量 f1 是否支配 f2(最小化问题)"""
        return all(a <= b for a, b in zip(f1, f2)) and \
               any(a < b for a, b in zip(f1, f2))
    def evolve(self, n_generations=500, sigma=0.1):
        # 初始化种群
        population = np.random.randn(self.popsize, self.dim)
        for gen in range(n_generations):
            # 评估
            fitness_matrix = np.array([[f(x) for f in self.objectives]
                                       for x in population])
            # 非支配排序,取第一前沿作为父代
            pareto_front = self.non_dominated_sort(fitness_matrix)
            parents = population[pareto_front]
            if gen % 50 == 0:
                print(f"Gen {gen}: Pareto front size = {len(pareto_front)}")
            # (μ+λ)式变异:父代高斯扰动生成子代,与父代合并后截断到种群规模
            idx = np.random.randint(len(parents), size=self.popsize)
            offspring = parents[idx] + sigma * np.random.randn(self.popsize, self.dim)
            population = np.vstack([parents, offspring])[:self.popsize]
        # 返回最终种群中的非支配解
        fitness_matrix = np.array([[f(x) for f in self.objectives] for x in population])
        return population[self.non_dominated_sort(fitness_matrix)]
def non_dominated_sort(self, fitness_matrix):
"""标准非支配排序"""
dominated = [set() for _ in range(len(fitness_matrix))]
domination_count = np.zeros(len(fitness_matrix))
pareto_front = []
for i in range(len(fitness_matrix)):
for j in range(i+1, len(fitness_matrix)):
if self.dominates(fitness_matrix[i], fitness_matrix[j]):
dominated[i].add(j)
domination_count[j] += 1
elif self.dominates(fitness_matrix[j], fitness_matrix[i]):
dominated[j].add(i)
domination_count[i] += 1
if domination_count[i] == 0:
pareto_front.append(i)
        return pareto_front
八、研究前沿与发展趋势
8.1 混合策略
将CMA-ES与其他方法结合:
- CMA-ES + 局部搜索:在进化后期嵌入精确局部搜索(见本节末尾的示意代码)
- CMA-ES + 贝叶斯优化:利用贝叶斯模型指导探索
- CMA-ES + 蚁群算法:结合全局信息素机制
8.2 大规模优化
针对高维问题的CMA-ES变体:
class LargeScaleCMAES:
    """子空间(低秩)CMA-ES示意,用于高维问题"""
    def __init__(self, dim, budget=None):
        self.dim = dim
        self.budget = budget if budget is not None else 1000 * dim
        self.used_budget = 0
# 协方差矩阵的稀疏表示
self.active_dim = min(100, dim) # 仅在子空间更新
self.V = self.initialize_subspace()
def initialize_subspace(self):
"""随机初始化子空间"""
V = np.random.randn(self.dim, self.active_dim)
V, _ = np.linalg.qr(V) # 正交化
return V
    def update(self, samples, fitness):
        # 仅在活跃子空间中更新:把样本投影到子空间坐标
        reduced_samples = self.V.T @ samples.T
        # ... 标准CMA-ES更新(update_cma 为示意的低维更新接口,此处未给出实现) ...
        new_C_reduced = self.update_cma(reduced_samples)
        # 映射回原空间,并加上各向同性的正则项
        new_C = self.V @ new_C_reduced @ self.V.T + \
            np.eye(self.dim) * 0.1
        return new_C
8.3 分布式CMA-ES
import random

class DistributedCMAES:
    """多岛屿CMA-ES"""
    def __init__(self, objective, dim, n_islands=10):
        self.islands = [CMAES(objective, dim) for _ in range(n_islands)]
        self.migration_interval = 10
        self.migration_rate = 0.1  # 预留:按比例迁移多个个体时使用
    def migration_step(self):
        """岛屿间迁移:把各岛当前最优解注入随机选中的其他岛"""
        for island in self.islands:
            best = island.best_solution.copy()
            target = random.choice([i for i in self.islands if i is not island])
            # 简单的迁移方式:用迁入个体替换目标岛的搜索中心
            target.xmean = best
    def run(self, max_generations):
        for gen in range(max_generations):
            for island in self.islands:
                # 复用 2.4 节 CMAES 的采样与分布更新作为一代迭代
                island.sample_population()
                island.update_distribution()
            if gen % self.migration_interval == 0:
                self.migration_step()
参考文献
- Rechenberg, I. (1973). Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog.
- Schwefel, H.-P. (1981). Numerical Optimization of Computer Models. John Wiley & Sons.
- Hansen, N., & Ostermeier, A. (2001). Completely Derandomized Self-Adaptation in Evolution Strategies. Evolutionary Computation, 9(2), 159-195.
- Hansen, N., et al. (2015). CMA-ES: Evolution Strategies with Covariance Matrix Adaptation for Objective Functions with Time-dependency. Evolutionary Computation, 23(4), 643-669.
- Wierstra, D., et al. (2014). Natural Evolution Strategies. Journal of Machine Learning Research, 15, 949-980.
- Akimoto, Y., & Hansen, N. (2016). Block-Boundary Covariance Matrix Adaptation. Proceedings of GECCO, 2016.