关键词
| 遗传编程 | 语法树 | Koza | 函数集 | 终止符集 |
|---|---|---|---|---|
| 子树交叉 | 子树变异 | 符号回归 | 自动函数发现 | 程序演进 |
| 表达树 | 树深度限制 | 饱和度 | 初始化方法 | 适应度评估 |
| 闭合性 | 充分性 | GP变体 | 行为遗传 | 多目标GP |
摘要
遗传编程(Genetic Programming, GP)是由John Koza于1989年在其标志性著作《Genetic Programming: On the Programming of Computers by Means of Natural Selection》中系统提出的一种自动程序生成技术。与传统遗传算法固定长度编码不同,GP采用可变大小和形状的树形结构表示计算机程序,实现了程序空间的自动搜索与优化。这一突破性范式使计算机能够”自动编程”,已在符号回归、电路设计、游戏策略、自动化控制等领域展现出卓越的问题求解能力。
一、遗传编程的理论基础
1.1 自动编程的愿景
传统编程要求人类程序员明确指定解决问题的每一个步骤。Koza提出的遗传编程旨在颠覆这一范式:给定问题的输入-输出示例或性能评估函数,让计算机通过进化过程自动生成解决问题的程序。
这一愿景的理论支撑来自两个核心原则:
闭合性原则(Closure):所有函数和终止符必须能够相互组合而不出错。例如,若函数集中包含返回布尔值的函数,则其他函数必须能够接受布尔输入。
充分性原则(Sufficiency):函数集和终止符集的组合必须能够表示任意复杂度的解空间。
1.2 与遗传算法的区别
| 特征 | 遗传算法(GA) | 遗传编程(GP) |
|---|---|---|
| 个体表示 | 固定长度串 | 可变大小树 |
| 基因含义 | 编码决定 | 自身即是程序 |
| 搜索空间 | 离散、线性 | 层次化、组合爆炸 |
| 应用领域 | 参数优化 | 程序合成 |
二、树形表示与语法结构
2.1 表达树的基本概念
GP中的个体是一棵有根有序树。每个内部节点代表一个函数(运算符),每个叶子节点代表一个终止符(变量或常量)。
数学表达:程序可形式化为从输入空间到输出空间的映射:
其中为输入变量空间,为输出空间。
2.2 表达树的构建示例
以数学表达式为例,其对应的表达树结构如下:
×
/ \
+ -
/ \ / \
x 3 y 2
中序遍历(标准数学表达式):
前序遍历(波兰表达式):
后序遍历(逆波兰表达式):
2.3 树深度与复杂度控制
GP树的深度直接影响生成的程序复杂度:
- 最大深度限制:防止”不可控生长”(bloat)现象
- 初始深度控制:使用特定初始化方法控制初始种群
深度限制策略:
class TreeNode:
def __init__(self, value, depth=0):
self.value = value
self.depth = depth
self.children = []
def is_valid(self, max_depth):
if self.depth > max_depth:
return False
return all(child.is_valid(max_depth) for child in self.children)
def count_nodes(self):
return 1 + sum(child.count_nodes() for child in self.children)
def get_leaves(self):
if not self.children:
return [self.value]
return [leaf for child in self.children for leaf in child.get_leaves()]三、函数集与终止符集
3.1 函数集的设计
函数集应包含解决问题所需的所有运算符。典型的函数类别包括:
3.1.1 算术函数
function_set = {
'+': lambda x, y: x + y,
'-': lambda x, y: x - y,
'*': lambda x, y: x * y,
'/': lambda x, y: x / y if abs(y) > 1e-8 else 1, # 保护性除法
'%': lambda x, y: x % y if y != 0 else 0,
'**': lambda x, y: x ** y if x > 0 else 0,
}3.1.2 布尔函数
对于布尔问题,函数集通常包括:
boolean_functions = {
'AND': lambda x, y: x and y,
'OR': lambda x, y: x or y,
'NOT': lambda x: not x,
'IF': lambda cond, x, y: x if cond else y,
}3.1.3 条件与循环函数
control_functions = {
'IF-LESS': lambda x, y, z, w: z if x < y else w,
'IF-EQUAL': lambda x, y, z, w: z if x == y else w,
'PROGN2': lambda x, y: y, # 顺序执行
'PROGN3': lambda x, y, z: z,
}3.2 终止符集的设计
终止符集提供程序的输入和常量:
terminal_set = {
'variables': ['x', 'y', 'z', 't'],
'constants': list(range(-10, 11)),
'ephemeral_random': lambda: random.uniform(-1, 1), # 临时随机常量
'funcalls': ['sin', 'cos', 'exp', 'log', 'sqrt'],
}Note
临时随机常量(Ephemeral Random Constants):在初始化时生成随机常量值,交叉和变异操作可能改变这些常量但不会产生新值。这是GP中处理数值常量的经典技术。
3.3 闭合性检查
def check_closure(function_set, terminal_set):
"""
验证函数集的闭合性
每个函数的所有参数位置必须能够接受任何函数的返回值
"""
for name, func in function_set.items():
arity = func.__code__.co_argcount
for i in range(arity):
# 检查第i个参数位置
compatible = any(
is_compatible(output_type, param_types[i])
for func2 in function_set.values()
for output_type in get_output_types(func2)
)
# 也必须接受终止符
compatible |= any(
is_compatible(t, param_types[i])
for t in terminal_set
)
if not compatible:
print(f"Closure violation: {name} at position {i}")
return True四、遗传操作算子
4.1 初始化方法
4.1.1 完整方法(Full Method)
所有叶子节点达到指定深度,确保树的完全性:
def full_initialization(depth, function_set, terminal_set):
if depth == 0:
return random.choice(terminal_set)
return TreeNode(
random.choice(function_set),
children=[full_initialization(depth-1, function_set, terminal_set)
for _ in range(arity(random.choice(function_set)))]
)4.1.2 生长方法(Grow Method)
允许叶子节点在任何深度出现,产生更自然的形状:
def grow_initialization(max_depth, function_set, terminal_set):
if max_depth == 0 or random.random() < ratio_terminals:
return random.choice(terminal_set)
func = random.choice(function_set)
return TreeNode(
func,
children=[grow_initialization(max_depth-1, function_set, terminal_set)
for _ in range(arity(func))]
)4.1.3 混合方法(Ramped Half-and-Half)
在初始化种群时混合使用完整法和生长法:
def ramped_initialization(pop_size, min_depth=2, max_depth=6,
function_set, terminal_set):
population = []
depth_range = range(min_depth, max_depth + 1)
for i in range(pop_size):
depth = list(depth_range)[i % len(depth_range)]
if i % 2 == 0:
tree = full_initialization(depth, function_set, terminal_set)
else:
tree = grow_initialization(depth, function_set, terminal_set)
population.append(tree)
return populationTip
Ramped Half-and-Half能产生多样化的树结构,是Koza推荐的标准初始化方法。
4.2 子树交叉(Crossover)
4.2.1 标准子树交叉
随机选择两个父代树中的交叉点,交换对应的子树:
import copy
def subtree_crossover(parent1, parent2, max_depth=None):
# 随机选择交叉点
crossover_point1 = random.choice(get_all_nodes(parent1))
crossover_point2 = random.choice(get_all_nodes(parent2))
# 深度检查
if max_depth:
subtree_depth = get_subtree_depth(crossover_point2)
parent_depth = get_node_depth(parent1, crossover_point1)
if parent_depth + subtree_depth > max_depth:
return copy.deepcopy(parent1), copy.deepcopy(parent2)
# 执行交叉
child1 = copy.deepcopy(parent1)
child2 = copy.deepcopy(parent2)
replace_subtree(child1, crossover_point1, copy.deepcopy(crossover_point2))
replace_subtree(child2, crossover_point2, copy.deepcopy(crossover_point1))
return child1, child2可视化示例:
Parent1: Parent2:
+ /
/ \ / \
× 3 sin +
/ \ | / \
x 2 y z w
Crossover point1: × Crossover point2: sin
Child1: Child2:
+ /
/ \ / \
sin 3 × +
| / \ \
y x 2 z w
4.2.2 点交叉(Point Crossover)
保留大部分树结构,仅交换一个节点:
def point_crossover(parent1, parent2):
node1 = random.choice(get_all_nodes(parent1))
node2 = random.choice(get_all_nodes(parent2))
child1 = copy.deepcopy(parent1)
child2 = copy.deepcopy(parent2)
replace_node(child1, node1, copy.deepcopy(node2))
replace_node(child2, node2, copy.deepcopy(node1))
return child1, child24.3 子树变异(Mutation)
4.3.1 替换变异
随机选择一棵子树,用新生成的随机树替换:
def subtree_mutation(tree, function_set, terminal_set, max_depth=6):
child = copy.deepcopy(tree)
mutation_point = random.choice(get_all_nodes(child))
# 生成新的随机子树
depth = random.randint(0, max_depth)
new_subtree = grow_initialization(depth, function_set, terminal_set)
replace_subtree(child, mutation_point, new_subtree)
return child4.3.2 收缩变异(Shrink Mutation)
将一棵子树替换为随机终止符:
def shrink_mutation(tree, terminal_set):
child = copy.deepcopy(tree)
mutation_point = random.choice(get_non_leaf_nodes(child))
replace_node(child, mutation_point, random.choice(terminal_set))
return child4.3.3 点变异(Point Mutation)
仅改变单个节点的值:
def point_mutation(tree, function_set, terminal_set, rate=0.01):
child = copy.deepcopy(tree)
for node in get_all_nodes(child):
if random.random() < rate:
if node.is_function():
node.value = random.choice(function_set)
else:
node.value = random.choice(terminal_set)
return child五、适应度评估
5.1 适应度案例(Fitness Cases)
GP的适应度通过在多个测试案例上的表现来衡量:
class FitnessEvaluator:
def __init__(self, fitness_cases):
"""
fitness_cases: [(input1, input2, ..., expected_output), ...]
"""
self.fitness_cases = fitness_cases
def evaluate(self, tree):
errors = []
for inputs, expected in self.fitness_cases:
try:
result = tree.execute(*inputs)
errors.append(abs(result - expected))
except:
errors.append(float('inf'))
return {
'raw_fitness': sum(errors),
'mean_absolute_error': sum(errors) / len(errors),
'hits': sum(1 for e in errors if e < 0.01),
}5.2 标准化适应度
对于符号回归问题,标准化均方误差(Normalized Mean Squared Error):
标准化版本:
Warning
过拟合是GP面临的重要问题。需要使用独立的验证集来评估泛化能力。
六、自动函数发现
6.1 ADF(Automatically Defined Functions)
Koza提出的ADF机制允许进化过程中自动定义和调用子函数:
class ADFTree:
def __init__(self, main_tree, adf_definitions):
self.main_tree = main_tree # 主程序
self.adf_definitions = adf_definitions # {name: Tree}
def execute(self, inputs):
# 准备ADF参数
adf_args = {f'ARG{i}': inputs[i] for i in range(len(inputs))}
# 计算ADF
for name, tree in self.adf_definitions.items():
adf_args[name] = tree.execute(inputs)
# 执行主程序
return self.main_tree.execute(adf_args)ADF树结构示例:
ADF1(x, y): Main Program:
+ IF-ZERO
/ \ / | \
× z > ADF1 ADF2
/ \ | | |
x y x y z
6.2 模块化GP架构
class ModularGP:
def __init__(self, n_adfs=3):
self.n_adfs = n_adfs
self.adf_functions = {}
def evolve_with_adfs(self, fitness_cases):
# 进化主程序和ADF定义
pass
def call_adf(self, adf_name, args):
return self.adf_functions[adf_name].execute(args)七、经典应用案例
7.1 符号回归(Symbolic Regression)
符号回归旨在发现生成数据的数学表达式:
def symbolic_regression_example():
"""
目标函数: f(x) = sin(x) + x^2
"""
# 生成训练数据
x_train = np.linspace(-10, 10, 100)
y_train = np.sin(x_train) + x_train**2
# 定义函数集和终止符集
function_set = ['+', '-', '*', '/', 'sin', 'cos', 'exp', 'log']
terminal_set = ['x', *np.random.uniform(-1, 1, 10)]
# 运行GP
gp = GeneticProgramming(function_set, terminal_set, fitness_cases)
best_solution = gp.evolve()
print(f"Discovered: {best_solution.to_expression()}")
# 输出类似: sin(x) + (x * x)
# 示例运行结果
symreg = SymbolicRegressionFitness(
X=np.linspace(-10, 10, 100),
y=np.sin(X) + X**2
)7.2 电路自动设计
Koza在《Genetic Programming III》中展示了GP自动设计电路的能力,包括:
- 电路拓扑发现
- 元件参数优化
- 信号处理滤波器设计
class CircuitDesignGP:
def __init__(self, circuit_constraints):
self.constraints = circuit_constraints
self.available_components = ['R', 'L', 'C', 'Diode', 'Transistor']
def evaluate_circuit(self, circuit_tree):
"""
评估电路性能指标
- 频率响应
- 功耗
- 噪声特性
"""
simulation_result = self.simulate(circuit_tree)
return {
'fitness': fitness_score(simulation_result),
'validity': self.check_constraints(circuit_tree),
}7.3 游戏策略演化
class GameStrategyGP:
def __init__(self, game_state_space):
self.game_states = game_state_space
def create_game_tree(self):
function_set = ['IF-STATE', 'AND', 'OR', 'NOT']
terminal_set = [f'S{i}' for i in range(len(self.game_states))]
return grow_initialization(5, function_set, terminal_set)
def evaluate_strategy(self, strategy_tree):
wins = 0
for game_state in self.game_states:
action = strategy_tree.execute(game_state)
if self.play_and_win(action, game_state):
wins += 1
return wins / len(self.game_states)八、防止”不可控生长”(Bloat)
8.1 Bloat现象
GP进化过程中,程序树可能无限增长而适应度不再提升,这被称为”代码膨胀”或”bloat”。
8.2 深度限制策略
class DepthLimitedGP:
def __init__(self, max_depth=17):
self.max_depth = max_depth
def is_valid_tree(self, tree):
return get_tree_depth(tree) <= self.max_depth
def crossover_with_limit(self, parent1, parent2):
child1, child2 = subtree_crossover(parent1, parent2)
if not self.is_valid_tree(child1):
return copy.deepcopy(parent1)
if not self.is_valid_tree(child2):
return copy.deepcopy(parent2)
return child1, child28.3 惩罚机制
def size_penalized_fitness(raw_fitness, tree, penalty_coef=0.001):
complexity = tree.count_nodes()
return raw_fitness + penalty_coef * complexity8.4 多目标优化
from deap import tools
def evolve_multi_objective(population, fitness_cases):
creator.create("FitnessMulti", base.Fitness,
weights=(1.0, -1.0)) # 最大化适应度,最小化大小
creator.create("Individual", list, fitness=creator.FitnessMulti)
toolbox = base.Toolbox()
# ... setup operators ...
NSGA2 algorithm for multi-objective optimization
population = algorithms.eaMuPlusLambda(
population, toolbox, mu=500, lambda_=500,
cxpb=0.9, mutpb=0.1, ngen=100
)
return population九、实用框架与工具
9.1 DEAP框架
from deap import base, creator, tools, algorithms
import random
def deap_gp_example():
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", gp.PrimitiveTree,
fitness=creator.FitnessMin)
pset = gp.PrimitiveSet("MAIN", arity=1)
pset.addPrimitive(operator.add, args=2, name="add")
pset.addPrimitive(operator.sub, args=2, name="sub")
pset.addPrimitive(operator.mul, args=2, name="mul")
pset.addPrimitive(math.sin, args=1)
pset.addPrimitive(math.cos, args=1)
pset.addEphemeralConstant("rand", lambda: random.uniform(-1, 1))
pset.addTerminal(1)
pset.addTerminal(2)
toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=2, max_=6)
toolbox.register("individual", tools.initIterate,
creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)
def evalSymbReg(individual):
func = toolbox.compile(expr=individual)
return sum(abs(func(x) - target(x)) for x in points),
toolbox.register("evaluate", evalSymbReg)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)
algorithms.eaSimple(toolbox.population(), toolbox,
cxpb=0.9, mutpb=0.1, ngen=100)十、进阶主题与研究前沿
10.1 几何语义遗传编程(Geometric Semantic GP)
直接对函数的语义空间进行操作,而非语法空间:
def geometric_semantic_crossover(parent1, parent2, training_data):
"""
几何语义交叉保证子代在训练点上的性能
不劣于任一父代
"""
new_tree = create_mixing_tree(parent1, parent2)
return new_tree
def geometric_semantic_mutation(tree, mutation_step=0.5):
"""
语义变异
"""
target = random.choice(training_data)
new_tree = add_random_tree()
return weighted_sum(tree, new_tree, mutation_step)10.2 概率增量式遗传编程(Probabilistic Incremental Program Evolution)
class PIPE:
def __init__(self):
self.probability_matrix = {} # 存储概率分布
def update_probabilities(self, population, fitnesses):
for tree in population:
for node in tree:
self.probability_matrix[node] += fitness
def sample_tree(self):
root = self.sample_node()
self.expand_node(root)
return root参考文献
- Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press.
- Koza, J. R. (1994). Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press.
- Koza, J. R., et al. (1999). Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann.
- Poli, R., Langdon, W. B., & McPhee, N. F. (2008). A Field Guide to Genetic Programming. Lulu.com.
- Banzhaf, W., et al. (1998). Genetic Programming: An Introduction. Morgan Kaufmann.