关键词

遗传编程语法树Koza函数集终止符集
子树交叉子树变异符号回归自动函数发现程序演进
表达树树深度限制饱和度初始化方法适应度评估
闭合性充分性GP变体行为遗传多目标GP

摘要

遗传编程(Genetic Programming, GP)是由John Koza于1989年在其标志性著作《Genetic Programming: On the Programming of Computers by Means of Natural Selection》中系统提出的一种自动程序生成技术。与传统遗传算法固定长度编码不同,GP采用可变大小和形状的树形结构表示计算机程序,实现了程序空间的自动搜索与优化。这一突破性范式使计算机能够”自动编程”,已在符号回归、电路设计、游戏策略、自动化控制等领域展现出卓越的问题求解能力。


一、遗传编程的理论基础

1.1 自动编程的愿景

传统编程要求人类程序员明确指定解决问题的每一个步骤。Koza提出的遗传编程旨在颠覆这一范式:给定问题的输入-输出示例或性能评估函数,让计算机通过进化过程自动生成解决问题的程序。

这一愿景的理论支撑来自两个核心原则:

闭合性原则(Closure):所有函数和终止符必须能够相互组合而不出错。例如,若函数集中包含返回布尔值的函数,则其他函数必须能够接受布尔输入。

充分性原则(Sufficiency):函数集和终止符集的组合必须能够表示任意复杂度的解空间。

1.2 与遗传算法的区别

特征遗传算法(GA)遗传编程(GP)
个体表示固定长度串可变大小树
基因含义编码决定自身即是程序
搜索空间离散、线性层次化、组合爆炸
应用领域参数优化程序合成

二、树形表示与语法结构

2.1 表达树的基本概念

GP中的个体是一棵有根有序树。每个内部节点代表一个函数(运算符),每个叶子节点代表一个终止符(变量或常量)。

数学表达:程序可形式化为从输入空间到输出空间的映射:

其中为输入变量空间,为输出空间。

2.2 表达树的构建示例

以数学表达式为例,其对应的表达树结构如下:

           ×
          / \
         +   -
        / \ / \
       x  3 y  2

中序遍历(标准数学表达式):

前序遍历(波兰表达式):

后序遍历(逆波兰表达式):

2.3 树深度与复杂度控制

GP树的深度直接影响生成的程序复杂度:

  • 最大深度限制:防止”不可控生长”(bloat)现象
  • 初始深度控制:使用特定初始化方法控制初始种群

深度限制策略

class TreeNode:
    def __init__(self, value, depth=0):
        self.value = value
        self.depth = depth
        self.children = []
    
    def is_valid(self, max_depth):
        if self.depth > max_depth:
            return False
        return all(child.is_valid(max_depth) for child in self.children)
    
    def count_nodes(self):
        return 1 + sum(child.count_nodes() for child in self.children)
    
    def get_leaves(self):
        if not self.children:
            return [self.value]
        return [leaf for child in self.children for leaf in child.get_leaves()]

三、函数集与终止符集

3.1 函数集的设计

函数集应包含解决问题所需的所有运算符。典型的函数类别包括:

3.1.1 算术函数

function_set = {
    '+': lambda x, y: x + y,
    '-': lambda x, y: x - y,
    '*': lambda x, y: x * y,
    '/': lambda x, y: x / y if abs(y) > 1e-8 else 1,  # 保护性除法
    '%': lambda x, y: x % y if y != 0 else 0,
    '**': lambda x, y: x ** y if x > 0 else 0,
}

3.1.2 布尔函数

对于布尔问题,函数集通常包括:

boolean_functions = {
    'AND': lambda x, y: x and y,
    'OR': lambda x, y: x or y,
    'NOT': lambda x: not x,
    'IF': lambda cond, x, y: x if cond else y,
}

3.1.3 条件与循环函数

control_functions = {
    'IF-LESS': lambda x, y, z, w: z if x < y else w,
    'IF-EQUAL': lambda x, y, z, w: z if x == y else w,
    'PROGN2': lambda x, y: y,  # 顺序执行
    'PROGN3': lambda x, y, z: z,
}

3.2 终止符集的设计

终止符集提供程序的输入和常量:

terminal_set = {
    'variables': ['x', 'y', 'z', 't'],
    'constants': list(range(-10, 11)),
    'ephemeral_random': lambda: random.uniform(-1, 1),  # 临时随机常量
    'funcalls': ['sin', 'cos', 'exp', 'log', 'sqrt'],
}

Note

临时随机常量(Ephemeral Random Constants):在初始化时生成随机常量值,交叉和变异操作可能改变这些常量但不会产生新值。这是GP中处理数值常量的经典技术。

3.3 闭合性检查

def check_closure(function_set, terminal_set):
    """
    验证函数集的闭合性
    每个函数的所有参数位置必须能够接受任何函数的返回值
    """
    for name, func in function_set.items():
        arity = func.__code__.co_argcount
        for i in range(arity):
            # 检查第i个参数位置
            compatible = any(
                is_compatible(output_type, param_types[i])
                for func2 in function_set.values()
                for output_type in get_output_types(func2)
            )
            # 也必须接受终止符
            compatible |= any(
                is_compatible(t, param_types[i])
                for t in terminal_set
            )
            if not compatible:
                print(f"Closure violation: {name} at position {i}")
    return True

四、遗传操作算子

4.1 初始化方法

4.1.1 完整方法(Full Method)

所有叶子节点达到指定深度,确保树的完全性:

def full_initialization(depth, function_set, terminal_set):
    if depth == 0:
        return random.choice(terminal_set)
    return TreeNode(
        random.choice(function_set),
        children=[full_initialization(depth-1, function_set, terminal_set) 
                  for _ in range(arity(random.choice(function_set)))]
    )

4.1.2 生长方法(Grow Method)

允许叶子节点在任何深度出现,产生更自然的形状:

def grow_initialization(max_depth, function_set, terminal_set):
    if max_depth == 0 or random.random() < ratio_terminals:
        return random.choice(terminal_set)
    func = random.choice(function_set)
    return TreeNode(
        func,
        children=[grow_initialization(max_depth-1, function_set, terminal_set)
                  for _ in range(arity(func))]
    )

4.1.3 混合方法(Ramped Half-and-Half)

在初始化种群时混合使用完整法和生长法:

def ramped_initialization(pop_size, min_depth=2, max_depth=6, 
                          function_set, terminal_set):
    population = []
    depth_range = range(min_depth, max_depth + 1)
    
    for i in range(pop_size):
        depth = list(depth_range)[i % len(depth_range)]
        if i % 2 == 0:
            tree = full_initialization(depth, function_set, terminal_set)
        else:
            tree = grow_initialization(depth, function_set, terminal_set)
        population.append(tree)
    
    return population

Tip

Ramped Half-and-Half能产生多样化的树结构,是Koza推荐的标准初始化方法。

4.2 子树交叉(Crossover)

4.2.1 标准子树交叉

随机选择两个父代树中的交叉点,交换对应的子树:

import copy
 
def subtree_crossover(parent1, parent2, max_depth=None):
    # 随机选择交叉点
    crossover_point1 = random.choice(get_all_nodes(parent1))
    crossover_point2 = random.choice(get_all_nodes(parent2))
    
    # 深度检查
    if max_depth:
        subtree_depth = get_subtree_depth(crossover_point2)
        parent_depth = get_node_depth(parent1, crossover_point1)
        if parent_depth + subtree_depth > max_depth:
            return copy.deepcopy(parent1), copy.deepcopy(parent2)
    
    # 执行交叉
    child1 = copy.deepcopy(parent1)
    child2 = copy.deepcopy(parent2)
    
    replace_subtree(child1, crossover_point1, copy.deepcopy(crossover_point2))
    replace_subtree(child2, crossover_point2, copy.deepcopy(crossover_point1))
    
    return child1, child2

可视化示例

Parent1:                        Parent2:
      +                               /
     / \                             / \
    ×   3                         sin   +
   / \                             |   / \
  x   2                           y   z   w

Crossover point1: ×              Crossover point2: sin

Child1:                          Child2:
      +                               /
     / \                             / \
   sin   3                          ×   +
   |                               / \   \
  y                              x   2  z   w

4.2.2 点交叉(Point Crossover)

保留大部分树结构,仅交换一个节点:

def point_crossover(parent1, parent2):
    node1 = random.choice(get_all_nodes(parent1))
    node2 = random.choice(get_all_nodes(parent2))
    
    child1 = copy.deepcopy(parent1)
    child2 = copy.deepcopy(parent2)
    
    replace_node(child1, node1, copy.deepcopy(node2))
    replace_node(child2, node2, copy.deepcopy(node1))
    
    return child1, child2

4.3 子树变异(Mutation)

4.3.1 替换变异

随机选择一棵子树,用新生成的随机树替换:

def subtree_mutation(tree, function_set, terminal_set, max_depth=6):
    child = copy.deepcopy(tree)
    mutation_point = random.choice(get_all_nodes(child))
    
    # 生成新的随机子树
    depth = random.randint(0, max_depth)
    new_subtree = grow_initialization(depth, function_set, terminal_set)
    
    replace_subtree(child, mutation_point, new_subtree)
    return child

4.3.2 收缩变异(Shrink Mutation)

将一棵子树替换为随机终止符:

def shrink_mutation(tree, terminal_set):
    child = copy.deepcopy(tree)
    mutation_point = random.choice(get_non_leaf_nodes(child))
    
    replace_node(child, mutation_point, random.choice(terminal_set))
    return child

4.3.3 点变异(Point Mutation)

仅改变单个节点的值:

def point_mutation(tree, function_set, terminal_set, rate=0.01):
    child = copy.deepcopy(tree)
    
    for node in get_all_nodes(child):
        if random.random() < rate:
            if node.is_function():
                node.value = random.choice(function_set)
            else:
                node.value = random.choice(terminal_set)
    
    return child

五、适应度评估

5.1 适应度案例(Fitness Cases)

GP的适应度通过在多个测试案例上的表现来衡量:

class FitnessEvaluator:
    def __init__(self, fitness_cases):
        """
        fitness_cases: [(input1, input2, ..., expected_output), ...]
        """
        self.fitness_cases = fitness_cases
    
    def evaluate(self, tree):
        errors = []
        for inputs, expected in self.fitness_cases:
            try:
                result = tree.execute(*inputs)
                errors.append(abs(result - expected))
            except:
                errors.append(float('inf'))
        
        return {
            'raw_fitness': sum(errors),
            'mean_absolute_error': sum(errors) / len(errors),
            'hits': sum(1 for e in errors if e < 0.01),
        }

5.2 标准化适应度

对于符号回归问题,标准化均方误差(Normalized Mean Squared Error):

标准化版本

Warning

过拟合是GP面临的重要问题。需要使用独立的验证集来评估泛化能力。


六、自动函数发现

6.1 ADF(Automatically Defined Functions)

Koza提出的ADF机制允许进化过程中自动定义和调用子函数:

class ADFTree:
    def __init__(self, main_tree, adf_definitions):
        self.main_tree = main_tree  # 主程序
        self.adf_definitions = adf_definitions  # {name: Tree}
    
    def execute(self, inputs):
        # 准备ADF参数
        adf_args = {f'ARG{i}': inputs[i] for i in range(len(inputs))}
        
        # 计算ADF
        for name, tree in self.adf_definitions.items():
            adf_args[name] = tree.execute(inputs)
        
        # 执行主程序
        return self.main_tree.execute(adf_args)

ADF树结构示例

ADF1(x, y):                       Main Program:
      +                              IF-ZERO
     / \                            /   |   \
    ×   z                          >   ADF1  ADF2
   / \                              |   |     |
  x   y                           x   y     z

6.2 模块化GP架构

class ModularGP:
    def __init__(self, n_adfs=3):
        self.n_adfs = n_adfs
        self.adf_functions = {}
    
    def evolve_with_adfs(self, fitness_cases):
        # 进化主程序和ADF定义
        pass
    
    def call_adf(self, adf_name, args):
        return self.adf_functions[adf_name].execute(args)

七、经典应用案例

7.1 符号回归(Symbolic Regression)

符号回归旨在发现生成数据的数学表达式:

def symbolic_regression_example():
    """
    目标函数: f(x) = sin(x) + x^2
    """
    # 生成训练数据
    x_train = np.linspace(-10, 10, 100)
    y_train = np.sin(x_train) + x_train**2
    
    # 定义函数集和终止符集
    function_set = ['+', '-', '*', '/', 'sin', 'cos', 'exp', 'log']
    terminal_set = ['x', *np.random.uniform(-1, 1, 10)]
    
    # 运行GP
    gp = GeneticProgramming(function_set, terminal_set, fitness_cases)
    best_solution = gp.evolve()
    
    print(f"Discovered: {best_solution.to_expression()}")
    # 输出类似: sin(x) + (x * x)
 
# 示例运行结果
symreg = SymbolicRegressionFitness(
    X=np.linspace(-10, 10, 100),
    y=np.sin(X) + X**2
)

7.2 电路自动设计

Koza在《Genetic Programming III》中展示了GP自动设计电路的能力,包括:

  • 电路拓扑发现
  • 元件参数优化
  • 信号处理滤波器设计
class CircuitDesignGP:
    def __init__(self, circuit_constraints):
        self.constraints = circuit_constraints
        self.available_components = ['R', 'L', 'C', 'Diode', 'Transistor']
    
    def evaluate_circuit(self, circuit_tree):
        """
        评估电路性能指标
        - 频率响应
        - 功耗
        - 噪声特性
        """
        simulation_result = self.simulate(circuit_tree)
        return {
            'fitness': fitness_score(simulation_result),
            'validity': self.check_constraints(circuit_tree),
        }

7.3 游戏策略演化

class GameStrategyGP:
    def __init__(self, game_state_space):
        self.game_states = game_state_space
    
    def create_game_tree(self):
        function_set = ['IF-STATE', 'AND', 'OR', 'NOT']
        terminal_set = [f'S{i}' for i in range(len(self.game_states))]
        return grow_initialization(5, function_set, terminal_set)
    
    def evaluate_strategy(self, strategy_tree):
        wins = 0
        for game_state in self.game_states:
            action = strategy_tree.execute(game_state)
            if self.play_and_win(action, game_state):
                wins += 1
        return wins / len(self.game_states)

八、防止”不可控生长”(Bloat)

8.1 Bloat现象

GP进化过程中,程序树可能无限增长而适应度不再提升,这被称为”代码膨胀”或”bloat”。

8.2 深度限制策略

class DepthLimitedGP:
    def __init__(self, max_depth=17):
        self.max_depth = max_depth
    
    def is_valid_tree(self, tree):
        return get_tree_depth(tree) <= self.max_depth
    
    def crossover_with_limit(self, parent1, parent2):
        child1, child2 = subtree_crossover(parent1, parent2)
        if not self.is_valid_tree(child1):
            return copy.deepcopy(parent1)
        if not self.is_valid_tree(child2):
            return copy.deepcopy(parent2)
        return child1, child2

8.3 惩罚机制

def size_penalized_fitness(raw_fitness, tree, penalty_coef=0.001):
    complexity = tree.count_nodes()
    return raw_fitness + penalty_coef * complexity

8.4 多目标优化

from deap import tools
 
def evolve_multi_objective(population, fitness_cases):
    creator.create("FitnessMulti", base.Fitness, 
                    weights=(1.0, -1.0))  # 最大化适应度,最小化大小
    creator.create("Individual", list, fitness=creator.FitnessMulti)
    
    toolbox = base.Toolbox()
    # ... setup operators ...
    
    NSGA2 algorithm for multi-objective optimization
    population = algorithms.eaMuPlusLambda(
        population, toolbox, mu=500, lambda_=500, 
        cxpb=0.9, mutpb=0.1, ngen=100
    )
    
    return population

九、实用框架与工具

9.1 DEAP框架

from deap import base, creator, tools, algorithms
import random
 
def deap_gp_example():
    creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
    creator.create("Individual", gp.PrimitiveTree, 
                   fitness=creator.FitnessMin)
    
    pset = gp.PrimitiveSet("MAIN", arity=1)
    pset.addPrimitive(operator.add, args=2, name="add")
    pset.addPrimitive(operator.sub, args=2, name="sub")
    pset.addPrimitive(operator.mul, args=2, name="mul")
    pset.addPrimitive(math.sin, args=1)
    pset.addPrimitive(math.cos, args=1)
    
    pset.addEphemeralConstant("rand", lambda: random.uniform(-1, 1))
    pset.addTerminal(1)
    pset.addTerminal(2)
    
    toolbox = base.Toolbox()
    toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=2, max_=6)
    toolbox.register("individual", tools.initIterate, 
                     creator.Individual, toolbox.expr)
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)
    toolbox.register("compile", gp.compile, pset=pset)
    
    def evalSymbReg(individual):
        func = toolbox.compile(expr=individual)
        return sum(abs(func(x) - target(x)) for x in points),
    
    toolbox.register("evaluate", evalSymbReg)
    toolbox.register("select", tools.selTournament, tournsize=3)
    toolbox.register("mate", gp.cxOnePoint)
    toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
    toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)
    
    algorithms.eaSimple(toolbox.population(), toolbox, 
                        cxpb=0.9, mutpb=0.1, ngen=100)

十、进阶主题与研究前沿

10.1 几何语义遗传编程(Geometric Semantic GP)

直接对函数的语义空间进行操作,而非语法空间:

def geometric_semantic_crossover(parent1, parent2, training_data):
    """
    几何语义交叉保证子代在训练点上的性能
    不劣于任一父代
    """
    new_tree = create_mixing_tree(parent1, parent2)
    return new_tree
 
def geometric_semantic_mutation(tree, mutation_step=0.5):
    """
    语义变异
    """
    target = random.choice(training_data)
    new_tree = add_random_tree()
    return weighted_sum(tree, new_tree, mutation_step)

10.2 概率增量式遗传编程(Probabilistic Incremental Program Evolution)

class PIPE:
    def __init__(self):
        self.probability_matrix = {}  # 存储概率分布
    
    def update_probabilities(self, population, fitnesses):
        for tree in population:
            for node in tree:
                self.probability_matrix[node] += fitness
    
    def sample_tree(self):
        root = self.sample_node()
        self.expand_node(root)
        return root

参考文献

  1. Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press.
  2. Koza, J. R. (1994). Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press.
  3. Koza, J. R., et al. (1999). Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann.
  4. Poli, R., Langdon, W. B., & McPhee, N. F. (2008). A Field Guide to Genetic Programming. Lulu.com.
  5. Banzhaf, W., et al. (1998). Genetic Programming: An Introduction. Morgan Kaufmann.

相关文档