关键词

遗传编程	语法树	Koza	函数集	终止符集
子树交叉	子树变异	符号回归	自动函数发现	程序演进
表达树	树深度限制	饱和度	初始化方法	适应度评估
闭合性	充分性	GP变体	行为遗传	多目标GP

摘要

遗传编程（Genetic Programming, GP）是由John Koza于1989年在其标志性著作《Genetic Programming: On the Programming of Computers by Means of Natural Selection》中系统提出的一种自动程序生成技术。与传统遗传算法固定长度编码不同，GP采用可变大小和形状的树形结构表示计算机程序，实现了程序空间的自动搜索与优化。这一突破性范式使计算机能够”自动编程”，已在符号回归、电路设计、游戏策略、自动化控制等领域展现出卓越的问题求解能力。

一、遗传编程的理论基础

1.1 自动编程的愿景

传统编程要求人类程序员明确指定解决问题的每一个步骤。Koza提出的遗传编程旨在颠覆这一范式：给定问题的输入-输出示例或性能评估函数，让计算机通过进化过程自动生成解决问题的程序。

这一愿景的理论支撑来自两个核心原则：

闭合性原则（Closure）：所有函数和终止符必须能够相互组合而不出错。例如，若函数集中包含返回布尔值的函数，则其他函数必须能够接受布尔输入。

充分性原则（Sufficiency）：函数集和终止符集的组合必须能够表示任意复杂度的解空间。

1.2 与遗传算法的区别

特征	遗传算法(GA)	遗传编程(GP)
个体表示	固定长度串	可变大小树
基因含义	编码决定	自身即是程序
搜索空间	离散、线性	层次化、组合爆炸
应用领域	参数优化	程序合成

二、树形表示与语法结构

2.1 表达树的基本概念

GP中的个体是一棵有根有序树。每个内部节点代表一个函数（运算符），每个叶子节点代表一个终止符（变量或常量）。

数学表达：程序可形式化为从输入空间到输出空间的映射：

P : X \to Y

其中 $X$ 为输入变量空间， $Y$ 为输出空间。

2.2 表达树的构建示例

以数学表达式 $(x + 3) \times (y - 2)$ 为例，其对应的表达树结构如下：

中序遍历（标准数学表达式）： $(x + 3) \times (y - 2)$

前序遍历（波兰表达式）： $\times + x 3 - y 2$

后序遍历（逆波兰表达式）： $x 3 + y 2 - \times$

2.3 树深度与复杂度控制

GP树的深度直接影响生成的程序复杂度：

最大深度限制：防止”不可控生长”（bloat）现象
初始深度控制：使用特定初始化方法控制初始种群

深度限制策略：

class TreeNode:
    def __init__(self, value, depth=0):
        self.value = value
        self.depth = depth
        self.children = []
    
    def is_valid(self, max_depth):
        if self.depth > max_depth:
            return False
        return all(child.is_valid(max_depth) for child in self.children)
    
    def count_nodes(self):
        return 1 + sum(child.count_nodes() for child in self.children)
    
    def get_leaves(self):
        if not self.children:
            return [self.value]
        return [leaf for child in self.children for leaf in child.get_leaves()]

三、函数集与终止符集

3.1 函数集的设计

函数集应包含解决问题所需的所有运算符。典型的函数类别包括：

3.1.1 算术函数

function_set = {
    '+': lambda x, y: x + y,
    '-': lambda x, y: x - y,
    '*': lambda x, y: x * y,
    '/': lambda x, y: x / y if abs(y) > 1e-8 else 1,  # 保护性除法
    '%': lambda x, y: x % y if y != 0 else 0,
    '**': lambda x, y: x ** y if x > 0 else 0,
}

3.1.2 布尔函数

对于布尔问题，函数集通常包括：

boolean_functions = {
    'AND': lambda x, y: x and y,
    'OR': lambda x, y: x or y,
    'NOT': lambda x: not x,
    'IF': lambda cond, x, y: x if cond else y,
}

3.1.3 条件与循环函数

control_functions = {
    'IF-LESS': lambda x, y, z, w: z if x < y else w,
    'IF-EQUAL': lambda x, y, z, w: z if x == y else w,
    'PROGN2': lambda x, y: y,  # 顺序执行
    'PROGN3': lambda x, y, z: z,
}

3.2 终止符集的设计

终止符集提供程序的输入和常量：

terminal_set = {
    'variables': ['x', 'y', 'z', 't'],
    'constants': list(range(-10, 11)),
    'ephemeral_random': lambda: random.uniform(-1, 1),  # 临时随机常量
    'funcalls': ['sin', 'cos', 'exp', 'log', 'sqrt'],
}

Note

临时随机常量（Ephemeral Random Constants）：在初始化时生成随机常量值，交叉和变异操作可能改变这些常量但不会产生新值。这是GP中处理数值常量的经典技术。

3.3 闭合性检查

def check_closure(function_set, terminal_set):
    """
    验证函数集的闭合性
    每个函数的所有参数位置必须能够接受任何函数的返回值
    """
    for name, func in function_set.items():
        arity = func.__code__.co_argcount
        for i in range(arity):
            # 检查第i个参数位置
            compatible = any(
                is_compatible(output_type, param_types[i])
                for func2 in function_set.values()
                for output_type in get_output_types(func2)
            )
            # 也必须接受终止符
            compatible |= any(
                is_compatible(t, param_types[i])
                for t in terminal_set
            )
            if not compatible:
                print(f"Closure violation: {name} at position {i}")
    return True

四、遗传操作算子

4.1 初始化方法

4.1.1 完整方法（Full Method）

所有叶子节点达到指定深度，确保树的完全性：

def full_initialization(depth, function_set, terminal_set):
    if depth == 0:
        return random.choice(terminal_set)
    return TreeNode(
        random.choice(function_set),
        children=[full_initialization(depth-1, function_set, terminal_set) 
                  for _ in range(arity(random.choice(function_set)))]
    )

4.1.2 生长方法（Grow Method）

允许叶子节点在任何深度出现，产生更自然的形状：

def grow_initialization(max_depth, function_set, terminal_set):
    if max_depth == 0 or random.random() < ratio_terminals:
        return random.choice(terminal_set)
    func = random.choice(function_set)
    return TreeNode(
        func,
        children=[grow_initialization(max_depth-1, function_set, terminal_set)
                  for _ in range(arity(func))]
    )

4.1.3 混合方法（Ramped Half-and-Half）

在初始化种群时混合使用完整法和生长法：

def ramped_initialization(pop_size, min_depth=2, max_depth=6, 
                          function_set, terminal_set):
    population = []
    depth_range = range(min_depth, max_depth + 1)
    
    for i in range(pop_size):
        depth = list(depth_range)[i % len(depth_range)]
        if i % 2 == 0:
            tree = full_initialization(depth, function_set, terminal_set)
        else:
            tree = grow_initialization(depth, function_set, terminal_set)
        population.append(tree)
    
    return population

Tip

Ramped Half-and-Half能产生多样化的树结构，是Koza推荐的标准初始化方法。

4.2 子树交叉（Crossover）

4.2.1 标准子树交叉

随机选择两个父代树中的交叉点，交换对应的子树：

import copy
 
def subtree_crossover(parent1, parent2, max_depth=None):
    # 随机选择交叉点
    crossover_point1 = random.choice(get_all_nodes(parent1))
    crossover_point2 = random.choice(get_all_nodes(parent2))
    
    # 深度检查
    if max_depth:
        subtree_depth = get_subtree_depth(crossover_point2)
        parent_depth = get_node_depth(parent1, crossover_point1)
        if parent_depth + subtree_depth > max_depth:
            return copy.deepcopy(parent1), copy.deepcopy(parent2)
    
    # 执行交叉
    child1 = copy.deepcopy(parent1)
    child2 = copy.deepcopy(parent2)
    
    replace_subtree(child1, crossover_point1, copy.deepcopy(crossover_point2))
    replace_subtree(child2, crossover_point2, copy.deepcopy(crossover_point1))
    
    return child1, child2

可视化示例：

Parent1:                        Parent2:
      +                               /
     / \                             / \
    ×   3                         sin   +
   / \                             |   / \
  x   2                           y   z   w

Crossover point1: ×              Crossover point2: sin

Child1:                          Child2:
      +                               /
     / \                             / \
   sin   3                          ×   +
   |                               / \   \
  y                              x   2  z   w

4.2.2 点交叉（Point Crossover）

保留大部分树结构，仅交换一个节点：

def point_crossover(parent1, parent2):
    node1 = random.choice(get_all_nodes(parent1))
    node2 = random.choice(get_all_nodes(parent2))
    
    child1 = copy.deepcopy(parent1)
    child2 = copy.deepcopy(parent2)
    
    replace_node(child1, node1, copy.deepcopy(node2))
    replace_node(child2, node2, copy.deepcopy(node1))
    
    return child1, child2

4.3 子树变异（Mutation）

4.3.1 替换变异

随机选择一棵子树，用新生成的随机树替换：

def subtree_mutation(tree, function_set, terminal_set, max_depth=6):
    child = copy.deepcopy(tree)
    mutation_point = random.choice(get_all_nodes(child))
    
    # 生成新的随机子树
    depth = random.randint(0, max_depth)
    new_subtree = grow_initialization(depth, function_set, terminal_set)
    
    replace_subtree(child, mutation_point, new_subtree)
    return child

4.3.2 收缩变异（Shrink Mutation）

将一棵子树替换为随机终止符：

def shrink_mutation(tree, terminal_set):
    child = copy.deepcopy(tree)
    mutation_point = random.choice(get_non_leaf_nodes(child))
    
    replace_node(child, mutation_point, random.choice(terminal_set))
    return child

4.3.3 点变异（Point Mutation）

仅改变单个节点的值：

def point_mutation(tree, function_set, terminal_set, rate=0.01):
    child = copy.deepcopy(tree)
    
    for node in get_all_nodes(child):
        if random.random() < rate:
            if node.is_function():
                node.value = random.choice(function_set)
            else:
                node.value = random.choice(terminal_set)
    
    return child

五、适应度评估

5.1 适应度案例（Fitness Cases）

GP的适应度通过在多个测试案例上的表现来衡量：

class FitnessEvaluator:
    def __init__(self, fitness_cases):
        """
        fitness_cases: [(input1, input2, ..., expected_output), ...]
        """
        self.fitness_cases = fitness_cases
    
    def evaluate(self, tree):
        errors = []
        for inputs, expected in self.fitness_cases:
            try:
                result = tree.execute(*inputs)
                errors.append(abs(result - expected))
            except:
                errors.append(float('inf'))
        
        return {
            'raw_fitness': sum(errors),
            'mean_absolute_error': sum(errors) / len(errors),
            'hits': sum(1 for e in errors if e < 0.01),
        }

5.2 标准化适应度

对于符号回归问题，标准化均方误差（Normalized Mean Squared Error）：

f i t n ess = \frac{1}{N} i = 1 \sum N (\overset{y}{^}_{i} - y_{i})^{2}

标准化版本：

f i t n es s_{s t d} = \frac{\sum _{i = 1}^{N} ( y _{i} - y ^ _{i} ) ^{2}}{\sum _{i = 1}^{N} ( y _{i} - y ˉ ) ^{2}}

Warning

过拟合是GP面临的重要问题。需要使用独立的验证集来评估泛化能力。

六、自动函数发现

6.1 ADF（Automatically Defined Functions）

Koza提出的ADF机制允许进化过程中自动定义和调用子函数：

class ADFTree:
    def __init__(self, main_tree, adf_definitions):
        self.main_tree = main_tree  # 主程序
        self.adf_definitions = adf_definitions  # {name: Tree}
    
    def execute(self, inputs):
        # 准备ADF参数
        adf_args = {f'ARG{i}': inputs[i] for i in range(len(inputs))}
        
        # 计算ADF
        for name, tree in self.adf_definitions.items():
            adf_args[name] = tree.execute(inputs)
        
        # 执行主程序
        return self.main_tree.execute(adf_args)

ADF树结构示例：

ADF1(x, y):                       Main Program:
      +                              IF-ZERO
     / \                            /   |   \
    ×   z                          >   ADF1  ADF2
   / \                              |   |     |
  x   y                           x   y     z

6.2 模块化GP架构

class ModularGP:
    def __init__(self, n_adfs=3):
        self.n_adfs = n_adfs
        self.adf_functions = {}
    
    def evolve_with_adfs(self, fitness_cases):
        # 进化主程序和ADF定义
        pass
    
    def call_adf(self, adf_name, args):
        return self.adf_functions[adf_name].execute(args)

七、经典应用案例

7.1 符号回归（Symbolic Regression）

符号回归旨在发现生成数据的数学表达式：

def symbolic_regression_example():
    """
    目标函数: f(x) = sin(x) + x^2
    """
    # 生成训练数据
    x_train = np.linspace(-10, 10, 100)
    y_train = np.sin(x_train) + x_train**2
    
    # 定义函数集和终止符集
    function_set = ['+', '-', '*', '/', 'sin', 'cos', 'exp', 'log']
    terminal_set = ['x', *np.random.uniform(-1, 1, 10)]
    
    # 运行GP
    gp = GeneticProgramming(function_set, terminal_set, fitness_cases)
    best_solution = gp.evolve()
    
    print(f"Discovered: {best_solution.to_expression()}")
    # 输出类似: sin(x) + (x * x)
 
# 示例运行结果
symreg = SymbolicRegressionFitness(
    X=np.linspace(-10, 10, 100),
    y=np.sin(X) + X**2
)

7.2 电路自动设计

Koza在《Genetic Programming III》中展示了GP自动设计电路的能力，包括：

电路拓扑发现
元件参数优化
信号处理滤波器设计

class CircuitDesignGP:
    def __init__(self, circuit_constraints):
        self.constraints = circuit_constraints
        self.available_components = ['R', 'L', 'C', 'Diode', 'Transistor']
    
    def evaluate_circuit(self, circuit_tree):
        """
        评估电路性能指标
        - 频率响应
        - 功耗
        - 噪声特性
        """
        simulation_result = self.simulate(circuit_tree)
        return {
            'fitness': fitness_score(simulation_result),
            'validity': self.check_constraints(circuit_tree),
        }

7.3 游戏策略演化

class GameStrategyGP:
    def __init__(self, game_state_space):
        self.game_states = game_state_space
    
    def create_game_tree(self):
        function_set = ['IF-STATE', 'AND', 'OR', 'NOT']
        terminal_set = [f'S{i}' for i in range(len(self.game_states))]
        return grow_initialization(5, function_set, terminal_set)
    
    def evaluate_strategy(self, strategy_tree):
        wins = 0
        for game_state in self.game_states:
            action = strategy_tree.execute(game_state)
            if self.play_and_win(action, game_state):
                wins += 1
        return wins / len(self.game_states)

八、防止”不可控生长”（Bloat）

8.1 Bloat现象

GP进化过程中，程序树可能无限增长而适应度不再提升，这被称为”代码膨胀”或”bloat”。

8.2 深度限制策略

class DepthLimitedGP:
    def __init__(self, max_depth=17):
        self.max_depth = max_depth
    
    def is_valid_tree(self, tree):
        return get_tree_depth(tree) <= self.max_depth
    
    def crossover_with_limit(self, parent1, parent2):
        child1, child2 = subtree_crossover(parent1, parent2)
        if not self.is_valid_tree(child1):
            return copy.deepcopy(parent1)
        if not self.is_valid_tree(child2):
            return copy.deepcopy(parent2)
        return child1, child2

8.3 惩罚机制

def size_penalized_fitness(raw_fitness, tree, penalty_coef=0.001):
    complexity = tree.count_nodes()
    return raw_fitness + penalty_coef * complexity

8.4 多目标优化

from deap import tools
 
def evolve_multi_objective(population, fitness_cases):
    creator.create("FitnessMulti", base.Fitness, 
                    weights=(1.0, -1.0))  # 最大化适应度，最小化大小
    creator.create("Individual", list, fitness=creator.FitnessMulti)
    
    toolbox = base.Toolbox()
    # ... setup operators ...
    
    NSGA2 algorithm for multi-objective optimization
    population = algorithms.eaMuPlusLambda(
        population, toolbox, mu=500, lambda_=500, 
        cxpb=0.9, mutpb=0.1, ngen=100
    )
    
    return population

九、实用框架与工具

9.1 DEAP框架

from deap import base, creator, tools, algorithms
import random
 
def deap_gp_example():
    creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
    creator.create("Individual", gp.PrimitiveTree, 
                   fitness=creator.FitnessMin)
    
    pset = gp.PrimitiveSet("MAIN", arity=1)
    pset.addPrimitive(operator.add, args=2, name="add")
    pset.addPrimitive(operator.sub, args=2, name="sub")
    pset.addPrimitive(operator.mul, args=2, name="mul")
    pset.addPrimitive(math.sin, args=1)
    pset.addPrimitive(math.cos, args=1)
    
    pset.addEphemeralConstant("rand", lambda: random.uniform(-1, 1))
    pset.addTerminal(1)
    pset.addTerminal(2)
    
    toolbox = base.Toolbox()
    toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=2, max_=6)
    toolbox.register("individual", tools.initIterate, 
                     creator.Individual, toolbox.expr)
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)
    toolbox.register("compile", gp.compile, pset=pset)
    
    def evalSymbReg(individual):
        func = toolbox.compile(expr=individual)
        return sum(abs(func(x) - target(x)) for x in points),
    
    toolbox.register("evaluate", evalSymbReg)
    toolbox.register("select", tools.selTournament, tournsize=3)
    toolbox.register("mate", gp.cxOnePoint)
    toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
    toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)
    
    algorithms.eaSimple(toolbox.population(), toolbox, 
                        cxpb=0.9, mutpb=0.1, ngen=100)

十、进阶主题与研究前沿

10.1 几何语义遗传编程（Geometric Semantic GP）

直接对函数的语义空间进行操作，而非语法空间：

def geometric_semantic_crossover(parent1, parent2, training_data):
    """
    几何语义交叉保证子代在训练点上的性能
    不劣于任一父代
    """
    new_tree = create_mixing_tree(parent1, parent2)
    return new_tree
 
def geometric_semantic_mutation(tree, mutation_step=0.5):
    """
    语义变异
    """
    target = random.choice(training_data)
    new_tree = add_random_tree()
    return weighted_sum(tree, new_tree, mutation_step)

10.2 概率增量式遗传编程（Probabilistic Incremental Program Evolution）

class PIPE:
    def __init__(self):
        self.probability_matrix = {}  # 存储概率分布
    
    def update_probabilities(self, population, fitnesses):
        for tree in population:
            for node in tree:
                self.probability_matrix[node] += fitness
    
    def sample_tree(self):
        root = self.sample_node()
        self.expand_node(root)
        return root

参考文献

Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press.
Koza, J. R. (1994). Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press.
Koza, J. R., et al. (1999). Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann.
Poli, R., Langdon, W. B., & McPhee, N. F. (2008). A Field Guide to Genetic Programming. Lulu.com.
Banzhaf, W., et al. (1998). Genetic Programming: An Introduction. Morgan Kaufmann.

人工智能知识库

探索

遗传编程详解

关键词

摘要

一、遗传编程的理论基础

1.1 自动编程的愿景

1.2 与遗传算法的区别

二、树形表示与语法结构

2.1 表达树的基本概念

2.2 表达树的构建示例

2.3 树深度与复杂度控制

三、函数集与终止符集

3.1 函数集的设计

3.1.1 算术函数

3.1.2 布尔函数

3.1.3 条件与循环函数

3.2 终止符集的设计

3.3 闭合性检查

四、遗传操作算子

4.1 初始化方法

4.1.1 完整方法（Full Method）

4.1.2 生长方法（Grow Method）

4.1.3 混合方法（Ramped Half-and-Half）

4.2 子树交叉（Crossover）

4.2.1 标准子树交叉

4.2.2 点交叉（Point Crossover）

4.3 子树变异（Mutation）

4.3.1 替换变异

4.3.2 收缩变异（Shrink Mutation）

4.3.3 点变异（Point Mutation）

五、适应度评估

5.1 适应度案例（Fitness Cases）

5.2 标准化适应度

六、自动函数发现

6.1 ADF（Automatically Defined Functions）

6.2 模块化GP架构

七、经典应用案例

7.1 符号回归（Symbolic Regression）

7.2 电路自动设计

7.3 游戏策略演化

八、防止”不可控生长”（Bloat）

8.1 Bloat现象

8.2 深度限制策略

8.3 惩罚机制

8.4 多目标优化

九、实用框架与工具

9.1 DEAP框架

十、进阶主题与研究前沿

10.1 几何语义遗传编程（Geometric Semantic GP）

10.2 概率增量式遗传编程（Probabilistic Incremental Program Evolution）

参考文献

相关文档

关系图谱

目录