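Building a Knowledge Graph in Practice: A Complete Guide from Scratch

What problem does this article solve?

Knowledge graphs sound impressive, but how do you actually build one from a pile of documents? This article walks through it step by step: extracting entities from text, finding the relations between them, storing everything in a graph database, and finally putting the graph to use.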

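Preface: What is a knowledge graph?

Let me start with a story that stuck with me.

I once went to the library looking for a book because I wanted to know who the "father of deep learning" is. With traditional search, I had to:

Search for "father of deep learning"
See the answer "Geoffrey Hinton"
Search for Geoffrey Hinton
See his student "Alex Krizhevsky"
Search for Alex Krizhevsky...

Tiring, isn't it?

A knowledge graph, on the other hand, already knows:

Geoffrey Hinton is often called the "father of deep learning"
He has worked at the University of Toronto
His students include Alex Krizhevsky and Ilya Sutskever
Together they created AlexNet
AlexNet won the 2012 ImageNet competition...

I only need to ask about "the history of deep learning", and the graph can string the facts together for me.

That is the value of a knowledge graph: it connects knowledge into a network instead of leaving it as scattered points.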

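1. Core Concepts of a Knowledge Graph

1.1 Three elements

Entity: a thing in the real world

People: Zhang San, Li Si
Organizations: Google, Tsinghua University
Concepts: artificial intelligence, machine learning
Products: iPhone, GPT-4

Relation: a connection between entities

"Zhang San - works at - Google"
"Machine learning - belongs to - Artificial intelligence"
"Alex Krizhevsky - student of - Geoffrey Hinton"

Property: a characteristic of an entity

Zhang San: age=30, title=engineer
Google: founded=1998, headquarters=California

1.2 Representing the graph

Entity = node
Relation = edge
Property = key-value pair attached to a node or an edge (a node's type is usually stored as its label)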

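# A simple graph representation. String values stay in Chinese, matching the
# sample data used by the Chinese NLP code in the rest of the article.
graph = {
    "nodes": [
        {"id": "张三", "type": "人物", "properties": {"职位": "工程师"}},  # Zhang San, person, title: engineer
        {"id": "谷歌", "type": "组织", "properties": {"总部": "加州"}},    # Google, organization, HQ: California
        {"id": "人工智能", "type": "概念", "properties": {}},              # artificial intelligence, concept
        {"id": "机器学习", "type": "概念", "properties": {}},              # machine learning, concept
    ],
    "edges": [
        {"source": "张三", "target": "谷歌", "relation": "工作于"},        # works at
        {"source": "人工智能", "target": "机器学习", "relation": "包含"},  # contains
    ]
}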
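
To make the node/edge mapping concrete, here is a minimal sketch that indexes a dictionary-style graph like the one above by source node, so the outgoing edges of a node can be looked up directly. The helper name `build_adjacency` and the trimmed-down `sample_graph` are assumptions made for illustration; they are not part of the original code.

```python
from collections import defaultdict


def build_adjacency(graph):
    """Index edges by source node: {source_id: [(relation, target_id), ...]}."""
    adjacency = defaultdict(list)
    for edge in graph["edges"]:
        adjacency[edge["source"]].append((edge["relation"], edge["target"]))
    return dict(adjacency)


# Same shape as the dictionary above, reduced to the fields the helper reads
sample_graph = {
    "nodes": [{"id": "张三"}, {"id": "谷歌"}, {"id": "人工智能"}, {"id": "机器学习"}],
    "edges": [
        {"source": "张三", "target": "谷歌", "relation": "工作于"},        # works at
        {"source": "人工智能", "target": "机器学习", "relation": "包含"},  # contains
    ],
}

adjacency = build_adjacency(sample_graph)
print(adjacency["张三"])  # [('工作于', '谷歌')]
```

A graph database maintains this kind of index for you; the sketch only shows why edge lookups are cheap once the data is stored as a graph.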

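1.3 An analogy

Imagine you run a detective agency.

Traditional documents are like a pile of loose case files:

"Zhang San works at Google"
"Google is an AI company"
"AI is a hot technology right now"

A knowledge graph is like a web of relationships:

                 AI
                  |
          machine learning
                  |
           deep learning
            /           \
         CNN      [Geoffrey Hinton] <--student of-- [his students]

See the difference? The graph strings the knowledge together.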

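2. The Knowledge Graph Construction Pipeline

2.1 Overall flow

Raw documents → Text preprocessing → Entity extraction → Relation extraction → Entity linking → Graph construction → Applications

2.2 What each step does

Step	Input	Output	Key technique
Text preprocessing	Raw text	Tokenized, cleaned text	Basic NLP
Entity extraction	Sentence	Entity boundaries + types	NER
Relation extraction	Sentence + entities	(subject, relation, object)	SPO (subject-predicate-object) extraction
Entity linking	Entity mention	Canonical entity ID	Entity linking
Graph construction	Entities + relations	Graph database	Graph storage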

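3. Entity Extraction

3.1 What is entity extraction?

Entity extraction (Named Entity Recognition, NER) finds two things in a text:

Where the entities are (boundary detection)
What the entities are (type classification)

3.2 Common entity types

Type	Examples
Person (PER)	Zhang San, Steve Jobs
Organization (ORG)	Google, Tsinghua University
Location (LOC)	Beijing, the United States
Time (TIME)	2023, yesterday
Money (MON)	100 US dollars, 5,000 yuan
Product (PRO)	iPhone, GPT-4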

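3.3 Entity extraction with an LLM

import json

async def extract_entities_with_llm(text, entity_types=None):
    """
    Extract entities with an LLM.

    Assumes a module-level async client `llm` whose generate(prompt)
    returns a JSON string.
    """

    # Default types: person, organization, location, time
    entity_types = entity_types or ["人物", "组织", "地点", "时间"]

    prompt = f"""
Extract the entities from the text below and return them as JSON.

Text: {text}

Entity types: {', '.join(entity_types)}

Return JSON in this format:
{{
    "entities": [
        {{"text": "entity text", "type": "entity type", "start": start_offset, "end": end_offset}},
        ...
    ]
}}
"""

    result = await llm.generate(prompt)
    return json.loads(result)

# Usage example (run inside an async function, or in a notebook with top-level await).
# The sentence means: "In 2023, Zhang San and Li Si co-founded 深脑科技, headquartered in Beijing."
text = "2023年，张三和李四共同创立了深脑科技，总部位于北京。"

result = await extract_entities_with_llm(text)
# Ideal output (offsets are 0-based and end-exclusive; LLMs often get them wrong):
# {
#     "entities": [
#         {"text": "2023年", "type": "时间", "start": 0, "end": 5},
#         {"text": "张三", "type": "人物", "start": 6, "end": 8},
#         {"text": "李四", "type": "人物", "start": 9, "end": 11},
#         {"text": "深脑科技", "type": "组织", "start": 16, "end": 20},
#         {"text": "北京", "type": "地点", "start": 25, "end": 27}
#     ]
# }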
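
Character offsets reported by an LLM are frequently off by one or more, so it can help to recompute them from the text itself. The sketch below shows one way to do that. `realign_offsets` is a hypothetical helper, not part of the original pipeline, and it assumes each entity text appears verbatim in the input.

```python
def realign_offsets(text, entities):
    """Recompute 0-based, end-exclusive offsets by searching for each entity text.

    Entities whose text cannot be found verbatim are dropped. A moving cursor
    lets repeated mentions map to successive occurrences.
    """
    fixed = []
    cursor = 0
    for entity in entities:
        start = text.find(entity["text"], cursor)
        if start == -1:
            # Entities may arrive out of order: retry from the beginning
            start = text.find(entity["text"])
        if start == -1:
            continue  # not in the text: likely hallucinated or paraphrased
        end = start + len(entity["text"])
        fixed.append({**entity, "start": start, "end": end})
        cursor = end
    return fixed


text = "2023年，张三和李四共同创立了深脑科技，总部位于北京。"
raw_entities = [
    {"text": "深脑科技", "type": "组织", "start": 15, "end": 19},  # off by one
    {"text": "北京", "type": "地点", "start": 26, "end": 28},      # off by one
    {"text": "上海", "type": "地点", "start": 30, "end": 32},      # not in the text
]

realigned = realign_offsets(text, raw_entities)
print(realigned)
```

Dropping entities that cannot be found is a deliberately strict choice; a more lenient variant could keep them and flag them for manual review.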

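3.4 Using a traditional NER model

from transformers import pipeline

# Load an NER model. Note: dslim/bert-base-NER is trained on English data
# (CoNLL-2003). For Chinese text, swap in a checkpoint trained on Chinese NER data.
ner_pipeline = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

def extract_entities_with_model(text):
    """Extract entities with a BERT model."""

    results = ner_pipeline(text)

    entities = []
    for result in results:
        # Map the model's label to the general-purpose types
        entity_type = map_entity_type(result["entity_group"])
        entities.append({
            "text": result["word"],
            "type": entity_type,
            "score": float(result["score"])  # convert the NumPy float so it is JSON-serializable
        })

    return entities

def map_entity_type(entity_group):
    """Map model labels to the Chinese type names used in this article."""
    mapping = {
        "PER": "人物",   # person
        "ORG": "组织",   # organization
        "LOC": "地点",   # location
        "MISC": "其他"   # other
    }
    return mapping.get(entity_group, "其他")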

3.5 Using spaCy (Chinese)

import spacy

# Load the Chinese model (install it first: python -m spacy download zh_core_web_sm)
nlp = spacy.load("zh_core_web_sm")

def extract_entities_spacy(text):
    """Extract entities with spaCy."""

    doc = nlp(text)

    entities = []
    for ent in doc.ents:
        entities.append({
            "text": ent.text,
            "type": ent.label_,
            "start": ent.start_char,
            "end": ent.end_char
        })

    return entities

# Usage. The sentence means: "Zhang San and Li Si founded 深脑科技, headquartered in Beijing."
text = "张三和李四创办了深脑科技，总部在北京。"
entities = extract_entities_spacy(text)
print(entities)
# Ideal output (the small model may miss or mislabel some entities):
# [{'text': '张三', 'type': 'PERSON', 'start': 0, 'end': 2},
#  {'text': '李四', 'type': 'PERSON', 'start': 3, 'end': 5},
#  {'text': '深脑科技', 'type': 'ORG', 'start': 8, 'end': 12},
#  {'text': '北京', 'type': 'GPE', 'start': 16, 'end': 18}]

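4. Relation Extraction

4.1 What is relation extraction?

Relation extraction finds the relations between entities and outputs triples (SPO):

Subject
Predicate (the relation)
Object

4.2 Common relation types

Relation type	Description	Example
works at (工作于)	person-organization	Zhang San - works at - Google
student of (师从)	person-person	Li Si - student of - Professor Wang
founded (创立)	person-organization	Jack Ma - founded - Alibaba
located in (位于)	organization-location	Google - located in - California
belongs to (属于)	subclass-superclass	machine learning - belongs to - AI
founded in (成立于)	organization-time	Google - founded in - 1998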

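4.3 Relation extraction with an LLM

import json

async def extract_relations_with_llm(text, entities, relation_types=None):
    """
    Extract relations with an LLM.

    `entities` is a list of {"text": ..., "type": ...} dicts. As before, a
    module-level async client `llm` is assumed.
    """

    # works at, founded, located in, student of, belongs to,
    # cooperates with, competes with, uses, develops
    relation_types = relation_types or [
        "工作于", "创立", "位于", "师从", "属于",
        "合作", "竞争", "使用", "开发"
    ]

    # Build the entity list shown to the model
    entity_list = "\n".join([
        f"- {e['text']} ({e['type']})"
        for e in entities
    ])

    prompt = f"""
Extract the relations between entities from the text below.

Text: {text}

Recognized entities:
{entity_list}

Relation types: {', '.join(relation_types)}

Extract every relation that is present and return JSON:
{{
    "relations": [
        {{"subject": "subject", "predicate": "relation", "object": "object"}},
        ...
    ]
}}

If no relation is found, return an empty list.
"""

    result = await llm.generate(prompt)
    return json.loads(result)

# Usage example. The sentence means: "In 2023, Zhang San and Li Si co-founded
# 深脑科技, which is dedicated to artificial intelligence research."
text = "2023年，张三和李四共同创立了深脑科技，致力于人工智能研究。"
entities = [
    {"text": "张三", "type": "人物"},
    {"text": "李四", "type": "人物"},
    {"text": "深脑科技", "type": "组织"}
]

result = await extract_relations_with_llm(text, entities)
# Typical output:
# {
#     "relations": [
#         {"subject": "张三", "predicate": "创立", "object": "深脑科技"},
#         {"subject": "李四", "predicate": "创立", "object": "深脑科技"},
#         {"subject": "张三", "predicate": "合作", "object": "李四"}
#     ]
# }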
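
An LLM can return triples that mention entities it was never given, or predicates outside the allowed list. A minimal validation sketch follows; `filter_relations` is a hypothetical helper added for illustration and assumes the triple format shown above.

```python
def filter_relations(relations, entities, relation_types):
    """Keep triples whose endpoints are known entities and whose predicate is allowed."""
    known = {e["text"] for e in entities}
    allowed = set(relation_types)
    kept = []
    for rel in relations:
        subject, obj = rel.get("subject"), rel.get("object")
        if subject not in known or obj not in known:
            continue  # endpoint was not among the recognized entities
        if rel.get("predicate") not in allowed:
            continue  # predicate outside the allowed relation types
        if subject == obj:
            continue  # self-loops are almost always extraction noise
        kept.append(rel)
    return kept


entities = [
    {"text": "张三", "type": "人物"},
    {"text": "李四", "type": "人物"},
    {"text": "深脑科技", "type": "组织"},
]
candidate_relations = [
    {"subject": "张三", "predicate": "创立", "object": "深脑科技"},  # valid
    {"subject": "李四", "predicate": "创立", "object": "谷歌"},      # unknown object
    {"subject": "张三", "predicate": "喜欢", "object": "李四"},      # predicate not allowed
    {"subject": "张三", "predicate": "合作", "object": "张三"},      # self-loop
]

valid_relations = filter_relations(candidate_relations, entities, ["创立", "合作"])
print(valid_relations)  # [{'subject': '张三', 'predicate': '创立', 'object': '深脑科技'}]
```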

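4.4 Open-domain relation extraction

If you have no predefined relation types, you can use open-domain extraction:

async def open_relation_extraction(text):
    """
    Open-domain relation extraction.
    Relation types are not restricted; the model discovers them.
    """

    prompt = f"""
Extract all relations between entities from the text below. Do not restrict the relation types.

Text: {text}

Return every relation you find as JSON:
{{
    "relations": [
        {{"subject": "subject", "predicate": "relation", "object": "object"}},
        ...
    ]
}}
"""

    result = await llm.generate(prompt)
    return json.loads(result)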
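
Open-domain extraction tends to produce many surface forms for the same relation, which fragments the graph. One common remedy is to map predicates onto a small canonical set. The sketch below assumes a hand-written synonym table; `PREDICATE_SYNONYMS` and `normalize_predicate` are illustrative names, and the listed synonyms are examples rather than a complete vocabulary.

```python
# Canonical predicate -> surface forms that should collapse into it
PREDICATE_SYNONYMS = {
    "创立": {"创立", "创办", "创建", "成立"},    # founded
    "工作于": {"工作于", "任职于", "就职于"},    # works at
    "位于": {"位于", "总部在", "坐落于"},        # located in
}


def normalize_predicate(predicate, synonyms=PREDICATE_SYNONYMS):
    """Return the canonical predicate, or the stripped input when it is unknown."""
    cleaned = predicate.strip()
    for canonical, variants in synonyms.items():
        if cleaned in variants:
            return canonical
    return cleaned


print(normalize_predicate(" 创办 "))  # 创立
print(normalize_predicate("投资"))    # 投资 (unknown predicates pass through)
```

Unknown predicates are passed through rather than dropped, so new relation types discovered by the model are not lost.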

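4.5 Template-based relation extraction

For text with a regular structure, you can use template matching:

import re

# Relation extraction templates. Named groups mark which side is the subject
# and which is the object, so a pattern can list them in either order.
# Caveat: \w+ is greedy and matches any word characters, so templates work
# best on short, regular sentences or combined with NER output.
RELATION_PATTERNS = {
    "工作于": [  # works at
        # (?:...) is a non-capturing alternation; [工作|任职|担任] would be a
        # character class matching a single character
        r"(?P<subject>\w+)[是在](?P<object>\w+)(?:工作|任职|担任)",
        # "X的CEO是Y": the company comes first, so the groups are swapped
        r"(?P<object>\w+)的CEO是(?P<subject>\w+)",
    ],
    "创立": [  # founded
        r"(?P<subject>\w+)创立了(?P<object>\w+)",
        r"(?P<subject>\w+)创办了(?P<object>\w+)",
    ],
    "位于": [  # located in
        r"(?P<subject>\w+)位于(?P<object>\w+)",
        r"(?P<subject>\w+)总部在(?P<object>\w+)",
    ]
}

def extract_relations_with_patterns(text):
    """Extract relations by template matching."""

    relations = []

    for relation_type, patterns in RELATION_PATTERNS.items():
        for pattern in patterns:
            for match in re.finditer(pattern, text):
                relations.append({
                    "subject": match.group("subject"),
                    "predicate": relation_type,
                    "object": match.group("object")
                })

    return relations

# Usage. The sentence means: "Zhang San works at Google; he is a co-founder of Google."
text = "张三在谷歌工作，他是谷歌的联合创始人。"
relations = extract_relations_with_patterns(text)
# [{'subject': '张三', 'predicate': '工作于', 'object': '谷歌'}]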

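5. Entity Linking

5.1 What is entity linking?

Entity linking maps an entity mention in the text to a canonical entity in the knowledge base:

"father of deep learning" → Geoffrey Hinton (canonical entity)

5.2 Why link entities?

To keep one entity from appearing under several different surface forms:

Mention (in text)	Canonical entity
Geoffrey Hinton	Geoffrey Hinton
Hinton	Geoffrey Hinton
father of deep learning (深度学习之父)	Geoffrey Hinton
Turing Award winner Hinton (图灵奖得主Hinton)	Geoffrey Hinton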

5.3 Implementing entity linking

class EntityLinker:
    """Entity linker."""

    def __init__(self, knowledge_base):
        """
        knowledge_base: the known entities, in the format
        {
            "entity ID": {"name": "name", "aliases": ["alias 1", "alias 2"], ...}
        }
        """
        self.kb = knowledge_base

    def link(self, mention, context=None, candidates=None):
        """
        Link a mention to a canonical entity.

        Returns: {"entity_id": "xxx", "confidence": 0.9}
        """

        # 1. Candidate generation
        if candidates is None:
            candidates = self._generate_candidates(mention)

        if not candidates:
            return {"entity_id": None, "confidence": 0.0}

        # 2. Candidate scoring
        scored_candidates = []
        for entity_id in candidates:
            score = self._calculate_score(mention, entity_id, context)
            scored_candidates.append((entity_id, score))

        # 3. Pick the best match
        scored_candidates.sort(key=lambda x: x[1], reverse=True)

        best_entity, best_score = scored_candidates[0]

        return {
            "entity_id": best_entity,
            "confidence": best_score
        }

    def _generate_candidates(self, mention):
        """Generate candidate entities (each entity appears at most once)."""

        if not mention:
            return []

        candidates = []

        for entity_id, entity_info in self.kb.items():
            names = [entity_info.get("name", "")] + list(entity_info.get("aliases", []))
            # Check containment in both directions, so that a longer mention
            # such as "图灵奖得主Hinton" still reaches the alias "Hinton"
            if any(n and (mention in n or n in mention) for n in names):
                candidates.append(entity_id)

        return candidates

    def _calculate_score(self, mention, entity_id, context):
        """Compute the linking score."""

        entity_info = self.kb[entity_id]

        # Name similarity
        name = entity_info.get("name", "")
        name_score = self._string_similarity(mention, name)

        # Alias match
        alias_score = 0.0
        for alias in entity_info.get("aliases", []):
            if mention == alias:
                alias_score = 1.0
                break
            # >= rather than >: _string_similarity returns exactly 0.8 for containment
            elif self._string_similarity(mention, alias) >= 0.8:
                alias_score = 0.8
                break

        # Context match (if context is given)
        context_score = 0.0
        if context:
            entity_desc = entity_info.get("description", "")
            context_score = self._context_similarity(context, entity_desc)

        # Weighted combination
        final_score = name_score * 0.5 + alias_score * 0.3 + context_score * 0.2

        return final_score

    def _string_similarity(self, s1, s2):
        """String similarity (simple implementation)."""
        if s1 == s2:
            return 1.0
        if s1 in s2 or s2 in s1:
            return 0.8
        # A more refined measure such as edit distance could be used here
        return 0.0

    def _context_similarity(self, context, description):
        """Context similarity."""
        # Simplified: character-level overlap, which works for Chinese without tokenization
        common_chars = set(context) & set(description)
        if not common_chars:
            return 0.0
        return len(common_chars) / max(len(set(context)), len(set(description)))
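
The string similarity in the linker only distinguishes "equal", "contained" and "different". As its own comment suggests, edit distance gives a finer signal. Below is a minimal, dependency-free sketch. `levenshtein` and `string_similarity` are illustrative names, and normalizing by the longer string's length is one common convention rather than the only option.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def string_similarity(s1, s2):
    """Normalized similarity in [0, 1]: 1 - distance / length of the longer string."""
    if not s1 and not s2:
        return 1.0
    return 1.0 - levenshtein(s1, s2) / max(len(s1), len(s2))


print(levenshtein("kitten", "sitting"))       # 3
print(string_similarity("Hinton", "Hinton"))  # 1.0
```

A function like this could replace `_string_similarity` in the linker, at the cost of quadratic time per comparison.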

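6. Building and Storing the Graph

6.1 Choosing a graph database

Database	Characteristics	Best suited for
Neo4j	Feature-rich, strong ecosystem	General use
Neo4j Aura	Cloud-hosted	Getting started quickly
Amazon Neptune	AWS integration	Cloud-native deployments
TigerGraph	High performance	Large-scale graphs
Tencent Cloud graph database	China-based vendor, well optimized	Workloads hosted in China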

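6.2 Storing the graph in Neo4j

from neo4j import GraphDatabase


def _quote(name):
    """Backtick-quote a label or relationship type (these cannot be query parameters)."""
    return "`" + str(name).replace("`", "``") + "`"


class KnowledgeGraph:
    """Knowledge graph manager."""

    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def create_entity(self, entity_id, entity_type="Entity", properties=None):
        """Create the entity node if it is missing, then set its label and properties."""

        props = {k: v for k, v in (properties or {}).items() if k != "type"}

        # MERGE avoids duplicates. Every node also gets the base label Entity
        # (added in this version) so one index on :Entity(id) serves all lookups.
        # Values travel as query parameters instead of being pasted into the Cypher text.
        cypher = (
            "MERGE (e:Entity {id: $id}) "
            f"SET e:{_quote(entity_type)} "
            "SET e += $props"
        )

        with self.driver.session() as session:
            session.run(cypher, {"id": entity_id, "props": props})

    def create_relation(self, subject_id, predicate, object_id):
        """Create the relation, creating the endpoint nodes when they are missing."""

        cypher = (
            "MERGE (s:Entity {id: $subject_id}) "
            "MERGE (o:Entity {id: $object_id}) "
            f"MERGE (s)-[:{_quote(predicate)}]->(o)"
        )

        with self.driver.session() as session:
            session.run(cypher, {"subject_id": subject_id, "object_id": object_id})

    def create_triple(self, subject_id, predicate, object_id, s_props=None, o_props=None):
        """Create a triple (both entities plus the relation)."""

        s_props = s_props or {}
        o_props = o_props or {}

        # Always create the nodes, even when no properties are given
        self.create_entity(subject_id, s_props.get("type", "Entity"), s_props)
        self.create_entity(object_id, o_props.get("type", "Entity"), o_props)

        # Create the relation
        self.create_relation(subject_id, predicate, object_id)

    def query(self, cypher, params=None):
        """Run a query and return the rows as plain dictionaries."""

        with self.driver.session() as session:
            result = session.run(cypher, params or {})
            return [record.data() for record in result]

    def get_neighbors(self, entity_id, depth=1):
        """Return the triples found within `depth` hops of the entity."""

        # Path lengths cannot be parameters, so depth is coerced to int and inlined
        cypher = f"""
        MATCH p = (e:Entity {{id: $id}})-[*1..{int(depth)}]-()
        UNWIND relationships(p) AS r
        RETURN DISTINCT startNode(r).id AS subject, type(r) AS predicate, endNode(r).id AS object
        """

        return self.query(cypher, {"id": entity_id})

    def find_path(self, start_id, end_id, max_depth=5):
        """Find the shortest path between two entities."""

        cypher = f"""
        MATCH path = shortestPath(
            (s:Entity {{id: $start_id}})-[*1..{int(max_depth)}]-(e:Entity {{id: $end_id}})
        )
        RETURN [n IN nodes(path) | n.id] AS nodes,
               [r IN relationships(path) | type(r)] AS relations
        """

        return self.query(cypher, {"start_id": start_id, "end_id": end_id})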

6.3 Bulk-importing the graph

async def bulk_import_graph(triples, kg):
    """
    Import triples in bulk.

    Format of triples:
    [
        {"subject": "张三", "predicate": "工作于", "object": "谷歌"},
        ...
    ]
    """

    imported = 0

    for triple in triples:
        try:
            kg.create_triple(
                subject_id=triple["subject"],
                predicate=triple["predicate"],
                object_id=triple["object"],
                s_props=triple.get("subject_props"),
                o_props=triple.get("object_props")
            )
            imported += 1
        except Exception as e:
            print(f"Import failed: {triple}, error: {e}")

    # Report only the triples that were actually written
    print(f"Imported {imported} of {len(triples)} triples")
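
Writing one triple per round trip gets slow for large imports. A common Neo4j pattern is to send rows in batches with `UNWIND`. Because relationship types cannot be query parameters, the sketch below groups triples by predicate and builds one statement per group. `build_batch_statements` is a hypothetical helper, and the `Entity` label with an `id` property is an assumption about the schema; adjust both to your own graph.

```python
from collections import defaultdict


def build_batch_statements(triples, batch_size=1000):
    """Group triples by predicate and build (cypher, params) pairs that use UNWIND."""
    by_predicate = defaultdict(list)
    for triple in triples:
        by_predicate[triple["predicate"]].append(
            {"s": triple["subject"], "o": triple["object"]}
        )

    statements = []
    for predicate, rows in by_predicate.items():
        # Relationship types cannot be parameters: backtick-quote and escape them
        rel_type = "`" + predicate.replace("`", "``") + "`"
        cypher = (
            "UNWIND $rows AS row "
            "MERGE (s:Entity {id: row.s}) "
            "MERGE (o:Entity {id: row.o}) "
            f"MERGE (s)-[:{rel_type}]->(o)"
        )
        for i in range(0, len(rows), batch_size):
            statements.append((cypher, {"rows": rows[i:i + batch_size]}))
    return statements


sample_triples = [
    {"subject": "张三", "predicate": "工作于", "object": "谷歌"},
    {"subject": "李四", "predicate": "工作于", "object": "谷歌"},
    {"subject": "谷歌", "predicate": "位于", "object": "加州"},
]
statements = build_batch_statements(sample_triples, batch_size=1)
print(len(statements))  # 3
```

Each pair can then be executed with `session.run(cypher, params)` inside a driver session.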

6.4 A worked example

import asyncio

async def main():
    # Initialize
    kg = KnowledgeGraph("bolt://localhost:7687", "neo4j", "password")

    # Data to import. Predicates: 被称为 = known as, 工作于 = works at,
    # 师从 = student of, 合作 = collaborates with, 发明者 = inventor, 发表于 = published in
    triples = [
        {"subject": "Geoffrey Hinton", "predicate": "被称为", "object": "深度学习之父"},
        {"subject": "Geoffrey Hinton", "predicate": "工作于", "object": "多伦多大学"},
        {"subject": "Geoffrey Hinton", "predicate": "师从", "object": "Christopher Longuet-Higgins"},
        {"subject": "Yoshua Bengio", "predicate": "工作于", "object": "蒙特利尔大学"},
        {"subject": "Yoshua Bengio", "predicate": "合作", "object": "Geoffrey Hinton"},
        {"subject": "Alex Krizhevsky", "predicate": "师从", "object": "Geoffrey Hinton"},
        {"subject": "Ilya Sutskever", "predicate": "师从", "object": "Geoffrey Hinton"},
        {"subject": "AlexNet", "predicate": "发明者", "object": "Alex Krizhevsky"},
        {"subject": "AlexNet", "predicate": "发明者", "object": "Geoffrey Hinton"},
        {"subject": "AlexNet", "predicate": "发表于", "object": "2012"},
    ]

    await bulk_import_graph(triples, kg)

    # Query: who is known as the father of deep learning?
    result = kg.query("""
        MATCH (e)-[r:被称为]->(d {id: '深度学习之父'})
        RETURN e.id
    """)
    # Result: [{'e.id': 'Geoffrey Hinton'}]

    # Query: who are Hinton's students?
    result = kg.query("""
        MATCH (s)-[r:师从]->(e {id: 'Geoffrey Hinton'})
        RETURN s.id, type(r)
    """)

    # Query: how are Geoffrey Hinton and Yoshua Bengio related?
    result = kg.find_path("Geoffrey Hinton", "Yoshua Bengio")
    # Returns the relation path between the two nodes

    kg.close()

asyncio.run(main())

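7. Applications of the Knowledge Graph

7.1 Question answering

async def kg_qa(question, kg, llm, linker):
    """
    Question answering over the knowledge graph.

    `linker` is an EntityLinker built from your knowledge base; its entity
    IDs are assumed to match the node IDs in the graph.
    """

    # 1. Extract the entities in the question (the extractor returns {"entities": [...]})
    extraction = await extract_entities_with_llm(question)
    entities = extraction.get("entities", [])

    if not entities:
        return "Sorry, no entity was found in the question."

    # 2. Link the first entity to the graph
    main_entity = linker.link(entities[0]["text"])["entity_id"]

    if not main_entity:
        return "Sorry, the knowledge base has no relevant information."

    # 3. Explore the neighbors (the ID is passed as a query parameter)
    with kg.driver.session() as session:
        result = session.run(
            """
            MATCH (e {id: $id})-[r]-()
            RETURN startNode(r).id AS subject, type(r) AS predicate, endNode(r).id AS object
            """,
            {"id": main_entity},
        )
        neighbors = [record.data() for record in result]

    # 4. Generate the answer
    context = "\n".join(f"{n['subject']} --{n['predicate']}--> {n['object']}"
                        for n in neighbors)

    answer = await llm.generate(f"""
Answer the question using the knowledge graph facts below.

Question: {question}

Relevant facts:
{context}

Base your answer on these facts.
""")

    return answer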

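7.2 Relation inference

def infer_relations(kg, entity_id):
    """
    Collect the direct and two-hop relations of an entity, which are the
    raw material for inferring relations that are not stored explicitly.
    """

    def run(cypher):
        # The entity ID is passed as a query parameter, not pasted into the Cypher text
        with kg.driver.session() as session:
            return [record.data() for record in session.run(cypher, {"id": entity_id})]

    # 1. Direct relations
    direct_relations = run("""
        MATCH (e {id: $id})-[r]-(neighbor)
        RETURN type(r) AS relation, neighbor.id AS entity
    """)

    # 2. Two-hop relations
    two_hop_relations = run("""
        MATCH (e {id: $id})-[r1]-(n1)-[r2]-(n2)
        WHERE n2 <> e
        RETURN n1.id AS intermediate, type(r1) AS r1_type,
               type(r2) AS r2_type, n2.id AS end_entity
    """)

    # 3. Inference over these rows is left to the caller.
    # Example: A works at X and B works at X → A and B may be colleagues

    return {
        "direct": direct_relations,
        "two_hop": two_hop_relations
    }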
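
The inference step mentioned in the comment ("A works at X, B works at X, so A and B may be colleagues") can be sketched directly over a list of triples. The function name `infer_colleagues` and the inferred predicate `可能是同事` ("possibly colleagues") are assumptions made for illustration; inferred edges are candidates to review, not facts.

```python
from collections import defaultdict
from itertools import combinations


def infer_colleagues(triples):
    """Infer candidate colleague pairs: people who share a 工作于 (works at) target."""
    people_by_org = defaultdict(set)
    for triple in triples:
        if triple["predicate"] == "工作于":
            people_by_org[triple["object"]].add(triple["subject"])

    inferred = []
    for org, people in people_by_org.items():
        # sorted() makes the pair order deterministic
        for a, b in combinations(sorted(people), 2):
            inferred.append(
                {"subject": a, "predicate": "可能是同事", "object": b, "via": org}
            )
    return inferred


work_triples = [
    {"subject": "张三", "predicate": "工作于", "object": "谷歌"},
    {"subject": "李四", "predicate": "工作于", "object": "谷歌"},
    {"subject": "王五", "predicate": "工作于", "object": "微软"},
    {"subject": "张三", "predicate": "师从", "object": "王教授"},
]

inferred = infer_colleagues(work_triples)
print(inferred)
```

Keeping the `via` field records why an edge was inferred, which makes it easier to audit or retract later.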

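8. Hands-on: Building an Enterprise Knowledge Graph

8.1 Scenario

Suppose we want to build a "company - person - product" knowledge graph:

Data sources:
- Company profile documents
- Employee profile documents
- Product description documents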
8.2 Complete code

import asyncio
import json
from typing import List, Dict
from neo4j import GraphDatabase

# KnowledgeGraph, extract_entities_with_llm and extract_relations_with_llm
# are the definitions from the earlier sections.

class EnterpriseKnowledgeGraphBuilder:
    """
    Enterprise knowledge graph builder.
    """

    def __init__(self, llm, kg: KnowledgeGraph):
        self.llm = llm
        self.kg = kg

    async def build_from_documents(self, documents: List[Dict]):
        """
        Build the knowledge graph from documents.

        Format of documents:
        [{"type": "company"|"person"|"product", "content": "..."}, ...]
        """

        all_triples = []

        for doc in documents:
            triples = await self._process_document(doc)
            all_triples.extend(triples)

        # Deduplicate
        unique_triples = self._deduplicate_triples(all_triples)

        # Import into the graph
        await self._import_to_graph(unique_triples)

        return unique_triples
    
    async def _process_document(self, doc: Dict) -> List[Dict]:
        """Process a single document."""

        content = doc["content"]
        doc_type = doc["type"]

        # 1. Extract entities. The extractor returns {"entities": [...]},
        #    so unwrap the list before using it.
        extraction = await extract_entities_with_llm(content,
            ["人物", "组织", "产品", "地点", "时间"])
        entities = extraction.get("entities", [])

        # 2. Extract relations (the result is {"relations": [...]})
        relation_result = await extract_relations_with_llm(content, entities)
        relations = relation_result.get("relations", [])

        # 3. Build the graph elements
        triples = []

        # Add entities
        for entity in entities:
            triples.append({
                "type": "entity",
                "id": entity["text"],
                "entity_type": self._map_entity_type(entity["type"], doc_type)
            })

        # Add relations
        for relation in relations:
            triples.append({
                "type": "relation",
                "subject": relation["subject"],
                "predicate": relation["predicate"],
                "object": relation["object"]
            })

        return triples
    
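    def _map_entity_type(self, ner_type: str, doc_type: str) -> str:
        """Map an extracted entity type to a graph label."""

        # The original table used `_` as a wildcard inside dictionary keys,
        # which raises NameError in Python. The label depends only on the
        # entity type here; doc_type is kept for interface compatibility.
        mapping = {
            "人物": "Person",     # person
            "组织": "Company",    # organization
            "产品": "Product",    # product
            "地点": "Location",   # location
            "时间": "Time",       # time
        }

        return mapping.get(ner_type, "Entity")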
    
    def _deduplicate_triples(self, triples: List[Dict]) -> List[Dict]:
        """Deduplicate entities and relations, keeping the first occurrence."""

        seen = set()
        unique = []

        for triple in triples:
            if triple["type"] == "entity":
                key = f"entity:{triple['id']}"
            else:
                key = f"relation:{triple['subject']}:{triple['predicate']}:{triple['object']}"

            if key not in seen:
                seen.add(key)
                unique.append(triple)

        return unique

    async def _import_to_graph(self, triples: List[Dict]):
        """Import into the graph database."""

        entities = [t for t in triples if t["type"] == "entity"]
        relations = [t for t in triples if t["type"] == "relation"]

        # Create entities first so that relations can find their endpoints
        for entity in entities:
            try:
                self.kg.create_entity(
                    entity_id=entity["id"],
                    entity_type=entity.get("entity_type", "Entity"),
                    properties={"source": "document"}
                )
            except Exception as e:
                # Report the failure instead of silently swallowing it
                print(f"Failed to create entity: {entity}, {e}")

        # Create relations
        for relation in relations:
            try:
                self.kg.create_relation(
                    subject_id=relation["subject"],
                    predicate=relation["predicate"],
                    object_id=relation["object"]
                )
            except Exception as e:
                print(f"Failed to create relation: {relation}, {e}")
 
 
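# Usage example
async def main():
    # Initialize. `your_llm` is a placeholder for your own async LLM client;
    # the extraction functions above read a module-level `llm`, so bind that
    # name to the same client before running.
    kg = KnowledgeGraph("bolt://localhost:7687", "neo4j", "password")
    builder = EnterpriseKnowledgeGraphBuilder(llm=your_llm, kg=kg)

    # Prepare the documents (sample content stays in Chinese, the language the pipeline targets)
    documents = [
        {
            # Company profile: founded in 2023 by Zhang San (CEO) and Li Si (CTO),
            # headquartered in Beijing, products are an AI assistant and a knowledge graph system
            "type": "company",
            "content": """
            深脑科技成立于2023年，由张三和李四共同创立。
            张三担任CEO，李四担任CTO。
            公司总部位于北京，主要产品包括AI助手和知识图谱系统。
            """
        },
        {
            # Person profile: Zhang San, 35, founder and CEO, ten years at Google,
            # studied under Professor Wang, co-founded the company with Li Si in 2023
            "type": "person",
            "content": """
            张三，35岁，深脑科技创始人兼CEO。
            曾在谷歌工作10年，师从著名AI专家王教授。
            2023年与李四一起创立深脑科技。
            """
        },
        {
            # Product description: the AI assistant is the core product and uses deep learning;
            # the knowledge graph system helps the AI understand relations between entities
            "type": "product",
            "content": """
            AI助手是深脑科技的核心产品。
            使用深度学习技术，可以回答用户问题。
            知识图谱系统帮助AI理解实体之间的关系。
            """
        }
    ]

    # Build the graph
    triples = await builder.build_from_documents(documents)

    print(f"Build complete! Imported {len(triples)} elements")

    # Verify with a query
    result = kg.query("""
        MATCH (p:Person)-[r]->(c:Company)
        RETURN p.id, type(r), c.id
    """)
    print("Person-company relations:", result)

    kg.close()

# Run
asyncio.run(main())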

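9. Common Problems

9.1 What if entity extraction is inaccurate?

Problem: the NER model often extracts the wrong entities or misses some.

Solutions:

Use a larger model (e.g. RoBERTa)
Fine-tune on domain data
Add manual review and supplementary rules

9.2 Relation extraction is noisy

Problem: the extractor returns a pile of irrelevant relations.

Solutions:

Restrict the relation types
Set a confidence threshold
Filter in post-processing

9.3 The graph database is slow

Problem: queries slow down when there are too many nodes.

Solutions:

Create indexes
Use tiered storage
Partition the graph

10. Summary

The core steps of building a knowledge graph:

Entity extraction: find the entities in the text
Relation extraction: find the relations between entities
Entity linking: unify the different surface forms of an entity
Graph storage: store the result in a graph database
Querying: support question answering and inference

Practical advice:

Start with an LLM to validate feasibility quickly
Once the results are stable, consider optimizing the models
Choose a graph database that matches your scale

Remember: the value of a knowledge graph lies in the connections. Knowledge only reaches its full power once it is linked into a network.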
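
The three remedies above can be combined in one post-processing pass. The sketch below is a minimal example under stated assumptions: each relation carries a `confidence` field (the extraction code in this article does not produce one, so you would have to ask the model for it or compute it yourself), and `RELATION_SCHEMA` is a hand-written table of the subject and object types each relation accepts. Both names are hypothetical.

```python
# Allowed (subject type, object type) per relation; types: 人物 = person,
# 组织 = organization, 地点 = location
RELATION_SCHEMA = {
    "工作于": ("人物", "组织"),  # works at
    "师从": ("人物", "人物"),    # student of
    "创立": ("人物", "组织"),    # founded
    "位于": ("组织", "地点"),    # located in
}


def post_filter(relations, entity_types, min_confidence=0.7, schema=RELATION_SCHEMA):
    """Drop relations with unknown predicates, low confidence, or mismatched types."""
    kept = []
    for rel in relations:
        expected = schema.get(rel["predicate"])
        if expected is None:
            continue  # relation type is not in the allowed set
        if rel.get("confidence", 0.0) < min_confidence:
            continue  # below the confidence threshold
        actual = (entity_types.get(rel["subject"]), entity_types.get(rel["object"]))
        if actual == expected:
            kept.append(rel)
    return kept


entity_types = {"张三": "人物", "谷歌": "组织", "北京": "地点"}
noisy_relations = [
    {"subject": "张三", "predicate": "工作于", "object": "谷歌", "confidence": 0.92},
    {"subject": "谷歌", "predicate": "工作于", "object": "张三", "confidence": 0.95},  # wrong direction
    {"subject": "谷歌", "predicate": "位于", "object": "北京", "confidence": 0.40},    # low confidence
    {"subject": "张三", "predicate": "喜欢", "object": "北京", "confidence": 0.99},    # unknown predicate
]

clean_relations = post_filter(noisy_relations, entity_types)
print(clean_relations)
```

Only the first relation survives. The threshold of 0.7 is an arbitrary starting point and should be tuned against a labeled sample.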
