Writing an MCP server for the Factorio Lua API docs

2026-04-24

Tech

MCP / Python / Factorio / Claude-Code / Grep / HTTP-Caching

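The origin story is pretty dull: I wanted to write some Factorio mod code, but I can never remember whether LuaSurface calls it create_entity or make_entity, so every time I end up opening a browser and digging through lua-api.factorio.com. Eight or nine tabs in, I started wondering: could Claude Code just look this up for me?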

Writing an MCP server would do it.

First, figure out what "the docs" actually are#

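The Factorio API docs look like a ReadTheDocs site, with a "Libraries / New Functions / Modified Functions" outline at the top. The link the user gave me happened to be the auxiliary/libraries.html page. But my first move was not to scrape HTML. I checked for structured data first.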

curl -sI "https://lua-api.factorio.com/latest/runtime-api.json" | head -6

HTTP/2 200
server: nginx
content-type: application/json
content-length: 1851970
last-modified: Wed, 25 Feb 2026 13:18:23 GMT
etag: "699ef69f-1c4242"

A 1.85 MB JSON file, complete with ETag and Last-Modified. Next:

curl -s "https://lua-api.factorio.com/latest/runtime-api.json" | python3 -c "
import json,sys
d=json.load(sys.stdin)
print(d['application_version'], 'api_version', d['api_version'])
for k in ('classes','events','concepts','defines'): print(k, len(d[k]))
"

2.0.76 api_version 6
classes 148
events 219
concepts 418
defines 60

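There is also a prototype-api.json (1.73 MB) holding 278 prototypes and 686 types. Together the two JSON files cover the entire runtime-stage and data-stage API. This is the machine-readable format Factorio publishes specifically for IDE plugins, and its location is documented in auxiliary/json-docs-runtime.html.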

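The user's exact words were "use the existing search API". But the docs site has no search endpoint. I fetched the home page HTML, and the JS contains no lunr / flexsearch style client-side index either. The only "search API" that exists is the two JSON files themselves. Once that is clear, the architecture follows: don't build an indexing service, build grep.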

Design principles#

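Three questions, reasoned from first principles:

What is the data source: two JSON files plus a handful of auxiliary HTML pages (libraries, data-lifecycle, storage, mod-structure; none of these made it into the JSON).
What do queries look like: Claude will search class/method/event names by regex and occasionally look for a keyword in the descriptions. An O(N) scan over roughly 10k records is plenty fast; tantivy / bleve would be overkill.
How to keep traffic down: the JSON changes on a scale of weeks (last-modified is two months old). Since ETags are available, a refresh sends If-None-Match and gets a 304 back, at about 200 bytes per exchange.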

Conclusion:

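Python + the mcp SDK (FastMCP) + stdio transport
A local cache directory, ~/.cache/factorio-docs-mcp/, where each file has a companion .meta.json recording ETag / Last-Modified / fetched_at
Lazy loading: data is fetched on the first tool call and then stays in memory
Default TTL of 24 hours; once it expires, send a conditional GET, and on a 304 update only fetched_at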
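The TTL rule in the last item can be sketched as follows. This is only a minimal illustration, not the project's actual cache layer: the helper names (load_meta, needs_revalidation, touch) and the exact meta keys are assumptions; only fetched_at and the 24-hour default come from the text.

```python
import json
import time
from pathlib import Path
from typing import Optional

TTL_SECONDS = 24 * 60 * 60  # default TTL from the text: 24 hours


def load_meta(meta_path: Path) -> dict:
    """Read the sidecar .meta.json, or return an empty dict if it is missing."""
    if not meta_path.exists():
        return {}
    return json.loads(meta_path.read_text())


def needs_revalidation(meta: dict, now: Optional[float] = None,
                       ttl: float = TTL_SECONDS) -> bool:
    """True when the cached copy is older than the TTL or was never fetched."""
    now = time.time() if now is None else now
    fetched_at = meta.get("fetched_at")
    return fetched_at is None or now - fetched_at > ttl


def touch(meta_path: Path, meta: dict, now: Optional[float] = None) -> dict:
    """After a 304: keep the stored validators and only bump fetched_at."""
    updated = {**meta, "fetched_at": time.time() if now is None else now}
    meta_path.write_text(json.dumps(updated))
    return updated
```

Only when needs_revalidation returns True does a request go out at all; within the TTL the cached file is used as-is.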

Flattening the JSON into Records#

A grep engine's worst enemy is nested structure. I flattened the whole tree into a list of Records:

from dataclasses import dataclass
from typing import Any

@dataclass(slots=True)
class Record:
    kind: str          # class|event|concept|define|method|attribute|...
    name: str          # fully qualified name, e.g. "LuaSurface.create_entity"
    short_name: str
    parent: str | None
    stage: str         # runtime|prototype|auxiliary
    description: str
    signature: str     # call signature for methods, type signature for attributes
    url: str           # deep link to the anchor on the official page
    search_blob: str   # lowercase concatenation of name + signature + description
    raw: Any           # original JSON, returned on detail lookup

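Each class yields one record for the class itself plus one for each of its methods / attributes / operators. defines is a nested tree, so I expand it recursively down to the leaves; even deep constants like defines.alert_type.entity_destroyed can be fetched directly with get(). The final tally: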
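The recursive expansion of defines might look roughly like this. It is a sketch under an assumption about the JSON shape: each define node is taken to carry a name, an optional values list (the leaf constants) and an optional subkeys list (nested defines). The function name flatten_defines and the sample data are made up for illustration.

```python
from typing import Iterator


def flatten_defines(nodes: list[dict], prefix: str = "defines") -> Iterator[tuple[str, dict]]:
    """Yield (qualified_name, raw_node) for every define, nested define and leaf value."""
    for node in nodes:
        qualified = f"{prefix}.{node['name']}"
        yield qualified, node
        # Leaf constants, e.g. defines.alert_type.entity_destroyed
        for value in node.get("values", []):
            yield f"{qualified}.{value['name']}", value
        # Nested define groups are expanded with the same rule.
        yield from flatten_defines(node.get("subkeys", []), qualified)


# Hand-written sample in the assumed shape, not real upstream data.
sample = [
    {"name": "alert_type",
     "values": [{"name": "entity_destroyed"}, {"name": "turret_fire"}]},
    {"name": "control_behavior",
     "subkeys": [{"name": "inserter",
                  "subkeys": [{"name": "hand_read_mode",
                               "values": [{"name": "hold"}, {"name": "pulse"}]}]}]},
]
index = dict(flatten_defines(sample))
print(sorted(index))
```

Because every level is emitted, both the group (defines.alert_type) and its leaves end up addressable by their dotted name.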

{
  "total_records": 10099,
  "by_kind": {
    "attribute": 2303, "auxiliary": 12, "class": 148,
    "concept": 418,   "define": 1502,   "event": 219,
    "method": 960,    "prototype": 278, "property": 3550,
    "type": 686,      "operator": 11,
    "global_function": 3, "global_object": 9
  }
}

10,099 records times an average 200-byte search_blob: one full regex pass takes 70 ms on my M-series machine. Good enough.
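To get a feel for that number on another machine, a rough timing sketch is below. The blobs are synthetic stand-ins of about 200 bytes each (not the real index), so the measured time will differ from the 70 ms quoted above; scan is a hypothetical helper name.

```python
import re
import time

# Synthetic stand-in for the real index: 10,099 lowercase blobs of ~200 bytes.
blobs = [
    f"luasurface.method_{i} (name: entityid, position: mapposition) -> luaentity ".ljust(200, "x")
    for i in range(10_099)
]
blobs[4242] = "luasurface.create_entity(name: entityid, position: mapposition) -> luaentity"


def scan(pattern: str, haystack: list[str]) -> list[int]:
    """Return the index of every blob the pattern matches (a plain O(N) pass)."""
    rx = re.compile(pattern)
    return [i for i, blob in enumerate(haystack) if rx.search(blob)]


start = time.perf_counter()
hits = scan(r"create_entity", blobs)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{len(hits)} hit(s) in {elapsed_ms:.1f} ms")
```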

URL shape: the details live in the anchor#

I verified the docs site's URL rules one by one with curl -I:

kind	URL shape
class	`classes/<Name>.html`
method / attribute / operator	`classes/<Class>.html#<Class>.<member>`
event	`events.html#<event_name>`
concept	`concepts/<Name>.html`
define	`defines.html#defines.<dotted.path>`
prototype	`prototypes/<Name>.html`
property	`prototypes/<Parent>.html#<name>`
type	`types/<Name>.html`
auxiliary	`auxiliary/<slug>.html`
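The table translates almost directly into a small URL builder. The sketch below is one possible encoding of it; the function name doc_url and its parameter layout are assumptions, not the project's code. For a define, name is the dotted path without the leading "defines.".

```python
from typing import Optional

BASE = "https://lua-api.factorio.com/latest/"

# Kinds that get a page of their own: kind -> directory.
PAGE_DIRS = {
    "class": "classes",
    "concept": "concepts",
    "prototype": "prototypes",
    "type": "types",
    "auxiliary": "auxiliary",
}


def doc_url(kind: str, name: str, parent: Optional[str] = None) -> str:
    """Build a deep link from a record's kind, short name and (if any) parent."""
    if kind in PAGE_DIRS:
        return f"{BASE}{PAGE_DIRS[kind]}/{name}.html"
    if kind in {"method", "attribute", "operator"}:
        return f"{BASE}classes/{parent}.html#{parent}.{name}"
    if kind == "property":
        return f"{BASE}prototypes/{parent}.html#{name}"
    if kind == "event":
        # Events live on one shared page and are addressed by anchor only.
        return f"{BASE}events.html#{name}"
    if kind == "define":
        return f"{BASE}defines.html#defines.{name}"
    raise ValueError(f"no URL rule for kind {kind!r}")


print(doc_url("method", "create_entity", parent="LuaSurface"))
print(doc_url("event", "on_tick"))
```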

There is an easy trap here. Events do not get their own pages:

curl -s -o /dev/null -w "%{http_code}\n" \
  "https://lua-api.factorio.com/latest/events/on_tick.html"
# 404

curl -s -o /dev/null -w "%{http_code}\n" \
  "https://lua-api.factorio.com/latest/concepts/Ingredient.html"
# 200

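Concepts have their own pages; events are reachable only through the events.html#<name> anchor. Had I applied the concept rule to events and generated events/<Name>.html, every "deep link" I emitted would have been a 404. Running curl -I over everything beforehand is there to catch exactly this kind of case. Don't trust that a docs site's URL rules are consistent; there is always an exception.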

ETags make refreshing nearly free#

The main loop of the cache layer looks like this:

resp = self._http().get(url, headers=headers)

# 304: the cached copy is still current, so only the timestamp changes
if resp.status_code == 304 and entry.path.exists():
    meta["fetched_at"] = time.time()
    entry.meta_path.write_text(json.dumps(meta))
    return entry

# Anything else: fail on HTTP errors, otherwise overwrite the cached file
resp.raise_for_status()
entry.path.write_bytes(resp.content)
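The excerpt does not show how headers is built. A plausible sketch, assuming the sidecar meta stores the validators under keys named etag and last_modified (these key names and the function name are my guesses), would be:

```python
def conditional_headers(meta: dict) -> dict:
    """Turn stored validators into conditional-request headers.

    An empty meta (first download) yields no headers, i.e. a plain GET.
    """
    headers = {}
    if meta.get("etag"):
        headers["If-None-Match"] = meta["etag"]
    if meta.get("last_modified"):
        headers["If-Modified-Since"] = meta["last_modified"]
    return headers


# Validators as returned by the curl -sI call earlier in the post.
meta = {
    "etag": '"699ef69f-1c4242"',
    "last_modified": "Wed, 25 Feb 2026 13:18:23 GMT",
}
print(conditional_headers(meta))
```

Sending both headers is harmless: when If-None-Match is present, servers are required to evaluate it in preference to If-Modified-Since.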

On the first run everything is a 200 OK, about 3.6 MB downloaded. On the second run, with If-None-Match on each request, all 15 files come back 304:

GET .../runtime-api.json          "HTTP/1.1 304 Not Modified"
GET .../prototype-api.json        "HTTP/1.1 304 Not Modified"
GET .../auxiliary/libraries.html  "HTTP/1.1 304 Not Modified"
... (15 files in total)
rebuild after TTL=0 took 2.26s

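A 304 has an empty body; one round trip, TLS overhead included, comes to roughly 200-400 bytes. Refreshing all 15 files serially takes 2.26 seconds, nearly all of it RTT. If latency really mattered this could be made concurrent, but the MCP client holds a long-lived connection, and after cold start nobody cares about those two seconds.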
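For reference, the concurrent variant mentioned above could be as small as the sketch below. It is not part of the project: refresh_all is a hypothetical helper, the fetch callable is assumed to be thread-safe, and fake_fetch merely simulates a 50 ms round trip that answers 304.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def refresh_all(urls: list[str], fetch: Callable[[str], int],
                max_workers: int = 8) -> dict[str, int]:
    """Run fetch(url) for every URL on a thread pool; return url -> status code."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = list(pool.map(fetch, urls))  # map preserves input order
    return dict(zip(urls, statuses))


def fake_fetch(url: str) -> int:
    """Stand-in for a conditional GET: sleep one simulated RTT, answer 304."""
    time.sleep(0.05)
    return 304


urls = [f"https://example.invalid/file-{i}" for i in range(15)]
statuses = refresh_all(urls, fake_fetch)
print(sum(1 for s in statuses.values() if s == 304), "of", len(urls), "returned 304")
```

Since the work is dominated by waiting on the network, threads are enough here; the pool size caps how many requests hit the docs server at once.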

Extracting the auxiliary HTML#

The JSON covers runtime and prototype, but hand-written pages like auxiliary/libraries.html exist only as HTML. I didn't want to pull in BeautifulSoup; the stdlib html.parser is enough:

class _TextExtractor(HTMLParser):
    DROP = {"script", "style", "noscript", "svg"}
    BLOCK = {"p","div","li","tr","br","h1","h2","h3","h4","h5","h6","pre",...}

    def handle_starttag(self, tag, attrs):
        if tag in self.DROP: self._skip += 1
        elif tag.startswith("h") and tag[1:].isdigit():
            self._out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "li": self._out.append("\n- ")
        ...

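The output is markdown-like text with # headings and - list markers, which is exactly right for another round of regex search. The entry function first uses a regex to cut out the container-inner main block and only then feeds it to the parser, dropping navigation and footer.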
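A self-contained, runnable version of the same idea is sketched below. It follows the excerpt for headings and list items, but the end-tag handling, the data handler and the whitespace cleanup are my own guesses at the parts not shown, and the container-inner regex step is left out.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    DROP = {"script", "style", "noscript", "svg"}   # content is discarded
    BREAK = {"p", "div", "tr", "br", "pre"}         # tags that start a new line

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._out: list[str] = []
        self._skip = 0  # nesting depth inside DROP tags

    def handle_starttag(self, tag, attrs):
        if tag in self.DROP:
            self._skip += 1
        elif re.fullmatch(r"h[1-6]", tag):
            self._out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "li":
            self._out.append("\n- ")
        elif tag in self.BREAK:
            self._out.append("\n")

    def handle_endtag(self, tag):
        if tag in self.DROP and self._skip:
            self._skip -= 1
        elif re.fullmatch(r"h[1-6]", tag):
            self._out.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self._out.append(data)

    def text(self) -> str:
        joined = re.sub(r"[ \t]+", " ", "".join(self._out))
        return re.sub(r"\n{3,}", "\n\n", joined).strip()


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = ('<div class="container-inner"><h2>Libraries</h2>'
        '<script>var x=1;</script><p>Use <code>serpent</code> here.</p>'
        '<ul><li>table_size</li><li>pairs()</li></ul></div>')
print(html_to_text(page))
```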

The user can call auxiliary("libraries") and get back 7,727 characters of plain text in which keywords like serpent, table_size, and pairs() are all searchable.

Seven tools#

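auxiliary      fetch the plain text of an auxiliary/*.html page
cache_info     show each cached file's ETag, size, and age
get            return the full JSON + deep link for a fully qualified name
list_entries   list names, with kind/stage/pattern filters
refresh        force a refresh regardless of TTL
search         regex grep, with kinds/stages/field filters
stats          total count + upstream version + per-kind counts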
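How tools like these get wired up with FastMCP, together with the lazy loading described earlier, can be sketched as follows. This is an illustration rather than the project's code: LazyIndex, build_index and the one-record placeholder are hypothetical, and only a reduced stats tool is registered. The import sits inside main() so the helpers can be used without the SDK installed.

```python
from typing import Any, Callable, Optional


class LazyIndex:
    """Build the record list on first access, then keep it in memory."""

    def __init__(self, loader: Callable[[], list[dict[str, Any]]]) -> None:
        self._loader = loader
        self._records: Optional[list[dict[str, Any]]] = None

    def get(self) -> list[dict[str, Any]]:
        if self._records is None:
            self._records = self._loader()
        return self._records


def build_index() -> list[dict[str, Any]]:
    """Placeholder loader; a real server would parse the cached JSON here."""
    return [{"kind": "method", "name": "LuaSurface.create_entity"}]


INDEX = LazyIndex(build_index)


def stats() -> dict[str, Any]:
    """Total record count (a reduced stand-in for the stats tool)."""
    return {"total_records": len(INDEX.get())}


def main() -> None:
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("factorio-docs")
    server.tool()(stats)            # equivalent to decorating with @server.tool()
    server.run(transport="stdio")   # blocks, speaking MCP over stdin/stdout
```

A console-script entry point would call main(); nothing is loaded until a client actually invokes a tool.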

The heart of it is search. A few real calls:

# find every method that creates an entity
search("create_entity", kinds=["method"])
# -> LuaSurface.create_entity

# all events whose name starts with on_player_
search(r"^on_player_", kinds=["event"], limit=20)
# -> on_player_alt_reverse_selected_area
#    on_player_alt_selected_area
#    on_player_ammo_inventory_changed ... (20 results)

# where serpent is mentioned
search("serpent", stages=["auxiliary"])
# -> libraries

# exact match on a class name
search(r"^LuaSurface$", field="name", case_sensitive=True)
# -> class LuaSurface
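A stripped-down search with the same filter parameters might look like this. It is a sketch under assumptions: Rec is a reduced stand-in for the Record dataclass, the haystack is simply name plus description rather than the real search_blob, and the default limit of 50 is invented.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Rec:
    kind: str
    name: str
    stage: str
    description: str = ""


def search(records: Iterable[Rec], pattern: str,
           kinds: Optional[list[str]] = None,
           stages: Optional[list[str]] = None,
           field: str = "any", case_sensitive: bool = False,
           limit: int = 50) -> list[Rec]:
    """Linear regex scan with optional kind/stage filters and a result cap."""
    rx = re.compile(pattern, 0 if case_sensitive else re.IGNORECASE)
    hits: list[Rec] = []
    for rec in records:
        if kinds and rec.kind not in kinds:
            continue
        if stages and rec.stage not in stages:
            continue
        # field="name" restricts matching to the name; otherwise the name
        # comes first so that ^ still anchors on it.
        haystack = rec.name if field == "name" else f"{rec.name}\n{rec.description}"
        if rx.search(haystack):
            hits.append(rec)
            if len(hits) >= limit:
                break
    return hits


records = [
    Rec("class", "LuaSurface", "runtime"),
    Rec("method", "LuaSurface.create_entity", "runtime", "Create an entity on this surface."),
    Rec("event", "on_player_died", "runtime"),
    Rec("event", "on_tick", "runtime"),
    Rec("auxiliary", "libraries", "auxiliary", "serpent, table_size, pairs()"),
]
print([r.name for r in search(records, r"^on_player_", kinds=["event"])])
```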

get returns the full record:

get("LuaSurface.create_entity")
# {
#   "kind": "method",
#   "name": "LuaSurface.create_entity",
#   "parent": "LuaSurface",
#   "signature": "LuaSurface.create_entity(burner_fuel_inventory?: ..., name: EntityID, position: MapPosition, ...) -> LuaEntity",
#   "url": "https://lua-api.factorio.com/latest/classes/LuaSurface.html#LuaSurface.create_entity",
#   "raw": { ... full parameter list, return values, description ... }
# }
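Underneath, a lookup like this only needs a dict keyed by fully qualified name. The sketch below assumes that design (build_name_index and the error wording are hypothetical) and returns an error dict for unknown names instead of raising, so the MCP client always receives a structured answer.

```python
from typing import Any


def build_name_index(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Map each record's fully qualified name to the record itself."""
    return {rec["name"]: rec for rec in records}


def get(index: dict[str, dict[str, Any]], name: str) -> dict[str, Any]:
    """Exact-name lookup; unknown names yield an error dict, never an exception."""
    rec = index.get(name)
    if rec is None:
        return {"error": f"no entry named {name!r}"}
    return rec


index = build_name_index([
    {"kind": "method", "name": "LuaSurface.create_entity", "parent": "LuaSurface"},
    {"kind": "event", "name": "on_tick", "parent": None},
])
print(get(index, "on_tick")["kind"])
print(get(index, "LuaSurface.make_entity"))
```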

Testing#

I wrote a full_stdio.py that starts the server in a subprocess, completes the MCP initialization, and then calls each tool in turn to verify:

All 7 tools are registered
Cold-start index build takes 0.07 s (cache hit); warm calls take < 0.2 s
create_entity → LuaSurface.create_entity ✓
^on_player_ → 20 events, each starting with on_player_ ✓
get("on_tick") URL ends with events.html#on_tick (not events/on_tick.html) ✓
get("defines.alert_type") signature lists all 13 enum values ✓
get("ContainerPrototype") points to prototypes/ContainerPrototype.html ✓
get("BoundingBox") has kind type and points to types/BoundingBox.html ✓
Requesting a nonexistent name returns {"error": ...} instead of raising ✓
auxiliary("libraries") returns 7,727 characters of text containing table_size ✓
Fetching a nonexistent auxiliary page returns a structured error and does not crash ✓
refresh() forces a refresh and gets 15 304s ✓

All 27 checks are green.

Hooking it into Claude Code#

Add a section to ~/.claude.json:

{
  "mcpServers": {
    "factorio-docs": {
      "command": "/ABS/PATH/factorio-docs-mcp/.venv/bin/factorio-docs-mcp"
    }
  }
}

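Restart Claude Code and it can call search("create_entity") right in the conversation, with deep links to the official docs in its answers, one click away for verification. No more juggling eight or nine tabs.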

Wrapping up#

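The part most worth chewing on is that starting point, the "existing search API". Taken literally, the Factorio docs site has no search API, but it does publish a complete machine-readable JSON index, and that JSON is the infrastructure layer a search API would sit on. In building the MCP server I stepped off the default "scrape HTML → build an index → serve queries" path and reduced the problem to "pull JSON → regex grep". The result: 500 lines of Python, zero third-party indexing engines, and a refresh cost of about 1 KB.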

The code lives at github.com/StevenLi-phoenix/factorio-docs-mcp; uv pip install -e . gets it running. Fellow Factorio modders are welcome to grab it.
