Title: Autoformalization on a Shoestring: 130k Lines of Topology in Two Weeks—and Why It Matters

Autoformalization isn't sci-fi; it's a repeatable workflow that turns informal math into machine-checked proofs. This post summarizes a bold topology sprint in which an LLM paired with a fast proof checker produced 130k lines of formal topology in two weeks on a shoestring budget, and explains why it matters for AI and math.

Table of Contents
- Introduction
- The AI Topology Sprint: What Happened
- The Tech Stack and Workflow: LLMs Meet Megalodon
- The Big Theorems, Big Implications
- Why This Matters Right Now
- A Real-World Scenario Today
- Lessons, Limits, and the Road Ahead
- Key Takeaways
- Sources & Further Reading

Introduction
If you’re into math, AI, or the surprisingly nerdy intersection of both, you’ve probably heard of autoformalization—the idea of pushing informal mathematical reasoning into the exact, machine-checkable world of formal proofs. The latest buzz comes from a bold, hands-on experiment described in the paper 130k Lines of Formal Topology in Two Weeks: Simple and Cheap Autoformalization for Everyone? by Josef Urban. The gist: with a smart loop between a large language model (LLM) and a fast proof checker, a large chunk of topology from a standard textbook could be turned into formal proofs at remarkable speed and cost. The paper reports about 130,000 lines of formal topology in two weeks, using a $100–$200/month setup, and finishing major results like Urysohn’s lemma, Urysohn’s metrization theorem, and the Tietze extension theorem.

This post pulls from Urban’s work and distills what happened, why it matters now, and what it could mean for students, researchers, and software that depends on formal proofs. For context, this is an ongoing, exploratory experiment, not a finished, turnkey solution for every math book. See the original paper for full details and the ongoing updates: 130k Lines of Formal Topology in Two Weeks: Simple and Cheap Autoformalization for Everyone? (arXiv:2601.03298).

The AI Topology Sprint: What Happened
In short, Urban set up a long-running feedback loop between a capable LLM-based coding agent (think ChatGPT-Pro with access to Codex, or Claude Code) and a relatively fast proof checker called Megalodon, which handles higher-order set theory. The core mathematical library comes from Brown’s formalization of basic set theory and surreal numbers, with some modernization to suit the LLM-driven workflow.

Key numbers and milestones:
- About 130,000 lines of topology code produced in a two-week burst (December 22 to January 4), with a total around 160,000 lines by January 4, 2026.
- Major theorems formalized include Urysohn’s lemma (3,000 lines), Urysohn’s metrization theorem (2,000 lines), and a long proof of the Tietze extension theorem (over 10,000 lines).
- The project ran on a lean budget: roughly $100 for the two weeks of LLM usage, plus a $200/month ChatGPT Pro subscription for longer runs and higher credit limits.
- The workflow is a loop: the LLM proposes or refines formal objects; Megalodon checks the proofs; the results feed back into the library, prompting the LLM to fill in gaps, fix mistakes, or push toward bigger theorems.
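That propose-check-feedback loop can be sketched in a few lines of Python. This is an illustrative driver only: `llm_propose` and `megalodon_check` stand in for the real LLM API and Megalodon checker invocations, which the paper does not spell out in code.

```python
# Minimal sketch of the propose-check-feedback loop (illustrative only).
# llm_propose and megalodon_check are stand-ins for the real LLM call and
# the Megalodon proof-checker invocation; they are assumptions, not the
# paper's actual tooling.

def autoformalize_loop(goals, llm_propose, megalodon_check, max_rounds=100):
    """Repeatedly ask the LLM for proof attempts and keep the checked ones."""
    library = []           # accepted, machine-checked proofs
    pending = list(goals)  # theorems still lacking a complete proof
    for _ in range(max_rounds):
        if not pending:
            break
        goal = pending.pop(0)
        attempt = llm_propose(goal, library)     # LLM drafts a formal proof
        ok, feedback = megalodon_check(attempt)  # checker validates it
        if ok:
            library.append(attempt)              # result feeds the library
        else:
            pending.append(goal)                 # retry later, with feedback
    return library, pending
```

In the real setup the "feedback" is of course much richer (checker error messages, dependency reports), but the control flow is essentially this: nothing enters the library without passing the checker.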

In practice, the setup looks like this: a sandboxed coding agent runs in a bubble-wrapped environment, connected to Megalodon for proof checking and Brown’s foundational library as a starting point. The LLMs (ChatGPT 5.2 or Claude Sonnet 4.5, used through their coding interfaces) generate the formal statements and proofs, while the proof checker validates them inside the Megalodon framework. The “textbook-to-formal” translation is aided by strategic prompts and structure, plus lightweight tooling for tracing dependencies and progress across the topology sections.

If you want a sense of the scale, think: a standard topology textbook (Munkres, about 241 pages) got a substantial formal counterpart created in this mass-production style. The project’s ambition is not just to translate one chapter, but to automate a large chunk of general topology, with the potential to apply to many textbooks and papers in 2026 and beyond.

The Tech Stack and Workflow: LLMs Meet Megalodon
What makes this approach work is less “one magic model” and more a disciplined, mechanical collaboration between two worlds:
- The LLM-based coding agent: acts as the writer, translator, and proof drafter. It produces definitions, lemmas, and proofs, then iteratively refines them as guided by prompts.
- Megalodon: the higher-order set-theory proof checker. It validates, compiles, and cross-checks the Megalodon-language proofs produced by the LLM.
- The core library: Brown’s formalization of basic set theory and surreal numbers. Some proofs were admitted or simplified for initial load, but the goal was to replace admits with complete proofs over time.
- The “prompts and tools” layer: a compact set of instructions, including a persistent CLAUDE.md file that acts as the authoritative work guide for the coding agent. The prompt tells the agent how to operate, how to avoid backtracking into unrelated areas, and how to escalate to larger theorems as progress permits.
- Isolation and reliability: the coding agent runs inside a sandbox (bubblewrap) in a tmux session, with careful data backup. This reduces the risk that the agent can do something destructive or get lost in a giant project.
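To make the isolation point concrete, here is one way to wrap an agent in bubblewrap from Python. The flags shown (`--ro-bind`, `--bind`, `--unshare-pid`, etc.) are real bwrap options, but the specific paths and agent command are illustrative assumptions, not the paper's exact sandbox configuration.

```python
# Sketch: building a bubblewrap (bwrap) command line that confines a coding
# agent to a single project directory. Paths and the agent command are
# illustrative; the paper's actual sandbox setup is not reproduced here.
import subprocess


def sandbox_cmd(project_dir, agent_cmd):
    """Return a bwrap invocation: system dirs read-only, only the
    project directory writable, separate PID namespace."""
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",         # system files, read-only
        "--ro-bind", "/lib", "/lib",
        "--bind", project_dir, project_dir,  # only the project is writable
        "--dev", "/dev",
        "--proc", "/proc",
        "--unshare-pid",                     # isolate the process namespace
        "--chdir", project_dir,
    ] + agent_cmd


# Example (not executed here): launch the agent inside the sandbox.
# subprocess.run(sandbox_cmd("/home/user/topology", ["my-agent", "run"]))
```

Even if the agent misbehaves, the worst it can damage is the (regularly backed-up) project directory, which is exactly the property the article describes.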

A few practical details give a sense of the discipline behind the experiment:
- The LLM is kept in a controlled loop, with a constant, minimal prompt that has to be repeatedly fed to the agent as it finishes each chunk.
- The agent’s workflow includes a robust dependency-tracking script that lists, for each theorem, its status (proved, admitted, recursively admitted), its partial proof length, and its dependencies. This helps steer the LLM toward the “big wins” (the major theorems) while still filling in the infrastructure.
- The project uses a mix of higher-order logic (Megalodon is rooted in a higher-order set theory) and a practical approach to results, admitting partial proofs where needed and aiming to replace them with complete proofs later.
- The author also experimented with a hammer (ATP-based proof search) to offload some of the detailed steps, though he notes that it was not yet as useful for these large, human-understandable theorems.
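The dependency-tracking idea is worth a small sketch. The status vocabulary (proved, admitted, recursively admitted) comes straight from the article; the data format and function below are assumptions for illustration, not the project's actual script.

```python
# Sketch of the dependency tracking described above: given each theorem's
# status and dependencies, find the "recursively admitted" ones, i.e. those
# that depend (directly or transitively) on an admitted proof. The dict
# format is an assumption; the paper's real script is not shown here.

def recursively_admitted(theorems):
    """theorems: {name: {"status": "proved" | "admitted", "deps": [names]}}.
    Returns the set of theorems whose proof transitively uses an admit."""
    tainted = set()

    def is_tainted(name, seen=()):
        if name in tainted or theorems[name]["status"] == "admitted":
            return True
        if name in seen:  # cycle guard; should not occur in a sound library
            return False
        return any(is_tainted(d, seen + (name,)) for d in theorems[name]["deps"])

    for name in theorems:
        if is_tainted(name):
            tainted.add(name)
    return tainted
```

A report like this is exactly what lets the agent (and the human) see which "big wins" are genuinely finished and which still rest on unproven foundations.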

The Big Theorems, Big Implications
What got formalized in this sprint are the backbone results that populate an intro-to-topology curriculum and many core theorems in analysis and topology. Highlights include:
- Urysohn’s Lemma: a constructive way to separate disjoint closed sets in a normal space with a continuous function into the unit interval [0, 1].
- Urysohn’s Metrization Theorem: a path from regularity and a countable basis to metrizability.
- Tietze Extension Theorem: a powerful extension result for real-valued continuous maps from a closed subspace to the whole space, with a specific real-interval version that’s used repeatedly in analysis.
- A suite of supporting results: normality, paracompactness, various basis and subspace results, and a catalog of finite- and infinite-product topologies, compactness arguments, and continuous-image properties.
- The project also tracked and logged the dependencies of these results (which the LLM relies on to decide what to prove next) and highlighted potential bottlenecks, like the need for definitions around completeness and uniform convergence, which often sit in later sections of the textbook.

One striking aspect is not just the sheer line-count of formalizations but the way the author uses the pipeline to address specific high-impact theorems first. The Tietze extension effort, for example, shows how the LLM can be guided to build the necessary prerequisites (like uniform-limit machinery) just early enough to avoid forward-reference blockers, a problem that often bites human formalizers when translating textbooks with cross-references.
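The forward-reference problem has a classic algorithmic answer: order the targets so every prerequisite comes before the theorems that use it, i.e. a topological sort of the dependency graph. The sketch below uses Python's standard `graphlib`; the lemma names are made up for illustration and are not the project's actual identifiers.

```python
# Sketch: ordering formalization targets so prerequisites (like the
# uniform-limit machinery the Tietze proof needs) are handled before the
# theorems that use them. Standard topological sort via graphlib; the
# lemma names are hypothetical.
from graphlib import TopologicalSorter

# Map each theorem to the set of results it depends on.
deps = {
    "tietze_extension": {"uniform_limit_continuous", "urysohn_lemma"},
    "uniform_limit_continuous": {"uniform_convergence_def"},
    "urysohn_lemma": set(),
    "uniform_convergence_def": set(),
}

# static_order() yields prerequisites before the theorems that need them.
order = list(TopologicalSorter(deps).static_order())
```

Textbooks routinely violate this ordering with "as we will see in Chapter 7" cross-references, which is precisely why a formalization pipeline benefits from computing the order explicitly rather than following the book page by page.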

Why This Matters Right Now
Why should AI researchers, educators, and software engineers care about this kind of autoformalization experiment? A few angles stand out:

  • Speed and accessibility: turning informal proofs into machine-checked formal proofs used to take months of careful human work. Urban’s setup shows that, with the right loop, you can push large, coherent formalizations forward in weeks or even days, at a cost far lower than hiring teams of experts.

  • Democratization of formal mathematics: the experiment emphasizes a cheap, repeatable workflow that could be replicated by many universities or individual researchers. If autoformalization becomes easier and cheaper, more people can contribute to formal libraries, reduce entry barriers, and improve reproducibility across proofs and libraries.

  • A different kind of collaboration between humans and machines: the success hinges on a well-choreographed interaction—LLMs propose, proof assistants validate, and human prompts tune the process. This isn’t about replacing mathematicians; it’s about letting them focus on creative leaps while the machine handles heavy, repetitive, or error-prone steps.

  • Real-world analogies and software verification: beyond pure math, the same approach could underpin formal verification for safety-critical software, where rigorous proofs of correctness are essential. Autoformalization can populate formal libraries that support verification workflows, potentially speeding up the development of reliable systems.

  • Building on prior AI research: this work sits in a lineage of learning-assisted automated reasoning, neural conjecturing, and the translation between informal mathematics and formal reasoning. It’s a practical demonstration that long, structured formalizations can emerge from a loop that couples learning (LLMs) with deductive checking (Megalodon).

A Real-World Scenario Today
Imagine a university course or an online program that teaches topology or real analysis with live, machine-checked formal proofs accompanying every lecture. An instructor could point students to a dynamic Megalodon-backed formalization, where the LLM preps the formal statements and proofs, and Megalodon verifies them in real-time. Students could experiment with altering hypotheses or constructing new corollaries, with the assurance that the formal checks will catch errors. In industry, teams tackling algorithm correctness—where formal proofs of properties like continuity, convergence, or stability matter—could adopt a similar workflow to bootstrap formal libraries that support safety and reliability guarantees.

The AI Topology Sprint: A Practical Overview
- What was formalized: A large portion of general topology from a standard textbook, with particular emphasis on the core theorems that underpin much of analysis and topology.
- The pace: 130k lines of topology code produced in a two-week window, riding on a subscription-based LLM budget. A longer run previously accrued another 30–40k lines, bringing the total to around 160k lines in the early phase.
- The cost and resources: roughly $100 of LLM usage for the two-week burst (on top of the $200/month subscription), a fraction of the traditional cost in research labor and time. The experiment repurposes available AI tooling into a high-leverage workflow.

The Tech Stack and Workflow: LLMs, Sandbox, and The Keeper of Truth
- LLMs as coding agents: ChatGPT 5.2 (and Claude Code in alternate runs) are used to draft definitions, lemmas, and proofs in Megalodon’s language, guided by a concise, authoritative prompt file (CLAUDE.md) that tells the agent how to operate.
- Megalodon: A higher-order set-theory proof checker that serves as the “truth machine” for the formalized content. Its job is to verify steps, catch regressions, and provide a precise HTML-presented view of the formal world via mgwiki.
- Core library: Brown’s formalization of basic set theory and surreal numbers anchors the formal universe. Some proofs are admitted or stubbed initially to keep the library readable for the LLM, with the aim of replacing those admits with full proofs later.
- The workflow discipline: a long-running shell session, sandboxed coding agents, regular backups, and a progress tracker that loops in new proofs while keeping the big picture in view. The author also experimented with a hammer to discharge some proof steps to ATPs, though this remains a work in progress.

The Big Theorems and the Territorial Landscape of Proofs
- Urysohn’s Lemma and Metrization: The workflow demonstrates the LLM’s ability to produce the right kind of constructive hypotheses and to navigate the dependencies that lead to the metrization theorem.
- Tietze Extension Theorem: A major milestone that required a substantial scaffold of uniform-limit and completeness infrastructure. The process revealed how forward-references across textbook sections can trip up automated formalization, and it provided a roadmap for how to restructure proofs to fit the formal library’s constraints.
- Dependency tracking and progress monitoring: By maintaining a per-theorem status log and a dependency graph, the system helps direct attention to the most impactful bottlenecks, like completing the prerequisites for the next big theorem.

Lessons, Limits, and the Road Ahead
- The power and the caveat: The experiment shows what is possible now with a transparent, repeatable pipeline, but it also highlights practical limits. Large-scale formalization can run into forward-reference issues, the need for robust infrastructure around uniform convergence and completeness, and the challenge of keeping the LLM’s long-term context coherent across thousands of lines.
- The “compactification” problem: A notable issue Urban discusses is the difficulty of maintaining context in long-running sessions. When session size grows, the LLM’s hidden context can get overwhelmed, forcing tricky resets or partial backups. This is a nontrivial obstacle in scaling autoformalization to entire textbooks.
- The human-in-the-loop balance: The author emphasizes a pragmatic stance—let the LLM push the hard, long proofs forward, but step in with focused prompts when needed to finish or to avoid detours. This balance could become a standard pattern for future autoformalization workflows.
- Future-proofing the approach: The experiment hints at a broader future where autoformalization becomes a standard, widely accessible tool. Other proof assistants, libraries, and textbooks could be formalized with similar setups, broadening the accessibility of formal math and formal verification.

Key Takeaways
- Bigger, cheaper formal math: Autoformalization can yield large swaths of textbook content formalized in a fraction of the traditional time and cost, at least for well-structured domains like topology.
- The right everyday tools: A combination of LLM-based coding agents, a fast proof checker, and a lean core library can produce meaningful formal output quickly, provided there is careful prompting, isolation, and progress tracking.
- It’s not magic; it’s workflow: The success hinges on a disciplined feedback loop, dependency tracking, and strategic prompting—an approach that can be replicated and refined by others.
- Realistic limits today: Long sessions risk context-bloat; forward references across textbook chapters can create blockers; these are solvable problems, but they require careful infrastructure and process design.
- A more formal future: If this approach scales, we could see many math textbooks and papers autoformalized, creating shared, machine-checked resources for education, research, and software verification.

Sources & Further Reading
- Original Research Paper: 130k Lines of Formal Topology in Two Weeks: Simple and Cheap Autoformalization for Everyone? https://arxiv.org/abs/2601.03298
- Authors: Josef Urban
- Related context and background discussions cited in the paper (e.g., on learning-assisted autoformalization, Megalodon, and related approaches) are included in the references of the original article.

For readers who want to dig deeper, the arXiv paper provides a thorough account of the setup, the specific prompt strategies, the tooling choices, and the detailed results, including the long-running “Tietze Hill” battle and the 10k-line achievements around Tietze extension. If you’re curious about how a door could swing open for educators, students, and researchers to access formal mathematics more readily, this experiment is a fascinating, concrete glimpse into what’s possible with a thoughtful, repeatable pipeline. And yes: it’s all framed to remain accessible to non-experts, while still delivering the precise, verifiable proofs that formal mathematics demands.
