Architecture overview
Status: All 10 phases complete. Bootstrap fixpoint achieved (compiler₁.fs == compiler₂.fs).
Stage0 vs v2 canonical architecture
This document describes the stage0 bootstrap compiler (src/LLLangCompiler/*.fs). Stage0 is
frozen — it is the bootstrap implementation only, not the long-term source of truth for any
subsystem.
The canonical v2 compiler is the self-hosted implementation under stdlib/src/*.lll.
Canonical ownership, pass contracts, and migration notes are the binding authority:
- 14-v2-canonical-compiler-boundaries.md — frozen ownership table
- 15-v2-pass-contracts.md — frozen pass contracts
Namespace split:
- Compiler.* — self-hosted compiler implementation phases
- Std.* — reusable foundation library modules
Stage0 F# files are bootstrap mirrors. Where a self-hosted ll-lang module owns a subsystem, the stage0 file is transitional and must not receive new feature work.
v2 self-hosted compiler — current state (Milestone 1 complete)
The table below lists the self-hosted ll-lang module for each phase and whether it is already the canonical owner of its subsystem:
| Phase | ll-lang file | Tests | Status |
|---|---|---|---|
| Lexer | stdlib/src/Lexer.lll | — | transitional bootstrap |
| Parser | stdlib/src/Parser.lll | — | transitional bootstrap |
| Elaborator | stdlib/src/Elaborator.lll | — | transitional bootstrap |
| Type representations | stdlib/src/CompilerTypes.lll | 20/20 | canonical |
| Typed IR shapes | stdlib/src/CompilerTyped.lll | 15/15 | canonical |
| HM inference | stdlib/src/CompilerInfer.lll | 12/12 | canonical |
| Lowering | stdlib/src/CompilerLower.lll | 16/16 | canonical |
| F# codegen | stdlib/src/Codegen.lll | — | transitional bootstrap |
| TypeScript codegen | stdlib/src/CodegenTS.lll | — | transitional bootstrap |
| Python codegen | stdlib/src/CodegenPy.lll | — | transitional bootstrap |
| Java codegen | stdlib/src/CodegenJava.lll | — | transitional bootstrap |
| C# codegen | stdlib/src/CodegenCSharp.lll | — | transitional bootstrap |
| Full pipeline | stdlib/src/Compiler.lll | — | transitional bootstrap |
Canonical = ll-lang module is the authority; stage0 is a compatibility mirror only.
Transitional bootstrap = ll-lang module exists but parity gate not yet enforced.
The compiler is a straight pipeline: source text in, target source out. Each
stage is a pure function (modulo the small InferState used in HM) and
returns Result<_, LLError list>.
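As a minimal sketch of what "a straight pipeline of `Result`-returning stages" means in F#, the stages can be chained with `Result.bind`. The stage bodies, the `LLError` record shape, and the `E000` code below are hypothetical stand-ins for illustration, not the real compiler types.

```fsharp
// Hypothetical error shape; the real LLError lives in the compiler sources.
type LLError = { Code: string; Message: string }

// Each stage is a pure function returning Result<_, LLError list>.
let tokenize (src: string) : Result<string list, LLError list> =
    if src.Trim() = "" then
        // "E000" is a made-up code used only in this sketch
        Error [ { Code = "E000"; Message = "empty source" } ]
    else
        src.Split([| ' '; '\n' |], System.StringSplitOptions.RemoveEmptyEntries)
        |> List.ofArray
        |> Ok

let parse (toks: string list) : Result<int, LLError list> =
    Ok (List.length toks)

let emit (tokenCount: int) : Result<string, LLError list> =
    Ok (sprintf "// %d tokens" tokenCount)

// Result.bind short-circuits on the first Error, so later stages never run.
let compileSketch (src: string) : Result<string, LLError list> =
    tokenize src
    |> Result.bind parse
    |> Result.bind emit
```

With this shape, a failure in any stage surfaces as the `Error` of the whole pipeline without extra plumbing.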
Pipeline
Single-file mode
.lll source (string)
      │
      ▼
Lexer.tokenize               Tok list with INDENT/DEDENT
      │
      ▼
Parser.parseModuleWithPos    LLModule : AST + PosMap
      │
      ▼
Elaborator.elaborate         (LLModule', TypeEnv) + E001..E005 checks
      │
      ▼
HMInfer.infer                TypedModule : typed AST + Env + dispatch map
      │
      ▼
Codegen*.emit                Target source string
  (Codegen.fs | CodegenTS.fs | CodegenPy.fs | CodegenJava.fs | CodegenCSharp.fs | CodegenLLVM.fs)
      │
      ▼
.fs/.ts/.py/.java  → dotnet run --project <tmp fsproj>   (lllc run, F# only)
                   → dotnet build                        (lllc build --target fs)
Entry points:
- Compiler.compile : string -> Result<string, LLError list> — single-file, F# target
- Compiler.compileTarget : Target -> string -> Result<string, LLError list> — single-file, any target
Project mode (Phase 8)
lll.toml
      │
      ▼
Manifest.parseManifest         LLManifest (name, version, deps, platform)
      │
      ▼
ProjectLoader.loadProject      LLProject (topo-sorted LoadedFile list)
      │   ▼ glob src/*.lll
      │   ▼ parse each file header (module path + imports)
      │   ▼ validate module paths (E020)
      │   ▼ Kahn's topological sort (E024 on cycle)
      ▼
Compiler.compileProject        for each file: compile → TypedModule
      │
      ▼
Codegen*.emitProjectModules    prelude once + all modules concatenated
      │                        (per-target; loops over [platform] use list)
      ▼
bin/<name>.fs + bin/<name>.fsproj    (single target)
bin/fsharp/<name>.fs                 (multi-target)
bin/typescript/<name>.ts
A PosMap side-table (see AST.fs) is populated by the parser and
threaded through the elaborator and inference passes so that every
error carries a real line:col from the source (instead of the historic
0:0 placeholder).
Module-by-module
Token.fs — token types
Enum-like discriminated union with cases for keywords, identifiers, literals,
operators, and the three synthetic layout tokens (Indent, Dedent,
Newline). A Tok carries source Line and Col.
Lexer.fs — tokenizer
Character-by-character scan producing a Tok list. Responsible for:
- Keyword recognition via a Map<string, Token>.
- INDENT/DEDENT synthesis from leading whitespace changes.
- String literal escape handling.
- Distinguishing identifiers by case: [A-Z]... → TypeId, lowercase → Ident.
See 02-lexer.
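The INDENT/DEDENT synthesis listed above is usually done with a stack of open indentation widths. The following is a sketch under stated assumptions, not the code in Lexer.fs: it works on whole lines rather than characters, assumes spaces only, skips blank lines, and the `Layout` type is a hypothetical cut-down of the real token union.

```fsharp
// Hypothetical layout tokens; the real Token.fs union has many more cases.
type Layout =
    | Indent
    | Dedent
    | Newline
    | Text of string

/// Turn raw lines into a stream with synthetic Indent/Dedent tokens.
/// A dedent to a width that was never opened is not reported as an error here.
let layout (lines: string list) : Layout list =
    let indentOf (line: string) = line.Length - line.TrimStart(' ').Length
    // stack holds the currently open indentation widths, innermost first
    let rec go (stack: int list) (acc: Layout list) (rest: string list) : Layout list =
        match rest with
        | [] ->
            // close every block still open at end of input
            List.rev acc @ List.replicate (List.length stack - 1) Dedent
        | line :: tail when line.Trim() = "" -> go stack acc tail
        | line :: tail ->
            let width = indentOf line
            // pop every level deeper than this line, one Dedent per level
            let rec pop (st: int list) (out: Layout list) =
                match st with
                | top :: below when top > width -> pop below (Dedent :: out)
                | _ -> st, out
            let stack, acc = pop stack acc
            // a line deeper than the enclosing level opens a new block
            let stack, acc =
                match stack with
                | top :: _ when width > top -> width :: stack, Indent :: acc
                | _ -> stack, acc
            go stack (Newline :: Text (line.Trim()) :: acc) tail
    go [ 0 ] [] lines
```

The stack always keeps the base width 0 at the bottom, which is why the number of closing `Dedent` tokens at end of input is the stack length minus one.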
AST.fs — untyped surface AST
Discriminated unions for types (TypeExpr), literals (Literal), patterns
(Pattern), expressions (Expr), and declarations (Decl). An LLModule
is a record of Path, Imports, and Decls.
Type variables are TyVar string; type applications are TyApp(outer, arg)
built left-to-right; units are a separate UnitExpr tree nested inside
TyTagged.
Parser.fs — recursive descent
Mutable Ctx cursor over a Tok array. Hand-written precedence climbing:
parseAtom < parseTagged < parseApp < parseMul < parseAdd < parseCons <
parseCmp < parsePipe. Higher-level constructs (parseExprInner,
parseBlockExpr) handle let, if, match, and lambda. The parser
exposes two top-level entries: parseModule (plain) and
parseModuleWithPos (same, but also returns the populated PosMap).
See 03-parser.
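To illustrate the layered scheme described above (each level parses the next-tighter level, then loops while it sees one of its own operators), here is a miniature with only three levels. The `Expr` shape, the string tokens, and the `binaryLevel` helper are hypothetical; the real parser works on `Tok` values and has more levels.

```fsharp
type Expr =
    | Num of int
    | Bin of string * Expr * Expr

// Mutable cursor over a token array, in the spirit of the Ctx described above.
type Ctx(toks: string[]) =
    let mutable pos = 0
    member this.Peek = if pos < toks.Length then Some toks.[pos] else None
    member this.Advance() = pos <- pos + 1

let parseAtom (ctx: Ctx) : Expr =
    match ctx.Peek with
    | Some t ->
        ctx.Advance()
        Num (int t) // assumes the token is an integer literal
    | None -> failwith "unexpected end of input"

// One left-associative binary level: parse `next`, then fold in `op next` pairs.
let binaryLevel (ops: string list) (next: Ctx -> Expr) (ctx: Ctx) : Expr =
    let mutable lhs = next ctx
    let mutable more = true
    while more do
        match ctx.Peek with
        | Some op when List.contains op ops ->
            ctx.Advance()
            lhs <- Bin (op, lhs, next ctx)
        | _ -> more <- false
    lhs

// Tighter levels are passed as `next` to looser ones: atom < mul < add.
let parseMul = binaryLevel [ "*"; "/" ] parseAtom
let parseAdd = binaryLevel [ "+"; "-" ] parseMul
```

Because `parseAdd` delegates its operands to `parseMul`, multiplication binds tighter than addition without any explicit precedence table.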
Elaborator.fs — declared-type checking pass
Four sub-passes:
- rewriteTagsInModule — rewrites TyApp(T, TyVar t) to TyTagged(T, UName t) for every declared tag so that HMInfer sees the tagged form directly.
- collectDecls — walks decls, populates TypeEnv : Map<string, TypeExpr> with a large builtinEnv (arith/cmp ops plus the Phase 6 stdlib: math, list, maybe, result, str, char, file IO, process), let-binding types (from literal inspection), function signatures, and sum-type constructors.
- checkDecls — traverses function bodies calling typeOf, which walks expressions and compares declared vs actual types for each application. Emits E001/E002/E004/E005 via classifyMismatch and carries positions from the parser's PosMap.
- exhaustivenessCheck — for every DFn whose first param type is a named sum type, verifies that every EMatch branch list covers every constructor. Emits E003 for missing constructors.
Uses structural type equality with TyVar as a wildcard. Returns both
the rewritten LLModule and the enriched TypeEnv:
elaborate : PosMap -> LLModule -> Result<LLModule * TypeEnv, LLError list>.
See 04-elaborator.
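The core of a constructor-coverage check like the one described for exhaustivenessCheck can be sketched in a few lines. The `SumType` and `Pat` shapes are hypothetical simplifications, and treating a wildcard as covering everything is an assumption of this sketch rather than a documented behaviour of the real pass.

```fsharp
// Hypothetical simplified shapes; the real pass walks DFn / EMatch nodes.
type SumType = { Name: string; Constructors: string list }

type Pat =
    | PCon of string // constructor pattern, e.g. `Circle r`
    | PWild          // `_` or a bare variable: assumed to match everything

/// Constructors of `ty` not covered by `branches`; an empty list means exhaustive.
let missingConstructors (ty: SumType) (branches: Pat list) : string list =
    if List.contains PWild branches then
        []
    else
        let covered =
            branches
            |> List.choose (function
                | PCon c -> Some c
                | PWild -> None)
            |> Set.ofList
        ty.Constructors |> List.filter (fun c -> not (Set.contains c covered))
```

Each name returned by `missingConstructors` would correspond to one missing-constructor diagnostic (E003 in the real pass).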
Types.fs — type-scheme plumbing
Defines TypeScheme, Subst (flex-var-only), Env, FreshState. Exports
applyType, applyEnv, compose, ftvType, generalize, instantiate.
Rigid vars (no $ prefix) are quantifiers; flex vars ($N) are
unification variables. The bridge fromElaboratorEnv converts
Elaborator's raw TypeEnv into H-M's Env by collecting rigid vars into
scheme quantifier lists.
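For readers unfamiliar with the vocabulary, the following is a textbook-style sketch of `generalize` and `instantiate` over a tiny type language. It is not the code in Types.fs: the `Ty` and `Scheme` shapes and the `fresh` callback are assumptions made for illustration.

```fsharp
type Ty =
    | TVar of string
    | TCon of string
    | TFun of Ty * Ty

type Scheme = { Quantified: string list; Body: Ty }

/// Free type variables of a type.
let rec ftv (ty: Ty) : Set<string> =
    match ty with
    | TVar v -> Set.singleton v
    | TCon _ -> Set.empty
    | TFun (a, b) -> Set.union (ftv a) (ftv b)

/// Quantify every variable of `ty` that is not free in the environment.
let generalize (envFtv: Set<string>) (ty: Ty) : Scheme =
    { Quantified = Set.difference (ftv ty) envFtv |> Set.toList
      Body = ty }

/// Replace each quantified variable with a fresh one supplied by `fresh`.
let instantiate (fresh: unit -> string) (s: Scheme) : Ty =
    let mapping =
        s.Quantified |> List.map (fun q -> q, TVar (fresh ())) |> Map.ofList
    let rec subst (ty: Ty) : Ty =
        match ty with
        | TVar v ->
            match Map.tryFind v mapping with
            | Some t -> t
            | None -> ty
        | TCon _ -> ty
        | TFun (a, b) -> TFun (subst a, subst b)
    subst s.Body
```

Variables that are free in the environment stay untouched by both operations, which is what keeps let-polymorphism sound.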
TypedAST.fs — typed AST
A parallel AST where every node has an Id : ExprId, a Type : TypeExpr,
and an Expr : TypedExprKind. Declarations carry a TypeScheme alongside
the body. TypedModule also tracks the final Env and a
Dispatch : Map<ExprId, DispatchInfo> (the trait dispatch table, populated
during inference).
HMInfer.fs — Algorithm W
Walks the untyped LLModule producing a TypedModule. Key operations:
- unify : TypeExpr -> TypeExpr -> Result<Subst, LLError>
- inferExpr : Env -> InferState -> Expr -> Subst * TypeExpr * TypedExpr
- inferDecl : Env -> InferState -> Decl -> bool -> (TypedDecl * bool) * Env
Generalizes at every let and fn. Emits E001, E002, E004, E005, E006
(reserved), E008.
See 05-hm-inference.
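A minimal unification sketch may help when reading the signatures above. It follows the flex/rigid convention from Types.fs (names starting with `$` are unification variables, everything else is rigid), but the `Ty` shape, the string errors, and the accumulator-style signature are assumptions of this sketch and differ from the real `unify`.

```fsharp
type Ty =
    | TVar of string // "$0", "$1", ... are flex; other names are rigid
    | TCon of string
    | TFun of Ty * Ty

type Subst = Map<string, Ty>

let isFlex (v: string) = v.StartsWith("$", System.StringComparison.Ordinal)

/// Apply a substitution, following chains of bound flex variables.
let rec apply (s: Subst) (ty: Ty) : Ty =
    match ty with
    | TVar v ->
        match Map.tryFind v s with
        | Some t -> apply s t
        | None -> ty
    | TCon _ -> ty
    | TFun (a, b) -> TFun (apply s a, apply s b)

let rec occurs (v: string) (ty: Ty) : bool =
    match ty with
    | TVar w -> v = w
    | TCon _ -> false
    | TFun (a, b) -> occurs v a || occurs v b

/// Bind a flex variable, rejecting infinite types via the occurs check.
let bind (s: Subst) (v: string) (t: Ty) : Result<Subst, string> =
    if occurs v t then Error (sprintf "infinite type: %s occurs in %A" v t)
    else Ok (Map.add v t s)

let rec unify (s: Subst) (a: Ty) (b: Ty) : Result<Subst, string> =
    match apply s a, apply s b with
    | x, y when x = y -> Ok s
    | TVar v, t when isFlex v -> bind s v t
    | t, TVar v when isFlex v -> bind s v t
    | TFun (a1, r1), TFun (a2, r2) ->
        unify s a1 a2 |> Result.bind (fun s1 -> unify s1 r1 r2)
    | x, y -> Error (sprintf "cannot unify %A with %A" x y)
```

Only flex variables are ever added to the substitution, which mirrors the "flex-var-only" `Subst` described in Types.fs; a rigid variable unifies only with itself.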
Codegen.fs — F# emitter
Walks the TypedModule producing a big string of F# source. Each
TypedExprKind case has a corresponding emit rule. Binary operators are
detected as TEApp(TEApp(TEVar op, a), b) and rendered infix. A bare main
declaration with no params becomes [<EntryPoint>] let main (argv: string[]) = ... 0.
See 06-codegen.
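The binary-operator rule described above can be sketched as a single pattern in the emitter. The `TExpr` type here is a hypothetical cut-down of `TypedExprKind`, and the operator set is an illustrative assumption.

```fsharp
// Hypothetical cut-down typed expression; the real TypedExprKind has more cases.
type TExpr =
    | TEVar of string
    | TEInt of int
    | TEApp of TExpr * TExpr

// Operators rendered infix in this sketch (assumed list, not the real one).
let infixOps = set [ "+"; "-"; "*"; "/"; "<"; ">" ]

let rec emitExpr (e: TExpr) : string =
    match e with
    // curried application of a known operator: (op a) b  becomes  (a op b)
    | TEApp (TEApp (TEVar op, a), b) when Set.contains op infixOps ->
        sprintf "(%s %s %s)" (emitExpr a) op (emitExpr b)
    // any other application stays in prefix form
    | TEApp (f, x) -> sprintf "(%s %s)" (emitExpr f) (emitExpr x)
    | TEVar name -> name
    | TEInt n -> string n
```

The operator case must come before the generic `TEApp` case, otherwise the generic rule would match first and the operator would be emitted in prefix position.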
CodegenTS.fs — TypeScript emitter
Emits TypeScript source. Sum types become discriminated unions with a _tag field. Curried functions emit as nested arrow functions. See 11-multi-target-codegen.
CodegenPy.fs — Python emitter
Emits Python source. Sum types become @dataclass classes under a Union alias. Curried functions emit as nested def. Pattern matching emits as ternary chains (Python match is not an expression).
CodegenJava.fs — Java 21 emitter
Emits Java 21 source. Sum types become sealed interface + record hierarchies. Functions emit as static methods. See 11-multi-target-codegen.
CodegenCSharp.fs — C# emitter
Emits compile-safe C# source skeletons for all typed declarations. Current focus is stable emission contract and buildability.
CodegenLLVM.fs — LLVM IR emitter
Emits typed LLVM IR with explicit control-flow lowering (if/match) plus a minimal heap-node runtime (__ll_alloc) for ADT/list/tuple representation. The backend is still intentionally subset-oriented (for example full trait parity remains pending).
Platform.fs — target/SDK registry
Defines canonical target names, aliases, output extensions, and built-in Platform.*.SDK metadata.
Manifest.fs — TOML subset parser (Phase 8)
Hand-written parser for lll.toml project manifests. Supports [table] headers,
key = "value" strings, and key = ["a","b"] string arrays. No NuGet
dependencies. Entry point: parseManifest : string -> Result<LLManifest, string>.
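A hand-written parser for this subset can stay very small. The sketch below is not the Manifest.fs implementation: it returns a flat `Map` keyed by `table.key` instead of an `LLManifest`, does not handle escapes or commas inside strings, and rejects anything outside the three supported forms. All names in it are hypothetical.

```fsharp
type TomlValue =
    | Str of string
    | StrArray of string list

let unquote (s: string) : Result<string, string> =
    let t = s.Trim()
    if t.Length >= 2 && t.StartsWith("\"") && t.EndsWith("\"") then
        Ok (t.Substring(1, t.Length - 2))
    else
        Error (sprintf "expected a quoted string, got: %s" t)

let parseValue (raw: string) : Result<TomlValue, string> =
    let v = raw.Trim()
    if v.StartsWith("[") && v.EndsWith("]") then
        let inner = v.Substring(1, v.Length - 2).Trim()
        if inner = "" then
            Ok (StrArray [])
        else
            // naive split: a comma inside a string would break this
            inner.Split(',')
            |> Array.toList
            |> List.map unquote
            |> List.fold (fun acc item ->
                match acc, item with
                | Ok xs, Ok x -> Ok (x :: xs)
                | Error e, _ | _, Error e -> Error e) (Ok [])
            |> Result.map (List.rev >> StrArray)
    else
        unquote v |> Result.map Str

/// Parse [table] headers, key = "value", and key = ["a", "b"] lines.
let parseTomlSubset (text: string) : Result<Map<string, TomlValue>, string> =
    let step (state: Result<string * Map<string, TomlValue>, string>) (line: string) =
        match state with
        | Error _ -> state
        | Ok (table, acc) ->
            let l = line.Trim()
            if l = "" || l.StartsWith("#") then
                state
            elif l.StartsWith("[") && l.EndsWith("]") then
                Ok (l.Substring(1, l.Length - 2).Trim(), acc)
            else
                match l.IndexOf('=') with
                | -1 -> Error (sprintf "expected key = value, got: %s" l)
                | i ->
                    let key = l.Substring(0, i).Trim()
                    let fullKey = if table = "" then key else table + "." + key
                    parseValue (l.Substring(i + 1))
                    |> Result.map (fun v -> (table, Map.add fullKey v acc))
    text.Split('\n')
    |> Array.fold step (Ok ("", Map.empty))
    |> Result.map snd
```

Threading the current table name through the fold is what lets a plain line-by-line scan handle `[table]` sections without a separate tokenizer.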
ProjectLoader.fs — multi-file project driver (Phase 8)
- Calls Manifest.parseManifest.
- Globs src/**/*.lll using Directory.GetFiles.
- Parses each file to extract its module header and import list.
- Validates module paths against file locations (E020).
- Topological sort via Kahn's algorithm (E024 on cycle).
- Returns LLProject { Manifest; RootDir; Files: LoadedFile list }.
Entry point: loadProject : string -> Result<LLProject, LLError list>.
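The topological sort step can be sketched with Kahn's algorithm over a module-to-imports map. This is an illustration under assumptions, not the loader's code: modules are plain strings, imports that do not name a project module are ignored, and a cycle is reported as the list of stuck modules rather than as an E024 `LLError`.

```fsharp
/// Kahn's algorithm: returns modules in dependency-first order,
/// or Error with the modules that could not be ordered (a cycle).
let topoSort (deps: Map<string, string list>) : Result<string list, string list> =
    // Only imports that name a module in this project count as edges.
    let edges =
        deps
        |> Map.map (fun _ imports ->
            imports |> List.filter (fun i -> Map.containsKey i deps) |> List.distinct)
    // initial in-degree = number of project-local imports a module waits on
    let initial = edges |> Map.map (fun _ imports -> List.length imports)
    let rec loop (queue: string list) (inDegree: Map<string, int>) (sorted: string list) =
        match queue with
        | [] ->
            if List.length sorted = Map.count deps then
                Ok (List.rev sorted)
            else
                Error (inDegree |> Map.filter (fun _ d -> d > 0) |> Map.toList |> List.map fst)
        | m :: rest ->
            // every module that imports m now has one fewer unmet dependency
            let dependents =
                edges
                |> Map.filter (fun _ imports -> List.contains m imports)
                |> Map.toList
                |> List.map fst
            let inDegree' =
                dependents
                |> List.fold (fun acc d -> Map.add d (Map.find d acc - 1) acc) inDegree
            let newlyReady = dependents |> List.filter (fun d -> Map.find d inDegree' = 0)
            loop (rest @ newlyReady) inDegree' (m :: sorted)
    let start = initial |> Map.filter (fun _ d -> d = 0) |> Map.toList |> List.map fst
    loop start initial []
```

If the queue drains before every module has been emitted, the remaining modules all still wait on each other, which is exactly the cycle condition.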
Compiler.fs — pipeline glue
Entry points:
- compile : string -> Result<string, LLError list> — single-file pipeline.
- compileProjectToModulesForTarget : Target -> LLProject -> Result<TypedModule list, LLError list> — multi-file front-end pass for a specific target, used for target-specific external validation.
- compileProject : LLProject -> Result<string, LLError list> — multi-file pipeline.
lllcself/src/Mcp.lll — self-hosted MCP server (Phase 9)
The current MCP server is implemented in ll-lang, not in F#:
- lllcself/src/Mcp.lll implements JSON-RPC parsing/dispatch.
- lllcself/src/Main.lll routes the mcp subcommand.
- src/LLLangTool/Program.fs forwards lllc mcp through cmdRunSelf ["mcp"].
Current tool inventory (28), grouped:
| Group | Tools |
|---|---|
| Core compile/check | compile_source, check_source, compile_file, check_file |
| Diagnostics & repair | diagnose_source, diagnose_file, explain_error, fix_suggest, apply_fix_preview |
| Formatting & AST | format_source, format_file, parse_source, typed_ast |
| Project graph/build | project_graph, check_project, build_project |
| Symbol navigation | symbols, definition, references |
| Dependency helpers | mod_add, mod_tidy, mod_why |
| Test helpers | test_list, test_run |
| Utility surface | stdlib_search, list_errors, lookup_error, list_targets |
For implementation details and contract examples, see
docs/compiler-dev/10-mcp-server.md.
Client config (add to ~/.config/claude/mcp.json or Cursor settings):
{
  "mcpServers": {
    "ll-lang": { "command": "lllc", "args": ["mcp"] }
  }
}
src/LLLangTool/Program.fs — CLI
The lllc driver. Seven commands:
- build <file.lll> — single-file mode (default target: F#).
- build --target ts|py|java <file.lll> — single-file, named target.
- build [dir] — project mode (reads lll.toml, writes bin/<name>.fs; multi-target via [platform] use).
- run <file.lll> — canonical run path: resolves imports, compiles to temp multi-file F# project (Prelude.fs + modules + .fsproj), then shells out to dotnet run --project ....
- check <file.lll> — lex → parse → elaborate → infer without emitting output. Fast type-check.
- new <name> — scaffold project directory structure.
- install — sync source-based dependencies declared in lll.toml into vendor/ and rewrite ll.sum.
- mod tidy|add|why — dependency management helpers (vendor/ + ll.sum workflow).
- mcp — launch MCP stdio server (blocks until stdin closes).
F# compile order
F# compiles files in the order of the <Compile Include="..."> entries
in the .fsproj, and a file can only reference definitions from files
listed before it. The current order is therefore significant:
<Compile Include="Token.fs" />
<Compile Include="Lexer.fs" />
<Compile Include="AST.fs" />
<Compile Include="Parser.fs" />
<Compile Include="Elaborator.fs" />
<Compile Include="Types.fs" />
<Compile Include="TypedAST.fs" />
<Compile Include="HMInfer.fs" />
<Compile Include="Codegen.fs" />
<Compile Include="CodegenTS.fs" />
<Compile Include="CodegenPy.fs" />
<Compile Include="CodegenJava.fs" />
<Compile Include="CodegenCSharp.fs" />
<Compile Include="CodegenLLVM.fs" />
<Compile Include="Platform.fs" />
<Compile Include="Manifest.fs" />
<Compile Include="ProjectLoader.fs" />
<Compile Include="Compiler.fs" />
Dependency chain:
- Token.fs → Lexer.fs
- AST.fs is independent; Parser.fs depends on Token.fs + AST.fs
- Elaborator.fs depends on AST.fs only
- Types.fs depends on AST.fs and (for fromElaboratorEnv) on Elaborator.fs
- TypedAST.fs depends on AST.fs + Types.fs
- HMInfer.fs depends on all of the above
- Codegen.fs, CodegenTS.fs, CodegenPy.fs, CodegenJava.fs, CodegenCSharp.fs, CodegenLLVM.fs each depend on AST.fs + Types.fs + TypedAST.fs. Do not open more than one in Compiler.fs: they all export emit, so a later open would shadow the earlier ones. Use fully-qualified names.
- Platform.fs is independent metadata/target mapping used by Compiler.fs and tooling.
- Manifest.fs depends only on System (no compiler module deps)
- ProjectLoader.fs depends on Manifest.fs + Elaborator.fs + Lexer.fs + Parser.fs
- Compiler.fs glues everything together; dispatches to the right codegen via Target DU
If you add a new file, slot it in where its dependencies are satisfied.
Adding it at the end will usually work unless it is depended on by
Codegen.fs or Compiler.fs.
Multi-target dispatch
Compiler.fs exposes compileTarget:
type Target = FSharp | TypeScript | Python | Java | CSharp | LLVM

let compileTarget (target: Target) (src: string) : Result<string, LLError list> =
    match target with
    | FSharp     -> compile src          // Codegen.emit
    | TypeScript -> compileToTS src      // CodegenTS.emit
    | Python     -> compileToPy src      // CodegenPy.emit
    | Java       -> compileToJava src    // CodegenJava.emit
    | CSharp     -> compileToCSharp src  // CodegenCSharp.emit
    | LLVM       -> compileToLLVM src    // CodegenLLVM.emit
The front-end (lex → parse → elaborate → infer) runs once per target in multi-target project builds because external declarations are validated through target-specific mappings. compileProject remains the backward-compatible F# wrapper.
Output layout:
- Single target: bin/<name>.fs (backward-compatible)
- Multi-target: bin/fsharp/<name>.fs, bin/typescript/<name>.ts, etc.
See 11-multi-target-codegen for per-backend details.
Package system
Source-based dependencies. lll.toml declares them:
[deps]
std = { path = "../stdlib" }
json = "https://github.com/user/ll-json#v1.0.0"
lllc install and lllc mod tidy sync dependency sources into vendor/ and rewrite ll.sum. ProjectLoader.fs resolves imports by searching src/ first, then vendor/*/src/.
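The search order can be sketched as a list of candidate paths tried in sequence. This is an illustration only: the mapping from a dotted module path to a file path (dots become directory separators, plus a `.lll` extension) is an assumption, and `candidatePaths` and `resolveImport` are hypothetical names, not functions from ProjectLoader.fs.

```fsharp
open System.IO

/// Candidate files for an import: the project's own src/ first,
/// then the src/ directory of each vendored dependency, in the given order.
let candidatePaths (rootDir: string) (vendored: string list) (modulePath: string) : string list =
    // ASSUMPTION: "Std.List" maps to Std/List.lll
    let rel = modulePath.Replace('.', Path.DirectorySeparatorChar) + ".lll"
    let local = Path.Combine(rootDir, "src", rel)
    let fromDeps =
        vendored |> List.map (fun dep -> Path.Combine(rootDir, "vendor", dep, "src", rel))
    local :: fromDeps

/// First candidate that exists according to `exists` (File.Exists in real use).
let resolveImport (exists: string -> bool) (rootDir: string) (vendored: string list) (modulePath: string) : string option =
    candidatePaths rootDir vendored modulePath |> List.tryFind exists
```

Passing the existence check as a function keeps the lookup pure, so the search order can be exercised without touching the file system; in real use `File.Exists` would be supplied.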
Platform SDKs are a future extension ([sdk] table in a dep's lll.toml). See docs/superpowers/specs/2026-04-10-ll-lang-platform-sdk.md for the design.
Parser and Inference Stack
- The parser front-end uses FParsec (FParsecParser.fs) as the primary parser.
- A legacy handwritten token/parser path is still present for parity diagnostics and migration fallback.
- Hindley-Milner inference remains handwritten in HMInfer.fs; only parsing is library-backed.
- Error formatting and ll-lang diagnostics (E001..E006) are still owned by the compiler pipeline, not delegated to generated parser output.