ll-lang Compiler Developer Guide
For contributors working on the ll-lang compiler itself. This guide assumes you can read F#. The compiler is about 3.8k lines across the source tree; its Hindley-Milner (HM) inference is handwritten, and its parser front-end is built on FParsec.
Contents
- Architecture overview — pipeline, project layout, F# compile order
- Lexer — tokens, INDENT/DEDENT synthesis, position tracking
- Parser — recursive descent, expression precedence, quirks
- Elaborator — declared-type checking, E001-E005, exhaustiveness
- H-M inference — Algorithm W, Subst, unify, generalize/instantiate, occurs check
- Code generation — F# emission, [<EntryPoint>], temp-project dotnet run execution path
- Testing — xUnit layout, helpers, corpus drivers
- Adding an error code — end-to-end walkthrough
- Self-hosting roadmap — historical Phase 7 record
- MCP server — embedded compiler tooling for LLM clients
- Multi-target codegen — backend contracts and target semantics
- v2 language architecture — target design for pure ll-lang core
- v2 implementation roadmap — tracked execution plan with done/not-done checklists
- v2 canonical compiler boundaries — subsystem ownership map and migration targets
- v2 pass contracts — explicit input/output contracts for canonical compiler phases
- v2 project system execution — implementer-facing breakdown of the canonical manifest/resolver/lock/vendor lifecycle
- v2 stdlib foundation execution — implementer-facing breakdown of the self-hosting foundation stdlib, including post-milestone clarification questions
- v2 compiler boundaries execution — implementer-facing breakdown of canonical subsystem ownership and pass-boundary enforcement
- v2 syntax ergonomics execution — implementer-facing breakdown of operator, precedence, and compiler-heavy syntax cleanup
- v2 self-host transition execution — implementer-facing breakdown of promoting ll-lang to canonical compiler path
- v2 LLM operating system execution — implementer-facing breakdown of MCP, prompt packs, and machine-readable authoring workflows
- v2 benchmarks and release gates execution — implementer-facing breakdown of evidence, benchmarks, and release-blocking gates
Repository layout
ll-lang/
├── spec/
│ ├── grammar.ebnf formal grammar
│ ├── type-system.md H-M rules, tag system, phantom types
│ ├── error-codes.md E001..E008 catalog
│ └── examples/
│ ├── valid/ corpus of working programs
│ └── invalid/ programs with expected error codes
├── src/
│ ├── LLLangCompiler/ compiler library (F#)
│ │ ├── Token.fs Tok type
│ │ ├── Lexer.fs tokenizer with layout
│ │ ├── AST.fs untyped surface AST
│ │ ├── Parser.fs legacy recursive-descent parser (parity/fallback)
│ │ ├── FParsecParser.fs strict parser (primary path)
│ │ ├── Elaborator.fs name resolution, E001-E005, exhaustiveness
│ │ ├── Types.fs TypeScheme, Subst, generalize, instantiate
│ │ ├── TypedAST.fs typed AST after inference
│ │ ├── HMInfer.fs Algorithm W, unify, trait dispatch
│ │ ├── Codegen.fs F# source emitter
│ │ ├── Compiler.fs pipeline entry point
│ │ └── LLLangCompiler.fsproj
│ └── LLLangTool/ lllc CLI (build/run commands)
│ ├── Program.fs
│ └── LLLangTool.fsproj
├── tests/
│ └── LLLangTests/ xUnit suite (see CI for current count)
│ ├── LexerTests.fs RealLexerTests.fs
│ ├── ParserTests.fs ArithmeticParserTests.fs
│ │ TypeParserTests.fs FnParserTests.fs
│ │ ExprParserTests.fs ModuleParserTests.fs
│ ├── ElaboratorTests.fs ElaboratorRealTests.fs
│ ├── HMInferTests.fs HMInferRealTests.fs
│ ├── CodegenTests.fs CodegenRealTests.fs
│ ├── PipelineRealTests.fs
│ ├── StdlibTests.fs
│ └── BootstrapCompilerTests.fs bootstrap compiler corpus
├── docs/ user guide + compiler-dev guide (this tree)
└── README.md
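The layout implies a staged pipeline (lex, parse, infer, emit) that Compiler.fs wires together. The following is a toy model, not the compiler's real code, of how such stages can compose when each one returns Result<_, LLError list>: the first failing stage ends the run and its errors flow out unchanged, with nothing thrown. The stage functions lex, parse, infer and emit, and the simplified LLError, Tok, Ast and TypedAst types, are hypothetical stand-ins rather than the real signatures, and elaboration is left out for brevity.

```fsharp
// Hypothetical stand-ins for the real types in Token.fs, AST.fs and TypedAST.fs.
type LLError = { Code: string; Message: string }
type Tok = Word of string
type Ast = Program of string list
type TypedAst = TypedProgram of (string * string) list

// Each stage has the shape 'a -> Result<'b, LLError list> and never throws.
let lex (src: string) : Result<Tok list, LLError list> =
    src.Split([| ' '; '\n' |], System.StringSplitOptions.RemoveEmptyEntries)
    |> Array.map Word
    |> Array.toList
    |> Ok

let parse (toks: Tok list) : Result<Ast, LLError list> =
    if List.isEmpty toks then
        // "EXXX" is a placeholder; real codes are catalogued in spec/error-codes.md.
        Error [ { Code = "EXXX"; Message = "empty program" } ]
    else
        Ok (Program (toks |> List.map (fun (Word w) -> w)))

// Toy "inference": every identifier is assigned type Int.
let infer (Program names) : Result<TypedAst, LLError list> =
    Ok (TypedProgram (names |> List.map (fun n -> n, "Int")))

// Toy emission: one F# comment line per typed binding.
let emit (TypedProgram bindings) : Result<string, LLError list> =
    bindings
    |> List.map (fun (name, ty) -> sprintf "// %s : %s" name ty)
    |> String.concat "\n"
    |> Ok

// Result.bind stops at the first failing stage and passes its errors through.
let compile (src: string) : Result<string, LLError list> =
    lex src
    |> Result.bind parse
    |> Result.bind infer
    |> Result.bind emit
```

With this shape, adding a stage (for example an elaborator between parse and infer) is one more Result.bind line, and no stage needs a try/with.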
Build and test
dotnet build # all three projects
dotnet test # run xUnit suite (see CI for current count)
The compiler library targets net10.0 with LangVersion=preview and
Nullable=enable, and depends on FParsec for strict parsing. Tests
depend on xunit 2.6.3 and Microsoft.NET.Test.Sdk 17.8.0.
Conventions
- Parser stack: strict mode uses FParsecParser; the legacy recursive-descent parser is retained for parity and fallback diagnostics.
- Operator defaults (self-host path): canonical baseline fixities are declared in stdlib/src/Operators.lll (Std.Operators) and consumed by the table-driven self-host parser/resolver flow.
- No mutable global state. Inference uses a small InferState record passed through the tree walk.
- Errors are collected, not raised. Compiler functions return Result<T, LLError list> and never throw on a type error.
- Examples are the source of truth. Every feature must have a valid corpus entry in spec/examples/valid/, and each error code must have an invalid corpus entry in spec/examples/invalid/.
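A table-driven parser keeps precedence and associativity in data instead of in the shape of the grammar. As a minimal sketch, assuming the fixity declarations can be loaded into a map from operator to (precedence, associativity), precedence climbing over that table looks roughly like this. The operators, levels, and the Token and Expr types are hypothetical, and the sketch is F# for illustration, not the self-host parser's ll-lang source; the real defaults are whatever Operators.lll declares.

```fsharp
type Assoc =
    | Left
    | Right

// Hypothetical fixity table: operator -> (precedence, associativity).
// The real baseline table is declared in stdlib/src/Operators.lll.
let fixities : Map<string, int * Assoc> =
    Map.ofList [ "+", (6, Left); "-", (6, Left); "*", (7, Left); "^", (8, Right) ]

type Token =
    | TNum of int
    | TOp of string

type Expr =
    | Num of int
    | BinOp of string * Expr * Expr

// Parse one operand, then absorb operators whose precedence is >= minPrec.
let rec parseExpr (minPrec: int) (tokens: Token list) : Result<Expr * Token list, string> =
    match tokens with
    | TNum n :: rest -> climb minPrec (Num n) rest
    | _ -> Error "expected a number"

and climb (minPrec: int) (lhs: Expr) (tokens: Token list) : Result<Expr * Token list, string> =
    // Look up the next operator in the table; unknown operators end the loop.
    let next =
        match tokens with
        | TOp op :: rest -> Map.tryFind op fixities |> Option.map (fun fx -> op, fx, rest)
        | _ -> None
    match next with
    | Some (op, (prec, assoc), rest) when prec >= minPrec ->
        // Left-assoc: the right operand may only contain tighter operators.
        // Right-assoc: it may also contain operators of the same level.
        let rhsMin = match assoc with Left -> prec + 1 | Right -> prec
        parseExpr rhsMin rest
        |> Result.bind (fun (rhs, rest') -> climb minPrec (BinOp (op, lhs, rhs)) rest')
    | _ -> Ok (lhs, tokens)

// Top-level entry: every token must be consumed.
let parse (tokens: Token list) : Result<Expr, string> =
    parseExpr 0 tokens
    |> Result.bind (fun (e, rest) ->
        if List.isEmpty rest then Ok e else Error (sprintf "unexpected tokens: %A" rest))
```

Changing a default fixity then means editing one table entry rather than restructuring grammar rules, which is the property the table-driven flow relies on.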
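To illustrate the no-globals convention, here is a sketch in which the fresh-variable counter and the substitution are fields of an immutable state record that every inference step receives and returns. The field names NextVar and Subst, the Ty type, and the use of string errors are assumptions made for this example; the real InferState, Subst and unify live in Types.fs and HMInfer.fs and may differ. The unifier includes the occurs check, which rejects infinite types such as t0 = t0 -> Int.

```fsharp
// Hypothetical type representation; the real one lives in Types.fs.
type Ty =
    | TVar of int
    | TCon of string
    | TFun of Ty * Ty

// Hypothetical inference state: a fresh-variable counter plus the current
// substitution. It is immutable; each step returns an updated copy.
type InferState = { NextVar: int; Subst: Map<int, Ty> }

let emptyState = { NextVar = 0; Subst = Map.empty }

// Allocate a fresh type variable and return it with the advanced state.
let fresh (st: InferState) : Ty * InferState =
    TVar st.NextVar, { st with NextVar = st.NextVar + 1 }

// Apply the substitution, following variables that are already bound.
let rec apply (st: InferState) (t: Ty) : Ty =
    match t with
    | TVar v -> Map.tryFind v st.Subst |> Option.map (apply st) |> Option.defaultValue t
    | TCon _ -> t
    | TFun (a, b) -> TFun (apply st a, apply st b)

// Occurs check: does variable v appear inside t (after substitution)?
let rec occurs (st: InferState) (v: int) (t: Ty) : bool =
    match apply st t with
    | TVar w -> v = w
    | TCon _ -> false
    | TFun (a, b) -> occurs st v a || occurs st v b

// Unify two types, threading the state instead of mutating a global.
let rec unify (st: InferState) (t1: Ty) (t2: Ty) : Result<InferState, string> =
    match apply st t1, apply st t2 with
    | TVar a, TVar b when a = b -> Ok st
    | TVar v, t
    | t, TVar v ->
        if occurs st v t then Error (sprintf "occurs check failed: t%d in %A" v t)
        else Ok { st with Subst = Map.add v t st.Subst }
    | TCon a, TCon b when a = b -> Ok st
    | TFun (a1, r1), TFun (a2, r2) ->
        unify st a1 a2 |> Result.bind (fun st' -> unify st' r1 r2)
    | a, b -> Error (sprintf "cannot unify %A with %A" a b)
```

Because no step mutates shared state, an older InferState value stays valid after later steps, which makes individual unifications easy to test in isolation.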
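Because errors are values, a checker can keep going after the first failure and report every problem in one run. Below is a minimal sketch of that accumulation over per-declaration results. It assumes a simplified LLError record; the Code, Line and Message fields, the "EXXX" placeholder code, and the lowercase-name rule in checkDecl are all hypothetical, and real error codes are catalogued in spec/error-codes.md.

```fsharp
// Hypothetical, simplified error record; the real LLError may carry more.
type LLError = { Code: string; Line: int; Message: string }

// Merge per-item results, keeping every error rather than stopping at the first.
let collect (results: Result<'a, LLError list> list) : Result<'a list, LLError list> =
    let step acc r =
        match acc, r with
        | Ok xs, Ok x -> Ok (x :: xs)
        | Ok _, Error es -> Error es
        | Error es, Ok _ -> Error es
        | Error es1, Error es2 -> Error (es1 @ es2)
    results
    |> List.fold step (Ok [])
    |> Result.map List.rev

// Toy per-declaration check: names must start with a lowercase letter.
let checkDecl (line: int) (name: string) : Result<string, LLError list> =
    if name.Length > 0 && System.Char.IsLower(name.[0]) then
        Ok name
    else
        // "EXXX" is a placeholder, not a real catalogued code.
        Error [ { Code = "EXXX"; Line = line; Message = sprintf "bad name '%s'" name } ]
```

Unlike chaining with Result.bind, which stops at the first failure, this fold visits every item, so a file with two bad declarations yields both errors in source order.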