
ll-lang Compiler Developer Guide

For contributors working on the ll-lang compiler itself. Assumes you know F# well enough to read it. The compiler is ~3.8k lines across the source tree; its HM inference is handwritten, while the parser front-end is built on FParsec.

Contents

  1. Architecture overview — pipeline, project layout, F# compile order
  2. Lexer — tokens, INDENT/DEDENT synthesis, position tracking
  3. Parser — recursive descent, expression precedence, quirks
  4. Elaborator — declared-type checking, E001-E005, exhaustiveness
  5. H-M inference — Algorithm W, Subst, unify, generalize/instantiate, occurs check
  6. Code generation — F# emission, [<EntryPoint>], temp-project dotnet run execution path
  7. Testing — xUnit layout, helpers, corpus drivers
  8. Adding an error code — end-to-end walkthrough
  9. Self-hosting roadmap — historical Phase 7 record
  10. MCP server — embedded compiler tooling for LLM clients
  11. Multi-target codegen — backend contracts and target semantics
  12. v2 language architecture — target design for pure ll-lang core
  13. v2 implementation roadmap — tracked execution plan with done/not-done checklists
  14. v2 canonical compiler boundaries — subsystem ownership map and migration targets
  15. v2 pass contracts — explicit input/output contracts for canonical compiler phases
  16. v2 project system execution — implementer-facing breakdown of the canonical manifest/resolver/lock/vendor lifecycle
  17. v2 stdlib foundation execution — implementer-facing breakdown of the self-hosting foundation stdlib, including post-milestone clarification questions
  18. v2 compiler boundaries execution — implementer-facing breakdown of canonical subsystem ownership and pass-boundary enforcement
  19. v2 syntax ergonomics execution — implementer-facing breakdown of operator, precedence, and compiler-heavy syntax cleanup
  20. v2 self-host transition execution — implementer-facing breakdown of promoting ll-lang to canonical compiler path
  21. v2 llm operating system execution — implementer-facing breakdown of MCP, prompt packs, and machine-readable authoring workflows
  22. v2 benchmarks and release gates execution — implementer-facing breakdown of evidence, benchmarks, and release-blocking gates

Repository layout

ll-lang/
├── spec/
│   ├── grammar.ebnf              formal grammar
│   ├── type-system.md            H-M rules, tag system, phantom types
│   ├── error-codes.md            E001..E008 catalog
│   └── examples/
│       ├── valid/                corpus of working programs
│       └── invalid/              programs with expected error codes
├── src/
│   ├── LLLangCompiler/           compiler library (F#)
│   │   ├── Token.fs              Tok type
│   │   ├── Lexer.fs              tokenizer with layout
│   │   ├── AST.fs                untyped surface AST
│   │   ├── Parser.fs             recursive-descent parser
│   │   ├── FParsecParser.fs      strict parser (primary path)
│   │   ├── Elaborator.fs         name resolution, E001-E005, exhaustiveness
│   │   ├── Types.fs              TypeScheme, Subst, generalize, instantiate
│   │   ├── TypedAST.fs           typed AST after inference
│   │   ├── HMInfer.fs            Algorithm W, unify, trait dispatch
│   │   ├── Codegen.fs            F# source emitter
│   │   ├── Compiler.fs           pipeline entry point
│   │   └── LLLangCompiler.fsproj
│   └── LLLangTool/               lllc CLI (build/run commands)
│       ├── Program.fs
│       └── LLLangTool.fsproj
├── tests/
│   └── LLLangTests/              xUnit suite (see CI for current count)
│       ├── LexerTests.fs           RealLexerTests.fs
│       ├── ParserTests.fs          ArithmeticParserTests.fs
│       │                           TypeParserTests.fs   FnParserTests.fs
│       │                           ExprParserTests.fs   ModuleParserTests.fs
│       ├── ElaboratorTests.fs      ElaboratorRealTests.fs
│       ├── HMInferTests.fs         HMInferRealTests.fs
│       ├── CodegenTests.fs         CodegenRealTests.fs
│       ├── PipelineRealTests.fs
│       ├── StdlibTests.fs
│       └── BootstrapCompilerTests.fs   bootstrap compiler corpus
├── docs/                         user guide + compiler-dev guide (this tree)
└── README.md

Build and test

dotnet build                      # all three projects
dotnet test                       # run xUnit suite (see CI for current count)

The compiler library targets net10.0 with LangVersion=preview and Nullable=enable, and depends on FParsec for strict parsing. Tests depend on xunit 2.6.3 and Microsoft.NET.Test.Sdk 17.8.0.

Conventions

  • Parser stack: strict mode uses FParsecParser; the legacy recursive-descent parser is retained for parity checks and fallback diagnostics.
  • Operator defaults (self-host path): canonical baseline fixities are declared in stdlib/src/Operators.lll (Std.Operators) and consumed by the table-driven self-host parser/resolver flow.
  • No mutable global state. Inference uses a small InferState record passed through the tree walk.
  • Errors are collected, not raised. Compiler functions return Result<T, LLError list> and never throw on a type error.
  • Examples are the source of truth. Every feature must have an entry in the valid corpus (spec/examples/valid/), and every error code must have an entry in the invalid corpus (spec/examples/invalid/).
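
The "errors are collected, not raised" convention can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: the LLError record shape, the helper names map2, checkName and checkAll, and the placeholder code E000 (deliberately outside the E001..E008 catalog) are assumptions, not the compiler's actual definitions. The point is only that independent checks each return Result<T, LLError list> and their error lists are concatenated instead of stopping at the first failure.

```fsharp
// Hypothetical error shape; the real LLError type in the compiler may differ.
type LLError = { Code: string; Message: string }

// Combine two independent results, keeping the errors from both sides.
let map2 (f: 'a -> 'b -> 'c)
         (ra: Result<'a, LLError list>)
         (rb: Result<'b, LLError list>) : Result<'c, LLError list> =
    match ra, rb with
    | Ok a, Ok b -> Ok (f a b)
    | Error ea, Error eb -> Error (ea @ eb)
    | Error e, _ | _, Error e -> Error e

// Toy check: a name is accepted when it is non-empty and starts with a
// lowercase letter. "E000" is a placeholder code, not a real catalog entry.
let checkName (name: string) : Result<string, LLError list> =
    if name <> "" && System.Char.IsLower(name.[0]) then
        Ok name
    else
        Error [ { Code = "E000"; Message = sprintf "invalid name '%s'" name } ]

// Check every name and report all failures at once, in source order.
let checkAll (names: string list) : Result<string list, LLError list> =
    List.foldBack
        (fun name acc -> map2 (fun n ns -> n :: ns) (checkName name) acc)
        names
        (Ok [])
```

With these definitions, checkAll ["foo"; "Bar"; ""] yields an Error carrying two LLError values (one for "Bar", one for the empty name), while checkAll ["a"; "b"] yields Ok ["a"; "b"]. No exception is thrown in either case, which is the property the convention asks for.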