v2 Language Architecture
Status: planned
Audience: compiler contributors and implementation agents
Purpose: define the target shape of ll-lang after 1.x, so the roadmap and spec work against one stable design.
Summary
v2 is the release where ll-lang stops being "a language with a self-hosted slice" and becomes "a language whose canonical compiler/toolchain is itself written in ll-lang".
The primary design rule is unchanged: minimize tokens without making semantics ambiguous. v2 does not chase maximal type-theory power. It chases the smallest language that can comfortably implement and extend its own compiler.
Product definition
v2 is complete when all of the following are true:
- The canonical compiler source, stdlib source, project resolver, and CLI logic live in ll-lang.
- F# remains only as a frozen bootstrap artifact under a clearly isolated `bootstrap/` path.
- New language/compiler features land in the self-hosted implementation first.
- The language core stays small, strict, and deterministic; LLM-specific leverage lives mainly in stdlib, MCP, docs, and tooling.
Architectural decisions
1. Core language stays small
The v2 core includes:
- modules, imports, exports, manifest-driven projects
- ADTs, records, tuples, lists, tags, units, phantom-state encodings
- Hindley-Milner inference with explicit annotation boundaries where needed
- pattern matching with exhaustiveness
- strict evaluation by default
- explicit laziness via stdlib-backed `Lazy`
- a fixed compact operator layer for pipeline/bind/choice/sequence
The v2 core explicitly excludes:
- user-defined operators
- macros in the core language
- exceptions, implicit effects, implicit coercions
- general HKT/typeclass machinery as a release blocker
- dual primary syntaxes
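The core list pairs strict evaluation with explicit, stdlib-backed laziness. A minimal way to picture that combination is a memoized thunk: nothing is deferred unless the programmer wraps it, and a wrapped computation runs at most once. This is a Python sketch under that assumption; `Lazy`, `force`, and `map` are hypothetical names, not the actual `Std.Lazy` API.

```python
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")

_UNSET = object()  # sentinel meaning "not evaluated yet"


class Lazy(Generic[A]):
    """Hypothetical explicit suspension: evaluated at most once, on first force()."""

    def __init__(self, thunk: Callable[[], A]) -> None:
        self._thunk = thunk
        self._value = _UNSET

    def force(self) -> A:
        # Evaluate on first demand, then cache the result for later forces.
        if self._value is _UNSET:
            self._value = self._thunk()
            self._thunk = None  # drop the closure so captured data can be freed
        return self._value

    def map(self, f: Callable[[A], B]) -> "Lazy[B]":
        # Mapping stays lazy: nothing runs until the result is forced.
        return Lazy(lambda: f(self.force()))


calls = []


def expensive() -> int:
    calls.append("run")
    return 21


lazy_answer = Lazy(expensive).map(lambda x: x * 2)
assert calls == []          # still suspended: strict code never evaluated it
print(lazy_answer.force())  # 42
print(lazy_answer.force())  # 42 again, served from the cache
print(calls)                # ['run']
```

Everything outside a `Lazy` wrapper is evaluated eagerly, so the points where deferral happens stay visible in the source.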
2. Type system strategy: HM first, specialized abstractions second
v2 keeps Hindley-Milner with rank-1 (prenex) polymorphism as the baseline. The implementation may add small bidirectional type-checking islands, but only where they reduce ambiguity or substantially improve diagnostics.
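First-order unification with an occurs check is the engine at the heart of an HM baseline. The Python sketch below shows only that piece, over a toy type language of variables and constructors such as `List` and `->`. The `TVar`/`TCon` representation and the helper names are assumptions for illustration. This is not ll-lang's checker, and let-generalization is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class TCon:
    name: str                      # e.g. "Int", "List", "->"
    args: Tuple["Type", ...] = ()


Type = Union[TVar, TCon]
Subst = Dict[str, Type]


def resolve(t: Type, s: Subst) -> Type:
    # Follow variable bindings until reaching an unbound variable or a constructor.
    while isinstance(t, TVar) and t.name in s:
        t = s[t.name]
    return t


def occurs(name: str, t: Type, s: Subst) -> bool:
    # True if variable `name` appears inside t (binding it would create an infinite type).
    t = resolve(t, s)
    if isinstance(t, TVar):
        return t.name == name
    return any(occurs(name, a, s) for a in t.args)


def unify(a: Type, b: Type, s: Subst) -> Subst:
    """Return s extended so that a and b become equal, or raise TypeError."""
    a, b = resolve(a, s), resolve(b, s)
    if isinstance(a, TVar):
        if isinstance(b, TVar) and a.name == b.name:
            return s
        if occurs(a.name, b, s):
            raise TypeError(f"infinite type: {a.name} occurs in {b}")
        return {**s, a.name: b}
    if isinstance(b, TVar):
        return unify(b, a, s)
    if a.name != b.name or len(a.args) != len(b.args):
        raise TypeError(f"cannot unify {a.name} with {b.name}")
    for x, y in zip(a.args, b.args):
        s = unify(x, y, s)
    return s


def apply(t: Type, s: Subst) -> Type:
    # Substitute all the way down so the result contains no bound variables.
    t = resolve(t, s)
    if isinstance(t, TVar):
        return t
    return TCon(t.name, tuple(apply(x, s) for x in t.args))


def fn(arg: Type, res: Type) -> TCon:
    return TCon("->", (arg, res))


INT = TCon("Int")
a, b = TVar("a"), TVar("b")
# Unify (a -> List a) with (Int -> b): solves a := Int and b := List Int.
s = unify(fn(a, TCon("List", (a,))), fn(INT, b), {})
print(apply(b, s))  # TCon(name='List', args=(TCon(name='Int', args=()),))
```

Because unification returns a single most general solution, rank-1 inference stays predictable. That predictability is what makes annotation-free code and stable diagnostics possible.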
The language is not required to support full HKT/typeclass inference in v2.
Instead, the self-hosting ergonomics target is:
- compact `State`
- compact `Result`
- compact `Parsec`
- compact `Lazy`
- predictable constructor/function passing
- predictable trailing-lambda and zero-arg call semantics
Monadic style is therefore a library and operator-layer concern, not proof that ll-lang must implement a general higher-kinded abstraction engine in v2.
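To illustrate why compact `Result` and `Parsec` do not need a higher-kinded engine, here is a minimal parser-combinator sketch in Python. Every combinator is an ordinary function over one concrete parser type. The names `satisfy`, `pure`, `bind`, `alt`, and `many` are hypothetical stand-ins for the bind/choice/sequence layer, not the actual `Std.Parsec` API. The consumption rule for choice mirrors Parsec's convention.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Union


@dataclass(frozen=True)
class Ok:
    value: Any
    pos: int


@dataclass(frozen=True)
class Err:
    expected: str
    pos: int


Result = Union[Ok, Err]
Parser = Callable[[str, int], Result]


def satisfy(pred: Callable[[str], bool], label: str) -> Parser:
    # Consume one character if it matches; otherwise fail without consuming.
    def p(src: str, pos: int) -> Result:
        if pos < len(src) and pred(src[pos]):
            return Ok(src[pos], pos + 1)
        return Err(label, pos)
    return p


def pure(v: Any) -> Parser:
    return lambda src, pos: Ok(v, pos)


def bind(p: Parser, f: Callable[[Any], Parser]) -> Parser:
    # Sequence: run p, pass its value to f to pick the next parser; errors short-circuit.
    def q(src: str, pos: int) -> Result:
        r = p(src, pos)
        return f(r.value)(src, r.pos) if isinstance(r, Ok) else r
    return q


def alt(p: Parser, q: Parser) -> Parser:
    # Ordered choice: try q only if p failed without consuming input.
    def r(src: str, pos: int) -> Result:
        res = p(src, pos)
        return q(src, pos) if isinstance(res, Err) and res.pos == pos else res
    return r


def many(p: Parser) -> Parser:
    # Zero or more repetitions; p must consume input whenever it succeeds.
    def q(src: str, pos: int) -> Result:
        out: List[Any] = []
        while True:
            r = p(src, pos)
            if isinstance(r, Err):
                return Ok(out, pos) if r.pos == pos else r
            out.append(r.value)
            pos = r.pos
    return q


digit = satisfy(str.isdigit, "digit")
number = bind(digit, lambda d: bind(many(digit), lambda ds: pure(int(d + "".join(ds)))))
minus = satisfy(lambda c: c == "-", "'-'")
signed = bind(alt(minus, pure("")), lambda s: bind(number, lambda n: pure(-n if s else n)))

print(signed("-42x", 0))  # Ok(value=-42, pos=3)
print(signed("x", 0))     # Err(expected='digit', pos=0)
```

The same shape (a concrete type plus `bind`/`alt`-style functions) carries over to `Result` and `State` without any abstraction over type constructors.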
3. Canonical self-hosting boundary
The implementation is split into three layers:
- Stage0 bootstrap: frozen F# compiler used only to build or recover the self-hosted compiler.
- Canonical self-hosted compiler: the implementation under active development. This is the source of truth.
- Runtime/platform shims: minimal per-backend helpers needed to run emitted programs and toolchain code.
The stage0 compiler is not removed from the repo in v2, but it is demoted from active implementation to bootstrap support.
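A frozen stage0 stays trustworthy only if it can still reproduce the canonical compiler. One common technique for checking this, not prescribed by this plan, is a bootstrap fixpoint test. Stage0 builds the self-hosted compiler (stage1), stage1 rebuilds itself (stage2), and stage2 rebuilds itself again (stage3). With deterministic codegen, stage2 and stage3 must be byte-identical. The Python sketch below models compiler binaries as plain values. `Artifact`, `toy_compile`, and `bootstrap_fixpoint` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Artifact:
    semantics: str  # the source this binary was built from (determines its behavior)
    code: str       # the emitted bytes, modeled as a string


def digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()[:12]


def toy_compile(compiler: Artifact, source: str) -> Artifact:
    # Output depends only on the compiler's behavior and the input source (deterministic).
    return Artifact(semantics=source, code=f"{digest(compiler.semantics)}:{digest(source)}")


def bootstrap_fixpoint(stage0: Artifact, source: str,
                       compile_with: Callable[[Artifact, str], Artifact]) -> bool:
    stage1 = compile_with(stage0, source)  # built by the frozen bootstrap compiler
    stage2 = compile_with(stage1, source)  # self-hosted compiler builds itself
    stage3 = compile_with(stage2, source)  # ...and once more
    # stage1 may differ byte-wise from stage2 (different emitter); stage2 and stage3 must not.
    return stage2.code == stage3.code


stage0 = Artifact(semantics="frozen F# stage0", code="<stage0 binary>")
print(bootstrap_fixpoint(stage0, "compiler source v2", toy_compile))  # True
```

A failing check points either to nondeterministic emission or to a self-hosted compiler that does not compile its own source consistently.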
4. Stdlib is part of the architecture, not an accessory
v2 is designed around three library tiers:
- Prelude: always in scope. Only the shortest, most universal building blocks belong here.
- Self-hosting foundation stdlib: `Std.List`, `Std.Maybe`, `Std.Result`, `Std.Map`, `Std.Str`, `Std.State`, `Std.Parsec`, `Std.Lazy`, `Std.Json`, `Std.Toml`, `Std.Test`.
- Platform/backend helpers: target-specific helpers and FFI shims that should not pollute the core language model.
The standard library API must be designed for compiler-heavy code, not for textbook completeness.
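Fresh-name generation during lowering is a typical compiler-heavy task that a compact `State` handles well. The Python sketch below models a state computation as a function from a state to a (value, new state) pair. `State`, `bind`, `map`, `run`, and `fresh` are hypothetical names, not the `Std.State` API.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Tuple, TypeVar

S = TypeVar("S")
A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class State(Generic[S, A]):
    step: Callable[[S], Tuple[A, S]]  # threads the state explicitly; nothing is mutated

    def bind(self, f: Callable[[A], "State[S, B]"]) -> "State[S, B]":
        def go(s: S) -> Tuple[B, S]:
            a, s1 = self.step(s)
            return f(a).step(s1)
        return State(go)

    def map(self, f: Callable[[A], B]) -> "State[S, B]":
        return self.bind(lambda a: State(lambda s: (f(a), s)))

    def run(self, initial: S) -> Tuple[A, S]:
        return self.step(initial)


def fresh(prefix: str) -> "State[int, str]":
    # Return a name built from the current counter, then increment the counter.
    return State(lambda n: (f"{prefix}{n}", n + 1))


# Allocate three temporaries, as a lowering pass might.
temps = fresh("t").bind(lambda x: fresh("t").bind(lambda y: fresh("t").map(lambda z: [x, y, z])))
print(temps.run(0))  # (['t0', 't1', 't2'], 3)
```

Because the counter is threaded explicitly, running the same computation from the same start state always yields the same names. That keeps generated code diffable.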
5. LLM-first value lives outside the grammar where possible
The language should remain compact and unsurprising. LLM leverage should come from:
- compiler error shape
- MCP tools
- prompt packs and best-practice docs
- predictable stdlib naming
- token-efficiency benchmarks
- codegen conventions that are easy to reverse-engineer and diff
v2 should not add prompt directives or agent instructions as core syntax unless a later, explicit design proves that the value is worth the surface-area cost.
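Compiler error shape is listed first among these leverage points. One way to read it is that diagnostics should serialize to a stable, machine-readable record that an agent can act on without parsing prose. The Python sketch below assumes a hypothetical shape: a code, a severity, a span, a short message, optional expected/found/fix fields, and notes. The field names, the `TYPE001` code, and the path are illustrative only, not ll-lang's actual format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    file: str
    line: int     # 1-based
    col: int      # 1-based start column
    end_col: int  # exclusive end column


@dataclass(frozen=True)
class Diagnostic:
    code: str                   # stable identifier that tools can match on
    severity: str               # "error" or "warning"
    span: Span
    message: str                # one short sentence
    expected: Optional[str] = None
    found: Optional[str] = None
    fix: Optional[str] = None   # concrete suggestion, when one exists
    notes: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Sorted keys and compact separators keep the output deterministic and short.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))


diag = Diagnostic(
    code="TYPE001",                    # hypothetical error code
    severity="error",
    span=Span("src/Main", 12, 9, 14),  # hypothetical path and position
    message="type mismatch in argument 1 of add",
    expected="Int",
    found="Str",
)
print(diag.to_json())
```

Deterministic key order means the same error always renders to the same bytes. That matters for benchmarks and for diffing compiler output across versions.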
6. S-expressions are optional IR notation, not a second main syntax
If an S-expression layer is explored, it must be constrained to one of these roles:
- macro IR
- structural fixture format
- AST serialization/debug view
- test input for transformation passes
It must not become a second primary authoring syntax in v2.
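As a sketch of the AST serialization/debug-view role, the following Python prints a toy AST as an S-expression and parses it back, so fixtures can round-trip. The node shape is an assumption for illustration, not ll-lang's AST. It uses a tag plus children, with atoms restricted to identifiers (no spaces or parentheses, not all digits) and integers.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Node:
    tag: str
    children: Tuple["Sexp", ...] = ()


# Atoms are identifiers (no spaces/parens, not all digits) or ints.
Sexp = Union[Node, str, int]


def show(x: Sexp) -> str:
    # Print a node as "(tag child ...)"; atoms print as themselves.
    if isinstance(x, Node):
        return "(" + " ".join([x.tag] + [show(c) for c in x.children]) + ")"
    return str(x)


def parse(text: str) -> Sexp:
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def atom(tok: str) -> Sexp:
        return int(tok) if tok.lstrip("-").isdigit() else tok

    def expr() -> Sexp:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return atom(tok)
        tag = tokens[pos]
        pos += 1
        children: List[Sexp] = []
        while tokens[pos] != ")":
            children.append(expr())
        pos += 1  # consume ")"
        return Node(tag, tuple(children))

    result = expr()
    if pos != len(tokens):
        raise ValueError("trailing input after S-expression")
    return result


ast = Node("let", ("x", Node("app", ("add", 1, 2)), Node("var", ("x",))))
text = show(ast)
print(text)                 # (let x (app add 1 2) (var x))
assert parse(text) == ast   # round-trip holds
```

Keeping the printer and parser this small is deliberate. The notation serves as a debug and fixture format, so it needs no precedence, layout, or sugar of its own.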
Compiler architecture target
The canonical compiler pipeline for v2 is:
- Source text
- Lexer
- Parser
- Surface AST validation/desugaring
- Elaborator / name resolution / declared-type checks
- Typed core inference and validation
- Backend-neutral lowering
- Backend emission
- Project resolver / build driver / CLI orchestration
Required invariants:
- every major pass has an explicit input/output contract
- at least one typed IR remains stable across backend work
- backend emitters do not re-derive front-end semantics ad hoc
- each pass can be tested independently on corpus fixtures
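To make the pass-contract invariants concrete, here is a deliberately tiny pipeline in Python for integer arithmetic only. Each pass is a plain function with a typed input and output, and a driver only composes them. The pass boundaries loosely mirror the lexer, parser, backend-neutral lowering, and emission stages in the pipeline list. The data types, the toy language, and `compile_source` are hypothetical. Real passes would also carry spans, diagnostics, and much richer IRs.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]  # surface AST
Instr = Tuple[str, int]   # backend-neutral stack IR, e.g. ("push", 3) or ("add", 0)
TOKEN = r"\d+|[+\-*()]"
OPS = {"+": "add", "-": "sub", "*": "mul"}


def lex(src: str) -> List[str]:
    """Pass 1: source text -> tokens. Any unrecognized character is an error."""
    tokens = re.findall(TOKEN + r"|\S", src)
    for t in tokens:
        if not re.fullmatch(TOKEN, t):
            raise SyntaxError(f"unexpected character {t!r}")
    return tokens


def parse(tokens: List[str]) -> Expr:
    """Pass 2: tokens -> AST, with * binding tighter than + and -."""
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ""

    def take() -> str:
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def atom() -> Expr:
        tok = take()
        if tok.isdigit():
            return Num(int(tok))
        if tok == "(":
            inner = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {tok!r}")

    def term() -> Expr:
        left = atom()
        while peek() == "*":
            take()
            left = BinOp("*", left, atom())
        return left

    def expr() -> Expr:
        left = term()
        while peek() in ("+", "-"):
            left = BinOp(take(), left, term())
        return left

    tree = expr()
    if peek() != "":
        raise SyntaxError(f"trailing token {peek()!r}")
    return tree


def lower(e: Expr) -> List[Instr]:
    """Pass 3: AST -> stack IR in post-order."""
    if isinstance(e, Num):
        return [("push", e.value)]
    return lower(e.left) + lower(e.right) + [(OPS[e.op], 0)]


def emit(ir: List[Instr]) -> str:
    """Pass 4: IR -> target text. Reads only the IR, never the AST."""
    return "\n".join(f"{op} {arg}" if op == "push" else op for op, arg in ir)


def compile_source(src: str) -> str:
    # The driver only composes passes; each one can be tested on its own fixtures.
    return emit(lower(parse(lex(src))))


print(compile_source("2 * (3 + 4) - 5"))
```

The emitter never re-inspects the AST, which is the "no ad hoc re-derivation of front-end semantics" invariant in miniature.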
What must be written into the next spec
The next spec pass must define:
- canonical declaration and export surface
- operator table and precedence
- zero-arg function/value/call rules
- explicit laziness semantics
- project and dependency model
- self-hosting contract: stage0 vs canonical compiler
- what is intentionally deferred beyond v2
`docs/language-spec.md` must become the single canonical behavior document. The older `docs/design-spec.md` should be treated as motivation and vision unless it is rewritten to match the actual roadmap.
The expected companion spec set is:
- `spec/v2-type-system.md`
- `spec/v2-project-system.md`
- `spec/v2-self-hosting.md`
- `spec/v2-llm-tooling.md`
What is intentionally deferred past v2
These are valid research directions, but not required for v2:
- full HKT/typeclass solver with constrained inference
- effect rows / algebraic effects
- macro system in the core language
- S-expression primary syntax
- reverse parsing as part of the release-critical architecture
- optimizer-heavy IR work beyond what self-hosting and backend sanity require
Documentation policy for v2 work
Any compiler-facing feature PR in the v2 track must update all relevant docs in the same change:
- `docs/language-spec.md` when syntax or semantics change
- `docs/stdlib-reference.md` when canonical APIs change
- `docs/compiler-dev/*` when pipeline boundaries or contributor workflows change
- benchmark docs when token or runtime claims change
No v2 implementation work is considered complete if the design docs still describe the old behavior.
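The same-change documentation rule can be partly enforced in CI by inspecting a PR's changed-file list. The Python sketch below reports every rule whose source files changed without a matching doc update. The `compiler/syntax/`, `compiler/`, and `stdlib/` prefixes are hypothetical placeholders for the real repository layout. Only the `docs/...` targets come from the policy, and the benchmark-docs rule is left out because its location is not specified here.

```python
from typing import Dict, List

# Hypothetical source prefixes mapped to the docs that must change alongside them.
RULES: Dict[str, List[str]] = {
    "compiler/syntax/": ["docs/language-spec.md"],
    "stdlib/": ["docs/stdlib-reference.md"],
    "compiler/": ["docs/compiler-dev/"],
}


def missing_docs(changed: List[str]) -> List[str]:
    """Return one message per rule whose sources changed without any matching doc update."""
    problems = []
    for src_prefix, doc_prefixes in RULES.items():
        touched = any(f.startswith(src_prefix) for f in changed)
        documented = any(f.startswith(d) for f in changed for d in doc_prefixes)
        if touched and not documented:
            problems.append(f"{src_prefix} changed but none of {', '.join(doc_prefixes)} was updated")
    return problems


changed_files = ["compiler/syntax/Lexer", "docs/compiler-dev/pipeline.md"]  # hypothetical PR
for msg in missing_docs(changed_files):
    print(msg)  # compiler/syntax/ changed but none of docs/language-spec.md was updated
```

A path-based check can only flag likely omissions. Whether a change actually alters syntax, semantics, or a canonical API still needs reviewer judgment.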