Tutorial 03: Building a Parser
A JSON value parser in ll-lang — real-world functional code in under 80 lines. This tutorial shows how ll-lang's ADTs, pattern matching, and recursion compose cleanly for parsing tasks. The stdlib's Toml.lll (292 lines, parsing TOML config files) is the same pattern at larger scale.
The value type
JSON has six value kinds. One ADT captures all of them:
module Json

JsonValue =
  | JNull
  | JBool Bool
  | JNum Float
  | JStr Str
  | JArr (List[JsonValue])
  | JObj (List[(Str, JsonValue)])
In TypeScript this would require a discriminated union with explicit tag fields on every variant — about 4× the tokens. In ll-lang, the constructors are the tags.
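For comparison, here is a minimal sketch of the TypeScript encoding that the paragraph above describes. The `kind` tag field, the property names, and the `describe` helper are assumptions made for illustration; this is not output of the ll-lang compiler.

```typescript
// Each variant carries an explicit `kind` tag so the compiler can narrow it.
type JsonValue =
  | { kind: "JNull" }
  | { kind: "JBool"; value: boolean }
  | { kind: "JNum"; value: number }
  | { kind: "JStr"; value: string }
  | { kind: "JArr"; items: JsonValue[] }
  | { kind: "JObj"; pairs: [string, JsonValue][] };

// Exhaustiveness is opt-in: the `never` assignment in `default` makes the
// compiler reject this switch if a variant is left unhandled.
function describe(v: JsonValue): string {
  switch (v.kind) {
    case "JNull": return "null";
    case "JBool": return "bool " + v.value;
    case "JNum": return "number " + v.value;
    case "JStr": return "string " + v.value;
    case "JArr": return "array of " + v.items.length;
    case "JObj": return "object with " + v.pairs.length + " keys";
    default: {
      const unreachable: never = v;
      return unreachable;
    }
  }
}

const sample: JsonValue = {
  kind: "JObj",
  pairs: [
    ["name", { kind: "JStr", value: "alice" }],
    ["age", { kind: "JNum", value: 30 }],
  ],
};
console.log(describe(sample)); // object with 2 keys
```

Every variant repeats the tag, and the exhaustiveness check has to be written by hand in each switch. Both are implicit in the ll-lang declaration.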
Token representation
Token =
  | TNull
  | TTrue
  | TFalse
  | TNum Str
  | TStr Str
  | TLBrace
  | TRBrace
  | TLBracket
  | TRBracket
  | TColon
  | TComma
  | TEOF
Lexer
A simple character-at-a-time lexer. strChars splits a Str into List[Char].
skipWhitespace(cs List[Char]) =
  match cs
  | [] -> []
  | c :: rest ->
    if c == ' ' || c == '\n' || c == '\r' || c == '\t'
      skipWhitespace rest
    else cs

scanString(cs List[Char])(acc List[Char]) =
  match cs
  | [] -> (strFromChars (listReverse acc), [])
  | '"' :: rest -> (strFromChars (listReverse acc), rest)
  | c :: rest -> scanString rest (c :: acc)

scanDigits(cs List[Char])(acc List[Char]) =
  match cs
  | [] -> (strFromChars (listReverse acc), [])
  | c :: rest ->
    if charIsDigit c || c == '.' || c == '-'
      scanDigits rest (c :: acc)
    else (strFromChars (listReverse acc), cs)

nextToken(cs List[Char]) =
  trimmed = skipWhitespace cs
  match trimmed
  | [] -> (TEOF, [])
  | '"' :: rest ->
    let (s, rem) = scanString rest []
    (TStr s, rem)
  | '{' :: rest -> (TLBrace, rest)
  | '}' :: rest -> (TRBrace, rest)
  | '[' :: rest -> (TLBracket, rest)
  | ']' :: rest -> (TRBracket, rest)
  | ':' :: rest -> (TColon, rest)
  | ',' :: rest -> (TComma, rest)
  | 'n' :: 'u' :: 'l' :: 'l' :: rest -> (TNull, rest)
  | 't' :: 'r' :: 'u' :: 'e' :: rest -> (TTrue, rest)
  | 'f' :: 'a' :: 'l' :: 's' :: 'e' :: rest -> (TFalse, rest)
  | c :: _ ->
    if charIsDigit c || c == '-'
      let (n, rem) = scanDigits trimmed []
      (TNum n, rem)
    else (TEOF, [])

tokenize(src Str) =
  tokenizeChars (strChars src) []

tokenizeChars(cs List[Char])(acc List[Token]) =
  let (tok, rest) = nextToken cs
  match tok
  | TEOF -> listReverse (TEOF :: acc)
  | _ -> tokenizeChars rest (tok :: acc)
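To make the lexer's behaviour concrete, the following is a rough TypeScript rendering of the same character-at-a-time scan, run on the input used later in the Usage section. It is a sketch under assumptions: the `kind` and `text` field names and the `PUNCT` and `KEYWORDS` tables are illustrative, and the loop replaces the ll-lang recursion. It keeps the same simplifications as the lexer above: no string escapes, and an unrecognized character ends the token stream with TEOF.

```typescript
// Token mirrors the ll-lang ADT; `kind` plays the role of the constructor.
type Token =
  | { kind: "TNull" } | { kind: "TTrue" } | { kind: "TFalse" }
  | { kind: "TNum"; text: string } | { kind: "TStr"; text: string }
  | { kind: "TLBrace" } | { kind: "TRBrace" }
  | { kind: "TLBracket" } | { kind: "TRBracket" }
  | { kind: "TColon" } | { kind: "TComma" } | { kind: "TEOF" };

const PUNCT: { [ch: string]: Token } = {
  "{": { kind: "TLBrace" }, "}": { kind: "TRBrace" },
  "[": { kind: "TLBracket" }, "]": { kind: "TRBracket" },
  ":": { kind: "TColon" }, ",": { kind: "TComma" },
};
const KEYWORDS: [string, Token][] = [
  ["null", { kind: "TNull" }],
  ["true", { kind: "TTrue" }],
  ["false", { kind: "TFalse" }],
];

function isDigit(c: string): boolean {
  return c >= "0" && c <= "9";
}

function tokenize(src: string): Token[] {
  const out: Token[] = [];
  let i = 0;
  scan: while (i < src.length) {
    const c = src.charAt(i);
    // skipWhitespace
    if (" \n\r\t".indexOf(c) !== -1) { i++; continue; }
    // scanString: everything up to the next quote, or to end of input
    if (c === '"') {
      const close = src.indexOf('"', i + 1);
      const end = close === -1 ? src.length : close;
      out.push({ kind: "TStr", text: src.slice(i + 1, end) });
      i = end + 1;
      continue;
    }
    const punct = PUNCT[c];
    if (punct !== undefined) { out.push(punct); i++; continue; }
    for (const [word, tok] of KEYWORDS) {
      if (src.slice(i, i + word.length) === word) {
        out.push(tok);
        i += word.length;
        continue scan;
      }
    }
    // scanDigits: digits, '.' and '-' are collected; validation happens later
    if (isDigit(c) || c === "-") {
      let j = i;
      while (j < src.length && (isDigit(src.charAt(j)) || ".-".indexOf(src.charAt(j)) !== -1)) j++;
      out.push({ kind: "TNum", text: src.slice(i, j) });
      i = j;
      continue;
    }
    break; // unknown character: stop scanning, as the ll-lang lexer does
  }
  out.push({ kind: "TEOF" });
  return out;
}

const kinds = tokenize('{"name": "alice", "age": 30}').map((t) => t.kind);
console.log(kinds.join(" "));
// TLBrace TStr TColon TStr TComma TStr TColon TNum TRBrace TEOF
```

Number tokens keep their raw text, so a malformed literal such as `1.2.3` passes the lexer and is rejected only when the parser converts it.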
Parser
A recursive-descent parser. The ParseResult A type threads the remaining token list through each call.
ParseResult A = ParseOk A (List[Token]) | ParseErr Str

parseValue(tokens List[Token]) =
  match tokens
  | TNull :: rest -> ParseOk JNull rest
  | TTrue :: rest -> ParseOk (JBool true) rest
  | TFalse :: rest -> ParseOk (JBool false) rest
  | TNum n :: rest ->
    match strToFloat n
    | Some f -> ParseOk (JNum f) rest
    | None -> ParseErr (strConcat "bad number: " n)
  | TStr s :: rest -> ParseOk (JStr s) rest
  | TLBracket :: rest -> parseArray rest []
  | TLBrace :: rest -> parseObject rest []
  | _ -> ParseErr "unexpected token"

parseArray(tokens List[Token])(acc List[JsonValue]) =
  match tokens
  | TRBracket :: rest -> ParseOk (JArr (listReverse acc)) rest
  | _ ->
    match parseValue tokens
    | ParseErr e -> ParseErr e
    | ParseOk v rest2 ->
      match rest2
      | TComma :: rest3 -> parseArray rest3 (v :: acc)
      | TRBracket :: rest3 -> ParseOk (JArr (listReverse (v :: acc))) rest3
      | _ -> ParseErr "expected , or ]"

parseObject(tokens List[Token])(acc List[(Str, JsonValue)]) =
  match tokens
  | TRBrace :: rest -> ParseOk (JObj (listReverse acc)) rest
  | TStr k :: TColon :: rest ->
    match parseValue rest
    | ParseErr e -> ParseErr e
    | ParseOk v rest2 ->
      match rest2
      | TComma :: rest3 -> parseObject rest3 ((k, v) :: acc)
      | TRBrace :: rest3 -> ParseOk (JObj (listReverse ((k, v) :: acc))) rest3
      | _ -> ParseErr "expected , or }"
  | _ -> ParseErr "expected string key"

parse(src Str) =
  tokens = tokenize src
  match parseValue tokens
  | ParseOk v _ -> Some v
  | ParseErr _ -> None
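The token-threading idea can be seen in isolation in this TypeScript sketch of the same three functions. Assumptions: the `ok`, `value`, `rest`, and `error` field names are illustrative, `Number` stands in for `strToFloat`, and array slicing replaces the `::` patterns. Each call returns either a value together with the tokens it did not consume, or an error message.

```typescript
type Token =
  | { kind: "TNull" } | { kind: "TTrue" } | { kind: "TFalse" }
  | { kind: "TNum"; text: string } | { kind: "TStr"; text: string }
  | { kind: "TLBrace" } | { kind: "TRBrace" }
  | { kind: "TLBracket" } | { kind: "TRBracket" }
  | { kind: "TColon" } | { kind: "TComma" } | { kind: "TEOF" };

type JsonValue =
  | { kind: "JNull" } | { kind: "JBool"; value: boolean }
  | { kind: "JNum"; value: number } | { kind: "JStr"; value: string }
  | { kind: "JArr"; items: JsonValue[] }
  | { kind: "JObj"; pairs: [string, JsonValue][] };

// Success carries the parsed value plus the tokens that were not consumed.
type ParseResult<A> =
  | { ok: true; value: A; rest: Token[] }
  | { ok: false; error: string };

function parseValue(tokens: Token[]): ParseResult<JsonValue> {
  const [head, ...rest] = tokens;
  if (head === undefined) return { ok: false, error: "unexpected token" };
  switch (head.kind) {
    case "TNull": return { ok: true, value: { kind: "JNull" }, rest };
    case "TTrue": return { ok: true, value: { kind: "JBool", value: true }, rest };
    case "TFalse": return { ok: true, value: { kind: "JBool", value: false }, rest };
    case "TNum": {
      const f = Number(head.text); // stand-in for strToFloat
      if (isNaN(f)) return { ok: false, error: "bad number: " + head.text };
      return { ok: true, value: { kind: "JNum", value: f }, rest };
    }
    case "TStr": return { ok: true, value: { kind: "JStr", value: head.text }, rest };
    case "TLBracket": return parseArray(rest, []);
    case "TLBrace": return parseObject(rest, []);
    default: return { ok: false, error: "unexpected token" };
  }
}

function parseArray(tokens: Token[], acc: JsonValue[]): ParseResult<JsonValue> {
  if (tokens.length > 0 && tokens[0].kind === "TRBracket") {
    return { ok: true, value: { kind: "JArr", items: acc }, rest: tokens.slice(1) };
  }
  const r = parseValue(tokens);
  if (!r.ok) return r; // propagate the error unchanged
  const items = acc.concat([r.value]);
  const [next, ...rest] = r.rest;
  if (next !== undefined && next.kind === "TComma") return parseArray(rest, items);
  if (next !== undefined && next.kind === "TRBracket") {
    return { ok: true, value: { kind: "JArr", items }, rest };
  }
  return { ok: false, error: "expected , or ]" };
}

function parseObject(tokens: Token[], acc: [string, JsonValue][]): ParseResult<JsonValue> {
  const [t0, t1, ...rest] = tokens;
  if (t0 !== undefined && t0.kind === "TRBrace") {
    return { ok: true, value: { kind: "JObj", pairs: acc }, rest: tokens.slice(1) };
  }
  if (t0 === undefined || t0.kind !== "TStr" || t1 === undefined || t1.kind !== "TColon") {
    return { ok: false, error: "expected string key" };
  }
  const r = parseValue(rest);
  if (!r.ok) return r;
  const pair: [string, JsonValue] = [t0.text, r.value];
  const pairs = acc.concat([pair]);
  const [next, ...rest2] = r.rest;
  if (next !== undefined && next.kind === "TComma") return parseObject(rest2, pairs);
  if (next !== undefined && next.kind === "TRBrace") {
    return { ok: true, value: { kind: "JObj", pairs }, rest: rest2 };
  }
  return { ok: false, error: "expected , or }" };
}

// Token stream for the input `[1, true]`.
const tokens: Token[] = [
  { kind: "TLBracket" }, { kind: "TNum", text: "1" }, { kind: "TComma" },
  { kind: "TTrue" }, { kind: "TRBracket" }, { kind: "TEOF" },
];
const result = parseValue(tokens);
console.log(JSON.stringify(result));
```

On this input the result is a two-element `JArr`, and `rest` holds only the trailing `TEOF` token, which is what the caller would inspect to detect leftover input.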
Usage
main() =
  result = parse "{\"name\": \"alice\", \"age\": 30}"
  match result
  | None ->
    _ = printfn "parse error"
    0
  | Some v ->
    match v
    | JObj pairs ->
      _ = printfn (strConcat (strConcat "Parsed " (intToStr (listLen pairs))) " keys")
      0
    | _ ->
      _ = printfn "expected object"
      0
Why this is shorter than the alternatives
| Language | Equivalent parser LOC | Key reason |
|---|---|---|
| ll-lang | ~80 | ADT constructors are the tags; match is exhaustive |
| TypeScript | ~180 | Discriminated unions need explicit tag fields; no exhaustiveness by default |
| Python | ~150 | isinstance chains; no ADTs |
| Java | ~250 | Sealed interfaces + instanceof patterns (Java 21); verbose class declarations |
The JsonValue ADT declaration above is 7 lines in ll-lang (one header line plus six constructors). In Java 21 it is approximately 40 lines of sealed interface + record declarations.
Connection to stdlib
stdlib/src/Toml.lll follows exactly this structure:
- ADT for Section (the parser state variant)
- ParseState record threading state through line processing
- Recursive descent via pattern matching on List[Str]
- No mutable state, no exceptions, Maybe for failure
Read it as a larger worked example of the same pattern.
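The state-threading idea in that list can be sketched as follows in TypeScript. This is not the code in Toml.lll: the `Section` and `ParseState` shapes, the `step` and `parseLines` names, and the tiny `key = value` subset handled here are all assumptions chosen to illustrate the pattern. `null` stands in for the failure case.

```typescript
// Hypothetical shapes; the real Section and ParseState in Toml.lll may differ.
type Section = { kind: "Root" } | { kind: "Table"; name: string };

interface ParseState {
  section: Section;
  entries: [string, string][]; // (qualified key, raw value text)
}

// Processes one line and returns a new state; the input state is never mutated.
// Returns null on a malformed line.
function step(state: ParseState, line: string): ParseState | null {
  const text = line.trim();
  if (text === "" || text.charAt(0) === "#") return state; // blank line or comment
  if (text.charAt(0) === "[" && text.charAt(text.length - 1) === "]") {
    const name = text.slice(1, -1).trim();
    return { section: { kind: "Table", name }, entries: state.entries };
  }
  const eq = text.indexOf("=");
  if (eq <= 0) return null; // no '=' or empty key
  const key = text.slice(0, eq).trim();
  const value = text.slice(eq + 1).trim();
  const qualified =
    state.section.kind === "Table" ? state.section.name + "." + key : key;
  const entry: [string, string] = [qualified, value];
  return { section: state.section, entries: state.entries.concat([entry]) };
}

// Recursion over the line list, threading the state through each step.
function parseLines(lines: string[], state: ParseState): ParseState | null {
  if (lines.length === 0) return state;
  const next = step(state, lines[0]);
  return next === null ? null : parseLines(lines.slice(1), next);
}

const initial: ParseState = { section: { kind: "Root" }, entries: [] };
const parsed = parseLines(['title = "demo"', "[server]", "port = 8080"], initial);
console.log(
  parsed === null
    ? "parse error"
    : parsed.entries.map(([k, v]) => k + "=" + v).join("; ")
);
// title="demo"; server.port=8080
```

The shape is the same as in the JSON parser: each step takes the current state and the remaining input, and returns either the next state or a failure, with no shared mutable variables.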
Next steps
- 04-multi-target.md: compile this code to TypeScript, Python, and Java
- stdlib/src/Toml.lll: production TOML parser, same idioms
- stdlib/src/Parser.lll: the full ll-lang parser (802 LOC)