
Hercules Lemke Merscher

Posted on • Originally published at bitmaybewise.substack.com

Tsonnet #9 - ID please

Welcome to the Tsonnet series!

If you're just joining, you can check out how it all started in the first post of the series.

In the previous post, we fixed error handling during lexical analysis.

Now, let's continue building our interpreter by adding support for identifiers.

What is an identifier?

An identifier is a sequence of characters that serves as a name for various programming constructs, such as variables, functions, classes, or modules.

When we added object literals to Tsonnet, we used strings as attribute names. However, it's more common to write attribute names as identifiers. Let's modify our object literal sample to use identifiers for some attributes:

diff --git a/samples/literals/object.jsonnet b/samples/literals/object.jsonnet
index 043f131..cb4c52a 100644
--- a/samples/literals/object.jsonnet
+++ b/samples/literals/object.jsonnet
@@ -2,7 +2,7 @@
     "int_attr": 1,
     "float_attr": 4.2,
     "string_attr": "Hello, world!",
-    "null_attr": null,
-    "array_attr": [1, false, {}],
-    "obj_attr": { "a": true, "b": false, "c": { "d": [42] } }
+    null_attr: null,
+    array_attr: [1, false, {}],
+    obj_attr: { "a": true, "b": false, "c": { "d": [42] } }
 }

Running this code currently breaks our parser:

$ dune exec -- tsonnet samples/literals/object.jsonnet
Fatal error: exception Tsonnet__Parser.MenhirBasics.Error

Let's fix this by implementing identifier support.

Adding identifiers

First, we need to add a new expression type to our AST:

diff --git a/lib/ast.ml b/lib/ast.ml
index 55ddd52..d34a152 100644
--- a/lib/ast.ml
+++ b/lib/ast.ml
@@ -13,6 +13,7 @@ type expr =
   | Null
   | Bool of bool
   | String of string
+  | Ident of string
   | Array of expr list
   | Object of (string * expr) list
   | BinOp of bin_op * expr * expr

Next, we need to define the lexical rules for identifiers. An identifier starts with a letter or an underscore, followed by any number of letters, digits, or underscores:

diff --git a/lib/lexer.mll b/lib/lexer.mll
index bbf3b66..bebf28a 100644
--- a/lib/lexer.mll
+++ b/lib/lexer.mll
@@ -14,6 +14,8 @@ let exp = ['e' 'E']['-' '+']? digit+
 let float = '-'? digit* frac? exp?
 let null = "null"
 let bool = "true" | "false"
+let letter = ['a'-'z' 'A'-'Z']
+let id = (letter | '_') (letter | digit | '_')*

 rule read =
   parse
@@ -34,6 +36,7 @@ rule read =
   | '-' { SUBTRACT }
   | '*' { MULTIPLY }
   | '/' { DIVIDE }
+  | id { ID (Lexing.lexeme lexbuf) }
   | _ { raise (SyntaxError ("Unexpected char: " ^ Lexing.lexeme lexbuf)) }
   | eof { EOF }
 and read_string buf =
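To make the `id` pattern concrete, here is a minimal sketch of the same rule written as a plain OCaml function, without ocamllex. The names `is_letter`, `is_digit`, and `is_identifier` are made up for this illustration and are not part of Tsonnet.

```ocaml
(* Hand-written equivalent of the ocamllex pattern
   (letter | '_') (letter | digit | '_')*
   Illustrative only; these helpers are not part of the Tsonnet code base. *)
let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let is_digit c = c >= '0' && c <= '9'

(* [is_identifier s] is true when the whole string [s] matches the pattern. *)
let is_identifier (s : string) : bool =
  let n = String.length s in
  n > 0
  && (is_letter s.[0] || s.[0] = '_')
  && (let rec rest i =
        i >= n
        || ((is_letter s.[i] || is_digit s.[i] || s.[i] = '_') && rest (i + 1))
      in
      rest 1)

let () =
  (* Print each candidate next to the verdict. *)
  List.iter
    (fun s -> Printf.printf "%-10s %b\n" s (is_identifier s))
    [ "null_attr"; "_private"; "attr2"; "2fast"; "with-dash"; "" ]
```

Note that this function checks an entire string, while the generated lexer matches the longest prefix at the current position. An input such as `with-dash` would therefore be lexed as `ID "with"`, `SUBTRACT`, `ID "dash"` rather than rejected.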

When the input matches the id pattern, the lexer emits an ID token carrying the matched text. The parser needs a few more changes to handle these new tokens:

diff --git a/lib/parser.mly b/lib/parser.mly
index 2b6db25..a224ea3 100644
--- a/lib/parser.mly
+++ b/lib/parser.mly
@@ -16,6 +16,7 @@
 %token ADD SUBTRACT MULTIPLY DIVIDE
 %left ADD SUBTRACT
 %left MULTIPLY DIVIDE
+%token <string> ID
 %token EOF

 %start <Ast.expr> prog
@@ -32,6 +33,7 @@ expr:
   | NULL { Null }
   | b = BOOL { Bool b }
   | s = STRING { String s }
+  | id = ID { Ident id }
   | LEFT_SQR_BRACKET; values = list_fields; RIGHT_SQR_BRACKET { Array values }
   | LEFT_CURLY_BRACKET; attrs = obj_fields; RIGHT_CURLY_BRACKET { Object attrs }
   | e1 = expr; ADD; e2 = expr { BinOp (Add, e1, e2) }
@@ -44,7 +46,9 @@ list_fields:
   vl = separated_list(COMMA, expr) { vl };

 obj_field:
-  k = STRING; COLON; v = expr { (k, v) };
+  | k = STRING; COLON; v = expr { (k, v) }
+  | k = ID; COLON; v = expr { (k, v) }
+  ;

 obj_fields:
     obj = separated_list(COMMA, obj_field) { obj };

We declare a new ID token that carries a string payload. The new expr rule wraps that string in an Ident node. Finally, we update obj_field to accept both string and identifier keys; in either case the key ends up as a plain string in the resulting pair.
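The effect of the two `obj_field` alternatives can be sketched without Menhir. The snippet below uses simplified stand-in types (`token`, `expr`) and a hypothetical `parse_field` function, none of which are the real Tsonnet definitions, and it handles only integer values to stay short. It shows that a quoted key and a bare key produce the same pair.

```ocaml
(* Simplified stand-ins for the parser's tokens and AST.
   These are assumptions for the sketch, not the real Tsonnet types. *)
type token = STRING of string | ID of string | COLON | INT of int

type expr = Number of int

(* Mirrors the two obj_field alternatives: a STRING or ID key,
   then COLON, then a value. Anything else is rejected. *)
let parse_field (tokens : token list) : (string * expr) option =
  match tokens with
  | [ STRING k; COLON; INT v ] -> Some (k, Number v)
  | [ ID k; COLON; INT v ] -> Some (k, Number v)
  | _ -> None

let () =
  let quoted = parse_field [ STRING "answer"; COLON; INT 42 ] in
  let bare = parse_field [ ID "answer"; COLON; INT 42 ] in
  (* Both spellings of the key yield the same field. *)
  assert (quoted = bare);
  assert (bare = Some ("answer", Number 42))
```

Because both alternatives build the same `(k, v)` pair, nothing after the parser needs to know which spelling the source file used for a key.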

The last step is to update our Tsonnet.interpret and Json.expr_to_yojson functions to handle the new Ident constructor of Ast.expr:

diff --git a/lib/json.ml b/lib/json.ml
index 9b3596f..26d930c 100644
--- a/lib/json.ml
+++ b/lib/json.ml
@@ -9,6 +9,7 @@ let rec expr_to_yojson : expr -> (Yojson.t, string) result = function
   | Null -> ok `Null
   | Bool b -> ok (`Bool b)
   | String s -> ok (`String s)
+  | Ident id -> ok (`String id)
   | Array values ->
     let expr_to_list expr' = to_list (expr_to_yojson expr') in
     let results = values |> List.map expr_to_list |> List.concat in
diff --git a/lib/tsonnet.ml b/lib/tsonnet.ml
index 0e525e2..ae6eb91 100644
--- a/lib/tsonnet.ml
+++ b/lib/tsonnet.ml
@@ -32,7 +32,7 @@ let interpret_bin_op (op: bin_op) (n1: number) (n2: number) : expr =
 (** [interpret expr] interprets and reduce the intermediate AST [expr] into a result AST. *)
 let rec interpret (e: expr) : (expr, string) result =
   match e with
-  | Null | Bool _ | String _ | Number _ | Array _ | Object _ -> ok e
+  | Null | Bool _ | String _ | Number _ | Array _ | Object _ | Ident _ -> ok e
   | BinOp (Add, String a, String b) -> ok (String (a^b))
   | BinOp (op, e1, e2) ->
     let* e1' = interpret e1 in
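To see what the new `Ident` case does during serialization, here is a cut-down sketch that avoids the Yojson dependency. The `expr` and `json` types and the `to_json` function are simplified assumptions for illustration; the real code works on `Ast.expr`, returns a `result`, and produces `Yojson.t`.

```ocaml
(* Cut-down AST and JSON types for illustration only.
   The real code uses Ast.expr and Yojson.t. *)
type expr =
  | Null
  | String of string
  | Ident of string
  | Object of (string * expr) list

type json = [ `Null | `String of string | `Assoc of (string * json) list ]

(* Same shape as the Ident case in the diff: the identifier's name
   is emitted as a JSON string. *)
let rec to_json : expr -> json = function
  | Null -> `Null
  | String s -> `String s
  | Ident id -> `String id
  | Object attrs -> `Assoc (List.map (fun (k, v) -> (k, to_json v)) attrs)

let () =
  (* An object key parsed from a bare identifier is already a plain string. *)
  assert (to_json (Object [ ("null_attr", Null) ]) = `Assoc [ ("null_attr", `Null) ]);
  (* An identifier in value position serializes as its own name. *)
  assert (to_json (Ident "foo") = `String "foo")
```

One consequence worth noticing: with the change as shown in the diff, an identifier that appears as a value is rendered as a string containing its name, since there is nothing yet to resolve it against.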

Conclusion

With these changes, we've added identifier support to Tsonnet: object attributes can now be written without quotes. Identifiers are also a prerequisite for constructs that need names, such as variables and functions. In upcoming posts, we'll build on this foundation to add more features.

Stay tuned for the next post in the series!


Thanks for reading Bit Maybe Wise! Subscribe and join me in building Tsonnet, one feature at a time. No compiler theory degree required -- just curiosity and a love for coding!
