Welcome to the Tsonnet series!
If you're just joining, you can check out how it all started in the first post of the series.
In the previous post, we added lexing error tracing.
This time, we will expand it and cover parsing error tracing.
Photo by BBiDDac on Unsplash
A parsing error
Let’s add an extremely simple sample file introducing a parsing error:
diff --git a/samples/errors/sum_int_to_boolean.jsonnet b/samples/errors/sum_int_to_boolean.jsonnet
new file mode 100644
index 0000000..571e0b4
--- /dev/null
+++ b/samples/errors/sum_int_to_boolean.jsonnet
@@ -0,0 +1 @@
+42 + true
Even though this example is pretty simple, it serves us well to pinpoint the parsing error. We have a number and a boolean, which are lexed correctly, but the expression is invalid; we can’t add a number to a boolean.
Jsonnet reports the row and column range correctly, but there’s no visual cue to help us:
$ jsonnet samples/errors/sum_int_to_boolean.jsonnet
RUNTIME ERROR: Unexpected type boolean, expected number
samples/errors/sum_int_to_boolean.jsonnet:1:1-10	During evaluation
But we want to do better on this front and also add human-friendly error messages to Tsonnet.
Let’s also stretch a little bit and add a multiline sample file:
diff --git a/samples/errors/sum_int_to_boolean_multiline.jsonnet b/samples/errors/sum_int_to_boolean_multiline.jsonnet
new file mode 100644
index 0000000..06d20e6
--- /dev/null
+++ b/samples/errors/sum_int_to_boolean_multiline.jsonnet
@@ -0,0 +1,4 @@
+(1+1+1+1)+
+2+2+2+2+
+3+3+3+false+
+4+4+4+4
Considering that a parsing error can arise anywhere in an entirely valid lexed file, we need to make sure that parser errors are captured at the right position.
Now that we have samples to test against, we need a way to capture where each token was found. Let’s see how to capture parsing positions.
Capturing parsing positions
The first file we need to update is ast.ml. We need to capture the position in the AST:
diff --git a/lib/ast.ml b/lib/ast.ml
index 00ead06..ed830ad 100644
--- a/lib/ast.ml
+++ b/lib/ast.ml
@@ -14,13 +14,28 @@ type number =
| Int of int
| Float of float
-type expr =
+type value =
| Number of number
| Null
| Bool of bool
| String of string
| Ident of string
- | Array of expr list
- | Object of (string * expr) list
- | BinOp of bin_op * expr * expr
- | UnaryOp of unary_op * expr
+ | Array of value list
+ | Object of (string * value) list
+ | BinOp of bin_op * value * value
+ | UnaryOp of unary_op * value
+
+type position = {
+ startpos: Lexing.position;
+ endpos: Lexing.position;
+}
+
+type expr = {
+ position: position;
+ value: value;
+}
+
+let pos_from_lexbuf (lexbuf : Lexing.lexbuf) : position =
+ { startpos = lexbuf.lex_curr_p;
+ endpos = lexbuf.lex_curr_p;
+  }
It is important to have both the start and end positions of the expression. The new position type should be part of expr.
So far, expr was a sum type containing all the value variants that could be found in the code. Now, it will be a product type that contains position and value.
The pos_from_lexbuf function will come in handy to capture the position for lexing errors.
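To make the new shape concrete, here is a minimal, self-contained sketch. The types mirror a trimmed-down version of the diff above, and the sample position values are fabricated purely for illustration:

```ocaml
(* A trimmed mirror of the post's AST types, for illustration only *)
type number = Int of int | Float of float

type value =
  | Number of number
  | Bool of bool

type position = {
  startpos : Lexing.position;
  endpos : Lexing.position;
}

type expr = {
  position : position;
  value : value;
}

(* Fabricate a position on line 1 at character offset [cnum] *)
let pos cnum : Lexing.position =
  { pos_fname = "sample.jsonnet"; pos_lnum = 1; pos_bol = 0; pos_cnum = cnum }

(* The literal 42 from "42 + true", annotated with its source span *)
let forty_two : expr =
  { position = { startpos = pos 0; endpos = pos 2 }; value = Number (Int 42) }

let () =
  Printf.printf "42 spans offsets %d-%d\n"
    forty_two.position.startpos.pos_cnum
    forty_two.position.endpos.pos_cnum
```

Every node now carries its span alongside its payload, so any later stage can point back at the source.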
Now we need to do some refactoring-driven development, updating every place that references expr to use value where it’s due.
The json.ml change is straightforward — we only replace expr references with value:
diff --git a/lib/json.ml b/lib/json.ml
index 26d930c..cd65ec0 100644
--- a/lib/json.ml
+++ b/lib/json.ml
@@ -1,7 +1,7 @@
open Ast
open Result
-let rec expr_to_yojson : expr -> (Yojson.t, string) result = function
+let rec value_to_yojson : Ast.value -> (Yojson.t, string) result = function
| Number n ->
ok (match n with
| Int i -> `Int i
@@ -11,12 +11,12 @@ let rec expr_to_yojson : expr -> (Yojson.t, string) result = function
| String s -> ok (`String s)
| Ident id -> ok (`String id)
| Array values ->
- let expr_to_list expr' = to_list (expr_to_yojson expr') in
+ let expr_to_list expr' = to_list (value_to_yojson expr') in
let results = values |> List.map expr_to_list |> List.concat in
ok (`List results)
| Object attrs ->
let eval' = fun (k, v) ->
- let result = expr_to_yojson v
+ let result = value_to_yojson v
in Result.map (fun val' -> (k, val')) result
in
let results = attrs |> List.map eval' |> List.map to_list |> List.concat
@@ -24,5 +24,5 @@ let rec expr_to_yojson : expr -> (Yojson.t, string) result = function
| _ -> error "value type not representable as JSON"
let expr_to_string expr =
- let yojson = expr_to_yojson expr
+ let yojson = value_to_yojson expr.value
in Result.map Yojson.pretty_to_string yojson
Now, let’s see how to capture the positions in the parser.
Parsing positions
Menhir provides some helpers; $startpos and $endpos, which each return a Lexing.position, are the ones we are interested in.
We can wrap the record building in the with_pos function to make the parser cleaner. But we’ll need to update every single rule in expr to use it:
diff --git a/lib/parser.mly b/lib/parser.mly
index 0167984..6292eef 100644
--- a/lib/parser.mly
+++ b/lib/parser.mly
@@ -1,5 +1,14 @@
%{
[@@@coverage exclude_file]
+ open Ast
+
+ let with_pos startpos endpos value : Ast.expr = {
+ position = {
+ startpos = startpos;
+ endpos = endpos;
+ };
+ value = value
+ }
%}
%token INT
@@ -32,31 +41,34 @@ prog:
;
expr:
- | i = INT { Number (Int i) }
- | f = FLOAT { Number (Float f) }
- | NULL { Null }
- | b = BOOL { Bool b }
- | s = STRING { String s }
- | id = ID { Ident id }
- | LEFT_PAREN; e = expr; RIGHT_PAREN { e }
- | LEFT_SQR_BRACKET; values = list_fields; RIGHT_SQR_BRACKET { Array values }
- | LEFT_CURLY_BRACKET; attrs = obj_fields; RIGHT_CURLY_BRACKET { Object attrs }
- | e1 = expr; PLUS; e2 = expr { BinOp (Add, e1, e2) }
- | e1 = expr; MINUS; e2 = expr { BinOp (Subtract, e1, e2) }
- | e1 = expr; MULTIPLY; e2 = expr { BinOp (Multiply, e1, e2) }
- | e1 = expr; DIVIDE; e2 = expr { BinOp (Divide, e1, e2) }
- | PLUS; e = expr; { UnaryOp (Plus, e) }
- | MINUS; e = expr; { UnaryOp (Minus, e) }
- | NOT; e = expr; { UnaryOp (Not, e) }
- | BITWISE_NOT; e = expr; { UnaryOp (BitwiseNot, e) }
+ | i = INT { with_pos $startpos $endpos (Number (Int i)) }
+ | f = FLOAT { with_pos $startpos $endpos (Number (Float f)) }
+ | NULL { with_pos $startpos $endpos Null }
+ | b = BOOL { with_pos $startpos $endpos (Bool b) }
+ | s = STRING { with_pos $startpos $endpos (String s) }
+ | id = ID { with_pos $startpos $endpos (Ident id) }
+ | LEFT_PAREN; e = expr; RIGHT_PAREN { with_pos $startpos $endpos e.value }
+ | LEFT_SQR_BRACKET; values = list_fields; RIGHT_SQR_BRACKET { with_pos $startpos $endpos (Array values) }
+ | LEFT_CURLY_BRACKET; attrs = obj_fields; RIGHT_CURLY_BRACKET { with_pos $startpos $endpos (Object attrs) }
+ | e1 = expr; PLUS; e2 = expr { with_pos $startpos $endpos (BinOp (Add, e1.value, e2.value)) }
+ | e1 = expr; MINUS; e2 = expr { with_pos $startpos $endpos (BinOp (Subtract, e1.value, e2.value)) }
+ | e1 = expr; MULTIPLY; e2 = expr { with_pos $startpos $endpos (BinOp (Multiply, e1.value, e2.value)) }
+ | e1 = expr; DIVIDE; e2 = expr { with_pos $startpos $endpos (BinOp (Divide, e1.value, e2.value)) }
+ | PLUS; e = expr; { with_pos $startpos $endpos (UnaryOp (Plus, e.value)) }
+ | MINUS; e = expr; { with_pos $startpos $endpos (UnaryOp (Minus, e.value)) }
+ | NOT; e = expr; { with_pos $startpos $endpos (UnaryOp (Not, e.value)) }
+ | BITWISE_NOT; e = expr; { with_pos $startpos $endpos (UnaryOp (BitwiseNot, e.value)) }
;
+list_value:
+ e = expr { e.value };
+
list_fields:
- vl = separated_list(COMMA, expr) { vl };
+ vl = separated_list(COMMA, list_value) { vl };
obj_field:
- | k = STRING; COLON; v = expr { (k, v) }
- | k = ID; COLON; v = expr { (k, v) }
+ | k = STRING; COLON; e = expr { (k, e.value) }
+ | k = ID; COLON; e = expr { (k, e.value) }
;
obj_fields:
Helper rules such as list_value, list_fields, obj_field, and obj_fields do not need to include the position. At least, not yet. Maybe soon, when the type checker is implemented, we’ll need it to highlight errors for the expression’s inner tokens. Let’s postpone this until it is needed, if it is needed at all.
The parser had many lines changed, but each individual change couldn’t be simpler.
Error handling has lived in the interpreter until now, but it will be better to isolate it in its own module. Let’s do that before continuing.
Introducing the error module
With an error.ml module, we can encapsulate all things related to error formatting. It makes things easier to reason about in a contained context:
open Ast
open Result

let enumerate_error_lines filename position ~highlight_error =
  let channel = open_in filename in
  try
    let eaten_chars = ref 0 in
    let rec read_lines acc line_num =
      try
        let line = input_line channel in
        let numbered_line = Printf.sprintf "%d: %s" line_num line in
        let is_error_line = line_num >= position.startpos.pos_lnum && line_num <= position.endpos.pos_lnum in
        let eat_more_chars = position.endpos.pos_cnum > !eaten_chars in
        if is_error_line && eat_more_chars then
          read_lines (highlight_error line_num line position eaten_chars :: numbered_line :: acc) (line_num + 1)
        else
          read_lines acc (line_num + 1)
      with End_of_file -> List.rev acc
    in
    let numbered_lines = read_lines [] 1 in
    close_in channel;
    ok (String.concat "\n" numbered_lines)
  with e ->
    close_in_noerr channel;
    error (Printexc.to_string e)

let plot_caret line_num line pos (eaten_chars: int ref) =
  let line_length = String.length line in
  let left_padding = String.length (string_of_int line_num) + 2 in (* width of line number + colon + space, e.g.: "%d: "... *)
  let buf_size = left_padding + line_length in
  let buffer = Buffer.create buf_size in
  let start' = min pos.startpos.pos_cnum !eaten_chars in
  let end' = min pos.endpos.pos_cnum (line_length-1) in
  (* Fill with spaces for the left padding (line number area) *)
  for _ = 1 to left_padding do
    Buffer.add_char buffer ' ';
    incr eaten_chars;
  done;
  (* Add spaces up to the starting position *)
  for _ = 1 to start' do
    Buffer.add_char buffer ' ';
    incr eaten_chars;
  done;
  (* Add caret to highlight expr between start and end columns *)
  for _ = start' to end' do
    Buffer.add_char buffer '^';
    incr eaten_chars;
  done;
  incr eaten_chars; (* +1 for newline *)
  let carets = Buffer.contents buffer in
  (Printf.sprintf "%*s" left_padding carets)

let trace_file_position err pos =
  let start_col = pos.startpos.pos_cnum - pos.startpos.pos_bol in
  Printf.sprintf "%s:%d:%d %s\n" pos.startpos.pos_fname pos.startpos.pos_lnum start_col err

let trace (err: string) (pos: position) : (string, string) result =
  bind
    (enumerate_error_lines pos.startpos.pos_fname pos ~highlight_error: plot_caret)
    (fun content -> ok (Printf.sprintf "%s\n%s" (trace_file_position err pos) content))
The error.mli interface file has been added to expose only the trace function from the Error module:
val trace : string -> Ast.position -> (string, string) result
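An .mli behaves like a signature ascription: anything not listed in it stays private to the module. The same hiding effect can be sketched inline with a module signature (Counter here is a made-up example, not part of Tsonnet):

```ocaml
(* Only [next] escapes the signature; [state] stays private,
   just like the helpers omitted from error.mli *)
module Counter : sig
  val next : unit -> int
end = struct
  let state = ref 0
  let next () = incr state; !state
end

let () =
  assert (Counter.next () = 1);
  assert (Counter.next () = 2)
  (* Referencing Counter.state would be a compile error: unbound value *)
```

This keeps enumerate_error_lines, plot_caret, and trace_file_position as implementation details we are free to change later.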
Some new things have been introduced here, compared to the previous format_error function:
- It handles multiline files.
- We are skipping lines without error. Showing non-relevant lines only adds noise.
- It has clever logic to highlight the error from the expr’s start column to its end column.
Cool, huh?!
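The caret plotting boils down to padding plus repetition. Here’s a stripped-down, single-line sketch of the idea (simplified from plot_caret above, ignoring the multiline bookkeeping; plot_caret_simple is a hypothetical helper, not the real one):

```ocaml
(* Pad past the "N: " gutter and the start column, then emit one '^'
   per highlighted character *)
let plot_caret_simple ~line_num ~start_col ~end_col =
  let gutter = String.length (string_of_int line_num) + 2 in
  String.make (gutter + start_col) ' '
  ^ String.make (end_col - start_col + 1) '^'

let () =
  print_endline "1: 42 + true";
  print_endline (plot_caret_simple ~line_num:1 ~start_col:0 ~end_col:8)
```

This prints the numbered line with a row of nine carets underneath, aligned under "42 + true".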
Let’s proceed to connect the pieces.
Connecting all the pieces
The interpreter needs a few tweaks:
- The parse function needs to be refactored to use the new Error.trace function.
- Functions such as interpret_arith_op, interpret_unary_op, and interpret need to be changed to deal with the value from expr.
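Most of the churn comes from OCaml’s functional record update, {expr with value = ...}, which rewraps an inner value while keeping the parent’s position. In miniature, with simplified stand-in types:

```ocaml
(* Simplified stand-ins for the real Ast types *)
type position = { line : int; col : int }
type expr = { position : position; value : int }

(* Keep the parent's position, swap the payload: the same idiom as
   [{expr with value = e1}] in the interpreter *)
let rewrap (parent : expr) v = { parent with value = v }

let () =
  let parent = { position = { line = 3; col = 7 }; value = 0 } in
  let child = rewrap parent 42 in
  assert (child.position.line = 3 && child.value = 42)
```

A rewrapped sub-expression inherits its parent’s span, which is good enough until we track positions for inner tokens.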
The diff might be daunting to stare at, but its complexity hasn’t increased much:
diff --git a/lib/tsonnet.ml b/lib/tsonnet.ml
index e8f6304..3907770 100644
--- a/lib/tsonnet.ml
+++ b/lib/tsonnet.ml
@@ -4,49 +4,6 @@ open Result
let (let*) = Result.bind
let (>>=) = Result.bind
-let enumerate_file_content filename =
- let channel = open_in filename in
- try
- let rec read_lines acc line_num =
- try
- let line = input_line channel in
- let numbered_line = Printf.sprintf "%d %s" line_num line in
- read_lines (numbered_line :: acc) (line_num + 1)
- with End_of_file -> (List.rev acc, line_num)
- in
- let numbered_lines, line_num = read_lines [] 1 in
- close_in channel;
- ok (String.concat "\n" numbered_lines, line_num)
- with e ->
- close_in_noerr channel;
- error (Printexc.to_string e)
-
-let plot_caret column_size =
- if column_size <= 0 then
- ""
- else
- let buffer = Buffer.create column_size in
- (* Fill with spaces except the last position *)
- for _ = 1 to column_size - 1 do
- Buffer.add_char buffer ' '
- done;
- (* Add caret at the end *)
- Buffer.add_char buffer '^';
- Buffer.contents buffer
-
-let format_error (err: string) (lexbuf: Lexing.lexbuf) : (string, string) result =
- let* content, n = enumerate_file_content lexbuf.lex_curr_p.pos_fname in
- let pos_cnum = lexbuf.lex_curr_p.pos_cnum - lexbuf.lex_curr_p.pos_bol in
- let carot_padding = String.length (string_of_int n) + 1 in
- ok (Printf.sprintf "%s:%d:%d %s\n\n%s\n %*s"
- lexbuf.lex_curr_p.pos_fname
- lexbuf.lex_curr_p.pos_lnum
- pos_cnum
- err
- content
- carot_padding (plot_caret pos_cnum)
- )
-
(** [parse s] parses [s] into an AST. *)
let parse (filename: string) : (expr, string) result =
let input = open_in filename in
@@ -54,12 +11,12 @@ let parse (filename: string) : (expr, string) result =
Lexing.set_filename lexbuf filename;
let result =
try ok (Parser.prog Lexer.read lexbuf)
- with | Lexer.SyntaxError err -> (format_error err lexbuf) >>= error
+ with | Lexer.SyntaxError err -> (Error.trace err (Ast.pos_from_lexbuf lexbuf)) >>= error
in
close_in input;
result
-let interpret_arith_op (op: bin_op) (n1: number) (n2: number) : expr =
+let interpret_arith_op (op: bin_op) (n1: number) (n2: number) : Ast.value =
match op, n1, n2 with
| Add, (Int a), (Int b) -> Number (Int (a + b))
| Add, (Float a), (Int b) -> Number (Float (a +. (float_of_int b)))
@@ -79,38 +36,48 @@ let interpret_arith_op (op: bin_op) (n1: number) (n2: number) : expr =
| Divide, (Float a), (Float b) -> Number (Float (a /. b))
let interpret_concat_op (e1 : expr) (e2 : expr) : (expr, string) result =
- match e1, e2 with
- | String s1, String s2 -> ok (String (s1^s2))
- | String s1, expr2 ->
- let* s2 = Json.expr_to_string expr2 in
- ok (String (s1^s2))
- | expr1, String s2 ->
- let* s1 = Json.expr_to_string expr1 in
- ok (String (s1^s2))
- | _ -> error "invalid string concatenation operation"
+ let value =
+ match e1.value, e2.value with
+ | String s1, String s2 -> ok (String (s1^s2))
+ | String s1, val2 ->
+ let* s2 = Json.expr_to_string {e2 with value = val2} in
+ ok (String (s1^s2))
+ | val1, String s2 ->
+ let* s1 = Json.expr_to_string {e1 with value = val1} in
+ ok (String (s1^s2))
+ | _ -> error "invalid string concatenation operation"
+ in Result.map (fun v -> { e1 with value = v }) value
let interpret_unary_op (op: unary_op) (evaluated_expr: expr) =
- match op, evaluated_expr with
- | Plus, number -> ok number
- | Minus, Number (Int i) -> ok (Number (Int (-i)))
- | Minus, Number (Float f) -> ok (Number (Float (-. f)))
- | Not, (Bool b) -> ok (Bool (not b))
- | BitwiseNot, Number (Int i) -> ok (Number (Int (lnot i)))
- | _ -> error "invalid unary operation"
+ let* new_value = (match op, evaluated_expr.value with
+ | Plus, number -> ok number
+ | Minus, Number (Int i) -> ok (Number (Int (-i)))
+ | Minus, Number (Float f) -> ok (Number (Float (-. f)))
+ | Not, (Bool b) -> ok (Bool (not b))
+ | BitwiseNot, Number (Int i) -> ok (Number (Int (lnot i)))
+ | _ -> error "invalid unary operation")
+ in ok { evaluated_expr with value = new_value }
(** [interpret expr] interprets and reduce the intermediate AST [expr] into a result AST. *)
-let rec interpret (e: expr) : (expr, string) result =
- match e with
- | Null | Bool _ | String _ | Number _ | Array _ | Object _ | Ident _ -> ok e
+let rec interpret expr : (expr, string) result =
+ match expr.value with
+ | Null | Bool _ | String _ | Number _ | Array _ | Object _ | Ident _ -> ok expr
| BinOp (op, e1, e2) ->
- (let* e1' = interpret e1 in
- let* e2' = interpret e2 in
- match op, e1', e2' with
- | Add, (String _ as expr1), (_ as expr2) | Add, (_ as expr1), (String _ as expr2) ->
- interpret_concat_op expr1 expr2
- | _, Number v1, Number v2 -> ok (interpret_arith_op op v1 v2)
- | _ -> error "invalid binary operation")
- | UnaryOp (op, expr) -> interpret expr >>= interpret_unary_op op
+ (let* e1' = interpret ({expr with value = e1}) in
+ let* e2' = interpret ({expr with value = e2}) in
+ match op, e1'.value, e2'.value with
+ | Add, (String _ as v1), (_ as v2) | Add, (_ as v1), (String _ as v2) ->
+ let expr1 = { expr with value = v1 }
+ in let expr2 = { expr with value = v2 }
+ in interpret_concat_op expr1 expr2
+ | _, Number v1, Number v2 ->
+ let value = interpret_arith_op op v1 v2
+ in ok ({expr with value = value})
+ | _ ->
+ Error.trace "invalid binary operation" expr.position >>= error
+ )
+ | UnaryOp (op, value) ->
+ interpret ({expr with value = value}) >>= interpret_unary_op op
let run (filename: string) : (string, string) result =
parse filename >>= interpret >>= Json.expr_to_string
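The run function at the bottom is just Result.bind chaining: each stage either hands a value to the next or short-circuits with the first error. A self-contained miniature of the same shape (the toy stages are made up; only the chaining mirrors Tsonnet’s run):

```ocaml
let ( >>= ) = Result.bind

(* Toy stages: each may fail with a string, like parse/interpret *)
let parse s =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error ("cannot parse: " ^ s)

let interpret n = if n >= 0 then Ok (n * 2) else Error "negative input"
let to_string n = Ok (string_of_int n)

(* Same shape as: parse filename >>= interpret >>= Json.expr_to_string *)
let run s = parse s >>= interpret >>= to_string

let () =
  assert (run "21" = Ok "42");
  assert (run "oops" = Error "cannot parse: oops")
```

Because errors propagate unchanged through the chain, the traced message built by Error.trace survives all the way to the caller.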
Let’s update the tests:
- Add a new cram test against samples/errors/sum_int_to_boolean.jsonnet.
- Add a new cram test against samples/errors/sum_int_to_boolean_multiline.jsonnet.
- Run dune runtest.
- Run dune promote.
diff --git a/test/cram/comments.t b/test/cram/comments.t
index 136fc36..c43bb32 100644
--- a/test/cram/comments.t
+++ b/test/cram/comments.t
@@ -4,17 +4,6 @@
$ tsonnet ../../samples/comments/unterminated_block.jsonnet
../../samples/comments/unterminated_block.jsonnet:12:1 Unterminated block comment
- 1 "this is code" /*
- 2 This is a block comment
- 3 .
- 4 .
- 5 .
- 6 isn't
- 7 going
- 8 to
- 9 end
- 10 ?
- 11 ?
- 12 ?
- ^
+ 12: ?
+ ^
[1]
diff --git a/test/cram/errors.t b/test/cram/errors.t
index 679c504..14e45c2 100644
--- a/test/cram/errors.t
+++ b/test/cram/errors.t
@@ -1,6 +1,24 @@
$ tsonnet ../../samples/errors/malformed_string.jsonnet
../../samples/errors/malformed_string.jsonnet:1:22 String is not terminated
- 1 "oops... no end quote
- ^
+ 1: "oops... no end quote
+ ^^^^^^^^^^^^^^^^^^^^^
+ [1]
+
+ $ tsonnet ../../samples/errors/sum_int_to_boolean.jsonnet
+ ../../samples/errors/sum_int_to_boolean.jsonnet:1:0 invalid binary operation
+
+ 1: 42 + true
+ ^^^^^^^^^
+ [1]
+
+ $ tsonnet ../../samples/errors/sum_int_to_boolean_multiline.jsonnet
+ ../../samples/errors/sum_int_to_boolean_multiline.jsonnet:1:0 invalid binary operation
+
+ 1: (1+1+1+1)+
+ ^^^^^^^^^^
+ 2: 2+2+2+2+
+ ^^^^^^^^
+ 3: 3+3+3+false+
+ ^^^^^^^^^^^^
[1]
Look at this beauty! Errors are now much easier to spot.
Conclusion
We've successfully extended our error-handling capabilities to pinpoint parsing errors with clear visual highlighting. This makes debugging and development in Tsonnet significantly more user-friendly, as developers can quickly identify and fix issues in their code.
We've built a robust foundation that can be expanded for future features like type checking. The changes required extensive refactoring, but the results were worth it.
See you in the next issue!
Thanks for reading Bit Maybe Wise! Subscribe to get fresh Tsonnet updates delivered straight to your inbox — we'll help you spot those sneaky bugs faster than you'd find a red-and-white striped programmer in a crowded codebase!