Lox interpreter in Rust

2024-10-06

One day, while watching one of Jon Gjengset's videos, I felt inspired to build something in Rust using the https://app.codecrafters.io/catalog page. As a Rust beginner with self-diagnosed ADD, I thought my inspiration would not last long.

Interestingly, I kept going with the tasks day by day, slowly making progress until the end (of the beta version available for free). At the moment my interpreter sits comfortably in its GitHub repo, where you can see the mess I made.

Below you can find the unedited notes I made during my journey. I'm publishing them on my blog for myself, to remember (and to remind myself in the future) that well-split tasks can be achieved even if they seem overly complex at the beginning.

Unlike real-life problems, these tasks (on the CodeCrafters page) are really well defined, and I didn't have to worry about additional work suddenly unfolding (as often happens with “estimated” stories in an Agile setting at work). The pleasure of reaching the next milestone, physically experienced by ticking boxes, is so sexy that I was as happy as a kid while doing those exercises. Good stuff if you are struggling with mood swings or autumn depression episodes.

Day 1

This is a warm-up: I'm parsing single- and two-character tokens.
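A minimal sketch of this stage, not the code from my repo: after a `!`, `=`, `<` or `>`, the scanner checks whether the next character is `=` and, if so, consumes it and emits the two-character token. The `TokenType` variant names and the `with_equal`/`tokenize` helpers are hypothetical; `Peekable::next_if_eq` is from the standard library.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical token kinds for the warm-up stage.
#[derive(Debug, PartialEq)]
enum TokenType {
    LeftParen,
    RightParen,
    Bang,
    BangEqual,
    Equal,
    EqualEqual,
    Less,
    LessEqual,
    Greater,
    GreaterEqual,
}

/// Consumes a following '=' if present and returns the two-char variant.
fn with_equal(chars: &mut Peekable<Chars<'_>>, one: TokenType, two: TokenType) -> TokenType {
    if chars.next_if_eq(&'=').is_some() {
        two
    } else {
        one
    }
}

/// Turns source text into token kinds, skipping whitespace and unknown chars.
fn tokenize(src: &str) -> Vec<TokenType> {
    let mut chars = src.chars().peekable();
    let mut tokens = Vec::new();
    while let Some(c) = chars.next() {
        let token = match c {
            '(' => Some(TokenType::LeftParen),
            ')' => Some(TokenType::RightParen),
            '!' => Some(with_equal(&mut chars, TokenType::Bang, TokenType::BangEqual)),
            '=' => Some(with_equal(&mut chars, TokenType::Equal, TokenType::EqualEqual)),
            '<' => Some(with_equal(&mut chars, TokenType::Less, TokenType::LessEqual)),
            '>' => Some(with_equal(&mut chars, TokenType::Greater, TokenType::GreaterEqual)),
            _ => None,
        };
        tokens.extend(token);
    }
    tokens
}
```

Peeking only one character ahead is enough here, because no Lox operator is longer than two characters.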

Using modules

My first obstacle was the message “file not included in the source tree”; this Stack Overflow answer explains it: https://stackoverflow.com/questions/46829539/how-to-include-files-from-same-directory-in-a-module-using-cargo-rust

All of your top level module declarations should go in main.rs, like so:

mod mod1;
mod mod2;

fn main() {
    println!("Hello, world!");
    mod1::mod1fn();
}

You can then use crate::mod2 inside mod1:

use crate::mod2;

pub fn mod1fn() {
    println!("1");
    mod2::mod2fn();
}

Reference to “The Book”: https://doc.rust-lang.org/book/ch07-00-managing-growing-projects-with-packages-crates-and-modules.html

File size

My test.lox file has the following contents:

(()

It should be three bytes long, shouldn't it? Yet it seems to have an additional byte of value 10 (0xA) at the end. When I parse it, I get:

Peeked: ( [28]
LEFT_PAREN ( null
Peeked: ( [28]
LEFT_PAREN ( null
Peeked: ) [29]
RIGHT_PAREN ) null
Peeked:
 [a]

➜  codecrafters-interpreter-rust git:(master) ✗ /bin/cat test.lox
(()
➜  codecrafters-interpreter-rust git:(master) ✗ stat --printf="%s" test.lox

4%
➜  codecrafters-interpreter-rust git:(master) ✗

What is that last byte?

Ouch. After reading https://stackoverflow.com/questions/25997631/why-is-java-linux-appending-0xa-to-end-of-file I realized that when I created the test.lox file I had used echo without the -n flag (this flag prevents echo from writing the trailing linefeed character)…

➜  codecrafters-interpreter-rust git:(master) ✗ echo -n "(()" test.lox
(() test.lox%
➜  codecrafters-interpreter-rust git:(master) ✗ echo -n "(()" > test.lox
➜  codecrafters-interpreter-rust git:(master) ✗ stat --printf="%s" test.lox

3%
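Reading the file with `std::fs::read_to_string` keeps that trailing byte, so the lexer has to cope with it anyway. A minimal sketch (the helper name `chars_with_lines` is hypothetical) that treats `\n` as whitespace which only advances the line counter:

```rust
/// Pairs every non-whitespace character with its 1-based line number.
/// A trailing '\n' (like the one `echo` adds) only bumps the counter.
fn chars_with_lines(src: &str) -> Vec<(char, usize)> {
    let mut line = 1;
    let mut out = Vec::new();
    for c in src.chars() {
        match c {
            '\n' => line += 1,
            ' ' | '\r' | '\t' => {}
            other => out.push((other, line)),
        }
    }
    out
}
```

With this, `"(()"` and `"(()\n"` produce the same three characters.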

Exit Code

If you need to return a specific exit code from main(), use std::process::ExitCode as the return type.
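A minimal sketch: `main` returns `ExitCode`, built from a `u8` with `ExitCode::from`. The `Outcome` enum and the helper names are hypothetical; the values 65 (syntax/lexical error) and 70 (runtime error) follow the convention used in Crafting Interpreters, which as far as I know is also what the CodeCrafters tests check.

```rust
use std::process::ExitCode;

/// Hypothetical summary of how a run of the interpreter ended.
enum Outcome {
    Ok,
    SyntaxError,
    RuntimeError,
}

/// Maps an outcome to a process status (65/70 as in Crafting Interpreters).
fn status(outcome: &Outcome) -> u8 {
    match outcome {
        Outcome::Ok => 0,
        Outcome::SyntaxError => 65,
        Outcome::RuntimeError => 70,
    }
}

/// The value `fn main() -> ExitCode` would return.
fn exit_code(outcome: &Outcome) -> ExitCode {
    ExitCode::from(status(outcome))
}
```

In the binary, `fn main() -> ExitCode` would then end with `exit_code(&outcome)`, where `outcome` comes from whatever runs the interpreter.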

Iterator state

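The Peekable iterator should be created once and then advanced in next(). I created a plain iterator when instantiating the Lexer and wrapped it in a new Peekable every time next() was called. Peeking moves an item from the underlying iterator into the Peekable's own buffer, so a peeked character is lost whenever that temporary Peekable is dropped; this caused issues with its state (too much advancing). Commit 90d2007 fixes it.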
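A sketch of the fixed shape, with hypothetical names (`Lexer`, one lexeme per `String`): the `Peekable` lives in the struct, so whatever `next_if_eq` has looked at survives until the next call. Wrapping a fresh `Peekable` around `self.chars.by_ref()` inside `next` would instead drop the buffered character each time.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// The lexer owns a single `Peekable`, created once in `new`.
struct Lexer<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> Lexer<'a> {
    fn new(src: &'a str) -> Self {
        Lexer { chars: src.chars().peekable() }
    }
}

impl Iterator for Lexer<'_> {
    type Item = String;

    /// Yields `==` as one lexeme and every other non-space char on its own.
    fn next(&mut self) -> Option<String> {
        loop {
            let c = self.chars.next()?;
            if c.is_whitespace() {
                continue;
            }
            // The peeked '=' stays buffered in `self.chars` if it doesn't match.
            if c == '=' && self.chars.next_if_eq(&'=').is_some() {
                return Some("==".to_string());
            }
            return Some(c.to_string());
        }
    }
}
```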

Day 2

Today I implement parsing of string and numeric literals.
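A sketch of scanning a number literal with the Lox rule from Crafting Interpreters that a `.` only belongs to the number when a digit follows it (I won't claim my repo does exactly this). `Peekable` looks only one character ahead, so cloning the iterator provides the second character of lookahead; `scan_number` is a hypothetical helper.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Scans a number literal whose first digit `first` was already consumed.
/// `1.` lexes as the number `1` followed by a separate `.` token.
fn scan_number(first: char, chars: &mut Peekable<Chars<'_>>) -> (String, f64) {
    let mut lexeme = String::from(first);
    while let Some(d) = chars.next_if(|c| c.is_ascii_digit()) {
        lexeme.push(d);
    }
    // Two characters of lookahead: the '.' and the digit after it.
    let mut ahead = chars.clone();
    if ahead.next() == Some('.') && ahead.peek().is_some_and(|c| c.is_ascii_digit()) {
        chars.next(); // consume the '.'
        lexeme.push('.');
        while let Some(d) = chars.next_if(|c| c.is_ascii_digit()) {
            lexeme.push(d);
        }
    }
    // Only digits and at most one '.', so parsing cannot fail.
    let value = lexeme.parse::<f64>().expect("valid number lexeme");
    (lexeme, value)
}
```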

Tuple struct

To initialize this tuple struct from my test module, I need to make its field public:

pub(crate) struct Numeric(pub f64);
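Why the field's `pub` matters: tuple-struct fields are private by default, and `Numeric(x)` can only be written where field `0` is visible, i.e. in the defining module and its descendants. A hypothetical layout where the tests live in a sibling module (a `#[cfg(test)] mod tests` nested inside the defining file would be a child module and could use a private field anyway):

```rust
mod value {
    /// The field is `pub`, so code outside `value` can write `Numeric(1.5)`.
    pub(crate) struct Numeric(pub f64);
}

mod lexer_tests {
    use super::value::Numeric;

    /// Would not compile with a private field: `Numeric(x)` needs access to `.0`.
    pub(crate) fn make(x: f64) -> Numeric {
        Numeric(x)
    }
}
```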

Today I started writing actual unit tests. I confess they are poor as hell. I wish CodeCrafters showed code-coverage stats and optionally used them as a task-completion metric.

Day 3

The lexer is complete. Now I'm preparing my code for parsing expressions. First surprise: NeoVim failed when I tried to delete surrounding braces.

OK, the recursive descent parser seems to be complete. Some core dumps, caused by infinite recursion, came from a faulty implementation of the Display trait. Then some tests failed because parsed tokens had an unexpected text representation.
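I didn't keep the broken code, but a classic way to get infinite recursion from `Display` is to format `self` with `{}` inside `fmt`, because that calls the same `fmt` again. A sketch with a hypothetical `Literal` enum that formats the inner data instead:

```rust
use std::fmt;

/// Hypothetical literal type used to show a non-recursive `Display`.
enum Literal {
    Number(f64),
    Str(String),
    Nil,
}

impl fmt::Display for Literal {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Buggy version: `write!(f, "{}", self)` re-enters this `fmt`
        // and recurses until the stack overflows.
        // Fixed version: format the *inner* data, never `self`.
        match self {
            Literal::Number(n) => write!(f, "{n}"),
            Literal::Str(s) => write!(f, "{s}"),
            Literal::Nil => write!(f, "nil"),
        }
    }
}
```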

I'm fixing them one by one. Day three ends with parsing unary operators.
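For reference, a minimal recursive descent sketch of just the unary rule from the Lox grammar, `unary → ( "!" | "-" ) unary | primary`, with `primary` reduced to number literals. The `Parser` over pre-split `&str` lexemes and the `Expr` enum are hypothetical simplifications; a real parser works on `Token`s.

```rust
/// Minimal AST for the unary rule.
#[derive(Debug, PartialEq)]
enum Expr {
    Number(f64),
    Unary(char, Box<Expr>),
}

/// Hypothetical parser over pre-split lexemes.
struct Parser<'a> {
    tokens: &'a [&'a str],
    pos: usize,
}

impl<'a> Parser<'a> {
    /// unary -> ( "!" | "-" ) unary | primary
    fn unary(&mut self) -> Result<Expr, String> {
        match self.tokens.get(self.pos) {
            Some(&op) if op == "!" || op == "-" => {
                self.pos += 1; // consume the operator before recursing
                let operand = self.unary()?;
                Ok(Expr::Unary(op.chars().next().unwrap(), Box::new(operand)))
            }
            _ => self.primary(),
        }
    }

    /// primary -> NUMBER
    fn primary(&mut self) -> Result<Expr, String> {
        let tok = self.tokens.get(self.pos).ok_or("unexpected end of input")?;
        let n = tok.parse::<f64>().map_err(|_| format!("expected number, got {tok}"))?;
        self.pos += 1;
        Ok(Expr::Number(n))
    }
}
```

Each grammar rule becomes one method, and the recursion in `unary` mirrors the recursion in the rule itself.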

Day 4

I refactored Token: it is no longer an enum with variants but a struct, and the enum of variants was renamed from Token to TokenType. Now Token contains a TokenType, a line number and the parsed input fragment:

#[derive(Debug, Clone, PartialEq)]
pub(crate) struct Token {
    /// Type of token
    pub typ: TokenType,
    /// line number where the token was seen
    pub ln: LineNum,
    /// parsed input fragment
    pub s: String,
}
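The struct refers to `TokenType` and `LineNum`, which are defined elsewhere in the repo. A hypothetical minimal version so the snippet compiles on its own (the real `TokenType` has many more variants, and `LineNum` may be a different integer type), plus a convenience constructor:

```rust
/// Hypothetical alias; the real definition may differ.
pub(crate) type LineNum = usize;

/// A tiny subset of token kinds, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub(crate) enum TokenType {
    LeftParen,
    Number(f64),
    Eof,
}

#[derive(Debug, Clone, PartialEq)]
pub(crate) struct Token {
    /// Type of token
    pub typ: TokenType,
    /// line number where the token was seen
    pub ln: LineNum,
    /// parsed input fragment
    pub s: String,
}

impl Token {
    /// Accepts any string-like lexeme (`&str` or `String`).
    pub(crate) fn new(typ: TokenType, ln: LineNum, s: impl Into<String>) -> Self {
        Token { typ, ln, s: s.into() }
    }
}
```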

Day 5

I lost track of days, but I completed the “Parsing expressions” section and started “Evaluating expressions” (it probably happened over the weekend). I was exploring whether my errors should use &str or String. My conclusions are:

In Rust, whether error types should own message Strings or borrow &strs depends on the specific use case and design considerations. Here are some factors to consider:

1. Error Lifetimes and Flexibility

Owned Strings (String):
- Owning a String allows for greater flexibility. The error type can be moved around without worrying about the lifetime of borrowed data.
- You can construct error messages dynamically, or from data that may not live long enough to be borrowed.
- If the error message is derived from different parts of the codebase or needs to be constructed on the fly, owning the message makes sense.
Borrowed Strings (&str):
- Borrowing a &str can be more efficient when the error type only references static data, or data guaranteed to live at least as long as the error itself.
- This approach avoids allocating new Strings and can be faster when the error messages are predefined and static.

Therefore, my EvalError definition uses String:

#[derive(Debug)]
pub struct EvalError {
    s: String,
}

impl EvalError {
    fn new(s: String) -> EvalError {
        EvalError { s }
    }
}

… and every time I pass a &'static str to a struct initializer expression, I need to convert it to an owned String with into():

EvalError { s: "No unary operators for string".into() }
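If writing `.into()` at every call site gets tedious, one option (a sketch, not what my repo does) is a constructor generic over `impl Into<String>`, so both `&'static str` literals and `format!` results are accepted. Implementing `Display` and `std::error::Error` also lets the error travel through `?` as a `Box<dyn Error>`; `negate_string` is a hypothetical caller.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
pub struct EvalError {
    s: String,
}

impl EvalError {
    /// Accepts `&str`, `String`, or anything else convertible to `String`.
    pub fn new(s: impl Into<String>) -> EvalError {
        EvalError { s: s.into() }
    }
}

impl fmt::Display for EvalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.s)
    }
}

/// With `Display` and `Debug` in place, the default methods are enough.
impl Error for EvalError {}

/// Hypothetical evaluation step that fails with a static message.
fn negate_string(_s: &str) -> Result<f64, Box<dyn Error>> {
    Err(EvalError::new("No unary operators for string").into())
}
```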

Day 6

2024-09-04

Evaluating strings and numbers.
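A sketch of the core rule, assuming a simplified `Value` enum without the tokens my real `EvalResult` carries: `+` works on two numbers or two strings and nothing else. The error text mirrors the message used in Crafting Interpreters.

```rust
/// Hypothetical runtime value.
#[derive(Debug, PartialEq)]
enum Value {
    Number(f64),
    Str(String),
}

/// `+` adds two numbers or concatenates two strings; mixed operands fail.
fn add(left: Value, right: Value) -> Result<Value, String> {
    match (left, right) {
        (Value::Number(a), Value::Number(b)) => Ok(Value::Number(a + b)),
        (Value::Str(a), Value::Str(b)) => Ok(Value::Str(a + &b)),
        _ => Err("Operands must be two numbers or two strings.".to_string()),
    }
}
```

Matching on the tuple of both operands keeps the whole type table for `+` in one place.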

Day 7

Allowing boolean evaluation of numbers and nil (awful stuff! I love statically typed languages).
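Lox's rule, as defined in Crafting Interpreters, is short: `nil` and `false` are falsey and every other value is truthy, including `0` and `""`. A sketch with a hypothetical `Value` enum; with it, `!nil` evaluates to `true` and `!0` to `false`.

```rust
/// Hypothetical Lox runtime value.
enum Value {
    Nil,
    Bool(bool),
    Number(f64),
    Str(String),
}

/// Lox truthiness: only `nil` and `false` are falsey.
fn is_truthy(v: &Value) -> bool {
    match v {
        Value::Nil => false,
        Value::Bool(b) => *b,
        Value::Number(_) | Value::Str(_) => true,
    }
}
```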

Day 8

I can see that my implementation is so complicated that I'm drowning in levels of matching. Is this the proper way to go? Lots of matching and enums everywhere?

I probably made a design mistake that keeps me from implementing things quickly. Let's think a little before I start. The visitor pattern in Rust? Should I follow this path?
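For comparison, a minimal sketch of what the visitor pattern from Crafting Interpreters could look like in Rust (hypothetical `Expr`, `ExprVisitor` and `Evaluator` names). Because Rust enums already give exhaustive dispatch, a single `match` inside an `evaluate` function does the same job with less ceremony; the visitor mainly pays off when many independent passes (printer, evaluator, resolver) walk the same tree.

```rust
/// Tiny expression tree, enough to show the dispatch.
enum Expr {
    Number(f64),
    Negate(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

/// One method per variant, generic over the result type.
trait ExprVisitor<R> {
    fn visit_number(&mut self, n: f64) -> R;
    fn visit_negate(&mut self, inner: &Expr) -> R;
    fn visit_add(&mut self, left: &Expr, right: &Expr) -> R;
}

impl Expr {
    /// Double dispatch: the expression picks the visitor method.
    fn accept<R, V: ExprVisitor<R>>(&self, v: &mut V) -> R {
        match self {
            Expr::Number(n) => v.visit_number(*n),
            Expr::Negate(e) => v.visit_negate(e),
            Expr::Add(l, r) => v.visit_add(l, r),
        }
    }
}

/// An evaluator is one visitor; a printer would be another.
struct Evaluator;

impl ExprVisitor<f64> for Evaluator {
    fn visit_number(&mut self, n: f64) -> f64 {
        n
    }
    fn visit_negate(&mut self, inner: &Expr) -> f64 {
        let value: f64 = inner.accept(self);
        -value
    }
    fn visit_add(&mut self, left: &Expr, right: &Expr) -> f64 {
        let l: f64 = left.accept(self);
        let r: f64 = right.accept(self);
        l + r
    }
}
```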

Day 9

Saturday! I just do pattern matching. Don’t worry, it will work.

I needed to augment my EvalResult type with a token (which keeps a location in the source code) so that I have access to a line number when I want to display a runtime error message.
I wrote factory methods for constructing specific variants (this simplifies the syntax a bit):

#[derive(Debug, PartialEq)]
pub enum EvalResult {
    Numeric { value: f64, token: Token },
    Boolean { value: bool, token: Token },
    String { value: String, token: Token },
    Reserved { value: String, token: Token },
}
impl EvalResult {
    fn of_boolean(value: bool, token: &Token) -> EvalResult {
        Self::Boolean { value, token: token.clone() }
    }

    fn of_reserved(arg: &str, token: &Token) -> EvalResult {
        Self::Reserved {
            value: arg.to_string(),
            token: token.clone()
        }
    }

    fn of_string(arg: String, token: &Token) -> EvalResult {
        Self::String {
            value: arg,
            token: token.clone()
        }
    }

    fn of_numeric(f: f64, token: &Token) -> EvalResult {
        Self::Numeric { value: f, token: token.clone() }
    }
}

I've also learned that if I pattern-match on a value (not a reference) and the pattern destructures it, the non-Copy fields are moved out of it, so the value becomes partially moved(!). This means I cannot clone it or use it after the match. The solutions I found were to:

- use the ref keyword in the pattern so that I get references instead of moving the fields out
- implement Clone so that I can use the cloned value in the eprintln! macro
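A small self-contained illustration with hypothetical stand-ins for `Token` and `EvalResult`: matching by value would move `value` and `token` out, while `ref` bindings only borrow, so the original value is still usable afterwards.

```rust
/// Hypothetical stand-ins for the repo's `Token` and `EvalResult`.
#[derive(Debug, Clone, PartialEq)]
struct Token {
    ln: usize,
}

#[derive(Debug, Clone, PartialEq)]
enum EvalResult {
    Str { value: String, token: Token },
}

/// `EvalResult::Str { value, token }` would move both fields out of `r`
/// (a partial move), so `r` could not be returned afterwards.
/// `ref` binds references instead, leaving `r` intact.
fn describe(r: EvalResult) -> (String, EvalResult) {
    let msg = match r {
        EvalResult::Str { ref value, ref token } => format!("[line {}] {}", token.ln, value),
    };
    (msg, r) // still usable: nothing was moved out
}
```

Matching on `&r` instead (`match &r { EvalResult::Str { value, token } => ... }`) has the same effect thanks to default binding modes.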

Day 10

Sunday. Statements and state: generating output and handling multiple statements; slow progress.

- changed the concept of passing results from the parser
- separated parsing from execution (leaking abstraction)
- stumbled upon reading empty lines, which:
  - should not count when they separate statements
  - should count when they are part of e.g. a string literal
- problem with multiline strings: do I parse them correctly?
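One way to handle multiline strings, sketched with a hypothetical `scan_string` helper: the characters of a literal, newlines included, go into the string value, but every `\n` still increments the line counter, so tokens after a multiline string report the correct line.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Scans a string literal after its opening `"` has been consumed.
/// Newlines belong to the string but still advance `line`.
fn scan_string(chars: &mut Peekable<Chars<'_>>, line: &mut usize) -> Result<String, String> {
    let start_line = *line;
    let mut value = String::new();
    for c in chars.by_ref() {
        match c {
            '"' => return Ok(value),
            '\n' => {
                *line += 1;
                value.push(c);
            }
            _ => value.push(c),
        }
    }
    Err(format!("unterminated string starting on line {start_line}"))
}
```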

Day 12

Monday, after work. I only looked at the code; I was tired and a bit worried whether I'd ever make any progress on it.

Day 13

Thursday, my birthday!

split string in Rust

See https://stackoverflow.com/questions/26643688/how-do-i-split-a-string-in-rust

let parts = "some string 123 content".split("123");
let collection = parts.collect::<Vec<&str>>();
// collection holds ["some string ", " content"]
dbg!(collection);

Interested in what the dbg!() macro can do? https://dhghomon.github.io/easy_rust/Chapter_38.html

Trouble with returning a vector of data read from a file.
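I didn't record the exact problem. A common version of it, as a sketch: a function cannot return `Vec<&str>` slices pointing into a `String` it read locally, because that `String` is dropped when the function returns; returning owned `String`s avoids the borrow. `read_lines` is a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Does not compile: `content` is dropped at the end of the function,
// but the returned slices would still point into it.
// fn read_lines(path: &Path) -> io::Result<Vec<&str>> {
//     let content = fs::read_to_string(path)?;
//     Ok(content.lines().collect())
// }

/// Returns owned lines, so the result does not borrow from a local.
fn read_lines(path: &Path) -> io::Result<Vec<String>> {
    let content = fs::read_to_string(path)?;
    Ok(content.lines().map(String::from).collect())
}
```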

Day 14

- finally created two tests that:
  - read lines (test cases) from a file and run an assertion on each
  - read pairs of files (a .lox file with a program and an .out file containing the expected output of running that program) and assert that the actual output equals the contents of the .out file
- found out that helper functions in tests need to be “annotated” for conditional compilation with #[cfg(test)]; otherwise rust-analyzer reports them as unused code
- found a workaround: place the tests and all helper functions in a mod some_name {...} block and annotate that single block instead of every function
- discovered cargo clippy and learned a lot!
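A sketch of that layout with a hypothetical `eval_sum` function and `cases` helper. My real tests read their cases from files (e.g. with `std::fs::read_to_string`); here the table is inlined to keep the example self-contained.

```rust
/// Code under test (hypothetical): evaluates a tiny "a+b" expression.
fn eval_sum(src: &str) -> Option<i64> {
    let (a, b) = src.split_once('+')?;
    Some(a.trim().parse::<i64>().ok()? + b.trim().parse::<i64>().ok()?)
}

// One attribute on the module covers the tests *and* their helpers,
// so the helpers are not reported as dead code in non-test builds.
#[cfg(test)]
mod tests {
    use super::eval_sum;

    /// Helper: parses "input => expected" lines, skipping blank ones.
    fn cases(spec: &str) -> Vec<(&str, i64)> {
        spec.lines()
            .filter(|l| !l.trim().is_empty())
            .map(|l| {
                let (input, expected) = l.split_once("=>").expect("case needs =>");
                (input.trim(), expected.trim().parse::<i64>().expect("expected value"))
            })
            .collect()
    }

    #[test]
    fn sums_from_table() {
        for (input, expected) in cases("1+2 => 3\n\n10 + -4 => 6") {
            assert_eq!(eval_sum(input), Some(expected), "input: {input}");
        }
    }
}
```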

Day ?

Saturday. I have variables! And assignments!

2024-09-26

I implemented the if statement. Something makes the interpreter “hang” indefinitely, so I have a bug. Hopefully I can identify it by checking the:

- tokenization (works)
- parsing (hangs here)

Let's see…

Sunday.

References. Recursive types. Oh. That's hard.

- fighting with a stack overflow error; found the excessive recursion through a slow and painful review of the latest changes; some resources:
- fighting a little with the recursive Environment implementation; not complete yet
- OK! I did it! Rc<RefCell<Environment>> was not needed at all; I just boxed the environment
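A sketch of what a boxed scope chain can look like (hypothetical names, values simplified to `f64`): each block scope owns its enclosing scope, lookups and assignments walk outward recursively, and leaving the block hands the outer scope back. This works as long as every scope has a single owner; if functions or closures later need to keep a scope alive after its block ends, shared ownership such as `Rc<RefCell<Environment>>` may come back.

```rust
use std::collections::HashMap;

/// Hypothetical scope chain: each scope owns its enclosing scope.
#[derive(Default)]
struct Environment {
    values: HashMap<String, f64>,
    enclosing: Option<Box<Environment>>,
}

impl Environment {
    /// Enters a block scope that takes ownership of `outer`.
    fn with_enclosing(outer: Environment) -> Self {
        Environment { values: HashMap::new(), enclosing: Some(Box::new(outer)) }
    }

    /// Leaves the block scope, handing back the enclosing environment.
    fn into_enclosing(self) -> Option<Environment> {
        self.enclosing.map(|b| *b)
    }

    fn define(&mut self, name: &str, v: f64) {
        self.values.insert(name.to_string(), v);
    }

    /// Looks the name up here, then recursively in enclosing scopes.
    fn get(&self, name: &str) -> Option<f64> {
        match self.values.get(name) {
            Some(v) => Some(*v),
            None => self.enclosing.as_ref()?.get(name),
        }
    }

    /// Assigns to the nearest scope that already defines `name`.
    fn assign(&mut self, name: &str, v: f64) -> Result<(), String> {
        if let Some(slot) = self.values.get_mut(name) {
            *slot = v;
            return Ok(());
        }
        match self.enclosing.as_mut() {
            Some(outer) => outer.assign(name, v),
            None => Err(format!("Undefined variable '{name}'.")),
        }
    }
}
```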

Summary

- this is not the full interpreter yet, and it has a very ad-hoc design (or lacks one completely)
- I don't feel comfortable with references; I clone everything, which is the top Rust antipattern
- I find enums and pattern matching more readable, easier to navigate and in general superior to the object-oriented approach; I need to think about why, and I'd like to understand the pros and cons of both ideas as ways of representing data and defining behavior
- I need to investigate two concepts to understand them better: using the Result type and representing invalid values as enum variants. Another pair of concepts, a bit mixed up, is parsing (and parsing errors) and evaluating (and runtime errors); they are separated in my code, but not well separated in my understanding.
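A sketch of the first trade-off as I currently see it, with a hypothetical `Value` type and a unary minus: option A keeps an `Invalid` variant inside the value type, option B returns `Result`. With A every operation has to pass `Invalid` along by hand; with B the `?` operator stops at the first runtime error.

```rust
#[derive(Debug, PartialEq)]
enum Value {
    Number(f64),
    Str(String),
    /// Option A: an "invalid" variant carried around like any other value.
    Invalid(String),
}

/// Option A: every operation must check for and forward `Invalid`.
fn negate_a(v: Value) -> Value {
    match v {
        Value::Number(n) => Value::Number(-n),
        Value::Invalid(msg) => Value::Invalid(msg), // propagated by hand
        Value::Str(_) => Value::Invalid("Operand must be a number.".into()),
    }
}

/// Option B: `Result` separates success from failure.
fn negate_b(v: Value) -> Result<Value, String> {
    match v {
        Value::Number(n) => Ok(Value::Number(-n)),
        _ => Err("Operand must be a number.".into()),
    }
}

/// `- - v` with option B: `?` short-circuits on the inner error.
fn double_negate_b(v: Value) -> Result<Value, String> {
    negate_b(negate_b(v)?)
}
```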

The end

My very messy repository on GitHub: https://github.com/kamchy/codecrafters-interpreter-rust