Why create this series?
I guess whenever we launch series we should ask ourselves why that series was created in the first place and It could be any of the following things
- There was a gap in the market, That you wanted to fix.
- You wanted to do it because its your job of writing content
- You see it as a notemaking exercise that you can look back later and revise.
For me its none of the above. I wanna expand this to a fully unfiltered guide on what my state of mind looks like and how it evolves after I have created my first language.
Also, Let's admit it has been always in the back of our mind to create our own programming language someday right, Ever since we started coding?
Initial Exploration
So we start with a simple youtube search first of all and we landed with a pretty good video(Its in hindi tho)
I made my own programming language
From here we got to understand the basics of what a components a programming language requires to function (and a refresher of my college course)
- Lexer/Tokenizer (Converts the program into tokens)
- Abstract Syntax Tree creator (Takes input of the tokens in the previous step and Creates a AST which holds the logical flow how the program should process the tokens step by step)
- Code Generator (Takes the input of created AST and converts it into machine understandable language)
- Code Runner (Runs the machine code generated in the previous step)
I am pretty sure of how I used the above layman terms to describe each component that is required by a programming language to run. They are all going to get their technical descriptions later on and maybe new components might get added as well.
Self study material
While looking for more material to study through on the internet I came across Robert Nystrom CRAFTING INTERPRETERS. So called the HOLY GRAIL OF HOW TO MAKE YOUR OWN PROGRAMMING LANGUAGE
Exactly what we needed!
Now we start reading this
Here is a summary of few pages down the line
Started Reading the Book
Here is what I got to know so far
The book is written to use Java, but there are open source implementations of the same java snippets provided by the community (I Love open source man 💕). We are going to use the
Javascript
andC++
implementations of the sameThe author suggests to be versed with recursion, dynamic arrays, trees, graphs, and hash tables.
Now connecting the dots backwards, It seems like we don't need to implement the lexer from scratch these days for Javascript as there are already libraries for it such as Lex and Yacc, so-called compiler-compilers.
That automatically generate some of the source files for an implementation from some higher-level description.
(We'll explore on what that means exactly later I guess)This is what will be covered in the book. We'll make Two interpreters
First Interpreter - Based on Java. Focuses on being correct to get the basics down.
Second Interpreter - Based on C. Builds upon on the previous implementation and focuses in being fast
The C interpreter will contain a compiler that translates Lox to an efficient bytecode representation, which it then executes. This is the same technique used by implementations of Lua, Python, Ruby, PHP, and many other successful languages. We’ll even try our hand at benchmarking and optimization
Everything we did in JAVA and took it for granted will be implemented from scratch (I am personally very excited about this). Such as Dynamic Arrays
, Hash Tables
and We'll also create our own Garbage Collector
.
At the end we are given this assignment.
Assignment
Define a doubly linked list of heap-allocated strings. Write functions to insert, find, and delete items from it to get a practice of pointers and Test them.
So first off what are HEAP ALLOCATED STRINGS
. A quick Chat GPT search gives us this -
char str[] = "hello";
Above is stack allocated string in C
STACK STRINGS
It's stored on the stack.
It has a fixed size and goes away when the function returns.
char *heapStr = malloc(strlen("hello") + 1);
strcpy(heapStr, "hello");
// Always free heap strings
free(heapStr);
// OR
char *heapStr = strdup("hello");
// Always free heap strings
free(heapStr);
strdup
allocates memory and copies the string.
heapStr
must be free()
d when done to avoid memory leaks.
Now that we understood this. We can start implementing some of the components from day 2
Summary of Day 1
We started analyzing on what it takes to build a programming language from scratch. In that search we found that there are a few essential elements that needs to be in place before we starting working towards implementing interpreter such as Lexer, AST. I don't know about this stuff yet but I am sure we'll be just fine as we go along. For the time I'll be following the book by Robert Nystrom's CRAFTING INTERPRETERS
There he uses Java
and C
but I am gonna try to convert that to javascript
. As in the end I wanna make a language which can be ran on a browser.
TODO Day 2
We are gonna implement the doubly linked list with basic functionalities such as insert, Delete, Find and Create i.e basic CRUD.
Also start with the actual implementation of the Interpreter.