Skip to content

DeepikaKaranji/Rust-Compiler-with-C

Repository files navigation

Rust-Compiler-with-C

License: MIT Maintenance

A basic Rust compiler built with C

Introduction

Rust is a multi-paradigm system programming language focused on safety, especially safe concurrency. This compiler takes handles the following constructs:

  • If-Else constructs
  • While constructs
  • For constructs
  • All arithmetic operators
  • Single Line Comments (//)
  • Multi Line Comments (/* … */)

Semantically we have checked the following:

  • Whether any variable used on the RHS is defined and in the current scope or any Enclosing Scope of the current scope.
  • Missing semicolons
  • Assignment before Declaration

Design Strategy

SYMBOL TABLE GENERATION

Symbol table has been created in the Lexical analyser phase. It is implemented using nested structures which hold the lexeme, datatype, value, scope and line number.
It is assumed that each level of scope has 20 variables, however this can be modified easily. The Node structure is used to handle scope.

Abstract Syntax Tree

For the Abstract Syntax Tree, We have 2 Types of Nodes, Leaf nodes and Internal nodes. The nodes can have a variable number of children (0-3) depending upon the construct it represents.

To display the AST, We take the AST and store it as a matrix of levels. Leaf nodes in the AST representing identifiers, constants, Lists, packages point to a record in the symbol table.

Intermediate Code Generation

The Intermediate code is generated by recursively stepping through the AST. 3 Address code format is used.

Code Optimization

  • Dead Code Elimination:
    We Eliminate dead code, specifically unused variables in the whole program. For example, if we have the following lines of code:
    a = 10
    b = 10
    c = a + b
    And these 3 variables are not used on any other RHS, then these 3 lines of code are Eliminated during optimization. All the dead variables in the code are removed.
  • Elimination of common subexpressions
  • Constant folding

Error Handling

Panic mode recovery method of error handling has been employed. This means that the compiler ignores the input until the next delimiter (semicolon here) is obtained.

The scope checking is handled by recursively stepping through enclosing scopes and finding the most recent definition of the variable. If no definition is found, we print the error.

TARGET CODE GENERATION

We are using ARM 7 architecture. In this simple compiler, the variable is read from its slot in the memory to a register, performing operation and immediately writing it back into the memory. This way, we never really run out of registers because at most only 3 registers will be used. Liveness of variables is already taken care of at the code optimization phase.
ARM7 has 31 GPRs and 6 SPRs. The input to the assembly code generator is the optimized ICG from the previous phase of building the compiler.

Instructions to build and run program

Token generation
lex lexical_analysis.l

Symbol table generation
yacc -d lexical_analysis.y lex lexical_analysis.l gcc lex.yy.c y.tab.c -ll ./a.out

AST generation
lex ast.l
yacc -d ast.y
lex 2.l
gcc lex.yy.c y.tab.c -ll
./a.out

Intermediate Code Generation
yacc -d icg.y gcc lex.yy.c y.tab.c -ll ./a.out<test1.rs

Code Optimization
python3 code_optimize.py icg_ip.txt

Assembly code generation (ARM)
python3 acg.py icg_ip.txt

Future Work

The Compiler can be further enhanced by adding

  • Error recovery
  • Better memory management
  • Semantic analysis for Parameter matching
  • More efficient optimization techniques
  • Next use the algorithm for register allocation by making it a two pass assembler.

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

A rust mini-compiler built with C

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors