A Security-First Academic Compiler Design Project
AEGIS (Adaptive Execution Guarded Interpreter System) is an academic project that demonstrates a novel security-first execution model. Unlike traditional compilers that prioritize performance, AEGIS starts all code execution in a sandboxed interpreter and only promotes code to optimized execution after it has demonstrated safe behavior through runtime monitoring and trust building.
Security is Default. Performance is Conditional.
- All code starts in a sandboxed interpreter - No code runs optimized initially
- Runtime monitoring builds trust - Safe execution increases trust scores
- Trust enables optimization - Only trusted code gets compiled execution
- Violations trigger rollback - Any security issue reverts to interpreter
- Trust is revocable - System can always return to safe mode
Source Code → Lexer → Parser → AST → Static Analyzer
↓
Sandboxed Interpreter ←→ Runtime Monitor
↓ ↓
Trust Manager ←→ Optimized Executor
↓ ↓
Rollback Handler ←←←←←←←←←←←←←←←←
AEGIS implements a simple toy language for demonstration:
# Variable assignment
x = 10
y = 20
# Arithmetic expressions
result = x + y * 2
# Print statements
print result
Supported Features:
- Integer variables and literals
- Arithmetic operators: +, -, *, /
- Variable assignment with =
- Print statements with print
Security Constraints:
- No user input operations
- No file system access
- No system calls
- Integer-only data types
- Sandboxed execution environment
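A minimal sketch of how source text in this toy language could be split into tokens. The `Token` type, the token names, and the `tokenize` function are illustrative assumptions, not the project's actual lexer API.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str  # NUMBER, IDENT, PRINT, OP, or ASSIGN (names are assumptions)
    text: str
    line: int


# Hypothetical token grammar covering integers, identifiers, + - * /, = and comments.
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+)
  | (?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<OP>[+\-*/])
  | (?P<ASSIGN>=)
  | (?P<SKIP>[ \t]+)
  | (?P<COMMENT>\#.*)
  | (?P<ERROR>.)
""", re.VERBOSE)


def tokenize(source: str) -> List[Token]:
    """Split source into tokens, skipping whitespace and # comments."""
    tokens = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in TOKEN_RE.finditer(line):
            kind, text = match.lastgroup, match.group()
            if kind in ("SKIP", "COMMENT"):
                continue
            if kind == "ERROR":
                raise SyntaxError(
                    f"[LEXICAL] Invalid character {text!r} at line {lineno}"
                )
            if kind == "IDENT" and text == "print":
                kind = "PRINT"  # 'print' is the language's only keyword
            tokens.append(Token(kind, text, lineno))
    return tokens


if __name__ == "__main__":
    for token in tokenize("result = x + y * 2\nprint result"):
        print(token)
```

Anything outside the listed features (for example a `$` or a quote character) falls through to the catch-all branch and is reported as a lexical error, which mirrors the integer-only, no-I/O constraints above.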
- Clone the repository:
git clone <repository-url>
cd aegis
- Install dependencies:
pip install -r requirements.txt
- Run tests:
pytest tests/
# Execute a program file
python main.py program.aegis
# Interactive mode
python main.py --interactive
# Batch execution mode
python main.py --batch file1.aegis file2.aegis
# Verbose output with detailed logging
python main.py --verbose program.aegis
# Show system status and statistics
python main.py --status
# Show help
python main.py --help
- --interactive, -i: Start interactive REPL mode
- --batch, -b: Execute multiple files in sequence
- --verbose, -v: Enable detailed execution logging
- --status, -s: Display system status and trust statistics
- --trust-file FILE: Specify custom trust persistence file
- --help, -h: Show help message
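The options above map naturally onto Python's `argparse`. The following is a sketch only: the `build_parser` function, the `files` positional, and the default trust file name (taken from the interactive session output) are assumptions about how `main.py` might be organized.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented flags; -h/--help is added by argparse."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="AEGIS: security-first execution of toy-language programs",
    )
    parser.add_argument("files", nargs="*", help="program file(s) to execute")
    parser.add_argument("-i", "--interactive", action="store_true",
                        help="start interactive REPL mode")
    parser.add_argument("-b", "--batch", action="store_true",
                        help="execute multiple files in sequence")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable detailed execution logging")
    parser.add_argument("-s", "--status", action="store_true",
                        help="display system status and trust statistics")
    parser.add_argument("--trust-file", metavar="FILE",
                        default=".aegis_trust.json",
                        help="custom trust persistence file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Treating `--batch` as a boolean flag with the file names collected by the positional argument keeps `--batch file1.aegis file2.aegis` and the single-file form on the same code path.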
$ python main.py --interactive
[AEGIS] Starting interactive mode
[AEGIS] Trust file: .aegis_trust.json
[AEGIS] Type 'help' for commands, 'exit' to quit
aegis> x = 10
[MODE] Interpreted Execution
[TRUST] Score = 0.1 (9 successful executions needed for optimization)
aegis> y = x + 5
[MODE] Interpreted Execution
[TRUST] Score = 0.2 (8 successful executions needed for optimization)
aegis> print y
15
[MODE] Interpreted Execution
[TRUST] Score = 0.3 (7 successful executions needed for optimization)
# After multiple safe executions...
aegis> result = x * y
[MODE] Optimized Execution (2.1x speedup)
[TRUST] Score = 1.2 (trusted code)
aegis> status
=== AEGIS System Status ===
Execution Mode: Optimized
Trust Score: 1.2
Total Executions: 12
Successful Executions: 12
Violations: 0
Cache Entries: 3
Optimization Ratio: 2.1x
aegis> help
Available commands:
help - Show this help message
status - Display system status
reset - Reset trust score to 0
clear - Clear execution history
exit - Exit interactive mode
- Lexer: Converts source code into tokens for parsing.
- Parser: Builds Abstract Syntax Trees using recursive descent parsing.
- AST: Defines AST node types and provides tree manipulation utilities.
- Static Analyzer: Performs compile-time security checks before execution.
- Sandboxed Interpreter: Default secure execution environment with complete isolation.
- Runtime Monitor: Tracks execution behavior and detects security violations.
- Trust Manager: Maintains trust scores and makes optimization decisions.
- Optimized Executor: Simulated compiled execution for trusted code.
- Rollback Handler: Manages transitions back to interpreter on violations.
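To make the parsing and interpretation steps concrete, here is a small recursive-descent sketch for arithmetic expressions. The tuple-based AST, the `Parser` class, and `evaluate` are hypothetical stand-ins for the project's own node types; using floor division for `/` is an assumption based on the integer-only constraint.

```python
import re


class Parser:
    """Grammar: expr := term (('+'|'-') term)* ; term := factor (('*'|'/') factor)*"""

    def __init__(self, text):
        # Simplified tokenization; characters outside the grammar are ignored here.
        self.tokens = re.findall(r"\d+|[A-Za-z_]\w*|[+\-*/]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        node = self.parse_expr()
        if self.peek() is not None:
            raise SyntaxError(f"[SYNTAX] Unexpected token {self.peek()!r}")
        return node

    def parse_expr(self):
        node = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = ("bin", op, node, self.parse_term())  # left-associative
        return node

    def parse_term(self):
        node = self.parse_factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = ("bin", op, node, self.parse_factor())
        return node

    def parse_factor(self):
        tok = self.advance()
        if tok is None:
            raise SyntaxError("[SYNTAX] Unexpected end of expression")
        if tok.isdigit():
            return ("num", int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            return ("var", tok)
        raise SyntaxError(f"[SYNTAX] Unexpected token {tok!r}")


def evaluate(node, env):
    """Walk the AST; env maps variable names to integers."""
    if node[0] == "num":
        return node[1]
    if node[0] == "var":
        if node[1] not in env:
            raise NameError(f"[SEMANTIC] Undefined variable: {node[1]}")
        return env[node[1]]
    _, op, left, right = node
    a, b = evaluate(left, env), evaluate(right, env)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if b == 0:
        raise ZeroDivisionError("[RUNTIME] Division by zero detected")
    return a // b  # assumption: integer-only language uses floor division


if __name__ == "__main__":
    tree = Parser("x + y * 2").parse()
    print(evaluate(tree, {"x": 10, "y": 20}))  # prints 50
```

Splitting the grammar into `parse_expr` and `parse_term` is what gives `*` and `/` higher precedence than `+` and `-` without any explicit precedence table.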
- Initial Score: 0.0 (untrusted)
- Safe Execution: +0.1 per successful run
- Optimization Threshold: 1.0
- Violation Penalty: Reset to 0.0
- Persistence: Scores saved across sessions
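The scoring rules above can be sketched as a small class. This is an illustration under stated assumptions: the `TrustManager` method names and the JSON layout (program name mapped to a count of tenths) are hypothetical, not the project's actual persistence format.

```python
import json
from pathlib import Path


class TrustManager:
    """Tracks per-program trust in integer tenths to avoid float drift."""

    STEP_TENTHS = 1        # +0.1 per successful run
    THRESHOLD_TENTHS = 10  # optimization threshold of 1.0

    def __init__(self, path=".aegis_trust.json"):
        self.path = Path(path)
        self.tenths = {}
        if self.path.exists():
            self.tenths = json.loads(self.path.read_text())

    def score(self, program):
        return self.tenths.get(program, 0) / 10

    def is_trusted(self, program):
        return self.tenths.get(program, 0) >= self.THRESHOLD_TENTHS

    def record_success(self, program):
        self.tenths[program] = self.tenths.get(program, 0) + self.STEP_TENTHS
        self._save()
        return self.score(program)

    def record_violation(self, program):
        self.tenths[program] = 0  # violation penalty: reset to 0.0
        self._save()

    def _save(self):
        self.path.write_text(json.dumps(self.tenths))
```

Counting in tenths is a deliberate choice: adding the float `0.1` ten times yields `0.9999999999999999` in Python, which would leave a program one run short of the 1.0 threshold.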
Interpreted Mode (Default):
- Complete safety guarantees
- Full sandboxing and monitoring
- Baseline execution speed
- Zero optimization overhead
Optimized Mode (Trust-Based):
- ~2.1x average speedup over interpreted mode
- Cached AST compilation and execution paths
- Constant folding and expression simplification
- Dead code elimination simulation
- Continued security monitoring
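As an illustration of the constant folding mentioned above, the sketch below simplifies a tuple-based expression tree. The node shapes `("num", n)`, `("var", name)`, and `("bin", op, left, right)` and the `fold` function are assumptions made for this example.

```python
def fold(node):
    """Return an equivalent tree with constant subexpressions pre-computed."""
    if node[0] != "bin":
        return node  # literals and variables are already as simple as possible
    _, op, left, right = node
    left, right = fold(left), fold(right)
    if left[0] == "num" and right[0] == "num":
        a, b = left[1], right[1]
        if op == "+":
            return ("num", a + b)
        if op == "-":
            return ("num", a - b)
        if op == "*":
            return ("num", a * b)
        if op == "/" and b != 0:
            return ("num", a // b)  # assumption: integer floor division
        # Division by a literal zero is left unfolded so the normal
        # error path still reports it.
    return ("bin", op, left, right)


if __name__ == "__main__":
    tree = ("bin", "+", ("var", "x"), ("bin", "*", ("num", 2), ("num", 3)))
    print(fold(tree))  # ('bin', '+', ('var', 'x'), ('num', 6))
```

Leaving a literal division by zero untouched means folding never hides an error that interpreted mode would have raised.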
Executions: 1 2 3 4 5 6 7 8 9 10 11+
Trust Score: 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1+
Mode: Interpreted ────────────────────────────────────→ Optimized
Example performance comparison for arithmetic-heavy programs:
Program: Fibonacci calculation (n=20)
Interpreted Mode: ~1.2ms execution time
Optimized Mode: ~0.6ms execution time
Speedup: 2.0x
Program: Complex arithmetic expressions
Interpreted Mode: ~0.8ms execution time
Optimized Mode: ~0.35ms execution time
Speedup: 2.3x
Program: Variable-heavy assignments
Interpreted Mode: ~0.5ms execution time
Optimized Mode: ~0.25ms execution time
Speedup: 2.0x
Note: Performance numbers are simulated for academic demonstration
- Undefined variable detection - Prevents use of uninitialized variables
- Arithmetic overflow prevention - Warns about potential integer overflow
- Division by zero detection - Catches literal division by zero at compile time
- Expression validation - Ensures well-formed expressions within depth limits
- Type safety enforcement - Validates identifier formats and usage
- Instruction count limits - Caps the number of executed instructions to stop runaway execution
- Memory usage tracking - Monitors variable storage and prevents excessive usage
- Operation validation - Validates all arithmetic operations at runtime
- Violation detection - Real-time detection of security policy violations
- Execution metrics - Comprehensive statistics collection
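Three of the static checks listed above (undefined variables, literal division by zero, and expression depth) can be sketched in a few lines. The statement and expression tuples, the `analyze` function, and the wording of the division message are hypothetical; the depth limit of 10 follows the "max depth: 10" message quoted in the troubleshooting section.

```python
MAX_DEPTH = 10  # nesting limit, per the "Expression too deeply nested" message


def check_expr(expr, defined, depth=1):
    """Return a list of issue strings for one expression tree."""
    if depth > MAX_DEPTH:
        return [f"[SEMANTIC] Expression too deeply nested (max depth: {MAX_DEPTH})"]
    issues = []
    if expr[0] == "var" and expr[1] not in defined:
        issues.append(f"[SEMANTIC] Undefined variable: {expr[1]}")
    elif expr[0] == "bin":
        _, op, left, right = expr
        if op == "/" and right == ("num", 0):
            issues.append("[SEMANTIC] Division by literal zero")
        issues += check_expr(left, defined, depth + 1)
        issues += check_expr(right, defined, depth + 1)
    return issues


def analyze(program):
    """program is a list of ("assign", name, expr) or ("print", expr) tuples."""
    defined, issues = set(), []
    for stmt in program:
        if stmt[0] == "assign":
            _, name, expr = stmt
            issues += check_expr(expr, defined)
            defined.add(name)  # the name is usable only after its assignment
        else:
            issues += check_expr(stmt[1], defined)
    return issues


if __name__ == "__main__":
    program = [
        ("assign", "x", ("num", 10)),
        ("assign", "r", ("bin", "/", ("var", "x"), ("num", 0))),
        ("print", ("var", "counter")),
    ]
    for issue in analyze(program):
        print(issue)
```

Only a literal zero divisor is flagged here; a divisor held in a variable, such as `y = 0` followed by `x / y`, passes static analysis and is left to runtime monitoring.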
AEGIS provides comprehensive error reporting with categorized, descriptive messages:
- [LEXICAL] - Invalid characters or token formation errors
- [SYNTAX] - Grammar violations and parsing errors
- [SEMANTIC] - Undefined variables and type errors
- [RUNTIME] - Division by zero and execution errors
- [SECURITY] - Policy violations and security breaches
[CATEGORY] [ERROR_CODE]
Error description with context
Location: line X, column Y
Token: 'problematic_token'
Variables: {current_state}
Suggestions:
- Specific actionable advice
- Alternative approaches
- Common fix patterns
# Undefined variable error
[SEMANTIC] [SEM001]
Undefined variable: counter
Token: 'counter'
Suggestions:
- Define variable 'counter' before using it
- Check for typos in variable names
- Ensure variable assignments come before usage
# Division by zero error
[RUNTIME] [RUN001]
Division by zero detected
Variables: x=10, y=0
Suggestions:
- Ensure divisor is not zero
- Add conditional checks before division
- Use non-zero values for division
- Runtime errors - Division by zero, overflow, invalid operations
- Security violations - Policy breaches, resource limit exceeded
- Static analysis failures - Undefined variables, invalid expressions
- System errors - Internal failures, corruption detection
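The categorized message layout shown earlier in this section can be produced by a small helper. This is a sketch; the `format_error` function and its parameter names are assumptions, and optional fields are simply omitted when no value is supplied.

```python
def format_error(category, code, description, suggestions=(),
                 line=None, column=None, token=None, variables=None):
    """Render an error in the '[CATEGORY] [ERROR_CODE]' layout."""
    lines = [f"[{category}] [{code}]", description]
    if line is not None and column is not None:
        lines.append(f"Location: line {line}, column {column}")
    if token is not None:
        lines.append(f"Token: '{token}'")
    if variables:
        pairs = ", ".join(f"{name}={value}" for name, value in variables.items())
        lines.append(f"Variables: {pairs}")
    if suggestions:
        lines.append("Suggestions:")
        lines.extend(f"- {tip}" for tip in suggestions)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_error(
        "RUNTIME", "RUN001", "Division by zero detected",
        suggestions=["Ensure divisor is not zero"],
        variables={"x": 10, "y": 0},
    ))
```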
- Trust scores are automatically saved to .aegis_trust.json
- Scores persist across program runs and sessions
- Trust can be manually reset or cleared
- Multiple programs can have independent trust scores
# basic_math.aegis
x = 10
y = 20
sum = x + y
product = x * y
print sum
print product
# trust_demo.aegis
# Run this multiple times to see trust building
counter = 1
result = counter * 2
print result
# violation_demo.aegis
# This will trigger rollback if run in optimized mode
x = 10
y = 0
result = x / y # Division by zero
print result
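What happens when a program like violation_demo.aegis fails in optimized mode can be sketched as a snapshot-and-restore step. Everything here is hypothetical scaffolding: the `Session` class, the `SecurityViolation` exception, and the use of plain Python callables in place of parsed AEGIS statements.

```python
import copy


class SecurityViolation(Exception):
    """Raised by a (hypothetical) runtime monitor when a policy is broken."""


def divide(a, b):
    if b == 0:
        raise SecurityViolation("Division by zero")
    return a // b


class Session:
    PROMOTE_AFTER = 10  # successful runs needed to reach trust score 1.0

    def __init__(self):
        self.variables = {}
        self.successes = 0
        self.log = []

    @property
    def mode(self):
        return "optimized" if self.successes >= self.PROMOTE_AFTER else "interpreted"

    def run(self, program):
        """program is a callable that mutates the variable dict."""
        snapshot = copy.deepcopy(self.variables)
        try:
            program(self.variables)
        except SecurityViolation as exc:
            self.variables = snapshot  # discard partial writes
            self.successes = 0         # trust reset, so mode is interpreted again
            self.log.append(f"[SECURITY] Rollback triggered: {exc}")
            return False
        self.successes += 1
        return True


def violation_demo(v):
    v["x"] = 10
    v["y"] = 0
    v["result"] = divide(v["x"], v["y"])


if __name__ == "__main__":
    session = Session()
    for _ in range(10):
        session.run(lambda v: v.update(counter=1, result=2))
    print(session.mode)          # optimized
    session.run(violation_demo)
    print(session.mode, session.log[-1])
```

Restoring the snapshot means the assignments to `x` and `y` made before the failing division do not leak into the session state after rollback.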
Current Implementation Status:
- Project structure and foundations
- Lexical analysis (Lexer)
- Syntax analysis (Parser + AST)
- Static security analysis
- Sandboxed interpreter
- Runtime monitoring
- Trust management
- Optimized execution
- Rollback handling
- System integration
- Comprehensive error handling
- Documentation and examples (in progress)
AEGIS uses a comprehensive testing approach with 237 total tests:
Unit Tests (150+ tests):
- Component-specific behavior validation
- Edge case handling and error conditions
- API contract verification
- Individual module functionality
Property-Based Tests (15 properties):
- Universal correctness properties using Hypothesis
- Randomized input validation across large test spaces
- Round-trip consistency verification
- Semantic equivalence between execution modes
Integration Tests (8 tests):
- End-to-end system validation
- Component interaction verification
- Pipeline execution completeness
- Cross-component data flow
Security Tests:
- Violation detection and rollback validation
- Trust score lifecycle management
- Error handling and recovery
- Rollback state consistency
- Property 1: Tokenization Round-Trip Consistency
- Property 2: Parsing Round-Trip Consistency
- Property 3: Arithmetic Expression Correctness
- Property 4: Variable Assignment Consistency
- Property 5: Print Statement Output Correctness
- Property 6: Static Analysis Undefined Variable Detection
- Property 7: Execution State Isolation
- Property 8: Trust Score Lifecycle Management
- Property 9: Execution Mode Transition Correctness
- Property 10: Semantic Equivalence Between Execution Modes
- Property 11: Runtime Monitoring Completeness
- Property 12: Rollback State Consistency
- Property 13: Error Message Descriptiveness
- Property 14: Pipeline Execution Completeness
- Property 15: Console Output Visibility
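To show what a property such as semantic equivalence between execution modes might check, the sketch below compares direct interpretation against evaluation of a constant-folded tree on randomly generated expressions. It uses the standard library's `random` module in place of Hypothesis so that it runs without extra dependencies; the tuple AST and all function names are assumptions made for this example.

```python
import random

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a // b,  # raises ZeroDivisionError on b == 0
}


def interpret(node, env):
    if node[0] == "num":
        return node[1]
    if node[0] == "var":
        return env[node[1]]
    _, op, left, right = node
    return OPS[op](interpret(left, env), interpret(right, env))


def fold(node):
    """Stand-in for the optimized path: pre-compute constant subtrees."""
    if node[0] != "bin":
        return node
    _, op, left, right = node
    left, right = fold(left), fold(right)
    if left[0] == "num" and right[0] == "num" and not (op == "/" and right[1] == 0):
        return ("num", OPS[op](left[1], right[1]))
    return ("bin", op, left, right)


def random_expr(rng, depth=0):
    if depth >= 4 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return ("num", rng.randint(0, 9))
        return ("var", rng.choice("xy"))
    return ("bin", rng.choice("+-*/"),
            random_expr(rng, depth + 1), random_expr(rng, depth + 1))


def outcome(thunk):
    try:
        return ("ok", thunk())
    except ZeroDivisionError:
        return ("error", "division by zero")


def check_equivalence(trials=500, seed=0):
    """Both paths must agree on the value, or both must fail."""
    rng = random.Random(seed)
    for _ in range(trials):
        expr = random_expr(rng)
        env = {"x": rng.randint(-5, 5), "y": rng.randint(-5, 5)}
        expected = outcome(lambda: interpret(expr, env))
        actual = outcome(lambda: interpret(fold(expr), env))
        assert expected == actual, expr
    return trials


if __name__ == "__main__":
    print(check_equivalence(), "random expressions agreed")
```

With Hypothesis, `random_expr` would typically be replaced by a recursive strategy and the loop by a decorated test function, which also adds shrinking of failing examples.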
# Run all tests
pytest tests/ -v
# Run specific test categories
pytest tests/test_lexer*.py -v # Lexer tests
pytest tests/test_parser*.py -v # Parser tests
pytest tests/test_interpreter*.py -v # Interpreter tests
pytest tests/test_*_properties.py -v # Property-based tests
pytest tests/test_integration*.py -v # Integration tests
# Run with coverage
pytest tests/ --cov=aegis --cov-report=html
# Run property tests with more examples
pytest tests/test_*_properties.py --hypothesis-max-examples=1000
# Run tests with detailed output
pytest tests/ -v --tb=long
================================ test session starts =================================
platform win32 -- Python 3.10.0, pytest-8.4.2, pluggy-1.6.0
collected 237 items
tests/test_ast.py ........................ [ 10%]
tests/test_error_message_properties.py ...... [ 12%]
tests/test_foundation.py ........... [ 17%]
tests/test_integration_checkpoint.py ........ [ 20%]
tests/test_interpreter_properties.py ........... [ 25%]
tests/test_lexer.py ...................... [ 34%]
tests/test_lexer_properties.py ....... [ 37%]
tests/test_optimized_executor.py .................... [ 45%]
tests/test_optimized_executor_properties.py ...... [ 48%]
tests/test_parser.py ............................. [ 60%]
tests/test_parser_properties.py ........ [ 64%]
tests/test_pipeline_properties.py ..... [ 66%]
tests/test_rollback_properties.py ... [ 67%]
tests/test_rollback_unit.py ...... [ 70%]
tests/test_runtime_monitor_properties.py ..... [ 72%]
tests/test_static_analyzer.py ........................... [ 83%]
tests/test_static_analyzer_properties.py ........ [ 86%]
tests/test_trust_manager.py ......................... [ 97%]
tests/test_trust_manager_properties.py ...... [100%]
============================ 237 passed in 18.60s ============================
This project is designed for Compiler Design coursework and demonstrates:
- Lexical and Syntax Analysis: Traditional compiler front-end
- Security-First Design: Novel execution model prioritizing safety
- Runtime Systems: Monitoring, trust management, and adaptive execution
- System Architecture: Component interaction and data flow
Important Note: Native compilation is simulated for academic demonstration. This is not a production compiler but an educational tool for understanding compiler concepts and security-adaptive execution models.
Q: Program fails with "Undefined variable" error
[SEMANTIC] [SEM001]
Undefined variable: x
A: Variables must be defined before use. Check for typos and ensure assignments come before usage.
Q: Trust score not increasing
[TRUST] Score = 0.1 (no change)
A: Trust only increases with successful executions. Fix any errors first, then run the program multiple times.
Q: Division by zero in optimized mode
[SECURITY] Rollback triggered: Division by zero
[MODE] Switched to Interpreted Execution
[TRUST] Score reset to 0.0
A: This is expected behavior. The system detected a violation and rolled back to safe mode. Fix the division by zero and rebuild trust.
Q: "Expression too deeply nested" error
[SEMANTIC] Expression too deeply nested (max depth: 10)
A: Simplify complex expressions by breaking them into multiple assignment statements.
Enable verbose logging for detailed execution information:
python main.py --verbose program.aegis
This provides:
- Detailed tokenization output
- AST structure visualization
- Static analysis reports
- Runtime monitoring data
- Trust score calculations
- Optimization decisions
- Toy Language: Limited to basic arithmetic and variables for academic demonstration
- Integer-Only: No support for strings, floats, arrays, or complex data types
- No Control Flow: No if/else statements, loops, or functions
- No I/O Operations: Restricted to prevent security issues (only print output)
- No User Input: No ability to read from stdin or files
- Simulated Compilation: No actual machine code generation - optimizations are simulated
- Academic Scope: Designed for learning compiler concepts, not production use
- Single-Threaded: No concurrency or parallel execution support
- Memory Model: Simplified variable storage without advanced memory management
- No Modules: No import/export system or code organization features
- Simplified Threat Model: Focuses on basic arithmetic safety, not comprehensive security
- No Network Security: No consideration of network-based attacks
- Limited Resource Control: Basic monitoring without advanced resource management
- Trust Model Simplicity: Basic score-based system without sophisticated analysis
- Interpreter Overhead: Significant performance cost in interpreted mode
- Simulated Optimizations: Performance improvements are simulated, not real
- No Advanced Optimizations: Missing loop unrolling, vectorization, etc.
- Cache Limitations: Simple AST caching without advanced compilation techniques
These limitations are intentional design choices to keep the project focused on demonstrating core compiler and security concepts within an academic context.
This is an academic project. To contribute for educational purposes:
- Fork the repository
- Create a feature branch
- Implement components following the design
- Add comprehensive tests
- Submit a pull request
This project is created for academic purposes. Please respect educational integrity guidelines when using this code for coursework.
AEGIS - Where Security Comes First, Performance Comes Second