# Parser Enhancement Requests
**Date**: 2024-12-15
**From**: DB25 Semantic Analyzer Team
**To**: DB25 Parser Team
**Status**: Wish List (Our architecture does not depend on these)

## Executive Summary

This document lists parser enhancements that would extend the semantic analyzer's capabilities and simplify its implementation. We have already implemented workarounds for every issue listed here, so these are **nice-to-have** improvements, not blockers.

---

## Priority 1: Missing SQL Data Types

### Issue
The current `ast::DataType` enum is missing several standard SQL types that are commonly used in production databases.

### Requested Enhancements
```cpp
enum class DataType {
    // Existing types...

    // Missing numeric types
    Numeric,      // NUMERIC(p,s) - exact numeric with precision/scale
    Money,        // MONEY type for currency

    // Missing string types
    NChar,        // NCHAR for Unicode fixed-length
    NVarChar,     // NVARCHAR for Unicode variable-length
    Clob,         // CLOB for large text

    // Missing binary types
    ByteA,        // BYTEA for binary data (PostgreSQL style)
    VarBinary,    // VARBINARY(n)

    // Missing specialized types
    Uuid,         // UUID type
    Json,         // JSON data type
    Jsonb,        // Binary JSON (PostgreSQL)
    Xml,          // XML data type
    Inet,         // IP address type
    MacAddr,      // MAC address type

    // Missing array/composite
    Array,        // Typed arrays; element type stored separately (enumerators cannot be templates)
    Record,       // Record/row type
    Table,        // Table type
};
```

### Current Workaround
```cpp
// We map unknown types to SemanticType::Unknown and infer from context
SemanticType infer_missing_type(ast::ASTNode* node) {
    auto text = node->primary_text;
    if (contains_ignore_case(text, "NUMERIC")) return SemanticType::Decimal;
    if (contains_ignore_case(text, "JSON")) return SemanticType::Json;
    if (contains_ignore_case(text, "UUID")) return SemanticType::Uuid;
    // ... etc
    return SemanticType::Unknown;
}
```
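
One caveat with substring matching is that `"JSON"` also matches `"JSONB"`. A minimal sketch of an exact keyword lookup follows. The `SemanticType` enum below is a stand-in with only the members used above, and `leading_keyword` and `infer_type_from_name` are hypothetical helper names, not part of the existing adapter.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>
#include <unordered_map>

// Stand-in enum; the analyzer's real SemanticType likely has more members.
enum class SemanticType { Unknown, Decimal, Json, Uuid };

// Extracts the leading type keyword, upper-cased: "numeric(10,2)" -> "NUMERIC".
inline std::string leading_keyword(std::string_view text) {
    std::string out;
    for (char c : text) {
        auto uc = static_cast<unsigned char>(c);
        if (!std::isalnum(uc) && c != '_') break;
        out.push_back(static_cast<char>(std::toupper(uc)));
    }
    return out;
}

// Exact keyword lookup: a type maps only if its keyword is listed explicitly.
inline SemanticType infer_type_from_name(std::string_view type_text) {
    static const std::unordered_map<std::string, SemanticType> kTypes = {
        {"NUMERIC", SemanticType::Decimal},
        {"JSON", SemanticType::Json},
        {"UUID", SemanticType::Uuid},
    };
    auto it = kTypes.find(leading_keyword(type_text));
    return it == kTypes.end() ? SemanticType::Unknown : it->second;
}

// Usage example.
inline void type_lookup_example() {
    assert(infer_type_from_name("numeric(10,2)") == SemanticType::Decimal);
    // JSONB is not listed in the table, so it is not mistaken for JSON.
    assert(infer_type_from_name("JSONB") == SemanticType::Unknown);
}
```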

---

## Priority 2: AST Node Enhancements

### Issue 1: Missing Parent Pointers
AST nodes don't have parent pointers, making upward traversal difficult.

### Requested Enhancement
```cpp
class ASTNode {
    ASTNode* parent = nullptr;  // Add this: non-owning back-pointer, null for the root
    // existing members...
};
```

### Current Workaround
```cpp
// We build a parent map manually
auto parent_map = ParserAdapter::build_parent_map(root);
```
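
For reference, building such a parent map amounts to one tree walk. The sketch below is an illustration only: `Node` is a hypothetical minimal node with a `children` vector, since the real `ast::ASTNode` layout is not shown in this document, and the actual `ParserAdapter::build_parent_map` may differ.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical minimal node; the real ast::ASTNode may link children differently.
struct Node {
    std::vector<Node*> children;
};

using ParentMap = std::unordered_map<const Node*, const Node*>;

// Iterative depth-first walk that records each child's parent.
// The root is mapped to nullptr.
inline ParentMap build_parent_map(const Node* root) {
    assert(root != nullptr);
    ParentMap parents;
    parents[root] = nullptr;
    std::vector<const Node*> stack{root};
    while (!stack.empty()) {
        const Node* node = stack.back();
        stack.pop_back();
        for (const Node* child : node->children) {
            parents[child] = node;
            stack.push_back(child);
        }
    }
    return parents;
}
```

The map costs one hash entry per node and has to be rebuilt whenever the tree changes, which is the overhead a built-in `parent` pointer would remove.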

### Issue 2: Missing Source Location
Not all AST nodes have accurate line/column information.

### Requested Enhancement
```cpp
struct SourceLocation {
    uint32_t line;
    uint32_t column;
    uint32_t offset;
    uint32_t length;
};

class ASTNode {
    SourceLocation location;  // Add complete location info
    // existing members...
};
```

### Current Workaround
```cpp
// We track approximate locations during traversal
class LocationTracker {
    std::unordered_map<ast::ASTNode*, SourceLocation> locations;
};
```
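
If a node's byte offset is known, line and column can be derived from it without storing them per node. The sketch below assumes such an offset is available; `LineIndex` and `LineCol` are hypothetical names, not part of the current tracker.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

struct LineCol {
    std::uint32_t line;    // 1-based
    std::uint32_t column;  // 1-based, counted in bytes
};

// Precomputes the byte offset at which each line starts.
class LineIndex {
public:
    explicit LineIndex(std::string_view sql) : size_(sql.size()) {
        starts_.push_back(0);
        for (std::size_t i = 0; i < sql.size(); ++i) {
            if (sql[i] == '\n') starts_.push_back(i + 1);
        }
    }

    // Binary search for the last line start that is <= offset.
    LineCol locate(std::size_t offset) const {
        assert(offset <= size_);
        auto it = std::upper_bound(starts_.begin(), starts_.end(), offset);
        auto line = static_cast<std::size_t>(it - starts_.begin());  // 1-based
        return {static_cast<std::uint32_t>(line),
                static_cast<std::uint32_t>(offset - starts_[line - 1] + 1)};
    }

private:
    std::vector<std::size_t> starts_;
    std::size_t size_;
};
```

For the input `"SELECT a\nFROM t"`, offset 9 (the `F` of `FROM`) resolves to line 2, column 1.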

---

## Priority 3: Visitor Pattern Support

### Issue
The AST doesn't support the visitor pattern, requiring switch statements for dispatch.

### Requested Enhancement
```cpp
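class IASTVisitor {
public:
    virtual ~IASTVisitor() = default;
    virtual void visit(SelectStmt* stmt) = 0;
    virtual void visit(BinaryExpr* expr) = 0;
    // ... for all node types
};

class ASTNode {
public:
    virtual ~ASTNode() = default;  // safe destruction through base pointers
    virtual void accept(IASTVisitor* visitor) = 0;
};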
```

### Current Workaround
```cpp
// We use switch with our adapter
template<typename Result>
Result dispatch_node(ast::ASTNode* node) {
    switch (node->node_type) {
        case ast::NodeType::SelectStmt:
            return handle_select(node);
        // ... all cases
        default:
            return Result{};  // unhandled node type; assumes Result is default-constructible
    }
}
```
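
To make the requested double dispatch concrete, here is a small self-contained sketch. The node classes are simplified stand-ins (a `SelectStmt` that only holds children, an empty `BinaryExpr`), and `NodeCounter` is a hypothetical visitor; none of this reflects the parser's actual class hierarchy.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct SelectStmt;
struct BinaryExpr;

struct IASTVisitor {
    virtual ~IASTVisitor() = default;
    virtual void visit(SelectStmt& stmt) = 0;
    virtual void visit(BinaryExpr& expr) = 0;
};

struct ASTNode {
    virtual ~ASTNode() = default;
    virtual void accept(IASTVisitor& visitor) = 0;
};

// Each concrete node forwards to the overload matching its static type.
struct BinaryExpr : ASTNode {
    void accept(IASTVisitor& visitor) override { visitor.visit(*this); }
};

struct SelectStmt : ASTNode {
    std::vector<std::unique_ptr<ASTNode>> children;
    void accept(IASTVisitor& visitor) override { visitor.visit(*this); }
};

// Example visitor: counts nodes by kind, recursing into SELECT children.
struct NodeCounter : IASTVisitor {
    int selects = 0;
    int binaries = 0;

    void visit(SelectStmt& stmt) override {
        ++selects;
        for (auto& child : stmt.children) {
            assert(child != nullptr);
            child->accept(*this);
        }
    }
    void visit(BinaryExpr&) override { ++binaries; }
};
```

With this shape, adding a new analysis pass means adding one visitor class rather than another `switch` over `node_type`.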

---

## Priority 4: Metadata Support

### Issue
AST nodes can't carry semantic metadata (e.g., inferred types, resolved references).

### Requested Enhancement
```cpp
class ASTNode {
public:
    std::any metadata;  // Or a more structured approach

    template<typename T>
    void set_metadata(T&& data) {
        metadata = std::forward<T>(data);
    }

    // Returns nullptr if no metadata is set or the stored type is not T
    template<typename T>
    T* get_metadata() {
        return std::any_cast<T>(&metadata);
    }
};
```

### Current Workaround
```cpp
// We maintain external maps
class ValidationContext {
    std::unordered_map<const ast::ASTNode*, TypeInfo> type_cache_;
    std::unordered_map<const ast::ASTNode*, ResolvedRef> references_;
};
```
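
As an illustration of how the requested `std::any` slot would behave, the sketch below uses a hypothetical `AnnotatedNode` class and a simplified `TypeInfo` struct; both are stand-ins, not the parser's or the analyzer's real types. The pointer overload of `std::any_cast` returns `nullptr` on a type mismatch instead of throwing.

```cpp
#include <any>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for an ASTNode carrying one metadata slot.
class AnnotatedNode {
public:
    template <typename T>
    void set_metadata(T&& data) {
        metadata_ = std::forward<T>(data);
    }

    // Returns nullptr when nothing is stored or the stored type is not T.
    template <typename T>
    T* get_metadata() {
        return std::any_cast<T>(&metadata_);
    }

private:
    std::any metadata_;
};

// Simplified stand-in for the analyzer's type information.
struct TypeInfo {
    std::string name;
    bool nullable;
};

// Usage example.
inline void metadata_usage_example() {
    AnnotatedNode node;
    node.set_metadata(TypeInfo{"INTEGER", false});
    assert(node.get_metadata<TypeInfo>() != nullptr);
    assert(node.get_metadata<int>() == nullptr);  // wrong type: nullptr, no exception
}
```

A single `std::any` holds only one value at a time, so carrying both an inferred type and a resolved reference would need a struct that combines them or one slot per kind of metadata.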

---

## Priority 5: Parser API Enhancements

### Issue 1: No Incremental Parsing
The parser can't parse partial SQL or reparse only the modified portions of a statement.

### Requested Enhancement
```cpp
class Parser {
    ParseResult parse_partial(std::string_view sql, size_t offset);
    ParseResult reparse_range(ASTNode* node, std::string_view new_text);
};
```

### Issue 2: No Streaming Parser
The parser can't process large SQL scripts efficiently.

### Requested Enhancement
```cpp
class StreamingParser {
    void parse_stream(std::istream& input,
                     std::function<void(ASTNode*)> callback);
};
```
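
A rough sketch of the streaming idea: read the script one character at a time and hand each statement to a callback as soon as its terminating semicolon is seen. This is an assumption-laden illustration, not the requested API. `for_each_statement` is a hypothetical name, the callback receives statement text rather than an `ASTNode*`, and only single-quoted strings are handled (comments, double-quoted identifiers, and dollar quoting are ignored).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Feeds one statement at a time to `on_statement`, so the whole script never
// has to be held in memory. Returns the number of statements delivered.
inline std::size_t for_each_statement(
        std::istream& input,
        const std::function<void(const std::string&)>& on_statement) {
    assert(on_statement);
    const char* kWhitespace = " \t\r\n";
    std::string current;
    std::size_t count = 0;
    bool in_string = false;
    char c;
    while (input.get(c)) {
        if (c == '\'') in_string = !in_string;  // a '' escape toggles twice
        if (c == ';' && !in_string) {
            if (current.find_first_not_of(kWhitespace) != std::string::npos) {
                on_statement(current);
                ++count;
            }
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    // Deliver a trailing statement that has no terminating semicolon.
    if (current.find_first_not_of(kWhitespace) != std::string::npos) {
        on_statement(current);
        ++count;
    }
    return count;
}

// Usage example: the semicolon inside the string literal does not split.
inline void streaming_usage_example() {
    std::istringstream script("SELECT 1; SELECT 'a;b';");
    std::size_t n = for_each_statement(script, [](const std::string&) {});
    assert(n == 2);
}
```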

### Issue 3: Limited Error Recovery
The parser stops at the first error instead of recovering and continuing.

### Requested Enhancement
```cpp
struct ParseResult {
    ASTNode* ast;
    std::vector<ParseError> errors;  // Multiple errors
    std::vector<ASTNode*> partial_trees;  // Recovered partial ASTs
};
```
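
One common way to collect multiple errors is statement-level (panic-mode) recovery: on failure, record the error and resume at the next semicolon. The sketch below illustrates only that control flow. `toy_parse` is a deliberately trivial stand-in that checks balanced parentheses, `ToyParseError` and `RecoveryResult` are hypothetical types, and splitting on every `;` ignores string literals.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

struct ToyParseError {
    std::size_t statement_index;
    std::string message;
};

struct RecoveryResult {
    std::size_t parsed = 0;
    std::vector<ToyParseError> errors;
};

// Stand-in for a real statement parser: only checks balanced parentheses.
inline bool toy_parse(std::string_view stmt, std::string& error) {
    int depth = 0;
    for (char c : stmt) {
        if (c == '(') {
            ++depth;
        } else if (c == ')' && --depth < 0) {
            error = "unexpected ')'";
            return false;
        }
    }
    if (depth != 0) {
        error = "missing ')'";
        return false;
    }
    return true;
}

// Parses every statement; a failure is recorded and parsing resumes
// after the next semicolon instead of aborting.
inline RecoveryResult parse_all(std::string_view script) {
    RecoveryResult result;
    std::size_t index = 0;
    std::size_t start = 0;
    while (start < script.size()) {
        std::size_t end = script.find(';', start);
        if (end == std::string_view::npos) end = script.size();
        std::string_view stmt = script.substr(start, end - start);
        if (stmt.find_first_not_of(" \t\r\n") != std::string_view::npos) {
            std::string error;
            if (toy_parse(stmt, error)) {
                ++result.parsed;
            } else {
                result.errors.push_back({index, error});
            }
            ++index;
        }
        start = end + 1;
    }
    return result;
}

// Usage example: the bad first statement does not hide the second one.
inline void recovery_usage_example() {
    RecoveryResult r = parse_all("SELECT (1; SELECT 2");
    assert(r.parsed == 1);
    assert(r.errors.size() == 1);
}
```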

---

## Priority 6: Performance Optimizations

### Issue
Parser allocates many small objects, causing fragmentation.

### Requested Enhancement
```cpp
class Parser {
    // Allow custom allocator injection
    template<typename Allocator>
    Parser(Allocator& allocator);

    // Or at least expose arena for reuse
    Arena& get_arena() { return arena_; }
};
```

### Current Workaround
```cpp
// We cache validation results to avoid reparsing
std::unordered_map<std::string, ValidationResult> cache_;
```
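
For comparison, an arena-backed node allocator can be sketched with the standard `std::pmr::monotonic_buffer_resource` (C++17). `ArenaNode` and `NodeArena` are hypothetical names used for illustration; the parser's own `Arena` class is not shown in this document and may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <type_traits>

// Hypothetical node type, kept trivially destructible so the arena can
// drop all nodes at once without running destructors.
struct ArenaNode {
    int kind;
    ArenaNode* first_child;
    ArenaNode* next_sibling;
};
static_assert(std::is_trivially_destructible_v<ArenaNode>);

class NodeArena {
public:
    explicit NodeArena(std::size_t initial_bytes = 64 * 1024)
        : resource_(initial_bytes) {}

    // Bump-allocates a node from the current buffer; no per-node free.
    ArenaNode* make_node(int kind) {
        void* mem = resource_.allocate(sizeof(ArenaNode), alignof(ArenaNode));
        return new (mem) ArenaNode{kind, nullptr, nullptr};
    }

    // Releases every node at once; previously returned pointers become invalid.
    void reset() { resource_.release(); }

private:
    std::pmr::monotonic_buffer_resource resource_;
};

// Usage example.
inline void arena_usage_example() {
    NodeArena arena;
    ArenaNode* root = arena.make_node(1);
    root->first_child = arena.make_node(2);
    assert(root->first_child->kind == 2);
    arena.reset();  // frees both nodes in one call
}
```

Because nodes come from large contiguous buffers and are released together, this pattern avoids the many small heap allocations described in the issue.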

---

## Nice-to-Have Features

1. **AST Serialization/Deserialization**
   ```cpp
   std::string serialize_ast(ASTNode* root);
   ASTNode* deserialize_ast(std::string_view data);
   ```

2. **Comment Preservation**
   ```cpp
   class ASTNode {
       std::vector<Comment> attached_comments;
   };
   ```

3. **Formatting Information**
   ```cpp
   class ASTNode {
       FormattingHints format_hints;  // Original whitespace, etc.
   };
   ```

4. **Query Plan Hints**
   ```cpp
   class SelectStmt : public ASTNode {
       std::vector<Hint> optimizer_hints;  // /*+ INDEX(t1 idx1) */
   };
   ```

5. **Dialect Support**
   ```cpp
   enum class SQLDialect {
       ANSI, PostgreSQL, MySQL, SQLite, Oracle
   };
   explicit Parser(SQLDialect dialect);  // constructor, declared inside class Parser
   ```

---

## Impact Analysis

### If These Enhancements Are Implemented

| Enhancement | Impact on Semantic Analyzer |
|------------|----------------------------|
| Missing Types | Remove type inference workarounds, better type checking |
| Parent Pointers | Simplify scope resolution, remove parent map building |
| Source Location | Accurate error messages, better IDE integration |
| Visitor Pattern | Cleaner code, better extensibility |
| Metadata | Remove external maps, better performance |
| Incremental Parse | Enable real-time validation in IDEs |
| Error Recovery | Better user experience, more complete analysis |

### Current State Without Enhancements

- ✅ Semantic analyzer is **fully functional**
- ✅ All features work with **workarounds**
- ✅ Performance is **acceptable**
- ✅ Architecture is **clean** despite limitations

---

## Conclusion

While these enhancements would improve our implementation, we have successfully built a complete semantic analyzer that works with the current parser interface. Our adapter layer and workarounds handle all missing features effectively.

**We do not require any parser changes to achieve A+ quality in the semantic analyzer.**

---

## Contact

For questions about these requests:
- Team: DB25 Semantic Analyzer
- Priority: Low (nice-to-have)

Parser Enhancement Requests - from/for Semantic Analyzer #7
