Parser Enhancement Requests
Date: 2024-12-15
From: DB25 Semantic Analyzer Team
To: DB25 Parser Team
Status: Wish List (Our architecture does not depend on these)
Executive Summary
This document lists enhancements to the parser that would improve the semantic analyzer's capabilities and simplify our implementation. However, we have implemented workarounds for all these issues, so these are nice-to-have improvements, not blockers.
Priority 1: Missing SQL Data Types
Issue
The current ast::DataType enum is missing several standard SQL types that are commonly used in production databases.
Requested Enhancements
enum class DataType {
    // Existing types...
    // Missing numeric types
    Numeric,    // NUMERIC(p,s) - exact numeric with precision/scale
    Money,      // MONEY type for currency
    // Missing string types
    NChar,      // NCHAR for Unicode fixed-length
    NVarChar,   // NVARCHAR for Unicode variable-length
    Clob,       // CLOB for large text
    // Missing binary types
    ByteA,      // BYTEA for binary data (PostgreSQL style)
    VarBinary,  // VARBINARY(n)
    // Missing specialized types
    Uuid,       // UUID type
    Json,       // JSON data type
    Jsonb,      // Binary JSON (PostgreSQL)
    Xml,        // XML data type
    Inet,       // IP address type
    MacAddr,    // MAC address type
    // Missing array/composite
    Array,      // Typed arrays (an enumerator cannot be a template, so the element type would be carried separately)
    Record,     // Record/row type
    Table,      // Table type
};
Current Workaround
// We map unknown types to SemanticType::Unknown and infer from context
SemanticType infer_missing_type(ast::ASTNode* node) {
    auto text = node->primary_text;
    if (contains_ignore_case(text, "NUMERIC")) return SemanticType::Decimal;
    if (contains_ignore_case(text, "JSON")) return SemanticType::Json;
    if (contains_ignore_case(text, "UUID")) return SemanticType::Uuid;
    // ... etc
    return SemanticType::Unknown;
}
Priority 2: AST Node Enhancements
Issue 1: Missing Parent Pointers
AST nodes don't have parent pointers, making upward traversal difficult.
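To make the cost concrete, here is a minimal sketch of recovering parents with a one-time traversal. The `Node` struct with an owning `children` vector is a hypothetical stand-in for `ast::ASTNode`, whose real layout may differ, and `build_parent_map` and `depth_of` are illustrative names rather than the adapter's actual code.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for ast::ASTNode; the real node layout may differ.
struct Node {
    std::vector<std::unique_ptr<Node>> children;
};

using ParentMap = std::unordered_map<const Node*, const Node*>;

// Walks the tree once (iteratively, to avoid deep recursion) and records
// the parent of every node. The root maps to nullptr.
ParentMap build_parent_map(const Node* root) {
    ParentMap parents;
    if (!root) return parents;
    parents[root] = nullptr;
    std::vector<const Node*> stack{root};
    while (!stack.empty()) {
        const Node* current = stack.back();
        stack.pop_back();
        for (const auto& child : current->children) {
            parents[child.get()] = current;
            stack.push_back(child.get());
        }
    }
    return parents;
}

// Upward traversal: counts the edges between a node and the root.
int depth_of(const ParentMap& parents, const Node* node) {
    int depth = 0;
    auto it = parents.find(node);
    while (it != parents.end() && it->second != nullptr) {
        ++depth;
        it = parents.find(it->second);
    }
    return depth;
}
```

Every upward step costs a hash lookup here, and the map has to be rebuilt whenever the tree changes. A stored parent pointer would avoid both.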
Requested Enhancement
class ASTNode {
    ASTNode* parent; // Add this
    // existing members...
};
Current Workaround
// We build a parent map manually
auto parent_map = ParserAdapter::build_parent_map(root);
Issue 2: Missing Source Location
Not all AST nodes have accurate line/column information.
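When only a byte offset into the SQL text is known, line and column can be derived by scanning the source. The sketch below assumes 1-based lines and columns, `\n` line endings, and columns counted in bytes. `locate` is a hypothetical helper, not part of the parser or analyzer API.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

struct SourceLocation {
    std::uint32_t line;
    std::uint32_t column;
    std::uint32_t offset;
    std::uint32_t length;
};

// Derives a 1-based line/column pair for a byte offset by counting the
// newlines that precede it. Offsets past the end clamp to the end of text.
SourceLocation locate(std::string_view sql, std::uint32_t offset,
                      std::uint32_t length) {
    SourceLocation loc{1, 1, offset, length};
    for (std::uint32_t i = 0; i < offset && i < sql.size(); ++i) {
        if (sql[i] == '\n') {
            ++loc.line;
            loc.column = 1;
        } else {
            ++loc.column;
        }
    }
    return loc;
}
```

The scan is linear in the offset, so doing it per node is wasteful. The parser already knows these positions while tokenizing, so recording them there would be cheaper.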
Requested Enhancement
struct SourceLocation {
    uint32_t line;
    uint32_t column;
    uint32_t offset;
    uint32_t length;
};
class ASTNode {
    SourceLocation location; // Add complete location info
    // existing members...
};
Current Workaround
// We track approximate locations during traversal
class LocationTracker {
    std::unordered_map<ast::ASTNode*, SourceLocation> locations;
};
Priority 3: Visitor Pattern Support
Issue
The AST doesn't support the visitor pattern, requiring switch statements for dispatch.
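For context, the following self-contained sketch shows the double dispatch that visitor support would provide. The node and visitor types are simplified stand-ins, not the parser's real classes. The sketch passes references where the request further down uses pointers, and `NameCollector` is a made-up visitor for illustration.

```cpp
#include <cassert>
#include <string>

struct SelectStmt;
struct BinaryExpr;

// One overload per concrete node type.
struct IASTVisitor {
    virtual ~IASTVisitor() = default;
    virtual void visit(SelectStmt& stmt) = 0;
    virtual void visit(BinaryExpr& expr) = 0;
};

struct ASTNode {
    virtual ~ASTNode() = default;
    virtual void accept(IASTVisitor& visitor) = 0;
};

// Each node forwards to the overload matching its own static type, so the
// caller never needs a switch on a node-type tag.
struct SelectStmt : ASTNode {
    void accept(IASTVisitor& visitor) override { visitor.visit(*this); }
};

struct BinaryExpr : ASTNode {
    void accept(IASTVisitor& visitor) override { visitor.visit(*this); }
};

// Hypothetical visitor that records which node kinds it saw, in order.
struct NameCollector : IASTVisitor {
    std::string seen;
    void visit(SelectStmt&) override { seen += "select;"; }
    void visit(BinaryExpr&) override { seen += "binary;"; }
};
```

With this shape, adding an analysis pass means adding one visitor class. Adding a node type becomes a compile error in every visitor that has not handled it, which a `switch` does not guarantee.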
Requested Enhancement
class IASTVisitor {
public:
    virtual ~IASTVisitor() = default;
    virtual void visit(SelectStmt* stmt) = 0;
    virtual void visit(BinaryExpr* expr) = 0;
    // ... for all node types
};
class ASTNode {
public:
    virtual void accept(IASTVisitor* visitor) = 0;
};
Current Workaround
// We use switch with our adapter
template<typename Result>
Result dispatch_node(ast::ASTNode* node) {
    switch (node->node_type) {
        case ast::NodeType::SelectStmt:
            return handle_select(node);
        // ... all cases
    }
}
Priority 4: Metadata Support
Issue
AST nodes can't carry semantic metadata (e.g., inferred types, resolved references).
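Without a metadata slot on the node, annotations have to live in a side table keyed by node address. The sketch below illustrates that pattern under simplified assumptions: `Node`, `TypeInfo` and `TypeCache` are hypothetical stand-ins, not the analyzer's actual types.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

struct Node {};  // Hypothetical stand-in for ast::ASTNode.

struct TypeInfo {
    std::string name;
    bool nullable = false;
};

// Side table that associates inferred types with nodes by address.
class TypeCache {
public:
    void record(const Node* node, TypeInfo info) {
        types_[node] = std::move(info);
    }

    // Returns nullptr when no type has been recorded for the node.
    const TypeInfo* lookup(const Node* node) const {
        auto it = types_.find(node);
        return it == types_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<const Node*, TypeInfo> types_;
};
```

Keys are raw addresses, so the table must not outlive the AST it describes, and every annotation read costs a hash lookup rather than a member access.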
Requested Enhancement
class ASTNode {
public:
    std::any metadata; // Or a more structured approach
    template<typename T>
    void set_metadata(T&& data) {
        metadata = std::forward<T>(data);
    }
    // Returns nullptr if no metadata of type T is stored
    template<typename T>
    T* get_metadata() {
        return std::any_cast<T>(&metadata);
    }
};
Current Workaround
// We maintain external maps
class ValidationContext {
    std::unordered_map<const ast::ASTNode*, TypeInfo> type_cache_;
    std::unordered_map<const ast::ASTNode*, ResolvedRef> references_;
};
Priority 5: Parser API Enhancements
Issue 1: No Incremental Parsing
The parser cannot parse partial SQL or reparse only the modified portions of a query.
Requested Enhancement
class Parser {
public:
    ParseResult parse_partial(std::string_view sql, size_t offset);
    ParseResult reparse_range(ASTNode* node, std::string_view new_text);
};
Issue 2: No Streaming Parser
The parser cannot process large SQL scripts efficiently, because there is no way to feed it input statement by statement.
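As a rough illustration of statement-at-a-time processing, the sketch below splits a stream on `;` and hands each statement's text to a callback. It is deliberately naive: it does not understand string literals, comments or dollar quoting, so a `;` inside a literal would split a statement incorrectly. `for_each_statement` and `trim` are hypothetical helpers, not existing parser functions.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Strips leading and trailing whitespace.
std::string trim(const std::string& text) {
    const char* whitespace = " \t\r\n";
    const auto first = text.find_first_not_of(whitespace);
    if (first == std::string::npos) return "";
    const auto last = text.find_last_not_of(whitespace);
    return text.substr(first, last - first + 1);
}

// Reads the stream one ';'-terminated chunk at a time, so only the current
// statement is held in memory. Blank chunks are skipped.
void for_each_statement(
    std::istream& input,
    const std::function<void(const std::string&)>& on_statement) {
    std::string chunk;
    while (std::getline(input, chunk, ';')) {
        const std::string statement = trim(chunk);
        if (!statement.empty()) on_statement(statement);
    }
}
```

A real streaming parser would do this splitting in the tokenizer, where quoting rules are known, and would deliver AST nodes rather than raw text.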
Requested Enhancement
class StreamingParser {
public:
    void parse_stream(std::istream& input,
                      std::function<void(ASTNode*)> callback);
};
Issue 3: Limited Error Recovery
The parser stops at the first error instead of recovering and continuing with the rest of the input.
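One common recovery strategy is to resynchronize at the next statement boundary and keep going, collecting every error along the way. The sketch below shows only that control flow. `toy_parse` is a placeholder that accepts statements starting with one of four keywords, and `ScriptReport` and `check_script` are hypothetical names. None of this reflects how the DB25 parser works internally.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

struct ParseError {
    std::size_t statement_index;
    std::string message;
};

struct ScriptReport {
    std::size_t accepted = 0;
    std::vector<ParseError> errors;  // One entry per failed statement.
};

// Placeholder for a real statement parser: accepts any statement that
// begins with a known keyword.
bool toy_parse(std::string_view statement) {
    constexpr std::array<std::string_view, 4> keywords{
        "SELECT", "INSERT", "UPDATE", "DELETE"};
    for (std::string_view keyword : keywords) {
        if (statement.substr(0, keyword.size()) == keyword) return true;
    }
    return false;
}

// On failure, records the error and resumes after the next ';' instead of
// abandoning the rest of the script.
ScriptReport check_script(std::string_view script) {
    ScriptReport report;
    std::size_t index = 0;
    std::size_t pos = 0;
    while (pos < script.size()) {
        std::size_t end = script.find(';', pos);
        if (end == std::string_view::npos) end = script.size();
        std::string_view statement = script.substr(pos, end - pos);
        const std::size_t first = statement.find_first_not_of(" \t\r\n");
        if (first != std::string_view::npos) {
            statement.remove_prefix(first);
            if (toy_parse(statement)) {
                ++report.accepted;
            } else {
                report.errors.push_back(
                    {index, "unrecognized statement: " + std::string(statement)});
            }
            ++index;
        }
        pos = end + 1;  // Resynchronize just past the statement boundary.
    }
    return report;
}
```

With this flow, a script with two bad statements yields two diagnostics in a single pass, so the user does not have to fix and rerun once per error.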
Requested Enhancement
struct ParseResult {
    ASTNode* ast;
    std::vector<ParseError> errors; // Multiple errors
    std::vector<ASTNode*> partial_trees; // Recovered partial ASTs
};
Priority 6: Performance Optimizations
Issue
The parser allocates many small objects, which causes heap fragmentation.
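To illustrate what an injected arena could look like, the sketch below places nodes in a caller-supplied `std::pmr::memory_resource` from the C++17 standard library. The `Node` struct and `make_node` helper are hypothetical, and the parser's own `Arena` type may work quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>

// Hypothetical, trivially destructible node, so the arena can release all
// nodes at once without running destructors.
struct Node {
    int kind;
    Node* first_child;
};

// Constructs a node inside whatever memory resource the caller supplies.
Node* make_node(std::pmr::memory_resource& arena, int kind) {
    void* memory = arena.allocate(sizeof(Node), alignof(Node));
    return new (memory) Node{kind, nullptr};
}
```

Passing a `std::pmr::monotonic_buffer_resource` backed by one contiguous buffer makes each allocation a pointer bump, and all nodes are released together when the resource is destroyed.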
Requested Enhancement
class Parser {
public:
    // Allow custom allocator injection
    template<typename Allocator>
    Parser(Allocator& allocator);
    // Or at least expose arena for reuse
    Arena& get_arena() { return arena_; }
};
Current Workaround
// We cache parse results to avoid reparsing
std::unordered_map<std::string, ValidationResult> cache_;
Nice-to-Have Features
- AST Serialization/Deserialization
  std::string serialize_ast(ASTNode* root);
  ASTNode* deserialize_ast(std::string_view data);
- Comment Preservation
  class ASTNode {
      std::vector<Comment> attached_comments;
  };
- Formatting Information
  class ASTNode {
      FormattingHints format_hints; // Original whitespace, etc.
  };
- Query Plan Hints
  class SelectStmt : public ASTNode {
      std::vector<Hint> optimizer_hints; // /*+ INDEX(t1 idx1) */
  };
- Dialect Support
  enum class SQLDialect { ANSI, PostgreSQL, MySQL, SQLite, Oracle };
  Parser(SQLDialect dialect);
Impact Analysis
If These Enhancements Are Implemented
| Enhancement | Impact on Semantic Analyzer |
|---|---|
| Missing Types | Remove type inference workarounds, better type checking |
| Parent Pointers | Simplify scope resolution, remove parent map building |
| Source Location | Accurate error messages, better IDE integration |
| Visitor Pattern | Cleaner code, better extensibility |
| Metadata | Remove external maps, better performance |
| Incremental Parse | Enable real-time validation in IDEs |
| Error Recovery | Better user experience, more complete analysis |
Current State Without Enhancements
- ✅ Semantic analyzer is fully functional
- ✅ All features work with workarounds
- ✅ Performance is acceptable
- ✅ Architecture is clean despite limitations
Conclusion
While these enhancements would improve our implementation, we have successfully built a complete semantic analyzer that works with the current parser interface. Our adapter layer and workarounds handle all missing features effectively.
We do not require any parser changes to achieve A+ quality in the semantic analyzer.
Contact
For questions about these requests:
- Team: DB25 Semantic Analyzer
- Priority: Low (nice-to-have)