Feature Request
PDF processing currently extracts text via pypdfium2, but the structure of tables in PDFs (common in invoices and financial documents) is lost.
Consider integrating table detection:
- camelot-py or tabula-py for rule-based table extraction
- Or pass table regions to the LLM separately
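
A rough sketch of how the two options could fit together, assuming camelot-py is installed as an optional dependency: extract tables with `camelot.read_pdf`, then render each one as a Markdown table that can be sent to the LLM alongside the plain text. The function names `rows_to_markdown` and `extract_tables_as_markdown` are hypothetical and not part of the current codebase, and treating the first extracted row as the header is an assumption that will not hold for every document.

```python
from typing import List, Sequence


def rows_to_markdown(rows: Sequence[Sequence[object]]) -> str:
    """Render table rows as a Markdown table (hypothetical helper).

    Assumes the first row is the header row.
    """
    if not rows:
        return ""

    def cell(value: object) -> str:
        # Collapse newlines and escape pipes so each cell stays on one line.
        return str(value).replace("\n", " ").replace("|", "\\|").strip()

    header = [cell(v) for v in rows[0]]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows[1:]:
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)


def extract_tables_as_markdown(pdf_path: str, pages: str = "1") -> List[str]:
    """Extract tables with camelot and return one Markdown string per table."""
    # Imported lazily so the dependency stays optional (an assumption about
    # how this project would want to package it).
    import camelot

    # flavor="lattice" targets tables with ruled lines; flavor="stream" is
    # the alternative for whitespace-separated tables.
    tables = camelot.read_pdf(pdf_path, pages=pages, flavor="lattice")
    return [rows_to_markdown(t.df.values.tolist()) for t in tables]
```

The Markdown strings could then be appended to the text extracted by pypdfium2, or sent in a separate prompt, depending on which of the two approaches above is preferred.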
Motivation
Invoices and bank statements rely heavily on tabular data.