A Streamlit application for generating Qualiopi certification documents by replacing placeholders in Word and Excel templates with actual client data.
- Word Document Processing: Replaces placeholders in .docx files while preserving formatting
- Excel Document Processing: Replaces placeholders in .xlsx and .xls files while preserving formatting
- Batch Processing: Processes multiple documents simultaneously
- Parallel Processing: Multi-threaded processing for improved performance with large document sets
- Template Management: Organized template structure with indicators
- Progress Tracking: Real-time progress bars for document generation
- Error Handling: Graceful handling of corrupted or problematic files
- ZIP Download: Automatic packaging of generated documents
- Performance Monitoring: Real-time performance metrics and processing time tracking
- Multi-threading: Process multiple documents simultaneously using ThreadPoolExecutor
- Configurable Workers: Adjustable number of parallel workers (1-8)
- Smart Fallback: Automatically switches to sequential processing for small document sets
- Memory Management: Automatic garbage collection to prevent memory leaks
- Optimized Text Replacement: Single-pass processing for all replacements in a paragraph
- Caching: Cached access to document sections and tables
- Efficient Data Structures: Reduced redundant operations and improved memory usage
- Early Exit: Skip processing for empty paragraphs or missing placeholders
- 2-4x faster processing for large document sets (100+ documents)
- Reduced memory usage through optimized data structures
- Better scalability with configurable parallel processing
- Real-time performance metrics showing documents per second
The application now supports processing Excel files in addition to Word documents. Excel files in the template folder will have their placeholders replaced with actual data from the uploaded Excel file, while preserving all formatting, formulas, and structure.
- Template Excel Files: Place Excel files with placeholders in the
app/templates/directory - Placeholder Format: Use the same placeholder format as Word documents (e.g.,
[NOM_ORGANISME],[DATE]) - Automatic Processing: Excel files are automatically processed alongside Word documents
- Format Preservation: All Excel formatting, formulas, charts, and structure are preserved
- Text replacement in all cells across all sheets
- Date and place placeholder replacement (
[date],[date_du_jour],[Fait_a]) - Preservation of:
- Cell formatting (colors, fonts, borders)
- Formulas and calculations
- Charts and graphs
- Multiple sheets
- Cell references and links
- Clone the repository
- Install dependencies:
pip install -r requirements.txt
- Run the application:
streamlit run app/app.py
- Upload an Excel file containing client data
- Upload a logo (optional)
- Configure performance settings in the sidebar:
- Enable/disable parallel processing
- Adjust number of parallel workers
- Select the row index for the client data
- Click "Générer les documents" to process templates
- Monitor real-time performance metrics
- Download the generated ZIP file
- Enable Parallel Processing: Toggle to enable/disable multi-threading
- Number of Workers: Adjust from 1-8 parallel workers
- Auto-fallback: Automatically uses sequential processing for small document sets
- Processing Time: Shows total time taken for document generation
- Documents per Second: Real-time performance metric
- Progress Tracking: Detailed progress for Word and Excel documents
- Error Reporting: Individual file error reporting without stopping the process
app/templates/
├── Indicateur_1_Information_du_public/
│ ├── Modeles_de_documents/
│ │ ├── *.docx (Word templates)
│ │ └── *.xlsx (Excel templates)
│ └── PREUVES_Mise_en_oeuvre/
└── ... (32 indicators total)
- streamlit
- pandas
- python-docx
- openpyxl
- tqdm
- concurrent.futures (built-in)
Build and run with Docker:
docker build -t document-filler .
docker run -p 8501:8501 document-fillerThe application includes robust error handling:
- Corrupted files are skipped with warnings
- Original files are copied if processing fails
- Detailed error messages for debugging
- Graceful degradation for problematic templates
- Individual file error reporting without stopping batch processing
- Use Parallel Processing: Enable for document sets with 10+ files
- Adjust Worker Count: Use 4-6 workers for optimal performance
- Monitor Memory: Large document sets may benefit from lower worker counts
- Template Optimization: Remove unused placeholders to improve processing speed