This document outlines planned enhancements and new features for cfd-python after the v0.1.6 migration is complete.
Last Updated: 2026-03-05
Current Version: 0.1.6 (released)
Target Version: 0.2.0+
With the v0.1.6 migration completing core functionality (boundary conditions, error handling, backend availability, CPU features), the focus shifts to improving the Python API ergonomics, adding type safety, and providing higher-level abstractions.
Key insights from the PyPI publishing process that should inform future releases:
| Issue | Solution |
|---|---|
| PyPI rejects linux_x86_64 wheels | Build inside manylinux containers |
| glibc version mismatch on ubuntu-latest | Use quay.io/pypa/manylinux_2_28_x86_64 container |
| auditwheel requires patchelf >= 0.14 | Install via pip (EPEL version too old) |
| CUDA wheels need manylinux too | Use nvidia/cuda:12.4.0-devel-rockylinux8 (manylinux_2_28 compatible) |
| Issue | Solution |
|---|---|
| CUDA not in standard manylinux containers | Use NVIDIA CUDA devel images based on Rocky Linux 8 |
| CMake FetchContent needs git | dnf install -y git in container |
| patchelf too old in EPEL | pip install patchelf instead |
| Issue | Solution |
|---|---|
| PyPI rejects local versions (+g...) | Build from git tag, not branch |
| workflow_dispatch defaults to branch | Add ref input parameter to workflow |
| sdist filename with hyphens | PyPI requires underscores (PEP 625) - rename if needed |
| Issue | Solution |
|---|---|
| Upload fails midway, can't re-upload same files | Add skip-existing: true to pypi-publish action |
| Issue | Solution |
|---|---|
| CUDA toolkit install takes 14-16 min | Use method: network with minimal sub-packages |
| CMake can't find CUDA toolset | Include visual_studio_integration in sub-packages |
For future ARM64 Linux support:

    ARCH=$(uname -m)
    auditwheel repair dist_raw/*.whl --plat manylinux_2_28_${ARCH} -w dist/

- Build C library and wheel inside manylinux container (not on host)
- Use auditwheel to repair and tag wheels
- Support tag-based builds via `ref` input for releases
- Always use `skip-existing: true` for PyPI uploads
Priority: P1 - High impact for developer experience
Status: Core tasks done, CI integration remaining
- Enable IDE autocompletion for all C extension functions
- Provide type checking with mypy/pyright
- Document function signatures formally
- 8.1 Create `cfd_python/__init__.pyi` stub file
  - Comprehensive stubs covering all exported functions, constants, and classes
- 8.2 Add `py.typed` marker file
  - `cfd_python/py.typed` exists for PEP 561 compliance
- 8.3 Update `pyproject.toml`
  - mypy included in `[project.optional-dependencies] dev`
- 8.4 Add type checking to CI
  - Run mypy on tests to verify stubs are correct

- IDE shows function signatures and return types
- `mypy tests/` passes without errors (not yet in CI)
- Stubs cover all exported functions and constants
Priority: P2 - Improved API ergonomics
Estimated Effort: 1-2 days
- Replace bare integer constants with IntEnum classes
- Better repr/str output for debugging
- IDE support for constant groups
- 9.1 Create enum classes in `_enums.py`
- 9.2 Export enums alongside bare constants
  - Maintain backward compatibility with `SIMD_NONE`, `BACKEND_SCALAR`, etc.
  - Add enum classes to `__all__`
- 9.3 Update documentation
  - Show enum usage in docstrings
  - Add migration notes for users preferring enums
- 9.4 Add tests for enum classes

- Both `cfd_python.SIMD_AVX2` and `cfd_python.SIMDArch.AVX2` work
- `str(SIMDArch.AVX2)` returns `"SIMDArch.AVX2"` (on Python 3.11+, `str()` of an `IntEnum` member returns the integer value by default, so this needs an explicit `__str__` override or a check against `repr()` or `.name` instead)
- Enums work with existing functions that expect int constants
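
The enum tasks above could be approached roughly as follows. This is a minimal sketch, not the planned implementation: the integer values, the `NONE` member, and the `describe_simd` helper are placeholders chosen for illustration, and only the names `SIMDArch`, `SIMD_NONE`, and `SIMD_AVX2` come from this roadmap.

```python
from enum import IntEnum


class SIMDArch(IntEnum):
    """Hypothetical SIMD enum; the integer values are placeholders."""

    NONE = 0
    AVX2 = 1


# Bare constants stay available as aliases of the enum members.
SIMD_NONE = SIMDArch.NONE
SIMD_AVX2 = SIMDArch.AVX2


def describe_simd(arch: int) -> str:
    """Stand-in for an existing function that expects a plain int constant."""
    return SIMDArch(arch).name


legacy = describe_simd(1)              # a plain int is still accepted
modern = describe_simd(SIMDArch.AVX2)  # an enum member is also an int
```

Because `IntEnum` members are `int` subclasses, code that compares against or passes the old constants should keep working unchanged.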
Priority: P2 - Developer experience improvement
Estimated Effort: 3-5 days
- Provide object-oriented API alongside functional API
- Reduce boilerplate for common operations
- Enable method chaining and fluent interfaces
- 10.1 Create `Grid` class
- 10.2 Create `BoundaryConditions` builder
- 10.3 Create `Simulation` class
- 10.4 Create `Field` class for data manipulation
- 10.5 Add tests for high-level API
- 10.6 Update documentation with examples

- Users can choose between low-level functions and high-level classes
- Method chaining works fluently
- All high-level classes have comprehensive tests
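
One way the `BoundaryConditions` builder and its method chaining might look is sketched below. The method names (`inlet`, `outlet`, `noslip`), the edge labels, and the internal dict layout are assumptions made for illustration; the roadmap does not fix them.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BoundaryConditions:
    """Hypothetical builder; each setter returns self so calls can be chained."""

    edges: Dict[str, dict] = field(default_factory=dict)

    def noslip(self, edge: str) -> "BoundaryConditions":
        self.edges[edge] = {"type": "noslip"}
        return self

    def inlet(self, edge: str, max_velocity: float) -> "BoundaryConditions":
        self.edges[edge] = {"type": "inlet", "max_velocity": max_velocity}
        return self

    def outlet(self, edge: str) -> "BoundaryConditions":
        self.edges[edge] = {"type": "outlet"}
        return self


# Fluent usage: one expression configures all four edges.
bc = (
    BoundaryConditions()
    .inlet("left", max_velocity=1.0)
    .outlet("right")
    .noslip("top")
    .noslip("bottom")
)
```

Returning `self` from every setter is what makes the chain work; the low-level functional API can remain the layer that actually applies these settings.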
Priority: P2 - Important for scientific workflows
Estimated Effort: 2-3 days
- Accept and return NumPy arrays
- Zero-copy data transfer where possible
- Integration with NumPy ecosystem
- 11.1 Add NumPy array support to C extension
  - Use `PyArray_*` API for array handling
  - Support both list and ndarray inputs
- 11.2 Create array conversion utilities

      from typing import List

      import numpy as np

      def to_numpy(data: List[float], shape: tuple[int, int]) -> np.ndarray:
          """Convert flat list to 2D NumPy array."""
          return np.array(data).reshape(shape)

      def from_numpy(arr: np.ndarray) -> List[float]:
          """Convert NumPy array to flat list."""
          return arr.flatten().tolist()

- 11.3 Add `as_numpy` option to functions

      def run_simulation(..., as_numpy: bool = False) -> Union[List[float], np.ndarray]:
          result = _run_simulation_impl(...)
          if as_numpy:
              return np.array(result).reshape((ny, nx))
          return result

- 11.4 Support array protocol in Field class

      class Field:
          # NumPy 2 passes dtype and copy keywords to __array__
          def __array__(self, dtype=None, copy=None) -> np.ndarray:
              return np.array(self.data, dtype=dtype).reshape(self.shape)

- 11.5 Add tests with NumPy arrays

- Functions accept both lists and NumPy arrays
- NumPy arrays can be returned directly
- Field objects work with NumPy functions via `__array__`
Priority: P2 - Leverage existing visualization library
Estimated Effort: 1 day
- Enable optional dependency on cfd-visualization
- Reference cfd-visualization for all visualization needs
Visualization features are developed in the separate cfd-visualization project.
See cfd-visualization ROADMAP for visualization enhancements.
- 12.1 Add optional dependency

      [project.optional-dependencies]
      viz = ["cfd-visualization>=0.2.0"]

- 12.2 Document integration in examples
  - Show how to use `cfd_viz.from_cfd_python()` conversion
  - Reference cfd-visualization documentation

- `pip install cfd-python[viz]` installs cfd-visualization
- Documentation points users to cfd-visualization for visualization
Priority: P3 - Useful for optimization
Estimated Effort: 1-2 days
- Expose timing information from C library
- Help users identify bottlenecks
- Support benchmarking workflows
- 13.1 Add timing to simulation results

      result = run_simulation_with_params(...)
      print(result["timing"])
      # {'total_ms': 150.2, 'solver_ms': 120.5, 'bc_ms': 25.3, 'io_ms': 4.4}

- 13.2 Create benchmarking utilities

      from cfd_python.benchmark import benchmark_solver

      results = benchmark_solver(
          solver="projection",
          grid_sizes=[(32, 32), (64, 64), (128, 128)],
          steps=100,
          repeat=3,
      )
      # Returns DataFrame with timing statistics

- 13.3 Add backend comparison utility

      from cfd_python.benchmark import compare_backends

      comparison = compare_backends(
          nx=64, ny=64, steps=100,
          backends=[Backend.SCALAR, Backend.SIMD, Backend.OMP],
      )
      # Shows speedup ratios

- Users can easily measure performance
- Backend comparisons help choose optimal configuration
- Timing data included in simulation results
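
The core of a benchmarking utility such as the planned `benchmark_solver` could be a timing loop like the one sketched here. It is only an illustration under stated assumptions: `benchmark` and `fake_solver` are hypothetical names, the workload is a placeholder rather than a real solver call, and results are returned as plain dicts instead of a DataFrame.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple


def benchmark(run: Callable[[int, int], None],
              grid_sizes: List[Tuple[int, int]],
              repeat: int = 3) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Time run(nx, ny) for each grid size; report min and mean in milliseconds."""
    results = {}
    for nx, ny in grid_sizes:
        samples = []
        for _ in range(repeat):
            start = time.perf_counter()
            run(nx, ny)
            samples.append((time.perf_counter() - start) * 1000.0)
        results[(nx, ny)] = {
            "min_ms": min(samples),
            "mean_ms": statistics.mean(samples),
        }
    return results


def fake_solver(nx: int, ny: int) -> None:
    """Placeholder workload standing in for a real solver call."""
    sum(i * j for i in range(nx) for j in range(ny))


timings = benchmark(fake_solver, [(16, 16), (32, 32)], repeat=3)
```

Reporting the minimum alongside the mean is a common choice because the minimum is less sensitive to interference from other processes.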
Priority: P3 - Advanced use case
Estimated Effort: 3-5 days
Spec: .claude/specs/phase-14-async-parallel-simulation.md
- Support running multiple simulations in parallel
- Async API for non-blocking operations
- Progress callbacks for long-running simulations
- 14.1 Add progress callback support: `callback` and `callback_interval` keyword arguments on `run_simulation_with_params()`, with cancellation via `False` return
- 14.1b Release GIL during simulation loop: `Py_BEGIN_ALLOW_THREADS`/`Py_END_ALLOW_THREADS` around the step loop to enable thread-based parallelism
- 14.2 Create async simulation wrapper: `run_simulation_async()` in `cfd_python/_async.py` using `loop.run_in_executor()` with task cancellation support
- 14.3 Add parameter sweep utility: `parameter_sweep()` in `cfd_python/_parallel.py` using `ProcessPoolExecutor` with Cartesian product of parameter variations
- 14.4 Update exports and type stubs: re-export new functions from `__init__.py`, add signatures to `__init__.pyi`
- 14.5 Add tests: `tests/test_async.py` and `tests/test_parallel.py` covering callbacks, cancellation, concurrency, and parameter sweeps

- Long simulations can report progress via callbacks without significant overhead
- Callbacks can cancel a running simulation
- `run_simulation_async()` does not block the asyncio event loop
- Multiple simulations run in parallel via `parameter_sweep()` with true CPU parallelism
- Parameter sweeps are easy to set up
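
The two mechanisms named in tasks 14.2 and 14.3, `loop.run_in_executor()` and a Cartesian product of parameter variations, can be sketched as follows. The helper names (`expand_sweep`, `run_blocking`, `run_async`) and the fake result are hypothetical. For brevity this sketch uses the default thread pool, whereas the roadmap plans a `ProcessPoolExecutor` for sweeps.

```python
import asyncio
import itertools
from typing import Any, Dict, Iterable, List


def expand_sweep(variations: Dict[str, Iterable[Any]]) -> List[Dict[str, Any]]:
    """Return one parameter dict per point in the Cartesian product."""
    names = list(variations)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(variations[n] for n in names))]


def run_blocking(params: Dict[str, Any]) -> float:
    """Placeholder for a blocking solver call; returns a fake result."""
    return params["dt"] * params["steps"]


async def run_async(params: Dict[str, Any]) -> float:
    """Run the blocking call in the default executor so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, run_blocking, params)


async def main() -> List[float]:
    cases = expand_sweep({"dt": [0.001, 0.002], "steps": [100, 200]})
    return await asyncio.gather(*(run_async(c) for c in cases))


sweep_results = asyncio.run(main())
```

With a thread pool, real parallel speedup depends on the C extension releasing the GIL (task 14.1b); a process pool avoids that dependency at the cost of pickling the parameters and results.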
Priority: P2 - Extends core functionality
Estimated Effort: 5-7 days
- Support 3D computational grids
- Extend boundary conditions to 3D
- Enable 3D flow simulations
- 15.1 Expose 3D grid functions from C library

      def create_grid_3d(
          nx: int, ny: int, nz: int,
          xmin: float, xmax: float,
          ymin: float, ymax: float,
          zmin: float, zmax: float,
      ) -> dict: ...

- 15.2 Add 3D boundary condition edges

      class BCFace(IntEnum):
          """3D boundary faces."""
          WEST = 0    # -x
          EAST = 1    # +x
          SOUTH = 2   # -y
          NORTH = 3   # +y
          BOTTOM = 4  # -z
          TOP = 5     # +z

- 15.3 Extend Field class for 3D

      class Field3D(Field):
          def __init__(self, data: List[float], nx: int, ny: int, nz: int):
              self.nz = nz
              super().__init__(data, nx, ny)

          @property
          def shape(self) -> tuple[int, int, int]:
              return (self.nx, self.ny, self.nz)

- 15.4 Add 3D VTK output support
- 15.5 Add 3D examples and tests

- 3D grids can be created and manipulated
- Boundary conditions work on all 6 faces
- VTK output produces valid 3D visualizations
Priority: P2 - Important for long simulations
Estimated Effort: 2-3 days
- Save simulation state to disk
- Resume interrupted simulations
- Support HDF5 format for efficient I/O
- 16.1 Add checkpoint save/load functions

      def save_checkpoint(
          filename: str,
          u: List[float], v: List[float], p: List[float],
          grid: Grid,
          step: int,
          time: float,
          params: dict,
      ) -> None: ...

      def load_checkpoint(filename: str) -> dict:
          """Returns dict with all fields, grid, step, time, params."""
          ...

- 16.2 Add automatic checkpointing to Simulation class

      sim = Simulation(grid)
      sim.run(
          steps=10000,
          checkpoint_interval=1000,
          checkpoint_dir="./checkpoints",
      )

- 16.3 Add resume functionality

      sim = Simulation.from_checkpoint("./checkpoints/step_5000.h5")
      sim.run(steps=5000)  # Continues from step 5000

- 16.4 Optional HDF5 support

      [project.optional-dependencies]
      hdf5 = ["h5py>=3.0"]

- Long simulations can be checkpointed
- Simulations can be resumed from checkpoints
- Checkpoint files are portable across platforms
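
A save/load round trip for the checkpoint functions might look like the sketch below. It is a stand-in under explicit assumptions: it uses JSON from the standard library rather than the planned HDF5 format, omits the `grid` argument, and adds a write-then-rename step that this roadmap does not specify.

```python
import json
import os
import tempfile
from typing import List


def save_checkpoint(filename: str, u: List[float], v: List[float],
                    p: List[float], step: int, time: float, params: dict) -> None:
    """Write the state to a temporary file, then rename it into place."""
    state = {"u": u, "v": v, "p": p, "step": step, "time": time, "params": params}
    directory = os.path.dirname(os.path.abspath(filename))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, filename)  # avoids leaving a half-written checkpoint


def load_checkpoint(filename: str) -> dict:
    """Return the dict written by save_checkpoint."""
    with open(filename) as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "step_5000.json")
    save_checkpoint(path, [0.0, 1.0], [0.0, 0.0], [1.0, 1.0],
                    step=5000, time=5.0, params={"dt": 0.001})
    restored = load_checkpoint(path)
```

Writing to a temporary file first means an interrupted save does not overwrite the previous good checkpoint, which matters for the resume use case.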
Priority: P1 - Quality assurance
Estimated Effort: 3-4 days
- Validate solver accuracy against known solutions
- Benchmark against analytical results
- Ensure numerical correctness across backends
- 17.1 Implement lid-driven cavity benchmark

      from cfd_python.validation import lid_driven_cavity

      results = lid_driven_cavity(
          Re=100,
          grid_sizes=[32, 64, 128],
          compare_ghia=True,  # Compare to Ghia et al. (1982) reference data
      )
      results.plot_convergence()

- 17.2 Implement Poiseuille flow validation

      from cfd_python.validation import poiseuille_flow

      results = poiseuille_flow(
          nx=64, ny=32,
          pressure_gradient=0.1,
      )
      error = results.compare_analytical()  # Should be < 1e-6

- 17.3 Implement Taylor-Green vortex decay

      from cfd_python.validation import taylor_green_vortex

      results = taylor_green_vortex(
          Re=100,
          grid_size=64,
          end_time=1.0,
      )
      results.plot_energy_decay()

- 17.4 Add regression test suite
  - Store reference results for each validation case
  - Fail CI if results deviate beyond tolerance
- 17.5 Backend consistency tests
  - Ensure SCALAR, SIMD, OMP produce identical results
  - Verify CUDA results match CPU within tolerance

- All validation cases match reference data
- Regression tests catch numerical changes
- Backend consistency is verified automatically
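
For the Poiseuille case, the analytical reference is the parabolic profile u(y) = G / (2 μ) · y · (H − y) for flow between plates at y = 0 and y = H, where G is the magnitude of the pressure gradient and μ the dynamic viscosity. The sketch below shows how a comparison could be computed. The function names, the viscosity value, and the perturbed "numerical" profile are illustrative assumptions, not solver output.

```python
from typing import List


def poiseuille_profile(y: List[float], height: float,
                       pressure_gradient: float, viscosity: float) -> List[float]:
    """Analytical u(y) = G / (2 mu) * y * (H - y) for plane Poiseuille flow."""
    factor = pressure_gradient / (2.0 * viscosity)
    return [factor * yi * (height - yi) for yi in y]


def max_abs_error(numerical: List[float], analytical: List[float]) -> float:
    """Largest pointwise difference between two profiles."""
    return max(abs(a - b) for a, b in zip(numerical, analytical))


ny = 33
height = 1.0
y = [j * height / (ny - 1) for j in range(ny)]
exact = poiseuille_profile(y, height, pressure_gradient=0.1, viscosity=0.01)

# Stand-in for solver output: the exact profile with a small perturbation.
numerical = [u + 1e-8 for u in exact]
error = max_abs_error(numerical, exact)
```

The peak velocity of the analytical profile is G H² / (8 μ) at mid-height, which gives a quick sanity check before comparing full profiles.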
Priority: P3 - Enhanced interactivity
Estimated Effort: 1-2 days
- Rich display in Jupyter notebooks for simulation objects
- Interactive widgets for parameter exploration
Visualization-related Jupyter features (plots, animations, interactive dashboards) are handled by cfd-visualization.
See cfd-visualization ROADMAP Phase 2 for visualization in Jupyter.
- 18.1 Add `_repr_html_` for key classes

      class Grid:
          def _repr_html_(self) -> str:
              return f"""
              <div style="border: 1px solid #ccc; padding: 10px;">
                  <b>Grid</b>: {self.nx} × {self.ny}<br>
                  Domain: [{self._data['xmin']}, {self._data['xmax']}] ×
                          [{self._data['ymin']}, {self._data['ymax']}]
              </div>
              """

      class SimulationResult:
          def _repr_html_(self) -> str:
              return f"""
              <div style="border: 1px solid #ccc; padding: 10px;">
                  <b>SimulationResult</b><br>
                  Grid: {self.nx} × {self.ny}<br>
                  Max velocity: {self.stats['max_velocity']:.4f}<br>
                  Iterations: {self.stats['iterations']}
              </div>
              """

- 18.2 Add interactive parameter widgets

      from cfd_python.jupyter import interactive_simulation

      # Creates sliders for Re, dt, grid size
      interactive_simulation(
          solver="projection",
          Re_range=(10, 1000),
          grid_range=(16, 128),
      )

- 18.3 Add ipywidgets optional dependency

      [project.optional-dependencies]
      jupyter = ["ipywidgets>=8.0"]

- Grid and SimulationResult display nicely in Jupyter
- Interactive widgets work for parameter exploration
Priority: P3 - Usability improvement
Estimated Effort: 1-2 days
- Define simulations via YAML/TOML config files
- Reproducible simulation setup
- Command-line interface for batch runs
- 19.1 Define configuration schema

      # simulation.yaml
      grid:
        nx: 64
        ny: 64
        domain:
          xmin: 0.0
          xmax: 1.0
          ymin: 0.0
          ymax: 1.0
        stretching: null

      solver:
        type: projection
        params:
          dt: 0.001
          cfl: 0.5
          max_iter: 1000
          tolerance: 1.0e-6  # decimal point included so YAML 1.1 loaders such as PyYAML read a float

      boundary_conditions:
        left:
          type: inlet
          profile: parabolic
          max_velocity: 1.0
        right:
          type: outlet
        top:
          type: noslip
        bottom:
          type: noslip

      output:
        directory: ./results
        format: vtk
        interval: 100

      run:
        steps: 10000
        checkpoint_interval: 1000

- 19.2 Create config loader

      from cfd_python.config import load_config, run_from_config

      config = load_config("simulation.yaml")
      result = run_from_config(config)

- 19.3 Add CLI entry point

      cfd-python run simulation.yaml
      cfd-python validate simulation.yaml
      cfd-python info  # Show available solvers, backends

- 19.4 Config validation with helpful errors

- Simulations can be fully defined in config files
- CLI enables batch processing
- Config errors give clear, actionable messages
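
Validation with clear, actionable messages (task 19.4) could collect every problem in one pass instead of stopping at the first. The sketch below works on an already-parsed config dict. The `validate_config` name, the set of required keys, and the "integer >= 2" rule are assumptions for illustration, not the planned schema.

```python
from typing import Any, Dict, List

# Hypothetical minimal schema: section name -> required keys.
REQUIRED_KEYS = {
    "grid": ["nx", "ny"],
    "solver": ["type"],
    "run": ["steps"],
}


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        if section not in config:
            problems.append(f"missing section '{section}'")
            continue
        for key in keys:
            if key not in config[section]:
                problems.append(f"missing key '{section}.{key}'")
    grid = config.get("grid", {})
    for axis in ("nx", "ny"):
        value = grid.get(axis)
        if value is not None and (not isinstance(value, int) or value < 2):
            problems.append(f"'grid.{axis}' must be an integer >= 2, got {value!r}")
    return problems


good = {"grid": {"nx": 64, "ny": 64}, "solver": {"type": "projection"}, "run": {"steps": 10}}
bad = {"grid": {"nx": "64"}, "solver": {}}
good_problems = validate_config(good)
bad_problems = validate_config(bad)
```

Naming the full dotted path of each offending key lets a user fix the file without reading the loader's source.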
Priority: P4 - Extensibility
Estimated Effort: 3-5 days
- Allow users to register custom solvers
- Support custom boundary conditions
- Enable third-party extensions
- 20.1 Define plugin interface

      from cfd_python.plugins import SolverPlugin, register_plugin

      class MySolver(SolverPlugin):
          name = "my_custom_solver"

          def step(self, u, v, p, dt):
              # Custom solver implementation
              ...

      register_plugin(MySolver)

- 20.2 Plugin discovery mechanism

      # Automatic discovery via entry points
      # pyproject.toml of plugin package:
      [project.entry-points."cfd_python.plugins"]
      my_solver = "my_package:MySolver"

- 20.3 Custom BC plugin support

      from cfd_python.plugins import BCPlugin

      class RotatingWall(BCPlugin):
          name = "rotating_wall"

          def apply(self, u, v, nx, ny, edge, omega):
              # Rotating wall BC implementation
              ...

- 20.4 Plugin validation and testing utilities

- Users can create and register custom solvers
- Plugins discovered automatically via entry points
- Plugin API is stable and documented
Priority: P2 - Performance
Estimated Effort: 2-3 days
- Reduce memory footprint for large simulations
- Support memory-mapped arrays
- Enable out-of-core processing
- 21.1 Add memory usage reporting

      from cfd_python.memory import estimate_memory, get_memory_usage

      mem = estimate_memory(nx=1024, ny=1024, solver="projection")
      print(f"Estimated memory: {mem / 1e9:.2f} GB")

      # During simulation
      usage = get_memory_usage()
      print(f"Current usage: {usage['allocated_mb']:.1f} MB")

- 21.2 Memory-mapped field storage

      grid = Grid(nx=4096, ny=4096)
      sim = Simulation(grid, memory_mode="mmap", mmap_dir="/tmp/cfd")

- 21.3 Streaming output for large simulations
  - Write output incrementally instead of buffering
  - Support compressed output formats
- 21.4 Memory pool for repeated allocations

- Memory usage is predictable and reportable
- Large simulations can run with limited RAM
- No memory leaks in long-running simulations
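
A first-order memory estimate is just the number of grid points times the number of arrays times the bytes per value. The sketch below works that arithmetic through. The counts of field and work arrays are placeholders (a real solver's scratch storage would have to be measured), and the signature differs from the planned `estimate_memory(nx, ny, solver=...)`.

```python
def estimate_memory(nx: int, ny: int, n_fields: int = 3,
                    n_work_arrays: int = 4, bytes_per_value: int = 8) -> int:
    """Estimate bytes needed for field and scratch arrays on an nx-by-ny grid."""
    return nx * ny * (n_fields + n_work_arrays) * bytes_per_value


# 1024 * 1024 points * 7 arrays * 8 bytes (FP64) = 58,720,256 bytes
mem = estimate_memory(nx=1024, ny=1024)
summary = f"Estimated memory: {mem / 1e6:.1f} MB"
```

Under these assumed array counts a 1024 × 1024 double-precision grid needs roughly 59 MB, and a 4096 × 4096 grid about 16 times that.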
Priority: P3 - CUDA enhancement
Estimated Effort: 2-3 days
- Expose GPU memory information
- Support multi-GPU configurations
- Optimize GPU memory transfers
- 22.1 GPU memory reporting

      from cfd_python.cuda import get_gpu_info, get_gpu_memory

      info = get_gpu_info()
      # {'device_count': 2, 'devices': [{'name': 'RTX 4090', ...}, ...]}

      mem = get_gpu_memory(device=0)
      # {'total_mb': 24576, 'free_mb': 20000, 'used_mb': 4576}

- 22.2 Device selection

      from cfd_python.cuda import set_device

      set_device(1)        # Use second GPU
      sim.run(steps=1000)  # Runs on GPU 1

- 22.3 Multi-GPU domain decomposition

      sim = Simulation(
          grid,
          solver="projection_cuda",
          devices=[0, 1],     # Split across 2 GPUs
          decomposition="y",  # Split along y-axis
      )

- 22.4 Pinned memory for faster transfers

- GPU memory usage is visible and controllable
- Multi-GPU runs work correctly
- Memory transfers are optimized
Priority: P2 - Debugging and monitoring
Estimated Effort: 1-2 days
- Structured logging throughout the library
- Integration with Python logging
- Performance metrics collection
- 23.1 Add structured logging

      import logging

      logging.basicConfig(level=logging.INFO)

      # cfd_python will now log:
      # INFO:cfd_python:Starting simulation with projection solver
      # INFO:cfd_python:Step 100/1000, max_vel=1.234, residual=1.2e-5

- 23.2 Configurable log levels

      import cfd_python

      cfd_python.set_log_level("DEBUG")    # Verbose output
      cfd_python.set_log_level("WARNING")  # Only warnings and errors

- 23.3 Metrics collection

      from cfd_python.metrics import get_metrics

      sim.run(steps=1000, collect_metrics=True)
      metrics = get_metrics()
      # {'solver_time_ms': [...], 'bc_time_ms': [...], 'memory_mb': [...]}

- 23.4 Integration with OpenTelemetry (optional)

- Logging works with standard Python logging
- Debug information helps troubleshoot issues
- Metrics enable performance analysis
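
On the Python side, `set_log_level` could be a thin wrapper over the standard `logging` module, as sketched here. The `_LEVELS` table and the error handling are assumptions; the logger name `cfd_python` follows the sample log output in task 23.1.

```python
import logging

logger = logging.getLogger("cfd_python")
# A NullHandler keeps the library silent unless the application configures logging.
logger.addHandler(logging.NullHandler())

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def set_log_level(level: str) -> None:
    """Set the package logger level from a name such as 'DEBUG' or 'WARNING'."""
    try:
        logger.setLevel(_LEVELS[level.upper()])
    except KeyError:
        raise ValueError(f"unknown log level: {level!r}") from None


set_log_level("DEBUG")
logger.debug("Starting simulation with %s solver", "projection")
```

Using a named logger rather than the root logger lets applications raise or lower cfd-python verbosity without affecting their own log output.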
Priority: P1 - User adoption
Estimated Effort: 3-5 days
- Comprehensive API documentation
- Tutorials and guides
- Searchable documentation site
- 24.1 Set up Sphinx documentation
  - API reference from docstrings
  - Getting started guide
  - Installation instructions
- 24.2 Write tutorials
  - Basic simulation walkthrough
  - Boundary conditions guide
  - Performance optimization tips
  - CUDA setup guide
- 24.3 Add example gallery
  - Lid-driven cavity
  - Channel flow
  - Flow around cylinder
  - Heat transfer examples
- 24.4 Deploy to Read the Docs
- 24.5 Add docstring coverage to CI

- All public functions have docstrings
- Tutorials cover common use cases
- Documentation is searchable and navigable
Priority: P3 - Research/Experimental
Estimated Effort: 2-4 weeks
- Enable ML-based surrogate models trained on CFD solver outputs
- Support Physics-Informed Neural Networks (PINNs) for fast inference
- Provide data generation utilities for training datasets
Physics-Informed Neural Networks embed physical laws (Navier-Stokes equations) into the loss function, achieving:
- ~1000× speedup over traditional CFD for inference
- <5% relative error for velocity fields
- 5-10× less memory than CFD solvers
Limitation: Performance degrades for Re > 200 (transitional/turbulent flows).
- 25.1 Dataset Generation Module

      from cfd_python.ml import DatasetGenerator

      # Generate training data from CFD simulations
      generator = DatasetGenerator(
          solver="projection",
          grid_sizes=[(32, 32), (64, 64), (128, 128)],
          reynolds_range=(10, 200),
          n_samples=500,
      )

      # Returns numpy arrays suitable for ML frameworks
      X_train, y_train = generator.generate()

      # Export to common formats
      generator.to_hdf5("training_data.h5")
      generator.to_numpy("training_data.npz")

- 25.2 PINN Loss Function Utilities

      from cfd_python.ml import NavierStokesLoss

      # Physics loss for PINNs (framework-agnostic numpy version)
      ns_loss = NavierStokesLoss(Re=100, dx=0.01, dy=0.01)

      # Compute residuals for Navier-Stokes equations
      continuity_residual = ns_loss.continuity(u, v)
      momentum_x_residual = ns_loss.momentum_x(u, v, p)
      momentum_y_residual = ns_loss.momentum_y(u, v, p)

- 25.3 PyTorch Integration (Optional Dependency)

      from torch.utils.data import DataLoader

      from cfd_python.ml.torch import PINNLoss, CFDDataset

      # PyTorch-native physics loss
      criterion = PINNLoss(
          data_weight=1.0,
          physics_weight=0.1,  # Balance data vs physics
          Re=100,
      )

      # Dataset compatible with DataLoader
      dataset = CFDDataset.from_hdf5("training_data.h5")
      loader = DataLoader(dataset, batch_size=32)

- 25.4 Pre-trained Model Zoo

      from cfd_python.ml import load_pretrained

      # Load pre-trained surrogate model
      model = load_pretrained("cavity_flow_64x64")

      # Fast inference (~1000x faster than CFD)
      u, v, p = model.predict(Re=100, lid_velocity=1.0)

- 25.5 Benchmark Comparisons

      from cfd_python.ml import benchmark_surrogate

      # Compare surrogate vs CFD solver
      results = benchmark_surrogate(
          model=surrogate_model,
          solver="projection",
          test_cases=[{"Re": 50}, {"Re": 100}, {"Re": 150}],
          metrics=["l2_error", "max_error", "inference_time"],
      )

      print(results.summary())
      # Re=50:  L2=0.023, Max=0.045, Speedup=1243x
      # Re=100: L2=0.031, Max=0.067, Speedup=1156x
      # Re=150: L2=0.048, Max=0.092, Speedup=1089x

Optional dependency groups:

    [project.optional-dependencies]
    ml = ["numpy>=1.20", "h5py>=3.0"]
    ml-torch = ["torch>=2.0", "cfd-python[ml]"]
    ml-jax = ["jax>=0.4", "flax>=0.7", "cfd-python[ml]"]

| Architecture | Use Case | Notes |
|---|---|---|
| Fully Connected | Simple surrogate | Fast training, limited accuracy |
| Convolutional | Grid-based predictions | Good for structured grids |
| Fourier Neural Operator | Resolution-independent | State-of-the-art for PDEs |
| Graph Neural Network | Irregular meshes | Handles complex geometries |
- Physics-Informed Neural Networks (Raissi et al.)
- Fourier Neural Operator (Li et al.)
- DeepXDE Library
- Dataset generation works with all supported solvers
- PyTorch integration passes CI tests
- Pre-trained models achieve <5% error on benchmark cases
- Documentation includes end-to-end PINN training example
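
The continuity part of a physics loss (task 25.2) penalizes the divergence ∂u/∂x + ∂v/∂y of the predicted velocity field. A framework-free sketch using central differences on a uniform grid is shown below. The function name, the `[j][i]` indexing convention, and the pure-Python lists are assumptions made to keep the example dependency-free; a real implementation would likely operate on NumPy arrays or tensors.

```python
from typing import List

Grid2D = List[List[float]]


def continuity_residual(u: Grid2D, v: Grid2D, dx: float, dy: float) -> Grid2D:
    """Central-difference du/dx + dv/dy at interior points; arrays are indexed [j][i]."""
    ny, nx = len(u), len(u[0])
    residual = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            du_dx = (u[j][i + 1] - u[j][i - 1]) / (2.0 * dx)
            dv_dy = (v[j + 1][i] - v[j - 1][i]) / (2.0 * dy)
            residual[j][i] = du_dx + dv_dy
    return residual


# Divergence-free test field: u = x, v = -y, so the residual should vanish.
n, h = 5, 0.25
u = [[i * h for i in range(n)] for j in range(n)]
v = [[-j * h for i in range(n)] for j in range(n)]
res = continuity_residual(u, v, h, h)
max_residual = max(abs(r) for row in res for r in row)
```

Boundary points are left at zero here; how a loss treats them (one-sided differences or exclusion) is a design decision this roadmap leaves open.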
| Version | Phases | Focus |
|---|---|---|
| 0.2.0 | 8 (mostly done), 9 | Type safety & IDE support |
| 0.3.0 | 10, 11 | High-level API & NumPy |
| 0.4.0 | 12, 13 | cfd-visualization integration & profiling |
| 0.5.0 | 14, 17 | Async, parallel & validation |
| 0.6.0 | 15, 16 | 3D support & checkpointing |
| 0.7.0 | 18, 19, 23 | Jupyter, config & logging |
| 0.8.0 | 20, 21, 22 | Plugins & memory optimization |
| 0.9.0 | 25 | ML & PINN integration |
| 1.0.0 | 24 | Documentation & stabilization |
Items not yet planned but worth considering:
- Turbulence models (k-ε, k-ω, LES)
- Multiphase flow support
- Compressible flow solvers
- Thermal coupling (energy equation)
- Species transport
- Moving mesh support
- OpenFOAM mesh import/export
- CGNS format support
- ParaView Catalyst integration
- NetCDF output format
- STL geometry import
- Mixed precision (FP32/FP64)
- Sparse matrix solvers
- Multigrid preconditioners
- Domain decomposition for MPI
- VS Code extension
- PyCharm plugin
- Web-based simulation dashboard
- Docker images with pre-built CUDA support
Contributions are welcome! Priority areas:
- Type stubs - Help complete the `.pyi` file
- Documentation - Examples and tutorials
- Testing - Edge cases and platform coverage
- Validation cases - Add more benchmark problems
For visualization contributions, see cfd-visualization.
See CONTRIBUTING.md for guidelines.
- README.md - User documentation
- CHANGELOG.md - Version history
- cfd-visualization ROADMAP - Visualization library roadmap