Problem
The existing data-upload/uploader.py is a Selenium browser-automation
script that submits files through the live website UI. It contains no
PSI-MI TAB parsing logic, and requirements.txt lists only selenium.
As a result, the data-upload/ directory has no Python-based parsing,
validation, or data modeling layer. All parsing currently lives in the
PHP backend, which v2.0 is replacing.
For openPIP 2.0 to work as a Python-based system, the following
components are needed before anything else in the upload pipeline can
be built:
- A Python parser for PSI-MI TAB v2.7 format
- A parser for the new CSV format mentioned in the v2.0 goals
- A validation layer that catches format errors before DB insertion
- Python data models that map parsed output directly to the existing
openpip.sql schema so the path to DB insertion is straightforward
Proposed Fix
Add the above as a self-contained Python module inside data-upload/
so it can be directly extended into the full FastAPI upload pipeline
in v2.0.
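
To make the scope concrete, here is a minimal sketch of what the parsing
and validation core of such a module might look like. It is not the code
in the PR. The names Interaction, MitabFormatError, and parse_mitab27 are
hypothetical, the model covers only the first two MITAB columns, and
rejecting "-" in column 1 is an assumption of this sketch rather than a
rule taken from openPIP. The only format facts relied on are that
PSI-MI TAB 2.7 rows have 42 tab-separated columns and that "-" marks an
empty field.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# PSI-MI TAB 2.7 rows have 42 tab-separated columns.
MITAB27_COLUMNS = 42


@dataclass(frozen=True)
class Interaction:
    """Hypothetical minimal model; a real one would mirror openpip.sql."""
    interactor_a: str  # column 1: identifier(s) for interactor A
    interactor_b: str  # column 2: identifier(s) for interactor B


class MitabFormatError(ValueError):
    """Raised when a row cannot be a valid MITAB 2.7 record."""


def parse_mitab27(lines: Iterable[str]) -> Iterator[Interaction]:
    """Yield one Interaction per data row, validating before any DB work."""
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Skip blank lines and the "#"-prefixed header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != MITAB27_COLUMNS:
            raise MitabFormatError(
                f"line {lineno}: expected {MITAB27_COLUMNS} columns, "
                f"got {len(fields)}"
            )
        # Assumption of this sketch: interactor A must not be empty ("-").
        if fields[0] == "-":
            raise MitabFormatError(f"line {lineno}: interactor A is empty")
        yield Interaction(interactor_a=fields[0], interactor_b=fields[1])
```

Because the parser accepts any iterable of strings, it could be fed an
open file handle or the decoded body of an uploaded file, which is what
would let a later FastAPI endpoint reuse it unchanged.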
A working implementation addressing all of the above is in PR #XX.