[GSoC 2026] No Python parsing layer in data-upload/ — PSI-MI TAB v2.7 and CSV support needed #96

@Abhishek-Kumar-Rai5

Problem

The existing data-upload/uploader.py is a Selenium browser automation
script that submits files through the live website UI. It contains no
PSI-MI TAB parsing logic, and requirements.txt lists only selenium.

As a result, the data-upload/ directory has no Python-based parsing,
validation, or data modeling layer. All parsing currently lives in the
PHP backend, which v2.0 is replacing.

For openPIP 2.0 to work as a Python-based system, the following are
needed before anything else in the upload pipeline can be built:

  • A Python parser for PSI-MI TAB v2.7 format
  • A parser for the new CSV format mentioned in the v2.0 goals
  • A validation layer that catches format errors before DB insertion
  • Python data models that map parsed output directly to the existing
    openpip.sql schema so the path to DB insertion is straightforward
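
To make the scope of the first and third items concrete, here is a minimal sketch of a line-level parser with validation. It assumes the standard PSI-MI TAB 2.7 layout of 42 tab-separated columns, with "|" separating multiple values and "-" marking an empty field. The names Interaction, MitabError, split_field, parse_line and parse_mitab are hypothetical, and the model fields are placeholders rather than the actual openpip.sql columns.

```python
from dataclasses import dataclass, field

MITAB27_COLUMNS = 42  # PSI-MI TAB 2.7 defines 42 tab-separated columns


class MitabError(ValueError):
    """Raised when a line does not satisfy the basic format checks."""


@dataclass
class Interaction:
    # Hypothetical model: real field names should mirror openpip.sql.
    id_a: str
    id_b: str
    detection_methods: list = field(default_factory=list)
    line_no: int = 0


def split_field(raw):
    """Split a multi-valued field on '|'; '-' or empty means no value.

    Simplification: a '|' inside a quoted value is not handled here.
    """
    raw = raw.strip()
    if raw in ("", "-"):
        return []
    return raw.split("|")


def parse_line(line, line_no):
    """Parse one data line into an Interaction or raise MitabError."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != MITAB27_COLUMNS:
        raise MitabError(
            f"line {line_no}: expected {MITAB27_COLUMNS} columns, got {len(cols)}"
        )
    ids_a = split_field(cols[0])  # column 1: identifier(s) of interactor A
    ids_b = split_field(cols[1])  # column 2: identifier(s) of interactor B
    # Assumption for this sketch: both interactors must be present.
    if not ids_a or not ids_b:
        raise MitabError(f"line {line_no}: missing interactor identifier")
    for ident in (ids_a[0], ids_b[0]):
        if ":" not in ident:  # identifiers use the 'database:accession' form
            raise MitabError(f"line {line_no}: malformed identifier {ident!r}")
    return Interaction(
        id_a=ids_a[0],
        id_b=ids_b[0],
        detection_methods=split_field(cols[6]),  # column 7: detection method(s)
        line_no=line_no,
    )


def parse_mitab(lines):
    """Return (records, errors) so bad rows are reported before DB insertion."""
    records, errors = [], []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the optional header line
        try:
            records.append(parse_line(line, line_no))
        except MitabError as exc:
            errors.append(str(exc))
    return records, errors
```

Collecting errors instead of stopping at the first one lets an upload report every bad row in a single pass. Calling parse_mitab(open(path, encoding="utf-8")) on an uploaded file would be one way to use it.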

Proposed Fix

Add the above as a self-contained Python module inside data-upload/
so it can be directly extended into the full FastAPI upload pipeline
in v2.0.

A working implementation addressing all of the above is in PR #XX.
