[GSoC 2026] Codebase Analysis & Proposed Migration Architecture :- Feedback Requested #95
About Me
Hi, I'm a GSoC 2026 applicant interested in the openPIP 2.0 modernization project
under NRNB. I've spent time going through the existing codebase, the published paper
(Helmy et al., JMB 2022), and the GSoC project description carefully before posting
this. I want to share my understanding of the current system and my proposed migration
approach, and get feedback from the mentors before I finalize my proposal.
Section 1: Current Stack — What I've Mapped Out
From the codebase and paper, here's my understanding of the existing architecture:
| Layer | Current (v1) |
|---|---|
| Language | PHP 7.2 |
| Framework | Symfony 2.x |
| ORM | Doctrine ORM |
| Frontend | jQuery, jQuery UI, Cytoscape.js, FooTable, qTip2, TinyMCE |
| Database | MySQL 8.0 |
| Server | Apache (php:7.2.0-apache) |
| Containerization | Docker Compose |
| Data Format | PSI-MI TAB v2.7 |
| External APIs | UniProt, Ensembl (protein annotation during upload) |
| Asset pipeline | Symfony Assetic |
| File uploads | vich/uploader-bundle |
Key entry points I've identified:
- `start.sh` / `populate_db.sh` — Docker startup and DB seeding scripts
- `src/AppBundle/Controller/` — page controllers (Search, Admin, Download, User)
- `src/AppBundle/Entity/` — Doctrine ORM entities (Protein, Interaction, Dataset, etc.)
- `web/` — public root with `app.php` and static assets
The current admin workflow is: PSI-MI TAB file upload → PHP parser →
MySQL population → UniProt/Ensembl annotation fetch → web display via Symfony/Twig.
Section 2: Proposed Migration — openPIP 2.0 Stack
Here's the mapping I'm proposing for the new stack:
| Layer | Current (v1) | Proposed (v2.0) |
|---|---|---|
| Language | PHP 7.2 | Python 3.11+ |
| Framework | Symfony 2.x | FastAPI (or Django REST) |
| ORM | Doctrine | SQLAlchemy (FastAPI) / Django ORM |
| Frontend | jQuery + Twig templates | React + Vite |
| Network viz | Cytoscape.js (jQuery-bound) | Cytoscape.js (React wrapper) |
| Database | MySQL 8.0 | PostgreSQL (or keep MySQL) |
| Containerization | Docker Compose | Docker Compose (retained + improved) |
| Data formats | PSI-MI TAB v2.7 only | PSI-MI TAB v2.7 + CSV |
| File uploads | vich/uploader-bundle | React drag-and-drop + FastAPI endpoints |
| External APIs | UniProt, Ensembl | UniProt REST API + Ensembl REST API |
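To make the external-API row concrete, here's a minimal sketch of what the v2 annotation fetch might look like. The endpoint path follows the public UniProt REST API; the function names are my own illustration, not existing openPIP code:

```python
import json
import urllib.request

UNIPROT_BASE = "https://rest.uniprot.org/uniprotkb"  # public UniProt REST API

def uniprot_url(accession: str) -> str:
    """Build the JSON endpoint URL for a single UniProtKB accession."""
    return f"{UNIPROT_BASE}/{accession}.json"

def fetch_annotation(accession: str, timeout: float = 10.0) -> dict:
    """Fetch one protein record; the real pipeline would add retries and batching."""
    with urllib.request.urlopen(uniprot_url(accession), timeout=timeout) as resp:
        return json.load(resp)
```

In the actual upload pipeline these calls would be batched and made async so a large MITAB file doesn't serialize thousands of HTTP round trips.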
My reasoning for FastAPI over Django: the core of openPIP 2.0 is data
ingestion + query serving — FastAPI's async nature suits this well, and
its automatic OpenAPI docs would give openPIP a REST API essentially for
free (something v1 explicitly lacks per the paper). However, I'm open to
Django if the mentors prefer it for its batteries-included admin panel.
Section 3: My Proposed Approach for the Data Upload Pipeline
This is the most critical component. The current PHP upload flow would
map to Python as:
[File Upload (React drag-drop)]
↓
[FastAPI endpoint receives file]
↓
[Python PSI-MI TAB v2.7 parser] ← I've already started a PoC of this
↓
[Validation layer — column count, format checks, controlled vocabulary]
↓
[UniProt/Ensembl REST API calls for protein annotation]
↓
[SQLAlchemy models → PostgreSQL]
↓
[Progress feedback via WebSocket or SSE to React frontend]
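The parse and validate steps above can be sketched as follows. This is a simplified illustration (column-count check and identifier splitting only); the field indices follow the PSI-MI TAB v2.7 column order, and the record shape is a hypothetical internal model, not the repo's actual schema:

```python
MITAB27_COLUMNS = 42  # PSI-MI TAB v2.7 defines 42 tab-separated columns

def split_identifier(field: str) -> tuple[str, str]:
    """Split a 'database:accession' field, e.g. 'uniprotkb:P12345'."""
    db, _, acc = field.partition(":")
    return db, acc

def parse_mitab_line(line: str, line_no: int = 0) -> dict:
    """Parse one MITAB 2.7 row into a small internal record.

    Raises ValueError on a wrong column count so the upload endpoint
    can report the failing line number back to the admin UI.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != MITAB27_COLUMNS:
        raise ValueError(
            f"line {line_no}: expected {MITAB27_COLUMNS} columns, got {len(cols)}"
        )
    db_a, acc_a = split_identifier(cols[0])  # column 1: unique ID, interactor A
    db_b, acc_b = split_identifier(cols[1])  # column 2: unique ID, interactor B
    return {
        "interactor_a": {"db": db_a, "accession": acc_a},
        "interactor_b": {"db": db_b, "accession": acc_b},
        "detection_method": cols[6],  # column 7: interaction detection method
    }
```

Controlled-vocabulary validation (checking MI ontology terms) would sit on top of this, which is what my proof-of-concept parser is working toward.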
For CSV support (new in v2.0), I'd add a normalization step that maps
CSV columns to the internal data model before hitting the same
validation/storage pipeline.
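That normalization step could be as simple as a header mapping. The column names below are hypothetical (the real mapping would depend on the answer to Q3); the point is that CSV rows land in the same internal shape the MITAB path produces:

```python
import csv
import io

# Hypothetical header mapping: lab CSV column -> internal field name.
CSV_TO_INTERNAL = {
    "Protein A": "interactor_a",
    "Protein B": "interactor_b",
    "Method": "detection_method",
}

def normalize_csv(text: str) -> list[dict]:
    """Map CSV rows onto the internal model used by the MITAB pipeline.

    Unmapped columns are dropped; a missing mapped column raises KeyError
    so the validation layer can surface it to the uploader.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [
        {internal: row[csv_col] for csv_col, internal in CSV_TO_INTERNAL.items()}
        for row in reader
    ]
```

After this step, CSV and MITAB uploads share one validation/annotation/storage path, which keeps the new format from doubling the pipeline's surface area.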
Section 4: Questions for the Mentors
Before I finalize my proposal, I have 3 specific questions I'd genuinely
like your input on:
Q1 — Database: Is the plan to redesign the MySQL schema from scratch
for v2.0, or should the new schema preserve backward compatibility with
the existing one? This significantly affects how I scope the migration work.
Q2 — Framework preference: Do you have a preference between FastAPI
and Django for the backend? The paper mentions Python as preferred —
either works, but knowing your preference early would help me write a
more aligned proposal.
Q3 — Scope of "CSV support": When the project description mentions
a "simpler tabular CSV format," does this mean a simplified subset of
PSI-MI fields in CSV form, or a completely custom schema that labs can
define? This changes how complex the normalization layer needs to be.
I'm happy to discuss any of the above or adjust my approach based on
your feedback. I'll also be opening a small proof-of-concept PR
(a Python PSI-MI TAB parser) shortly as a concrete contribution.
Thanks for your time.