YaroslavMayorov/chip_seq_data_search

Intelligent search through public ChIP-Seq data

Table of Contents

  1. Overview
  2. Tech Stack
  3. Project Structure
  4. Setup and Installation
  5. Usage
  6. FAQ
  7. Contacts

Overview

This project is a genomic interval lookup service: users upload a BED file to the database, and the service finds the most similar stored files, using the Jaccard index as the similarity measure.
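To make the similarity measure concrete, here is a minimal pure-Python sketch of a base-pair Jaccard index and a top-k lookup. It is an illustration under stated assumptions, not the project's implementation (which relies on bedtools): the names `merge`, `covered_bp`, `jaccard`, and `most_similar` are hypothetical, and intervals are assumed to be half-open `(chromosome, start, end)` tuples as in BED.

```python
import heapq
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open like BED


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge overlapping or touching ones per chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Total number of base pairs covered by already-merged intervals."""
    return sum(end - start for _, start, end in intervals)


def jaccard(a: List[Interval], b: List[Interval]) -> float:
    """Base-pair Jaccard index: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    a_bp = covered_bp(merge(a))
    b_bp = covered_bp(merge(b))
    union_bp = covered_bp(merge(a + b))
    # Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|
    intersection_bp = a_bp + b_bp - union_bp
    return intersection_bp / union_bp if union_bp else 0.0


def most_similar(query: List[Interval], stored: Dict[str, List[Interval]], k: int):
    """Return (score, name) pairs for the k stored files most similar to the query."""
    scores = ((jaccard(query, ivs), name) for name, ivs in stored.items())
    return heapq.nlargest(k, scores)
```

For example, `jaccard([("chr1", 0, 100)], [("chr1", 50, 150)])` covers 50 shared base pairs out of 150 in the union, giving a score of one third.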


Tech Stack

  • Backend: Flask (Python)

  • Database: PostgreSQL; MinIO for object storage

  • Processing: bedtools for computing the similarity measure

  • Frontend: HTML, Bootstrap
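
As a rough sketch of how a Python backend might call bedtools for the similarity measure, the snippet below wraps the `bedtools jaccard -a ... -b ...` command. The function names are hypothetical and this is not taken from the project's code. It assumes bedtools is on the `PATH`, that both BED files are position-sorted, and that the tool prints a tab-delimited header line followed by one row of values with a `jaccard` column.

```python
import subprocess


def parse_jaccard_output(text: str) -> float:
    """Extract the `jaccard` column from the tab-delimited header + value rows."""
    header, values = text.strip().splitlines()[:2]
    row = dict(zip(header.split("\t"), values.split("\t")))
    return float(row["jaccard"])


def bedtools_jaccard(path_a: str, path_b: str) -> float:
    """Run `bedtools jaccard` on two position-sorted BED files (paths are assumptions)."""
    result = subprocess.run(
        ["bedtools", "jaccard", "-a", path_a, "-b", path_b],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if bedtools exits with an error
    )
    return parse_jaccard_output(result.stdout)
```

Parsing by column name rather than by position keeps the helper independent of the exact column order in the output.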


Project Structure

.
├── app/               
│   ├── data/                          # Main files for db
│   │   ├── ENCFF082UWB.bed
│   │   ├── ENCFF190KNC.bed
│   │   ├── ENCFF247CME.bed
│   │   ├── ENCFF608BGQ.bed
│   │   ├── ENCFF832YGL.bed
│   │
│   ├── templates/                     # HTML templates for the web interface
│   │   ├── file_details.html
│   │   ├── similar_files.html
│   │   ├── upload.html
│   │
│   ├── config.py                      # Project configurations              
│   ├── main.py                        # Flask server entry point
│   ├── minio_utils.py                 # Utilities for working with MinIO
│   ├── models.py                      # The logic of working with data
│   ├── wait_for_db.py                 # Waits for the database to become ready
│
├── docker-compose.yml                 # Connects cloud storage, database, and application.
├── Dockerfile                         # Instructions for installing dependencies and launching
├── README.md                          
├── requirements.txt                   # Required Python libraries 

Setup and Installation

Clone the repository:

git clone git@github.com:YaroslavMayorov/chip_seq_data_search.git
cd chip_seq_data_search

Usage

  1. Start the application:

docker compose up -d --build

  2. Open a browser:

Go to localhost:5001.

  3. Upload a BED file:

Select a BED file to upload, specify the number of similar files you want to see, and click "Upload".

After that, you will see a list of similar BED files. The Jaccard index for each file is shown on the right.

  4. View a similar file:

    Click on the file name to view its contents.

  5. Download a similar file:

    Next to the file content, there is a "Download" button. Click it to download the file.

  6. Stop the containers and clean up:

Stop the containers and remove the containers, volumes, and images associated with the project:

docker compose down -v --rmi all --remove-orphans

Remove all build cache (warning: this affects all projects, not just this one):

docker builder prune -af

FAQ

Why did I choose this internship?

I studied in a biology-focused class, so I have a strong background in biology. I have long wanted to work in bioinformatics, and this internship is a great opportunity for me.

For about two years, I have been working with Flask, developing various web applications, from simple projects to multi-page websites. You can see one of my projects here: arthouserooms.pythonanywhere.com.

Recently, I developed my wallpaper Telegram bot, @InspireWallBot, so I know how to work with server-side databases and deploy applications.

I also completed a data analysis course from Tinkoff and a machine learning specialization from Yandex, which gave me valuable experience working with data visualization. In addition, I prepared for the DANO Olympiad in data analysis, so I have a solid understanding of statistics, data processing, and interpretation.

I really want to join your team because this topic is close to my heart. I am ready to learn new things and help develop a useful tool for biological data research. I would be grateful for the opportunity to be part of this project!

Command "docker-compose" not found.

Linux:

   sudo apt update
   sudo apt install docker-compose

macOS:

   brew install docker-compose

What if a port is unavailable / the address is already in use?

By default, the app is on port 5001, PostgreSQL on 5432, and MinIO on 9000 and 9090. To move the app to another port, change the host side of the port mapping in docker-compose.yml:

ports:
   - "<your_port>:5000"
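
Before editing the mapping, it can help to see which of the default ports are actually taken. The following is a small sketch, not part of the project: `port_is_free` is a hypothetical helper that simply tries to bind a TCP socket on the local machine, and the port list is the set of defaults mentioned above.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:  # e.g. "Address already in use"
            return False
    return True


if __name__ == "__main__":
    # Default ports from this README: app, PostgreSQL, MinIO API and console.
    for port in (5001, 5432, 9000, 9090):
        print(port, "free" if port_is_free(port) else "in use")
```

Any port reported as in use is a candidate for replacing with a free value on the host side of the mapping.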

Where are files stored?

Uploaded files are stored in MinIO. You can browse them in the MinIO Dashboard. Sign in with MINIO_ROOT_USER as the login and MINIO_ROOT_PASSWORD as the password (both are set in docker-compose.yml).

Why did you post code with a secret key and passwords?

I know that secret keys should be loaded from a `.env` file. However, this is just a test app. I included them directly in the code so that users don't have to create a `.env` file manually, which makes the app easier to run.

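
For a deployment beyond testing, one possible pattern is to read secrets from environment variables and fall back to test values only when they are unset. This is a sketch under assumptions, not the project's `config.py`: `load_settings`, `DEFAULTS`, the `SECRET_KEY` name, and all fallback values are hypothetical, and only the `MINIO_ROOT_USER` and `MINIO_ROOT_PASSWORD` names come from this README.

```python
import os

# Hypothetical fallback values for local testing only; not the project's real settings.
DEFAULTS = {
    "SECRET_KEY": "dev-only-secret",
    "MINIO_ROOT_USER": "minioadmin",
    "MINIO_ROOT_PASSWORD": "minioadmin",
}


def load_settings(environ=os.environ) -> dict:
    """Prefer values from the environment, falling back to the test defaults."""
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}


settings = load_settings()
```

With Docker Compose, such variables could be supplied through the `environment:` or `env_file:` keys of a service, so that the values stay out of the source code.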
Why do you use PostgreSQL and MinIO? You could use SQLite for a local app.

Yes, SQLite is more convenient and faster to implement for local applications. However, I wanted to make the app closer to a real production setup and demonstrate that I know how to work with the tools typically used in such environments.


Contacts

If you have any questions, feel free to contact me:
