Tisya Sharma

Hi! I'm a 4th-year Data Science student at Northeastern University (five-year program, graduating in 2027). I've worked on applied data science projects spanning time-series forecasting, biological data analysis, and model-driven applications. I enjoy building and evaluating predictive models, designing data pipelines, and developing full-stack tools to explore and communicate results.

I’m excited to continue working with real-world, high-dimensional data and to contribute to projects where data science can support meaningful analysis and decisions!


Projects


Machine Learning & Forecasting

Flight Delay Forecasting | Repo | Live Demo

Route-level time-series forecasting using U.S. flight data (2019–2025) and ERA5 weather reanalysis (Open-Meteo API).

  • Evaluated XGBoost, LightGBM, LSTM, and TCN using 4-fold walk-forward validation
  • Best performance achieved by XGBoost / LightGBM (11.25 min MAE, ~77.7% hit rate)
  • Removing weather features increased error by 10.3%
  • Built an interactive React + D3 dashboard for exploring forecasts, residuals, model comparisons, and performance diagnostics
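For readers unfamiliar with walk-forward validation, here is a minimal sketch of how 4-fold expanding-window splits and MAE could be computed over time-ordered rows. It is not the repo's code: `walk_forward_splits` and `mae` are hypothetical helpers, and equal-sized test blocks are an assumption (the README doesn't say how the folds were sized).

```python
from typing import List, Sequence, Tuple


def walk_forward_splits(n_samples: int, n_folds: int = 4) -> List[Tuple[range, range]]:
    """Expanding-window walk-forward splits over rows sorted by time.

    The series is cut into n_folds + 1 contiguous blocks. Fold k trains on
    blocks 0..k and tests on block k + 1, so training rows always precede
    test rows. Any leftover rows are added to the final test block.
    """
    block = n_samples // (n_folds + 1)
    if block == 0:
        raise ValueError("need at least n_folds + 1 samples")
    splits = []
    for k in range(n_folds):
        train_end = block * (k + 1)
        test_end = n_samples if k == n_folds - 1 else train_end + block
        splits.append((range(0, train_end), range(train_end, test_end)))
    return splits


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, e.g. in minutes of delay."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Usage: print fold boundaries for 100 time-ordered rows.
for i, (train, test) in enumerate(walk_forward_splits(100)):
    print(f"fold {i}: train [0, {train.stop}), test [{test.start}, {test.stop})")
```

scikit-learn's `TimeSeriesSplit(n_splits=4)` produces a similar expanding-window scheme if you'd rather not hand-roll it. Either way, the key property is that every test block comes strictly after its training data, so no fold is scored on data from before the period it was trained on.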

Computational Biology & Research

AON Single-Nucleus RNA-Seq Analysis | Repo

Single-nucleus RNA sequencing analysis of the mouse Anterior Olfactory Nucleus, integrating 10x Genomics data with the Allen Brain Cell Atlas.

  • Applied PCA-based dimensionality reduction and clustering to analyze high-dimensional gene expression data
  • Analyzed 36,863 nuclei, retaining 8,885 high-quality cells across 28 clusters
  • Identified spatial segregation of excitatory neurons into dorsolateral and ventromedial populations
  • Discovered Abi3bp as the top marker of contralaterally projecting neurons using FDR-corrected testing
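The README doesn't name the FDR correction used. Assuming the common Benjamini-Hochberg procedure (also Scanpy's default for `rank_genes_groups`), a minimal pure-Python sketch of the adjustment looks like this; `benjamini_hochberg` is a hypothetical helper, not code from the repo.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted ascending; rank r (1-based) is position + 1.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Usage: four genes' raw p-values from a marker test.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A gene is then called significant if its adjusted value falls below the chosen FDR level (e.g. 0.05). In practice, `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` performs the same adjustment.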

Data Engineering & Systems

OpenFDA Adverse Events Data Pipeline | Repo

Document-store modeling and aggregation pipeline design using large-scale biomedical event data from the OpenFDA API.

  • Ingested 5,000 nested FDA adverse event reports with hierarchical patient, drug, and reaction arrays
  • Modeled pharmacovigilance data in MongoDB
  • Used aggregation pipelines to compute death rates by administration route and to produce demographic breakdowns
  • Implemented a PyMongo query interface returning JSON-serializable outputs
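As an illustration of this kind of aggregation, here is a hedged sketch of a death-rate-by-route pipeline. It assumes the raw openFDA event layout (`seriousnessdeath` set to `"1"` for fatal reports, and `patient.drug[].drugadministrationroute` holding the route code). The repo may model these fields differently, and both function names are hypothetical. The plain-Python mirror exists only so the pipeline's counting rule can be checked without a running MongoDB.

```python
from typing import Any, Dict, List


def death_rate_by_route_pipeline() -> List[Dict[str, Any]]:
    """MongoDB aggregation: share of reports marked fatal, per administration route."""
    return [
        # One document per (report, drug) pair; reports without drugs are dropped.
        {"$unwind": "$patient.drug"},
        # Collapse to one row per (report, route) so a report listing two drugs
        # with the same route is counted once for that route.
        {"$group": {
            "_id": {"report": "$_id", "route": "$patient.drug.drugadministrationroute"},
            "died": {"$max": {"$cond": [{"$eq": ["$seriousnessdeath", "1"]}, 1, 0]}},
        }},
        # Count reports and deaths per route.
        {"$group": {
            "_id": "$_id.route",
            "reports": {"$sum": 1},
            "deaths": {"$sum": "$died"},
        }},
        {"$project": {
            "_id": 0,
            "route": "$_id",
            "reports": 1,
            "deaths": 1,
            "death_rate": {"$divide": ["$deaths", "$reports"]},
        }},
        {"$sort": {"death_rate": -1}},
    ]


def death_rate_by_route(reports: List[Dict[str, Any]]) -> Dict[Any, float]:
    """Plain-Python mirror of the pipeline's counting rule, for small sanity checks."""
    died_by_pair: Dict[tuple, int] = {}
    for idx, report in enumerate(reports):
        died = 1 if report.get("seriousnessdeath") == "1" else 0
        for drug in report.get("patient", {}).get("drug", []):
            key = (idx, drug.get("drugadministrationroute"))
            died_by_pair[key] = max(died_by_pair.get(key, 0), died)
    totals: Dict[Any, List[int]] = {}
    for (_, route), died in died_by_pair.items():
        counts = totals.setdefault(route, [0, 0])  # [reports, deaths]
        counts[0] += 1
        counts[1] += died
    return {route: deaths / n for route, (n, deaths) in totals.items()}
```

With PyMongo, the pipeline would be run as `list(collection.aggregate(death_rate_by_route_pipeline()))`. The intermediate `$group` on (report, route) is the design choice worth noting: without it, a report listing two drugs with the same route would be counted twice for that route.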

Twitter Storage Comparison | Repo

Systems comparison of PostgreSQL and Redis for Twitter-style workloads.

  • Benchmarked with ~1M tweets and ~10K follow relationships
  • PostgreSQL achieved higher write throughput; Redis provided faster timeline reads
  • Explored architectural tradeoffs between fan-out strategies for timeline delivery
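Fan-out is about where timeline work happens. The toy in-memory sketch below contrasts fan-out on write, which copies each tweet into every follower's precomputed timeline at post time, with fan-out on read, which merges followees' tweets when a timeline is requested. It uses no PostgreSQL or Redis and is not the repo's implementation; `TwitterStore` and its methods are hypothetical.

```python
import heapq
import itertools
from collections import defaultdict


class TwitterStore:
    def __init__(self):
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        self.tweets = defaultdict(list)     # author -> [(ts, tweet_id)], oldest first
        self.timelines = defaultdict(list)  # user -> precomputed [(ts, tweet_id)]
        self._clock = itertools.count()     # monotonically increasing timestamps

    def follow(self, follower, followee):
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def post(self, author, tweet_id):
        entry = (next(self._clock), tweet_id)
        self.tweets[author].append(entry)
        # Fan-out on write: one append per follower, paid at post time.
        for user in self.followers[author]:
            self.timelines[user].append(entry)

    def timeline_push(self, user, k=10):
        # Read side of fan-out on write: slice the precomputed list, newest first.
        return [tid for _, tid in reversed(self.timelines[user][-k:])]

    def timeline_pull(self, user, k=10):
        # Fan-out on read: k-way merge of followees' tweets (newest first) at read time.
        streams = (reversed(self.tweets[a]) for a in self.following[user])
        merged = heapq.merge(*streams, reverse=True)
        return [tid for _, tid in itertools.islice(merged, k)]
```

The tradeoff shows up directly: `post` does one append per follower so reads are a cheap slice, while `timeline_pull` keeps writes cheap but pays for a merge on every read. Push timelines also only contain tweets posted after the follow, whereas pull timelines include older ones.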

Full-Stack Applications

Pocket Protectors | Repo | Live Demo

Full-stack personal finance application with an ML-assisted receipt categorization pipeline.

  • Flask API, React frontend, and MySQL backend
  • Implemented a multi-stage categorization system with confidence-based routing
  • Supports on-demand model retraining from user-corrected labels
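A minimal sketch of what confidence-based routing can look like, assuming a three-stage design (keyword rules, then a model, then user review). The README doesn't spell out the stages, so `categorize`, `Routed`, the rule table, the `model` callable, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical model interface: receipt text -> (predicted category, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Routed:
    category: Optional[str]  # None means no automatic label was assigned
    stage: str               # "rule", "model", or "review"
    confidence: float


def categorize(text: str, rules: Dict[str, str], model: Model,
               threshold: float = 0.8) -> Routed:
    lowered = text.lower()
    # Stage 1: cheap keyword rules (keywords assumed lowercase), treated as fully confident.
    for keyword, category in rules.items():
        if keyword in lowered:
            return Routed(category, "rule", 1.0)
    # Stage 2: model prediction, accepted only at or above the threshold.
    category, confidence = model(text)
    if confidence >= threshold:
        return Routed(category, "model", confidence)
    # Stage 3: low confidence, so defer to the user for a label.
    return Routed(None, "review", confidence)
```

In a design like this, receipts routed to `"review"` would be a natural source of user-corrected labels for on-demand retraining.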
