IBM Applied Data Science Capstone Project

An end-to-end data science pipeline built to predict the landing success of SpaceX Falcon 9 first-stage boosters, with the goal of improving cost efficiency for commercial aerospace launches.

📌 Project Overview

This repository contains the complete portfolio framework for the IBM Applied Data Science Capstone curriculum. The project spans data ingestion via web scraping and REST APIs, relational database management using SQL, interactive geospatial visualization, web-app dashboard deployment, and hyperparameter-tuned machine learning classification algorithms.

🚀 Core Data Pipeline Phases

Data Collection & Extraction: Gathering launch logs using the SpaceX REST API and scraping historical Wikipedia tables using BeautifulSoup4.
Data Wrangling & Processing: Handling null values, flattening raw API payloads, and encoding categorical features with One-Hot Encoding.
Exploratory Data Analysis (EDA): Exploratory analysis using SQL queries and visualization plots.
Geospatial Mapping: Isolating launch pad coordinates, safety distances, and landing failure/success metrics using interactive map overlays.
Interactive Dashboard App: Deploying a live analytics dashboard featuring charts and reactive filter selectors.

Dashboard figures: Launch Site Success Proportions; Payload Mass vs. Success Correlation.

Predictive Modeling (ML): Training, tuning, and benchmarking four classification algorithms to select the best landing predictor.
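
The Data Wrangling & Processing phase listed above can be illustrated with a small pandas sketch. This is not the notebook's code: the column names, the sample rows, and the choice of mean imputation are assumptions made for illustration only.

```python
import pandas as pd

# Illustrative sample rows; column names and values are assumptions,
# not taken from the project notebooks.
launches = pd.DataFrame(
    {
        "PayloadMass": [6104.0, None, 3170.0, 525.0],
        "Orbit": ["GTO", "LEO", "ISS", "LEO"],
        "LaunchSite": ["CCAFS SLC 40", "VAFB SLC 4E", "KSC LC 39A", "CCAFS SLC 40"],
    }
)

# Fill the missing payload mass with the column mean
# (one common choice; the project may handle nulls differently).
mean_mass = launches["PayloadMass"].mean()
launches["PayloadMass"] = launches["PayloadMass"].fillna(mean_mass)

# One-hot encode the categorical columns into 0/1 indicator columns.
features = pd.get_dummies(launches, columns=["Orbit", "LaunchSite"], dtype=int)

print(features.columns.tolist())
```

Each category value becomes its own indicator column (for example `Orbit_LEO`), which is the form most scikit-learn classifiers expect.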

📂 Repository Structure

The project is organized into modular notebooks and scripts tracking each phase of the data science lifecycle:

* **`data/`**: Directory containing the project's visual assets (scatter plots and pie charts).
* **`01_Data-Collection-API.ipynb`**: Data gathering using SpaceX API requests.
* **`02_Webscraping.ipynb`**: Web scraping historical launch data using BeautifulSoup.
* **`03_Data_Wrangling.ipynb`**: Data cleaning, handling null values, and initial feature engineering.
* **`04-EDA-With-SQL.ipynb`**: Exploratory Data Analysis using SQL queries to discover operational trends.
* **`05_EDA_Data_Visualization.ipynb`**: Exploratory Data Analysis using Python visual analytics (Matplotlib and Seaborn).
* **`06_Launch_Site_Location.ipynb`**: Interactive geospatial mapping using Folium.
* **`07_Dashapp.py`**: A fully functional, interactive Plotly Dash web dashboard application.
* **`08_Machine_Learning_Predictions.ipynb`**: Machine learning classification model training, hyperparameter tuning, and evaluation.
* **`requirements.txt`**: List of required Python packages and environment dependencies.
* **`LICENSE`**: MIT License.
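
As an illustration of the distance measurements mentioned for the geospatial notebook (`06_Launch_Site_Location.ipynb`), the sketch below computes a great-circle distance with the haversine formula. Whether the notebook uses this exact formula is an assumption, and both coordinates and the `haversine_km` helper are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical coordinates for a launch pad and a nearby coastline point;
# these values are illustrative and not taken from the notebook.
launch_site = (28.5623, -80.5774)
coast_point = (28.5630, -80.5680)

distance = haversine_km(*launch_site, *coast_point)
print(f"{distance:.2f} km")
```

A distance computed this way can be attached to a map marker or line label in Folium.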

🛠️ Built With

Python 3 - Programming language and runtime.
Scikit-Learn - Machine learning classification models and GridSearchCV tuning.
Plotly Dash - Framework for the interactive dashboard application.
Folium - Interactive HTML geospatial map layers.
Pandas / NumPy - Structured data processing and array manipulation.
BeautifulSoup4 / Requests - Web scraping and REST API requests.

🚀 Getting Started

Prerequisites

Install the project's Python dependencies from `requirements.txt`:

pip install -r requirements.txt

Execution Steps

Clone this repository to your local system environment:

git clone https://github.com/usmanali9999/Applied-Data-Science-Capstone.git
cd Applied-Data-Science-Capstone

Start the interactive workspace environment:
```
jupyter notebook
```
Run the notebooks in sequence (`01_Data-Collection-API.ipynb` through `08_Machine_Learning_Predictions.ipynb`) to reproduce the pipeline and its results.
Launch the live dashboard visualization application locally:
```
python 07_Dashapp.py
```

📊 Machine Learning Model Performance Summary

The table below details the optimal tuning parameters and prediction accuracies across all classification models tested in this lab. Each model was optimized using GridSearchCV and evaluated on identical train and test splits.

| Classification Model | Best Hyperparameters Found | Training Accuracy | Test Accuracy |
| --- | --- | --- | --- |
| Decision Tree | `{'criterion': 'gini', 'max_depth': 8, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 5, 'splitter': 'random'}` | 87.50% | 88.89% |
| K-Nearest Neighbors (KNN) | `{'algorithm': 'auto', 'n_neighbors': 10, 'p': 1}` | 84.82% | 83.33% |
| Support Vector Machine (SVM) | `{'C': 1.0, 'gamma': 0.0316, 'kernel': 'sigmoid'}` | 84.82% | 83.33% |
| Logistic Regression | `{'C': 0.01, 'penalty': 'l2', 'solver': 'lbfgs'}` | 84.64% | 83.33% |
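
The tuning procedure summarized in the table can be sketched as follows. This is a minimal example on synthetic data, not the notebook's code: the dataset, the parameter grid, the 5-fold setting, and the random seeds are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the project itself uses the engineered SpaceX features.
X, y = make_classification(n_samples=90, n_features=8, random_state=2)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# Standardize using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Hypothetical grid; the grids actually searched in the notebook may differ.
param_grid = {"C": [0.01, 0.1, 1.0]}

search = GridSearchCV(LogisticRegression(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("mean cross-validated accuracy:", search.best_score_)
print("test accuracy:", search.score(X_test, y_test))
```

Note that `best_score_` is the mean cross-validated accuracy of the best parameter setting on the training split. Whether the table's Training Accuracy column reports this value is an assumption.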

🎯 Final Model Selection & Evaluation

To select the best model, an evaluation loop scored each tuned estimator on the held-out test split.

The best performing model is: DecisionTreeClassifier
Highest Test Accuracy Score: 88.89%
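
The comparison behind this result can be reproduced from the reported numbers alone. The sketch below uses the test accuracies from the performance summary table. The variable names are illustrative, and the notebook's own loop, which scores fitted estimators directly, may be structured differently.

```python
# Test accuracies (percent) copied from the performance summary table.
test_accuracy = {
    "Decision Tree": 88.89,
    "K-Nearest Neighbors (KNN)": 83.33,
    "Support Vector Machine (SVM)": 83.33,
    "Logistic Regression": 83.33,
}

# Pick the model with the highest test accuracy.
best_model = max(test_accuracy, key=test_accuracy.get)
best_score = test_accuracy[best_model]

# Collect the remaining models to see how closely they are grouped.
others = {name: acc for name, acc in test_accuracy.items() if name != best_model}

print(f"Best model: {best_model} ({best_score:.2f}%)")
print(f"Other models: {others}")
```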

Strategic Breakdown

While Logistic Regression, SVM, and KNN all reached the same test accuracy of 83.33%, the Decision Tree scored 88.89% on the standardized SpaceX payload and orbit features. This suggests that tree-structured partitions were better at isolating the feature combinations associated with a successful Falcon 9 first-stage landing.

Note: Logistic Regression, SVM, and KNN tied at 83.33% test accuracy, which suggests that their performance is driven largely by the engineered features rather than by the choice of algorithm.

📄 License

Distributed under the MIT License. See LICENSE for more details.
