📊 Auto EDA & Data Cleaning Tool

An interactive web application built with Streamlit that streamlines common data science workflows. Users can upload a dataset, run Exploratory Data Analysis (EDA), apply data cleaning techniques (including ML-based outlier detection), and download the processed data.

🚀 Features

1. 📋 Data Overview

Instant Preview: View the head of your dataset immediately after upload.
Metadata: Automatically generate column data types, non-null counts, and unique value counts.
Missing Value Analysis: Get a summarized report of missing data percentages per column.
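
The overview described above can be approximated with a few pandas calls. This is a minimal sketch, not the app's actual code: the helper name `column_overview` and the sample data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def column_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, non-null count, unique count, and missing percentage.

    Hypothetical helper; the real app may compute these differently.
    """
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "non_null": df.notna().sum(),
            "unique": df.nunique(),  # missing values are not counted as a value
            "missing_pct": (df.isna().mean() * 100).round(2),
        }
    )


# Small made-up dataset with one gap in each column.
sample = pd.DataFrame(
    {"age": [25, np.nan, 40, 40], "city": ["Pune", "Delhi", None, "Pune"]}
)
overview = column_overview(sample)

print(sample.head())  # the "instant preview"
print(overview)
```

Each row of `overview` describes one column of the uploaded dataset, so the table can be shown as-is in an overview tab.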

2. 🧹 Advanced Data Cleaning

⚡ Auto-Clean Mode: A one-click solution that removes duplicates and intelligently fills missing values (median for numbers, mode for categories).
Handling Missing Values:
  Drop rows.
  Impute with Mean, Median, or Mode.
  Fill with a specific custom value.
Duplicate Removal: Detect and remove duplicate rows instantly.
Outlier Detection & Removal:
  IQR Method: Standard statistical method for outlier removal.
  Isolation Forest: Machine Learning algorithm (Unsupervised) to detect anomalies in complex distributions.
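
The auto-clean and IQR steps listed above could look roughly like the sketch below. The function names `auto_clean` and `remove_outliers_iqr`, the sample data, and the 1.5 multiplier (the conventional default) are assumptions; the functions in utils/data_cleaner.py may differ.

```python
import numpy as np
import pandas as pd


def auto_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, then fill gaps: median for numeric, mode otherwise."""
    out = df.drop_duplicates().copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            modes = out[col].mode()  # empty if the column is entirely missing
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])
    return out


def remove_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR].

    Rows where `column` is missing are dropped as well, because a missing
    value never falls inside the interval.
    """
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]


# Made-up data: one duplicate row, two kinds of gaps, one extreme price.
raw = pd.DataFrame(
    {
        "price": [10.0, 12.0, np.nan, 11.0, 11.0, 500.0],
        "kind": ["a", "b", "a", None, None, "a"],
    }
)
cleaned = auto_clean(raw)
trimmed = remove_outliers_iqr(cleaned, "price")
print(len(raw), len(cleaned), len(trimmed))
```

Filling with the median rather than the mean keeps the imputed value from being pulled toward extreme entries such as the 500.0 above.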

3. 📊 Exploratory Data Analysis (EDA)

Statistical Summary: Detailed descriptive statistics (mean, std, min, max, percentiles).
Interactive Visualizations:
  Distributions: Histograms with interactive tooltips.
  Box Plots: For spotting outliers visually.
  Correlation Heatmap: Visualize relationships between numeric variables.
  Categorical Counts: Bar charts of the most frequent categories.
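
The numbers behind these views can be computed with plain pandas before any chart is drawn. The sketch below is illustrative only: the helper names and sample data are assumptions, and the plotting step is left out (a correlation matrix like this one is the kind of input Plotly Express accepts in `px.imshow`, and the category counts in `px.bar`).

```python
import pandas as pd


def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, std, min, quartiles, and max for numeric columns."""
    return df.describe()


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlations between numeric columns only."""
    return df.select_dtypes(include="number").corr()


def top_categories(df: pd.DataFrame, column: str, n: int = 10) -> pd.Series:
    """The n most frequent values of a categorical column, most common first."""
    return df[column].value_counts().head(n)


# Made-up data: y is exactly 2 * x, so their correlation is 1.
data = pd.DataFrame(
    {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "label": ["a", "b", "a", "a"]}
)
summary = numeric_summary(data)
corr = correlation_matrix(data)
top = top_categories(data, "label", n=1)

print(summary)
print(corr)
print(top)
```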

4. 💾 Export

Comparison Metrics: See how many rows were removed during cleaning.
Download: Export the final cleaned dataset as a .csv file.
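
A minimal sketch of the export step, assuming the app keeps both the original and the cleaned DataFrame around. The helper names `cleaning_report` and `to_csv_bytes` are hypothetical.

```python
import pandas as pd


def cleaning_report(original: pd.DataFrame, cleaned: pd.DataFrame) -> dict:
    """Row counts before and after cleaning, plus the share of rows removed."""
    removed = len(original) - len(cleaned)
    pct = round(100 * removed / len(original), 2) if len(original) else 0.0
    return {
        "rows_before": len(original),
        "rows_after": len(cleaned),
        "rows_removed": removed,
        "pct_removed": pct,
    }


def to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Serialize without the index column, encoded as UTF-8."""
    return df.to_csv(index=False).encode("utf-8")


# Made-up data with one duplicate row.
original = pd.DataFrame({"id": [1, 2, 2, 3], "v": [0.5, 0.7, 0.7, 0.9]})
cleaned = original.drop_duplicates()

report = cleaning_report(original, cleaned)
payload = to_csv_bytes(cleaned)
print(report)
```

In a Streamlit app, bytes like `payload` can be handed to `st.download_button` through its `data` argument, together with a `file_name` and `mime="text/csv"`.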

📂 Project Structure

To ensure the imports work correctly, organize your files as follows:

auto-eda-tool/
│
├── app.py                 # The main Streamlit application
├── requirements.txt       # List of dependencies
└── utils/
    ├── __init__.py        # Empty file to make utils a Python package
    ├── data_cleaner.py    # Contains the cleaning logic functions
    └── eda_functions.py   # Contains the plotting and stats functions

🛠️ Installation & Setup

Clone the repository, or create the project folder manually and add the files from the structure above:

mkdir auto-eda-tool
cd auto-eda-tool

Create a virtual environment (Recommended):

# Windows
python -m venv venv
venv\Scripts\activate

# Mac/Linux
python3 -m venv venv
source venv/bin/activate

Install dependencies: Create a requirements.txt file with the contents below, then run:

pip install -r requirements.txt

requirements.txt content:

streamlit
pandas
numpy
scikit-learn
plotly

Run the application:

streamlit run app.py

📖 Usage Guide

Upload: Use the sidebar to upload a CSV file.
Overview Tab: Check the "Missing Values Summary" to see what needs fixing.
Cleaning Tab:
  Use "Auto Clean" for a quick fix.
  Or go step by step: pick a strategy for missing values, remove duplicates, then select a column to strip outliers.
  Note: The app uses Session State, so you can perform multiple cleaning actions in sequence.
EDA Tab: Select specific columns to visualize their distribution or check the heatmap for correlations.
Download Tab: Review the final row count and download your clean dataset.

🧰 Tech Stack

Frontend: Streamlit
Data Manipulation: Pandas, NumPy
Machine Learning: Scikit-learn (SimpleImputer, LabelEncoder, StandardScaler, IsolationForest)
Visualization: Plotly Express
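
As an illustration of how the Scikit-learn pieces listed above might fit together for outlier removal, here is a hedged sketch that combines StandardScaler with IsolationForest. The function name, the `contamination` value, and the sample data are assumptions, not the app's actual settings. Scaling is not strictly required for tree-based models and is included only because the stack lists it.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler


def remove_outliers_iforest(
    df: pd.DataFrame,
    columns: list,
    contamination: float = 0.05,  # assumed share of anomalies, not the app's value
    random_state: int = 42,
) -> pd.DataFrame:
    """Drop rows that IsolationForest labels as anomalies on the given columns.

    Rows with missing values in `columns` are dropped too, because the
    model cannot score them.
    """
    features = df[columns].dropna()
    scaled = StandardScaler().fit_transform(features)
    model = IsolationForest(contamination=contamination, random_state=random_state)
    labels = model.fit_predict(scaled)  # 1 = inlier, -1 = anomaly
    return df.loc[features.index[labels == 1]]


# Made-up data: 200 ordinary points plus one far-away point at index 200.
rng = np.random.default_rng(0)
points = pd.DataFrame(rng.normal(size=(200, 2)), columns=["a", "b"])
points.loc[200] = [50.0, 50.0]

filtered = remove_outliers_iforest(points, ["a", "b"])
print(len(points), len(filtered))
```

Because `contamination` fixes the fraction of rows flagged, some ordinary points near the edge of the distribution are removed along with the obvious outlier.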

🤝 Contributing

Contributions are welcome!

Fork the repository.
Create a feature branch (git checkout -b feature/NewFeature).
Commit your changes.
Push to the branch.
Open a Pull Request.

📄 License

This project is open-source and available under the MIT License.
