Merged
31 changes: 0 additions & 31 deletions .circleci/example_config.yml

This file was deleted.

4 changes: 0 additions & 4 deletions .github/pull_request_template.md
@@ -7,10 +7,6 @@
 - [ ] New feature (non-breaking change which adds functionality)
 - [ ] New documentation
 
-## How Has This Been Tested?
-
-- [x] `kedro run --pipeline <pipeline name>`
-
 ## ✅ Checks
 <!-- Make sure your pr passes the CI checks and do check the following fields as needed - -->
 - [ ] I have commented my code, particularly in hard-to-understand areas
6 changes: 6 additions & 0 deletions .gitignore
@@ -4,6 +4,9 @@ logs/**
 # except their sub-folders
 !data/**/
 !logs/**/
+# keep the (small) datasets used in the examples
+!data/expenses.csv.zip
+!data/fetal_health.csv.zip
 
 # also keep all .gitkeep files
 !.gitkeep
@@ -139,3 +142,6 @@ venv.bak/
 
 # mypy
 .mypy_cache/
+
+# VS Code
+.vscode/
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -1,4 +1,4 @@
-exclude: ^data/
+exclude: ^(data/|docs/)
 repos:
 - repo: https://github.com/pre-commit/pre-commit-hooks
   rev: v5.0.0
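
The effect of the new `exclude` pattern can be checked outside pre-commit. The following is a minimal sketch, not part of this PR. It assumes pre-commit treats `exclude` as a Python regular expression searched against repo-relative paths, as its documentation describes. The `is_excluded` helper and the example paths are hypothetical.

```python
import re

# Patterns copied from the diff above: before and after the change.
OLD_EXCLUDE = re.compile(r"^data/")
NEW_EXCLUDE = re.compile(r"^(data/|docs/)")


def is_excluded(pattern: re.Pattern[str], path: str) -> bool:
    """Return True if the exclude pattern matches the repo-relative path."""
    return pattern.search(path) is not None


# Hypothetical paths, chosen only to illustrate the three cases.
for path in ("data/expenses.csv.zip", "docs/regression.html", "src/regression.ipynb"):
    print(path, is_excluded(OLD_EXCLUDE, path), is_excluded(NEW_EXCLUDE, path))
```

Because both alternatives are anchored with `^`, only top-level `data/` and `docs/` folders are skipped; a nested folder such as `src/data/` would still be checked by the hooks.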
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Pedro Orii Antonacio

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
20 changes: 13 additions & 7 deletions Makefile
@@ -1,9 +1,15 @@
-install-pre-commit:
-	pip install pre-commit && \
-	pre-commit install
-
-lint:
+checks:
+	@echo "Running checks..."
 	pre-commit run -a
 
-test:
-	python -m pytest
+unzip-datasets:
+	@echo "Unzipping datasets..."
+	unzip -j data/expenses.csv.zip -d data/
+	unzip -j data/fetal_health.csv.zip -d data/
+
+convert-notebooks-to-html:
+	rm -rf docs/*.html
+	@for nb in src/*.ipynb; do \
+		echo "Converting $$nb to HTML..."; \
+		WARNING_FILTER_POLICY=ignore jupyter nbconvert --to html --execute "$$nb" --output-dir=docs/ --ExtractOutputPreprocessor.enabled=False; \
+	done
45 changes: 36 additions & 9 deletions README.md
@@ -1,20 +1,47 @@
-[![Python](https://img.shields.io/badge/python-3.10-blue.svg)](https://github.com)
+[![Python](https://img.shields.io/badge/python-3.12-blue.svg)](https://github.com)
 
-# Title
+# Simple Data Science
 
-Description
+This project compiles simple and practical examples for common Data Science use cases with tabular data.
 
-## Installation
+You can access complete examples using the following links:
+1. [Binary Classification](https://antonacio.github.io/simple-data-science/classification-binary.html)
+2. [Multiclass Classification](https://antonacio.github.io/simple-data-science/classification-multiclass.html)
+3. [Regression](https://antonacio.github.io/simple-data-science/regression.html)
+4. [Clustering](https://antonacio.github.io/simple-data-science/clustering.html)
+5. [Histogram Analysis](https://antonacio.github.io/simple-data-science/histogram_analysis.html)
 
-### Pre Commit Setup
+## Setup
 
+In this repository, we use UV—a handy Python package and project manager. To install UV, follow [these instructions](https://docs.astral.sh/uv/getting-started/installation/).
+
+To set up the environment and install the required dependencies, run the following commands in your terminal:
+
 ```bash
-pip install -r requirements.txt
-pre-commit install
+cd simple-data-science # change to the project's directory
+uv venv --python 3.12 # create virtual environment using UV
+source .venv/bin/activate # activate virtual environment
+uv sync # synchronize dependencies
+pre-commit install # install pre-commit hooks
 ```
 
-or
+If you want to deactivate and delete the virtual environment, run:
 
 ```bash
-make install-pre-commit
+deactivate # deactivate virtual environment
+rm -rf .venv # delete virtual environment
 ```
+
+## Data
+
+The examples in this project use the publicly available [Fetal Health Dataset](https://www.kaggle.com/datasets/andrewmvd/fetal-health-classification) and [Medical Insurance Payout Dataset](https://www.kaggle.com/datasets/harshsingh2209/medical-insurance-payout).
+
+Because the datasets are small, they are available as `.zip` files in the repository's `data/` folder. You can unzip them with your preferred software or simply run `make unzip-datasets` in your terminal.
+
+## Contributions
+
+We welcome contributions of all kinds! Whether you have questions, spot a bug, or want to enhance the code, documentation, or tests, please feel free to start a discussion or open a pull request. Your feedback, ideas, and fixes are vital in making this project better for everyone!
+
+## License
+
+MIT
Empty file added data/.gitkeep
Empty file.
Binary file added data/expenses.csv.zip
Binary file not shown.
Binary file added data/fetal_health.csv.zip
Binary file not shown.
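
The two archives added above are the ones the `unzip-datasets` Makefile target extracts. Where `make` or `unzip` is not available, the same step can be approximated in Python. This is a minimal sketch, not part of this PR: `unzip_flat` is a hypothetical helper name, and it assumes the goal is to mirror `unzip -j`, which drops any folder paths stored inside the archive.

```python
import zipfile
from pathlib import Path, PurePosixPath


def unzip_flat(archive: Path, dest: Path) -> list[Path]:
    """Extract every file in `archive` directly into `dest`, ignoring stored folder paths."""
    dest.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            # Zip member names always use forward slashes; keep only the file name.
            target = dest / PurePosixPath(info.filename).name
            target.write_bytes(zf.read(info))
            written.append(target)
    return written
```

From the repository root, `unzip_flat(Path("data/expenses.csv.zip"), Path("data"))` and the same call for `data/fetal_health.csv.zip` would cover both datasets. Unlike the `unzip` command, this sketch overwrites existing files without prompting.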