add some tests & refactoring by Nesma-Osama · Pull Request #6 · Mo-Khater/datascience

Nesma-Osama · 2026-05-03T11:44:19Z

Summary by CodeRabbit

Chores
- Centralized project paths and MLflow config, added .env to .gitignore, and updated Makefile targets to run modules.
Refactor
- Training and evaluation reorganized into explicit functions and gated entrypoints to prevent side effects on import.
New Features
- Structured, section-based pipeline logging with persistent, ordered log sections.
Tests
- Added end-to-end and unit tests covering data loading, training, prediction, and reporting.
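
As a rough illustration of the "gated entrypoint" idea in the Refactor item, the sketch below keeps all work inside a function and runs it only when the file is executed directly. The function body is a placeholder, not the code from this PR; only the name `run_prediction` comes from the walkthrough.

```python
def run_prediction():
    """Placeholder for the real workflow; returns a status string."""
    return "Prediction complete"


if __name__ == "__main__":
    # Runs only when the file is executed as a script or via `python -m ...`.
    # Importing the module (for example from a test) does not trigger it.
    print(run_prediction())
```

With this shape, a test can import the module and call `run_prediction()` explicitly without any work happening at import time.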

coderabbitai · 2026-05-03T11:44:27Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


Walkthrough

Centralizes filesystem paths in a new config module; replaces top-level prediction execution with a function-based predict workflow guarded by `if __name__ == "__main__"`; loads environment variables before MLflow setup in training and takes the MLflow database path from the config module; introduces sectioned pipeline logging utilities; updates the Makefile targets to invoke modules with `python -m`; adds tests; and adds `.env` to `.gitignore`.
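
A minimal sketch of what centralizing paths can look like, assuming a conventional `data/` and `reports/` layout under the project root. The constant names match those listed for `config.py`, but the directory layout and the `build_paths` helper are assumptions made for illustration; the PR's actual `config.py` is not shown here.

```python
from pathlib import Path


def build_paths(project_root):
    """Derive every pipeline path from a single root directory."""
    root = Path(project_root)
    data_dir = root / "data"        # assumed layout, not confirmed by the PR
    reports_dir = root / "reports"  # assumed layout, not confirmed by the PR
    return {
        "PROJECT_ROOT": root,
        "DATA_DIR": data_dir,
        "RAW_DIR": data_dir / "raw",
        "INTERIM_DIR": data_dir / "interim",
        "PROCESSED_DIR": data_dir / "processed",
        "REPORTS_DIR": reports_dir,
        "FIGURES_DIR": reports_dir / "figures",
    }
```

Deriving everything from one root means a module only needs to import a name such as `PROCESSED_DIR` instead of recomputing paths relative to its own file.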

Changes

Project-wide config centralization and model pipeline refactor

| Layer / File(s) | Summary |
| --- | --- |
| Configuration / Data Shape `config.py` | Adds canonical Path constants: `PROJECT_ROOT`, `SRC_DIR`, `PACKAGE_DIR`, `DATA_DIR`, `RAW_DIR`, `INTERIM_DIR`, `PROCESSED_DIR`, `TEMP_DIR`, `REPORTS_DIR`, `FIGURES_DIR`, `PIPELINE_LOG_PATH`, `TESTS_DIR`, `TEST_MODELS_DIR`, `TEST_MODEL_PATH`, `MLFLOW_DB_PATH`. |
| Core Implementation, Prediction (pure functions) `src/house_price_class_prediction/models/predict.py` | Removes import-time side effects; adds `read_data()`, `load_model()`, `predict(model, X_test)`, `print_results(y_test, y_pred)`, `run_prediction()`; uses `PROCESSED_DIR` and `TEST_MODEL_PATH` from `config`; adds `if __name__ == "__main__"` guard and `sys.path` manipulation to resolve imports. |
| Core Implementation, Training `src/house_price_class_prediction/models/train.py` | Calls `load_dotenv()` before MLflow setup; uses `PROCESSED_DIR` and `MLFLOW_DB_PATH` from `config`; sets MLflow experiment name to `house_price_class_prediction`; computes training predictions first and records train/test reports as dicts; adds `logger.info` around model runs. |
| Logging facility `src/house_price_class_prediction/utils/logging_utils.py` | Adds section-ordered pipeline logger: `SECTION_ORDER`, `SECTION_NAMES`, `SharedSectionLogger` (buffered per-section storage), `SectionLogHandler`, and `setup_logging(name)` to initialize shared logging and persist sectioned log file. |
| Wiring / Integration (path imports & logging init) `src/.../data/data_acquisition.py`, `src/.../features/*`, `src/.../visualization/visualize.py`, `src/.../validation/validate_data.py`, `src/.../features/add_schools_pharmacies_hospitals.py` | Replace locally computed path constants with imports from `config`; add `setup_logging("<section>")` in `main()` of several modules and replace some `print()` calls with `logging.info()`. |
| Build / CLI Targets `Makefile` | `train` and `evaluate` targets updated to run modules via `python -m house_price_class_prediction.models.train` and `...models.predict` with `PYTHONPATH` instead of invoking script files directly. |
| Tests `tests/test_model_train.py`, `tests/test_predict.py` | Adds training tests (data loading/shape, GridSearchCV, classification report checks, small end-to-end) and prediction tests (expose config paths, `read_data()`, `load_model()`, `predict()`, and `run_prediction()` with fixtures and log/stdout assertions). |
| Version Control / Env `.gitignore` | Adds `.env` to `.gitignore`. |
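
The logging row describes buffering messages per section and emitting them in a fixed order. The sketch below shows one way that idea could work using the standard `logging` module. The class and function names (`SectionBuffer`, `SectionHandler`, `make_section_logger`) and the section list are hypothetical and deliberately differ from the PR's `SharedSectionLogger` and `SectionLogHandler`, whose implementations are not shown on this page.

```python
import logging
from collections import defaultdict

# Hypothetical section names; the PR's actual SECTION_ORDER is not shown.
SECTION_ORDER = ["data", "features", "train", "predict"]


class SectionBuffer:
    """Collects log lines per section and renders them in a fixed order."""

    def __init__(self, order):
        self.order = list(order)
        self.lines = defaultdict(list)

    def add(self, section, message):
        self.lines[section].append(message)

    def render(self):
        parts = []
        for section in self.order:
            messages = self.lines.get(section)
            if messages:  # skip sections that never logged anything
                parts.append(f"=== {section} ===")
                parts.extend(messages)
        return "\n".join(parts)


class SectionHandler(logging.Handler):
    """Routes every record it receives into one section of the buffer."""

    def __init__(self, buffer, section):
        super().__init__()
        self.buffer = buffer
        self.section = section

    def emit(self, record):
        self.buffer.add(self.section, self.format(record))


def make_section_logger(buffer, section):
    """Return a logger whose messages land in the given section."""
    logger = logging.getLogger(f"pipeline_sketch.{section}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages out of the root logger
    logger.handlers.clear()   # avoid duplicate handlers on repeated setup
    logger.addHandler(SectionHandler(buffer, section))
    return logger
```

Because rendering follows the fixed section order rather than arrival order, a stage that logs late (or is rerun) still appears in its expected place. Persisting the result would amount to writing `buffer.render()` to a file.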

sequenceDiagram
    participant Runner
    participant PredictModule as "predict module"
    participant FS as "Filesystem (PROCESSED_DIR / TEST_MODEL_PATH)"
    participant Model as "Pickled model"
    participant Log as "PIPELINE_LOG (SharedSectionLogger)"

    Runner->>PredictModule: call run_prediction()
    PredictModule->>Log: setup_logging("predict")
    PredictModule->>FS: read X_test_final.csv, y_test_final.csv
    FS-->>PredictModule: return X_test, y_test
    PredictModule->>FS: open TEST_MODEL_PATH (pickle)
    FS-->>PredictModule: return model bytes
    PredictModule->>Model: deserialize -> Model instance
    PredictModule->>Model: call predict(X_test)
    Model-->>PredictModule: y_pred
    PredictModule->>PredictModule: print_results -> stdout
    PredictModule->>Log: write "Prediction complete"
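
The flow in the diagram can be sketched with the standard library alone. The function names and the two CSV file names come from the walkthrough; everything else is an assumption made so the example runs without pandas or scikit-learn. In particular, the "model" here is a plain pickled dict with a threshold rather than a fitted estimator, and the paths are passed as arguments instead of being imported from `config`.

```python
import csv
import pickle
from pathlib import Path


def read_data(processed_dir):
    """Load features and labels from the two CSV files named in the diagram."""
    with open(Path(processed_dir) / "X_test_final.csv", newline="") as f:
        rows = list(csv.reader(f))
    X_test = [[float(v) for v in row] for row in rows[1:]]  # skip header row
    with open(Path(processed_dir) / "y_test_final.csv", newline="") as f:
        rows = list(csv.reader(f))
    y_test = [int(row[0]) for row in rows[1:]]  # skip header row
    return X_test, y_test


def load_model(model_path):
    """Deserialize the pickled model from disk."""
    with open(model_path, "rb") as f:
        return pickle.load(f)


def predict(model, X_test):
    # Stand-in rule: class 1 when the first feature reaches the threshold.
    # A real estimator would be called as model.predict(X_test) instead.
    return [int(row[0] >= model["threshold"]) for row in X_test]


def print_results(y_test, y_pred):
    """Print a simple accuracy figure to stdout."""
    correct = sum(t == p for t, p in zip(y_test, y_pred))
    print(f"accuracy: {correct / len(y_test):.2f}")


def run_prediction(processed_dir, model_path):
    """Tie the steps together in the order shown in the diagram."""
    X_test, y_test = read_data(processed_dir)
    model = load_model(model_path)
    y_pred = predict(model, X_test)
    print_results(y_test, y_pred)
    return y_pred
```

Keeping each step in its own function is what lets the tests exercise `read_data()`, `load_model()`, and `predict()` separately with small fixtures.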

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 In the burrow config paths now lie in rows,
Predict functions hop out instead of prose,
Logging sections stack like tidy leaves,
Tests nibble fixtures, Makefile quietly weaves,
A rabbit's refactor—neat as carrot shows.



Mohamed-Ashraf273 · 2026-05-03T17:51:49Z

@copilot resolve the merge conflicts in this pull request

Co-authored-by: Mohamed-Ashraf273 <117025882+Mohamed-Ashraf273@users.noreply.github.com>

Copilot · 2026-05-03T17:53:46Z

> @copilot resolve the merge conflicts in this pull request

Merge conflicts resolved in f5d69ae. The conflicts in build_features.py, feature_selection.py, and preprocessing.py have been resolved by incorporating changes from main (passing overwrite=args.overwrite to file writes and adding the feature selection output logic) while preserving the logging calls from this branch.

add some test

3d1f5a5

Mohamed-Ashraf273 added 3 commits May 3, 2026 18:25

fix formatting

8fe234c

remove env

7f142a8

fix path

a190a30

Mohamed-Ashraf273 changed the title ~~add some test~~ add some tests & refactoring May 3, 2026

Mohamed-Ashraf273 added 4 commits May 3, 2026 19:59

apply reactoring

2ef15c3

Merge branch 'main' into nsma-test

75114b3

add logging

c6d3f90

fix linting

546afb0

Copilot started work on behalf of Mohamed-Ashraf273 May 3, 2026 17:51 View session

Merge branch 'main' into nsma-test - resolve conflicts

f5d69ae

Co-authored-by: Mohamed-Ashraf273 <117025882+Mohamed-Ashraf273@users.noreply.github.com>

Copilot finished work on behalf of Mohamed-Ashraf273 May 3, 2026 17:54

Copilot AI requested a review from Mohamed-Ashraf273 May 3, 2026 17:54

Mohamed-Ashraf273 added 3 commits May 3, 2026 23:37

refactor project

02f5777

update report

f26c785

Final report

691ab5f

Mohamed-Ashraf273 approved these changes May 3, 2026

View reviewed changes

update report

6ab21b0

Mohamed-Ashraf273 merged commit 07acbf6 into main May 3, 2026
3 of 4 checks passed

Mohamed-Ashraf273 deleted the nsma-test branch May 3, 2026 21:40

Mohamed-Ashraf273 merged 13 commits into main from nsma-test

3 participants
