add some tests & refactoring #6

Merged: Mohamed-Ashraf273 merged 13 commits into main from nsma-test on May 3, 2026

Nesma-Osama (Collaborator) commented May 3, 2026

Summary by CodeRabbit

  • Chores

    • Centralized project paths and MLflow config, added .env to .gitignore, and updated Makefile targets to run modules.
  • Refactor

    • Training and evaluation reorganized into explicit functions and gated entrypoints to prevent side effects on import.
  • New Features

    • Structured, section-based pipeline logging with persistent, ordered log sections.
  • Tests

    • Added end-to-end and unit tests covering data loading, training, prediction, and reporting.
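As a rough illustration of the "centralized project paths" bullet, the sketch below derives every path from a single root. The constant names follow those listed in the CodeRabbit walkthrough; the directory layout (data/raw, reports/figures) and the pipeline.log file name are assumptions for illustration, not the PR's actual values.

```python
from pathlib import Path
from typing import Dict


def build_paths(project_root: Path) -> Dict[str, Path]:
    """Derive every project path from one root so modules never recompute them."""
    data_dir = project_root / "data"        # assumed layout
    reports_dir = project_root / "reports"  # assumed layout
    return {
        "PROJECT_ROOT": project_root,
        "DATA_DIR": data_dir,
        "RAW_DIR": data_dir / "raw",
        "INTERIM_DIR": data_dir / "interim",
        "PROCESSED_DIR": data_dir / "processed",
        "REPORTS_DIR": reports_dir,
        "FIGURES_DIR": reports_dir / "figures",
        "PIPELINE_LOG_PATH": reports_dir / "pipeline.log",  # hypothetical file name
    }


# In a real config.py these would more likely be module-level constants
# anchored on Path(__file__).resolve(); a fixed root keeps the sketch simple.
PATHS = build_paths(Path("/tmp/example_project"))
print(PATHS["PROCESSED_DIR"])
```

Other modules would then import the constants instead of rebuilding paths locally, so a change in layout needs only one edit.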

coderabbitai Bot commented May 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Centralizes filesystem paths in a new config module, replaces top-level prediction execution with a function-based predict workflow guarded by if __name__ == "__main__", loads environment variables in training and points MLflow setup and data paths at the shared config, introduces sectioned pipeline-logging utilities, updates Makefile targets to module invocation, adds tests, and ignores .env.
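The sectioned logging idea can be sketched with the standard logging module alone. This is a minimal illustration, not the PR's implementation: the class names SectionBuffer and SectionHandler, the helper setup_section_logging, and the section names in SECTION_ORDER are all hypothetical stand-ins for the SharedSectionLogger, SectionLogHandler, and setup_logging(name) mentioned in the walkthrough.

```python
import logging
from collections import defaultdict
from typing import Dict, List

# Assumed section order; the PR's real SECTION_ORDER is not shown in this conversation.
SECTION_ORDER = ["data", "features", "train", "predict"]


class SectionBuffer:
    """Collects log lines per section and renders them in a fixed order."""

    def __init__(self) -> None:
        self.lines: Dict[str, List[str]] = defaultdict(list)

    def add(self, section: str, message: str) -> None:
        self.lines[section].append(message)

    def render(self) -> str:
        blocks = []
        for section in SECTION_ORDER:
            messages = self.lines.get(section, [])
            if messages:
                blocks.append(f"=== {section} ===\n" + "\n".join(messages))
        return "\n\n".join(blocks)


class SectionHandler(logging.Handler):
    """Routes each record into the buffer slot for its section."""

    def __init__(self, section: str, buffer: SectionBuffer) -> None:
        super().__init__()
        self.section = section
        self.buffer = buffer

    def emit(self, record: logging.LogRecord) -> None:
        self.buffer.add(self.section, self.format(record))


def setup_section_logging(section: str, buffer: SectionBuffer) -> logging.Logger:
    """Return a logger whose output lands in the given section of the buffer."""
    logger = logging.getLogger(f"pipeline.{section}")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated setup
    logger.propagate = False  # keep records out of the root logger
    logger.addHandler(SectionHandler(section, buffer))
    return logger


buffer = SectionBuffer()
# Sections log out of order on purpose; render() restores SECTION_ORDER.
setup_section_logging("predict", buffer).info("Prediction complete")
setup_section_logging("data", buffer).info("Loaded raw data")
report = buffer.render()
print(report)
```

Persisting the result would then amount to writing the rendered text to the pipeline log file after each stage, which keeps the file ordered by section regardless of which stage ran last.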

Changes

Project-wide config centralization and model pipeline refactor

| Layer | File(s) | Summary |
| --- | --- | --- |
| Configuration / Data Shape | config.py | Adds canonical Path constants: PROJECT_ROOT, SRC_DIR, PACKAGE_DIR, DATA_DIR, RAW_DIR, INTERIM_DIR, PROCESSED_DIR, TEMP_DIR, REPORTS_DIR, FIGURES_DIR, PIPELINE_LOG_PATH, TESTS_DIR, TEST_MODELS_DIR, TEST_MODEL_PATH, MLFLOW_DB_PATH. |
| Core Implementation: Prediction (pure functions) | src/house_price_class_prediction/models/predict.py | Removes import-time side effects; adds read_data(), load_model(), predict(model, X_test), print_results(y_test, y_pred), run_prediction(); uses PROCESSED_DIR and TEST_MODEL_PATH from config; adds if __name__ == "__main__" guard and sys.path manipulation to resolve imports. |
| Core Implementation: Training | src/house_price_class_prediction/models/train.py | Calls load_dotenv() before MLflow setup; uses PROCESSED_DIR and MLFLOW_DB_PATH from config; sets MLflow experiment name to house_price_class_prediction; computes training predictions first and records train/test reports as dicts; adds logger.info around model runs. |
| Logging facility | src/house_price_class_prediction/utils/logging_utils.py | Adds section-ordered pipeline logger: SECTION_ORDER, SECTION_NAMES, SharedSectionLogger (buffered per-section storage), SectionLogHandler, and setup_logging(name) to initialize shared logging and persist sectioned log file. |
| Wiring / Integration (path imports & logging init) | src/.../data/data_acquisition.py, src/.../features/*, src/.../visualization/visualize.py, src/.../validation/validate_data.py, src/.../features/add_schools_pharmacies_hospitals.py | Replace locally computed path constants with imports from config; add setup_logging("<section>") in main() of several modules and replace some print() calls with logging.info(). |
| Build / CLI Targets | Makefile | train and evaluate targets updated to run modules via python -m house_price_class_prediction.models.train and ...models.predict with PYTHONPATH instead of invoking script files directly. |
| Tests | tests/test_model_train.py, tests/test_predict.py | Adds training tests (data loading/shape, GridSearchCV, classification report checks, small end-to-end) and prediction tests (expose config paths, read_data(), load_model(), predict(), and run_prediction() with fixtures and log/stdout assertions). |
| Version Control / Env | .gitignore | Adds .env to .gitignore. |

sequenceDiagram
    participant Runner
    participant PredictModule as "predict module"
    participant FS as "Filesystem (PROCESSED_DIR / TEST_MODEL_PATH)"
    participant Model as "Pickled model"
    participant Log as "PIPELINE_LOG (SharedSectionLogger)"

    Runner->>PredictModule: call run_prediction()
    PredictModule->>Log: setup_logging("predict")
    PredictModule->>FS: read X_test_final.csv, y_test_final.csv
    FS-->>PredictModule: return X_test, y_test
    PredictModule->>FS: open TEST_MODEL_PATH (pickle)
    FS-->>PredictModule: return model bytes
    PredictModule->>Model: deserialize -> Model instance
    PredictModule->>Model: call predict(X_test)
    Model-->>PredictModule: y_pred
    PredictModule->>PredictModule: print_results -> stdout
    PredictModule->>Log: write "Prediction complete"
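The flow in the diagram can be approximated with a small, standard-library-only sketch. The function names mirror those in the walkthrough, but the signatures are simplified assumptions (the PR's read_data() and run_prediction() take no arguments), the column names "area" and "price_class" are made up, and a pickled dict stands in for the fitted estimator so the example runs without scikit-learn. With a real estimator, predict() would simply call model.predict(X_test).

```python
import csv
import pickle
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def read_data(processed_dir: Path) -> Tuple[List[float], List[str]]:
    """Read one feature column and the labels from the two test CSVs."""
    with open(processed_dir / "X_test_final.csv", newline="") as f:
        x_test = [float(row["area"]) for row in csv.DictReader(f)]  # made-up column
    with open(processed_dir / "y_test_final.csv", newline="") as f:
        y_test = [row["price_class"] for row in csv.DictReader(f)]  # made-up column
    return x_test, y_test


def load_model(model_path: Path) -> Dict[str, float]:
    """Unpickle the model; only unpickle files from a trusted source."""
    with open(model_path, "rb") as f:
        return pickle.load(f)


def predict(model: Dict[str, float], x_test: List[float]) -> List[str]:
    """Stand-in for model.predict(X_test): classify by a stored threshold."""
    return ["high" if x >= model["threshold"] else "low" for x in x_test]


def run_prediction(processed_dir: Path, model_path: Path) -> float:
    """Read data, load the model, predict, and print accuracy."""
    x_test, y_test = read_data(processed_dir)
    y_pred = predict(load_model(model_path), x_test)
    accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
    print(f"accuracy: {accuracy:.2f}")
    return accuracy


def write_demo_files(root: Path) -> Path:
    """Create tiny stand-in inputs so the sketch is runnable on its own."""
    (root / "X_test_final.csv").write_text("area\n50\n150\n")
    (root / "y_test_final.csv").write_text("price_class\nlow\nhigh\n")
    model_path = root / "model.pkl"
    model_path.write_bytes(pickle.dumps({"threshold": 100.0}))
    return model_path


# Nothing above touches the filesystem at import time; work starts only here.
if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_prediction(Path(tmp), write_demo_files(Path(tmp)))
```

Because all work sits behind the guard, a test module can import these functions and call them against fixture files without triggering a prediction run as a side effect.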

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 In the burrow config paths now lie in rows,
Predict functions hop out instead of prose,
Logging sections stack like tidy leaves,
Tests nibble fixtures, Makefile quietly weaves,
A rabbit's refactor—neat as carrot shows.

Mohamed-Ashraf273 changed the title from "add some test" to "add some tests & refactoring" on May 3, 2026

Mohamed-Ashraf273 (Collaborator) commented:

@copilot resolve the merge conflicts in this pull request

Co-authored-by: Mohamed-Ashraf273 <117025882+Mohamed-Ashraf273@users.noreply.github.com>

Copilot AI commented May 3, 2026

> @copilot resolve the merge conflicts in this pull request

Merge conflicts resolved in f5d69ae. The conflicts in build_features.py, feature_selection.py, and preprocessing.py have been resolved by incorporating changes from main (passing overwrite=args.overwrite to file writes and adding the feature selection output logic) while preserving the logging calls from this branch.

Mohamed-Ashraf273 merged commit 07acbf6 into main on May 3, 2026 (3 of 4 checks passed)
Mohamed-Ashraf273 deleted the nsma-test branch on May 3, 2026 21:40