Refactor data collection into lifetime and sprint workflows#133
hcaballero2 merged 4 commits into main
Conversation
…in permissions Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Great work overall @dhyana6466! I really love how you handled the different workflows! Would you be able to update your collectData.py script so that it uses the other tools in the dataCollection folder? You did a great job of making the functions to collect the necessary data, but we have scripts that already do that. If edits need to be made to these files, please feel free!
Thanks! That makes sense. I've updated collectData.py so it now uses the existing scripts inside the dataCollection folder instead of handling the aggregation directly. I moved the repo-level collection into a separate module (repositoryCollector.py), and now collectData.py just handles the workflow mode and writing to the JSON files. Let me know if you'd like me to adjust anything else!
Great job!
Description
This PR refactors the existing data collection setup into two separate GitHub Action workflows: one for lifetime data and one for sprint-specific data.
The lifetime workflow runs twice a month and collects the full repository history to generate long-term health metrics. The sprint workflow runs on a schedule and only collects data within the active sprint window.
To make sprint handling easier for future updates, I added a sprint_schedule.json file where sprint start and end dates are hardcoded. This keeps sprint configuration separate from the main logic and makes it easier to update each semester.
The script collectData.py now accepts a --mode argument so that each workflow calls the correct logic (lifetime or sprint).
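The mode dispatch could be sketched roughly as follows; note this is an illustrative sketch, not the actual implementation, and the function names (`parse_mode`, `run`) are assumptions:

```python
import argparse

def parse_mode(argv=None):
    """Return the requested workflow mode ('lifetime' or 'sprint').

    Hypothetical sketch of the --mode handling in collectData.py.
    """
    parser = argparse.ArgumentParser(description="Collect repository data")
    parser.add_argument("--mode", choices=["lifetime", "sprint"], required=True)
    return parser.parse_args(argv).mode

def run(mode):
    # Dispatch to the matching collector and return the output file name
    # (file names taken from the PR description; collector logic omitted).
    if mode == "lifetime":
        return "lifetime_data.json"
    return "sprint_data.json"
```

Using `choices=["lifetime", "sprint"]` lets argparse reject any other mode with a usage error, so each workflow can only invoke one of the two supported paths.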
Both workflows are configured to push updates to the Data_Updates branch instead of main.
Fixes #131
Type of change
How Has This Been Tested?
I tested both modes locally.
Lifetime mode:
python -m Backend.dataCollection.collectData --mode lifetime
This iterated through all issues in the repositories and successfully generated lifetime_data.json in the data/ folder.
Sprint mode:
python -m Backend.dataCollection.collectData --mode sprint
This reads from sprint_schedule.json, checks if today falls within a sprint window, and generates sprint_data.json when appropriate.
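A minimal sketch of that sprint-window check, assuming a schedule schema with a `"sprints"` list of `"start"`/`"end"` ISO dates (the real keys in sprint_schedule.json may differ):

```python
import json
from datetime import date

def load_schedule(path):
    """Load the sprint schedule JSON file (path is caller-supplied)."""
    with open(path) as f:
        return json.load(f)

def active_sprint(schedule, today=None):
    """Return the sprint entry whose window contains today, or None."""
    today = today or date.today()
    for sprint in schedule["sprints"]:
        start = date.fromisoformat(sprint["start"])
        end = date.fromisoformat(sprint["end"])
        if start <= today <= end:
            return sprint
    return None
```

With this shape, the sprint workflow can exit early (and skip writing sprint_data.json) whenever `active_sprint` returns `None`, which matches the "generates sprint_data.json when appropriate" behavior described above.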
Both modes completed without errors and produced the expected JSON structure.
Test Configuration:
Checklist:
Screenshot of Output