Refactor data collection into lifetime and sprint workflows #133

Merged
hcaballero2 merged 4 commits into main from feature/lifetime-sprint-workflows
Feb 22, 2026

Conversation

@dhyana6466 dhyana6466 commented Feb 20, 2026

Description

This PR refactors the existing data collection setup into two separate GitHub Actions workflows: one for lifetime data and one for sprint-specific data.
The lifetime workflow runs twice a month and collects the full repository history to generate long-term health metrics. The sprint workflow also runs on a schedule, but it only collects data that falls within the active sprint window.
To make sprint handling easier to maintain, I added a sprint_schedule.json file in which the sprint start and end dates are hardcoded. This keeps sprint configuration separate from the main logic and makes it easier to update each semester.
The collectData.py script now accepts a --mode argument so that each workflow runs the correct logic (lifetime or sprint).
Both workflows are configured to push updates to the Data_Updates branch instead of main.
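
As a rough illustration of the --mode switch described above, here is a minimal sketch using argparse. It is not the code from this PR: the function names parse_args, run_lifetime, run_sprint, and main are hypothetical, and making --mode required is an assumption, since the description does not say whether the real script has a default.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --mode picks which collection path runs."""
    parser = argparse.ArgumentParser(description="Collect repository data.")
    parser.add_argument(
        "--mode",
        choices=["lifetime", "sprint"],  # the two modes named in the PR
        required=True,                   # assumption: no default mode
        help="lifetime: full history; sprint: active sprint window only",
    )
    return parser.parse_args(argv)


def run_lifetime():
    # Hypothetical placeholder for the lifetime collection logic.
    return "lifetime data collected"


def run_sprint():
    # Hypothetical placeholder for the sprint collection logic.
    return "sprint data collected"


def main(argv=None):
    args = parse_args(argv)
    # Dispatch table keeps the mode-to-handler mapping in one place.
    handlers = {"lifetime": run_lifetime, "sprint": run_sprint}
    return handlers[args.mode]()
```

With this shape, each workflow would only differ in the flag it passes, and argparse rejects any value other than lifetime or sprint before any collection starts.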

Fixes #131

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

I tested both modes locally.

Lifetime mode:
python -m Backend.dataCollection.collectData --mode lifetime
This iterated through all issues in the repositories and successfully generated lifetime_data.json in the data/ folder.

Sprint mode:
python -m Backend.dataCollection.collectData --mode sprint
This reads from sprint_schedule.json, checks if today falls within a sprint window, and generates sprint_data.json when appropriate.
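
The sprint window check can be sketched roughly as follows. The layout of sprint_schedule.json is not shown in this PR, so the "sprints" list with "name", "start", and "end" keys, the ISO date strings, and the example dates are all assumptions made for illustration; load_sprints and active_sprint are hypothetical helper names.

```python
import json
from datetime import date

# Hypothetical schedule contents; the real file's structure may differ.
EXAMPLE_SCHEDULE = """
{"sprints": [
  {"name": "Sprint 1", "start": "2026-02-02", "end": "2026-02-15"},
  {"name": "Sprint 2", "start": "2026-02-16", "end": "2026-03-01"}
]}
"""


def load_sprints(raw_json):
    """Turn the JSON text into (name, start_date, end_date) tuples."""
    entries = json.loads(raw_json)["sprints"]
    return [
        (e["name"], date.fromisoformat(e["start"]), date.fromisoformat(e["end"]))
        for e in entries
    ]


def active_sprint(sprints, today):
    """Return the name of the sprint containing `today`, or None."""
    for name, start, end in sprints:
        if start <= today <= end:  # both boundary days count as in-sprint
            return name
    return None
```

A caller would pass date.today() as the second argument and skip writing sprint_data.json when the result is None. Taking the date as a parameter rather than reading the clock inside the function keeps the check easy to test.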

Both modes completed without errors and produced the expected JSON structure.

Test Configuration:

  • Language Version: Python 3.10
  • Local machine testing (macOS)
  • GitHub Personal Access Token configured in .env

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Screenshot of Output

(two screenshots of the script output; images not included)

hcaballero2 and others added 2 commits February 22, 2026 08:32
…in permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…n permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

hcaballero2 commented Feb 22, 2026

Great work overall, @dhyana6466! I really love how you handled the different workflows. Would you be able to update your collectData.py script so that it uses the other tools in the dataCollection folder? You did a great job writing functions to collect the necessary data, but we already have scripts that do that. If those files need edits, please feel free to make them!

dhyana6466 commented

Thanks! That makes sense. I’ve updated collectData.py so it now uses the existing scripts inside the dataCollection folder instead of handling the aggregation directly. I moved the repo-level collection into a separate module (repositoryCollector.py), and collectData.py now only handles the workflow mode and writes the JSON files. Let me know if you’d like me to adjust anything else!
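
The split described in this reply, where one module collects and collectData.py only selects the mode and writes the output, might look roughly like the sketch below. The interface of repositoryCollector.py is not shown in the thread, so the collector is passed in as a plain callable; write_json and collect_and_write are hypothetical names. Only the output file names and the data/ folder come from the PR description.

```python
import json
from pathlib import Path

# Output file names taken from the PR description.
OUTPUT_FILES = {"lifetime": "lifetime_data.json", "sprint": "sprint_data.json"}


def write_json(data, out_dir, filename):
    """Write `data` as pretty-printed JSON, creating the folder if needed."""
    out_path = Path(out_dir) / filename
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return out_path


def collect_and_write(mode, collect, out_dir="data"):
    """Run the injected collector for `mode` and persist its result.

    `collect` stands in for whatever repositoryCollector.py exposes;
    its real name and signature are assumptions here.
    """
    filename = OUTPUT_FILES[mode]  # raises KeyError on an unknown mode
    return write_json(collect(mode), out_dir, filename)
```

Injecting the collector keeps the orchestration code free of GitHub API calls, which is one way to get the separation the reviewer asked for.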

hcaballero2 commented

Great job!

@hcaballero2 hcaballero2 merged commit fb48877 into main Feb 22, 2026
4 checks passed
Development

Successfully merging this pull request may close these issues.

Lifetime and Sprint-wise Workflows

2 participants