Add sports team website scraping workflow by DisabledAbel · Pull Request #14 · DisabledAbel/MakeICS

DisabledAbel · 2026-06-11T16:01:00Z

Added official sports team website scraping via Firecrawl to supplement TheSportsDB data, allowing for up to 200 games per team. Merged scraped data into the ICS generation workflow.

PR created automatically by Jules for task 16370083516219159048 started by @DisabledAbel

Summary by CodeRabbit

New Features
- Upcoming events now merge per-team supplemental schedules (scraped and CSV-sourced) for more complete, timely listings — improving coverage for select teams (notably WNBA).
Data
- Added cached per-team supplemental schedule files to surface recent team-specific games when absent from league feeds.
Tests
- Added test verifying supplemental schedule integration with upcoming events.
CI
- Workflow updated to include supplemental schedule files when preparing commits.

This change enhances the sports schedule workflow by optionally scraping official team websites using Firecrawl's structured extraction. Key changes:

- Defined `SPORTS_EXTRACT_SCHEMA` and `fetchScheduleFromWebsite` in `lib/sports.js`.
- Updated `getUpcomingEvents` to merge supplemental data from `lib/data/sports/supplemental/`.
- Enhanced `scripts/fetch-sports.js` to discover team websites and manage scraping with a 24h staleness check.
- Added tests to verify merging of scraped data.
- Exported normalization logic for use in the fetch script.

google-labs-jules · 2026-06-11T16:01:02Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

vercel · 2026-06-11T16:01:03Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
make-ics	Ready	Preview, Comment	Jun 12, 2026 5:15am

coderabbitai · 2026-06-11T16:01:21Z

Warning

Review limit reached

@DisabledAbel, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 51 minutes and 1 second. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b0fb82ab-666d-4754-90dd-e9f0efaaaa93

📥 Commits

Reviewing files that changed from the base of the PR and between f130c18 and a11c703.

📒 Files selected for processing (1)

scripts/fetch-sports.js

📝 Walkthrough

Walkthrough

Adds Firecrawl extraction schema and two helpers (normalizeScrapedEvent, fetchScheduleFromWebsite) to fetch and normalize scraped team schedules, caches per-team supplemental JSON under lib/data/sports/supplemental/, integrates loading those supplemental events into getUpcomingEvents, updates fetch-sports to produce supplemental files (CSV and website scrapes), updates CI to stage them, and adds a test validating merged results.

Changes

Supplemental Event Aggregation via Firecrawl

Layer / File(s)	Summary
Firecrawl extraction schema and helpers `lib/sports.js`	Adds `SUPPLEMENTAL_DATA_DIR` and `SPORTS_EXTRACT_SCHEMA`; exports `normalizeScrapedEvent(game, teamName)` and `fetchScheduleFromWebsite(websiteUrl, { env, fetchImpl })` to call Firecrawl, enforce API key and timeout, and return extracted `games` arrays.
Supplemental event loading in getUpcomingEvents `lib/sports.js`	Adds internal `loadSupplementalTeamEvents(teamId)` to read `supplemental/<teamId>.json` and integrates its promise into `getUpcomingEvents` so supplemental events are merged with league/season and eventsnext results and then deduped/filtered.
Supplemental data collection in fetch-sports script `scripts/fetch-sports.js`	Initializes `SUPPLEMENTAL_DATA_DIR`, updates WNBA league id to `4516`, adds `fetchWNBASupplemental(teams)` to parse a WNBA CSV into per-team supplemental JSON, adds `isSupplementalStale(teamId)`, and implements per-team Firecrawl scraping + normalization + write when stale (throttled, per-team error handling).
Supplemental data files and CI `lib/data/sports/supplemental/*.json`, `.github/workflows/fetch-sports.yml`	Adds supplemental JSON files for teams `136437` and `136438`; updates CI commit step to include `lib/data/sports/supplemental/*.json` when staging changes.
Test for supplemental event merging `test/sports.test.js`	Adds test that writes a supplemental team JSON file, invokes `getUpcomingEvents`, asserts supplemental events are present in merged results, and removes the file in cleanup.
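The walkthrough says supplemental events are merged with the league/season and eventsnext results "and then deduped/filtered". The real dedupe key in `lib/sports.js` is not shown in this thread, so the following is only a minimal sketch of such a step. `mergeUpcoming` and `toEpochMs` are hypothetical names; the event fields follow TheSportsDB's event objects (`dateEvent`, `strTimestamp`, `strHomeTeam`, `strAwayTeam`). It keys on date plus home and away team names so a scraped copy without an `idEvent` still collapses onto the league copy, which assumes both sources spell team names the same way.

```javascript
// Hypothetical sketch of the merge/dedupe/filter step described for
// getUpcomingEvents. Not the merged implementation.

/** Parse strTimestamp, treating a value without a zone designator as UTC. */
function toEpochMs(strTimestamp) {
  if (typeof strTimestamp !== 'string' || strTimestamp === '') return NaN;
  const hasZone = /(Z|[+-]\d{2}:\d{2})$/.test(strTimestamp);
  return Date.parse(hasZone ? strTimestamp : `${strTimestamp}Z`);
}

/**
 * Merge event arrays (league/season, eventsnext, supplemental), drop past or
 * unparseable events, dedupe, and sort by start time. Earlier sources win on
 * duplicates, so pass the league feeds before supplemental data.
 */
function mergeUpcoming(sources, nowMs = Date.now()) {
  const byKey = new Map();
  for (const events of sources) {
    for (const event of events || []) {
      const startMs = toEpochMs(event.strTimestamp);
      if (!Number.isFinite(startMs) || startMs < nowMs) continue;
      // Assumed dedupe key: same day, same home team, same away team.
      const key = `${event.dateEvent}|${event.strHomeTeam}|${event.strAwayTeam}`.toLowerCase();
      if (!byKey.has(key)) byKey.set(key, { event, startMs });
    }
  }
  return [...byKey.values()]
    .sort((a, b) => a.startMs - b.startMs)
    .map(({ event }) => event);
}
```

Appending `Z` matters because `Date.parse` treats an ISO date-time without an offset as local time, which would shift the "is it upcoming?" cutoff by the runner's timezone offset.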

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

DisabledAbel/MakeICS#13: Adjusts time-based filtering in getUpcomingEvents; both change the event aggregation/filtering pipeline.
Add school and sports schedules support #9: Earlier work on getUpcomingEvents that this PR extends with supplemental source merging.
Enhance sports caching resilience and implement TV timezone support #12: Changes to getUpcomingEvents loading/merging behavior; related to supplemental-source handling.

Suggested labels

codex

Poem

🐰 I hopped through HTML and CSV,
Fetched team schedules merrily,
I normalized each game with cheer,
Saved per-team files for all to share,
Calendar-bound — the season's near!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'Add sports team website scraping workflow' accurately reflects the main addition: a new Firecrawl-based website scraping workflow integrated into the sports data pipeline. However, it omits the equally significant WNBA supplemental CSV source and the merged supplemental data integration into getUpcomingEvents, which are substantial parts of the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/sports.js`:
- Around line 123-141: The Firecrawl POST currently has no timeout and can hang;
modify the call in lib/sports.js that invokes fetchImpl(...) to use an
AbortController signal and a timeout (e.g., from env.FIRECRAWL_TIMEOUT_MS with a
sensible default like 10000 ms): create AbortController, pass controller.signal
in the fetchImpl options, start a setTimeout to call controller.abort() after
the timeout, and clear the timer after the response; ensure downstream error
handling treats an abort as a timeout error.
- Around line 97-108: The scraped game.time is appended with ":00"
unconditionally causing malformed times when seconds are already present; update
the logic in lib/sports.js around the timestamp/strTime construction
(variables/functions: game.time, strTime, strTimestamp, timestamp) to first
normalize game.time to a valid HH:mm or HH:mm:ss format (e.g., if it matches
HH:mm append ":00", if it already matches HH:mm:ss leave as-is, or if missing
use "00:00:00"), then compose strTime from that normalized value and build
timestamp as `${game.date}T${normalizedTime}` so downstream filtering sees valid
upcoming timestamps.
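A minimal sketch of both `lib/sports.js` fixes, assuming the scraped game exposes `date` (YYYY-MM-DD) and `time` strings as the comments describe and that scraped times are UTC (the later commit notes add a `Z` designator). `normalizeTime`, `buildTimestamp`, and `fetchWithTimeout` are hypothetical helper names, not confirmed exports; the 10000 ms default mirrors the reviewer's suggestion.

```javascript
// Hypothetical helpers sketching the two lib/sports.js review fixes.

/**
 * Normalize a scraped time to HH:mm:ss.
 * "19:30" -> "19:30:00", "7:05" -> "07:05:00", "19:30:15" stays as-is,
 * anything missing or unparseable -> "00:00:00".
 */
function normalizeTime(time) {
  const match =
    typeof time === 'string'
      ? time.trim().match(/^(\d{1,2}):(\d{2})(?::(\d{2}))?$/)
      : null;
  if (!match) return '00:00:00';
  const [, hh, mm, ss = '00'] = match;
  return `${hh.padStart(2, '0')}:${mm}:${ss}`;
}

/** Build strTime/strTimestamp; the "Z" assumes the scraped values are UTC. */
function buildTimestamp(date, time) {
  const strTime = normalizeTime(time);
  return { strTime, strTimestamp: `${date}T${strTime}Z` };
}

/**
 * fetch() wrapper that aborts after timeoutMs and rethrows the abort as a
 * TimeoutError. fetchImpl defaults to the global fetch (Node 18+).
 */
async function fetchWithTimeout(url, options = {}, { timeoutMs = 10000, fetchImpl = fetch } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...options, signal: controller.signal });
  } catch (err) {
    if (err && err.name === 'AbortError') {
      const timeoutError = new Error(`Request to ${url} timed out after ${timeoutMs} ms`);
      timeoutError.name = 'TimeoutError';
      throw timeoutError;
    }
    throw err;
  } finally {
    // Cleared once the response headers arrive or the request fails.
    clearTimeout(timer);
  }
}
```

Inside `fetchScheduleFromWebsite`, the timeout could be read as `Number(env.FIRECRAWL_TIMEOUT_MS) || 10000`. Because the timer is cleared when headers arrive, reading the body (`response.json()`) is not covered; move `clearTimeout` after the body read if a slow body is a concern.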

In `@scripts/fetch-sports.js`:
- Around line 183-195: The current block only writes a file when games &&
games.length > 0, leaving teams with zero results unpersisted; change the logic
so you always write a supplemental JSON for the team even when games is empty.
Specifically, compute normalizedEvents = games && games.length ? games.map(g =>
normalizeScrapedEvent(g, team.strTeam)) : [] and always call fs.writeFile to
write an object containing teamId (team.idTeam), teamName (team.strTeam),
updatedAt (new Date().toISOString()), and events: normalizedEvents (empty array
when no games); keep the console.log but adjust it to reflect zero results when
applicable so teams aren't retried repeatedly.
- Around line 188-193: The fetch-sports script writes supplemental team files to
lib/data/sports/supplemental/<teamId>.json (see the write to filePath in
scripts/fetch-sports.js where JSON with teamId/teamName/updatedAt/events is
created), but the GitHub Actions job in .github/workflows/fetch-sports.yml only
stages lib/data/sports/*.json and thus will skip the supplemental folder; update
the workflow's staging/commit/include pattern to also include
lib/data/sports/supplemental/**/*.json (or move the output to the already-staged
path) so the supplemental artifacts produced by the writeFile call are picked up
and committed.
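A minimal sketch of the "always write the file" fix, written as CommonJS (switch to `import` if the repo uses ES modules). `writeSupplementalFile` is a hypothetical helper; the normalizer is passed in so the sketch does not assume where `normalizeScrapedEvent` is imported from. The payload fields (`teamId`, `teamName`, `updatedAt`, `events`) are the ones named in the review comment. The workflow's staging pattern still has to include the supplemental folder for these files to be committed.

```javascript
// Hypothetical helper: persist a supplemental file even when zero games were
// scraped, so updatedAt records the attempt and the team is not retried
// until the staleness window expires.
const fs = require('node:fs/promises');
const path = require('node:path');

/**
 * @param {string} dir  Target directory (e.g. SUPPLEMENTAL_DATA_DIR).
 * @param {{idTeam: string, strTeam: string}} team  TheSportsDB team record.
 * @param {Array|undefined} games  Raw scraped games (may be empty or missing).
 * @param {(game: object, teamName: string) => object} normalize  e.g. normalizeScrapedEvent.
 * @returns {Promise<string>} Path of the written file.
 */
async function writeSupplementalFile(dir, team, games, normalize) {
  const events = Array.isArray(games) ? games.map((g) => normalize(g, team.strTeam)) : [];
  const payload = {
    teamId: team.idTeam,
    teamName: team.strTeam,
    updatedAt: new Date().toISOString(),
    events,
  };
  await fs.mkdir(dir, { recursive: true });
  const filePath = path.join(dir, `${team.idTeam}.json`);
  await fs.writeFile(filePath, `${JSON.stringify(payload, null, 2)}\n`);
  console.log(
    events.length > 0
      ? `    Saved ${events.length} scraped events for ${team.strTeam}`
      : `    No scraped events for ${team.strTeam}; wrote empty file to avoid immediate retries`
  );
  return filePath;
}
```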

In `@test/sports.test.js`:
- Around line 200-241: The test "getUpcomingEvents merges supplemental (scraped)
data" is time-dependent; make it deterministic by fixing the clock during the
test: stub Date.now()/new Date() (or use sinon.useFakeTimers) so now is a fixed
timestamp, or compute all scrapedData date/time fields relative to a controlled
"now" used by getUpcomingEvents; ensure the scrapedData fields (dateEvent,
strTime, strTimestamp, updatedAt) are set consistently with the fake clock and
restore the clock in finally; reference the test name, getUpcomingEvents, and
createFetchMock when applying the clock stub so assertions about future-event
inclusion remain stable.
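A minimal sketch of a deterministic setup for that test, for readers not using sinon's fake timers (which the reviewer also suggests). `withFixedClock` and `buildSupplementalFixture` are hypothetical test helpers, and the team id and name in the fixture are placeholders. The approach only works if `getUpcomingEvents` reads the clock through `Date.now()` or `new Date()`; calling `Date()` without `new` would throw with this class-based stub.

```javascript
// Hypothetical test helpers for a deterministic
// "getUpcomingEvents merges supplemental (scraped) data" test.

/**
 * Run fn with the clock pinned to fixedMs: Date.now() and argument-less
 * `new Date()` return the fixed instant; Date with arguments is unchanged.
 * The real Date is restored even if fn throws or rejects.
 */
async function withFixedClock(fixedMs, fn) {
  const RealDate = globalThis.Date;
  class FixedDate extends RealDate {
    constructor(...args) {
      if (args.length === 0) {
        super(fixedMs);
      } else {
        super(...args);
      }
    }
    static now() {
      return fixedMs;
    }
  }
  globalThis.Date = FixedDate;
  try {
    return await fn();
  } finally {
    globalThis.Date = RealDate;
  }
}

/** Supplemental-file fixture whose single event starts daysAhead after nowMs. */
function buildSupplementalFixture(nowMs, daysAhead = 2) {
  const iso = new Date(nowMs + daysAhead * 24 * 60 * 60 * 1000).toISOString();
  const dateEvent = iso.slice(0, 10); // YYYY-MM-DD
  const strTime = iso.slice(11, 19); // HH:mm:ss
  return {
    teamId: '136437',
    teamName: 'Test Team',
    updatedAt: new Date(nowMs).toISOString(),
    events: [{ idEvent: 'scraped-1', dateEvent, strTime, strTimestamp: `${dateEvent}T${strTime}Z` }],
  };
}
```

In the test, the fixture would be written before the call and the call wrapped as `await withFixedClock(FIXED, () => getUpcomingEvents(...))`, with the file removed in `finally` as before.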

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8529cc9e-0bd3-4bae-b708-77a8b9a84e49

📥 Commits

Reviewing files that changed from the base of the PR and between 28527fa and fc39d3e.

📒 Files selected for processing (3)

lib/sports.js
scripts/fetch-sports.js
test/sports.test.js

- Implement Firecrawl scraping for official sports team websites
- Add extraction schema and normalization for scraped events
- Support supplemental data merging in getUpcomingEvents
- Add staleness check based on JSON updatedAt field
- Implement 10s timeout for Firecrawl requests
- Update GitHub workflow to persist supplemental data
- Ensure deterministic testing for merging logic

- Implement supplemental WNBA schedule fetching from SportsDataverse CSV
- Add Firecrawl-based scraping for official sports team websites
- Implement a robust character-based CSV parser for SportsDataverse data
- Add 10s timeout and AbortController for Firecrawl requests
- Ensure 'Z' UTC designator in scraped timestamps for correct filtering
- Update staleness check to use JSON 'updatedAt' field
- Fix WNBA league ID to 4516 in fetch script
- Make supplemental merging tests deterministic with mock timers
- Update GitHub workflow to persist and commit supplemental data folder

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/fetch-sports.js`:
- Around line 129-131: The CSV parser currently can drop the final row when the
file lacks a trailing newline because leftover data remains in
currentField/currentRow and never gets pushed into rows; update the parsing
logic (where currentField, currentRow, and rows are maintained—e.g., inside the
parse loop function) to, after the loop completes, append the last currentField
to currentRow (if any) and then append currentRow to rows (if it contains any
fields) before using rows[0] as header; ensure you handle empty strings vs
undefined appropriately so a final empty field is preserved when intended.
- Around line 246-249: The current freshness check computes lastUpdated = new
Date(data.updatedAt).getTime() but doesn't handle malformed dates (NaN), so
files with bad updatedAt are treated as fresh; update the logic around
lastUpdated (the variable computed from data.updatedAt) to detect NaN (e.g.,
Number.isNaN(lastUpdated) or !isFinite(lastUpdated)) and immediately treat the
record as stale (return true) instead of proceeding to the subtraction; keep the
existing now and twentyFourHoursMs check for valid timestamps.
- Around line 93-96: This fetch call to download the WNBA CSV lacks a timeout;
wrap it with an AbortController and the same FETCH_TIMEOUT_MS timeout pattern
used elsewhere (e.g., in fetchJson) so the request can't hang indefinitely:
create an AbortController, start a timer that calls controller.abort() after
FETCH_TIMEOUT_MS, pass controller.signal into fetch(url, { signal }), and ensure
the timer is cleared in a finally block so response/text handling (response,
csvText) only proceeds if not aborted.
- Line 158: The strTime value created in the CSV parsing can be in "HH:mm" which
is inconsistent with normalizeScrapedEvent and other sources expecting
"HH:mm:ss"; update the logic that assigns strTime (the line creating strTime
from date.split('T')[1] in scripts/fetch-sports.js) to normalize times to
"HH:mm:ss" by detecting a two-segment time (regex or split) and appending ":00"
when needed, otherwise leaving a full "HH:mm:ss" unchanged so downstream parsers
receive a consistent format.
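A minimal sketch of the CSV and staleness fixes. `parseCsv` is a hypothetical character-based parser in the RFC 4180 style (quoted fields, `""` escapes, LF or CRLF line endings) that flushes the last row when the file has no trailing newline; `rows[0]` would then be the header row. `isSupplementalStale` is a pure variant of the script's check that takes the parsed JSON and a `nowMs` instead of a `teamId`, so its signature is an assumption.

```javascript
// Hypothetical helpers sketching two scripts/fetch-sports.js fixes.

/** Parse CSV text into an array of rows (arrays of string fields). */
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch; // commas and newlines are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      row.push(field);
      field = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = '';
    } else {
      field += ch;
    }
  }

  // Flush the final row when there is no trailing newline; checking
  // row.length preserves a trailing empty field such as "1,".
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

/** Stale if updatedAt is missing, unparseable, or older than maxAgeMs (24 h default). */
function isSupplementalStale(data, nowMs = Date.now(), maxAgeMs = 24 * 60 * 60 * 1000) {
  if (!data || !data.updatedAt) return true;
  const lastUpdated = new Date(data.updatedAt).getTime();
  if (Number.isNaN(lastUpdated)) return true; // malformed date: refetch
  return nowMs - lastUpdated > maxAgeMs;
}
```

Blank lines in the CSV come back as `['']` rows, so callers may want to skip rows whose length differs from the header's.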

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 052c3391-fa20-4766-9fc8-cdd6b9c91879

📥 Commits

Reviewing files that changed from the base of the PR and between fc39d3e and e2ed30f.

📒 Files selected for processing (6)

.github/workflows/fetch-sports.yml
lib/data/sports/supplemental/136437.json
lib/data/sports/supplemental/136438.json
lib/sports.js
scripts/fetch-sports.js
test/sports.test.js

✅ Files skipped from review due to trivial changes (1)

lib/data/sports/supplemental/136438.json

🚧 Files skipped from review as they are similar to previous changes (2)

lib/sports.js
test/sports.test.js

- Add WNBA schedule data from SportsDataverse (wehoop) CSV
- Implement official team website scraping using Firecrawl extraction
- Build robust character-based CSV parser for wehoop datasets
- Add AbortController timeouts for all supplemental data fetches
- Normalize scraped/CSV times to HH:mm:ss with 'Z' UTC designator
- Refine staleness logic to handle malformed updatedAt dates
- Update GitHub workflow to persist the supplemental/ data folder
- Fix WNBA league ID to 4516 for accurate TSDB lookups
- Ensure deterministic sports merging tests with mock timers

coderabbitai

🧹 Nitpick comments (1)

scripts/fetch-sports.js (1)

186-199: 💤 Low value

Team name mismatch may silently skip events.

The lookup teamSupplemental.get(team.strTeam) assumes TheSportsDB team names exactly match the CSV's home_display_name/away_display_name. If they differ (e.g., "Las Vegas Aces" vs "Aces"), events won't be saved for that team, with no warning logged.

Consider logging when a team has no matching events:

Proposed fix

     for (const team of teams) {
       const teamEvents = teamSupplemental.get(team.strTeam);
       if (teamEvents) {
         const filePath = path.join(SUPPLEMENTAL_DATA_DIR, `${team.idTeam}.json`);
         // ...
         console.log(`    Saved ${teamEvents.length} WNBA supplemental events for ${team.strTeam}`);
+      } else {
+        console.log(`    No WNBA supplemental events found for ${team.strTeam} (name mismatch?)`);
       }
     }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/fetch-sports.js` around lines 186 - 199, The code currently uses
teamSupplemental.get(team.strTeam) and silently skips teams if the exact name
doesn't match; update the block around teamSupplemental.get to log a warning
when teamEvents is falsy (include team.strTeam and team.idTeam in the message)
and attempt a tolerant fallback lookup (e.g., normalized lowercase/trim match or
match on team.idTeam or other CSV display name keys) before deciding to skip; if
no fallback match is found, still log that no supplemental events were found for
that team so missing data is visible, otherwise proceed to write the merged JSON
as before using filePath and fs.writeFile.
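A minimal sketch of the tolerant lookup the nitpick asks for, assuming each parsed CSV row is an object with the `home_display_name` and `away_display_name` columns mentioned above. `normalizeTeamName`, `indexEventsByTeam`, and `findTeamEvents` are hypothetical names; matching is case-insensitive and whitespace-tolerant, and a miss is logged with the team name and id.

```javascript
// Hypothetical helpers for tolerant WNBA team-name matching.

/** Case-insensitive, whitespace-tolerant key for a team display name. */
function normalizeTeamName(name) {
  return String(name ?? '').trim().replace(/\s+/g, ' ').toLowerCase();
}

/** Index CSV rows under both the home and the away display name. */
function indexEventsByTeam(rows) {
  const index = new Map();
  for (const row of rows) {
    for (const name of [row.home_display_name, row.away_display_name]) {
      const key = normalizeTeamName(name);
      if (!key) continue;
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(row);
    }
  }
  return index;
}

/** Look up a TheSportsDB team; warn with name and id when nothing matches. */
function findTeamEvents(index, team) {
  const events = index.get(normalizeTeamName(team.strTeam));
  if (!events) {
    console.warn(`    No WNBA supplemental events found for ${team.strTeam} (${team.idTeam}); name mismatch?`);
    return null;
  }
  return events;
}
```

This still misses genuinely different names such as "Aces" vs "Las Vegas Aces"; the warning is what makes those gaps visible in the job log.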

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ec2a04a3-4112-4057-a201-314f305d09e5

📥 Commits

Reviewing files that changed from the base of the PR and between e2ed30f and f130c18.

📒 Files selected for processing (1)

scripts/fetch-sports.js

- Integrate WNBA schedule data from SportsDataverse (wehoop) CSV
- Implement official team website scraping using Firecrawl extraction
- Implement a robust character-based CSV parser for wehoop datasets
- Add AbortController timeouts for all supplemental data fetches
- Implement tolerant (case-insensitive/trimmed) team name matching for WNBA
- Normalize scraped/CSV times to HH:mm:ss with 'Z' UTC designator
- Refine staleness logic to handle malformed updatedAt dates
- Update GitHub workflow to persist the supplemental/ data folder
- Fix WNBA league ID to 4516 for accurate TSDB lookups
- Ensure deterministic sports merging tests with mock timers

vercel Bot deployed to Preview June 11, 2026 16:01

coderabbitai Bot reviewed Jun 11, 2026

Comment thread lib/sports.js

Comment thread lib/sports.js Outdated

Comment thread scripts/fetch-sports.js Outdated

Comment thread scripts/fetch-sports.js Outdated

Comment thread test/sports.test.js

vercel Bot deployed to Preview June 11, 2026 16:26

vercel Bot deployed to Preview June 11, 2026 17:11

coderabbitai Bot reviewed Jun 11, 2026

Comment thread scripts/fetch-sports.js

Comment thread scripts/fetch-sports.js

Comment thread scripts/fetch-sports.js Outdated

Comment thread scripts/fetch-sports.js

vercel Bot deployed to Preview June 12, 2026 05:06

coderabbitai Bot reviewed Jun 12, 2026

vercel Bot deployed to Preview June 12, 2026 05:15

DisabledAbel merged commit 7cb1758 into main Jun 12, 2026
4 checks passed

DisabledAbel deleted the feature/sports-website-scraping-16370083516219159048 branch June 12, 2026 05:17

coderabbitai Bot mentioned this pull request Jun 12, 2026

📅 Expand sports data with ESPN scraping and SportsDataverse #15

Merged

Conversation

DisabledAbel commented Jun 11, 2026 • edited by coderabbitai Bot

google-labs-jules Bot commented Jun 11, 2026

vercel Bot commented Jun 11, 2026 • edited

coderabbitai Bot commented Jun 11, 2026 • edited

coderabbitai Bot left a comment

coderabbitai Bot left a comment

coderabbitai Bot left a comment

1 participant