📅 Expand sports data with ESPN scraping and SportsDataverse #15
- Automate ESPN schedule scraping every 6 hours via Firecrawl
- Integrate SportsDataverse CSVs for NBA, NFL, NHL, and WNBA
- Implement robust character-based CSV parser for external datasets
- Add tolerant team matching (abbreviations, prefix, case-insensitive)
- Normalize all supplemental times to HH:mm:ss with 'Z' designator
- Update GitHub Actions workflow to run every 6 hours
- Synchronize internal cache staleness check with 6-hour workflow
- Ensure deterministic merging tests with mock timers
📝 Walkthrough
Adds ESPN schedule scraping (Firecrawl), per-league CSV supplemental ingestion, reduces supplemental staleness to 6 hours, updates main orchestration to prefer ESPN with website fallback, and runs the workflow every 6 hours while staging supplemental JSON outputs.
Changes
Supplemental Schedule Scraping and Caching
Sequence Diagram
sequenceDiagram
participant FetchScript as scripts/fetch-sports.js
participant SportsLib as lib/sports.fetchScheduleFromESPN
participant Firecrawl as Firecrawl API
participant FS as Filesystem
FetchScript->>SportsLib: fetchScheduleFromESPN(leagueSlug, teamSlug, opts)
SportsLib->>Firecrawl: POST /extract (ESPN URL, schema, prompt, timeout)
Firecrawl-->>SportsLib: extracted { games: [...] }
SportsLib-->>FetchScript: parsed games array
FetchScript->>FS: merge/normalize and write `lib/data/sports/supplemental/<teamId>.json`
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
@coderabbitai fix conflicts
✅ No merge conflicts detected when merging into the target branch. Your branch is good to go!
Actionable comments posted: 1
🧹 Nitpick comments (3)
.github/workflows/fetch-sports.yml (1)
5-5: Consider reducing FEED_REFRESH_INTERVAL to match the workflow cadence.
The workflow now runs every 6 hours, but the ICS feeds use a 24-hour refresh interval (set by FEED_REFRESH_INTERVAL = 'PT24H' in lib/sports.js). ICS clients will only check for updates once per day, potentially showing data up to 24 hours stale even though fresh data is available every 6 hours. If you want clients to benefit from the increased update frequency, consider reducing FEED_REFRESH_INTERVAL to 'PT6H' or 'PT12H'.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/fetch-sports.yml at line 5, The feed refresh interval in lib/sports.js (FEED_REFRESH_INTERVAL = 'PT24H') is out of sync with the fetch-sports.yml workflow cron (now every 6 hours); update FEED_REFRESH_INTERVAL to a shorter ISO8601 duration such as 'PT6H' or 'PT12H' in lib/sports.js so ICS clients check more frequently and align with the workflow cadence, ensuring you change the FEED_REFRESH_INTERVAL constant and run tests that reference it (e.g., any tests or consumers of FEED_REFRESH_INTERVAL) to confirm no regressions.
scripts/fetch-sports.js (1)
454-469: ⚖️ Poor tradeoff
Firecrawl scraping overwrites CSV-derived supplemental data.
When Firecrawl is enabled and scraping succeeds, the code writes to the same file path as fetchLeagueSupplementalCSV, replacing any CSV-derived events. This means for leagues with both CSV config and Firecrawl support (NBA, NFL, NHL, WNBA), the CSV data is fetched but immediately overwritten if Firecrawl succeeds.
If this is intentional (Firecrawl data is more authoritative), consider skipping CSV fetch when Firecrawl is available, or merging both sources.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/fetch-sports.js` around lines 454 - 469, The Firecrawl save block overwrites CSV-derived supplemental files (written to SUPPLEMENTAL_DATA_DIR using ${team.idTeam}.json) causing CSV events to be lost; update the logic in scripts/fetch-sports.js so that when Firecrawl scraping (allScrapedGames / normalizeScrapedEvent) succeeds you either (A) skip calling fetchLeagueSupplementalCSV for leagues where Firecrawl is available, or (B) read the existing JSON file (if present), merge CSV-derived events with normalized Firecrawl events de-duplicating by a stable key (e.g., event ID/date), then write the merged events back to the same filePath; ensure you reference and preserve teamId/teamName/updatedAt fields when writing the merged payload.
lib/sports.js (1)
111-124: 💤 Low value
Home/away team inference may fail on partial name matches.
The startsWith/endsWith logic on lines 114-115 assumes exact team name at the boundaries. If the scraped game.name is "Chelsea vs Arsenal FC" and teamName is "Arsenal", the endsWith check fails because the string ends with "arsenal fc" not "arsenal". This is a minor concern since game.homeTeam/game.awayTeam from scraping would take precedence when populated.
Consider using includes for more tolerant matching if the fallback is frequently needed:
♻️ Optional improvement
   return {
     idEvent: id,
     strEvent: game.name,
-    strHomeTeam: game.homeTeam || (game.name.toLowerCase().startsWith(teamName.toLowerCase()) ? teamName : null),
-    strAwayTeam: game.awayTeam || (game.name.toLowerCase().endsWith(teamName.toLowerCase()) ? teamName : null),
+    strHomeTeam: game.homeTeam || (game.name.toLowerCase().split(/\s+vs\.?\s+/i)[0]?.includes(teamName.toLowerCase()) ? teamName : null),
+    strAwayTeam: game.awayTeam || (game.name.toLowerCase().split(/\s+vs\.?\s+/i)[1]?.includes(teamName.toLowerCase()) ? teamName : null),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/sports.js` around lines 111 - 124, The fallback boundary checks for assigning strHomeTeam/strAwayTeam use startsWith/endsWith on game.name which fails for suffixes like "FC"; update the fallback logic that sets strHomeTeam and strAwayTeam (the return block using game.name and teamName) to perform a case-insensitive, more tolerant match — either use String.prototype.includes with both values lowercased or, preferably, use a case-insensitive word-boundary regex for teamName so you still respect whole-word matches but allow trailing prefixes/suffixes (e.g., match "Arsenal" inside "Arsenal FC"); apply this change to the expressions that currently use startsWith/endsWith so the fallback correctly sets strHomeTeam/strAwayTeam when scraped fields are missing.
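The word-boundary variant suggested for lib/sports.js could be sketched roughly as follows. `escapeRegExp` and `inferSides` are hypothetical names that do not appear in the PR, and the sketch assumes the home side is listed first in a "Home vs Away" event name, which is the same assumption the current startsWith/endsWith fallback makes.

```javascript
// Escape regex metacharacters so a team name is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper (not in the PR): infer which side `teamName` is on
// from a "Home vs Away" style event name. Assumes home is listed first.
function inferSides(eventName, teamName) {
  const parts = String(eventName || '').split(/\s+vs\.?\s+/i);
  if (parts.length !== 2 || !teamName) return { home: null, away: null };

  // \b keeps whole-word matching: "Arsenal" matches "Arsenal FC"
  // but not "Arsenalia". Names ending in punctuation would need extra care.
  const pattern = new RegExp(`\\b${escapeRegExp(teamName)}\\b`, 'i');
  return {
    home: pattern.test(parts[0]) ? teamName : null,
    away: pattern.test(parts[1]) ? teamName : null,
  };
}
```

Splitting on the separator first avoids a team name that appears on the opposite side of the fixture being counted for both sides, which a bare `includes` on the whole string would do.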
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/fetch-sports.js`:
- Around line 231-248: The code currently blindly builds strTimestamp from
dateRaw which can produce invalid ISO strings for non-YYYY-MM-DD inputs; update
the logic around dateRaw/timeRaw handling (symbols: dateRaw, dateEvent, strTime,
strTimestamp) to validate and normalize dates before concatenation: if dateRaw
contains 'T' keep existing branch, otherwise only construct
`${dateEvent}T${strTime}Z` when dateEvent matches /^\d{4}-\d{2}-\d{2}$/; if it
doesn't, attempt to parse dateRaw with new Date(dateRaw) and if valid use the
parsed date's YYYY-MM-DD (or toISOString()) to build a correct ISO timestamp,
and if parsing fails set strTimestamp to null (or skip constructing) and ensure
downstream parseApiTimestamp is fed null/handled accordingly and/or log a
warning.
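A minimal sketch of the date-only branch described above, assuming times have already been normalized to HH:mm:ss. `buildUtcTimestamp` is a hypothetical helper name, and inputs that already contain an ISO 'T' separator are deliberately left to the existing branch.

```javascript
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;
const HMS_TIME = /^\d{2}:\d{2}:\d{2}$/;

function pad2(n) {
  return String(n).padStart(2, '0');
}

// Hypothetical helper (not in the PR): returns an ISO timestamp string,
// or null when the date cannot be validated.
function buildUtcTimestamp(dateRaw, strTime = '00:00:00') {
  if (typeof dateRaw !== 'string' || !HMS_TIME.test(strTime)) return null;
  let dateEvent = dateRaw.trim();
  if (/\dT\d/.test(dateEvent)) return null; // full timestamps: existing branch

  if (!ISO_DATE.test(dateEvent)) {
    const parsed = new Date(dateEvent);
    if (Number.isNaN(parsed.getTime())) return null;
    // Non-ISO strings are parsed in local time, so read the local
    // components to keep the calendar day as it was written.
    dateEvent = [
      parsed.getFullYear(),
      pad2(parsed.getMonth() + 1),
      pad2(parsed.getDate()),
    ].join('-');
  }

  const candidate = `${dateEvent}T${strTime}Z`;
  return Number.isNaN(Date.parse(candidate)) ? null : candidate;
}
```

Returning null rather than a best-guess string keeps the failure visible to whatever consumes strTimestamp downstream. Parsing of non-ISO date strings is implementation-dependent, so the fallback is a convenience, not a guarantee.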
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 78dbb368-82f4-4a03-8242-26f88a473011
📒 Files selected for processing (4)
- .github/workflows/fetch-sports.yml
- lib/sports.js
- scripts/fetch-sports.js
- test/sports.test.js
Resolved conflicts in:
- lib/sports.js (content)
- scripts/fetch-sports.js (content)
Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
scripts/fetch-sports.js (2)
369-381: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Zero-result scrapes never advance the staleness marker.
isSupplementalStale() only looks at updatedAt, but this loop only writes a file when allScrapedGames.length > 0. Teams with no schedule yet, a bad ESPN slug, or a transient extractor miss will stay stale forever and get retried on every 6-hour run, which defeats the throttle and can burn the Firecrawl budget.
Also applies to: 417-423, 457-472
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/fetch-sports.js` around lines 369 - 381, isSupplementalStale() currently only checks data.updatedAt, but the scraper only writes the supplemental file when allScrapedGames.length > 0, so zero-result scrapes never update the marker and teams are retried every run; change the write logic so every scrape attempt updates a timestamp (either update updatedAt on every attempt or add a lastAttempt/lastSuccess pair) and have isSupplementalStale() consider the appropriate timestamp(s) (e.g., lastAttempt to throttle retries and lastSuccess to detect stale valid data). Update the code paths that write the supplemental JSON (the branch that currently only writes when allScrapedGames.length > 0) to always persist the marker and ensure isSupplementalStale() reads the new field names you choose.
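One way the lastAttempt/lastSuccess idea could look, as a rough sketch. The `lastAttempt` field and both function names are hypothetical; here `updatedAt` plays the role of the last successful scrape, and the 6-hour window mirrors the workflow cadence.

```javascript
const SIX_HOURS_MS = 6 * 60 * 60 * 1000;

// Hypothetical: build the payload for every scrape attempt, including
// attempts that returned zero games, so the marker always advances.
function buildSupplementalPayload(teamId, teamName, events, previous, now = new Date()) {
  const stamp = now.toISOString();
  const hasEvents = Array.isArray(events) && events.length > 0;
  return {
    teamId,
    teamName,
    lastAttempt: stamp, // throttles retries
    updatedAt: hasEvents ? stamp : (previous?.updatedAt ?? null), // last success
    events: hasEvents ? events : (previous?.events ?? []),
  };
}

// Hypothetical staleness check keyed on the attempt marker, falling back
// to updatedAt for files written before lastAttempt existed.
function isStaleByAttempt(data, nowMs = Date.now()) {
  const last = Date.parse(data?.lastAttempt ?? data?.updatedAt);
  return Number.isNaN(last) || nowMs - last >= SIX_HOURS_MS;
}
```

Keeping the previous events on an empty scrape means a transient extractor miss does not wipe a schedule that was fetched successfully earlier.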
413-415: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
CSV refresh and scrape refresh currently cannot coexist in the same per-team cache.
fetchLeagueSupplementalCSV() writes ${team.idTeam}.json with a fresh updatedAt before the Firecrawl loop runs, so isSupplementalStale() immediately skips ESPN for CSV-backed leagues. If a scrape does run later, Lines 462-467 replace that same file with scraped events only, so the CSV rows are dropped instead of merged. Right now the two sources are mutually exclusive.
Also applies to: 417-423, 457-467
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/fetch-sports.js` around lines 413 - 415, fetchLeagueSupplementalCSV currently writes `${team.idTeam}.json` with a fresh updatedAt before the Firecrawl scrape runs, causing isSupplementalStale to skip ESPN and later scraped events to overwrite the file (dropping CSV rows); change fetchLeagueSupplementalCSV to not overwrite the canonical `${team.idTeam}.json` prematurely but instead either (A) read the existing cache file if present and merge CSV rows into it (deduplicating by event id) and update events/metadata atomically, or (B) write CSV data to a separate interim file like `${team.idTeam}.supplemental.json` and then, in the Firecrawl write path that currently replaces the same file (the code that writes `${team.idTeam}.json` after scraping), merge interim supplemental rows with scraped events and write the combined result with a single updatedAt; ensure isSupplementalStale still inspects the merged result so CSV and scraped sources coexist.
lib/sports.js (1)
212-215: ⚠️ Potential issue | 🟠 Major
Validate extract.games is an array before returning (ESPN and website).
fetchScheduleFromESPN(...) returns payload?.data?.extract?.games || payload?.extract?.games || [] without an Array.isArray check. In scripts/fetch-sports.js, the result is treated as an array (espnGames.length + allScrapedGames.push(...espnGames)) and then normalized via normalizeScrapedEvent, which dereferences game.name.toLowerCase(). If Firecrawl returns a non-array (e.g., string/object), this can throw during normalization and break the scrape.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/sports.js` around lines 212 - 215, fetchScheduleFromESPN currently returns payload?.data?.extract?.games || payload?.extract?.games || [] without validating the type; update fetchScheduleFromESPN to check that the chosen value is an array (use Array.isArray) before returning and otherwise return [] (and optionally log a warning including payload or the problematic extract) so callers like scripts/fetch-sports.js that rely on espnGames being an array (espnGames.length, allScrapedGames.push(...espnGames), normalizeScrapedEvent dereferencing game.name) won't throw when Firecrawl returns a non-array.
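The guard could be as small as the sketch below. `extractGames` is a hypothetical helper name. The extra filter on `game.name` is an assumption layered on top of the review's suggestion, added because normalizeScrapedEvent is described as calling `game.name.toLowerCase()`.

```javascript
// Hypothetical helper (not in the PR): pull the games list out of an
// extract payload and guarantee the caller receives an array.
function extractGames(payload) {
  const candidate = payload?.data?.extract?.games || payload?.extract?.games;
  if (!Array.isArray(candidate)) return [];

  // Optional extra safety: drop entries that normalization could not
  // handle because they lack a string `name`.
  return candidate.filter(
    (game) => game !== null && typeof game === 'object' && typeof game.name === 'string',
  );
}
```

With this shape, `espnGames.length` and `allScrapedGames.push(...espnGames)` stay safe even when the extractor returns a string or an object in place of the list.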
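For the merge option, a rough sketch of combining CSV rows with scraped events before a single write. Both function names are hypothetical. Keying on date plus lowercased event name (rather than idEvent) and letting scraped fields win on conflict are assumptions, made because the two sources are unlikely to share event ids.

```javascript
// Hypothetical dedup key: date + lowercased event name, falling back to
// idEvent when either part is missing.
function eventKey(event) {
  if (event.dateEvent && event.strEvent) {
    return `${event.dateEvent}|${String(event.strEvent).toLowerCase()}`;
  }
  return String(event.idEvent);
}

// Hypothetical merge: CSV rows first, then scraped events layered on top
// so scraped fields override while CSV-only fields survive.
function mergeSupplementalEvents(csvEvents, scrapedEvents) {
  const merged = new Map();
  for (const event of csvEvents) {
    merged.set(eventKey(event), event);
  }
  for (const event of scrapedEvents) {
    const key = eventKey(event);
    merged.set(key, { ...merged.get(key), ...event });
  }
  // ISO timestamps in one format sort correctly as plain strings.
  return [...merged.values()].sort((a, b) => {
    const ta = String(a.strTimestamp || '');
    const tb = String(b.strTimestamp || '');
    if (ta < tb) return -1;
    return ta > tb ? 1 : 0;
  });
}
```

The merged array would then be written once, with a single updatedAt, so the staleness check sees one consistent file per team.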
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7525c059-9958-40f7-993f-472f86f6fb91
📒 Files selected for processing (2)
- lib/sports.js
- scripts/fetch-sports.js
- Scrape ESPN team schedule pages every 6 hours via Firecrawl
- Integrate bulk schedules from SportsDataverse (NBA, NFL, NHL, WNBA)
- Implement robust character-based CSV parser for external datasets
- Add tolerant team matching with abbreviations and prefix support
- Normalize all times to HH:mm:ss with 'Z' UTC designator
- Synchronize internal caching with 6-hour workflow frequency
- Ensure deterministic merging logic in unit tests
- Fixed WNBA league ID to 4516 and improved date validation robustness
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/data/sports/4350.json (1)
51-7500: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Preserve scheduled status instead of null for upcoming fixtures.
The bulk change sets strStatus to null for upcoming events that previously used "NS". That diverges from upstream ingestion/normalization (scripts/fetch-sports.js and lib/sports.js both initialize scheduled events as "NS"), and can break downstream status filtering/rendering semantics.
Keep strStatus: "NS" for scheduled fixtures (or map to a documented non-null enum), and reserve null for truly unknown state only.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/data/sports/4350.json` around lines 51 - 7500, The JSON entries set strStatus to null for scheduled fixtures, breaking upstream normalization expected by scripts/fetch-sports.js and lib/sports.js which use "NS" for not-started events; restore strStatus: "NS" (or another documented non-null enum) for all upcoming/scheduled events (identify entries by the strTimestamp/strEvent or idEvent fields, e.g., idEvent values like "2487453" and similar) so downstream filtering/rendering keeps the original scheduled state; only leave strStatus null when the state is truly unknown.
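If the status is restored at write time rather than by hand-editing the JSON, a normalizer along these lines might work. `normalizeStatus` is a hypothetical name, and treating "start time is in the future" as the definition of a scheduled fixture is an assumption.

```javascript
// Hypothetical helper (not in the PR): give upcoming fixtures the "NS"
// (not started) status and leave null only where the state is unknown.
function normalizeStatus(event, nowMs = Date.now()) {
  const hasStatus = event.strStatus !== null
    && event.strStatus !== undefined
    && event.strStatus !== '';
  if (hasStatus) return event;

  const start = Date.parse(event.strTimestamp);
  // Unparseable or past start times stay as they are: state unknown.
  if (Number.isNaN(start) || start <= nowMs) return event;

  return { ...event, strStatus: 'NS' };
}
```

The function returns a copy instead of mutating its input, so it can be dropped into a `map` over the events array before the league file is written.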
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8f453581-53f5-4b6a-8cba-4749790ca86e
📒 Files selected for processing (24)
- lib/data/sports/4329.json
- lib/data/sports/4330.json
- lib/data/sports/4331.json
- lib/data/sports/4332.json
- lib/data/sports/4334.json
- lib/data/sports/4335.json
- lib/data/sports/4337.json
- lib/data/sports/4339.json
- lib/data/sports/4344.json
- lib/data/sports/4346.json
- lib/data/sports/4350.json
- lib/data/sports/4351.json
- lib/data/sports/4380.json
- lib/data/sports/4387.json
- lib/data/sports/4391.json
- lib/data/sports/4408.json
- lib/data/sports/4424.json
- lib/data/sports/4480.json
- lib/data/sports/4481.json
- lib/data/sports/4482.json
- lib/data/sports/supplemental/136437.json
- lib/data/sports/supplemental/136438.json
- lib/sports.js
- scripts/fetch-sports.js
💤 Files with no reviewable changes (3)
- lib/data/sports/supplemental/136438.json
- lib/data/sports/supplemental/136437.json
- lib/sports.js
✅ Files skipped from review due to trivial changes (11)
- lib/data/sports/4480.json
- lib/data/sports/4482.json
- lib/data/sports/4329.json
- lib/data/sports/4346.json
- lib/data/sports/4339.json
- lib/data/sports/4391.json
- lib/data/sports/4335.json
- lib/data/sports/4331.json
- lib/data/sports/4332.json
- lib/data/sports/4408.json
- lib/data/sports/4380.json
🚧 Files skipped from review as they are similar to previous changes (1)
- scripts/fetch-sports.js
   "leagueId": "4387",
   "leagueName": "NBA",
-  "updatedAt": "2026-06-12T14:31:29.226Z",
+  "updatedAt": "2026-06-11T06:34:00.453Z",
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Find where updatedAt is set and check for timestamp generation issues.
echo "=== Searching for updatedAt assignment in fetch-sports.js ==="
rg -n -C5 'updatedAt.*=' scripts/fetch-sports.js
echo -e "\n=== Checking other league files for backwards timestamps ==="
# Extract updatedAt from all league JSON files and sort to identify anomalies
fd -e json . lib/data/sports/ --exec sh -c 'echo -n "{}: "; jq -r ".updatedAt // \"missing\"" {}'
echo -e "\n=== Searching for Date or timestamp generation patterns ==="
rg -n -C3 'new Date|Date\.now|toISOString' scripts/fetch-sports.js
Repository: DisabledAbel/MakeICS
Length of output: 2956
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Show lib/data/sports/4387.json (around updatedAt) ==="
nl -ba lib/data/sports/4387.json | sed -n '1,80p'
echo "=== Show scripts/fetch-sports.js around the updatedAt writes (lines ~280-520) ==="
if [ -f scripts/fetch-sports.js ]; then
nl -ba scripts/fetch-sports.js | sed -n '260,340p'
nl -ba scripts/fetch-sports.js | sed -n '360,430p'
nl -ba scripts/fetch-sports.js | sed -n '450,510p'
else
echo "scripts/fetch-sports.js not found"
fd -e js fetch-sports . | head -n 20
fi
Repository: DisabledAbel/MakeICS
Length of output: 183
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Show lib/data/sports/4387.json (around updatedAt) ==="
nl -ba lib/data/sports/4387.json | sed -n '1,80p'
echo "=== Show scripts/fetch-sports.js around the updatedAt writes (lines ~260-520) ==="
if [ -f scripts/fetch-sports.js ]; then
nl -ba scripts/fetch-sports.js | sed -n '260,340p'
nl -ba scripts/fetch-sports.js | sed -n '360,430p'
nl -ba scripts/fetch-sports.js | sed -n '450,510p'
else
echo "scripts/fetch-sports.js not found"
fd -e js fetch-sports . | head -n 20
fi
Repository: DisabledAbel/MakeICS
Length of output: 183
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== lib/data/sports/4387.json (start) ==="
sed -n '1,40p' lib/data/sports/4387.json
echo "=== lib/data/sports/4387.json (around updatedAt) ==="
rg -n '"updatedAt"' lib/data/sports/4387.json
# Print a small window around the match
line=$(rg -n '"updatedAt"' lib/data/sports/4387.json | head -n1 | cut -d: -f1)
start=$((line-5)); end=$((line+5))
sed -n "${start},${end}p" lib/data/sports/4387.json
echo "=== scripts/fetch-sports.js: around updatedAt writes ==="
# show multiple windows around the line numbers from earlier rg output
sed -n '290,330p' scripts/fetch-sports.js
sed -n '395,430p' scripts/fetch-sports.js
sed -n '455,485p' scripts/fetch-sports.js
echo "=== scripts/fetch-sports.js: around 6-hour staleness check ==="
sed -n '365,415p' scripts/fetch-sports.js
Repository: DisabledAbel/MakeICS
Length of output: 3627
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== lib/data/sports/4387.json (lines 1-20) ==="
sed -n '1,20p' lib/data/sports/4387.json
echo "=== lib/data/sports/4387.json (lines 1-10 around updatedAt) ==="
line=$(rg -n '"updatedAt"' lib/data/sports/4387.json | head -n1 | cut -d: -f1 || true)
echo "updatedAt line: ${line}"
start=$((line-5)); end=$((line+5))
if [ "${start}" -lt 1 ]; then start=1; fi
sed -n "${start},${end}p" lib/data/sports/4387.json
echo "=== scripts/fetch-sports.js (staleness check + updatedAt writes) ==="
sed -n '330,430p' scripts/fetch-sports.js
sed -n '430,520p' scripts/fetch-sports.js
sed -n '250,330p' scripts/fetch-sports.js
sed -n '350,405p' scripts/fetch-sports.js
Repository: DisabledAbel/MakeICS
Length of output: 11780
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Find staleness logic ==="
rg -n 'sixHoursMs|isSupplementalStale' -S .
echo "=== Find who reads data.updatedAt / updatedAt for staleness ==="
rg -n 'data\.updatedAt|\.updatedAt.*getTime|now - lastUpdated' -S scripts lib src .
echo "=== Find writes to DATA_DIR / league json updatedAt ==="
rg -n 'DATA_DIR|lib/data/sports|fetchLeagueEvents|events\.length > 0' -S scripts/fetch-sports.js
rg -n 'await fs\.writeFile\([^)]*DATA_DIR|leagueId: league\.id|updatedAt: new Date\(\)\.toISOString\(\)' -S scripts/fetch-sports.js
Repository: DisabledAbel/MakeICS
Length of output: 992
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Where league JSON updatedAt is written (DATA_DIR write) ==="
rg -n 'DATA_DIR|lib/data/sports|leagueId: league\.id|writeFile\(.+\$\{league\.id\}\.json' scripts -S
echo "=== Where updatedAt is read elsewhere ==="
rg -n '\bupdatedAt\b' --glob '!node_modules/**' -S .
echo "=== Where league JSON files are consumed ==="
rg -n 'lib/data/sports|sports/\$\{.*\}\.json|DATA_DIR.*sports' --glob '!node_modules/**' -S .
Repository: DisabledAbel/MakeICS
Length of output: 3198
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== lib/sports.js (entire file if small; else key sections) ==="
wc -l lib/sports.js || true
sed -n '1,220p' lib/sports.js
sed -n '220,500p' lib/sports.js
echo "=== test/sports.test.js (around updatedAt cases) ==="
wc -l test/sports.test.js || true
sed -n '1,260p' test/sports.test.js
sed -n '260,520p' test/sports.test.js
Repository: DisabledAbel/MakeICS
Length of output: 22611
Fix updatedAt semantics for league cache files
- lib/data/sports/4387.json currently has "updatedAt": "2026-06-11T06:34:00.453Z".
- In scripts/fetch-sports.js, the 6-hour staleness logic (isSupplementalStale) checks only lib/data/sports/supplemental/${teamId}.json, so this league updatedAt value won’t affect the supplemental 6-hour staleness detection.
- If updatedAt is intended to reflect the current refresh run (for monitoring/cache invalidation), ensure the pipeline/system time isn’t skewed or that league cache files aren’t being written with stale timestamps (other lib/data/sports/*.json files also show updatedAt from 2026-06-11).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/data/sports/4387.json` at line 4, The league cache file
lib/data/sports/4387.json contains an outdated "updatedAt" timestamp; update the
pipeline so league files either get their "updatedAt" set to the current run
time when refreshed or ensure system/container time is correct so files are
written with real-time timestamps. In scripts/fetch-sports.js adjust the write
path for league cache files (or the code that generates updatedAt) so it assigns
Date.now()/new Date().toISOString() on successful refresh, or alternatively
ensure the process that writes lib/data/sports/*.json uses the same freshness
logic as isSupplementalStale (referencing isSupplementalStale and teamId) so
6-hour staleness detection is driven by the correct supplemental files rather
than stale league timestamps.
- Implemented structured extraction using Firecrawl API to scrape official team websites and ESPN.
- Integrated supplemental bulk data from SportsDataverse/NFLverse for WNBA, NBA, NFL, and NHL.
- Updated background workflow to run every 6 hours and persist team-specific supplemental data.
- Enhanced merging and deduplication logic in `lib/sports.js` to handle multi-source schedules.
- Added deterministic unit tests for event merging.
@coderabbitai fix conflicts