Problem
mlb_form_layer.py was making 100+ live MLB Stats API calls per dispatch (one per player per stat group). game_logs_refresh.py (PR #573) already fetches all of this nightly and stores it in live_batting_logs / live_pitching_logs, but the form layer never read from it.
Fix
Wire mlb_form_layer to read from Postgres first, fall back to live API only on miss.
Changes
_get_pg() helper added (same pattern as rest of codebase)
_DB_STAT_MAP — maps MLB Stats API key names to our Postgres column expressions
_fetch_game_log_from_db(player_id, group, window) — queries live_batting_logs / live_pitching_logs, returns synthetic split dicts in the same shape _compute_form() already consumes
_fetch_season_per_game_from_db(player_id, group) — computes season-to-date per-game averages from DB for the baseline ratio
_compute_form() patched: DB-first → API fallback for both recent splits and season baseline
Effect
Dispatch time: eliminates ~100 runtime HTTP calls → form data is instant from DB
Resilience: works even if MLB Stats API is slow or rate-limiting at 8:30 AM
Fallback safe: any DB miss (empty table, new player not yet in logs) falls back to live API exactly as before
No interface change: get_form_adjustment() and prefetch_form_data() are unchanged
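The DB-first, API-fallback flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `db_first` and its two callable parameters are hypothetical names, and it assumes the DB fetcher signals a miss by returning None, as the summary states.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def db_first(
    from_db: Callable[[], Optional[T]],
    from_api: Callable[[], Optional[T]],
) -> Optional[T]:
    """Return the DB result when available, otherwise the live-API result."""
    try:
        result = from_db()
    except Exception:
        # Treat any DB error as a miss so the API path still runs.
        result = None
    if result is not None:
        return result
    # DB miss (no connection, empty table, unknown player): use the live API.
    return from_api()
```

Wrapping both lookups in callables keeps the fallback decision in one place, so the recent-split lookup and the season-baseline lookup can share it.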
Summary by cubic
Switches mlb_form_layer to read form data from Postgres (live_batting_logs/live_pitching_logs) first, with MLB Stats API fallback on miss. This removes ~100 HTTP calls per dispatch and makes form lookups faster and more reliable.
New Features
Added _get_pg() and _DB_STAT_MAP for DB access and stat mapping.
Added DB fetchers for recent splits and season per-game baselines.
Updated _compute_form() to use DB-first, then fallback to live API.
No interface changes (get_form_adjustment, prefetch_form_data).
This PR adds an optional Postgres-backed "DB-first" data path to form computation in mlb_form_layer.py. A new connection helper and two DB fetcher methods load recent game-log splits and season baselines directly from the database, with automatic fallback to the existing MLB Stats API when database results are unavailable.
Changes
Database-first form computation
| Layer / File(s) | Summary |
| --- | --- |
| DB connection and stat mapping (mlb_form_layer.py) | _get_pg() helper opens an optional psycopg2 connection from DATABASE_URL, and _DB_STAT_MAP maps API stat keys to database column expressions. |
| Database fetcher methods (mlb_form_layer.py) | _fetch_game_log_from_db() queries recent rows from live_batting_logs/live_pitching_logs and synthesizes the expected stat shape; _fetch_season_per_game_from_db() computes per-game season baselines with minimum-game gating, both returning None on missing connection or insufficient data. |
| Form computation with DB-first flow (mlb_form_layer.py) | _compute_form() now invokes DB fetchers first, then falls back to existing MLB Stats API fetchers when DB results are None, preserving all existing probability tier mapping and ratio logic. |
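As an illustration of the minimum-game gating mentioned for the season baseline, here is a small sketch. The function name, the stat keys, and the threshold of 10 games are assumptions; the PR's actual threshold is not shown in this summary.

```python
from typing import Mapping, Optional

# Hypothetical threshold; the value used in the PR is not shown.
MIN_SEASON_GAMES = 10


def per_game_baseline(
    games: int,
    totals: Mapping[str, Optional[float]],
    min_games: int = MIN_SEASON_GAMES,
) -> Optional[dict]:
    """Convert season totals to per-game averages, or None if the sample is too small."""
    if games < min_games:
        # Too few games: signal a miss so the caller can fall back to the API.
        return None
    # SQL SUM() yields NULL (None) when every input is NULL; treat that as 0.
    return {key: (total or 0) / games for key, total in totals.items()}
```

Returning None for a small sample lets the caller reuse the same fallback path it uses for a missing connection.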
Sequence Diagram
sequenceDiagram
participant ComputeForm as _compute_form()
participant FetchGameLog as _fetch_game_log_from_db()
participant FetchSeasonPg as _fetch_season_per_game_from_db()
participant PgDB as PostgreSQL
participant MLBStats as MLB Stats API
ComputeForm->>FetchGameLog: query recent game splits
FetchGameLog->>PgDB: SELECT recent rows from live_batting_logs
alt DB connected and has data
PgDB-->>FetchGameLog: recent split stats
FetchGameLog-->>ComputeForm: synthesized stat dict
else DB missing or query fails
FetchGameLog-->>ComputeForm: None
ComputeForm->>MLBStats: fall back to API fetcher
MLBStats-->>ComputeForm: API stat dict
end
ComputeForm->>FetchSeasonPg: query season baselines
FetchSeasonPg->>PgDB: SELECT aggregated season totals
alt sufficient games recorded
PgDB-->>FetchSeasonPg: per-game baselines
FetchSeasonPg-->>ComputeForm: baseline dict
else insufficient data
FetchSeasonPg-->>ComputeForm: None
ComputeForm->>MLBStats: fall back to API fetcher
MLBStats-->>ComputeForm: API baseline dict
end
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
jaayslaughter-cpu/mework#141: Added the initial MLB Stats API rolling-average form layer that this PR now extends with a DB-first data source alternative.
Poem
🐰 The database hops, so swift and sleek, While API calls take more than a week! DB-first wins—no quota tax, And fallback keeps our safety stacks. ✨
Check skipped - CodeRabbit’s high-level summary is enabled.
Title check
✅ Passed
The title clearly and specifically describes the main change: wiring mlb_form_layer to read from live_batting_logs/live_pitching_logs with a DB-first, API fallback pattern, which matches the core objective of the PR.
Docstring Coverage
✅ Passed
Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.
Linked Issues check
✅ Passed
Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check
✅ Passed
Check skipped because no linked issues were found for this pull request.
We reviewed changes in 66c8225...f9a7426 on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.
Code Review
This pull request introduces a database-first approach for fetching MLB game logs and season averages to reduce API quota usage, with a fallback to the live API. Feedback highlights several areas for improvement: implementing database connection pooling to prevent resource exhaustion, resolving a logic discrepancy where the database fetcher uses the current season while the API uses the prior year, and improving code maintainability by utilizing the defined stat mapping and accessing database columns by name rather than index.
The _get_pg helper creates a new database connection every time it is called. Since prefetch_form_data iterates over a set of players and calls multiple DB-fetching methods per player, this implementation will open and close hundreds of connections in a single dispatch run. This is highly inefficient and risks exhausting the database connection pool or hitting server-side connection limits. Consider using a connection pool (e.g., psycopg2.pool.SimpleConnectionPool) or opening a single connection at the start of the pre-fetch loop and passing it down.
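One way to follow the second suggestion (open a single connection at the start of the pre-fetch loop and pass it down) is sketched below. `shared_connection`, `prefetch`, and `fetch_one` are hypothetical names, and the sketch only assumes the connection object has a `close()` method.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterable, Iterator


@contextmanager
def shared_connection(connect: Callable[[], object]) -> Iterator[object]:
    """Open one connection, lend it to the caller, and always close it."""
    conn = connect()
    try:
        yield conn
    finally:
        conn.close()


def prefetch(
    player_ids: Iterable[int],
    connect: Callable[[], object],
    fetch_one: Callable[[object, int], object],
) -> Dict[int, object]:
    """Run every per-player lookup over the same connection."""
    results: Dict[int, object] = {}
    with shared_connection(connect) as conn:
        for player_id in player_ids:
            results[player_id] = fetch_one(conn, player_id)
    return results
```

With psycopg2, `connect` could be something like `lambda: psycopg2.connect(os.environ["DATABASE_URL"])`. A `psycopg2.pool.SimpleConnectionPool` is the alternative the reviewer mentions when lookups are not confined to one loop.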
There is a logic discrepancy between the DB fetcher and the API fetcher for the season baseline. The API fetcher (_fetch_season_per_game, line 376) uses the prior year as the baseline, which aligns with the module's documentation (line 6). However, the DB fetcher uses a hardcoded date '2026-03-01', which calculates the average from the current season. This inconsistency means the form adjustment will change significantly depending on whether the data is retrieved from the database or the API fallback.
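To keep both paths on the same baseline season, the date filter could be derived from one helper instead of a literal. The sketch below is an assumption-laden illustration: `baseline_window` is a hypothetical name, the March 1 boundary is taken from the hardcoded date in the comment above, and it is not known whether the nightly tables hold prior-season rows at all.

```python
from datetime import date
from typing import Optional, Tuple


def baseline_window(
    today: Optional[date] = None, prior_year: bool = True
) -> Tuple[date, date]:
    """Return a half-open [start, end) date range for the baseline season."""
    today = today or date.today()
    # prior_year=True mirrors the API fetcher's choice of last season.
    season = today.year - 1 if prior_year else today.year
    return date(season, 3, 1), date(season + 1, 3, 1)
```

The two dates would then be passed as query parameters, for example `WHERE mlbam_id = %s AND game_date >= %s AND game_date < %s`.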
The _DB_STAT_MAP dictionary is defined but never utilized in the subsequent code. Instead, the SQL expressions for calculating stats (e.g., h_1b + h_2b + h_3b + home_runs) are hardcoded directly in the fetcher methods (lines 253, 285, etc.). This violates the DRY principle and makes the code harder to maintain if the schema or calculation logic changes.
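If the mapping is kept, the fetchers could build their SELECT list from it. The sketch below assumes a map shaped as API key to (batting expression, pitching expression), which matches the tuple layout described in the review prompts; the specific keys, the group names "hitting" and "pitching", and the function name are illustrative guesses, not the PR's contents.

```python
from typing import Dict, Optional, Tuple

# Hypothetical contents: API stat key -> (batting expression, pitching expression).
STAT_MAP: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "hits": ("h_1b + h_2b + h_3b + home_runs", None),
    "rbi": ("b_rbi", None),
    "runs": ("b_runs", None),
    "totalBases": ("h_1b + 2*h_2b + 3*h_3b + 4*home_runs", None),
    "strikeOuts": ("b_k", "strikeouts"),
    "earnedRuns": (None, "earnedruns"),
    "outs": (None, "outs"),
}


def season_select_list(group: str) -> str:
    """Build the SUM(...) column list for one stat group from STAT_MAP."""
    index = {"hitting": 0, "pitching": 1}[group]
    parts = []
    for key, expressions in STAT_MAP.items():
        expression = expressions[index]
        if expression is None:
            # This stat has no column in the selected table; skip it.
            continue
        parts.append(f'SUM({expression}) AS "{key}"')
    return ", ".join(parts)
```

The expressions come from a module-level constant rather than user input, so interpolating them into the query text does not widen the injection surface; player IDs and dates should still go through placeholders.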
The use of a conditional expression to execute different cur.execute calls is non-idiomatic and reduces readability. It is better to use a standard if/else block to define the query string and then call execute once at the end.
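A sketch of that shape: pick the query text first, then call execute once. The names `QUERIES` and `fetch_recent_rows` are hypothetical, and unlike the original the two queries here return different column sets, each limited to columns the review comments attribute to that table.

```python
from typing import Any, List, Sequence

# One query per table, each selecting only that table's own columns.
QUERIES = {
    "live_batting_logs": """
        SELECT game_date, h_1b, h_2b, h_3b, home_runs, b_rbi, b_runs, b_k
        FROM live_batting_logs
        WHERE mlbam_id = %s
        ORDER BY game_date DESC
        LIMIT %s
    """,
    "live_pitching_logs": """
        SELECT game_date,
               COALESCE(strikeouts, 0) AS strikeouts,
               COALESCE(earnedruns, 0) AS earnedruns,
               COALESCE(outs, 0) AS outs_pitched
        FROM live_pitching_logs
        WHERE mlbam_id = %s
        ORDER BY game_date DESC
        LIMIT %s
    """,
}


def fetch_recent_rows(cur: Any, table: str, player_id: int, window: int) -> List[Sequence]:
    """Look up the query for the table, then execute it exactly once."""
    query = QUERIES.get(table)
    if query is None:
        raise ValueError(f"unexpected table: {table!r}")
    cur.execute(query, (player_id, window))
    return cur.fetchall()
```

Because the table name is a dictionary key rather than an f-string substitution, the static-analysis SQL injection warning should no longer apply to this call.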
The _row_to_stat helper relies on hardcoded column indices (e.g., r[1], r[5]). This is fragile and will break silently if the SELECT statement in the query is ever modified or reordered. Using psycopg2.extras.RealDictCursor to access columns by name would be much more robust.
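A name-based version of the row conversion might look like the sketch below. It assumes rows arrive as dict-like objects, which is what a cursor created with `conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)` yields; the output keys are illustrative and may not match what `_compute_form()` expects.

```python
from typing import Dict, Mapping


def row_to_stat(row: Mapping[str, int]) -> Dict[str, int]:
    """Convert one batting-log row to a stat dict, reading columns by name."""
    singles, doubles = row["h_1b"], row["h_2b"]
    triples, homers = row["h_3b"], row["home_runs"]
    return {
        "hits": singles + doubles + triples + homers,
        "rbi": row["b_rbi"],
        "runs": row["b_runs"],
        # Total bases weight each hit by the bases it is worth.
        "totalBases": singles + 2 * doubles + 3 * triples + 4 * homers,
        "strikeOuts": row["b_k"],
    }
```

Reordering or extending the SELECT list then raises a KeyError for a missing column instead of silently reading the wrong value.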
Expression "cur.execute(f'\n SELECT game_date,\n h_1b, h_2b, h_3b, home_runs,\n b_rbi, b_runs, b_k,\n COALESCE(strikeouts, 0) AS strikeouts,\n COALESCE(earnedruns, 0) AS earnedruns,\n COALESCE(outs, 0) AS outs_pitched\n FROM {table}\n WHERE mlbam_id = %s\n ORDER BY game_date DESC\n LIMIT %s\n ', (player_id, window)) if table == 'live_batting_logs' else cur.execute(f'\n SELECT game_date,\n 0 AS h_1b, 0 AS h_2b, 0 AS h_3b, 0 AS home_runs,\n 0 AS b_rbi, 0 AS b_runs, 0 AS b_k,\n COALESCE(strikeouts, 0) AS strikeouts,\n COALESCE(earnedruns, 0) AS earnedruns,\n COALESCE(outs, 0) AS outs_pitched\n FROM {table}\n WHERE mlbam_id = %s\n ORDER BY game_date DESC\n LIMIT %s\n ', (player_id, window))" is assigned to nothing
An expression that is not a function call is assigned to nothing. Probably something else was intended here. We recommend to review this.
Actionable comments posted: 1
🧹 Nitpick comments (2)
mlb_form_layer.py (2)
57-66: 💤 Low value
_DB_STAT_MAP is defined but never used.
This mapping is not referenced anywhere in the file. The SQL queries in _fetch_game_log_from_db() and _fetch_season_per_game_from_db() hardcode the column expressions directly. Either remove this dead code or refactor the fetchers to use the mapping.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mlb_form_layer.py` around lines 57 - 66, The _DB_STAT_MAP constant is dead
code; either remove it or refactor the DB fetchers to use it: update
_fetch_game_log_from_db and _fetch_season_per_game_from_db to look up the
desired stat key in _DB_STAT_MAP and inject the appropriate table column
expression (first tuple element for batting queries, second for pitching
queries), handling None as a missing column (raise clear error or skip that
stat) and falling back to the existing literal expressions only if no mapping
exists; alternatively, delete _DB_STAT_MAP if you choose not to centralize
expressions.
212-241: ⚡ Quick win
Refactor confusing ternary execute pattern into explicit if/else.
The cur.execute(...) if table == ... else cur.execute(...) pattern is unusual and hard to read. Both branches return None, so it works, but it obscures intent and complicates debugging.
Regarding the static analysis SQL injection warning: this is a false positive since table is derived from internal logic and can only be "live_batting_logs" or "live_pitching_logs".
♻️ Proposed refactor to explicit if/else
 try:
     with conn, conn.cursor() as cur:
-        cur.execute(
-            f"""
-            SELECT game_date,
-                   h_1b, h_2b, h_3b, home_runs,
-                   b_rbi, b_runs, b_k,
-                   COALESCE(strikeouts, 0) AS strikeouts,
-                   COALESCE(earnedruns, 0) AS earnedruns,
-                   COALESCE(outs, 0) AS outs_pitched
-            FROM {table}
-            WHERE mlbam_id = %s
-            ORDER BY game_date DESC
-            LIMIT %s
-            """,
-            (player_id, window),
-        ) if table == "live_batting_logs" else cur.execute(
-            f"""
-            SELECT game_date,
-                   0 AS h_1b, 0 AS h_2b, 0 AS h_3b, 0 AS home_runs,
-                   0 AS b_rbi, 0 AS b_runs, 0 AS b_k,
-                   COALESCE(strikeouts, 0) AS strikeouts,
-                   COALESCE(earnedruns, 0) AS earnedruns,
-                   COALESCE(outs, 0) AS outs_pitched
-            FROM {table}
-            WHERE mlbam_id = %s
-            ORDER BY game_date DESC
-            LIMIT %s
-            """,
-            (player_id, window),
-        )
+        if table == "live_batting_logs":
+            cur.execute(
+                """
+                SELECT game_date,
+                       h_1b, h_2b, h_3b, home_runs,
+                       b_rbi, b_runs, b_k,
+                       COALESCE(strikeouts, 0) AS strikeouts,
+                       COALESCE(earnedruns, 0) AS earnedruns,
+                       COALESCE(outs, 0) AS outs_pitched
+                FROM live_batting_logs
+                WHERE mlbam_id = %s
+                ORDER BY game_date DESC
+                LIMIT %s
+                """,
+                (player_id, window),
+            )
+        else:
+            cur.execute(
+                """
+                SELECT game_date,
+                       0 AS h_1b, 0 AS h_2b, 0 AS h_3b, 0 AS home_runs,
+                       0 AS b_rbi, 0 AS b_runs, 0 AS b_k,
+                       COALESCE(strikeouts, 0) AS strikeouts,
+                       COALESCE(earnedruns, 0) AS earnedruns,
+                       COALESCE(outs, 0) AS outs_pitched
+                FROM live_pitching_logs
+                WHERE mlbam_id = %s
+                ORDER BY game_date DESC
+                LIMIT %s
+                """,
+                (player_id, window),
+            )
         rows = cur.fetchall()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mlb_form_layer.py` around lines 212 - 241, The current inline ternary calling
cur.execute(...) depending on table is hard to read; replace it with an explicit
if/else branch that calls cur.execute(...) in each branch so the intent is clear
and easier to debug. Inside the with conn, conn.cursor() as cur: block, check if
table == "live_batting_logs" then execute the batting SQL using (player_id,
window), else execute the pitching/zeroed-batting SQL using (player_id, window);
keep the same parameterized placeholders and COALESCE usage for
strikeouts/earnedruns/outs_pitched and do not inline the execute call as an
expression. Ensure you reference the existing variables conn, cur, table,
player_id, and window when making the change.
ℹ️ Review info⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4c3bca0e-348a-4e3a-b1fe-5d6fc9b8193f
📥 Commits
Reviewing files that changed from the base of the PR and between 66c8225 and f9a7426.
⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Hardcoded season date '2026-03-01' will break in future seasons.
The season filter is hardcoded to 2026-03-01, which won't adapt when the season changes. The API fallback at line 376 correctly uses datetime.now().year - 1 to determine the baseline season. Apply similar dynamic logic here.
🐛 Proposed fix: derive season start dynamically
At the start of the method, compute the season start date:
from datetime import datetime

# Inside _fetch_season_per_game_from_db, before the query:
current_year = datetime.now().year
season_start = f"{current_year}-03-01"
Then use season_start as a parameter:
 cur.execute(
     """
     SELECT COUNT(*),
            SUM(h_1b+h_2b+h_3b+home_runs),
            SUM(b_rbi), SUM(b_runs),
            SUM(h_1b + 2*h_2b + 3*h_3b + 4*home_runs),
            SUM(b_k)
     FROM live_batting_logs
-    WHERE mlbam_id = %s AND game_date >= '2026-03-01'
+    WHERE mlbam_id = %s AND game_date >= %s
     """,
-    (player_id,),
+    (player_id, season_start),
 )
Apply the same change to the pitching query.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mlb_form_layer.py` around lines 290 - 301, The queries in
_fetch_season_per_game_from_db currently hardcode '2026-03-01'; compute a
dynamic season_start (e.g. season_start = f"{datetime.now().year}-03-01" after
importing datetime) at the top of the function and replace the literal
'2026-03-01' in both the batting and pitching SQL branches with a parameterized
placeholder, passing season_start in the execute parameter tuple (replace
(player_id,) with (player_id, season_start) or similar) so the season filter
adapts each year.
This is probably one of the two most exploited vulnerabilities in web applications and has led to a number of high profile company breaches. It occurs when an application fails to sanitize or validate input before using it to dynamically construct a statement. An attacker that exploits this vulnerability will be able to gain access to the underlying database and view or modify data without permission.
OWASP SQL Injection Prevention Cheat Sheet - This article is focused on providing clear, simple, actionable guidance for preventing SQL Injection flaws in your applications.
OWASP SQL Injection - OWASP community page with comprehensive information about SQL injection, and links to various OWASP resources to help detect or prevent it.
1 issue found across 1 file
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="mlb_form_layer.py">
<violation number="1" location="mlb_form_layer.py:218">
P1: The batting DB query references non-existent columns (`strikeouts`, `earnedruns`, `outs`) in `live_batting_logs`, so hitter lookups fail and always fall back to the live API.</violation>
</file>
P1: The batting DB query references non-existent columns (strikeouts, earnedruns, outs) in live_batting_logs, so hitter lookups fail and always fall back to the live API.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At mlb_form_layer.py, line 218:
<comment>The batting DB query references non-existent columns (`strikeouts`, `earnedruns`, `outs`) in `live_batting_logs`, so hitter lookups fail and always fall back to the live API.</comment>
<file context>
@@ -165,6 +190,143 @@ def _resolve_player_id(self, player_name: str) -> int | None:
+ SELECT game_date,
+ h_1b, h_2b, h_3b, home_runs,
+ b_rbi, b_runs, b_k,
+ COALESCE(strikeouts, 0) AS strikeouts,
+ COALESCE(earnedruns, 0) AS earnedruns,
+ COALESCE(outs, 0) AS outs_pitched
</file context>
Suggested change
- COALESCE(strikeouts, 0) AS strikeouts,
- COALESCE(earnedruns, 0) AS earnedruns,
- COALESCE(outs, 0) AS outs_pitched
+ 0 AS strikeouts,
+ 0 AS earnedruns,
+ 0 AS outs_pitched