
PR #574: Wire mlb_form_layer → live_batting_logs/live_pitching_logs (DB-first, API fallback) #444

Merged
jaayslaughter-cpu merged 1 commit into main from pr-574-form-db-wire
May 16, 2026

@jaayslaughter-cpu (Owner) commented May 16, 2026

Problem

mlb_form_layer.py was making 100+ live MLB Stats API calls per dispatch (one per player per stat group). game_logs_refresh.py (PR #573) already fetches all of this nightly and stores it in live_batting_logs / live_pitching_logs — but the form layer never read from it.

Fix

Wire mlb_form_layer to read from Postgres first and fall back to the live API only on a miss.

Changes

  • _get_pg() helper added (same pattern as rest of codebase)
  • _DB_STAT_MAP — maps MLB Stats API key names to our Postgres column expressions
  • _fetch_game_log_from_db(player_id, group, window) — queries live_batting_logs / live_pitching_logs, returns synthetic split dicts in the same shape _compute_form() already consumes
  • _fetch_season_per_game_from_db(player_id, group) — computes season-to-date per-game averages from DB for the baseline ratio
  • _compute_form() patched: DB-first → API fallback for both recent splits and season baseline
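The DB-first, API-fallback flow in the list above reduces to a small wrapper. This is a minimal sketch, not the code merged in this PR: `db_first` and the commented fetcher names are hypothetical, and it assumes (as the description states) that each DB fetcher returns `None` on any miss.

```python
from typing import Callable, Optional

StatDict = dict  # e.g. {"hits": 1.2, "rbi": 0.4, ...}


def db_first(
    db_fetch: Callable[[], Optional[StatDict]],
    api_fetch: Callable[[], Optional[StatDict]],
) -> Optional[StatDict]:
    """Try Postgres first; call the live MLB Stats API only on a DB miss.

    A miss is signalled by the DB fetcher returning None (no connection,
    no rows, or a swallowed query error). An empty dict is NOT a miss
    under this rule.
    """
    result = db_fetch()
    if result is None:
        result = api_fetch()  # pre-PR behaviour, unchanged
    return result


# Hypothetical usage inside a _compute_form()-style function:
# recent = db_first(lambda: fetch_recent_db(pid, group, window),
#                   lambda: fetch_recent_api(pid, group, window))
# season = db_first(lambda: fetch_season_db(pid, group),
#                   lambda: fetch_season_api(pid, group))
```

Keeping the fallback in one place means every DB path gets the same "miss means API" behaviour.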

Effect

  • Dispatch time: eliminates ~100 runtime HTTP calls; form data is read from Postgres instead of fetched over HTTP per player
  • Resilience: works even if MLB Stats API is slow or rate-limiting at 8:30 AM
  • Fallback safe: any DB miss (empty table, new player not yet in logs) falls back to live API exactly as before
  • No interface change: get_form_adjustment() and prefetch_form_data() are unchanged

Summary by cubic

Switches mlb_form_layer to read form data from Postgres (live_batting_logs/live_pitching_logs) first, with MLB Stats API fallback on miss. This removes ~100 HTTP calls per dispatch and makes form lookups faster and more reliable.

  • New Features
    • Added _get_pg() and _DB_STAT_MAP for DB access and stat mapping.
    • Added DB fetchers for recent splits and season per-game baselines.
    • Updated _compute_form() to use DB-first, then fallback to live API.
    • No interface changes (get_form_adjustment, prefetch_form_data).

Written for commit f9a7426. Summary will update on new commits.

Summary by CodeRabbit

  • Chores
    • Added optional database-backed data fetching layer for form computation to improve performance, with fallback to existing data sources.

@coderabbitai
coderabbitai Bot commented May 16, 2026

📝 Walkthrough

This PR adds an optional Postgres-backed "DB-first" data path to form computation in mlb_form_layer.py. A new connection helper and two DB fetcher methods load recent game-log splits and season baselines directly from the database, with automatic fallback to the existing MLB Stats API when database results are unavailable.

Changes

Database-first form computation

  • DB connection and stat mapping (mlb_form_layer.py): _get_pg() helper opens an optional psycopg2 connection from DATABASE_URL, and _DB_STAT_MAP maps API stat keys to database column expressions.
  • Database fetcher methods (mlb_form_layer.py): _fetch_game_log_from_db() queries recent rows from live_batting_logs/live_pitching_logs and synthesizes the expected stat shape; _fetch_season_per_game_from_db() computes per-game season baselines with minimum-game gating. Both return None on a missing connection or insufficient data.
  • Form computation with DB-first flow (mlb_form_layer.py): _compute_form() now invokes the DB fetchers first, then falls back to the existing MLB Stats API fetchers when DB results are None, preserving all existing probability tier mapping and ratio logic.
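The season-baseline step with minimum-game gating could look roughly like this sketch. `per_game_baseline` and the 5-game threshold are hypothetical; the PR's actual threshold is not shown on this page.

```python
from typing import Mapping, Optional

MIN_GAMES = 5  # hypothetical threshold, not taken from the PR


def per_game_baseline(
    games: int,
    totals: Mapping[str, Optional[float]],
    min_games: int = MIN_GAMES,
) -> Optional[dict]:
    """Turn season-to-date totals into per-game averages.

    Returns None when there are too few games, which tells the caller
    to fall back to the MLB Stats API. SQL SUM() yields NULL (None) when
    no rows match, so None totals are treated as 0.
    """
    if games <= 0 or games < min_games:
        return None
    return {stat: (total or 0) / games for stat, total in totals.items()}
```

The `games <= 0` guard prevents a division by zero even if the threshold is set to 0.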

Sequence Diagram

sequenceDiagram
  participant ComputeForm as _compute_form()
  participant FetchGameLog as _fetch_game_log_from_db()
  participant FetchSeasonPg as _fetch_season_per_game_from_db()
  participant PgDB as PostgreSQL
  participant MLBStats as MLB Stats API
  ComputeForm->>FetchGameLog: query recent game splits
  FetchGameLog->>PgDB: SELECT recent rows from live_batting_logs
  alt DB connected and has data
    PgDB-->>FetchGameLog: recent split stats
    FetchGameLog-->>ComputeForm: synthesized stat dict
  else DB missing or query fails
    FetchGameLog-->>ComputeForm: None
    ComputeForm->>MLBStats: fall back to API fetcher
    MLBStats-->>ComputeForm: API stat dict
  end
  ComputeForm->>FetchSeasonPg: query season baselines
  FetchSeasonPg->>PgDB: SELECT aggregated season totals
  alt sufficient games recorded
    PgDB-->>FetchSeasonPg: per-game baselines
    FetchSeasonPg-->>ComputeForm: baseline dict
  else insufficient data
    FetchSeasonPg-->>ComputeForm: None
    ComputeForm->>MLBStats: fall back to API fetcher
    MLBStats-->>ComputeForm: API baseline dict
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • jaayslaughter-cpu/mework#141: Added the initial MLB Stats API rolling-average form layer that this PR now extends with a DB-first data source alternative.

Poem

🐰 The database hops, so swift and sleek,
While API calls take more than a week!
DB-first wins—no quota tax,
And fallback keeps our safety stacks.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and specifically describes the main change: wiring mlb_form_layer to read from live_batting_logs/live_pitching_logs with a DB-first, API fallback pattern, which matches the core objective of the PR.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 83.33%, which is sufficient. The required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

@codacy-production
Not up to standards ⛔

🔴 Issues 2 critical · 1 high · 1 medium

Alerts:
⚠ 4 issues (≤ 0 issues of at least minor severity)

Results:
4 new issues

Results by category:
  • ErrorProne: 1 high
  • Security: 2 critical, 1 medium

@deepsource-io
deepsource-io Bot commented May 16, 2026

DeepSource Code Review

We reviewed changes in 66c8225...f9a7426 on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.

Code Review Summary

Analyzers run (all updated May 16, 2026, 7:11 a.m. UTC): Docker, JavaScript, Python, SQL, Secrets.

Important

AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a database-first approach for fetching MLB game logs and season averages to reduce API quota usage, with a fallback to the live API. Feedback highlights several areas for improvement: implementing database connection pooling to prevent resource exhaustion, resolving a logic discrepancy where the database fetcher uses the current season while the API uses the prior year, and improving code maintainability by utilizing the defined stat mapping and accessing database columns by name rather than index.

Comment thread mlb_form_layer.py
Comment on lines +45 to +55
def _get_pg():
    """Return a psycopg2 connection or None."""
    import os
    try:
        import psycopg2
        db_url = os.environ.get("DATABASE_URL", "")
        if not db_url:
            return None
        return psycopg2.connect(db_url)
    except Exception:
        return None
high

The _get_pg helper creates a new database connection every time it is called. Since prefetch_form_data iterates over a set of players and calls multiple DB-fetching methods per player, this implementation will open and close hundreds of connections in a single dispatch run. This is highly inefficient and risks exhausting the database connection pool or hitting server-side connection limits. Consider using a connection pool (e.g., psycopg2.pool.SimpleConnectionPool) or opening a single connection at the start of the pre-fetch loop and passing it down.
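One way to act on this, sketched under assumptions: open a single connection per dispatch and pass it into each fetcher. The example uses the standard-library `sqlite3` module as a runnable stand-in for psycopg2 (hence `?` placeholders instead of `%s`) and a simplified, hypothetical `live_batting_logs(mlbam_id, game_date, h_1b)` table; `fetch_recent` and `prefetch` are illustrative names. With psycopg2, `psycopg2.pool.SimpleConnectionPool` with `getconn()`/`putconn()` is the pooled alternative the comment mentions.

```python
import sqlite3
from contextlib import closing
from typing import Callable, Dict, Iterable, List, Optional


def fetch_recent(conn: sqlite3.Connection, player_id: int, window: int) -> Optional[List[tuple]]:
    """Recent game rows for one player, using an already-open connection."""
    rows = conn.execute(
        "SELECT game_date, h_1b FROM live_batting_logs "
        "WHERE mlbam_id = ? ORDER BY game_date DESC LIMIT ?",
        (player_id, window),
    ).fetchall()
    return rows or None  # None = DB miss, caller falls back to the API


def prefetch(
    player_ids: Iterable[int],
    connect: Callable[[], sqlite3.Connection],
    window: int = 5,
) -> Dict[int, Optional[List[tuple]]]:
    """Open ONE connection for the whole loop instead of one per lookup."""
    results = {}
    with closing(connect()) as conn:
        for pid in player_ids:
            results[pid] = fetch_recent(conn, pid, window)
    return results
```

The connection count no longer scales with the number of players, which is the failure mode the comment describes.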

Comment thread mlb_form_layer.py
Comment on lines +290 to +301
                WHERE mlbam_id = %s AND game_date >= '2026-03-01'
                """,
                (player_id,),
            )
        else:
            cur.execute(
                """
                SELECT COUNT(*),
                       0, 0, 0, 0,
                       SUM(strikeouts), SUM(earnedruns)
                FROM live_pitching_logs
                WHERE mlbam_id = %s AND game_date >= '2026-03-01'
high

There is a logic discrepancy between the DB fetcher and the API fetcher for the season baseline. The API fetcher (_fetch_season_per_game, line 376) uses the prior year as the baseline, which aligns with the module's documentation (line 6). However, the DB fetcher uses a hardcoded date '2026-03-01', which calculates the average from the current season. This inconsistency means the form adjustment will change significantly depending on whether the data is retrieved from the database or the API fallback.
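If the DB path is meant to match the API's prior-year baseline, the date range could be derived the same way the API fallback does. A minimal sketch: `baseline_window` is a hypothetical helper, the March 1 start mirrors the hardcoded cutoff, and this PR does not say whether `live_batting_logs`/`live_pitching_logs` retain prior-season rows (if they don't, the DB path would always miss and fall back).

```python
from datetime import date
from typing import Optional, Tuple


def baseline_window(today: Optional[date] = None, prior_season: bool = True) -> Tuple[date, date]:
    """Return (start, end) dates for the season baseline.

    prior_season=True follows the API fetcher (datetime.now().year - 1);
    prior_season=False approximates the DB fetcher's current-season filter.
    """
    if today is None:
        today = date.today()
    year = today.year - 1 if prior_season else today.year
    return date(year, 3, 1), date(year, 12, 31)


# The query would then bound both ends, e.g.
#   WHERE mlbam_id = %s AND game_date BETWEEN %s AND %s
# with params (player_id, start, end).
```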

Comment thread mlb_form_layer.py
Comment on lines +59 to +66
_DB_STAT_MAP = {
    "hits": ("(h_1b + h_2b + h_3b + home_runs)", None),
    "rbi": ("b_rbi", None),
    "runs": ("b_runs", None),
    "totalBases": ("(h_1b + 2*h_2b + 3*h_3b + 4*home_runs)", None),
    "strikeOuts": ("b_k", "strikeouts"),
    "earnedRuns": (None, "earnedruns"),
}
medium

The _DB_STAT_MAP dictionary is defined but never utilized in the subsequent code. Instead, the SQL expressions for calculating stats (e.g., h_1b + h_2b + h_3b + home_runs) are hardcoded directly in the fetcher methods (lines 253, 285, etc.). This violates the DRY principle and makes the code harder to maintain if the schema or calculation logic changes.
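If the map is kept, the SELECT list could be generated from it instead of hardcoded. A sketch under assumptions: `select_list` is a hypothetical helper, a `None` entry becomes a literal 0 so the row shape stays the same for both tables, and aliases are double-quoted because Postgres folds unquoted identifiers such as `totalBases` to lowercase. String interpolation is acceptable here only because every expression comes from this constant, never from user input.

```python
from typing import Iterable

# Same contents as the PR's map: stat -> (batting expr, pitching expr).
_DB_STAT_MAP = {
    "hits": ("(h_1b + h_2b + h_3b + home_runs)", None),
    "rbi": ("b_rbi", None),
    "runs": ("b_runs", None),
    "totalBases": ("(h_1b + 2*h_2b + 3*h_3b + 4*home_runs)", None),
    "strikeOuts": ("b_k", "strikeouts"),
    "earnedRuns": (None, "earnedruns"),
}


def select_list(group: str, stats: Iterable[str]) -> str:
    """Build the column list for a batting or pitching query."""
    idx = 0 if group == "hitting" else 1
    parts = []
    for stat in stats:
        expr = _DB_STAT_MAP[stat][idx]  # KeyError for an unmapped stat
        if expr is None:
            parts.append(f'0 AS "{stat}"')  # column absent in this table
        else:
            parts.append(f'COALESCE({expr}, 0) AS "{stat}"')
    return ", ".join(parts)
```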

Comment thread mlb_form_layer.py
Comment on lines +213 to +241
cur.execute(
    f"""
    SELECT game_date,
           h_1b, h_2b, h_3b, home_runs,
           b_rbi, b_runs, b_k,
           COALESCE(strikeouts, 0) AS strikeouts,
           COALESCE(earnedruns, 0) AS earnedruns,
           COALESCE(outs, 0) AS outs_pitched
    FROM {table}
    WHERE mlbam_id = %s
    ORDER BY game_date DESC
    LIMIT %s
    """,
    (player_id, window),
) if table == "live_batting_logs" else cur.execute(
    f"""
    SELECT game_date,
           0 AS h_1b, 0 AS h_2b, 0 AS h_3b, 0 AS home_runs,
           0 AS b_rbi, 0 AS b_runs, 0 AS b_k,
           COALESCE(strikeouts, 0) AS strikeouts,
           COALESCE(earnedruns, 0) AS earnedruns,
           COALESCE(outs, 0) AS outs_pitched
    FROM {table}
    WHERE mlbam_id = %s
    ORDER BY game_date DESC
    LIMIT %s
    """,
    (player_id, window),
)
medium

The use of a conditional expression to execute different cur.execute calls is non-idiomatic and reduces readability. It is better to use a standard if/else block to define the query string and then call execute once at the end.
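A sketch of that shape: pick the query in an if/else, then call `execute` once. It uses `sqlite3` as a runnable stand-in for psycopg2 (`?` placeholders instead of `%s`), and the batting query selects literal zeros for the pitching-only columns on the assumption that `live_batting_logs` has no `strikeouts`/`earnedruns`/`outs` columns. Other column names follow the PR; `recent_rows` is a hypothetical name.

```python
import sqlite3
from typing import List

_BATTING_SQL = """
    SELECT game_date, h_1b, h_2b, h_3b, home_runs, b_rbi, b_runs, b_k,
           0 AS strikeouts, 0 AS earnedruns, 0 AS outs_pitched
    FROM live_batting_logs
    WHERE mlbam_id = ?
    ORDER BY game_date DESC
    LIMIT ?
"""

_PITCHING_SQL = """
    SELECT game_date,
           0 AS h_1b, 0 AS h_2b, 0 AS h_3b, 0 AS home_runs,
           0 AS b_rbi, 0 AS b_runs, 0 AS b_k,
           COALESCE(strikeouts, 0) AS strikeouts,
           COALESCE(earnedruns, 0) AS earnedruns,
           COALESCE(outs, 0) AS outs_pitched
    FROM live_pitching_logs
    WHERE mlbam_id = ?
    ORDER BY game_date DESC
    LIMIT ?
"""


def recent_rows(conn: sqlite3.Connection, player_id: int, group: str, window: int) -> List[tuple]:
    """Choose the query first, then execute exactly once."""
    if group == "hitting":
        query = _BATTING_SQL
    else:
        query = _PITCHING_SQL
    return conn.execute(query, (player_id, window)).fetchall()
```

Both queries return the same eleven columns, so downstream row handling does not need to branch.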

Comment thread mlb_form_layer.py
Comment on lines +252 to +262
def _row_to_stat(r) -> dict:
    hits = (r[1] or 0) + (r[2] or 0) + (r[3] or 0) + (r[4] or 0)
    tb = (r[1] or 0) + 2*(r[2] or 0) + 3*(r[3] or 0) + 4*(r[4] or 0)
    return {
        "hits": hits,
        "rbi": r[5] or 0,
        "runs": r[6] or 0,
        "totalBases": tb,
        "strikeOuts": r[7] if group == "hitting" else r[8],
        "earnedRuns": r[9] or 0,
    }
medium

The _row_to_stat helper relies on hardcoded column indices (e.g., r[1], r[5]). This is fragile and will break silently if the SELECT statement in the query is ever modified or reordered. Using psycopg2.extras.RealDictCursor to access columns by name would be much more robust.
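With psycopg2, `conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)` returns rows as dicts keyed by column name. A minimal sketch of the helper rewritten against any mapping, so it runs without a database; `row_to_stat` is a hypothetical name, and it also treats a NULL `b_k` as 0, which the index-based version does not.

```python
from typing import Any, Mapping


def row_to_stat(row: Mapping[str, Any], group: str) -> dict:
    """Build the API-shaped stat dict from a row keyed by column name."""

    def n(col: str) -> int:
        # A missing column and SQL NULL both count as 0.
        return row.get(col) or 0

    hits = n("h_1b") + n("h_2b") + n("h_3b") + n("home_runs")
    total_bases = n("h_1b") + 2 * n("h_2b") + 3 * n("h_3b") + 4 * n("home_runs")
    return {
        "hits": hits,
        "rbi": n("b_rbi"),
        "runs": n("b_runs"),
        "totalBases": total_bases,
        "strikeOuts": n("b_k") if group == "hitting" else n("strikeouts"),
        "earnedRuns": n("earnedruns"),
    }
```

Reordering the SELECT list no longer changes the result, because nothing depends on column position.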

@secure-code-warrior-for-github
Micro-Learning Topic: Resource exhaustion (Detected by phrase)

Matched on "resource exhaustion"

Allocating objects or timers with user-controlled sizes or durations can cause resource exhaustion.

Comment thread mlb_form_layer.py
table = "live_batting_logs" if group == "hitting" else "live_pitching_logs"
try:
    with conn, conn.cursor() as cur:
        cur.execute(
Expression "cur.execute(f'\n SELECT game_date,\n h_1b, h_2b, h_3b, home_runs,\n b_rbi, b_runs, b_k,\n COALESCE(strikeouts, 0) AS strikeouts,\n COALESCE(earnedruns, 0) AS earnedruns,\n COALESCE(outs, 0) AS outs_pitched\n FROM {table}\n WHERE mlbam_id = %s\n ORDER BY game_date DESC\n LIMIT %s\n ', (player_id, window)) if table == 'live_batting_logs' else cur.execute(f'\n SELECT game_date,\n 0 AS h_1b, 0 AS h_2b, 0 AS h_3b, 0 AS home_runs,\n 0 AS b_rbi, 0 AS b_runs, 0 AS b_k,\n COALESCE(strikeouts, 0) AS strikeouts,\n COALESCE(earnedruns, 0) AS earnedruns,\n COALESCE(outs, 0) AS outs_pitched\n FROM {table}\n WHERE mlbam_id = %s\n ORDER BY game_date DESC\n LIMIT %s\n ', (player_id, window))" is assigned to nothing


An expression that is not a function call is assigned to nothing. Probably something else was intended here. We recommend to review this.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
mlb_form_layer.py (2)

57-66: 💤 Low value

_DB_STAT_MAP is defined but never used.

This mapping is not referenced anywhere in the file. The SQL queries in _fetch_game_log_from_db() and _fetch_season_per_game_from_db() hardcode the column expressions directly. Either remove this dead code or refactor the fetchers to use the mapping.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mlb_form_layer.py` around lines 57 - 66, The _DB_STAT_MAP constant is dead
code; either remove it or refactor the DB fetchers to use it: update
_fetch_game_log_from_db and _fetch_season_per_game_from_db to look up the
desired stat key in _DB_STAT_MAP and inject the appropriate table column
expression (first tuple element for batting queries, second for pitching
queries), handling None as a missing column (raise clear error or skip that
stat) and falling back to the existing literal expressions only if no mapping
exists; alternatively, delete _DB_STAT_MAP if you choose not to centralize
expressions.

212-241: ⚡ Quick win

Refactor confusing ternary execute pattern into explicit if/else.

The cur.execute(...) if table == ... else cur.execute(...) pattern is unusual and hard to read. Both branches return None, so it works, but it obscures intent and complicates debugging.

Regarding the static analysis SQL injection warning: this is a false positive since table is derived from internal logic and can only be "live_batting_logs" or "live_pitching_logs".
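To keep that guarantee explicit rather than implicit, the table name can come from a fixed allowlist. A sketch: `table_for` is a hypothetical helper, and unlike the PR's expression (which sends any group other than "hitting" to the pitching table) it rejects unknown groups. psycopg2's own way to compose identifiers safely is `sql.SQL(...).format(sql.Identifier(table))` from the `psycopg2.sql` module.

```python
_TABLE_FOR_GROUP = {
    "hitting": "live_batting_logs",
    "pitching": "live_pitching_logs",
}


def table_for(group: str) -> str:
    """Map a stat group to its table; only allowlisted names can reach SQL."""
    try:
        return _TABLE_FOR_GROUP[group]
    except KeyError:
        raise ValueError(f"unknown stat group: {group!r}") from None
```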

♻️ Proposed refactor to explicit if/else
         try:
             with conn, conn.cursor() as cur:
-                cur.execute(
-                    f"""
-                    SELECT game_date,
-                           h_1b, h_2b, h_3b, home_runs,
-                           b_rbi, b_runs, b_k,
-                           COALESCE(strikeouts, 0)   AS strikeouts,
-                           COALESCE(earnedruns, 0)   AS earnedruns,
-                           COALESCE(outs, 0)         AS outs_pitched
-                    FROM   {table}
-                    WHERE  mlbam_id = %s
-                    ORDER  BY game_date DESC
-                    LIMIT  %s
-                    """,
-                    (player_id, window),
-                ) if table == "live_batting_logs" else cur.execute(
-                    f"""
-                    SELECT game_date,
-                           0 AS h_1b, 0 AS h_2b, 0 AS h_3b, 0 AS home_runs,
-                           0 AS b_rbi, 0 AS b_runs, 0 AS b_k,
-                           COALESCE(strikeouts, 0)  AS strikeouts,
-                           COALESCE(earnedruns, 0)  AS earnedruns,
-                           COALESCE(outs, 0)        AS outs_pitched
-                    FROM   {table}
-                    WHERE  mlbam_id = %s
-                    ORDER  BY game_date DESC
-                    LIMIT  %s
-                    """,
-                    (player_id, window),
-                )
+                if table == "live_batting_logs":
+                    cur.execute(
+                        """
+                        SELECT game_date,
+                               h_1b, h_2b, h_3b, home_runs,
+                               b_rbi, b_runs, b_k,
+                               COALESCE(strikeouts, 0)   AS strikeouts,
+                               COALESCE(earnedruns, 0)   AS earnedruns,
+                               COALESCE(outs, 0)         AS outs_pitched
+                        FROM   live_batting_logs
+                        WHERE  mlbam_id = %s
+                        ORDER  BY game_date DESC
+                        LIMIT  %s
+                        """,
+                        (player_id, window),
+                    )
+                else:
+                    cur.execute(
+                        """
+                        SELECT game_date,
+                               0 AS h_1b, 0 AS h_2b, 0 AS h_3b, 0 AS home_runs,
+                               0 AS b_rbi, 0 AS b_runs, 0 AS b_k,
+                               COALESCE(strikeouts, 0)  AS strikeouts,
+                               COALESCE(earnedruns, 0)  AS earnedruns,
+                               COALESCE(outs, 0)        AS outs_pitched
+                        FROM   live_pitching_logs
+                        WHERE  mlbam_id = %s
+                        ORDER  BY game_date DESC
+                        LIMIT  %s
+                        """,
+                        (player_id, window),
+                    )
                 rows = cur.fetchall()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mlb_form_layer.py` around lines 212 - 241, The current inline ternary calling
cur.execute(...) depending on table is hard to read; replace it with an explicit
if/else branch that calls cur.execute(...) in each branch so the intent is clear
and easier to debug. Inside the with conn, conn.cursor() as cur: block, check if
table == "live_batting_logs" then execute the batting SQL using (player_id,
window), else execute the pitching/zeroed-batting SQL using (player_id, window);
keep the same parameterized placeholders and COALESCE usage for
strikeouts/earnedruns/outs_pitched and do not inline the execute call as an
expression. Ensure you reference the existing variables conn, cur, table,
player_id, and window when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@mlb_form_layer.py`:
- Around line 290-301: The queries in _fetch_season_per_game_from_db currently
hardcode '2026-03-01'; compute a dynamic season_start (e.g. season_start =
f"{datetime.now().year}-03-01" after importing datetime) at the top of the
function and replace the literal '2026-03-01' in both the batting and pitching
SQL branches with a parameterized placeholder, passing season_start in the
execute parameter tuple (replace (player_id,) with (player_id, season_start) or
similar) so the season filter adapts each year.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4c3bca0e-348a-4e3a-b1fe-5d6fc9b8193f

📥 Commits

Reviewing files that changed from the base of the PR and between 66c8225 and f9a7426.

📒 Files selected for processing (1)
  • mlb_form_layer.py

Comment thread mlb_form_layer.py
Comment on lines +290 to +301
                WHERE mlbam_id = %s AND game_date >= '2026-03-01'
                """,
                (player_id,),
            )
        else:
            cur.execute(
                """
                SELECT COUNT(*),
                       0, 0, 0, 0,
                       SUM(strikeouts), SUM(earnedruns)
                FROM live_pitching_logs
                WHERE mlbam_id = %s AND game_date >= '2026-03-01'
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Hardcoded season date '2026-03-01' will break in future seasons.

The season filter is hardcoded to 2026-03-01, which won't adapt when the season changes. The API fallback at line 376 correctly uses datetime.now().year - 1 to determine the baseline season. Apply similar dynamic logic here.

🐛 Proposed fix: derive season start dynamically

At the start of the method, compute the season start date:

from datetime import datetime

# Inside _fetch_season_per_game_from_db, before the query:
current_year = datetime.now().year
season_start = f"{current_year}-03-01"

Then use season_start as a parameter:

                     cur.execute(
                         """
                         SELECT COUNT(*),
                                SUM(h_1b+h_2b+h_3b+home_runs),
                                SUM(b_rbi), SUM(b_runs),
                                SUM(h_1b + 2*h_2b + 3*h_3b + 4*home_runs),
                                SUM(b_k)
                         FROM live_batting_logs
-                        WHERE mlbam_id = %s AND game_date >= '2026-03-01'
+                        WHERE mlbam_id = %s AND game_date >= %s
                         """,
-                        (player_id,),
+                        (player_id, season_start),
                     )

Apply the same change to the pitching query.
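A runnable sketch of the parameterized filter, following the current-year suggestion above and using `sqlite3` as a stand-in for Postgres (`?` placeholders; ISO date strings compare correctly as text in SQLite, while psycopg2 also accepts a `datetime.date` directly). `season_totals` and its `today` parameter are hypothetical; passing `today` explicitly makes the cutoff testable.

```python
import sqlite3
from datetime import date
from typing import Optional, Tuple


def season_totals(
    conn: sqlite3.Connection, player_id: int, today: Optional[date] = None
) -> Tuple[int, Optional[int]]:
    """Games played and total RBI since March 1 of today's year."""
    if today is None:
        today = date.today()
    season_start = date(today.year, 3, 1).isoformat()  # e.g. "2026-03-01"
    return conn.execute(
        "SELECT COUNT(*), SUM(b_rbi) FROM live_batting_logs "
        "WHERE mlbam_id = ? AND game_date >= ?",
        (player_id, season_start),
    ).fetchone()
```

With no matching rows, `COUNT(*)` is 0 and `SUM` is NULL, so callers still need the minimum-game gate.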

@secure-code-warrior-for-github
Micro-Learning Topic: SQL injection (Detected by phrase)

Matched on "SQL injection"

This is probably one of the two most exploited vulnerabilities in web applications and has led to a number of high-profile company breaches. It occurs when an application fails to sanitize or validate input before using it to dynamically construct a statement. An attacker that exploits this vulnerability will be able to gain access to the underlying database and view or modify data without permission.

@jaayslaughter-cpu jaayslaughter-cpu merged commit fb3e740 into main May 16, 2026
6 of 9 checks passed
@cubic-dev-ai cubic-dev-ai Bot (Contributor) left a comment

1 issue found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="mlb_form_layer.py">

<violation number="1" location="mlb_form_layer.py:218">
P1: The batting DB query references non-existent columns (`strikeouts`, `earnedruns`, `outs`) in `live_batting_logs`, so hitter lookups fail and always fall back to the live API.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread mlb_form_layer.py
Comment on lines +218 to +220
COALESCE(strikeouts, 0) AS strikeouts,
COALESCE(earnedruns, 0) AS earnedruns,
COALESCE(outs, 0) AS outs_pitched
P1: The batting DB query references non-existent columns (strikeouts, earnedruns, outs) in live_batting_logs, so hitter lookups fail and always fall back to the live API.
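This failure is easy to miss because a failed query also returns None (the walkthrough's "DB missing or query fails" branch), so a schema mismatch looks identical to an ordinary miss. A minimal sketch of logging the failure before falling back, using `sqlite3` as a runnable stand-in (with psycopg2 the exception class would be `psycopg2.Error`); `fetch_or_none` and the logger name are hypothetical.

```python
import logging
import sqlite3
from typing import List, Optional, Sequence

log = logging.getLogger("mlb_form_layer")


def fetch_or_none(conn: sqlite3.Connection, query: str, params: Sequence) -> Optional[List[tuple]]:
    """Run a query; on failure, log it and return None so the API fallback runs."""
    try:
        rows = conn.execute(query, params).fetchall()
    except sqlite3.Error:
        # A missing column lands here; without this log it looks like a normal miss.
        log.warning("DB form fetch failed; falling back to MLB Stats API", exc_info=True)
        return None
    return rows or None
```

The fallback behaviour is unchanged; the warning only makes a permanently failing DB path visible in logs.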

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At mlb_form_layer.py, line 218:

<comment>The batting DB query references non-existent columns (`strikeouts`, `earnedruns`, `outs`) in `live_batting_logs`, so hitter lookups fail and always fall back to the live API.</comment>

<file context>
@@ -165,6 +190,143 @@ def _resolve_player_id(self, player_name: str) -> int | None:
+                    SELECT game_date,
+                           h_1b, h_2b, h_3b, home_runs,
+                           b_rbi, b_runs, b_k,
+                           COALESCE(strikeouts, 0)   AS strikeouts,
+                           COALESCE(earnedruns, 0)   AS earnedruns,
+                           COALESCE(outs, 0)         AS outs_pitched
</file context>
Suggested change
- COALESCE(strikeouts, 0) AS strikeouts,
- COALESCE(earnedruns, 0) AS earnedruns,
- COALESCE(outs, 0) AS outs_pitched
+ 0 AS strikeouts,
+ 0 AS earnedruns,
+ 0 AS outs_pitched
