
[codex] Run update scraper unbuffered #18

Merged: ethanolivertroy merged 1 commit into main from codex/unbuffer-update-scraper on May 14, 2026

@ethanolivertroy
Member

Summary

  • run the update scraper with python -u
  • set PYTHONUNBUFFERED=1 for the scraper step

Why

The live data update can spend a long time in the scrape step. Running Python unbuffered makes scraper progress and warning output available during the run instead of only after process exit, which improves diagnosability for long weekly updates.
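The effect described above can be reproduced locally with a small sketch. It is not part of this PR: the function name `gap_between_first_line_and_exit` and the one-second stand-in "scrape" are hypothetical, and the child process writes to a pipe, which is assumed to resemble how a CI runner captures step output.

```python
import os
import subprocess
import sys
import time


def gap_between_first_line_and_exit(unbuffered, pause=1.0):
    """Seconds between the child's first output line reaching us and its exit.

    A gap close to `pause` means output streamed during the run; a gap near
    zero means it only arrived when the process exited.
    """
    # Hypothetical stand-in for a long scrape: print, wait, print again.
    child = (
        "import time; print('scrape started'); "
        f"time.sleep({pause}); print('scrape finished')"
    )
    # Drop any inherited PYTHONUNBUFFERED so only the -u flag varies here.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    cmd = [sys.executable] + (["-u"] if unbuffered else []) + ["-c", child]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, env=env, text=True) as proc:
        proc.stdout.readline()          # blocks until the first line is visible
        first_line_at = time.monotonic()
        proc.stdout.read()              # drain the rest until EOF
        proc.wait()
        exited_at = time.monotonic()
    return exited_at - first_line_at
```

Calling the function with `unbuffered=True` should report a gap of roughly the pause length, while `unbuffered=False` should report a gap near zero, because the buffered child only flushes when it exits.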

Validation

  • python3 validate_api.py
  • git diff --check

This is a workflow-only change; PR validation should run once the PR is created.

@ethanolivertroy ethanolivertroy marked this pull request as ready for review May 14, 2026 05:14
@ethanolivertroy ethanolivertroy merged commit e95406d into main May 14, 2026
1 check passed
@ethanolivertroy ethanolivertroy deleted the codex/unbuffer-update-scraper branch May 14, 2026 05:14
@qodo-code-review

Review Summary by Qodo

Run update scraper with unbuffered Python output

✨ Enhancement


Walkthroughs

Description
• Run update scraper with unbuffered Python output
• Set PYTHONUNBUFFERED environment variable for scraper step
• Improves diagnosability of long-running weekly data updates
• Makes scraper progress and warnings visible during execution
Diagram
flowchart LR
  A["Update Data Workflow"] -->|"Run scraper step"| B["Python unbuffered mode"]
  B -->|"PYTHONUNBUFFERED=1"| C["Real-time output visibility"]
  B -->|"python -u flag"| C
  C -->|"Improves"| D["Diagnosability"]


File Changes

1. .github/workflows/update-data.yml ⚙️ Configuration changes +2/-1

Enable unbuffered output for update scraper

• Added PYTHONUNBUFFERED: '1' environment variable to scraper step
• Changed scraper invocation from python scraper.py to python -u scraper.py
• Enables unbuffered output for better real-time visibility during long-running updates


@qodo-code-review

qodo-code-review Bot commented May 14, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0)



Advisory comments

1. Duplicate unbuffered configuration 🐞 Bug ⚙ Maintainability
Description
The scraper step sets PYTHONUNBUFFERED=1 and also runs python -u, which are two independent ways
to enable unbuffered output. Keeping both creates unnecessary duplication and can confuse future
edits about which mechanism is intended to be authoritative.
Code

.github/workflows/update-data.yml[R51-56]

        env:
          ALGORITHM_SOURCE: crawl4ai
+          PYTHONUNBUFFERED: '1'
          SKIP_ALGORITHMS: ${{ github.event.inputs.skip_algorithms == 'true' && '1' || '0' }}
        run: |
-          python scraper.py
+          python -u scraper.py
Evidence
In the same workflow step, the environment variable PYTHONUNBUFFERED is set and the command also
includes the -u flag, indicating redundant unbuffering configuration.

.github/workflows/update-data.yml[50-56]
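The redundancy can be checked outside the workflow with a short sketch. It assumes CPython 3.7 or later, where an unbuffered interpreter creates `sys.stdout` with `write_through` set; the names `PROBE` and `child_stdout_is_unbuffered` are illustrative and do not come from the repository.

```python
import os
import subprocess
import sys

# One-liner run in the child: report how its own stdout was configured.
PROBE = "import sys; print(sys.stdout.write_through)"


def child_stdout_is_unbuffered(use_flag=False, use_env=False):
    """Return True if a child interpreter reports an unbuffered stdout."""
    # Start from the current environment without any inherited setting.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    if use_env:
        env["PYTHONUNBUFFERED"] = "1"
    cmd = [sys.executable]
    if use_flag:
        cmd.append("-u")
    cmd += ["-c", PROBE]
    result = subprocess.run(
        cmd, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    for use_flag in (False, True):
        for use_env in (False, True):
            state = child_stdout_is_unbuffered(use_flag, use_env)
            print(f"-u={use_flag} PYTHONUNBUFFERED={use_env} unbuffered={state}")
```

Under these assumptions, each mechanism alone should report an unbuffered stdout, and combining them should report the same state as using only one.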

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
The workflow enables unbuffered output twice for the same command (`PYTHONUNBUFFERED=1` and `python -u`). This redundancy is harmless but adds noise and creates a “two sources of truth” maintenance risk.

## Issue Context
The goal is to get scraper logs streamed during long runs. Either the environment variable or the `-u` flag alone achieves this.

## Fix Focus Areas
- .github/workflows/update-data.yml[51-56]

## Suggested change
Choose one mechanism and remove the other. For example, keep the step-level env var and revert the command back to `python scraper.py` (or alternatively, keep `python -u` and drop `PYTHONUNBUFFERED`).
