Fix Prisma container error and crawler not finding posts #2
Adding CLAUDE.md with task information for AI processing. This file will be removed when the task is complete. Issue: Payel-git-ol#1
…risma schema
This fixes the Prisma "table does not exist" error by ensuring:
1. DATABASE_URL in docker-compose uses ${POSTGRES_DB} instead of hardcoded 'telegram-scrapper'
2. Prisma schema.prisma includes url = env("DATABASE_URL") in datasource block
3. Removed unused init-db volume mount from postgres service
The error occurred because the database was created as 'tgcrawler' (from POSTGRES_DB)
but the app was trying to connect to 'telegram-scrapper'. Prisma migrations are
already configured to run automatically via Dockerfile CMD.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This reverts commit 3e4880b.
🤖 Solution Draft Log: This log file contains the complete execution trace of the AI solution draft process. 💰 Cost estimation:
The working session has now ended; feel free to review and add any feedback on the solution draft.
|
1.073 > tg-crawler@1.0.0 prisma:generate |
|
🤖 AI Work Session Started: Starting automated work session at 2026-02-01T18:25:40.047Z. The PR has been converted to draft mode while work is in progress. This comment marks the beginning of an AI work session. Please wait for the working session to finish before providing feedback.
Prisma 7 no longer supports the `url` property in schema.prisma datasource block. The DATABASE_URL is already correctly configured in prisma.config.ts (for migrations/CLI) and via the PrismaPg adapter in TaskService (for runtime).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Thanks for catching that! You're right: Prisma 7 no longer supports the `url` property in the schema.prisma datasource block. I've pushed a fix in commit df39a6e that removes it.
The docker-compose.yaml fix (using |
|
Add an endpoint to the crawler that accepts a date, or add a command that crawls from a given date.
|
🤖 AI Work Session Started: Starting automated work session at 2026-02-01T18:43:44.689Z.
Allow crawling since a specific date instead of only the last 7 days.
- POST /api/crawl accepts optional {date: "YYYY-MM-DD"} in request body
- npm run crawl supports --date 2025-01-15 CLI argument
- Scraper.scrape() accepts optional sinceDate parameter
- Default behavior unchanged (last 7 days when no date specified)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
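The date handling described in this commit could look roughly like the sketch below. It is not the project's code: the helper name `resolveSinceDate`, the choice to interpret the date as UTC midnight, and the strict check against rolled-over dates are all assumptions made for illustration.

```typescript
// Default window taken from the commit message: the last 7 days.
const DEFAULT_WINDOW_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: returns the crawl start date, or null when the
// input is present but is not a valid YYYY-MM-DD calendar date.
function resolveSinceDate(
  input: string | undefined,
  now: Date = new Date()
): Date | null {
  if (input === undefined) {
    // No date supplied: fall back to the default 7-day window.
    return new Date(now.getTime() - DEFAULT_WINDOW_DAYS * MS_PER_DAY);
  }
  if (!/^\d{4}-\d{2}-\d{2}$/.test(input)) {
    return null;
  }
  // Assumption: the date is interpreted as midnight UTC.
  const parsed = new Date(`${input}T00:00:00.000Z`);
  if (Number.isNaN(parsed.getTime())) {
    return null;
  }
  // Reject dates the engine may roll over (for example 2025-02-30).
  return parsed.toISOString().slice(0, 10) === input ? parsed : null;
}
```

Under these assumptions, an API handler would answer with a 400 status when the helper returns `null`, and the CLI would print an error and exit, which matches the behavior described for invalid dates.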
|
Added date-based crawling support as requested. Here's what's new:
API endpoint:
# Crawl since a specific date
curl -X POST http://localhost:3000/api/crawl \
-H "Content-Type: application/json" \
-d '{"date": "2025-01-15"}'
# Default behavior (last 7 days) still works
curl -X POST http://localhost:3000/api/crawl
CLI command:
npm run crawl -- --date 2025-01-15
Invalid dates return a 400 error (API) or exit with an error message (CLI). When no date is provided, the default 7-day window is used.
Changes in commit bc21f81:
|
|
{"success":true,"message":"No new posts found","posts":[]} |
|
🤖 AI Work Session Started: Starting automated work session at 2026-02-01T18:57:22.936Z.
… validation
Root cause: The crawler returned "No new posts found" despite posts existing
because of multiple compounding bugs:
1. ContentValidator.isJobPost() required BOTH job keywords AND job emojis
(👔/💼/📌). Many valid job posts don't use these specific emojis.
Changed to accept posts with keywords OR emojis.
2. HtmlParser.extractPostId() used selector `a[href*='/s/']` to find post
links, but Telegram post URLs are `t.me/channel/ID` (no `/s/`). This
caused all posts to get random IDs (`post_${Date.now()}`), breaking
deduplication. Fixed to use the `data-post` attribute on message elements.
3. HtmlParser MESSAGE_TIME selector was `time.datetime` (looking for class
"datetime") but the actual `<time>` elements have class "time" and a
`datetime` attribute. Fixed to `time[datetime]`.
4. PostExtractor.parsePost() never called extractTimestamp(), setting all
posts to current time instead of actual post date. Now uses real timestamps.
5. AD_PATTERNS `/pinned/i` regex matched "pinned" anywhere in text, potentially
filtering valid posts. Changed to `^pinned` (start-of-text only).
6. Post URLs were generated as `${baseUrl}?q=${postId}` which is invalid.
Fixed to `${baseUrl.replace('/s/', '/')}/${postId}` producing correct
Telegram post links.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
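Three of the fixes listed in this commit (items 1, 5, and 6) are small enough to sketch as pure functions. The version below is illustrative only: the function names and the keyword list are hypothetical, while the emojis, the `^pinned` pattern, and the URL expression are taken from the commit message.

```typescript
// Emojis named in the commit message.
const JOB_EMOJIS = ["👔", "💼", "📌"];
// Assumption: a stand-in keyword list, not the project's real one.
const JOB_KEYWORDS = ["vacancy", "hiring", "developer"];

// Item 1: accept a post that has a job keyword OR a job emoji
// (the earlier logic required both).
function isJobPost(text: string): boolean {
  const lower = text.toLowerCase();
  const hasKeyword = JOB_KEYWORDS.some((keyword) => lower.includes(keyword));
  const hasEmoji = JOB_EMOJIS.some((emoji) => text.includes(emoji));
  return hasKeyword || hasEmoji;
}

// Item 5: anchored pattern, so only text that starts with "pinned" matches.
const PINNED_PATTERN = /^pinned/i;

function isPinnedNotice(text: string): boolean {
  return PINNED_PATTERN.test(text);
}

// Item 6: build a post link by dropping the "/s/" preview segment
// from the channel URL and appending the post ID.
function buildPostUrl(baseUrl: string, postId: string): string {
  return `${baseUrl.replace("/s/", "/")}/${postId}`;
}
```

With the anchored pattern, a post such as "We pinned the salary range below" is no longer treated as a pin notice, which is the false positive the commit describes.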
|
Fixed the "No new posts found" issue in commit a2f9b2d. The root cause was 6 compounding bugs that prevented the crawler from extracting posts:
All changes verified to compile cleanly with |
🔄 Auto-restart 1/3: Detected uncommitted changes from the previous run. Starting a new session to review and commit them.
Uncommitted files:
Auto-restart will stop after changes are committed or after 2 more iterations. Please wait until the working session ends, then give your feedback.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Save a database to data with api. Is now not saved |
|
🤖 AI Work Session Started: Starting automated work session at 2026-02-01T19:16:35.628Z.
- Add UNIQUE index on Task.id_post in init migration (was missing, needed for Prisma skipDuplicates to work correctly)
- API crawl endpoint now saves to both database AND files (backup)
- API response now reports DB/file save counts separately
- Fix: crawled posts are always included in API response even if DB save partially fails
- CLI crawl-only.ts now also saves to database when DATABASE_URL is available (previously only saved to files)
- Normalize content hashes with toLowerCase().trim() for consistent dedup
Fixes feedback: "Save a database to data with api. Is now not saved"
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
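The hash normalization mentioned in this commit can be illustrated with a short sketch. The commit only states that content is passed through `toLowerCase().trim()` before hashing; the SHA-256 algorithm, the function names, and the `content` field below are assumptions.

```typescript
import { createHash } from "node:crypto";

// Normalize before hashing so that case and surrounding whitespace
// do not produce different hashes for the same post text.
// Assumption: SHA-256 is used; the commit does not name the algorithm.
function contentHash(content: string): string {
  const normalized = content.toLowerCase().trim();
  return createHash("sha256").update(normalized).digest("hex");
}

// Hypothetical helper: keep the first post seen for each normalized hash.
function dedupeByContent<T extends { content: string }>(posts: T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const post of posts) {
    const hash = contentHash(post.content);
    if (!seen.has(hash)) {
      seen.add(hash);
      unique.push(post);
    }
  }
  return unique;
}
```

In-memory dedup of this kind only complements the database side: the UNIQUE index on `Task.id_post` is what lets Prisma's `skipDuplicates` option ignore rows that already exist.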
|
Fixed the data saving issue in commit 61e327c. Here's what was wrong and what changed: Root causes:
Changes:
Note: If you're running the existing database, you'll need to either:
|
|
After running the container, the table is not created and the database is empty. Please fix.
|
🤖 AI Work Session Started: Starting automated work session at 2026-02-01T19:38:05.458Z.
When running in Docker, `prisma migrate deploy` can fail to create tables due to Prisma 7 config loading issues (TypeScript config resolution, missing dependencies at runtime, or stale migration state).
Changes:
- TaskService.ensureTablesExist() creates the Task table and its UNIQUE index via raw SQL if they don't already exist
- Called on app startup (src/index.ts) and in CLI crawl (crawl-only.ts)
- Dockerfile CMD now continues to start the app even if prisma migrate deploy fails, since the SQL fallback will handle table creation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
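A raw-SQL fallback of the kind described here might be sketched as follows. This is not the project's `TaskService` code. The column list (apart from `id_post`), the index name, and the use of `IF NOT EXISTS` in place of an explicit existence check are assumptions. The `RawSqlClient` interface is a hypothetical minimal shape that a `PrismaClient` instance should satisfy through its `$executeRawUnsafe` method.

```typescript
// Hypothetical minimal client shape, so the sketch runs without Prisma installed.
interface RawSqlClient {
  $executeRawUnsafe(query: string): Promise<number>;
}

// Assumption: only "id_post" is known from the PR; the other columns are placeholders.
const CREATE_TABLE_SQL = `
  CREATE TABLE IF NOT EXISTS "Task" (
    "id" SERIAL PRIMARY KEY,
    "id_post" TEXT NOT NULL,
    "content" TEXT NOT NULL
  )`;

// Assumption: the index name follows Prisma's usual <Model>_<field>_key pattern.
const CREATE_INDEX_SQL =
  'CREATE UNIQUE INDEX IF NOT EXISTS "Task_id_post_key" ON "Task" ("id_post")';

// Create the table and its UNIQUE index when they are missing.
// Both statements are idempotent, so calling this on every startup is safe.
async function ensureTablesExist(client: RawSqlClient): Promise<void> {
  await client.$executeRawUnsafe(CREATE_TABLE_SQL);
  await client.$executeRawUnsafe(CREATE_INDEX_SQL);
}
```

Because both statements are idempotent in PostgreSQL, the function can run at app startup and again in the CLI crawl without caring whether `prisma migrate deploy` succeeded first.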
|
Fixed the table creation issue in commit 4f3710c. The root cause is that `prisma migrate deploy` can fail to create tables in Docker due to Prisma 7 config loading issues.
What changed:
This means even if Prisma migrations can't run (wrong database state, TS config issues, missing migration history), the app will self-heal by creating the table on startup. |
5.878 Error: P1013: The provided database string is invalid. The scheme is not recognized in database URL. Please refer to the documentation in https://pris.ly/d/config-url for constructing a correct connection string. In some cases, certain characters must be escaped. Please check the string for any illegal characters.
[+] up 0/1
29 | RUN npm run prisma:generate
30 |
31 | >>> RUN npx prisma migrate deploy
32 |
33 | RUN npm run build
failed to solve: process "/bin/sh -c npx prisma migrate deploy" did not complete successfully: exit code: 1
|
I changed the Dockerfile (please place the npx generate step before starting and building the app).
|
🤖 AI Work Session Started: Starting automated work session at 2026-02-01T19:49:25.152Z.
The prisma migrate deploy command requires a running database, which is only available at runtime (via docker-compose), not during docker build. Added docker-entrypoint.sh that runs prisma generate + migrate deploy before starting the app, with graceful fallbacks if either step fails.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
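Fixed the Docker build failure in commit e853a57.
Root cause: `prisma migrate deploy` requires a running database, which is only available at runtime (via docker-compose), not during docker build.
Fix: Created a `docker-entrypoint.sh` that runs prisma generate + migrate deploy before starting the app.
The Dockerfile build phase now only does: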
Both steps work without a database connection. Important: Do NOT add |
Summary
Fixes #1 - Prisma "table does not exist" error, crawler returning empty results, data not being saved to database, and adds date-based crawling.
Bug Fix: Database table not created on container start
The `prisma migrate deploy` command in the Dockerfile CMD can fail silently in Prisma 7 Docker environments due to TypeScript config resolution issues. Root causes and fix:
- `.ts` config loading: `prisma migrate deploy` needs to load `prisma.config.ts` at runtime, which can fail if the TS execution environment isn't properly set up in the container.
- Stale migration state: when a `_prisma_migrations` record is left over from a previous (broken) deployment, Prisma skips the migration even though the table doesn't exist.
- SQL fallback: `TaskService.ensureTablesExist()` checks if the `Task` table exists and creates it via raw SQL if not. Called on app startup and CLI crawl.
- Dockerfile CMD: uses `||` instead of `&&` for the migration step, so the app starts even if Prisma migrations fail.
Bug Fix: Data not saved to database
The crawl endpoint (`POST /api/crawl`) was not reliably persisting data. Root causes:
- Missing UNIQUE index on `id_post`: the migration didn't match the schema (`@unique` on `id_post`), breaking Prisma's `skipDuplicates` behavior.
- … `data/` folder.
- `npm run crawl` only saved to files. Now saves to both DB and files when `DATABASE_URL` is set.
Bug Fix: Crawler returns "No new posts found" when posts exist
Root cause was 6 compounding bugs:
1. `src/services/validate/validator.ts`: Required BOTH job keywords AND specific emojis. Fixed to accept keywords OR emojis.
2. `src/services/parser/htmlParser.ts`: Used `a[href*='/s/']` selector but Telegram post links don't contain `/s/`. Fixed to use `data-post` attribute.
3. `src/services/parser/htmlParser.ts`: `time.datetime` looked for class "datetime", but actual `<time>` elements use `datetime` attribute. Fixed to `time[datetime]`.
4. `src/services/extractor.ts`: `parsePost()` never called `extractTimestamp()`. Now uses actual post dates.
5. `src/services/validate/validator.ts`: `/pinned/i` matched "pinned" anywhere. Changed to `^pinned` (start-of-text only).
6. `src/services/extractor.ts`: Generated `${baseUrl}?q=${postId}` instead of proper `t.me/channel/postId` URLs.
Bug Fix: Prisma container error
- `docker-compose.yaml`: Changed hardcoded `telegram-scrapper` to `${POSTGRES_DB}`.
- Removed `url` from schema.prisma for Prisma 7 compatibility.
- Added UNIQUE index on `Task.id_post` in migration.
New Feature: Date-based crawling
- `POST /api/crawl` accepts optional `{date: "YYYY-MM-DD"}` body
- `npm run crawl -- --date 2025-01-15`
Test Plan
- `docker-compose up --build -d` starts without errors
- `docker-compose logs telegram-scraper` shows "Database table 'Task' already exists" or "Table 'Task' created successfully via SQL fallback"
- `curl -X POST http://localhost:3000/api/crawl` returns posts (not empty `[]`)
- … (`GET /api/jobs`) and `data/` folder
- `curl -X POST http://localhost:3000/api/crawl -d '{"date":"2025-01-01"}'` crawls since Jan 1
- … `post_TIMESTAMP`)
🤖 Generated with Claude Code