80 changes: 80 additions & 0 deletions AGENTS.md
@@ -0,0 +1,80 @@
# PROJECT KNOWLEDGE BASE

**Generated:** 2026-03-23 Asia/Seoul
**Commit:** 4e15526
**Branch:** improve/#338

## OVERVIEW
TechFork is a Spring Boot 3.5.9 / Java 17 backend that crawls Korean tech blogs, stores posts in MySQL, enriches them with LLM summaries and embeddings, and serves search/recommendation APIs over Elasticsearch.

## STRUCTURE
```text
./
├── docs/ # scheduler, crawl-pipeline, commit/PR conventions
├── docker/ # local/dev/infra/blue-green compose + nginx + backups
├── infra/ # Terraform stacks; committed tfstate/tfvars are high-risk
├── k6/ # load scenarios + GCP runner Terraform
├── scripts/ # deploy, tunnel, monitor helpers
├── src/main/java/com/techfork/domain/ # bounded business domains
├── src/main/java/com/techfork/global/ # shared response/error/security/LLM infra
└── src/test/java/com/techfork/ # integration + evaluation suites
```

## WHERE TO LOOK
| Task | Location | Notes |
|------|----------|-------|
| App bootstrap | `src/main/java/com/techfork/TechForkApplication.java` | `@SpringBootApplication`, `@EnableJpaAuditing` |
| Crawl pipeline | `src/main/java/com/techfork/domain/source/` | Child `AGENTS.md` covers job/scheduler invariants |
| Initial seed data | `src/main/java/com/techfork/global/config/InitialDataConfig.java` | Seeds tech blogs in `local`, `local-tunnel`, `dev` |
| Response / errors | `src/main/java/com/techfork/global/response/`, `global/exception/`, `global/common/code/` | `BaseResponse.of(...)`, `BaseCode`, `GeneralException` |
| Security / auth | `src/main/java/com/techfork/global/security/` | JWT/OAuth handling lives here, not only in `domain/auth` |
| Search hot path | `src/main/java/com/techfork/domain/search/` | `SearchServiceImpl` is one of the largest main-code files |
| Deployment topology | `docker/`, `scripts/deploy.sh`, `.github/workflows/cd.yml` | Blue-green + nginx upstream switching |
| Infra provisioning | `infra/` | AWS + Oracle stacks, committed state artifacts |
| Evaluation workflow | `src/test/java/com/techfork/evaluation/` | Child `AGENTS.md` covers tags, profiles, fixtures |
| Load testing | `k6/` | Child `AGENTS.md` covers env contract and scenario layout |

## CONVENTIONS
- Default runtime profile is `local-tunnel`; `application.yml` imports `.env` directly.
- Controllers return `BaseResponse.of(code[, data])`; raw controller bodies are non-standard here.
- Domain packages follow bounded contexts (`domain/<context>/controller|service|repository|dto|entity|converter|enums|exception`).
- Entities use protected no-args constructors plus static `create(...)` factories.
- Spring Batch schema is managed via Flyway files under `src/main/resources/db/migration/`; it is not auto-initialized in production-style profiles.
- Test execution is tag-split in Gradle: `test`, `integrationTest`, `evaluationTest`, `evaluationSetup`.
- Source ingestion orchestration lives in `domain/source`, but summary / embedding batch artifacts live in `domain/post/batch`.
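
A minimal, framework-free sketch of the response and entity conventions above. `SampleCode`, `SampleResponse`, and `SampleBlog` are hypothetical stand-ins; the real `BaseCode` / `BaseResponse` fields and the actual entities are not reproduced in this guide and may differ.

```java
import java.util.Objects;

final class ConventionSketch {
    public static void main(String[] args) {
        SampleResponse<String> withData = SampleResponse.of(SampleCode.OK, "pong");
        SampleResponse<Void> empty = SampleResponse.of(SampleCode.OK);
        SampleBlog blog = SampleBlog.create("Example Blog", "https://example.com/rss");
        System.out.println(withData.code + " " + withData.data);
        System.out.println(empty.code + " " + empty.data);
        System.out.println(blog.getName());
    }
}

// Hypothetical stand-in for BaseCode; the code string and message are made up.
enum SampleCode {
    OK("COMMON200", "Request succeeded");

    final String code;
    final String message;

    SampleCode(String code, String message) {
        this.code = code;
        this.message = message;
    }
}

// Response envelope with the two factory shapes: of(code) and of(code, data).
final class SampleResponse<T> {
    final String code;
    final String message;
    final T data; // null when the endpoint returns no payload

    private SampleResponse(SampleCode code, T data) {
        this.code = code.code;
        this.message = code.message;
        this.data = data;
    }

    static SampleResponse<Void> of(SampleCode code) {
        return new SampleResponse<>(code, null);
    }

    static <T> SampleResponse<T> of(SampleCode code, T data) {
        return new SampleResponse<>(code, data);
    }
}

// Entity convention: protected no-args constructor plus a static create(...) factory.
class SampleBlog {
    private String name;
    private String rssUrl;

    protected SampleBlog() {
        // Reserved for the persistence framework; application code uses create(...).
    }

    static SampleBlog create(String name, String rssUrl) {
        SampleBlog blog = new SampleBlog();
        blog.name = Objects.requireNonNull(name, "name");
        blog.rssUrl = Objects.requireNonNull(rssUrl, "rssUrl");
        return blog;
    }

    String getName() { return name; }
    String getRssUrl() { return rssUrl; }
}
```

Routing construction through `create(...)` keeps required fields validated in one place while the protected constructor stays available to the persistence layer.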

## ANTI-PATTERNS (THIS PROJECT)
- Never commit or casually edit `.env`, `keys/`, `infra/terraform.tfstate`, `infra/*.tfvars`.
- Do not treat `domain/auth` as the full auth surface; JWT, OAuth handlers, filters, and cookies live under `global/security`.
- Do not duplicate `CLAUDE.md` or `docs/source-package.md` into child AGENTS files; summarize and point.
- Do not edit only one of `docker-compose.blue.yml` / `docker-compose.green.yml` unless the asymmetry is intentional.
- Do not run evaluation-heavy suites as if they were ordinary integration tests; they have separate tags, profiles, fixtures, and runtime cost.
- Do not trust docs over code when they disagree. For example, the scheduler docs mention hourly behavior, but `RssCrawlingScheduler` currently runs daily at 05:00 KST.

## UNIQUE STYLES
- Commit format: `<type>: <subject>` (`docs/commit-convention.md`).
- PR title format: `[type/#issue] description` (`docs/pr-convention.md`).
- Korean messages/comments are normal in code and docs.
- Evaluation outputs are checked into `src/test/resources/` as JSON reports.
- Operational knowledge is split across docs, compose files, shell scripts, and workflows rather than a single ops README.
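
The commit and PR title formats above can be checked mechanically. This is a sketch only: it assumes `type` is a single lowercase word and `#issue` is a number, whereas the authoritative list of allowed types lives in `docs/commit-convention.md` and `docs/pr-convention.md`. `ConventionCheck` is a hypothetical helper, not part of the repo.

```java
import java.util.regex.Pattern;

final class ConventionCheck {
    // Assumption: type is a lowercase word; the allowed set is defined in docs/commit-convention.md.
    private static final Pattern COMMIT = Pattern.compile("^[a-z]+: \\S.*$");

    // Assumption: the issue reference is "#" followed by digits, e.g. [improve/#338].
    private static final Pattern PR_TITLE = Pattern.compile("^\\[[a-z]+/#\\d+\\] \\S.*$");

    /** Checks the first line of a commit message against "<type>: <subject>". */
    static boolean isValidCommit(String subjectLine) {
        return COMMIT.matcher(subjectLine).matches();
    }

    /** Checks a PR title against "[type/#issue] description". */
    static boolean isValidPrTitle(String title) {
        return PR_TITLE.matcher(title).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidCommit("docs: add AGENTS.md knowledge base"));
        System.out.println(isValidPrTitle("[improve/#338] add AGENTS.md knowledge base"));
        System.out.println(isValidPrTitle("improve #338 add AGENTS.md"));
    }
}
```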

## COMMANDS
```bash
./gradlew build
./gradlew test
./gradlew integrationTest
./gradlew evaluationTest
./gradlew evaluationSetup
./gradlew bootRun --args='--spring.profiles.active=local'
docker compose -f docker/docker-compose.local.yml up -d
```

## NOTES
- High-value docs: `docs/source-package.md`, `docs/SCHEDULER_GUIDE.md`, `docs/commit-convention.md`, `docs/pr-convention.md`.
- `HELP.md` is Spring starter boilerplate; low signal compared with repo docs.
- Child AGENTS live at:
  - `src/main/java/com/techfork/domain/source/AGENTS.md`
  - `src/test/java/com/techfork/evaluation/AGENTS.md`
  - `docker/AGENTS.md`
  - `infra/AGENTS.md`
  - `k6/AGENTS.md`
33 changes: 33 additions & 0 deletions docker/AGENTS.md
@@ -0,0 +1,33 @@
# DOCKER TOPOLOGY GUIDE

## OVERVIEW
`docker/` defines four different runtime modes: standalone local dev, shared infra, blue-green app deploy, and a separate dev app container. It also owns nginx routing and backup scripts.

## WHERE TO LOOK
| Task | Location | Notes |
|------|----------|-------|
| Local stack | `docker-compose.local.yml` | MySQL, Redis, ES, Kibana with local volumes |
| Shared infra stack | `docker-compose.infra.yml` | External network + external volumes |
| Production slots | `docker-compose.blue.yml`, `docker-compose.green.yml` | Blue on `8080`, green on `8081` |
| Dev slot | `docker-compose.dev.yml` | `techfork-app-dev` on shared network |
| Upstream switch contract | `nginx/conf.d/upstream.conf` | Rewritten by `scripts/deploy.sh` |
| Backup flow | `backup/backup.sh` | MySQL + ES snapshots uploaded to OCI |

## CONVENTIONS
- `docker-compose.infra.yml` expects external network `techfork-network` and external volumes named `deploy_*`.
- Blue and green compose files should stay structurally symmetric; the intended delta is slot identity/host port, not behavior drift.
- Health checks use `/actuator/health` on port `9090`; deploy logic depends on that exact contract.
- Infra ES mounts `/home/ubuntu/deploy/es-snapshots` for snapshot backups; the mount is part of the backup flow, not optional debug state.
- `backup.sh` loads secrets from `/home/ubuntu/deploy/docker/.env` and uploads to OCI Object Storage, not AWS.
- Nginx config is coupled to Docker naming (`techfork-app-blue`, `techfork-app-green`, `techfork-app-dev`, `techfork-nginx`).
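
The slot contract above (container names, host ports, health endpoint) can be modeled as a small table. This is an illustrative sketch, not what `scripts/deploy.sh` does: `SlotSketch` is hypothetical, and addressing the health check by container name on the shared network is an assumption.

```java
import java.net.URI;

final class SlotSketch {
    enum Slot {
        BLUE("techfork-app-blue", 8080),
        GREEN("techfork-app-green", 8081);

        final String container;
        final int hostPort;

        Slot(String container, int hostPort) {
            this.container = container;
            this.hostPort = hostPort;
        }

        /** The slot a deploy would target while this one keeps serving traffic. */
        Slot other() {
            return this == BLUE ? GREEN : BLUE;
        }

        /** Health endpoint on the management port 9090 (the host part is an assumption). */
        URI healthUri() {
            return URI.create("http://" + container + ":9090/actuator/health");
        }
    }

    public static void main(String[] args) {
        Slot live = Slot.BLUE;
        Slot target = live.other();
        System.out.println("deploy to " + target.container + " (host port " + target.hostPort + ")");
        System.out.println("wait for " + target.healthUri());
    }
}
```

Keeping both slots in one table makes the intended delta explicit: only the slot identity and the host port differ.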

## ANTI-PATTERNS
- Do not edit only one of blue/green unless the asymmetry is deliberate and documented.
- Do not rename `techfork-network`, container names, or upstream targets without updating `scripts/deploy.sh` and nginx config together.
- Do not hardcode secrets into compose files; the env contract is already large enough.
- Do not treat local compose behavior as equivalent to infra/prod; local uses different networking and resource sizing.

## NOTES
- Redis is intentionally hardened in infra mode by renaming `KEYS`, `FLUSHALL`, and `FLUSHDB`.
- Infra ES runs with `-Xms8g -Xmx8g`; local ES runs much smaller.
- If a change touches compose, nginx, and deploy script together, review `scripts/deploy.sh` before editing.
29 changes: 29 additions & 0 deletions infra/AGENTS.md
@@ -0,0 +1,29 @@
# INFRASTRUCTURE GUIDE

## OVERVIEW
`infra/` is Terraform for two provider stacks plus checked-in local state artifacts. It is not a clean “modules only” repo: provisioning and app runtime assumptions are mixed together.

## WHERE TO LOOK
| Task | Location | Notes |
|------|----------|-------|
| AWS stack | `aws/main.tf` | VPC, EC2, RDS, CloudWatch, SNS, bootstrap user-data |
| Oracle stack | `oracle/main.tf`, `oracle/cloud-init.sh` | OCI ARM host, cloud-init bootstrap |
| High-risk local artifacts | `terraform.tfstate`, `terraform.tfstate.*`, `terraform.tfvars` | Sensitive, mutable, not normal source docs |

## CONVENTIONS
- `infra/` root holds shared state artifacts; treat them as operational data, not review-friendly code.
- AWS provisioning is coupled to runtime bootstrap: `aws/main.tf` contains user-data that installs Java, Nginx, Docker, CloudWatch agent, and app directories.
- Oracle provisioning is likewise coupled to runtime bootstrap through `cloud-init.sh` and Always Free ARM assumptions.
- The AWS stack is oriented around a public EC2 host plus private RDS; the Oracle stack provisions a public ARM instance.
- Shared Terraform safety rules and workflow notes belong in this file; provider-specific nuances stay in code/comments unless the stacks diverge further.

## ANTI-PATTERNS
- Never commit casual edits to `terraform.tfstate`, `terraform.tfstate.*`, or `terraform.tfvars`.
- Do not assume infra changes are isolated from app deployment; bootstrap scripts encode application/runtime contracts.
- Do not split `aws/` and `oracle/` into separate child docs unless the workflows truly diverge; that would mostly duplicate Terraform safety today.
- Do not ignore provider differences: AWS user-data and OCI cloud-init are both executable logic, not comments.

## NOTES
- AWS config currently includes security/networking, EC2 bootstrap, RDS, log groups, and alarms in one file.
- Oracle config targets Ubuntu 22.04 ARM and uses lifecycle ignore rules to avoid noisy recreate behavior.
- Deployment operations also depend on `docker/` and `scripts/`; infra alone is not the full deploy story.
30 changes: 30 additions & 0 deletions k6/AGENTS.md
@@ -0,0 +1,30 @@
# K6 LOAD TEST GUIDE

## OVERVIEW
`k6/` contains scenario scripts, aggregate load plans, and a separate Google Cloud Terraform runner. It is a performance-testing subtree, not part of the main infra stack.

## WHERE TO LOOK
| Task | Location | Notes |
|------|----------|-------|
| Env contract | `config.js` | `BASE_URL`, `AUTH_TOKEN`, shared headers, keyword list |
| Aggregate scenario mix | `load-test.js` | Multi-scenario VU plan, thresholds, writes `summary.json` |
| Focused entry scripts | `test-*.js` | Narrow runs for CRUD, search, recommendation |
| Scenario implementations | `scenarios/` | Request logic by use case |
| Remote runner infra | `terraform/main.tf` | GCP `k6-runner`, separate provider from `infra/` |

## CONVENTIONS
- `config.js` is the shared env surface; keep `BASE_URL` / `AUTH_TOKEN` usage centralized there.
- `load-test.js` mixes CRUD/search/recommendation traffic with scenario-specific thresholds; edits here affect the overall traffic model.
- Scenario files are intentionally lightweight and may swallow parsing failures to preserve latency/error sampling behavior.
- `terraform/` is for provisioning a GCP runner, not for the main application infrastructure.
- Results are expected in `summary.json` and console summary output; `results/` is for stored run artifacts.

## ANTI-PATTERNS
- Do not hardcode real tokens or production-only URLs into scripts.
- Do not treat `k6/terraform` as interchangeable with `infra/`; it uses a different cloud/provider and purpose.
- Do not tweak thresholds or VU mixes without considering the intended traffic distribution encoded in `load-test.js`.
- Do not bury request-shape changes in aggregate files when they belong in `scenarios/`.

## NOTES
- Blue/green deploy and app infra live elsewhere; `k6/` assumes a reachable API endpoint, not ownership of deployment.
- Some scenarios require auth headers, others are anonymous; keep that split visible in script names and headers.
42 changes: 42 additions & 0 deletions src/main/java/com/techfork/domain/source/AGENTS.md
@@ -0,0 +1,42 @@
# SOURCE DOMAIN GUIDE

## OVERVIEW
`domain/source` owns RSS ingestion orchestration: crawl feeds, persist new posts, then hand off to summary and embedding steps.

## STRUCTURE
```text
source/
├── batch/ # feed reader, RSS->Post processor, bulk writer
├── config/ # job and step wiring
├── listener/ # job lifecycle hooks
├── scheduler/ # cron trigger
├── service/ # crawl execution + lock/job launch path
├── repository/ # TechBlog access
└── dto|entity/ # feed item and blog metadata
```

## WHERE TO LOOK
| Task | Location | Notes |
|------|----------|-------|
| Job topology | `config/RssCrawlingJobConfig.java` | `rssCrawlingJob` = fetch → summary → embed/index |
| Scheduler trigger | `scheduler/RssCrawlingScheduler.java` | Actual cron is `0 0 5 * * *`, zone `Asia/Seoul` |
| Full pipeline explainer | `docs/source-package.md` | Best human-readable walkthrough |
| Operational behavior | `docs/SCHEDULER_GUIDE.md` | Locking, notifications, troubleshooting; schedule text is partially stale |

## CONVENTIONS
- Step 1 `fetchAndSaveRssStep`: chunk 10, skip limit 10, `IllegalStateException` is `noSkip`.
- Step 2 `extractSummaryStep`: chunk 5, async processor/writer, summary executor fixed at 2 threads.
- Step 3 `embedAndIndexStep`: chunk 20, async processor/writer, embedding executor 10-20 threads.
- `summaryAndEmbeddingJob` intentionally skips the fetch step; keep it aligned with steps 2-3 only.
- MDC propagation is part of batch execution via `MdcTaskDecorator`; thread pool swaps are not cosmetic here.
- This package owns job orchestration, but summary/embedding readers/processors/writers are imported from `domain/post/batch`.
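
To make the chunk sizes above concrete, the sketch below records the three step names with their chunk sizes and computes how many chunks a given number of items would need. `StepPlanSketch` and `StepPlan` are hypothetical; the real wiring lives in `config/RssCrawlingJobConfig.java`, and skip limits and executors are left out.

```java
import java.util.List;

final class StepPlanSketch {
    /** Step name plus chunk size, as listed in the conventions. */
    record StepPlan(String name, int chunkSize) {
        /** Number of chunks needed for the given item count (ceiling division). */
        int chunksFor(int items) {
            return (items + chunkSize - 1) / chunkSize;
        }
    }

    // rssCrawlingJob: fetch, then summary, then embed/index.
    static final List<StepPlan> RSS_CRAWLING_JOB = List.of(
            new StepPlan("fetchAndSaveRssStep", 10),
            new StepPlan("extractSummaryStep", 5),
            new StepPlan("embedAndIndexStep", 20));

    // summaryAndEmbeddingJob skips the fetch step and keeps steps 2-3 only.
    static final List<StepPlan> SUMMARY_AND_EMBEDDING_JOB = RSS_CRAWLING_JOB.subList(1, 3);

    public static void main(String[] args) {
        int items = 47; // arbitrary example count
        for (StepPlan step : RSS_CRAWLING_JOB) {
            System.out.println(step.name() + ": " + step.chunksFor(items) + " chunks");
        }
    }
}
```

A smaller chunk size means more, shorter transactions per run, which is one reason chunk sizes should not change without checking rate limits.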

## ANTI-PATTERNS
- Do not move step-2/step-3 components into `source/` just because the job config imports them.
- Do not change cron, chunk sizes, skip limits, or thread counts without checking rate-limit and ops docs impact.
- Do not trust `docs/SCHEDULER_GUIDE.md` over code for schedule timing; the code path is the source of truth.
- Do not bypass the crawl service / lock / listener flow when adding manual job triggers.

## NOTES
- The scheduler comment and code both say daily 05:00 KST; some older docs still describe hourly crawling.
- If a change touches feed fetching, duplicate filtering, summary extraction, and ES indexing together, read `docs/source-package.md` first.
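
For reference, the schedule `0 0 5 * * *` in zone `Asia/Seoul` means "every day at 05:00 KST". The sketch below computes the next trigger time with `java.time` only; it does not use Spring's cron parser, and `NextCrawlSketch` is a hypothetical helper for checking expectations against stale docs.

```java
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class NextCrawlSketch {
    static final ZoneId KST = ZoneId.of("Asia/Seoul");
    static final LocalTime RUN_AT = LocalTime.of(5, 0);

    /** Returns the first 05:00 KST strictly after the given instant. */
    static ZonedDateTime nextRun(ZonedDateTime now) {
        ZonedDateTime local = now.withZoneSameInstant(KST);
        ZonedDateTime candidate = local.with(RUN_AT); // same date, 05:00:00
        return candidate.isAfter(local) ? candidate : candidate.plusDays(1);
    }

    public static void main(String[] args) {
        System.out.println(nextRun(ZonedDateTime.now()));
    }
}
```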
40 changes: 40 additions & 0 deletions src/test/java/com/techfork/evaluation/AGENTS.md
@@ -0,0 +1,40 @@
# EVALUATION TEST GUIDE

## OVERVIEW
`src/test/java/com/techfork/evaluation` is not ordinary integration testing; it is fixture-heavy search/recommendation quality evaluation with separate Gradle tasks, tags, and runtime assumptions.

## WHERE TO LOOK
| Task | Location | Notes |
|------|----------|-------|
| Task split | `build.gradle` | `integrationTest`, `evaluationTest`, `evaluationSetup` |
| Integration base | `src/test/java/com/techfork/global/common/IntegrationTestBase.java` | `@Tag("integration")`, profile `integrationtest` |
| Recommendation evaluation base | `recommendation/RecommendationTestBase.java` | Loads fixtures, force-merges ES, warms caches 3x |
| Search evaluation base | `search/SearchEvaluationTestBase.java` | `@Tag("evaluation")`, profile `local-tunnel` |
| Fixtures / reports | `src/test/resources/fixtures/evaluation/`, `src/test/resources/evaluation-report-*.json` | Large inputs + checked-in outputs |

## CONVENTIONS
- `./gradlew test` excludes `integration`, `evaluation`, and `evaluation-setup` workflows by design.
- `IntegrationTestBase` is the normal controller/service integration path; evaluation suites are a different lane.
- Search evaluation uses `@ActiveProfiles("local-tunnel")`; do not silently swap it to `integrationtest`.
- Recommendation evaluation extends `IntegrationTestBase`, loads cached fixtures once, force-merges `posts` and `user_profiles`, then runs warmup before metrics.
- Search evaluation builds `SearchServiceImpl` directly per scenario and writes JSON reports into `src/test/resources/`.
- `evaluation-setup` jobs are prerequisite generators/exporters, not “extra assertions.” Treat them as data-prep workflows.
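
The lane split can be pictured as a tag filter. The sketch below is plain Java, not Gradle or JUnit configuration, and the task-to-tag mapping is an assumption inferred from the task and tag names; `build.gradle` remains the source of truth.

```java
import java.util.Map;
import java.util.Set;

final class TestLaneSketch {
    // Assumed mapping from Gradle task to the JUnit tag it selects.
    static final Map<String, String> TASK_TAG = Map.of(
            "integrationTest", "integration",
            "evaluationTest", "evaluation",
            "evaluationSetup", "evaluation-setup");

    /** Returns true if a test class carrying the given tags would run under the task. */
    static boolean runs(String task, Set<String> testTags) {
        if (task.equals("test")) {
            // Plain `test` excludes every tagged lane.
            return TASK_TAG.values().stream().noneMatch(testTags::contains);
        }
        String tag = TASK_TAG.get(task);
        return tag != null && testTags.contains(tag);
    }

    public static void main(String[] args) {
        System.out.println(runs("test", Set.of()));
        System.out.println(runs("test", Set.of("evaluation")));
        System.out.println(runs("evaluationTest", Set.of("evaluation")));
    }
}
```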

## COMMANDS
```bash
./gradlew integrationTest
./gradlew evaluationTest
./gradlew evaluationSetup
./gradlew test -PexcludeIntegration
```

## ANTI-PATTERNS
- Do not run evaluation suites as part of a casual unit/integration loop.
- Do not edit fixture JSON or evaluation reports by hand when the source data should be regenerated.
- Do not ignore the `local-tunnel` dependency for search evaluation; it is part of the test contract.
- Do not assume evaluation bases are cheap; they load large fixture sets and may warm Elasticsearch aggressively.

## NOTES
- Recommendation metrics center on Recall / nDCG / ILD across K values 4, 8, 30.
- Search metrics center on nDCG / Recall at 4, 8, 20 plus latency.
- If you only need normal application verification, stay in `domain/` or `global/` integration tests instead of this subtree.
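
As a reminder of what Recall@K and nDCG@K measure, here is a minimal sketch under binary relevance (a document is either relevant or not). The project's evaluation code may use graded relevance or different tie handling, and ILD is omitted; `RankingMetricsSketch` is hypothetical. The ranked list is assumed to contain no duplicates.

```java
import java.util.List;
import java.util.Set;

final class RankingMetricsSketch {
    /** Fraction of relevant items that appear in the top k results. */
    static double recallAtK(List<String> ranked, Set<String> relevant, int k) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        long hits = ranked.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }

    /** DCG of the top k results divided by the DCG of an ideal ordering. */
    static double ndcgAtK(List<String> ranked, Set<String> relevant, int k) {
        double dcg = 0.0;
        int limit = Math.min(k, ranked.size());
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(ranked.get(i))) {
                dcg += 1.0 / log2(i + 2); // rank 1 gets discount log2(2) = 1
            }
        }
        double ideal = 0.0;
        int idealHits = Math.min(k, relevant.size());
        for (int i = 0; i < idealHits; i++) {
            ideal += 1.0 / log2(i + 2);
        }
        return ideal == 0.0 ? 0.0 : dcg / ideal;
    }

    private static double log2(int x) {
        return Math.log(x) / Math.log(2);
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("a", "x", "b", "y");
        Set<String> relevant = Set.of("a", "b", "c");
        for (int k : new int[] {4, 8, 20}) {
            System.out.printf("K=%d recall=%.3f ndcg=%.3f%n",
                    k, recallAtK(ranked, relevant, k), ndcgAtK(ranked, relevant, k));
        }
    }
}
```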