TechForkTeam · Dimo-2562 · Mar 29, 2026 · Mar 29, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,80 @@
+# PROJECT KNOWLEDGE BASE
+
+**Generated:** 2026-03-23 Asia/Seoul
+**Commit:** 4e15526
+**Branch:** improve/#338
+
+## OVERVIEW
+TechFork is a Spring Boot 3.5.9 / Java 17 backend that crawls Korean tech blogs, stores posts in MySQL, enriches them with LLM summaries and embeddings, and serves search/recommendation APIs over Elasticsearch.
+
+## STRUCTURE
+```text
+./
+├── docs/                 # scheduler, crawl-pipeline, commit/PR conventions
+├── docker/               # local/dev/infra/blue-green compose + nginx + backups
+├── infra/                # Terraform stacks; committed tfstate/tfvars are high-risk
+├── k6/                   # load scenarios + GCP runner Terraform
+├── scripts/              # deploy, tunnel, monitor helpers
+├── src/main/java/com/techfork/domain/   # bounded business domains
+├── src/main/java/com/techfork/global/   # shared response/error/security/LLM infra
+└── src/test/java/com/techfork/          # integration + evaluation suites
+```
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| App bootstrap | `src/main/java/com/techfork/TechForkApplication.java` | `@SpringBootApplication`, `@EnableJpaAuditing` |
+| Crawl pipeline | `src/main/java/com/techfork/domain/source/` | Child AGENTS covers job/scheduler invariants |
+| Initial seed data | `src/main/java/com/techfork/global/config/InitialDataConfig.java` | Seeds tech blogs in `local`, `local-tunnel`, `dev` |
+| Response / errors | `src/main/java/com/techfork/global/response/`, `global/exception/`, `global/common/code/` | `BaseResponse.of(...)`, `BaseCode`, `GeneralException` |
+| Security / auth | `src/main/java/com/techfork/global/security/` | JWT/OAuth handling actually lives here, not just in `domain/auth` |
+| Search hot path | `src/main/java/com/techfork/domain/search/` | `SearchServiceImpl` is one of the largest main-code files |
+| Deployment topology | `docker/`, `scripts/deploy.sh`, `.github/workflows/cd.yml` | Blue-green + nginx upstream switching |
+| Infra provisioning | `infra/` | AWS + Oracle stacks, committed state artifacts |
+| Evaluation workflow | `src/test/java/com/techfork/evaluation/` | Child AGENTS covers tags, profiles, fixtures |
+| Load testing | `k6/` | Child AGENTS covers env contract and scenario layout |
+
+## CONVENTIONS
+- Default runtime profile is `local-tunnel`; `application.yml` imports `.env` directly.
+- Controllers return `BaseResponse.of(code[, data])`; returning a raw body from a controller is non-standard here.
+- Domain packages follow bounded contexts (`domain/<context>/controller|service|repository|dto|entity|converter|enums|exception`).
+- Entities use protected no-args constructors plus static `create(...)` factories.
+- Spring Batch schema is managed by Flyway migrations under `src/main/resources/db/migration/`, not auto-initialized in production-style profiles.
+- Test execution is tag-split in Gradle: `test`, `integrationTest`, `evaluationTest`, `evaluationSetup`.
+- Source ingestion orchestration lives in `domain/source`, but summary / embedding batch artifacts live in `domain/post/batch`.
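A minimal Java sketch of two of the conventions above: the `BaseResponse.of(code[, data])` envelope and the protected-constructor-plus-`create(...)` entity pattern. Everything in it is hypothetical (`BaseCode` members, `SampleSuccessCode`, `SampleTechBlog`, field names, the blank-name check); the real types live in `global/response/`, `global/common/code/`, and the domain `entity/` packages and may differ. JPA annotations are omitted so the sketch runs on a plain JDK.

```java
import java.util.Objects;

// Hypothetical stand-in for the project's BaseCode; real members may differ.
interface BaseCode {
    String code();

    String message();
}

// Hypothetical success code, for illustration only.
enum SampleSuccessCode implements BaseCode {
    OK("SAMPLE_OK", "Request succeeded");

    private final String code;
    private final String message;

    SampleSuccessCode(String code, String message) {
        this.code = code;
        this.message = message;
    }

    @Override
    public String code() {
        return code;
    }

    @Override
    public String message() {
        return message;
    }
}

// Assumed shape of the response envelope every controller returns.
record BaseResponse<T>(String code, String message, T data) {

    static <D> BaseResponse<D> of(BaseCode code, D data) {
        Objects.requireNonNull(code, "code");
        return new BaseResponse<>(code.code(), code.message(), data);
    }

    // Overload for endpoints without a payload: of(code) instead of of(code, null).
    static BaseResponse<Void> of(BaseCode code) {
        return of(code, null);
    }
}

// JPA needs a no-args constructor; keeping it protected stops callers from
// building half-initialized entities. Real entities also carry @Entity etc.
class SampleTechBlog {
    private String name;
    private String rssUrl;

    protected SampleTechBlog() {
    }

    private SampleTechBlog(String name, String rssUrl) {
        this.name = name;
        this.rssUrl = rssUrl;
    }

    // The static factory is the sanctioned way to build a valid instance;
    // the blank-name check is an illustrative assumption.
    static SampleTechBlog create(String name, String rssUrl) {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("name is required");
        }
        return new SampleTechBlog(name, rssUrl);
    }

    String getName() {
        return name;
    }

    String getRssUrl() {
        return rssUrl;
    }
}
```

The overload keeps no-payload endpoints from passing `null` explicitly, and the private all-args constructor means `create(...)` is the only place invariants need to be enforced.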
+
+## ANTI-PATTERNS (THIS PROJECT)
+- Never commit or casually edit `.env`, `keys/`, `infra/terraform.tfstate`, `infra/*.tfvars`.
+- Do not treat `domain/auth` as the full auth surface; JWT, OAuth handlers, filters, and cookies live under `global/security`.
+- Do not duplicate `CLAUDE.md` or `docs/source-package.md` into child AGENTS files; summarize and point.
+- Do not edit only one of `docker-compose.blue.yml` / `docker-compose.green.yml` unless the asymmetry is intentional.
+- Do not run evaluation-heavy suites as if they were ordinary integration tests; they have separate tags, profiles, fixtures, and runtime cost.
+- Do not trust docs over code when they disagree; for example, the scheduler docs describe hourly crawling, but `RssCrawlingScheduler` currently runs daily at 05:00 KST.
+
+## UNIQUE STYLES
+- Commit format: `<type>: <subject>` (`docs/commit-convention.md`).
+- PR title format: `[type/#issue] description` (`docs/pr-convention.md`).
+- Korean messages/comments are normal in code and docs.
+- Evaluation outputs are checked into `src/test/resources/` as JSON reports.
+- Operational knowledge is split across docs, compose files, shell scripts, and workflows rather than a single ops README.
+
+## COMMANDS
+```bash
+./gradlew build
+./gradlew test
+./gradlew integrationTest
+./gradlew evaluationTest
+./gradlew evaluationSetup
+./gradlew bootRun --args='--spring.profiles.active=local'
+docker compose -f docker/docker-compose.local.yml up -d
+```
+
+## NOTES
+- High-value docs: `docs/source-package.md`, `docs/SCHEDULER_GUIDE.md`, `docs/commit-convention.md`, `docs/pr-convention.md`.
+- `HELP.md` is Spring starter boilerplate; low signal compared with repo docs.
+- Child AGENTS live at:
+  - `src/main/java/com/techfork/domain/source/AGENTS.md`
+  - `src/test/java/com/techfork/evaluation/AGENTS.md`
+  - `docker/AGENTS.md`
+  - `infra/AGENTS.md`
+  - `k6/AGENTS.md`
diff --git a/docker/AGENTS.md b/docker/AGENTS.md
@@ -0,0 +1,33 @@
+# DOCKER TOPOLOGY GUIDE
+
+## OVERVIEW
+`docker/` defines four different runtime modes: standalone local dev, shared infra, blue-green app deploy, and a separate dev app container. It also owns nginx routing and backup scripts.
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| Local stack | `docker-compose.local.yml` | MySQL, Redis, ES, Kibana with local volumes |
+| Shared infra stack | `docker-compose.infra.yml` | External network + external volumes |
+| Production slots | `docker-compose.blue.yml`, `docker-compose.green.yml` | Blue on `8080`, green on `8081` |
+| Dev slot | `docker-compose.dev.yml` | `techfork-app-dev` on shared network |
+| Upstream switch contract | `nginx/conf.d/upstream.conf` | Rewritten by `scripts/deploy.sh` |
+| Backup flow | `backup/backup.sh` | MySQL + ES snapshots uploaded to OCI |
+
+## CONVENTIONS
+- `docker-compose.infra.yml` expects external network `techfork-network` and external volumes named `deploy_*`.
+- Blue and green compose files should stay structurally symmetric; the intended delta is slot identity/host port, not behavior drift.
+- Health checks use `/actuator/health` on port `9090`; deploy logic depends on that exact contract.
+- Infra ES mounts `/home/ubuntu/deploy/es-snapshots` for snapshot backups; this mount is part of the backup flow, not optional debug state.
+- `backup.sh` loads secrets from `/home/ubuntu/deploy/docker/.env` and uploads to OCI Object Storage, not AWS.
+- Nginx config is coupled to Docker naming (`techfork-app-blue`, `techfork-app-green`, `techfork-app-dev`, `techfork-nginx`).
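To make the coupling concrete, here is a hypothetical Java model of the switch contract: two slots using the container names and host ports listed above, a health URL on the `9090` management port, and an nginx `upstream` block rendered for the target slot. The upstream name `techfork_app`, addressing containers by name inside `techfork-network`, the `appPort` parameter, and the "switch only when healthy" rule are assumptions about typical blue-green behavior, not a transcription of `scripts/deploy.sh`.

```java
// Hypothetical model of the blue/green switch contract; not deploy.sh itself.
enum Slot {
    BLUE("techfork-app-blue", 8080),
    GREEN("techfork-app-green", 8081);

    final String container; // container name nginx resolves on the shared network
    final int hostPort;     // host port from the slot's compose file

    Slot(String container, int hostPort) {
        this.container = container;
        this.hostPort = hostPort;
    }

    // The idle slot is always the opposite of the live one.
    Slot other() {
        return this == BLUE ? GREEN : BLUE;
    }
}

final class UpstreamSwitch {
    private UpstreamSwitch() {
    }

    // Health contract from the compose files: /actuator/health on port 9090.
    static String healthUrl(Slot slot) {
        return "http://" + slot.container + ":9090/actuator/health";
    }

    // Renders an upstream.conf body for the slot; "techfork_app" and appPort are assumptions.
    static String renderUpstream(Slot slot, int appPort) {
        return "upstream techfork_app {\n"
                + "    server " + slot.container + ":" + appPort + ";\n"
                + "}\n";
    }

    // Traffic moves to the other slot only if its health check passed; otherwise it stays put.
    static Slot nextLive(Slot live, boolean candidateHealthy) {
        return candidateHealthy ? live.other() : live;
    }
}
```

Renaming a container changes the health URL and the rendered upstream at the same time, which is why the anti-patterns for this directory insist on editing compose files, nginx config, and `scripts/deploy.sh` together.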
+
+## ANTI-PATTERNS
+- Do not edit only one of blue/green unless the asymmetry is deliberate and documented.
+- Do not rename `techfork-network`, container names, or upstream targets without updating `scripts/deploy.sh` and nginx config together.
+- Do not hardcode secrets into compose files; supply them through the existing `.env`-based env contract.
+- Do not treat local compose behavior as equivalent to infra/prod; local uses different networking and resource sizing.
+
+## NOTES
+- Redis is intentionally hardened in infra mode by renaming `KEYS`, `FLUSHALL`, and `FLUSHDB`.
+- Infra ES runs with `-Xms8g -Xmx8g`; local ES uses a much smaller heap.
+- If a change touches compose, nginx, and deploy script together, review `scripts/deploy.sh` before editing.
diff --git a/infra/AGENTS.md b/infra/AGENTS.md
@@ -0,0 +1,29 @@
+# INFRASTRUCTURE GUIDE
+
+## OVERVIEW
+`infra/` is Terraform for two provider stacks plus checked-in local state artifacts. It is not a clean “modules only” repo: provisioning and app runtime assumptions are mixed together.
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| AWS stack | `aws/main.tf` | VPC, EC2, RDS, CloudWatch, SNS, bootstrap user-data |
+| Oracle stack | `oracle/main.tf`, `oracle/cloud-init.sh` | OCI ARM host, cloud-init bootstrap |
+| High-risk local artifacts | `terraform.tfstate`, `terraform.tfstate.*`, `terraform.tfvars` | Sensitive, mutable, not normal source docs |
+
+## CONVENTIONS
+- `infra/` root holds shared state artifacts; treat them as operational data, not review-friendly code.
+- AWS provisioning is coupled to runtime bootstrap: `aws/main.tf` contains user-data that installs Java, Nginx, Docker, CloudWatch agent, and app directories.
+- Oracle provisioning is likewise coupled to runtime bootstrap through `cloud-init.sh` and Always Free ARM assumptions.
+- The AWS stack is public-EC2 + private-RDS oriented; the Oracle stack is a public ARM instance path.
+- Shared Terraform safety and workflow belong here; provider-specific nuances stay in code/comments unless the stacks diverge further.
+
+## ANTI-PATTERNS
+- Never commit casual edits to `terraform.tfstate`, `terraform.tfstate.*`, or `terraform.tfvars`.
+- Do not assume infra changes are isolated from app deployment; bootstrap scripts encode application/runtime contracts.
+- Do not split `aws/` and `oracle/` into separate child docs unless the workflows truly diverge; that would mostly duplicate Terraform safety today.
+- Do not ignore provider differences: AWS user-data and OCI cloud-init are separate executable bootstrap scripts, not comments, so a runtime change may need to land in both.
+
+## NOTES
+- AWS config currently includes security/networking, EC2 bootstrap, RDS, log groups, and alarms in one file.
+- Oracle config targets Ubuntu 22.04 ARM and uses lifecycle ignore rules to avoid noisy recreate behavior.
+- Deployment operations also depend on `docker/` and `scripts/`; infra alone is not the full deploy story.
diff --git a/k6/AGENTS.md b/k6/AGENTS.md
@@ -0,0 +1,30 @@
+# K6 LOAD TEST GUIDE
+
+## OVERVIEW
+`k6/` contains scenario scripts, aggregate load plans, and a separate Google Cloud Terraform runner. It is a performance-testing subtree, not part of the main infra stack.
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| Env contract | `config.js` | `BASE_URL`, `AUTH_TOKEN`, shared headers, keyword list |
+| Aggregate scenario mix | `load-test.js` | Multi-scenario VU plan, thresholds, writes `summary.json` |
+| Focused entry scripts | `test-*.js` | Narrow runs for CRUD, search, recommendation |
+| Scenario implementations | `scenarios/` | Request logic by use case |
+| Remote runner infra | `terraform/main.tf` | GCP `k6-runner`, separate provider from `infra/` |
+
+## CONVENTIONS
+- `config.js` is the shared env surface; keep `BASE_URL` / `AUTH_TOKEN` usage centralized there.
+- `load-test.js` mixes CRUD/search/recommendation traffic with scenario-specific thresholds; edits here affect the overall traffic model.
+- Scenario files are intentionally lightweight and may swallow parsing failures to preserve latency/error sampling behavior.
+- `terraform/` is for provisioning a GCP runner, not for the main application infrastructure.
+- Results are expected in `summary.json` and console summary output; `results/` is for stored run artifacts.
+
+## ANTI-PATTERNS
+- Do not hardcode real tokens or production-only URLs into scripts.
+- Do not treat `k6/terraform` as interchangeable with `infra/`; it uses a different cloud/provider and purpose.
+- Do not tweak thresholds or VU mixes without considering the intended traffic distribution encoded in `load-test.js`.
+- Do not bury request-shape changes in aggregate files when they belong in `scenarios/`.
+
+## NOTES
+- Blue/green deploy and app infra live elsewhere; `k6/` assumes a reachable API endpoint, not ownership of deployment.
+- Some scenarios require auth headers, others are anonymous; keep that split visible in script names and headers.
diff --git a/src/main/java/com/techfork/domain/source/AGENTS.md b/src/main/java/com/techfork/domain/source/AGENTS.md
@@ -0,0 +1,42 @@
+# SOURCE DOMAIN GUIDE
+
+## OVERVIEW
+`domain/source` owns RSS ingestion orchestration: crawl feeds, persist new posts, then hand off to summary and embedding steps.
+
+## STRUCTURE
+```text
+source/
+├── batch/       # feed reader, RSS->Post processor, bulk writer
+├── config/      # job and step wiring
+├── listener/    # job lifecycle hooks
+├── scheduler/   # cron trigger
+├── service/     # crawl execution + lock/job launch path
+├── repository/  # TechBlog access
+└── dto|entity/  # feed item and blog metadata
+```
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| Job topology | `config/RssCrawlingJobConfig.java` | `rssCrawlingJob` = fetch → summary → embed/index |
+| Scheduler trigger | `scheduler/RssCrawlingScheduler.java` | Actual cron is `0 0 5 * * *`, zone `Asia/Seoul` |
+| Full pipeline explainer | `docs/source-package.md` | Best human-readable walkthrough |
+| Operational behavior | `docs/SCHEDULER_GUIDE.md` | Locking, notifications, troubleshooting; schedule text is partially stale |
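For reference, Spring's six-field cron reads `second minute hour day-of-month month day-of-week`, so `0 0 5 * * *` with `zone = "Asia/Seoul"` fires once a day at 05:00 KST, not hourly. The JDK-only sketch below computes the same next fire time; `DailyKstSchedule` is a hypothetical helper for reasoning about the schedule (for example in a test), not code from `RssCrawlingScheduler`.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper mirroring @Scheduled(cron = "0 0 5 * * *", zone = "Asia/Seoul").
final class DailyKstSchedule {
    static final ZoneId SEOUL = ZoneId.of("Asia/Seoul");
    static final LocalTime FIRE_AT = LocalTime.of(5, 0);

    private DailyKstSchedule() {
    }

    // First 05:00 Asia/Seoul strictly after the given instant.
    static ZonedDateTime nextFireTime(Instant after) {
        ZonedDateTime now = after.atZone(SEOUL);
        ZonedDateTime candidate = now.toLocalDate().atTime(FIRE_AT).atZone(SEOUL);
        // Korea has no DST, so adding one day always lands on the next 05:00.
        return candidate.isAfter(now) ? candidate : candidate.plusDays(1);
    }
}
```

Working in `Asia/Seoul` rather than UTC matters: 05:00 KST is 20:00 UTC on the previous calendar day.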
+
+## CONVENTIONS
+- Step 1 `fetchAndSaveRssStep`: chunk 10, skip limit 10, `IllegalStateException` is `noSkip`.
+- Step 2 `extractSummaryStep`: chunk 5, async processor/writer, summary executor fixed at 2 threads.
+- Step 3 `embedAndIndexStep`: chunk 20, async processor/writer, embedding executor 10-20 threads.
+- `summaryAndEmbeddingJob` intentionally skips the fetch step; keep it aligned with steps 2-3 only.
+- MDC propagation is part of batch execution via `MdcTaskDecorator`; swapping thread pools is not cosmetic, because an executor without the decorator loses MDC context on its worker threads.
+- This package owns job orchestration, but summary/embedding readers/processors/writers are imported from `domain/post/batch`.
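Why executor swaps matter: MDC values are thread-local, so a pooled worker thread does not see the launching thread's logging context unless a decorator copies it across. The sketch below uses a JDK-only `FakeMdc` in place of SLF4J's `MDC` and a `UnaryOperator<Runnable>` with the same shape as Spring's `TaskDecorator`. The project's `MdcTaskDecorator` is not reproduced here and may differ; a typical Spring version uses `MDC.getCopyOfContextMap()` and `MDC.setContextMap(...)`. `FakeMdc`, `MdcPropagatingDecorator`, and `PoolRunner` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.UnaryOperator;

// JDK-only stand-in for SLF4J's MDC: one context map per thread.
final class FakeMdc {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    private FakeMdc() {
    }

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static String get(String key) {
        return CONTEXT.get().get(key);
    }

    static Map<String, String> copy() {
        return new HashMap<>(CONTEXT.get());
    }

    static void replace(Map<String, String> context) {
        CONTEXT.set(new HashMap<>(context));
    }
}

// Same shape as Spring's TaskDecorator#decorate(Runnable).
final class MdcPropagatingDecorator implements UnaryOperator<Runnable> {
    @Override
    public Runnable apply(Runnable task) {
        Map<String, String> captured = FakeMdc.copy(); // runs on the submitting thread
        return () -> {
            Map<String, String> previous = FakeMdc.copy(); // the worker's own context
            FakeMdc.replace(captured);
            try {
                task.run();
            } finally {
                FakeMdc.replace(previous); // pooled threads must not leak the job's context
            }
        };
    }
}

final class PoolRunner {
    private PoolRunner() {
    }

    // Runs the task on the pool and waits, turning checked exceptions into unchecked ones.
    static void runAndWait(ExecutorService pool, Runnable task) {
        try {
            pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Usage: submit the same task to `Executors.newSingleThreadExecutor()` with and without `new MdcPropagatingDecorator().apply(...)`; only the decorated run sees the caller's keys, and the worker's context is restored afterwards.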
+
+## ANTI-PATTERNS
+- Do not move step-2/step-3 components into `source/` just because the job config imports them.
+- Do not change cron, chunk sizes, skip limits, or thread counts without checking rate-limit and ops docs impact.
+- Do not trust `docs/SCHEDULER_GUIDE.md` over code for schedule timing; the code path is the source of truth.
+- Do not bypass the crawl service / lock / listener flow when adding manual job triggers.
+
+## NOTES
+- The scheduler comment and code both say daily 05:00 KST; some older docs still describe hourly crawling.
+- If a change touches feed fetching, duplicate filtering, summary extraction, and ES indexing together, read `docs/source-package.md` first.
diff --git a/src/test/java/com/techfork/evaluation/AGENTS.md b/src/test/java/com/techfork/evaluation/AGENTS.md
@@ -0,0 +1,40 @@
+# EVALUATION TEST GUIDE
+
+## OVERVIEW
+`src/test/java/com/techfork/evaluation` is not ordinary integration testing; it is fixture-heavy search/recommendation quality evaluation with separate Gradle tasks, tags, and runtime assumptions.
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| Task split | `build.gradle` | `integrationTest`, `evaluationTest`, `evaluationSetup` |
+| Integration base | `src/test/java/com/techfork/global/common/IntegrationTestBase.java` | `@Tag("integration")`, profile `integrationtest` |
+| Recommendation evaluation base | `recommendation/RecommendationTestBase.java` | Loads fixtures, force-merges ES, warms caches 3x |
+| Search evaluation base | `search/SearchEvaluationTestBase.java` | `@Tag("evaluation")`, profile `local-tunnel` |
+| Fixtures / reports | `src/test/resources/fixtures/evaluation/`, `src/test/resources/evaluation-report-*.json` | Large inputs + checked-in outputs |
+
+## CONVENTIONS
+- `./gradlew test` excludes the `integration`, `evaluation`, and `evaluation-setup` tags by design.
+- `IntegrationTestBase` is the normal controller/service integration path; evaluation suites are a different lane.
+- Search evaluation uses `@ActiveProfiles("local-tunnel")`; do not silently swap it to `integrationtest`.
+- Recommendation evaluation extends `IntegrationTestBase`, loads cached fixtures once, force-merges `posts` and `user_profiles`, then runs warmup before metrics.
+- Search evaluation builds `SearchServiceImpl` directly per scenario and writes JSON reports into `src/test/resources/`.
+- `evaluation-setup` jobs are prerequisite generators/exporters, not “extra assertions.” Treat them as data-prep workflows.
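The recommendation base warms caches three times before computing metrics, and search evaluation also reports latency. The generic warmup-then-measure pattern looks roughly like this; `WarmupLatency` and the nearest-rank percentile are illustrative choices, not the project's code, and the real bases may measure differently.

```java
import java.util.Arrays;

// Hypothetical helper showing warmup-then-measure; not taken from the evaluation bases.
final class WarmupLatency {
    private WarmupLatency() {
    }

    // Run the workload `times` times and discard the results so JIT and caches settle.
    static void warmUp(Runnable workload, int times) {
        for (int i = 0; i < times; i++) {
            workload.run();
        }
    }

    // Time `runs` executions and return the latencies in nanoseconds, sorted ascending.
    static long[] measure(Runnable workload, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    // Nearest-rank percentile over an ascending array; p must be in (0, 100].
    static long percentile(long[] sorted, double p) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```

Skipping the warmup folds cold-cache and JIT effects into the first samples, which mostly inflates the tail percentiles.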
+
+## COMMANDS
+```bash
+./gradlew integrationTest
+./gradlew evaluationTest
+./gradlew evaluationSetup
+./gradlew test -PexcludeIntegration
+```
+
+## ANTI-PATTERNS
+- Do not run evaluation suites as part of a casual unit/integration loop.
+- Do not edit fixture JSON or evaluation reports by hand when the source data should be regenerated.
+- Do not ignore the `local-tunnel` dependency for search evaluation; it is part of the test contract.
+- Do not assume evaluation bases are cheap; they load large fixture sets and may warm Elasticsearch aggressively.
+
+## NOTES
+- Recommendation metrics center on Recall, nDCG, and ILD (intra-list diversity) at K = 4, 8, and 30.
+- Search metrics center on nDCG / Recall at 4, 8, 20 plus latency.
+- If you only need normal application verification, stay in `domain/` or `global/` integration tests instead of this subtree.
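For readers new to the metrics named in these notes, here is a minimal Java sketch of the textbook definitions of Recall@K, nDCG@K, and ILD. It assumes binary relevance and a `log2(rank + 1)` discount for nDCG, unique item IDs in the ranking, and cosine distance over embeddings for ILD; the evaluation suites may use graded relevance or a different similarity, so treat `RankingMetrics` (a hypothetical name) as a formula reference rather than the project's implementation.

```java
import java.util.List;
import java.util.Set;

// Hypothetical reference implementations of the metrics; not the suites' code.
final class RankingMetrics {
    private RankingMetrics() {
    }

    // Recall@K: fraction of all relevant items that appear in the top K (IDs assumed unique).
    static double recallAtK(List<String> ranked, Set<String> relevant, int k) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        return (double) hits / relevant.size();
    }

    // nDCG@K with binary relevance; position i (0-based) is discounted by log2(i + 2).
    static double ndcgAtK(List<String> ranked, Set<String> relevant, int k) {
        double dcg = 0.0;
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) {
                dcg += 1.0 / log2(i + 2);
            }
        }
        double idcg = 0.0; // ideal ranking: all relevant items first
        for (int i = 0; i < Math.min(k, relevant.size()); i++) {
            idcg += 1.0 / log2(i + 2);
        }
        return idcg == 0.0 ? 0.0 : dcg / idcg;
    }

    // ILD: mean pairwise cosine distance (1 - cosine similarity) over the top K embeddings.
    static double ild(List<double[]> embeddings, int k) {
        int n = Math.min(k, embeddings.size());
        if (n < 2) {
            return 0.0;
        }
        double sum = 0.0;
        int pairs = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                sum += 1.0 - cosine(embeddings.get(i), embeddings.get(j));
                pairs++;
            }
        }
        return sum / pairs;
    }

    // Cosine similarity; vectors are assumed to have equal length.
    private static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    private static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }
}
```

For example, with ranking `[a, b, c, d]` and relevant set `{a, c, e}`, Recall@4 is 2/3 and nDCG@4 is about 0.704, because two of three relevant items are retrieved and `c` sits lower than it would in the ideal ordering.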