From 441441b292c1fdd33bd6aa64a77018f51f3d7446 Mon Sep 17 00:00:00 2001
From: dmori
Date: Sun, 29 Mar 2026 19:50:32 +0900
Subject: [PATCH] docs: create initial md files for OpenCode use
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 AGENTS.md                                     | 80 +++++++++++++++++++
 docker/AGENTS.md                              | 33 ++++++++
 infra/AGENTS.md                               | 29 +++++++
 k6/AGENTS.md                                  | 30 +++++++
 .../java/com/techfork/domain/source/AGENTS.md | 42 ++++++++++
 .../java/com/techfork/evaluation/AGENTS.md    | 40 ++++++++++
 6 files changed, 254 insertions(+)
 create mode 100644 AGENTS.md
 create mode 100644 docker/AGENTS.md
 create mode 100644 infra/AGENTS.md
 create mode 100644 k6/AGENTS.md
 create mode 100644 src/main/java/com/techfork/domain/source/AGENTS.md
 create mode 100644 src/test/java/com/techfork/evaluation/AGENTS.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 00000000..31688277
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,80 @@
+# PROJECT KNOWLEDGE BASE
+
+**Generated:** 2026-03-23 Asia/Seoul
+**Commit:** 4e15526
+**Branch:** improve/#338
+
+## OVERVIEW
+TechFork is a Spring Boot 3.5.9 / Java 17 backend that crawls Korean tech blogs, stores posts in MySQL, enriches them with LLM summaries and embeddings, and serves search/recommendation APIs over Elasticsearch.
+
+## STRUCTURE
+```text
+./
+├── docs/                               # scheduler, crawl-pipeline, commit/PR conventions
+├── docker/                             # local/dev/infra/blue-green compose + nginx + backups
+├── infra/                              # Terraform stacks; committed tfstate/tfvars are high-risk
+├── k6/                                 # load scenarios + GCP runner Terraform
+├── scripts/                            # deploy, tunnel, monitor helpers
+├── src/main/java/com/techfork/domain/  # bounded business domains
+├── src/main/java/com/techfork/global/  # shared response/error/security/LLM infra
+└── src/test/java/com/techfork/         # integration + evaluation suites
+```
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| App bootstrap | `src/main/java/com/techfork/TechForkApplication.java` | `@SpringBootApplication`, `@EnableJpaAuditing` |
+| Crawl pipeline | `src/main/java/com/techfork/domain/source/` | Child AGENTS covers job/scheduler invariants |
+| Initial seed data | `src/main/java/com/techfork/global/config/InitialDataConfig.java` | Seeds tech blogs in `local`, `local-tunnel`, `dev` |
+| Response / errors | `src/main/java/com/techfork/global/response/`, `global/exception/`, `global/common/code/` | `BaseResponse.of(...)`, `BaseCode`, `GeneralException` |
+| Security / auth | `src/main/java/com/techfork/global/security/` | Real JWT/OAuth ownership lives here, not only `domain/auth` |
+| Search hot path | `src/main/java/com/techfork/domain/search/` | `SearchServiceImpl` is one of the largest main-code files |
+| Deployment topology | `docker/`, `scripts/deploy.sh`, `.github/workflows/cd.yml` | Blue-green + nginx upstream switching |
+| Infra provisioning | `infra/` | AWS + Oracle stacks, committed state artifacts |
+| Evaluation workflow | `src/test/java/com/techfork/evaluation/` | Child AGENTS covers tags, profiles, fixtures |
+| Load testing | `k6/` | Child AGENTS covers env contract and scenario layout |
+
+## CONVENTIONS
+- Default runtime profile is `local-tunnel`; `application.yml` imports `.env` directly.
+- Controllers return `BaseResponse.of(code[, data])`; raw controller bodies are non-standard here.
+- Domain packages follow bounded contexts (`domain//controller|service|repository|dto|entity|converter|enums|exception`).
+- Entities use protected no-args constructors plus static `create(...)` factories.
+- Spring Batch schema is managed via Flyway files under `src/main/resources/db/migration/`, not auto-init in production-style profiles.
+- Test execution is tag-split in Gradle: `test`, `integrationTest`, `evaluationTest`, `evaluationSetup`.
+- Source ingestion orchestration lives in `domain/source`, but summary / embedding batch artifacts live in `domain/post/batch`.
+
+## ANTI-PATTERNS (THIS PROJECT)
+- Never commit or casually edit `.env`, `keys/`, `infra/terraform.tfstate`, `infra/*.tfvars`.
+- Do not treat `domain/auth` as the full auth surface; JWT, OAuth handlers, filters, and cookies live under `global/security`.
+- Do not duplicate `CLAUDE.md` or `docs/source-package.md` into child AGENTS files; summarize and point.
+- Do not edit only one of `docker-compose.blue.yml` / `docker-compose.green.yml` unless the asymmetry is intentional.
+- Do not run evaluation-heavy suites as if they were ordinary integration tests; they have separate tags, profiles, fixtures, and runtime cost.
+- Do not trust docs over code when they disagree; example: scheduler docs mention hourly behavior, but `RssCrawlingScheduler` currently runs daily at 05:00 KST.
+
+## UNIQUE STYLES
+- Commit format: `: ` (`docs/commit-convention.md`).
+- PR title format: `[type/#issue] description` (`docs/pr-convention.md`).
+- Korean messages/comments are normal in code and docs.
+- Evaluation outputs are checked into `src/test/resources/` as JSON reports.
+- Operational knowledge is split across docs, compose files, shell scripts, and workflows rather than a single ops README.
+
+## COMMANDS
+```bash
+./gradlew build
+./gradlew test
+./gradlew integrationTest
+./gradlew evaluationTest
+./gradlew evaluationSetup
+./gradlew bootRun --args='--spring.profiles.active=local'
+docker compose -f docker/docker-compose.local.yml up -d
+```
+
+## NOTES
+- High-value docs: `docs/source-package.md`, `docs/SCHEDULER_GUIDE.md`, `docs/commit-convention.md`, `docs/pr-convention.md`.
+- `HELP.md` is Spring starter boilerplate; low signal compared with repo docs.
+- Child AGENTS live at:
+  - `src/main/java/com/techfork/domain/source/AGENTS.md`
+  - `src/test/java/com/techfork/evaluation/AGENTS.md`
+  - `docker/AGENTS.md`
+  - `infra/AGENTS.md`
+  - `k6/AGENTS.md`
diff --git a/docker/AGENTS.md b/docker/AGENTS.md
new file mode 100644
index 00000000..353203fa
--- /dev/null
+++ b/docker/AGENTS.md
@@ -0,0 +1,33 @@
+# DOCKER TOPOLOGY GUIDE
+
+## OVERVIEW
+`docker/` defines four different runtime modes: standalone local dev, shared infra, blue-green app deploy, and a separate dev app container. It also owns nginx routing and backup scripts.
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| Local stack | `docker-compose.local.yml` | MySQL, Redis, ES, Kibana with local volumes |
+| Shared infra stack | `docker-compose.infra.yml` | External network + external volumes |
+| Production slots | `docker-compose.blue.yml`, `docker-compose.green.yml` | Blue on `8080`, green on `8081` |
+| Dev slot | `docker-compose.dev.yml` | `techfork-app-dev` on shared network |
+| Upstream switch contract | `nginx/conf.d/upstream.conf` | Rewritten by `scripts/deploy.sh` |
+| Backup flow | `backup/backup.sh` | MySQL + ES snapshots uploaded to OCI |
+
+## CONVENTIONS
+- `docker-compose.infra.yml` expects external network `techfork-network` and external volumes named `deploy_*`.
+- Blue and green compose files should stay structurally symmetric; the intended delta is slot identity/host port, not behavior drift.
+- Health checks use `/actuator/health` on port `9090`; deploy logic depends on that exact contract.
+- Infra ES mounts `/home/ubuntu/deploy/es-snapshots` for snapshot backups; this is part of backup, not optional debug state.
+- `backup.sh` loads secrets from `/home/ubuntu/deploy/docker/.env` and uploads to OCI Object Storage, not AWS.
+- Nginx config is coupled to Docker naming (`techfork-app-blue`, `techfork-app-green`, `techfork-app-dev`, `techfork-nginx`).
+
+## ANTI-PATTERNS
+- Do not edit only one of blue/green unless the asymmetry is deliberate and documented.
+- Do not rename `techfork-network`, container names, or upstream targets without updating `scripts/deploy.sh` and nginx config together.
+- Do not hardcode secrets into compose files; the env contract is already large enough.
+- Do not treat local compose behavior as equivalent to infra/prod; local uses different networking and resource sizing.
+
+## NOTES
+- Redis is intentionally hardened in infra mode by renaming `KEYS`, `FLUSHALL`, and `FLUSHDB`.
+- Infra ES runs with `-Xms8g -Xmx8g`; local ES runs much smaller.
+- If a change touches compose, nginx, and deploy script together, review `scripts/deploy.sh` before editing.
diff --git a/infra/AGENTS.md b/infra/AGENTS.md
new file mode 100644
index 00000000..90b068d2
--- /dev/null
+++ b/infra/AGENTS.md
@@ -0,0 +1,29 @@
+# INFRASTRUCTURE GUIDE
+
+## OVERVIEW
+`infra/` is Terraform for two provider stacks plus checked-in local state artifacts. It is not a clean “modules only” repo: provisioning and app runtime assumptions are mixed together.
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| AWS stack | `aws/main.tf` | VPC, EC2, RDS, CloudWatch, SNS, bootstrap user-data |
+| Oracle stack | `oracle/main.tf`, `oracle/cloud-init.sh` | OCI ARM host, cloud-init bootstrap |
+| High-risk local artifacts | `terraform.tfstate`, `terraform.tfstate.*`, `terraform.tfvars` | Sensitive, mutable, not normal source docs |
+
+## CONVENTIONS
+- `infra/` root holds shared state artifacts; treat them as operational data, not review-friendly code.
+- AWS provisioning is coupled to runtime bootstrap: `aws/main.tf` contains user-data that installs Java, Nginx, Docker, CloudWatch agent, and app directories.
+- Oracle provisioning is likewise coupled to runtime bootstrap through `cloud-init.sh` and Always Free ARM assumptions.
+- The AWS stack is public-EC2 + private-RDS oriented; the Oracle stack is a public ARM instance path.
+- Shared Terraform safety and workflow belong here; provider-specific nuances stay in code/comments unless the stacks diverge further.
+
+## ANTI-PATTERNS
+- Never commit casual edits to `terraform.tfstate`, `terraform.tfstate.*`, or `terraform.tfvars`.
+- Do not assume infra changes are isolated from app deployment; bootstrap scripts encode application/runtime contracts.
+- Do not split `aws/` and `oracle/` into separate child docs unless the workflows truly diverge; that would mostly duplicate Terraform safety today.
+- Do not ignore provider differences: AWS user-data and OCI cloud-init are both executable logic, not comments.
+
+## NOTES
+- AWS config currently includes security/networking, EC2 bootstrap, RDS, log groups, and alarms in one file.
+- Oracle config targets Ubuntu 22.04 ARM and uses lifecycle ignore rules to avoid noisy recreate behavior.
+- Deployment operations also depend on `docker/` and `scripts/`; infra alone is not the full deploy story.
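
Review note on the `infra/` guide (not part of the patch): the rule against committing state artifacts could be backed by a small automated check. The following is a minimal sketch under stated assumptions. The function name `find_sensitive` and the pattern list are hypothetical; the patterns are taken from the paths the guides call high-risk (`.env`, `keys/`, `infra/terraform.tfstate`, `infra/terraform.tfstate.*`, `infra/*.tfvars`). Note that `fnmatch` lets `*` match `/`, so `infra/*.tfvars` also covers nested directories.

```python
from fnmatch import fnmatch

# Hypothetical pattern list, derived from the paths the AGENTS guides call high-risk.
SENSITIVE_PATTERNS = [
    ".env",
    "keys/*",
    "infra/terraform.tfstate",
    "infra/terraform.tfstate.*",
    "infra/*.tfvars",
]


def find_sensitive(paths):
    """Return the repo-relative paths that match a high-risk pattern, in input order."""
    return [p for p in paths if any(fnmatch(p, pat) for pat in SENSITIVE_PATTERNS)]


if __name__ == "__main__":
    # Example input; in practice the list might come from `git diff --cached --name-only`.
    staged = ["infra/aws/main.tf", "infra/terraform.tfstate", "infra/terraform.tfvars"]
    for path in find_sensitive(staged):
        print(f"refusing to commit high-risk file: {path}")
```

Keeping the patterns in one list mirrors how the guides state the rule in one place; whether the repo wants such a hook at all is a decision for the maintainers.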
diff --git a/k6/AGENTS.md b/k6/AGENTS.md
new file mode 100644
index 00000000..acb543b4
--- /dev/null
+++ b/k6/AGENTS.md
@@ -0,0 +1,30 @@
+# K6 LOAD TEST GUIDE
+
+## OVERVIEW
+`k6/` contains scenario scripts, aggregate load plans, and a separate Google Cloud Terraform runner. It is a performance-testing subtree, not part of the main infra stack.
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| Env contract | `config.js` | `BASE_URL`, `AUTH_TOKEN`, shared headers, keyword list |
+| Aggregate scenario mix | `load-test.js` | Multi-scenario VU plan, thresholds, writes `summary.json` |
+| Focused entry scripts | `test-*.js` | Narrow runs for CRUD, search, recommendation |
+| Scenario implementations | `scenarios/` | Request logic by use case |
+| Remote runner infra | `terraform/main.tf` | GCP `k6-runner`, separate provider from `infra/` |
+
+## CONVENTIONS
+- `config.js` is the shared env surface; keep `BASE_URL` / `AUTH_TOKEN` usage centralized there.
+- `load-test.js` mixes CRUD/search/recommendation traffic with scenario-specific thresholds; edits here affect the overall traffic model.
+- Scenario files are intentionally lightweight and may swallow parsing failures to preserve latency/error sampling behavior.
+- `terraform/` is for provisioning a GCP runner, not for the main application infrastructure.
+- Results are expected in `summary.json` and console summary output; `results/` is for stored run artifacts.
+
+## ANTI-PATTERNS
+- Do not hardcode real tokens or production-only URLs into scripts.
+- Do not treat `k6/terraform` as interchangeable with `infra/`; it uses a different cloud/provider and purpose.
+- Do not tweak thresholds or VU mixes without considering the intended traffic distribution encoded in `load-test.js`.
+- Do not bury request-shape changes in aggregate files when they belong in `scenarios/`.
+
+## NOTES
+- Blue/green deploy and app infra live elsewhere; `k6/` assumes a reachable API endpoint, not ownership of deployment.
+- Some scenarios require auth headers, others are anonymous; keep that split visible in script names and headers.
diff --git a/src/main/java/com/techfork/domain/source/AGENTS.md b/src/main/java/com/techfork/domain/source/AGENTS.md
new file mode 100644
index 00000000..9dffeb66
--- /dev/null
+++ b/src/main/java/com/techfork/domain/source/AGENTS.md
@@ -0,0 +1,42 @@
+# SOURCE DOMAIN GUIDE
+
+## OVERVIEW
+`domain/source` owns RSS ingestion orchestration: crawl feeds, persist new posts, then hand off to summary and embedding steps.
+
+## STRUCTURE
+```text
+source/
+├── batch/       # feed reader, RSS->Post processor, bulk writer
+├── config/      # job and step wiring
+├── listener/    # job lifecycle hooks
+├── scheduler/   # cron trigger
+├── service/     # crawl execution + lock/job launch path
+├── repository/  # TechBlog access
+└── dto|entity/  # feed item and blog metadata
+```
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| Job topology | `config/RssCrawlingJobConfig.java` | `rssCrawlingJob` = fetch → summary → embed/index |
+| Scheduler trigger | `scheduler/RssCrawlingScheduler.java` | Actual cron is `0 0 5 * * *`, zone `Asia/Seoul` |
+| Full pipeline explainer | `docs/source-package.md` | Best human-readable walkthrough |
+| Operational behavior | `docs/SCHEDULER_GUIDE.md` | Locking, notifications, troubleshooting; schedule text is partially stale |
+
+## CONVENTIONS
+- Step 1 `fetchAndSaveRssStep`: chunk 10, skip limit 10, `IllegalStateException` is `noSkip`.
+- Step 2 `extractSummaryStep`: chunk 5, async processor/writer, summary executor fixed at 2 threads.
+- Step 3 `embedAndIndexStep`: chunk 20, async processor/writer, embedding executor 10-20 threads.
+- `summaryAndEmbeddingJob` intentionally skips the fetch step; keep it aligned with steps 2-3 only.
+- MDC propagation is part of batch execution via `MdcTaskDecorator`; thread pool swaps are not cosmetic here.
+- This package owns job orchestration, but summary/embedding readers/processors/writers are imported from `domain/post/batch`.
+
+## ANTI-PATTERNS
+- Do not move step-2/step-3 components into `source/` just because the job config imports them.
+- Do not change cron, chunk sizes, skip limits, or thread counts without checking rate-limit and ops docs impact.
+- Do not trust `docs/SCHEDULER_GUIDE.md` over code for schedule timing; the code path is the source of truth.
+- Do not bypass the crawl service / lock / listener flow when adding manual job triggers.
+
+## NOTES
+- The scheduler comment and code both say daily 05:00 KST; some older docs still describe hourly crawling.
+- If a change touches feed fetching, duplicate filtering, summary extraction, and ES indexing together, read `docs/source-package.md` first.
diff --git a/src/test/java/com/techfork/evaluation/AGENTS.md b/src/test/java/com/techfork/evaluation/AGENTS.md
new file mode 100644
index 00000000..84a5ad8c
--- /dev/null
+++ b/src/test/java/com/techfork/evaluation/AGENTS.md
@@ -0,0 +1,40 @@
+# EVALUATION TEST GUIDE
+
+## OVERVIEW
+`src/test/java/com/techfork/evaluation` is not ordinary integration testing; it is fixture-heavy search/recommendation quality evaluation with separate Gradle tasks, tags, and runtime assumptions.
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| Task split | `build.gradle` | `integrationTest`, `evaluationTest`, `evaluationSetup` |
+| Integration base | `src/test/java/com/techfork/global/common/IntegrationTestBase.java` | `@Tag("integration")`, profile `integrationtest` |
+| Recommendation evaluation base | `recommendation/RecommendationTestBase.java` | Loads fixtures, force-merges ES, warms caches 3x |
+| Search evaluation base | `search/SearchEvaluationTestBase.java` | `@Tag("evaluation")`, profile `local-tunnel` |
+| Fixtures / reports | `src/test/resources/fixtures/evaluation/`, `src/test/resources/evaluation-report-*.json` | Large inputs + checked-in outputs |
+
+## CONVENTIONS
+- `./gradlew test` excludes `integration`, `evaluation`, and `evaluation-setup` workflows by design.
+- `IntegrationTestBase` is the normal controller/service integration path; evaluation suites are a different lane.
+- Search evaluation uses `@ActiveProfiles("local-tunnel")`; do not silently swap it to `integrationtest`.
+- Recommendation evaluation extends `IntegrationTestBase`, loads cached fixtures once, force-merges `posts` and `user_profiles`, then runs warmup before metrics.
+- Search evaluation builds `SearchServiceImpl` directly per scenario and writes JSON reports into `src/test/resources/`.
+- `evaluation-setup` jobs are prerequisite generators/exporters, not “extra assertions.” Treat them as data-prep workflows.
+
+## COMMANDS
+```bash
+./gradlew integrationTest
+./gradlew evaluationTest
+./gradlew evaluationSetup
+./gradlew test -PexcludeIntegration
+```
+
+## ANTI-PATTERNS
+- Do not run evaluation suites as part of a casual unit/integration loop.
+- Do not edit fixture JSON or evaluation reports by hand when the source data should be regenerated.
+- Do not ignore the `local-tunnel` dependency for search evaluation; it is part of the test contract.
+- Do not assume evaluation bases are cheap; they load large fixture sets and may warm Elasticsearch aggressively.
+
+## NOTES
+- Recommendation metrics center on Recall / nDCG / ILD across K values 4, 8, 30.
+- Search metrics center on nDCG / Recall at 4, 8, 20 plus latency.
+- If you only need normal application verification, stay in `domain/` or `global/` integration tests instead of this subtree.
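
Review note on the evaluation guide (not part of the patch): the NOTES name Recall and nDCG at fixed K values without defining them. As a reference point, here is a minimal sketch of Recall@K and nDCG@K under the common binary-relevance definition with a log2 discount. Whether the repo's evaluation code uses this exact variant is an assumption, and the function names and sample IDs are illustrative only.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance nDCG: DCG of the top-k divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 is discounted by log2(2) = 1
        for rank, item in enumerate(ranked_ids[:k])
        if item in relevant_ids
    )
    # The ideal ranking places every relevant item first, capped at k positions.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    ranked = ["p1", "p7", "p3", "p9", "p2"]  # made-up post IDs
    relevant = {"p1", "p2", "p3"}
    for k in (4, 8, 20):  # the search K values listed in the guide
        print(k, recall_at_k(ranked, relevant, k), ndcg_at_k(ranked, relevant, k))
```

ILD (intra-list diversity) is left out because it depends on an item-to-item distance that the guide does not specify.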