diff --git a/.claude/README.md b/.claude/README.md new file mode 100644 index 0000000..d45b5a4 --- /dev/null +++ b/.claude/README.md @@ -0,0 +1,45 @@ +# .claude/ — Documentação do Projeto + +## Estrutura + +``` +.claude/ +├── padroes/ # Guias de desenvolvimento por camada +│ ├── backend.md # Arquitetura, SOLID, testes, import/export +│ ├── frontend.md # Layout, componentes, hooks, Tailwind v3 +│ └── ux-ui.md # 10 heurísticas de Nielsen, cores, tabelas +│ +├── skills/ # Specs de implementação +│ ├── README.md # Índice com fluxo entre USs +│ ├── us/ # Uma spec por User Story (US-00 a US-07) +│ ├── qualidade/ # Qualidade e segurança (OWASP, testes, monitoramento) +│ └── readme-gen.md # Geração de READMEs +│ +├── requisitos.md # Épico + User Stories originais +├── backlog.md # Tarefas futuras (fases 2-5 de qualidade) +└── README.md # Este arquivo +``` + +## Como contribuir + +Seguir rigorosamente o [CONTRIBUTING.md](../.github/CONTRIBUTING.md): + +- **Branch**: criar a partir de `dev` (`feature/*`, `fix/*`, `chore/*`) +- **Commits**: `tipo(escopo): descrição` — Conventional Commits +- **PR**: título no mesmo formato, corpo com `## Resumo` + `## Como testar` +- **Merge**: merge commit (não squash), deletar branch após merge +- **Nunca** push direto em `dev` ou `main` + +## Referência rápida + +| Preciso de... 
| Onde encontrar | +|---------------|----------------| +| Padrões de backend | [padroes/backend.md](padroes/backend.md) | +| Padrões de frontend | [padroes/frontend.md](padroes/frontend.md) | +| Padrões de UX/UI | [padroes/ux-ui.md](padroes/ux-ui.md) | +| Como fazer commits e PRs | [padroes/commits-e-prs.md](padroes/commits-e-prs.md) | +| Spec de uma US | [skills/us/](skills/us/) | +| Roadmap de qualidade | [skills/qualidade/quality-security.md](skills/qualidade/quality-security.md) | +| Backlog de tarefas | [backlog.md](backlog.md) | +| Requisitos originais | [requisitos.md](requisitos.md) | +| Como fazer PR | [../.github/CONTRIBUTING.md](../.github/CONTRIBUTING.md) | diff --git a/.claude/backlog.md b/.claude/backlog.md new file mode 100644 index 0000000..a76f322 --- /dev/null +++ b/.claude/backlog.md @@ -0,0 +1,44 @@ +# Backlog — Qualidade, Segurança e Evolução + +## Qualidade de Software + +| Fase | Descrição | Esforço | Skill | +|------|-----------|---------|-------| +| ~~1~~ | ~~Backend 100% cobertura~~ | ~~Médio~~ | ~~Concluída (PR #89) — 342 testes, 99%~~ | +| 2 | Testes de segurança OWASP Top 10 | Médio | `quality-security.md` § Fase 2 | +| 3 | Testes frontend (Vitest + Testing Library) | Alto | `quality-security.md` § Fase 3 | +| 4 | Testes E2E (Playwright) | Alto | `quality-security.md` § Fase 4 | +| 5 | Monitoramento produção (Sentry + logging + headers) | Baixo | `quality-security.md` § Fase 5 | + +## Segurança (OWASP Top 10) + +| Categoria | O que testar | Status | +|-----------|-------------|--------| +| A01 — Controle de Acesso | IDOR, escalação de privilégio, usuário inativo | Pendente | +| A02 — Falhas Criptográficas | Senhas em resposta, API key em logs, JWT seguro | Pendente | +| A03 — Injeção | SQL injection, XSS armazenado | Pendente | +| A04 — Design Inseguro | Rate limiting, conflito irreversível, soft-delete | Pendente | +| A05 — Configuração Incorreta | CORS, debug mode, stacktrace exposto | Pendente | +| A07 — Autenticação | Brute force, 
token refresh, senha mínima | Pendente | +| A08 — Integridade | JSON malformado, campos extras, idempotência | Pendente | +| A09 — Logging | Coberto na Fase 5 (monitoramento) | Pendente | +| A10 — SSRF | Validação de domínio YouTube, API key scope | Pendente | + +## Monitoramento Produção + +| Item | Ferramenta | Status | +|------|-----------|--------| +| Error tracking | Sentry free tier (5K erros/mês) | Pendente | +| Logging estruturado | python-json-logger ou stdlib | Pendente | +| Request logging | Middleware Starlette | Pendente | +| Headers de segurança | X-Content-Type-Options, X-Frame-Options, etc. | Pendente | +| Visualização de logs | Vercel Logs (já incluso no Pro) | Pendente | + +## Evolução do Produto (fora do escopo atual) + +| Item | Descrição | +|------|-----------| +| Export ML-ready | CSV com features prontas para treinamento de modelo | +| Classificação automática | Integrar modelo para sugerir bot/humano ao anotador | +| Inter-annotator agreement | Cohen's Kappa, Fleiss' Kappa | +| Multi-idioma | Suporte a vídeos em inglês/espanhol | diff --git a/.claude/backend.md b/.claude/padroes/backend.md similarity index 100% rename from .claude/backend.md rename to .claude/padroes/backend.md diff --git a/.claude/padroes/commits-e-prs.md b/.claude/padroes/commits-e-prs.md new file mode 100644 index 0000000..b278d31 --- /dev/null +++ b/.claude/padroes/commits-e-prs.md @@ -0,0 +1,100 @@ +# Padrão de Commits e PRs + +Referência rápida baseada no [CONTRIBUTING.md](../../.github/CONTRIBUTING.md). +Este arquivo deve ser lido **antes de criar qualquer commit ou PR**. 
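The commit format this file defines (`tipo(escopo): descrição`, with a mandatory scope and the types listed in the "Tipos" table) can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named `is_valid_commit_subject`; the regex is an illustration and is not the project's actual hook or CI check.

```python
import re

# Types taken from the "Tipos" table in this file.
COMMIT_TYPES = ("feat", "fix", "test", "docs", "chore", "refactor", "style")

# tipo(escopo): descrição. The scope is mandatory per the "Regras" section.
SUBJECT_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"\((?P<scope>[^()\s]+)\): (?P<description>\S.*)$"
)


def is_valid_commit_subject(subject: str) -> bool:
    """Hypothetical helper: True when the subject line matches the format."""
    return SUBJECT_RE.match(subject) is not None
```

The same check applies to PR titles, since they follow the format of the main commit.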
+ +--- + +## Commits + +### Formato + +``` +tipo(escopo): descrição curta no imperativo +``` + +### Tipos + +| Tipo | Quando usar | +|------|-------------| +| `feat` | Nova funcionalidade | +| `fix` | Correção de bug | +| `test` | Adição ou correção de testes | +| `docs` | Documentação apenas | +| `chore` | Build, CI, dependências, sem mudança de lógica | +| `refactor` | Refatoração sem mudança de comportamento | +| `style` | Formatação, espaços, ponto e vírgula | + +### Regras + +- **Separar commits por camada**: backend e frontend em commits distintos +- **Escopo obrigatório**: `feat(dashboard)`, `test(collect)`, `chore(ci)` +- **Descrição com corpo**: se houver mais de 3 mudanças, listar no corpo do commit +- **Co-Authored-By**: usar apenas `Co-Authored-By: Claude ` — sem nome de modelo, sem contexto +- **Nunca** usar `--no-verify` ou `--amend` em commits já pushados + +### Exemplo + +``` +feat(dashboard): adicionar endpoints e service da US-06 + +- 5 endpoints: /dashboard/global, /dashboard/video, /dashboard/user, + /dashboard/bots, /dashboard/criteria-effectiveness +- Schemas Pydantic para request/response +- Service com agregações SQL batch e gráficos Plotly + +Co-Authored-By: Claude +``` + +--- + +## PRs + +### Branch + +- Criar a partir de `dev` (nunca de `main`) +- Naming: `feature/*`, `fix/*`, `test/*`, `chore/*`, `quality/*` +- **Nunca** push direto em `dev` ou `main` — sempre branch → PR → merge + +### Título + +Mesmo formato do commit principal: + +``` +feat(dashboard): implementar US-06 — Dashboard de Análise +``` + +### Corpo (obrigatório, em português) + +```markdown +## Resumo + +- Descrição concisa das mudanças (bullets) + +## Como testar + +- [ ] Passo a passo para validar +- [ ] Cenários de erro ou edge cases + +## Screenshots + +(se houver mudanças visuais no frontend) +``` + +### Merge + +- Usar **merge commit** (não squash, não rebase) +- Deletar branch após merge +- Para PR `dev → main`: sincronizar dev com main antes (`git pull origin 
main`) + +--- + +## Checklist antes de abrir PR + +- [ ] Branch criada a partir de `dev` +- [ ] `ruff check . && ruff format --check .` passando (backend) +- [ ] `bandit -r . --exclude tests` sem issues (backend) +- [ ] `pytest` com cobertura ≥ 90% (backend) +- [ ] `eslint . && prettier --check . && tsc --noEmit` passando (frontend) +- [ ] Nenhum segredo ou API key commitada +- [ ] Commits separados por camada (backend / frontend / docs) diff --git a/.claude/frontend.md b/.claude/padroes/frontend.md similarity index 100% rename from .claude/frontend.md rename to .claude/padroes/frontend.md diff --git a/.claude/ux-ui.md b/.claude/padroes/ux-ui.md similarity index 100% rename from .claude/ux-ui.md rename to .claude/padroes/ux-ui.md diff --git a/.claude/skills/README.md b/.claude/skills/README.md index 970f2bd..079d773 100644 --- a/.claude/skills/README.md +++ b/.claude/skills/README.md @@ -1,34 +1,40 @@ -# Skills — Guias de Implementação por User Story +# Skills — Guias de Implementação -Cada arquivo detalha o contrato de API, schemas de banco, lógica de service, -componentes React sugeridos, casos de erro e dependências com outras USs. 
+## User Stories (`us/`) -| Arquivo | US | Escopo | -|-------------------------------|-------|-------------------------------------------------| -| [us-00-infra.md](us-00-infra.md) | US-00 | CI/CD, pre-commit hooks, Dependabot, proteção de branch | -| [us-01-auth.md](us-01-auth.md) | US-01 | Login, logout, gestão de usuários (admin/user) | -| [us-02-collect.md](us-02-collect.md) | US-02 | Coleta de comentários via YouTube Data API | -| [us-03-clean.md](us-03-clean.md) | US-03 | Seleção estatística/comportamental de usuários suspeitos | -| [us-04-annotate.md](us-04-annotate.md) | US-04 | Anotação de comentários por usuário do YouTube | -| [us-05-review.md](us-05-review.md) | US-05 | Desempate de conflitos e revisão de bots (admin)| -| [us-06-dashboard.md](us-06-dashboard.md) | US-06 | Dashboard global e individual com Plotly | +Cada arquivo detalha o contrato de API, schemas, lógica de service, componentes React e testes. -| [readme-gen.md](readme-gen.md) | — | Geração consistente de READMEs (raiz, backend, frontend) | +| Arquivo | US | Status | +|---------|-----|--------| +| [us-00-infra.md](us/us-00-infra.md) | US-00 · Infraestrutura e CI/CD | Concluída | +| [us-01-auth.md](us/us-01-auth.md) | US-01 · Autenticação e gestão de usuários | Concluída | +| [us-02-collect.md](us/us-02-collect.md) | US-02 · Coleta de comentários YouTube | Concluída | +| [us-03-clean.md](us/us-03-clean.md) | US-03 · Limpeza e seleção de dataset | Concluída | +| [us-04-annotate.md](us/us-04-annotate.md) | US-04 · Anotação de comentários | Concluída | +| [us-05-review.md](us/us-05-review.md) | US-05 · Revisão de conflitos | Concluída | +| [us-06-dashboard.md](us/us-06-dashboard.md) | US-06 · Dashboard de análise | Concluída | +| [us-07-data-catalog.md](us/us-07-data-catalog.md) | US-07 · Catálogo de dados | Concluída | -## Fluxo entre USs +## Qualidade e Segurança (`qualidade/`) -``` -US-00 (Infra) — base para todas as outras +| Arquivo | Escopo | +|---------|--------| +| 
[quality-security.md](qualidade/quality-security.md) | 5 fases: cobertura, OWASP, frontend, E2E, monitoramento | + +## Utilitários -US-01 (Auth) — JWT necessário em todas as outras +| Arquivo | Escopo | +|---------|--------| +| [readme-gen.md](readme-gen.md) | Geração de READMEs (raiz, backend, frontend) | + +## Fluxo entre USs -US-02 (Coleta) → US-03 (Limpeza) → US-04 (Anotação) → US-05 (Desempate) - └─────────────────────────→ US-06 (Dashboard) ``` +US-00 (Infra) — base para todas -## Papéis +US-01 (Auth) — JWT necessário em todas -| Role | Acesso | -|---------|---------------------------------------------------------------| -| `admin` | Tudo: gestão de usuários, coleta, limpeza, anotação, desempate, dashboard | -| `user` | Coleta, limpeza, anotação, dashboard | +US-02 (Coleta) → US-03 (Limpeza) → US-04 (Anotação) → US-05 (Revisão) + └──────────────────────→ US-06 (Dashboard) + US-07 (Catálogo) +``` diff --git a/.claude/skills/qualidade/quality-security.md b/.claude/skills/qualidade/quality-security.md new file mode 100644 index 0000000..997336b --- /dev/null +++ b/.claude/skills/qualidade/quality-security.md @@ -0,0 +1,468 @@ +# Qualidade de Software, Segurança e Monitoramento + +## Objetivo + +Elevar a qualidade do projeto para padrão de produção: cobertura de testes 100% no backend, +testes no frontend, testes de segurança (OWASP Top 10), testes E2E e monitoramento em produção. 
+ +--- + +## Estado atual + +| Área | Status | Detalhe | +|------|--------|---------| +| Backend — testes unitários | 184 testes, ≥80% cobertura | 8 módulos de teste, PostgreSQL real | +| Backend — lint/format/segurança | Ruff + Bandit + pip-audit | Enforçado no CI | +| Frontend — testes | **Nenhum** | Sem Vitest, sem *.test.tsx | +| Frontend — lint/format/tipos | ESLint + Prettier + tsc | Enforçado no CI | +| Segurança estática | Bandit (backend), npm audit (frontend) | Sem testes dinâmicos | +| Segurança dinâmica | **Nenhuma** | Sem OWASP ZAP, sem testes de injeção | +| Testes E2E | **Nenhum** | Sem Playwright/Cypress | +| Monitoramento produção | **Nenhum** | Sem Sentry, sem logging estruturado | +| Rate limiting | Configurado (slowapi) | **Sem testes** | + +--- + +## Fase 1 — Backend: cobertura 100% + +### Meta + +Elevar de ≥80% para 100% de cobertura de linhas. Identificar branches não cobertas +e adicionar testes que validem comportamento real (nunca testes triviais para subir %). + +### Passos + +1. Rodar `pytest --cov=. --cov-report=term-missing` para identificar linhas não cobertas +2. Para cada módulo com linhas descobertas: + - Identificar se são branches de erro, edge cases ou código morto + - Código morto → remover em vez de testar + - Branches de erro → adicionar teste com partição de equivalência + - Edge cases → adicionar teste com valor limite +3. Atualizar `.coveragerc` para `fail_under = 100` +4. 
Atualizar CI para enforçar `--cov-fail-under=100` + +### Módulos a cobrir + +| Módulo | O que provavelmente falta | +|--------|--------------------------| +| `services/collect.py` | Branches de erro da YouTube API (429, 404, timeout), enrich parcial | +| `services/clean/*.py` | Critérios com dados vazios, thresholds edge (0%, 100%) | +| `services/annotate.py` | Justificativa obrigatória para bot, upsert idempotente | +| `services/review.py` | Resolução de conflito já resolvido (409), export sem dataset | +| `services/dashboard.py` | Destaques do vídeo com dados vazios, gráficos com 0 dados | +| `services/data.py` | pg_total_relation_size falhando (try/except), datasets sem collection | +| `routers/*.py` | Validação 422 de payloads malformados | +| `services/auth.py` | Token expirado, token com tipo errado, usuário inativo | +| `core/rate_limit.py` | 429 após exceder limite | + +### Stubs para YouTube API + +Todos os testes de coleta usam stubs — nunca chamam a API real: + +```python +@pytest.fixture +def stub_youtube_success(mocker): + """Stub: YouTube API retorna 20 comentários + nextPageToken.""" + mocker.patch("services.collect.httpx.get", return_value=MockResponse( + status_code=200, + json_data={"items": [...], "nextPageToken": "abc123"} + )) +``` + +Não é necessária API key real para nenhum teste. + +--- + +## Fase 2 — Testes de Segurança (OWASP Top 10) + +### Meta + +Testar as categorias do OWASP Top 10 mais relevantes para esta aplicação. Cada categoria +gera um arquivo `tests/test_security_.py`. + +### Categorias e testes + +#### A01 — Controle de Acesso Quebrado (Broken Access Control) + +```python +# tests/test_security_access.py + +# IDOR — acessar recurso de outro usuário +def test_user_nao_acessa_anotacao_de_outro_via_uuid_manipulado(): ... +def test_user_comum_nao_acessa_rotas_admin(): ... +def test_user_inativo_nao_consegue_autenticar(): ... + +# Escalação de privilégio +def test_user_nao_consegue_se_promover_a_admin(): ... 
+def test_delete_usuario_exige_role_master(): ... +``` + +#### A02 — Falhas Criptográficas (Cryptographic Failures) + +```python +# tests/test_security_crypto.py + +def test_senha_nunca_aparece_em_resposta_json(): ... +def test_api_key_nunca_aparece_em_log_ou_resposta(): ... +def test_jwt_usa_algoritmo_seguro_hs256(): ... +def test_jwt_expirado_retorna_401(): ... +def test_jwt_com_assinatura_invalida_retorna_401(): ... +def test_jwt_sem_campo_sub_retorna_401(): ... +``` + +#### A03 — Injeção (Injection) + +```python +# tests/test_security_injection.py + +# SQL Injection — SQLAlchemy parametriza por padrão, mas validar +SQLI_PAYLOADS = ["'; DROP TABLE users;--", "1 OR 1=1", "' UNION SELECT * FROM users--"] + +def test_sqli_no_campo_search_bots(payload): ... +def test_sqli_no_video_id(payload): ... +def test_sqli_no_campo_author(payload): ... +def test_sqli_no_username_login(payload): ... + +# XSS — React escapa por padrão, mas validar no backend +XSS_PAYLOADS = ["", "", "javascript:alert(1)"] + +def test_xss_no_nome_de_usuario_nao_executa(): ... +def test_comentario_importado_com_xss_armazenado_sem_executar(): ... +``` + +#### A04 — Design Inseguro (Insecure Design) + +```python +# tests/test_security_design.py + +def test_rate_limiting_login_bloqueia_apos_5_tentativas(): ... +def test_rate_limiting_refresh_bloqueia_apos_10_tentativas(): ... +def test_conflito_resolvido_nao_pode_ser_revertido(): ... +def test_soft_delete_preserva_dados_relacionados(): ... +``` + +#### A05 — Configuração Incorreta (Security Misconfiguration) + +```python +# tests/test_security_config.py + +def test_cors_nao_permite_origin_qualquer_em_producao(): ... +def test_debug_mode_desativado(): ... +def test_stacktrace_nao_exposto_em_erro_500(): ... +def test_health_endpoint_nao_expoe_versao_detalhada(): ... +``` + +#### A07 — Falhas de Autenticação (Authentication Failures) + +```python +# tests/test_security_auth.py + +def test_brute_force_bloqueado_por_rate_limit(): ... 
+def test_token_refresh_com_access_token_retorna_401(): ... +def test_logout_invalida_sessao(): ... +def test_password_minimo_8_caracteres(): ... +def test_username_formato_valido_apenas_alfanumerico(): ... +``` + +#### A08 — Falhas de Integridade (Software and Data Integrity) + +```python +# tests/test_security_integrity.py + +def test_import_json_malformado_retorna_422(): ... +def test_import_com_campos_extras_ignora_campos(): ... +def test_import_com_video_id_inexistente_retorna_erro(): ... +def test_bulk_insert_on_conflict_nao_duplica(): ... +``` + +#### A09 — Falhas de Logging e Monitoramento + +Coberto na Fase 5 (Monitoramento em Produção). + +#### A10 — Server-Side Request Forgery (SSRF) + +```python +# tests/test_security_ssrf.py + +def test_video_url_so_aceita_youtube_domain(): ... +def test_api_key_nao_enviada_para_dominio_externo(): ... +``` + +--- + +## Fase 3 — Testes Frontend (Vitest + Testing Library) + +### Setup + +```bash +cd frontend +npm install -D vitest @testing-library/react @testing-library/jest-dom @testing-library/user-event jsdom +``` + +Configurar `vitest.config.ts`: + +```ts +import { defineConfig } from "vitest/config"; +import react from "@vitejs/plugin-react"; + +export default defineConfig({ + plugins: [react()], + test: { + environment: "jsdom", + globals: true, + setupFiles: ["./src/test/setup.ts"], + css: true, + }, +}); +``` + +### Estrutura + +``` +src/ +├── test/ +│ └── setup.ts # import @testing-library/jest-dom +├── hooks/ +│ ├── useAnnotate.test.ts +│ ├── useClean.test.ts +│ ├── useDashboard.test.ts +│ ├── useData.test.ts +│ └── useReview.test.ts +├── components/ +│ ├── PageHeader.test.tsx +│ ├── StatusBadge.test.tsx +│ ├── ProgressBar.test.tsx +│ ├── StepsCard.test.tsx +│ └── ProtectedRoute.test.tsx +├── pages/ +│ ├── Dashboard/ +│ │ ├── KpiCards.test.tsx +│ │ ├── CriteriaFilterBar.test.tsx +│ │ └── BotCommentsTable.test.tsx +│ └── NotFound/ +│ └── NotFoundPage.test.tsx +└── contexts/ + └── AuthContext.test.tsx +``` + +### 
Prioridade de testes + +| Prioridade | Componente/Hook | O que testar | +|------------|-----------------|-------------| +| Alta | AuthContext | Login, logout, token refresh, estado persistido | +| Alta | ProtectedRoute | Redirect sem token, redirect sem admin, renderiza com token | +| Alta | Hooks (useAnnotate, etc.) | Fetch, loading, error states, transformação de dados | +| Média | PageHeader | Renderiza nome, role badge, breadcrumb, botão sair | +| Média | KpiCards | Renderiza todos os cards com cores corretas | +| Média | CriteriaFilterBar | Toggle checkboxes, limpar filtros | +| Média | BotCommentsTable | Paginação, busca, filtro por critério | +| Baixa | StatusBadge | Cores por status | +| Baixa | ProgressBar | Determinado vs indeterminado | + +### CI + +Adicionar ao `.github/workflows/ci.yml` no job `frontend`: + +```yaml +- name: Test + run: npx vitest run --coverage --coverage.thresholds.lines=80 +``` + +--- + +## Fase 4 — Testes E2E (Playwright) + +### Setup + +```bash +cd frontend +npm install -D @playwright/test +npx playwright install +``` + +### Fluxos a testar + +``` +tests/e2e/ +├── auth.spec.ts # Login, logout, redirect sem token, token refresh +├── collect.spec.ts # Coleta mockada (stub API), status, export +├── clean.spec.ts # Preview, criar dataset, download +├── annotate.spec.ts # Navegar usuários, anotar bot/humano, progresso +├── review.spec.ts # Listar conflitos, resolver, stats +├── dashboard.spec.ts # 3 abas, filtro critério, tabela de bots +├── data.spec.ts # Catálogo, painéis de detalhe +├── not-found.spec.ts # URL inexistente → página 404 +└── security.spec.ts # XSS no DOM, CSRF headers +``` + +### Estratégia de dados + +- E2E usa o endpoint `POST /seed` para popular dados mockados antes dos testes +- `DELETE /seed` limpa após os testes +- Não depende de YouTube API real — seed gera dados completos + +--- + +## Fase 5 — Monitoramento em Produção + +### Logging estruturado + +```python +# core/logging.py +import logging +import json 
+from datetime import datetime, timezone + +class JSONFormatter(logging.Formatter): + def format(self, record): + return json.dumps({ + "timestamp": datetime.now(timezone.utc).isoformat(), + "level": record.levelname, + "logger": record.name, + "message": record.getMessage(), + "module": record.module, + "function": record.funcName, + "line": record.lineno, + }) +``` + +Aplicar em `main.py`: + +```python +import logging +from core.logging import JSONFormatter + +handler = logging.StreamHandler() +handler.setFormatter(JSONFormatter()) +logging.root.addHandler(handler) +logging.root.setLevel(logging.INFO) +``` + +### Middleware de request logging + +```python +# core/middleware.py +import time +import logging +from starlette.middleware.base import BaseHTTPMiddleware + +logger = logging.getLogger("http") + +class RequestLogMiddleware(BaseHTTPMiddleware): + async def dispatch(self, request, call_next): + start = time.perf_counter() + response = await call_next(request) + duration_ms = (time.perf_counter() - start) * 1000 + logger.info( + "%s %s %s %.0fms", + request.method, + request.url.path, + response.status_code, + duration_ms, + ) + return response +``` + +### Sentry (error tracking) + +```python +# main.py +import os +import sentry_sdk + +if os.getenv("SENTRY_DSN"): + sentry_sdk.init( + dsn=os.getenv("SENTRY_DSN"), + traces_sample_rate=0.1, + environment=os.getenv("VERCEL_ENV", "development"), + ) +``` + +Dependência: `sentry-sdk[fastapi]` no requirements.txt. 
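The conditional Sentry initialization above depends only on two environment variables, so the decision can be factored into a pure function and unit-tested without a real DSN. This is a minimal sketch under that assumption; `sentry_options` is a hypothetical helper name, not something the project defines, and the values mirror the snippet above.

```python
import os
from typing import Mapping, Optional


def sentry_options(env: Mapping[str, str]) -> Optional[dict]:
    """Hypothetical helper: kwargs for sentry_sdk.init, or None if SENTRY_DSN is unset."""
    dsn = env.get("SENTRY_DSN")
    if not dsn:
        # No DSN configured (local development, tests): Sentry stays disabled.
        return None
    return {
        "dsn": dsn,
        "traces_sample_rate": 0.1,
        "environment": env.get("VERCEL_ENV", "development"),
    }


if __name__ == "__main__":
    options = sentry_options(os.environ)
    if options is None:
        print("Sentry disabled: SENTRY_DSN not set")
    else:
        # In main.py this would become: sentry_sdk.init(**options)
        print(f"Sentry enabled for environment {options['environment']}")
```

Keeping the environment lookup separate from the SDK call means tests can pass a plain dict instead of patching `os.environ`.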
+ +### Health check expandido + +```python +@app.get("/health") +def health(db: Session = Depends(get_db)): + db.execute(text("SELECT 1")) + return { + "status": "ok", + "checks": { + "database": "connected", + }, + } +``` + +### Headers de segurança + +```python +# core/middleware.py +class SecurityHeadersMiddleware(BaseHTTPMiddleware): + async def dispatch(self, request, call_next): + response = await call_next(request) + response.headers["X-Content-Type-Options"] = "nosniff" + response.headers["X-Frame-Options"] = "DENY" + response.headers["X-XSS-Protection"] = "1; mode=block" + response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin" + response.headers["Permissions-Policy"] = "camera=(), microphone=()" + return response +``` + +### Métricas de performance (opcional) + +Se quiser dashboard de métricas: +- `prometheus-fastapi-instrumentator` para métricas Prometheus +- Grafana Cloud free tier para visualização + +--- + +## Ordem de execução + +| Fase | Descrição | Esforço | +|------|-----------|---------| +| 1 | Backend 100% cobertura | Médio — identificar gaps e adicionar testes | +| 2 | Testes de segurança OWASP | Médio — ~40 testes novos | +| 3 | Testes frontend (Vitest) | Alto — setup + ~50 testes | +| 4 | Testes E2E (Playwright) | Alto — setup + ~30 fluxos | +| 5 | Monitoramento produção | Baixo — logging + Sentry + headers | + +Fases 1 e 2 podem rodar em paralelo. Fase 5 pode ser feita a qualquer momento. 
+ +--- + +## Dependências novas + +### Backend +``` +sentry-sdk[fastapi] # error tracking (Fase 5) +``` + +### Frontend +``` +vitest # test runner (Fase 3) +@testing-library/react # component testing (Fase 3) +@testing-library/jest-dom # DOM matchers (Fase 3) +@testing-library/user-event # user interaction simulation (Fase 3) +jsdom # browser environment (Fase 3) +@playwright/test # E2E testing (Fase 4) +``` + +--- + +## CI atualizado (meta final) + +```yaml +backend: + - ruff check + ruff format --check + - bandit -r backend/ --exclude tests + - pip-audit + - pytest --cov=. --cov-fail-under=100 # Fase 1: 80 → 100 + - pytest tests/test_security_*.py # Fase 2: OWASP + +frontend: + - eslint + prettier --check + - tsc --noEmit + - npm audit --audit-level=high + - vitest run --coverage --coverage.thresholds.lines=80 # Fase 3 + - npx playwright test # Fase 4 +``` diff --git a/.claude/skills/us-00-infra.md b/.claude/skills/us/us-00-infra.md similarity index 100% rename from .claude/skills/us-00-infra.md rename to .claude/skills/us/us-00-infra.md diff --git a/.claude/skills/us-01-auth.md b/.claude/skills/us/us-01-auth.md similarity index 100% rename from .claude/skills/us-01-auth.md rename to .claude/skills/us/us-01-auth.md diff --git a/.claude/skills/us-02-collect.md b/.claude/skills/us/us-02-collect.md similarity index 100% rename from .claude/skills/us-02-collect.md rename to .claude/skills/us/us-02-collect.md diff --git a/.claude/skills/us-03-clean.md b/.claude/skills/us/us-03-clean.md similarity index 100% rename from .claude/skills/us-03-clean.md rename to .claude/skills/us/us-03-clean.md diff --git a/.claude/skills/us-04-annotate.md b/.claude/skills/us/us-04-annotate.md similarity index 100% rename from .claude/skills/us-04-annotate.md rename to .claude/skills/us/us-04-annotate.md diff --git a/.claude/skills/us-05-review.md b/.claude/skills/us/us-05-review.md similarity index 100% rename from .claude/skills/us-05-review.md rename to 
.claude/skills/us/us-05-review.md diff --git a/.claude/skills/us-06-dashboard.md b/.claude/skills/us/us-06-dashboard.md similarity index 100% rename from .claude/skills/us-06-dashboard.md rename to .claude/skills/us/us-06-dashboard.md diff --git a/.claude/skills/us-07-data-catalog.md b/.claude/skills/us/us-07-data-catalog.md similarity index 100% rename from .claude/skills/us-07-data-catalog.md rename to .claude/skills/us/us-07-data-catalog.md diff --git a/CLAUDE.md b/CLAUDE.md index 13eeab9..1da3420 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -224,7 +224,7 @@ Arquivo `.github/dependabot.yml` configurado para: - Canais não retornados pela YouTube API recebem epoch (1970-01-01) para evitar loop infinito - Refresh token (7 dias) no `localStorage`, access token (60min) no `sessionStorage` — interceptor transparente no `http.ts` - API keys pessoais e intransferíveis — avisos obrigatórios em CollectPage, UsersPage e CreateUserModal -- **Cards de instrução obrigatórios em páginas de US**: toda página de US deve orientar o usuário em cada etapa do fluxo — use `` no estado inicial, notice `bg-davint-50` durante processamento ativo, banner `bg-yellow-50` para estados interrompidos, CTA claro ao concluir. Ver `.claude/frontend.md` § "Cards de instrução por etapa" +- **Cards de instrução obrigatórios em páginas de US**: toda página de US deve orientar o usuário em cada etapa do fluxo — use `` no estado inicial, notice `bg-davint-50` durante processamento ativo, banner `bg-yellow-50` para estados interrompidos, CTA claro ao concluir. 
Ver `.claude/padroes/frontend.md` § "Cards de instrução por etapa" ## Import/Export de dados diff --git a/backend/.coveragerc b/backend/.coveragerc index d9d3652..cdc1d13 100644 --- a/backend/.coveragerc +++ b/backend/.coveragerc @@ -1,3 +1,6 @@ [run] omit = alembic/* + +[report] +fail_under = 90 diff --git a/backend/tests/test_annotate.py b/backend/tests/test_annotate.py index 2462ab9..270cd81 100644 --- a/backend/tests/test_annotate.py +++ b/backend/tests/test_annotate.py @@ -598,3 +598,315 @@ def test_export_csv(self, client, auth_as_user, setup_data): lines = resp.text.strip().split("\n") assert lines[0] == "comment_db_id,label,justificativa" assert len(lines) >= 2 + + +# --------------------------------------------------------------------------- +# Testes adicionais — cobertura 100% +# --------------------------------------------------------------------------- + + +class TestAdminCannotAnnotate: + def test_admin_post_annotate_retorna_403(self, client, auth_as_admin, setup_data): + """Admin nao pode anotar — apenas revisar conflitos.""" + comment = setup_data["comments"][0] + resp = client.post( + "/annotate", + json={ + "comment_db_id": str(comment.id), + "label": "humano", + }, + ) + assert resp.status_code == 403 + assert "administradores" in resp.json()["detail"].lower() + + +class TestListDatasetUsersAdmin: + def test_admin_ve_todas_anotacoes( + self, + client, + db, + auth_as_admin, + admin_user, + regular_user, + setup_data, + ): + """Admin ve contagem de anotacoes de todos.""" + from models.annotation import Annotation + + comment = setup_data["comments"][0] + # Criar anotacao como regular_user diretamente no DB + ann = Annotation( + comment_id=comment.id, + annotator_id=regular_user.id, + label="humano", + ) + db.add(ann) + db.commit() + + ds = setup_data["dataset"] + resp = client.get(f"/annotate/users?dataset_id={ds.id}") + assert resp.status_code == 200 + data = resp.json() + # Admin ve total_annotated global (1 anotacao) + assert 
data["annotated_comments_by_me"] == 1 + + +class TestListDatasetUsersFilters: + def test_only_pending_filter(self, client, auth_as_user, setup_data): + """Filtro only_pending retorna apenas usuarios pendentes.""" + ds = setup_data["dataset"] + comment = setup_data["comments"][0] + + # Anotar 1 de 3 comentarios + client.post( + "/annotate", + json={ + "comment_db_id": str(comment.id), + "label": "humano", + }, + ) + + resp = client.get(f"/annotate/users?dataset_id={ds.id}" "&only_pending=true") + assert resp.status_code == 200 + data = resp.json() + # Ainda ha pendencias + for item in data["items"]: + assert item["my_pending_count"] > 0 + + def test_pending_first_ordering(self, client, auth_as_user, setup_data): + """Filtro pending_first ordena por pendencias desc.""" + ds = setup_data["dataset"] + resp = client.get(f"/annotate/users?dataset_id={ds.id}" "&pending_first=true") + assert resp.status_code == 200 + assert len(resp.json()["items"]) >= 1 + + +class TestGetEntryCommentsAdmin: + def test_admin_ve_all_annotations( + self, + client, + db, + auth_as_admin, + admin_user, + regular_user, + setup_data, + ): + """Admin ve all_annotations com nomes dos anotadores.""" + from models.annotation import Annotation + + comment = setup_data["comments"][0] + ann = Annotation( + comment_id=comment.id, + annotator_id=regular_user.id, + label="humano", + ) + db.add(ann) + db.commit() + + ds = setup_data["dataset"] + entry = db.query(DatasetEntry).filter_by(dataset_id=ds.id).first() + resp = client.get(f"/annotate/comments/{entry.id}") + assert resp.status_code == 200 + data = resp.json() + + annotated = [c for c in data["comments"] if c["all_annotations"] is not None] + assert len(annotated) >= 1 + first_ann = annotated[0]["all_annotations"][0] + assert "annotator_name" in first_ann + + +class TestConflictReopening: + def test_reannotation_reabre_conflito_resolvido( + self, client, db, auth_as_user, setup_data, second_user + ): + """Re-anotacao apos resolucao reabre o 
conflito.""" + comment = setup_data["comments"][0] + + # User A anota humano + client.post( + "/annotate", + json={ + "comment_db_id": str(comment.id), + "label": "humano", + }, + ) + + # User B anota bot -> conflito criado + app.dependency_overrides[get_current_user] = lambda: second_user + client.post( + "/annotate", + json={ + "comment_db_id": str(comment.id), + "label": "bot", + "justificativa": "Spam.", + }, + ) + + # Resolver conflito manualmente no banco + conflict = db.query(AnnotationConflict).filter_by(comment_id=comment.id).first() + conflict.status = "resolved" + conflict.resolved_label = "bot" + db.commit() + + # User B re-anota bot (mesma label, mas A=humano) + # Labels divergem: A=humano vs B=bot -> reabre + resp = client.post( + "/annotate", + json={ + "comment_db_id": str(comment.id), + "label": "bot", + "justificativa": "Confirmo spam.", + }, + ) + assert resp.status_code == 200 + assert resp.json()["conflict_created"] is True + + db.refresh(conflict) + assert conflict.status == "pending" + assert conflict.resolved_by is None + + +class TestGetAllProgress: + def test_all_progress_retorna_dados_por_anotador( + self, + client, + db, + auth_as_admin, + admin_user, + regular_user, + setup_data, + ): + """Admin ve progresso de todos os anotadores.""" + from models.annotation import Annotation + + comment = setup_data["comments"][0] + ann = Annotation( + comment_id=comment.id, + annotator_id=regular_user.id, + label="humano", + ) + db.add(ann) + db.commit() + + resp = client.get("/annotate/all-progress") + assert resp.status_code == 200 + data = resp.json() + # Deve conter pelo menos o regular_user + assert any(p["annotator_name"] == "Usuário Teste" for p in data) + + def test_all_progress_exclui_admin( + self, client, db, auth_as_admin, admin_user, setup_data + ): + """Admin nao aparece como anotador no all-progress.""" + resp = client.get("/annotate/all-progress") + assert resp.status_code == 200 + data = resp.json() + for entry in data: + assert 
entry["annotator_name"] != admin_user.name + + def test_all_progress_requer_admin(self, client, auth_as_user): + """Endpoint all-progress requer role admin.""" + resp = client.get("/annotate/all-progress") + assert resp.status_code == 403 + + +class TestImportAnnotationsChunk: + def test_import_chunk_retorna_totais(self, client, auth_as_user, setup_data): + """Import-chunk retorna contadores do batch.""" + comments = setup_data["comments"] + resp = client.post( + "/annotate/import-chunk", + json={ + "annotations": [ + { + "comment_db_id": str(comments[0].id), + "label": "humano", + }, + ], + "done": True, + }, + ) + assert resp.status_code == 200 + data = resp.json() + assert data["total_imported"] == 1 + assert data["chunk_received"] == 1 + assert data["done"] is True + + +class TestExportWithDatasetFilter: + def test_export_json_com_dataset_id(self, client, auth_as_user, setup_data): + """Export JSON filtrado por dataset_id inclui metadados.""" + comment = setup_data["comments"][0] + ds = setup_data["dataset"] + + client.post( + "/annotate", + json={ + "comment_db_id": str(comment.id), + "label": "humano", + }, + ) + + resp = client.get(f"/annotate/export?format=json" f"&dataset_id={ds.id}") + assert resp.status_code == 200 + data = resp.json() + assert "dataset_id" in data + assert "dataset_name" in data + assert "video_id" in data + assert "annotations" in data + + def test_export_csv_com_dataset_id(self, client, auth_as_user, setup_data): + """Export CSV filtrado por dataset_id funciona.""" + comment = setup_data["comments"][0] + ds = setup_data["dataset"] + + client.post( + "/annotate", + json={ + "comment_db_id": str(comment.id), + "label": "humano", + }, + ) + + resp = client.get(f"/annotate/export?format=csv" f"&dataset_id={ds.id}") + assert resp.status_code == 200 + assert "text/csv" in resp.headers["content-type"] + lines = resp.text.strip().split("\n") + assert lines[0] == "comment_db_id,label,justificativa" + + +# 
------------------------------------------------------------------- +# Cobertura adicional — get_all_progress total_comments == 0 +# ------------------------------------------------------------------- + + +class TestGetAllProgressZeroComments: + def test_dataset_sem_comments_pulado_no_all_progress( + self, + client, + db, + auth_as_admin, + admin_user, + regular_user, + ): + """Dataset com 0 comentarios e ignorado no all-progress.""" + col = _make_collection(db, admin_user.id, video_id="vid_empty_prog") + # Dataset com entry mas sem comentarios reais + ds = Dataset( + name=f"empty_ap_{uuid.uuid4().hex[:6]}", + collection_id=col.id, + criteria_applied=["percentil"], + thresholds={}, + total_users_original=0, + total_users_selected=0, + created_by=admin_user.id, + ) + db.add(ds) + db.commit() + + resp = client.get("/annotate/all-progress") + assert resp.status_code == 200 + data = resp.json() + # Dataset sem comments nao aparece + ds_ids = [p["dataset_id"] for p in data] + assert str(ds.id) not in ds_ids diff --git a/backend/tests/test_auth.py b/backend/tests/test_auth.py index cdd095e..120503d 100644 --- a/backend/tests/test_auth.py +++ b/backend/tests/test_auth.py @@ -160,3 +160,173 @@ def test_refresh_com_access_token_retorna_401(client, db): json={"refresh_token": access_token}, ) assert response.status_code == 401 + + +# --------------------------------------------------------------------------- +# Cobertura adicional — get_current_user edge cases +# --------------------------------------------------------------------------- + + +def test_expired_token_returns_401_message(client, mocker): + """Stub: ExpiredSignatureError retorna detail 'Token expirado.'.""" + import jwt as pyjwt + + mocker.patch( + "services.auth.jwt.decode", + side_effect=pyjwt.ExpiredSignatureError, + ) + + resp = client.get( + "/users/", + headers={"Authorization": "Bearer expired"}, + ) + assert resp.status_code == 401 + assert resp.json()["detail"] == "Token expirado." 
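Review note: the test above relies on `side_effect` to make the patched decoder raise instead of returning claims. A minimal standard-library sketch of that mechanism follows. `TokenCodec`, `authenticate`, and the local `ExpiredSignatureError` are hypothetical stand-ins; the real handling lives in `services.auth`, which this diff does not show.

```python
from unittest import mock


class ExpiredSignatureError(Exception):
    """Stand-in for the JWT library's expiry error (hypothetical, not PyJWT's class)."""


class TokenCodec:
    """Hypothetical decoder; the project itself patches services.auth.jwt.decode."""

    def decode(self, token: str) -> dict:
        return {"sub": "alice", "type": "access"}


def authenticate(codec: TokenCodec, token: str) -> tuple[int, str]:
    """Map a decode failure to an HTTP-like (status, detail) pair."""
    try:
        claims = codec.decode(token)
    except ExpiredSignatureError:
        return 401, "Token expirado."
    return 200, claims["sub"]


codec = TokenCodec()

# Inside the context manager every call to decode raises the given exception.
with mock.patch.object(codec, "decode", side_effect=ExpiredSignatureError):
    patched = authenticate(codec, "expired")

# Leaving the block restores the original method.
restored = authenticate(codec, "any-token")
```

Passing an exception class (or instance) as `side_effect` is what lets the test reach the error branch without crafting a genuinely expired token.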
+ + +def test_generic_jwt_error_returns_401(client, mocker): + """Stub: PyJWTError genérico retorna 401 credenciais inválidas.""" + import jwt as pyjwt + + mocker.patch( + "services.auth.jwt.decode", + side_effect=pyjwt.PyJWTError("bad token"), + ) + + resp = client.get( + "/users/", + headers={"Authorization": "Bearer badtoken"}, + ) + assert resp.status_code == 401 + assert resp.json()["detail"] == "Credenciais inválidas." + + +def test_token_without_sub_returns_401(client, mocker): + """Stub: token sem campo 'sub' retorna 401.""" + mocker.patch( + "services.auth.jwt.decode", + return_value={"type": "access"}, + ) + + resp = client.get( + "/users/", + headers={"Authorization": "Bearer nosub"}, + ) + assert resp.status_code == 401 + assert resp.json()["detail"] == "Credenciais inválidas." + + +def test_token_for_nonexistent_user_returns_401(client, mocker): + """Stub: token válido mas usuário não existe no banco.""" + mocker.patch( + "services.auth.jwt.decode", + return_value={"sub": "ghost_user", "type": "access"}, + ) + + resp = client.get( + "/users/", + headers={"Authorization": "Bearer valid_but_ghost"}, + ) + assert resp.status_code == 401 + assert resp.json()["detail"] == "Credenciais inválidas." + + +def test_token_for_deactivated_user_returns_401(client, db, mocker): + """Stub: token válido mas usuário está desativado.""" + user = User( + username="deactivated", + name="Deactivated User", + hashed_password=get_password_hash("pass1234"), + role="user", + is_active=False, + ) + db.add(user) + db.commit() + + mocker.patch( + "services.auth.jwt.decode", + return_value={"sub": "deactivated", "type": "access"}, + ) + + resp = client.get( + "/users/", + headers={"Authorization": "Bearer deactivated_tok"}, + ) + assert resp.status_code == 401 + assert resp.json()["detail"] == "Credenciais inválidas." 
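Review note: the three tests above (missing `sub`, unknown user, deactivated user) all expect the same 401 detail. One way such a check could be written is sketched below, assuming a plain dict lookup in place of the database query. `User`, `resolve_user`, and `GENERIC_DETAIL` are illustrative names, not the project's API.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical stand-in for the project's User model."""

    username: str
    is_active: bool = True


GENERIC_DETAIL = "Credenciais inválidas."


def resolve_user(claims: dict, users: dict[str, User]) -> tuple[int, str]:
    """Return (status, detail-or-username) for already-decoded claims."""
    username = claims.get("sub")
    user = users.get(username) if username else None
    # Missing subject, unknown user, and inactive user share one response,
    # so a caller cannot tell which check failed.
    if user is None or not user.is_active:
        return 401, GENERIC_DETAIL
    return 200, user.username


users = {
    "alice": User("alice"),
    "deactivated": User("deactivated", is_active=False),
}

no_sub = resolve_user({"type": "access"}, users)
ghost = resolve_user({"sub": "ghost_user", "type": "access"}, users)
inactive = resolve_user({"sub": "deactivated", "type": "access"}, users)
valid = resolve_user({"sub": "alice", "type": "access"}, users)
```

Collapsing the three failures into one message is a common choice because it avoids revealing which usernames exist.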
+ + +# --------------------------------------------------------------------------- +# Cobertura adicional — refresh com usuário deletado +# --------------------------------------------------------------------------- + + +def test_refresh_with_deleted_user_returns_401(client, db): + """Refresh token válido mas o usuário foi desativado.""" + from services.auth import create_refresh_token + + user = User( + username="refreshgone", + name="Refresh Gone", + hashed_password=get_password_hash("password123"), + role="user", + ) + db.add(user) + db.commit() + + # Gerar refresh token diretamente (evita rate limit) + refresh_token = create_refresh_token( + data={ + "sub": user.username, + "role": user.role, + "name": user.name, + } + ) + + # Desativar o usuário + user.is_active = False + db.commit() + + resp = client.post( + "/auth/refresh", + json={"refresh_token": refresh_token}, + ) + assert resp.status_code == 401 + assert "não encontrado" in resp.json()["detail"].lower() or ( + "desativado" in resp.json()["detail"].lower() + ) + + +# --------------------------------------------------------------------------- +# Cobertura adicional — logout +# --------------------------------------------------------------------------- + + +def test_logout_returns_success(client, db): + """POST /auth/logout com token válido retorna mensagem de sucesso.""" + from services.auth import create_access_token + + user = User( + username="logoutuser", + name="Logout User", + hashed_password=get_password_hash("password123"), + role="user", + ) + db.add(user) + db.commit() + + # Gerar token diretamente (evita rate limit do login) + token = create_access_token( + data={ + "sub": user.username, + "role": user.role, + "name": user.name, + } + ) + + resp = client.post( + "/auth/logout", + headers={"Authorization": f"Bearer {token}"}, + ) + assert resp.status_code == 200 + assert "Logout" in resp.json()["detail"] diff --git a/backend/tests/test_clean.py b/backend/tests/test_clean.py index 
23857af..67245ac 100644 --- a/backend/tests/test_clean.py +++ b/backend/tests/test_clean.py @@ -613,3 +613,520 @@ def test_deleta_dataset(self, client, auth_as_user, completed_collection): def test_deleta_dataset_inexistente_retorna_404(self, client, auth_as_user): resp = client.delete(f"/clean/datasets/{uuid.uuid4()}") assert resp.status_code == 404 + + +# --------------------------------------------------------------------------- +# Testes unitários adicionais — cobertura 100% +# --------------------------------------------------------------------------- + + +class TestIdenticalSelector: + def test_constructor_e_select_vazio(self, db): + """IdenticalSelector com user_comments vazio retorna set vazio.""" + from services.clean.identical import IdenticalSelector + + selector = IdenticalSelector(db=db, collection_id=str(uuid.uuid4())) + assert selector.select({}) == set() + + def test_select_detecta_texto_identico_em_outra_coleta(self, db, admin_user): + """Detecta texto duplicado entre coletas diferentes.""" + from services.clean.identical import IdenticalSelector + + # Coleta 1 + col1 = Collection( + video_id="vid_A", + status="completed", + collected_by=admin_user.id, + total_comments=1, + ) + db.add(col1) + db.flush() + + c1 = Comment( + collection_id=col1.id, + comment_id="c1", + author_channel_id="UC_dup", + author_display_name="Dup", + text_original="texto repetido", + like_count=0, + reply_count=0, + published_at=datetime(2024, 1, 1), + updated_at=datetime(2024, 1, 1), + ) + db.add(c1) + db.flush() + + # Coleta 2 + col2 = Collection( + video_id="vid_B", + status="completed", + collected_by=admin_user.id, + total_comments=1, + ) + db.add(col2) + db.flush() + + c2 = Comment( + collection_id=col2.id, + comment_id="c2", + author_channel_id="UC_dup", + author_display_name="Dup", + text_original="texto repetido", + like_count=0, + reply_count=0, + published_at=datetime(2024, 1, 2), + updated_at=datetime(2024, 1, 2), + ) + db.add(c2) + db.commit() + + user_comments 
= {"UC_dup": [c1]} + selector = IdenticalSelector(db=db, collection_id=str(col1.id)) + selected = selector.select(user_comments) + assert "UC_dup" in selected + + def test_select_sem_match_em_outras_coletas(self, db, admin_user): + """Sem texto repetido em outra coleta, ninguem selecionado.""" + from services.clean.identical import IdenticalSelector + + col = Collection( + video_id="vid_solo", + status="completed", + collected_by=admin_user.id, + total_comments=1, + ) + db.add(col) + db.flush() + + c = Comment( + collection_id=col.id, + comment_id="c_solo", + author_channel_id="UC_solo", + author_display_name="Solo", + text_original="texto unico", + like_count=0, + reply_count=0, + published_at=datetime(2024, 1, 1), + updated_at=datetime(2024, 1, 1), + ) + db.add(c) + db.commit() + + user_comments = {"UC_solo": [c]} + selector = IdenticalSelector(db=db, collection_id=str(col.id)) + selected = selector.select(user_comments) + assert selected == set() + + +class TestProfileSelectorEdgeCases: + def test_empty_input_retorna_vazio(self): + """ProfileSelector com input vazio retorna set vazio.""" + from services.clean.profile import ProfileSelector + + assert ProfileSelector().select({}) == set() + + def test_naive_timezone_handling(self): + """Datetime naive recebe UTC antes de comparar.""" + from services.clean.profile import ProfileSelector + + c = Comment( + id=uuid.uuid4(), + collection_id=uuid.uuid4(), + comment_id="tz_test", + author_channel_id="UC_tz", + author_display_name="TZ", + text_original="hi", + like_count=0, + reply_count=0, + published_at=datetime(2024, 1, 1), + updated_at=datetime(2024, 1, 1), + # Canal antigo (2024-01-01) com datetime naive, sem tzinfo + author_channel_published_at=datetime(2024, 1, 1), + author_profile_image_url="https://normal.com/pic.jpg", + ) + user_comments = {"UC_tz": [c]} + # Canal antigo, nao recente: nao deve ser selecionado + selected = ProfileSelector().select(user_comments) + assert "UC_tz" not in selected + + def 
test_or_logic_default_avatar_ou_canal_recente(self): + """Basta avatar padrao OU canal recente para selecionar.""" + from datetime import UTC + + from services.clean.profile import ProfileSelector + + # Apenas avatar padrao + c1 = Comment( + id=uuid.uuid4(), + collection_id=uuid.uuid4(), + comment_id="avatar_test", + author_channel_id="UC_avatar", + author_display_name="Avatar", + text_original="hi", + like_count=0, + reply_count=0, + published_at=datetime(2024, 1, 1), + updated_at=datetime(2024, 1, 1), + author_profile_image_url=("https://yt4.ggpht.com/a/default-user=s48"), + author_channel_published_at=datetime(2020, 1, 1, tzinfo=UTC), + ) + user_comments = {"UC_avatar": [c1]} + selected = ProfileSelector().select(user_comments) + assert "UC_avatar" in selected + + +class TestShortCommentsSelectorEmpty: + def test_empty_comments_list_skip(self): + """Usuarios com lista vazia de comentarios sao ignorados.""" + selector = ShortCommentsSelector(threshold_chars=20) + user_comments = {"UC_empty": []} + selected = selector.select(user_comments) + assert "UC_empty" not in selected + + +class TestCentralMeasureSelectorEmpty: + def test_user_counts_vazio_retorna_vazio(self): + """CentralMeasureSelector com input vazio retorna set vazio.""" + selector = MeanSelector() + assert selector.select({}) == set() + + +class TestComputeCentralMeasuresSmallSample: + def test_q1_q3_com_menos_de_4_valores(self): + """Com <4 valores, Q1=min e Q3=max.""" + m = compute_central_measures({"A": 5, "B": 10}) + assert m["iqr_lower"] == 5.0 + assert m["iqr_upper"] == 10.0 + + +class TestGroupByUserExcludeChannel: + def test_exclui_video_channel_id(self): + """Comentarios do dono do canal sao excluidos.""" + comments = _make_comments({"OWNER": 5, "A": 3}) + groups = group_by_user(comments, exclude_channel_id="OWNER") + assert "OWNER" not in groups + assert "A" in groups + + +class TestBuildSelectorAllCases: + def test_all_selectors_and_unknown(self, db): + """Testa todos os cases do 
_build_selector.""" + from services.clean.service import _build_selector + + s = _build_selector( + "percentil", + threshold_chars=20, + threshold_seconds=30, + ) + assert isinstance(s, PercentileSelector) + + s = _build_selector( + "media", + threshold_chars=20, + threshold_seconds=30, + ) + assert isinstance(s, MeanSelector) + + s = _build_selector( + "moda", + threshold_chars=20, + threshold_seconds=30, + ) + from services.clean.mode import ModeSelector as MS + + assert isinstance(s, MS) + + s = _build_selector( + "mediana", + threshold_chars=20, + threshold_seconds=30, + ) + assert isinstance(s, MedianSelector) + + s = _build_selector( + "curtos", + threshold_chars=42, + threshold_seconds=30, + ) + assert isinstance(s, ShortCommentsSelector) + assert s.threshold_chars == 42 + + s = _build_selector( + "intervalo", + threshold_chars=20, + threshold_seconds=99, + ) + assert isinstance(s, TimeIntervalSelector) + assert s.threshold_seconds == 99 + + from services.clean.identical import IdenticalSelector + + s = _build_selector( + "identicos", + threshold_chars=20, + threshold_seconds=30, + db=db, + collection_id=uuid.uuid4(), + ) + assert isinstance(s, IdenticalSelector) + + from services.clean.profile import ProfileSelector + + s = _build_selector( + "perfil", + threshold_chars=20, + threshold_seconds=30, + ) + assert isinstance(s, ProfileSelector) + + with pytest.raises(ValueError, match="desconhecido"): + _build_selector( + "invalido", + threshold_chars=20, + threshold_seconds=30, + ) + + +class TestPreviewThresholdOutputs: + def test_preview_curtos_inclui_threshold_chars( + self, client, auth_as_user, completed_collection + ): + """Preview com criterio 'curtos' inclui threshold_chars.""" + resp = client.get( + "/clean/preview", + params={ + "collection_id": str(completed_collection.id), + "criteria": "curtos", + "threshold_chars": 50, + }, + ) + assert resp.status_code == 200 + data = resp.json() + entry = data["by_criteria"]["curtos"] + assert 
entry["threshold_chars"] == 50 + + def test_preview_intervalo_inclui_threshold_seconds( + self, client, auth_as_user, completed_collection + ): + """Preview com criterio 'intervalo' inclui threshold_seconds.""" + resp = client.get( + "/clean/preview", + params={ + "collection_id": str(completed_collection.id), + "criteria": "intervalo", + "threshold_seconds": 120, + }, + ) + assert resp.status_code == 200 + data = resp.json() + entry = data["by_criteria"]["intervalo"] + assert entry["threshold_seconds"] == 120 + + +# --------------------------------------------------------------------------- +# Testes de integração — Import de dataset +# --------------------------------------------------------------------------- + + +class TestImportDatasetEndpoint: + def test_import_dataset_valido(self, client, auth_as_user, completed_collection): + """Import de dataset com video_id valido cria dataset.""" + resp = client.post( + "/clean/import", + json={ + "dataset": { + "name": "imported_ds", + "video_id": "dQw4w9WgXcQ", + "criteria_applied": ["percentil"], + }, + "users": [ + { + "author_channel_id": "A", + "author_display_name": "User A", + "comment_count": 10, + "matched_criteria": ["percentil"], + }, + ], + }, + ) + assert resp.status_code == 201 + data = resp.json() + assert data["name"] == "imported_ds" + assert data["total_users_selected"] == 1 + + def test_import_dataset_video_id_inexistente_404(self, client, auth_as_user): + """Import com video_id sem coleta concluida retorna 404.""" + resp = client.post( + "/clean/import", + json={ + "dataset": { + "name": "ds_fail", + "video_id": "INEXISTENTE", + "criteria_applied": ["media"], + }, + "users": [ + { + "author_channel_id": "X", + "comment_count": 1, + }, + ], + }, + ) + assert resp.status_code == 404 + + def test_import_dataset_coleta_nao_concluida_404( + self, client, db, auth_as_user, admin_user + ): + """Import com coleta nao concluida retorna 404.""" + running = Collection( + video_id="running_vid", + 
status="running", + collected_by=admin_user.id, + ) + db.add(running) + db.commit() + + resp = client.post( + "/clean/import", + json={ + "dataset": { + "name": "ds_running", + "video_id": "running_vid", + "criteria_applied": [], + }, + "users": [ + { + "author_channel_id": "X", + "comment_count": 1, + }, + ], + }, + ) + assert resp.status_code == 404 + + def test_import_dataset_nome_duplicado_409( + self, client, auth_as_user, completed_collection + ): + """Import com nome de dataset ja existente retorna 409.""" + client.post( + "/clean/import", + json={ + "dataset": { + "name": "dup_name", + "video_id": "dQw4w9WgXcQ", + "criteria_applied": [], + }, + "users": [ + { + "author_channel_id": "A", + "comment_count": 1, + }, + ], + }, + ) + resp = client.post( + "/clean/import", + json={ + "dataset": { + "name": "dup_name", + "video_id": "dQw4w9WgXcQ", + "criteria_applied": [], + }, + "users": [ + { + "author_channel_id": "B", + "comment_count": 1, + }, + ], + }, + ) + assert resp.status_code == 409 + + +class TestImportDatasetChunkEndpoint: + def test_import_chunk_adiciona_usuarios( + self, client, auth_as_user, completed_collection + ): + """Import-chunk adiciona usuarios a dataset existente.""" + # Criar dataset via import + create_resp = client.post( + "/clean/import", + json={ + "dataset": { + "name": "chunk_ds", + "video_id": "dQw4w9WgXcQ", + "criteria_applied": ["media"], + }, + "users": [ + { + "author_channel_id": "A", + "comment_count": 5, + }, + ], + }, + ) + ds_id = create_resp.json()["dataset_id"] + + resp = client.post( + "/clean/import-chunk", + json={ + "dataset_id": ds_id, + "users": [ + { + "author_channel_id": "B", + "author_display_name": "User B", + "comment_count": 3, + "matched_criteria": ["media"], + }, + ], + "done": True, + }, + ) + assert resp.status_code == 200 + data = resp.json() + assert data["total_users"] == 2 + assert data["chunk_received"] == 1 + assert data["done"] is True + + def test_import_chunk_dataset_inexistente_404(self, 
client, auth_as_user): + """Import-chunk com dataset_id inexistente retorna 404.""" + resp = client.post( + "/clean/import-chunk", + json={ + "dataset_id": str(uuid.uuid4()), + "users": [ + { + "author_channel_id": "X", + "comment_count": 1, + }, + ], + "done": False, + }, + ) + assert resp.status_code == 404 + + +# ------------------------------------------------------------------- +# Cobertura adicional — identical.py branch not texts +# ------------------------------------------------------------------- + + +class TestIdenticalSelectorEmptyTexts: + def test_user_with_empty_comment_list_skipped(self, db, admin_user): + """Usuario com lista de comentarios vazia e ignorado.""" + from services.clean.identical import IdenticalSelector + + col = Collection( + video_id="vid_empty_txt", + status="completed", + collected_by=admin_user.id, + total_comments=0, + ) + db.add(col) + db.commit() + + # Lista vazia de comments gera texts vazio + user_comments: dict[str, list[Comment]] = {"UC_empty_txt": []} + selector = IdenticalSelector(db=db, collection_id=str(col.id)) + selected = selector.select(user_comments) + assert "UC_empty_txt" not in selected diff --git a/backend/tests/test_collect.py b/backend/tests/test_collect.py index 366d1aa..df5aa24 100644 --- a/backend/tests/test_collect.py +++ b/backend/tests/test_collect.py @@ -1,12 +1,27 @@ +import asyncio import logging import uuid from unittest.mock import AsyncMock, MagicMock import httpx import pytest +from fastapi import HTTPException from models.collection import Collection, Comment + +def _run(coro): + """Executa coroutine reutilizando o loop se possível.""" + try: + loop = asyncio.get_event_loop() + if loop.is_closed(): + raise RuntimeError + except RuntimeError: + loop = asyncio.new_event_loop() + asyncio.set_event_loop(loop) + return loop.run_until_complete(coro) + + # --------------------------------------------------------------------------- # Helpers # 
--------------------------------------------------------------------------- @@ -470,3 +485,1616 @@ def test_safe_int_converte_string_numerica(): assert _safe_int(None) is None assert _safe_int("abc") is None assert _safe_int("") is None + + +# =================================================================== +# Novos testes: cobertura 100% services/collect.py, routers/collect.py, +# services/youtube.py +# =================================================================== + + +# --------------------------------------------------------------------------- +# Helpers extras +# --------------------------------------------------------------------------- + + +def _make_collection(db, user_id, **overrides): + """Cria uma Collection no banco com valores padrão razoáveis.""" + from models.collection import Collection + + defaults = { + "video_id": "dQw4w9WgXcQ", + "status": "completed", + "collected_by": user_id, + "total_comments": 0, + } + defaults.update(overrides) + c = Collection(**defaults) + db.add(c) + db.commit() + db.refresh(c) + return c + + +def _make_comment(db, collection_id, idx, **overrides): + """Cria um Comment no banco com valores padrão.""" + from datetime import UTC, datetime + + from models.collection import Comment + + defaults = { + "collection_id": collection_id, + "comment_id": f"cmt_{idx}", + "author_display_name": f"user{idx}", + "author_channel_id": f"UC{idx}", + "text_original": f"text {idx}", + "like_count": 0, + "reply_count": 0, + "published_at": datetime(2024, 1, 1, tzinfo=UTC), + "updated_at": datetime(2024, 1, 1, tzinfo=UTC), + } + defaults.update(overrides) + cmt = Comment(**defaults) + db.add(cmt) + db.commit() + db.refresh(cmt) + return cmt + + +def _make_item_with_replies(idx: int, reply_count: int = 2) -> dict: + """Cria um item de commentThread COM replies inline.""" + item = _make_item(idx) + item["snippet"]["totalReplyCount"] = reply_count + replies = [] + for r in range(reply_count): + replies.append( + { + "id": 
f"reply_{idx}_{r}", + "snippet": { + "textOriginal": f"reply {idx}_{r}", + "textDisplay": f"reply {idx}_{r}", + "authorDisplayName": f"replier{r}", + "authorChannelId": {"value": f"UCR{r}"}, + "likeCount": 0, + "publishedAt": "2024-01-02T00:00:00Z", + "updatedAt": "2024-01-02T00:00:00Z", + }, + } + ) + item["replies"] = {"comments": replies} + return item + + +def _yt_error_exc( + status_code: int, + reason: str = "", + message: str = "", + *, + json_raises: bool = False, +): + """Constrói httpx.HTTPStatusError com resposta mockada.""" + mock_response = MagicMock() + mock_response.status_code = status_code + if json_raises: + mock_response.json.side_effect = ValueError("no json") + else: + errors = [{"reason": reason, "message": message}] if reason else [] + mock_response.json.return_value = {"error": {"errors": errors}} + return httpx.HTTPStatusError( + str(status_code), + request=MagicMock(), + response=mock_response, + ) + + +def _import_payload( + video_id="vid123", + n_comments=2, + done=True, +): + """Constrói payload JSON para POST /collect/import.""" + comments = [] + for i in range(n_comments): + comments.append( + { + "comment_id": f"imp_{i}", + "text_original": f"imported text {i}", + "published_at": "2024-06-01T00:00:00Z", + "updated_at": "2024-06-01T00:00:00Z", + } + ) + return { + "video": {"id": video_id, "title": "Test Video"}, + "comments": comments, + "done": done, + } + + +# --------------------------------------------------------------------------- +# _parse_youtube_error — branches não cobertas +# --------------------------------------------------------------------------- + + +class TestParseYoutubeError: + """Testes unitários de _parse_youtube_error.""" + + def test_json_decode_error_fallback(self): + """Cobre linhas 42-44: response.json() levanta exceção.""" + from services.collect import _parse_youtube_error + + exc = _yt_error_exc(500, json_raises=True) + result = _parse_youtube_error(exc) + assert result.status_code == 502 + assert "HTTP 
500" in result.detail + + def test_400_without_key_reason(self): + """Cobre linha 58: 400 sem keyInvalid/keyExpired.""" + from services.collect import _parse_youtube_error + + exc = _yt_error_exc(400, reason="badRequest") + result = _parse_youtube_error(exc) + assert result.status_code == 400 + assert "requisição inválida" in result.detail.lower() + + def test_403_video_not_found(self): + """Cobre linha 74: 403 com reason=videoNotFound.""" + from services.collect import _parse_youtube_error + + exc = _yt_error_exc(403, reason="videoNotFound") + result = _parse_youtube_error(exc) + assert result.status_code == 400 + assert "privado" in result.detail.lower() + + def test_403_unknown_reason_fallback(self): + """Cobre linhas 87-94: 403 com reason desconhecida.""" + from services.collect import _parse_youtube_error + + exc = _yt_error_exc(403, reason="somethingNew") + result = _parse_youtube_error(exc) + assert result.status_code == 403 + assert "somethingNew" in result.detail + + def test_404_returns_video_not_found(self): + """Cobre linhas 92-93: status 404.""" + from services.collect import _parse_youtube_error + + exc = _yt_error_exc(404) + result = _parse_youtube_error(exc) + assert result.status_code == 404 + assert "não encontrado" in result.detail.lower() + + @pytest.mark.parametrize( + "status_code", + [500, 502, 503], + ids=["500", "502", "503"], + ) + def test_other_status_returns_502(self, status_code): + """Cobre linha 94: fallback para códigos não mapeados.""" + from services.collect import _parse_youtube_error + + exc = _yt_error_exc(status_code) + result = _parse_youtube_error(exc) + assert result.status_code == 502 + + def test_400_empty_errors_list(self): + """Cobre quando errors list está vazia (reason='').""" + from services.collect import _parse_youtube_error + + exc = _yt_error_exc(400, reason="") + result = _parse_youtube_error(exc) + assert result.status_code == 400 + assert "requisição inválida" in result.detail.lower() + + +# 
--------------------------------------------------------------------------- +# _bulk_insert — empty rows +# --------------------------------------------------------------------------- + + +def test_bulk_insert_empty_rows_returns_zero(db): + """Cobre linha 137: _bulk_insert com lista vazia retorna 0.""" + from services.collect import _bulk_insert + + assert _bulk_insert(db, []) == 0 + + +# --------------------------------------------------------------------------- +# _insert_comments — comment with replies +# --------------------------------------------------------------------------- + + +def test_insert_comments_with_inline_replies(db, regular_user): + """Cobre linha 162: comentário com replies inline.""" + from services.collect import _insert_comments + + col = _make_collection(db, regular_user.id, status="running") + items = [_make_item_with_replies(0, reply_count=3)] + inserted = _insert_comments(db, col.id, items) + # 1 top-level + 3 replies = 4 + assert inserted == 4 + total = db.query(Comment).filter(Comment.collection_id == col.id).count() + assert total == 4 + # Confirmar que replies têm parent_id + replies = ( + db.query(Comment) + .filter( + Comment.collection_id == col.id, + Comment.parent_id.isnot(None), + ) + .all() + ) + assert len(replies) == 3 + + +# --------------------------------------------------------------------------- +# _populate_video_metadata +# --------------------------------------------------------------------------- + + +def test_populate_video_metadata(db, regular_user): + """Cobre linhas 175-187: preenchendo campos de metadados.""" + from services.collect import _populate_video_metadata + + col = _make_collection(db, regular_user.id) + video_info = { + "snippet": { + "title": "Test Video Title", + "description": "A description", + "channelId": "UCxyz", + "channelTitle": "Test Channel", + "publishedAt": "2023-06-15T10:30:00Z", + }, + "statistics": { + "viewCount": "1000", + "likeCount": "50", + "commentCount": "10", + }, + } + 
_populate_video_metadata(col, video_info) + assert col.video_title == "Test Video Title" + assert col.video_description == "A description" + assert col.video_channel_id == "UCxyz" + assert col.video_channel_title == "Test Channel" + assert col.video_published_at is not None + assert col.video_view_count == 1000 + assert col.video_like_count == 50 + assert col.video_comment_count == 10 + + +def test_populate_video_metadata_no_published_at(db, regular_user): + """Cobre branch: publishedAt ausente → None.""" + from services.collect import _populate_video_metadata + + col = _make_collection(db, regular_user.id) + video_info = {"snippet": {}, "statistics": {}} + _populate_video_metadata(col, video_info) + assert col.video_published_at is None + assert col.video_view_count is None + + +# --------------------------------------------------------------------------- +# collect_next_page — branches +# --------------------------------------------------------------------------- + + +def test_collect_next_page_not_found(db, regular_user): + """Cobre linha 271: collection não encontrada → 404.""" + from services.collect import collect_next_page + + payload = MagicMock() + payload.collection_id = uuid.uuid4() + payload.api_key = MagicMock() + payload.api_key.get_secret_value.return_value = "KEY" + with pytest.raises(HTTPException) as exc_info: + _run(collect_next_page(db, payload, regular_user.id)) + assert exc_info.value.status_code == 404 + + +def test_collect_next_page_already_completed(db, regular_user): + """Cobre linha 273: collection já completed → retorna (col, None).""" + from services.collect import collect_next_page + + col = _make_collection(db, regular_user.id, status="completed") + payload = MagicMock() + payload.collection_id = col.id + payload.api_key = MagicMock() + payload.api_key.get_secret_value.return_value = "KEY" + result_col, token = _run(collect_next_page(db, payload, regular_user.id)) + assert result_col.status == "completed" + assert token is None + + 
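Review note: the `collect_next_page` tests in this file drive an async service from synchronous test functions and replace the page fetcher with `AsyncMock`. A minimal standard-library sketch of that pattern follows. `next_page` is a hypothetical stand-in rather than the project's service, and `asyncio.run` is used here where the diff uses its own `_run` helper.

```python
import asyncio
from unittest.mock import AsyncMock


async def next_page(fetch, token: str):
    """Hypothetical stand-in: fetch one page and return the follow-up token."""
    page = await fetch(token)
    # A missing nextPageToken means there is nothing left to collect.
    return page.get("nextPageToken")


# AsyncMock returns an awaitable, so it can replace an `async def` fetcher.
fetch_more = AsyncMock(return_value={"items": [], "nextPageToken": "TOKEN_B"})
fetch_last = AsyncMock(return_value={"items": []})

following = asyncio.run(next_page(fetch_more, "TOKEN_A"))
finished = asyncio.run(next_page(fetch_last, "TOKEN_A"))

# The mock records how it was awaited, which a test can assert on.
fetch_more.assert_awaited_once_with("TOKEN_A")
```

`asyncio.run` creates and closes a fresh event loop per call, so it only works when no loop is already running in the calling thread. A helper that reuses an existing loop, like `_run`, trades that isolation for fewer loop creations.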
+def test_collect_next_page_failed_raises_400(db, regular_user): + """Cobre linha 275: collection falhou → 400.""" + from services.collect import collect_next_page + + col = _make_collection(db, regular_user.id, status="failed") + payload = MagicMock() + payload.collection_id = col.id + payload.api_key = MagicMock() + payload.api_key.get_secret_value.return_value = "KEY" + with pytest.raises(HTTPException) as exc_info: + _run(collect_next_page(db, payload, regular_user.id)) + assert exc_info.value.status_code == 400 + assert "falhou" in exc_info.value.detail.lower() + + +def test_collect_next_page_no_token_marks_complete(db, regular_user): + """Cobre linhas 280-285: sem next_page_token → marca completed.""" + from services.collect import collect_next_page + + col = _make_collection( + db, + regular_user.id, + status="running", + next_page_token=None, + ) + payload = MagicMock() + payload.collection_id = col.id + payload.api_key = MagicMock() + payload.api_key.get_secret_value.return_value = "KEY" + result_col, token = _run(collect_next_page(db, payload, regular_user.id)) + assert result_col.status == "completed" + assert result_col.enrich_status == "pending" + assert token is None + + +def test_collect_next_page_updates_token(db, regular_user, mocker): + """Cobre linha 310: atualiza next_page_token para próxima página.""" + from services.collect import collect_next_page + + col = _make_collection( + db, + regular_user.id, + status="running", + next_page_token="TOKEN_A", + ) + mocker.patch( + "services.collect.fetch_comments_page", + new=AsyncMock(return_value=_page([_make_item(10)], next_token="TOKEN_B")), + ) + payload = MagicMock() + payload.collection_id = col.id + payload.api_key = MagicMock() + payload.api_key.get_secret_value.return_value = "KEY" + result_col, token = _run(collect_next_page(db, payload, regular_user.id)) + assert token == "TOKEN_B" + assert result_col.next_page_token == "TOKEN_B" + assert result_col.status == "running" + + +def 
test_collect_next_page_completes_when_no_more(db, regular_user, mocker): + """Covers lines 304-308: next page has no nextPageToken.""" + from services.collect import collect_next_page + + col = _make_collection( + db, + regular_user.id, + status="running", + next_page_token="TOKEN_A", + ) + mocker.patch( + "services.collect.fetch_comments_page", + new=AsyncMock(return_value=_page([_make_item(20)])), + ) + payload = MagicMock() + payload.collection_id = col.id + payload.api_key = MagicMock() + payload.api_key.get_secret_value.return_value = "KEY" + result_col, token = _run(collect_next_page(db, payload, regular_user.id)) + assert result_col.status == "completed" + assert result_col.enrich_status == "pending" + assert token is None + + +def test_collect_next_page_http_error(db, regular_user, mocker): + """Covers lines 315-324: HTTPStatusError in next_page.""" + from services.collect import collect_next_page + + col = _make_collection( + db, + regular_user.id, + status="running", + next_page_token="TOKEN_A", + ) + exc = _yt_error_exc(403, reason="quotaExceeded") + mocker.patch( + "services.collect.fetch_comments_page", + new=AsyncMock(side_effect=exc), + ) + payload = MagicMock() + payload.collection_id = col.id + payload.api_key = MagicMock() + payload.api_key.get_secret_value.return_value = "KEY" + with pytest.raises(HTTPException) as exc_info: + _run(collect_next_page(db, payload, regular_user.id)) + assert exc_info.value.status_code == 429 + db.refresh(col) + assert col.status == "failed" + + +def test_collect_next_page_generic_exception(db, regular_user, mocker): + """Covers lines 325-333: generic Exception in next_page.""" + from services.collect import collect_next_page + + col = _make_collection( + db, + regular_user.id, + status="running", + next_page_token="TOKEN_A", + ) + mocker.patch( + "services.collect.fetch_comments_page", + new=AsyncMock(side_effect=RuntimeError("boom")), + ) + payload = MagicMock() + payload.collection_id = col.id + payload.api_key 
= MagicMock() + payload.api_key.get_secret_value.return_value = "KEY" + with pytest.raises(HTTPException) as exc_info: + _run(collect_next_page(db, payload, regular_user.id)) + assert exc_info.value.status_code == 500 + db.refresh(col) + assert col.status == "failed" + + +# --------------------------------------------------------------------------- +# _threads_needing_replies & _channels_needing_dates +# --------------------------------------------------------------------------- + + +def test_threads_needing_replies(db, regular_user): + """Covers lines 352-380: SQL query for threads with missing replies.""" + from services.collect import _threads_needing_replies + + col = _make_collection(db, regular_user.id) + # Top-level with reply_count=3, but no replies in the database + _make_comment( + db, + col.id, + "thr1", + comment_id="thread_1", + reply_count=3, + parent_id=None, + ) + # Top-level with reply_count=0: must not appear + _make_comment( + db, + col.id, + "thr2", + comment_id="thread_2", + reply_count=0, + parent_id=None, + ) + result = _threads_needing_replies(db, col.id) + assert len(result) == 1 + assert result[0] == ("thread_1", 3) + + +def test_threads_needing_replies_already_fetched(db, regular_user): + """Thread with all replies already inserted does not appear.""" + from services.collect import _threads_needing_replies + + col = _make_collection(db, regular_user.id) + _make_comment( + db, + col.id, + "thr3", + comment_id="thread_3", + reply_count=1, + parent_id=None, + ) + # 1 reply already exists + _make_comment( + db, + col.id, + "rep3", + comment_id="reply_3_0", + parent_id="thread_3", + reply_count=0, + ) + result = _threads_needing_replies(db, col.id) + assert len(result) == 0 + + +def test_channels_needing_dates(db, regular_user): + """Covers lines 389-400: channels without published_at.""" + from services.collect import _channels_needing_dates + + col = _make_collection(db, regular_user.id) + _make_comment( + db, + col.id, + "ch1", + author_channel_id="UCAAA", + 
author_channel_published_at=None, + ) + _make_comment( + db, + col.id, + "ch2", + author_channel_id="UCBBB", + author_channel_published_at=None, + ) + result = _channels_needing_dates(db, col.id) + assert set(result) == {"UCAAA", "UCBBB"} + + +def test_channels_needing_dates_excludes_populated(db, regular_user): + """Channels with published_at populated do not appear.""" + from datetime import UTC, datetime + + from services.collect import _channels_needing_dates + + col = _make_collection(db, regular_user.id) + _make_comment( + db, + col.id, + "chp", + author_channel_id="UCPOP", + author_channel_published_at=datetime(2020, 1, 1, tzinfo=UTC), + ) + result = _channels_needing_dates(db, col.id) + assert result == [] + + +# --------------------------------------------------------------------------- +# enrich_collection: all 3 phases + error handling +# --------------------------------------------------------------------------- + + +def test_enrich_collection_not_found(db): + """Covers lines 414-415: collection not found.""" + from services.collect import enrich_collection + + with pytest.raises(HTTPException) as exc_info: + _run(enrich_collection(db, uuid.uuid4(), "KEY")) + assert exc_info.value.status_code == 404 + + +def test_enrich_collection_not_completed(db, regular_user): + """Covers lines 416-420: collection not completed → 400.""" + from services.collect import enrich_collection + + col = _make_collection(db, regular_user.id, status="running") + with pytest.raises(HTTPException) as exc_info: + _run(enrich_collection(db, col.id, "KEY")) + assert exc_info.value.status_code == 400 + + +def test_enrich_collection_already_done(db, regular_user): + """Covers lines 421-427: enrich_status=done returns immediately.""" + from services.collect import enrich_collection + + col = _make_collection( + db, + regular_user.id, + status="completed", + enrich_status="done", + ) + result = _run(enrich_collection(db, col.id, "KEY")) + assert result["done"] is True + assert 
result["phase"] == "channels" + + +def test_enrich_phase_video(db, regular_user, mocker): + """Covers enrich phase 0: video metadata.""" + from services.collect import enrich_collection + + col = _make_collection( + db, + regular_user.id, + status="completed", + enrich_status="pending", + video_title=None, + ) + mocker.patch( + "services.collect.fetch_video_info", + new=AsyncMock( + return_value={ + "snippet": { + "title": "Test", + "channelId": "UCx", + "channelTitle": "Ch", + "publishedAt": "2024-01-01T00:00:00Z", + }, + "statistics": {"viewCount": "100"}, + } + ), + ) + result = _run(enrich_collection(db, col.id, "KEY")) + assert result["phase"] == "video" + assert result["processed"] == 1 + assert result["done"] is False + db.refresh(col) + assert col.video_title == "Test" + assert col.enrich_status == "enriching" + + +def test_enrich_phase_video_no_info(db, regular_user, mocker): + """Phase 0: fetch_video_info returns None → processed=0.""" + from services.collect import enrich_collection + + col = _make_collection( + db, + regular_user.id, + status="completed", + enrich_status="pending", + video_title=None, + ) + mocker.patch( + "services.collect.fetch_video_info", + new=AsyncMock(return_value=None), + ) + result = _run(enrich_collection(db, col.id, "KEY")) + assert result["phase"] == "video" + assert result["processed"] == 0 + + +def test_enrich_phase_replies(db, regular_user, mocker): + """Covers phase 1: fetches extra replies.""" + from services.collect import enrich_collection + + col = _make_collection( + db, + regular_user.id, + status="completed", + enrich_status="enriching", + video_title="Already Set", + ) + # Top-level with 2 replies but none in the database + _make_comment( + db, + col.id, + "thr_e", + comment_id="thread_enrich", + reply_count=2, + parent_id=None, + ) + col.total_comments = 1 + db.commit() + + mocker.patch( + "services.collect.fetch_replies_page", + new=AsyncMock( + return_value={ + "items": [ + { + "id": "reply_e_0", + "snippet": { + 
"textOriginal": "r0", + "textDisplay": "r0", + "authorDisplayName": "u0", + "authorChannelId": {"value": "UC0"}, + "likeCount": 0, + "publishedAt": "2024-01-02T00:00:00Z", + "updatedAt": "2024-01-02T00:00:00Z", + }, + }, + { + "id": "reply_e_1", + "snippet": { + "textOriginal": "r1", + "textDisplay": "r1", + "authorDisplayName": "u1", + "authorChannelId": {"value": "UC1"}, + "likeCount": 0, + "publishedAt": "2024-01-02T00:00:00Z", + "updatedAt": "2024-01-02T00:00:00Z", + }, + }, + ], + "nextPageToken": None, + } + ), + ) + result = _run(enrich_collection(db, col.id, "KEY")) + assert result["phase"] == "replies" + assert result["processed"] == 1 + assert result["done"] is False + + +def test_enrich_phase_channels(db, regular_user, mocker): + """Cobre fase 2: busca datas de canal.""" + from datetime import UTC, datetime + + from services.collect import enrich_collection + + col = _make_collection( + db, + regular_user.id, + status="completed", + enrich_status="enriching", + video_title="Set", + ) + _make_comment( + db, + col.id, + "ch_e", + author_channel_id="UCENRICH", + author_channel_published_at=None, + ) + + mocker.patch( + "services.collect.fetch_channels_info", + new=AsyncMock(return_value={"UCENRICH": datetime(2020, 5, 1, tzinfo=UTC)}), + ) + result = _run(enrich_collection(db, col.id, "KEY")) + assert result["phase"] == "channels" + assert result["processed"] == 1 + assert result["done"] is False + + +def test_enrich_completes_all_phases(db, regular_user, mocker): + """Cobre linhas 493-506: tudo concluído → done=True.""" + from services.collect import enrich_collection + + col = _make_collection( + db, + regular_user.id, + status="completed", + enrich_status="enriching", + video_title="Set", + ) + # Nenhum thread pendente, nenhum canal pendente + result = _run(enrich_collection(db, col.id, "KEY")) + assert result["done"] is True + assert result["phase"] == "channels" + db.refresh(col) + assert col.enrich_status == "done" + assert col.channel_dates_failed is 
False + + +def test_enrich_http_error_raises(db, regular_user, mocker): + """Covers lines 508-514: HTTPStatusError in enrich.""" + from services.collect import enrich_collection + + col = _make_collection( + db, + regular_user.id, + status="completed", + enrich_status="pending", + video_title=None, + ) + exc = _yt_error_exc(403, reason="quotaExceeded") + mocker.patch( + "services.collect.fetch_video_info", + new=AsyncMock(side_effect=exc), + ) + with pytest.raises(HTTPException) as exc_info: + _run(enrich_collection(db, col.id, "KEY")) + assert exc_info.value.status_code == 429 + + +def test_enrich_generic_exception(db, regular_user, mocker): + """Covers lines 515-525: generic Exception in enrich.""" + from services.collect import enrich_collection + + col = _make_collection( + db, + regular_user.id, + status="completed", + enrich_status="pending", + video_title=None, + ) + mocker.patch( + "services.collect.fetch_video_info", + new=AsyncMock(side_effect=RuntimeError("fail")), + ) + with pytest.raises(HTTPException) as exc_info: + _run(enrich_collection(db, col.id, "KEY")) + assert exc_info.value.status_code == 500 + + +# --------------------------------------------------------------------------- +# _fetch_thread_replies: paginated +# --------------------------------------------------------------------------- + + +def test_fetch_thread_replies_paginated(db, regular_user, mocker): + """Covers lines 535-552: reply pagination.""" + from services.collect import _fetch_thread_replies + + col = _make_collection(db, regular_user.id) + + def _reply_item(rid): + return { + "id": rid, + "snippet": { + "textOriginal": f"text {rid}", + "textDisplay": f"text {rid}", + "authorDisplayName": "user", + "authorChannelId": {"value": "UCX"}, + "likeCount": 0, + "publishedAt": "2024-01-01T00:00:00Z", + "updatedAt": "2024-01-01T00:00:00Z", + }, + } + + mocker.patch( + "services.collect.fetch_replies_page", + new=AsyncMock( + side_effect=[ + { + "items": [_reply_item("rp1")], + 
"nextPageToken": "PAGE2", + }, + { + "items": [_reply_item("rp2")], + "nextPageToken": None, + }, + ] + ), + ) + inserted = _run(_fetch_thread_replies(db, col.id, "parent_1", "KEY")) + assert inserted == 2 + + +# --------------------------------------------------------------------------- +# _enrich_channel_dates — batch + epoch fallback +# --------------------------------------------------------------------------- + + +def test_enrich_channel_dates_with_epoch_fallback(db, regular_user, mocker): + """Cobre linhas 565-593: canais não encontrados recebem epoch.""" + from datetime import UTC, datetime + + from services.collect import _enrich_channel_dates + + col = _make_collection(db, regular_user.id) + _make_comment( + db, + col.id, + "cd1", + author_channel_id="UCFOUND", + author_channel_published_at=None, + ) + _make_comment( + db, + col.id, + "cd2", + author_channel_id="UCMISSING", + author_channel_published_at=None, + ) + mocker.patch( + "services.collect.fetch_channels_info", + new=AsyncMock(return_value={"UCFOUND": datetime(2019, 3, 15, tzinfo=UTC)}), + ) + success = _run(_enrich_channel_dates(db, col.id, ["UCFOUND", "UCMISSING"], "KEY")) + assert success is True + + found = db.query(Comment).filter(Comment.author_channel_id == "UCFOUND").first() + # DB pode retornar naive ou aware — comparar só data + assert found.author_channel_published_at.year == 2019 + assert found.author_channel_published_at.month == 3 + assert found.author_channel_published_at.day == 15 + missing = db.query(Comment).filter(Comment.author_channel_id == "UCMISSING").first() + # Epoch fallback (1970-01-01) + assert missing.author_channel_published_at.year == 1970 + assert missing.author_channel_published_at.month == 1 + assert missing.author_channel_published_at.day == 1 + + +def test_enrich_channel_dates_empty_list(): + """Cobre linha 565: lista vazia retorna True.""" + from services.collect import _enrich_channel_dates + + result = _run(_enrich_channel_dates(MagicMock(), uuid.uuid4(), [], 
"KEY")) + assert result is True + + +def test_enrich_channel_dates_exception_returns_false(db, regular_user, mocker): + """Cobre linhas 586-593: exceção → retorna False.""" + from services.collect import _enrich_channel_dates + + col = _make_collection(db, regular_user.id) + _make_comment( + db, + col.id, + "cde", + author_channel_id="UCERR", + author_channel_published_at=None, + ) + mocker.patch( + "services.collect.fetch_channels_info", + new=AsyncMock(side_effect=Exception("API down")), + ) + success = _run(_enrich_channel_dates(db, col.id, ["UCERR"], "KEY")) + assert success is False + + +# --------------------------------------------------------------------------- +# import_collection +# --------------------------------------------------------------------------- + + +def test_import_collection_creates_and_persists(client, db, auth_as_user): + """Cobre linhas 613-654: import com metadata e comentários.""" + payload = _import_payload(n_comments=3, done=True) + resp = client.post("/collect/import", json=payload) + assert resp.status_code == 201 + data = resp.json() + assert data["video_id"] == "vid123" + assert data["status"] == "completed" + # Confirmar que comentários foram inseridos + total = db.query(Comment).count() + assert total == 3 + + +def test_import_collection_not_done(client, db, auth_as_user): + """Import com done=False cria collection como 'importing'.""" + payload = _import_payload(done=False) + resp = client.post("/collect/import", json=payload) + assert resp.status_code == 201 + data = resp.json() + assert data["status"] == "importing" + + +# --------------------------------------------------------------------------- +# import_chunk +# --------------------------------------------------------------------------- + + +def test_import_chunk_appends_comments(client, db, auth_as_user): + """Cobre linhas 664-699: append de batch.""" + # Primeiro, criar a coleta via import + payload = _import_payload(n_comments=2, done=False) + resp = 
client.post("/collect/import", json=payload) + collection_id = resp.json()["collection_id"] + + # Send chunk + chunk_payload = { + "collection_id": collection_id, + "comments": [ + { + "comment_id": "chunk_1", + "text_original": "chunk text", + "published_at": "2024-06-01T00:00:00Z", + "updated_at": "2024-06-01T00:00:00Z", + } + ], + "done": True, + } + resp2 = client.post("/collect/import-chunk", json=chunk_payload) + assert resp2.status_code == 200 + data = resp2.json() + assert data["total_comments"] == 3 + assert data["chunk_received"] == 1 + assert data["done"] is True + + # Confirm the status changed + col = db.query(Collection).filter(Collection.id == uuid.UUID(collection_id)).first() + assert col.status == "completed" + + +def test_import_chunk_not_found(client, auth_as_user): + """Covers import_chunk with a nonexistent collection_id.""" + chunk_payload = { + "collection_id": str(uuid.uuid4()), + "comments": [ + { + "comment_id": "x", + "text_original": "x", + "published_at": "2024-06-01T00:00:00Z", + "updated_at": "2024-06-01T00:00:00Z", + } + ], + "done": False, + } + resp = client.post("/collect/import-chunk", json=chunk_payload) + assert resp.status_code == 404 + + +# --------------------------------------------------------------------------- +# delete_collection +# --------------------------------------------------------------------------- + + +def test_delete_collection_success(client, db, auth_as_user, stub_youtube_3_comments): + """Covers lines 716-720: successful delete.""" + resp = client.post( + "/collect", + json={ + "video_id": "dQw4w9WgXcQ", + "api_key": "AIzaFAKE", + }, + ) + cid = resp.json()["collection_id"] + del_resp = client.delete(f"/collect/{cid}") + assert del_resp.status_code == 204 + assert db.query(Collection).count() == 0 + assert db.query(Comment).count() == 0 + + +def test_delete_collection_not_found(client, auth_as_user): + """Covers line 710: collection not found → 404.""" + resp = client.delete(f"/collect/{uuid.uuid4()}") + 
assert resp.status_code == 404 + + +def test_delete_collection_running_returns_409(client, db, auth_as_user, mocker): + """Covers lines 711-715: running collection → 409.""" + items = [_make_item(i) for i in range(2)] + mocker.patch( + "services.collect.fetch_comments_page", + new=AsyncMock(return_value=_page(items, next_token="HAS_MORE")), + ) + resp = client.post( + "/collect", + json={ + "video_id": "dQw4w9WgXcQ", + "api_key": "AIzaFAKE", + }, + ) + cid = resp.json()["collection_id"] + del_resp = client.delete(f"/collect/{cid}") + assert del_resp.status_code == 409 + + +# --------------------------------------------------------------------------- +# export_comments_iter +# --------------------------------------------------------------------------- + + +def test_export_comments_iter_streaming(db, regular_user): + """Covers line 725: generator with yield_per.""" + from services.collect import export_comments_iter + + col = _make_collection(db, regular_user.id) + for i in range(5): + _make_comment(db, col.id, f"exp_{i}") + results = list(export_comments_iter(db, col.id)) + assert len(results) == 5 + + +# --------------------------------------------------------------------------- +# routers/collect.py: export endpoints (JSON + CSV) +# --------------------------------------------------------------------------- + + +def test_export_json_streaming(client, db, auth_as_user, stub_youtube_3_comments): + """Covers lines 237-288: full JSON export.""" + import json as json_mod + + resp = client.post( + "/collect", + json={ + "video_id": "dQw4w9WgXcQ", + "api_key": "AIzaFAKE", + }, + ) + cid = resp.json()["collection_id"] + export_resp = client.get(f"/collect/{cid}/export?format=json") + assert export_resp.status_code == 200 + assert "application/json" in export_resp.headers["content-type"] + data = json_mod.loads(export_resp.text) + assert "video" in data + assert "comments" in data + assert len(data["comments"]) == 3 + assert data["video"]["id"] == "dQw4w9WgXcQ" + + +def 
test_export_csv_streaming(client, db, auth_as_user, stub_youtube_3_comments): + """Covers lines 240-262: CSV export with header and BOM.""" + resp = client.post( + "/collect", + json={ + "video_id": "dQw4w9WgXcQ", + "api_key": "AIzaFAKE", + }, + ) + cid = resp.json()["collection_id"] + export_resp = client.get(f"/collect/{cid}/export?format=csv") + assert export_resp.status_code == 200 + assert "text/csv" in export_resp.headers["content-type"] + # BOM + assert export_resp.text.startswith("\ufeff") + lines = export_resp.text.strip().split("\n") + # Header + 3 data rows + assert len(lines) == 4 + assert "comment_id" in lines[0] + + +def test_export_not_found(client, auth_as_user): + """Export of a nonexistent collection returns 404.""" + resp = client.get(f"/collect/{uuid.uuid4()}/export?format=json") + assert resp.status_code == 404 + + +# --------------------------------------------------------------------------- +# routers/collect.py: enrich endpoint +# --------------------------------------------------------------------------- + + +def test_enrich_endpoint_via_router(client, db, auth_as_user, mocker): + """Covers lines 131-136: POST /{id}/enrich via router.""" + # Create a completed collection with video_title already set + col = _make_collection( + db, + auth_as_user.id, + status="completed", + enrich_status="enriching", + video_title="Set", + ) + # No pending threads or channels → done + resp = client.post( + f"/collect/{col.id}/enrich", + json={"api_key": "AIzaFAKEKEY"}, + ) + assert resp.status_code == 200 + data = resp.json() + assert data["done"] is True + + +def test_enrich_endpoint_not_completed(client, db, auth_as_user): + """Enrich on a non-completed collection returns 400.""" + col = _make_collection(db, auth_as_user.id, status="running") + resp = client.post( + f"/collect/{col.id}/enrich", + json={"api_key": "AIzaFAKEKEY"}, + ) + assert resp.status_code == 400 + + +# --------------------------------------------------------------------------- +# services/youtube.py: 
fetch_comments_page +# --------------------------------------------------------------------------- + + +def test_fetch_comments_page_constructs_params(mocker): + """Covers lines 15-32: parameter construction and call.""" + from services.youtube import fetch_comments_page + + mock_resp = MagicMock() + mock_resp.status_code = 200 + mock_resp.json.return_value = { + "items": [], + "nextPageToken": None, + } + mock_resp.raise_for_status = MagicMock() + + mock_client = AsyncMock() + mock_client.get = AsyncMock(return_value=mock_resp) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=False) + + mocker.patch( + "services.youtube.httpx.AsyncClient", + return_value=mock_client, + ) + + result = _run(fetch_comments_page("vid123", "KEY", max_results=50)) + assert result == {"items": [], "nextPageToken": None} + call_kwargs = mock_client.get.call_args + params = call_kwargs.kwargs.get("params", call_kwargs[1].get("params", {})) + assert params["videoId"] == "vid123" + assert params["maxResults"] == 50 + assert "pageToken" not in params + + +def test_fetch_comments_page_with_page_token(mocker): + """Covers line 23: pageToken added when present.""" + from services.youtube import fetch_comments_page + + mock_resp = MagicMock() + mock_resp.status_code = 200 + mock_resp.json.return_value = {"items": []} + mock_resp.raise_for_status = MagicMock() + + mock_client = AsyncMock() + mock_client.get = AsyncMock(return_value=mock_resp) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=False) + + mocker.patch( + "services.youtube.httpx.AsyncClient", + return_value=mock_client, + ) + + _run(fetch_comments_page("vid123", "KEY", page_token="ABC")) + call_kwargs = mock_client.get.call_args + params = call_kwargs.kwargs.get("params", call_kwargs[1].get("params", {})) + assert params["pageToken"] == "ABC" + + +# 
--------------------------------------------------------------------------- +# services/youtube.py: fetch_video_info +# --------------------------------------------------------------------------- + + +def test_fetch_video_info_returns_first_item(mocker): + """Covers lines 37-49: returns the first item.""" + from services.youtube import fetch_video_info + + mock_resp = MagicMock() + mock_resp.json.return_value = {"items": [{"id": "vid", "snippet": {"title": "T"}}]} + mock_resp.raise_for_status = MagicMock() + + mock_client = AsyncMock() + mock_client.get = AsyncMock(return_value=mock_resp) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=False) + + mocker.patch( + "services.youtube.httpx.AsyncClient", + return_value=mock_client, + ) + + result = _run(fetch_video_info("vid", "KEY")) + assert result["snippet"]["title"] == "T" + + +def test_fetch_video_info_returns_none_empty(mocker): + """Covers line 49: empty items → None.""" + from services.youtube import fetch_video_info + + mock_resp = MagicMock() + mock_resp.json.return_value = {"items": []} + mock_resp.raise_for_status = MagicMock() + + mock_client = AsyncMock() + mock_client.get = AsyncMock(return_value=mock_resp) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=False) + + mocker.patch( + "services.youtube.httpx.AsyncClient", + return_value=mock_client, + ) + + result = _run(fetch_video_info("vid", "KEY")) + assert result is None + + +# --------------------------------------------------------------------------- +# services/youtube.py: fetch_replies_page +# --------------------------------------------------------------------------- + + +def test_fetch_replies_page_constructs_params(mocker): + """Covers lines 58-75: replies parameters.""" + from services.youtube import fetch_replies_page + + mock_resp = MagicMock() + mock_resp.json.return_value = {"items": []} + 
mock_resp.raise_for_status = MagicMock() + + mock_client = AsyncMock() + mock_client.get = AsyncMock(return_value=mock_resp) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=False) + + mocker.patch( + "services.youtube.httpx.AsyncClient", + return_value=mock_client, + ) + + _run(fetch_replies_page("parent1", "KEY")) + call_kwargs = mock_client.get.call_args + params = call_kwargs.kwargs.get("params", call_kwargs[1].get("params", {})) + assert params["parentId"] == "parent1" + assert "pageToken" not in params + + +def test_fetch_replies_page_with_page_token(mocker): + """Covers line 66: pageToken for replies.""" + from services.youtube import fetch_replies_page + + mock_resp = MagicMock() + mock_resp.json.return_value = {"items": []} + mock_resp.raise_for_status = MagicMock() + + mock_client = AsyncMock() + mock_client.get = AsyncMock(return_value=mock_resp) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=False) + + mocker.patch( + "services.youtube.httpx.AsyncClient", + return_value=mock_client, + ) + + _run(fetch_replies_page("parent1", "KEY", page_token="P2")) + call_kwargs = mock_client.get.call_args + params = call_kwargs.kwargs.get("params", call_kwargs[1].get("params", {})) + assert params["pageToken"] == "P2" + + +# --------------------------------------------------------------------------- +# services/youtube.py: fetch_channels_info (batching) +# --------------------------------------------------------------------------- + + +def test_fetch_channels_info_batching(mocker): + """Covers lines 85-105: batching of >50 channels.""" + from services.youtube import fetch_channels_info + + # 60 channel IDs → 2 batches (50 + 10) + ids = [f"UC{i:04d}" for i in range(60)] + + def _make_channel_resp(batch_ids): + items = [] + for cid in batch_ids[:3]: # returns 3 per batch + items.append( + { + "id": cid, + "snippet": {"publishedAt": 
"2020-01-01T00:00:00Z"}, + } + ) + resp = MagicMock() + resp.json.return_value = {"items": items} + resp.raise_for_status = MagicMock() + return resp + + call_count = 0 + + async def mock_get(url, params, timeout): + nonlocal call_count + call_count += 1 + batch_ids = params["id"].split(",") + return _make_channel_resp(batch_ids) + + mock_client = AsyncMock() + mock_client.get = mock_get + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=False) + + mocker.patch( + "services.youtube.httpx.AsyncClient", + return_value=mock_client, + ) + + result = _run(fetch_channels_info(ids, "KEY")) + # 2 batches, each returning 3 → 6 results + assert len(result) == 6 + assert call_count == 2 + + +def test_fetch_channels_info_no_published_at(mocker): + """Canal sem publishedAt é ignorado no resultado.""" + from services.youtube import fetch_channels_info + + mock_resp = MagicMock() + mock_resp.json.return_value = { + "items": [ + {"id": "UC1", "snippet": {}}, + { + "id": "UC2", + "snippet": {"publishedAt": "2021-06-01T00:00:00Z"}, + }, + ] + } + mock_resp.raise_for_status = MagicMock() + + mock_client = AsyncMock() + mock_client.get = AsyncMock(return_value=mock_resp) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=False) + + mocker.patch( + "services.youtube.httpx.AsyncClient", + return_value=mock_client, + ) + + result = _run(fetch_channels_info(["UC1", "UC2"], "KEY")) + assert "UC1" not in result + assert "UC2" in result + + +# --------------------------------------------------------------------------- +# Parametrize: SQL injection payloads no video_id +# --------------------------------------------------------------------------- + + +@pytest.mark.parametrize( + "malicious_id", + [ + "';DROP comments;--", + "1 OR 1=1", + "x;DELETE FROM col", + "