1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@
*.env
.env
.nia_cache.json
__pycache__/
*.pyc
*.pyo
9 changes: 9 additions & 0 deletions .postman/resources.yaml
@@ -0,0 +1,9 @@
# Use this workspace to collaborate
workspace:
id: 68c74042-e64d-40ca-8ad4-3085eff4a41b

# All resources in the `postman/` folder are automatically registered in Local View.
# Point to additional files outside the `postman/` folder to register them individually. Example:
#localResources:
# collections:
# - ../tests/E2E Test Collection/
280 changes: 278 additions & 2 deletions README.md
@@ -1,2 +1,278 @@
# sdx_hackathon_404_not_found
Hackathon repo
# Legacy Architecture Modernization Engine

An AI-powered tool that takes a legacy GitHub repository, deeply analyses it using the [Nia API](https://trynia.ai), generates a step-by-step migration plan with an LLM, and then executes that plan — writing real file changes, committing them with git, and producing a Markdown migration report.

Built as a three-person parallel hackathon project with a shared JSON contract so each part could be developed independently.

---

## Architecture overview

```
User / CLI
    │
    ▼
src/cli.py   (lme analyze / execute / report)
    │
    ├──▶ Part 1: Core SDK
    │       src/nia_client/   ← HTTP wrapper for the Nia API
    │       src/models/       ← Shared Pydantic contracts
    │
    ├──▶ Part 2: Architect Agent
    │       src/architect/    ← Analyses repo, generates RefactorPlan
    │
    └──▶ Part 3: Worker Orchestrator
            src/worker/       ← Executes plan, writes files, reports
```

### End-to-end data flow

```
engine_input.json
      │
      ▼
ArchitectAgent
      │  (Nia search + grep + LLM)
      ▼
refactor_plan.json   ← the only coupling point between Part 2 and Part 3
      │
      ▼
Orchestrator
      │  (git clone + file writes + validation)
      ▼
file changes on disk
      │
      ▼
Reporter
      │
      ▼
migration_report.md
```

`refactor_plan.json` is a `RefactorPlan` Pydantic model serialised as JSON. Part 2 writes it; Part 3 reads it. They never import each other.
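
As a rough illustration of that file-based contract, the sketch below round-trips a plan through JSON. It is not the project's code: it uses stdlib dataclasses as a stand-in for the Pydantic models so that it runs without dependencies, it keeps only a few of the fields named in the models table, and the helper names `dump_plan` and `load_plan` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RefactorStep:
    # Field names follow the README's models table; the types are assumed.
    id: str
    title: str
    depends_on: list[str] = field(default_factory=list)


@dataclass
class RefactorPlan:
    # Only a subset of the real model's fields is shown here.
    repo: str
    source_id: str
    steps: list[RefactorStep] = field(default_factory=list)


def dump_plan(plan: RefactorPlan) -> str:
    """Producer side (Part 2): serialise the plan to a JSON string."""
    return json.dumps(asdict(plan), indent=2)


def load_plan(text: str) -> RefactorPlan:
    """Consumer side (Part 3): rebuild the plan from JSON alone."""
    raw = json.loads(text)
    steps = [RefactorStep(**step) for step in raw.pop("steps", [])]
    return RefactorPlan(steps=steps, **raw)
```

Because the consumer only needs the JSON text and the shared model definitions, neither side has to import the other's package.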

---

## Project structure

```
sdx_hackathon_404_not_found/
├── pyproject.toml
├── .env.example                # NIA_API_KEY=  LLM_API_KEY=
├── engine_input.json           # Example input for the CLI
├── run_local_test.py           # Standalone offline test (no API keys needed)
├── smoke_test.py               # Full end-to-end smoke test (needs real keys)
├── src/
│   ├── cli.py                  # Typer CLI: analyze, execute, report
│   │
│   ├── models/                 # Part 1: shared Pydantic contracts
│   │   ├── plan.py             # RefactorPlan, RefactorStep, FileChange, SymbolReference
│   │   ├── input.py            # EngineInput, EngineConfig, ModernizationTarget
│   │   └── analysis.py         # CodebaseProfile, DependencyNode, AntiPattern
│   │
│   ├── nia_client/             # Part 1: Nia REST API wrapper
│   │   ├── client.py           # NiaClient + MockNiaClient
│   │   ├── indexer.py          # index_repo(), index_doc_url(), wait_for_index()
│   │   └── searcher.py         # search(), grep(), read_file(), get_tree(), github_search()
│   │
│   ├── architect/              # Part 2: analysis and planning
│   │   ├── agent.py            # ArchitectAgent: entry point
│   │   ├── analyzer.py         # build_dependency_graph(), detect_patterns(), get_codebase_profile()
│   │   ├── planner.py          # generate_plan(): LLM call + JSON parse + validation
│   │   └── prompts.py          # SYSTEM_PROMPT, format_user_prompt(), format_retry_prompt()
│   │
│   └── worker/                 # Part 3: execution and reporting
│       ├── orchestrator.py     # Orchestrator: topological run, failure cascade
│       ├── writer.py           # clone_repo(), apply_step(), commit_step()
│       └── reporter.py         # Reporter: generate() + save() -> migration_report.md
└── tests/
    ├── fixtures/
    │   └── sample_plan.json    # Realistic 3-step RefactorPlan fixture
    ├── test_nia_client.py      # NiaClient unit tests (respx mocks)
    ├── test_architect.py       # Analyzer, planner, prompts, ArchitectAgent unit tests
    ├── test_writer.py          # writer.py tests using local bare git repos
    └── test_integration.py     # Live Nia API tests + full Part 3 + e2e pipeline
```

---

## Part 1 — Core SDK

### Models (`src/models/`)

All three parts import these. They are the frozen contracts.

| Model | Purpose |
|---|---|
| `RefactorPlan` | Top-level plan: repo, source_id, steps, dependency graph, risk assessment |
| `RefactorStep` | One atomic unit of work: id, title, depends_on, affected_symbols, changes, validation_queries |
| `FileChange` | A single file operation: action (`create`/`modify`/`delete`/`move`), old_content, new_content |
| `SymbolReference` | A named symbol with its file, line range, and kind (class/function/etc.) |
| `EngineInput` | CLI input: repo, ref, goal, instructions, API keys, LLM settings |
| `CodebaseProfile` | Analyzer output fed to the planner: dependency graph, anti-patterns, entry points |
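
To make the four `FileChange` actions concrete, here is a minimal sketch of how a single change might be applied to a working tree. It is an assumption-laden illustration rather than the project's `writer.apply_step()`: the function name `apply_change` and the `new_path` parameter used for `move` are hypothetical, since the table above does not say how a move destination is stored.

```python
import shutil
from pathlib import Path
from typing import Optional


def apply_change(
    root: Path,
    action: str,
    path: str,
    new_content: Optional[str] = None,
    new_path: Optional[str] = None,  # hypothetical destination field for "move"
) -> None:
    """Apply one create/modify/delete/move operation under `root`."""
    target = root / path
    if action in ("create", "modify"):
        # Both actions write the full new content, creating parent dirs as needed.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(new_content or "", encoding="utf-8")
    elif action == "delete":
        target.unlink(missing_ok=True)
    elif action == "move":
        if new_path is None:
            raise ValueError("move requires new_path")
        dest = root / new_path
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(target), str(dest))
    else:
        raise ValueError(f"unknown action: {action}")
```

Rejecting unknown actions explicitly keeps a malformed plan from being silently ignored.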

### Nia client (`src/nia_client/`)

`NiaClient` is a thin `httpx` wrapper around `https://apigcp.trynia.ai/v2`.

| Method | Nia endpoint |
|---|---|
| `index_repo(repo)` | `POST /sources` |
| `wait_for_index(source_id)` | `GET /sources/{id}` (polls until ready) |
| `search(repo, query, mode)` | `POST /search` — modes: `query`, `deep`, `universal` |
| `grep(source_id, pattern)` | `POST /sources/{id}/grep` |
| `read_file(repo, path, ref)` | `POST /github/read` |
| `get_tree(owner, repo, ref)` | `GET /github/tree/{owner}/{repo}` |
| `github_search(repo, query)` | `POST /github/search` |

`MockNiaClient` is a drop-in that returns deterministic dummy data — used in tests and offline development so no API key is required.
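
The "polls until ready" behaviour listed for `wait_for_index` can be sketched as a plain polling loop. This is only an illustration under assumptions: the status strings `"ready"` and `"failed"`, the default interval and timeout, and the name `wait_until_ready` are not taken from the Nia API or from this repo, and the status lookup is injected as a callable so the loop runs without any network access.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_status() until it reports "ready" or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "ready":  # assumed terminal success value
            return
        if status == "failed":  # assumed terminal failure value
            raise RuntimeError("indexing failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"source not ready after {timeout}s")
        sleep(interval)
```

Passing `sleep` as a parameter lets tests run the loop instantly instead of waiting between polls.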

---

## Part 2 — Architect Agent (`src/architect/`)

**Input:** `EngineInput`
**Output:** `refactor_plan.json`

```
ArchitectAgent.analyze(engine_input)
  ├─ NiaClient.index_repo()              → source_id
  ├─ analyzer.get_codebase_profile()
  │    ├─ build_dependency_graph()       grep for cross-file imports
  │    └─ detect_patterns()              anti-patterns, entry points, TODOs
  └─ planner.generate_plan(profile, target, config)
       ├─ prompts.format_user_prompt()   CodebaseProfile → LLM message
       ├─ _call_llm()                    OpenAI / Gemini / Anthropic
       ├─ _parse_and_validate()          JSON → RefactorPlan
       └─ _validate_step_dependencies()  fix broken depends_on refs
```

The planner supports **OpenAI, Gemini, and Anthropic** behind a common dispatch table and retries automatically with a corrective prompt if the LLM returns malformed JSON.
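
A minimal sketch of such a retry loop is shown below, assuming the LLM is reachable through a plain callable that takes a prompt and returns text. The function name `plan_with_retry`, the attempt limit, and the wording of the corrective prompt are illustrative assumptions, not the contents of `planner.py` or `prompts.py`.

```python
import json
from typing import Callable, Optional


def plan_with_retry(
    call_llm: Callable[[str], str],
    user_prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Ask for a JSON plan, re-prompting with the parse error on failure."""
    prompt = user_prompt
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        reply = call_llm(prompt)
        try:
            plan = json.loads(reply)
            if not isinstance(plan, dict):
                raise ValueError("top-level JSON value is not an object")
            return plan
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            last_error = exc
            # Corrective prompt: restate the task and explain what went wrong.
            prompt = (
                f"{user_prompt}\n\n"
                f"Your previous reply was not valid JSON ({exc}). "
                "Reply with a single JSON object only."
            )
    raise ValueError(f"no valid plan after {max_attempts} attempts") from last_error
```

Feeding the parser's error message back to the model gives it something specific to correct, which is usually more effective than repeating the original prompt unchanged.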

---

## Part 3 — Worker Orchestrator (`src/worker/`)

**Input:** `refactor_plan.json`
**Output:** file changes committed to a cloned repo + `migration_report.md`

```
Orchestrator.run()
  ├─ clone_repo()                git clone to a temp directory
  ├─ topological sort            respects depends_on across steps
  └─ for each step (in order):
       ├─ writer.apply_step()    write new_content / delete / move files
       ├─ writer.commit_step()   git commit with step title as message
       ├─ validate               passed / failed
       └─ if failed → mark all downstream steps as skipped

Reporter(plan, results).save("migration_report.md")
```
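
The ordering and failure cascade in the outline above can be sketched with the standard library's `graphlib`. This is a simplified illustration, not the repo's `orchestrator.py`: the function name `run_steps` is hypothetical, and the `execute` callable stands in for the apply, commit and validate sequence.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_steps(
    depends_on: dict[str, list[str]],
    execute: Callable[[str], bool],
) -> dict[str, str]:
    """Run steps in dependency order; skip anything downstream of a failure."""
    # TopologicalSorter expects {node: predecessors}, which matches depends_on.
    # It raises graphlib.CycleError if the plan contains a dependency cycle.
    order = list(TopologicalSorter(depends_on).static_order())
    status: dict[str, str] = {}
    for step_id in order:
        deps = depends_on.get(step_id, [])
        if any(status[dep] != "passed" for dep in deps):
            # A failed or skipped dependency propagates to this step.
            status[step_id] = "skipped"
        else:
            status[step_id] = "passed" if execute(step_id) else "failed"
    return status
```

Since a skipped step also counts as "not passed", the cascade reaches transitive dependents without a separate graph traversal, while independent steps still run.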

### `StepResult` — the Orchestrator → Reporter contract

```python
# Shape of the results mapping, keyed by step id
{
    "step-001": {
        "status": "passed",        # one of "passed", "failed", "skipped"
        "reason": "",              # empty on pass, error message on fail/skip
        "changes_applied": [...],  # file paths actually written
    }
}
```

### Report sections

The generated `migration_report.md` contains:

1. Header — repo, source ID, timestamps
2. Summary — pass/fail/skip counts and overall success rate
3. Risk Assessment — verbatim from the plan
4. Steps Overview — one-row-per-step table with statuses
5. Step Details — affected symbols, file changes, validation queries, runtime outcome
6. Manual Review Required — list of every non-passed step with its reason
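
As an illustration of how the Summary and Manual Review Required sections could be derived from the `StepResult` mapping, consider the sketch below. The function names are hypothetical, and defining the success rate as passed steps divided by all steps is an assumption; the real `Reporter` may compute it differently.

```python
from collections import Counter


def summarise(results: dict[str, dict]) -> dict:
    """Count statuses and compute a success rate (assumed: passed / total)."""
    counts = Counter(result["status"] for result in results.values())
    total = len(results)
    return {
        "passed": counts["passed"],
        "failed": counts["failed"],
        "skipped": counts["skipped"],
        "success_rate": counts["passed"] / total if total else 0.0,
    }


def manual_review(results: dict[str, dict]) -> list[tuple[str, str]]:
    """List every non-passed step together with its recorded reason."""
    return [
        (step_id, result["reason"])
        for step_id, result in results.items()
        if result["status"] != "passed"
    ]
```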

---

## CLI

```bash
# Analyse a repo and produce a plan
lme analyze --input engine_input.json --output refactor_plan.json

# Execute the plan (writes files, commits changes)
lme execute --input engine_input.json --plan refactor_plan.json

# Generate a Markdown report from a plan
lme report --plan refactor_plan.json --output migration_report.md
```

---

## Setup

```bash
# 1. From the root of your clone, install in editable mode (Python 3.11+)
pip install -e ".[dev]"

# 2. Copy and fill in credentials
cp .env.example .env
# NIA_API_KEY=nk_...
# LLM_API_KEY=sk-...
```

---

## Running tests

```bash
# Part 3 unit + integration tests (no API key needed)
pytest tests/test_integration.py -v -k "not TestLiveNiaAPI"

# Full suite including live Nia API calls
NIA_API_KEY=nk_... pytest tests/test_integration.py -v -s

# All tests
pytest
```

### Test structure

| Class / function | Requires key | What it covers |
|---|---|---|
| `TestLiveNiaAPI` | Yes | All `NiaClient` methods against the real Nia API |
| `TestMockNiaClient` | No | All `MockNiaClient` return types and values |
| `TestReporter` | No | `generate()` content, stats accuracy, `save()` file I/O |
| `TestOrchestrator` | No | Orchestrator contract (auto-skipped until `orchestrator.py` exists) |
| `test_e2e_*` | No | Full fixture → Reporter pipeline; full pipeline when Orchestrator is present |

### Offline local test

```bash
python run_local_test.py
```

Loads `tests/fixtures/sample_plan.json`, runs the Orchestrator with `MockNiaClient`, and saves `migration_report.md` — no credentials required.

---

## Dependencies

| Package | Used for |
|---|---|
| `httpx` | Nia API HTTP client |
| `pydantic` | All shared data models |
| `typer` | CLI |
| `python-dotenv` | `.env` loading |
| `openai` | LLM calls (OpenAI provider) |
| `google-genai` | LLM calls (Gemini provider) |
| `anthropic` | LLM calls (Anthropic provider) |
| `rich` | Terminal output formatting |
| `pytest` / `respx` / `pytest-mock` | Testing |
18 changes: 9 additions & 9 deletions engine_input.json
@@ -1,18 +1,18 @@
{
"target": {
"repo": "acme-corp/legacy-monolith",
"ref": "main",
"repo": "codeFafnir/monolith-to-microservices",
"ref": "master",
"goal": "monolith_to_microservices",
"instructions": "Split the user-auth module and payment module into separate services. Keep the shared ORM models in a common library.",
"scope": ["src/auth", "src/payments", "src/models"],
"guidelines_repo": "acme-corp/engineering-standards",
"instructions": "Decompose the monolith Express server at monolith/src/server.js into three independent Node.js/Express microservices.\n\n1. ORDERS SERVICE — microservices/src/orders/\n - Implement GET /service/orders and GET /service/orders/:id.\n - Load order data from a local data/orders.json file (copy monolith/data/orders.json).\n - Listen on PORT env var, default 8081.\n - Own package.json with express dependency, a Dockerfile, and k8s/deployment.yml + k8s/service.yml.\n\n2. PRODUCTS SERVICE — microservices/src/products/\n - Implement GET /service/products and GET /service/products/:id.\n - Load product data from a local data/products.json file (copy monolith/data/products.json).\n - Listen on PORT env var, default 8082.\n - Own package.json with express dependency, a Dockerfile, and k8s/deployment.yml + k8s/service.yml.\n\n3. FRONTEND SERVICE — microservices/src/frontend/\n - Serve the pre-built React static files from a public/ directory (same content as monolith/public).\n - Proxy GET /service/orders* to the ORDERS_HOST env var (default http://localhost:8081).\n - Proxy GET /service/products* to the PRODUCTS_HOST env var (default http://localhost:8082).\n - Use http-proxy-middleware for proxying.\n - Serve public/index.html for all other routes (client-side routing support).\n - Listen on PORT env var, default 8080.\n - Own package.json with express and http-proxy-middleware dependencies, a Dockerfile, and k8s/deployment.yml + k8s/service.yml.\n\nAll three services are structurally independent — no shared code between them. The monolith files under monolith/ must not be modified or deleted.",
"scope": ["monolith"],
"guidelines_repo": null,
"guidelines_doc_url": null
},
"config": {
"nia_api_key": "nk_...",
"llm_api_key": "sk-...",
"llm_provider": "openai",
"llm_model": "gpt-4o",
"nia_api_key": "<NIA_API_KEY>",
"llm_api_key": "<LLM_API_KEY>",
"llm_provider": "gemini",
"llm_model": "gemini-2.5-flash",
"max_files_per_step": 10,
"dry_run": false
}
6 changes: 4 additions & 2 deletions legacy_modernization_engine.egg-info/PKG-INFO
@@ -4,13 +4,15 @@ Version: 0.1.0
Summary: AI-powered legacy architecture modernization engine
Requires-Python: >=3.11
Requires-Dist: httpx>=0.27
Requires-Dist: pydantic>=2.6
Requires-Dist: pydantic>=2.7
Requires-Dist: typer>=0.12
Requires-Dist: python-dotenv>=1.0
Requires-Dist: openai>=1.30
Requires-Dist: google-genai>=1.7
Requires-Dist: anthropic>=0.28
Requires-Dist: python-dotenv>=1.0
Requires-Dist: rich>=13.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: pytest-mock>=3.12; extra == "dev"
Requires-Dist: respx>=0.21; extra == "dev"
10 changes: 9 additions & 1 deletion legacy_modernization_engine.egg-info/SOURCES.txt
@@ -3,9 +3,11 @@ pyproject.toml
legacy_modernization_engine.egg-info/PKG-INFO
legacy_modernization_engine.egg-info/SOURCES.txt
legacy_modernization_engine.egg-info/dependency_links.txt
legacy_modernization_engine.egg-info/entry_points.txt
legacy_modernization_engine.egg-info/requires.txt
legacy_modernization_engine.egg-info/top_level.txt
src/__init__.py
src/cli.py
src/architect/__init__.py
src/architect/agent.py
src/architect/analyzer.py
@@ -17,4 +19,10 @@ src/models/input.py
src/models/plan.py
src/nia_client/__init__.py
src/nia_client/client.py
tests/test_architect.py
src/nia_client/indexer.py
src/nia_client/searcher.py
src/worker/__init__.py
src/worker/reporter.py
tests/test_architect.py
tests/test_integration.py
tests/test_nia_client.py
6 changes: 4 additions & 2 deletions legacy_modernization_engine.egg-info/requires.txt
@@ -1,12 +1,14 @@
httpx>=0.27
pydantic>=2.6
pydantic>=2.7
typer>=0.12
python-dotenv>=1.0
openai>=1.30
google-genai>=1.7
anthropic>=0.28
python-dotenv>=1.0
rich>=13.0

[dev]
pytest>=8.0
pytest-asyncio>=0.23
pytest-mock>=3.12
respx>=0.21