Hi, thanks for curating this awesome list on machine learning for source code.
I would like to propose WFGY as a complementary framework for people using LLMs on code, especially when they wrap them in tools, agents or RAG systems.
WFGY is a text-only “semantic firewall” that you load into an LLM. It helps you diagnose why a code-oriented assistant is failing: retrieval issues, chunking issues, long-chain drift, etc.
Key component:
ProblemMap (WFGY 2.0)
https://github.com/onestardao/WFGY/tree/main/ProblemMap
- 16 named failure modes for LLM / RAG pipelines.
- Each entry treats the system as a pipeline (ingestion → embedding → store → retrieval → reasoning), not just a black-box model.
- This is especially relevant for LLMs that browse codebases through vector stores or code search.
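To make the pipeline view concrete, here is a minimal sketch of what "treating the system as a pipeline" means in practice: attach each observed symptom to the stage it originates from, so the earliest failing stage points at the root cause. The stage names follow the list above; the class, method names, and symptoms are illustrative assumptions, not WFGY's actual API.

```python
# Illustrative sketch only: a tiny diagnosis record that blames failures
# on pipeline stages rather than on the model as a black box.
from dataclasses import dataclass, field

STAGES = ["ingestion", "embedding", "store", "retrieval", "reasoning"]

@dataclass
class PipelineDiagnosis:
    # observed symptoms, keyed by the pipeline stage they occur in
    notes: dict = field(default_factory=lambda: {s: [] for s in STAGES})

    def report(self, stage: str, symptom: str) -> None:
        if stage not in self.notes:
            raise ValueError(f"unknown stage: {stage}")
        self.notes[stage].append(symptom)

    def suspect_stages(self) -> list:
        # stages with at least one symptom, in pipeline order:
        # the earliest one is usually the root cause
        return [s for s in STAGES if self.notes[s]]

diag = PipelineDiagnosis()
diag.report("ingestion", "code files chunked mid-function")
diag.report("retrieval", "top-k hits come from the wrong repo")
print(diag.suspect_stages())  # → ['ingestion', 'retrieval']
```

The point of the ordering is that a downstream symptom (bad retrieval) is often caused by an upstream one (bad chunking), which is the kind of attribution the ProblemMap entries aim at.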
There are also two more formal resources:
- the WFGY 1.0 PDF, which explains the math and structure, and
- the WFGY 3.0 Singularity Demo, for long-horizon reasoning tests.
If you consider frameworks that help debug code-oriented LLM tools in scope, WFGY might be worth a short mention. I can open a PR if helpful.