Add find-untested-sources-polyglot skill (tree-sitter, 9 languages) #735
Open
Evangelink wants to merge 2 commits into
Sibling of the C# `find-untested-sources` skill that uses tree-sitter (via `tree-sitter-language-pack`) to extract declarations + imports across Python, TS/JS, Go, Java, Rust, C#, and Ruby with no build step. Output schema mirrors the C# skill's so prompts can consume either tool.

Pairing strategies per language:
- Import resolution (Python module path, TS/JS relative `./`/`../`/`index.*`, Go `pkg/<name>.go`, Java FQCN, Rust `use::`, Ruby `require`).
- Identifier overlap (every >=4-char token in the test source is cross-referenced with the declared-name index).

Smoke-tested on:
- python-flask-tasks fixture: 8 source / 0 test, 8 untested.
- typescript-vitest-cart fixture: 8 source / 0 test, 8 untested (`node_modules` pruned).
- AITestAgent C# repo: 3138 source / 761 test, 1419 tested, 1719 untested, 15 orphan.
- gh-skills Go repo: 14 source / 7 test, 8 tested, 6 untested, 0 orphan.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
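The Python branch of import resolution ("Python module path") can be sketched as below. This is an illustration, not the script's code: `resolve_python_module` and the `SOURCE_ROOTS` search list (repo root plus `src/`) are hypothetical, and relative imports such as `from .models import Task` are left out.

```python
from __future__ import annotations

# Hypothetical roots searched for top-level packages; the PR does not say
# which roots the script uses, so "" (repo root) and "src" are assumptions.
SOURCE_ROOTS = ("", "src")


def resolve_python_module(module: str, repo_files: set[str]) -> str | None:
    """Map a dotted module name such as "app.models" to a repo-relative .py file."""
    rel = module.replace(".", "/")
    for root in SOURCE_ROOTS:
        prefix = f"{root}/{rel}" if root else rel
        # A module is either a plain file or a package directory.
        for candidate in (f"{prefix}.py", f"{prefix}/__init__.py"):
            if candidate in repo_files:
                return candidate
    return None  # third-party or stdlib module: nothing in the repo to pair with


files = {"app/models.py", "src/tasks/__init__.py"}
print(resolve_python_module("app.models", files))  # app/models.py
print(resolve_python_module("tasks", files))       # src/tasks/__init__.py
print(resolve_python_module("flask", files))       # None
```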
…ython methods)

tree-sitter-language-pack's `ProcessResult` exposes two complementary views:
- `structure`: top-level items (classes, functions, structs, traits)
- `symbols`: flat declared-name list

Neither alone is complete:
- Go: `structure` has methods/functions but NOT `type` declarations; `symbols` has the types. Without the union, a test that references a struct name finds no matching source and the source is incorrectly marked untested.
- Python: `structure` stops at top level; `symbols` also lists nested methods.
- Rust: `structure` has structs and impls; `symbols` has fn names inside impls.

Fix: union both lists, then drop `kind='module'`/`'namespace'` (these are packaging items that would cause false positives, e.g. Java's tree-sitter output emits the package name `com` as a Module).

Verified across all 9 supported languages with synthetic source+test+orphan fixtures (python, javascript, typescript, tsx, go, java, rust, csharp, ruby): each fixture now reports tested=1, untested=1, orphan=0.

Real-repo regression check (AITestAgent C#): tested 1419 -> 1429, untested 1719 -> 1709 (10 sources now correctly paired via types previously missing from the declared-name index). Orphan count unchanged at 15 (no false-positive pairings introduced).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
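The union-then-filter fix described in this commit can be sketched with a stand-in record type. The `Decl` tuple, the kind strings (`"function"`, `"struct"`, `"Module"`, ...) and `declared_names` are assumptions for illustration; the real `ProcessResult` entries from tree-sitter-language-pack may expose names and kinds differently.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Decl(NamedTuple):
    """Stand-in for one entry of ProcessResult.structure / .symbols (hypothetical shape)."""
    name: str
    kind: str


# Packaging-level kinds that cause false-positive pairings
# (e.g. Java's package segment "com" reported as a Module).
EXCLUDED_KINDS = {"module", "namespace"}


def declared_names(structure: Iterable[Decl], symbols: Iterable[Decl]) -> set[str]:
    """Union both views, then drop module/namespace entries."""
    names: set[str] = set()
    for item in (*structure, *symbols):
        if item.kind.lower() in EXCLUDED_KINDS:
            continue
        names.add(item.name)
    return names


# Go: the struct only shows up in `symbols`, the method only in `structure`.
go_structure = [Decl("Parse", "function"), Decl("Render", "method")]
go_symbols = [Decl("Config", "struct"), Decl("Parse", "function")]
print(sorted(declared_names(go_structure, go_symbols)))  # ['Config', 'Parse', 'Render']

# Java: the package segment is filtered out.
print(declared_names([Decl("com", "Module"), Decl("OrderService", "class")], []))  # {'OrderService'}
```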
Skill Coverage Report
Pull request overview
Adds a new find-untested-sources-polyglot skill under the dotnet-test plugin, providing a Python-based, parse-only analyzer that pairs tests to sources across multiple languages (via tree-sitter-language-pack) and emits a JSON report of untested sources.
Changes:
- Added `find-untested-sources-polyglot` skill documentation describing inputs, output schema, heuristics, and limitations.
- Added `scripts/find_untested_sources.py`, implementing repo scanning, per-language test/source classification, symbol extraction, import-based pairing, identifier-overlap pairing, and JSON output controls.
| File | Description |
|---|---|
| plugins/dotnet-test/skills/find-untested-sources-polyglot/SKILL.md | Documents the new polyglot skill, including usage, output schema, and heuristic pairing approach. |
| plugins/dotnet-test/skills/find-untested-sources-polyglot/scripts/find_untested_sources.py | Implements the analyzer (file discovery, parsing, pairing logic, and JSON output). |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 5
Comment on lines +160 to +161
```markdown
- `symbols` — flat declared-name list, used as a fallback when
  `structure` is empty.
```
Comment on lines +139 to +143
```python
if lang == "java":
    if "test" in parts or "tests" in parts:
        return True
    return name.endswith("test.java") or name.endswith("tests.java") or name.startswith("test")
```
Comment on lines +155 to +156
```python
if stem.endswith("tests") or stem.endswith("test"):
    return True
```
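The suffix test in this excerpt also fires on ordinary words: `"Latest".endswith("test")` is `True`, so a source file whose stem is `Latest` or `Contest` would be classified as a test. A minimal sketch of a stricter rule follows; the regex and the `looks_like_test_stem` name are hypothetical, and it assumes the stem still carries its original casing.

```python
import re

# Hypothetical stricter rule (not the PR's code): "Test"/"Tests" must begin a
# new word, either after a separator (cart_test, cart.tests) or as a capitalized
# CamelCase segment (CartTests, HTTPTest).
_TEST_SUFFIX = re.compile(r"(?:[._-][Tt]ests?|(?<=[A-Za-z0-9])Tests?)$")


def looks_like_test_stem(stem: str) -> bool:
    return _TEST_SUFFIX.search(stem) is not None


for stem in ("CartTests", "cart_test", "HTTPTest", "Latest", "Contest"):
    print(stem, looks_like_test_stem(stem))
# CartTests True, cart_test True, HTTPTest True, Latest False, Contest False
```

One trade-off: a stem that is exactly `Test`, or all-lowercase `carttest`, no longer counts as a test.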
Comment on lines +483 to +491
```python
# Limit to declarations that are >=4 chars to avoid noise.
for decl, sources in by_decl.items():
    if len(decl) < 4:
        continue
    if decl in test.referenced_identifiers:
        for s in sources:
            if s.lang == test.lang:
                found.add(s)
return found
```
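A self-contained sketch of the identifier-overlap pairing in this excerpt, under stated assumptions: `SourceFile`, `build_index` and `pair_by_overlap` are hypothetical names, and a regex tokenizer stands in for however the script builds `referenced_identifiers`. Like the excerpt, it ignores declarations shorter than 4 characters and only pairs sources in the test's own language.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass

MIN_DECL_LEN = 4  # shorter declarations (id, db, ...) are skipped as noise
# Simplification: a regex tokenizer stands in for the script's
# tree-sitter-derived `referenced_identifiers`.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass(frozen=True)
class SourceFile:  # hypothetical record type
    path: str
    lang: str
    declarations: frozenset[str]


def build_index(sources: list[SourceFile]) -> dict[str, set[SourceFile]]:
    """Declared name -> every source file that declares it."""
    by_decl: dict[str, set[SourceFile]] = defaultdict(set)
    for src in sources:
        for decl in src.declarations:
            by_decl[decl].add(src)
    return by_decl


def pair_by_overlap(test_text: str, test_lang: str,
                    by_decl: dict[str, set[SourceFile]]) -> set[SourceFile]:
    """Sources in the test's language that declare a name the test mentions."""
    found: set[SourceFile] = set()
    for token in set(IDENT.findall(test_text)):
        if len(token) < MIN_DECL_LEN:
            continue
        found.update(s for s in by_decl.get(token, ()) if s.lang == test_lang)
    return found


cart = SourceFile("src/cart.py", "python", frozenset({"Cart", "add_item"}))
db = SourceFile("src/db.py", "python", frozenset({"db", "connect"}))
web = SourceFile("web/cart.ts", "typescript", frozenset({"Cart"}))
index = build_index([cart, db, web])
hits = pair_by_overlap("from src.cart import Cart\nCart().add_item(db)", "python", index)
print(sorted(s.path for s in hits))  # ['src/cart.py']
```

Unlike the excerpt, which walks every declaration in `by_decl`, this version walks the test's tokens and does dictionary lookups. Both produce the same set, but the cost then scales with the size of the test rather than the size of the index.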
Comment on lines +307 to +313
```python
base = PurePosixPath(target)
if target.startswith("/"):
    joined = PurePosixPath(target.lstrip("/"))
else:
    joined = (test_rel.parent / base).as_posix()
    joined = PurePosixPath(re.sub(r"/[^/]+/\.\./", "/", joined))
# Try various extensions and /index suffix.
```
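The excerpt collapses `..` segments with a single `re.sub` pass, which only removes a `dir/../` pair that is preceded by `/`: `a/b/../../lib/util` comes out as `a/../lib/util`, and `PurePosixPath` does not collapse the remaining `..`. One possible alternative is `posixpath.normpath`, which collapses every such pair lexically in one call. The sketch below assumes a hypothetical `resolve_relative_import` helper, a hypothetical TS/JS extension list, and a set of repo-relative file paths.

```python
from __future__ import annotations

import posixpath

# Hypothetical extension list for the TS/JS rule; the script's real list may differ.
CANDIDATE_EXTS = (".ts", ".tsx", ".js", ".jsx", ".mjs", ".cjs")


def resolve_relative_import(test_rel: str, target: str, repo_files: set[str]) -> str | None:
    """Map an import specifier in `test_rel` to a repo-relative source path."""
    if not target.startswith((".", "/")):
        return None  # bare specifier such as "vitest": a package, not a repo file
    if target.startswith("/"):
        joined = target.lstrip("/")  # treated as repo-root-relative, as in the excerpt
    else:
        joined = posixpath.join(posixpath.dirname(test_rel), target)
    # normpath collapses every "dir/.." pair, wherever it occurs in the path.
    base = posixpath.normpath(joined)
    if base == ".." or base.startswith("../"):
        return None  # the import points outside the repository
    candidates = [base]
    candidates += [base + ext for ext in CANDIDATE_EXTS]
    candidates += [posixpath.join(base, "index" + ext) for ext in CANDIDATE_EXTS]
    for candidate in candidates:
        if candidate in repo_files:
            return candidate
    return None


files = {"src/cart.ts", "src/lib/index.ts", "lib/util.js"}
print(resolve_relative_import("src/__tests__/cart.test.ts", "../cart", files))  # src/cart.ts
print(resolve_relative_import("src/cart.test.ts", "./lib", files))            # src/lib/index.ts
print(resolve_relative_import("a/b/c.test.ts", "../../lib/util", files))      # lib/util.js
```

Like the regex, `normpath` is purely lexical, so symlinked directories are not followed.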
👋 @Evangelink: this PR has 5 unresolved review thread(s). When you're ready, please address the feedback and push an update; the triage bot will pick up the next state automatically. (Add the
Adds a polyglot sibling to the C# `find-untested-sources` skill (PR #733): a single `python scripts/find_untested_sources.py <repo>` invocation that lists production source files with no test file referencing any of their declared symbols, across Python, TypeScript, TSX, JavaScript, Go, Java, Rust, C#, and Ruby. Uses `tree-sitter-language-pack` (one wheel, bundled grammars, no native build).
Why both?
The C# version uses Roslyn with strict namespace disambiguation, which makes it strictly better on .NET-only repos. The polyglot version covers everything else with the same output schema, so prompts can branch on language/extension and call whichever tool fits the repo.
The skill's `DO NOT USE FOR` says: "For a .NET-only repo, prefer `find-untested-sources`." They're sibling tools, not replacements.

How it works
- … (`bin`, `obj`, `node_modules`, `target`, `dist`, `vendor`, `__pycache__`, `.venv`, `.git`, ...); skip generated files (`*.d.ts`, `*.g.cs`, `*.Designer.cs`, `_pb2.py`, `*.min.js`, `AssemblyInfo.cs`).
- … `detect_language_from_path` … table).
- … `tree_sitter_language_pack.process(text, ProcessConfig(structure=True, imports=True, symbols=True))`. Union `structure` and `symbols` for the declared-name set (each view is incomplete on its own: e.g. Go `type` declarations only appear in `symbols`, methods only in `structure`); filter out `module`/`namespace` kinds (they would otherwise pair on Java package names).
- Import resolution (Python module path, TS/JS relative `./`/`../`/`index.*`, Go `pkg/<name>.go`, Java FQCN, Rust `use::`, Ruby `require`).
- Identifier overlap (every >=4-char token in the test source is cross-referenced against the declared-name index; this is what catches C#, since `using` maps to namespaces, not files).

Output JSON schema deliberately mirrors PR #733's so the same prompt patterns
consume either tool: `untested_sources` ordered by `declaration_count`, each entry with a `suggested_test_path`, plus an `orphan_tests` list.

Validation
A synthetic harness creates one fixture per language with a paired source, an orphan source, and a test referencing only the paired source, then asserts `tested == 1` / `untested == 1` / `orphan == 0` and that the test appears in the paired source's `covering_tests`:

…

Real-repo smoke tests (no expected-output assertions, just sanity checks that the volume / orphan counts are reasonable):

… `--lang typescript`)

The C# numbers are slightly worse than PR #733's Roslyn-based pairing (1429 vs Roslyn's tighter count), which is exactly why the SKILL.md routes .NET-only repos to the Roslyn skill.
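The counts the synthetic harness asserts on follow from a source-to-tests pairing. A minimal sketch, assuming a plain dict stands in for each source's `covering_tests` (`summarize` and the file names are hypothetical): a source is tested if at least one test covers it, and an orphan is a test that covers no source. That is why the fixture's orphan *source* shows up as untested while `orphan == 0`.

```python
from __future__ import annotations


def summarize(sources: list[str], tests: list[str],
              covering: dict[str, set[str]]) -> tuple[list[str], list[str], list[str]]:
    """Split sources into tested/untested and find tests that pair with nothing.

    `covering` maps a source path to the set of test paths paired with it
    (a hypothetical in-memory view of `covering_tests`).
    """
    tested = [s for s in sources if covering.get(s)]
    untested = [s for s in sources if not covering.get(s)]
    paired_tests = set().union(*covering.values())
    orphans = [t for t in tests if t not in paired_tests]
    return tested, untested, orphans


# One synthetic fixture: a paired source, an orphan source, one test.
sources = ["cart.go", "orphan.go"]
tests = ["cart_test.go"]
covering = {"cart.go": {"cart_test.go"}}
tested, untested, orphans = summarize(sources, tests, covering)
assert (len(tested), len(untested), len(orphans)) == (1, 1, 0)
assert "cart_test.go" in covering["cart.go"]
```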
Limitations (documented honestly in SKILL.md)
- … string name or container resolution).
- … `find-untested-sources` for .NET-only repos.
- … pairings on names like `id`, `db`, `Tag`.
- … suffix-match fallback may pick the wrong source if two files share a trailing path segment in different sub-projects.
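The suffix-match ambiguity in the last limitation is easy to reproduce. The sketch below uses a Java FQCN as the example; `fqcn_candidates`, `pick_closest`, and the longest-shared-prefix tie-break are hypothetical illustrations, not what the script does.

```python
from __future__ import annotations


def fqcn_candidates(fqcn: str, repo_files: list[str]) -> list[str]:
    """Every repo file whose path ends with the path form of a Java FQCN."""
    suffix = fqcn.replace(".", "/") + ".java"
    return [f for f in repo_files if f == suffix or f.endswith("/" + suffix)]


def pick_closest(test_path: str, candidates: list[str]) -> str | None:
    """Hypothetical tie-break: prefer the candidate sharing the most leading
    path segments with the test, i.e. the one in the same sub-project."""
    def shared_prefix(candidate: str) -> int:
        count = 0
        for a, b in zip(test_path.split("/"), candidate.split("/")):
            if a != b:
                break
            count += 1
        return count
    return max(candidates, key=shared_prefix, default=None)


files = [
    "svc-a/src/main/java/com/acme/Util.java",
    "svc-b/src/main/java/com/acme/Util.java",
]
matches = fqcn_candidates("com.acme.Util", files)
print(len(matches))  # 2: the trailing segments alone are ambiguous
print(pick_closest("svc-b/src/test/java/com/acme/UtilTest.java", matches))
# svc-b/src/main/java/com/acme/Util.java
```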
Commits
- `39c786671`: initial polyglot skill (SKILL.md + script, 9 languages).
- `e1aa26772`: union `structure` + `symbols` after the validation harness caught that Go `type` declarations were being dropped, causing source/test pairings to silently miss.
Files
- `plugins/dotnet-test/skills/find-untested-sources-polyglot/SKILL.md`
- `plugins/dotnet-test/skills/find-untested-sources-polyglot/scripts/find_untested_sources.py`