Add find-untested-sources-polyglot skill (tree-sitter, 9 languages) by Evangelink · Pull Request #735 · dotnet/skills

Evangelink · 2026-06-08T14:47:26Z

Adds a polyglot sibling to the C# find-untested-sources skill (PR #733): a
single python scripts/find_untested_sources.py <repo> invocation that lists
production source files with no test file referencing any of their declared
symbols, across Python, TypeScript, TSX, JavaScript, Go, Java, Rust, C#, and
Ruby. Uses tree-sitter-language-pack (one wheel, bundled grammars, no
native build).

Why both?

The C# version uses Roslyn with strict namespace disambiguation — strictly
better on .NET-only repos. The polyglot version covers everything else with
the same output schema, so prompts can branch on language/extension and call
whichever tool fits the repo.

The skill's DO NOT USE FOR section says: "For a .NET-only repo, prefer find-untested-sources." They are sibling tools, not replacements.

How it works

1. Walk the repo, pruning common build/vendor dirs (bin, obj, node_modules, target, dist, vendor, __pycache__, .venv, .git, ...); skip generated files (*.d.ts, *.g.cs, *.Designer.cs, _pb2.py, *.min.js, AssemblyInfo.cs).
2. Detect language by extension via detect_language_from_path.
3. Classify test vs source via per-language path heuristics (see SKILL.md table).
4. For each file, call tree_sitter_language_pack.process(text, ProcessConfig(structure=True, imports=True, symbols=True)). Union structure and symbols for the declared-name set (each view is incomplete on its own, e.g. Go type declarations only appear in symbols, methods only in structure); filter module/namespace kinds (which would otherwise pair on Java package names).
5. Pair each test with its sources via:
   - Import resolution (Python module paths, TS/JS relative ./../index.*, Go pkg/<name>.go, Java FQCN, Rust use::, Ruby require).
   - Identifier overlap (every ≥4-char token in the test source matched against the declared-name index; this is what catches C#, since using maps to namespaces, not files).
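The first step above (walk, prune, skip generated files) can be sketched roughly as follows. This is a minimal illustration rather than the script's actual code: the function name iter_candidate_files is hypothetical, the directory and pattern lists contain only the entries named in this description, and _pb2.py is treated as a filename suffix, which is an assumption.

```python
import fnmatch
import os
from pathlib import Path

# Only the entries named in the PR description; the real lists may be longer.
PRUNED_DIRS = {"bin", "obj", "node_modules", "target", "dist",
               "vendor", "__pycache__", ".venv", ".git"}
# "_pb2.py" is assumed to be a suffix match, hence the leading "*".
GENERATED_PATTERNS = ("*.d.ts", "*.g.cs", "*.Designer.cs",
                      "*_pb2.py", "*.min.js", "AssemblyInfo.cs")


def iter_candidate_files(repo_root):
    """Yield repo-relative paths, skipping pruned dirs and generated files."""
    root = Path(repo_root)
    for dirpath, dirnames, filenames in os.walk(root):
        # Mutating dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in PRUNED_DIRS)
        for name in sorted(filenames):
            if any(fnmatch.fnmatchcase(name, pat) for pat in GENERATED_PATTERNS):
                continue
            yield (Path(dirpath) / name).relative_to(root)
```

Pruning at the directory level keeps the walk from ever entering node_modules or .git, which matters more for runtime than filtering individual files afterwards.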

The output JSON schema deliberately mirrors PR #733's so the same prompt patterns
can consume either tool: untested_sources is ordered by declaration_count,
each entry carries a suggested_test_path, and there is a separate orphan_tests list.
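To make the shape concrete, here is a small sketch of how such a report might be assembled. Only the keys untested_sources, declaration_count, suggested_test_path, and orphan_tests come from the description; the path key, the descending sort direction, the sample file names, and the build_report helper are assumptions for illustration.

```python
import json


def build_report(untested, orphan_tests):
    """Order untested sources by declaration_count (descending is assumed)."""
    ordered = sorted(untested, key=lambda e: (-e["declaration_count"], e["path"]))
    return {"untested_sources": ordered, "orphan_tests": sorted(orphan_tests)}


# Hypothetical entries; "path" is an assumed key name.
report = build_report(
    [
        {"path": "pkg/util.go", "declaration_count": 2,
         "suggested_test_path": "pkg/util_test.go"},
        {"path": "pkg/server.go", "declaration_count": 7,
         "suggested_test_path": "pkg/server_test.go"},
    ],
    orphan_tests=[],
)
print(json.dumps(report, indent=2))
```

Sorting by declaration count would put the files with the most untested surface first, which is presumably what a prompt consuming the report wants to act on.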

Validation

A synthetic harness creates one fixture per language with a paired source, an
orphan source, and a test referencing only the paired source, then asserts
tested == 1 / untested == 1 / orphan == 0 and that the test appears in
the paired source's covering_tests:

Language	Result
python	PASS
javascript	PASS
typescript	PASS
tsx	PASS
go	PASS
java	PASS
rust	PASS
csharp	PASS
ruby	PASS
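The per-language fixture layout described above (one paired source, one orphan source, one test that references only the paired source) might look like this for Python. The file names, symbol names, and the write_python_fixture helper are all hypothetical; the real harness is not shown in this PR description.

```python
import tempfile
from pathlib import Path


def write_python_fixture(root):
    """Create a paired source, an orphan source, and one test file."""
    root = Path(root)
    (root / "src").mkdir(parents=True)
    (root / "tests").mkdir()
    (root / "src" / "paired.py").write_text(
        "def compute_total(items):\n    return sum(items)\n"
    )
    (root / "src" / "orphan.py").write_text(
        "def never_called():\n    return None\n"
    )
    # The test imports and references only the paired source.
    (root / "tests" / "test_paired.py").write_text(
        "from src.paired import compute_total\n\n"
        "def test_compute_total():\n    assert compute_total([1, 2]) == 3\n"
    )
    return root


fixture_dir = write_python_fixture(tempfile.mkdtemp())
# A harness would now run the script on fixture_dir and assert
# tested == 1, untested == 1, orphan == 0.
```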

Real-repo smoke tests (no expected-output assertions, just sanity checks that the
volume and orphan counts are reasonable):

Fixture	src	test	tested	untested	orphan
python-flask-tasks fixture	8	0	0	8	0
typescript-vitest-cart fixture (`--lang typescript`)	8	0	0	8	0
gh-skills (Go)	14	7	8	6	0
AITestAgent (C#, 3899 files)	3138	761	1429	1709	15

The C# numbers are slightly worse than PR #733's Roslyn-based pairing
(1429 vs Roslyn's tighter count), which is exactly why the SKILL.md routes
.NET-only repos to the Roslyn skill.

Limitations (documented honestly in SKILL.md)

- Reflection / DI-resolved types are invisible (the test only references the type by string name or container resolution).
- C# namespace disambiguation is weaker than in the Roslyn version; find-untested-sources is recommended for .NET-only repos.
- Short identifiers (< 4 chars) are dropped from the overlap index to avoid noisy pairings on names like id, db, Tag.
- Monorepo path aliases (TS path mapping, Java module-info) are not resolved; the suffix-match fallback may pick the wrong source if two files share a trailing path segment in different sub-projects.
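The last limitation is easy to reproduce with a toy suffix matcher. The sketch below is not the script's fallback; suffix_matches and the sample paths are hypothetical and only illustrate why a shared trailing segment is ambiguous.

```python
from pathlib import PurePosixPath


def suffix_matches(import_path, source_paths):
    """Return every source whose trailing path segments equal import_path."""
    want = PurePosixPath(import_path).parts
    return [s for s in source_paths
            if PurePosixPath(s).parts[-len(want):] == want]


# Two sub-projects of a hypothetical monorepo share utils/format.ts.
sources = [
    "apps/web/src/utils/format.ts",
    "apps/admin/src/utils/format.ts",
    "apps/web/src/cart.ts",
]
hits = suffix_matches("utils/format.ts", sources)
print(hits)  # both format.ts files match; the suffix alone cannot choose
```

Without resolving the alias to a sub-project, any tie-break among the two hits is a guess, so either file may be credited with the test.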

Commits

39c786671 — initial polyglot skill (SKILL.md + script, 9 languages).
e1aa26772 — union structure + symbols after the validation
harness caught that Go type declarations were being dropped, causing
source/test pairings to silently miss.

Files

plugins/dotnet-test/skills/find-untested-sources-polyglot/SKILL.md
plugins/dotnet-test/skills/find-untested-sources-polyglot/scripts/find_untested_sources.py

Sibling of the C# find-untested-sources skill that uses tree-sitter (via tree-sitter-language-pack) to extract declarations + imports across Python, TS/JS, Go, Java, Rust, C#, and Ruby with no build step. Output schema mirrors the C# skill's so prompts can consume either tool.

Pairing strategies per language:
- Import resolution (Python module path, TS/JS relative ./../index.*, Go pkg/<name>.go, Java FQCN, Rust use::, Ruby require).
- Identifier overlap (every >=4-char token in the test source is cross-referenced with the declared-name index).

Smoke-tested on:
- python-flask-tasks fixture: 8 source / 0 test, 8 untested.
- typescript-vitest-cart fixture: 8 source / 0 test, 8 untested (node_modules pruned).
- AITestAgent C# repo: 3138 source / 761 test, 1419 tested, 1719 untested, 15 orphan.
- gh-skills Go repo: 14 source / 7 test, 8 tested, 6 untested, 0 orphan.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ython methods)

tree-sitter-language-pack's ProcessResult exposes two complementary views:
- structure: top-level items (classes, functions, structs, traits)
- symbols: flat declared-name list

Neither alone is complete:
- Go: structure has methods/functions but NOT 'type' declarations; symbols has the types. Without union, a test that references a struct name finds no matching source and the source is incorrectly marked untested.
- Python: structure stops at top-level, symbols also lists nested methods.
- Rust: structure has structs and impls; symbols has fn names inside impls.

Fix: union both lists, then drop kind='module'/'namespace' (these are packaging items that would cause false positives, e.g. Java's tree-sitter output emits the package name 'com' as a Module).

Verified across all 9 supported languages with synthetic source+test+orphan fixtures (python, javascript, typescript, tsx, go, java, rust, csharp, ruby): each fixture now reports tested=1, untested=1, orphan=0.

Real-repo regression check (AITestAgent C#): tested 1419 -> 1429, untested 1719 -> 1709 (10 sources now correctly paired via types previously missing from the declared-name index). Orphan count unchanged at 15 (no false-positive pairings introduced).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
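The fix in this commit can be illustrated with plain dictionaries standing in for the two views. This is a sketch under stated assumptions: the {"kind", "name"} item shape and the declared_names helper are hypothetical and do not reproduce the real ProcessResult types from tree-sitter-language-pack.

```python
MODULE_KINDS = {"module", "namespace"}


def declared_names(structure, symbols):
    """Union both views, dropping packaging-level items."""
    names = set()
    for item in list(structure) + list(symbols):
        # Compare case-insensitively so 'Module' and 'module' are both dropped.
        if item.get("kind", "").lower() in MODULE_KINDS:
            continue
        if item.get("name"):
            names.add(item["name"])
    return names


# Go-like case: the type only shows up in symbols, the method only in structure.
go_structure = [{"kind": "function", "name": "NewServer"},
                {"kind": "method", "name": "Start"}]
go_symbols = [{"kind": "type", "name": "Server"},
              {"kind": "function", "name": "NewServer"}]
go_names = declared_names(go_structure, go_symbols)

# Java-like case: the package name arrives as a Module and must not pair.
java_names = declared_names(
    [{"kind": "Module", "name": "com"}, {"kind": "class", "name": "OrderService"}],
    [],
)
```

With only structure, a test mentioning Server would find no declaring source; with the union it does, while com never enters the index.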

github-actions · 2026-06-08T14:48:07Z

Skill Coverage Report

	Plugin	Skill	Covered	Coverage

Copilot

Pull request overview

Adds a new find-untested-sources-polyglot skill under the dotnet-test plugin, providing a Python-based, parse-only analyzer that pairs tests to sources across multiple languages (via tree-sitter-language-pack) and emits a JSON report of untested sources.

Changes:

Added find-untested-sources-polyglot skill documentation describing inputs, output schema, heuristics, and limitations.
Added scripts/find_untested_sources.py, implementing repo scanning, per-language test/source classification, symbol extraction, import-based pairing, identifier-overlap pairing, and JSON output controls.

Show a summary per file

File	Description
plugins/dotnet-test/skills/find-untested-sources-polyglot/SKILL.md	Documents the new polyglot skill, including usage, output schema, and heuristic pairing approach.
plugins/dotnet-test/skills/find-untested-sources-polyglot/scripts/find_untested_sources.py	Implements the analyzer (file discovery, parsing, pairing logic, and JSON output).

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 5

+   - `symbols` — flat declared-name list, used as a fallback when
+     `structure` is empty.


+    if lang == "java":
+        if "test" in parts or "tests" in parts:
+            return True
+        return name.endswith("test.java") or name.endswith("tests.java") or name.startswith("test")
+


+        if stem.endswith("tests") or stem.endswith("test"):
+            return True


+    # Limit to declarations that are >=4 chars to avoid noise.
+    for decl, sources in by_decl.items():
+        if len(decl) < 4:
+            continue
+        if decl in test.referenced_identifiers:
+            for s in sources:
+                if s.lang == test.lang:
+                    found.add(s)
+    return found


+    base = PurePosixPath(target)
+    if target.startswith("/"):
+        joined = PurePosixPath(target.lstrip("/"))
+    else:
+        joined = (test_rel.parent / base).as_posix()
+        joined = PurePosixPath(re.sub(r"/[^/]+/\.\./", "/", joined))
+    # Try various extensions and /index suffix.
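As a side note on the hunk above: a single re.sub pass collapses only non-overlapping dir/../ pairs, so a target that climbs two levels keeps a .. segment. The sketch below contrasts that with posixpath.normpath from the standard library; whether the script should adopt it is a judgment call for the author, and the helper names and sample path are hypothetical.

```python
import posixpath
import re


def collapse_once(path):
    """Single regex pass, mirroring the substitution in the hunk."""
    return re.sub(r"/[^/]+/\.\./", "/", path)


def normalize(path):
    """Resolve every '.' and '..' segment lexically."""
    return posixpath.normpath(path)


# A test in tests/unit/ importing "../../src/cart".
joined = "tests/unit/../../src/cart"
print(collapse_once(joined))  # tests/../src/cart
print(normalize(joined))      # src/cart
```

posixpath.normpath is purely lexical (it does not touch the filesystem), which matches how repo-relative import targets are being handled here.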


github-actions · 2026-06-08T15:32:09Z

👋 @Evangelink — this PR has 5 unresolved review thread(s). When you're ready, please address the feedback and push an update; the triage bot will pick up the next state automatically. (Add the no-stale label to silence further pings.)

Copilot AI added 2 commits June 8, 2026 15:17

Copilot AI review requested due to automatic review settings June 8, 2026 14:47

Copilot started reviewing on behalf of Evangelink June 8, 2026 14:47

Copilot AI reviewed Jun 8, 2026

github-actions Bot added the waiting-on-author PR state label label Jun 8, 2026

Evangelink wants to merge 2 commits into main from dev/amauryleve/find-untested-polyglot

3 participants
