feat: add language-agnostic analysis packages for pyscn parity #1

Open
DaisukeYoda wants to merge 1 commit into main from feat/pyscn-parity-packages

Summary

  • Add 7 new packages (domain/, dfa/, semantic/, cbo/, lcom/, nesting/, source/) and extend clone/ with 3 new files to extract all language-independent analysis logic from pyscn into codescan-core
  • This enables pyscn to replace its internal copies with direct imports from codescan-core, reducing duplication and centralizing the shared algorithms
  • All new code follows the existing adapter pattern (interfaces plus `any`-typed fields), so language-specific logic remains in pyscn/jscan
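
To illustrate the adapter pattern mentioned above, here is a minimal sketch of a language-agnostic nesting-depth walk. The names `NodeClassifier`, `Children`, `IsNesting`, and `MaxDepth` are hypothetical and chosen for illustration only; the actual method set of `nesting.NestingClassifier` is not shown in this PR description and may differ.

```go
package main

import "fmt"

// NodeClassifier is a hypothetical adapter interface (not the real
// nesting.NestingClassifier). The core code walks an opaque tree, and the
// language front end decides what each node means.
type NodeClassifier interface {
	// Children returns the child nodes of n.
	Children(n any) []any
	// IsNesting reports whether n opens a new nesting level.
	IsNesting(n any) bool
}

// MaxDepth returns the deepest nesting level found under root.
// A nil root or nil classifier yields 0 instead of a panic.
func MaxDepth(root any, c NodeClassifier) int {
	if root == nil || c == nil {
		return 0
	}
	return walk(root, c, 0)
}

func walk(n any, c NodeClassifier, depth int) int {
	if c.IsNesting(n) {
		depth++
	}
	deepest := depth
	for _, child := range c.Children(n) {
		if d := walk(child, c, depth); d > deepest {
			deepest = d
		}
	}
	return deepest
}

// toyNode and toyClassifier stand in for a language-specific AST adapter.
type toyNode struct {
	kind     string
	children []*toyNode
}

type toyClassifier struct{}

func (toyClassifier) Children(n any) []any {
	node, ok := n.(*toyNode)
	if !ok || node == nil {
		return nil
	}
	out := make([]any, 0, len(node.children))
	for _, ch := range node.children {
		out = append(out, ch)
	}
	return out
}

func (toyClassifier) IsNesting(n any) bool {
	node, ok := n.(*toyNode)
	return ok && node != nil && (node.kind == "if" || node.kind == "for")
}

func main() {
	// func { for { if { stmt } } }
	tree := &toyNode{kind: "func", children: []*toyNode{
		{kind: "for", children: []*toyNode{
			{kind: "if", children: []*toyNode{{kind: "stmt"}}},
		}},
	}}
	fmt.Println("max nesting depth:", MaxDepth(tree, toyClassifier{}))
}
```

The walk never inspects node contents itself, which is what would let a Python or JavaScript front end plug in its own AST types behind `any`.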

New packages

| Package | Description |
|---|---|
| domain/ | Shared error types (DomainError), RiskLevel, CloneType, OutputFormat, SortCriteria, and all default threshold constants |
| dfa/ | Data flow analysis: DefUseKind, VarReference, DefUsePair, DefUseChain, DFAInfo, DFAFeatures, and DFABuilder with RefExtractor interface for language adapters |
| semantic/ | CFG+DFA semantic similarity: ExtractCFGFeatures, CompareCFGFeatures, CompareDFAFeatures, ComputeSimilarity with configurable weights |
| cbo/ | Coupling Between Objects: ComputeCBO with dependency kind breakdown and risk assessment |
| lcom/ | Lack of Cohesion of Methods (LCOM4): Union-Find algorithm with connected component grouping |
| nesting/ | Nesting depth analysis via NestingClassifier interface for language-agnostic tree traversal |
| source/ | File collection with glob-based include/exclude filtering and recursive directory walking |
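
As a rough illustration of the LCOM4 approach listed for lcom/, the sketch below counts connected components among a class's methods with a Union-Find structure. It is a simplified stand-in, not the code in this PR: the function name `LCOM4`, the `map[string][]string` input shape, and the rule that two methods are connected only when they share a field (method-to-method calls are ignored here) are all assumptions made for the example.

```go
package main

import "fmt"

// unionFind is a minimal disjoint-set structure over indices 0..n-1.
type unionFind struct{ parent []int }

func newUnionFind(n int) *unionFind {
	p := make([]int, n)
	for i := range p {
		p[i] = i
	}
	return &unionFind{parent: p}
}

// find returns the representative of x, shortening paths as it goes.
func (u *unionFind) find(x int) int {
	for u.parent[x] != x {
		u.parent[x] = u.parent[u.parent[x]] // path halving
		x = u.parent[x]
	}
	return x
}

func (u *unionFind) union(a, b int) {
	ra, rb := u.find(a), u.find(b)
	if ra != rb {
		u.parent[ra] = rb
	}
}

// LCOM4 returns the number of connected components among methods, where
// two methods are connected if they access at least one common field.
// methodFields maps a method name to the fields it touches (hypothetical
// input shape for this sketch).
func LCOM4(methodFields map[string][]string) int {
	names := make([]string, 0, len(methodFields))
	for m := range methodFields {
		names = append(names, m)
	}
	uf := newUnionFind(len(names))
	firstUser := make(map[string]int) // field -> index of first method seen using it
	for i, m := range names {
		for _, f := range methodFields[m] {
			if j, seen := firstUser[f]; seen {
				uf.union(i, j)
			} else {
				firstUser[f] = i
			}
		}
	}
	roots := make(map[int]struct{})
	for i := range names {
		roots[uf.find(i)] = struct{}{}
	}
	return len(roots)
}

func main() {
	class := map[string][]string{
		"getX": {"x"},
		"setX": {"x"},
		"getY": {"y"},
	}
	// getX and setX share field x; getY stands alone.
	fmt.Println("LCOM4:", LCOM4(class))
}
```

A result of 1 suggests a cohesive class; higher values suggest the class could be split along the component boundaries.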

Extended clone/ package

| File | Description |
|---|---|
| fragment.go | CodeFragment (implements GroupableItem), ClonePair, CloneGroup, CloneStatistics |
| similarity.go | SimilarityAnalyzer interface + StructuralAnalyzer wrapping APTED |
| classifier.go | Cascade clone classifier (Type-1 → Type-4) with configurable thresholds |

Key design decisions

  • dfa.RefExtractor and nesting.NestingClassifier are interfaces so language-specific AST logic stays in pyscn/jscan
  • semantic.ComputeSimilarity returns 0.0 for nil CFGs to prevent false-positive matches on parse failures
  • dfa.DFABuilder.Build validates the extractor is non-nil before use, returning an error instead of panicking
  • No existing files were modified — only new files added
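
The two nil-handling decisions above can be sketched as follows. This is a simplified stand-in under stated assumptions: the `RefExtractor` method set, the `Builder` and `CFG` types, `ErrNilExtractor`, and the equal-weight similarity formula are all invented for illustration and do not reflect the actual signatures of `dfa.DFABuilder.Build` or `semantic.ComputeSimilarity`.

```go
package main

import (
	"errors"
	"fmt"
)

// RefExtractor is a hypothetical language adapter that lists the variable
// names referenced by an opaque AST node.
type RefExtractor interface {
	Refs(node any) []string
}

// ErrNilExtractor is returned when a Builder has no extractor configured.
var ErrNilExtractor = errors.New("dfa: extractor is nil")

// Builder collects references using a language-specific extractor.
type Builder struct {
	Extractor RefExtractor
}

// Build validates the extractor before use and returns an error rather
// than panicking on a nil interface call.
func (b *Builder) Build(nodes []any) ([]string, error) {
	if b == nil || b.Extractor == nil {
		return nil, ErrNilExtractor
	}
	var refs []string
	for _, n := range nodes {
		refs = append(refs, b.Extractor.Refs(n)...)
	}
	return refs, nil
}

// CFG is a toy control-flow graph summary (assumed non-negative counts).
type CFG struct {
	Blocks int
	Edges  int
}

// Similarity returns 0.0 when either CFG is nil, so a parse failure can
// never be reported as a match. Otherwise it averages two size ratios.
func Similarity(a, b *CFG) float64 {
	if a == nil || b == nil {
		return 0.0
	}
	return 0.5*ratio(a.Blocks, b.Blocks) + 0.5*ratio(a.Edges, b.Edges)
}

// ratio returns min/max of two non-negative counts, or 1.0 if both are 0.
func ratio(x, y int) float64 {
	if x == 0 && y == 0 {
		return 1.0
	}
	if x > y {
		x, y = y, x
	}
	return float64(x) / float64(y)
}

func main() {
	_, err := (&Builder{}).Build(nil)
	fmt.Println("build error:", err)
	fmt.Println("nil CFG similarity:", Similarity(nil, &CFG{Blocks: 3, Edges: 2}))
}
```

Returning 0.0 for a missing CFG is the conservative choice: treating two failed parses as equal would otherwise score them as perfect matches.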

Test plan

  • go test ./... -count=1 — all 13 packages pass (24 new files, 4261 lines added)
  • go vet ./... — clean
  • go build ./... — clean
  • Nil-input edge cases covered (nil CFG, nil extractor, nil classifier, empty inputs)

🤖 Generated with Claude Code

Add 7 new packages and extend clone/ to extract all language-independent
logic from pyscn into codescan-core, enabling pyscn to import these
directly instead of maintaining its own copies.

New packages:
- domain/: shared error types, risk levels, clone types, default thresholds
- dfa/: data flow analysis with DefUse chains and reaching definitions
- semantic/: CFG+DFA semantic similarity comparison
- cbo/: Coupling Between Objects metric with risk assessment
- lcom/: Lack of Cohesion of Methods (LCOM4) via Union-Find
- nesting/: nesting depth analysis via NestingClassifier interface
- source/: file collection with glob pattern filtering

Extended clone/ with:
- fragment.go: CodeFragment implementing GroupableItem
- similarity.go: SimilarityAnalyzer interface + APTED-based StructuralAnalyzer
- classifier.go: cascade clone classifier (Type-1 → Type-4)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>