Clarpse is a multi-language architectural code analysis library for building better software tools.
<dependency>
<groupId>io.github.hadi-technology</groupId>
<artifactId>clarpse</artifactId>
<version>9.5.1</version>
</dependency>

Clarpse facilitates the development of tools that operate over the higher-level, architectural details of source code, which are exposed via an easy-to-use, object-oriented API. Check out the power of Clarpse in striff-lib.
Clarpse is a multi-language parsing and analysis library that converts source code into a language-agnostic, object-oriented model. That model makes it easy to build tooling on top of architecture-level details like components, references, and structure without dealing with raw ASTs.
- Supports Java with a lightweight, architecture-focused parser.
- Supports C# with JVM-based parsing, partial type merging, namespace-aware indexing, and fast in-repo symbol resolution.
- Supports TypeScript with compiler-accurate, tsconfig-aware parsing and resolution, including constructor parameter property detection and monorepo support.
- Supports Python with Pyright-backed parsing, including nested classes, comment parsing, cyclomatic complexity, code hashing, and visibility inference.
- Runtime configuration through a bundled properties file
- Lightweight
- Performant
- Easy to use
- Clean API built on top of AST
- Support for parsing comments
- Parallel parsing with configurable worker counts
- Java 17
- Maven 3.x
- Node.js 18/20/22/25 (required for TypeScript and Python parsing)
- No global `typescript` or `pyright` install is required (both are bundled)
- No local Python interpreter is required for Python parsing
Build the jar:
mvn clean package assembly:single
Start the HTTP API:
java -cp target/clarpse-<version>.jar com.hadi.clarpse.server.ClarpseServer
Health check:
curl -s http://localhost:8080/health
Parse a JSON request:
curl -s -X POST http://localhost:8080/parse \
-H "Content-Type: application/json" \
-d '{"language":"java","files":[{"path":"src/Foo.java","content":"package test; class Foo { void m() {} }"}]}'

Parse a zip (Java, TypeScript, or Python):
curl -s -X POST "http://localhost:8080/parse?lang=typescript" \
-H "Content-Type: application/zip" \
--data-binary @project.zip

Notes:
- TypeScript parsing requires a valid `tsconfig.json` in the project input.
- Python parsing uses bundled Pyright plus project imports/config for internal type linking.
- TypeScript and Python daemons resolve only bundled compiler/type-checker runtimes.
- Environment variables: `CLARPSE_PORT`, `CLARPSE_MAX_BYTES`, `CLARPSE_PARALLELISM`, `CLARPSE_PYTHON_PARALLELISM`, `CLARPSE_NODE_PATH`.
- Node override system properties: `clarpse.node.path`, `clarpse.node.disabled`.
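The JSON request above can also be issued from Java. The following is a minimal sketch using the JDK's `java.net.http` API; it only builds the request and does not send it. The class and method names (`ParseRequestSketch`, `jsonBody`, `parseRequest`) are hypothetical helpers for illustration and are not part of Clarpse.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds (but does not send) a JSON request for POST /parse. */
class ParseRequestSketch {

    /** Escapes backslashes, quotes, and the most common control characters. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"': sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Produces the same body shape as the curl example: one language, one file. */
    static String jsonBody(String language, String path, String content) {
        return "{\"language\":\"" + jsonEscape(language) + "\",\"files\":[{\"path\":\""
                + jsonEscape(path) + "\",\"content\":\"" + jsonEscape(content) + "\"}]}";
    }

    /** Builds a POST to <baseUrl>/parse with a JSON content type. */
    static HttpRequest parseRequest(String baseUrl, String body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/parse"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }
}
```

With the server running, the request could be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The shape of the response body is not described here, so the sketch leaves it unparsed.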
Build and run the container (no local jar required):
docker build -t clarpse-api .
docker run -p 8080:8080 clarpse-api

Then call the API the same way as the local server:
curl -s -X POST http://localhost:8080/parse \
-H "Content-Type: application/json" \
-d '{"language":"java","files":[{"path":"src/Foo.java","content":"package test; class Foo { void m() {} }"}]}'

Clarpse supports runtime configuration through environment variables, system properties, and a bundled properties file.
- `CLARPSE_PARALLELISM` controls Java parser thread count.
- `CLARPSE_PYTHON_PARALLELISM` or `-Dclarpse.python.parallelism=<n>` controls Python worker count.
- Values `1` or lower force serial parsing.
- If unset, Clarpse auto-selects a bounded value based on CPU count and file count.
Example:
CLARPSE_PARALLELISM=4 mvn test
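The parallelism rules can be sketched as a small resolution function. This is an illustration of the documented behavior only, not Clarpse's implementation: the cap of 8 workers, the treatment of unparsable values as unset, and the names `ParallelismSketch` and `resolveWorkers` are all assumptions.

```java
/** Hypothetical sketch of resolving a worker count from an optional setting. */
class ParallelismSketch {

    /** Assumed cap; the bound Clarpse actually applies is not stated in this README. */
    static final int ASSUMED_MAX_WORKERS = 8;

    /**
     * @param configured raw value of the setting, or null when unset
     * @param cpuCount   number of available processors
     * @param fileCount  number of files to parse
     */
    static int resolveWorkers(String configured, int cpuCount, int fileCount) {
        if (configured != null && !configured.isBlank()) {
            try {
                // Values of 1 or lower force serial parsing (one worker).
                return Math.max(1, Integer.parseInt(configured.trim()));
            } catch (NumberFormatException e) {
                // Assumption: an unparsable value is treated as unset.
            }
        }
        // Unset: pick a value bounded by CPU count, file count, and the assumed cap.
        int bounded = Math.min(Math.min(cpuCount, fileCount), ASSUMED_MAX_WORKERS);
        return Math.max(1, bounded);
    }
}
```

A caller might pass `System.getenv("CLARPSE_PARALLELISM")` and `Runtime.getRuntime().availableProcessors()` as the first two arguments.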
Clarpse includes configurable limits for zip processing to prevent resource exhaustion. These can be overridden via system properties or by modifying src/main/resources/clarpse.properties:
- `clarpse.zip.maxEntries` (default: 100000) - Maximum number of entries in a zip file
- `clarpse.zip.maxTotalUncompressedBytes` (default: 209715200, ~200MB) - Maximum total uncompressed size
- `clarpse.zip.maxEntryUncompressedBytes` (default: 10485760, ~10MB) - Maximum size per entry
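To illustrate what limits of this kind guard against, here is a minimal sketch that enforces entry-count and size limits while streaming a zip with `java.util.zip`. It counts the bytes actually decompressed rather than trusting sizes declared in the archive. The class `ZipLimitsSketch`, its method names, and the choice of exceptions are hypothetical; how Clarpse itself enforces these limits is not shown in this README.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch of zip limit enforcement; not Clarpse's implementation. */
class ZipLimitsSketch {

    // Defaults as documented for clarpse.properties.
    static final long DEFAULT_MAX_ENTRIES = 100_000L;
    static final long DEFAULT_MAX_TOTAL_BYTES = 209_715_200L;
    static final long DEFAULT_MAX_ENTRY_BYTES = 10_485_760L;

    /** Streams the archive and returns its entry count, or throws once a limit is exceeded. */
    static int countEntriesWithinLimits(InputStream in, long maxEntries,
                                        long maxTotalBytes, long maxEntryBytes) {
        int entries = 0;
        long total = 0;
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (++entries > maxEntries) {
                    throw new IllegalStateException("Too many zip entries");
                }
                long entryBytes = 0;
                int read;
                // Count decompressed bytes as they are read.
                while ((read = zip.read(buffer)) != -1) {
                    entryBytes += read;
                    total += read;
                    if (entryBytes > maxEntryBytes) {
                        throw new IllegalStateException("Entry too large: " + entry.getName());
                    }
                    if (total > maxTotalBytes) {
                        throw new IllegalStateException("Archive too large");
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    /** Builds an in-memory zip with the given number of zero-filled entries. */
    static byte[] zipOf(int entryCount, int bytesPerEntry) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (int i = 0; i < entryCount; i++) {
                zip.putNextEntry(new ZipEntry("file" + i + ".txt"));
                zip.write(new byte[bytesPerEntry]);
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Checking limits during the read, instead of after extraction, means an oversized archive is rejected before its full contents are held in memory.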
- `CLARPSE_NODE_PATH` or `-Dclarpse.node.path=<path>` sets a custom Node.js executable path.
- `CLARPSE_NODE_DISABLED` or `-Dclarpse.node.disabled=true` disables Node.js (TypeScript and Python parsing will fail).
- `CLARPSE_NODE_HEAP_SIZE` or `-Dclarpse.node.heapSize=<MB>` sets Node.js heap size in MB (default: 4096). Increase for large TypeScript/Python projects.
Example for large projects:
CLARPSE_NODE_HEAP_SIZE=8192 mvn test
# or
java -Dclarpse.node.heapSize=8192 -jar app.jar

Key areas of the repository:
- `src/main/java/com/hadi/clarpse/compiler` - Language compilers, project file handling, and orchestration.
- `src/main/java/com/hadi/clarpse/compiler/typescript` - TypeScript compiler bridge and models.
- `src/main/java/com/hadi/clarpse/compiler/python` - Python compiler bridge and models.
- `src/main/java/com/hadi/clarpse/compiler/ClarpseProperties.java` - Runtime properties loader.
- `src/main/java/com/hadi/clarpse/listener` - Parse tree listeners that build the source model (Java).
- `src/main/java/com/hadi/clarpse/sourcemodel` - Component and package models.
- `src/main/java/com/hadi/clarpse/reference` - Component reference types.
- `src/main/resources` - Parser helpers, daemon scripts, and configuration (TypeScript and Python daemons, properties file).
- `src/test/java` - Unit and integration tests.
- `src/test/resources` - Test fixtures and zipped codebases used by tests.
| Term | Definition |
|---|---|
| Component | A language-independent source unit of the code, typically a class, method, interface, field variable, local variable, enum, etc. |
| OOPSourceCodeModel | A representation of a codebase through a collection of Component objects. |
| Component Reference | A reference from an originating component to a target component, which typically exists in the form of import statements, variable declarations, method calls, and so on. |
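
The relationship between these three terms can be sketched with a toy model. The types below (`Unit`, `Reference`, `Model`) are deliberately named differently from Clarpse's classes and are hypothetical; the unique names and type labels in the sample are also assumptions made for illustration, not output produced by Clarpse.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Toy model mirroring the glossary; not Clarpse's actual classes. */
class GlossarySketch {

    /** A reference from the owning unit to a target, identified by unique name. */
    record Reference(String kind, String targetUniqueName) {}

    /** A language-independent source unit such as a class, method, or field. */
    record Unit(String uniqueName, String type,
                List<String> children, List<Reference> references) {}

    /** A codebase represented as a collection of units keyed by unique name. */
    static final class Model {
        private final Map<String, Unit> units = new LinkedHashMap<>();

        void insert(Unit unit) { units.put(unit.uniqueName(), unit); }

        Optional<Unit> get(String uniqueName) {
            return Optional.ofNullable(units.get(uniqueName));
        }

        int size() { return units.size(); }
    }

    /** Builds a two-unit sample: a class that owns a method referencing String. */
    static Model sample() {
        Model model = new Model();
        model.insert(new Unit("com.foo.SampleClass", "CLASS",
                List.of("com.foo.SampleClass.sampleMethod"), List.of()));
        model.insert(new Unit("com.foo.SampleClass.sampleMethod", "METHOD",
                List.of(), List.of(new Reference("TYPE_REFERENCE", "java.lang.String"))));
        return model;
    }
}
```

Storing children as unique names rather than nested objects keeps the model flat, so any unit can be looked up directly by name.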
Build and test in three steps:
- Generate ANTLR sources: `mvn generate-resources`
- Run tests: `mvn test`
- Build the full artifact: `mvn clean package assembly:single`
Run a single test class:
mvn -Dtest=com.hadi.test.java.SmokeTest test
The parsing flow is:
ProjectFiles -> ClarpseProject -> ClarpseCompiler -> Language Listener -> OOPSourceCodeModel
High level steps:
- Collect files in `ProjectFiles` (directory, zip, or in-memory).
- `ClarpseProject` selects a language compiler.
- The compiler parses files and walks the parse tree.
- The language listener builds `Component` objects and references.
- The resulting `OOPSourceCodeModel` is used by downstream tooling.
Core classes and where they live:
- Project entry and orchestration: `src/main/java/com/hadi/clarpse/compiler/ClarpseProject.java`
- Project inputs: `src/main/java/com/hadi/clarpse/compiler/ProjectFiles.java`, `src/main/java/com/hadi/clarpse/compiler/ProjectFile.java`
- Runtime properties: `src/main/java/com/hadi/clarpse/compiler/ClarpseProperties.java`, `src/main/resources/clarpse.properties`
- Compiler selection and results: `src/main/java/com/hadi/clarpse/compiler/CompilerFactory.java`, `src/main/java/com/hadi/clarpse/compiler/ClarpseCompiler.java`, `src/main/java/com/hadi/clarpse/compiler/CompileResult.java`
- Language compilers: `src/main/java/com/hadi/clarpse/compiler/ClarpseJavaCompiler.java`, `src/main/java/com/hadi/clarpse/compiler/typescript/ClarpseTypeScriptCompiler.java`, `src/main/java/com/hadi/clarpse/compiler/python/ClarpsePythonCompiler.java`
- Parse listeners: `src/main/java/com/hadi/clarpse/listener/JavaTreeListener.java`
- Source model: `src/main/java/com/hadi/clarpse/sourcemodel/OOPSourceCodeModel.java`, `src/main/java/com/hadi/clarpse/sourcemodel/Component.java`, `src/main/java/com/hadi/clarpse/sourcemodel/Package.java`
- References: `src/main/java/com/hadi/clarpse/reference/ComponentReference.java` and related types in `src/main/java/com/hadi/clarpse/reference`
- TypeScript daemon: `src/main/resources/typescript/daemon.js`
- Python daemon: `src/main/resources/python/daemon.js`
Note: TypeScript and Python parsing require Node.js.
Architecture docs:
- `docs/typescript-architecture.md`
- `docs/python-architecture.md`
Clarpse abstracts source code into a higher level model in a language-agnostic way.
The snippet below shows how to generate the model from in-memory files.
final String code =
"package com.foo; " +
"public class SampleClass { " +
" public void sampleMethod(String sampleMethodParam) { } " +
"}";
// Register the in-memory source file under a relative path.
final ProjectFiles projectFiles = new ProjectFiles();
projectFiles.insertFile(new ProjectFile("src/SampleClass.java", code));
// Compile the project as Java, then read the resulting model.
final ClarpseProject project = new ClarpseProject(projectFiles, Lang.JAVA);
CompileResult compileResult = project.result();
OOPSourceCodeModel codeModel = compileResult.model();
Collection<CompileFailure> failures = compileResult.failures();

Path rules for ProjectFile:
- Relative paths are supported and normalized (for example `src/Foo.java`).
- Absolute paths are supported.
- Parent traversal (`..`) is rejected.
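As an illustration of these rules, the sketch below normalizes a path and rejects parent traversal using plain string handling. The exact normalization Clarpse applies is not specified here, so the details (unifying separators, dropping `.` and empty segments) are assumptions, and `PathRulesSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the path rules; not Clarpse's implementation. */
class PathRulesSketch {

    /** Returns a forward-slash path without "." or empty segments; rejects "..". */
    static String normalize(String rawPath) {
        // Assumption: backslashes are treated as separators.
        String unified = rawPath.replace('\\', '/');
        boolean absolute = unified.startsWith("/");
        List<String> kept = new ArrayList<>();
        for (String segment : unified.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue; // redundant separator or current-directory marker
            }
            if (segment.equals("..")) {
                throw new IllegalArgumentException("Parent traversal is not allowed: " + rawPath);
            }
            kept.add(segment);
        }
        // Absolute paths keep their leading slash.
        return (absolute ? "/" : "") + String.join("/", kept);
    }
}
```

Rejecting `..` outright, rather than resolving it, avoids any path that could escape the project root once files are written to or read from disk.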
ProjectFiles can be initialized from:
- a local directory path
- a local zip file path
- a zip input stream
- in-memory `ProjectFile` entries
See src/test/java/com/hadi/test/ProjectFilesTest.java for examples.
TypeScript usage follows the same API, but requires Node.js and a valid tsconfig.json:
final ProjectFiles projectFiles = new ProjectFiles("/path/to/typescript-project");
final ClarpseProject project = new ClarpseProject(projectFiles, Lang.TYPESCRIPT);
CompileResult compileResult = project.result();
OOPSourceCodeModel codeModel = compileResult.model();

Next, inspect components:
codeModel.components().forEach(component -> {
System.out.println(component.uniqueName());
System.out.println(component.componentType());
System.out.println(component.comment());
System.out.println(component.modifiers());
System.out.println(component.children());
System.out.println(component.sourceFile());
});

Fetch a specific component by unique name:
Component classComponent = codeModel.getComponent("com.foo.SampleClass")
.orElseThrow();
System.out.println(classComponent.name());
System.out.println(classComponent.componentType());
System.out.println(classComponent.references());
String childUniqueName = classComponent.children().get(0);
Component methodComponent = codeModel.getComponent(childUniqueName).orElseThrow();
System.out.println(methodComponent.name());
System.out.println(methodComponent.codeFragment());

- Java, C#, TypeScript, and Python all report recoverable issues in `CompileResult.failures()` using language-agnostic error codes.
- `CompileException` is reserved for non-recoverable compiler errors.
Standardized error codes:
- `1000` Node runtime not available.
- `1001` Language runtime bundle not available.
- `1002` Required project config is missing (for example `tsconfig.json`).
- `1003` Project config parse/validation failed.
- `1004` Program/repository initialization failed.
- `2001` File is outside active program/repository scope.
- `2002` File not found on disk.
- `2003` File parse/model extraction failed.
- `2004` Daemon transport/runtime error.
- `2005` File skipped due to excluded path rules.
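Downstream tooling may want to map these numeric codes to readable messages. A minimal sketch of such a lookup follows; the enum `ErrorCodeSketch` and its constant names are hypothetical and only the numeric codes and descriptions come from the list above. How Clarpse exposes the code on a failure object is not shown here.

```java
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical lookup table for the standardized codes; constant names are invented. */
enum ErrorCodeSketch {
    NODE_RUNTIME_UNAVAILABLE(1000, "Node runtime not available"),
    LANGUAGE_BUNDLE_UNAVAILABLE(1001, "Language runtime bundle not available"),
    PROJECT_CONFIG_MISSING(1002, "Required project config is missing"),
    PROJECT_CONFIG_INVALID(1003, "Project config parse/validation failed"),
    PROGRAM_INIT_FAILED(1004, "Program/repository initialization failed"),
    FILE_OUT_OF_SCOPE(2001, "File is outside active program/repository scope"),
    FILE_NOT_FOUND(2002, "File not found on disk"),
    FILE_PARSE_FAILED(2003, "File parse/model extraction failed"),
    DAEMON_ERROR(2004, "Daemon transport/runtime error"),
    FILE_EXCLUDED(2005, "File skipped due to excluded path rules");

    private final int code;
    private final String description;

    ErrorCodeSketch(int code, String description) {
        this.code = code;
        this.description = description;
    }

    int code() { return code; }

    String description() { return description; }

    /** Finds the constant for a numeric code, or empty when the code is unknown. */
    static Optional<ErrorCodeSketch> fromCode(int code) {
        return Arrays.stream(values()).filter(e -> e.code == code).findFirst();
    }
}
```

Returning an `Optional` keeps the lookup safe if a newer version of the library introduces codes this table does not know about.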
Checklist for adding or updating a language implementation:
- Add or update the grammar in `src/main/antlr4/...`.
- Run `mvn generate-resources` to regenerate parser sources.
- Add a compiler in `src/main/java/com/hadi/clarpse/compiler`.
- Add a listener in `src/main/java/com/hadi/clarpse/listener`.
- Register the language and file extensions in `src/main/java/com/hadi/clarpse/compiler/Lang.java`.
- Add tests under `src/test/java` and fixtures under `src/test/resources`.
- Submit an issue describing your proposed change.
- Fork the repo, develop and test your code changes.
- Run `mvn test` and ensure all tests pass.
- If your change requires a version bump, update `pom.xml` and `README.md` using the x.y.z scheme:
  - x = main version number (breaking changes)
  - y = feature number (new features, optional bug fixes)
  - z = hotfix number (bug fixes only)
- Submit a pull request.