
go tool pprof.
PyTorch's profiler outputs traces in Chrome Trace Event format (JSON), which is difficult to analyze directly. This tool converts those traces into pprof's binary format, allowing you to:
- Visualize call stacks
- Identify performance bottlenecks
- Analyze CPU usage patterns
- Use the full suite of pprof analysis tools
git clone https://github.com/yourusername/torch2pprof
cd torch2pprof
make install
go install github.com/yourusername/torch2pprof/cmd/torch2pprof@latest
# Using the convert subcommand
torch2pprof convert input_trace.json output_profile.pb.gz
# Works with compressed files too
torch2pprof convert input_trace.json.gz output_profile.pb.gz
# Or use the default behavior (for backward compatibility)
torch2pprof input_trace.json output_profile.pb.gz
This will:
- Load the PyTorch trace JSON file (supports both .json and .json.gz files)
- Parse all complete events (ph=X) with positive durations
- Build call stacks by analyzing event nesting
- Encode to pprof protobuf format with gzip compression
Note: Input files can be either plain JSON or gzip-compressed. The tool automatically detects compression based on file extension (.gz) or file content (magic number detection).
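As an illustration of the content-based detection mentioned in the note, the following is a minimal sketch rather than the tool's actual code. It assumes the check is done by peeking at the first two bytes of the input for the gzip magic number (0x1f 0x8b); `openMaybeGzip`, `readAll`, and `compress` are hypothetical names introduced only for this example.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// openMaybeGzip (hypothetical helper) peeks at the first two bytes of r.
// If they match the gzip magic number 0x1f 0x8b, the stream is wrapped in
// a gzip reader; otherwise the data is passed through unchanged.
func openMaybeGzip(r io.Reader) (io.Reader, error) {
	br := bufio.NewReader(r)
	head, err := br.Peek(2)
	if err != nil {
		// Fewer than two bytes available: not gzip, pass through as-is.
		return br, nil
	}
	if head[0] == 0x1f && head[1] == 0x8b {
		zr, err := gzip.NewReader(br)
		if err != nil {
			return nil, err
		}
		return zr, nil
	}
	return br, nil
}

// readAll decodes data (plain or gzip-compressed) into a string.
func readAll(data []byte) (string, error) {
	r, err := openMaybeGzip(bytes.NewReader(data))
	if err != nil {
		return "", err
	}
	out, err := io.ReadAll(r)
	return string(out), err
}

// compress gzips s in memory; used here only to produce demo input.
func compress(s string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte(s))
	zw.Close()
	return buf.Bytes()
}

func main() {
	plain := `{"traceEvents":[]}`
	a, _ := readAll([]byte(plain))
	b, _ := readAll(compress(plain))
	// Both inputs should decode to the same JSON text.
	fmt.Println(a == plain, b == plain)
}
```

Checking the content rather than only the extension means a gzip file saved without the .gz suffix would still be read correctly under this approach.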
# Show top 20 operations (default)
torch2pprof analyze input_trace.json
# Works with compressed files
torch2pprof analyze input_trace.json.gz
# Show top 50 operations
torch2pprof analyze -top 50 input_trace.json.gz
This displays:
- Total number of events and statistics
- Time breakdown by category
- Top operations by total time
Note: Both .json and .json.gz files are supported.
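To make the "top operations by total time" output more concrete, here is a rough sketch of how such a ranking could be computed. It is not the tool's analyzer: it assumes the trace is a JSON object with a `traceEvents` array (the layout PyTorch's profiler typically exports), and the `Event`, `opTotal`, and `topOps` names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Event holds the Chrome Trace Event fields this sketch needs.
type Event struct {
	Name string  `json:"name"`
	Cat  string  `json:"cat"`
	Ph   string  `json:"ph"`
	Dur  float64 `json:"dur"` // duration in microseconds
}

// traceFile assumes the object form with a "traceEvents" array.
type traceFile struct {
	TraceEvents []Event `json:"traceEvents"`
}

// opTotal accumulates time and call count for one operation name.
type opTotal struct {
	Name  string
	Total float64
	Count int
}

// topOps (hypothetical) keeps complete events (ph=X) with positive
// duration, sums durations per name, and returns the n largest totals.
func topOps(raw []byte, n int) ([]opTotal, error) {
	var tf traceFile
	if err := json.Unmarshal(raw, &tf); err != nil {
		return nil, err
	}
	byName := map[string]*opTotal{}
	for _, e := range tf.TraceEvents {
		if e.Ph != "X" || e.Dur <= 0 {
			continue
		}
		t := byName[e.Name]
		if t == nil {
			t = &opTotal{Name: e.Name}
			byName[e.Name] = t
		}
		t.Total += e.Dur
		t.Count++
	}
	out := make([]opTotal, 0, len(byName))
	for _, t := range byName {
		out = append(out, *t)
	}
	// Largest total first; ties broken by name for stable output.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Total != out[j].Total {
			return out[i].Total > out[j].Total
		}
		return out[i].Name < out[j].Name
	})
	if n >= 0 && n < len(out) {
		out = out[:n]
	}
	return out, nil
}

func main() {
	raw := []byte(`{"traceEvents":[
		{"name":"aten::mm","cat":"cpu_op","ph":"X","ts":0,"dur":30},
		{"name":"aten::add","cat":"cpu_op","ph":"X","ts":40,"dur":5},
		{"name":"aten::mm","cat":"cpu_op","ph":"X","ts":50,"dur":20},
		{"name":"marker","ph":"i","ts":60}
	]}`)
	ops, err := topOps(raw, 20)
	if err != nil {
		panic(err)
	}
	for _, o := range ops {
		fmt.Printf("%-12s total=%.0fus count=%d\n", o.Name, o.Total, o.Count)
	}
}
```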
After conversion, analyze the profile with go tool pprof:
go tool pprof output_profile.pb.gz
Common pprof commands:
- top - Show top functions by time
- list <function> - Show source code with annotations
- web - Generate a graph visualization (requires graphviz)
- flame - Generate flame graph
Convert PyTorch trace to pprof format.
torch2pprof convert <input.json|input.json.gz> <output.pb.gz>
Arguments:
- input.json|input.json.gz - PyTorch trace file in Chrome Trace Event format (plain or gzip-compressed)
- output.pb.gz - Output pprof profile (gzip compressed)
Features:
- Automatically detects gzip compression via .gz extension or magic number
- Supports both plain JSON and compressed JSON files
Analyze PyTorch trace and show statistics.
torch2pprof analyze [options] <input.json|input.json.gz>
Options:
- -top N - Show top N operations (default: 20)
Arguments:
- input.json|input.json.gz - PyTorch trace file to analyze (plain or gzip-compressed)
Features:
- Automatically detects gzip compression via .gz extension or magic number
- Supports both plain JSON and compressed JSON files
torch2pprof/
├── cmd/                          # Command-line applications
│   └── torch2pprof/              # Main tool with subcommands
│       └── main.go               # Entry point with convert & analyze commands
│
├── internal/                     # Private packages (not for external import)
│   ├── profile/
│   │   └── profile.go            # pprof protobuf encoding
│   └── converter/                # Core conversion and analysis logic
│       ├── trace.go              # Trace loading, processing, and conversion
│       └── analyzer.go           # Trace analysis and statistics
│
├── test/                         # Test data and utilities
│   └── pprof_verification.py     # Python script to verify pprof output
│
├── data/                         # Sample data
│   └── trace.json.gz             # Example PyTorch trace
│
├── doc/                          # Documentation
├── go.mod                        # Go module definition
├── go.sum                        # Dependency checksums
├── Makefile                      # Build automation
├── README.md                     # User documentation
# Build binary
make build
# Run tests
make test
# Run tests with coverage
make test-coverage
# Run tests with race detector
make test-race
# Format code
make fmt
# Lint code
make vet
# Build for multiple platforms
make dist
The project has comprehensive unit tests with high code coverage:
- Test Coverage: 96.2% (converter), 93.0% (profile)
- Total Tests: 20 unit tests
- CI/CD: Automated testing on Linux, macOS, Windows
- Race Detection: All tests run with race detector
See TESTING.md for detailed testing documentation.
# Run all tests
make test
# Run with coverage report
make test-coverage
# Open coverage.html in browser
# Run with race detector
make test-race
- Load Trace: Parse the JSON trace file, which uses the Chrome Trace Event format
- Filter Events: Keep only complete events (ph=X) with positive duration
- Group by Thread: Organize events by their thread ID
- Build Stacks: For each event, determine its call stack by analyzing event overlaps:
  - Events that temporally contain other events represent parent functions
  - Uses a linear-time stack-based algorithm instead of O(n²) comparison
- Aggregate: Combine identical stacks and sum their durations
- Encode: Convert to pprof protobuf format and compress with gzip
- Linear time complexity for stack building (O(n) per thread)
- Parallel processing across multiple threads
- Efficient memory usage with string interning
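The grouping, stack-building, and aggregation steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code in trace.go: it sorts each thread's events by start time first (the sort itself is O(n log n); only the stack pass afterwards is linear), treats an event as a child only when the parent fully contains it, and sums full event durations per stack as the Aggregate step describes. The `Event`, `buildStacks`, and `convert` names and the `;`-joined stack key are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Event is a simplified complete (ph=X) trace event.
type Event struct {
	Name string
	TID  int
	Ts   float64 // start time, microseconds
	Dur  float64 // duration, microseconds
}

// buildStacks computes total duration per call stack for one thread.
// After sorting, each event is pushed once and popped at most once,
// so the pass over the events is linear.
func buildStacks(events []Event) map[string]float64 {
	sorted := append([]Event(nil), events...)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].Ts != sorted[j].Ts {
			return sorted[i].Ts < sorted[j].Ts
		}
		return sorted[i].Dur > sorted[j].Dur // longer (parent) first
	})
	totals := map[string]float64{}
	var open []Event // currently enclosing events, outermost first
	for _, e := range sorted {
		// Pop events that do not fully contain e.
		for len(open) > 0 {
			top := open[len(open)-1]
			if top.Ts+top.Dur >= e.Ts+e.Dur {
				break
			}
			open = open[:len(open)-1]
		}
		names := make([]string, 0, len(open)+1)
		for _, p := range open {
			names = append(names, p.Name)
		}
		names = append(names, e.Name)
		totals[strings.Join(names, ";")] += e.Dur
		open = append(open, e)
	}
	return totals
}

// convert groups events by thread ID, processes each thread in its own
// goroutine, and merges the per-thread totals.
func convert(events []Event) map[string]float64 {
	byThread := map[int][]Event{}
	for _, e := range events {
		if e.Dur > 0 {
			byThread[e.TID] = append(byThread[e.TID], e)
		}
	}
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	merged := map[string]float64{}
	for _, evs := range byThread {
		wg.Add(1)
		go func(evs []Event) {
			defer wg.Done()
			local := buildStacks(evs)
			mu.Lock()
			defer mu.Unlock()
			for k, v := range local {
				merged[k] += v
			}
		}(evs)
	}
	wg.Wait()
	return merged
}

func main() {
	stacks := convert([]Event{
		{"forward", 1, 0, 100},
		{"linear", 1, 10, 40},
		{"addmm", 1, 15, 20},
		{"relu", 1, 60, 10},
		{"loader", 2, 0, 50},
	})
	keys := make([]string, 0, len(stacks))
	for k := range stacks {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%-24s %.0fus\n", k, stacks[k])
	}
}
```

In this sketch, events that only partially overlap on the same thread are treated as siblings rather than parent and child; a real converter may handle that case differently.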
- Go 1.24 or later
MIT
Contributions are welcome! Please ensure:
- Code passes go fmt and go vet
- All tests pass
- New features include tests
For very large trace files (>100MB):
- Ensure sufficient memory (at least 2GB recommended)
- Consider filtering the trace in PyTorch before exporting
- Use gzip-compressed files (.json.gz) to save disk space and reduce I/O time
  - Example: A 322MB JSON file compresses to 23MB with gzip (93% reduction)
The tool maintains maps for:
- String interning (string → index)
- Function deduplication (name+file → ID)
- Location deduplication (name+file → ID)
For profiles with millions of unique functions, this can use several GB.
- pprof - Profile visualization
- PyTorch Profiler - Profile generation
- Chrome DevTools - View traces directly