100 changes: 100 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,100 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

asm-parser is a C++ tool that categorizes and filters assembly code for Compiler Explorer (CE). It processes both compiler-generated assembly text and GNU objdump disassembly of binaries, and emits the filtered assembly as JSON or plain text for display in CE.

## Build Commands

**Initial Setup:**
```bash
./setup.sh # Sets up Python venv, installs Conan2, configures C++20 profile
```

**Development Build:**
```bash
mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug -G Ninja # Ninja preferred
cmake --build . --config Debug
```

**Production Build:**
```bash
mkdir -p build && cd build
cmake -GNinja -DCMAKE_BUILD_TYPE=Release ..
cmake --build . --target asm-parser
```

**Running Tests:**
```bash
cmake --build build --target test # Via CMake/CTest; works with both Ninja and Make
./build/src/test/asm-parser-test # Direct execution
```

**Code Formatting:**
```bash
clang-format -i src/*/*.cpp src/*/*.hpp
```

## Architecture

**Core Components:**
- `src/assembly/parser.{cpp,hpp}` - Parses compiler assembly text output
- `src/objdump/parser.{cpp,hpp}` - Parses GNU objdump binary output
- `src/types/` - Core interfaces (IParser) and data structures (Filter, Line)
- `src/utils/` - JSON output, regex wrappers, library detection utilities

**Parser Architecture:**
Both parsers implement the `IParser` interface and parse their input with a state machine. The `Filter` struct controls which filtering operations are applied (directives, unused labels, comments, library functions, etc.).
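
As a rough illustration of this design, the sketch below pairs a cut-down `Filter` with a parser interface and a toy character-driven state machine. Everything in the `sketch` namespace is hypothetical: `ToyParser`, `outputText`, the `State` enum, the field names (guessed from the `-directives` and `-comment_only` flags), and the classification rules are assumptions for illustration, not the contents of `src/types/` or of either real parser.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace sketch
{
// Hypothetical subset of the filter flags.
struct Filter
{
    bool directives{};   // drop assembler directives such as ".file"
    bool comment_only{}; // drop lines that contain only a comment
};

// Hypothetical interface: consume input, then emit the filtered result.
class IParser
{
public:
    virtual ~IParser() = default;
    virtual void fromStream(std::istream &in) = 0;
    virtual void outputText(std::ostream &out) const = 0;
};

// Toy parser: classifies each line by its first non-blank character.
class ToyParser : public IParser
{
public:
    explicit ToyParser(Filter f) : filter(f) {}

    void fromStream(std::istream &in) override
    {
        char c;
        while (in.get(c))
            this->feed(c);
        this->feed('\n'); // flush a final line that lacks a trailing newline
    }

    void outputText(std::ostream &out) const override
    {
        for (const auto &line : this->lines)
            out << line << '\n';
    }

private:
    enum class State { LineStart, Directive, Comment, Code };

    void feed(char c)
    {
        if (c == '\n')
        {
            this->endLine();
            return;
        }
        if (this->state == State::LineStart && c != ' ' && c != '\t')
        {
            // The first non-blank character decides the state for this line.
            this->state = (c == '.') ? State::Directive : (c == '#') ? State::Comment : State::Code;
        }
        this->text += c;
    }

    void endLine()
    {
        // A line such as ".L3:" starts with '.' but is a label, so keep it.
        const bool is_label = !this->text.empty() && this->text.back() == ':';
        const bool drop = this->state == State::LineStart // blank line
            || (this->state == State::Directive && this->filter.directives && !is_label)
            || (this->state == State::Comment && this->filter.comment_only);
        if (!drop)
            this->lines.push_back(this->text);
        this->text.clear();
        this->state = State::LineStart;
    }

    Filter filter;
    State state{State::LineStart};
    std::string text;
    std::vector<std::string> lines;
};
} // namespace sketch
```

Keeping the filter flags in one plain struct means a parser can be configured once at construction and consulted wherever a line is finalized, which is the role the `Filter` struct plays in the description above.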

**Key Features:**
- Binary mode: Processes objdump output with addresses, opcodes, relocations
- Assembly text mode: Processes compiler-generated assembly
- Label analysis: Identifies and removes unused labels/functions
- Library detection: Filters external library code based on file paths
- Multiple output formats: JSON (default) or filtered text
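
The label analysis feature can be pictured as a two-pass sweep: collect every referenced label, then drop label lines nobody refers to. The following is a minimal sketch under that assumption; `Line`, `remove_unused_labels`, and the `roots` parameter are hypothetical names, and the real parsers may track labels quite differently.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

namespace sketch
{
// Hypothetical line record; not the project's own data structure.
struct Line
{
    std::string text;              // assembly text of the line
    std::string defines;           // label defined on this line, or empty
    std::vector<std::string> refs; // labels this line refers to
};

// Remove label-definition lines whose label is neither referenced by any
// line nor listed in `roots` (labels to keep unconditionally, e.g. "main").
// Only the label line itself is dropped; the code after it is left alone.
std::vector<Line> remove_unused_labels(const std::vector<Line> &lines,
                                       const std::unordered_set<std::string> &roots)
{
    // Pass 1: gather every label that is referenced somewhere.
    std::unordered_set<std::string> used = roots;
    for (const auto &line : lines)
        used.insert(line.refs.begin(), line.refs.end());

    // Pass 2: keep everything except definitions of unused labels.
    std::vector<Line> kept;
    for (const auto &line : lines)
    {
        const bool is_unused_label = !line.defines.empty() && used.count(line.defines) == 0;
        if (!is_unused_label)
            kept.push_back(line);
    }
    return kept;
}
} // namespace sketch
```

Two passes are needed because a label may be referenced by a jump that appears later in the listing than its definition.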

## Testing Framework

Tests use **Catch2** with **ApprovalTests** for regression testing. Test data in `resources/` includes real-world assembly examples and bug reproduction cases from Compiler Explorer.

**Test file naming convention:**
- Input: `resources/example.asm`
- Expected output: `resources/asmtext_filter_tests.example.approved.txt`

Tests cover various architectures, compiler outputs, and edge cases from CE bug reports.
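
The naming convention above can be written down as a pair of small helpers. This is only a sketch: `input_path` and `approved_path` are hypothetical functions, not part of the test suite, and the `asmtext_filter_tests` prefix is assumed to come from the test source file `src/test/asmtext_filter_tests.cpp`.

```cpp
#include <string>

namespace sketch
{
// Path of the input assembly for a test case named `test_name`.
std::string input_path(const std::string &test_name)
{
    return "resources/" + test_name + ".asm";
}

// Path of the approved (expected) output for the same test case.
std::string approved_path(const std::string &test_name)
{
    return "resources/asmtext_filter_tests." + test_name + ".approved.txt";
}
} // namespace sketch
```

For a test case named `ce-bug-1648`, these helpers yield `resources/ce-bug-1648.asm` and `resources/asmtext_filter_tests.ce-bug-1648.approved.txt`.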

## Dependencies

Managed via **Conan 2.x**:
- Catch2 2.13.10 (testing)
- ApprovalTests.cpp 10.12.2 (golden master testing)
- fmt 11.0.0 (string formatting)
- ctre 3.7.1 (compile-time regex)

## Development Notes

- Requires GCC 12+ or equivalent with C++20 support
- Debug builds include sanitizers (AddressSanitizer, LeakSanitizer, UBSan)
- Release builds use -O3 with Link Time Optimization
- UTF-8 locale required for Unicode support
- Production deployment copies binary to `/usr/local/bin/asm-parser`

## Common Usage Patterns

**Binary objdump processing:**
```bash
objdump -d a.out -l --insn-width=16 | asm-parser -stdin -binary
```

**Assembly text filtering:**
```bash
asm-parser -directives -unused_labels -comment_only file.asm
```

**Text output mode:**
```bash
asm-parser -outputtext -library_functions input.asm
```
1 change: 1 addition & 0 deletions resources/asmtext_filter_tests.ce-bug-1648.approved.txt

36 changes: 36 additions & 0 deletions resources/ce-bug-1648.asm

22 changes: 20 additions & 2 deletions src/objdump/parser.cpp
@@ -134,7 +134,7 @@ void AsmParser::ObjDumpParser::label()
if (this->filter.library_functions)
this->maybe_remove_last_function();

-this->state.ignoreUntilNextLabel = AssemblyTextParserUtils::shouldIgnoreFunction(this->state.text, this->filter);
+this->state.ignoreUntilNextLabel = this->shouldIgnoreFunction(this->state.text);
if (this->state.ignoreUntilNextLabel)
return;

@@ -159,9 +159,10 @@ void AsmParser::ObjDumpParser::labelref()
{
this->state.currentLabelReference.name = this->state.text.substr(this->state.currentLabelReference.range.start_col);

-if (!AssemblyTextParserUtils::shouldIgnoreFunction(this->state.currentLabelReference.name, this->filter))
+if (!AsmParser::AssemblyTextParserUtils::shouldIgnoreFunction(this->state.currentLabelReference.name, this->filter))
{
this->state.currentLine.labels.push_back(this->state.currentLabelReference);
this->referenced_functions.insert(this->state.currentLabelReference.name);
}
}
catch (...)
@@ -297,6 +298,23 @@ void AsmParser::ObjDumpParser::address()
this->state.text.clear();
}

bool AsmParser::ObjDumpParser::shouldIgnoreFunction(std::string_view name) const
{
if (name == "main")
{
return false;
}

// Don't filter if the function is referenced by a non-filtered function
if (this->referenced_functions.count(std::string(name)) > 0)
{
return false;
}

// Apply the original filtering logic
return AssemblyTextParserUtils::shouldIgnoreFunction(name, this->filter);
}

void AsmParser::ObjDumpParser::setReproducible()
{
this->reproducible = true;
3 changes: 2 additions & 1 deletion src/objdump/parser.hpp
@@ -7,6 +7,7 @@
#include <iosfwd>
#include <string_view>
#include <unordered_map>
#include <unordered_set>

namespace AsmParser
{
@@ -49,12 +50,12 @@ class ObjDumpParser : public IParser
LibraryDetection lib_detection;
std::vector<asm_line> lines;
std::vector<asm_labelpair_t> labels;
std::unordered_set<std::string> referenced_functions;

bool reproducible;

size_t total_lines{};

// todo: bad names
void actually_address();
void actually_filename();
void do_file_check(std::string_view filename);
34 changes: 34 additions & 0 deletions src/test/asmtext_filter_tests.cpp
@@ -269,3 +269,37 @@ TEST_CASE("example-llvm-objdump")

ApprovalTests::Approvals::verify(ss.str());
}

TEST_CASE("ce-bug-1648")
{
AsmParser::Filter filter;
filter.binary = true;
filter.plt = true;
filter.library_functions = true;
filter.unused_labels = true;
filter.code_only = true;

std::string asmpath;
if (std::filesystem::current_path().string().ends_with("test"))
{
asmpath = "../../../resources/ce-bug-1648.asm";
}
else
{
asmpath = "../../resources/ce-bug-1648.asm";
}

AsmParser::ObjDumpParser parser(filter);
parser.setReproducible();

std::fstream fs;
fs.open(asmpath, std::fstream::in);
REQUIRE(fs.is_open() == true);

parser.fromStream(fs);

std::stringstream ss;
parser.outputJson(ss);

ApprovalTests::Approvals::verify(ss.str());
}
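
The intent stated in the new `shouldIgnoreFunction` comment ("Don't filter if the function is referenced by a non-filtered function") can be tried out in isolation with the sketch below. It is not the PR's code: `FunctionFilter`, `note_reference`, and `base_should_ignore` are hypothetical names, and the base rule is a made-up placeholder because the real `AssemblyTextParserUtils::shouldIgnoreFunction` is not shown in this diff. One deliberate difference: `labelref()` in the diff records a name only when the base rule would already keep it, whereas this sketch records every reference it is told about.

```cpp
#include <string>
#include <string_view>
#include <unordered_set>

namespace sketch
{
// Placeholder for the original filtering rule (an assumption): treat
// names that start with "__" as library functions to hide.
inline bool base_should_ignore(std::string_view name)
{
    return name.substr(0, 2) == "__";
}

class FunctionFilter
{
public:
    // Call this when a function that is being kept refers to `name`.
    void note_reference(std::string_view name)
    {
        this->referenced.insert(std::string(name));
    }

    // Decide whether the function `name` should be hidden.
    bool should_ignore(std::string_view name) const
    {
        if (name == "main")
            return false; // always keep the entry point

        // Keep anything a kept function has referenced so far.
        if (this->referenced.count(std::string(name)) > 0)
            return false;

        // Otherwise fall back to the original rule.
        return base_should_ignore(name);
    }

private:
    std::unordered_set<std::string> referenced;
};
} // namespace sketch
```

Because the set is consulted at the moment a function's label is reached, only references seen earlier in the input can rescue a function; a reference that appears after the function's own label comes too late in a single pass.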