Skip to content

Fix statically linked library functions being filtered out#49

Draft
mattgodbolt wants to merge 3 commits into
mainfrom
claude/fix_filter
Draft

Fix statically linked library functions being filtered out#49
mattgodbolt wants to merge 3 commits into
mainfrom
claude/fix_filter

Conversation

@mattgodbolt

Copy link
Copy Markdown
Member

Summary

  • Fixes issue where statically linked library functions like __divdc3 were incorrectly filtered out even when directly called by user code
  • Implements a smart two-pass solution that keeps referenced functions while still filtering out unused library code

Changes

  • New method: collectReferencedFunctions() scans assembly to identify function calls from user code
  • Enhanced filtering: Functions are now kept if they're either user functions OR referenced by user functions
  • Comprehensive test case: Added ce-bug-1648.asm test demonstrating __divdc3 usage and proper filtering

Root Cause

The binaryIgnoreFunction regex filtered out all functions starting with __, including __divdc3, even when they were statically linked and called from user code. This created a poor user experience where important library function implementations were hidden.

Solution

Two-pass approach:

  1. Pass 1: Scan all assembly lines to collect functions referenced from user code
  2. Pass 2: Apply enhanced filtering logic that keeps both user functions and referenced library functions

Test plan

  • All existing tests pass with updated snapshots
  • New test case ce-bug-1648.asm demonstrates the fix working correctly
  • Only __divdc3 (the referenced function) is kept, other unreferenced __* functions remain filtered
  • No regressions in broader test suite
  • Local testing confirms __divdc3 function implementation now visible with static linking

Impact

  • Positive: Users can now see implementations of library functions they're actually calling
  • Minimal: Unreferenced library functions are still filtered out to avoid output bloat
  • Backward compatible: No change in behavior for existing use cases

Based on compiler-explorer/compiler-explorer#7776

🤖 Generated with Claude Code

## Summary
- Fixes issue where statically linked library functions like `__divdc3` were incorrectly filtered out even when directly called by user code
- Implements a smart two-pass solution that keeps referenced functions while still filtering out unused library code

## Changes
- **New method**: `collectReferencedFunctions()` scans assembly to identify function calls from user code
- **Enhanced filtering**: Functions are now kept if they're either user functions OR referenced by user functions
- **Comprehensive test case**: Added `ce-bug-1648.asm` test demonstrating `__divdc3` usage and proper filtering

## Root Cause
The `binaryIgnoreFunction` regex filtered out all functions starting with `__`, including `__divdc3`, even when they were statically linked and called from user code. This created a poor user experience where important library function implementations were hidden.

## Solution
Two-pass approach:
1. **Pass 1**: Scan all assembly lines to collect functions referenced from user code
2. **Pass 2**: Apply enhanced filtering logic that keeps both user functions and referenced library functions

## Impact
- **Positive**: Users can now see implementations of library functions they're actually calling
- **Minimal**: Unreferenced library functions are still filtered out to avoid output bloat
- **Backward compatible**: No change in behavior for existing use cases

Based on compiler-explorer/compiler-explorer#7776

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@mattgodbolt mattgodbolt marked this pull request as draft June 9, 2025 16:16
@mattgodbolt mattgodbolt requested a review from Copilot June 9, 2025 16:18
@mattgodbolt

Copy link
Copy Markdown
Member Author

Doesn't look bad to me; keeps in the spirit of the existing code (Which I don't know that well). It's not quite how I'd do it myself but it seems to work (and again keeps in the existing style).

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a two-pass parsing strategy to ensure statically linked library functions called by user code aren’t filtered out.

  • Adds collectReferencedFunctions() to scan for user-referenced symbols before filtering
  • Updates shouldIgnoreFunction() to preserve any functions found in the first pass
  • Introduces ce-bug-1648.asm to validate that __divdc3 is retained while other __* symbols remain filtered

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

File Description
src/objdump/parser.hpp Added referenced_functions member and declared collectReferencedFunctions()
src/objdump/parser.cpp Implemented two-pass logic: collect references, reset stream, override filtering
resources/ce-bug-1648.asm New assembly test case exercising retention of __divdc3
Comments suppressed due to low confidence (1)

CLAUDE.md:1

  • [nitpick] The file name CLAUDE.md is ambiguous; consider renaming it to something more descriptive like CODE_GUIDELINES.md or CONTRIBUTING.md to clarify its purpose.
# CLAUDE.md

Comment thread src/objdump/parser.cpp
Comment thread src/objdump/parser.cpp
@mattgodbolt mattgodbolt marked this pull request as ready for review June 9, 2025 16:21
@mattgodbolt mattgodbolt requested a review from partouf June 9, 2025 16:21
Comment thread src/objdump/parser.cpp Outdated
std::string current_function;
bool in_user_function = false;

while (std::getline(in, line))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of going over the istream twice, that might negate any benefit we get from using this parser. I would prefer adding this to the eol() function add the found functions to a list that can filtered with at the end before exporting to json.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah but that might produce issues if the dump is larger than 5000 lines...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is the only viable solution... but I don't like it

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this when I did the TS version. But we need to do two passes, as having found something is "important" changes how we do all the removal later on. I don't think there's a way to do it in a single pass? At least in the typescript, the filtering happens as we go; so we can't do it at the end. Is the C++ code substantially different?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a rationale in the TS issue. We can also choose not to fix this issue :)

@partouf

partouf commented Jul 22, 2025

Copy link
Copy Markdown
Member

Technically the example resources/ce-bug-1648.asm doesn't show the situation where the statically linked function is not before the usage, and so in this case a normal approach would work here.
The original code in compiler-explorer/compiler-explorer#1648 will likely have the same ordering.

I think in this case it's better to be on the safe side and prevent any performance regressions that might be caused by iterating over the dump twice.

So basically:

  • Move the part of determining usage to eol()
  • Remove collectReferencedFunctions() and the call to it

@mattgodbolt

Copy link
Copy Markdown
Member Author

If the test cases pass and the original example cited in the bug work, then we should definitely simplify this here and in the main code.

If not (the original example still fails) then the test is broken and I made it too simple

Either way, glad to have more eyes on this.

…d functions - assuming the labels and usages are ordered favorably
@partouf

partouf commented Aug 21, 2025

Copy link
Copy Markdown
Member

trying to add a test case but struggling...

@partouf partouf marked this pull request as draft August 21, 2025 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants