Track accuracy over time by cmalinmayor · Pull Request #331 · live-image-tracking-tools/traccuracy

cmalinmayor · 2026-01-26T17:10:01Z

Proposed Metric Addition: Track Accuracy Over Time

Closes #30
From https://www.nature.com/articles/s41587-022-01427-7, implemented in https://github.com/funkelab/linajea/blob/754ca57a67758e670bb4d2366ddadb7fa0045f1d/linajea/evaluation/evaluator.py#L409

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

I have read the developer/contributing docs.
I have added tests for the standard test examples documented here, along with end-to-end tests.
I have checked that I maintained or improved code coverage.
I have added benchmarking functions for my change in tests/bench.py.
I have added a page to the documentation with a complete description of my matcher/metric including any references.
I have written docstrings and checked that they render correctly in the Read The Docs build (created after the PR is opened).

Discussion Questions

Question 1: How to count window sizes with skip edges

When a ground truth skip edge spans multiple frames (e.g., t=0 → t=3), how should we count it?

Option A: Count only at actual frame span (current implementation)

Skip edge t=0→t=3 counts as 1 segment of size 3
No segments of size 1 or 2 from that starting node
✅ Honest about what data exists
❌ Sparse window totals when skip edges are common; can't compare datasets with different skip patterns

Option B: Interpolate intermediate windows (CHOSE THIS OPTION WITH DRAGA)

Skip edge t=0→t=3 counts as: 3 segments of size 1, 2 of size 2, 1 of size 3
All share the same correctness status
Possibly only do this when GT skip edges are relaxed?
✅ Fills in window sizes, makes datasets more comparable
❌ "Inflates" counts; a single skip edge contributes many segments

Option C: Count as single edge (original implementation)

Skip edge t=0→t=3 counts as 1 segment of size 1
✅ Simple, matches edge count
❌ Misleading: "1 frame" actually spans 3 frames of real time
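
To make the three options concrete, here is a minimal sketch of how a single GT skip edge would contribute to the per-window-size segment totals under each one. The function name `count_skip_edge_windows` and the option labels are hypothetical and exist only for illustration; this is not the code in this PR.

```python
from collections import Counter


def count_skip_edge_windows(t_start: int, t_end: int, option: str) -> Counter:
    """Count segments contributed by one skip edge, keyed by window size in frames.

    Hypothetical helper illustrating Options A, B and C from the discussion.
    """
    span = t_end - t_start
    if span < 1:
        raise ValueError("t_end must be later than t_start")

    counts: Counter = Counter()
    if option == "A":
        # Count only at the actual frame span.
        counts[span] = 1
    elif option == "B":
        # Interpolate: a span of s frames contains s - w + 1 windows of size w.
        for window in range(1, span + 1):
            counts[window] = span - window + 1
    elif option == "C":
        # Treat the skip edge like any other edge: one segment of size 1.
        counts[1] = 1
    else:
        raise ValueError(f"unknown option: {option!r}")
    return counts


for option in ("A", "B", "C"):
    print(option, dict(count_skip_edge_windows(0, 3, option)))
```

For the skip edge t=0 → t=3, Option B yields three segments of size 1, two of size 2, and one of size 3, matching the counts listed above. Under Option B all of these segments would share the correctness status of the skip edge.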

Question 2: Should we add a flag to include ground truth segments that end early in larger window sizes, to make the metric equivalent to complete tracks from the CTC-BIO?

Yes: a GT track that ends early and is tracked completely and correctly should count as correct for all larger window sizes (not even a flag, just do it). Note in the docs that many short GT tracks might inflate the total correct fractions. Normalizing by length gives "track effectiveness".
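
One possible reading of the answer above, sketched for a single linear GT track with no divisions or skip edges. The function `window_counts` and its `edge_ok` input are hypothetical, and the rule for short tracks is an assumption about the intended behavior rather than the implemented one.

```python
def window_counts(edge_ok: list[bool], window: int) -> tuple[int, int]:
    """Return (correct, total) segments of `window` frames for one linear GT track.

    edge_ok[i] is True when the i-th GT edge has no tracking error.
    Hypothetical sketch: a track shorter than the window is counted once,
    as a single segment that is correct only if the whole track is correct.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    n_edges = len(edge_ok)
    if n_edges == 0:
        return (0, 0)
    if n_edges < window:
        # The GT track ends early: include it in this larger window size.
        return (int(all(edge_ok)), 1)
    total = n_edges - window + 1
    correct = sum(all(edge_ok[i : i + window]) for i in range(total))
    return (correct, total)


# A fully correct 2-edge track still contributes at window size 5.
print(window_counts([True, True], 5))
# A 3-edge track with one bad edge has no correct windows of size 2.
print(window_counts([True, False, True], 2))
```

Because every short, fully correct track adds one correct segment to each larger window size, a dataset with many short GT tracks would raise the correct fraction at large windows under this counting, which is the effect the docs note is meant to flag.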

"Question" 3: Design difference at start of time window

When computing track/lineage accuracy over time, linajea skips checking error flags on the start node of each segment, while traccuracy checks all nodes including the start node.

Specifically, in linajea's get_perfect_segments (evaluator.py:433):

for cn in current_nodes:
    if cn != start_node:  # <-- skips the start node
        if 'IS' in ns or 'FP_D' in ns or 'FN_D' in ns:
            correct = False

In traccuracy, we check all nodes including the start node for division errors (FN_DIV, FP_DIV).

When this matters

This affects segments starting from a missed division node (FN_D):

- linajea: Segment 0→1 is "correct" if the edge exists, even if node 0 has FN_D (missed division)
- traccuracy: Segment 0→1 is "incorrect" because the lineage from node 0 is structurally wrong (we missed that it divided)
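
The difference can be reproduced with a small standalone model. This is a simplified sketch, not code from either library: `segment_is_correct` and its `check_start_node` parameter are hypothetical, and the error labels reuse linajea's names from the excerpt above.

```python
# Error labels as used in the linajea excerpt above.
SEGMENT_ERRORS = {"IS", "FP_D", "FN_D"}


def segment_is_correct(
    node_errors: dict[int, set[str]],
    segment_nodes: list[int],
    check_start_node: bool,
) -> bool:
    """Return True if no checked node in the segment carries an error flag.

    check_start_node=False models the linajea behavior (start node skipped);
    check_start_node=True models the behavior described for traccuracy.
    """
    start_node = segment_nodes[0]
    for node in segment_nodes:
        if node == start_node and not check_start_node:
            continue
        if node_errors.get(node, set()) & SEGMENT_ERRORS:
            return False
    return True


# Node 0 is a missed division (FN_D); the edge 0 -> 1 itself exists.
errors = {0: {"FN_D"}}
print(segment_is_correct(errors, [0, 1], check_start_node=False))  # linajea-style
print(segment_is_correct(errors, [0, 1], check_start_node=True))   # traccuracy-style
```

With the start node skipped, the segment 0→1 is reported as correct. With the start node checked, the same segment is reported as incorrect.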

Why traccuracy's behavior is preferred

1. Lineage-focused semantics: In "lineage" mode, a missed division means the lineage structure is fundamentally incomplete. Segments starting from that node should reflect this.
2. Consistency with tracklet mode: In "tracklet" mode, division edges are excluded entirely, so division errors don't apply. The distinction between modes is cleaner.
3. Avoids silent errors: If a division is missed, downstream "correct" segments could give a misleadingly optimistic accuracy score.

Historical note

Interestingly, the original linajea implementation (June 2021) only checked IS and FP_D, not FN_D. The FN_D check was added later in a cleanup commit (June 2022) but the start_node skip was preserved. The skip may have been an oversight in the original implementation.

cmalinmayor and others added 12 commits June 12, 2025 16:00

Add a larger test case with multiple tracks and frames

27f730f

Add tests for compute_track_accuracy on canonical cases

86f8c8a

Merge branch 'main' into track-accuracy-over-time

b9cc017

Draft of lineage accuracy docs

e52647a

WIP: implementation of track accuracy given no skip edges, untested

e1795ef

WIP (untested): ensure processing in temporal order

eaacd52

Merge branch 'main' into track-accuracy-over-time

1c50902

Implementation that works for basic, non-division test cases

b9399d7

Add Claude-generated tests for all the other basic errors (still need to check them carefully)

7d4d540

Measure window size in frames, not edges

a3d69f3

Add benchmarking function

08311b5

Add citation and link to original implementation to docs

2f5de12

cmalinmayor wants to merge 12 commits into main from track-accuracy-over-time
