Track accuracy over time#331

Draft
cmalinmayor wants to merge 12 commits into main from track-accuracy-over-time

cmalinmayor commented Jan 26, 2026

Proposed Metric Addition: Track Accuracy Over Time

Closes #30
From https://www.nature.com/articles/s41587-022-01427-7, implemented in https://github.com/funkelab/linajea/blob/754ca57a67758e670bb4d2366ddadb7fa0045f1d/linajea/evaluation/evaluator.py#L409

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the developer/contributing docs.
  • I have added tests for the standard test examples documented here, along with end-to-end tests.
  • I have checked that I maintained or improved code coverage.
  • I have added benchmarking functions for my change in tests/bench.py.
  • I have added a page to the documentation with a complete description of my matcher/metric including any references.
  • I have written docstrings and checked that they render correctly in the Read The Docs build (created after the PR is opened).

Discussion Questions

Question 1: How to count window sizes with skip edges

When a ground truth skip edge spans multiple frames (e.g., t=0 → t=3), how should we count it?

Option A: Count only at actual frame span (current implementation)

  • Skip edge t=0→t=3 counts as 1 segment of size 3
  • No segments of size 1 or 2 from that starting node
  • ✅ Honest about what data exists
  • ❌ Sparse window totals when skip edges are common; can't compare datasets with different skip patterns

Option B: Interpolate intermediate windows (chose this option with Draga)

  • Skip edge t=0→t=3 counts as: 3 segments of size 1, 2 of size 2, 1 of size 3
  • All share the same correctness status
  • Possibly do this only when GT skip edges are relaxed?
  • ✅ Fills in window sizes, makes datasets more comparable
  • ❌ "Inflates" counts; a single skip edge contributes many segments

Option C: Count as single edge (original implementation)

  • Skip edge t=0→t=3 counts as 1 segment of size 1
  • ✅ Simple, matches edge count
  • ❌ Misleading: "1 frame" actually spans 3 frames of real time
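
To make the three counting schemes concrete, here is a minimal sketch of how many segments of each window size a single GT skip edge would contribute under each option. The function name `segment_counts` and its arguments are hypothetical, chosen only for illustration; this is not the code in this PR.

```python
from collections import Counter


def segment_counts(span: int, option: str) -> Counter:
    """Return {window_size: number_of_segments} for one GT skip edge.

    `span` is the number of frames the edge covers (t=0 -> t=3 gives span=3).
    `option` is "A", "B", or "C", matching the options listed above.
    Hypothetical helper for illustration only.
    """
    if span < 1:
        raise ValueError("span must be at least 1")
    if option == "A":
        # One segment, counted at the actual frame span.
        return Counter({span: 1})
    if option == "B":
        # Interpolate: a span of n frames holds n - k + 1 windows of size k.
        return Counter({size: span - size + 1 for size in range(1, span + 1)})
    if option == "C":
        # One segment of size 1, regardless of the real span.
        return Counter({1: 1})
    raise ValueError(f"unknown option: {option!r}")


for option in ("A", "B", "C"):
    print(option, dict(segment_counts(3, option)))
```

Under Option B a skip edge spanning n frames contributes n(n+1)/2 segments in total, which is the "inflation" noted above.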

Question 2: Should we add a flag to include ground truth segments that end early in larger window sizes, to make the metric equivalent to complete tracks from the CTC-BIO?

Yes. A GT segment that ends early should count as complete and correct for all larger window sizes. This does not need a flag; it should simply be the behavior. The docs should note that many short GT tracks might upweight the total correct fractions. The variant normalized by length is "track effectiveness".
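
One possible reading of this decision, sketched for a single linear GT track: every starting position yields a segment for every window size, and a segment that runs past the end of the track is truncated and judged on the edges that exist. The function `window_accuracy` and the per-edge boolean representation are assumptions made for illustration, not the PR's implementation.

```python
def window_accuracy(edge_ok, window, include_early_end):
    """Count (correct, total) segments of size `window` on one GT track.

    `edge_ok[i]` is True when the i-th GT edge of the track has no errors.
    With `include_early_end=True`, segments that would run past the end of
    the track are still counted, truncated to the edges that exist.
    Hypothetical helper for illustration only.
    """
    n = len(edge_ok)
    if include_early_end:
        starts = range(n)
    else:
        # Only starting positions with a full window of edges available.
        starts = range(max(n - window + 1, 0))
    correct = 0
    total = 0
    for start in starts:
        segment = edge_ok[start:start + window]  # slicing truncates at the end
        total += 1
        correct += all(segment)
    return correct, total


track = [True, True, False, True]
print(window_accuracy(track, 2, include_early_end=False))  # (1, 3)
print(window_accuracy(track, 2, include_early_end=True))   # (2, 4)
print(window_accuracy(track, 5, include_early_end=True))   # (1, 4)
```

In this sketch the track contributes nothing at window size 5 unless early-ending segments are included, and the truncated segments are short, which illustrates why many short GT tracks can raise the correct fraction.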

"Question" 3: Design difference at start of time window

When computing track/lineage accuracy over time, linajea skips checking error flags on the start node of each segment, while traccuracy checks all nodes including the start node.

Specifically, in linajea's get_perfect_segments (evaluator.py:433):

for cn in current_nodes:
    if cn != start_node:  # <-- skips the start node
        if 'IS' in ns or 'FP_D' in ns or 'FN_D' in ns:
            correct = False

In traccuracy, we check all nodes including the start node for division errors (FN_DIV, FP_DIV).

When this matters

This affects segments starting from a missed division node (FN_D):

- linajea: Segment 0→1 is "correct" if the edge exists, even if node 0 has FN_D (missed division)
- traccuracy: Segment 0→1 is "incorrect" because the lineage from node 0 is structurally wrong (we missed that it divided)
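
The difference can be reproduced with a small sketch. The helper `segment_is_correct`, the `node_flags` dictionary, and the `check_start_node` parameter are hypothetical; the flag strings reuse linajea's names for simplicity, whereas traccuracy uses its own flags (FN_DIV, FP_DIV).

```python
# Error flags that invalidate a segment, using linajea's naming.
ERROR_FLAGS = {"IS", "FP_D", "FN_D"}


def segment_is_correct(segment_nodes, node_flags, check_start_node):
    """Return True if no checked node in the segment carries an error flag.

    `check_start_node=False` mimics the linajea behavior described above;
    `check_start_node=True` mimics the traccuracy behavior.
    Hypothetical helper for illustration only.
    """
    start_node = segment_nodes[0]
    for node in segment_nodes:
        if node == start_node and not check_start_node:
            continue  # linajea-style: ignore flags on the start node
        if node_flags.get(node, set()) & ERROR_FLAGS:
            return False
    return True


# Node 0 is a missed division (FN_D); node 1 has no errors.
node_flags = {0: {"FN_D"}, 1: set()}
segment = [0, 1]

linajea_style = segment_is_correct(segment, node_flags, check_start_node=False)
traccuracy_style = segment_is_correct(segment, node_flags, check_start_node=True)
print(linajea_style, traccuracy_style)  # True False
```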

Why traccuracy's behavior is preferred

1. Lineage-focused semantics: In "lineage" mode, a missed division means the lineage structure is fundamentally incomplete. Segments starting from that node should reflect this.
2. Consistency with tracklet mode: In "tracklet" mode, division edges are excluded entirely, so division errors don't apply. The distinction between modes is cleaner.
3. Avoids silent errors: If a division is missed, downstream "correct" segments could give a misleadingly optimistic accuracy score.

Historical note

Interestingly, the original linajea implementation (June 2021) only checked IS and FP_D, not FN_D. The FN_D check was added later in a cleanup commit (June 2022) but the start_node skip was preserved. The skip may have been an oversight in the original implementation.

Development

Successfully merging this pull request may close these issues.

Track accuracy (over time) - % of tracks that have no errors (for different time windows)