LEARNING CURVE

This guide provides the standard operating procedure for generating high-fidelity ground truth masks using IntrekSAM. Follow these phases to ensure temporal consistency and physical accuracy across your datasets.

🛠 Interface & Shortcut Reference

| Button / Element | Function | Shortcut |
| --- | --- | --- |
| Canvas | Interactive display area for video playback and for placing positive (Left-Click) and negative (Right-Click) prompts. | `Left/Right Click` |
| Load Video | Opens a file browser to manually select and load a video file. | - |
| Load Video Frames | Imports a sequence of pre-extracted images from a specified directory. Use this for long videos or when memory is limited. | - |
| Load Auto | Automatically retrieves the next assigned video from the input directory. | - |
| Load Next | Loads the next available video from the input directory. | - |
| Play / Pause | Toggles the SAM 2 propagation engine to start or stop mask tracking. | `Middle Mouse` |
| Undo Points | Removes the positive and negative prompts placed for the selected class on the current frame. | - |
| Clear Annotations | Flushes all predicted and manual masks of all classes from the current frame to the end of the video. | - |
| Export Annotations | Exports all masks generated in the current video to the output directory. | - |
| Taxonomy Sidebar | List of selectable instrument classes (e.g., Spatula, Phaco Tip) used to label the active mask. | `[Id]` (e.g., `7`) |
| `<<` Button | Steps the video backward by a single frame for precise alignment. | `Left Arrow` |
| `>>` Button | Steps the video forward by a single frame for precise alignment. | `Right Arrow` |
| Frame Slider | Allows rapid scrubbing and navigation through the entire video. | - |
| Frame Counter | Displays the current frame index relative to the total frame count of the video. | - |
| Status Update | Console log providing real-time feedback on user actions and system coordinates. | - |

Annotation Guide: Standard Operating Procedure

Follow these phases to generate high-fidelity ground truth masks. This workflow is optimized for the Cataract-1K dataset to ensure temporal consistency and physical accuracy.

Phase 1: Ingestion & Tool Identification

Initialize: Launch the IntrekSAM application.
Data Loading: Click Load Auto to automatically retrieve the next unannotated video from your input directory.
- Alternatively, use Load Video to select a specific sequence manually.
- Or, use Load Video Frames to load a directory of pre-extracted images. This method suits long videos that would not fit in memory if loaded whole.
Define Target: From the Taxonomy Sidebar, select the label corresponding to the instrument you intend to track (e.g., Phaco Tip).
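
When preparing a frame directory for Load Video Frames, the frames need to be read back in temporal order, and plain alphabetical sorting puts `frame_10` before `frame_2`. The sketch below shows one way to order filenames by their trailing number. The `frame_<n>.jpg` naming scheme and the helper names are assumptions for illustration; the naming IntrekSAM actually expects is not specified in this guide.

```python
import re
from pathlib import Path


def frame_index(name):
    """Return the trailing integer in a filename stem, e.g. 'frame_10.jpg' -> 10."""
    match = re.search(r"(\d+)$", Path(name).stem)
    if match is None:
        raise ValueError(f"no frame number found in {name!r}")
    return int(match.group(1))


def order_frames(names):
    """Sort frame filenames by frame number rather than alphabetically."""
    return sorted(names, key=frame_index)


# Hypothetical filenames; alphabetical order would be frame_1, frame_10, frame_2.
names = ["frame_10.jpg", "frame_2.jpg", "frame_1.jpg"]
ordered = order_frames(names)
print(ordered)
```

Zero-padding the frame numbers at extraction time (for example `00001.jpg`) avoids the problem altogether, since alphabetical and numeric order then coincide.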

Phase 2: Semantic Seeding (Initial Frame)

Locate Clear Frame: Use the Frame Slider to find a frame where the target tool is clearly visible and its boundaries are distinct.
Interactive Prompting:
- Positive Prompts (Left-Click): Place points on the tool body to define the mask area.
- Negative Prompts (Right-Click): Place points on specular reflections, fluid bubbles, or background tissue to refine the mask edges.
Verify Mask: If you mis-click, use Undo Points to remove all prompts for the selected class on the current frame. Confirm that the mask tightly encloses the physical tool before proceeding.
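
For readers curious about what the clicks become, SAM-style models generally take point prompts as a list of (x, y) pixel coordinates plus a parallel list of labels, where 1 marks foreground and 0 marks background. The sketch below maps left and right clicks onto that representation. `Click` and `clicks_to_prompts` are hypothetical names, and how IntrekSAM stores prompts internally is an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    x: int
    y: int
    button: str  # "left" (positive prompt) or "right" (negative prompt)


def clicks_to_prompts(clicks):
    """Convert clicks into (coords, labels); label 1 = positive, 0 = negative."""
    coords, labels = [], []
    for click in clicks:
        if click.button not in ("left", "right"):
            raise ValueError(f"unexpected button: {click.button!r}")
        coords.append([click.x, click.y])
        labels.append(1 if click.button == "left" else 0)
    return coords, labels


# Two positive points on the tool body, one negative point on a reflection.
clicks = [
    Click(320, 240, "left"),
    Click(335, 250, "left"),
    Click(400, 210, "right"),
]
coords, labels = clicks_to_prompts(clicks)
print(coords, labels)
```

In this picture, Undo Points corresponds to emptying the click list for one class on one frame.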

Phase 3: Temporal Alignment & Tracking

The Rewind Rule: For the initial setup of a tool, drag the slider back to the frame immediately before the tool enters the field of view.
Initiate Tracking: Press Play. The SAM 2 engine will begin tracking the tool from its point of entry using the established semantic memory.
Monitor Propagation: Watch the Canvas as the mask propagates forward through the sequence.

Phase 4: Verification & Multi-Tool Iteration

Observe Tracking: As the mask propagates on the Canvas, use the Left/Right Arrow keys to step frame by frame and verify that the mask stays on the tool.
Drift Correction: If the mask deviates from the tool boundary or an occlusion occurs:
- Pause the video, then use the `<<` / `>>` frame steppers to land on the first incorrect frame.
- Click Clear Annotations to flush the predicted and manual masks from the current frame $t$ through the last frame of the sequence. This clears every class, not only the selected one.
- Re-prompt at that exact frame to "re-seed" the model, then press Play to resume tracking. (Note: Do not rewind for mid-sequence corrections.)
Iteration Check: Once the sequence for the current tool is complete, determine if additional instruments (e.g., Spatula) require annotation.
- If Next Tool Exists: Return to Phase 1 to select the new tool label and begin its semantic seeding process.
Final Export: After all instruments in the video have been accurately masked and verified, click Export Annotations.
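
The effect of Clear Annotations during drift correction can be pictured with a small sketch: masks on frames before the current one are kept, and masks on the current frame and later are dropped for every class. The dictionary layout and the `clear_from_frame` helper are assumptions for illustration, not IntrekSAM's actual storage format.

```python
def clear_from_frame(masks, current_frame):
    """Drop masks for every class on frames >= current_frame.

    `masks` maps frame index -> {class name -> mask}. Returns a new dict
    and leaves the input untouched.
    """
    return {
        frame: dict(per_class)
        for frame, per_class in masks.items()
        if frame < current_frame
    }


# Placeholder strings stand in for real mask arrays.
masks = {
    10: {"Phaco Tip": "mask_10a", "Spatula": "mask_10b"},
    11: {"Phaco Tip": "mask_11a", "Spatula": "mask_11b"},
    12: {"Phaco Tip": "mask_12a (drifted)", "Spatula": "mask_12b"},
    13: {"Phaco Tip": "mask_13a (drifted)", "Spatula": "mask_13b"},
}

# Drift is first seen at frame 12, so everything from 12 onward is cleared.
kept = clear_from_frame(masks, current_frame=12)
print(sorted(kept))  # [10, 11]
```

Note that the Spatula masks on frames 12 and 13 are lost even though only the Phaco Tip drifted, which is why clearing is costly when several tools are tracked in one pass.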

Best Practices for High-Fidelity Annotation

To ensure the highest data quality across any visual domain, all annotators should adhere to these professional guidelines.

1. Geometric Precision (The "Initial Seed")

Target the Active Region: For elongated objects, prioritize placing points on the "active" or functional end. Avoid annotating static handles or base structures unless you want the mask to include the entire tool.
Edge Anchoring: Do not place points only in the center of the object. Strategically place positive prompts near the physical boundaries. This helps the model lock onto the object's edges against complex backgrounds.
The Rewind Rule: Always initialize a new object by rewinding to the frame prior to its entry into the field of view. This provides a clean starting state for the model’s temporal memory.

(place a short gif)

2. Managing Environmental Artifacts

Negative Prompting for Specular Noise: High-intensity lighting often creates glare on metallic or reflective surfaces. If a mask bleeds into a reflection, place Negative Prompts directly on the glare to force the mask back to the object’s physical boundary.
Visual Interference: Treat bubbles or fluid distortions as background noise. Use negative prompts to ensure the object mask does not expand into these transient visual artifacts.

(place a short gif)

3. Occlusion & Boundary Protocol

The "No-Guessing" Rule: If an object passes behind another element or is partially obscured, do not estimate its hidden position. Annotate only the visually confirmed portion to prevent hallucinated data in the final dataset.
Boundary Maintenance: If an object leaves the field of view, immediately use Clear Annotations to stop tracking. This prevents phantom masks from sticking to the edges of the video frame and keeps the model from hallucinating the tool onto other elements of the video.

(place a short gif)

4. Temporal Verification & Drift Management

Active Review: Watch the masks as they propagate through the video so that you catch any drift in tracking as soon as it appears.
The "First Frame" Correction: At the first frame of detected drift, pause the video and correct the mask. Small errors compound quickly; fixing a minor drift early prevents a total tracking failure later in the sequence.

5. Multi-Object Strategy: Efficiency vs. Fidelity

Sequential Pass (Recommended): Fully annotate and verify Object A before starting Object B. This is the "Gold Standard" for ensuring the model's memory bank remains focused on a single semantic target.
Simultaneous Pass (Advanced): For clear sequences where objects remain physically separated, you may seed multiple objects at once by providing distinct positive and negative prompts for every active class before initiating propagation.
- Warning: If one object drifts, clearing annotations will wipe progress for all active objects in that pass. Use this only for high-quality, low-noise footage.
