Skip to content

MAINT: Subject hash for all subject types#1317

Open
pastewka wants to merge 3 commits into
mainfrom
26_subject_hash
Open

MAINT: Subject hash for all subject types#1317
pastewka wants to merge 3 commits into
mainfrom
26_subject_hash

Conversation

@pastewka
Copy link
Copy Markdown
Contributor

@pastewka pastewka commented May 11, 2026

This PR implement hash computation also for the (simple) surface, topography and tag subject types. This means all subjects are treated on equal footings.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends subject_hash computation/usage beyond surface sets to also cover single-subject WorkflowResults (Surface, Topography, Tag), so WorkflowResult lookup/dedup can treat all subject types uniformly.

Changes:

  • Compute and persist subject_hash for Tag/Topography/Surface WorkflowResults on creation and when recomputing hashes.
  • Switch dependency resolution (prepare_dependency_tasks) to identify existing WorkflowResults via subject_hash rather than subject FK fields.
  • Add a data migration to backfill subject_hash for existing FK-based WorkflowResults, and adjust the relevant unit test for the new fallback behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
topobank/analysis/tasks.py Uses subject_hash for dependency dedup/lookup and sets it on newly created dependency analyses.
topobank/analysis/models.py Adds subject_hash computation for FK-based subjects and updates update_subject_hash fallback logic.
topobank/analysis/migrations/0064_backfill_subject_hash.py Backfills missing subject_hash values for existing WorkflowResults with FK subjects.
tests/analysis/test_surface_set_workflows.py Updates update_subject_hash test to reflect FK fallback behavior when M2M is empty.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +491 to +501
existing_analysis = (
WorkflowResult.objects.filter(
workflow_name=dependency.function.name,
subject_hash=subject_hash,
kwargs=kwargs,
)
.prefetch_related('surfaces')
.select_related('subject_topography', 'subject_surface', 'subject_tag')
.order_by("-task_start_time")
.first()
)
Comment on lines +478 to +501
# Compute subject_hash for this dependency
subject = dependency.subject
if use_surfaces_path and isinstance(subject, Surface):
subject_hash = WorkflowResult.compute_subject_hash("surfaces", [subject.id])
elif isinstance(subject, Surface):
subject_hash = WorkflowResult.compute_subject_hash("surface", [subject.id])
elif isinstance(subject, Topography):
subject_hash = WorkflowResult.compute_subject_hash("topography", [subject.id])
elif isinstance(subject, Tag):
subject_hash = WorkflowResult.compute_subject_hash("tag", [subject.id])
else:
# Old path: filter via direct subject FK fields
existing_analysis = (
WorkflowResult.objects.filter(
workflow_name=dependency.function.name,
subject_surface=(
dependency.subject
if isinstance(dependency.subject, Surface)
else None
),
subject_topography=(
dependency.subject
if isinstance(dependency.subject, Topography)
else None
),
kwargs=kwargs,
)
.select_related('subject_topography', 'subject_surface', 'subject_tag')
.order_by("-task_start_time")
.first()
raise ValueError(f"Unsupported dependency subject type: {type(subject)}")

existing_analysis = (
WorkflowResult.objects.filter(
workflow_name=dependency.function.name,
subject_hash=subject_hash,
kwargs=kwargs,
)
.prefetch_related('surfaces')
.select_related('subject_topography', 'subject_surface', 'subject_tag')
.order_by("-task_start_time")
.first()
)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants