[Phase 5] Cross-Survey Validation #6

@Sakeeb91

Objective

Validate the Phase 4 models on the APOGEE-GALAH overlap sample and assess systematic differences between the two surveys.

Dependencies

  • Phase 4: Parameter Regression (requires trained models)

Tasks

  • Implement src/data/galah_loader.py for GALAH data parsing
  • Implement src/data/crossmatch.py for coordinate cross-matching
  • Cross-match APOGEE and GALAH by sky coordinates
  • Compare APOGEE predictions to GALAH labels
  • Quantify systematic offsets between surveys
  • Apply calibration if needed
  • Create notebooks/04_cross_validation.ipynb
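
The comparison and offset tasks could start from something like the following. It is a minimal sketch, assuming the APOGEE predictions and GALAH labels are already aligned row by row for the matched stars. The helper name `summarize_offset` and the choice of median plus MAD-based scatter are illustrative assumptions, not existing project code.

```python
import numpy as np


def summarize_offset(pred: np.ndarray, ref: np.ndarray) -> dict:
    """Robust summary of pred - ref for one stellar parameter.

    Hypothetical helper: `pred` holds model predictions from APOGEE spectra,
    `ref` the GALAH labels for the same stars, aligned row by row.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Drop stars where either value is missing.
    good = np.isfinite(pred) & np.isfinite(ref)
    delta = pred[good] - ref[good]

    median = float(np.median(delta))
    # 1.4826 * MAD estimates the standard deviation for Gaussian scatter.
    scatter = float(1.4826 * np.median(np.abs(delta - median)))
    return {"n": int(good.sum()), "median": median, "robust_sigma": scatter}


# Toy values only, to show the call; not real survey data.
teff_pred = np.array([4800.0, 5100.0, 4550.0, np.nan, 6000.0])
teff_galah = np.array([4780.0, 5130.0, 4500.0, 5000.0, 5990.0])
summary = summarize_offset(teff_pred, teff_galah)
print(summary)
```

The median is less sensitive than the mean to a handful of badly matched or flagged stars, which is why it is used here for the offset.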

Files to Create

File                              Purpose
src/data/galah_loader.py          Load GALAH data
src/data/crossmatch.py            Cross-match utilities
src/evaluation/cross_survey.py    Cross-survey metrics

Starter Code

# src/data/crossmatch.py
"""Cross-match utilities for multi-survey data."""

import numpy as np
from astropy.coordinates import SkyCoord
from astropy import units as u

def crossmatch_by_coordinates(
    ra1: np.ndarray, dec1: np.ndarray,
    ra2: np.ndarray, dec2: np.ndarray,
    max_sep_arcsec: float = 2.0
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
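    """
    Cross-match two catalogs by sky coordinates.

    For each source in catalog 1, find the nearest source in catalog 2
    and keep the pair if the separation is below ``max_sep_arcsec``.
    Coordinates are in degrees. Matching is one-directional, so a
    catalog-2 source may be matched to more than one catalog-1 source.

    Returns (idx1, idx2, separations): matched indices into catalog 1
    and catalog 2, and the pair separations in arcseconds.
    """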
    coords1 = SkyCoord(ra=ra1*u.deg, dec=dec1*u.deg)
    coords2 = SkyCoord(ra=ra2*u.deg, dec=dec2*u.deg)

    idx, sep2d, _ = coords1.match_to_catalog_sky(coords2)

    # Filter by maximum separation
    mask = sep2d.arcsec < max_sep_arcsec

    idx1 = np.where(mask)[0]
    idx2 = idx[mask]
    separations = sep2d[mask].arcsec

    return idx1, idx2, separations
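
Because `match_to_catalog_sky` finds the nearest catalog-2 source for each catalog-1 source, two catalog-1 stars can end up paired with the same catalog-2 star. If one-to-one pairs are wanted, the output can be filtered afterwards. The sketch below shows one possible approach, keeping the closest pair per catalog-2 index. `keep_closest_per_target` is a hypothetical helper name, and it operates on the three arrays returned by `crossmatch_by_coordinates`.

```python
import numpy as np


def keep_closest_per_target(
    idx1: np.ndarray, idx2: np.ndarray, separations: np.ndarray
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Keep only the smallest-separation pair for each catalog-2 index."""
    # Sort by catalog-2 index, then by separation within each index.
    order = np.lexsort((separations, idx2))
    idx2_sorted = idx2[order]
    # After sorting, the first row of each idx2 group is its closest pair.
    first = np.ones(len(order), dtype=bool)
    first[1:] = idx2_sorted[1:] != idx2_sorted[:-1]
    keep = np.sort(order[first])  # restore the original row order
    return idx1[keep], idx2[keep], separations[keep]


# Toy example: catalog-1 rows 0 and 3 both matched catalog-2 row 7.
idx1 = np.array([0, 2, 3, 5])
idx2 = np.array([7, 1, 7, 4])
sep = np.array([1.2, 0.3, 0.4, 1.9])
u1, u2, usep = keep_closest_per_target(idx1, idx2, sep)
print(u1, u2, usep)
```

In the toy example, the pair (0, 7) at 1.2 arcsec is dropped in favour of (3, 7) at 0.4 arcsec.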

Definition of Done

  • Cross-match identifies >10,000 common stars
  • Systematic offsets quantified (target: median Teff difference < 50 K)
  • Calibration applied if offsets are significant
  • Results documented in notebook
  • All tests passing

Technical Notes

  • Use a 2 arcsec matching radius for the cross-match
  • Different wavelength coverage (APOGEE observes in the near-infrared H band, GALAH in the optical) may cause systematic differences
  • Document any calibration choices for reproducibility
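
If an offset turns out to be significant, one simple option is a linear correction fitted on the overlap sample, with the coefficients written out so the choice can be reproduced. This is only a sketch under that assumption: the linear form, the helper names, and the JSON record layout are illustrative, and nothing here prescribes a calibration method for the project.

```python
import json

import numpy as np


def fit_linear_calibration(pred: np.ndarray, ref: np.ndarray) -> tuple[float, float]:
    """Least-squares fit of ref ~ slope * pred + intercept."""
    slope, intercept = np.polyfit(pred, ref, deg=1)
    return float(slope), float(intercept)


def apply_calibration(pred: np.ndarray, slope: float, intercept: float) -> np.ndarray:
    """Map predictions onto the reference survey's scale."""
    return slope * np.asarray(pred, dtype=float) + intercept


# Toy values constructed so that ref = 0.98 * pred + 120 exactly.
teff_pred = np.array([4500.0, 5000.0, 5500.0, 6000.0])
teff_ref = 0.98 * teff_pred + 120.0

slope, intercept = fit_linear_calibration(teff_pred, teff_ref)
calibrated = apply_calibration(teff_pred, slope, intercept)

# Record the calibration choice alongside the results for reproducibility.
record = json.dumps(
    {"parameter": "teff", "form": "linear", "slope": slope,
     "intercept": intercept, "n_stars": int(teff_pred.size)},
    indent=2,
)
print(record)
```

Fitting and evaluating on the same stars can flatter the result, so holding out part of the overlap sample to check the calibrated residuals may be worth considering.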

Part of #1 (Meta Issue)
