Add ArtiFinder discovered-artifact integration (website) #20

Open
vahldiek wants to merge 13 commits into ReproDB:main from vahldiek:feature/artifinder-integration
@vahldiek vahldiek commented Jul 3, 2026

Member

Summary

Surfaces ArtiFinder-discovered artifact links across the website. These come from the pipeline's new artifinder stage (companion PR below). ArtiFinder scrapes papers directly to find artifact links; they are not manually verified, carry no badges, and are excluded from all AE statistics and scores.

Preview: https://vahldiek.github.io/reprodb.github.io/artifinder.html

Changes

  • Search & profiles: render an Artifinder provenance marker (logo + hover tooltip “found by ArtiFinder — not manually verified”) next to each discovered link. New #artifinder search keyword.
  • New /artifinder.html page: an overview-style discovery-statistics page (ECharts) showing discovered artifacts per year (matched-to-AE vs. not), discovery rate over time, and a per-conference breakdown. Linked from the Ranking nav group.
  • New assets: reprodb-artifinder.css, reprodb-artifinder.js, and an ArtiFinder logo SVG.
  • Methodology: dedicated ArtiFinder-Discovered Artifacts section explaining matching, the no-badge / no-score policy, the repository-stats exception, and the configurable start year; plus a Data Sources entry.
  • Contribute (about): link to ArtiFinder-Data.
  • Data (pipeline-generated): new _data/artifinder_{summary,by_year,by_conference}.yml; artifacts.json and search_data.json gain artifinder_urls (no other rows changed). The raw discovered links are not republished — they stay in the upstream ArtiFinder-Data repo.
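
As a rough illustration of how the new `artifinder_urls` field could drive the `#artifinder` search keyword, here is a minimal sketch. It is not the site's actual search code (which lives in the JavaScript assets); the `title` field, the sample rows, and the function names are assumptions made for the example. Only `artifinder_urls` and the `#artifinder` keyword come from this PR.

```python
def has_artifinder_link(row: dict) -> bool:
    """True if the row carries at least one ArtiFinder-discovered link."""
    return bool(row.get("artifinder_urls"))


def search(rows: list, query: str) -> list:
    """Filter rows by free-text terms plus the special #artifinder keyword.

    Other '#' keywords are ignored in this sketch.
    """
    terms = query.lower().split()
    want_artifinder = "#artifinder" in terms
    text_terms = [t for t in terms if not t.startswith("#")]

    results = []
    for row in rows:
        if want_artifinder and not has_artifinder_link(row):
            continue
        title = row.get("title", "").lower()
        if all(t in title for t in text_terms):
            results.append(row)
    return results


# Hypothetical rows; real search_data.json rows have more fields.
rows = [
    {"title": "Paper A", "artifinder_urls": ["https://example.org/a"]},
    {"title": "Paper B", "artifinder_urls": []},
    {"title": "Paper C"},  # rows without the field are treated as "no links"
]

hits = search(rows, "#artifinder")
```

Treating a missing field the same as an empty list keeps rows from before the schema bump searchable without special handling.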

Emphasis on AE vs. ArtiFinder

Per policy, ArtiFinder figures are always distinguished from AE results and are reported separately (the dedicated page + methodology). All existing statistics continue to reflect AE-evaluated artifacts only.
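
The separation policy can be pictured as two independent aggregations over the same rows. The sketch below is illustrative only: the `year` field, the `source` values `"ae"` and `"artifinder"`, and both function names are assumptions, not the pipeline's actual schema or code.

```python
from collections import Counter


def ae_artifacts_per_year(rows: list) -> Counter:
    """AE statistic: counts only rows that went through artifact evaluation."""
    return Counter(r["year"] for r in rows if r.get("source") == "ae")


def discovered_per_year(rows: list) -> Counter:
    """ArtiFinder figure: counts rows with at least one discovered link.

    Reported separately; never added into the AE statistic above.
    """
    return Counter(r["year"] for r in rows if r.get("artifinder_urls"))


# Hypothetical rows covering the three cases.
rows = [
    {"year": 2023, "source": "ae"},                                   # AE only
    {"year": 2023, "source": "ae", "artifinder_urls": ["https://example.org/x"]},  # matched to AE
    {"year": 2024, "source": "artifinder", "artifinder_urls": ["https://example.org/y"]},  # discovered only
]

ae_counts = ae_artifacts_per_year(rows)
af_counts = discovered_per_year(rows)
```

Here the 2024 discovered-only row contributes to the ArtiFinder figure but leaves the AE count untouched, which is the behaviour the policy describes.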

Companion PR

Pipeline changes (loader, matching, new stage, schema bump): ReproDB/reprodb-pipeline#17

Notes for reviewers

  • Data files are included so the change is previewable; they were produced by the pipeline's artifinder + search_data stages (may need the data-update label for the immutability check).
  • Verified with a local Jekyll build (no Liquid/template errors); /artifinder.html renders all charts and the per-conference table.

CI note

  • JSON Schema Validation validates against ReproDB/data-schemas@main; it turns green once data-schemas#3 (adds artifinder_urls, bundle v0.3.0) is merged.
  • Data Immutability Check fails by design because pipeline-generated data files changed — apply the data-update label to allow it.

Surface ArtiFinder-discovered artifact links (from the pipeline's new
artifinder stage) across the site. These links are not manually verified,
carry no badges, and are excluded from all AE statistics/scores.

- search + profile: render an 'Artifinder' provenance marker (logo + tooltip)
  next to discovered links; add #artifinder search keyword
- new /artifinder.html discovery-statistics page (ECharts) with per-year and
  per-conference discovery counts and rate over time; linked from nav
- new reprodb-artifinder.css/js + ArtiFinder logo asset
- methodology: dedicated 'ArtiFinder-Discovered Artifacts' section + data source
- about (Contribute): link to ArtiFinder-Data
- regenerated data (pipeline): artifinder.json, _data/artifinder_*.yml,
  artifacts.json + search_data.json gain artifinder_urls
Remove assets/data/artifinder.json (redundant republish of ArtiFinder-Data)
and _build/artifinder_matched_urls.json (no longer produced; repo_stats now
reads matched GitHub links from artifacts.json). No page referenced them; the
discovery page uses the _data/artifinder_*.yml aggregates.
vahldiek added 11 commits July 3, 2026 14:05
The afConfChart bar chart already shows discovered vs. matched per conference.
search_data.json now also contains discovered artifacts whose papers never
went through AE (marked, no badges): 3076 AE + 2770 ArtiFinder-only = 5846.
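
A minimal sketch of how the combined row count might arise, assuming rows are keyed by a hypothetical `paper_id` and tagged with a hypothetical `source` value. The IDs below are synthetic placeholders; only the counts 3076 and 2770 come from the commit message.

```python
def build_search_rows(ae_rows: list, discovered: list) -> list:
    """Keep every AE row, then append discovered papers that never went through AE."""
    ae_ids = {r["paper_id"] for r in ae_rows}
    rows = [dict(r, source="ae") for r in ae_rows]
    for d in discovered:
        if d["paper_id"] not in ae_ids:
            # Discovered-only rows are marked and carry no badges.
            rows.append(dict(d, source="artifinder", badges=[]))
    return rows


# Synthetic data sized to match the figures quoted above.
ae_rows = [{"paper_id": f"ae-{i}"} for i in range(3076)]
discovered = [{"paper_id": f"af-{i}"} for i in range(2770)]
discovered.append({"paper_id": "ae-0"})  # already an AE paper, so no new row

search_rows = build_search_rows(ae_rows, discovered)
artifinder_only = [r for r in search_rows if r["source"] == "artifinder"]
```
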
Author and institution profile pages now list ArtiFinder-discovered (non-AE)
papers, marked with the Artifinder sign and shown in a distinct indigo colour
(italic title). The author timeline chart gains a separate 'ArtiFinder
(discovered)' series. These are view-only and never affect scores/stats.
Adds assets/data/artifinder_authors.json.
In profile artifact tables, the paper title now links to the originally
collected AE artifact URL (getArtifactUrl); the ArtiFinder-discovered link is
shown as a separate clickable Artifinder-marked link (afLink) rather than
taking over the title. Purely-discovered (non-AE) rows still link the title to
the discovered artifact.
repo_stats re-run now counts GitHub repos ArtiFinder matched to AE papers
(~180 net-new), via _inject_artifinder_urls reading artifacts.json.
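
To make the "net-new" notion concrete, the sketch below extracts GitHub repositories from `artifinder_urls` and subtracts those already known from AE links. It is not the pipeline's `_inject_artifinder_urls`; the `artifact_urls` field name, the helper names, and the sample URLs are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlparse


def github_repo(url: str) -> Optional[str]:
    """Return a lowercase 'owner/repo' for a GitHub URL, else None."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return f"{owner}/{repo}".lower()


def net_new_repos(artifacts: list) -> set:
    """Repos reachable only through ArtiFinder links, not through AE links."""
    known, discovered = set(), set()
    for artifact in artifacts:
        for url in artifact.get("artifact_urls", []):   # hypothetical AE field
            repo = github_repo(url)
            if repo:
                known.add(repo)
        for url in artifact.get("artifinder_urls", []):
            repo = github_repo(url)
            if repo:
                discovered.add(repo)
    return discovered - known


artifacts = [{
    "artifact_urls": ["https://github.com/org/tool"],
    "artifinder_urls": [
        "https://github.com/Org/Tool.git",          # same repo as the AE link
        "https://github.com/org/extra/tree/main",   # net-new repo
        "https://zenodo.org/record/1",              # not GitHub, ignored
    ],
}]

new_repos = net_new_repos(artifacts)
```
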
- new conference x year discovery heatmap (afHeatmap)
- rename 'Year Range' card to 'Years Included'
- caption now shows the ArtiFinder *data* date (data_updated), not pipeline run
Cross-references the pipeline helper and carries the shared test vector so the
author-index normalisation stays byte-identical across Python/JS.
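
A shared test vector of this kind could look like the following sketch. The normalisation steps (NFKD decomposition, dropping combining marks, case folding, whitespace collapsing), the function name, and the sample names are all assumptions; the actual pipeline helper and its vector may differ.

```python
import unicodedata


def normalise_author(name: str) -> str:
    """Hypothetical author-key normalisation, for illustration only."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks so accented and unaccented spellings collapse together.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Case-fold and collapse runs of whitespace.
    return " ".join(stripped.casefold().split())


# (input, expected) pairs; a second implementation in another language
# would be run against the same pairs to detect drift.
TEST_VECTOR = [
    ("  José  Álvarez ", "jose alvarez"),
    ("Zoë Müller", "zoe muller"),
    ("ALEX EXAMPLE", "alex example"),
]

results = [normalise_author(raw) for raw, _ in TEST_VECTOR]
```

Keeping the expected outputs next to the inputs means any second implementation only has to reproduce the right-hand column, without having to agree on internal steps.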
Regenerated: 3076 ae + 2770 artifinder-only rows now carry an explicit source.
21 discovered links now attach to their AE artifact (fuzzy title fallback):
artifacts.json gains artifinder_urls, fewer artifinder-only search/profile rows,
updated discovery aggregates.
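
One way a fuzzy title fallback could work is to try an exact match on a normalised title first and only then fall back to a similarity score. The sketch below uses Python's standard `difflib.SequenceMatcher`; the 0.9 threshold, the function names, and the sample titles are assumptions, and the pipeline's real matcher may use a different technique.

```python
from difflib import SequenceMatcher
from typing import Optional


def _key(title: str) -> str:
    """Lowercase and collapse whitespace for comparison."""
    return " ".join(title.lower().split())


def match_title(discovered_title: str, ae_titles: list, threshold: float = 0.9) -> Optional[str]:
    """Return the AE title that matches, or None if nothing is close enough."""
    key = _key(discovered_title)
    by_key = {_key(t): t for t in ae_titles}

    # Exact match on the normalised title takes priority.
    if key in by_key:
        return by_key[key]

    # Fallback: pick the most similar title, if it clears the threshold.
    best, best_score = None, 0.0
    for candidate_key, original in by_key.items():
        score = SequenceMatcher(None, key, candidate_key).ratio()
        if score > best_score:
            best, best_score = original, score
    return best if best_score >= threshold else None


# Made-up titles for illustration.
ae_titles = [
    "Fast Paths for Secure Enclaves",
    "Measuring Reproducibility in Systems Research",
]

exact = match_title("fast paths for  secure enclaves", ae_titles)
fuzzy = match_title("Fast Paths for Secure Enclave", ae_titles)
miss = match_title("An Unrelated Paper", ae_titles)
```

A high threshold keeps the fallback conservative: a near-miss such as a dropped plural attaches to its AE artifact, while unrelated titles stay ArtiFinder-only.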
All charts now share one palette: indigo (#4a5aa8) for the ArtiFinder/discovered
measure (bars, discovery-rate line+area, heatmap) and orange for the matched-to-AE
overlap. Fixes the discovery-rate line which used the site's security red.