
Add list of non-evaluated artifacts based on ArtiFinder #147

Draft: martonbognar wants to merge 16 commits into secartifacts:main from martonbognar:artifinder


martonbognar commented Jun 12, 2026

This pull request (currently a placeholder) serves to add the list of artifacts we obtained using ArtiFinder, a tool to automatically identify research artifacts from papers. We built a dataset that contains artifacts of papers published at IEEE S&P, ACM CCS, USENIX Security, and NDSS in the period 2000–2025, and at ACSAC in 2017–2025.

The current proposal adds the following content:

  • A section on the main page that briefly explains the difference between evaluated and non-evaluated artifacts and links to ArtiFinder and to the page that lists all non-evaluated artifacts.
  • A single page that lists every artifact in our dataset whose paper is not listed with an outcome as part of an AE process.
  • Updates to all AE outcome pages that overlap with our dataset: there, we add a link to the "author's version" if the discovered artifact URL differs from the URL reported by the chairs.
  • A tick to indicate when an extracted URL has been manually validated in the dataset.
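The selection rule behind the second and third bullets can be sketched in a few lines. This is a minimal, hypothetical Python sketch, not the code in scripts/generate_artifinder_data.py: the `Artifact` record, the shape of `ae_outcomes` (conference directory, then paper title, then the chairs' URL), matching papers by exact title, and the trailing-slash URL normalization are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Artifact:
    # Hypothetical record for one ArtiFinder dataset entry.
    conference: str  # AE page directory, e.g. "acsac2017"
    title: str       # paper title, used here as the join key (assumption)
    url: str         # artifact URL extracted from the paper
    validated: bool  # True if the extracted URL was checked by hand


def normalize(url: Optional[str]) -> str:
    # Hypothetical normalization: ignore surrounding whitespace and a trailing slash.
    return (url or "").strip().rstrip("/")


def split_dataset(dataset, ae_outcomes):
    """Split ArtiFinder entries into non-evaluated artifacts and author links.

    ae_outcomes maps a conference directory to {paper title: URL reported by
    the AE chairs (or None)}; this layout is an assumption for illustration.
    """
    non_evaluated = []  # papers with no AE outcome listed
    author_links = {}   # conference -> title -> {"url", "validated"}
    for art in dataset:
        reported = ae_outcomes.get(art.conference, {})
        if art.title not in reported:
            non_evaluated.append(art)
        elif normalize(art.url) != normalize(reported[art.title]):
            # The paper went through AE, but the discovered URL differs
            # from the one the chairs reported.
            author_links.setdefault(art.conference, {})[art.title] = {
                "url": art.url,
                "validated": art.validated,
            }
    return non_evaluated, author_links
```

Entries whose URL matches the chairs' URL produce neither a non-evaluated row nor an extra link, which keeps the AE outcome pages unchanged where the two sources agree.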

vahldiek commented

Could you host your branch in your fork? This way, it is easier to look at these changes.

martonbognar (author) commented

We are still working on refining the changes, but if you're interested, the current version is already deployed at https://mici.hu/secartifacts.github.io/
Comments are already welcome!

vahldiek left a comment

Left some comments. Looks quite good already. Let me know what you think.

Comment thread on _includes/artifinder_link_cell.html (outdated)
<td>
{% assign dir_name = page.path | remove_first:"_conferences/" | split:"/" | first %}
{% assign af_entry = site.data.artifinder_links[dir_name][include.title] %}
{% if af_entry %}<a href="{{ af_entry.url }}" target="_blank">Author's link</a>{% if af_entry.validated %} <abbr title="This extracted link has been manually validated." class="af-validated">✓</abbr>{% endif %}{% endif %}


Instead of calling it "Author's link", could we create an ArtiFinder logo and use that instead, with an alternative message along the lines of "extracted from the paper using ArtiFinder"?

What does "manually validated" mean here? That every link has been clicked?

Comment threads on index.md (outdated) and scripts/generate_artifinder_data.py
Comment thread on scripts/generate.sh


Do we need these script files or should we rather have the output files?

How would one submit an update to an artifact URL? Through the ArtiFinder repo?

martonbognar (author) replied

Yes, this was one of the main technical aspects I was wondering about. I would like to keep the dataset in a central location rather than duplicate it, so it can serve as a single source of truth, and I'm not sure the secartifacts website repo is the best place for that. This also means that corrections and validations should be sent to our dataset repository. Whether it's included as a submodule or fetched dynamically when building the website doesn't matter too much, I guess; for now, a submodule felt cleaner. Initially I also didn't want these additional generated files, but without them the website generation took excruciatingly long.
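One way to get both properties (the dataset stays external, but the Jekyll build only reads a small precomputed file) is sketched below. This is a hedged, hypothetical Python sketch: the output path `_data/artifinder_links.json` and the JSON format are assumptions, and only the nesting (conference directory, then paper title, then `url`/`validated`) is taken from the `site.data.artifinder_links[dir_name][include.title]` lookup in `_includes/artifinder_link_cell.html`.

```python
import json
from pathlib import Path


def serialize_links(author_links: dict) -> str:
    # Stable key order keeps diffs of the committed data file small.
    return json.dumps(author_links, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def write_links(author_links: dict, site_root: Path) -> Path:
    # Jekyll exposes a file at _data/artifinder_links.json as site.data.artifinder_links.
    # The file name and format are assumptions, not the PR's actual output.
    out = site_root / "_data" / "artifinder_links.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(serialize_links(author_links), encoding="utf-8")
    return out


def lookup(author_links: dict, page_path: str, title: str):
    # Mirrors the Liquid chain in the include:
    # page.path | remove_first:"_conferences/" | split:"/" | first
    dir_name = page_path.replace("_conferences/", "", 1).split("/")[0]
    return author_links.get(dir_name, {}).get(title)
```

Because the lookup is a plain dictionary access per table row, the expensive part (matching the dataset against the AE outcome pages) runs once in the script rather than on every page render.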

Regarding the process of submitting contributions: we're looking into simplifying this process as much as possible, e.g., GitHub now supports issue templates, which might be a good solution...

vahldiek replied

That makes sense. Is there still a way to keep the scripts in the ArtiFinder repo and only carry the deployment changes (the workflow changes you have right now) here? Otherwise, I would at least suggest putting these files into their own folder in this repo.

The suggestion on issue templates is fine.

Comment thread on _conferences/acsac2017/results.md (outdated)

vahldiek commented

I guess ArtiFinder is limited to publicly available papers. Any chance of running it on systems conferences as well, or are those behind the ACM DL paywall?

vahldiek commented

Something I just realized: because the data is not generated into the YAML files, ReproDB will have to download the ArtiFinder output files directly from the repo. If you have any suggestions, please let me know. I was thinking of adding those before July 20th.

martonbognar (author) commented

I pushed some changes that should address most of your points; let me know if I missed something. Since the scripts are somewhat specific to secartifacts (they compare which artifacts are present in the evaluation lists), I think it's better to keep them here, but I moved them to a separate directory.

If we want to include the results in ReproDB, I would indeed recommend downloading the source dataset directly.

The biggest barrier to running ArtiFinder on more papers is the availability of PDFs. In the future, we want to look into this as well, seeing if we can collect data on non-security publications, and how we can (semi-)automatically obtain the papers without violating the ToS...

For this PR, I'm still planning to do a sanity check on the code and maybe separate the outcomes for different conferences better on the artifinder.md page.

vahldiek commented Jul 1, 2026

Thanks. In the case of USENIX Security, could we use the same logo for the "Artifact" link instead of the link in the paper? This could also distinguish between repositories (GitHub) and artifacts (Zenodo).

I was also under the impression that the dataset includes papers that didn't go through AE. Would you like to add those in a separate PR?

Once we are happy with the PR, I'd like to also invite Solal to take a look. He is mainly focused on SysArtifacts, but it would be good to get his feedback, since SysArtifacts could use a similar dataset...

vahldiek commented Jul 1, 2026

I just found the non-evaluated artifacts. The problem is that they are somewhat hidden, though I'm not sure how to change that. Maybe for conference instances that had an AE, we could add the table from the non-evaluated page for that conference and year to the results page? I'm not quite sure how to handle it. Let me know what you think.


2 participants