Add list of non-evaluated artifacts based on ArtiFinder#147
martonbognar wants to merge 16 commits into
Could you host your branch in your fork? This way, it is easier to look at these changes.
We are still working on refining the changes, but if you're interested, the current version is already deployed at https://mici.hu/secartifacts.github.io/
vahldiek
left a comment
Left some comments. Looks quite good already. Let me know what you think.
<td>
{% assign dir_name = page.path | remove_first:"_conferences/" | split:"/" | first %}
{% assign af_entry = site.data.artifinder_links[dir_name][include.title] %}
{% if af_entry %}<a href="{{ af_entry.url }}" target="_blank">Author's link</a>{% if af_entry.validated %} <abbr title="This extracted link has been manually validated." class="af-validated">✓</abbr>{% endif %}{% endif %}
Instead of calling it "Author's link", could we create an ArtiFinder logo and use that instead? And have an alternative message that says something to the effect of "extracted from the paper using ArtiFinder"?
What does "manually validated" mean here? That every link has been clicked?
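For readers less familiar with Liquid, the lookup in the excerpt above can be sketched in Python. This is only an illustration of the logic, not code from the PR: the function names, the sample path, and the sample data are hypothetical, and only the nested `artifinder_links[dir_name][title]` shape with `url` and `validated` fields is taken from the template.

```python
from typing import Optional


def conference_dir(page_path: str) -> str:
    """Approximate `page.path | remove_first:"_conferences/" | split:"/" | first`."""
    # Drop the first occurrence of the prefix, then keep the first path segment.
    return page_path.replace("_conferences/", "", 1).split("/")[0]


def find_link(links: dict, page_path: str, title: str) -> Optional[dict]:
    """Return the extracted-link entry for a paper title, or None if absent."""
    return links.get(conference_dir(page_path), {}).get(title)


# Hypothetical data in the shape the template appears to expect.
links = {
    "usenixsec2024": {
        "Example Paper": {"url": "https://example.org/artifact", "validated": True},
    }
}

entry = find_link(links, "_conferences/usenixsec2024/results.md", "Example Paper")
missing = find_link(links, "_conferences/ndss2024/results.md", "Example Paper")
```

A missing conference or title simply yields no entry, which corresponds to the `{% if af_entry %}` guard rendering nothing.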
Do we need these script files, or should we rather have the output files?
How would one submit an update to the artifact URL? Through the ArtiFinder repo?
Yes, this was one of the main technical aspects I was wondering about. I would like to keep the dataset in a central location and not duplicate it, so it can serve as a single source of truth, and I'm not sure the secartifacts website repo is the best place for that. This also means that corrections/validations should be sent to our dataset repository. Whether it's included as a submodule or just fetched dynamically when building the website doesn't matter too much, I guess; for now, a submodule felt cleaner. Initially I also didn't want these additional generated files, but without them the website generation took excruciatingly long.
Regarding the process of submitting contributions: we're looking into simplifying this process as much as possible, e.g., GitHub now supports issue templates, which might be a good solution...
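As a rough illustration of why pre-generated files help, the lookup table can be computed once ahead of the site build instead of being derived inside the templates. The sketch below is an assumption about how such a step might look, not the scripts in this PR: the record schema (`conference`, `title`, `url`, `validated`), the function name, and the sample data are all hypothetical.

```python
from collections import defaultdict


def build_link_index(records, evaluated_titles):
    """Group extracted links by conference directory and paper title.

    records: iterable of dicts with keys conference, title, url, validated
             (hypothetical schema for the extracted dataset).
    evaluated_titles: dict mapping conference directory -> set of titles that
             appear in that conference's evaluation lists.
    Returns (links, non_evaluated): a nested lookup table for all papers, and
    the per-conference list of entries missing from the evaluation lists.
    """
    links = defaultdict(dict)
    non_evaluated = defaultdict(list)
    for rec in records:
        conf, title = rec["conference"], rec["title"]
        entry = {"url": rec["url"], "validated": bool(rec.get("validated", False))}
        links[conf][title] = entry
        if title not in evaluated_titles.get(conf, set()):
            non_evaluated[conf].append({"title": title, **entry})
    return dict(links), dict(non_evaluated)


# Hypothetical sample input.
records = [
    {"conference": "usenixsec2024", "title": "Paper A", "url": "https://example.org/a", "validated": True},
    {"conference": "usenixsec2024", "title": "Paper B", "url": "https://example.org/b"},
]
evaluated = {"usenixsec2024": {"Paper A"}}

links, non_evaluated = build_link_index(records, evaluated)
```

The two results could then be serialized to data files that the site generator reads directly, so the comparison against the evaluation lists runs once rather than on every page render.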
That makes sense. Is there still a path to keep the scripts in the ArtiFinder repo and only have the deployment change (the workflow changes that you have right now)? Otherwise I would at least suggest putting these files into their own folder in this repo...
The suggestion on issue templates is fine.
ArtiFinder is limited to publicly available papers, I guess. Any chance to run it on systems conferences as well, or are they behind the ACM DL paywall?
Something I just realized: because the data is not generated into the YAML files, ReproDB will have to download the ArtiFinder output files directly from the repo. If you have any suggestions, please let me know. I was thinking of adding those before July 20th...
I pushed some changes that should address most of your points; let me know if I missed something. Since the scripts are somewhat specific to secartifacts (comparing which artifacts are present in the evaluation lists), I think it's better to keep them here, but I moved them to a separate directory. If we want to include the results in ReproDB, I would indeed recommend downloading the source dataset directly. The biggest barrier to running ArtiFinder on more papers is the availability of PDFs. In the future, we want to look into this as well, seeing if we can collect data on non-security publications, and how we can (semi-)automatically obtain the papers without violating the ToS... For this PR, I'm still planning to do a sanity check on the code and maybe separate the outcomes for different conferences better on the
Thanks. In the case of USENIX Security, could we use the same Artifact logo for the link instead of the link in the paper? This could also distinguish between repositories (GitHub) and artifacts (Zenodo). I was also under the impression that the dataset includes papers that didn't go through AE. Would you like to add those in a separate PR? Once we are happy with the PR, I'd like to also invite Solal to take a look. He is mainly focused on SysArtifacts, but it would be good to get his feedback, since SysArtifacts could use a similar dataset...
Just found the non-evaluated artifacts. The problem is that they are somewhat hidden. That said, I'm not sure how to change it. Maybe for instances that had an AE, we could add the table from the non-evaluated page of that conference and year to the results? I'm not quite sure how to handle it... Let me know what you think.
This pull request (currently a placeholder) serves to add the list of artifacts we obtained using ArtiFinder, a tool to automatically identify research artifacts from papers. We built a dataset that contains artifacts published at IEEE S&P, ACM CCS, USENIX Security, and NDSS in the period 2000--2025, and ACSAC in 2017--2025.
The current proposal adds the following content: