The crawler team is working on writing the crawled vulnerabilities to a JSON/CSV file at the end of each run. With a sequence of these files, we can effectively simulate the crawler by replaying them as inputs.
What needs to be developed is an analysis tool that loads the crawler outputs from a directory and reports:
- How many new raw vulnerabilities were found in each run, where "new" means the tuple (CVE_ID, Description, Source_URL) does not match any entry from prior runs
- What the distribution of source types is in that run (both pre- and post-filtering)
- How many of these new vulns needed to be filtered (some skip filtering by virtue of coming from low-priority sources)
- Of the ones that went through the filters, how many passed and how many were rejected. This should be computed in multiple settings:
  - Using all filters (pending a paid OpenAI account)
  - Using only the local filters
  - Using each individual filter on its own
- For each parser type (each raw vuln should have an associated parser recorded in the input file), the above metrics broken down per parser. This lets us know when a parser is failing.
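To make the "new" definition concrete, here is a minimal sketch of the loading and dedup step. The field names (`cve_id`, `description`, `source_url`, `source_type`) and the one-JSON-array-per-run file layout are assumptions, not the actual crawler schema:

```python
import json
from collections import Counter
from pathlib import Path

def analyze_runs(directory):
    """Yield (run_name, new_count, source_type_distribution) for each run,
    processing files in sorted filename order as the run sequence.

    Assumes each *.json file holds a list of vuln dicts with hypothetical
    keys "cve_id", "description", "source_url", "source_type".
    """
    seen = set()  # identity tuples accumulated from all prior runs
    for path in sorted(Path(directory).glob("*.json")):
        vulns = json.loads(path.read_text())
        # "New" = the (CVE_ID, Description, Source_URL) tuple was never seen before
        new = [v for v in vulns
               if (v["cve_id"], v["description"], v["source_url"]) not in seen]
        seen.update((v["cve_id"], v["description"], v["source_url"]) for v in vulns)
        # Pre-filtering source-type distribution for this run
        dist = Counter(v.get("source_type", "unknown") for v in vulns)
        yield path.name, len(new), dist
```

The post-filtering distribution would be the same `Counter` taken over whatever subset survives the filter pipeline.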
In the automated-filter-metrics branch, utils/metrics contains some starting classes with a few stubbed-out methods.
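I don't know the exact shape of those stubs, but here is one sketch of how the pass/reject counts could be computed under the different filter settings and broken down per parser. `FilterMetrics`, the filter-as-callable convention, and the `parser` field are all hypothetical names for illustration:

```python
from collections import Counter

class FilterMetrics:
    """Hypothetical sketch: a 'filter' is any callable that takes a vuln dict
    and returns True (pass) or False (reject)."""

    def __init__(self, filters):
        self.filters = filters  # dict: filter name -> callable

    def evaluate(self, vulns, setting):
        """Count passed/rejected under one setting, where a setting is a list
        of filter names applied conjunctively (pass only if all filters pass).
        A single-element setting covers the 'individual filter' case."""
        passed = sum(1 for v in vulns
                     if all(self.filters[name](v) for name in setting))
        return {"passed": passed, "rejected": len(vulns) - passed}

    def per_parser(self, vulns, setting):
        """Same metrics, grouped by each vuln's associated parser."""
        by_parser = {}
        for v in vulns:
            by_parser.setdefault(v.get("parser", "unknown"), []).append(v)
        return {p: self.evaluate(vs, setting) for p, vs in by_parser.items()}
```

Running `evaluate` once per setting ("all filters", "local filters only", one call per individual filter) and once per parser via `per_parser` would cover the matrix of metrics listed above.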