
Time Evolutionary Filter Metrics #175

@memeeerit

Description


The crawler team is working on writing the crawled vulnerabilities to a JSON/CSV file at the end of each run. With a sequence of these files, we can effectively replay the crawler by using the files as inputs.

What needs to be developed is an analysis tool that loads the crawler outputs from a directory and tells us:

  • How many new raw vulnerabilities were found in each run, where "new" means the tuple (CVE_ID, Description, Source_URL) does not match any entry from a prior run
  • The distribution of source types in that run (both pre- and post-filtering)
  • How many of the new vulns needed to be filtered (some skip filtering by virtue of coming from low-priority sources)
  • Of the ones that went through filters, how many passed and how many were rejected. This should be computed in multiple settings:
    • Using all filters (pending a paid OpenAI account)
    • Using all local filters
    • Using each individual filter
  • The above metrics broken down by parser type (each raw vuln should have an associated parser in the input file). This lets us know when a parser is failing.
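The first two bullets could be sketched roughly as follows. This is a minimal sketch, not the actual tool: it assumes each run file is a JSON list of objects, and that the `Source_Type` field name exists (the CVE_ID/Description/Source_URL fields are the ones named above; `Source_Type` is a guess).

```python
import json
from collections import Counter
from pathlib import Path

def new_vulns_per_run(run_files):
    """For each run file (in chronological order), count vulns whose
    (CVE_ID, Description, Source_URL) tuple was not seen in any prior run."""
    seen = set()
    counts = []
    for path in run_files:
        vulns = json.loads(Path(path).read_text())
        keys = [(v["CVE_ID"], v["Description"], v["Source_URL"]) for v in vulns]
        counts.append(sum(1 for k in keys if k not in seen))
        seen.update(keys)
    return counts

def source_type_distribution(vulns, key="Source_Type"):
    """Count vulns per source type (the field name is an assumption)."""
    return Counter(v.get(key, "unknown") for v in vulns)
```

Running the same function over the pre-filter and post-filter lists gives both sides of the distribution bullet.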

In the automated-filter-metrics branch, utils/metrics contains some starting classes with a few stubbed-out methods.
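For the filter and per-parser bullets, one possible shape for such a class is sketched below. This is hypothetical and not the code in utils/metrics: it assumes filters can be modeled as named callables returning True for "passed", and that each vuln dict carries a `parser` field (the issue says each raw vuln has an associated parser, but the field name is a guess).

```python
from collections import Counter, defaultdict

class FilterMetrics:
    """Hypothetical accumulator for pass/reject counts per filter setting."""

    def __init__(self, filters):
        # filters: mapping of filter name -> callable(vuln) -> bool (True = passed)
        self.filters = filters

    def evaluate(self, vulns):
        # Run every filter over every vuln, tallying passed/rejected per filter.
        results = defaultdict(Counter)
        for v in vulns:
            for name, f in self.filters.items():
                results[name]["passed" if f(v) else "rejected"] += 1
        return {name: dict(c) for name, c in results.items()}

    def by_parser(self, vulns):
        # Group vulns by their parser, then compute the same metrics per group,
        # which is what surfaces a single failing parser.
        groups = defaultdict(list)
        for v in vulns:
            groups[v.get("parser", "unknown")].append(v)
        return {p: self.evaluate(g) for p, g in groups.items()}
```

The "all filters" / "all local filters" / "individual filter" settings then become different `filters` mappings passed to the constructor.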
