
Time Evolutionary Filter Metrics #175

@memeeerit

Description


The crawler team is working on writing the crawled vulnerabilities to a JSON/CSV file at the end of each run. With a sequence of these files, we can effectively replay the crawler by using the files as inputs.

What needs to be developed is an analysis tool that loads the crawler outputs from a directory and tells us:

  • How many new raw vulnerabilities were found in each run, where "new" means the tuple (CVE_ID, Description, Source_URL) does not match any entry from a prior run
  • The distribution of source types in that run (both pre- and post-filtering)
  • How many of the new vulns needed to be filtered (some skip filtering by virtue of coming from low-priority sources)
  • Of the ones that went through filters, how many passed and how many were rejected. This should be computed in multiple settings:
    • Using all filters (pending a paid OpenAI account)
    • Using all local filters
    • Using each individual filter
  • The above metrics broken down by parser type (each raw vuln should have an associated parser in the input file). This lets us know when a parser is failing.
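The first two bullets could be sketched roughly as follows. This is a minimal sketch, not the actual tool: it assumes each run file is a JSON list of objects, and that the `Source_Type` field name exists (the CVE_ID/Description/Source_URL fields are the ones named above; `Source_Type` is a guess).

```python
import json
from collections import Counter
from pathlib import Path

def new_vulns_per_run(run_files):
    """For each run file (in chronological order), count vulns whose
    (CVE_ID, Description, Source_URL) tuple was not seen in any prior run."""
    seen = set()
    counts = []
    for path in run_files:
        vulns = json.loads(Path(path).read_text())
        keys = [(v["CVE_ID"], v["Description"], v["Source_URL"]) for v in vulns]
        counts.append(sum(1 for k in keys if k not in seen))
        seen.update(keys)
    return counts

def source_type_distribution(vulns, key="Source_Type"):
    """Count vulns per source type (the field name is an assumption)."""
    return Counter(v.get(key, "unknown") for v in vulns)
```

Running the same function over the pre-filter and post-filter lists gives both sides of the distribution bullet.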

In the automated-filter-metrics branch, utils/metrics contains some starting classes with a few stubbed-out methods.
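For the filter and per-parser bullets, one possible shape for such a class is sketched below. This is hypothetical and not the code in utils/metrics: it assumes filters can be modeled as named callables returning True for "passed", and that each vuln dict carries a `parser` field (the issue says each raw vuln has an associated parser, but the field name is a guess).

```python
from collections import Counter, defaultdict

class FilterMetrics:
    """Hypothetical accumulator for pass/reject counts per filter setting."""

    def __init__(self, filters):
        # filters: mapping of filter name -> callable(vuln) -> bool (True = passed)
        self.filters = filters

    def evaluate(self, vulns):
        # Run every filter over every vuln, tallying passed/rejected per filter.
        results = defaultdict(Counter)
        for v in vulns:
            for name, f in self.filters.items():
                results[name]["passed" if f(v) else "rejected"] += 1
        return {name: dict(c) for name, c in results.items()}

    def by_parser(self, vulns):
        # Group vulns by their parser, then compute the same metrics per group,
        # which is what surfaces a single failing parser.
        groups = defaultdict(list)
        for v in vulns:
            groups[v.get("parser", "unknown")].append(v)
        return {p: self.evaluate(g) for p, g in groups.items()}
```

The "all filters" / "all local filters" / "individual filter" settings then become different `filters` mappings passed to the constructor.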
