Pull request overview
This PR adds functionality to ingest CVE (Common Vulnerabilities and Exposures) data from Mileva AI research digests. The implementation scrapes CVE identifiers from Mileva.ai fortnightly digest pages, fetches detailed CVE information from the MITRE CVE API, and transforms the data into AVID Report objects.
Key changes:
- New scripts/mileva.py script that scrapes Mileva.ai digests and creates AVID Reports from CVE data
- Extended datamodels with CVSSScores and CWETaxonomy classes to support CVE vulnerability scoring
- Updated Impact class to include CVSS and CWE taxonomy fields
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 23 comments.
| File | Description |
|---|---|
| scripts/mileva.py | New script that scrapes CVE IDs from Mileva.ai digests, fetches CVE details from MITRE API, and converts them to AVID Report objects with async processing |
| avidtools/datamodels/components.py | Adds CVSSScores and CWETaxonomy classes for CVE metadata, extends Impact class to include CVSS and CWE fields |
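As a rough illustration of the scraping step summarized in the table above, here is a minimal sketch that pulls unique CVE identifiers out of page text with a regular expression. The function name `extract_cve_ids` and the sample HTML are hypothetical and not taken from the PR; the actual script lists beautifulsoup4 as its HTML parser, which this sketch does not reproduce.

```python
import re

# CVE IDs have the form CVE-<4-digit year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def extract_cve_ids(page_text: str) -> list[str]:
    """Return unique CVE IDs in order of first appearance, upper-cased."""
    seen: dict[str, None] = {}
    for match in CVE_PATTERN.findall(page_text):
        # dict keys keep insertion order, so duplicates are dropped
        # without losing the order in which IDs first appeared.
        seen.setdefault(match.upper(), None)
    return list(seen)


# Hypothetical digest fragment, used only to exercise the function.
sample_html = "<p>See CVE-2024-12345 and cve-2023-0001; CVE-2024-12345 again.</p>"
print(extract_cve_ids(sample_html))
```

Deduplicating before any network call keeps the later per-CVE lookups to one request per identifier.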
scripts/mileva.py
Outdated
Script to scrape CVE information from Milev.ai and NVD.

This script structures CVE data into AVID Vulnerability objects.

This script:
1. Scrapes unique CVE IDs from Milev.ai research digest pages
2. Fetches detailed CVE information from NVD
3. Structures the data into AVID Vulnerability objects
4. Saves all vulnerabilities to a JSONL file

Dependencies:
- beautifulsoup4: For HTML parsing
- requests: For HTTP requests
- nvdlib: For fetching CVE data from NVD (already in dependencies)
The documentation says "Fetches detailed CVE information from NVD" but the script actually fetches from the MITRE CVE API (cveawg.mitre.org), not NVD. This should be corrected to say "MITRE CVE API" or "CVE.org" instead of "NVD".
Suggested change:
- Script to scrape CVE information from Milev.ai and NVD.
- This script structures CVE data into AVID Vulnerability objects.
- This script:
- 1. Scrapes unique CVE IDs from Milev.ai research digest pages
- 2. Fetches detailed CVE information from NVD
- 3. Structures the data into AVID Vulnerability objects
- 4. Saves all vulnerabilities to a JSONL file
- Dependencies:
- - beautifulsoup4: For HTML parsing
- - requests: For HTTP requests
- - nvdlib: For fetching CVE data from NVD (already in dependencies)
+ Script to scrape CVE information from Milev.ai and the MITRE CVE API.
+ This script structures CVE data into AVID Vulnerability objects.
+ This script:
+ 1. Scrapes unique CVE IDs from Milev.ai research digest pages
+ 2. Fetches detailed CVE information from the MITRE CVE API (cveawg.mitre.org)
+ 3. Structures the data into AVID Vulnerability objects
+ 4. Saves all vulnerabilities to a JSONL file
+ Dependencies:
+ - beautifulsoup4: For HTML parsing
+ - requests: For HTTP requests
+ - aiohttp: For async HTTP requests to the MITRE CVE API
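To make the NVD versus MITRE distinction concrete, the following sketch builds a lookup URL against the cveawg.mitre.org host named in the comment. Only the host comes from the review; the `/api/cve/` path, the function names, and the use of the synchronous standard library instead of aiohttp are assumptions made to keep the example self-contained.

```python
import json
import re
import urllib.request

# Host named in the review comment; the "/api/cve/" path is an assumption.
MITRE_CVE_API_BASE = "https://cveawg.mitre.org/api/cve/"
CVE_ID_RE = re.compile(r"^CVE-\d{4}-\d{4,}$")


def cve_record_url(cve_id: str) -> str:
    """Validate a CVE ID and return the URL of its record."""
    normalized = cve_id.strip().upper()
    if not CVE_ID_RE.match(normalized):
        raise ValueError(f"not a CVE ID: {cve_id!r}")
    return MITRE_CVE_API_BASE + normalized


def fetch_cve_record(cve_id: str, timeout: float = 10.0) -> dict:
    """Fetch one record synchronously (network call; not run on import)."""
    with urllib.request.urlopen(cve_record_url(cve_id), timeout=timeout) as resp:
        return json.load(resp)


print(cve_record_url("cve-2024-12345"))
```

Validating the identifier before building the URL means a malformed ID scraped from a digest page fails early instead of producing a confusing HTTP error.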
    scope: Optional[str] = None
    confidentialityImpact: Optional[str] = None
    integrityImpact: Optional[str] = None
    availabilityImpact: Optional[str] = None
The field name availabilityImpact uses camelCase, which is inconsistent with Python naming conventions. All other fields in this codebase use snake_case. Consider using availability_impact and mapping it to the correct JSON field name using Pydantic's Field(alias="availabilityImpact") if needed.
class CWETaxonomy(BaseModel):
    """CWE (Common Weakness Enumeration) taxonomy mapping."""

    cweId: str
The field name cweId uses camelCase, which is inconsistent with Python naming conventions. All other fields in this codebase use snake_case. Consider using cwe_id and mapping it to the correct JSON field name using Pydantic's Field(alias="cweId") if needed.
    userInteraction: Optional[str] = None
    scope: Optional[str] = None
    confidentialityImpact: Optional[str] = None
    integrityImpact: Optional[str] = None
The field name integrityImpact uses camelCase, which is inconsistent with Python naming conventions. All other fields in this codebase use snake_case. Consider using integrity_impact and mapping it to the correct JSON field name using Pydantic's Field(alias="integrityImpact") if needed.
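The snake_case-with-alias approach recommended in these naming comments could look roughly like the sketch below. It assumes Pydantic v2 (`ConfigDict`, `model_dump`); the classes are trimmed-down, hypothetical versions of the PR's CVSSScores and CWETaxonomy with only the fields named in the comments, not the actual definitions in avidtools/datamodels/components.py.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class CVSSScores(BaseModel):
    """Trimmed-down illustration; the real class has more fields."""

    # Accept both the snake_case name and the camelCase alias on input.
    model_config = ConfigDict(populate_by_name=True)

    integrity_impact: Optional[str] = Field(default=None, alias="integrityImpact")
    availability_impact: Optional[str] = Field(default=None, alias="availabilityImpact")


class CWETaxonomy(BaseModel):
    """CWE (Common Weakness Enumeration) taxonomy mapping."""

    model_config = ConfigDict(populate_by_name=True)

    cwe_id: str = Field(alias="cweId")


# Input may use either spelling; attribute access is always snake_case.
scores = CVSSScores(integrityImpact="HIGH", availability_impact="LOW")
cwe = CWETaxonomy(cweId="CWE-79")

# by_alias=True restores the camelCase keys when serializing.
print(scores.model_dump(by_alias=True))
print(cwe.cwe_id)
```

With this layout the Python attributes follow the codebase's naming convention, while serialized output can still use the camelCase keys that appear in CVE JSON records.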
1. Scrapes unique CVE IDs from Milev.ai research digest pages
2. Fetches detailed CVE information from NVD
3. Structures the data into AVID Vulnerability objects
4. Saves all vulnerabilities to a JSONL file
This documentation line should say "Saves all reports to a JSONL file" instead of "vulnerabilities" to match the actual implementation.
Suggested change:
- 4. Saves all vulnerabilities to a JSONL file
+ 4. Saves all reports to a JSONL file
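For readers unfamiliar with the JSONL format this step refers to, here is a minimal sketch of writing one JSON object per line. The function name `save_reports_jsonl` and the use of plain dictionaries are assumptions for illustration; the script itself works with AVID Report objects, whose serialization is not shown in this review.

```python
import json
from pathlib import Path
from typing import Iterable


def save_reports_jsonl(reports: Iterable[dict], path: Path) -> int:
    """Write each report as one JSON object per line; return the count written."""
    count = 0
    with path.open("w", encoding="utf-8") as fh:
        for report in reports:
            # json.dumps emits no raw newlines, so each record stays on one line.
            fh.write(json.dumps(report, ensure_ascii=False) + "\n")
            count += 1
    return count
```

A caller would pass already-serialized reports, for example `save_reports_jsonl(report_dicts, Path("reports.jsonl"))`, where both the variable and the file name are placeholders.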
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Ingest AI CVEs from Mileva digests.
Includes changes to the datamodels.