
Scoring bug: component_basic/external_references always scores as missing due to jsonpath and key-casing mismatches #76

@rocklambros

Description


Summary

The external_references field in the component_basic scoring category always scores as missing (not present), even when the generated BOM contains a fully populated externalReferences array with multiple entries. This causes every model to lose ~2.9 points out of 100 regardless of how complete their metadata is.

The root cause is two independent bugs in how the field is detected:

  1. Incorrect jsonpath in field_registry.json: The path is $.component.externalReferences (singular component) but the CycloneDX BOM structure uses $.components[0].externalReferences (plural components with array index).
  2. Snake_case/camelCase mismatch in fallback checker: The fallback check_field_in_aibom() checks if "external_references" in component but the CycloneDX key is "externalReferences" (camelCase).

Environment

  • aibom-generator version: v1.0.2 (commit 67829cb)
  • Python: 3.12
  • OS: Linux (Ubuntu)

Steps to Reproduce

  1. Run the AIBOM generator against any Hugging Face model that has external references:
python3 -m src.cli "rockCO78/crosswalk-v7c" --output /tmp/test_aibom.json --verbose
  2. Observe that the output shows Component Basic: 17.1/20 instead of 20/20.

  3. Inspect the generated BOM to confirm that externalReferences IS present:

import json

with open("/tmp/test_aibom.json") as f:
    bom = json.load(f)
comp = bom["components"][0]

# The field EXISTS in the BOM under the correct CycloneDX key:
print("externalReferences" in comp)  # True
print(len(comp["externalReferences"]))  # 5 entries

# But the scorer looks for the wrong key:
print("external_references" in comp)  # False
  4. Verify that the BOM top-level structure uses components (plural), not component:
print("component" in bom)   # False
print("components" in bom)  # True

Root Cause Analysis

Bug 1: Incorrect jsonpath in field_registry.json

File: src/models/field_registry.json

The external_references field definition uses:

{
  "external_references": {
    "category": "component_basic",
    "jsonpath": "$.component.externalReferences",
    ...
  }
}

The jsonpath $.component.externalReferences navigates to bom["component"]["externalReferences"], but the CycloneDX BOM structure (both 1.6 and 1.7) uses bom["components"][0]["externalReferences"] -- note the plural components with an array index.

For comparison, the other component_basic fields all use the correct plural path:

| Field | jsonpath |
|---|---|
| name | $.components[0].name |
| type | $.components[0].type |
| component_version | $.components[0].version |
| purl | $.components[0].purl |
| description | $.components[0].description |
| licenses | $.components[0].licenses |
| external_references | $.component.externalReferences (incorrect) |

The inconsistency is clear: external_references is the only field using singular $.component instead of plural $.components[0].

This causes FieldRegistryManager.detect_field_presence() -> _get_nested_value() to fail at lines 258-260 of src/models/registry.py:

if isinstance(current, dict) and part in current:
    current = current[part]
else:
    return False, None  # <-- hits this because bom["component"] doesn't exist
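To illustrate why the singular path fails while the plural one succeeds, here is a minimal, hypothetical resolver for the simple dotted-path-with-index syntax used in the registry. It is not the project's _get_nested_value() implementation; it only mirrors the same "missing key means not present" early exit shown in the excerpt above. The BOM dict is synthetic example data.

```python
import re

# Matches a bare key ("components") or a key with an index ("components[0]").
_TOKEN = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)(?:\[(\d+)\])?")


def resolve(bom: dict, jsonpath: str) -> tuple[bool, object]:
    """Hypothetical resolver for '$.a.b[0].c'-style paths; returns (found, value)."""
    if not jsonpath.startswith("$."):
        raise ValueError(f"unsupported path: {jsonpath}")
    current = bom
    for part in jsonpath[2:].split("."):
        m = _TOKEN.fullmatch(part)
        if m is None:
            raise ValueError(f"unsupported segment: {part}")
        key, index = m.group(1), m.group(2)
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return False, None  # same early exit as the registry.py excerpt
        if index is not None:
            i = int(index)
            if isinstance(current, list) and i < len(current):
                current = current[i]
            else:
                return False, None
    return True, current


# Synthetic CycloneDX-shaped BOM: "components" is a top-level array.
bom = {
    "components": [
        {
            "name": "demo",
            "externalReferences": [{"type": "website", "url": "https://example.com"}],
        }
    ]
}

print(resolve(bom, "$.component.externalReferences"))         # (False, None)
print(resolve(bom, "$.components[0].externalReferences")[0])  # True
```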

Bug 2: Snake_case field name vs camelCase BOM key in fallback checker

File: src/models/scoring.py, lines 93-98

When the jsonpath-based detection fails (bug 1), the scorer falls back to check_field_in_aibom(). The relevant code:

# Lines 93-98 of scoring.py
components = aibom.get("components", [])
if components:
    component = components[0]
    if field in component:  # field = "external_references"
        return True          # BOM key = "externalReferences" -- no match

The field registry names this field external_references (snake_case), but CycloneDX uses externalReferences (camelCase). The if field in component check performs a literal key lookup, so "external_references" in {"externalReferences": [...]} returns False.

Other fields avoid this problem because their registry names match their BOM keys exactly (e.g., name, type, purl, description, licenses). The component_version field also has a name mismatch (component_version vs. version), but it is still detected: the jsonpath-based check runs first, and its jsonpath ($.components[0].version) is correct, so the fallback is never reached for that field.

Suggested Fix

Fix 1 (field_registry.json): Change the jsonpath from singular to plural:

- "jsonpath": "$.component.externalReferences",
+ "jsonpath": "$.components[0].externalReferences",

This alone should fix the scoring, because the enhanced checker (check_field_with_enhanced_results) tries jsonpath-based detection first (lines 154-158 of scoring.py) and never reaches the fallback when that detection succeeds.
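To guard against the same mistake in future registry edits, a small consistency check could run alongside the tests. This is a hypothetical sketch: it assumes the registry loads as a dict mapping field names to entries with "category" and "jsonpath" keys (as in the Bug 1 excerpt), and find_inconsistent_paths is an illustrative name, not an existing project function. The entries are copied from the comparison table in Bug 1, with other registry keys omitted.

```python
# Hypothetical lint: every component_basic field should point into the
# first entry of the top-level "components" array.
EXPECTED_PREFIX = "$.components[0]."


def find_inconsistent_paths(registry: dict) -> list[str]:
    """Return names of component_basic fields whose jsonpath lacks the expected prefix."""
    bad = []
    for name, entry in registry.items():
        if entry.get("category") != "component_basic":
            continue
        if not entry.get("jsonpath", "").startswith(EXPECTED_PREFIX):
            bad.append(name)
    return bad


registry = {
    "name": {"category": "component_basic", "jsonpath": "$.components[0].name"},
    "type": {"category": "component_basic", "jsonpath": "$.components[0].type"},
    "component_version": {"category": "component_basic", "jsonpath": "$.components[0].version"},
    "purl": {"category": "component_basic", "jsonpath": "$.components[0].purl"},
    "description": {"category": "component_basic", "jsonpath": "$.components[0].description"},
    "licenses": {"category": "component_basic", "jsonpath": "$.components[0].licenses"},
    "external_references": {"category": "component_basic", "jsonpath": "$.component.externalReferences"},
}

print(find_inconsistent_paths(registry))  # ['external_references']
```

After Fix 1 is applied, the same check should return an empty list.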

Fix 2 (scoring.py, defense-in-depth): Add a camelCase alias check in the fallback, or normalize field names before lookup:

# Option A: explicit alias map
FIELD_ALIASES = {
    "external_references": "externalReferences",
    "component_version": "version",
}

# In check_field_in_aibom(), line 97:
field_key = FIELD_ALIASES.get(field, field)
if field_key in component:
    return True
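For the "normalize field names before lookup" alternative, a sketch might convert the registry's snake_case name to camelCase and check that spelling too, while keeping an explicit alias map for names that do not follow a mechanical rule. The helper names snake_to_camel and field_present are hypothetical. Mechanical conversion alone maps component_version to componentVersion, not version, so the alias map is still needed for that field.

```python
# Explicit aliases for registry names that do not map mechanically to BOM keys.
FIELD_ALIASES = {
    "external_references": "externalReferences",
    "component_version": "version",
}


def snake_to_camel(name: str) -> str:
    """Convert e.g. 'external_references' to 'externalReferences'."""
    head, *rest = name.split("_")
    return head + "".join(word.capitalize() for word in rest)


def field_present(component: dict, field: str) -> bool:
    """Check the alias (if any), the camelCase form, then the literal name.

    Like the original fallback, this tests key presence only; it does not
    check whether the value is non-empty.
    """
    candidates = [FIELD_ALIASES.get(field), snake_to_camel(field), field]
    return any(key is not None and key in component for key in candidates)


component = {"name": "demo", "version": "1.0", "externalReferences": []}

print(snake_to_camel("external_references"))            # externalReferences
print(snake_to_camel("component_version"))              # componentVersion
print(field_present(component, "external_references"))  # True
print(field_present(component, "component_version"))    # True
print(field_present(component, "purl"))                 # False
```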

Fix 1 is sufficient on its own. Fix 2 provides defense-in-depth against similar issues in future field additions.

Impact

  • Every model scored by the AIBOM generator loses ~2.86 points (1/7 * 20) in the component_basic category, even when the BOM correctly contains external references.
  • This makes it impossible to achieve 100/100 completeness.
  • The issue affects both CycloneDX 1.6 and 1.7 output since both use the components (plural) array structure.
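The ~2.86-point figure follows from the category arithmetic, assuming (as the 1/7 * 20 expression implies) that the seven component_basic fields are weighted equally within the 20-point category. A quick check that also reproduces the 97.1/100 total from the scoring output in Additional Context, with every other section at its maximum:

```python
# Assumes equal weighting of the seven component_basic fields from the Bug 1 table.
CATEGORY_MAX = 20
FIELDS = ["name", "type", "component_version", "purl",
          "description", "licenses", "external_references"]

per_field = CATEGORY_MAX / len(FIELDS)      # points per component_basic field
score_with_bug = CATEGORY_MAX - per_field   # external_references always missing

# Other sections at their maximums: Required 20, Metadata 20, Model Card 30, External References 10.
total = 20 + 20 + score_with_bug + 30 + 10

print(round(per_field, 2))       # 2.86
print(round(score_with_bug, 1))  # 17.1
print(round(total, 1))           # 97.1
```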

Additional Context

I discovered this while publishing a model (rockCO78/crosswalk-v7c) and maximizing the AIBOM completeness score. The model card covers all 35 non-GGUF fields in the registry, achieving 97.1/100 -- with the remaining 2.9 points lost entirely to this bug.

Scoring output:

Completeness Score: 97.1/100

Section Breakdown:
  - Required Fields: 20/20
  - Metadata: 20/20
  - Component Basic: 17.1/20    <-- should be 20/20
  - Component Model Card: 30/30
  - External References: 10/10
