Add semantic validator for policy template datastream categories #1095

Open
JDKurma wants to merge 9 commits into elastic:main from JDKurma:add-datastream-categories-validato
@JDKurma JDKurma commented Feb 23, 2026

What does this PR do?

Adds semantic validators that verify two things: that the categories declared in datastream manifests stay in sync with the categories of the policy templates that reference them, and that the parent categories of those datastream categories are also listed among the package-level categories.

Why is it important?

It detects and prevents drift between datastream-specific categories and the categories declared in policy templates and at the package level.

Checklist

Related issues

N/A

@JDKurma JDKurma self-assigned this Feb 23, 2026
@JDKurma JDKurma force-pushed the add-datastream-categories-validato branch 2 times, most recently from a796064 to 4d739c2 Compare February 24, 2026 23:36
JDKurma commented Feb 25, 2026

test integrations

@elastic-vault-github-plugin-prod

Created or updated PR in integrations repository to test this version. Check elastic/integrations#17560

@JDKurma JDKurma added the enhancement New feature or request label Feb 27, 2026
@JDKurma JDKurma marked this pull request as ready for review February 27, 2026 15:20
@JDKurma JDKurma requested a review from a team as a code owner February 27, 2026 15:20
coderabbitai bot commented Feb 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Walkthrough

Adds two semantic validators for datastream category consistency: one validates that policy template categories match categories declared in referenced data_stream manifests; the other validates that package top-level categories include any registry-defined parent categories referenced by datastreams. Both validators are registered in the spec rule set, parse package and data_stream manifest.yml files, fetch registry categories as needed, and return structured spec validation errors. Includes unit tests and test-package fixtures demonstrating matching and failing scenarios.
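
The first check described in the walkthrough can be illustrated with a minimal sketch. It is not the PR's implementation: the function names are hypothetical, and it assumes that "matching" means both lists contain the same set of categories, ignoring order and duplicates.

```go
package main

import (
	"fmt"
	"sort"
)

// toSet turns a list of category names into a set.
func toSet(items []string) map[string]bool {
	set := make(map[string]bool, len(items))
	for _, item := range items {
		set[item] = true
	}
	return set
}

// categoryMismatch returns the categories found on only one side.
// Assumption: order and duplicates do not matter.
func categoryMismatch(policyTemplate, dataStream []string) (onlyInTemplate, onlyInDataStream []string) {
	tmpl := toSet(policyTemplate)
	ds := toSet(dataStream)
	for c := range tmpl {
		if !ds[c] {
			onlyInTemplate = append(onlyInTemplate, c)
		}
	}
	for c := range ds {
		if !tmpl[c] {
			onlyInDataStream = append(onlyInDataStream, c)
		}
	}
	// Sort so that error messages are deterministic.
	sort.Strings(onlyInTemplate)
	sort.Strings(onlyInDataStream)
	return onlyInTemplate, onlyInDataStream
}

// checkCategories (hypothetical helper) returns an error when the two
// category lists differ, and nil when they describe the same set.
func checkCategories(templateName, dataStreamName string, policyTemplate, dataStream []string) error {
	onlyTmpl, onlyDS := categoryMismatch(policyTemplate, dataStream)
	if len(onlyTmpl) == 0 && len(onlyDS) == 0 {
		return nil
	}
	return fmt.Errorf("policy template %q and data stream %q categories differ: only in template %v, only in data stream %v",
		templateName, dataStreamName, onlyTmpl, onlyDS)
}
```

Reporting both directions of the difference makes it easier to see which manifest needs to change.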

@JDKurma JDKurma force-pushed the add-datastream-categories-validato branch from 4d739c2 to b57aaed Compare February 27, 2026 15:24
@trisch-me
Contributor

Do we have propagation from policy to package categories? These should also not be fine-grained categories, but parents according to the tree structure.

@teresaromero
Contributor

Can you adjust the PR description to follow the provided template?

teresaromero previously approved these changes Mar 3, 2026
@JDKurma JDKurma force-pushed the add-datastream-categories-validato branch from e7cc89f to 3c49b8d Compare March 5, 2026 00:27
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@code/go/internal/validator/semantic/validate_datastream_package_categories.go`:
- Around line 68-126: The validator ValidateDatastreamPackageCategories currently reads and parses manifest.yml using fs.ReadFile and yaml.Unmarshal; replace that logic with the repo-standard pkgpath.Files() + file.Values() pattern and move the file-reading/parsing into a new helper (e.g., parsePackageManifest or loadPackageManifest) to follow guidelines; update ValidateDatastreamPackageCategories to call the helper to obtain a packageManifestWithPackageCategories and keep the rest of the logic intact, and ensure any other YAML reads (e.g., readDataStreamManifestCategories) follow the same pkgpath.Files()/file.Values() approach if applicable.
- Around line 23-57: In fetchRegistryParentCategories ensure you check HTTP response status and validate parsed categories: after client.Get(packageRegistryCategoriesURL) verify resp.StatusCode == http.StatusOK and return a descriptive error including resp.Status and resp.StatusCode if not 200; perform status check before reading the body to avoid treating error pages as valid YAML; after yaml.Unmarshal ensure rc.Categories is non-nil and non-empty and return an error if empty so validation isn't silently skipped; keep the existing defer resp.Body.Close and include the packageRegistryCategoriesURL or HTTP status in error messages for easier debugging.

📥 Commits

Reviewing files that changed from the base of the PR and between e7cc89f and 3c49b8d.

📒 Files selected for processing (15)
  • code/go/internal/validator/semantic/validate_datastream_package_categories.go
  • code/go/internal/validator/semantic/validate_datastream_package_categories_test.go
  • code/go/internal/validator/semantic/validate_policy_template_datastream_categories_test.go
  • code/go/internal/validator/spec.go
  • code/go/pkg/validator/validator_test.go
  • test/packages/bad_datastream_package_categories/changelog.yml
  • test/packages/bad_datastream_package_categories/data_stream/mylogs/agent/stream/stream.yml.hbs
  • test/packages/bad_datastream_package_categories/data_stream/mylogs/fields/fields.yml
  • test/packages/bad_datastream_package_categories/data_stream/mylogs/manifest.yml
  • test/packages/bad_datastream_package_categories/manifest.yml
  • test/packages/good_datastream_package_categories/changelog.yml
  • test/packages/good_datastream_package_categories/data_stream/mylogs/agent/stream/stream.yml.hbs
  • test/packages/good_datastream_package_categories/data_stream/mylogs/fields/fields.yml
  • test/packages/good_datastream_package_categories/data_stream/mylogs/manifest.yml
  • test/packages/good_datastream_package_categories/manifest.yml
✅ Files skipped from review due to trivial changes (4)
  • test/packages/bad_datastream_package_categories/data_stream/mylogs/manifest.yml
  • test/packages/good_datastream_package_categories/data_stream/mylogs/agent/stream/stream.yml.hbs
  • code/go/pkg/validator/validator_test.go
  • test/packages/good_datastream_package_categories/changelog.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • code/go/internal/validator/spec.go

@JDKurma JDKurma force-pushed the add-datastream-categories-validato branch from d7b1451 to 3d214a1 Compare March 5, 2026 01:24
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@code/go/internal/validator/semantic/validate_policy_template_datastream_categories.go`:
- Around line 43-46: The code silently skips validation when manifest.type is not a string by returning ("", nil, nil); in the function in validate_policy_template_datastream_categories.go replace that silent return with an explicit error return (e.g., return "", nil, fmt.Errorf(...)) so callers receive a validation failure; detect the failed type assertion on typeVal -> pkgType and return a descriptive error mentioning the invalid manifest.type value/type and the function name (or "manifest.type") so the validation pipeline can surface the issue.
- Around line 33-57: The function readPackageManifestPolicyTemplates currently reads/parses the manifest via raw fs/yaml; replace this with the package manifest helpers: use pkgpath.Files() (with the package manifest glob) to load the manifest file(s), then call file.Values("$.type") to get pkgType and file.Values(...) JSONPath queries to extract PolicyTemplates and their categories into packageManifestWithCategories instead of yaml.Unmarshal; update readPackageManifestPolicyTemplates to return pkgType and pkg.PolicyTemplates from the file.Values results and remove direct fs/yaml usage; apply the same replacement for the similar parsing block later (the other function using fs.ReadFile/yaml.Unmarshal) so both use pkgpath.Files() and file.Values() JSONPath helpers.
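
The first finding, returning an error instead of silently skipping when manifest.type is not a string, can be sketched as follows. This is an assumption-laden illustration: packageType is a hypothetical helper operating on an already-decoded manifest map, and treating a missing key as an error is a choice made here, not something the review states.

```go
package main

import "fmt"

// packageType extracts manifest.type from a decoded manifest.
// A failed type assertion produces an error that names the offending
// value and its Go type, rather than an empty result with a nil error.
func packageType(manifest map[string]interface{}) (string, error) {
	typeVal, ok := manifest["type"]
	if !ok {
		// Assumption: a missing type is also reported as an error.
		return "", fmt.Errorf("manifest.type is missing")
	}
	pkgType, ok := typeVal.(string)
	if !ok {
		return "", fmt.Errorf("manifest.type must be a string, got %T (%v)", typeVal, typeVal)
	}
	return pkgType, nil
}
```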

📥 Commits

Reviewing files that changed from the base of the PR and between 3c49b8d and d7b1451.

📒 Files selected for processing (7)
  • code/go/internal/validator/semantic/validate_datastream_package_categories.go
  • code/go/internal/validator/semantic/validate_policy_template_datastream_categories.go
  • code/go/internal/validator/spec.go
  • code/go/pkg/validator/validator_test.go
  • spec/changelog.yml
  • test/packages/bad_datastream_categories_mismatch/manifest.yml
  • test/packages/good_datastream_categories_match/manifest.yml
🚧 Files skipped from review as they are similar to previous changes (5)
  • code/go/pkg/validator/validator_test.go
  • test/packages/bad_datastream_categories_mismatch/manifest.yml
  • test/packages/good_datastream_categories_match/manifest.yml
  • code/go/internal/validator/spec.go
  • code/go/internal/validator/semantic/validate_datastream_package_categories.go

@JDKurma JDKurma force-pushed the add-datastream-categories-validato branch from 85b399b to 5db2607 Compare March 5, 2026 02:25
@JDKurma JDKurma force-pushed the add-datastream-categories-validato branch 2 times, most recently from 768ed0a to 7f03b1f Compare March 5, 2026 04:20
@elasticmachine
💚 Build Succeeded

cc @JDKurma

JDKurma commented Mar 10, 2026

> Do we have propagation from policy to package categories? These should also not be fine-grained categories, but parents according to the tree structure.

@trisch-me I've added validation that every datastream parent-level category is also present in the package categories.
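
As a rough illustration of that parent-category check (not the code in this PR), the sketch below assumes the registry data has already been reduced to a child-to-parent map and that a category without an entry is itself top-level. The function name and the sample category names are hypothetical.

```go
package main

import "sort"

// missingParentCategories returns the parent-level categories implied by
// the data stream categories that the package manifest does not list.
// parents maps a category to its parent; categories without an entry are
// treated as top-level (an assumption about the registry data shape).
func missingParentCategories(parents map[string]string, dataStreamCategories, packageCategories []string) []string {
	declared := make(map[string]bool, len(packageCategories))
	for _, c := range packageCategories {
		declared[c] = true
	}

	missingSet := make(map[string]bool)
	for _, c := range dataStreamCategories {
		parent, hasParent := parents[c]
		if !hasParent {
			parent = c // already a top-level category
		}
		if !declared[parent] {
			missingSet[parent] = true
		}
	}

	missing := make([]string, 0, len(missingSet))
	for c := range missingSet {
		missing = append(missing, c)
	}
	sort.Strings(missing) // deterministic order for error messages
	return missing
}
```

With a map such as `{"firewall_security": "security"}`, a data stream tagged `firewall_security` would require `security` among the package categories.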
