Skip to content

Fix RDF tag mapper#27562

Open
harshach wants to merge 6 commits intomainfrom
rdf_fixes
Open

Fix RDF tag mapper#27562
harshach wants to merge 6 commits intomainfrom
rdf_fixes

Conversation

@harshach
Copy link
Copy Markdown
Collaborator

@harshach harshach commented Apr 21, 2026

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • Refactored RDF property mapping tests:
    • Removed hasVotes support from RDF mapping, updating RdfPropertyMapperTest to assert that votes are ignored.
    • Updated RdfPropertyMapperTest display names and cleanup to reflect the removal of vote helper branches.
  • Improved integration test stability:
    • Added @Isolated annotation to GlossaryOntologyExportIT to prevent CI timeouts caused by global RDF singleton state and resource contention.

This will update automatically on new commits.

Copilot AI review requested due to automatic review settings April 21, 2026 00:56
@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Apr 21, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the RDF translation layer to materialize tags/tiers/certifications as first-class RDF links (canonical entity URIs) rather than relying on synthetic FQN-based URIs or JSON literals, and adds an integration test to validate the resulting RDF graph behavior.

Changes:

  • Resolve TagLabel targets to canonical entity/tag/{uuid} (and glossary terms to entity/glossaryTerm/{uuid}) with caching and tier shortcuts.
  • Emit structured RDF triples for certification (e.g., om:hasCertification, om:certificationLevel, timestamps) instead of a JSON literal.
  • Add an RDF integration test covering classification tags, tier shortcut, and certification structured output.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
openmetadata-service/src/main/java/org/openmetadata/service/rdf/translator/RdfPropertyMapper.java Canonical tag/glossary resolution, tier shortcut emission, and certification structured RDF handling.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/RdfTagsTierCertificationIT.java New integration coverage asserting RDF links/types for tags, tier, and certification decomposition.

Comment on lines 47 to 53
// Properties that should be mapped to structured RDF instead of JSON literals
private static final Set<String> STRUCTURED_PROPERTIES =
Set.of("votes", "lifeCycle", "customProperties", "extension");
Set.of("lifeCycle", "customProperties", "extension", "certification");

// Properties that should be omitted from RDF because they are audit/helper data.
private static final Set<String> IGNORED_PROPERTIES = Set.of("changeDescription");
private static final Set<String> IGNORED_PROPERTIES = Set.of("changeDescription", "votes");

Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

votes is now in IGNORED_PROPERTIES and removed from STRUCTURED_PROPERTIES, which effectively drops the structured om:hasVotes/om:upVotes/om:downVotes RDF output. The service module already has tests and SQL→SPARQL nested mapping that expect votes to be present (e.g. RdfPropertyMapperTest and SqlMappingContext’s votes nested mapping). Either keep the existing votes structured mapping (and the addVotes behavior) or update the dependent tests/mappings as part of this PR to avoid a functional regression and test failures.

Copilot uses AI. Check for mistakes.
Comment on lines +294 to +303
if (isGlossary) {
tagResource.addProperty(RDF.type, model.createResource(getRdfType("glossaryTerm")));
tagResource.addProperty(RDF.type, model.createResource(SKOS.getURI() + "Concept"));
resource.addProperty(model.createProperty(OM_NS, "hasGlossaryTerm"), tagResource);
} else {
tagResource.addProperty(RDF.type, model.createResource(getRdfType("tag")));
tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag"));
if (tagFqn.startsWith(TIER_CLASSIFICATION_PREFIX)) {
resource.addProperty(model.createProperty(OM_NS, "hasTier"), tagResource);
}
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addTagLabel adds rdf:type using model.createResource(getRdfType("tag"/"glossaryTerm")), but getRdfType returns CURIEs like skos:Concept. Jena does not expand prefixes in createResource, so this produces a type IRI literally equal to skos:Concept rather than http://www.w3.org/2004/02/skos/core#Concept. Consider either expanding the CURIE via the model’s prefix mapping (as JsonLdTranslator does) or just using the SKOS vocabulary constant for Concept to avoid emitting incorrect type IRIs.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 21, 2026

🟡 Playwright Results — all passed (15 flaky)

✅ 3958 passed · ❌ 0 failed · 🟡 15 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 298 0 1 4
🟡 Shard 2 755 0 4 8
🟡 Shard 3 729 0 3 7
🟡 Shard 4 758 0 1 18
🟡 Shard 5 685 0 2 41
🟡 Shard 6 733 0 4 8
🟡 15 flaky test(s) (passed on retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when owner is added (shard 2, 1 retry)
  • Features/Glossary/GlossaryMutualExclusivity.spec.ts › ME-H04: Toggle ME flag via edit after children exist (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Flow/AddRoleAndAssignToUser.spec.ts › Verify assigned role to new user (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Set (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
  • Pages/EntityDataSteward.spec.ts › User as Owner Add, Update and Remove (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Tag.spec.ts › Certification Page should not have Asset button for Data Consumer (shard 6, 1 retry)
  • Pages/Users.spec.ts › Check permissions for Data Steward (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Copilot AI review requested due to automatic review settings April 23, 2026 17:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

String tagFqn = tagLabel.get("tagFQN").asText();
String source = tagLabel.has("source") ? tagLabel.get("source").asText() : "Classification";
Resource tagResource = resolveTagResource(tagFqn, source, tagLabel, model);
tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag"));
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addCertification types the resolved certification tag only as om:Tag, while addTagLabel also adds the getRdfType("tag") type (skos:Concept). If the intent is consistent typing for tag resources emitted from different paths, consider adding the same skos:Concept type here (or centralizing tag typing) to avoid divergent RDF shapes for otherwise identical tag URIs.

Suggested change
tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag"));
tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag"));
tagResource.addProperty(RDF.type, SKOS.Concept);

Copilot uses AI. Check for mistakes.
Comment on lines +1125 to +1126
if (curieOrUri == null || curieOrUri.isEmpty()) {
return model.createResource();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Edge Case: createTypeResource returns blank node for null/empty type

When getRdfType(entityType) returns null or empty, createTypeResource falls back to model.createResource() which creates an anonymous blank node. This blank node is then used as the object of an RDF.type triple (e.g., refResource.addProperty(RDF.type, createTypeResource(...))), producing a semantically meaningless type assertion. While unlikely to be hit in practice (since getRdfType returns om:<PascalCase> for unknown types), it would silently produce confusing RDF rather than logging a warning.

Suggested fix:

if (curieOrUri == null || curieOrUri.isEmpty()) {
  LOG.warn("No RDF type mapping found for entity type '{}'; skipping rdf:type triple", entityType);
  return model.createResource(OM_NS + entityType);
}

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

Copilot AI review requested due to automatic review settings April 26, 2026 15:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 1105 changed files in this pull request and generated 7 comments.

// Properties that should be mapped to structured RDF instead of JSON literals
private static final Set<String> STRUCTURED_PROPERTIES =
Set.of("votes", "lifeCycle", "customProperties", "extension");
Set.of("lifeCycle", "customProperties", "extension", "certification");
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

customProperties is included in STRUCTURED_PROPERTIES, but addStructuredProperty does not handle it. With the new early-dispatch in processContextMappings, this will now skip JSON-LD context handling and effectively drop customProperties from RDF output (only logging a warning). Fix by either (a) implementing a customProperties structured handler, or (b) removing customProperties from STRUCTURED_PROPERTIES, or (c) only continue in the caller when the structured handler actually handled the field.

Suggested change
Set.of("lifeCycle", "customProperties", "extension", "certification");
Set.of("lifeCycle", "extension", "certification");

Copilot uses AI. Check for mistakes.
case "votes" -> addVotes(value, entityResource, model);
case "lifeCycle" -> addLifeCycle(value, entityResource, model);
case "extension" -> addExtension(value, entityResource, model);
case "certification" -> addCertification(value, entityResource, model);
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

customProperties is included in STRUCTURED_PROPERTIES, but addStructuredProperty does not handle it. With the new early-dispatch in processContextMappings, this will now skip JSON-LD context handling and effectively drop customProperties from RDF output (only logging a warning). Fix by either (a) implementing a customProperties structured handler, or (b) removing customProperties from STRUCTURED_PROPERTIES, or (c) only continue in the caller when the structured handler actually handled the field.

Suggested change
case "certification" -> addCertification(value, entityResource, model);
case "certification" -> addCertification(value, entityResource, model);
case "customProperties" -> addStructuredArrayProperty(fieldName, value, entityResource, model);

Copilot uses AI. Check for mistakes.
Comment on lines +1123 to +1130
private Resource createTypeResource(String entityType, Model model) {
String curieOrUri = getRdfType(entityType);
if (curieOrUri == null || curieOrUri.isEmpty()) {
return model.createResource();
}
if (curieOrUri.startsWith("http://") || curieOrUri.startsWith("https://")) {
return model.createResource(curieOrUri);
}
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When getRdfType(entityType) is null/empty, createTypeResource returns a blank node, and callers then add it as rdf:type _:bN. That produces invalid/meaningless typing triples. Prefer returning null/Optional<Resource> and skipping the rdf:type triple when the type cannot be resolved, or returning a well-defined fallback URI if you have one (but a blank node rdf:type should be avoided).

Copilot uses AI. Check for mistakes.
if (id != null) {
return model.createResource(baseUri + "entity/" + entityType + "/" + id);
}
return model.createResource(baseUri + "tag/" + tagFqn.replace(".", "/"));
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback URI for unresolved tags always uses baseUri + \"tag/\"... even when source is Glossary. That can mis-identify a glossary term as a tag, and can also collide with real classification-tag synthetic URIs. Make the fallback source-aware (e.g., baseUri + \"glossaryTerm/\"... for glossary terms, or at least a distinct synthetic namespace per source).

Suggested change
return model.createResource(baseUri + "tag/" + tagFqn.replace(".", "/"));
return model.createResource(baseUri + entityType + "/" + tagFqn.replace(".", "/"));

Copilot uses AI. Check for mistakes.
Comment on lines +379 to +380
Tag tag = Entity.getEntityByName(Entity.TAG, tagFqn, "", Include.NON_DELETED, false);
UUID id = tag != null ? tag.getId() : null;
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both lookups request all fields by passing an empty fields string, but the mapper only needs the entity id. This can materially increase DB load during RDF translation. Prefer requesting only id (or the smallest supported field set) in getEntityByName to reduce data fetch and serialization overhead; you can keep the caches as an additional optimization.

Copilot uses AI. Check for mistakes.
Comment on lines 407 to 409
GlossaryTerm term =
Entity.getEntityByName(Entity.GLOSSARY_TERM, termFqn, "id", Include.NON_DELETED, false);
Entity.getEntityByName(Entity.GLOSSARY_TERM, termFqn, "", Include.NON_DELETED, false);
UUID termId = term != null ? term.getId() : null;
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both lookups request all fields by passing an empty fields string, but the mapper only needs the entity id. This can materially increase DB load during RDF translation. Prefer requesting only id (or the smallest supported field set) in getEntityByName to reduce data fetch and serialization overhead; you can keep the caches as an additional optimization.

Copilot uses AI. Check for mistakes.
Comment on lines +495 to +496
tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag"));
tagResource.addProperty(model.createProperty(OM_NS, "tagFQN"), tagFqn);
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addCertification always types the resolved resource as om:Tag, even if source is Glossary (in which case resolveTagResource can return an entity/glossaryTerm/{uuid} URI). This creates inconsistent RDF typing for glossary-backed certifications. Consider mirroring addTagLabel behavior: set types based on source (e.g., skos:Concept/glossaryTerm vs om:Tag) and optionally add om:tagSource for parity.

Suggested change
tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag"));
tagResource.addProperty(model.createProperty(OM_NS, "tagFQN"), tagFqn);
boolean isGlossarySource = "Glossary".equalsIgnoreCase(source);
tagResource.addProperty(
RDF.type, isGlossarySource ? SKOS.Concept : model.createResource(OM_NS + "Tag"));
tagResource.addProperty(model.createProperty(OM_NS, "tagFQN"), tagFqn);
if (source != null && !source.isEmpty()) {
tagResource.addProperty(model.createProperty(OM_NS, "tagSource"), source);
}

Copilot uses AI. Check for mistakes.
…rt IT

RdfPropertyMapperTest still referenced the removed addVotes helper and
expected addStructuredProperty to dispatch votes — both gone after votes
was added to IGNORED_PROPERTIES. Update the assertions accordingly.

GlossaryOntologyExportIT timed out on the full suite because it flips a
global RDF singleton in @BeforeAll and each test blocks a server thread on
synchronous Fuseki writes. SAME_THREAD only serialized methods within the
class — concurrent classes still raced for server threads. Adding @isolated
matches the pattern already used by RdfResourceIT for the same reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 26, 2026

Code Review 👍 Approved with suggestions 0 resolved / 1 findings

The RDF tag mapper logic ensures correct type assignment, but createTypeResource requires a fallback fix for blank nodes when type inputs are null or empty.

💡 Edge Case: createTypeResource returns blank node for null/empty type

📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/translator/RdfPropertyMapper.java:1125-1126

When getRdfType(entityType) returns null or empty, createTypeResource falls back to model.createResource() which creates an anonymous blank node. This blank node is then used as the object of an RDF.type triple (e.g., refResource.addProperty(RDF.type, createTypeResource(...))), producing a semantically meaningless type assertion. While unlikely to be hit in practice (since getRdfType returns om:<PascalCase> for unknown types), it would silently produce confusing RDF rather than logging a warning.

Suggested fix
if (curieOrUri == null || curieOrUri.isEmpty()) {
  LOG.warn("No RDF type mapping found for entity type '{}'; skipping rdf:type triple", entityType);
  return model.createResource(OM_NS + entityType);
}
🤖 Prompt for agents
Code Review: The RDF tag mapper logic ensures correct type assignment, but createTypeResource requires a fallback fix for blank nodes when type inputs are null or empty.

1. 💡 Edge Case: createTypeResource returns blank node for null/empty type
   Files: openmetadata-service/src/main/java/org/openmetadata/service/rdf/translator/RdfPropertyMapper.java:1125-1126

   When `getRdfType(entityType)` returns null or empty, `createTypeResource` falls back to `model.createResource()` which creates an anonymous blank node. This blank node is then used as the object of an `RDF.type` triple (e.g., `refResource.addProperty(RDF.type, createTypeResource(...))`), producing a semantically meaningless type assertion. While unlikely to be hit in practice (since `getRdfType` returns `om:<PascalCase>` for unknown types), it would silently produce confusing RDF rather than logging a warning.

   Suggested fix:
   if (curieOrUri == null || curieOrUri.isEmpty()) {
     LOG.warn("No RDF type mapping found for entity type '{}'; skipping rdf:type triple", entityType);
     return model.createResource(OM_NS + entityType);
   }

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants