Fix RDF tag mapper by harshach · Pull Request #27562 · open-metadata/OpenMetadata

harshach · 2026-04-21T00:56:37Z

Summary by Gitar

Refactored RDF property mapping tests:
- Removed hasVotes support from RDF mapping, updating RdfPropertyMapperTest to assert that votes are ignored.
- Updated RdfPropertyMapperTest display names and cleanup to reflect the removal of vote helper branches.
Improved integration test stability:
- Added @Isolated annotation to GlossaryOntologyExportIT to prevent CI timeouts caused by global RDF singleton state and resource contention.

Copilot

Pull request overview

This PR updates the RDF translation layer to materialize tags/tiers/certifications as first-class RDF links (canonical entity URIs) rather than relying on synthetic FQN-based URIs or JSON literals, and adds an integration test to validate the resulting RDF graph behavior.

Changes:

Resolve TagLabel targets to canonical entity/tag/{uuid} (and glossary terms to entity/glossaryTerm/{uuid}) with caching and tier shortcuts.
Emit structured RDF triples for certification (e.g., om:hasCertification, om:certificationLevel, timestamps) instead of a JSON literal.
Add an RDF integration test covering classification tags, tier shortcut, and certification structured output.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File	Description
openmetadata-service/src/main/java/org/openmetadata/service/rdf/translator/RdfPropertyMapper.java	Canonical tag/glossary resolution, tier shortcut emission, and certification structured RDF handling.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/RdfTagsTierCertificationIT.java	New integration coverage asserting RDF links/types for tags, tier, and certification decomposition.

Copilot · 2026-04-21T01:02:14Z

  // Properties that should be mapped to structured RDF instead of JSON literals
  private static final Set<String> STRUCTURED_PROPERTIES =
-      Set.of("votes", "lifeCycle", "customProperties", "extension");
+      Set.of("lifeCycle", "customProperties", "extension", "certification");

  // Properties that should be omitted from RDF because they are audit/helper data.
-  private static final Set<String> IGNORED_PROPERTIES = Set.of("changeDescription");
+  private static final Set<String> IGNORED_PROPERTIES = Set.of("changeDescription", "votes");



votes is now in IGNORED_PROPERTIES and removed from STRUCTURED_PROPERTIES, which effectively drops the structured om:hasVotes/om:upVotes/om:downVotes RDF output. The service module already has tests and SQL→SPARQL nested mapping that expect votes to be present (e.g. RdfPropertyMapperTest and SqlMappingContext’s votes nested mapping). Either keep the existing votes structured mapping (and the addVotes behavior) or update the dependent tests/mappings as part of this PR to avoid a functional regression and test failures.

Copilot · 2026-04-21T01:02:15Z

+    if (isGlossary) {
+      tagResource.addProperty(RDF.type, model.createResource(getRdfType("glossaryTerm")));
+      tagResource.addProperty(RDF.type, model.createResource(SKOS.getURI() + "Concept"));
+      resource.addProperty(model.createProperty(OM_NS, "hasGlossaryTerm"), tagResource);
+    } else {
+      tagResource.addProperty(RDF.type, model.createResource(getRdfType("tag")));
+      tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag"));
+      if (tagFqn.startsWith(TIER_CLASSIFICATION_PREFIX)) {
+        resource.addProperty(model.createProperty(OM_NS, "hasTier"), tagResource);
+      }


addTagLabel adds rdf:type using model.createResource(getRdfType("tag"/"glossaryTerm")), but getRdfType returns CURIEs like skos:Concept. Jena does not expand prefixes in createResource, so this produces a type IRI literally equal to skos:Concept rather than http://www.w3.org/2004/02/skos/core#Concept. Consider either expanding the CURIE via the model’s prefix mapping (as JsonLdTranslator does) or just using the SKOS vocabulary constant for Concept to avoid emitting incorrect type IRIs.
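
The CURIE problem described above can be illustrated without Jena. The following is a minimal sketch, assuming a hand-written prefix table: `CurieExpander` and the `om` namespace IRI are hypothetical names chosen for illustration, not part of OpenMetadata. In the real mapper the same effect would more likely come from the model's own prefix mapping (Jena's `PrefixMapping.expandPrefix`) or from the `SKOS.Concept` constant.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not an OpenMetadata or Jena class.
final class CurieExpander {
  // Assumed prefix table. The "om" IRI is a placeholder, not the project's real namespace.
  private static final Map<String, String> PREFIXES =
      Map.of(
          "skos", "http://www.w3.org/2004/02/skos/core#",
          "om", "https://example.org/om#");

  private CurieExpander() {}

  // Expands "prefix:local" into a full IRI; absolute IRIs pass through unchanged.
  static String expand(String curieOrIri) {
    if (curieOrIri.startsWith("http://") || curieOrIri.startsWith("https://")) {
      return curieOrIri;
    }
    int colon = curieOrIri.indexOf(':');
    if (colon <= 0) {
      return curieOrIri; // no prefix part, nothing to expand
    }
    String namespace = PREFIXES.get(curieOrIri.substring(0, colon));
    // Unknown prefixes are returned unchanged so the caller can detect and report them.
    return namespace == null ? curieOrIri : namespace + curieOrIri.substring(colon + 1);
  }
}
```

With this in place, `CurieExpander.expand("skos:Concept")` yields the full SKOS IRI, which is the value an rdf:type triple should carry instead of the literal string `skos:Concept`.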

github-actions · 2026-04-21T02:52:09Z

🟡 Playwright Results — all passed (15 flaky)

✅ 3958 passed · ❌ 0 failed · 🟡 15 flaky · ⏭️ 86 skipped

Shard	Passed	Flaky	Skipped
🟡 Shard 1	298	1	4
🟡 Shard 2	755	4	8
🟡 Shard 3	729	3	7
🟡 Shard 4	758	1	18
🟡 Shard 5	685	2	41
🟡 Shard 6	733	4	8

🟡 15 flaky test(s) (passed on retry)

Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
Features/ActivityAPI.spec.ts › Activity event is created when owner is added (shard 2, 1 retry)
Features/Glossary/GlossaryMutualExclusivity.spec.ts › ME-H04: Toggle ME flag via edit after children exist (shard 2, 1 retry)
Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2, 1 retry)
Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
Flow/AddRoleAndAssignToUser.spec.ts › Verify assigned role to new user (shard 3, 1 retry)
Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Set (shard 4, 1 retry)
Pages/Entity.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
Pages/EntityDataSteward.spec.ts › User as Owner Add, Update and Remove (shard 5, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
Pages/Tag.spec.ts › Certification Page should not have Asset button for Data Consumer (shard 6, 1 retry)
Pages/Users.spec.ts › Check permissions for Data Steward (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Copilot · 2026-04-23T17:27:24Z

+    String tagFqn = tagLabel.get("tagFQN").asText();
+    String source = tagLabel.has("source") ? tagLabel.get("source").asText() : "Classification";
+    Resource tagResource = resolveTagResource(tagFqn, source, tagLabel, model);
+    tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag"));


addCertification types the resolved certification tag only as om:Tag, while addTagLabel also adds the getRdfType("tag") type (skos:Concept). If the intent is consistent typing for tag resources emitted from different paths, consider adding the same skos:Concept type here (or centralizing tag typing) to avoid divergent RDF shapes for otherwise identical tag URIs.

Suggested change

-    tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag"));
+    tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag"));
+    tagResource.addProperty(RDF.type, SKOS.Concept);

gitar-bot · 2026-04-23T19:00:27Z

+    if (curieOrUri == null || curieOrUri.isEmpty()) {
+      return model.createResource();


💡 Edge Case: createTypeResource returns blank node for null/empty type

When getRdfType(entityType) returns null or empty, createTypeResource falls back to model.createResource() which creates an anonymous blank node. This blank node is then used as the object of an RDF.type triple (e.g., refResource.addProperty(RDF.type, createTypeResource(...))), producing a semantically meaningless type assertion. While unlikely to be hit in practice (since getRdfType returns om:<PascalCase> for unknown types), it would silently produce confusing RDF rather than logging a warning.

Suggested fix:

if (curieOrUri == null || curieOrUri.isEmpty()) {
  LOG.warn("No RDF type mapping found for entity type '{}'; skipping rdf:type triple", entityType);
  return model.createResource(OM_NS + entityType);
}

Copilot

Pull request overview

Copilot reviewed 3 out of 1105 changed files in this pull request and generated 7 comments.

Copilot · 2026-04-26T15:44:19Z

  // Properties that should be mapped to structured RDF instead of JSON literals
  private static final Set<String> STRUCTURED_PROPERTIES =
-      Set.of("votes", "lifeCycle", "customProperties", "extension");
+      Set.of("lifeCycle", "customProperties", "extension", "certification");


customProperties is included in STRUCTURED_PROPERTIES, but addStructuredProperty does not handle it. With the new early-dispatch in processContextMappings, this will now skip JSON-LD context handling and effectively drop customProperties from RDF output (only logging a warning). Fix by either (a) implementing a customProperties structured handler, or (b) removing customProperties from STRUCTURED_PROPERTIES, or (c) only continue in the caller when the structured handler actually handled the field.

Suggested change

-      Set.of("lifeCycle", "customProperties", "extension", "certification");
+      Set.of("lifeCycle", "extension", "certification");
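
Option (c) from the comment above, where the caller only skips the generic path when the structured handler actually handled the field, might look like the sketch below. `StructuredDispatchSketch`, its `process` method, and the string log are hypothetical stand-ins for the real mapper and its triple emission; only the property names come from the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the mapper's dispatch logic; not the real RdfPropertyMapper.
final class StructuredDispatchSketch {
  static final Set<String> STRUCTURED_PROPERTIES =
      Set.of("lifeCycle", "customProperties", "extension", "certification");

  // Records which path each field took; stands in for emitting RDF triples.
  final List<String> log = new ArrayList<>();

  // Returns true only when a dedicated structured handler exists for the field.
  boolean addStructuredProperty(String fieldName) {
    switch (fieldName) {
      case "lifeCycle":
      case "extension":
      case "certification":
        log.add("structured:" + fieldName);
        return true;
      default:
        return false; // e.g. customProperties has no handler in this sketch
    }
  }

  void process(String fieldName) {
    if (STRUCTURED_PROPERTIES.contains(fieldName) && addStructuredProperty(fieldName)) {
      return; // handled by a structured handler, so skip the generic path
    }
    log.add("context:" + fieldName); // fall back to the generic context mapping
  }
}
```

Under this arrangement a field listed in `STRUCTURED_PROPERTIES` without a matching handler still reaches the generic path instead of being dropped.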

Copilot · 2026-04-26T15:44:20Z

-      case "votes" -> addVotes(value, entityResource, model);
      case "lifeCycle" -> addLifeCycle(value, entityResource, model);
      case "extension" -> addExtension(value, entityResource, model);
+      case "certification" -> addCertification(value, entityResource, model);


customProperties is included in STRUCTURED_PROPERTIES, but addStructuredProperty does not handle it. With the new early-dispatch in processContextMappings, this will now skip JSON-LD context handling and effectively drop customProperties from RDF output (only logging a warning). Fix by either (a) implementing a customProperties structured handler, or (b) removing customProperties from STRUCTURED_PROPERTIES, or (c) only continue in the caller when the structured handler actually handled the field.

Suggested change

-      case "certification" -> addCertification(value, entityResource, model);
+      case "certification" -> addCertification(value, entityResource, model);
+      case "customProperties" -> addStructuredArrayProperty(fieldName, value, entityResource, model);

Copilot · 2026-04-26T15:44:20Z

+  private Resource createTypeResource(String entityType, Model model) {
+    String curieOrUri = getRdfType(entityType);
+    if (curieOrUri == null || curieOrUri.isEmpty()) {
+      return model.createResource();
+    }
+    if (curieOrUri.startsWith("http://") || curieOrUri.startsWith("https://")) {
+      return model.createResource(curieOrUri);
+    }


When getRdfType(entityType) is null/empty, createTypeResource returns a blank node, and callers then add it as rdf:type _:bN. That produces invalid/meaningless typing triples. Prefer returning null/Optional<Resource> and skipping the rdf:type triple when the type cannot be resolved, or returning a well-defined fallback URI if you have one (but a blank node rdf:type should be avoided).
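
The "return Optional and skip the triple" approach suggested above can be sketched in plain Java. Everything here is an assumption made for illustration: `TypeResolutionSketch`, the one-entry type table, and the Turtle-like strings that stand in for Jena statements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; the type table and string triples are illustrative only.
final class TypeResolutionSketch {
  // Assumed mapping from entity type to a fully expanded type IRI.
  private static final Map<String, String> TYPE_IRIS =
      Map.of("glossaryTerm", "http://www.w3.org/2004/02/skos/core#Concept");

  private TypeResolutionSketch() {}

  // Empty when no usable type IRI is known, instead of inventing a blank node.
  static Optional<String> resolveTypeIri(String entityType) {
    if (entityType == null) {
      return Optional.empty();
    }
    String iri = TYPE_IRIS.get(entityType);
    return (iri == null || iri.isEmpty()) ? Optional.empty() : Optional.of(iri);
  }

  // Emits an rdf:type statement only when the type could be resolved.
  static List<String> typeTriples(String subjectIri, String entityType) {
    List<String> triples = new ArrayList<>();
    resolveTypeIri(entityType)
        .ifPresent(iri -> triples.add("<" + subjectIri + "> a <" + iri + "> ."));
    return triples;
  }
}
```

The caller never sees a placeholder resource, so an unresolved type produces no rdf:type statement at all rather than one pointing at a blank node.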

Copilot · 2026-04-26T15:44:20Z

+    if (id != null) {
+      return model.createResource(baseUri + "entity/" + entityType + "/" + id);
+    }
+    return model.createResource(baseUri + "tag/" + tagFqn.replace(".", "/"));


The fallback URI for unresolved tags always uses baseUri + "tag/"... even when source is Glossary. That can mis-identify a glossary term as a tag, and can also collide with real classification-tag synthetic URIs. Make the fallback source-aware (e.g., baseUri + "glossaryTerm/"... for glossary terms, or at least a distinct synthetic namespace per source).

Suggested change

-    return model.createResource(baseUri + "tag/" + tagFqn.replace(".", "/"));
+    return model.createResource(baseUri + entityType + "/" + tagFqn.replace(".", "/"));
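
A source-aware version of the URI resolution, covering both the canonical and the fallback branch, could look roughly like this. `TagUriSketch` and `resolveUri` are hypothetical names, and the method returns plain strings rather than Jena resources; the URI shapes follow the diff hunk and the suggested change above.

```java
import java.util.UUID;

// Hypothetical sketch of source-aware tag URI construction; returns strings, not Jena resources.
final class TagUriSketch {
  private TagUriSketch() {}

  static String resolveUri(String baseUri, String source, String tagFqn, UUID id) {
    // Pick the entity type from the tag source so glossary terms never land under "tag/".
    String entityType = "Glossary".equalsIgnoreCase(source) ? "glossaryTerm" : "tag";
    if (id != null) {
      // Canonical form when the entity was resolved: entity/{type}/{uuid}
      return baseUri + "entity/" + entityType + "/" + id;
    }
    // Synthetic fallback, kept in a separate namespace per source.
    return baseUri + entityType + "/" + tagFqn.replace(".", "/");
  }
}
```

Note that replacing every dot with a slash, as the original line does, also splits FQN parts that contain quoted dots; the sketch keeps that behaviour unchanged.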

Copilot · 2026-04-26T15:44:20Z

+      Tag tag = Entity.getEntityByName(Entity.TAG, tagFqn, "", Include.NON_DELETED, false);
+      UUID id = tag != null ? tag.getId() : null;


Both lookups request all fields by passing an empty fields string, but the mapper only needs the entity id. This can materially increase DB load during RDF translation. Prefer requesting only id (or the smallest supported field set) in getEntityByName to reduce data fetch and serialization overhead; you can keep the caches as an additional optimization.

Copilot · 2026-04-26T15:44:21Z

      GlossaryTerm term =
-          Entity.getEntityByName(Entity.GLOSSARY_TERM, termFqn, "id", Include.NON_DELETED, false);
+          Entity.getEntityByName(Entity.GLOSSARY_TERM, termFqn, "", Include.NON_DELETED, false);
      UUID termId = term != null ? term.getId() : null;


Both lookups request all fields by passing an empty fields string, but the mapper only needs the entity id. This can materially increase DB load during RDF translation. Prefer requesting only id (or the smallest supported field set) in getEntityByName to reduce data fetch and serialization overhead; you can keep the caches as an additional optimization.
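
The caching mentioned above can be sketched as a small FQN-to-id cache around a pluggable loader. `IdLookupCache` and its `loader` function are hypothetical; in the real mapper the loader would wrap the `Entity.getEntityByName` call with the smallest supported field set.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache for illustration; the loader stands in for the repository lookup.
final class IdLookupCache {
  // Optional values let misses be cached too, since computeIfAbsent never stores null.
  private final ConcurrentHashMap<String, Optional<UUID>> cache = new ConcurrentHashMap<>();
  private final Function<String, UUID> loader;

  IdLookupCache(Function<String, UUID> loader) {
    this.loader = loader;
  }

  // Invokes the loader at most once per FQN; later calls are served from the cache.
  Optional<UUID> idFor(String fqn) {
    return cache.computeIfAbsent(fqn, key -> Optional.ofNullable(loader.apply(key)));
  }
}
```

One trade-off of caching misses this way: a tag created after a failed lookup stays unresolved until the cache entry is evicted or the cache is rebuilt.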

Copilot · 2026-04-26T15:44:21Z

+    tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag"));
+    tagResource.addProperty(model.createProperty(OM_NS, "tagFQN"), tagFqn);


addCertification always types the resolved resource as om:Tag, even if source is Glossary (in which case resolveTagResource can return an entity/glossaryTerm/{uuid} URI). This creates inconsistent RDF typing for glossary-backed certifications. Consider mirroring addTagLabel behavior: set types based on source (e.g., skos:Concept/glossaryTerm vs om:Tag) and optionally add om:tagSource for parity.

Suggested change

-    tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag"));
-    tagResource.addProperty(model.createProperty(OM_NS, "tagFQN"), tagFqn);
+    boolean isGlossarySource = "Glossary".equalsIgnoreCase(source);
+    tagResource.addProperty(
+        RDF.type, isGlossarySource ? SKOS.Concept : model.createResource(OM_NS + "Tag"));
+    tagResource.addProperty(model.createProperty(OM_NS, "tagFQN"), tagFqn);
+    if (source != null && !source.isEmpty()) {
+      tagResource.addProperty(model.createProperty(OM_NS, "tagSource"), source);
+    }
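
One way to keep addTagLabel and addCertification from drifting apart is to centralize the source-to-type decision in a single helper that both paths call. The sketch below is an assumption-laden illustration: `TagTypingSketch` is a hypothetical name, the `OM_NS` value is a placeholder rather than the project's real namespace, and the helper returns IRI strings instead of Jena resources.

```java
import java.util.List;

// Hypothetical helper; both tag-emitting paths would call it so their rdf:type output stays aligned.
final class TagTypingSketch {
  static final String SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept";
  // Placeholder namespace for illustration; not the real OM_NS value.
  static final String OM_NS = "https://example.org/om#";

  private TagTypingSketch() {}

  // Glossary-backed labels are typed as SKOS concepts; everything else as om:Tag.
  static List<String> typeIrisFor(String source) {
    return "Glossary".equalsIgnoreCase(source)
        ? List.of(SKOS_CONCEPT)
        : List.of(OM_NS + "Tag");
  }
}
```

A missing source falls through to the om:Tag branch, matching the "Classification" default used when the label has no source field.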

@Isolated

…rt IT

RdfPropertyMapperTest still referenced the removed addVotes helper and expected addStructuredProperty to dispatch votes; both are gone now that votes is in IGNORED_PROPERTIES. Update the assertions accordingly.

GlossaryOntologyExportIT timed out on the full suite because it flips a global RDF singleton in @BeforeAll and each test blocks a server thread on synchronous Fuseki writes. SAME_THREAD only serialized methods within the class; concurrent classes still raced for server threads. Adding @Isolated matches the pattern already used by RdfResourceIT for the same reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gitar-bot · 2026-04-26T17:33:26Z

Code Review 👍 Approved with suggestions 0 resolved / 1 findings

The RDF tag mapper logic ensures correct type assignment, but createTypeResource requires a fallback fix for blank nodes when type inputs are null or empty.

💡 Edge Case: createTypeResource returns blank node for null/empty type

📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/translator/RdfPropertyMapper.java:1125-1126

When getRdfType(entityType) returns null or empty, createTypeResource falls back to model.createResource() which creates an anonymous blank node. This blank node is then used as the object of an RDF.type triple (e.g., refResource.addProperty(RDF.type, createTypeResource(...))), producing a semantically meaningless type assertion. While unlikely to be hit in practice (since getRdfType returns om:<PascalCase> for unknown types), it would silently produce confusing RDF rather than logging a warning.

Suggested fix

if (curieOrUri == null || curieOrUri.isEmpty()) {
  LOG.warn("No RDF type mapping found for entity type '{}'; skipping rdf:type triple", entityType);
  return model.createResource(OM_NS + entityType);
}

sonarqubecloud · 2026-04-26T18:34:00Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Fix RDF tag mapper

f624265

Merge branch 'main' into rdf_fixes

59c8542

Merge remote-tracking branch 'origin/main' into rdf_fixes

4d74d16

Fix all the comments

ee36b19

Merge branch 'main' into rdf_fixes

cbb20d9
