Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the RDF translation layer to materialize tags/tiers/certifications as first-class RDF links (canonical entity URIs) rather than relying on synthetic FQN-based URIs or JSON literals, and adds an integration test to validate the resulting RDF graph behavior.
Changes:
- Resolve
TagLabeltargets to canonicalentity/tag/{uuid}(and glossary terms toentity/glossaryTerm/{uuid}) with caching and tier shortcuts. - Emit structured RDF triples for
certification(e.g.,om:hasCertification,om:certificationLevel, timestamps) instead of a JSON literal. - Add an RDF integration test covering classification tags, tier shortcut, and certification structured output.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/rdf/translator/RdfPropertyMapper.java | Canonical tag/glossary resolution, tier shortcut emission, and certification structured RDF handling. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/RdfTagsTierCertificationIT.java | New integration coverage asserting RDF links/types for tags, tier, and certification decomposition. |
| // Properties that should be mapped to structured RDF instead of JSON literals | ||
| private static final Set<String> STRUCTURED_PROPERTIES = | ||
| Set.of("votes", "lifeCycle", "customProperties", "extension"); | ||
| Set.of("lifeCycle", "customProperties", "extension", "certification"); | ||
|
|
||
| // Properties that should be omitted from RDF because they are audit/helper data. | ||
| private static final Set<String> IGNORED_PROPERTIES = Set.of("changeDescription"); | ||
| private static final Set<String> IGNORED_PROPERTIES = Set.of("changeDescription", "votes"); | ||
|
|
There was a problem hiding this comment.
votes is now in IGNORED_PROPERTIES and removed from STRUCTURED_PROPERTIES, which effectively drops the structured om:hasVotes/om:upVotes/om:downVotes RDF output. The service module already has tests and SQL→SPARQL nested mapping that expect votes to be present (e.g. RdfPropertyMapperTest and SqlMappingContext’s votes nested mapping). Either keep the existing votes structured mapping (and the addVotes behavior) or update the dependent tests/mappings as part of this PR to avoid a functional regression and test failures.
| if (isGlossary) { | ||
| tagResource.addProperty(RDF.type, model.createResource(getRdfType("glossaryTerm"))); | ||
| tagResource.addProperty(RDF.type, model.createResource(SKOS.getURI() + "Concept")); | ||
| resource.addProperty(model.createProperty(OM_NS, "hasGlossaryTerm"), tagResource); | ||
| } else { | ||
| tagResource.addProperty(RDF.type, model.createResource(getRdfType("tag"))); | ||
| tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag")); | ||
| if (tagFqn.startsWith(TIER_CLASSIFICATION_PREFIX)) { | ||
| resource.addProperty(model.createProperty(OM_NS, "hasTier"), tagResource); | ||
| } |
There was a problem hiding this comment.
addTagLabel adds rdf:type using model.createResource(getRdfType("tag"/"glossaryTerm")), but getRdfType returns CURIEs like skos:Concept. Jena does not expand prefixes in createResource, so this produces a type IRI literally equal to skos:Concept rather than http://www.w3.org/2004/02/skos/core#Concept. Consider either expanding the CURIE via the model’s prefix mapping (as JsonLdTranslator does) or just using the SKOS vocabulary constant for Concept to avoid emitting incorrect type IRIs.
🟡 Playwright Results — all passed (15 flaky)✅ 3958 passed · ❌ 0 failed · 🟡 15 flaky · ⏭️ 86 skipped
🟡 15 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
| String tagFqn = tagLabel.get("tagFQN").asText(); | ||
| String source = tagLabel.has("source") ? tagLabel.get("source").asText() : "Classification"; | ||
| Resource tagResource = resolveTagResource(tagFqn, source, tagLabel, model); | ||
| tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag")); |
There was a problem hiding this comment.
addCertification types the resolved certification tag only as om:Tag, while addTagLabel also adds the getRdfType("tag") type (skos:Concept). If the intent is consistent typing for tag resources emitted from different paths, consider adding the same skos:Concept type here (or centralizing tag typing) to avoid divergent RDF shapes for otherwise identical tag URIs.
| tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag")); | |
| tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag")); | |
| tagResource.addProperty(RDF.type, SKOS.Concept); |
| if (curieOrUri == null || curieOrUri.isEmpty()) { | ||
| return model.createResource(); |
There was a problem hiding this comment.
💡 Edge Case: createTypeResource returns blank node for null/empty type
When getRdfType(entityType) returns null or empty, createTypeResource falls back to model.createResource() which creates an anonymous blank node. This blank node is then used as the object of an RDF.type triple (e.g., refResource.addProperty(RDF.type, createTypeResource(...))), producing a semantically meaningless type assertion. While unlikely to be hit in practice (since getRdfType returns om:<PascalCase> for unknown types), it would silently produce confusing RDF rather than logging a warning.
Suggested fix:
if (curieOrUri == null || curieOrUri.isEmpty()) {
LOG.warn("No RDF type mapping found for entity type '{}'; skipping rdf:type triple", entityType);
return model.createResource(OM_NS + entityType);
}
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
| // Properties that should be mapped to structured RDF instead of JSON literals | ||
| private static final Set<String> STRUCTURED_PROPERTIES = | ||
| Set.of("votes", "lifeCycle", "customProperties", "extension"); | ||
| Set.of("lifeCycle", "customProperties", "extension", "certification"); |
There was a problem hiding this comment.
customProperties is included in STRUCTURED_PROPERTIES, but addStructuredProperty does not handle it. With the new early-dispatch in processContextMappings, this will now skip JSON-LD context handling and effectively drop customProperties from RDF output (only logging a warning). Fix by either (a) implementing a customProperties structured handler, or (b) removing customProperties from STRUCTURED_PROPERTIES, or (c) only continue in the caller when the structured handler actually handled the field.
| Set.of("lifeCycle", "customProperties", "extension", "certification"); | |
| Set.of("lifeCycle", "extension", "certification"); |
| case "votes" -> addVotes(value, entityResource, model); | ||
| case "lifeCycle" -> addLifeCycle(value, entityResource, model); | ||
| case "extension" -> addExtension(value, entityResource, model); | ||
| case "certification" -> addCertification(value, entityResource, model); |
There was a problem hiding this comment.
customProperties is included in STRUCTURED_PROPERTIES, but addStructuredProperty does not handle it. With the new early-dispatch in processContextMappings, this will now skip JSON-LD context handling and effectively drop customProperties from RDF output (only logging a warning). Fix by either (a) implementing a customProperties structured handler, or (b) removing customProperties from STRUCTURED_PROPERTIES, or (c) only continue in the caller when the structured handler actually handled the field.
| case "certification" -> addCertification(value, entityResource, model); | |
| case "certification" -> addCertification(value, entityResource, model); | |
| case "customProperties" -> addStructuredArrayProperty(fieldName, value, entityResource, model); |
| private Resource createTypeResource(String entityType, Model model) { | ||
| String curieOrUri = getRdfType(entityType); | ||
| if (curieOrUri == null || curieOrUri.isEmpty()) { | ||
| return model.createResource(); | ||
| } | ||
| if (curieOrUri.startsWith("http://") || curieOrUri.startsWith("https://")) { | ||
| return model.createResource(curieOrUri); | ||
| } |
There was a problem hiding this comment.
When getRdfType(entityType) is null/empty, createTypeResource returns a blank node, and callers then add it as rdf:type _:bN. That produces invalid/meaningless typing triples. Prefer returning null/Optional<Resource> and skipping the rdf:type triple when the type cannot be resolved, or returning a well-defined fallback URI if you have one (but a blank node rdf:type should be avoided).
| if (id != null) { | ||
| return model.createResource(baseUri + "entity/" + entityType + "/" + id); | ||
| } | ||
| return model.createResource(baseUri + "tag/" + tagFqn.replace(".", "/")); |
There was a problem hiding this comment.
The fallback URI for unresolved tags always uses baseUri + \"tag/\"... even when source is Glossary. That can mis-identify a glossary term as a tag, and can also collide with real classification-tag synthetic URIs. Make the fallback source-aware (e.g., baseUri + \"glossaryTerm/\"... for glossary terms, or at least a distinct synthetic namespace per source).
| return model.createResource(baseUri + "tag/" + tagFqn.replace(".", "/")); | |
| return model.createResource(baseUri + entityType + "/" + tagFqn.replace(".", "/")); |
| Tag tag = Entity.getEntityByName(Entity.TAG, tagFqn, "", Include.NON_DELETED, false); | ||
| UUID id = tag != null ? tag.getId() : null; |
There was a problem hiding this comment.
Both lookups request all fields by passing an empty fields string, but the mapper only needs the entity id. This can materially increase DB load during RDF translation. Prefer requesting only id (or the smallest supported field set) in getEntityByName to reduce data fetch and serialization overhead; you can keep the caches as an additional optimization.
| GlossaryTerm term = | ||
| Entity.getEntityByName(Entity.GLOSSARY_TERM, termFqn, "id", Include.NON_DELETED, false); | ||
| Entity.getEntityByName(Entity.GLOSSARY_TERM, termFqn, "", Include.NON_DELETED, false); | ||
| UUID termId = term != null ? term.getId() : null; |
There was a problem hiding this comment.
Both lookups request all fields by passing an empty fields string, but the mapper only needs the entity id. This can materially increase DB load during RDF translation. Prefer requesting only id (or the smallest supported field set) in getEntityByName to reduce data fetch and serialization overhead; you can keep the caches as an additional optimization.
| tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag")); | ||
| tagResource.addProperty(model.createProperty(OM_NS, "tagFQN"), tagFqn); |
There was a problem hiding this comment.
addCertification always types the resolved resource as om:Tag, even if source is Glossary (in which case resolveTagResource can return an entity/glossaryTerm/{uuid} URI). This creates inconsistent RDF typing for glossary-backed certifications. Consider mirroring addTagLabel behavior: set types based on source (e.g., skos:Concept/glossaryTerm vs om:Tag) and optionally add om:tagSource for parity.
| tagResource.addProperty(RDF.type, model.createResource(OM_NS + "Tag")); | |
| tagResource.addProperty(model.createProperty(OM_NS, "tagFQN"), tagFqn); | |
| boolean isGlossarySource = "Glossary".equalsIgnoreCase(source); | |
| tagResource.addProperty( | |
| RDF.type, isGlossarySource ? SKOS.Concept : model.createResource(OM_NS + "Tag")); | |
| tagResource.addProperty(model.createProperty(OM_NS, "tagFQN"), tagFqn); | |
| if (source != null && !source.isEmpty()) { | |
| tagResource.addProperty(model.createProperty(OM_NS, "tagSource"), source); | |
| } |
…rt IT RdfPropertyMapperTest still referenced the removed addVotes helper and expected addStructuredProperty to dispatch votes — both gone after votes was added to IGNORED_PROPERTIES. Update the assertions accordingly. GlossaryOntologyExportIT timed out on the full suite because it flips a global RDF singleton in @BeforeAll and each test blocks a server thread on synchronous Fuseki writes. SAME_THREAD only serialized methods within the class — concurrent classes still raced for server threads. Adding @isolated matches the pattern already used by RdfResourceIT for the same reason. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review 👍 Approved with suggestions 0 resolved / 1 findingsThe RDF tag mapper logic ensures correct type assignment, but createTypeResource requires a fallback fix for blank nodes when type inputs are null or empty. 💡 Edge Case: createTypeResource returns blank node for null/empty typeWhen Suggested fix🤖 Prompt for agentsOptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|



Describe your changes:
Fixes
I worked on ... because ...
Type of change:
Checklist:
Fixes <issue-number>: <short explanation>Summary by Gitar
hasVotessupport from RDF mapping, updatingRdfPropertyMapperTestto assert that votes are ignored.RdfPropertyMapperTestdisplay names and cleanup to reflect the removal of vote helper branches.@Isolatedannotation toGlossaryOntologyExportITto prevent CI timeouts caused by global RDF singleton state and resource contention.This will update automatically on new commits.