- Purpose of This Style Guide
- OWL Version
- Serialization
- Logical Consistency
- Namespaces
- gist Stylistic Conventions
- Ontology Best Practices
The purpose of this document is twofold:
- Define and implement conventions in order to standardize gist and make it as clear, coherent, and precise as possible. Although gist is the product of many minds, years, and experiences, the goal is to give it the feel of having a single author writing with a uniform style.
- Articulate what we consider best or "better" practices for ontology design and implementation.
gist is an OWL 2 DL ontology. This means that it cannot import non-OWL ontologies or use terms from RDF ontologies (i.e., classes and properties defined only as rdf:Property and rdfs:Class, as in Dublin Core) while remaining OWL 2 DL-compliant.
- gist OWL files are serialized in RDF Turtle.
- The EDM Council's RDF serialization tool, `rdf-toolkit.jar`, must be run before every commit in order to standardize formatting and eliminate noise in git diffs.
  - When you set up your local repository, you will run `tools/setup.cmd` which, among other things, installs a pre-commit hook that runs the serializer in `tools/serializer`. See Contributing for further instructions on setting up your repository.
  - To ensure consistent output, only the version of the `rdf-toolkit.jar` file found in `tools/serializer` should be used.
Every version of gist committed to the git repository must be logically consistent. See Contributing.
gist defines three namespaces:
- Ontology namespace: `gist: <https://w3id.org/semanticarts/ns/ontology/gist/>`
- Taxonomy namespace: `gistx: <https://w3id.org/semanticarts/ns/taxonomy/gist/>`
- Instance data namespace: `gistd: <https://w3id.org/semanticarts/ns/data/gist/>`
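As a minimal sketch of how the three prefixes map to full IRIs, the following Python snippet expands a prefixed name. The helper name `expand_curie` is hypothetical and not part of gist or its tooling; only the namespace IRIs come from the list above.

```python
# Prefix-to-namespace map copied from the three namespaces listed above.
GIST_NAMESPACES = {
    "gist": "https://w3id.org/semanticarts/ns/ontology/gist/",
    "gistx": "https://w3id.org/semanticarts/ns/taxonomy/gist/",
    "gistd": "https://w3id.org/semanticarts/ns/data/gist/",
}


def expand_curie(curie: str) -> str:
    """Expand a prefixed name such as 'gist:isMemberOf' into a full IRI.

    Hypothetical helper for illustration only.
    """
    prefix, separator, local_name = curie.partition(":")
    if not separator or prefix not in GIST_NAMESPACES:
        raise ValueError(f"Unknown or missing prefix in {curie!r}")
    return GIST_NAMESPACES[prefix] + local_name


print(expand_curie("gist:isMemberOf"))
print(expand_curie("gistd:_Person_Sir_Tim_Berners_Lee"))
```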
- Camelcase
  - Classes initial uppercase
  - Properties initial lowercase
- Alphanumeric characters only.
  - Example: `Isbn10`, not `Isbn-10` or `ISBN-10`.
- Acronyms are also camelcased so that word boundaries are unambiguous.
  - Examples: `AmaGuideline`, not `AMAGuideline`; `UriScheme`, not `URIScheme`
  - `ID` is an exception, because Merriam-Webster spells it in all-caps.
- No non-standard abbreviations. E.g., `hasUoM` should be `hasUnitOfMeasure`.
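The orthographic rules above lend themselves to a mechanical check. The following is a rough sketch, not the project's actual validation: the function name `check_local_name` is hypothetical, and the acronym test is a heuristic that flags any run of three or more uppercase letters after setting aside the `ID` exception. The "no non-standard abbreviations" rule requires human judgment and is not checked.

```python
import re

ALPHANUMERIC = re.compile(r"^[A-Za-z0-9]+$")
UPPERCASE_RUN = re.compile(r"[A-Z]{3,}")


def check_local_name(name: str, kind: str) -> list[str]:
    """Return a list of problems found in a class or property local name.

    `kind` is either "class" or "property". Hypothetical helper.
    """
    problems = []
    if not ALPHANUMERIC.match(name):
        problems.append("use alphanumeric characters only")
    if kind == "class" and not name[:1].isupper():
        problems.append("class names start with an uppercase letter")
    if kind == "property" and not name[:1].islower():
        problems.append("property names start with a lowercase letter")
    # Heuristic: an all-caps acronym followed by a capitalized word yields a
    # run of three or more uppercase letters. "ID" is the stated exception,
    # so it is rewritten before the check.
    if UPPERCASE_RUN.search(name.replace("ID", "Id")):
        problems.append("camelcase acronyms (e.g., UriScheme, not URIScheme)")
    return problems


for candidate, kind in [
    ("Isbn10", "class"),
    ("ISBN-10", "class"),
    ("AMAGuideline", "class"),
    ("hasUnitOfMeasure", "property"),
]:
    print(candidate, check_local_name(candidate, kind))
```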
These standards involve wording choices, which are often more difficult to define and reach consensus on than simple orthographic conventions. The goal of defining standards is to improve the ontology along the following metrics:
- Consistency: The ontology could have been written by a single person.
- Objectivity: Two ontologists following these standards should agree on the name for a new property in most cases.
- Clarity
- Explicitness
- Idiomaticity: Follows English natural language insofar as possible. This includes "reading well", as in `Mary isConnectedTo John` rather than `Mary connectedTo John`.
- Accuracy: Expresses intended meaning.
- Alignment with textual definitions. In some cases this requires a re-analysis of the intended meaning, and then perhaps a change in definition rather than local name. However, within the current scope of work, the local name was changed to match the definition, and the re-analysis will be done at a later time.
Some of the examples resulted in changes in gist 10.0.0; others are hypothetical.
| Standard | Examples |
|---|---|
| Datatype properties nominal | baseConversionFactor, not convertToBase |
| Object properties verb-initial | isAbout, not about<br>comesFromAgent, not fromAgent<br>isMemberOf, not memberOf<br>precedesDirectly, not directlyPrecedes<br>usesTimeZoneStandard, not timeZoneStandardUsed |
| Prefix "is" to "-ed" forms, both past participles and adjectives | isGovernedBy, not governedBy<br>isCharacterizedBy, not characterizedBy |
| Prefer an ordinary verb to "hasX" or "isX" | precedes, not isFollowedBy |
| "At" rather than "on" for datetimes | isRecordedAt, not isRecordedOn |
| Present tense only with minimal exceptions when the meaning is inherently in the past | isRenderedOn, not wasRenderedOn, but wasLastModifiedBy rather than isLastModifiedBy<br>precedes, not preceded<br>but wasLastModifiedAt, not isLastModifiedAt |
| General idiomaticity | hasRecipient, not hasGetter |
| No non-standard abbreviations | hasUnitOfMeasure, not hasUoM |
| Final prepositions where appropriate | hasJurisdictionOver, not hasJurisdiction |
| Alignment with textual definition | hasBiologicalParent, not hasParent, where the skos:definition precludes non-biological relationships |
| Loose coupling of ontology term names | hasStart, not hasStartTimeInstant |
| Unambiguously indicate directionality | hasBroader, not broader (as in SKOS) |
| Direction: go up rather than down a tree if a hierarchy exists | hasParent, not hasChild<br>hasSuperCategory, not hasSubCategory |
| Word boundaries consistent across ontology rather than following natural language (exception to idiomaticity) | hasSubTask, hasSubCategory, hasSuperCategory, although "subtask" and "subcategory" are words<br>hasBirthDate and hasDeathDate, although "birthdate" is a word |
These conventions apply to both data and taxonomy terms.
- A leading underscore.
- An infix indicating the type of the instance. As a rule of thumb, this is the most specific rigid class the instance belongs to, but there are exceptions where this is not viable (see below).
- A single underscore.
- The name of the instance, with spaces and hyphens replaced by underscores (no camelcasing) and only alphanumeric characters and underscores allowed.
  - Leave case as it is.
A rigid class is one that the instance inherently belongs to; it is part of the essence of the object, which would not be the same object if it did not belong to this class. A non-rigid class may be temporary and/or express a role or relationship; for example, Patient, Employee, Spouse. The notion of rigid classes originates in OntoClean.
The most specific rigid class is the rigid class that the instance most directly belongs to.
For example, given the class hierarchy Living Thing > Person > Professor, where the first two classes are rigid and the third is not, the name for Sir Tim Berners-Lee is _Person_Sir_Tim_Berners_Lee.
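The naming pattern described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than gist tooling: the function name `mint_instance_local_name` is hypothetical, "alphanumeric" is taken to mean ASCII letters and digits, and characters outside that set are simply dropped. Choosing the most specific rigid class is left to the caller.

```python
import re


def mint_instance_local_name(class_name: str, instance_name: str) -> str:
    """Build a local name of the form _<Class>_<Instance_Name>.

    Hypothetical helper; the caller supplies the class used as the infix.
    """
    # Replace runs of spaces and hyphens with a single underscore.
    name = re.sub(r"[\s\-]+", "_", instance_name.strip())
    # Assumption: any other non-alphanumeric character is dropped.
    name = re.sub(r"[^A-Za-z0-9_]", "", name)
    # Case is left as it is; no camelcasing is applied.
    return f"_{class_name}_{name}"


print(mint_instance_local_name("Person", "Sir Tim Berners-Lee"))
# prints _Person_Sir_Tim_Berners_Lee
```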
Exceptions to this guideline may arise in IRI minting during data mapping. Based on how the data is presented, it is often difficult or cumbersome to know the most specific type of an instance, so one can fall back on a higher-level class. E.g., when processing a table of organizations, it may not be possible to know which are governmental organizations, which corporations, and which non-profits, so the infix _Organization_ can be used throughout.
The following conventions apply to skos:prefLabel but not skos:altLabel, which by nature may be idiosyncratic.
- Title case (see the definition of title case below)
- Normalized to natural language standards. E.g., hyphens inserted, acronyms in all caps, etc.
- Examples: AMA Guideline, ISBN-10
- Lower case
- Normalized to natural language standards. E.g., hyphens inserted, acronyms in all caps, proper nouns capitalized, etc.
- Examples: has unit of measure, has SSN, Unicode symbol, W2
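As a starting point for a lowercase label, a camelcased local name can be split at its word boundaries. The sketch below is only a drafting aid under assumptions: the function name `draft_property_label` and the `acronyms` parameter are hypothetical, and the result still needs manual normalization to natural language (hyphens, proper nouns, spelled-out abbreviations).

```python
import re

# Matches a capitalized or lowercase word, an all-caps run, or a digit run.
WORD = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")


def draft_property_label(local_name: str, acronyms=frozenset()) -> str:
    """Split a camelcased local name into a lowercase draft label.

    Words listed in `acronyms` (given in all caps) are restored to all caps.
    Hypothetical helper for illustration.
    """
    words = []
    for word in WORD.findall(local_name):
        words.append(word.upper() if word.upper() in acronyms else word.lower())
    return " ".join(words)


print(draft_property_label("hasUnitOfMeasure"))
print(draft_property_label("hasSsn", acronyms={"SSN"}))
```

The first call yields "has unit of measure" and the second "has SSN".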
There may occasionally be valid reasons to deviate from the conventions stated here:
- Deviation from wording of the local name. For example, the predicate `gist:isGeoContainedIn` uses a shortened form of "geographically" for conciseness. The `skos:prefLabel` uses the fully spelled out word: "is geographically contained in."
The general label conventions have been captured in SHACL shapes which are run during the ontology build and release process and the repository continuous integration script. These shapes do not allow for special cases like capitalized proper names. To prevent validation failures, add the annotation gist:nonConformingLabel true to the term in the gistValidationAnnotations ontology so that label validation will be skipped.
The rules of title case are not universally standardized; standardization is only at the level of house styles and individual style guides. Most English style guides agree that the first and last words should always be capitalized, while articles, short prepositions, and some conjunctions should not be. Other rules about capitalization vary.
This style guide defines the rules for title case as follows:
- Capitalize:
  - First and last words
  - Words of four or more letters (e.g., Between, With, This)
  - Second part of hyphenated word (e.g., Data-Centric, not Data-centric)
- Lowercase:
  - Articles: a, an, the
  - Conjunctions: and, but, if, for, or, nor, so, yet
  - Prepositions: as, at, by, cum, ere, for, in, of, off, on, out, per, pre, pro, qua, re, sub, to, up, via
- Acronyms in all caps (e.g., SSN, ISBN)
- Capitalize everything else
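The title case rules above are mechanical enough to sketch in Python. This is one possible reading, not the project's SHACL validation: the names `title_case` and `acronyms` are hypothetical, every part of a hyphenated word is capitalized, and any word already written in all caps is assumed to be an acronym.

```python
# Articles, conjunctions, and prepositions from the lists above.
LOWERCASE_WORDS = {
    "a", "an", "the",
    "and", "but", "if", "for", "or", "nor", "so", "yet",
    "as", "at", "by", "cum", "ere", "in", "of", "off", "on", "out",
    "per", "pre", "pro", "qua", "re", "sub", "to", "up", "via",
}


def _format_part(part: str, force_capital: bool, acronyms) -> str:
    # Acronyms stay in all caps. Assumption: all-caps input is an acronym.
    if part.upper() in acronyms or (len(part) > 1 and part.isupper()):
        return part.upper()
    if not force_capital and part.lower() in LOWERCASE_WORDS:
        return part.lower()
    # Capitalize everything else, leaving the remaining letters untouched.
    return part[:1].upper() + part[1:]


def title_case(label: str, acronyms=frozenset()) -> str:
    """Apply the title case rules to a whitespace-separated label."""
    words = label.split()
    last = len(words) - 1
    result = []
    for index, word in enumerate(words):
        parts = word.split("-")
        # First and last words, and parts of hyphenated words, are capitalized.
        force = index in (0, last) or len(parts) > 1
        result.append("-".join(_format_part(p, force, acronyms) for p in parts))
    return " ".join(result)


print(title_case("ama guideline", acronyms={"AMA"}))
print(title_case("relationship between a person and an organization"))
```

With these inputs the sketch produces "AMA Guideline" and "Relationship Between a Person and an Organization".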
gist uses SKOS annotations rather than rdfs:label and rdfs:comment. The accepted annotations, intended use, and previous usage are shown in the following tables. Refer to the SKOS ontology for formal definitions. SKOS annotations allow a more fine-grained approach to human-readable documentation. This change also aligns with emerging common practice.
Required Annotations for Classes, Properties, and Taxonomic Terms
| Annotation | Use | Instead Of |
|---|---|---|
| `skos:prefLabel` | Preferred label | rdfs:label |
| `skos:definition` | Definition | rdfs:comment |
Highly recommended
| Annotation | Use | Instead Of |
|---|---|---|
| `skos:scopeNote` | Additional clarifying comments about the meaning or usage of a term | rdfs:comment |
| `skos:example` | One or more examples | rdfs:comment |
These annotations help the user understand the use and meaning of the term, and prevent definitions from becoming lengthy and unstructured. skos:definition is expected to provide a definition, not lengthy usage notes or examples. These should instead be included in a skos:scopeNote or skos:example, respectively.
Negative examples should be prefaced by the text "Negative example:" or "Negative examples:". For example, the definition of gist:LivingThing includes skos:example "Negative examples: fictional life forms such as unicorns or Mickey Mouse."
Occasionally a definition can hardly be understood at all without an example or two, in which case they can be included in the skos:definition. For example, the term ResearchProduct might be defined as "An output of a research project, such as a document or spreadsheet."
Use where relevant
| Annotation | Use | Instead Of |
|---|---|---|
| `skos:altLabel` | Alternative label, where relevant | n/a |
| `skos:editorialNote` | Notes for editors | rdfs:comment |
RDFS annotations
Certain RDFS annotations are recommended where there is no SKOS equivalent.
| Annotation | Use |
|---|---|
| `rdfs:seeAlso` | Indicates a resource that may provide additional information about the subject. Should be a link to a web page or RDF resource rather than text. See examples of its use in gist to get an idea of where it would be helpful. |
| `rdfs:isDefinedBy` | Identifies the ontology module the term is defined in. Added automatically during gist release bundling and does not need to be added by hand. |
Use only rarely
| Annotation | Comment |
|---|---|
| `skos:changeNote`, `skos:historyNote` | Normally these are obtained from the version control repository or version comparison. There is no further discussion of these annotations in this document. |
| `skos:note` | Use a more specific annotation whenever possible. |
Do not use
| Annotation | Instead Use |
|---|---|
| `rdfs:label` | `skos:prefLabel` |
| `rdfs:comment` | All other annotations, especially skos:scopeNote and skos:example |
| Annotation | Format |
|---|---|
| `skos:prefLabel`, `skos:altLabel` | See section Labels above. |
| `skos:definition`, `skos:scopeNote`, `skos:note`, `skos:editorialNote` | Full sentence(s) ending in period. It is acceptable to omit the subject at the beginning of the definition in order to avoid the vacuous "This predicate..." or "This class is..." E.g., "Relates a person to his or her spouse." or "A series of steps in a workflow." There should nevertheless be a final period. Use Oxford commas. |
| `skos:example` | May be either a full sentence or a list. Use a final period only in the former case. E.g., "SSN, driver's license number, employee ID" or "NIH sponsors a research project." Lists with short items, such as the first example, can be delimited by either commas (include Oxford commas) or semi-colons; full-sentence examples should be semi-colon-delimited. |
| Annotation | Cardinality |
|---|---|
| `skos:prefLabel` | Exactly 1 |
| `skos:definition` | Exactly 1 |
| `skos:scopeNote`, `skos:editorialNote`, `skos:note` | At the implementer's discretion, multiple unrelated notes can be included in either a single annotation or multiple annotations. |
| `skos:example` | Recommended practice is to combine all examples into a single annotation, especially if there is a list of short items. |
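The cardinality and format rules in the tables above could be checked along the following lines. This is a simplified sketch under assumptions, not the SHACL shapes used in the gist build: the function name `check_annotations` is hypothetical, and a term's annotations are represented as a plain dictionary from property name to a list of string values.

```python
REQUIRED_EXACTLY_ONE = ("skos:prefLabel", "skos:definition")
FULL_SENTENCE = (
    "skos:definition", "skos:scopeNote", "skos:note", "skos:editorialNote",
)
DO_NOT_USE = ("rdfs:label", "rdfs:comment")


def check_annotations(annotations: dict[str, list[str]]) -> list[str]:
    """Return a list of problems with one term's annotations."""
    problems = []
    # Exactly one preferred label and exactly one definition.
    for prop in REQUIRED_EXACTLY_ONE:
        count = len(annotations.get(prop, []))
        if count != 1:
            problems.append(f"{prop}: expected exactly 1 value, found {count}")
    # Definitions and notes are full sentences ending in a period.
    for prop in FULL_SENTENCE:
        for text in annotations.get(prop, []):
            if not text.rstrip().endswith("."):
                problems.append(f"{prop}: should end with a period")
    # RDFS labels and comments are replaced by SKOS annotations.
    for prop in DO_NOT_USE:
        if prop in annotations:
            problems.append(f"{prop}: use a SKOS annotation instead")
    return problems


term = {
    "skos:prefLabel": ["Living Thing"],
    "skos:definition": ["A series of steps in a workflow."],
    "skos:example": ["SSN, driver's license number, employee ID"],
}
print(check_annotations(term))  # no problems expected for this term
```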
In general it is preferred to use natural language rather than ontology terms in annotations. For example, the definition of gist:GovernedGeoRegion reads "A defined geographical area (or areas) governed by exactly one country government." rather than using the ontology class names GeoRegion and CountryGovernment.
The exception is when a note needs to make specific reference to an ontology term rather than to a concept. For example, the scope note on gist:birthDate reads "This is a subproperty of gist:startDateTime rather than gist:actualStartDate because some living things have yet to be born."
Caution: gist is not yet fully aligned with this best practice, which is aspirational.
- Literal values should be typed with one of the datatypes included in the OWL 2 Datatype Maps. It is not necessary to explicitly type strings as `xsd:string` because the serializer will add this to all untyped literals.
Documentation is generally written in Markdown, and a Markdown linter should be applied to flag and fix Markdown rule violations. The Markdown config file ../.markdownlint.json configures the Markdown linter for use in the gist repository. If using VS Code as an editor, markdownlint is a helpful extension that provides code hints and can be configured to automatically correct errors based on the rule configuration.
All inverses were removed from gist as of version 12.0.0, with minor modifications in version 13.0.0. We consider it a best practice not to define inverses, for several reasons:
- Reduce cognitive load for developers and implementers of the ontology.
- Promote uniformity in the graph.
- Eliminate the need for duplicate query paths in queries.
- Reduce memory load during inferencing.
- Simplify validation and writing and maintaining SHACL shapes.
In selecting which of a potential pair of inverses to define, we apply the child-to-parent or cardinality principle: select the direction which will generally produce the fewest query results. Examples:
| Child to Parent | Parent to Child |
|---|---|
| `isMemberOf` | `hasMember` |
| `isPartOf` | `hasPart` |
| `hasBiologicalParent` | `hasBiologicalChild` |
| `hasSuperCategory` | `hasSubCategory` |
This principle determines most but not all cases. Where it does not apply (e.g., precedes vs. follows), an arbitrary decision is made.
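To illustrate why a single direction is sufficient, the sketch below stores only child-to-parent `isMemberOf` statements and answers the parent-to-child question by matching the same predicate from the object side. The triples and instance names are made up for illustration, and the helper names `objects_of` and `subjects_of` are hypothetical.

```python
# Made-up instance data using only the child-to-parent direction.
TRIPLES = [
    ("_Person_Alice", "isMemberOf", "_Organization_Chess_Club"),
    ("_Person_Bob", "isMemberOf", "_Organization_Chess_Club"),
    ("_Person_Alice", "isMemberOf", "_Organization_Book_Club"),
]


def objects_of(subject: str, predicate: str) -> list[str]:
    """Follow the predicate forward: which groups is this person in?"""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects_of(predicate: str, obj: str) -> list[str]:
    """Follow the same predicate backward: who are the group's members?

    No hasMember statements are needed to answer this.
    """
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


print(objects_of("_Person_Alice", "isMemberOf"))
print(subjects_of("isMemberOf", "_Organization_Chess_Club"))
```

Because every membership is stated exactly once, a query author never has to union `isMemberOf` and `hasMember` paths.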
We have defined an in-depth set of best practices governing the use of OWL restrictions (forthcoming). A summary without detailed rationale is provided here.
1. Do not use equivalence to an `owl:allValuesFrom` restriction or an exact or maximum cardinality restriction if you want to be able to infer instances into the defined class. Because these restrictions describe what does not exist, in combination with the Open World Assumption they prevent inference into the defined class.
1. Use equivalence to an `owl:allValuesFrom` restriction or an exact or maximum cardinality restriction if you want to be able to infer instances into the complement of the defined class.
1. Use equivalence to an `owl:allValuesFrom` restriction to infer into the object of `owl:allValuesFrom`. For example, if a `:ProductId` class is the intersection of `gist:ID` with the restriction class that identifies only instances of `:Product`, then if something is identified by a product ID, we can infer that it is a product.
1. Choose between 1, 2, and 3 according to the type of inference that you care about.
1. Subclassing to an `owl:allValuesFrom` restriction or an exact or maximum cardinality restriction will provide additional information about the defined class without inferring into it.
1. Use equivalence with minimum cardinality, `owl:someValuesFrom`, `owl:hasValue` restrictions to infer an instance into the class.
1. Use `owl:someValuesFrom` rather than minimum cardinality 1 restrictions.
1. Use minimum cardinality with values greater than 1.
1. Do not use minimum cardinality 0 restrictions. Use annotations to provide usage hints instead.
1. Be sure that your restrictions express meaning rather than data integrity constraints. Consider the question "If an instance of X did not conform to Y, would it still be an X?"
1. Express data constraints on particular data sets with SHACL.
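The difference between the kinds of inference described above can be illustrated with a toy forward-chaining loop. This is not an OWL reasoner and makes no claim about how gist is processed; it hand-codes two rules. The names `:identifies`, `:isMemberOf`, `:Organization`, `:OrganizationMember`, and all instance names are hypothetical, and `:OrganizationMember` is assumed to be defined as equivalent to an `owl:someValuesFrom` restriction on `:isMemberOf`.

```python
RDF_TYPE = "rdf:type"


def infer_types(triples):
    """Apply two hand-written rules until no new facts appear."""
    facts = set(triples)
    while True:
        new_facts = set()
        for s, p, o in facts:
            # allValuesFrom: a :ProductId identifies only products, so the
            # OBJECT of :identifies is inferred to be a :Product.
            if p == ":identifies" and (s, RDF_TYPE, ":ProductId") in facts:
                new_facts.add((o, RDF_TYPE, ":Product"))
            # someValuesFrom: anything that is a member of some
            # :Organization is inferred INTO :OrganizationMember.
            if p == ":isMemberOf" and (o, RDF_TYPE, ":Organization") in facts:
                new_facts.add((s, RDF_TYPE, ":OrganizationMember"))
        # Deliberately no rule inferring membership in :ProductId: under the
        # Open World Assumption, unseen :identifies statements may exist.
        if new_facts <= facts:
            return facts
        facts |= new_facts


data = [
    ("_ProductId_123", RDF_TYPE, ":ProductId"),
    ("_ProductId_123", ":identifies", "_Thing_X"),
    ("_Person_Alice", ":isMemberOf", "_Organization_Chess_Club"),
    ("_Organization_Chess_Club", RDF_TYPE, ":Organization"),
]
result = infer_types(data)
print(("_Thing_X", RDF_TYPE, ":Product") in result)
print(("_Person_Alice", RDF_TYPE, ":OrganizationMember") in result)
```

Both checks print `True`, while nothing in the data is ever inferred to be a `:ProductId`.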