Terms and Definitions
1. Introduction
The CWRC Ontology is the ontology of the Canadian Writing Research Collaboratory. This human-readable version accompanies the introduction to the ontology (see Introduction); the ontology itself remains the primary source for understanding how it works.
The intended audience of this specification is scholars and practitioners who wish to understand how the ontology works.
This is a living ontology that makes no claims to completeness. Its instances have been derived from particular datasets and are being expanded progressively over time. Continuity is maintained through OWL ontology annotations for compatibility between versions and for deprecated classes and properties: deprecated ontology terms remain present but are marked as such.
We welcome suggestions for new classes, properties, and predicates from those wishing to use the ontology for their own datasets, as well as suggestions related to the complexity of vocabularies associated with existing terms. Please submit suggestions via an issue or a pull request to the CWRC Ontology code repository.
This document includes an overview of the ontology, as well as the ontology specification itself.
2. CWRC Ontological Structures
Contextual, Simple and Inverse Predicates
Source data from CWRC spans multiple types of data, ranging from granular material such as bibliographic metadata to discursive, interpretive, or analytical content describing human lives, cultural processes, and historical events. The CWRC linked open data model described by this ontology represents such information as a series of assertions, frequently associated with particular contexts and linked, where available, to the prose source of the assertions.
Conjoining assertions with their context and provenance, so as to situate the knowledge being represented, is crucial to modelling research data effectively. Yet such ontological modelling results in complex structures that make the queries required to retrieve basic information unwieldy. To balance these needs, the CWRC ontology creates assertions through a design that tracks provenance, certainty, and other complex components of the data within the structure of the data itself. At the same time, it provides parallel properties that allow the data to be queried more easily as simple subject-predicate-object assertions, or triples, that stand on their own. This design relies upon and extends the W3C Web Annotation Data Model, which supports various modes of annotating web sources, including identifying or describing entities named in a text and connecting such entities to the web source being annotated. In this way, the CWRC ontology structure encourages situated knowledge production, while the parallel predicates support granular subject-predicate-object views of the data.
The CWRC Ontology thus provides three parallel types of predicates for properties or relationships:
- Contextual predicates (which begin with c_)
- Simple predicates (which usually begin with has)
- Inverse predicates (which begin with i_* or end with Of or By)
Contextual predicates
The underlying Web Annotation structure of contextual predicates produces interconnected sets of triples that situate the properties associated with entities through Contexts. Annotations are typed according to classes such as EducationContext or IntertextualityContext, denoting the class of experience or phenomenon to which the assertions belong. The Web Annotation structure allows assertions to have properties that reflect their context and provenance, providing qualifying information such as certainty or relevant excerpts from the source of the data. Contexts allow users to evaluate the accuracy of the assertions extracted from the source XML, and they add nuance to information about human subjects and cultural phenomena that the data structures alone do not represent. This structure adds complexity; hence the need for simple predicates that parallel contextual ones. The ontology uses the cwrc:contextFocus property to indicate the entity or entities that become the subjects of the properties when they are translated into simple triples.
Contexts are typed by major semantic categories including, for biographical material, Cultural Forms, Birth, Death, Education, Occupation, and Politics, and, for literary content, Production, Reception, and Textual Features.
The major context classes are as follows:
BiographyContext, BirthContext, CulturalFormContext, DeathContext, EconomicContext, EducationContext, FamilyContext, FriendsAndAssociatesContext, HealthContext, IntimateRelationshipContext, LeisureContext, NameContext, OccupationContext, SpatialContext, and ViolenceContext.
Simple predicates
Each contextual property or relationship has a parallel, subject-centric predicate that can be used to create more straightforward triples. Simple predicates make assertions using a subject, a predicate, and an object without additional context. Those who want to view the data through relationships that link entities to one another directly, rather than mediating those relationships through Web Annotation structures, can access a representation of the data in the form of simple triples as a separate dataset. Such triples can also be derived from the full dataset of contextual triples by means of a batch script or with SPARQL CONSTRUCT queries. (See 7. Properties for more information.)
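The derivation of simple triples from contextual ones can be sketched in plain Python, with triples as tuples. This is a minimal sketch rather than CWRC's batch script: the `ex:` identifiers, the predicate names `cwrc:c_causeOfDeath` and `cwrc:hasCauseOfDeath`, and the one-to-one mapping between them are hypothetical. It mirrors what a SPARQL CONSTRUCT query would do: for each context, the objects of its contextual predicates are re-attached to the entity named by its contextFocus.

```python
# Triples are (subject, predicate, object) tuples of prefixed names.
# All identifiers and predicate names below are hypothetical placeholders.
CONTEXTUAL_TRIPLES = [
    ("ex:context1", "rdf:type", "cwrc:DeathContext"),
    ("ex:context1", "cwrc:contextFocus", "ex:LEL"),
    ("ex:context1", "cwrc:c_causeOfDeath", "ex:murder"),
    ("ex:context1", "oa:hasTarget", "ex:target1"),
]

# Hypothetical mapping from contextual predicates to their simple parallels.
CONTEXTUAL_TO_SIMPLE = {"cwrc:c_causeOfDeath": "cwrc:hasCauseOfDeath"}


def derive_simple_triples(triples, mapping):
    """Re-root contextual assertions on each context's contextFocus."""
    foci = {}
    for s, p, o in triples:
        if p == "cwrc:contextFocus":
            foci.setdefault(s, []).append(o)
    derived = set()
    for s, p, o in triples:
        if p in mapping:
            # A context may have several foci; each becomes a subject.
            for focus in foci.get(s, []):
                derived.add((focus, mapping[p], o))
    return sorted(derived)


print(derive_simple_triples(CONTEXTUAL_TRIPLES, CONTEXTUAL_TO_SIMPLE))
# [('ex:LEL', 'cwrc:hasCauseOfDeath', 'ex:murder')]
```

Provenance triples such as the `oa:hasTarget` link are left behind, which is precisely the context that simple triples cannot carry.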
Inverse predicates
Inverse predicates constitute a third set of parallel properties, used for the non-symmetric simple predicates associated with WritingContexts and their subclasses. For ease of reference, inverse predicates reuse the same property terms: prefixed with “i_”, those properties go in the opposite direction from the predicates of the same name, and the prefix makes inverse property labels consistent and recognizable. Inverse predicates are used for Writing properties only; Biography properties have distinct but usually similar labels for the inverses of simple triples, typically differing by a preposition (such as “of” or “by”). Inverse properties are not created when the range of the main property is a string, since a string cannot serve as the subject of a triple. For example, i_hasCharacter is the inverse of has character, defined as “Indicates a person depicted as a character in a creative work”.
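The inverse rule can be sketched as follows, assuming triples are plain tuples and string values are marked with a hypothetical `Literal` wrapper. The `cwrc:` prefix on `hasCharacter` is assumed, and `ex:hasNote` is an invented string-valued Writing predicate used only to show that such triples get no inverse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks a string value, as opposed to an entity identifier."""
    value: str


def inverse_name(predicate: str) -> str:
    """Add the i_ prefix to the local name: cwrc:hasCharacter -> cwrc:i_hasCharacter."""
    prefix, sep, local = predicate.rpartition(":")
    return f"{prefix}{sep}i_{local}"


def derive_inverses(triples, writing_predicates):
    """Create inverse triples for non-symmetric Writing predicates only."""
    inverses = []
    for subject, predicate, obj in triples:
        if predicate not in writing_predicates:
            continue  # Biography properties use distinct inverse labels instead
        if isinstance(obj, Literal):
            continue  # no inverse is created when the range is a string
        inverses.append((obj, inverse_name(predicate), subject))
    return inverses


# ex:hasNote is a hypothetical string-valued Writing predicate.
WRITING_PREDICATES = {"cwrc:hasCharacter", "ex:hasNote"}
triples = [
    ("ex:work1", "cwrc:hasCharacter", "ex:person1"),
    ("ex:work1", "ex:hasNote", Literal("a free-text note")),
]
print(derive_inverses(triples, WRITING_PREDICATES))
# [('ex:person1', 'cwrc:i_hasCharacter', 'ex:work1')]
```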
3. Sample Assertions representing L.E.L.’s Cause of Death
Identifying Annotations
Much CWRC linked data involves the identification of named entities within CWRC documents, which is a straightforward use case for Web Annotation. Below is a diagram of a Web Annotation identifying the person entity L.E.L. (the pen name of Letitia Elizabeth Landon) within an excerpt or snippet from the Orlando Project’s published textbase.
Web Annotation identifying L.E.L. in a section of her Orlando profile page.
This structure essentially points to a portion of L.E.L.’s Orlando profile, indicates that she is identified there, and provides a snippet of text from that portion of the web page. There is no semantic representation of the cause of death beyond the words in this snippet. Describing Annotations depend on Identifying Annotations, since descriptions rely upon having accurately identified the entity being described.
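An identifying annotation of this kind can be serialized as JSON-LD following the W3C Web Annotation Data Model. In this sketch the annotation, entity, and page IRIs are example.org placeholders, and the quoted text is an excerpt of L.E.L.’s Orlando profile that appears in the table later in section 3; CWRC’s actual serialization may differ.

```python
import json


def identifying_annotation(anno_id, entity_iri, page_iri, exact_text):
    """Build a minimal identifying annotation in W3C Web Annotation JSON-LD."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "identifying",
        # The body is the entity identified in the text.
        "body": entity_iri,
        # The target is the passage of the source page where it is identified.
        "target": {
            "source": page_iri,
            "selector": {"type": "TextQuoteSelector", "exact": exact_text},
        },
    }


# Placeholder IRIs; not actual CWRC or Orlando identifiers.
anno = identifying_annotation(
    "https://example.org/anno/1",
    "https://example.org/entity/LEL",
    "https://example.org/orlando/profile/LEL",
    "LEL was discovered lying lifeless in her room",
)
print(json.dumps(anno, indent=2))
```

A describing annotation would differ mainly in its motivation and in carrying the described property rather than only the identified entity.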
Describing Annotations with Contexts
While “identifying” annotations are used to indicate the presence of particular entities in a text, we use a Web Annotation motivation value of “describing” when it comes to representing properties such as a person’s cause of death, since these annotations describe what Orlando asserts about an entity.
Context classes applied to Web Annotations provide the discursive context for assertions in the data: here, the Death Context, a subclass of the Biography Context. Where the assertions have been generated from a web-accessible source text, a set of contextual triples structured as a Web Annotation also provides that text, or the relevant snippet of a longer text from which they have been extracted: here, the L.E.L. profile.
Web Annotation representing L.E.L. cause of death through contextual triples
Here the relationship between L.E.L. and her cause of death is asserted through the mediation of the Web Annotation at the centre of the graph. L.E.L., whom we might expect to be the subject of an assertion about a cause of death, is present but is structurally an object of the annotation. She is, however, the “contextFocus” of the annotation, indicating that she is the subject in this Context and that the properties asserted in this annotation apply to her, with CWRC properties operating as subproperties of Web Annotation predicates. The annotation provides links to and snippets from the source text from which the assertion was extracted, a certainty value for the assertion, and links to bibliographic sources cited in the source text (not shown here). Although less human-readable than a simple triple, this structure retains contextualizing information crucial to robust humanities linked open data sets.
The above graph thus uses contextual predicates to represent L.E.L.’s cause of death as a single interconnected set of triples.
Moreover, contextualized assertions provide a means of dealing with contradictory data: multiple Contexts can represent multiple assertions even though it may be logically impossible for all to be true. L.E.L.'s cause of death is in fact tagged in her Orlando profile as murder, accident, and overdose, resulting in three causes of death. In the case of such a contradiction, either a low certainty value or some form of flagging of the data as a matter of interest could be generated. The CWRC ontology model thus provides support for reasoning that can help to manage or point to logical differences within or between datasets that may identify areas of controversy, edge cases, or cases of negation or contradiction, not as noise or dirt to be weeded out of the data, but rather as points of contention that humanities scholars frequently want to identify, highlight, or examine more closely.
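One way of surfacing such contradictions can be sketched under stated assumptions: contexts are reduced to (context, focus, predicate, value) rows, and predicates listed in a hypothetical `EXPECTED_SINGLE` set are expected to take one value per person. Any focus with several distinct values is reported, together with the contexts that assert them, rather than discarded. The identifiers are placeholders; the three values echo the causes listed for L.E.L. in the table in this section.

```python
from collections import defaultdict

# (context id, contextFocus, contextual predicate, value); placeholder names.
ROWS = [
    ("ex:ctx1", "ex:LEL", "c_causeOfDeath", "ex:murder"),
    ("ex:ctx2", "ex:LEL", "c_causeOfDeath", "ex:drugPoisoning"),
    ("ex:ctx3", "ex:LEL", "c_causeOfDeath", "ex:externalCause"),
    ("ex:ctx4", "ex:ECK", "c_causeOfDeath", "ex:respiratoryDisease"),
]

# Hypothetical: predicates expected to hold a single value per focus.
EXPECTED_SINGLE = {"c_causeOfDeath"}


def flag_contradictions(rows, single_valued):
    """Group values per (focus, predicate); keep groups with several values."""
    values = defaultdict(set)
    contexts = defaultdict(set)
    for ctx, focus, pred, value in rows:
        if pred in single_valued:
            values[(focus, pred)].add(value)
            contexts[(focus, pred)].add(ctx)
    return {
        key: {"values": sorted(vals), "contexts": sorted(contexts[key])}
        for key, vals in values.items()
        if len(vals) > 1
    }


for (focus, pred), info in flag_contradictions(ROWS, EXPECTED_SINGLE).items():
    print(focus, pred, info["values"], info["contexts"])
```

Keeping the context identifiers with each flag lets a scholar return to the source snippets instead of treating the disagreement as noise.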
Simple triples
A simple triple representing L.E.L.’s cause of death can be constructed from the Describing Web Annotation above.
L.E.L. cause of death represented as a simple triple with L.E.L. as its subject.
The structure of simple triples means that contextual information cannot be carried with them, unless one translates the data into an alternative linked data structure using a strategy such as reification. However, SPARQL queries can also be used to produce tabular data that provides this information alongside some of the context. For instance, the following table shows how contextual snippets could be supplied as details in a chart or graph visualization of the causes of death of a number of writers:
Name of Writer | Cause of Death | Discursive context |
---|---|---|
[ . . . ] | [ . . . ] | [ . . . ] |
Ellis Cornelia Knight | Disease of the respiratory system | ECK died in Paris of an inflammation of the Lungs. |
L.E.L. | Murder | Less than two months after arriving in Africa and only four years after her wedding, LEL was discovered lying lifeless in her room, in what seemed to be suspicious circumstances. |
L.E.L. | Drug poisoning | Less than two months after arriving in Africa and only four years after her wedding, LEL was discovered lying lifeless in her room, in what seemed to be suspicious circumstances. |
L.E.L. | External cause | Less than two months after arriving in Africa and only four years after her wedding, LEL was discovered lying lifeless in her room, in what seemed to be suspicious circumstances. |
May Laffan | Cerebrovascular diseases, stroke | ML died in the Bloomfield Institution, an insane asylum in Dublin, from the lingering effects of a brain haemorrhage two months earlier. |
[ . . . ] | [ . . . ] | [ . . . ] |
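A table like this one is essentially a join of each context’s focus, value, and source snippet. A stdlib sketch, assuming each context has already been flattened into a dict with hypothetical keys `focus`, `value`, and `snippet`, and that labels come from a separate lookup; in practice a SPARQL SELECT query would typically perform the join. The rows reproduce two entries from the table above.

```python
import csv
import io

# Placeholder identifiers mapped to the labels shown in the table.
LABELS = {
    "ex:ECK": "Ellis Cornelia Knight",
    "ex:respiratory": "Disease of the respiratory system",
    "ex:ML": "May Laffan",
    "ex:stroke": "Cerebrovascular diseases, stroke",
}

CONTEXTS = [
    {"focus": "ex:ECK", "value": "ex:respiratory",
     "snippet": "ECK died in Paris of an inflammation of the Lungs."},
    {"focus": "ex:ML", "value": "ex:stroke",
     "snippet": "ML died in the Bloomfield Institution, an insane asylum in Dublin, "
                "from the lingering effects of a brain haemorrhage two months earlier."},
]


def to_table(contexts, labels):
    """One row per context: writer label, cause label, supporting snippet."""
    rows = [("Name of Writer", "Cause of Death", "Discursive context")]
    for ctx in contexts:
        rows.append((labels[ctx["focus"]], labels[ctx["value"]], ctx["snippet"]))
    return rows


buffer = io.StringIO()
csv.writer(buffer).writerows(to_table(CONTEXTS, LABELS))
print(buffer.getvalue())
```

Because one writer can have several contexts, a writer such as L.E.L. naturally appears on several rows, one per asserted cause.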
4. Vocabularies
The main CWRC ontology includes several embedded taxonomies for enumerating the categories associated with certain classes (e.g., political affiliation, religion, occupation). These taxonomies are structured using the W3C SKOS (Simple Knowledge Organization System), sometimes in combination with OWL properties. SKOS properties link terms to each other but lack support for reasoning and the expressiveness enabled by OWL relationships. Although OWL is the preferred means of using this ontology, SKOS terms are provided where possible to support its use as a vocabulary. Many parallel terms are available in the CWRC vocabulary and other SKOS vocabularies offered by the Linked Infrastructure for Networked Cultural Scholarship (LINCS).
Several significant features of vocabularies embedded in CWRC are highlighted below.
a. Cultural Forms
The CulturalForm classes recognize categorization as endemic to social experience, while incorporating variation in terminology and contextualization of identity categories by employing instances at different discursive levels. A cultural form represents an aspect of lived social subjectivities and/or classification of a person through categories such as race or colour, ethnicity, gender, language, sexuality, politics, or religion. Most of the properties associated with specific Cultural Forms may also have the additional modifiers of reported and self-reported, allowing for the qualification of individual statements.
Context Properties
Cultural Form subclasses and instances describe the subject positions of individuals through Contexts. Contexts are linked to granular Cultural Form properties that are in turn associated with the person (or, in the case of Writing relationships, the work) that is the contextFocus of the annotation. This approach has its roots in the Orlando arrangement of Cultural Form encodings, which points users towards a framework for raising and debating complex matters of cultural investigation rather than invoking reified categories.
Entity types that can be the object of a contextFocus property are person, work, organization, and place. The contextFocus becomes the subject of the subject-centric properties that can be derived from context predicates via SPARQL CONSTRUCT queries.
The shift from embedded semantic markup to a linked open data approach presented the challenge of making this approach compatible with linkages to other ontologies and datasets outside of the Orlando frame of reference. The move from "strings to links" or "strings to things" was in some sense at odds with the former embrace of the ambiguity of strings such as white, black, English, etc.: white and black can represent race or ethnicity, while English can also be invoked as an ethnicity, nationality, or a national heritage. Orlando marks these strings using its Cultural Forms tagset as specific to, for example, the context of race or ethnicity, mandating a similar association, within the linked data representation, with a specific instance of Cultural Form. Thus, there exist Cultural Form instances that point to the discursive construction of white as a race and white as an ethnicity. Lastly, there also exists a white label that can be instantiated as either race or ethnicity, but not both within the same assertion (although multiple assertions are possible).
This is a departure from previous (non-linked open data) controlled vocabularies: the appearance of the term or label (in this case "white") does not indicate the specific cultural formation being invoked; the specific instance does. This also means that linkages to other datasets or vocabularies can be made appropriately, since multiple representations of the same label are present within the CWRC ontology. As a last resort, or for data mining purposes, the term is also available as a concept whose actual Cultural Form is undecided amongst the CWRC-defined options. This allows for linkages to an external ontology, such as text mining may require, without endorsing the corresponding definition or interpretation of the term. Finally, skos:altLabel properties provide variant terms that indicate the unstructured vocabulary terms or strings that have been translated to these vocabulary instances.
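The resolution logic described above can be sketched in Python, assuming a hypothetical index from labels to instances keyed by Cultural Form type; the identifiers are placeholders, not CWRC URIs. A string tagged as race or ethnicity resolves to that specific instance, while an untagged string falls back to the concept whose Cultural Form is undecided.

```python
# Hypothetical index: label -> {Cultural Form type -> instance identifier}.
# The None key holds the last-resort concept whose Cultural Form is undecided.
LABEL_INDEX = {
    "white": {
        "race": "ex:whiteRace",
        "ethnicity": "ex:whiteEthnicity",
        None: "ex:whiteUndecided",
    },
}


def resolve(label, form=None):
    """Map a string, plus the form it was tagged with (if any), to one instance."""
    candidates = LABEL_INDEX.get(label.strip().lower())
    if candidates is None:
        raise KeyError(f"no Cultural Form instance labelled {label!r}")
    # A known tag selects its specific instance; otherwise fall back.
    return candidates.get(form, candidates[None])


print(resolve("White", "race"))  # ex:whiteRace
print(resolve("white", "ethnicity"))  # ex:whiteEthnicity
print(resolve("white"))  # ex:whiteUndecided
```

The label alone never decides the instance; the tag that accompanies the string does, which is the point of the departure described above.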
Granular Properties
Granular properties provide a simple means of indicating cultural categories as presumed, perceived, or otherwise assigned to a person according to cultural conventions, or as self-reported by the person themselves. Some of the properties are associations inherited from forebears.
Most properties take noun forms in keeping with conventions for ontologies, but in some cases idiom makes adjectival forms preferable, even though these terms function as nouns, as in the case of the sexual identity celibate.
b. Genre Ontology
The separate CWRC Genre ontology contains a taxonomy of cultural media, forms, and genres, with a strong emphasis on literary genres, based on a combined OWL and SKOS approach. So that terms can both be applied to particular cultural productions and be discussed as concepts, genres are instances within the CWRC Genre ontology and are related to particular cultural works through the hasGenre property.
Because it is useful for a number of purposes to be able to move between broader genres and more specific ones, the ontology organizes its terms through a combination of OWL classes that provide broad groupings and SKOS taxonomic relations amongst instances. The OWL classes provide the ability to reason from narrower to broader terms, while the SKOS properties provide more limited transitive narrower/broader relationships.
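Moving from narrower to broader genres along SKOS relations can be sketched as a transitive closure over skos:broader links, which is what skos:broaderTransitive expresses (skos:broader itself is not declared transitive in SKOS). The genre instances below are hypothetical placeholders, not terms from the CWRC Genre ontology.

```python
from collections import deque

# Hypothetical skos:broader links between genre instances: narrower -> broader.
BROADER = {
    "ex:sonnet": ["ex:lyric"],
    "ex:lyric": ["ex:poetry"],
    "ex:poetry": [],
}


def broader_transitive(term, broader):
    """Return every ancestor of term, breadth-first, without repeats."""
    seen, order = set(), []
    queue = deque(broader.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # guards against cycles and diamond-shaped hierarchies
        seen.add(parent)
        order.append(parent)
        queue.extend(broader.get(parent, []))
    return order


print(broader_transitive("ex:sonnet", BROADER))  # ['ex:lyric', 'ex:poetry']
```

A work whose genre is the sonnet could then be retrieved by a query for poetry by checking whether `ex:poetry` is among the sonnet’s ancestors.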
Learn more about the Genre ontology by visiting the Genre Terms and Definitions page or see the LINCS Genre vocabulary page for a SKOS version.
c. Illness & Injury Ontology
The separate Illness & Injury ontology contains a taxonomy of terms to describe health problems or causes of death in the context of historical or cultural analysis where specific information is often unavailable. It is an abbreviated version of the World Health Organization's (WHO) Startup Mortality List (ICD-10-SMoL), the WHO's simplified application of ICD-10. It was developed to add structure to the designations for cause of death and health problems represented in biographical information. ICD defines the universe of diseases, disorders, injuries and other related health conditions. These terms were based on the version of ICD-10-SMoL from June 2018.
Learn more about the Illness & Injury ontology by visiting the Illness & Injury Terms and Definitions page or see the LINCS Injuries & Illnesses vocabulary page for a SKOS version.
d. Has Functional Relation
The has functional relation predicate indicates that two terms may be treated as related for functions such as querying and retrieval, without asserting a semantic relationship between them. The predicate is designed to bring together incommensurate terms for processing purposes while excluding them from semantic operations. This distinguishes it from, for instance, skos:semanticRelation and skos:closeMatch, which serve a similar purpose but assert semantic proximity.
One purpose of this relation is to facilitate comparisons and relationships with other ontologies and vocabularies with which users are more familiar. Use of this relationship does not assert that the two terms are not semantically related, but rather that the semantic relationships currently available within OWL, SKOS, and other ontologies used by this ontology are not sufficiently nuanced to allow a semantic relationship to be specified in a way that other tools (such as inference engines) can process appropriately.
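The split between processing and semantics can be illustrated with a hypothetical edge list: functional-relation edges are followed when expanding a search, but excluded when collecting semantically related terms. The term identifiers and the `ext:` namespace are placeholders, `cwrc:hasFunctionalRelation` stands in for whatever IRI the ontology uses, and links are treated as undirected, which is an assumption of this sketch.

```python
# (term, relation, term) edges; all identifiers are hypothetical placeholders.
EDGES = [
    ("ex:termA", "skos:closeMatch", "ext:termB"),
    ("ex:termA", "cwrc:hasFunctionalRelation", "ext:termC"),
]

SEMANTIC = {"skos:closeMatch", "owl:equivalentClass"}
FUNCTIONAL = {"cwrc:hasFunctionalRelation"}  # assumed IRI


def neighbours(term, allowed):
    """Terms linked to `term` in either direction by an allowed relation."""
    found = set()
    for s, p, o in EDGES:
        if p in allowed:
            if s == term:
                found.add(o)
            elif o == term:
                found.add(s)
    return found


def expand_for_retrieval(term):
    """Query expansion: functional relations are followed here."""
    return {term} | neighbours(term, SEMANTIC | FUNCTIONAL)


def semantic_neighbours(term):
    """Input to reasoning: functional relations are deliberately left out."""
    return neighbours(term, SEMANTIC)


print(sorted(expand_for_retrieval("ex:termA")))
print(sorted(semantic_neighbours("ex:termA")))
```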
e. Linkages to external ontologies and vocabularies
We employ a number of strategies for linking to other ontologies. Rather than importing other ontologies wholesale, our architecture typically relates to large vocabularies in defined ways. We try not to misuse owl:sameAs (Halpin, Hayes et al., 2010).
We adopt external namespaces and associated classes and terms wherever possible when they are in widespread use and their vocabularies are broadly compatible with ours, as in the case of the FOAF and BIBFRAME ontologies. For some terms, such as those for religious denominations or genres, we are happy to draw on other vocabularies’ terms and definitions in part or in whole, as in the case of terms from the Getty Art and Architecture Thesaurus (Getty Research Institute). Other terms are referenced, but usually at a distance rather than through wholesale import. This is particularly common in relation to cultural forms, which, as explained more fully below, are understood primarily as representational and are linked, where multiple related terms exist within the ontology, to terms typed as textual labels. By means of this structure, our vocabulary positions all terms associated with processes of Cultural Form as discursive labels, retaining the ambiguity of terms implicated in the complex social construction of identities within a narrative.
Cultural forms may in turn be related to external ontologies in a number of ways. If an external ontology term aligns semantically with ours, we use OWL- or SKOS-based relationships such as owl:equivalentClass, skos:narrower, or skos:broader. If an external term's definition or use is not commensurate with a term in the CWRC ontology, but its application in external datasets makes it useful nevertheless to link that term to ours (for instance, to broaden searches using the problematic ISO/IEC 5218 codes for the representation of human sexes), then the has functional relation predicate is employed to indicate that the relationship is not specified semantically but may be leveraged for processing.
At the top level, the CWRC ontology makes use of the following well-known ontologies:
- The FOAF ontology for the representation of people and organizations.
- The BIBFRAME ontology for the representation of bibliographic data.
- The W3C Web Annotation Data Model is used to link the original Orlando text to specific Contexts.
- SKOS is used to represent taxonomic relationships within certain Cultural Forms and to fully document ontology terms.
- Some Dublin Core terms are used for well-known documentation tags such as dcterms:title.
- The W3C Provenance Ontology (PROV-O) is used to indicate indebtedness, derivation, or provenance of term descriptions, as well as of Cultural Context source annotations.
- Linkages are made to the CIDOC-CRM ontology for cultural instances held in common with CWRC.
- The Simple Event Ontology is used for the representation of events.
- The Organization Ontology is used for the representation of organizations.
- The CiTO and BiRO ontologies are used for citations.
Established ontologies and vocabularies are used in the definition of numerous classes and instances. For instance, the religious terms of the Getty Art and Architecture Thesaurus provide suitable definitions for many religions, as does DBpedia for many terms throughout the ontology. Sometimes definitions draw on scholarly print and online sources. Quotation marks around the text of the description indicate wholesale adoption of the source definition. Where the description is not surrounded by quotation marks, the term has been defined by the CWRC team, but links may be provided to external resources such as a scholarly article or closely related DBpedia entry.
In other cases, terms from external ontologies are adopted in CWRC datasets without having been imported into our ontology. What follows is a non-exhaustive list of such vocabularies and the classes for which they are most frequently used:
- Geonames terms are often used for locations and for many instances of geographic heritage.
- Library of Congress language codes are typically used for instances of language.
f. Labels and Languages
For labelling, CWRC uses two means of promoting searchability. The first, rdfs:label, represents the human-readable nomenclature for a concept, instance, or predicate. This is the terminology used when representing components of the ontology in documentation and diagrams, except where a URI is provided.
Textual labels
As noted above in relation to cultural forms, however, when cwrc:TextLabel is used to type a class, this indicates the representationality or discursivity of that class. cwrc:TextLabels are frequently used for ambivalent, overlapping, and culturally contested terms.
Example of Jewish identity as a textual label that brings together and flags the discursivity of many different terms associated with Jewishness.
Alternative Labels
In addition, to support those familiar with prior datasets whose strings or terms have been linked to CWRC terms for extraction purposes, the ontology provides additional linguistic properties for CWRC ontology terms. Alternative labels, signified by skos:altLabel, indicate variant terms; they cannot serve as replacements for rdfs:label. Within the ontology, such alternative labels exist primarily for search and retrieval, allowing ontology terms to be located under a larger number of labels. The skos:hiddenLabel property has been used for terms from source datasets that are too idiosyncratic or otherwise problematic. Neither skos:altLabel nor skos:hiddenLabel values are displayed in the human-readable definitions. Although they reflect the idiosyncrasies of the source data, they may be useful, with discretion, for broadening searches. Alternative and hidden labels continue to be revised as the data are extended or refined.
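The division of labour among the three label properties can be sketched as a small search index, assuming each term is a dict with keys named after rdfs:label, skos:altLabel, and skos:hiddenLabel. Matching may use all three, but results always display the rdfs:label. The term identifier and its variant strings are invented for illustration.

```python
# Hypothetical term with its preferred, alternative, and hidden labels.
TERMS = {
    "ex:term1": {
        "rdfs:label": "example term",
        "skos:altLabel": ["variant term", "older variant"],
        "skos:hiddenLabel": ["idiosyncratic source string"],
    },
}


def search(query, terms, include_hidden=False):
    """Return (identifier, rdfs:label) for every term matching the query."""
    q = query.strip().lower()
    hits = []
    for iri, labels in terms.items():
        candidates = [labels["rdfs:label"], *labels.get("skos:altLabel", [])]
        if include_hidden:
            # Hidden labels only broaden the match; they are never displayed.
            candidates += labels.get("skos:hiddenLabel", [])
        if any(q == c.lower() for c in candidates):
            hits.append((iri, labels["rdfs:label"]))
    return hits


print(search("Variant Term", TERMS))  # [('ex:term1', 'example term')]
print(search("idiosyncratic source string", TERMS))  # []
```

Making hidden labels opt-in reflects the advice above that they be used with discretion.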
Language
Labels and definitions in French and English (and other serendipitously available languages) are never word-for-word translations, and are definitions in their own right.
g. Persons and Personae
The distinction between persons and personae is an important aspect of the complexity of human experiences and relationships. The CWRC ontology makes it possible to capture the originality and plurality of personhood.
This ontology adopts the broad FOAF definition of foaf:Person, which can be applied to any entity considered to be a person, including non-humans, and defines two subclasses of Person: NaturalPerson, or human being, and FictionalPerson, since fictional characters are important to literary studies. This allows a historical person who is a NaturalPerson but is fictionalized as a Character in a text to be typed also as a FictionalPerson. If a text simply alludes or refers to a NaturalPerson, however, it would not make sense to type them as a FictionalPerson.
In some cases, a Person will be associated with a Persona, also a subclass of foaf:Person. One or more NaturalPersons can have a Persona, which may have its own distinct properties. The author Michael Field offers an example of the extent to which "personhood is both a complex and a crucial characteristic that ontologies must be designed to capture appropriately" (Brown and Simpson 2013). The persona of Michael Field was produced by the artistic and lived collaboration between Katherine Harris Bradley and Edith Emma Cooper at the turn of the twentieth century. Even though he was not a biological person, Michael Field had an important role in the two women’s careers, their social lives, and their personal relationship. "Michael Field" is not a shared pseudonym but a persona associated with two natural persons at the same time. The ontology thus includes the "persona" class of person, which can be used to describe entities such as Michael Field. Personae are more than pen names or stage names, such as "Ellis Bell" for Emily Brontë; they involve a performed personality that goes beyond the page. For instance, the FASTWÜRMS art collective operates as more than a creative identity, to the point of holding a single academic position at the University of Guelph. As FOAF Persons, personae may engage in social, literary, artistic, or political activities, but they are distinct from the CWRC Natural Persons who embody them and from Fictional Persons, unless they become fictionalized by themselves or others. (See also the Persona tag in the Text Encoding Initiative Guidelines and the distinction between personae and roles. Roles may be further developed in the future in connection with the event component of the ontology.)
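The typing described in this section can be sketched with a toy RDFS-style subclass inference. The class names follow this section, but the `cwrc:` prefix is assumed, and `ex:hasPersona` is a hypothetical property standing in for however the ontology actually links natural persons to a persona. Both women and the persona they share are inferred to be foaf:Person, while the persona is never typed as a NaturalPerson.

```python
# rdfs:subClassOf links as described in this section (cwrc: prefix assumed).
SUBCLASS_OF = {
    "cwrc:NaturalPerson": "foaf:Person",
    "cwrc:FictionalPerson": "foaf:Person",
    "cwrc:Persona": "foaf:Person",
}

TYPES = {
    "ex:KatherineBradley": {"cwrc:NaturalPerson"},
    "ex:EdithCooper": {"cwrc:NaturalPerson"},
    "ex:MichaelField": {"cwrc:Persona"},
}

# Hypothetical property linking natural persons to the persona they embody.
LINKS = [
    ("ex:KatherineBradley", "ex:hasPersona", "ex:MichaelField"),
    ("ex:EdithCooper", "ex:hasPersona", "ex:MichaelField"),
]


def inferred_types(entity):
    """Asserted types plus every superclass reachable via rdfs:subClassOf."""
    result = set()
    for cls in TYPES.get(entity, set()):
        while cls is not None and cls not in result:
            result.add(cls)
            cls = SUBCLASS_OF.get(cls)
    return result


def embodied_by(persona):
    """Natural persons linked to the given persona."""
    return sorted(s for s, p, o in LINKS if p == "ex:hasPersona" and o == persona)


print(sorted(inferred_types("ex:MichaelField")))
print(embodied_by("ex:MichaelField"))
```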
5. CWRC Ontology Design Rules
Beyond the formalism of the OWL 2 Web Ontology Language, the CWRC ontology adheres to these design rules and styles:
- The contents of rdfs:label values are lowercase, with the following exceptions:
  - Labels for religions, political affiliations, and groups of people derived from a proper name begin with an uppercase letter.
- Definitions in French and English (and other serendipitously available languages) are never word-for-word translations, and are definitions in their own right.
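The lowercase rule lends itself to a simple lint check. This sketch assumes each label arrives with a hypothetical category tag marking the exception classes (religions, political affiliations, and groups of people derived from a proper name); the tags and sample labels are illustrative, not drawn from the ontology.

```python
# Hypothetical category tags for the exceptions to the lowercase rule.
EXCEPTION_CATEGORIES = {"religion", "political affiliation", "group from proper name"}


def label_violations(labels):
    """Yield (label, reason) for rdfs:label values that break the case rule."""
    for text, category in labels:
        if category in EXCEPTION_CATEGORIES:
            if not text[:1].isupper():
                yield text, "exception category should begin with an uppercase letter"
        elif text != text.lower():
            yield text, "label should be lowercase"


sample = [
    ("sexual identity", None),
    ("Cause Of Death", None),
    ("Anglican", "religion"),
    ("marxist", "political affiliation"),
]
for text, reason in label_violations(sample):
    print(f"{text!r}: {reason}")
```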
6. Classes
The ontology includes several embedded taxonomies for enumerating the categories associated with certain classes (e.g., political affiliation, religion, occupation). Where possible, the taxonomies are SKOS-based or follow a combined OWL and SKOS approach.
7. Properties
There are three types of properties used in the CWRC ontology: context-based predicates (which begin with c_*), simple predicates (which usually begin with has), and inverse predicates (which begin with i_* or end with Of or By).
For Writing properties (only) that are not symmetric, inverse predicates reuse the same property terms for ease of reference: prefixed with “i_”, those properties go in the opposite direction from the predicates of the same name. Biography properties have distinct but usually similar labels for the inverses of simple triples, typically differing by a preposition (such as “of” or “by”).
Context-based predicates are those associated with the more complex structures based on the Web Annotation Data Model, which allow assertions to be connected to context, provenance, and source information. Where available, they link to the snippet of text from which a predicate was derived.
Simple predicates are used to construct subject-predicate-object assertions that are divorced from their context but more amenable to certain kinds of queries and visualizations, such as network graphs. SPARQL queries can also extract simple predicates together with text extracts for tabular output.