Terms and Definitions

1. Introduction

The CWRC Ontology is the ontology of the Canadian Writing Research Collaboratory. This human-readable version of the ontology accompanies the introduction to the ontology (see Introduction). The ontology itself remains the primary source for understanding how it works.

The intended audience of this document is scholars and practitioners who wish to understand how the ontology works and who intend to make use of it.

The ontology is understood to be a living document that makes no claims to completeness. Instances have been derived from particular datasets and will be expanded progressively over time. Continuity is ensured using the OWL ontology annotations for ontological compatibility and for deprecated classes and properties. Deprecated ontology terms remain present but are marked as such.

We welcome suggestions for new classes, properties, and predicates from those wishing to use the ontology for their own datasets, as well as suggestions related to the complexity of vocabularies associated with existing terms. Please submit suggestions via an issue or a pull request to the CWRC Ontology code repository.

This document includes an overview of the ontology, as well as the ontology itself.

Navigate to the ontology terms and definitions.

2. CWRC Ontological Structures

Source data from CWRC spans multiple types of data, including granular material such as bibliographic metadata and discursive, interpretive, or analytical content describing human lives, cultural processes, and historical events. The CWRC linked open data model described by this ontology represents such information as a series of assertions, frequently associated with particular contexts and linked, where available, to the source of the assertions.

Ensuring that context and provenance can be conjoined with assertions in order to situate the knowledge being represented is crucial to modelling research data effectively. At the same time, such ontological modelling results in complex structures that make the queries required to retrieve basic information unwieldy. To address this, the CWRC ontology creates assertions through a design that supports the tracking of provenance, certainty, and other complex components of the data within the structure of the data, but provides parallel properties that allow the data also to be easily queried to produce simple subject-predicate-object triples that stand on their own. This design relies upon and extends the Web Annotation Data Model, which supports various modes of annotating web sources, such as identifying entities named in a text and connecting such entities to the web source being annotated.

CWRC extends the Web Annotation structure to carry properties associated with those entities through a series of Contexts, typed according to a number of high-level classes such as EducationContext, CulturalFormContext, and OccupationContext. Each property associated with an entity for each Context has a parallel, subject-centric version. These subject-centric versions support the use of SPARQL Construct commands to derive entity-centric triples that link entities directly to properties, rather than mediating that relationship through the Web Annotation structures. We refer to such entity- or subject-centric triples as simple triples. Simple triples make assertions through a subject, a predicate, and an object without additional context, whereas the complex triples (or contextual triples) in the source data support the expression of more complex, nuanced, and contextualized knowledge. In this way, provenance tracking and situated knowledge production are encouraged by the structure of the ontology, while the production of granular subject-predicate-object views of the data is supported by the parallel predicates. Complex triples thus link a fragment of source text to the entity or entities it references, the properties or assertions carried by the annotation, qualifying information such as certainty, and the Context denoting the class of experience or activity to which it belongs.

a. Identifying Annotations

Much CWRC linked data will involve the identification of named entities within CWRC documents, which is a straightforward use case for Web Annotation. Below is a diagram of a Web Annotation identifying the person entity L.E.L. (Letitia Elizabeth Landon) within an excerpt or snippet from the Orlando Project’s published textbase.

Web Annotation identifying L.E.L. in a portion of the Orlando Project

b. Contexts (Describing Annotations)

The Context class uses Web Annotations to provide the discursive context for interpretive assertions in the ontology. Where the assertions have been generated from a web-accessible source text, a Context provides the text, or the relevant snippet of a longer text, from which they have been extracted. Contexts help to ground the data in its source materials, which can provide users with a sense of the nuance and complexity of assertions related to human subjects and cultural phenomena.

Contexts are typed by major semantic categories including, for biographical material, Cultural Forms, Birth, Death, Education, Occupation, and Politics, and, for literary content, Production, Reception, and Textual Features. The major context classes are as follows:

BiographyContext, BirthContext, CulturalFormContext, DeathContext, EconomicContext, EducationContext, FamilyContext, FriendsAndAssociatesContext, HealthContext, IntimateRelationshipContext, LeisureContext, NameContext, OccupationContext, SpatialContext, and ViolenceContext.

CWRC Contexts build on the Web Annotation structure, classing Annotations by Context. While identifying annotations are used to indicate the presence of particular entities, we shift the motivation of the annotation to describing when it comes to properties such as cause of death, since these annotations are describing what Orlando is saying, as well as engaging in description themselves.

Web Annotation incorporating L.E.L. cause of death through Web Annotation Context-centric triples

The relationship between L.E.L. and her cause of death is present here, but in a “Context-centric” rather than a subject-centric view of the data. What we think of intuitively as the “Subject” of the cause of death triple, that is L.E.L., is present but is structurally an object. L.E.L., however, has a special, CWRC-defined relationship to the Annotation as the “contextFocus”, which is to say the subject of this particular Context. The “contextFocus” property, like other CWRC-defined properties linked to the annotations, becomes a subproperty of the standard oa:hasBody property. Links to source documents from which the assertion was extracted are there, as are certainty, citation links (not shown here), and potentially other information about this complex claim. This structure is not as intuitively human-readable as a simple triple, but retains contextualizing information crucial to robust humanities LODsets.

The above graph represents a single interconnected set of complex triples generated according to the CWRC ontology’s Context-centric data model; this is the structure of the source data. Simple triples can be generated from source data through SPARQL Construct queries.
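The derivation of simple triples from Context-centric source data can be pictured with a small Python sketch. It is only an illustration under stated assumptions: triples are modelled as plain tuples rather than an RDF graph, and the identifiers `ex:annotation1`, `ex:LEL`, `ex:suicide`, `ex:orlandoSnippet`, `cwrc:c_causeOfDeath`, and `cwrc:hasCauseOfDeath` are placeholders that have not been checked against the ontology. In practice this step would normally be a SPARQL Construct query.

```python
# Context-centric triples: the annotation is the subject, and the person
# appears only as an object (the contextFocus). All names are illustrative.
CONTEXT_TRIPLES = [
    ("ex:annotation1", "rdf:type", "cwrc:DeathContext"),
    ("ex:annotation1", "cwrc:contextFocus", "ex:LEL"),
    ("ex:annotation1", "cwrc:c_causeOfDeath", "ex:suicide"),
    ("ex:annotation1", "oa:hasTarget", "ex:orlandoSnippet"),
]

# Hypothetical mapping from Context-centric predicates to simple ones.
SIMPLE_PREDICATE = {"cwrc:c_causeOfDeath": "cwrc:hasCauseOfDeath"}


def derive_simple_triples(triples):
    """Link each contextFocus directly to the values its annotation carries."""
    focus = {}    # annotation -> list of focus entities
    carried = {}  # annotation -> list of (simple predicate, object)
    for s, p, o in triples:
        if p == "cwrc:contextFocus":
            focus.setdefault(s, []).append(o)
        elif p in SIMPLE_PREDICATE:
            carried.setdefault(s, []).append((SIMPLE_PREDICATE[p], o))
    return [
        (entity, pred, obj)
        for annotation, entities in focus.items()
        for entity in entities
        for pred, obj in carried.get(annotation, [])
    ]


simple = derive_simple_triples(CONTEXT_TRIPLES)
print(simple)
```

The resulting triple has the person as its subject, but the link to the source snippet and any certainty information stay behind in the Context-centric data.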

The CWRC Ontology provides three types of predicates: Context-centric predicates (which begin with c_), simple predicates (which usually begin with has), and inverse predicates (which begin with i_ or end with Of or By) (see 5. Properties for more information). This means that, for those who want to view the data through more direct, subject-centric relationships, predicates can be pulled either by a batch script that creates a derivative dataset or through SPARQL Construct commands when running queries.

In addition, the ontology includes several embedded taxonomies for enumerating the categories associated with certain classes (e.g., political affiliation, religion, occupation). These taxonomies are structured using the W3C SKOS (Simple Knowledge Organization System), sometimes in combination with OWL properties. SKOS properties link terms to each other but lack the expressiveness enabled by OWL relationships and do not support reasoning. Although OWL is the preferred means of using this ontology, SKOS terms are provided where possible to support its use as a vocabulary. A parallel SKOS CWRC vocabulary is available through the Linked Infrastructure for Networked Cultural Scholarship (LINCS).

Moreover, the relationship between two different contextualized assertions provides a means of dealing with contradictory data through an OR relation that allows both Contexts to exist even though the two assertions associated with them cannot logically both be true, as is the case with L.E.L., whose cause of death is asserted to be either murder or suicide in the source document. In such a case, a low certainty value could be generated based on this contradiction in the data. This model thus provides support for reasoning that will help to identify inaccuracies in the data. It will also help to highlight areas of controversy, edge cases, differences across datasets, and cases of negation or contradiction, not as noise or dirt to be weeded out of the data, but rather as things that humanities scholars frequently want to zero in on and examine more closely.
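One way such contradictions might be surfaced is sketched below in Python. The sketch assumes that the predicate in question is expected to take a single value per person, and all identifiers (`ex:context1`, `ex:personB`, `cwrc:hasCauseOfDeath`, and so on) are hypothetical. It groups assertions by focus and predicate and reports the groups with competing values, keeping every Context rather than discarding any.

```python
from collections import defaultdict

# (context id, focus, predicate, value); identifiers are illustrative only.
ASSERTIONS = [
    ("ex:context1", "ex:LEL", "cwrc:hasCauseOfDeath", "ex:murder"),
    ("ex:context2", "ex:LEL", "cwrc:hasCauseOfDeath", "ex:suicide"),
    ("ex:context3", "ex:personB", "cwrc:hasCauseOfDeath", "ex:illness"),
]


def find_contested(assertions):
    """Map each (focus, predicate) pair with competing values to the
    Contexts that assert them."""
    values = defaultdict(set)
    contexts = defaultdict(list)
    for context, focus, predicate, value in assertions:
        values[(focus, predicate)].add(value)
        contexts[(focus, predicate)].append(context)
    return {key: contexts[key] for key, vals in values.items() if len(vals) > 1}


contested = find_contested(ASSERTIONS)
print(contested)
```

A downstream process could use the reported Contexts to attach a lower certainty value or to point researchers to the contested passages.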

c. Persons, Personas and Roles

The distinction between persons, personas, and roles is an important component of the complexity of human experiences and relationships.

This ontology adopts the broad FOAF definition of a foaf:Person, which can be applied to any entity considered to be a person, including non-humans. We define two subclasses of Person: a NaturalPerson or human being, and a FictionalPerson, since fictional characters are important to literary studies. If a historical person who is a NaturalPerson is fictionalized as a Character [not yet created] in a text, they also become a FictionalPerson. If a text simply alludes or refers to a NaturalPerson, however, they are not also a FictionalPerson.

In some cases, a Person will be associated with a Persona. A person can occupy a Role [not yet created] in relation to a specific event or situation.

The author Michael Field offers an example of the extent to which "personhood is both a complex and a crucial characteristic that ontologies must be designed to capture appropriately" (Brown and Simpson 2013). The persona of Michael Field was produced by the artistic and lived collaboration between Katherine Harris Bradley and Edith Emma Cooper at the turn of the twentieth century. Even though he was not a biological person, Michael Field had an important role in the two women’s careers, their social lives, and their personal relationship. "Michael Field" can neither be assigned to one of the authors over the other, nor can it be considered only a shared pseudonym. Michael Field is associated with two natural persons at the same time. We seek in the CWRC ontology to capture such manifestations of the originality and the plurality of personhood. The ontology thus includes the "persona" class of person to describe entities such as Michael Field.

It might be argued that such personae are simply pen-names or stage names, such as "Currer Bell" for Charlotte Brontë. However, personae are more than alternative signatures. Personae inflect the ways in which artists socially, symbolically, intimately or artistically embody authorship. While a pen-name can be described as a publication strategy related to a specific context, a persona has its own performed personality that goes beyond a signature. A contemporary example is the FASTWÜRMS art collective. The collective operates as more than a creative identity, to the point of holding a single academic position at the University of Guelph.

A particular persona is an original creation, often bearing meaning related to the biographical, historical and sociological context of its creator. A persona in this sense is also not generally associated with mental illness or multiple personality disorders that result from distorted or uncontrolled perceptions of reality. At the heart of a persona is an identity with which others interact and that can be confused with a Natural Person. It is incarnated and developed by a natural person, may have specific properties such as gender or sexuality that differ from those of the natural person with whom it is associated, and may engage in social, literary, artistic or political activities. Although Personae are FOAF Persons, they are distinct from the CWRC Natural Persons who embody them and from Fictional Persons, unless they become fictionalized by themselves or others.

As documented for the recent Persona tag incorporated in the Text Encoding Initiative Guidelines, personae are not Roles either: "A role may be assumed by different people on different occasions, whereas a persona is unique to a particular person, even though it may resemble others. Similarly, when an actor takes on or enacts the role of a historical person, they do not thereby acquire a new persona." (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html#NDPERSE).

A Role can be adopted by either personae or natural persons, but a persona cannot be adopted by people generally: it is specific to one natural person, or more rarely several natural persons (as in the case of the collaborative Michael Field and the art collective FASTWÜRMS).

Roles are characters or functions performed on specific occasions and in specific situations, which is to say events. Dramatic roles, that is to say a Character in a creative work, are adopted by actors for particular performances. By analogy, social roles are adopted by particular individuals in relation to particular events or occurrences, which may be of brief or long duration. Key roles in relation to events are those of agents, spectators, and commentators. Occupations, jobs, or significant activities are not the same as roles, although they may be related to them, as may familial or social relationships. Roles will be further fleshed out in relation to the event component of the ontology, which is currently under development.

d. Cultural Form

The CulturalForm classes recognize categorization as endemic to social experience, while incorporating variation in terminology and contextualization of identity categories by employing instances at different discursive levels. A cultural form represents an aspect of lived social subjectivities and/or classification of a person through categories such as race or colour, ethnicity, gender, language, sexuality, politics, or religion. Most of the properties associated with specific Cultural Forms may also have the additional modifiers of reported and self-reported, allowing for the qualification of individual statements.

i. Context properties

Cultural Form sub-classes and instances describe the subject positions of individuals through Contexts. Contexts are linked to granular Cultural Form properties that are in turn associated with the person who is the contextFocus of the annotation. This has its roots in the Orlando arrangement of Cultural Form encodings that points users towards a framework for raising and debating complex matters for cultural investigation rather than invoking reified categories.

The shift from embedded semantic markup to a linked open data approach presented the challenge of making this approach compatible with linkages to other ontologies and datasets outside of the Orlando frame of reference. The move from "strings to links" or "strings to things" was in some sense at odds with the former embrace of the ambiguity of strings such as white, black, English, etc.: white and black can represent race or ethnicity, while English can also be invoked as an ethnicity, nationality, or a national heritage. Orlando marks these strings using its Cultural Forms tagset as specific to, for example, the context of race or ethnicity, mandating a similar association, within the linked data representation, with a specific instance of Cultural Form. Thus, there exist Cultural Form instances that point to the discursive construction of white as a race and white as an ethnicity. Lastly, there also exists a white label that can be instantiated as either race or ethnicity, but not both within the same assertion (although multiple assertions are possible).

This is a departure from previous (non-linked open data) controlled vocabularies, in that the appearance of the term or label (in this case "white") does not indicate the specific cultural formation being invoked, the specific instance does. This also means that linkages to other datasets or vocabularies can be made appropriately, since multiple representations of the same label are present within the CWRC ontology. As a last resort, or for data mining purposes, the term is also available as a concept whose actual Cultural Form is undecided amongst the CWRC-defined options. This allows for linkages to an external ontology, such as can be required by text mining, without endorsing the corresponding definition or interpretation of the term. Finally, skos:altLabel properties provide variant terms that indicate the unstructured vocabulary terms or strings that have been translated to these vocabulary instances.
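The idea that the instance, not the label, determines the cultural formation can be sketched as a small lookup in Python. The identifiers `ex:whiteRace`, `ex:whiteEthnicity`, and `ex:whiteLabel` are invented for the example and do not correspond to actual CWRC URIs.

```python
# One label, several instances: which instance applies depends on the
# Cultural Form context, not on the string itself. Identifiers are hypothetical.
INSTANCES = {
    ("white", "race"): "ex:whiteRace",
    ("white", "ethnicity"): "ex:whiteEthnicity",
}
# Label-level concepts whose Cultural Form is left undecided.
UNDECIDED = {"white": "ex:whiteLabel"}


def resolve(label, cultural_form=None):
    """Return the instance for a label in a given Cultural Form context,
    falling back to the undecided label when no specific instance matches."""
    key = (label.lower(), cultural_form)
    if key in INSTANCES:
        return INSTANCES[key]
    return UNDECIDED.get(label.lower())


print(resolve("white", "race"))       # instance for white as a race
print(resolve("white", "ethnicity"))  # instance for white as an ethnicity
print(resolve("white"))               # undecided label-level concept
```

Each call returns a single instance, which mirrors the rule that one assertion invokes either race or ethnicity but not both.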

ii. Granular Properties

Granular properties provide a simple means of indicating cultural categories as presumed, perceived, or otherwise assigned to a person according to cultural conventions, or as self-reported by the person themselves. Some of the properties are associations inherited from forebears.

Most properties take noun forms in keeping with conventions for ontologies, but in some cases idiom makes adjectival forms preferable, even though these terms function as nouns, as in the case of the sexual identity celibate.

e. Genre Ontology

The separate CWRC Genre ontology contains a taxonomy of cultural media, forms, and genres, with a strong emphasis on literary genres, based on a combined OWL and SKOS approach. In order to facilitate the ability both to apply terms to particular cultural productions and enable them to be discussed as concepts, genres are instances within the CWRC Genre ontology, and are related to particular cultural works through the hasgenre property.

Because it is useful for a number of purposes to be able to move between broader genres and more specific ones, the ontology organizes its terms through a combination of OWL classes that provide broad groupings and SKOS taxonomic relations amongst instances. The OWL classes provide the ability to reason from narrower to broader terms, while the SKOS properties provide more limited transitive narrower/broader relationships.
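Moving from a specific genre to its broader ones can be sketched as a traversal of broader links, comparable in spirit to what skos:broaderTransitive expresses. The Python example below uses placeholder terms (`ex:sonnet`, `ex:lyric`, `ex:poetry`) that are not taken from the Genre ontology.

```python
# skos:broader links between genre instances; the terms are placeholders.
BROADER = {
    "ex:sonnet": ["ex:lyric"],
    "ex:lyric": ["ex:poetry"],
    "ex:poetry": [],
}


def broader_transitive(term, broader=BROADER):
    """Collect every term reachable by following broader links from `term`."""
    seen = []
    stack = list(broader.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # also guards against cycles in the data
            seen.append(current)
            stack.extend(broader.get(current, []))
    return seen


print(broader_transitive("ex:sonnet"))
```

A search for works in a broad genre could use the same traversal in reverse to include works tagged only with narrower terms.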

Learn more about the Genre ontology by visiting the Genre Terms and Definitions page or the Genre vocabulary page.

f. Illness & Injury Ontology

The separate Illness & Injury ontology contains a taxonomy of terms to describe health problems or causes of death in the context of historical or cultural analysis where specific information is often unavailable. It is an abbreviated version of the World Health Organization's (WHO) Startup Mortality List (ICD-10-SMoL), the WHO's simplified application of ICD-10. It was developed to add structure to the designations for cause of death and health problems represented in biographical information. ICD defines the universe of diseases, disorders, injuries and other related health conditions. These terms were based on the version of ICD-10-SMoL from June 2018.

Learn more about the Illness & Injury ontology by visiting the Illness & Injury Terms and Definitions page or the Injuries & Illnesses vocabulary page.

g. Has Functional Relation

The Functional Relation predicate indicates that the two terms may be treated as related for functions such as querying and retrieval, without asserting a semantic relationship between them. This predicate is designed to bring together incommensurate terms for processing purposes while excluding them from semantic operations. This differentiates it from, for instance, the skos:semanticRelation property and the skos:closeMatch predicate, which serve a similar purpose but assert semantic proximity.

One of the purposes for this relation is to facilitate comparisons and relationships to other ontologies and vocabularies with which users are more familiar. Use of this relationship does not assert that the two terms are not related semantically, but rather that the current semantic relationships available within OWL, SKOS, and other ontologies used by this ontology are not sufficiently nuanced to allow for a semantic relationship to be specified in a way that can be processed appropriately by other tools (such as inference engines).
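The practical effect of this distinction can be sketched in Python: functional relations are followed when broadening a search but ignored when only semantic links are wanted. The predicate spelling `cwrc:hasFunctionalRelation` and the `ex:` terms are assumptions made for the example.

```python
# Relations between terms, tagged by predicate. Identifiers are hypothetical.
RELATIONS = [
    ("ex:cwrcTerm", "skos:closeMatch", "ex:externalCloseTerm"),
    ("ex:cwrcTerm", "cwrc:hasFunctionalRelation", "ex:externalCode"),
]
SEMANTIC = {"skos:closeMatch", "skos:exactMatch", "owl:equivalentClass"}
FUNCTIONAL = {"cwrc:hasFunctionalRelation"}


def related_terms(term, relations=RELATIONS, include_functional=True):
    """Return terms linked to `term`; functional links are included only
    when the caller is doing retrieval rather than semantic reasoning."""
    allowed = (SEMANTIC | FUNCTIONAL) if include_functional else SEMANTIC
    return [o for s, p, o in relations if s == term and p in allowed]


print(related_terms("ex:cwrcTerm"))                            # for search
print(related_terms("ex:cwrcTerm", include_functional=False))  # for inference
```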

h. Linkages to external ontologies

We employ a number of strategies for linking to other ontologies. Our architecture does not typically import other ontologies wholesale, but relates to large vocabularies in defined ways. We try not to abuse sameAs predicates (Halpin, Hayes et al., 2010).

We adopt external namespaces and associated classes and terms wherever possible when they are in widespread use and their vocabularies are broadly compatible with ours, as in the case of the FOAF and BIBO vocabularies. For some terms, such as those for religious denominations or genres, we are happy to draw on other vocabularies’ terms and definitions in part or in whole, as in the case of terms from the Getty Art and Architecture Thesaurus (Getty Research Institute). Other terms are referenced, but usually at a distance rather than through wholesale import. This is particularly common in relation to cultural forms, which, as explained more fully above, are understood primarily as representational and linked, where multiple related terms exist within the ontology, to terms typed as textual labels. By means of this structure, our vocabulary positions all terms associated with processes of Cultural Form as discursive labels, retaining the ambiguity of terms implicated in the complex social construction of identities within a narrative. Cultural forms may in turn be related to external ontologies in a number of ways. If an external ontology term aligns semantically with ours, then we use OWL- or SKOS-based relationships such as <owl:equivalentClass>, <skos:narrower> or <skos:broader>. If an external term's definition or use is not commensurate with a term in the CWRC ontology but its application in external datasets is such that it will be useful nevertheless to link those terms to ours (for instance for broadening searches using the problematic ISO5218 Codes for the representation of human sexes), then the has functional relation predicate is employed to indicate that the relationship is not specified semantically but may be leveraged for processing.

At the top level, the CWRC ontology makes use of the following well known ontologies:

  1. The FOAF ontology for the representation of people and organizations.
  2. The BIBO ontology for the representation of bibliographic data.
  3. The TIME ontology for the representation of events and points in time where ISO8601/XML Schema times are not appropriate.
  4. The Web Open Annotation data model is used to link the original Orlando text to specific Contexts.
  5. The SKOS vocabulary is used to represent taxonomic relationships within certain Cultural Forms and to fully document ontology terms.
  6. Some Dublin Core vocabulary terms are used for well known documentation tags such as <dcterms:title>.
  7. The W3C Provenance ontology is used to indicate indebtedness, derivation or provenance of term descriptions as well as Cultural Context source annotations.
  8. Linkages are made to the CIDOC-CRM ontology for cultural instances that it has in common with CWRC.

Established ontologies and vocabularies are used in the definition of numerous classes and instances. For instance, the religious terms of the Getty Art and Architecture Thesaurus provide suitable definitions for many religions, as does DBpedia for many terms throughout the ontology. Sometimes definitions draw on scholarly print and online sources. Quotation marks around the text of the description indicate wholesale adoption of the source definition. Where the description is not surrounded by quotation marks, the term has been defined by the CWRC team, but links may be provided to external resources such as a scholarly article or closely related DBpedia entry.

In other cases, terms from external ontologies are adopted in CWRC datasets without having been imported into our ontology. What follows is a non-exhaustive list of such vocabularies and the classes for which they are most frequently used:

  1. Geonames terms are often used for locations and for many instances of geographic heritage.
  2. Library of Congress language codes are typically used for instances of language.

i. Labels

For labelling, CWRC utilizes two means of promoting searchability. rdfs:label represents the human-readable nomenclature for a concept, instance, or predicate. This is the terminology used when representing components of the ontology in documentation and diagrams, except where a URI is provided.

As noted above in relation to cultural form, however, when a textual label is used to type a class, this is an indication of the representationality or discursivity of that class. cwrc:TextLabels are frequently used for ambivalent, overlapping, and culturally contested terms.

In addition, to support those with knowledge of prior datasets whose strings or terms have been linked for extraction purposes to CWRC terms, the ontology provides additional linguistic context for CWRC ontology terms. Alternative labels, signified by skos:altLabel, indicate terms from source datasets that have been employed to create relationships to this concept. Alt labels typically cannot serve as replacements for rdfs:label. Within the ontology, such alternative labels primarily exist for search and retrieval by allowing ontology terms to be located under a larger number of labels. Although some reflect the idiosyncrasies of the source data, they may be useful for broadening searches.
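The role of alternative labels in search and retrieval can be sketched as an index that maps both the rdfs:label and every skos:altLabel to the same term. The terms and labels in this Python example are invented for illustration and are not drawn from the ontology.

```python
# Each term has one rdfs:label and any number of skos:altLabel values.
# The URIs and label strings below are hypothetical.
TERMS = {
    "ex:termA": {"label": "Methodist", "alt_labels": ["Wesleyan", "Methodism"]},
    "ex:termB": {"label": "teacher", "alt_labels": ["schoolmistress"]},
}


def build_index(terms):
    """Map every label and alternative label (case-insensitively) to term URIs."""
    index = {}
    for uri, names in terms.items():
        for name in [names["label"], *names["alt_labels"]]:
            index.setdefault(name.casefold(), set()).add(uri)
    return index


def search(index, text):
    """Return the URIs of all terms known under the given label."""
    return sorted(index.get(text.casefold(), set()))


index = build_index(TERMS)
print(search(index, "wesleyan"))
print(search(index, "teacher"))
```

The preferred rdfs:label is still what would be displayed; the alternative labels only widen the set of strings under which a term can be found.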

3. CWRC Ontology Design Rules

Beyond the formalism of the OWL 2 Web Ontology Language, the CWRC ontology adheres to the following design rules and styles:

  • The contents of rdfs:label tags are always in lowercase, with the following exceptions:
    • Labels for religions, political affiliations and groups of people derived from a proper name will begin with an uppercase letter.
  • Whenever possible, the original Orlando XML tag equivalent is contained within the rdf:value tag of any term within the ontology.
  • Whenever referencing a geographical location, use the most precise item within the database.
  • Definitions in French, English (and other serendipitously available languages) are never word-for-word translations; each is a definition in its own right.

4. Classes

The ontology includes several embedded taxonomies for enumerating the categories associated with certain classes (e.g., political affiliation, religion, occupation). Where possible, the taxonomies are SKOS-based or use a combined OWL and SKOS approach.

5. Properties

There are three types of properties used in the CWRC ontology: context-based predicates (which begin with c_*), simple predicates (which usually begin with has), and inverse predicates (which begin with i_* or end with Of or By).

For Writing properties (only) that are not symmetric, inverse predicates are based on the same property terms for ease of reference; when prefixed with “i_”, those properties go in the opposite direction from the predicates of the same name. Biography properties have distinct but usually similar labels for inverses of simple triples, typically differing by preposition (such as “of” or “by”).
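The naming conventions described above lend themselves to a simple heuristic check, sketched here in Python. It is a guess based on names alone, since simple predicates only usually begin with has, and the example predicate names are hypothetical.

```python
def classify_predicate(local_name):
    """Guess a predicate's type from its local name using the prefix and
    suffix conventions; returns "unknown" when no convention applies."""
    if local_name.startswith("c_"):
        return "context"
    if local_name.startswith("i_") or local_name.endswith(("Of", "By")):
        return "inverse"
    if local_name.startswith("has"):
        return "simple"
    return "unknown"


# Hypothetical predicate names used only to exercise the heuristic.
for name in ["c_occupation", "hasOccupation", "i_hasGenre", "childOf", "label"]:
    print(name, "->", classify_predicate(name))
```

The inverse check runs before the simple check so that a name beginning with has but ending in Of or By is treated as an inverse.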

Context-based predicates are those associated with the more complex Web Annotation Data Model-based structures that allow for the connection of assertions to context, provenance, and source information. These link to the snippet of text from which a predicate was derived, where available.

Simple predicates are used to construct subject-predicate-object assertions that are divorced from their context but more amenable to certain kinds of queries and visualizations such as network graphs. It is possible with SPARQL queries to extract simple predicates in conjunction with extracts for tabular output.