Introduction to the CWRC Ontology
1. Introduction
This introduction accompanies a human-readable version of the ontology (see Terms and Definitions). The ontology itself should be the primary source for understanding how the ontology works.
This document is intended for scholars who wish to understand how the ontology tackles concrete data recording problems, and for linked open data practitioners who intend to make use of this ontology.
2. Background on the Orlando source data
In 1995, the Orlando Project embarked on a history of women’s writing in the British Isles from the beginnings to the present (Brown, Clements and Grundy, 2007a; Brown, Clements and Grundy, 2007b).
This born-digital collaboration devised a knowledge representation (Brown, Clements et al., 2006) in the form of a bespoke SGML tagset to encode the project’s intellectual priorities and concepts in the text as it was being written. This tagset structures the biocritical, chronological, and bibliographical content of the resulting history of more than 8 million words and 2 million tags. The schema is the basis for the Canadian Writing Research Collaboratory’s schema for similar content, and is the foundation of this ontology. Some of the source data is produced by extraction from XML tags embedded in Orlando Project materials and from the similarly structured content within the Collaboratory (Simpson and Brown, 2013).
Orlando: Women’s Writing in the British Isles from the Beginnings to the Present (Brown, Clements et al., 2006) is published by Cambridge University Press.
The scholarly introduction and introduction to the Orlando tagset are available here: Introduction to Orlando Tagset.
Contributors to Orlando are listed here: Orlando Project Contributors.
The Orlando Project’s XML schemas and the CWRC Project’s XML schema are available on GitHub.
3. Basic ontological goals
a. Principles
The schema covers entities, classes, and relationships associated with the domains of literature and literary and cultural history as understood from an intersectional feminist perspective. The ontology design responds to the challenges of shifting from semi-structured to structured data (Smith, 2013). Although linked data triples stand on their own formally, many are derived from discursive prose and are best read in an environment that links back to their original context. The CWRC ontology design avoids representing RDF extractions from Orlando data as positivist assertions, and yet produces machine-readable OWL/RDF-compliant graph structures. It allows references to, without endorsing, external ontological vocabularies that are nevertheless part of documenting cultural processes and identities.
b. Competency Questions
Competency questions are meant to provide a sense of scope to an ontology. They can serve a number of purposes, including giving users a sense of what kind of information they might find in datasets that employ the ontology, and giving the ontology developers criteria against which to measure the success of the ontology. The CWRC ontology represents a wide range of information about writers’ lives, literary careers, and literary works. Moreover, as with other humanities data, this information may be put to a wide range of uses, many of which will not be foreseeable. For instance, the nineteenth-century novels of Susanna Moodie have been searched for evidence of specific weather events by researchers into climate change. The list below is therefore not exhaustive, but it should give some sense of the range of questions we would expect the ontology to be able to address. The datasets represented by this ontology also cannot be comprehensive. The unevenness of the archival and published record, together with the necessarily selective and variously prioritized ways in which the information has been collected and recorded, means that any sense of statistical significance or representativeness related to the kind of data for which this ontology is designed must be highly qualified and contextualized.
Biography-based questions
- What people are known to have attended school in a certain city over a particular period of time?
- What British authors attended the same schools as each other?
- What writers were taught by or schooled alongside another woman writer?
- Who is recorded to have died from a particular cause of death in a particular time period?
- What family members is a particular person recorded to have had, and how were they related?
- Which queer/lesbian identified authors are recorded as having attended single-sex institutions?
Cultural Formation
- What people were identified with a particular race, colour, or nationality?
- What women during the Victorian period were associated with multiple nationalities?
- What writers had some form of Jewishness in common?
- What British writers were associated with both Protestantism and Catholicism in the nineteenth century?
- What literary texts engage with a particular religion or denomination?
- What is the breakdown by the different genders represented in this dataset of novels published during a particular period?
- Which writers are associated with a particular political affiliation?
- Are writers more likely to be associated with gender-related causes at particular points in history?
Human Relationships
- Does a connection exist between two particular people?
- How close is the connection? Is it asserted frequently in the data, as opposed to occurring only once or twice?
- What types of connections to other people does a particular person have?
- What family ties does this person have?
- If two people are not connected, what is the shortest path between them via relationships with other people, or with other entities such as organizations or texts?
- What connections exist between a set of people during a specified period in time?
- How many people cite a particular author as influential in their own work?
- How many writers are related to a particular organization? More specifically, which feminist organizations were supported by two or more generations of writers from a single family?
- Who are all the people noted here with whom this author collaborated professionally (editor to writer relationships; author to author; editor to editor, etc.)?
- Who had a relative involved in professional publishing spheres?
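The shortest-path question in the list above can be sketched as a breadth-first search over simple triples. This is only an illustration under stated assumptions: every identifier (`ex:AuthorA`, `ex:friendOf`, and so on) is a hypothetical placeholder rather than a term or instance from the CWRC data, plain Python tuples stand in for RDF triples, and relationships are treated as undirected.

```python
from collections import deque

# Hypothetical simple triples (subject, predicate, object); every identifier
# here is an invented placeholder, not a term or instance from the CWRC data.
TRIPLES = [
    ("ex:AuthorA", "ex:friendOf", "ex:AuthorB"),
    ("ex:AuthorB", "ex:memberOf", "ex:SocietyX"),
    ("ex:AuthorC", "ex:memberOf", "ex:SocietyX"),
    ("ex:AuthorC", "ex:reviewed", "ex:TextT"),
    ("ex:AuthorD", "ex:wrote", "ex:TextT"),
]


def build_adjacency(triples):
    """Index triples as an undirected graph: node -> list of (predicate, neighbour)."""
    adjacency = {}
    for subject, predicate, obj in triples:
        adjacency.setdefault(subject, []).append((predicate, obj))
        adjacency.setdefault(obj, []).append((predicate, subject))
    return adjacency


def shortest_path(triples, start, goal):
    """Return the shortest chain of nodes from start to goal, or None if unconnected."""
    adjacency = build_adjacency(triples)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _predicate, neighbour in adjacency.get(path[-1], []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Two people with no direct tie, linked through an organization and a text.
path = shortest_path(TRIPLES, "ex:AuthorA", "ex:AuthorD")
print(path)
```

Because the search passes through organizations and texts as well as people, it covers both halves of the question. In practice such a search would more likely run against a SPARQL endpoint or a graph library than against an in-memory list.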
Clustering/Networking People
- What authors are most interconnected with other authors in terms of their influence?
- Can we identify clusters of writers who seem to be operating as a community in terms of having a tight network of friendships, literary relationships, use of the same publishers, reviewing each other’s works, etc.?
- Can we identify individuals who were key connections between different groups?
- Whose work was influenced by British and/or international writers of colour?
- Who was involved in both feminist groups and animal welfare activism?
- Who was in touch with non-literary artistic groups?
Texts/Works based questions
- What books were important to this author’s education?
- What reviews exist for a particular book?
- In what languages has a particular work been published?
- Is there any acknowledged intertextual relation between X and Y?
- In which journals does a particular author’s work appear?
- How many intertextual relationships does an author have to female-authored literary works?
- Find all the responses to this book that are deemed to be gendered.
- Which works are represented as the most translated?
- Find particular themes and topics in texts, such as which works of the imagination contain depictions of women’s colleges? Which depict political organizations?
- Which authors wrote for the same journal and in the same time period?
- Which fictional works allude to a particular type of activism?
- Are there references to fictional works in this author’s non-fictional work?
- Which European fictional texts are set outside of Europe?
- Who destroyed her own works? Whose works were destroyed by others?
- Which works seem to have been influenced by certain theorists or philosophers?
Geography based questions
- Which texts were or were not published in a particular country?
- Which texts were or were not reviewed in a particular country?
- In which cities or nations did a particular author reside?
- Which cities or nations are depicted or discussed in an author’s work?
- In what locations has a particular play been performed over time?
- Which works were written during travels?
- Which texts were published or otherwise shared in countries outside of Europe? Which texts were reviewed in countries outside of Europe?
Time- (and Event-) Related Questions
- What are the most discussed texts of a particular temporal period within this dataset?
- Trace the impact of a particular text through time and space.
- What is the relative rise or fall of a writer’s reputation over time, in relation to other writers in the period?
- What events in this person’s life were related to aspects of social identity such as religion, social class, or political affiliation?
- What changes over time are recorded in the frequency of the kinds of relationships that the data describes, across numerous writers? E.g. Does this dataset record greater degrees of intertextuality with male writers or female writers, relatively speaking, at different points in time?
- What major social or historical events and developments are reflected in the literary record?
- Can we target exploration of the data at particular temporal periods, such as the Victorian period?
- Which authors are likely to have known each other, due to overlapping chronologies, locations, and other connections in common?
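The last question above, on authors who may have known each other, can be approximated by testing for overlapping residence intervals in the same place. The sketch below assumes a hypothetical flattened record of (person, place, start year, end year); these records and identifiers are invented for illustration and do not reflect how the ontology actually models residence or dates.

```python
from itertools import combinations

# Hypothetical residence records: (person, place, start_year, end_year).
# None of these values come from the CWRC data.
RESIDENCES = [
    ("ex:AuthorA", "ex:London", 1850, 1860),
    ("ex:AuthorB", "ex:London", 1858, 1870),
    ("ex:AuthorC", "ex:London", 1861, 1865),
    ("ex:AuthorD", "ex:Edinburgh", 1855, 1859),
]


def overlaps(a_start, a_end, b_start, b_end):
    """True when two closed year intervals share at least one year."""
    return a_start <= b_end and b_start <= a_end


def possible_acquaintances(residences):
    """Return sorted pairs of distinct people who lived in the same place at the same time."""
    pairs = set()
    for first, second in combinations(residences, 2):
        person_a, place_a, start_a, end_a = first
        person_b, place_b, start_b, end_b = second
        if (
            person_a != person_b
            and place_a == place_b
            and overlaps(start_a, end_a, start_b, end_b)
        ):
            pairs.add(tuple(sorted((person_a, person_b))))
    return sorted(pairs)


candidates = possible_acquaintances(RESIDENCES)
print(candidates)
```

Co-presence only suggests that an acquaintance was possible. As the question itself notes, other connections in common would be needed to strengthen the inference.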
Complex questions
In many cases the ontology will play a part in investigating a more complex question or as a component of a larger hermeneutical process. For instance:
- Let me compare the publication patterns of writers, distinguishing by gender and by the number of children that they had. Looking at it over time, does their rate of literary productivity increase or decrease in relation to the number of children they have?
- Show all the elements of both self-taught and formal education (books, subjects, instructors) that are also alluded to in a writer’s works.
- Trace the impact of developments in writing, such as the emergence of a particular theme or formal feature, to a larger social development.
- Test claims about the rise of genres or literary movements and see how they look when inflected by a dataset focused on women’s writing.
c. Anticipated tools and functionalities
Also relevant to the structure of the ontology are the kinds of tools and functionalities that it aims to support. These are:
- Searches through SPARQL queries;
- Browsing, including faceting according to various criteria based on the ontology, including temporal periods, geographical locations, or the properties of writers;
- Linking to our instance data by way of their URIs;
- Discovery of significant information about instances through dereferenceable web pages;
- Discovery of materials across the web that reference instances or other components of the ontology;
- Graph Visualization of the structure of the ontology, including the properties and relationships it contains;
- Network Visualization of the relationships between people and other people, and influence and relationship graphs showing connections between people and other entities such as books, indicating the directionality of relationships where appropriate;
- Mapping of components of the data associated with geospatial information;
- Timelines of components of the data associated with temporal information;
- Use of SHACL rules and other logical inferencing tools to check for data errors, omissions, and inconsistencies;
- Use of SHACL rules and other inferencing tools to derive new information from the combination of existing data and the ontologies;
- Expose the unevenness of datasets by enabling the tracking of sources, provenance, and degrees of certainty in order to provide insight into gaps in the knowledge base;
- Expose conflicts, contradictions, and outliers within datasets as a basis for inquiry.
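To make the data-checking item in the list above concrete, the following sketch imitates one kind of constraint that SHACL can express, a minimum cardinality (`sh:minCount 1`) on a property for every instance of a class. It is plain Python standing in for a SHACL shape and validator, not SHACL itself, and the class and property names (`ex:Person`, `ex:hasName`) are hypothetical.

```python
# Hypothetical simple triples; plain tuples stand in for RDF data.
TRIPLES = [
    ("ex:AuthorA", "rdf:type", "ex:Person"),
    ("ex:AuthorA", "ex:hasName", "Author A"),
    ("ex:AuthorB", "rdf:type", "ex:Person"),
    ("ex:TextT", "rdf:type", "ex:Text"),
]


def missing_required(triples, target_class, required_predicate):
    """List instances of target_class that have no value for required_predicate."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == target_class}
    satisfied = {s for s, p, _o in triples if p == required_predicate}
    return sorted(instances - satisfied)


# ex:AuthorB is typed as a person but has no name recorded.
violations = missing_required(TRIPLES, "ex:Person", "ex:hasName")
print(violations)
```

A report of this kind points to omissions in the data without asserting anything new, which matches the error-checking use described above rather than the inference use.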
d. Provenance and contexts
It is essential for research data to provide context and to retain provenance, such as links to source data, information sources, or how that data has been produced or manipulated. It is also often desirable to represent more nuanced or complicated interconnections among entities, or geospatial and temporal relationships, that cannot be asserted through a single triple. In both cases, what is required is a set of interlinked triples that are carefully structured to represent the complexity of the information.
In many research contexts, however, linked open data is most immediately usable by human beings in the form of simple subject-predicate-object statements, such as assertions of a friendship or an intertextual connection between two individuals. These kinds of graphs are readily analyzed and visualized as part of the research process.
The CWRC ontology addresses these two requirements by providing two types of triples (simple and complex) for most relationships, and by using the Web Annotation standard to link our assertions to their sources.
Simple triples include a subject, a predicate, and an object without context. Complex triples (or contextual triples) are a way to introduce context and provenance into triples to represent more complex, nuanced information (see CWRC Ontological Structures in Terms and Definitions for more information).
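The difference between the two kinds of triples can be sketched as follows. This is a minimal illustration and not the ontology's actual structure: the annotation uses keys from the W3C Web Annotation model (`type`, `motivation`, `body`, `target`, `TextualBody`), while the `assertion` key, the `ex:` identifiers, and the snippet text are hypothetical additions made only for this example.

```python
import json


def contextual_annotation(annotation_id, triple, snippet, source):
    """Wrap a simple triple in a Web Annotation-shaped record carrying its context."""
    subject, predicate, obj = triple
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "describing",
        # The human-readable snippet from which the assertion was extracted.
        "body": {"type": "TextualBody", "value": snippet, "format": "text/plain"},
        # The source document that the snippet comes from.
        "target": source,
        # Hypothetical key (not part of the Web Annotation model) restating the triple.
        "assertion": {"subject": subject, "predicate": predicate, "object": obj},
    }


# A simple triple: subject, predicate, object, with no context attached.
simple = ("ex:AuthorA", "ex:friendOf", "ex:AuthorB")

# The same assertion with context and provenance attached.
annotation = contextual_annotation(
    "ex:annotation1",
    simple,
    "Placeholder sentence from the source prose describing the friendship.",
    "ex:sourceDocument1",
)
print(json.dumps(annotation, indent=2))
```

The simple form is convenient for analysis and visualization, while the annotated form keeps the link back to the discursive source.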
Some data associated with this ontology has been generated from XML structures (Simpson and Brown, 2013). Provenance is thus particularly important, given that such data was not originally produced in RDF but rather in the form of tags embedded in a discursive context. In such cases, the relevant portions of the text are provided in the form of snippets, which within the dataset become instances of contextual notes or human-readable annotations to which the dataset nodes are directly tied.
The wholesale import of entire vocabularies into the CWRC ontology was likely to cause logical and ontological problems. We therefore opted not to use the <owl:imports> construct and instead either to link to vocabularies externally or to clone specific sets of terms from selected vocabularies. Similarly, not all vocabularies are well-defined from an ontological standpoint, but drawing from their narrative or some of their properties proved useful. For this reason, we avoided the use of <owl:sameAs> so as not to bring unintended properties or ontological structures into the CWRC ontology. In other cases, the Provenance ontology property <prov:wasDerivedFrom> is used to indicate that the term was constructed using information from other terms without necessarily being equivalent. Direct linkages to other ontologies are usually made through the use of subclasses or <owl:equivalentClass>.
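The linking policy described above can be pictured as a small audit over the triples that connect local terms to external vocabularies. In this sketch the term names (`ex:LocalTerm`, `ext:SourceTerm`, and so on) are hypothetical, and the audit function is an illustration of the policy, not part of the project's tooling.

```python
# Predicates the policy above avoids when linking to external vocabularies.
AVOIDED_PREDICATES = {"owl:imports", "owl:sameAs"}

# Hypothetical linkage triples between local terms and external vocabularies.
LINKS = [
    ("ex:LocalTerm", "prov:wasDerivedFrom", "ext:SourceTerm"),
    ("ex:LocalClass", "rdfs:subClassOf", "ext:BroaderClass"),
    ("ex:OtherClass", "owl:equivalentClass", "ext:MatchingClass"),
]


def audit_links(triples):
    """Return the triples whose predicate the linking policy avoids."""
    return [triple for triple in triples if triple[1] in AVOIDED_PREDICATES]


flagged = audit_links(LINKS + [("ex:LocalTerm", "owl:sameAs", "ext:SourceTerm")])
print(flagged)
```

Here `prov:wasDerivedFrom` records that a local term draws on an external one without claiming equivalence, whereas `owl:sameAs` would assert that the two are the same resource and so merge their properties.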
e. Cultural diversity
Cultural diversity has been an increasing source of debate beyond and within the digital humanities community. The concentration within the Debates in Digital Humanities series (Gold, 2012; Gold and Klein, 2016) of pieces reflecting the increasing prominence of matters related to race, gender, cultural diversity, and difference is but one marker of the extent to which diversity matters. This ontology seeks to convey an intersectional understanding of identity categories, as instantiated in The Orlando Project’s XML Biography schema.
The Cultural Form portion of the ontology recognizes categorization as endemic to social experience, while incorporating variation in terminology and the contextualization of identity categories. It understands social classification as culturally produced, intersecting, and discursively embedded. We invoke categories as the grounds for cultural investigation rather than fixed classifications, since such categories have never been stable or mutually exclusive (Algee-Hewitt, Porter, and Walser, 2016). For a more detailed explication of cultural formation, see Brown et al., 2017.
4. Notes on SKOS and OWL
The W3C SKOS (Simple Knowledge Organization System) is widely used for semantic web data. It provides structured taxonomies in RDF without requiring reasoner support. SKOS terms are used within this ontology to link terms to each other, although such links lack the expressiveness enabled by OWL relationships. OWL is the preferred means of using this ontology, but SKOS terms have been used where possible to support its use as a SKOS vocabulary.
A parallel SKOS CWRC vocabulary is available through the Linked Infrastructure for Networked Cultural Scholarship (LINCS).
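As a rough illustration of using a SKOS taxonomy without a reasoner, the sketch below walks `skos:broader` links directly to collect all of a concept's ancestors. The result corresponds to what `skos:broaderTransitive` would entail. The concept names are hypothetical placeholders, and a dictionary stands in for the RDF graph.

```python
# Hypothetical skos:broader links: concept -> list of directly broader concepts.
BROADER = {
    "ex:NarrowConcept": ["ex:MidConcept"],
    "ex:MidConcept": ["ex:TopConcept"],
    "ex:SiblingConcept": ["ex:TopConcept"],
}


def broader_closure(concept, broader):
    """Collect every ancestor reachable through skos:broader, guarding against cycles."""
    seen = []
    stack = list(broader.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(broader.get(current, []))
    return seen


ancestors = broader_closure("ex:NarrowConcept", BROADER)
print(ancestors)
```

A simple traversal like this is enough for browsing or faceting by category. Richer inferences, such as those that depend on class equivalence or property restrictions, are where the OWL relationships become necessary.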
5. Current State and Collaboration
This dynamic ontology was constructed to support the efforts of the CWRC project while enabling its stand-alone use by outside projects. The basic structure of this ontology should be considered stable. Like other CWRC ontologies, however, this is a living ontology: it will continue to be developed, expanded, and revised, and potentially broken up into modules, as we discover the implications of how we have structured it through using it to extract and explore our data, and as fresh data, use cases, understandings, and debates necessitate expansion or refinement. Continuity is ensured using the OWL ontology annotations for ontological compatibility and for deprecated classes and properties. Deprecated ontology terms remain present but are marked as such.
The ontology is understood to be a living document that makes no claims to completeness. Instances have been derived from particular datasets and will be expanded progressively over time.
The ontology includes several embedded taxonomies for enumerating the categories associated with certain classes (e.g., political affiliation, religion, occupation). Where possible, the taxonomies are SKOS-based or use a combined OWL and SKOS approach. See “Classes” on the Terms and Definitions page for more details.
SKOS vocabularies of the CWRC Ontologies have been created and are available through the LINCS vocabularies site.
We welcome suggestions for new classes, properties, and predicates from those wishing to use the ontology for their own datasets, as well as suggestions related to the complexity of vocabularies associated with existing terms. Please submit suggestions via an issue or a pull request to the CWRC Ontology code repository.
6. Version History
- 0.99 - Initial public release.
- 0.99.2 - Periodic release with updated logos, genres, documentation, and proper masthead data.
- 0.99.6 - Periodic release with updated styling, competency questions, and documentation regarding events and changesets.
- 0.99.75 - Periodic release.
- 0.99.80 - Periodic release with addition of occupations, educational award types, and education credentials.