Introduction to Semantic Editing

Jeffrey C. Witt (Loyola University Maryland) | @jeffreycwitt

Harvard University, September 29, 2017

# Plan

1. Introductions
2. Semantic Encoding vs. Visual Encoding
3. The Basic Rules of XML
4. TEI and Why We Need It
5. Collaboration with Git and GitHub
6. Hands on Work

## Visual versus Semantic Encoding
#### Activity

1) With a pen or pencil, draw a box around examples of these data types and give each data type a “label”.
2) Figure out how many different kinds of data are on this page.
3) Identify or explain how these data types are communicated to you as a human reader.
4) Identify what kinds of relationships between data types are communicated and how this is communicated.

Question: Were there any data types or relationships presented in ambiguous or unclear ways? What made these features ambiguous?

## What is XML and How to Use It
## What is XML

XML stands for eXtensible Markup Language, and it is a specific technology or tool designed to allow us to semantically describe a text, document, or any kind of structured information.

## Advantages of XML

1. It is a well-established technology with a long history.
2. It is well supported.
3. It is system independent.

Basic Rules of XML

XML semantically identifies pieces of data using opening and closing tags which are enclosed with angle brackets.

<div>This is a division</div>

<p>This is a paragraph</p>

Tags cannot be cross-nested.


<div>Overlap is <emphasis>not allowed!</div></emphasis>


<div>This is <emphasis>correct</emphasis></div>
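One way to see this rule in action is to hand both snippets to an XML parser. The following is a minimal sketch using Python's standard-library `xml.etree.ElementTree`; the helper name `is_well_formed` is made up for this illustration and is not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(snippet: str) -> bool:
    """Return True if the parser accepts the snippet as well-formed XML."""
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        # Raised for cross-nested tags, missing closing tags, and similar errors.
        return False


overlapping = "<div>Overlap is <emphasis>not allowed!</div></emphasis>"
nested = "<div>This is <emphasis>correct</emphasis></div>"

print(is_well_formed(overlapping))  # False
print(is_well_formed(nested))       # True
```

The parser rejects the first snippet because `</div>` arrives while `<emphasis>` is still open.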

Tags must be either siblings of another set of tags or children of a set of tags.

In XML talk, a matched pair of tags together with its content is called an 'element'.

In Aristotle talk, elements are like subjects that can take on differentiating accidents.

In addition to containing text and other elements, elements can take on "attributes".

<quote type="paraphrase">This is a paraphrased quotation</quote>

<div type="articulus">This is an article division </div>
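Attributes are what make this kind of markup useful to software as well as to human readers. As a small sketch, again assuming Python's standard-library `xml.etree.ElementTree`, a program can read back both the element name and its attribute:

```python
import xml.etree.ElementTree as ET

quote = ET.fromstring('<quote type="paraphrase">This is a paraphrased quotation</quote>')
div = ET.fromstring('<div type="articulus">This is an article division</div>')

# The element name says what kind of thing this is ...
print(quote.tag)          # quote
# ... and the attribute narrows it down further.
print(quote.get("type"))  # paraphrase
print(div.attrib)         # {'type': 'articulus'}
```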

All elements must be contained inside one Root Element

The result of following these rules is a document whose content is nicely organized into a tree structure.
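The tree structure can be made visible with a few lines of code. This is only an illustrative sketch: the sample document and the helper name `outline` are invented for the example, and the parsing uses Python's standard-library `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

document = """
<div type="articulus">
  <head>A heading</head>
  <p>A paragraph with a <quote type="paraphrase">paraphrased quotation</quote>.</p>
</div>
"""


def outline(element, depth=0):
    """Return one indented line per element, mirroring the document tree."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(document)
print("\n".join(outline(root)))

# Two top-level elements break the single-root rule, so parsing fails.
try:
    ET.fromstring("<p>One</p><p>Two</p>")
    single_root_enforced = False
except ET.ParseError:
    single_root_enforced = True
print(single_root_enforced)  # True
```

Each level of indentation in the printed outline corresponds to one level of nesting in the markup.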

Summary of Basic Rules

## What Is TEI and Why Do We Need It?

Question: Where do the element and attribute names come from?

How do we know what elements or attributes we can use?

The X in XML stands for eXtensible

This means that XML actually does not specify any set of tags or element names. Anyone can make up their own set of elements and use them however they like.

Extensibility provides a lot of flexibility

It allows different industries and fields to create tags that meet their needs and their data.

Extensibility can also cause confusion

If everyone can just make up their own tags, we can create confusion about what different tags mean and the datatypes they are encoding.

Someone might choose to tag something as <paragraph> and another person might choose to tag something as <para> and a third person might choose <p>.
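The practical cost of this is easy to demonstrate. In the sketch below, three hypothetical editors (the names `editor_a`, `editor_b`, and `editor_c` are invented) encode the same two paragraphs, and a tool that only knows about `<p>` is asked to count them. It uses Python's standard-library `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Three hypothetical editors encode the same two paragraphs with different names.
editions = {
    "editor_a": "<text><paragraph>One</paragraph><paragraph>Two</paragraph></text>",
    "editor_b": "<text><para>One</para><para>Two</para></text>",
    "editor_c": "<text><p>One</p><p>Two</p></text>",
}

# A tool written for one vocabulary silently misses the others.
counts = {
    name: len(ET.fromstring(xml).findall("p"))
    for name, xml in editions.items()
}
print(counts)  # {'editor_a': 0, 'editor_b': 0, 'editor_c': 2}
```

No error is raised for the first two editions; the paragraphs are simply not found.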

How do we avoid this confusion?

Enter TEI

TEI: a predefined set of elements designed by and for humanities scholars and textual editors

TEI is designed to privilege the descriptive markup of a text over presentational markup

About TEI and the TEI Guidelines


Explore the TEI Guidelines

Question: What element (or elements) should I use to identify the manuscript I'm transcribing?

Question: What element should I use to encode a new page, a new column, or a new line in a manuscript?

Question: What element (or elements) should I use to record a scribal correction?

The Core Structure of a TEI Document


<TEI>
  <teiHeader>The place for information about the author, the editor, publisher, sources used, and other types of metadata</teiHeader>
  <text>
    <body>The place for division and paragraph elements and of course the text itself</body>
  </text>
</TEI>
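As a rough sketch of how this skeleton looks to a program, the Python below parses a stripped-down document and lists its top-level parts. It assumes the standard TEI wrappers: a `<TEI>` root element in the TEI namespace and a `<text>` element around `<body>`. The skeleton is well-formed XML but deliberately not a complete, valid TEI file, and the helper name `local_name` is invented for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

skeleton = f"""
<TEI xmlns="{TEI_NS}">
  <teiHeader>Metadata goes here</teiHeader>
  <text>
    <body>Divisions, paragraphs, and the text itself go here</body>
  </text>
</TEI>
"""


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts on element names."""
    return element.tag.split("}")[-1]


root = ET.fromstring(skeleton)
body = root.find("tei:text/tei:body", NS)

print(local_name(root))                       # TEI
print([local_name(child) for child in root])  # ['teiHeader', 'text']
print(local_name(body))                       # body
```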

The Core Structure of the TEI Header


<title>Title of Document</title>
<p>Simple statement about publisher here</p>
<p>Description of source being represented in TEI</p>
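In a full TEI file these three pieces normally sit inside wrapper elements. The sketch below assumes the usual arrangement from the TEI Guidelines, with `<fileDesc>` containing `<titleStmt>`, `<publicationStmt>`, and `<sourceDesc>`, and shows how a program could then pull out the title and the source description. It uses Python's standard-library `xml.etree.ElementTree`; the sample values are the placeholders from above.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

header = """
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>Title of Document</title>
    </titleStmt>
    <publicationStmt>
      <p>Simple statement about publisher here</p>
    </publicationStmt>
    <sourceDesc>
      <p>Description of source being represented in TEI</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
"""

root = ET.fromstring(header)

# Each piece of metadata has a predictable path, so it can be looked up by name.
title = root.findtext("tei:fileDesc/tei:titleStmt/tei:title", namespaces=NS)
source = root.findtext("tei:fileDesc/tei:sourceDesc/tei:p", namespaces=NS)

print(title)   # Title of Document
print(source)  # Description of source being represented in TEI
```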

The Core Prose Structure


<body>
  <div>
    <head>header of section</head>
    <p>Text goes here</p>
    <p>More text goes here</p>
  </div>
  <div>
    <head>header of section</head>
    <p>Text goes here</p>
  </div>
</body>
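Because headings and paragraphs are labelled rather than merely styled, software can work with them directly. The following sketch assumes that each section is wrapped in its own `<div>` inside `<body>`, which is the usual TEI arrangement, and builds a simple list of section headings with paragraph counts. The section titles are invented, and the code uses Python's standard-library `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

body = """
<body xmlns="http://www.tei-c.org/ns/1.0">
  <div>
    <head>First section</head>
    <p>Text goes here</p>
    <p>More text goes here</p>
  </div>
  <div>
    <head>Second section</head>
    <p>Text goes here</p>
  </div>
</body>
"""

root = ET.fromstring(body)

# For every division, record its heading and how many paragraphs it contains.
sections = [
    (div.findtext("tei:head", namespaces=NS), len(div.findall("tei:p", NS)))
    for div in root.findall("tei:div", NS)
]
print(sections)  # [('First section', 2), ('Second section', 1)]
```

The same markup could just as easily drive a table of contents or a word count per section, which is the payoff of descriptive encoding.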
# Collaboration with GitHub and Git

* Git is version control software.
* GitHub is a collaboration platform that relies on Git.

## Git Branches

Git and GitHub allow us all to work on the same file system and then seamlessly bring our collective work together, while at the same time tracking every change in a version history. A key component of this workflow is the concept of branches. Each user can create a "branch" (or copy) of a "repository". They can then make changes to their own branch without fear of overwriting anyone else's changes. However, because these branches retain a connection to the main "stem", an administrator can easily view all these branches and merge them together.

## TEI Web Editor

The TEI Web Editor is a simplified editor that tries to make it as easy as possible for an editor to create a branch, get to work, and submit their changes.

## Hands On Work

Some questions to consider while you work:

* Am I engaging in descriptive markup or presentational markup?
* What TEI elements best express the underlying logic of my texts (rather than merely how I want them to look)?
* What special features of my texts do I want to record?
* What unique encoding challenges do my documents present?
* How can I make sure I am recording this information in the same way as my other team members?

#### Additional Resources

Resources for Diplomatic Transcriptions

- TEI Guidelines
- EpiDoc
- LombardPress

Resources for Critical Transcriptions

- TEI Guidelines
- LombardPress

General Guides

- The Official Guidelines
- Tutorials
- A simple conversion tool