The Scholastic Commentaries and Texts Archive: Aspirations, Challenges, and Attempted Solutions


Jeffrey C. Witt (Loyola University Maryland) | @jeffreycwitt


March 16, 2017

Slide Deck: http://lombardpress.org/slides/2017-03-16-scta-data-sharing/

Today, scholars around the world are building editions, translations, and commentaries for the world's cultural materials.
Without common standards, this data will be siloed in a scholarly print edition or a closed, one-off website.
At the same time, hundreds of world libraries are cataloguing, digitizing, and publishing manuscripts in isolation from scholars and from other libraries with related witnesses.
An example of why this is problematic
Numbering Mistakes in Aargauer Kantonsbibliothek, MsWettF 15
![https://s3.amazonaws.com/lum-faculty-jcwitt-public/rothwellcommentary-d3-mistake.png](https://s3.amazonaws.com/lum-faculty-jcwitt-public/rothwellcommentary-d3-mistake.png)
![https://s3.amazonaws.com/lum-faculty-jcwitt-public/rothwellcommentary-d49-mistake.png](https://s3.amazonaws.com/lum-faculty-jcwitt-public/rothwellcommentary-d49-mistake.png)
In theory, the fact that both of these manuscripts are published via the IIIF API makes them comparable. In practice, these kinds of organizational mistakes make their comparison **extremely arduous**.
Unlike with pictures, finding the same place in a text without accurate navigational metadata is extremely difficult. Fortunately, scholars and subject specialists are already undertaking this arduous work of alignment. Unfortunately, this metadata is getting siloed in journal articles or the front matter of editions. Thus, each person who comes to the texts anew must repeat the task of collation, rather than being able to build on the work already done. **The SCTA hopes to change this.**
With the right workflow and with common standards (as well as the requisite political will), the information scholars are **ALREADY** producing in Word documents can be shared and re-used throughout the world.
We hope for something like this: ![https://s3.amazonaws.com/lum-faculty-jcwitt-public/map-slide-image.png](https://s3.amazonaws.com/lum-faculty-jcwitt-public/map-slide-image.png)
Main challenges we've had to overcome to be able to collect, create, and publish this information
#### Text Structural Metadata
* Collected as XML
* Versioned via Git
* Converted to RDF (according to the SCTA text ontology) via the SCTA build script
* Served over the web via Apache Jena Fuseki
* Dynamically converted to IIIF ranges on demand via a Ruby Sinatra app
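
The last step of this pipeline, turning structural metadata into IIIF ranges, can be sketched roughly as follows. This is a minimal illustration in Python, not the SCTA's Sinatra code: the `build_ranges` function, the shape of the `divisions` input, and every `example.org` URI are assumptions made for the example. The output follows the `structures` layout of the IIIF Presentation API 2.x.

```python
import json

# Hypothetical base URI; real SCTA identifiers will differ.
BASE = "http://example.org/iiif/demo-commentary"


def build_ranges(base, title, divisions):
    """Return a IIIF Presentation 2.x `structures` list for one text.

    `divisions` is a list of dicts with keys "id", "label", and "canvases".
    This shape is an assumption standing in for the result of an RDF query.
    """
    # The top range lists its child ranges by URI.
    top = {
        "@id": f"{base}/range/top",
        "@type": "sc:Range",
        "label": title,
        "viewingHint": "top",
        "ranges": [f"{base}/range/{d['id']}" for d in divisions],
    }
    # Each child range points at the canvases on which that division appears.
    children = [
        {
            "@id": f"{base}/range/{d['id']}",
            "@type": "sc:Range",
            "label": d["label"],
            "canvases": list(d["canvases"]),
        }
        for d in divisions
    ]
    return [top] + children


# Invented sample data: two divisions of a text and the canvases they span.
divisions = [
    {"id": "d1", "label": "Distinctio 1",
     "canvases": [f"{BASE}/canvas/1r", f"{BASE}/canvas/1v"]},
    {"id": "d2", "label": "Distinctio 2",
     "canvases": [f"{BASE}/canvas/1v"]},
]

structures = build_ranges(BASE, "Demo Commentary", divisions)
print(json.dumps(structures, indent=2))
```

Because the ranges are generated from the text metadata rather than stored in each library's manifest, a numbering mistake only has to be corrected once, in the metadata, to be corrected for every viewer.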
#### Transcriptions
* Transcribed as TEI XML according to a customized and tightly restricted TEI schema: [https://github.com/lombardpress/lombardpress-schema/tree/release/1.0.0](https://github.com/lombardpress/lombardpress-schema/tree/release/1.0.0)
* Versioned via Git
* Metadata and source location converted to RDF
* Served over the web via Apache Jena Fuseki
* Metadata used to retrieve text
* Transcription retrieved EITHER directly from GitHub and converted to a IIIF annotationList
* OR retrieved from our (in development) eXist-db text service, which returns text already converted to a IIIF annotationList
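
As a rough illustration of the TEI-to-annotationList conversion, the sketch below parses a tiny invented TEI fragment and wraps each paragraph in a IIIF Presentation 2.x annotation. It is not the SCTA implementation: the `tei_to_annotation_list` function, the sample text, the `example.org` URIs, and the simplification of placing every paragraph on a single canvas are all assumptions for the example.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample transcription, far simpler than a real edition.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div>
      <p xml:id="p1">Primum   exemplum <hi>textus</hi>.</p>
      <p xml:id="p2">Secundum exemplum.</p>
    </div>
  </body></text>
</TEI>"""


def tei_to_annotation_list(tei_xml, list_id, canvas_id):
    """Wrap each TEI <p> in an annotation targeting one canvas.

    Assumption: all paragraphs sit on the same canvas. A real service would
    look up the canvas for each passage from the folio-canvas metadata.
    """
    root = ET.fromstring(tei_xml)
    resources = []
    for p in root.iter(f"{{{TEI_NS}}}p"):
        # Flatten inline markup and normalise whitespace.
        text = " ".join("".join(p.itertext()).split())
        resources.append({
            "@id": f"{list_id}/{p.get(XML_ID)}",
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "resource": {
                "@type": "cnt:ContentAsText",
                "format": "text/plain",
                "chars": text,
            },
            "on": canvas_id,
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": list_id,
        "@type": "sc:AnnotationList",
        "resources": resources,
    }


annotation_list = tei_to_annotation_list(
    SAMPLE_TEI,
    "http://example.org/iiif/demo/list/1r",    # hypothetical list URI
    "http://example.org/iiif/demo/canvas/1r",  # hypothetical canvas URI
)
print(json.dumps(annotation_list, indent=2))
```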
#### Search Service
* Transcriptions imported into eXist-db and converted directly to a IIIF search annotationList
#### Folio and Canvas metadata
* Manuscript folio and canvas id associations collected via XML
* Versioned via Git
* Converted to RDF (according to the SCTA text ontology) via the SCTA build script
* Served over the web via Apache Jena Fuseki
* Canvas metadata used in all of the above conversions to IIIF
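
A folio-canvas association is, at bottom, a statement linking a folio identifier to a canvas URI. The sketch below shows one way such statements could be read from XML and serialized as N-Triples. The XML layout, the `hasCanvas` predicate, and all `example.org` URIs are hypothetical stand-ins; the actual SCTA source format and ontology terms are not shown in these notes.

```python
import xml.etree.ElementTree as ET

# Invented input format: one element per folio, carrying its canvas URI.
SAMPLE_XML = """<folios manuscript="http://example.org/ms/demo">
  <folio n="1r" canvas="http://example.org/iiif/demo/canvas/1r"/>
  <folio n="1v" canvas="http://example.org/iiif/demo/canvas/1v"/>
</folios>"""

# Hypothetical predicate, not a term from the SCTA ontology.
HAS_CANVAS = "http://example.org/ns#hasCanvas"


def folio_canvas_triples(xml_text):
    """Return (subject, predicate, object) tuples, one per folio."""
    root = ET.fromstring(xml_text)
    manuscript = root.get("manuscript")
    return [
        (f"{manuscript}/{folio.get('n')}", HAS_CANVAS, folio.get("canvas"))
        for folio in root.findall("folio")
    ]


def to_ntriples(triples):
    """Serialize URI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = folio_canvas_triples(SAMPLE_XML)
ntriples = to_ntriples(triples)
print(ntriples)
```

Once loaded into a triple store, statements of this kind let any of the IIIF conversions look up the canvas for a folio cited in a text.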
#### Current Progress
* 6 million words indexed
* 2,394,083 metadata assertions
* 11,055 folio-canvas associations
* 13,674 text headings indexed and made searchable
* 72 texts indexed and associated with 106 manuscripts
* This kind of association allows a user to select a text and see all related manuscripts from distributed libraries.
  * For example:
    * [http://mirador.scta.info/#/plaoulcommentary](http://mirador.scta.info/#/plaoulcommentary)
    * [http://mirador.scta.info/#/rothwellcommentary](http://mirador.scta.info/#/rothwellcommentary)
* This kind of association allows images to be called up on demand as a user views a text.
  * For example:
    * [http://scta.lombardpress.org/text/lectio1](http://scta.lombardpress.org/text/lectio1)
    * Select one of the folio markings within the text and see the corresponding image appear, requested directly from the hosting library's server.