Peter of Spain, the SCTA, and an Open Access Corpus of Text-DATA

Jeffrey C. Witt (Loyola University Maryland) | @jeffreycwitt

Berlin, Germany, September 19, 2019

Slide Deck:

### Outline

1. Open Access and the SCTA
2. Advantages of Open Machine Accessible Data
3. Where we stand with the Peter of Spain Corpus

What counts as Access?

Access to a specific presentation of a text?


Access to the machine actionable data used to produce a presentation?

### The Print vs. Digital False Binary

All presentations (both print and online) begin as digital data.

But most publishers restrict access to the underlying data in order to force users to accept a particular presentation.

We need OPEN ACCESS to the underlying data.

### Is Free and on the Web Enough?
### Two Fundamental Disadvantages
### Redundancy (=Unsustainability)
### Data Siloing
### What the SCTA is

The SCTA exists to be a new type of social community dedicated to the publication of scholastic texts divorced from presentation, freeing that data to be used in a plurality of presentations for a plurality of use cases.

### What the SCTA does

1. Maintains a set of domain-specific standards for the publication of textual data as data anywhere on the web.
2. Helps publish this decentralized data by aggregating and organizing it with detailed metadata.
3. Makes this metadata available through various APIs that client applications can use to display this data in any way desired.
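
The separation of data from presentation described above can be sketched in a few lines. This is only an illustration: the `record` dictionary, its field names, and the placeholder text are assumptions made for the example and do not reflect the SCTA's actual data model or API responses. The point is that one body of text data can feed more than one presentation.

```python
import html

# Hypothetical record of the kind a client application might receive.
# Field names and placeholder text are illustrative only.
record = {
    "author": "Petrus Hispanus",
    "title": "Tractatus",
    "paragraphs": [
        {"id": "p1", "text": "Placeholder text of the first paragraph."},
        {"id": "p2", "text": "Placeholder text of the second paragraph."},
    ],
}


def render_plain(rec):
    """Presentation 1: a plain reading text."""
    lines = [f"{rec['author']}, {rec['title']}"]
    lines += [p["text"] for p in rec["paragraphs"]]
    return "\n".join(lines)


def render_html(rec):
    """Presentation 2: HTML in which each paragraph is addressable by id."""
    parts = [f"<h1>{html.escape(rec['title'])}</h1>"]
    for p in rec["paragraphs"]:
        parts.append(
            f'<p id="{html.escape(p["id"])}">{html.escape(p["text"])}</p>'
        )
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_plain(record))
    print(render_html(record))
```

Neither renderer owns the data; a third client could index the same record for search without touching either presentation.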

Advantages of an Open Connected Corpus of Data

1. (As noted above) Reduced Redundancy / Increased Sustainability

2. Increased Text Transparency

3. Increased Capacity for Discovery

### Transparency

"Traditionally the Critical Edition (in the form of a critical apparatus) has aimed for Transparency..."

"But this aspiration is frequently sacrificed when the apparatus threatens to overwhelm the text [in the print medium]"

"The apparatuses supplied on the page of an edition, for example, are meant to document the conditions of transmission and their critical analysis, and to make the editorial decisions transparent. In principle there is a claim to completeness. On closer examination of editorial practice, however, it turns out that this claim is frequently sacrificed when the apparatuses threaten to 'overgrow' the texts."

Patrick Sahle, "Zwischen Mediengebundenheit und Transmedialisierung: Anmerkungen zum Verhältnis von Edition und Medien," editio 24, 2010, pp. 23-36 (DOI 10.1515/edit.2010.004), pp. 24-25

### Accessing Every Layer of the Text
### Discoverability
### Discoverability: Basic Search
### Discoverability: Connectivity to sources and influence

"The works attributed to Petrus Hispanus (13th Century) cover several domains of the Aristotelian philosophical and the Galenic-Avicennian medical corpora, including logic, psychology, zoology, prescription books, medical commentaries, alchemy, and mysticism, but also ranging to sermons and papal bulls. Despite these connections and the understanding we have acquired of them over the past four decades of research, these domains were never studied under a global perspective of theories and practices in context..."

"...Since little is known to this date about these interrelations, perspectives and approaches in philosophy, philology, and history will be combined. This interdisciplinary approach is intended to place Peter of Spain in his contexts of sources and influences, and debates with contemporary mid-13th century authors"

### Example
### Client Applications and Networked Connections
### Some Peter of Spain Examples
### Conclusion
#### Where we stand with Peter of Spain
### Peter of Spain Table of Contents So Far

1. Expositio in librorum De divinis nominibus beati Dionysii (42,437)
2. Liber naturalis de rebus principalibus (2,170)
3. Expositio in epistolas Beati Dionysi (1,527)
4. Sententia cum quaestionibus libri De anima I-II (169,645)
5. Scientia libri de anima (102,950)
6. Expositio in librorum De ecclesiastica hierarchia beati Dionysii (16,142)
7. Expositio in Librum De Mystica Theologia (3,884)
8. Tractatus / Summulae logicales (52,037)
9. Syncategoreumata (58,722)
10. Expositio in librorum De angelica hierarchia beati Dionysii (17,460)
11. De morte et vita (12,547)

Total Word Count: 483,405

### Future Desired Use Cases

* Show/compare a critical text (and all same-language manifestations)
* Show/compare a critical text and "relied upon" / "source" manifestations
* Show/compare all stemma ancestor/descendant manifestations
* Show/compare manifestations across a stemma level (all manifestations at level 2, 3, 4, etc.)
* Show/compare the default version of all translations (e.g., show only one exemplar of each language)
* Show/compare all different versions of a given translation language (e.g., en1, en2, en3, etc.)
* Show/compare all instances of a translation version (e.g., en1, en1a, en1b, etc.)
* Connect related commentaries (with the above comparisons for each commentary)
* Search the text of all commentaries on Petrus Hispanus, tractatus 1, or, further, tractatus 1, part 1, etc.
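
A few of the use cases above can be sketched as simple queries over a list of manifestations. Everything here is an assumption made for illustration: the `Manifestation` class, the sample `corpus`, the stemma levels, and the reading of labels such as `en1a` as language + version number + optional instance letter. It is not the SCTA's data model.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifestation:
    label: str         # e.g. "en1a" (hypothetical labelling scheme)
    stemma_level: int  # hypothetical: 1 is closest to the archetype


# Assumed label shape: language letters, version digits, optional instance letter.
LABEL = re.compile(r"^([a-z]+)(\d+)([a-z]?)$")


def parse_label(label):
    """Split 'en1a' into ('en', 1, 'a'); the base instance has '' as its letter."""
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognized label: {label!r}")
    lang, version, instance = m.groups()
    return lang, int(version), instance


def by_language(items):
    """Group manifestations by language code."""
    groups = defaultdict(list)
    for it in items:
        groups[parse_label(it.label)[0]].append(it)
    return dict(groups)


def default_per_language(items):
    """One exemplar per language: lowest version, base instance first."""
    return {
        lang: min(group, key=lambda it: parse_label(it.label)[1:])
        for lang, group in by_language(items).items()
    }


def instances_of(items, lang, version):
    """All instances of one translation version, e.g. en1, en1a, en1b."""
    return [it for it in items if parse_label(it.label)[:2] == (lang, version)]


def at_level(items, level):
    """All manifestations at a given stemma level."""
    return [it for it in items if it.stemma_level == level]


# Hypothetical sample data.
corpus = [
    Manifestation("la1", 1),
    Manifestation("en1", 2),
    Manifestation("en1a", 3),
    Manifestation("en1b", 3),
    Manifestation("en2", 2),
    Manifestation("de1", 2),
]
```

With this sample, `default_per_language(corpus)` picks `la1`, `en1`, and `de1`, while `instances_of(corpus, "en", 1)` returns the three `en1` instances for side-by-side comparison.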
### Using Peter of Spain to Document our Use Cases

"If you want to go fast, go alone. If you want to go far, go together."

* As new textual phenomena arise we need to:
  * Document these use cases for the larger community
  * Rather than creating short-term workarounds that cannot be sustained over the long term.
* This requires patience. Waiting for the community means we can't solve an issue immediately.
* But once an issue is solved, we can sustain the solution for the long term.
### SCTA and Peter of Spain Progress

Hurdles:

* Training
* Tooling
* Funding
### Training

* We need to support training in 21st-century text preparation
* More training in XML for editors committed to long-term participation
### Tooling

* The right tool makes seemingly impossible tasks possible.
* We need to consider the value of paying for Oxygen over using a free editor
  * Oxygen comes with: validation checking, schema checking, automatic templates
* We need to consider the payoff of learning state-of-the-art workflow tools.
  * At present we use Dropbox
  * Underneath we use a much more sophisticated collaboration and version control tool called Git.
  * But the more powerful and sophisticated the tool, the more demanding the tool
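
To give a sense of what the simplest kind of automated checking does, here is a minimal sketch using only Python's standard library. It tests well-formedness only; it does not validate against a schema, which is the part a dedicated editor such as Oxygen adds. The function name and the sample fragments are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if the XML parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# Hypothetical fragments: the second is missing its closing </p> tag.
good = "<div><p>Text of a paragraph.</p></div>"
bad = "<div><p>Text of a paragraph.</div>"

if __name__ == "__main__":
    for name, fragment in (("good", good), ("bad", bad)):
        ok, message = check_well_formed(fragment)
        print(name, "well-formed" if ok else f"error: {message}")
```

Running a check like this before committing a file catches the most basic encoding mistakes early, before they reach collaborators.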
### Funding

* As a community we need to be thinking about supporting long-term SCTA resource costs.
* Some ideas:
  * SCTA membership dues (?)
  * Open Access Subvention (?)
* We should consider asking groups to write requests for membership or subvention fees into all future grants
* The idea: everyone contributes something small to the collective pot and SCTA resources get maintained for the long term
  * The world gets free open access
  * Member groups get privileged technology support and voting privileges on SCTA community decisions.
### Funding

* We need funding to support larger SCTA meetings where documented issues can be resolved.
* A grant was submitted to the NEH in June 2019 to support two future summer meetings.
* The results are not yet known.
* But we need to pursue more funding.
### Questions / Discussion