Annotations, AnnotationLists, and TEI Encoded Texts

Presentation vs. Archive Layers

There is no magic solution for displaying TEI encoded texts and images together.

TEI is (in my view) an archive format: it structures and stores information well, making it easy to retrieve that information for new purposes.

The IIIF Presentation API is primarily about "presentation". It is not intended as a long-term format for storing or archiving information.

The IIIF Presentation API says, if you can re-form your information according to this standard, then IIIF compatible viewers will be able to do something with it.

So, to start, I think we need to work against the idea that if I just add a particular element to my TEI document, or add some coordinates, then my browser will suddenly display images and text side by side. Likewise, we need to fight against the idea that if I just throw some fragments of TEI-encoded text into my IIIF Manifest, then I'll be able to see text and image together.

On the contrary, if we want to see image and text together, we need to link those pieces of information together according to a IIIF standard, and reformat our archival information in a way that the IIIF Viewer and native browser can understand.

Annotation and Annotation List

IIIF refers to this coupling as an Annotation. Annotations are collected together in AnnotationLists.

To start, let's think about what information is needed in order to relate a fragment of text to a particular region of an image.

First, 1a) we need an image (or better, a canvas) and 1b) we need coordinates to locate a space on that image.
Second, we need the content or text that we want to relate to that space.

These are the basic components of an annotation, which relates a body (the "resource" in 2.1, the "body" in 3.0) to a target (the "on" property in 2.1, the "target" in 3.0).

A 2.1 annotation (Open Annotation Specification)

{
  "@id": "http://scta.info/iiif/lon/11v/2",
  "@type": "oa:Annotation",
  "imageUrl": "https://loris2.scta.info/lon/L11v.jpg",
  "label": "11v(23),a - line: 2",
  "motivation": "sc:painting",
  "resource": {
    "@type": "cnt:ContentAsText",
    "chars": "libri sententiarum qui liber dividitur in prohemium et tractatum 2a incipit ibi"
  },
  "on": "http://scta.info/iiif/lon/canvas/L11v#xywh=465,366,1464,49"
}
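The 2.1 annotation above can also be assembled programmatically. The following is a minimal sketch, not a definitive implementation: the function name make_text_annotation and its parameters are my own invention for illustration, and the values are simply those from the example above.

```python
import json


def make_text_annotation(canvas_id, xywh, text, anno_id=None):
    """Build a IIIF Presentation 2.1 (Open Annotation) text annotation as a dict.

    make_text_annotation is a hypothetical helper, not part of any IIIF library.
    """
    x, y, w, h = xywh
    annotation = {
        "@type": "oa:Annotation",
        "motivation": "sc:painting",
        # The body: the text we want to relate to the region.
        "resource": {
            "@type": "cnt:ContentAsText",
            "chars": text,
        },
        # The target: the canvas id plus a fragment naming the region.
        "on": f"{canvas_id}#xywh={x},{y},{w},{h}",
    }
    if anno_id is not None:
        annotation["@id"] = anno_id
    return annotation


example = make_text_annotation(
    "http://scta.info/iiif/lon/canvas/L11v",
    (465, 366, 1464, 49),
    "libri sententiarum qui liber dividitur in prohemium et tractatum 2a incipit ibi",
    anno_id="http://scta.info/iiif/lon/11v/2",
)
print(json.dumps(example, indent=2))
```

The point of the sketch is only that an annotation is a small, regular structure: once you have a canvas id, four numbers, and a string, the rest is boilerplate.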

A 3.0 annotation (Web Annotation Specification)

{
  "@context": [
    "http://www.w3.org/ns/anno.jsonld",
    "http://iiif.io/api/presentation/3/context.json"
  ],
  "id": "https://example.org/iiif/book1/annotation/p0001-image",
  "type": "Annotation",
  "motivation": "painting",
  "body": {
    "id": "https://example.org/iiif/book1/res/page1.jpg",
    "type": "Image"
  },
  "target": "https://example.org/iiif/book1/canvas/p1"
}
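The 3.0 example above paints an image onto a canvas. For comparison, here is a rough sketch of how a text annotation targeting a region might be built in the 3.0 style. It assumes a "TextualBody" body with a "value" (from the Web Annotation model) and the "supplementing" motivation for transcription text; the function name, the annotation id, and the sample text are hypothetical, so treat this as one plausible shape rather than the required one.

```python
import json


def make_v3_text_annotation(anno_id, canvas_id, xywh, text):
    """Build a 3.0-style text annotation as a dict (hypothetical helper)."""
    x, y, w, h = xywh
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": anno_id,
        "type": "Annotation",
        # Assumption: "supplementing" is used here for transcription text.
        "motivation": "supplementing",
        "body": {
            "type": "TextualBody",
            "value": text,
            "format": "text/plain",
        },
        # As in 2.1, the region is expressed as a fragment on the canvas id.
        "target": f"{canvas_id}#xywh={x},{y},{w},{h}",
    }


v3_example = make_v3_text_annotation(
    "https://example.org/iiif/book1/annotation/p0001-line1",  # hypothetical id
    "https://example.org/iiif/book1/canvas/p1",
    (0, 0, 750, 300),
    "My first text annotation for canvas 1",
)
print(json.dumps(v3_example, indent=2))
```

Note how "resource" and "on" from 2.1 correspond to "body" and "target" here; the underlying idea of relating content to a region is unchanged.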

Two ways of creating annotationLists

A IIIF AnnotationList (or in 3.0 an AnnotationPage) is just a list or page that groups one or more annotations together.

An annotation store

One way to create and store annotations is to hook up a specific Mirador instance to an annotation endpoint (a database).

This instance of Mirador has been hooked up to a database: https://mirador-2-demo.netlify.com/

When anyone makes an annotation in this Mirador instance, that annotation gets saved to the same annotation store.

When we navigate to the canvas, we can see the annotations that have been saved, served as an annotation list. For example: https://annotot-app.herokuapp.com/annotations?uri=https://purl.stanford.edu/hg676jb4964/iiif/canvas/hg676jb4964_1&format=json

Pros and Cons of an Annotation Store

Advantages

The annotation store approach makes it easy to create annotations, without having to generate an annotation list or explicitly tether it to a manifest.

Disadvantages

The annotation store approach doesn't connect the annotation list with the canvas (via the otherContent property), but instead expects the viewer that has been hooked up to a database to make the connection.

So, if you move the manifest to a new viewer, the new viewer doesn't know about the annotations and they will not be accessible.

A Static annotationList

Using this template, let's create an annotationList.

Copy the template below.

{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "http://example.org/iiif/book1/list/p1",
  "@type": "sc:AnnotationList",
  "resources": [
    {
      "@type": "oa:Annotation",
      "motivation": "sc:painting",
      "resource":{
        "@type": "cnt:ContentAsText",
        "chars": "My first text annotation for canvas 1"
      },
      "on": "http://example.org/iiif/book1/canvas/p1#xywh=0,0,750,300"
    },
    {
      "@type": "oa:Annotation",
      "motivation": "sc:painting",
      "resource":{
        "@type": "cnt:ContentAsText",
        "chars": "My second text annotation for canvas 1"
      },
      "on": "http://example.org/iiif/book1/canvas/p1#xywh=0,0,750,300"
    }
  ]
}

Paste the content into a new file.
Save the file as al-demo.json in your manifests folder.
Change the id of the annotationList to match the URL from which it will be retrieved, e.g. http://localhost:<yourPortNumber>/al-demo.json
Change the value of "on" key in each annotation to match the id of the target canvas in your manifest
Add or modify coordinates to target a specific region of your canvas
Change the text content to "My first annotation"
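If you would rather script these steps than edit the file by hand, something like the following sketch would do. The function names are hypothetical, and port 8080 is only an assumed stand-in for <yourPortNumber>; substitute your own canvas id, coordinates, and text.

```python
import json


def build_annotation_list(list_id, canvas_id, items):
    """Return a 2.1 sc:AnnotationList as a dict (hypothetical helper).

    items is a list of (text, (x, y, w, h)) pairs, one per annotation.
    """
    resources = []
    for text, (x, y, w, h) in items:
        resources.append({
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "resource": {"@type": "cnt:ContentAsText", "chars": text},
            "on": f"{canvas_id}#xywh={x},{y},{w},{h}",
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": list_id,
        "@type": "sc:AnnotationList",
        "resources": resources,
    }


def save_annotation_list(path, annotation_list):
    """Write the annotation list to disk as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(annotation_list, f, indent=2, ensure_ascii=False)


demo = build_annotation_list(
    "http://localhost:8080/al-demo.json",       # assumed port; match your server
    "http://example.org/iiif/book1/canvas/p1",  # replace with your canvas id
    [("My first annotation", (0, 0, 750, 300))],
)
print(json.dumps(demo, indent=2))
```

Calling save_annotation_list("al-demo.json", demo) from inside your manifests folder would then produce the file described in the steps above.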

Now link this al-demo.json file to the first canvas in your manifest.

"otherContent": [{
  "@id": "http://localhost:<yourPortNumber>/al-demo.json",
  "@type": "sc:AnnotationList"
}]
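The same attachment can be made with a few lines of code. This is a sketch under the assumption that your manifest follows the usual 2.1 layout, with canvases under the first sequence; attach_annotation_list is a hypothetical name, the tiny manifest is a stand-in for your own, and port 8080 again stands in for <yourPortNumber>.

```python
import json


def attach_annotation_list(manifest, list_url, canvas_index=0):
    """Add an otherContent reference to one canvas of a 2.1 manifest dict."""
    canvas = manifest["sequences"][0]["canvases"][canvas_index]
    other_content = canvas.setdefault("otherContent", [])
    # Avoid adding the same list twice if the script is re-run.
    if not any(entry.get("@id") == list_url for entry in other_content):
        other_content.append({"@id": list_url, "@type": "sc:AnnotationList"})
    return manifest


# A stripped-down stand-in for a real manifest, for illustration only.
manifest = {
    "@type": "sc:Manifest",
    "sequences": [
        {"canvases": [{"@id": "http://example.org/iiif/book1/canvas/p1"}]}
    ],
}
attach_annotation_list(manifest, "http://localhost:8080/al-demo.json")
print(json.dumps(manifest["sequences"][0]["canvases"][0], indent=2))
```

In practice you would load your manifest with json.load, call the function, and write the result back out with json.dump.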

For an example of this kind of attachment, view the canvases in this manifest: https://scta.info/iiif/graciliscommentary/lon/manifest

Now save your manifest.jsonld.

Now serve both your manifest and annotationList, by navigating to your manifests folder and running:

$ live-server --cors

Now view your manifest in Mirador.

Turn on the view annotations icon.

Hover over the annotation and you should see your first annotation display.

The pattern of constructing such annotations is always the same.

Thinking about Creating Annotations from TEI-Encoded Text Data

The future challenge you will have to face is how to automate the above process and construct annotations with content that comes from your well-encoded TEI files.

Time doesn't permit us to work through all the steps of how one might construct such a process and there is no ONE correct way to do it.

Instead, I'd like to just close by illustrating/describing the steps I take to automate this process at increasingly large scale.

Again, when thinking at scale, you need to think about your data sources and how you are going to retain connections between them.

In this regard, I have three main data resources:

SCTA-Codices
- Map of Surfaces Within a Codex and the IIIF canvases for each surface
SCTA-Coordinates
- Coordinates of regions of interests (mostly line coordinates)
SCTA-texts
- TEI encoded texts

When I want to create an annotation list of lines for a given page, I query these data sources and then "reconstruct" them as the IIIF API requires.

So, to create an annotation list for a given canvas, I follow "roughly" these basic steps:

I start with the canvas id,
I then move from the canvas id to the surface id
I then move to the coordinates file and ask for the coordinates of each line on this surface
I then move to the SCTA-texts and ask for the text for each line

Now I have the data I need to create an annotation.

I place the text into the "resource" under the "chars" field
The target is a construction of the canvasid + the coordinates.

TADA.
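To make the sequence of lookups concrete, here is a toy sketch in which three dictionaries stand in for the three data sources. Everything about their shape is an assumption made for illustration (the real sources are separate repositories that are queried, not Python dicts), as are the surface id "lon/11v" and the list id. The one line of data is taken from the 2.1 example earlier.

```python
# Hypothetical stand-ins for SCTA-Codices, SCTA-Coordinates and SCTA-texts.
CANVAS_TO_SURFACE = {
    "http://scta.info/iiif/lon/canvas/L11v": "lon/11v",  # surface id format assumed
}
LINE_COORDINATES = {
    "lon/11v": {2: (465, 366, 1464, 49)},  # line number -> (x, y, w, h)
}
LINE_TEXTS = {
    "lon/11v": {
        2: "libri sententiarum qui liber dividitur in prohemium et tractatum 2a incipit ibi",
    },
}


def annotation_list_for_canvas(canvas_id, list_id):
    """Follow canvas -> surface -> coordinates -> text and build a 2.1 list."""
    surface_id = CANVAS_TO_SURFACE[canvas_id]
    coordinates = LINE_COORDINATES[surface_id]
    texts = LINE_TEXTS[surface_id]
    resources = []
    for line_number in sorted(coordinates):
        x, y, w, h = coordinates[line_number]
        resources.append({
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "resource": {
                "@type": "cnt:ContentAsText",
                "chars": texts.get(line_number, ""),
            },
            # Target = canvas id + coordinates.
            "on": f"{canvas_id}#xywh={x},{y},{w},{h}",
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": list_id,
        "@type": "sc:AnnotationList",
        "resources": resources,
    }


line_list = annotation_list_for_canvas(
    "http://scta.info/iiif/lon/canvas/L11v",
    "http://example.org/iiif/lon/list/L11v",  # hypothetical list id
)
print(len(line_list["resources"]), "annotation(s) built")
```

The important design point is that the line number and surface id act as the shared keys that keep the separate data sources connected.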

If you're working with a single document or a small data set you might not need to separate data sources in this way.

You could record this information in a single TEI file.

But the query remains the same.

From the TEI file you'll need to retrieve the canvasid, then the coordinates for the lines, and then the text that corresponds to each line, and reassemble it according to the rules of the IIIF Presentation API.
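As a rough illustration of the single-file case, the sketch below reads a tiny TEI document and reassembles it into 2.1 annotations. The encoding is an assumption, one of many possible: the canvas id is stored in a corresp attribute on the surface element, each line has a zone with ulx/uly/lrx/lry corner coordinates, and each lb points at its zone via facs. Your own files may record this differently.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample; the use of corresp for the canvas id is an assumption.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface xml:id="s1" corresp="http://example.org/iiif/book1/canvas/p1">
      <zone xml:id="z1" ulx="0" uly="0" lrx="750" lry="300"/>
      <zone xml:id="z2" ulx="0" uly="320" lrx="750" lry="620"/>
    </surface>
  </facsimile>
  <text><body><p><lb facs="#z1"/>first line of text
    <lb facs="#z2"/>second line of text</p></body></text>
</TEI>"""


def annotations_from_tei(tei_xml):
    """Return a list of 2.1 annotation dicts, one per lb that has a zone."""
    root = ET.fromstring(tei_xml)
    zones = {}
    for surface in root.iter(f"{TEI}surface"):
        canvas_id = surface.get("corresp")
        for zone in surface.iter(f"{TEI}zone"):
            ulx, uly, lrx, lry = (int(zone.get(k)) for k in ("ulx", "uly", "lrx", "lry"))
            # Convert corner coordinates to x, y, width, height.
            zones[zone.get(XML_ID)] = (canvas_id, (ulx, uly, lrx - ulx, lry - uly))
    annotations = []
    for lb in root.iter(f"{TEI}lb"):
        zone_id = (lb.get("facs") or "").lstrip("#")
        if zone_id not in zones:
            continue
        canvas_id, (x, y, w, h) = zones[zone_id]
        # Simplification: only the plain text directly after the lb is used,
        # so inline markup inside a line would be dropped.
        text = " ".join((lb.tail or "").split())
        annotations.append({
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "resource": {"@type": "cnt:ContentAsText", "chars": text},
            "on": f"{canvas_id}#xywh={x},{y},{w},{h}",
        })
    return annotations


tei_annotations = annotations_from_tei(SAMPLE_TEI)
for annotation in tei_annotations:
    print(annotation["on"], "|", annotation["resource"]["chars"])
```

The resulting list can be dropped straight into the "resources" array of an annotationList like the template shown earlier.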

If you're working with a TEI document that is structured by divs and paragraphs rather than pages and lines, you will need to convert and extract the information corresponding to a material page.

To do this I do the following:

Using the text as a document stored in eXist-db, I use XQuery to convert this original text to a new XML document structured by line, as seen here:

https://github.com/scta/scta-app/blob/develop/folio-lines-xml.xq

Here I rely heavily on the util:get-fragment-between function to get the text between two milestones.
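For readers who don't use eXist-db, the idea of cutting text at milestones can be approximated in other languages. The sketch below is a Python analog I wrote for illustration, not the eXist function and not the method in the linked script: it walks a document in order and collects the text that follows each lb milestone, even when a line runs across a paragraph boundary. The sample is un-namespaced and invented, and the function keeps only text, not the markup inside each fragment.

```python
import xml.etree.ElementTree as ET

# Invented sample: line 1 starts in the first paragraph and ends in the second.
SAMPLE = """<div>
  <p>Opening words <lb n="1"/>end of the first paragraph</p>
  <p>start of the second paragraph <lb n="2"/>and the rest</p>
</div>"""


def text_by_milestone(root, milestone_tag="lb"):
    """Map each milestone's n attribute to the text that follows it.

    Text is collected in document order until the next milestone,
    regardless of the element boundaries in between.
    """
    lines = {}
    current = None  # n value of the milestone we are currently inside

    def walk(elem):
        nonlocal current
        if elem.tag == milestone_tag:
            current = elem.get("n")
            lines[current] = []
        elif current is not None and elem.text:
            lines[current].append(elem.text)
        for child in elem:
            walk(child)
            # Text after a child belongs to whichever line is open now.
            if current is not None and child.tail:
                lines[current].append(child.tail)

    walk(root)
    # Collapse runs of whitespace (including line breaks between elements).
    return {n: " ".join("".join(parts).split()) for n, parts in lines.items()}


milestone_lines = text_by_milestone(ET.fromstring(SAMPLE))
for n, text in milestone_lines.items():
    print(n, "->", text)
```

Text before the first milestone ("Opening words" here) is ignored, since it belongs to a line that started on an earlier page or paragraph.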

Once I have the text structured by line...

...using XSLT, I combine this with coordinate information exported from Transkribus, as seen here:

https://github.com/scta/scta-coordinates/blob/master/scripts/coordinate-converter.xsl

Once I have coordinates and text together in a new XML document, I add these files back into eXist-db to be indexed.

Then I can easily convert them into IIIF annotationLists as responses to various queries, as seen here:

https://github.com/scta/scta-app/blob/develop/folio-annotaiton-list-from-simpleXmlCoordinates.xq