Shared Tasks in the Digital Humanities

Systematic Analysis of Narrative Texts through Annotation

Overview

Our proposed DH shared task works in two phases:

Phase 1: SANTA (Systematic Analysis of Narrative Texts through Annotation)

Participants independently develop annotation guidelines for narrative levels and for the narrator's position relative to them (homo-/heterodiegetic), given a diverse corpus of narrative texts. In spring 2018, participants will submit their annotation guidelines to the organisers, together with annotations of a provided selection of texts. Participants are then asked to annotate the same texts according to another participant's annotation guidelines. In addition, students coordinated by the organisers will annotate the texts using the submitted guidelines.

Participants will be invited to a workshop to discuss the different guidelines. Because each set of guidelines is also applied by other participants and by the student assistants, the discussion can be informed by an empirical evaluation of inter-annotator agreement. The result of this workshop will be a set of guidelines that the participants agree upon.
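To illustrate what an empirical evaluation of inter-annotator agreement could look like, here is a minimal sketch that computes Cohen's kappa for two annotators. It rests on assumptions that are not part of the task description: that annotations can be reduced to one narrative-level label per token, and that Cohen's kappa is the chosen measure. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same sequence of items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both annotators must label the same items.")
    n = len(labels_a)
    if n == 0:
        raise ValueError("At least one labelled item is required.")

    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement, from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    all_labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: one narrative-level label per token.
annotator_1 = [1, 1, 2, 2, 2, 1]
annotator_2 = [1, 1, 2, 2, 1, 1]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

A value near 1 would indicate that a set of guidelines leads different annotators to the same decisions, while a value near 0 would indicate agreement no better than chance. Span-based annotations may call for a different measure.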

Result: Clearer, more explicit and unambiguous definitions of narratological phenomena; evaluated annotation guidelines.

As of July 17, you can find more detailed information in this blog post.

Annotation Phase

The resulting guidelines will then enter the annotation phase, in which student assistants will annotate a larger corpus.

Result: An annotated corpus

Phase 2

The goal of Phase 2 is to foster the development of systems that automatically detect narrative levels in texts. A large part of the annotated corpus will be released early as training and development material. A small part will be withheld to serve as unseen test data later on. Since the test data will only become available shortly before the submission deadline, tuning systems to a specific text or corpus becomes infeasible.
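The split described above can be sketched as follows. This is only an illustration under stated assumptions: the proportions, the random seed, the function name and the text identifiers are all hypothetical, and the organisers may partition the corpus differently. The sketch splits by whole texts, so that no text contributes to both training and test material.

```python
import random


def split_corpus(text_ids, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Partition whole texts into training, development and held-out test sets."""
    ids = sorted(text_ids)
    if len(ids) < 3:
        raise ValueError("Need at least three texts for a three-way split.")

    # A fixed seed keeps the split reproducible (assumed value).
    rng = random.Random(seed)
    rng.shuffle(ids)

    n_test = max(1, round(len(ids) * test_fraction))
    n_dev = max(1, round(len(ids) * dev_fraction))

    test = ids[:n_test]                 # withheld until shortly before the deadline
    dev = ids[n_test:n_test + n_dev]    # released early for tuning
    train = ids[n_test + n_dev:]        # released early for training
    return train, dev, test


# Hypothetical identifiers for 20 annotated texts.
corpus = [f"text_{i:02d}" for i in range(20)]
train, dev, test = split_corpus(corpus)
print(len(train), len(dev), len(test))
```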

The submitted automatic annotations will be compared quantitatively, using standard evaluation metrics. A workshop (likely in coordination with LaTeCH) will take place in which all participants describe their systems.
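The task description does not name the evaluation metrics. As one plausible instance of a standard metric, the sketch below computes precision, recall and F1 over exactly matching spans. The span representation (start offset, end offset, narrative level), the exact-match criterion and the function name are assumptions made for illustration only.

```python
def precision_recall_f1(gold_spans, predicted_spans):
    """Exact-match precision, recall and F1 over annotated spans."""
    gold = set(gold_spans)
    predicted = set(predicted_spans)

    # A predicted span counts as correct only if it matches a gold span exactly.
    true_positives = len(gold & predicted)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: (start token, end token, narrative level).
gold = [(0, 10, 1), (3, 7, 2)]
predicted = [(0, 10, 1), (4, 7, 2), (8, 9, 2)]
p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Exact matching is strict: in the toy data the span starting at token 4 misses the gold span by one token and earns no credit. A more lenient, overlap-based criterion would be an equally defensible choice.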

Result: Experiments on the best approaches to automatically detect narrative levels; meaningful comparison of different approaches.