Shared Tasks in the Digital Humanities

Phase 1: Systematic Analysis of Narrative Texts through Annotation (SANTA)

Corpus

The corpus has been compiled to cover as many relevant phenomena as possible. It is heterogeneous with respect to genre, publication date and text length. Still, representativeness (whatever that means for literature) was not a guiding principle. All texts are available in English and German. Some texts are translations from a third language.

The maximum length of the texts in this corpus is 2000 words. Since this limit introduces a bias with respect to the use of narrative levels, we have also included longer texts, which we make available in shortened versions. For these, we removed passages that do not substantially affect the overall narrative level structure.

The corpus is freely available on GitHub.

An overview of the authors and texts can be found in this CSV file.