Evaluation of Annotation Guidelines

The goal of the workshop (in Hamburg, September 17-19) is to evaluate the annotation guidelines and either to select winning guidelines or to create consensus guidelines. We would like to evaluate the guidelines along three dimensions: i) the relation between guidelines and theory (conceptual coverage), ii) applicability, and iii) usefulness. To make these dimensions more explicit, and to guide our discussions during the workshop, we have created a list of more concrete questions for each dimension.

Conceptual coverage

Do the guidelines give a clear intuition of “narrative level”?
Is the narrative level concept explicitly defined?
Is it based on existing definitions?
How comprehensive are the guidelines with respect to aspects of the theory? Do they omit something?
Do the guidelines extend concepts/aspects from the theory? Do they make this extension explicit?
How complex is the theoretical concept implemented by these guidelines?
Where would you locate the concept of narrative levels in terms of complexity?
Are you aware of aspects of other narrative level definitions that the guidelines' understanding of narrative level(s) does not cover?

Applicability

How easy is it to apply the guidelines, both for researchers not involved in the guidelines' development and for laypeople?
How high is the inter-annotator agreement? (The organizers will supply quantitative inter-annotator agreement scores for the workshop.)¹
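
To give a rough feel for what a quantitative agreement score expresses, here is a minimal sketch in Python. It does not implement Gamma. It computes Cohen's kappa, a simpler chance-corrected measure, under the assumption that two annotators have assigned exactly one narrative-level label to each token of the same text. The function name `cohen_kappa` and the example labels are hypothetical and serve illustration only. Unlike Gamma, this sketch ignores the question of how annotated spans are aligned.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same token sequence.

    Simplified stand-in for illustration; this is NOT the Gamma measure.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty token sequence")
    n = len(labels_a)
    # Observed agreement: share of tokens with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled at random
    # according to their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical data: narrative level per token (1 = frame story, 2 = embedded story).
annotator_a = [1, 1, 2, 2, 2, 1, 1, 1]
annotator_b = [1, 1, 1, 2, 2, 2, 1, 1]
kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

In this toy example the annotators agree on six of eight tokens, but they disagree on where the embedded story begins and ends. A token-level measure treats such boundary shifts like any other disagreement, which is one reason a measure that also handles the alignment of spans may be preferable for narrative levels.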

Usefulness

Thought experiment: Assume that the narrative levels defined in the annotation guidelines can be detected automatically on a huge corpus. How helpful are these narrative levels for an analysis you are interested in?
How helpful are they as an input layer for subsequent analysis steps (that depend on narrative levels)?
How helpful are the guidelines in getting a better understanding of textual details or the text as a whole?
Do you gain new insights about narrative levels in texts by applying the guidelines, compared to applying your own guidelines?
Does the application of these guidelines influence your interpretation of a text?

(The above list was last updated on May 22, 2018.)

¹ We are currently evaluating the use of Gamma for this purpose. Gamma is described in Mathet et al. (2015):
Yann Mathet, Antoine Widlöcher, and Jean-Philippe Métivier. The unified and holistic method gamma (γ) for inter-annotator agreement measure and alignment. Computational Linguistics, 41(3):437–479, 2015.