Ulrik Brandes & Thomas Weitin

In a mixed-methods approach to literary history, we intend to use network models of text relations to study how groupings emerge relative to the way text corpora have been compiled.

Computational analysis of big data is as much about analysis as it is about the data itself. Corpora are almost never representative samples but rather opportunistic, if not deliberate, collections. Their composition is thus subject to systematic biases and affects the analysis of relations among texts, the position and status of texts relative to others, and the categorization of texts into groups.

We will develop network models that integrate stylometric, semantic, and annotation data specifically for German-speaking and European Anglophone literatures of the 18th and 19th centuries, concentrating on novels and narratives. Our goal is to shed a different light on the establishment of genre conventions, subgenres, and epoch style within this decisive period. From a socio-historical viewpoint, we focus on those texts and authors who were marginalized in the course of literary history, paying special attention to female writers and their positions alongside canonized texts.

In addition to developing models and algorithms, we will use corpus manipulation as a heuristic method to study grouping processes, together with combined synchronic and diachronic corpus comparison. By detecting groups on a certain data level and watching them disappear on another, we will be able to deepen our understanding of both historical and systematic text relations, and also of text groups within different periods of time.
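
The idea of corpus manipulation as a heuristic can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the two-dimensional feature vectors and text labels are invented, and cosine similarity with a fixed threshold and connected components merely stand in for whatever similarity measures and grouping models the project will actually develop. The sketch shows how a grouping found in the full corpus can dissolve once a single text is removed.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def groups(corpus, threshold):
    """Group texts as connected components of the similarity network.

    Two texts are linked if their cosine similarity reaches the threshold.
    This is a deliberately simple stand-in for a grouping model.
    """
    neighbours = {text: set() for text in corpus}
    for s, t in combinations(corpus, 2):
        if cosine(corpus[s], corpus[t]) >= threshold:
            neighbours[s].add(t)
            neighbours[t].add(s)

    seen, result = set(), set()
    for start in corpus:
        if start in seen:
            continue
        # Depth-first search collects every text reachable from `start`.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        result.add(frozenset(component))
    return result


# Hypothetical corpus: invented feature vectors, not real stylometric data.
corpus = {
    "A1": [1.0, 0.0],
    "A2": [0.9, 0.1],
    "bridge": [0.7, 0.7],
    "B1": [0.1, 0.9],
    "B2": [0.0, 1.0],
}

# Corpus manipulation: drop one text and regroup the remainder.
reduced = {name: vec for name, vec in corpus.items() if name != "bridge"}

full_groups = groups(corpus, threshold=0.75)
reduced_groups = groups(reduced, threshold=0.75)
```

In this toy setting, the full corpus forms a single group because the text labelled "bridge" links the two clusters, whereas the reduced corpus splits into two groups. Varying the threshold instead of the corpus would correspond, loosely, to observing the same texts on a different data level.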