Beyond Words. Semantic and multiword distinctive features for an investigation of literary subgenres

Team: Christof Schöch, Keli Du, Julia Dudar, Cora Rok, with the participation of Julian Schröter, Thomas Burch and Evgeniia Fileva

Contrastive text analysis, in which one group of texts is compared to another, is a widely used procedure in linguistics and literary studies, in both qualitative and quantitative research designs. Measures of keyness or distinctiveness have been developed, evaluated, and used in a range of related fields, in particular Information Retrieval, Corpus and Computational Linguistics, and Computational Literary Studies. The project proposed here builds directly on the insights, experience, and results of the ongoing ‘Zeta and Company’ project, which systematically explores the methodology of this quantitative contrastive paradigm. In ‘Beyond Words’, the literary domain we focus on is again the contemporary French novel, with particular attention to the popular subgenres of science fiction, crime fiction, and sentimental novels, as well as high-prestige novels. For comparison, we also take into account an English-language literary corpus that likewise contains science fiction and crime fiction, as well as a more general corpus containing literary fiction and non-fictional text types. The overall objective of ‘Beyond Words’ is to significantly narrow the gap between the (statistically speaking) distinctive features of specific groups of exemplars of these literary subgenres, on the one hand, and their (meaningful, interpretive) relationship to an ambitiously complex understanding of the characteristic properties of literary subgenres, on the other. Our strategy for achieving this objective is three-pronged. First, rather than focusing on single word forms, we extract more complex and semantically richer linguistic features from the texts, which we believe are better able to capture meaningful characteristics of literary subgenres.
Second, we develop a conceptualization of the subgenres that is both explicit and flexible by creating fine-grained, descriptive, prototypical subgenre profiles based on a broad consideration of the relevant research literature. Third, we maintain our focus on qualitative and quantitative strategies for evaluating both the discriminatory power of the distinctive features we identify (using a classification task) and their interpretability (using a task that maps features to the subgenre profiles). With this approach, we can contribute decisively to Computational Literary Studies and French Literary Studies, both at the level of methodological innovation (feature extraction and measures of distinctiveness suitable for complex features) and at the level of a deepened understanding of what constitutes subgenres conceptually and how the particular subgenres in question can best be described.