Simone Winko & Fotis Jannidis
The transition from realism, the dominant art form of the 19th century, to early modernism at the turn of the 20th century has been seen, both by contemporaries and by literary historians since, as a profound change affecting many aspects of form and content. This is also true of poetry. But while today only a small group of poets and poems is regarded as ‘modern’, contemporaries applied this attribute to many more texts, as can be seen, for example, from the numerous anthologies of modern poetry published around 1900. One of our main goals is to understand this discrepancy: Does literary history ignore the modern trends in those other poems, or did contemporaries perceive change and innovation where there was none? To answer this question we will look at the similarity of texts, assuming that more modern texts will be more similar to each other than to more traditional texts. Similarity always depends on a specific perspective. The dimensions along which contemporaries then and literary historians today locate the main differences between the poetry of realism and that of early modernism are mostly the same: new themes, new forms, and a new way of addressing and expressing emotions in poems. We will cover all of these aspects, but will invest differing amounts of effort in methodological innovation. The development of new methods will concentrate on semantic text similarity and on sentiment analysis or, more precisely, emotion analysis. Measuring the similarity of short texts such as poems is quite challenging, but the situation has improved dramatically since the introduction of word2vec and other forms of word embeddings. Applying these approaches to historical texts, and especially to a genre like poetry, poses a further challenge: the vocabulary of poetry differs markedly even from that of other literary genres. It is characterized by the use of old-fashioned words and neologisms, many of which are compound words.
One focus is to determine which approach to word embeddings is preferable for our use cases, and how embeddings can be used to represent short texts with respect to dimensions such as general semantics. The other is the development of a historical sentiment lexicon that covers emotions and avoids anachronism.
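
To illustrate what representing a short text with word embeddings can look like in its simplest form, the following minimal sketch averages the word vectors of a poem and compares two poems by cosine similarity. It is an illustration under stated assumptions, not the method developed here: the three-dimensional vectors in `TOY_VECTORS`, the token lists, and the function names `text_vector` and `cosine_similarity` are all hypothetical, and real vectors would come from a trained embedding model.

```python
import math

# Hypothetical toy vectors for illustration only; real embeddings would be
# trained on a corpus and have far more dimensions.
TOY_VECTORS = {
    "city": [0.9, 0.1, 0.0],
    "street": [0.8, 0.2, 0.1],
    "machine": [0.7, 0.0, 0.3],
    "forest": [0.1, 0.9, 0.0],
    "meadow": [0.0, 0.8, 0.2],
    "brook": [0.1, 0.7, 0.3],
}


def text_vector(tokens, vectors):
    """Average the vectors of all known tokens; return None if none is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Three hypothetical "poems", already tokenized.
poem_a = ["city", "street"]
poem_b = ["street", "machine"]
poem_c = ["forest", "meadow", "brook"]

sim_ab = cosine_similarity(text_vector(poem_a, TOY_VECTORS),
                           text_vector(poem_b, TOY_VECTORS))
sim_ac = cosine_similarity(text_vector(poem_a, TOY_VECTORS),
                           text_vector(poem_c, TOY_VECTORS))

# With these toy vectors, poem_a is closer to poem_b than to poem_c.
print(sim_ab > sim_ac)
```

Note that this sketch silently skips tokens without a vector. For poetic vocabulary with many archaic words and compound neologisms, that would discard exactly the words of interest, which is one reason the choice of embedding approach matters.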