IMPACT German Historical Lexicon

Produced by: Centrum für Informations- und Sprachverarbeitung (CIS), University of Munich

Abstract

The German historical corpus consists of 510 texts of varying length and genre. It contains 3,552,690 tokens (words in running text) and 369,730 types (unique words) in total. The texts date from 1350 to 1950, so the corpus covers both the Early New High German period (1350-1650) and the New High German period (since 1650), including all of their subperiods.

The IR lexicon of historical German has been built with the LeXtractor tool developed by LMU. So far, 22,800 non-modern entries with attestations in the available corpus material have been created. The lexicon contains 20,700 different historical strings, which means that attestations can be found for approximately 1.1 different readings per string. In total, 36,800 readings have been manually marked as feasible, but 14,000 of them could not be verified in the corpus. Of all 36,800 processed readings, 31,700 are pattern-based and 5,100 are "irregular". These 36,800 readings point to 19,200 lemmata.
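The figures above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are invented for this example, the counts are the rounded values quoted in the abstract, and it assumes that each of the 22,800 entries corresponds to one attested reading, which the abstract implies but does not state outright.

```python
# Rounded counts as quoted in the abstract (variable names are illustrative).
attested_entries = 22_800        # non-modern entries with corpus attestations
historical_strings = 20_700      # distinct historical strings in the lexicon
readings_total = 36_800          # readings manually marked as feasible
readings_unverified = 14_000     # readings not verified in the corpus
readings_pattern_based = 31_700
readings_irregular = 5_100

# Attested readings per historical string (the "approximately 1.1" figure).
readings_per_string = attested_entries / historical_strings

# Readings that could be verified in the corpus.
readings_verified = readings_total - readings_unverified

# Share of all processed readings that are pattern-based.
pattern_share = readings_pattern_based / readings_total

print(f"attested readings per string: {readings_per_string:.2f}")
print(f"verified readings: {readings_verified}")
print(f"pattern-based share: {pattern_share:.1%}")
```

Under that assumption the numbers are consistent with each other: the verified readings equal the number of attested entries, and the pattern-based and irregular readings add up to the total number of processed readings.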

Publications

IMPACT deliverable D-EE3.6 Core General Lexicon for German (December 2011)

Availability

The German Historical Lexicon is distributed under a CC BY-NC-SA licence. For further information on licensing, please contact the IMPACT Centre of Competence.