- 1 Impact Bulgarian Historical Lexicon
- 2 Impact Czech Historical Lexicon
- 3 Impact Dutch Historical Lexicon
- 4 Impact Dutch Named Entities Lexica
- 5 Impact English Historical Lexicon
- 6 Impact English Named Entities Lexica
- 7 Impact French Historical Lexicon
- 8 Impact German Historical Lexicon
- 9 Impact German Named Entities Lexica
- 10 Impact Polish Historical Lexicon
- 11 Impact Slovene Historical Lexicon
- 12 Impact Spanish Historical Lexicon
For Bulgarian, IMPACT has tackled two periods in distinct ways. For early nineteenth-century material, ABBYY has carried out font training so that material printed in Church Slavonic typefaces without diacritics can be recognized; no lexica have been built for this period. For the late nineteenth century (1882-1903), books and newspapers have been ground-truthed, and both OCR lexica and information retrieval (IR) lexica have been built.
The Historical Lexicon of Czech covers the period from 1800 to 1900, and the material consists of books and newspapers.
The Historical Lexicon of Dutch covers the period from 1600 to 1940, and the material used consists of books, newspapers and parliamentary papers.
The Core Named Entities Lexicon for Dutch is an elaborate database of enriched historical Dutch locations, person names and organisations from the period 1750-1945. It can be used as a lexicon for OCR and for query expansion in retrieval.
The Historical Lexicon of English covers the period from 1497 to 1900, and the material used consists of books, newspapers and papers.
The Core Named Entities Lexicon for English is an elaborate database of enriched historical English locations, person names and organisations from the period 1742-1899. It can be used as a lexicon for OCR and for query expansion in retrieval.
The Historical Lexicon of French focuses on the 17th century, that is, late Renaissance French. This period sits between Middle French (covered by the LGeRM data, see http://www.atilf.fr/dmf/LGeRM/) and Modern French (covered by the Morphalou data, see http://www.cnrtl.fr/lexiques/morphalou/).
The German historical corpus consists of 510 texts varying in length and including different genres. It contains 3,552,690 tokens (words in running text) and 369,730 types (unique words) in total.
The Core Named Entities Lexicon for German is a set of named entities (historical German locations, person names and organisations) which are likely to appear in a wide variety of texts, with extensions specific to text types targeted by IMPACT.
The ground truth material for Polish consists of books published from 1617 to 1756 and of material from the Digital Library of Polish and Poland-Related News Pamphlets, covering 1570 to 1728.
Apart from about 40 pages from a sixteenth-century and a seventeenth-century book, the dataset for historical Slovene contains material published from the second half of the eighteenth century to the end of the nineteenth century. The material consists of books and one daily newspaper.
Fourteen works of Spanish literature and a six-volume dictionary were selected for the IMPACT Demonstrator dataset. Most of the books date from the sixteenth or seventeenth century, the period known as the Spanish Golden Age.