Language resources

From DigitWiki

Impact Bulgarian Historical Lexicon

Two periods have been tackled in IMPACT in distinct ways. For early nineteenth-century material, ABBYY has trained fonts to enable recognition of material in Church Slavonic fonts without diacritics. No lexica have been built for this period. For the late nineteenth century (1882-1903), books and newspapers have been ground-truthed, and OCR and IR lexica have been built. Read more.

Impact Czech Historical Lexicon

The Historical Lexicon of Czech covers the period from 1800 to 1900. The material consists of books and newspapers. Read more.

Impact Dutch Historical Lexicon

The Historical Lexicon of Dutch covers the period from 1600 to 1940. The material consists of books, newspapers and parliamentary papers. Read more.

Impact Dutch Named Entities Lexica

The Core Named Entities Lexicon for Dutch is an elaborate database of enriched historical Dutch locations, person names and organisations from the period 1750 - 1945. It can be used as a lexicon for OCR and for query expansion in retrieval. Read more.

Impact English Historical Lexicon

The Historical Lexicon of English covers the period from 1497 to 1900. The material consists of books, newspapers and papers. Read more.

Impact English Named Entities Lexica

The Core Named Entities Lexicon for English is an elaborate database of enriched historical English locations, person names and organisations from the period 1742 - 1899. It can be used as a lexicon for OCR and for query expansion in retrieval. Read more.

Impact French Historical Lexicon

The Historical Lexicon of French focuses on seventeenth-century (late Renaissance) French, an intermediate period between Middle French (covered by LGeRM data; see http://www.atilf.fr/dmf/LGeRM/) and Modern French (covered by Morphalou data; see http://www.cnrtl.fr/lexiques/morphalou/). Read more.

Impact German Historical Lexicon

The German historical corpus consists of 510 texts varying in length and including different genres. It contains 3,552,690 tokens (words in running text) and 369,730 types (unique words) in total. Read more.

Impact German Named Entities Lexica

The Core Named Entities Lexicon for German is a set of named entities (historical German locations, person names and organisations) which are likely to appear in a wide variety of texts, with extensions specific to text types targeted by IMPACT. Read more.

Impact Polish Historical Lexicon

The ground truth material for Polish consists of books published from 1617 to 1756 and the Digital Library of Polish and Poland-Related News Pamphlets from 1570 to 1728. Read more.

Impact Slovene Historical Lexicon

Apart from about 40 pages from a sixteenth-century and a seventeenth-century book, the dataset for historical Slovene contains material published from the second half of the eighteenth century to the end of the nineteenth century. The material consists of books and one daily newspaper. Read more.

Impact Spanish Historical Lexicon

Fourteen works of Spanish Literature and a dictionary (consisting of 6 volumes) were selected for the IMPACT Demonstrator dataset. Most books are from the sixteenth or seventeenth century, known as the Spanish Golden Age. Read more.